Help with parsing html

Get help with creating, editing & fixing problems with skins
Youkai1977
Posts: 164
Joined: October 31st, 2018, 4:11 pm
Location: Germany

Re: Help with parsing html

Post by Youkai1977 »

balala wrote: April 3rd, 2021, 11:19 am But nor in the skin, which still doesn't work, not even with the new code. Please check once again.
Okay, curious. :uhuh:
What exactly doesn't work with the corrected newsfeed.ini, and what error shows up in the log?
For me the feed runs WITHOUT any error in the log. Only the time manipulation problem is still there.
- Win11 Pro x64 (23H2 - 22631.3085)
- Rainmeter 4.5.18
- Gigabyte B550i AORUS Pro AX V1.2
- Corsair Vengeance LPX 2x 16GB (32GB) DDR4 3200MHz
- RYZEN 7 5800X
- PowerColor RX570 8GB
- Samsung 980Pro 250GB (NVMe) - Drive C: Windows
- Kingston SNV2S1000G (NVMe) - Drive D: Rainmeter, Skins & Others - Drive D: Games
- NAS Synology DS216j - 2x 1GB HDDs - My Main Backup & Data Storage in my Home-Network
- Mon 1: 24" HP 24f (1920 x 1080 @ 75Hz) - Primary
- Mon 2: 22" Philips 226VL (1920 x 1080 @ 60Hz) - Secondary 1
- Mon 3: 50" Philips 50PUS7304/12 (3840 x 2160 @ 60Hz) - Secondary 2
- Corsair CX 650M Power Supply
- NZXT H210 Case
- ISP Vodafone with 1000/50 Mbit Cable Internet

The absolutely high-end machine of 2024 ... at least the graphics card :oops: O.O :rofl:
balala
Rainmeter Sage
Posts: 16110
Joined: October 11th, 2010, 6:27 pm
Location: Gheorgheni, Romania

Re: Help with parsing html

Post by balala »

Youkai1977 wrote: April 3rd, 2021, 12:51 pm Okay, curious. :uhuh:
What exactly doesn't work with the corrected newsfeed.ini, and what error shows up in the log?
For me the feed runs WITHOUT any error in the log. Only the time manipulation problem is still there.
Please pack the config and upload it. I think that would be the easiest way to get the working skin.

Post by Youkai1977 »

balala wrote: April 3rd, 2021, 2:36 pm Please pack the config and upload it. I think that would be the easiest way to get the working skin.
OK, here is the NewsFeedReader as an rmskin package. :thumbup:

Post by balala »

Youkai1977 wrote: April 3rd, 2021, 10:10 pm OK, here is the NewsFeedReader as an rmskin package. :thumbup:
Alright, I've got it working now.
There are two dates/times, in this format: LAST NFR-UPD: <First-Date-Time> - LAST NFP-BUILD: <Second-Date-Time>.
  • First-Date-Time is the date and time of the last update of the parent WebParser measure ([mRSS]). It is returned by the [mNFRUT] measure, which has an UpdateDivider=-1 option set. Accordingly, it is updated only when the [mRSS] measure updates (through the [!UpdateMeasure mNFRUT] bang in the FinishAction option of that measure).
  • Second-Date-Time is the date and time returned by the [mLBTSF] measure. This measure uses a TimeStamp option to get the formatted form of the time returned by the [mRSS] measure, as described previously.
So these dates and times can't be the same. They represent two different pieces of information, so it's normal that they differ. I'm not sure why you are expecting them to coincide.
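
The two measures described above could be set up roughly as follows. This is only a sketch, not the actual skin code: the measure names [mRSS], [mNFRUT] and [mLBTSF] come from the skin, but the URL, the RegExp, and both date formats are assumptions made for illustration, and the TimeStampFormat codes may need adjusting to what the feed really returns.

```ini
[mRSS]
Measure=WebParser
; Placeholder feed address (assumption, not the real one)
URL=https://example.com/rss
; Capture 1 = content of the lastBuildDate tag (assumed RegExp)
RegExp=(?siU)<lastBuildDate>(.*)</lastBuildDate>
StringIndex=1
; Once download and parsing are done, update the "last update" clock
FinishAction=[!UpdateMeasure mNFRUT]

[mNFRUT]
; First-Date-Time: the local clock, read only when [mRSS] finishes.
; UpdateDivider=-1 stops the regular updates, so the value changes
; only when the !UpdateMeasure bang above is executed.
Measure=Time
Format=%d.%m.%Y %H:%M:%S
UpdateDivider=-1

[mLBTSF]
; Second-Date-Time: re-formats the string returned by [mRSS]
Measure=Time
TimeStamp=[mRSS]
; Assumed input form: Sun, 04 Apr 2021 19:20:20 GMT
TimeStampFormat=%a, %d %b %Y %H:%M:%S GMT
Format=%d.%m.%Y %H:%M:%S
DynamicVariables=1
```

DynamicVariables=1 is needed on [mLBTSF] because its TimeStamp option refers to another measure through a section variable.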

Post by Youkai1977 »

balala wrote: April 4th, 2021, 3:55 pm So these dates and times can't be the same. They represent two different pieces of information, so it's normal that they differ. I'm not sure why you are expecting them to coincide.
Hmm, I think we're talking past each other here.
I do NOT expect the two date/time values to be identical. How could they be, if they come from two different measures that evaluate and format differently?

BUT that's exactly the point: these measures, as said, CANNOT output an identical time. The [mLBTSF] measure in particular should ONLY format the time read by [mRSS], but it does NOT do that, or does it only for the hour. The minutes and seconds are taken from the real time. See my screenshots from the day before yesterday.

One measure should give me the timestamp of when [mRSS] was updated. That works. The other measure, [mLBTSF], should ONLY format the <lastBuildDate> (Index 1) read by the [mRSS] measure, for example from Sun, 04 April 2021 19:20:20 GMT to --> 04.04.2021 19:20:20 <---.
And the latter does NOT work, because the minutes and seconds of the real time are, for whatever reason, taken over.

Post by balala »

Youkai1977 wrote: April 4th, 2021, 5:52 pm The other measure, [mLBTSF], should ONLY format the <lastBuildDate> (Index 1) read by the [mRSS] measure, for example from Sun, 04 April 2021 19:20:20 GMT to --> 04.04.2021 19:20:20 <---.
And the latter does NOT work, because the minutes and seconds of the real time are, for whatever reason, taken over.
In the <lastBuildDate>(.*)</lastBuildDate> tag, the website returns the date and time of the load (that is, when you accessed the content of the site). You can easily see what I'm talking about if you load the content of the site into your browser, then refresh it manually a few times in a row while watching the date and time contained in the lastBuildDate tag. You'll see that the results differ by just a few seconds. So this tag returns the date and time at which the site was accessed. This is approximately the local date and time; there might be a difference of only a few seconds.
Each article (maybe not the proper term), on the other hand, has a publication date and time. This can be accessed through the <pubDate>(.*)</pubDate> tag. But you have to get one such date and time for each article out there.

Post by Youkai1977 »

balala wrote: You can easily see what I'm talking about if you load the content of the site into your browser, then refresh it manually a few times...
ARRRRGGGHHH ...oh man, right, now I see it too. O.O O.O O.O :o :o :o
Oh man, I thought the <lastBuildDate> was the timestamp of when Google News as a whole was last updated. I noticed that when you are on the website, about every 20-30 minutes (I haven't checked exactly yet) a message appears at the bottom left of the screen saying that there is an updated version and asking whether you want to update.
Hence, as I said, my assumption that this is the <lastBuildDate>.
But now that you have found out that this is only the timestamp of when I or the [mRSS] measure accessed the page, the whole thing is pointless for me.
And taking the <pubDate> for each feed item now seems excessive to me ... in other words, it's like shooting at sparrows with cannons.

So everything to do with the <lastBuildDate> is now getting thrown out of the NewsFeed reader, and I'll stick with just the display of when the NewsFeed skin was last updated. I need that, among other things, because we are still experimenting with the update rate for the [mRSS] measure.

Then there's still my COLOR PROBLEM (InlineSetting).
I found out that when special characters occur in a feed, I get this play of colors: the feed is either MULTICOLORED, or WHITE instead of e.g. blue.

If a feed contains e.g. the | character, the coloring breaks off at that point, and AFTER the | character the feed has a different color.
If a feed contains the + character three times in a row (+++), the feed is MULTICOLORED.

So your solution from the other day, putting a (?iu) in front of my [mRSSItem], seems either NOT to be enough or not quite the right way to do it.

Post by balala »

Youkai1977 wrote: April 5th, 2021, 8:35 am ARRRRGGGHHH ...oh man, right, now I see it too. O.O O.O O.O :o :o :o
Oh man, I thought the <lastBuildDate> was the timestamp of when Google News as a whole was last updated. I noticed that when you are on the website, about every 20-30 minutes (I haven't checked exactly yet) a message appears at the bottom left of the screen saying that there is an updated version and asking whether you want to update.
Hence, as I said, my assumption that this is the <lastBuildDate>.
But now that you have found out that this is only the timestamp of when I or the [mRSS] measure accessed the page, the whole thing is pointless for me.
Yeah, it indeed doesn't make much sense, but that's how it is.
Youkai1977 wrote: April 5th, 2021, 8:35 am And taking the <pubDate> for each feed item now seems excessive to me ... in other words, it's like shooting at sparrows with cannons.
No, it's not a big deal at all. It can be done extremely easily. Here is an example:
  • Replace the NF1 - NF5 variables in the [Variables] section of the newsfeeddata.inc file with the following ones:

    [Variables]
    ...
    NF1=.*<item.*<title.*>(.*)</title>.*<pubDate>(.*)</pubDate>
    NF2=.*<title.*>(.*)</title>.*<pubDate>(.*)</pubDate>
    NF3=.*<title.*>(.*)</title>.*<pubDate>(.*)</pubDate>
    NF4=.*<title.*>(.*)</title>.*<pubDate>(.*)</pubDate>
    NF5=.*<title.*>(.*)</title>.*<pubDate>(.*)</pubDate>
    Just note that I wouldn't use five variables here, because NF2, NF3, NF4 and NF5 are identical.
  • Modify the StringIndex values of the child WebParser measures in the newsfeed.ini file as follows:

    [mRSSItem1]
    ...
    StringIndex=3
    ...
    
    [mRSSItem2]
    ...
    StringIndex=5
    ...
    
    [mRSSItem3]
    ...
    StringIndex=7
    ...
    
    [mRSSItem4]
    ...
    StringIndex=9
    ...
    
    [mRSSItem5]
    ...
    StringIndex=11
    ...
  • Finally, add a few more measures to get the dates:

    [mRSSDate1]
    Measure=WEBPARSER
    URL=[mRSS]
    StringIndex=4
    Disabled=1
    Group=mCHILDS
    
    [mRSSDate2]
    Measure=WEBPARSER
    URL=[mRSS]
    StringIndex=6
    Disabled=1
    Group=mCHILDS
    
    [mRSSDate3]
    Measure=WEBPARSER
    URL=[mRSS]
    StringIndex=8
    Disabled=1
    Group=mCHILDS
    
    [mRSSDate4]
    Measure=WEBPARSER
    URL=[mRSS]
    StringIndex=10
    Disabled=1
    Group=mCHILDS
    
    [mRSSDate5]
    Measure=WEBPARSER
    URL=[mRSS]
    StringIndex=12
    Disabled=1
    Group=mCHILDS
  • Additional Time measures might be needed to get the appropriate format for the dates/times. I'm not posting them for now; I hope you can add them if you want.
Now you have to use the acquired dates somehow, by adding them to String meters or whatever. I'll leave this to you; add them wherever you like.
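
One way the additional Time measure and a String meter for the first item might look is sketched below. The names [mRSSDate1Formatted] and [MeterFeed1Date], the output format, and the font options are hypothetical. The TimeStampFormat assumes the pubDate arrives in the form Mon, 05 Apr 2021 07:00:00 GMT and may need adjusting. Whether these sections should also carry Disabled=1 and Group=mCHILDS, like the WebParser children, depends on how the skin enables that group.

```ini
[mRSSDate1Formatted]
; Hypothetical measure: parses the pubDate string returned by
; [mRSSDate1] and re-emits it in the form dd.mm.yyyy hh:mm:ss
Measure=Time
TimeStamp=[mRSSDate1]
TimeStampFormat=%a, %d %b %Y %H:%M:%S GMT
Format=%d.%m.%Y %H:%M:%S
; Required because TimeStamp refers to another measure
DynamicVariables=1

[MeterFeed1Date]
; Hypothetical meter showing the formatted date of the first item
Meter=String
MeasureName=mRSSDate1Formatted
FontSize=9
FontColor=200,200,200,255
AntiAlias=1
```

The same pair would be repeated for [mRSSDate2] through [mRSSDate5].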
Youkai1977 wrote: April 5th, 2021, 8:35 am Then there's still my COLOR PROBLEM (InlineSetting).
I found out that when special characters occur in a feed, I get this play of colors: the feed is either MULTICOLORED, or WHITE instead of e.g. blue.

If a feed contains e.g. the | character, the coloring breaks off at that point, and AFTER the | character the feed has a different color.
If a feed contains the + character three times in a row (+++), the feed is MULTICOLORED.

So your solution from the other day, putting a (?iu) in front of my [mRSSItem], seems either NOT to be enough or not quite the right way to do it.
No, I'm not getting this. The only differently colored character is the | itself. See my screenshot:
NewsFeed.png
Another comment related to the code: I would definitely move the included newsfeeddata.inc file into the @Resources folder (which doesn't exist yet but can easily be created). This folder exists precisely to store such resources, included files and so on.

Post by Youkai1977 »

balala wrote: April 5th, 2021, 6:27 pm Yeah, it indeed doesn't make much sense, but that's how it is.
That's why I've completely removed it from my NewsFeed skin now.
balala wrote: No, it's not a big deal at all. It can be done extremely easily. Here is an example:
Oh man, you are crazy. :o I mean that in a nice way, and I thank you for the example, but I think it would be too much of a good thing if I now added this to the skin as well.
I will keep your example for later, but I'd rather leave it out of my current skin. It would be too much if I also stuffed the pubDate in for every feed.
Nevertheless, many, many thanks for it :) :rosegift:
balala wrote: Additional Time measures might be needed to get the appropriate format for the dates/times. I'm not posting them for now; I hope you can add them if you want.
Now you have to use the acquired dates somehow, by adding them to String meters or whatever. I'll leave this to you; add them wherever you like.
Yes, as I said, possibly in a different skin. I'll probably leave it out of this one for now.
balala wrote: No, I'm not getting this. The only differently colored character is the | itself.
Correct, if the special character | is in a feed, it will NOT be colored. But I also see that AFTER the special character | the text is NOT colored anymore, or is colored DIFFERENTLY.
Likewise, if the special characters +++ are used in a feed, I have color problems.
So something is definitely wrong.
I am already thinking about making a separate meter for each feed to avoid the InlineSetting problem.

Currently there is no feed running with the mentioned special characters, so I can't show you a screenshot of what I mean. But as soon as one comes along, I'll submit such a screenshot.
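
The separate-meter idea mentioned above could be sketched like this. The meter names, colors, sizes and positions are assumptions for illustration; only [mRSSItem1] and [mRSSItem2] are taken from the skin. Each meter gets its color from the plain FontColor option, so no InlineSetting or InlinePattern is involved. One possible (unconfirmed) cause of the color problem is that InlinePattern is a regular expression in which | and + have special meanings, which would only matter if the feed text ends up inside the pattern.

```ini
[MeterFeed1]
Meter=String
MeasureName=mRSSItem1
; Whole-meter color, applied to the entire title text
FontColor=80,160,255,255
X=5
Y=5
W=400
ClipString=1
AntiAlias=1

[MeterFeed2]
Meter=String
MeasureName=mRSSItem2
FontColor=255,255,255,255
X=5
; 2 pixels below the bottom edge of the previous meter
Y=2R
W=400
ClipString=1
AntiAlias=1
```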
balala wrote: Another comment related to the code: I would definitely move the included newsfeeddata.inc file into the @Resources folder (which doesn't exist yet but can easily be created). This folder exists precisely to store such resources, included files and so on.
Hmm, OK, I don't really understand why I should do that. I find a lot of skins on the net that always have a *.inc file in the folder of the respective skin.
So I assume the @Resources folder has its purpose, but I don't understand why I should put my *.inc files in there.

Post by balala »

Youkai1977 wrote: April 6th, 2021, 9:39 am Oh man, you are crazy. :o I mean that in a nice way, and I thank you for the example, but I think it would be too much of a good thing if I now added this to the skin as well.
I will keep your example for later, but I'd rather leave it out of my current skin. It would be too much if I also stuffed the pubDate in for every feed.
Nevertheless, many, many thanks for it :) :rosegift: Yes, as I said, possibly in a different skin. I'll probably leave it out of this one for now.
No, I don't think so, but it's your choice.
And you're welcome, it was a pleasure to write / modify the RegExp.
Youkai1977 wrote: April 6th, 2021, 9:39 am Correct, if the special character | is in a feed, it will NOT be colored. But I also see that AFTER the special character | the text is NOT colored anymore, or is colored DIFFERENTLY.
Likewise, if the special characters +++ are used in a feed, I have color problems.
So something is definitely wrong.
I am already thinking about making a separate meter for each feed to avoid the InlineSetting problem.

Currently there is no feed running with the mentioned special characters, so I can't show you a screenshot of what I mean. But as soon as one comes along, I'll submit such a screenshot.
OK, please do so. I couldn't reproduce this coloring problem so far; however, I'm not running the skin continuously.
Youkai1977 wrote: April 6th, 2021, 9:39 am Hmm, OK, I don't really understand why I should do that. I find a lot of skins on the net that always have a *.inc file in the folder of the respective skin.
So I assume the @Resources folder has its purpose, but I don't understand why I should put my *.inc files in there.
No, you don't have to put them in the @Resources folder. It was just a suggestion, one that I always follow myself. The skin works perfectly well in its existing layout too. Keep the .inc file next to the main .ini file if you prefer it that way.
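
For reference, the two layouts differ only in the path given to @Include. The sketch below assumes the include sits in the [Variables] section of newsfeed.ini; where the skin actually places it is not shown in this thread.

```ini
[Variables]
; Option A: newsfeeddata.inc stays next to newsfeed.ini (current layout).
; #CURRENTPATH# is the built-in variable for the folder of the running skin.
@Include=#CURRENTPATH#newsfeeddata.inc

; Option B: newsfeeddata.inc moved into the @Resources folder.
; #@# is the built-in variable for the path of that folder.
; To use it, comment out Option A and remove the leading semicolon here.
;@Include=#@#newsfeeddata.inc
```

Both built-in variables already end with a backslash, so the file name follows directly.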