My skin needs an expert in regex...

Post by **provato** » March 19th, 2023, 11:25 am

I have successfully made a nice skin that displays the first eight feeds from my feedly account on the desktop, and it mostly works fine.
I followed the guides here and first created a parent measure to capture the whole feedly feed:

[MeasureFeedlyFeedParser]
Measure=WebParser
Url=https://cloud.feedly.com/v3/streams/contents?streamId=user/xxxxxxxxxxxxxxxxxxxx/category/global.all
Header=Authorization: Bearer #Feedly Developer API#
RegExp=(?siU)"items":\[{(.*)}\]}\]}.*
UpdateRate=#UpdateRate#
FinishAction=[!Redraw]

Then I created child measures for every aspect of the feed I want displayed, for example title, origin website, upload time, image, author, category, summary and so on...
Here is an example of the first feed and its structure in feedly:

{"id":"user/xxxxxxxxxxxxxxxxxxxx/category/global.all",
"updated":1675980094406,
"continuation":"186096173fd:1b5c252:74502776",
"items":
	[{"fingerprint":"d6b826a",
	"language":"en",
	"id":"zWn1Ip/2vVP5xOLcVKMo1yyOEFE5KkclvWcP+hHVlMs=_1863832bbc6:261b3dd:74502776",
	"keywords":["RetroRGB"],
	"originId":"https://admin.retrorgb.com/?p=39164",
	"origin":
		{"title":"RetroRGB",
		"streamId":"feed/http://retrorgb.com/feed",
		"htmlUrl":"https://www.retrorgb.com"},
	"title":"GrechTech Retro Rosetta: A Multi-Player BlueRetro Adaptor",
	"author":"SaturnDave (Sega Saturn, SHIRO!)",
	"crawled":1675980094406,
	"published":1675978490000,
	"summary":
		{"content":
			"<a 
				rel=\"nofollow\"
			 	href=\"https://www.retrorgb.com/grechtech-retro-rosetta-a-multi-player-blueretro-adaptor.html\"
			 	title=\"GrechTech Retro Rosetta: A Multi-Player BlueRetro Adaptor\">
			<img src=\"https://cdn.retrorgb.com/wp-content/uploads/2023/02/09160028/GTRR-150x150.jpg\"
				 width=\"150\"
				 alt=\"\"
				 class=\"webfeedsFeaturedVisual wp-post-image\"
				 height=\"150\">
			</a>GrechTech has recently made available their new RetroRosetta, a modular Bluetooth receiver/controller
				 adaptor designed for retro systems with a focus on robust multi-player support and an aim to avoid
				 e-waste and tackle accessibility issues. The RetroRosetta supports a myriad of different game consoles
				 and is made possible thanks to the amazing BlueRetro project by developer […]",
		"direction":"ltr"},
	"alternate":
		[{"type":"text/html",
		"href":"https://www.retrorgb.com/grechtech-retro-rosetta-a-multi-player-blueretro-adaptor.html"}],
	"visual":
		{"contentType":"image/jpeg",
		"url":"https://cdn.retrorgb.com/wp-content/uploads/2023/02/09160028/GTRR-150x150.jpg",
		"processor":"feedly-nikon-v3.1",
		"width":150,
		"height":150,
		"expirationDate":1678650603247,
		"edgeCacheUrl":"https://lh3.googleusercontent.com/2t2djT8Pgx1XusKv7EZsd9IL2qC2-_jJc6SOLt9olXZkRAp3akMsQXYZLug-CaHl5HWQfPhTJFlLu8TRYc0r2wdlIdgJ0vFkl4WWOA"},
	"canonicalUrl":"https://www.retrorgb.com/grechtech-retro-rosetta-a-multi-player-blueretro-adaptor.html",
	"unread":true,
	"categories":[{"id":"user/82955c76-fa63-4a3f-b56a-bcdccb9b9432/category/VIDEO GAMES","label":"VIDEO GAMES"}]}

And here is an example of child measures, each with its own regex and StringIndex2 number, that return the categories of the eight feeds:

[MeasureFeedlyFeedCategory1]
Measure=WebParser
Url=[MeasureFeedlyFeedParser]
RegExp=(?siU)/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*
StringIndex=1
StringIndex2=1

[MeasureFeedlyFeedCategory2]
Measure=WebParser
Url=[MeasureFeedlyFeedParser]
RegExp=(?siU)/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*
StringIndex=1
StringIndex2=2

[MeasureFeedlyFeedCategory3]
Measure=WebParser
Url=[MeasureFeedlyFeedParser]
RegExp=(?siU)/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*/category/.*","label":"(.*)".*
StringIndex=1
StringIndex2=3
.
.
.
.

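(Each segment first finds the text /category/, then skips all text until it finds the text ","label":", and finally captures all text between ","label":" and the next ". The segment is repeated eight times, so there are eight captures, and StringIndex2 picks which one the measure returns.)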
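Outside Rainmeter, the repeat-the-segment idea can be sketched in Python, with the `re` module standing in for Rainmeter's PCRE engine. The sample string is a made-up, trimmed stand-in for the feedly output, and `category()` is a hypothetical helper that plays the role of a child measure with a given StringIndex2. Python has no `(?U)` (ungreedy) flag, so every `.*` from the skin is written as a lazy `.*?` instead.

```python
import re

# Hypothetical, trimmed-down stand-in for the feedly output: three items,
# each with a "categories" array (the real output has eight or more items).
FEED = (
    '"items":[{"title":"A","categories":[{"id":"user/x/category/VIDEO GAMES","label":"VIDEO GAMES"}]},'
    '{"title":"B","categories":[{"id":"user/x/category/abc","label":"MAME"}]},'
    '{"title":"C","categories":[{"id":"user/x/category/VIDEO GAMES","label":"VIDEO GAMES"}]}]'
)

# One segment per item, mirroring /category/.*","label":"(.*)" from the skin.
# Python lacks (?U), so each .* becomes the lazy .*? here.
SEGMENT = r'/category/.*?","label":"(.*?)"'


def category(text, n_items, index2):
    """Rough analogue of a child measure whose StringIndex2 is index2."""
    # Repeat the segment n_items times, joined by "skip anything" (.*?).
    pattern = re.compile('(?si)' + '.*?'.join([SEGMENT] * n_items))
    match = pattern.search(text)
    return match.group(index2) if match else ""


labels = [category(FEED, 3, i) for i in range(1, 4)]
print(labels)  # ['VIDEO GAMES', 'MAME', 'VIDEO GAMES']
```

Each capture is positional: capture N is simply "the Nth /category/ label found", which works as long as every item contributes exactly one match.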

My problem now...
Two feed elements are not always present (some feeds omit them): the feed image and the feed author.

Here is the regex for the image:

[MeasureFeedlyFeedImage1]
Measure=WebParser
Url=[MeasureFeedlyFeedParser]
RegExp=(?siU)(?(?=.*"content":).*src\=\\"(.*)\\".*(?<=}\]}))
StringIndex=1
StringIndex2=1
Download=1

[MeasureFeedlyFeedImage2]
Measure=WebParser
Url=[MeasureFeedlyFeedParser]
RegExp=(?siU)(?(?=.*"content":).*src\=\\"(.*)\\".*(?<=}\]}))(?(?=.*"content":).*src\=\\"(.*)\\".*(?<=}\]}))
StringIndex=1
StringIndex2=2
Download=1
.
.
.
.

...and here is the regex for the author:

[MeasureFeedlyFeedAuthor1]
Measure=WebParser
Url=[MeasureFeedlyFeedParser]
RegExp=(?siU)(?(?=.*"origin":).*"author":"(.*)".*(?<=}\]}))
StringIndex=1
StringIndex2=1
ErrorString=-
Substitute="":"-"

[MeasureFeedlyFeedAuthor2]
Measure=WebParser
Url=[MeasureFeedlyFeedParser]
RegExp=(?siU)(?(?=.*"origin":).*"author":"(.*)".*(?<=}\]}))(?(?=.*,{").*"author":"(.*)".*(?<=}\]}))
StringIndex=1
StringIndex2=2
ErrorString=-
Substitute="":"-"
.
.
.
.

When one of the feeds doesn't have an author or an image, I get either an error or the wrong image/author for that feed. For example, when the 5th feed from the top doesn't have an image/author, my skin displays the image/author of the 6th feed in place #5, the image/author of the 7th feed in place #6, and so on...
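The shifting can be reproduced with a simplified Python sketch of the repeated-segment pattern (without the conditionals, `re` standing in for PCRE, and lazy `.*?` replacing the `(?U)` flag). In this hypothetical sample the second item has no author, so the second capture lands on the third item's author, and a pattern that expects more authors than the text contains fails outright:

```python
import re

# Hypothetical four-item sample: the second item has no "author" field.
FEED = (
    '{"title":"A","author":"Bob","unread":true},'
    '{"title":"B","unread":true},'
    '{"title":"C","author":"SaturnDave","unread":true},'
    '{"title":"D","author":"Alice","unread":true}'
)

# The same "author" segment repeated, as in MeasureFeedlyFeedAuthor1..N.
SEGMENT = r'"author":"(.*?)"'
pattern = re.compile('(?s)' + '.*?'.join([SEGMENT] * 3))

authors = list(pattern.search(FEED).groups())
print(authors)  # ['Bob', 'SaturnDave', 'Alice']: item B is silently skipped

# Asking for a 4th author fails, because only three authors exist.
four = re.compile('(?s)' + '.*?'.join([SEGMENT] * 4))
print(four.search(FEED))  # None
```

Nothing in the pattern ties a capture to a particular item, so a missing field makes every later capture slide up by one.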

Is there a way to make a regex expression that detects when an image/author is missing in one of the feeds of the whole feedly output, and skips/leaves blank the image/author of this feed?

P.S. It's confusing, I know. I'll try to explain better if someone really knows their regex.

Post by **Yincognito** » March 19th, 2023, 9:29 pm

provato wrote: ↑March 19th, 2023, 11:25 am Is there a way to make a regex expression that detects when an image/author is missing in one of the feeds of the whole feedly output, and skips/leaves blank the image/author of this feed?

P.S: It's confusing I know, I'll try to explain better if someone really knows their regex

Let's assume the simple case of something like:

item1:abc,item2:def,item3:ghi,item4:jkl,item5:mno,

and a result like this, which will list stuff on new lines:

item1=\1\nitem2=\2\nitem3=\3\nitem4=\4\nitem5=\5\n

There are a couple of approaches you can try:
- using lookahead assertions or conditionals, e.g. (?(?=.*item3:).*item3:(.*),):

(?siU)^(?(?=.*item1:).*item1:(.*),)(?(?=.*item2:).*item2:(.*),)(?(?=.*item3:).*item3:(.*),)(?(?=.*item4:).*item4:(.*),)(?(?=.*item5:).*item5:(.*),)$

- using optionals, e.g. (?:item3:(.*),)?:

(?siU)^(?:item1:(.*),)?(?:item2:(.*),)?(?:item3:(.*),)?(?:item4:(.*),)?(?:item5:(.*),)?$

- using possessive quantifiers, e.g. .*+, to prevent backtracking, but applied to the string in a String measure's Substitute instead of in WebParser:

(?siU)^(?:(?(?=.*item3:).*item3:(.*),).*+|.*+)$

While the first two can be applied in your WebParser system via StringIndex, the last one is a bit special in that it needs a String measure that operates on the string, in effect extracting from there only what is needed and discarding the rest. Of course, for the last one to work, the result should be in the form of, say, item3=\1 or similar (for example, if you want the 3rd author from the list of authors and such).

There might be other methods besides those above, but in general, this is how you solve it. While doing this in your case, be aware that Rainmeter doesn't like "captures of nothing", so if by any chance you get results of \1 or similar in some cases, substitute them via "\\1":"" and similar. All the methods above should work if, for example, one of the items like item3 is missing.
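The optional-group method can be tried outside Rainmeter with Python's `re` module, used here only as a stand-in for PCRE. Python's `re` has no `(?U)` flag and does not support PCRE's lookahead conditionals `(?(?=...)...)`, so only method 2 is shown, with lazy `(.*?)` captures; `fill()` is a hypothetical helper that builds the `itemN=` lines from the example result template.

```python
import re

# The simple case from above, with item3 removed.
TEXT = "item1:abc,item2:def,item4:jkl,item5:mno,"

# Method 2: one optional group per item, anchored with ^...$.
OPTIONALS = re.compile(
    r"(?si)^" + "".join(rf"(?:item{n}:(.*?),)?" for n in range(1, 6)) + r"$"
)


def fill(match):
    # A group that took no part in the match is None in Python; Rainmeter
    # may leave a literal \3 there instead, hence the "\\3":"" substitution.
    return "".join(f"item{n}={g or ''}\n" for n, g in enumerate(match.groups(), 1))


result = fill(OPTIONALS.match(TEXT))
print(result)  # item3= stays empty instead of borrowing item4's value
```

Because each group names its own item (`item3:`), a missing item leaves its slot empty and the following items keep their positions.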

P.S. You did well providing all the necessary information in understanding the question, but in such cases that might seem more complex / confusing, it could help to simplify the example even further (obviously, not too much to not represent something that you can easily test and see if it works in your actual case).

Post by **provato** » March 20th, 2023, 1:29 pm

Thank you for taking the time to answer. When I find the time I’ll study your solutions to understand how to implement better, and write back here if I succeed or hit a wall.

Post by **Yincognito** » March 20th, 2023, 3:07 pm

provato wrote: ↑March 20th, 2023, 1:29 pm Thank you for taking the time to answer. When I find the time I’ll study your solutions to understand how to implement better, and write back here if I succeed or hit a wall.

No problem - looking forward to reading how it went and, if necessary, to providing you with additional info or code to help solve this in an acceptable manner.

Post by **provato** » March 22nd, 2023, 8:23 pm

I tried to study your proposed methods without luck...
What I don't understand is how to make a numbered list like the one in your hypothetical example. I don't know how to make a regular expression with "item1, item2, item3", etc. I just put "item, item, item, item..." in order and then let stringindex= pick from 1 to 8.

I'm including my recent whole feedly output in a text file in this post.
In my skin, I'm only interested in the first eight feeds of this output.
In these first eight feeds, there is an image url after src=\" and an author text after "author":", EXCEPT for the second one (no src=\" and no "author":" text inside the 2nd feed)

Can you write child webparser measures with stringindex 1 to 8 that return all image urls and authors AND in their correct positions AND leave the 2nd feed's image and author blank?

Because I've tried your first two methods and I always end up "rolling up" the images/authors to fill the 2nd feed's blank space.

feedly output for testing.txt

Post by **Yincognito** » March 22nd, 2023, 11:10 pm

provato wrote: ↑March 22nd, 2023, 8:23 pmCan you write child webparser measures with stringindex 1 to 8 that return all image urls and authors AND in their correct positions AND leave the 2nd feed's image and author blank?

I'll look into it, most likely tomorrow, but in the meantime... are you sure the 2nd one is an actual feed and that you want to get / parse it? I mean, after formatting things via webformatter.com (a bracket needs to be added at the beginning to avoid errors), the 2nd one clearly has a different structure, and partly different content, from the rest:

        {
            language: "en",
            id: "QtnGwmqQ1oV1t5U744LWl6RJWNNk8szUpijauuuYLqo=_186fcb278a5:1815f8c:eca0ac4",
            fingerprint: "23e129fc",
            originId: "https://www.progettosnaps.net#20230319",
            origin: { htmlUrl: "https://www.progettosnaps.net", streamId: "feed/http://www.progettosnaps.net/rssfeed.xml", title: "MAME progetto-SNAPS" },
            title: "MAME Resources 0.252 update",
            crawled: 1679276800165,
            published: 1679232600000,
            canonical: [{ href: "https://www.progettosnaps.net#20230319", type: "text/html" }],
            summary: { content: 'Available the update of all categories of <a href="https://www.progettosnaps.net/snapshots/" target="_blank"></a><span>Snapshots</span> 0.252.', direction: "ltr" },
            alternate: [{ href: "https://www.progettosnaps.net/", type: "text/html" }],
            unread: true,
            categories: [{ id: "user/82955c76-fa63-4a3f-b56a-bcdccb9b9432/category/baff35c6-f96c-4d3e-9a1e-2057c70d73c3", label: "MAME" }],
        },

And the rest all look like:

        {
            fingerprint: "86d2eafc",
            language: "en",
            id: "zWn1Ip/2vVP5xOLcVKMo1yyOEFE5KkclvWcP+hHVlMs=_186f69cbbc6:118af50:c51640ac",
            keywords: ["RetroRGB"],
            originId: "https://admin.retrorgb.com/?p=39468",
            origin: { htmlUrl: "https://www.retrorgb.com", title: "RetroRGB", streamId: "feed/http://retrorgb.com/feed" },
            title: "A Fix For Bricked Wii U’s???",
            author: "Bob",
            crawled: 1679174712262,
            published: 1679173694000,
            summary: {
                content:
                    '<a rel="nofollow" href="https://www.retrorgb.com/a-fix-for-bricked-wii-us.html" title="A Fix For Bricked Wii U’s???"><img src="https://cdn.retrorgb.com/wp-content/uploads/2023/03/18170026/VoultarWiiUFix-150x150.jpg" width="150" alt="" class="webfeedsFeaturedVisual wp-post-image" height="150"></a>Voultar has just posted a video that shares a method of potentially fixing Wii U’s showing errors that have been leading people to believe their consoles are permanently bricked.  He used software by GaryOderNichts that’s flashed on a RPi Pico – All that’s needed is the pico, a USB cable and an SD card.  This […]',
                direction: "ltr",
            },
            alternate: [{ type: "text/html", href: "https://www.retrorgb.com/a-fix-for-bricked-wii-us.html" }],
            visual: {
                processor: "feedly-nikon-v3.1",
                contentType: "image/jpeg",
                url: "https://cdn.retrorgb.com/wp-content/uploads/2023/03/18170026/VoultarWiiUFix-150x150.jpg",
                edgeCacheUrl: "https://lh3.googleusercontent.com/dkUcT8HJlX2qv8zqxks6CpZ3jCMoSzuGM5tArlF91ZaUUrDzd8JWTFWr8kzj4V8xzWlpXjKO6NZZhP-7WlSc6XiQcPlwKSNDUHmyy1w",
                expirationDate: 1681927429276,
                width: 150,
                height: 150,
            },
            canonicalUrl: "https://www.retrorgb.com/a-fix-for-bricked-wii-us.html",
            unread: true,
            categories: [{ id: "user/82955c76-fa63-4a3f-b56a-bcdccb9b9432/category/VIDEO GAMES", label: "VIDEO GAMES" }],
        },

I'm just asking to make sure there isn't a misunderstanding somewhere along the line...

Post by **provato** » March 23rd, 2023, 6:09 pm

Yes I’m positive the second one is an actual feed, and I can explain why you see this big difference only in the second one:

All other feeds (1st and 3rd to 8th) are from the same source (RetroRGB), which has a habit of furiously posting feeds every day. The 2nd feed’s source is different, and they post feeds once or twice a month.

About the images and authors:
RetroRGB has this info inside the [content], whereas progettosnaps (the 2nd feed’s source) does not.

I just need the image and author to be omitted/skipped for the second feed, since there is no “author” or “src” text inside [content].

Post by **Yincognito** » March 23rd, 2023, 9:30 pm

provato wrote: ↑March 23rd, 2023, 6:09 pm Yes I’m positive the second one is an actual feed, and I can explain why you see this big difference only in the second one:

All other feeds (1st and 3rd to 8th) are from the same source (RetroRGB) which has a habit of furiously posting everyday feeds. The 2nd feed’s source is different and they post feeds once or twice a month.

About the images and authors:
RetroRGB has this info inside the [content], whereas progettosnaps (the 2nd feed’s source) does not.

I just need the image and author to be omitted/skipped for the second feed, since there is no “author” or “src” text inside [content].

Ah, ok, thanks for clarifying this. Here you go (I added some bits at the beginning of the local source to make it closer to the real thing online):

Feedly_1.0.0.rmskin

And a preview:

Feedly.jpg

There are a couple of alternative measures in there as well, should you need them one day. Feel free to ask for details if by any chance you don't understand something.

Post by **provato** » March 24th, 2023, 11:15 pm

Wow I can't thank you enough!

I implemented your way of splitting and numbering feeds (items), and now everything works like a charm!

The following image shows the 3rd feed from the top without an image (as it should be, because the feed doesn't provide one):

Untitled-2.jpg

Once again thank you so much!

Post by **Yincognito** » March 25th, 2023, 2:18 am

provato wrote: ↑March 24th, 2023, 11:15 pm Wow I can't thank you enough!
I implemented your way of splitting and numbering feeds (items), and now everything works like a charm!
the following image shows the 3rd feed from top without an image (as it should be because the feed doesn't provide one):
Once again thank you so much!

Well, you asked for an expert in regex, so...

Just joking, of course, I'm glad I could help!

Note: I believe you tried to capture and parse too much at once in the parent WebParser of your earlier implementation, and that's why you had this issue; otherwise, the way I set up the regex patterns used by StringIndex and StringIndex2 is pretty standard in Rainmeter. With this approach, nothing can be skipped: each item is captured separately via StringIndex, and StringIndex2 then acts only on the item the former specifies (i.e. it can't reach into the next items).
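The two-stage idea from the note can be sketched in Python as well. This is not the code from the attached skin; it only illustrates the principle, with `re.findall()` playing the parent measure's role (one string per item, like StringIndex) and an optional group applied to a single item playing the child's role (like StringIndex2). The sample data, the `field()` helper, and the item delimiter (each item assumed to end with `"unread":true}`) are hypothetical simplifications.

```python
import re

# Hypothetical stand-in for the feedly "items" array; item 2 (progettosnaps)
# has neither an "author" field nor an <img src=\"...\"> in its content.
FEED = (
    '"items":['
    '{"title":"A Fix For Bricked Wii U","author":"Bob",'
    '"summary":{"content":"<img src=\\"https://cdn.example/a.jpg\\">"},"unread":true},'
    '{"title":"MAME Resources 0.252 update",'
    '"summary":{"content":"Available the update"},"unread":true},'
    '{"title":"GrechTech Retro Rosetta","author":"SaturnDave",'
    '"summary":{"content":"<img src=\\"https://cdn.example/c.jpg\\">"},"unread":true}'
    ']'
)

# Stage 1 (parent + StringIndex): capture every item on its own.
# Assumption: each item ends with "unread":true}.
items = re.findall(r'\{"title":.*?"unread":true\}', FEED, flags=re.S)

# Stage 2 (child + StringIndex2): search only inside one item. The optional
# group lets the match succeed with an empty capture when the field is missing.
AUTHOR = re.compile(r'^(?:.*?"author":"(.*?)")?', re.S)
IMAGE = re.compile(r'^(?:.*?src=\\"(.*?)\\")?', re.S)


def field(pattern, item):
    # match() always succeeds here; group(1) is None when the field is absent.
    return pattern.match(item).group(1) or ""


authors = [field(AUTHOR, item) for item in items]
images = [field(IMAGE, item) for item in items]
print(authors)  # ['Bob', '', 'SaturnDave']
print(images)   # ['https://cdn.example/a.jpg', '', 'https://cdn.example/c.jpg']
```

Because each author/image search only ever sees one item, a missing field can only produce an empty string for that item; it cannot pull in the next item's value.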

P.S. For the record, the ItemN measures (and the alternative measures at the end) in my example are just for reference / demonstration purposes, you can safely delete or omit them if you don't need that stuff.
