lookahead assertions (among other things) not working?


hpgbproductions » April 13th, 2019, 8:55 am

I'm trying to get the main screenshots and the description that follows them from a webpage with a given ID (obtained earlier in the skin) using WebParser.
An example of such a webpage is https://www.simpleplanes.com/a/m0Z83q/Synergy-Prime

Usually each page has three screenshots, like the example, so the following code works. URL is empty because it will be filled in by another WebParser measure.

Code:

[info1]
Measure=WebParser
URL=
RegExp=(?iu).*img-responsive".*src="(.*)".*>.*img-responsive".*src="(.*)".*>.*img-responsive".*src="(.*)".*>.*class="post-description".*</p>(.*)</div>"
; this RegExp works on Rainmeter but not on regex101.com even with escaping slashes
Debug=1
UpdateRate=-1
DynamicVariables=1

[info1img0]
Measure=WebParser
URL=[info1]
StringIndex=1
DecodeCharacterReference=1
Download=1

[info1getimg0]
Measure=WebParser
URL=[info1img0]

[info1img1]
Measure=WebParser
URL=[info1]
StringIndex=2
DecodeCharacterReference=1
Download=1

[info1getimg1]
Measure=WebParser
URL=[info1img1]

[info1img2]
Measure=WebParser
URL=[info1]
StringIndex=3
DecodeCharacterReference=1
Download=1

[info1getimg2]
Measure=WebParser
URL=[info1img2]

regex101 does accept the regular expression once the description part is removed:

Code:

RegExp=(?iu).*img-responsive".*src="(.*)".*>.*img-responsive".*src="(.*)".*>.*img-responsive".*src="(.*)".*>

However, some pages have only 1 or 2 screenshots. For that case I tested the following, which uses lookahead assertions. It doesn't work in Rainmeter, but it does on regex101, where it can even pick up the other images further down the page if expanded (an unintended feature, I guess).

Code:

RegExp=(?iu)img-responsive".*src="(.*?)">.*(?(?=img-responsive).*"src="(.*)">)(?(?=.*img-responsive").*"src="(.*)">)
; Rainmeter only gets the first image regardless of how many images exist

RegExp=(?iu)img-responsive".*src="(.*?)">.*(?(?=img-responsive).*"src="(.*)">)(?(?=.*img-responsive").*"src="(.*)">).*class="post-description".*</p>(.*)</div>"
; Interestingly the description part will break this

It seems that Rainmeter and regex101 (a resource linked from the Rainmeter docs) behave differently when group constructs are used.
If anyone knows what's going on with WebParser, please help. Thanks!

Re: lookahead assertions (among other things) not working?

Yincognito » April 13th, 2019, 6:20 pm

This works for me:

Code:

[Variables]

[Rainmeter]
AccurateText=1
Update=1000
DynamicWindowSize=1

[info1]
Measure=WebParser
URL=https://www.simpleplanes.com/a/m0Z83q/Synergy-Prime
RegExp=(?siU).*class="img-responsive".*src="(.*)">(?(?=.*class="img-responsive").*class="img-responsive".*src="(.*)">)(?(?=.*class="img-responsive").*class="img-responsive".*src="(.*)">).*class="post-description".*<p>(.*)</p>
UpdateRate=3600
DynamicVariables=1

[info1img0]
Measure=WebParser
URL=[info1]
StringIndex=1
DecodeCharacterReference=1
Download=1

[info1getimg0]
Measure=WebParser
URL=[info1img0]

[info1img1]
Measure=WebParser
URL=[info1]
StringIndex=2
DecodeCharacterReference=1
Download=1

[info1getimg1]
Measure=WebParser
URL=[info1img1]

[info1img2]
Measure=WebParser
URL=[info1]
StringIndex=3
DecodeCharacterReference=1
Download=1

[info1getimg2]
Measure=WebParser
URL=[info1img2]

[info1desc]
Measure=WebParser
URL=[info1]
StringIndex=4
DecodeCharacterReference=1

[MT_Test]
Meter=STRING
SolidColor=64,64,64,255
FontColor=255,255,255,255
W=750
ClipString=2
MeasureName=info1img0
MeasureName2=info1img1
MeasureName3=info1img2
MeasureName4=info1desc
Text="info1img0: %1#CRLF#info1img1: %2#CRLF#info1img2: %3#CRLF#info1desc: %4"
DynamicVariables=1

Basically, you had several mistakes in your regex: the case of the (?U) flag (you wrote a lowercase u), the omission of the (?s) flag, somewhat confusing lookahead syntax, and incorrect handling of the post description part. Bear in mind that this captures the long text of the post description (the text between the <p> and </p> tags) and not the "title" of the post description (i.e. the text between the <strong> and </strong> tags). If you want the latter, just replace the tags.
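
To make the two flag issues concrete, here is a minimal sketch in Python rather than in a skin. It rests on assumptions: the html string is a made-up two-image snippet, and Python's re module is not the PCRE engine that WebParser uses. In particular it has no (?U) flag, so lazy .*? quantifiers stand in for ungreedy mode. The sketch only illustrates that, without (?s), the dot cannot cross a line break, and that greedy quantifiers run to the last image instead of stopping at the first.

```python
import re

# Hypothetical snippet standing in for the real page source; note that the
# class and src attributes sit on different lines.
html = (
    '<img class="img-responsive"\n src="a.jpg">\n'
    '<img class="img-responsive"\n src="b.jpg">'
)

# Without the (?s) flag, "." does not match "\n", so the pattern cannot get
# from the class attribute to the src attribute and nothing matches.
no_dotall = re.search(r'img-responsive".*?src="(.*?)">', html)

# With (?s), "." also matches line breaks. Lazy quantifiers stop at the
# first possible src, which is the behaviour (?U) gives you in PCRE.
lazy = re.search(r'(?s)img-responsive".*?src="(.*?)">', html)

# Greedy quantifiers grab as much as possible, so the first capture ends up
# being the LAST image on the page.
greedy = re.search(r'(?s)img-responsive".*src="(.*)">', html)

print(no_dotall)        # None
print(lazy.group(1))    # a.jpg
print(greedy.group(1))  # b.jpg
```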

And yes, there are minor differences between how Rainmeter and sites like regexr.com (the one I use) or regex101.com handle regular expressions, and some of them concern lookarounds. However, the different results you got in regex101 compared to Rainmeter had nothing to do with those differences. They came from your regex being constructed a little sloppily, combined with the fact that, in certain circumstances, those sites show all the results of the parsing instead of just the ones you capture. Hopefully, the above solution will work for you.
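
One possible reading of "show all the results" is the difference between a tester that lists every match of a pattern and a parser that reports only the capture groups of a single match. The sketch below illustrates that difference in Python, under assumptions: the html string and the one_image pattern are made up for the example, and Python's re module is only a stand-in for both regex101 and WebParser.

```python
import re

# Hypothetical page fragment with three screenshots.
html = (
    '<img class="img-responsive" src="a.jpg">'
    '<img class="img-responsive" src="b.jpg">'
    '<img class="img-responsive" src="c.jpg">'
)

# A pattern with a single capture group, describing ONE image.
one_image = r'img-responsive".*?src="(.*?)">'

# A single match exposes only the groups written in the pattern.
first_only = re.search(one_image, html).groups()

# Listing every match of the same pattern makes it look as if the pattern
# "found" all three images, although it captures only one per match.
all_matches = re.findall(one_image, html)

print(first_only)   # ('a.jpg',)
print(all_matches)  # ['a.jpg', 'b.jpg', 'c.jpg']
```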

NOTE: I assumed the post description always exists in the source page. If that isn't the case, just turn that part into a lookahead as well, paying attention to the small differences between the way I wrote the lookaheads and the way you wrote yours. Basically, try not to overuse .* where it isn't actually necessary.
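
As a rough illustration of making both the extra images and the description optional, here is a sketch in Python. It uses a different technique from the skin above: Python's re module does not accept a lookahead as the condition of a (?(...)...) conditional, so plain optional groups (?:...)? with lazy quantifiers are swapped in. The extract function and the sample strings are hypothetical. As far as I know, under PCRE's (?U) flag an optional group would itself turn lazy, so this is not a drop-in replacement for the RegExp option in the skin.

```python
import re

# One mandatory image, two optional images, one optional description.
# Each optional part is a non-capturing group with its own capture inside.
PATTERN = re.compile(
    r'class="img-responsive".*?src="(.*?)">'
    r'(?:.*?class="img-responsive".*?src="(.*?)">)?'
    r'(?:.*?class="img-responsive".*?src="(.*?)">)?'
    r'(?:.*?class="post-description".*?<p>(.*?)</p>)?',
    re.DOTALL | re.IGNORECASE,
)


def extract(html):
    """Return (img1, img2, img3, description); missing parts are None."""
    match = PATTERN.search(html)
    return match.groups() if match else None


# Hypothetical pages: one image with a description, two images without one.
one_image = (
    '<img class="img-responsive" src="a.jpg">\n'
    '<div class="post-description"><p>Hello</p></div>'
)
two_images = (
    '<img class="img-responsive" src="a.jpg">'
    '<img class="img-responsive" src="b.jpg">'
)

print(extract(one_image))   # ('a.jpg', None, None, 'Hello')
print(extract(two_images))  # ('a.jpg', 'b.jpg', None, None)
```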