Variable number of items in regex--lookahead?

Post by qwerky »

Hello World, I'm working on a weather skin, and need some help with the WebParser regex. It is working fine until it comes to the Wind item. At that point, the html looks like this:

Code: Select all

<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="West">W</abbr> 10 <abbr title="kilometres per hour">km/h</abbr>
</dd>

and this portion of regex:

Code: Select all

.*<dt>Wind:.*title.*>(.*)<.*>(.*)<.*>(.*)<

works fine to capture the wind direction "W", speed "10" and unit "km/h", until the wind becomes calm, and the html changes to this:

Code: Select all

<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">calm</dd>

Now, there is only one item to capture, rather than three. I'm thinking I need to use lookaheads, but am having trouble puzzling it out. I tried this:

Code: Select all

.*<dt>Wind:.*class.*>(?(?=(calm)))

which sets the first string index item to "calm" only if the wind is calm. So far so good, but that still leaves two string index items unfilled, and I'm not sure what to follow that code with. Any ideas?

Post by balala »

Please post the whole code you have so far. The source (URL) would be needed for sure.

Post by FreeRaider »

Try this

Code: Select all

(?siU)(?(?=calm)(.*)<|.*abb.*>(.*)<.*>(.*)<.*>(.*)<.*)

Post by qwerky »

FreeRaider wrote: January 17th, 2019, 9:12 pm Try this

Code: Select all

(?siU)(?(?=calm)(.*)<|.*abb.*>(.*)<.*>(.*)<.*>(.*)<.*)

Thanks, I didn't know that a lookahead could contain an OR condition. What is "abb"? Should that be "title"? (In the original code I posted, "title" is a literal, not a variable.)

Where my indexes are:
11 = Wind direction
12 = Wind speed
13 = Wind unit
14 = other things,
your code, when the wind is not calm, produces:
11 = null
12 = W (direction)
13 = 10 (speed)
14 = km/h (unit).

When the wind is calm, that code produces:
11 = null
12 = km
13 = null
14 = null.
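
A note on the shifted indices in the non-calm case: PCRE numbers capture groups by the position of their opening parenthesis, counting left to right, regardless of which branch of a conditional actually matches. In `(?(condition)yes|no)`, the `|` separates the yes-branch from the no-branch; it is not part of the lookahead itself. The annotated copy below is only a reading aid for the pattern posted above, with group numbers counted relative to the fragment (add 10 to get the indices used in this post).

```ini
; Annotated breakdown of the posted pattern; the (?siU) flags are omitted here.
; This is a reading aid, not a working RegExp option (Rainmeter needs it on one line).
;
;   (?(?=calm)          condition: lookahead, true if "calm" comes next
;       (.*)<           yes-branch: group 1  (index 11 in the full regex)
;   |                   separator between yes-branch and no-branch
;       .*abb.*>(.*)<   no-branch:  group 2  (index 12)
;       .*>(.*)<        no-branch:  group 3  (index 13)
;       .*>(.*)<.*      no-branch:  group 4  (index 14)
;   )
```

When the wind is not calm, only the no-branch takes part in the match, so group 1 stays empty and the direction, speed and unit land in groups 2, 3 and 4, which matches the 11 = null, 12 = W, 13 = 10, 14 = km/h result reported above.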

Post by qwerky »

balala wrote: January 17th, 2019, 9:11 pm Please post the whole code you have so far. The source (URL) would be needed for sure.

The source is: https://weather.gc.ca/city/pages/on-143_metric_e.html

The relevant code in the weather include file is:

Code: Select all

[Variables]
SiteUpdateRate=600
SiteURL=http://weather.gc.ca/
PageURL=https://weather.gc.ca/city/pages/on-143_metric_e.html
IndexHeader=1
IndexCurrent=2

; regexBase strings:  1 Header, 2 Current Conditions, 3 Graphic Forecast, 4 Detailed Forecast, 5 Averages, 6 Yesterday
regexBase=(?siU)property="mainContentOfPage"(.*)</ul>(.*)<h2>Forecast(.*)detailedfore(.*)<h2>Averages(.*)<h2>Yesterday(.*)</details>

; regexHeader strings:  1 City, 2 Province, 3 Province Abbreviation, 4 Alert Name, 5 Alert URL, 6 Alert Text, 7 Past24URL
regexHeader=(?siU)property="name">(.*), <abbr title="(.*)">(.*)</abbr>.*<div id=".*<div id="(.*)".*<a href="(.*)">(.*)</a>.*href="(.*)">Past

; regexCurrent strings:  1 Current Icon, 2 Rounded Temperature, 3 Observed Time, 4 Condition, 5 Pressure, 6 Pressure Unit, 7 Tendency, 8 Precise Temperature, 9 Dew Point, 10 Humidity, 11 Wind Direction, 12 Wind Speed, 13 Wind Unit, 14 Wind Chill/Heat Index Label, 15 Wind Chill/Heat Index Unit, 16 Visibility, 17 Visibility Unit
regexCurrent=(?siU)<img.*src="(.*)".*<span.*>(.*)<.*<dt>Date:.*class.*>(.*)<.*<dt>Condition:.*class.*>(.*)<.*<dt>Pressure:.*class.*>(.*)<.*>(.*)<.*<dt>Tendency:.*class.*>(.*)<.*<dt>Temperature:.*class.*>(.*)<.*<dt>Dew point:.*class.*>(.*)<.*<dt>Humidity:.*class.*>(.*)<.*<dt>Wind:.*class.*>(?(?=calm)(.*)<|.*title.*>(.*)<.*>(.*)<.*>(.*)<.*)href=".*>(.*)<.*class.*>(.*)<.*<dt>Visibility:.*class.*>(.*)<.*>(.*)<

; --------------------------------------
; MEASURES
; --------------------------------------

; Main Site
; --------------------------------------

[msrSite]
Measure=WebParser
URL=#PageURL#
RegExp=#regexBase#
UpdateRate=#SiteUpdateRate#

; Current Conditions
; --------------------------------------

[msrCurrentIcon]
Measure=WebParser
URL=#SiteURL#[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexIcon#
Download=1

[msrCurrentTempRound]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexTempRound#

[msrCurrentTime]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexTime#

[msrCurrentCondition]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexCondition#

[msrCurrentPressure]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexPressure#

[msrCurrentPresUnit]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexPresUnit#

[msrCurrentTendency]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexTendency#

[msrCurrentTempPrecise]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexTempPrecise#

[msrCurrentDewpoint]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexDewpoint#

[msrCurrentHumidity]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexHumidity#

[msrCurrentWindDir]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexWindDir#

[msrCurrentWindSpeed]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexWindSpeed#

[msrCurrentWindUnit]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexWindUnit#

[msrCurrentHeatChillLabel]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexHeatChillLabel#

[msrCurrentHeatChillUnit]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexHeatChillUnit#

[msrCurrentVisibility]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexVisibility#

[msrCurrentVisUnit]
Measure=WebParser
URL=[msrSite]
RegExp=#regexCurrent#
StringIndex=#IndexCurrent#
StringIndex2=#CurrentIndexVisUnit#

Post by FreeRaider »

qwerky wrote: January 17th, 2019, 11:06 pm Thanks, I didn't know that a lookahead could contain an OR condition. What is "abb"? Should that be "title"? (In the original code I posted, "title" is a literal, not a variable.)

Where my indexes are:
11 = Wind direction
12 = Wind speed
13 = Wind unit
14 = other things,
your code, when the wind is not calm, produces:
11 = null
12 = W (direction)
13 = 10 (speed)
14 = km/h (unit).

When the wind is calm, that code produces:
11 = null
12 = km
13 = null
14 = null.

Based on what you wrote, I created that regexp code.
"abb" is the first part of "abbr title".
Attachments: image001.PNG, image002.PNG

Post by qwerky »

FreeRaider wrote: January 17th, 2019, 11:29 pm Based on what you wrote, I created that regexp code.
"abb" is the first part of "abbr title"

Okay, I understand "abb". But as your image shows, when the wind is not calm, the first index is null, and indices 2, 3 and 4 contain what should be in 1, 2 and 3.

Post by FreeRaider »

qwerky wrote: January 17th, 2019, 11:45 pm But as your image shows, when the wind is not calm, the first index is null, and indices 2, 3 and 4 contain what should be in 1, 2 and 3.

I know. Unfortunately, that happens (at least for me) whenever an IF statement is used in the RegExp option.
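
One possible way around the shifted indices, offered as an untested sketch: PCRE, the regex flavour WebParser uses, has a branch-reset group `(?|...)` in which every alternative restarts its group numbering at the same value. Written that way, "calm" and the wind direction would both land in the first group. The measure name `[msrWindSketch]` is hypothetical, the pattern covers only the Wind item, and it has been checked only by hand against the two HTML samples in the first post, not in Rainmeter or against the live page.

```ini
; Sketch only; [msrWindSketch] is a hypothetical measure name.
; (?|...) is a PCRE branch-reset group: both alternatives number their
; groups 1 to 3. The two empty groups () in the calm branch are there so
; that groups 2 and 3 still take part in the match (as empty strings).
[msrWindSketch]
Measure=WebParser
URL=https://weather.gc.ca/city/pages/on-143_metric_e.html
RegExp=(?siU)<dt>Wind:.*class.*>(?|(calm)()()<|.*title.*>(.*)<.*>(.*)<.*>(.*)<)
; Traced by hand against the samples in the first post:
;   wind:  1 = W      2 = " 10 " (with surrounding spaces)   3 = km/h
;   calm:  1 = calm   2 = empty                               3 = empty
StringIndex=1
```

How this would slot into the longer regexCurrent pattern, where the Wind Chill item is also missing on a calm page, is left open here.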

Post by balala »

qwerky wrote: January 17th, 2019, 11:18 pm The source is: https://weather.gc.ca/city/pages/on-143_metric_e.html

The relevant code in the weather include file is:

First, I have to admit I am a bit lazy and didn't read your whole conversation with FreeRaider, so maybe I missed something, but here is (I hope) a solution to your initial question. Try the following RegExp option:
RegExp=(?siU)<dt>Wind:</dt>.*<dd class=".*">.*(?(?=.*<abbr).*title=".*">(.*)</abbr>(.*)<abbr title=".*">(.*)</abbr>.*</dd>).*(.*)</dd>
Note the following:
  • If the site shows "calm", this is the only result, and it is returned by the WebParser measure with StringIndex=4. The measures with StringIndex 1 to 3 return nothing in this case.
  • If there is a wind speed, the measure with StringIndex=1 returns the wind direction, the one with StringIndex=2 returns the speed value, and the one with StringIndex=3 returns the speed's unit of measurement. In this case the measure with StringIndex=4 returns a mess and has to be ignored.
Unfortunately, for now I could test the above RegExp online only for cases when there is wind, not for calm. To see what happens when there is "calm", I saved what you posted in your initial request to a file and tested against that. I hope it'll work online in both cases. Please let me know if it does.

Post by qwerky »

balala wrote: January 18th, 2019, 4:42 pm [...] I hope online it'll work in both cases. Please let me know if it does.

Thank you both for your help. Unfortunately, I couldn't get the above to work live. Yes, it's rare to find calm (no wind), though I have such a page saved to test with.

The samples and the regex in my original post concerned only the wind; however, my later post showed the whole regex. Here are two new samples that begin with Humidity (ahead of Wind), and include Visibility following Wind:

Code: Select all

<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">70%</dd>
</dl></div>
<div class="col-sm-4" style="vertical-align: top; min-height: 107px;"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">calm</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">calm</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>

Code: Select all

<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">70%</dd>
</dl></div>
<div class="col-sm-4" style="vertical-align: top; min-height: 107px;"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="West">W</abbr> 10 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="West">W</abbr> 6 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>
<a href="https://www.canada.ca/en/environment-climate-change/services/weather-health/wind-chill-cold-weather.html">Wind Chill</a>:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-16</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">4</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>

After more work, I've come up with this regex:

Code: Select all

.*<dt>Humidity:.*class.*>(.*)<.*<dt>Wind:.*class.*>(?(?=calm)(.*)<.*<dt>Visibility|.*title.*>(.*)<.*>(.*)<.*>(.*)<.*href=".*>(.*)<.*class.*>(.*)<.*<dt>Visibility).*class.*>(.*)<.*>(.*)<

This covers Humidity, Wind and Visibility.

Using the above seems to work both in RainRegExp, and with my saved sites--to a degree. If you try it in RainRegExp, you will see that in either case, string index 1 contains the humidity, and string indices 8 and 9 contain the visibility distance/unit.

When the wind is calm, index 2 contains "calm", and indices 3 through 7 are null (which is correct, as there is no wind speed nor wind chill when the wind is calm).

When the wind is not calm, string index 2 is null, and indices 3 through 7 contain wind direction, speed, units, "Wind Chill", and wind chill value; all correct.

The only issue is that in either case, there are one or more null measures. I would have preferred "calm" to appear in the "Wind Direction" measure when appropriate. But since that hasn't been accomplished, the remaining question is how to add code to detect the two cases and display the appropriate measures.

P.S. Thanks, FreeRaider, for showing me the OR in the lookahead, and giving me a start on the regex.
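
For the open question of detecting the two cases, one option is Rainmeter's IfMatch test on the measure that holds index 2. The sketch below is untested, and every section name in it is hypothetical. It assumes `[msrWind]` is a WebParser parent measure using the Humidity/Wind/Visibility pattern above (prefixed with the `(?siU)` flags), so that index 2 is "calm" or empty and index 3 is the wind direction. It also assumes two meters, `mtrWindCalm` and `mtrWindDetail`, exist elsewhere in the skin.

```ini
; Sketch only; all section names are hypothetical.
; [msrWind] is assumed to be the parent WebParser measure, and the meters
; mtrWindCalm and mtrWindDetail are assumed to be defined elsewhere.

[msrWindCalm]
Measure=WebParser
URL=[msrWind]
StringIndex=2
; Show one meter or the other depending on whether index 2 holds "calm".
IfMatch=^calm$
IfMatchAction=[!ShowMeter mtrWindCalm][!HideMeter mtrWindDetail][!UpdateMeter *][!Redraw]
IfNotMatchAction=[!HideMeter mtrWindCalm][!ShowMeter mtrWindDetail][!UpdateMeter *][!Redraw]

[msrWindDir]
Measure=WebParser
URL=[msrWind]
StringIndex=3

; Alternative without show/hide: only one of the two values is non-empty
; at a time, so concatenating them displays "calm" or the direction.
[mtrWindDirOrCalm]
Meter=String
MeasureName=msrWindCalm
MeasureName2=msrWindDir
Text=%1%2
```

The concatenation variant comes closest to having "calm" appear where the wind direction normally does, without changing the regex.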