
Help with regex lookahead

qwerky
Posts: 181
Joined: April 10th, 2014, 12:31 am
Location: Canada

qwerky » March 19th, 2019, 1:05 am

Hi again. In my weather skin, for wind, there are three possibilities:
1. Calm,
2. Wind (with or without gust) but no wind-chill, and
3. Wind (with or without gust) plus wind-chill.

These are represented by the following HTML fragments:

Code:

<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">70%</dd>
</dl></div>
<div class="col-sm-4" style="vertical-align: top; min-height: 107px;"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">calm</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">calm</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">15 miles</dd>
</dl></div>

Code:

<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">80%</dd>
</dl></div>
<div class="col-sm-4"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="West-Northwest">WNW</abbr> 13 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="West-Northwest">WNW</abbr> 8 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">15 miles</dd>
</dl></div>

Code:

<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">79%</dd>
</dl></div>
<div class="col-sm-4" style="vertical-align: top; min-height: 107px;"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="South-Southwest">SSW</abbr> 20 <br class="visible-xs">gust 32 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="South-Southwest">SSW</abbr> 13 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>
<a href="https://www.canada.ca/en/environment-climate-change/services/weather-health/wind-chill-cold-weather.html">Wind Chill</a>:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-35</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">-31</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">15 miles</dd>
</dl></div>
The regex portion I am working with:

Code:

(?siU).*<dt>Humidity:.*class.*>(.*)<.*<dt>Wind:.*class.*>(?(?=\R).*title.*>(.*)<.*>(.*)<(?(?=br).*>(.*)<).*<dt>(?(?=\R))(?(?=<a href=).*>(.*)<.*class.*>(.*)<)).*Visibility.*class.*>(.*)<
In RainRegExp, Humidity (before wind) and Visibility (after wind) show up fine (for the first HTML fragment, the rest are all empty, since the wind is calm). Wind direction, speed, and gust also show up fine when present. But wind-chill-label (should be "Wind Chill" when present) and wind-chill-unit (should be a number, e.g. "-35", when present) do not show up. I've tried many variations on the regex, without success. Any ideas?
balala
Rainmeter Sage
Posts: 8319
Joined: October 11th, 2010, 6:27 pm
Location: Gheorgheni, Romania

Re: Help with regex lookahead

balala » March 19th, 2019, 10:28 am

qwerky wrote:
March 19th, 2019, 1:05 am
In RainRegExp, Humidity (before wind) and Visibility (after wind) show up fine (for the first HTML fragment, the rest are all empty, since the wind is calm). Wind direction, speed, and gust also show up fine when present. But wind-chill-label (should be "Wind Chill" when present) and wind-chill-unit (should be a number, e.g. "-35", when present) do not show up. I've tried many variations on the regex, without success. Any ideas?
There is one small detail missing from the posted RegExp. Here is the corrected one, which returns both missing strings as well (at least based on the posted HTML code):

RegExp=(?siU).*<dt>Humidity:.*class.*>(.*)<.*<dt>Wind:.*class.*>(?(?=\R).*title.*>(.*)<.*>(.*)<(?(?=br).*>(.*)<).*<dt>(?(?=\R))(?(?=.*<a href=).*>(.*)<.*class.*>(.*)<)).*Visibility.*class.*>(.*)<

The only change is the .* added at the start of the last look-ahead: (?=.*<a href=) instead of (?=<a href=).

Re: Help with regex lookahead

qwerky » March 19th, 2019, 7:20 pm

balala wrote:
March 19th, 2019, 10:28 am
There is one small detail missing from the posted RegExp. Here is the corrected one, which returns both missing strings as well: RegExp=(?siU).*<dt>Humidity:.*class.*>(.*)<.*<dt>Wind:.*class.*>(?(?=\R).*title.*>(.*)<.*>(.*)<(?(?=br).*>(.*)<).*<dt>(?(?=\R))(?(?=.*<a href=).*>(.*)<.*class.*>(.*)<)).*Visibility.*class.*>(.*)< (at least based on the posted HTML code).
Thank you. However, the problem with that is that, should there be no <a href= at that point, the parser will keep searching through the rest of the HTML until it finds a different one. As mentioned, the HTML shown in the first post is only a set of fragments from a larger page, used for testing. Sorry if that wasn't clear.
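
The over-reach described above can be reproduced outside Rainmeter. The following is only a rough sketch in Python, not the WebParser expression itself: the page text and the example.invalid URL are made up for illustration, and \s stands in for \R because Python's re module has no \R.

```python
import re

# Hypothetical page: calm wind, no wind chill, but an unrelated link further down.
page = (
    '<dt>Wind:</dt>\n<dd>calm</dd>\n'
    '<dt>Visibility:</dt>\n<dd>24 km</dd>\n'
    '<p><a href="https://example.invalid/other">Some other link</a></p>'
)

# (?s) lets "." cross line breaks, so ".*" inside the look-ahead can reach
# any later "<a href=" on the page.
loose = re.compile(r'(?s)<dt>(?=.*<a href=)')

# Allowing only whitespace before "<a href=" keeps the test local to this <dt>.
tight = re.compile(r'(?s)<dt>(?=\s*<a href=)')

loose_hit = loose.search(page)   # matches at the very first <dt>
tight_hit = tight.search(page)   # no <dt> is directly followed by a link

print(loose_hit is not None, tight_hit is None)   # True True
```

The loose form reports a "wind chill" link even though the page has none, which is the behaviour to avoid on the full page.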

So what we have so far:
1. Find "Wind:", then "class", then the closing angle bracket ">". If, at this point, there is a line-break (\R), then we know that there are wind elements, which are all captured within this first look-ahead. Otherwise we're done with wind, and a later Substitute will insert the word "calm".
2. Capture the wind speed element and the wind direction element, up to the opening angle bracket "<". If the next characters are "br", then capture the wind gust element. Finally, search to the next "<dt>".
3. At this point, there will either be a line-break followed by an "<a href=", indicating wind chill elements ahead; or there will be the word "Visibility". This is where the difficulty lies. We need to look-ahead for "<a href=" after the line-break, and it is that line-break which it seems is causing the problem. Since the regex begins with "(?siU)", shouldn't white-space (including line-breaks) be handled automatically?
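
On the question in step 3: as far as I know, (?s) only makes "." match line-breaks; it does not make the parser skip white-space in the page. A look-ahead also consumes nothing, so after a successful test for a line-break, the next look-ahead is still tested at that same line-break. Here is a small Python sketch of just that point. It is an illustration under assumptions, not the skin's regex: the fragment is cut down from the third sample, the URL is a placeholder, and \s replaces \R, which Python's re module lacks.

```python
import re

# Cut-down wind-chill fragment: <dt>, a line break, then the link.
chill = '<dt>\n<a href="https://example.invalid/wind-chill">Wind Chill</a>:</dt>'

# 1. (?s) only changes what "." matches; a literal "<" still cannot match "\n".
direct = re.search(r'(?s)<dt>(?=<a href=)', chill)

# 2. A look-ahead for the line break succeeds but consumes nothing, so the
#    next look-ahead is still tested at the line break itself.
peek_only = re.search(r'(?s)<dt>(?=\s)(?=<a href=)', chill)

# 3. Consuming the line break first lets the look-ahead see "<a href=".
consumed = re.search(r'(?s)<dt>\s*(?=<a href=)', chill)

print(direct, peek_only, consumed is not None)
```

Only the third form matches, which suggests the line-break has to be consumed (or allowed for inside the look-ahead) before testing for "<a href=".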

Re: Help with regex lookahead

qwerky » March 19th, 2019, 7:29 pm

Well, I think it is solved. The answer was to extend the look-ahead for a line-break (\R), to include all of the remainder of the wind elements, including the look-ahead for "<a href=". :)
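
The final expression was not posted, so what follows is only a sketch of the same nesting idea, written in Python rather than for WebParser. Python's re module does not support PCRE look-ahead conditionals, \R, or (?U), so optional non-capturing groups stand in for them here; the pattern, the group names, and the parse_wind helper are all hypothetical, and the sample strings are cut-down versions of the posted fragments with a placeholder URL.

```python
import re

# The outer optional group plays the role of the "there are wind elements"
# branch; the wind-chill test sits inside it, right after the next <dt>.
WIND = re.compile(
    r'<dt>Wind:</dt>\s*<dd[^>]*>'
    r'(?:\s*<abbr[^>]*>(?P<direction>[^<]*)</abbr>\s*(?P<speed>\d+)\s*'
    r'(?:<br[^>]*>gust\s*(?P<gust>\d+)\s*)?'          # gust, only after <br>
    r'.*?<dt>\s*'                                      # on to the next <dt>
    r'(?:<a href=[^>]*>(?P<chill_label>[^<]*)</a>:</dt>'
    r'\s*<dd[^>]*>(?P<chill>[^<]*)</dd>)?'             # wind chill, if linked
    r')?',
    re.S | re.I,
)

def parse_wind(html):
    """Return the named wind fields; a field is None when it is absent."""
    m = WIND.search(html)
    return m.groupdict() if m else None

CALM = (
    '<dt>Wind:</dt>\n'
    '<dd class="longContent mrgn-bttm-0 wxo-metric-hide">calm</dd>\n'
    '<dt>Visibility:</dt>'
)
WIND_ONLY = (
    '<dt>Wind:</dt>\n'
    '<dd class="longContent mrgn-bttm-0 wxo-metric-hide">\n'
    '<abbr title="West-Northwest">WNW</abbr> 13 '
    '<abbr title="kilometres per hour">km/h</abbr>\n</dd>\n'
    '<dt>Visibility:</dt>'
)
GUST_CHILL = (
    '<dt>Wind:</dt>\n'
    '<dd class="longContent mrgn-bttm-0 wxo-metric-hide">\n'
    '<abbr title="South-Southwest">SSW</abbr> 20 <br class="visible-xs">gust 32 '
    '<abbr title="kilometres per hour">km/h</abbr>\n</dd>\n'
    '<dt>\n<a href="https://example.invalid/wind-chill">Wind Chill</a>:</dt>\n'
    '<dd class="mrgn-bttm-0 wxo-metric-hide">-35</dd>\n'
    '<dt>Visibility:</dt>'
)

for sample in (CALM, WIND_ONLY, GUST_CHILL):
    print(parse_wind(sample))
```

For the calm sample every field comes back as None, which is where a substitution to "calm" would apply. Because the wind-chill group is only tried immediately after the next <dt>, it cannot wander off to an unrelated link further down the page.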

Re: Help with regex lookahead

balala » March 19th, 2019, 7:32 pm

qwerky wrote:
March 19th, 2019, 7:29 pm
Well, I think it is solved. The answer was to extend the look-ahead for a line-break (\R), to include all of the remainder of the wind elements, including the look-ahead for "<a href=". :)
That's fine, then. I'm glad you got it working.