balala wrote: ↑January 18th, 2019, 4:42 pm
First I have to admit I am a bit lazy and that's why I didn't read all your conversation with FreeRaider, so maybe I missed something, but here is a solution to your initial question (I hope). Try the following RegExp option:
RegExp=(?siU)<dt>Wind:</dt>.*<dd class=".*">.*(?(?=.*<abbr).*title=".*">(.*)</abbr>(.*)<abbr title=".*">(.*)</abbr>.*</dd>).*(.*)</dd>.
Note the following:
- If the site shows "calm", that is the only result, and it is returned by the WebParser measure with StringIndex=4. The measures with StringIndex 1 to 3 return nothing in this case.
- If there is wind, the measure with StringIndex=1 returns the wind direction, the one with StringIndex=2 returns the speed value, and the one with StringIndex=3 returns the unit of the speed. In this case the measure with StringIndex=4 returns garbage and has to be ignored.
Unfortunately for now I could test the above RegExp only for those cases when there is wind, but not for calm. I saved in a file what you've posted in your initial request and used that file to test what's going on when there is "calm". I hope online it'll work in both cases. Please let me know if it does.
Thank you both for your help. Unfortunately, I couldn't get the above to work live. Yes, it's rare to find calm (no wind), though I have such a page saved to test with.
The samples and the regex in my original post concerned only the wind; my later post, however, showed the whole regex. Here are two new samples that begin with Humidity (ahead of Wind) and include Visibility following Wind:
Code: Select all
<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">70%</dd>
</dl></div>
<div class="col-sm-4" style="vertical-align: top; min-height: 107px;"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">calm</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">calm</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
Code: Select all
<dt>Humidity:</dt>
<dd class="mrgn-bttm-0">70%</dd>
</dl></div>
<div class="col-sm-4" style="vertical-align: top; min-height: 107px;"><dl class="dl-horizontal wxo-conds-col3">
<dt>Wind:</dt>
<dd class="longContent mrgn-bttm-0 wxo-metric-hide">
<abbr title="West">W</abbr> 10 <abbr title="kilometres per hour">km/h</abbr>
</dd>
<dd class="longContent mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">
<abbr title="West">W</abbr> 6 <abbr title="miles per hour">mph</abbr>
</dd>
<dt>
<a href="https://www.canada.ca/en/environment-climate-change/services/weather-health/wind-chill-cold-weather.html">Wind Chill</a>:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">-16</dd>
<dd class="mrgn-bttm-0 wxo-imperial-hide wxo-city-hidden">4</dd>
<dt>Visibility:</dt>
<dd class="mrgn-bttm-0 wxo-metric-hide">24 <abbr title="kilometres">km</abbr>
</dd>
After more work, I've come up with this regex:
Code: Select all
.*<dt>Humidity:.*class.*>(.*)<.*<dt>Wind:.*class.*>(?(?=calm)(.*)<.*<dt>Visibility|.*title.*>(.*)<.*>(.*)<.*>(.*)<.*href=".*>(.*)<.*class.*>(.*)<.*<dt>Visibility).*class.*>(.*)<.*>(.*)<
It matches Humidity, Wind and Visibility in turn.
The above seems to work, to a degree, both in RainRegExp and with my saved pages. If you try it in RainRegExp, you will see that in either case string index 1 contains the humidity, and string indices 8 and 9 contain the visibility distance and unit.
When the wind is calm, index 2 contains "calm", and indices 3 through 7 are null (which is correct, as there is neither wind speed nor wind chill when the wind is calm).
When the wind is not calm, string index 2 is null, and indices 3 through 7 contain wind direction, speed, units, "Wind Chill", and wind chill value; all correct.
The only issue is that in either case there are one or more null measures. I would have preferred "calm" to appear in the "Wind Direction" measure when appropriate. Since that hasn't been accomplished, the remaining question is how to add code that detects the two cases and displays the appropriate measures.
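One possible way to handle the two cases in the skin is sketched below. It is a rough outline, not a tested solution. It assumes that child measures for unmatched groups return an empty string in the skin, as they appear to in RainRegExp. All section names ([MeasureParent], [MeasureWindCalm], [MeterWind] and so on) are made up for illustration; [MeasureParent] stands for the existing WebParser measure that holds the regex above. Because only one of index 2 or indices 3 to 5 is filled at a time, a single String meter can bind all four, and an IfMatch test on index 2 can hide the wind chill meter when the wind is calm.

```ini
; Sketch only. [MeasureParent] is a hypothetical name for the existing
; WebParser measure that holds the RegExp shown above.

[MeasureWindCalm]
Measure=WebParser
URL=[MeasureParent]
StringIndex=2
; Wind chill exists only when there is wind, so toggle its meter here.
IfMatch=^calm$
IfMatchAction=[!HideMeter MeterWindChill][!Redraw]
IfNotMatchAction=[!ShowMeter MeterWindChill][!Redraw]

[MeasureWindDir]
Measure=WebParser
URL=[MeasureParent]
StringIndex=3

[MeasureWindSpeed]
Measure=WebParser
URL=[MeasureParent]
StringIndex=4
; The captured speed has spaces around it (" 10 "), so trim them.
RegExpSubstitute=1
Substitute="^\s+":"","\s+$":""

[MeasureWindUnit]
Measure=WebParser
URL=[MeasureParent]
StringIndex=5

[MeasureWindChill]
Measure=WebParser
URL=[MeasureParent]
StringIndex=7

[MeterWind]
Meter=String
; %1 is "calm" or empty; %2 to %4 are direction, speed, unit or empty.
MeasureName=MeasureWindCalm
MeasureName2=MeasureWindDir
MeasureName3=MeasureWindSpeed
MeasureName4=MeasureWindUnit
Text=Wind: %1%2 %3 %4

[MeterWindChill]
Meter=String
Y=20
MeasureName=MeasureWindChill
Text=Wind Chill: %1
```

With this arrangement the wind meter should read "Wind: calm" in one case and something like "Wind: W 10 km/h" in the other, without swapping measures at runtime. If the empty measures turn out not to behave as assumed, the same IfMatch test could drive !SetOption on the meter instead.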
P.S. Thanks, FreeRaider, for showing me the OR in the lookahead, and giving me a start on the regex.