WebParser variable order of strings

qwerky
Posts: 181
Joined: April 10th, 2014, 12:31 am
Location: Canada

WebParser variable order of strings

qwerky » February 10th, 2019, 12:30 am

On this page, I wish to extract the high temperature (red background) and low temperature (blue background) for the last 24 hours. Easy enough; one can look for 'arrow', 'up-arrow', 'down-arrow', 'high', 'low', etc.

The problem is that, depending on what time of day you visit the page, either the high or the low may come first. So the general question is: how do you parse out various elements whose order may vary?
jsmorley
Developer
Posts: 19301
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: WebParser variable order of strings

jsmorley » February 10th, 2019, 12:45 am

That's all you are going to be getting from that site?

Re: WebParser variable order of strings

qwerky » February 10th, 2019, 12:49 am

jsmorley wrote:
February 10th, 2019, 12:45 am
That's all you are going to be getting from that site?
Looking a little bit more at it, I may also get the latitude and longitude (which always come before the temperatures) just for interest's sake, but that is not of great importance. The high and low temperatures are the only important pieces from that particular page.

Re: WebParser variable order of strings

jsmorley » February 10th, 2019, 1:02 am

OK, that's kind of important. If you were looking to parse that site for all the information, using the first entry as the "current" conditions and then parsing down until you find the high and low temperatures for the last 24 hours, I think it would be very difficult. The reason is that at any point in the day the "high" or "low" temperature could also be the "first" temperature, and the difference in how that is presented in the HTML would make parsing it a challenge indeed. Doable, but tricky...

If you just want the high and low for the last 24 hours, you can't do that in one regular expression since, as noted, they can be in either order. A regular expression ALWAYS returns its captures in the order that you ask for things.
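
To illustrate the ordering problem, here is a rough sketch of what a single-measure attempt might look like. The measure name MeasureHighThenLow is made up for this example, and the HTML fragments are simply the ones used in the two-measure code in this post. Because the pattern asks for the high cell first and the low cell second, it would presumably only match when the page happens to list them in that order:

```ini
; Hypothetical single-measure attempt, shown only to illustrate the problem.
; The pattern asks for the high first and the low second, so it can only
; match when the high appears BEFORE the low on the page.
[MeasureHighThenLow]
Measure=WebParser
URL=https://weather.gc.ca/past_conditions/index_e.html?station=yyz
RegExp=(?siU)<td headers="header3m" class="highTemp metricData text-center vertical-center"><span class="highLow">(.*)\(.*<td headers="header3m" class="lowTemp metricData text-center vertical-center"><span class="highLow">(.*)\(
```

At times of day when the low is listed first, an expression like this would be expected to fail to match, which is why separate expressions are the safer route.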

You will need two regular expressions, so something like this:

; Parent: download the page once and capture all of it as a single string
[MeasureSite]
Measure=WebParser
URL=https://weather.gc.ca/past_conditions/index_e.html?station=yyz
RegExp=(?siU)^(.*)$

; High Temperature for last 24 hours
[MeasureGetHighTemp]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1
RegExp=(?siU)<td headers="header3m" class="highTemp metricData text-center vertical-center"><span class="highLow">(.*)\((.*)\)\s

[MeasureExtractHighTempRounded]
Measure=WebParser
URL=[MeasureGetHighTemp]
StringIndex=1

[MeasureExtractHighTempDecimal]
Measure=WebParser
URL=[MeasureGetHighTemp]
StringIndex=2

; Low Temperature for last 24 hours
[MeasureGetLowTemp]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1
RegExp=(?siU)<td headers="header3m" class="lowTemp metricData text-center vertical-center"><span class="highLow">(.*)\((.*)\)\s

[MeasureExtractLowTempRounded]
Measure=WebParser
URL=[MeasureGetLowTemp]
StringIndex=1

[MeasureExtractLowTempDecimal]
Measure=WebParser
URL=[MeasureGetLowTemp]
StringIndex=2

So you see, there is no end to the clever stuff you can do with WebParser. What I am doing is:

1) Get the ENTIRE site with one regular expression and one big (capture)
2) Create a "child" measure that uses that information from the entire site, but then acts as if it is ALSO a "parent", by parsing it with a RegExp option and creating one or more (captures).
3) Create a "grandchild" measure that uses that child/parent measure as a "parent" and gets the single capture from it I am interested in.

That way I can look for information in any order I want, as they are distinct and separate trips to the well of information, while only actually going out to the internet one time.
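
As a rough sketch of how the grandchild measures might then be displayed, a String meter could bind to two of them. The meter name MeterHighLow and the font options are placeholders and not part of the skin above; only the measure names come from the code in this post:

```ini
; Hypothetical meter; only the two measure names come from the code above.
[MeterHighLow]
Meter=String
MeasureName=MeasureExtractHighTempRounded
MeasureName2=MeasureExtractLowTempRounded
FontSize=12
FontColor=255,255,255,255
AntiAlias=1
; %1 and %2 are replaced by the values of MeasureName and MeasureName2
Text=High: %1   Low: %2
```

Since each value comes from its own measure, the meter does not care which of the two appeared first on the page.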

Re: WebParser variable order of strings

qwerky » February 10th, 2019, 1:14 am

Oh, that is so cool! 8-) You only have to read the site once, but then can parse it as many times as you like with different measures. Great. :great:

And also I learned from your code that RegExp=(?siU)^(.*)$ can capture the entire page, since line-breaks are ignored; and that makes it all work. Thanks! :rosegift:

Re: WebParser variable order of strings

jsmorley » February 10th, 2019, 1:16 am

qwerky wrote:
February 10th, 2019, 1:14 am
Oh, that is so cool! 8-) You only have to read the site once, but then can parse it as many times as you like with different measures. Great. :great:
Yes, the androgynous nature of WebParser, where a measure can at once be both a parent and a child, can be very handy once you wrap your head around it. Ok, so androgynous isn't really the right term, but how often do I get to use that word...

Re: WebParser variable order of strings

jsmorley » February 10th, 2019, 1:19 am

qwerky wrote:
February 10th, 2019, 1:14 am
And also I learned from your code that RegExp=(?siU)^(.*)$ can capture the entire page, since line-breaks are ignored
Right. Technically speaking, you only need (?s), but I find it a good habit to just always use (?siU) with WebParser. It can't really ever hurt, but you will be scratching your head if you leave one of them off and you actually need it.
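
For reference, here is the same parent measure with each flag annotated, based on the standard meaning of these options in the PCRE syntax that WebParser uses. This is a sketch of the flags only, not a full explanation of the RegExp option:

```ini
[MeasureSite]
Measure=WebParser
URL=https://weather.gc.ca/past_conditions/index_e.html?station=yyz
; (?s)  "." also matches line breaks, so (.*) can span the whole page
; (?i)  matching is case-insensitive
; (?U)  quantifiers such as .* are ungreedy (match as little as possible)
RegExp=(?siU)^(.*)$
```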

Re: WebParser variable order of strings

qwerky » February 10th, 2019, 8:04 pm

jsmorley wrote:
February 10th, 2019, 1:16 am
Yes, the androgynous nature of WebParser, where a measure can at once be both a parent and a child, can be very handy once you wrap your head around it. Ok, so androgynous isn't really the right term, but how often do I get to use that word...
Methinks, "Father, son, grandson..." But then, I've already been using that in my weather skin--the main measure parses the page into groups, then those groups become parents for other measures to parse individual strings, etc. So, I should have been able to figure it out from that; it just didn't occur to me to read the entire page, so that it doesn't matter at all what order the child measures come in! :headbang: