Help with regex

ms310
Posts: 99
Joined: April 1st, 2015, 7:16 am

Help with regex

ms310 » May 9th, 2019, 7:59 am

Hi there. I would like to parse data that comes from this website: view-source:https://isitraining.in/chicago

Replace "chicago" with your city to see the results.

When you view the source you will see a mixture of data. While I am able to capture values of the form <tag>(.*)</tag>, this website also has some information at the beginning that I would like to fetch.

<!--

stdClass Object
(
    [daylight] => N
    [description] => Heavy rain. Fog. Cool.
    [skyInfo] => 21
    [skyDescription] => Fog
    [temperature] => 16.11
    [temperatureDesc] => Cool
    [comfort] => 16.11
    [highTemperature] => 18.80

-- snipped the data to make it shorter --

distance] => 11.09
    [elevation] => 179
    [utcTime] => 2019-05-09T01:59:00.000-05:00
)


--><!DOCTYPE html>
<html lang="en">

-- after this we see a normal html file with tags --
I would like to grab the values for temperature, humidity, comfort, etc., and at the end of the file there is a <result> value I would also like to grab. I know this would be <result>(.*)</result>. What I am after is a strategy for grabbing the bits and pieces of my file. I am struggling to get more than one value, like [temperature] and [humidity], to match: I can match one of them but then fail to grab the next one. I am also failing to grab the <result> correctly, because the earlier part of my regex is failing.

Any help with a plan of attack would be appreciated.

Thanks!
FreeRaider
Posts: 783
Joined: November 20th, 2012, 11:58 pm

Re: Help with regex

FreeRaider » May 9th, 2019, 11:59 am

A short answer: you can't.

I think that information is provided by a script.
jsmorley
Developer
Posts: 19372
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: Help with regex

jsmorley » May 9th, 2019, 12:46 pm

I don't see any reason why you can't parse that site.

There are two considerations that I found I had to take into account:

1) The site checks whether you are using a web "browser", to stop you from doing exactly what you are trying to do: hitting the site with an application on a regular basis. I assume the authors would prefer not to be hammered by some external, non-human process. Since you are presumably only going to hit the site every 10 minutes (the WebParser default, UpdateRate=600 with the default skin Update=1000), I think it would be OK.

You can get past that check by using the UserAgent option in your parent WebParser measure:
https://docs.rainmeter.net/manual/measures/webparser/#UserAgent
https://www.whatismybrowser.com/detect/what-is-my-user-agent

I used:

UserAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

And it worked fine. Without it, the site returns a 403 Forbidden error.

Checking the user agent string is a pretty basic "locks are to keep honest people honest" way of detecting that you are coming from a "browser", and there is no guarantee that, if they find they are getting more traffic than they want, they won't take more robust steps. I would strongly recommend that you don't hit the site more often than every 10 minutes or so. Doing so may only end up shooting yourself in the foot.
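
For reference, the UserAgent option controls the User-Agent header sent with each HTTP request, which is what the site inspects. The Python sketch below is only an illustration (it is not how WebParser works internally): it builds, but does not send, a request carrying the same browser string quoted above, so you can see what the site receives. The helper name build_request is hypothetical.

```python
import urllib.request

# The Firefox 67 user agent string quoted in this thread.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) "
    "Gecko/20100101 Firefox/67.0"
)


def build_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: build (not send) a GET request with a browser User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("https://isitraining.in/chicago")
# urllib stores header names via str.capitalize(), so the key is "User-agent".
print(req.get_header("User-agent"))
# Actually sending it would be urllib.request.urlopen(req); the same
# "no more often than every 10 minutes or so" courtesy applies.
```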

2) If you are using text like [description] to zero in on the information you want (and I would), remember that [ and ] are both reserved characters in regular expressions and must be escaped as \[ and \] to be matched literally in your search.
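
Python's re module is not the PCRE engine WebParser uses, but it treats square brackets the same way, so it is a convenient place to see the difference. In this sketch value_for is a hypothetical helper, and re.escape adds the backslashes for you.

```python
import re

LINE = "    [temperature] => 16.11"


def value_for(field: str, text: str):
    """Hypothetical helper: return the value after '[field] => ', or None."""
    # re.escape turns "[temperature]" into r"\[temperature\]".
    pattern = re.escape(f"[{field}]") + r" => (.*)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Unescaped, [temperature] is a character class matching ONE letter from
# {t, e, m, p, r, a, u}. No such letter sits directly before " => " in LINE
# (a "]" does), so this search finds nothing.
print(re.search(r"[temperature] => (.*)", LINE))  # None
print(value_for("temperature", LINE))             # 16.11
```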

[Rainmeter]
Update=1000
DynamicWindowSize=1
AccurateText=1

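[MeasureSite]
Measure=WebParser
URL=https://isitraining.in/chicago
; (?siU): s = . also matches newlines, i = case-insensitive, U = ungreedy.
; \[ and \] match literal brackets; each (.*) captures one value.
RegExp=(?siU)\[description\] => (.*)\n.*\[temperature\] => (.*)\n
; Browser-style user agent; without it the site returns 403 Forbidden.
UserAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

[MeasureDescription]
; Child measure: StringIndex=1 returns the first (.*) capture.
Measure=WebParser
URL=[MeasureSite]
StringIndex=1

[MeasureTemperature]
; StringIndex=2 returns the second (.*) capture.
Measure=WebParser
URL=[MeasureSite]
StringIndex=2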

[MeterDescription]
Meter=String
MeasureName=MeasureDescription
FontSize=11
FontWeight=400
FontColor=255,255,255,255
SolidColor=47,47,47,255
Padding=5,5,5,5
AntiAlias=1

[MeterTemperature]
Meter=String
MeasureName=MeasureTemperature
Y=5R
FontSize=11
FontWeight=400
FontColor=255,255,255,255
SolidColor=47,47,47,255
Padding=5,5,5,5
AntiAlias=1
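
To test that RegExp outside Rainmeter, here is a rough Python translation run against the excerpt from the first post. One caveat: Python's re module has no equivalent of PCRE's U (ungreedy) flag, so this sketch keeps s and i as (?si) and writes each .* as the lazy .*? instead; under U, a plain .* already behaves that way. The greedy version is included to show why U matters.

```python
import re

# Excerpt of the page's leading comment, copied from the first post.
SAMPLE = """<!--

stdClass Object
(
    [daylight] => N
    [description] => Heavy rain. Fog. Cool.
    [skyInfo] => 21
    [skyDescription] => Fog
    [temperature] => 16.11
    [temperatureDesc] => Cool
    [comfort] => 16.11
    [highTemperature] => 18.80
"""

# Translation of (?siU)...(.*)\n.*...(.*)\n: every .* becomes lazy .*?.
PATTERN = re.compile(
    r"(?si)\[description\] => (.*?)\n.*?\[temperature\] => (.*?)\n"
)

match = PATTERN.search(SAMPLE)
# Group numbers play the role of WebParser's StringIndex=1 and StringIndex=2.
print(match.group(1))  # Heavy rain. Fog. Cool.
print(match.group(2))  # 16.11

# Without U (or lazy quantifiers), the greedy (.*) overshoots onto later lines.
GREEDY = re.compile(r"(?si)\[description\] => (.*)\n.*\[temperature\] => (.*)\n")
print("[skyInfo]" in GREEDY.search(SAMPLE).group(1))  # True
```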

So the pattern is relatively simple. Just use .*\[Whatever\] => (.*)\n repeatedly to find each element you want, detecting the end of its value with the \n linefeed, then move on to the next one. It should be fairly easy to get any or all of those [ElementName] values from the top of the page.
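
The same recipe can be generated for any number of fields. In this Python sketch, chained_pattern is a hypothetical helper and lazy .*? again stands in for (?siU). Because a chained pattern only moves forward through the text, the names must be listed in the order they appear on the page; fields that are not in the excerpt, such as humidity, are left out.

```python
import re

# Excerpt of the page's leading comment, copied from the first post.
SAMPLE = """    [description] => Heavy rain. Fog. Cool.
    [skyInfo] => 21
    [skyDescription] => Fog
    [temperature] => 16.11
    [temperatureDesc] => Cool
    [comfort] => 16.11
    [highTemperature] => 18.80
"""


def chained_pattern(fields):
    r"""Hypothetical helper: chain '.*?\[name\] => (.*?)\n' once per field, in order."""
    body = "".join(r".*?" + re.escape(f"[{name}]") + r" => (.*?)\n" for name in fields)
    return re.compile("(?si)" + body)


# Listed in page order; capture N corresponds to StringIndex=N in WebParser.
FIELDS = ["description", "temperature", "comfort", "highTemperature"]
values = dict(zip(FIELDS, chained_pattern(FIELDS).search(SAMPLE).groups()))
print(values)

# Out of order, the pattern cannot go back to find [description], so no match.
print(chained_pattern(["temperature", "description"]).search(SAMPLE))  # None
```

If you would rather not depend on the order, a separate parent measure (or a separate search) per field is the alternative, at the cost of more requests or more code.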
SilverAzide
Posts: 571
Joined: March 23rd, 2015, 5:26 pm

Re: Help with regex

SilverAzide » May 9th, 2019, 12:55 pm

Hey, this is a pretty nice little weather site. The coverage is a little limited and some of the data outside major cities is kind of stale, but it is super easy to parse. And it's free!
jsmorley
Developer
Posts: 19372
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: Help with regex

jsmorley » May 9th, 2019, 1:09 pm

That "result" bit I would get like this:

RegExp=(?siU)\[description\] => (.*)\n.*\[temperature\] => (.*)\n.*<h1 class='result'>(.*)</h1>
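
Here is a Python sketch of that extended pattern, again with lazy .*? in place of (?siU). The actual text inside the page's h1 element is not shown in this thread, so the sample below uses a HYPOTHETICAL placeholder for it. In WebParser, the third capture would be read by a child measure with StringIndex=3.

```python
import re

# Trimmed excerpt of the page source from the first post, followed by a
# HYPOTHETICAL stand-in for the <h1 class='result'> element.
PAGE = (
    "<!--\n"
    "    [description] => Heavy rain. Fog. Cool.\n"
    "    [temperature] => 16.11\n"
    "-->\n"
    "<!DOCTYPE html>\n"
    "<html lang=\"en\">\n"
    "<h1 class='result'>PLACEHOLDER</h1>\n"
    "</html>\n"
)

# Translation of the extended (?siU) pattern: each .* becomes .*?.
PATTERN = re.compile(
    r"(?si)\[description\] => (.*?)\n"
    r".*?\[temperature\] => (.*?)\n"
    r".*?<h1 class='result'>(.*?)</h1>"
)

description, temperature, result = PATTERN.search(PAGE).groups()
print(result)  # PLACEHOLDER
```
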
FreeRaider
Posts: 783
Joined: November 20th, 2012, 11:58 pm

Re: Help with regex

FreeRaider » May 9th, 2019, 2:05 pm

jsmorley wrote:
May 9th, 2019, 12:46 pm
I don't see any reason why you can't parse that site.
My answer was based on this:
[Screenshot: capture0001.PNG]
jsmorley
Developer
Posts: 19372
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: Help with regex

jsmorley » May 9th, 2019, 2:07 pm

FreeRaider wrote:
May 9th, 2019, 2:05 pm
My answer was based on this:
[Screenshot: capture0001.PNG]
That is because RainRegExp is not a web "browser" and does not send a user agent string that the site likes. It uses the same "Rainmeter WebParser Plugin" user agent string that WebParser does by default. This is intentional, so what you get in RainRegExp is going to be the same thing you get in WebParser, for good or ill.
FreeRaider
Posts: 783
Joined: November 20th, 2012, 11:58 pm

Re: Help with regex

FreeRaider » May 9th, 2019, 2:18 pm

jsmorley wrote:
May 9th, 2019, 2:07 pm
That is due to the fact that RainRegExp is not a web "browser", and does not have a user agent string that the site likes. It uses the same "Rainmeter WebParser" user agent string that WebParser does by default. This is intentional, so what you get in RainRegExp is going to be the same thing you get in WebParser, for good or ill.
But in this case if I use the UserAgent option I get information that I could never get with RainRegExp.

This limits the use of the tool, right?
jsmorley
Developer
Posts: 19372
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: Help with regex

jsmorley » May 9th, 2019, 2:25 pm

FreeRaider wrote:
May 9th, 2019, 2:18 pm
But in this case if I use the UserAgent option I get information that I could never get with RainRegExp.

This limits the use of the tool, right?
If you are going to use UserAgent in WebParser, a pretty rare need, then I guess I would set UserAgent to the same string as my browser, go to the site in my browser and "view source" (or use Debug=2 in WebParser), then copy that source code and paste it into RainRegExp to work on.

You can't have it both ways. The logical and useful thing is for RainRegExp to get the same result that WebParser would get by default. I'm not really interested in adding a user agent setting to RainRegExp, as you would just forget you had set it, and then the results would not be valid in other, normal cases.

Properly supporting this in RainRegExp would take more design work than I think it is worth. You can simply paste any text you want into the big "HTML to parse" box in RainRegExp, or just open the WebParserDump.txt file you got with Debug=2. Nothing says you have to connect to the site in RainRegExp.

If this was a more common occurrence, I'd be tempted to deal with it, but it really isn't. I have only run into a couple of sites over the years that restrict access using the user agent string alone, as most sites that want to restrict access to a "browser" take more robust steps that are much more difficult to get around. Simply streaming the results via client-side javascript is enough to defeat WebParser and RainRegExp.
ms310
Posts: 99
Joined: April 1st, 2015, 7:16 am

Re: Help with regex

ms310 » May 9th, 2019, 11:24 pm

JSMORLEY - thank you so very much. I was wrapping my head around the [ and ] characters. I also was failing to match the line feeds with \n.

This is great - cheers!