How to use the WebParser plugin

jsmorley
Developer
Posts: 22628
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Post by jsmorley »

This little tutorial is meant to shed some light on using the WebParser.dll plugin in Rainmeter to retrieve information from a website for use in your skins. A lot of us have used WebParser to get RSS feeds or other data from websites, but often by using or tweaking other folks' code without really understanding how it works, or how to create one of those ridiculous "RegExp=" statements which are the core of how you parse a web page to extract information.

This is not meant to be exhaustive, nor do I go into the formats of XML files or how they are used in RSS feeds. Suffice it to say though, that once you know how to parse a web page in general, looking for the XML/RSS tags will be a lot easier.

First, let's get a .zip file with everything you will need for this tutorial: the .ini file, some text files we will use, and some graphic images you will want to have. Unzip this into your ..\Skins folder and it will create a folder called "IPLookup" which will have everything you need.

ftp://uzaeagle.serveftp.com/rainmeter/webparser_tutorial/webparser_tutorial.zip


OVERVIEW:

First let me just paste in the information from the Rainmeter manual/help for WebParser so we have that as a reference as we go along:

*****************************

WebParser reads information from webpages. The plugin uses regular expressions to parse the web page which allows it to extract information pretty much from any page. The plugin can be used e.g. to get the current TV shows, weather conditions, stock exchange values, news and basically anything that is on the net. The negative side is that the regular expressions might look rather complex especially if you're not familiar with programming languages. (and even if you are.)

Url
Url to the file to be downloaded and parsed. The Url can also be another WebParser-measure, in which case the already downloaded and parsed information can be reused (e.g. when displaying different StringIndex on the same page). To do this just give the name of the measure in the Url, like this: Url=[MeasureSlashDot]

RegExp
The regular expression used in parsing. The plugin uses Perl Compatible Regular Expressions, so check the Perl docs for syntax and more info.

FinishAction
Action that is executed when the page has been downloaded and the parsing is done.

StringIndex
Defines which string from the regexp this measure returns. You can get the correct index values by setting the Debug=1, which will add all matched strings to the log-file.

StringIndex2
The second string index is used when using a RegExp in a measure that uses data from another webparser measure (i.e. the Url points to a measure and not to a real URL). In this case the StringIndex defines the index of the result of the other RegExp and the StringIndex2 defines the index of this measure's RegExp (i.e. it defines the string that the measure returns). If the RegExp is not defined in this measure the StringIndex2 has no effect.

UpdateRate
The rate how often the webpage is downloaded. This is relative to the config's main update rate. It is advisable to limit the rate so that you're not flooding the server with constant requests. The web server admins will not like it and you probably get banned from the server altogether if you try to poll the server too often. So, if the main update rate is 1000 (i.e. one second, which is the default) set this e.g. to 60 to read the webpage once per minute.

Debug
Set this to 1 and the log file will contain some useful debug information. Value 2 dumps the downloaded webpage to C:\WebParserDump.txt. This can be useful since some web servers send different information depending which client requests it. Remember to remove this from your config once you have it working correctly.

Download
If set to 1, the Url is downloaded to a temporary folder and the name to the file is returned as string value. The measure can be bound to a IMAGE meter to download images from web and show them. If the RegExp is defined the parsed string is interpreted as a link to the downloaded image.

ErrorString
String that is returned in case there is a parse error.

Proxy
Name of the proxy server. The plugin doesn't support any authentication, so it's possible to use only servers that do not require it, or you need some different way to authenticate yourself to the proxy server.

CodePage
Defines the codepage of the downloaded web page. For example CodePage=28605 interprets the page as Latin 9 (ISO-8859-15). If the CodePage is set to 0 no conversion is done. CodePage=65001 means UTF-8. You can check other Windows code pages from here.

*****************************
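To make one of the less obvious options above concrete, here is a minimal sketch of the "Download" option bound to an IMAGE meter, as the manual text describes. The section names and the image URL are placeholders I made up for illustration, not part of the tutorial files.

```ini
; Hypothetical measure: downloads the file at Url to a temporary folder
; and returns the path of the downloaded file as its string value.
[MeasureImage]
Measure=Plugin
Plugin=Plugins\WebParser.dll
; Placeholder address, substitute a real image URL.
Url=http://www.example.com/picture.jpg
Download=1
; With the default Update=1000 this fetches the image once per hour.
UpdateRate=3600

; The IMAGE meter is bound to the measure and shows the downloaded file.
[MeterImage]
Meter=IMAGE
MeasureName=MeasureImage
X=0
Y=0
```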

Now let's spend a minute talking about the key part of the code, the "RegExp=" parameter. WebParser uses this "Regular Expression" to search for the information you want to retrieve, and returns the results as an indexed list (an "array"). Each entry has an "Index" number (its position in the list, starting with 1 and going up to however many things you capture) and the actual data retrieved for that capture by the RegExp.

So how do we build one of these mysterious "RegExp=" statements in Rainmeter? Let me use a template showing the format in simple terms:

RegExp="(?siU)Search1(Return1)Search2.*"

So enclosed in quotes, you have

1. (?siU) - which is an "options" command for RegExp, telling it how you want it to behave
2. Search1 - Which will be some text you will search on to get you right up to the data you want.
3. (Return1) - The data you want returned in the array to use in your skin
4. Search2 - Use this to tell RegExp what to look for to know it's time to "stop" collecting information in (Return1)
5. .* - This will contain data from the website which is between this set of search/return parameters and the next one. It will not be returned as you did not enclose the .* in parentheses, and will not use a StringIndex number.

What is this ?siU stuff?

The "s" (written "?s" at the start of the options) tells RegExp to let "." match "line breaks" as well as ordinary characters. That way a ".*" between two search strings will still match when the text it has to cross is split over several lines in the page source.

The "i" tells the search to be "case insensitive". Matches will work on both upper and lower case.

The "U" tells RegExp to be "ungreedy", meaning that ".*" matches as little as possible: it stops at the first place where the search text that follows it is found, rather than the last.
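A small sketch of what the "U" option changes in practice. The sample page text is invented for illustration, and the two RegExp lines are alternatives to compare, not lines meant to sit in the same measure.

```ini
; Hypothetical page text:
;   <b>one</b> and <b>two</b>

; With U (ungreedy), (.*) stops at the first </b> it can reach:
RegExp="(?siU)<b>(.*)</b>"
; Index 1 -> one

; Without U (greedy), (.*) runs on to the last </b> on the page:
RegExp="(?si)<b>(.*)</b>"
; Index 1 -> one</b> and <b>two
```

This is why nearly every WebParser expression you will see starts with (?siU): without the U, a single (.*) can swallow most of the page.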

I obviously don't know what "stuff to return" is, what do I put there?

You have two options. You can put in more Regular Expression codes to grab just the data you want and exclude stuff you don't want, or you can tell RegExp to just get everything until you tell it to stop with Search2 by using "(.*)" without the quotes. The .* just means "everything".

An important point is that "(.*)" "returns" data in a StringIndex and ".*" "skips" data you don't need / want to return.
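As a quick illustration of returning versus skipping, here is a sketch against a made-up line of page text (the tags and values are assumptions, not from the geobytes page used later).

```ini
; Hypothetical page text:
;   <name>Alice</name><id>17</id><city>Paris</city>

; (.*) in parentheses is returned and gets an index number.
; The bare .* in the middle skips over the <id> part and gets no index.
RegExp="(?siU)<name>(.*)</name>.*<city>(.*)</city>"
; Index 1 -> Alice
; Index 2 -> Paris
```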

A quick example:

You have a web page you want to get the "TITLE" tag from. Here is the RegExp you would use:

RegExp="(?siU)<TITLE>(.*)</TITLE>"

So we have told RegExp to search for the text "<TITLE>" then return everything after it in "StringIndex 1" of the array until it sees "</TITLE>", where it will stop.
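Put into a complete measure and meter, the TITLE example might look roughly like this. The section names and the example.com address are placeholders of mine; StringIndex is set on the measure, as the manual excerpt describes.

```ini
; Hypothetical measure that fetches a page and captures its title.
[MeasureTitle]
Measure=Plugin
Plugin=Plugins\WebParser.dll
; Placeholder address, substitute the page you want to read.
Url=http://www.example.com
RegExp="(?siU)<TITLE>(.*)</TITLE>"
; Return the first (and only) captured string.
StringIndex=1
UpdateRate=3600

; Simple STRING meter that displays whatever the measure returns.
[MeterTitle]
Meter=STRING
MeasureName=MeasureTitle
X=2
Y=2
FontColor=0,0,0,255
FontSize=12
```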


OUR EXAMPLE SKIN:

So what we are going to do today is parse a website http://www.geobytes.com/IpLocator.htm?GetLocation to get our IP and location information to use in a skin.

Here is what the web page we are going to parse looks like:



Here is the information we are going to extract from it (highlighted):



Here is the final skin we will create: (I have kept the skin simple as the purpose is to learn WebParser, not skinning in general)




GETTING STARTED:

The first thing you want to do is to go to the web page in your browser, right-click, choose "View Page Source", copy everything, paste it into a text editor and save the output/source of the web page to a text file. (Save it as a .txt file and not .html, so you can easily use the text editor to work with it in a bit.) I saved it as "Webpage.txt", which is included in the webparser_tutorial.zip file for this tutorial.

Let's start by looking at the first bit of information we want to retrieve.

We want to start by getting our IP address. On the web page, it is near the top of the area with all the information we want, with a label "IP Address to locate:"

Open up Webpage.txt (the saved output from the website) and search for that label. You will find a section of the html which looks like this:

 <td>
<div align="left">
<form method="POST" action="IpLocator.htm?GetLocation">
 <input type="hidden" name="cid" value="0">
 <input type="hidden" name="c" value="0">
 <input type="hidden" name="Template" value="iplocator.htm">
 <h3>IP Address to locate:<input type="text" name="ipaddress" size="15" value="72.205.26.142">&nbsp;<input type="submit" value="Submit">
 </h3>

Although YOUR IP address will be different from mine, the bit we want from this example is the "72.205.26.142" IP address (without the quotes).

So let's start building our "MEASURE" and a sample "METER" showing the output

[MeasureIPAddress]
Measure=Plugin
Plugin=Plugins\WebParser.dll
UpdateRate=1800
Url="http://www.geobytes.com/IpLocator.htm?GetLocation"
RegExp="(?siU)<h3>IP Address to locate:<input type="text" name="ipaddress" size="15" value="(.*)">.*"
StringIndex=1

What we are telling WebParser is:

UpdateRate=1800 - We want to check the website at a rate 1800 times the value in the "Update=" parameter in the "Rainmeter" section. As this defaults to "1000" or once every 1000 milliseconds (1 second) we will be running WebParser every 1800 seconds or 30 minutes. This is plenty often, you could even check every hour or more as your IP information doesn't change much and you don't want to "spam" the website with requests. You may well find yourself blocked...
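The arithmetic above, written out as a fragment. The [Rainmeter] section with Update=1000 is the default the paragraph refers to; the RegExp line is left out here to keep the focus on timing.

```ini
[Rainmeter]
; Skin update interval in milliseconds (1000 ms = 1 second).
Update=1000

[MeasureIPAddress]
Measure=Plugin
Plugin=Plugins\WebParser.dll
; 1800 updates x 1000 ms = 1,800,000 ms = 30 minutes between downloads.
; UpdateRate=3600 would give 3600 x 1000 ms = 60 minutes instead.
UpdateRate=1800
Url="http://www.geobytes.com/IpLocator.htm?GetLocation"
```

Note that changing Update= in the [Rainmeter] section changes the real-world download interval too, since UpdateRate is a multiplier and not a time.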

Url="http://www.geobytes.com/IpLocator.htm?GetLocation" - The URL to the website. It can be set as a variable in the "VARIABLES" section to make it easier to find and change if you want.
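A sketch of the variable approach just mentioned. The variable name URL matches the "Url=#URL#" line used in the Debug example later in this post; nothing else is changed.

```ini
[Variables]
; One place to find and change the address.
URL=http://www.geobytes.com/IpLocator.htm?GetLocation

[MeasureIPAddress]
Measure=Plugin
Plugin=Plugins\WebParser.dll
UpdateRate=1800
; #URL# is replaced with the value from the [Variables] section.
Url=#URL#
```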

RegExp="(?siU)<h3>IP Address to locate:<input type="text" name="ipaddress" size="15" value="(.*)">.*" - Ah, the meat and potatoes.

You are telling RegExp to:

Use the (?siU) options (described earlier), search for <h3>IP Address to locate:<input type="text" name="ipaddress" size="15" value=" and return everything until it sees "> where it will stop. The trailing .* is the "skip" placeholder for whatever lies between this search and the next one we add.

So if we look again at our output in Webpage.txt

 <td>
<div align="left">
<form method="POST" action="IpLocator.htm?GetLocation">
 <input type="hidden" name="cid" value="0">
 <input type="hidden" name="c" value="0">
 <input type="hidden" name="Template" value="iplocator.htm">
 <h3>IP Address to locate:<input type="text" name="ipaddress" size="15" value="72.205.26.142">&nbsp;<input type="submit" value="Submit">
 </h3>

You can see that we will return 72.205.26.142 in index 1.

Now let's display the output in a "METER"

[MeterIPAddress]
MeasureName=MeasureIPAddress
Meter=STRING
X=2
Y=2
FontColor=0,0,0,255
FontSize=12
StringAlign=LEFT
FontFace=Tahoma
Antialias=1
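Prefix="IP Address: "

So this will display "IP Address: 72.205.26.142" on your skin as below. Note that StringIndex is an option of the WebParser measure (see the manual excerpt above), so it belongs in [MeasureIPAddress] as StringIndex=1 and not in the meter: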



Now let's get the next bit of information we want from the website. Remember, the RegExp reads the page in order from top to bottom, so the searches in the "RegExp=" statement must appear in the same order as they do on the page. You can display the information in any order you want on your skin, however.

The next bit in the Webpage.txt file is the "Country Code".

<tr>
<td align="right">Country Code</td>
<td align="right"><input name="ro-no_bots_pls12" value="US" size="20" readonly></td>

So we want to add to our "RegExp=" statement, search for the Country Code and return the result in the next index of the array.

RegExp="(?siU)<h3>IP Address to locate:<input type="text" name="ipaddress" size="15" value="(.*)">.*ro-no_bots_pls12" value="(.*)" size="20".*"

So after the first pair of start/stop searches, we are adding:

.*ro-no_bots_pls12" value="(.*)" size="20".*

This will tell RegExp to skip everything until it finds ro-no_bots_pls12" value=" and then return everything until it sees " size="20" and put it in StringIndex 2. The result in my example will be "US".

We will need another "MEASURE" to retrieve the data to display in our next "METER"

[MeasureCountryCode]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[MeasureIPAddress]
StringIndex=2

NOTE: The "Url" parameter is now calling the original "MeasureIPAddress" measure which has the "RegExp=" statement in it instead of the URL for the website. You do this so you are not hitting the website with a new RegExp statement every time you want to get a bit of information. You get everything in the first measure with the "RegExp=" statement in it, all the results are indexed by this first measure, then you can use the index numbers and data in later Measure/Meter combinations without going back to the website. Not only is this orders of magnitude faster, but you are being a good internet citizen. (Remember the "you will be blocked" admonition earlier?)
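The same parent/child idea also covers the "StringIndex2" option from the manual excerpt. The following is only a sketch: [MeasureFirstOctet] is a hypothetical measure that is not in the tutorial's IPLookup.ini, and the first-octet idea is mine, chosen because the parent's Index 1 text is known.

```ini
; Hypothetical child measure: takes Index 1 of [MeasureIPAddress]
; (the IP address) and runs its own RegExp on that text.
[MeasureFirstOctet]
Measure=Plugin
Plugin=Plugins\WebParser.dll
; Points at the parent measure, so the website is not downloaded again.
Url=[MeasureIPAddress]
; StringIndex selects which result of the parent's RegExp to work on.
StringIndex=1
; This measure's own RegExp, applied to the parent's Index 1 text.
RegExp="(?siU)^(\d+)\."
; StringIndex2 selects which capture of this measure's RegExp to return.
StringIndex2=1
```

With the sample address 72.205.26.142 from above, this would be expected to return 72.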

And another "METER" to display the information

[MeterCountryCode]
MeasureName=MeasureCountryCode
Meter=STRING
X=2
Y=17r
FontColor=0,0,0,255
FontSize=12
StringAlign=LEFT
FontFace=Tahoma
Antialias=1
Prefix="Country Code: "

Ok, so you just carry on like that until you have retrieved all the data you want from the website. I won't lay it all out here, but you can go through the sample IPLookup.ini file included in the .zip

Using "DEBUG" to make things easier:

Using the "Debug=1" statement in WebParser will cause your skin to output the index numbers and associated information into "Rainmeter.log" in your main ..\Rainmeter folder. Turn this on by putting "Debug=1" on your Measure:

[MeasureWebsite]
Measure=Plugin
Plugin=Plugins\WebParser.dll
UpdateRate=1800
Url=#URL#
RegExp="(?siU)Search1(Return1)Search2(Return2)"
Debug=1

Then restart Rainmeter, right click the skin, and say "Show Log File". It will have output like this:

DEBUG: (00:53:29.391) Refreshing (Name: "IPLookup" Ini: "IP.ini")
DEBUG: (00:53:29.469) WebParser: Fetching URL: http://www.geobytes.com/IpLocator.htm?GetLocation
DEBUG: (00:53:29.859) WebParser: Finished URL: http://www.geobytes.com/IpLocator.htm?GetLocation
DEBUG: (00:53:29.859) WebParser: (Index  1) 72.205.26.142

Which tells you that the data "72.205.26.142" is in Index 1 and will carry on showing every index / data combination retrieved by your RegExp statement. Very useful for figuring out why your "RegExp=" statement isn't working right.

To turn Debug off, remove the "Debug=1" statement from the Measure, exit Rainmeter, delete the "Rainmeter.log" file and restart Rainmeter.
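If the log shows no matches at all, the other Debug value from the manual excerpt can help, because it lets you see exactly what the server sent to WebParser. A sketch, reusing the placeholder measure from above:

```ini
[MeasureWebsite]
Measure=Plugin
Plugin=Plugins\WebParser.dll
UpdateRate=1800
Url=#URL#
RegExp="(?siU)Search1(Return1)Search2(Return2)"
; Per the manual text quoted earlier, the value 2 dumps the
; downloaded page to C:\WebParserDump.txt.
Debug=2
```

You can then build your RegExp against that dump instead of the browser's "View Page Source", which may differ if the server sends different content to different clients. As with Debug=1, remove the line once things work.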


That's it for now. Maybe we can do this again soon and talk a bit more about XML and RSS feeds and using the "Substitute=" command to filter out HTML codes from your results, but this should give you a good start! I did this pretty quickly off the top of my head, so please let me know if you see errors or omissions and I will correct.
Last edited by jsmorley on June 18th, 2009, 7:14 pm, edited 1 time in total.
dick.fickling
Posts: 5
Joined: May 31st, 2009, 10:04 am

Re: How to use the WebParser plugin

Post by dick.fickling »

Just a quick question - is it possible to use user-inputted variables in the RegExp call?
For example:

[Variables]
Phrase1=project
Phrase2=moved

[MainURL]
Measure=Plugin
Plugin=Plugins\Webparser.dll
URL=http://www.rainmeter.net
RegExp="(?siU)Phrase1(.*)Phrase2"
UpdateRate=600

And have that return " has been "? If so, what is the correct way to do it?

Post by jsmorley »

Not sure. Given the limitations Rainmeter has with variables, I doubt it, but I will give it a test today and let you know.

If it does work, it would be:

[MainURL]
Measure=Plugin
Plugin=Plugins\Webparser.dll
URL=http://www.rainmeter.net
RegExp="(?siU)"#Phrase1#"(.*)"#Phrase2#
UpdateRate=600

But I am not hopeful...

Rainmeter allows you to substitute the whole parameter:

RegExp=#MyRegExp#

But I doubt it is capable of concatenating the substitutions with the static part of the parameter (in quotes). It should be, that is just standard programming, but we have to live with the limitations while the devs work on the code. Better days are coming! ;-)
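For what it is worth, here is a sketch of the form that would apply if the Rainmeter build in use expands #Variable# references inside a quoted option value. That is an assumption and should be tested, since the post above was not sure the 2009 build did so. The section and variable names are the ones from the question; StringIndex=1 is added so the measure returns the captured text.

```ini
[Variables]
Phrase1=project
Phrase2=moved

[MainURL]
Measure=Plugin
Plugin=Plugins\WebParser.dll
URL=http://www.rainmeter.net
; Assumes #Phrase1# and #Phrase2# are replaced with their values
; before the regular expression is handed to the plugin.
RegExp="(?siU)#Phrase1#(.*)#Phrase2#"
StringIndex=1
UpdateRate=600
```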
Varelse
Posts: 61
Joined: April 22nd, 2009, 7:46 pm

Re: How to use the WebParser plugin

Post by Varelse »

I'm trying to make a feed to display Last.fm weekly top artists, and for the life of me, I can't figure this out.

I'm just trying to parse one line of the xml to begin with:

<weeklyartistchart user="Varelse_" from="1243771200" to="1244376000">

The regexp code I've come up with is: Regexp= "(?siU)<weeklyartistchart user="(.+)""

It displays

<weeklyartistchart user="Varelse_"

instead of just Varelse_.


The code i'm using for the config is:

[Artists]
Measure=Plugin
Plugin=Plugins\WebParser.dll
UpdateRate=10
Debug=1
Url="http://ws.audioscrobbler.com/2.0/user/Varelse_/weeklyartistchart.xml"
RegExp="(?siU)<weeklyartistchart user="(.+)"" 

[BG]
Meter=Image
x=0
y=0
h=20
w=1000
Solidcolor=0,0,0,220

[UserName]
MeasureName=Artists
Meter=STRING
X=2
Y=0
FontColor=255,255,255,255
FontSize=12
StringAlign=LEFT
FontFace=Tahoma
Antialias=1
StringIndex=1

What am I doing wrong? And where is the debug log file? I've looked everywhere and I can't find it. Thanks.

Post by jsmorley »

Try an * not a + in the (.*) part of the code. That should work.

The Rainmeter.log file will be in your Rainmeter home directory. You need to put Debug=1 in the code, then right click the skin and say "show log". This will ask "do you want to create the log?", say yes. Then refresh the skin and things will start being put in the log (you will need to reload the log after each refresh to get the latest data in it). Also, it is best to have only the skin you are testing running, so the log isn't confusing with data from other skins mixed together.

To stop logging, you need to take out the Debug=1 line, stop Rainmeter, delete the log, and restart Rainmeter.

Post by Varelse »

I finally got it to work. I had to make another measure for each StringIndex. Apparently it didn't like me using it in a Meter.

Thank you for writing this tutorial and for your help. I thought writing a regexp would be difficult, but it's quite simple.
ZoranVitez
Posts: 1
Joined: September 2nd, 2009, 11:15 am

Re: How to use the WebParser plugin

Post by ZoranVitez »

Thanks for the tut
Silva
Posts: 1
Joined: October 1st, 2009, 8:24 pm

Re: How to use the WebParser plugin

Post by Silva »

What does the .+ do?

Post by Varelse »

It does pretty much the same thing as the .*, which grabs all the info between the tags in the Regexp.

Example:

Regexp="(?siU)<artist>(.*)</artist>"

Would parse whatever is between the artist tags. But, it's better to use (.*) instead of (.+).
Alex2539
Rainmeter Sage
Posts: 642
Joined: July 19th, 2009, 5:59 am
Location: Montreal, QC, Canada

Re: How to use the WebParser plugin

Post by Alex2539 »

With regular expressions, the "." means "Any character", the "*" means "find the preceding pattern zero or more times" and the "+" means "find the preceding pattern one or more times". So basically, ".*" means "find any character zero or more times" and ".+" means "find any character one or more times".

For example, if you have the word "barstool" and searched for bar.*stool, you would get a match. It would find "bar" and "stool", and even though there are no characters in between, the "*" accepts that possibility. If you searched for bar.+stool, you would not get a match because the "+" necessitates at least one character.
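The barstool example, written the way it would appear in a WebParser measure. The two RegExp lines are alternatives to compare, and the "bar stool" text with a space is an extra case I added to show where both forms agree.

```ini
; Text being searched: barstool

; "*" allows zero characters between the two search strings:
RegExp="(?siU)bar(.*)stool"
; Matches. Index 1 is empty.

; "+" needs at least one character between them:
RegExp="(?siU)bar(.+)stool"
; Does not match "barstool".

; Against the text "bar stool" (with a space), both forms match
; and Index 1 holds the single space.
```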

In most cases, the difference between them is negligible since a lot of the time with Rainmeter it's used when you're trying to get something that you know for certain exists, but each one does have its uses.