SOLVED WebParser RegExp Problem

Arne Anka
Posts: 100
Joined: April 18th, 2009, 11:31 am
Location: Sweden

SOLVED WebParser RegExp Problem

Post by Arne Anka »

I'm trying to write a new weather skin, but I have a problem with the RegExp that I can't figure out. :? All I get is a "WebParser.dll: [MeasureForecast] Matching error! (-8)" error.

Code: Select all

[Rainmeter]
Update=1000

[Variables]
UpdateInt=900

;===============================================================
[MeasureForecast]
Measure=Plugin
Plugin=WebParser.dll
Url=http://foreca.com/Denmark/Smorumnedre?lang=da&units=metric&tf=24h&tenday
RegExp="(?siU)wrap-area-bot.*details=(.*)".*="(.*)".*<span class="h5">(.*)</span>.*symbol_50x50d symbol_(.*)_50x50.*<strong>(.*)</strong>.*<strong>(.*)</strong>.*<img src="/img/symb-wind/(.*)".*<strong>(.*)</strong>\s.*(.*)\n.*\s.*</span>"
Debug=1
;===============================================================
[MeasureDateD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
StringIndex=1

[MeasureWeatherD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
StringIndex=2

[MeasureHeaderD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
StringIndex=3

[MeasureIconD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
StringIndex=4

[MeasureTempHiD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
DecodeCharacterReference=1
StringIndex=5

[MeasureTempLoD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
DecodeCharacterReference=1
StringIndex=6

[MeasureWindDirectionD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
StringIndex=7
Substitute="gif":"png"

[MeasureWindSpeedD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
StringIndex=8

[MeasureWindSpeedTypeD1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureForecast]
UpdateRate=#UpdateInt#
StringIndex=9
;===============================================================
[MeterIconD1]
Meter=IMAGE
MeasureName=MeasureIconD1
X=25
Y=25
H=90
ImageName=%1
PreserveAspectRatio=1
AntiAlias=1
I've isolated the problem to StringIndex 3, the .*<span class="h5">(.*)</span>.* part of the RegExp. The "funny" thing is that it works in Mr. Morley's RainRegExp 2.0 tester.
Last edited by Arne Anka on February 9th, 2012, 8:34 pm, edited 1 time in total.
Life is just a period you have to survive.
As the philosophers say: you are born, you live, and you die alone...
jsmorley
Developer
Posts: 22631
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: WebParser RegExp Problem

Post by jsmorley »

I'm really not sure. It seems to be some kind of "encoding/codepage" issue with that website, but I can't for the life of me see why it is fine with everything up to that point and then blows up when it hits that <span class="h5">I dag</span>, which seems fairly innocuous compared to the mix of Arabic and who knows what else in other places on that page.

Maybe spx or poiru has a better handle on what the difference is between a WebParser error of -8 and the usual -1 we get on a matching error.
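
For reference, here is a small sketch of how the two numbers could be read. It assumes the value in parentheses is the raw return code of PCRE's pcre_exec(), passed through by WebParser unchanged; that is an assumption, not something confirmed in this thread. In PCRE's pcre.h, -1 is PCRE_ERROR_NOMATCH and -8 is PCRE_ERROR_MATCHLIMIT, so under that assumption a -8 would mean the engine gave up after too much backtracking, which a RegExp with many (.*) groups can cause when one literal is missing from the page. The helper name describe_pcre_error is made up for illustration.

```python
# Selected PCRE 8.x return codes from pcre.h (negative values are errors).
PCRE_ERRORS = {
    -1: "PCRE_ERROR_NOMATCH",     # the subject did not match the pattern
    -8: "PCRE_ERROR_MATCHLIMIT",  # the internal backtracking limit was reached
    -10: "PCRE_ERROR_BADUTF8",    # invalid UTF-8 in the subject (UTF-8 mode only)
}


def describe_pcre_error(code: int) -> str:
    """Hypothetical helper: map a logged error number to its PCRE name."""
    return PCRE_ERRORS.get(code, f"unlisted PCRE code {code}")


print(describe_pcre_error(-8))
print(describe_pcre_error(-1))
```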
Arne Anka
Posts: 100
Joined: April 18th, 2009, 11:31 am
Location: Sweden

Re: WebParser RegExp Problem

Post by Arne Anka »

jsmorley wrote:It seems to be some kind of "encoding/codepage" issue with that website,
Well, I did have a "CodePage=65001" in there, but it got removed during my trial-and-error testing. So I don't think that's the problem here...
jsmorley
Developer
Posts: 22631
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: WebParser RegExp Problem

Post by jsmorley »

Arne Anka wrote: Well, I did have a "CodePage=65001" in there, but it got removed during my trial-and-error testing. So I don't think that's the problem here...
Yes, I tried 65001 and 1201 and 1200 and all the usual suspects. No joy.
~Faradey~
Posts: 366
Joined: November 12th, 2009, 4:47 pm
Location: Ukraine

Re: WebParser RegExp Problem

Post by ~Faradey~ »

Arne Anka,
use this link instead
http://foreca.com/Denmark/Smorumnedre?units=metric&tf=24h&tenday
Or do you want to parse something specific from it?
Brian
Developer
Posts: 2689
Joined: November 24th, 2011, 1:42 am
Location: Utah

Re: WebParser RegExp Problem

Post by Brian »

It seems the file WebParser gets is slightly different from the one you get when opening the page in your browser. Here is a sample of what WebParser gets:

WebParser Version

Code: Select all

wrap-area-bot">
              <div class="table t_cond">
                <div class="c1">
                	  <h4 class="entry-title">Aktuelle forhold</h4>
                  <div class="left">
				  							<div class="symbol_70x70n symbol_n300_70x70 cc_symb"></div>
																															<span class="cold txt-xxlarge"><strong>-3</strong> &deg;C</span><br />
																		                    <img src="/img/symb-wind/w045.gif" alt="NE" width="27" height="28" /> <strong>4 m/s</strong><br />                  </div>
                  <div class="right txt-tight">
				  	Overskyet<br />			Føles som: <strong>-8&deg;</strong><br />
                     Barometer:  <strong>1042.0 hPa </strong><br />
                     Dugpunkt:   <strong>-10&deg;</strong><br />
                     Luftfugtighed:   <strong>53%</strong><br />
					 Sigtbarhed: <strong>10 km</strong><br /><br />
					 Solopgang: <strong>07:49</strong><br />
					 Solnedgang:  <strong>17:01</strong><br />
					 </div>
                  <div class="bot txt-tight grey">
				  										  	Pr. 09/02 17:50<br />
															                   	Observeret kl. Koebenhavn / Roskilde<br />
												<ul><li><a href="/Denmark/Smorumnedre?obshist">Tidligere observationer, <strong>Koebenhavn / Roskilde</strong>&nbsp;<img src="http://img.foreca.net/i/e/arrow1.gif" alt=">" width="4" height="8" /></a></li></ul>
															  </div>
                </div>
                <div class="c2">
                  <h4 class="in">3-dages vejrudsigt</h4>
				                    <div class="c2_a">
				  <a href="/Denmark/Smorumnedre?details=20120209" title="Delvist overskyet">
				  	<strong>I dag</strong><br />
					<div class="symbol_50x50d symbol_d200_50x50"></div>
										                    <span>Hi: <strong>-2&deg;</strong><br /></span>
                    <span>Lo: <strong>-8&deg;</strong><br /></span>
				  </a>
				  </div>
				  <div class="c2_a">
				  <a href="/Denmark/Smorumnedre?details=20120210" title="Delvist overskyet">
				  	<strong>
					I morgen
					</strong><br />
					<div class="symbol_50x50d symbol_d200_50x50"></div>
										                    <span>Hi: <strong>-3&deg;</strong><br /></span>
                    <span>Lo: <strong>-11&deg;</strong><br /></span>
				  </a>
				  </div>
				  <div class="c2_a">
				  <a href="/Denmark/Smorumnedre?details=20120211" title="Overvejende klart">
				  	<strong>
					Lørdag
					</strong><br />
					<div class="symbol_50x50d symbol_d100_50x50"></div>
										                    <span>Hi: <strong>-3&deg;</strong><br /></span>
                    <span>Lo: <strong>-9&deg;</strong><br /></span>
				  </a>
				  </div>
				  <div class="in">
						<ul>
							<li><a href="/Denmark/Smorumnedre?tenday"><strong>10-dages vejrudsigt</strong>&nbsp;<img src="http://img.foreca.net/i/e/arrow1.gif" alt=">" width="4" height="8" /></a></li>
						</ul>
					</div>
                </div>
              </div>
            </div>
          </div>
The code was actually full of extra tabs and spaces, so I tried to delete most. If you compare it to the code below, you can see the difference. This is what my browser sees:

Browser Version (My browser is Firefox.)

Code: Select all

wrap-area-bot">
              <div class="table t_longfore">
                <h4>10-dages vejrudsigt</h4>
                <div class="row">
					
											
												
																									<div class="c1 daily clr1">

							<a href="/Denmark/Smorumnedre?details=20120209" title="Delvist overskyet" class="cell">
								<span class="h5">I dag</span>
								
								<div class="symbol_50x50d symbol_d200_50x50" alt="Delvist overskyet" title="Delvist overskyet"></div>
								<br class="clearb" />
								
								Hi: <strong>-2&deg;</strong><br />
								Lo: <strong>-8&deg;</strong><br />

								<span><span>
									<img src="/img/symb-wind/w090.gif" alt="E" width="27" height="28" />
									<strong>8</strong> m/s
								</span></span>
								<span class="more">Detaljer</span>
							</a>
						</div>
So if you look carefully, the WebParser version has no <span class="h5">I dag</span>; it has <strong>I dag</strong> instead. Also, there is no wind data in the forecast cells of the WebParser version.

So here is what I did to your RegExp:

Code: Select all

RegExp="(?siU)wrap-area-bot.*details=(.*)".*="(.*)".*<strong>(.*)</strong>.*symbol_50x50d symbol_(.*)_50x50".*<strong>(.*)</strong>.*<strong>(.*)</strong>"
;.*/img/.*/(.*)".*<strong>(.*)</strong>\s(.*)\n.*\s.*</span>"
All I did was change the <span class="h5"> to <strong>, and take out the wind data, and it now works.
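
A rough way to check this outside Rainmeter is sketched below in Python. It is only an approximation: Python's re module is not PCRE and has no (?U) flag, so every quantifier is written lazily (.*?) instead, and the two HTML strings are trimmed copies of the first forecast cell from the dumps above. The names first_day, HEAD, TAIL and LABELS are made up for this example.

```python
import re

# Trimmed copies of the first forecast cell from the two dumps above.
WEBPARSER_HTML = '''wrap-area-bot">
<a href="/Denmark/Smorumnedre?details=20120209" title="Delvist overskyet">
<strong>I dag</strong><br />
<div class="symbol_50x50d symbol_d200_50x50"></div>
<span>Hi: <strong>-2&deg;</strong><br /></span>
<span>Lo: <strong>-8&deg;</strong><br /></span>
</a>'''

BROWSER_HTML = '''wrap-area-bot">
<a href="/Denmark/Smorumnedre?details=20120209" title="Delvist overskyet" class="cell">
<span class="h5">I dag</span>
<div class="symbol_50x50d symbol_d200_50x50" alt="Delvist overskyet" title="Delvist overskyet"></div>
Hi: <strong>-2&deg;</strong><br />
Lo: <strong>-8&deg;</strong><br />
</a>'''

# Python's re has no (?U) flag, so each quantifier is made lazy by hand.
HEAD = r'wrap-area-bot.*?details=(.*?)".*?="(.*?)".*?'
TAIL = r'.*?symbol_50x50d symbol_(.*?)_50x50.*?<strong>(.*?)</strong>.*?<strong>(.*?)</strong>'
LABELS = {
    "span": r'<span class="h5">(.*?)</span>',  # day label as in the browser dump
    "strong": r'<strong>(.*?)</strong>',       # day label as in the WebParser dump
}


def first_day(html, label):
    """Return the six captures for the first forecast day, or None if no match."""
    match = re.search(HEAD + LABELS[label] + TAIL, html, re.S | re.I)
    return match.groups() if match else None


print(first_day(WEBPARSER_HTML, "span"))    # original label pattern, WebParser dump
print(first_day(WEBPARSER_HTML, "strong"))  # changed label pattern, WebParser dump
print(first_day(BROWSER_HTML, "span"))      # original label pattern, browser dump
```

The wind captures are left out here, as in the changed RegExp, because the forecast cells in the WebParser dump do not contain them.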

I am not sure why the WebParser one is different than the browser one. You may have to look elsewhere to get your weather data.

BTW, I had the same problem when I made my weather skin using wunderground, but there the difference from my browser version wasn't as large as it is here.

-Brian
Arne Anka
Posts: 100
Joined: April 18th, 2009, 11:31 am
Location: Sweden

Re: WebParser RegExp Problem

Post by Arne Anka »

~Faradey~ wrote:Arne Anka,
use this link instead
http://foreca.com/Denmark/Smorumnedre?units=metric&tf=24h&tenday
Or do you want to parse something specific from it?
Yes, I would like it to be in Danish (index 2 and 3) without translations...

Code: Select all

DEBUG: (00:24:58.732) WebParser.dll: [MeasureForecast] (Index  1) 20120209
DEBUG: (00:24:58.748) WebParser.dll: [MeasureForecast] (Index  2) Partly cloudy
DEBUG: (00:24:58.764) WebParser.dll: [MeasureForecast] (Index  3) Today
DEBUG: (00:24:58.764) WebParser.dll: [MeasureForecast] (Index  4) d200
DEBUG: (00:24:58.779) WebParser.dll: [MeasureForecast] (Index  5) -2&deg;
DEBUG: (00:24:58.795) WebParser.dll: [MeasureForecast] (Index  6) -8&deg;
DEBUG: (00:24:58.795) WebParser.dll: [MeasureForecast] (Index  7) w090.gif
DEBUG: (00:24:58.810) WebParser.dll: [MeasureForecast] (Index  8) 8
DEBUG: (00:24:58.826) WebParser.dll: [MeasureForecast] (Index  9) m/s
jsmorley
Developer
Posts: 22631
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: WebParser RegExp Problem

Post by jsmorley »

Ah I hadn't thought of that. I suspect strongly that the website is reacting to the "User Agent String" sent by a browser connecting to it. I think AutoIt, (and thus RainRegExp) uses an "Internet Explorer" User Agent String, while WebParser's is "Rainmeter WebBrowser plugin" or something like that (I forget specifically). So somewhere there is code that is trying to tailor what is sent to work best on specific browsers, and WebParser is no doubt falling in some "other" bucket...
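
One way to test the User Agent idea outside Rainmeter is sketched below in Python. Both User-Agent strings are illustrative assumptions, not values confirmed in this thread: the first is a guess at what WebParser sends, the second is a generic Internet Explorer style string. The function names are made up for the example, and whether the site still responds the way it did in 2012 is unknown.

```python
import urllib.request

URL = "http://foreca.com/Denmark/Smorumnedre?lang=da&units=metric&tf=24h&tenday"

# Both strings are assumptions for illustration, not confirmed values.
USER_AGENTS = {
    "webparser": "Rainmeter WebParser plugin",
    "browser": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",
}


def build_request(url, user_agent):
    """Create a request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def has_h5_label(html):
    """True if the page uses the <span class="h5"> day label the RegExp expects."""
    return '<span class="h5">' in html


def fetch(url, user_agent, timeout=10):
    """Download the page as text (needs network access)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def compare(url=URL):
    """Needs network access: report which User-Agent receives the h5 markup."""
    return {name: has_h5_label(fetch(url, ua)) for name, ua in USER_AGENTS.items()}
```

Calling compare() with a working connection returns one True/False per User-Agent; if both values come out the same, the User-Agent is probably not what makes the two versions differ.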
Arne Anka
Posts: 100
Joined: April 18th, 2009, 11:31 am
Location: Sweden

Re: WebParser RegExp Problem

Post by Arne Anka »

~Faradey~ wrote:Arne Anka,
use this link instead
http://foreca.com/Denmark/Smorumnedre?units=metric&tf=24h&tenday
Or do you want to parse something specific from it?
Your link made me reorder the parameters to "http://foreca.com/Denmark/Smorumnedre?units=metric&tf=24h&lang=da&tenday", and now it's working, without changing the RegExp! :???:
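
To make the change explicit, the short Python sketch below compares the query strings of the original and the reordered link. It only shows that both links carry the same parameters in a different order; why the server answers differently for the two orders is not explained by it. The function name query_params is made up for the example.

```python
from urllib.parse import parse_qsl, urlsplit

ORIGINAL = "http://foreca.com/Denmark/Smorumnedre?lang=da&units=metric&tf=24h&tenday"
REORDERED = "http://foreca.com/Denmark/Smorumnedre?units=metric&tf=24h&lang=da&tenday"


def query_params(url):
    """Return the query parameters as an ordered list of (name, value) pairs."""
    # keep_blank_values=True keeps the value-less "tenday" flag in the list.
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


same_set = sorted(query_params(ORIGINAL)) == sorted(query_params(REORDERED))
same_order = query_params(ORIGINAL) == query_params(REORDERED)

print(query_params(ORIGINAL))
print(query_params(REORDERED))
print("same parameters:", same_set, "| same order:", same_order)
```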
Arne Anka
Posts: 100
Joined: April 18th, 2009, 11:31 am
Location: Sweden

Re: WebParser RegExp Problem

Post by Arne Anka »

Brian wrote:It seems the file WebParser gets is slightly different from the one you get when opening the page in your browser. Here is a sample of what WebParser gets:
Funny things are happening! :???: Reordering the link made it work.
But I used the WebParser download option (Debug2File="Debug2File.txt" as in the manual) and the resulting file is an exact copy of what I get in Firefox.

Here's a WebParser dump on-screen...
(Attachment: WebParserDump.png)
Last edited by Arne Anka on February 9th, 2012, 8:51 pm, edited 2 times in total.