
Webparsing Anchors

mazty
Posts: 7
Joined: January 1st, 2012, 4:44 am

Webparsing Anchors

Post by mazty

Hi, I'm trying to make a web parsing skin, which has turned out to be a lot more complex than I anticipated because of the following:

<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;"><span style="color: green;">+16</span></td>
<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;"><span style="color: red;">-10</span></td>


The numbers inside the <span> tags (+16 and -10 here) are what I want to return. They change roughly every few hours, and the page contains 8 pairs of cells like the ones above. The problem is that when a value is null (zero), the code changes to:

<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;">0</td>
<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;"><span style="color: red;">-25</span></td>

The original webparser I was using had <span style="color.*> as its anchor, which now causes the null values to be skipped, so the wrong values are returned.
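To see why a <span>-based anchor drifts out of step, here is a minimal sketch in Python (not Rainmeter) that runs both kinds of anchor over two of the cell pairs quoted above. The sample page is assembled from the snippets in the post, not fetched from the real site, and because Python's re module has no PCRE-style (?U) flag, lazy ".*?" quantifiers stand in for ungreedy matching.

```python
import re

# Two of the eight cell pairs quoted in the post; in the second pair the
# first value is zero, which the site writes without a <span> wrapper.
PAGE = (
    '<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;">'
    '<span style="color: green;">+16</span></td>\n'
    '<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;">'
    '<span style="color: red;">-10</span></td>\n'
    '<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;">0</td>\n'
    '<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;">'
    '<span style="color: red;">-25</span></td>\n'
)

# Anchoring on the <span> tag: the bare "0" cell has no <span>, so it is skipped.
span_values = re.findall(r'<span style="color.*?>(.*?)</span>', PAGE)

# Anchoring on the <td> cells instead: every cell is captured, zero or not.
cell_values = re.findall(r'<td.*?>(.*?)</td>', PAGE)

print(span_values)  # ['+16', '-10', '-25']
for cell in cell_values:
    print(cell)
```

The <span> anchor returns three values for four cells, so -25 lands in the slot that belongs to the 0. The <td> anchor keeps all four cells in order; the leftover <span> markup then has to be stripped separately.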

I've tried using a larger anchor and substituting away the text I don't need so that only the numbers remain, but the substitute command doesn't seem to work on segments of text.

Is there a way to use </td> as the closing anchor and then remove all the surrounding text, leaving only the number? (For example, turning <span style="color: red;">-25</span> into -25.)

Am I missing something obvious? Any help is greatly appreciated.
Brian
Developer
Posts: 2690
Joined: November 24th, 2011, 1:42 am
Location: Utah

Re: Webparsing Anchors

Post by Brian

What I would do is use RegExpSubstitute=1 in your child webparsers. That allows you to use regular expressions in your substitutes, so you can simply strip out any <span> tags gathered from the website. Note the use of single quotes rather than double quotes around the regular-expression substitute.

Your webparsers would look something like this:


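[MeasureParent]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=http://something.com/page.html
; (?siU): s = "." also matches newlines, i = ignore case, U = ungreedy
; Capture 1 is the contents of the first <td> cell, capture 2 the second
RegExp="(?siU)<td.*>(.*)</td>.*<td.*>(.*)</td>"

[MeasureChild1]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[MeasureParent]
; Read capture 1 from the parent measure
StringIndex=1
; Allow regular expressions in Substitute
RegExpSubstitute=1
Substitute='<span.*>':"","</span>":""

[MeasureChild2]
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[MeasureParent]
; Read capture 2 from the parent measure
StringIndex=2
RegExpSubstitute=1
Substitute='<span.*>':"","</span>":""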
-Brian
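A rough Python equivalent of the parent/child split above, assuming the parent captures one pair of cells and each child reads one capture group via StringIndex. Python has no (?U) flag, so the ungreedy parent pattern is rewritten with ".*?", and re.S and re.I stand in for (?s) and (?i). The tag-stripping pattern "<[^>]*>" is swapped in for illustration and is not the "<span.*>" pattern from the reply: since "[^>]*" cannot run past a ">", it removes each tag without touching the number between them.

```python
import re

# One pair of cells from the post, where the first value is zero (no <span>).
PAGE = (
    '<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;">0</td>\n'
    '<td nowrap="nowrap" align="right" style="padding-left: 0; color: grey;">'
    '<span style="color: red;">-25</span></td>\n'
)

# Parent measure: lazy .*? plays the role of PCRE's (?U) ungreedy option.
PARENT = re.compile(r'<td.*?>(.*?)</td>.*?<td.*?>(.*?)</td>', re.S | re.I)

match = PARENT.search(PAGE)
string_index_1 = match.group(1)  # roughly what StringIndex=1 would hand a child
string_index_2 = match.group(2)  # roughly what StringIndex=2 would hand a child

# Child-side cleanup (illustrative pattern): delete every tag, keep the text.
TAG = re.compile(r'<[^>]*>')
value_1 = TAG.sub('', string_index_1)
value_2 = TAG.sub('', string_index_2)

print(value_1, value_2)  # 0 -25
```

Because the anchor is the <td> cell rather than the <span>, the zero cell still occupies capture 1 and nothing shifts.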
mazty
Posts: 7
Joined: January 1st, 2012, 4:44 am

Re: Webparsing Anchors

Post by mazty

Many thanks for the reply; it seems to have done the trick after some tinkering.

For some reason the child measures would only work "backwards", so to speak. Maybe it's to do with the way the measure monitors the link? I don't know, but ultimately this was the resulting measure:


[MeasureChild1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureParent]
StringIndex=1
RegExpSubstitute=1
Substitute='</span>':"",'<span.*>':""

[DisplayChild1]
MeasureName=MeasureChild1
Meter=STRING
meterStyle=stylePOS
X=160
Y=50
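The "backwards" behaviour has a likely explanation: the U (ungreedy) flag only applies inside the parent's RegExp, so in a child substitute ".*" is greedy. Run first, "<span.*>" matches from "<span" all the way to the last ">" in the string, which is the one closing "</span>", and takes the number with it. The Python sketch below applies substitute pairs one after another, which is assumed here to mirror how Rainmeter processes a Substitute list (it is consistent with what this thread observed), and compares both orders with a bounded pattern that cannot cross a ">".

```python
import re

CELL = '<span style="color: red;">-25</span>'

def apply_substitutes(text, pairs):
    """Apply (pattern, replacement) pairs in sequence, each on the previous result."""
    for pattern, replacement in pairs:
        text = re.sub(pattern, replacement, text)
    return text

# Order from the first reply: greedy <span.*> runs to the final ">" and eats the number.
opening_first = apply_substitutes(CELL, [(r'<span.*>', ''), (r'</span>', '')])

# Order from the working measure: with </span> gone, the last ">" closes the opening tag.
closing_first = apply_substitutes(CELL, [(r'</span>', ''), (r'<span.*>', '')])

# A pattern that cannot cross a ">" gives the same result in either order.
bounded = apply_substitutes(CELL, [(r'<span[^>]*>', ''), (r'</span>', '')])

print(repr(opening_first), repr(closing_first), repr(bounded))  # '' '-25' '-25'
```

Reordering works for this markup, but a bounded pattern such as "<span[^>]*>" (or a lazy "<span.*?>") does not depend on the order of the pairs.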