
Webparser clean captured string

AinaJ
Posts: 5
Joined: April 17th, 2018, 8:38 am

Post by AinaJ »

Hello!
I want to remove the HTML tags from the text string captured by WebParser. How can I deal with this in the output?
<em>Mandre ny teny </em>[<em>izy</em>] <em>ka mahazo ny heviny, ary tena mamoa.—</em><a href="/mg/wol/bc/r26/lp-mg/1102018053/66/0" data-bid="67-1" class="b"><em>Mat. 13:23</em></a><em>.</em>

Thank you for your help
jsmorley
Developer
Posts: 22628
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: Webparser clean captured string

Post by jsmorley »

AinaJ wrote:Hello!
I want to remove the HTML tags from the text string captured by WebParser. How can I deal with this in the output?
<em>Mandre ny teny </em>[<em>izy</em>] <em>ka mahazo ny heviny, ary tena mamoa.—</em><a href="/mg/wol/bc/r26/lp-mg/1102018053/66/0" data-bid="67-1" class="b"><em>Mat. 13:23</em></a><em>.</em>

Thank you for your help
Rather than using Substitute to "remove" these tags from your "output", the better way is to write the regular expression so that the tags are never captured in the first place.

https://docs.rainmeter.net/manual/skins/option-types/#RegExp

Skin:

Code: Select all

[Rainmeter]
Update=1000
DynamicWindowSize=1

[Variables]

[MeasureSite]
Measure=WebParser
URL=file://#CURRENTPATH#Test.html
RegExp=(?siU)<em>(.*)</em>.*<em>(.*)</em>.*<em>(.*)</em>.*href="(.*)".*bid="(.*)".*class="(.*)".*<em>(.*)</em>

[MeasureField1]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1

[MeasureField2]
Measure=WebParser
URL=[MeasureSite]
StringIndex=2

[MeasureField3]
Measure=WebParser
URL=[MeasureSite]
StringIndex=3

[MeasureField4]
Measure=WebParser
URL=[MeasureSite]
StringIndex=4

[MeasureField5]
Measure=WebParser
URL=[MeasureSite]
StringIndex=5

[MeasureField6]
Measure=WebParser
URL=[MeasureSite]
StringIndex=6

[MeasureField7]
Measure=WebParser
URL=[MeasureSite]
StringIndex=7

[MeterDummy]
Meter=String
Test.html:

Code: Select all

<em>Mandre ny teny </em>[<em>izy</em>] <em>ka mahazo ny heviny, ary tena mamoa.—</em><a href="/mg/wol/bc/r26/lp-mg/1102018053/66/0" data-bid="67-1" class="b"><em>Mat. 13:23</em></a><em>.</em>
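As a minimal sketch of how the captured fields might then be displayed, a single String meter can bind several of the child measures and join them in its Text option. The meter name [MeterVerse], the font settings and the choice of fields 1, 3 and 7 are assumptions for illustration and are not part of the skin above; such a meter would take the place of the bare [MeterDummy].

```ini
; Hypothetical meter: the name, styling and field selection are assumptions.
; %1, %2 and %3 refer to MeasureName, MeasureName2 and MeasureName3, in that order.
[MeterVerse]
Meter=String
MeasureName=MeasureField1
MeasureName2=MeasureField3
MeasureName3=MeasureField7
FontSize=12
FontColor=255,255,255,255
AntiAlias=1
Text=%1%2%3
```

Because each piece of text is its own capture group, any field (for example the href in field 4) can be left out or reordered simply by changing which measures the meter binds.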
AinaJ
Posts: 5
Joined: April 17th, 2018, 8:38 am

Re: Webparser clean captured string

Post by AinaJ »

Thank you so much, jsmorley.
How can I implement this when both the captured text and the HTML code around it change? I forgot to mention that.
Here is the code I get when parsing the webpage.

Code: Select all

<header>
<h2 id="p53" data-pid="53">Talata 17 Aprily</h2>
</header>
<p id="p54" data-pid="54" class = "themeScrp"><em>Mandre ny teny </em>[<em>izy</em>] <em>ka mahazo ny heviny, ary tena mamoa.—</em><a href="/mg/wol/bc/r26/lp-mg/1102018053/66/0" data-bid="67-1" class="b"><em>Mat. 13:23</em></a><em>.</em></p>
I used this skin code to get the text string that I needed.

Code: Select all

[Rainmeter]
Update=1000
AccurateText=1
DynamicWindowSize=1

[Metadata]
Name=
Author=
Information=
Version=
License=Creative Commons Attribution - Non - Commercial - Share Alike 3.0

[MeterSite]
Measure=WebParser
URL=https://wol.jw.org/mg/wol/h/r26/lp-mg
RegExp=(?siU)<p id="p54" data-pid="54" class = "themeScrp">(.*)</p>
Debug=2
UpdateRate=3600

[MeterText]
Measure=WebParser
URL=[MeterSite]
StringIndex=1

[MeterDummy]
Meter=String

[MeterOutputStart]
Meter=String
MeasureName=MeterText
FontColor=#TextColor#
FontFace=Montserrat Light
FontSize=16
SolidColor=50,50,50,100
Padding=10,10,10,10
AntiAlias=1
W=1000
; To word wrap
ClipString=2
DynamicVariables=1
balala
Rainmeter Sage
Posts: 16109
Joined: October 11th, 2010, 6:27 pm
Location: Gheorgheni, Romania

Re: Webparser clean captured string

Post by balala »

jsmorley's approach is usually the best one, but it has a problem in some circumstances: if the HTML code keeps changing, it can be hard to get the regular expression to work reliably.
You could instead try adding a substitution to the [MeterText] measure (see the first tip below), as follows:

Code: Select all

[MeterText]
...
RegExpSubstitute=1
Substitute="<em>":"","</em>":"","\[<em>.*</em>]":"","\[izy]\s":"","<a href=.*>":"","</a>":"","—\.":""
However, this solution probably won't work every time either, depending on how the HTML code changes over time. I would have to follow the site (and its code) for a day or two to see how it behaves.

Tips:
  • In your code, [MeterSite] and [MeterText] are (WebParser) measures, not meters, so I'd rename them to [MeasureSite] and [MeasureText].
  • [MeterDummy] isn't needed at all. You can safely remove it.
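If the goal is only to drop every tag and keep all of the text between them, one possible alternative to listing each tag is a single broader pattern. The following is a sketch under that assumption and is not the substitution suggested above: it removes anything of the form <...>, so unlike the list above it leaves the bracketed word and the scripture reference in the text. The measure is named [MeasureText] here, following the first tip.

```ini
[MeasureText]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1
; Treat the left-hand side of each Substitute pair as a regular expression.
RegExpSubstitute=1
; Assumption: every tag has the form <...> with no ">" inside an attribute value.
Substitute="<[^>]*>":""
```

A pattern like this does not depend on which tags the site uses on a given day, but it also cannot decide which pieces of text to keep; for that, the per-tag substitutions or separate capture groups are still needed.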
AinaJ
Posts: 5
Joined: April 17th, 2018, 8:38 am

Re: Webparser clean captured string

Post by AinaJ »

Hi friend,
Thank you for the tips as well. I needed them because this is my first time.
balala wrote:I have to follow the site (and the code) over a day or two, to see how does it work.
Today I found that the home page URL I gave you doesn't return the latest text that I need. Instead, this URL (https://wol.jw.org/mg/wol/dt/r26/lp-mg/2018/4/18) returns the text for the date it contains, and the text changes every day.

Code: Select all

[MeasureSite]
Measure=WebParser
URL=https://wol.jw.org/mg/wol/dt/r26/lp-mg/2018/4/18
RegExp=(?siU)<p id="p57" data-pid="57" class = "themeScrp">(.*)</p>
Debug=2
UpdateRate=3600
There is another challenge here: how can I build the URL from the current date, so that it changes every day (https: ....../2018/4/18)?
balala
Rainmeter Sage
Posts: 16109
Joined: October 11th, 2010, 6:27 pm
Location: Gheorgheni, Romania

Re: Webparser clean captured string

Post by balala »

I'm sorry, but I can't get the skin to work with the new [MeasureSite] measure. The initial measure (with the initial URL) worked well, but this one just doesn't. Are you sure it works for you? If so, please post the whole working code again.
AinaJ
Posts: 5
Joined: April 17th, 2018, 8:38 am

Re: Webparser clean captured string

Post by AinaJ »

This code worked for me:

Code: Select all

[Rainmeter]
Update=1000
AccurateText=1
DynamicWindowSize=1

[Metadata]
Name=
Author=
Information=
Version=
License=Creative Commons Attribution - Non - Commercial - Share Alike 3.0

[Variables]
Date=

[MeasureSite]
Measure=WebParser
URL=https://wol.jw.org/mg/wol/dt/r26/lp-mg/2018/4/18
RegExp=(?siU)<p id="p57" data-pid="57" class = "themeScrp">(.*)</p>
Debug=2
UpdateRate=3600

[MeasureText]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1
RegExpSubstitute=1
Substitute="—</em>":" —","<em>":"","</em>":"","\[<em>.*</em>]":"","</a>":"","<a href=(.*)>":""

[MeterDummy]
Meter=String

[MeterOutputStart]
Meter=String
MeasureName=MeasureText
FontColor=#TextColor#
FontFace=Montserrat Light
FontSize=18
SolidColor=50,50,50,100
Padding=20,20,20,20
AntiAlias=1
W=2000
; To word wrap
ClipString=2
DynamicVariables=1
balala
Rainmeter Sage
Posts: 16109
Joined: October 11th, 2010, 6:27 pm
Location: Gheorgheni, Romania

Re: Webparser clean captured string

Post by balala »

AinaJ wrote:There is another challenge here: how can I build the URL from the current date, so that it changes every day (https: ....../2018/4/18)?
Add the following Time measures to your code:

Code: Select all

[MeasureYear]
Measure=Time
Format=%Y

[MeasureMonth]
Measure=Time
Format=%#m

[MeasureDay]
Measure=Time
Format=%#d
[MeasureYear] returns the year, [MeasureMonth] the month and [MeasureDay] the day. The # in %#m and %#d removes the leading zero, so the values match the /2018/4/18 form of the URL.
Use these measures in the URL option of the [MeasureSite] measure by replacing it with the following:
URL=https://wol.jw.org/mg/wol/dt/r26/lp-mg/[&MeasureYear]/[&MeasureMonth]/[&MeasureDay]
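Put together, the parent measure might look like the sketch below. The URL is the one given above. The looser RegExp is an assumption added here rather than something tested in this thread: the paragraph id quoted earlier changed from p54 to p57 between two days, so matching on the class alone may hold up better than hard-coding the id.

```ini
[MeasureSite]
Measure=WebParser
; The date parts come from the [MeasureYear], [MeasureMonth] and [MeasureDay] Time measures.
URL=https://wol.jw.org/mg/wol/dt/r26/lp-mg/[&MeasureYear]/[&MeasureMonth]/[&MeasureDay]
; Assumption: the paragraph is identified by its class, whatever its id is on a given day.
RegExp=(?siU)<p[^>]*class\s*=\s*"themeScrp"[^>]*>(.*)</p>
UpdateRate=3600
```

If the site ever uses the themeScrp class on more than one paragraph, this pattern captures only the first one.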
AinaJ
Posts: 5
Joined: April 17th, 2018, 8:38 am

Re: Webparser clean captured string

Post by AinaJ »

Thank you! It worked :thumbup:
I've just modified the Substitute code you gave me. Let's see if it works well over the next few days.
balala
Rainmeter Sage
Posts: 16109
Joined: October 11th, 2010, 6:27 pm
Location: Gheorgheni, Romania

Re: Webparser clean captured string

Post by balala »

One more thing I'd do is update the WebParser measure every time the day changes. I'm not sure how useful this would be, because that depends on when the content of the page is updated for the new day, but it may be worth a try.
If you want to try it out, add the following option to the [MeasureDay] measure: OnChangeAction=[!CommandMeasure "MeasureSite" "Update"]
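With that option added, the [MeasureDay] measure from the earlier post would read as in this short sketch:

```ini
[MeasureDay]
Measure=Time
Format=%#d
; When the day number changes, ask the WebParser parent measure to fetch the page again.
OnChangeAction=[!CommandMeasure "MeasureSite" "Update"]
```

Without it, the page is only fetched again when the UpdateRate interval of [MeasureSite] has elapsed.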