
Help with parsing html

TimmyT1983
Posts: 12
Joined: December 22nd, 2020, 10:39 pm

Help with parsing html

Post by TimmyT1983 »

Hey, I'm just trying to get the temperature (just the number in degrees) parsed from the html off my temperature sensor.

Here is the HTML from my sensor's page:

Code: Select all

<div style="text-align:left;display:inline-block;color:#eaeaea;min-width:340px;"><div style="text-align:center;color:#eaeaea;"><noscript>To use Tasmota, please enable JavaScript<br></noscript><h3>Sonoff TH Module</h3><h2>Tasmota</h2></div><div id="l1" name="l1"><table style="width:100%"><tbody><tr><th>SI7021 Temperature</th><td>79.5 °F</td></tr><tr><th>SI7021 Humidity</th><td>28.6 %</td></tr><tr><th>SI7021 Dew point</th><td>44.1 °F</td></tr></tbody></table><table style="width:100%"><tbody><tr><td style="width:100%"><div style="text-align:center;font-weight:normal;font-size:62px">OFF</div></td></tr></tbody></table></div><table style="width:100%"><tbody><tr><td style="width:100%"><button onclick="la(&quot;&amp;o=1&quot;);" name="">Toggle</button></td></tr></tbody></table><div></div><p></p><form action="cn" method="get"><button name="">Configuration</button></form><p></p><p></p><form action="in" method="get"><button name="">Information</button></form><p></p><p></p><form action="up" method="get"><button name="">Firmware Upgrade</button></form><p></p><p></p><form action="cs" method="get"><button name="">Console</button></form><p></p><p></p><form action="." method="get" onsubmit="return confirm(&quot;Confirm Restart&quot;);"><button name="rst" class="button bred">Restart</button></form><p></p><div style="text-align:right;font-size:11px;"><hr><a href="https://bit.ly/tasmota" target="_blank" style="color:#aaa;">Tasmota 9.1.0 by Theo Arends</a></div></div>
The temperature in there is: <td>79.5 °F</td>


Here is the code I'm using for the WebParser skin:

Code: Select all

[Rainmeter]
Update=1000
AccurateText=1
DynamicWindowSize=1
BackgroundMode=3

[MeasureSite]
Measure=Plugin
Plugin=WebParser
URL=http://192.168.1.76
RegExp=(?siU) < This is where the parser info goes >
UpdateRate=3600

[MeasureIP]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSite]
StringIndex=1


[MeterIP]
Meter=String
MeasureName=MeasureIP
X=100
Y=3R
W=200
H=100
FontSize=25
FontColor=255,225,181,255
SolidColor=100,100,100,100
Padding=5,5,5,5
FontWeight=200
StringAlign=Center
StringEffect=Shadow
AntiAlias=1
ClipString=1


Any help or direction would be appreciated! Thank you!
TimmyT1983
Posts: 12
Joined: December 22nd, 2020, 10:39 pm

Re: Help with parsing html

Post by TimmyT1983 »

Couldn't get any data from that. When I test it with just

Code: Select all

<h3>(.*)</h3>
I get that text parsed, but not with the rest of the code.
TimmyT1983
Posts: 12
Joined: December 22nd, 2020, 10:39 pm

Re: Help with parsing html

Post by TimmyT1983 »

Yep, that's exactly what I get. I was just posting that code to show it was connecting to the IP and parsing data.

I'm just looking for parser code to get only:

Code: Select all

<td>79.5 °F</td>


The temperature will update and change regularly, but that's what it reads in the first snippet of HTML I posted.
TimmyT1983
Posts: 12
Joined: December 22nd, 2020, 10:39 pm

Re: Help with parsing html

Post by TimmyT1983 »

With .*<td>(.*)</td>.* I get another section of code, shown in the attached pic.
SilverAzide
Rainmeter Sage
Posts: 2588
Joined: March 23rd, 2015, 5:26 pm

Re: Help with parsing html

Post by SilverAzide »

TimmyT1983 wrote: December 23rd, 2020, 12:25 am with .*<td>(.*)</td>.* I get another section of code here in the attached pic
I think perhaps you didn't understand what dvo was trying to give you. He gave you a regex that will parse almost everything in that html, not just the one single value you asked for.

For example:

Code: Select all

[MeasureSite]
Measure=Plugin
Plugin=WebParser
URL=http://192.168.1.76
RegExp=(?siU)<h3>(.*)</h3><h2>(.*)</h2>.*<th>(.*)</th><td>(.*)</td>.*<th>(.*)</th>.*<td>(.*)</td>.*<th>(.*)</th><td>(.*)</td>.*
UpdateRate=3600
Will grab 8 chunks of data in one shot:

1 => Sonoff TH Module
2 => Tasmota
3 => SI7021 Temperature
4 => 79.5 °F
5 => SI7021 Humidity
6 => 28.6 %
7 => SI7021 Dew point
8 => 44.1 °F

So what you need to do is create 1 to 8 "child measures" to grab these values (whatever ones you want). For example:

Code: Select all

[MeasureTemp]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSite]
StringIndex=4

[MeasureHumidity]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSite]
StringIndex=6

[MeasureDewPoint]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSite]
StringIndex=8
Does this work? (I didn't try it.) Here is an invaluable tool to test WebParser regexps and see the index numbers: https://docs.rainmeter.net/tips/webparser-debugging-regexp/#RainRegExp
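If only the temperature number is wanted, a narrower regexp with a single capture may be enough. This is a minimal, untested sketch. It assumes WebParser really receives the rendered table from the first post, with the label text "SI7021 Temperature" exactly as shown there:

```ini
[MeasureSite]
Measure=Plugin
Plugin=WebParser
URL=http://192.168.1.76
; The U (ungreedy) flag is left out on purpose, so + stays greedy
; and the capture takes the whole number, e.g. 79.5, without the unit.
RegExp=(?si)<th>SI7021 Temperature</th><td>(-?[\d.]+)
UpdateRate=3600

[MeasureTemp]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSite]
; Only one capture group exists, so the temperature is index 1.
StringIndex=1
```

Because the capture stops at the first character that is not a digit, a dot or a leading minus sign, the " °F" part never has to be matched.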
TimmyT1983
Posts: 12
Joined: December 22nd, 2020, 10:39 pm

Re: Help with parsing html

Post by TimmyT1983 »

I think I understand the child measures a bit better now, thanks for clarifying.

I could not get it to display anything with this:


Code: Select all

[Rainmeter]
Update=1000
AccurateText=1
DynamicWindowSize=1
BackgroundMode=3

[MeasureSite]
Measure=Plugin
Plugin=WebParser
URL=http://192.168.1.76
RegExp=(?siU)<h3>(.*)</h3><h2>(.*)</h2>.*<th>(.*)</th><td>(.*)</td>.*<th>(.*)</th>.*<td>(.*)</td>.*<th>(.*)</th><td>(.*)</td>.*
UpdateRate=3600

[MeasureTemp]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSite]
StringIndex=4

[MeasureHumidity]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSite]
StringIndex=6

[MeasureDewPoint]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSite]
StringIndex=8


[MeterIP]
Meter=String
MeasureName=MeasureTemp
X=100
Y=3R
W=900
H=900
FontSize=25
FontColor=255,225,181,255
SolidColor=100,100,100,100
Padding=5,5,5,5
FontWeight=200
StringAlign=Center
StringEffect=Shadow
AntiAlias=1
ClipString=1
However, I realized I hadn't posted the entire html file so here it is:

Code: Select all

<!DOCTYPE html><html lang=\“en\“ class=""><head><meta charset='utf-8'><meta name="viewport" content="width=device-width,initial-scale=1,user-scalable=no"/><title>Tasmota - Information</title><script>var x=null,lt,to,tp,pc='';function eb(s){return document.getElementById(s);}function qs(s){return document.querySelector(s);}function sp(i){eb(i).type=(eb(i).type==='text'?'password':'text');}function wl(f){window.addEventListener('load',f);}function i(){var s,o="<table style='width:100%'><tr><th>Program Version}29.1.0(tasmota)}1Build Date & Time}22020-11-07T11:57:45}1Core/SDK Version}22_7_4_5/2.2.2-dev(38a443e)}1Uptime}20T00:00:17}1Flash write Count}237 at 0xF7000}1Boot Count}214}1Restart Reason}2Power On}1Friendly Name 1}2Tasmota}1}2&nbsp;}1AP1 SSId (RSSI)}2dd-wrt (86%, -57 dBm)}1Hostname}2tasmota_0507A9-1961}1MAC Address}2C8:2B:96:05:07:A9}1IP Address (wifi)}2192.168.1.76}1<hr/>}2<hr/>}1Gateway}2192.168.1.1}1Subnet Mask}2255.255.255.0}1DNS Server}2192.168.1.1}1}2&nbsp;}1MQTT Host}2}1MQTT Port}21883}1MQTT User}2DVES_USER}1MQTT Client}2DVES_0507A9}1MQTT Topic}2tasmota_%06X}1MQTT Group Topic 1}2cmnd/tasmotas/}1MQTT Full Topic}2cmnd/tasmota_0507A9/}1MQTT Fallback Topic}2cmnd/DVES_0507A9_fb/}1MQTT No Retain}2Disabled}1}2&nbsp;}1Emulation}2None}1mDNS Discovery}2Disabled}1}2&nbsp;}1ESP Chip Id}2329641}1Flash Chip Id}20x14605E}1Flash Size}21024kB}1Program Flash Size}21024kB}1Program Size}2600kB}1Free Program Space}2400kB}1Free Memory}226kB</td></tr></table>";s=o.replace(/}1/g,"</td></tr><tr><th>").replace(/}2/g,"</th><td>");eb('i').innerHTML=s;}wl(i);function jd(){var t=0,i=document.querySelectorAll('input,button,textarea,select');while(i.length>=t){if(i[t]){i[t]['name']=(i[t].hasAttribute('id')&&(!i[t].hasAttribute('name')))?i[t]['id']:i[t]['name'];}t++;}}wl(jd);</script><style>div,fieldset,input,select{padding:5px;font-size:1em;}fieldset{background:#4f4f4f;}p{margin:0.5em 
0;}input{width:100%;box-sizing:border-box;-webkit-box-sizing:border-box;-moz-box-sizing:border-box;background:#dddddd;color:#000000;}input[type=checkbox],input[type=radio]{width:1em;margin-right:6px;vertical-align:-1px;}input[type=range]{width:99%;}select{width:100%;background:#dddddd;color:#000000;}textarea{resize:vertical;width:98%;height:318px;padding:5px;overflow:auto;background:#1f1f1f;color:#65c115;}body{text-align:center;font-family:verdana,sans-serif;background:#252525;}td{padding:0px;}button{border:0;border-radius:0.3rem;background:#1fa3ec;color:#faffff;line-height:2.4rem;font-size:1.2rem;width:100%;-webkit-transition-duration:0.4s;transition-duration:0.4s;cursor:pointer;}button:hover{background:#0e70a4;}.bred{background:#d43535;}.bred:hover{background:#931f1f;}.bgrn{background:#47c266;}.bgrn:hover{background:#5aaf6f;}a{color:#1fa3ec;text-decoration:none;}.p{float:left;text-align:left;}.q{float:right;text-align:right;}.r{border-radius:0.3em;padding:2px;margin:6px 2px;}</style></head><body><div style='text-align:left;display:inline-block;color:#eaeaea;min-width:340px;'><div style='text-align:center;color:#eaeaea;'><noscript>To use Tasmota, please enable JavaScript<br></noscript><h3>Sonoff TH Module</h3><h2>Tasmota</h2></div><style>td{padding:0px 5px;}</style><div id='i' name='i'></div><div></div><p><form action='.' method='get'><button>Main Menu</button></form></p><div style='text-align:right;font-size:11px;'><hr/><a href='https://bit.ly/tasmota' target='_blank' style='color:#aaa;'>Tasmota 9.1.0 by Theo Arends</a></div></div></body></html>




I'm not sure if providing the full html code makes a difference.




Playing with RainRegExp right now, I see what you mean. I get the same correct data you posted when parsing:

Code: Select all

1 => Sonoff TH Module
2 => Tasmota
3 => SI7021 Temperature
4 => 79.0 °F
5 => SI7021 Humidity
6 => 27.2 %
7 => SI7021 Dew point
8 => 42.3 °F
Last edited by TimmyT1983 on December 23rd, 2020, 2:55 am, edited 1 time in total.
TimmyT1983
Posts: 12
Joined: December 22nd, 2020, 10:39 pm

Re: Help with parsing html

Post by TimmyT1983 »

OK, so I've tried loading the IP address in RainRegExp, and for some reason I'm not getting the table (updated temp, humidity, etc.) that I see when visiting the IP address in a browser. All other parts of the HTML are there, so I know it's getting the page's HTML. I think the table is generated by JavaScript or something other than HTML.
SilverAzide
Rainmeter Sage
Posts: 2588
Joined: March 23rd, 2015, 5:26 pm

Re: Help with parsing html

Post by SilverAzide »

TimmyT1983 wrote: December 23rd, 2020, 2:41 am Ok, so I've tried viewing the IP address with rainregexp and for some reason I'm not getting the table I get (updated temp, humidity etc.) when visiting the ip address. All other parts of the HTML are there so i know it's getting the ip's html. I think it's javascript or something other than html
That code you posted is completely different from what you posted originally, so the regexp isn't going to work. You can enter the URL into RainRegExp and hit the connect option to pull the data into the program. You will then need to write a regexp that parses that text, which is the complicated bit. If you post the EXACT and complete text coming from the site, you should be able to get some help parsing it...
TimmyT1983
Posts: 12
Joined: December 22nd, 2020, 10:39 pm

Re: Help with parsing html

Post by TimmyT1983 »

That code you posted is completely different from what you posted originally, so the regexp isn't going to work
Yeah, I assume what I originally posted was the browser's processed version of the page, after the JavaScript had been turned into HTML. When I used the IP address directly in RainRegExp, I got the "raw", unprocessed JavaScript version of the page. That's why trying to parse it didn't work, and it probably won't work, because it's JavaScript or JSON.

Does that make sense?

Here is exactly what I get from RainRegExp:

Code: Select all

<!DOCTYPE html><html lang=\“en\“ class=""><head><meta charset='utf-8'><meta name="viewport" content="width=device-width,initial-scale=1,user-scalable=no"/><title>Tasmota - Main Menu</title><script>var x=null,lt,to,tp,pc='';function eb(s){return document.getElementById(s);}function qs(s){return document.querySelector(s);}function sp(i){eb(i).type=(eb(i).type==='text'?'password':'text');}function wl(f){window.addEventListener('load',f);}function la(p){var a='';if(la.arguments.length==1){a=p;clearTimeout(lt);}if(x!=null){x.abort();}x=new XMLHttpRequest();x.onreadystatechange=function(){if(x.readyState==4&&x.status==200){var s=x.responseText.replace(/{t}/g,"<table style='width:100%'>").replace(/{s}/g,"<tr><th>").replace(/{m}/g,"</th><td>").replace(/{e}/g,"</td></tr>").replace(/{c}/g,"%'><div style='text-align:center;font-weight:");eb('l1').innerHTML=s;}};x.open('GET','.?m=1'+a,true);x.send();lt=setTimeout(la,2345);}function lc(v,i,p){if(eb('s')){if(v=='h'||v=='d'){var sl=eb('sl4').value;eb('s').style.background='linear-gradient(to right,rgb('+sl+'%,'+sl+'%,'+sl+'%),hsl('+eb('sl2').value+',100%,50%))';}}la('&'+v+i+'='+p);}wl(la);function jd(){var t=0,i=document.querySelectorAll('input,button,textarea,select');while(i.length>=t){if(i[t]){i[t]['name']=(i[t].hasAttribute('id')&&(!i[t].hasAttribute('name')))?i[t]['id']:i[t]['name'];}t++;}}wl(jd);</script><style>div,fieldset,input,select{padding:5px;font-size:1em;}fieldset{background:#4f4f4f;}p{margin:0.5em 
0;}input{width:100%;box-sizing:border-box;-webkit-box-sizing:border-box;-moz-box-sizing:border-box;background:#dddddd;color:#000000;}input[type=checkbox],input[type=radio]{width:1em;margin-right:6px;vertical-align:-1px;}input[type=range]{width:99%;}select{width:100%;background:#dddddd;color:#000000;}textarea{resize:vertical;width:98%;height:318px;padding:5px;overflow:auto;background:#1f1f1f;color:#65c115;}body{text-align:center;font-family:verdana,sans-serif;background:#252525;}td{padding:0px;}button{border:0;border-radius:0.3rem;background:#1fa3ec;color:#faffff;line-height:2.4rem;font-size:1.2rem;width:100%;-webkit-transition-duration:0.4s;transition-duration:0.4s;cursor:pointer;}button:hover{background:#0e70a4;}.bred{background:#d43535;}.bred:hover{background:#931f1f;}.bgrn{background:#47c266;}.bgrn:hover{background:#5aaf6f;}a{color:#1fa3ec;text-decoration:none;}.p{float:left;text-align:left;}.q{float:right;text-align:right;}.r{border-radius:0.3em;padding:2px;margin:6px 2px;}</style></head><body><div style='text-align:left;display:inline-block;color:#eaeaea;min-width:340px;'><div style='text-align:center;color:#eaeaea;'><noscript>To use Tasmota, please enable JavaScript<br></noscript><h3>Sonoff TH Module</h3><h2>Tasmota</h2></div><div id='l1' name='l1'></div><table style='width:100%'><tr><td style='width:100%'><button onclick='la("&o=1");'>Toggle</button></td></tr></table><div></div><p><form action='cn' method='get'><button>Configuration</button></form></p><p><form action='in' method='get'><button>Information</button></form></p><p><form action='up' method='get'><button>Firmware Upgrade</button></form></p><p><form action='cs' method='get'><button>Console</button></form></p><p><form action='.' 
method='get' onsubmit='return confirm("Confirm Restart");'><button name='rst' class='button bred'>Restart</button></form></p><div style='text-align:right;font-size:11px;'><hr/><a href='https://bit.ly/tasmota' target='_blank' style='color:#aaa;'>Tasmota 9.1.0 by Theo Arends</a></div></div></body></html>
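The script in the page above requests ".?m=1" and writes the reply into the empty "l1" div, replacing tokens such as {s}, {m} and {e} with table tags. That suggests WebParser could be pointed at that address directly instead of the main page. A minimal, untested sketch follows. The shape of the reply (something like {s}SI7021 Temperature{m}79.5 °F{e}) is an assumption inferred from the script, and the measure names are hypothetical, so check the real reply in RainRegExp first:

```ini
[MeasureSensorRaw]
Measure=Plugin
Plugin=WebParser
; Same address the page's own script fetches (".?m=1" relative to the root).
URL=http://192.168.1.76/?m=1
; ASSUMPTION: the reply contains {s}SI7021 Temperature{m}<number>...
; The braces are escaped so PCRE treats them as literal characters.
RegExp=(?si)\{s\}SI7021 Temperature\{m\}(-?[\d.]+)
UpdateRate=3600

[MeasureRawTemp]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSensorRaw]
StringIndex=1
```

Capturing only digits, dots and an optional minus sign keeps the sketch independent of how the degree sign and unit are encoded in the reply.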

There is another method to get the data (in another part of the server), but it's also not HTML, just plain text, and this is what I get:

Code: Select all

{"StatusSNS":{"Time":"2020-12-23T04:05:36","SI7021":{"Temperature":79.0,"Humidity":30.8,"DewPoint":45.5},"TempUnit":"F"}}
I assume that's MQTT.
SilverAzide
Rainmeter Sage
Posts: 2588
Joined: March 23rd, 2015, 5:26 pm

Re: Help with parsing html

Post by SilverAzide »

Yeah, the first thing is HTML, but the page is being rendered with JavaScript, so it's useless (your data isn't in there from what I can see). But that second text can be easily parsed. Is that data the exact return value?
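Assuming the return value is exactly the one-line text posted above, something like this rough, untested sketch might pull out the three readings. The URL is a placeholder assumption, since the thread does not say which address returned that text, and the measure names are hypothetical:

```ini
[MeasureSensorJson]
Measure=Plugin
Plugin=WebParser
; ASSUMPTION: placeholder address. Replace it with whatever URL
; actually returned the {"StatusSNS":...} text.
URL=http://192.168.1.76/cm?cmnd=Status%2010
; With the U flag each (.*) is lazy, so it stops at the next literal:
; 1 = Temperature, 2 = Humidity, 3 = DewPoint.
RegExp=(?siU)"Temperature":(.*),"Humidity":(.*),"DewPoint":(.*)\}
UpdateRate=3600

[MeasureJsonTemp]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSensorJson]
StringIndex=1

[MeasureJsonHumidity]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSensorJson]
StringIndex=2

[MeasureJsonDewPoint]
Measure=Plugin
Plugin=WebParser
URL=[MeasureSensorJson]
StringIndex=3
```

Against the sample text this would give 79.0, 30.8 and 45.5. The regexp depends on the key order shown in the sample, so it would need adjusting if the device returns the fields in a different order.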