
strip html

scub
Posts: 30
Joined: May 16th, 2015, 10:37 pm

strip html

Post by scub »

how do you deal with html in an xml feed? what would be a good substitute to use to remove the html code?
it would be nice if rainmeter could at least parse the image and display it.. not possible i assume? failing that, it could strip out the html code but, wherever there is an <img> tag, substitute something like "[image]" so it looks a bit better when reading.

some of my favorite xml feeds, like nasa's, have a lot of info. i can read the whole story, so i do not have to leave the desktop :D now if only there were an easy way to scroll text (the part of the article that does not fit the skin W/H is cut off).
jsmorley
Developer
Posts: 22628
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: strip html

Post by jsmorley »

WebParser is not a web "browser". It doesn't render anything, and can't "strip" HTML codes because it doesn't have a clue what HTML is, nor does it care.

It is simply a way to connect to some external resource, a site or a file, and capture any "text" that the resource returns. Then you use a regular expression to parse this text and pull out the bits you want.

Generally speaking, WebParser is going to be used to access some RSS/ATOM/XML formatted site data, where the stuff returned is in the form:

<item>Some text for this item</item>

And you parse that with something like

RegExp=<item>(.*)</item>

So what is in StringIndex 1 is "Some text for this item". The <tags> are only used to zero in on what you want to capture, and are not returned in the value.
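
To see how that capture group behaves, here is a minimal Python sketch. It is only an illustration under assumptions: the feed text is made up, and Python's `re` module stands in for the regex engine WebParser uses (the two should agree on a pattern this simple). It also shows one thing to watch for: when several items sit on the same line, a greedy `(.*)` runs on to the last `</item>`, while a lazy `(.*?)` stops at the first one.

```python
import re

# Hypothetical feed text; a real feed would come from the URL WebParser reads.
feed = "<item>First headline</item><item>Second headline</item>"

# Greedy: (.*) keeps going to the LAST </item>, swallowing the tags in between.
greedy = re.search(r"<item>(.*)</item>", feed).group(1)

# Lazy: (.*?) stops at the FIRST </item>, which is usually what you want.
lazy = re.search(r"<item>(.*?)</item>", feed).group(1)

print(greedy)
print(lazy)
```

Either way, the `<item>` tags themselves only anchor the match; just the text inside the parentheses ends up in the captured group.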

If you are going to just go get an entire web page of HTML, it will be up to you to deal with the HTML <tags> and such. You can use Substitute like:

Substitute="<p>":"","</p>":""

and so on, but there are a virtually unlimited number of <tags> and other HTML / CSS formatting codes, and I doubt you could ever catch them all.

I don't mean to discourage, but trying to use WebParser as a general-purpose web browser is only going to lead to tears, and it is not what WebParser is intended for.

WebParser doesn't "see" what you "see" when you go to a site in your web browser. What it sees is what you get when you right-click and say "show source" on a site in your web browser. It sees the raw source code of the site, with all the Javascript, CSS, HTML, as is... It doesn't know that any of that should be ignored or otherwise handled, it just doesn't care. It's all just text.

As dvo says, we would need a link to that site before we could take a look and see if there is some way to parse it that would allow you to return useful information in a useful way. However, I'm skeptical. Trying to get and display entire variable-length "articles" from a web site is going to be a challenge at best. There is no easy way to "scroll" information in a meter. A more common use would be to get and display just the "headlines" of the various articles, and have those link to the full article in your default web browser.
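
The limitation of literal pairs is easy to reproduce outside Rainmeter. The Python sketch below is an assumption-laden stand-in, not Rainmeter's own code: `apply_pairs` is a hypothetical helper that mimics a list of literal find/replace pairs, and the sample markup is invented. A tag that carries an attribute, or any tag not listed, passes straight through.

```python
# Literal pairs, mirroring Substitute="<p>":"","</p>":""
pairs = [("<p>", ""), ("</p>", "")]

def apply_pairs(text, pairs):
    """Apply each literal find/replace pair in order (hypothetical helper)."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

# Invented sample: one plain <p>, one <p> with an attribute, one unlisted <b>.
sample = '<p>Plain paragraph</p><p class="lead">Styled <b>text</b></p>'
result = apply_pairs(sample, pairs)

# The plain <p> and both </p> are gone; <p class="lead"> and <b>...</b> survive.
print(result)
```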
Bananorpion
Posts: 40
Joined: April 16th, 2017, 8:35 pm

Re: strip html

Post by Bananorpion »

jsmorley wrote: There are a virtually unlimited number of <tags> and other HTML / CSS formatting codes, and I doubt you could ever catch them all.

More seriously, if I wanted to properly strip html, I would use another language (I suppose Lua has some library for that). I would store the WebParser content in a file and then run a Perl script to get what I need, but that's my preference.
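
As a rough sketch of that "use another language" route, here is what it could look like in Python rather than Perl or Lua, using only the standard library's `html.parser`. The names `TextExtractor` and `strip_html` are hypothetical, the sample markup is invented, and how the text gets from WebParser into the script (for example via a saved file) is left out. It also puts "[image]" wherever an `<img>` tag appears, as asked for in the first post. It does not try to skip `<script>` or `<style>` contents.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text content and marks each <img> tag with "[image]"."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.parts.append("[image]")

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(markup):
    """Return the text of `markup` with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # split() with no arguments breaks on any whitespace run, so joining
    # with single spaces collapses the gaps left behind by removed tags.
    return " ".join("".join(parser.parts).split())

print(strip_html('<p>Mars &amp; beyond <img src="a.jpg"> today</p>'))
```

A real parser copes with attributes, entities and self-closing tags without having to list them one by one, which is the main advantage over a long Substitute chain.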
scub
Posts: 30
Joined: May 16th, 2015, 10:37 pm

Re: strip html

Post by scub »

it's not that difficult to catch most, if not all, of the html tags.
found the solution i was looking for, or close to it.

i might be on my own here, but i need to collapse the whitespace (gaps). how can i use a substitute like \\s+ : " " to remove them? i think i need to strip out the html tags first?

this works, but doesn't remove the empty gaps:


Substitute="(?i)<[^>]*>":""
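
One way to think about the two steps (strip the tags first, then collapse the gaps) is sketched below in Python. This is an illustration, not a tested Rainmeter skin: `strip_and_collapse` is a hypothetical helper, the sample text is invented, and Python's `re` stands in for Rainmeter's regex engine. In Rainmeter itself the second step would presumably be a second pair on the same line, something like "\s+":" " after the tag pattern, with RegExpSubstitute=1 set on the measure; treat that as an assumption to check against the Substitute documentation.

```python
import re

def strip_and_collapse(text):
    """Remove <...> tags, then squeeze every whitespace run to one space."""
    # Step 1: same pattern as the Substitute line above, replacing tags with nothing.
    no_tags = re.sub(r"(?i)<[^>]*>", "", text)
    # Step 2: newlines, tabs and repeated spaces all become a single space.
    return re.sub(r"\s+", " ", no_tags).strip()

# Invented sample with the kind of gaps that stripped markup leaves behind.
sample = "<p>Hello</p>\n\n   <p>big   <b>world</b></p>\n"
print(strip_and_collapse(sample))
```

Order matters here: removing the tags first means the blank lines and indentation that sat between them are already adjacent, so a single `\s+` pass can fold them together.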