Just wanted to point out that while the vast majority of websites are encoded as UTF-8, which is what WebParser expects by default, there are some, maybe about 10%, that are encoded with:
charset=iso-8859-1
or
charset=Windows-1252
For all practical purposes, these are the same thing. They were not originally, but all modern web browsers, following the HTML specification, assume that Windows-1252 is meant when iso-8859-1 is declared in a meta tag in the HTML, like this:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Without getting into all the ins and outs of encoding, these character sets cover nearly the same characters as the first 256 code points of Unicode, but they are NOT stored the same way as in UTF-8. Each character is a single byte, while UTF-8 needs two bytes for every one of them beyond plain ASCII. Things like the ½ character, for instance, when encoded as Windows-1252 will not be recognized when parsed as UTF-8. You will get a ? question mark instead. Windows-1252 is often loosely called "ANSI", a label that is not UTF-8 and in fact doesn't really correspond to any actual standard. It's certainly not Unicode, which... come on guys, get with it! It's 2018!
In order to parse these sites correctly, you need to set
CodePage=1252
on the parent WebParser measure.
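As a rough sketch of where the option goes, a parent/child pair might look something like this. The measure names, the URL, and the RegExp are placeholders made up for illustration, not taken from any real skin, so adjust them for the site you are actually parsing:

```ini
; Hypothetical parent measure. The name, URL and RegExp are placeholders.
[MeasureSite]
Measure=WebParser
URL=https://example.com/page.html
; The page declares charset=iso-8859-1 or Windows-1252, so decode it as 1252
CodePage=1252
RegExp=(?siU)<title>(.*)</title>

; Hypothetical child measure. It reads capture 1 from the parent,
; which has already decoded the page using the CodePage set there.
[MeasureTitle]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1
```

The option goes on the parent because that is the measure that downloads and decodes the page. The child measure just picks a captured string out of the parent's result.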
https://docs.rainmeter.net/manual/measures/webparser/#CodePage
https://en.wikipedia.org/wiki/ISO/IEC_8859-1
https://en.wikipedia.org/wiki/Windows-1252
Here is an example...
http://hosted.ap.org/dynamic/fronts/RAW?SITE=MYPSP&SECTION=HOME
Website encoding