Just wanted to point out that while the vast majority of websites are encoded as UTF-8, which is what WebParser expects by default, there are some, maybe about 10%, that are encoded with one of two legacy character sets: iso-8859-1 or Windows-1252.
For all practical purposes, these are the same thing. They were not originally, but all modern web browsers, and the HTML specification, treat iso-8859-1 as Windows-1252 when it is declared as the charset in a meta tag in the HTML.
Without getting into all the ins and outs of encoding, these character sets are very similar to, but NOT the same as, the first 256 characters of the Unicode character set as encoded in UTF-8. Only the plain ASCII range, the first 128 characters, is byte-for-byte identical. A character outside that range, for instance, when encoded as Windows-1252 will not be recognized when parsed as UTF-8. You will get a question mark instead. In a sense, Windows-1252 is seen as "ANSI", which is not UTF-8, and in fact doesn't really exist as a standard. It's certainly not Unicode, which... come on guys, get with it! It's 2018!
In order to parse these sites correctly, you need to set the CodePage option to 1252 on the parent WebParser measure.
Here is an example...
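A minimal sketch of such a skin might look like the following. The URL, the RegExp, and the section names are placeholders I am assuming purely for illustration. The part that matters is CodePage=1252 on the parent measure, which asks WebParser to decode the downloaded page as Windows-1252 rather than the default UTF-8.

```ini
[Rainmeter]
Update=1000

; Parent WebParser measure.
; URL and RegExp are hypothetical placeholders, not a real target site.
[MeasureSite]
Measure=WebParser
URL=https://www.example.com/
; Decode the page as Windows-1252 instead of the default UTF-8 (65001)
CodePage=1252
RegExp=(?siU)<title>(.*)</title>
UpdateRate=600

; Child measure returning the first capture group from the parent
[MeasureTitle]
Measure=WebParser
URL=[MeasureSite]
StringIndex=1

[MeterTitle]
Meter=String
MeasureName=MeasureTitle
FontSize=12
FontColor=255,255,255,255
AntiAlias=1
```

Only the parent measure, the one with the actual URL, needs CodePage. The child measures just read the already-decoded captures from it.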