Website encoding

Post by jsmorley

Just wanted to point out that while the vast majority of websites are encoded as UTF-8, which is what WebParser expects by default, there are some, maybe about 10%, that are encoded with:

iso-8859-1

or

Windows-1252
For all practical purposes, these are the same thing. They were not originally, but all modern web browsers, and the HTML specification, will assume that Windows-1252 is meant when iso-8859-1 is seen as a meta command in the HTML.

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

Without getting into all the ins and outs of encoding, these character sets are very similar, but NOT the same, as the first 256 characters of the Unicode character set as encoded in UTF-8. Only the plain ASCII range matches byte for byte. Things like the ½ character, for instance, when encoded as Windows-1252 will not be recognized when parsed as UTF-8. You will get a ? question mark instead. In a sense, Windows-1252 is seen as "ANSI", which is not UTF-8, and in fact doesn't really exist. It's certainly not Unicode, which... come on guys, get with it! It's 2018!
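If you want to see the ½ problem for yourself outside of Rainmeter, here is a rough sketch in Python. This is only an illustration of the byte-level mismatch, not anything WebParser itself runs, and the variable names are just made up for the demo. It assumes Python 3 and its built-in "cp1252" and "utf-8" codecs.

```python
# "1\u00bd cups" is "1½ cups"; the escape keeps this file plain ASCII.
text = "1\u00bd cups"

# Windows-1252 stores ½ as the single byte 0xBD.
as_cp1252 = text.encode("cp1252")

# UTF-8 stores the same character as two bytes, 0xC2 0xBD.
as_utf8 = text.encode("utf-8")

# Reading Windows-1252 bytes as if they were UTF-8: a lone 0xBD byte is not
# valid UTF-8, so a lenient decoder swaps in U+FFFD, the replacement
# character that many programs display as "?".
wrong = as_cp1252.decode("utf-8", errors="replace")

# Reading the same bytes with the matching code page gets the text back.
right = as_cp1252.decode("cp1252")

print(as_cp1252)     # raw bytes as sent by a Windows-1252 site
print(as_utf8)       # raw bytes as sent by a UTF-8 site
print(ascii(wrong))  # shows the replacement character as an escape
print(ascii(right))  # shows the original text as an escape
```

The point is just that the bytes on the wire are different for anything outside plain ASCII, so whatever reads them has to be told which code page the site is using.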

In order to parse these sites correctly, you need to set

CodePage=1252

on the parent WebParser measure.

Here is an example...