I have searched through the forums and found multiple posts regarding the Unicode settings. I have attempted different routes and have been unsuccessful. Please help!
Here is the measure that does the initial pull of information from the API.
[MeasurePlexPyNowPlaying]
Hidden=1
Group=Bar
; Returns the Names of the Media Being Played. (Ex. Moana, Frozen, Titanic)
Measure=Plugin
Plugin=WebParser
Url=[MeasurePlexPy]
RegExp=(?siU)(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")(?(?=.*"full_title\":.\".*\").*"full_title\":.\"(.*)\")
UpdateRate=#RefreshRate#
LogSubstringErrors=0
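For comparison outside Rainmeter: since get_activity returns JSON, the full_title values the regex above fishes out can be pulled with a JSON parser instead. A minimal Python sketch, with the response structure assumed and simplified from PlexPy's output:

```python
import json

# Sample shaped like a PlexPy get_activity response (structure assumed/simplified)
sample = '''{"response": {"data": {"sessions": [
    {"full_title": "Moana"},
    {"full_title": "Shameless (US) - El Gran Ca\\u00f1on"}
]}}}'''

data = json.loads(sample)
titles = [s["full_title"] for s in data["response"]["data"]["sessions"]]
print(titles)  # the \u00f1 escape comes back as a real ñ
```

Note that json.loads resolves the \u00f1 escape automatically, which is exactly what WebParser's regex matching does not do.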
And here is a measure that pulls the string index for each stream.
I'd have to see the actual HTML that WebParser is getting.
If you add Debug=2 to the PARENT WebParser measure, it will save what it is getting as WebParserDump.txt in the skin folder. Then you can give us that.
If it is literally putting "El Gran Ca\u00f1on" in the output, then I'm not sure there is much that can be done about that. While \u00f1 is the 16-bit Unicode "escape" representation of the ñ character, that is not how HTML does it. The HTML representation of ñ is &ntilde;, and then you would use DecodeCharacterReference=1 on the child measure. There is no way I know of for it to deal with \u00f1 embedded in a string.
If you are reading some resource that is not HTML output, that is not intended to be used in a web browser, but is rather some file that is intended to be read and resolved by some programming language, then you might be out of luck.
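To make that contrast concrete, here is a quick Python illustration (not Rainmeter itself) of the two representations: the HTML character reference, which DecodeCharacterReference=1 handles, versus the \u00f1 escape, which needs an entirely different decoder:

```python
import codecs
import html

# HTML-style character reference: this is what DecodeCharacterReference=1 decodes
print(html.unescape("El Gran Ca&ntilde;on"))  # El Gran Cañon

# JSON/programming-language-style escape: needs a different decoder entirely.
# (unicode_escape works here because the input is pure ASCII; it is not a
# general-purpose solution for strings that already contain non-ASCII text.)
print(codecs.decode("El Gran Ca\\u00f1on", "unicode_escape"))  # El Gran Cañon
```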
jsmorley wrote: I'd have to see the actual HTML that WebParser is getting.
"full_title": "Shameless (US) - El Gran Ca\u00f1on"
[MeasurePlexPy]
; Returns the API Information this Skin will use while running.
Measure=Plugin
Plugin=WebParser
Url=http://#PlexPyAddress#/api/v2?apikey=#APIKey#&cmd=get_activity
RegExp="(?siU)^\S(.*)$"
OnRegExpErrorAction="0"
UpdateRate=#RefreshRate#
FinishAction=[!EnableMeasure MeasurePlexPyTranscodeCount][!UpdateMeasure MeasurePlexPyTranscodeCount]
OnConnectErrorAction=[!HideMeter MeterOverallText][!HideMeter MeterServerNameText][!CommandMeasure MeasurePlexPy "Reset"]
LogSubstringErrors=0
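As an aside, the query string in that Url= is just standard key=value pairs. Outside Rainmeter, the same request URL can be assembled with Python's urllib (the host and key below are hypothetical stand-ins for #PlexPyAddress# and #APIKey#):

```python
from urllib.parse import urlencode

# Hypothetical values standing in for #PlexPyAddress# and #APIKey#
base = "http://192.168.1.10:8181/api/v2"
params = {"apikey": "abc123", "cmd": "get_activity"}
url = f"{base}?{urlencode(params)}"
print(url)
```

urlencode also percent-escapes any characters that are unsafe in a URL, which hand-built strings do not.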
What you are reading is apparently just not HTML output. In HTML, Unicode is handled in one of two ways.
By far the most common is to just encode the HTML as UTF-8 w/o BOM, which is what 99.9% of modern websites do. That means that ñ is just in the file as ñ; no extra encoding is needed.
An older, and more rare, approach is to encode Unicode characters as a "character reference sequence", which for ñ would be &ntilde;. Then browsers and other HTML-aware applications can decode that to ñ.
You will still see character references in a lot of web HTML code, but mostly these days only to represent things that can be ambiguous in HTML, like the "&" character, which you will often see as &amp; in HTML code. It is very rare anymore to see entire language elements, alphabetical characters, encoded. There is just no need with UTF-8.
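Both points are easy to check in Python (purely as an illustration):

```python
import html

# In a UTF-8 document, ñ is simply stored as its two raw bytes; no reference needed
print("ñ".encode("utf-8"))  # b'\xc3\xb1'

# References survive mainly for characters that are ambiguous in HTML markup
print(html.unescape("fish &amp; chips"))  # fish & chips
```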
You need a way to tell your API, or whatever is producing this, to output UTF-8, and not ASCII with escaped Unicode.
I mean, there is literally no good way to parse for \u00f1 embedded in a string like Ca\u00f1on. While you can treat the initial escape "\u" as the start of a Unicode sequence, what you would have to do to parse that is to say:
\u0 : is that a unicode char? Yes.
\u00 : is that a unicode char? Yes.
\u00f : is that a unicode char? Yes.
\u00f1 : is that a unicode char? Yes.
\u00f1o : is that a unicode char? No. Ah, then it's \u00f1 and I can deal with that.
And that is stupid. It could easily be wrong, if the Unicode was really \u00f and the "1" was actually part of the string.
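For what it's worth, formats like JSON sidestep exactly this ambiguity by pinning the escape to exactly four hex digits after \u, so a decoder never has to guess where the sequence ends. A sketch of such a decoder in Python (illustrative only, not a Rainmeter feature):

```python
import re

def decode_json_escapes(s):
    # JSON mandates exactly four hex digits after \u, so there is no guessing:
    # in "Ca\u00f1on" the escape is \u00f1 and the trailing "on" is plain text.
    # (Real JSON decoders also handle surrogate pairs; omitted here.)
    return re.sub(r"\\u([0-9a-fA-F]{4})",
                  lambda m: chr(int(m.group(1), 16)), s)

print(decode_json_escapes("El Gran Ca\\u00f1on"))  # El Gran Cañon
```

WebParser has no such decoder, which is why getting the source to emit UTF-8 directly is the real fix.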
<meta charset="utf-8"/>
...
Yeah, indeed.