Robust special character substitutions

Tips and Tricks from the Rainmeter Community
slartibartfast
Posts: 15
Joined: September 10th, 2012, 3:42 pm

Post by slartibartfast »

Skins that extract data from web pages or XML have to deal with special character codes. For instance, &amp; is used to represent the & character, which has special meaning in HTML and XML, and &#205; is a numeric code for Í. The usual way to handle this is to apply a Substitute expression in a measure declaration, with one clause for each special character code and the character to replace it with.

Every skin I've seen that does this has its own unique substitution expression (unless it was copied from someone else), and none of them has been truly complete. Worse still, most of them are tripped up by a quirk of Rainmeter skin parsing, and the developers have no idea their substitutions are not working as expected (I'll explain later).

I decided to implement special character translations inside my FeedReader plugin so skin developers won't even see the special character codes (unless they want to). That way, they only have to worry about substitutions specific to their skin, not translating every special character they might encounter. As a result, I eliminated the special character substitution from my multiFEED skin. Before dropping it in the bit bucket, though, I decided to share it with other skin developers, along with the reason it works when others don't. I researched all the special character codes I could find and included translations for them in this expression:

Code:

Substitute="&amp*#151;":"—","&ndash;":"–","&mdash;":"—","&iexcl;":"¡","&iquest;":"¿","&quot;":'"',"&ldquo;":"“","&rdquo;":"”","&lsquo;":"‘","&rsquo;":"’","&laquo;":"«","&raquo;":"»","&nbsp;":" ","&amp;":"&","&cent;":"¢","&copy;":"©","&divide;":"÷","&gt;":">","&lt;":"<","&micro;":"µ","&middot;":"·","&para;":"¶","&plusmn;":"±","&euro;":"€","&pound;":"£","&reg;":"®","&sect;":"§","&trade;":"™","&yen;":"¥","&aacute;":"á","&Aacute;":"Á","&agrave;":"à","&Agrave;":"À","&acirc;":"â","&Acirc;":"Â","&aring;":"å","&Aring;":"Å","&atilde;":"ã","&Atilde;":"Ã","&auml;":"ä","&Auml;":"Ä","&aelig;":"æ","&AElig;":"Æ","&ccedil;":"ç","&Ccedil;":"Ç","&eacute;":"é","&Eacute;":"É","&egrave;":"è","&Egrave;":"È","&ecirc;":"ê","&Ecirc;":"Ê","&euml;":"ë","&Euml;":"Ë","&iacute;":"í","&Iacute;":"Í","&igrave;":"ì","&Igrave;":"Ì","&icirc;":"î","&Icirc;":"Î","&iuml;":"ï","&Iuml;":"Ï","&ntilde;":"ñ","&Ntilde;":"Ñ","&oacute;":"ó","&Oacute;":"Ó","&ograve;":"ò","&Ograve;":"Ò","&ocirc;":"ô","&Ocirc;":"Ô","&oslash;":"ø","&Oslash;":"Ø","&otilde;":"õ","&Otilde;":"Õ","&ouml;":"ö","&Ouml;":"Ö","&szlig;":"ß","&uacute;":"ú","&Uacute;":"Ú","&ugrave;":"ù","&Ugrave;":"Ù","&ucirc;":"û","&Ucirc;":"Û","&uuml;":"ü","&Uuml;":"Ü","&yuml;":"ÿ","&#*161;":"¡","&*#191;":"¿","&#*34;":'"',"&*#8220;":"“","&#*8221;":"”","&*#39;":"'","&#*8216;":"‘","&*#8217;":"’","&#*171;":"«","&*#187;":"»","&#*160;":" ","&*#38;":"&","&#*162;":"¢","&*#169;":"©","&#*247;":"÷","&*#62;":">","&#*60;":"<","&*#181;":"µ","&#*183;":"·","&*#182;":"¶","&#*177;":"±","&*#8364;":"€","&#*163;":"£","&*#174;":"®","&#*167;":"§","&*#153;":"™","&#*165;":"¥","&*#225;":"á","&#*193;":"Á","&*#224;":"à","&#*192;":"À","&*#226;":"â","&#*194;":"Â","&*#229;":"å","&#*197;":"Å","&*#227;":"ã","&#*195;":"Ã","&*#228;":"ä","&#*196;":"Ä","&*#230;":"æ","&#*198;":"Æ","&*#231;":"ç","&#*199;":"Ç","&*#233;":"é","&#*201;":"É","&*#232;":"è","&#*200;":"È","&*#234;":"ê","&#*202;":"Ê","&*#235;":"ë","&#*203;":"Ë","&*#237;":"í","&#*205;":"Í","&*#236;":"ì","&#*204;":"Ì","&*#238;":"î","&#*206;":"Î","&*#239;":"ï","&#*207;":"Ï","&*#241;":"ñ","&#*209;":"Ñ","&*#243;":"ó","&#*211;":"Ó","&*#242;":"ò","&#*210;":"Ò","&*#244;":"ô","&#*212;":"Ô","&*#248;":"ø","&#*216;":"Ø","&*#245;":"õ","&#*213;":"Õ","&*#246;":"ö","&#*214;":"Ö","&*#223;":"ß","&#*250;":"ú","&*#218;":"Ú","&#*249;":"ù","&*#217;":"Ù","&#*251;":"û","&*#219;":"Û","&#*252;":"ü","&*#220;":"Ü","&#*255;":"ÿ","&*#180;":"´","&#*96;":"`","&*#8212;":"—"
All other Substitute expressions I've seen in skins have fewer terms than this one, but more importantly, the more complete they are, the more likely they are to be broken. Here's why...

Many special character codes include a # character, which has special meaning to Rainmeter. Every time Rainmeter encounters one in a skin, it looks for another one, and treats everything in between as a variable to be substituted. So for instance, two substitution clauses like this:

""&#171;":"«","&#187;":"»""

...would be read by Rainmeter as "&" plus a variable called "#171;":"«","&#" plus "187;":"»"". Since there is no variable called "#171;":"«","&#", Rainmeter will replace it with "", and will read the two clauses as ""&187;":"»"", which is obviously not valid.

The more complete the substitution expression is, the more likely it is to have a # character in at least one of the match strings. None of the examples I looked at in other skins escaped these pound characters properly, and since the numeric character codes aren't seen in web pages as often as common ones like &amp;, the authors usually have no idea their substitution clauses are broken. Rainmeter will correctly parse all substitution match strings up to the first unescaped pound character it finds, so all the earlier substitutions work fine.

So we need to "escape" the pound symbols. Unfortunately this isn't as straightforward as it should be. Rainmeter doesn't have an escape character that works the "normal" way, such as the backslash used in *nix, C, and regular expressions, where the escape character forces the very next character to be treated literally, even if it would otherwise have special meaning. With Rainmeter, the only way to escape a # is to pair it with another one and a couple of asterisks (#* *#). This makes escaping pound characters in Rainmeter much more complicated than it needs to be. To follow the "#* *#" pattern, every odd-numbered # in your expression must have a * right after it, and every even-numbered one must have a * immediately before it. It's ugly, but it works, unless you have an odd number of #'s in your expression. In that case the final one is not escaped and the rest of the expression is ignored.
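The pairing rule above is easy to get wrong by hand in a long expression, so here is a minimal Python sketch of a checker for it. It only tests the rule exactly as stated in this post (odd-numbered # followed by *, even-numbered # preceded by *, even total); it does not run Rainmeter's own parser, and the function name first_bad_pound is made up for illustration.

```python
def first_bad_pound(expr: str) -> int:
    """Return the index of the first '#' that breaks the '#* ... *#'
    pattern described in the post, or -1 if every '#' is paired.

    Assumption: this encodes the rule as the post states it, not
    Rainmeter's actual parsing code.
    """
    count = 0
    last = -1
    for i, ch in enumerate(expr):
        if ch != "#":
            continue
        count += 1
        last = i
        if count % 2 == 1:
            # Odd-numbered '#': must be immediately followed by '*'.
            if expr[i + 1:i + 2] != "*":
                return i
        else:
            # Even-numbered '#': must be immediately preceded by '*'.
            if i == 0 or expr[i - 1] != "*":
                return i
    # An odd total leaves the final '#*' without a closing '*#'.
    return last if count % 2 == 1 else -1


if __name__ == "__main__":
    print(first_bad_pound('"&#*171;":"«","&*#187;":"»"'))  # properly paired
    print(first_bad_pound('"&#171;":"«","&#187;":"»"'))    # unescaped
```

Running it over a skin's Substitute value before loading the skin gives the position of the first # to look at.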

The solution to this is to add a final "dummy" match-replace clause at the end of the expression with a *# pair in it to close off the final escapement. Be careful to make this dummy match clause something that will never match, like ""*#!!!!!!!!!!!!":""".
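The escaping and the dummy clause can also be generated rather than typed. The following is a rough Python sketch, assuming the input is a Substitute expression with plain, unescaped # characters; escape_pounds and DUMMY_CLAUSE are hypothetical names, and the dummy clause is written in the usual "match":"replace" form.

```python
# Hypothetical closing clause: a match string that should never occur.
DUMMY_CLAUSE = ',"*#!!!!!!!!!!!!":""'


def escape_pounds(expr: str) -> str:
    """Rewrite every '#' in an unescaped expression to follow the
    '#* ... *#' pattern, appending a dummy clause if the count is odd.

    Assumption: expr contains no '#' that is already escaped.
    """
    out = []
    count = 0
    for ch in expr:
        if ch == "#":
            count += 1
            # Odd-numbered '#' opens a pair, even-numbered closes it.
            out.append("#*" if count % 2 == 1 else "*#")
        else:
            out.append(ch)
    escaped = "".join(out)
    if count % 2 == 1:
        # Close the last open '#*' so later text is not swallowed.
        escaped += DUMMY_CLAUSE
    return escaped


if __name__ == "__main__":
    print(escape_pounds('"&#171;":"«","&#187;":"»"'))
    print(escape_pounds('"&#171;":"«"'))
```

With an even number of # characters the expression is returned with no extra clause; with an odd number the dummy clause supplies the closing *#.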

One other thing to watch out for: make sure that none of the earlier match strings prevents a later one from ever being evaluated. The "&amp*#151;" match clause is at the very beginning of my substitution expression because one of the later ones strips out any "&amp" substrings before it gets a chance to look at it.
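The ordering problem can be reproduced outside Rainmeter. This small Python sketch assumes the clauses are applied one after another, left to right, each on the output of the previous one, which is what the paragraph above implies; the input string with a double-encoded &amp;#151; is an invented example, and apply_substitutions is a hypothetical helper.

```python
def apply_substitutions(text, clauses):
    """Apply (match, replacement) pairs in order, each one operating
    on the result of the previous pair."""
    for match, replacement in clauses:
        text = text.replace(match, replacement)
    return text


# General clause first: it rewrites "&amp;" before the longer,
# more specific clause ever gets to see it.
GENERAL_FIRST = [("&amp;", "&"), ("&amp;#151;", "\u2014")]

# Specific clause first: the double-encoded dash is handled before
# the general "&amp;" clause runs.
SPECIFIC_FIRST = [("&amp;#151;", "\u2014"), ("&amp;", "&")]

if __name__ == "__main__":
    sample = "A &amp;#151; B"
    print(apply_substitutions(sample, GENERAL_FIRST))   # leaves "&#151;" behind
    print(apply_substitutions(sample, SPECIFIC_FIRST))  # yields the dash
```

The general rule of thumb that falls out of this: put longer, more specific match strings ahead of any shorter match string they contain.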

The substitution clause I present here not only handles just about any special character codes you will ever encounter when parsing web pages or XML, it also actually works.

Slarti.
moshi
Posts: 1740
Joined: November 13th, 2012, 9:53 pm

Re: Robust special character substitutions

Post by moshi »

after a quick look:

sure, this is helpful. it is far, far away from being complete though.
not only does it lack many characters, it also ignores hex-encoded characters (those can be upper and lower case) and characters that are encoded like \u0026.

one could expand the list, but unfortunately there seems to be a limit of what Rainmeter can swallow. the list i currently use is about 2-3 times as long as yours, but still lacks a lot of the things i mentioned above. when i tried to add even more (cyrillic characters) Rainmeter crashed. :(
slartibartfast
Posts: 15
Joined: September 10th, 2012, 3:42 pm

Re: Robust special character substitutions

Post by slartibartfast »

Well, almost a year later I'm back to respond. :welcome:

You are correct, the list of supported characters is incomplete. However, it wasn't intended as drop-in code so much as a way to highlight why many skins have non-working substitution logic without their authors even realizing it. My FeedReader plugin handles over 150 special characters, but subsequent development of a mobile feed reading app (http://appworld.blackberry.com/webstore/content/29297891/?countrycode=CA&lang=en) revealed hundreds more that are not translated in the current release of my Rainmeter plugin.

There are so many special characters needing substitution that it is not practical to handle them with Rainmeter substitution pairs. Lua is an option, but better yet is to translate them in plugin C++/C# code before Rainmeter even sees them. The Rainmeter devs have asked me to integrate my FeedReader plugin with the base Rainmeter release. My mobile development tasks have prevented me from doing this yet, but when I do I'll be drastically increasing the number of special characters the plugin can translate using what I've learned from my BlackBerry 10 app. Currently that app handles nearly 350 special characters.
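To give a feel for what translating in code looks like compared with a long list of substitution pairs, here is a short Python sketch. It is not the FeedReader plugin's implementation (that is C++/C# and is not shown in this thread); it simply uses Python's standard html.unescape, which covers named, decimal and hex references in either case, plus a regular expression for the \u0026 style mentioned earlier in the thread. The name decode_special_characters is made up.

```python
import html
import re

# Literal backslash-u escapes such as \u0026 (four hex digits).
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_special_characters(text: str) -> str:
    """Decode character references and \\uXXXX escapes in one pass each.

    Illustrative only: surrogate pairs written as two \\uXXXX escapes
    are not combined in this sketch.
    """
    # Handles named (&eacute;), decimal (&#233;) and hex (&#xE9;, &#Xe9;)
    # references without listing them one by one.
    text = html.unescape(text)
    # Replace each \uXXXX escape with the character it names.
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    print(decode_special_characters("caf&eacute; &#233; &#xE9; &#Xe9;"))
    print(decode_special_characters(r"Tom \u0026 Jerry"))
```

The point of the comparison is that a decoder working from the numeric value needs no per-character table for the numeric forms, which is where the Substitute approach runs out of room.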