
⭐ Weather.com - Parsing the V3 JSON - NEW!

Our most popular Tips and Tricks from the Rainmeter Team and others
SilverAzide
Posts: 956
Joined: March 23rd, 2015, 5:26 pm

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by SilverAzide »

jsmorley wrote: August 4th, 2020, 5:57 pm I'd like to understand, and if possible fix, this issue with URLs and WebParser. This limitation isn't just in this skin, but presumably in any skin that passes parameters to a URL.
Yes. I'm a bit shocked this hasn't cropped up before now...
Gadgets Wiki GitHub More Gadgets...
jsmorley
Developer
Posts: 21387
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by jsmorley »

SilverAzide wrote: August 4th, 2020, 6:08 pm Yes. I'm a bit shocked this hasn't cropped up before now...
We were just saying that in IRC
SilverAzide
Posts: 956
Joined: March 23rd, 2015, 5:26 pm

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by SilverAzide »

jsmorley wrote: August 4th, 2020, 4:45 pmBut I have found the results from openstreetmap to be a bit hinky in general. I get really puzzling results a lot, and if you search for a location in a particular country, like "Moscow, Russia", you get results in Russian / Cyrillic, which is a bit annoying.
Welp, I've got a small gift for you! :)

In your location skin, add the lines needed to include the WeatherComJSONVariables.inc file. Then, in your main WebParser, do this:

Code: Select all

[MeasureLocations]
...
Header="Accept-Language: #Language#"
...
Then, you'll get the following. On the left is "en-US", on the right is "ru-RU", which is the default if you don't specify.
weathercode.jpg
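For anyone who wants to try the same trick outside Rainmeter, here is a minimal Python sketch of the request WebParser would send. The Nominatim endpoint is the one used elsewhere in this thread; "en-US" stands in for whatever you would put in #Language#:

```python
import urllib.request

# Build the search request with an explicit Accept-Language header;
# "en-US" asks Nominatim for Latin-script place names.
req = urllib.request.Request(
    "https://nominatim.openstreetmap.org/search.php?q=Moscow&format=json",
    headers={"Accept-Language": "en-US"},
)

# urllib stores header names capitalized, hence "Accept-language".
assert req.get_header("Accept-language") == "en-US"
```

Calling `urllib.request.urlopen(req)` would then return the JSON with English names, mirroring the left-hand screenshot.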
jsmorley
Developer
Posts: 21387
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by jsmorley »

SilverAzide wrote: August 4th, 2020, 6:44 pm Welp, I've got a small gift for you! :) [...] Then, you'll get the following. On the left is "en-US", on the right is "ru-RU", which is the default if you don't specify.
Huh... That's cool indeed! Thanks! Still some Cyrillic here and there, but a lot less...
Yincognito
Posts: 2629
Joined: February 27th, 2015, 2:38 pm
Location: Terra Yincognita

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by Yincognito »

SilverAzide wrote: August 4th, 2020, 4:17 pm OK, I need a little assistance with an issue illustrated by this skin but not directly weather.com related.

Can someone try using the location search feature in this skin and search for the text Wrocław? Take note of the "accented-L" symbol; it's not a Latin letter L. WebParser is returning nothing, and I can't figure out why. If you open your browser and use the exact string that WebParser is using (https://nominatim.openstreetmap.org/search.php?q=Wrocław&format=json), it works perfectly. Searching for https://nominatim.openstreetmap.org/search.php?q=Wroc%C5%82aw&format=json also works, of course, but that's not the URL that's going out... I tried a custom header of "Accepts-Language: pl-PL", but that didn't help (the header is mentioned in the OSM API docs).

Using the Latin "L" works, but that's not what normal users would enter. Is there some behind-the-scenes Unicode handling going on in the browsers but not in WebParser?
If I try this, WebParser returns a "bla bla URL is not UTF-8 encoded" error. If, however, while keeping the InputSearch variable set to Wrocław, I convert the UCS-2 LE BOM encoded file to UTF-8 in Notepad++ (which turns Wrocław into WrocĹ‚aw), WebParser does its job and displays the results properly.

So my guess is that it's not a WebParser issue, but an encoding one. WebParser is happy to take a string in whatever encoding and use it in the URL, and it will work; but since we use UCS-2 LE BOM in our Rainmeter .ini files and Nominatim expects a UTF-8 encoded string, issues like these will occur. The solution, in my view, is to find a way to convert the string from UCS-2 LE BOM to UTF-8 when passing it as a parameter in the URL. Another option would be some hypothetical Nominatim parameter that automatically converts whatever encoding we use to UTF-8 in the URL.
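The garbling described here is reproducible outside Rainmeter. A small Python sketch (assuming cp1250, the Windows Central European codepage, is the lens that renders the UTF-8 bytes of ł as Ĺ‚ — a plausible but unconfirmed reading of what Notepad++ showed):

```python
from urllib.parse import quote

city = "Wrocław"

# The UTF-8 bytes of "ł" are C5 82; decoded as cp1250 they render
# as "Ĺ‚", which is exactly the "WrocĹ‚aw" garbling seen above.
mojibake = city.encode("utf-8").decode("cp1250")
assert mojibake == "WrocĹ‚aw"

# Percent-encoding those same UTF-8 bytes yields the URL form that
# the browser silently sends and that Nominatim accepts.
assert quote(city) == "Wroc%C5%82aw"
```

In other words, the site is happy as long as it ultimately receives the UTF-8 bytes of ł, whether raw or percent-encoded; the trouble starts when the bytes are handed over in some other encoding.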
jsmorley
Developer
Posts: 21387
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by jsmorley »

Yincognito wrote: August 4th, 2020, 7:33 pm [...] So, my guess is that it's not a WebParser issue, but an encoding one. WebParser is happy to take whatever encoded string and use it in the URL and it will work, but since we use UCS-2 LE BOM in our Rainmeter .INI-s and Nominatim expects a UTF-8 encoded string, issues like these will occur. [...]
We are looking into this, and we still have not identified the exact culprit. It is clearly an "encoding" issue, but we are not certain that changing the built-in API calls that WebParser uses to send a request to a remote resource is the right solution. We don't believe it has anything to do with UTF-8 vs UTF-16 or any of that.

What we are currently investigating is either a change to the :EncodeURL section variable parameter or, if that has backwards-compatibility (BWC) issues, a new one, something like :PercentEncode.

The goal would be to have a way to obey the rules found at https://www.url-encode-decode.com/ or http://www.blooberry.com/indexdot/html/topics/urlencoding.htm for any measure value.

So in short, it would NOT encode the following characters:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - _ . ~

and WOULD percent-encode everything else in the measure value.

So München would become M%C3%BCnchen. Note that the un-encoded form will not work when passed to such a site as-is, but works perfectly well when passed percent-encoded.

So, if you are using InputText to capture some data that can contain non-ASCII characters, you would want to use that value as something like:

Code: Select all

URL=https://somesite.com?search=[SomeMeasure:PercentEncode]
In my view, :EncodeURL is entirely broken. It only seems to encode the URL "reserved" and "unsafe" characters, which is fine, but it doesn't do anything with the rest of the many thousands of non-ASCII characters that are not allowed in URL parameters by the standard. Your web browser percent-encodes those characters "under the covers", and you can see that it did so by going to a site using non-ASCII characters in the URL parameters, then copying the resulting URL from the address bar and pasting it into a text file. You will see that the parameters are actually percent-encoded by the rules before the request is sent to the destination site.

We are hesitant to try to build that into WebParser, as the rules are a bit more complicated than you would think. You NEVER want to encode any part of the URI (https://) or destination (somesite.com/somepath/) and there can be "control" characters embedded in parameters like ?search=one&search2=two where you don't want to encode the ? or &, but ONLY when they are in fact being used as control characters. It's a lot of work to get it right.
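The rule set sketched above — leave the unreserved characters alone, percent-encode everything else, and never touch the scheme, host, or genuine separators — is essentially what Python's urllib already does, so a hypothetical :PercentEncode could behave like this (somesite.com and the parameter names are just the placeholders from the post, not a real API):

```python
from urllib.parse import quote, urlencode

# quote() with safe="" keeps exactly A-Z a-z 0-9 - _ . ~ and
# percent-encodes everything else from the value's UTF-8 bytes.
assert quote("München", safe="") == "M%C3%BCnchen"

# Encode only the parameter *values*; the ?, &, and = separators,
# the scheme, and the host are left untouched.
params = {"search": "one two", "search2": "München"}
url = "https://somesite.com/somepath/?" + urlencode(params, quote_via=quote)
assert url == "https://somesite.com/somepath/?search=one%20two&search2=M%C3%BCnchen"
```

This is exactly why encoding has to happen per-value rather than over the whole URL: run the finished URL through the encoder and the separators themselves would be mangled.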

We have not made a final decision on this, so stay tuned.
Yincognito
Posts: 2629
Joined: February 27th, 2015, 2:38 pm
Location: Terra Yincognita

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by Yincognito »

jsmorley wrote: August 5th, 2020, 1:15 pm We are looking into this, and we still have not found the certain culprit. [...] We have not made a final decision on this, so stay tuned.
I see. Hmm... that looks tricky indeed. I guess not building this into WebParser is the right call, since, as I noticed, it seems fine with whatever encoded string one passes to it, depending on the .ini file encoding used, of course. Enhancing :EncodeURL, or adding a new :PercentEncode section variable parameter, is probably best, as long as the non-percent-encoded string is still available to the user (in other words, as long as the percent encoding is done internally). I guess the decision between the two also boils down to whether it is still useful to have a string where only the already-handled characters are encoded (e.g. maybe sometimes one wants only the commas, spaces and other such characters percent-encoded instead of every non-alphanumeric character, I don't know).

While we're at it, I'm sure you remember that I consider the current DecodeCharacterReference a bit... underpowered, to put it mildly. I'm not saying this to throw more work on your shoulders or anything; it just happens that the topics are somewhat related, so I had to mention it. No need to think about it though, my long Substitute, which does the decoding on its own, has worked like a charm for over 3 years now. In fact, I use DecodeCharacterReference exactly 0 times in all my skins. :D
Last edited by Yincognito on August 5th, 2020, 2:58 pm, edited 1 time in total.
Yincognito
Posts: 2629
Joined: February 27th, 2015, 2:38 pm
Location: Terra Yincognita

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by Yincognito »

jsmorley wrote: August 5th, 2020, 1:15 pmWe don't believe it has anything to do with UTF-8 vs UTF-16 or any of that.
It does, if one wants to avoid percent encoding and just use the characters themselves (as SilverAzide and I initially attempted). If it's a matter of percent-encoding the problematic characters, then obviously it doesn't matter anymore, as there will be no non-ASCII characters in the string anyway.
SilverAzide
Posts: 956
Joined: March 23rd, 2015, 5:26 pm

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by SilverAzide »

jsmorley wrote: August 5th, 2020, 1:15 pm We have not made a final decision on this, so stay tuned.
Just my useless opinion, FWIW, but I kind of like your idea of adding a :PercentEncode section variable. Seems like a nice addition, very K.I.S.S., and backward compatible... :-)
Yincognito
Posts: 2629
Joined: February 27th, 2015, 2:38 pm
Location: Terra Yincognita

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by Yincognito »

You are probably looking into some automatic way to do this through some C++ function and such (although I read somewhere it's not that easy, but then I might very well be mistaken, as I'm no expert in C++); however, maybe you will find some good use for a Unicode/UTF-8 character table (the percent-encoding elements are in the UTF-8 column, as far as I can tell). If it's not useful in implementing the solution, maybe it's useful in verifying the expected results.
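The table-driven idea can indeed be checked by hand: the percent-encoded form is nothing more than the UTF-8 bytes of each disallowed character written as %XX. A hypothetical reference implementation in Python (a sketch for verification, not the actual Rainmeter C++ code):

```python
# RFC 3986 "unreserved" characters, which are never percent-encoded.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set, using
    the character's UTF-8 bytes (the values you would read off the
    UTF-8 column of a Unicode character table)."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

# The two examples discussed in this thread:
assert percent_encode("München") == "M%C3%BCnchen"
assert percent_encode("Wrocław") == "Wroc%C5%82aw"
```

Comparing this function's output against whatever :PercentEncode (or an enhanced :EncodeURL) eventually produces would be a quick way to confirm the expected results.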