
⭐ Weather.com - Parsing the V3 JSON - NEW!

Our most popular Tips and Tricks from the Rainmeter Team and others
SilverAzide
Posts: 956
Joined: March 23rd, 2015, 5:26 pm

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by SilverAzide »

jsmorley wrote: August 4th, 2020, 5:57 pm I'd like to understand, and if possible fix, this issue with URLs and WebParser. This limitation isn't just in this skin, but presumably in any skin that passes parameters to a URL.
Yes. I'm a bit shocked this hasn't cropped up before now...
Gadgets Wiki GitHub More Gadgets...
jsmorley
Developer
Posts: 21387
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by jsmorley »

SilverAzide wrote: August 4th, 2020, 6:08 pm Yes. I'm a bit shocked this hasn't cropped up before now...
We were just saying that in IRC
SilverAzide
Posts: 956
Joined: March 23rd, 2015, 5:26 pm

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by SilverAzide »

jsmorley wrote: August 4th, 2020, 4:45 pmBut I have found the results from openstreetmap to be a bit hinky in general. I get really puzzling results a lot, and if you search for a location in a particular country, like "Moscow, Russia", you get results in Russian / Cyrillic, which is a bit annoying.
Welp, I've got a small gift for you! :)

In your location skin, add the lines needed to include the WeatherComJSONVariables.inc file. Then, in your main WebParser, do this:

Code: Select all

[MeasureLocations]
...
Header="Accept-Language: #Language#"
...
Then, you'll get the following. On the left is "en-US", on the right is "ru-RU", which is the default if you don't specify.
weathercode.jpg
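For anyone who wants to try the same trick outside Rainmeter, here is a minimal Python sketch of the request WebParser would send. The Nominatim endpoint is the one used elsewhere in this thread; "en-US" stands in for whatever you would put in #Language#:

```python
import urllib.request

# Build the search request with an explicit Accept-Language header;
# "en-US" asks Nominatim for Latin-script place names.
req = urllib.request.Request(
    "https://nominatim.openstreetmap.org/search.php?q=Moscow&format=json",
    headers={"Accept-Language": "en-US"},
)

# urllib stores header names capitalized, hence "Accept-language".
assert req.get_header("Accept-language") == "en-US"
```

Calling `urllib.request.urlopen(req)` would then return the JSON with English names, mirroring the left-hand screenshot.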
jsmorley
Developer
Posts: 21387
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by jsmorley »

SilverAzide wrote: August 4th, 2020, 6:44 pm Welp, I've got a small gift for you! :) [...] Then, you'll get the following. On the left is "en-US", on the right is "ru-RU", which is the default if you don't specify.
Huh... That's cool indeed! Thanks! Still some Cyrillic here and there, but a lot less...
Yincognito
Posts: 2629
Joined: February 27th, 2015, 2:38 pm
Location: Terra Yincognita

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by Yincognito »

SilverAzide wrote: August 4th, 2020, 4:17 pm OK, I need a little assistance with an issue illustrated by this skin but not directly weather.com related.

Can someone try using the location search feature in this skin and search for the text Wrocław? Take note of the "accented-L" symbol; it's not a Latin letter L. WebParser is returning nothing, and I can't figure out why. If you open your browser and use the exact string that WebParser is using (https://nominatim.openstreetmap.org/search.php?q=Wrocław&format=json), it works perfectly. Searching for https://nominatim.openstreetmap.org/search.php?q=Wroc%C5%82aw&format=json also works, of course, but that's not the URL that's going out... I tried a custom header of "Accepts-Language: pl-PL", but that didn't help (the header is mentioned in the OSM API docs).

Using the Latin "L" works, but that's not what normal users would enter. Is there some behind-the-scenes Unicode handling going on in the browsers but not in WebParser?
If I try this, WebParser returns a "bla bla URL is not UTF-8 encoded" error. If, however, while keeping the InputSearch variable set to Wrocław, I convert the UCS-2 LE BOM encoded file to UTF-8 in Notepad++ (which turns Wrocław into WrocĹ‚aw), WebParser does its job and displays the results properly.

So my guess is that it's not a WebParser issue, but an encoding one. WebParser is happy to take a string in whatever encoding and use it in the URL, and it will work; but since we use UCS-2 LE BOM in our Rainmeter .ini files and Nominatim expects a UTF-8 encoded string, issues like these will occur. The solution, in my view, is to find a way to convert the string from UCS-2 LE BOM to UTF-8 when passing it as a parameter in the URL. Another option would be some hypothetical Nominatim parameter that automatically converts whatever encoding we use to UTF-8 in the URL.
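The garbling described here is reproducible outside Rainmeter. A small Python sketch (assuming cp1250, the Windows Central European codepage, is the lens that renders the UTF-8 bytes of ł as Ĺ‚ — a plausible but unconfirmed reading of what Notepad++ showed):

```python
from urllib.parse import quote

city = "Wrocław"

# The UTF-8 bytes of "ł" are C5 82; decoded as cp1250 they render
# as "Ĺ‚", which is exactly the "WrocĹ‚aw" garbling seen above.
mojibake = city.encode("utf-8").decode("cp1250")
assert mojibake == "WrocĹ‚aw"

# Percent-encoding those same UTF-8 bytes yields the URL form that
# the browser silently sends and that Nominatim accepts.
assert quote(city) == "Wroc%C5%82aw"
```

In other words, the site is happy as long as it ultimately receives the UTF-8 bytes of ł, whether raw or percent-encoded; the trouble starts when the bytes are handed over in some other encoding.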
jsmorley
Developer
Posts: 21387
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by jsmorley »

Yincognito wrote: August 4th, 2020, 7:33 pm [...] So, my guess is that it's not a WebParser issue, but an encoding one. WebParser is happy to take whatever encoded string and use it in the URL and it will work, but since we use UCS-2 LE BOM in our Rainmeter .INI-s and Nominatim expects a UTF-8 encoded string, issues like these will occur. [...]
We are looking into this, and we still have not identified the exact culprit. It is clearly an "encoding" issue, but we are not certain that changing the built-in API calls that WebParser uses to send a request to a remote resource is the right solution. We don't believe it has anything to do with UTF-8 vs UTF-16 or any of that.

What we are currently investigating is either a change to the :EncodeURL section variable parameter or, if that has backwards-compatibility (BWC) issues, a new one, something like :PercentEncode.

The goal would be to have a way to obey the rules found at https://www.url-encode-decode.com/ or http://www.blooberry.com/indexdot/html/topics/urlencoding.htm for any measure value.

So in short, it would NOT encode the following characters:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - _ . ~

and WOULD percent-encode everything else in the measure value.

So München would become M%C3%BCnchen. Note that the un-encoded form will not work when passed to such a site as-is, but works perfectly well when passed percent-encoded.

So, if you are using InputText to capture some data that can contain non-ASCII characters, you would want to use that value as something like:

Code: Select all

URL=https://somesite.com?search=[SomeMeasure:PercentEncode]
In my view, :EncodeURL is entirely broken. It only seems to encode the URL "reserved" and "unsafe" characters, which is fine, but it doesn't do anything with the rest of the many thousands of non-ASCII characters that are not allowed in URL parameters by the standard. Your web browser percent-encodes those characters "under the covers", and you can see that it did so by going to a site using non-ASCII characters in the URL parameters, then copying the resulting URL from the address bar and pasting it into a text file. You will see that the parameters are actually percent-encoded by the rules before the request is sent to the destination site.

We are hesitant to try to build that into WebParser, as the rules are a bit more complicated than you would think. You NEVER want to encode any part of the URI (https://) or destination (somesite.com/somepath/) and there can be "control" characters embedded in parameters like ?search=one&search2=two where you don't want to encode the ? or &, but ONLY when they are in fact being used as control characters. It's a lot of work to get it right.
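The rule set sketched above — leave the unreserved characters alone, percent-encode everything else, and never touch the scheme, host, or genuine separators — is essentially what Python's urllib already does, so a hypothetical :PercentEncode could behave like this (somesite.com and the parameter names are just the placeholders from the post, not a real API):

```python
from urllib.parse import quote, urlencode

# quote() with safe="" keeps exactly A-Z a-z 0-9 - _ . ~ and
# percent-encodes everything else from the value's UTF-8 bytes.
assert quote("München", safe="") == "M%C3%BCnchen"

# Encode only the parameter *values*; the ?, &, and = separators,
# the scheme, and the host are left untouched.
params = {"search": "one two", "search2": "München"}
url = "https://somesite.com/somepath/?" + urlencode(params, quote_via=quote)
assert url == "https://somesite.com/somepath/?search=one%20two&search2=M%C3%BCnchen"
```

This is exactly why encoding has to happen per-value rather than over the whole URL: run the finished URL through the encoder and the separators themselves would be mangled.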

We have not made a final decision on this, so stay tuned.
Yincognito
Posts: 2629
Joined: February 27th, 2015, 2:38 pm
Location: Terra Yincognita

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by Yincognito »

jsmorley wrote: August 5th, 2020, 1:15 pm We are looking into this, and we still have not found the certain culprit. [...] We have not made a final decision on this, so stay tuned.
I see. Hmm... that looks tricky indeed. I guess not building this into WebParser is the right call, since, as I noticed, it seems fine with whatever encoded string one passes to it, depending on the .ini file encoding used, of course. Enhancing :EncodeURL, or adding a new :PercentEncode section variable parameter, is probably best, as long as the non-percent-encoded string is still available to the user (in other words, as long as the percent encoding is done internally). I guess the decision between the two also boils down to whether it is still useful to have a string where only the already-handled characters are encoded (e.g. maybe sometimes one wants only the commas, spaces and other such characters percent-encoded instead of every non-alphanumeric character, I don't know).

While we're at it, I'm sure you remember that I consider the current DecodeCharacterReference a bit... underpowered, to put it mildly. I'm not saying this to throw more work on your shoulders or anything; it just happens that the topics are somewhat related, so I had to mention it. No need to think about it though, my long Substitute, which does the decoding on its own, has worked like a charm for over 3 years now. In fact, I use DecodeCharacterReference exactly 0 times in all my skins. :D
Last edited by Yincognito on August 5th, 2020, 2:58 pm, edited 1 time in total.
Yincognito
Posts: 2629
Joined: February 27th, 2015, 2:38 pm
Location: Terra Yincognita

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by Yincognito »

jsmorley wrote: August 5th, 2020, 1:15 pmWe don't believe it has anything to do with UTF-8 vs UTF-16 or any of that.
It does, if one wants to avoid percent encoding and just use the characters themselves (as SilverAzide and I initially attempted). If it's a matter of percent-encoding the problematic characters, then obviously it doesn't matter anymore, as there will be no non-ASCII characters in the string anyway.
SilverAzide
Posts: 956
Joined: March 23rd, 2015, 5:26 pm

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by SilverAzide »

jsmorley wrote: August 5th, 2020, 1:15 pm We have not made a final decision on this, so stay tuned.
Just my useless opinion, FWIW, but I kind of like your idea of adding a :PercentEncode section variable. Seems like a nice addition, very K.I.S.S., and backward compatible... :-)
Yincognito
Posts: 2629
Joined: February 27th, 2015, 2:38 pm
Location: Terra Yincognita

Re: ⭐ Weather.com - Parsing the V3 JSON - NEW!

Post by Yincognito »

You are probably looking into some automatic way to do this through some C++ function and such (although I read somewhere it's not that easy, but then I might very well be mistaken, as I'm no expert in C++); however, maybe you will find some good use for a Unicode/UTF-8 character table (the percent-encoding elements are in the UTF-8 column, as far as I can tell). If it's not useful in implementing the solution, maybe it's useful in verifying the expected results.
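The table-driven idea can indeed be checked by hand: the percent-encoded form is nothing more than the UTF-8 bytes of each disallowed character written as %XX. A hypothetical reference implementation in Python (a sketch for verification, not the actual Rainmeter C++ code):

```python
# RFC 3986 "unreserved" characters, which are never percent-encoded.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set, using
    the character's UTF-8 bytes (the values you would read off the
    UTF-8 column of a Unicode character table)."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

# The two examples discussed in this thread:
assert percent_encode("München") == "M%C3%BCnchen"
assert percent_encode("Wrocław") == "Wroc%C5%82aw"
```

Comparing this function's output against whatever :PercentEncode (or an enhanced :EncodeURL) eventually produces would be a quick way to confirm the expected results.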