
WebParser can't connect to site

qwerky
Posts: 181
Joined: April 10th, 2014, 12:31 am
Location: Canada

qwerky » February 6th, 2019, 11:40 pm

Hi. WebParser gives a parse error, and RainRegExp says "Unable to connect to web site", for the following and similar URLs:
https://weather.gc.ca/warnings/report_e.html?on61#2121537321901824111201902060503wo1171cwto
It can connect to the base URL https://weather.gc.ca/warnings/report_e.html, but that page is just an error page, because the ?on61#2121537321901824111201902060503wo1171cwto part is missing. Is WebParser not able to handle such a URL?
jsmorley
Developer
Posts: 19301
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: WebParser can't connect to site

jsmorley » February 7th, 2019, 12:23 am

Yeah, I get a 400 "Bad Request" error when I connect to that URL outside of a browser. It could be that they have some protections built in to reduce being hammered by requests from outside a browser, since only in a browser can they show you their advertising on the page.

I tried using a UserAgent option to fake the user agent string to be the same as my browser, and that didn't work, so it may be that you just can't get to that URL with WebParser.
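
For anyone who wants to repeat the user-agent test outside Rainmeter, here is a minimal Python sketch. It is only an illustration, not what WebParser does internally: the BROWSER_UA string is a made-up example of a browser-like value, and build_request and fetch_status are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Assumption: an example browser-like string; substitute the one your own browser reports.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url):
    """Send the request and return the HTTP status code (needs network access)."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report their status code.
        return err.code
```

Calling fetch_status(...) on a URL returns the status code the server answered with. Note that Python's urllib.request separates any "#..." part from the URL before sending, so this sketch cannot by itself reproduce the 400 seen with the full URL.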

Re: WebParser can't connect to site

qwerky » February 7th, 2019, 6:18 pm

jsmorley wrote:
February 7th, 2019, 12:23 am
Yeah, I get a 400 "Bad Request" error when I connect to that URL outside of a browser. It could be that they have some protections built in to reduce being hammered by requests from outside a browser, since only in a browser can they show you their advertising on the page.

I tried using a UserAgent option to fake the user agent string to be the same as my browser, and that didn't work, so it may be that you just can't get to that URL with WebParser.
Thanks for trying and confirming. I had the same thoughts, and wondered whether UserAgent would help, but you already tried that. :-( The WebParser options also allow setting custom headers; does anyone know of a way to see all the headers that my browser is sending?
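
One possible way to see the headers a browser sends, besides the Network tab of the browser's developer tools, is to point the browser at a tiny local server that echoes them back. The following is a rough Python sketch under that assumption; EchoHeaders, format_headers, serve, and port 8000 are all names and values chosen here for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_headers(headers):
    """Render (name, value) pairs as one 'Name: value' line each."""
    return "\n".join(f"{name}: {value}" for name, value in headers)


class EchoHeaders(BaseHTTPRequestHandler):
    """Reply to every GET with the request headers as plain text."""

    def do_GET(self):
        body = format_headers(self.headers.items()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve on localhost until interrupted (Ctrl+C)."""
    HTTPServer(("127.0.0.1", port), EchoHeaders).serve_forever()
```

Run serve() and open http://localhost:8000/ in the browser; the page lists the headers of that request. Headers sent to localhost can differ from those sent to the real site (cookies and referrer in particular), so treat the list as a starting point.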

Re: WebParser can't connect to site

qwerky » February 7th, 2019, 9:12 pm

It seems that the #2121537321901824111201902060503wo1171cwto portion is what is preventing the page from loading. Using just https://weather.gc.ca/warnings/report_e.html?on61 does work, and I am able to parse it for the text in the warning.

What I suspect is happening is this: the "on" part is the province (Ontario), and the "61" refers to the city (Toronto). The trailing part (beginning with the "#") refers, I think, to the particular watch/warning. So without that trailing part, the link goes to the current/latest bulletin for that location. That is completely adequate for my needs.
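
To make that breakdown concrete, here is a small Python sketch that splits the URL into the pieces described above. The meaning attached to each piece (province code, location number) is only the guess from this post, and describe is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit

URL = ("https://weather.gc.ca/warnings/report_e.html"
       "?on61#2121537321901824111201902060503wo1171cwto")


def describe(url):
    """Split a warning URL into base, region letters, location digits, and fragment."""
    parts = urlsplit(url)
    # Assumption: the query is lowercase letters followed by digits, e.g. "on61".
    match = re.fullmatch(r"([a-z]+)(\d+)", parts.query)
    region, number = match.groups() if match else (None, None)
    return {
        "base": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "region": region,
        "location_number": number,
        "fragment": parts.fragment,
    }
```

For the URL above, describe(URL) gives "on" as the region, "61" as the location number, and everything after the "#" as the fragment.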

So now I need to figure out how to strip the trailing portion so it will load, and then put the parsed text into a tooltip. :D
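
The stripping step itself is simple: everything from the "#" onward is the URL fragment, which browsers normally keep to themselves rather than send to the server. As an illustration only (inside a skin this would have to be done with Rainmeter's own facilities, not Python), a minimal sketch using the standard library, with strip_fragment as a hypothetical helper name:

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Return the URL without its '#...' fragment; other parts are untouched."""
    return urldefrag(url).url
```

Applied to the full warning link, strip_fragment returns https://weather.gc.ca/warnings/report_e.html?on61, the form that loads.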