Parse captcha-protected Feed URL

Yincognito

Post by Yincognito »

So, I'm re-testing my feed aggregator skin (correcting some minor "bugs" in the process) by iterating through a list of feeds found here, to check that they display properly without any funky character leftovers, and I ran into an issue. For simplicity, here's a basic skin illustrating what happens:

Code:

[Variables]
;URL="http://slashdot.org/slashdot.rdf"
URL="http://defence-blog.com/feed"
;URL="http://defence-blog.com/feed?__cf_chl_captcha_tk__=..."

[Rainmeter]
Update=1000
DynamicWindowSize=1
AccurateText=1
BackgroundMode=2
SolidColor=0,0,0,255

;---Measures---

[MeasureRSSParent]
Measure=WebParser
URL="#URL#"
RegExp="(?siU)^(.*)$"
StringIndex=1

;---Meters---

[MeterRSSItemTitle]
Meter=String
MeasureName=MeasureRSSParent
W=144
H=36
FontSize=11
FontColor=255,255,255,255
StringStyle=Bold
AntiAlias=1
LeftMouseUpAction=["#URL#"]
DynamicVariables=1

The first (commented-out) URL variable is there only to confirm that the skin itself works, so you can skip uncommenting and trying it, if you like. The problem occurs when trying to get the feed content from defence-blog.com (the second URL), which is protected by a captcha (standard Cloudflare protection). In the browser, the address then becomes something like the third URL variable (with the encoded captcha token omitted and replaced by ...). Once you solve the captcha, though, the browser can open the feed through the basic URL without any issues, as you can test with the left mouse up action on the meter.

Long story short: for slashdot.org (the first feed URL) I'm getting the actual feed content (i.e. the <?xml version=... stuff), but for defence-blog.com I'm getting a bare HTML page without any feed content in it (i.e. the <!DOCTYPE html>... stuff). Is this behavior related to this excerpt from the manual:
WebParser cannot use cookies or other session-based authentication, so it cannot be used to retrieve information from web sites requiring a login. However, Webparser can be used on sites which support HTTP authentication.
or can it be fixed, so that WebParser gets the feed content of defence-blog.com just like any other feed out there? Maybe some special UserAgent, Header or Flags options are needed in the WebParser measure? I assume downloading the webpage locally won't yield a different result from what I already tried, right?
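To make the question concrete, this is roughly the kind of measure I have in mind. It's only a sketch: the UserAgent string and the Accept header value are placeholders I made up for illustration, and I have no evidence that either of them gets past the Cloudflare check.

```ini
[MeasureRSSParent]
Measure=WebParser
URL="#URL#"
; Assumption: a browser-like User-Agent; the exact string is a placeholder.
UserAgent=Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0
; Assumption: an extra request header asking for feed content types.
Header=Accept: application/rss+xml, application/xml;q=0.9, */*;q=0.8
; Same catch-all RegExp as in the basic skin above.
RegExp="(?siU)^(.*)$"
StringIndex=1
```

If the result is still the <!DOCTYPE html>... page with these options set, then the request headers are probably not what the site is checking.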
Of course, things work well and my feed aggregator skin properly displays said feed if I copy-paste the feed content from the browser into a local file, but naturally I'm looking to make this work directly, if possible.
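For reference, the local file workaround looks roughly like this. The file name is hypothetical; it stands for whatever file the feed content was pasted into, saved in the same folder as the skin.

```ini
[Variables]
; Hypothetical file name: the feed content copied by hand from the browser,
; saved next to the skin's .ini file.
URL="file://#CURRENTPATH#defence-blog-feed.xml"

[MeasureRSSParent]
Measure=WebParser
; WebParser reads the local file through the file:// scheme.
URL="#URL#"
RegExp="(?siU)^(.*)$"
StringIndex=1
```

The obvious downside is that the file only changes when I paste in a fresh copy, so the skin never sees new items on its own.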
EDIT: Oops, posted this in the Bugs & Feature Suggestions section by mistake. :oops: I wanted to post it in the Rainmeter Skins section, but well, it's too late for that now. My bad.
EDIT2: Thanks for moving it to the appropriate section. :thumbup: