
[Solved] DecodeCharacterReference broken on straight URLs?

killall-q
Posts: 305
Joined: August 14th, 2009, 8:04 am


Post by killall-q

To test, comment out the corresponding original lines in illustro's Feeds.ini and insert these:

Code:

[Variables]
getItem=(?(?=.*<item).*<title.*>(.*)</title>.*<link.*>(.*)</link>)
feedURL=http://www.engadget.com/rss.xml

[measureFeed]
Url=#feedURL#
RegExp="(?siU)<title.*>(.*)</title>.*<link.*>(.*)</link>.*<item[^s].*<title.*>(.*)</title>.*<link.*>(.*)</link>#getItem##getItem##getItem##getItem##getItem##getItem##getItem#"
Links and titles won't be matched correctly, but the problem can be seen.

I tried switching to a Google Reader feed, where DecodeCharacterReference works correctly, but Google Reader randomly inserts line breaks into item titles, which interferes with the skin I'm building. And apparently a regex can't skip characters in the middle of a single capture.

I was thinking of trying something like <a href="(.*)">(.*)?\r(.*)</a> and piecing the title back together afterwards, but I don't even want to think about that headache...
Last edited by killall-q on February 6th, 2011, 4:48 am, edited 1 time in total.
jsmorley
Developer
Posts: 22631
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Re: DecodeCharacterReference not working on straight URLs?

Post by jsmorley

As far as I know, DecodeCharacterReference does not, and never claimed to, remove things like <![CDATA[ and ]]>. It is meant to turn &gt; into >. You are going to have to alter the RegExp to exclude that stuff from the capture, OR use a Substitute to remove it.
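A sketch of the first option, excluding the CDATA wrapper inside the RegExp itself: the patterns from the opening post with optional non-capturing groups placed around each title capture. It relies on PCRE's ungreedy (U) mode, in which `??` makes an optional group greedy, so the wrapper is consumed by the group rather than the capture. It assumes <![CDATA[ sits immediately after <title...> and ]]> immediately before </title>, with no whitespace in between; it is untested against the live feed.

```ini
[Variables]
; Same as the first post, plus (?:<!\[CDATA\[)?? and (?:\]\]>)?? around the title capture.
; Non-capturing groups do not change the capture numbering, so StringIndex values stay the same.
getItem=(?(?=.*<item).*<title.*>(?:<!\[CDATA\[)??(.*)(?:\]\]>)??</title>.*<link.*>(.*)</link>)

[measureFeed]
; Other options unchanged from the first post; only the RegExp is modified.
RegExp="(?siU)<title.*>(?:<!\[CDATA\[)??(.*)(?:\]\]>)??</title>.*<link.*>(.*)</link>.*<item[^s].*<title.*>(?:<!\[CDATA\[)??(.*)(?:\]\]>)??</title>.*<link.*>(.*)</link>#getItem##getItem##getItem##getItem##getItem##getItem##getItem#"
```

If a feed pads the wrapper with whitespace or line breaks, this pattern will capture it along with the text, and the Substitute approach is the more forgiving choice.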
killall-q

Re: DecodeCharacterReference not working on straight URLs?

Post by killall-q

Good to know it's not simply broken. I'll use a limited Substitute together with DecodeCharacterReference and keep watching for misses.
jsmorley

Re: DecodeCharacterReference not working on straight URLs?

Post by jsmorley

I use something like:

Substitute="<![CDATA[":"","]]>":""

Which seems to work fine for me on a Google News feed I have that is loaded with that CDATA stuff.
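For context, a sketch of where that Substitute might sit in a skin like illustro's Feeds.ini: on the child measure that pulls one item title out of the parent feed measure. Assumptions: the section name measureItem1Title is hypothetical, StringIndex=3 relies on the RegExp from the first post (capture 3 is the first item's title), the plugin lines follow the WebParser plugin syntax used by skins of this era, and DecodeCharacterReference is shown on the parent measure as in the original setup.

```ini
[Variables]
feedURL=http://www.engadget.com/rss.xml
getItem=(?(?=.*<item).*<title.*>(.*)</title>.*<link.*>(.*)</link>)

[measureFeed]
; Parent measure: downloads the feed and runs the RegExp from the first post
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=#feedURL#
RegExp="(?siU)<title.*>(.*)</title>.*<link.*>(.*)</link>.*<item[^s].*<title.*>(.*)</title>.*<link.*>(.*)</link>#getItem##getItem##getItem##getItem##getItem##getItem##getItem#"
; Assumption: character-reference decoding switched on here, on the parent
DecodeCharacterReference=1

[measureItem1Title]
; Hypothetical child measure reading capture 3 (first item title) from the parent
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[measureFeed]
StringIndex=3
; Remove the CDATA wrapper that decoding leaves in place
Substitute="<![CDATA[":"","]]>":""
```

Each additional item title measure would repeat the same Substitute line with its own StringIndex.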
killall-q

Re: DecodeCharacterReference not working on straight URLs?

Post by killall-q

Yeah, thanks, it's working great. I really hated playing whack-a-mole with substitutes in the old days. Took about 3 months to catch them all with Slashdot's feed.
killall-q

Re: [Solved] DecodeCharacterReference broken on straight URL

Post by killall-q

Ran into Slashdot's doubly fuddled (double-encoded) HTML references. Ridiculous things like &amp;amp;
AFAIK only Slashdot does this, and web feed readers don't have a problem with it.

So to fix these

&amp; &mdash; &ldquo; &rdquo; &lsquo; &rsquo; <br> <em> </em> <nobr> </nobr> <wbr>

I'm now using this

Code:

SubstituteFeed="":"››› FEED OFFLINE ‹‹‹","<![CDATA[":"","]]>":"","&amp;":"&","&mdash;":"—","&ldquo;":"“","&rdquo;":"”","&lsquo;":"‘","&rsquo;":"’","<br>":"","<em>":"","</em>":"","<nobr>":"","</nobr>":"","<wbr>":""
At least it's only half as long as it used to be. Wonder when I'll run into &quot;
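A sketch of how a shared list like SubstituteFeed can be consumed, assuming it is defined under [Variables]: each item measure references the variable, so the list is maintained in one place (adding a pair for &quot; later means editing a single line). The measure name, StringIndex, and plugin lines are illustrative assumptions based on the RegExp in the first post.

```ini
[measureItem1Title]
; Hypothetical child measure for the first item title (capture 3 in the first post's RegExp)
Measure=Plugin
Plugin=Plugins\WebParser.dll
Url=[measureFeed]
StringIndex=3
; Rainmeter replaces #SubstituteFeed# with the full pair list from [Variables]
Substitute=#SubstituteFeed#
```

The leading "":"››› FEED OFFLINE ‹‹‹" pair only fires when the measure's value is an empty string, which is why it can safely share the list with the entity replacements.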