Links and titles won't be matched correctly, but the problem can be seen.
I tried to switch to using Google Reader, where DecodeCharacterReference works correctly, but it randomly intersperses item titles with linebreaks which interfere with the skin I'm building. And apparently regex is technically unable to skip characters in the middle of a capture.
I was thinking of trying <a href="(.*)">(.*)?\r(.*)</a> and piecing the title back together post facto, but I don't even want to think about that headache...
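For what it's worth, the "capture the whole thing, then clean it up afterwards" idea can be sketched outside Rainmeter. This is Python rather than a skin, purely to illustrate the approach; the sample markup, the `LINK_RE` name and the `extract_links` helper are all made up for the example, not anything from Google Reader or WebParser.

```python
import re

# Hypothetical feed snippet; the URL and title are made up for illustration.
sample = '<a href="http://example.com/item1">First half of title\r\nsecond half</a>'

# Non-greedy groups so one match does not swallow several links;
# re.DOTALL lets "." cross the line break inside the title.
LINK_RE = re.compile(r'<a href="(.*?)">(.*?)</a>', re.DOTALL)

def extract_links(text):
    """Return (url, title) pairs with line breaks in titles collapsed to single spaces."""
    results = []
    for url, title in LINK_RE.findall(text):
        # split() with no arguments breaks on any whitespace run, including \r\n
        clean_title = " ".join(title.split())
        results.append((url, clean_title))
    return results

print(extract_links(sample))
```

The point is that the capture itself never has to skip anything: the line break is captured along with the rest of the title and removed in a second step, which is roughly the job a substitute would do in a skin.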
Last edited by killall-q on February 6th, 2011, 4:48 am, edited 1 time in total.
As far as I know, DecodeCharacterReference does not remove things like <![CDATA[ and ]]>, nor was it ever claimed to. It is meant to turn &gt; into >. You are going to have to alter the RegExp to exclude that stuff from the capture, OR use a substitute to remove it.
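Both options can be sketched in a few lines. Again this is Python for illustration only, not Rainmeter syntax; the `<title>` element, the sample strings and the helper names `strip_cdata` and `title_from_item` are assumptions for the example, and `html.unescape` merely stands in for what DecodeCharacterReference does.

```python
import html
import re

# Option 1: "substitute" style. Remove the CDATA wrapper, keep what is inside.
CDATA_RE = re.compile(r'<!\[CDATA\[(.*?)\]\]>', re.DOTALL)

def strip_cdata(text):
    """Replace every <![CDATA[...]]> section with its inner text."""
    return CDATA_RE.sub(r'\1', text)

# Option 2: keep the wrapper out of the capture in the first place.
# The optional non-capturing groups match the CDATA markers if present.
TITLE_RE = re.compile(
    r'<title>\s*(?:<!\[CDATA\[)?(.*?)(?:\]\]>)?\s*</title>', re.DOTALL
)

def title_from_item(item_xml):
    """Return the decoded title text, or None if no <title> element is found."""
    match = TITLE_RE.search(item_xml)
    if match is None:
        return None
    # Decoding character references is a separate step from removing CDATA.
    return html.unescape(match.group(1))

print(strip_cdata('<![CDATA[Hello]]> world'))
print(title_from_item('<title><![CDATA[Ben &amp; Jerry]]></title>'))
```

Either way, the CDATA markers and the character references are handled by two different steps, which is why decoding alone leaves the markers in place.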
Yeah, thanks, it's working great. I really hated playing whack-a-mole with substitutes in the old days. Took about 3 months to catch them all with Slashdot's feed.
Ran into Slashdot's doubly fuddled HTML references. Ridiculous things like &amp;amp;
AFAIK only Slashdot does this, and web feed readers don't have a problem with it.
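A possible reason web feed readers cope is that decoding more than once is cheap. Here is a minimal Python sketch of that idea, assuming the text really was encoded twice; `decode_repeatedly` and its `max_passes` parameter are hypothetical names for the example, not part of any feed reader or of Rainmeter.

```python
import html

def decode_repeatedly(text, max_passes=3):
    """Decode character references until the text stops changing.

    max_passes is an arbitrary safety bound chosen for this sketch.
    """
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return text

print(decode_repeatedly("Q&amp;amp;A"))
print(decode_repeatedly("&amp;quot;Hi&amp;quot;"))
```

The catch is that repeated decoding will also mangle text that was genuinely meant to display a literal reference, so it is only worth applying to feeds known to double-encode.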