Reader

Saiho
Posts: 46
Joined: September 24th, 2012, 2:11 pm

Re: Reader

Post by Saiho »

Ah, that's what I forgot, silly me. Thank you again for all the help!
Kaelri
Developer
Posts: 1721
Joined: July 25th, 2009, 4:47 am

Re: Reader

Post by Kaelri »

I'm going to post what I currently have for the next script update. Right now, virtually all of my spare time is devoted to finishing the new Rainmeter documentation website (along with the rest of the team). But I know there's a lot of demand right now for the "sorting" capability in particular. So here, at least, is a minimally-functional version of that.

I want to emphasize that this is an unfinished script that may be bug-prone. I can't afford to spend too much time supporting it, although of course feedback and bug reports are always welcome. Hopefully I'll have time to finish a more presentable script in the coming weeks.
Reader.zip
New options related to sorting:
  • CombineFeeds: If 1, combines all feeds into a single display. If no sorting options are given, the combined feed automatically uses the default sorting rules for feed #1. For RSS/Atom feeds, the default is "sort by publication date, descending". For Gcal/RTM calendars, the default is "sort by event date, ascending, and starting from the present" (it excludes events that have already passed).
  • Sort: If 1, sorts the currently-displayed feed. Sorting is automatically enabled when feeds are combined (above).
  • SortDir: 1 for ascending, -1 for descending.
  • SortKey: the item metadata tag that is used to sort the feed. Supports Date or Event. Technically works on other keys, although anything that isn't a number will give you errors.
  • SortRange: If Future, items with publication or event timestamps before the present time are excluded from display. If Past, items with timestamps after the present time are excluded. (I don't know why anyone would want to use "past," but it's available.)
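For anyone curious how the sorting options fit together, here is a minimal sketch of the idea. It is not the code from Reader.zip: the function name SortItems, the opts table and the explicit "now" argument are all made up for illustration, and it assumes every item stores its timestamp as a number (or numeric string) under the key being sorted.

```lua
-- Hypothetical helper, not taken from the Reader script.
-- items: list of tables such as { Date = 1357000000 }
-- opts:  { SortDir = 1 or -1, SortKey = 'Date' or 'Event', SortRange = 'All', 'Future' or 'Past' }
-- now:   the current timestamp, passed in so the function is easy to test
function SortItems(items, opts, now)
	local key   = opts.SortKey or 'Date'
	local dir   = opts.SortDir or -1
	local range = opts.SortRange or 'All'
	local out = {}
	for _, item in ipairs(items) do
		local t = tonumber(item[key])
		-- Assumption: non-numeric values are skipped instead of raising an error.
		if t then
			local keep = range == 'All'
				or (range == 'Future' and t >= now)
				or (range == 'Past' and t <= now)
			if keep then out[#out + 1] = item end
		end
	end
	table.sort(out, function(a, b)
		if dir == 1 then
			return tonumber(a[key]) < tonumber(b[key])   -- ascending
		end
		return tonumber(a[key]) > tonumber(b[key])       -- descending
	end)
	return out
end
```

Calling SortItems(items, { SortDir = 1, SortRange = 'Future' }, os.time()) would give the "ascending, starting from the present" behaviour described for calendars.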
Also, the "MinItems" and "MaxItems" options have been split up into a more logical division:
  • MaxKeepItems: the maximum number of items kept in the database and/or saved to the history file.
  • MaxShowItems: the maximum number of items to create variables for.
  • MinShowItems: the minimum number of items to create variables for (sets as blank if there aren't enough items).
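Roughly, the three limits could interact like the sketch below. The function names TrimItems and BuildDescVars are hypothetical, and the real script sets Rainmeter variables rather than returning a table; this only illustrates the keep/show split and the blank padding.

```lua
-- Hypothetical sketch, not the script's actual code.

-- MaxKeepItems: drop the oldest entries (assumed to sit at the end of the list).
function TrimItems(items, maxKeep)
	while #items > maxKeep do
		table.remove(items)
	end
	return items
end

-- MinShowItems / MaxShowItems: decide how many ItemNDesc values to create.
function BuildDescVars(items, minShow, maxShow)
	local vars = {}
	local count = math.max(minShow, math.min(#items, maxShow))
	for i = 1, count do
		local item = items[i]
		-- Slots beyond the end of the feed are set to an empty string.
		vars['Item' .. i .. 'Desc'] = item and item.Desc or ''
	end
	return vars
end
```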
Also, note that "ItemNDate" and "ItemNEvent" are now created as separate variables, so you'll need to take that into account if you're using this feed on a Google Calendar or Remember The Milk event feed.

Finally, you should no longer use DecodeCharacterReference=1 on the WebParser measure for this script.

There are a ton of other additions, too, but I'll leave them undocumented for now. ;)
moshi
Posts: 1740
Joined: November 13th, 2012, 9:53 pm

Re: Reader

Post by moshi »

some feeds (especially newspapers) also include links or images inside the description. would be nice if the Reader script could filter these out. an example that has both is: http://www.information.dk/feed
moshi
Posts: 1740
Joined: November 13th, 2012, 9:53 pm

Re: Reader

Post by moshi »

solved this problem for me. you'll probably find a more elegant way, but it might give you some ideas.

first of all i added this to WebparserSubstitute, so quotation marks won't break the substitutes in the next step:

Code:

'"':"“¢¢“"
then i created some measures like this:

Code:

[MeasureItem1Description]
Measure=Calc
Formula=1
DynamicVariables=1
RegExpSubstitute=1
Substitute="1":"#Item1Desc#","<(.*?)>":"","^$":"no description","^ +":"","“¢¢“":'"'
and use the measures instead of the variables in the meters
moshi
Posts: 1740
Joined: November 13th, 2012, 9:53 pm

Re: Reader

Post by moshi »

as i don't understand Lua and am not sure how your script deals with tabs and line breaks, i added a little more to the measures:

Code:

[MeasureItem1Description]
Measure=Calc
Formula=1
DynamicVariables=1
RegExpSubstitute=1
Substitute="1":"#Item1Desc#","\t":" ","\n":" ","\r":" ","<(.*?)>":"","^$":"no description","^ +":""," +":" ","“¢¢“":'"'
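For comparison, roughly the same clean-up could be done on the Lua side with string.gsub. This is only a sketch of the Substitute chain above, assuming the description is already available as a plain string; CleanDescription is a made-up name and does not exist in the Reader script. The quotation-mark workaround is not needed here because nothing passes through a Substitute option.

```lua
-- Hypothetical helper mirroring the Substitute chain above.
function CleanDescription(s)
	s = s:gsub('[\t\r\n]', ' ')   -- tabs and line breaks become spaces
	s = s:gsub('<.->', '')        -- remove anything that looks like a tag
	s = s:gsub('^ +', '')         -- trim leading spaces
	s = s:gsub(' +', ' ')         -- collapse runs of spaces
	if s == '' then
		return 'no description'
	end
	return s
end
```

Like the measure version, this strips every tag blindly, so it only suits skins that show the description as a single line of plain text.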
Kaelri
Developer
Posts: 1721
Joined: July 25th, 2009, 4:47 am

Re: Reader

Post by Kaelri »

This issue is hovering somewhere on my to-do list. Since feed descriptions (at least for RSS and Atom) often include fully-valid HTML, this will be a little more complex than simply stripping all tags. (And improperly-encoded < and > characters are common, which complicates the matter.) I'll probably just end up checking the string against a complete list of standard HTML tags.

As for line breaks: right now, the script looks for any sequence of multiple consecutive whitespace characters, and condenses them into a single space. I'll probably expand that process to strip all \r and \n, then look for <br> and <p> tags to re-insert line breaks in the appropriate places.
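One way the plan above might look in Lua is sketched here. It is a guess at the approach, not the script's implementation: StripKnownTags and KnownTags are hypothetical names, and the tag list is a deliberately tiny subset rather than the complete list of standard HTML tags.

```lua
-- Illustrative subset only; a real list would cover all standard HTML tags.
KnownTags = { a = true, b = true, i = true, em = true, strong = true,
	img = true, div = true, span = true, font = true, p = true, br = true }

function StripKnownTags(s)
	-- Drop the feed's own line breaks; they are re-created from <br> and <p>.
	s = s:gsub('[\r\n]', ' ')
	s = s:gsub('<(/?)(%a%w*)[^>]*>', function(slash, name)
		name = name:lower()
		if name == 'br' or (name == 'p' and slash == '') then
			return '\n'
		elseif KnownTags[name] then
			return ''
		end
		-- Returning nothing leaves unknown "tags" and stray < > untouched.
	end)
	s = s:gsub('[ \t]+', ' ')     -- condense runs of spaces and tabs
	s = s:gsub(' ?\n ?', '\n')    -- no spaces around the new line breaks
	s = s:gsub('^%s+', '')
	s = s:gsub('%s+$', '')
	return s
end
```

Because only names on the list are removed, text such as "1 < 2" or an unrecognised <foo> survives, which is the point of checking against a known list instead of stripping everything between angle brackets.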

Thanks for your feedback. :)
moshi
Posts: 1740
Joined: November 13th, 2012, 9:53 pm

Re: Reader

Post by moshi »

i just noticed that my solution breaks the links in atom feeds.
fixed this by adding this to WebparserSubstitute:

Code:

"'":"¢¢¢¢"
and replacing:

Code:

["\']
with

Code:

[“¢][“¢][“¢][“¢]
in the Lua script

and of course also adding

Code:

"¢¢¢¢":"'"
to the substitutes of the measures.
moshi
Posts: 1740
Joined: November 13th, 2012, 9:53 pm

Re: Reader

Post by moshi »

Kaelri wrote:This issue is hovering somewhere on my to-do list. Since feed descriptions (at least for RSS and Atom) often include fully-valid HTML, this will be a little more complex than simply stripping all tags. (And improperly-encoded < and > characters are common, which complicates the matter.) I'll probably just end up checking the string against a complete list of standard HTML tags.
...
definitely. what i posted is just a personal solution for a skin that displays a single line as the description.
moshi
Posts: 1740
Joined: November 13th, 2012, 9:53 pm

Re: Reader

Post by moshi »

Kaelri wrote:And improperly-encoded < and > characters are common, which complicates the matter.
currently i try to fix these by replacing them in WebparserSubstitute.
moshi
Posts: 1740
Joined: November 13th, 2012, 9:53 pm

Re: Reader

Post by moshi »

just some snippets, that will be in the next version of my Sphynx theme. makes the descriptions of Google News and Google Finance feeds a little nicer:

Code:

	if Class == 'RSS' then
		if s:match('http://news.google.com/news') then
			return 'GoogleNews'
		elseif s:match('http://www.google.com/finance') then
			return 'GoogleFinance'	
		else
			return 'RSS'
		end
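
A side note on the snippet above: s:match treats its argument as a Lua pattern, so each "." in the URL matches any character. That works, but a plain-text search is a little stricter. A possible standalone version is sketched below; DetectRSSVariant is a hypothetical name and is not part of the Reader script or the Sphynx theme.

```lua
-- Hypothetical standalone version of the feed-type check.
function DetectRSSVariant(s)
	-- The fourth argument (true) turns off pattern matching, so "." is literal.
	if s:find('http://news.google.com/news', 1, true) then
		return 'GoogleNews'
	elseif s:find('http://www.google.com/finance', 1, true) then
		return 'GoogleFinance'
	end
	return 'RSS'
end
```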

Code:

Types = {
	RSS = {
		Link        = { '<link.->(.-)</link>' },
		Item        = '<item.-</item>',
		ItemID      = { '<guid.->(.-)</guid>' },
		ItemLink    = { '<link.->(.-)</link>' },
		ItemDesc    = { '<description.->(.-)</description>' },
		ItemDate    = { '<pubDate.->(.-)</pubDate>', '<dc:date>(.-)</dc:date>' },
		ParseDate   = { ParseDateRSS, ParseDateAtom },
		DefaultSort = { Dir = -1, Key = 'Date', Range = 'All' },
		},
	GoogleFinance = {
		Link        = { '<link.->(.-)</link>' },
		Item        = '<item.-</item>',
		ItemID      = { '<guid.->(.-)</guid>' },
		ItemLink    = { '<link.->(.-)</link>' },
		ItemDesc    = { '<description.->.-<div style=.-width:80.->(.-)</description>' },
		ItemDate    = { '<pubDate.->(.-)</pubDate>', '<dc:date>(.-)</dc:date>' },
		ParseDate   = { ParseDateRSS, ParseDateAtom },
		DefaultSort = { Dir = -1, Key = 'Date', Range = 'All' },
		},		
	GoogleNews = {
		Link        = { '<link.->(.-)</link>' },
		Item        = '<item.-</item>',
		ItemID      = { '<guid.->(.-)</guid>' },
		ItemLink    = { '<link.->(.-)</link>' },
		ItemDesc    = { '<description.->.-<font size=.-1.->.-<font size=.-1.->.-<font size=.-1.->(.-)</description>' },
		ItemDate    = { '<pubDate.->(.-)</pubDate>', '<dc:date>(.-)</dc:date>' },
		ParseDate   = { ParseDateRSS, ParseDateAtom },
		DefaultSort = { Dir = -1, Key = 'Date', Range = 'All' },
		},