Webparser - Episode Number

StArL0rd84 · Post by **StArL0rd84** » August 14th, 2017, 10:11 am

So I am parsing an RSS feed to get the current episode info of a show: https://showrss.info/show/262.rss
But it's proving difficult to get just the show name and episode number without the stuff trailing behind it.

The feed gives me a few places I can parse the information from:

<title>Game of Thrones 7x05 Eastwatch 1080p</title>
New episode: Game of Thrones S07E05 1080p WEB h264 TBS. Link: <a href="magnet:?
<tv:show_name>Game of Thrones</tv:show_name>
<tv:raw_title>Game of Thrones S07E05 1080p WEB h264 TBS</tv:raw_title>

But I only need: Game of Thrones S07E05
The text after the episode number is always different, so I can't really tell WebParser what to search for there.

Saw a recent post about string splitting:
https://forum.rainmeter.net/viewtopic.php?t=26507&p=138849#p138849
Which looks interesting, but I'm not really sure if I can use it in my case.

I tried parsing the whole line and then substituting the unwanted info with blanks, but that got really messy.
The info is always changing, so the user would constantly have to add new substitutions to keep the output free of clutter. Not elegant at all.

I should also mention that there's an upcoming-episode feed: https://showrss.info/show/schedule/262.rss
The info in there is much cleaner: <title>Game of Thrones 7x06</title>
I could just parse that, search between x and </title>,
then substitute 06 with 05.
BUT, what happens when no new episodes are scheduled for that season?
I would miss the last episode! No bueno.

ShowRSS_1.0.rmskin

FreeRaider · Post by **FreeRaider** » August 14th, 2017, 10:24 am

A question: is it always Game of Thrones SYEX?

(?siU)<tv:raw_title>(.*S\d+E\d+)\s+.*</tv:raw_title>
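
To try the pattern outside Rainmeter, here is a minimal Python sketch (not part of the skin). Python's `re` module has no equivalent of the PCRE `U` modifier, so it assumes the same behaviour can be had by writing each quantifier as lazy explicitly (`.*?`, `\d+?`, `\s+?`). The names `PATTERN` and `extract_show_and_episode` are made up for illustration, and the sample string is the raw title quoted in the first post.

```python
import re

# Same pattern as above. Python's re has no (?U) flag, so each
# quantifier is made lazy explicitly with a trailing "?".
PATTERN = re.compile(
    r"<tv:raw_title>(.*?S\d+?E\d+?)\s+?.*?</tv:raw_title>",
    re.S | re.I,  # the "s" and "i" modifiers
)


def extract_show_and_episode(feed_text):
    """Return e.g. 'Game of Thrones S07E05', or None if nothing matches."""
    match = PATTERN.search(feed_text)
    return match.group(1) if match else None


# Sample line taken from the feed excerpt in the first post.
sample = "<tv:raw_title>Game of Thrones S07E05 1080p WEB h264 TBS</tv:raw_title>"
print(extract_show_and_episode(sample))  # Game of Thrones S07E05
```

Note that the pattern requires at least one whitespace character after the episode number, so a raw title that ends right at S07E05 would not match.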

StArL0rd84 · Post by **StArL0rd84** » August 14th, 2017, 10:48 am

FreeRaider wrote: A question: is it always Game of Thrones SYEX?
(?siU)<tv:raw_title>(.*S\d+E\d+)\s+.*</tv:raw_title>

Wow, that works, thank you.
Yes, it's consistently Game of Thrones SYEX.
Trying to wrap my head around how it works now.

tnx again

(.*S\d+E\d+)\s+.*
I see some patterns in the regexp like \d+ and then there's the \s+ outside the parentheses.
hmm

FreeRaider · Post by **FreeRaider** » August 14th, 2017, 10:55 am

(.*S\d+E\d+)\s+.*
1st Capturing Group (.*S\d+E\d+)
  . matches any character
  * Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed
  S matches the character S literally (case insensitive)
  \d matches a digit (equal to [0-9])
  + Quantifier: matches between one and unlimited times, as few times as possible, expanding as needed
  E matches the character E literally (case insensitive)
  \d matches a digit (equal to [0-9])
  + Quantifier: matches between one and unlimited times, as few times as possible, expanding as needed
After the group:
\s matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier: matches between one and unlimited times, as few times as possible, expanding as needed
. matches any character
* Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed

(?siU)
s modifier: single line. Dot matches newline characters
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
U modifier: Ungreedy. The match becomes lazy by default. Now a ? following a quantifier makes it greedy
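
To see why the U modifier matters once the feed holds more than one item, here is a small Python sketch comparing lazy and greedy matching. It is only an illustration: Python's `re` has no `U` flag, so the lazy version spells out `?` after each quantifier, and the second feed item below is invented for the example.

```python
import re

# Two feed items: the first is from the original post,
# the second is invented for illustration.
feed = (
    "<tv:raw_title>Game of Thrones S07E05 1080p WEB h264 TBS</tv:raw_title>\n"
    "<tv:raw_title>Game of Thrones S07E04 720p WEB h264 TBS</tv:raw_title>\n"
)

flags = re.S | re.I  # the "s" and "i" modifiers

# Lazy quantifiers: roughly what the pattern does under the U modifier.
lazy = re.compile(r"<tv:raw_title>(.*?S\d+?E\d+?)\s+?.*?</tv:raw_title>", flags)
# Greedy quantifiers: the same pattern without U.
greedy = re.compile(r"<tv:raw_title>(.*S\d+E\d+)\s+.*</tv:raw_title>", flags)

lazy_capture = lazy.search(feed).group(1)
greedy_capture = greedy.search(feed).group(1)

print(lazy_capture)          # stops at the first episode number
print(repr(greedy_capture))  # runs across both items, up to the last episode number
```

With lazy matching the capture is just the first item's show name and episode number. With greedy matching, and with the dot allowed to match newlines, the capture swallows the first closing tag and continues up to the episode number of the second item.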

StArL0rd84 · Post by **StArL0rd84** » August 14th, 2017, 10:57 am

Awesome stuff, thanks
This'll be handy

FreeRaider · Post by **FreeRaider** » August 14th, 2017, 11:00 am

Glad to help.

Have a look at https://regex101.com/. It is a useful site for testing regular expressions.
