RegExp matching error (-1), (-8)

Re: RegExp matching error (-1), (-8)

qwerky » March 25th, 2019, 7:27 pm

Yincognito wrote:
March 25th, 2019, 2:00 am
As a matter of fact, I do have some tips for dealing with elements, groups, whatever, that come in a different order. I'm successfully applying the trick in my feeds skin because, as you probably know, RSS/Atom elements can be required or optional, can be written in a different order, can have additional (and similar) elements, etc.

Regex doesn't cope well with elements that can come in a different order. You can use | (the regex OR) for two or three possibilities, but since the number of possible orders is the factorial of the number of elements (3 elements give 3! = 6 orders, 5 elements already give 120), they grow rapidly, and so does the complexity of the regex. What I do to handle that is avoid a standard "parser style" regex and use a "remover style" substitute instead.
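
To put numbers on the permutation problem, here is a minimal Python sketch (Python's re module stands in for Rainmeter's regex engine, and order_agnostic_pattern is a hypothetical helper) that builds a "parser style" pattern with one | branch per possible order of the tags:

```python
import itertools
import math
import re


def order_agnostic_pattern(tags):
    """Hypothetical helper: OR together one branch per ordering of `tags`."""
    branches = []
    for order in itertools.permutations(tags):
        # One branch = every tag in this particular order, anything in between.
        branch = ".*?".join(
            "<{0}>.*?</{0}>".format(re.escape(tag)) for tag in order
        )
        branches.append(branch)
    # (?s) lets . match newlines, like the s flag in the Rainmeter patterns.
    return "(?s)(?:" + "|".join(branches) + ")"


for n in range(2, 6):
    tags = ["E%d" % i for i in range(1, n + 1)]
    pattern = order_agnostic_pattern(tags)
    branch_count = pattern.count("|") + 1
    print(n, "tags:", branch_count, "branches (n! =", math.factorial(n), "),",
          len(pattern), "characters")
```

With 3 tags the pattern already has 6 branches, with 5 it has 120, and that's before adding any capture groups to actually pull the values out.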

For example, let's say I have the string <E1>a</E1><E2>b</E2><E3>c</E3>, where the elements E1, E2 and E3 can be in any order, and I want to get the contents of E3. Instead of parsing the string with a standard regex like (?siU)<E\d>(.*)</E\d>.*<E\d>(.*)</E\d>.*<E\d>(.*)</E\d> in the WebParser parent measure and taking StringIndex=3 in one of its children, I pass the whole string (or just the piece where the element I'm looking for is) to a String measure (just like the WebParser parent does with its children), where I keep only E3 and basically delete everything else, using Substitute="(?siU)(?:(?(?=.*<E3>).*<E3>(.*)</E3>.*+)|.*+)":"\1","(?:^\\1|\\1$)":"".

The substitute is not as complicated as it looks: it simply looks ahead for <E3> (the (?=...) part), all wrapped in a regex conditional (the (?(?=...)...|...) part) that instructs the regex engine to give me either the contents of E3 ... or (the | part) nothing, since there is no capture group after the |. Since the regex engine in Rainmeter has a problem returning empty strings, a literal \1 may be left over after the first substitute operation; the second substitute operation deletes it.
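
Python's re module only supports conditionals that test a group, not the lookahead conditional used above, so this minimal sketch swaps in plain alternation to get the same "keep the contents of one tag, or nothing" behaviour. The extract helper is hypothetical, and because Python has no (?U) flag, the lazy quantifiers are written out explicitly:

```python
import re


def extract(source, tag):
    """Hypothetical helper: contents of <tag>...</tag> wherever it sits, or ''."""
    t = re.escape(tag)
    # Branch 1: skip to <tag>, capture up to the first </tag>, swallow the rest.
    # Branch 2: the tag isn't there, so swallow everything and capture nothing.
    pattern = r"\A(?:.*?<" + t + r">(.*?)</" + t + r">.*|.*)\Z"
    # Since Python 3.5 an unmatched group is replaced with '', so unlike the
    # Rainmeter version no second substitution is needed to remove a stray \1.
    return re.sub(pattern, r"\1", source, flags=re.DOTALL)


source = "<E3>c</E3><E1>a</E1><E2>b</E2>"  # elements in an unexpected order
for tag in ("E1", "E2", "E3", "E4"):
    print(tag, "=", repr(extract(source, tag)))
```

Each call plays the role of one String measure fed the same source, which is why the order of the elements stops mattering.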

This behaves exactly like a WebParser child, returning the contents of E3 wherever it is in the string, irrespective of the order. It will be empty if the contents are empty, just like a WebParser child. The only difference is that you have to either use a bang to manually pass the whole string from the WebParser parent to the String measure that does the substitutions ("fun" fact: substitutions have no effect on a WebParser parent), or simply set the WebParser parent as the value of the String option in the String measure.

NOTE: I've used .*+ at the end of the first substitute operation to consume the rest of the string. Since the U flag in (?siU) inverts greediness (a plain .* is lazy there), this could also have been written as .*?, which under (?U) acts as the usual greedy .*, i.e. matches as many characters as possible. The .*+ form is not just greedy but possessive, i.e. it matches as many characters as possible, but doesn't release them to match subsequent tokens.
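
Python has no (?U) flag, so the inverted meaning of .*? under that flag can't be reproduced here; this small sketch only shows why the end of the pattern needs a greedy or possessive quantifier rather than a lazy one. Possessive quantifiers require Python 3.11 or later, so that part is guarded:

```python
import re
import sys

text = "<E1>a</E1><E2>b</E2><E3>c</E3>"

# A lazy quantifier at the very end of a pattern gives up as early as it can,
# so it captures nothing.
lazy_tail = re.match(r"(?s)<E1>.*?</E1>(.*?)", text).group(1)

# A greedy quantifier at the end consumes the rest of the string.
greedy_tail = re.match(r"(?s)<E1>.*?</E1>(.*)", text).group(1)

print(repr(lazy_tail))    # ''
print(repr(greedy_tail))  # '<E2>b</E2><E3>c</E3>'

if sys.version_info >= (3, 11):  # possessive quantifiers arrived in 3.11
    # Greedy .* backtracks, giving characters back so </E3> can still match.
    assert re.match(r"(?s).*</E3>", text) is not None
    # Possessive .*+ keeps everything it consumed, so </E3> has nothing
    # left to match and the whole match fails.
    assert re.match(r"(?s).*+</E3>", text) is None
```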

Reading your post over and over, but still confused (well, what else is new?). :? If you just wanted the value of E3, why not just use regex in the WebParser, even if the order does vary?

Or, is it that you want all three values? If that is the case, then do you pass the entire substring to three different String measures, and use three similar Substitutes?

If the latter, how does that differ from having that same substring as a string index in the parent WebParser, and then using three child WebParsers, each with its own regex (just like in your String measures)? And that way the order also does not matter.

Not saying you're wrong at all; just trying to understand. ;-)

Re: RegExp matching error (-1), (-8)

Yincognito » March 25th, 2019, 9:55 pm

qwerky wrote:
March 25th, 2019, 7:27 pm
Reading your post over and over, but still confused (well, what else is new?). :? If you just wanted the value of E3, why not just use regex in the WebParser, even if the order does vary?

Because then I can't get the other two, E1 and E2.
qwerky wrote:
March 25th, 2019, 7:27 pm
Or, is it that you want all three values? If that is the case, then do you pass the entire substring to three different String measures, and use three similar Substitutes?

Yes, that's it. It doesn't have to be the entire string, just the part where you are sure those tags will be (since, as you said, the only thing you're not sure about is their order).
qwerky wrote:
March 25th, 2019, 7:27 pm
If the latter, how does that differ from having that same substring as a string index in the parent WebParser, and then using three child WebParsers, each with its own regex (just like in your String measures)? And that way the order also does not matter.

I beg to differ. If the string is, for example, <E3>a</E3><E2>b</E2><E1>c</E1> (changed order, that is), and you take the 3rd value in a child (with the generic <E\d> regex in the parent), that won't be the value of the expected E3, but the value of E1. If you make your regex in the parent even more specific, like (?siU)<E1>(.*)</E1>.*<E2>(.*)</E2>.*<E3>(.*)</E3> (expecting the previous order, that is), you might have the unpleasant surprise of getting nothing at all for the new order, and you'd have to modify the regex each time the site provides the values in a different order than before.
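
Both pitfalls are easy to reproduce. This Python sketch runs the two "parser style" patterns from the thread against the reordered string; Rainmeter's (?U) is replaced by explicit lazy .*?, and the i flag is dropped since case doesn't matter here:

```python
import re

reordered = "<E3>a</E3><E2>b</E2><E1>c</E1>"

# Generic tags: the 3rd capture is simply the 3rd element in the string,
# which is now E1 rather than E3.
generic = r"(?s)<E\d>(.*?)</E\d>.*?<E\d>(.*?)</E\d>.*?<E\d>(.*?)</E\d>"
print(re.search(generic, reordered).group(3))  # c  (the contents of E1)

# Specific tags in the expected E1, E2, E3 order: no match at all.
specific = r"(?s)<E1>(.*?)</E1>.*?<E2>(.*?)</E2>.*?<E3>(.*?)</E3>"
print(re.search(specific, reordered))  # None
```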

NOTE: Of course, if each child uses its own regex, similar to the regex in my String measure, then yeah, there isn't much difference between the two methods. Except maybe (and that's something I had to work around in one of my skins) when you want to decode the character references last (assuming you have encoded characters in the source), after making other substitutions. The String measure is more flexible in that respect, since DecodeCharacterReference in a WebParser child performs the decoding before any regex substitutions you'd want to make in the measure, and sometimes the other way around is preferable.
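
The ordering issue can be illustrated outside Rainmeter too. In this Python sketch, html.unescape stands in for DecodeCharacterReference, the hypothetical strip_tags stands in for "other substitutions", and the input string is made up for illustration:

```python
import html
import re


def strip_tags(s):
    # Hypothetical "other substitution": delete anything that looks like a tag.
    return re.sub(r"<[^>]*>", "", s)


raw = "<p>Use &lt;b&gt; for bold</p>"

# Decode first (the WebParser child order): the encoded <b> becomes a real
# tag and is stripped along with <p>.
decode_first = strip_tags(html.unescape(raw))

# Substitute first, decode last (possible with a String measure): the
# encoded <b> survives as literal text.
decode_last = html.unescape(strip_tags(raw))

print(repr(decode_first))  # 'Use  for bold'
print(repr(decode_last))   # 'Use <b> for bold'
```
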
qwerky wrote:
March 25th, 2019, 7:27 pm
Not saying you're wrong at all; just trying to understand. ;-)

Yeah, must do something about my way of explaining things. Sometimes I have the feeling that I overcomplicate what I'm trying to say, when in some cases it's best to keep it simple... :oops: Hopefully you understand my point better now. Feel free to ask for clarifications if that's not the case. ;-)

Re: RegExp matching error (-1), (-8)

qwerky » March 26th, 2019, 12:04 am

Yincognito wrote:
March 25th, 2019, 9:55 pm
NOTE: Of course, if each child uses its own regex, but similar to the regex in my String measure, then yeah, there isn't much difference between the two methods.

Yeah, that's exactly what I meant... What? You didn't read my mind? :lol: 'Cause that's what was in my mind, but it somehow didn't make it to paper (or to the screen). :oops:

Except maybe (and that's something I had to work around in one of my skins) when you want to decode the character references last (assuming you have encoded characters in the source), after making other substitutions. The String measure is more flexible in that respect, since DecodeCharacterReference in a WebParser child performs the decoding before any regex substitutions you'd want to make in the measure, and sometimes the other way around is preferable.

Yes, if you're going to be feeding String measures in order to do some other substitutions anyway, then you may as well do it all in the String measure, as you described. Otherwise, I think they are about even.

So that's a good tip for that particular situation. Keep 'em coming. :D
Yeah, must do something about my way of explaining things. Sometimes I have the feeling that I overcomplicate what I'm trying to say, when in some cases it's best to keep it simple... :oops: Hopefully you understand my point better now. Feel free to ask for clarifications if that's not the case. ;-)

All good now. Thanks for the explanation. ;-)

Re: RegExp matching error (-1), (-8)

Yincognito » March 26th, 2019, 12:13 am

qwerky wrote:
March 26th, 2019, 12:04 am
Yeah, that's exactly what I meant... What? You didn't read my mind? :lol: 'Cause that's what was in my mind, but it somehow didn't make it to paper (or to the screen). :oops:

You got me, haha! It did appear on paper; I just read it wrong the first time, hence the second paragraph. See, I may also need to read things more than once to get them right sometimes... :D