qwerky wrote:
March 24th, 2019, 11:59 pm
Not only do names change, as you noted above, but also the order in which they appear! Also, the groups they are in, the order of those groups, and yes occasionally even new elements.
Having said that, I would be very interested in any tips or techniques you would like to share, that would make all of this much easier the next time they rewrite the site--which I have no doubt will happen sooner or later.
As a matter of fact, I do have some tips to deal with the different order of elements, groups, whatever. I'm successfully applying the trick in my feeds skin, because, as you probably know, the RSS/ATOM elements can be required, can be optional, can also be written in a different order, can have additional (and similar) elements, etc.
Regex is not so good when you have to deal with a different order of elements. You can use | (the regex OR) for two or three possibilities, but since the number of permutations is the factorial of the number of elements, it grows rapidly, and so does the complexity of the regex. What I do to handle that is not use a standard "parser style" regex, but rather a "remover style" substitute.
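To make the permutation problem concrete, here is a rough sketch of what the | approach looks like for just two elements, E1 and E2. The measure name and the URL are placeholders, not something from an actual skin. Each possible order needs its own branch, and since PCRE numbers the capture groups across the branches, a child measure cannot know in advance which StringIndex holds which element:

```ini
; Sketch only: MeasureParent and the URL are made up for illustration.
[MeasureParent]
Measure=WebParser
URL=https://example.com/feed.xml
; Branch 1 (E1 before E2) fills captures 1 and 2,
; branch 2 (E2 before E1) fills captures 3 and 4.
RegExp=(?siU)(?:<E1>(.*)</E1>.*<E2>(.*)</E2>|<E2>(.*)</E2>.*<E1>(.*)</E1>)
```

With three elements there would already be 3! = 6 such branches, with four elements 24, and so on.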
For example, let's say I have the string <E1>a</E1><E2>b</E2><E3>c</E3>, where the elements E1, E2 and E3 can be in any order, and I want to get the contents of E3. Instead of parsing the string with a standard regex like (?siU)<E\d>(.*)</E\d>.*<E\d>(.*)</E\d>.*<E\d>(.*)</E\d> in the WebParser parent measure and taking StringIndex=3 in one of its children (which returns whichever element happens to come third), I pass the whole string (or just the piece containing the element I'm looking for) to a String measure, just like the WebParser parent does with its children. There I take only E3, using Substitute="(?siU)(?:(?(?=.*<E3>).*<E3>(.*)</E3>.*+)|.*+)":"\1","(?:^\\1|\\1$)":"" and basically delete everything else.
The substitute is not as complicated as it looks: it simply looks ahead for <E3> (the (?=...) part), all wrapped in a regex conditional (the (?(?=...)...|...) part) that instructs the regex engine to give me either the contents of E3 or (the | part in the conditional) nothing, since there is no capture group after the |. Now, since the regex engine in Rainmeter has a problem returning empty strings, there may be \1 leftovers after the first substitute operation, which I delete in the second substitute operation.
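A minimal sketch of such a String measure, assuming a hypothetical measure name (MeasureE3) and a hard-coded test string with the elements deliberately shuffled. Only the Substitute line is taken from the description above. RegExpSubstitute=1 is needed so that Substitute treats the patterns as regular expressions:

```ini
; Sketch only: MeasureE3 and the test string are made up for illustration.
[MeasureE3]
Measure=String
; E3 sits in the middle here, not in third position
String=<E2>b</E2><E3>c</E3><E1>a</E1>
RegExpSubstitute=1
; 1st pair: replace the whole string with the contents of E3 (or with nothing)
; 2nd pair: remove any literal \1 left behind at the start or end
Substitute="(?siU)(?:(?(?=.*<E3>).*<E3>(.*)</E3>.*+)|.*+)":"\1","(?:^\\1|\\1$)":""
```

If this behaves as described, the measure's string value should be c regardless of where <E3> sits in the string.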
The above will perform exactly like a WebParser child, returning the contents of E3 wherever it is in the string, irrespective of the order. It will be empty if the contents are empty, just like a WebParser child. The only difference is that you'd have to either use a bang to manually pass the whole string from the WebParser parent to the String measure where you do the substitutions ("fun" fact: substitutions have no effect on a WebParser parent), or just set the WebParser parent as the value of the String option in the String measure.
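The second option could be wired up roughly like this. The measure names and the URL are placeholders, and the catch-all RegExp with StringIndex=1 on the parent is an assumption about how to make the parent return the whole download; adjust it to whatever the parent actually parses:

```ini
; Sketch only: names and URL are made up for illustration.
[MeasureParent]
Measure=WebParser
URL=https://example.com/feed.xml
; Capture the entire download as index 1 and return it as the parent's value
RegExp=(?siU)^(.*)$
StringIndex=1
; Re-evaluate the String measure as soon as the download has been parsed
FinishAction=[!UpdateMeasure MeasureE3]

[MeasureE3]
Measure=String
; DynamicVariables=1 is required for the [MeasureParent] section variable
String=[MeasureParent]
DynamicVariables=1
RegExpSubstitute=1
Substitute="(?siU)(?:(?(?=.*<E3>).*<E3>(.*)</E3>.*+)|.*+)":"\1","(?:^\\1|\\1$)":""
```

The FinishAction line is optional in this sketch; it only makes the String measure pick up the new value right after the download instead of on the next regular update.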
NOTE: I've used .*+ at the end of the first substitute operation to consume the rest of the string. Because of the U (ungreedy) flag, this could also have been written as .*? (which, under that flag, behaves like the usual greedy .*, i.e. matching as many characters as possible). The .*+ form is not just greedy but possessive, i.e. it matches as many characters as possible without releasing them to match subsequent tokens.