RegExp=(?siU).*>Moon Rise:<.*<td>.*>(.*)<.*>Moon Culmination:<.*<td>.*>(.*)<.*>Moon Set:<.*<td>.*>(.*)<.*>Moon Distance:<.*<td>.*>(.*)<.*>Moon Altitude:<.*<td>.*>(.*)<.*>Moon Azimuth:<.*<td>.*>(.*)<.*<acronym>.*"Moon Phase".*class=.*>(.*)<.*<acronym>.*"Moon Age".*class=.*>(.*)<.*<acronym>.*"Next New Moon".*class=.*>(.*)<.*<acronym>.*"Next Full Moon".*class=.*>(.*)<.*>Lat:<.*<td.*<span.*>(.*)<.*<span.*>(.*)<.*>Lng:<.*<td.*<span.*>(.*)<.*<span.*>(.*)<.*
Returned StringIndexes:
1 => 02:55:58 || Moon Rise
2 => 07:30:46 || Moon Culmination
3 => 12:01:09 || Moon Set
4 => 396483km || Moon Distance
5 => -61.83° || Moon Altitude
6 => 10.55° || Moon Azimuth
7 => Waning Crescent/37.0% || Moon Phase
8 => 22.93 von 29.79 Tagen || Moon Age
9 => 06.03.2019 17:03:32 || Next New Moon
10 => 21.03.2019 02:42:19 || Next Full Moon
11 => N 48°51'29.87'' || Lat
12 => 48.85830° || Lat Degrees
13 => E 2°17'40.2'' || Lng
14 => 2.29450° || Lng Degrees
The only other data is a list of eclipses starting in 2001, but it contains 20+ values for each eclipse, and capturing even the first few would exceed the limit of 99 indexes for a single parse. This could be done using a separate measure and perhaps the StringIndex2 method.
You are the RegExp god! Thanks!
This information is more than enough for now.
It would be nice to split items 7 and 8 into two values each.
My little test RegEx works here with my WebParserDump.txt:
jsmorley wrote: ↑February 28th, 2019, 2:20 am
The output is encoded in UTF-16 LE. You will need to use Codepage=1200 on the WebParser parent measure.
MY GOD, that worked!
Thank you very much jsmorley !!!
We did the impossible!
I had already checked this: I found <meta charset = "utf-8"> in both output.html and WebParserDump.txt, so I thought everything was correct in this part.
RegExp=(?siU).*>Moon Rise:<.*<td>.*>(.*)<.*>Moon Culmination:<.*<td>.*>(.*)<.*>Moon Set:<.*<td>.*>(.*)<.*>Moon Distance:<.*<td>.*>(.*)<.*>Moon Altitude:<.*<td>.*>(.*)<.*>Moon Azimuth:<.*<td>.*>(.*)<.*<acronym>.*"Moon Phase".*class=.*>(.*)/(.*)<.*<acronym>.*"Moon Age".*class=.*>(.*)\s.*\s(.*)\s.*<.*<acronym>.*"Next New Moon".*class=.*>(.*)<.*<acronym>.*"Next Full Moon".*class=.*>(.*)<.*>Lat:<.*<td.*<span.*>(.*)<.*<span.*>(.*)<.*>Lng:<.*<td.*<span.*>(.*)<.*<span.*>(.*)<.*
This splits the original index 7 into index 7 ("Waning Crescent") and index 8 ("37.0%").
The original index 8 becomes index 9 ("22.93") and index 10 ("29.79").
Of course, the remaining indexes all increase by 2 because of these additional captures.
As jsmorley pointed out, you need to know how the file is encoded to avoid errors.
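With the two extra captures, the child measures would line up with the indexes roughly as follows. This is only a minimal sketch, not the skin from this thread: the measure names, the file location in URL, and the MoonRegExp variable (assumed to hold the full expression above) are placeholders.

```ini
[Variables]
; Assumption: paste the full RegExp from the post above as the value here.
MoonRegExp=

[MeasureMoonParent]
Measure=WebParser
; Hypothetical location of the file saved by the PowerShell script.
URL=file://#CURRENTPATH#output.html
; The saved file is UTF-16 LE, so the parent measure needs CodePage=1200.
CodePage=1200
RegExp=#MoonRegExp#

[MeasureMoonPhaseName]
Measure=WebParser
URL=[MeasureMoonParent]
; First half of the original index 7, e.g. "Waning Crescent"
StringIndex=7

[MeasureMoonPhasePercent]
Measure=WebParser
URL=[MeasureMoonParent]
; Second half of the original index 7, e.g. "37.0%"
StringIndex=8

[MeasureMoonAgeDays]
Measure=WebParser
URL=[MeasureMoonParent]
; First number of the original index 8, e.g. "22.93"
StringIndex=9

[MeasureMoonAgeTotal]
Measure=WebParser
URL=[MeasureMoonParent]
; Second number of the original index 8, e.g. "29.79"
StringIndex=10

[MeasureNextNewMoon]
Measure=WebParser
URL=[MeasureMoonParent]
; Was index 9 before the split, now shifted by 2
StringIndex=11
```

The later values (Next Full Moon, Lat, Lng) shift the same way, ending at index 16 instead of 14.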
GTI.H wrote: ↑February 28th, 2019, 2:40 am
MY GOD, that worked!
Thank you very much jsmorley !!!
We did the impossible!
I had already checked this: I found <meta charset = "utf-8"> in both output.html and WebParserDump.txt, so I thought everything was correct in this part.
How did you find out that it needed Codepage=1200?
I just loaded your output in Notepad++ and checked the encoding.
It doesn't matter what encoding the website uses; the PowerShell script you are using saves the output in its default encoding, UTF-16 LE. There is likely a way to change that if you want, but I'm not sure it really matters.
RegExp=(?siU).*>Moon Rise:<.*<td>.*>(.*)<.*>Moon Culmination:<.*<td>.*>(.*)<.*>Moon Set:<.*<td>.*>(.*)<.*>Moon Distance:<.*<td>.*>(.*)<.*>Moon Altitude:<.*<td>.*>(.*)<.*>Moon Azimuth:<.*<td>.*>(.*)<.*<acronym>.*"Moon Phase".*class=.*>(.*)/(.*)<.*<acronym>.*"Moon Age".*class=.*>(.*)\s.*\s(.*)\s.*<.*<acronym>.*"Next New Moon".*class=.*>(.*)<.*<acronym>.*"Next Full Moon".*class=.*>(.*)<.*>Lat:<.*<td.*<span.*>(.*)<.*<span.*>(.*)<.*>Lng:<.*<td.*<span.*>(.*)<.*<span.*>(.*)<.*
This splits the original index 7 into index 7 ("Waning Crescent") and index 8 ("37.0%").
The original index 8 becomes index 9 ("22.93") and index 10 ("29.79").
Of course, the remaining indexes all increase by 2 because of these additional captures.
As jsmorley pointed out, you need to know how the file is encoded to avoid errors.
Perfect as it should be!
Thank you!
I still have to test it and implement the child measures.
I believe in you, you are the one who does not believe me!
Last edited by GTI.H on February 28th, 2019, 3:43 am, edited 1 time in total.
It doesn't matter what encoding the website uses; the PowerShell script you are using saves the output in its default encoding, UTF-16 LE. There is likely a way to change that if you want, but I'm not sure it really matters.
It matters a lot: without Codepage=1200, WebParser does not work. But we do not have to change the default UTF-16 LE.
GTI.H wrote: ↑February 28th, 2019, 3:28 am
It matters a lot: without Codepage=1200, WebParser does not work. But we do not have to change the default UTF-16 LE.
What I mean is that it doesn't matter whether the file is UTF-8 and you use WebParser with the default Codepage, or the file is UTF-16 LE and you use Codepage=1200, as long as you know which it is and can react appropriately.
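In measure terms, the two cases would look something like this. It is a minimal sketch: the measure names and file names are hypothetical, and the RegExp is just the first capture of the full expression, cut short for illustration.

```ini
; Case 1: the file is saved as UTF-8, so the default CodePage is left alone.
[MeasureParentUtf8]
Measure=WebParser
; Hypothetical file name for a UTF-8 copy of the output.
URL=file://#CURRENTPATH#output-utf8.html
RegExp=(?siU).*>Moon Rise:<.*<td>.*>(.*)<

; Case 2: the file is saved as UTF-16 LE, so CodePage=1200 is required.
[MeasureParentUtf16]
Measure=WebParser
; Hypothetical location of the file as the PowerShell script saves it.
URL=file://#CURRENTPATH#output.html
CodePage=1200
RegExp=(?siU).*>Moon Rise:<.*<td>.*>(.*)<
```

Either parent should parse correctly as long as its CodePage matches the way the file was actually saved. The charset declared inside the HTML does not decide this.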
jsmorley wrote: ↑February 28th, 2019, 3:51 am
What I mean is that it doesn't matter whether the file is UTF-8 and you use WebParser with the default Codepage, or the file is UTF-16 LE and you use Codepage=1200, as long as you know which it is and can react appropriately.
That is what I had understood even though my answer suggested something else.
So I asked where you saw UTF-16 LE.
This is kind of confusing to anyone who is not on top of it all the time; see here.
This is what I see in my Notepad++:
[Attached image: output file N++ Encoding.JPG]