RegExp problem with german Umlauts again

doomerino
Posts: 10
Joined: May 9th, 2014, 8:42 pm

Post by doomerino »

hello..

I wrote a skin that checks a German website for events nearby.

I parse the event list page and solved the umlaut problem there with a Substitute like this:
"&amp;":"&","&ouml;":"ö","&uuml;":"ü","&auml;":"ä","&Ouml;":"Ö","&Uuml;":"Ü","&Auml;":"Ä","&szlig;":"ß","&eacute;":"é","&nbsp;":" ","'":"'","&quot;":"'","<br>":"#CRLF#"
That works fine so far.

But to read the description of each event I also parse the individual event page, and there the German umlauts are not shown correctly.

I saved the HTML to a dump file to check the content, and inside the dump file the umlauts all look fine (codepage = ANSI).

My Skin.ini is saved with Notepad++ (encoding UCS-2 LE BOM).

Here is a picture to show what I mean:
Image

What can I do to solve the umlaut problem? :confused:

Thanks for the help.

Greetings

Here is my skin:

Code: Select all

[Rainmeter]
AccurateText=1
DynamicWindowSize=1
Update=1000
;Background=Background.png
BackgroundMode=3
BackgroundMargins=0,34,0,14

[Variables]
ITEM=.*<div class="dfxContentItem box">.*<a href='(.*)'>(.*)</a>.*<p><b><i>(.*)</b></i>.*<p><i>(.*)</i></p>
INFO=.*<div class="dfxTextDetail" >.*(\S.*)\s</div>
URL=https://www.landfunker.de/termine/index.php?rubric=83
URLKURZ=https://www.landfunker.de/termine/
Substitute="&amp;":"&","&ouml;":"ö","&uuml;":"ü","&auml;":"ä","&Ouml;":"Ö","&Uuml;":"Ü","&Auml;":"Ä","&szlig;":"ß","&eacute;":"é","&nbsp;":" ","'":"'","&quot;":"'","<br>":"#CRLF#"
Yspace=15r
Xspace=10
XspaceT=160
TermineTitle=LANDFUNKER TERMINE
UpdateInfos=1800
TextFont=Trebuchet MS
FontHeightTitel=11
FontColorTitel=235,200,0,255
FontHeight=8

;---StyleTitel
[sTextTitel]
FontFace=#TextFont#
StringAlign=LEFT
FontSize=#FontHeight#
StringStyle=BOLD
StringEffect=SHADOW
FontColor=#FontColorTitel#
FontEffectColor=1a1a1a
MouseOverAction=[!SetOption "#CURRENTSECTION#" FontColor 66ccff] [!Update]
MouseLeaveAction=[!SetOption "#CURRENTSECTION#" FontColor ""] [!Update]
y=r
Antialias=1
UpdateDivider=30




[MeasureLandfunker]
Measure=Plugin
Plugin=WebParser.dll
UpdateRate=600
Url=#URL#
RegExp=(?siU)#ITEM##ITEM##ITEM##ITEM##ITEM#
FinishAction=[!EnableMeasure "MeasureParseInfo1"][!CommandMeasure MeasureParseInfo1 "Update"][!EnableMeasure "MeasureParseInfo2"][!CommandMeasure MeasureParseInfo2 "Update"][!EnableMeasure "MeasureParseInfo3"][!CommandMeasure MeasureParseInfo3 "Update"][!EnableMeasure "MeasureParseInfo4"][!CommandMeasure MeasureParseInfo4 "Update"][!EnableMeasure "MeasureParseInfo5"][!CommandMeasure MeasureParseInfo5 "Update"]
UpdateDivider=30

[MeasureLink1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
StringIndex=1

[MeasureTitel1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=2

[MeasureOrt1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=3

[MeasureDatum1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=4


[MeasureLink2]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
StringIndex=5

[MeasureTitel2]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=6

[MeasureOrt2]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=7

[MeasureDatum2]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=8

[MeasureLink3]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
StringIndex=9

[MeasureTitel3]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=10

[MeasureOrt3]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=11

[MeasureDatum3]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=12

[MeasureLink4]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
StringIndex=13

[MeasureTitel4]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=14

[MeasureOrt4]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=15

[MeasureDatum4]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=16

[MeasureLink5]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
StringIndex=17

[MeasureTitel5]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=18

[MeasureOrt5]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=19

[MeasureDatum5]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
Substitute=#Substitute#
StringIndex=20


;----------INFOS ABRUFEN---------
[MeasureParseInfo1]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink1]
RegExp=(?siU)#INFO#
DynamicVariables=1
StringIndex=1
Disabled=1
RegExpSubstitute=1
Substitute=#Substitute#
Debug=2
Debug2File=#CURRENTPATH#Termin1.txt
UpdateDivider=30

[MeasureParseInfo2]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink2]
RegExp=(?siU)#INFO#
RegExpSubstitute=1
Substitute=#Substitute#
DynamicVariables=1
; #INFO# has a single capture group, so the index is 1
StringIndex=1
Disabled=1

[MeasureParseInfo3]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink3]
RegExp=(?siU)#INFO#
RegExpSubstitute=1
Substitute=#Substitute#
DynamicVariables=1
; #INFO# has a single capture group, so the index is 1
StringIndex=1
Disabled=1

[MeasureParseInfo4]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink4]
RegExp=(?siU)#INFO#
RegExpSubstitute=1
Substitute=#Substitute#
DynamicVariables=1
; #INFO# has a single capture group, so the index is 1
StringIndex=1
Disabled=1

[MeasureParseInfo5]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink5]
RegExp=(?siU)#INFO#
RegExpSubstitute=1
Substitute=#Substitute#
DynamicVariables=1
; #INFO# has a single capture group, so the index is 1
StringIndex=1
Disabled=1



;---- METERS ----


[TermineTitle]
Meter=STRING
MeterStyle=sTextTitel
FontSize=#FontHeightTitel#
;SolidColor=00000001
W=180
H=18
X=36
Y=5
LeftMouseUpAction=#URL#
ToolTipText="@LANDFUNKER TERMINE"
Text=#TermineTitle#
UpdateDivider=-1
Group=1

[TopLine]
Meter=IMAGE
SolidColor=235,170,0,200
X=10
Y=25
W=328
H=1
UpdateDivider=-1

[MeterDatum1]
MeasureName=MeasureDatum1
Meter=String
X=#Xspace#
Y=30
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
ClipString=2
ClipStringW=150
ClipStringH=20

[MeterTitel1]
MeasureName=MeasureTitel1
Meter=String
X=#XspaceT#
Y=30
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
Text="%1"
ToolTipText=[MeasureParseInfo1]#CRLF#
ToolTipType=1
LeftMouseUpAction=[#URLKURZ#[MeasureLink1]]
DynamicVariables=1 

[MeterDatum2]
MeasureName=MeasureDatum2
Meter=String
X=#Xspace#
Y=#Yspace#
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
ClipString=2
ClipStringW=140
ClipStringH=20

[MeterTitel2]
MeasureName=MeasureTitel2
Meter=String
X=#XspaceT#
Y=0r
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
Text="%1"
ToolTipText=[MeasureParseInfo2]#CRLF#
ToolTipType=1
LeftMouseUpAction=[#URLKURZ#[MeasureLink2]]
DynamicVariables=1 

[MeterDatum3]
MeasureName=MeasureDatum3
Meter=String
X=#Xspace#
Y=#Yspace#
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
ClipString=2
ClipStringW=140
ClipStringH=20

[MeterTitel3]
MeasureName=MeasureTitel3
Meter=String
X=#XspaceT#
Y=0r
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
Text="%1"
ToolTipText=[MeasureParseInfo3]#CRLF#
ToolTipType=1
LeftMouseUpAction=[#URLKURZ#[MeasureLink3]]
DynamicVariables=1 

[MeterDatum4]
MeasureName=MeasureDatum4
Meter=String
X=#Xspace#
Y=#Yspace#
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
ClipString=2
ClipStringW=140
ClipStringH=20

[MeterTitel4]
MeasureName=MeasureTitel4
Meter=String
X=#XspaceT#
Y=0r
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
Text="%1"
ToolTipText=[MeasureParseInfo4]#CRLF#
ToolTipType=1
LeftMouseUpAction=[#URLKURZ#[MeasureLink4]]
DynamicVariables=1 

[MeterDatum5]
MeasureName=MeasureDatum5
Meter=String
X=#Xspace#
Y=#Yspace#
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
ClipString=2
ClipStringW=140
ClipStringH=20

[MeterTitel5]
MeasureName=MeasureTitel5
Meter=String
X=#XspaceT#
Y=0r
;Padding=15,5,15,5
StringAlign=Left
FontFace=Tahoma
FontSize=8
FontColor=220,220,220
SolidColor=0,0,0,150
AntiAlias=1
Text="%1"
ToolTipText=[MeasureParseInfo5]#CRLF#
ToolTipType=1
LeftMouseUpAction=[#URLKURZ#[MeasureLink5]]
DynamicVariables=1 
FreeRaider
Posts: 826
Joined: November 20th, 2012, 11:58 pm

Post by FreeRaider »

Have a look at CodePage
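
As a rough illustration (not taken from the skin above): CodePage is a WebParser option that goes on the parent measure, the one that actually downloads the page. The section name below is hypothetical, and which value fits depends on the encoding the server really sends.

```ini
; Hypothetical parent measure, for illustration only.
[MeasureExamplePage]
Measure=Plugin
Plugin=WebParser.dll
Url=#URL#
RegExp=(?siU)#INFO#
StringIndex=1
; Windows code page identifier used to decode the downloaded bytes, e.g.
;   1252  = Windows Western European (often shown as "ANSI" on Western European systems)
;   65001 = UTF-8
;   1200  = UTF-16 LE
CodePage=1252
```
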
jsmorley
Developer
Posts: 22631
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Post by jsmorley »

I do not believe that this is a CodePage issue. As far as I can tell, the HTML is being returned as UTF-8, like most of the rest of the web.

You can vastly shorten that Substitute by using DecodeCharacterReference=1 on all the child measures. That will automatically deal with all the HTML character codes like &amp; and such, including the umlaut characters.
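
A minimal sketch of that suggestion, using one of the child measures from the skin above. It assumes DecodeCharacterReference=1 decodes both named and numeric character references, so only the <br> replacement would still need a Substitute. Treat it as untested against this particular site.

```ini
[MeasureTitel1]
Measure=Plugin
Plugin=WebParser.dll
Url=[MeasureLandfunker]
StringIndex=2
; Let WebParser decode references such as &amp;, &ouml; or &#228; itself.
DecodeCharacterReference=1
; <br> is an HTML tag, not a character reference, so it is still replaced by hand.
Substitute="<br>":"#CRLF#"
```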

There is at least one character being returned in that string you highlighted in About / Skins that doesn't seem to be handled by standard fonts. Not sure what is actually there.
doomerino
Posts: 10
Joined: May 9th, 2014, 8:42 pm

Post by doomerino »

Thank you for the hints.

I changed:

Code: Select all

[MeasureParseInfo1]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink1]
RegExp=(?siU)#INFO#
DynamicVariables=1
StringIndex=1
Disabled=1
RegExpSubstitute=1
Substitute=#Substitute#
Debug=2
Debug2File=#CURRENTPATH#Termin1.txt
CodePage=65001
DecodeCharacterReference=1
or also:

Code: Select all

[MeasureParseInfo1]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink1]
RegExp=(?siU)#INFO#
DynamicVariables=1
StringIndex=1
Disabled=1
RegExpSubstitute=1
Substitute=#Substitute#
Debug=2
Debug2File=#CURRENTPATH#Termin1.txt
CodePage=1200
DecodeCharacterReference=1
Neither gave the right result (still wrong letters).

The <?> signs are German umlauts, as in the word "Bevölkerung" (German for "population"). All umlauts such as ä, ö and ü show up as this <?> sign.

The font is the same one used in the event list, and in that first parse the umlauts work correctly.

In the downloaded Termin1.txt (dump file) the umlauts are correct, too.

Could you please explain a bit more about fonts, code pages and umlauts? I checked the manual, but I can't find out what is wrong.

thanks :confused:

pic:
Image
As you can see, in the event list (first parse, right corner) the umlauts work, but in the mouse-over info (second parse) all umlauts are wrong.
FreeRaider
Posts: 826
Joined: November 20th, 2012, 11:58 pm

Post by FreeRaider »

Have a look at the following code:

Code: Select all

[Variables]
.
.
.
uni=1252

; ------------------- (omitted)


[MeasureLandfunker]
Measure=Plugin
Plugin=WebParser.dll
UpdateRate=600
Url=#URL#
RegExp=(?siU)#ITEM##ITEM##ITEM##ITEM##ITEM#
FinishAction=[!EnableMeasure "MeasureParseInfo1"][!CommandMeasure MeasureParseInfo1 "Update"][!EnableMeasure "MeasureParseInfo2"][!CommandMeasure MeasureParseInfo2 "Update"][!EnableMeasure "MeasureParseInfo3"][!CommandMeasure MeasureParseInfo3 "Update"][!EnableMeasure "MeasureParseInfo4"][!CommandMeasure MeasureParseInfo4 "Update"][!EnableMeasure "MeasureParseInfo5"][!CommandMeasure MeasureParseInfo5 "Update"]
UpdateDivider=30
CodePage=#uni#

; ------------------- (omitted)

;----------INFOS ABRUFEN---------
[MeasureParseInfo1]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink1]
RegExp=(?siU)#INFO#
DynamicVariables=1
StringIndex=1
Disabled=1
RegExpSubstitute=1
Substitute=#Substitute#
Debug=2
Debug2File=#CURRENTPATH#Termin1.txt
UpdateDivider=30
CodePage=#uni#

[MeasureParseInfo2]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink2]
RegExp=(?siU)#INFO#
RegExpSubstitute=1
Substitute=#Substitute#
DynamicVariables=1
; #INFO# has a single capture group, so the index is 1
StringIndex=1
Disabled=1
CodePage=#uni#

[MeasureParseInfo3]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink3]
RegExp=(?siU)#INFO#
RegExpSubstitute=1
Substitute=#Substitute#
DynamicVariables=1
; #INFO# has a single capture group, so the index is 1
StringIndex=1
Disabled=1
CodePage=#uni#

[MeasureParseInfo4]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink4]
RegExp=(?siU)#INFO#
RegExpSubstitute=1
Substitute=#Substitute#
DynamicVariables=1
; #INFO# has a single capture group, so the index is 1
StringIndex=1
Disabled=1
CodePage=#uni#

[MeasureParseInfo5]
Measure=Plugin
Plugin=WebParser.dll
URL=https://www.landfunker.de/termine/[&MeasureLink5]
RegExp=(?siU)#INFO#
RegExpSubstitute=1
Substitute=#Substitute#
DynamicVariables=1
; #INFO# has a single capture group, so the index is 1
StringIndex=1
Disabled=1
CodePage=#uni#



;---- METERS ----
; ------------------- (omitted)
doomerino
Posts: 10
Joined: May 9th, 2014, 8:42 pm

Post by doomerino »

OK, thanks, that did the trick.

I fixed the right measure.
FreeRaider
Posts: 826
Joined: November 20th, 2012, 11:58 pm

Post by FreeRaider »

Glad to help.
jsmorley
Developer
Posts: 22631
Joined: April 19th, 2009, 11:02 pm
Location: Fort Hunt, Virginia, USA

Post by jsmorley »

FreeRaider wrote:Glad to help.
Nice one. The source doesn't indicate that a codepage is being used, so that was fooling me.
FreeRaider
Posts: 826
Joined: November 20th, 2012, 11:58 pm

Post by FreeRaider »

jsmorley wrote: Nice one. The source doesn't indicate that a codepage is being used, so that was fooling me.
Thanks jsmorley. I had a look at the site and it reported lang="de" xml:lang="de". So I thought that the site used German characters and that a codepage could be useful.