Unicode support for Lua scripting

Changes made during the Rainmeter 3.0 beta cycle.

August 12th, 2013, 4:04 pm
jsmorley
Developer   [15851 posts]

We're excited to announce Unicode support for Lua! To enable Unicode support for your script, you must encode your .lua script file in UTF-16 LE (also known as UCS-2 in some programs). If you're using plain old Notepad, choose Save As... and then select Unicode from the Encoding drop-down list.

After the file encoding has been changed to UTF-16 LE, you can:
  • Use Unicode characters in the script itself. For example, print("ユニコード") is valid code.
  • Retrieve Unicode values from Rainmeter. For example, measure:GetStringValue() will return a string with Unicode characters intact. Previously, Unicode characters were replaced with a question mark.

Important note:

Internally, Lua strings are stored in the UTF-8 format. UTF-8 is a multibyte character encoding, which means that a single character may take up more than one index (byte) of a string. ASCII characters (such as B or x) take up exactly one index. Consider this example:

Code:

function Initialize()
   local s = "A"
   print("The length of s is " .. s:len())
end


That will log "The length of s is 1", as expected. Now, instead of the ASCII character A, let's use the Japanese character ユ.

Code:

function Initialize()
   local s = "ユ"
   print("The length of s is " .. s:len())
end


Surprisingly, this will print "The length of s is 3"! The reason is that s:len() counts bytes rather than characters, and ユ is stored in UTF-8 as three bytes (E3 83 A6). What you need to know is that a single character does not necessarily have a length of 1: it may be 2, 3, or even 4. In your Unicode scripts, you should never attempt to split a string at an arbitrary index. Consider this example:

Code:

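function Initialize()
   local s = "ABC 汉堡包 DEF"
   -- Lua string indices start at 1 and count bytes, not characters.
   print(s:sub(1, 3)) -- this prints "ABC"
   print(s:sub(1, 8)) -- this prints "ABC 汉�" (byte 8 is only the first byte of 堡)
   -- The extra parentheses keep only the first value returned by find().
   print(s:sub(1, (s:find("E")))) -- this prints "ABC 汉堡包 DE"
end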


As you can see, attempting to get the first 8 characters of the string s by index fails miserably, because the indices count bytes, not characters: index 8 falls in the middle of 堡. However, getting all characters up to the first "E" works fine. This is because s:find() returns the byte position of "E", regardless of how many bytes the characters before it consume.
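If you really do need the first N characters rather than the first N bytes, one approach is to translate character positions into byte positions first. The sketch below is hypothetical (utf8sub is not part of Rainmeter or the Lua string library) and assumes the string is well-formed UTF-8 and that i and j are positive; the negative indices that string.sub accepts are not handled.

```lua
-- Hypothetical helper (not a Rainmeter or Lua built-in): returns characters
-- i..j of a well-formed UTF-8 string, counting characters instead of bytes.
-- Only positive indices are supported.
local function utf8sub(s, i, j)
   -- Record the byte position where each character starts. In UTF-8, every
   -- byte that is not a continuation byte (128-191) begins a new character.
   local starts = {}
   for pos = 1, #s do
      local b = s:byte(pos)
      if b < 128 or b > 191 then
         starts[#starts + 1] = pos
      end
   end

   local count = #starts
   j = math.min(j or count, count)
   i = math.max(i, 1)
   if i > j then
      return ""
   end

   -- The last byte of character j sits just before the start of character j + 1.
   local lastByte = (starts[j + 1] or (#s + 1)) - 1
   return s:sub(starts[i], lastByte)
end
```

With s = "ABC 汉堡包 DEF", utf8sub(s, 1, 8) returns "ABC 汉堡包 " (eight whole characters), and utf8sub(s, 5, 7) returns "汉堡包". You could call it from Initialize() or Update() just like the built-in string functions.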

It is important to keep this in mind when enabling Unicode support for your script. If you treat the Unicode string as if all characters have a length of exactly 1 (like you could and still can with non-Unicode scripts), you're in for trouble with Unicode strings. If you keep this limitation in mind when writing your scripts, you should be perfectly fine. In fact, most of your existing scripts will probably work fine after converting to Unicode scripts :)
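When you need the number of characters rather than the number of bytes, a common idiom is to count only the bytes that start a character. This is a sketch assuming well-formed UTF-8 and no utf8 library (one was only added to Lua in 5.3); utf8len is a hypothetical name, not a built-in.

```lua
-- Hypothetical helper: number of characters (not bytes) in a UTF-8 string.
-- Continuation bytes are 128-191, so every other byte starts a character.
-- gsub's second return value is the number of matches it replaced.
local function utf8len(s)
   local _, count = s:gsub("[^\128-\191]", "")
   return count
end

print(("ユ"):len())             -- 3 (bytes)
print(utf8len("ユ"))            -- 1 (character)
print(utf8len("ABC 汉堡包 DEF")) -- 11
```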

---

Below is some useful information about using the string functions in Unicode scripts (originally from here):


  • sub
    Works fine if the indices are calculated reasonably -- and I
    think this is almost always the case. People don't generally
    do [[ string.sub (UNKNOWN_STRING, 3, 6) ]], they calculate a
    string position, e.g. by searching, or string beginning/end,
    and maybe calculate offsets based on _known_ contents, e.g.
    [[ string.sub (s, 1, string.find (s, "/") - 1) ]]

  • upper, lower
    Works fine, but of course only upcases ASCII characters.
    However, doing this "properly" requires Unicode tables, so
    it isn't appropriate for a minimal library, I guess.

  • len
    Works fine for calculating the string byte length -- which is
    often what is actually wanted -- or calculating the string
    index of the end of the string (for further searching or
    whatever).

  • rep, format
    Work fine (only use concatenation)

  • byte, char
    Work fine

  • find, match, gmatch, gsub
    Work fine for the most part. The main exception, of course,
    is single-character wildcards, ".", "[^abc]", etc, when used
    without a repeat suffix -- but I think in practice, these are
    very rarely used without a repeat suffix.

    Some of the patterns are limited to ASCII in their
    interpretation, of course (e.g. "%a"), but this isn't really
    fixable without full Unicode tables, and the ASCII-only
    interpretation is not dangerous.

  • reverse
    Now _this_ will probably simply fail for strings containing
    non-ASCII UTF-8. But it's also probably not very widely
    used...
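To illustrate the wildcard and reverse caveats above: "." matches one byte, not one character. A pattern such as "[^\128-\191][\128-\191]*" (a non-continuation byte followed by its continuation bytes) matches one whole UTF-8 character instead, which is enough for a character-aware reverse. This is a hypothetical sketch assuming well-formed UTF-8; UTF8_CHAR and utf8reverse are illustrative names, not built-ins.

```lua
-- One UTF-8 character: any byte that is not a continuation byte (128-191),
-- followed by the continuation bytes that belong to it.
local UTF8_CHAR = "[^\128-\191][\128-\191]*"

-- Hypothetical helper: reverses a string character by character instead of
-- byte by byte, so multibyte characters stay intact.
local function utf8reverse(s)
   local chars = {}
   for ch in s:gmatch(UTF8_CHAR) do
      chars[#chars + 1] = ch
   end
   -- Walk the collected characters backwards to build the result.
   local reversed = {}
   for k = #chars, 1, -1 do
      reversed[#reversed + 1] = chars[k]
   end
   return table.concat(reversed)
end

local oneByte = ("ユニコード"):match(".")        -- only the first byte of ユ
local oneChar = ("ユニコード"):match(UTF8_CHAR)  -- the whole character ユ
print(#oneByte, #oneChar)                       -- 1   3
print(utf8reverse("ABC 汉堡包"))                 -- 包堡汉 CBA
```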

Re: Unicode support for Lua scripting

August 12th, 2013, 4:11 pm
jsmorley
Developer   [15851 posts]

Here is a sample skin that demonstrates how to use Unicode with Lua:

UnicodeLua_1.0.rmskin
(2.04 KiB) Downloaded 356 times


All files used are encoded in UTF-16 LE.

Additional note: In addition to the limitations with string "length" noted before, Lua cannot correctly read or write external files that are encoded in UTF-16. In some cases you can work around this by reading the file with WebParser (with CodePage=1200 when the file is UTF-16, and RegExp=(?siU)^(.*)$), then getting and parsing that WebParser measure's string value in Lua.

1) It uses WebParser to read an external file, and the Lua script gets this measure's value, parses it, and displays the text. The WebParser measure uses CodePage=1200 to read UTF-16 Unicode files.

2) Unicode strings are embedded in the .lua file, and displayed in the skin.

3) Unicode strings are embedded in the .ini skin file, and displayed in the skin.
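A rough sketch of the Lua side of item 1, not the script shipped in the .rmskin. It assumes a WebParser measure named MeasureFile (a hypothetical name) that reads a UTF-16 file with CodePage=1200 and RegExp=(?siU)^(.*)$, so its string value is the whole file. SKIN:GetMeasure() and GetStringValue() are the Rainmeter Lua functions; the line-splitting logic is just one way to parse the text.

```lua
-- Sketch only: assumes a WebParser measure named MeasureFile (hypothetical)
-- that reads a UTF-16 file with CodePage=1200 and RegExp=(?siU)^(.*)$.

function Initialize()
   measureFile = SKIN:GetMeasure('MeasureFile')
   lines = {}
end

function Update()
   -- In a UTF-16 LE encoded script, the value arrives with Unicode intact
   -- (stored internally by Lua as UTF-8).
   local text = measureFile:GetStringValue()

   -- Split into lines. This is safe for Unicode text because "\r" and "\n"
   -- are ASCII bytes, which never occur inside a multibyte UTF-8 character.
   lines = {}
   for line in (text .. "\n"):gmatch("(.-)\r?\n") do
      lines[#lines + 1] = line
   end

   -- The number of lines becomes the script measure's number value.
   return #lines
end
```

Note that a file ending in a newline produces an empty last entry in lines; trim it if that matters for your skin.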

Re: Unicode support for Lua scripting

August 12th, 2013, 4:15 pm
jsmorley
Developer   [15851 posts]

And another example, demonstrating how you might use the new Unicode support in Lua to create an RSS skin that can change to multiple sites in different languages:

LuaUnicodeRSS_1.0.rmskin
(2.47 KiB) Downloaded 305 times


Re: Unicode support for Lua scripting

August 12th, 2013, 7:27 pm
thatsIch
   [461 posts]

Thank you <3

A lot of Love from Germany with their äöüß :)

Re: Unicode support for Lua scripting

August 12th, 2013, 9:07 pm
sa3er
   [152 posts]

Excellent.
I'm going to try it with some Persian stuff. Thanks.

Re: Unicode support for Lua scripting

August 13th, 2013, 1:29 am
Mordasius
   [1022 posts]

:cheer:

Thanks to all those who worked on this. It means I should be able to produce a neater version of the Hijri Calendar.

Re: Unicode support for Lua scripting

August 16th, 2013, 11:22 am
~Faradey~
   [348 posts]

Dreams come true :) :cheer: :yahoo:
Thank you guys!

Re: Unicode support for Lua scripting

August 20th, 2013, 3:38 pm
thatsIch
   [461 posts]

Rainmeter.log in UTF16-LE with BOM
Main.lua in UTF16-LE with BOM
Sub.lua in UTF16-LE with BOM

If I use io.open and :read on Sub.lua, I only get garbled output that I can't evaluate. I have to save Sub.lua as UTF-8 instead. Is this a bug, or is there a workaround?

Sub.lua

Code:

[[
   print("Test")
]]


in UTF8 results in

Code:

DBUG (00:24:47.922) : [[
   print("Test")
]]


but in UTF16

Code:

DBUG (00:27:59.538) : ��[


As you can see, I only print the contents of the file.
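The two � characters are consistent with the FF FE byte-order mark that starts a UTF-16 LE file: io.open and :read return the raw bytes, and Lua does not convert them. A minimal sketch, assuming the file carries a BOM, that checks a file's first bytes so a script can refuse UTF-16 input instead of parsing garbage. detectBOM is a hypothetical helper, not a Lua or Rainmeter function.

```lua
-- Hypothetical helper: guesses a file's encoding from its byte-order mark.
-- Opens the file in binary mode so the raw bytes come back unchanged.
local function detectBOM(path)
   local f = io.open(path, "rb")
   if not f then
      return nil
   end
   local head = f:read(3) or ""  -- read(3) returns nil for an empty file
   f:close()

   if head:sub(1, 2) == "\255\254" then
      return "UTF-16 LE"      -- FF FE
   elseif head:sub(1, 2) == "\254\255" then
      return "UTF-16 BE"      -- FE FF
   elseif head == "\239\187\191" then
      return "UTF-8 with BOM" -- EF BB BF
   end
   return "no BOM"
end
```

For a Sub.lua saved as UTF-16 LE, detectBOM returns "UTF-16 LE", which is a cue to re-save the file as UTF-8.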

Re: Unicode support for Lua scripting

August 20th, 2013, 3:41 pm
jsmorley
Developer   [15851 posts]

thatsIch wrote: Rainmeter.log in UTF16-LE with BOM
Main.lua in UTF16-LE with BOM
Sub.lua in UTF16-LE with BOM

If I use io.open and :read on Sub.lua, I only get garbled output that I can't evaluate. I have to save Sub.lua as UTF-8 instead. Is this a bug, or is there a workaround?


Not really. See above:

Additional note: In addition to the limitations with string "length" noted before, Lua cannot correctly read or write external files that are encoded in UTF-16 or UTF-8. In some cases you can work around this by reading the file with WebParser (with CodePage=1200 when the file is UTF-16, and RegExp=(?siU)^(.*)$), then getting and parsing that WebParser measure's string value in Lua.


Actually, it can read UTF-8 files with a BOM just fine, as long as you don't read a specific number of bytes when the file contains Unicode characters. I haven't really tested writing to UTF-8 files, and it will definitely hate UTF-16.

Edit: Yeah, it can write to UTF-8 w/BOM external files as well, although I'm not sure if there are any "gotchas".
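One likely gotcha when reading UTF-8 files with a BOM: Lua does not strip the BOM, so the text you read starts with the bytes EF BB BF. A minimal sketch, assuming UTF-8 content, of reading and writing such files; readUTF8 and writeUTF8 are hypothetical helper names.

```lua
local BOM = "\239\187\191"  -- EF BB BF, the UTF-8 byte-order mark

-- Hypothetical helper: reads a whole UTF-8 file and drops a leading BOM.
-- Binary mode keeps the bytes exactly as they are on disk.
local function readUTF8(path)
   local f, err = io.open(path, "rb")
   if not f then
      return nil, err
   end
   local text = f:read("*a")  -- "*a" reads the entire file
   f:close()
   if text:sub(1, 3) == BOM then
      text = text:sub(4)
   end
   return text
end

-- Hypothetical helper: writes text as UTF-8, prefixed with a BOM.
local function writeUTF8(path, text)
   local f, err = io.open(path, "wb")
   if not f then
      return false, err
   end
   f:write(BOM, text)
   f:close()
   return true
end
```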

Re: Unicode support for Lua scripting

August 20th, 2013, 6:29 pm
sa3er
   [152 posts]

jsmorley wrote: Edit: Yeah, it can write to UTF-8 w/BOM external files as well, although I'm not sure if there are any "gotchas".

I haven't had any problem so far in this project.
It works like a charm.
