After the file encoding has been changed to UTF-16 LE, you can:
- Use Unicode characters in the script itself. For example, print("ユニコード") is valid code.
- Retrieve Unicode values from Rainmeter. For example, measure:GetStringValue() will return a string with Unicode characters intact. Previously, Unicode characters were replaced with a question mark.
Internally, Lua strings are stored as UTF-8. UTF-8 is a multibyte character encoding, which means that a single index of a string (one byte) is not necessarily enough to represent one character. Only ASCII characters (such as B or x) can be represented in one index. Consider this example:
function Initialize()
local s = "A"
print("The length of s is " .. s:len())
end
This prints "The length of s is 1". Now consider a non-ASCII character:
function Initialize()
local s = "ユ"
print("The length of s is " .. s:len())
end
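This prints "The length of s is 3", because ユ takes three bytes in UTF-8. The same applies when cutting up a string: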
function Initialize()
local s = "ABC 汉堡包 DEF"
print(s:sub(1, 3)) -- this prints "ABC" (Lua indices start at 1)
print(s:sub(1, 8)) -- this prints "ABC 汉�" (byte 8 is only the first byte of 堡)
print(s:sub(1, (s:find("E")))) -- this prints "ABC 汉堡包 DE"
end
It is important to keep this in mind when enabling Unicode support for your script. If you treat a Unicode string as if every character has a length of exactly 1 (like you could and still can with non-Unicode scripts), you're in for trouble. If you keep this limitation in mind when writing your scripts, you should be perfectly fine. In fact, most of your existing scripts will probably work fine after converting them to Unicode scripts.
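If you do need to count or slice by characters instead of bytes, something along these lines might help. This is only a rough sketch: utf8_len and utf8_sub are hypothetical helper names (they are not part of Lua or Rainmeter), and both assume the string is valid UTF-8. They rely on the fact that UTF-8 continuation bytes always fall in the range 128-191.

```lua
-- Hypothetical helpers, not part of Lua or Rainmeter. Assumes valid UTF-8 input.

-- Count characters by counting every byte that is NOT a continuation byte
-- (continuation bytes are always in the range 128-191).
local function utf8_len(s)
    local _, count = s:gsub("[^\128-\191]", "")
    return count
end

-- Return characters i through j (1-based, inclusive), counted in characters.
local function utf8_sub(s, i, j)
    local chars = {}
    -- Each match is one lead byte followed by its continuation bytes.
    for ch in s:gmatch("[^\128-\191][\128-\191]*") do
        chars[#chars + 1] = ch
    end
    -- Clamp the range so table.concat never reads past the table.
    i = math.max(i or 1, 1)
    j = math.min(j or #chars, #chars)
    return table.concat(chars, "", i, j)
end

local s = "ABC 汉堡包 DEF"
print(s:len())            -- 17 (bytes)
print(utf8_len(s))        -- 11 (characters)
print(utf8_sub(s, 1, 6))  -- ABC 汉堡
```

Note that utf8_sub builds a table of every character on each call, so it is fine for short strings but not something to run in a tight loop.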
---
Below is some useful information about using the string functions in Unicode scripts (originally from here):
- sub
Works fine if the indices are calculated reasonably -- and I
think this is almost always the case. People don't generally
do [[ string.sub (UNKNOWN_STRING, 3, 6) ]], they calculate a
string position, e.g. by searching, or string beginning/end,
and maybe calculate offsets based on _known_ contents, e.g.
[[ string.sub (s, 1, string.find (s, "/") - 1) ]]
- upper, lower
Works fine, but of course only upcases ASCII characters.
However doing this "properly" requires unicode tables, so
isn't appropriate for a minimal library I guess.
- len
Works fine for calculating the string byte length -- which is
often what is actually wanted -- or calculating the string
index of the end of the string (for further searching or
whatever).
- rep, format
Work fine (only use concatenation)
- byte, char
Work fine
- find, match, gmatch, gsub
Work fine for the most part. The main exception, of course,
is single-character wildcards, ".", "[^abc]", etc, when used
without a repeat suffix -- but I think in practice, these are
very rarely used without a repeat suffix.
Some of the patterns are limited to ASCII in their
interpretation of course (e.g. "%a"), but this isn't really
fixable without full unicode tables, and the ASCII-only
interpretation is not dangerous.
- reverse
Now _this_ will probably simply fail for strings containing
non-ASCII UTF-8. But it's also probably not very widely
used...
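To illustrate the two problem cases from the notes above (single-character wildcards and reverse), here is a small sketch. UTF8_CHAR and utf8_reverse are hypothetical names made up for this example, and the code assumes valid UTF-8 input. It is one possible workaround, not an official Rainmeter or Lua function.

```lua
-- Hypothetical helpers; both assume the input is valid UTF-8.

-- Pattern matching one whole UTF-8 character: a lead byte (anything that is
-- not a continuation byte) followed by any continuation bytes (128-191).
local UTF8_CHAR = "[^\128-\191][\128-\191]*"

-- Reverse a string character by character rather than byte by byte.
local function utf8_reverse(s)
    local chars = {}
    for ch in s:gmatch(UTF8_CHAR) do
        table.insert(chars, 1, ch)  -- prepend, so the order ends up reversed
    end
    return table.concat(chars)
end

local s = "汉堡包"

-- A bare "." matches one byte, so this captures only part of the first character.
print(#s:match("^(.)"))                     -- 1
-- The UTF-8 aware pattern captures the full three-byte character.
print(#s:match("^(" .. UTF8_CHAR .. ")"))   -- 3
-- With a repeat suffix the wildcard is harmless.
print(s:match("^(.*)$") == s)               -- true

print(utf8_reverse("AB汉堡"))                -- 堡汉BA
```

The built-in s:reverse() would reverse the individual bytes instead, which leaves the multibyte characters scrambled.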