First off, it should be noted that Lua is "platform agnostic" and is not designed specifically with the Windows operating system in mind. It is designed to run on a variety of platforms, so to some degree it takes a "lowest common denominator" approach to things.
Let's start off by laying out the "rules" and "limitations".
Rules
1) Your .lua script file should ALWAYS be encoded as UTF-16 Little Endian. This will allow it to communicate properly with Rainmeter, which relies on UTF-16 as the default encoding. You can encode the .lua file as ANSI, but if you do, then no Unicode external files at all will be supported.
2) Any external files that you read or write must be encoded one of two ways:
a: ANSI, with no Unicode characters of any kind in it. Only the ASCII and Extended-ASCII character sets for your system locale.
b: UTF-8 with or without BOM. Then it will support Unicode characters with some limitations.
The BOM (byte order mark) with UTF-8 is not important. Lua is fine with it either way.
3) External files you intend to read or write with Lua must NEVER be encoded as UTF-16. It simply won't work.
4) External folder or file names you wish to open in Lua must not contain Unicode characters in them. This will cause Windows to treat them as UTF-16, and Lua won't be able to open the file.
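Rule 2 says Lua is fine with or without the BOM, and it is, but when a BOM is present its three bytes stay at the front of the string you read, so every byte position is shifted by three. Here is a minimal sketch of one way to drop it. The function name readUtf8File is made up for this example and is not part of Rainmeter or Lua.

```lua
-- Hypothetical helper (not a Rainmeter or Lua built-in):
-- read a whole file and drop a leading UTF-8 BOM, if there is one.
function readUtf8File(fileName)
    -- 'rb' reads the raw bytes without any newline translation
    local file, err = io.open(fileName, 'rb')
    if not file then
        return nil, err
    end
    local text = file:read('*a')
    file:close()
    -- The UTF-8 BOM is the three bytes EF BB BF (239 187 191 in decimal)
    if text:sub(1, 3) == '\239\187\191' then
        text = text:sub(4)
    end
    return text
end
```

If the file can't be opened, this returns nil plus the error message from io.open, so the caller can decide what to display.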
Limitations
Unicode characters are often stored as multi-byte characters. If you have strings like 你好,世界 or Привет мир, these have a certain number of "characters", in this case 5 and 10. In UTF-8, however, each Chinese character takes three bytes and each Cyrillic letter takes two, while the comma and the space take one byte each, so the strings are stored using 13 and 19 bytes respectively. This means that any Lua functions that depend on "bytes" will not work correctly. In a practical sense, if you are reading in an external file that contains Unicode, these functions should not be used on the result:
1) string.len() : This measures the length of a string in "bytes", which will be the same as "characters" when measuring ASCII, but will return confusing and improper values when measuring Unicode.
2) string.sub() : This extracts a sub-string from a string, based on "bytes". Again, when using it on ASCII strings, it reflects "characters", but will not work correctly with multi-byte Unicode characters.
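If you do need a character count, one possible workaround is to count only the bytes that start a character. In UTF-8 every continuation byte falls in the range 128 to 191, so anything outside that range begins a new character. This is just a sketch: utf8Len is a made-up name, and it assumes the string really is valid UTF-8.

```lua
-- Hypothetical helper: count UTF-8 characters instead of bytes.
-- Assumes the input is valid UTF-8.
function utf8Len(text)
    -- gsub returns the number of matches as its second result.
    -- Every character has exactly one byte outside the
    -- continuation range 128 to 191, so counting those counts characters.
    local _, count = text:gsub('[^\128-\191]', '')
    return count
end

print(#'Привет мир', utf8Len('Привет мир'))   -- 19 bytes, 10 characters
```

Note that this counts code points, so a letter followed by a separate combining accent is counted as two.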
What works
Other functions, which depend on "pattern matching" rather than "bytes", will work if you use them carefully.
1) string.find() : This will return indexes in "bytes" for where the start and end of a "pattern" is found in a string. This will return valid results, but the result will be based on "bytes" and not "characters". Take care how you use this.
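2) string.match() : This will extract sub-strings from a string based on a "pattern". This will work fine as long as the pattern is built from literal text. Pattern items such as "." and character classes like %a match single bytes, not whole Unicode characters.
3) string.gsub() : This will search and replace strings in a string based on a "pattern". This will work fine, with the same caveat about "." and character classes.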
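To make the "bytes and not characters" point for string.find() concrete, here is a small sketch that turns the byte index into a character index. The name charIndexOf is invented for this example, and it assumes valid UTF-8 input.

```lua
-- Hypothetical helper: find a literal sub-string and report where it
-- starts in characters as well as in bytes. Assumes valid UTF-8.
function charIndexOf(text, needle)
    -- The fourth argument (true) makes the search literal, not a pattern
    local startByte, endByte = text:find(needle, 1, true)
    if not startByte then
        return nil
    end
    -- Count the character-starting bytes up to and including startByte
    local _, charIndex = text:sub(1, startByte):gsub('[^\128-\191]', '')
    return charIndex, startByte, endByte
end

local text = 'says 你好 and Привет'
print(charIndexOf(text, '好'))           -- 7    9    11
print(text:match('Привет'))              -- Привет
print((text:gsub('你好', 'hello')))      -- says hello and Привет
print(#('你'):match('.'))                -- 1, since "." matched a single byte
```

The last line shows why some care is needed: "." picks up only the first of the three bytes that make up 你.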
So the long and the short of it is that you can read and write external files that contain Unicode characters, but you have to remember two things:
1) Unicode text that Lua reads or writes must always be UTF-8.
2) Functions that depend on "bytes" to define "characters" will not work correctly.
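As a possible stand-in for string.sub() on Unicode text, you can walk the string one character at a time with string.gmatch() and keep only the characters you want. This is a sketch under a few assumptions: utf8Sub is a made-up name, the input is valid UTF-8, and both indexes are positive.

```lua
-- Hypothetical helper: return characters first..last (1-based, inclusive)
-- of a UTF-8 string. Assumes valid UTF-8 and positive indexes.
function utf8Sub(text, first, last)
    local chars = {}
    local index = 0
    -- Each match is one character-starting byte plus the continuation
    -- bytes (128 to 191) that follow it, i.e. one whole character.
    for char in text:gmatch('[^\128-\191][\128-\191]*') do
        index = index + 1
        if index > last then
            break
        end
        if index >= first then
            chars[#chars + 1] = char
        end
    end
    return table.concat(chars)
end

print(utf8Sub('你好,世界', 1, 2))      -- 你好
print(utf8Sub('Привет мир', 8, 10))   -- мир
```

It walks the string from the beginning on every call, which is fine for the short strings a skin usually displays.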
Here is a skin you can play with to see how things work:
Skin:
[Rainmeter]
Update=1000
DynamicWindowSize=1
AccurateText=1
[Variables]
[Lua]
Measure=Script
ScriptFile=LuaFileDemo.lua
Disabled=1
[MeterANSI]
Meter=String
FontSize=11
FontWeight=400
FontColor=255,255,255,255
SolidColor=47,47,47,255
Padding=5,5,5,5
AntiAlias=1
Text=[&Lua:ReadANSI('#CURRENTPATH#ANSI.txt')]
DynamicVariables=1
[MeterUTF8Unicode]
Meter=String
Y=5R
FontSize=11
FontWeight=400
FontColor=255,255,255,255
SolidColor=47,47,47,255
Padding=5,5,5,5
AntiAlias=1
Text=[&Lua:UTF8Unicode('#CURRENTPATH#UTF8Unicode.txt')]#CRLF#The length of the string in bytes is [&Lua:UTF8Len]#CRLF#Note that this is NOT the number of characters, but the number of bytes.#CRLF#The first Unicode character is [&Lua:firstUnicode]. Note that it is three bytes long, not one.#CRLF#firstUnicode = fileTxt:sub(48,50)
DynamicVariables=1
LuaFileDemo.lua:
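function Initialize()
end

-- Read an ANSI file and return its contents.
function ReadANSI(fileName)
    local file = io.open(fileName)
    -- io.open returns nil if the file cannot be opened
    if not file then return 'Unable to open ' .. fileName end
    local fileTxt = file:read('*all')
    file:close()
    -- Left global, as in the original script
    ANSILen = fileTxt:len()
    return fileTxt
end

-- Read a UTF-8 file, record its length in bytes, and pick out the first Chinese character.
function UTF8Unicode(fileName)
    local file = io.open(fileName)
    if not file then return 'Unable to open ' .. fileName end
    local fileTxt = file:read('*all')
    file:close()
    -- Length in bytes, not characters. Global because the skin refers to [&Lua:UTF8Len].
    UTF8Len = fileTxt:len()
    -- string.find returns byte indexes; the fourth argument makes the search literal
    local patStart, patEnd = string.find(fileTxt, '你', 1, true)
    -- Global because the skin refers to [&Lua:firstUnicode]; empty if the character is not found
    firstUnicode = patStart and fileTxt:sub(patStart, patEnd) or ''
    return fileTxt
end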
ANSI.txt:
This file is encoded as ANSI and has no Unicode characters.
UTF8Unicode.txt:
This file is encoded as UTF8 with BOM and says 你好,世界 and Привет мир in Unicode.