[rust-dev] Unicode identifiers
graydon at mozilla.com
Fri Feb 25 11:38:24 PST 2011
I came across some 3rd party discussion of my choice of ASCII-range
identifiers (and limitation of non-ASCII-range unicode to strings, chars
and comments) that cited this as a major problem in the language. This
prompted a little more research and reading on my part, and talking with
people who had differing experiences with non-English identifier use in
programming languages. I now believe that my earlier impression of
"almost universal" adoption of ASCII-range identifiers in non-English
programming shops was mistaken, and that there is actually substantial
value to such programmers in having non-ASCII range available.
Moreover, looking at the approach taken by PEP 3131 (delegating to the
NFKC-normalization-closed sets defined in UAX 31,
XID_Start/XID_Continue), I see the "proper solution" has a
better-established consensus than I had previously understood to exist.
So I've updated the Rust manual to delegate to these specifications as
well, and filed a bug (issue 242, if anyone wants to jump on it) to get
the lexer patched up to handle this change.
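
As a rough illustration of the PEP 3131 approach referred to above (this is
not the Rust lexer, only a sketch in Python): normalize the candidate name to
NFKC, then check it against the XID_Start/XID_Continue rule. The helper name
normalize_identifier is hypothetical. The sketch relies on Python's own
str.isidentifier(), which also accepts a leading underscore and does not
reject keywords, so it only approximates what the Rust manual would specify.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Return the NFKC-normalized form of `name`.

    Raises ValueError if the normalized text is not an identifier under
    Python's rule (XID_Start or '_' first, then XID_Continue characters).
    Hypothetical helper for illustration only.
    """
    # Normalize first, so compatibility forms (ligatures, decomposed
    # accents, fullwidth letters) collapse to one canonical spelling.
    normalized = unicodedata.normalize("NFKC", name)
    # str.isidentifier() applies the XID_Start/XID_Continue-based rule.
    if not normalized.isidentifier():
        raise ValueError(f"not a valid identifier: {name!r}")
    return normalized


if __name__ == "__main__":
    for candidate in ["caf\u00e9", "\ufb01le", "\u5909\u6570", "1abc", "a-b"]:
        try:
            print(repr(candidate), "->", repr(normalize_identifier(candidate)))
        except ValueError as err:
            print(err)
```

Normalizing before the check means two spellings that differ only in
compatibility form (for example the "fi" ligature U+FB01 versus the two
letters "fi") end up naming the same identifier.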
Practical implications of this change are few for people (a) already
comfortable with ASCII-range identifiers or (b) working outside the
lexer. Hopefully it'll make things more welcoming for people who don't fit
into case (a) though.
Apologies for the thrashing about on this issue; I misunderstood the
current state of play (possibly due to a little too much time spent in
despair while trying to upgrade ECMAScript 4 to "any Unicode spec after
1995", but that's a whole other story...)