[rust-dev] Unicode identifiers

Graydon Hoare graydon at mozilla.com
Fri Feb 25 11:38:24 PST 2011


Hi,

I came across some third-party discussion of my choice of ASCII-range 
identifiers (and limitation of non-ASCII-range Unicode to strings, chars 
and comments) that cited this as a major problem in the language. This 
prompted a little more research and reading on my part, and talking with 
people who had differing experiences with non-English identifier use in 
programming languages. I now believe that my earlier impression of 
"almost universal" adoption of ASCII-range identifiers in non-English 
programming shops was mistaken, and that there is actually substantial 
value to such programmers in having non-ASCII range available.

Moreover, looking at the approach taken by PEP 3131 (delegating to the 
NFKC-normalization-closed sets defined in UAX 31, 
XID_Start/XID_Continue), I see the "proper solution" has a 
better-established consensus than I had previously understood to exist. 
So I've updated the Rust manual to delegate to these specifications as 
well, and filed a bug (issue 242, if anyone wants to jump on it) to get 
the lexer patched up to handle this change.
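
To make the PEP 3131 approach concrete, here is a minimal sketch in 
Python (not Rust, since Python's standard library already exposes the 
relevant Unicode data). It NFKC-normalizes a candidate name and then 
checks it against Python's identifier rule, which follows 
XID_Start/XID_Continue and additionally allows a leading underscore. The 
function names canonical_identifier and is_unicode_identifier are made up 
for illustration, and this is not how the Rust lexer is or will be 
implemented.

```python
import unicodedata


def canonical_identifier(name: str) -> str:
    """Return the NFKC-normalized form of a candidate identifier.

    PEP 3131 compares identifiers after NFKC normalization, so that
    compatibility variants (ligatures, full-width letters, ...) of the
    same name are treated as one identifier.
    """
    return unicodedata.normalize("NFKC", name)


def is_unicode_identifier(name: str) -> bool:
    """Check a name against Python's PEP 3131 identifier rule.

    str.isidentifier() accepts an XID_Start character (or underscore)
    followed by zero or more XID_Continue characters. Note that it also
    accepts reserved words; keyword filtering is a separate step.
    """
    return canonical_identifier(name).isidentifier()


if __name__ == "__main__":
    for candidate in ["count", "größe", "変数", "2fast", "a-b"]:
        print(repr(candidate), is_unicode_identifier(candidate))
```

Under this rule "größe" and "変数" are accepted while "2fast" and "a-b" 
are rejected, and the ligature form "\ufb01le" normalizes to plain 
"file", so both spellings name the same identifier.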

Practical implications of this change are few for people (a) already 
comfortable with ASCII-range identifiers or (b) working outside the 
lexer. Hopefully it'll make things more welcoming for people who don't fit 
into case (a), though.

Apologies for the thrashing about on this issue; I misunderstood the 
current state of play (possibly due to a little too much time spent in 
despair while trying to upgrade ECMAScript 4 to "any Unicode spec after 
1995", but that's a whole other story...)

-Graydon

