Last call for consensus on format-control char. issues
cowan at ccil.org
Wed Jun 17 11:32:04 PDT 2009
Allen Wirfs-Brock scripsit:
> "...However, visible distinctions created by certain format characters
> (particularly the Join_Control characters) are necessary and make
> necessary distinctions in certain languages. A blanket exclusion of
> these characters makes it impossible to create identifiers based on
> certain words or phrases in those languages..."
In Persian, omitting ZWNJ from the word meaning "letter" (nun, alef,
mim, heh, ZWNJ, alef, yeh) causes it to be rendered as the word meaning

In Malayalam, the word for "eyewitness" (da, vocalic-r, ka, virama,
ZWNJ, sa, aa, ka, virama, ssa) becomes completely illegible if the ZWNJ

In Sinhalese, the phrase "Sri Lanka" (the name of the country) is
normally written sha, virama, ZWJ, ra, ii, SPACE, la, anusvara, ka, aa.
If the SPACE is omitted (like "SriLanka" in English), the result is
still legible, but if the ZWJ is dropped, the result is unintelligible.
These aren't necessarily the sort of thing that native speakers can
call to mind right away -- what's an English word, offhand, that is
pronounced differently when titlecased than when not titlecased?
But they would be extremely surprising when found.
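
To make the collision concrete, here is a minimal Python sketch of what
a blanket exclusion would do to the Persian example. The helper name
strip_format_chars is hypothetical, and the code points are my
assumption about the letters named above (yeh is taken as U+064A;
Persian text often uses U+06CC instead). It is an illustration, not
anything proposed for the spec.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def strip_format_chars(s):
    """Hypothetical helper: drop every character in general category Cf."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")


# nun, alef, mim, heh, ZWNJ, alef, yeh (assumed code points)
with_zwnj = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u064a"
# The same letters with the ZWNJ left out
without_zwnj = "\u0646\u0627\u0645\u0647\u0627\u064a"

# As written, the two spellings are distinct strings...
distinct_before = with_zwnj != without_zwnj
# ...but after a blanket removal of format characters they collide.
collide_after = strip_format_chars(with_zwnj) == without_zwnj

print(distinct_before, collide_after)
```

Both ZWNJ and ZWJ are in general category Cf, so any rule that drops
Cf characters wholesale would fold such pairs into a single identifier.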
-- 
John Cowan    cowan at ccil.org    http://www.ccil.org/~cowan
As you read this, I don't want you to feel sorry for me, because,
I believe everyone will die someday.
        --From a Nigerian-type scam spam