Last call for consensus on format-control char. issues

John Cowan cowan at ccil.org
Wed Jun 17 11:32:04 PDT 2009


Allen Wirfs-Brock wrote:

> "...However, visible distinctions created by certain format characters
> (particularly the Join_Control characters) are necessary and make
> necessary distinctions in certain languages. A blanket exclusion of
> these characters makes it impossible to create identifiers based on
> certain words or phrases in those languages..."

Specifically:

In Persian, omitting ZWNJ from the word meaning "letter" (nun, alef,
mim, heh, ZWNJ, alef, yeh) causes it to be rendered as the word meaning
"names" instead.

In Malayalam, the word for "eyewitness" (da, vocalic-r, ka, virama,
ZWNJ, sa, aa, ka, virama, ssa) becomes completely illegible if the ZWNJ
is omitted.

In Sinhalese, the phrase "Sri Lanka" (the name of the country) is
normally written sha, virama, ZWJ, ra, ii, SPACE, la, anusvara, ka, aa.
If the SPACE is omitted (like "SriLanka" in English), the result is
still legible, but if the ZWJ is dropped, the result is unintelligible.
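To make the failure mode concrete, here is a rough TypeScript sketch (not part of any proposal). It encodes the three sequences above as code points and runs them through a hypothetical `stripJoinControls` filter, which stands in for a blanket exclusion of Join_Control characters. The specific code points are assumed transcriptions of the letter names listed above, in particular U+06CC (Farsi yeh) for the Persian yeh. Each sequence is transcribed exactly as listed, including the Malayalam one. In every case the filtered string differs from the original, and that change is the loss of meaning or legibility described above.

```typescript
// The two characters with the Unicode Join_Control property.
const ZWNJ = "\u200C"; // ZERO WIDTH NON-JOINER
const ZWJ = "\u200D"; // ZERO WIDTH JOINER

// Hypothetical filter modelling a "blanket exclusion" policy:
// it deletes every Join_Control character from an identifier.
function stripJoinControls(s: string): string {
  return s.replace(/[\u200C\u200D]/g, "");
}

// Sequences transcribed from the examples above (assumed code points).
// Persian "letter": nun, alef, mim, heh, ZWNJ, alef, yeh (U+06CC assumed).
const persianLetter = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06CC";
// Malayalam "eyewitness": da, vocalic-r, ka, virama, ZWNJ, sa, aa, ka, virama, ssa.
const malayalamEyewitness =
  "\u0D26\u0D43\u0D15\u0D4D" + ZWNJ + "\u0D38\u0D3E\u0D15\u0D4D\u0D37";
// Sinhala "Sri Lanka": sha, virama, ZWJ, ra, ii, SPACE, la, anusvara, ka, aa.
const sinhalaSriLanka =
  "\u0DC1\u0DCA" + ZWJ + "\u0DBB\u0DD3 \u0DBD\u0D82\u0D9A\u0DCF";

// Render a string as "U+XXXX" code points. Every character used here is
// in the BMP, so charCodeAt gives the code point directly.
function codePoints(s: string): string {
  const out: string[] = [];
  for (let i = 0; i < s.length; i++) {
    const hex = s.charCodeAt(i).toString(16).toUpperCase();
    out.push("U+" + ("000" + hex).slice(-4));
  }
  return out.join(" ");
}

const examples: Array<[string, string]> = [
  ["Persian 'letter'", persianLetter],
  ["Malayalam 'eyewitness'", malayalamEyewitness],
  ["Sinhala 'Sri Lanka'", sinhalaSriLanka],
];

for (let i = 0; i < examples.length; i++) {
  const name = examples[i][0];
  const word = examples[i][1];
  const stripped = stripJoinControls(word);
  console.log(name);
  console.log("  original: " + codePoints(word));
  console.log("  stripped: " + codePoints(stripped));
  console.log("  changed:  " + (stripped !== word));
}
```

An identifier rule that silently drops these characters would make the Persian "letter" identical to the ZWNJ-less spelling read as "names". A rule that rejects them outright would make the Malayalam and Sinhala words impossible to spell legibly at all.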

These aren't necessarily the sort of thing that native speakers can
call to mind right away -- what's an English word, offhand, that is
pronounced differently when titlecased than when not titlecased?
But the resulting breakage would be extremely surprising when encountered.

-- 
As you read this, I don't want you to feel      John Cowan
sorry for me, because, I believe everyone       cowan at ccil.org
will die someday.                               http://www.ccil.org/~cowan
        --From a Nigerian-type scam spam


More information about the es5-discuss mailing list