Last call for consensus on format-control char. issues

John Cowan cowan at
Wed Jun 17 11:32:04 PDT 2009

Allen Wirfs-Brock scripsit:

> "...However, visible distinctions created by certain format characters
> (particularly the Join_Control characters) are necessary and make
> necessary distinctions in certain languages. A blanket exclusion of
> these characters makes it impossible to create identifiers based on
> certain words or phrases in those languages..."


In Persian, omitting ZWNJ from the word meaning "letter" (nun, alef,
mim, heh, ZWNJ, alef, yeh) causes it to be rendered as the word meaning
"names" instead.

In Malayalam, the word for "eyewitness" (da, vocalic-r, ka, virama,
ZWNJ, sa, aa, ka, virama, ssa) becomes completely illegible if the ZWNJ
is omitted.

In Sinhalese, the phrase "Sri Lanka" (the name of the country) is
normally written sha, virama, ZWJ, ra, ii, SPACE, la, anusvara, ka, aa.
If the SPACE is omitted (like "SriLanka" in English), the result is
still legible, but if the ZWJ is dropped, the result is unintelligible.
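
For anyone who wants to poke at these three cases, here is a minimal Python sketch. It is not taken from any proposal text. It assumes that "blanket exclusion" means dropping every character in general category Cf, that "yeh" in the Persian example is U+06CC (FARSI YEH), and that the helper name strip_format_chars is purely illustrative. The code points follow the letter sequences listed above.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER (general category Cf)
ZWJ = "\u200d"   # ZERO WIDTH JOINER (general category Cf)


def strip_format_chars(text: str) -> str:
    """Hypothetical helper: drop every character in general category Cf."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Persian "letter": nun, alef, mim, heh, ZWNJ, alef, yeh.
# Assumption: "yeh" is U+06CC FARSI YEH; the post does not say which yeh.
persian_letter = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06cc"
# The same letters without ZWNJ, i.e. the word meaning "names".
persian_names = "\u0646\u0627\u0645\u0647\u0627\u06cc"

# Malayalam "eyewitness": da, vocalic-r, ka, virama, ZWNJ, sa, aa, ka, virama, ssa.
malayalam_eyewitness = "\u0d26\u0d43\u0d15\u0d4d" + ZWNJ + "\u0d38\u0d3e\u0d15\u0d4d\u0d37"

# Sinhala "Sri Lanka": sha, virama, ZWJ, ra, ii, SPACE, la, anusvara, ka, aa.
sinhala_sri_lanka = "\u0dc1\u0dca" + ZWJ + "\u0dbb\u0dd3 \u0dbd\u0d82\u0d9a\u0dcf"

if __name__ == "__main__":
    for word in (persian_letter, malayalam_eyewitness, sinhala_sri_lanka):
        stripped = strip_format_chars(word)
        # Print both forms as code points so the difference is visible
        # even when the terminal cannot shape these scripts.
        print([f"U+{ord(ch):04X}" for ch in word])
        print([f"U+{ord(ch):04X}" for ch in stripped])
        print("unchanged by stripping:", word == stripped)
```

The point of the sketch is only that stripping Cf maps the Persian "letter" onto exactly the code point sequence of "names", so two distinct words would collapse into one identifier. How badly the Malayalam and Sinhala forms render after stripping depends on the font and shaping engine, which the code cannot show.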

These aren't necessarily the sort of thing that native speakers can
call to mind right away -- what's an English word, offhand, that is
pronounced differently when titlecased than when not titlecased?
But a native speaker who ran into one of these breakages would find it
extremely surprising.

