JSON parser grammar

John Cowan cowan at ccil.org
Fri Jun 12 14:24:27 PDT 2009


Wes Garland scripsit:

> What about the 32 code points designated as non-characters? 1FFE, 1FFF,
> 2FFE, 2FFF.... 10FFE, 10FFF

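The set of noncharacters amounts to exactly 66: the 32 code points
U+FDD0..U+FDEF, plus the 34 code points U+xFFFE..U+xFFFF where x = 0 to
10 (hex), i.e. the last two code points of each of the 17 planes.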
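A minimal Python sketch of that membership test, which also enumerates the whole code space to confirm the count of 66. The function name is my own illustration, not part of any library:

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if the code point cp is a Unicode noncharacter."""
    # The 32 contiguous noncharacters U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each of the 17 planes:
    # U+xFFFE and U+xFFFF for x = 0x0 .. 0x10. Masking with 0xFFFE
    # keeps only the low 16 bits (minus bit 0), so both ...FFFE and
    # ...FFFF compare equal to 0xFFFE.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


# Enumerate every code point U+0000..U+10FFFF and count the matches.
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(NONCHARACTERS))  # 66
```

For example, `is_noncharacter(0x1FFFE)` is true while `is_noncharacter(0x1FFE)` is false: the plane-1 noncharacters are U+1FFFE and U+1FFFF.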

> If I understand correctly, the BOM and FFFF mentioned earlier in this thread
> are merely special instances of these characters.

Yes.

> Should those code points be permitted in JSON text, as they might
> become future characters in Unicode, or not?

Not, because they are permanently reserved and will never be used
as characters, nor will there ever be any additional noncharacters.
They allow applications to have, for internal processing purposes only,
a stream that is mostly Unicode characters but also contains some other
dinguses with application-defined meanings.
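Under the policy argued here, a strict receiver would reject noncharacters in JSON text. Python's standard `json.loads` does not do this, so the sketch below adds a check after parsing. `strict_loads` is a hypothetical helper, not part of any standard API. Because it inspects the decoded strings (object keys included), it catches both raw noncharacters and escapes such as `\uFFFE`, including surrogate-pair escapes like `\uD83F\uDFFF` (U+1FFFF), which `json.loads` combines into one code point:

```python
import json


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (
        0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
    )


def _check(value) -> None:
    # Recursively walk the decoded value; only strings can hold characters.
    if isinstance(value, str):
        for ch in value:
            if is_noncharacter(ord(ch)):
                raise ValueError(f"noncharacter U+{ord(ch):04X} in JSON text")
    elif isinstance(value, list):
        for item in value:
            _check(item)
    elif isinstance(value, dict):
        for key, item in value.items():
            _check(key)   # object keys are strings too
            _check(item)


def strict_loads(text: str):
    """Parse JSON, then reject any noncharacter (hypothetical helper)."""
    value = json.loads(text)  # json.loads itself accepts noncharacters
    _check(value)
    return value
```

So `strict_loads('{"a": "ok"}')` returns `{'a': 'ok'}`, while `strict_loads('["\\uFFFE"]')` raises `ValueError`. Checking after parsing keeps the sketch short; a real parser would more likely reject the code point in its string scanner.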

Noncharacters shouldn't be confused with reserved code points, which
might at some future time be allocated for characters.  Those are not
banned from interchange, because what a receiver believes to be a reserved
code point may have meant something to the sender if the sender is using
a later version of Unicode.  This is inevitable in a character set too
large to define all at once.
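The distinction matters in code because Unicode gives noncharacters the General_Category Cn, the same value it gives reserved code points. A validator that simply rejected everything in Cn would therefore also reject reserved code points that a sender on a newer Unicode version may be using. A sketch assuming Python's `unicodedata` module, whose answers depend on the Unicode version it was built with (`unicodedata.unidata_version`); `classify` is a hypothetical helper:

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (
        0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
    )


def classify(cp: int) -> str:
    """Sketch: tell noncharacters apart from merely reserved code points."""
    if is_noncharacter(cp):
        return "noncharacter"  # permanently reserved; reject in interchange
    if unicodedata.category(chr(cp)) == "Cn":
        return "reserved"      # unassigned here; a newer Unicode may assign it
    return "other"             # letters, marks, controls, private use, etc.


print("Unicode version:", unicodedata.unidata_version)
for cp in (0x0041, 0xFFFE, 0x50000, 0x10FFFF):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), classify(cp))
```

U+FFFE and U+50000 both report category Cn, but only the first is classified as a noncharacter, so only the first would be rejected.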

-- 
Winter:  MIT,                                   John Cowan
Keio, INRIA,                                    cowan at ccil.org
Issue lots of Drafts.                           http://www.ccil.org/~cowan
So much more to understand!
Might simplicity return?                        (A "tanka", or extended haiku)


More information about the es5-discuss mailing list