JSON parser grammar
cowan at ccil.org
Fri Jun 12 14:24:27 PDT 2009
Wes Garland scripsit:
> What about the 32 code points designated as non-characters? 1FFE, 1FFF,
> 2FFE, 2FFF.... 10FFE, 10FFF
The set of noncharacters amounts to exactly 66: the 32 code points
U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF where x = 0 to 10 (hex), that is,
the last two code points of each of the 17 planes.
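A minimal sketch of that definition, for illustration only: the helper name `is_noncharacter` and the list `NONCHARACTERS` are hypothetical and do not come from any JSON library or from the thread. It tests a code point against the two ranges given above and counts the matches.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # The contiguous block U+FDD0..U+FDEF (32 code points).
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+xFFFE and U+xFFFF
    # for x = 0..10 (hex), i.e. 17 planes * 2 = 34 code points.
    return (cp & 0xFFFE) == 0xFFFE


# Enumerate every code point and keep the noncharacters.
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]
```

Under these assumptions `len(NONCHARACTERS)` is 66. The values suggested in the quoted question (1FFE, 1FFF, ...) are not matched, and neither is U+FEFF (the BOM); its byte-swapped counterpart U+FFFE is.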
> If I understand correctly, the BOM and FFFF mentioned earlier in this thread
> are merely special instances of these characters.
> Should those code points be permitted in JSON text, as they might
> become future characters in Unicode, or not?
Not, because they are permanently reserved and will never be used
as characters, nor will there ever be any additional non-characters.
They allow applications to have, for internal processing purposes only,
a stream that is mostly Unicode characters but also contains some other
dinguses with application-defined meanings.
Non-characters shouldn't be confused with reserved code points, which
might at some future time be allocated for characters. Those are not
banned from interchange, because what a receiver believes to be a reserved
code point may have meant something to the sender if the sender is using
a later version of Unicode. This is inevitable in a character set too
large to define all at once.
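The distinction can be sketched in Python, assuming the standard `unicodedata` module as the receiver's view of Unicode. The names `is_noncharacter` and `classify` are hypothetical. Note that `unicodedata` reports general category `Cn` for noncharacters and reserved code points alike, so the noncharacter test has to come first.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+xFFFE/U+xFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp: int) -> str:
    """Classify a code point as seen by this Python's Unicode database."""
    if is_noncharacter(cp):
        # Permanently set aside; this answer never changes between versions.
        return "noncharacter"
    if unicodedata.category(chr(cp)) == "Cn":
        # Unassigned in the local database only. A sender using a later
        # Unicode version may already treat this as a real character.
        return "reserved"
    # Everything else that has a designated use (letters, controls,
    # surrogates, private use, and so on).
    return "assigned"
```

The "reserved" answer depends on `unicodedata.unidata_version`, which is the point made above: a receiver with an older database cannot tell whether such a code point was meaningful to the sender, whereas the "noncharacter" answer is the same under every version.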
John Cowan
cowan at ccil.org
http://www.ccil.org/~cowan

Winter: MIT,
Keio, INRIA,
Issue lots of Drafts.
So much more to understand!
Might simplicity return?
(A "tanka", or extended haiku)