Mark Davis ☕ mark at
Thu Nov 17 10:55:58 PST 2011

Regex has not been part of the scope of the Globalization API work. I wanted
to find out whether any improvements from an internationalization point of
view are being planned separately.

Some of the problems include:

   - Regexes fail on supplementary characters (above U+FFFF). Most of these
   are rather low frequency, but they include a large number of Chinese
   characters, some used in people's names or place names.
      - This also impacts the result of validation in HTML5, such as in
   - The Unicode support is otherwise extremely limited, especially for
   properties. See for a
   comparison to other programming languages. The downside of this is that it
   promotes hard-coded lists because people "think" they know what characters
   occur in words, etc., but get it wrong.
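
For illustration, here is a minimal sketch of both problems in TypeScript. It
assumes a regex engine that matches on UTF-16 code units and treats \w as
ASCII-only, which is how ECMAScript regexes behave without any Unicode-aware
mode. The helper names (matchesSingleChar, looksLikeWord) are hypothetical
and only serve the example.

```typescript
// U+20000, a CJK Extension B ideograph, written as its UTF-16 surrogate pair.
const supplementary: string = "\uD840\uDC00";

// Hypothetical helper: does the string consist of exactly one "character"?
function matchesSingleChar(s: string): boolean {
  // "." consumes a single UTF-16 code unit here, not a whole code point.
  return /^.$/.test(s);
}

// Hypothetical helper: the kind of word test people tend to write.
function looksLikeWord(s: string): boolean {
  // \w only covers [A-Za-z0-9_], so non-ASCII letters are rejected.
  return /^\w+$/.test(s);
}

console.log(supplementary.length);              // 2 (two code units)
console.log(matchesSingleChar("A"));            // true
console.log(matchesSingleChar(supplementary));  // false
console.log(looksLikeWord("naive"));            // true
console.log(looksLikeWord("na\u00EFve"));       // false ("naïve")
```

The same one-character ideograph that passes a length or pattern check when
it sits in the BMP fails it once it is a supplementary character, and an
ordinary word with an accented letter is not recognized as a word at all.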
