`String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

Bjoern Hoehrmann derhoermi at gmx.net
Sat Oct 19 11:27:32 PDT 2013

* Mathias Bynens wrote:
>On 19 Oct 2013, at 12:15, Bjoern Hoehrmann <derhoermi at gmx.net> wrote:
>> Certainly not common enough to warrant a two-character method on the
>> native string type. Odds are people will use it incorrectly in an
>> attempt to make their code look concise […]
>Are you saying that changing the name to something that is longer than 
>`at` would solve this problem?

If it were `.getOneOrTwoCodepointLongSubstringAtUcs2CodeUnitIndex(...)`,
I am sure people would be reluctant to use it, because it is unreasonably
long compared to `String.fromCodePoint(str.codePointAt(p))` and harder
to understand than the combination of those two primitives.
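
For readers who want to see what that combination returns, here is a minimal
sketch. The helper name `symbolAtIndex` and the sample string are
illustrative assumptions, not part of any proposal; the guard for an
out-of-range index is also an assumption, added because
`String.fromCodePoint(undefined)` throws a `RangeError`.

```javascript
// Sample string: 'a', U+1F4A9 (stored as a surrogate pair), 'b'.
// Its .length is 4, because .length counts 16-bit code units.
const sample = 'a\uD83D\uDCA9b';

// Hypothetical helper wrapping the two primitives named in the post.
function symbolAtIndex(s, p) {
  const cp = s.codePointAt(p); // undefined when p is out of range
  return cp === undefined ? '' : String.fromCodePoint(cp);
}

console.log(symbolAtIndex(sample, 0).length); // 1: 'a'
console.log(symbolAtIndex(sample, 1).length); // 2: the full surrogate pair
console.log(symbolAtIndex(sample, 2).length); // 1: a lone trail surrogate
console.log(symbolAtIndex(sample, 3).length); // 1: 'b'
```

Index 2 falls in the middle of the non-BMP character, so the result there
is a lone surrogate rather than a whole symbol.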

>> […] not understanding that it'll retrieve a substring of .length 1 or 2,
>> possibly consisting of a lone surrogate, based on a 16 bit index that
>> might fall in the middle of a character; the problematic cases are
>> fairly rare, so it's hard to notice improper use of `.at` in automated
>> testing or in code review.
>People are using `String.prototype.charAt()` incorrectly too, expecting
>it to return whole symbols instead of surrogate halves wherever possible.
>How would _not_ introducing a method that avoids this problem help?

Right now people do not have much of a choice other than writing code
that does not do the right thing when faced with malformed strings or
non-BMP characters; it's unreasonable to call a method like `substr`
and then manually smooth it up around the edges and perhaps scan the
interior for lone surrogates to ensure that at least your code doesn't
do the wrong thing. That gives you "well-known bad" code, which is a
good thing to have, better than more complicated code that might have
unknown bugs. Allen's loop `for (let p=0; p<str.length; p+=c.length)`
for instance is just waiting for someone to improve or replace it with
code that increments by `1` instead of `.length` because that's simpler.
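
A rough sketch of that hazard follows. How `c` is obtained inside the loop
is an assumption here (the two primitives above stand in for whatever the
original loop body used), and both function names are hypothetical.

```javascript
// Advances by the length of the substring just read (1 or 2 code units).
function stepByLength(str) {
  const out = [];
  let c;
  for (let p = 0; p < str.length; p += c.length) {
    c = String.fromCodePoint(str.codePointAt(p));
    out.push(c);
  }
  return out;
}

// The "simplified" variant: advances by one code unit every time.
function stepByOne(str) {
  const out = [];
  for (let p = 0; p < str.length; p += 1) {
    out.push(String.fromCodePoint(str.codePointAt(p)));
  }
  return out;
}

// On BMP-only input the two variants agree, so tests pass either way.
console.log(stepByLength('abc').length, stepByOne('abc').length); // 3 3

// With one non-BMP character, the simplified variant also emits the
// trail surrogate as a separate, malformed entry.
const nonBmp = 'a\uD83D\uDCA9b';
console.log(stepByLength(nonBmp).length, stepByOne(nonBmp).length); // 3 4
```

The difference only shows up on input containing surrogate pairs, which is
why such a change could slip through testing and review.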

The methods `fromCodePoint` and `codePointAt` can be used to get ugly
constants out of code that tries to do the right thing, and they will
offer some insight into how developers might go from UCS-2-only code to
something more proper, but for the moment duplicating all the UCS-2-based
methods strikes me as premature, especially when giving them seductive
names. How would a somewhat-surrogate-aware `substring` method work and
what would it be called, for instance? If it is omitted, we would be
back to square one: someone in need of substring functionality has to
jump through overly complicated hoops to make it work "correctly" and
ends up mixing surrogate-pair-aware with -unaware code.
Björn Höhrmann · mailto:bjoern at hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
