Collation API not complete for search

Mark Davis ☕ mark at macchiato.com
Fri Mar 25 15:14:55 PDT 2011


I think an iterator is a cleaner interface; we were just trying to minimize
new API.

In general, collation is context sensitive, so searching on substrings isn't
a good idea. You want to search from a location, but have the rest of the
text available to you.
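
To make the substring problem concrete, here is a rough sketch using a toy model, not a real collator. It assumes "aa" is a single collation element (as in a Danish tailoring) and that a match has to start and end on element boundaries. The names CONTRACTIONS, elementBoundaries and find are made up for illustration.

```javascript
// Toy model only: "aa" is treated as one collation element.
// CONTRACTIONS, elementBoundaries and find are hypothetical names.
const CONTRACTIONS = ['aa'];

// Collect the offsets at which a collation element starts, plus the end
// of the text, by scanning the whole text from the beginning.
function elementBoundaries(text) {
  const boundaries = new Set([text.length]);
  let i = 0;
  while (i < text.length) {
    boundaries.add(i);
    const c = CONTRACTIONS.find((k) => text.startsWith(k, i));
    i += c ? c.length : 1;
  }
  return boundaries;
}

// Return [start, end] of the first match at or after index, or undefined.
// The full text is always consulted, even when index > 0.
function find(text, pattern, index) {
  const boundaries = elementBoundaries(text);
  for (let i = index; i + pattern.length <= text.length; i++) {
    const end = i + pattern.length;
    if (boundaries.has(i) && boundaries.has(end) &&
        text.startsWith(pattern, i)) {
      return [i, end];
    }
  }
  return undefined;
}

find('gaard', 'ard', 2);              // undefined: index 2 is inside "aa"
find('gaard'.substring(2), 'ard', 0); // [0, 3]: the context was cut off
```

In this model the two calls disagree, which is the reason to pass a start location together with the full text rather than a pre-cut substring.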

For the iterator, you would need to be able to reset to a location, but the
context beforehand could affect what happens.
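
An iterator along those lines might look roughly like the sketch below. It is only an illustration under the same toy assumption ("aa" is one collation element); SearchIterator, reset and next are hypothetical names, not proposed API. The point is that the boundaries are computed over the whole text, so reset(index) still takes the preceding context into account.

```javascript
// Hypothetical sketch; SearchIterator and its methods are not a real API.
class SearchIterator {
  constructor(text, pattern, contractions = ['aa']) {
    this.text = text;
    this.pattern = pattern;
    // Element boundaries are computed once over the whole text, so a
    // later reset() still reflects what came before the new position.
    this.boundaries = new Set([text.length]);
    for (let i = 0; i < text.length; ) {
      this.boundaries.add(i);
      const c = contractions.find((k) => text.startsWith(k, i));
      i += c ? c.length : 1;
    }
    this.position = 0;
  }

  // Move the search position; the stored text and boundaries are kept.
  reset(index) {
    this.position = index;
  }

  // Return [start, end] of the next match, or undefined when exhausted.
  next() {
    const { text, pattern, boundaries } = this;
    for (let i = this.position; i + pattern.length <= text.length; i++) {
      const end = i + pattern.length;
      if (boundaries.has(i) && boundaries.has(end) &&
          text.startsWith(pattern, i)) {
        // Always advance, so an empty pattern cannot stall the iterator.
        this.position = Math.max(end, i + 1);
        return [i, end];
      }
    }
    this.position = text.length;
    return undefined;
  }
}

const it = new SearchIterator('gaard ard', 'ard');
it.next();   // [6, 9]: the "ard" inside "gaard" starts mid-element
it.next();   // undefined
it.reset(2); // restart inside "aa"; the earlier context still applies
it.next();   // [6, 9]
```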

Mark

*— Il meglio è l’inimico del bene —*


On Fri, Mar 25, 2011 at 14:22, Mike Samuel <mikesamuel at gmail.com> wrote:

> 2011/3/25 Mike Samuel <mikesamuel at gmail.com>:
> > 2011/3/25 Nebojša Ćirić <cira at google.com>:
> >> find method wouldn't return boolean but an array of two values:
> >
> > Sorry if I wasn't clear.  The !! at the beginning of the call to find
> > is important.
> > The undefined value you mentioned below as possible no match result is
> > falsey because !!undefined === false.
> >
> >> myCollator.find('gaard', 'ard', 2) -> [2, 5]  // 4 or 5 as a bound
> >> myCollator.find('ard', 'ard', 0) -> [0, 3]  // 2 or 3 as a bound
> >> I guess [2, 5] !== [0, 3]
> >
> > True, but also [2, 5] !== [2, 5].
> >
> >> We could return [-1, undefined] for not found state, or just undefined.
> >
> >> I agree that returning a boolean makes for easier tests in loops.
> >
> >
> >> On 25 March 2011 at 14:00, Mike Samuel <mikesamuel at gmail.com> wrote:
> >>>
> >>> 2011/3/25 Nebojša Ćirić <cira at google.com>:
> >>> > Looking through the notes from the meeting I also found some problems
> >>> > with the collator. We did specify the collatorType: search, but we
> >>> > didn't offer a function that would make use of it. Mark and I are
> >>> > thinking about:
> >>> > /**
> >>> >  * string - string to search over.
> >>> >  * substring - string to look for in "string"
> >>> >  * index - start search from index
> >>> >  * @return {Array} [first, last] - first is index of the match or -1,
> >>> >  * last is end of the match or undefined.
> >>> >  */
> >>> > LocaleInfo.Collator.prototype.find(string, substring, index)
> >>> > We could also opt for iterator solution where we keep the state.
> >>>
> >>> Assuming find returns a falsey value when nothing is found, is it the
> >>> case that for all (string, index) pairs,
> >>>
> >>> !!myCollator.find(string, substring, index) ===
> >>> !!myCollator.find(string.substring(index), substring, 0)
>
> Maybe a better way to phrase this relation is
>
> will any collator ever look at a code-unit to the left of index when
> trying to determine whether there is a match at or after index?
>
> E.g. if the code-unit at index might be a strict suffix of a substring
> that could be represented as a one codepoint ligature.
>
>
> >>> This would be false if the substring 'ard' should be found in 'gard',
> >>> but not 'gaard' because then
> >>>
> >>>     !!myCollator.find('gaard', 'ard', 2) !== !!myCollator.find('ard', 'ard', 0)
> >>>
> >>>
> >>> If that relation does not hold, then exposing find as an iterator
> >>> might help prevent a profusion of subtly wrong loops.
> >>>
> >>>
> >>> > The reason we need to return both begin and end part of the found
> >>> > string is:
> >>> > Look for gaard and we find gård - which may be equivalent in Danish,
> >>> > but substring lengths don't match (5 vs. 4) so we need to tell user
> >>> > the next index position.
> >>> > The other problem Jungshik found is that there is a combinatorial
> >>> > explosion with all ignoreXXX options we defined. My proposal is to
> >>> > define only N that make sense (and can be supported by all
> >>> > implementors) and fall back the rest to some predefined default.
> >>>
> >>>
> >>>
> >>> > --
> >>> > Nebojša Ćirić
> >>> >
> >>> > _______________________________________________
> >>> > es-discuss mailing list
> >>> > es-discuss at mozilla.org
> >>> > https://mail.mozilla.org/listinfo/es-discuss
> >>> >
> >>> >
> >>
> >>
> >>
> >> --
> >> Nebojša Ćirić
> >>
> >
>