Collation API not complete for search

Mike Samuel mikesamuel at gmail.com
Fri Mar 25 14:22:29 PDT 2011


2011/3/25 Mike Samuel <mikesamuel at gmail.com>:
> 2011/3/25 Nebojša Ćirić <cira at google.com>:
>> find method wouldn't return boolean but an array of two values:
>
> Sorry if I wasn't clear.  The !! at the beginning of the call to find
> is important.
> The undefined value you mentioned below as a possible no-match result is
> falsey because !!undefined === false.
>
>> myCollator.find('gaard', 'ard', 2) -> [2, 5]  // 4 or 5 as a bound
>> myCollator.find('ard', 'ard', 0) -> [0, 3]  // 2 or 3 as a bound
>> I guess [2, 5] !== [0, 3]
>
> True, but also [2, 5] !== [2, 5].
>
>> We could return [-1, undefined] for not found state, or just undefined.
>
>> I agree that returning a boolean makes for easier tests in loops.
>
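
To make the truthiness point concrete, here is a minimal sketch of how the
candidate no-match values from this thread behave under !!.  It uses plain
JavaScript values only, so no collator is involved.  sameRange is a
hypothetical helper for comparing results by value; it is not part of any
proposal.

```javascript
// Arrays are compared by identity, so two equal-looking results differ.
const identical = [2, 5] === [2, 5];

// Candidate "not found" values from the thread, coerced with !!.
const undefinedIsFalsey = !!undefined === false;
// Any array is truthy, even one meant to encode "no match".
const sentinelIsTruthy = !![-1, undefined];

// Hypothetical helper: compare two [first, last] results by value.
function sameRange(a, b) {
  if (a === undefined || b === undefined) return a === b;
  return a[0] === b[0] && a[1] === b[1];
}
```

Under these rules, a [-1, undefined] result would pass a !!find(...) test
even though nothing was found, whereas a bare undefined would not.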
>
>> On 25 March 2011 at 14:00, Mike Samuel <mikesamuel at gmail.com> wrote:
>>>
>>> 2011/3/25 Nebojša Ćirić <cira at google.com>:
>>> > Looking through the notes from the meeting I also found some problems with
>>> > the collator. We did specify the collatorType: search, but we didn't offer a
>>> > function that would make use of it. Mark and I are thinking about:
>>> > /**
>>> >  * string - string to search over.
>>> >  * substring - string to look for in "string"
>>> >  * index - start search from index
>>> >  * @return {Array} [first, last] - first is index of the match or -1,
>>> >  *     last is end of the match or undefined.
>>> >  */
>>> > LocaleInfo.Collator.prototype.find(string, substring, index)
>>> > We could also opt for an iterator solution where we keep the state.
>>>
>>> Assuming find returns a falsey value when nothing is found, is it the
>>> case that for all (string, index) pairs,
>>>
>>> !!myCollator.find(string, substring, index) ===
>>> !!myCollator.find(string.substring(index), substring, 0)

Maybe a better way to phrase this relation is:

Will any collator ever look at a code unit to the left of index when
trying to determine whether there is a match at or after index?

For example, it would have to if the code unit at index could be a
strict suffix of a substring that can also be represented as a
single-codepoint ligature.
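
A toy sketch of how left context could break the relation.  Everything
here is an assumption made for illustration: toyFind is a hypothetical
stand-in for the proposed find, and the only collation rule is that "aa"
forms one collation element equal to "\u00e5" (å).  Real collation data
is far richer, and the proposal does not say whether last is inclusive;
this sketch treats it as exclusive.

```javascript
// Toy tokenizer: treat "aa" as one collation element equal to "\u00e5".
// This is an illustrative assumption, not real Danish collation data.
function toElements(s) {
  const elements = [];
  for (let i = 0; i < s.length; ) {
    const isContraction = s.startsWith('aa', i);
    const length = isContraction ? 2 : 1;
    elements.push({
      key: isContraction ? '\u00e5' : s[i],
      start: i,
      end: i + length,
    });
    i += length;
  }
  return elements;
}

// Hypothetical find: returns [first, last] or undefined when nothing matches.
function toyFind(string, substring, index) {
  // Tokenizing from 0 means code units to the left of index are examined.
  const hay = toElements(string);
  const needle = toElements(substring);
  if (needle.length === 0) return undefined; // sketch ignores empty needles
  for (let i = 0; i + needle.length <= hay.length; i++) {
    if (hay[i].start < index) continue; // match must begin at or after index
    if (needle.every((el, j) => el.key === hay[i + j].key)) {
      return [hay[i].start, hay[i + needle.length - 1].end];
    }
  }
  return undefined;
}

const inContext = toyFind('gaard', 'ard', 2);
const afterSlicing = toyFind('gaard'.substring(2), 'ard', 0);
```

With this toy rule, index 2 of 'gaard' falls inside the "aa" element, so
inContext is undefined while afterSlicing is a match, and the two sides
of the relation disagree.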


>>> This would be false if the substring 'ard' should be found in 'gard',
>>> but not 'gaard' because then
>>>
>>>     !!myCollator.find('gaard', 'ard', 2) !==
>>>     !!myCollator.find('ard', 'ard', 0)
>>>
>>>
>>> If that relation does not hold, then exposing find as an iterator
>>> might help prevent a profusion of subtly wrong loops.
>>>
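
One possible shape for the iterator idea, as a rough sketch only.
plainFind is a stand-in built on String.prototype.indexOf so that the
example runs; it is not the proposed collator method.  findAll is a
hypothetical name, and the sketch assumes find returns [first, last] with
last exclusive, or undefined when there is no match.

```javascript
// Stand-in for the proposed find, built on String.prototype.indexOf.
// Assumption: returns [first, last] with last exclusive, or undefined.
function plainFind(string, substring, index) {
  const first = string.indexOf(substring, index);
  return first === -1 ? undefined : [first, first + substring.length];
}

// Hypothetical iterator: always passes the full string plus an index,
// so callers never slice the string themselves.
function* findAll(find, string, substring) {
  let index = 0;
  while (index <= string.length) {
    const match = find(string, substring, index);
    if (!match) return;
    yield match;
    // Resume after the match; step by one on an empty match so the
    // loop cannot stall.
    index = match[1] > match[0] ? match[1] : match[1] + 1;
  }
}

const ranges = [...findAll(plainFind, 'ard gard ard', 'ard')];
```

Because the loop lives in one place, the decision about how to advance
past a match is made once rather than in every caller.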
>>>
>>> > The reason we need to return both begin and end part of the found string is:
>>> > Look for gaard and we find gård - which may be equivalent in Danish, but
>>> > substring lengths don't match (5 vs. 4) so we need to tell user the next
>>> > index position.
>>> > The other problem Jungshik found is that there is a combinatorial explosion
>>> > with all ignoreXXX options we defined. My proposal is to define only N that
>>> > make sense (and can be supported by all implementors) and fall back the rest
>>> > to some predefined default.
>>>
>>>
>>>
>>> > --
>>> > Nebojša Ćirić
>>> >
>>> > _______________________________________________
>>> > es-discuss mailing list
>>> > es-discuss at mozilla.org
>>> > https://mail.mozilla.org/listinfo/es-discuss
>>> >
>>> >
>>
>>
>>
>> --
>> Nebojša Ćirić
>>
>

