Collation API not complete for search

Shawn Steele Shawn.Steele at microsoft.com
Mon Mar 28 15:45:59 PDT 2011


Yes, sort, comparison and in-text search seem like reasonable buckets to me.  In-text search can, I think, be further broken into exact and non-exact cases.

- Shawn


-----Original Message-----
From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Axel Hecht
Sent: Monday, March 28, 2011 3:44 PM
To: es-discuss at mozilla.org
Subject: Re: Collation API not complete for search

Reading this thread, we have possibly three types? "sort", "comparison", "in-text search"?

I'm trying to remember, and failing. Were "sort" and "non-sort" different in anything other than their default options?

As for the proposals in this thread, I'm not too fond of putting non-optional arguments into options; that's not really how I understand the contract there.

Also, comments in this thread indicate that the matching substring may not be uniquely defined by the collator, i.e., a greedy and a non-greedy match could return different results. That sounds like a bad thing to happen.
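
To make the ambiguity concrete, here is a rough sketch of the find(string, substring, index) idea quoted below. It is only an illustration under assumptions: find() is a hypothetical free function written for this note, the greedy flag is not part of any proposal, and the comparison relies on Intl.Collator as it later shipped in ECMA-402 rather than on the LocaleInfo.Collator discussed here. It brute-forces every candidate range, so it shows the semantics, not an efficient algorithm.

```javascript
// Hypothetical helper, NOT a real or proposed API: it mimics the quoted
// find(string, substring, index) by comparing candidate slices with a collator.
// Returns [first, last], where last is the index just past the match,
// or [-1, undefined] when nothing matches.
function find(collator, string, substring, index = 0, greedy = false) {
  for (let first = index; first < string.length; first++) {
    let last;
    for (let end = first + 1; end <= string.length; end++) {
      if (collator.compare(string.slice(first, end), substring) === 0) {
        last = end;            // remember this end position
        if (!greedy) break;    // non-greedy: stop at the shortest match
      }
    }
    if (last !== undefined) return [first, last];
  }
  return [-1, undefined];
}

// Assumption: accent-insensitive search, with accents written as
// combining marks so the matched text is longer than the pattern.
const collator = new Intl.Collator("en", { usage: "search", sensitivity: "base" });
const text = "xx re\u0301sume\u0301 yy";

const shortest = find(collator, text, "resume", 0, false);
const longest = find(collator, text, "resume", 0, true);
console.log(shortest, longest);
```

With an ICU-backed engine, the non-greedy call should stop at index 10, just before the trailing combining accent, while the greedy call should extend to index 11 and include it. Both ranges compare equal to "resume", which is exactly why the collator alone does not pin down a unique match.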

Axel

On 25.03.11 21:42, Nebojša Ćirić wrote:
> Looking through the notes from the meeting I also found some problems 
> with the collator. We did specify the collatorType: search, but we 
> didn't offer a function that would make use of it. Mark and I are 
> thinking about:
>
> /**
>   * string - string to search over.
>   * substring - string to look for in "string"
>   * index - start search from index
>   * @return {Array} [first, last] - first is index of the match or -1,
>   *     last is end of the match or undefined.
>   */
> LocaleInfo.Collator.prototype.find(string, substring, index)
>
> We could also opt for an iterator solution where we keep the state.
>
> The reason we need to return both the beginning and the end of the found string is:
>
> Look for "gaard" and we find "gård", which may be equivalent in
> Danish, but the substring lengths don't match (5 vs. 4), so we need to
> tell the user the next index position.
>
> The other problem Jungshik found is that there is a combinatorial
> explosion with all the ignoreXXX options we defined. My proposal is to
> define only the N combinations that make sense (and can be supported by
> all implementors) and let the rest fall back to some predefined default.
>
> --
> Nebojša Ćirić
>
>
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss

