Collation API not complete for search

Shawn Steele Shawn.Steele at
Mon Mar 28 15:45:59 PDT 2011

Yes, sort, comparison, and in-text search seem like reasonable buckets to me, although I think in-text search can be broken down further into exact and non-exact cases.

- Shawn

-----Original Message-----
From: es-discuss-bounces at [mailto:es-discuss-bounces at] On Behalf Of Axel Hecht
Sent: Monday, March 28, 2011 3:44 PM
To: es-discuss at
Subject: Re: Collation API not complete for search

Reading this thread, we have possibly three types? "sort", "comparison", "in-text search"?

I'm trying to remember, and failing. Were "sort" and "non-sort" different in anything other than their default options?

As for the proposals in this thread, I'm not too fond of putting non-optional arguments into options; that's not really how I understand the contract there.

Also, comments in this thread indicate that the matching substring may not be uniquely defined by the collator, i.e., greedy and non-greedy matching could return different results. That sounds like a bad thing to happen.
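
To make the greedy/non-greedy concern concrete, here is a minimal sketch. It assumes today's Intl.Collator as a stand-in for the collator under discussion, and matchEnds is a hypothetical helper, not part of any proposal. It lists every end index at which the text, starting from a fixed position, compares equal to the pattern.

```typescript
// Hypothetical helper for illustration only: collect every end index such
// that text.slice(first, end) compares equal to pattern under the collator.
function matchEnds(
  collator: Intl.Collator,
  text: string,
  pattern: string,
  first: number,
): number[] {
  const ends: number[] = [];
  for (let end = first + 1; end <= text.length; end++) {
    if (collator.compare(text.slice(first, end), pattern) === 0) {
      ends.push(end);
    }
  }
  return ends;
}

// Assumption: "base" sensitivity, so accents are ignored when comparing.
const baseCollator = new Intl.Collator("en", { usage: "search", sensitivity: "base" });

// "cafe" followed by a combining acute accent (U+0301), then "s".
const accentedText = "cafe\u0301s";

// Both "cafe" (end 4) and "cafe" plus the combining accent (end 5) compare
// equal to the pattern, so the match end is not unique.
const candidateEnds = matchEnds(baseCollator, accentedText, "cafe", 0);
console.log(candidateEnds); // [4, 5]
```

A non-greedy search would stop at 4 and a greedy one at 5, so an API would have to specify which of the two it returns.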


On 25.03.11 21:42, Nebojša Ćirić wrote:
> Looking through the notes from the meeting I also found some problems 
> with the collator. We did specify the collatorType: search, but we 
> didn't offer a function that would make use of it. Mark and I are 
> thinking about:
> /**
>   * string - string to search over.
>   * substring - string to look for in "string"
>   * index - start search from index
>   * @return {Array} [first, last] - first is index of the match or -1,
>   *   last is end of the match or undefined.
>   */
> LocaleInfo.Collator.prototype.find(string, substring, index)
> We could also opt for an iterator solution where we keep the state.
> The reason we need to return both begin and end part of the found string is:
> Look for "gaard" and we find "gård" - which may be equivalent in 
> Danish, but substring lengths don't match (5 vs. 4) so we need to tell 
> the user the next index position.
> The other problem Jungshik found is that there is a combinatorial 
> explosion with all the ignoreXXX options we defined. My proposal is to 
> define only the N combinations that make sense (and can be supported 
> by all implementors) and have the rest fall back to some predefined default.
> --
> Nebojša Ćirić
> _______________________________________________
> es-discuss mailing list
> es-discuss at
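
For illustration, the find(string, substring, index) idea quoted above can be sketched with a brute-force scan. This is only a sketch under stated assumptions: it uses today's Intl.Collator in place of LocaleInfo.Collator, collatorFind is a hypothetical free function rather than the proposed method, and it returns the shortest match at the earliest position. Instead of relying on Danish tailoring data, the example uses a decomposed pattern against precomposed text, which shows the same length mismatch.

```typescript
// Hypothetical helper, not a proposed or shipped API. Returns [first, last]
// for the earliest, shortest slice of text that compares equal to pattern,
// or [-1, undefined] when nothing matches.
function collatorFind(
  collator: Intl.Collator,
  text: string,
  pattern: string,
  index = 0,
): [number, number | undefined] {
  for (let first = index; first < text.length; first++) {
    // Shortest candidate first, so this sketch is non-greedy.
    // Slicing by code unit can split surrogate pairs or combining
    // sequences; a real implementation would respect such boundaries.
    for (let last = first + 1; last <= text.length; last++) {
      if (collator.compare(text.slice(first, last), pattern) === 0) {
        return [first, last];
      }
    }
  }
  return [-1, undefined];
}

// Assumption: accent-sensitive, case-insensitive search in English.
const searchCollator = new Intl.Collator("en", { usage: "search", sensitivity: "accent" });

const haystack = "un caf\u00e9 noir"; // precomposed e-acute, 1 code unit
const needle = "cafe\u0301";          // e + combining acute, 5 code units in total

const [matchStart, matchEnd] = collatorFind(searchCollator, haystack, needle);
// The matched slice is 4 code units long, the pattern is 5, so the caller
// cannot compute the end of the match from needle.length.
console.log(matchStart, matchEnd, needle.length); // 3 7 5
```

The scan performs a quadratic number of comparisons, so it only demonstrates the return contract; it is not a suggestion for how an engine would implement the search.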