Collation API not complete for search

Nebojša Ćirić cira at google.com
Mon Mar 28 17:04:54 PDT 2011


I think we can do that: leave collatorType with just sort and comparison. We
can add more, like in-text search, later.

On 28 March 2011 at 15:45, Shawn Steele <Shawn.Steele at microsoft.com> wrote:

> Yes, sort, comparison and in-text search seem like reasonable buckets to
> me. Although in-text search can, I think, be further broken into exact
> and non-exact cases.
>
> - Shawn
>
>
> -----Original Message-----
> From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Axel Hecht
> Sent: Monday, March 28, 2011 3:44 PM
> To: es-discuss at mozilla.org
> Subject: Re: Collation API not complete for search
>
> Reading this thread, we have possibly three types? "sort", "comparison",
> "in-text search"?
>
> I'm trying to remember, and failing: were "sort" and "non-sort" different
> in anything other than their default options?
>
> As for the proposals in this thread, I'm not too fond of putting
> non-optional arguments into options; that's not really how I understand
> the contract there.
>
> Also, comments in this thread indicate that the matching substring may not
> be uniquely defined by the collator; that is, greedy and non-greedy
> matching could give different results. That sounds like something we
> should avoid.
>
> Axel
>
> On 25.03.11 21:42, Nebojša Ćirić wrote:
> > Looking through the notes from the meeting I also found some problems
> > with the collator. We did specify the collatorType: search, but we
> > didn't offer a function that would make use of it. Mark and I are
> > thinking about:
> >
> > /**
> >  * @param {string} string - string to search over.
> >  * @param {string} substring - string to look for in "string".
> >  * @param {number} index - index to start the search from.
> >  * @return {Array} [first, last] - first is the index of the match or -1;
> >  *     last is the end of the match or undefined.
> >  */
> > LocaleInfo.Collator.prototype.find(string, substring, index)
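A minimal sketch of the proposed find(), assuming the standard `Intl.Collator` as a stand-in for `LocaleInfo.Collator`. The ECMA-402 `Intl.Collator` that was eventually standardized has a `usage` option with "sort" and "search" values, but no find()-style method. `collatorFind` is a hypothetical free function rather than a prototype method, and treating `last` as an exclusive end index is an assumption. At each start position it tries the shortest candidate first, which is one (non-greedy) answer to the greedy vs. non-greedy ambiguity raised in this thread.

```javascript
// Hypothetical helper, not a standard API: returns [first, last] for the
// first slice of `string`, starting at or after `index`, that `collator`
// considers equal to `substring` (`last` is exclusive), or
// [-1, undefined] if there is no such slice.
function collatorFind(collator, string, substring, index = 0) {
  for (let first = index; first < string.length; first++) {
    // Try the shortest candidate first, so the match is non-greedy.
    for (let last = first + 1; last <= string.length; last++) {
      if (collator.compare(string.slice(first, last), substring) === 0) {
        return [first, last];
      }
    }
  }
  return [-1, undefined];
}

// Base sensitivity ignores case and accents.
const collator = new Intl.Collator("en", { sensitivity: "base" });
console.log(collatorFind(collator, "Le r\u00e9sum\u00e9 final", "resume")); // → [3, 9]
console.log(collatorFind(collator, "Le r\u00e9sum\u00e9 final", "cv"));     // → [-1, undefined]
```

This brute-force scan makes O(n²) `compare()` calls and slices by UTF-16 code units, so it can split a surrogate pair or detach a combining mark. A production implementation would more likely build on ICU's collation-based string search (`StringSearch` / `usearch`).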
> >
> > We could also opt for an iterator solution where we keep the search state.
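The iterator alternative could look like the following generator, again only a sketch: `collatorMatches` is a hypothetical name, `Intl.Collator` stands in for the proposed collator, and matches are assumed not to overlap. The suspended generator is the kept state; each `next()` resumes scanning from the end of the previous match.

```javascript
// Hypothetical generator, not a standard API: yields successive,
// non-overlapping [first, last] matches (last exclusive) of `substring`
// in `string` under `collator`. The suspended generator holds the state.
function* collatorMatches(collator, string, substring) {
  let from = 0;
  while (from < string.length) {
    let found = null;
    for (let first = from; first < string.length && !found; first++) {
      // Shortest candidate first, as in a non-greedy search.
      for (let last = first + 1; last <= string.length; last++) {
        if (collator.compare(string.slice(first, last), substring) === 0) {
          found = [first, last];
          break;
        }
      }
    }
    if (!found) return; // no further matches
    yield found;
    from = found[1];    // resume after the end of this match
  }
}

const base = new Intl.Collator("en", { sensitivity: "base" });
const it = collatorMatches(base, "resume, R\u00e9sum\u00e9!", "resume");
console.log(it.next().value); // → [0, 6]
console.log(it.next().value); // → [8, 14]
console.log(it.next().done);  // → true
```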
> >
> > The reason we need to return both the beginning and the end of the found
> > string is:
> >
> > Look for *gaard* and we find *gård*, which may be equivalent in Danish,
> > but the substring lengths don't match (5 vs. 4), so we need to tell the
> > user the next index position.
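The same length mismatch can be shown without relying on Danish tailoring data. ECMA-402 requires `Intl.Collator` to return 0 for canonically equivalent strings, so a precomposed "é" (one UTF-16 code unit) matches "e" plus a combining acute accent (two code units). This is an illustration with the standard `Intl.Collator`, not the proposed API; it shows that computing the end as `first + substring.length` lands on the wrong position, which is why find() has to report the end index.

```javascript
// Default sensitivity ("variant"): accents and case both matter.
const collator = new Intl.Collator("en");

const text = "caf\u00e9 au lait"; // "café" with a precomposed é (U+00E9)
const pattern = "cafe\u0301";     // "café" as "e" + combining acute (U+0301)

// The real match is text[0, 4): four code units for a five-unit pattern.
const match = text.slice(0, 4);
console.log(match.length, pattern.length);     // 4 5
console.log(collator.compare(match, pattern)); // 0 (canonically equivalent)

// Deriving the end as first + pattern.length overshoots into the space.
const naive = text.slice(0, 0 + pattern.length); // "café "
console.log(collator.compare(naive, pattern) === 0); // false
```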
> >
> > The other problem Jungshik found is that there is a combinatorial
> > explosion with all the ignoreXXX options we defined. My proposal is to
> > define only the N combinations that make sense (and can be supported by
> > all implementors) and have the rest fall back to some predefined default.
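One way to read the fallback proposal, as a sketch only: the flag names `ignoreCase`, `ignoreAccents` and `ignoreWidth` are hypothetical (the thread just says "ignoreXXX"), and mapping the supported combinations onto ECMA-402 `sensitivity` values is this example's own choice. With k independent flags there are 2^k combinations; the table lists the few that are supported, and everything else falls back to a default.

```javascript
// Supported combinations of (hypothetical) ignore flags, keyed by the
// sorted, comma-joined names of the flags that are set to true.
const SUPPORTED = new Map([
  ["", "variant"],                      // nothing ignored: a ≠ A ≠ á
  ["ignoreCase", "accent"],             // a = A, a ≠ á
  ["ignoreAccents", "case"],            // a = á, a ≠ A
  ["ignoreAccents,ignoreCase", "base"], // a = A = á
]);
const DEFAULT_SENSITIVITY = "variant";

// Resolve caller-supplied flags to an Intl.Collator sensitivity, falling
// back to the default for any combination not in the table.
function resolveCollatorOptions(options = {}) {
  const key = Object.keys(options)
    .filter((name) => name.startsWith("ignore") && options[name] === true)
    .sort()
    .join(",");
  const sensitivity = SUPPORTED.get(key);
  return sensitivity === undefined
    ? { sensitivity: DEFAULT_SENSITIVITY, fellBack: true }
    : { sensitivity, fellBack: false };
}

const { sensitivity } = resolveCollatorOptions({ ignoreCase: true, ignoreAccents: true });
const collator = new Intl.Collator("en", { sensitivity });
console.log(sensitivity, collator.compare("R\u00c9SUM\u00c9", "resume")); // base 0
console.log(resolveCollatorOptions({ ignoreCase: true, ignoreWidth: true }));
// { sensitivity: 'variant', fellBack: true }
```

Returning `fellBack` lets a caller detect that its request was not honoured exactly, instead of silently getting different matching behaviour.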
> >
> > --
> > Nebojša Ćirić
> >
> >
> >
> > _______________________________________________
> > es-discuss mailing list
> > es-discuss at mozilla.org
> > https://mail.mozilla.org/listinfo/es-discuss
>



-- 
Nebojša Ćirić