Best practices and data filters for FHR
Benjamin Smedberg
benjamin at smedbergs.us
Mon Jan 5 06:16:27 PST 2015
On 12/29/2014 8:11 PM, Brendan Colloran wrote:
> The metrics team has decided that we want to focus on this human
> language description of filters and rules, but it is indeed our intent
> that anyone that writes code that reads FHR data should follow this
> document so that numbers reported throughout Mozilla match no matter
> whether a script was written in R, Python, Lua, whatevs. As you
> noticed, there are definitely a few things we need to discuss and come
> to agreement around, but these rules are far enough along that it'd be
> reasonable to start turning them into code.
>
> Of course, that could mean a couple different things-- the metrics
> team is already using these filters in our scripts, but I suppose that
> we could do as you suggest and write one common pre-processor that
> reads every record, drops or sets aside bad ones, cleans up or drops
> bad days and sessions (by whatever mechanism we ultimately decide on),
> maybe segments the data by some dimensions, and so on. I guess
> something like that could fit into the processing pipeline after
> de-orphaning but before sampling. The metrics team has not discussed
> that possibility, but if you think that kind of centralized
> pre-processing and cleaning would be useful we (you and our team)
> could talk about it. We should at least think about moving in that
> direction for v4.
But what I do really want is that these rules are written down in source
control along with the filters in the various languages. You already
have some filters in code form, but AFAIK they are not anywhere I can
see them, and I have filters in fhr-toolbox which do not match yours,
perhaps in subtle ways. I certainly haven't done anything with time-skew
correction or exclusion, for example. Could you commit your current
filters into fhr-toolbox in whatever languages you already have them
implemented?
The important part here is that we expect most engineers to be running
analyses, and so we need to provide an out-of-the-box solution where
they can say "analyze all Firefox beta users" and have these filters
applied in a reasonable way.
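
As a rough illustration of that kind of out-of-the-box entry point, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout is a simplified, hypothetical one (not the real FHR payload schema), none of the function names exist in fhr-toolbox, and the two filters are placeholders standing in for whatever rules the written document ends up specifying. In particular, dropping future-dated days is only a stand-in for a real time-skew rule.

```python
from datetime import date

# Hypothetical, simplified record layout (NOT the real FHR payload schema):
#   {"app": "Firefox", "channel": "beta", "days": {"2014-12-30": {...}, ...}}


def is_valid_record(record):
    """Placeholder record-level filter: require the fields used below."""
    return (
        isinstance(record, dict)
        and isinstance(record.get("days"), dict)
        and "app" in record
        and "channel" in record
    )


def clean_days(record, today):
    """Placeholder day-level filter: drop unparseable or future-dated days."""
    kept = {}
    for day, payload in record["days"].items():
        try:
            parsed = date.fromisoformat(day)
        except (TypeError, ValueError):
            continue  # day key is not an ISO date, so skip it
        if parsed <= today:
            kept[day] = payload
    # Return a shallow copy so the caller's record is left untouched.
    return dict(record, days=kept)


def preprocess(records, today, app=None, channel=None):
    """Yield cleaned records for one segment, skipping records that fail the filters."""
    for record in records:
        if not is_valid_record(record):
            continue
        if app is not None and record["app"] != app:
            continue
        if channel is not None and record["channel"] != channel:
            continue
        yield clean_days(record, today)
```

With something like this in one shared place, "analyze all Firefox beta users" would reduce to a single call such as `preprocess(records, today=date(2015, 1, 5), app="Firefox", channel="beta")`, and every analysis would go through the same filters.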
--BDS