Best practices and data filters for FHR

Benjamin Smedberg benjamin at smedbergs.us
Mon Jan 5 06:16:27 PST 2015


On 12/29/2014 8:11 PM, Brendan Colloran wrote:
> The metrics team has decided that we want to focus on this human
> language description of filters and rules, but it is indeed our intent
> that anyone that writes code that reads FHR data should follow this
> document so that numbers reported throughout Mozilla match no matter
> whether a script was written in R, Python, Lua, whatevs. As you
> noticed, there are definitely a few things we need to discuss and come
> to agreement around, but these rules are far enough along that it'd be
> reasonable to start turning them into code.
>
> Of course, that could mean a couple different things-- the metrics
> team is already using these filters in our scripts, but I suppose that
> we could do as you suggest and write one common pre-processor that
> reads every record, drops or sets aside bad ones, cleans up or drops
> bad days and sessions (by whatever mechanism we ultimately decide on),
> maybe segments the data by some dimensions, and so on. I guess
> something like that could fit into the processing pipeline after
> de-orphaning but before sampling. The metrics team has not discussed
> that possibility, but if you think that kind of centralized
> pre-processing and cleaning would be useful we (you and our team)
> could talk about it. We should at least think about moving in that
> direction for v4.
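The centralized pre-processor described above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: the `Record` shape (`submitted`, `days`) and the single day-level rule (drop days dated after submission) are hypothetical placeholders. They are not the metrics team's actual filters or the real FHR payload schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical, simplified stand-in for one FHR document (not the real schema).
    record_id: str
    submitted: Optional[date]                              # date the document was submitted
    days: Dict[date, dict] = field(default_factory=dict)   # per-day activity keyed by date


def clean_days(rec: Record) -> Record:
    """Return a copy of rec keeping only days dated on or before submission.

    Placeholder rule standing in for the real day/session cleaning.
    """
    kept = {d: v for d, v in rec.days.items() if d <= rec.submitted}
    return Record(rec.record_id, rec.submitted, kept)


def preprocess(records) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split records into cleaned records and (record, reason) pairs set aside."""
    clean: List[Record] = []
    set_aside: List[Tuple[Record, str]] = []
    for rec in records:
        if rec.submitted is None:
            # Record-level check: unusable without a submission date.
            set_aside.append((rec, "missing submission date"))
            continue
        cleaned = clean_days(rec)
        if not cleaned.days:
            # Nothing usable left after day-level cleaning.
            set_aside.append((rec, "no valid days"))
            continue
        clean.append(cleaned)
    return clean, set_aside
```

Returning set-aside records with a reason, rather than silently discarding them, keeps the bad data countable and inspectable, which matters if the rules are still being debated.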

But what I really want is for these rules to be written down in source
control along with the filters in the various languages. You already
have some filters in code form, but AFAIK they are not anywhere I can
see them, and I have filters in fhr-toolbox which do not match yours,
perhaps in subtle ways. I certainly haven't done anything with time-skew
correction or exclusion, for example. Could you commit your current
filters to fhr-toolbox in whatever languages you already have them
implemented?

The important part here is that we expect most engineers to be running
analyses, and so we need to provide an out-of-the-box solution where
they can say "analyze all Firefox beta users" and have these filters
applied in a reasonable way.
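One possible shape for that out-of-the-box entry point is a shared set of named filters, implemented under the same names in each language, plus a helper that segments by a dimension. The sketch below is hypothetical: the `Record` fields, the filter names, and the `select`/`segment` helpers are illustrative assumptions, not an existing fhr-toolbox API or the real FHR field names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Record:
    # Hypothetical, simplified record; field names are assumptions.
    record_id: str
    app_name: str
    channel: str
    os: str


# One named predicate per documented rule, so R/Python/Lua versions can share names.
FILTERS: Dict[str, Callable[[Record], bool]] = {
    "firefox": lambda r: r.app_name == "Firefox",
    "beta": lambda r: r.channel == "beta",
}


def select(records: Iterable[Record], *names: str) -> List[Record]:
    """Return records that pass every named filter; unknown names raise KeyError."""
    preds = [FILTERS[n] for n in names]
    return [r for r in records if all(p(r) for p in preds)]


def segment(records: Iterable[Record], dim: str) -> Dict[str, List[Record]]:
    """Group records by the value of one attribute, e.g. "os"."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        groups[getattr(r, dim)].append(r)
    return dict(groups)
```

For example, `select(records, "firefox", "beta")` expresses "all Firefox beta users", and `segment(..., "os")` splits that population by operating system. Keeping rules as named entries in one table makes it easy to diff one language's implementation against another's.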

--BDS


