Suggestions for the new unified FHR/Telemetry/Experiment ping

Vladan Djeric vdjeric at mozilla.com
Tue Jan 27 00:24:34 PST 2015


>
>
>
>    - It will be hard to do per-session analyses
>
>
> I have several responses here:
>
> 1) It will be a bit harder than currently, but I don't think that it will
> be extremely hard. There will be an efficient API to fetch all the pings
> associated with a user, which should make it relatively straightforward to
> stitch together an entire session from its pieces. This is a functional
> requirement for the more qualitative analyses, which will have to stitch
> together an entire user history and not just individual sessions. Doing an
> individual session should be fairly easy.
>


   - Whether it'll be easy to reconstruct sessions or user-days* from pings
   will depend on how we design the backend.
      - *Note:* I'm making up a term, "*user-day*": all of a user's pings
      from a given day stitched together into one coherent unit, similar to the
      "days" structures in FHR payloads. FHR operates on calendar days, so
      a user's pings collected over a single day will need to be converted to a
      "user-day".
   - As far as I know, we don't have a design for the backend and that's
   making me nervous. I would hate to make client-side changes only to realize
   we've painted ourselves into a corner on the server-side, and then have
   that collected Telemetry & FHR data go to waste.
   - My understanding was that there will be an API that returns the list
   of *archives* where a user's pings are located.
   - The API itself won't be hard to implement, but...
      - That means that whenever we want to look at a user of interest and
      re-create his sessions, we have to fetch all the archives containing his
      pings, pull each ping out at an offset in each archive, then run the
      code that stitches those pings together. Then that process has to be
      repeated for the entire population of interest.
      - Stitching is a pain:
         - Different types of histograms will have to be stitched
         differently, e.g. histograms (or other measurements) that are
         one-per-session
         - Non-histogram Telemetry sections (e.g. add-on data, background
         hang reporter data) will have their own stitching rules
         - We'll have to maintain exception lists for histograms that look
         like regular (linear/exponential) histograms but actually represent
         one-per-session measurements
         - A single missing or corrupted fragment in a long session means
         we won't be able to recreate the session. We already have problems
         with missing clientIDs
      - Running the session-stitching job is going to take a long time when
      we're stitching sessions for 100,000 users every night.
         - If we want to re-construct long sessions (e.g. one month) during
         this stitching job, we'll have to touch a LOT of archives (e.g.
         going 30 days back).
         - If there's an issue during the stitching job, either a bug or
         some incorrect or missing logic, we'll have to re-run the job and
         backfill data.
      - We're going to have to define ahead of time which users we're
      tracking (e.g. clientID % 10 == 0), because we just won't be able to
      efficiently record the locations of ALL pings for ALL users
         - That would essentially mean we throw away 90% of the data on the
         busier channels as soon as we receive it on the server
      - I think re-constructing an FHR user-day will be a lot easier than
      re-constructing a session
         - A user-day has the advantage of being data collected during a
         single day, and the day's pings will likely all be submitted together
         during the following day or a couple of days later. That means we can
         probably reconstruct the vast majority of user-days for any given day
         simply by looking at pings collected from the day of interest and a
         few days after it.
         - If this FHR job runs every night, getting a user's longitudinal
         history will just be a trivial matter of joining the nightly results.
         - Rules for stitching together FHR data are simpler
         - There's more, but I will explain further in a later message
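To make the stitching cost concrete, here's a minimal sketch of the two grouping steps described above. The ping field names (`clientId`, `sessionId`, `subsessionCounter`, `creationDate`) are hypothetical stand-ins, not the real schema:

```python
from collections import defaultdict

def stitch_sessions(pings):
    """Group subsession pings into complete sessions.

    Each ping is a dict with hypothetical fields clientId, sessionId,
    and subsessionCounter. Returns {(clientId, sessionId): fragments
    ordered by counter}, or raises when a fragment is missing -- the
    fragile failure mode noted above.
    """
    sessions = defaultdict(list)
    for p in pings:
        sessions[(p["clientId"], p["sessionId"])].append(p)
    for key, frags in sessions.items():
        frags.sort(key=lambda p: p["subsessionCounter"])
        counters = [p["subsessionCounter"] for p in frags]
        # A gap in the counters means a lost/corrupted fragment.
        if counters != list(range(counters[0], counters[0] + len(frags))):
            raise ValueError("missing fragment in session %r" % (key,))
    return dict(sessions)

def stitch_user_days(pings):
    """Group pings into user-days: all of a client's pings from one
    calendar day -- the simpler unit argued for above."""
    days = defaultdict(list)
    for p in pings:
        days[(p["clientId"], p["creationDate"])].append(p)
    return dict(days)
```

Note how the user-day grouping needs no ordering or completeness check at all, while the session grouping fails outright on a single missing fragment.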


> 2) I treat the session orientation of telemetry as an unfortunate
> limitation, not a desirable property, for almost all of the use cases that
> I've seen. I'd like us to try and move away from reporting metrics based on
> sessions. Can you describe in more detail the use cases where analyzing
> data by session is preferable to analyzing by some constant denominator? We
> should be willing to use both clock time and activeTicks as denominators,
> and these denominators can both be calculated looking at individual
> subsession pings.
>

   - I wasn't around when Telemetry was initially designed, but I don't
   think sessions are the worst choice of unit for Telemetry. Sessions have
   inherent meaning since they map to how the browser is used. For example, a
   crash rate expressed in terms of "1 crash every 100 sessions" has an
   intuitive meaning.
   - I think your point about analyzing with respect to activeTicks and
   clock time is orthogonal. We can do that with today's Telemetry using
   session pings. In fact, that's exactly what Avi will use when calculating
   "responsiveness scores".
   - There is *definitely* benefit to analyzing data with respect to
   sessions:
      1. Some measurements just need to be done in terms of sessions, e.g.
      % of sessions with clean shutdowns in Firefox 38, maximum number of
      threads during a session, ratio of cold startups to warm startups, etc.
      2. We frequently look for correlations between variables. Having
      access to a full session's measurements means having access to all the
      data.
         - For example, let's say I want to know if there is a correlation
         between a user's networking stack getting into a catatonic state and
         the browser doing a certain type of SPDY request. I need to compare
         how common a stuck networking stack is in sessions that made that
         request vs. sessions that didn't make that request.
   - I do understand the advantages of "ping per day+environment". It's
   appealing because it gives us more precision, shorter submission delays,
   and allows us to unify Telemetry & FHR. However, I'm worried the backend
   stitching is being treated as a black box, and that this approach will end
   up costing us our ability to do our current analyses.
      - If we can figure out a solid plan for efficiently recreating
      sessions & user-days on the backend, I'll be in complete agreement with
      switching to the new ping format.
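To illustrate the kind of per-session correlation analysis at stake, here's a sketch that assumes sessions have already been stitched and reduced to per-session flags. The flag names (`madeSpdyRequest`, `networkStuck`) are made up for the SPDY example above:

```python
def correlation_by_session(sessions):
    """Compare how often the networking stack got stuck in sessions
    that issued a certain SPDY request vs. sessions that did not.

    `sessions` maps a session key to a dict of hypothetical
    per-session booleans: {"madeSpdyRequest": ..., "networkStuck": ...}.
    Returns (stuck_rate_with_request, stuck_rate_without_request).
    """
    def stuck_rate(group):
        # Fraction of sessions in the group with a stuck network stack.
        return sum(s["networkStuck"] for s in group) / len(group) if group else 0.0

    with_req = [s for s in sessions.values() if s["madeSpdyRequest"]]
    without_req = [s for s in sessions.values() if not s["madeSpdyRequest"]]
    return stuck_rate(with_req), stuck_rate(without_req)
```

The analysis itself is trivial; the expensive part is everything before it, because both flags must come from fully reconstructed sessions.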


> 3) For the case of the current telemetry dashboard, I'd like to understand
> why simply replacing the current whole-session analysis with the new
> subsessions would produce statistically worse results than the current
> session-based analysis.
>

I'll assume you mean the telemetry.mozilla.org *histogram* dashboard. For
the record, my objections are not on the basis of statistical skew.

First, the histograms in that dash can be divided into 2 categories:

   1. The first category is histograms that collect many measurements
   during a session, each mostly independent of the others. These are the most
   common, and are used to generate a view of the *distribution* of
   individual measurements. A good example is the exponential GC_MS histogram
   showing the duration of garbage collection. Another example is the "error
   code" enum histograms, which show the relative frequency of different error
   codes.
      - Reporting by subsession is ideal for these histograms! We get data
      sooner from longer sessions, and we get it even in the event of a crash!
      No objections here.
      - We could even reset these histograms between pings from the same
      session.
   2. The second category is the per-session histograms (flag, count, some
   keyed histograms, etc.).
      - These have to be reported differently. I've already talked about
      these and will answer your questions further below.
      - These histograms don't benefit from subsession reporting, since
      their value isn't known until the session is complete.
      - These can also be represented well with your proposed approach.
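For the first category, merging subsession fragments back into a whole-session view is just a bucket-wise sum. A sketch, with histograms represented as plain bucket-to-count dicts for illustration:

```python
def merge_distribution_histograms(fragments):
    """Bucket-wise sum of a first-category histogram (e.g. GC_MS)
    across subsession pings. Each measurement is independent, so the
    union of the fragments equals the whole-session histogram -- which
    is why subsession reporting works so well for this category.
    """
    merged = {}
    for hist in fragments:
        for bucket, count in hist.items():
            merged[bucket] = merged.get(bucket, 0) + count
    return merged
```

No such simple merge exists for the second category, which is the whole problem: summing a flag or count histogram across fragments changes its meaning.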

However, the Telemetry *histogram* dashboard is the simplest & easiest
Telemetry use case to implement. It's also the least interesting. Custom
Telemetry analyses are much more powerful and are the preferred way of
answering the most interesting questions (is optimization X working, is
environment variable A correlated with performance problem B, how is
extension Y affecting Firefox performance, etc).


>  *If* you really care about this per-session, why can't you just take
> "true" from any of the subsessions as an indication that it's true for the
> entire session?
>

This was in relation to flag histograms. Telemetry users definitely want
flag histograms: there are 86 of them in Histograms.json (even though most
people don't know about them), because it's just a very natural use case.
Your approach wouldn't work. Flag histograms are reported as either a True
or False value, but the flag could be "set" to True many times in a
session. So a 30-day session that hits the flag condition every day would
count as 30 sessions submitting a True value (which is incorrect: 1 session
= 1 vote for flag histograms).

Instead, we could treat per-session histograms differently within your
proposal. Simply don't reset per-session histograms on every new ping, and
only ever report them in the final ping of a session.
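A toy comparison of the two counting rules (the function names are made up) shows why summing per-subsession flags inflates the vote:

```python
def flag_votes_naive(subsession_flags):
    """Wrong: one True vote per subsession that saw the flag set.
    A 30-day session reporting daily contributes 30 votes."""
    return sum(1 for flag in subsession_flags if flag)

def flag_vote_per_session(subsession_flags):
    """Right: 1 session = 1 vote. The flag is True for the whole
    session if any subsession set it."""
    return 1 if any(subsession_flags) else 0
```

The per-session rule is exactly what "report only in the final ping, don't reset" achieves on the client side.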


> And if we just report by subsession, how is this much different from the
> skew that we already have between users who have lots of short sessions and
> users that keep their browser open for days or weeks?
>

The histograms are already defined as "per session" which is a meaningful
unit, e.g. percentage of browser startups that showed the "Updating
Nightly" dialog. So you can see how the short-session skew isn't always a
factor. Expressing the same measurement in terms of "dialog appearances per
subsession" or "per hour of usage" would be meaningless because the
measurement is specifically tied to startup behavior (i.e. once per
session).


> Maybe this just indicates that we're mis-using histograms for
> non-aggregate measurements, and we should just have a separate list of flag
> metrics which are treated differently.
>

That would work, but converting all the client code is going to be no easy
task.


>     - Count histograms
>       <https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Choosing_a_Histogram_Type> are
>       also per-session measurements. You can't aggregate a count-histogram value
>       from the middle of a session together with final values from other sessions
>
>
> Won't summing across the subsessions get you the total count for the
> session?
>

Yes, it would, but that requires running a session-stitching job before
feeding the data into the dash, something that's not currently required.
I'm also skeptical about session stitching. I think it's better to report
count histograms only in the final ping of the session.
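A sketch of that summing step, using hypothetical `(sessionId, count)` pairs pulled from subsession pings, shows the grouping pass the dashboard pipeline would newly depend on:

```python
from collections import defaultdict

def counts_per_session(subsession_counts):
    """Sum count-histogram values per session, given (sessionId, count)
    pairs from subsession pings. This grouping is the extra stitching
    step the dashboard would have to run before aggregation -- and it
    is only correct once all of a session's fragments have arrived.
    """
    totals = defaultdict(int)
    for session_id, count in subsession_counts:
        totals[session_id] += count
    return dict(totals)
```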


> I don't understand this case. Assuming session stitching works, which is a
> general requirement for all sorts of analyses, this should work no worse
> than currently, and you potentially have finer-grain data on the subsequent
> days if that's useful.
>

Yup, I was expressing concerns about the feasibility of session stitching.


>
>    - Resetting Telemetry and FHR data when a TelemetryExperiment begins
>    removes valuable context from the experiment ping. It's possible to
>    reconstruct it, but that's yet another server-side job to run
>
>   I don't understand this. Is this also assuming that stitching is
> expensive?
>

Yes.


> I feel like this proposal is optimizing for the wrong things.
>
> You are making a distinction between "Telemetry" measurements and other
> measurements in a way which I am specifically trying to avoid. The goal is
> to use the common histogram system for everything. At least some of those
> measurements must be distinguished by subsession. I explicitly want to get
> rid of the current situation where "telemetry metrics" are treated one way,
> and "FHR metrics" are treated in some entirely separate manner. We want to
> be able to use the standard histograms/keyed histograms for almost
> everything.
>

I understand the appeal of unification, but we have to be aware that we're
merging systems with different requirements. Subsession reporting is the
least common denominator between FHR and Telemetry, but it comes at the
cost of doing more work on the backend to convert data back to the two
formats we currently have and care about (user-days and sessions).
Note that in my earlier proposal, FHR *could* use the same histograms as
Telemetry.


> For the simple things like the telemetry dashboard, I believe that doing
> all analysis by subsession is good enough (no worse than the current
> situation). For more complex queries , both stitching together an entire
> session and stitching together the history per-user will not only be
> possible but should be fairly efficient.
>

I agree about the dash, but in the absence of an efficient design for
backend stitching, I'm worried about the potential for significantly
increasing the difficulty of analyzing Telemetry, especially at a time when
we're doing more and more analysis to support top organizational goals. It
also seems to run counter to the efforts Roberto has made in making
analysis quicker & simpler (Spark and web front-end) -- I think we could
end up adding a clunky and possibly fragile pre-processing step to every
analysis.


More information about the fhr-dev mailing list