<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span class=""><blockquote type="cite">
<div dir="ltr"><br>
<div>
<ul>
<li>It will be hard to do per-session analyses</li>
</ul>
</div>
</div>
</blockquote>
<br></span>
I have several responses here:<br>
<br>
1) It will be a bit harder than currently, but I don't think that it
will be extremely hard. There will be an efficient API to fetch all
the pings associated with a user, which should make it relatively
straightforward to stitch together an entire session from its
pieces. This is a functional requirement for the more qualitative
analyses, which will have to stitch together an entire user history
and not just individual sessions. Doing an individual session should
be fairly easy.<br></div></blockquote><div><br></div><ul><li>Whether it'll be easy to reconstruct sessions or user-days<span style="color:rgb(255,0,0)">*</span> from pings will depend on how we design the backend.</li><ul><li><span style="color:rgb(255,0,0)">*<b>Note:</b> </span>I'm making up a term "<span style="color:rgb(255,0,0)"><b>user-days</b></span>": all of a user's pings from a given day stitched together into one coherent unit, similar to the "days" structures in FHR payloads. FHR operates on calendar days, so the backend will need to convert a user's pings collected over a single day into a "user-day".</li></ul><li>As far as I know, we don't have a design for the backend, and that's making me nervous. I would hate to make client-side changes only to realize we've painted ourselves into a corner on the server side, and then have that collected Telemetry & FHR data go to waste.<br></li><li>My understanding was that there will be an API that returns the list of <u>archives</u> where a user's pings are located.<br></li><ul><li>The API itself won't be hard to implement, but...</li><li>That means that whenever we want to look at a user of interest and re-create their sessions, we have to fetch all the archives containing their pings, pull out each ping from an offset in each archive, then run the code that stitches those pings together. Then that process has to be repeated for the entire population of interest.<br></li><li>Stitching is a pain:<br></li><ul><li>Different types of histograms will have to be stitched differently</li><ul><li>e.g. histograms (or other measurements) that are one-per-session<br></li></ul><li>Non-histogram Telemetry sections (e.g.
add-on data, background hang reporter data) will have their own stitching rules<br></li><li>We'll have to maintain exception lists for histograms that look like regular (linear/exponential) histograms but actually represent one-per-session measurements</li><li>A single missing or corrupted fragment in a long session means we won't
be able to recreate the session. We already have problems with missing
clientIDs</li></ul><li>Running the session stitching job is going to take a long time when we're stitching sessions for 100,000 users every night. <br></li><ul><li>If we want to re-construct long sessions (e.g. one month) during this stitching job, we'll have to touch a LOT of archives (e.g. going 30 days back).</li><li>If there's an issue during the stitching job, either a bug or some incorrect or missing logic, we'll have to re-run the job and backfill data.<br></li></ul><li>We're going to have to define ahead of time which users we're tracking (e.g. clientID % 10 == 0), because we just won't be able to efficiently record the locations of ALL pings for ALL users</li><ul><li>That would essentially mean we throw away 90% of the data on the busier channels as soon as we receive it on the server<br></li></ul></ul><li>I think re-constructing an FHR user-day will be a lot easier than re-constructing a session</li><ul><li>A user-day has the advantage of being data collected during a single day, and the day's pings will likely all be submitted together during the following day or a couple of days later. That means we can probably reconstruct the vast majority of user-days for any given day simply by looking at pings collected from the day of interest and a few days after it.</li><li>If this FHR job runs every night, getting a user's longitudinal history will just be a trivial matter of joining the nightly results.</li><li>Rules for stitching together FHR data are simpler.<br></li></ul><li>There's more, but I will explain further in a later message.<br></li></ul><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
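To make the user-day grouping concrete, here is a minimal Python sketch of stitching a batch of pings into user-days. This is a toy illustration only: the function name and the ping fields (`clientId`, `date`, `creationTime`) are invented for the example, not the actual ping format or a backend design.

```python
from collections import defaultdict

def build_user_days(pings):
    """Group pings into 'user-days': all of one client's pings from one
    calendar day, collected into a single ordered unit."""
    user_days = defaultdict(list)
    for ping in pings:
        # Hypothetical fields: each ping carries a clientId and a calendar date.
        key = (ping["clientId"], ping["date"])
        user_days[key].append(ping)
    # Order the fragments within each user-day so stitching rules can be applied.
    for fragments in user_days.values():
        fragments.sort(key=lambda p: p["creationTime"])
    return dict(user_days)

pings = [
    {"clientId": "a", "date": "2015-03-02", "creationTime": 2},
    {"clientId": "a", "date": "2015-03-02", "creationTime": 1},
    {"clientId": "b", "date": "2015-03-02", "creationTime": 1},
]
days = build_user_days(pings)
print(len(days))                                               # 2 user-days
print([p["creationTime"] for p in days[("a", "2015-03-02")]])  # [1, 2]
```

Reconstructing full sessions is where the cost shows up: the grouping key becomes a session ID spanning many days of archives, and the merge step needs per-histogram stitching rules instead of a simple sort.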
<br>
2) I treat the session orientation of telemetry as an unfortunate
limitation, not a desirable property, for almost all of the use
cases that I've seen. I'd like us to try and move away from
reporting metrics based on sessions. Can you describe in more detail
the use cases where analyzing data by session is preferable to
analyzing by some constant denominator? We should be willing to use
both clock time and activeTicks as denominators, and these
denominators can both be calculated looking at individual subsession
pings.<br></div></blockquote><div><ul><li>I wasn't around when Telemetry was initially designed, but I don't think sessions are the worst choice of unit for Telemetry. Sessions have inherent meaning since they map to how the browser is used. For example, a crash rate expressed in terms of "1 crash every 100 sessions" has an intuitive meaning.</li><li>I think your point about analyzing with respect to activeTicks and
clock time is orthogonal. We can do that with today's Telemetry using
session pings. In fact, that's exactly what Avi will use when
calculating "responsiveness scores"<br></li><li>There is <b>definitely</b> benefit to analyzing data with respect to sessions:</li><ol><li>Some measurements just need to be done in terms of sessions, e.g. % of sessions with clean shutdowns in Firefox 38, maximum number of threads during a session, ratio of cold startups to warm startups, etc.</li><li>We frequently look for correlations between variables. Having access to a full session's measurements means having access to all the data.</li><ul><li>For example, let's say I want to know if there is a correlation between a user's networking stack getting into a catatonic state and the browser doing a certain type of SPDY request. I need to compare how common a stuck networking stack is in sessions that made that request vs sessions that didn't make that request.</li></ul></ol><li>I do understand the advantages of "ping per day+environment". It's
appealing because it gives us more precision, shorter submission delays
and allows us to unify Telemetry & FHR. However, I'm worried the
backend stitching is being treated as a black box, and that this
approach will end up costing us our ability to do our current analyses.</li><ul><li>If
we can figure out a solid plan for efficiently recreating sessions
& user-days on the backend, I'll be in complete agreement with
switching to the new ping format.</li></ul></ul></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
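To illustrate point 2 above, here is a toy sketch of the session-level comparison in the SPDY example. It assumes sessions have already been fully stitched; all field names are invented for illustration.

```python
def stuck_rate(sessions, made_request):
    """Share of sessions with a stuck networking stack, among sessions that
    did (or did not) make the request. Requires whole sessions as the unit."""
    subset = [s for s in sessions if s["made_spdy_request"] == made_request]
    return sum(1 for s in subset if s["network_stuck"]) / len(subset)

# Hypothetical stitched sessions, each reduced to the two variables of interest.
sessions = [
    {"made_spdy_request": True,  "network_stuck": True},
    {"made_spdy_request": True,  "network_stuck": False},
    {"made_spdy_request": False, "network_stuck": False},
    {"made_spdy_request": False, "network_stuck": False},
]
print(stuck_rate(sessions, True))   # 0.5
print(stuck_rate(sessions, False))  # 0.0
```

The point of the sketch is the input shape: both variables must come from the same complete session, so a fragment-only view can't answer the question.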
<br>
3) For the case of the current telemetry dashboard, I'd like to
understand why simply replacing the current whole-session analysis
with the new subsessions would produce statistically worse results
than the current session-based analysis.<span class=""><br></span></div></blockquote><div><br></div><div>I'll assume you mean the <a href="http://telemetry.mozilla.org">telemetry.mozilla.org</a> <b>histogram </b>dashboard. For the record, my objections are not on the basis of statistical skew.<br><br>First, the histograms in that dash can be divided into two categories:<br><ol><li>The first category is histograms that collect many measurements during a session, each mostly independent of the other. These are the most common, and are used to generate a view of the <b>distribution </b>of individual measurements. A good example is the exponential GC_MS histogram showing the duration of garbage collection. Another example is the "error code" enum histograms, which show the relative frequency of different error codes.<br></li><ul><li>Reporting by subsession is ideal for these histograms! We get data sooner from longer sessions and we get it even in the event of a crash! No objections here.</li><ul><li>We could even reset these histograms between pings from the same session.</li></ul><li>The second category is the per-session histograms (flag, count, some keyed histograms, etc.).</li><ul><li>These have to be reported differently. I've already talked about these and will answer your questions further below.</li><li>These histograms don't benefit from subsession reporting since their value isn't known until the session is complete.</li><li>These can also be represented well with your proposed approach.</li></ul></ol></div><div>However, the Telemetry <b>histogram </b>dashboard is the simplest & easiest Telemetry use case to implement. It's also the least interesting.
Custom Telemetry analyses are much more powerful and are the preferred way of answering the most interesting questions (is optimization X working, is environment variable A correlated with performance problem B, how is extension Y affecting Firefox performance, etc).<br></div><div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span class="">
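To illustrate why the first category needs no stitching: subsession histograms over the same buckets combine by simple bucket-wise addition, regardless of which sessions the pings came from. A minimal sketch with made-up toy data:

```python
def merge_histograms(histograms):
    """Bucket-wise sum of histogram counts. Distribution histograms like
    GC_MS aggregate straight from subsession pings; no session stitching."""
    merged = [0] * len(histograms[0])
    for counts in histograms:
        for i, n in enumerate(counts):
            merged[i] += n
    return merged

# Three subsession pings, possibly from different sessions, same bucket layout.
print(merge_histograms([[1, 0, 2], [0, 3, 1], [2, 2, 0]]))  # [3, 5, 3]
```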
</span>
*If* you really care about this per-session, why can't you just take
"true" from any of the subsessions as an indication that it's true
for the entire session?<br></div></blockquote><div><br></div><div>This was in relation to flag histograms. Telemetry users definitely want flag histograms: there are 86 of them in Histograms.json (even though most people don't know about them), because they're just a very natural use case.<br>Your approach wouldn't work. Flag histograms are reported as either a True or False value, but the flag could be "set" to True many times in a session. So a 30-day session that hits the flag condition every day would count as 30 sessions submitting a true value (which is incorrect: 1 session = 1 vote for flag histograms).<br><br>Instead, we could treat per-session histograms differently within your proposal. Simply don't reset per-session histograms on every new ping, and only ever report them in the final ping of a session.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
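A toy illustration of the vote-counting problem (the ping shapes are invented): counting each "true" subsession ping as a vote lets one long session outvote everything else, while the correct per-session counting gives each session exactly one vote.

```python
def sessions_with_flag(sessions):
    """Correct counting: a session votes 'true' once if ANY of its
    subsession pings set the flag (1 session = 1 vote)."""
    return sum(1 for subs in sessions if any(p["flag"] for p in subs))

def naive_subsession_votes(sessions):
    """Naive counting: every 'true' subsession ping is a separate vote,
    so a month-long session hitting the flag daily casts 30 votes."""
    return sum(1 for subs in sessions for p in subs if p["flag"])

sessions = [
    [{"flag": True}] * 30,   # one 30-day session, flag condition hit every day
    [{"flag": False}],       # one short session that never sets the flag
]
print(sessions_with_flag(sessions))      # 1  (1 of 2 sessions)
print(naive_subsession_votes(sessions))  # 30 (the long session dominates)
```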
And if we just report by subsession, how is this much different from
the skew that we already have between users who have lots of short
sessions and users that keep their browser open for days or weeks?<br></div></blockquote><div><br></div><div>The histograms are already defined as "per session", which is a meaningful unit, e.g. the percentage of browser startups that showed the "Updating Nightly" dialog. So you can see how the short-session skew isn't always a factor. Expressing the same measurement in terms of "dialog appearances per subsession" or "per hour of usage" would be meaningless, because the measurement is specifically tied to startup behavior (i.e. once per session).<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
Maybe this just indicates that we're mis-using histograms for
non-aggregate measurements, and we should just have a separate list
of flag metrics which are treated differently.<span class=""><br></span></div></blockquote><div><br></div><div>That would work, but converting all the client code is going to be no easy task.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span class="">
<blockquote type="cite">
<div dir="ltr">
<div>
<ul>
<ul>
<li><a href="https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Choosing_a_Histogram_Type" target="_blank">Count
histograms</a> are also per-session measurements. You
can't aggregate a count-histogram value from the middle
of a session together with final values from other
sessions<br>
</li>
</ul>
</ul>
</div>
</div>
</blockquote>
<br></span>
Won't summing across the subsessions get you the total count for the
session?<span class=""><br></span></div></blockquote><div><br></div><div>Yes it would, but that requires running a session stitching job before feeding the data into the dash, something that's not currently required. I'm also skeptical about session stitching. I think it's better to report count histograms only in the final ping of the session.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class=""></span>I don't understand this case. Assuming session stitching works,
which is a general requirement for all sorts of analyses, this
should work no worse than currently, and you potentially have
finer-grain data on the subsequent days if that's useful.<br></blockquote><div><br></div><div>Yup, I was expressing concerns about the feasibility of session stitching.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span class="">
<blockquote type="cite">
<div dir="ltr">
<div>
<ul>
<li>Resetting Telemetry and FHR data when a
TelemetryExperiment begins removes valuable context from
the experiment ping. It's possible to reconstruct it, but
that's yet another server-side job to run</li>
</ul>
</div>
</div>
</blockquote></span>
I don't understand this. Is this also assuming that stitching is
expensive?<span class=""><br></span></div></blockquote><div><br></div><div>Yes.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><span class=""></span>
I feel like this proposal is optimizing for the wrong things.<br>
<br>
You are making a distinction between "Telemetry" measurements and
other measurements in a way which I am specifically trying to avoid.
The goal is to use the common histogram system for everything. At
least some of those measurements must be distinguished by
subsession. I explicitly want to get rid of the current situation
where "telemetry metrics" are treated one way, and "FHR metrics" are
treated in some entirely separate manner. We want to be able to use
the standard histograms/keyed histograms for almost everything.<br></div></blockquote><div><br></div><div>I understand the appeal of unification, but we have to be aware that we're merging systems with different requirements. Subsession reporting is the least common denominator between FHR and Telemetry, but it comes at the cost of doing more work on the backend to convert data back to the two formats we currently have and care about (user-days and sessions).<br></div><div>Note that in my earlier proposal, FHR <b>could</b> use the same histograms as Telemetry.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
For the simple things like the telemetry dashboard, I believe that
doing all analysis by subsession is good enough (no worse than the
current situation). For more complex queries, both stitching
together an entire session and stitching together the history
per-user will not only be possible but should be fairly efficient.<br></div></blockquote><div><br></div><div>I agree about the dash, but in the absence of an efficient design for backend stitching, I'm worried that this will significantly increase the difficulty of analyzing Telemetry, especially at a time when we're doing more and more analysis to support top organizational goals. It also seems to run counter to the efforts Roberto has made in making analysis quicker & simpler (Spark and web front-end): I think we could end up adding a clunky and possibly fragile pre-processing step to every analysis.<br></div></div><br></div></div>