Suggestions for the new unified FHR/Telemetry/Experiment ping
Vladan Djeric
vdjeric at mozilla.com
Fri Jan 23 22:35:21 PST 2015
Hi all,
I've been thinking about Telemetry and FHR unification, and I think we have
to be careful around the new ping semantics. Client-side changes require
backend architecture changes, and both ultimately determine which analyses
are convenient or even feasible.
To recap, we are merging 3 measurement systems with different semantics:
- Telemetry's measurements are implicitly "*per session*". Telemetry
creates a new ping at the beginning of every Firefox session and records a
"saved-session" ping at the end of the session.
- *There is also an "idle-daily" ping sent during the session at most
every 24 hours, but there are backend problems so idle-dailies
are currently not being used for anything*
- FHR's reporting of user activity & browser state is mostly with
respect to *calendar days*
- TelemetryExperiments focus on differences between the test group and
the control group
We wish to unify the FHR & Telemetry pings into a single ping and make data
collected during A/B TelemetryExperiments more precise.
*So first off, let me describe my interpretation of the current unification
proposal
<https://docs.google.com/document/d/1IGpzsYGi_sq3YFQDAPyKOkU_BKvXAC95fZYA2i4ceVs/edit#>:*
- Whenever a new ping is started, all the FHR & Telemetry measurements
for the current session will be reset
- In the new system, Firefox starts collecting a new ping whenever:
1. a new Firefox session is started
2. a new day has begun (not sure if it's every 24 hours of uptime, or
if it's based on local time e.g. midnight local time)
3. whenever the Firefox "environment" changes
- Examples: a user enables or disables an addon, the graphics
driver is updated, an A/B experiment begins or ends (this happens in the
middle of a session), Firefox HW acceleration is disabled, etc.
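To make the three triggers above concrete, here is a minimal Python sketch of how I read them. It is not Firefox code: `PingContext`, `new_ping_reason` and the field names are made up for illustration, and it assumes "a new day" means a change of local calendar date, which the proposal does not actually settle.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PingContext:
    """State captured when the current ping was started (hypothetical fields)."""
    session_id: str
    local_day: date         # assumption: "new day" means local calendar date
    environment: frozenset  # e.g. enabled add-ons, active experiment, driver


def new_ping_reason(current, now):
    """Return why a new ping should start, or None to keep the current one."""
    if now.session_id != current.session_id:
        return "new-session"
    if now.local_day != current.local_day:
        return "new-day"
    if now.environment != current.environment:
        return "environment-change"
    return None


start = PingContext("s1", date(2015, 1, 23), frozenset({"addon-a"}))
after_midnight = PingContext("s1", date(2015, 1, 24), frozenset({"addon-a"}))
addon_disabled = PingContext("s1", date(2015, 1, 23), frozenset())
```

Under these assumptions, `new_ping_reason(start, after_midnight)` reports a day change and `new_ping_reason(start, addon_disabled)` reports an environment change; in both cases the proposal would also reset all measurements.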
This proposal has some nice properties unique to it:
- Relatively straightforward to implement on the client
- No duplicated measurements (the existing Telemetry "saved-session"
pings duplicate the measurements in the "idle-daily" pings)
- Data is sent to Mozilla servers quickly (no ping covers more than a
24-hour period, so no waiting for a session to end)
However, I think these semantics would create a lot of problems for
Telemetry analysis:
- It will be hard to do per-session analyses
- It's going to be hard to re-construct sessions from session
fragments on the Telemetry server
- A 2-week long session will have at least 14 pings scattered
across 14 daily archives
- The reconstruction process could get messy and fragile
- Code for merging different histogram types, dealing with missing
session fragments, storing and updating the merged pings, backfills,
correcting errors, etc.
- For FHR's needs, I think this is unavoidable (and easier!),
unless we go back to submitting 6-month user histories in every ping as
FHR does now
- It's better not to do these kinds of reconstruction jobs unless
we absolutely have to
- Many of the 1000+ Telemetry measurements are inherently
"per-session" and can't meaningfully be split into session fragments:
- Flag histograms
<https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Choosing_a_Histogram_Type>
track feature usage per-session.
- They are automatically initialized to a value of "false" at the
beginning of a session, and can only be set to "true" once.
- If we reset Telemetry measurements every time we create a new
ping, we'll be reporting nonsense: pings from the same session will
contradict each other on whether a feature was ever used
during the session
- This would feed bad data to both the dashboards and any custom
analyses
- Count histograms
<https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe#Choosing_a_Histogram_Type>
are also per-session measurements. You can't aggregate a count-histogram
value from the middle of a session together with final values from
other sessions
- You might be saying: "So only report those histogram types in the
final ping of the session!"
- We don't know all the histograms that need this treatment. Other
histogram types are being used to represent per-session measurements
such as configuration settings or feature usage, e.g. the
"CANVAS_2D_USED" boolean histogram
- Some keyed-histograms have per-session semantics, some don't
- Some Telemetry users want measurements expressed in "per session"
terms and those measurements aren't necessarily in count & flag
histograms
- See next point about custom analyses
- For custom analyses, we sometimes want to correlate measurements
from the beginning of a session with measurements from the end of a session
(which could have lasted several days), e.g. histograms related to startup
performance vs later performance
- We would need that messy server-side session reconstruction process
to get at per-session data.
- More generally, a ping generated as a result of local time &
environment changes is not inherently meaningful to us, unlike a full
user session
- Resetting Telemetry and FHR data when a TelemetryExperiment begins
removes valuable context from the experiment ping. It's possible to
reconstruct it, but that's yet another server-side job to run
- There's overhead from sending a new ping for each mid-session
environment change
- There's also a small privacy issue with creating ordered,
fine-grained reports of user actions, e.g. when a user goes through their
add-ons list and disables 5 addons, we report each user action
- Either coalesce successive environment-change pings, or carefully
vet which mid-session environment changes generate a new ping
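The flag-histogram problem from the list above can be illustrated with a small Python sketch. This is a toy model, not Telemetry code: `fragment_flags` is a hypothetical helper, and the 24-hour split and the timestamps are made-up inputs. It shows that once a flag is reset at every ping boundary, fragments of the same session disagree about whether the feature was used, and the per-session answer can only be recovered by combining every fragment.

```python
def fragment_flags(events, boundaries):
    """Split one session's "feature used" events into per-ping flag values.

    events: timestamps (hours since session start) at which the feature was used.
    boundaries: sorted timestamps at which a new ping starts and the flag resets.
    Returns one boolean per session fragment.
    """
    edges = [0] + list(boundaries) + [float("inf")]
    return [
        any(lo <= t < hi for t in events)
        for lo, hi in zip(edges, edges[1:])
    ]


# Feature used once, 30 hours into a session; a new ping starts every 24 hours.
fragments = fragment_flags(events=[30], boundaries=[24, 48])

# Server-side reconstruction: the session used the feature if any fragment says so.
session_flag = any(fragments)

# If the middle fragment never arrives, the reconstructed answer is wrong.
session_flag_with_gap = any(fragments[:1] + fragments[2:])
```

Here the three fragments report false, true and false for the same session, and losing a single fragment silently flips the reconstructed per-session value.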
I'd like to propose that we implement the following modifications to the
FHR/Telemetry v4 document:
1. Do not reset *Telemetry* measurements when a session crosses the
24-hour boundary
- Continue to "reset" Telemetry measurements when we start a new
session
- There's no need to reset Telemetry on most environment changes
(e.g. amount of memory installed) since those can't happen without a
Firefox restart anyway.
2. Record mid-session environment changes (add-ons and
TelemetryExperiments) in a special section in the ping.
- For each such environment change, document the change in the
section and also attach a snapshot of the Telemetry & FHR data at the
time of the change
- After the snapshot is saved, reset Telemetry and FHR measurements
for the current session. In other words, snapshot & then build up a diff
- For each additional environment change during the same session,
just repeat and append to the new section
- Telemetry backend scripts (dashboard, regression detector etc) can
just ignore experiment/add-on change pings
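The two modifications above could be modelled roughly as follows. This is a minimal Python sketch under my own assumptions, not an implementation: `SessionPing`, its method names, the payload keys and the probe name `EXAMPLE_PROBE` are all hypothetical, only Telemetry-style counters are modelled (FHR state is left out), and I assume a ping is still submitted at the 24-hour boundary even though nothing is reset.

```python
import copy


class SessionPing:
    """Hypothetical in-memory model of the proposed ping; not Firefox code."""

    def __init__(self):
        self.measurements = {}         # probe name -> count since last reset
        self.environment_changes = []  # the proposed "special section"

    def record(self, probe, amount=1):
        self.measurements[probe] = self.measurements.get(probe, 0) + amount

    def on_day_boundary(self):
        # Item 1: crossing the 24-hour boundary produces a ping
        # but does not reset the measurements.
        return self.payload()

    def on_environment_change(self, description):
        # Item 2: document the change, attach a snapshot, then reset.
        self.environment_changes.append(
            {"change": description, "snapshot": copy.deepcopy(self.measurements)}
        )
        self.measurements = {}

    def payload(self):
        return {
            "measurements": copy.deepcopy(self.measurements),
            "environmentChanges": copy.deepcopy(self.environment_changes),
        }


ping = SessionPing()
ping.record("EXAMPLE_PROBE", 3)
daily = ping.on_day_boundary()       # still contains the 3 recorded so far
ping.record("EXAMPLE_PROBE", 2)
ping.on_environment_change("experiment-started")
ping.record("EXAMPLE_PROBE", 1)
final = ping.payload()
```

In this toy run the final payload carries the post-change measurements in the regular section and the pre-change totals in the snapshot, so a script that only cares about the regular section can ignore `environmentChanges` entirely.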
This model has some nice properties:
- The *final ping* of a session is equivalent to a Telemetry
saved-session ping
- Per-session analyses are as easy to do as before
- No need to run any session reconstruction jobs!
- Every main ping submitted is meaningful without needing any
reconstruction steps. All pings will contain the current FHR state + all
the Telemetry measurements from the current session
- Most pings will only have one environment change, so the relevant
measurements that happened after the change are all going to be in the
regular Telemetry/FHR section
- However, when deeper analysis is required, Experiment pings will also
have information about what was happening BEFORE the experiment began
- Analyzing pings with multiple environment changes won't be much harder
Admittedly, there is a trade-off to not resetting Telemetry after the
24-hour period.
- Since each main ping submitted will contain Telemetry data from the
start of the session, getting Telemetry data collected over a single day
will be hard. I think this is an acceptable tradeoff.
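To give a sense of what "hard" means here: recovering one day's data would require subtracting the previous cumulative ping of the same session from the current one. A rough Python sketch follows, assuming histograms are plain bucket-to-count maps; `daily_delta` and the bucket labels are hypothetical. This only works for additive bucket counts, not for flag-style values, and not across a reset caused by an environment change.

```python
def daily_delta(previous, current):
    """Subtract two cumulative bucket-count maps taken from the same session."""
    delta = {}
    for bucket, count in current.items():
        diff = count - previous.get(bucket, 0)
        if diff < 0:
            # Counts only grow within a session, so this means a reset or a mismatch.
            raise ValueError("counts decreased; pings are not one cumulative series")
        if diff:
            delta[bucket] = diff
    return delta


day1 = {"0-10ms": 4, "10-100ms": 1}               # cumulative at end of day 1
day2 = {"0-10ms": 9, "10-100ms": 1, "100ms+": 2}  # cumulative at end of day 2
day2_only = daily_delta(day1, day2)
```

The server would also need both pings to be present and correctly ordered, which is the same class of reconstruction work described earlier, only for a less common use case.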
I want to mention a few other solutions, but these are not as appealing:
- Collect and submit both per-session and per-ping Telemetry data...
This doubles Telemetry run-time memory use
- Reconstruct sessions by merging saved pings on the client-side... I
think this would be a mess
- Have TelemetryExperiments take effect on restart instead of
mid-session... This biases the experiment data against longer-running
sessions, and addons would still be an issue
Let me know what you think.
Thank you,
Vladan