Heavy Users Dataset - Available Now!
Frank Bertsch
fbertsch at mozilla.com
Thu Oct 5 03:50:50 UTC 2017
All -
After much weeping, wailing, and gnashing of teeth, the heavy_users
dataset is available. It has one row per client-day, where day is
submission_date. A client has a row for a specific submission_date if
they were active at all in the 28-day window ending on that submission_date.
How to use it:
In Spark: `spark.read.parquet("s3://telemetry-parquet/heavy_users/v1")`
In SQL: `SELECT * FROM heavy_users`
Below are a few example queries for you to peruse.
https://sql.telemetry.mozilla.org/queries/47041/source#127382 - *Joined
with main_summary* to get distribution of max_concurrent_tab_count for
heavy vs. non-heavy users
https://sql.telemetry.mozilla.org/queries/47044/source#127385 - *Joined
with longitudinal* to get crash rates for heavy vs. non-heavy users
You'll note that heavy_users appear to use more tabs but crash
less. These results probably require more investigation.
What is it?
A user is a heavy_user as of day N if, for the 28-day period ending on
day N, the sum of their active_ticks is at or above the 90th percentile
of all clients during that period. For more analysis on this, and
a discussion of new profiles, see
https://metrics.mozilla.com/protected/sguha/heavy/heavycutoffs5.html.
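To make the definition concrete, here is a minimal sketch in plain Python. It is not the actual job that builds the dataset. The input shape (a dict keyed by (client_id, date)), the function name heavy_users_as_of, and the nearest-rank percentile method are all assumptions made for illustration; the real cutoff calculation may differ.

```python
from datetime import date, timedelta


def heavy_users_as_of(ticks, day_n, window_days=28, percentile=90):
    """Return the set of client_ids that are heavy users as of day_n.

    ticks: dict mapping (client_id, date) -> active_ticks on that date
           (hypothetical input shape, chosen for this sketch).
    """
    # The window is the 28 days ending on day_n, inclusive.
    start = day_n - timedelta(days=window_days - 1)

    # Sum active_ticks per client over the window.
    totals = {}
    for (client_id, day), n in ticks.items():
        if start <= day <= day_n:
            totals[client_id] = totals.get(client_id, 0) + n
    if not totals:
        return set()

    # Nearest-rank percentile (an assumption): the smallest total such that
    # at least `percentile` percent of clients are at or below it.
    ordered = sorted(totals.values())
    rank = (percentile * len(ordered) + 99) // 100  # integer ceiling
    cutoff = ordered[rank - 1]

    # "In the 90th percentile (or above)" is read here as total >= cutoff.
    return {c for c, total in totals.items() if total >= cutoff}
```

Clients with no activity inside the window never enter the totals, which matches the rule that a client only has a row if they were active at some point in the 28-day window.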
Please note a few caveats:
0. Data starts at 20170801. There is technically data in the table
before this, but heavy_user is NULL for those dates because the first
28-day window had to be bootstrapped.
1. Because the cutoff is the top 10% for each 28-day period, any single
submission_date has more than 10% of clients considered heavy_users.
This is because heavy_users, on average, use Firefox on more days than
non-heavy users.
2. Each day has a separate, but related, set of heavy_users. Initial
investigations show that ~97.5% of heavy_users as of a certain day are
still considered heavy_users as of the next day.
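
Caveats 1 and 2 can be illustrated with a small sketch on toy data. The function names and input shapes below are hypothetical, and the first function assumes that the "more than 10%" figure is measured among clients active on a given day; that reading is an interpretation, not something confirmed by the dataset's implementation.

```python
def heavy_share_among_active(active_days, heavy, day):
    """Fraction of clients active on `day` that are heavy users.

    active_days: dict mapping client_id -> set of days the client was active
                 (hypothetical shape; days are plain integers here).
    heavy: set of client_ids considered heavy users for the window.
    """
    active = {c for c, days in active_days.items() if day in days}
    if not active:
        return 0.0
    return len(active & heavy) / len(active)


def retained_fraction(heavy_today, heavy_tomorrow):
    """Fraction of today's heavy users that are still heavy tomorrow."""
    if not heavy_today:
        return 0.0
    return len(heavy_today & heavy_tomorrow) / len(heavy_today)


# Toy data: 10 clients over a 28-day window (days 0..27).
# One heavy client is active every day; nine light clients are each
# active on only three days.
active_days = {"heavy": set(range(28))}
for i in range(9):
    active_days["light%d" % i] = {i, i + 9, i + 18}

heavy = {"heavy"}  # 1 of 10 clients, i.e. 10% of the window population

# On day 0 only "heavy" and "light0" are active, so heavy users make up
# half of that day's clients even though they are 10% of the window.
share_day0 = heavy_share_among_active(active_days, heavy, 0)

# Day-over-day overlap between two heavy-user sets.
overlap = retained_fraction({"a", "b", "c", "d"}, {"a", "b", "c", "e"})
```

The toy numbers are exaggerated on purpose; the only point is that clients who show up on more days are over-represented on any single day.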
-Frank