<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>All -<br>
</p>
<p>After much weeping, wailing, and gnashing of teeth, the
heavy_users dataset is available. It has one row per client-day,
where day is submission_date. A client has a row for a specific
submission_date if they were active at all in the 28-day window
ending on that submission_date.</p>
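<p>This minimal Python sketch makes the row definition concrete. It is not the production job. It derives client-day rows from per-client activity dates. The input shape is hypothetical: a dict mapping client_id to a set of active dates. So is the function name <code>client_day_rows</code>. The sketch also assumes the 28-day window includes the submission_date itself.</p>
<pre>
```python
from datetime import date, timedelta

WINDOW_DAYS = 28  # length of the trailing window described above


def client_day_rows(active_days, submission_dates):
    """Return (client_id, submission_date) rows, one per client-day.

    active_days: hypothetical dict of client_id -> set of dates the client
    was active. A client gets a row for submission_date d if it was active
    on any day in the window [d - 27 days, d] (assumed inclusive).
    """
    rows = []
    for d in submission_dates:
        window_start = d - timedelta(days=WINDOW_DAYS - 1)
        for client_id, days in active_days.items():
            if any(window_start <= a <= d for a in days):
                rows.append((client_id, d))
    return rows


activity = {"a": {date(2017, 8, 1)}, "b": {date(2017, 8, 20)}}
dates = [date(2017, 8, 10), date(2017, 8, 28), date(2017, 8, 29)]
rows = client_day_rows(activity, dates)
# Client "a" has no row for 2017-08-29, because 2017-08-01 has left its window.
```
</pre>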
<h2>How to use it</h2>
<p>In Spark:
<code>spark.read.parquet(<a class="moz-txt-link-rfc2396E" href="s3://telemetry-parquet/heavy_users/v1">"s3://telemetry-parquet/heavy_users/v1"</a>)</code></p>
<p>In SQL: <code>SELECT * FROM heavy_users</code></p>
<p>Below are a few example queries for you to peruse.</p>
<p><a class="moz-txt-link-freetext" href="https://sql.telemetry.mozilla.org/queries/47041/source#127382">https://sql.telemetry.mozilla.org/queries/47041/source#127382</a> - <b>Joined
with main_summary</b> to get the distribution of
max_concurrent_tab_count for heavy vs. non-heavy users</p>
<p><a class="moz-txt-link-freetext" href="https://sql.telemetry.mozilla.org/queries/47044/source#127385">https://sql.telemetry.mozilla.org/queries/47044/source#127385</a> - <b>Joined
with longitudinal</b> to get crash rates for heavy vs. non-heavy
users</p>
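<p>The example queries join heavy_users to another table. This is a toy illustration of such a join, written in plain Python rather than SQL and using made-up rows. It matches rows on (client_id, submission_date), which is an assumed join key. It skips rows whose heavy_user is NULL and summarizes each group by its median. The row shapes and the function name <code>compare_by_heavy</code> are assumptions, not the actual main_summary schema.</p>
<pre>
```python
from statistics import median


def compare_by_heavy(heavy_rows, summary_rows, metric):
    """Join on (client_id, submission_date) and take the median of `metric`
    separately for heavy and non-heavy users.

    heavy_rows: dicts with client_id, submission_date, heavy_user.
    summary_rows: dicts with client_id, submission_date and `metric`.
    Rows with no match, a NULL (None) heavy_user, or a NULL metric are skipped.
    """
    flag = {(r["client_id"], r["submission_date"]): r["heavy_user"]
            for r in heavy_rows}
    groups = {True: [], False: []}
    for r in summary_rows:
        is_heavy = flag.get((r["client_id"], r["submission_date"]))
        value = r.get(metric)
        if is_heavy is not None and value is not None:
            groups[is_heavy].append(value)
    return {("heavy" if k else "non-heavy"): median(v)
            for k, v in groups.items() if v}


heavy = [
    {"client_id": "a", "submission_date": "20170810", "heavy_user": True},
    {"client_id": "b", "submission_date": "20170810", "heavy_user": False},
    {"client_id": "c", "submission_date": "20170810", "heavy_user": False},
]
summary = [
    {"client_id": "a", "submission_date": "20170810", "max_concurrent_tab_count": 12},
    {"client_id": "b", "submission_date": "20170810", "max_concurrent_tab_count": 3},
    {"client_id": "c", "submission_date": "20170810", "max_concurrent_tab_count": 5},
    {"client_id": "d", "submission_date": "20170810", "max_concurrent_tab_count": 40},
]
print(compare_by_heavy(heavy, summary, "max_concurrent_tab_count"))
# {'heavy': 12, 'non-heavy': 4.0}
```
</pre>
<p>Client "d" has no heavy_users row, so the inner-join behavior drops it.</p>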
<p>You'll note that heavy_users appear to use more tabs but
crash less. These results probably require more investigation.</p>
<h2>What is it?</h2>
<p>A user is a heavy_user as of day N if, for the 28-day period
ending on day N, the sum of their active_ticks is at or above the
90th percentile of all clients during that period. For more
analysis on this, and a discussion of new profiles, see
<a class="moz-txt-link-freetext" href="https://metrics.mozilla.com/protected/sguha/heavy/heavycutoffs5.html">https://metrics.mozilla.com/protected/sguha/heavy/heavycutoffs5.html</a>.</p>
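<p>Here is a Python sketch of that definition, as an illustration under stated assumptions and not the production logic. The input is a hypothetical dict mapping (client_id, date) to that day's active_ticks. The window is assumed to include day N itself. The 90th percentile is computed with linear interpolation between ranks. The exact percentile method used to build heavy_users isn't stated here, and approximate percentile functions can give slightly different cutoffs. The function names are hypothetical.</p>
<pre>
```python
import math
from datetime import date, timedelta


def percentile_linear(sorted_values, p):
    """p-th quantile (0 <= p <= 1) of a sorted list, interpolating linearly."""
    idx = p * (len(sorted_values) - 1)
    lo, hi = math.floor(idx), math.ceil(idx)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (idx - lo)


def heavy_users_as_of(daily_ticks, day, window_days=28, p=0.90):
    """Map client_id -> heavy_user flag as of `day`.

    daily_ticks: hypothetical dict of (client_id, date) -> active_ticks.
    Only clients with activity inside the window receive a flag.
    """
    start = day - timedelta(days=window_days - 1)
    totals = {}
    for (client_id, d), ticks in daily_ticks.items():
        if start <= d <= day:
            totals[client_id] = totals.get(client_id, 0) + ticks
    if not totals:
        return {}
    cutoff = percentile_linear(sorted(totals.values()), p)
    # ">=" treats a client exactly at the cutoff as heavy.
    return {c: total >= cutoff for c, total in totals.items()}


# Ten clients with 1..10 active_ticks; the interpolated cutoff is 9.1.
ticks = {(f"c{i}", date(2017, 8, 1)): i for i in range(1, 11)}
flags = heavy_users_as_of(ticks, date(2017, 8, 28))
print(sorted(c for c, is_heavy in flags.items() if is_heavy))  # ['c10']
```
</pre>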
<p>Please note a few caveats:</p>
<p>0. Data starts at 20170801. There is technically data in the
table before this, but heavy_user is NULL for those dates because
the first 28-day window needed to be bootstrapped.</p>
<p>1. Because the cutoff is the top 10% over each 28-day period,
more than 10% of the clients active on any single submission_date
are considered heavy_users. This is because heavy_users, on average,
use Firefox on more days than non-heavy users, so they are
overrepresented among the clients active on any given day.</p>
<p>2. Each day has a separate, but related, set of heavy_users.
Initial investigations show that ~97.5% of heavy_users as of a
given day are still considered heavy_users as of the next day.</p>
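<p>Caveats 1 and 2 come down to simple ratios. This sketch is a hypothetical illustration with made-up data. The function names and input shapes are assumptions, and this is not necessarily how the ~97.5% figure was computed. The first function measures what share of the clients active on one particular day are heavy. The second measures how many of one day's heavy_users are still heavy the next day.</p>
<pre>
```python
def share_heavy_among_active_on(flags, active_on_day):
    """Caveat 1: fraction of clients active on one specific day that are heavy.

    flags: client_id -> heavy_user flag for that submission_date.
    active_on_day: client_ids with activity on that exact day.
    """
    active = [c for c in active_on_day if c in flags]
    if not active:
        return None  # undefined when no flagged client was active that day
    return sum(1 for c in active if flags[c]) / len(active)


def day_over_day_retention(flags_today, flags_next_day):
    """Caveat 2: share of one day's heavy_users still heavy the next day."""
    heavy_today = {c for c, is_heavy in flags_today.items() if is_heavy}
    if not heavy_today:
        return None
    still_heavy = {c for c in heavy_today if flags_next_day.get(c)}
    return len(still_heavy) / len(heavy_today)


# Toy data: 1 of 10 clients (10%) is heavy over the window, but the heavy
# client is active on more days, so it is a larger share of any one day's
# active clients.
flags = {f"c{i}": i == 10 for i in range(1, 11)}
active_today = ["c10", "c1", "c2", "c3", "c4"]
print(share_heavy_among_active_on(flags, active_today))  # 0.2

today = {"c7": True, "c8": True, "c9": True, "c10": True, "c1": False}
next_day = {"c7": True, "c8": True, "c9": False, "c10": True, "c1": False}
print(day_over_day_retention(today, next_day))  # 0.75
```
</pre>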
<p>-Frank<br>
</p>
</body>
</html>