Heavy Users Dataset - Available Now!

Ryan Harter rharter at mozilla.com
Thu Oct 5 13:31:02 UTC 2017


Awesome! This is a huge help. Thanks, Frank!

On Thu, Oct 5, 2017 at 9:25 AM, Rebecca Weiss <rweiss at mozilla.com> wrote:

> AMAZING.  Thanks Frank!
>
> On Thu, Oct 5, 2017 at 9:22 AM, Saptarshi Guha <sguha at mozilla.com> wrote:
>
>> Lovely news. Thanks for your efforts on this
>> Regards
>> Saptarshi
>>
>> On Thu, Oct 5, 2017 at 12:18 AM, Alessio Placitelli <
>> aplacitelli at mozilla.com> wrote:
>>
>>> Great job Frank!
>>>
>>> 2017-10-05 5:50 GMT+02:00 Frank Bertsch <fbertsch at mozilla.com>:
>>>
>>>> All -
>>>>
>>>> After much weeping, wailing, and gnashing of teeth, the heavy_users
>>>> dataset is available. It has one row per client-day, where day is
>>>> submission_date. A client has a row for a specific submission_date if they
>>>> were active at all in the 28-day window ending on that submission_date.
>>>>
>>>> How to use it:
>>>>
>>>> In Spark: `spark.read.parquet("s3://telemetry-parquet/heavy_users/v1")`
>>>>
>>>> In SQL: "SELECT * FROM heavy_users"
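
A minimal PySpark sketch of the Spark route above, assuming an existing SparkSession is passed in as `spark`. The helper name `load_heavy_users` and the assumption that submission_date is stored as a YYYYMMDD string are illustrative and not part of the announcement. The filter skips the bootstrap rows where heavy_user is NULL (dates before 20170801, per the caveats in this message).

```python
HEAVY_USERS_PATH = "s3://telemetry-parquet/heavy_users/v1"  # path from the announcement


def load_heavy_users(spark, start_date="20170801"):
    """Read heavy_users and keep only rows where heavy_user is populated.

    Assumption: submission_date is a YYYYMMDD string column. Check the real
    schema (for example with df.printSchema()) before relying on this.
    """
    df = spark.read.parquet(HEAVY_USERS_PATH)
    condition = "submission_date >= '{}' AND heavy_user IS NOT NULL".format(start_date)
    return df.where(condition)
```

Passing the session in as an argument keeps the helper usable from a notebook, where `spark` usually already exists.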
>>>>
>>>> Below are a few example queries for you to peruse.
>>>>
>>>> https://sql.telemetry.mozilla.org/queries/47041/source#127382 - *Joined
>>>> with main_summary* to get distribution of max_concurrent_tab_count for
>>>> heavy vs. non-heavy users
>>>>
>>>> https://sql.telemetry.mozilla.org/queries/47044/source#127385 - *Joined
>>>> with longitudinal* to get crash rates for heavy vs. non-heavy users
>>>>
>>>> Note that heavy_users appear to use more tabs but crash less. These
>>>> results probably require more investigation.
>>>>
>>>> What is it?
>>>>
>>>> A user is a heavy_user as of day N if, for the 28-day period ending on
>>>> day N, the sum of their active_ticks is at or above the 90th percentile
>>>> of all clients during that period. For more analysis on this, and a
>>>> discussion of new profiles, see
>>>> https://metrics.mozilla.com/protected/sguha/heavy/heavycutoffs5.html.
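
The definition above can be sketched in plain Python. This is an illustration under stated assumptions, not the production job: the function name `heavy_users_as_of`, the input shape (tuples of client id, date, active_ticks), and the nearest-rank percentile method are all assumptions, and the real pipeline may compute the cutoff differently.

```python
from collections import defaultdict
from datetime import date, timedelta


def heavy_users_as_of(daily_ticks, day_n, window_days=28, percentile=90):
    """Return (cutoff, heavy_client_ids) for the window ending on day_n.

    daily_ticks: iterable of (client_id, activity_date, active_ticks) tuples.
    """
    # A 28-day window ending on day_n starts 27 days earlier (inclusive).
    window_start = day_n - timedelta(days=window_days - 1)
    totals = defaultdict(int)
    for client_id, activity_date, ticks in daily_ticks:
        if window_start <= activity_date <= day_n:
            totals[client_id] += ticks
    if not totals:
        return None, set()
    # Nearest-rank percentile (an assumed method), using integer arithmetic
    # to compute ceil(percentile * n / 100).
    ordered = sorted(totals.values())
    rank = (percentile * len(ordered) + 99) // 100
    cutoff = ordered[max(rank, 1) - 1]
    heavy = {client for client, total in totals.items() if total >= cutoff}
    return cutoff, heavy


# Made-up example rows, for illustration only.
rows = [("a", date(2017, 8, 28), 500), ("b", date(2017, 8, 1), 20)]
cutoff, heavy = heavy_users_as_of(rows, date(2017, 8, 28))
```

Only clients with activity inside the window contribute to the cutoff, which matches "all clients during that period" in the definition.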
>>>>
>>>> Please note a few caveats:
>>>>
>>>> 0. Data starts at 20170801. There is technically data in the table
>>>> before this, but heavy_user is NULL for those dates because the first
>>>> 28-day window had to be bootstrapped.
>>>>
>>>> 1. Because the cutoff is the top 10% for each 28-day period, any single
>>>> submission_date has more than 10% of clients considered heavy_users.
>>>> This is because heavy_users, on average, use Firefox on more days than
>>>> non-heavy users.
>>>>
>>>> 2. Each day has a separate, but related, set of heavy_users. Initial
>>>> investigations show that ~97.5% of heavy_users as of a given day are still
>>>> considered heavy_users as of the next day.
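
One reading of caveat 1 can be checked with a toy population. The numbers below are made up for illustration, and `heavy_share_on_day` is a hypothetical helper: 10% of clients are heavy over the 28-day window, but because they are active on more days, they make up a larger share of the clients seen on any one day.

```python
def heavy_share_on_day(clients, day):
    """Fraction of clients active on `day` that are flagged heavy.

    clients: list of (is_heavy, active_days) pairs, where active_days is the
    set of day indices (0..27) on which the client was active.
    """
    active = [is_heavy for is_heavy, days in clients if day in days]
    return sum(active) / len(active) if active else 0.0


# Toy population (made-up numbers): 10 heavy clients active every day,
# 90 non-heavy clients each active on only 7 of the 28 days.
heavy_clients = [(True, set(range(28))) for _ in range(10)]
light_clients = [(False, {(i + k) % 28 for k in range(7)}) for i in range(90)]
population = heavy_clients + light_clients

# Share of heavy clients over the whole window: 10% by construction.
window_share = sum(is_heavy for is_heavy, _ in population) / len(population)
# Share of heavy clients among those active on day 0: larger than 10%.
day_share = heavy_share_on_day(population, day=0)
```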
>>>>
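
The day-over-day figure in caveat 2 is a set overlap. A minimal sketch, assuming the heavy client ids for two consecutive days have already been collected into Python sets (`heavy_retention` is a hypothetical helper name, and the ids below are placeholders):

```python
def heavy_retention(heavy_today, heavy_tomorrow):
    """Share of today's heavy users that are still heavy tomorrow.

    Returns None when there are no heavy users today.
    """
    if not heavy_today:
        return None
    return len(heavy_today & heavy_tomorrow) / len(heavy_today)


# Placeholder client ids, for illustration only.
retention = heavy_retention({"a", "b", "c", "d"}, {"a", "b", "c", "e"})
```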
>>>> -Frank
>>>>
>>>> _______________________________________________
>>>> Fx-data-dev mailing list
>>>> Fx-data-dev at mozilla.org
>>>> https://mail.mozilla.org/listinfo/fx-data-dev
>>>>