1% Longitudinal Dataset

Roberto Agostino Vitillo rvitillo at mozilla.com
Tue Feb 2 18:34:37 UTC 2016


tl;dr: A longitudinal Parquet dataset of 1% of the release population is available for consumption at [0]; see [1] for some examples of how to query it.

The longitudinal dataset is logically organized as a table where rows represent profiles and columns the various metrics (e.g. startup time). Each field of the table contains a list of values, one per Telemetry submission received for that profile.
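To make the layout concrete, here is a minimal sketch in plain Python of what "one row per profile, one list per metric" looks like. The field names (client_id, submission_date, startup_time_ms) and all values are made up for illustration and are not the real schema; see [1] for the actual metrics.

```python
# Toy stand-in for the longitudinal table. Field names and values are
# hypothetical, chosen only to illustrate the shape of the data.
longitudinal = [
    {
        "client_id": "profile-a",
        # One entry per Telemetry submission received for this profile.
        "submission_date": ["2016-01-10", "2016-01-11", "2016-01-12"],
        "startup_time_ms": [1200, 950, 1010],
    },
    {
        "client_id": "profile-b",
        "submission_date": ["2016-01-11"],
        "startup_time_ms": [2300],
    },
]


def mean_per_profile(rows, metric):
    """Return {client_id: mean of `metric` over that profile's submissions}."""
    return {
        row["client_id"]: sum(row[metric]) / len(row[metric])
        for row in rows
        if row[metric]  # skip profiles with no values for this metric
    }


means = mean_per_profile(longitudinal, "startup_time_ms")
```

Because every metric column of a row holds one value per submission, per-profile questions (such as the average startup time of a profile over time) can be answered from a single row, with no grouping or joining across submissions.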

The dataset is going to be regenerated from scratch every week. This allows us to apply non-backward-compatible changes to the schema without worrying about merging procedures.
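Since each weekly build is a full snapshot, a consumer would point at exactly one generation rather than combine several. The sketch below builds the location of a snapshot from its generation date, following the pattern of the path in [0]. That later builds keep the same naming is an assumption, and generation_path is a hypothetical helper, not part of any existing tool.

```python
from datetime import date

# Prefix taken from [0]; assumed (not confirmed) to stay the same for later builds.
BASE = "s3://telemetry-parquet/longitudinal"


def generation_path(generated_on: date) -> str:
    """Build the snapshot location for a given generation date (hypothetical helper)."""
    return f"{BASE}/generationDate={generated_on:%Y%m%d}"


# The snapshot announced in this message.
current = generation_path(date(2016, 2, 1))
```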

The current version of the longitudinal dataset has been built from all main pings received from 1% of the release population since mid-November, which is shortly after Unified Telemetry landed. Future versions will store up to 180 days of data.

The dataset contains all histograms but does not yet include all metrics stored in the various sections of the pings. See [1] for a pointer to a complete list of available metrics. More metrics are going to be included in future versions of the dataset; inclusion of specific metrics can be prioritized by filing a bug.

--  
Roberto

[0] s3://telemetry-parquet/longitudinal/generationDate=20160201
[1] https://gist.github.com/vitillo/627eab7e2b3f814725d2


More information about the fhr-dev mailing list