Fwd: Longitudinal Dataset Missing 11% of Clients
Frank Bertsch
frank at mozilla.com
Thu Jul 26 19:59:01 UTC 2018
(If you're seeing this as a duplicate, note that the previous email,
sent to all users of Longitudinal, was sent to the wrong fx-data-dev
address.)
Longitudinal users: we have found, and fixed, a bug in the Longitudinal
dataset.
Previous versions of the Longitudinal dataset were excluding 11% of
clients. We haven't analyzed whether this was a randomly selected
group or whether they all share some features or usage patterns. Given the
nature of the bug (it depends on the value types present in a client's
pings), it is safe to assume the latter. Details of the bug are available
below.
The next version of Longitudinal (2018-07-28) will include the fix, so
your dashboards may look different! Additionally, if you have any
analysis or queries from the past 6 months that relied on Longitudinal, we
*strongly* recommend rerunning them on the new dataset (once it is
available), since the results may have changed. We will also be rebuilding
historical versions of Longitudinal for the past six months. *Once the data
is available I will reply-all to this email.*
Moving forward, only around 0.0008% of clients will be dropped (52 in
the most recent tests); these are due to malformed pings and are safe to
ignore.
We expect to end-of-life (EOL) the Longitudinal dataset within the next
two quarters. If you rely on it for ongoing analysis, we will contact you
about moving to a different data source.
If you have /any/ concerns, please contact us in #fx-metrics on Slack or
#telemetry on IRC.
--Boring Details Below--
The issue was that we hashed some Avro objects by building a Set out of
them, which requires computing each object's hashCode. 11% of clients had
at least one ping containing a BigInteger or BigDecimal value (after
conversion), and computing the hashCode of those values hit an Avro bug:
https://issues.apache.org/jira/browse/AVRO-1146
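A minimal Java sketch of this failure mode, under stated assumptions. It
does not use Avro: `schemaHash` is a hypothetical stand-in for a
schema-driven hash function that only knows a fixed set of value types,
`Datum` is a hypothetical wrapper, and `dedupe` assumes a hashing failure
is caught per client and the client is dropped (modelled by returning
null). It shows why a single BigInteger or BigDecimal anywhere in one ping
is enough to lose the whole client.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class SetHashSketch {
    // Hypothetical stand-in for a schema-driven hash function (NOT Avro's code):
    // it handles a fixed set of value types and throws on anything else.
    static int schemaHash(Object value) {
        if (value == null) {
            return 0;
        }
        if (value instanceof String || value instanceof Integer || value instanceof Long
                || value instanceof Double || value instanceof Boolean) {
            return value.hashCode();
        }
        if (value instanceof List) {
            int h = 1;
            for (Object element : (List<?>) value) {
                h = 31 * h + schemaHash(element); // recurse into nested values
            }
            return h;
        }
        throw new IllegalArgumentException(
                "Unsupported datum type: " + value.getClass().getSimpleName());
    }

    // Hypothetical wrapper whose hashCode delegates to schemaHash,
    // so HashSet.add invokes it for every element.
    static final class Datum {
        final Object value;

        Datum(Object value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            return schemaHash(value);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Datum && Objects.equals(value, ((Datum) other).value);
        }
    }

    // Deduplicates one client's pings through a HashSet. Assumption: a hashing
    // failure is caught per client and the client is dropped (returns null).
    static Set<Datum> dedupe(List<?> pings) {
        Set<Datum> seen = new HashSet<>();
        try {
            for (Object ping : pings) {
                seen.add(new Datum(ping));
            }
            return seen;
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        List<?> ok = List.of("a", "a", 1L);
        List<?> nestedDecimal = List.of("a", List.of(2, new BigDecimal("1.5")));
        List<?> bigInt = List.of(BigInteger.TEN);
        System.out.println(dedupe(ok).size());             // prints 2: duplicates collapse
        System.out.println(dedupe(nestedDecimal) == null); // prints true: client dropped
        System.out.println(dedupe(bigInt) == null);        // prints true: client dropped
    }
}
```

The point of the sketch is that the failure is all-or-nothing per client:
the hash is computed over every value, so one unsupported type in any ping
takes the whole client's data with it.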
Fix:
https://github.com/mozilla/telemetry-batch-view/commit/8a8525b5a980908907b63f38017f82fa4ea10ab8
Verification of the bug:
Longitudinal Client Count:
https://sql.telemetry.mozilla.org/queries/57350/source
Main Summary Client Count, same period:
https://sql.telemetry.mozilla.org/queries/57349/source
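The verification compares distinct-client counts from the two queries
above. As a hedged sketch, the share of missing clients follows directly
from those two numbers; the class and method names are hypothetical, the
counts in `main` are made up for illustration, and the calculation assumes
both counts cover the same client population (same period, same sampling).

```java
public class MissingClients {
    // Fraction of clients in the reference dataset that are absent from the
    // dataset under test, given distinct-client counts over the same period.
    // Assumes the dataset under test should contain every reference client.
    static double missingFraction(long testCount, long referenceCount) {
        if (referenceCount <= 0) {
            throw new IllegalArgumentException("referenceCount must be positive");
        }
        if (testCount < 0 || testCount > referenceCount) {
            throw new IllegalArgumentException("testCount must be in [0, referenceCount]");
        }
        return 1.0 - (double) testCount / referenceCount;
    }

    public static void main(String[] args) {
        // Hypothetical counts for illustration only; these are not the query results.
        long longitudinalClients = 890_000L;
        long mainSummaryClients = 1_000_000L;
        System.out.printf("missing: %.1f%%%n",
                100 * missingFraction(longitudinalClients, mainSummaryClients));
    }
}
```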
-Frank Bertsch (:frank)
More information about the Fx-data-dev mailing list