Decommissioning HBase
Frank Bertsch
fbertsch at mozilla.com
Fri Oct 20 17:34:00 UTC 2017
Hi All,
The HBase Main Summary cluster has been a major headache on our end, so we
looked at its usage (almost nil) and decided to decommission the cluster [0].
I've already removed the code from python_moztelemetry, so if you were using
the HBase API you will start seeing an error very soon. This should affect
no one, but please reach out if you are using it and we'll find a resolution.
As a workaround, use the main_summary table. The HBase use case was: given
a set of client_ids, get the pings for those clients. In Presto, such a
query may look like the following (note that we also filter on sample_id
to narrow the search space). The Spark version looks very similar:
SELECT *
FROM main_summary
WHERE client_id = 'some-client-id'
AND sample_id = crc32(CAST('some-client-id' AS varbinary)) % 100
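Because sample_id is derived from client_id (CRC-32 of the ID, modulo 100),
the sample_id values for a whole batch of clients can be computed up front
and used to prune the scan. Below is a minimal Python sketch, assuming
sample_id is the numeric crc32 % 100 bucket used in the query above.
build_query is a hypothetical helper for illustration, not part of
python_moztelemetry:

```python
import zlib


def sample_id(client_id: str) -> int:
    """Bucket a client_id into 0..99 like the query above:
    CRC-32 of the UTF-8 bytes of the ID, modulo 100."""
    return zlib.crc32(client_id.encode("utf-8")) % 100


def build_query(client_ids):
    """Build a Presto query fetching main_summary pings for several clients.

    Hypothetical helper: assumes sample_id is a numeric column. Single
    quotes in IDs are doubled (standard SQL escaping), but only trusted
    IDs should be passed in.
    """
    ids = sorted(set(client_ids))
    # Only the buckets that can contain these clients need to be scanned.
    samples = sorted({sample_id(c) for c in ids})
    id_list = ", ".join("'{}'".format(c.replace("'", "''")) for c in ids)
    sample_list = ", ".join(str(s) for s in samples)
    return (
        "SELECT *\n"
        "FROM main_summary\n"
        "WHERE client_id IN ({})\n"
        "AND sample_id IN ({})".format(id_list, sample_list)
    )


if __name__ == "__main__":
    print(build_query(["some-client-id", "another-client-id"]))
```

Filtering on both columns keeps the result exact (client_id) while letting
the engine skip partitions that cannot match (sample_id).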
Alternatively, the clients_daily dataset (one row per client-day) could
be used. Let us know if you need help with these types of queries!
So long, HBase, and thanks for all the fish.
-Frank
[0] Bug for removing HBase:
https://bugzilla.mozilla.org/show_bug.cgi?id=1402322
More information about the Fx-data-dev mailing list