<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi All,</p>
<p>The HBase Main Summary cluster has been a major headache on our
end. We looked at its usage (almost nil) and decided to
decommission the cluster [0]. I've already removed the code from
python_moztelemetry, so if you were using the HBase API you'll
see an error very soon. This should affect no one, but please
reach out if you are using it and need a resolution.<br>
</p>
<p>As a workaround, you can use the main_summary table. The HBase use
case was: given a set of client_ids, get the pings for those clients.
In Presto, a query for that might look like the one below; note that
we include sample_id to narrow the search space. The equivalent query
in Spark looks very similar:</p>
<tt>SELECT *</tt><br>
<tt>FROM main_summary</tt><tt><br>
</tt><tt>WHERE client_id = 'some-client-id'</tt><tt><br>
</tt><tt> AND sample_id = </tt><code class="descname">crc32(CAST('some-client-id'
AS varbinary)) % 100</code><span class="sig-paren"><br>
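<br>
If you need to look up several clients at once, it may help to compute sample_id on the client side first. Below is a minimal Python sketch, assuming sample_id is the CRC32 of the UTF-8 encoded client_id modulo 100, as in the query above. The helper names sample_id and group_by_sample are hypothetical and are not part of python_moztelemetry:<br>
<pre>
```python
import zlib
from collections import defaultdict

# Assumption: sample_id is crc32(client_id) % 100, mirroring the Presto
# expression in the query above.
NUM_SAMPLES = 100


def sample_id(client_id):
    """Return the sample bucket (0-99) for a client_id string."""
    return zlib.crc32(client_id.encode("utf-8")) % NUM_SAMPLES


def group_by_sample(client_ids):
    """Map each sample_id to the sorted client_ids that fall into it."""
    groups = defaultdict(list)
    for cid in client_ids:
        groups[sample_id(cid)].append(cid)
    return {sid: sorted(cids) for sid, cids in groups.items()}


if __name__ == "__main__":
    # Placeholder ids; substitute the client_ids you care about.
    wanted = ["some-client-id", "another-client-id"]
    for sid, cids in sorted(group_by_sample(wanted).items()):
        print(sid, cids)
```
</pre>
Each group can then be fetched with one query that filters on that sample_id and on the matching client_ids.<br>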
<tt><br>
</tt>Alternatively, the clients_daily dataset (one row per
client-day) could be used. Let us know if you need help with these
types of queries!<br>
<br>
So long, HBase, and thanks for all the fish.<br>
<br>
-Frank<br>
<br>
[0] Bug for removing HBase:
<a class="moz-txt-link-freetext" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1402322">https://bugzilla.mozilla.org/show_bug.cgi?id=1402322</a><br>
<tt></tt></span>
</body>
</html>