<div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">External S3 uploader: <a href="https://gist.github.com/trink/399e8b923bcbc7095afba1ba0870d10a">https://gist.github.com/trink/399e8b923bcbc7095afba1ba0870d10a</a><br><br></div><div class="gmail_default" style="font-family:monospace,monospace">Trink<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 21, 2018 at 9:36 AM, Madhukar Thota <span dir="ltr">&lt;<a href="mailto:madhukar.thota@gmail.com" target="_blank">madhukar.thota@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks. One last question: is the uploader process open source? If not, I will try to combine telemetry_s3.lua with s3_parquet for my use case.<div class="HOEnZb"><div class="h5"><br><br>On Wednesday, February 21, 2018, Michael Trinkala &lt;<a href="mailto:mtrinkala@mozilla.com" target="_blank">mtrinkala@mozilla.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">- The process is outside of Hindsight<br></div><div class="gmail_default" style="font-family:monospace,monospace">- Yes, that uploader works fine, but since we have an external process we didn't add it to s3_parquet<br><br></div><div class="gmail_default" style="font-family:monospace,monospace">Trink<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 21, 2018 at 5:39 AM, Madhukar Thota <span dir="ltr">&lt;<a href="mailto:madhukar.thota@gmail.com" target="_blank">madhukar.thota@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks Michael.<div><br></div><div><span><pre 
style="white-space:pre-wrap;color:rgb(34,34,34);font-style:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;word-spacing:0px;background-color:rgb(255,255,255)">fyi: This writes to a local disk queue, we have a separate process that performs the actual S3 upload.</pre></span>Is this process part of Hindsight or some other process outside of Hindsight?</div><div><br></div><div>Is it possible to use something like this with Parquet: <a href="https://github.com/mozilla-services/data-pipeline/blob/master/hindsight/output/telemetry_s3.lua" target="_blank">https://github.com/mozilla-services/data-pipeline/blob/master/hindsight/output/telemetry_s3.lua</a></div><div><br></div><div>-Madhu</div><div><br></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 20, 2018 at 11:54 AM, Michael Trinkala <span dir="ltr">&lt;<a href="mailto:mtrinkala@mozilla.com" target="_blank">mtrinkala@mozilla.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">
<pre>Here is an example of how we write a Heka message to Parquet. FYI: this writes to a local disk queue; a separate process performs the actual S3 upload.<br><br>-- -*- lua -*-
filename        = "s3_parquet.lua"
message_matcher = "Type == 'telemetry' && Logger == 'telemetry'"
preserve_data   = false
ticker_interval = 60

parquet_schema_file = "<%= @heka_schema_path %>/telemetry/telemetry_payload_size.1.parquetmr.txt"

metadata_group = nil
json_objects = nil
s3_path_dimensions  = {
    {name = "submission_date_s3", source = "Timestamp", dateformat = "%Y%m%d"},
}

batch_dir           = "<%= @s3_buffer_dir_disk %>/telemetry-payload-size-parquet/v1"
max_writers         = 5
max_rowgroup_size   = 10000
max_file_size       = 1024 * 1024 * 300
max_file_age        = <%= @max_file_age %>
hive_compatible     = true
</pre>
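<br>A minimal sketch of how an s3_path_dimensions entry with a dateformat could map a message Timestamp to a partition value. This is an illustration only, not the plugin's code: it assumes Timestamp is in nanoseconds since the Unix epoch and that the date is formatted in UTC. The helpers partition_value and partition_path, and the name=value layout joined with "/", are hypothetical.<br>
<pre>-- Sketch only: not the s3_parquet implementation.

-- Assumption: timestamp_ns is nanoseconds since the Unix epoch.
local function partition_value(timestamp_ns, dateformat)
    local seconds = math.floor(timestamp_ns / 1e9)
    -- the leading "!" makes os.date format the time in UTC
    return os.date("!" .. dateformat, seconds)
end

-- Hypothetical helper: build "name=value" components from the dimensions.
-- Only dimensions that carry a dateformat are handled in this sketch.
local function partition_path(dims, timestamp_ns)
    local parts = {}
    for i, d in ipairs(dims) do
        parts[i] = d.name .. "=" .. partition_value(timestamp_ns, d.dateformat)
    end
    return table.concat(parts, "/")
end

local dims = {
    {name = "submission_date_s3", source = "Timestamp", dateformat = "%Y%m%d"},
}

-- 1519084800 seconds is 2018-02-20 00:00:00 UTC
print(partition_path(dims, 1519084800 * 1e9))
</pre>
Under these assumptions, a message timestamped 2018-02-20 00:00:00 UTC prints submission_date_s3=20180220.<br>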

<br></div><div class="gmail_default" style="font-family:monospace,monospace">-- parquet schema<br>
<pre>message telemetry_payload_size {
  required int64 Timestamp;
  required int64 size;
  required group Fields {
    required binary appBuildId (UTF8);
    required binary appUpdateChannel (UTF8);
    required binary docType (UTF8);
  }
}</pre>

<br><br></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div>On Thu, Feb 15, 2018 at 4:46 PM, Madhukar Thota <span dir="ltr">&lt;<a href="mailto:madhukar.thota@gmail.com" target="_blank">madhukar.thota@gmail.com</a>&gt;</span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div dir="ltr">Hi<div><br></div><div>Is there an example of sending syslog data from Kafka to S3 in Parquet format using Hindsight?</div><div><br></div><div>This is what I am trying to achieve:</div><div><br></div><div>syslog --&gt; Hindsight --&gt; Kafka --&gt; Hindsight --&gt; S3 (Parquet format).</div><div><br></div><div>Thanks,</div><div>Madhu</div></div>
<br></div></div>_______________________________________________<br>
Hindsight mailing list<br>
<a href="mailto:Hindsight@mozilla.org" target="_blank">Hindsight@mozilla.org</a><br>
<a href="https://mail.mozilla.org/listinfo/hindsight" rel="noreferrer" target="_blank">https://mail.mozilla.org/listinfo/hindsight</a><br>
<br></blockquote></div><br></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</blockquote>
</div></div></blockquote></div><br></div>