[Hindsight] Parquet example

Michael Trinkala mtrinkala at mozilla.com
Tue Feb 20 16:54:08 UTC 2018


Here is an example of how we write a Heka message to Parquet. FYI: this
writes to a local disk queue; a separate process performs the actual S3
upload.

-- -*- lua -*-
filename        = "s3_parquet.lua"
message_matcher = "Type == 'telemetry' && Logger == 'telemetry'"
preserve_data   = false
ticker_interval = 60

parquet_schema_file = "<%= @heka_schema_path %>/telemetry/telemetry_payload_size.1.parquetmr.txt"

metadata_group = nil
json_objects = nil
s3_path_dimensions  = {
    {name = "submission_date_s3", source = "Timestamp", dateformat = "%Y%m%d"},
}

batch_dir           = "<%= @s3_buffer_dir_disk %>/telemetry-payload-size-parquet/v1"
max_writers         = 5
max_rowgroup_size   = 10000
max_file_size       = 1024 * 1024 * 300
max_file_age        = <%= @max_file_age %>
hive_compatible     = true
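
To illustrate what the s3_path_dimensions entry above describes, here is a
minimal sketch of how a message Timestamp could become a date partition
directory. The helper partition_component is hypothetical and is not part
of Hindsight; the name=value layout used when hive_compatible is set is an
assumption based on the usual Hive partition convention, not taken from the
plugin source.

```lua
-- Hypothetical helper, not part of Hindsight: shows how one entry of
-- s3_path_dimensions could turn a message Timestamp into a path component.
local function partition_component(dim, timestamp_ns, hive_compatible)
    -- Heka timestamps are nanoseconds since the Unix epoch
    local seconds = math.floor(timestamp_ns / 1e9)
    -- a leading "!" makes os.date format the time in UTC
    local value = os.date("!" .. dim.dateformat, seconds)
    if hive_compatible then
        -- assumed Hive-style layout: name=value
        return dim.name .. "=" .. value
    end
    return value
end

local dim = {name = "submission_date_s3", source = "Timestamp", dateformat = "%Y%m%d"}
-- 1519145648 seconds is Tue Feb 20 16:54:08 UTC 2018
print(partition_component(dim, 1519145648 * 1e9, true))
-- prints submission_date_s3=20180220
```

With a single date dimension like this, every message from the same UTC day
would land under the same directory.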


-- parquet schema
message telemetry_payload_size {
    required int64 Timestamp;
    required int64 size;
    required group Fields {
        required binary appBuildId (UTF8);
        required binary appUpdateChannel (UTF8);
        required binary docType (UTF8);
    }
}
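
Every column in the schema above is marked required, so a record missing
any of them cannot be written as-is. The following is a rough sketch of
checking a record against that shape before it reaches the writer. The
spec table, the missing_columns helper, and all sample values are
hypothetical and only mirror the schema; this is not how the Hindsight
plugin validates messages.

```lua
-- Hypothetical spec mirroring the schema above: int64 -> "number",
-- binary (UTF8) -> "string", group -> nested table.
local spec = {
    Timestamp = "number",
    size      = "number",
    Fields    = {
        appBuildId       = "string",
        appUpdateChannel = "string",
        docType          = "string",
    },
}

-- Collect the dotted names of required columns that are absent or mistyped.
local function missing_columns(record, columns, prefix, out)
    prefix, out = prefix or "", out or {}
    for name, expected in pairs(columns) do
        local value = record[name]
        if type(expected) == "table" then
            if type(value) == "table" then
                missing_columns(value, expected, prefix .. name .. ".", out)
            else
                out[#out + 1] = prefix .. name
            end
        elseif type(value) ~= expected then
            out[#out + 1] = prefix .. name
        end
    end
    table.sort(out)
    return out
end

-- Sample values are made up for illustration.
local record = {
    Timestamp = 1519145648 * 1e9,
    size      = 2048,
    Fields    = {appBuildId = "20180220000000", appUpdateChannel = "nightly"},
}
print(table.concat(missing_columns(record, spec), ", "))
-- prints Fields.docType
```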


On Thu, Feb 15, 2018 at 4:46 PM, Madhukar Thota <madhukar.thota at gmail.com>
wrote:

> Hi
>
> Is there an example of sending syslog data from Kafka to S3 in
> Parquet format using Hindsight?
>
> This is what I am trying to achieve:
>
> syslog --> hindsight --> Kafka --> hindsight --> s3 (parquet format).
>
> Thanks,
> Madhu