[Hindsight] Repeated messages in Elasticsearch output

Michael Trinkala mtrinkala at mozilla.com
Mon Nov 28 22:17:20 UTC 2016


Our production data warehouse loaders use a balanced consumer group without
any message duplication.  If all of your input plugin configs are using a
group.id of 'logs-0' and you have #inputs <= #topics then everything should
be working fine.  I would start by dumping the Kafka consumer
groups/offsets it thinks it is processing.  If you provide those dumps and
the full configs, there will be a better chance of diagnosing the issue.
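
To illustrate why a shared group.id should not produce duplicates, here is a
minimal sketch in Python. It is a toy model only, loosely based on range-style
assignment, and not librdkafka's or Hindsight's actual code. The consumer
names (hindsight-a, hindsight-b), the second group id (logs-1), and the
partition count are assumptions made up for the example.

```python
from collections import Counter


def assign_partitions(partitions, consumers):
    """Split partitions so each one is owned by exactly one group member."""
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


def deliveries(partitions, groups):
    """Count how many times each partition is read across all groups."""
    counts = Counter()
    for members in groups.values():
        for owned in assign_partitions(partitions, members).values():
            counts.update(owned)
    return counts


partitions = [0, 1, 2, 3]  # hypothetical partition ids of the "logs" topic

# Both consumers share one group: every partition has a single owner.
shared = deliveries(partitions, {"logs-0": ["hindsight-a", "hindsight-b"]})

# Each consumer in its own group: every partition is read twice.
split = deliveries(partitions, {"logs-0": ["hindsight-a"],
                                "logs-1": ["hindsight-b"]})

print("shared group:", dict(shared))
print("separate groups:", dict(split))
```

In this model, doubled message counts appear only when the consumers end up
in different groups, so the dump is worth checking for a second group.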

Thanks,
Trink

On Mon, Nov 28, 2016 at 1:28 AM, Tomas Barton <barton.tomas at gmail.com>
wrote:

> Hi,
>
> I'm trying to configure Hindsight to pull messages from Kafka and store
> them in Elasticsearch. I'm using Hindsight 0.12.7 and the latest versions of
> all other modules.
>
> Before using Hindsight, the number of messages per day in the given topic
> was around 10M. Now that 2 Hindsight consumers are used, the number of
> messages has at least doubled.
>
> I probably misunderstood the consumer group concept. Earlier I was using 1
> consumer per topic partition. The configuration is pretty much the default:
>
> -- In balanced consumer group mode a consumer can only subscribe on topics, not topics:partitions.
> -- The partition syntax is only used for manual assignments (without balanced consumer groups).
> topics                  = {"logs"}
>
> -- https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md#global-configuration-properties
> consumer_conf = {
>     ["group.id"] = "logs-0", -- must always be provided (a single consumer is considered a group of one
>     -- in that case make this a unique identifier)
>     ["message.max.bytes"] = output_limit,
> }
>
> -- https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md#topic-configuration-properties
> topic_conf = {
>     -- ["auto.commit.enable"] = true, -- cannot be overridden
>     -- ["offset.store.method"] = "broker", -- cannot be overridden
> }
>
> Now I have multiple consumers within the same consumer group. According to
> the Kafka documentation:
>
>> Kafka will deliver each message in the subscribed topics to one process in
>> each consumer group.
>
>
> So having all processes in one group seems to be a good idea when each
> message is supposed to be stored just once. But in reality it looks like
> each process in the same consumer group is reading all the messages.
>
> Btw, the Kafka version is 0.9.0.1 and each of the topic's partitions seems
> to be owned by some consumer.
>
> Any idea what could be wrong?
>
> Thanks,
> Tomas