[Hindsight] Repeated messages in Elasticsearch output

Tomas Barton barton.tomas at gmail.com
Mon Nov 28 09:28:01 UTC 2016


I'm trying to configure Hindsight to pull messages from Kafka and store
them in Elasticsearch. I'm using Hindsight 0.12.7 and the latest versions of
all other modules.

Before using Hindsight, the number of messages per day in the given topic was
around 10M. Now, with 2 Hindsight consumers, the number of messages is at
least doubled.

Perhaps I misunderstood the consumer group concept. Previously I was using 1
consumer per topic partition. The configuration is pretty much the default:

-- In balanced consumer group mode a consumer can only subscribe to topics,
-- not topics:partitions.
-- The partition syntax is only used for manual assignments (without
-- balanced consumer groups).
topics = {"logs"}

consumer_conf = {
    -- must always be provided (a single consumer is considered a group of one;
    -- in that case make this a unique identifier)
    ["group.id"] = "logs-0",
    ["message.max.bytes"] = output_limit,
}

topic_conf = {
    -- ["auto.commit.enable"] = true, -- cannot be overridden
    -- ["offset.store.method"] = "broker", -- cannot be overridden
}

Now I have multiple consumers within the same consumer group. According to
the Kafka documentation:

> Kafka will deliver each message in the subscribed topics to one process in
> each consumer group.

So having all processes in one group seems like a good idea when each
message is supposed to be stored just once. But in practice it looks like
each process in the same consumer group is reading all the messages.
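To make the expected behaviour concrete, here is a simplified model of range-style partition assignment for a single topic. It is only an illustration under stated assumptions, not librdkafka's or the Hindsight plugin's actual code: the 4-partition topic and the consumer names "hindsight-a"/"hindsight-b" are hypothetical. Within one group every partition is owned by exactly one consumer, so each message is read once; two separate groups each receive every partition, which would double the count.

```python
def range_assign(partitions, consumers):
    """Simplified range-style assignment for one topic: split the sorted
    partitions into contiguous chunks, one chunk per (sorted) consumer.
    When the split is uneven, the first consumers get one extra partition."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        size = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + size]
        start += size
    return assignment


def deliveries(groups, partitions):
    """Count how often each partition is consumed across all groups.
    Each group independently gets the whole topic assigned to its members."""
    counts = {p: 0 for p in partitions}
    for consumers in groups.values():
        for owned in range_assign(partitions, consumers).values():
            for p in owned:
                counts[p] += 1
    return counts


partitions = [0, 1, 2, 3]  # hypothetical 4-partition topic
# Both consumers in one group: each partition is read once.
same_group = deliveries({"logs-0": ["hindsight-a", "hindsight-b"]}, partitions)
# Each consumer in its own group: each partition is read twice.
split_groups = deliveries({"logs-0": ["hindsight-a"], "logs-1": ["hindsight-b"]}, partitions)
print(same_group)    # {0: 1, 1: 1, 2: 1, 3: 1}
print(split_groups)  # {0: 2, 1: 2, 2: 2, 3: 2}
```

Real Kafka assignors (range, round-robin) differ in detail, but the invariant modelled here, one owner per partition per group, is the one the documentation quote describes.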

Btw, the Kafka version is and each topic partition seems to be owned
by some consumer.

Any idea what could be wrong?
