Andrew J. Buehler wanderer at
Fri Feb 21 17:43:37 UTC 2014

On 02/20/2014 07:31 AM, Tanstaafl wrote:

> On 2014-02-19 7:45 AM, Andrew J. Buehler <wanderer at>
> wrote:

>> This is especially - or, at least, primarily - because last time I
>> checked, they don't consider e.g. "the Sent-folder copy of a
>> message I sent to a mailing list" and "the copy of the same message
>> which I received from the mailing list, which has been modified by
>> the mailing list software" to be different messages, even though
>> their contents (e.g. list footers and message header information)
>> are different.
> I don't see *any* reason to *ever* keep more than one *physical* copy
> of any given message on the same mailstore. I absolutely agree with
> you on the way GMail treats Sent messages. They are *not*, in fact,
> the same message, so should *not* be de-duped like Google does now.
> But other than that, I really like its de-duplication feature.

It's not just Sent messages, though.

Say I post a message to a mailing list, and someone replies both to the
list and directly to me.

The copy of the reply which I receive through the list has been modified
by the list software. The copy which I receive directly has not. They
are not identical, and I want both versions - or at least to be able to
decide for myself which, if either, to delete.

Other scenarios are possible as well, where there will be multiple
similar-but-different copies of a given message, with the same
Message-ID; such messages are distinct, and should be treated as such.
Reliably distinguishing between them for deduplication purposes, without
having to essentially cmp every new message against every existing
message (which seems likely to kill performance), would be - at best - a
considerable challenge. The obvious approach would be to use a hash of
the message, but that leaves open the potential for collisions.
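
To make the hash idea concrete, here is a rough sketch in Python. It is an illustration only: the `DedupStore` class, its method names and the sample messages are invented for this example and do not describe how GMail or any real mail server works. The hash is used only to find candidate duplicates, and a byte-for-byte comparison then confirms them, so a collision costs one extra comparison rather than a wrongly merged message.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Hypothetical in-memory store; not any real mailstore's API."""

    def __init__(self):
        # digest -> list of distinct raw messages that share that digest
        self._by_digest = defaultdict(list)

    def add(self, raw: bytes) -> bool:
        """Store raw message bytes; return True if a new copy was kept."""
        digest = hashlib.sha256(raw).digest()
        bucket = self._by_digest[digest]
        # Only messages with the same digest are compared byte for byte,
        # so a hash collision cannot merge two different messages.
        if any(existing == raw for existing in bucket):
            return False
        bucket.append(raw)
        return True

    def count(self) -> int:
        """Number of physical copies currently stored."""
        return sum(len(bucket) for bucket in self._by_digest.values())


# Two copies of "the same" reply: identical Message-ID, different bytes.
direct = b"Message-ID: <abc@example.invalid>\r\n\r\nHello\r\n"
via_list = direct + b"-- \r\nlist footer\r\n"

store = DedupStore()
first = store.add(direct)     # kept
second = store.add(via_list)  # kept: the list software changed the bytes
third = store.add(direct)     # not kept again: byte-identical to the first
```

Keying on the full message bytes rather than on the Message-ID is what keeps the direct copy and the list copy apart in this sketch.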

I'm having a hard time thinking of any example of a scenario where there
would be multiple identical copies of a message in a given account,
except where the user explicitly and/or actively copied the message into
a second location - which I would expect to be rare.

Deduplication in that case would be fine, I should think, but in pretty
much no other. (And even that case has issues. What if the message has
an attachment, and then the user chooses "remove attachment" on one
copy? Should the attachment be removed from the other copy as well, or
should the deduplication be undone?)
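
One possible answer to that question is copy-on-write: the two folder entries share one physical copy until one of them is modified, at which point the modified entry gets its own copy. The sketch below is only an illustration of that idea; the `Blob` and `Mailbox` classes, the folder names and the UIDs are all hypothetical and not taken from any real mail server.

```python
class Blob:
    """One physical copy of a message, shared via a reference count."""

    def __init__(self, data: bytes):
        self.data = data
        self.refs = 0


class Mailbox:
    """Hypothetical account whose folder entries point at shared blobs."""

    def __init__(self):
        self.entries = {}  # (folder, uid) -> Blob

    def put(self, folder: str, uid: int, blob: Blob) -> None:
        """Add a folder entry that references an existing blob."""
        blob.refs += 1
        self.entries[(folder, uid)] = blob

    def modify(self, folder: str, uid: int, new_data: bytes) -> None:
        """Change one entry only; other entries keep the old bytes."""
        old = self.entries[(folder, uid)]
        if old.refs == 1:
            old.data = new_data  # sole owner, safe to edit in place
            return
        # Shared blob: detach this entry and give it a private copy.
        old.refs -= 1
        fresh = Blob(new_data)
        fresh.refs = 1
        self.entries[(folder, uid)] = fresh


box = Mailbox()
shared = Blob(b"body plus attachment")
box.put("INBOX", 1, shared)
box.put("Archive", 7, shared)  # user copied the message: one physical blob

box.modify("Archive", 7, b"body only")  # "remove attachment" on one copy
```

In this sketch the INBOX entry still has the attachment afterwards, so the deduplication for that pair is effectively undone at the moment the copies diverge.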

I like the idea of deduplication within an account in theory, but in
practice, I'm not sure the challenges don't far outweigh the advantages.

