JS raw MIME parser prototype
Pidgeot18 at verizon.net
Sat Jan 28 03:23:39 UTC 2012
Earlier, I proposed an API for a parser which would concern itself
only with producing a MIME tree and would not try to do the other
functions that libmime does (i.e., conversion to the attachment model
and driving the UI). I now have a working prototype, which you may find
attached (unless the newsgroup and/or mailing list decides to scrub it
:-( ), which has been tested on the MIME torture test and, by
hand-inspection, appears to be correct.
This is merely a prototype, so a lot of things don't work. A brief list
of minimum necessary support that I do not have:
1. Content-Transfer-Encoding support
2. Support for RFC 2047/RFC 2231 in the structured header parser
3. Fixing the XXX's found in the comments
5. Hooking up external APIs to it for use as a streamer (it's a bit tricky)
6. Giving it knobs to twist for output
Basic notes about the implementation:
1. This parser is fully synchronous and blocking; the intent is to spin
the event loop using the callback in getNextBuffer.
2. The parser does not do any buffering of data; it relies on the input
source to buffer. Looking at the body parser, this may be a mistake...
3. Not all of the extensibility grabpoints are in there, but the one for
structured header decoding is.
4. I intend to minimize copying of the buffers as much as possible
(indeed, body part packets are pretty much just transparently passed
through).
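
To make note 1 concrete, here is a minimal sketch of the pull model. It
assumes a getNextBuffer-style callback that returns the next chunk as a
string, or null at end of input. The names makeArraySource and parseLines
are invented for illustration and are not the prototype's API; a real
source would presumably spin the event loop inside the callback until
data arrives.

```javascript
// Hypothetical sketch: a synchronous parser that pulls its input through a
// callback. None of these names come from the attached prototype.

// Returns a getNextBuffer-style callback over a fixed list of chunks.
// A real source would block (e.g. by spinning the event loop) until data
// is available; this stand-in just hands out chunks in order.
function makeArraySource(chunks) {
  let i = 0;
  return function getNextBuffer() {
    return i < chunks.length ? chunks[i++] : null; // null means end of input
  };
}

// Pulls buffers until the source is exhausted and calls onLine for every
// complete line (CRLF or bare LF terminated). Only the trailing partial
// line is carried over between buffers.
function parseLines(getNextBuffer, onLine) {
  let pending = "";
  let buffer;
  while ((buffer = getNextBuffer()) !== null) {
    pending += buffer;
    let end;
    while ((end = pending.indexOf("\n")) !== -1) {
      let line = pending.slice(0, end);
      if (line.endsWith("\r")) line = line.slice(0, -1);
      onLine(line);
      pending = pending.slice(end + 1);
    }
  }
  if (pending.length > 0) onLine(pending); // unterminated final line
}

// Chunk boundaries deliberately fall in the middle of a header line.
const lines = [];
parseLines(
  makeArraySource(["Subject: hel", "lo\r\nTo: a@example.org\r\n", "\r\nbody"]),
  (line) => lines.push(line)
);
```

The carried-over partial line is exactly the kind of buffering that note 2
says the prototype avoids; it is included here only so the sketch behaves
correctly on arbitrary chunk boundaries.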
The main reason I'm writing this message is to solicit advice about the
API. What I have so far was motivated mostly by the comments of asuth
and protz in #maildev a few days ago; I still haven't worked out the
full public-facing API yet.
To say that this is completely independent of libmime would be a lie;
I've resorted to looking at the parsing code there to worry about
various edge cases. I have, however, deviated with respect to part
numbering, to make it fall more in line with the IMAP (RFC 3501)
numbering scheme. One problem that said scheme has is the inability to
distinguish between a series of nested rfc822 messages, so I have opted
here to represent that as an extra `$' tacked onto the part numbering
scheme (instead of a `.1').
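
As an illustration of the numbering idea, here is a rough sketch of one
possible reading of the `$' scheme. The node shape ({ contentType, parts,
message }) and the function name numberParts are assumptions made up for
this example, not the prototype's data structures, and the exact
placement of the `$' in the real code may differ.

```javascript
// Hypothetical sketch of one reading of the `$' numbering scheme.
// Multipart children get IMAP-style dotted numbers; the message enclosed
// in a message/rfc822 part gets the parent's number plus "$".
function numberParts(part, partNum, out) {
  out.push({ partNum: partNum, contentType: part.contentType });
  if (part.contentType.startsWith("multipart/")) {
    // The top-level multipart has an empty number, so its children are
    // "1", "2", ... rather than ".1", ".2", ...
    const prefix = partNum === "" ? "" : partNum + ".";
    part.parts.forEach((child, i) => numberParts(child, prefix + (i + 1), out));
  } else if (part.contentType === "message/rfc822") {
    numberParts(part.message, partNum + "$", out);
  }
  return out;
}

// A made-up tree: part 2 is an rfc822 message whose body is itself an
// rfc822 message; part 3 is an rfc822 message with a multipart body.
const tree = {
  contentType: "multipart/mixed",
  parts: [
    { contentType: "text/plain" },
    {
      contentType: "message/rfc822",
      message: {
        contentType: "message/rfc822",
        message: { contentType: "text/plain" },
      },
    },
    {
      contentType: "message/rfc822",
      message: {
        contentType: "multipart/alternative",
        parts: [{ contentType: "text/plain" }, { contentType: "text/html" }],
      },
    },
  ],
};

const numbered = numberParts(tree, "", []);
const partNums = numbered.map((entry) => entry.partNum);
```

Under these assumptions, each level of rfc822 nesting adds one `$', so
the doubly nested text part in the example ends up as 2$$ and stays
distinguishable from the message that wraps it.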
Some final notes:
* There are about 450 lines of code right now, with well over 100
lines of it in comments (as much notes about MIME specs as anything useful).
* The core header-parsing code is 55 lines of code (including comments),
while libmime's equivalent function is 177 lines long.
* I've previously been concerned about speed. The current implementation
takes 0.753s (user time) to snarf the 1.7-MiB torture test (which
suggests a throughput of ~2-2.5 MiB/s right now); unfortunately, I don't
have equivalent timing for the current MIME parser.
* I haven't decided on a license for the parser yet. I tend to prefer
BSD-licensing everything I do, and this is certainly more widely
applicable than just mailnews code. On the other hand, I intend to get
this checked into comm-central, so it might be a better idea to stick
to a tri-license.
Comments/questions/concerns/feedback greatly appreciated!
 <news://news.mozilla.org:119/4E320577.firstname.lastname@example.org>, or
if you prefer (you oughtn't).
appears to be the easiest way to find it. Note this is a gzip'd mbox and
not a message/rfc822 file by itself.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 17418 bytes
Desc: not available