JS raw MIME parser prototype

Joshua Cranmer Pidgeot18 at verizon.net
Sat Jan 28 03:23:39 UTC 2012

Earlier [1] I proposed an API for a parser which would only concern 
itself with producing a MIME tree and not trying to do the other 
functions that libmime does (i.e., conversion to the attachment model 
and driving the UI). I now have a working prototype, which you may find 
attached (unless the newsgroup and/or mailing list decides to scrub it 
:-( ), which has been tested on the MIME torture test [2] and, by 
hand-inspection, appears to be correct.

This is merely a prototype, so a lot of things don't work. A brief list 
of minimum necessary support that I do not have:
1. Content-Transfer-Encoding support
2. Support for RFC 2047/RFC 2231 in the structured header parser
3. Fixing the XXXs flagged in the comments
4. Documentation
5. Hooking up external APIs to it for use as a streamer (it's a bit tricky)
6. Giving it knobs to twist for output

Basic notes about the implementation:
1. This parser is fully synchronous and blocking; the intent is to spin 
the event loop using the callback in getNextBuffer.
2. The parser does not do any buffering of data; it relies on the input 
data to buffer. Looking at the body parser, this may be a mistake...
3. Not all of the extensibility grabpoints are in there, but the one for 
structured header decoding is.
4. I intend to minimize copying of the buffers as much as possible
(indeed, body part packets are pretty much just transparently passed
through).
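
To make notes 1, 2 and 4 concrete, here is a minimal sketch of the pull
model. Only the name getNextBuffer comes from the prototype; its signature
here (no arguments, returns the next chunk or null at end of input) and the
helpers makeChunkSource and pumpBody are assumptions made for illustration,
not the attached code.

```javascript
// Hypothetical helper: wraps an array of chunks in a getNextBuffer-style
// callback. A real source could spin the event loop before returning.
function makeChunkSource(chunks) {
  let i = 0;
  return function getNextBuffer() {
    return i < chunks.length ? chunks[i++] : null; // null means end of input
  };
}

// Hypothetical driver: synchronously pulls buffers and hands each one to
// the consumer untouched, so no body data is copied or buffered here.
function pumpBody(getNextBuffer, onBodyPacket) {
  let packets = 0;
  let buf;
  while ((buf = getNextBuffer()) !== null) {
    onBodyPacket(buf); // same object the source produced
    packets++;
  }
  return packets;
}

const source = makeChunkSource([new Uint8Array([72, 105]), new Uint8Array([33])]);
const total = pumpBody(source, (packet) => console.log(packet.length + " bytes"));
console.log(total + " packets");
```

The cost of not buffering shows up as soon as a boundary line is split
across two chunks, which is the concern raised in note 2.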

The main reason I'm writing this message is to solicit advice about the 
API. What I have so far was motivated mostly by the comments of asuth 
and protz in #maildev a few days ago; I still haven't worked out the 
full public-facing API yet.

To say that this is completely independent of libmime would be a lie; 
I've resorted to looking at the parsing code there to worry about 
various edge cases. I have, however, deviated with respect to part 
numbering, to make it fall more in line with the IMAP (RFC 3501) 
numbering scheme. One problem that said scheme has is the inability to 
distinguish between a series of nested rfc822 messages, so I have opted 
here to represent that as an extra `$' tacked onto the part numbering 
scheme (instead of a `.1').
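
As a rough illustration of that numbering, the sketch below walks a toy MIME
tree and assigns part numbers. The tree shape ({type, children, body}), the
function name numberParts, the empty number for the root, and the way `$'
combines with `.N' for multiparts inside a nested message are all my
assumptions for the example; only the `$' suffix itself is from the
prototype.

```javascript
// Hypothetical tree shape:
//   multipart/*     -> { type, children: [...] }
//   message/rfc822  -> { type, body: <part> }
//   anything else   -> { type }
function numberParts(part, num = "", out = []) {
  out.push([num, part.type]);
  if (part.type.startsWith("multipart/")) {
    // IMAP-style: children are numbered 1, 2, ... under the parent's number.
    part.children.forEach((child, i) => {
      const childNum = num === "" ? String(i + 1) : num + "." + (i + 1);
      numberParts(child, childNum, out);
    });
  } else if (part.type === "message/rfc822") {
    // Assumption: the enclosed message's body gets `$' appended, not `.1'.
    numberParts(part.body, num + "$", out);
  }
  return out;
}

const tree = {
  type: "multipart/mixed",
  children: [
    { type: "text/plain" },
    { type: "message/rfc822",
      body: { type: "message/rfc822", body: { type: "text/plain" } } },
  ],
};
for (const [num, type] of numberParts(tree)) {
  console.log(JSON.stringify(num) + " " + type);
}
```

Under these assumptions the two nested messages come out as "2" and "2$",
and the innermost text part as "2$$", so each level stays distinguishable.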

Some final notes:
* There are about 450 lines of code right now, with well over 100
lines of it in comments (as much notes about the MIME specs as anything useful).
* The core header-parsing code is 55 lines of code (including comments), 
while libmime's equivalent function is 177 lines long.
* I've previously been concerned about speed. The current implementation
takes 0.753 s (user time) to snarf the 1.7 MiB torture test (which
suggests throughput of ~2-2.5 MiB/s right now); unfortunately, I don't
have equivalent timing for the current MIME parser.
* I haven't decided on a license for the parser yet. I tend to prefer
BSD-licensing everything I do, and this is certainly more widely
applicable than just mailnews code. On the other hand, I intend to
get this checked into comm-central, so it might be a better idea to
stick to a tri-license.
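
For reference, the throughput estimate in the speed note is just size over
user time. The function name below is hypothetical; the two inputs are the
figures quoted above.

```javascript
// Hypothetical helper: throughput in MiB/s from input size and user time.
function throughputMiBPerSec(sizeMiB, userSeconds) {
  return sizeMiB / userSeconds;
}

// 1.7 MiB torture test parsed in 0.753 s of user time.
console.log(throughputMiBPerSec(1.7, 0.753).toFixed(2) + " MiB/s"); // 2.26 MiB/s
```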

Comments/questions/concerns/feedback greatly appreciated!

[1] <news://news.mozilla.org:119/4E320577.4090804@verizon.net>, or 
if you prefer (you oughtn't).
appears to be the easiest way to find it. Note this is a gzip'd mbox and 
not a message/rfc822 file by itself.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parser.js
Type: application/x-javascript
Size: 17418 bytes
Desc: not available
URL: <http://mail.mozilla.org/pipermail/tb-planning/attachments/20120127/5f366a2a/attachment.js>
