(Reducing) Mentat Binary Size
nalexander at mozilla.com
Wed Jun 27 16:24:48 UTC 2018
On Tue, Jun 26, 2018 at 11:08 PM, Thom Chiovoloni <tchiovoloni at mozilla.com> wrote:
> So, it came up in the Sync.Next meeting today that we're a bit concerned
> about the mentat binary size, and don't know where it comes from, etc. My
> completely unfounded suspicion has been that there are some easy wins here.
> So I decided to see if this is true. (I'm sending this to the sync-dev
> mailing list even though it's clearly the wrong place mostly because it
> would get completely lost if I just posted it in the slack channel, and it
> seems plausible that someone might want to refer to it later).
This was my suspicion too, so thank you for digging in and producing this
> I decided to test using the mentat-cli binary, built for a 64-bit Mac,
> because that's the platform I use and because binaries are a lot easier
> to measure than libraries, especially when we do LTO, which we do. Also,
> on binaries you can actually `strip` them without reading a bunch of
> manpages on what flags to pass `strip`.
> Anyway, the baseline on my machine for mentat_cli is 9539784 bytes
> (9.5MBish), after running `strip` it shrinks to 7855608 bytes.
> The first step is changing the optimization level to one that optimizes
> for size. There are two of these, opt-level "s" and "z", with "z" being
> the more aggressive. This shrank the binary to 9114776 bytes, 5352264
> after strip.
> The next idea I had is that a lot of the libraries we import seem to
> involve networking. The only network code we have is in `tolstoy`, which is
> very experimental and not something we're planning on using in production.
> Moving this to live under a `feature` flag reduces our size to 7396480
> bytes, 4571776 bytes after strip.
The WIP branch that Grisha and I are working on includes a "syncable"
feature that does this, so we'll handle being able to avoid Hyper, etc in
the very near future. But thanks for the reminder!
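For anyone who hasn't used Cargo features for this, here is a minimal sketch of what gating code behind a feature looks like on the Rust side. The feature name `syncable` is taken from the message above; the module and function names are hypothetical and are not taken from Mentat. The feature would also need to be declared under `[features]` in Cargo.toml, with the networking crates marked optional, which is not shown here.

```rust
// Sketch only: `sync` and `describe` are made-up names for illustration.

/// Networking-dependent code, compiled only when the `syncable` feature is on.
#[cfg(feature = "syncable")]
mod sync {
    pub fn describe() -> &'static str {
        "sync support compiled in"
    }
}

/// Stand-in used when the feature is off, so callers still compile.
#[cfg(not(feature = "syncable"))]
mod sync {
    pub fn describe() -> &'static str {
        "sync support compiled out"
    }
}

/// Reports at run time whether this build included the feature.
pub fn sync_enabled() -> bool {
    cfg!(feature = "syncable")
}

fn main() {
    println!("{} (enabled: {})", sync::describe(), sync_enabled());
}
```

With the feature off, nothing inside the first `mod sync` is compiled, so the crates it would pull in never reach the linker.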
> I also noticed a few libraries that had multiple versions built.
> Specifically `regex` seems like it might be heavy (at the very least, it
> has dependent crates), and we're building both 0.2 and 1.0.1 (the former is
> specified by mentat_query_sql, and the latter by env_logger). Moving both
> of these to be 1.0.1 brings the size down to 6565472 bytes, 4036968 after
> strip.
> (Worth noting that regex is a transitive dependency from `env_logger`,
> which I suspect we aren't thrilled with, and the use of it inside
> mentat_query_sql could probably be trivially rewritten
> to avoid the dependency.)
I filed a few issues rooted at https://github.com/mozilla/mentat/issues/772.
We should definitely cull `regex`, but `env_logger` is an application
choice more than a Mentat choice.
> There are probably other targets for this (the `memchr` lib seems to be
> included twice, but while I've done exactly no checking, my gut says it
> doesn't have the same heft as `regex`).
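One rough way to spot crates that are built in more than one version is to scan Cargo.lock for package names that appear with several versions. The sketch below does that over an embedded sample; the sample entries and their exact version numbers are made up for illustration (only "regex 0.2 and 1.0.1" comes from this thread), and the line-based parsing is an assumption that holds for simple `[[package]]` entries rather than a full TOML parser. Newer Cargo releases can also report this directly with `cargo tree -d`.

```rust
use std::collections::BTreeMap;

// Illustrative lockfile fragment; these entries are invented for the example.
const SAMPLE_LOCK: &str = r#"
[[package]]
name = "env_logger"
version = "0.5.10"

[[package]]
name = "regex"
version = "0.2.11"

[[package]]
name = "regex"
version = "1.0.1"
"#;

/// Collect every version listed for each package name in Cargo.lock-style text.
fn versions_by_crate(lockfile: &str) -> BTreeMap<String, Vec<String>> {
    let mut versions: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut current_name: Option<String> = None;
    for line in lockfile.lines() {
        let line = line.trim();
        if line == "[[package]]" {
            // A new package entry starts; forget any half-read name.
            current_name = None;
        } else if let Some(rest) = line.strip_prefix("name = ") {
            current_name = Some(rest.trim_matches('"').to_string());
        } else if let Some(rest) = line.strip_prefix("version = ") {
            if let Some(name) = current_name.take() {
                versions
                    .entry(name)
                    .or_default()
                    .push(rest.trim_matches('"').to_string());
            }
        }
    }
    versions
}

/// Keep only the crates that appear with more than one version.
fn duplicated(lockfile: &str) -> Vec<(String, Vec<String>)> {
    versions_by_crate(lockfile)
        .into_iter()
        .filter(|(_, found)| found.len() > 1)
        .collect()
}

fn main() {
    for (name, found) in duplicated(SAMPLE_LOCK) {
        println!("{} is built {} times: {}", name, found.len(), found.join(", "));
    }
}
```

This only says which crates are duplicated, not how many bytes each copy costs.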
Is it possible to estimate a size metric for each of our dependencies?
Yes, it's difficult with LTO/inlining/dead code removal, but it would help
gauge where to put effort.
> There are two more things I tried.
> It seems likely for various reasons that we will have to build mentat with
> panic="abort" when distributing an FFI binary. This is mainly because it's
> undefined behavior to `panic` across FFI boundaries, which basically means
> arbitrarily bad things can happen (see the note below for some hedging on this,
> but I really don't know what other options we have here). Doing this
> reduced the size to 5357500 bytes, 3484324 bytes after stripping.
> Finally, I tried replacing jemalloc with the system allocator. This shaved
> off less than I was expecting, but not nothing. End result was 5097456
> bytes, 3293640 after stripping.
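For reference, a minimal sketch of opting a binary into the system allocator using the `#[global_allocator]` attribute from the standard library. This is not necessarily how the branch mentioned in this thread does it. Note also that newer Rust toolchains use the system allocator by default, so on those this declaration should make no difference to size.

```rust
use std::alloc::System;

// Route every heap allocation in this program through the platform's
// allocator (malloc/free on most Unix systems) instead of a bundled one.
#[global_allocator]
static GLOBAL: System = System;

/// Allocates a small Vec on the heap (through `GLOBAL`) and sums it.
fn sum_on_heap() -> u32 {
    let numbers: Vec<u32> = (1..=4).collect();
    numbers.iter().sum()
}

fn main() {
    println!("sum = {}", sum_on_heap());
}
```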
> This is under half the size we started with (for the stripped library,
> it's a bit over half unstripped, but who needs debug symbols?). At this
> point I'm giving up. It's kind of late and I've had a few beers, and I
> think I've hit most of the low hanging fruit. You can stare at this work
> here <https://github.com/thomcc/mentat/tree/shrink-binary> if you have a
> burning need to, some of it is probably worth PRing too! I'll do that
> - Thom
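To make the "under half" claim easy to check, here is a small sketch that tabulates the sizes quoted in this thread against the baseline. The byte counts are copied from the message above; the step labels and the `percent_of` helper are just names chosen for this example.

```rust
/// (step label, unstripped bytes, stripped bytes), as quoted in this thread.
const STEPS: [(&str, u64, u64); 6] = [
    ("baseline", 9_539_784, 7_855_608),
    ("opt-level z", 9_114_776, 5_352_264),
    ("tolstoy behind a feature", 7_396_480, 4_571_776),
    ("single regex version", 6_565_472, 4_036_968),
    ("panic = abort", 5_357_500, 3_484_324),
    ("system allocator", 5_097_456, 3_293_640),
];

/// Percentage of `baseline` that `size` represents.
fn percent_of(size: u64, baseline: u64) -> f64 {
    size as f64 * 100.0 / baseline as f64
}

fn main() {
    let (_, base_raw, base_stripped) = STEPS[0];
    println!("{:<26} {:>18} {:>18}", "step", "unstripped", "stripped");
    for (label, raw, stripped) in STEPS.iter() {
        println!(
            "{:<26} {:>9} ({:5.1}%) {:>9} ({:5.1}%)",
            label,
            raw,
            percent_of(*raw, base_raw),
            stripped,
            percent_of(*stripped, base_stripped)
        );
    }
}
```

The last row works out to roughly 53% of the baseline unstripped and roughly 42% stripped, which matches "a bit over half" and "under half".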
> These last two comments (about LTO and `strip`) are probably the cause of
> some of our confusion here, and the first might make this work not terribly
> representative, although I'd be surprised if many of the changes that saved
> size don't do the same for a library. Given that we intend to ship as
> static or dynamic libraries (i.e. we distribute native code libraries that
> might not always get benefits from LTO), it's possible that our strategy of
> 'put everything in separate crates and rely on LTO to sort it out for us'
> might not be so great. That also could be wrong!
> Actually, I tried more, but most of it didn't really work. For example, I
> hackily removed our dependency on `num`, which seems to exist primarily so
> we can use bigint, which we don't fully implement. This only shaved off
> about 10k, but I guess that's not too surprising since we aren't doing any
> arithmetic with the bigints.
> While some libraries are able to avoid panicking across the FFI boundary
> by using `std::panic::catch_unwind`, rusqlite doesn't support this due to
> its use of Cell and RefCell. Neither does `sync15_adapter` (although I'll
> likely fix this, as we should actually be unwind-safe). In the long term,
> it's not clear to me that we want `panic = abort` behavior, although from
> what I can tell, most of mentat was written with a pattern like this in
> mind (I could be wrong).
> So, compiling with panic = "abort" is probably what we'll want to do in
> the short term, *maybe* what we'll want to do in the long term, and should
> result in a substantial space saving (no libbacktrace, no code bloat from
> landing pads, etc), so even though it's possible that it's not great for
> building something robust, I tried it.
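For comparison with the `panic = "abort"` approach, here is a minimal sketch of the `std::panic::catch_unwind` pattern mentioned above: the exported function catches any panic and turns it into an error code before control returns to the C caller. Everything named here (`example_double`, `risky_work`, the error codes) is hypothetical and not part of Mentat's FFI. A real export would also carry a `no_mangle` attribute, left out to keep the sketch independent of edition-specific syntax, and the pattern only applies where the captured state is unwind-safe, which the note above says is not the case for rusqlite today.

```rust
use std::panic;

// Hypothetical status codes returned to the C caller.
const OK: i32 = 0;
const ERR_PANIC: i32 = -1;

/// Stand-in for library code that may panic.
fn risky_work(input: i32) -> i32 {
    if input < 0 {
        panic!("negative input");
    }
    input * 2
}

/// FFI entry point: no panic is allowed to unwind past this function.
pub extern "C" fn example_double(input: i32, out: *mut i32) -> i32 {
    // `input` is a plain i32, so this closure is unwind-safe as written.
    match panic::catch_unwind(|| risky_work(input)) {
        Ok(value) => {
            if !out.is_null() {
                // The caller promises `out` points to writable memory.
                unsafe { *out = value };
            }
            OK
        }
        Err(_) => ERR_PANIC,
    }
}

fn main() {
    let mut out: i32 = 0;
    let status = example_double(21, &mut out);
    println!("status = {}, out = {}", status, out);
    let status = example_double(-1, &mut out);
    println!("status after panic = {}", status);
}
```

This only works when the crate is built with unwinding enabled; under `panic = "abort"` the process terminates at the panic and `catch_unwind` never sees it.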
Thanks for getting this started!