A tale of OOMs and thumbnails: the concrete

Richard Newman rnewman at mozilla.com
Mon Nov 4 13:35:16 PST 2013

I spent much of last week chasing down the root cause of a couple of bugs: 


This email is split into "concrete" and "abstract" parts. I figured I'd share 'cos sharing is good.

Both bugs are OOMs handling bitmaps. Both occur on releases before and after our recent favicon work.

Those signatures seemed rather obviously a partial red herring: unless we were doing something stupid like decoding an 8MB bitmap, we wouldn't run out of heap in one call. So an OOM usually means we have a memory leak somewhere else, even if it is probably image-related.

I started investigating in the usual way: set up a decent-sized profile (using Sync), replicate the issues (open a bunch of tabs, do some browsing, open about:home and leave it, open the tabs tray and leave it), and then took a memory dump.

Of note: I couldn't replicate the issue with an empty profile. Sometimes it pays to be thorough!

To investigate memory, I used Eclipse/DDMS/Monitor (whichever floats your boat) to take a memory snapshot:

(The red arrow cylinder thing.)

In Eclipse with MAT installed (http://www.eclipse.org/mat/downloads.php) this automatically opens the memory profiler.

The leak report was pretty handy:

Hey look! a 2MB bitmap!

And so was the object list:

16 load tasks… and each one holding a reference to a listener?

I did some poking around in the code, adding log statements, and discovered why the bitmap was so large, and why we had so many load tasks outstanding. That led to three patches: holding on to fewer references, fixing a simple Java coercion bug, and changing the way we capture thumbnails.


Each load task was an inner class. They weren't static, and typically they didn't even need to be multiple instances.

Furthermore, in the case of a tab's favicon load, that listener -- which sticks around until it runs -- was holding on to the tab, and when it ran it poked at the tab's inner state. That held references, for one thing, but it also didn't make any sense -- why is some arbitrary favicon code tightly coupled to the implementation of Tab?

That change was here: 


and here:


It turns out that the event itself has all the information it needs to find the affected tab. We can even update the favicon for more than one tab at a time! By inverting the ownership of this work from the OnFaviconLoadedListener to Tabs itself, I _completely eliminated_ all external references. No more leak.

Strictly speaking these references weren't important: after all, we shouldn't have any outstanding load tasks. Except when we're scrolling, and when a favicon fetch is slow, and when we have a bug. Defense in depth!

Look at the rest of the patch for more fun. The other parts on the bug fix similar issues: we were holding on to the thumbnail Bitmap just to use it as a pseudo-boolean ("did I set a thumbnail?"), for example.


If you have a generic map like this:

  HashMap<Integer, Foo> mMap …

and you access it with ints, the ints are coerced to Integers, they hash to the correct value, and all works fine.

If you access it with longs, remember that Java does type erasure; that map is

  HashMap<Object, Object> mMap …

The long is coerced to a Long, not to an int and then an Integer. While a Long and an Integer with the 'same' value have the same .hashCode, they are not .equals, and so the map lookup will fail. The patch was simply:

-    public static void removeLoadTask(long taskId) {
+    public static void removeLoadTask(int taskId) {

Terrifying, isn't it?

I found that by adding a log statement to removeLoadTask. It was being called, but the total didn't shrink. Lightbulb moment.


We did the following.

* How big is a column in about:home? Recompute this every time we re-measure.
* Take that number (which is strictly bigger than the thumbnail view, due to padding) and inflate it to the next biggest power of 2. A 519-pixel column (xhdpi) jumps to 1024.
* Make sure it's even… which all powers of two would be, no? This was a holdover from a still-extant bug in thumbnail capture on non-24-bit screens.
* Take any new screenshots with that width. Dump it into the DB.
* Ask Android to display that bitmap whenever it needs to, which will hold on to a reference (for the view, 'cos the view resizes on rotate).

The fix was to (a) calculate the non-padded view size, (b) take a screenshot of that size, making it of even width for non-24-bit devices.

Memory usage for 6 thumbnails dropped by TEN MEGABYTES on my HTC One. Those also live in the DB, so the win applies there, too.

In retrospect this was an easy win. It took a fair amount of work to verify exactly what was happening, and to validate each step of the changes.

More discussion of this in the next message…

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/mobile-firefox-dev/attachments/20131104/a70b3387/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot 2013-11-04 12.43.41.png
Type: image/png
Size: 19341 bytes
Desc: not available
URL: <http://mail.mozilla.org/pipermail/mobile-firefox-dev/attachments/20131104/a70b3387/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot 2013-11-04 12.45.31.png
Type: image/png
Size: 30795 bytes
Desc: not available
URL: <http://mail.mozilla.org/pipermail/mobile-firefox-dev/attachments/20131104/a70b3387/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot 2013-11-04 12.46.33.png
Type: image/png
Size: 134852 bytes
Desc: not available
URL: <http://mail.mozilla.org/pipermail/mobile-firefox-dev/attachments/20131104/a70b3387/attachment-0005.png>

More information about the mobile-firefox-dev mailing list