A Crash Trend Dashboard (wrt to crashes and clients)

Andre Duarte aduarte at mozilla.com
Wed Mar 1 03:50:23 UTC 2017


Hi all,

We built this dashboard to summarize several crash rates and crash types,
with a special focus on content crash rates since the introduction of
Electrolysis (e10s).

The increase in crash rates is expected, because content crashes are
correlated with e10s adoption. As seen at
https://metrics.mozilla.com/protected/sguha/crashgraphs/#crash-rates-e10s,
content crashes are the main difference between e10s and non-e10s users,
while main and plugin crash rates are similar between the two groups (click
the buttons on the right-hand side to toggle which crash types are compared
between the two groups). I would therefore suggest waiting until e10s
adoption stabilizes to see whether crash rates level off as well.

In addition, content crashes are more frequent than the other two types of
crashes (maybe because they depend on how many tabs users have open?). This
causes the time between crashes to decrease: as more users have e10s
enabled, more crashes are recorded, so the average time between crashes
goes down.
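
As a rough illustration of that mixing effect, here is a minimal Python
sketch. The two per-hour crash rates and the adoption shares are made-up
numbers chosen only for illustration, not values from the dashboard, and the
function names are hypothetical.

```python
# Hypothetical numbers for illustration only; these are NOT dashboard data.
RATE_NON_E10S = 0.002  # assumed crashes per usage hour without e10s
RATE_E10S = 0.010      # assumed crashes per usage hour with e10s


def overall_crash_rate(e10s_share, rate_e10s, rate_non_e10s):
    """Blended crashes per usage hour for a mix of e10s and non-e10s users."""
    return e10s_share * rate_e10s + (1.0 - e10s_share) * rate_non_e10s


def mean_hours_between_crashes(rate_per_hour):
    """Average usage hours between crashes at a given crash rate."""
    return 1.0 / rate_per_hour


# Sweep the (assumed) share of users with e10s enabled.
for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    rate = overall_crash_rate(share, RATE_E10S, RATE_NON_E10S)
    hours = mean_hours_between_crashes(rate)
    print(f"e10s share {share:.0%}: {rate:.4f} crashes/hour, "
          f"{hours:.0f} hours between crashes")
```

Whenever the e10s rate is higher than the non-e10s rate, the blended rate
rises and the mean time between crashes falls as the e10s share grows, even
if neither group's own rate changes.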

The data collection that was used to build this dashboard allows us to see
that this is indeed happening: more content crashes are happening (and
therefore general crash rates are increasing). This gives us a great
opportunity to look into how to handle this, and try to understand what
exactly is causing so many content crashes to happen in the first place.

I'm happy to answer any further questions. I am working towards making the
GitHub repository public so that you can see exactly how all these metrics
are calculated (previous commits contain private Mozilla information such
as weekly active users, so I cannot make the current repository public).
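
To give a flavour before then, here is a minimal sketch of how metrics of
this kind could be computed. It is not the code in the repository: the
`Profile` record, its fields, the sample data, and the definition of crash
rate as crashes per usage hour are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Profile:
    """Hypothetical per-profile record; NOT the main_summary schema."""
    usage_hours: float    # hours of use in the last week
    weekly_crashes: int   # crashes recorded in the last week
    # Lifetime crash timestamps in hours, ascending (assumed representation).
    crash_times: List[float] = field(default_factory=list)


def average_profile_crash_rate(profiles):
    """Mean of per-profile crash rates (assumed: crashes per usage hour)."""
    rates = [p.weekly_crashes / p.usage_hours
             for p in profiles if p.usage_hours > 0]
    return mean(rates)


def pct_profiles_crashing(profiles):
    """Percentage of profiles with at least one crash in the last week."""
    crashed = sum(1 for p in profiles if p.weekly_crashes > 0)
    return 100.0 * crashed / len(profiles)


def mean_hours_between_crashes(profiles):
    """Mean gap between the latest crash and the one before it,
    over profiles with 2+ crashes in their lifetime history."""
    gaps = [p.crash_times[-1] - p.crash_times[-2]
            for p in profiles if len(p.crash_times) >= 2]
    return mean(gaps)


# Made-up sample data, for illustration only.
profiles = [
    Profile(usage_hours=10, weekly_crashes=1, crash_times=[5.0, 45.0]),
    Profile(usage_hours=20, weekly_crashes=0),
    Profile(usage_hours=5, weekly_crashes=2, crash_times=[100.0, 110.0, 130.0]),
    Profile(usage_hours=40, weekly_crashes=0, crash_times=[7.0]),
]

print("average profile crash rate:", average_profile_crash_rate(profiles))
print("% of profiles crashing:", pct_profiles_crashing(profiles))
print("mean hours between crashes:", mean_hours_between_crashes(profiles))
```

Averaging per-profile rates weights every profile equally, so a light user
who crashes often counts as much as a heavy user; pooling all crashes over
all hours would give a different number.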

Best

Andre Duarte
irc: aduarte


On Tue, Feb 28, 2017 at 12:03 PM, Saptarshi Guha <sguha at mozilla.com> wrote:

> First, the earlier link was wrong; here is the correct one:
>
> https://metrics.mozilla.com/protected/sguha/crashgraphs/
>
> (behind LDAP)
>
> Most of the increase is coming from content-crashes. Also E10s
> has moved a lot of crashes into content-crashes.
>
> Andre, would you like to elaborate?
> Cheers
>
>
> On Tue, Feb 28, 2017 at 11:46 AM, John Jensen <jjensen at mozilla.com> wrote:
>
>> Congratulations on producing an information-filled, easy-to-digest
>> dashboard, but ....
>>
>> Wow. Crash rates have increased 10x since September!?
>>
>> > Content crashes have been steadily increasing since September 2016.
>> > Content crashes for profiles that have e10s enabled are significantly
>> > higher (almost 20x) than for those that don't have it enabled.
>> > Around 8-12% of weekly active Firefox users have experienced any type
>> > of crash.
>> > The number of new profiles (created that week) that crashed is
>> > increasing.
>> > The number of hours between crashes for users who have had a previous
>> > crash in their history is decreasing. This means that crashes (of any
>> > type) are becoming more frequent in general.
>> > Over 50% of Firefox users who crash per week have had their previous
>> > crash within 7 days.
>>
>> There is only alarming news on this dashboard. That's fine, but please
>> correct me if I am wrong.
>>
>> I know it's not its objective, but are there better data sources/analytic
>> tools now to understand key drivers of these crashes? Is there a dashboard,
>> say, showing a "burndown" chart of fixes?
>>
>> John
>>
>>
>>
>>
>> On 28 February 2017 at 11:23, Saptarshi Guha <sguha at mozilla.com> wrote:
>>
>>> Happy to present the crashgraphs dashboard, a high-level dashboard that
>>> captures:
>>>
>>> - average profile crash rate (the average of per-profile crash rates)
>>> - % of profiles crashing in the last week
>>> - time between crashes for profiles that have another crash in their
>>>   history (history is lifetime!)
>>> - hours and days between crashes (for profiles with 2+ crashes). Lower
>>>   is *bad*
>>>
>>> We capture two things I've been looking for:
>>>
>>> - a profile-level view of crashes (% of profiles experiencing a crash)
>>> - a software-engineering-level view of crashes (hours used between crashes)
>>>
>>> This is high level, so not as detailed as arewestableyet.
>>>
>>> Many thanks to Andre Duarte and Connor Ameres for working on and
>>> designing this.
>>>
>>> Your comments are welcome!
>>>
>>> https://people-mozilla.org/~sguha/crashgraphs/
>>>
>>> Regards,
>>> Saptarshi
>>>
>>>
>>> Appendix
>>>
>>> Also, since we use main_summary, we cannot eliminate shutdown crashes.
>>> That would be a nice metric to include in main_summary itself.
>>>
>>> That said, on average shutdown crashes are a fairly constant fraction of
>>> total content crashes [1] (the latter measured in main_summary). That
>>> holds only on average: for a given profile with 5 content crashes (as per
>>> main_summary), it's tough to say how many are shutdown crashes.
>>>
>>> [1] https://docs.google.com/document/d/1jzcEPI4NLlar102kS1WBv1cgnxDULYQNl4VawmQEr5c/edit#heading=h.imxtnojofajm
>>>
>>> _______________________________________________
>>> fhr-dev mailing list
>>> fhr-dev at mozilla.org
>>> https://mail.mozilla.org/listinfo/fhr-dev
>>>
>>>
>>
>>
>> --
>> John Jensen
>> jjensen at mozilla.com
>> Director, Organization Strategy
>>
>
>