A Crash Trend Dashboard (wrt to crashes and clients)

Georg Fritzsche gfritzsche at mozilla.com
Wed Mar 1 13:08:52 UTC 2017


It would be great to default to public repos (except where there is a
specific reason not to).
I think we should keep up the openness we have at Mozilla, for reasons
of transparency, contributor/community access, knowledge sharing, and
even ease of team access.

Georg

On Tue, Feb 28, 2017 at 9:29 PM, Saptarshi Guha <sguha at mozilla.com> wrote:

> Aah, it's private. Will make it public or give you access
> Cheers
>
>
> On Tue, Feb 28, 2017 at 12:26 PM, Chris Hutten-Czapski <
> chutten at mozilla.com> wrote:
>
>> Ack, 404.
>>
>> On Tue, Feb 28, 2017 at 3:25 PM, Saptarshi Guha <sguha at mozilla.com>
>> wrote:
>>
>>> Absolutely.
>>> Github: https://github.com/aguimaraesduarte/CrashGraphs
>>>
>>> I'll schedule a meeting with Andre, Connor, Chris and myself to discuss
>>> the inner workings of this.
>>>
>>> Cheers
>>> Saptarshi
>>>
>>> On Tue, Feb 28, 2017 at 12:23 PM, Chris Hutten-Czapski <
>>> chutten at mozilla.com> wrote:
>>>
>>>> A better link for the telemetry-based crash measurements is
>>>> https://telemetry.mozilla.org/crashes/ (telemetry_crashes on my
>>>> gh-pages is now unmaintained)
>>>>
>>>> Very nice, I like the layout and presentation.
>>>>
>>>> Can we take a look at the analysis backing the data? Your page shows
>>>> numbers that are very different from other measures of crashes per
>>>> kuh (thousand usage hours) [1][2][3]. I think a couple of things might
>>>> contribute to this, but without looking at the analysis itself I can only
>>>> speculate. (This is beyond the aforementioned content shutdownkill issue.)
>>>>
>>>> :chutten
>>>>
>>>> [1]: https://telemetry.mozilla.org/crashes/
>>>> [2]: https://sql.telemetry.mozilla.org/dashboard/stability-metrics-for-e10s-add-ons-experiment-release-49-50-51-
>>>> [3]: https://sql.telemetry.mozilla.org/queries/689#1163
>>>>
>>>>
>>>> On Tue, Feb 28, 2017 at 3:18 PM, Saptarshi Guha <sguha at mozilla.com>
>>>> wrote:
>>>>
>>>>> Very very likely. As mentioned in the initial post
>>>>>
>>>>> "
>>>>>
>>>>> Also since we use main_summary, we cannot eliminate shutdown crashes.
>>>>> That would be a nice metric to include in main_summary itself.
>>>>>
>>>>> That said, on average shutdown crashes are a fairly constant fraction
>>>>> of total content crashes [1] (the latter measured in main_summary). That is
>>>>> on average. For a given profile with 5 content crashes (as per
>>>>> main_summary), it's tough to say how many are shutdown crashes.
>>>>>
>>>>> [1] https://docs.google.com/document/d/1jzcEPI4NLlar102kS1WBv1cgnxDULYQNl4VawmQEr5c/edit#heading=h.imxtnojofajm
>>>>>
>>>>> "
>>>>>
>>>>> We would love to remove shutdown crashes, but since we use main_summary
>>>>> we cannot.
>>>>>
>>>>> The crash_summary table might be a fix for this. Last I checked, it didn't
>>>>> have some fields we need (for determining new users). If it has them now,
>>>>> we can move to it and remove shutdown crashes.
>>>>>
>>>>> Thanks for the comment. This increases the urgency to get the fields we
>>>>> need into crash_summary
>>>>> (https://github.com/mozilla/telemetry-batch-view/blob/master/src/main/scala/com/mozilla/telemetry/views/CrashSummaryView.scala).
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 28, 2017 at 12:03 PM, Mike Conley <mconley at mozilla.com>
>>>>> wrote:
>>>>>
>>>>>> I may not be sharing new information here (I just subscribed to this
>>>>>> mailing list!), but thought I'd add in here:
>>>>>>
>>>>>> I do know that, at least for e10s, there is a specific type of crash
>>>>>> ("shutdownkill crashes") that occurs silently and on purpose. This is
>>>>>> when the content process does not shut down in time. When that occurs,
>>>>>> the parent process kills the content process and collects a minidump
>>>>>> for submission. This happens without the crash reporter dialog being
>>>>>> presented to the user.
>>>>>>
>>>>>> When evaluating e10s release criteria, I believe these crashes were
>>>>>> subtracted from the overall content crash rate. You can see that in
>>>>>> chutten's graph here:
>>>>>>
>>>>>> https://chutten.github.io/telemetry_crashes/
>>>>>>
>>>>>> "M + C - Content Shutdown (M+C-S)" reflects what I'm talking about
>>>>>> here.
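One reading of the "M+C-S" label is main-process crashes plus content-process crashes minus content shutdown(kill) crashes, normalized by usage. This is a minimal sketch, assuming that reading and assuming the rate is expressed per thousand usage hours (the "kuh" unit used elsewhere in this thread); the function names and the counts in the example are hypothetical and do not come from chutten's graph.

```python
def crashes_per_kuh(crashes: float, usage_hours: float) -> float:
    """Crashes per thousand usage hours (kuh)."""
    if usage_hours <= 0:
        raise ValueError("usage_hours must be positive")
    return crashes * 1000.0 / usage_hours


def m_plus_c_minus_s(main: int, content: int, content_shutdown: int, usage_hours: float) -> float:
    """Main + content crashes, excluding content shutdown-kill crashes, per kuh."""
    return crashes_per_kuh(main + content - content_shutdown, usage_hours)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show how much S can move the rate.
    main, content, shutdown, hours = 20, 150, 100, 10_000.0
    print(crashes_per_kuh(main + content, hours))            # 17.0
    print(m_plus_c_minus_s(main, content, shutdown, hours))  # 7.0
```

In this made-up example, excluding shutdown crashes cuts the rate from 17.0 to 7.0, illustrating how shutdownkill crashes alone could inflate a combined content crash rate.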
>>>>>>
>>>>>> Could it be that the 10x increase you're seeing is mostly composed of
>>>>>> shutdownkill crashes?
>>>>>>
>>>>>> -Mike
>>>>>>
>>>>>> On 28/02/2017 2:46 PM, John Jensen wrote:
>>>>>> > Congratulations for producing an information-filled, easy-to-digest
>>>>>> > dashboard, but ....
>>>>>> >
>>>>>> > Wow. Crash rates have increased 10x since September!?
>>>>>> >
>>>>>> >> Content crashes have been steadily increasing since September 2016.
>>>>>> >> Content crashes for profiles that have e10s enabled are significantly
>>>>>> >> higher (almost 20x) than for those that don't have it enabled.
>>>>>> >> Around 8-12% of weekly active Firefox users have experienced any type
>>>>>> >> of crash.
>>>>>> >> The number of new profiles (created that week) that crashed is
>>>>>> >> increasing.
>>>>>> >> The number of hours between crashes for users who have had a previous
>>>>>> >> crash in their history is decreasing. This means that crashes (of any
>>>>>> >> type) are becoming more frequent in general.
>>>>>> >> Over 50% of Firefox users who crash per week have had their previous
>>>>>> >> crash within 7 days.
>>>>>> >
>>>>>> > There is only alarming news on this dashboard. That's fine, but please
>>>>>> > correct me if I am wrong.
>>>>>> >
>>>>>> > I know it's not its objective, but are there better data
>>>>>> > sources/analytic tools now to understand the key drivers of these
>>>>>> > crashes? Is there a dashboard, say, showing a "burndown" chart of fixes?
>>>>>> >
>>>>>> > John
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On 28 February 2017 at 11:23, Saptarshi Guha <sguha at mozilla.com> wrote:
>>>>>> >
>>>>>> >     Happy to present the crashgraphs dashboard, a high-level dashboard
>>>>>> >     that captures:
>>>>>> >
>>>>>> >     - average profile crash rate (the average, across profiles, of
>>>>>> >       each profile's crash rate)
>>>>>> >     - % of profiles crashing in the last week
>>>>>> >     - time between crashes for profiles that have another crash in
>>>>>> >       their history (history is lifetime!)
>>>>>> >     - hours and days between crashes (for profiles with 2+ crashes).
>>>>>> >       Lower is *bad*.
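The first and last metrics above depend on how per-profile figures are aggregated. This is a minimal sketch, not the CrashGraphs code, assuming a hypothetical per-profile weekly record (`ProfileWeek`, with a crash count and usage hours) and crash times recorded as cumulative usage hours. It contrasts the average of per-profile rates with a single pooled crashes-per-1000-hours rate, and computes the gaps behind "hours between crashes".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProfileWeek:
    """Hypothetical weekly record for one profile."""
    crashes: int
    usage_hours: float


def mean_profile_crash_rate(rows: List[ProfileWeek]) -> float:
    """Average across profiles of each profile's crashes per 1000 usage hours.

    Profiles with no usage are skipped.
    """
    rates = [r.crashes * 1000.0 / r.usage_hours for r in rows if r.usage_hours > 0]
    return sum(rates) / len(rates)


def pooled_crash_rate(rows: List[ProfileWeek]) -> float:
    """Total crashes per 1000 total usage hours (one pooled ratio)."""
    total_crashes = sum(r.crashes for r in rows)
    total_hours = sum(r.usage_hours for r in rows)
    return total_crashes * 1000.0 / total_hours


def pct_profiles_crashing(rows: List[ProfileWeek]) -> float:
    """Percentage of profiles with at least one crash in the period."""
    return 100.0 * sum(1 for r in rows if r.crashes > 0) / len(rows)


def hours_between_crashes(crash_times: List[float]) -> List[float]:
    """Gaps between consecutive crashes of one profile (needs 2+ crashes).

    `crash_times` are assumed to be cumulative usage hours at each crash.
    """
    ts = sorted(crash_times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


if __name__ == "__main__":
    week = [ProfileWeek(crashes=1, usage_hours=1.0), ProfileWeek(crashes=0, usage_hours=99.0)]
    print(mean_profile_crash_rate(week))            # 500.0
    print(pooled_crash_rate(week))                  # 10.0
    print(pct_profiles_crashing(week))              # 50.0
    print(hours_between_crashes([10.0, 2.0, 5.0]))  # [3.0, 5.0]
```

A single light-usage profile with one crash pulls the per-profile average to 500 while the pooled rate is 10, which is one possible reason an average of per-profile rates can differ sharply from pooled crashes-per-kuh figures.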
>>>>>> >
>>>>>> >     We capture two things I've been looking for:
>>>>>> >
>>>>>> >     - a profile-level view of crashes (% of profiles experiencing a
>>>>>> >       crash)
>>>>>> >     - a software-engineering-level view of crashes (usage hours
>>>>>> >       between crashes)
>>>>>> >
>>>>>> >     This is high level, so not as detailed as arewestableyet.
>>>>>> >
>>>>>> >     Many thanks to Andre Duarte and Connor Ameres for working on and
>>>>>> >     designing this.
>>>>>> >
>>>>>> >     Your comments welcome!
>>>>>> >
>>>>>> >     https://people-mozilla.org/~sguha/crashgraphs/
>>>>>> >
>>>>>> >     Regards,
>>>>>> >     Saptarshi
>>>>>> >
>>>>>> >
>>>>>> >     Appendix
>>>>>> >
>>>>>> >     Also, since we use main_summary, we cannot eliminate shutdown
>>>>>> >     crashes. That would be a nice metric to include in main_summary
>>>>>> >     itself.
>>>>>> >
>>>>>> >     That said, on average shutdown crashes are a fairly constant
>>>>>> >     fraction of total content crashes [1] (the latter measured in
>>>>>> >     main_summary). That is on average. For a given profile with 5
>>>>>> >     content crashes (as per main_summary), it's tough to say how many
>>>>>> >     are shutdown crashes.
>>>>>> >
>>>>>> >     [1] https://docs.google.com/document/d/1jzcEPI4NLlar102kS1WBv1cgnxDULYQNl4VawmQEr5c/edit#heading=h.imxtnojofajm
>>>>>> >
>>>>>> >     _______________________________________________
>>>>>> >     fhr-dev mailing list
>>>>>> >     fhr-dev at mozilla.org
>>>>>> >     https://mail.mozilla.org/listinfo/fhr-dev
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > John Jensen
>>>>>> > jjensen at mozilla.com
>>>>>> > Director, Organization Strategy
>>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>