A Crash Trend Dashboard (wrt to crashes and clients)

Benjamin Smedberg benjamin at smedbergs.us
Wed Mar 1 15:21:08 UTC 2017


In the "Percentage of Weekly Active Users that Crashed" chart, how much
work would it be to break that down along two dimensions?

* Vertically, stack "one crash", "two crashes", "more than two crashes"
* Separate out/focus on heavy users, according to whatever definition
  bcolloran is using. So explicitly for our target heavy-user growth market,
  see how many of those are crashing once/more than once per week.
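
To make the vertical stacking concrete, here is a rough sketch of the
bucketing. The input shape (a dict of made-up client ids to weekly crash
counts) and the function names are assumptions for illustration, not the
dashboard's actual code. The heavy-user split would be the same computation
run on a filtered set of clients, using whatever definition applies.

```python
from collections import Counter

def crash_bucket(n_crashes):
    """Map a weekly crash count to the stacked-chart band it would fall in."""
    if n_crashes == 0:
        return "no crash"
    if n_crashes == 1:
        return "one crash"
    if n_crashes == 2:
        return "two crashes"
    return "more than two crashes"

def weekly_breakdown(crashes_by_client):
    """Return the percentage of active clients in each band."""
    counts = Counter(crash_bucket(n) for n in crashes_by_client.values())
    total = len(crashes_by_client)
    return {band: 100.0 * k / total for band, k in counts.items()}

# Hypothetical week: five active clients, keyed by a made-up client id.
week = {"a": 0, "b": 1, "c": 2, "d": 5, "e": 0}
breakdown = weekly_breakdown(week)
```

The three crashing bands stacked together give the existing "percentage that
crashed" line; the stack just shows how that total splits up.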

I don't quite understand "Percentage of First Crashes Recorded". How far
back is "ever" in this? e.g. if somebody crashed two years ago and then
again this week, how would that show up? This feels like the kind of graph
which has unintentional consequences: people using the browser more will
probably crash more and make the line go up, even though that's not
necessarily an indication of a problem.

"Percentage of New Profiles that Crashed" is one of the key baselines I was
looking for, and I'm super-excited that we finally have it! I have some
questions about the method. You are aggregating this by calendar week; does
this mean both "users who are new within this week" and "users who crashed
within this week"? Here's my concern:

User starts using Firefox on a Friday.
User has a crash the following Tuesday.
According to my expected definition, this would be a crash in the user's
first week and so would count against this graph. But if you're just
aggregating by week, I'm not sure whether this is counted.
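
Here is a small sketch of the two definitions applied to the Friday/Tuesday
example. The dates are made up, the function names are hypothetical, and I am
assuming ISO (Monday-start) calendar weeks; I don't know which of these, if
either, the dashboard actually uses.

```python
from datetime import date, timedelta

def same_calendar_week(a, b):
    """True if both dates fall in the same ISO calendar week (Monday start)."""
    return a.isocalendar()[:2] == b.isocalendar()[:2]

def within_first_week(profile_created, crash_day):
    """True if the crash falls within 7 days of profile creation."""
    return timedelta(0) <= crash_day - profile_created < timedelta(days=7)

created = date(2017, 2, 24)   # a Friday
crashed = date(2017, 2, 28)   # the following Tuesday

calendar_week_hit = same_calendar_week(created, crashed)
first_week_hit = within_first_week(created, crashed)
```

With these dates the crash is inside the profile's first seven days but in a
different calendar week, so a pure calendar-week join would miss it.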

"Hours Between Crashes" seems like a typical MTBF chart but I'm confused
because the graph limits itself to users who have crashed more than once.
What is the timeframe over which we're measuring these things? e.g. if
there are users who never crash, or only crash once a year, wouldn't we
want to include that in this kind of chart? Or how should I read this chart
as a manager? We want to end up with more users never crashing, which seems
like it would then make this chart appear worse instead of better.
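
A toy illustration of that worry, under made-up numbers. Both functions and
the record layout ("usage_hours", plus "crash_hours" as the usage-clock hour
of each crash) are assumptions for the sketch, not the dashboard's method. The
first metric only sees users with 2+ crashes; the second divides everyone's
usage by everyone's crashes.

```python
def mean_gap_among_repeat_crashers(users):
    """Average hours between consecutive crashes, over users with 2+ crashes."""
    gaps = []
    for u in users:
        t = sorted(u["crash_hours"])
        gaps.extend(b - a for a, b in zip(t, t[1:]))
    return sum(gaps) / len(gaps) if gaps else None

def usage_hours_per_crash(users):
    """Total usage hours across all users divided by total crashes."""
    total_hours = sum(u["usage_hours"] for u in users)
    total_crashes = sum(len(u["crash_hours"]) for u in users)
    return total_hours / total_crashes if total_crashes else None

# Hypothetical population before a fix: A crashes often, B rarely, C never.
before = [
    {"usage_hours": 40, "crash_hours": [10, 20, 30]},
    {"usage_hours": 50, "crash_hours": [5, 45]},
    {"usage_hours": 40, "crash_hours": []},
]
# After the fix, B stops crashing entirely; A is unchanged.
after = [
    {"usage_hours": 40, "crash_hours": [10, 20, 30]},
    {"usage_hours": 50, "crash_hours": []},
    {"usage_hours": 40, "crash_hours": []},
]

gap_before = mean_gap_among_repeat_crashers(before)
gap_after = mean_gap_among_repeat_crashers(after)
rate_before = usage_hours_per_crash(before)
rate_after = usage_hours_per_crash(after)
```

In this toy case the repeat-crasher average drops from 20 to 10 hours even
though things got better, while usage hours per crash rises from 26 to about
43.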

The "Count of Hours Between Crashes per User" graph is scary! Is this
really per *user* or per *crash*? For example:

User A has never crashed before, and crashes 3 times in one hour.
Does this show up as one crash with infinite uptime and two crashes with
0-hour uptime?
How does a user show up who didn't crash at all this week?
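
For reference, this is the per-crash reading I have in mind, as a sketch. The
function name, the minute-resolution timestamps, and the rounding down to
whole hours are all assumptions; the first crash gets None here as a stand-in
for "no previous crash".

```python
def hours_since_previous_crash(crash_minutes):
    """For each crash, whole hours since the same user's previous crash.

    The first crash has no predecessor, so it is reported as None.
    """
    t = sorted(crash_minutes)
    if not t:
        return []
    return [None] + [(b - a) // 60 for a, b in zip(t, t[1:])]

# Hypothetical User A: three crashes within one hour (timestamps in minutes).
user_a = hours_since_previous_crash([6000, 6010, 6040])

# A user with no crashes this week contributes no rows at all.
no_crash_user = hours_since_previous_crash([])
```

Under this reading User A contributes three data points and the crash-free
user contributes none, which is why the per-user versus per-crash distinction
matters for the histogram.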

Also, how much resolution do these datasets have? e.g. if somebody has 2
content crashes per day, they would be recorded in one subsession (main
ping), and I don't think we currently have internal timing data to
distinguish those. Would those be counted as happening in the same hour,
even though they could be anywhere from 0-23 hours apart? Do you think that
affects the overall quality of this data? There is a similar question about
the per-hour distribution graph below.

--BDS


On Tue, Feb 28, 2017 at 2:23 PM, Saptarshi Guha <sguha at mozilla.com> wrote:

> Happy to present the crashgraphs dashboard, a high-level dashboard that
> captures:
>
> - average profile crash rate (average across profiles crash rates)
> - % of profiles crashing in the last week
> - time between crashes for profiles that have another crash in their
>   history (history is lifetime!)
> - hours and days between crashes (for profiles with 2+ crashes). Lower is
>   *bad*
>
> We capture two things I've been looking for:
>
> - a profile level view of crash (% of profiles experiencing a crash)
> - a software engineering level view of crash (hours used across crashes)
>
> This is high level, so not as detailed as arewestableyet.
>
> Many thanks to Andre Duarte and Connor Ameres for working and designing
> this.
>
> Your comments welcome!
>
> https://people-mozilla.org/~sguha/crashgraphs/
>
> Regards,
> Saptarshi
>
>
> Appendix
>
> Also since we use main_summary, we cannot eliminate shutdown crashes. That
> would be a nice metric to include in main_summary itself.
>
> That said, on average shutdown crashes are a fairly constant fraction of
> total content crashes [1] (the latter measured in main_summary). That is on
> average. For a given profile with 5 content crashes (as per main_summary),
> it's tough to say how many are shutdown crashes.
>
> [1] https://docs.google.com/document/d/1jzcEPI4NLlar102kS1WBv1cgnxDULYQNl4VawmQEr5c/edit#heading=h.imxtnojofajm
>
> _______________________________________________
> fhr-dev mailing list
> fhr-dev at mozilla.org
> https://mail.mozilla.org/listinfo/fhr-dev
>
>