TMO Stability Dashboard is Changing
Harald Kirschner
harald at mozilla.com
Wed Mar 1 18:59:32 UTC 2017
Just adding my 2c, mostly reflecting what has been said already: for
dashboards in the past we mostly took the approach of an aggregated time
series per channel and crash type. The time series is useful for seeing how
the ecosystem affects the whole product, but it is not as good for tracking
the stability of builds released over time.
For release health I prefer the approach of tracking crash rate per build,
combining the crashes over the "life cycle" of a build (not doing a
time series). Including the date dimension adds noise to the data, and
leaving it out simplifies the view.
To see the downside of the latter for tracking "ecosystem health":
https://health.graphics/crashes/beta has a huge spike for 53b5, which is
not an unstable candidate; the spike comes from DLL injection issues
caused by a third party.
So both aggregates (rate per build and rate per date) make sense, but for
different reasons. I would make per-build the default and offer the
combined time series as an option.
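The two aggregates can be sketched from the same rows, assuming a hypothetical row shape of (activity date, build ID, crash count, usage hours) and a rate expressed as crashes per 1,000 usage hours. `CrashAggregate`, `rate_per_build` and `rate_per_date` are illustrative names, not part of any existing dashboard code, and the example numbers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class CrashAggregate:
    """Hypothetical row: crash and usage totals for one build on one date."""
    activity_date: str   # ISO date, e.g. "2017-02-28"
    build_id: str        # build identifier
    crashes: int
    usage_hours: float


def crashes_per_kuh(crashes: int, usage_hours: float) -> Optional[float]:
    """Crashes per 1,000 usage hours; None when there was no usage."""
    if usage_hours <= 0:
        return None
    return crashes / (usage_hours / 1000.0)


def rate_by(rows: Iterable[CrashAggregate],
            key: Callable[[CrashAggregate], str]) -> Dict[str, Optional[float]]:
    """Sum crashes and usage per group first, then compute one usage-weighted rate per group."""
    crashes: Dict[str, int] = defaultdict(int)
    hours: Dict[str, float] = defaultdict(float)
    for row in rows:
        group = key(row)
        crashes[group] += row.crashes
        hours[group] += row.usage_hours
    return {g: crashes_per_kuh(crashes[g], hours[g]) for g in sorted(crashes)}


def rate_per_build(rows: Iterable[CrashAggregate]) -> Dict[str, Optional[float]]:
    """Life-cycle view: every date a build was in use collapses into one rate."""
    return rate_by(rows, lambda r: r.build_id)


def rate_per_date(rows: Iterable[CrashAggregate]) -> Dict[str, Optional[float]]:
    """Time-series view: every build in use on a date collapses into one rate."""
    return rate_by(rows, lambda r: r.activity_date)


# Illustrative numbers only.
example_rows = [
    CrashAggregate("2017-02-27", "build-a", 10, 2000.0),
    CrashAggregate("2017-02-28", "build-a", 6, 1000.0),
    CrashAggregate("2017-02-28", "build-b", 30, 1000.0),
]
print(rate_per_build(example_rows))  # {'build-a': 5.333..., 'build-b': 30.0}
print(rate_per_date(example_rows))   # {'2017-02-27': 5.0, '2017-02-28': 18.0}
```

Only the grouping key differs between the two views, so offering both, with per-build as the default, costs little once the rows are available.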
On Wed, Mar 1, 2017 at 9:50 AM, Benjamin Smedberg <benjamin at smedbergs.us>
wrote:
> Especially in the transition period, having a consistent view across
> versions to see if the regression is date-based is important. I agree that builds which
> are very old are much less interesting (probably defining "very old" is a
> per-channel thing).
>
> A lot of what release management needs is to determine whether we have a
> regression by build, and to solve that issue the current view seems
> inadequate. We really need a view that shows the crash rate per build. That's
> far more valuable than inferring a regression by looking at dates.
>
> --BDS
>
> On Tue, Feb 28, 2017 at 1:11 PM, Chris Hutten-Czapski <chutten at mozilla.com
> > wrote:
>
>> Relevance to release engineering and short-to-moderate term planning, I
>> suppose. It is easier (possible) to improve the crash rate of the
>> population that is up-to-date than the population that isn't?
>>
>> I'm not particularly attached to this approach, it is what came out of
>> user feedback during development and use.
>>
>> However, conversely: How is showing all beta builds no matter their
>> version number better than only showing the most-recent version? Are there
>> signals that would show in that view that wouldn't show in this one?
>>
>> :chutten
>>
>> On Tue, Feb 28, 2017 at 1:04 PM, Benjamin Smedberg <benjamin at smedbergs.us
>> > wrote:
>>
>>> Why do we have to choose? Why is either of these behaviors better than
>>> showing all beta builds no matter their version number?
>>>
>>> --BDS
>>>
>>> On Tue, Feb 28, 2017 at 1:02 PM, Chris Hutten-Czapski <
>>> chutten at mozilla.com> wrote:
>>>
>>>> The default (only) view of the dashboard shows crash rates by crash
>>>> date (crash_aggregates' activity_date), so long as the crash happened in
>>>> the most-recent (previously, most-used) version (not build).
>>>>
>>>> An example may help.
>>>>
>>>> The previous version of the dashboard showed, for the date of January 27,
>>>> beta crashes from all beta 51 builds. This is despite beta 52 having
>>>> been released. This was because more people were spending more time
>>>> on beta 51 than on beta 52.
>>>>
>>>> The present version of the dashboard instead shows all crashes from beta 52
>>>> on that date, since crash rates on beta 51 are no longer something we
>>>> can do anything about. It's release 51 we then care about and can influence.
>>>>
>>>> I hope this helps.
>>>>
>>>> :chutten
>>>>
>>>>
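The change in how the dashboard picks a version for each day can be sketched as two selection rules. This is a minimal illustration under stated assumptions: versions are plain integer majors, release dates are hard-coded stand-ins for what the dashboard looks up on https://product-details.mozilla.org/, and the usage figures are invented. `most_used_version` and `most_recent_version` are hypothetical names.

```python
from datetime import date
from typing import Dict


def most_used_version(usage_hours_by_version: Dict[int, float]) -> int:
    """Previous rule (sketch): the version with the most usage hours that day."""
    return max(usage_hours_by_version, key=usage_hours_by_version.__getitem__)


def most_recent_version(release_dates: Dict[int, date], day: date) -> int:
    """Current rule (sketch): the newest version already released on `day`."""
    released = [v for v, released_on in release_dates.items() if released_on <= day]
    if not released:
        raise ValueError(f"no version released by {day}")
    return max(released)


# Illustrative inputs only; these are not real release dates or usage figures.
RELEASE_DATES = {51: date(2016, 11, 14), 52: date(2017, 1, 24)}
USAGE_ON_JAN_27 = {51: 900_000.0, 52: 400_000.0}

print(most_used_version(USAGE_ON_JAN_27))                     # 51
print(most_recent_version(RELEASE_DATES, date(2017, 1, 27)))  # 52
```

Around a release day the two rules disagree: more usage may still sit on the older beta, but only the newest beta is one whose crash rate can still be influenced before release.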
>>>> On Tue, Feb 28, 2017 at 12:50 PM, Benjamin Smedberg <
>>>> benjamin at smedbergs.us> wrote:
>>>>
>>>>> I don't quite understand what this means.
>>>>>
>>>>> Is the default view of this dashboard to show crash rates by date, or
>>>>> crash rates by build?
>>>>>
>>>>> If showing crash rates by date, why do you care what the version is?
>>>>> Just show all versions for the channel.
>>>>> If showing crash rates by build, you should be able to just line up
>>>>> e.g. the 53 nightlies < 54 nightlies < 55 nightlies all on the same graph.
>>>>>
>>>>> In either case, it doesn't seem useful to require people to pick a
>>>>> particular version.
>>>>>
>>>>> For most metrics, but especially for crashes, being able to switch
>>>>> between date metrics and build metrics is important, because some
>>>>> regressions are caused by stuff we check in and therefore show up clearly
>>>>> on per-build charts. Other things such as crashes caused by an external
>>>>> website are date-driven and having the date-based view helps correlate that
>>>>> across channels.
>>>>>
>>>>> Having to pick a version is one of the least attractive things about
>>>>> the histogram views on t.m.o as well. If you're chasing a nightly
>>>>> regression around a version bump, you end up having to switch for no
>>>>> particular reason.
>>>>>
>>>>> --BDS
>>>>>
>>>>>
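Lining up nightlies from several versions on one graph only needs a sort key that orders builds by version first and build ID second. A minimal sketch, assuming builds are identified by an integer major version and a 14-digit `YYYYMMDDhhmmss` build ID; `NightlyBuild` and `build_axis` are hypothetical names and the build IDs below are made up.

```python
from typing import Iterable, List, NamedTuple


class NightlyBuild(NamedTuple):
    version: int    # major version, e.g. 53
    build_id: str   # 14-digit YYYYMMDDhhmmss timestamp


def build_axis(builds: Iterable[NightlyBuild]) -> List[NightlyBuild]:
    """Deduplicate builds and order them for a single x axis: version, then build ID."""
    return sorted(set(builds), key=lambda b: (b.version, b.build_id))


# Illustrative build IDs only.
builds = [
    NightlyBuild(54, "20170301030204"),
    NightlyBuild(53, "20170120030213"),
    NightlyBuild(54, "20170228030203"),
    NightlyBuild(53, "20170120030213"),  # duplicate row, dropped by build_axis
]
for b in build_axis(builds):
    print(b.version, b.build_id)
```

Because the build IDs are fixed-width digit strings, plain string comparison orders them chronologically within a version, so no date parsing is needed.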
>>>>> On Tue, Feb 28, 2017 at 12:43 PM, Chris Hutten-Czapski <
>>>>> chutten at mozilla.com> wrote:
>>>>>
>>>>>> Just a small change [1] with large ramifications to the TMO Stability
>>>>>> Dashboard: https://telemetry.mozilla.org/crashes/
>>>>>>
>>>>>> Previously the dashboard was looking at whatever was the most-used
>>>>>> release on a particular day, and plotting those crash numbers. Around
>>>>>> release days that approach is rather less useful, as it might show data
>>>>>> from an older, now-abandoned, release.
>>>>>>
>>>>>> From now on it will show data from the most-recent release for that
>>>>>> particular day.
>>>>>>
>>>>>> It uses https://product-details.mozilla.org/ to determine what
>>>>>> release is most recent. More details in the pull request and commit message.
>>>>>>
>>>>>> One cool thing from this is you can now more easily pick out when
>>>>>> release numbers changed because there's a dip in the kuh graphs.
>>>>>>
>>>>>> One less-cool thing is that the (oft-confusing) percentage numbers in the
>>>>>> table are no longer sufficient to give you an idea of how
>>>>>> inflated/deflated the crash figures may be, as they are still tuned to
>>>>>> last week's usage volume, regardless of how many releases that usage
>>>>>> was split across.
>>>>>>
>>>>>> I have yet to come up with a replacement "trust" figure for how
>>>>>> likely the numbers are to reflect some ideal, "true" crash rate and so am
>>>>>> just wishing bug 1336360 along so that main pings will start being received
>>>>>> faster.
>>>>>>
>>>>>> If you have any questions, please do ask.
>>>>>>
>>>>>> :chutten
>>>>>>
>>>>>> [1]: https://github.com/mozilla/telemetry-dashboard/pull/282
>>>>>>
>>>>>> _______________________________________________
>>>>>> fhr-dev mailing list
>>>>>> fhr-dev at mozilla.org
>>>>>> https://mail.mozilla.org/listinfo/fhr-dev
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>