Telemetry Experiments: experiment, A/B, and questionnaire implementation in Firefox

Benjamin Smedberg benjamin at smedbergs.us
Tue Jan 28 23:14:34 UTC 2014


One of the things I have been looking at in some detail recently is how 
we can use qualitative measurements in Firefox. This includes better 
integration of existing Telemetry and FHR systems, but also measurements 
which don't fit into those systems.

Part of my study was prompted by a request from Mozilla user research, 
who want to be able to run experiments and collect data from Firefox 
users, in a similar way to the mostly-defunct Test Pilot program, but 
with a better sample population and more rigorous engineering and release.

I also collected examples of problems that various groups have wanted to 
solve using data collection. It turns out that some of these use cases 
can already be solved using existing deployment and measurement systems, 
while others need additional features.

https://docs.google.com/document/d/19qPbV8XJQL0bDwG4ZOfFhIdwbeFAM-5VT9S8uCpzapc/edit?usp=sharing

I'm interested to know whether there are other important cases which are 
significantly different from the ones which I've already collected. In 
particular, I'm looking at the following variables:

  * User population: what kind of user population is desirable/necessary
    in order to answer the question? For an early-stage UI demo, the
    desired population may be users who want to live on the bleeding
    edge and are willing to live with bugs. For some studies, we may
    want to examine the behavior of users in particular countries or
    with particular addons installed.
  * Data privacy characteristics: in order to answer the question, do we
    need to collect any identifying information, such as URLs? Does
    collecting the data provide direct benefit back to users?
  * Engineering: does the measurement require changing core code, or can
    the measurement be implemented as addon code? What is the expected
    quality of the change being considered?
  * Result monitoring: what kind of result monitoring is necessary? Do
    we expect a single report to run after a while, or will this measure
    ongoing Firefox behavior? Is it important to be able to correlate
    results against other pieces of data?
  * User interactions: to what extent should users be aware that a
    measurement or experiment is in progress? Do we want to ask them
    specific questions, or does the experiment require some sort of
    opt-in or opt-out? (This is related to the questions about privacy
    and user population.)


This quarter (in Firefox 30) my team is going to focus on building out 
one specific system, a way to deploy experiment code to prerelease users 
in Firefox desktop builds. We're going to start out small, solving a 
specific request from Gregg Lind in user research for a tool to deploy 
some experiments related to search behavior in Firefox.

https://docs.google.com/document/d/1GPpkIcWFNkZmXONjqBCc05U3uocOD-1jpZHdAsR0v1k/edit?usp=sharing

Each experiment will be deployed as a restartless addon, and 
measurements will be taken via some combination of existing FHR and 
telemetry data collection channels. The experiment system will be 
limited to people with telemetry enabled (1), and each experiment will 
also be able to set additional conditions, such as limiting the 
experiment to users in certain release channels, locales, addons or lack 
of addons, etc.
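
To make the targeting rules above concrete, here is a minimal sketch of 
how such a check might look. It is an illustration only, not the planned 
implementation: ClientInfo, ExperimentConditions, and is_eligible are 
hypothetical names, and the exact set of conditions is an assumption 
based on the examples listed above (channel, locale, addons present or 
absent).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ClientInfo:
    """Hypothetical snapshot of the facts an experiment might filter on."""
    telemetry_enabled: bool
    channel: str                              # e.g. "nightly", "aurora", "beta"
    locale: str                               # e.g. "en-US"
    addon_ids: FrozenSet[str] = frozenset()   # IDs of installed addons


@dataclass(frozen=True)
class ExperimentConditions:
    """Hypothetical per-experiment conditions; None means "no restriction"."""
    channels: Optional[FrozenSet[str]] = None
    locales: Optional[FrozenSet[str]] = None
    required_addons: FrozenSet[str] = frozenset()
    excluded_addons: FrozenSet[str] = frozenset()


def is_eligible(client: ClientInfo, cond: ExperimentConditions) -> bool:
    # Global gate: the experiment system only applies when telemetry is on.
    if not client.telemetry_enabled:
        return False
    # Optional restriction to certain release channels.
    if cond.channels is not None and client.channel not in cond.channels:
        return False
    # Optional restriction to certain locales.
    if cond.locales is not None and client.locale not in cond.locales:
        return False
    # Every required addon must be installed.
    if not cond.required_addons <= client.addon_ids:
        return False
    # None of the excluded addons may be installed.
    if cond.excluded_addons & client.addon_ids:
        return False
    return True
```

For example, an experiment limited to nightly and aurora would build 
ExperimentConditions(channels=frozenset({"nightly", "aurora"})) and call 
is_eligible() once per client before installing the experiment addon.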

After this first phase is complete, I expect to extend this system. We 
will probably want to be able to run similar experiments in Firefox for 
Android, although addons can do far less UI modification in general. We 
will also want to handle A/B testing where we don't install an addon, 
but simply flip various pref configurations. We also plan on extending 
this same system as a way to deploy questionnaires or surveys to users. 
For example, if we find an addon which appears to be malware, we might 
ask users whether they know the addon is installed, whether they 
installed it intentionally, etc. I am interested in whether people have 
specific high-priority studies or surveys in mind that we can use as 
models for future revisions.
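
As a rough illustration of the pref-flipping style of A/B test mentioned 
above, the sketch below assigns each client to a branch and returns the 
prefs that branch would set. Everything here is an assumption made for 
the example: the choose_branch helper, the use of a stable per-client 
identifier, the hashing scheme, and the pref name 
"example.feature.enabled" are all hypothetical and not part of any 
existing design.

```python
import hashlib
from typing import Dict, List, Tuple

# A branch is a (name, prefs) pair; prefs maps pref names to values.
Branch = Tuple[str, Dict[str, object]]


def choose_branch(client_id: str, experiment_id: str,
                  branches: List[Branch]) -> Branch:
    """Pick a branch deterministically for a given client and experiment.

    Hashing the experiment ID together with the (hypothetical) client ID
    keeps a client in the same branch across restarts, while different
    experiments split the population independently of each other.
    """
    key = f"{experiment_id}:{client_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(branches)
    return branches[index]


# Hypothetical two-branch experiment that flips a single pref.
BRANCHES: List[Branch] = [
    ("control", {"example.feature.enabled": False}),
    ("treatment", {"example.feature.enabled": True}),
]

name, prefs = choose_branch("client-1", "exp-1", BRANCHES)
```

The branch name would then be reported alongside the existing 
measurements so that results can be compared between branches.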

Finally, we are considering whether and how to combine FHR and telemetry 
data collection. Each system currently has weaknesses which we'd like to 
address, and it seems that the best way forward is to combine them. This 
is still in early decision-making, but I've written up a proposal here 
for comment: 
https://docs.google.com/document/d/1JKnqejahVWMev4xUYGbRiICw0HpwopcXBqPYxco0YzU/edit?usp=sharing

Questions, concerns? Followup to firefox-dev please.

--BDS

1. Currently telemetry is enabled by default in nightly and aurora 
builds, and I have requested that it be enabled by default in all 
prerelease builds (including beta). Being able to run experiments on 
beta users and measure the results is critical, since our beta user 
population is much more representative of release users.