The Surprising Power of Online Experiments

In 2012 a Microsoft employee working on Bing had an idea about changing the way the search engine displayed ad headlines. Developing it wouldn’t require much effort—just a few days of an engineer’s time—but it was one of hundreds of ideas proposed, and the program managers deemed it a low priority. So it languished for more than six months, until an engineer, who saw that the cost of writing the code for it would be small, launched a simple online controlled experiment—an A/B test—to assess its impact. Within hours the new headline variation was producing abnormally high revenue, triggering a “too good to be true” alert. Usually, such alerts signal a bug, but not in this case. An analysis showed that the change had increased revenue by an astonishing 12%—which on an annual basis would come to more than $100 million in the United States alone—without hurting key user-experience metrics. It was the best revenue-generating idea in Bing’s history, but until the test its value was underappreciated.

Humbling! This example illustrates how difficult it can be to assess the potential of new ideas. Just as important, it demonstrates the benefit of having a capability for running many tests cheaply and concurrently—something more businesses are starting to recognize.

Today, Microsoft and several other leading companies—including Amazon, Booking.com, Facebook, and Google—each conduct more than 10,000 online controlled experiments annually, with many tests engaging millions of users. Start-ups and companies without digital roots, such as Walmart, Hertz, and Singapore Airlines, also run them regularly, though on a smaller scale. These organizations have discovered that an “experiment with everything” approach has surprisingly large payoffs. It has helped Bing, for instance, identify dozens of revenue-related changes to make each month—improvements that have collectively increased revenue per search by 10% to 25% each year. These enhancements, along with hundreds of other changes per month that increase user satisfaction, are the major reason that Bing is profitable and that its share of U.S. searches conducted on personal computers has risen to 23%, up from 8% in 2009, the year it was launched.

At a time when the web is vital to almost all businesses, rigorous online experiments should be standard operating procedure. If a company develops the software infrastructure and organizational skills to conduct them, it will be able to assess not only ideas for websites but also potential business models, strategies, products, services, and marketing campaigns—all relatively inexpensively. Controlled experiments can transform decision making into a scientific, evidence-driven process—rather than an intuitive reaction. Without them, many breakthroughs might never happen, and many bad ideas would be implemented, only to fail, wasting resources.

Yet we have found that too many organizations, including some major digital enterprises, are haphazard in their experimentation approach, don’t know how to run rigorous scientific tests, or conduct far too few of them.

Together we’ve spent more than 35 years studying and practicing experiments and advising companies in a wide range of industries about them. In these pages we’ll share the lessons we’ve gleaned about how to design and execute them, ensure their integrity, interpret their results, and address the challenges they’re likely to pose. Though we’ll focus on the simplest kind of controlled experiment, the A/B test, our findings and suggestions apply to more-complex experimental designs as well.

Appreciate the Value of A/B Tests
In an A/B test the experimenter sets up two experiences: “A,” the control, is usually the current system and considered the “champion,” and “B,” the treatment, is a modification that attempts to improve something—the “challenger.” Users are randomly assigned to the experiences, and key metrics are computed and compared. (Univariable A/B/C tests and A/B/C/D tests and multivariable tests, in contrast, assess more than one treatment or modifications of different variables at the same time.) Online, the modification could be a new feature, a change to the user interface (such as a new layout), a back-end change (such as an improvement to an algorithm that, say, recommends books at Amazon), or a different business model (such as an offer of free shipping). Whatever aspect of operations companies care most about—be it sales, repeat usage, click-through rates, or time users spend on a site—they can use online A/B tests to learn how to optimize it.
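
To make the mechanics concrete, here is a minimal Python sketch of the two steps just described: assigning users at random to “A” or “B,” and comparing a key metric between the groups. It is an illustration under stated assumptions, not a description of how any company named in this article builds its platform. The function names, the experiment label "headline-test," the choice of conversion rate as the metric, and the use of a two-proportion z-test are all hypothetical choices made for the example.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str = "headline-test") -> str:
    """Bucket a user into 'A' (control) or 'B' (treatment).

    Hashing the experiment name together with the user ID gives an
    effectively random 50/50 split that is also stable: the same user
    always sees the same experience. (Hypothetical helper.)
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of B against A.

    Returns (relative_lift, z_score, two_sided_p_value). Assumes large
    samples, so the normal approximation is reasonable.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (p_b - p_a) / p_a, z, p_value


if __name__ == "__main__":
    # Made-up counts for illustration only: 10,000 users per group.
    lift, z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
    print(f"relative lift: {lift:.1%}, z = {z:.2f}, p = {p:.3f}")
```

The counts in the usage example are invented. In practice the metric might be revenue per user or clicks per session rather than a simple conversion rate, and the statistical test would be chosen to match it.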

Any company that has at least a few thousand daily active users can conduct these tests. The ability to access large customer samples, to automatically collect huge amounts of data about user interactions on websites and apps, and to run concurrent experiments gives companies an unprecedented opportunity to evaluate many ideas quickly, with great precision, and at a negligible cost per incremental experiment. That allows organizations to iterate rapidly, fail fast, and pivot.

Recognizing these virtues, some leading tech companies have dedicated entire groups to building, managing, and improving an experimentation infrastructure that can be employed by many product teams. Such a capability can be an important competitive advantage—provided you know how to use it. Here’s what managers need to understand:

Tiny changes can have a big impact.
People commonly assume that the greater an investment they make, the larger an impact they’ll see. But things rarely work that way online, where success is more about getting many small changes right. Though the business world glorifies big, disruptive ideas, in reality most progress is achieved by implementing hundreds or thousands of minor improvements.

Consider the following example, again from Microsoft. (While most of the examples in this article come from Microsoft, where Ron heads experimentation, they illustrate lessons drawn from many companies.) In 2008 an employee in the United Kingdom made a seemingly minor suggestion: Have a new tab (or a new window in older browsers) automatically open whenever a user clicks on the Hotmail link on the MSN home page, instead of opening Hotmail in the same tab. A test was run with about 900,000 UK users, and the results were highly encouraging: The engagement of users who opened Hotmail increased by an impressive 8.9%, as measured by the number of clicks they made on the MSN home page. (Most changes to engagement have an effect smaller than 1%.) However, the idea was controversial because few sites at the time were opening links in new tabs, so the change was released only in the UK.
