Multivariable testing has made a big splash in the last few years with online retailers, yet this sudden success is really riding the crest of a wave that has been building for decades. Multivariable testing (also called scientific testing, multivariate or matrix testing, Taguchi methods, or other branded terms) is based on a specialized field of statistics that has evolved over the last 80 years. Since the 1930s, a small group of academic statisticians has developed new test designs and techniques focused on efficient ways to test more variables more quickly.
Often called “experimental design,” this specialty falls outside of mainstream statistics and has remained largely unknown to the business world. Only in the last decade have practitioners found a successful approach for using this impressive depth of academic theory to navigate fast-moving marketing channels.
Many variables at once
The concept is simple: with the right techniques you can change many variables at once, but in an organized way, so you can separate the impact of each. Complex mathematical principles define the “organized way” you need to set up your multivariable test.
The depth of statistical complexity below the surface can seem daunting. As a marketer, you should understand the fundamental concepts and the basic pros and cons of the selected test strategy. The expert who guides you through the process should be able to explain the rationale of the approach and have a good grasp of the vast realm of techniques available. These include efficient test designs like full-factorial, fractional-factorial and Plackett-Burman designs, plus a veritable A-to-Z of specialized tools: axial runs, Bonferroni method, confounding, dispersion effects and experimental units, plus orthogonality, projectivity and quadratic effects, down to the X-, Y- and Z-components of interaction.
Various designs and techniques are appropriate for different marketing programs and objectives. For example, Plackett-Burman designs work well for testing 10-20 creative elements very efficiently in high-production-cost direct mail programs. Fractional-factorial designs are flexible and powerful for testing 5-15 creative elements and select interactions in e-mail and Internet programs. For product, price and offer testing, where elements are known to be important and interactions can be very large and valuable, full-factorial designs often are best. The number and type of test elements, cost and constraints on the number of “recipes” you can create, and the desired speed and precision of the test are among the issues that impact your choice of test design and strategy.
Since the dawn of direct marketing, split-run techniques (also called A/B splits, test-control or champion-challenger testing) have been the standard for marketing testing. You may have a long-running (or “control”) banner advertisement and test it against one other with only the tagline changed, so any difference in click-through and conversion can be attributed to this one variable alone.
In contrast, one multivariable test design is made up of a number of related test “recipes.” Instead of the one-variable change of a split-run test, one new banner ad in a multivariable test would include a number of changes: perhaps the new tagline along with a control graphic, new price, additional starburst and control background color. Each version has a unique combination of all elements in the test, and each provides one new piece of data on every test element. By analyzing all recipes together, but grouping the data in different ways, you can separate the precise impact of each change. The statistical structure requires that the creative execution accurately follow the defined test recipes.
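To make "grouping data in different ways" concrete, here is a minimal sketch using a small full-factorial design. The three element names, the base rate and the simulated effects are all made up for illustration; they are not figures from any test described in this article. Each main effect is the average response of the recipes with the new idea minus the average response of the recipes with the control.

```python
from itertools import product

# Hypothetical banner-ad elements; -1 = control level, +1 = new idea.
ELEMENTS = ["tagline", "price", "starburst"]

# Full-factorial design: every combination of the three elements (8 recipes).
recipes = list(product([-1, 1], repeat=len(ELEMENTS)))

# Made-up "true" half-effects, used only to simulate a response per recipe.
TRUE_HALF_EFFECTS = {"tagline": 0.10, "price": -0.05, "starburst": 0.00}
BASE_RATE = 1.00  # illustrative click-through rate, in percent


def simulated_rate(recipe):
    """Simulated response for one recipe (no noise, no interactions)."""
    return BASE_RATE + sum(
        TRUE_HALF_EFFECTS[name] * level for name, level in zip(ELEMENTS, recipe)
    )


rates = [simulated_rate(r) for r in recipes]


def main_effect(column):
    """Mean response at the new level minus mean response at control."""
    new = [y for r, y in zip(recipes, rates) if r[column] == 1]
    control = [y for r, y in zip(recipes, rates) if r[column] == -1]
    return sum(new) / len(new) - sum(control) / len(control)


# The same eight results are regrouped once per element.
effects = {name: main_effect(i) for i, name in enumerate(ELEMENTS)}

for name, value in effects.items():
    print(f"{name}: {value:+.2f} percentage points")
```

Because every recipe appears in either the "new" or the "control" group for every element, all eight results contribute to each estimate.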
Scientific multivariable tests have four key advantages over split-run techniques. You can test many marketing elements at once, using the same small sample size as an A/B split, with results that quantify the impact of each element alone (main effect) and in combination with others (interaction), and with a vast array of techniques available to customize your approach.
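The difference between a main effect and an interaction can be shown with simple arithmetic. The sketch below uses two hypothetical elements (a subject line and an offer) and invented conversion rates; one common convention, assumed here, defines the interaction as half the difference between one element's effect at the other element's two levels.

```python
# Hypothetical conversion rates (percent) for a 2x2 slice of a test.
# Keys are (subject_line_level, offer_level); the values are invented.
rates = {
    ("control", "control"): 1.00,
    ("new", "control"): 1.05,
    ("control", "new"): 1.10,
    ("new", "new"): 1.35,
}


def effect_of_subject(offer_level):
    """Lift from the new subject line while the offer is held fixed."""
    return rates[("new", offer_level)] - rates[("control", offer_level)]


# Main effect: the subject-line lift averaged over both offer levels.
main_effect_subject = (effect_of_subject("control") + effect_of_subject("new")) / 2

# Interaction: how much the subject-line lift depends on the offer.
interaction = (effect_of_subject("new") - effect_of_subject("control")) / 2

print(f"main effect of subject line: {main_effect_subject:+.2f}")
print(f"subject line x offer interaction: {interaction:+.2f}")
```

In this invented example the new subject line helps a little on its own and much more when paired with the new offer, which is exactly what a nonzero interaction captures and what a one-variable split test cannot show.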
The best e-mail recipe
A large Internet retailer/cataloger wanted to increase e-mail conversion. With 2-3 e-mail drops per week to a customer base of 450,000, conversion rate averaged 1% per campaign. The team had a challenge pinpointing what worked best because they continually changed e-mail creatives and offers to keep the program fresh.
After brainstorming 42 ideas, the team narrowed the list down to 18 bold, independent test elements for one multivariable test made up of 20 different combinations, or recipes, of all 18 elements. Four of these “recipes” are shown on p. 75 (with control levels in black and the new ideas in orange). A direct subject line might be something like “Save 20% through Friday.” A creative subject line would change that to “Awesome new products and super savings that won’t last.”
Recipe 20 was simply the control. All other recipes had about half the elements set at the control level and half at the new level, but a different half-and-half for each recipe. Though these four may look like random combinations, all recipes fit within the precise statistical test design. Like the pieces of a puzzle, all recipes fit together to provide accurate data on the main effects and important interactions of all 18 elements.
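The article does not name the design behind these 20 recipes. As an assumption, the sketch below builds the standard 20-run Plackett-Burman design, which matches the description: 19 recipes with roughly half the elements at the new level, plus one all-control recipe. The function name and the choice to keep the first 18 columns are illustrative, not the team's actual setup.

```python
# Generator row of the standard 20-run Plackett-Burman design
# (+1 = new idea, -1 = control level).
GENERATOR = [+1, +1, -1, -1, +1, +1, +1, +1, -1, +1,
             -1, +1, -1, -1, -1, -1, +1, +1, -1]


def plackett_burman_20(n_elements=18):
    """Return 20 recipes (rows) for up to 19 two-level test elements."""
    n = len(GENERATOR)
    # Recipes 1-19: cyclic shifts of the generator row.
    rows = [[GENERATOR[(j - i) % n] for j in range(n)] for i in range(n)]
    # Recipe 20: every element at the control level.
    rows.append([-1] * n)
    # Keep only as many columns as there are test elements.
    return [row[:n_elements] for row in rows]


design = plackett_burman_20()

# Each element is at the new level in exactly 10 of the 20 recipes.
new_counts = [sum(1 for row in design if row[c] == 1) for c in range(18)]

# Orthogonality: for any two elements, the four level combinations
# occur equally often, so every pairwise dot product is zero.
max_abs_dot = max(
    abs(sum(row[a] * row[b] for row in design))
    for a in range(18) for b in range(a + 1, 18)
)
print(new_counts, max_abs_dot)
```

The balance and orthogonality checks at the end are what "fit together like the pieces of a puzzle" means in practice: each element's effect can be estimated without being mixed up with any other element's main effect.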
The team ran the same test across three different types of promotions to see promotion-specific effects plus elements that were important across all campaigns. Results for the first campaign are shown below, including the 18 main effects (shown in the bar chart) and one key interaction (in the line plot).
In the chart, main effects are arranged from the largest (D, at the top) to the smallest. Test elements are listed on the left with the “new idea” shown in parentheses. The length of the bar and the label show the size of the main effect. The +/- sign on the effect shows whether the new idea is better (positive effect) or the control is better (negative effect). The dashed line of significance is a measure of experimental error. All effects below that line (less than 6%) can be explained simply by random variation. Effects are shown as a percentage change from the control, so a 10% effect would increase conversion rate from 1% for the control to 1.1%.
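As a rough illustration of how to read such a chart, the sketch below converts a percentage effect into a projected conversion rate and checks it against the 6% line of significance. The 1% control rate and the 6% threshold come from the text; the element labels and effect sizes in the example are hypothetical, not the campaign's actual results.

```python
CONTROL_RATE = 0.01      # 1% conversion for the control, as stated in the text
SIGNIFICANCE_LINE = 6.0  # percent; smaller effects are treated as random variation


def projected_rate(effect_pct, control_rate=CONTROL_RATE):
    """Conversion rate implied by an effect given as a % change from control."""
    return control_rate * (1 + effect_pct / 100)


def is_significant(effect_pct, threshold=SIGNIFICANCE_LINE):
    """True when the effect's size reaches the line of significance."""
    return abs(effect_pct) >= threshold


# Hypothetical effects for illustration only (positive favors the new idea).
example_effects = {"element X": 10.0, "element Y": -8.0, "element Z": 3.0}

for name, effect in example_effects.items():
    if not is_significant(effect):
        verdict = "within random variation"
    elif effect > 0:
        verdict = "new idea is better"
    else:
        verdict = "control is better"
    print(f"{name}: {effect:+.1f}% -> {projected_rate(effect):.4%} ({verdict})")
```

A significant negative effect is still useful: it says the control level should be kept for that element.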