Datacasting: What will you buy tomorrow?
18 August 2011 by Jim Giles and Peter Aldhous
Magazine issue 2825
We put a new breed of number-crunching forecasters to their toughest test yet – predicting sales of New Scientist
See gallery: "Brains, delusion, mummies: Best New Scientist covers"
CROESUS, the king of ancient Lydia, wanted to know the future. It wasn't going to be easy. First he had to climb the slopes of Mount Parnassus to consult the Pythia, a Greek oracle. Those granted an audience received a prophecy, but only if they brought along a sacrificial goat. Having heard the prophecy, expressed in enigmatic verse, they then had to figure out what it meant. Croesus was told that a great empire would fall if he went to war. He invaded his Persian neighbours, only to discover that the ill-fated empire was his own.
It might seem as if little has changed. Business leaders and politicians still turn to forecasters, who often charge high sums for their services. And yet their predictions can be unhelpful, if not wildly inaccurate. Many technology analysts said the iPad would be a flop; it has sold tens of millions. The movie Titanic was supposedly destined for a fate as miserable as the ship; it earned almost $2 billion. As the old joke goes, prediction is hard - especially about the future.
Yet there is a buzz about the prediction business. That's because forecasters have a new place to look for answers. This time, their efforts are based on data - mountains of data, in fact. Our online activities are now tracked in unprecedented detail. Sensors monitor everything from hospital treatments to road conditions. Data storage is so cheap that many large companies are sitting on enough information to fill a million compact discs. The global data mountain is growing by 40 per cent annually.
This information is a potential gold mine. People who can extract trends are much in demand, with job ads seeking "quantitative rock stars" and "stats ninjas". The fledgling field of data science can already predict traffic flows, help doctors make better decisions and tell a turkey from a future box-office smash. In fact, data science has probably changed your day-to-day life without you realising it.
So how is the new forecasting done, and just how good is it? We decided to see how the latest techniques would stand up to the task of predicting what people will buy - one of the hottest challenges in the field. Finding a suitable test wasn't difficult. Every week, we at New Scientist strive to present a magazine you'd want to pick up. So, over the past four months, we have set four teams the task of trying to predict the sales of each issue of New Scientist, using some of the most promising and innovative approaches available. Are our readers predictable, we dared to wonder?
It won't spoil the ending if we tell you that the experiment's results have been mixed. The task turned out to be far more complex than we thought. Yet along the way, we discovered how data science is set to transform the world we live in, for better and for worse. Read on to find out what happened - and to learn about the gallant failure of our wild-card forecasting team: a quartet of pigeons.
Predicting magazine sales is notoriously hard, so we figured that our exercise would provide data science with a stiff test. Still, we found forecasters who were willing to give it a try. To get them started, we had each hone their techniques on historic data - sales of New Scientist between 2006 and 2010 in UK stores. We also provided images of all the magazine covers, figuring that these are likely to have influenced sales significantly. The forecasters were free to study any other data they deemed useful, from weather reports to economic indicators.
If there was a straightforward pattern in the sales figures, the teams should have found it. But several entrants fell at this first hurdle, including data scientists at the University of California, Berkeley, and nearby Stanford University, who looked at the numbers and scratched their heads.
Further evidence of the task's magnitude came from Johan Bollen at Indiana University Bloomington. He is one of the foremost practitioners of an exciting new idea - predicting the future by aggregating information from social networks. Positive tweets about a forthcoming movie, for example, are an indicator of good box-office returns. Bollen has used sentiments expressed on Twitter to forecast stock-market movements, and wanted to examine the connection between tweets about New Scientist and the magazine's sales. "We were absolutely convinced we were going to see a correlation," says Bollen. Yet none emerged.
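The kind of sentiment aggregation Bollen works with can be sketched in miniature. In this illustrative toy (the word lists, tweets and function names are invented, not Bollen's actual method), each tweet is scored by counting positive and negative words, and a week's tweets are averaged into one signal that could then be compared with sales:

```python
# Toy sentiment aggregation: score each tweet by counting positive and
# negative words, then average the scores over a week's worth of tweets.
POSITIVE = {"great", "love", "fascinating", "brilliant"}
NEGATIVE = {"boring", "dull", "hate", "awful"}

def tweet_score(tweet):
    """Return (#positive words - #negative words) for one tweet."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def weekly_sentiment(tweets):
    """Average per-tweet score: the weekly signal to compare with sales."""
    return sum(tweet_score(t) for t in tweets) / len(tweets)

week = ["Fascinating cover story, love it", "This issue is boring"]
print(weekly_sentiment(week))  # (2 - 1) / 2 = 0.5
```

Real systems use far richer language models than word lists, but the principle is the same: collapse many noisy individual signals into one aggregate number per time period.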
A team of forecasting pigeons, which we had been secretly rooting for, also stumbled early on (see "Bird brains").
Then our luck appeared to change. Some of the groups looking at the sales figures began to identify patterns with modest predictive power. So, in March we decided to set up a 17-week contest in which entrants had to use the cover to predict sales before the magazine came out.
Max Sklar and Matthew Rathbone at New York University started by identifying and extrapolating long-term trends in our sales. Then they came up with a method for adjusting forecasts to take account of a seasonal variation that they discovered. Finally, they tweaked forecasts according to another pattern they found in the data: issues with pale covers sold slightly more than those with dark ones.
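Their recipe - a long-term trend, a seasonal adjustment, then a cover tweak - can be sketched as follows. Everything here is illustrative: the sales figures, the seasonal offset and the size of the pale-cover effect are invented, not Sklar and Rathbone's actual numbers:

```python
# Illustrative trend-plus-seasonality forecast: fit a straight-line trend
# to past weekly sales, add a seasonal offset, then nudge for the cover.
def linear_trend(sales):
    """Least-squares slope and intercept for sales indexed 0, 1, 2, ..."""
    n = len(sales)
    mean_x = (n - 1) / 2
    mean_y = sum(sales) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(sales)) / \
            sum((x - mean_x) ** 2 for x in range(n))
    return slope, mean_y - slope * mean_x

def forecast(sales, seasonal_offset, pale_cover, weeks_ahead=1):
    slope, intercept = linear_trend(sales)
    base = intercept + slope * (len(sales) - 1 + weeks_ahead)
    tweak = 150 if pale_cover else -150  # invented size of the cover effect
    return base + seasonal_offset + tweak

past = [20000, 20100, 20200, 20300]  # made-up weekly sales
print(forecast(past, seasonal_offset=500, pale_cover=True))  # 21050.0
```

Each of the three steps is fitted to history independently, which is why a sudden break with precedent - as happened later in the contest - throws all of them off at once.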
Basing forecasts on historical examples is a well-established approach in data science. Yahoo News, for example, monitors the behaviour of visitors to its website and uses the data it collects to decide what headlines to display next time that person visits. Data scientists also analyse old insurance claims for patterns that suggest fraud, while those in the tourism business use historic visitor numbers and economic data, such as wage levels and the price of plane tickets, to guess your next holiday destination. The approach has even been used to predict road traffic up to 24 hours ahead (see "Meet the Kagglers").
Indeed, by using a similar method, Sklar and Rathbone got off to a reasonable start. Their opening forecast was within 1000 copies of the actual figure - not bad considering that weekly sales can vary by several thousand. But the very next week, the gap was more than five times their initial miss. As the contest draws to a close, Sklar and Rathbone have only got within 1000 of the true sales figure on four more occasions.
Our second entrant - a "prediction market" - didn't fare much better. These markets date back to work in the 1990s by Robin Hanson, an economist at George Mason University in Fairfax, Virginia, and rely on collating human judgement. In a regular stock market, a company's share price is effectively the aggregate of traders' forecasts about the firm's future. Hanson realised that this "wisdom of the crowd" could be used to forecast other events. Markets for "shares" in election outcomes and box-office takings have since proved to be powerful forecasting tools.
Consensus Point of Nashville, Tennessee, a company that employs Hanson as chief scientist, set up a prediction market involving New Scientist staff. Around 25 of us used an online interface to express how much confidence we had in each edition of the magazine. If we thought a cover had big potential to drive sales, for example, we would buy shares in it. Our collective decisions would drive up or depress the share price. The closing price each week was used to predict how well that issue would sell.
Such markets give forecasters a powerful picture of where the consensus lies, because they take account of people's confidence in their prediction. The markets are also useful when you suspect the results of a forecast could change as time passes. For example, a prediction about a politician's chances in an election will evolve as their campaign runs.
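Hanson is best known for the logarithmic market scoring rule, a mechanism commonly used to run such markets. A minimal sketch of the idea (the liquidity parameter and the trades below are invented for illustration; this is not necessarily how Consensus Point's market was configured):

```python
import math

# Logarithmic market scoring rule (LMSR): the market maker quotes prices
# that always sum to 1 across mutually exclusive outcomes, and moves a
# price upwards whenever traders buy shares in that outcome.
class LMSRMarket:
    def __init__(self, outcomes, b=100.0):
        self.b = b                           # liquidity parameter
        self.q = {o: 0.0 for o in outcomes}  # shares sold of each outcome

    def price(self, outcome):
        """Current price = probability the market assigns this outcome."""
        total = sum(math.exp(q / self.b) for q in self.q.values())
        return math.exp(self.q[outcome] / self.b) / total

    def cost(self):
        return self.b * math.log(sum(math.exp(q / self.b)
                                     for q in self.q.values()))

    def buy(self, outcome, shares):
        """Return what a trader pays to buy shares of an outcome."""
        before = self.cost()
        self.q[outcome] += shares
        return self.cost() - before

market = LMSRMarket(["sells well", "sells badly"])
market.buy("sells well", 50)          # optimists buy in...
print(market.price("sells well"))     # ...and the price rises above 0.5
```

The closing price doubles as the crowd's probability estimate, which is what made each week's share price usable as a sales forecast.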
Yet for this task, as a crowd we did not prove wise. The technique fared no better than Sklar and Rathbone's.
A different crowd turned out to have more smarts. Websites like Amazon's Mechanical Turk allow users to commission workers to complete tasks in return for a small payment. Cellphone companies already use such services to get rapid feedback on new designs. Perhaps workers could also forecast sales?
We turned to CrowdFlower, a San Francisco-based company that helps clients outsource tasks to the online labour pool. It became our third entrant. CrowdFlower intern Diyang Tang started by asking workers to rate old covers. Their answers didn't tell her anything useful. But then she asked if they would pay $10 - almost twice the actual price - to buy the corresponding issue. The fraction of workers that said yes correlated with historic sales, so she applied this approach in the contest.
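Tang's test boils down to checking whether the fraction of yes answers tracks historic sales, which is a simple correlation. A sketch with invented figures (the real data and the strength of the correlation she found are not ours to reproduce):

```python
# Pearson correlation between the fraction of workers who said they would
# pay $10 for an issue and that issue's actual sales (figures invented).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

yes_fraction = [0.20, 0.35, 0.25, 0.40, 0.30]
sales = [18000, 21500, 19000, 22500, 20000]
print(pearson(yes_fraction, sales))  # close to +1: strongly correlated
```

A strong correlation on historic issues is what justified turning the yes-fraction into a forward-looking forecast.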
In the last days of the contest, the "Turkers" were battling it out for first place with our final contestant, Sebastian Wernicke, a former bioinformatics statistician based in Munich. Wernicke applied a statistical algorithm to the task and, like Sklar and Rathbone, looked to patterns in the past to predict the future. He ran a pixel-by-pixel analysis of each cover that revealed the distribution of different colours. He also considered the topics, wording and image type. Details of public holidays were thrown into the mix on the assumption that time off may affect reading habits.
Wernicke adjusted the importance of each variable, noting the impact on the algorithm's ability to estimate historical sales. Over multiple rounds of tweaking, the algorithm became more accurate. Too much purple is a bad thing, Wernicke found. Printing the magazine's title in black is good. His technique, known as regression analysis, is one of the oldest forecasting methods. It is often the first one that forecasters turn to when trying to find relevant factors.
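The tweak-and-check loop Wernicke describes can be sketched as a toy regression. The cover features, sales figures and step sizes below are all invented; real regression software solves for the weights directly rather than by trial and error, but the iterative version makes the "adjust, check the error, keep improvements" idea visible:

```python
# Toy regression on cover features, fitted by repeatedly tweaking each
# weight and keeping any change that reduces the error on past sales.
covers = [  # (purple fraction, black title?, sales) - all invented
    (0.1, 1, 21000),
    (0.5, 1, 19500),
    (0.8, 0, 17500),
    (0.2, 0, 19800),
    (0.6, 1, 19000),
]

def error(weights):
    """Sum of squared gaps between predicted and actual sales."""
    w_purple, w_black, bias = weights
    return sum((w_purple * p + w_black * b + bias - s) ** 2
               for p, b, s in covers)

weights = [0.0, 0.0, 20000.0]
step = 100.0
while step > 0.01:
    improved = False
    for i in range(3):
        for delta in (step, -step):
            trial = weights[:]
            trial[i] += delta
            if error(trial) < error(weights):
                weights = trial
                improved = True
    if not improved:
        step /= 2  # no tweak helped: refine and try again

print([round(w) for w in weights])  # the purple weight comes out negative
```

On this invented data the fitted purple weight is negative and the black-title weight positive, mirroring the patterns Wernicke reported.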
Wernicke got off to a flying start, making the most accurate forecast for the first four weeks of the contest. For three of those weeks, he got within a few hundred of the correct figure.
Then he began missing by thousands. One week in May, his forecast was out by more than 5500. He was not alone - all the teams, including the Turkers, fared poorly during this period. Towards the end the errors shrank, but the disastrous run in the middle of the competition may have scuppered the chances of any of the methods having useful forecasting power.
It was then that the magnitude of our challenge dawned, rather belatedly. Even the most carefully constructed algorithms can be derailed. Our forecasters were thrown off by a sales surge in April and May that went against precedent. Why? Perhaps the modest gains made by the UK economy played a role. Maybe our competitors published a series of duds. Readers may even have sought solace in science as other media overdosed on the recent royal wedding. The list of possible factors is lengthy, and including them all in a forecasting model would be a formidable task.
Our experience shows how difficult it is to identify all the elements that drive change. Miss just one or two factors, and a forecast can go awry. It is a reminder to any decision-maker swept up in the forecasting boom: large datasets and sophisticated statistical algorithms have the potential for awesome predictive power, but they are not infallible.
We also have an admission to make: we are just a tiny bit relieved that these techniques failed to foretell the future. It hints that data science can't predict your every desire just yet. Trouble can arise when businesses attempt this. Eli Pariser, author of The Filter Bubble: What the internet is hiding from you, describes two friends who searched Google for information about Egypt. One received links to news of the country's recent political turmoil, the other was pointed to travel websites. In trying to guess what Pariser's friends would be interested in, Google ended up filtering out important information.
Our competition raises similar questions. What will media companies do if forecasting algorithms tell them to avoid certain topics, like foreign news? And what about politics? Will on-demand forecasts of public reaction cause politicians to abandon a principled stand? This already happens via the results of surveys and focus groups, but the ability to forecast the impact of every small decision in close to real time will bring these tensions into sharper focus.
That's why we are not wholly sad that we couldn't predict what was to come. The more we turn to forecasts, the more we change the present - and sometimes for the worse. Even King Croesus would appreciate that.
Paula, Trisha, George and Judith live in Austria. Recently we asked them to predict the future. Our unlikely clairvoyants are pigeons.
The pigeon quartet was a potential wild-card entrant in a contest we have been running for the past few months. Various teams have been using data science to try to predict the weekly sale figures of New Scientist magazine (see main story). Unfortunately the birds failed early on.
While pitting pigeons against top number-crunchers might seem fanciful, if the visual structure of the magazine cover has a major impact on sales, then the birds should have had a decent shot. Pigeons are capable of impressive feats of visual-discrimination learning.
Tanja Kleinhappel, a student in the lab of Ludwig Huber at the University of Vienna, Austria, used food rewards to train the pigeons to distinguish between high- and low-selling covers. It took a while, but each of the birds learned to categorise the covers according to sales with more than 80 per cent reliability.
Sadly, when Kleinhappel started introducing new covers, the pigeons' performance dropped back to the chance level. They had learned how the covers they were trained on had sold, but had not latched on to an underlying pattern correlated with sales.
One consolation? Part of our editor's job is unlikely to be outsourced to the pigeon coop any time soon.
Meet the Kagglers
Wouldn't it be useful to have an accurate traffic forecast? In Sydney, Australia, road-surface sensors log passing cars to gauge congestion. To find out if this data could be used to forecast travel times, the government of New South Wales turned to Kaggle, a company that runs forecasting contests.
Kaggle has run 20 competitions to date. In one ongoing challenge, $3 million is up for grabs for anyone who can use health records to predict which people will end up in hospital.
For the traffic challenge, entrants had access to just over a year's worth of sensor readings, together with associated travel times. They had three months to search for patterns that could predict traffic conditions up to 24 hours ahead. At stake was a prize of A$10,000.
José González-Brenes of Carnegie Mellon University in Pittsburgh, Pennsylvania, took on the challenge. He and teammate Guido Matias Cortes of the University of British Columbia in Vancouver, Canada, came up with a "decision tree" model. They began by grouping the historical data into chunks using a series of questions such as "was it a weekday?" and "was traffic busy on neighbouring roads?".
Each chunk was associated with a characteristic set of travel times, so to make a forecast, González-Brenes and Cortes just had to use the same set of questions to determine which category any set of new conditions would best fit into. Their algorithm was able to predict travel times over a 500-metre stretch of road 24 hours in advance to within about 5 seconds - good enough to take first prize.
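The grouping-and-averaging idea behind their decision tree can be sketched in a few lines. The questions, sensor records and travel times here are invented stand-ins for the Sydney data:

```python
# Toy decision-tree forecast: bucket historical records by the answers to
# a fixed series of yes/no questions, then predict the bucket's average.
history = [  # (weekday?, neighbouring roads busy?, travel time in seconds)
    (True,  True,  95), (True,  True,  105),
    (True,  False, 70), (True,  False, 74),
    (False, True,  80), (False, False, 60),
]

def bucket(weekday, busy):
    """The 'leaf' a record falls into: one per combination of answers."""
    return (weekday, busy)

# Build the tree's leaves: average travel time per bucket of history.
leaves = {}
for weekday, busy, seconds in history:
    leaves.setdefault(bucket(weekday, busy), []).append(seconds)
averages = {k: sum(v) / len(v) for k, v in leaves.items()}

# Forecast: look up the bucket matching tomorrow's expected conditions.
print(averages[bucket(True, True)])  # predicts 100.0 seconds
```

Real decision-tree learners choose the questions and their order automatically, splitting wherever the data divides most cleanly, but the forecast at each leaf works just as above.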
Jim Giles is a consultant for New Scientist. Peter Aldhous is our San Francisco bureau chief