Friday, 15 May 2015

Blinded by Numbers

There's an amusing article here with lots of inside info from the Labour party's disastrous campaign in the recent general election. One passage, though, really stood out as explaining much of what is wrong with the way the world is run at present. It is attributed to an unnamed Labour MP and appears to have been spoken last year, though no firm date is given.

 ‘I sat in a room and saw Greenberg and Morris explain how it was structurally impossible — impossible — for us to poll less than 35 per cent in this election,’ said one. In the event, it was 30 per cent, against the Conservatives’ 37 per cent. A far cry from the tie predicted by most pollsters.

It might sound like normal election campaign hubris, but as so often with such bold pronouncements based on statistics, its real-world implications are worth considering a little more closely.

For a bit of background, Stan Greenberg is one of the founders of Greenberg Quinlan Rosner Research (GQRR), a major polling and research firm long associated with the Democratic party in the US and with Labour in the UK. James Morris, a former speechwriter for Tony Blair, is the European Director of GQRR.

These two men, with all their combined experience, felt safe to say ahead of time that it was "structurally impossible" for less than 35% of the electorate to vote Labour. They made this pronouncement on the strength of extensive polling data and advanced statistical techniques, and it was completely wrong. In the event Labour polled 30% to the Conservatives' 37%. Yet right up until the early hours of the 8th of May, as the article says, Miliband firmly believed he was going to be Prime Minister. He was still busily writing a victory speech after the polls closed, even as the traditionally far more accurate exit poll was showing the extent of the damage to Labour's vote.

You could dismiss it all as politicians seeing what they want to see during a close-run election campaign, but I believe it highlights something more profoundly wrong with the world, and especially with the business of government and how it uses statistics. Don't forget that while this ultimately trivial example relates to an election defeat, similar numbers are put forward all the time to "prove" whatever point the government or the civil service wish to make.

People forget that statistics are a proxy for a far more complex and chaotic reality, and quite often, it appears, people believe that advanced statistical techniques can make up for a fundamental lack of data. People are awed by this and believe that such advanced and scientific methodology must surely be accurate. They could not be more wrong.

Over 30 million people voted last Thursday, from wild and remote Orkney to cosmopolitan Islington and all points in between. To imagine that a poll of a few hundred, or even a few thousand, people taken a week before the election, never mind six months before, will give you a realistic idea of how people will vote on polling day is a nonsense. Yet it is a nonsense on which millions of pounds of campaign expenditure were lavished and important policy decisions were made, and exactly the kind of nonsense on which government, politicians and so-called experts base the regulations and policies we live under.
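To put rough numbers on this, consider the textbook sampling error of a poll, under the idealised assumption of a simple random sample (which no real poll achieves). A minimal sketch, using the standard normal-approximation formula for the margin of error of a proportion:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Two-sided 95% sampling margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Sampling error alone, at typical poll sizes, for a 35% vote share
for n in (500, 1000, 10000):
    print(f"n={n:>6}: 35% +/- {margin_of_error(0.35, n) * 100:.1f} points")
```

A poll of 1,000 people carries roughly a three-point margin at 95% confidence from sampling alone, and that is the best case: it says nothing about whether the sample was representative in the first place, or whether anyone changes their mind before polling day. A five-point miss, as happened here, points to systematic error that no amount of statistical sophistication applied after the fact can correct.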

The vast industry that has built up around telling us how we can avert global warming is built on predictions of a 0.7°C rise in global temperatures over the next 30 years, and as the linked article points out, the IPCC has a history of wildly over-predicting such changes anyway. Even so, 0.7°C is barely perceptible. Drawing from this flawed estimate the apocalyptic conclusion that the ice caps will melt, the seas will rise, and famine, pestilence and death will follow is ludicrous, as is sacrificing our economic well-being, and compromising such obviously beneficial things as private transport or electricity, on the altar of this guess. These models may well use advanced techniques, but they are based on a minuscule amount of questionable data, and they predict a change which falls within any sensible margin of statistical error.

It isn't just politics where people misuse statistics rather than deal with reality, either. Cars are almost entirely designed by numbers, and car companies developing a new model must match or beat rivals on a whole range of metrics, from boot space to fuel consumption, by way of rear passenger legroom and, in the case of anything mildly sporty, Nurburgring lap time. I have heard people judge the performance of cars entirely on the basis of these numbers, then puzzle over why a car that a magazine timed as faster than another around the Nurburgring is not comparably faster around a flat, sweeping airfield track in England. As though somehow the numbers better represented reality than the reality itself.

Other consumer goods are the same. In fact nearly every walk of life is dominated by this obsessive apophenia, and I believe it most often hinders rather than helps the process of improving our environment to suit our needs.

Perhaps the most insane application of this cultish belief in the power of numbers is the black art of quant trading. A method of trading financial instruments based on analysing patterns in past data, it is at once very powerful and completely nonsensical: hedge funds and global banks "invest" trillions this way on little more than a bet that the patterns in the data analysed will continue to repeat themselves.

Nassim Nicholas Taleb has gone into a lot more detail on this in his books, but suffice to say that while usually successful for a time, these models are prone to serious failures as well, and the consequences are often disastrous, and extremely expensive.

So what to do? Ford and Toyota can't very well go around designing their cars to the specification of each individual customer, any more than GQRR or the Labour Party can go out and ask 30 million people how they intend to vote and what might change their minds over the next six months. The IPCC can simply disappear, but that's another story. You do need to draw inferences from sample data, and numbers are a very efficient way of simplifying this. What I would suggest is that three points always be kept in mind when using such data to make predictions or decisions:

1) That no amount or complexity of numbers will give you a qualitative assessment. I believe car makers, politicians and others would do better to spend a few minutes talking to a handful of car buyers or voters than to put a series of closed questions to thousands upon thousands.

2) That people designing data-driven surveys tend to set out with a result in mind and then find it. There's a wonderful exchange in Yes, Prime Minister which illustrates this point very well.

3) That a small sample is not reality, and should be used only as a rough guide. To say, for instance, that a political party is guaranteed 35% of the vote weeks or even months before an election is foolish in the extreme. That the partners of a major polling company didn't see this is absurd.

Ultimately it comes down to taking numerical data as just one factor to consider when making decisions, while also having the confidence to question it and the wisdom to override it. Many times the flaws in numerical data are quite obvious when the results are assessed in the light of actual qualitative feedback drawn from a far smaller sample. Even then, you still need to qualify all sampling with a bit of experience and a bit of inspiration to make sure you end up with an end product which is designed with intelligence and thought, by and for a human being.