Development Horizons by Lawrence Haddad: Data Mining and Homer Simpson

It is annual leave time in Europe and a chance to catch up on some reading. When I’m not reading Robertson Davies’ “Deptford Trilogy” (brilliant, by the way), I have been going through some reports and books that have been lingering near my in-tray. First up is an article in the New York Times entitled “In the age of data, nerd is the word”, in which Hal Varian (he of the famous economics textbook and now chief economist at Google) and others reflect on the growing trend of trawling massive data sets (generated by web hits) for things that are statistically odd. The job of statistician will become “sexy”, says Varian. While this might be pushing things a bit far, we certainly are in an increasingly data-rich world.

Who is going to make sense of all this data? The statisticians are necessary but not sufficient. Why is that? Because the ways in which data are manipulated reflect political and cultural processes. Analysts have their own agendas (Diane Coyle’s excellent book, Economics: The Soulful Science, reminds us that economists in the US vote one way in general elections, and anthropologists vote in a very different way), as do the organisations they work for. The new data sets they work with do not represent a global census, much as they may appear to. They are samples (big ones, but samples) and very partial, dominated by the US and Europe and by the wealthier groups in those regions.

Knowledge is power, yes. But power and culture also shape knowledge. Dan Gardner’s latest book, Risk: The Science and Politics of Fear, reminds us of the classic Homer Simpson quote: “people can come up with statistics to prove anything... forty percent of all people know that”. Gardner goes through all the traps of interpreting data: looking for evidence that lines up with what one already thinks (confirmation bias), group-think (social cascading), the tendency for people to think that something easy to visualise occurs more frequently than it really does (the example rule), our tendency to initially view new information as all good or all bad (the good-bad rule) and the power of stereotypes to frame new information and data (the rule of typical things). Statisticians are people too and will suffer from these biases no matter how rigorous their methods are, because their models kick in largely after the questions have been framed and before the conclusions are drawn.

What does this have to do with international development? First, the development community is not spending enough time thinking about the potential of the internet to fix some of the things wrong with development (e.g. accountability to people living in poverty and ways of communicating to donor country citizens about what aid is and is not good for). Second, we need to be aware of the potential of the internet to make some things worse (e.g. generating a knowledge platform that privileges those with the greatest capacity to generate information, information which is itself partial). Third, I am worried that the age of data will undermine critical analysis. Hypotheses, true causality, evidence-based policymaking and common sense may become victims of the new data miners unless we put checks and balances in place. As the pressure on aid budgets increases and as the numbers in poverty grow, we need to know more about where aid works and where it does not. These are themes that IDS is currently trying to raise funds for.

09 August 2009

