06 April 2010

Never mind the politeness, just give me the data

I have been keeping track of the University of East Anglia's "emailgate" story, not so much for the specifics of what the members of the University's Climate Research Unit did or did not say or do, but for the broader debates around scientific processes, including peer review and data release.

I have been spurred to write this blog entry by an article on emailgate by George Monbiot in the Guardian and an article written by the computer scientist Steve Easterbrook.

Monbiot has been appalled by the responses of scientists to freedom of information requests. Now, after reading Easterbrook's description of scientists' behaviour, he says he is beginning to understand. The picture Easterbrook paints is not one I fully recognise or accept. Easterbrook says that most academics have big egos and thick skins. This, he argues, is how we get things published in the face of critical peer review. This makes sense (these traits are helpful), but it does not give academics licence to "always be rude to ignorant and lazy people".

Academics have a responsibility to test their ideas dispassionately and, once convinced of their robustness, to argue for and communicate those ideas with passion, until someone comes along and does serious damage to their plausibility. I'm reading Manjit Kumar's "Quantum", and I like Max Planck's take on this: "consider every step carefully in advance, but then, if you believe you can take responsibility for it, let nothing stop you". But by Kumar's account, Planck was courteous, in public AND in private.

The bit of Easterbrook's piece that disturbs me the most is the derision he heaps upon those who attempt to get datasets via freedom of information requests. He says that the way to get someone else's data is to stroke the holder's ego, propose joint work, or reconstruct the data yourself.

It's true that most data sets contain a huge amount of value added by the researchers who have "cleaned" the data (removing obviously incorrect values, filling in gaps, smoothing implausible discontinuities, etc.) and have constructed new variables. But the source data are taxpayer-supported. And most of the value added is taxpayer-supported. And on the topic of climate change, the consequences of making a mistake with the data are huge.

So it's not good enough to have to rely on stroking, cajoling and duplicating from scratch in order to try to reproduce someone's work. Scientists are weakening the peer review process they think is such a "crap shoot" by denying others access to the data sets they have constructed.

Clearly we need to find ways of balancing the protection of datasets as intellectual property against their value as global public goods. More journals should insist that datasets be made available on publication of the papers that rely on them. All research funders should insist on data sets being made publicly available in a readily usable form, after an appropriate negotiated delay.

My previous organisation, IFPRI, has a data release policy (which I helped author) and a good record of making data available. My current organisation, IDS, does not have the same core resources to do this. Most UK research funders are lax about making the data they fund publicly available in a usable form, and so the entire UK sector is, I suspect, under-providing these public goods.

Politeness from researchers is probably too much to ask for, but the data they work with should not be.


Steve Easterbrook said...

Your points would be more cogent if you probed the nature of the data that was requested in these FoI requests. It wasn't data from publicly funded research, it was commercially sensitive data from a large number of different national meteorological services around the world. The UEA had signed agreements with these services in order to secure access to this data. The people seeking the data could have gone and gotten it from source (and would have had to sign similar agreements), but they were never interested in the data, they were pursuing an obsessive campaign to discredit scientists.

In climate science, virtually all data is freely available, and obsessively curated at a number of web-based data repositories. Go look at the BADC website if you don't believe it:

You can comment all you like on the way things ought to be, but the sad fact these days is there isn't enough public funding to support our public research institutes, and more of what we do has been privatized and monetized.

All the best,

Lawrence Haddad said...

Hi Steve, yes, my point was not specifically about the climate datasets, because I don't know much about them at all, but about the more general points you were making about data release.

I agree with your description of how data are currently released, but I think this state of affairs is unacceptable.

Whether public funding for research is going up or down (and it has gone up quite substantially in the social sciences in the past 10 years) I think journals and funders could and should do more to incentivise data sharing to strengthen peer review and maximise transparency.

Many thanks for your comments.

Best, Lawrence