I have been keeping track of the University of East Anglia's "emailgate" story, not so much for the specifics of what the members of the University's Climate Research Unit did or did not say or do, but for the broader debates around scientific processes, including peer review and data release.
I have been spurred to write this blog entry by an article on emailgate by George Monbiot in the Guardian and an article written by the computer scientist Steve Easterbrook.
Monbiot has been appalled by the responses of scientists to freedom of information requests. After reading Easterbrook's description of scientists' behaviour, he says he is beginning to understand. The picture Easterbrook paints is not one I fully recognise or accept. Easterbrook says that most academics have big egos and thick skins. This, he argues, is how we get things published in the face of critical peer review. That makes sense -- these traits are helpful -- but it does not mean academics have license to "always be rude to ignorant and lazy people".
Academics have a responsibility to test their ideas dispassionately and, once convinced of their robustness, to argue for and communicate those ideas with passion -- until someone comes along and does serious damage to their plausibility. I'm reading Manjit Kumar's "Quantum" and I like Max Planck's quote on this: "consider every step carefully in advance, but then, if you believe you can take responsibility for it, let nothing stop you". But by Kumar's account, Planck was courteous, in public AND in private.
The bit of Easterbrook's piece that disturbs me the most is the derision he heaps upon those who attempt to get datasets via freedom of information requests. He says that the way to get data from others is to stroke the holder's ego, propose joint work, or reconstruct the data yourself.
It's true that most data sets contain a huge amount of value added from the researchers who have "cleaned" the data (removing obviously incorrect values, filling in gaps, smoothing implausible discontinuities etc.) and have constructed new variables. But the source data are taxpayer supported. And most of the value added is taxpayer supported. And on the topic of climate change, the consequences of making a mistake with the data are huge.
So it's not good enough to have to rely on stroking, cajoling and duplicating from scratch in order to try to reproduce someone's work. By denying others access to the data sets they have constructed, scientists are weakening the very peer review process they dismiss as such a "crap shoot".
Clearly we need to find ways of balancing the protection of datasets as intellectual property against their creation as global public goods. More journals should insist that datasets be made available upon publication of the papers that rely on them. All research funders should insist on data sets being made publicly available in a readily usable form, with an appropriately negotiated delay.
My previous organisation, IFPRI, has a data release policy (which I helped author) and a good implementation record on making data available. My current organisation, IDS, does not have the same core resources to do this. Most UK research funders are lax about making the data they fund publicly available in a usable form, and so the entire UK sector is, I suspect, underproviding these public goods.
Politeness from researchers is probably too much to ask for, but the data they work with should not be.