26 June 2012

Impact Horses for Impact Courses: Two Helpful New Reviews


As the Trini Lopez song goes, "If I had a hammer, I'd hammer in the morning, I'd hammer in the evening, all over this land, I'd hammer out danger, I'd hammer out a warning, I'd hammer out the love between my brothers and my sisters, all over this land".

For "hammer" read "my favourite evaluation method".

For "brothers and sisters" read, say,  "randomistas" and "realistas" or "propensitistas" and "participatoristas".

In other words, how do we live up to the ideal of the diagram above (from the Stern et al. paper reviewed below), whereby we let (1) the question, (2) the available evaluation tools and (3) the attributes of the programme being evaluated co-determine the impact evaluation design?

One of the ways of moving towards this ideal is knowing more about the "available designs".  Let's face it, we all learn--at most--one or two of these techniques and the effort of learning more is large.

This lack of familiarity with a broad range of techniques makes us wary of ones we do not know well.  Every evaluation problem looks like the nail that our own particular hammer is well suited to hit.

Two new publications set out to help us expand the set of available impact designs and to identify some common traits across them:


The goal of the Stern et al. report:

"Up to now most investment in IE has gone into a narrow range of mainly experimental and statistical methods and designs that according to the study’s Terms of Reference, DFID has found are only applicable to a small proportion of their current programme portfolio. This study is intended to broaden that range and open up complex and difficult to evaluate programmes to the possibility of IE."

The goal of the White and Phillips report:

"This increased demand for results has resulted in a growth in large n impact evaluations involving tests of statistical significance between outcomes for treatment and comparison groups. However, this type of statistical analysis is not suitable for all evaluations. When there are insufficient units of assignment to conduct tests of statistical significance, then small n approaches are the best available option. However, while there is considerable consensus among large n researchers concerning what constitutes a valid approach, the same is not true for small n approaches. In fact, there is a plethora of possible approaches available to an evaluator, but little consensus concerning what approach, methodology, or methods are most suitable for an impact evaluation, and what constitutes valid causal evidence. This paper seeks to advance this discussion."

Both reports are very long and technical, so I won't go into much detail here.  But they are both excellent and well worth reading (and will, I suspect, prove most useful as reference works).

Some takeaways:

* The field of development evaluation is very technical.  It is not something for dilettantes: if you are serious about it, work with experts

* There is a wide range of approaches out there:  

The Stern paper outlines five categories: experimental, statistical, theory-based, case-based and participatory. 

The White and Phillips paper discusses only "small n" approaches (where there are not many units of observation): 
  • Group 1 approaches, where establishing causality is a key goal: Realist Evaluation, where mechanism × context = outcome; General Elimination Methodology, which draws up a list of possible causes and then eliminates the least plausible (a toy sketch follows this list); Process Tracing (used by Oxfam in its effectiveness reviews); and Contribution Analysis, which builds the most plausible and credible contribution story. 
  • Group 2 approaches, which are less concerned with establishing cause and effect and more interested in the drivers of positive change and the role the programme is playing. Examples include: Most Significant Change (whose strength is in identifying diverse or unexpected changes); the Success Case Method (which explores particularly successful or unsuccessful cases); Outcome Mapping (as pioneered by IDRC, tracking the steps between programme activities and programme outcomes); and MAPP, a participatory approach (with a quantitative step) to understanding the perceived influence of a development project on the daily life of the population.
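To give a flavour of how the Group 1 approaches differ from a significance test, here is a deliberately simplified sketch of the General Elimination Methodology step of listing candidate causes and striking out the least plausible. The candidate causes, evidence checks and scoring rule are all hypothetical; the real method is a structured qualitative exercise, not a script.

```python
# Toy sketch of General Elimination Methodology (GEM): list candidate causes
# of an observed outcome, record which evidence checks each one passes, and
# eliminate the least plausible. Causes and checks are entirely hypothetical.
candidate_causes = {
    "programme training sessions": {"timing matches outcome": True,
                                    "beneficiaries cite it": True,
                                    "present in non-improving sites": False},
    "good rainfall that season":   {"timing matches outcome": True,
                                    "beneficiaries cite it": False,
                                    "present in non-improving sites": True},
    "parallel government scheme":  {"timing matches outcome": False,
                                    "beneficiaries cite it": False,
                                    "present in non-improving sites": True},
}

def plausibility(checks: dict) -> int:
    # Count supportive pieces of evidence: passing the first two checks, and
    # NOT being equally present where outcomes did not improve.
    return (checks["timing matches outcome"]
            + checks["beneficiaries cite it"]
            + (not checks["present in non-improving sites"]))

# Keep only the causes that survive elimination (score of 2 or more here)
surviving = {cause: plausibility(checks)
             for cause, checks in candidate_causes.items()
             if plausibility(checks) >= 2}

print("Surviving candidate causes:", surviving)
```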


* Picking the right approach requires a lot of discipline. When you are writing an evaluation proposal to, say, a 20-day timeline, without detailed knowledge of how a programme works, without detailed contextual evidence and without knowledge of the different methods available, you are going to fall back on tried and trusted (but probably not the most appropriate) methods. Funders of evaluations need to give evaluators more time at the beginning of a project.

* "Mixed methods" is not just about mixing quantitative and qualitative methods, it is about mixing approaches too (say realist and experimental)

* There is still some disagreement about the usefulness of experimental and statistical approaches for complex interventions. The Stern paper implies that the other three categories (theory-based, case-based and participatory) are better suited, while the White and Phillips paper argues that it depends on the nature of the complexity, and on whether you want to assess the overall impact, because the complexity may be too much for any method.

* Biases are everywhere. The White and Phillips paper is particularly strong on this, pointing out the biases in large-n approaches but also the checks and balances that are in place (and can be verified by others), while arguing that such checks may not be in place, or so easily verified, in small-n methods. 

Conclusion? Not only do we need a lot of tools in the toolbox, but we need lots of skilled craftswomen and men to use them, with programme implementers and funders giving them enough time and support to get the evaluation done properly.   These two papers set out the magnitude of the task but also help us move to the  ideal solution.  

