Duignan’s Impact Evaluation Feasibility Check

Assessing the feasibility of seven major types of impact evaluation design

A major problem in designing evaluations is identifying which type of impact evaluation design should be used. Duignan has identified the seven major types of impact evaluation designs generally used in program evaluation and uses this list as the basis for the Duignan Impact Evaluation Feasibility Check (Duignan 2009). The approach works by going through each of the seven major impact evaluation design types and assessing which are appropriate, feasible, affordable and/or credible. The different design types are assessed against these criteria for their fit with the particular program, organization or intervention being evaluated. 

Some impact evaluation designs are relatively easy to set up and do. Such designs could be undertaken by a wide range of evaluators or researchers. An example is Impact Evaluation Design Type 6: Key Informant Judgment Design. In this design type you simply ask people who are likely to be knowledgeable about a program whether they think the program or organization has made a difference to high-level outcomes. Other design types, such as Time Series Designs, True Experiments and Regression Discontinuity Designs, are much more technical and will usually require that you ask someone with technical evaluation expertise whether or not they are applicable to the particular program or organization you are evaluating. You will also probably need technical guidance in setting up and analyzing the results from these types of designs.

The seven major impact evaluation design types are listed below and an explanation is given for each of them. The idea is that for any program or organization, an evaluation planner looks at each of these design types and sees if any of them are applicable in the case of the particular program or organization being evaluated.


1. True Experiments and related designs

In true Randomized Experimental Designs, people, organizations, regions or other units are assigned randomly to an intervention and a control group. These are sometimes called Randomized Controlled Trials (RCTs). The intervention group receives the intervention and its outcomes are compared to those for the control group which does not. Because of the use of random assignment, all other confounding factors which may have produced the outcomes can be ruled out as unlikely to have created any observed improvement in high-level outcomes. 
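The logic of random assignment can be sketched in a few lines of Python. Everything here – the 100 units, the outcome distribution and the +5 treatment effect – is an invented illustration, not data from any actual evaluation:

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# 100 hypothetical units (people, organizations, regions); all numbers
# are illustrative only.
units = list(range(100))
random.shuffle(units)                      # random assignment
intervention_group, control_group = units[:50], units[50:]

def measured_outcome(treated):
    """Simulate an outcome score with a hypothetical +5 treatment effect."""
    base = random.gauss(50, 10)            # all other influences on the outcome
    return base + (5 if treated else 0)

treated_scores = [measured_outcome(True) for _ in intervention_group]
control_scores = [measured_outcome(False) for _ in control_group]

# Because assignment was random, confounders average out across the groups,
# so the difference in mean outcomes estimates the intervention's effect.
effect = statistics.mean(treated_scores) - statistics.mean(control_scores)
print(f"estimated effect: {effect:.1f}")
```

The key point the sketch illustrates is that no matching or statistical adjustment is needed: randomization alone makes the two groups comparable on average.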

Regression Discontinuity Designs are an adaptation of true randomized experiments. In them, people, organizations, regions or other units are quantitatively ranked – for example, from those with the lowest pre-intervention outcome scores to those with the highest. Imagine that the units have been graphed in that ranking and the intervention is implemented only above or below a cut-off point in the graph. If the intervention is effective, an improvement should appear on the graph as a jump at the cut-off point between those units which received the intervention and those which did not. One advantage of this design over a true experiment is that it is often seen as more ethical because the treatment is given to those most in need (i.e. those below the cut-off point).
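The cut-off logic can be illustrated with a small simulated example. All numbers (the 80 localities, the cut-off of 50 and the +8 effect) are hypothetical, and the discontinuity check is deliberately crude – real analyses fit regression lines on each side of the cut-off:

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Hypothetical pre-intervention scores for 80 localities (illustrative only).
pre_scores = [random.gauss(50, 10) for _ in range(80)]
cutoff = 50  # localities scoring below the cutoff receive the intervention

def post_score(pre):
    treated = pre < cutoff
    # Assume a hypothetical +8 effect for treated localities, plus noise.
    return pre + (8 if treated else 0) + random.gauss(0, 2)

gains = [(pre, post_score(pre) - pre) for pre in pre_scores]

# A crude discontinuity check: compare the average gain just below the
# cutoff with the average gain just above it. A jump suggests an effect.
below = [g for p, g in gains if cutoff - 5 <= p < cutoff]
above = [g for p, g in gains if cutoff <= p < cutoff + 5]
jump = sum(below) / len(below) - sum(above) / len(above)
print(f"estimated jump at the cutoff: {jump:.1f}")
```

Units just either side of the cut-off are assumed to be very similar, so any jump right at the cut-off is attributed to the intervention rather than to pre-existing differences.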


2. Time Series Designs

In time series designs, a number of measurements of an outcome are taken over a substantial period of time. Then an intervention is introduced at a specific point in time. If it is successful, it is expected that there would be an improvement at exactly the time when the intervention was introduced. Because a series of measures have been taken over time, it is possible to look at the point in time when the intervention was introduced and ask the question as to whether an improvement is shown at that point in time, for instance when one graphs the outcomes. More sophisticated versions of these designs involve stopping and starting an intervention and looking for effects at these points in time.
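A minimal interrupted-time-series check can be sketched as follows; the monthly measurements and the intervention point are invented for illustration, and a real analysis would also model trend and seasonality rather than just a level shift:

```python
import statistics

# Hypothetical monthly measurements of an outcome; the intervention is
# introduced at index 12 (all numbers are illustrative only).
series = [20, 21, 19, 22, 20, 21, 20, 19, 22, 21, 20, 21,   # before
          26, 27, 25, 28, 26, 27, 26, 25, 28, 27, 26, 27]   # after
intervention_at = 12

before = series[:intervention_at]
after = series[intervention_at:]

# The simplest interrupted-time-series check: did the level of the outcome
# shift at the point in time when the intervention was introduced?
shift = statistics.mean(after) - statistics.mean(before)
print(f"level shift at intervention point: {shift:.1f}")
```

The long run of pre-intervention measurements is what gives the design its strength: a change exactly at the intervention point stands out against the stable history.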


3. Constructed Comparison Group Designs

In constructed matched comparison group designs, the attempt is made to identify or create a comparison group which does not receive the intervention. This group is then used to compare outcomes with the group which does receive the intervention. For instance, one might find a similar community with which one could compare an intervention community. 

In some cases, called propensity matching, statistical methods are used to work out what is 'likely to have happened' to outcomes for a particular type of person or unit if they did not receive the intervention. This is done by looking at the outcomes for many people or units which did not receive the intervention. This gives you an average estimate of what is likely to happen to people or units that do not receive the intervention. You can then compare this with the outcomes of those who did receive the intervention.
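The matching idea can be sketched with a deliberately simplified example that matches on a single observed covariate. Full propensity matching would instead first estimate each unit's probability of receiving the intervention from many covariates and match on that score; all data below are invented:

```python
# Hypothetical data: each record is (covariate, outcome). Treated units are
# matched to the nearest untreated unit on the covariate, approximating
# "what would have happened" to them without the intervention.
treated = [(30, 45), (40, 55), (50, 64), (60, 74)]
untreated = [(28, 38), (35, 44), (48, 57), (55, 63), (62, 71), (70, 79)]

def nearest_untreated(cov):
    """Find the untreated unit closest to a treated unit on the covariate."""
    return min(untreated, key=lambda u: abs(u[0] - cov))

diffs = []
for cov, outcome in treated:
    match = nearest_untreated(cov)
    diffs.append(outcome - match[1])  # treated outcome minus matched comparison

estimated_effect = sum(diffs) / len(diffs)
print(f"estimated effect from matching: {estimated_effect:.1f}")
```

The quality of the estimate depends entirely on whether the matched comparison units really are similar to the treated units on everything that matters, which is the central weakness of constructed comparison group designs.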


4. Exhaustive Causal Identification and Elimination Designs

In exhaustive alternative causal identification and elimination designs there needs to be a good way of measuring whether or not outcomes have occurred. Then all of the alternative explanations as to why outcomes might have occurred need to be detailed. Alternative explanations are then eliminated by logical analysis, and using any empirical data available. 

If all alternative explanations can be eliminated, it leaves the intervention as the only credible explanation as to why outcomes have improved. 


5. Expert Judgment Designs

In expert judgment designs, an expert, or an expert panel, is simply asked to make a judgment as to whether, in their opinion (using whatever method they usually use in making such judgments), they believe that the program has had an effect on improving high-level outcomes. This type of evaluation design is sometimes called a 'connoisseur' evaluation design, drawing on an analogy with connoisseur judges such as wine tasters. Obviously this type of design can only be used in cases where you can locate people who are experts in the type of program being evaluated. 

This type of impact evaluation design is sometimes seen as less robust and credible by some stakeholders than the designs listed above. However it is usually much more feasible and affordable.

6. Key Informant Judgment Designs

In key informant judgment designs, key informants (people who are likely to be in a position to be knowledgeable about what is happening in a program and whether it impacted on high-level outcomes) are asked to make a judgment regarding whether they think that the program has actually affected high-level outcomes (using whatever method they want to use to make such judgments). This type of impact evaluation design is sometimes seen as less robust and credible by some stakeholders than the designs listed above. However it is usually much more feasible and affordable.


7. Intervention Logic (Program Theory/Theory of Change) Based Designs

In intervention logic designs, the attempt is first made to establish a credible 'intervention logic' (program theory/theory of change/outcomes model) for the program or organization. This logic sets out the way in which it is believed that lower-level program activities will logically lead on to cause higher-level outcomes. This logic is then endorsed either by showing that previous evidence demonstrates it works in cases similar to the one being evaluated, or by experts on the topic endorsing it as a credible logic. It is then established that lower-level activities have actually occurred (relatively easy to do because they tend to be measurable with controllable indicators). The final step in the process is to assume (while acknowledging that you have not proved) that the lower-level activities did, in this particular instance, in fact cause higher-level outcomes to occur.


Using Duignan's Tool - an example

In practice, Duignan's Impact Evaluation Feasibility Check is used by going through each of these seven major impact evaluation design types and assessing the appropriateness, feasibility, affordability and credibility of each of them.

Below is an example of part of the analysis of the set of seven major impact evaluation design types for the impact evaluation of a new national building regulation regime designed to improve the quality of new building work within a country.


Duignan's Impact Evaluation Feasibility Check for a New National Building Regulatory Regime

An evaluation plan for a national new building regulatory regime has been developed. The section below presents the analysis from that evaluation plan related to impact evaluation feasibility. The new building regulatory regime was introduced as a consequence of the failure (due to leaking) of a number of buildings built under the previous national building regulatory regime. The analysis of the possible impact evaluation designs is given below:

1. True experimental designs and regression-discontinuity designs

True experiment 

NOT CONSIDERED FEASIBLE. This design would set up a comparison between a group which receives the intervention and a group (ideally randomly selected from the same pool) which does not. For ethical, political, legal and design-compromise reasons it is not possible to implement the interventions in one or more localities while other localities (serving as a control group) do not have the intervention. Apart from anything else, statutory regulation could not be imposed on only one part of the country. 

In addition, there is a major impact evaluation design-compromise problem arising from a situation where there might be a high standard of new building work in one locality (the intervention locality) but not in another (the control locality). It is likely that compensatory rivalry could reduce any difference in outcomes between the intervention and control group. Compensatory rivalry occurs where the control locality also implements an intervention similar to the one being evaluated because it too wants to achieve the outcomes, which are as important to it as they are to the intervention locality. When the control locality finds out about the details of the intervention being implemented in the intervention locality, it starts to copy the intervention locality's intervention, and this makes it less likely that any effect will be found.

Regression-discontinuity design

NOT CONSIDERED FEASIBLE. One example of this design type would be to graph the localities which could potentially receive the intervention on a measurable continuum (e.g. the quality of buildings in the locality). The intervention would then be applied only to those localities below a certain cut-off level. Any effect should show as an upward shift of the graph at the cut-off point.

In theory it would be possible to rank local regions in order of the quality of their new building work. If resources for the intervention were limited, it would be ethical to intervene only in those localities with the worst new building work and hence establish a regression discontinuity design. However, the political, legal and design-compromise problems (as in the true experimental design above) mean that a regression-discontinuity design does not seem to be feasible in this instance.


2. Time-series design

NOT CONSIDERED FEASIBLE. This design measures an outcome a large number of times (say 30) and then looks to see if there is a clear change at the point in time when the intervention was introduced. This design would be possible if multiple measures of new building quality were available over a lengthy (say 20 year) time series which could then continue to be tracked over the course of the intervention. However this design faces the design-compromise problem that another major factor – which can be termed the 'crystallization of liability' – is occurring at the same time as the introduction of the new building regulatory regime. 

The crystallization of liability is a consequence of all the stakeholders now becoming aware of the liability they can be exposed to following the failure of many buildings and the attendant liability claims. It should be noted that this crystallization does not, of course, prevent any available time series data from being used to track the not-necessarily controllable* indicator of the quality of new building work over time. It is just that any such time series analysis would be silent on whether an improvement in building quality should be attributed to the new building regulatory regime rather than to the crystallization of liability.


3. Constructed matched comparison group design

NOT CONSIDERED FEASIBLE. This design would attempt to locate a group which is matched to the intervention group on all important variables apart from receiving the intervention. This would require the construction (identification) of a comparison group not subject to a change in its regulatory regime, ideally over the same time period as the intervention. Since the new building regulatory regime is a national intervention such a comparison group will not be able to be located within the country in question. It is theoretically possible that one or more comparison groups could be constructed from other countries or regions within other countries. 

However discussions so far with experts in the area have concluded that it is virtually impossible for a country or region to be identified which could be used in a way that meets the assumptions of this design. These assumptions are: that the initial regulatory regime in the other country was the same; that the conditions new buildings are exposed to in the other country are similar; that the authorities in the other country do not respond to new building quality issues by changing the regulatory regime themselves; and that there are sufficient valid and reliable ways of measuring new building quality in both countries before and after the intervention. It should be noted that while some of these assumptions may be met in regard to some overseas countries, all of them would need to be met for a particular country to provide an appropriate comparison group within a constructed matched comparison group design.


4. Causal identification and elimination design

CONSIDERED LOW FEASIBILITY. This design works by first identifying that there has been a change in observed outcomes and then undertaking a detailed analysis of all of the possible causes of that change, eliminating all causes other than the intervention. In some cases it is possible to develop a detailed list of possible causes of observed outcomes and then to use a 'forensic' type process (just as a detective does) to identify what is most likely to have created the observed effect. This goes far beyond simply accumulating evidence as to why the observed outcome might be explained by the intervention; it requires that the alternative explanations be eliminated as having caused the outcome. 

This may not be possible in this case due to the concurrent crystallization of liability, discussed above, which is occurring in the same timeframe as the intervention. This other cause is likely to be so intertwined with the intervention in producing any change in new building practice that it will be impossible to disaggregate the effect of the intervention from the effect of the crystallization of liability. However, some more work should be done to definitively establish that this design is not feasible.


5. Expert judgment design

CONSIDERED HIGH APPROPRIATENESS, HIGH FEASIBILITY, HIGH AFFORDABILITY, LOWER CREDIBILITY. This design consists of asking one or more subject experts to analyze a situation in a way that makes sense to them and to assess whether, on balance, they accept the hypothesis that the intervention may have caused the outcome. One or more well-regarded and appropriate independent experts in building regulation (presumably from overseas in order to ensure independence) could be asked to visit the country and to assess whether they believe that any change in new building outcomes is a result of the new building regulatory regime. This would be based on their professional judgment and they would take into account whatever data they believe they require in order to make their judgment. Their report would spell out the basis on which they made their judgment. 

This approach is highly feasible but provides a significantly lower level of certainty than the impact evaluation designs above. If this design is used then the evaluation question being answered should always be clearly identified as: in the opinion of one or more independent experts, has the new building regulatory regime led to an improvement in building outcomes? There are obvious linkages between this design and the causal identification and elimination design above, and further work looking at that design should also look in detail at the possibilities for the expert judgment design. 


6. Key informant judgment design

CONSIDERED HIGH APPROPRIATENESS, HIGH FEASIBILITY, HIGH AFFORDABILITY, LOWER CREDIBILITY. A key informant judgment design is also highly feasible. This design consists of asking key informants (people who have access by virtue of their position to knowledge about what has occurred regarding the intervention) to analyze the success of the program in a way that makes sense to them and to assess whether on balance they accept the idea that the intervention may have caused the outcome. A selection of stakeholder key informants could be interviewed in face-to-face interviews and their opinions regarding what outcomes can be attributed to the new building regime could be summarized and analyzed in order to draw general conclusions about the effect of the intervention. This could be linked in with an expert judgment and a causal elimination design as are described above. 


7. Intervention logic designs

CONSIDERED HIGH APPROPRIATENESS, HIGH FEASIBILITY, HIGH AFFORDABILITY, LOWER CREDIBILITY. In this design the logic of how it is thought the intervention will work is spelt out in a logic model, theory of change or outcomes model. This is then validated against existing evidence and its credibility assessed by experts in the field. If it is deemed to be a credible logic, it just needs to be established that the lower levels of the logic model have occurred; it is then assumed that their occurrence in this instance has led to the higher-level outcomes occurring.


Use of the approach

Anyone can use the above material when doing evaluation planning for their own organization or for-profit or not-for-profit consulting work as long as they acknowledge that they are using the Duignan Impact Evaluation Feasibility Check. However you can't embed the approach into software or web-based systems without permission. If you want to embed it in software or web-based systems please contact admin@parkerduignan.com.

Reference to cite in regard to acknowledging this work: Duignan, P. (2009). A concise framework for thinking about the types of evidence provided by monitoring and evaluation. Australasian Evaluation Society International Conference, Canberra, Australia, 31 August – 4 September 2009. See Duignan's Impact Evaluation Feasibility Check http://parkerduignan.com/m/impact-evaluation.

*For more information on what is meant by ‘not-necessarily controllable indicators’ please see the relevant outcomes theory principle.

© Parker Duignan 2013-2017. Parker Duignan is a trading name of The Ideas Web Ltd.