Doctoral Thesis
Revisiting the OECD/DAC Evaluation Criteria
Applying Cost-Benefit Analysis Framework
비용편익분석 틀을 적용한
OECD/DAC 개발평가기준의 비판적 검토
February 2018
Graduate School of International Studies
Seoul National University
Eunsuk Lee
ABSTRACT
In the field of development evaluation, the OECD DAC's five criteria have
become an influential evaluation framework, adopted by most aid agencies.
This study examines how the DAC criteria framework affects evaluation
results from the perspective of cost-benefit analysis (CBA).
The study focuses on the question of whether the use of the DAC criteria
tends to produce positively biased evaluation results. If the risk of positive
bias is inherent in the pre-determined criteria, it inevitably harms the
credibility and validity of evaluations. Positive bias may also reduce the
comparability between evaluations that the DAC criteria were developed to
provide, consequently making it difficult to differentiate a more successful
intervention from a less successful one and possibly causing inconsistency
across evaluation results.
Adopting the 'general logic of evaluation' as the analytical framework,
the study organizes the analysis into three stages, dealing with: first, the
notion of merit or success that the DAC criteria define; second, the standard
of judgement and source of supporting evidence in each criterion; and third,
the method of synthesis by which the assessments under the five criteria are
integrated into an overall evaluative conclusion on the intervention. CBA
provides a conceptual framework against which evaluation results produced
by applying the DAC criteria can be analyzed.
Given that the term ‘evaluation criteria’ is defined as the aspects,
qualities, or dimensions that distinguish a more valuable evaluand from a less
valuable one, the DAC criteria, namely relevance, effectiveness, efficiency,
impact and sustainability, would represent the merit of the program being
evaluated, encompassing the properties that constitute a successful intervention.
The study finds that the DAC criteria cover most of the dimensions that general
evaluation models suggest, but with some limitations. The 'relevance'
criterion primarily focuses on the policy context and asks ex-ante questions,
with less attention to the need for the program or the logical linkage between
the program and its expected results. Measuring 'effectiveness', defined as the
achievement of program objectives, and 'efficiency', e.g., whether the program
was completed on time and within budget, largely rests on the assumption that
the program's initial plan and objectives are valid. The 'sustainability'
criterion assumes that the results are beneficial and worth continuing.
This logical interdependence between criteria may affect the assessment under
the criterion at issue, because the standard of judgement may have to be
adjusted according to the assessment under another criterion. Whether the
stated objectives were achieved, the primary condition for satisfying the
effectiveness criterion, may not be an appropriate standard for judging the
program's success if the stated objectives are not valid. Meeting the relevance
criterion would thus be a precondition for a meaningful assessment of
effectiveness; otherwise the standard should be re-established. A similar
interdependence lies between effectiveness/impact and sustainability. What
kinds of benefits, and of what magnitude, can be sustained depends on the
analysis under the effectiveness and impact criteria, so the standards of
judgement for the sustainability criterion should be drawn accordingly. If the
assessments are made separately and integrated into an overall conclusion
without thoughtful consideration of their relative importance, there is a risk
of bias in the overall results.
The review of 65 ex-post evaluation reports by two Korean aid agencies,
KOICA and EDCF, confirms the analysis above. The evaluations under
review applied the DAC criteria framework mostly using the standardized
questions and standards of judgement in the guidelines. Some of the questions
address what is taken for granted in a development project or do not necessarily
require in-depth research. In most cases, assessments under each criterion were
made independently, though some of them should have rested on information
found under other criteria. This leads to high average ratings, especially under
the relevance and effectiveness criteria. 88% of the reports concluded that the
evaluand was either successful or very successful, even in cases where
development results were not deemed significant. The evaluation conclusions
are drawn from a simple average of the ratings by criterion, and a serious flaw
detected in one criterion, e.g., sustainability, is often compensated by a high
rating in another, e.g., relevance. Such mechanical application of the DAC
criteria and standardized questions can mislead evaluation results, in most
cases towards positive conclusions, which makes it difficult to understand the
true value of a project, to differentiate a more successful project from a less
successful one, and to draw valid lessons from the evaluation.
To explain the suspected positive bias in the DAC criteria, this study
adopts the criterion used in CBA, namely the net present value (NPV), as a
benchmark. Theoretically, CBA weighs all direct and indirect effects of a
project against its costs over a certain period. The NPV, as an indicator that
represents the social value of a project, covers the scope of the DAC criteria:
the need for the program (relevance), benefits in relation to costs (efficiency),
increase in social welfare (effectiveness), beneficial effects in comparison to
negative effects (impact), and the duration over which the benefits continue
(sustainability). The analysis shows that certain events, which may reduce the
net social benefit so seriously as to make the NPV negative, only partially
affect the assessments in the DAC criteria framework, leading to an overall
positive conclusion. A comparative case study of two water supply projects
suggests that the DAC criteria framework may yield positively biased
evaluation results for a project whose net social benefit is relatively small and
possibly even negative, receiving the same or even a higher assessment than
another project whose net social benefit is much larger and more stable. The
study also finds that positive bias may occur unevenly between projects,
resulting in inconsistency across evaluation results. This is due to the arbitrary
relative weights placed on the DAC criteria, in addition to the problems of
uneven measurement difficulty and imbalanced importance across the criteria.
The findings have significant policy implications, since positive bias and its
inconsistent occurrence in evaluation results may seriously weaken the validity
of evaluations and consequently mislead the agencies' learning and decision
making, which are the primary purposes of evaluation.
This study contributes to the relatively small body of academic literature
on international development evaluation by examining the DAC criteria from
a new perspective, that of the cost-benefit analysis framework. It also enriches
the recent discussion on how to improve the DAC criteria by providing both
theoretical and empirical analyses.
Keywords: DAC Criteria, Development Evaluation, Cost-Benefit Analysis,
Evaluation Theory, Evaluation Method, Positive Bias
TABLE OF CONTENTS
Abstract ......................................................................................................... i
List of Figures, Tables and Box .................................................................. viii
Chapter 1. Introduction ................................................................................. 1
1.1. Motivation and Purpose of the Study .................................................... 1
1.2. Research Questions ............................................................................... 4
1.3. Organization of the Thesis .................................................................... 6
Chapter 2. Research Design .......................................................................... 7
2.1. Literature Review ................................................................................. 7
2.1.1. Outline of Literature Review ......................................................... 7
2.1.2. Literature in the General Evaluation Discipline ............................ 8
2.1.3. Development Evaluation ............................................................. 19
2.1.4. DAC Evaluation Criteria ............................................................. 22
2.1.5. Cost-Benefit Analysis and Evaluation ......................................... 26
2.2. Concepts and Scope of the Study ........................................................ 31
2.3. Analytical Framework ......................................................................... 35
Chapter 3. Notion of Merit – Definition and Scope of DAC Criteria ...... 38
3.1. Key Requirements for Evaluation Criteria.......................................... 38
3.2. General Criteria in Evaluation Models ............................................... 43
3.2.1. Key Evaluation Checklist (KEC) framework .............................. 44
3.2.2. CIPP Evaluation Model ............................................................... 47
3.2.3. Theory-driven Evaluations .......................................................... 50
3.3. Definitions and Scope of the DAC Evaluation Criteria ...................... 53
3.3.1. Characteristics of the DAC Criteria ............................................. 53
3.3.2. Relevance .................................................................................... 55
3.3.3. Effectiveness ................................................................................ 58
3.3.4. Efficiency..................................................................................... 62
3.3.5. Impact .......................................................................................... 64
3.3.6. Sustainability ............................................................................... 66
3.4. Criterion in Cost-Benefit Analysis – Net Present Value ..................... 69
3.5. Discussion ........................................................................................... 73
Chapter 4. Standard of Judgement – Review of Evaluation Reports in
Korean Agencies .......................................................................... 76
4.1. Overview ............................................................................................. 76
4.2. Result of the Analysis ......................................................................... 80
4.2.1. Relevance .................................................................................... 80
4.2.2. Efficiency..................................................................................... 85
4.2.3. Effectiveness and Impact ............................................................. 88
4.2.4. Sustainability ............................................................................... 93
4.2.5. Discussion.................................................................................... 96
4.3. Standards in the DAC Criteria and Net Present Value ........................ 97
Chapter 5. Method of Synthesis - the DAC Framework and Cost-Benefit
Analysis in Comparison ........................................................... 104
5.1. Overview ........................................................................................... 104
5.2. Comparison of DAC Framework and CBA: Hypothetical Cases ..... 108
5.2.1. Illustrative Comparison between Two Projects ......................... 108
5.2.2. DAC Framework and CBA in Five Scenarios ............................ 112
5.3. Comparative Case Study - Evaluations of Water Supply Projects .... 124
5.3.1. Project Description .................................................................... 124
5.3.2. Evaluation Results in the DAC Criteria Framework ................. 132
5.3.3. Results of Cost-Benefit Analysis ............................................... 136
5.3.3.1. Theoretical Basis of CBA for Water Supply Intervention .. 136
5.3.3.2. Estimating Benefits of Two Water Supply Projects ........... 140
5.3.3.3. Results of Cost-Benefit Analysis in Comparison ............... 145
5.4. Discussion ......................................................................................... 150
Chapter 6. Conclusions .............................................................................. 152
References ................................................................................................... 157
Appendix. List of Evaluation Reports Reviewed in Chapter 4 .............. 168
Abstract in Korean ..................................................................................... 173
LIST OF FIGURES, TABLES AND BOX
Figure 1. Christie and Alkin's Evaluation Theory Tree .................................. 17
Figure 2. Development of Evaluation Paradigm and the DAC Criteria ......... 24
Figure 3. General Logic of Evaluation ........................................................... 34
Figure 4. Analytical Framework of the Study ................................................ 35
Figure 5. The Evaluation Hierarchy ............................................................... 51
Figure 6. Net Present Value (NPV) and DAC Criteria ................................... 71
Figure 7. Comparison of Dimensions or Criteria in Evaluation Models ........ 73
Figure 8. Evaluation Reports Classified by Sector ......................................... 79
Figure 9. Evaluation Questions in Relevance Criterion ................................. 81
Figure 10. Evaluation Questions in Efficiency Criterion ............................... 86
Figure 11. Evaluation Questions in Effectiveness Criterion .......................... 89
Figure 12. Evaluation Questions in Impact Criterion ..................................... 90
Figure 13. Evaluation Questions in Sustainability Criterion .......................... 94
Figure 14. Consumer Surplus of Water Supply Project (1) .......................... 139
Figure 15. Consumer Surplus of Water Supply Project (2) .......................... 139
Table 1. Views on Historical Development of Evaluation ............................. 15
Table 2. Key Requirements of Evaluation Criteria and Possible Sources of Bias
........................................................................................................................ 42
Table 3. Rating System and Scoring Scale ....................................................... 77
Table 4. Descriptive Data of the Samples ...................................................... 78
Table 5. Main Standards of Judgment in Relevance Criterion ....................... 82
Table 6. Average Ratings on Relevance ......................................................... 84
Table 7. Main Standards of Judgment in Efficiency Criterion ....................... 87
Table 8. Average Ratings on Efficiency ......................................................... 88
Table 9. Main Standards of Judgment in Effectiveness/Impact Criteria ........ 92
Table 10. Average Ratings on Effectiveness/Impact ...................................... 93
Table 11. Main Standards of Judgment in Sustainability Criterion ................ 94
Table 12. Average Ratings on Sustainability .................................................. 95
Table 13. Average Overall Ratings of All Evaluations (2013-2015) .............. 97
Table 14. Comparison of Two Hypothetical Projects .................................... 110
Table 15. Net Present Value of Base Case ..................................................... 113
Table 16. NPV and Effectiveness .................................................................. 116
Table 17. NPV and Efficiency ....................................................................... 118
Table 18. NPV and Negative Externality ...................................................... 119
Table 19. NPV and Sustainability ................................................................ 121
Table 20. Summary of Water Supply Project in Juigalpa, Nicaragua .......... 126
Table 21. Achievement of Nicaragua Project ............................................... 126
Table 22. Nicaragua Project: DAC Criteria Evaluation ............................... 127
Table 23. Summary of Water Supply Project in Buon Ho Town, Vietnam .. 128
Table 24. Objectives and Targets in Vietnam Project ................................... 130
Table 25. Achievement of Vietnam Project .................................................. 130
Table 26. Vietnam Project: DAC Criteria Evaluation .................................. 131
Table 27. Two Projects in Comparison ......................................................... 132
Table 28. Costs and Benefits in Water Supply Interventions ....................... 137
Table 29. Patterns of Water Consumption in Juigalpa before the Project .... 142
Table 30. Changes in Water Consumption Before and After the Project ..... 143
Table 31. Patterns of Water Consumption in Buon Ho Town ...................... 144
Table 32. Estimation of NPV and B/C of Two Water Projects ..................... 147
Box 1. Definitions of Evaluation .................................................................... 13
CHAPTER 1. INTRODUCTION
1.1. Motivation and Purpose of the Study
Evaluation of public investments is a critical activity for examining whether the
results can justify the resources spent. It is a way of ensuring the accountability
of the government or public agencies to the citizens who pay taxes. The
knowledge and lessons from evaluations can improve performance and support
informed decision-making. Evaluation requires a systematic and scientific
process to produce the credible information necessary for making judgements
on the worth, value, or significance of what is evaluated, conforming to the
norms and quality standards that ensure the objectivity and validity of that
information. It is not uncommon, however, for evaluations to produce biased
results.
In the evaluation discipline, bias has been widely recognized, especially
bias towards positive evaluation results. Scriven (1976, 1991) asserted that
there is a strong general positive bias across all evaluation fields. Positive bias,
a tendency to turn in more favorable results than are justified, seriously weakens
the credibility and validity of evaluations.
The problem seems to be prevalent, if not more serious, in the field of
international development evaluation. Riddell (2007) showed that more than
75% of aid projects by major aid agencies (e.g., those in the UK, the US, and
Australia, as well as multilateral banks such as the World Bank and ADB) had
been reported as successful, but the evidence was likely biased toward more
positive results than deserved. Michaelowa and Borrmann (2006) argued that
there is an incentive to produce biased evaluation results because the
'legitimation' function dominates in aid evaluation. Bamberger (2009) found
that a significant number of evaluation reports by many development agencies
have a systematic positive bias, which may mislead the agencies into continuing
to fund projects that might be less beneficial than claimed or that might even
have potential negative impacts not addressed in the evaluation. Political
interests, time and budget constraints, and lack of evaluation capacity are the
primary reasons for positive bias identified by many scholars.
This study asks a question from a new viewpoint: whether this positive
bias is attributable to the widely used evaluation framework, the Criteria for
Evaluating Development Assistance developed by the OECD DAC
(Development Assistance Committee), or the DAC criteria. The DAC criteria
have become a standard framework for international development evaluations.
As part of internationally agreed evaluation principles, the DAC criteria have
been adopted by most aid agencies (OECD 2016) and have shaped the way
evaluations of their programs are designed and conducted. The five criteria,
namely relevance, effectiveness, efficiency, impact, and sustainability, serve as
a guideline on which aspects should be addressed in evaluating development
interventions. The DAC criteria also provide agencies with a comparable
framework that facilitates the management of evaluation activities and
findings.
If the risk of positive bias is inherent in the pre-determined evaluation
criteria, it inevitably harms credibility and validity no matter how well the
evaluation process and methods comply with the norms and standards designed
to ensure them. Positive bias may reduce the comparability between
evaluations that the DAC criteria were developed to provide, consequently
making it difficult to differentiate a more successful intervention from a less
successful one and possibly causing inconsistency across evaluation results.
Evaluation, which has become an established activity in the management of
development cooperation, would be another waste of resources if it did not
fulfill its purposes.
While the DAC criteria serve as such an influential framework for
international development evaluations, and there seems to be consensus on their
limitations, little effort has been made to explain and assess the DAC criteria
on the basis of theoretical and empirical research. The purpose of this study is
to identify the limitations of the DAC criteria from a theoretical point of view,
in comparison with general evaluation theories and methods, and to provide
evidence on how the DAC criteria have been used in practice, with a focus on
the positive bias they may produce and how it can mislead evaluation results.
In this study, I apply the framework of cost-benefit analysis (CBA) in
order to examine the DAC evaluation criteria. Although there are
controversies over its technical difficulties and practical limitations, CBA rests
on a sound theoretical basis and provides a useful framework for evaluation.
This study presents theoretical analyses as well as case studies in which the
DAC criteria as an evaluation framework are analyzed in comparison with the
framework of CBA.
1.2. Research Questions
This study examines the influence of the DAC criteria on evaluation results,
with a focus on the positive bias and inconsistency that their use may generate.
The research question is:
Does the use of OECD/DAC criteria tend to generate positively
biased evaluation results?
I first identify the requirements for good evaluation criteria by drawing
on general evaluation theories, and assess the DAC criteria against them with
special attention to possible sources of bias. Multiple evaluation theories are
reviewed. I then discuss whether, and in what respects, the use of the DAC
criteria may generate bias in light of the defining features of evaluation criteria
drawn from that review, and argue that bias is likely to occur and to lean toward
positive results.
Since the DAC criteria provide only brief definitions and a few sample
questions, the study requires both theoretical and empirical approaches.
Based on the findings of the theoretical examination, I examine whether biases
occur in practice and tend toward positive results by conducting an empirical
review of evaluation reports that applied the DAC criteria, produced by two
Korean aid agencies.
Thirdly, I attempt to explain the existence and extent of positive bias in
the DAC criteria. The basis of my argument is that positive bias exists if the
overall conclusion of a DAC criteria evaluation is more favorable than can be
justified by the evaluative standards widely used in other evaluation methods.
Applying the criterion used in cost-benefit analysis, namely the net present
value (NPV), as a benchmark, I first review the definitions and scope of the
DAC criteria in comparison with those of the NPV. My assumptions are: (1) if
evaluations applying the DAC criteria tend to generate a positive conclusion on
a project whose NPV is negative, it is fair to conclude that positive bias is
inherent in the DAC criteria; and (2) if a project with a smaller NPV (or benefit-
cost ratio, B/C) receives similar or more favorable assessments in the DAC
criteria framework than another project with a greater NPV (or B/C), it means
that the extent to which positive bias occurs is inconsistent across evaluation
results. To support my argument, I present case studies which show how certain
events can affect the evaluation results in the DAC evaluation framework and
in CBA differently.
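For reference, the two benchmark quantities used in these assumptions follow
their standard definitions in cost-benefit analysis; the notation below is
introduced here only for exposition:

\[
NPV = \sum_{t=0}^{T} \frac{B_t - C_t}{(1+r)^t},
\qquad
B/C = \frac{\sum_{t=0}^{T} B_t/(1+r)^t}{\sum_{t=0}^{T} C_t/(1+r)^t}
\]

where B_t and C_t are the social benefits and costs accruing in year t, r is the
social discount rate, and T is the analysis horizon. A project is judged socially
worthwhile when NPV > 0, or equivalently when B/C > 1.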
1.3. Organization of the Thesis
Following this introductory chapter, Chapter 2 presents the research design of
this study. It starts with a review of the evaluation literature which
encompasses the issues of general field of evaluation, development evaluation,
the DAC criteria and cost-benefit analysis. After the concepts and scope of
the study are defined, the framework for analysis is presented.
Chapter 3 presents an analysis of the definitions and scope, or the 'notion
of merit', of the DAC criteria, in view of whether they constitute a good
evaluation framework and satisfy the requirements for evaluation criteria.
Based on the review of multiple evaluation theories as well as the CBA
framework, I attempt to identify possible sources of bias in the DAC criteria.
Chapter 4 deals with the 'standards of judgement' in evaluations. Based on
the findings from an empirical review of the evaluation reports, I compare the
standards and supporting evidence generally applied in the DAC criteria
framework with those in CBA.
In Chapter 5, the 'method of synthesis' of the DAC evaluation framework
is discussed in comparison with the CBA framework. Both a theoretical
analysis and a comparative case study of water supply projects are presented.
Chapter 6 summarizes the findings and concludes the thesis.
CHAPTER 2. RESEARCH DESIGN
2.1. Literature Review
2.1.1. Outline of Literature Review
The aim of the literature review is to develop a conceptual basis for the
framework used to analyze the DAC criteria. In order to identify the historical
and theoretical context in which the DAC criteria were established, the
literature review starts with the evolution of the evaluation discipline and the
fundamental issues of evaluation in theory and practice. The review covers
different perspectives on evaluation purposes and principles, the contexts in
which they have evolved, and their implications.
After the theoretical overview of the evaluation discipline in general, I
present a review of the literature on the evaluation of development assistance,
or 'development evaluation', with a focus on its history and development in
practice, out of which the DAC criteria were discussed and established in their
present form. Recent developments and main challenges in the field of
development evaluation are also discussed.
The literature review on the DAC criteria deals with the context in which
they were developed and how they have been used in practice. It also considers
the discussion in the development community over the strengths and
weaknesses of the DAC criteria as a standard evaluation framework.
Cost-benefit analysis (CBA) has somehow never been fully integrated
into the evaluation discipline. To support the rationale for adopting concepts
from CBA theory, the literature review on CBA focuses on how CBA has been
regarded in the evaluation discipline and what the main issues are in applying
CBA to development evaluations.
2.1.2. Literature in the General Evaluation Discipline
Evaluation as a discipline emerged in the 1960s in the United States (Shadish,
Cook, and Leviton 1991, Rossi, Lipsey, and Freeman 2004). Though
relatively new and dominated by US scholars, the evaluation discipline is
considered a “well established field of study, with contributions by theorists
and practitioners in many countries throughout the world” (Owen and Rogers
1999, 22).
Characteristics of the Evaluation Discipline
The evaluation discipline has some interesting characteristics, which have
contributed to the unique conceptual and theoretical development of evaluation.
First, the development of evaluation has largely been affected by the political
and policy context, mainly driven by domestic demands in the US. The US
government's large-scale social interventions in the 1960s, such as the 'War on
Poverty' and 'Great Society' initiatives, called for a systematic approach to
evaluating social programs. This approach involved new conceptualizations
of existing views and attempts to build a knowledge base for evaluation across
various areas of study. With legislation that mandated and funded evaluations
for major federal programs, evaluation grew as an independent field of study
and flourished as a profession for the next two decades. The political and
social climate has influenced not only the demand for evaluation, i.e., funding,
but also the perspectives on and approaches to evaluation (Mark, Greene, and
Shaw 2006, Shadish and Luellen in Mathison 2005, 183-6).
Secondly, evaluation is an applied social science of a multidisciplinary
or transdisciplinary[1] nature. It draws on intellectual traditions from various
disciplinary perspectives, for example, sociology, economics, psychology,
anthropology, women's studies, cultural studies, etc., as well as on the
accompanying scientific paradigms (Mark, Greene, and Shaw 2006). This
implies conceptual and methodological diversity. Evaluators take
methodologies from multiple and diverse social sciences, whether side-by-side
or in mixed or integrated ways. The evaluation discipline went through the
famous paradigm war, or the 'quantitative-qualitative debate', during the 1970s
and 1980s, as did other social sciences in the US. The intense debate around the
legitimacy and relative superiority of evaluation methodologies, especially in
the 1980s, eventually moved the evaluation community toward more
pluralistic approaches.
[1] Scriven (2008a) describes the characteristics of evaluation as a transdiscipline,
which he argues is distinct from an 'interdiscipline' or a 'multidiscipline'.
Its practical focus has also played a crucial role in shaping the evaluation
discipline. From the start, the primary purpose of evaluations has been to
provide information on whether and why social programs were successful,
which would supposedly assist the government in ensuring accountability
and making decisions about future public programs. The early dominance of
positivism, which applied strict quantitative methods to test the program-effects
relationship, seemed to gradually lose its influence, criticized for its limited
usefulness and for often being unworkable or even counterproductive
(Stufflebeam and Coryn 2014, 310). The mid-1970s saw the focus of
evaluation move to utilization, based on the premise that "evaluations should
be judged by their utility and actual use", so the process and design of
evaluations should consider the diversity of interests and values of stakeholders
in order to facilitate judgement and decision making by intended users (Patton
1997, 20). Since then, the scope of evaluation has been broadened to
utilization-focused, participatory, and developmental approaches (for example,
Guba and Lincoln 1989, Patton 1997, 1994), which have become one of the
main pillars of the evaluation discipline and have also influenced
methodological evolution in other approaches.
These characteristics of the evaluation discipline have contributed to a
somewhat chaotic landscape of evaluation theories. Evaluators continue to
develop new approaches and brand them as theories with ear-catching
adjectives. As a result, evaluation theories have proliferated with little critical
attention to their validity. This is another challenge of studying evaluation
attention to their validity. This is another challenge of studying evaluation
theories, in addition to those that King (2003) identifies, namely, lack of
conceptual consensus, practical focus, continuing emphasis on models and
methods, predominant focus on program theory, and lack of research support.
Main Areas of Evaluation Literature
As evaluation has a short but rather chaotic history, with a multidisciplinary
nature, methodological diversity, and a strong pragmatic focus, the literature in
the field encompasses issues ranging from its logic and philosophical
foundations (e.g., Shusterman 1980, Scriven 1981, 1994, 1995, Fournier 1995)
to how to manage budgets and time (e.g., Alkin 2010, Bamberger, Rugh, and
Mabry 2011). Reviewing these broad issues is beyond the scope of this study,
but it is useful to categorize the main areas that the evaluation literature deals
with. I identify three: 1) what evaluation is and why we do it; 2) how we
conduct evaluations; and 3) what we learn through evaluations.
1) What is evaluation and why
The first topic is related to the fundamental dimension of evaluation: its nature
and role in society. The literature in this area deals with the definitions of
evaluation, its purposes and functions, and its role, as well as debates over
issues of theory, methodology, practice, and the profession, including ethical
and quality standards (Alkin, Patton, and Weiss 1990, Smith and Brandon 2008,
Stockmann and Meyer 2016).
Evaluation is defined in various ways by scholars with different
perspectives, reflecting its functions, purposes, and methods. Box 1 shows
some examples of definitions that are frequently quoted in the evaluation
literature. Putting together the core concepts in these definitions, evaluation
is a systematic process consisting of collecting, analyzing, and interpreting
information and eventually making a judgement on the object being evaluated
(the 'evaluand'). The purposes of evaluation can be summarized as: to
determine the evaluand's value, to contribute to improvement of or decision-
making on the evaluand, and/or to increase understanding of the evaluand.
These fall into one of three conceptual frameworks, or what Chelimsky (1997)
calls 'evaluation perspectives': evaluation for accountability, evaluation for
development, and evaluation for knowledge.
Box 1. Definitions of Evaluation
Dictionary Definition (Oxford): The making of a judgment about the amount,
number, or value of something; assessment
Scriven (2015, 4): “Evaluation… refer[s] to the process of determining the merit,
worth, or significance.”
Stufflebeam and Coryn (2014, 14): “[The expanded, operational definition of
evaluation] is “the systematic process of delineating, obtaining, reporting,
and applying descriptive and judgmental information about some
object's merit, worth, probity, feasibility, safety, significance, and/or
equity.”
Rossi, Lipsey, and Freeman (2004, 2): “Evaluation research is defined as a social
science activity directed at collecting, analyzing, interpreting, and
communicating information about the workings and effectiveness of
social programs.”
Weiss (1998, 4): “Evaluation is the systematic assessment of the operation and/or
the outcomes of a program or a policy, compared to a set of explicit or
implicit standards, as a means of contributing to the improvement of the
program or policy.”
Patton (1997, 23): “Evaluation is the systematic collection of information about
the activities, characteristics, and results of programs to make judgments
about the program, improve or further develop program effectiveness,
inform decisions about future programming, and/or increase
understanding.”
Morra-Imas and Rist (2009, 9): “Evaluation refers to the process of determining
the worth or significance of an activity, policy, or program. [It is] as
systematic and objective as possible, of a planned, on-going, or completed
intervention.”
OECD (2002, 21-22): “The systematic and objective assessment of an on-going
or completed project, programme or policy, its design, implementation and
results... to determine the relevance and fulfillment of objectives,
development efficiency, effectiveness, impact and sustainability… to
provide credible and useful information enabling the incorporation of
lessons learned into the decision–making process of both recipients and
donors. Evaluation also refers to the process of determining the worth or
significance of an activity, policy or program. An assessment, as
systematic and objective as possible, of a planned, on-going, or completed
development intervention.”
Source: organized by author.
Historically, different perspectives have evolved over time and shaped
the trends of evaluation theories and the paradigm shifts in the discipline. In
the 1960s, most evaluation practice followed the standard social science
conventions of the era, which were largely 'quantitative', focusing on assessing
causal relationships with experimental or quasi-experimental designs.
The dominance of positivism was challenged by attempts to develop
more practical and user-oriented approaches in the 1970s. Constructivist,
heuristic evaluation also gained attention around that time, and its descriptive,
qualitative, and participatory approach became influential when conceptualized
as the fourth generation of evaluation by Guba and Lincoln (1989). In the
2000s, a return to scientific methods, or what Vedung (2010) calls the 'evidence
wave', was observed as evidence-based decision-making became the primary
concern of public agencies.
Table 1. Views on Historical Development of Evaluation

Stufflebeam and Coryn (2014): Pre-Tylerian Period (to 1930s); Tylerian Age
(1930-1945); Age of Innocence (1946-1957); Age of Realism (1958-1972);
Age of Professionalism (1973-2004); Age of Global and Multidisciplinary
Expansion (2005-present).

Guba and Lincoln (1989): First generation: Measurement (to early 1930s);
Second generation: Description (early 1930s-1967); Third generation:
Judgement (1967-early 1980s); Fourth generation (late 1980s and onwards).

Vedung (2010): Science-Driven Wave (late 1950s-mid 1970s); Dialogue-
Oriented Wave (from mid 1970s); Neo-Liberal Wave (from around 1980s);
Evidence Wave (in 2000s).

Source: arranged by author.
In the area of education, the history of systematic evaluation begins with
Ralph Tyler, who coined the term ‘educational evaluation’ and exerted a heavy
influence on the field (Stufflebeam and Coryn 2014, 30-33). In other fields,
social scientists had also conducted evaluation-type studies in their own areas,
on major programs related to public health, social policy, international
initiatives, etc.
2) How to conduct evaluations
The second type of evaluation literature covers specific evaluation
models or approaches, which are claimed as 'evaluation theories'.[2] Ideally, an
evaluation theory would "describe and justify why certain evaluation practices
lead to particular kinds of results across situations that evaluators confront",
but such a theory is not likely to be achievable (Shadish, Cook, and Leviton 1991).
Nevertheless, there are a variety of evaluation theories, which attempt to
“provide a set of rules, prescriptions, prohibitions, and guiding frameworks that
specify what a good or proper evaluation is and how evaluation should be done”
(Alkin 2013a, 4). In other words, evaluation theories are theories of
evaluation practice, which provide guidance regarding when, in what context,
and why certain dimensions should be addressed with certain methods, as well
as how to assign value to what is being evaluated.
[2] In the evaluation discipline, the term 'theory' is generally used interchangeably
with 'approach' or 'model' (Shaw, Greene, and Mark 2006, Alkin 2013b). Some
theorists prefer one term over the others for a particular research purpose (Smith 2010,
for example, Hansen, Alkin, and Wallace 2013).
The variety of evaluation theories is well displayed in the 'evaluation
theory tree' (Figure 1) developed by Christie and Alkin (2013). The 'roots' of
the tree, namely social accountability, systematic social inquiry, and
epistemology, serve as the foundation of evaluation work, namely the motivations
and rationales for evaluation. The tree grows into three branches that
represent dominant themes in evaluation theory: use, methods, and valuing.
Evaluation theories are categorized into the branches according to their
main emphasis, i.e., utilization, research methodology and techniques, or value
judgement. The leaves represent the theorists and the fruits (added by author)
indicate the evaluation theories grown out of the branches.
Figure 1. Christie and Alkin's Evaluation Theory Tree
Source: Christie and Alkin (2013, 12); boxes representing theories added by the author.
Evaluation theory is a concept much broader than evaluation design
or methodology. Evaluation theories thus provide guidance on how to design
an evaluation and what methodologies to use in collecting information.
Evaluation models also suggest what aspects or dimensions of the evaluand are
to be examined (evaluative criteria), how to determine the value or quality in
each aspect (evaluative standards), and what to consider when combining all
information into overall conclusions. These three processes are important
components of the general logic of evaluation (Fournier 1995) and will be
discussed in detail in later chapters.
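As a concrete illustration of why the method of synthesis matters, the short
sketch below combines criterion ratings by simple averaging; the ratings and
the four-point scale are hypothetical, invented for illustration, and are not taken
from any of the rating systems discussed in this thesis:

# Hypothetical illustration: simple-average synthesis can mask a fatal flaw.
# Ratings use an invented 4-point scale (4 = very high ... 1 = very low).
ratings = {
    "relevance": 4,
    "effectiveness": 3,
    "efficiency": 3,
    "impact": 3,
    "sustainability": 1,  # serious flaw: benefits unlikely to continue
}

overall = sum(ratings.values()) / len(ratings)
print(f"overall rating = {overall:.1f}")  # 2.8, close to 'successful'
# An unweighted average lets the high relevance rating compensate for the
# sustainability flaw; a more thoughtful synthesis would weight the criteria.

The sketch anticipates the empirical pattern examined in Chapter 4, where
overall conclusions are drawn from simple averages of criterion ratings.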
3) What we learn through evaluations
While the above two topics are rather normative or practical, this topic is
empirically oriented, represented by so-called ‘research on evaluation (RoE)’.
There has been increasing recognition that evaluation theories should be more
empirically based (Mark 2008, Shadish, Cook, and Leviton 1991, Smith 2010).
Notwithstanding the proliferation of evaluation theories or models, none of
them has been systematically verified with empirical evidence (Astbury 2016,
324). It is still difficult to generalize about what constitutes a sound theory in
the evaluation discipline.
Empirically oriented evaluation literature is growing as evidence-based
evaluations are increasingly emphasized, in forms such as empirical reviews
of evaluation practice applying certain evaluation theories (e.g., Cullen and
Coryn 2011, Miller and Campbell 2006), and meta-analyses and systematic
reviews of evaluations on specific topics (e.g., Scott-Little, Hamann, and Jurs
2002, Fewtrell and Colford 2004).
2.1.3. Development Evaluation
The field of development evaluation has a solid foundation in Europe (Berlage
and Stokke 1992, Stockmann 2013), and many textbook-type publications as
well as academic papers have been written by European authors. Research on
development evaluation has taken more pragmatic approaches rather than
building on theoretical foundations. One of the reasons is that development
evaluation has been dominated by "the need of the donor community with the
emphasis on the practical usefulness of the results in improving aid operations"
(Carden 2013, Cracknell 2000). As a result, the term development evaluation
is often used as a synonym for 'evaluation of development aid' or 'aid
evaluation'. In fact, the OECD DAC, the group of donors of development
assistance, has played a key role in the development of the field. Much of the
early literature on development evaluation was published by the DAC or by
experts who participated in the DAC's work on evaluation.
According to Cracknell (2000, 39), it was only around the time he wrote
his book that development evaluation became more fully integrated into the
wider world of evaluation debates. From the late 1990s, criticisms of the donor-
driven conventional approach became visible, and more diverse perspectives
involving participatory, empowerment, and cross-cultural approaches were
called for, as constructivism became one of the main streams in the general
evaluation discipline (Rebien 1996, McDonald 1999).
In this context of evaluation in development cooperation, the DAC
criteria were developed and gained importance. Cracknell (1988, 2000)
summarized the history of aid evaluation in four phases: the first, "early
developments" phase from the 1960s to 1979; phase two, an "explosion of
interest", from 1979 to 1984; phase three, from 1984 to 1988, when
international dialogues became important; and phase four, from 1988 to the
time of writing, characterized as "aid evaluation at the crossroads". He
suggested that the next phase, phase five, could be "the emergence of
methodological pluralism".
In the 1960s, the most popular approach to evaluating development
projects was estimating financial and economic rates of return (Binnendijk
1989). As most development projects at the time were large infrastructure and
industrial projects, the economic methodology was seen as appropriate. In
the 1970s, when the emphasis of development assistance shifted toward meeting
basic human needs (e.g., agricultural and rural development), economic
analysis became insufficient owing to the difficulties of estimating and
quantifying social benefits and other impacts, and also because it could not
deal adequately with equity or distribution issues.
In the late 1970s, the logical framework, or 'Logframe', emerged at
USAID as a conceptual framework for guiding project planning,
implementation, and evaluation. The Logframe was adopted by many other
aid agencies for project design and evaluation. Evaluation design was largely
based on 'experimental and quasi-experimental research', a methodology that
requires statistically reliable baseline data to measure attribution. It soon
turned out that this evaluation approach was "overly sophisticated, costly, and
impractical for the evaluation of most development projects" (Binnendijk 1989,
209). The focus of this approach was too narrowly placed on impacts, yielding
few useful evaluation outcomes for lessons and learning.
It is fair to say that development evaluations were largely conducted by
donor agencies for their own information needs, with a focus on accountability
and control. Donor agencies have played the major role "as promoters,
executors, and consumers of the evaluations conducted in developing countries"
(Bamberger 2000, 101). This donor-driven evaluation practice resulted in a
number of problems, as reported by many scholars (Bamberger 1991, Berlage
and Stokke 1992, McDonald 1999, Rebien 1997). The problems included
evaluation designs that were too costly and inappropriate for the local context,
too much emphasis on the collection of quantitative data with little flexibility
in the methods used, and durations and resources so limited that the quality of
the collected data or information suffered. Another interesting point in
McDonald (1999) is that evaluations were mainly conducted by professionals
from the donor side with skills in the project substance (the sector) rather than
evaluation expertise, who generally lacked knowledge of the local social,
political, and cultural context.
All these problems seem to have contributed to what Bamberger (2009)
called 'positive bias' in international development evaluations. He explains
the positive bias as the result of a combination of four factors: budget and time
constraints; limited access to data (particularly baseline data); the way
evaluations are commissioned and managed; and political and organizational
constraints and pressures. Raimondo, Vaessen, and Bamberger (2016) identify
five common scenarios in development evaluation: rapid evaluations; large-
scale, long-term evaluations; experimental manipulation and/or reliance on
primary data collection; systematic reviews; and participatory evaluations. It
appears that many evaluations conducted by donor agencies fall into the
category of 'rapid evaluation'.
Recently the pressure for 'evidence-based' development assistance has
become strong, calling for more scientific and empirical evidence in
development evaluations. This is in line with the 'evidence wave' in the
evaluation discipline, which is interpreted as a return of experimentation
(Vedung 2010). The development evaluation community seems skeptical of
experimental-type methodologies (Forss and Bandstein 2008, Van Den Berg
2005), but it continues to face the challenge of demonstrating results in
evaluating development activities.
2.1.4. DAC Evaluation Criteria
The OECD/DAC's Criteria for Evaluating Development Assistance have their
origin in the DAC Principles for Aid Evaluation, developed by the DAC Expert
Group on Aid Evaluation (now the DAC Network on Development Evaluation)
and adopted by the DAC in 1991. The DAC Principles document is considered
the most important product established by the Group (OECD 2013, 33). The
definition of evaluation is the core element of the principles, from which five
concepts, namely relevance, efficiency, effectiveness, impact, and
sustainability, emerged. These have become widely accepted evaluation
criteria, known as the DAC criteria, and have had a profound impact on
development evaluation. At the time, the five criteria were presented as a
basic group of evaluation issues or questions to be addressed in an evaluation.
The purpose of developing the evaluation principles was (1) greater
coordination between the evaluation units of different aid agencies and (2)
harmonization of the terms of reference for evaluations, in an effort to ensure
greater comparability of results (OECD 1991). In other words, the set of five
criteria was developed with the expectation that it would contribute to
collaboration and comparability in evaluation activities among DAC members.
The DAC criteria were updated in 2002 with the Glossary of Key Terms
in Evaluation and stipulated in the DAC Quality Standards for Development
Evaluation, adopted in 2010. The DAC criteria have become a standard
framework for international development evaluation, adopted by major donor
agencies. According to Lundgren, who was a member of the Expert Group
and is now the Head of the Evaluation Unit at the OECD, a key reason why
these criteria came to be so widely spread and used is that they are a
manageable and relatively easy framework to understand and use when
framing key evaluation questions (IEG 2017).
Figure 2. Development of Evaluation Paradigm and the DAC Criteria
Source: author.
The DAC five criteria have been widely used among the DAC members
as well as other development organizations including NGOs to meet donors’
requirements in reporting their activities. It is often criticized that the practice
of applying the DAC criteria is rather mechanical, described as a template,
checklist, or box-ticking approach, and even perceived as a 'straitjacket'.
Even though the DAC requires all criteria applied to be defined in unambiguous
terms, it is argued that several criteria are not well understood and that their use
is often mechanical, excluding more creative evaluation processes (ALNAP
2006, 10). Some donors include additional criteria such as cross-cutting
issues or the so-called 3Cs, namely coherence, co-ordination, and
complementarity, which are mainly adopted by European donors, e.g., the EC,
Denmark, Germany, Ireland, etc. (OECD 2016). But within a rating system,
the overall assessment is usually made by combining the five criteria.
Although there seems to be a shared sentiment among evaluators about
the rigidity and limitations of the DAC criteria framework, academic literature
offering theoretical explanations or critical perspectives is insufficient.
Among the few examples, Chianca (2008) examined each criterion under three
questions: (1) are they sufficient to provide a sound assessment of the quality,
value, and significance of an aid intervention? (2) are they necessary? (3) are
they equally important? He provides a good overview of the definitions of
the criteria and their strengths and weaknesses, based on which he suggests
ideas for improvement, but he did not attempt to provide empirical evidence
for the problems raised in the paper.
Igarashi and Awabdeh (2015) discuss the problems associated with
applications of the DAC criteria in the institutional context. The focus of their
discussion is the 'mechanical application' of the DAC criteria based on a
pre-determined log-frame, which, they criticize, generally assumes a linear
means-ends causality and fails to capture the complex and dynamic nature of
the development process. They argue that 'log-frame-based evaluations' are
common in practice, and that overreliance on the DAC criteria in designing
evaluation frameworks, often by those whose expertise is not in evaluation,
risks overlooking how things progress and what changes happen and how.
The authors propose weaning evaluators off the DAC criteria as an operational
framework that serves as a 'template approach'; instead, they suggest utilizing
the criteria as a framing approach to define the evaluation questions, which
should drive the evaluation methodology.
Recently the development evaluation community has started discussing
the issues around the DAC evaluation criteria rather openly. Early in 2017,
Caroline Heider, the Director-General of the Independent Evaluation Group
(IEG) at the World Bank, initiated the 'Rethinking Evaluation' series on the
IEG blog, suggesting that it is time to rethink the DAC five criteria. She
raised questions such as whether the criteria represent the diverse views that
have emerged in the development field, e.g., inclusiveness and complexity.
She examined the definition of each criterion and its use in practice in a series
of provocative posts that drew more than a hundred comments and questions
(Heider 2017). The discussion is ongoing and is expected to gather ideas for
improving the DAC criteria, though it remains an exchange of opinions.
2.1.5. Cost-Benefit Analysis and Evaluation
Cost-benefit analysis (CBA) is generally perceived as a method used in project
appraisal or feasibility assessment before an intervention, to assist decision-
making on whether to allocate resources to that intervention. While this type
of ex-ante CBA is regarded as the standard form, ex-post CBA has its own
value: it reveals the actual value of a particular intervention and contributes to
learning about the value of similar interventions, with even greater accuracy
than ex-ante analysis (Boardman et al. 2011).
Conducting an ex-post CBA therefore shares the same purposes as
conducting an evaluation, more specifically a summative evaluation.[3] By
estimating the value of what is being evaluated, CBA determines whether the
intervention was socially worthwhile, thus satisfying the accountability purpose
of evaluation. With the information the CBA provides on whether a similar
intervention would be worth investing in, it supports decision-making on
whether to expand or replicate the project.
[3] Summative evaluation is a term coined by Scriven, who makes a distinction
between evaluation for assessing the overall value of an evaluand and evaluation for
improvement, which is called formative evaluation. Summative evaluation is generally
conducted after completion of the project or program, but can also be conducted for
ongoing programs after stabilization (Scriven 1991, 340).
CBA is a widely used tool for assessing the value of an evaluand and
includes the task of drawing a conclusion about how a given evaluand has
performed. Although CBA shares common purposes and tasks with
evaluation, it has somehow been kept separate from the evaluation discipline.
King (2015, 2017) describes that economic evaluation, e.g., CBA, tends to be
applied “either in isolation from or in parallel to other methods of evaluation”.
He adds that the evaluation community has noted this gap and suggested that
economic valuing methods should be integrated into general evaluations.
This is probably because the evaluation field started in and has been
developed by other fields of social science than economics, such as education,
sociology, public policy, etc. In fact, consideration of cost, not to mention the
cost-benefit relation, has been very weak in general evaluations4. There seems
to have been some misunderstanding or resistance among the evaluation
community on the concepts and definitions used in CBA such as valuing and
monetizing qualitative aspects of human and society, e.g., happiness or pain,
quality of life, environment, and so on. A direct example is the article titled
“The Economist’s Fallacy” by Michael Scriven (2008b), who is the second to
none influential scholar in the discipline of evaluation. With some
misunderstanding of the definition of opportunity cost in economics, he even
argued, “evaluators should not follow the economists [when analyzing costs],
or they will end up in a swamp of misleading conclusions about program
costs…".5 Technical complexity and the theoretical assumptions embedded in CBA may also have contributed to the isolation of the method from mainstream evaluation approaches, which have moved towards qualitative methods with more emphasis on participatory and social justice perspectives.

4 Scriven (2015) noted that cost-analysis, both quantitative and qualitative, was long ignored and is still seriously underused in evaluation.
5 Two economists, Rudd (2009) and Watts (2008), responded to Scriven's claim, saying that his understanding of opportunity cost is incorrect from an economics perspective.
In the field of development, the use of CBA was common in the 1960s and 1970s, although it seems to have been limited to ex-ante evaluations. Economic analysis was institutionalized as a main method of feasibility assessment in the major multilateral development banks. But it became less preferred for practical reasons. For example, the percentage of projects assessed with CBA methods dropped from 70% in 1970 to 25% in 2005, mainly because of the growth of project areas to which CBA is difficult to apply, especially governance and social protection (World Bank 2010).
Recently, however, multilateral development banks have again become active in studying how to apply economic methods to development programs, and have developed guidelines for sector-specific areas. The International Fund for Agricultural Development (IFAD) released a series of guidelines for the economic analysis of agricultural projects in 2015 and 2016 (IFAD 2015, 2016a, b). The Asian Development Bank (ADB) recently revised its 1997 Guidelines for the Economic Analysis of Projects (ADB 2017) and also published a practical guide on CBA for development (ADB 2013), among others.
In the development context, CBA is often called 'social cost-benefit analysis'. Snell (2011) uses the term when introducing three categories of CBA: financial, economic and social. While financial CBA concerns the financial position of a person or firm, economic CBA concerns the welfare of a defined group of people. Social CBA adjusts prices to reflect priorities and policies that markets would not reflect, for example, adjustments to give advantage to certain population groups such as the rural poor. But the distinction between economic and social CBA is often blurred, as the author admits (Snell 2011, 5). As Brent (2006, 5) points out, the word 'social' is used to emphasize that one is attempting to express the preferences of all individuals, whether rich or poor, and whether directly or indirectly affected by the project. Even without the word 'social', CBA deals with all individuals in society as well as with distribution issues. As de Rus (2010) defines it, CBA is about social welfare and considers all social costs and benefits of projects, and this is the view taken in this thesis.
Numerous critiques of CBA exist. In addition to conceptual and methodological issues, such as the quantification and monetization of qualitative values, discounting, and distributional issues (Frank 2000, OECD 2006), practical challenges make its application costly and limited, especially in the development context. Despite all the controversies, however, there is a consensus that CBA provides a powerful conceptual framework for evaluation. CBA involves "systemically identifying, measuring, valuing, and comparing the costs and consequences of alternative courses of action" (Drummond 2005, quoted in King 2017). Methods to compensate for its limitations have also been developed, such as the analysis of multiple scenarios through sensitivity tests. This is the ground on which I argue that CBA can serve as a good benchmark for the DAC criteria framework.
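To illustrate what such a sensitivity test can look like, here is a minimal sketch in Python (purely illustrative; the cash-flow figures and discount rates are hypothetical and not drawn from any evaluation discussed in this study). It recomputes the present value of a project's net benefits, i.e., the net present value (NPV) formally introduced in section 3.4, under several discount-rate scenarios:

```python
# Hypothetical annual net benefits (benefits minus costs) over a
# five-year project life; year 0 is the initial investment outlay.
net_benefits = [-100.0, 20.0, 35.0, 45.0, 45.0]

def present_value(flows, rate):
    """Discount a stream of annual net benefits back to year 0."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

# Sensitivity test: recompute the result under multiple discount rates.
for rate in (0.03, 0.05, 0.08, 0.12):
    pv = present_value(net_benefits, rate)
    verdict = "worth implementing" if pv > 0 else "not worth implementing"
    print(f"r = {rate:.0%}: NPV = {pv:6.1f} -> {verdict}")
```

If the sign of the NPV flips within a plausible range of discount rates, the evaluative conclusion is fragile; if it does not, the judgement is robust to that assumption.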
2.2. Concepts and Scope of the Study
In the evaluation literature, many terms are used in different ways, sometimes causing confusion. It is therefore useful, as well as necessary, to define the key terms used in this study and to narrow down the scope of the study, as follows.
Evaluation and Evaluand
As discussed in the literature review, there are various definitions of evaluation, reflecting different perspectives on its functions, purposes, and methods. In this study, I take an eclectic approach, combining Scriven's and Weiss's definitions, and define evaluation as "the systematic assessment of a development intervention, compared to a set of explicit or implicit standards, to determine its merit, worth, or significance6".
6 Precisely speaking, the terms 'merit' and 'worth' have distinct meanings: 'merit' means intrinsic, context-free qualities, while 'worth' refers to context-determined value (Scriven 1994, Stake 2004). In some literature, the terms merit, worth, value or quality are used interchangeably.

Perspectives, approaches and methodologies of evaluation vary according to the object of evaluation, or the 'evaluand'. Evaluation literature and practice encompass various kinds of evaluands, from products to personnel, summarized as the "six Ps": programs, policies, performance, products, personnel, and proposals (Scriven 1991). This study focuses on the evaluation of development interventions or ODA projects, and so mostly deals with theories of 'program evaluation'.
Evaluation Criteria and Standards
The term ‘criteria’ means “the aspects, qualities, or dimensions that distinguish
a more meritorious or valuable evaluand from one that is less meritorious or
valuable” (Mathison 2005: 91). In evaluation literature, the term is used as a
synonym as dimension of merit/worth (Davidson 2005) or analytical category
(Dale 2004) and often includes indicators or variables of success or merit
(Scriven 1991: 111).
A criterion should be distinguished from the term 'standard', which means the level or amount of quality needed for a certain judgement. While criteria are "the aspects of an evaluand that define whether it is good or bad and whether it is valuable or not valuable" (Davidson 2005: 239), standards are the levels of how good and how valuable that differentiate an acceptable evaluand from an unacceptable one (Stake 2004: 7).
Evaluation Framework
An evaluation framework generally means a tool used to organize and link
evaluation criteria with questions, outcomes or outputs, data sources and data
collection methods. Some use the term as a synonym as an evaluation matrix.
The DAC criteria as a set can be viewed as an evaluation framework, which
33
defines the aspects the evaluation should examine and the questions to be asked.
In this study, the term ‘DAC criteria framework’ or ‘DAC framework’ are used
to address an evaluation framework using the DAC criteria.
Positive Bias
In evaluation literature, bias is defined as “systematic deviation of results from
what they should be” (Camfield, Duvendack, and Palmer-Jones 2014).
Positive bias in evaluation means that the judgement is more favorable than the actual results warrant. Scriven (1991) described 'bias' as the same as "prejudice", whose antonyms are 'objectivity', 'fairness' or 'impartiality'. Bias may be caused by: the evaluator's personal view, e.g., halo effects or the Rorschach effect7; evaluation design and methods, e.g., selection bias; or incentive mechanisms, e.g., funding bias.
The concept of positive bias in this study is used in the sense that positive bias exists if the overall conclusion is more favorable than can be justified by evaluative standards widely used in other evaluations, e.g., the net present value in cost-benefit analysis.
7 Halo effects mean the tendency to allow the presence of some highly valued feature to overinfluence one's judgement. The Rorschach effect refers to the tendency to see what one wants to see (Scriven 1991).
General Logic of Evaluation
The general logic of evaluation means the generally applied reasoning process
by which evaluative conclusions are established and supported. This general
logic is commonly shared by various evaluation approaches, while what counts
as criteria or evidence and how evidence is weighted varies from one approach
to another. Based on Michael Scriven’s logic of evaluation, Fournier (1995)
describes the general logic in four steps (Figure 3):
Figure 3. General Logic of Evaluation
1. Establishing criteria of merit: On what dimensions must the evaluand do well?
2. Constructing standards: How well should the evaluand perform?
3. Measuring performance and comparing with standards: How well did the evaluand perform?
4. Synthesizing and integrating data into a judgement of merit or worth: What is the merit or worth of the evaluand?
Source: Fournier (1995)

King (2017) describes the process of cost-benefit analysis (CBA) as implementing the general logic of evaluation: "identifying the things of value", that is, establishing criteria of merit; "quantifying and valuing them", which involves constructing standards and measuring performance; and "synthesizing the evidence… to reach an overall determination of net value". Likewise,
conducting an evaluation using the DAC framework can be conceptualized as implementing the general logic of evaluation, which provides a logical analytical framework for analyzing the DAC criteria evaluation process, especially in comparison to the CBA framework.
2.3. Analytical Framework
This study analyzes the DAC evaluation criteria by implementing the stages of the general logic of evaluation. The analytical framework of the study is illustrated in Figure 4.
Figure 4. Analytical Framework of the Study
The first stage, establishing criteria of merit, involves defining and describing the merits and values that the evaluand should have in order to be considered a good one. Adopting the key requirements for evaluation criteria identified in the 'criteria of merit checklist (COMlist)' (Scriven 2007), I examine the definitions and scope of the DAC criteria individually and collectively, asking whether they constitute a good evaluation framework in comparison to the evaluation criteria in established evaluation theories, i.e., Scriven's KEC framework, Stufflebeam's CIPP model, and Rossi's theory-based model, as well as to the cost-benefit analysis framework. This part of the study is presented in Chapter 3.
In the second and third stages of the general logic of evaluation, the core
concept is ‘standard’, that is, “the level of how good and how valuable that
differentiate the evaluand between acceptable and unacceptable” (Stake 2004,
7). In Chapter 4, I analyze the standards of judgement in the DAC criteria framework and whether they indicate appropriate levels of value that a development intervention is required to meet. The analysis is based on findings from an empirical review of evaluation reports applying the DAC criteria framework and on a comparison with the standard of judgement in cost-benefit analysis, i.e., the net present value.
The final stage of the general logic of evaluation is concerned with how to integrate the findings from different criteria into an overall conclusion. This involves the issues of the relative importance of evaluation criteria and the validity of evaluation results. A comparative analysis is conducted in Chapter 5 to examine how differently a certain event can affect the evaluation results in the DAC framework and in cost-benefit analysis, using a conceptual framework followed by a comparative case study.
CHAPTER 3. NOTION OF MERIT – DEFINITION AND SCOPE OF DAC
CRITERIA
3.1. Key Requirements for Evaluation Criteria
Establishing criteria of merit is the first step in the general logic of evaluation.
Identifying and selecting the right criteria is one of the most critical tasks in
evaluation procedures, as they define the merit, worth, or significance of a
program being evaluated.
A dictionary definition of criterion is “a principle or standard by which
something may be judged or decided”. In Encyclopedia of Evaluation, the
term ‘criteria’ is defined as “the aspects, qualities, or dimensions that
distinguish a more meritorious or valuable evaluand from one that is less
meritorious or valuable” (Mathison 2005, 91). In evaluation literature, the
term is used as a synonym as dimension of merit/worth (Davidson 2005) or
analytical category (Dale 2004), and often includes indicators or variables of
success or merit (Scriven 1991, 111).
Davidson (2005: 27) drew an interesting analogy between identifying evaluation criteria and "deciding what symptoms to look at when determining what is wrong with a patient and how serious it is". In other words, establishing evaluation criteria means deciding what aspects of the evaluand to investigate in order to judge whether it is successful or not and what the causes of the success or failure are. Once developed, evaluation criteria represent the desired characteristics of the evaluand and serve as a basis for assessing the overall merit, worth and significance, which is what an evaluation eventually determines.
Therefore, the selection of evaluation criteria affects the validity of conclusions. As Scriven (2007) underscores, a set of evaluation criteria, when developed in the right way, can contribute substantially to the validity, reliability, and credibility of an evaluation, because evaluators are required to consider each relevant aspect separately and make a judgement on each criterion, based on which an overall evaluative conclusion can be drawn. Such a set of criteria often incorporates a great amount of specific knowledge and experience about the particular type of intervention, and so facilitates evaluation tasks.
According to Scriven (2007), identifying true criteria for an evaluand X begins with asking "what properties are parts of the concept of a good X". It implies that to identify criteria for a development intervention, the first question should be: what are the properties of a good development intervention? Those properties, in other words the notion of merit, should represent a successful development intervention.
There are conflicting views on whether the evaluation criteria for an
evaluation should be determined beforehand, since they often emerge during
the evaluation process. Scriven (1991, 2007) advocates the value and usefulness of a list of criteria. As noted above, a well-developed list improves the validity, reliability, and credibility of an evaluation by requiring evaluators to judge each relevant aspect separately before drawing an overall conclusion, and it can reduce the risks of possible biases such as the halo effect or the Rorschach effect. Such a list also often incorporates a great amount of specific knowledge and experience about particular evaluands, such as development interventions, and so facilitates evaluation tasks.
Skeptics question the alleged objectivity that is an often-stated purpose of criterial analyses. They argue that since the criteria manifest the subjective biases of their developers, there is a danger that "pre-specified criteria will ensure attention to some program aspects at the expense of others and inject systematic bias rather than eliminate it" (Bamberger, Rugh, and Mabry 2011, 315-6). This is especially the case when a program is too complex and context-dependent to be judged comprehensively against a set of criteria intended for all programs of its kind.
As Stufflebeam and Coryn (2014: 683) argue, evaluators may first diverge in considering a wide range of potential evaluative criteria, and should subsequently converge on the criteria agreed to be most important for carrying out a given evaluation assignment. In this sense, applying pre-determined criteria such as the DAC criteria may limit the possibility of discovering other important aspects that should also be considered when drawing overall conclusions about the evaluand.
What, then, are the characteristics of appropriate evaluation criteria? Scriven (2007) provides a list of key requirements for evaluation criteria, the 'criteria of merit checklist (COMlist)', which is useful when examining the DAC criteria as an evaluation framework.
The key requirements for evaluation criteria are:
(1) the criteria in the list should refer to criteria defining the general notion of merit in the evaluand, not mere indicators;
(2) the list should be complete (no significant omissions);
(3) the items should be non-overlapping;
(4) the criteria should be commensurable;
(5) the criteria should be clear (comprehensible and applicable);
(6) the list should be concise (with no superfluous criteria);
(7) the criteria should be confirmable (measurable or reliably inferable).
To be complete means that every significant criterion of merit must be included. Otherwise, the overall evaluation may be misleading or biased, because poor or superior results on some missing but crucial aspect of merit go uncounted, i.e., a non-counting problem. If criteria overlap, there is a risk of double counting in the overlap area, especially when the list of criteria is to be used as a basis for scoring. Including aspects that are taken for granted, those belonging to the general background of all development interventions, would only extend the list beyond necessity.
A criterion measures (or assists judgements about) how well the evaluand, the intervention under evaluation, is performing or has performed in the respect at issue. A criterion should therefore explain why meeting a certain level means worth or success for the intervention, and should suggest measures that directly address and describe the achievement under that criterion. Being direct is one of the desirable properties of an evaluation criterion that Keeney and Gregory (2005) also point out.
Table 2 summarizes the key requirements of evaluation criteria and the kinds of bias that may occur if the requirements are not met.
Table 2. Key Requirements of Evaluation Criteria and Possible Sources of Bias

Requirements | If… | Sources of bias
Clear notion of quality | Definition is not unambiguous and comprehensive; questions are about something taken for granted | Inappropriate data or evidence; criterion is easy to satisfy
Guidance for standards of assessment | Standards are too high or too low; standards do not directly address the criteria; no guidance | Difficult to assess the true value and to discriminate; more subjectivity, more room for general positive bias
Complete, non-overlapping, commensurable | Omission; overlaps; a serious problem detected in one criterion can be compensated for by a good assessment in another criterion | Non-counting; double-counting; unbalanced overall conclusions

Source: by author.
In this chapter, I examine the DAC evaluation criteria in comparison to
those in established evaluation models in the discipline and in cost-benefit
analysis, with focus on the definitions and scope of each criterion.
3.2. General Criteria in Evaluation Models
There are several widely recognized guiding frameworks for evaluations by
renowned theorists in the field. Each framework provides a set of components,
categories or dimensions to be assessed in evaluations, which can be regarded
as corresponding to the DAC evaluation criteria. First of all, the Key Evaluation Checklist (KEC) framework by Scriven (2015) provides relatively comprehensive 'checkpoints' or 'sub-evaluation' categories. The widely used evaluation textbook by Davidson (2005) is founded on the KEC framework.
Another influential evaluation approach is the CIPP Evaluation Model
by Stufflebeam (2003, 2007). CIPP stands for Context (what needs to be
done), Input (how should it be done), Process (is it being done), and Product
(did it succeed). The CIPP model provides seven components of evaluation, which
may be employed selectively and in different sequences and often
simultaneously, depending on the needs of particular evaluations.
Theory-driven evaluation approaches frame the categories as a 'scope and hierarchy of evaluation'. Rossi, Lipsey, and Freeman (2004) and Donaldson (2007) suggest five dimensions of an evaluation hierarchy, in the order of: need for the program, program design and theory, process and implementation, outcome/impact, and cost and efficiency.
3.2.1. Key Evaluation Checklist (KEC) framework
Michael Scriven is one of the most eminent figures in the evaluation discipline, and his contributions to and influence on the field are considered second to none.8 The checklist methodology and several evaluation checklists are among his original contributions to the theoretical development of the discipline. The Key Evaluation Checklist (KEC) was first developed in his earlier book, The Logic of Evaluation (1980), and has since been updated with minor revisions.9
Though titled a 'checklist', the KEC is a comprehensive framework with in-depth descriptions of how to conduct and report evaluations. It covers the whole process of evaluation, from 'preliminaries' (the executive summary and preface) and 'foundations' of conducting evaluations (identifying and explaining the background, context, descriptions, impactees or consumers, resources and values of the evaluand), through 'sub-evaluations' (the dimensions to be assessed), to 'conclusions' (synthesis of the assessments in the sub-evaluations and possibly recommendations). The 'sub-evaluations' part, as the dimensions to be assessed, consists of the following five checkpoints: Process, Outcomes, Costs, Comparisons and Generalizability.

8 Scriven's contributions to the field are well presented in Donaldson (2013), a book published as a tribute to him. One of the earliest in the field, his work includes 'The Methodology of Evaluation (1972)' and 'The Logic of Evaluation (1980)'.
9 My review is based on the 2015 version.
The Process checkpoint deals with how good, valuable or efficient the content and implementation of the evaluand are. Content means what the evaluand consists of, including its basic components or design. Implementation concerns how well or efficiently the evaluand was implemented or delivered to those who needed it. Davidson (2005: 65) mentions that under the Process checkpoint everything about the program except outcomes and costs should be examined.
While Process is about 'means', Outcomes covers all the 'ends', including what the processes are aimed at as well as other effects of the intervention. Under the Outcomes (or, interchangeably, effects or impacts) checkpoint, the main evaluative question is how good or valuable the impacts on immediate recipients and other impactees are.10 It is therefore required to identify all effects, including unintended impacts on all potential impactees, or at least to investigate the possibility. The sustainability of the program's effects is also important and should be covered here.
10 Scriven warns against using the term 'beneficiaries', since it carries the completely unacceptable assumption that all the effects are beneficial.

The Costs checkpoint examines whether the evaluand offers good or poor value: not just whether it stayed within budget, but whether the budget itself was excessively high or low, and whether there were more cost-effective alternatives that should have been considered.
Under the Comparisons checkpoint, the evaluand should be compared with
(1) an exemplary one or what is widely regarded as ‘best practice’ or state-of-
the-art, (2) a creative low-budget option, (3) an option with slightly more
resources allocated to it, or (4) a slightly more streamlined or economical
version.
Generalizability may not be an obligatory criterion, but the possibility is worth considering. Also called exportability or transferability, it deals with what elements of the evaluand, if any (e.g., an innovative design), might make it potentially valuable, or a significant contribution or advance, in another setting.
Davidson (2005) adopted the KEC for her evaluation methodology and reorganized the evaluative criteria around the five most relevant ones in the KEC. In addition to the three core sub-evaluation checkpoints, Process, Outcomes, and Comparative Cost-Effectiveness (combining Costs and Comparisons), and excluding the non-obligatory Generalizability, she identified Impactees (or consumers) and Value, under the 'foundations' heading, as key evaluative criteria. The rationale for the two additional criteria is that an evaluation needs to identify who might be affected by the evaluand and how to define what is 'good' or 'valuable' (Davidson 2005, 23-24).
3.2.2. CIPP Evaluation Model
The CIPP evaluation model is a comprehensive framework for conducting both
formative and summative evaluations (Stufflebeam 2003). It was created in
the late 1960s by as an alternative to the classic evaluation approaches at the
time, e.g., experimental design and objectives-based evaluation, which were
proved to be of limited use and often unworkable and even counterproductive
(Stufflebeam and Coryn 2014). The model has been further developed over
the years11, and been adopted and applied across the world and a wide range of
areas.
CIPP is an acronym for Context, Input, Process, and Product, which represent the four categories of evaluation relating to a program's goals, plans, actions, and outcomes respectively. The CIPP framework provides seven components of evaluation: Context, Input, Process, and Product, with Product subdivided into Impact, Effectiveness, Sustainability and Transportability. They may be employed selectively, in different sequences, and often simultaneously, depending on the needs of particular evaluations.

11 My review is based on the 2007 version of the model checklist (Stufflebeam 2007) as well as on Stufflebeam (2003) and Stufflebeam and Coryn (2014).
Context evaluation is about what needs to be done. It assesses needs,
assets, and problems within a defined environment. The Context evaluation
component is ideally done at an early stage of program development, with focus on the program's aims and evaluation design. For formative use, the model
assumes that the evaluators are involved from the beginning of the program and
throughout the program activities, in order to identify the problems and assess
program goals as well as to observe and record pertinent information residing
in the program’s geographic area. For summative evaluations, the focus is
more on judging goals and priorities by comparing them to the assessed needs,
problems, assets, and opportunities.
Input evaluation addresses the question of how it should be done. It is conducted at the program planning stage, dealing with competing strategies and the work plans and budgets of the intervention. It assesses the program's strategy against relevant research and development literature and in comparison with alternative strategies in similar programs. The assessment should also examine the program's work plan and schedule for sufficiency, feasibility, and political viability.
A Process evaluation is "an ongoing check on a plan's implementation plus documentation of the process". In Process evaluation, program activities are monitored, documented, and assessed. It helps the clients/stakeholders to coordinate and strengthen staff activities, to strengthen the program design, and to maintain a record of the program's process and costs. Process evaluation is more related to formative than to summative evaluation, but the information produced here is vital for interpreting evaluation results in Product evaluation, the following category.
Impact is the first component of Product evaluation, which asks "did it succeed?". A program's reach to the target audience is assessed by asking whether the right beneficiaries were reached. Evaluators judge the extent to which the individuals and groups served are consistent with the intended beneficiaries. Ideally, impact evaluation is done regularly and kept updated.
Effectiveness evaluation is about whether the beneficiaries' needs were met and about the quality and significance of outcomes. On the program's outcomes, evaluators conduct in-depth case studies of selected beneficiaries as feasible and appropriate, identify the program's full range of effects, positive and negative, intended and unintended, and judge its effectiveness in comparison to identified 'critical competitors', meaning similar programs conducted elsewhere.
Sustainability evaluation assesses the extent to which “a program’s
contributions are institutionalized successfully and continued over time”.
Evaluators are also to identify what program successes should and could be
sustained.
Transportability evaluation asks the question of “whether the processes
that produced the gains were proved transportable and adaptable for effective
use in other settings". It assesses the extent to which a program has been (or could be) successfully adapted and applied in other settings. This is an optional component, as in Scriven's KEC.
The CIPP Model has evolved, adopting new concepts and ideas in the field of evaluation. For example, Sustainability and Transportability were not distinct evaluation components in earlier versions of the CIPP Model, where they were included in an example checklist for summative evaluations consisting of 21 checkpoints. Their definitions have been elaborated in later versions.
3.2.3. Theory-driven Evaluations
Theory-driven (or theory-based) evaluation is “a contextual or holistic
assessment of a program based on the conceptual framework of program theory”
(by Chen in Mathison 2005, 415-9). Given that program theory is “a set of
assumptions of how the program should be organized and why the program is
expected to work”, the primary aim of theory-driven evaluation is to test if the
theory works. Taking up a holistic approach, it also serves to fulfill wider
purposes such as “to provide information on not only the performance or merits
of a program but on how and why the program achieves such result”. Theory-
driven evaluations are well established in evaluation discipline and have been
applied to numerous domains.
One of the most frequently cited theory-driven evaluation approaches, by Rossi, Lipsey, and Freeman (2004), advocates a comprehensive model for program evaluation, providing the following five domains generally involved in evaluating a program: (1) the need for the program, (2) the program's design, (3) its implementation and service delivery, (4) its impacts, or outcomes, and (5) its efficiency. These represent types of evaluation that can be addressed in separate evaluations. In a holistic model, they together form 'evaluation building blocks' in a hierarchy, as in Figure 5.
Figure 5. The Evaluation Hierarchy (from top to bottom):
Assessments of Program Costs and Efficiency
Assessments of Program Outcome/Impact
Assessments of Program Process and Implementation
Assessments of Program Design and Theory
Assessments of Need for the Program
Source: Rossi, Lipsey, and Freeman (2004, 80)
In the evaluation hierarchy, each dimension to be assessed is based on those beneath it. In other words, evaluation tasks at each level assume knowledge about the supporting issues below them in the hierarchy. Assessment of the need for the program, at the foundation of the hierarchy, provides diagnostic information on the nature of the problems and the need for intervention, on the basis of which the program design can be assessed as to whether the program theory is reasonable for addressing the problems and needs. Then
the evaluation may move on to the next level above, the assessment of program
process and implementation, that is, the task of assessing whether the
corresponding program activities are well implemented.
A key message of the evaluation hierarchy is that there are logical interdependencies between the levels. Assessing program outcomes is meaningful only when it rests on acceptable results from assessments of the logically prior issues, such as whether the program theory is sound in addressing the needs and social conditions the program is intended to improve and how well it is implemented, which are asked at the lower levels of the hierarchy. Assessment of program cost and efficiency requires supporting information from the levels below it in the hierarchy, regarding the social problems and the program theory addressing them, the implementation process, and program outcomes, which serve as 'building blocks' for the next-level evaluation.
The theory-driven evaluation model by Rossi and others provides the
dimensions to be examined in a holistic evaluation with a logical sequence. It
is worth noting that they place the assessment of cost and efficiency, represented by cost-benefit analysis, at the top of the evaluation hierarchy, meaning that the task assumes information and knowledge drawn from the other categories of evaluation, such as the program's needs, process and outcomes.
3.3. Definitions and Scope of the DAC Evaluation Criteria
3.3.1. Characteristics of the DAC Criteria
The DAC evaluation criteria, or Criteria for Evaluating Development Assistance, grew out of the donor community's efforts to coordinate and harmonize their evaluation activities from the late 1980s through the 1990s. Having their origin in the DAC Principles for Aid Evaluation developed by the DAC Expert Group on Aid Evaluation, the five evaluation criteria were adopted in 1991 (OECD 1991). The main ideas and definitions were updated in the Glossary of Key Terms in Evaluation published by the DAC in 2002. They are part of internationally agreed principles for development evaluation, stipulated in the DAC Quality Standards for Development Evaluation adopted in 2010.
The DAC Quality Standards for Development Evaluation are intended to
identify the key pillars needed for a quality development evaluation process and
product (OECD 2010). As stated in the document, the quality standards are
not mandatory, but meant to provide a guide to good practice. The document
also makes it clear that the standards are “not intended to be used as an
evaluation manual and do not supplant specific guidance on particular types of
evaluation, methodologies or approaches”.
Likewise, the DAC criteria are not meant to be obligatory. The use of the five criteria does not rule out the possibility of excluding existing ones or adding other criteria considered relevant to the specific characteristics of the evaluation and its context.12 In the Quality Standards for Development Evaluation, it is stated:
2.8 Selection and application of evaluation criteria
The evaluation applies the agreed DAC criteria for evaluating
development assistance: relevance, efficiency, effectiveness, impact
and sustainability. The application of these and any additional
criteria depends on the evaluation questions and the objectives of the
evaluation. If a particular criterion is not applied and/or any
additional criteria added, this is explained in the evaluation report.
All criteria applied are defined in unambiguous terms. (OECD 2010,
9)
Nonetheless, the five DAC criteria have been widely used among DAC members as well as other development organizations, including NGOs, to meet donors' requirements in reporting their activities. The practice of applying the DAC criteria is often criticized as rather mechanical, described as a template or box-ticking approach and even perceived as a 'straitjacket'. Even though the DAC requires all criteria applied to be defined in unambiguous terms, as quoted above, it is argued that several criteria are not well understood and that their use is often mechanical, excluding more creative evaluation processes (ALNAP 2006, 10-11).

12 An example is the DAC Criteria for the Evaluation of Humanitarian Assistance. The Active Learning Network for Accountability and Performance in Humanitarian Action (ALNAP) has introduced three additional evaluation criteria: connectedness, coherence and coverage, considering some unique features of humanitarian intervention (ALNAP 2006).
Given that the term ‘evaluation criteria’ refers to the dimensions to be
addressed in evaluations, it is fair to say that the DAC criteria represent the
aspects of value that a development intervention should fulfill. In other words,
to be judged successful in light of the DAC criteria, a development intervention should be relevant, efficient, effective and sustainable, and should bring about impacts. Naturally, questions arise about the definitions of the criteria and the standards of judgement against which assessments are to be made. For example, what does it mean to say a development intervention is relevant? What constitutes an effective intervention? What are the benchmarks of an efficient program? What quality and quantity of impacts do we expect? To what extent and for how long should an intervention be sustainable? I discuss these questions for each criterion below.
3.3.2. Relevance
In the OECD/DAC evaluation context, relevance is defined as “the extent to
which the aid activity is suited to the priorities and policies of the target group,
recipient and donor" (OECD n.d.). A more detailed description is found in the Glossary of Key Terms in Evaluation and Results Based Management:
The extent to which the objectives of a development intervention are
consistent with beneficiaries’ requirements, country needs, global
priorities and partners’ and donors’ policies. Note: Retrospectively,
the question of relevance often becomes a question as to whether the
objectives of an intervention or its design are still appropriate given
changed circumstances. (OECD 2002, 32)
Relevance is a rather subjective term and the interpretation may vary
according to ‘relevant to whom or to what’. As to the question of ‘relevant to
whom’, the DAC definition indicates the target group, partner (recipient)
government, and donor government. ‘Relevant to what’ involves needs,
priorities, and policies. Do the target group, recipient government, and donor government have the same needs and policy priorities? It is hard to say they always do. The target group's needs may depend on the context and be specific to local conditions. A recipient government may have different priorities from the people in need, possibly due to the political interests of those in charge of making policies or implementing the donor-supported programs. A donor's priorities may reflect its diplomatic, political or economic interests, which may not always be consistent with the needs of the target population.
A question also arises as to how to identify the policy priorities of the recipient government or the donor. For example, policy documents of developing countries, such as national development plans or multi-year sector strategies, tend to encompass almost all areas and sectors, because they need to serve as comprehensive strategies for national development. Thus, it is rather rare to find a development project that could not be justified by the policies of the recipient government. Likewise, one could easily find one way or another to justify almost any activity claimed to be a development intervention as consistent with the policies of donors and aid agencies. Moreover, the policy context would sufficiently reflect neither the diversity of target communities nor the differences in priorities at central and local levels.
So the definition of relevance itself does not serve as good guidance for making a thoughtful judgement about whether the intervention under assessment is relevant. It even bears the risk of misleading evaluators into using the policy context as the primary yardstick for assessing the relevance criterion. This issue has been widely criticized. Heider (2017), noting that meeting the bar for relevance is not all that hard, argues that the relevance criterion might be irrelevant in a world of complexity. Chianca (2008) points out that the context and significance of the intervention for donors and governments are important to understand in an evaluation but are not necessarily evaluative. He adds that the core function of the relevance criterion should be "to determine whether the intervention's design, activities, and initial results are adequate to respond to existing needs". Markiewicz and Patrick (2016) have replaced the term relevance with 'appropriateness' in their book, as they consider the latter more inclusive, encompassing the needs of key stakeholders and program beneficiaries.
Interestingly, the focus of the sample questions for the relevance criterion is more on the program's objectives than on the policy context. The DAC suggests:
In evaluating the relevance of a programme or a project, it is useful to
consider the following questions:
• To what extent are the objectives of the programme still valid?
• Are the activities and outputs of the programme consistent with
the overall goal and the attainment of its objectives?
• Are the activities and outputs of the programme consistent with
the intended impacts and effects? (OECD n.d.)
The questions ask whether the programme is designed in such a way that the objectives and intended effects can be achieved by the intervention, which is similar to an assessment of the so-called program theory, i.e., whether the program is logically and theoretically sound in achieving the outcomes. The sample questions thus address at least some aspects of the intervention's design, but not sufficiently, in the sense that they focus on the objectives and overall goal rather than on the need for the intervention itself. Taking only the definition and sample questions into consideration, one would view an intervention as relevant if it is aligned with policies and if it is plausible that the goal can be achieved with the planned activities.
3.3.3. Effectiveness
'Effectiveness' is a term that causes confusion. It must be interpreted differently from the 'effects' of an intervention or from whether the intervention is 'effective'. The DAC defines 'effectiveness' as "a measure of the extent to which an aid activity attains its objectives". The Glossary of Key Terms provides a more elaborate definition of effectiveness as follows:
The extent to which the development intervention’s objectives were
achieved, or are expected to be achieved, taking into account their
relative importance. Note: Also used as an aggregate measure of (or
judgment about) the merit or worth of an activity, i.e. the extent to
which an intervention has attained, or is expected to attain, its major
relevant objectives efficiently in a sustainable fashion and with a
positive institutional development impact. Cf. Effect: Intended or
unintended change due directly or indirectly to an intervention.
(OECD 2002, 20)
This detailed definition includes the element of the relative importance of objectives. It also recognizes that the term can be used as an aggregate measure of the merit or worth of an activity, with consideration of the relevance of objectives, efficiency, and the sustainability of development impact. Interestingly, a 'Cf.' explaining the term 'effect' is added. According to the note, 'effect' means "intended or unintended change due directly or indirectly to an intervention", which is very similar to the definition of the 'impact' criterion discussed later. In the Glossary, the definitions of effects, impact, results, and outcome are so similar that distinguishing among those terms seems to carry little meaning.
The above discussion indicates that the term 'effectiveness' can be understood in different ways. Nonetheless, an assessment of the 'effectiveness' of an intervention generally focuses more on achievements against its pre-set objectives than on its effects. The sample questions that the DAC suggests to consider are:
• To what extent were the objectives achieved or are likely to be
achieved?
• What were the major factors influencing the achievement or
non-achievement of the objectives?
In fact, the DAC definition of 'effectiveness' is consistent with the one widely accepted in the evaluation field in general. In the Encyclopedia of Evaluation, 'effectiveness' is defined as "the extent to which an evaluand produces desired or intended outcomes" (by Davidson in Mathison 2005, 122). In the same entry, however, it is asserted that "effectiveness alone provides a poor assessment of overall evaluand merit or worth". The problems are that an evaluand can be effective in producing desirable intended outcomes yet at the same time produce unintended negative effects or be overly costly. Moreover, demonstration of a causal link between the evaluand and the desired outcomes is required before the evaluand can be claimed to be effective.
The same applies to the DAC definition of effectiveness. It would not be fair to say that a development intervention is effective when it yields desirable intended results accompanied by serious negative impacts or at an excessive cost. The DAC framework provides other criteria, such as impact and efficiency, to assess 'unintended' or 'negative' impacts and costs. These may serve as safeguards, as long as the assessments are properly combined into an overall assessment. With regard to a causal link, the 'effectiveness' criterion does not ask a why question, so it is left to evaluators whether to investigate if the achieved objectives are outcomes caused by the intervention rather than coincidental changes.
There are other problems associated with using preset objectives as evaluation criteria, or so-called objective-oriented evaluation (Davidson 2005, Scriven 1991). First of all, it makes evaluators concentrate on the effects the program is intended to bring about, and thus, often unconsciously, miss unintended outcomes. Second, if the program objectives are not clearly defined, or are too ambitiously or too conservatively targeted, their achievement by itself would not provide meaningful information about the program's merit, worth or importance. Objective-oriented evaluation may also induce those who set the objectives at the design stage to lower the targets so that they can be achieved easily. In addition, differences in the difficulty or importance of objectives can be problematic for programs with multiple objectives.
The 'effectiveness' criterion involves weaknesses similar to those of objective-oriented evaluations: the validity of the objectives and target levels, the relative importance of objectives, competing or even conflicting objectives, the tendency to set objectives that are more visible and easier to address, and less attention to unintended or negative side effects of the intervention. It also rests heavily on the assumption that the objectives have causal links with the intervention. These issues need to be addressed carefully for the assessment of 'effectiveness' to provide meaningful information for the overall evaluation results.
3.3.4. Efficiency
Efficiency is a term that causes as much confusion as effectiveness. In the field of evaluation in general, efficiency is defined as "the extent to which an evaluand produces outputs and outcomes without wastage of resources such as time, efforts, money, etc." (by Davidson in Mathison 2005, 122). It differs from 'cost-effectiveness' in that the latter compares both costs and results (by Levin in Mathison 2005, 90). Chianca (2008) argues that cost-effectiveness is a more comprehensive term for the concepts embedded in the efficiency criterion. Nonetheless, the distinction between the two is not very clear.
In the DAC documents13, the definition of efficiency and sample questions appear as below:

13 The DAC Glossary provides a rather short and simple definition of efficiency as "a measure of how economically resources/inputs (funds, expertise, time, etc.) are converted to results".
Efficiency measures the outputs—qualitative and quantitative—in
relation to the inputs. It is an economic term which signifies that the
aid uses the least costly resources possible in order to achieve the
desired results. This generally requires comparing alternative
approaches to achieving the same outputs, to see whether the most
efficient process has been adopted. When evaluating the efficiency
of a programme or a project, it is useful to consider the following
questions:
• Were activities cost-efficient?
• Were objectives achieved on time?
• Was the programme or project implemented in the most
efficient way compared to alternatives?
Proper attention should be given to 'qualitative and quantitative' and 'cost-efficient'. An assessment of 'efficiency' should examine the input-output relations with both qualitative and quantitative consideration, and whether the intervention achieved the desired results, presumably the outputs stated in the first sentence, through the least costly process in comparison with alternatives. According to the definition, an intervention can be regarded as efficient if it achieved the outputs with the least costly resources compared to alternatives.
On the other hand, the question whether the intervention was completed
on time involves a comparison to the initial plan. It is important to follow the
plan, considering the cost of delay, but only when the plan was reasonable. Once the benchmark is the initial plan, program implementers are pushed to keep to the planned schedule even when they find serious flaws in the initial design and adjustments are necessary. Especially when a project is designed by a donor and implemented in a developing country, there are often cases where the initial plan does not fully consider the local context and potential risks. Sometimes it can be more efficient to make necessary adjustments to the plan than to follow a wrong or flawed plan.
Another challenge in assessing efficiency as defined is that it is hard to find benchmark programs against which to compare the program's cost and output. Even if one attempts to conduct a cost-effectiveness analysis, which compares the costs of alternatives for outcomes measured in the same unit, measurement is difficult because it assumes an identical objective (or effect) across the projects being compared. Therefore, judgements on efficiency can be very subjective unless good benchmarks can be found.
3.3.5. Impact
While ‘effectiveness’ means the extent to which an intervention achieves its
objectives, ‘impact’ considers broader results of the intervention, as defined:
The positive and negative changes produced by a development
intervention, directly or indirectly, intended or unintended. This
involves the main impacts and effects resulting from the activity on
the local social, economic, environmental and other development
indicators. The examination should be concerned with both
intended and unintended results and must also include the positive
and negative impact of external factors, such as changes in terms of
trade and financial conditions. When evaluating the impact of a
programme or a project, it is useful to consider the following
questions:
• What has happened as a result of the programme or project?
• What real difference has the activity made to the beneficiaries?
• How many people have been affected?
Hardly anyone would disagree that a development intervention should yield long-term positive impacts, whether intended or unintended, and minimize negative impacts, if any. But conceptualizing development impacts is not that easy. If we are looking for impacts "positive and negative, primary and secondary long-term…, directly or indirectly, intended or unintended", as defined in the DAC Glossary (OECD 2002, 24), the scope of evaluation becomes very wide.
The most controversial issue concerning this criterion has been methodological, namely how, and which, impacts are to be measured and assessed. At its most ambitious, the impact criterion requires a rigorous evaluation design with rather complicated and expensive methodologies, e.g., randomized controlled trials (RCTs), to provide scientific evidence of the impacts. The prevalent view in the development evaluation community, however, is that rigorous evidence-based evaluation is neither possible nor feasible in most cases, and not even necessarily desirable (Forss and Bandstein 2008). Development impacts are diverse and cannot be defined in a simple way, so it is almost impossible to design an evaluation that could provide answers to multiple questions. Long-term impacts, if they have not yet occurred at the time of evaluation, are not feasible to assess. A more practical reason is that it is too costly.
For these reasons, some development agencies exclude the impact criterion when making overall judgements on interventions. For example, ADB recommends that the impact criterion be considered under the 'other performance assessment' section and not synthesized into the overall assessment of projects (ADB 2016, 23).
3.3.6. Sustainability
In the Glossary, sustainability is defined as the “continuation of benefits
from a development intervention after major development assistance has been
completed” (OECD 2002, 36). It also means “the probability of continued
long-term benefits or the resilience to risk of the net benefit flows over time”.
Detailed description and evaluation questions are as below:
Sustainability is concerned with measuring whether the benefits of an
activity are likely to continue after donor funding has been withdrawn.
Projects need to be environmentally as well as financially sustainable.
When evaluating the sustainability of a programme or a project, it is
useful to consider the following questions:
• To what extent did the benefits of a programme or project
continue after donor funding ceased?
• What were the major factors which influenced the achievement
or non-achievement of sustainability of the programme or project?
The definition of sustainability assumes that the intervention under evaluation produces benefits that are worth continuing. It does not ask how long the benefits should continue, or at what cost. Relevant questions arise: Would it be worthwhile to maintain a program that is inefficient and likely to cost more than the benefits it generates, or that produces unintended negative effects? What if the benefits become insignificant in the near future, not because the intervention is a failure but because it inevitably has a short life, due to rapid changes in technology for example?
Whether or not an intervention is worth continuing depends on two
questions: first, whether the intervention yields sufficient beneficial outcomes
and second, whether the benefits will be enough to justify the costs to be
incurred in sustaining the outcomes.
The first question, whether the intervention under evaluation yields sufficient outcomes and is likely to do so in the future, is related to the other criteria, i.e., effectiveness and impact. It may not be sensible to measure "whether the benefits of an aid activity are likely to continue after donor funding has been withdrawn" if the development outcomes it generates are insignificant or if negative externalities exist. So, defining the scope of the sustainability of an intervention should be based on the assessments under the 'effectiveness' and 'impact' criteria.
The second question, whether the future benefits can justify the costs to be incurred, is about the efficiency or cost-effectiveness of maintaining the intervention. It would not be worthwhile to continue a program if it is likely to cost more than the benefits it would generate. After donor funding is withdrawn, the costs of continuing the program are supposed to be borne by the recipient government or community, and they would have no incentive to maintain the program if it is not cost-effective.
The evaluation models examined earlier also support these arguments. In the KEC model, the sustainability of a program's effects is covered under the 'Outcomes' category. The CIPP model suggests that in sustainability evaluation evaluators need to identify what successes should and could be sustained. Using the concept of the evaluation hierarchy in theory-based models, the assessment of sustainability rests on the findings under the effectiveness, impact and efficiency criteria as its building blocks.
To summarize, the scope of sustainability should consider: 1) the magnitude of the benefits which have been, or are likely to be, generated after the withdrawal of donor funding; and 2) the costs to be incurred to sustain those benefits during the expected life of the intervention.
3.4. Criterion in Cost-Benefit Analysis – Net Present Value
The NPV is one of the indicators used to measure the social welfare generated by a project being assessed, and probably the most reliable one (de Rus 2010, 129, Boardman et al. 2011, 13). It summarizes the social value of a project in a single figure by subtracting all the costs (C) from the benefits (B) accrued over the project life (n), discounted at the appropriate rate (r). It is generally expressed as follows:
$$\mathrm{NPV} = \sum_{t=0}^{n} \frac{B_t - C_t}{(1+r)^t}$$
B_t and C_t represent the benefit and cost at year t respectively, which include
direct and indirect benefits and costs as well as those for which market prices
do not exist. If the NPV is zero, the project’s present value of benefits is equal
to the present value of its costs, which makes the decision between accepting
and rejecting the project a matter of indifference. If the NPV is greater than
zero, investment in the project will yield more benefits, compared to no
investment or to investing in an alternative with a lower NPV. Two general
statements can be drawn: 1) a project is worth investing in if the NPV is
positive; and 2) a project with a greater NPV is worthier than one with a
smaller NPV. So the decision rules are: 1) for a single project, adopt the
project if its NPV is positive; 2) out of multiple, mutually exclusive projects,
select the project that maximizes the net social benefit, that is, the one with
the largest NPV.
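These decision rules are mechanical enough to be expressed in a few lines of code. The sketch below is a minimal illustration of the NPV formula and the two rules, assuming yearly cash-flow lists; the function names (npv, decide) are mine, not from any agency toolkit.

def npv(benefits, costs, r):
    # Net present value: discount each year's net benefit (B_t - C_t)
    # at rate r; index 0 corresponds to Year 0.
    return sum((b - c) / (1 + r) ** t
               for t, (b, c) in enumerate(zip(benefits, costs)))

def decide(projects, r):
    # Rule 1: keep only projects with a positive NPV.
    # Rule 2: among mutually exclusive projects, pick the largest NPV.
    values = {name: npv(b, c, r) for name, (b, c) in projects.items()}
    worthy = {name: v for name, v in values.items() if v > 0}
    return max(worthy, key=worthy.get) if worthy else None

# Example: two mutually exclusive one-year projects at r = 10%.
projects = {"A": ([0, 120], [100, 0]), "B": ([0, 115], [100, 0])}
print(decide(projects, 0.10))  # "A"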
CBA is generally thought to measure efficiency, but the definition is
much broader than the one in the DAC criteria. CBA in principle examines all
consequences against the costs already invested as well as those to be incurred
in the future. Therefore, it helps organize an understanding of the
consequences of an intervention and consider the multiple implications
together, by analyzing each of the essential elements and synthesizing the
findings from the analyses. This corresponds to what evaluators do when
conducting evaluations.
The principle of CBA is that an intervention is worth implementing when
the benefits exceed its costs. The net benefit is calculated as the net present
value (NPV), which is the main indicator of the social value of a project.
Based on CBA theory, which considers all direct and indirect effects of a
project against its costs over a certain period, I argue that CBA examines all
aspects that the DAC criteria suggest. As an indicator that represents the
social value of a project, the NPV would reflect the needs (relevance), the
benefits in relation to the costs (efficiency), the increase in social welfare
(effectiveness and impact), and the period over which the benefits are
maintained (sustainability).
Figure 6. Net Present Value (NPV) and DAC Criteria
The NPV also allows comparison between projects, i.e., which project is
more worthy or successful, as it shows the net social value in a single figure.
Comparability is one of the main purposes for which the DAC criteria were
developed, so it is also possible and meaningful to compare the evaluation
results between the DAC framework and CBA.
In the DAC framework, effects are generally called the ‘results’ of the
intervention, which include intended positive effects, supposed to be assessed
under the effectiveness criterion, as well as unintended effects, whether
positive or negative, under the impact criterion. By these definitions, the
effects or results dealt with in CBA and the DAC framework are not very
different. However, how those effects are viewed and how much value is put
on them are quite different.
A CBA starts from the ceiling, meaning that the first job is to identify
those effects as exhaustively as possible. Then it is required to examine
whether the effects are caused by the intervention, whether they are
double-counted, and what their net value is (where social costs accompany the
social benefit under discussion) or their present value (where they occur at
different times).
On the other hand, the DAC definitions of effectiveness and impact do
not provide explicit boundaries of causality, overlapping, or time. So an
assessment generally starts from the minimum, the intended positive results,
and then adds other effects which are (luckily) found by evaluators and
deemed, rather intuitively, to be plausibly caused by the intervention. There
seems to be no clear standard on how to value the magnitude of those effects.
In the case of long-term outcomes, often used as a synonym for impact, the
present value is rarely considered. It is also possible that some negative
impacts, whether intended or unintended, exist but were not considered by
evaluators.
The limitation of inferring causality in the DAC framework is, at best,
similar to that in CBA, and often much looser (as seen in many reports in the
cases examined in the next chapter). When impacts plausibly caused by the
intervention are listed, overlaps may occur between the claimed results.
3.5. Discussion
The review of multiple evaluation theories shows that the DAC criteria cover
most of the dimensions that other established evaluation models suggest.
Figure 7 illustrates the comparison of evaluation criteria or dimensions
examined in this chapter.
Figure 7. Comparison of Dimensions or Criteria in Evaluation Models
Source: organized by author.
In the DAC evaluation framework, the five criteria are used as analytical
categories. Evaluators assess the evaluand from each dimension or category
of analysis defined in the DAC criteria. Based on the assessments in all
categories, the overall conclusion, that is, the final evaluative judgement is
drawn. The DAC criteria framework is for a summative evaluation, which
focuses more on outcomes than on process. Assessment of process is not
explicit in the DAC criteria. The word ‘process’ appears only once in the
description of the efficiency criterion, “whether the most efficient process has
been adopted”. The DAC definition of evaluation also indicates that
evaluation is the assessment of an on-going or completed aid activity,
determining the worth or significance of the activity (OECD 2002, 21-22).
Therefore, the purpose of evaluation using the DAC criteria is more for
accountability, determining whether the intervention was successful, as well
as for helping make decisions on whether to continue, expand or export the
intervention, and on what conditions or with what modifications.
The analysis found that the DAC criteria are interrelated, and sometimes
one is a precondition of another. For example, relevance of the objectives and
program design is, by the definitions, a necessary condition of a project being
effective and efficient. If the objectives are not adequate (relevant), there is
no point in measuring the extent to which the objectives were achieved, which
is effectiveness. A similar argument applies to the efficiency criterion: it may
not be important to follow the plan and complete the project on time or on
budget if the initial plan was irrelevant and not designed for efficient
implementation. Sustainability assumes that the project under evaluation
produces and will produce a net benefit, which encompasses both positive and
negative outcomes whether intended or unintended, so the assessments in the
effectiveness and impact criteria should be considered.
The effectiveness criterion can overlap with impact. By the DAC
definition, assessment of impact requires a comprehensive analysis of the
results caused by the intervention, which include positive and negative,
intended and unintended results. Effectiveness measures the extent to which
the intervention achieved its objectives, that is, the intended positive results.
For this reason, some argue that effectiveness could be subsumed under impact
rather than be a stand-alone criterion (Chianca 2008).
The issues of interrelation and overlap between criteria have implications
for the relative weight of criteria and the method of synthesis, which will be
discussed in Chapter 5. Relative weight is also an issue within a criterion.
For example, in relevance, which is more important: the priority of the
recipient government, the donor’s policy, or the needs of the target population?
If there are multiple objectives and the intervention fully achieved a less
important objective but only partially achieved a more important one, what
overall judgement can be made in the effectiveness criterion? This is the topic
of the standard of judgement, which will be covered in Chapter 4.
CHAPTER 4. STANDARD OF JUDGEMENT – REVIEW OF EVALUATION
REPORTS IN KOREAN AGENCIES
4.1. Overview
The term ‘standard’ in evaluation means the level of quality needed for a
certain judgement. Constructing the standards against which performance is
compared constitutes the second and third steps in the logic of evaluation.
Since the DAC criteria provide only brief definitions and a few sample
questions, as discussed in the previous chapter, the analysis of judgement
standards requires both conceptual and empirical approaches. For this reason,
this chapter presents an analysis of 65 ex-post evaluations published by two
main ODA agencies in Korea, KOICA (Korea International Cooperation
Agency) and EDCF (Economic Development Cooperation Fund), from 2013
to 2015.
As an emerging donor and a member of the DAC, Korea has incorporated
the DAC evaluation principles into its ODA evaluation policy, which requires
all agencies to use the five criteria as the primary evaluation framework. As
required by the evaluation policy, all evaluations under review used the DAC
criteria and reported final ratings by criterion. The main question of the
analysis is whether the DAC criteria serve as a good framework to fulfill the
evaluation purpose, that is, to provide credible and useful information for
learning and decision-making. Specifically, the evaluation reports are
examined with three questions: 1) what evaluation questions do they ask
under each criterion; 2) what methodologies and evidence do they use to
answer the questions; and 3) whether the findings are consistent with the
ratings.
KOICA and EDCF have their own evaluation units and guidelines.
KOICA, responsible for Korea’s bilateral grants, established its Development
Cooperation Evaluation Guidelines in 2008 and updated them in 2014. Since
2013, KOICA has adopted a “Project Result Rating System” by which all
ex-post evaluations are required to assess and rate projects in accordance with
the DAC criteria. EDCF also applies its own rating system in ex-post
evaluations.
Table 3. Rating System and Scoring Scale

Relevance (average of sub-ratings = a)
  1. Relevance to development strategy and needs of partner country, and to
     Korea’s development cooperation strategy (4 3 2 1)
  2. Relevance of design and implementation (4 3 2 1)
  3. Ownership of the partner country (4 3 2 1)
Efficiency (average = b)
  1. Cost efficiency (within the planned budget) (4 3 2 1)
  2. Time efficiency (within the planned time frame) (4 3 2 1)
  3. Results against inputs (4 3 2 1)
Effectiveness/Impact (average = c)
  1. The extent to which objectives are met (4 3 2 1)
  2. Positive or negative impacts on society, economy, institutions (4 3 2 1)
Sustainability (average = d)
  1. Human resources, institutional and financial aspects (4 3 2 1)
  2. Maintenance capability and management system (4 3 2 1)
Total score: a + b + c + d (maximum 16)
Source: CIDC Sub-Committee on Evaluation (2015).
All ex-post evaluation reports published by EDCF and KOICA during
2013-2015 were collected for the analysis. For the three years, 65 ex-post
evaluations were available, 47 by KOICA and 18 by EDCF. Brief
information on the evaluation reports collected is in Table 4. Figure 8
illustrates the classification of the projects by sector.
Table 4. Descriptive Data of the Samples

                                            All    KOICA   EDCF
Number of reports                            65      47     18
  2013                                       25      19      6
  2014                                       27      21      6
  2015                                       13       7      6
Average project size ($ million)            9.1     4.4   21.4
Timing (years after project completion)     3.9     3.5    5.0
Duration of evaluation (months)             5.7     5.5    5.8
Figure 8. Evaluation Reports Classified by Sector
The sample consists of project evaluations with an average size of $9.1
million, conducted about four years after project completion on average. All
were conducted independently, with an average duration of six months.
Information collection methods included desk review, interviews, and field
visits. Some conducted surveys, but not on a large scale. It was rare to find
evaluations with detailed descriptions of how interview or survey respondents
were selected or what methods were used.
In the analysis, the following three questions are examined in each
criterion:
• Question 1: what evaluation questions (sub-criteria) do they ask?
• Question 2: what methods and evidence do they use to answer the
questions? What are the standards of judgement?
• Question 3: how was the evaluative conclusion drawn? Is the rating
consistent with the findings?
4.2. Result of the Analysis
4.2.1. Relevance
Evaluation questions asked in the relevance criterion were diverse. Figure 9
shows the list of 14 different questions and how many evaluations out of 65
asked each of them. Nonetheless, most of the questions largely fall into two
categories of sub-criteria, one related to policies and the other concerned
with program design or the implementation process.
Figure 9. Evaluation Questions in Relevance Criterion
Questions related to policies asked whether the project under evaluation
was aligned with the development strategies and priorities of the recipient
country, Korea’s ODA policy and the agency’s priorities, or global
development agendas such as the MDGs, the Paris Declaration (PD) or the
Accra Agenda for Action (AAA). Questions on program design and
implementation mostly dealt with whether the plan was appropriate in terms
of schedule and budget, or whether the process of planning or implementation
conformed with agreed requirements.
Not many evaluations, 15 out of 65, examined whether the objectives
were valid and whether the project elements were designed to achieve the
objectives. Surprisingly, more than 20% (15 evaluations) assessed neither the
validity of the objectives nor the logical links between the intervention and the
objectives. Assessment of the need for the intervention was rare. Even when
the question was asked, the supporting arguments relied heavily on project
documents, e.g., ex-ante appraisal reports. In other words, it seems that the
needs of the target population are not the primary aspect to be examined…
Table 5 summarizes the standards of judgement and the sources of
supporting information on which the assessments in the sub-criteria were
based.
Table 5. Main Standards of Judgment in Relevance Criterion

Sub-criterion: Consistency with development strategy and policy priority in
the recipient country
  Standard: consistent if the objectives of the intervention are aligned with
  those in policy documents (national development plan, sector/multi-year
  strategy, etc.)
  Sources: mainly document review; partly interviews with government
  officials or, rarely, with beneficiaries

Sub-criterion: Consistency with Korea’s ODA policy and sector/country
strategy
  Standard: consistent if the project sector or objectives are aligned with
  Korea’s ODA policy document or the agency’s sector strategy
  Sources: document review

Sub-criterion: Consistency with international agenda (MDGs, harmonization)
  Standard: consistent if the project objectives are included in the MDGs or
  no overlap with other donors’ activities is found
  Sources: document review; in some cases, interviews with donor agencies

Sub-criterion: Relevance of project design/decision-making process
  Standard: relevant if the plan in terms of schedule and budget, or the
  process of planning and implementation, was appropriate
  Sources: mainly document review; interviews with stakeholders

Sub-criterion: Relevance to target area
  Standard: relevant if the target area was selected in due process and if
  project elements (equipment, training, technology, etc.) were appropriate to
  local conditions
  Sources: document review; interviews; site visits
It is not surprising that the first question, asked by all evaluations in the
sample, was whether the aim of the intervention was consistent with the
development policy of the recipient country. A majority of the evaluations,
86%, also examined consistency with Korea’s ODA policy and sector strategy.
Assessments on these questions were largely based on the findings from
document reviews, e.g., national development plans, sector/multi-year
strategies, Korea’s ODA policy, etc. Some evaluations presented the results
of interviews with stakeholders, e.g., government officials in the recipient
country, but it seems natural that the findings were very positive considering
the nature of the questions. Not a single case found the project objectives to
be inconsistent with the policies or development strategies of either the
recipient government or Korea.
Findings in the relevance criterion do not necessarily require in-depth
research, whether in document review or on site. The most frequently asked
question, on consistency with the strategy and priorities of the recipient
country, was easily answered with findings from policy documents.
Questions concerned with project design, e.g., whether the planning process
was appropriate or whether participation of and cooperation with the recipient
government was sufficient, were assessed based on project documents or on
interviews with stakeholders, who gave positive answers in most cases. As a
result, the overall ratings are rather high, around 3.5 on average on a 4-point
scale, as shown in Table 6.
Table 6. Average Ratings on Relevance

Year   KOICA         EDCF    Rating description
2013   3.65 (2.74)   3.82    4: very relevant
2014   3.43 (2.57)   3.48    3: relevant
2015   3.27          3.73    2: partly relevant
All    3.45          3.68    1: irrelevant
Note: KOICA used a 3-point scale in 2013 and 2014. For comparison
purposes, the rescaled (1-4) ratings are presented, with the original scores in
parentheses (the rescaled figures correspond to the originals multiplied by 4/3).
The same applies to the tables for the other criteria.
It is fair to say that the primary focus on the policy context results in
overall positive assessments in the relevance criterion. There are many cases
that received scores over 3 (meaning the project was relevant) although the
project design was not suited to producing the intended outcomes. For
example, a project for the construction of a solid waste recycling facility in
Ulaanbaatar, Mongolia, supported an RDF (refuse-derived fuel) production
facility that was inappropriate to local conditions and turned out to be
completely useless. This flaw in project design was discussed in the
relevance assessment but did not prevail in the judgement. The dominant
justification for relevance was that the overall purpose of the project was
consistent with the development policy of Mongolia, which led to a score of
2.7 out of 4, higher than the project seems to deserve.
Overall, the assessments of the relevance criterion in the evaluations
reviewed in this study did not involve serious investigation into the need for
the project, the local priorities, or the logical linkage between the project
activities and intended results.
4.2.2. Efficiency
In the DAC definition, ‘efficiency’ criterion focus on whether the intervention
was implemented in the least costly manner. In evaluations of Korean
agencies, the most common question was whether the project was completed as
planned in terms of time and budget. Half of the sample assessed the process
with focus on communication and cooperation with stakeholders, e.g., the local
government. Only 26% made assessments the results against input, and less
than 10% attempted a comparison to alternatives, e.g., similar project
implemented by other development agencies.
Other questions in the efficiency criterion concern the process of
communication or management, which often appears in the relevance criterion
too. Some evaluators who attempted to conduct a process evaluation asked
related questions in either the relevance or the efficiency criterion, as the DAC
definitions do not explicitly mention an assessment of process. The list of
questions and their frequency is presented in Figure 10.
Figure 10. Evaluation Questions in Efficiency Criterion
Information on whether the project was completed within the planned
budget and on time is not hard to obtain. In most cases, the assessment in the
cost- and time-efficiency sub-criteria was based on the records in project
documents. The judgement was made according to the rating standards in the
guidelines, described in Table 7. For example, if a two-year project with a
budget of $1 million was completed in 2.5 years at a total cost of $1.2 million,
it would be assessed as ‘partly efficient’ in terms of time and ‘efficient’ in
terms of cost (a small illustrative sketch follows Table 7). This implies that
the project plan is assumed to embody the most efficient use of resources; as a
result, the plan becomes the most important standard of judgement against
which efficiency is assessed.
Table 7. Main Standards of Judgment in Efficiency Criterion

Sub-criterion: Cost efficiency
  Standard: very efficient if completed within the planned budget; efficient if
  completed within 120% of the planned budget; inefficient if exceeding 150%
  of the planned budget
  Sources: document review; interviews

Sub-criterion: Time efficiency
  Standard: very efficient if completed on time as planned; efficient if
  completed within 120% of the planned time; inefficient if exceeding 150%
  of the planned time
  Sources: document review; interviews

Sub-criterion: Communication process, partnership
  Standard: degree of involvement of stakeholders (e.g., local governments)
  in the process
  Sources: document review; interviews with stakeholders

Sub-criterion: Results against input
  Standard: the extent to which the project cost is deemed reasonable in
  comparison to alternatives
  Sources: review of similar interventions in the area; interviews

Sub-criterion: Efficiency in management
  Standard: status of operation of the facility/equipment supported
  Sources: site visits; interviews
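As a rough sketch of how these time- and cost-efficiency thresholds work, the function below maps the ratio of actual to planned cost or time to a rating. The guidelines quoted in Table 7 leave the 120-150% band unnamed; assigning it to ‘partly efficient’ is an assumption of mine, though it is consistent with the worked example above, where a 125% time overrun was rated ‘partly efficient’.

def efficiency_rating(actual, planned):
    # Thresholds at 100%, 120%, and >150% follow Table 7;
    # treating the 120-150% band as 'partly efficient' is an assumption.
    ratio = actual / planned
    if ratio <= 1.0:
        return "very efficient"
    if ratio <= 1.2:
        return "efficient"
    if ratio <= 1.5:
        return "partly efficient"
    return "inefficient"

print(efficiency_rating(2.5, 2.0))              # partly efficient (time)
print(efficiency_rating(1_200_000, 1_000_000))  # efficient (cost)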
Most of the projects under evaluation exceeded the planned time, and in
some cases additional budget was provided. If judgements had been made
strictly in accordance with the standards in the guidelines, many projects
would have been assessed as ‘inefficient’ in the time- and cost-efficiency
sub-criteria. When a project was delayed by more than 50% of the planned
period, which would fall into the ‘inefficient’ category, the delay was often
justified by inevitable or unforeseen circumstances, for example customs
procedures or slow administration in the recipient government. Only two
projects received an overall assessment of being inefficient.
Table 8. Average Ratings on Efficiency

Year   KOICA         EDCF    Rating description
2013   3.01 (2.26)   3.45    4: very efficient
2014   3.37 (2.52)   3.22    3: efficient
2015   2.93          3.30    2: partly efficient
All    3.10          3.32    1: inefficient
Table 8 presents the average ratings on efficiency by agency and by year.
It indicates that the average falls into the ‘efficient’ category. Time- and
budget-efficiency were the dominant sub-criteria that lowered the overall
rating relative to relevance, even with some excuses of inevitability.
4.2.3. Effectiveness and Impact
Effectiveness and impact criteria are discussed together in this section as
evaluations by KOICA combine the two criteria and give one rating.
Questions in the effective criterion are rather standardized into four types
(Figure 11). Most of the evaluations asked about achievement of output and
outcome (goals) as the primary sub-criteria, which is not surprising in
89
consideration of the DAC definition of effectiveness. The other frequently
asked question is whether the outputs, i.e., facilities or equipment provided by
the project, were well utilized. This question also appears in efficiency
criterion. The level of beneficiaries’ satisfaction served as another important
sub-criterion.
Figure 11. Evaluation Questions in Effectiveness Criterion
In the impact criterion, 80% of evaluations identified ‘achievement of
long-term goals’ as the primary sub-criterion. Other questions considered
economic and social impact, impact on the recipient’s policy and system, and
so on. 26% of the evaluations raised a question about unintended impacts,
while 20% asked about the project’s influence on the bilateral relationship
between the recipient country and Korea. Impact on the environment was
considered in five evaluations; others covered environmental issues, along
with gender mainstreaming, under the cross-cutting issues, which are not
included in the calculation of the final rating.
Figure 12. Evaluation Questions in Impact Criterion
Table 9 summarizes how, and with what supporting information, the
judgements in the effectiveness and impact criteria are made. Achievement
of outputs or outcomes is in most cases measured against the preset targets in
the original plan. Generally, measuring outputs is straightforward and the
records are readily available. Most of the projects succeeded in producing the
planned outputs. Measuring achievement in outcomes or long-term goals, on
the other hand, is a challenging task. It was rather common that the targets or
indicators for outcomes had not been clearly set, so the assessments were often
based on output indicators only or on evaluators’ conjecture.
Under the impact criterion, various impacts were described, but rarely
with plausible causal links. No evaluation attempted a causality test. In
many cases, the assessments were based on speculation. Claimed impacts
were often supported by macro data covering a far wider geographical area or
bearing little relation to the project’s results. For example, in the evaluation
of a project whose main activity was the provision of medical equipment, it
was stated that the project was deemed to contribute to improved health
conditions and a drop in fatalities in the target area on the basis of improved
health indicators (EDCF 2014). However, the claims are highly exaggerated,
as the indicators and data came from regional statistics on which the influence
of the project would have been very limited.
Table 9. Main Standards of Judgment in Effectiveness/Impact Criteria

Sub-criteria: Achievement of planned outputs; achievement of planned
outcomes
  Standard: very effective if over 90% of the planned output/outcome was
  achieved; effective if 70-90%; partly effective if 50-70%; ineffective if less
  than 50%
  Sources: document review; data analysis; interviews; surveys; site visits

Sub-criterion: Utilization of outputs
  Standard: the extent to which the output is operational as planned
  Sources: site visits; interviews; questionnaires

Sub-criterion: Impact on society, economy and institutions
  Standard: whether the project is deemed to contribute to improvements in
  society, economy, institutions, etc.; the degree of importance of the project
  Sources: document review; data analysis; site visits; interviews;
  questionnaires

Sub-criterion: Beneficiaries’ satisfaction level
  Standard: the extent to which the beneficiaries are satisfied
  Sources: interviews; questionnaires

Sub-criterion: Unintended outcomes
  Standard: whether unintended negative impacts exist
  Sources: site visits; interviews; questionnaires
The average ratings on effectiveness and impact are presented in Table
10. While many of the evaluations found that the achievement in outcomes
was not very substantial, the achievement in outputs raised the overall score on
effectiveness. The average score on impact does not seem reliable, because
the assessments were mostly based on speculation, and even on expectation,
without plausible causal links.
Table 10. Average Ratings on Effectiveness/Impact

        KOICA                   EDCF
Year    Effectiveness/Impact    Effectiveness   Impact
2013    3.09 (2.32)             3.63            3.48
2014    3.56 (2.67)             3.63            3.78
2015    3.14                    3.59            3.58
All     3.26                    3.62            3.61
4.2.4. Sustainability
The questions in the sustainability criterion largely fall into three
sub-criteria: policy and institutional support, financial sustainability, and
maintenance capacity (Figure 13 and Table 11). They cover most of the
aspects that the DAC definition suggests (except for environmental
sustainability, which is dealt with separately under the cross-cutting issues).
However, not many evaluations examined whether the demand for the
intervention is likely to be sustained or whether the benefits are significant
enough to be worth continuing at the given maintenance costs.
The supporting information was mainly drawn from interviews with
government officials or management personnel as to whether they have
ownership of the project and are willing to support it financially and
institutionally. Assessments were largely based on expectations rather than
supported by due analysis of the significance of the benefits and the contexts
in which the benefits would and could be maintained.
Figure 13. Evaluation Questions in Sustainability Criterion
Table 11. Main Standards of Judgment in Sustainability Criterion

Sub-criterion: Policy and institutional support
  Standard: sustainable if the recipient government has a policy and
  institutions to support the project and strong ownership
  Sources: document review; interviews with the recipient government

Sub-criterion: Financial sustainability
  Standard: sustainable if financial resources will be available for the
  continuing operation of the project
  Sources: document review; interviews with the recipient government

Sub-criterion: Maintenance capability
  Standard: sustainable if the outputs are likely to remain operational with
  proper maintenance
  Sources: site visits; interviews; surveys
Table 12 shows the average ratings on sustainability, the lowest among
the five criteria. Looking into the details, however, even this lower score
seems rather overrated. Only one case was assessed as ‘not sustainable’, for
the obvious reason that the supported facility was no longer in operation
because of technology and maintenance issues, so little benefit was expected
to continue (KOICA 2014). A number of projects were assessed as ‘partly
sustainable’ or ‘sustainable’ even when serious problems in maintenance were
observed or the facilities were not in full operation, so that it was hard to
expect significant benefits in the future.
Table 12. Average Ratings on Sustainability

Year   KOICA         EDCF    Rating description
2013   2.89 (2.17)   3.41    4: very sustainable
2014   3.11 (2.33)   2.69    3: sustainable
2015   2.89          3.30    2: partly sustainable
All    2.96          3.13    1: not sustainable
It is certain that sustainability is an important aspect to be examined in
development evaluations. However, “measuring whether the benefits … are
likely to continue” in terms of financial and/or institutional resources would
not produce meaningful information unless the question of what benefits
should and could continue, and why, is first addressed.
4.2.5. Discussion
As the decisions on evaluation questions, standards of judgement, and the
method of synthesis are largely left to evaluators, the characteristics of the
DAC criteria become clear in practice. The empirical review of 65 ex-post
evaluation reports by two Korean aid agencies, KOICA and EDCF, shows that
evaluations applying the DAC criteria tend to use similar sub-criteria and
evaluation questions, adopting the standardized ones in the guidelines. Some
of the questions do not necessarily require in-depth research, or their answers
are taken for granted for a development project. This leads to high average
ratings, especially in the relevance and effectiveness criteria.
The priority of evaluations is to answer the standardized questions (for
example, whether the planned outputs and outcomes have been achieved)
rather than to verify causal links or to measure the effectiveness of
interventions, so the methods are rarely rigorous, and judgements rely on weak
inferences or on survey results of beneficiaries’ satisfaction levels. 88% of
the reports concluded that the evaluand was either successful or very
successful, even in cases where development results were insignificant
(Table 13).
Table 13. Average Overall Ratings of All Evaluations (2013-2015)

         Very successful   Successful   Partly successful   Unsuccessful   Total
All      17 (26%)          40 (62%)     8 (12%)             0 (0%)         65
KOICA    13 (28%)          26 (55%)     8 (17%)             0 (0%)         47
EDCF     4 (22%)           14 (78%)     0 (0%)              0 (0%)         18
The evaluation conclusions are drawn from a simple average of the
ratings by criterion, and a serious flaw detected in one criterion, e.g.,
sustainability, is often canceled out by a high rating in another, e.g., relevance.
I find that such mechanical applications of the DAC criteria and standardized
questions can mislead evaluation results, in most cases towards positive
conclusions, which makes it difficult to understand the true value of a project,
to differentiate a more successful project from a less successful one, and to
draw valid lessons from the evaluation.
4.3. Standards in the DAC Criteria and Net Present Value
Based on the findings in the analysis of how the standards of judgement in the
DAC criteria affect the evaluation results, this section compares them with the
standard of judgement in CBA, i.e., the net present value.
Relevance
By definition, a positive NPV means that the project under evaluation increases
social welfare and satisfies the needs of the target population. Increasing
social welfare is the primary purpose of development interventions. In other
words, a project with a positive NPV would be highly relevant to the needs of
the target population, and the NPV can serve as a reasonable standard of
judgement for the relevance of an intervention.
In the DAC criteria, ‘relevance’ refers to consistency with the priorities
and policies of the donor, the recipient country, and the target group. In the
previous chapter, several important questions were raised regarding how to
make an overall assessment in the relevance criterion. Are the donor’s
policies or the priorities of the recipient government always consistent with
increasing the social welfare of the target population? Of course, the
development cooperation policy of any donor would support the overall
welfare increase of target groups. The national development strategies or
‘National Development Plan (NDP)’ may list all dimensions of what people
would need. But these facts do not necessarily assure that specific local-level
decisions on resource allocation correspond to the priorities of people’s needs,
due to, for example, lack of information, uncertainties, or the political interests
of policy makers. There are many cases showing that political priorities were
the opposite of what people actually needed. A project that increases the
welfare of the target group by meeting their needs, that is, a project with a
positive NPV, will be more relevant than a project that is consistent with the
priorities of government policy but has unclear prospects of increasing welfare
in the specific target group. If the policy priorities of the donor or recipient
government are insufficient to reflect people’s needs for the reasons described
above, it would be appropriate to consider the project’s NPV as a standard of
judgement in assessing relevance in the DAC framework.
Effectiveness
In the DAC framework, effectiveness is generally measured by the extent to
which the intervention has achieved its objectives. This objective-based
assessment has many weaknesses, especially when the objectives were poorly
defined, unrealistic or under-ambitious, or aligned more with the needs of
donors or governments than with the needs of beneficiaries. Meeting the
effectiveness criterion, i.e., achieving the planned outputs or outcomes, does
not necessarily provide meaningful information on the intervention’s success.
In other words, the level of achievement of objectives may not always be an
appropriate yardstick for assessing the true benefits of an intervention.
CBA measures benefits, which are embedded in the NPV. Given that a
development intervention is intended to produce outputs that lead to positive
outcomes, i.e., benefits, CBA involves an assessment of effectiveness by
measuring the benefits of the intervention. A positive NPV indicates that the
intervention has achieved its planned positive results.
What CBA measures is related more to outcomes than to outputs. In
fact, achieving outputs is a prerequisite for achieving outcomes, because the
intended outcomes or positive results caused by the intervention are supposed
to be generated through the project outputs. In the evaluations by Korean
agencies, achievements in both outputs and outcomes are considered as
sub-criteria of effectiveness, and high scores in output achievement tend to
compensate for relatively low achievement in outcomes. Meeting the
intended output target can be one of the sub-criteria in the assessment of
effectiveness, but it does not guarantee that the outputs will realize the
outcomes. A positive NPV can be a more relevant standard of judgement in
assessing effectiveness than simply measuring the level of output achievement.
Efficiency
CBA is generally considered a tool to measure efficiency. If the NPV is
positive, a project is regarded as efficient in generating the benefits at the costs.
This interpretation of efficiency in CBA, however, is much broader than the
DAC definition, which focuses on whether an intervention was implemented
in the least costly manner in comparison to alternatives. The DAC definition
does not address whether the costs are justified by the benefits. An
intervention that used the least costly resources compared to alternatives will
satisfy the DAC definition of efficiency, but if it yields a negative NPV, it
would not be appropriate to adopt the project or consider it efficient from the
CBA perspective.
It is worth noting that a positive NPV alone does not mean that the
project is the least costly way of using resources. CBA assumes that a project
is worth investing in as long as its NPV is greater than zero, regardless of
whether there is a possibility of reducing the costs in the process of
implementation. In this regard, evaluations using the DAC framework can
provide useful information in addition to the NPV in assessing the efficiency
of an intervention, given that the NPV is positive.
Impact
The DAC definition of impact deals with the changes produced by a
development intervention. The changes to be addressed include positive or
negative, direct or indirect, and intended or unintended results. CBA in
principle considers all direct and indirect consequences of a project, so it
clearly covers the impact criterion.
The DAC definition does not provide an explicit scope of impact or a
standard for assessing its magnitude. Assessment generally starts with the
intended positive results and the prospects of their long-term continuation after
the evaluation, and then adds other effects that are found by evaluators and
deemed to be plausibly caused by the intervention. In the case of long-term
outcomes, the present value is rarely considered. It is possible that some
negative impact exists but was not considered by evaluators, whether
intentionally or not. The claimed impact can be exaggerated through double
counting of overlapping results or through overlooking additional costs
concomitant with the impact. It is also challenging to make a judgement on
the level of overall impact of an intervention when there are both positive and
negative results, which is likely to be the case in any development intervention.
The NPV can provide information on the net benefit by weighing
unintended negative impacts against positive impacts. Even if there is a
negative impact, the intervention can be considered worthy when the benefits
it causes are sufficient to cancel out its negative consequences. The
challenges of measuring the magnitude of impacts and demonstrating their
causal relationship with the input apply to both the DAC framework and CBA.
On the whole, it seems that CBA would provide more credible information
about tangible impacts, as it considers overlaps in benefits and the costs
concomitant with the impact. With regard to the importance of an
intervention and its intangible impact on society, it is hard to expect CBA to
provide meaningful information, as these require a qualitative assessment.
Sustainability
CBA considers the life of the intervention, during which it is supposed to
generate costs and benefits. In principle, the NPV is calculated as the net
benefit over the project lifespan, based on an analysis of how much cost and
benefit the project will generate during that period. So the NPV reflects the
concept of sustainability in the DAC framework, which measures whether and
to what extent the benefits continue.
In the DAC sustainability criterion, it is not explicitly mentioned how
long the benefits should be considered or at what cost. Sustainability seems
to suggest that the benefits of an intervention should continue for as long a
period as possible, regardless of the magnitude of the benefits or the costs of
maintaining them. From the CBA perspective, a positive NPV, which covers
both the magnitude of benefit and cost in the net benefit, is more important
than how long the benefit will continue. In CBA, benefits in the distant
future make much less difference to the NPV than those in the near future do,
unless the discount rate is very low (Snell 2011, 53). For example, a project
that produces a larger benefit in the first 10 years and a smaller one in the next
10 years will have a greater NPV than a project generating the same total
benefit evenly over 20 years, as the benefits in the last 10 years have less
present value than those in the first 10 years.
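A quick numerical sketch illustrates the point; the benefit streams below are hypothetical and sum to the same $2 million total.

def present_value(benefits, r=0.10):
    # Discount a stream of yearly benefits; index 0 corresponds to Year 1.
    return sum(b / (1 + r) ** (t + 1) for t, b in enumerate(benefits))

front_loaded = [150_000] * 10 + [50_000] * 10  # larger benefit in early years
even_stream = [100_000] * 20                   # same total, spread evenly

print(round(present_value(front_loaded)))  # ~1,040,135
print(round(present_value(even_stream)))   # ~851,356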
It is debatable which would be more appropriate in the development
context: a project producing a bigger benefit in its early years, or one that
sustains the benefit over more years even though its present value is smaller.
Nevertheless, it is fair to say that a sustainability assessment should consider
how properly the intervention can be maintained, as well as what the expected
benefits and the concurrent costs are.
CHAPTER 5. METHOD OF SYNTHESIS - THE DAC FRAMEWORK AND
COST-BENEFIT ANALYSIS IN COMPARISON
5.1. Overview
Evaluation by definition is a task of determining the value of what is being
evaluated. The final step in the general logic of evaluation involves
synthesizing and integrating data into a judgement of the merit or worth of the
evaluand (Fournier 1995). As Scriven (2007) claims, this is the most difficult
task in evaluation, and there is little consensus on the best methods to reach
these needed conclusions (Julnes 2012). In development evaluations, it is
often required to make an overall conclusion about whether a development
intervention was or has been successful and worth the resources required, or
whether it can be improved. Answering these questions is indeed the primary
purpose of evaluation, providing meaningful judgements about the overall
value and success of the intervention.
In the development of evaluation theories, there has been some resistance
among evaluators to making 'evaluative conclusions' or overall judgements
(Scriven 1994, 160). The common argument against making evaluative
conclusions is that the decision whether a program is desirable should be made
by policy-makers, not by evaluators, whose role is to provide information for
such decision-making. However, it is the main clients of evaluations, e.g.,
funding agencies, that usually want overall conclusions about development
interventions.
In development evaluations using the DAC criteria framework, an
overall, summative conclusion is often required, drawn by combining the
separate assessments in the five criteria. In some cases, the individual
conclusions under each criterion are integrated into a final rating, for example,
whether the intervention is very successful, successful, partially successful (or
acceptable), or unsuccessful (or unacceptable). Donor agencies in Germany,
Japan, Korea, and the UK, as well as development banks such as the World
Bank and ADB, use rating systems with such semantic scales.
Therefore, drawing an overall conclusion is basically a matter of
integrating the findings from different criteria into a judgement about whether
the intervention was or has been a success. But it is challenging to combine
the assessments in multiple criteria. For example, how can an appropriate
conclusion be made on an intervention that has been proved to generate
significant beneficial impacts but to cost much more than a similar
intervention? When contrasting advantages exist between two projects, e.g.,
one program is a little more effective while the other is substantially less
expensive, how can these differing aspects be combined into an overall
valuation?
Stake (2004, 14-15) shows how criterial evaluation can be synthesized
in two different approaches. First, a compensatory model allows a weakness
in one aspect to be compensated for by strength in another. In the DAC
criteria framework, for example, if a project has a high level of ‘relevance’
because it is consistent with the policy priorities of both countries, some low
level of effectiveness or sustainability may be accepted. The other approach
is a multiple cut-off model, in which a certain standard must be met on each of
several criteria; otherwise the whole thing is assessed as a failure. With this
approach, one can consider an intervention unsuccessful if its effectiveness is
lower than an acceptable level, no matter how relevant the purposes of the
project were.
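A compact sketch may make the contrast concrete; the 4-point ratings, the equal weights, and the cut-off of 2 are illustrative assumptions, not taken from any guideline.

ratings = {"relevance": 4, "effectiveness": 1,
           "efficiency": 3, "impact": 3, "sustainability": 2}

def compensatory(ratings):
    # Weakness in one criterion is offset by strength in another:
    # the overall judgement is an equal-weight average.
    return sum(ratings.values()) / len(ratings)

def multiple_cutoff(ratings, minimum=2):
    # Every criterion must clear the minimum standard,
    # otherwise the whole intervention is judged a failure.
    return "success" if all(v >= minimum for v in ratings.values()) else "failure"

print(compensatory(ratings))     # 2.6: the low effectiveness is masked
print(multiple_cutoff(ratings))  # failure: effectiveness is below the cut-off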
In most cases using the DAC criteria framework, including the Korean
agencies reviewed in the previous chapter, a compensatory model is applied,
and each criterion has the same level of importance. As such, the synthesis of
assessments across multiple criteria involves deciding how to weight the
different criteria. It is critical to ensure balanced weights on the criteria, as
there is a risk that a serious problem detected in one criterion is compensated
for by good assessments in another. As discussed in the earlier chapters,
there are interdependencies and overlaps between the DAC criteria. When
the separate assessments in different criteria are combined, would the overall
conclusion be affected by the interdependencies and overlaps? The challenge
of aggregating across different criteria makes evaluation particularly complex
when there is no consensus on what constitutes a good development
intervention or what is really in the best interests of people.
On the other hand, cost-benefit analysis applies one criterion, the net
present value (NPV), in measuring the social value of a project. One would
hardly agree that an intervention whose costs exceed the benefits it produces is
worthy or a success. Based on the argument that a positive net benefit is one
condition of a successful intervention, I adopt the criterion used in CBA,
namely the NPV, as a benchmark against which overall conclusions drawn
from the DAC criteria framework can be compared.
In the literature review, I discussed that CBA is a useful framework for
evaluation, which involves systematically identifying, measuring, valuing, and
comparing the costs and consequences of an intervention. This is the ground
on which I argue that CBA can be a good benchmark for the DAC criteria
framework in a summative evaluation. The issues around the application of
CBA were also discussed, especially the difficulties in applying the theory in
practice, for example, how to measure and monetize the values that some
critics view as priceless and how to determine the social discount rate. Such
challenges seem to exist in all evaluation methods. If one is skeptical about
CBA because of its practical limitations, any methodology for valuing a public
intervention would invite the same skepticism.
While valuing is one of the central tasks in evaluation, there is little
consensus on which methodology is more appropriate or useful. This lack of
consensus has been noted within the evaluation community, which has called
for more systematic approaches to the methods of valuing appropriate for
evaluation, with better integration of methods including economic approaches
such as CBA (Julnes 2012; King 2017). A dominant framework such as the
DAC criteria may need a fresh look at valuing and assessing development
interventions, incorporating relevant methods in a constructive way.
5.2. Comparison of DAC Framework and CBA: Hypothetical Cases
5.2.1. Illustrative Comparison between Two Projects
Using the DAC criteria and the suggested sample evaluation questions makes
it difficult to differentiate synthesized evaluative judgements about projects’
success, especially when there are no agreed standards for judging project
success.
Let us assume that two projects are evaluated. Project A was planned
for three years with a budget of $1 million. It was finished after three years
as planned, at a total cost of $1 million, so on time and within the budget.
The size of the beneficiary population is 10,000. The project was on the
National Development Plan and had a reasonable program theory that could
support the causal links between the input and the outcomes. It achieved all
the planned outputs, and the outcome targets were met. The satisfaction
level of beneficiaries was very high. Hypothetically, the benefit that the
project brought to its beneficiaries was valued at $90 per individual on
average. The benefit is expected to continue after donor funding ceases,
given that the local government keeps its promise to provide the necessary
financial and human resources to maintain the facility.
From this limited information, some initial assessments can be made
under the DAC criteria. By the definitions of the DAC criteria, Project A
would be assessed as relevant, in that the activity was consistent with the
needs of the target group and the priorities of the recipient country. Under
the program theory, the intended results are plausibly related to the input and
output. The intervention could be rated as effective in terms of the extent to
which it attained its objectives, which were fully achieved in this case. In
terms of the definition of efficiency, the project was finished on time and
within the budget, so it satisfies the main evaluative questions in the efficiency
criterion. It would be assumed to be sustainable based on the expectation
that the recipient government would take over the responsibility. Overall,
Project A has most of the aspects that the definitions of the DAC criteria
require.
Project B was planned initially as a three-year, $0.9 million project but
finished in 3.5 years at a total cost of $1 million. It targeted a total
population of 50,000. The project was consistent with the national
development strategy and policy of the recipient government, and the causal
links in the program theory seemed plausible. It achieved the outputs as
planned, but in terms of the outcome targets it attained only half of what was
intended. The beneficiaries valued the benefit of the project at $50 per
person on average, and their level of satisfaction was moderate. If assessed
with the DAC criteria framework, Project B has some flaws in the efficiency
and effectiveness criteria. It cost more than planned, an additional six
months and $100,000, and achieved only 50% of its objectives with moderate
beneficiary satisfaction, though it did produce all the outputs. The
characteristics of Project A and Project B are summarized in Table 14.
Table 14. Comparison of Two Hypothetical Projects

                                  Project A                      Project B
Budget planned ($)                1,000,000                      900,000
Actual cost expended ($)          1,000,000                      1,000,000
                                                                 (additional 100,000)
Population (beneficiaries)        10,000                         50,000
Context                           Of high priority in the        Same as in Project A
                                  policies of recipient/donor
                                  and the needs of the target
                                  group; activities and output
                                  consistent with the intended
                                  outcome
Process                           Achieved its output and        Exceeded the planned time
                                  outcome on time and within     and budget in achieving its
                                  the budget as planned          output and outcomes
Achievements                      Intended output achieved;      Intended output achieved;
                                  outcome target achieved        outcome target achieved by
                                  100%; beneficiary              50%; beneficiary
                                  satisfaction level is high     satisfaction level is moderate
Projected benefit ($ per capita)  90                             50
Compared to Project A, Project B would get less positive assessments
under the DAC criteria. Would that be fair? From the limited information
given, we can infer the value of each project. In Project A, the per capita
benefit is assumed to be $90, so the total benefit would be $900,000, which is
smaller than the total cost of $1 million. Project B, on the other hand,
produced a lower per capita benefit of $50, but its total benefit would be
$2.5 million, 2.5 times larger than the total cost of $1 million. In sum, the
net benefit of Project A is negative, while Project B shows a much larger net
benefit than Project A.
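The underlying arithmetic is a single line per project: total benefit (per capita benefit times beneficiaries) minus total cost. A minimal sketch:

def net_benefit(per_capita_benefit, population, total_cost):
    # Total benefit minus total cost.
    return per_capita_benefit * population - total_cost

print(net_benefit(90, 10_000, 1_000_000))   # Project A: -100,000
print(net_benefit(50, 50_000, 1_000_000))   # Project B: +1,500,000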
In Project A, even though the target outcome was achieved, the net
benefit is negative. Possible reasons include: the program design
misestimated the benefit to be generated by the targeted outcome, or the
program theory was wrong. The main problem with Project A is that the
total benefit is smaller than the total cost. Even if a project is relevant to the
recipient’s priorities and needs, it may have a negative net benefit. In this
case, the project is not worth investing in and should be considered a failure,
regardless of how relevant it is to the needs of the target group or the extent to
which it achieved its objectives.
The above is a very simple, hypothetical comparison, but it has important
implications for evaluations using the five DAC criteria. First, there is a risk
of a project being considered effective because it has achieved the planned or
intended outputs and outcome targets (as in the DAC definition), even when
the net benefit is negative. In other words, achieving the objectives does not
guarantee the success of the project. Second, beneficiaries’ satisfaction
levels may mislead the judgement on the effects of a project, as individual
satisfaction would be higher in a project giving a larger per capita benefit to a
small number of beneficiaries than in a project covering a larger number with
a smaller per capita benefit. Third, evaluations using only the DAC criteria
consider cost-efficiency under the efficiency criterion, which by definition
measures outputs in relation to inputs, but they do not take into account the
net benefit, which relates more to the cost-effectiveness of the project. If the
outputs do not generate enough benefits to make up for the costs, the
intervention cannot be said to be cost-effective.
5.2.2. DAC Framework and CBA in Five Scenarios
Consider a hypothetical development project whose initial investment cost was
$1 million. To simplify the model, several assumptions are made: there is no
recurrent cost after the initial investment; the project is expected to yield
benefits worth $120,000 per year, including all direct and indirect effects, if it
achieves the planned outcomes; and the life of the project is 25 years. The
initial cost was invested in Year 0, and the benefits would be generated for the
next 25 years starting in Year 1. At the discount rate of 10%,[14] the net
present value (NPV) and the benefit-cost ratio (B/C) of the project would be
$89,245 and 1.09 respectively, as shown in Table 15.

[14] The appropriate level of the social discount rate is one of the controversial
issues in CBA discussion, and there seems to be no authoritative answer.
Some development banks and agencies have their own guidelines for the
discount rate: e.g., the World Bank has applied 10-12% (1998), ADB used
12% until 2016 and now applies 9% (2017), and Korea’s EDCF suggests
10-12% (EDCF 2012). The 10% discount rate used in this analysis refers to
these guidelines. One can argue that it is rather arbitrary, but the level of the
discount rate in a hypothetical analysis like this one does not make a dramatic
difference in drawing conclusions.
Table 15. Net Present Value of Base Case
Year (t)   Cost (C)   Benefit (B)   Present Value (PV)   Discount factor (r = 10%)
0 1,000,000 0 - 1,000,000 1.000
1 0 120,000 109,090.91 0.909
2 0 120,000 99,173.55 0.826
3 0 120,000 90,157.78 0.751
4 0 120,000 81,961.61 0.683
5 0 120,000 74,510.56 0.621
… … … … …
t Ct Bt (Bt-Ct)/(1+r)^t 1/(1+r)^t
… … … … …
21 0 120,000 16,215.67 0.135
22 0 120,000 14,741.52 0.123
23 0 120,000 13,401.38 0.112
24 0 120,000 12,183.07 0.102
25 0 120,000 11,075.52 0.092
NPV 89,244.80
B/C 1.09
By the decision rules of CBA, this project is socially worthy, as the total
benefits exceed the total costs, and it is therefore considered successful.
Supposing that this project is also assessed as a success by meeting the
requirements of each of the DAC criteria, I will use this model as the base for
the following analysis.
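The figures in Table 15 can be reproduced in a few lines. The sketch below is illustrative only, applying the assumptions just stated (one up-front cost, a constant yearly benefit for 25 years, a 10% discount rate):

def npv(yearly_benefit, initial_cost=1_000_000, years=25, r=0.10):
    # Present value of a constant yearly benefit from Year 1 to Year 25,
    # minus the initial cost invested in Year 0.
    pv_benefits = sum(yearly_benefit / (1 + r) ** t
                      for t in range(1, years + 1))
    return pv_benefits - initial_cost

base = npv(120_000)
print(round(base))                               # ~89,245
print(round((base + 1_000_000) / 1_000_000, 2))  # B/C ~1.09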
Some variations on this base model can be made according to what can
affect the assessment in each DAC criterion. I consider the following five
scenarios, which would change the value of the project in different ways:
1) The project achieved the planned output on time and within the
budget, but did not yield the full expected outcome because a part of
the facilities is not in operation. With other things being hold
constant, this would affect the assessment in effectiveness criterion.
2) The project achieved the planned output and expected outcome
within the budget, but the completion of the project was delayed.
Delays in project implementation alters the assessment in efficiency
criterion.
3) The project achieved the planned output and expected outcome on time and within the budget, but produces a negative externality, such as environmental degradation or a higher risk to safety or health. This unintended negative result is related to the impact criterion.
4) The project achieved the planned output on time and within the budget, and is in full operation yielding the expected benefits for the first few years. After a few years, however, the project is forced to reduce or stop its operation for some reason, e.g., financial problems. As a result, the benefits of the project continue only partially for the rest of the project life, or discontinue altogether. The likelihood of the benefits' continuation is primarily associated with the sustainability criterion.
5) The project achieved the planned output on time and within the budget and is ready for full operation, but the demand for the project falls short of the capacity that the project can serve. The problem is identified as a misjudgment of demand during project planning, which is first to be assessed under the relevance criterion. If a project is not relevant, the risks of lower effectiveness and efficiency, and even of weaker sustainability, follow.
In practice, these scenarios are often observed. In the following sections, I discuss how each of the above cases affects the evaluation conclusions differently in the DAC framework and in CBA.
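The NPVs cited in the scenario discussions below can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic; the function name and structure are mine, not part of any evaluation guideline:

```python
def npv(benefits, cost=1_000_000, r=0.10):
    """NPV of an initial cost in Year 0 and a stream of yearly benefits in Years 1..N."""
    return -cost + sum(b / (1 + r) ** t for t, b in enumerate(benefits, start=1))

base = [120_000] * 25
print(round(npv(base)))                           # base case: ~ 89,245
print(round(npv([108_000] * 25)))                 # 90% of outcome: ~ -19,680
print(round(npv([0] + [120_000] * 25)))           # one-year delay: ~ -9,777
print(round(npv([90_000] * 25)))                  # negative externality: ~ -183,066
print(round(npv([120_000] * 4 + [60_000] * 21)))  # benefits halve after Year 4: ~ -265,186
print(round(npv([120_000] * 10 + [0] * 15)))      # benefits stop after Year 10: ~ -262,652
```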
Effectiveness
By definition in the DAC criteria, effectiveness measures the extent to which an aid activity attains its objectives. The first case considers a project which achieved the planned output but did not yield the full expected outcome because a part of the facilities is not in operation. The partial operation would yield a yearly benefit less than the expected $120,000, assuming other things are constant. The lower yearly benefit would make the NPV of the project smaller, as shown in Table 16. With the assumption made earlier, full operation is expected to yield a yearly benefit of $120,000. If the project produced 90% of its expected outcome, the NPV would fall to -$19,680, and the B/C to 0.98. If it achieved only 80% of its planned outcome, the results would be -$128,604 in NPV and 0.87 in B/C.
Table 16. NPV and Effectiveness
(Base; Scenario 1: 90% of outcome; Scenario 2: 80% of outcome)
Y | C | B (Base) | PV (Base) | B (S1) | PV (S1) | B (S2) | PV (S2)
0 | 1,000,000 | 0 | -1,000,000 | 0 | -1,000,000 | 0 | -1,000,000
1 | - | 120,000 | 109,091 | 108,000 | 98,182 | 96,000 | 87,273
2 | - | 120,000 | 99,174 | 108,000 | 89,256 | 96,000 | 79,339
3 | - | 120,000 | 90,158 | 108,000 | 81,142 | 96,000 | 72,126
4 | - | 120,000 | 81,962 | 108,000 | 73,765 | 96,000 | 65,569
5 | - | 120,000 | 74,511 | 108,000 | 67,060 | 96,000 | 59,608
… | … | … | … | … | … | … | …
21 | - | 120,000 | 16,216 | 108,000 | 14,594 | 96,000 | 12,973
22 | - | 120,000 | 14,742 | 108,000 | 13,267 | 96,000 | 11,793
23 | - | 120,000 | 13,401 | 108,000 | 12,061 | 96,000 | 10,721
24 | - | 120,000 | 12,183 | 108,000 | 10,965 | 96,000 | 9,746
25 | - | 120,000 | 11,076 | 108,000 | 9,968 | 96,000 | 8,860
NPV | | | 89,245 | | -19,680 | | -128,604
B/C | | | 1.09 | | 0.98 | | 0.87
Judging by the criterion in CBA, the projects in both Scenario 1 and Scenario 2 would be assessed as failures. Partial operation, fulfilling only 90% or 80% of the expected outcomes, makes the NPV negative and the B/C smaller than 1, which means that the project is no longer socially worthwhile or successful by the standard of judgement in CBA. In the DAC framework, however, how much would this affect the overall conclusion? The assessment in effectiveness would be lower than in the base model, as the project achieved its objectives only partly. But it would hardly affect the assessment of relevance, as long as the project is consistent with the country's priorities and the donor's strategic objectives, which is assumed to be met in the base case. Whether the lower-than-expected achievement of outcome is due to flaws in the project design in addressing the development needs is a question that belongs to relevance, yet it is rarely revisited there once relevance has been assessed ex ante.
Efficiency
A delay in project completion would affect its efficiency according to the definition in the DAC criteria. The second case falls into this category: the project achieved the planned output and expected outcome within the budget, but the completion of the project was delayed. Table 17 shows how one- and two-year delays affect the NPV and B/C of the base case.
A one-year delay in completing the project pushes the realization of benefits back by one year, and even though the project would operate fully over its lifespan, the present value of the total benefits falls by about 9%. The NPV would turn negative at -$9,777 with a B/C of 0.99. If the project were delayed by two years, the NPV would fall to -$99,798.
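The 9% figure follows from the fact that shifting the entire benefit stream back by d years scales its present value by (1+r)^{-d}:

\[
\mathrm{PV}_d = (1+r)^{-d}\,\mathrm{PV}_0, \qquad (1.1)^{-1} \approx 0.909, \quad (1.1)^{-2} \approx 0.826,
\]

so a one-year delay gives NPV ≈ 0.909 × 1,089,245 − 1,000,000 ≈ −9,777, and a two-year delay gives 0.826 × 1,089,245 − 1,000,000 ≈ −99,798.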
Table 17. NPV and Efficiency
(Base; Scenario 1: 1-year delay; Scenario 2: 2-year delay)
Y | Ct | B (Base) | PV (Base) | B (S1) | PV (S1) | B (S2) | PV (S2)
0 | 1,000,000 | 0 | -1,000,000 | 0 | -1,000,000 | 0 | -1,000,000
1 | - | 120,000 | 109,091 | 0 | 0 | 0 | 0
2 | - | 120,000 | 99,174 | 120,000 | 99,174 | 0 | 0
3 | - | 120,000 | 90,158 | 120,000 | 90,158 | 120,000 | 90,158
4 | - | 120,000 | 81,962 | 120,000 | 81,962 | 120,000 | 81,962
5 | - | 120,000 | 74,511 | 120,000 | 74,511 | 120,000 | 74,511
… | … | … | … | … | … | … | …
21 | - | 120,000 | 16,216 | 120,000 | 16,216 | 120,000 | 16,216
22 | - | 120,000 | 14,742 | 120,000 | 14,742 | 120,000 | 14,742
23 | - | 120,000 | 13,401 | 120,000 | 13,401 | 120,000 | 13,401
24 | - | 120,000 | 12,183 | 120,000 | 12,183 | 120,000 | 12,183
25 | - | 120,000 | 11,076 | 120,000 | 11,076 | 120,000 | 11,076
26 | - | | | 120,000 | 10,069 | 120,000 | 10,069
27 | - | | | | | 120,000 | 9,153
NPV | | | 89,245 | | -9,777 | | -99,798
B/C | | | 1.09 | | 0.99 | | 0.90
How would this delay in project completion, which is fatal enough to make the project a failure with a negative NPV, affect the result of a DAC framework evaluation? As seen in the evaluation cases in the previous chapter, many projects were evaluated as successful, or at worst partially successful, even when a serious delay was observed. In the scenarios above, the output and outcome have been achieved, though with delay, so the assessment in the effectiveness criterion would not be affected. Assuming the benefits continue for the project life, there seems to be no reason to change the assessments in the impact and sustainability criteria. With only some deduction in the efficiency criterion, the project is likely to be evaluated as a success in these scenarios, while the CBA shows it may not be the case.
Impact
Assume that the project achieved the planned output and expected outcome on time and within the budget, but produces serious air pollution (a negative externality) estimated to cost $30,000 per year. This reduces the yearly net benefit to $90,000, and as a result, the NPV turns negative.
Table 18. NPV and Negative Externality
(Base; with a negative externality of $30,000 per year)
Y | C | B (Base) | PV (Base) | Neg. Ext. | Net B | PV (net)
0 | 1,000,000 | 0 | -1,000,000 | 0 | 0 | -1,000,000
1 | - | 120,000 | 109,091 | -30,000 | 90,000 | 81,818
2 | - | 120,000 | 99,174 | -30,000 | 90,000 | 74,380
3 | - | 120,000 | 90,158 | -30,000 | 90,000 | 67,618
4 | - | 120,000 | 81,962 | -30,000 | 90,000 | 61,471
5 | - | 120,000 | 74,511 | -30,000 | 90,000 | 55,883
… | … | … | … | … | … | …
21 | - | 120,000 | 16,216 | -30,000 | 90,000 | 12,162
22 | - | 120,000 | 14,742 | -30,000 | 90,000 | 11,056
23 | - | 120,000 | 13,401 | -30,000 | 90,000 | 10,051
24 | - | 120,000 | 12,183 | -30,000 | 90,000 | 9,137
25 | - | 120,000 | 11,076 | -30,000 | 90,000 | 8,307
NPV | | | 89,245 | | | -183,066
B/C | | | 1.09 | | | 0.82
How much would this negative impact affect the results of a DAC framework evaluation? This type of negative externality, even if serious, would be assessed under the impact and possibly the sustainability criteria, but is not likely to be considered in the relevance, efficiency, or effectiveness criteria.
Sustainability
Consider a project which achieved the planned output on time and within the budget, and is in full operation yielding the expected benefits for the first few years. After a few years, however, the project is forced to reduce or stop its operation because of, say, technical problems. Scenario 1 assumes 50% operation from Year 5 onward, while Scenario 2 assumes that operation stops after Year 10. In both cases, the discounted net benefit falls considerably, making the project not worthwhile at all.
Table 19. NPV and Sustainability
(Base; Scenario 1: benefit falls to half after 4 years; Scenario 2: benefit discontinues after 10 years)
Y | C | B (Base) | PV (Base) | B (S1) | PV (S1) | B (S2) | PV (S2)
0 | 1,000,000 | 0 | -1,000,000 | 0 | -1,000,000 | 0 | -1,000,000
1 | - | 120,000 | 109,091 | 120,000 | 109,091 | 120,000 | 109,091
2 | - | 120,000 | 99,174 | 120,000 | 99,174 | 120,000 | 99,174
3 | - | 120,000 | 90,158 | 120,000 | 90,158 | 120,000 | 90,158
4 | - | 120,000 | 81,962 | 120,000 | 81,962 | 120,000 | 81,962
5 | - | 120,000 | 74,511 | 60,000 | 37,255 | 120,000 | 74,511
6 | - | 120,000 | 67,737 | 60,000 | 33,868 | 120,000 | 67,737
7 | - | 120,000 | 61,579 | 60,000 | 30,789 | 120,000 | 61,579
8 | - | 120,000 | 55,981 | 60,000 | 27,990 | 120,000 | 55,981
9 | - | 120,000 | 50,892 | 60,000 | 25,446 | 120,000 | 50,892
10 | - | 120,000 | 46,265 | 60,000 | 23,133 | 120,000 | 46,265
11 | - | 120,000 | 42,059 | 60,000 | 21,030 | 0 | 0
12 | - | 120,000 | 38,236 | 60,000 | 19,118 | 0 | 0
… | … | … | … | … | … | … | …
21 | - | 120,000 | 16,216 | 60,000 | 8,108 | 0 | 0
22 | - | 120,000 | 14,742 | 60,000 | 7,371 | 0 | 0
23 | - | 120,000 | 13,401 | 60,000 | 6,701 | 0 | 0
24 | - | 120,000 | 12,183 | 60,000 | 6,092 | 0 | 0
25 | - | 120,000 | 11,076 | 60,000 | 5,538 | 0 | 0
NPV | | | 89,245 | | -265,186 | | -262,652
B/C | | | 1.09 | | 0.73 | | 0.74
In the DAC framework, both scenarios would affect the result of assessment in the sustainability criterion, and perhaps in the impact criterion in Scenario 2 to some extent. But they are not likely to change much the results in the other criteria, i.e., relevance, efficiency, or effectiveness. Following the DAC definition, evaluators would generally assess whether the benefits are likely to continue, not specifically how much benefit would continue. They may consider Scenario 1 better than Scenario 2, because the former would at least yield some continuing benefit over the period of analysis, though the NPVs in the two cases are similar.
A Case: Project for Modernization of the Traffic Management System in Erbil
The project consists of three site constructions: 1) a driver's license test site, 2) a vehicle registration and inspection site, and 3) a license plate manufacturing site. All constructions and equipment provision were completed as planned. The evaluation found that "only the driver's license test site is under normal operation, while the equipment for vehicle registration and inspection and license plate manufacturing are not utilized at all". As to the driver's license test site, the project established separate facilities for different driving skills, such as for bus and large trailer truck drivers, in addition to the regular license. The facilities for bus and trailer truck driving tests, however, were not used at all, because the region does not have a system which issues separate licenses for those special vehicles: a driver with a regular driver's license can operate any type of vehicle, including buses, large trucks and trailers. To summarize, of the three sites the project constructed, only one is in operation, and even that one is not fully utilized. The cost of establishing the unused facilities is not clearly stated in the evaluation report. Nevertheless, it is rather obvious that the benefits would not be generated as planned.
Under the DAC criteria, however, this only affects the score in efficiency. In the relevance criterion, the evaluators looked into the relevance of the project objectives to the strategies of the target country and region, rather than into whether the project design and components were relevant to the context of the target area. In the effectiveness criterion, they assessed the outcomes generated by the now-operating facility. Looking at those outcomes, e.g., the driver's license tests and related services, the evaluators deemed that "the project was 'very effective' in attaining the performance goals" (p. 62). This is a problem of objective setting, because in the PDM the objectives are not related to what the project can actually achieve: the project goal is simply put as to "establish and operate an advanced traffic management system", with indicators such as a 15% increase in licensed drivers and registered vehicles by 2008, reduction in traffic accidents, and improved customer satisfaction with DTC services. Sustainability is also measured based on whether the present benefits from the one operating site will continue, rather than on what would be expected with full operation.
5.3. Comparative Case Study - Evaluations of Water Supply Projects15
This section presents a comparative case study based on two evaluations of water supply projects implemented by EDCF and KOICA. The evaluation of the Nicaragua project was conducted in 2012 (EDCF 2012) and the other in 2016 (KOICA 2016). The two agencies had different rating methods in the evaluation guidelines according to which each evaluation was conducted.
Following brief project descriptions, the evaluation results using the DAC framework are presented. Then I discuss the results of the cost-benefit analysis of each project, followed by a discussion of the conceptual framework and methodology of CBA for water projects. Finally, I discuss how the CBA results can complement the findings of the DAC framework evaluations.

15 The case studies are based on the two ex-post evaluations in which I participated and for which I conducted the CBAs. Both evaluations are published and available on the website of each agency.
5.3.1. Project Description
The Water Supply Expansion Project in Juigalpa, Nicaragua
This project was developed to solve the chronic water shortage problem in the
city of Juigalpa, Nicaragua by establishing a new water supply system using an
alternative water resource. Juigalpa is the capital city of Chontales, a department in the middle part of Nicaragua, with a population of approximately 70,000. The
total cost of the project was $40.5 million, out of which $33 million was
supported as a loan by EDCF. The project started in 2006 and the construction
was completed in February 2010. The ex-post evaluation was conducted in
2012.
The purpose of the project was to provide a safe and stable water supply in Juigalpa. The major problem with the water supply in the city before the project was that the water source was unreliable in both quantity and quality. The existing source, River Pirre, had a very irregular and insufficient volume of water. In dry seasons, the water plant often had to stop its operation because of the water shortage. Restrictive water rationing was enforced, and water was supplied on average once every three days in rainy seasons and two or three times a month in dry seasons. The access rate to the water supply system was 77%, but the actual supply of water was limited. People stored water in a water tank or buckets when tap water was available, and used the stored water or a well when the water supply was cut off. Those who did not have a water supply connection usually used wells or collected rainwater. People also purchased water from water vendors, because the wells often went dry, especially in dry seasons.
The new water supply system was designed to use a new water source, Lake Nicaragua, which is located about 30km southwest of Juigalpa. Construction of the new system included a water intake tower and facilities in Lake Nicaragua, aqueducts and pressurizing pump stations, a new purification plant, and pipe replacement and new connections. The total production capacity of the new system was 650,000m3 per month, approximately 21,000m3 per day. The project descriptions, main achievements, and the results of the evaluations are summarized in Tables 20, 21, and 22.
Table 20. Summary of Water Supply Project in Juigalpa, Nicaragua
Project Title | Water Supply Expansion Project in Juigalpa, Nicaragua
Overall Goal | To improve the quality of life of people in Juigalpa
Project Purpose | To supply clean and safe running water
Activities/Outputs | Construction of the water supply system: water intake facilities (240l/s); water pumping stations; water purification and reservoir facilities; pipelines and network. Consulting: supervision and technical assistance
Population in target area | 68,410
Project Cost (external support) | $40.5 million ($33.1 million)
Source: organized by author based on EDCF (2012).
Table 21. Achievement of Nicaragua Project
| Before project (2009) | After project (2012)
Production capacity (m3/m) | 340,000 | 650,000
Population with piped water supply | 77.2% | 95.1%
Tap water consumption (m3/m, per household) | 14.4 | 18.6
Total water consumption (m3/m) | 126,181 | 206,591
Population in target area | 68,410 | 70,000
Table 22. Nicaragua Project: DAC Criteria Evaluation
Criterion | Sub-categories | Rating Description | Rating Value
Relevance | Consistency with water supply and sewage system development policies and priorities of the partner country; consistency with the EDCF's assistance strategies; harmonization with international development cooperation norms such as the MDGs, cross-cutting issues, and water supply aid policies; adequacy of feasibility study and project design | Relevant | 3
Efficiency | Efficiency of project cost; efficiency of project time period; efficiency of project implementation procedures | Highly Efficient | 4
Effectiveness | Achievement of planned outputs; achievement of project objectives; application of appropriate technology at the local level | Highly Effective | 4
Impact | Socio-economic impact; systemic impact; impact on gender equality and environment | Highly Influential | 4
Sustainability | Systemic sustainability; financial sustainability | Highly Sustainable | 4
Overall Evaluation Score | | Highly Successful | 3.8
Source: EDCF (2012, 8)
The Project for the Construction of Water Supply System in Buon Ho, Vietnam
This is a KOICA project implemented in 2010-2013 to support the expansion of the water supply system in Buon Ho Town in Dak Lak Province, Vietnam. Buon Ho Town is a district-level city raised to urban status in 2009, with a population of 55,000 as of 2010. The local government highlighted the increasing demand for basic infrastructure in the Town as a result of rapid urbanization and concurrent population growth, and requested KOICA to support the construction of water supply facilities. The total project cost was $5.15 million, of which KOICA funded $4.5 million, covering the construction of water intake and purification facilities, the provision of equipment, and technical assistance. The project description is summarized in Table 23.
Table 23. Summary of Water Supply Project in Buon Ho Town, Vietnam
Project Title | Project for the Construction of Water Supply System in Buon Ho Town, Vietnam
Overall Goal | To improve the quality of life of people in Buon Ho Town
Project Purpose | To supply clean and safe running water
Activities/Outputs | To establish the water supply system: water purification facilities (5,600m3 per day); water intake facilities; 60km of pipeline. To provide equipment and training for its use. To build capacity for the management and operation of waterworks
Population in target area | 54,218
Project Cost (external support) | $5.1 million ($4.5 million)
The project mainly focused on the construction of an additional water supply system: nine water intake facilities (seven drawing from groundwater, two from surface water), water purification facilities with a capacity of 5,600m3 per day, and 60km of pipelines. Construction took 91% of the total budget, while the rest, less than 10%, was spent on providing equipment, training programs, and administration. The Vietnamese government also spent about $0.65 million on the construction of additional pipelines and the settlement of civil complaints during implementation.
The purpose of this project was to supply clean and safe running water to the people of Buon Ho Town by expanding the existing water supply system, which was presumed to be insufficient to serve the growing population in the near future. Before the project was initiated, the existing water supply system, with a capacity of 4,200m3 per day, had served about 30% of the population of Buon Ho Town, mainly residents of three wards located in the center of the Town. Those who did not have access to the water supply system used water from wells, mostly on their premises. At the time of project appraisal, it was reported that, because of drinking and using well water, residents were suffering from water-related diseases such as diarrhea, parasitic infections, and skin diseases. Although there was a lack of information on how prevalent and serious the problem was, the health issue was considered a rationale for the project. Accordingly, the project aimed at achieving two objectives: to increase the population connected to piped water with a wider water supply network covering all seven wards of Buon Ho Town; and to improve the health conditions of residents. Table 24 shows the objectives and outcome indicators set for project monitoring and evaluation.
Table 24. Objectives and Targets in Vietnam Project
| Indicators | Baseline (2012) | Target
Objective 1. To improve access to safe and drinkable water | Increased access to waterworks | 15% | 75% by 2013; 90% by 2015
Objective 2. To improve the health of local residents | Reduced number of patients contracting water-borne diseases; reduced cases of parasitic infections | 1,800 people; 18.3% | Reduction of 50%
Overall Goal. To propel community development by establishing social infrastructure | Reduced share of people in extreme poverty | 6.5% | Reduction of 20%
Source: KOICA (2016)
The targets for the piped-water access rate, 75% by 2013 and 90% by 2015, were set following the national target for urban areas. The construction was completed in March 2013. After the project, the coverage of the water supply network in Buon Ho Town was expanded geographically from 3 wards to 7 wards, with the access rate rising from 30% to 64% as of August 2016.
Table 25. Achievement of Vietnam Project
| Before project (2012) | After project (2016)
Production capacity (m3/m) | 126,000 | 294,000
Population with piped water supply | 31.1% | 64.1%
Tap water consumption (m3/m, per household) | 18.5 | 15
Total water consumption (m3/m) | 82,320 | 132,956
Population in target area | 54,218; three wards in center area | All seven wards
The ex-post evaluation was conducted in 2016. The main framework for the evaluation questions and assessments was the DAC criteria. Table 26 summarizes the findings from the DAC framework evaluation.
Table 26. Vietnam Project: DAC Criteria Evaluation
(Rating scale 4-3-2-1; the rating given is circled)
Criteria | Sub-category | Rating Scale
Relevance | 1. Relevance to development strategy and needs of partner country, and to Korea's development cooperation strategy | ④ 3 2 1
Relevance | 2. Relevance of design and implementation | 4 ③ 2 1
Relevance | 3. Ownership of the partner country | ④ 3 2 1
Relevance | Average (a) | 3.7
Efficiency | 1. Cost efficiency (within the planned budget) | ④ 3 2 1
Efficiency | 2. Time efficiency (within the planned time frame) | ④ 3 2 1
Efficiency | 3. Results against inputs | 4 3 ② 1
Efficiency | Average (b) | 3.3
Effectiveness/Impact | 1. The extent to which objectives are met | 4 ③ 2 1
Effectiveness/Impact | 2. Positive or negative impacts on society, economy, institutions | 4 ③ 2 1
Effectiveness/Impact | Average (c) | 3.0
Sustainability | 1. Human resources, institutional and financial aspects | ④ 3 2 1
Sustainability | 2. Maintenance capability and management system | ④ 3 2 1
Sustainability | Average (d) | 4.0
Total Score (a+b+c+d) | | 14.0
Overall Assessment | | Very Successful
Source: KOICA (2016, iii)
Table 27. Two Projects in Comparison
| Juigalpa, Nicaragua (2006~10) | Buon Ho, Vietnam (2010~13)
Initial costs | USD 40,520,000 | USD 5,150,000
External support | 33,134,000 (loan by EDCF) | 4,500,000 (grant by KOICA)
Population in target area | 68,410 | 54,218
Major changes | Before project (2009) / After project (2012) | Before project (2012) / After project (2016)
Production capacity (m3/m) | 340,000 / 650,000 | 126,000 / 294,000
Population with piped water supply | 77.2% / 95.1% | 31.1% / 64.1%
Tap water consumption (m3/m, per household) | 14.4 / 18.6 | 18.5 / 15
Total water consumption (m3/m) | 126,181 / 206,591 | 82,320 / 132,956
Leakage ratio (%) | 15.88 / 53.05 | 11 / 18
5.3.2. Evaluation Results in the DAC Criteria Framework
Both projects were evaluated as "very successful" based on the overall rating, calculated as the sum or average of the scores for the five evaluation criteria. The difference in final scores is negligible, and the scores are not directly comparable because the two agencies used different averaging methods. I summarize the main rationale for the judgment by criterion below, while the reports contain detailed descriptions of pros and cons.
1) Relevance
Both projects were assessed as relevant to the development policies and water strategies of each country, to the needs of the target area and beneficiaries, and to Korea's development cooperation strategies. Regarding project design, some negative aspects were found. The Nicaragua project did not have a set of outcome indicators and targets, which was not required at the time of project design. A lack of consideration of a sewage system was also pointed out. As to the Vietnam project, the outcome indicators were not selected on a clear rationale and the quantitative targets were not realistic. Nonetheless, the overall ratings of the 'relevance' assessments were 3 and 3.7 respectively out of 4.
2) Efficiency
The main questions in the 'efficiency' criterion were whether the project was completed within the planned schedule and budget, and whether it was implemented in the most efficient way in terms of process and technology. Both projects were completed within the initially planned budget and with slight delays in schedule. The process and technology for each project were found to be efficient. Cost-benefit analysis was conducted in both evaluations, and the net benefit of each project was positive. Overall, both projects were given high scores in the 'efficiency' criterion: 4 for the Nicaragua project and 3.3 for the Vietnam project out of 4.
3) Effectiveness
The assessments of 'effectiveness' were based on the outputs and outcomes achieved against the targets in the plans. Both projects produced all planned outputs, so they were recorded as having achieved 100% of their output targets. All facilities were operational, in good condition and with proper maintenance. The quality of the water supplied met the national standards.
As to the outcomes, the population having access to the water supply system was used as an indicator. In the Nicaragua project, access to waterworks increased from 77.2% to 95.1% in the three years since the completion of the project. In the Vietnam project, the access rate doubled after three years from 31% to 64%, but the achievement did not meet the target value, which had been set following the national target without due consideration of what the project would be able to achieve.
4) Impact
The assessment of the 'impact' of the projects relied on observations and household surveys. The Nicaragua project was assessed as 'highly influential', as it was deemed to have made significant contributions to improving the quality of life and living conditions of the direct beneficiaries. The evaluation also considered the socio-economic impacts on vulnerable social groups such as women, children and the poor: those who benefited most were the women and children of newly connected households in the poorest area, who had been responsible for fetching water from a public well.
The Vietnam project evaluation found that the project contributed to the improved living standards of people in the target area to a certain degree, based on the results of the household survey conducted for the evaluation. The majority of respondents said that the convenience of water usage had improved, while the increase in the amount of water they used was only moderate, about 5%.
5) Sustainability
Both projects were found to be very sustainable in terms of institutional, financial and human resource management. In both cases, there appeared to be increasing demand for water service. The systems were operated and managed by government-owned companies, which were assessed to be financially stable or supported by the government and to have the relevant human resources and operational capacity.
5.3.3. Results of Cost-Benefit Analysis
5.3.3.1. Theoretical Basis of CBA for Water Supply Intervention
Estimating Costs of Water Supply Intervention
Costs of water supply interventions are relatively straightforward. They generally consist of two types (Hutton and Haller 2004). Initial investment costs include planning and supervision, construction, hardware, and education for the use of hardware. Recurrent costs, i.e., the costs required to maintain the intervention, may include operational materials, maintenance of hardware and replacement of parts, costs of water treatment and distribution, regulation and control of water supply, management and education of human resources, etc.
Estimating Benefits of Water Supply Intervention
Water supply interventions in developing countries often come with sanitation components, whose primary purpose is to improve health conditions. Consequently, the literature on water supply interventions in the development context has a strong focus on health improvements, broadly dividing the benefits into two categories: health benefits and non-health benefits (Hutton, Haller, and Bartram 2007; OECD 2011). The beneficial health impacts of water supply are associated with a reduced incidence of water-related diseases, which leads to lower expenditure on treatment and avoided productivity losses from sickness and death. Non-health benefits include time saved in water collection, increases in productivity, and improvements in quality of life.
Hutton and Haller (2004) categorized four distinct groups that would
benefit from water service improvements: the health sector, patients, consumers,
and agricultural and industrial sectors, as summarized in Table 28.
Table 28. Costs and Benefits in Water Supply Interventions
Costs
Investment costs | Planning and supervision, hardware, construction, protection of water sources, education for the use of hardware
Recurrent costs | Operational materials, maintenance of hardware and replacement of parts, costs of water treatment and distribution, regulation and control of water supply, etc.
Benefits
Health-related benefits | Government health-care costs saved; household health-care costs saved; saved time of patient or caretaker; benefits from mortalities postponed; value of avoided days lost at work or at school
Consumers | Time savings related to water collection; switch away from more expensive water sources; increased quality of life
Agricultural and industrial sectors | Improved productivity due to improved water supply and more efficient management of water resources
Source: arranged by author based on Hutton and Haller (2004) and Cameron (2011).
Benefits from a reduced incidence of water-related diseases accrue not only to patients and caretakers, who save treatment-related costs and time, but also to the health sector, which spends less on health care. Consumers of water service benefit from time savings related to water collection and from less expensive water sources. Agricultural and industrial sectors gain improved productivity due to more efficient management of water resources. In addition, Cameron (2011) added wider social benefits, such as environmental gains associated with the amenity value of land and social capital benefits related to increased confidence and trust.
When conducting a CBA, it is useful to apply the concepts of microeconomic theory to measuring the benefits (Boardman et al. 2011).16 Figures 14 and 15 illustrate the changes in consumer surplus before and after a water supply project.
Suppose that the initial price of water is P1 and the quantity consumed at P1 is Q1. A water supply intervention that reduces the price of water from P1 to P2 would increase the quantity consumed from Q1 to Q2, resulting in increased benefits to consumers equal to the shaded area 'a' plus 'b' (Figure 14). Consumers gain benefits in two ways: they pay a lower price than before for the same amount of water (the area 'a'), and they consume an additional quantity of water, Q2 − Q1 (the area 'b').
‘b’). We can also think of a case where the demand curve shifts to the right
(D’) as shown in Figure 15. For example, better water quality after the new
water system may result in the increase in demand, and the increase consumer
16 The concepts of microeconomic model presented here are drawn from Chapter 3.
Microeconomic Foundations of Cost-Benefit Analysis in the referred literature.
139
welfare equal to the area ‘c’ would be estimated by improvement in health.
[Figure 14. Consumer Surplus of Water Supply Project (1). Figure 15. Consumer Surplus of Water Supply Project (2). Demand-curve diagrams; figures omitted.]
P1: price of water used before connection
P2: price of water (water tariff) after connection
Q1: water consumption before connection
Q2: water consumption after connection
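Where the demand curve can be approximated as linear between the observed points (a standard simplification, not spelled out in the figures), the gain in consumer surplus corresponding to areas 'a' and 'b' is:

\[
\Delta CS \approx (P_1 - P_2)\,Q_1 + \tfrac{1}{2}\,(P_1 - P_2)\,(Q_2 - Q_1).
\]

The second term is the so-called rule of half applied to the induced consumption Q2 − Q1.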
Estimating the consumer surplus increased by a project requires quantitative information, such as the changes in water price and quantity of water consumed before and after the project, and the number of people or households that benefit from the project. Data on factors which shift the demand curve, for example the extent of benefits in health or environment, are also needed where improvements are observed as consequences of
the project.
5.3.3.2. Estimating Benefits of Two Water Supply Projects
Based on the theories discussed above, CBA was conducted for the two water supply projects. Since baseline information on the price and amount of water people used before the projects was not available, the evaluation teams, including the author, collected data from household surveys and interviews. Statistics from the local governments and water management companies were also incorporated in the estimation of NPVs. The information provided in this section is based on the evaluation teams' findings on each project, which were published by the respective agencies in EDCF (2012) and KOICA (2016).
In the Nicaragua project, 77% of households had access to waterworks before the project, but they could not fully enjoy the service. The water supply was very unstable due to the irregular and insufficient water source. Water supply was restricted, on average, to once every three days in rainy seasons and two or three days a month in dry seasons. People stored tap water in tanks or buckets when it was supplied, and used the stored water during shutdowns or fetched water from wells. In dry seasons, when the water supply fell far short of demand, people had to buy water from private water trucks, as wells often dried up. Those who had no access to the water supply system mainly used wells in rainy seasons, and they too had to buy water from water vendors in dry seasons.
After the project, the water supply was expanded to 95% of the population. The new system provides stable and sufficient water to cover demand regardless of rainy or dry seasons. Households with a waterworks connection can use tap water 24 hours a day, 7 days a week. The price of tap water, the water tariff, is $0.2/m3, much lower than the price from water trucks (approximately $10/m3 on average) or the cost of fetching water from wells (more than 50 hours for 1m3 on average), so people no longer used wells or bought water from private vendors. The quantity of water consumed also increased, as people could use water more conveniently at a much lower price.
The extent of the price and quantity changes in water consumption varied according to the pattern of water usage before the project, which in turn depended on geographical area and income level. Three groups were identified: first, households with low income in a remote area where the water supply had not been installed before the project; second, households in the water-connected residential area with medium income; and third, households in the high-income commercial area whose water consumption was relatively large. Table 29 summarizes the information on these three groups before the project.
Table 29. Patterns of Water Consumption in Juigalpa before the Project
Before project | Remote Area: areas without water connection (low income) | Residential Area: residential area with water connection (medium income) | Central Area: commercial area with water connection
Population | 2,000 households; 12,000 people | 6,000 households; 36,000 people | 3,000 households; 18,000 people
Monthly income per HH | $100-150 | $150-300 | over $300
Major source of income | housekeeper ($100/month) | civil servant, teacher, self-employed | store, restaurant, self-employed
Amount of use per HH (m3/month), rainy / dry season | 3~4 / 3~4 | 20 / 10~15 | 40 / 15~30
Major source of water, rainy / dry season | public well / water vendor | tap water or public well / tap water or water vendor | tap water / tap water or delivery (in person)
Costs due to the lack of water supply | time spent fetching water; water price (from water vendor) | inconvenience and time spent purchasing water from a water truck; water price (from water vendor); inconvenience and uncertainty when using tap water | cost of delivering water by vehicle; inconvenience and uncertainty when using tap water
Source: EDCF (2012)
Table 30 shows the changes in the quantity of water that the beneficiaries consumed before and after the project. Households in the remote area without a water supply connection used to consume a relatively small amount of water, approximately 3m3/month on average, because they had to fetch water from distant public wells or pay a high price to water vendors. With access to an inexpensive water supply, their water consumption increased to 15m3/month for the average household. The other groups, which had a water connection before the project, could now enjoy a stable water supply 24 hours a day, 7 days a week, resulting in an increase in average water consumption. The avoided costs of uncertainty and inconvenience in using tap water before the project are also a beneficial effect of the project.
Table 30. Changes in Water Consumption Before and After the Project
(Unit: m3/month; rainy / dry season)
Area | Remote Area | Residential Area | Central Area
Before project: Wells | 0 / 3 | 0 / 0 | 5 / 0
Before project: Water trucks | 3 / 0 | 5 / 0 | 0 / 0
Before project: Tap water | 0 / 0 | 10 / 20 | 20 / 40
Before project: Total | 3 / 3 | 15 / 20 | 25 / 40
After project: Wells | 0 / 0 | 0 / 0 | 0 / 0
After project: Water trucks | 0 / 0 | 0 / 0 | 0 / 0
After project: Tap water | 15 / 15 | 25 / 25 | 45 / 45
After project: Total | 15 / 15 | 25 / 25 | 45 / 45
Source: EDCF (2012)
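As a rough, back-of-envelope illustration of how such data feed the surplus estimation (the numbers come from the tables above, but the calculation is mine and ignores seasonality and the time cost of fetching): for a remote-area household that used to buy about 3m3/month from vendors at roughly $10/m3 and now consumes 15m3/month of tap water at $0.2/m3, the rule of half gives

\[
\Delta CS \approx (10 - 0.2)\times 3 + \tfrac{1}{2}(10 - 0.2)\times(15 - 3) \approx \$88 \text{ per household per month.}
\]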
Compared to Juigalpa, Buon Ho Town, the target area of the Vietnam project, has relatively abundant rainfall throughout the year, even in the so-called dry seasons. Most households own a private well in the yard, and many are equipped with electric motor pumps. The household survey and interviews with residents revealed that the amount of water they used before being connected to the water service was about 15~25m3 per month, much larger than what residents of Juigalpa used. Table 31 compares the patterns of water use in Buon Ho Town before and after the project.
Table 31. Patterns of Water Consumption in Buon Ho Town before and after the Project
| Before connection | After connection
Major water source | Well on premises | Tap water, well on premises
Amount of water use (m3/month) | 15~25 | 10~30 from tap water; 5~10 from well
Use of water: for drinking | bottled water, well | bottled water, tap water
Use of water: for cooking, washing dishes | well | tap water, partly well (10%)
Use of water: for shower, washing clothes | well | tap water, partly well (15%)
Use of water: for cleaning, etc. | well | well, partly tap water
Benefits of water connection | | Saved cost of using a water pump (maintenance, replacement of parts, electricity, etc.); saved time collecting water (mostly from the well on premises, rarely from outside); reduced inconvenience and uncertainty when using the well; reduced purchase of bottled water for drinking
Interestingly, many households kept using their wells even after being connected to the water supply system. Only 10% of newly connected households responded that they had completely switched from well to tap water. The others still used well water for washing and cleaning, because using well water caused them little inconvenience, while they felt the water tariff was not considerably cheaper than using the well.
The main benefits from a water supply intervention come from the lowered price of water and the improved convenience of access in terms of quantity (OECD 2011, 46). In Buon Ho Town, the difference between the price of tap water and the cost of using wells appeared insignificant, so a substantial increase in consumer surplus could hardly be expected. In terms of quantity, households used to draw sufficient water from wells with electric pumps and water tanks before the project, so the increase in water consumption was limited for the average household. A small number of households that had not installed an electric pump on the well benefited significantly in terms of convenience from the water connection.
5.3.3.3. Results of Cost-Benefit Analysis in Comparison
The two projects share similar characteristics in terms of project activities, expected outputs and outcomes, the size of the beneficiary population, and the level of development of the target area. Juigalpa and Buon Ho Town are urban areas with basic social infrastructure such as major roads, electricity, schools and health services. The standard of living was similar and relatively higher than in rural areas. In both areas, a water supply system had existed before the intervention, with problems such as insufficient capacity.
The major difference lies in the costs of the projects. The initial investment in the Nicaragua project was approximately $40 million, eight times larger than that in the Vietnam project, $5.2 million. Dividing the project cost by the number of beneficiaries, the method often used for cost-effectiveness comparison, the cost per capita in the Nicaragua project, about $580, is much higher than that in the Vietnam project, around $95. Such a simple comparison of costs to population can be misleading, however, as it does not consider the magnitude of benefits. Cost-benefit analysis provides a window through which to view the effects of the projects in consideration of the size of the benefits relative to the costs.
Both interventions basically expanded water supply systems and networks, so the data for estimating costs and benefits, e.g., management costs, water price and consumption, were available and relatively reliable. CBAs were conducted for both projects using the same method. The benefits mainly came from the consumers' side, as the water supply systems primarily targeted residents rather than the agricultural or industrial sectors. As to health benefits, the evaluation teams found that it would not be sensible to expect improvements in health conditions, for the following reasons. In both areas, the living standard was relatively high compared to rural areas, with good access to basic social infrastructure. The water people used to drink was not 'unsafe', so it was unlikely to have caused serious water-borne diseases.17

17 The survey in the Vietnam project found that about 30% of households in Buon Ho Town buy and use 20-liter bottled water for drinking purposes.

Data from
health authorities also failed to show a meaningful relationship between the projects and the number of patients with water-related diseases. This is consistent with the findings of a meta-analysis of the impacts of water supply interventions, which states that water supply interventions alone showed negligible and insignificant impacts on diarrhea morbidity (Waddington et al. 2009, quoted in OECD 2011, 49).
Table 32. Estimation of NPV and B/C of Two Water Projects
Discount rate | Nicaragua Project: NPV ($) | B/C | Vietnam Project: NPV ($) | B/C
8% | 33,597,094 | 1.71 | 2,661,045 | 1.31
9% | 26,308,096 | 1.57 | 1,708,372 | 1.21
10% | 20,174,695 | 1.44 | 909,463 | 1.11
11% | 14,973,178 | 1.33 | 234,641 | 1.03
12% | 10,529,080 | 1.24 | -339,320 | 0.96
Note: Detailed descriptions and methods of calculation are available in EDCF (2012) and KOICA (2016).
The results of the cost-benefit analyses of the two projects are shown in Table 32. The Nicaragua project would be expected to have an NPV of approximately $20 million over a 35-year project life at a 10% discount rate, and it showed positive NPVs across the 8~12% range of discount rates. The Vietnam project showed a relatively small net social benefit compared to the Nicaragua project. Calculated with a discount rate of 10%, its NPV is positive at about $0.9 million over 35 years, indicating that the project would generate economic benefits at that discount rate. Applying a discount rate of 12% as a sensitivity test, however, the NPV turns negative.
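The mechanics of such a sensitivity test are straightforward to reproduce. The sketch below sweeps the discount rate for the stylized base-case project of Section 5.2.2, not for the two evaluations' actual cashflows, which are documented only in the respective reports:

```python
# Sensitivity of NPV and B/C to the discount rate for the stylized project:
# $1,000,000 invested in Year 0, $120,000 of benefits in Years 1..25.
for r in (0.08, 0.09, 0.10, 0.11, 0.12):
    pv_benefits = sum(120_000 / (1 + r) ** t for t in range(1, 26))
    print(f"r = {r:.0%}: NPV = {pv_benefits - 1_000_000:>10,.0f}, "
          f"B/C = {pv_benefits / 1_000_000:.2f}")
# For this stylized project the NPV turns negative between 11% and 12%,
# the same pattern the Vietnam project shows in Table 32.
```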
As mentioned earlier, the total cost of the Nicaragua project is much higher than that of the Vietnam project. The results of the CBAs show that a simple comparison of per capita costs, often used as a standard in the 'efficiency' criterion, can be misleading, since it does not compare the size of the benefits relative to the costs.
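One way to see this is to divide each project's NPV at the 10% discount rate by its beneficiary population, a back-of-envelope figure of my own rather than one reported in the evaluations: roughly $20.2 million / 68,410 ≈ $295 of net benefit per capita for the Nicaragua project, against about $0.9 million / 54,218 ≈ $17 for the Vietnam project, despite the latter's far lower cost per capita.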
The magnitude of the benefits depends on the conditions before the project, that is, on the costs the beneficiaries had to bear to get water before the project. These costs can include time, expenditure on water, or inconvenience and uncertainty. For those who used to spend two hours a day fetching water, or several dollars per month buying drinking water, the benefits of access to a water supply system at a reasonable tariff would be considerable. On the other hand, for those who used to obtain water without much inconvenience at a relatively low cost, the benefits might not be significant.
In the Nicaragua case, those who did not have a household connection before the project used to spend a lot of time and money fetching or buying water, and even people with a household connection experienced much inconvenience because of the irregular and insufficient water supply. After the project they have a stable water supply at home, which led to increased water consumption with improved convenience. In Buon Ho Town, most households had a well in the yard before the project, in most cases with an electric pump to draw the water and store it in a tank, so the cost and inconvenience of using water were not very high. For a household that used to draw water from a well with an automatic pump and store it in a two-ton tank on the top of the house, the way of using water before and after the water supply connection would not differ significantly. The differences would be the quality of tap water versus well water, and perhaps the effort of pushing the button of the automatic pump to draw and store the well water in the tank.
The increase in water consumption is another important component of the benefits from a water supply intervention. The average tap water consumption per household in Juigalpa increased from 14.4m3 per month to 18.6m3, while in the Vietnam project it decreased from 18.5m3 to 15m3. This implies that newly connected households in Buon Ho Town do not use tap water as much as those who already had a connection. Since people kept using well water even after being connected, for purposes other than drinking or cooking, e.g., doing laundry or watering plants in the yard, tap water consumption does not reflect the total amount of water used. Nevertheless, the survey results showed that residents of Buon Ho Town consumed sufficient water from their wells even before the project, so the increase in water consumption, and the consequent improvement in consumer surplus, seems to be limited.
What the CBA results do not show is the benefit from improved water quality, which is not easily quantified. It can reasonably be inferred that the quality of purified tap water would be much better than that of well water or water purchased from water trucks. Judgement about how to value this type of benefit is therefore left to evaluators and requires qualitative assessment.
5.4. Discussion
In this chapter, I examined how the methods of synthesis can lead to different evaluation results between the DAC framework and CBA. The DAC criteria carry the same weight when integrated into a final judgement of success, while CBA aggregates both positive and negative values into a single net value, the NPV.
The hypothetical case analysis shows that evaluations applying the DAC criteria can generate a positive conclusion on a project whose net social benefit, i.e., the NPV, is negative. It is fair to conclude that a positive bias is inherent in the DAC criteria, as the positive conclusion is drawn by combining the assessments in the five criteria against the standards as defined.
The comparative case study of water supply projects suggests that a project with a smaller NPV (or benefit-cost ratio, B/C) can receive similar or more favorable assessments in the DAC criteria framework than another project with a greater NPV (or B/C). In the DAC criteria framework, both projects were evaluated as 'very successful', with similar findings in each criterion. The results of the CBAs, however, show a much larger NPV (about 20 million USD) and B/C (1.44) in the Nicaragua project than in the Vietnam project (about 0.9 million USD and 1.11 respectively). In a sensitivity test, the NPV of the Vietnam project turned negative, showing that it is not safe to make a firm judgement that it is a successful project. The DAC criteria framework yielded positively biased evaluation results for the Vietnam project, whose net social benefit is relatively small and possibly even negative, giving it the same high level of assessment as the Nicaragua project, whose net social benefit is much larger and more stable. This means that the extent to which positive bias occurs is inconsistent across evaluation results.
Positively biased evaluation results can be attributed to the standards of judgement in each criterion, as well as to the synthesis method of integrating the assessments in different criteria with equal weights, which is rather arbitrary. Applying this compensatory model, the DAC framework allows a serious problem in one criterion to be compensated by a good assessment in another. The problem may be serious enough to make the project's NPV negative, in which case the project would not be considered worthwhile or successful; yet the overall conclusion in the DAC framework would be affected only partially, as the assessments in the criteria are combined into the overall judgement with the same level of importance.
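A stylized illustration of how the compensatory model works (the ratings here are invented for the arithmetic, not taken from any report): a project rated 4 out of 4 on relevance, efficiency, effectiveness and impact, but 1 out of 4 on sustainability because its benefits are expected to stop early as in Scenario 2 of Table 19, still receives

\[
\frac{4 + 4 + 4 + 4 + 1}{5} = 3.4,
\]

a high overall score, even though the early termination of benefits is precisely the event that drives the NPV deeply negative.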
CHAPTER 6. CONCLUSIONS
This study examined the OECD/DAC evaluation criteria, asking whether they contribute to positively biased and inconsistent evaluation results, in comparison with cost-benefit analysis (CBA). Positive bias may reduce comparability between evaluations, which is one of the purposes for which the DAC criteria were developed, making it difficult to differentiate a more successful intervention from a less successful one and possibly causing inconsistency across evaluation results.
Implementing the 'general logic of evaluation' as the analytical framework, the study organized the analyses into three stages, dealing with:
first, the notion of merit that the DAC criteria define for a successful
development intervention; second, the standards of judgement and source of
supporting evidence in each criterion; and third, the methods of synthesis by
which the assessments in the five criteria are integrated into the overall
evaluative conclusion on the intervention.
The term ‘evaluation criteria’ is defined as the aspects, qualities, or
dimensions that distinguish a more valuable evaluand from a less valuable one.
Based on the review of key requirements for evaluation criteria in various
evaluation models, the defining features of good evaluation criteria are drawn
as follows. First, a criterion should provide a clear notion of the quality or value to be assessed. If a criterion asks questions about what is taken for granted, or is not supported by evidence that addresses the criterion directly, it would be superfluous or inappropriate. Second, a criterion should offer guidance for setting a standard of assessment: what level of quality is enough to be considered valuable or acceptable. If the standards of judgement are subjective and easy to satisfy, the assessment can hardly provide reliable information with which to judge the true value of the evaluand or to discriminate between the values of evaluands. Third, the criteria should be complete, non-overlapping, and commensurable. Omission of an important aspect could result in non-counting problems when synthesizing the assessments; likewise, overlapping criteria could cause double-counting problems. This is essential especially when the assessments are integrated into an overall rating.
Given the definition of ‘evaluation criteria’, the DAC criteria would
represent the merit of the program being evaluated, encompassing the
properties that constitute a successful intervention. The study finds that the
DAC criteria cover most of the dimensions that general evaluation models
suggest, but with some limitations. The ‘relevance’ criterion primarily
focuses on the policy context and asks ex-ante questions with less attention to
the need of the program or the logical linkage between the program and
expected results. Measuring ‘effectiveness’, defined as the achievement of
program objectives, and ‘efficiency’, e.g., whether completed on time and
within the budget, is largely based on the assumption that the program’s initial
plan and objectives are valid. The ‘sustainability’ criterion assumes that the
results are beneficial and worth continuing.
This logical interdependence between criteria may affect assessment in
the criterion at issue because the standard of judgement may have to be adjusted
according to the assessment in another criterion. Whether to achieve the
stated objectives, the primary condition to satisfy the effectiveness criterion,
may not be an appropriate standard to judge the program’s success, if the stated
objectives are not valid. So meeting the relevance criterion would be the
precondition for meaningful assessment for effectiveness, otherwise the
standard should be reestablished. The similar interdependence lies between
effectiveness/impact and sustainability. What kinds and magnitude of
benefits can be sustained depends on the analysis in effectiveness and impact
criteria, so the standards of judgement in sustainability criterion should be
drawn accordingly. If the assessments are made separately and integrated into
an overall conclusion without a thoughtful consideration of relative importance,
there is a risk of bias in overall results.
The review of 65 ex-post evaluation reports by two Korean aid agencies,
KOICA and EDCF, confirms the analysis above. The evaluations under
review applied the DAC criteria framework mostly using the standardized
questions and standards of judgment in the guidelines. Some of the questions
address what is taken for granted as a development project or do not necessarily
require in-depth research. In most cases, assessments in each criterion were made independently, though some of them should have rested on the information found in other criteria. This leads to high ratings on average, especially in the relevance and effectiveness criteria. 88% of the reports
concluded that the evaluand was either successful or very successful, even in cases where the development results were not deemed significant. The evaluation conclusions were drawn from a simple average of the ratings by criterion, and a serious flaw detected in one criterion, e.g., sustainability, was often compensated by a high rating in another, e.g., relevance. Such mechanical application of the DAC criteria and standardized questions can mislead evaluation results, in most cases towards positive conclusions, making it difficult to understand the true value of the project, to differentiate a more successful project from a less successful one, and to draw valid lessons from the evaluation.
To explain the suspected positive bias in the DAC criteria, this study adopts the criterion used in cost-benefit analysis (CBA), namely the net present value (NPV), as a benchmark. Theoretically, CBA considers all direct and indirect effects of a project against its costs for a certain period, and examines all the aspects that the DAC criteria suggest. I argued that the NPV, as an indicator that represents the social value of a project, covers the scope of the DAC criteria: the needs (relevance), benefits in relation to costs (efficiency), increase in social welfare (effectiveness and impact), and the period over which the benefits are maintained (sustainability). I explained that certain events which seriously affect the NPV, making the assessment of a project negative, only partially affect the evaluation results under the DAC criteria, resulting in a more positive assessment than in CBA. The comparative case studies presented in Chapter 5 show that the DAC criteria framework yields positively biased evaluation results for a project whose net social benefit is relatively small and possibly even negative, giving it the same high level of assessment as another project whose net social benefit is much larger and more stable.
To summarize, this study finds that the DAC criteria can produce positive bias in evaluation results, and that this bias is even more evident in practice, as in the cases of the Korean agencies. This positive bias was analyzed conceptually and empirically in comparison with the evaluation standard in CBA, and found to be attributable to the arbitrary relative weights placed on the DAC criteria. The study also finds that positive bias may occur unevenly between projects, resulting in inconsistency across evaluation results. The findings have significant policy implications, as positive bias and its inconsistent occurrence may seriously weaken the validity of evaluations and consequently mislead the agencies' learning and decision making, which are the primary purposes of evaluation.
This study contributes to the relatively small body of academic literature
on international development evaluation, by adding value from a new
perspective to look into the limitations of the DAC criteria as an evaluation
framework. It would also enrich the recent discussion on how to improve the
DAC criteria, providing both theoretical and empirical analyses.
REFERENCES
ADB. 2013. Cost-Benefit Analysis for Development: A Practical Guide.
Mandaluyong City, Philippines: Asian Development Bank.
ADB. 2016. Guidelines for Preparing a Design and Monitoring Framework.
Mandaluyong City, Philippines: Asian Development Bank.
ADB. 2017. Guidelines for the Economic Analysis of Projects (revised version
of the 1997 edition). Mandaluyong City, Philippines: Asian
Development Bank.
Alkin, Marvin C. 2010. Evaluation Essentials from A to Z. New York:
Guilford Press.
Alkin, Marvin C. 2013a. "Comparing Evaluation Points of View." In
Evaluation Roots: A Wider Perspective of Theorists' Views and
Influences, edited by Marvin C. Alkin, 3-10. Thousand Oaks, CA:
Sage Publications.
Alkin, Marvin C., ed. 2013b. Evaluation Roots: A Wider Perspective of
Theorists' Views and Influences. 2nd ed. Thousand Oaks, CA: Sage
Publications.
Alkin, Marvin C., Michael Quinn Patton, and Carol H. Weiss. 1990. Debates
on Evaluation. Newbury Park, CA: Sage Publications.
ALNAP. 2006. Evaluating Humanitarian Action Using the OECD-DAC
Criteria: An ALNAP Guide for Humanitarian Agencies. London:
Overseas Development Institute.
Astbury, Brad. 2016. "From Evaluation Theory to Tests of Evaluation
Theory?" In The Future of Evaluation: Global Trends, New
Challenges, Shared Perspectives, edited by Reinhard Stockmann and
Wolfgang Meyer, 309-325. London: Palgrave Macmillan UK.
Bamberger, Michael. 1991. "The Politics of Evaluation in Developing
Countries." Evaluation and Program Planning 14 (4):325-339.
Bamberger, Michael. 2000. "The Evaluation of International Development
Programs: A View from the Front." American Journal of Evaluation
21 (1):95-102.
Bamberger, Michael. 2009. "Why Do Many International Development
Evaluations Have a Positive Bias?: Should We Worry?" Evaluation
Journal of Australasia 9 (2):39-49.
Bamberger, Michael, Jim Rugh, and Linda Mabry. 2011. RealWorld
Evaluation: Working Under Budget, Time, Data, and Political
Constraints. 2nd ed. Thousand Oaks, CA: SAGE Publications.
Berlage, L., and O. Stokke, eds. 1992. Evaluating Development Assistance:
Approaches and Methods, EADI Book Series 14: Evaluation of Aid.
London: Frank Cass.
Binnendijk, Annette L. 1989. "Donor Agency Experience With the Monitoring
and Evaluation of Development Projects." Evaluation Review 13
(3):206-222.
Boardman, A., D. Greenberg, A. Vining, and D. Weimer. 2011. Cost-Benefit
Analysis: Concepts and Practice. 4th ed. Boston: Pearson Education.
Brent, Robert J. 2006. Applied Cost-Benefit Analysis. 2nd ed. Cheltenham,
UK: Edward Elgar.
Cameron, John. 2011. "Social Cost-Benefit Analysis - Principles." In Valuing
Water, Valuing Livelihoods - Guidance on Social Cost-Benefit
Analysis of Drinking-Water Interventions, with Special Reference to
Small Community Water Supplies, edited by John Cameron, Paul
Hunter, Paul Jagals and Katherine Pond, 199-216. London, UK: IWA
Publishing.
Camfield, Laura, Maren Duvendack, and Richard Palmer-Jones. 2014.
"Things You Wanted to Know about Bias in Evaluations but Never
Dared to Think." IDS Bulletin 45 (6):49-64.
Carden, Fred. 2013. "Evaluation, Not Development Evaluation." American
Journal of Evaluation 34 (4):576-579.
Chelimsky, Eleanor. 1997. "Thoughts for a New Evaluation Society."
Evaluation 3 (1):97-109.
Chianca, Thomaz. 2008. "The OECD/DAC Criteria for International
Development Evaluations: An Assessment and Ideas for
Improvement." Journal of MultiDisciplinary Evaluation 5 (9):41-51.
Christie, Christina A, and Marvin C Alkin. 2013. "An Evaluation Theory
Tree." In Evaluation Roots: a Wider Perspective of Theorists’ Views
and Influences, edited by Marvin C Alkin, 11-57. Thousand Oaks,
CA: Sage Publications.
Cracknell, Basil E. 2000. Evaluating Development Aid: Issues, Problems and
Solutions. New Delhi: SAGE Publications.
Cullen, Anne E., and Chris L. S. Coryn. 2011. "Forms and Functions of
Participatory Evaluation in International Development: A Review of
the Empirical and Theoretical Literature." Journal of
MultiDisciplinary Evaluation 7 (16):32-47.
Dale, Reidar. 2004. Evaluating Development Programmes and Projects. 2nd
ed. New Delhi: SAGE Publications.
Davidson, E Jane. 2005. Evaluation Methodology Basics: The Nuts and Bolts
of Sound Evaluation. Thousand Oaks: SAGE Publications.
de Rus, Ginés. 2010. Introduction to Cost-Benefit Analysis: Looking for
Reasonable Shortcuts. Cheltenham, UK: Edward Elgar.
Donaldson, Stewart I. 2007. Program Theory-Driven Evaluation Science:
Strategies and Applications. New York: Psychology Press.
Donaldson, Stewart I., ed. 2013. The Future of Evaluation in Society: A
Tribute to Michael Scriven. Charlotte, NC: Information Age
Publishing.
EDCF. 2012. Water Supply Expansion Project in Juigalpa, Nicaragua - Ex-
post Evaluation Report. Seoul: Export-Import Bank of Korea.
EDCF. 2014. Ex-post Evaluation on Medical Equipment Provision to Ha
Trung District General Hospital in Thanh Hoa Project, Vietnam.
Seoul: Export-Import Bank of Korea.
Fewtrell, Lorna, and John M. Colford Jr. 2004. Water, Sanitation and
Hygiene: Interventions and Diarrhoea - A Systematic Review and
Meta Analysis. Washington, DC: World Bank.
Forss, Kim, and Sara Bandstein. 2008. "Evidence-based Evaluation of
Development Cooperation: Possible? Feasible? Desirable?" IDS
Bulletin 39 (1):82-89.
Fournier, Deborah M. 1995. "Establishing Evaluative Conclusions: A
Distinction between General and Working Logic." New Directions for
Evaluation 1995 (68):15-32. doi: 10.1002/ev.1017.
Frank, Robert H. 2000. "Why Is Cost‐Benefit Analysis so Controversial?"
The Journal of Legal Studies 29 (S2):913-930.
Guba, Egon G., and Yvonna S. Lincoln. 1989. Fourth Generation Evaluation.
Newbury Park, CA: Sage Publications.
Hansen, Mark, Marvin C. Alkin, and Tanner LeBaron Wallace. 2013.
"Depicting the logic of three evaluation theories." Evaluation and
Program Planning 38:34-43.
Heider, Caroline. 2017. "Rethinking Evaluation – Have we had enough of
R/E/E/I/S?: After nearly 15 years of adhering to the DAC evaluation
criteria, is it time for a rethink?" World Bank Group Rethinking
Evaluation series. https://ieg.worldbankgroup.org/blog/rethinking-
evaluation.
Hutton, Guy, and Laurence Haller. 2004. Evaluation of the Costs and Benefits
of Water and Sanitation Improvements at the Global Level. Geneva:
World Health Organization.
Hutton, Guy, Laurence Haller, and Jamie Bartram. 2007. "Global Cost-Benefit
Analysis of Water Supply and Sanitation Interventions." Journal of
Water and Health 5 (4):481-502.
IEG. 2017. "Conversations: the Future of Development Evaluation." Last
Modified June 21, 2017, accessed June 25, 2017.
https://ieg.worldbankgroup.org/news/conversations-future-
development-evaluation.
IFAD. 2015. IFAD’s Internal Guidelines – Economic and Financial Analysis
(EFA) of Rural Investment Projects, Vol. 1. Basic Concepts and
Rationale.
IFAD. 2016a. IFAD’s Internal Guidelines – Economic and Financial Analysis
(EFA) of Rural Investment Projects, Vol. 2. Economic and Financial
Analysis of Rural Investment Projects.
IFAD. 2016b. IFAD’s Internal Guidelines – Economic and Financial Analysis
(EFA) of Rural Investment Projects, Vol. 3. Case Studies.
Igarashi, Masahiro, and Omar Awabdeh. 2015. "Weaning from DAC Criteria."
The 5th Biennial International Conference of Sri Lanka Evaluation
Association, Colombo, 15-16 September 2015.
Julnes, George. 2012. "Managing Valuation." New Directions for Evaluation
2012 (133):3-15.
Keeney, Ralph L., and Robin S. Gregory. 2005. "Selecting Attributes to
Measure the Achievement of Objectives." Operations Research 53
(1):1-11.
King, Jean A. 2003. "The Challenge of Studying Evaluation Theory." New
Directions for Evaluation 2003 (97):57-68.
King, Julian. 2015. "Letter to the editor: Use of Cost-benefit Analysis in
Evaluation." Evaluation Journal of Australasia 15 (3):37-41.
King, Julian. 2017. "Using Economic Methods Evaluatively." American
Journal of Evaluation 38 (1):101-113.
KOICA. 2014. Ex-post Evaluation on the Project for the Establishment of an
Early Warning System for Disaster Mitigation in Philippines.
Seongnam: Korea International Cooperation Agency (KOICA).
KOICA. 2016. Ex-Post Evaluation of the Project for the Construction of
Water Supply System in Buon Ho Town, Vietnam. Seongnam: Korea
International Cooperation Agency (KOICA).
Mark, Melvin M. 2008. "Building a Better Evidence Base for Evaluation
Theory: Beyond General Calls to a Framework of Types of Research
on Evaluation." In Fundamental Issues in Evaluation, edited by Nick
L. Smith and Paul R. Brandon. New York: Guilford Press.
Mark, Melvin M., Jennifer C. Greene, and Ian Shaw. 2006. "The Evaluation
of Policies, Programs, and Practices." In Handbook of Evaluation:
Policies, Programs and Practices, edited by Ian Shaw, Jennifer C.
Greene and Melvin M. Mark. London: SAGE.
Markiewicz, Anne, and Ian Patrick. 2016. Developing Monitoring and
Evaluation Frameworks. Thousand Oaks, CA.: SAGE Publications.
Mathison, Sandra, ed. 2005. Encyclopedia of Evaluation. Thousand Oaks:
SAGE Publications.
McDonald, Diane. 1999. "Developing Guidelines to Enhance the Evaluation
of Overseas Development Projects." Evaluation and Program
Planning 22 (2):163-174.
Michaelowa, Katharina, and Axel Borrmann. 2006. "Evaluation Bias and
Incentive Structures in Bi- and Multilateral Aid Agencies." Review
of Development Economics 10 (2):313-329.
Miller, Robin Lin, and Donald Campbell. 2006. "Taking Stock of
Empowerment Evaluation: An Empirical Review." American
Journal of Evaluation 27 (3):296-319.
Morra-Imas, L.G., and R.C. Rist. 2009. The Road to Results: Designing and
Conducting Effective Development Evaluations. Washington, DC:
World Bank.
OECD. 1991. Principles for Evaluation of Development Assistance. Paris:
OECD.
OECD. 2002. "Glossary of Key Terms in Evaluation and Results Based
Management." OECD DAC Working Party on Aid Evaluation.
http://www.oecd.org/dataoecd/29/21/2754804.pdf.
OECD. 2006. Cost-Benefit Analysis and the Environment: Recent
Developments. Paris: OECD Publishing.
OECD. 2011. Benefits of Investing in Water and Sanitation: An OECD
Perspective. Paris: OECD Publishing.
OECD. 2013. The DAC Network on Development Evaluation – 30 years of
strengthening learning in development. Paris: DAC Network on
Development Evaluation, OECD.
OECD. 2016. Evaluation Systems in Development Co-operation: 2016
Review. Paris: OECD Publishing.
OECD. n.d. DAC Criteria for Evaluating Development Assistance Factsheet.
Accessed November 29, 2012.
Owen, John M., and Patricia J. Rogers. 1999. Program Evaluation: Forms
and Approaches. London: SAGE.
Patton, Michael Quinn. 1994. "Developmental Evaluation." Evaluation
Practice 15 (3):311-319.
Patton, Michael Quinn. 1997. Utilization-Focused Evaluation: The New
Century Text. 3rd ed. Thousand Oaks, CA: Sage Publications.
Raimondo, Estelle, Jos Vaessen, and Michael Bamberger. 2016. "Towards
More Complexity-Responsive Evaluations: Overview and
Challenges." In Dealing with Complexity in Development Evaluation:
A Practical Approach, edited by Michael Bamberger, Jos Vaessen and
Estelle Raimondo, 26-47. Thousand Oaks, CA: SAGE Publications.
Rebien, Claus C. 1996. Evaluating Development Assistance in Theory and in
Practice. Aldershot: Avebury.
Rebien, Claus C. 1997. "Development Assistance Evaluation and the
Foundations of Program Evaluation." Evaluation Review 21
(4):438-460.
Riddell, Roger C. 2007. Does Foreign Aid Really Work? Oxford: Oxford
University Press.
Rossi, Peter H., Mark W. Lipsey, and Howard E. Freeman. 2004. Evaluation:
a Systematic Approach. 7th ed. Thousand Oaks, CA: Sage.
Rudd, Murray A. 2009. "Nonmarket Economic Valuation and 'The
Economist's Fallacy'." Journal of MultiDisciplinary Evaluation 6
(11):112-115.
Scott-Little, Catherine, Mary Sue Hamann, and Stephen G. Jurs. 2002.
"Evaluations of After-School Programs: A Meta-Evaluation of
Methodologies and Narrative Synthesis of Findings." American
Journal of Evaluation 23 (4):387-419.
Scriven, Michael. 1976. "Evaluation Bias and Its Control." In Evaluation
Studies Review Annual (Vol. 1), edited by G. V. Glass. Beverly Hills,
CA: Sage.
Scriven, Michael. 1981. The Logic of Evaluation. Edgepress.
Scriven, Michael. 1991. Evaluation Thesaurus. 4th ed. Newbury Park: SAGE
Publications.
Scriven, Michael. 1994. "Evaluation as a Discipline." Studies in Educational
Evaluation 20 (1):147-166.
Scriven, Michael. 1995. "The Logic of Evaluation and Evaluation Practice."
New Directions for Evaluation 1995 (68):49-70.
Scriven, Michael. 2007. The Logic and Methodology of Checklists. Retrieved
June 22, 2017, from The Evaluation Center, evaluation checklists
website: www.wmich.edu/evaluation/checklists.
Scriven, Michael. 2008a. "The Concept of a Transdiscipline: And of
Evaluation as a Transdiscipline." Journal of MultiDisciplinary
Evaluation 5 (10):65-66.
Scriven, Michael. 2008b. "The Economist's Fallacy." Journal of
MultiDisciplinary Evaluation 5 (9):74-76.
Scriven, Michael. 2015. Key Evaluation Checklist.
Shadish, William R., Thomas D. Cook, and Laura C. Leviton. 1991.
Foundations of Program Evaluation: Theories of Practice. Newbury
Park, Calif.: Sage Publications.
Shaw, Ian, Jennifer C. Greene, and Melvin M. Mark, eds. 2006. Handbook
of Evaluation: Policies, Programs and Practices. London: SAGE.
Shusterman, Richard. 1980. "The Logic of Evaluation." The Philosophical
Quarterly (1950-) 30 (121):327-341.
Smith, Nick L. 2010. "Characterizing the Evaluand in Evaluating Theory."
American Journal of Evaluation 31 (3):383-389.
Smith, Nick L., and Paul R. Brandon, eds. 2008. Fundamental Issues in
Evaluation. New York: Guilford Press.
Snell, Michael. 2011. Cost-benefit Analysis: A Practical Guide. 2nd ed.
London: Thomas Telford.
Stake, Robert E. 2004. Standards-Based and Responsive Evaluation.
Thousand Oaks: Sage Publications.
Stockmann, Reinhard. 2013. "The Role of Evaluation in Society." In
Functions, Methods and Concepts in Evaluation Research, edited by
Reinhard Stockmann and Wolfgang Meyer, 8-53. London: Palgrave
Macmillan UK.
Stockmann, Reinhard, and Wolfgang Meyer, eds. 2016. The Future of
Evaluation: Global Trends, New Challenges, Shared Perspectives.
London: Palgrave Macmillan UK.
Stufflebeam, Daniel L. 2003. "The CIPP Model for Evaluation." In
International Handbook of Educational Evaluation, edited by Thomas
Kellaghan and Daniel L. Stufflebeam, 31-61. Dordrecht: Springer
Netherlands.
Stufflebeam, Daniel L. 2007. CIPP Evaluation Model Checklists. 2nd ed.
Stufflebeam, Daniel L., and Chris L. S. Coryn. 2014. Evaluation Theory,
Models, and Applications. 2nd ed. San Francisco, CA: Jossey-Bass.
Van Den Berg, Rob D. 2005. "Results Evaluation and Impact Assessment in
Development Co-operation." Evaluation 11 (1):27-36.
Vedung, Evert. 2010. "Four Waves of Evaluation Diffusion." Evaluation 16
(3):263-277.
Waddington, Hugh, Birte Snilstveit, Howard White, and Lorna Fewtrell. 2009.
Water, Sanitation and Hygiene Interventions to Combat Childhood
Diarrhoea in Developing Countries. New Delhi, India: 3ie.
Watts, Brad R. 2008. "Understanding Opportunity Costs and the Economist's
View: A Response to Scriven's 'The Economist's Fallacy'." Journal
of MultiDisciplinary Evaluation 5 (10):89-92.
Weiss, Carol H. 1998. Evaluation: Methods for Studying Programs and
Policies. Upper Saddle River, NJ: Prentice Hall.
World Bank. 2010. Cost-Benefit Analysis in World Bank Projects.
Washington, DC: World Bank.
APPENDIX. List of Evaluation Reports Reviewed in Chapter 4
1. KOICA
N | Title of Evaluation | Size (mil USD) | Sector

2015
1 | Ex-post Evaluation on the Project for the Development of the Vocational Training Capacity in Uzbekistan | 4.0 | education
2 | Ex-post Evaluation on the Project for the Establishment of the Korea-Nepal Institute of Technology in Butwal | 5.7 | education
3 | Ex-Post Evaluation Report on the Project for the Improvement of Blood Bank in Irbid, Jordan | 3.1 | health
4 | Ex-Post Evaluation on the Projects for Management of Mercury Waste in Egypt | 3.0 | energy
5 | Ex-Post Evaluation on the Projects for Construction of Municipal Solid Waste Recycling Facility for Ulaanbaatar City, Mongolia | 3.5 | energy
6 | Ex-post Evaluation on the Project for Solar-Powered Irrigation Pump and Solar Home System in Bangladesh | 2.5 | energy
7 | Ex-post Evaluation on the Project for Integrated Community Development in Comilla, Bangladesh | 3.5 | agriculture

2014
8 | Ex-Post Evaluation on the Projects for Developing and Publishing Textbooks for Upper Secondary Schools in Lao PDR | 3.0 | education
9 | The Project for the Establishment of Bangladesh-Korea ICT Training Center for Education (BKITCE) | 1.4 | education
10 | Project for the Establishment of a Morocco-Korean ICT Training Center for Moroccan Teachers (CMCF TICE) | 3.0 | education
11 | Ex-Post Evaluation on the Projects for Improving the Korea-Peru Health Center in Bellavista, Callao, Peru | 2.0 | health
12 | Ex-Post Evaluation on the Projects for Public Health Service Improvement for Mother and Child in El Alto, Bolivia | 1.3 | health
13 | Ex-Post Evaluation on the Projects for Improving Maternal and Child Health Care Services in Paraguay | 3.3 | health
14 | Ex-Post Evaluation on the Projects for Construction of Maternal Homes for Maternal and Child Health in El Salvador | 2.0 | health
15 | Ex-Post Evaluation on the Project for the Establishment of Lao-Korea National Children's Hospital | 3.5 | health
16 | Ex-Post Evaluation on the Project for the Establishment of the Central General Hospital in Quang Nam Province, Vietnam | 35.0 | health
17 | Ex-Post Evaluation on the Project for Informatization of the Central State Archives in Uzbekistan | 3.0 | e-government
18 | Ex-Post Evaluation on the Project for the Establishment of a Pilot e-Procurement System in Mongolia | 4.6 | e-government

2013
19 | Ex-Post Evaluation on the Project for the Automation of Intellectual Property Administration in Mongolia | 3.1 | e-government
20 | Ex-Post Evaluation on the Project for Modernization of Tanzania Customs Administration | 3.3 | e-government
21 | Ex-Post Evaluation on the Project for the Establishment of the Emergency Response System in Sri Lanka | 2.0 | e-government
22 | Ex-Post Evaluation on the Project for the Construction of Storm Water Drainage at Valachchenai in Sri Lanka | 3.9 | energy
23 | Ex-Post Evaluation on the Projects for the Establishment of an Early Warning System for Disaster Mitigation Phase I in Philippines | 1.0 | disaster prevention
24 | Ex-Post Evaluation on the Project for the Establishment of Early Warning and Response System for Disaster Mitigation Phase II in Metro Manila | 3.0 | disaster prevention
25 | Ex-Post Evaluation on the Project for the Grid Connected Photovoltaic Power Generation in Sri Lanka | 3.0 | energy
26 | Ex-Post Evaluation on the Project for the Establishment of Hybrid <PV/Diesel/Batteries> Power System | 2.2 | energy
27 | Ex-Post Evaluation on the Project for Construction of Siem Reap Bypass Road in Cambodia | 17.4 | transportation
28 | Ex-Post Evaluation on the Project for the Integrated Rural Development in Arsi Zone, Ethiopia | 1.9 | agriculture
29 | Ex-post Evaluation Report on the Project for Upgrading Auto-Maintenance Vocational Training Center in Embaba, Guiza | 2.0 | education
30 | Ex-post Evaluation Report on Program for the Improvement for the Automotive Vocational Training System | 5.0 | education
31 | Ex-post Evaluation Report on the Project for the Establishment of Industrial Training Center in Thagaya, Myanmar | 2.3 | education
32 | Ex-post Evaluation Report on the Project for Upgrading Jaffna Technical College as a College of Technology | 2.3 | education
33 | Ex-post Evaluation Report on the Project for Establishment of Korea-Vietnam Friendship IT College in Danang | 10.0 | education
34 | Ex-post Evaluation Report on the Hlegu Township Rural Development Project in Myanmar | 2.0 | agriculture
35 | Ex-post Evaluation Report on the Project for Irrigation Technology Capacity Building in Upper Myanmar | 2.0 | agriculture
36 | Ex-post Evaluation Report on the Project for Batheay Flood Control in Cambodia | 2.0 | disaster prevention
37 | Ex-post Evaluation Report on the Project for Construction of Irrigation System in Batheay, Cambodia | 2.5 | agriculture
38 | Ex-post Evaluation Report on the Project for Modernization of the Traffic Management System in Erbil | 5.0 | transportation
39 | Ex-post Evaluation Report on the Project for Busuanga Airport Development in the Philippines | 3.0 | transportation
40 | Ex-Post Evaluation on the Project to Reduce Air Pollution by Improving Heating Culture in Ulaanbaatar, Mongolia | 0.7 | environment
41 | Ex-post Evaluation Report on the Project for Improving the District Heating and Water Supply System in Ulaanbaatar, Mongolia | 5.0 | energy
42 | Ex-post Evaluation Report on the Project for Improving Heat Supply System in Khorezm, Uzbekistan | 3.5 | energy
43 | Ex-post Evaluation Report on the 3rd Phase of the Project for Upgrading the Korea-Vietnam Friendship Clinic in Hanoi, Vietnam | 1.3 | health
44 | Ex-post Evaluation Report on the Program for the Improvement of Maternal and Child Health in Chimaltenango, Guatemala | 3.0 | health
45 | Ex-post Evaluation Report on the Project for the Improvement of Maternal and Neonatal Health in Guatemala | 1.5 | health
46 | Ex-post Evaluation Report on the Project for Establishment of an E-procurement Pilot System in Vietnam | 3.0 | e-government
47 | Ex-post Evaluation Report on the Project for Modernization of Communication and Information System of the State Ministries of the Republic of Paraguay | 2.5 | e-government
2. EDCF

N | Title of Evaluation | Size (mil USD) | Sector

2015
48 | Ex-post Evaluation on the Improvement of Padeniya-Anuradhapura Road Project in Sri Lanka | 66.0 | transportation
49 | Ex-Post Evaluation on the Procurement of Locomotive Project Phase III in Bangladesh | 28.0 | transportation
50 | Ex-post Evaluation on the Creation of Capabilities in Vocational Training Centers Project in Nicaragua | 12.6 | education
51 | Ex-post Evaluation on the Myanmar Basic e-Government Project | 20.0 | e-government
52 | Ex-post Evaluation on the Hospitals Modernization Projects (BIH-001) in Bosnia and Herzegovina | 20.0 | health
53 | Ex-post Evaluation on the Hospitals Modernization Projects (BIH-002) in Bosnia and Herzegovina | 50.0 | health

2014
54 | Ex-Post Evaluation on Indonesia Manado By-Pass Project I | 10.0 | transportation
55 | Ex-Post Evaluation on Bolivia Pailón-San José Highway Construction Project (Component 2) | 23.0 | transportation
56 | Ex-post Evaluation on Power Sector Development Project, Sri Lanka | 7.5 | energy
57 | Ex-post Evaluation on Power Distribution Improvement Project, Myanmar | 16.8 | energy
58 | Ex-post Evaluation on Medical Equipment Provision to Ha Trung District General Hospital in Thanh Hoa Project, Vietnam | 3.0 | health
59 | Ex-post Evaluation on Medical Equipment Supply to Lai Chau Provincial General Hospital Project, Vietnam | 10.0 | health

2013
60 | Ex-post Evaluation on Transmission Line and Substation Project in Luzon, the Philippines | 4.7 | energy
61 | Ex-post Evaluation on Mindanao Power Transmission Project in the Philippines | 9.7 | energy
62 | Ex-post Evaluation on GSO Road Expansion and Emergency Dredging Project in Philippines | 21.8 | transportation
63 | Ex-post Evaluation on National Road No.3 Rehabilitation Project in Cambodia | 36.7 | transportation
64 | Ex-post Evaluation on Upgrading of Niyagama National Vocational Training Center Project | 8.5 | education
65 | Ex-post Evaluation on Re-engineering Government Component of e-Sri Lanka Project | 15.0 | e-government
ABSTRACT IN KOREAN
국문 초록
비용편익분석 틀을 적용한 OECD/DAC 개발평가기준의 비판적 검토
Since their adoption in 1991, the evaluation criteria of the OECD Development
Assistance Committee (OECD/DAC) have served as the most influential
framework for evaluating development assistance projects. Consisting of five
items, relevance, effectiveness, efficiency, impact and sustainability, the DAC
criteria have been institutionalized as a standard by most international
organizations and donor agencies and have come to be regarded as the
requirements that a successful development cooperation project should satisfy.
This study focuses on the possibility that evaluations applying the five
OECD/DAC criteria produce more positive results than warranted, analyzing
this theoretically from the perspective of cost-benefit analysis and confirming
it through case studies. Following the three stages of the general logic of
evaluation applied in evaluation theory, namely 1) establishing the criteria that
define the value of the evaluand, 2) setting the standards of judgement for each
criterion and assessing against them, and 3) synthesizing the assessments by
criterion into a final evaluative conclusion, the study compares the DAC criteria
with the net present value (NPV), the decision criterion in cost-benefit analysis.
Compared with major evaluation theories, the DAC criteria cover the
elements that an ex-post evaluation should address and, by providing
formalized criteria and standard questions that reflect the characteristics of
development cooperation projects, enable comprehensive and consistent
evaluation management from the donor's standpoint. However, their concepts
and definitions are broad and ambiguous, which limits their capacity to define
the value and importance of a development project. Viewed as a single
evaluation framework, the DAC criteria are interdependent and partly
overlapping: assessments of effectiveness and efficiency are meaningful only
when relevance is secured, and sustainability presupposes that effectiveness
and impact are satisfied, that is, that the project's results are positive. If each
criterion is assessed independently and overall success is judged on that basis,
there is a risk that a positive conclusion will ultimately be drawn. A review of
all 65 ex-post evaluation reports published over three years (2013-2015) by
KOICA and EDCF, Korea's aid agencies, shows that these limitations also
appear in actual application.
That evaluations applying the five DAC criteria can induce positive bias
becomes clearer in light of the standard used in cost-benefit analysis for judging
a successful project. This study comparatively analyzes the two sets of
evaluation results for water supply projects implemented in Nicaragua and
Vietnam. It finds that, when the DAC framework is applied, a project with little
or no social value can be rated as successful, and that this positive bias arises
inconsistently with the project's effects, making it difficult to differentiate
success between projects objectively. This results from arbitrarily applying
equal weights when synthesizing the ratings by criterion, while the difficulty of
satisfying the five criteria differs and their relative importance varies from
project to project.
This study suggests how cost-benefit analysis, as a conceptual framework
for measuring the value of a project, can be applied to the ex-post evaluation of
development projects and, by incorporating the cost-benefit analysis
framework, identifies the areas in which the DAC evaluation criteria need to be
complemented