
Doctoral Thesis

Revisiting the OECD/DAC Evaluation Criteria

Applying Cost-Benefit Analysis Framework

비용편익분석 틀을 적용한

OECD/DAC 개발평가기준의 비판적 검토

February 2018

Graduate School of International Studies

Seoul National University

Eunsuk Lee


ABSTRACT

In the field of development evaluation, the OECD DAC’s five criteria have

become an influential evaluation framework, adopted by most of the aid

agencies. This study examines how the DAC criteria framework affects the

evaluation results from the perspective of cost-benefit analysis (CBA).

The study focuses on the question of whether the use of the DAC criteria tends to produce positively biased evaluation results. If the risk of positive bias is inherent in the pre-determined criteria, it inevitably harms the credibility and validity of evaluations. Positive bias may also reduce comparability between evaluations, which is one of the purposes for which the DAC criteria were developed, consequently making it difficult to differentiate a more successful intervention from a less successful one and possibly causing inconsistency across

evaluation results.

Adopting the ‘general logic of evaluation’ as the analytical

framework, the study organizes the analyses into three stages, dealing with: first,

the notion of merit or success that the DAC criteria define; second, the standard

of judgement and source of supporting evidence in each criterion; and third, the

method of synthesis with which the assessments in the five criteria are

integrated into the overall evaluative conclusion on the intervention. CBA

provides a conceptual framework through which the evaluation results applying

the DAC criteria can be analyzed in comparison.

Given that the term ‘evaluation criteria’ is defined as the aspects,

qualities, or dimensions that distinguish a more valuable evaluand from a less

valuable one, the DAC criteria, namely relevance, effectiveness, efficiency,

impact and sustainability, would represent the merit of the program being

evaluated, encompassing the properties that constitute a successful intervention.


The study finds that the DAC criteria cover most of the dimensions that general

evaluation models suggest, but with some limitations. The ‘relevance’

criterion primarily focuses on the policy context and asks ex-ante questions

with less attention to the need for the program or the logical linkage between the

program and expected results. Measuring ‘effectiveness’, defined as the

achievement of program objectives, and ‘efficiency’, e.g., whether completed

on time and within the budget, is largely based on the assumption that the

program’s initial plan and objectives are valid. The ‘sustainability’ criterion

assumes that the results are beneficial and worth continuing.

This logical interdependence between criteria may affect assessment in

the criterion at issue because the standard of judgement may have to be adjusted

according to the assessment in another criterion. Whether the stated objectives were achieved, the primary condition for satisfying the effectiveness criterion, may not be an appropriate standard for judging the program's success if the stated objectives are not valid. Meeting the relevance criterion would thus be a precondition for a meaningful assessment of effectiveness; otherwise the standard should be re-established. A similar interdependence lies between effectiveness/impact and sustainability. What kinds and magnitudes of benefits can be sustained depend on the analysis under the effectiveness and impact criteria, so the standards of judgement for the sustainability criterion should be drawn accordingly. If the assessments are made separately and integrated into

an overall conclusion without a thoughtful consideration of relative importance,

there is a risk of bias in overall results.

The review of 65 ex-post evaluation reports by two Korean aid agencies,

KOICA and EDCF, confirms the analysis above. The evaluations under

review applied the DAC criteria framework mostly using the standardized

questions and standards of judgment in the guidelines. Some of the questions

address what is taken for granted in a development project or do not necessarily


require in-depth research. In most cases, assessments in each criterion were

made independently, though some of them should have rested on the

information found in other criteria. This leads to high ratings on average, especially for the relevance and effectiveness criteria. Eighty-eight percent of the reports concluded that the evaluand was either successful or very successful, even in cases where development results were not deemed to be significant. The evaluation conclusions are drawn from a simple average of the ratings by criterion, and a serious flaw detected in one criterion, e.g., sustainability, is often compensated for by a high rating in another, e.g., relevance. Such mechanical application of the DAC criteria and standardized questions can mislead evaluation results, in most cases towards positive conclusions, making it difficult to understand the true value of the project, to differentiate a more successful project from a less successful one, and to draw valid lessons from the evaluation.

To explain the suspected positive bias in the DAC criteria, this study

adopts the criterion used in CBA, namely the net present value (NPV), as a

benchmark. Theoretically, CBA weighs all direct and indirect effects of a project against its costs over a certain period. The NPV, as an indicator that represents the social value of a project, covers the scope of the DAC criteria: the need for the program (relevance), benefits in relation to costs (efficiency), increase in social welfare (effectiveness), beneficial effects in comparison to negative effects (impact), and the duration for which the benefits continue (sustainability). The analysis shows that certain events that may seriously reduce the net social benefit, even making the NPV negative, only partially affect the assessments in the DAC criteria framework, leading to an overall

positive conclusion. A comparative case study of two water supply projects

suggests that the DAC criteria framework may yield positively biased

evaluation results for a project whose net social benefit is relatively small and even possibly negative, receiving the same level of assessment as, or even a higher one than, another project whose net social benefit is much larger and more stable. The study

also finds that positive bias may occur unevenly between projects, resulting in

inconsistency across evaluation results. This is due to the arbitrary relative

weights placed on the DAC criteria, in addition to the problems of uneven difficulty and imbalanced importance among the criteria. The findings

have significant policy implications, since positive bias and its inconsistent

occurrence in evaluation results may seriously weaken the validity of

evaluations, and consequently mislead the agencies’ learning and decision-making, which are the primary purposes of evaluation.

This study contributes to the relatively small body of academic literature

on international development evaluation by offering a new perspective on the DAC criteria through the cost-benefit analysis framework. It would also enrich the recent discussion on how to improve the DAC criteria, providing both theoretical and empirical analyses.

Keywords: DAC Criteria, Development Evaluation, Cost-Benefit Analysis,

Evaluation Theory, Evaluation Method, Positive Bias


TABLE OF CONTENTS

Abstract ......................................................................................................... i

List of Figures, Tables and Box .................................................................. viii

Chapter 1. Introduction ................................................................................. 1

1.1. Motivation and Purpose of the Study .................................................... 1

1.2. Research Questions ............................................................................... 4

1.3. Organization of the Thesis .................................................................... 6

Chapter 2. Research Design .......................................................................... 7

2.1. Literature Review ................................................................................. 7

2.1.1. Outline of Literature Review ......................................................... 7

2.1.2. Literature in the General Evaluation Discipline ............................ 8

2.1.3. Development Evaluation ............................................................. 19

2.1.4. DAC Evaluation Criteria ............................................................. 22

2.1.5. Cost-Benefit Analysis and Evaluation ......................................... 26

2.2. Concepts and Scope of the Study ........................................................ 31

2.3. Analytical Framework ......................................................................... 35

Chapter 3. Notion of Merit – Definition and Scope of DAC Criteria ...... 38

3.1. Key Requirements for Evaluation Criteria.......................................... 38

3.2. General Criteria in Evaluation Models ............................................... 43


3.2.1. Key Evaluation Checklist (KEC) framework .............................. 44

3.2.2. CIPP Evaluation Model ............................................................... 47

3.2.3. Theory-driven Evaluations .......................................................... 50

3.3. Definitions and Scope of the DAC Evaluation Criteria ...................... 53

3.3.1. Characteristics of the DAC Criteria ............................................. 53

3.3.2. Relevance .................................................................................... 55

3.3.3. Effectiveness ................................................................................ 58

3.3.4. Efficiency..................................................................................... 62

3.3.5. Impact .......................................................................................... 64

3.3.6. Sustainability ............................................................................... 66

3.4. Criterion in Cost-Benefit Analysis – Net Present Value ..................... 69

3.5. Discussion ........................................................................................... 73

Chapter 4. Standard of Judgement – Review of Evaluation Reports in

Korean Agencies .......................................................................... 76

4.1. Overview ............................................................................................. 76

4.2. Result of the Analysis ......................................................................... 80

4.2.1. Relevance .................................................................................... 80

4.2.2. Efficiency..................................................................................... 85

4.2.3. Effectiveness and Impact ............................................................. 88

4.2.4. Sustainability ............................................................................... 93

4.2.5. Discussion.................................................................................... 96

4.3. Standards in the DAC Criteria and Net Present Value ........................ 97


Chapter 5. Method of Synthesis - the DAC Framework and Cost-Benefit

Analysis in Comparison ........................................................... 104

5.1. Overview ........................................................................................... 104

5.2. Comparison of DAC Framework and CBA: Hypothetical Cases ..... 108

5.2.1. Illustrative Comparison between Two Projects ......................... 108

5.2.2. DAC Framework and CBA in Five Scenarios ............................ 112

5.3. Comparative Case Study - Evaluations of Water Supply Projects .... 124

5.3.1. Project Description .................................................................... 124

5.3.2. Evaluation Results in the DAC Criteria Framework ................. 132

5.3.3. Results of Cost-Benefit Analysis ............................................... 136

5.3.3.1. Theoretical Basis of CBA for Water Supply Intervention .. 136

5.3.3.2. Estimating Benefits of Two Water Supply Projects ........... 140

5.3.3.3. Results of Cost-Benefit Analysis in Comparison ............... 145

5.4. Discussion ......................................................................................... 150

Chapter 6. Conclusions .............................................................................. 152

References ................................................................................................... 157

Appendix. List of Evaluation Reports Reviewed in Chapter 4 .............. 168

Abstract in Korean ..................................................................................... 173


LIST OF FIGURES, TABLES AND BOX

Figure 1. Christie and Alkin's Evaluation Theory Tree .................................. 17

Figure 2. Development of Evaluation Paradigm and the DAC Criteria ......... 24

Figure 3. General Logic of Evaluation ........................................................... 34

Figure 4. Analytical Framework of the Study ................................................ 35

Figure 5. The Evaluation Hierarchy ............................................................... 51

Figure 6. Net Present Value (NPV) and DAC Criteria ................................... 71

Figure 7. Comparison of Dimensions or Criteria in Evaluation Models ........ 73

Figure 8. Evaluation Reports Classified by Sector ......................................... 79

Figure 9. Evaluation Questions in Relevance Criterion ................................. 81

Figure 10. Evaluation Questions in Efficiency Criterion ............................... 86

Figure 11. Evaluation Questions in Effectiveness Criterion .......................... 89

Figure 12. Evaluation Questions in Impact Criterion ..................................... 90

Figure 13. Evaluation Questions in Sustainability Criterion .......................... 94

Figure 14. Consumer Surplus of Water Supply Project (1) .......................... 139

Figure 15. Consumer Surplus of Water Supply Project (2) .......................... 139

Table 1. Views on Historical Development of Evaluation ............................. 15

Table 2. Key Requirements of Evaluation Criteria and Possible Sources of Bias

........................................................................................................................ 42

Table 3. Rating System and Scoring Scale ....................................................... 77

Table 4. Descriptive Data of the Samples ...................................................... 78

Table 5. Main Standards of Judgment in Relevance Criterion ....................... 82

Table 6. Average Ratings on Relevance ......................................................... 84

Table 7. Main Standards of Judgment in Efficiency Criterion ....................... 87

Table 8. Average Ratings on Efficiency ......................................................... 88


Table 9. Main Standards of Judgment in Effectiveness/Impact Criteria ........ 92

Table 10. Average Ratings on Effectiveness/Impact ...................................... 93

Table 11. Main Standards of Judgment in Sustainability Criterion ................ 94

Table 12. Average Ratings on Sustainability .................................................. 95

Table 13. Average Overall Ratings of All Evaluations (2013-2015) .............. 97

Table 14. Comparison of Two Hypothetical Projects .................................... 110

Table 15. Net Present Value of Base Case ..................................................... 113

Table 16. NPV and Effectiveness .................................................................. 116

Table 17. NPV and Efficiency ....................................................................... 118

Table 18. NPV and Negative Externality ...................................................... 119

Table 19. NPV and Sustainability ................................................................ 121

Table 20. Summary of Water Supply Project in Juigalpa, Nicaragua .......... 126

Table 21. Achievement of Nicaragua Project ............................................... 126

Table 22. Nicaragua Project: DAC Criteria Evaluation ............................... 127

Table 23. Summary of Water Supply Project in Buon Ho Town, Vietnam .. 128

Table 24. Objectives and Targets in Vietnam Project ................................... 130

Table 25. Achievement of Vietnam Project .................................................. 130

Table 26. Vietnam Project: DAC Criteria Evaluation .................................. 131

Table 27. Two Projects in Comparison ......................................................... 132

Table 28. Costs and Benefits in Water Supply Interventions ....................... 137

Table 29. Patterns of Water Consumption in Juigalpa before the Project .... 142

Table 30. Changes in Water Consumption Before and After the Project ..... 143

Table 31. Patterns of Water Consumption in Buon Ho Town ...................... 144

Table 32. Estimation of NPV and B/C of Two Water Projects ..................... 147

Box 1. Definitions of Evaluation .................................................................... 13


CHAPTER 1. INTRODUCTION

1.1. Motivation and Purpose of the Study

Evaluation of public investments is a critical activity to examine whether the

results can justify the resources spent. It is a way of ensuring accountability

of the government or public agencies to the citizens who pay taxes. The knowledge and lessons from evaluations can improve performance and help

informed decision-making. Evaluation requires a systematic and scientific

process to produce credible information necessary to make judgements on the

worth, value, or significance of what is evaluated, conforming with the norms

and quality standards to ensure objectivity and validity of information. It is

not uncommon, however, that evaluations produce biased results.

In the evaluation discipline, bias has been widely recognized, especially bias towards positive evaluation results. Scriven (1976, 1991) asserted that

there is a strong general positive bias across all evaluation fields. Positive bias,

a tendency to turn in more favorable results than are justified, seriously weakens

the credibility and validity of evaluations.

The problem seems to be prevalent in the field of international

development evaluation, if not more serious. Riddell (2007) showed that

more than 75% of aid projects by major aid agencies (e.g., those in the UK, the


US, and Australia, as well as multilateral banks such as the World Bank and ADB) had been reported as successful, but the evidence was likely to be biased towards more positive results than deserved. Michaelowa and Borrmann (2006) argued

that there is an incentive to produce biased evaluation results as the ‘legitimation’ function dominates in aid evaluation. Bamberger (2009) found that a

significant number of evaluation reports by many development agencies have a systematic positive bias, which may mislead the agencies into continuing to fund projects that might be less beneficial than claimed or that might even have negative impacts not addressed in the evaluation. Political interests,

time and budget constraints, and lack of evaluation capacity are the primary

reasons for positive bias that have been identified by many scholars.

This study asks a question from a new viewpoint: whether this positive

bias is attributable to the widely used evaluation framework, the Criteria for

Evaluating Development Assistance developed by OECD DAC (Development

Assistance Committee) or the DAC criteria. The DAC criteria have become

a standard framework for international development evaluations. As a part of

internationally agreed evaluation principles, the DAC criteria have been

adopted by most of the aid agencies (OECD 2016) and shaped the way of

designing and conducting evaluations for their programs. The five criteria,

namely relevance, effectiveness, efficiency, impact, and sustainability, serve as

a guideline on which aspects are to be addressed in evaluating development

interventions. The DAC criteria also provide agencies with a comparable


framework that facilitates the management of evaluation activities and

findings.

If the risk of positive bias is inherent in the pre-determined evaluation

criteria, it inevitably harms credibility and validity no matter how well the evaluation process and methods comply with the norms and standards intended to ensure them. Positive bias may reduce comparability between

evaluations, which is one of the purposes for which the DAC criteria were developed, consequently making it difficult to differentiate a more successful intervention from a less successful one and possibly causing inconsistency across

evaluation results. Evaluation, which has become an established activity in

the management of development cooperation, would be another waste of

resources if it does not fulfill its purposes.

While the DAC criteria serve as such an influential framework for

international development evaluations and there seems to be consensus on their

limitations, little effort has been made to explain and assess the DAC criteria

based on theoretical and empirical research. The purpose of this study is to

identify the limitations of the DAC criteria from a theoretical point of view in

comparison with general evaluation theories and methods, and to provide evidence on how the DAC criteria have been used in practice, with a focus on the positive bias they may produce in evaluation results.

In this study, I apply the framework of cost-benefit analysis (CBA) in

order to examine the DAC evaluation criteria. Although there are


controversies over its technical difficulties and practical limitations, CBA rests

on a sound theoretical basis and provides a useful framework for evaluation.

This study presents theoretical analyses as well as case studies in which the

DAC criteria as an evaluation framework are analyzed in comparison with that of CBA.

1.2. Research Questions

This study examines the influence of the DAC criteria on evaluation results, with a focus on the positive bias and inconsistency that the use of the DAC criteria may

generate. The research question is:

Does the use of OECD/DAC criteria tend to generate positively

biased evaluation results?

I first identify requirements for good evaluation criteria drawing on

general evaluation theories, and assess the DAC criteria against them with

special attention to possible sources of bias. Multiple evaluation theories are

reviewed. Then I discuss whether and in what aspects the use of DAC criteria

may generate bias in light of the defining features of evaluation criteria drawn

from the review of evaluation theories, and argue that the bias is likely to occur

and lean towards positive results.

Since the DAC criteria only provide brief definitions and a few sample


questions, the study requires both theoretical and empirical approaches.

Based on the findings in the theoretical examination, I examine whether biases

occur in practice and tend to lean towards positive results by conducting an empirical

review of evaluation reports applying the DAC criteria by two Korean aid

agencies.

Thirdly, I attempt to explain the existence of positive bias and its extent

in the DAC criteria. The basis of my argument is that there exists positive bias

if the overall conclusion in the DAC criteria evaluation is more favorable than

can be justified by the evaluative standards widely used in other evaluation

methods. Applying the criterion used in the cost-benefit analysis, namely the

net present value (NPV) as a benchmark, I first review the definitions and scope

of the DAC criteria in comparison with that of the NPV. My assumptions are: (1) if

evaluations applying the DAC criteria tend to generate a positive conclusion on

a project whose NPV is negative, it is fair to conclude that there is positive bias

inherent in the DAC criteria; and (2) if a project with smaller NPV (or benefit-

cost ratio, B/C) gets similar or more favorable assessments in the DAC criteria

framework than another project with greater NPV (or B/C), it means that the extent to which positive bias occurs is inconsistent across evaluation results.

To support my argument, I present some case studies which show how certain

events can affect the evaluation results in the DAC evaluation framework and

in CBA differently.
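For reference, the NPV benchmark referred to above follows the standard formulation; the notation below is added here for clarity and is not taken from the thesis:

$$ NPV = \sum_{t=0}^{T} \frac{B_t - C_t}{(1+r)^t} $$

where $B_t$ and $C_t$ are the social benefits and costs of the intervention in year $t$, $r$ is the social discount rate, and $T$ is the time horizon. A project is judged socially worthwhile if $NPV > 0$, or equivalently if the benefit-cost ratio (B/C), the ratio of discounted benefits to discounted costs, exceeds 1.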


1.3. Organization of the Thesis

Following this introductory chapter, Chapter 2 presents the research design of

this study. It starts with a review of the evaluation literature which

encompasses issues in the general field of evaluation, development evaluation,

the DAC criteria and cost-benefit analysis. After the concepts and scope of

the study are defined, the framework for analysis is presented.

Chapter 3 presents an analysis of the definitions and scope, or the ‘notion of merit’, of the DAC criteria, in view of whether they constitute a good

evaluation framework and satisfy the requirements for evaluation criteria.

Based on the review of multiple evaluation theories as well as the CBA

framework, I attempt to identify possible sources of bias in the DAC criteria.

Chapter 4 deals with ‘standards of judgement’ in evaluations. Based on

the findings from an empirical review of the evaluation reports, I compare the

standards and supporting evidence generally applied in the DAC criteria

framework with those in CBA.

In Chapter 5, the ‘method of synthesis’ of the DAC evaluation framework

is discussed in comparison with the CBA framework. A theoretical analysis and a comparative case study of water supply projects are presented. Chapter

6 summarizes the findings and concludes the thesis.


CHAPTER 2. RESEARCH DESIGN

2.1. Literature Review

2.1.1. Outline of Literature Review

The aim of the literature review is to develop a conceptual basis for the framework for analyzing the DAC criteria. In order to identify the historical and

theoretical context in which the DAC criteria were established, the literature

review starts with the evolution of the evaluation discipline and fundamental issues

of evaluations in theory and practice. The review covers different

perspectives on evaluation purposes and principles, the contexts in which they

have evolved, and their implications.

After the theoretical overview of the evaluation discipline in general, I

present a review of literature on evaluation of development assistance or

‘development evaluation’, with a focus on its history and development in practice,

based on which the DAC criteria have been discussed and established into what

they are now. Recent developments and main challenges in the field of

development evaluation are also discussed.

The literature review on the DAC criteria deals with the context in which they

were developed and how they have been used in practice. It also considers the


discussion in the development community over the strengths and weaknesses

of the DAC criteria as a standard evaluation framework.

Cost-benefit analysis (CBA) has somehow not been fully integrated into the evaluation discipline. To support the rationale for adopting the concepts

in the CBA theory, the literature review on CBA focuses on how CBA has been

considered in the evaluation discipline and what the main issues are in applying CBA in development evaluations.

2.1.2. Literature in the General Evaluation Discipline

Evaluation as a discipline emerged in the 1960s in the United States (Shadish,

Cook, and Leviton 1991, Rossi, Lipsey, and Freeman 2004). Though

relatively new and dominated by US scholars, the evaluation discipline is considered a “well established field of study, with contributions by theorists

and practitioners in many countries throughout the world” (Owen and Rogers

1999, 22).

Characteristics of the Evaluation Discipline

The evaluation discipline has some interesting characteristics, which have

contributed to the unique conceptual and theoretical development of evaluation.

First, the development of evaluation has largely been affected by political and

policy context, mainly driven by the domestic demands in the US. The US


government’s large-scale social interventions in the 1960s, such as the ‘War on Poverty’ and ‘Great Society’ initiatives, called for a systematic approach to evaluating social programs. This approach involved new conceptualizations of existing views and attempts to build a knowledge base for evaluation across various areas of study. With legislation that mandated and funded evaluations for major federal programs, evaluation grew as an independent field of study and flourished as a profession for the next two decades.

The political and social climates have influenced not only the demand for

evaluation, i.e., funding, but also the perspectives and approaches to evaluation

(Mark, Greene, and Shaw 2006, Shadish and Luellen in Mathison 2005, 183-

6).

Secondly, evaluation is an applied social science with a multidisciplinary or transdisciplinary1 nature. It draws on intellectual traditions from various disciplinary perspectives, for example, sociology, economics, psychology, anthropology, women’s studies, cultural studies, etc., as well as on the accompanying scientific paradigms (Mark, Greene, and Shaw 2006). It

implies conceptual and methodological diversity. Evaluators take

methodologies from multiple and diverse social sciences, whether side-by-side

or in mixed or integrated ways. The evaluation discipline went through the

famous paradigm war or ‘quantitative-qualitative debate’ during the 1970s and 1980s as did other social sciences in the US.

1 Scriven (2008a) describes the characteristics of evaluation as a transdiscipline, which he argues are distinct from ‘interdiscipline’ or ‘multidiscipline’.

The intense debate around the

legitimacy and relative superiority of evaluation methodologies especially in

the 1980s eventually moved the evaluation community towards more pluralistic approaches.

Its practical focus has also played a crucial role in shaping the evaluation

discipline. From its start, the primary purpose of evaluations has been to provide information on whether and why social programs were successful, which would supposedly assist the government in ensuring accountability and making decisions for future public programs. The early dominance of

positivism applying strict quantitative methods to test the program-effects

relationship seemed to gradually lose its influence, criticized for its limited use and for often being unworkable or even counterproductive (Stufflebeam and Coryn

2014, 310). The mid-1970s saw the focus of evaluations move to their utilization, based on the premise that “evaluations should be judged by their

utility and actual use”, so the process and design of evaluations should consider

the diversity of interests and values of stakeholders in order to facilitate the

judgement and decision making by intended users (Patton 1997, 20). Since

then, the scope of evaluation has been broadened to utilization-focused,

participatory, and developmental approaches (for example, Guba and Lincoln

1989, Patton 1997, 1994), which have become one of the main pillars in the

evaluation discipline and also influenced the methodological evolutions in

other approaches.


These characteristics of the evaluation discipline have contributed to a somewhat chaotic landscape of evaluation theories. Evaluators continue to develop new approaches and brand them as theories with ear-catching adjectives. As a result, evaluation theories have proliferated with little critical

attention to their validity. This is another challenge of studying evaluation

theories, in addition to those that King (2003) identifies, namely, lack of

conceptual consensus, practical focus, continuing emphasis on models and

methods, predominant focus on program theory, and lack of research support.

Main Areas of Evaluation Literature

As evaluation has a short but rather chaotic history with its multi-disciplinary

nature, methodological diversity and a strong pragmatic focus, the literature in

the field encompasses various issues from its logic and philosophical

foundations (e.g., Shusterman 1980, Scriven 1981, 1994, 1995, Fournier 1995)

to how to manage budget and time (e.g., Alkin 2010, Bamberger, Rugh, and

Mabry 2011). Reviewing these broad issues is beyond the scope of this study, but it is useful to categorize the main areas that the evaluation literature deals with. I identify three main areas: 1) what evaluation is and why we do evaluations; 2) how we conduct evaluations; and 3) what we learn

through evaluations.


1) What is evaluation and why

The first topic is related to the fundamental dimension of evaluation: its nature

and role in society. The literature in this area deals with the definitions of

evaluation, its purposes and functions, and the role of evaluation, as well as debates over the issues of theory, methodology, practice, and the profession, including ethical and quality-standard aspects (Alkin, Patton, and Weiss 1990, Smith and

Brandon 2008, Stockmann and Meyer 2016).

Evaluation is defined in various ways by evaluation scholars with

different perspectives, reflecting its functions, purposes, and methods. Box 1

shows some examples of definitions that are generally quoted in the evaluation literature. Putting together the core concepts in these definitions, evaluation is a systematic process which consists of collecting, analyzing and interpreting information and eventually making a judgement on the object being evaluated (the ‘evaluand’). The purposes of evaluation are summarized as: to determine the evaluand’s value, to contribute to improvement of or decision-making on the evaluand, and/or to increase understanding about the

evaluand. They fall into one of the three conceptual frameworks or what

Chelimsky (1997) calls ‘evaluation perspectives’: evaluation for accountability;

evaluation for development; and evaluation for knowledge.


Box 1. Definitions of Evaluation

Dictionary Definition (Oxford): The making of a judgment about the amount,

number, or value of something; assessment

Scriven (2015, 4): “Evaluation… refer[s] to the process of determining the merit,

worth, or significance.”

Stufflebeam and Coryn (2014, 14): “[The expanded, operational definition of

evaluation] is “the systematic process of delineating, obtaining, reporting,

and applying descriptive and judgmental information about some

object's merit, worth, probity, feasibility, safety, significance, and/or

equity.”

Rossi, Lipsey, and Freeman (2004, 2): “Evaluation research is defined as a social

science activity directed at collecting, analyzing, interpreting, and

communicating information about the workings and effectiveness of

social programs.”

Weiss (1998, 4): “Evaluation is the systematic assessment of the operation and/or

the outcomes of a program or a policy, compared to a set of explicit or

implicit standards, as a means of contributing to the improvement of the

program or policy.”

Patton (1997, 23): “Evaluation is the systematic collection of information about

the activities, characteristics, and results of programs to make judgments

about the program, improve or further develop program effectiveness,

inform decisions about future programming, and/or increase

understanding.”

Morra-Imas and Rist (2009, 9): “Evaluation refers to the process of determining

the worth or significance of an activity, policy, or program. [It is] as

systematic and objective as possible, of a planned, on-going, or completed

intervention.”


OECD (2002, 21-22): “The systematic and objective assessment of an on-going

or completed project, programme or policy, its design, implementation and

results... to determine the relevance and fulfillment of objectives,

development efficiency, effectiveness, impact and sustainability… to

provide credible and useful information enabling the incorporation of

lessons learned into the decision–making process of both recipients and

donors. Evaluation also refers to the process of determining the worth or

significance of an activity, policy or program. An assessment, as

systematic and objective as possible, of a planned, on-going, or completed

development intervention.”

Source: organized by author.

Historically, different perspectives have evolved over time and shaped

the trends of evaluation theories and paradigm shifts in the discipline. In the 1960s, most evaluation practice had followed the standard social science

conventions of the era, which were largely 'quantitative', focusing on assessing

the causal relationships with methods of experimental or quasi-experimental

designs.

The dominance of positivism was challenged by attempts to develop more practical and user-oriented approaches in the 1970s. Constructivist, heuristic evaluation also gained attention at the time, and its descriptive, qualitative and participatory approach became influential when conceptualized as the fourth generation of evaluation by Guba and Lincoln (1989). In the 2000s, a return to scientific methods, or what Vedung (2010) calls the ‘evidence wave’, was observed as evidence-based decision-making became the primary


concern of the public agencies.

Table 1. Views on Historical Development of Evaluation

Stufflebeam and Coryn (2014): Pre-Tylerian Period (~1930s); Tylerian Age (1930-1945); Age of Innocence (1946-1957); Age of Realism (1958-1972); Age of Professionalism (1973-2004); Age of Global and Multidisciplinary Expansion (2005-present)

Guba and Lincoln (1989): First generation: Measurement (to early 1930s); Second generation: Description (early 1930s-1967); Third generation: Judgement (1967-early 1980s); Fourth generation (late 1980s and onwards)

Vedung (2010): Science-Driven Wave (late 1950s-mid 1970s); Dialogue-Oriented Wave (from mid 1970s); Neo-Liberal Wave (from around 1980s); Evidence Wave (in 2000s)

Source: arranged by author.

In the area of education, the history of systematic evaluation begins with

Ralph Tyler, who coined the term ‘educational evaluation’ and exerted a heavy

influence in the field (Stufflebeam and Coryn 2014, 30-33). In other fields,

social scientists had also conducted evaluation-type studies in their own fields,


on major programs related to public health, social policies, international

initiatives, etc.

2) How to conduct evaluations

The second type of the literature on evaluation covers specific evaluation

models or approaches, which are claimed as ‘evaluation theories’.2 Ideally, an

evaluation theory would “describe and justify why certain evaluation practices

lead to particular kinds of results across situations that evaluators confront”, but such a theory is not likely to be achievable (Shadish, Cook, and Leviton 1991).

Nevertheless, there are a variety of evaluation theories, which attempt to

“provide a set of rules, prescriptions, prohibitions, and guiding frameworks that

specify what a good or proper evaluation is and how evaluation should be done”

(Alkin 2013a, 4). In other words, evaluation theories are theories of

evaluation practice, which provide guidance regarding when, in what context

and why certain dimensions should be addressed with certain methods, as well

as how to assign value to what is being evaluated.

The variety of different evaluation theories is well displayed in the

‘evaluation theory tree’ (Figure 1) developed by Christie and Alkin (2013).

2 In the evaluation discipline, the term ‘theory’ is generally used interchangeably with ‘approach’ or ‘model’ (Shaw, Greene, and Mark 2006, Alkin 2013b). Some theorists prefer using one over the others for a particular research purpose (Smith 2010, for example, Hansen, Alkin, and Wallace 2013).

The ‘roots’ of the tree, namely social accountability, systematic social inquiry, and epistemology, serve as a foundation of evaluation work, namely the motivations

and rationales for evaluation. The tree grows into three branches that

represent dominant themes in evaluation theory—use, methods, and valuing.

Evaluation theories are categorized into each branch according to their main emphasis, i.e., utilization, research methodology and techniques, or value

judgement. The leaves represent the theorists and the fruits (added by author)

indicate the evaluation theories grown out of the branches.

Figure 1. Christie and Alkin's Evaluation Theory Tree

Source: Christie and Alkin (2013, 12), representing theories in boxes added by author.

Evaluation theory is a concept much broader than an evaluation design

or methodology. Evaluation theories thus provide guidance on how to design an evaluation and what methodologies to use in collecting information. Evaluation models also suggest what aspects or dimensions of the evaluand are to be examined (evaluative criteria), how to determine the value or quality in each

aspect (evaluative standards), and what to consider when combining all

information into overall conclusions. These three processes are the important

components in the general logic of evaluation (Fournier 1995) and will be

discussed in detail in the later chapters.
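As a purely hypothetical illustration of the synthesis step (the ratings, scale, and rules below are invented for this sketch and are not taken from the thesis or from any guideline it reviews), the following shows how a simple average of criterion ratings can yield a favorable overall conclusion even when one criterion fails badly:

```python
# Hypothetical criterion ratings on a 4-point scale (4 = very successful).
ratings = {
    "relevance": 4,
    "effectiveness": 4,
    "efficiency": 3,
    "impact": 3,
    "sustainability": 1,   # a serious flaw in one criterion
}

# Simple (unweighted) average across the five criteria.
simple_average = sum(ratings.values()) / len(ratings)
print(f"Simple average: {simple_average:.1f}")   # 3.0 -> reads as 'successful'

# A contrasting, non-compensatory rule: the overall rating
# cannot exceed the weakest criterion.
weakest_link = min(ratings.values())
print(f"Non-compensatory (weakest link): {weakest_link}")   # 1 -> flags the flaw
```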

3) What we learn through evaluations

While the above two topics are rather normative or practical, this topic is

empirically oriented, represented by so-called ‘research on evaluation (RoE)’.

There has been increasing recognition that evaluation theories should be more

empirically based (Mark 2008, Shadish, Cook, and Leviton 1991, Smith 2010).

Notwithstanding the proliferation of evaluation theories or models, none of

those have been systematically verified with empirical evidence (Astbury 2016,

324). It is still difficult to generalize about what constitutes a sound theory in the evaluation discipline.

Empirically oriented evaluation literature is growing as evidence-based evaluations are increasingly emphasized, in forms such as

empirical reviews on evaluation practice applying certain evaluation theories

(e.g., Cullen and Coryn 2011, Miller and Campbell 2006), or meta-analyses and

systematic reviews of evaluations on specific topics (e.g., Scott-Little, Hamann,


and Jurs 2002, Fewtrell and Colford 2004).

2.1.3. Development Evaluation

The field of development evaluation has a solid foundation in Europe (Berlage

and Stokke 1992, Stockmann 2013) and many textbook-type publications as

well as academic papers are written by European authors. Research on development evaluation has taken more pragmatic approaches rather than being based on theoretical foundations. One of the reasons is that development evaluation has been dominated by “the need of the donor community with the emphasis on the practical usefulness of the results in improving aid operations” (Carden 2013, Cracknell 2000). As a result, the term development evaluation is often used as a synonym for ‘evaluation of development aid’ or ‘aid evaluation’. In fact, the OECD DAC, the group of donors of development

assistance, has played a key role in the development of the field. Much of the early literature on development evaluation has been published by the DAC or experts who participated in the DAC’s work on evaluation.

According to Cracknell (2000, 39), it was only by the time he wrote the book that development evaluation became more fully integrated into the wider world of evaluation debates. From the late 1990s, criticisms of the donor-driven conventional approach became visible, and more diverse perspectives with participatory, empowerment, and cross-cultural approaches have been called for, as constructivism became one of the main streams in the general evaluation discipline (Rebien 1996, McDonald 1999).

In this context of evaluation in development cooperation, the DAC criteria were developed and gained importance. Cracknell (1988, 2000) summarized the history of aid evaluation into four phases: the first, “early developments” phase from the 1960s to 1979; phase two, an “explosion of interest”, from 1979 to 1984; phase three, from 1984 to 1988, when international dialogue became important; and phase four, from 1988 to the time of writing, characterized as “aid evaluation at the crossroads”. He anticipated that the next, phase five, could be “the emergence of methodological pluralism”.

In the 1960s, the most popular approach to evaluating development

projects was estimating financial and economic rates of return (Binnendijk

1989). As most development projects then were large infrastructure and

industrial projects, the economic methodology was seen to be appropriate. In

the 1970s, when the emphasis of development assistance shifted toward meeting basic human needs (e.g., agricultural and rural development), the economic

analysis method became insufficient due to the difficulties in estimating and

quantifying the social benefits and other impacts, and also because it was not

able to deal adequately with the equity or distribution issues.

In the late 1970s, the logical framework, or the ‘Logframe’, emerged at USAID as a conceptual framework for guiding project planning,

implementation and evaluation. The Logframe was adopted by many other


aid agencies for project design and evaluation. Evaluation design was largely

based on ‘experimental and quasi-experimental research’, a methodology that

requires statistically reliable baseline data to measure attribution. Soon it

turned out that this evaluation approach was “overly sophisticated, costly, and

impractical for the evaluation of most development projects” (Binnendijk 1989,

209). The focus of this approach was too narrowly placed on impacts, producing few useful evaluation outcomes for lessons and learning.

It is fair to say that development evaluations were largely conducted by

donor agencies for their own information needs, with a focus on accountability

and control. Donor agencies have played the major role “as promoters,

executors, and consumers of the evaluations conducted in developing countries”

(Bamberger 2000, 101). The donor-driven evaluation practice resulted in a

number of problems as reported by many scholars (Bamberger 1991, Berlage

and Stokke 1992, McDonald 1999, Rebien 1997). The problems include:

evaluation designs were too costly and inappropriate within the local context,

with too much emphasis on the collection of quantitative data with little

flexibility in the methods used, while the duration and resources allowed were

generally limited at the expense of the quality of collected data or information.

Another interesting point in McDonald (1999) is that evaluations were mainly conducted by professionals from the donor side with skills in the project substance (i.e., the sector) rather than evaluation expertise, so they generally lacked

knowledge of the local social, political and cultural context.


All these problems seem to have contributed to what Bamberger (2009)

called ‘positive bias’ in international development evaluations. He explains

the positive bias as the result of a combination of four factors: budget and time

constraints, limited access to data (particularly baseline data), the way

evaluations are commissioned and managed, and political and organizational

constraints and pressure. Raimondo, Vaessen, and Bamberger (2016) identify

five common scenarios in development evaluation: rapid evaluations; large-scale, long-term evaluations; experimental manipulation and/or reliance on primary data collection; systematic reviews; and participatory evaluations. It

appears that many evaluations conducted by donor agencies fall into the

category of ‘rapid evaluation’.

Recently the pressure for ‘evidence-based’ development assistance has

become strong, which calls for more scientific and empirical evidence in

development evaluations. This is in line with the ‘evidence wave’ in the

evaluation discipline, which is interpreted as a return of experimentation

(Vedung 2010). The development evaluation community seems to be skeptical of using experimental-type methodologies (Forss and Bandstein 2008, Van Den Berg 2005), but it continues to face the challenge of demonstrating results in evaluating development activities.

2.1.4. DAC Evaluation Criteria


The OECD/DAC’s Criteria for Evaluating Development Assistance have their

origin in the DAC Principles for Aid Evaluation, which was developed by the

DAC Expert Group on Aid Evaluation (now the DAC network on Development

Evaluation) and adopted by the DAC in 1991. The DAC Principles document is considered the most important product established by the Group (OECD 2013, 33). The definition of evaluation is the core element of the principles, from which five concepts, namely relevance, efficiency, effectiveness, impact, and sustainability, emerged. They have become widely accepted evaluation criteria, known as the DAC criteria, and have had a profound impact on

development evaluation. At that time, the five criteria were presented as a

basic group of evaluation issues or questions to be addressed in an evaluation.

The purpose of developing the evaluation principles was (1) greater coordination between the evaluation units in different aid agencies and (2) harmonization of the terms of reference for evaluations, in an effort to ensure greater comparability of results (OECD 1991). In other words, the set of five criteria was developed with the expectation of contributing to collaboration and comparability in evaluation activities among the DAC members.

The DAC criteria were updated in 2002 with the Glossary of Key Terms

in Evaluation and stipulated in the DAC Quality Standards for Development

Evaluation which was adopted in 2010. The DAC criteria have become a

standard framework for international development evaluation, adopted by

major donor agencies. According to Lundgren, who was a member of the Expert Group and is now the Head of the Evaluation Unit at the OECD, a key reason why these criteria came to be widely spread and used is that they are a manageable and relatively easy framework to understand and to use when

framing key evaluation questions (IEG 2017).

Figure 2. Development of Evaluation Paradigm and the DAC Criteria

Source: author.

The DAC five criteria have been widely used among the DAC members

as well as other development organizations including NGOs to meet donors’

requirements in reporting their activities. It is often criticized that the practice

of applying the DAC criteria is rather mechanical, described as a template,

checklist or box-ticking approach, and even perceived as a ‘straitjacket’.

Even though the DAC requires all criteria applied to be defined in unambiguous

terms, it is argued that several criteria are not well understood, and their use is


often mechanical while excluding more creative evaluation processes (ALNAP

2006, 10). Some donors include additional criteria such as cross-cutting issues or the so-called 3Cs, namely coherence, co-ordination, and complementarity, which are mainly adopted by European donors, e.g., the EC, Denmark, Germany,

Ireland, etc. (OECD 2016). But within a rating system, the overall assessment

is usually made combining the five criteria.

Although there seems to be a shared sentiment among evaluators on the rigidness and limitations of the DAC criteria framework, academic literature with theoretical explanations or from critical perspectives is not sufficient. Among a few, Chianca (2008) examined each criterion under three questions: (1) are they sufficient to provide a sound assessment of the quality, value, and significance of an aid intervention? (2) are they necessary? and (3) are they equally important? He provides a good overview of the definitions of the criteria and their strengths and weaknesses, based on which he suggests ideas for improvement, but he did not attempt to provide empirical evidence on the problems raised in the paper.

Igarashi and Awabdeh (2015) discuss the problems associated with

applications of the DAC criteria in the institutional context. The focus of their

discussion is on the ‘mechanical application’ of the DAC criteria based on a

pre-determined log-frame which, they criticize, generally assumes a linear

means-ends causality and fails to see the complex and dynamic nature of the development process. They argue that ‘log-frame-based evaluations’ are


common in practice, and that overreliance on the DAC criteria in designing

evaluation frameworks, often by those whose expertise is not in evaluation, has

a danger of overlooking how things progress and what changes happen and how.

The authors propose ‘weaning’ evaluations from the DAC criteria as an operational framework that serves as a ‘template approach’. Instead, they suggest

utilizing them as a framing approach to define evaluation questions which

should be the driver of the evaluation methodology.

Recently the development evaluation community has started rather

openly discussing the issues around the DAC evaluation criteria. Early in

2017, Caroline Heider, the Director General of the Independent Evaluation

Group (IEG) at the World Bank, initiated the ‘Rethinking Evaluation’ series on

the IEG blog, suggesting that it is time to rethink the DAC five criteria. She

raised questions such as whether the criteria represent diverse views which have

emerged in the development field, e.g., inclusiveness and complexity. She

examined the definition of each criterion and their use in practice in a series of thought-provoking posts which brought up more than a hundred comments and questions (Heider 2017). The discussion is ongoing and is expected to collect ideas for improvements of the DAC criteria, though it remains an exchange of opinions.

2.1.5. Cost-Benefit Analysis and Evaluation

Cost-benefit analysis (CBA) is generally perceived as a method used in project


appraisal or feasibility assessment before an intervention, to assist in decision-

making on whether to allocate resources to the specific intervention. While

this type of ex-ante CBA is regarded as the standard CBA, ex-post CBA has its own value in learning about the actual value of a particular intervention and contributing to learning about the value of similar interventions, with even greater accuracy than ex-ante analysis (Boardman et al. 2011).

Conducting an ex-post CBA, therefore, shares the same purposes as conducting an evaluation, more specifically a summative evaluation.3 By estimating the value of what is being evaluated, CBA determines whether the intervention was socially worthy, thus satisfying the accountability purpose of evaluation. With the information obtained by the CBA on whether a similar intervention would be worth investing in, it helps decision-making on whether to expand or replicate the project.
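As a minimal sketch of what such an ex-post calculation involves (all yearly figures and the discount rate are hypothetical, chosen only for illustration and not taken from the thesis or from Boardman et al.), the social worth of a completed intervention can be summarized as follows:

```python
# Minimal illustrative sketch of an ex-post cost-benefit calculation.
# All figures are hypothetical.

def present_value(stream, rate):
    """Discount a yearly stream of amounts (index 0 = first year) to the present."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(stream))

costs = [1000, 50, 50, 50, 50, 50]        # investment in year 0, then yearly O&M costs
benefits = [0, 300, 320, 340, 360, 380]   # benefits realized after completion
social_discount_rate = 0.10               # assumed rate, for illustration only

npv = (present_value(benefits, social_discount_rate)
       - present_value(costs, social_discount_rate))
bc_ratio = (present_value(benefits, social_discount_rate)
            / present_value(costs, social_discount_rate))

print(f"NPV = {npv:.1f}")       # positive here, so the intervention was socially worthy
print(f"B/C = {bc_ratio:.2f}")  # equivalently, a benefit-cost ratio above 1
```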

CBA is a widely-used tool for assessing the value of an evaluand and includes the task of drawing a conclusion with regard to how a given

evaluand has performed. Although CBA shares common purposes and tasks

with evaluation, it has somehow been separated from the evaluation discipline.

King (2015, 2017) describes that economic evaluation, e.g., CBA, tends to be

3 Summative evaluation is a term coined by Scriven, who makes a distinction between evaluation for assessing the overall value of an evaluand and evaluation for improvement, which is called formative evaluation. Summative evaluation is generally conducted after completion of the project or program, but can also be conducted for ongoing ones after stabilization (Scriven 1991, 340).


applied “either in isolation from or in parallel to other methods of evaluation”.

He adds that the evaluation community has noted this gap and suggested that economic valuing methods should be integrated into general evaluations.

This is probably because the evaluation field originated in and has been developed by fields of social science other than economics, such as education, sociology, and public policy. In fact, consideration of cost, not to mention the cost-benefit relation, has been very weak in general evaluations4. There seems to have been some misunderstanding of, or resistance to, the concepts and definitions used in CBA among the evaluation community, such as the valuing and monetizing of qualitative aspects of human life and society, e.g., happiness or pain, quality of life, the environment, and so on. A direct example is the article titled "The Economist's Fallacy" by Michael Scriven (2008b), whose influence in the discipline of evaluation is second to none. With some misunderstanding of the definition of opportunity cost in economics, he even argued that "evaluators should not follow the economists [when analyzing costs], or they will end up in a swamp of misleading conclusions about program costs…".5 Technical complexity and the theoretical assumptions embedded in CBA may also contribute to the isolation of the method from mainstream evaluation approaches, which have moved towards qualitative methods with more emphasis on participatory and social justice perspectives.
4 Scriven (2015) noted that cost analysis, both quantitative and qualitative, was long ignored and is still seriously underused in evaluation.
5 Two economists, Rudd (2009) and Watts (2008), responded to Scriven's claim, saying that his understanding of opportunity cost is incorrect from an economics perspective.

In the field of development, it was common to use CBA in the 1960s and 1970s, although its use seems to have been limited to ex-ante evaluations. Economic analysis was institutionalized as a main method of feasibility assessment in major multilateral development banks. But it became less preferred for practical reasons. For example, the percentage of projects assessed with CBA methods dropped from 70% in 1970 to 25% in 2005, mainly because of the growth of project areas to which it is difficult to apply CBA, especially governance and social protection (World Bank 2010).

Recently, however, multilateral development banks have become active again in studying how to apply economic methods to development programs, and have developed guidelines for sector-specific areas. The International Fund for Agricultural Development (IFAD) released a series of guidelines for economic analysis of agricultural projects in 2015 and 2016 (IFAD 2015, 2016a, b). The Asian Development Bank (ADB) recently revised its 1997 version of the Guidelines for the Economic Analysis of Projects (ADB 2017) and also published a practical guide on CBA for development (ADB 2013), among others.

In the development context, CBA is often called 'social cost-benefit

analysis’. Snell (2011) also uses the term when introducing three categories

of CBA as financial, economic and social CBA. While financial CBA

concerns the financial position of a person or firm, economic CBA concerns the

welfare of a defined group of people. Social CBA adjusts prices to reflect priorities and policies that markets would not reflect, for example, adjustments to give advantage to certain population groups such as the rural poor. But the

distinction between economic and social CBA is often blurred, as the author

admits (Snell 2011, 5). As Brent (2006, 5) points out, the word ‘social’ is used

to emphasize that one is attempting to express the preferences of all individuals,

whether they be rich or poor, or directly or indirectly affected by the project.

Without the word ‘social’, CBA deals with all individuals in the society as well

as distribution issues. As de Rus (2010) defines, CBA is about social welfare

and considers all social costs and benefits of projects, whose view is taken in

this thesis.

Numerous critiques of CBA exist. In addition to the conceptual and

methodological issues such as quantification and monetization of qualitative

values and discounting and distributional issues (Frank 2000, OECD 2006),

practical challenges make the application costly and limited especially in the

development context. Despite all the controversies, however, a consensus is

that CBA provides a powerful conceptual framework for evaluation. CBA

involves “systemically identifying, measuring, valuing, and comparing the

costs and consequences of alternative courses of action” (Drummond 2005,

quoted in King 2017). Methods to address these limitations have been developed, such as analysis of multiple scenarios through sensitivity tests. This is the ground on which I argue that CBA can serve as a good benchmark for the DAC criteria framework.
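As a rough illustration of what such sensitivity testing involves, the following is a minimal Python sketch that recomputes the net present value of a single hypothetical stream of benefits and costs under several alternative discount rates; the figures, the rates, and the function name are invented for illustration and are not drawn from any of the guidelines cited above.

```python
# Minimal sketch of a sensitivity test: recompute NPV under alternative
# discount-rate scenarios. All figures here are hypothetical.

benefits = [0, 30, 40, 40, 40]   # benefits in years 0..4
costs    = [100, 5, 5, 5, 5]     # costs in years 0..4

def npv(benefits, costs, rate):
    """Net present value of (benefits - costs) discounted at the given rate."""
    return sum((b - c) / (1 + rate) ** t
               for t, (b, c) in enumerate(zip(benefits, costs)))

for rate in (0.05, 0.10, 0.15):  # alternative discount-rate scenarios
    print(f"discount rate {rate:.0%}: NPV = {npv(benefits, costs, rate):.1f}")
```

A result that keeps the same sign across the plausible range of discount rates is more robust than one that changes sign, which is the kind of information a sensitivity test adds to a single point estimate.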


2.2. Concepts and Scope of the Study

In the evaluation literature, many terms are used in different ways, sometimes

causing confusion. It would be useful as well as necessary to define the key

terms which I use in the study and to narrow down the scope of the study as

follows.

Evaluation and Evaluand

As discussed in the literature review, there are various definitions of evaluation, which reflect different perspectives on its functions, purposes, and methods. In this study, I take an eclectic approach, combining Scriven's and Weiss's definitions, and define evaluation as "the systematic assessment of a development intervention, compared to a set of explicit or implicit standards, to determine its merit, worth, or significance6".

Perspectives, approaches and methodologies on evaluation vary

according to the object of evaluation or the ‘evaluand’. Evaluation literature

and practice encompass various kinds of evaluands from product to personnel,

summarized as the “six Ps”: programs, policies, performance, products,

personnel, and proposals (Scriven 1991). This study is focused on evaluation

6 Precisely speaking, the terms ‘merit’ and ‘worth’ have distinctive meanings: ‘merit’

means intrinsic, context-free qualities, while ‘worth’ refers to context-determined value

(Scriven 1994, Stake 2004). In some literature, the terms merit, worth, value or

quality are used interchangeably.


of development interventions or ODA projects, and so mostly deals with theories of 'program evaluation'.

Evaluation Criteria and Standards

The term ‘criteria’ means “the aspects, qualities, or dimensions that distinguish

a more meritorious or valuable evaluand from one that is less meritorious or

valuable” (Mathison 2005: 91). In evaluation literature, the term is used as a

synonym for dimension of merit/worth (Davidson 2005) or analytical category

(Dale 2004) and often includes indicators or variables of success or merit

(Scriven 1991: 111).

A criterion should be distinguished from the term ‘standard’ which means

the level or amount of quality needed for a certain judgement. While criteria

are “the aspects of an evaluand that define whether it is good or bad and whether

it is valuable or not valuable" (Davidson 2005: 239), standards are the levels of how good and how valuable that differentiate an acceptable evaluand from an unacceptable one (Stake 2004: 7).

Evaluation Framework

An evaluation framework generally means a tool used to organize and link

evaluation criteria with questions, outcomes or outputs, data sources and data

collection methods. Some use the term as a synonym for an evaluation matrix. The DAC criteria as a set can be viewed as an evaluation framework, which defines the aspects the evaluation should examine and the questions to be asked. In this study, the terms 'DAC criteria framework' and 'DAC framework' are used to refer to an evaluation framework using the DAC criteria.

Positive Bias

In evaluation literature, bias is defined as “systematic deviation of results from

what they should be” (Camfield, Duvendack, and Palmer-Jones 2014).

Positive bias in evaluation means that the judgement is more favorable than the actual results warrant. Scriven (1991) described 'bias' as the same as "prejudice", whose antonyms are 'objectivity', 'fairness' or 'impartiality'. Bias may be caused by: the evaluator's personal view, e.g., halo effects or the Rorschach effect7; evaluation design and methods, e.g., selection bias; or incentive mechanisms, e.g., funding bias.

The concept of positive bias in this study is used in the sense that positive bias exists if the overall conclusion is more favorable than can be justified by widely used evaluative standards in other evaluations, e.g., the net present value in cost-benefit analysis.

7 Halo effects mean the tendency to allow the presence of some highly valued feature

to overinfluence one’s judgement. Rorschach effect refer to the tendency to see what

one wants to see (Scriven 1991).


General Logic of Evaluation

The general logic of evaluation means the generally applied reasoning process

by which evaluative conclusions are established and supported. This general

logic is commonly shared by various evaluation approaches, while what counts

as criteria or evidence and how evidence is weighted varies from one approach

to another. Based on Michael Scriven’s logic of evaluation, Fournier (1995)

describes the general logic in four steps (Figure 3):

Figure 3. General Logic of Evaluation
1. Establishing criteria of merit: On what dimensions must the evaluand do well?
2. Constructing standards: How well should the evaluand perform?
3. Measuring performance and comparing with standards: How well did the evaluand perform?
4. Synthesizing and integrating data into a judgement of merit or worth: What is the merit or worth of the evaluand?
Source: Fournier (1995)

King (2017) describes the process of cost-benefit analysis (CBA) as implementing the general logic of evaluation: "identifying the things of value", that is, establishing criteria of merit; "quantifying and valuing them", which involves constructing standards and measuring; and "synthesizing the evidence… to reach an overall determination of net value". Likewise,

conducting an evaluation using the DAC framework can be conceptualized as implementing the general logic of evaluation, which provides a logical analytical framework for analyzing the DAC criteria evaluation process, especially in comparison to the CBA framework.
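To make this parallel concrete, the following is a minimal Python sketch of how the four steps might be operationalized for a DAC-style evaluation; the rating scale, the threshold standards, the equal weights, and the weighted-sum synthesis rule are illustrative assumptions of mine, not part of Fournier's formulation or of the DAC guidance.

```python
# Minimal sketch of the general logic of evaluation applied to the DAC criteria.
# Scale, standards, ratings, weights, and synthesis rule are all hypothetical.

CRITERIA = ["relevance", "effectiveness", "efficiency", "impact", "sustainability"]

# Steps 1-2: criteria of merit and a standard for each (minimum rating on a 1-6 scale).
standards = {c: 4 for c in CRITERIA}

# Step 3: measured performance (in a real evaluation these come from evidence).
ratings = {"relevance": 5, "effectiveness": 4, "efficiency": 3,
           "impact": 4, "sustainability": 3}

# Step 4: synthesize into an overall judgement (equal weights assumed).
weights = {c: 1.0 / len(CRITERIA) for c in CRITERIA}

def synthesize(ratings, standards, weights):
    """Compare ratings with standards and compute a weighted overall score."""
    verdicts = {c: ("meets standard" if ratings[c] >= standards[c] else "below standard")
                for c in CRITERIA}
    overall = sum(weights[c] * ratings[c] for c in CRITERIA)
    return verdicts, overall

verdicts, overall = synthesize(ratings, standards, weights)
print(verdicts)
print(f"Overall weighted score: {overall:.2f}")
```

The sketch makes explicit where judgement enters the process: in setting the standards, in producing the ratings, and in choosing the weights used for synthesis, which is precisely the stage examined in Chapter 5.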

2.3. Analytical Framework

This study analyzes the DAC evaluation criteria by implementing the stages

of the general logic of evaluation. The analytical framework of the study is

illustrated in Figure 4.

Figure 4. Analytical Framework of the Study

The first stage, establishing criteria of merit, involves defining and describing the merits and values that the evaluand should have to be considered a good one. Adopting the key requirements for evaluation criteria identified in the 'criteria of merit checklist (COMlist)' (Scriven 2007), I examine the definitions and scope of the DAC criteria individually and collectively, in view of whether they constitute a good evaluation framework in comparison to the evaluation criteria in established evaluation theories, i.e., Scriven's KEC framework, Stufflebeam's CIPP model, and Rossi's theory-based model, as well as the cost-benefit analysis framework. This part of the study

is presented in Chapter 3.

In the second and third stages of the general logic of evaluation, the core

concept is ‘standard’, that is, “the level of how good and how valuable that

differentiate the evaluand between acceptable and unacceptable” (Stake 2004,

7). In Chapter 4, I analyze what the standards of judgement in the DAC criteria framework are and whether they indicate appropriate levels of value that a development intervention is required to meet. The analysis is based on the findings from an empirical review of evaluation reports which apply the DAC criteria framework and on a comparison with the standard of judgement in cost-benefit analysis, i.e., the net present value.

The final stage of the general logic of evaluation is concerned with how

to integrate the findings from different criteria into an overall conclusion.

This involves the issues of relative importance between evaluation criteria and

the validity of evaluation results. A comparative analysis is conducted in Chapter 5 to examine how differently a given event can affect the evaluation results under the DAC framework and under cost-benefit analysis, using a conceptual framework followed by a comparative case study.


CHAPTER 3. NOTION OF MERIT – DEFINITION AND SCOPE OF DAC

CRITERIA

3.1. Key Requirements for Evaluation Criteria

Establishing criteria of merit is the first step in the general logic of evaluation.

Identifying and selecting the right criteria is one of the most critical tasks in

evaluation procedures, as they define the merit, worth, or significance of a

program being evaluated.

A dictionary definition of criterion is “a principle or standard by which

something may be judged or decided”. In Encyclopedia of Evaluation, the

term ‘criteria’ is defined as “the aspects, qualities, or dimensions that

distinguish a more meritorious or valuable evaluand from one that is less

meritorious or valuable” (Mathison 2005, 91). In evaluation literature, the

term is used as a synonym for dimension of merit/worth (Davidson 2005) or

analytical category (Dale 2004), and often includes indicators or variables of

success or merit (Scriven 1991, 111).

Davidson (2005: 27) made an interesting analogy of identifying

evaluation criteria with “deciding what symptoms to look at when determining

what is wrong with a patient and how serious it is". In other words, establishing evaluation criteria is making decisions on what aspects of the evaluand to investigate in order to draw judgements on whether it is successful or not and what the causes of the success or failure are. Once developed, evaluation criteria represent the desired characteristics of the evaluand and serve as a basis for assessing the overall merit, worth, and significance, which is what an evaluation eventually determines.

Therefore, selection of evaluation criteria affects the validity of

conclusions. As Scriven (2007) underscores, a set of evaluation criteria, when

developed in the right way, can contribute substantially to the improvement of

validity, reliability, and credibility of an evaluation, because evaluators are

required to consider each of the relevant aspects separately and make a

judgement on each criterion, based on which an overall evaluative conclusion

can be drawn. Such a set of criteria often incorporates a great amount of specific knowledge and experience about the particular type of intervention, and so facilitates evaluation tasks.

According to Scriven (2007), identifying true criteria for an evaluand X

begins with asking “what properties are parts of the concept of a good X”. It

implies that, to identify criteria for a development intervention, the first question should be what the properties of a good development intervention are. Those properties, in other words the notion of merit, should represent a successful development intervention.

There are conflicting views on whether the evaluation criteria for an

evaluation should be determined beforehand, since they often emerge during


the evaluation process. Scriven (1991, 2007) advocates the value and

usefulness of a list of criteria. When developed in the right way, a list of criteria can contribute substantially to the improvement of validity, reliability, and credibility of an evaluation, as evaluators are required to consider each of the relevant aspects separately and make a separate judgement on each criterion, based on which an overall evaluative conclusion can be drawn. It can reduce the risks of possible biases such as the halo effect or the Rorschach effect. Such a list often incorporates a great amount of specific knowledge and experience about the particular evaluands, such as development interventions, and so facilitates evaluation tasks.

Skeptical views are against the alleged objectivity which is an often-

stated purpose for criterial analyses. They argue that since the criteria

manifest the subjective biases of their developers, there is a danger that “pre-

specified criteria will ensure attention to some program aspects at the expense

of others and inject systematic bias rather than eliminate it” (Bamberger, Rugh,

and Mabry 2011, 315-6). This is especially the case when each program is too complex and context-specific to be judged comprehensively by comparison to a set of criteria intended for all programs of its kind.

As Stufflebeam and Coryn (2014: 683) argue, evaluators may be

divergent in considering a wide range of potential evaluative criteria, and

should subsequently converge on the criteria agreed to be most important in

carrying out a given evaluation assignment. In this sense, applying pre-determined criteria such as the DAC criteria may limit the possibility of discovering other important aspects that should also be considered when drawing overall conclusions about the evaluand.

Then what are the characteristics of appropriate evaluation criteria for an

evaluation? Scriven (2007) provides a list of key requirements for evaluation

criteria or ‘criteria of merit checklist (COMlist)’, which is useful when

examining the DAC criteria as an evaluation framework.

The key requirements for evaluation criteria are: (1) the criteria in the list

should refer to criteria defining the general notion of merit in the evaluand, not

mere indicators, (2) the list should be complete (no significant omissions), (3)

the items should be non-overlapping, (4) the criteria should be commensurable,

(5) the criteria should be clear (comprehensible and applicable), (6) the list

should be concise (with no superfluous criteria), (7) the criteria should be

confirmable (measurable or reliably inferable).

To be complete means that every significant criterion (of merit) must be

included. Otherwise, the overall evaluation may be misleading or biased

because of its poor or superior results on some missing but crucial aspect of merit; this is the non-counting problem. If the criteria overlap, there is a risk of

double counting in the overlap area especially when the list of criteria is to be

used as a basis for scoring. Including aspects which are taken for granted,

those put into the general background of all development interventions, would

only extend the list beyond necessity.


A criterion can measure (or assist judgements of) how well the evaluand, an intervention under evaluation, is performing or has performed on the criterion at issue. So the criterion should explain why meeting a certain level of the criterion signifies worth or success of the intervention, and suggest measures that can directly address and describe the achievement on each criterion. Being direct is one of the desirable properties of an evaluation criterion that Keeney and Gregory (2005) also point out.

Table 2 summarizes the key requirements of evaluation criteria and what

kinds of bias may occur if the requirements are not met.

Table 2. Key Requirements of Evaluation Criteria and Possible Sources of Bias

Requirements | If… | Sources of bias
Clear notion of quality | Definition is not unambiguous and comprehensive; questions are about something taken for granted | Inappropriate data or evidence; easy to satisfy
Guidance for standards of assessment | Standards are too high or too low; do not address the criteria directly; no guidance | Difficult to assess the true value and to discriminate; more subjectivity, more room for general positive bias
Complete, non-overlapping, commensurable | Omission; overlaps; a serious problem detected in one criterion can be compensated for by good assessment in another criterion | Non-counting; double-counting; unbalanced overall conclusions

Source: by author.

In this chapter, I examine the DAC evaluation criteria in comparison to those in established evaluation models in the discipline and in cost-benefit analysis, with a focus on the definitions and scope of each criterion.

3.2. General Criteria in Evaluation Models

There are several widely recognized guiding frameworks for evaluations by

renowned theorists in the field. Each framework provides a set of components,

categories or dimensions to be assessed in evaluations, which can be regarded

as corresponding to the DAC evaluation criteria. First, the Key Evaluation Checklist (KEC) framework by Scriven (2015) provides relatively comprehensive 'checkpoints' or 'sub-evaluation' categories. The widely used evaluation textbook by Davidson (2005) has its foundation in the KEC framework.

Another influential evaluation approach is the CIPP Evaluation Model

by Stufflebeam (2003, 2007). CIPP stands for Context (what needs to be

done), Input (how should it be done), Process (is it being done), and Product

(did it succeed). The CIPP model provides seven components of evaluation, which

may be employed selectively and in different sequences and often

simultaneously, depending on the needs of particular evaluations.

Theory-driven Evaluation approaches name the categories as ‘scope and

hierarchy of evaluation’. Rossi, Lipsey, and Freeman (2004) and Donaldson

(2007) suggest five dimensions of evaluation hierarchy in the order of: need for


the program, program design and theory, process and implementation,

outcome/impact, and cost and efficiency.

3.2.1. Key Evaluation Checklist (KEC) framework

Michael Scriven is one of the most eminent figures in the evaluation discipline, and his contributions to the field and their overall influence are considered second to none.8 The checklist methodology and several evaluation checklists are among his original contributions to the theoretical development of the evaluation discipline. The Key Evaluation Checklist (KEC) was first developed in his earlier book, The Logic of Evaluation (1980), and has been updated with minor revisions.9

Though titled as ‘checklist’, the KEC is a comprehensive framework

with in-depth descriptions for how to conduct and report evaluations. It

covers the whole process of evaluation from ‘preliminaries’ (on executive

summary and preface), ‘foundations’ of conducting evaluations (on identifying

and explaining the background, context, descriptions, impactees or consumers,

resources and values of the evaluand), ‘sub-evaluations’ (the dimensions to be

assessed), to ‘conclusions’ (synthesis of the assessments in sub-evaluations and

8 Scriven’s contributions to the field are well presented in Donaldson (2013), a book

published as a tribute to him. One of the earliest in the field, his work includes ‘the

Methodology Evaluation (1972)’ and ‘the Logic of Evaluation (1980)’. 9 My review is based on the version of 2015.

45

possibly recommendations). The ‘sub-evaluations’ part, as the dimensions to

be assessed, consists of the following five checkpoints: Process, Outcomes,

Costs, Comparison and Generalizability.

The Process checkpoint deals with how good, valuable, or efficient the content and implementation of the evaluand are. Content means what the evaluand consists of, including its basic components or design. Implementation concerns how well or efficiently the evaluand was implemented or delivered to those who needed it. Davidson (2005: 65) mentions that under the Process checkpoint everything about the program except outcomes and costs should be examined.

While Process is about ‘means’, Outcomes covers all the ‘ends’ including

what the Process are aimed at as well as other effects of the intervention. In

Outcomes (or effects, impacts interchangeably) checkpoint, the main evaluative

question is how good or valuable are the impacts on immediate recipients and

other impactees10. So it is required to identify all effects, including unintended

impacts on all potential impactees, or at least the possibility should be

investigated. Sustainability of the program’s effects is also important and

should be covered here.

The Costs checkpoint examines whether the evaluand provides good or poor value: not just whether it stayed within the budget, but whether the budget itself was excessively high or low, or whether there were more cost-effective alternatives that should have been considered.
10 Scriven warns not to use the term 'beneficiaries' since it carries the completely unacceptable assumption that all the effects are beneficial.

Under the Comparisons checkpoint, the evaluand should be compared with

(1) an exemplary one or what is widely regarded as ‘best practice’ or state-of-

the-art, (2) a creative low-budget option, (3) an option with slightly more

resources allocated to it, or (4) a slightly more streamlined or economical

version.

Generalizability may not be an obligatory criterion, but it is worth considering. Also called exportability or transferability, it

deals with what elements, if any (e.g., innovative design), of the evaluand might

make it potentially valuable or a significant contribution or advance in another

setting.

Davidson (2005) adopted the KEC for her evaluation methodology and reorganized the evaluative criteria around the five most relevant ones in the KEC. In addition to the core sub-evaluation checkpoints of Process, Outcome, and Comparative Cost-Effectiveness (combining Costs and Comparisons), and excluding the non-obligatory Generalizability, she identified Impactees (or consumers) and Value, under the 'foundations' heading, as key evaluative criteria.

The rationale of the two additional criteria is that evaluation needs to identify

who might be affected by the evaluand and how to define what is ‘good’ or

‘valuable’ (Davidson 2005, 23-24).


3.2.2. CIPP Evaluation Model

The CIPP evaluation model is a comprehensive framework for conducting both formative and summative evaluations (Stufflebeam 2003). It was created in the late 1960s by Stufflebeam as an alternative to the classic evaluation approaches of the time, e.g., experimental design and objectives-based evaluation, which had proved to be of limited use and often unworkable and even counterproductive (Stufflebeam and Coryn 2014). The model has been further developed over the years11, and has been adopted and applied across the world and in a wide range of areas.

CIPP is an acronym for Context, Input, Process, and Product, which represent the four categories of evaluation in relation to a program's goals, plans, actions, and outcomes respectively. The CIPP framework provides seven components of evaluation: Context, Input, Process, and Product, with Product subdivided into Impact, Effectiveness, Sustainability, and Transportability. They may be

employed selectively and in different sequences and often simultaneously,

depending on the needs of particular evaluations.

Context evaluation is about what needs to be done. It assesses needs,

assets, and problems within a defined environment. The Context evaluation

component is ideally done at an early stage of program development, with a focus on the program's aims and evaluation design.
11 My review is based on the 2007 version of the model checklist (Stufflebeam 2007) as well as on Stufflebeam (2003) and Stufflebeam and Coryn (2014).
For formative use, the model

assumes that the evaluators are involved from the beginning of the program and

throughout the program activities, in order to identify the problems and assess

program goals as well as to observe and record pertinent information residing

in the program’s geographic area. For summative evaluations, the focus is

more on judging goals and priorities by comparing them to the assessed needs,

problems, assets, and opportunities.

Input evaluation addresses the question of how it should be done. It is to be done at the program planning stage, dealing with competing strategies

and the work plans and budgets of the intervention. It assesses the program’s

strategy against relevant research and development literature, in comparison

with alternative strategies in similar programs. The assessment should

include the program’s work plan and schedule for sufficiency, feasibility, and

political viability.

A Process evaluation is “an ongoing check on a plan’s implementation

plus documentation of the process”. In Process evaluation, program activities

are monitored, documented, and assessed. It helps the clients/stakeholders to

coordinate and strengthen staff activities, to strengthen the program design, and to maintain a record of the program's process and costs. Process evaluation is

more related to formative than summative evaluation, but the information

produced here is vital for interpreting evaluation results in Product evaluation,

the following category.


Impact is the first component of Product evaluation, which asks "did it succeed?". A program's reach to the target audience is assessed by asking whether the right beneficiaries were reached. Evaluators assess and make a judgement of the extent to which the individuals and groups served are consistent with the intended beneficiaries. Ideally, impact evaluation is done regularly and updated.

Effectiveness evaluation is about whether the beneficiaries’ needs were

met and the quality and significance of outcomes. On the program’s outcomes,

evaluators conduct in-depth case studies of selected beneficiaries as feasible

and appropriate, identify the program’s full range of effects both positive and

negative, intended and unintended, and judge its effectiveness in comparison to the identified 'critical competitors', meaning similar programs conducted elsewhere.

Sustainability evaluation assesses the extent to which “a program’s

contributions are institutionalized successfully and continued over time”.

Evaluators are also to identify what program successes should and could be

sustained.

Transportability evaluation asks the question of “whether the processes

that produced the gains were proved transportable and adaptable for effective

use in other settings”. It assesses the extent to which a program has (or could

be) successfully adapted and applied in other settings. This is an optional

component as in Scriven’s KEC.


The CIPP Model has evolved, adopting new concepts and ideas in the field of evaluation. For example, Sustainability and Transportability were not distinct evaluation components in earlier versions of the CIPP Model, where they were included in an example checklist for summative evaluations consisting of 21 checkpoints. The definitions have been elaborated in the later versions.

3.2.3. Theory-driven Evaluations

Theory-driven (or theory-based) evaluation is “a contextual or holistic

assessment of a program based on the conceptual framework of program theory”

(by Chen in Mathison 2005, 415-9). Given that program theory is “a set of

assumptions of how the program should be organized and why the program is

expected to work”, the primary aim of theory-driven evaluation is to test if the

theory works. Taking up a holistic approach, it also serves to fulfill wider

purposes such as “to provide information on not only the performance or merits

of a program but on how and why the program achieves such result”. Theory-

driven evaluations are well established in the evaluation discipline and have been

applied to numerous domains.

One of the most quoted theory-driven evaluation approaches, by Rossi, Lipsey, and Freeman (2004), advocates a comprehensive model for program evaluation, providing the following five domains which are generally involved in the evaluation of a program: (1) the need for the program, (2) the program's design, (3) its implementation and service delivery, (4) its impacts, or outcomes, and (5) its efficiency. These represent the types of evaluations which can be

addressed in separate evaluations. In a holistic model, they together represent

‘evaluation building blocks’ in the form of a hierarchy as in Figure 5.

Figure 5. The Evaluation Hierarchy (from top to bottom): Assessments of Program Costs and Efficiency; Assessments of Program Outcome/Impact; Assessments of Program Process and Implementation; Assessments of Program Design and Theory; Assessments of Need for the Program
Source: Rossi, Lipsey, and Freeman (2004, 80)

In the evaluation hierarchy, each dimension to be assessed rests on those beneath it. In other words, evaluation tasks at each level assume knowledge about the supporting issues below in the hierarchy. Assessment of need for the program, at the foundation level of the hierarchy, provides the diagnostic information on the nature of the problems and the need for intervention, based on which the program design can be assessed as to whether the program theory is reasonable for addressing those problems and needs. Then

the evaluation may move on to the next level above, the assessment of program

process and implementation, that is, the task of assessing whether the

corresponding program activities are well implemented.

A key message in the evaluation hierarchy is that there are logical

interdependencies between the levels. Assessing program outcomes is meaningful when it rests on acceptable results from assessments of the logically prior issues, such as whether the program theory is sound in addressing the needs and social conditions the program is intended to improve and how well it is implemented, which are asked at the lower levels of the hierarchy. Assessment of program cost and efficiency requires supporting information from the levels below in the hierarchy, regarding the social problems and the program theory addressing them, the implementation process, and the program outcomes, which serve as 'building blocks' for the next-level evaluation.

The theory-driven evaluation model by Rossi and others provides the

dimensions to be examined in a holistic evaluation with a logical sequence. It

is worth noting that they confirm that the assessment of cost and efficiency,

represented by cost-benefit analysis, is at the top of the evaluation hierarchy,

meaning that the task assumes information and knowledge drawn from other

categories of evaluation such as program’s needs, process and outcomes.


3.3. Definitions and Scope of the DAC Evaluation Criteria

3.3.1. Characteristics of the DAC Criteria

The DAC evaluation criteria, or Criteria for Evaluating Development Assistance, grew out of the donor community's efforts to coordinate and harmonize their evaluation activities from the late 1980s through the 1990s. Having their origin in the DAC Principles for Aid Evaluation developed by the DAC Expert Group on Aid Evaluation, the five evaluation criteria were adopted in 1991 (OECD 1991). The main ideas and definitions were updated in the Glossary of Key Terms in Evaluation published by the DAC in 2002. They are part of internationally agreed principles for development

evaluation, stipulated in the DAC Quality Standards for Development

Evaluation adopted in 2010.

The DAC Quality Standards for Development Evaluation are intended to

identify the key pillars needed for a quality development evaluation process and

product (OECD 2010). As stated in the document, the quality standards are

not mandatory, but meant to provide a guide to good practice. The document

also makes it clear that the standards are “not intended to be used as an

evaluation manual and do not supplant specific guidance on particular types of

evaluation, methodologies or approaches”.

Likewise, the DAC criteria are not meant to be obligatory. The use of the five criteria does not rule out the possibility of excluding some of the existing ones or adding other criteria considered relevant to the specific characteristics of the evaluation and its context.12 In the Quality Standards for Development Evaluation, it is stated:

2.8 Selection and application of evaluation criteria

The evaluation applies the agreed DAC criteria for evaluating

development assistance: relevance, efficiency, effectiveness, impact

and sustainability. The application of these and any additional

criteria depends on the evaluation questions and the objectives of the

evaluation. If a particular criterion is not applied and/or any

additional criteria added, this is explained in the evaluation report.

All criteria applied are defined in unambiguous terms. (OECD 2010,

9)

Nonetheless, the DAC five criteria have been widely used among the

DAC members as well as other development organizations including NGOs to

meet donors’ requirements in reporting their activities. It is often criticized

that the practice of applying the DAC criteria is rather mechanical, described

as a template or box-ticking approach and even perceived as a 'straitjacket'.

Even though the DAC requires all criteria applied to be defined in unambiguous terms, as quoted above, it is argued that several criteria are not well understood, and that their use is often mechanical, excluding more creative evaluation processes (ALNAP 2006, 10-11).
12 An example is the DAC Criteria for the Evaluation of Humanitarian Assistance. The Active Learning Network for Accountability and Performance in Humanitarian Action (ALNAP) has introduced three additional evaluation criteria: connectedness, coherence and coverage, considering some unique features of humanitarian intervention (ALNAP 2006).

Given that the term ‘evaluation criteria’ refers to the dimensions to be

addressed in evaluations, it is fair to say that the DAC criteria represent the

aspects of value that a development intervention should fulfill. In other words,

to be judged successful in light of the DAC criteria, a development intervention should be relevant, efficient, effective, and sustainable, as well as bring about impacts. It is natural that questions arise around the definitions of the

criteria and the standards of judgement against which the assessments are to be

made. For example, what does it mean when we say a development

intervention is relevant? What constitutes an effective intervention? What

are the benchmarks of an efficient program? What quality and quantity of

impacts do we expect? To what extent and for how long should an intervention be sustainable? I discuss these questions for each criterion below.

3.3.2. Relevance

In the OECD/DAC evaluation context, relevance is defined as “the extent to

which the aid activity is suited to the priorities and policies of the target group,

recipient and donor” (OECD n.d.). More detailed descriptions are found in

the Glossary of Key terms in Evaluation and Results Based Management:


The extent to which the objectives of a development intervention are

consistent with beneficiaries’ requirements, country needs, global

priorities and partners’ and donors’ policies. Note: Retrospectively,

the question of relevance often becomes a question as to whether the

objectives of an intervention or its design are still appropriate given

changed circumstances. (OECD 2002, 32)

Relevance is a rather subjective term and the interpretation may vary

according to ‘relevant to whom or to what’. As to the question of ‘relevant to

whom’, the DAC definition indicates the target group, partner (recipient)

government, and donor government. ‘Relevant to what’ involves needs,

priorities, and policies. Do the target group, recipient government, and donor government have the same needs and policy priorities? It is hard to say they always do. The target group's needs may depend on the context and be specific to local conditions. A recipient government may have a different priority from the people in need, possibly due to the political interests of those in charge of making policies or implementing the donor-supported programs. A donor's priorities may reflect its diplomatic, political, or economic interests, which may not always be consistent with the needs of the target population.

A question also arises as to how to identify the policy priorities of the recipient government or the donor. For example, the policy documents of developing countries, such as national development plans or multi-year sector strategies, tend to encompass almost all areas and sectors because they need comprehensive strategies for national development. Thus, it is rather rare to find a development project that could not be justified by the policies of the recipient government. Likewise, one could easily find one way or another to justify an activity claimed to be a development intervention as being consistent with the policies of donors and aid agencies. The policy context would not sufficiently reflect the diversity of target communities nor the differences in priorities at central and local levels.

So the definition of relevance itself does not serve as good guidance for making a thoughtful judgement on whether the intervention under assessment is relevant. It even bears a risk of misleading evaluators into using the policy context as the primary yardstick for assessing the relevance criterion. This is an issue

commonly criticized by many. Heider (2017), noting that meeting the bar for

relevance is not all that hard, argues that the relevance criterion might be

irrelevant in the world of complexity. Chianca (2008) points out that the

context and significance of the intervention for the donors and governments are important to understand in an evaluation but are not necessarily evaluative. He

adds that the core function of the relevance criterion should be “to determine

whether the intervention’s design, activities, and initial results are adequate to

respond to existing needs”. Markiewicz and Patrick (2016) have replaced the

term relevance with ‘appropriateness’ in their book, as they consider the latter

to be more inclusive encompassing the needs of key stakeholders and program

beneficiaries.


Interestingly, the focus of sample questions in the relevance criterion is

more on the program’s objectives than policy context. The DAC suggests:

In evaluating the relevance of a programme or a project, it is useful to

consider the following questions:

• To what extent are the objectives of the programme still valid?

• Are the activities and outputs of the programme consistent with

the overall goal and the attainment of its objectives?

• Are the activities and outputs of the programme consistent with

the intended impacts and effects? (OECD n.d.)

The questions ask whether the programme is designed in such a way that the objectives and the intended effects can be achieved from the intervention, which is similar to an assessment of the so-called program theory, i.e., whether the program is logically and theoretically sound in achieving the outcomes. The sample questions address at least some aspects of the intervention's design, but not sufficiently, in the sense that they focus more on the objectives and overall goal than on the need for the intervention itself. Taking only the definition and sample questions into consideration, one would view an intervention as relevant if it is aligned with policies and if it is plausible that the goal can be achieved with the planned activities.

3.3.3. Effectiveness


‘Effectiveness’ is a term that causes confusion. It requires a different

interpretation from ‘effects’ of an intervention or when the intervention is

‘effective’. The DAC defines ‘effectiveness’ as “a measure of the extent to

which an aid activity attains its objectives”. The Glossary of Key Terms

provides a more elaborated definition of effectiveness as follows:

The extent to which the development intervention’s objectives were

achieved, or are expected to be achieved, taking into account their

relative importance. Note: Also used as an aggregate measure of (or

judgment about) the merit or worth of an activity, i.e. the extent to

which an intervention has attained, or is expected to attain, its major

relevant objectives efficiently in a sustainable fashion and with a

positive institutional development impact. Cf. Effect: Intended or

unintended change due directly or indirectly to an intervention.

(OECD 2002, 20)

This detailed definition includes the element of the relative importance of objectives. It also recognizes that the term can be used as an aggregate measure of the merit or worth of an activity, with consideration of the relevance of objectives, efficiency, and the sustainability of development impact. It is interesting that a 'Cf.' explaining

the term ‘effect’ is added. According to the note, ‘effect’ means “intended or

unintended change due directly or indirectly to an intervention”, which is very

similar to the definition of ‘impact’ criterion to be discussed later. In the

Glossary, the definitions of effects, impact, results, and outcome are very

similar and so it seems meaningless to distinguish those terms.


The above discussion indicates that the term ‘effectiveness’ can be

understood in different ways. Nonetheless, an assessment of the 'effectiveness' of an intervention generally focuses more on achievements against its pre-set objectives than on its effects. The sample questions that the DAC suggests considering are:

• To what extent were the objectives achieved or are likely to be

achieved?

• What were the major factors influencing the achievement or

non-achievement of the objectives?

In fact, the DAC definition of ‘effectiveness’ is consistent to the one

widely accepted in the evaluation field in general. In Encyclopedia of

Evaluation, ‘effectiveness’ is defined as “the extent to which an evaluand

produces desired or intended outcome” (by Davidson in Mathison 2004, 122).

In the definition, however, it is asserted that “effectiveness alone provides a

poor assessment of overall evaluand merit or worth”. The problems are: an

evaluand can be effective in terms of producing desirable intended outcomes,

but can produce unintended negative effects at the same time or be overly costly.

Demonstration of a causal link between the evaluand and the desired outcomes is required if the evaluand is to be claimed as effective.

The same applies to the DAC definition of effectiveness. It would not

be fair to say that a development intervention is effective when it yields


desirable intended results with serious negative impacts or at an excessive cost.

The DAC framework provides other criteria such as impact and efficiency to

assess ‘unintended’ or ‘negative’ impact or costs. They may serve as

safeguards, as long as the assessments can be properly combined into an overall

assessment. With regard to a causal link, the ‘effectiveness’ criterion does not

ask a why question, so it is left to evaluators whether to investigate if the

achieved objectives are outcomes caused by the intervention and not

coincidental changes.

There are other problems associated with using preset objectives as

evaluation criteria or so-called objective-oriented evaluation (Davidson 2005,

Scriven 1991). First of all, it makes evaluators concentrate on the effects the

program is intended to bring about, so miss the unintended outcomes in many

cases unconsciously. Second, if the program objectives are not clearly defined

or too ambitiously or conservatively targeted, their achievement itself would

not provide meaningful information about the program’s merit, worth or

importance. Objective-oriented evaluation may induce those who set the

objectives at the designing stage to lower the level of targets so as to be

achieved easily. In addition, difference in difficulty to achieve or importance

between objectives can be problematic in case of programs with multiple

objectives.

The ‘effectiveness’ criterion involves similar weaknesses of objective-

oriented evaluations such as: validity of the objectives and target levels, relative

62

importance between objectives, competing or even completing objectives,

tendency to set objectives that are more visible and easy to address, and less

attention to unintended or negative side effects of the intervention. It also

heavily rests on an assumption that the objectives have causal links with the

intervention. These issues need to be addressed carefully for the assessment

of ‘effectiveness’ to provide meaningful information for the overall evaluation

results.

3.3.4. Efficiency

Efficiency is a term that causes as much confusion as effectiveness. In the field

of evaluation in general, efficiency is defined as “the extent to which an

evaluand produces outputs and outcomes without wastage of resources such as

time, efforts, money, etc.” (by Davidson in Mathison 2004, 122). It differs

from ‘cost-effectiveness’ in the respect that the latter compares both the cost

and results (by Levin in Mathison 2004, 90). Chianca (2008) argues that cost-

effectiveness is a more comprehensive term in defining the concepts embedded

under the efficiency criterion. Nonetheless, the distinction between the two is

not very clear.

In the DAC documents13, the definition of efficiency and sample questions appear as below:
13 The DAC Glossary provides a rather short and simple definition of efficiency as "a measure of how economically resources/inputs (funds, expertise, time, etc.) are converted to results".

Efficiency measures the outputs—qualitative and quantitative—in

relation to the inputs. It is an economic term which signifies that the

aid uses the least costly resources possible in order to achieve the

desired results. This generally requires comparing alternative

approaches to achieving the same outputs, to see whether the most

efficient process has been adopted. When evaluating the efficiency

of a programme or a project, it is useful to consider the following

questions:

• Were activities cost-efficient?

• Were objectives achieved on time?

• Was the programme or project implemented in the most

efficient way compared to alternatives?

Proper attention should be paid to 'qualitative and quantitative' and 'cost-efficient'. Assessment of 'efficiency' should examine the input-output relations with both qualitative and quantitative considerations, and whether the intervention achieved the desired results, presumably the outputs stated in the first line, through the least costly process in comparison with alternatives. According to the definition, an intervention can be regarded as efficient if it achieved the output with the least costly resources in comparison with alternatives.

On the other hand, the question of whether the intervention was completed on time involves a comparison to the initial plan. It is important to follow the plan, considering the cost of delay, but only when the plan was reasonable. Once the benchmark is the initial plan, it would induce program implementers to keep to the planned schedule even in cases where they find serious flaws in the initial design and adjustments are necessary. Especially when a project is designed by the donor and implemented in a developing country, there are often cases where the initial plan does not fully consider the local context

and potential risks. Sometimes it could be more efficient to make necessary

adjustments to the plan rather than to follow a wrong or flawed plan.

Another challenge in assessing efficiency as defined is that it is hard to

find benchmark programs against which to compare the program's cost and output. Even if one attempts to conduct a cost-effectiveness analysis, which compares the costs of alternatives for outcomes in the same unit, measurement is rather difficult because it assumes an identical objective (or effect) across the projects being compared. Therefore, judgements on efficiency can be very subjective unless one can find good benchmarks.
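To illustrate the kind of comparison the criterion implies, the following minimal Python sketch computes cost per unit of output for two hypothetical alternatives; all figures are invented, and the comparison is only meaningful when the outputs of the alternatives are genuinely identical, which is exactly the assumption questioned above.

```python
# Illustrative cost-effectiveness comparison of two hypothetical alternatives
# assumed to deliver the same kind and quality of output. Figures are invented.

alternatives = {
    "option_a": {"cost": 100_000, "output_units": 5_000},
    "option_b": {"cost": 150_000, "output_units": 6_000},
}

for name, alt in alternatives.items():
    cost_per_unit = alt["cost"] / alt["output_units"]
    print(f"{name}: {cost_per_unit:.2f} per unit of output")

# The lower cost per unit indicates the more cost-efficient option, but only
# if the units (and their quality) are truly comparable across alternatives.
```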

3.3.5. Impact

While ‘effectiveness’ means the extent to which an intervention achieves its

objectives, ‘impact’ considers broader results of the intervention, as defined:

The positive and negative changes produced by a development


intervention, directly or indirectly, intended or unintended. This

involves the main impacts and effects resulting from the activity on

the local social, economic, environmental and other development

indicators. The examination should be concerned with both

intended and unintended results and must also include the positive

and negative impact of external factors, such as changes in terms of

trade and financial conditions. When evaluating the impact of a

programme or a project, it is useful to consider the following

questions:

• What has happened as a result of the programme or project?

• What real difference has the activity made to the beneficiaries?

• How many people have been affected?

Hardly anyone would disagree that a development intervention should yield long-term positive impacts, whether intended or unintended, and minimize negative impacts, if any. But conceptualizing development impacts is not that

easy. If we are looking for the impacts “positive and negative, primary and

secondary long-term…, directly or indirectly, intended or unintended” as

defined in the DAC Glossary (OECD 2002, 24), the scope of evaluation would

be very wide.

The most controversial issue in this criterion has been related to methodology, namely, how and which impacts are to be measured and assessed. At its most ambitious, the impact criterion requires a rigorous evaluation design with rather complicated and expensive methodologies, e.g., randomized controlled trials (RCTs), to provide scientific evidence of the impacts. The prevalent view in the development evaluation community, however, is that rigorous evidence-based evaluation is neither possible nor feasible in most cases, and not even necessarily desirable (Forss and Bandstein 2008).

Development impacts are diverse and cannot be defined in a simple way, so it

is almost impossible to design an evaluation that could provide answers to

multiple questions. Long-term impacts, if they have not occurred at the time

of evaluation, are not feasible to assess. A more practical reason is that it is

too costly.

For these reasons, some development agencies exclude the impact

criterion when making overall judgements on interventions. For example,

ADB recommends that the impact criterion be considered in the 'other performance assessment' section and not be synthesized into the overall assessment of projects (ADB 2016, 23).

3.3.6. Sustainability

In the Glossary, sustainability is defined as the “continuation of benefits

from a development intervention after major development assistance has been

completed” (OECD 2002, 36). It also means “the probability of continued

long-term benefits or the resilience to risk of the net benefit flows over time”.

A detailed description and evaluation questions are given below:


Sustainability is concerned with measuring whether the benefits of an

activity are likely to continue after donor funding has been withdrawn.

Projects need to be environmentally as well as financially sustainable.

When evaluating the sustainability of a programme or a project, it is

useful to consider the following questions:

• To what extent did the benefits of a programme or project

continue after donor funding ceased?

• What were the major factors which influenced the achievement

or non-achievement of sustainability of the programme or project?

The definition of sustainability assumes that the intervention under

evaluation produces benefits that are worth continuing. It does not ask how long the benefits should continue, or at what cost. Relevant questions arise: would it be worthwhile to maintain a program when it is inefficient and likely to cost more than the benefits it would generate, or when unintended negative effects exist? What if the benefits become insignificant in the near future, not because the intervention is a failure but because it inevitably has a short life due to rapid changes in technology, for example?

Whether or not an intervention is worth continuing depends on two

questions: first, whether the intervention yields sufficient beneficial outcomes

and second, whether the benefits will be enough to justify the costs to be

incurred in sustaining the outcomes.

The first question, whether the intervention under evaluation yields

sufficient outcomes and is likely to do so in the future, is related to the other


criteria, i.e., effectiveness and impact. It may not be sensible to measure

“whether the benefits of an aid activity are likely to continue after donor

funding has been withdrawn” if the development outcomes it generates are

insignificant or when negative externalities exist. So, the scope of sustainability of an intervention should be defined based on the assessments in the 'effectiveness' and 'impact' criteria.

The second question, whether the benefits in the future can justify the

costs to be incurred, is about efficiency or cost-effectiveness of maintaining the

intervention. It would not be worthwhile to continue a program if it is likely to cost more than the benefits it would generate. After the donor funding is withdrawn, the costs of continuing the program are supposed to be borne by the recipient government or community, and they would not have an incentive to maintain the program if it is not cost-effective.

The evaluation models examined earlier also support these arguments.

In the KEC model, the sustainability of a program's effects is supposed to be covered under the 'outcomes' category. The CIPP model suggests that, in a sustainability evaluation, evaluators need to identify what success should and could be sustained. Using the concept of the evaluation hierarchy in theory-based models, the assessment of sustainability rests on the findings under the effectiveness, impact and efficiency criteria as its building blocks.

To summarize, the scope of sustainability should consider 1) the magnitude of benefits which have been or are likely to be generated after the withdrawal of donor funding, and 2) the costs to be incurred to sustain the benefits during the expected life of the intervention.

3.4. Criterion in Cost-Benefit Analysis – Net Present Value

The NPV is one of the indicators to measure the social welfare generated by a

project being assessed, and probably the most reliable one (de Rus 2010, 129,

Boardman et al. 2011, 13). It summarizes the social value of a project in a

single figure by subtracting all the costs (C) from the benefits (B) accrued for

the period of project life (n) with the appropriate discount rate (r). It is

generally expressed as follows:

NPV = \sum_{t=0}^{n} \frac{B_t - C_t}{(1 + r)^t}

Bt and Ct represent benefit and cost at year t respectively, which include

direct and indirect benefit and cost as well as those whose market prices do not

exist. If the NPV is zero, the project’s present value of benefits is equal to the

present value of its costs, which would make the decision whether to accept or reject it a matter of indifference. If the NPV is greater than zero, investment in the project will yield more benefits compared to no investment or to investing in another project with a lower NPV. Two general statements can be drawn: 1) a project is worth investing in if its NPV is positive; and 2) a project with a greater NPV is worthier than one with a smaller NPV. So the decision rules are: 1) for a single

project, adopt the project if its NPV is positive; 2) out of multiple, mutually

exclusive projects, select the project that maximizes the net social benefit,

which means one with the largest NPV.
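These decision rules are straightforward to put into arithmetic. The following minimal Python sketch is not part of the original analysis; the benefit and cost streams are assumed purely for illustration.

# Minimal sketch of the NPV formula and the two CBA decision rules.
# The benefit and cost streams below are assumed for illustration only.

def npv(benefits, costs, r):
    """NPV = sum over t of (B_t - C_t) / (1 + r)^t."""
    return sum((b - c) / (1 + r) ** t
               for t, (b, c) in enumerate(zip(benefits, costs)))

# Rule 1: adopt a single project if its NPV is positive.
b = [0, 50_000, 50_000, 50_000]        # benefits in years 0..3 (assumed)
c = [120_000, 0, 0, 0]                 # costs in years 0..3 (assumed)
print(npv(b, c, 0.10) > 0)             # True: benefits exceed costs in present-value terms

# Rule 2: among mutually exclusive projects, select the one with the largest NPV.
projects = {"P1": (b, c), "P2": ([0, 40_000, 40_000, 40_000], c)}
print(max(projects, key=lambda name: npv(*projects[name], 0.10)))   # "P1"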

CBA is generally thought to measure efficiency, but the definition is

much broader than the one in the DAC criteria. CBA in principle examines all consequences against the costs already invested as well as those to be incurred in the future.

Therefore, it helps organize understanding of the consequences of an

intervention and consider the multiple implications together, by analyzing each

of the essential elements and synthesizing the findings from the analyses.

This corresponds to what evaluators do when conducting evaluations.

The principle of CBA is that an intervention is worth implementing when

the benefits exceed its costs. The net benefit is calculated as the net present

value (NPV) which is the main indicator to express the social value of a project.

Based on the CBA theory which assumes consideration of all direct and indirect

effects as consequences of a project against costs for a certain period, I argue

that CBA examines all aspects that the DAC criteria suggest. As an indicator

that represents the social value of a project, the NPV would reflect the needs

(relevance), benefits in relation to costs (efficiency), the increase in social welfare (effectiveness and impact), and the period over which the benefits are maintained (sustainability).


Figure 6. Net Present Value (NPV) and DAC Criteria

The NPV also allows comparison between projects, i.e., which project is more worthy or successful, as it shows the net social value in a single figure.

Comparability is one of the main purposes for which the DAC criteria were

developed, so it is also possible and meaningful to compare the evaluation

results between the DAC framework and CBA.

In the DAC framework, effects are generally called the 'results' of the intervention, which include intended positive effects, supposed to be assessed under the effectiveness criterion, as well as unintended effects, whether positive or negative, under the impact criterion. From the definitions, the effects or results dealt with in CBA and in the DAC framework are not very different. However, how to view those effects and how much value to put on them are quite different.


A CBA starts from the ceiling, meaning that the first job is to identify

those effects as exhaustively as possible. Then it is required to examine

whether the effects are caused by the intervention, whether they are double-

counted, and what is the net value (if any social costs accompany the social benefit under discussion) or the present value of those effects (in case they occur at different times).

On the other hand, the DAC definitions of effectiveness and impact do

not provide explicit boundaries of causality, overlapping, or time. So an

assessment generally starts from the minimum, the intended positive results,

and then adds other effects which are (luckily) found by evaluators and deemed

to be, rather intuitively, plausibly caused by the intervention. There seems to be no clear standard on how to value the magnitude of those effects. In the case of long-term outcomes, often used as a synonym for impact, the present value is rarely considered. It is also possible that some negative impacts exist but were not considered by evaluators, whether intentionally or not.

The limitation in inferring causality in the DAC framework is, at best, similar to that in CBA, or much looser (as we have seen in many reports in the cases

examined in the preceding chapter). While listing impacts plausibly caused

by the intervention, overlaps may happen between the claimed results.


3.5. Discussion

The review of multiple evaluation theories shows that the DAC criteria cover

most of the dimensions that other established evaluation models suggest.

Figure 7 illustrates the comparison of evaluation criteria or dimensions

examined in this chapter.

Figure 7. Comparison of Dimensions or Criteria in Evaluation Models. Source: organized by author.

In the DAC evaluation framework, the five criteria are used as analytical

categories. Evaluators assess the evaluand from each dimension or category

of analysis defined in the DAC criteria. Based on the assessments in all

categories, the overall conclusion, that is, the final evaluative judgement is


drawn. The DAC criteria framework is for a summative evaluation which

focuses more on outcomes than process. Assessment of process is not explicit

in the DAC criteria. The word ‘process’ only appears once in the description

of efficiency criterion, “whether the most efficient process has been adopted”.

The DAC definition of evaluation also indicates that evaluation is the

assessment of an on-going or completed aid activity, determining the worth or

significance of the activity (OECD 2002, 21-22). Therefore, the purpose of

evaluation using the DAC criteria is more for accountability by determining

whether the intervention was successful as well as for helping make decisions

on whether to continue, expand or export the intervention, and on what conditions or with what modifications.

The analysis found that the DAC criteria are interrelated, and sometimes

one is a precondition of another. For example, relevance of the objectives and

program design is a necessary condition of a project being effective and

efficient by the definitions. If the objectives are not adequate (relevant), there is no point in measuring the extent to which the objectives were achieved, which is effectiveness. A similar argument applies to the efficiency criterion: it may not be important to follow the plan and complete the project on time or on budget if the initial plan was irrelevant, that is, not designed for efficient implementation. Sustainability assumes that the project under evaluation produces and will produce a net benefit, which encompasses both positive and negative outcomes whether intended or unintended, so the assessments in the effectiveness and impact criteria should be considered.

The effectiveness criterion can overlap with impact. By the DAC

definition, assessment of impact requires a comprehensive analysis of the results caused by the intervention, which include positive and negative, intended and unintended results. Effectiveness measures the extent to which the intervention achieved its objectives, which means the intended positive results.

For this reason, some argue that effectiveness could be subsumed under impact

rather than be a stand-alone criterion (Chianca 2008).

The issues of interrelation and overlap between criteria have implications for the relative weight of criteria and the method of synthesis, which will be discussed in Chapter 5. Relative weight is also an issue within a criterion. For example, relevance to which is more important: the priority of the recipient government, the donor's policy, or the needs of the target population? If there are multiple objectives and the intervention achieved a less important objective fully but a more important one partially, what overall judgement can be made in the effectiveness criterion? This is the topic of standard of judgement,

which will be covered in Chapter 4.


CHAPTER 4. STANDARD OF JUDGEMENT – REVIEW OF EVALUATION

REPORTS IN KOREAN AGENCIES

4.1. Overview

The term ‘standard’ in evaluation means the level of quality needed for a certain

judgement. Constructing the standards with which performance is compared constitutes the second and third steps in the logic of evaluation. Since the DAC

criteria provide brief definitions and a few sample questions as discussed in the

previous chapter, the analysis of judgement standards requires both conceptual

and empirical approaches. For this reason, this chapter presents an analysis of

65 ex-post evaluations published by two main ODA agencies in Korea, KOICA

(Korea International Cooperation Agency) and EDCF (Economic Development

Cooperation Fund), from 2013 to 2015.

As an emerging donor and a member of DAC, Korea has incorporated

the DAC evaluation principles into its ODA evaluation policy, which requires

all agencies to use the five criteria as a primary evaluation framework. As

required by the evaluation policy, all evaluations under review used the DAC

criteria and reported final ratings by criteria. The main question of the

analysis is whether the DAC criteria serve as a good framework to fulfill the

evaluation purpose, that is, to provide credible and useful information for


learning and decision-making. Specifically, the evaluation reports are

examined following three questions: 1) what evaluation questions they ask

under each criterion; 2) what methodologies and evidence they use to answer the questions; and 3) whether the findings are consistent with the ratings.

KOICA and EDCF have their own evaluation units and guidelines. KOICA, responsible for Korea's bilateral grants, established its Development Cooperation Evaluation Guidelines in 2008, which were updated in 2014. Since 2013, KOICA has adopted a "Project Result Rating System" by which all ex-post evaluations are required to assess and rate projects in accordance with the DAC criteria. EDCF also applies its own rating system in ex-post evaluations.

Table 3. Rating System and Scoring Scale

Each sub-criterion is rated on a 4/3/2/1 scale; the sub-criterion scores are averaged within each criterion (a, b, c, d), and the total score is a + b + c + d, out of 16.

• Relevance (average = a)
  1. Relevance to development strategy and needs of the partner country, and to Korea's development cooperation strategy
  2. Relevance of design and implementation
  3. Ownership of the partner country
• Efficiency (average = b)
  1. Cost efficiency (within the planned budget)
  2. Time efficiency (within the planned time frame)
  3. Results against inputs
• Effectiveness/Impact (average = c)
  1. The extent to which objectives are met
  2. Positive or negative impacts on society, economy, institutions
• Sustainability (average = d)
  1. Human resources, institutional and financial aspects
  2. Maintenance capability and management system

Source: CIDC Sub-Committee on Evaluation (2015).
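Read literally, the guideline's arithmetic is a two-step aggregation: sub-criterion scores are averaged within each criterion, and the four criterion averages are summed to a total out of 16. A minimal Python sketch of that arithmetic follows; the sub-criterion scores are assumed for illustration and are not taken from any report.

# Sketch of the two-step aggregation implied by Table 3.
# The sub-criterion scores below are assumed for illustration only.
ratings = {
    "relevance":            [4, 3, 4],
    "efficiency":           [3, 2, 3],
    "effectiveness_impact": [3, 3],
    "sustainability":       [2, 3],
}

criterion_average = {k: sum(v) / len(v) for k, v in ratings.items()}   # a, b, c, d
total_score = sum(criterion_average.values())                          # out of 16
print(criterion_average)
print(round(total_score, 2))   # 11.83 in this illustration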

All ex-post evaluation reports published by EDCF and KOICA during

2013-2015 were collected for the analysis. For the 3 years, 65 ex-post

evaluations were available, 47 by KOICA and 18 by EDCF. The brief

information on the evaluation reports collected is in Table 4. Figure 8

illustrates the classification of the projects by sector.

Table 4. Descriptive Data of the Samples

                                               All      KOICA    EDCF
Number of reports                              65       47       18
  2013                                         25       19       6
  2014                                         27       21       6
  2015                                         13       7        6
Average size of project
  (amount supported, USD million)              9.1      4.4      21.4
Timing of evaluation
  (years after project completion)             3.9      3.5      5
Duration of evaluation (months)                5.7      5.5      5.8


Figure 8. Evaluation Reports Classified by Sector

The sample consists of project evaluations with an average size of $9.1 million, conducted about four years after project completion on average. All were conducted independently, with an average duration of six months. Information collection methods included desk review, interviews, and field visits. Some conducted surveys, but not on a large scale. It was rare to find evaluations with detailed descriptions of the basis on which they selected interview or survey respondents or of the methods they used.

In the analysis, the following three questions are examined in each

criterion:

• Question 1: What evaluation questions (sub-criteria) do they ask?

• Question 2: What methods and evidence do they use to answer the questions? What are the standards of judgement?

• Question 3: How was the evaluative conclusion drawn? Is the rating consistent with the findings?

4.2. Result of the Analysis

4.2.1. Relevance

Evaluation questions asked in relevance criterion were diverse. Figure 9

shows the list of 14 different questions and how many evaluations out of 65

asked each of them. Nonetheless, most of the questions largely fall into two

categories of sub-criteria, one in relation to policies and the other concerned

with program design or implementation process.


Figure 9. Evaluation Questions in Relevance Criterion

Questions related to policies asked whether the project under evaluation was aligned with the development strategies and priorities of the recipient country, Korea's ODA policy and the agency's priorities, or global development agendas such as the MDGs, the Paris Declaration (PD) or the Accra Agenda for Action (AAA). Questions on program design and implementation mostly dealt with whether the plan was appropriate in terms of schedule and budget or whether the process of planning or implementation was in conformity with agreed requirements.

Not many evaluations, 15 out of 65, examined whether the objectives

were valid and whether the project elements were designed to achieve the objectives. Surprisingly, more than 20% (15 evaluations) did not assess the validity of the objectives or the logical links between the intervention and its objectives.

Assessment about the need for the intervention was rare. Even when asked,

the supporting arguments heavily relied on project documents, e.g., ex-ante

appraisal reports. In other words, it seems that the need of target population

is not the primary aspect to be examined…

Table 5 summarizes the standards of judgement and the sources of supporting information on which the assessments in the sub-criteria were based.

Table 5. Main Standards of Judgment in Relevance Criterion

• Consistency with development strategy and policy priority in the recipient country. Standard: consistent if the objectives of the intervention are aligned with those in policy documents (national development plan, sector/multi-year strategy, etc.). Sources: mainly document review; partly interviews with government officials or, rarely, with beneficiaries.
• Consistency with Korea's ODA policy and sector/country strategy. Standard: consistent if the project sector or objectives are aligned with Korea's ODA policy document or the agency's sector strategy. Sources: document review.
• Consistency with international agenda (MDGs, harmonization). Standard: consistent if the project objectives are included in the MDGs or no overlap with other donors' activities is found. Sources: document review; in some cases, interviews with donor agencies.
• Relevance of project design/decision-making process. Standard: relevant if the plan in terms of schedule and budget, or the process of planning and implementation, was appropriate. Sources: mainly document review; interviews with stakeholders.
• Relevance to target area. Standard: relevant if the target area was selected in due process and if project elements (equipment, training, technology, etc.) were appropriate to local conditions. Sources: document review; interviews; site visits.


It is not surprising that the first question, which was asked by all evaluations in the sample, was whether the aim of the intervention was consistent with the development policy of the recipient country. A majority of the evaluations, 86%, also examined consistency with policy and sector strategy. Assessments on the above questions were largely based on findings from document reviews, e.g., national development plans, sector/multi-year strategies, Korea's ODA policy, etc. Some evaluations presented the results of interviews with stakeholders, e.g., government officials in the recipient country, but it seems natural that the findings were very positive considering the nature of the questions. There was not a single case that found the project objectives inconsistent with the policies or development strategies of either the recipient government or Korea.

Findings in the relevance criterion do not necessarily require in-depth research, either in document review or on site. The most frequently asked question, related to consistency with the strategy and priorities of the recipient country, was easily answered with findings from policy documents. Questions concerned with project design, e.g., whether the planning process was appropriate or whether participation of and cooperation with the recipient government was sufficient, were assessed based on project documents or the results of interviews with stakeholders, who gave positive answers in most cases. As a result, the overall ratings are rather high, around 3.5 on average on a 4-point scale, as shown in Table 6.


Table 6. Average Ratings on Relevance

Year    KOICA           EDCF
2013    3.65 (2.74)     3.82
2014    3.43 (2.57)     3.48
2015    3.27            3.73
All     3.45            3.68

Rating scale: 4 = very relevant; 3 = relevant; 2 = partly relevant; 1 = irrelevant.
Note: KOICA used a 3-point scale in 2013 and 2014. For comparison purposes, the rescaled (1-4) ratings are presented, and the original scores are in parentheses. The same applies to the tables for the other criteria.
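The published figures are consistent with a simple proportional rescaling of the 3-point scores (multiplying by 4/3). This is an inference from the numbers themselves, not a rule quoted from the guidelines, as the short sketch below illustrates.

# The rescaling rule below (x 4/3) is inferred from the published figures,
# not quoted from the guidelines.
def rescale_3pt_to_4pt(score):
    return score * 4 / 3

print(round(rescale_3pt_to_4pt(2.74), 2))   # 3.65, matching KOICA's 2013 relevance rating
print(round(rescale_3pt_to_4pt(2.57), 2))   # 3.43, matching KOICA's 2014 relevance rating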

It is fair to say that the primary focus on policy context results in overall positive assessments in the relevance criterion. There are many cases which received high scores over 3 (meaning the project was relevant) although the project design was not relevant to producing the intended outcomes. For example, a project for the construction of a solid waste recycling facility in Ulaanbaatar City, Mongolia, supported an RDF (refuse-derived fuel) production facility which was inappropriate to local conditions and turned out to be completely useless. This flaw in project design was discussed in the relevance assessment but did not prevail in the judgement. The dominant justification for relevance was that the overall purpose of the project was consistent with the development policy of Mongolia, which led to a higher score, 2.7 out of 4, than it seems to deserve.


Overall, the assessments of the relevance criterion in the evaluations reviewed in this study did not involve serious investigation of the need for the

project, the local priorities, or the logical linkage between the project activities

and intended results.

4.2.2. Efficiency

In the DAC definition, the 'efficiency' criterion focuses on whether the intervention was implemented in the least costly manner. In evaluations by Korean agencies, the most common question was whether the project was completed as planned in terms of time and budget. Half of the sample assessed the process with a focus on communication and cooperation with stakeholders, e.g., the local government. Only 26% assessed the results against inputs, and less than 10% attempted a comparison to alternatives, e.g., similar projects implemented by other development agencies.

Other questions in the efficiency criterion are concerned with the process of communication or management, which often appears in the relevance criterion, too. Some evaluators who attempted to conduct a process evaluation asked related questions in either the relevance or the efficiency criterion, as the DAC definitions do

not explicitly mention an assessment of process. The list of questions and

their frequency is presented in Figure 10.


Figure 10. Evaluation Questions in Efficiency Criterion

Information on whether the project was completed within the planned

budget and on time is not hard to obtain. In most cases, the assessment in the

cost- and time-efficiency sub-criteria was based on the records in project

documents. The judgement was made according to the rating standards in the

guidelines, described in Table 7. For example, if a 2-year project with a

budget of $1 million was completed in 2.5 years at a total cost of $1.2 million, it would be assessed as 'partly efficient' in terms of time and 'efficient' in terms of cost. This implies that the project plan is assumed to employ the most efficient way of resource use, and as a result the project plan became the most important standard of judgement against which efficiency was assessed.


Table 7. Main Standards of Judgment in Efficiency Criterion

• Cost efficiency. Standard: very efficient if completed within the planned budget; efficient if completed within 120% of the planned budget; inefficient if it exceeded 150% of the planned budget. Sources: document review; interviews.
• Time efficiency. Standard: very efficient if completed on time as planned; efficient if completed within 120% of the planned time; inefficient if it exceeded 150% of the planned time. Sources: document review; interviews.
• Communication process, partnership. Standard: degree of involvement of stakeholders (e.g., local governments) in the process. Sources: document review; interviews with stakeholders.
• Results against input. Standard: the extent to which the project cost is deemed to be reasonable in comparison to alternatives. Sources: review of similar interventions in the area; interviews.
• Efficiency in management. Standard: status of operation of the facility/equipment supported. Sources: site visit; interviews.
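A small sketch of the cost- and time-efficiency thresholds in Table 7, applied to the worked example above (a 2-year, $1 million plan completed in 2.5 years at a total cost of $1.2 million). The 'partly efficient' band between 120% and 150% of the plan is implied by the surrounding text rather than spelled out in the table.

# Sketch of the Table 7 cost- and time-efficiency thresholds. The 'partly efficient'
# band (between 120% and 150% of the plan) is implied rather than explicit.
def efficiency_rating(actual, planned):
    ratio = actual / planned
    if ratio <= 1.0:
        return "very efficient"
    if ratio <= 1.2:
        return "efficient"
    if ratio <= 1.5:
        return "partly efficient"
    return "inefficient"

print(efficiency_rating(1_200_000, 1_000_000))   # cost: "efficient"
print(efficiency_rating(2.5, 2.0))               # time: "partly efficient"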

Most of the projects under evaluation exceeded the planned time. In

some cases, additional budget was provided. If judgements were made strictly

in accordance with the standards in the guidelines, there would have been many projects assessed as 'inefficient' in the time- and cost-efficiency sub-criteria. When a project was delayed by more than 50% of the planned period, which would fall into the 'inefficient' category, this was often justified by inevitable or unforeseen circumstances, for example, delays due to customs procedures or slow administration in the recipient government. There were only two projects which received an overall assessment of being inefficient.

Table 8. Average Ratings on Efficiency

Year    KOICA           EDCF
2013    3.01 (2.26)     3.45
2014    3.37 (2.52)     3.22
2015    2.93            3.30
All     3.10            3.32

Rating scale: 4 = very efficient; 3 = efficient; 2 = partly efficient; 1 = inefficient.

Table 8 presents the average ratings on efficiency by agency and by year.

It indicates that the average falls into the 'efficient' category. Time- and budget-efficiency were the dominant sub-criteria that lowered the overall rating compared to that on relevance, even with some excuses of inevitability.

4.2.3. Effectiveness and Impact

Effectiveness and impact criteria are discussed together in this section as

evaluations by KOICA combine the two criteria and give one rating.

Questions in the effectiveness criterion are rather standardized into four types (Figure 11). Most of the evaluations asked about the achievement of outputs and outcomes (goals) as the primary sub-criteria, which is not surprising in consideration of the DAC definition of effectiveness. The other frequently asked question is whether the outputs, i.e., facilities or equipment provided by the project, were well utilized. This question also appears in the efficiency criterion. The level of beneficiaries' satisfaction served as another important

sub-criterion.

Figure 11. Evaluation Questions in Effectiveness Criterion

In the impact criterion, 80% of evaluations identified 'achievement of long-term goals' as the primary sub-criterion. Other questions considered economic and social impact, impact on the recipient's policy and system, and so on. 26% of the evaluations raised a question about unintended impacts, while 20% asked about the project's influence on the bilateral relationship between the recipient country and Korea. Impact on the environment was considered in 5 evaluations; others covered the issue of the environment along with gender mainstreaming under the cross-cutting issues, which are not included in the calculation of the final rating.

Figure 12. Evaluation Questions in Impact Criterion

Table 9 summarizes how and with what supporting information the

judgements in the effectiveness and impact criteria are made. In most cases, achievement of outputs or outcomes is measured against the preset targets in the original plan. Generally, measuring outputs is straightforward and the records are readily available. Most of the projects succeeded in producing the planned outputs. Measuring achievement in outcomes or long-term goals, on the other hand, is a challenging task. It was rather common that the targets or indicators

for outcomes had not been clearly set, so the assessments were often based on

output indicators only or on evaluators’ conjecture.

Under the impact criterion, various impacts were described but rarely with plausible causal links. No evaluation attempted a causality test. In many cases, the assessments were based on speculation. Claimed impacts were often supported by macro data covering a far wider geographical area or bearing little relation to the project's results. For example, in the evaluation of a project whose main activity was the provision of medical equipment, it was stated that the project was deemed to contribute to the improvement of health conditions and a drop in fatalities in the target area on the basis of improved health indicators (EDCF 2014). However, the claims are highly exaggerated, as the indicators and data came from regional statistics on which the influence of the project would have been very limited.


Table 9. Main Standards of Judgment in Effectiveness/Impact Criteria

• Achievement of planned outputs; achievement of planned outcomes. Standard: very effective if over 90% of the planned output/outcome was achieved; effective if 70-90%; partly effective if 50-70%; ineffective if less than 50%. Sources: document review; data analysis; interviews; survey; site visit.
• Utilization of outputs. Standard: the extent to which the output is operational as planned. Sources: site visit; interviews; questionnaires.
• Impact on society, economy and institutions. Standard: whether the project is deemed to contribute to improvements in society, economy, institutions, etc.; the degree of importance of the project. Sources: document review; data analysis; site visit; interviews; questionnaires.
• Beneficiaries' satisfaction level. Standard: the extent to which the beneficiaries are satisfied. Sources: interviews; questionnaires.
• Unintended outcomes. Standard: whether unintended negative impacts exist. Sources: site visit; interviews; questionnaires.

The average ratings on effectiveness and impact are presented in Table

10. While many of the evaluations found that the achievement in outcomes was not very substantial, the achievement in outputs raised the overall score in

effectiveness. The average score in impact does not seem to be reliable,

because the assessments were mostly based on speculations and even

expectations without plausible causal links.


Table 10. Average Ratings on Effectiveness/Impact

        KOICA                       EDCF
Year    Effectiveness/Impact        Effectiveness    Impact
2013    3.09 (2.32)                 3.63             3.48
2014    3.56 (2.67)                 3.63             3.78
2015    3.14                        3.59             3.58
All     3.26                        3.62             3.61

4.2.4. Sustainability

The questions in the sustainability criterion largely fall into three sub-criteria: policy and institutional support; financial sustainability; and maintenance capacity (Figure 13 and Table 11). They cover most of the required aspects that the DAC definition suggests (except for environmental sustainability, which is dealt with separately under the cross-cutting issues criterion).

However, not many evaluations examined whether the demand for the

intervention is likely to be sustainable or whether the benefits are significant

enough to be worth continuing at the given maintenance costs.

The supporting information was mainly drawn from the results of

interviews with government officials or management personnel with regard to whether they have ownership over the project and are willing to support it financially and institutionally. Assessments were made largely based on

expectations rather than supported by due analysis of the significance of

benefits and the contexts in which the benefits would and could be maintained.

Figure 13. Evaluation Questions in Sustainability Criterion

Table 11. Main Standards of Judgment in Sustainability Criterion

• Policy and institutional supports. Standard: sustainable if the recipient government has a policy and institutions to support the project and strong ownership. Sources: document review; interviews with the recipient government.
• Financial sustainability. Standard: sustainable if financial resources will be available for the continuing operation of the project. Sources: document review; interviews with the recipient government.
• Maintenance capability. Standard: sustainable if the outputs are likely to be operational with proper maintenance. Sources: site visit; interviews; survey.


Table 12 shows the average ratings on sustainability, which are the lowest among the five criteria. Looking into the details, however, even this lower score seems rather overrated. There was only one case assessed as 'not sustainable', for the obvious reason that the supported facility was no longer in operation because of technology and maintenance issues, so little benefit was expected to continue (KOICA 2014). A number of projects

were assessed as ‘partially sustainable’ or ‘sustainable’, even when serious

problems in maintenance were observed or the facilities were not in full

operation so that it was hard to expect significant benefits in the future.

Table 12. Average Ratings on Sustainability

Year    KOICA           EDCF
2013    2.89 (2.17)     3.41
2014    3.11 (2.33)     2.69
2015    2.89            3.30
All     2.96            3.13

Rating scale: 4 = very sustainable; 3 = sustainable; 2 = partly sustainable; 1 = not sustainable.

It is certain that sustainability is an important aspect to be examined in

development evaluations. However, “measuring whether the benefits … are

likely to continue” in terms of financial and/or institutional resources would not


produce meaningful information unless it first addresses the question of what

benefits should and could continue and why.

4.2.5. Discussion

As the decisions on evaluation questions, standards of judgement, and the way of synthesis are largely left to evaluators, the characteristics of the DAC criteria become clear in practice. An empirical review of 65 ex-post evaluation reports by two Korean aid agencies, KOICA and EDCF, shows that evaluations applying the DAC criteria tend to use similar sub-criteria and evaluation questions, adopting the standardized ones in the guidelines. Some of the questions do not necessarily require in-depth research, or their answers are taken for granted for a development project. This leads to high ratings on average, especially in the relevance and effectiveness criteria.

The priority of evaluations is to answer the standardized questions (for

example, whether the planned outputs and outcomes have been achieved) rather

than to verify the causal links or to measure the effectiveness of interventions,

so the methods are rarely rigorous, and judgements rely on weak inferences or on surveys of beneficiaries' satisfaction levels. 88% of the reports concluded that the evaluand was either successful or very successful, even in cases where the development results were insignificant (Table 13).


Table 13. Average Overall Ratings of All Evaluations (2013-2015)

          Very successful   Successful   Partly successful   Unsuccessful   Total
All       17 (26%)          40 (62%)     8 (12%)             0 (0%)         65
KOICA     13 (28%)          26 (55%)     8 (17%)             0 (0%)         47
EDCF      4 (22%)           14 (78%)     0 (0%)              0 (0%)         18

The evaluation conclusions are drawn from a simple average of the ratings by criterion, and a serious flaw detected in one criterion, e.g., sustainability, is often canceled out by a high rating in another, e.g., relevance. I find that such mechanical application of the DAC criteria and standardized questions can mislead evaluation results, in most cases towards positive conclusions, which makes it difficult to understand the true value of the project, to differentiate a more successful project from a less successful one, and to draw valid lessons from the evaluation.
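The compensation effect is visible in the arithmetic itself. A minimal sketch with assumed scores (not taken from any particular report):

# Sketch of how a simple average across criteria can cancel out a serious flaw.
# The criterion scores below are assumed for illustration only.
scores = {"relevance": 3.8, "efficiency": 3.2, "effectiveness_impact": 3.0,
          "sustainability": 1.0}   # a failing sustainability assessment

overall = sum(scores.values()) / len(scores)
print(overall)   # 2.75 out of 4: well above the bottom of the scale despite the flaw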

4.3. Standards in the DAC Criteria and Net Present Value

Based on the findings in the analysis of how the standards of judgement in the

DAC criteria affect the evaluation results, this section compares them with the

standard of judgement in CBA, i.e., the net present value.


Relevance

By definition, a positive NPV means that the project under evaluation increases social welfare and satisfies the needs of the target population. To increase social welfare is the primary purpose of development interventions. In other words, a project with a positive NPV would be highly relevant to the needs of the target population, and the NPV can serve as a reasonable standard of judgement for the relevance of an intervention.

In the DAC criteria, ‘relevance’ refers to consistency with priorities and

policies of the donor, recipient country, and the target group. In the previous

chapter, several important questions have been raised regarding how to make

an overall assessment in the relevance criterion. Are the donor's policies or the priorities of the recipient government always consistent with increasing the social

welfare of the target population? Of course, development cooperation policy

of any donor would support the overall welfare increase of target groups. The

national development strategies or ‘National Development Plan (NDP)’ may

list all dimensions of what people would need. But these facts do not

necessarily assure that specific local-level decisions on resource allocation

would correspond to the priorities of people’s needs, due to, for example, lack

of information, uncertainties, or the political interests of policy makers. There are many cases showing that political priorities were in fact the opposite of what people actually needed. A project that increases the welfare of the target

group by meeting their needs, that is, a project with a positive NPV, will be more relevant than a project which is consistent with the priorities of government policy but has unclear prospects of increasing welfare in the specific target group. If the policy priorities of the donor or recipient government are insufficient to reflect the people's needs for the reasons described above, it

would be appropriate to consider the project’s NPV as a standard of judgement

in assessing the relevance in the DAC framework.

Effectiveness

In the DAC framework, effectiveness is generally measured by the extent to

which the intervention has achieved its objectives. This objective-based

assessment presents many weaknesses, especially when the objectives were poorly defined, unrealistic or underambitious, or more aligned with the needs of donors or governments than with the needs of beneficiaries. Meeting the effectiveness criterion, i.e., the achievement of planned outputs or outcomes, does not necessarily provide meaningful information on the intervention's success. In other words, the level of achievement of objectives may not

always be an appropriate yardstick to assess the true benefits of an intervention.

CBA measures benefits which are embedded in the NPV. Given that a

development intervention would intend to produce outputs which lead to

positive outcomes, i.e., benefits, CBA involves an assessment of effectiveness

by measuring the benefits of the intervention. A positive NPV indicates that

the intervention has achieved its planned positive results.


What CBA measures is more related to outcomes than to outputs. In

fact, achieving outputs is a prerequisite of achieving outcomes, because the

intended outcomes or positive results caused by the intervention are supposed

to be generated through the project outputs. In the evaluations by Korean

agencies, achievements in both outputs and outcomes are considered as sub-

criteria in effectiveness, and high scores in output achievement tend to compensate for relatively low achievement in outcomes. Meeting the intended output target can be one of the sub-criteria in the assessment of effectiveness, but it does not guarantee that the outputs will realize the outcomes. A positive NPV can be a more relevant standard of judgement in assessing effectiveness than simply measuring the level of output achievement.

Efficiency

CBA is generally considered as a tool to measure efficiency. If the NPV is

positive, a project is regarded as efficient in generating the benefits at the costs.

This interpretation of efficiency in CBA, however, is much broader than the

DAC definition of efficiency which focuses on whether an intervention was

implemented in the least costly manner in comparison to alternatives. The

DAC definition does not address whether the costs are justified by the benefits.

An intervention that used the least costly resources compared to alternatives will satisfy the DAC definition of efficiency, but if the intervention yields a negative

NPV, it would not be appropriate to adopt the project or consider it as efficient


from the CBA perspective.

It is worth noting that a positive NPV alone does not suggest that it is the

least costly way of resource use. CBA assumes that a project would be worth investing in as long as its NPV is greater than zero, regardless of whether there is a possibility of reducing the costs in the process of implementation. In this regard, evaluations using the DAC framework can provide useful information in addition to the NPV in assessing the efficiency of an intervention, given that the

NPV is positive.

Impact

The DAC definition of impact deals with changes produced by a development

intervention. The changes to be addressed include positive or negative, direct

or indirect, or intended or unintended results. CBA in principle considers all

direct and indirect consequences of a project, so it clearly covers the impact

criterion.

The DAC definition does not provide an explicit scope of impact nor a

standard on how to assess its magnitude. Assessment generally starts with

intended positive results and the prospects of their long-term continuation after

the evaluation, and then adds other effects which are found by evaluators and

deemed to be plausibly caused by the intervention. In case of long-term

outcomes, the present value is rarely considered. It is possible that some negative impacts exist but were not considered by evaluators, whether intentionally or not. The claimed impact can be exaggerated through double counting of overlapping results or by overlooking potential additional costs concomitant with the impact. It is also challenging to make a judgement on the level of overall impact of an intervention if there are both positive and negative results, which is likely to be the case in any development intervention.

The NPV can provide information on the net benefit by comparing unintended negative impacts with positive impacts. Even if there is a negative impact, the intervention can be considered worthy when the benefits caused by the intervention are sufficient to cancel out its negative consequences.

The challenges of measuring the magnitude of impacts and demonstrating their

causal relationship with the input apply to both the DAC framework and CBA.

As a whole, it seems that CBA would provide more credible information about

tangible impacts as it considers overlaps in benefits and the costs concomitant

with the impact. With regard to the importance of an intervention and its

intangible impact on society, it is hard to expect that CBA would provide

meaningful information, as these require a qualitative assessment.

Sustainability

CBA considers the life of the intervention during which it is supposed to generate costs and benefits. In principle, the NPV is calculated as the net benefit over the project lifespan, based on an analysis of how much cost and benefit the project will generate during that period. So the NPV reflects the concept of sustainability in the DAC framework, which measures whether and to what extent the benefits continue.

In the DAC sustainability criterion, it is not explicitly mentioned how long a period would be enough to consider, or at what cost. It seems that sustainability suggests the benefits of an intervention should continue for as long a period as possible, regardless of the magnitude of the benefits or the costs of maintaining them.

From the CBA perspective, a positive NPV, which covers both the magnitude

of benefit and cost in the net benefit, is more important than how long the

benefit will continue. In CBA, benefits in the distant future make much less

difference to the NPV than those in the near future do, unless the discount rate

is very low (Snell 2011: 53). For example, a project that produces larger benefits in the first 10 years and smaller benefits in the next 10 years will have a greater NPV than a project which generates benefits evenly for 20 years, as the

benefits in the last 10 years have less present value than those in the first 10

years.
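A minimal sketch of that point, with assumed round figures and the same 10% discount rate: a stream that delivers its benefits early has a larger present value than one delivering the same nominal total evenly over 20 years.

# Present value of two benefit streams with the same nominal total (assumed figures).
def present_value(stream, r=0.10):
    return sum(b / (1 + r) ** t for t, b in enumerate(stream, start=1))

front_loaded = [150] * 10 + [50] * 10   # larger benefits in the first 10 years
even_stream  = [100] * 20               # the same total, spread evenly over 20 years
print(round(present_value(front_loaded)), round(present_value(even_stream)))   # ~1040 vs ~851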

It would be controversial which is more appropriate in the development context: a project producing larger benefits in the early years, or one which sustains the benefits for more years even though their present value is smaller. Nevertheless, it would be fair to say that a sustainability assessment should consider how properly the intervention can be maintained as well as what the expected benefits and the concurrent costs are.


CHAPTER 5. METHOD OF SYNTHESIS - THE DAC FRAMEWORK AND

COST-BENEFIT ANALYSIS IN COMPARISON

5.1. Overview

Evaluation by definition is a task of determining the value of what is being

evaluated. The final step in the general logic of evaluation involves

synthesizing and integrating data into a judgement of merit or worth of the

evaluand (Fournier 1995). As Scriven (2007) claims, this is the most difficult

task in evaluation, and there is little consensus on best methods to reach these

needed conclusions (Julnes 2012). In development evaluations, it is often

required to make overall conclusions about whether a development intervention

was/has been successful and worth the resources required or can be improved.

To answer these questions is indeed the primary purpose of evaluation in order

to provide meaningful judgements about the overall value and success of the

intervention.

In the development of evaluation theories, there has been some resistance among evaluators to making 'evaluative conclusions' or overall judgements (Scriven 1994, 160). The common argument against making evaluative conclusions is that the decision whether a program is desirable or not should be made by policy-makers, not by evaluators, whose role is to provide information for such decision-making. However, it is the main clients of

evaluations, e.g., funding agencies, that usually want information on overall

conclusions about development interventions.

In development evaluations using the DAC criteria framework, an

overall, summative conclusion is often required to be drawn, by combining the

separate assessments in the five criteria. In some cases, the results of

individual conclusions under each criterion are integrated into a final rating, for

example, whether it is very successful, successful, partially successful (or

acceptable), or unsuccessful (or unacceptable). Donor agencies in Germany,

Japan, Korea, and the UK, as well as development banks such as the World Bank and ADB, use rating systems with such semantic scales.

Therefore, drawing an overall conclusion is basically to integrate the

findings from different criteria into a judgement about whether the intervention

was/has been a success. But it is challenging to combine the assessments in

multiple criteria. For example, how to make an appropriate conclusion on an

intervention which has been proved to generate significant beneficial impacts

but to cost much more than a similar intervention? When contrasting

advantages exist between two projects, e.g., one program is little more effective

while the other is substantially less expensive, how can these differing aspects

be combined into an overall valuation?

Stake (2004, 14-15) shows how criterial evaluation can be synthesized in two different approaches. First, a compensatory model allows a weakness in one aspect to be compensated for by strength in another aspect. In the DAC criteria framework, for example, if a project has a high level of 'relevance' because it is consistent with the policy priorities of both countries, a low level of effectiveness or sustainability may be accepted. Another approach is a

multiple cut-off model, in which a certain standard must be met on each of

several criteria, otherwise the whole thing is assessed as a failure. With this

approach, one can consider an intervention as unsuccessful if its effectiveness

is lower than an acceptable level no matter how relevant the purposes of the

project were.
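The difference between the two approaches can be written down directly. A minimal sketch with assumed criterion scores and an assumed cut-off level:

# Compensatory averaging vs. a multiple cut-off rule (scores and cut-off assumed).
scores = {"relevance": 3.8, "effectiveness": 1.5, "efficiency": 3.0,
          "impact": 2.8, "sustainability": 3.0}
CUTOFF = 2.0

compensatory_score = sum(scores.values()) / len(scores)     # weakness offset by strengths
passes_all_cutoffs = all(s >= CUTOFF for s in scores.values())

print(round(compensatory_score, 2))   # 2.82: looks acceptable overall
print(passes_all_cutoffs)             # False: effectiveness falls below the cut-off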

In most cases using the DAC criteria framework, including the Korean agencies reviewed in the previous chapter, a compensatory model is applied, and each criterion has the same level of importance. As such, the synthesis of assessments across multiple criteria involves deciding how to weight the different

criteria. It is critical to ensure balanced weights on the criteria, as there are

risks that a serious problem detected in one criterion can be compensated for

by good assessments in another criterion. As discussed in the earlier chapters,

there are interdependence and overlaps between the DAC criteria. When the

separate assessments in different criteria are combined, would the overall

conclusion be affected by the interdependence and overlaps? The challenge

of aggregation by combining different criteria makes evaluation particularly complex when there is no consensus on what constitutes a good development

intervention or what is really in the best interests of people.


On the other hand, cost-benefit analysis applies one criterion, the net present value (NPV), in measuring the social value of a project. One would hardly agree that an intervention whose costs exceed the benefits it produces is worthy or should be considered a success. Based on the argument that a positive net benefit is one condition of a successful intervention, I adopt the criterion used in CBA, namely the NPV, as a benchmark against which overall conclusions

drawn from the DAC criteria framework can be compared.

In the literature review, I discussed that CBA is a useful framework for

evaluation, which involves systematically identifying, measuring, valuing, and comparing the costs and consequences of an intervention. This is the ground on which I argue that CBA can be a good benchmark for the DAC criteria framework in a summative evaluation. The issues around the application of CBA were also discussed, especially the difficulties in applying the theory to practice, for example, how to measure and monetize values that some critics view as priceless and how to determine the social discount rate. Such challenges seem to exist in all evaluation methods. If one is skeptical about CBA because of its practical limitations, any methodology for valuing a public intervention would invite the same skepticism.

While valuing is one of the central tasks in evaluation, there is little

consensus on which methodology is more appropriate or useful. This lack of

consensus on methods of valuing has been noted within the evaluation community, which has called for more systematic approaches to the methods of valuing appropriate for evaluation, with better integration of methods including economic approaches, e.g., CBA (Julnes 2012, King 2017). A dominant framework such as the DAC criteria may need a fresh look at valuing and assessing development interventions, incorporating relevant methods in a constructive way.

5.2. Comparison of DAC Framework and CBA: Hypothetical Cases

5.2.1. Illustrative Comparison between Two Projects

Using the DAC criteria and the suggested sample evaluation questions makes it difficult to differentiate synthesized evaluation judgements about a project's success, especially when there are no agreed standards for making a judgement on project success.

Let us assume that two projects are evaluated. Project A was planned

for three years with the budget of $1 million. It was finished after three years

as planned at the total cost of $1 million, so on time and within the budget.

The size of the beneficiary population is 10,000. The project was included in the National Development Plan and had a reasonable program theory that could support the

causal links between the input and the outcomes. It achieved all the outputs

planned and the outcome targets were met. The satisfaction level of

beneficiaries was very high. Hypothetically, the benefit that the project brought to its beneficiaries was valued at $90 per individual on average. The benefit is expected to continue after donor funding ceases, given

that the local government would keep the promise to provide necessary

financial and human resources to maintain the facility.

From this limited information, some initial assessments can be made

under the DAC criteria. By the definitions of the DAC criteria, Project A would be assessed as relevant in that the activity was consistent with the needs of the target group and the priorities of the recipient country. Under the program theory, the intended results are plausibly related to the input and output. The intervention could be rated as effective in terms of the extent to which it attains its objectives, which were fully achieved in this case. In

terms of the definition of efficiency, the project was finished on time and within

the budget, so it satisfies the main evaluative questions in the efficiency

criterion. It would be assumed to be sustainable based on the expectation that

the recipient government would take over the responsibility. Overall, Project

A has most of the aspects that the definitions of DAC criteria require.

Project B was planned initially as a three-year, $0.9 million project but finished in 3.5 years at a total cost of $1 million. It targeted a total population of 50,000. The project was consistent with the national development strategy and policy of the recipient government, and the causal links in the program theory seemed plausible. It achieved the outputs as planned, but in terms of the outcome targets it only attained half of what was intended. The beneficiaries valued the benefit of the project at $50 per person on average, and the level of their satisfaction was moderate. If assessed with the DAC criteria framework, Project B has some flaws in the efficiency and effectiveness criteria. It cost more than planned, that is, an additional six months and $100,000, and it only achieved 50% of the objectives with moderate beneficiary satisfaction, though it did produce all the outputs. The characteristics of Project A and Project B are summarized in Table 14.

Table 14. Comparison of Two Hypothetical Projects

                             Project A                           Project B
Budget planned ($)           1,000,000                           900,000
Actual cost expended ($)     1,000,000                           1,000,000 (additional 100,000)
Population (beneficiaries)   10,000                              50,000
Context                      Of high priority in the policies    Same as in Project A
                             of recipient and donor and in the
                             needs of the target group;
                             activities and output consistent
                             with the intended outcome
Process                      Achieved its output and outcome     Exceeded the time and budget in
                             on time and within the budget       achieving its output and outcomes
                             as planned
Achievements                 Intended output achieved;           Intended output achieved;
                             outcome target achieved 100%;       outcome target achieved by 50%;
                             beneficiary satisfaction high       beneficiary satisfaction moderate
Projected benefit ($/capita) 90                                  50


Compared to Project A, Project B would get less positive assessments in

the DAC criteria. Would it be fair? Given the limited information, we can

infer the value of each project. In Project A, the per capita benefit is assumed to be $90, so the total benefit would be $900,000, which is smaller than the total cost of $1 million. Project B, on the other hand, produced a lower per capita benefit of $50, but the total benefit would be $2.5 million, which is 2.5 times the total cost of $1 million. In sum, the net benefit of Project A is negative, while Project B shows a much larger net benefit than Project A.
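The arithmetic behind this comparison is simple; a short sketch using the figures from Table 14:

# Net benefit of the two hypothetical projects in Table 14.
projects = {
    "A": {"cost": 1_000_000, "beneficiaries": 10_000, "benefit_per_capita": 90},
    "B": {"cost": 1_000_000, "beneficiaries": 50_000, "benefit_per_capita": 50},
}
for name, p in projects.items():
    total_benefit = p["beneficiaries"] * p["benefit_per_capita"]
    print(name, total_benefit, total_benefit - p["cost"])
# A:   900,000 total benefit, net benefit -100,000
# B: 2,500,000 total benefit, net benefit +1,500,000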

In Project A, even though the target outcome was achieved, the net

benefit is negative. Possible reasons could include: the program design was

wrong in estimating the benefit to be generated by the targeted outcome, or the

program theory was wrong. The main problem with Project A is that the total benefit is smaller than the total cost. Even if a project is relevant to the recipient's priorities and needs, it is possible for it to have a negative net benefit. In this case, the project is not worth investing in. It should be considered a failure regardless of how relevant it is to the needs of the target group or the extent to which it achieved its objectives.

The above is a very simple, hypothetical comparison, but it has some important implications for evaluations using the five DAC criteria. First, there is a risk of a project being considered effective because it has achieved the planned or intended outputs and outcome targets (as in the DAC definition), even when the net benefit is negative. In other words, achieving the objectives does not guarantee the success of the project. Second, beneficiaries' satisfaction levels may mislead the judgement on the effects of a project, as individual satisfaction would be higher in a project giving a larger per capita benefit to a small number of beneficiaries than in a project covering a larger number with a smaller per capita benefit. Third, evaluations using only the DAC criteria consider cost-efficiency under the efficiency criterion, which by definition measures the output in relation to the inputs, but do not take into account the net benefit, which is more related to the cost-effectiveness of the project. If the outputs do not generate enough benefits to make up for the costs, it cannot be said that the intervention was cost-efficient.

5.2.2. DAC Framework and CBA in Five Scenarios

Consider a hypothetical development project whose initial investment cost was

$1 million. To simplify the model, several assumptions are made: there is no

recurrent cost after the initial investment; the project is expected to yield benefits worth $120,000 per year, including all direct and indirect effects, if it achieves the planned outcomes; and the life of the project is 25 years. The initial cost was invested in Year 0, and the benefits would be generated for the next

25 years starting in Year 1. At the discount rate of 10%,14 the net present value (NPV) and the benefit-cost ratio (B/C) of the project would be $89,245 and 1.09 respectively, as shown in Table 15.

14 The appropriate level of the social discount rate is one of the controversial issues in CBA discussion, and there seems to be no authoritative answer for that. Some development banks and agencies have their own guidelines for discount rate: e.g., the World Bank has applied 10-12% (1998), ADB used 12% until 2016 and now applies 9% (2017), and Korea's EDCF suggests 10-12% (EDCF 2012). The 10% discount rate used in this analysis is a result of referring to these guidelines. One can argue that it is rather arbitrary, but the level of discount rate in a hypothetical analysis like this one does not make a dramatic difference in drawing conclusions.

Table 15. Net Present Value of Base Case

Year (n)   Cost (C)    Benefit (B)   Present Value (PV)     Discount factor (r=10%)
0          1,000,000   0             -1,000,000             1.000
1          0           120,000       109,090.91             0.909
2          0           120,000       99,173.55              0.826
3          0           120,000       90,157.78              0.751
4          0           120,000       81,961.61              0.683
5          0           120,000       74,510.56              0.621
…          …           …             …                      …
t          Ct          Bt            (Bt − Ct)/(1+r)^n      1/(1+r)^n
…          …           …             …                      …
21         0           120,000       16,215.67              0.135
22         0           120,000       14,741.52              0.123
23         0           120,000       13,401.38              0.112
24         0           120,000       12,183.07              0.102
25         0           120,000       11,075.52              0.092
NPV                                  89,244.80
B/C                                  1.09

By the decision rules of CBA, this project is socially worthy as the total benefits exceed the total costs, and therefore is considered successful.
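The base-case figures in Table 15 can be reproduced with a short calculation. The sketch below assumes only what the text states: a $1 million investment in Year 0, benefits of $120,000 per year in Years 1-25, and a 10% discount rate; the function name is mine.

```python
# Minimal sketch of the base-case NPV and B/C calculation behind Table 15.

def npv_and_bc(cost0, yearly_benefit, years, r):
    """Discount a constant benefit stream and compare it with the initial cost."""
    pv_benefits = sum(yearly_benefit / (1 + r) ** t for t in range(1, years + 1))
    return pv_benefits - cost0, pv_benefits / cost0

npv, bc = npv_and_bc(1_000_000, 120_000, 25, 0.10)
print(round(npv), round(bc, 2))  # approximately 89,245 and 1.09, as in Table 15
```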

Supposing that this project is also assessed as a success by meeting the requirements of each of the DAC criteria, I will use this model as a base for the following analysis.

Some variations from this base model can be made according to what can

affect the assessment in each DAC criterion. I consider the following five

scenarios which would change the value of project in different ways:

1) The project achieved the planned output on time and within the

budget, but did not yield the full expected outcome because a part of

the facilities is not in operation. With other things being held constant, this would affect the assessment in the effectiveness criterion.

2) The project achieved the planned output and expected outcome

within the budget, but the completion of the project was delayed.

Delays in project implementation alter the assessment in the efficiency criterion.

3) The project achieved the planned output and expected outcome on

time and within the budget, but produces a negative externality, such as environmental degradation or a higher risk to safety or health. This unintended negative result is related to the impact criterion.

4) The project achieved the planned output on time and within the

budget, and is in full operation yielding the expected benefits for the

first few years. After a few years, however, the project is somehow forced to reduce or stop its operation for some reason, e.g., financial problems. As a result, the benefits of the project continue only partially for the rest of the project life, or discontinue. The likelihood of the benefits' continuation is primarily associated with the sustainability issue.

5) The project achieved the planned output on time and within the

budget and is ready for full operation. But the demand for the

project falls short of the capacity that the project can serve. The

problem is identified as a misjudgment of demand during project planning, which is first to be assessed in the relevance criterion. If a project is not relevant, it also carries the risk of lower effectiveness or efficiency, and even of weaker sustainability.

In practice, these scenarios are often observed. In the following section,

I will discuss how each of the above cases affects the evaluation conclusions differently in the DAC framework and in CBA.

Effectiveness

By definition in the DAC criteria, effectiveness measures the extent to which

an aid activity attains its objectives. The first case considers a project which

achieved the planned output, but did not yield the full expected outcome

because a part of the facilities is not in operation. The partial operation would yield a yearly benefit of less than the expected $120,000, assuming other things are constant. The lower yearly benefit would make the NPV of the project smaller, as in Table 16. With the assumption made earlier, full operation is expected to yield a yearly benefit of $120,000. If the project produced 90% of its expected outcome, the NPV would fall to -$19,680 and the B/C to 0.98. When it achieved only 80% of its planned outcome, the results would be -$128,604 in NPV and 0.87 in B/C.

Table 16. NPV and Effectiveness

          Base                              Scenario 1 (90% of outcome)   Scenario 2 (80% of outcome)
Y    C           Bt0      PV0              Bt1      PV1                   Bt2      PV2
0    1,000,000   0        -1,000,000       0        -1,000,000            0        -1,000,000
1    -           120,000  109,091          108,000  98,182                96,000   87,273
2    -           120,000  99,174           108,000  89,256                96,000   79,339
3    -           120,000  90,158           108,000  81,142                96,000   72,126
4    -           120,000  81,962           108,000  73,765                96,000   65,569
5    -           120,000  74,511           108,000  67,060                96,000   59,608
…    …           …        …                …        …                     …        …
21   -           120,000  16,216           108,000  14,594                96,000   12,973
22   -           120,000  14,742           108,000  13,267                96,000   11,793
23   -           120,000  13,401           108,000  12,061                96,000   10,721
24   -           120,000  12,183           108,000  10,965                96,000   9,746
25   -           120,000  11,076           108,000  9,968                 96,000   8,860
NPV                        89,245                    -19,680                        -128,604
B/C                        1.09                      0.98                           0.87

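The scenario figures in Table 16 follow directly from scaling the yearly benefit. A minimal sketch, assuming the same cost, 25-year life, and 10% discount rate as the base case (the helper name is mine):

```python
# Sketch of the effectiveness scenarios in Table 16: partial operation scales the yearly
# benefit to 90% or 80% of the expected $120,000.

def npv(cost0, yearly_benefit, years=25, r=0.10):
    return sum(yearly_benefit / (1 + r) ** t for t in range(1, years + 1)) - cost0

for share in (1.0, 0.9, 0.8):
    print(f"{share:.0%} of outcome: NPV = {npv(1_000_000, 120_000 * share):,.0f}")
# 100%: 89,245   90%: -19,680   80%: -128,604  (matching Table 16)
```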

Judging by the criterion in CBA, both projects in Scenarios 1 and 2 would be assessed as failures. Partial operation, fulfilling only 90% or 80% of the expected outcomes, makes the NPV negative and the B/C smaller than 1, which means that the project is no longer socially worthy nor successful by the standard of judgement in CBA. In the DAC framework, however, how much would it affect the overall conclusion? The assessment in effectiveness would be lower than in the base model, as the project achieved its objectives only partly. But it would hardly affect the assessment of relevance, as long as the project is consistent with the country priority and the donor's strategic objectives, which is assumed to be met

in the base case. Whether the less than expected achievement of the outcome is due to the project design in addressing the development needs is a question that the relevance criterion, with its ex-ante focus, would not revisit.

Efficiency

A delay in project completion would affect its efficiency according to the

definition of the DAC criteria. The second case falls into this: the project

achieved the planned output and expected outcome within the budget, but the

completion of the project was delayed. Table 17 shows how one- and two-year delays affect the NPV and B/C of the base case.

A one-year delay in the completion of the project means the benefits are realized one year later, and even though the project would operate fully for its lifespan, the present value of the total benefits falls by 9%. The NPV would turn negative at -$9,777, with a B/C of 0.99. If the project is delayed two years, the NPV would fall to -$99,798.

Table 17. NPV and Efficiency

          Base                              Scenario 1 (1-year delay)     Scenario 2 (2-year delay)
Y    Ct          Bt0      PV0              Bt1      PV1                   Bt2      PV2
0    1,000,000   0        -1,000,000       0        -1,000,000            0        -1,000,000
1    -           120,000  109,091          0        0                     0        0
2    -           120,000  99,174           120,000  99,174                0        0
3    -           120,000  90,158           120,000  90,158                120,000  90,158
4    -           120,000  81,962           120,000  81,962                120,000  81,962
5    -           120,000  74,511           120,000  74,511                120,000  74,511
…    …           …        …                …        …                     …        …
21   -           120,000  16,216           120,000  16,216                120,000  16,216
22   -           120,000  14,742           120,000  14,742                120,000  14,742
23   -           120,000  13,401           120,000  13,401                120,000  13,401
24   -           120,000  12,183           120,000  12,183                120,000  12,183
25   -           120,000  11,076           120,000  11,076                120,000  11,076
26   -                                     120,000  10,069                120,000  10,069
27   -                                                                    120,000  9,153
NPV                        89,245                    -9,777                         -99,798
B/C                        1.09                      0.99                           0.90
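The delay scenarios in Table 17 amount to shifting the same 25-year benefit stream later in time. A minimal sketch under the base-case assumptions (the function and parameter names are mine):

```python
# Sketch of the delay scenarios in Table 17: 25 years of $120,000 benefits, starting
# one or two years later, so the whole benefit stream is discounted more heavily.

def npv_with_delay(cost0=1_000_000, benefit=120_000, years=25, r=0.10, delay=0):
    pv = sum(benefit / (1 + r) ** t for t in range(1 + delay, years + delay + 1))
    return pv - cost0

for d in (0, 1, 2):
    print(f"{d}-year delay: NPV = {npv_with_delay(delay=d):,.0f}")
# 0 years: 89,245   1 year: -9,777   2 years: -99,798  (matching Table 17)
```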

How would this delay in project completion, which is fatal enough to make the project a failure with a negative NPV, affect the result of the DAC framework evaluation? As seen in the evaluation cases in the previous chapter, many projects were evaluated as a success or, at worst, a partial success, even though a serious delay was observed. In the scenarios above, the output and outcome have been achieved, though delayed, so the assessment in the effectiveness criterion would not be affected. Assuming the benefits continue for the project life, there seems to be no reason for changes in the assessments in the impact and sustainability criteria. With only some deduction in the efficiency criterion, the project is likely to be evaluated as a success in these scenarios, while the CBA shows it may not be the case.

Impact

Assume that the project achieved the planned output and expected outcome on time and within the budget, but it produces serious air pollution (a negative externality) which is estimated to cost $30,000 per year. This reduces the yearly benefit to $90,000 and, as a result, the NPV turns negative.

Table 18. NPV and Negative Externality

          Base                              Unintended Result (Negative Externality)
Y    C           Bt0      PV0              Neg. Ext.   PV1
0    1,000,000   0        -1,000,000       0           -1,000,000
1    -           120,000  109,091          -30,000     98,182
2    -           120,000  99,174           -30,000     89,256
3    -           120,000  90,158           -30,000     81,142
4    -           120,000  81,962           -30,000     73,765
5    -           120,000  74,511           -30,000     67,060
…    …           …        …                …           …
21   -           120,000  16,216           -30,000     14,594
22   -           120,000  14,742           -30,000     13,267
23   -           120,000  13,401           -30,000     12,061
24   -           120,000  12,183           -30,000     10,965
25   -           120,000  11,076           -30,000     9,968
NPV                        89,245                       -183,066
B/C                        1.09                         0.82
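Netting the yearly external cost against the yearly benefit reproduces the figure in Table 18. A minimal sketch under the base-case assumptions (helper name is mine):

```python
# Sketch of the negative-externality case in Table 18: an external cost of $30,000 per
# year is netted against the $120,000 yearly benefit, leaving $90,000 per year.

def npv(cost0, yearly_net_benefit, years=25, r=0.10):
    return sum(yearly_net_benefit / (1 + r) ** t for t in range(1, years + 1)) - cost0

print(f"{npv(1_000_000, 120_000 - 30_000):,.0f}")  # about -183,066, as in Table 18
```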

How much would this negative impact affect the results of the DAC framework evaluation? This type of negative externality, even if it is serious, would be assessed under the impact and possibly the sustainability criteria, but is not likely to be considered in the relevance, efficiency, or effectiveness criteria.

Sustainability

Consider a project which achieved the planned output on time and within the

budget, and is in full operation yielding the expected benefits for the first few

years. After a few years, however, the project is somehow forced to reduce or

stop its operation due to technical problems. Scenario 1 assumes 50% operation from Year 5 after completion, while Scenario 2 assumes that the operation will stop after 10 years. In both cases, the discounted net benefit falls considerably, making the project not worthy at all.

Table 19. NPV and Sustainability

          Base                              Scenario 1 (benefit falls      Scenario 2 (benefit
                                            to half after 4 years)         discontinues after 10 years)
Y    C           Bt0      PV0              Bt1      PV1                    Bt2      PV2
0    1,000,000   0        -1,000,000       0        -1,000,000             0        -1,000,000
1    -           120,000  109,091          120,000  109,091                120,000  109,091
2    -           120,000  99,174           120,000  99,174                 120,000  99,174
3    -           120,000  90,158           120,000  90,158                 120,000  90,158
4    -           120,000  81,962           120,000  81,962                 120,000  81,962
5    -           120,000  74,511           60,000   37,255                 120,000  74,511
6    -           120,000  67,737           60,000   33,868                 120,000  67,737
7    -           120,000  61,579           60,000   30,789                 120,000  61,579
8    -           120,000  55,981           60,000   27,990                 120,000  55,981
9    -           120,000  50,892           60,000   25,446                 120,000  50,892
10   -           120,000  46,265           60,000   23,133                 120,000  46,265
11   -           120,000  42,059           60,000   21,030                 0        0
12   -           120,000  38,236           60,000   19,118                 0        0
…    …           …        …                …        …                      …        …
21   -           120,000  16,216           60,000   8,108                  0        0
22   -           120,000  14,742           60,000   7,371                  0        0
23   -           120,000  13,401           60,000   6,701                  0        0
24   -           120,000  12,183           60,000   6,092                  0        0
25   -           120,000  11,076           60,000   5,538                  0        0
NPV                        89,245                    -265,186                        -262,652
B/C                        1.09                      0.73                            0.74
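The two sustainability scenarios only differ in the shape of the benefit stream over time. A minimal sketch under the base-case assumptions (the dictionary-based layout is mine, chosen so that year-by-year benefits can vary):

```python
# Sketch of the sustainability scenarios in Table 19: benefits are $120,000 for the first
# 4 years and then fall to $60,000 (Scenario 1), or run at $120,000 until Year 10 and
# then stop (Scenario 2).

def npv(benefit_by_year, cost0=1_000_000, r=0.10):
    return sum(b / (1 + r) ** t for t, b in benefit_by_year.items()) - cost0

scenario1 = {t: (120_000 if t <= 4 else 60_000) for t in range(1, 26)}
scenario2 = {t: (120_000 if t <= 10 else 0) for t in range(1, 26)}
print(f"{npv(scenario1):,.0f}")  # about -265,186, as in Table 19
print(f"{npv(scenario2):,.0f}")  # about -262,652, as in Table 19
```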

In the DAC framework, both scenarios would affect the result of the assessment in the sustainability criterion, and maybe in the impact criterion in Scenario 2 to some extent. But they are not likely to change much the results in the other criteria, i.e., relevance, efficiency, or effectiveness. Following the DAC definition, evaluators would generally assess whether the benefit is likely to continue, not specifically how much benefit would continue. They may consider Scenario 1 better than Scenario 2, because the former would at least yield some continuing benefit for the period of analysis, though the NPV in both cases is similar.

A Case: Project for Modernization of the Traffic Management System in Erbil

The project consists of three site constructions: 1) driver’s license test site, 2)

vehicle registration and inspection site, and 3) license plate manufacturing site.

All constructions and equipment provision were completed as planned. The

evaluation found that “only the driver’s license test site is under normal

operation, while the equipment for vehicle registration and inspection and

license plate manufacturing are not utilized at all”. As to the driver’s license

test site, the project established separate facilities for different driving skills

such as bus and large trailer truck drivers in addition to regular license. The

facilities for bus and trailer truck driving test sites, however, were not used at

all, because they do not have the system which issues separate licenses for those

special vehicles. In the target region, a driver with a regular driver’s license

can operate any type of vehicle including buses, large trucks and trailers. To

summarize, out of the three sites the project constructed, only one site is in operation, and even that one is not fully utilized. The cost of establishing those unused facilities is not clearly mentioned in the evaluation report. Nevertheless, it is rather obvious that the benefits would not be generated as planned.

In the assessment under the DAC criteria, however, this only affects the score in efficiency. In the relevance criterion, the evaluators looked into the relevance of the project objectives to the strategy of the target country and region, rather than into whether the project design and components were relevant to the context of the target area. In the effectiveness criterion, they assessed the outcomes generated by the facilities now in operation. Looking at the outcomes the operating site generates, e.g., the driver’s license tests and necessary services, the evaluators deemed that the project was ‘very effective’ in attaining the performance goals (p. 62). This is a problem of objective setting, because in the PDM the objectives are not related to what the project can actually achieve: the project goal is simply put as to “establish and operate an advanced traffic management system”, with indicators of a 15% increase in licensed drivers and registered vehicles by 2008, a reduction in traffic accidents, and improved customer satisfaction with the DTC service. Sustainability is also measured based on whether the present benefits from the one operating site will continue, rather than on what would be expected with full operation.


5.3. Comparative Case Study - Evaluations of Water Supply Projects15

This section presents a comparative case study based on two evaluations of

water supply projects implemented by EDCF and KOICA. The evaluation of

Nicaragua project was conducted in 2012 (EDCF 2012) and the other in 2016

(KOICA 2016). Two agencies had different rating methods in the evaluation

guidelines according to which each of the evaluations was conducted.

Following the brief project descriptions, the evaluation results using the DAC framework are presented. Then I discuss the results of the cost-benefit analysis of each project, followed by a discussion of the conceptual framework and methodology of CBA for water projects. Finally, I discuss how the CBA results can complement the implications of the DAC framework evaluations.

5.3.1. Project Description

The Water Supply Expansion Project in Juigalpa, Nicaragua

This project was developed to solve the chronic water shortage problem in the

city of Juigalpa, Nicaragua by establishing a new water supply system using an

alternative water resource. Juigalpa is the capital city of Chontales, a state in the middle part of Nicaragua, with a population of approximately 70,000. The total cost of the project was $40.5 million, out of which $33 million was supported as a loan by EDCF. The project started in 2006 and the construction was completed in February 2010. The ex-post evaluation was conducted in 2012.

15 The case studies are based on the two ex-post evaluations which I participated in and conducted CBAs for. Both evaluations are published and available on the website of each agency.

The purpose of the project was to provide safe and stable water supply

in Juigalpa. The major problem in water supply in the city before the project

was the water source was not reliable both in quantity and quality. The

existing source, River Pirre, had a very irregular and insufficient volume of

water. In dry seasons, the water plant often had to stop its operation because

of the water shortage. Restricted water rationing was enforced, and the water

was supplied on average once in three days in rainy seasons and two or three

times in a month in dry seasons. The access rate to water supply system was

77% but the actual supply of water was limited. People stored water in a water

tank or buckets when tap water was available, and used the stored water or a

well when the water supply was cut off. Those who did not have water supply

connection usually used wells or collected rainwater. People also purchased

water from water vendors, because the wells often went dry especially in dry

seasons.

The new water supply system was designed to use a new water source of

Lake Nicaragua which is located about 30km southwest from Juigalpa.

Construction of the new system included a water intake tower and facilities in Lake Nicaragua, aqueducts and pressurizing pump stations, a new purification

plant, and pipe replacement and new connections. The total production

capacity of the new system was 650,000m3 per month, approximately 21,000m3

per day. The project descriptions, main achievements, and the results of

evaluations are summarized in Table 20, 21, and 22.

Table 20. Summary of Water Supply Project in Juigalpa, Nicaragua

Project Title        Water Supply Expansion Project in Juigalpa, Nicaragua
Overall Goal         To improve the quality of life of people in Juigalpa
Project purpose      To supply clean and safe running water
Activities/Outputs   Construction of the water supply system:
                     - Water intake facilities (240 l/s)
                     - Water pumping stations
                     - Water purification and reservoir facilities
                     - Pipelines and network
                     Consulting: supervision and technical assistance
Population in
target area          68,410
Project Cost
(external support)   $40.5 million ($33.1 million)

Source: organized by author based on EDCF (2012).

Table 21. Achievement of Nicaragua Project

                                        Before project (2009)   After project (2012)
Production capacity (m3/m)              340,000                 650,000
Population with piped water supply      77.2%                   95.1%
Tap water consumption (m3/m, per HH)    14.4                    18.6
Total water consumption (m3/m)          126,181                 206,591
Population in target area               68,410                  70,000

Table 22. Nicaragua Project: DAC Criteria Evaluation

Criterion        Sub-category                                          Rating        Value
Relevance        - Consistency with water supply and sewage system     Relevant      3
                   development policies and priorities of the
                   partner country
                 - Consistency with the EDCF's assistance strategies
                 - Harmonization with International Development
                   Cooperation norms such as MDG, cross-cutting
                   issues and water supply aid policies
                 - Adequacy of feasibility study and project design
Efficiency       - Efficiency of project cost                          Highly        4
                 - Efficiency of project time period                   Efficient
                 - Efficiency of project implementation procedures
Effectiveness    - Achievement of planned outputs                      Highly        4
                 - Achievement of project objectives                   Effective
                 - Application of appropriate technology on a
                   local level
Impact           - Socio-economic impact                               Highly        4
                 - Systemic impact                                     Influential
                 - Impact on gender equality and environment
Sustainability   - Systemic sustainability                             Highly        4
                 - Financial sustainability                            Sustainable
Overall Evaluation Score                                               Highly        3.8
                                                                       Successful

Source: EDCF (2012, 8)

The Project for the Construction of Water Supply System in Buon Ho, Vietnam

This is a KOICA project implemented in 2010-2013 to support the expansion

of water supply system in Buon Ho Town in Dak Lak Province, Vietnam.

Buon Ho Town is a district-level city raised to the urban status in 2009, with a

population of 55,000 as of 2010. The local government highlighted the


increasing demands for basic infrastructure in the Town as a result of rapid

urbanization with concurrent population growth, and requested KOICA to

support the construction of water supply facilities. The total project cost was

$5.15 million, out of which KOICA funded $4.5 million including construction

of water intake and purification facilities, provision of equipment, and technical

assistance. The project description is summarized in Table 23.

Table 23. Summary of Water Supply Project in Buon Ho Town, Vietnam

Project Title        Project for the Construction of Water Supply System in Buon Ho
                     Town, Vietnam
Overall Goal         To improve the quality of life of people in Buon Ho Town
Project purpose      To supply clean and safe running water
Activities/Outputs   To establish the water supply system:
                     - Water purification facilities (5,600 m3 per day)
                     - Water intake facilities
                     - 60km of the pipeline
                     To provide equipment and training for the use of them
                     To build the capacity for the management and operation of waterworks
Population in
target area          54,218
Project Cost
(external support)   $5.1 million ($4.5 million)

The project mainly focused on construction of an additional water supply

system: nine water intake facilities (seven from underground water, two from

surface water), water purification facilities with a capacity of 5,600m3 per day,

and 60km of pipelines. The construction cost 91% of the total budget, while

the rest, less than 10%, was spent in providing equipment, training programs,

and administration. The Vietnam government also spent about $0.65 million


in construction of additional pipelines and civil complaint settlement during the

implementation.

The purpose of this project is to supply clean and safe running water to

people in Buon Ho Town, Vietnam, by expanding the existing water supply

system which was presumed to be insufficient to serve the increasing

population in the near future. Before the project was initiated, the existing

water supply system with a capacity of 4,200m3 per day had served about 30%

of the population in Buon Ho Town, mainly residents in three wards located in

the center of the Town. Those who did not have access to the water supply

system used water from a well, mostly on the premises. At the time of project

appraisal, it was reported that because of drinking and using well water,

residents were suffering from water-related diseases such as diarrhea, parasitic

infections, and skin disease. Although there was lack of information on how

prevalent and serious the problem was, the health issue was considered as a

rationale for the project. Accordingly, the project aimed at achieving two

objectives: to increase the population connected to piped water with wider

water supply network covering all seven wards in the Buon Ho Town; and to

improve the health conditions of residents. Table 24 shows the objectives and

outcome indicators set for the project monitoring and evaluation.


Table 24. Objectives and Targets in Vietnam Project

                               Indicators                      Baseline (2012)   Target
Objectives
1. To improve access to        Increased access to             15%               75% by 2013
   safe and drinkable water    waterworks                                        90% by 2015
2. To improve the health       Reduced number of patients      1,800 people      Reduction of 50%
   of local residents          contracting water-borne
                               diseases
                               Reduced cases of parasitic      18.3%
                               infections
Overall Goal
To propel community            Reduced share of people in      6.5%              Reduction of 20%
development by establishing    extreme poverty
social infrastructure

Source: KOICA (2016)

The targets for the access rate to piped water, 75% by 2013 and 90% by 2015, were set following the national target for urban areas. The construction was completed in March 2013. After the project, the coverage of the water supply network in Buon Ho Town was expanded geographically from 3 wards to 7 wards, with the access rate rising from 30% to 64% as of August 2016.

Table 25. Achievement of Vietnam Project

                                        Before project (2012)         After project (2016)
Production capacity (m3/m)              126,000                       294,000
Population with piped water supply      31.1%                         64.1%
Tap water consumption (m3/m, per HH)    18.5                          15
Total water consumption (m3/m)          82,320                        132,956
Population in target area               54,218
Coverage                                Three wards in center area    All seven wards

The ex-post evaluation was in 2016. The main framework for

evaluation questions and assessments was the DAC criteria. Table 26

summarizes the findings from DAC framework evaluation.

Table 26. Vietnam Project: DAC Criteria Evaluation

Criteria          Sub-category                                              Rating Scale
Relevance         1. Relevance to development strategy and needs of         ④ 3 2 1
                     partner country, and to Korea's development
                     cooperation strategy
                  2. Relevance of design and implementation                 4 ③ 2 1
                  3. Ownership of the partner country                       ④ 3 2 1
                  Average (a)                                               3.7
Efficiency        1. Cost efficiency (within the planned budget)            ④ 3 2 1
                  2. Time efficiency (within the planned time frame)        ④ 3 2 1
                  3. Results against inputs                                 4 3 ② 1
                  Average (b)                                               3.3
Effectiveness/    1. The extent to which objectives are met                 4 ③ 2 1
Impact            2. Positive or negative impacts on society, economy,      4 ③ 2 1
                     institutions
                  Average (c)                                               3.0
Sustainability    1. Human resources, institutional and financial aspects   ④ 3 2 1
                  2. Maintenance capability and management system           ④ 3 2 1
                  Average (d)                                               4.0
Total Score (a+b+c+d)                                                       14.0
Overall Assessment                                                          Very Successful

Source: KOICA (2016, iii)

Table 27. Two Projects in Comparison

                                     Juigalpa, Nicaragua (2006~10)        Buon Ho, Vietnam (2010~13)
Initial costs                        USD 40,520,000                       USD 5,150,000
External support                     33,134,000 (loan by EDCF)            4,500,000 (grant by KOICA)
Population in target area            68,410                               54,218

Major changes                        Before (2009)     After (2012)       Before (2012)    After (2016)
Production capacity (m3/m)           340,000           650,000            126,000          294,000
Population with piped water supply   77.2%             95.1%              31.1%            64.1%
Tap water consumption (m3/m, per HH) 14.4              18.6               18.5             15
Total water consumption (m3/m)       126,181           206,591            82,320           132,956
Leakage ratio (%)                    15.88             53.05              11               18

5.3.2. Evaluation Results in the DAC Criteria Framework

Both projects were evaluated as “very successful” based on the overall rating

of evaluation results which is calculated with the sum or average of the scores

for the five evaluation criteria. The difference in final scores is negligible, and the scores are not directly comparable because the two agencies had different averaging methods. I summarize the main rationale for the judgment by criterion, while there are detailed descriptions of pros and cons in the reports.

1) Relevance

Both projects were assessed as relevant to the development policy and

water strategies of each country, the needs of the target area and beneficiaries,

as well as Korea’s development cooperation strategies. Regarding the project design, some negative aspects were found. The Nicaragua

project had not had a set of outcome indicators and targets, which was not

required at the time of project design. Lack of consideration of a sewage

system was also pointed out. As to the Vietnam project, the outcome

indicators were not selected based on clear rationale and the quantitative targets

were not realistic to achieve. Nonetheless, the overall rating of ‘relevance’

assessments was 3 and 3.33 respectively out of 4.

2) Efficiency

Main questions in the ‘efficiency’ criterion were whether the project was

completed within the planned schedule and budget, and if the project was

implemented in the most efficient way in terms of process and technology.

Both projects were completed within the initially planned budget and with

slight delays in schedule. The process and technology for each project were


found to be efficient. Cost-benefit analysis was conducted in both evaluations,

and the net benefit in each project was positive. Overall, both projects were given high scores in the ‘efficiency’ criterion: 4 for the Nicaragua project and 3.3 for the Vietnam project, out of 4.

3) Effectiveness

The assessments of ‘effectiveness’ were conducted based on the outputs

and outcomes achieved against the targets in the plans. Both projects

produced all outputs planned, so were recorded as having achieved 100% of the output targets. All facilities were operational in good condition and with

proper maintenance. The quality of water supplied met the national standards.

As to the outcomes, the population having access to the water supply

system was used as an indicator. In the Nicaragua project, the access to

waterworks increased from 77.2% to 95.1% in the three years since the completion of the project. In the Vietnam project, the access rate doubled after three years, from 31% to 64%, but the achievement did not meet the target value, which had been set following the national target without due consideration of what the project would have been able to achieve.

4) Impact

The assessment of ‘impact’ of the projects relied on observations and

household surveys. The Nicaragua project was assessed to be ‘highly influential’, as the project was deemed to have made significant contributions to the improvement in the quality of life and living conditions of the direct

beneficiaries. It also considered the socio-economic impacts on vulnerable

social groups such as women, children and the poor: the most benefited were

women and children of newly connected households in the poorest area who

had been responsible for fetching water from a public well.

The Vietnam project evaluation found that the project contributed to the

improved living standards of people in the target area to a certain degree, based

on the results of the household survey conducted for the evaluation. The

majority of respondents said that the convenience in water usage was improved,

while increase in the amount of water they use was only moderate, about 5%.

5) Sustainability

Both projects were found very sustainable in terms of institutional, financial and human resource management. In both cases, there seemed to be

increasing demands for water service. The systems were operated and

managed by government-owned companies which were assessed to be

financially stable or supported by the government and to have relevant human

resources and operational capacity.


5.3.3. Results of Cost-Benefit Analysis

5.3.3.1. Theoretical Basis of CBA for Water Supply Intervention

Estimating Costs of Water Supply Intervention

Costs of water supply interventions are relatively straightforward. They

generally consist of two types of costs (Hutton and Haller 2004). Initial

investment costs include planning and supervision, construction, hardware, and

education for use of hardware. Recurrent costs, i.e., costs required to maintain

the interventions, may include: operational materials, maintenance of hardware

and replacement of parts, costs of water treatment and distribution, regulation

and control of water supply, management and education of human resources,

etc.

Estimating Benefits of Water Supply Intervention

Water supply interventions in developing countries often come with sanitation components, of which the primary purpose is to improve health conditions.

Consequently, literature on water supply interventions in development context

has a strong focus on health improvements, broadly dividing the benefits into

two categories: health benefits and non-health benefits (Hutton, Haller, and

Bartram 2007, OECD 2011). Beneficial health impacts of water supply are

associated with reduced incidence of water-related diseases, which leads to less


expenditure on treatment and avoided productivity loss from sickness and death.

Non-health benefits include time saved in water collection, increase in

productivity, and improvement in quality of life.

Hutton and Haller (2004) categorized four distinct groups that would

benefit from water service improvements: the health sector, patients, consumers,

and agricultural and industrial sectors, as summarized in Table 28.

Table 28. Costs and Benefits in Water Supply Interventions

Costs
  Investment costs       Planning and supervision, hardware, construction, protection of
                         water sources, education for use of hardware
  Recurrent costs        Operational materials, maintenance of hardware and replacement of
                         parts, costs of water treatment and distribution, regulation and
                         control of water supply, etc.
Benefits
  Health-related         Government health-care costs saved; household health-care costs
  benefits               saved; saved time of patient or caretaker; benefits from
                         mortalities postponed; value of avoided days lost at work or at school
  Consumers              Time savings related to water collection; switch away from more
                         expensive water sources; increased quality of life
  Agricultural and       Improved productivity due to improved water supply and more
  industrial sector      efficient management of water resources

Source: arranged by author based on Hutton and Haller (2004) and Cameron (2011).

Benefits from reduced incidence of water-related diseases may accrue

not only to patients and caretakers who would save treatment-related costs and

time, but also to the health sector which would spend less health-care

expenditure. Consumers of water service would benefit from time savings


related to water collection and less expensive water sources. Agriculture and

industrial sectors would have benefits of improved productivity due to more

efficient management of water resources. In addition, Cameron (2011) added

wider social benefits such as environmental gain associated with amenity value

of land and social capital benefits related to increased confidence and trust.

When conducting a CBA, it is useful to apply the concepts of

microeconomic theory to measuring the benefits (Boardman et al. 2011).16 Figures 14 and 15 illustrate the changes in consumer surplus before and after a water supply project.

16 The concepts of the microeconomic model presented here are drawn from Chapter 3, Microeconomic Foundations of Cost-Benefit Analysis, in the referred literature.

Suppose that the initial price of water is given by P1 and the quantity

consumed at P1 was Q1. A water supply intervention that reduces the price of

water from P1 to P2 would increase the quantity consumed from Q1 to Q2,

resulting in increased benefits to consumers which is equal to the shaded area

‘a’ plus ‘b’ (Figure 14). Consumers gain benefits in two ways: they pay a lower price than before for the same amount of water (represented by the area ‘a’), and they consume an additional quantity of water, Q2 − Q1 (the area ‘b’). We can also think of a case where the demand curve shifts to the right

(D’) as shown in Figure 15. For example, better water quality after the new

water system may result in an increase in demand, and the increase in consumer welfare, equal to the area ‘c’, would be estimated by the improvement in health.

Figure 14. Consumer Surplus of Water Supply Project (1)
Figure 15. Consumer Surplus of Water Supply Project (2)
P1: price of water used before connection
P2: price of water (water tariff) after connection
Q1: water consumption before connection
Q2: water consumption after connection

To estimate the consumer surplus increased by a project requires quantitative information such as changes in the water price and the quantity of water consumption before and after the project, and the number of people or households that benefit from the project. Data on factors which affect the demand curve shift, for example, the extent of benefits in health or environment, are also needed in case there are observed improvements as consequences of the project.
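The areas ‘a’ and ‘b’ in Figure 14 can be computed directly once the before/after price and quantity are observed. The sketch below is a minimal illustration that assumes a linear demand curve between the two observed points, which is an assumption added here for simplicity; the numbers are hypothetical and only loosely echo the remote-area figures reported later for Juigalpa.

```python
# Sketch of the consumer-surplus gain (areas 'a' + 'b' in Figure 14), assuming a linear
# demand curve between the observed points (P1, Q1) and (P2, Q2) for illustration only.

def consumer_surplus_gain(p1, p2, q1, q2):
    area_a = (p1 - p2) * q1                # saving on the quantity already consumed
    area_b = 0.5 * (p1 - p2) * (q2 - q1)   # triangle gained on the additional consumption
    return area_a + area_b

# Hypothetical monthly figures for one household (prices in $/m3, quantities in m3/month)
print(consumer_surplus_gain(p1=10.0, p2=0.2, q1=3, q2=15))  # 29.4 + 58.8 = 88.2
```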

5.3.3.2. Estimating Benefits of Two Water Supply Projects

Based on the theories discussed above, CBA was conducted for the two water

supply projects. Since the baseline information on the price or the amount of

water people used before the project was not available, the evaluation team

including the author collected data from household surveys and interviews.

Statistics from the local governments and water management companies were

also incorporated in estimation of NPVs. The information provided in this

section is based on the evaluation team’s findings on each of the projects, which

were published by the respective agencies in EDCF (2012) and in KOICA

(2016).

In the Nicaragua project, 77% of households had the access to

waterworks before the project but they could not fully enjoy the service. The

water supply was very unstable due to the irregular and insufficient water

source. It was found that water supply was restricted on average to one in

three days in rainy seasons and two or three days in a month in dry seasons.

People stored tap water when supplied in tanks or buckets and used the water

during shutdowns or fetched water from wells. In dry seasons when water

supply was far short of demand, people had to buy water from private water

trucks as wells often dried up. Those who had no access to the water supply system mainly used wells in rainy seasons, while they also had to buy water

from water vendors in dry seasons.

After the project, the water supply was expanded to 95% of population.

The new system provides stable and sufficient water to cover the demand

regardless of rainy or dry seasons. Households with waterworks connection

could use the tap water 24 hours a day, 7 days a week. The price of tap water, the water tariff, is $0.2/m3, which is much lower than the price from water trucks (the average price was approximately $10/m3) or the cost of fetching water from wells (more than 50 hours for 1m3 on average), so they no longer used wells or bought water from private vendors. The quantity of water consumption also increased as they could use water with more convenience at a much lower price.

The extent of the price and quantity changes in water consumption varied according to the pattern of water usage before the project, which also depended on geographical area and income level. Three groups were

identified: the first group of households with low income and in a remote area

where the water supply had not been installed before the project; the second

group in the water connected residential area with medium income; and the

third, households in high-income commercial area whose water consumption

was relatively large. Table 29 summarizes the information of these three

groups before the project.


Table 29. Patterns of Water Consumption in Juigalpa before the Project

Remote Area (areas without water connection, low income)
- Population: 2,000 households, 12,000 people
- Monthly income per household: $100-150; major source of income: housekeeper ($100/month)
- Amount of use per household (m3/month): 3~4 (rainy season), 3~4 (dry season)
- Major source of water: public well (rainy season), water vendor (dry season)
- Costs due to the lack of water supply: time spent fetching water; water price (from water vendor)

Residential Area (residential area with water connection, medium income)
- Population: 6,000 households, 36,000 people
- Monthly income per household: $150-300; major sources of income: civil servant, teacher, self-employed
- Amount of use per household (m3/month): 20 (rainy season), 10~15 (dry season)
- Major source of water: tap water or public well (rainy season), tap water or water vendor (dry season)
- Costs due to the lack of water supply: inconvenience and time spent purchasing water from water trucks; water price (from water vendor); inconvenience and uncertainty when using tap water

Central Area (commercial area with water connection)
- Population: 3,000 households, 18,000 people
- Monthly income per household: over $300; major sources of income: store, restaurant, self-employed
- Amount of use per household (m3/month): 40 (rainy season), 15~30 (dry season)
- Major source of water: tap water (rainy season), tap water or delivery in person (dry season)
- Costs due to the lack of water supply: cost of delivering water by vehicle; inconvenience and uncertainty when using tap water

Source: EDCF (2012)

Table 30 shows the changes in quantity of water that the beneficiaries

consumed before and after the project. Households in remote area who did

not have water supply connection used to consume relatively small amount of

water, approximately 3m3/month on average before the project, because they had to fetch water from public wells at a distance or pay a high price to water vendors. With access to inexpensive water supply, it was found that the water consumption increased to 15m3/month for an average household. The other groups, with a water connection before the project, could now enjoy stable water supply 24 hours a day, 7 days a week, resulting in an increase in average water consumption.

The avoided cost of uncertainty and inconvenience in using tap water before

the project is also a beneficial effect of the project.

Table 30. Changes in Water Consumption Before and After the Project
(Unit: m3/month)

                          Remote Area      Residential Area    Central Area
Area / Season             Rainy   Dry      Rainy    Dry        Rainy    Dry
Before   Wells            0       3        0        0          5        0
project  Water trucks     3       0        5        0          0        0
         Tap water        0       0        10       20         20       40
         Total            3       3        15       20         25       40
After    Wells            0       0        0        0          0        0
project  Water trucks     0       0        0        0          0        0
         Tap water        15      15       25       25         45       45
         Total            15      15       25       25         45       45

Source:

Compared to Juigalpa, Nicaragua, Buon Ho Town, the target area of the

Vietnam project, has relatively abundant rainfalls throughout the year, even in

so-called dry seasons. Most of the households own a private well in the yard,

and many of them were equipped with electric motor pumps. Household

survey and interviews with residents revealed that the amount of water they used before being connected to the water service was about 15~25m3 per month, much more than households in Juigalpa used. Table 31 compares the patterns of water

use in Buon Ho Town before and after the project.

Table 31. Patterns of Water Consumption in Buon Ho Town before and after the Project

                                 Before connected        After connected
Major water source               Well on premises        Tap water, well on premises
Amount of water use (m3/month)   15~25                   10~30 from tap water
                                                         5~10 from well
Use of water
  For drinking                   bottled water, well     bottled water, tap water
  For cooking, washing dishes    well                    tap water, partly well (10%)
  For shower, washing clothes    well                    tap water, partly well (15%)
  For cleaning, etc.             well                    well, partly tap water
Benefits of water connection     - Saved cost of using a water pump (maintenance,
                                   replacement of parts, electricity, etc.)
                                 - Saved time of collecting water (mostly from well on
                                   premises, rarely from outside)
                                 - Reduced inconvenience and uncertainty when using a well
                                 - Reduced purchase of bottled water for drinking

Interestingly, many households kept using wells even after being connected to the water supply system. Only 10% of newly connected households responded that they completely switched from well to tap water. The others still used well water for washing and cleaning, because they did not have much inconvenience in using well water, while they felt the water tariff was not considerably cheaper than using wells.

The main benefits from water supply come from lowered price and

improved convenience of access to water in terms of quantity (OECD 2011, 46).

In Buon Ho Town, the difference between the price of tap water and that of using wells seemed to be insignificant, so it was hard to expect a substantial increase in consumer surplus. In terms of quantity, households used to consume sufficient water from wells using electric pumps and water tanks before the project, so the increase in the amount of water consumption was limited for average households. A small number of households that had not installed an electric pump on the well benefited significantly in terms of convenience from the water connection.

5.3.3.3. Results of Cost-Benefit Analysis in Comparison

The two projects share similar characteristics in terms of the project activities

and expected outputs and outcomes, the size of beneficiary population, and the

level of development in the target area. Juigalpa and Buon Ho Town are in

urban areas with basic social infrastructure such as major roads, electricity,

schools and health services. The standard of living was similar and relatively

higher than rural areas. In both areas, there had existed a water supply system

before the interventions, with problems such as insufficient capacity.

The major difference lies in the costs of the projects. The initial investment in the Nicaragua project is approximately $40 million, eight times larger than that in the Vietnam project, $5.2 million. If we calculate the project cost per beneficiary, an often-used method for cost-effectiveness comparison, the cost per capita in the Nicaragua project, about

$580, is much higher than that in the Vietnam project, which is around $95.

Simple comparison of the costs to number of population can be misleading, as

it does not consider the magnitude of benefits. Cost-benefit analysis provides

a window to view the effects of the projects in consideration of the size of

benefits compared to the costs.

Both interventions were basically to expand the water supply systems

and networks, so the data for estimating costs and benefits, e.g., management

costs, water price and consumption, etc., were available and relatively reliable.

CBAs were conducted for both projects using the same method. The benefits mainly came from the consumer side, as the water supply systems primarily targeted residents, not the agricultural or industrial sectors. As to health benefits, the evaluation teams found that it would not be sensible to expect improvements in health conditions, for the reasons that follow. In both areas, the

living standard was relatively high compared to rural areas with good access to

basic social infrastructure. Water they used to drink was not ‘unsafe’ to drink,

so it was unlikely to have caused serious water-borne diseases.17 Data from health authorities also failed to show a meaningful relationship between the projects and the number of patients with water-related diseases. This is consistent with the findings in a meta-analysis on impacts of water supply interventions, which states that water supply interventions alone showed negligible and insignificant impact on diarrhea morbidity (Waddington et al. 2009, quoted in OECD 2011, 49).

17 The survey in the Vietnam project found that about 30% of households in Buon Ho Town buy and use 20-litre bottled water for drinking purposes.

Table 32. Estimation of NPV and B/C of Two Water Projects

                 Nicaragua Project            Vietnam Project
Discount Rate    NPV ($)         B/C          NPV ($)        B/C
8%               33,597,094      1.71         2,661,045      1.31
9%               26,308,096      1.57         1,708,372      1.21
10%              20,174,695      1.44         909,463        1.11
11%              14,973,178      1.33         234,641        1.03
12%              10,529,080      1.24         -339,320       0.96

Note: Detailed descriptions and method of calculation are available in EDCF (2012) and KOICA (2016).
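A sensitivity test of this kind simply re-discounts the same cash flows at several rates. The sketch below does not reproduce Table 32, since the detailed cash flows of the two water projects are only available in the original reports; it instead sweeps the discount rate for the hypothetical base project of Section 5.2.2 to show the mechanics.

```python
# Generic sketch of a discount-rate sensitivity test, using the hypothetical base project
# ($1 million in Year 0, $120,000 per year for 25 years), not the two water projects.

def npv(cost0, yearly_benefit, years, r):
    return sum(yearly_benefit / (1 + r) ** t for t in range(1, years + 1)) - cost0

for rate in (0.08, 0.09, 0.10, 0.11, 0.12):
    print(f"r = {rate:.0%}: NPV = {npv(1_000_000, 120_000, 25, rate):,.0f}")
```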

The results of the cost-benefit analyses of the two projects are shown in Table 32. The Nicaragua project would be expected to have an NPV of approximately $20 million over a 35-year project life at a 10% discount rate, and it showed positive NPVs across the 8~12% range of discount rates. The Vietnam project

showed relatively small net social benefit compared to the Nicaragua project.

When calculated with a discount rate of 10%, the NPV exhibited a positive value of about $0.9 million over 35 years, which indicates that the project would generate net economic benefits at that discount rate. Applying a discount rate of 12% as a sensitivity test, the NPV turns negative.

As mentioned earlier, the total cost of the Nicaragua project is much

higher than that of the Vietnam project. The results of CBA showed that the

simple comparison of per capita cost, which is often used as a standard in

‘efficiency’ criterion, would be misleading, since it does not compare the size

of benefits relative to the costs.

The magnitude of benefits depends on the conditions before the project, that is, in this case, what costs the beneficiaries had to bear to get water before the project. The costs can include time, expenditure on water, or inconvenience and uncertainty. For those who used to spend two hours a day fetching water or pay several dollars per month to buy drinking water, the benefits from access to a water supply system with a reasonable tariff rate would be considerable. On the other hand, for those who used to use water without much inconvenience at a relatively low price, the

benefits might not be so significant.

In the Nicaragua case, those who did not have a household connection

before the project used to spend a lot of time and money to fetch or buy water,

and even people with a household connection used to experience much inconvenience because of irregular and insufficient water supply. After the project, they have a stable water supply at home, which led to increased water consumption with improved convenience. In Buon Ho Town, most

households had a well in the yard before the project, in most cases with an

electric pump to draw and store water in a tank, so the costs of using water or

inconvenience was not very high. For the households who used to draw water

from a well with an automatic water pump and store in a two-ton water tank on

the top of the house, the way of using water before and after water supply

connection would not make a significant difference. The differences would

be the water quality between tap and well water, and maybe an effort to push a

button of the automatic water pump to draw and store the well water into the

tank.

Increase in water consumption is another important factor of benefits

from water supply intervention. The average tap water consumption per

household in Juigalpa increased from 14.4 m3 per month to 18.6 m3, while it

decreased in the Vietnam project from 18.5 m3 to 15 m3. It implies that newly connected households in Buon Ho Town do not use tap water as much as those who already had a connection. Since people kept using well water even after being connected, for purposes other than drinking or cooking, e.g., for doing laundry or watering plants in the yard, the tap water consumption does not

reflect the total amount of water use. Nevertheless, the survey results showed

that residents in Buon Ho Town used to consume sufficient water from wells

even before the project, so increase in water consumption and consequent


improvement in consumer surplus seem to be limited.

What the CBA results did not show is the benefits from improved water

quality, as those are not easily quantified. It is reasonably inferred that the

quality of purified tap water would be much better than that of well water or

water purchased from water trucks. Judgement about how to value this type

of benefit, therefore, is left to evaluators and requires qualitative assessments.

5.4. Discussion

In this chapter, I examined how the methods of synthesis can affect the

evaluation results differently between the DAC framework and CBA. The

DAC criteria are given the same weight when integrated into a final judgement of success or failure, while CBA aggregates both positive and negative values to determine a net value, the NPV.

The hypothetical case analysis shows that evaluations applying the DAC

criteria can generate a positive conclusion on a project whose social benefit, i.e., the NPV, is negative. It is fair to conclude that there is a positive bias inherent in the DAC criteria, as the positive conclusion is drawn by combining the assessments in the five criteria against the standards as defined.

The comparative case study of water supply projects suggests that a

project with smaller NPV (or benefit-cost ratio, B/C) can get similar or more

favorable assessments in the DAC criteria framework than another project with


greater NPV (or B/C). In the DAC criteria framework, both projects were

evaluated as ‘very successful’ with similar findings in each criterion. The

results of CBAs, however, show much larger NPV (about 20 million USD) and

B/C (1.44) in the Nicaragua project than those in the Vietnam project (about

0.9 million USD or 1.11 respectively). In a sensitivity test, the NPV of

Vietnam project turned to negative, showing that it is not safe to make a firm

judgement that it is a successful project. The DAC criteria framework yields positively biased evaluation results for the Vietnam project, whose net social benefit is relatively small and possibly even negative, giving it the same high level of assessment as the Nicaragua project, whose net social benefit is much larger and more stable. This means that the extent to which positive bias occurs is inconsistent across evaluation results.

Positively biased evaluation results can be attributed to the standards of

judgement in each criterion as well as the synthesis methods of integrating the

assessments in different criteria with the same weight, which is rather arbitrary.

Applying the compensatory model, the DAC framework allows a serious

problem in one criterion to be compensated by a good assessment in another

criterion. The problem may be serious enough to make the project’s NPV

negative, in which case the project would not be considered worthy or

successful. However, the overall conclusion in the DAC framework would be

affected only partially, as the assessments in the two criteria are combined into the overall judgement with the same level of importance.


CHAPTER 6. CONCLUSIONS

This study examined the OECD/DAC evaluation criteria with a question

whether they contribute to positively biased and inconsistent evaluation results

in comparison with cost-benefit analysis (CBA). Positive bias may reduce

comparability between evaluations that is one of the purposes for which the

DAC criteria were developed, making it difficult to differentiate a more

successful intervention from a less one and possibly causing inconsistency

across evaluation results.

Implementing the ‘general logic of evaluation’ as the analytical

framework, the study organizes the analyses into three stages, dealing with:

first, the notion of merit that the DAC criteria define for a successful

development intervention; second, the standards of judgement and source of

supporting evidence in each criterion; and third, the methods of synthesis by

which the assessments in the five criteria are integrated into the overall

evaluative conclusion on the intervention.

The term ‘evaluation criteria’ is defined as the aspects, qualities, or

dimensions that distinguish a more valuable evaluand from a less valuable one.

Based on the review of key requirements for evaluation criteria in various

evaluation models, the defining features of good evaluation criteria are drawn

as follows. First, a criterion should provide a clear notion of quality or value


to be assessed. If a criterion asks questions about what is taken for granted or

is not supported by evidence that addresses the criterion directly, it would be superfluous or inappropriate. Second, a criterion should provide guidance for setting a standard of assessment: what level of quality is enough to be

considered as valuable or acceptable. If the standards of judgement are

subjective and easy to be satisfied, the assessment can hardly provide reliable

information to judge the true value of the evaluand or to discriminate between

values of evaluands. Third, the criteria should be complete, non-overlapping,

and commensurable. Omission of an important aspect could result in non-

counting problems in synthesizing the assessments. Likewise, overlapping

criteria could cause double-counting problems. This is essential especially

when the assessments are integrated into an overall rating.

Given the definition of ‘evaluation criteria’, the DAC criteria would

represent the merit of the program being evaluated, encompassing the

properties that constitute a successful intervention. The study finds that the

DAC criteria cover most of the dimensions that general evaluation models

suggest, but with some limitations. The ‘relevance’ criterion primarily

focuses on the policy context and asks ex-ante questions with less attention to

the need of the program or the logical linkage between the program and

expected results. Measuring ‘effectiveness’, defined as the achievement of

program objectives, and ‘efficiency’, e.g., whether completed on time and

within the budget, is largely based on the assumption that the program’s initial

154

plan and objectives are valid. The ‘sustainability’ criterion assumes that the

results are beneficial and worth continuing.

This logical interdependence between criteria may affect assessment in

the criterion at issue because the standard of judgement may have to be adjusted

according to the assessment in another criterion. Whether the stated objectives are achieved, the primary condition for satisfying the effectiveness criterion, may not be an appropriate standard for judging the program's success if those objectives are not valid. Meeting the relevance criterion is therefore a precondition for a meaningful assessment of effectiveness; otherwise, the standard should be re-established. A similar interdependence exists between effectiveness/impact and sustainability. The kinds and magnitude of benefits that can be sustained depend on the analysis under the effectiveness and impact criteria, so the standards of judgement for the sustainability criterion should be drawn accordingly. If the assessments are made separately and integrated into an overall conclusion without thoughtful consideration of their relative importance, there is a risk of bias in the overall results.

The review of 65 ex-post evaluation reports by two Korean aid agencies, KOICA and EDCF, confirms the analysis above. The evaluations under review applied the DAC criteria framework, mostly using the standardized questions and standards of judgement in the guidelines. Some of the questions address what is taken for granted in a development project or do not necessarily require in-depth research. In most cases, assessments in each criterion were made independently, even though some of them should have rested on the information found under other criteria. This leads to high average ratings, especially in the relevance and effectiveness criteria. Eighty-eight percent of the reports concluded that the evaluand was either successful or very successful, even in cases where development results were not deemed significant. The evaluation conclusions are drawn from a simple average of the ratings by criterion, and a serious flaw detected in one criterion, e.g., sustainability, is often compensated by a high rating in another, e.g., relevance. Such mechanical application of the DAC criteria and standardized questions can mislead evaluation results, in most cases towards positive conclusions, making it difficult to understand the true value of the project, to differentiate a more successful project from a less successful one, and to draw valid lessons from the evaluation.

To explain the suspected positive bias in the DAC criteria, this study adopts the criterion used in cost-benefit analysis (CBA), namely the net present value (NPV), as a benchmark. Theoretically, CBA weighs all direct and indirect effects of a project against its costs over a certain period, and examines all the aspects that the DAC criteria suggest. I argued that the NPV, as an indicator that represents the social value of a project, covers the scope of the DAC criteria: the needs addressed (relevance), the benefits in relation to costs (efficiency), the increase in social welfare (effectiveness and impact), and the period over which the benefits are maintained (sustainability). I explained that certain events, which affect the NPV seriously enough to make the overall assessment of a project negative, affect the evaluation results under the DAC criteria only partially, resulting in a more positive assessment than that in CBA. The comparative case studies presented in Chapter 5 show that the DAC criteria framework yields positively biased evaluation results for a project whose net social benefit is relatively small and possibly even negative, giving it the same high level of assessment as another project whose net social benefit is much larger and more stable.
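For reference, the NPV benchmark referred to here can be written in standard CBA notation, with \(B_t\) and \(C_t\) denoting the social benefits and costs in year \(t\), \(r\) the social discount rate, and \(T\) the period over which the project's effects are counted:

\[
\mathrm{NPV} = \sum_{t=0}^{T} \frac{B_t - C_t}{(1+r)^{t}}, \qquad \text{the project being judged worthwhile if } \mathrm{NPV} > 0.
\]

A benefit stream that fails to materialize or ends early thus lowers the NPV directly, whereas under the DAC framework it lowers only some of the five equally weighted ratings.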

To summarize, this study finds that the DAC criteria can produce positive bias in evaluation results, and that this bias is even more evident in practice, as in the cases of the Korean agencies. The positive bias is analyzed conceptually and empirically in comparison with the evaluation standard in CBA, and is found to be attributable to the arbitrary relative weights placed on the DAC criteria. The study also finds that positive bias may occur unevenly between projects, resulting in inconsistency across evaluation results. These findings have significant policy implications, as positive bias and its inconsistent occurrence in evaluation results may seriously weaken the validity of evaluations and consequently mislead the agencies' learning and decision-making, which are the primary purposes of evaluation.

This study contributes to the relatively small body of academic literature on international development evaluation by offering a new perspective on the limitations of the DAC criteria as an evaluation framework. It also enriches the recent discussion on how to improve the DAC criteria by providing both theoretical and empirical analyses.


REFERENCES

ADB. 2013. Cost-Benefit Analysis for Development: A Practical Guide.

ADB. 2016. Guidelines for Preparing a Design and Monitoring Framework.

Mandaluyong City, Philippines: Asian Development Bank.

ADB. 2017. Guidelines for the Economic Analysis of Projects (revised version

of the 1997 edition).

Alkin, Marvin C. 2013a. "Comparing Evaluation Points of View." In

Evaluation Roots: a Wider Perspective of Theorists’ Views and

Influences, edited by Marvin C Alkin, 3-10. Thousand Oaks, CA:

Sage Publications.

Alkin, Marvin C, ed. 2013b. Evaluation Roots: a Wider Perspective of

Theorists’ Views and Influences. 2nd ed. Thousand Oaks, CA: Sage

Publications.

Alkin, Marvin C. 2010. Evaluation Essentials from A to Z. New York:

Guilford Press.

Alkin, Marvin C., Michael Quinn Patton, and Carol H. Weiss. 1990. Debates

on Evaluation. Newbury Park, Calif.: Sage Publications.

ALNAP. 2006. Evaluating Humanitarian Action Using the OECD-DAC

Criteria: An ALNAP Guide for Humanitarian Agencies. London:

Overseas Development Institute.

Astbury, Brad. 2016. "From Evaluation Theory to Tests of Evaluation

Theory?" In The Future of Evaluation: Global Trends, New

Challenges, Shared Perspectives, edited by Reinhard Stockmann and


Wolfgang Meyer, 309-325. London: Palgrave Macmillan UK.

Bamberger, Michael. 1991. "The politics of evaluation in developing

countries." Evaluation and Program Planning 14 (4):325-339.

Bamberger, Michael. 2000. "The Evaluation of International Development

Programs: A View from the Front." American Journal of Evaluation

21 (1):95-102.

Bamberger, Michael. 2009. "Why Do Many International Development

Evaluations Have a Positive Bias?: Should We Worry?" Evaluation

Journal of Australasia 9 (2):39-49.

Bamberger, Michael, Jim Rugh, and Linda Mabry. 2011. RealWorld

Evaluation: Working Under Budget, Time, Data, and Political

Constraints. 2nd ed: SAGE Publications.

Berlage, L., and O. Stokke, eds. 1992. Evaluating Development Assistance:

Approaches and Methods, EADI Book Series 14: Evaluation of Aid.

London: Frank Cass.

Binnendijk, Annette L. 1989. "Donor Agency Experience With the Monitoring

and Evaluation of Development Projects." Evaluation Review 13

(3):206-222.

Boardman, A., D. Greenberg, A. Vining, and D. Weimer. 2011. Cost-Benefit

Analysis: Concepts and Practice. 4th ed. Boston: Pearson Education.

Brent, Robert J. 2006. Applied Cost-benefit Analysis. 2nd ed. Edward Elgar.

Cameron, John. 2011. "Social Cost-Benefit Analysis - Principles." In Valuing

Water, Valuing Livelihoods - Guidance on Social Cost-Benefit

Analysis of Drinking-Water Interventions, with Special Reference to

Small Community Water Supplies, edited by John Cameron, Paul

Hunter, Paul Jagals and Katherine Pond, 199-216. London, UK: IWA


Publishing.

Camfield, Laura, Maren Duvendack, and Richard Palmer-Jones. 2014.

"Things You Wanted to Know about Bias in Evaluations but Never

Dared to Think." IDS Bulletin 45 (6):49-64.

Carden, Fred. 2013. "Evaluation, Not Development Evaluation." American

Journal of Evaluation 34 (4):576-579.

Chelimsky, Eleanor. 1997. "Thoughts for a New Evaluation Society."

Evaluation 3 (1):97-109.

Chianca, Thomaz. 2008. "The OECD/DAC Criteria for International

Development Evaluations: an Assessment and Ideas for

Improvement." Journal of Multidisciplinary Evaluation 5 (9):41-51.

Christie, Christina A, and Marvin C Alkin. 2013. "An Evaluation Theory

Tree." In Evaluation Roots: a Wider Perspective of Theorists’ Views

and Influences, edited by Marvin C Alkin, 11-57. Thousand Oaks,

CA: Sage Publications.

Cracknell, Basil E. 2000. Evaluating Development Aid: Issues, Problems and

Solutions. New Delhi: SAGE Publications.

Cullen, Anne E., and Chris L. S. Coryn. 2011. "Forms and Functions of

Participatory Evaluation in International Development: A Review of

the Empirical and Theoretical Literature." Journal of

MultiDisciplinary Evaluation 7 (16):32-47.

Dale, Reidar. 2004. Evaluating Development Programmes and Projects. 2nd

ed. New Delhi: SAGE Publications.

Davidson, E Jane. 2005. Evaluation Methodology Basics: The Nuts and Bolts

of Sound Evaluation. Thousand Oaks: SAGE Publications.


de Rus, Ginés. 2010. Introduction to Cost-Benefit Analysis: Looking for

Reasonable Shortcuts. Cheltenham, UK: Edward Elgar.

Donaldson, Stewart I. 2007. Program Theory-Driven Evaluation Science:

Strategies and Applications. New York: Psychology Press.

Donaldson, Stewart I., ed. 2013. The Future of Evaluation in Society: A

Tribute to Michael Scriven. Charlotte, NC: Information Age

Publishing.

EDCF. 2012. Water Supply Expansion Project in Juigalpa, Nicaragua - Ex-

post Evaluation Report. Seoul: Export-Import Bank of Korea.

EDCF. 2014. Ex-post Evaluation on Medical Equipment Provision to Ha

Trung District General Hospital in Thanh Hoa Project, Vietnam.

Seoul: Export-Import Bank of Korea.

Fewtrell, Lorna, and John M. Jr. Colford. 2004. Water, Sanitation and

Hygiene: Interventions and Diarrhoea - A Systematic Review and

Meta Analysis. Washington, DC: World Bank.

Forss, Kim, and Sara Bandstein. 2008. "Evidence-based Evaluation of

Development Cooperation: Possible? Feasible? Desirable?" IDS

Bulletin 39 (1):82-89.

Fournier, Deborah M. 1995. "Establishing Evaluative Conclusions: A

Distinction between General and Working Logic." New Directions for

Evaluation 1995 (68):15-32. doi: 10.1002/ev.1017.

Frank, Robert H. 2000. "Why Is Cost‐Benefit Analysis so Controversial?"

The Journal of Legal Studies 29 (S2):913-930.

Guba, Egon G., and Yvonna S. Lincoln. 1989. Fourth Generation Evaluation.

Newbury Park, Calif.: Sage Publications.


Hansen, Mark, Marvin C. Alkin, and Tanner LeBaron Wallace. 2013.

"Depicting the logic of three evaluation theories." Evaluation and

Program Planning 38:34-43.

Heider, Caroline 2017. "Rethinking Evaluation – Have we had enough of

R/E/E/I/S?: After nearly 15 years of adhering to the DAC evaluation

criteria, is it time for a rethink?" World Bank Group Rethinking

Evaluation series. https://ieg.worldbankgroup.org/blog/rethinking-

evaluation.

Hutton, Guy, and Laurence Haller. 2004. Evaluation of the Costs and Benefits

of Water and Sanitation Improvements at the Global Level. Geneva:

World Health Organization.

Hutton, Guy, Laurence Haller, and Jamie Bartram. 2007. "Global Cost-Benefit

Analysis of Water Supply and Sanitation Interventions." Journal of

Water and Health 5 (4):481-502.

IEG. 2017. "Conversations: the Future of Development Evaluation." Last

Modified June 21, 2017, accessed June 25, 2017.

https://ieg.worldbankgroup.org/news/conversations-future-

development-evaluation.

IFAD. 2015. IFAD’s Internal Guidelines – Economic and Financial Analysis

(EFA) of Rural Investment Projects, Vol. 1. Basic Concepts and

Rationale.

IFAD. 2016a. IFAD’s Internal Guidelines – Economic and Financial Analysis

(EFA) of Rural Investment Projects, Vol. 2. Economic and Financial

Analysis of Rural Investment Projects.

IFAD. 2016b. IFAD’s Internal Guidelines – Economic and Financial Analysis

(EFA) of Rural Investment Projects, Vol. 3. Case Studies.


Igarashi, Masahiro, and Omar Awabdeh. 2015. "Weaning from DAC Criteria."

The 5th Biennial International Conference of Sri Lanka Evaluation

Association, Colombo, 15-16 September 2015.

Julnes, George. 2012. "Managing Valuation." New Directions for Evaluation

2012 (133):3-15.

Keeney, Ralph L., and Robin S. Gregory. 2005. "Selecting Attributes to

Measure the Achievement of Objectives." Operations Research 53

(1):1-11.

King, Jean A. 2003. "The Challenge of Studying Evaluation Theory." New

Directions for Evaluation 2003 (97):57-68.

King, Julian. 2015. "Letter to the editor: Use of Cost-benefit Analysis in

Evaluation." Evaluation Journal of Australasia 15 (3):37-41.

King, Julian. 2017. "Using Economic Methods Evaluatively." American

Journal of Evaluation 38 (1):101-113.

KOICA. 2014. Ex-post Evaluation on the Project for the Establishment of an

Early Warning System for Disaster Mitigation in Philippines.

Seongnam: Korea International Cooperation Agency (KOICA).

KOICA. 2016. Ex-Post Evaluation of the Project for the Construction of

Water Supply System in Buon Ho Town, Vietnam. Seongnam: Korea

International Cooperation Agency (KOICA).

Mark, Melvin M. 2008. "Building a Better Evidence Base for Evaluation

Theory: Beyond General Calls to a Framework of Types of Research

on Evaluation." In Fundamental Issues in Evaluation, edited by Nick

L. Smith and Paul R. Brandon. New York: Guilford Press.

Mark, Melvin M., Jennifer C. Greene, and Ian Shaw. 2006. "The Evaluation

of Policies, Programs, and Practices." In Handbook of Evaluation:


Policies, Programs and Practices, edited by Ian Shaw, Jennifer C.

Greene and Melvin M. Mark. London: SAGE.

Markiewicz, Anne, and Ian Patrick. 2016. Developing Monitoring and

Evaluation Frameworks. Thousand Oaks, CA.: SAGE Publications.

Mathison, Sandra, ed. 2005. Encyclopedia of Evaluation. Thousand Oaks:

SAGE Publications.

McDonald, Diane. 1999. "Developing guidelines to enhance the evaluation of

overseas development projects." Evaluation and Program Planning

22 (2):163-174.

Michaelowa, Katharina, and Axel Borrmann. 2006. "Evaluation Bias and

Incentive Structures in Bi- and Multilateral Aid Agencies." Review

of Development Economics 10 (2):313-329.

Miller, Robin Lin, and Donald Campbell. 2006. "Taking Stock of

Empowerment Evaluation - an Empirical Review." American

Journal of Evaluation 27 (3):296-319.

Morra-Imas, L.G., and R.C. Rist. 2009. The Road to Results: Designing and

Conducting Effective Development Evaluations. Washington, DC: World Bank.

OECD. 1991. Principles for Evaluation of Development Assistance.

OECD. 2002. "Glossary of Key Terms in Evaluation and Results Based

Management." OECD DAC Working Party on Aid Evaluation.

http://www.oecd.org/dataoecd/29/21/2754804.pdf.

OECD. 2006. Cost-Benefit Analysis and the Environment: Recent

Developments. Paris: OECD Publishing.

OECD. 2011. Benefits of Investing in Water and Sanitation: An OECD

Perspective. Paris: OECD Publishing.


OECD. 2013. The DAC Network on Development Evaluation – 30 years of

strengthening learning in development. Paris: DAC Network on

Development Evaluation, OECD.

OECD. 2016. Evaluation Systems in Development Co-operation: 2016

Review. Paris: OECD Publishing.

OECD. n.d. DAC Criteria for Evaluating Development Assistance Factsheet.

Accessed November 29, 2012.

Owen, John M., and Patricia J. Rogers. 1999. Program Evaluation: Forms

and Approaches. London: SAGE.

Patton, Michael Quinn. 1994. "Developmental Evaluation." Evaluation

Practice 15 (3):311-319.

Patton, Michael Quinn. 1997. Utilization-Focused Evaluation: the New

Century Text. 3rd ed. Thousand Oaks, CA: Sage Publications.

Raimondo, Estelle, Jos Vaessen, and Michael Bamberger. 2016. "Towards

More Complexity-Responsive Evaluations: Overview and

Challenges." In Dealing with Complexity in Development Evaluation:

A Practical Approach, edited by Michael Bamberger, Jos Vaessen and

Estelle Raimondo, 26-47. Thousand Oaks, CA: SAGE Publications.

Rebien, Claus C. 1996. Evaluating Development Assistance in Theory and in

Practice. Aldershot: Avebury.

Rebien, Claus C. 1997. "Development Assistance Evaluation and the

Foundations of Program Evaluation." Evaluation Review 21

(4):438-460.

Riddell, Roger C. 2007. Does Foreign Aid Really Work? Oxford: Oxford University Press.


Rossi, Peter H., Mark W. Lipsey, and Howard E. Freeman. 2004. Evaluation:

a Systematic Approach. 7th ed. Thousand Oaks, CA: Sage.

Rudd, Murray A. 2009. "Nonmarket Economic Valuation and "The

Economist’s Fallacy"." Journal of MultiDisciplinary Evaluation 6

(11):112-115.

Scott-Little, Catherine, Mary Sue Hamann, and Stephen G. Jurs. 2002.

"Evaluations of After-School Programs: A Meta-Evaluation of

Methodologies and Narrative Synthesis of Findings." American

Journal of Evaluation 23 (4):387-419.

Scriven, Michael. 1976. "Evaluation Bias and Its Control." In Evaluation

Studies Review Annual (Vol. 1), edited by G. V. Glass. Beverly Hills,

CA: Sage.

Scriven, Michael. 1981. The Logic of Evaluation. Edgepress.

Scriven, Michael. 1991. Evaluation Thesaurus. 4th ed. Newbury Park: SAGE

Publications.

Scriven, Michael. 1994. "Evaluation as a Discipline." Studies in Educational

Evaluation 20 (1):147-166.

Scriven, Michael. 1995. "The Logic of Evaluation and Evaluation Practice."

New Directions for Evaluation 1995 (68):49-70.

Scriven, Michael. 2007. The Logic and Methodology of Checklists. Retrieved

June 22, 2017, from The Evaluation Center, evaluation checklists

website: www.wmich.edu/evaluation/checklists.

Scriven, Michael. 2008a. "The Concept of a Transdiscipline: And of

Evaluation as a Transdiscipline." Journal of MultiDisciplinary

Evaluation 5 (10):65-66.


Scriven, Michael. 2008b. "The Economist's Fallacy." Journal of

MultiDisciplinary Evaluation 5 (9):74-76.

Scriven, Michael. 2015. Key Evaluation Checklist.

Shadish, William R., Thomas D. Cook, and Laura C. Leviton. 1991.

Foundations of Program Evaluation: Theories of Practice. Newbury

Park, Calif.: Sage Publications.

Shaw, Ian, Jennifer C. Greene, and Melvin M. Mark, eds. 2006. Handbook

of Evaluation: Policies, Programs and Practices. London: SAGE.

Shusterman, Richard. 1980. "The Logic of Evaluation." The Philosophical

Quarterly (1950-) 30 (121):327-341.

Smith, Nick L. 2010. "Characterizing the Evaluand in Evaluating Theory."

American Journal of Evaluation 31 (3):383-389.

Smith, Nick L., and Paul R. Brandon, eds. 2008. Fundamental Issues in

Evaluation. New York: Guilford Press.

Snell, Michael. 2011. Cost-benefit Analysis: A Practical Guide. 2nd ed.

London: Thomas Telford.

Stake, Robert E. 2004. Standards-Based and Responsive Evaluation.

Thousand Oaks: Sage Publications.

Stockmann, Reinhard. 2013. "The Role of Evaluation in Society." In

Functions, Methods and Concepts in Evaluation Research, edited by

Reinhard Stockmann and Wolfgang Meyer, 8-53. Palgrave Macmillan

UK.

Stockmann, Reinhard, and Wolfgang Meyer, eds. 2016. The Future of

Evaluation: Global Trends, New Challenges, Shared Perspectives.

London: Palgrave Macmillan UK.


Stufflebeam, Daniel L. 2003. "The CIPP Model for Evaluation." In

International Handbook of Educational Evaluation, edited by Thomas

Kellaghan and Daniel L. Stufflebeam, 31-61. Dordrecht: Springer

Netherlands.

Stufflebeam, Daniel L. 2007. CIPP Evaluation Model Checklists. 2nd ed.

Stufflebeam, Daniel L., and Chris L. S. Coryn. 2014. Evaluation Theory,

Models, and Applications. 2nd ed. San Francisco, CA: Jossey-Bass.

Van Den Berg, Rob D. 2005. "Results Evaluation and Impact Assessment in

Development Co-operation." Evaluation 11 (1):27-36.

Vedung, Evert. 2010. "Four Waves of Evaluation Diffusion." Evaluation 16

(3):263-277.

Waddington, Hugh, Birte Snilstveit, Howard White, and Lorna Fewtrell. 2009.

Water, Sanitation and Hygiene Interventions to Combat Childhood

Diarrhoea in Developing Countries. New Delhi, India: 3ie.

Watts, Brad R. 2008. "Understanding Opportunity Costs and the Economist’s

View: A Response to Scriven's "The Economist's Fallacy"." Journal

of MultiDisciplinary Evaluation 5 (10):89-92.

Weiss, Carol H. 1998. Evaluation: Methods for Studying Programs and

Policies. Upper Saddle River, NJ: Prentice Hall.

World Bank. 2010. Cost-Benefit Analysis in World Bank Projects.


APPENDIX. List of Evaluation Reports Reviewed in Chapter 4

1. KOICA

N | Title of Evaluation | Size (mil USD) | Sector

2015
1 | Ex-post Evaluation on the Project for the Development of the Vocational Training Capacity in Uzbekistan | 4.0 | education
2 | Ex-post Evaluation on the Project for the Establishment of the Korea-Nepal Institute of Technology in Butwal | 5.7 | education
3 | Ex-Post Evaluation Report on the Project for the Improvement of Blood Bank in Irbid Jordan | 3.1 | health
4 | Ex-Post Evaluation on the Projects for Management of Mercury Waste in Egypt | 3.0 | energy
5 | Ex-Post Evaluation on the Projects for Construction of Municipal Solid Waste Recycling Facility for Ulaanbaatar City, Mongolia | 3.5 | energy
6 | Ex-post Evaluation on the Project for Solar-powered Irrigation Pump and Solar Home System in Bangladesh | 2.5 | energy
7 | Ex-post Evaluation on the Project for Integrated Community Development in Comilla, Bangladesh | 3.5 | agriculture

2014
8 | Ex-Post Evaluation on the Projects for Developing and Publishing Textbooks for Upper Secondary Schools in Lao PDR | 3.0 | education
9 | The Project for the Establishment of Bangladesh-Korea ICT Training Center for Education (BKITCE) | 1.4 | education
10 | Project for the Establishment of a Morocco-Korean ICT Training Center for Moroccan Teachers (CMCF TICE) | 3.0 | education
11 | Ex-Post Evaluation on the Projects for Improving the Korea-Peru Health Center in Bellavista, Callao, Peru | 2.0 | health
12 | Ex-Post Evaluation on the Projects for Public Health Service Improvement for Mother and Child in El Alto, Bolivia | 1.3 | health
13 | Ex-Post Evaluation on the Projects for Improving Maternal and Child Health Care Services in Paraguay | 3.3 | health
14 | Ex-Post Evaluation on the Projects for Construction of Maternal Homes for Maternal and Child Health in El Salvador | 2.0 | health
15 | Ex-Post Evaluation on the Project for the Establishment of Lao-Korea National Children's Hospital | 3.5 | health
16 | Ex-Post Evaluation on the Project for the Establishment of the Central General Hospital in Quang Nam Province, Vietnam | 35.0 | health
17 | Ex-Post Evaluation on the Project for Informatization of the Central State Archives in Uzbekistan | 3.0 | e-government
18 | Ex-Post Evaluation on the Project for the Establishment of a Pilot e-Procurement System in Mongolia | 4.6 | e-government

2013
19 | Ex-Post Evaluation on the Project for the Automation of Intellectual Property Administration in Mongolia | 3.1 | e-government
20 | Ex-Post Evaluation on the Project for Modernization of Tanzania Customs Administration | 3.3 | e-government
21 | Ex-Post Evaluation on the Project for the Establishment of the Emergency Response System in Sri Lanka | 2.0 | e-government
22 | Ex-Post Evaluation on the Project for the Construction of Storm Water Drainage at Valachchenai in Sri Lanka | 3.9 | energy
23 | Ex-Post Evaluation on the Projects for the Establishment of an Early Warning System for Disaster Mitigation Phase I in Philippines | 1.0 | disaster prevention
24 | Ex-Post Evaluation on Project for the Establishment of Early Warning and Response System for Disaster Mitigation Phase II in Metro Manila | 3.0 | disaster prevention
25 | Ex-Post Evaluation on the Project for the Grid Connected Photovoltaic Power Generation in Sri Lanka | 3.0 | energy
26 | Ex-Post Evaluation on the Project for the Establishment of Hybrid <PV/Diesel/Batteries> Power System | 2.2 | energy
27 | Ex-Post Evaluation on the Project for Construction of Siem Reap Bypass Road in Cambodia | 17.4 | transportation
28 | Ex-Post Evaluation on the Project for the Integrated Rural Development in Arsi Zone, Ethiopia | 1.9 | agriculture
29 | Ex-post Evaluation Report on the Project for Upgrading Auto-Maintenance Vocational Training Center in Embaba, Guiza | 2.0 | education
30 | Ex-post Evaluation Report on Program for the Improvement for the Automotive Vocational Training System | 5.0 | education
31 | Ex-post Evaluation Report on the Project for the Establishment of Industrial Training Center in Thagaya, Myanmar | 2.3 | education
32 | Ex-post Evaluation Report on the Project for Upgrading Jaffna Technical College as a College of Technology | 2.3 | education
33 | Ex-post Evaluation Report on the Project for Establishment of Korea-Vietnam Friendship IT College in Danang | 10.0 | education
34 | Ex-post Evaluation Report on the Hlegu Township Rural Development Project in Myanmar | 2.0 | agriculture
35 | Ex-post Evaluation Report on the Project for Irrigation Technology Capacity Building in Upper Myanmar | 2.0 | agriculture
36 | Ex-post Evaluation Report on the Project for Batheay Flood Control in Cambodia | 2.0 | disaster prevention
37 | Ex-post Evaluation Report on the Project for Construction of Irrigation System in Batheay, Cambodia | 2.5 | agriculture
38 | Ex-post Evaluation Report on the Project for Modernization of the Traffic Management System in Erbil | 5.0 | transportation
39 | Ex-post Evaluation Report on the Project for Busuanga Airport Development in the Philippines | 3.0 | transportation
40 | Ex-Post Evaluation on the Project to Reduce Air Pollution by Improving Heating Culture in Ulaanbaatar, Mongolia | 0.7 | environment
41 | Ex-post Evaluation Report on the Project for Improving the District Heating and Water Supply System in Ulaanbaatar, Mongolia | 5.0 | energy
42 | Ex-post Evaluation Report on the Project for Improving Heat Supply System in Khorezm, Uzbekistan | 3.5 | energy
43 | Ex-post Evaluation Report on the 3rd Phase of the Project for Upgrading the Korea-Vietnam Friendship Clinic in Hanoi, Vietnam | 1.3 | health
44 | Ex-post Evaluation Report on the Program for the Improvement of Maternal and Child Health in Chimaltenango, Guatemala | 3.0 | health
45 | Ex-post Evaluation Report on the Project for the Improvement of Maternal and Neonatal Health in Guatemala | 1.5 | health
46 | Ex-post Evaluation Report on the Project for Establishment of an E-procurement Pilot System in Vietnam | 3.0 | e-government
47 | Ex-post Evaluation Report on the Project for Modernization of Communication and Information System of the State Ministries of the Republic of Paraguay | 2.5 | e-government

2. EDCF

N | Title of Evaluation | Size (mil USD) | Sector

2015
48 | Ex-post Evaluation on the Improvement of Padeniya-Anuradhapura Road Project in Sri Lanka | 66.0 | transportation
49 | Ex-Post Evaluation on the Procurement of Locomotive Project Phase III in Bangladesh | 28.0 | transportation
50 | Ex-post Evaluation on the Creation of Capabilities in Vocational Training Centers Project in Nicaragua | 12.6 | education
51 | Ex-post Evaluation on the Myanmar Basic e-Government Project | 20.0 | e-government
52 | Ex-post Evaluation on the Hospitals Modernization Projects (BIH-001) in Bosnia and Herzegovina | 20.0 | health
53 | Ex-post Evaluation on the Hospitals Modernization Projects (BIH-002) in Bosnia and Herzegovina | 50.0 | health

2014
54 | Ex-Post Evaluation on Indonesia Manado By-Pass Project I | 10.0 | transportation
55 | Ex-Post Evaluation on Bolivia Pailón-San José Highway Construction Project (Component 2) | 23.0 | transportation
56 | Ex-post Evaluation on Power Sector Development Project, Sri Lanka | 7.5 | energy
57 | Ex-post Evaluation on Power Distribution Improvement Project, Myanmar | 16.8 | energy
58 | Ex-post Evaluation on Medical Equipment Provision to Ha Trung District General Hospital in Thanh Hoa Project, Vietnam | 3.0 | health
59 | Ex-post Evaluation on Medical Equipment Supply to Lai Chau Provincial General Hospital Project, Vietnam | 10.0 | health

2013
60 | Ex-post Evaluation on Transmission Line and Substation Project in Luzon, the Philippines | 4.7 | energy
61 | Ex-post Evaluation on Mindanao Power Transmission Project in the Philippines | 9.7 | energy
62 | Ex-post Evaluation on GSO Road Expansion and Emergency Dredging Project in Philippines | 21.8 | transportation
63 | Ex-post Evaluation on National Road No.3 Rehabilitation Project in Cambodia | 36.7 | transportation
64 | Ex-post Evaluation on Upgrading of Niyagama National Vocational Training Center Project | 8.5 | education
65 | Ex-post Evaluation on Re-engineering Government Component of e-Sri Lanka Project | 15.0 | e-government


ABSTRACT IN KOREAN

A Critical Review of the OECD/DAC Development Evaluation Criteria Applying a Cost-Benefit Analysis Framework

Since their adoption in 1991, the evaluation criteria of the OECD Development Assistance Committee (OECD/DAC) have served as the most influential framework for evaluating development assistance projects. Consisting of five criteria, namely relevance, effectiveness, efficiency, impact, and sustainability, the DAC criteria have been institutionalized as a standard by most international organizations and donor agencies and have come to be regarded as the requirements a successful development cooperation project should meet.

This study focuses on the possibility that evaluations applying the five OECD/DAC criteria may produce results that are more positive than warranted, analyzes this possibility theoretically from the perspective of cost-benefit analysis, and confirms it through case studies. Following the three stages of the general logic of evaluation applied in evaluation theory, namely 1) establishing the criteria that define the value of the evaluand, 2) setting the standard of judgement for each criterion and assessing against it, and 3) synthesizing the assessments by criterion into a final evaluative conclusion, the DAC criteria are compared with the net present value (NPV), the benchmark used in cost-benefit analysis.

Compared with major evaluation theories, the DAC criteria cover the elements that an ex-post evaluation should address and, by providing standardized criteria and questions reflecting the characteristics of development cooperation projects, enable comprehensive and consistent evaluation management from the donor's perspective. However, because their concepts and definitions are broad and unclear, they are limited in defining the value and importance of a development project. Viewed as a single evaluation framework, the DAC criteria are interdependent and partly overlapping: judgements on effectiveness and efficiency are meaningful only when relevance is secured, and sustainability presupposes that effectiveness and impact are satisfied, that is, that the project's results are positive. If each criterion is assessed independently and overall success is judged on that basis, there is a risk that a positive conclusion will ultimately be drawn. A review of all 65 ex-post evaluation reports produced over three years (2013-2015) by KOICA and EDCF, Korea's aid agencies, shows that these limitations also appear in practice.

That evaluations applying the five DAC criteria can induce positive bias becomes clearer in light of the standard used in cost-benefit analysis to judge a successful project. This study comparatively analyzed the two sets of evaluation results for water supply projects implemented in Nicaragua and Vietnam. The comparison confirms that, under the DAC framework, a project with little or no social value may be evaluated as successful, and that because this positive bias arises inconsistently with the projects' actual effects, it becomes difficult to differentiate objectively between more and less successful projects. This is attributable to the arbitrary application of equal weights when synthesizing the assessments by criterion, given that the five criteria differ in how demanding they are and that their relative importance varies from project to project.

The academic contribution of this study lies in suggesting how cost-benefit analysis, as a conceptual framework for measuring the value of a project, can be applied to the ex-post evaluation of development projects, and in identifying, by incorporating the cost-benefit analysis framework, the aspects of the DAC criteria that need to be complemented.

Keywords: DAC evaluation criteria, cost-benefit analysis, development evaluation, evaluation theory, evaluation methods, positive bias