
School performance feedback systems:

Design and implementation issues

Goedele Verhaeghe

Supervisor: Prof. Dr. Martin Valcke

Dissertation submitted to obtain the academic degree of Doctor of Educational Sciences

2011

This Ph.D. research project has been funded by:

The Agency for Innovation by Science and Technology (IWT), grant number SBO 50194 (School Feedback Project)

FOREWORD

Those that can, do research.

Those that cannot, teach.

Those that cannot teach, teach teachers.

Those that cannot teach teachers, do educational research.

(Anonymous, source: www.vob-ond.be)

This quotation, which a natural scientist once forwarded to me with the best of intentions, will raise the eyebrows of many a colleague. I cannot deny that I had my doubts about this statement during my doctoral trajectory. What exactly does doing research mean? For me, it covered many things. That I learned a great deal from this search is certain. I also hope to have contributed to the research literature. A word of thanks is therefore in order for the opportunities I was given in this learning process.

First of all, I thank my supervisor, Prof. Dr. Martin Valcke, and the project coordinator and my office mate, Dr. Jean Pierre Verhaeghe, for the necessary support and appreciation. I also thank the faculty, the university and the IWT for the opportunities they offer to young people.

A further word of thanks goes to everyone who made this doctoral research practically possible. My colleague from Antwerp, Prof. Dr. Jan Vanhoof, deserves more than an honourable mention. Then there are, of course, the students and the school leaders from whom I obtained my data. Without their cooperation there would be nothing to investigate. I want to thank the members of my guidance committee (Prof. Dr. Peter Van Petegem, Prof. Dr. Patrick Onghena, Jean Pierre and Martin) and the journal reviewers, because their constructive comments lifted my work to a higher level.

Furthermore, I would like to thank all my colleagues for the pleasant atmosphere in the department. For every state of mind there was a listening ear. Although everyone was busy with their own work, there was always room for a pleasant and interesting conversation.

Finally, I would like to dedicate this dissertation to three important men in my life. To you, Dad, because you impressed on us: “If you do something, do it well”. I hope I have succeeded in this. Collin, thank you for “everything”, a word that encompasses more than most people ever receive from anyone in their lives. And little Remi, ma vraie joie de vivre, if anyone can conjure up a smile, it is you.

Goedele
Ghent, December 2010

TABLE OF CONTENTS

CHAPTER 1: GENERAL INTRODUCTION
1. Introduction: Moving forward by looking backward
2. Conceptual framework for School Performance Feedback Systems
3. Research context: Each school its own mirror
4. Problem statement
5. Dissertation overview: Purpose, research questions and research design
References

CHAPTER 2: CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS
Abstract
1. Introduction
2. Conceptual framework
3. Method
4. Results: Application of the framework
5. Discussion
6. Conclusion
References

CHAPTER 3: PERCEPTIONS OF PRIMARY SCHOOL PRINCIPALS ABOUT SCHOOL PERFORMANCE FEEDBACK USE
Abstract
1. Introduction
2. Theoretical framework
3. Research questions
4. Research context
5. Research design
6. Findings and discussion
7. Implications, limitations and conclusion
References

CHAPTER 4: VALUE-ADDED RESULTS OF SCHOOLS: HOW TO REPRESENT SCHOOL FEEDBACK INFORMATION
Abstract
1. Introduction
2. Method
3. Results and discussion
4. General discussion and conclusion
References

CHAPTER 5: THE INFLUENCE OF COMPETENCES AND SUPPORT ON SCHOOL PERFORMANCE FEEDBACK USE
Abstract
1. Introduction and research questions
2. Theoretical framework
3. Methodology: research design, procedure and research instruments
4. Results
5. Conclusion and discussion
References

CHAPTER 6: EFFECTEN VAN ONDERSTEUNING BIJ SCHOOLFEEDBACKGEBRUIK [EFFECTS OF SUPPORT ON SCHOOL FEEDBACK USE]
Abstract
Summary
1. Problem statement
2. Conceptual framework
3. Method
4. Results
5. Discussion and conclusion
References

CHAPTER 7: GENERAL DISCUSSION AND CONCLUSION: FEEDBACK ON FEEDBACK
1. Introduction
2. Overview of research objectives and main findings
3. General discussion: “Mirror, mirror on the wall”
4. Limitations of the studies and directions for future research
5. Implications of the results
6. Final conclusion
References

NEDERLANDSTALIGE SAMENVATTING [SUMMARY IN DUTCH]
1. Introduction
2. Conceptual framework
3. The School Feedback Project: A mirror for every school
4. Research objectives and design
5. Main findings
6. Conclusion
References

RESEARCH VALORISATION: PUBLICATIONS

CHAPTER 1: GENERAL INTRODUCTION∗

1. Introduction: Moving forward by looking backward

“There was a time in education when decisions were based on the best

judgements of the people in authority. It was assumed that school and

district leaders, as professionals in the field, had both the responsibility and

the right to make decisions about students, schools and even about

education more broadly. They did so using a combination of intimate and

privileged knowledge of the context, political savvy, experience and logical

analysis. Data played almost no part in decisions. Instead, leaders relied on

their tacit knowledge to formulate and execute plans. In the past several

decades, a great deal has changed. Accountability has become the

watchword of education and data hold a central place in the current wave

of large-scale reform. At the same time, school leaders find themselves

faced with challenges that are ill structured with more than a single, right

answer that demand reflective judgements (King & Kitchener, 1994);

judgements that require them to have knowledge and understanding in

relationship to context and evidence. School leaders are caught in the nexus

of accountability and improvement, trying to make sense of the role that

data can and should play in school leadership.” (Earl & Fullan, 2003, p 383)

In recent years, the trend of decentralizing educational systems has spurred researchers to focus on school-based management and internal evaluation. Because schools are granted autonomy, governmental bodies expect them to be accountable for continuously monitoring their internal quality policy and improving their functioning (Hofman, Dijkstra, & Hofman, 2009; Leithwood, Aitken, & Jantzi, 2006; Nevo, 2002). As public institutions, schools are required to report on the resources invested in them. Besides this external drive, schools as learning organizations are supposed to systematically gather data on their functioning for self-evaluation purposes. The underlying idea is that schools, as continuously developing organisms, need to adapt to and interact with their constantly changing environment (Earl & Fullan, 2003). “Moving forward by looking backward” characterizes this cyclic process. In this context, the current and past performance level of a school serves as a starting point for developing future plans and educational targets. Related buzzwords in current educational jargon are data-driven decision making, school accountability, school improvement and value added. Many of these terms are derived from the management literature, which stresses the function of schools as professional organizations.

∗ Based on:
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and School Improvement, 21(2), 167-188.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J. P., Valcke, M., & Van Petegem, P. (in press). The influence of competences and support on school performance feedback use. Educational Studies.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten van ondersteuning bij schoolfeedbackgebruik. Manuscript submitted for publication in Pedagogische Studiën.
Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of schools: How to represent school feedback information. Manuscript submitted for publication in The Journal of Educational Research.

In order to make proper decisions, schools need to be informed about their functioning. Besides the experiences, intuitions and impressions of school staff, several data sources are embedded in the school’s self-evaluation process. Schools use not only their own class tests, school questionnaires, class lists, inspection reports and the like, but also school performance feedback provided by specific systems. These so-called school performance feedback systems are specifically designed to provide schools with confidential information on their functioning. They follow the trend of data-driven school improvement by meeting schools’ need for accessible, information-rich environments. Several local initiatives have been developed and implemented worldwide. However, little is yet known about the impact of these systems on schools’ functioning and performance (Coe & Visscher, 2002; Schildkamp, 2007; Schildkamp, Visscher, & Luyten, 2009; Visscher & Coe, 2003). Therefore, the impact of these feedback interventions is an interesting niche to examine in educational research. It is worthwhile to consider not only possible school improvement effects, but also the intended, unintended, desired and undesired outcomes. Furthermore, before looking at the final outcomes, a closer look at the process of feedback use, including the influencing (f)actors, is warranted.

2. Conceptual framework for School Performance Feedback Systems

2.1. Data-driven school improvement

Data-driven decision making

Data-driven/-based decision making or data-driven school improvement can be defined as “systematically analyzing existing data sources within the school, applying outcomes of analyses to innovate teaching, curricula, and school performance, and, implementing (e.g. genuine improvement actions) and evaluating these innovations” (Schildkamp & Kuiper, 2010, p 482). Gathering data in order to continuously improve school actions is the central goal of data-driven school improvement. This continuous movement is a cyclic process, illustrated by several models described in the educational management literature. A generic model, often applied in educational contexts, is the Shewhart or Deming cycle (Deming, 1986), also known as the PDCA cycle: the P refers to the Planning phase, which is followed by Doing, Checking and Acting. Specifically for data-driven decision making, these elements recur in several data-use models in both the practical and the research literature (Abbott, 2008; Learning Point Associates, 2004; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010; Mandinach, Honey, Light, & Brunner, 2008; Zupanc, Urank, & Bren, 2009). In these models, data are used to provide information on the functioning of a school, to set goals and make sound decisions for improvement, and to evaluate the outcomes of these improvement actions.

School accountability and improvement

Most literature on data use stems from studies in the United States, and thus from contexts in which school accountability has traditionally been stressed (e.g., Teddlie, Kochan, & Taylor, 2002). Recent studies are often situated within an educational context in which setting high standards and establishing measurable goals is believed to improve individual outcomes in education, as illustrated by the No Child Left Behind Act (e.g., Schildkamp & Teddlie, 2008). Therefore, most of these studies report on assessment data only. However, in recent years, more studies on data-driven decision making for school improvement have been published (e.g., Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren, 2009). Within these studies, data-based decision making is conceptualized in a broader sense, not merely focusing on improving student outcomes and assessment data. Several data sources are integrated as a basis for decisions, such as self-evaluation data, results of school and student surveys, school inspection data, etc. (Schildkamp & Kuiper, 2010).

Systems providing data for the purpose of school accountability are referred to as official accountability systems (Tymms, 1999). In a context in which schools are held accountable for publicly funded activities, external agencies generate data on the schools’ functioning to inform diverse stakeholders about the return on investment. In contrast to these data systems, professional monitoring systems generate data for voluntary and internal use by schools (Tymms, 1999). Therefore, these monitoring systems are more in accordance with data-driven school improvement.

At first sight, these two motives for data use appear to be opposed. However, several studies illustrate the complementary and interacting position of school accountability and improvement (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank, & Bren, 2009). For example, assessment data can be used in public rankings for accountability, while the same data are also used internally within the school after secondary analyses have been performed on them (e.g., adjustment for pupil background characteristics, calculation of value added). The resulting improvement actions are then considered to contribute to better pupil performance, which will be measured in the following assessment. This can be considered as internal evaluation in the service of external evaluation (Vanhoof & Van Petegem, 2007). Conversely, systems especially designed to provide schools with confidential information can be of interest to school inspectorates seeking insight into the school’s functioning. If these inspectors act as critical friends, supporting the school as a learning organization, this external evaluation functions in the service of internal evaluation (Vanhoof & Van Petegem, 2007).
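As an illustration of the kind of secondary analysis mentioned above (adjusting for pupil characteristics and calculating value added), the following minimal Python sketch regresses pupils’ outcome scores on prior attainment and averages the residuals per school. It is a sketch under stated assumptions only: the pupil records, the names `PUPILS`, `fit_line` and `school_value_added`, and the single-predictor least-squares model are all hypothetical and are not the procedure used in the studies reported here. Operational feedback systems typically rely on multilevel models with several background covariates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pupil records: (school, prior_score, outcome_score).
PUPILS = [
    ("A", 40, 52), ("A", 50, 61), ("A", 60, 70),
    ("B", 40, 44), ("B", 50, 53), ("B", 60, 62),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def school_value_added(pupils):
    """Return the mean residual per school after adjusting for prior score.

    A positive value means pupils in that school scored, on average,
    higher than expected given their prior attainment (assumed model).
    """
    intercept, slope = fit_line(
        [prior for _, prior, _ in pupils],
        [outcome for _, _, outcome in pupils],
    )
    residuals = defaultdict(list)
    for school, prior, outcome in pupils:
        expected = intercept + slope * prior
        residuals[school].append(outcome - expected)
    return {school: mean(values) for school, values in residuals.items()}


if __name__ == "__main__":
    for school, value in sorted(school_value_added(PUPILS).items()):
        print(f"School {school}: value added = {value:+.2f}")
```

With the made-up records above, school A ends up roughly four points above the expected outcome and school B roughly four points below it, even though both schools serve pupils with identical prior scores. The same raw assessment data could thus feed a public ranking and, after such an adjustment, a confidential report for internal use.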

2.2. Performance indicator systems and school performance feedback systems

Performance indicators

To collect data on the schools’ performance and functioning, official accountability and professional monitoring systems make use of performance indicators. Goldstein and Spiegelhalter define a performance indicator as “a summary statistical measurement on an institution or system which is intended to be related to the ‘quality’ of its functioning” (1996, p 385). According to Rowe and Lievesley, these performance indicators serve as “data indices of information by which the functional quality of institutions or systems may be measured and evaluated” (2002, p 1). Fitz-Gibbon & Tymms (2002) emphasize the systematic character of using performance indicators, noting that these indicators are collected at regular intervals to monitor a system’s performance. The content of these performance indicators covers not only the output results of schools, but also input, process and context information. These can include indicators on resource provision and funding, participation rates of pupils, repetition rates, class sizes, factors affecting students’ progress rates, etc. (Rowe & Lievesley, 2002).

To successfully serve schools in their data-driven school improvement, these indicators have to meet certain requirements. First, feedback needs to be relevant and useful (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002). Relevant feedback corresponds to the actual information needs of the users (Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). Furthermore, feedback needs to be accurate, which refers to the reliability and validity of the data gathered (Fitz-Gibbon, 1996; Heck, 2006; Rowe & Lievesley, 2002). Next, the cost-effectiveness of the indicator system is an important consideration (Fitz-Gibbon, 1996; Rowe & Lievesley, 2002). Related to this utility perspective, the performance indicators should be delivered in a timely manner, which refers to both the currency and the punctuality of the delivered feedback (Fitz-Gibbon, 1996; Heck, 2006; Rowe & Lievesley, 2002; Visscher, 2002). Furthermore, users need to accept the performance indicators and consider them to be fair. This fairness refers not only to the striving towards unbiased results (Heck, 2006), but also to the interpretability, reliability, stability and incorruptibility of the reported performance indicators (Fitz-Gibbon, 1996). Lastly, performance indicators should strive towards beneficial effects and should avoid unwarranted harm (Fitz-Gibbon, 1996; Fitz-Gibbon & Tymms, 2002; Goldstein & Myers, 1996).

School Performance Feedback Systems (SPFSs)

A particular type of performance indicator system is the School Performance Feedback System (SPFS). SPFSs “are information systems external to schools that provide them with confidential information on their performance and functioning as a basis for school self-evaluation” (Visscher & Coe, 2002, p xi). SPFSs primarily aim at supporting school improvement and internal quality policy.

The different components of this definition require some explanation.

• The systemic organization of the feedback initiative: The feedback providers are bound to an organization and produce school performance feedback not as a one-shot activity but on a systematic basis. In line with the description of performance indicators by Fitz-Gibbon & Tymms (2002), data are collected at regular intervals to monitor a system’s performance.

• The external component: SPFSs are external systems that offer their services to schools. These services include data gathering, analysis and reporting, and sometimes support in using the feedback provided.

• The goal of school improvement: This implies that SPFS developers provide the school with performance feedback on a confidential basis, in contrast with information made public for accountability reasons. By generating data for voluntary use by schools, SPFSs are considered to be professional monitoring systems (Tymms, 1999).

• The unit level of information: SPFSs offer feedback on the schools’ functioning, which contains school-level information. Therefore, aggregation of individual pupil results is required.

• The content of the feedback: The content refers to the schools’ performance and functioning. A school’s functioning encompasses more than output results; it also covers context, input and process indicators.

• The confidential character of the data: The development and discussion of SPFSs have received increasing attention as more studies are published on the confidential use of information within schools. In reaction to the negative consequences of making results public (e.g. measure fixation, misinterpretation; Smith, 1995), educational governments support research initiatives that promote low-stakes testing. Instead of competition and external drives for school improvement, the inherent need to monitor quality in school functioning is stimulated by providing confidential information.

• The focus on feedback: In accordance with feedback intervention theories, data delivered by SPFSs are intended to reduce the gap between the intended and actual performance of actors (Black & Wiliam, 1998; Hattie & Timperley, 2007). For feedback to be effective, certain conditions regarding the task performed, the feedback itself and the situation need to be fulfilled (Kluger & DeNisi, 1996). Applied to SPFSs, this implies that outcomes of feedback use will be determined by characteristics of the feedback reports, the educational context and the users. These are briefly discussed in the next section.

The research domain of school performance feedback systems is recent and rather unexplored. Studies on the actual use of these feedback systems and their impact on the schools’ functioning are especially scarce. Therefore, a firm overall theoretical framework for describing and evaluating school performance feedback use and its effects is lacking.

2.3. Factors influencing school performance feedback utilization

Differences in the use of school feedback can be attributed to a variety of factors. The most commonly used framework is that of Visscher (2002; Visscher & Coe, 2003). It has been applied in several studies on data use (e.g., in Maier, 2010; Schildkamp & Teddlie, 2008; Schildkamp & Visscher, 2009; Verhaeghe et al., 2010; Zupanc, Urank, & Bren, 2009). This framework discerns four sets of factors influencing the use of performance feedback: the design process, the features of the underlying SPFS, the implementation process and the school’s organizational features. This framework served as a basis for the studies conducted in this dissertation, although some adaptations were made. Visscher and Coe embed the process of feedback use in the broader school environment, which we define as context-related factors. Furthermore, we distinguish support-related factors as a separate set instead of placing them within the implementation process and the characteristics of the feedback system. As a result, the following sets of influential factors are outlined: factors related to the educational context, to the school and its users, to SPFSs, and to support. As this framework will be described further on in this dissertation, we only outline its main components and ideas here.

The educational context of SPFSs

Context-related factors that impact feedback use include school policy strategies at the regional and/or governmental level (Sun, Creemers, & De Jong, 2007; Visscher, 2002). For instance, policies can contain clear expectations that schools make use of feedback information. Educational governments can stimulate feedback use by pressure and/or support (Visscher, 2002). Furthermore, feedback will be used differently depending on the context in which accountability and/or improvement play a role (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Visscher, 2002; Zupanc, Urank, & Bren, 2009). Moreover, data use will depend on the accessibility of data sources. In Flanders, for example, there is no central examination system. This means there is no public reporting on school examination results and almost no high-stakes testing, in contrast to the educational context in the UK. Therefore, the data culture and the related data sources in English schools will clearly differ from those in Flemish schools. Educational inspectorates, in their role as guardians of quality and critical friends, may also promote the use of data (Vanhoof & Van Petegem, 2007). For example, Flemish schools are encouraged to inform the inspectorate about their functioning by means of output results. Depending on the prescriptions and expectations of these inspections, certain types of data use will be promoted.

Users of SPFSs

School- and user-related characteristics are also key variables explaining differences in school feedback use. Schildkamp and Kuiper (2010) identify the following school characteristics as important influences on data use: the style of school leadership, the degree of teacher collaboration, the shared vision, norms and goals for data use, the time available to use data, the training provided for data management and use, the designated data expert in the school, and the pressure and support for using data. Furthermore, school performance levels also influence feedback use (Visscher, 2002; Visscher & Coe, 2003). Schools receiving positive feedback (large value added) will discuss the results differently from schools receiving a less positive picture (Schildkamp, 2007). In line with control theory, participants receiving negative feedback are more likely to make an effort to reduce the discrepancy between the negative feedback and the expected standards (Kluger & DeNisi, 1996). This will result in different policy implications. However, this theory does not hold in all cases; it is not unusual for school principals to withhold feedback information that does not fit the current policy plan (Van Petegem & Vanhoof, 2004).

Considering the personal characteristics of the feedback users, we first refer to the motivation and attitudes to use an SPFS. Motivation varies from internal quality development or external accountability to policy preparation (Liket, 1992; van Aanholt & Buis, 1990). According to Bosker, Branderhorst, and Visscher (2007), a negative attitude towards SPF is one of the main obstacles in the use of feedback information. Attitude is the most significant aspect determining a person’s willingness to invest time and energy in dealing with information (Williams & Coles, 2007) and the users’ belief that they need the data in order to improve education (Schildkamp & Kuiper, 2010). Furthermore, previous experiences with feedback use, general experience with school-related data, and the statistical knowledge and skills needed to interpret feedback reports will also influence feedback use. This data literacy “encompasses the strategies, skills and knowledge needed to define information needs, and to locate, evaluate, synthesize, organize, present and/or communicate information as needed” (Williams & Coles, 2007, p 188). Whereas most teachers have experience with school test data, pupil monitoring systems, and self-evaluations, in several studies school staff report that they lack the skills and confidence to use data for school policy purposes (Earl & Fullan, 2003; Kerr, Marsh, Ikemoto, Darilek, & Barney, 2006; Saunders, 2000; Williams & Coles, 2007). Data literacy is a condition for being able to convert data into valuable and usable information (Earl & Fullan, 2003). The current lack of know-how on making use of the information is an important obstacle (Kerr et al., 2006; Saunders, 2000; Van Petegem & Vanhoof, 2004; Williams & Coles, 2007). In addition to a lack of the capacities needed to interpret the data, there is often a lack of well-developed research skills, such as the formulation of research questions and hypotheses (Earl & Fullan, 2003; Herman & Gribbons, 2001; Kerr et al., 2006).

Characteristics of feedback reports and underlying SPFS

It is mainly the users’ perception of the characteristics of the feedback (system), rather than the characteristics themselves, that determines how feedback will be used (Visscher, 2002). Therefore, we refer to the quality characteristics of performance indicators outlined before (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). Consistent with our definition of SPFSs, feedback systems for school improvement should guarantee confidentiality and anonymity to the subjects and schools. At the level of content, feedback should be perceived as relevant, non-threatening, and corresponding to the actual informational needs (Schildkamp & Teddlie, 2008; Van Petegem & Vanhoof, 2007; Visscher, 2002). Information should also be up-to-date, reliable, and valid (Schildkamp & Teddlie, 2008; Visscher, 2002; Visscher & Coe, 2003). In terms of ethical issues, feedback should at least do no harm (Fitz-Gibbon & Tymms, 2002). For example, in some cases feedback can be threatening to the recipients’ self-esteem, particularly in a system of accountability (Visscher & Coe, 2003). Moreover, feedback should not harm subjects or schools on the basis of misleading information (Goldstein & Myers, 1996).

Features of both the feedback reports and the underlying feedback system influence the outcomes of feedback use. No detailed frameworks or descriptions of these different components have been published. Research on the variety in school performance feedback systems is lacking. The publication of Visscher & Coe (2002) is the first overview of some SPFSs worldwide, but no detailed comparative study has been performed. Furthermore, the question of why feedback systems have been developed in a certain way remains unanswered. More information is required on the rationales of feedback designers for opting for certain features.

Support in using SPFSs

Considering the lack of data literacy skills, school feedback users request support, not only when interpreting the data, but also for the further steps in data use. As a result, numerous studies stress the importance of providing feedback support (Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2007; Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren, 2009). This support can be provided by school staff within the school but also by externals (e.g., educational support services or feedback suppliers), organized either formally or informally, through one-shot or long-term interventions, involving school principals or (parts of) the school team. Furthermore, these support initiatives can be organized within or outside the school, which can be considered as onservice and inservice education and training (Gardner, 1995). School staff who are involved in SPFS training are more likely to read the feedback reports and adopt a more positive attitude (Tymms, 1995). However, research on the impact of support initiatives related to the use of SPF is scarce, as current support initiatives often lack empirical verification (Zupanc, Urank, & Bren, 2009).

2.4. School performance feedback use: Types, phases and effects

Types of school performance feedback use

School feedback can be used in several ways, depending on what feedback

users aspire to. Rossi, Lipsey, and Freeman (2004) made a classification of

types of evaluation use: instrumental, conceptual and symbolic/convincing

use. This classification has been applied in studies on SPF use (Schildkamp,

Visscher, & Luyten, 2009; Verhaeghe et al., 2010; Visscher & Coe, 2003;

Weiss, 1998). An instrumental use of feedback serves as a starting point for

immediate policy-making decisions. For example, new reading methods are

introduced as the previous method led to disappointing results. A

conceptual use of feedback does not result in concrete actions but

influences the decision-making process, which indirectly affects action. An

example of conceptual use is the altered way of thinking about repeating

classes when confronted with remarkably high numbers for the school.

Even if feedback does not influence one’s conceptualizations, it can affect

the policy-making process in a symbolic way. This means feedback results

serve to convince others of existing opinions and to support viewpoints in

discussions (Visscher, 2002). Visscher & Coe (2003) added a fourth type of

data use: strategic use. Feedback can be used in a strategic way for

Chapter 1

12

accountability purposes, although this is not in line with a school

improvement discourse. These four types of feedback use can be

considered as intermediate results of feedback use that eventually will

contribute to school improvement. For example, a conceptual use results in

an altered way of thinking about pupil performances. This intermediate

result can in the end lead to effects of feedback use, such as a stronger

achievement orientation. In addition, feedback can also be used as a means

to motivate or stimulate school staff to improve (Verhaeghe et al., 2010;

Schildkamp & Kuiper, 2010). Finally, a pupil-directed use of data is observed

when pupil level data stimulates supporting individual pupils in their

learning process (Verhaeghe et al., 2010).

Phases in school performance feedback use

In the framework of Visscher (2002; Visscher & Coe, 2003), SPFS usage is

described only in terms of types of use. In addition, phases of use can be

discerned (Verhaeghe et al., 2010). In analogy with the definition of

data-driven decision making of Schildkamp & Kuiper (2010), SPFS use

encompasses the following stages: analyzing the data, applying outcomes of

these analyses, implementing innovations, and evaluating these

innovations. Learning Point Associates (2004) also describes data use in

phases: analyzing data patterns, generating hypotheses, developing

goal-setting guidelines, designing specific strategies, defining evaluation

criteria, and making the commitment with school staff to implement and

evaluate these actions. Specifically for school performance feedback use,

the following successive stages can be discerned (Verhaeghe

et al., 2010; Verhaeghe et al., 2010):

• Receiving the feedback on school

• Reading and discussing

• Interpretation

• Diagnosis

• Planning of improvement actions

• Implementation of improvement actions

• Evaluation of both the improvement actions and the process of feedback

use.

Receiving SPF has turned out to be a necessary yet insufficient step, as

both the schools and the feedback systems have to meet certain

requirements for the feedback to actually be used in practice (Verhaeghe et al.,

2010; Visscher & Coe, 2003). One of the major phases in which school staff

get stuck is the interpretation phase, owing to a lack of the data literacy

competences needed to process the information. Although several studies

report that school staff often struggle with data interpretation,

an examination of existing SPF systems and their related literature reveals

that research on user comprehension is scarce (Schildkamp & Teddlie,

2008). Few studies have examined the effectiveness of the various modes

of explaining and representing data in school feedback reports. This is

problematic considering the fact that SPF reports use complex concepts and

graphical representations, whilst SPF users (i.e., school staff) are often not

statistically skilled (Earl & Fullan, 2003; Kerr et al., 2006; Saunders, 2000;

Williams & Coles, 2007).

Effects of school performance feedback use

Feedback use should eventually lead to school improvement effects such as

improved student outcomes, professional development, improved

didactical approaches, and a stronger achievement orientation of staff

(Schildkamp & Teddlie, 2008). This positive feedback impact has been

observed in several studies (Hammond & Yeshanew, 2007; Schildkamp &

Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009). However, as a result

of the difficulties in data interpretation and use and the limited use of

information, current research often reports disappointing results from

school feedback use (Coe, 2002; Saunders & Rudd, 1999; Schildkamp,

Visscher, & Luyten, 2009; Tymms, 1995; Van Petegem & Vanhoof, 2004;

Verhaeghe et al., 2010; Zupanc, Urank, & Bren, 2009). Several studies show

that the actual use of school performance feedback is often limited within

schools, which may (partly) have been caused by the characteristics of

these SPFSs (Earl & Fullan, 2003; Schildkamp & Kuiper, 2010; Schildkamp &

Visscher, 2009; Verhaeghe et al., 2010; Coe & Visscher, 2002).

In contrast to the intended effects, some literature findings refer to

unintended and undesired effects of data use. For example, the

(administrative) workload of teachers and principals may increase as a

result of using an SPFS (Fitz-Gibbon & Tymms, 2002; Schildkamp & Teddlie,

2008). Moreover, participants may feel threatened by the evaluation, and

evaluations may evoke defensiveness (Fitz-Gibbon & Tymms, 2002). Finally,

using an SPFS may have a demotivating impact on teachers, especially in

poorly performing schools (Van Petegem, Vanhoof, Daems, & Mahieu,

2005).

3. Research context: Each school its own mirror

Until now, only a limited number of initiatives to develop data systems have

been undertaken in Flanders. The Flemish dislike of central examinations

and the resulting lack of systematic data collection on the performance of

pupils are in part responsible for this (Van Petegem, et al., 2005). However,

schools are required by law to monitor and improve their own quality in a

systematic manner. How they do so is a matter for the individual school and

is part of the autonomy which schools are granted in Flanders. Deregulation

and decentralization are therefore a continuing part of the educational

policy implemented in Flanders. Schools are becoming increasingly

autonomous and are achieving a greater degree of self-direction. The

Flemish government does not impose any formal, systematic obligation

upon schools to carry out self-evaluation, nor does it compel them to collect

output data. Policy with regard to school feedback use is therefore

primarily one of encouragement rather than strong pressure. When

carrying out inspections, the education inspectorate is primarily

concerned with schools’ output (in relation to their context, input and

process) and this is not without consequences for the way in which schools

themselves look at their own functioning in general and their output in

particular.

Within this context of autonomy and absence of central examination

data, several data initiatives have been taken. However, no SPFS accessible

to all Flemish schools existed. Therefore, researchers affiliated with

three Flemish universities (Katholieke Universiteit Leuven, Ghent University

and University of Antwerp) pooled their expertise to develop an SPFS for

Flemish schools, named the School Feedback Project “Each school its own

mirror”¹. The main objective of the School Feedback Project is to provide

schools with confidential information on their functioning to encourage

data-driven school improvement. The feedback project uses data from the

SIBO research project (Schoolloopbanen in het BasisOnderwijs [School

Trajectories in Primary Education]), which is a longitudinal study that has

been set up to investigate the school careers of 6,000 children from a

representative sample of Flemish schools, from the time they entered

kindergarten until the end of primary education. Data are collected by

means of standardized tests, surveys and observational data on child

characteristics, family background, class characteristics, classroom

practices, teacher attitudes and subjective theory, and school

characteristics (Verachtert, Van Damme, Onghena, & Ghesquiere, 2009;

Verhaeghe, Maes, Gombeir, & Peeters, 2002). The tests focus on language

learning (orthography, reading fluency, reading comprehension) and

mathematics. Item response theory (IRT) based techniques are used to construct

the test scores, which makes it possible to estimate growth curves. So far,

the SPF project has delivered trial versions of school feedback reports to

the 195² participating primary school principals.

¹ This research was supported by the agency for Innovation by Science and Technology (IWT), Grant number SBO 50194 (School Feedback Project). IWT is a Flemish government agency stimulating and supporting innovation by providing financial support to research institutes.

The resulting trial feedback reports were delivered on a yearly basis to the

schools. These individualized school reports informed schools about the

performance of their cohort under study. Results were reported for

mathematics, reading fluency, and orthography, supplemented with

information about pupil characteristics (child factors, home factors, and

Dutch language skills at the start of Grade 1). The school-specific results

were compared to the Flemish reference group. The central concepts in

these reports include learning gain, value added, and adjusted scores and

were explained in such a way that no prior statistical knowledge was

required. The data were supported with graphical representations (i.e.,

boxplots, bar graphs, pie graphs, growth curves, and cross tables). The text

of each report was standardized. The school principals were required to

interpret the results for their school, based on the general information

made available.
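
To make the distinction between learning gain and value added concrete, the following minimal sketch computes both for a handful of invented pupils. It assumes that learning gain is the difference between two test scores on a common scale and that value added is a school's mean residual from a simple regression of the later score on the earlier score. The feedback reports themselves may rely on more elaborate models, and all names and numbers below are hypothetical.

```python
from statistics import mean

# Hypothetical pupil records: (school, score at start, score at end).
pupils = [
    ("A", 40.0, 55.0),
    ("A", 50.0, 62.0),
    ("B", 45.0, 52.0),
    ("B", 60.0, 66.0),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def school_summary(records):
    """Return {school: (mean learning gain, mean residual)}.

    The mean residual is used here as an illustrative value-added
    estimate: how much better or worse pupils score at the end than
    expected from their starting score across all schools.
    """
    intercept, slope = fit_line([r[1] for r in records], [r[2] for r in records])
    per_school = {}
    for school, start, end in records:
        gain = end - start
        residual = end - (intercept + slope * start)
        per_school.setdefault(school, []).append((gain, residual))
    return {
        school: (mean(g for g, _ in rows), mean(r for _, r in rows))
        for school, rows in per_school.items()
    }


summary = school_summary(pupils)
```

In this sketch, a school can show a positive learning gain and still have a negative value-added estimate, because value added compares the school with what would be expected given its pupils' starting scores.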

The studies conducted in this dissertation build on this research and

development feedback project in order to contribute both to the further

development of this SPFS and to scientific research on SPF use.

4. Problem statement

The research literature on SPFSs reveals some limitations that require further

examination. First, there is a lack of a firm theoretical framework for SPF

use. Neither the different components, nor the relations between all

variables of the framework developed by Visscher (2002; Visscher & Coe,

2003) can be considered as an overall structure that has been empirically

validated. Further examination of influencing factors on school

performance feedback use is required.

In addition, there is a lack of detailed studies on the use and impact of

existing school performance feedback initiatives (Coe & Visscher, 2002;

Goldstein & Spiegelhalter, 1996; Schildkamp, 2007; Schildkamp, Visscher, &

Luyten, 2009; Visscher & Coe, 2003). Evaluation research on the functioning

and impact of SPFSs is warranted in order to evaluate the strengths and

weaknesses of these types of feedback interventions.

² The number of SiBO schools in the sample receiving feedback reports from the School Feedback Project might differ slightly from study to study, due to school mergers or dropout.

Several studies reported on the limited data literacy skills of school staff

in relation to data use. However, no detailed studies on SPFS user

comprehension have been performed. This research topic would be

interesting from both a scientific and a practical point of view.

As a consequence of the limited capacity of school staff in interpreting and

handling the data, there is a great need for support initiatives. Not only

is there a need for setting up more support initiatives, but the

evaluation of current support is also warranted, as these initiatives often lack

empirical verification (Zupanc, Urank, & Bren, 2009).

5. Dissertation overview: Purpose, research questions and research design

In the following chapters, five studies will be reported and discussed. In the

next chapter, we provide a general introduction to the characteristics of SPFSs.

A framework for characteristics of SPFSs will be applied to five SPFSs

worldwide. This descriptive and analytic study illustrates the wide

variety in features and discusses the rationales for

making choices in feedback design.

Following the framework of SPFS characteristics, Chapter 3 is devoted

to a framework for SPFS use. Parts of this framework will be used in the

studies described in the subsequent chapters. Based on the Visscher

framework, influencing factors, SPF use and the resulting effects will

be analyzed in the context of the School Feedback Project by examining

users’ perceptions.

Intrigued by the call for research on feedback interpretability, the fourth

chapter focuses on the representation and interpretation of central SPF

concepts. Alternative representation modes for value added and learning

gain are examined by integrating literature on graphical data

representation. Particular attention will be paid to misconceptions and

interpretation difficulties.

Chapters 5 and 6 tackle two crucial variables in SPF use: data literacy

competences and support in using SPF. These chapters report the results of a

quantitative (Chapter 5) and a qualitative (Chapter 6) study of

a field experiment with participants of the School Feedback Project, which

result in recommendations for effective support in using SPF.

A final chapter will summarize the key findings from all studies by

answering the research questions. An overall discussion and

general conclusion will close this dissertation.

Figure 1. Chapter overview

An overview of the research objectives, the central research questions,

methods, data analysis and participants for each of the five studies is

provided in Table 1.

Table 1

Dissertation overview

Research objectives | Chapter numbers 2 3 4 5 6
RO 1: Exploring the characteristics of SPFSs | ✓
RO 2: Developing a framework for SPF use, including influencing factors and effects | ✓ (✓) (✓)
RO 3: Exploring data literacy competences | ✓ (✓) (✓)
RO 4: Exploring effects of alternative data representation modes on feedback interpretation abilities | ✓
RO 5: Exploring effects of support on SPF use | ✓ ✓

Research questions | 2 3 4 5 6
RQ1: What variety in SPFS characteristics can be observed? | ✓
RQ2: What are the rationales behind choosing certain SPFS characteristics? | ✓
RQ3: What phases can be observed in practice when schools use SPF? | ✓ (✓) (✓)
RQ4: What is/are the result(s) of using SPF? | ✓ (✓) (✓)
RQ5: How can differences be explained in the interpretation and use of SPF in different school contexts? | ✓
RQ7: What is the differential impact of alternative explanations and representations of value-added on the conceptual and procedural understanding of non-statistically skilled users? | ✓
RQ8: To what extent are variations in SPF use influenced by data literacy competences? | ✓
RQ9: To what extent does specific SPF support have an impact on the development of SPF competences, actual SPF use and resulting SPF effects? | ✓ ✓
RQ9.1: To what extent do INSET and ONSET for SPF use have an impact on the
• level of satisfaction of SPF users?
• data literacy competences of SPF users?
• use of this feedback within the school?
• school improvement effects of SPF use?

Methods | 2 3 4 5 6
Survey research | ✓
In-depth interviews | ✓ ✓ ✓
Experiment | ✓ ✓ ✓

Data analysis | 2 3 4 5 6
Qualitative analysis | ✓ ✓ ✓
IRT-techniques | ✓ ✓
Path modeling | ✓
Analysis of covariance | ✓

Participants | 2 3 4 5 6
School principals | ✓ ✓ ✓
Students | ✓
Feedback providers | ✓

Note: ✓ = main goal of study; (✓) = side goal of study; SPF = school performance feedback

References

Abbott, D.V. (2008). A functionality framework for educational

organizations: Achieving accountability at scale. In E. B. Mandinach & M.

Honey (Eds.), Data-driven school improvement: Linking data and

learning (pp. 257-276). New York: Teachers College Press.

Black, P. & Wiliam, D. (1998). Assessment and classroom learning.

Assessment in Education: Principles, Policy & Practice, 5(1), 7-75.

Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the

utilisation of management information systems in secondary schools.

School Effectiveness and School Improvement, 18(4), 451-467.

Coe, R. & Visscher, A.J. (2002). Drawing up the balance sheet for school

performance feedback systems. In A. J. Visscher & R. Coe (Eds.), School

improvement through performance feedback (pp. 221-254). Lisse, The

Netherlands: Swets & Zeitlinger.

Coe, R. (2002). Evidence on the role and impact of performance feedback in

schools. In A. J. Visscher & R. Coe (Eds.), School improvement through

performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger.

Deming, W.E. (1986). Out of the crisis. Cambridge: Massachusetts Institute

of Technology, Center for Advanced Engineering Study.

Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge

Journal of Education, 33(3), 383-394.

Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and

effectiveness. London: Cassell.

Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in

indicator systems: Doing things right and doing wrong things. Education

Policy Analysis Archives, 10(6), 1-28. Retrieved from

http://epaa.asu.edu/ojs/article/viewFile/285/411

Gardner, R. (1995). Onservice Teacher Education. In L. W. Anderson (Ed.),

International Encyclopedia of Teaching and Teacher Education (pp. 628-

632). London: Pergamon Press.

Goldstein, H. & Myers, K. (1996). Freedom of information: Towards a code

of ethics for performance indicators. Research Intelligence, 57, 12-16.

Goldstein, H. & Spiegelhalter, D.J. (1996). League tables and their

limitations: Statistical issues in comparisons of institutional

performance. Journal of the Royal Statistical Society: Series A: Statistics

in Society, 159(3), 385-443.

Hammond, P., & Yeshanew, T. (2007). The impact of feedback on school

performance. Educational Studies, 33(2), 99-113.

Hattie, J. & Timperley, H. (2007). The power of feedback. Review of

Educational Research, 77(1), 81-112.

Heck, R. (2006). Assessing school achievement progress: Comparing

alternative approaches. Educational Administration Quarterly, 42(5),

667-699.

Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support

school inquiry and continuous improvement: Final report to the Stuart

Foundation. Los Angeles: University of California, Center for the Study of

Evaluation.

Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School self-

evaluation and student achievement. School Effectiveness and School

Improvement, 20(1), 47-68.

Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006).

Strategies to promote data use for instructional improvement: Actions,

outcomes, and lessons from three urban districts. American Journal of

Education, 112, 496-520.

King, P. & Kitchener, K. (1994). Developing Reflective Judgement:

understanding and promoting intellectual growth and critical thinking in

adolescents and adults. San Francisco, CA: Jossey-Bass.

Kluger, A.N., & DeNisi, A. (1996). The effects of feedback interventions on

performance: A historical review, a meta-analysis, and a preliminary

feedback intervention theory. Psychological Bulletin, 119(2), 254–284.

Learning Point Associates. (2004). Guide to using data in school

improvement efforts: A compilation of knowledge from data retreats and

data use at learning point associates. Retrieved from

http://www.learningpt.org/pdfs/datause/guidebook.pdf

Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter:

Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press.

Liket, T.M.E. (1992). Vrijheid & rekenschap: Zelfevaluatie en externe

evaluatie in het voortgezet onderwijs [Freedom and accountability: Self

evaluation and external evaluation in secondary education]. Amsterdam:

Meulenhoff Educatief.

Maier, U. (2010). Accountability policies and teachers' acceptance and

usage of school performance feedback - a comparative study. School

Effectiveness and School Improvement, 21(2), 145-165.

Mandinach, E.B., Honey, M., Light, D., & Brunner, C. (2008). A conceptual

framework for data-driven decision making. In E. B. Mandinach & M.

Honey (Eds.), Data-driven school improvement: Linking data and

learning (pp. 13-31). New York: Teachers College Press.

Nevo, D. (2002). Dialogue evaluation: Combining internal and external

evaluation. In D. Nevo (Ed.), School-based evaluation: An international

perspective (pp. 3–16). Oxford, UK: Elsevier Science.

Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A

systematic approach. Thousand Oaks: Sage.

Rowe, K. & Lievesley, D. (2002). Constructing and using educational

performance indicators. Paper presented at the 2002 Asia-Pacific

Educational Research Association, Melbourne, Australia.

Rowe, K. (2004). Analysing and reporting performance indicator data:

'Caress' the data and user beware! Paper presented at the 2004 Public

Sector Performance and Reporting Conference, Sydney, Australia.

Saunders, L. (2000). Understanding schools’ use of ‘value added’ data: The

psychology and sociology of numbers. Research Papers in Education,

15(3), 241-258.

Saunders, L., & Rudd, P. (1999, September). Schools’ use of ‘value added’

data: A science in the service of an art? Paper presented at the British

Educational Research Association Conference, Brighton, University of

Sussex.

Schildkamp, K. & Teddlie, C. (2008). School performance feedback systems

in the USA and in the Netherlands: A comparison. Educational Research

and Evaluation, 14(3), 255-282.

Schildkamp, K. (2007). The utilisation of a self-evaluation instrument for

primary education. Unpublished doctoral dissertation, University of

Twente, Enschede, The Netherlands.

Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform:

Which data, what purposes, and promoting and hindering factors.

Teaching and Teacher Education, 26(3), 482-496.

Schildkamp, K., & Visscher, A. (2009). Factors influencing the utilisation of a

school self-evaluation instrument. Studies in Educational Evaluation,

35(4), 150-159.

Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school

self-evaluation instrument. School Effectiveness and School


Smith, P. (1995). On the unintended consequences of publishing

performance data in the public sector. International Journal of Public

Administration, 18(2&3), 277-310.

Sun, H., Creemers, B.P.M., & De Jong, R. (2007). Contextual factors and

effective school improvement. School Effectiveness and School

Improvement, 18(1), 93–122.

Teddlie, C., Kochan, S., & Taylor, D. (2002). The ABC+ model for school

diagnosis, feedback, and improvement. In A. J. Visscher & R. Coe (Eds.),

School improvement through performance feedback (pp. 75-114). Lisse,

The Netherlands: Swets & Zeitlinger.

Tymms, P. (1999). Baseline assessment and monitoring in primary schools.

London: Fulton Publishers.

van Aanholt, T., & Buis, T. (1990). De school onder de loep [The school under

scrutiny]. Culemborg, The Netherlands: Educaboek.

Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-

indicatoren als strategisch instrument voor schoolontwikkeling

[Feedback on school performance indicators as strategic instrument for

school improvement]. Pedagogische Studiën, 81, 338–353.

Van Petegem, P., Vanhoof, J., Daems, F., & Mahieu, P. (2005). Publishing

information on individual schools. Educational Research and Evaluation,

11(1), 45-60.

Vanhoof, J. & Van Petegem, P. (2007). Matching internal and external

evaluation in an era of accountability and school development: Lessons

from a Flemish perspective. Studies in Educational Evaluation, 33(2),

101-119.

Verachtert, P., Van Damme, J., Onghena, P., & Ghesquiere, P. (2009). A

seasonal perspective on school effectiveness: Evidence from a Flemish

longitudinal study in kindergarten and first grade. School Effectiveness

and School Improvement, 20(2), 215-233.

Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using

School Performance Feedback: Perceptions of Primary School Principals.


Verhaeghe, G., Vanhoof, J., Van Petegem, P., Verhaeghe, J.P., & Van

Damme, J. (in press). Het gebruik van outputgegevens in basisscholen:

Concretiseringen en illustraties uit het Schoolfeedbackproject [The use

of output results in primary schools: Concretizations and illustrations

from the School Feedback Project]. Kwaliteitszorg in Het Onderwijs.

Verhaeghe, J.P., Maes, F., Gombeir, D., & Peeters, E. (2002). Longitudinaal

onderzoek in het basisonderwijs. Steekproeftrekking [A longitudinal

study in primary education: Sampling procedure]. Leuven, Belgium:

Steunpunt Loopbanen doorheen Onderwijs naar Arbeidsmarkt.

Visscher, A.J. (2002). A framework for studying school performance

feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement

through performance feedback (pp. 41-71). Lisse, The Netherlands:

Swets & Zeitlinger.

Visscher, A.J., & Coe, R. (2003). School performance feedback systems:

Conceptualisation, analysis, and reflection. School Effectiveness and

School Improvement, 14(3), 321-349.

Visscher, A. J., & Coe, R. (Eds.). (2002). School improvement through

performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.

Weiss, C.H. (1998). Have we learned anything new about the use of

evaluation? American Journal of Evaluation, 19(1), 21-33.

Williams, D., & Coles, L. (2007). Teachers’ approaches to finding and using

research evidence: An information literacy perspective. Educational

Research, 49(2), 185-206.

Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for

effectiveness and improvement in classrooms and schools in upper

secondary education in Slovenia: Assessment of/for Learning Analytic

Tool. School Effectiveness and School Improvement, 20(1), 89-122.

CHAPTER 2

CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS

CHAPTER 2: CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS∗

Abstract

As evaluation and data-driven decision making are receiving increased

attention in education, more and more School Performance Feedback

Systems (SPFSs) are being developed and used worldwide. These systems

provide schools with data on their functioning. However, little research is

available on the characteristics of the different SPFSs. Therefore, this study

reflected on the characteristics of SPFSs to provide feedback designers and users

with arguments for making sound choices in selecting certain school

performance feedback characteristics. Based on literature on data-driven

decision making, a framework for identifying SPFS characteristics was

developed. Next, this framework was applied to five diverse SPFSs.

Interviews and surveys were administered, and documents about the

selected SPFSs were collected. By integrating the results of the survey,

semi-structured interviews, and SPFS documents, a summary meta-matrix

was created. The results illustrate the wide variety among the five selected SPFSs

with respect to features related to data gathering and data analysis

processes, the content, and the numerical measures and representation

modes used. A large variety in the complexity and accuracy of the data modeling

can be detected. These findings imply that users need to be informed

properly on the underlying rationales of SPFS features and on the

limitations and strengths of the performance indicators used. Expanding

and adjusting the preliminary framework into a set of standards that SPFS

developers and schools can use might aid the development of efficient instruments

for data-driven decision making.

∗ Based on Verhaeghe, G., Schildkamp, K., & Luyten, H. (2010). Characteristics of School

Performance Feedback Systems. Manuscript submitted for publication in Educational

Administration Quarterly.

1. Introduction

Schools all over the world have been granted more autonomy.

Governmental bodies consider them as learning organizations and hold

them accountable for continuously monitoring their internal quality policy

and improving their functioning (Hofman, Dijkstra, & Hofman, 2009;

Leithwood, Aitken, & Jantzi, 2006). Therefore, schools are required to

systematically gather data on their functioning for self-evaluation

purposes. Several schools use School Performance Feedback Systems

(SPFSs) to gather these data. SPFSs “are information systems external to

schools that provide them with confidential information on their

performance and functioning as a basis for school self-evaluation” (Visscher

& Coe, 2002, p. xi). SPFSs primarily aim at supporting school improvement

and internal quality policy. These feedback initiatives contribute to the

creation of information-rich environments which are essential for schools in

their data-driven decision making. Although data from SPFSs are only one

source of information, they may provide schools with important

information on variables associated with school effectiveness, which

schools can use to improve their performance in terms of improving

teacher instruction and ultimately student achievement (Davies & Rudd,

2001; Visscher & Coe, 2003). However, empirical findings do not always

confirm the expected positive effects of SPFSs. Several studies show

that often the actual use of school performance feedback is limited within

schools, which may (partly) have been caused by the characteristics of

these SPFSs (Earl & Fullan, 2003; Schildkamp & Kuiper, 2010; Schildkamp &

Visscher, 2009; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010; Coe &

Visscher, 2002).

A wide variety of SPFSs can be discerned, all designed for specific

purposes in certain educational contexts. All adopt their own data gathering

systems, statistical methods, data representations, etc. However, little is

known on the distinct characteristics of these SPFSs or on the rationales

behind these features. Little is also known about whether its users are

capable of correctly interpreting and analyzing data derived from these

systems, which is a crucial condition for data-driven decision making. A

debate on the characteristics of SPFSs would be a starting point for

reflection for current and future feedback providers and users. This study

was therefore set up to focus on the characteristics of SPFSs,

specifically on the data gathering and data analysis processes, the content,

and the representation modes of SPFSs. We will examine the variety in

these aspects and the underlying rationales of these variations.

2. Conceptual framework

2.1. School performance feedback systems

The definition of SPFSs includes several important aspects:

• The systemic organization of the feedback initiative: The feedback

providers are bound to an organization and produce school performance

feedback not as a one-shot activity but on a systematic basis.

• The external component: This refers mainly to the data analysis and

feedback provision. The data gathering process can be conducted in

cooperation with school team members.

• The goal of school improvement: This implies that SPFS developers

provide the school performance feedback on a confidential basis, in

contrast with information made public for accountability reasons. By

generating data for voluntary use by schools, SPFSs are considered as

professional monitoring systems. They differ from official accountability

systems, by which schools are held accountable as publicly funded

institutions (Tymms, 1999).

• The unit level of information: School performance feedback goes beyond

individual pupil results. At least some indications are provided on the

school’s functioning and effectiveness by aggregating data.

• The content of the feedback: The content refers to the schools’

performance and functioning. A school’s functioning encompasses

more than output results alone; it also refers to context-, input- and

process-related indicators.
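
The unit level of information can be illustrated with a minimal sketch that aggregates pupil-level scores into a school-level indicator. The data, the field names and the choice of the mean as the aggregate are hypothetical and serve only as an illustration of moving from individual pupil results to school-level feedback.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pupil-level results: (school, pupil, test score).
pupil_scores = [
    ("School A", "p1", 62.0),
    ("School A", "p2", 58.0),
    ("School B", "p3", 71.0),
    ("School B", "p4", 65.0),
    ("School B", "p5", 68.0),
]


def aggregate_by_school(records):
    """Group pupil scores by school; return mean score and pupil count per school."""
    grouped = defaultdict(list)
    for school, _pupil, score in records:
        grouped[school].append(score)
    return {
        school: {"mean": mean(scores), "n": len(scores)}
        for school, scores in grouped.items()
    }


school_feedback = aggregate_by_school(pupil_scores)
```

Reporting the pupil count next to the mean is a deliberate choice in this sketch, since an aggregate based on very few pupils is less informative about the school as a whole.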

If one looks at the definition and characteristics of an SPFS, many

different systems might be considered as SPFSs, including central

examination systems, school inspectorate, national assessment systems,

pupil monitoring systems, research projects, school self-evaluation systems

and providers of standardized tests (see Table 1).

Although all systems described in Table 1 can function as SPFSs, they

might simultaneously function as official accountability systems. For

example, central examination data are often considered by inspectorates

and parents as a performance indicator for the school’s functioning. In

addition, these data can be transformed into confidential feedback, after

having performed secondary analyses on these results (Yang, Goldstein,

Rath, & Hill, 1999). Also, reports from inspection visits can serve both

purposes of accountability and improvement. This illustrates that the

relation between accountability and improvement may have different

configurations (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009;

Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank, & Bren, 2009).

Table 1.

Different kinds of SPFSs

| SPFS | Description |
| --- | --- |
| Genuine SPFSs | The core task of these systems is providing schools with confidential information on their functioning. |
| Central examination systems | Sometimes, (raw or adjusted) results of central examinations are fed back to schools for school improvement, instead of or in addition to making the results public. |
| School inspectorates | Inspection reports can be considered as school feedback if they serve the purpose of school improvement, instead of or in addition to accountability. |
| National assessment systems | These differ from central examinations as the information is gathered in the first place for educational governments to establish the state of the art at a national educational level. However, if school-specific results are confidentially fed back to schools, they can be considered as school feedback. |
| Pupil monitoring systems | These systems are developed to assess individual pupils’/students’ learning progress in the first place. The results can be used as school feedback when aggregated reports are also provided for a group of pupils/students. |
| Research projects | Participation in research projects can result in a school feedback report, as a return on investment. |
| School self-evaluation systems | These are systems developed only with the purpose of providing schools with confidential information on their performance and functioning. |
| Standardized tests | Some (psychometric) standardized tests, administered to individual pupils/students, can result in aggregated scores for a class or group and thus can be considered as school feedback. |

2.2. Performance indicators

SPFSs gather information on the schools’ performance and functioning, by

making use of performance indicators. Following Goldstein and

Spiegelhalter “a performance indicator is a summary statistical

measurement on an institution or system which is intended to be related to

the ‘quality’ of its functioning” (1996, p. 385). Rowe and Lievesley add an
evaluative component to this definition: “performance indicators (PIs) are
defined as data indices of information by which the functional quality of
institutions or systems may be measured and evaluated” (2002, p. 1).
Applied to the context of schools and internal quality policy, Fitz-Gibbon and
Tymms (2002, p. 2) define an indicator “as an item of information collected

at regular intervals to track the performance of a system”. Hereby, they

emphasize the systematic character of the data gathering and analysis,

which corresponds to the definition of SPFSs by Visscher and Coe (2002).

School performance indicators report not only on the output
aspect of school quality, such as pupil achievement results, but also on the

context, input and process of the school’s functioning. These can include

indicators on resource provision and funding, participation rates of pupils,

repetition rates, class sizes, factors affecting students’ progress rates, etc.

(Rowe & Lievesley, 2002).

To successfully serve schools in their internal quality policy, these

indicators have to meet certain requirements (Fitz-Gibbon, 1996; Heck,

2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008;

Visscher, 2002). First, feedback needs to be relevant and useful, which

means it corresponds to the actual information needs of the users.

Furthermore, feedback needs to be accurate, which relates to the reliability

and validity of the data gathered. Next, the cost-effectiveness of the

indicator system is an important factor to take into consideration. Related

to this utility perspective, the performance indicators should be delivered
in a timely manner, which concerns both the currency and punctuality of the delivered

feedback. Furthermore, users need to accept the performance indicators

and consider them to be fair. This fairness does not only refer to the striving

towards unbiased results, but also to the interpretability, reliability, stability

and incorruptibility of the reported performance indicators. Lastly,

performance indicators should strive towards beneficial effects and should

avoid unwarranted harm (Goldstein & Myers, 1996; Fitz-Gibbon, 1996; Fitz-Gibbon
& Tymms, 2002). Although there is a lack of systematic evaluation

of effects of SPFSs (Visscher & Coe, 2003), some literature findings refer to

unintended effects of data use. For example, the (administrative) workload

of teachers and principals may increase as a result of using an SPFS (Fitz-

Gibbon & Tymms, 2002; Schildkamp & Teddlie, 2008). Moreover,

participants may feel threatened by the evaluation, and evaluations may

evoke defensiveness (Fitz-Gibbon & Tymms, 2002). Finally, using an SPFS

may have a demotivating impact on teachers, especially in poorly

performing schools (Van Petegem, Vanhoof, Daems, & Mahieu, 2005).

2.3. Framework for SPFSs

The central aim of this study is to explore the variety in characteristics of

SPFSs and to reveal the underlying rationales. School performance feedback
systems, however, cover a wide range of characteristics. In this study, we

focus on the following main aspects of school performance feedback (see

Table 2): the data gathering process, the data analysis, and the content of

the feedback report, with a focus on the numerical measures and graphical

representations used. Describing the data gathering and analysis is crucial

to get a view on the reliability and validity of the feedback produced. In

order to get a view on the relevance of the feedback system, the content of

the feedback reports of the selected systems is described. Finally, we

focus on the feedback representations used. This includes both the

numerical measures and graphical representations used, to get a view on

the interpretability of what is fed back to schools.

Table 2.

A framework for comparing SPFSs

SPFS characteristics

Data gathering

- Data administrators (e.g., school team members, field workers from SPFS)

- Medium (e.g., paper pencil, computer)

- Structuredness of instruments (e.g., completely structured, semi structured,

computer adaptive)

- Types of instruments (e.g., tests, interviews, surveys, observation scales)

- Data source (e.g., pupils, teachers, parents)

- Timing (e.g., any time, fixed moments)

- Place (e.g., classroom, computer lab, playground)

- Options in test administration (e.g., fixed, flexible or demand driven supply)

Data analysis

- Type of analysis (e.g., quantitative, qualitative)

- Scaling model (e.g., Classical Test Theory, Item Response Theory)

- Model used (e.g., regression model, Ordinary Least Squares, multilevel

analysis)

- Type of value added (e.g., prior, concurrent)

- Levels of unit (e.g., pupil level, year group level, school level, cohort level,

subscale level, item level, subject level, aggregate level)

- Measurement moments (e.g., single measurements, successive

measurements, two linked measurements, longitudinal measurements)

Feedback content

- Variables (e.g., attitudinal, behavioural, cognitive, contextual)

- Subjects (e.g., language, mathematics, science, world orientation)

- Non subject specific information (e.g., school culture, pupil background

variables, pupil mobility, socio-emotional development, ADHD scale,

attitudes to school, dyslexia, study skills)

- Reference group (e.g., national average, representative sample of

population, group of participating schools)

- Type of reference (e.g., self-referenced, norm referenced, criterion

referenced)

- Reliability indication (e.g., confidence intervals, significant values)

- Text content (e.g., results, interpretation of results, explanation of statistical

concepts and graphical representations, information on how to

communicate results)

- Numerical measures (e.g., raw scores, expected scores, cut off score, gain

score, mean score, predicted score, value-added score)

- Feedback medium (e.g., static reporting, flexible tool)

- Graphical representations (e.g., bar graph, box plot, histogram, layer graph,

line graph, pie graph)

- Reliability indices (e.g., confidence intervals, significance values)

3. Method

3.1. Instruments

In order to create a framework for describing SPFSs, a qualitative method

has been used. Literature on SPFSs reveals that the framework developed

by Visscher (Visscher, 2002; Visscher & Coe, 2003) is the most frequently

cited and used (e.g., in Maier, 2010; Schildkamp & Teddlie, 2008;

Schildkamp & Visscher, 2009; Verhaeghe et al., 2010; Zupanc, Urank &

Bren, 2010). This framework discerns four sets of factors influencing the use

of the performance feedback, including the design features of the

underlying SPFSs and the characteristics of the feedback report itself. In

addition to the framework of Visscher, more concrete features will be used

to compare the selected SPFSs. These were chosen based on a literature review
on performance indicators, data-driven decision making, data use and
SPFSs. This resulted in a framework that makes it possible to analyze and describe
SPFSs.

This framework for describing SPFSs was restructured in the form of a

survey. All options were listed and explained in an MS Word file,
including 46 items (11 items with background information to identify the

SPFS, 9 items on the data gathering process, 7 items on the data analysis,

14 items on the content of the feedback report and the concepts used, and

5 items on the graphical data representation).

Almost all questions were multiple-choice items, besides some open

questions. Depending on the items, the respondents were requested to

provide complementary explanation.

3.2. Selected SPFSs

The five systems described in this study were purposefully selected because

of their diversity in feedback characteristics and because of the availability

of information on these systems. This selection was not made to strive for

representativeness but to illustrate and describe exemplarity. First, each

selected SPFS is briefly described:

• Assessment Tools for Teaching and Learning (asTTle): AsTTle has been

developed as part of a government funded research project at the

Visible Learning Labs of the University of Auckland in New Zealand. This

SPFS offers schools a national assessment model with all characteristics

of an SPFS, without the negative consequences of high stakes testing.

This feedback production should help to make teachers acquainted with

the national curriculum, to enhance future teaching and learning. About

80% of all elementary and high schools in NZ are using asTTle (years
4-12). Participation is voluntary and free of charge. The feedback is

offered both in English and Maori, which have two distinct curricula.

Feedback reports are delivered directly and immediately to school team

members and pupils/students and parents via a secured online website

or via software used on the local network. No results are made
public. A remarkable feature of asTTle is the direct feedback delivery to

students and parents. The technological applications allow pupils to get

access to their results during their school career, over all different years

and schools. In summary, asTTle functions as a professional monitoring
system, as the purpose is to create a low-stakes assessment system to be

used internally within the schools. As it provides the function of

following individual learning paths, it serves as a pupil monitoring

system. However, the main function is the detection of learning needs

on an aggregate level.

• Performance Indicators in Primary Schools (PIPS): PIPS was developed

by the Centre for Evaluation and Monitoring at Durham University
(UK). It is widespread in primary schools (from reception to year 6) in
England and Scotland and, on a smaller scale, in the other parts of the UK.

Furthermore, PIPS has local adaptations of the system, applied

worldwide. Within the UK, independent schools show the largest

interest in PIPS, as compared to the government funded schools, as they

lack monitoring systems and information on national testing because

they do not follow the national curriculum. As the access to PIPS is not

cost-free, schools have to use their school budgets. All participation is

voluntary, although some schools are strongly encouraged to participate

by their Local Authorities. PIPS started as a research project, which

transferred its services to schools. In some cases, Local Authorities also

get direct access to the data of their schools, if they have paid for the

assessments. They are not allowed to make these results public and are

supposed to use the data for supporting schools. The feedback is

delivered via regular mail (to the PIPS coordinator at the school) and via
a secured electronic portal. Depending on whether the assessments
were computer delivered or paper based, feedback production can take
between two days and eight weeks. PIPS functions mainly as a pupil
monitoring system, in addition to being a research project, SPFS and standardized
test.

• South African Monitoring system for Primary Schools (SAMP): PIPS

served as a basis for the development of this system. It has evolved
into an almost completely distinct SPFS, developed at the Centre for

Evaluation and Assessment at the University of Pretoria (SA). Due to

resource limitations, feedback is only delivered in the Tshwane Region

for the first year of primary education. Furthermore, only the

government funded schools are reached as these are the schools with

the largest need for accessible assessment systems, in contrast to the

wealthier independent schools. Therefore, this SPFS delivers feedback

for free (limited to 80 learners per school). Very specific for the

development of SAMP is the complicated language context of SA, which
has 11 official languages. SAMP is restricted to the three predominant

languages of instruction in that region: English, Afrikaans and Sepedi.

Therefore, SAMP is a small scale SPFS offering feedback to 22 schools.

All of these schools are participating voluntarily. The feedback users are

in the first place the school team members. They are free to

communicate the results with other stakeholders, such as parents, the

department of education, etc. Feedback supply via regular mail is not an

option as there is no assurance the package will reach its destination in

SA. Since many schools lack internet and even computer access,

electronic feedback delivery is not an option either. Therefore, feedback is
delivered at the school to the contact person. This happens four days to

two weeks after data gathering.

• Leerling- en OnderwijsVolgSysteem (LOVS) [Pupil and Educational

Monitoring System]: Similar to PIPS, LOVS is in the first place a pupil

monitoring system (reception to year 7), besides a research project,

SPFS and standardized test in the Netherlands. Furthermore, some local

projects (e.g., in Germany, Turkey, Denmark, etc.) make use of the LOVS

software. Unlike the other systems in this study, LOVS is also an

official accountability system (e.g., it is used by the Dutch Inspectorate)

in addition to a professional monitoring system. During inspection visits,

schools may be asked for permission to show their results on these

tests. Furthermore, the inspectorate sometimes strongly encourages

(weak-scoring) schools to participate if they make insufficient use of data

sources to prove their functioning. This implies that some schools may

experience participation as an obligation, whilst in general voluntary

assessment is the rule. The wide acceptance of LOVS is indicated by a

95% rate of use of at least one of the tests in all elementary schools in

The Netherlands, including special needs education. This feedback is

provided by a private company, called CiTO [Central institute for Test

Development]. Due to this private character, schools use their budgets

for the services offered. As a consequence, they are the only owners of

their data. To disseminate their results to externals, schools need the

permission of the parents. The way of delivering feedback depends on

the tests taken. Some results are sent by regular mail, while other data is

provided via an electronic portal, via software on a disk or manually by

means of printed scoring tables. Also depending on the test taken and

the standardization process (based on previous or current reference

groups), feedback delivery takes anywhere from a second to a few months.

• Schoolfeedbackproject (SFP) [School Feedback Project]: The SFP is still

in a developmental phase and is thus not commercially available yet.

The SFP is a research and development project, initiated by three

universities in Flanders (Belgium). As there is no central assessment

system, schools lack information on their performance in relation to the

national average or to results of schools with similar characteristics.

Therefore, a government funded project has been set up for creating a

Flemish SPFS. In this study, only the system developed for primary

education will be described (year 1-6). From 2011, this will be

commercially available, whilst the current sample (representative

reference group of 195 schools) is participating for free. Although

participation was voluntary in general, some school boards decided for

their schools to participate. Results are fed back confidentially to schools

only. In addition, aggregated results are reported to school boards, as

part of the research project. School reports are delivered to the

feedback coordinator at the school by electronic mail. Due to the

developmental phase of the project, feedback generation took several

months. From 2011, feedback will be delivered in an automated and

quick way as the underlying software engine, feedback formats and

reference data are already available. Furthermore, the SFP is developing

a secured electronic portal to upload student data and download school

feedback.

3.3. Procedure

This survey was sent to the directors or coordinators of the five selected SPFSs.
They were informed about the purpose of this study. Additionally, semi-structured
in-depth telephone interviews were held, to elaborate or clarify

some of the answers from the survey and to gather information on the

rationales for opting for certain SPFS characteristics. The telephone

conversations, which took on average 90 minutes, were audio-taped with

permission of the interviewees and transcribed afterwards. The integrated

results from survey and interview were sent to the interviewees for

member checking.

Finally, the integrated files of surveys and interviews were summarized

in separate files for each feedback system. These files were integrated in a

conceptually ordered meta-matrix (Miles & Huberman, 1994) that

facilitates a variable-oriented and case-oriented analysis. Furthermore, this

meta-matrix serves to give a quick overview of the variety in feedback

systems. Parts of this meta-matrix will be illustrated and explained in the

results section.

4. Results: Application of the framework

4.1. Data gathering

Having a view on the data gathering process is of major importance for

evaluating the accuracy of the data on which the feedback is based.

Therefore, the following elements have been included in the framework:

the persons gathering the data, types and structuredness of instruments

used, data gathering medium, time and place of data collection and the

data source. Table 3 gives an overview of the instruments used.

Table 3.

Overview of data gathering instruments used in selected SPFSs

asTTle PIPS SAMP LOVS SFP

Completely structured
Domain specific tests X X X X X
Survey on attitudes/socio-emotional development X X X
General achievement test X X
Observation scale X X
ADHD-scale X
Pupil background questionnaire X X
Test on study skills X
Survey on social emotional functioning X
Test of intelligence X
Test on interests X

Semi-structured
Interviews on strategies in mathematics, writing assessments X
Pupil background questionnaire X
Rating scale for evaluation of a technical piece of work X

Computer adaptive
Domain specific tests X X (1) X (2)
Screening instrument for dyslexia X

Other
Automatic upload of pupil background variables from data management system X X X
Observation notes of testing: no structured instruments X
Upload of results from Statutory Assessment Tests X

Note (1): Computer-delivered version of PIPS for Years 1 – 6; all other tests use
stopping rules based on a number of mistakes made, on increasingly difficult items

Note (2): depending on the test taken

In almost all cases (asTTle, PIPS, LOVS, SFP) teachers and/or other school

team members organized the test administration at the school, following
strict testing instructions. Only in the case of SAMP did field workers from the SPFS
guide the assessment. This choice was made to safeguard the reliability of
the data collection and to avoid taking teachers away from their
teaching. Furthermore, teachers not only organized the tests, but
sometimes also provided data on the pupils’ functioning. In PIPS

and the LOVS for example, they completed observation scales, pupil

background questionnaires and/or surveys on the socio-emotional

functioning. In asTTle, teachers have an even more active role by

composing the test based on predefined parameters and options by using

the testing software tool. Furthermore, parents can also be asked to

provide information. In the case of the SFP, a parent questionnaire is

provided for gathering home and pupil background information.

Not only the testing instructions, but also most testing instruments are

highly structured. Almost all instruments are completely structured. This

means that tests and questionnaires entirely describe and guide the data

collection. In some cases, semi-structured instruments are used. For

example, SAMP does not require schools to complete structured

questionnaires on student background variables, but just lists what

information would be favorable to deliver (due to a lack of pupil

information and lack of computerized management system). In contrast,

asTTle, LOVS and PIPS make use of advanced software options that allow

automatic import of pupil level data from the school’s management

information systems. These three SPFSs additionally provide computer

adaptive testing. This means that test items are presented to pupils
according to their ability level. For example, if a pupil performs well on an
item of intermediate difficulty, a more difficult question will be presented.
If the pupil then performs poorly, a simpler item will be
presented.
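The adaptive principle described here can be illustrated with a minimal sketch. It is not the algorithm of asTTle, PIPS or LOVS: the item bank, its difficulty values and the function names `next_item` and `run_test` are hypothetical, and operational systems typically select items from an IRT-based ability estimate rather than a simple step up or down.

```python
# Hypothetical item bank: item id -> difficulty (arbitrary scale, higher = harder).
ITEM_BANK = {"a": -2.0, "b": -1.0, "c": 0.0, "d": 1.0, "e": 2.0}


def next_item(bank, current_id, correct, used):
    """Return the id of the next item, or None when no suitable item is left.

    After a correct answer the closest harder unused item is chosen;
    after an incorrect answer the closest easier unused item is chosen.
    """
    current = bank[current_id]
    if correct:
        candidates = [(d, i) for i, d in bank.items() if d > current and i not in used]
        return min(candidates)[1] if candidates else None
    candidates = [(d, i) for i, d in bank.items() if d < current and i not in used]
    return max(candidates)[1] if candidates else None


def run_test(bank, start_id, answers, max_items=3):
    """Administer up to max_items items; `answers` maps an item id to True/False."""
    used, path = set(), []
    item = start_id
    while item is not None and len(path) < max_items:
        correct = answers(item)
        used.add(item)
        path.append((item, correct))
        item = next_item(bank, item, correct, used)
    return path


# Simulated pupil who answers correctly whenever the item difficulty is at most 0.
path = run_test(ITEM_BANK, "c", lambda item_id: ITEM_BANK[item_id] <= 0.0)
print(path)
```

In this simulated run the pupil starts on the intermediate item, moves up after a correct answer and moves down again after a mistake.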

Testing pupils can be very time consuming, especially in case of younger

children. As they have not yet mastered reading or writing skills, one-on-one
oral testing is often necessary. In this case, the instructor provides the

explanation following the guidelines and the pupil provides the answers

(e.g., in PIPS and SAMP). In other cases, one-on-one testing is required
because of the nature of the test (e.g., reading fluency in SFP and LOVS). In
still other cases, one-on-one testing is optional, as the testing medium allows
both individual and whole-class testing at any time. This is the case for asTTle,

as each pupil owns a personal computer and software adapts

standardization of the scores to the moment of testing. More rigid systems

with paper-pencil tests, computerized tests in computer labs and/or fixed

measurement moments will be more likely to adopt whole classroom

testing (PIPS, LOVS, SFP).

The place of testing is highly related to infrastructural characteristics.

Mostly, tests take place in the classroom if printed booklets are used

(asTTle, PIPS, LOVS, SFP), or in the computer lab for computerized versions

(PIPS, LOVS). Test administration of asTTle is flexible because of
technological provisions. Even testing at home is plausible. In the case of SAMP,
each testing situation is slightly different, as an
appropriate place has to be found in each school (e.g., in the staff room, under a shady

tree).

4.2. Data analysis

In this section, we will focus on the underlying scaling model used, on the

data analysis model used including value-added measures, on the

opportunities for longitudinal measurements, on the inclusion of pupil

mobility and on the levels of aggregation used. Being informed about the

data analysis of SPFSs is a prerequisite for making judgments on the

accuracy of the feedback. In all feedback systems, testing data are analyzed

quantitatively. We will focus in this section on the variety in these

techniques used.

First, the underlying scaling models have been examined. Item Response

Theory (IRT) is underlying all SPFSs to some degree. This technique

estimates several parameters, including the difficulty level of the items and

the skill scores of the respondents. By creating one skill scale that relates

different tests in a certain domain, IRT offers opportunities for longitudinal

measurements or computer adaptive testing. IRT has been applied in the

selected SPFSs for defining the item parameters and composing tests

(SAMP). AsTTle, PIPS, LOVS and SFP go further and use IRT for defining

ability test scores for the respondents for certain test versions. The IRT

model that has been used most widely is Rasch (in asTTle, PIPS and SAMP).

The techniques used in LOVS depend on the test taken and SFP uses a more

complex 2-parameter model. The system taking the most advantage of IRT

is asTTle. In combination with the possibilities of the software tools,

teachers are enabled to compose tests from an item bank with different

degrees of difficulty. Besides IRT, Classical Test Theory (CTT) is applied in all

systems. This is not only used for analyzing data from interviews, surveys

and/or observation scales (asTTle, PIPS, LOVS), but also for some tests

(SAMP, SFP) which require no further analysis than a sum score.
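As a brief illustration of the two IRT models mentioned above, the following sketch gives the standard item response functions of the Rasch model and the 2-parameter logistic model. The function names and example values are illustrative assumptions; the sketch says nothing about how the selected SPFSs estimate their parameters.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch model: probability of a correct answer given ability and item difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def two_pl_probability(ability, difficulty, discrimination):
    """2-parameter logistic model: adds an item-specific discrimination parameter."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# A pupil whose ability equals the item difficulty has a 50% chance of success.
print(rasch_probability(0.0, 0.0))
# A more discriminating item separates pupils around its difficulty more sharply.
print(two_pl_probability(1.0, 0.0, 0.5), two_pl_probability(1.0, 0.0, 2.0))
```

With a discrimination of 1 the 2-parameter model reduces to the Rasch model, which is why the latter is the simpler of the two.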

Only PIPS and SFP make explicit use of value-added measures. These

measures indicate to what extent scores (raw or adjusted) are above or

below an “expected” value. The expectations are based on statistical

analyses that estimate the impact of independent variables such as

cognitive aptitude, prior achievement and socioeconomic background.

Value added is reported in PIPS and SFP as a measure of the school’s

influence on the pupil’s performance. PIPS makes a distinction between prior and

concurrent value added. For estimating the former type, a general

achievement score is taken into account, as an aggregate of subject specific

test scores and a developed ability score. Concurrent or contextual value

added only includes the developed ability score. In addition, SFP uses both

student background variables and prior achievement scores in the

estimation of contextual value added. The two systems differ in this
conception, as student background variables are seen either as redundant or as
necessary variables to be included in the model. Furthermore, the value-added
implementations differ significantly in their level of reporting. While SFP is

convinced that value added should only be reported at an aggregate level,

PIPS allows pupil level residual analysis. LOVS implicitly applies the notion

of value added measures by reporting the difference in growth of the

school as compared to the reference group.
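The idea of value added as the distance between an observed and an "expected" score can be sketched as the residual of a regression. The example below is a deliberately simplified assumption: it uses a single predictor (prior achievement), an ordinary least squares fit, and invented scores. It is not the model specification of PIPS or SFP, and the function names are hypothetical.

```python
def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def value_added(prior, outcome):
    """Return observed minus expected outcome for each pupil (the residuals)."""
    intercept, slope = fit_ols(prior, outcome)
    return [yi - (intercept + slope * xi) for xi, yi in zip(prior, outcome)]


# Invented scores for illustration only.
prior_scores = [10.0, 20.0, 30.0, 40.0]
outcome_scores = [14.0, 22.0, 35.0, 41.0]
residuals = value_added(prior_scores, outcome_scores)
print(residuals)
```

A positive residual indicates a score above expectation and a negative residual a score below it. By construction, the residuals of the reference group used to fit the model sum to zero.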

When focusing on the statistical model underlying the feedback

production, a large variety in complexity can be noticed. While some SPFSs

strive for complexity to provide a nuanced view on school performance

data (such as SFP and LOVS), others consciously avoid model complexity in favor
of transparency for feedback users. For example, in the calculation of
value added, PIPS applies an ordinary least squares regression, in contrast to the

multilevel piecewise growth curve models of SFP. Other systems do not use

regression models as they do not intend predicting scores or calculating

value added in order to keep the low-stakes character of testing (asTTle) or

are still in a development phase with limited capacity for growing

complexity (SAMP).

The type of statistical models used defines the options for longitudinal

measurement. This means that scores for pupils are linked to each other

over time. Whether or not learning progress can be measured, depends on

the scale used. In case of asTTle, progress is estimated on one underlying

IRT ability scale, which links all tests in a certain domain. PIPS uses a scale of

standardized scores (either obtained by CTT or IRT) and puts these scores

on a time line. SFP, in contrast, does not just place the (rescaled) IRT scores on a
time line but provides both raw scores and scores adjusted for the
influence of both prior achievement and pupil background characteristics using a
multilevel piecewise repeated measures model. This gives a different
conception of growth and longitudinal measurement. The adjusted scores
do not express the actual achievement levels, but the levels that would
have been achieved if the pupils had had the same background
characteristics as the reference group. Also for some tests of the LOVS,

adjustments have been applied for pupil background characteristics.

Another factor delineating opportunities for longitudinality is the

number of measurement occasions. AsTTle, PIPS, LOVS and SFP for example

offer tests with (at least) three linked measurement moments, while SAMP

only tests pupils at the start and end of the first year of primary education.

In all systems, users decide whether to participate in single or

successive measurements (repeated for different cohorts or not).

Finally, it is of major importance to stress the influence of pupil mobility,

in particular when longitudinal data are represented for a cohort. A

consequence of pupil mobility is that values are missing for pupils who left

a testing sample by changing classes or schools. Therefore, multilevel

modeling is preferred because missing data do not prevent the estimation of

growth curves, as all available pupil scores are taken into account in the

estimation procedure. However, when estimating more complex

longitudinal models, taking into account pupil mobility requires cross

classifications, which may overburden the capacities for statistical analysis.

A final aspect to be discussed in this section is the reported aggregation

level for respondents and content. With regard to the respondents, all

systems opt to report pupil level data, with the exception of the SFP, which is

designed for evaluating and informing school policy with a focus on

aggregated data. The adjusted scores and value-added scores are in their

view only valid for aggregated data as this cancels out measurement errors

and bias by averaging. Furthermore, the model complexity does not allow

generating data on several aggregation levels, due to pupil mobility. The

systems that report pupil level data (asTTle, PIPS, SAMP and LOVS) also

report data on aggregated levels such as classroom level, group level, school

level, etc. In these cases, the aggregated scores are easily obtained by

averaging the pupil scores for a certain group. Besides the respondent level,

the reported content level is also determined by both convictions and

methodological considerations. All systems report at (broad) subscale and

subject level. Only PIPS reports on the item level (only for reception

feedback) as this would have the largest information value to inform

planning in the classroom. AsTTle intentionally does not report on the item

level as this would lead to teaching to the test. Items are therefore just

considered as indicators of subjects. Another restriction for reporting item

level scores depends on the objective of the test taken. As SFP developed

tests for determining learning gains (which requires avoiding ceiling effects)

and not for diagnostics (which requires identifying outliers), it is not appropriate
to report item scores. According to the SFP, this implies that these tests are not

suitable to discern detailed subscales either, as these would not meet the

psychometric standards of reliability.
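The averaging step mentioned above, in which aggregated scores are obtained from pupil scores, can be illustrated with a minimal Python sketch. The record layout and the field names (`school`, `class`, `score`) are assumptions made for this example only; none of the systems discussed is claimed to store its data this way.

```python
from collections import defaultdict
from statistics import mean


def aggregate_means(records, level):
    """Average pupil scores per group at the given level ('class' or 'school')."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[level]].append(rec["score"])
    return {key: mean(scores) for key, scores in groups.items()}


# Hypothetical pupil-level records, for illustration only.
pupils = [
    {"school": "A", "class": "A1", "score": 40},
    {"school": "A", "class": "A1", "score": 50},
    {"school": "A", "class": "A2", "score": 60},
    {"school": "B", "class": "B1", "score": 70},
]

class_means = aggregate_means(pupils, "class")
school_means = aggregate_means(pupils, "school")
```

Note that the school mean is computed here over pupils, not over class means, so larger classes weigh more heavily in the result.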

4.3. School performance feedback content

This paragraph contains a description of the subjects and topics that are

reported in the feedback reports, the conceptual representations

(performance indicators) and reference groups used, and the sections

offered in reporting. Following the quality standards of performance

indicators, the feedback content has to be relevant and useful.

Furthermore, SPFS users should accept the performance indicators

and consider them to be fair.

Regarding the contents that have been tested in the selected SPFSs, we

can refer to Table 3, in which the data gathering instruments are described.

This table shows that all systems use domain specific tests. These refer in all

cases to language and mathematics tests with different subscales. Some

systems broadened their supply with tests for science (PIPS), English as a

foreign language (SAMP, LOVS), and/or technique and world orientation

(aggregation of geography, history and environmental science in LOVS).

The other data instruments from Table 3 report other non-cognitive

measures, such as attitudinal, behavioral and contextual contents.

Concerning behavioral scales, interesting examples are noticed. PIPS, for

example, offers a scale for detecting ADHD and LOVS for dyslexia, whilst

handwriting is tested by asTTle and SAMP. To illustrate attitudinal

measurements, there are measures of attitudes related to subjects (asTTle,

PIPS, SAMP and LOVS), to the school culture in general (asTTle, PIPS and

SAMP) or to socio-emotional development (LOVS). Related to contextual

information, informing schools about their pupil mobility is of major

importance to get a view on their functioning. Why are pupils leaving?

Which newcomers are schools attracting? Which pupils go to special

education? Was the school aware of the huge number of pupils with

learning lags? These are some of the questions that stimulate reflection at

the school level, transcending individual learning pathways. Only the SFP

specifically reports on this.

Numerical measures

A wide range of numerical measures have been reported in the SPFSs in this

study. Table 4 gives an overview.

Table 4.

Numerical measures used in selected SPFSs

asTTle PIPS SAMP LOVS(1) SFP

Type of scores

Adjusted scores X X

Expected scores X X X X X

Predicted scores X X

Raw scores X X X X X

Numerical measures

Band score X X

Cut-off score X X

Grade score X X

Learning gain score X X X(2) X X

Mean score X X X X

Percentage score X X

Percentile score X

Rescaled score X X X

Standardized score X X

Value-added score X X

Note (1): Depending on the tests taken

Note (2): SAMP also registers loss scores besides gain scores

Raw scores are fed back in all cases, as well as expected scores, as the

latter are reflected by the average for the reference group. Predicted

scores, resulting from regression analyses, are used for making predictions

for future performance, based on the current pupil achievements (PIPS,

LOVS). Adjusted scores are rarer as these require more advanced statistical

analysis (LOVS, SFP). All these types of scores are rescaled into meaningful
units for the users. For example, scales are created with a mean of 50 and a
standard deviation of 15. All these transformations are somewhat arbitrary,
as there are no conventions on which scales, bands or grades are preferable.
In most cases, test scores have been transformed in relation to the local context.
For example, asTTle and PIPS convert scores to grades in accordance with
the national curriculum, SAMP rescales to the five-point scales teachers are
familiar with, and LOVS expresses scores in conformity with the preferences of
inspection authorities.
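The rescaling to a mean of 50 and a standard deviation of 15 can be sketched as a linear transformation relative to a reference group. The function below is a minimal illustration; the reference mean and standard deviation used in the example are invented values, and the actual rescaling procedures of the systems are not documented here.

```python
def rescale(score, ref_mean, ref_sd, target_mean=50.0, target_sd=15.0):
    """Linearly rescale a score so that the reference group has the target mean and SD."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    z = (score - ref_mean) / ref_sd      # position relative to the reference group
    return target_mean + target_sd * z   # same position expressed on the new scale


# Hypothetical reference group: mean 18, standard deviation 6.
at_mean = rescale(18, ref_mean=18, ref_sd=6)
one_sd_above = rescale(24, ref_mean=18, ref_sd=6)
one_sd_below = rescale(12, ref_mean=18, ref_sd=6)
```

A pupil scoring exactly at the reference mean receives 50, and each standard deviation above or below that mean shifts the rescaled score by 15 points.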

Feedback reports may contain more information than the mere testing

results. The explanation on how to interpret the results is only provided in

the feedback reports of PIPS and SFP. Other systems provide this

information in the accompanying manual. When it comes to searching for

explanations for the results for a specific school, no further help is provided

in any report. However, AsTTle, LOVS and SAMP take considerable

initiatives for offering remediation material. AsTTle is the most advanced

system by offering supporting material for teachers in accordance with the

achieved grade levels per pupil and group.

What information can be deduced from the reports also depends on the

opportunities for references offered (norm, criterion or self reference).

These three forms of reference give a school different benchmarks against
which to measure its own functioning. All systems offer a norm to compare
the results with. In most cases, this reference is the national average (or
that of a representative sample). Only SAMP, as a small-scale local project,
cannot provide this. Instead, SAMP offers the opportunity to compare with schools in
the same language group within the sample. AsTTle and LOVS allow
comparison with similar schools, based on certain characteristics.

These features foster fair comparison, with the same underlying idea as

providing adjusted scores (comparing same-to-same); although adjusted

scores are estimated with a different calculation procedure. Criterion-based

references are less prevalent (asTTle and SAMP) as these imply an absolute

instead of relative point of comparison. These references refer in these

cases to the cut-off scores used. Opportunities for self reference are offered

in all systems by allowing schools to compare results over time, either

within cohorts (cf. gain scores, longitudinal measurements), or between

cohorts (multiple measurements with different year groups).

Representation modes

With regard to representation modes used in the feedback reports, we will

discuss the medium used to present the results, the graphical

representations, and the reliability indices (see Table 5).

SPFSs differ in the feedback media used to report the results. These

media are related to the flexibility for users in choosing representations or

manipulating their feedback output. In AsTTle, PIPS and LOVS, users can

select different types of representations by software tools or Excel macros.

They may for example select a table to present exact data, and growth

curves to show trends. SAMP and SFP are less flexible: these SPFSs provide

the user with a printed or digital PDF report of the results. SAMP

additionally reports the results in Excel sheets, which users can use to
perform secondary analyses.

The graphical representations offered by the different SPFSs differ as

well. Some systems include only simpler forms of representation, such

as bar graphs, cross tables and histograms (SAMP). Others include more

complex graphical representations of the results, such as scatter plots with

regression lines (PIPS), line graphs (SFP) and layer graphs (LOVS).

The school performance feedback results are based on certain statistical

analyses, which include a certain measurement error. To enable users to

judge the accuracy and importance of their findings, information on the

uncertainty surrounding the results has been incorporated in asTTle, PIPS,

LOVS and SFP. These uncertainties can be indicated by adding confidence

intervals. Three of these SPFSs present confidence intervals, in either bar graphs

(AsTTle and LOVS) or longitudinal progress charts (PIPS). SFP represents

uncertainty by marking significant values in cross tables. SAMP prefers not

to present confidence intervals, as this would make the interpretation of

the results too complex for the users. Instead, they warn users not to
overinterpret small differences or shifts in scores.
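To illustrate why small differences should not be overinterpreted, the following sketch computes an approximate 95% confidence interval around a group mean and checks whether a reference value falls inside it. It assumes a normal approximation (z = 1.96), which is a simplification for small classes where a t-based interval would be wider; the function names and the scores are hypothetical and do not reproduce the method of any system discussed.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(scores, z=1.96):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    m = mean(scores)
    se = stdev(scores) / sqrt(n)  # standard error of the mean
    return m - z * se, m + z * se


def differs_from_reference(scores, reference_mean, z=1.96):
    """True only if the reference mean lies outside the interval."""
    low, high = mean_confidence_interval(scores, z)
    return not (low <= reference_mean <= high)


# Hypothetical class of six pupils compared with a reference mean of 50.
small_class = [48, 52, 55, 45, 60, 50]
low, high = mean_confidence_interval(small_class)
```

The class mean here lies above 50, yet the interval still contains 50, so the difference from the reference group cannot be distinguished from measurement uncertainty.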

Table 5.

Overview of representation modes in selected SPFSs

asTTle PIPS SAMP LOVS(1) SFP

Medium: fixed

Printed report X X X

Pdf version X X X

Medium: flexible tools

Online tools X

Software applications on local network X X

Excel sheet X

Excel macros in sheet X

Graphical representations

Bar graph X X X

Box plot X X

Cross table X X X X X

Divided bar graph X X X

Grouped bar graph X

Histogram X

Layer graph X

Line graph X X X

Multipanel display X

Pie graph X X

Scatter plot with regression line X

Side by side graph X

Other: e.g., schemes, iconic representations X X

Reliability indices

Confidence intervals X X X

Significance values X

Note (1): Depending on the tests taken

5. Discussion

As evaluation and data-driven decision making are receiving increased

attention in education, more and more SPFSs are being developed and used

worldwide. However, little research is available on the characteristics of the

different SPFSs. Studies show that these characteristics may influence the

degree to which the feedback is actually used for school improvement (e.g.

Schildkamp & Visscher, 2009; Verhaeghe et al., 2010). Therefore, it is

important to carefully consider the characteristics of an SPFS, when

developing or selecting one to use. Users need to purposefully choose the

type of SPFS that corresponds to their information needs. This requires

transparency on the characteristics of different SPFSs. Therefore, in this

article, we developed an exemplary framework for identifying SPFS

characteristics, whose usability has been demonstrated by applying it to five

different SPFSs. We illustrated variety in the data gathering processes, the

type of analyses, and the content of the feedback, including the numerical

measures and representation modes used. The goal of this study

was not to judge the different SPFSs, but to highlight some issues

concerning SPFS characteristics.

With regard to data gathering, all SPFSs studied mainly offer completely

structured instruments, such as cognitive tests, questionnaires on socio-emotional
development, and diverse types of scales. Additional semi-structured
instruments, such as interviews or rating scales, are provided as well.

All are accompanied by strict prescriptions on how to gather the data.

Providing highly structured instructions and instruments is a prerequisite

for standardized and reliable data collection (Fitz-Gibbon, 1996; Fitz-Gibbon

& Tymms, 2002), especially if data are gathered by school staff. Data
collection by school team members leads to accurate data only in the case
of low-stakes testing (Fitz-Gibbon & Tymms, 2002; Yang et al., 1999; Smith,

1995). As data collection is very time consuming, technological tools may

create several advantages. Initiatives such as computer-adaptive
testing or automatic upload of data from management information systems
(as in asTTle, PIPS and LOVS) therefore facilitate efficient data gathering. These

tools not only prevent pupils and teachers from being overburdened with data

collection, but also foster targeted data collection. However, only highly

advanced SPFSs adopt these tools. Furthermore, these software tools

cannot be applied in all contexts due to infrastructural limitations.

As data-driven decision making is a cyclic process (cf. Plan-Do-Check-Act-

cycle, Deming, 1986; Verhaeghe et al., 2010), repeated testing for different

cohorts and year groups is required. Furthermore, systematic measurement

with small time lags will result in the most reliable trend (van de Grift,

2009). In all the SPFSs in this study, users can therefore choose
the time intervals between measurements.

With respect to the content of the feedback, the SPFSs in this study focus

rather narrowly on a few cognitive outcomes (e.g. language, mathematics

and/or science), which form part of the core curriculum in all countries.

Developers of SPFSs might consider how to include other subject areas in

the SPFSs, as well as more attitudinal, behavioral and contextual

information. If school staff want to make informed decisions on how to

improve their education, they need different types of data (Schildkamp &

Kuiper, 2010). AsTTle, PIPS and LOVS have taken the first steps, but other

types of data may be considered as well, such as data on the functioning of

teachers (e.g. teacher and student questionnaires). Moreover, schools

already have other types of data available (e.g. school data such as

achievement tests from other subjects, inspection reports, parent surveys,

class tests). It is important to consider these different types of data and

data sources in schools as well, in order to make a comprehensive

evaluation of the schools’ functioning (Schildkamp & Kuiper, 2010). A

preferable scenario to foster this data triangulation would be the

development of integrated management information systems (Bosker,

Branderhorst, & Visscher, 2007). In order to obtain an integrated system,

more coherence in data conceptualization and representation is required,

not only between different data sources, but also between different

instruments of the same SPFS. A first step herein can be taken by

developers of SPFS in creating more conformity in data analyses and

representations.

With regard to the data analysis, it is important to find a balance between

statistically correct - and thus complicated - analyses and accurate results

on the one hand and understandable analyses and user friendly results on

the other. For example, the analyses used in PIPS are fairly straightforward

and not too complex. Schools can understand the results, and studies show

that schools feel ownership over the results (Tymms & Albone, 2002),

which directly influences the degree to which the feedback is actually used

(Kyriakides & Campbell, 2004; Schildkamp & Teddlie, 2008). However,

because the system does not use multilevel analyses, schools are

sometimes wrongly classified as, for example, underperforming. To reduce

these misclassifications, some researchers claim that it is necessary to apply

multilevel models (Goldstein & Spiegelhalter, 1996; Karsten, Visscher,

Dijkstra, & Veenstra, 2010). According to Yang et al. (1999), it is possible to

explain these multilevel models and outcomes to head teachers in an

understandable way. In contrast, others consider multilevel modeling as

inappropriate for feedback purposes and claim that the method of Ordinary

Least Squares is accurate enough and more understandable to schools (Fitz-

Gibbon, 1996; Fitz-Gibbon & Tymms, 2002; Sharp, 2006). Whatever

statistical analysis an SPFS uses, it should inform its users of the associated

constraints, as it presents an image of being a fair performance indicator

system.
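To make the Ordinary Least Squares alternative more concrete, the sketch below regresses an outcome score on prior achievement across all pupils and then averages the residuals per school, which is one common reading of "residual analysis". This is an illustration under stated assumptions, not the procedure of PIPS or any other system discussed: the field names (`school`, `prior`, `outcome`) and the data are hypothetical, and a multilevel model would additionally shrink the estimates of small schools towards the overall mean.

```python
from collections import defaultdict
from statistics import mean


def fit_ols(x, y):
    """Closed-form simple linear regression of y on x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("prior scores have no variance")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def school_residual_means(pupils):
    """Mean residual (observed minus predicted outcome) per school."""
    intercept, slope = fit_ols([p["prior"] for p in pupils],
                               [p["outcome"] for p in pupils])
    residuals = defaultdict(list)
    for p in pupils:
        predicted = intercept + slope * p["prior"]
        residuals[p["school"]].append(p["outcome"] - predicted)
    return {school: mean(res) for school, res in residuals.items()}


# Hypothetical data: two schools with the same intake but different outcomes.
pupils = [
    {"school": "A", "prior": 10, "outcome": 22},
    {"school": "A", "prior": 20, "outcome": 42},
    {"school": "B", "prior": 10, "outcome": 18},
    {"school": "B", "prior": 20, "outcome": 38},
]
value_added = school_residual_means(pupils)
```

A positive mean residual indicates that pupils in a school score higher than predicted from their prior achievement, and a negative one that they score lower. The indicator is relative to the pupils included in the regression.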

Moreover, it is important to realize that any type of measurement

always includes some type of error. Statistical estimates always include

uncertainty, which needs to be taken into account in any interpretation.

This especially holds for small groups, such as classes and cohorts within

schools. An SPFS should therefore provide information on limitations and

uncertainties, and provide information on the reliability of the estimates

(Fitz-Gibbon & Tymms, 2002; Mortimore & Sammons, 1994; Rowe, 2004;

Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996; Yang et al., 1999;

Karsten et al., 2010). These reliability indices are for example applied in

AsTTle, PIPS, LOVS, and SFP.

If school level data is intended to be used for making fair comparisons

with reference groups, it is advisable to work with value-added models.

Value-added is usually defined as everything the pupil has learned at

his/her school (e.g., van de Grift, 2009). However, the concept “value-

added” is not unproblematic (van de Grift, 2009). It is not possible to assess

everything a pupil has learned, such as social and creative abilities.

Furthermore, because pupils change schools and classes, different
schools and classes influence the pupils’ learning progress. It is also
not clear how this learning progress should be measured, or how to
take into account the knowledge and skills acquired outside the school. As
a result, several problems are associated with applying value-added
modeling (van de Grift, 2009; Karsten et al., 2010). We discuss some
important issues for this study.

Firstly, there is the problem of missing values, which may distort the

results. In this study, primarily in the SFP, serious attention has been

devoted to this issue (Knipprath & Verhaeghe, 2010). Moreover, these

missing data might not just be random, but might be the result of certain

interventions (e.g. grade repeating) in schools. Incorporating the impact of

these missing values in the estimation procedures would be advisable

(Sanders, 2006; van de Grift, 2009; Yang et al., 1999).

Next, there is the instability of value-added judgments. Therefore, it is

recommended to use data on successive cohorts (at least 3 school years;

van de Grift, 2009), to use longitudinal measurements (Heck, 2006) or to

average scores over several years (OECD, 2008). Also cross-sectional data

analysis might be used as it might have several advantages as compared to

longitudinal testing (Luyten, 2006; Sammons & Luyten, 2009).
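The recommendation to average over several years can be sketched as follows. The function only returns an average when estimates for a minimum number of cohorts are available; the default of three follows the recommendation cited above, whereas the function name, the data layout and the numbers are assumptions for illustration.

```python
from statistics import mean


def multi_year_average(yearly_estimates, min_years=3):
    """Average value-added estimates over cohorts; None if too few years are available."""
    if len(yearly_estimates) < min_years:
        return None  # too unstable to summarize
    return mean(yearly_estimates.values())


# Hypothetical value-added estimates for one school, keyed by cohort year.
three_years = multi_year_average({2008: 1.5, 2009: -0.5, 2010: 2.0})
two_years = multi_year_average({2009: 1.0, 2010: 2.0})
```

In this example the single-year estimates change sign, while the three-year average gives a more stable summary. With only two cohorts no summary is returned.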

Thirdly, there are different procedures for computing value-added

models, which lead to different rankings of schools (Fitz-Gibbon, 1996;

Goldstein & Spiegelhalter, 1996; Heck, 2006; OECD, 2008; Rowe, 2004;

Sanders, 2006; van de Grift, 2009; Yang et al., 1999). For example, there is

no consensus on the inclusion of student background characteristics in the

models used in the SPFSs in this study. As student achievement results are

influenced by prior achievement and student background characteristics

(such as gender and SES), several researchers stress that corrections for

these out-of-school influences might be required (Goldstein & Thomas,

1996; Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996; Heck,

2006; Yang et al., 1999; Karsten et al., 2010; Sanders, 2006; Rowe, 2004).

Fourthly, a value-added model has only limited predictive validity for

certain schools (e.g., for schools with large SES-gaps). SPFSs that use value-

added models should therefore always be careful with categorizing schools

as underperforming, and should use labels such as durably underperforming or

durably outperforming instead of ranking schools (van de Grift, 2009).

Goldstein and Thomas (1996) and Yang et al. (1999) also recommend using

these kinds of procedures only to identify “institutions at extremes”, as a

screening device to detect problems. Furthermore, one should always keep

in mind that value-added measures are only relative indicators of school

performance that should be interpreted against the reference group (Rowe,

2004; Karsten et al., 2010). Comparing institutions based on statistical

models will always require prudence (Goldstein & Spiegelhalter, 1996).

Finally, users have difficulties when interpreting value-added data

(Karsten et al., 2010; Santelices & Taut, 2009; Vanhoof, Verhaeghe,

Verhaeghe, Valcke, & Van Petegem, in press). Users should be supported to

gradually acquire expertise in data interpretation by, for example, being
offered more, and more diverse, value-added models (Schatz, VonSecker, & Alban,

2005). Also other conceptualizations in terms of “school contribution”

(Santelices & Taut, 2009) or “residual analysis” (Fitz-Gibbon, 1996; Schatz,

VonSecker, & Alban, 2005) might foster correct understanding.

In addition to the discussion on the accuracy of the reported feedback

data, we want to stress that relevant and fair feedback information does

not always need complex statistical modeling. For example, information on

pupil mobility is highly relevant to inform school policy. This variable should

not be used merely as a covariate or grouping variable or to determine

missing data (van de Grift, 2009). As only the SFP explicitly reports the
number of students entering, staying in and leaving the cohort each
year, this could be a consideration for other SPFSs. Moreover, initiatives such as
asTTle, which keep track of student results during their whole career and
make those results accessible to them, should be encouraged. By setting aside
the value-added aspect, asTTle reports student level data for all students,
irrespective of which schools they have attended before. Furthermore, we

stress that besides adjusted scores, the raw scores need to be reported

because of their informative value.
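Reporting on pupil mobility indeed requires no statistical modeling. As a minimal sketch, the numbers of pupils entering, staying in and leaving a cohort can be derived from two yearly enrolment lists with simple set operations. The pupil identifiers and the function name are hypothetical; this is not the SFP's actual implementation.

```python
def mobility_counts(previous_year, current_year):
    """Count pupils entering, staying in and leaving a cohort between two years."""
    prev, curr = set(previous_year), set(current_year)
    return {
        "entering": len(curr - prev),  # enrolled now, not last year
        "staying": len(curr & prev),   # enrolled in both years
        "leaving": len(prev - curr),   # enrolled last year, not now
    }


# Hypothetical pupil identifiers for one cohort in two successive years.
counts = mobility_counts(["p1", "p2", "p3", "p4"], ["p2", "p3", "p5"])
```

Here one pupil entered, two stayed and two left, which already offers a school a starting point for the reflective questions raised earlier about why pupils leave and which newcomers arrive.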

After having analyzed the data, SPFS developers need to carefully consider

what types of numerical measures and graphical representations to offer to

the users. Research has revealed that even simple numerical conceptions

and representations are often interpreted incorrectly (Earl & Fullan, 2003;

Zupanc et al., 2009). A sufficient level of assessment literacy is a

prerequisite for correct understanding. If not, proper support initiatives

should be provided. Other ethical issues are related to the
fairness of the data, for example the arbitrary or unfair boundaries used in
the case of cut-off scores (Fitz-Gibbon & Tymms, 2002; Heck, 2006).
In addition, explorations in terms of absolute instead of relative measures
of performance need to be encouraged (Luyten, 2006). For users to recognize all these
risks, they need to be properly informed (Karsten et al., 2010).
It might also be worthwhile to offer different types of representations

that serve different purposes (e.g. band scores for detecting outliers and

line graphs for visualizing growth; Kosslyn, 2006). This idea has been applied

in asTTle, which provides seven types of reporting serving different purposes.

SPFS developers should think carefully about each of these characteristics

and keep in mind that the use of school performance feedback does not

always lead to improvement, but that it should at least do no harm
(Fitz-Gibbon & Tymms, 2002; Rowe, 2004). Moreover, they should consider

offering training in the interpretation and use of the results, especially

when using more complex statistical modeling, as studies have shown that

SPFS use without proper training is difficult (Schildkamp & Visscher, 2009;

Verhaeghe et al., 2010; Vanhoof et al., in press). Furthermore, it is advisable

to provide the users with indications on what instructional or organizational

processes should be improved upon (Coe & Visscher, 2002; Karsten et al.,

2010; Verhaeghe et al., 2010), which has not been done by most of the

SPFSs in this study.

6. Conclusion

We believe that the components of SPFSs discussed in this study are

important aspects in ensuring that the SPFSs that have been developed all

over the world will be used as they are intended to be: for school

improvement purposes. However, we also believe that there is much to be

gained when it comes to developing SPFSs that provide schools with

reliable, valid and user friendly data. Decisions made by SPFS developers

about the design of the SPFS impact the results in ways that are not yet

fully understood, but can have implications for determining how “strong or

poor” a school is performing. Expanding and adjusting the preliminary
framework we developed into a set of standards that SPFS developers and
schools can use may aid in developing efficient instruments for data-driven
decision making.

Acknowledgement

We would like to express our sincere gratitude to the directors and

researchers of the SPFSs involved in this study for their cooperation:

• Prof. John Hattie, Director of Visible Learning Labs, director of asTTle,

University of Auckland

• Dr. Christine Merrell, Director of Primary Systems, Centre for Evaluation

and Monitoring, Durham University

• Elizabeth Archer, Project coordinator of SAMP, Centre for Evaluation and

Assessment, University of Pretoria

• Geert Evers, Information Manager Primary Education, Centraal Instituut

voor Toetsontwikkeling

• Ilse Papenburg, Training and advice, Centraal Instituut voor

Toetsontwikkeling

• Dr. Jean Pierre Verhaeghe, Project coordinator of the SFP, Ghent

University and Katholieke Universiteit Leuven

References



School Effectiveness and School Improvement, 18(4), 451–467.

Coe, R., & Visscher, A.J. (2002). Drawing Up the Balance Sheet for School

Performance Feedback Systems. In R. Coe & A.J. Visscher (Eds.), School

improvement through performance feedback. Lisse: Swets & Zeitlinger

Publishers.

Davies, D. & Rudd, P. (2001). Evaluating school self-evaluation (Research

Report No. 21). Berkshire, UK: National Foundation for Educational

Research, Local Government Association.


of Technology, Center for Advanced Engineering Study.



Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and
effectiveness. London: Cassell.










in Society, 159(3), 385-443.

Goldstein, H., & Thomas, S. (1996). Using examination results as indicators

of school and college performance. Journal of the Royal Statistical

Society. Series A: Statistics in Society, 159(1), 149-163.



667-699.




Karsten, S., Visscher, A.J., Dijkstra, A.B., & Veenstra, R. (2010). Towards

standards for the publication of performance indicators in the public

sector: The case of schools. Public Administration, 88(1), 90-112.

Knipprath, H., & Verhaeghe, J.P. (2010, April). Instability of the school

population: the less favourable side of longitudinal educational

effectiveness research. Paper presented at the 2010 AERA Annual

Meeting. Denver.

Kosslyn, S.M. (2006). Graph design for the eye and mind. Oxford: Oxford
University Press.

Kyriakides, L. & Campbell, R.J. (2004). School self-evaluation and school

improvement: A critique of values and procedures. Studies in

Educational Evaluation, 30, 23-36.



Schatz, C.J., VonSecker, C.E., & Alban, T.R. (2005). Balancing accountability

and improvement: introducing value-added models to a large school

system. In R. Lissitz (Ed.), Value added models in education: Theory and

applications (pp. 1-18). Maple Grove, Minnesota: JAM Press.

Luyten, H. (2006a). An empirical assessment of the absolute effect of

schooling: Regression-discontinuity applied to TIMSS-95. Oxford Review

of Education, 32(3), 397-429.

Luyten, H., Tymms, P., & Jones, P. (2009). Assessing school effects without

controlling for prior achievement? School Effectiveness and School

Improvement, 20(2), 145-165.




Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An

expanded sourcebook. Thousand Oaks, CA: Sage.

Mortimore, P. & Sammons, P. (1994). School effectiveness and value added

measures. Assessment in Education: Principles, Policy and Practice, 1(3),

315.

Organisation for Economic Co-operation and Development (2008).

Measuring improvements in learning outcomes: Best-practices to assess

the value-added of schools. Paris: OECD Publishing.







Sammons, P. & Luyten, H. (2009). Editorial article for special issue on

alternative methods for assessing school effects and schooling effects.


Sanders, W.L. (2006). Comparisons among various educational assessment

value-added models. Paper presented at The Power of Two National

Value-Added Conference, Columbus, Ohio.

Santelices, V. & Taut, S. (2009, September). Comprehension and use of

value-added school performance indicators reported to teachers and

parents. Paper presented at the European Conference on Educational

Research, Vienna.









35(4), 150-159.

Sharp, S. (2006). Assessing value-added in the first year of schooling: Some

results and methodological considerations. School Effectiveness and


Smith, P. (1995). On the unintended consequences of publishing

performance data in the public sector. International Journal of Public

Administration, 18(2), 277-310.

Tymms, P. (1999). Baseline assessment and monitoring in primary schools.

London: Fulton Publishers.

Tymms, P., & Albone, S. (2002). Performance indicators in primary schools.

In A.J. Visscher, & R. Coe (Eds.), School improvement through

performance feedback (pp. 191-218). Lisse: Swets & Zeitlinger.

van de Grift, W. (2009). Reliability and validity in measuring the value added

of schools. School Effectiveness and School Improvement, 20(2), 269-

285.

Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P.

(in press). The influence of competences and support on school

performance feedback use. Educational Studies.




101-119.

Van Petegem, P., Vanhoof, J., Daems, F., & Mahieu, P. (2005). Publishing

information on individual schools. Educational Research and Evaluation,

11(1), 45-60.


school performance feedback: Perceptions of primary school principals.



feedback systems. In A.J. Visscher & R. Coe (Eds.), School improvement


Swets & Zeitlinger.

Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through

performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.




Yang, M., Goldstein, H., Rath, T., & Hill, N. (1999). The use of assessment

data for school improvement purposes. Oxford Review of Education,

25(4), 469-483.





CHAPTER 3: PERCEPTIONS OF PRIMARY SCHOOL PRINCIPALS ABOUT SCHOOL PERFORMANCE FEEDBACK USE∗

Abstract

The present study focuses on the perception of primary school principals of

school performance feedback (SPF) and of the actual use of this

information. This study is part of a larger project which aims to develop a

new school performance feedback system (SPFS). The study builds on an

eclectic framework that integrates the literature on SPFSs. Through in-

depth interviews with 16 school principals, four clusters of factors

influencing school feedback use were identified: context, school and user,

SPFS, and support. This study refines the description of feedback use in

terms of phases and types of use, and effects on school improvement.

Although school performance feedback can be seen as an important

instrument for school improvement, no systematic use of feedback by

school principals was observed. This was partly explained by a lack of skills,

time, and support.

∗ Based on Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School

Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and


1. Introduction

In recent years, the trend of decentralizing educational systems has

prompted researchers to focus on school-based management and internal

evaluation. Because schools are granted autonomy, governmental bodies

expect them to be accountable for monitoring their internal quality policy

(Nevo, 2002). In this context, the current performance level of a school

serves as a starting point for developing future plans and educational

targets. To assess their baseline performance level, schools can make use of

feedback offered by school performance feedback systems (SPFSs). These

external systems deliver confidential information about a school’s

performance and functioning (Visscher & Coe, 2002, 2003). Performance

feedback helps to reveal the strengths and weaknesses of a school’s

functioning and is expected to contribute to the school improvement

process by stimulating reflection and self-evaluation.

However, receiving feedback alone is not a sufficient condition to foster

self-evaluation and systematic reflection at the school level. Several other

conditions related to the school, the context, and the specific SPFS being

used, determine if and how schools will make use of the available feedback.

Empirical research on SPFSs is limited (Schildkamp & Teddlie, 2008). Studies

that have been carried out indicate that the actual use of school feedback

and its impact are rather low (Coe, 2002; Tymms, 1995; Saunders & Rudd,

1999; Van Petegem & Vanhoof, 2004). We believe that a detailed study of

the use and impact of existing school performance feedback initiatives is

warranted (Goldstein & Spiegelhalter, 1996; Schildkamp, 2007; Schildkamp,

Visscher, & Luyten, 2009; Visscher & Coe, 2002, 2003). In this study we

build on the findings of an ongoing project which focuses on the design,

development, and implementation of an SPFS in Flanders (the Dutch-speaking

community of Belgium). We investigate the perceptions of school

principals of factors that promote or hinder their understanding and use of

school performance feedback information. The results of this study are

expected to support the development of SPFSs and to further refine

theories on school feedback use.

2. Theoretical framework

Based on a literature review, we developed a conceptual framework that

integrates factors affecting SPF use and effects (Fitz-Gibbon & Tymms,

2002; Schildkamp, 2007; Van Petegem & Vanhoof, 2007; Visscher, 2002;

Visscher & Coe, 2003). This framework is presented in Figure 1.


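Figure 1. Conceptual framework of school performance feedback use

[Figure not reproduced. Labels recoverable from the diagram: influencing factors (context related; school and user related; school performance feedback (system) related; support related); phases in use (including interpretation and policy actions); results: types of use; effects.]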

2.1. School performance feedback use: Phases, types and effects

Adequate use of SPF is expected to lead to specific effects at the school and

pupil level (Visscher, 2002; Schildkamp, 2007). Its purpose is to contribute

to school improvement and lead to higher student performance (Visscher &

Coe, 2003). Apart from the intended effects of SPF, unintended effects have

also been reported in the literature, such as selective student admissions,

teaching to the test, and removing difficult students (Visscher, 2002). Other

studies refer to undesirable side effects of SPF, such as the demotivation of

school staff who become overwhelmed by the amount of the data involved

and the amount of time they have to invest (Fitz-Gibbon & Tymms, 2002;

Schildkamp & Teddlie, 2008). In this context, SPF does not always result in

significantly better student outcomes (Fitz-Gibbon & Tymms, 2002;

Schildkamp, Visscher, & Luyten, 2009; Visscher, 2002). Nevertheless,

recent research indicates that SPF can have a positive impact on pupil

achievement levels (Hammond & Yeshanew, 2007) and on the associated

school improvement processes (Schildkamp, 2007; Schildkamp & Teddlie,

2008; Schildkamp, Visscher, & Luyten, 2009). In these studies several effects

on process indicators were observed, such as an improvement in

consultation and communication about school functioning and school

quality, improved didactical approaches, and a stronger achievement

orientation of staff. However, considering the limited amount of research

available, caution is warranted in drawing conclusions about the reported

effects of SPF use (Coe, 2002; Schagen, 2004).

The way school feedback is used plays a key role in its potential impact.

In terms of a policy-making cycle (e.g., Hoy & Miskel, 2001) feedback should

be used in the following sequence. First, feedback results must reach the


proper person(s). Second, the data in the report must be read and

interpreted correctly for it to be meaningful. In the subsequent diagnostic

process, causes and explanations for the results are deliberated. The

diagnostic process results in actions that are implemented and finally

evaluated. However, research indicates that school principals do not always

disseminate feedback information or simply distribute feedback reports

without examining them (Van Petegem & Vanhoof, 2004). Other studies

found that school feedback users often get stuck in the transition from the

interpretation of SPF to active policy making (Vanhoof, 2007; Schildkamp,

2007). This is highly problematic as the interpretation of the data is

essential in deducing workable information (Earl & Fullan, 2003). These

phases of data use are outlined in the practice of data driven decision

making (Learning Point Associates, 2004). However, in the current literature

on SPF use for school improvement these phases are not distinguished in a

systematic way.

Within this policy-making cycle different types of feedback use can be

distinguished: (1) direct/instrumental, (2) conceptual, and (3)

symbolic/convincing (Rossi, Lipsey, & Freeman, 2004). An instrumental use

of feedback serves as a starting point for immediate policy making

decisions. A conceptual use of feedback does not result in concrete actions,

but influences the decision making process, which indirectly affects action.

Even if feedback does not influence one’s conceptualizations, it can affect

the policy making process in a symbolic way. This means feedback results

serve to convince others of existing opinions and to support viewpoints in

discussions (Visscher, 2002). Furthermore, feedback can be used in a

strategic way for accountability purposes, although this is not in line with a

school improvement discourse (Visscher & Coe, 2003). These four types of

feedback use can be considered as results of feedback use. For example, a

conceptual use results in an altered way of thinking about pupil

performances. This intermediate result can in the end lead to effects of

feedback use, such as a stronger achievement orientation.

2.2. Factors influencing school performance feedback utilization

Differences in the interpretation and use of school feedback can be

attributed to a variety of factors. In the framework of Visscher (2002) and

Visscher and Coe (2003) the following set of influential factors is outlined:

context, school and user, SPFS, and support. The authors embed the

process of feedback use in the broader school environment, which we call

context related factors. They do not distinguish support related factors as a

separate set, but place them within the implementation process and


characteristics of the feedback system. These variables were selected based

on a literature review in the fields of educational innovation, educational

management, business administration, and computer science. However,

the relations between the different influencing factors and the feedback

effects are not examined (Visscher, 2002). This framework is used as a basis

for the present study.

Context related factors that impact feedback use include the school’s

policy strategies at the regional and/or governmental level (Sun, Creemers

& de Jong, 2007; Visscher, 2002). For instance, policies can contain clear

expectations that schools make use of feedback information. Educational

governments can stimulate feedback use by pressure and/or support.

Furthermore, feedback will be used differently depending on the context

(e.g., school improvement, school accountability, or a combination of both

strategies) (Vanhoof & Van Petegem, 2007; Visscher, 2002).

Secondly, school and user related characteristics seem to be key

variables explaining differences in school feedback use. First, the motivation

to use an SPFS leads to different utilizations. Motivation varies from

internal quality development or external accountability, to policy

preparation (van Aanholt & Buis, 1990; Liket, 1992). Secondly, previous

experiences with feedback use, general experience with school related

data, and the statistical knowledge and skills needed to interpret feedback

reports will also influence feedback use. While most teachers have

experience with school test data, pupil monitoring systems, and self-

evaluations, in several studies school staff report that they are lacking the

skills and confidence when using data for school policy purposes (Earl &

Fullan, 2003; Kerr, Marsh, Ikemoto, Darilek, & Barney, 2006; Saunders,

2000; Williams & Coles, 2007). Thirdly, school performance levels also

influence feedback use (Visscher, 2002; Visscher & Coe, 2003). Schools

receiving positive feedback (large value added) will discuss the results

differently compared to schools receiving a less positive picture

(Schildkamp, 2007). In line with control theory, participants receiving

negative feedback are more likely to make an effort to reduce the

discrepancy between the negative feedback and the expected standards

(Kluger & DeNisi, 1996). This will result in different policy implications.

However, this theory does not hold in all cases; it is not unusual for school

principals to withhold feedback information that does not fit the current

policy plan (Van Petegem & Vanhoof, 2004).

A third set of factors influencing school performance feedback use refers

to the characteristics of the school feedback reports and the feedback

system. In this context, the perception of the user determines how

feedback will be used (Visscher, 2002; van den Berg & Ros, 1999). At the


level of content, feedback should be perceived as relevant, non-

threatening, and corresponding to the actual informational needs

(Schildkamp & Teddlie, 2008; Visscher, 2002; Van Petegem & Vanhoof,

2007). Furthermore, the representation of both absolute and relative

school performance results also impacts the way feedback is used (Visscher,

2002; Visscher & Coe, 2003). If relative measures are used to compare the

school’s results with a reference group, these school scores should be

adjusted for the influence of pupil background characteristics and should be

linked to the relevant cohort group (Goldstein & Spiegelhalter, 1996).

Information should also be up-to-date, reliable, and valid (Visscher, 2002;

Visscher & Coe, 2003; Schildkamp & Teddlie, 2008). In terms of ethical

issues, Fitz-Gibbon and Tymms (2002) refer to the Hippocratic Oath and

state that feedback should “at least do no harm” (p. 75). For example, in

some cases feedback can be threatening to recipients’ self-esteem,

particularly in a system of accountability (Visscher & Coe, 2003). Consistent

with our definition of SPFSs, feedback systems for school improvement

should guarantee confidentiality and anonymity to the subjects and

schools. Moreover, feedback should not harm subjects or schools on the

basis of misleading information (Goldstein & Myers, 1996).

The fourth and final set of factors that affect feedback use concerns the

support experienced by feedback users (Schildkamp & Teddlie, 2008).

School staff that are involved in SPFS training are more likely to read the

feedback reports and adopt a more positive attitude (Tymms, 1995).

Numerous studies stress the importance of providing feedback support

(e.g., Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009;

Van Petegem & Vanhoof, 2007; Visscher & Coe, 2003). This can be

administered by educational and government parties, school team

members, or the feedback system itself.

3. Research questions

This study examines the perception of school feedback users. Based on the

conceptual framework discussed above, the following research questions

are asked:

• What phases can be observed in practice when schools use school

performance feedback?

• What is/are the result(s) of using school performance feedback?

• How can differences be explained in the interpretation and the further

use of school performance feedback in different school contexts?


4. Research context

This study is part of a larger SPF project called “Each school its own mirror.”

As there is currently no SPFS available in Flanders, this project is in the

process of developing and evaluating a new SPFS with collaboration

between researchers, various stakeholders, and a target group of primary

school principals and teachers. The system that has so far been developed

from the SPF project gives schools feedback on a confidential basis. These

feedback reports are designed to enable teachers and principals to

understand the value added scores of their school as compared to a

reference group. The reference group used is taken from another research

project (the SiBO project, Schoolloopbanen in het BasisOnderwijs [School

Trajectories in Primary Education]) that is currently tracking approximately

6000 children from a representative sample of Flemish schools (from the

time they entered kindergarten until the end of primary education). In the

SPF project, scores on tests and survey and observational data are being

continuously collected to gather information on child characteristics, family

background, class characteristics, classroom practices, teacher attitudes

and subjective theory, and school characteristics. The tests focus on

language learning (orthography, reading fluency, reading comprehension)

and mathematics. IRT-based techniques are used to construct the test

scores, enabling us to estimate growth curves.

The SPF project is currently able to deliver trial versions of school

feedback reports to the 198 primary school principals participating. In this

study, we build on the results from the trial versions sent to the schools in

the spring of 2007. These reports inform schools about the performance of

children and classes in the first two years of primary education. Results

were reported for mathematics, reading fluency, and orthography,

supplemented with information about pupil characteristics (child factors,

home factors, and Dutch language skills at the start of grade 1). The school

specific results were compared to the Flemish reference group. The central

concepts in these reports include learning gain, value added, and adjusted

scores and were explained in such a way that no prior statistical knowledge

was required. The data were supported with graphical representations (i.e.,

boxplots, bar graphs, pie graphs, growth curves, and cross tables). The text

of each report was standardized. The school principals were required to

interpret the results for their school, based on the general information

made available. They also received individual pupil feedback which

represents the observed scores and percentile rankings relative to the

reference group. Pupil feedback was presented to the schools shortly after


taking the class tests, but the aggregated scores at class and school level

were sent approximately 10 months later.

5. Research design

5.1. Research approach

In this study we use a qualitative design to explore the perceptions of

primary school principals of SPF use. A qualitative approach is appropriate

since we want to develop a view on “naturally occurring, ordinary events in

natural settings, so that we have a strong handle on what ‘real life’ is like”

(Miles & Huberman, 1994, p. 10). It is recommended when the knowledge

base is limited and the nature of the variables, processes, and interrelations

is less clear (Maso & Smaling, 1998), which holds for the literature about

SPF use.

5.2. Research instrument and procedure

Data were gathered on the basis of semi-structured in-depth interviews.

This type of interview creates an informal relationship between researcher

and respondent, and gives the researcher a better understanding of the

perceptions, opinions, and views of respondents (Mason, 2002). The

interview questions were largely open ended and were derived from the

conceptual framework discussed above. Respondents were invited to

describe their school situation, to propose suggestions, and to express their

concerns. To clarify remarks or to ask for elaboration, spontaneous follow-

up probes were allowed (Lindlof & Taylor, 2002). Examples of questions

include:

• Questions about feedback characteristics: These questions focused on

the perceptions of the relevance, interpretability, user-friendliness,

validity and reliability of the feedback information (e.g., Do you think the

information is relevant to draw a picture of the school’s influence on

pupils’ performances? Which information is the most relevant? Why? Do

you trust the quality of the feedback results?).

• Questions about school and user characteristics mainly focused on

interpretation skills, expectations of feedback use, and the perception of

the school’s performance (e.g., Do you feel comfortable interpreting

these feedback results? If yes, where did you acquire the knowledge and

skills for this? Which problems did you encounter?). Furthermore,

questions regarding school culture characteristics were asked (e.g., Is


there a culture of systematic reflection? To what degree do teachers

welcome school performance feedback reports? Besides this feedback

project, are there other data gathering systems used to assess the

school’s functioning?).

• Questions about support initiatives included support use and support

needs (e.g., Have you engaged the team members when interpreting the

feedback reports? Do you feel enough support from the school staff when

interpreting the results? Is there a need for more external support? For

which activities?).

• Questions on feedback use were formulated to discern different types

and phases of feedback use (e.g., Did you formulate any goals you want

to achieve by using feedback? Which initiatives are you undertaking to

communicate the feedback results to staff members? Did the feedback

report play a role in policy decision making? Has it influenced your way

of thinking about the school? Did you use the report for strategic

purposes, such as promoting your school, informing the school

inspectorate about your school’s results? Did you use the report to

legitimize your own convictions?).

• Questions about feedback effects were not stressed because it was

unlikely that effects of feedback use on the school could already have

been observed in the three-month period between the feedback

delivery and the interview. However, questions about participants’

expectations of effects were posed (e.g., What effects should take place

for your effort to have been worthwhile?).

• The perception of context related factors is limited in this study to the

influence of the inspection visits to schools.

School principals were visited in their school office by one of the two

interviewers, three months after receiving their school performance

feedback report. Interviews lasted approximately 90 minutes.

5.3. Theoretical sampling

From the 198 SiBO-principals, a sample of 16 primary school principals was

selected by means of theoretical sampling, maximizing a variety of feedback

use (Mason, 2002; Silverman, 2005). In this sampling method the choice of

cases is made on conceptual grounds, not on representative grounds (Miles

& Huberman, 1994). To gather this sample, two months after having

received feedback reports, the 198 principals were asked to fill out an

online survey. We obtained a response rate of 61%. The principals were

selected for the present study on the basis of the following variables: the


degree to which they used the school feedback, the number of children

without special needs in their school, experience in working with self-

evaluation, and school performance as represented in the feedback report.

For each variable the schools were divided into three groups (low, average,

and high), with the exception of the school performance level (positive or

negative value added). In the survey the principals were asked with whom they

had discussed the feedback report, choosing from six answer options. The

number of options selected was taken as an indicator of feedback use.

Respondents who selected more than three options were defined as high users;

principals who marked fewer than two options were defined as low feedback users (M = 1.77, SD = 1.26).

The second variable concerns the school’s performance level (Visscher,

2002; Visscher & Coe, 2003). A distinction was made between schools with

a positive or negative value-added mathematics score at the end of grade

two. In the online survey principals were asked to report their degree of

experience in conducting self-evaluations in the school. Respondents with

scores higher than three on a 5-point Likert scale were classified as highly

experienced and those with scores less than three as having a low degree of

experience (M = 3.50, SD = 1.08). This selection criterion was used as it

indicates prior experience in data use for school improvement. The fourth

selection variable was the number of pupils without special needs at their

school. As feedback reports in this case were adjusted for pupil background

characteristics, a differential appraisal of the feedback relevance was

expected. Schools with percentages between 30 and 70 are considered as

having an average number of pupils without SEN (M = 50.36, SD = 27.73).

Figure 2 gives an overview of the selected schools.
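The grouping rules used for the theoretical sampling can be summarised in a short sketch. The function names are hypothetical, and the treatment of boundary values (a Likert score of exactly three, percentages of exactly 30 or 70, a value added score of exactly zero) is an assumption made for illustration; the text states only the thresholds themselves.

```python
def classify_feedback_use(options_marked: int) -> str:
    """Classify feedback use from the number of answer options marked (0 to 6)."""
    if options_marked > 3:
        return "H"  # more than three options: high user
    if options_marked < 2:
        return "L"  # fewer than two options: low user
    return "A"      # two or three options: average user


def classify_self_evaluation(likert_score: int) -> str:
    """Classify self-evaluation experience from a 5-point Likert score."""
    if likert_score > 3:
        return "H"
    if likert_score < 3:
        return "L"
    return "A"      # assumption: a score of exactly three counts as average


def classify_school_population(pct_without_sen: float) -> str:
    """Classify a school by its percentage of pupils without special needs."""
    if pct_without_sen > 70:
        return "H"
    if pct_without_sen < 30:
        return "L"
    return "A"      # assumption: 30 and 70 themselves count as average


def classify_value_added(value_added: float) -> str:
    """Positive or negative value added; zero is treated as negative (assumption)."""
    return "+" if value_added > 0 else "-"


if __name__ == "__main__":
    # Hypothetical school: 4 options marked, Likert score 2, 55% of pupils, +0.3
    profile = (
        classify_school_population(55.0),
        classify_value_added(0.3),
        classify_self_evaluation(2),
        classify_feedback_use(4),
    )
    print(profile)
```

Keeping one small function per selection variable mirrors the way the four variables are described separately in the text.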

Figure 2. Overview of selected respondents.

Note: H = high, A = average, L = low, ? = information unknown, + = positive value

added score, - = negative value added score; number of respondents in

parentheses. From left to right respectively respondents from school 2, 11, 7, 1, 16,

3 & 10, 4 & 13, 15, 14 & 5, 6, 9, 8 & 12.

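[Figure 2, tree diagram rendered as text]

Interviewees: 16

School population H (5)
  Value added + (3): self-evaluation H (3); feedback use H (1), A (1), L (1)
  Value added - (2): self-evaluation L (2); feedback use A (1), L (1)

School population A (7)
  Value added + (4): self-evaluation H (4); feedback use H (2), A (2)
  Value added - (3): self-evaluation H (3); feedback use H (1), A (2)

School population L (4)
  Value added + (1): self-evaluation ? (1); feedback use L (1)
  Value added - (3): self-evaluation H (1); feedback use L (1)
                     self-evaluation L (2); feedback use A (2)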


5.4. Framework analysis

In addition to informing the design of the SPFS, the results of this study were

also used as a means to evaluate the theoretical framework presented

above. Therefore the interview data were placed in the theoretical frame to

examine whether the theoretical findings were confirmed or needed to be

altered and/or elaborated. This can inspire future studies that build on new

preliminary concepts and hypotheses (Ritchie & Spencer, 1994). These

findings can also contribute to the ecological validity of research findings on

feedback effects, as here they are applied in the context of school

improvement (Visscher & Coe, 2003).

Each interview was transcribed verbatim and was independently coded

by two researchers with ATLAS.ti, a qualitative analytic software tool. Codes

were assigned by following the middle order approach, which allows for the

initial application of broad categories that can later be refined (Dey, 1993).

Text fragments were mainly assigned to codes in a deductive way. First, text

fragments were placed under broad categories (e.g., effects of use, phases

of use, the four groups of influencing factors, types of use, and other

relevant information) and were then assigned to a predefined coding

structure. If no predefined code was appropriate, the text fragments

considered to be of importance were placed under the suitable broader

category. New codes were created for these fragments inductively,

emerging from the data, as in the grounded theory approach (Strauss &

Corbin, 2007).

For inter-rater agreement, the first two interviews were coded

collaboratively and the coding structure was set up. Two interviews were

then coded by both researchers separately to calculate inter-rater

reliability, following the formula of Miles and Huberman (1994): ratio

between the number of agreements and the total number of attributed

codes. An inter-rater agreement value of .90 was calculated, indicating

good inter-rater reliability.

After this coding phase, the analysis shifted from a focus on individual

interviews in a vertical analysis to a focus on the coding categories as they

occurred in all the different interviews in a horizontal analysis (variable

oriented approach; Miles & Huberman, 1994). This allows the researcher to

transcend the individual narratives of the school principals and to create a

spectrum of perceptions and interpretations.


6. Findings and discussion

6.1. What phases can be observed when schools use school performance

feedback?

The interview results confirm that school performance feedback use in

primary schools is limited. Most schools were situated at the first phase of

the policy cycle described above. Only a few schools reached the planning

phase and action phase in the policy cycle.

Concerning the dissemination of information, the first stumbling block

occurred at the moment feedback reports arrived at the school. Though all

interviewees confirmed receipt of the report, one of them could not

remember it. This stumbling block became more apparent when we

examined the various ways in which the reports were handled. In some

schools, the report was not read: “Mostly the reports arrive at the school. I

give it a glimpse and then it is classified. Then, nothing is done with it”

(School 8). Other school principals reported they only took a quick look at it.

In contrast, others distributed the report to the teachers responsible for the

class that was discussed in the report. Others handed the report over to the

special needs teachers or special care coordinators. Sometimes teachers

were intentionally not asked to be involved in reading the reports.

My opinion is that if you are not really acquainted with the

interpretation of these data, you will not spontaneously unravel the

whole report. It is not so easy. It is an extra task on top of the rest. If I do

this and draw the conclusions and give it to them, it is already a lot.

(School 5)

Occasionally reading the feedback reports led to discussion between the

principal and the special care coordinator. In other cases, teachers were

also invited into a discussion, but even then it was not guaranteed that they

would read the reports. Principals reported that informal and unplanned

discussions took place:

We have a smoking room. That’s where we discussed the report. Those

who entered the room glanced through the report. It was not

intentionally communicated to the rest of the team members. This

happened rather informally. (School 10)

Other principals reported having a formal discussion during planned

multidisciplinary team meetings. In these instances, the school principal or

special care coordinator presented a summary of the results and their

interpretations. All school principals reported that they only discussed the


feedback information within the school team, apart from

reporting the information to the education inspectorate.

While we made a theoretical distinction between a reading and an

interpretation phase, it became clear that in practice these phases merged

together. The principals or special care coordinators that discussed the

results with team members proposed their own interpretations.

The new report was read and discussed by me and the care coordinator.

Afterwards the report was discussed in a team meeting with all teachers;

not just the teachers that are involved in the research. Conclusions and

underlying statistical procedures were communicated. Growth curves

were presented. (School 2)

Principals also stressed that the interpretation process was an intensive,

time consuming, and difficult activity. Some confirmed that they were not

able to correctly interpret or understand the information. This is

problematic as the interpretation phase is crucial for developing a solid and

valid basis for the development of school policy (Earl & Fullan, 2003). While

a minority reported not having experienced difficulties, all principals

reported that successfully interpreting the report requires effort.

You really have to examine it carefully to figure it out. I went over it …but

to really master it, you have to read and examine it several times.

(School 1)

I think that …one of the reasons is that you first look at it. It is similar to

the directions for use of a new apparatus. First you set it up and

afterwards you read how it works. If the set up is successful, you are not

going to read the instructions for use. (School 14)

The laborious interpretation phase seems to have a strong impact on the

diagnostic phase. Most principals dropped out after one attempt at

understanding the feedback results. Only a few principals set up initiatives

to identify strengths and weaknesses in their school and examined the

feedback information when looking for explanations. However, this was

rarely set up in a systematic way.

Principals frequently stated that the diagnostic and action phases were

barely reached. They also linked this to the lack of cues in the feedback

reports that might direct future action. This may be a reason why school

feedback is not systematically taken into consideration when developing

internal policy.


We discuss it with the teachers involved. And, until now, the

interpretation is limited to the reading of the report and the file, but no

immediate actions follow from this. (School 4)

But this feedback is not that useful for classes and individual children. I

think this is the biggest concern. In fact it has to be as concrete as

possible. That is the request of teachers; something ready-made. In fact

this is also partly how I am. If I take a method book, I expect not to have

to search for accompanying exercises. (School 3)

6.2. What is/are the result(s) of using school performance feedback?

The findings discussed above indicate factors that can affect the outcomes

of SPF use. We found that in some schools feedback is used as a mirror

image of the school’s performance. In those cases a better understanding

of the school’s impact on pupil performance was developed. However, this

did not automatically lead to (policy) actions. This can be labeled as

conceptual feedback use; it led to reflection in schools, even when the

results confirm prior findings and impressions.

Indeed, so far we have (…) already noticed a few things concerning the

school’s position that we were not aware of before. What we also notice

is that there is a large pupil mobility, which influences our results

significantly. These are important findings for us. (School 12)

Most important was to see where the school’s position is. How well are

we performing and whether the school realizes a value added score. This

is, for me personally, a refinement in thinking about what you are doing

as a school, about your task, about your aims … (School 7)

Illustrations of instrumental feedback usage were rare. Some principals

stated that the feedback information did not offer enough starting points

(e.g., remedial information) to direct actions. However, some principals

reported that action had been taken, such as a reorganization of rosters, an

increase in the number of teaching hours, the introduction of a new reading

method, and more intensive mentoring of new teachers. Even when

information confirmed prior findings, it led to instrumental feedback use:

“What is reported confirms what we already assumed. It is more like an

affirmation of our feelings. And we have done a few things, such as

introducing a new spelling method.” (School 10)

Feedback information was particularly used in a symbolic way.

Respondents indicated that school feedback was a useful instrument in

highlighting existing opinions and underlining various problems in the

school’s functioning. According to the respondents, the feedback was used

as input for shared decision-making. However, this did not lead to concrete

action.

I had my own vision of the school and I wanted to impose it on the team;

this was a good instrument to make out a case for it and to say it is

necessary that we deal with this. (School 4)

Examples that we found of strategic utilization referred to the use of

school feedback in the development of the self-evaluation report to be

submitted to the education inspectorate. Principals reported that they were

grateful to participate in the study because they could make use of school,

class, and pupil related information for this purpose. This factor deviates

from the original theoretical model of SPF usage. Schools seem to have

used the feedback information in the context of being accountable to the

inspection authorities. This is in contrast to the perception of the authors

and developers of the SPFS who want feedback to be used for school

improvement.

Not all of the information gathered about feedback use could be placed

within the predefined coding scheme that was based on the literature

(Rossi, Lipsey, & Freeman, 2004; Visscher, 2002). Therefore two extra codes

were created: a motivating use and a pupil directed use. In some cases, the

feedback information helped to motivate or stimulate school team

members. In some schools, the feedback was communicated to team

members for this purpose, which sometimes implied a selective

presentation of the results.

If you are an immigrant school, as we are, sometimes it is questioned if

our performance level is high enough. And if you receive an output report

from an external organization, it partly confirms we are doing a good

job. (School 16)

For making internal plans (…) we selected some results for reading and

mathematics. We used these results for our own reports to say: ‘Look, on

this measurement occasion, we just took out these results and notice

that our children score like this. And the Flemish average is like this.

Thus, we are below this average’. (School 7)

The latter statement illustrates that lower performance results were also

used to motivate the team members to overcome shortcomings.

Conversely, some school principals kept the feedback results private,

especially if they were not as good as expected. This was explained by the

intention not to discourage team members.

For example, concerning the learning gain scores. Absolutely. If I had to

communicate it and mention that for example the learning gain in the

first grade is smaller than on average and in the second grade larger

than on average, this would be very hard to bear for the teacher involved

if this is made public. I am sure of that. (School 6)

All of the aforementioned examples indicate feedback usage at the

school level. During the interviews, principals stressed that aggregated

results were useful for policy makers, but not for teachers, who prefer a

pupil directed utilization. Classroom teachers need data at the pupil level to

direct actions that correspond to the learning needs of individual pupils.

Pupil feedback is seen as complementary to pupil monitoring systems and is

also considered more accessible to interpretation and to direct action on

short notice.

These interview results indicate that school feedback is not extensively

used and has a limited impact. In fact, many school principals had not yet

noticed school improvement effects by using the SPF, and if they had, they

referred to the effects of using the feedback reports of the previous year.

[As a result of mentoring starting teachers and introducing a new

method; cf. instrumental use] We see the AVI-results [AVI is a Dutch

grading system for reading fluency often used in primary education].

When before almost no pupil reached an AVI-1 level at the end of the

year with that method and that young teacher, we now have several

AVI-6 levels. Thus we have good results. That partly was a result of that.

(School 1)

Some principals stated that, because of the longitudinal nature of the study

that provides the feedback services, barriers against the feedback

discussions in the group decreased and interest in the results increased.

This illustrates the valuable effects of process variables that indirectly

contribute to school improvement (Schildkamp, 2007; Schildkamp, Visscher,

& Luyten, 2009; Schildkamp & Teddlie, 2008).

6.3. How can differences be explained in the interpretation and the use of

school performance feedback in different school contexts?

In the theoretical framework, different factors/conditions were discerned

that explain differences in school feedback use. Our findings confirm the

distinction of four clusters of related factors.

Context related factors

To understand why school feedback is used to such a limited extent, we

must take into account both the research context and the Flemish

educational context. In terms of the research context, the SPF presented

information at the school level with adjusted scores. These built on a

comparison with a reference group, resulting in value added scores. This is

a very new approach that principals are not acquainted with.
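To make the notion of an adjusted, value added score more concrete, the following minimal Python sketch may help. It is an illustration only and not the model used in the SPF project: it assumes a single hypothetical numeric background indicator per pupil (`background`), fits an ordinary least squares line on the pooled reference group, and takes a school's value added to be the mean difference between its pupils' observed scores and the scores expected from that line. The function names and the data are invented for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def value_added(pupils):
    """Mean residual per school.

    pupils: list of (school, background, score) tuples, where `background`
    is a hypothetical single numeric background indicator (an assumption
    of this sketch, not the project's adjustment model).
    """
    xs = [background for _, background, _ in pupils]
    ys = [score for _, _, score in pupils]
    a, b = fit_line(xs, ys)  # expectation based on the whole reference group

    residuals = {}
    for school, background, score in pupils:
        expected = a + b * background
        residuals.setdefault(school, []).append(score - expected)
    return {school: mean(values) for school, values in residuals.items()}


# Invented reference group of two schools with identical pupil backgrounds.
pupils = [
    ("A", 40, 52), ("A", 50, 61), ("A", 60, 70),
    ("B", 40, 48), ("B", 50, 55), ("B", 60, 64),
]
scores = value_added(pupils)
```

In this invented data set school A scores above and school B below what the reference group line predicts, so A receives a positive and B a negative value added score, even though both schools serve pupils with the same background values.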

In terms of the Flemish educational context, the central educational

authorities do not formally encourage or oblige schools to adopt an SPFS

approach. Indeed, some authorities are even reluctant to do so, stating that

it introduces the risk that schools will be compared and ranked on the basis

of biased information or that adjusted scores will reveal another school

performance level than expected. However, educational inspection

authorities adopt another view. They encourage schools to document

school performance on the basis of performance related information.

[On being questioned about whether it was a conscious choice to

participate in the research project] You always have the possibility to

refuse…The main reason for me to participate was that our inspectorate

often asks for output results. And yes, of course we have our own class

tests but there is no reference point, because teachers create their own

tests. We also have tests from our methods. But nowhere is there a

comparison with another school to see how we perform. (School 1)

School and user related factors

The interview analysis indicated four groups of related school and user

characteristics.

Differences in expected functions and effects of school performance

feedback. School principals differed in the degree to which they had

expectations of using feedback as well as in the goals they pursued

with feedback use. Some did not even define goals or

targets, while others reacted in a proactive way. When schools did

formulate explicit and shared goals, the chances of observing more optimal

and successful feedback use increased. This indicates that if schools are

convinced of the potential of school feedback, they undertake actions

toward effective use (Bosker, Branderhorst, & Visscher, 2007). These

actions have to be performed by the users themselves for innovations to

become successful (Fullan, 2007).

A distinction can be made between utilization expectations and effect

oriented expectations. In the former situation, school principals expected to

use the school feedback as a mirror, helping to develop a clearer view of

the current school operation and school performance, and to detect

strengths and weaknesses. Others expected to use feedback for policy

development (e.g., for evaluating policy decisions or developing policy

plans).

We thought ‘look this research will be conducted over seven years; we

are going to follow it up. Where are we as a school? We are putting a lot

of effort into our care policy. What does this effort give us in return?’ (…)

In fact, we do have a very problematic population and it is our goal to

see what the benefit is of all our effort. (School 1)

Another utilization oriented perspective was discussed above (i.e., when

principals used the information for accountability purposes). Almost all

principals intended to use the feedback as input for their discussions with

inspection authorities but stressed that they would not do this for parents.

In terms of effect expectations, principals expected that investing time and

effort in school feedback would eventually improve education: “We expect

to improve our quality of education. So far, for the first grade, it was worth

the effort. That is the goal: an improvement of our education” (School 1).

We found no evidence that the principals systematically reflected upon

their expectations with regard to feedback use and feedback effects. In

addition, principals indicated that their expectations of the feedback did

not necessarily reflect the opinions of their staff members.

Teachers are not willing to participate because it is a lot of work for

them. Moreover, the SPF project examines the same domains as the

pupil monitoring system, thus it does not directly benefit them. (…)

Teachers participate in this research project because the previous school

principal decided they would. For them, it is ‘if it must be.’ (School 2)

Differences in statistical knowledge and skills. Most school principals

claimed not to have advanced statistical knowledge. Their statistical

knowledge was acquired during their initial teacher training and additional

training courses, and was partly based on learning to work with pupil

monitoring systems. However, they stressed that this was insufficient to

work with school performance feedback. Conversely, some did not

experience difficulties, either because everything was explained in the

report or because they had sufficient prior knowledge.

Everything [in the feedback reports] is explained in terms of how to

interpret it. Thus, if one pays enough attention to the instructions ‘to

read it this way and these numbers, if this is mentioned it means this,’

then I think no extra prior knowledge is needed. (School 4)

Differences in time available for feedback use. Some principals reported

that if more time was available, they would have made more use of the

feedback. Because principals and teachers have to divide their time over a

large number of activities, less urgent tasks such as those related to SPF use are

not prioritized. This confirms previous findings that the self-evaluation of a

school is not a priority for principals and teachers (Visscher, 1996; Williams

& Coles, 2007).

There is often a lack of time. You cannot use this as an excuse but it is

often the reason. For example at team meetings, you want to put this

and this on the agenda, but then there is not enough time to go more

deeply into it, because there are so many issues coming from the outside.

(School 11)

Differences in perceptions of positive/negative feedback results. When

school feedback reflected low performance levels, the principals were

willing to search for explanations. This confirms the control theory of Kluger

and DeNisi (1996). However, this observation cannot be generalized: When

the performance levels were far below average, sometimes feedback

results were not distributed in order not to discourage team members.

When performance results were perceived to be relatively good, further

use of the feedback reports decreased: “We are scoring on average, so

there are no severe differences. So why should we pay much attention to

it?” (School 3).

The perception of the performance results was influenced by the way

the results were represented, for example by the way value added is

calculated. The feedback reports presented both adjusted scores that took

into account the influence of pupil background characteristics and

nonadjusted scores. Our results indicate that especially in schools with a

large number of children with special needs, the adjusted performance

scores were valued positively.

The surplus value of this research for our school is that for all these years

we’ve had the impression we were doing things right. Because we have a

large number of foreign speaking and special needs children we want to

know the effects of the way we organize our education and monitor our

children. (…) Particularly in the last few years with the introduction of

adjusted scores, some attention is given to the pupils’ progress, while

taking into account certain factors. (School 7)

School performance feedback (system) related factors

Feedback has to meet a number of requirements to facilitate correct

interpretation and to promote feedback utilization.

Differences in perceived feedback relevance. All school principals

requested that feedback should fit their needs. These needs differed

between schools. Some principals expressed a primary interest in

performance results on mathematics and language; others were more

interested in socio-emotional development or other subjects. Furthermore,

schools’ preferences differed in the calculation of value added scores

(observed or adjusted scores), in the way information was aggregated

(pupil – class – school – other subgroups), in the amount of statistical

background information in the reports, and the nature of the reference

group(s). During the interviews these differences were observed between

and within schools. Differences were also related to the roles and

occupations of feedback users. Teachers prefer pupil level feedback, pupil

relevant error analysis, and remedial material, whilst policy makers prefer

aggregated information that reflects their school focus.

In my opinion, the school and class level is the most interesting, in view

of my function. I am supposed to work mainly on school and class level

and less on the pupil level. Thus for me this is more interesting than an

individual report. But of course a teacher will see it differently. I am sure

of that. This teacher will probably prefer feedback about the pupils in this

class. (School 10)

When asked for ideas on how to better meet user needs, respondents

suggested increasing the number of school subjects to be tested, focusing on

different pupil cohorts, and tailoring information. The interviewees were

not pleased about redundant information. They required feedback systems

to focus on complementary information. In particular, some principals

asked for information that would complement the available monitoring

systems. All respondents required that the performance feedback be up to

date. In particular, teachers expected feedback within the same school year

as when tests had been taken, in order to support low-scoring pupils. When

teachers shifted classes, feedback results of previous years were considered

irrelevant.

Differences in perceived feedback interpretability. For this factor no

coherent picture could be deduced from the interview data. Most principals

stated that interpreting the information was difficult. Some stressed that

interpreting the information without support was a hopeless task. Some

stated that the information could not be understood after only one reading.

But not all principals considered this to be a problem or experienced

difficulty in analyzing the reports. Some principals stated that it is important

to stress that school feedback is a complex field and cannot be simplified

without losing depth and meaning.

It is magnificent the way this report [is written]…It is not easy to explain

something complex that clearly. They [the feedback developers] largely

succeeded in it, but it is still a large amount of information. (…) Of

course, sometimes I get lost, which is not surprising, considering the

technicality of it. (School 7)

The interviews yielded explanations for why some principals were not

able to interpret the feedback correctly. Some complained of a

lack of structure in the feedback information. Others criticized the amount,

stating that they skipped a lot of information, were selective, and focused

only on the school results.

Maybe some parts are less interesting for me, but this is not a reason to

leave out this information from the reports, because everything is

concisely described. For example, the information about pupil mobility, if

it does not interest you, just turn the page. (School 10)

In contrast, others appreciated the comprehensiveness of the feedback

reports and preferred the additional information. A third element

influencing the interpretability of feedback was the balance between

technical concepts and the way school staff label and discuss education.

Feedback was often experienced as being too abstract. Additionally,

principals seemed to be less familiar with feedback that was aggregated at

the class and school level. Both the language used and the graphical

representations (growth curves, box plots) led to difficulties in

interpretation. Some school principals stressed that the feedback is not

appropriate for teachers as they do not possess the competence or

experience to interpret the information, whilst others did not question the

competence of their staff.

Differences in perceived validity and reliability. Respondents trusted the

professionalism of the feedback developers. Nevertheless, they expressed

some concerns. Some principals valued the feedback less because the

adjusted scores do not take into account school specific process and

context variables. The feedback developers wanted to articulate these

differences, but schools preferred an adjustment model taking into account

more external influences that explain school outcomes and result in an

average school profile.

I think researchers do not have enough information [about pupil and

school characteristics]. They do not know we introduced a new reading

method, which caused problems. They do not know there was a starting

teacher. And they do not know that this teacher is not worthy of being

called a teacher. That gives different results. This information should be

on top of it [of the current adjustment procedure]. It is important for the

school. (…) Now it does not give a correct image of the school. (School 1)

The feedback was perceived as valid and reliable when the results were

congruent with the findings of pupil monitoring systems, school tests, or

intuition. When this was not the case the results were seen as less valid and

low performance was more easily attributed to external factors, such as the

difficulty of test items, atypical question methods, and incorrect results of

the reference group (i.e., some schools were thought to have falsified their

results by helping their pupils during the test). Others criticized the single-

shot nature of the data gathering. A particular problem arose when a school

was geographically distributed. Aggregation of data at the school level was

of lesser value because the school’s population, and sometimes also

school’s policies, can differ between geographical locations. Finally,

concerns were expressed when class organization or differentiation forms

were very different from the approaches adopted in the reference group.

Differences in perceived user-friendliness of the SPFS. The nature of the

overall feedback system influenced feedback use. Respondents complained

about the large investment of time and effort during the data gathering

process. Teachers and pupils perceived the tests as stressful. In addition,

questionnaires directed to parents required a considerable amount of time

and a willingness to report private information. Furthermore, test times

overlapped with other key assessment and evaluation periods in the school

year. This explains why some teachers considered participation in the

project as an extra burden on top of a heavy workload. This feeling was

reinforced when the feedback was perceived as less relevant.

User-friendliness also refers to the tailoring of the school feedback.

Some principals suggested adapting the report to the individual school

setting. In the same line, satisfaction with the communication between user

and the feedback system played a role. Moreover, the schools received the

feedback at a rather unexpected moment, which made it difficult to include

the new information in the policy making cycle.

Support related factors

The interviewees offered valuable information on user needs concerning

feedback support and advised us about how to fulfill these needs. The

results reveal that feedback use requires both policy oriented and research

oriented skills. These are skills that must be developed (Visscher, 2002).

Differences in support needs. As mentioned above, most users reported

not being able to interpret the information without extra support.

Nevertheless, feedback support should go further than just assuring a

correct interpretation. Almost all principals reported that they got stuck

after their attempt at interpretation. They stated that they did not feel

confident about their interpretative capacities and that they needed

recommendations on how to proceed to the next phases in feedback use.

I can only hope my interpretations are correct. But definitely with the last

report, it is so extensive that there is some – I will not say doubt, but fear

– that it might be wrong. (School 6)

It is the same problem as with pupil monitoring systems. You can go to

the teacher and say ‘these are your results. This child scores an E, Here

you are.’ That teacher will file the report and there it stops. (School 3)

The respondents asked for specific help dependent on whether they

received positive or negative feedback results. Furthermore, they requested

help in diagnosing the causes and circumstances that the results could be

attributed to. Most respondents asked for concrete instructions for action.

This suggests that consultation services could help to fulfill these additional

needs.

Differences in support characteristics. When asked for ideas on how to

organize support, some respondents requested a face-to-face introduction

to the concepts and representations in the report. These sessions should be

organized on site, but if that is not possible, regional meetings are

acceptable.

Feedback support should be functional, offering intelligible, theoretical,

and practical information. Principals expected the support to go beyond the

interpretation phase and to empower schools to diagnose their results.

Concerning the interpretation, we try to manage it. But we do not know

if we are doing it right. It would be interesting if the SPF project would

come with the report to the schools and would explain the information in

a team meeting with the teachers, with the whole team, to show us how

to look at the results. ‘What’s the next step?’ Because now we only get

the ‘sec’ results and read them as such, as how they are printed. Even though

some reading advice is provided, the impulse to really do something with

it is always lacking. (School 4)

Defining the role of external support services was a difficult issue. Some

respondents claimed that schools have to take the lead in feedback use.

This is in line with Earl and Fullan (2003), who claim that professional

development will help strengthen personal confidence and self-efficacy in

coping with complex feedback information. The respondents indicated a

preference for internal support by counselors and via in-service training.

External support from feedback suppliers should not interfere with these

initiatives. They emphasized the demand-driven nature of support. This

confirms the idea that external support must be tailored to the needs of

individual schools. A sufficient level of goodness-of-fit is a requirement to

achieve successful support (Nevo, 1995).

Principals also referred to school team members as a basis for support.

Principals mostly got support from the special care coordinator or teacher.

Often these staff members were more experienced in interpreting

statistical concepts and graphical representations. These staff members can

play a role as complementary specialists. As they have a more flexible work

schedule, they can allocate time to study feedback reports. This is not the

case for teachers, who have to work according to a prescheduled roster.

Some school principals also wanted to protect team members against work

overload, thus not involving them in feedback use activities. They might

also have perceived these staff members as less important sources of

support in feedback use.

7. Implications, limitations and conclusion

The present study focuses on principals’ perceptions of school

performance feedback and the actual use of feedback information. This

study took place within the context of a larger project aiming to develop

and implement a new school performance feedback system. This study also

builds on an eclectic framework that integrates the literature on SPF use.

This framework was the guiding structure for interviews with 16 principals

from different primary school settings. Our results indicate that the

elements presented in the theoretical framework reappear in the

interviews. Figure 3 represents the integration of findings from the

literature and our study.

Figure 3. Integration of literature and research findings on SPF use

[Figure: influencing factors act on feedback use, which in turn leads to effects.]

Influencing factors

◦ Context related: school improvement versus accountability; pressure and support

◦ School and user related: functions/expectations of SPF use; prior knowledge and experience in data use; priorities in task scheme; statistical knowledge and skills; perception of school performance level

◦ School performance feedback (system) related: perception of relevance; perception of interpretability; perception of validity and reliability; perception of user-friendliness

◦ Support related: support needs; support set up; internal versus external

Feedback use

◦ Successive phases: receiving feedback; reading and discussing; interpretation; diagnosis; planning; implementation; evaluation

◦ Types of feedback use: instrumental; conceptual; symbolic; strategic; pupil directed; motivating

Effects

◦ Intended versus unintended; desirable versus undesirable; product versus process

The aim of this study was to illustrate and elaborate a framework of

factors that influence school performance feedback use. Where previous

studies have provided literature findings (Visscher, 2002; Visscher & Coe,

2003), perspectives of feedback suppliers (Schildkamp & Teddlie, 2008;

Visscher & Coe, 2002), and quantitative methods of testing feedback use

(Schildkamp, Visscher, & Luyten, 2009), this study illustrates the influence

of different variables on feedback use in a qualitative way.

From a theoretical perspective our findings can help refine the

description of feedback use. Whereas previous studies (e.g., Schildkamp,

2007; Visscher, 2002; Visscher & Coe, 2003) make a distinction between

different kinds of information use (cf. instrumental, symbolic, and

conceptual use; Rossi, Lipsey, & Freeman, 2004; cf. strategic use; Visscher &

Coe, 2003), an empirical investigation of the phases of feedback use has not

been carried out. In this study both were explored. Additional types of

feedback use emerged from the data: a motivating and pupil directed use.

The interview data also show that different types of feedback use are

related to one another and occur simultaneously or successively. While a

sequence of feedback phases can be discerned theoretically (Learning Point

Associates, 2004), the process of feedback use is less systematic in practice.

Our findings indicate that users can get stuck in the process of feedback

use. A crucial challenge for future feedback use is to detect the difficulties

in each phase and to offer appropriate support to systematize the process

involved.

Our findings indicate that interpreting school feedback, making a

diagnosis based on the results, discussing causes, and setting up actions

based on feedback results is not a clear-cut process. The results reveal that

feedback use requires both policy oriented and research oriented skills

which must be developed by users (Visscher, 2002). Educational authorities

should not neglect the importance of stimulating professional development

and providing external support. Expectations about the positive impact that

feedback use can have on school improvement will only be realized if extra

support is available (Schildkamp & Teddlie, 2008; Sun, Creemers, & de Jong,

2007).

To design appropriate support initiatives, a detailed analysis of the

difficulties encountered when interpreting feedback reports must be

conducted. For example, a recent study, which used both oral

comprehension tests and IRT-calibrated online tests, illustrated the

misconceptions that respondents reported during the interpretation of

feedback reports. The results of that study contributed to the design of

specific support initiatives (Verhaeghe, Verhaeghe, Vanhoof, & Valcke,

2009). Furthermore, experimental studies that manipulate the nature of

external support can contribute to the design of a more sophisticated SPFS

(e.g., Tymms, 1995) and the required support measures. In the design of

SPFS, it is important to integrate the characteristics which appear to have a

considerable influence on feedback use, such as relevance, interpretability,

reliability, and validity. These characteristics are mediated by the

perceptions of the feedback users. What is considered relevant by feedback

developers, policy makers, or researchers does not necessarily correspond

with what the target group perceives as relevant. However, little is known

about the effect of these differing perspectives in the context of school

feedback use.

Moreover, one cannot expect schools to successfully implement

innovations without making sufficient resources available (Davies & Rudd,

2001; Kimball, 2002). As school feedback use is not heavily promoted

(Davies & Rudd, 2001), resources are limited. When we consider the work

load of teachers and principals, our findings indicate that teachers will

prioritize their classroom related activities at the expense of school level

issues.

This study was conducted in Flanders where there is no accountability

culture or central examination system. It is not yet clear whether effective

feedback use in such a context should only function within a school

improvement perspective, as we found that feedback use was stimulated

by an accountability orientation in terms of the inspection visits. It would be

useful to examine the (in)direct influence of national and international

authorities on feedback use (Creemers, 2006). Future research could focus

on the relationship between a school improvement and an accountability

orientation of educational authorities and key stakeholders (Vanhoof & Van

Petegem, 2007) and on the balance between internal and external

evaluations (Kyriakides & Campbell, 2004), influencing feedback use in

schools.

The present study has certain limitations. The validity of our

findings is restricted to a specific educational context, with a particular

school performance feedback system. However, the aim of this study was

not to formulate generalizations but to explore and illustrate feedback use

by its users. Another limitation is the lack of a comprehensive framework

with an evidence-based set of influencing factors. Neither this

study nor previous school performance feedback studies have attempted

to meet this need. Furthermore, the link between school performance

feedback use and the more general practice of data-driven decision making

remains unexplored. Despite the focus on accountability in the data-driven

decision making literature, common points of interest with SPF use can be

further examined.

References




Creemers, B.P.M. (2006). The importance and perspectives of international

studies in educational effectiveness. Educational Research and

Evaluation, 12(6), 499-511.




Davies, D., & Rudd, P. (2001). Evaluating school self-evaluation (LGA

research report 21). Berkshire: National Foundation for Educational

Research, Local Government Association.

Dey, I. (1993). Qualitative data analysis: A user-friendly guide for social

scientists. London: Routledge.

Earl, L., & Fullan, M. (2003). Using data in leadership for learning.

Cambridge Journal of Education, 33(3), 383-394.

Fitz-Gibbon, C.T., & Tymms, P. (2002). Technical and ethical issues in


Policy Analysis Archives, 10(6), 68-82.

Fullan, M. (2007). The new meaning of educational change (4th ed.).

London: Cassell.

Goldstein, H., & Myers, K. (1996). Freedom of information: Towards a code


Goldstein, H., & Spiegelhalter, D.J. (1996). League tables and their


performance. Journal of the Royal Statistical Society. Series A (Statistics

in Society). 159(3), 385-443.

Hammond, P., & Yeshanew, T. (2007). The impact of feedback on school

performance. Educational Studies, 33(2), 99-113.

Hoy, W., & Miskel, C. (2001). Educational administration: Theory, research

and practice. Boston: McGraw-Hill.




Education, 112(4), 496-520.

Kimball, S.M. (2002). Analysis of feedback, enabling conditions and fairness

perceptions of teachers in three school districts with new standards-

based evaluation systems. Journal of Personnel Evaluation in Education,

16(4), 241-268.



feedback intervention theory. Psychological Bulletin, 119(2), 254-284.

Kyriakides, L., & Campbell, R.J. (2004). School self-evaluation and school





data use at learning point associates. Retrieved October 23, 2007, from


Liket, T.M.E. (1992). Vrijheid & rekenschap: Zelfevaluatie en externe

evaluatie in het voortgezet onderwijs [Freedom and accountability: Self

evaluation and external evaluation in secondary education]. Amsterdam:

Meulenhoff Educatief.

Lindlof, T.R., & Taylor, B.C. (2002). Qualitative communication research

methods (2nd ed.). London: Sage Publications.

Maso, I., & Smaling, A. (1998). Kwalitatief onderzoek: Praktijk en theorie

[Qualitative research: Practice and theory]. Amsterdam: Boom.

Mason, J. (2002). Qualitative Researching (2nd ed.). London: Sage

Publications.


expanded sourcebook (2nd ed.). Thousand Oaks: Sage Publications.

Nevo, D. (1995). School-based evaluation: A dialogue for school

improvement. Oxford: Pergamon.



perspective (pp. 3-16). Oxford: Elsevier Science.

Ritchie, J., & Spencer, L. (1994). Qualitative data analysis for applied policy

research. In: A. Bryman & R. Burgess (Eds.), Analysing qualitative data

(pp. 173-194). London: Routledge.

Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic

approach (7th ed.). London: Sage Publications.



15(3), 241-258.




Sussex.

Schagen, I. (2004, November). Weighing the baby or fattening it: The use of

data to inform school evaluation. Paper presented at the NFER/ConfEd

Annual Research Conference, London.



Twente.

Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems

in the USA and in The Netherlands: A comparison. Educational Research





Silverman, D. (2005). Doing qualitative research: A practical handbook (2nd

ed.). London: Sage Publications.

Strauss, A.L., & Corbin, J. (2007). Basics of qualitative research: Grounded

theory procedures and techniques (3rd ed.). Newbury Park, CA: Sage

Publications.

Sun, H., Creemers, B., & de Jong, R. (2007). Contextual factors and effective

school improvement. School Effectiveness and School Improvement,

18(1), 93-122.

Tymms, P. (1995). Influencing educational practice through performance

indicators. School Effectiveness and School Improvement, 6(2), 123-145.

van Aanholt, T., & Buis, T. (1990). De school onder de loep [The school under

scrutiny]. Culemborg, The Netherlands: Educaboek.




school improvement]. Pedagogische Studiën, 81, 338-353.

Van Petegem, P., & Vanhoof, J. (2007). Towards a model of effective school

feedback: School heads’ point of view. Educational Research and

Evaluation, 13(4), 311-325.

Van den Berg, R., & Ros, A. (1999). The permanent importance of the

subjective reality of teachers during educational innovation: A concerns-

based approach. American Educational Research Journal, 36(4), 879-906.

Vanhoof, J. (2007). Zelfevaluatie binnenstebuiten: Een onderzoek naar het

verloop en de kwaliteit van zelfevaluaties in scholen [Self-evaluation

inside out: A study on the proceeding and quality of self-evaluations in

schools]. Mechelen: Wolters-Plantijn.

Vanhoof, J., & Van Petegem, P. (2007). Matching internal and external



101-119.

Verhaeghe, G., Verhaeghe, J.P., Vanhoof, J., & Valcke, M. (2009). The value-

added results of schools: How to represent school feedback information?

Manuscript submitted for publication.

Visscher, A.J. (1996). The implications of how school staff handle

information for the usage of school information systems. International

Journal of Educational Research, 25(4), 323-334.



through performance feedback (pp. 41-71). Lisse: Swets & Zeitlinger.

Visscher, A., & Coe, R. (Eds.). (2002). School improvement through

performance feedback. Lisse: Swets & Zeitlinger.

Visscher, A., & Coe, R. (2003). School performance feedback systems:



Williams, D., & Coles, L. (2007). Teachers' approaches to finding and using


Research, 49(2), 185-206.

CHAPTER 4

VALUE-ADDED RESULTS OF SCHOOLS: HOW TO REPRESENT SCHOOL FEEDBACK INFORMATION

CHAPTER 4: VALUE-ADDED RESULTS OF SCHOOLS: HOW TO REPRESENT SCHOOL

FEEDBACK INFORMATION∗

Abstract

The use of data for school improvement purposes has recently gained

research interest. In order to use school performance feedback (SPF)

effectively, it is necessary to interpret feedback information correctly.

However, systematic research on this topic is scarce. Therefore, the present

experimental study was set up to examine the effectiveness of various

modes of explaining and representing the statistical concepts ‘value added’

and ‘learning gain’ in SPF reports. The results indicate that non-statistically

skilled people encounter interpretation difficulties, especially in deriving

value-added scores and for complex conceptual questions. This delineates

the importance of developing effective SPF systems and support initiatives.

∗ Based on Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of

schools: How to represent school feedback information. Manuscript submitted for publication in The Journal of Educational Research.

1. Introduction

Over the last decades, governmental bodies have required schools to be

accountable for their educational quality in return for school autonomy

(Nevo, 2002). Schools are not only expected to gain insight into their input

and process characteristics, but also to link these to their output. Therefore,

schools are required to systematically gather data on their functioning for

self-evaluation purposes. In this context, school performance feedback

(SPF) is an important source of information on pupil performance. It also

helps the school to detect the extent to which it contributes to pupils’

performance levels.

The correct interpretation of school feedback information is a crucial

condition for effective feedback use (Visscher, 2002; Visscher & Coe, 2003).

The interpretation phase is one of the most important phases in the process

of using feedback and requires a considerable amount of time, skills, and

effort (Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). An examination

of existing SPF systems and their related literature reveals that research on

user comprehension is scarce (Schildkamp & Teddlie, 2008). Few studies

have examined the effectiveness of the various modes of explaining and

representing data in school feedback reports. This is problematic

considering the fact that SPF reports use complex concepts, such as

learning gain and value added, whilst SPF users (i.e., school principals and

teachers) are often not statistically skilled (Earl & Fullan, 2003; Kerr, Marsh,

Ikemoto, Darilek, & Barney, 2006; Saunders, 2000; Williams & Coles, 2007).

In this study we explore whether alternative modes of explaining and

representing SPF information have a differential impact on user

comprehension in a group of non-statistically skilled people. The results of this

study are expected to contribute to a better understanding of the way the

target group (i.e., mainly school principals and teachers) interprets SPF

information. The results are also expected to direct the design and

development of a new SPF system.

Below we outline the central concepts investigated and present our

theoretical frame on how alternative ways of presenting complex

information influence users’ understanding. Following this, research

questions and hypotheses are presented.

1.1 School Performance Feedback

School performance feedback is conveyed to its users through school

performance feedback systems (SPFSs). Visscher and Coe (2002) define

SPFSs as “Information systems external to schools that provide them with

confidential information on their performance and functioning as a basis for

school self-evaluation” (p. xi). SPFSs primarily aim at supporting school

improvement and internal quality policy, which distinguishes them from

school accountability systems. These feedback initiatives are

complementary to central examination results (provided by government

agencies) and to school and class data. SPFSs contribute to the creation of

information-rich environments essential for schools in their data-driven

decision making process. These data provisions serve schools in their role as

learning organisations that continuously monitor and improve their quality

policy (Hofman, Dijkstra, & Hofman, 2009; Leithwood, Aitken, & Jantzi,

2006).

Examples of available SPFSs or related projects are Performance

Indicators in Primary Schools (PIPS; Durham University, Centre for

Evaluation and Monitoring, n.d.), and the VAS [Monitor and Advice System]

(CiTO [Central Institute for Test Development], n.d.). In some cases, SPF is

also provided to schools in return for their participation in large scale

research projects. This approach is adopted by Flemish researchers in

relation to the Progress in International Reading Literacy Study (Katholieke

Universiteit Leuven, Centre for Educational Effectiveness and Evaluation,

2008), the Programme for International Student Assessment, and the

Trends in International Mathematics and Science Study.

The present study focuses on the representation and understanding of

two statistical concepts that can be considered as key concepts in school

feedback reports: learning gain and value added (Saunders, 2000). In this

study, learning gain (or gain score) is defined as the progress of a learner in

a certain knowledge domain. This can be considered as the difference

between the test scores of an individual at two different moments, on the

condition that the same scale is used for both tests. The latter implies that

either the same test is presented twice or different tests are calibrated using

Item Response Theory (IRT). For example, PIPS Reception builds on the

administration of the same test at the start and at the end of the school

year while the VAS calculates the learning gain on the basis of skill scores,

estimated by the performance on different tests at three measurement

occasions.
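As a minimal illustration of this definition, the sketch below computes gain scores as the difference between two scores expressed on one common scale. The pupil identifiers and score values are invented for illustration and are not taken from PIPS or the VAS.

```python
# Hypothetical skill scores on one common scale at two measurement occasions.
scores_start = {"pupil_a": 42.0, "pupil_b": 55.5, "pupil_c": 61.0}
scores_end = {"pupil_a": 50.0, "pupil_b": 58.5, "pupil_c": 60.0}


def learning_gain(before, after):
    """Return the gain score (after minus before) for every pupil tested twice."""
    return {pupil: after[pupil] - before[pupil] for pupil in before if pupil in after}


gains = learning_gain(scores_start, scores_end)
mean_gain = sum(gains.values()) / len(gains)
```

The subtraction is only meaningful because both occasions share one scale; with differently scaled tests, the scores would first have to be placed on a common scale.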

Value added refers to the extent to which the school has contributed to

the achievement progress of its students (Organisation for Economic Co-

operation and Development, 2008). It can be operationalised as the school

level residual, after adjusting for the effects of student background

characteristics. Two different approaches to explaining the concept of value

added can be discerned in SPF reports. Both approaches to explaining the

concept are used in an overview publication of the OECD in relation to

value-added modelling (OECD, 2008), but are not clearly distinguished or

further refined.

The first approach is based on the notion of expected values. Based on

students’ background characteristics and their prior achievement, a certain

achievement level can be expected. Students’ actual/observed achievement

level can be higher or lower than the expected/predicted achievement

level, because of measurement error, uncontrolled individual differences,

or differences between schools. Averaging the differences between

expected achievement and observed achievement within a particular

school will in principle (i.e., when there is a sufficient number of students)

cancel out measurement error and the impact of individual differences.

What is left is the shared value added for all students attending the same

school. This approach is applied in the feedback reports of the Centre for

Evaluation and Monitoring and in the Value Added National Project (Fitz-

Gibbon, 1997).
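The expected-value reasoning can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not the procedure of any actual SPF system: pooled ordinary least squares with a single predictor stands in for the multilevel model, and the pupils, schools, and scores are invented.

```python
# Hypothetical pupils: (school, prior achievement, observed achievement).
pupils = [
    ("A", 1.0, 13.5), ("A", 2.0, 14.0), ("A", 3.0, 17.5),
    ("B", 1.0, 11.5), ("B", 2.0, 12.0), ("B", 3.0, 15.5),
]

# Simplification: pooled ordinary least squares with one predictor
# stands in for the multilevel regression model.
n = len(pupils)
mean_x = sum(x for _, x, _ in pupils) / n
mean_y = sum(y for _, _, y in pupils) / n
slope = sum((x - mean_x) * (y - mean_y) for _, x, y in pupils) / sum(
    (x - mean_x) ** 2 for _, x, _ in pupils
)
intercept = mean_y - slope * mean_x


def value_added(school):
    """Average of (observed - expected) achievement over a school's pupils."""
    diffs = [y - (intercept + slope * x) for s, x, y in pupils if s == school]
    return sum(diffs) / len(diffs)


va = {school: value_added(school) for school in ("A", "B")}
```

In this toy data set the individual differences between observed and expected scores vary from pupil to pupil, but their school averages settle at +1 for school A and -1 for school B, which mirrors the idea that averaging cancels out individual deviations.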

The second approach to explaining the concept of value added starts

from the notion of adjusted achievement level. The adjusted achievement

level represents the achievement level a learner in a particular school

would have had if his or her input characteristics and prior achievement

were equal to the reference group, i.e., the “average” school. If a school

does not differ from an average school in its contribution to students’

learning, its mean adjusted achievement level will be equal to that of the

average school. In that case, the difference between the school mean of the

students’ adjusted achievement scores and the mean of the average school

will be zero. However, if there is a difference, the school’s contribution to

students’ learning (value added) is higher or lower than in the average

school. This approach is used in the PIRLS reports in Flanders.
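The adjusted-mean reasoning can be illustrated in the same spirit. In the sketch below the regression coefficients are hypothetical values assumed to have been estimated beforehand, prior achievement is assumed to be centred on the reference group, and all pupil data are invented.

```python
# Hypothetical coefficients, assumed to be already estimated; prior achievement
# is centred so that the reference group ("average school") has a mean of 0.
GRAND_MEAN = 10.0   # intercept: mean achievement in the average school
SLOPE_PRIOR = 2.0   # effect of one point of prior achievement

# Hypothetical pupils per school: (centred prior achievement, observed achievement).
schools = {
    "A": [(1.0, 11.5), (2.0, 12.0), (3.0, 15.5)],    # favourable intake
    "B": [(-3.0, 5.5), (-2.0, 6.0), (-1.0, 9.5)],    # less favourable intake
}


def adjusted_mean(pupils):
    """Mean achievement after setting each pupil's intake to the reference level (0)."""
    adjusted = [observed - SLOPE_PRIOR * prior for prior, observed in pupils]
    return sum(adjusted) / len(adjusted)


raw_means = {
    name: sum(obs for _, obs in pupils) / len(pupils)
    for name, pupils in schools.items()
}
value_added = {
    name: adjusted_mean(pupils) - GRAND_MEAN for name, pupils in schools.items()
}
```

School A has the higher raw mean (13 versus 7), yet after adjustment its mean lies one point below that of the average school, whereas school B lies one point above it. The example is constructed to show why raw means and value added can point in opposite directions.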

It should be noted that these two approaches refer to exactly the same

statistical procedure and resulting regression equation. They only differ in

the underlying mathematical operation used to isolate the school level

residual from that equation, as represented in Table 1.

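Table 1 Regression equations of two approaches determining the school’s value added

Approach 1 – Expected means

Basic multilevel regression equation: $Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + e_{ij}$

Observed school mean is given by: $\bar{Y}_{.j} = \beta_0 + \beta_1 \bar{X}_{.j} + u_j$

Stripping the school level residual gives the predicted or expected school mean: $\hat{\bar{Y}}_{.j} = \beta_0 + \beta_1 \bar{X}_{.j}$

Subtracting predicted mean from raw mean yields: $\bar{Y}_{.j} - \hat{\bar{Y}}_{.j} = (\beta_0 + \beta_1 \bar{X}_{.j} + u_j) - (\beta_0 + \beta_1 \bar{X}_{.j})$

From which follows: $u_j = \bar{Y}_{.j} - \hat{\bar{Y}}_{.j}$

Meaning: Value added = observed mean – expected mean

Approach 2 – Adjusted means

Basic multilevel regression equation: $Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + e_{ij}$

Observed school mean is given by: $\bar{Y}_{.j} = \beta_0 + \beta_1 \bar{X}_{.j} + u_j$

Stripping the effects of student input characteristics (= setting them equal to reference group): $\bar{Y}_{.j} - \beta_1 \bar{X}_{.j} = \beta_0 + u_j$

Yields adjusted school mean: $\bar{Y}_{.j}^{\text{adj}} = \beta_0 + u_j$

From which follows: $u_j = \bar{Y}_{.j}^{\text{adj}} - \beta_0$

Meaning: Value added = adjusted mean – grand mean

Note. $Y_{ij}$ = achievement of student $i$ in school $j$; $X_{ij}$ = student input characteristics (background and prior achievement), centred on the reference group so that $\beta_0$ equals the grand mean; $u_j$ = school level residual; $e_{ij}$ = student level residual, which averages out in the school mean.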

To our knowledge, the differential impact of these two modes of

explaining value added on SPF users’ understanding of the concept has not

yet been studied. Not much research has been carried out on the way SPF is

interpreted by its users or how this interpretation is influenced by the

representational format of SPF reports. However, since school feedback

builds on numerical information, statistical concepts, and graphical

representations, literature on graph design and interpretation is used as a

starting point for the design of this study.

1.2 Representation of complex quantitative information

Numerous studies have examined the way graphical representation of

numerical information is understood (e.g., Kosslyn, 2006; Leinhardt,

Zaslavsky, & Stein, 1990). Research has also examined the representation of

textual information in combination with multimedia representations, such

as illustrations and graphs (e.g., Mayer, 2001; Mittal, Carenini, Moore, &

Roth, 1998; Schnotz & Bannert, 2003), in relation to task performance. We

do not present an overview of this field of research, but build on the

available evidence about design principles derived from these studies. We

refer to these principles when describing the experimental design of the

present study.

Principles of effective graph design

An interesting overview study that summarises best practices in graph

design is the work of Kosslyn (2006). According to this author, many graphs

are not satisfactory because they do not adequately consider the aims,

needs and competences of the user. Based on research in perceptual and

cognitive information processing, Kosslyn proposes eight design principles:

(1) The principle of relevance. This concerns limiting or reducing the amount

of information presented: Only the information necessary to get the

message across must be represented. (2) The principle of appropriate

knowledge. This means tailoring information to the prior knowledge of the

user. (3) The principle of salience. Since it is crucial to attract the audience’s

attention, this principle stresses making large perceptible differences in

information presentation. (4) The principle of discriminability. The graphical

representation should enable users to distinguish between different pieces

of information. (5) The principle of perceptual organisation. This refers to

the tendency of users to group together perceptual elements and to

remember these groups better than isolated elements. Furthermore,

Kosslyn recommends promoting understanding and memory: (6) the

principle of compatibility stresses the importance of the compatibility

between form and meaning, and (7) the principle of informative changes

indicates that readers interpret any changes in displays as conveying

information (e.g., changing the colour, adding lines). Finally, (8) the principle

of capacity limitations addresses users’ limited capacity to retain and

process information. Mayer (2001) also stresses these limitations in his

cognitive theory of multimedia learning. In this context, Sweller, van

Merriënboer, and Paas (1998) describe the concepts of intrinsic and

extraneous cognitive load. Intrinsic cognitive load refers to the difficulty

inherent to instructional materials. The degree of intrinsic cognitive load

depends on the element interactivity, or the number of elements

simultaneously manipulated in one’s working memory. While intrinsic

cognitive load is related to the material being learned, extraneous cognitive

load refers to the instructional design. This concerns the execution of

cognitive activity that is redundant to the purpose of the task (Chandler &

Sweller, 1991). Overtaxing the user’s working memory is caused by

ineffective presentation of the materials. For example, when accompanying

text and illustrations are presented separately or inappropriately, the

reader has to invest extra cognitive effort to integrate the information. Both

types of cognitive load are additive, but only extraneous cognitive load can

be altered and prevented by the design of the learning material.

Misconceptions in interpreting graphs

Next to research on data representation, our study also builds on previous

research on the interpretation of graphs and the common misconceptions

of inexperienced users. Smith III, diSessa, and Roschelle (1993) define a

misconception as “a student’s conception that produces a systematic

pattern of errors” (p. 119) that arises from the student’s prior learning. This

prior learning can follow from formal instruction (Smith III, diSessa, &

Roschelle, 1993), general knowledge, or intuition (Leinhardt, Zaslavsky, &

Stein, 1990). Alternative terms that are used to depict students’

misconceptions include preconceptions, alternative conceptions, naïve

beliefs, alternative beliefs, alternative frameworks, naïve theories, and

systematic errors (Mevarech & Kramarsky, 1997; Smith III, diSessa, &

Roschelle, 1993). All terms refer to students’ conceptualisations that differ

from the accepted or intended meaning of the instructed concepts. From a

constructivist point of view, misconceptions can be considered as the

incomplete acquisition of expert knowledge in a learning process, rather

than mistakes that impede learning (Smith III, diSessa, & Roschelle, 1993).

Clement (1988) and Leinhardt, Zaslavsky, and Stein (1990) present a review

of common misconceptions that occur when people interpret graphs. These

include the slope-height confusion, confusion in responding with a point or

an interval, and mistaking a graphical representation for an iconic one (e.g.,

uphill and downhill for a rising and a descending curve). These

misconceptions are also mentioned in the studies of Beichner (1994),

Mevarech and Kramarsky (1997), and Kramarski (2004).

SPF systems have not yet been incorporated in Flanders’ school system,

which means that feedback users are inexperienced. Given that SPF

contains several complicated concepts (e.g., value added and learning gain)

and forms of representation (e.g., growth curves), we expect feedback

users to make several mistakes when interpreting the information. In this

study we explore which difficulties statistically unskilled participants

encounter when trying to understand learning gain and value added. We

also try to prevent interpretation difficulties by applying effective

graph design principles in the SPF reports. It is therefore important to

examine which ways of representing information in the SPF report are

effective, taking into account the characteristics of its users.

Individual differences

Previous research has explored the interaction between representation

formats and individual characteristics. The effect of representation forms

on task performance appears to depend on learning styles, individual

preferences (Dekeyser, 2001), differences in ability (Mayer, 2001; Tapiero,

2001), and prior knowledge (De Westelinck, Valcke, De Craene, & Kirschner,

2005; Mayer, 2001; Shah & Hoeffner, 2002). Prior knowledge also appears

to be an important factor in determining the degree of intrinsic cognitive

load. Since more experienced users are able to handle higher element

interactivity, they experience a lower degree of intrinsic cognitive load

(Sweller, van Merriënboer, & Paas, 1998). Furthermore, a recent study was

carried out on the interaction between learner characteristics and

hypermedia learning, cognitive load, and information utilisation strategies

(Scheiter, Gerjets, Vollmann, & Catrambone, 2009). The results of that

study indicate that characteristics such as positive attitudes towards

mathematics, more complex epistemological beliefs, higher prior

knowledge, and better cognitive and metacognitive strategy use have a

positive influence on these outcome variables.

This study will incorporate several individual variables that are expected

to have an influence on feedback users’ understanding of the information.

1.3 Research questions and hypotheses

The present study examines the extent to which non-statistically skilled

users understand the explanations and representations of the concepts

learning gain and value added. We focus on feedback users’ conceptual and

procedural (i.e., deriving information from graphical representations)

understanding of these terms. The level of understanding is indicated by

the number of misconceptions and other mistakes made during the

interpretation of the information. Since the feedback users participating in

this study are inexperienced in interpreting SPF reports, we expect them to

have misconceptions and interpretation difficulties with the complex

conceptual and graphical information (Hypothesis 1).

Interpreting information accurately largely depends on the instructional

design of the learning material. Two modes of explaining the term value

added have been discussed above. Since no research has been carried out

on these two modes of explanation, we cannot formulate any expectations

with regard to the differential impact they may have on feedback users’

conceptual and procedural understanding of the SPF report. We therefore

examine this in an exploratory way (Research question 1).

Adding representations to textual information can support the

interpretation of the information presented, as indicated in the theory of

multimedia learning (Mayer, 2001). Analogously, we test the hypothesis

that graphical representations with supporting information that is supposed

to facilitate interpretation are more favourable for successful feedback

interpretation than basic representations (Hypothesis 2).

As individual learner characteristics have an effect on task performance

and cognitive load, we take these differences into account and expect them

to serve as significant control and/or moderator variables.

2. Method

2.1. Design

A 2x3 factorial experimental design with post-test was used. Two variants

of explaining the concept of value added were combined with three

alternatives to represent learning gain and value added (for a schematic

overview, see Appendix A).

2.2. Participants

The target audience for SPF consists of school principals and teachers in

primary and secondary education. However, since no SPF system is

currently available in Flanders, a study was set up involving first year

students in educational sciences (N = 312, mean age 19.33 years, SD 1.69,

88% women) at Ghent University. Not all participants started

educational studies without prior study experience as some had already

obtained a professional bachelor degree in a different subject area (n = 62,

mean age 21.98 years, SD 1.14, 81% women; versus freshmen, n = 250,

mean age 18.75 years, SD 1.16, 90% women).

Students participated in this experiment as a formal part of their study

programme. They signed up individually for one of the eight parallel

experimental sessions.

2.3. Material

SPF Tutorial - Experimental conditions

Participants were randomly assigned to one of the six different

experimental conditions. In each condition, they received a specific version

of an SPF report via PowerPoint presentation. This medium was used as it

enables a controlled stepwise instruction for each participant at his/her

own pace and simulates the electronic version of the SPF that schools

receive.

Each of the six presentations consisted of approximately 40 slides. First

an introduction and slide overview was presented, followed by pie graphs

representing the background characteristics of the fictitious school

population. Next, an explanation of the concept learning gain was

presented, which was the same in all experimental conditions. Finally, the

definition and estimation of value added was shown.

Each feedback report presented the average growth curve from grade

one to grade six of a cohort of pupils and presented the school’s value

added for one single subject. The horizontal axis of the line graphs indicated

time and measurement moments; the vertical axis indicated the mean skill

score, as represented in Figure 1.

Figure 1. Example of growth curve as used in the present study.

The design of the tutorials builds on Kosslyn’s principles of relevance,

appropriate knowledge, and capacity limitations (2006). As few slides as

needed were used to clearly explain the concepts learning gain and value

added. Therefore, only limited information was given about the underlying

statistical analyses, such as Item Response Theory and regression analysis.

Furthermore, captions were added piecewise to graphs, consistent with

Mayer’s theory of multimedia learning (2001). Spatial contiguity was

respected to promote the integration of captions and illustrations, and to

prevent extraneous cognitive load (Chandler & Sweller, 1991; Sweller, van

Merriënboer, & Paas, 1998). For the design of the growth curves, different

colours for lines and different symbols for points were used, following the

principle of discriminability and salience (Kosslyn, 2006).

The difference between alternative presentations was based on the way

value added was explained (Research question 1). In half of the conditions,

value added was presented as the difference between the school mean of

the students’ adjusted achievement and the mean of the “average school”

(i.e., the reference group). In the other conditions, value added was

explained as the difference between the average expected achievement

and observed achievement within the fictitious school.

In addition, three different presentation formats were used in the target

group: (1) a baseline version building on text and graphs, and two

elaborated versions enriched with either (2) tables, or (3) symbolic

representations of the underlying statistical concepts. For the basic version,

we opted for text explanation and growth curves, since that is a common

way to represent longitudinal data. Two additional conditions were created

by adding representations that are supposed to support the knowledge

construction of the learners (Hypothesis 2). First, we opted to add cross

tables to the basic version to support the use of prior knowledge, since the

target audience is acquainted with this form of representation from daily

use. The second elaborated version was based on symbolic representations

to explain the simplified regression equations (without detailed equations

as in Table 2). Schematic representations were used, but the variable

names were written in full instead of using Greek symbols (see Figure 2). In

this way, Kosslyn’s principle of appropriate knowledge (2006) was

respected. This form of representation was expected to foster a deeper

understanding of the value-added estimation procedure, resulting in

higher performance scores.
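The crossing of the two explanation variants with the three presentation formats can be made concrete with a short sketch. The condition labels are illustrative shorthand, and the balanced round-robin allocation is an assumption: the study only reports that participants were randomly assigned to the six conditions.

```python
import random
from collections import Counter
from itertools import product

# Factor levels as described in the text; the short labels are illustrative.
explanations = ["expected_mean", "adjusted_mean"]
formats = ["text_and_graph", "added_tables", "added_symbols"]
conditions = list(product(explanations, formats))  # 2 x 3 = 6 conditions


def assign(participant_ids, conditions, seed=0):
    """Shuffle participants, then deal them round-robin over the conditions."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: conditions[k % len(conditions)] for k, pid in enumerate(shuffled)}


# 312 participants, as reported in the Participants section.
assignment = assign(range(312), conditions)
group_sizes = Counter(assignment.values())
```

Dealing the shuffled participants round-robin keeps the six groups equal in size, which a purely independent random draw per participant would not guarantee.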

Figure 2. Example of symbol representation as used in the present study.

Performance test

An online post-test was developed to measure respondents’ conceptual

understanding of the SPF and their procedural skills in deriving information

from graphs and tables (Anderson, Krathwohl, Airasian, Cruikshank, Mayer,

Pintrich, et al., 2001). The post-test consisted of two parts. In the first part

(closed version) students were not allowed to look back at the feedback

report, whilst they could do so in the second part (open version). In reality,

principals and teachers are always able to check the feedback report, as

mirrored in the open version of the test. Nevertheless, the closed version

was used to determine the potential differential learning effects of

alternative representations. Since we can expect a carry-over effect of

taking the first test on the results of the second test, half of the

respondents only participated in the second test. Building on the different

experimental conditions and test approaches, 12 different test groups can

be distinguished with the number of participants in each group varying

between 25 and 28 (see Appendix A for a schematic overview).

As we found no previous research that tests SPF users’ comprehension

of value added and learning gain, we developed a test for this study. To do

so, we created a framework which included all of the different cognitive

tasks that have to be performed to correctly interpret the feedback

information. Using this framework, suitable test items were developed

varying in degree of difficulty. Out of this list of closed items (true-untrue

and multiple choice) and open items (filling in digit values), a test was

composed that could be completed in 30 minutes. Tests consisted of

conceptual and procedural items on two central concepts in the SPF report:

• 6 items referring to conceptual knowledge of learning gain: For example:

“Learning gain is the extent to which pupils progress in a certain skill

domain. (true-untrue)”

• 5 to 6 items referring to the conceptual knowledge of value added: For

example: “Value added can only be determined if you know the input

characteristics of pupils. (true-untrue)”

• 13 items referring to the reading off learning gain from graphs and

tables: For example: “Look with close attention to these growth curves.

Then, complete the blanks in the table (score school 1 at start grade 1,

score at end of grade 1 and learning gain).”

• 17 to 18 items referring to the derivation of the value-added scores from

graphs and tables: For example: “Look with close attention to these

growth curves. Determine if these statements are true or untrue. School

1 reached a higher value-added score than school 2 in grade 4.” See also

Figure 4 for a simplified example of deriving value-added scores from

growth curves (in this test, the growth curves represented 6 grades and

2 schools were presented simultaneously).

A section of each post-test was designed in accordance with the nature

of the experimental condition. This means that the terminology and curves

were adapted to the way in which the concept of value added was

explained, either in terms of adjusted or expected means.
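The design described in this section can be enumerated in a short sketch. It assumes that the 12 test groups arise from fully crossing the two explanation modes, the three presentation modes, and the two test approaches; the labels below are illustrative shorthand, not the names used in the study materials.

```python
from itertools import product

# Assumed factor levels (labels are shorthand chosen for this sketch).
explanation_modes = ["adjusted means", "expected means"]
presentation_modes = ["basic", "table", "symbol"]
test_approaches = ["closed then open", "open only"]

# Fully crossing the three factors gives the test groups.
test_groups = list(product(explanation_modes, presentation_modes, test_approaches))

for group in test_groups:
    print(group)
print(len(test_groups))  # 12
```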

The psychometric quality of the test was checked by first converting the

scores on the different test versions to one common scale, applying a three-

parameter IRT model. Seven test groups were defined according to the

different test variants. A satisfactory overall fit was found for 111 of the

original 127 items (LR = 508.9, SE = 556.0, p = .92). The number of poorly

fitting items did not exceed the number to be expected by

chance. The empirical reliability of the tests varied from .80 to .90

depending on the test version and test group, with the exception of .72 for

one particular test group (Mdn = .84). Exploration of a two-parameter IRT

model yielded comparable results, but 4 extra items had to be removed

from the calibration to attain a good overall fit (LR = 535.4, SE = 549.0, p =

.65). The correlation between the skill scores resulting from the three- and two-

parameter models was r = .95 (p < .001). The three-parameter model was

ultimately preferred because its IRT scores were slightly closer to a normal

distribution.
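As a rough illustration of the model family mentioned above, the standard three-parameter logistic (3PL) item response function can be sketched as follows. The parameter names and example values are assumptions made for this sketch; they are not the calibrated item parameters of this study, and the exact parameterisation used in the calibration is not specified in the text.

```python
import math


def p_correct_3pl(theta, discrimination, location, guessing):
    """Probability of a correct answer under a standard 3PL IRT model.

    theta          -- respondent's skill score
    discrimination -- item slope (hypothetical value in the example below)
    location       -- item location, i.e. difficulty
    guessing       -- lower asymptote; setting it to 0 gives the 2PL model
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - location)))
    return guessing + (1.0 - guessing) * logistic


# Two hypothetical true-untrue items; a guessing parameter of 0.5 is assumed.
easy_item = p_correct_3pl(theta=0.0, discrimination=1.0, location=-1.0, guessing=0.5)
hard_item = p_correct_3pl(theta=0.0, discrimination=1.0, location=1.0, guessing=0.5)

# A lower item location corresponds to a higher probability of success.
print(easy_item > hard_item)
```

Setting `guessing` to zero reduces the function to the two-parameter model that was explored as an alternative.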

Short survey of learner characteristics

In addition to the post-test, data was gathered on characteristics of the

participants. As discussed in the theoretical framework, individual

differences must be taken into account when designing studies on data

representation. However, since the instruction and testing time in this

experiment was limited, we could not use elaborate measurement

instruments (as, for example, in Scheiter, Gerjets, Vollmann, & Catrambone,

2009). We therefore selected a number of indicator variables.

To take into account differences in prior knowledge of the participants

in this study (De Westelinck, Valcke, De Craene, & Kirschner, 2005; Mayer,

2001; Shah & Hoeffner, 2002; Sweller, van Merriënboer, & Paas, 1998),

data was collected for the following variables:

• their study program: freshmen or students with a prior bachelor’s degree

• the number of hours of mathematics per week in the final and penultimate

years of their secondary education, and

• their mathematics exam score at the end of their secondary education

(in %)

As an indication of attitudes towards statistics (e.g., Scheiter, Gerjets,

Vollmann, & Catrambone, 2009), an item was included measuring the

degree to which participants like statistics. This was measured on a 7-point

Likert scale ranging from 1 (totally dislike) to 7 (like it very much). As an

additional moderator variable, we included the participants’ perceived

clarity of the feedback report, since it is plausible that not all respondents

experienced the clarity of the instruction material equally. This was also

measured on a 7-point Likert scale, ranging from 1 (very unclear) to 7 (very

clear).

2.4. Procedure

The entire population of freshmen in educational sciences was invited to

schedule their participation in an experimental session by means of a

learning management system. A maximum of 50 students could participate

in each session, which was set up in a computer lab. Following a brief

introduction, participants were asked to assume the fictitious role of a

school principal who had received a school performance feedback report on

the results from a longitudinal study that their school had participated in.

Participants were asked to imagine that the pupils of their school had been

monitored over the six years of primary school education and had been

tested on seven occasions. Participants were asked to read, at their own

pace, the school feedback report, which was presented to them as a PowerPoint

presentation. At the end of the presentation, they were asked to click on

the link to the online test and short survey. If their computer screen turned

orange, they had to take the test without looking back at the presentation

(closed version). If the screen turned green, they received a printed version

of their school feedback report to guide them when answering the open

version of the post-test. Students were told they had approximately 90

minutes to read through the materials and to complete the post-test and

short survey.

2.5. Analyses

An exploratory analysis of the descriptive data was carried out to screen

response patterns in relation to the content, degree of difficulty, and the

nature of the questions. To this end, scatter plots were used to represent the

locations of the items in terms of the item type (conceptual – procedural)

and item content (learning gain – value added). The item location is an IRT

parameter that is related to the percentage of correct answers (r = -.90 in

this study) and gives an indication of the item difficulty level: the lower an

item is located, the more participants scored correctly on that item, which

indicates a lower level of difficulty. To clarify the meaning of these item

locations, the percentages of correct answers for these items are reported

below. In addition, an error analysis was performed by first listing all

possible answers, then revealing error patterns, and finally reconstructing

participants’ reasoning processes.

First, the potential difference between the scores on the open and

closed test was examined using a t-test. Depending on this result, further

analyses were performed with either only the open test (if no difference) or

both tests (in case they differ).
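This first step can be sketched as a pooled-variance t-test. The sketch assumes an independent-samples comparison, uses only the Python standard library, and runs on hypothetical scores; the function name is illustrative and the code is not the analysis script used in this study.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical IRT scores for the closed and the open test version.
closed_scores = [-0.4, 0.1, 0.3, -0.2, 0.5]
open_scores = [-0.3, 0.2, 0.1, 0.0, 0.4]

t_value, df = pooled_t_test(closed_scores, open_scores)
print(f"t({df}) = {t_value:.3f}")
```

The resulting t value would then be compared against the t distribution with the given degrees of freedom to decide which test scores enter the subsequent analyses.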

The differential impact of experimental conditions was checked on the

basis of univariate analyses of covariance, controlling for potentially

confounding variables (mathematics level, degree of liking statistics, current

study program, etc.). Differences were tested with respect to the IRT test

scores. Additionally, pairwise comparisons were executed to determine the

differences within the categories of significant factors. Furthermore,

relevant moderator effects were included in addition to the main effects to

obtain a more nuanced view of the predictors in the model. These moderator

effects were added to the model stepwise. For this study, the students’

mathematics exam scores in the last year of secondary education were

brought into interaction with the hours of mathematics they received per

week. Furthermore, the moderating relation between value-added

explanations and graphical representation modes was examined. Finally,

the model examined the interaction effect between the value-added

conditions and the perceived clarity of the presentation.

The assumptions of the analysis techniques, i.e., homogeneity of

regression slopes and of residual variances, were checked; none

was violated.

3. Results and discussion

3.1. Descriptive statistics of correct answers

No differences were found between the results of the open and closed

versions of the test (t(153) = .322, p = .748). Consequently, the scores

obtained for the open version of the test were used in the subsequent

analyses.

Descriptive statistical analysis reveals that students did not experience

difficulties in reading exact values from the tables and graphs (more than

85% of the answers correct), or in calculating learning gains (more than 75%

correct). To illustrate the spread of the items, the panel display in Figure 3

shows the frequencies of the standardised item locations in relation to the

item type and item content.

Figure 3. Panel display of standardised item locations in relation to item type and

content.

For example, the upper right panel shows no items exceeding a standard

deviation above zero, indicating a mean item location for procedural

learning gain items. This implies that test items that required participants to

derive learning gain values from graphs and tables were not more difficult

than the mean test difficulty level. In contrast, the most difficult items

appear in the procedural value-added panel. On average, only 35% of the

respondents were able to derive value-added scores from the graphs

presented in the feedback report. Reading off value-added scores requires

the comparison of data (heights and slopes) from different growth curves

(e.g., the school’s adjusted growth curve and the “national” average growth

curve or the school’s expected growth curve and the school’s observed

growth curve). In contrast, deriving the average learning gain over a certain

period only requires the examination of one growth curve (see Figure 4).

Calculating value added thus requires extra processing, possibly causing

cognitive overload in the working memory due to high element interactivity

(Sweller, van Merriënboer, & Paas, 1998). This difference in mental effort,

related to intrinsic cognitive load, may explain the lower scores for

procedural value-added items in comparison to the learning-gain items.

Figure 4. Example of deriving learning gain and value added from growth curves.

For deriving learning gain, the difference in skill scores between two points of the

same curve must be calculated. For example, the learning gain of this school in the

first grade is 50 - 35 = 15. For reading off value-added results, two curves need to

be compared and a geometrical translation needs to be performed. Before

subtracting the end points, the starting points of the curves must coincide. For

example, the value added of this school in the second grade is -10.
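The two derivations described in this caption can be sketched in a few lines. The first-grade school scores (35 and 50) are taken from the example above. The second-grade school score and the whole reference curve are hypothetical values chosen so that the result matches the stated value added of -10, and the function names are illustrative.

```python
def learning_gain(curve, start, end):
    """Difference in skill score between two measurement points on one curve."""
    return curve[end] - curve[start]


def value_added(school_curve, reference_curve, start, end):
    """Compare two curves over one period after aligning their starting points.

    The reference curve is shifted so that it starts where the school curve
    starts (the geometrical translation); only then are the end points
    subtracted. This equals the difference between the two learning gains.
    """
    shift = school_curve[start] - reference_curve[start]
    translated_reference_end = reference_curve[end] + shift
    return school_curve[end] - translated_reference_end


# Occasions: 0 = start of grade 1, 1 = end of grade 1, 2 = end of grade 2.
school = [35, 50, 60]     # 35 and 50 come from the caption; 60 is hypothetical
reference = [40, 52, 72]  # hypothetical reference curve

print(learning_gain(school, 0, 1))           # 15
print(value_added(school, reference, 1, 2))  # -10
```

With these hypothetical numbers, subtracting the end points without the translation would give 60 - 72 = -12 rather than -10, which corresponds to the error of ignoring the difference in starting points.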

Examining the nature of the errors in calculating value added, patterns

can be observed for the incorrect answers. This enables us to reconstruct

the thinking process of participants and to identify certain misconceptions.

Typical errors made when calculating value added are (1) comparing the

wrong growth curves; (2) ignoring the difference in starting points of the

curves before subtracting (no geometrical translation was performed); (3)

confusing the calculation process of value added and learning gain; (4) using

the wrong signs (+/-); and (5) confusing the heights of curves with their

slopes. This last misconception, called the slope-height confusion, has been

reported in earlier studies (Beichner, 1994; Clement, 1989; Kramarski, 2004;

Leinhardt, Zaslavsky, & Stein, 1990).

Respondents mostly gave correct answers to the conceptual questions

related to the information that was literally explained in the school

feedback presentation (87% correct answers). In contrast, low test scores

were observed when the questions required deep-level conceptual learning

(24% correct answers). For example, the statement that “The learning gain

of pupils can be calculated by tests with the same maximum score” was

incorrectly classified as ‘true’ by 86% of the participants. During the

instruction period, participants were told that learning gain can only be

calculated if both tests are on the same scale, by IRT calibration or by taking

the same test twice. Only 2% of the participants who received the item “To

estimate a school’s value added, you first have to adjust for school

characteristics,” answered it correctly. This indicates that for these

participants either the difference between school and input characteristics

was not clear or they just did not notice the difference in this sentence.

3.2. Differences between conditions

The results of the analyses of covariance in Table 2 show significant

differences in test scores in relation to the way value added was explained.

The school feedback report that explained value added in terms of

expected means resulted in higher performance (see Table 3 for descriptive

statistics; t(298) = 2.283, β = -.536, p < .05). However, the effect size is limited,

as the explained variance in test scores is 2.1% (partial η²).
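The reported effect size can be related to the F statistic in Table 2. As a hedged check, partial η² for an effect can be recovered from its F value and degrees of freedom. The function name is illustrative, and the formula is the standard identity rather than a computation described in the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic: SS_effect / (SS_effect + SS_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Explanation mode of value added in Table 2: F(1, 298) = 6.238
print(round(partial_eta_squared(6.238, 1, 298), 3))  # 0.021
```

The result of about .021 agrees with the 2.1% of explained variance reported for the explanation mode.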

A pairwise comparison of the presentation modes reveals significant

differences between the test scores for the basic SPF version and the

elaborated SPF version using tables, in favour of the basic version (∆ = .269,

SE = .117, p < .05, partial η² = 1.7%). This finding contradicts Kosslyn’s (2006)

theory on the advantage of observing the design principle of appropriate

knowledge. An explanation for this finding may lie in the structure

mapping hypothesis (Schnotz & Bannert, 2003). This hypothesis assumes

that adding representations is not beneficial in all cases but is dependent

on the kind of task being carried out. In this sense, tables may not have

been helpful in solving the tasks presented in this study because their

structure does not facilitate the construction of a task-appropriate mental

model. Indeed, adding tables may have been inappropriate for illustrating

trends in the data, since tables might be more appropriate for determining

exact numbers (Meyer, Shinar, & Leiser, 1997). Therefore, adding tables

that were not in accordance with the different task purposes may have

caused extraneous cognitive load (Chandler & Sweller, 1991).

Table 2
Results of analysis of covariance for IRT test score

Source                                    df        F       p
Corrected model                           13    5.221    .000**
Explanation mode value added (E)           1    6.238    .013*
Presentation mode (P)                      2    2.648    .072
E x P                                      2     .738    .479
Study program                              2    2.728    .067
Degree of liking statistics                1    1.228    .269
Hours of math sec. education (H)           1     .501    .480
Math exam score sec. education (S)         1     .714    .399
H x S                                      1     .024    .878
Perceived clarity of presentation (C)      1   10.921    .001**
E x C                                      1    4.778    .030*
Error                                    298

Note. Adj. R² = .15 for IRT test score. *p ≤ .05. **p ≤ .01.

Table 3
Numbers, means and standard deviations of IRT test scores for the six conditions

                                    n        M      SD
Explanation mode value added
  Adjusted scores                 154    -.070    .950
  Expected scores                 158     .068    .829
Presentation mode
  Basic version                   102     .131    .974
  Table version                   103    -.077    .852
  Symbol version                  107    -.051    .841

Regarding the influence of individual differences on the test score, only

the perceived clarity of the presentation appears to be significant, both as a

main effect and as a moderator effect in interaction with the value-added

explanation mode (t(298) = 2.186, β = .487, p < .05). The direction of this

interaction effect shows that the perceived clarity of the presentation is

even more important when value added is explained in terms of adjusted

means than in terms of expected means. In other words, the relationship “the

clearer a presentation is perceived to be, the higher the IRT test score” holds

more strongly when value added is explained by adjusted than by expected means.

4. General discussion and conclusion

4.1. Interpretation of SPF in the present study

Since school performance feedback aims at contributing to internal school

quality policies, it is important that the target audience develops a good

understanding of the information offered. The results of the present study

reveal that at least one of the most widely used concepts in school

performance feedback, the concept of value added, is not well understood

by people without statistical training. The results from our experiment indicate

that there is a lack of procedural and deep conceptual understanding of this

concept. Even when comprehensive information was provided to

participants in the experimental setting, the conceptual basis of value

added was too complex for statistically unskilled people to master. These

findings confirm our first research hypothesis that users would have

difficulty interpreting complex conceptual and graphical information due to

the interplay between the inherent complexity of SPF and a lack of prior

knowledge of the respondents. This interplay causes intrinsic cognitive load

(Sweller, van Merriënboer, & Paas, 1998), interpretation difficulties, and

misconceptions (e.g., slope-height confusion, see Beichner, 1994; Clement,

1989; Kramarski, 2004; Leinhardt, Zaslavsky, & Stein, 1990).

We compared the two explanations and representations of value added

in terms of their differential effect on participants’ understanding of the

concept. This proved to be helpful in detecting which explanation of value

added facilitated better conceptual and procedural understanding.

Explaining this concept in terms of the difference between observed and

expected growth appears to be better than explaining it in terms of the

difference between the school’s adjusted growth curve and the reference

growth curve. However, the effect size of the observed significant

differences is rather small. While more research is needed to confirm these

findings, they serve as a point of reflection for the designers of school

feedback systems.

In terms of the graphical representations used in our experiment, it is

rather surprising that the tables did not add to users’ understanding of the

feedback report. However, this does not imply that the use of tables in

combination with growth curves is not advisable. Previous research

indicates that different information is derived from tables and graphs

(Meyer, Shinar, & Leiser, 1997); both sources of information have merits,

depending on the task being performed (Schnotz & Bannert, 2003). An

appropriate use of tables and graphs can avoid extraneous cognitive load

and foster correct understanding.

4.2. Strengths and limitations

Earlier studies which examined school feedback reports expressed concern

for the accuracy of feedback users’ interpretation of information, but were

not able to pinpoint what was being misunderstood (Earl & Fullan, 2003;

Kerr et al., 2006; Saunders, 2000; Williams & Coles, 2007). The present

study allows us to develop a more detailed understanding of what is

misunderstood when interpreting learning gain and value added from SPF

reports. The use of IRT techniques appears to deliver detailed information

both on the item parameters and on respondents’ scores. This allows SPF

developers to examine interpretation difficulties in detail and to adapt SPF

representation forms for clients that are statistically unskilled. Furthermore,

this may inspire feedback providers to set up support initiatives.

In our experimental study particular concepts were studied in a

controlled setting. However, participants were not genuine feedback users.

This feedback experiment must therefore be considered as a first attempt

to test the understanding of diverse modes of explaining SPF concepts.

These results require further examination in future research.

It is possible that the experimental tasks in this study placed too high a

demand on the participants, in that they were expected to derive and

calculate value added from graphical representations or to indicate what

conclusions can be drawn from the tables and the growth curves presented.

This is also a point of discussion for feedback providers: Is it necessary to

expect SPF users to master the basic principles of deriving value added and

learning gain scores from representations or should SPF reports be

simplified? Providing more technical information to users also implies more

complex interpretations of the SPF information.

4.3. Implications for future research

Our findings in relation to the different modes of explaining the concept of

value added need further confirmation. The different modes of explaining

and representing SPF concepts and their influence on users’ understanding

can be tested in a number of ways. IRT techniques appear to be useful in

this regard and can be applied in less controlled settings, such as in quasi-

experimental designs. This would provide more detailed item information

and could support the external validity of our findings. An alternative way

of testing value added conceptions is to interview the feedback users. This

method has provided useful results in previous research (Santelices & Taut,

2009; Saunders, 2000). However, more in-depth analyses, such as video-

taping feedback users as they explain their understanding of the

representations and concepts in SPF reports, may provide more insight into

the reasoning process of respondents.

The individual differences and preferences that influence feedback

users’ understanding of the SPF data require further attention. It is

important to explore whether SPF developers should introduce feedback

reports that are more flexible in terms of form and content, i.e., tailored to

the individual user (Visscher & Coe, 2003). This study points to the

importance of how respondents perceive the SPF variables and data.

Indeed, it is often not the feedback characteristics as such, but the

perception of them that determines how the data will be used (Verhaeghe et

al., 2010; Visscher, 2002). Therefore, valid measures of users’ perception of

SPF variables should be developed.

4.4. Implications for practice

It is quite likely that the misconceptions observed in this study also occur in

school practices when interpreting school performance feedback.

Therefore, these findings underpin the importance of carefully examining

the interpretability of feedback reports. Feedback developers should adapt

the mode of explaining the concept of value added to the target audience;

they should be aware of the prior knowledge of feedback users and should

develop graphical representations that differ from those used in scientific

publications. The presentation of information in SPF reports should be

designed in line with the task to be performed (Kluger & DeNisi, 1996;

Schnotz & Bannert, 2003). This implies that feedback reports should be

designed according to the cognitive tasks that are necessary to understand

the information. Many studies stress the role of support when dealing with

school feedback (Bosker, Branderhorst, & Visscher, 2007; Earl & Fullan,

2003; Kerr et al., 2006; Saunders, 2000; Verhaeghe et al., 2010; Williams &

Coles, 2007). If school performance feedback is expected to contribute to

school improvement, attention must be given to the way users interpret

the information.

Appendix A: Assignment of participants to the different conditions and test formats

Note. Extra variation in the tests was added by developing parallel formats in case students who signed up for later sessions were influenced by

peers from earlier sessions.

References

Anderson, L.W., Krathwohl, D. R., Airasian, P.W., Cruikshank, K.A., Mayer,

R.E., Pintrich, P.R., et al. (Eds.). (2001). A taxonomy for learning,

teaching, and assessing: A revision of Bloom’s taxonomy of educational

objectives. New York: Longman.

Beichner, R.J. (1994). Testing student interpretation of kinematics graphs.

American Journal of Physics, 62(8), 750-762.

Bosker, R.J., Branderhorst, E. M., & Visscher, A. J. (2007). Improving the



CiTO. (n.d.) Volg- en adviessysteem: Voor elke leerling de beste kansen

[Monitor and advice system: The best chances for each pupil]. Retrieved

December 1, 2008, from

http://www.cito.nl/vo/vas/algemeen/eind_fr.htm

Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of

instruction. Cognition and Instruction, 8(4), 293-332.

Clement, J. (1989). The concept of variation and misconceptions in

Cartesian graphing. Focus on Learning Problems in Mathematics, 11, 77-

87.

De Westelinck, K., Valcke, M., De Craene, B., & Kirschner, P. (2005).

Multimedia learning in social sciences: Limitations of external graphical

representations. Computers in Human Behavior, 21(4), 555-573.

Dekeyser, H. M. (2001). Student preference for verbal, graphic or symbolic

information in an independent learning environment for an applied

statistics course. In J.F. Rouet, J.J. Levonen, & A. Biardeau (Eds.),

Multimedia learning: Cognitive and instructional issues (pp. 99-109).

Oxford: Pergamon.

Durham University, Centre for Evaluation and Monitoring. (n.d.). PIPS.

Retrieved December 1, 2008, from

http://www.cemcentre.org/RenderPage.asp?LinkID=22210000



Fitz-Gibbon, C.T. (1997). The value added national project: Final report.

Feasibility studies for a national system of value added indicators. Hayes:

School Curriculum and Assessment Authority.




Katholieke Universiteit Leuven, Centre for Educational Effectiveness and

Evaluation. (2008). PIRLS: Begrijpend lezen vierde leerjaar:

Schoolfeedbackrapport n.a.v. deelname aan het PIRLS 2006 – onderzoek

[PIRLS: Comprehensive reading grade four: School feedback report in

response to participation to the PIRLS 2006 study]. Retrieved December

1, 2008, from http://ppw.kuleuven.be/pirls/voorbeeldrapport.pdf

Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006).



Education, 112, 496-520.

Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on


feedback intervention theory. Psychological Bulletin, 119(2), 254-284.

Kosslyn, S.M. (2006). Graph design for the eye and mind. Oxford: Oxford

University Press.

Kramarski, B. (2004). Making sense of graphs: Does metacognitive

instruction make a difference on students’ mathematical conceptions

and alternative conceptions? Learning and Instruction, 14(6), 593-619.

Leinhardt, G., Zaslavsky, O., & Stein, M.K. (1990). Functions, graphs, and

graphing: Tasks, learning, and teaching. Review of Educational Research,

60(1), 1-64.

Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter:

Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press.

Mayer, R.E. (2001). Multimedia Learning. Cambridge: Cambridge University

Press.

Mevarech, Z.R. & Kramarsky, B. (1997). From verbal descriptions to graphic

representations: Stability and change in students’ alternative

conceptions. Educational Studies in Mathematics, 32(3), 229-263.

Meyer, J., Shinar, D., & Leiser, D. (1997). Multiple factors that determine

performance with tables and graphs. Human Factors, 39(2), 268-286.

Mittal, V. O., Carenini, G., Moore, J.D., & Roth, S. (1998). Describing

complex charts in natural language: A caption generation system.

Computational Linguistics, 24(3), 431-467.



perspective (pp. 3-16). Oxford: Elsevier Science.

Organisation for Economic Co-operation and Development. (2008).

Measuring improvements in learning outcomes: Best-practices to assess

the value-added of schools. Paris: OECD Publishing.

Santelices, V., & Taut, S. (2009, September). Comprehension and use of



Research, Vienna.

Saunders, L. (2000). Understanding schools’ use of value-added data: The

psychology and sociology of numbers. Research Papers in Education,

15(3), 241-258.

Scheiter, K., Gerjets, P., Vollmann, B., & Catrambone, R. (2009). The impact

of learner characteristics on information utilization strategies, cognitive

load experienced, and performance in hypermedia learning. Learning

and Instruction, 19, 387-401.




Schnotz, W., & Bannert, M. (2003). Construction and interference in learning

from multiple representation. Learning and Instruction, 13(2), 141-156.

Shah, P., & Hoeffner, J. (2002). Review of graph comprehension research:

Implications for instruction. Educational Psychology Review, 14, 47-69.

Smith III, J. P., diSessa, A.A., & Roschelle, J. (1993). Misconceptions

reconceived: A constructivist analysis of knowledge in transition. The

Journal of the Learning Sciences, 3(2), 115-163.

Sweller, J., van Merriënboer, J.J.G., & Paas, F.G.W.C. (1998). Cognitive

architecture and instructional design. Educational Psychology Review,

10(3), 251-296.

Tapiero, I. (2001). The construction and the updating of a spatial mental

model from text and map: Effect of imagery and anchors. In J.F. Rouet, J.

J. Levonen, & A. Biardeau (Eds.), Multimedia learning: Cognitive and

instructional issues (pp. 45-57). Oxford: Pergamon.




Visscher, A.J. (1996). The implications of how school staff handle

information for the usage of school information systems. International

Journal of Educational Research, 25(4), 323-334.


feedback systems. In A. J. Visscher, & R. Coe (Eds.), School improvement

through performance feedback. Lisse: Swets & Zeitlinger.


performance feedback. Lisse: Swets & Zeitlinger.

Visscher, A., & Coe, R. (2003). School performance feedback systems:



Williams, D. & Coles, L. (2007). Teachers’ approaches to finding and using


Research, 49(2), 185-206.

CHAPTER 5

THE INFLUENCE OF COMPETENCES AND SUPPORT ON SCHOOL PERFORMANCE FEEDBACK USE

Abstract

Information-rich environments are created to promote data use in schools

for the purpose of self-evaluation and quality assurance. However,

providing feedback does not guarantee that schools will actually put it to

use. One of the main stumbling blocks relates to the interpretation and

diagnosis of the information. This study examines the relationship between

data literacy competences, support given in interpreting the information,

actual use of the feedback, and potential school improvement effects. A

randomized field experiment with 188 school principals from primary

education was set up and a posttest was used to investigate the effects of a

support initiative. The results revealed that a minority of schools invested

significantly in the interpretation and diagnosis of the school performance

feedback (SPF), despite the fact that most of the respondents showed an

interest in the SPF report. In addition, data competence support and the

subsequent use of feedback were found to be limited.

∗ Based on Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in

press). The influence of competences and support on school performance feedback use. Educational Studies.

1. Introduction and research questions

The growing autonomy of schools is going hand in hand with initiatives by

education authorities to hold schools accountable for their approach to

quality care (Nevo, 2002; Hofman, Dijkstra, & Hofman, 2009) and to create

information-rich environments. Schools are given feedback on their

functioning and performance via school performance feedback systems

(Visscher & Coe, 2002; 2003). The use of such systems as a policy

instrument is not a straightforward issue. School performance feedback

(SPF) has turned out to be a necessary yet insufficient step, as both the

schools and the feedback systems have to meet certain requirements before

the feedback is actually used in practice (Visscher & Coe, 2003; Verhaeghe,

Vanhoof, Valcke, & Van Petegem, 2010). Consequently, current research

often reports disappointing results from school feedback use (Coe, 2002;

Saunders & Rudd, 1999; Tymms, 1995; Schildkamp, Visscher, & Luyten,

2009; Van Petegem & Vanhoof, 2004; Verhaeghe et al., 2010; Zupanc,

Urank, & Bren, 2009). One important obstacle is the lack of knowledge and

skills needed to process the information. School principals are usually not

trained in carrying out research, collecting data, data management or data

interpretation. This lack of data literacy (Earl & Fullan, 2003) leads to

valuable information often being neglected. Available research reveals a

need to support school principals and teachers in both the interpretation

and further use of the feedback data (Schildkamp & Teddlie, 2008;

Schildkamp, Visscher, & Luyten, 2009; Zupanc, Urank, & Bren, 2009). A

second critical issue, arising from the research review, is the need to

evaluate the impact of support initiatives related to the use of SPF (Zupanc,

Urank, & Bren, 2009). Indeed, current support initiatives often lack

empirical verification. Moreover, when evaluation initiatives have been set up,

they often focus too much on the short-term effects, such as the

satisfaction of participants, without considering the effects on the

organization (Mathison, 1992; Rossi, Lipsey & Freeman, 2004).

The present study aims at testing insights emerging from the current

knowledge base against empirical information. Answers are sought to the

following research questions:

• How do schools use SPF (in terms of phases in use and types of use)?

What are the effects of this use?

• To what extent are variations in SPF use influenced by data literacy

competences?

• To what extent does specific SPF support have an impact on the

development of SPF competences, actual SPF use and resulting SPF

effects?

2. Theoretical framework

In the following paragraphs, we first provide a theoretical framework used

to investigate the use of SPF. We subsequently address the question of SPF

effects. Finally, we focus on factors that are expected to influence the use

of school feedback, in particular data literacy competences and SPF usage

support. A visual representation of the theoretical framework is given in

Figure 1.

Figure 1. Framework for SPF usage in the present study

2.1. The use of SPF: Phases in use and types of use

Research shows that the process of SPF use in schools can be described in a

variety of ways (e.g., Schildkamp, 2007; Schildkamp & Kuiper, 2010). The

effective use of SPF implies a well-considered sequence of several

consecutive phases in a cyclical process (Huffman & Kalnin, 2003; Learning

Point Associates, 2004). In the process of school feedback use, Verhaeghe

et al. (2010) distinguish between receiving, reading and discussing the SPF

as a means to arrive at a correct interpretation. After the school has

performed an analysis of its results, the next stage involves putting to use

the information from the SPF, comprising a diagnosis that looks for

explanations for the school’s results.

Furthermore, SPF use refers to specific actions or changes in thinking

and processes. With reference to available research on evaluation data and

SPF use (Schildkamp, 2007; Schildkamp & Teddlie, 2008; Schildkamp,

Visscher & Luyten, 2009; Weiss, 1998), this study focuses on two types of

use: instrumental and conceptual use. In the case of conceptual use, we

centre on changes in the thinking of the feedback users (e.g. changes in how

they think about pupil performance or school functioning). In the case of

instrumental use, we examine reported changes in school policies. The way

the feedback is used (types of use) is expected to correlate positively with

the investment in the process of SPF use (phases in use).

2.2. Effects of SPF

School performance feedback use will not automatically result in a

significant improvement of pupil performance (Fitz-Gibbon & Tymms, 2002;

Schildkamp, Visscher & Luyten, 2009). This underlines the importance of

examining effects beyond the level of educational performance and giving

sufficient attention to process-oriented effects (Schildkamp, 2007; Visscher

& Coe, 2002; 2003). The latter are described in terms of professional

development of team members, improved educational processes and

improvements in school functioning (Zupanc, Urank, & Bren, 2009;

Schildkamp & Teddlie, 2008; Visscher & Coe, 2003). However,

unintended and undesirable effects can also be observed; for

example, reduced motivation among teachers due to extra workload (Fitz-

Gibbon & Tymms, 2002; Schildkamp & Teddlie, 2008) or an excessive and

narrow focus on testing-towards-the-curriculum (Schildkamp & Teddlie,

2008; Visscher, 2002). In the present study, we map the perceived effects of

SPF use on the basis of self-reports of school improvement effects. This

approach has been successfully applied in previous studies on data use

(Huffman & Kalnin, 2003; Schildkamp & Teddlie, 2008; Schildkamp,

Visscher, & Luyten, 2009).

2.3. Influential factors: Competences and support

In our theoretical model, we distinguish between a variety of variables and

processes that influence the actual feedback use and the related effects: (1)

variables within the users that define their ability/orientation to adopt SPF

use and (2) levels of SPF support.

Data literacy competences

A competence is the ability to take satisfactory action through the

integration of knowledge, skills and attitudes. These three elements are

operationalized below in the context of school feedback use.

An attitude reveals how positively or negatively a person views a

particular matter (Petty & Wegener, 1998). A negative attitude towards SPF

is – according to Bosker, Branderhorst and Visscher (2007) – one of the

main obstacles in the use of feedback information. The attitude is the most

significant aspect that determines a person’s willingness to invest time and

energy in dealing with information (Williams & Coles, 2007) and the users’

belief that they need the data in order to improve education (Schildkamp &

Kuiper, 2010). The concept can be operationalized in analogy to self-

evaluation research in schools (Meuret & Morlaix, 2003; Vanhoof, Van

Petegem, & De Maeyer, 2009). An individual’s attitude towards SPF can be

situated on a bipolar continuum. A number of examples include: School

feedback does/does not lead to better teaching, is favored/not favored by

most team members, and so on.

The importance of knowledge and skills is evidenced by the impact of

data literacy on the SPF use process (Webber & Johnston, 2000). “Data

literacy encompasses the strategies, skills and knowledge needed to define

information needs, and to locate, evaluate, synthesize, organize, present

and/or communicate information as needed” (Williams & Coles, 2007, p.

188). Data literacy is a condition for being able to convert data into valuable

and usable information (Earl & Fullan, 2003). The current lack of know-how

on making use of the information is an important obstacle (Kerr, Marsh,

Ikemoto, Darilek, & Barney, 2006; Saunders, 2000; Van Petegem & Vanhoof,

2004; Williams & Coles, 2007). Next to a lack of capacities needed to

interpret the data, there often is a lack of well-developed research skills

such as the formulation of research questions and hypotheses (Earl &

Fullan, 2003; Herman & Gribbons, 2001; Kerr et al., 2006). In this context,

we also have to distinguish between the actual mastery of knowledge

and skills on the one hand, and the level at which the users estimate their

skills on the other. The concept of academic self-efficacy, a person's belief

that he or she can perform certain academic tasks to certain levels

(Bandura, 1977; Schunk, 1991), is applicable in the context of SPF. In the

present study, academic self-efficacy focuses on the extent to which users

think they have understood the terms, figures and tables used and the

extent to which they believe they are able to find explanations for their

results. It is not only important to measure the actual knowledge and skills

but also to record the level of perceived self-efficacy, since it significantly

determines a person’s motivation for action (Bandura, 1977).

SPF usage support

Providing SPF support is essential because it might influence the actual and

experienced mastery of the competences of school principals to interpret

information relating to their school. The support levels in this study are

described in more detail in the section on research methodology. For the

evaluation of support effects, we build on Kirkpatrick’s (1998) four levels

of evaluation, which can be linked to our

broader theoretical framework. Table 1 describes these levels in general

terms and in terms of the SPF focus in this study.

Table 1

Kirkpatrick's Evaluation Levels (1998)

Description of evaluation levels          Application in this study
----------------------------------------  ----------------------------------------
Reaction. Immediate response of the       This level is not reported because it
participants after the support. This      could logically only be obtained from
concerns a general impression of the      the experimental group.
relevance and possibilities for
application.

Learning. Increase in knowledge and       In this study, this level translates as
skills and the change in attitudes as a   the question of whether the support
result of the support.                    has contributed to an increase in data
                                          literacy competences, specifically in
                                          relation to the feedback report used.

Behavior. Application of what has         In this particular case, it concerns the
been learned in the organization and      question of how far schools progress in
behavioral changes.                       the phases of SPF use and types of SPF
                                          use.

Results. Effects of the support on        In the context of SPF use, this
achieving the organization’s aims and     evaluation level is represented in the
on the organization itself.               variable ‘perceived effects’ of SPF.

Kirkpatrick's underlying premise (1998) is that the attainment of a higher

level can only be achieved once a lower level has been realized. This fits the

theoretical framework (see Figure 1) since SPF support provisions will only

contribute to school improvement effects when underlying SPF

competences have been affected.

3. Methodology: research design, procedure and research instruments

A between-groups field experiment with posttest was set up to investigate

the impact of SPF use support. The schools in this study can be classified

into two groups: a group with SPF support (experimental group) and a

group without SPF support (control group). The design was experimental

rather than quasi-experimental (Creswell, 2008; Field & Hole, 2003), given

that the schools were randomly assigned to either one of the two

conditions and it was possible to control the independent variable, namely

the support intervention.

The experiment was set up in the context of a large-scale project,

whereby Flemish primary schools annually receive confidential feedback

based on the comparison of their school performance results with a

reference group. The schools receiving the feedback participate in a

longitudinal study, named Schoolloopbanen in het BasisOnderwijs (SiBO),

tracking approximately 6000 pupils from a representative sample of Flemish

schools (from the start of K3 until the end of grade six and the transition to

secondary education). Item Response Theory (IRT)-based techniques are

used to construct the test scores, enabling the estimation of growth curves.

At the beginning of 2008, about 200 schools received feedback reports

containing the results (grade 1 to grade 4) of the investigated pupil cohort.

Results were reported in relation to mathematics, reading fluency, reading

comprehension, and orthography, supplemented with information about

pupil characteristics (child factors, home factors, and Dutch language skills

at the start of grade 1). The central concepts in these reports include

learning gain, value added, and adjusted scores. These concepts were

explained in such a way that no prior statistical knowledge is required. The

data were supported with graphical representations (i.e. pie charts, growth

curves, and cross tables). The content of the text of each report was

standardized. The school principals were required to interpret the results

for their school, based on the general information made available.

Forty-five of the 188 schools involved in the project, chosen at random,

received an invitation to participate in the support. The principals in the

experimental support condition participated in a professional development

activity (a half-day workshop) with the following aims: (1) being able to

describe concepts from the report in their own words; (2) being able to

interpret the figures and tables from the SPF report; (3) being able to give

an explanation of why performance could be worse or better than that of

the reference group and (4) being able to describe which function(s) the

SPF report can fulfill in the context of their own school. To this end, school

principals met in small groups outside their own school. The feedback

designers explained the feedback reports during these meetings and

participants were given the opportunity to practice using and evaluating the

feedback information interactively.

Only 23 of the 45 invited schools actually participated in the

experimental condition. Although the study participants were assigned to

the various conditions randomly, there is a real risk of selection bias caused

by the self-selection through working with volunteers (Rossi, Lipsey &

Freeman, 2004). This could endanger the internal validity of the study.

Therefore, previously collected data was used to investigate whether this

subgroup deviated from the population of schools in the project in relation

to relevant school population and functioning criteria. This proved not to be

the case.

Five months after receiving the SPF, and after the experimental group

had participated in the support provision, the school principals of the SiBO

schools were asked to fill in a questionnaire. A total of 116 schools

completed the questionnaire (response rate 62%). The response rate for the

control condition was 60% (n = 99) and that for the

experimental condition was 74% (n = 17).
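
The reported response rates can be checked from the counts given in the text. The following is a minimal Python sketch of that arithmetic. It assumes that the control condition consists of the 165 schools that did not attend the workshop (188 minus the 23 participants); this split is inferred from the reported percentages and is not stated explicitly in the text.

```python
# Counts reported in the text; the control-group size is an inference.
TOTAL_SCHOOLS = 188
WORKSHOP_PARTICIPANTS = 23  # experimental condition
CONTROL_SCHOOLS = TOTAL_SCHOOLS - WORKSHOP_PARTICIPANTS  # assumed: 165


def response_rate(respondents: int, approached: int) -> float:
    """Return the response rate as a percentage."""
    return 100 * respondents / approached


rates = {
    "overall": response_rate(116, TOTAL_SCHOOLS),
    "control": response_rate(99, CONTROL_SCHOOLS),
    "experimental": response_rate(17, WORKSHOP_PARTICIPANTS),
}

for group, rate in rates.items():
    print(f"{group}: {rate:.0f}%")
```

Under this assumption the three rounded rates match the 62%, 60% and 74% reported above.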

The various concepts from the theoretical framework were translated into

scales, consisting of specific questionnaire items. Each item presented the

principals with a statement they were to judge using a Likert scale. Table 2

presents descriptives of the scale scores for the different constructs; in

addition psychometric details are reported. The reliability analyses show

good to very good internal consistency values for all scales (α > .80).

Table 2

Descriptives and reliability of the research instruments

Scale                           M     SD    Range  N items  α
Influencing factors
  Attitude towards SPF use      3.97  1.08  1-6    7        .91
  Academic self-efficacy        3.81  0.74  1-5    6        .92
SPF use
  Phases in SPF use             3.81  0.75  1-5    6        .86
  Conceptual SPF use            3.27  0.83  1-5    4        .86
  Instrumental SPF use          2.85  0.97  1-5    3        .90
Effects
  Perceived effects of SPF use  2.92  0.90  1-5    6        .94
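
The α column reports internal consistency. The text does not name the coefficient, so the sketch below assumes it is Cronbach's alpha and shows how such a value is computed from item-level responses. The response matrix is hypothetical and serves only to illustrate the formula; it is not data from this study.

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for k items, each given as a list of respondent scores."""
    k = len(items)
    # Total scale score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))


# Hypothetical responses: 3 items (rows) answered by 5 respondents (columns).
example_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(example_items)
print(f"alpha = {alpha:.2f}")
```

Alpha rises when the items covary strongly relative to their individual variances, which is why scales with few items, such as the three-item instrumental use scale, need highly consistent items to reach values of .90.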

A data literacy test was used to measure the knowledge and skills in

relation to feedback interpretation. The test focuses on the concepts and

representations used in the SPF reports and comprises 26 test items

mapping out both conceptual knowledge (correct conception of the terms

used) and procedural knowledge (skills in reading learning gains and added

value from graphs and tables). Both closed (true/false and multiple

choice) and open (filling in numerical values) questions were used in the test.

Test scores were constructed using IRT analysis. A good overall fit was

achieved using a two-parameter model (LR = 248.4; SE = 320.0; p = .99) and

a good empirical validity of .83 was achieved using 24 retained items. The

scores were standardized to enhance their interpretability.
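
To illustrate what a two-parameter model expresses, the sketch below assumes the common two-parameter logistic (2PL) form, in which the probability of a correct answer depends on the respondent's ability and on an item's discrimination and difficulty. The text does not specify the exact parameterization used, and the item parameters shown here are hypothetical.

```python
import math


def two_pl_probability(theta: float, a: float, b: float) -> float:
    """Probability of a correct answer under the two-parameter logistic model.

    theta: respondent ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination 1.2, difficulty 0.5 (illustrative values).
for theta in (-1.0, 0.5, 2.0):
    p = two_pl_probability(theta, a=1.2, b=0.5)
    print(f"theta = {theta:+.1f} -> P(correct) = {p:.2f}")
```

A respondent whose ability equals the item difficulty has a probability of .50 of answering correctly; more discriminating items separate respondents just below and just above that point more sharply.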

Considering the nature of the theoretical framework and the research

questions, putting forward empirical evidence will require structural

equation modeling. Path analyses were therefore used to analyze whether

theory-based relationship expectations corresponded with the empirical

findings.
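
In a path model, the indirect effect of one variable on another is the product of the standardized coefficients along the connecting path, and the total effect is the sum of the direct and indirect effects. The sketch below illustrates this with hypothetical coefficients. The variable names follow the theoretical framework, but the values are invented for illustration and are not the estimates shown in the figures.

```python
# Hypothetical standardized path coefficients (NOT the estimates from this study).
paths = {
    ("self_efficacy", "phases_in_use"): 0.40,
    ("phases_in_use", "instrumental_use"): 0.50,
    ("instrumental_use", "perceived_effects"): 0.45,
    ("self_efficacy", "perceived_effects"): 0.10,  # direct path
}


def indirect_effect(chain: list[str]) -> float:
    """Multiply the path coefficients along a chain of variables."""
    effect = 1.0
    for source, target in zip(chain, chain[1:]):
        effect *= paths[(source, target)]
    return effect


chain = ["self_efficacy", "phases_in_use", "instrumental_use", "perceived_effects"]
indirect = indirect_effect(chain)
direct = paths[("self_efficacy", "perceived_effects")]
total = direct + indirect

print(f"indirect = {indirect:.3f}, direct = {direct:.3f}, total = {total:.3f}")
```

Full mediation corresponds to a direct coefficient that does not differ from zero, so that the total effect runs entirely through the intermediate variables.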

4. Results

4.1. Descriptive results

In this section, we focus on the sum scores and individual item scores for

the different constructs in the questionnaire. We discuss the variables as

ordered in our theoretical model: influencing factors, feedback use, and

perceived SPF effects. Finally, the results for the data literacy test are

discussed.

The sum score for the scale attitude towards SPF reveals that a large

majority of the respondents state that SPF use is (to some degree) a

valuable activity (M = 3.97, SD = 1.08). The most positive scores (M > 4)

were recorded in relation to the statements that SPF stimulates self-

evaluation, that much can be learned from SPF and that SPF results in

better management and more involvement in school policy. The statement

for which the lowest score was recorded (M = 3.46, SD = 1.22) related to

school feedback being an enjoyable activity for the majority of team

members.

In addition to a positive attitude, most of the respondents had a positive

self-efficacy score relating to the interpretation of and possible uses of the

feedback report (M = 3.81, SD = .74). For example, 80% stated that they

understand the terms, figures and tables in the report and can see

connections between the terms. Only a minority (between 12 and 18%)

disagreed with the statement concerning their ability to clearly grasp the

objectives and possibilities for the use of SPF or describe terms from the

report in their own words.

As regards the phases in feedback use, only a minority of the principals

reported having invested significant time and effort in the interpretation

and diagnosis of the SPF, despite the fact that the majority of respondents

indicated having an interest in the SPF report (M = 4.37, SD = 0.72).

Although 70% of the respondents agreed with the statement that the

report had been examined thoroughly (M = 3.84, SD = 0.97), only 43% of

the respondents stated they had sought explanations for the performance

of their own schools on the basis of this report (M = 3.30, SD = 1.11).

With regard to the types of SPF use, the respondents scored significantly

higher (t(114) = 4.64, p < .001) for items pertaining to conceptual use (M =

3.27, SD = .83), as compared to items relating to instrumental use (M =

2.85, SD = .97). Half of the respondents stated that the SPF had an impact

on their perception of pupils’ performance and on the school in general

(conceptual use), while only 30% of the respondents stated that the report

had resulted in specific action (instrumental use).
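
Because the same respondents rated both types of use, the comparison above is presumably a paired-samples t-test; the text does not say so explicitly. The sketch below shows how such a test statistic is obtained from paired scale scores. The scores are hypothetical and do not reproduce the reported t value.

```python
import math
from statistics import mean, stdev


def paired_t(first: list[float], second: list[float]) -> tuple[float, int]:
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical scale scores for six respondents (illustrative only).
conceptual = [3.5, 3.0, 4.0, 2.5, 3.5, 3.0]
instrumental = [3.0, 2.5, 3.0, 2.5, 3.0, 2.0]

t_value, df = paired_t(conceptual, instrumental)
print(f"t({df}) = {t_value:.2f}")
```

The test works on the within-respondent differences, so it is sensitive to a consistent gap between the two scales even when the scales themselves vary widely across respondents.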

Not surprisingly, this leads to the result that the perceived effect of

SPF use is rather low. Only a limited number of respondents reported any

effects of the SPF (M = 2.92, SD = .90). Between 30 and 40% stated that the

SPF report has contributed to more discussions on how the school

functions, to more attention for professional development, to a better

functioning of the school principal and to skills improvement in SPF use.

Around twenty percent of the respondents indicated that the SPF report

has improved the quality of the teaching in their schools.

The results of the data literacy test reveal that only 42% of the

respondents answered at least half of the questions correctly. Only 10% of the

respondents answered more than three-quarters of the questions correctly.

However, some school principals (n = 5) succeeded in interpreting all the

information from the report correctly. Analysis of the difficulty of the

literacy test items indicates that most principals experience difficulties in

relation to procedural exercises; i.e. reading the learning gains and added

value from the graphs and tables. The conceptual questions were

apparently less difficult.

4.2. Path model 1: Phases in use, types of use and perceived effects of

feedback use

The theoretical framework presented in Figure 1 shows the hypothesized

direct and indirect relations between the variables in our model: the data

literacy competences influence the perceived SPF effects via the phases in

use and types of feedback use. In a first analysis approach, the data of all

respondents were entered in the model without making a distinction

between SPF support conditions.

Figure 2. Results of path-model: Use and perceived effects of SPF use

Figure 3. Results of path model: Impact of support

In order to test the mediation hypotheses, the direct effects of all

independent variables on the endogenous variable have to be studied

(MacKinnon, 2008). The initial model was found to include various

statistically non-significant regression paths and covariances. These were

removed stepwise in order to achieve a parsimonious model. Figure 2

shows the findings of the resulting path model, including standardized path

coefficients and percentages of variance explained (χ²(8) = 8.1, p =

0.43; RMSEA = 0.01; AGFI = 0.92; GFI = 0.98). This path model can be used

to answer the second research question about the extent to which

differences in SPF use (phases and types) and perceived SPF effects can be

explained by SPF competences.

The percentages of variance explained for the variables relating to SPF

use (phases and types) are highly relevant. For example, 39% of the

variance in the variable ‘phases in SPF use’ can be explained by the data

literacy competences of the respondents. The higher the respondents’

estimation of their level of knowledge and skills (self-efficacy) and the more

positive their attitude towards SPF, the more they invest in the use of SPF.

However, the additional effect of the ‘actual’ knowledge and skills is

limited. The theoretical model also hypothesized that the ‘types of SPF use’

can only be explained directly by the ‘phases in SPF use’. This only holds

true in relation to an instrumental use (24% of the variance explained).

For conceptual use, the attitude towards SPF use and

self-efficacy are also relevant. Together with the ‘phases in SPF use’, these two

variables explain 43% of the variance in conceptual use. It can also be

concluded that a positive correlation (.32) exists between the unexplained

variance in the variables instrumental and conceptual use. This possibly

indicates that, after controlling for the other variables in the model, the

levels of instrumental and conceptual use that respondents report increase

together.

The ultimate variable in our model is the perceived effect of SPF use.

The path model explains 66% of the variance in this variable. The model

suggests that the ‘types of SPF use’ play an important role. The more

intensively respondents report conceptual and instrumental use of the SPF,

the higher their perception of the SPF effects. In contrast to our initial

model, there seems to be a direct relationship between the attitude

towards SPF and the perceived effects of SPF. Therefore, we have to

conclude that the influence of certain variables is more direct than the

hypothesized mediation suggested.

4.3. Path model 2: Differential impact of support on data literacy

competences, feedback use and perceived effects

Building on the previous model, a subsequent path analysis was carried out

to test whether the SPF support condition results in significantly higher

scores than the control group as regards data literacy competences,

feedback use and perceived effects. A dummy variable was added to the

model referring to the experimental (1) and control (0) condition. Figure 3

displays the results of the path model with support, showing the standardized

path coefficients and the percentage of variance explained in the

endogenous variables (χ²(13) = 11.3, p = 0.58; RMSEA = 0.01; AGFI =

0.92; GFI = 0.97).

The path model immediately reveals that there is no significant direct

impact of support on the ‘phases in SPF use’, the ‘types of SPF use’ and the

‘perceived effects of SPF use’. This is consistent with the a priori

theoretical framework. Nevertheless, it has to be stressed that the

proportion of variance explained in the competence related variable and

the self-efficacy variable is limited. The results also do not confirm the

expectation that SPF support contributes to a more positive attitude. Yet

support does have a statistically significant effect on the mastery of

knowledge and skills: 11% of the variance in the test scores can be

attributed to whether or not respondents received support. This impact

remains limited as far as the self-efficacy is concerned. Only 2% of the

variance in this variable can be attributed to the experimental condition.
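
The percentages of variance explained can be related to the size of the standardized path coefficients. When an endogenous variable has a single predictor, the proportion of variance explained equals the square of the standardized coefficient. The sketch below applies this to the two percentages reported above. It assumes that the support dummy is the only predictor of each of these two variables in the model, which the text suggests but does not state.

```python
import math


def beta_from_r_squared(r_squared: float) -> float:
    """Absolute standardized coefficient for a variable with one predictor."""
    return math.sqrt(r_squared)


# Proportions of variance attributed to the support condition in the text.
explained = {"data literacy test": 0.11, "academic self-efficacy": 0.02}

for outcome, r_squared in explained.items():
    beta = beta_from_r_squared(r_squared)
    print(f"{outcome}: R^2 = {r_squared:.2f} -> |beta| = {beta:.2f}")
```

Under that assumption, the effect of support on the test scores corresponds to a standardized coefficient of roughly .33, against roughly .14 for self-efficacy.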

5. Conclusion and discussion

In the present study, we focused on the question of how schools use school

performance feedback, and what the perceived effects are of SPF use. At a

general level, the respondents reported a rather low level of perceived

impact of SPF. Nonetheless, the majority of respondents stated that they

had thoroughly read and examined the SPF report. However, fewer

respondents invested considerable time and/or

effort in interpreting the results and seeking explanations for the results of

their own schools. In line with our theoretical framework, differences in the

‘phases in SPF use’ translate into differences in the ‘types of SPF use’. There

is a considerably higher occurrence of conceptual use than instrumental

use. This can be explained by the fact that a conceptual use (control and

plan-oriented) precedes an instrumental use (goal-oriented) in a school

policy cycle (cf. Plan-Do-Check-Act cycle, Deming, 1986). Research has already

revealed that many schools experience difficulties in using the findings of a

control stage in subsequent steps of quality control (Schildkamp, 2007;

Verhaeghe et al., 2010).

The results also demonstrate that differences in SPF use

correspond with differences in SPF competences. This study confirms the

hypotheses related to the second research question. With regard to the

attitude towards SPF, we found that its impact is not only mediated by the

‘phases in SPF use’; a direct relationship also exists with the ‘types of

SPF use’ and ‘perceived SPF effects’. Another relevant finding is that the

‘phases in SPF use’ are related more closely to the perceived mastery of

knowledge and skills (academic self-efficacy) as compared to the actual

mastery as measured with the data literacy test. We learn from this that

faith in one’s own knowledge and skills is very important in making the

transition to action (Bandura, 1977). Obviously, it should also be noted that

the actual mastery of the knowledge and skills is still relevant. School

policies should be developed on the basis of correct information (Devos &

Verhoeven, 2003).

A key research question in this study related to the differential impact of

an SPF support provision. Building on the theoretical framework, we

expected SPF support initiatives to affect the SPF-related competences

(attitudes, data literacy and self-efficacy) directly. No

direct impact was expected on the ‘phases in SPF use’, related ‘types of SPF

use’ and ‘perceived SPF effects’. The results by and large confirm our theoretical

assumptions. Principals who participated in the SPF support condition

attained higher data literacy test scores and reported higher self-efficacy

levels. This consequently affected their process of feedback use (phases in

use). The expected indirect effect of SPF support is in line with Kirkpatrick’s

model (1998), implying that a higher level can only be achieved if lower

levels have been attained. Contrary to our expectations, participation in the

SPF support condition had no significant effect on the attitude towards SPF.

We have to stress that this attitude remains a crucial factor. Future SPF

support could focus to a larger extent on the fundamental basis and

motives to implement SPF and on facilitating successful experiences with

SPF. Furthermore, SPF support initiatives that offer opportunities for

discussion and exchange of experiences, within and between schools,

must be considered (Huffman & Kalnin, 2003; Lachat & Smith, 2005;

Wayman, Midgley, & Stringfield, 2007). Some authors stress that such

discussions and exchanges are crucial to see benefits of SPF in terms of

school improvement (Zupanc, Urank, & Bren, 2009).

Another interesting finding is the larger impact of the SPF support

initiative on data literacy test scores as compared to the impact on

academic self-efficacy. An initial explanation for this fact relates to the

limited scope of the support initiative. This was a single shot activity that

focused on the development of interpretation skills. The SPF support

seemed to have succeeded in the latter. A second explanation can be that

the SPF support has raised awareness among the participants about the

complexity of school feedback. This can explain why the SPF support results

mainly in higher literacy test scores and to a lesser extent in an increased

level of self-efficacy. A third explanation is that the participants have

learned hardly something from the support intervention. It is possible they

report the same level of self-efficacy but attain higher literacy test scores as

a result of an extra effort.

Future research about the impact of SPF support could adopt a longitudinal approach with a more elaborate pre- and post-intervention measurement. This would make it possible to take into account the specific support needs of respondents. These differences in need could also be linked to the selection of SPF training participants. Moreover, the delayed effect of SPF usage on student achievement could be studied. Such effects can only be expected after several SPF reports and persistent efforts towards effective SPF use. In addition, it would be worthwhile to set up research on more intensive support initiatives that go beyond single-shot SPF support provisions. At a theoretical level, a cross-validation of the path model developed in the present study would be valuable. In the present study this was not possible because a sample size of 100 respondents is required (Hoyle, 1995). Furthermore, the low data literacy test scores and their relationship with the ‘phases in SPF use’ call for a methodological comment. The data literacy test score was the only variable in the model not based on perceptions of the respondents. The strong interrelations between perception-based variables in the present study are thought-provoking. They point to the need for research that links the ‘perceived’ to the ‘expected’ and in particular to the ‘actual’ use of SPF. Finally, next to a focus on the competences and perceptions of principals, future research could also turn its attention to other critical actors in the discussion about educational quality: inspection authorities, teachers, school teams, etc.

We finish by repeating that we observed a strong interest in SPF and a positive attitude towards SPF among the Flemish school principals in our study. This is in sharp contrast to their limited usage of the school performance feedback information and the related effects on educational practices and results. The study therefore underlines the need to develop critical and conditional competences related to SPF use. This is interesting from both a theoretical and a practical point of view, since many support initiatives are being set up by feedback providers (e.g. helpdesks, after-school information sessions, information sessions at school, and so on) without evaluating their direct and indirect effects.

References

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral

change. Psychological Review, 84(2), 191-215.







Creswell, J.W. (2008). Educational Research: Planning, conducting, and evaluating quantitative and qualitative research (3rd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.


of Technology, Center for Advanced Engineering Study.

Devos, G., & Verhoeven, J. (2003). School self-evaluation - Conditions and

caveats: The case of secondary schools. Educational Management &

Administration, 31(1), 403-420.



Field, A.P., & Hole, G. (2003). How to design and report experiments.

Thousand Oaks, CA: Sage.








Evaluation.




Hoyle, R.H. (Ed.). (1995). Structural Equation Modeling: Concepts, issues

and applications. Thousand Oaks, CA: Sage.

Huffman, D. & Kalnin, J. (2003). Collaborative inquiry to make data-based

decisions in schools. Teaching and Teacher Education, 19, 569-580.





Education, 112, 496-520.

Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels.

San Francisco: Berrett-Koehler.

Lachat, M.A., & Smith, S. (2005). Practices that support data use in urban

high schools. Journal of Education for Students Placed at Risk, 10(3), 333-

349.



data use at learning point associates. Retrieved from


MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. New

York: Lawrence Erlbaum Associates.

Mathison, S. (1992). An evaluation model for inservice teacher education.

Evaluation and Program Planning, 15, 255-261.

Meuret, D., & Morlaix, S. (2003). Conditions of success of a school's self-

evaluation: Some lessons of an European experience. School




perspective (pp 3-16). Oxford, UK: Elsevier Science.

Petty, R.E., & Wegener, D.T. (1998). Attitude change: Multiple roles for

persuasion variables. In D. Gilbert, S. Fiske & G. Lindzey (Eds.), The

handbook of social psychology (pp. 323-90). New York: McGraw-Hill.


approach. Thousand Oaks: Sage.



15(3), 241-258.




Sussex.



Twente, Enschede, The Netherlands.











Schunk, D.H. (1991). Self-efficacy and academic motivation. Educational

Psychologist, 26(3&4), 207-231.






school improvement]. Pedagogische Studiën, 81, 338-353.

Vanhoof, J., Van Petegem, P., & De Maeyer, S. (2009). Attitude towards

school self-evaluation. Studies in Educational Evaluation, 35, 21-28.







Swets & Zeitlinger.





performance feedback. Lisse, The Netherlands: Swets & Zeitlinger

Wayman, J.C., Midgley, S., & Stringfield, S. (2007). Leadership for data-based decision making: Collaborative educator teams. In A. B. Danzig, K. M. Borman, B. A. Jones & W. F. Wright (Eds.), Learner-centered leadership: Research, policy and practice (pp. 189-205). New Jersey, USA: Lawrence Erlbaum Associates.

Webber, S., & Johnston, B. (2000). Conceptions of information literacy: New

perspectives and implications. Journal of Information Science, 26(6),

381-397.





Research, 49(2), 185-206.


CHAPTER 6

EFFECTS OF SUPPORT FOR SCHOOL FEEDBACK USE∗

Abstract

Effects of support for school performance feedback use

School development through systematic data use requires schools to be provided with information-rich environments. However, providing school performance feedback does not guarantee that it is used successfully. The limited data literacy competences of users are one of the main stumbling blocks. Support initiatives were developed and evaluated to overcome this shortcoming. In a randomized field study, the effects of two experimental conditions, related to inservice and onservice education and training (INSET and ONSET), are compared against a control group. This study examines the relationship between data literacy competences, support provisions for data interpretation, actual usage of the feedback, and school improvement effects. The research was based on in-depth interviews with 18 primary school principals. The results of a case-ordered predictor-outcome meta-matrix reveal not only difficulties in handling the information but also incongruences in attitude towards feedback use between school principals and teachers. The ONSET condition led to the best results, which argues for a tailored support approach.

∗ Based on Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten van ondersteuning bij schoolfeedbackgebruik. Manuscript submitted for publication in Pedagogische Studiën.


Summary

Given their societal role, schools are expected to approach school development systematically. They are therefore encouraged, among other things, to base their internal quality assurance policy on concrete data. School feedback initiatives are one possible source of such data. Using this school feedback does not, however, appear to be self-evident, partly because of a lack of data literacy competences. To meet this need, various support activities are set up, organised either within the school (ONSET) or outside it (INSET). This study reports the results of an evaluation study that involved, alongside an INSET and an ONSET support design, feedback users in a control condition. Particular attention is paid to influencing data literacy competences and to evaluating effects at four levels. Research data were collected through in-depth interviews with 18 school principals from the three conditions and were processed in a case-ordered predictor-outcome meta-matrix. The results show not only a lack of knowledge and skills for dealing with school feedback, but also divided attitudes between school principals and teachers. Furthermore, the ONSET condition appears to lead to the best results, which implies that support for feedback use is best tailored to the school.

1. Problem statement

Schools are increasingly expected to turn school development into a systematic process (Nevo, 2002; Leithwood & Aiken, 1995). To assist them in this, efforts are made to create information-rich environments. For instance, schools are provided with feedback on their functioning and performance by school feedback systems set up specifically for that purpose. This is done in the expectation that schools will use this feedback in the context of self-evaluation (Visscher & Coe, 2002; 2003).

Using such information sources as a policy instrument does not, however, appear to be self-evident. Use and school improvement effects generally remain limited (Coe, 2002; Saunders & Rudd, 1999; Tymms, 1995; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2004; Zupanc, Urank, & Bren, 2009). Receiving school feedback appears to be a necessary but not a sufficient step. Both the schools and the feedback systems have to meet certain conditions (Visscher & Coe, 2003; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). One of the main obstacles to effective data use is the lack of data literacy among users (Earl & Fullan, 2003). It is therefore not surprising that research finds school principals and teachers reporting a need for additional support in both interpreting and further using the data (Schildkamp & Teddlie, 2008; Schildkamp et al., 2009; Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc et al., 2009).

2. Conceptual framework

2.1. Phases in and types of school feedback use

School feedback use can be described in two ways. On the one hand, it can refer to the successive steps feedback users take to work with the data. Research shows that making use of school feedback calls for deliberately going through a cyclical process (Huffman & Kalnin, 2003; Learning Point Associates, 2004; Verhaeghe et al., 2010). A distinction is made between receiving, reading and discussing the school feedback in order to arrive at a correct interpretation. Once the school has made a strengths and weaknesses analysis of its results, a phase follows in which the school feedback is put to work. This phase comprises diagnosing, by looking for explanations for the results, and planning, implementing and evaluating actions. Because of a lack of data literacy and of time invested, schools appear to go through these steps with difficulty or not at all (Earl & Fullan, 2003; Verhaeghe et al., 2010).

On the other hand, data use within schools can be described in terms of different types of use. Based on the classification of Rossi, Lipsey and Freeman (2004), a distinction can be made between different kinds of use of evaluation data, which has also been applied in the context of school feedback use (Schildkamp et al., 2009; Verhaeghe et al., 2010; Weiss, 1998). Schools can, for example, take action (instrumental use), start reflecting (conceptual use), seek confirmation of existing views (symbolic use), use the report in an accountability context (strategic use), or use the report to stimulate or motivate team members (motivational use).

2.2. Effects of school feedback use

The ultimate goal of school feedback use is to contribute to school development (Visscher & Coe, 2002, 2003). However, school feedback use does not always appear to result in significantly improved student achievement (Fitz-Gibbon & Tymms, 2002; Schildkamp et al., 2009; Visscher, 2002). When examining school improvement effects, one should therefore look more broadly at, among other things, effects on the professional development of team members (such as a growing degree of assessment literacy; Zupanc et al., 2009), improved educational processes (such as intensified student guidance; Schildkamp & Teddlie, 2008) and improved school functioning (such as strengthened cohesion within the school; Visscher & Coe, 2003). Unintended and undesirable effects can also occur, such as demotivation among teachers due to work overload (Fitz-Gibbon & Tymms, 2002) or too strong a focus on tested learning content (Schildkamp & Teddlie, 2008; Visscher, 2002).

2.3. Influencing factors

Differences in school feedback use and its effects can be attributed to four clusters of factors, which refer to characteristics of the users, the feedback systems, the support and the educational context (Verhaeghe et al., 2010; Visscher & Coe, 2003). Given users' deficient data literacy competences and the urgent call for research into support in this area, this study focuses on these two factors.

Competences in school feedback use

The concept of competence refers to the integration of the knowledge, skills and attitudes needed to act adequately in specific situations (Gonczi, 1994). The research literature shows that the degree of information literacy (Webber & Johnston, 2000) plays a major role in school feedback use. This general term covers the strategies, skills and knowledge needed to determine information needs and to collect and process the necessary information (Williams & Coles, 2007, p. 188). Applied to the domain of data use within the school, this is referred to as data literacy. Being data literate is a necessary condition for turning data into usable information (Earl & Fullan, 2003). However, limited knowledge of how to work with the data, and the uncertainty that goes with it, often form an obstacle (Earl & Fullan, 2003; Kerr et al., 2006; Saunders, 2000; Verhaeghe et al., 2010). Not only is the capacity to interpret the data reported to be lacking; research skills such as formulating research questions and hypotheses are generally not well developed either (Earl & Fullan, 2003; Herman & Gribbons, 2001; Kerr et al., 2006).

The concept of data literacy competences also calls for attention to the attitude towards school feedback. Bosker, Branderhorst and Visscher (2007) put forward a negative attitude towards school feedback as one of the main obstacles to the use of feedback information. This concerns, for example, users' belief that they need data to improve their teaching (Schildkamp & Kuiper, 2010). Users' attitude towards data use thus largely determines how willing they are to invest time and effort in using the information (Williams & Coles, 2007).

Support for schools in school feedback use

Given the deficient data literacy competences, school feedback users ask for support to be made available for both data interpretation and further use of the data (e.g. Schildkamp & Teddlie, 2008; Verhaeghe et al., 2010; Visscher & Coe, 2003). This support can be offered both by external supporters, for example educational services and feedback providers, and by school team members within the school.

To classify external support initiatives, Gardner's (1995) continuum for in-service training initiatives can be used. At its extremes are initiatives that take place outside the school (Inservice Education and Training, INSET) and inside the school (Onservice Education and Training, ONSET). An advantage of INSET meetings, where participants from different schools are brought together outside their own school, is that they can learn from one another formally and informally through social interaction (Mathison, 1992). Because usually only one delegate per school takes part, however, a more limited impact can be expected than with ONSET initiatives, in which several members of the school team can be involved. Still, there is confidence that school principals can act as catalysts and pass on the insights gained to the school team (Kerr et al., 2006). Several studies accordingly show that the most successful leaders in data use do take the lead but then share the tasks of data use through distributed leadership (Wayman, Midgley, & Stringfield, 2007). ONSET initiatives are said to be more cost-effective than inservice training because the training takes place within the school, with the school's own data and problems as the starting point. Consequently, changes are more likely to be accepted, owing to the stronger involvement and link with practice (Gardner, 1995; Murnane, Sharkey, & Boudett, 2005). When, in addition, several school team members are present, this can prompt more internal consultation and further follow-up. In this way, feedback use can evolve from an individual matter into a shared responsibility, whether or not in the form of collaborative data teams (Huffman & Kalnin, 2003; Lachat & Smith, 2005; Wayman et al., 2007). The role of the school principal is also of great importance in this evolution, among other things through creating a clear vision and expectations regarding data use (Young, 2006) and coaching the data teams (Lachat & Smith, 2005).

2.4. Evaluation model for support initiatives: research questions

To inventory the possible effects of support for school feedback use and to integrate them into the broader conceptual framework, we draw on Kirkpatrick's (1998) four successive evaluation levels for professional development activities.

First, the participants' reactions are measured immediately after the support. This concerns a general impression as well as relevance and applicability. All too often the evaluation of support remains limited to this level, while the impact on the organisation is not examined (Mathison, 1992; Rossi et al., 2004). Next, the impact on participants' learning is considered, that is, the increase in knowledge and abilities and the change in attitudes resulting from the support. Third, it is examined whether what was learned during the support is transferred to the organisation and whether behavioural changes take place. Finally, possible school improvement effects are examined at the results level. Here the focus is on effects of the support on achieving the organisation's goals and on the organisation itself.

This model can be applied to evaluate the impact of support initiatives for school feedback use. Table 1 explains this in more detail. The central research question is to what extent differences in school feedback use can be explained by support initiatives for school feedback use.

Table 1

Influence of support on school feedback use according to Kirkpatrick's model

Kirkpatrick's evaluation levels | Application to school feedback use
Reaction: satisfaction of the participants | Satisfaction of participants with support for school feedback use
Learning: increase and/or change in knowledge, skills and attitudes | Change in data literacy competences: knowledge, skills and attitudes needed for successful school feedback use
Behaviour: transfer of insights learned to the organisation | Influence on school feedback use: phases in use; types of use
Results: effects on the organisation | Influence on school improvement effects through school feedback use

In this contribution we try to answer the question of the impact of a support initiative for school feedback use by means of Kirkpatrick's model. The following research questions are addressed:

1. What impact do INSET and ONSET support for school feedback use have on the satisfaction of school feedback users (Reaction)?

2. What impact do INSET and ONSET support for school feedback use have on the data literacy competences of school feedback users (Learning)?
As described earlier, we look here at the possible influence of support on the knowledge, skills and attitudes that users need for successful school feedback use.

3. What impact do INSET and ONSET support for school feedback use have on the use of this feedback within the school (Behaviour)?
Kirkpatrick's model implies that a higher impact level can only be realised if a lower level has been realised. If support is aimed at influencing data literacy competences, there will first be an impact on the participants' knowledge, skills and attitudes. These changed competences will then contribute to successful school feedback use, which in this study is defined in terms of both the steps undertaken and the types of feedback use.

4. What impact do INSET and ONSET support for school feedback use have on the school improvement effects of feedback use (Results)?
We expect to be able to speak of school improvement effects only if they are preceded by successful feedback use.
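The hierarchical assumption behind research questions 3 and 4 can be made explicit with a small Python sketch. It is only an illustration under stated assumptions: the level names follow Kirkpatrick (1998) as used above, while the function `highest_level_reached` and the example observation are hypothetical and not part of the study.

```python
# Kirkpatrick's four evaluation levels, from lowest to highest.
LEVELS = ["Reaction", "Learning", "Behaviour", "Results"]


def highest_level_reached(observed):
    """Return the highest level reached when each level requires all lower ones.

    `observed` maps a level name to True/False (hypothetical coding of whether
    an effect was observed at that level). Returns None if even the lowest
    level was not reached.
    """
    reached = None
    for level in LEVELS:
        if not observed.get(level, False):
            break  # a gap at this level blocks every higher level
        reached = level
    return reached


# Hypothetical case: an effect is reported at the Results level, but because
# no Behaviour effect was observed, the chain stops at Learning.
example = {"Reaction": True, "Learning": True, "Behaviour": False, "Results": True}
print(highest_level_reached(example))
```

Under this reading, an apparent school improvement effect without preceding feedback use would not be attributed to the support.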

3. Method

To answer the research questions, a field experiment with a posttest was chosen. The research population (N = 195 schools) was randomly assigned to the different conditions.

3.1. Research conditions

Starting from the continuum of inservice and onservice training (Gardner, 1995), we chose to design and test two support variants, set against a control group that received no additional support (n = 150). We call the first experimental condition the INSET condition because the training did not take place at the participants' workplace and the learning content was based on a fictitious school example instead of the school's own results. In addition, we distinguish an ONSET condition, since both the location of the training and the learning content offered were close to the school's own context. The characteristics of both conditions are explained in Table 2.

The learning objectives for the two experimental support conditions were identical. After the support, participants were expected to be able to (1) describe in their own words the central concepts of the school feedback report; (2) correctly interpret the figures and tables in the school feedback report; (3) give explanations for why performance may be worse or better than that of the reference group; and (4) describe which function(s) the school feedback report can fulfil in their own school context. These learning objectives were mainly aimed at Kirkpatrick's (1998) second level, which targets knowledge, skills and attitudes. In addition, the support also tried to influence school feedback use indirectly (Behaviour) by familiarising feedback users with the different steps of systematic feedback use.


Table 2

Description of the INSET and ONSET conditions

Characteristic | INSET | ONSET
Supporters | Two staff members of the School Feedback Project | One of the two School Feedback Project staff members from the INSET support
Design | Study morning | School visit
Target group | The person at the school most involved in using the school feedback report (choice left to the school) | Preferably the school principal, the care coordinator and two teachers (final choice left to the school)
Participants | 23 participants from 23 schools (10 in session 1 and 13 in session 2); 20 school principals and 3 care coordinators | 13 participants from 7 schools; 6 x school principal with care coordinator; 1 x school principal, care coordinator and teacher
Timing | More than a month after receiving the feedback report | Same as INSET
Location | University building | Own school
Content | Based on a feedback report for a fictitious school; explanation of the concepts and forms of representation used in the feedback report; instructional discussion on the possible uses of the feedback; explanation of the underlying school feedback system; practice and evaluation moment | Same as INSET, but in addition a link was always made back to the school's own feedback report.
Working method | A variety of didactic methods, from instruction-oriented presentations to question-and-answer sessions and group discussions. | Same as INSET, but only with members of the school's own team


3.2. Selection of interview respondents

This study is part of the School Feedback Project entitled “Each school its own mirror” (Verhaeghe & Van Damme, 2006). Within this project, 195 Flemish schools received feedback on a confidential basis in spring 2008, in which their school results were compared with a representative reference group from the SiBO study (Maes, Van Petegem, & Van Damme, 2005). The feedback concerned data on a cohort of pupils who had (up to then) been followed from the end of kindergarten through the fourth grade for mathematics and language (spelling, technical reading and reading comprehension), supplemented with information on pupils' intake characteristics. The central concepts in the feedback report (learning gain, value added and adjusted scores) were explained in such a way that the need for prior statistical knowledge was kept to a minimum. The feedback data were supported by graphical representations (pie chart, growth curves and cross tables). The text was standardised. Consequently, school team members were expected to interpret their school's own data themselves on the basis of the general explanation.

From this group of schools, 45 primary schools were randomly drawn and invited to take part in the support. Of these, 23 took part in the INSET condition and 7 in the ONSET condition (see Figure 1).

Figure 1. Overview of the sampling. [Flow diagram: 195 primary schools from the School Feedback Project; 45 invited to support, of which 23 INSET and 7 ONSET; 150 not invited to support (control); 6 interviews in each of the three conditions.]

Although the subjects were randomly assigned to the different conditions, there is a risk of selection bias. Because this could threaten the internal validity of the experiment, previously collected data were used to examine whether this subgroup deviated from the population (N = 195) on relevant criteria. These analyses showed that the selected schools did not differ statistically significantly in terms of attitude towards school feedback, expected use of the school feedback, perceived relevance of the school feedback, pupils' intake characteristics and school performance as reported in the feedback reports. Subsequently, six respondents from each condition were randomly selected to take part in the interviews.
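The final random draw of six respondents per condition can be illustrated with a short Python sketch. The group sizes (23, 7 and 150) come from the text; the school identifiers, the seed and the function name `draw_interviewees` are assumptions made purely for illustration and do not describe the procedure actually used in the study.

```python
import random


def draw_interviewees(schools_by_condition, k=6, seed=2008):
    """Randomly draw k schools from each condition, without replacement."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return {
        condition: sorted(rng.sample(schools, k))
        for condition, schools in schools_by_condition.items()
    }


# Hypothetical school identifiers; only the group sizes follow the text.
schools_by_condition = {
    "INSET": [f"inset_{i:03d}" for i in range(23)],
    "ONSET": [f"onset_{i:03d}" for i in range(7)],
    "control": [f"control_{i:03d}" for i in range(150)],
}

interviewees = draw_interviewees(schools_by_condition)
for condition, selected in interviewees.items():
    print(condition, len(selected), selected)
```

Drawing the same number of cases from groups of very different size keeps the three conditions equally represented in the interviews, at the cost of interviewing almost every ONSET school and only a small fraction of the control schools.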

3.3. Research instrument and procedure

Data were collected by means of semi-structured in-depth interviews. To this end, the school principals were visited at their school, half a year after the support interventions, by one of the two researchers who had delivered the support interventions.

The interview questions were drawn up according to the conceptual framework discussed earlier, fitting within the four evaluation levels. No questions were asked that probed directly for the influence of support on feedback use, in order to avoid response bias through socially desirable answers. Probing was allowed to obtain further clarification or explanation (Lindlof & Taylor, 2002). The interview instrument consisted of some forty questions for well over an hour of interview time. Some examples of interview questions are:

• Reaction:
- Satisfaction: Are you satisfied with the support from inside and outside the school together that you received in connection with the use of the school reports?
• Learning:
- Knowledge and skills: Do you feel sufficiently familiar with interpreting such feedback data? What knowledge and skills do you think one needs to be able to interpret this report correctly?
- Attitude: How do you currently feel about the use of school feedback? Is it worth the investment?
• Transfer
- Phases in use: We would like to know what route the school report has travelled since it arrived here at the school. Could you briefly indicate which steps have been taken?
- Types of use: Has the school feedback report led to concrete action points or decisions? Has the school feedback report made you look at your school differently?
• Results
- Effects: How would you yourself describe the effects of using this school feedback report? Do you see undesirable side effects of using this school feedback report?
• Support
- Support received: Did you call on others within the school to interpret the results? Did you call on external people when interpreting, diagnosing or using the school feedback report?

3.4. Analysis

Interviews were recorded and subsequently transcribed. They were then coded independently by two researchers using the qualitative analysis tool ATLAS.ti. Codes were assigned following the middle-order approach, which allows initially broad categories to be refined later (Dey, 1993). Coding was mainly deductive, following the codes of a codebook based on the theoretical framework. First, fragments were placed under broad categories. When no predefined codes could be assigned to relevant passages, these were placed under a broad category to be assigned later to new codes that were generated inductively from the data (Strauss & Corbin, 2007).

After the individual interviews had been coded, the data were analysed using a case-ordered predictor-outcome meta-matrix (Miles & Huberman, 1994). In this analysis the respondents are grouped according to the research condition to which they belong. The aim of this design is not only to describe the cases separately but also to carry out a cross-case or variable-oriented analysis. This approach moves towards an explanatory analysis of the results. The hypothesis is that the INSET and ONSET conditions will lead to a higher degree of feedback use than the control condition, resulting in more school improvement effects. To discuss the strength of the characteristics present, we use gradation codes (absent, weakly present, strongly present, no information). For each variable, strict criteria were drawn up to determine the extent to which the characteristic was present. These gradations make it possible to indicate the strength of the variables without quantifying the data to any great extent.

To construct this meta-matrix (see Figure 2), the following steps were taken:

1. Anonymising the transcripts to allow blind coding

2. Full coding of the transcripts according to the predefined codebook

3. Summarising each case following the structure of the codebook

4. Assigning gradation codes to every variable, per respondent

5. Placing the cases in the meta-matrix, with their gradation codes

6. Re-identifying the cases and ordering them by experimental condition and then by gradation codes
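
The ordering in the final step can be sketched in code. The Python fragment below is only an illustration under stated assumptions: the `Case` class, the numeric ranks given to the gradation codes, the ascending sort direction, and the example respondents are all hypothetical and are not taken from the study's data or from ATLAS.ti.

```python
from dataclasses import dataclass, field

# Assumed ordinal ranks for the gradation codes named in the text.
# Ranking "no information" lowest is a convention of this sketch only.
GRADE_RANK = {
    "no information": -1,
    "absent": 0,
    "weakly present": 1,
    "strongly present": 2,
}

# Conditions in the order used in the text: control, INSET, ONSET.
CONDITION_RANK = {"C": 0, "I": 1, "O": 2}


@dataclass
class Case:
    respondent: int                              # hypothetical respondent number
    condition: str                               # "C", "I" or "O"
    grades: dict = field(default_factory=dict)   # variable name -> gradation code


def order_cases(cases, variable):
    """Order cases by condition first, then by ascending gradation on one variable."""
    def sort_key(case):
        grade = case.grades.get(variable, "no information")
        return (CONDITION_RANK[case.condition], GRADE_RANK[grade])
    return sorted(cases, key=sort_key)


# Hypothetical example data, not the study's results.
cases = [
    Case(101, "O", {"feedback use": "strongly present"}),
    Case(102, "C", {"feedback use": "weakly present"}),
    Case(103, "I", {"feedback use": "absent"}),
    Case(104, "C", {"feedback use": "absent"}),
    Case(105, "O", {"feedback use": "weakly present"}),
]

ordered = order_cases(cases, "feedback use")
print([(c.respondent, c.condition) for c in ordered])
```

Sorting on a tuple keeps the condition as the primary key, so gradations only reorder cases within a condition, which mirrors the "first by condition, then by gradation" wording of the step.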

The interviews were coded independently by two researchers. With a view to inter-rater reliability, the codebook and the gradation rules were drawn up jointly. One interview was then coded independently by both researchers, and inter-rater reliability, calculated as the ratio of the number of agreements to the total number of codes assigned, was examined and raised to .87 (Kurasaki, 2000; Miles & Huberman, 1994).
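
The agreement ratio described here can be illustrated with a small calculation. The sketch below assumes that each coder's work is represented as a set of (segment, code) pairs and that the denominator is every distinct coding decision made by either coder, i.e. agreements plus disagreements; the text does not spell out the exact denominator, and the function name and example codes are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Share of coding decisions on which two coders agree.

    Each argument is a set of (segment_id, code) pairs assigned by one coder.
    """
    agreements = codes_a & codes_b      # same code on the same segment
    all_assigned = codes_a | codes_b    # every distinct coding decision
    if not all_assigned:
        raise ValueError("no codes were assigned")
    return len(agreements) / len(all_assigned)


# Hypothetical codings of four segments, not data from the study.
coder_1 = {(1, "interpretation"), (2, "diagnosis"), (3, "planning"), (4, "attitude")}
coder_2 = {(1, "interpretation"), (2, "diagnosis"), (3, "planning"), (4, "knowledge")}

agreement = percent_agreement(coder_1, coder_2)
print(round(agreement, 2))
```

This is a simple proportion of agreement; unlike a statistic such as Cohen's kappa, it does not correct for agreement that would occur by chance.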

4. Results

In the case-ordered predictor-outcome meta-matrix (Figure 2), respondents are ordered by condition and by gradation in feedback use. The darker the cell colour, the more strongly the respondent reported the variable in question. In the following paragraphs we discuss each variable, both in general and per condition. In each case we refer to the number of respondents per condition who made a particular statement (C = control group, I = INSET group, O = ONSET group).
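
For readers who want to tabulate these counts, the notation can be parsed mechanically. The following Python sketch is an assumption-laden illustration: the function name `parse_tally` and the regular expression are hypothetical helpers, and they only handle the "number followed by C, I, or O" pattern described above.

```python
import re

# A count followed by one of the three condition letters, e.g. "2C" or "5I".
TALLY_PATTERN = re.compile(r"(\d+)\s*([CIO])\b")


def parse_tally(text):
    """Return the number of respondents per condition mentioned in a tally string."""
    counts = {"C": 0, "I": 0, "O": 0}
    for number, condition in TALLY_PATTERN.findall(text):
        counts[condition] += int(number)
    return counts


print(parse_tally("(2C, 5I, 2O)"))
```

Conditions that are not mentioned in a tally are reported as zero, so tallies such as "(1C, 1I)" can be compared directly with tallies that name all three groups.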

4.1. Reaction

To gauge satisfaction, respondents were asked to assess the entire package of support received, including the INSET or ONSET support and any additional internal and external support. In general, satisfaction with the support received increases as the intensity of the support increases. Satisfaction is thus greater in the ONSET condition than in the other groups. Some respondents from the control condition could not make any statements about satisfaction because (virtually) no support had taken place at the school (3C; / in Figure 2). To clarify what the satisfaction statements are based on, we briefly describe the support received in the schools.


[Figure 2: meta-matrix. Columns: respondents grouped into control group (C), INSET group (I), and ONSET group (O), printed in the order 13, 7, 4, 15, 2, 9, 8, 17, 14, 1, 16, 12, 6, 10, 3, 5, 18, 11. Rows: Reaction (satisfaction); Learning (knowledge and skills, attitudes); Behaviour, phases in use (receiving, reading and discussing, interpreting, diagnosis, planning actions, carrying out actions, evaluating actions); Behaviour, types of use (instrumental, conceptual, symbolic, strategic, motivational); Results (effects). Legend: cell shading for absent, weakly present, strongly present; / = not applicable or no information.]

Figure 2. Impact of support on school feedback use: results in a case-ordered predictor-outcome meta-matrix.


Use of internal support

Not all respondents called on the expertise of other team members in their school. Only two school leaders in the control group reported having received some form of internal support, whereas all ONSET respondents did. In almost all cases the support was provided by the care coordinator (zorgcoördinator) (2C, 6O) or care teacher (zorgleerkracht) (1C, 1I), sometimes supplemented by members of a core team (1I, 1O). In one case the support was offered by the policy support officer (beleidsondersteuner) (1I).

I know this is useful, but you have to understand that when you get a report like this, other things are coming in as well. As school management you need to see: “How does this fit together? Like so! Short and to the point.” And if you then want to go deeper, you can hand it to your care coordinator or to an assistant or policy support officer, and they can dig into it in more detail. (Respondent 18)

Where there was no internal support, this was mostly due to lack of time (1C, 1I) or to the care coordinator being unavailable owing to circumstances (2C). Strikingly, teachers were not mentioned as a source of support.

Use of external support

By definition, external support was present only for the schools in the experimental conditions. Additional forms of external support were generally not sought. Only one respondent mentioned having had an exploratory conversation with a pedagogical advisor (1I). Some reasons were given for the limited recourse to pedagogical advisors: they were said to lack the expertise and resources to guide schools with feedback reports of this kind (1C, 1O), or to pay insufficient attention to the school's own pedagogical identity (1C, 1O).

The respondents in the control group who did make appreciable use of the reports had mostly sought additional support (3C), consisting of a general study day on the research project, an informal consultation within a partnership of method schools (methodescholen), or a consultation with the pedagogical advisor and the school council.


4.2. Learning

Respondents were asked to describe the knowledge, skills, and attitudes within the school. The focus was on the shared potential within the school rather than on the respondent's individual characteristics. We cannot say unequivocally that the support initiatives in this study led to strongly improved data literacy competences. Although the ONSET group appears to come out best, the INSET group does not appear to differ from the control group in the competences needed to use their school feedback report.

Knowledge and skills

The greatest shortcomings lie in the knowledge and skills needed to interpret the feedback data (2C, 5I, 2O), even with the explanation provided in the report.

The first time I really had to go it alone, I was unsure and it was certainly not clear. (Respondent 6)

Other shortcomings in knowledge and skills concern conveying the information to the school team (1C, 1O) or diagnosing and planning actions (1C, 1I, 1O).

But that is the problem everywhere, for example also when you raise something with a CLB [Centrum voor Leerlingenbegeleiding, the pupil guidance centre]. They run tests and they establish this and that. And how do we go on from there? Often we get no further. That is often where it stops. (Respondent 7)

Several explanations for these limited data competences emerged during the interviews. First, respondents indicate that existing prior knowledge is fairly limited and extends no further than simple statistics on class tests (4I, 2O). This lack of prior knowledge is also attributed to training that pays insufficient attention to learning to use data (1C, 1I).

We are confronted with this more and more every day. But personally I also consider it a serious shortcoming of primary teacher training that people are not familiar with it. If you put some of those terms in front of my colleagues, they are floored. (Respondent 16)


Schools that experience no difficulties mostly owe this to extensive prior knowledge from earlier training or previous work experience (2C, 3O). Another explanation for the lack of data literacy in some schools is a change of school leader in which the necessary knowledge is not passed on to the successor (1C, 1I). Finally, schools lack the time to build up data literacy competences and to interpret data. As a result, no experience is gained and no further knowledge about these school feedback reports is built up. Some school leaders indicate that they would probably be able to interpret the report correctly if they could and/or would make enough time for it (2C, 1I, 1O).

I admit that I don't really want to put time into that. I have other things that also need to be done, and I find that this takes too much time in proportion. (Respondent 18)

Attitude towards school feedback

The positive attitude among school leaders and care coordinators is mainly due to the growing interest in objective measurement instruments that map learning gains and allow comparison with a reference group (4C, 2I, 5O). According to the respondents, the attitude among teachers is considerably more negative (3C, 3I, 3O). This negative attitude is attributed, among other things, to the heavy workload of data collection (1I, 2O), concern about coming out of the results badly (1C, 2O), and the perception of external evaluations as threatening (1I). Teachers are also said to prefer feedback at pupil level (1C, 1I).

Every teacher has marks, has a mark book, an Excel workbook, you name it. Those are all individual results of the children. It is about the children, they say, and the children are important. But for me the school is important. Looking beyond the individual children to the performance of a school, or of a group within the school, does not come naturally to us. (Respondent 1)

Some respondents put the usefulness of the feedback reports into perspective by pointing to their limitations. They refer to the limited evidential value of feedback that follows only a single cohort of pupils (1C, 1I, 1O), a cohort that is moreover sometimes quite unstable owing to pupil mobility (1I). The overlap in content with other available data sources (3I) also calls the added value of these school feedback reports into question. In addition, the motives for taking part in the School Feedback Project reveal something about respondents' attitudes. A few, for example, take part in this study only because the commitment was once made, whether or not by a previous school leader (2C, 2I, 1O).

I find it a pity myself. My predecessor started this, for whatever reason. Unfortunately I can no longer ask him why. (…) Had I been involved right from the start, I too would have chosen to carry it together with the team. Then things would look rather different. (Respondent 8)

4.3. Behaviour

Figure 2 shows that the strength of data competences is positively associated with the strength of use. When we look for differences in feedback use between the condition groups, they appear mainly in the intensity with which the feedback information is read and discussed, interpreted, and diagnosed, with the ONSET group ahead of the INSET group and the INSET group ahead of the control group. Notably, the schools that involve team members most extensively in these processes all come from the experimental conditions.

Phases in school feedback use

Properly receiving the school feedback seems self-evident but turns out not always to be so. In two schools there was no question of any further plans, because the feedback never made it out of the school leader's mailbox (1C, 1I). Consequently, the phases of reading, interpreting, and diagnosing can be mapped only for the other schools. Only a few school leaders choose to involve all teachers actively in these phases (2O). In the other cases, teachers are merely informed of the results at a staff meeting (3I, 4O), in individual discussions (1C, 1O), and/or by making the reports available for optional consultation (2C, 2I, 1O). In some cases the results are first dealt with separately in a core team before being communicated at a staff meeting (1I, 2O). Overall, the dissemination of information among this group of respondents remains very limited, in terms of both the number of people involved and the nature of the information.


I receive it, I look at it, I present it to the teachers and I present it at the staff meeting. That is usually where it ends; it doesn't go much further. (Respondent 16)

Those who choose not to disseminate the results actively throughout the team have various reasons for doing so. Some have not yet done anything with the results (2C, 1I) or never disseminate information of this kind (1C). Others believe that the results do not give a valid picture because of teacher turnover (1C) or pupil mobility (1I), or feel too unsure about the interpretation and possible uses (2C, 1I).

To be honest, it is very difficult for me to judge this correctly. I don't tell my teachers that either, because they might then think I am giving wrong information, whereas I think it is very important, but at the moment I cannot judge it correctly. (Respondent 16)

The results illustrate that only a minority of school leaders get as far as planning actions (2C, 1I, 4O). In each condition, only one school appears to have gone on to implement actions. Unsurprisingly, no school has yet got as far as evaluating the actions carried out.

Types of school feedback use

The negative picture above requires qualification. Feedback data can influence the functioning of the school without immediately resulting in concrete actions. This is also apparent from the results, since there is more conceptual, symbolic, and strategic use than instrumental use. Two thirds of the respondents report conceptual use (3C, 4I, 5O). Some valuable illustrations are looking more closely at results (1C), being more vigilant when results are weaker (1C), revising one's judgement of individual teachers in response to good results (1C), learning to think in terms of learning progress rather than separate school years (1C, 1I, 2O), broadening one's view through comparison with a reference group (1I, 1O), and taking a more nuanced view thanks to adjusted scores (2I, 1O).

Some schools have moved on to actions and report instrumental use (1C, 1I, 1O), such as the decision to work on the children's handwriting motor skills, to introduce levelled reading (niveaulezen) and parent reading volunteers (leesmoeders), and to change the approach to reading comprehension.

In addition, school leaders turn out to use the results for strategic purposes in relation to the school inspectorate (3C, 4I, 2O). Sometimes this is done in a way that emphasises accountability rather than school improvement (3I).

The inspectorate is mad about the output file, and I have a whole folder with all kinds of data in it, and this is part of that. At a certain point those men got to see it. (…) They always ask you to hand over all the material you have, and that folder was part of it. (…) That folder contains all kinds of data I have about the children, and that is a hobby horse of theirs, and this fits in perfectly. I scored well with it, despite the fact that I didn't understand it. (Respondent 16)

Almost all schools that have not yet had a visit from the school inspectorate (/ in Figure 2) indicate that they would present the results during an inspection (2C, 2I, 2O). In one school the reports were used as a form of publicity to attract pupils (1O). Not every school leader is open to this, however (2C).

Perhaps there are schools that would want to use it if they all stood out so far above the curve, but I don't think it is really a good way, bombarding parents or outsiders with those graphs and those figures. (Respondent 2)

Many school leaders use school feedback reports in a symbolic way. Existing arguments are then, for example, reinforced by the results (1C, 4I, 4O). One school leader tried to convince his team members that the fact that children are non-native speakers does not mean they cannot achieve high scores (1I). One specific example concerns convincing not teachers but parents: the school is convinced that pupils clearly make learning gains and should therefore be encouraged to choose a course of study in secondary education that matches their abilities (1O). Yet another school leader mentions that the results were used only because they were in line with the school's earlier findings (1O). Symbolic use can also mean that results are deliberately not discussed in the team because this would not be constructive for the functioning of the school at that moment (1C).

Respondents from the two experimental groups appear to use the results more in a symbolic way. The same holds for motivational use of the school feedback. Teachers receive, for example, a pat on the back and confirmation of their good work (1C, 3I, 4O) and/or a signal to do something further about the weaker results (1I, 2O).


We always had the idea that if we look at ‘the raw materials’ that come in and the quality of ‘the raw materials’, and see what we turn out, then we have to say: “Look, we have done good work after all”. But that was always based on a feeling. And now at last we have something to hold on to, because it is confirmed by research. (Respondent 11)

The feedback also appears able to give school team members self-confidence by confirming that the school is doing well (1C, 1O). In some schools where weaker results were also obtained, only the positive results were emphasised, precisely in order to foster a positive attitude towards school feedback in the team (1I) or for fear of wrongly pointing the finger at teachers (1I).

4.4. Results

The ultimate aim of support for feedback use is to contribute to school improvement effects. Six months after receiving the feedback report, some schools already report valuable effects (2C, 1I, 2O). However, no clear relationship can be demonstrated between the effects achieved and the three research conditions.

On closer inspection, we see that, partly thanks to the use of this report, teachers have become more alert to pupils falling behind (1C), teachers have gained a clearer picture of pupils' progress (1C), there is more confidence in the functioning of the school (1C, 2O), and a more critical attitude towards the school's own performance has emerged (1I). Two schools feel that, thanks to the positive results, their standing in the neighbourhood has been restored (2O).

But changing the name or reputation a school has in a neighbourhood is very difficult. And in contacts outside, with parents, this still comes up regularly: “Look, is that really a good school? Are you really doing well? Wouldn't I do better to send my children to another school?” And in the past teachers would then partly doubt their own abilities. Now they are also much more direct about it, and in their contact with parents they are much more willing to say: “No, we are doing well, we have our results”. (Respondent 11)

Alongside these, however, undesirable or unforeseen effects also occur. In one school the introduction of the school feedback system led to teaching to the test (1C), even though this went against the school leader's vision.


But who are they ultimately deceiving with that? Themselves, surely! As a teacher you surely don't teach towards those tests, or give pupils a similar test so that they will score well next week? Then they simply fall behind when they get to secondary school. Then as a school you are no longer credible with the results you present, are you? (Respondent 7)

In another school, testing the pupils produced a feeling of disappointment and demotivation because the results turned out to be less good than expected (1O).

5. Discussion and conclusion

5.1. School feedback use and support

First, the research results point to wide variation in the way schools shape their use of school feedback. The effects of this use are also very diverse. In the following paragraphs we conclude that, in explaining the differences between schools, the theoretical expectations were broadly confirmed. Two variables were examined more closely.

The first variable concerned the data literacy competences needed to work with the school feedback report under study. In general, the majority of respondents still have difficulty interpreting the data. If things go wrong at this step, further successful use is not guaranteed (Earl & Fullan, 2003). Bandura (1977), for instance, states that belief in one's own knowledge and skills is important for moving to action. For the subsequent phases of use, too, limited competences still appear to act as a brake.

Besides knowledge and skills, competences also comprise attitudes. We therefore gauged respondents' attitudes towards school feedback use, which likewise did not yield a predominantly positive picture. The earlier finding that school leaders have a more positive attitude than teachers was confirmed by this study (Vanhoof, Van Petegem, & De Maeyer, 2009; Zupanc et al., 2009). Teachers apparently have less opportunity to experience the added value and functionality of school feedback use, yet they are confronted with the burden of data collection (Ingram, Louis, & Schroeder, 2004; Verhaeghe et al., 2010). They are less familiar with the use of data at school level and find that results at group level are too far removed from their activities at class level (Schildkamp & Kuiper, 2010; Zupanc et al., 2009). On top of that comes the threatening nature of external evaluations, which instils fear of individual evaluation (Ingram et al., 2004), even when school feedback use, in the context of self-evaluation, is directed at the functioning of the school rather than at separate individuals (Kyriakides & Campbell, 2004).

The second variable examined concerned support for school feedback use. In this study, support was operationalised as an INSET and an ONSET condition (Gardner, 1995). Controlled experimental support interventions made it possible to examine differential effects across the conditions. Although the design was limited by its one-off intervention, small scale, and exploratory results, this controlled design nevertheless offered some valuable insights. Using Kirkpatrick's evaluation model for training initiatives (1998), intervention effects were described at four levels.

At the reaction level, satisfaction with the support received was greater the more support had been received. Respondents from the control and INSET groups indicated that they had not actively sought internal or external support. ONSET participants called on school team members more often, probably because the care coordinator had also been involved in the ONSET intervention. These respondents also expressed the greatest satisfaction. These results indicate that respondents tend to adopt a supply-driven stance towards support, since they make very little spontaneous, active use of school team members or external support services. It is also striking that teachers are seldom seen as a source of support. Several explanations emerged from the results. Teachers have less opportunity to free themselves from their busy schedules (Huffman & Kalnin, 2003; Ingram et al., 2004) and hold a less positive attitude towards school feedback use (Ingram et al., 2004; Zupanc et al., 2009). Care coordinators, by contrast, are called upon because they often possess the necessary data literacy competences through their experience in reading and interpreting data from tests and pupil monitoring systems. Moreover, school feedback use falls under their task of care coordination at school.

At the learning level, we examined the influence of support on the data competences needed to handle the report. The ONSET group came out best, followed by the control group. On the basis of these findings, it would therefore be premature to conclude that both support conditions had an unequivocally positive effect on school feedback use. It is thus necessary to look at the next level as well, which concerns the transfer of the insights gained from the support to the organisation. For the experimental groups we see a visible advantage in the reading, interpretation, and diagnosis phases. Only a few go on to take actions, and no clear difference between the conditions can be discerned here. This means that only limited instrumental use is observed. We do, however, observe a variety of conceptual, symbolic, strategic, and motivational use. Just as a change in thinking can lead to a change in acting, conceptual use precedes instrumental use (Schildkamp & Teddlie, 2008; Vanhoof, Verhaeghe, Verhaeghe, Valcke, & Van Petegem, in press). In that sense these results are encouraging, since school feedback use may in this way gradually be finding its way into Flemish schools. However, additional support and resources are needed to raise this use to a higher level and to integrate it into existing quality assurance.

Given the limited use and the rather short time between the support offered and the data collection, there is only limited evidence of school improvement effects. Consequently, no differences between conditions can be found.

5.2. Implications for practice

It follows from these research results that, when support initiatives for school feedback use are set up, a thorough needs analysis should ideally be carried out beforehand to gain insight into the support needs of school leaders and their team members. Since the ONSET condition comes out best across the board in this study, this may have implications for the design of support. Offering support at the school itself, using the school's own data, with its own team apparently responds best to these support needs. This indicates that personal, tailored support is preferred to a generalised approach in study days. In connection with this approach, reference can be made to the available literature on setting up collaborative data teams (Huffman & Kalnin, 2003; Lachat & Smith, 2005; Wayman et al., 2007). For those providing support, this means, among other things, that they need a clear view of the school's situation and must be able to tailor their support accordingly. It should also be stressed that support must not stop at the interpretation of the data. Schools should at least be given an initial impetus to start working with the data.

Furthermore, when setting up school feedback systems it should be borne in mind that support is not simply the key to successful use. School leaders perceive not only a lack of data literacy competences but also a lack of time. As a result, school feedback use is not integrated into systematic reflection on the functioning of the school. Additional resources for both school policy makers and teachers to free up time for this task can be regarded as a precondition. Greater attention to this matter in the training of teachers and school leaders could likewise be a facilitating factor.

5.3. Scientific relevance and implications for further research

First, this design has shown that Kirkpatrick's model is usable for further application in research on support for data use. The study also represented a valuable attempt to set up a controlled field study in this context, which is new for this research domain. For follow-up research, we recommend developing this line of research further by setting up quasi-experimental research in educational contexts where school feedback use is already more developed. The possibility that differential effects can be explained by user-related characteristics should be investigated further. To complement this study, quantitative data collection can be used for that purpose (e.g., Vanhoof et al., in press). We also recommend examining the effects of long-term support. Longitudinal research can help to clarify whether the differences found between conditions are partly attributable to the support received or solely to differences between users. This does not alter the fact that studies of one-off support initiatives deserve due attention, both because they are a reality in educational settings and because, for example, the present results report valuable influences. Attention should ideally be paid to both short-term and long-term effects, as well as to effect-oriented and process-oriented results (Schildkamp & Teddlie, 2008; Schildkamp et al., 2009).


References

Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral

change. Psychological Review, 84(2), 191-215.







Earl, L., & Fullan, M. (2003). Using data in leadership for learning.

Cambridge Journal of Education, 33(3), 383-394.

Fitz-Gibbon, C.T., & Tymms, P. (2002). Technical and ethical issues in


Policy Analysis Archives, 10(6), 68-83.




Gonczi, A. (1994). Competency based assessment in the professions in

Australia. Assessment in Education: Principles, Policy & Practice, 1(1), 27-

44.




Evaluation.

Huffman, D., & Kalnin, J. (2003). Collaborative inquiry to make data-based


Ingram, D., Louis, K.S., & Schroeder, R.G. (2004). Accountability policies and

teacher decision making: Barriers to the use of data to improve practice.

Teachers College Record, 106(6), 1258-1287.




Education, 112, 496-520.



Kurasaki, K.S. (2000). Intercoder reliability for validating conclusions drawn

from open-ended interview data. Field Methods, 12(3), 179-194.

Kyriakides, L., & Campbell, R.J. (2004). School self-evaluation and school






349.



data use at learning point associates. Retrieved 23 October 2007,

from http://www.learningpt.org/pdfs/datause/guidebook.pdf

Leithwood, K., & Aitken, R. (1995). Making schools smarter: A system for

monitoring school and district progress. Newbury Park, CA: Corwin.

Lindlof, T.R., & Taylor, B.C. (2002). Qualitative communication research

methods (2nd ed.). London: Sage.

Maes, F., Van Petegem, P., & Van Damme, J. (2005). Schoolloopbanen in het

basisonderwijs (SiBO): Doelstellingen en onderzoeksopzet. Paper

presented at the Onderwijs Research Dagen, Ghent, Belgium.

Mathison, S. (1992). An evaluation model for inservice teacher education.

Evaluation and Program Planning, 15, 255-261.


expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.

Murnane, R.J., Sharkey, N.S., & Boudett, K.P. (2005). Using student-

assessment results to improve instruction: Lessons from a workshop.

Journal of Education for Students Placed at Risk, 10(3), 269–280.



perspective (pp. 3-16). Oxford, UK: Elsevier Science.






Sussex.














indicatoren als strategisch instrument voor schoolontwikkeling.

Pedagogische Studiën, 81, 338-353.

Vanhoof, J., Van Petegem, P., & De Maeyer, S. (2009). Attitude towards

school self-evaluation. Studies in Educational Evaluation, 35, 21-28.


(in press). The influence of competences and support on school





Verhaeghe, J.P., & Van Damme, J. (2006). School performance feedback in

Vlaanderen, een schets op basis van een projectvoorstel. Informatie

vernieuwing onderwijs (IVO), 27(103), 19-27.

Visscher, A. J. (2002). A framework for studying school performance



Swets & Zeitlinger.


performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.




Wayman, J.C., Midgley, S., & Stringfield, S. (2007). Leadership for data-

based decision making: Collaborative educator teams. In A.B. Danzig, K.

M. Borman, B.A. Jones, & W.F. Wright (Eds.), Learner-centered



Weiss, C.H. (1998). Have we learned anything new about the use of


Webber, S., & Johnston, B. (2000). Conceptions of information literacy: New

perspectives and implications. Journal of Information Science, 26(6),

381-397.



Research, 49(2), 185-206.

Young, V.M. (2006). Teachers’ use of data: Loose coupling, agenda setting,

and team norms. American Journal of Education, 112, 521-547.




CHAPTER 7

GENERAL DISCUSSION AND CONCLUSION: FEEDBACK ON FEEDBACK


CHAPTER 7: GENERAL DISCUSSION AND CONCLUSION: FEEDBACK ON FEEDBACK

1. Introduction

In this final chapter of this doctoral dissertation on school performance

feedback, an overall reflection is presented about the outcomes of the

different studies. By integrating and summarizing these results, a

comprehensive picture is developed in relation to the research objectives

(RO). In addition, a general discussion is provided. The latter also requires

us to discuss the limitations of the different studies, and directions for

future research. After giving an overview of theoretical, practical,

methodological and policy implications, we finally present a general

conclusion.

2. Overview of research objectives and main findings

2.1. RO1: Exploring the characteristics of SPFSs

Numerous school feedback initiatives have been set up to provide schools

with confidential information about the way they function. This is expected

to foster school improvement processes by inducing continuous self-

reflection at the school level. However, up to now, no systematic

description or inventory of SPFS characteristics was available to inform

feedback users and/or designers. Given that SPFS characteristics may

influence the degree to which the feedback is actually used for school

improvement (e.g., Schildkamp & Visscher, 2009; Verhaeghe et al., 2010), it

is important that SPF designers and/or users consider in a critical way the

key features of an SPFS. To make decisions based on data, users need to

purposefully choose the type of SPFS that corresponds to their information

needs. This requires the availability of a transparent overview of specific

characteristics of available SPFSs, especially including their strengths and

weaknesses. In Chapter 2, a preliminary framework was developed for

describing and comparing SPFSs, which has been applied to five SPFSs. This

framework comprises analytical aspects related to the data gathering

process, the data analysis approach, the content of the feedback report, and

the numerical measures and graphical representations being used.

The results of the surveys and in-depth interviews with directors of five

SPFSs illustrate the wide variety in both the feedback reports and the

underlying feedback system. Apparently, the SPFS designers made

deliberate decisions related to their feedback design, considering the


ethical, practical, technical and infrastructural possibilities and constraints

of the educational system in which they operate. With respect to the

quality criteria of performance indicators, this descriptive and analytical

study paid specific attention to the relevance, accuracy, cost-effectiveness,

fairness and beneficence of the feedback delivered to schools (Fitz-Gibbon,

1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp &

Teddlie, 2008; Visscher, 2002). These quality criteria imply several

prerequisites related to all components of an SPFS.

First, with regard to the data gathering process, several procedures are

built in to guarantee accurate (i.e., reliable and valid) data. Both

testing instructions (protocols) and structured measurement instruments

are supplied to the schools. An interesting observation in relation to some

of these instruments concerns the technological features that enable tailored

testing of pupils, at any moment, about any subject, at any place. The

integration of test item banks, IRT techniques, computer adaptive testing

and data compatibility with a school’s management information system

seems to be the most promising way to attain accessible and low-stakes

testing. A clear example of the latter type of SPFS is the Assessment Tools

for Teaching and Learning, developed at the University of Auckland in New

Zealand.

Next, our study focused on several aspects of the data analyses being

used by SPFSs: (1) the underlying scaling models, (2) the data

analysis model, (3) the opportunities for longitudinal measurements, (4) the

inclusion of pupil mobility and (5) the levels of aggregation. A wide variety

in scaling procedures and statistical analyses could be observed in the

selected SPFSs. A key point of discussion is related to finding a balance

between statistically correct (and thus complicated) analyses and accurate

results on the one hand and understandable analyses and user-friendly

results on the other hand. For example, the analyses used in PIPS

(Performance Indicators in Primary Schools; Centre for Evaluation and

Monitoring) are fairly straightforward and not too complex. Though this

underpins the user-friendliness of PIPS, it might also lead to less accurate

data as schools are sometimes wrongly classified due to the lack of a

multilevel analysis perspective (Goldstein & Spiegelhalter, 1996; Karsten,

Visscher, Dijkstra, & Veenstra, 2010). Furthermore, it is important to realize

that measurement can introduce several types of error, of which users should be

informed (Fitz-Gibbon & Tymms, 2002; Mortimore & Sammons, 1994;

Rowe, 2004; Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996;

Yang et al., 1999; Karsten et al., 2010). Finally, the analysis of the

five SPFSs prompted a discussion about different conceptions of value

added, in particular when it comes to a fair comparison of a school’s


performance with a reference group. The discussion about value-added

illustrates that both the conceptualization and the operationalization of this

concept are highly problematic, due to actual constraints (e.g., pupil

mobility), ethical constraints (e.g., adjustment for pupil characteristics),

technical constraints (e.g., model complexity), and practical constraints

(e.g., immeasurability of variables).

Next, the feedback content of the SPFSs has been analyzed in order to

evaluate the data relevance. The SPFSs in this study focus mainly on a

limited number of cognitive outcomes (e.g., in relation to language,

mathematics and/or science), which are part of the core curriculum in most

countries. Developers of SPFSs might consider how to include other subject

areas in the SPFSs, as well as more attitudinal, behavioral and contextual

information, because the latter is critical when school staff is expected to

make data-driven improvement decisions. They will need a broader range

of data (Schildkamp & Kuiper, 2010). When analyzing the feedback content,

the analysis of the five SPFSs also centered on the numerical measures and

graphical representations being used. We could observe the use of a wide

range of numerical measures, comprising adjusted, expected, predicted and

raw scores. Examples are band scores, cut-off scores, grade scores, learning

gain scores, mean scores, percentages, percentiles, rescaled scores,

standardized scores, and value-added scores. These measures and the

accompanying graphical representations assume a sufficient level of

assessment literacy of SPFS users in view of a correct understanding of the

feedback. However, research revealed that even simple numerical

conceptions and representations are often interpreted incorrectly (Earl &

Fullan, 2003; Zupanc et al., 2009). This raises the question whether

feedback suppliers also ought to provide specific support to ensure that

the feedback delivered can lead to the desired school improvement

outcomes and does not result in harmful effects (Fitz-Gibbon & Tymms, 2002;

Rowe, 2004).

2.2. RO 2: Developing a framework for SPF use, including influencing factors

and effects

To develop a framework for SPF use (Chapter 3), we could build on a basic

model developed by Visscher (2002; Visscher & Coe, 2003). His framework

discerns four sets of factors influencing the use of the performance

feedback, including the design process and features of the underlying

SPFSs, the implementation process and the school organizational features.

This framework served as a basis for the studies conducted in this

dissertation, although some adaptations were made. Visscher and Coe


embed the process of feedback use in the broader school environment,

the features of which are defined as context-related factors in our framework.

Furthermore, we distinguish support-related factors as a separate set

instead of positioning it within the implementation process and

characteristics of the feedback system. As a result, four sets of

influential factors are outlined: factors related to the educational context, to

school and users, to SPFSs, and to support.

The second major adjustment to the Visscher framework was a

refinement of the conceptualization of SPF use. In the framework of

Visscher, only types of SPF usage are discerned. In our approach, we

additionally discern phases in SPFS use (Verhaeghe et al., 2010): (1) the

reception of the feedback in a school, (2) the reading and discussing of the

feedback information in order to come to (3) an interpretation of the

school’s results, followed by (4) a diagnosis or the search for explanations

and (5) the planning of improvement actions, which are (6) implemented

and about which the outcomes are (7) evaluated. Finally, in Chapter 3, two

additional types of feedback use were added to the typology described by

Visscher (i.e., instrumental, conceptual, symbolic, and strategic use): a

motivating/stimulating type and a pupil-directed type of feedback use.

The study described in Chapter 3 verifies the components of this

updated framework by involving a sample of primary school principals,

actively engaged in the School Feedback Project in Flanders. Semi-

structured in-depth interviews and a predefined coding scheme were used

as qualitative instruments. This resulted in a validation of all framework

components and some additions to the framework. The updated and

validated framework informed the other studies set up in the context of

this doctoral thesis. In particular, key elements of the framework guided

the studies presented in Chapters 5 and 6, which report a quantitative study

(path modeling) and a qualitative study (case-ordered predictor-outcome

meta-matrix), both building on an experimental design.

The following key findings result from the studies described in Chapters 3

and 6. Firstly, the context-related factors that influence school performance

feedback refer to the educational climate in which SPFSs are developed and

implemented. For example, in Flanders there is no strong

pressure on data use, due to the lack of a national assessment policy, the lack

of a central assessment system, the non-coercive role of the educational

inspectorate, and the autonomy granted to schools. As a result, no strong

data culture is observed in schools. Secondly, characteristics of the feedback

and related SPFSs also influence feedback use. As described in Chapter 2,

the perceptions of feedback users about the relevance, accuracy, cost-

effectiveness, fairness and beneficence of the feedback delivered to


schools will mainly determine what efforts will be made to put feedback

use into practice. With respect to relevance, we found that some

respondents of the School Feedback Project lack information about school

subjects other than mathematics and language. Furthermore, they mention

that aggregated feedback information is especially interesting for actors

involved in mesolevel school activities, in contrast to teachers who prefer

pupil level information that can be linked to microlevel interventions.

Concerning the feedback interpretability, our findings illustrate that

principals experience difficulties in interpreting feedback information.

Remarkably, feedback that ought to have a signaling function was

perceived as being valid only when it matched prior conceptions and

experiences of its users. Thirdly, school- and user-related factors that play

an important role in the use of the feedback could be linked to users’ data

literacy and expectations about feedback use. Furthermore, the

priorities in task schemes within a school and the perception of the school’s

performance level appear to influence feedback use. Principals state that

no clear expectations were defined prior to using the school feedback, that

their data-literacy skills are limited and that feedback data use is not a

priority. Furthermore, when feedback results were perceived as

unsatisfying, this either prompted a willingness to reduce the gap between

the observed and intended outcomes, in line with feedback intervention

theory (Black & Wiliam, 1998; Hattie & Timperley, 2007), or led school

principals to withhold feedback information so as not to discourage

their school staff. Fourthly, some support-related factors had to be

discussed. Support needs are observed during the different phases of SPFS

use: from the interpretation phase to the implementation of improvement

actions. Principals suggested two scenarios to involve external support

services. These suggestions inspired the design of the INSET and ONSET

interventions discussed in Chapter 6.

The interview results, obtained in the studies in Chapters 3 and 6, show

that, in general, school feedback is not intensively used and has a limited

impact on the actual way schools function. Most schools hardly reached

the phase of planning future actions on the basis of the school feedback.

Consequently, instrumental use was limited. However,

conceptual use was reported more often, which was also found in Chapter

5. This suggests that conceptions about SPF use start to enter

school-related discussions and start to affect teacher thinking. In

the framework, attention is paid to the expected impact of school feedback

use. Thus far, only basic indicators for intermediate school improvement

effects could be observed, such as an increased interest of school staff in

feedback results, a decreased reluctance to start feedback team


discussions, a clearer picture of the learning gains of pupils, more

confidence in the school’s functioning and an increase in the reflection on a

school’s performance. In addition to the expected outcomes of school feedback

use, unintended outcomes could also be identified. For example, feedback

led to an increase in teaching to the test or to feelings of

disappointment of school staff when confronted with unsatisfying pupil

performance.

2.3. RO 3 & 4: Exploring data literacy competences & effects of alternative

data representation modes on feedback interpretation abilities

The analysis of SPFS characteristics in Chapter 2 revealed a typical use of

complex numerical measures and graphical representations in the feedback

reports. Furthermore, findings from the study described in Chapter 3

illustrated that the interpretation phase is one of the main stumbling

blocks in the process of feedback use. This is in line with the discussions in

the literature about the limited data literacy competences of data users.

However, no empirical assessment of data literacy competences related to

SPF use has yet been carried out and reported in the literature.

Furthermore, to our knowledge, no research findings focusing on the

interaction of data literacy competences with the characteristics of SPFSs

have been published. This explains the relevance of the study reported in

Chapter 4, which addresses research objectives 3 and 4. Additionally, the

research findings reported in Chapters 5 (n = 116) and 6 (n = 18) contribute

to studying RO 3, since they report about the data literacy competences of

feedback users.

An experimental design with a post-test was set up, focusing on two

alternative ways to explain value added, in combination with three

alternative approaches to represent learning gain and value added. The

participants were freshmen in the educational sciences, enrolled

at Ghent University (n = 312). Tests were calibrated (by IRT based

techniques) to assess both the ability levels of the students and the item

difficulty levels. Students were asked to assume the role of a school

principal who received a school performance feedback report based on the

results from a longitudinal study in which his/her school participated

(similar to the feedback reports produced by the “School Feedback

Project”). The students received an introduction to the central concepts and

were given a set of related graphical representations, developed and

presented via a PowerPoint-presentation. Subsequently, they were

requested to complete a knowledge and skill test related to the interpretation

of school feedback (test reliability ranging from .72 to .90). Both conceptual


(i.e., understanding central concepts) and procedural knowledge (i.e.,

deriving information from graphical representations) were tested.
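
To illustrate what such a calibration expresses, the sketch below uses the one-parameter logistic (Rasch) model, in which the probability of a correct answer depends only on the difference between a respondent's ability and an item's difficulty, both placed on the same scale. Which IRT model was actually used is not specified here, so the choice of the Rasch model, the function name and all numerical values are assumptions made purely for illustration.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch (1PL) model.

    Ability and difficulty are expressed on the same logit scale.
    This is an illustrative assumption, not the model used in the study.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical respondent of average ability (0 logits) facing three items.
p_matched = rasch_probability(0.0, 0.0)   # item as difficult as the respondent is able
p_easy = rasch_probability(0.0, -1.0)     # easier item
p_hard = rasch_probability(0.0, 1.5)      # harder item

print(f"matched item: {p_matched:.2f}")
print(f"easy item:    {p_easy:.2f}")
print(f"hard item:    {p_hard:.2f}")
```

When ability equals difficulty, the probability of success is one half; it rises for easier items and falls for harder ones, which is what allows respondents and items to be compared on one scale.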

The descriptive results in Chapter 4 indicate that users experience major

difficulties in solving procedural value-added items (only 35% of

respondents were able to do so). We can explain this by referring to the

cognitive load theory (Chandler & Sweller, 1991; Sweller, van Merriënboer,

& Paas, 1998), as high cognitive demands are placed on the users when

interpreting value-added scores. The working memory is not able to cope

with too much information at the same time. When we examined the nature of the

errors the participants made when calculating value added, patterns could

be observed in the incorrect answers. This enabled us to reconstruct the

thinking process of participants and to identify basic misconceptions. A

typical misconception when calculating value added was, for example,

the confusion of the heights of curves with their slopes, also known as the

slope-height confusion (Beichner, 1994; Clement, 1989; Kramarski, 2004;

Leinhardt, Zaslavsky, & Stein, 1990). Furthermore, respondents mostly gave

correct answers to the conceptual questions related to the information that

was literally explained in the school feedback presentation (87% correct

answers). In contrast, low test scores were observed when the questions

required deep level conceptual thinking (24% correct answers).
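
The slope-height confusion can be made concrete with a small worked example. The sketch below treats value added as the difference between observed and expected growth and contrasts it with the erroneous reading that compares the heights of the two curves. All scores and names are hypothetical and chosen only for illustration; they are not taken from the feedback reports.

```python
# Hypothetical mean scores at the start and end of a period (illustrative only).
observed = {"start": 50.0, "end": 62.0}   # growth curve observed for the school
expected = {"start": 55.0, "end": 64.0}   # growth curve expected for the school


def growth(curve):
    """Learning gain: the rise of the curve between the two occasions (its slope)."""
    return curve["end"] - curve["start"]


# Correct reading: compare the slopes (observed growth minus expected growth).
value_added = growth(observed) - growth(expected)

# Slope-height confusion: compare the heights of the curves at the end point.
height_difference = observed["end"] - expected["end"]

print(f"value added (difference in growth): {value_added:+.1f}")
print(f"difference in height at the end:    {height_difference:+.1f}")
```

With these made-up numbers the school's curve lies below the expected curve throughout, yet it rises faster, so the value added is positive while the height difference is negative. A reader who compares heights instead of slopes would therefore draw the opposite conclusion.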

In Chapter 5, data literacy competences of school principals,

participating in the School Feedback Project, were examined by means of

an IRT calibrated data literacy test and a self-report based survey

(indicators of self-efficacy with respect to data interpretation and the

consecutive diagnosis phase). The test had a reliability of .83 and

consisted of items measuring the conceptual and procedural understanding

of the feedback reports. The data literacy test results reveal that only 42%

of the respondents answered half of the questions correctly, though some

school principals succeeded in interpreting all the information from the

report. Analysis of the difficulty of the literacy test items points out that

most principals experience difficulties in relation to procedural items. The

conceptual questions were apparently less difficult. Although test scores

were rather disappointing, most of the respondents reported positive

self-efficacy regarding their ability to interpret and use the feedback

report (M = 3.81, SD = 0.74). The unsatisfying data literacy skills are

reconfirmed when looking at the findings from the in-depth interviews in

Chapter 6 with school principals, participating in the School Feedback

Project. Even if elaborate explanations are provided within the feedback

reports, users encounter and report interpretation difficulties.

Furthermore, communicating the feedback findings to other staff members

or looking for explanations appear to be difficult for school principals.


Regarding attitudes towards SPF use - another aspect of data literacy - the

studies reported in Chapters 5 and 6 show a positive attitude towards SPF

use. The scale results reported in Chapter 5 (range 1-6, M = 3.97, SD = 1.08,

α = .91) imply that feedback use is considered as a relevant activity that

fosters self-evaluation. However, school principals report a less positive

attitude of their teachers (Chapter 6), as these are confronted with

considerable demands related to the data collection and may feel

threatened by the feedback results. Therefore, they seem to prefer

pupil-level information instead of aggregated school feedback data.

With respect to RO 4, central in the research reported in Chapter 4, we

can conclude that our findings confirm the research hypothesis that users

experience difficulties in interpreting complex conceptual and graphical

information, due to the interplay between the inherent complexity of SPF

and their lack of prior knowledge. We compared the effects of two alternative

ways of explaining value added on the final understanding of the concept. This

study proved helpful in detecting which alternative explanation facilitated a

better conceptual and/or procedural understanding. Explaining the concept

in terms of “the difference between observed and expected growth”

appears to be better than explaining it in terms of “the difference between

the school’s adjusted growth curve and the reference growth curve”. In

terms of the alternative graphical representations used in the experiment,

it is rather surprising that the tables did not add to the users’ understanding

of the feedback report. However, this does not imply that the use of tables

in combination with growth curves is not advisable. Previous research

indicates that different information is derived from tables and graphs

(Meyer, Shinar, & Leiser, 1997); both sources of information have merits,

depending on the task being performed (Schnotz & Bannert, 2003). An

appropriate use of tables and graphs can therefore avoid extraneous

cognitive load and foster a better understanding.

2.4. RO 5: Exploring effects of support on SPF use

The research findings in the studies reported in Chapters 2, 3 and 4

demonstrate that one of the main stumbling blocks in SPF use is the

interpretation phase, primarily due to a lack of data literacy competences.

This finding raises the question of which support initiatives are appropriate

for SPF use. Therefore, a field experiment with a post-test (n = 195) was set

up, building on the insights developed during the previous studies. This

resulted in an experimental study, reported in Chapters 5 (IRT testing,

survey research, path modeling) and 6 (in-depth interviews, case ordered

predictor-outcome meta-matrix). In both studies, participants were


principals of schools involved in the School Feedback Project. The support

initiatives that were studied encompassed an INSET (inservice education

and training) and an ONSET (onservice education and training) initiative.

The INSET and ONSET approach can be positioned on the continuum

reported by Gardner (1995). Both support initiatives built on

suggestions of school principals (see Chapter 3), as a solution to the data

interpretation difficulties they encountered. This helped to distinguish

three research conditions to which respondents were randomly assigned:

an INSET (n = 23), an ONSET (n = 7) and a control group (n = 150). In Chapter

5, the results of the INSET and control group have been presented. The

INSET intervention consisted of a half-day workshop on SPF interpretation

and use, using a fictitious feedback report as instructional material, and

organized in a university building. In contrast, the ONSET intervention was

organized in the school of the principal, where his/her own school feedback

information was discussed. In view of the evaluation of the differential

support effects, we built on Kirkpatrick’s (1998) four levels in the evaluation

of training initiatives. First, the Reaction level refers to the extent to which

participants are satisfied with the support initiative. Next, the Learning

level examines the increase in knowledge and skills and the change in

attitudes. This was studied by examining whether the support did

contribute to an increase in data literacy competences. Third, the Behavior

level checks the transfer of what has been learned to the local organization.

In our studies, this focus on the Behavior level encompasses the influence

of the support intervention on the phases in SPF use and types of SPF use,

as defined in our theoretical framework in Chapter 3. The fourth level, the

Results level, refers to the effects of the support initiative on achieving the

organization’s aims and on the organization itself. This was measured by

asking for the perceived (school improvement) effects of SPF use.

The study of the impact of the INSET approach, as compared to the

control group, was reported in Chapter 5. A path model (χ²(13) = 11.3,

p = 0.58; RMSEA = 0.01; AGFI = 0.92; GFI = 0.97) was tested to check

whether principals in the INSET research condition attained significantly

higher data literacy competences (attitudes, knowledge and skills, and self-

efficacy), reported a higher extent of feedback use and reported a higher

number of perceived effects. Building on our theoretical framework, we

expected the INSET initiative to affect the SPF-related competences in a

direct way. This hypothesis was only partly confirmed since the support

provision had a statistically significant effect on the mastery of

knowledge and skills and on self-efficacy, but not on the attitudes related to

SPF use. The impact on self-efficacy remained limited. This can be explained

by the limited scope of the support initiative, the increased awareness of


the complexity of school feedback, or by the quality of the support

intervention.

The path model test results also reveal no significant direct impact of

support on the phases in SPF use, the types of SPF use, or

the perceived effects of school performance feedback use. Only indirect

effects of support on these variables are found. These indirect effects are in

line with Kirkpatrick’s (1998) model, implying that a higher level can only be

achieved if lower levels have been attained. Specific for our study, this

means that the phases in SPF use (Level 3), types of SPF use (Level 3) and

the resulting school improvement effects (Level 4) will only be influenced

by a particular support intervention, when this support had a prior effect on

the data literacy competences (Level 2) of its users.
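
The logic of such indirect effects can be sketched numerically. In path analysis, the indirect effect along a chain of variables is the product of the path coefficients on that chain. The coefficients below are hypothetical placeholders, not the estimates from Chapter 5, and the function and variable names are introduced here only for illustration.

```python
def indirect_effect(*path_coefficients):
    """Indirect effect along a causal chain: the product of its path coefficients."""
    product = 1.0
    for coefficient in path_coefficients:
        product *= coefficient
    return product


# Hypothetical standardized path coefficients (assumed values, for illustration only).
support_to_competences = 0.30   # support -> data literacy competences (Level 2)
competences_to_use = 0.40       # competences -> SPF use (Level 3)
use_to_effects = 0.50           # SPF use -> perceived effects (Level 4)

effect_on_use = indirect_effect(support_to_competences, competences_to_use)
effect_on_effects = indirect_effect(
    support_to_competences, competences_to_use, use_to_effects
)

# If support has no effect at Level 2, nothing is transmitted to Levels 3 and 4.
effect_without_level2 = indirect_effect(0.0, competences_to_use, use_to_effects)

print(f"indirect effect of support on SPF use:   {effect_on_use:.2f}")
print(f"indirect effect of support on effects:   {effect_on_effects:.2f}")
print(f"same chain without a Level 2 effect:     {effect_without_level2:.2f}")
```

Because each additional link multiplies the effect by a coefficient smaller than one, the indirect effect weakens along the chain and vanishes entirely when the first link is zero, which mirrors the hierarchical reading of Kirkpatrick's levels given above.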

The qualitative study, focusing on both the INSET and ONSET training

provision (Chapter 6), checked the impact on the following dependent

variables: satisfaction with the training (Level 1), data literacy competences

(Level 2), SPF use (Level 3), and perceived effects of feedback use (Level 4).

ONSET participants report a higher satisfaction level, and attain a higher

data literacy competence level. Differences in feedback use could be

observed in relation to the phases of reading and discussing,

interpreting and diagnosing. This differential impact can be linked to the

content of the ONSET approach. These contents were less prominent in the

INSET condition and were lacking in the control condition. No differences

were found in effects of data use.

3. General discussion: “Mirror, mirror on the wall”

Data-driven decision making is a buzzword that recently entered the

educational jargon (cf. the fancy abbreviation “D3M”). The related usage of

concepts, such as learning gain, output measurement, value added, etc. is

overwhelming to the (often) statistically less literate school staff. However,

teachers and school principals are supposed to master these concepts. This

expectation is implicit in the way educational authorities and related

educational quality assurance systems (e.g., the inspectorate) position

policy papers that underline autonomy, accountability, and continuous

school improvement. Central in the discourse about school improvement is

the creation of data-rich environments that inform schools about their

functioning. It is in this context that SPFSs become important and help to

present a mirror for each school. “Mirror, mirror on the wall, are we doing

well at all?” is the central question that has to be answered by school staff.

The motive and need to look into the mirror is not personal vanity, but


either external pressure (cf. accountability) or an internal motivation (cf.

school improvement) or a combination of both driving forces. Instead of

getting “wrinkled by age”, schools are expected to look better by being able

to close the gap between the observed and the desired outcomes (Black &

Wiliam, 1998; Hattie & Timperley, 2007; Kluger & DeNisi, 1996). SPFSs,

therefore, should pinpoint the strengths and weaknesses in school

performance.

In order to be effective, school feedback should help to answer

three questions (Hattie & Timperley, 2007; Hattie, 2009): Where am I going

(Feed up)? How am I going (Feed back)? Where to go next (Feed forward)?

The first question refers to the learning intentions and goals, the learning

targets and expectations underlying the curricula. The second question asks

to what extent the school attains its targets, while the answer to the third

question offers directions for future action. The literature about the impact

of current SPFSs and an analysis of the nature of the SPFSs, indicate that

current SPFSs are mainly geared to answer the second question (Feed

back). Additionally, simply receiving feedback will not, as such, guarantee

that the feedback will be used. Several participants in our studies

mentioned that they would like to receive additional information; e.g.,

concrete improvement indications and concrete directions for

improvement actions. The latter implies that limiting the process of school

feedback to “holding a mirror”, will not easily lead to a sufficient or

adequate level of self-reflection and related improvement actions.

However, it can be questioned whether SPF suppliers should fulfill the

additional need for school feedback support. As they are external agents,

they have a less clear view on all input, process, and contextual variables

that influence performance within a particular school. It looks more sound

to cooperate with actors that are more closely related to the schools, such

as educational advisors. Furthermore, a debate should start about the

function of SPFSs to determine whether school feedback should be

conceptualized in a broader way, and should therefore go beyond a signal

function.

In the context of school feedback, the question “How am I going?” might

pose specific problems. Feedback users expecting that a full picture will be

presented “in the mirror” about their school, might be disillusioned. Users

have to understand that an SPFS reports on certain aspects of the school’s

functioning, which have been measured at a particular moment, by involving

particular (groups of) pupils/students, and building on specific

measurement instruments and techniques. Feedback results should

therefore be linked to other data sources, and to personal experiences. In

case this results in conflicting findings, educators have to search for

explanations rather than denying specific feedback results. This was

exemplified in our own studies. In certain cases, the validity of the SPF was

questioned or even denied by principals, when the feedback did not match

the current policy or plans of the school. School feedback will not work if

users only “see what their eyes want to see”. School feedback use assumes

an open mind of its users. A second issue related to the how-am-I-going-

question, is that feedback users might only attain a blurred view in their

mirror, due to the complex nature of the feedback and the limited data literacy

competences of the user. The provision of additional support in data

interpretation would be helpful to offer “glasses” to develop a better

understanding of the feedback. It can be suggested that the provision of

SPF support is an ethical requirement (cf. feedback should at least do no

harm; Fitz-Gibbon and Tymms, 2002) that is to be delivered by the feedback

suppliers. A third issue can be raised that centers on the possibility that the

mirror presents a distorted view of the school reality. The question has to

be asked whether SPFSs offer neutral or objective information. Every

approach to developing feedback builds on assumptions about what is

relevant and when data are accurate. As discussed in Chapter 2, these

assumptions seem to differ considerably from system to system. Therefore,

a clear insight should be available about the underlying rationales to select

certain feedback characteristics. At least, users should get informed about

the strengths and limitations of the SPFS and the feedback received.

A final discussion concerns the extent to which schools, principals and

teachers fully exploit the level of autonomy granted within the Flemish

educational system. This introduces the impact of personal characteristics

of feedback users (Kluger & DeNisi, 1996). Flemish teachers and principals

are relatively free in designing their pedagogical project, choosing learning

methods, designing curricula and monitoring their quality. In the Flemish

context, we can question whether schools adopt a clear and powerful level

of “autonomy” and translate this into a school quality assurance policy. In

our studies, we hardly observed related indicators. Some schools, for

example, only added the feedback results to the output section of the

self-evaluation report they prepared in view of a visit of the inspectorate. In

specific cases, the feedback was not read, nor screened. In this process, a

key role is played by the school principal. In most cases, the feedback report

entered the school via the desk of the school principal. He or she

determined whether the information was neglected or was distributed to

the school community members and was the starting point for a quality

related school team discussion. In the latter case, school principals

demonstrated a distributed leadership role, and school quality care became

the responsibility of all actors involved in the school system. This mirrors a

broad view on professional development, professional identity, and an

inquiry habit of mind (Earl & Katz, 2006). It is to be stressed that the latter is

one of the core competences of Flemish teachers (Flemish Government,

2007).

4. Limitations of the studies and directions for future research

In the following paragraphs, we discuss a list of main critiques and/or

shortcomings that can be raised in relation to our studies. At the same time,

this list helps to define directions for future research.

4.1. Study samples

The selection of research participants can be critiqued in a number of ways.

As the studies in this dissertation were part of a broader R&D project that

aimed at designing, developing and implementing an SPFS to be used in the

Flemish context, the recruitment of research participants was set up in a

particular way. In three studies, the samples consisted of primary school

principals, drawn from the larger pool of primary schools participating in

the SiBO project/School Feedback Project (Chapters 3, 5, & 6). This sample

is relatively small (n = 195) when compared to the 2321 schools organizing

primary education in Flanders (Vlaamse overheid, Beleidsdomein Onderwijs

en Vorming, 2010). This resulted in rather small scale studies. Also, the

involvement of the principals in a pupil monitoring project might have

introduced a sampling bias since these principals expressed a clear interest

in examining pupil performances. Furthermore, this small group was

regularly asked to fill out research instruments (surveys and tests). As a

result, the response rate declined, though it remained satisfactory for our

studies. Furthermore, the research samples were only put in a user context

linked to one particular SPFS in the Flemish educational context. This

introduces the need to expand our research by involving a larger and more

varied sample of principals, which is chosen from varying educational

contexts and in view of working with other SPFSs. This could help to

validate the current research findings. For example, within the UK, about

4,500 primary schools (and their principals) participate in PIPS-related

research. This number of participants creates opportunities to carry out

more advanced types of statistical analyses (e.g., multi-level analysis). Or,

better quality tests could be developed since a sound IRT calibration

approach requires a minimum of 500 respondents for each test item.

The nature and quality of the research samples is also an issue in the

studies reported in Chapters 2 and 4. In Chapter 2, only five SPFSs have

been selected. This is not a representative selection. These five systems

were selected because they reflect the wide variety in SPFSs on the one

hand, but the selection was also driven by a pragmatic issue on the other

hand: to what extent was a spokesperson available to be involved in the

qualitative study. A more comprehensive inventory of SPFSs, used

worldwide, will offer perspectives to further develop the analytical

framework focusing on characteristics of SPFSs. On this base, an additional

research line could start that involves school feedback designers. In the

study described in Chapter 4, the discussion about the quality of the sample

takes a different direction. The decision to involve students affects the

external validity of the research findings. Although this experiment did lead

to interesting findings, the results of this study need to be validated with

a sample of principals or teachers. We cannot assume that the data literacy

competences of university freshmen are comparable to those of in-service

teachers or principals. Due to practical constraints, it is very difficult to set

up experimental studies involving school staff (e.g., administering data

literacy tests). Alternative research designs should be considered, such as

quasi-experimental designs with non-randomized groups of participants.

The studies reported on in the Chapters 3, 5 and 6 solely build on the

experiences of primary school principals. The involvement of other school

team members (teachers, care coordinators) can be considered. Since we

can expect that the availability of school data will increase over time, it

might be realistic that specific school staff members develop the related

data literacy competences. This type of task specialization seems to

increase in the Flemish educational system. Another approach could build

on an international project, involving teachers from countries where data

use is already better integrated in the school culture (e.g., UK, The

Netherlands, New Zealand, etc.).

As stated earlier, a selection bias may have played a role since the

research participants were volunteers (Rossi, Lipsey, & Freeman, 2004). This

is critical in view of the internal validity of the study. Future research should

examine whether this subgroup is different from the population of school

principals by checking relevant school population characteristics.

Nevertheless, efforts have been undertaken to control for this type of bias

in the studies reported in Chapters 3, 4, 5 and 6.

4.2. Research design and data analysis

A major advantage of applied research is bridging the gap between

scientific research and practice (Broekkamp, Vanderlinde, van Hout-

Wolters, & van Braak, 2009). However, applied research set up in a typical

school context, introduces several limitations. In the case of this

dissertation research, this affected the number of research participants

involved in the studies. They were all linked to the same School Feedback

Project. Furthermore, it can affect the external validity of the research

findings. Finally, to prevent the risk of putting too much pressure on the

principals in the project, the number of research instruments, the

duration of interventions, the administration of pretests and intermediate

tests, etc. had to be limited. In future research, it is preferable to set up

more continuous support provisions, to develop a baseline as to the

dependent measures (pretests), and to set up follow-up tests. Furthermore,

a longitudinal perspective could be implemented to study the growth in

data literacy competences and the changes in school feedback use during

different consecutive feedback cycles. Lastly, the delayed effects of SPF

usage on student achievement could be studied. Such effects can only be

expected after several SPF cycles and a persistent effort in taking up

effective SPF use.

A next limitation builds on the measurement instruments used in the

different studies. Most built on self-reporting (e.g., surveys and

interviews described in Chapters 3, 5 and 6) of the principals’ perceptions

about SPF. These perceptions can only be considered as proxies for their

attitudes towards SPF use, their actual feedback use in the school and the

concrete school improvement effects caused by SPF use. This limitation

introduces the need for research that links the ‘perceived’ to the ‘expected’

and the ‘actual’ use of SPF. To measure school improvement effects,

measurement techniques, such as school observations, video analyses of

staff meetings, analyses of inspection visit reports, class tests and school

documents by researchers are more optimal choices. Data resulting from

the use of these instruments help to develop a broader view of a school’s

functioning (e.g., Schildkamp & Kuiper, 2010). In the experimental studies,

in addition to the skills and knowledge tests developed in view of the

studies reported in Chapters 4 and 5, other tests could be used. For

instance, it could be interesting to present the feedback reports to school

staff and to invite them to make a concrete interpretation of the numerical

measures and graphical representation of the feedback results. These

concrete actions could be videotaped and consequently analyzed. This

could help to get more adequate information about the way principals or

teachers interpret the data representations. In the literature, this

measurement approach has not yet been applied to research on data-driven

decision making; though some preliminary results of observation studies

are reported in Santelices and Taut (2009), Van Petegem and Vanhoof

(2004), and Verhaeghe, Verhaeghe, Valcke, and Vanhoof (2008).

Furthermore, future research about SPF interpretation should also focus on

individual differences and preferences in data-interpretation, since little is

known about the impact of these differences on feedback interpretation.

We can also criticize the number of variables incorporated in the path-

model in Chapter 5 and the meta-matrix in Chapter 6. Our choice was

guided by the feasibility of the study and our prior research interest in

specific variables linked to data interpretation competences. This implies

that our research model presents a reduction of reality. It was therefore not

a surprise that not all variance in the endogenous variables could be

explained by the model used in Chapter 5 (34% unexplained variance in

total; only 11% of the variance in knowledge and skills could be related to

the support intervention). Also, remarks can be made about our scale

development and the lack of cross-validation of these instruments (Hoyle,

1995). However, as most studies were exploratory in nature, our findings

must be considered as preliminary results to be studied in depth in further

research. The studies outlined in Chapters 2 and 3 were of a descriptive

analytic nature. Though they do not result in spectacular findings, the

findings are valuable since they helped to develop the conceptual

framework for the following studies. However, an overall framework on SPF

use, comprising all relevant influencing factors is still lacking, as well as

empirical validation of all existing frameworks (Visscher, 2002; Visscher &

Coe, 2003). The literature about data usage and SPF use is growing; future

meta-evaluative research about influencing factors is advisable. However,

in this case a full validation of conceptual frameworks will remain difficult

since “not everything that can be counted counts, and not everything that

counts can be counted” (Cameron, 1963, p. 13).

The studies in Chapters 5 and 6 built on a controlled field experiment.

This is very new in the SPFS literature. But it is clear that difficulties have

been encountered. First, we ran into ethical objections since not all

respondents were provided with the advantages of the onservice training

(ONSET). Next, the experimental conditions are bound to criteria for

controllability; this is not the case in reality (Rossi, Lipsey, & Freeman,

2004). For example, the support intervention in the INSET-condition was

organized in such a way that questions of principals concerning their

personal school report could not be answered, in order to avoid

interaction with the ONSET-condition. Only in the ONSET-condition there

was room to discuss the school’s own feedback results. In normal

circumstances, we expect that principals would get input from the support

providers about particular school related questions. Finally, we still have to

question the extent to which we could control for the impact of

confounding, interacting variables in the field experiments. For example,

participants in the control condition were not prevented from searching for

support in data use. We therefore promote the design and implementation

of more controlled field experiments and quasi experimental studies to

examine the factors affecting SPF use, especially in contexts where

feedback use is an integrated part of a school’s self-evaluation process.

4.3. Results

Issues can be raised concerning the validity, the limited explained variance,

the exploratory nature and the exemplary nature of our research findings.

We did already discuss these before. However, we want to stress that the

aim of the studies reported in this dissertation was not yet to come to

generalizable findings. Rather, we wanted to explore and illustrate school

feedback characteristics (Chapter 2), feedback use (Chapter 3), difficulties in

feedback interpretation (Chapter 4), and effects of feedback support

(Chapters 5 and 6). Furthermore, we have to stress that the conceptual

frameworks presented in Chapters 2 and 3 are not to be considered as

comprehensive. Not all potentially relevant influencing factors have been

incorporated in Chapter 3, or all school feedback characteristics in Chapter

2.

Other limitations can be linked to the grounding of the studies in the

research literature. It has to be stressed that a broad domain of the

literature had to be explored in order to develop conceptual frameworks.

This body of the literature encompasses literature about school

effectiveness and school improvement, literature about data-driven

decision making, about SPFSs, about data representations, about cognitive

load, about inservice teacher training, about the evaluation of training

initiatives, feedback theory, etc. Though a clear attempt was made to build

on the most current state of the literature, we are aware that there might be

shortcomings. However, the peer review experienced in the context of

conference and article submissions was a helpful step to guarantee a basic

quality of the work presented. In future studies, the literature base should

be expanded.

In all studies, and in particular in Chapter 4, we continuously highlighted

the interpretation problems in relation to school feedback information. This

might suggest that SPF is simply too complex in view of presenting relevant

information. This does not hold for all aspects of SPF. Much depends on the

way information is gathered, reported and distributed. The Internet Testing

Unit (INTU) from the Centre for Evaluation and Monitoring, for example,

developed an “event mapper”. This is a self assessment tool to monitor a

school’s environment by asking students to build on questions by clicking

on an online map of the school. This can become very informative to detect

risk areas in a school; e.g., to detect and prevent bullying. This example

shows that SPF can build on inspiring and innovative ways to gather data,

process the information and to distribute feedback reports. Creativity in

developing these innovative directions is central to future research.

The results of the studies in Chapters 4, 5 and 6 mainly focused on

problems that feedback users did experience during the interpretation

phase. A similar analysis of obstacles during the phases of diagnosis,

planning, implementation and evaluation should be performed. This will

again result in a better understanding of the support needs of SPF users.

This future analysis of support needs is a prerequisite for designing

adequate support initiatives. This will require cooperation with the relevant

actors (i.e., school staff, feedback suppliers, inspection members,

educational advisors).

Furthermore, we want to stress that the use of SPF is something that

needs time to grow within the Flemish educational context. Some

disappointing results in our studies indicate that feedback use remains

mainly conceptual and that only preliminary school improvement effects

can be observed. The research findings about conceptual feedback use are

nevertheless promising as they might precede more intensive types of

feedback use (Schildkamp & Teddlie, 2008; Vanhoof, Verhaeghe,

Verhaeghe, Valcke, & Van Petegem, in press). SPF is therefore to be

considered as a large scale educational innovation that takes time to get

embedded in all facets of the educational arena and the thinking processes

and strategies of the actors involved.

5. Implications of the results

Drawing on the findings from the five studies outlined above, some

theoretical, methodological, practical and policy implications are suggested.

Some overlap with the directions for future research and are therefore

described rather concisely in the next paragraphs.

5.1. Theoretical implications

A first - conceptual - implication is that a more refined description of SPF

use (see Chapter 3) has been developed and added to the research

literature. In addition to the detection of types in feedback use, also phases

in feedback use are now considered. Furthermore, also the existing types of

feedback use (instrumental, conceptual, symbolic and strategic) were

elaborated with a pupil-directed and motivating/stimulating feedback

usage type. Finally, more attention is now paid to the intermediate impact

of SPF instead of the narrow focus on the improvement of student

performance as a single school improvement effect.

Chapter 2 resulted in a more detailed framework for the analysis and

comparison of characteristics of SPFSs. After expanding and adjusting a

preliminary SPF framework, we could develop a set of standards for SPFS

developers and for SPF usage. This can result in the future in the

development of efficient instruments for data driven decision making.

Furthermore, it might inspire educational researchers to set up quasi-

experimental designs to study the way principals develop after receiving

different types of school feedback.

Finally, the experimental approach, reported in Chapter 4, presents an

innovative theoretical direction since it links the interpretation of SPF to

research about graphical data representations. In the context of data-driven

school improvement, this theoretical research field remains largely

unexplored. We expect that this study will trigger further assumptions and

empirical research about the way users approach numerical outcomes and

graphical representations as used in SPF reports.

5.2. Methodological implications

From a methodological point of view, some characteristics of our studies

inspire future research directions. The qualitative studies about SPF use

(Chapters 3 & 6) illustrate a particular controlled selection of participants

(e.g., theoretical sampling) and a particular systematic approach to the

analysis of the results (e.g., conceptually ordered predictor-outcome meta-

matrix). The way these qualitative studies have been set up can inspire

future qualitative research designs and the way they can tackle issues

related to developing a systematic approach and a clear analysis direction.

From a methodological point of view, our test calibration approach,

building on IRT, has proven to be adequate. The approach has several

advantages as compared to the application of classical test theory: (1) more

exact and reliable measures are obtained; (2) more information about the

quality of the individual test items and the ability levels of the respondents

can be gathered. This information helps – in a better way – to track the

identification of interpretation difficulties. The latter results helped to

rethink how to present data to school staff and how to develop support

provisions. Furthermore, IRT allows several tests to be linked along a common

ability scale, creating opportunities to measure growth in ability over time.
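
The idea of a common ability scale can be illustrated with a minimal sketch. It assumes the simplest IRT model (the Rasch model) and a mean-mean linking step based on anchor items that appear in both tests; the model choice, the function names and all numbers below are hypothetical illustrations and do not reproduce the calibration procedure used in our studies.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch (1PL) model.

    Ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def mean_mean_shift(anchor_difficulties_a, anchor_difficulties_b):
    """Constant that moves values from the scale of test B to the scale of test A.

    Both lists hold the difficulties of the SAME anchor items,
    estimated once in the calibration of test A and once in that of test B.
    """
    mean_a = sum(anchor_difficulties_a) / len(anchor_difficulties_a)
    mean_b = sum(anchor_difficulties_b) / len(anchor_difficulties_b)
    return mean_a - mean_b


# Hypothetical anchor item difficulties (in logits), not taken from our data.
anchors_on_a = [-0.5, 0.0, 0.8]
anchors_on_b = [-0.2, 0.3, 1.1]

shift = mean_mean_shift(anchors_on_a, anchors_on_b)

# An ability estimated with test B, re-expressed on the scale of test A.
ability_on_b = 0.9
ability_on_a = ability_on_b + shift
```

Once abilities from consecutive test administrations are expressed on one scale in this way, the difference between two ability estimates can be read as growth rather than as a difference between tests.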

Another methodological implication is the promotion of practical

research. The studies in Chapters 5 and 6 illustrate that it is possible to

evaluate the impact of workshops on different levels, looking beyond the

reaction level (Mathison, 1992; Rossi, Lipsey, & Freeman, 2004).

5.3. Practical implications

Since – in this dissertation - we mostly focused on applied research, our list

of practical implications is the longest. We can especially provide an

enumeration of ideas in view of the design of SPFSs and the related

implementation process. Many of these recommendations are especially

deduced from the discussion about SPFS characteristics that was elaborated

in Chapter 2.

First, school feedback system designers should try to minimize the

efforts for school staff and pupils in view of test administration. Developers

should try to build on data from existing management information systems

and available test item banks. Another efficiency measure builds on the

adoption of computer adaptive testing.

Second, with respect to the content of the feedback reports, more

attention should be paid to non-cognitive performance indicators and other

school subjects next to the predominant domains mathematics, language

and sciences. Attention should be paid to the development of attitudes

towards school and school subjects, socio-emotional variables (such as

wellbeing).

Third, data analysis approaches to produce school feedback should be

upgraded in order to adopt multilevel modeling and the statistical

adjustment for student background characteristics. However, raw (or

observed) scores should always be reported, because they refer to the

actual achievement level of a school. Users should get informed about the

shortcomings and strengths of the analysis methods used. Furthermore,

SPF designers should always try to find a balance between statistically

correct and user-friendly feedback.
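
The difference between raw and adjusted scores can be made concrete with a small sketch. It is a deliberate simplification: instead of a multilevel model it uses a single ordinary least-squares regression on one background variable and reports, per school, the raw mean next to the mean residual. The function name, the variable names and the example data are hypothetical.

```python
import numpy as np


def raw_and_adjusted_school_scores(scores, background, school_ids):
    """Return {school: (raw mean, adjusted score)} for each school.

    The adjusted score is the mean residual after regressing pupil scores
    on one background variable across all pupils. This is a simplified,
    single-level stand-in for multilevel adjustment.
    """
    scores = np.asarray(scores, dtype=float)
    background = np.asarray(background, dtype=float)
    school_ids = np.asarray(school_ids)

    # Degree-1 fit: np.polyfit returns the slope first, then the intercept.
    slope, intercept = np.polyfit(background, scores, 1)
    residuals = scores - (intercept + slope * background)

    results = {}
    for school in np.unique(school_ids):
        mask = school_ids == school
        results[str(school)] = (
            float(scores[mask].mean()),
            float(residuals[mask].mean()),
        )
    return results


# Hypothetical pupils: school B has a more favourable intake than school A.
example = raw_and_adjusted_school_scores(
    scores=[50, 50, 60, 60],
    background=[0, 0, 1, 1],
    school_ids=["A", "A", "B", "B"],
)
```

In this constructed example the raw means differ by ten points, while both adjusted scores are approximately zero, because the whole difference is accounted for by the background variable. Reporting both figures side by side shows users what the adjustment does and what the actual achievement level is.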

Fourth, the presentation format of the school feedback should be well

considered. We advise pursuing more conformity in the way data sources

are presented in a numerical way and in a graphical way. Furthermore,

feedback designers should consider graphical representations that support

the processing of the represented information (Kluger & DeNisi, 1996;

Schnotz & Bannert, 2003). Feedback reports should be designed according

to the cognitive tasks that are necessary to understand the information

(e.g., line graphs to illustrate growth). Furthermore, the interpretability of

the feedback information should be evaluated in pilot studies. The latter is

important to guarantee that the feedback information and presentation

format fit the prior knowledge of the SPF users.

If the data literacy competences of school staff are insufficient to result

in a correct interpretation of the SPF, the provision of proper support is to

be taken up by the SPFS designers. However, since the support needs might

exceed the interpretation phase and users also encounter difficulties during

further steps of data use, other actors have to take up a support role. A

long-term cooperation with educational advisors and the educational

inspectorate could be helpful to create tailored ONSET trajectories. This will

require – during an initial phase - that these stakeholders are thoroughly

introduced into the characteristics and possibilities of the SPFSs.

The promotion of data-literacy competences could also become a part of

teacher education programs. If teachers are expected to adopt a role in the

quality assurance cycle of their school, they should be introduced to the

prevalent numerical measures and graphical representations that are

relevant for SPF interpretation. This is expected to prevent the pitfalls in

data use.

At a practical level, also recommendations for future SPF users can be

derived from our studies. Firstly, users should not use data from an SPFS

without being informed about its characteristics and possibilities.

Furthermore, users should expect and require that repeated measurements

are pursued to attain reliable results about the student performance being

studied. It is recommended that data from at least 3 consecutive school

years are used to develop school improvement actions (van de Grift, 2009).

In addition, data triangulation should be promoted, integrating the SPF

results with other data sources, in order to end up with grounded decisions.

Since we promote the “alert” use of SPF rather than a remedial usage, this

implies that SPF helps to develop an understanding of a school’s

functioning, and goes beyond offering clear-cut solutions and remediation

approaches. Finally, school principals should be encouraged to involve their

school team in discussing SPF. In such a way, they foster the development

of a data-driven school improvement approach and a distributed leadership

position in developing school policies (Huffman & Kalnin, 2003; Lachat &

Smith, 2005; Wayman, Midgley, & Stringfield, 2007).

5.4. Policy implications

Finally, also policy implications follow from our research findings. First,

applied research should be promoted to drive theory development,

the design of SPFSs and the implementation of a data-driven school

improvement approach (Broekkamp et al., 2009). The study in Chapter 2

illustrates that several SPFSs emerged from projects initially sponsored by

educational governments.

More resources should be made available to schools to get support in

the usage of SPF, to adopt the school wide use of - commercially - available

SPFSs and to create possibilities for spending time devoted to data use.

Furthermore, educational policy makers should be aware that the

creation of information-rich environments is not a guarantee for effective

feedback use. This requires the establishment of support initiatives. In

addition, the educational inspectorate needs to be informed about the

potential of SPFSs and should stimulate schools to effectively use the data

in their decision making, instead of merely adding it to the school quality

report. Furthermore, to stimulate school improvement and regular self-

evaluation at the school level, more initiatives to participate in low-stakes

testing should be promoted.

6. Final conclusion

To ensure that SPFSs will be used as intended (i.e., for school improvement

purposes), several conditions related to the users, the nature of SPF, the

available user support and the educational context have to be fulfilled.

Much can be gained when SPFSs provide schools with accurate, relevant

and user friendly data. Decisions made by SPFS developers about the design

of the SPFS affect school processes and learner results in ways that are not

yet fully understood. More research is needed to expand and adjust the

framework developed thus far. Furthermore, attempts to develop the data-

literacy competences of school staff are critical in view of the current trends

in macro-level school policies and the way schools have to develop their

autonomy. However, the first research findings in relation to SPF use in

Flemish schools are promising. Although - thus far - strong effects of SPF

use are lacking, there are some indications that data use is developing into

an accepted and standard feature of an internal school quality policy.

However, to bring current data use to a higher level, future research should

center on the evaluation of the impact of School Performance Feedback

and on support provisions in view of the development of data literacy and

feedback use.

References



Black, P., & Wiliam, D. (1998). Assessment and classroom learning.

Assessment in Education: Principles, Policy & Practice, 5(1), 7-75.

Broekkamp, H., Vanderlinde, R., van Hout-Wolters, B., & van Braak, J.

(2009). De relatie tussen onderwijsonderzoek en onderwijspraktijk

verkend in Nederland en Vlaanderen [The relation between educational

research and educational practice explored in The Netherlands and in

Flanders]. Pedagogische Studiën, 86(4), 313-320.

Cameron, W.B. (1963). Informal Sociology: A casual introduction to

sociological thinking. New York: Random House.





87.



Earl, L.M., & Katz, S. (2006). Leading schools in a data-rich world:

Harnessing data for school improvement. Thousand Oaks, CA: Sage.







Flemish Government. (2007, February 6). 15 december 2006. - Decreet

betreffende de lerarenopleidingen in Vlaanderen [15 December 2006. -

Decree on teacher education in Flanders]. Belgian Official Gazette, pp.

5888-5897.










in Society, 159(3), 385-443.

Hattie, J. & Timperley, H. (2007). The power of feedback. Review of

Educational Research, 77(1), 81-112.

Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses

relating to achievement. New York: Routledge.



667-699.

Hoyle, R.H. (Ed.). (1995). Structural equation modeling: Concepts, issues and

applications. Thousand Oaks, CA: Sage.

Huffman, D., & Kalnin, J. (2003). Collaborative inquiry to make data-based


Karsten, S., Visscher, A.J., Bert Dijkstra, A., & Veenstra, R. Towards

standards for the publication of performance indicators in the public

sector: The case of schools. Public Administration, 88(1), 90-112.





feedback intervention theory. Psychological Bulletin, 119(2), 254–284.






349.



60(1), 1-64.

Meyer, J., Shinar, D., & Leiser, D. (1997). Multiple factors that determine

performance with tables and graphs. Human Factors, 39(2), 268-286.

Mortimore, P. & Sammons, P. (1994). School effectiveness and value added

measures. Assessment in Education: Principles, Policy and Practice, 1(3),

315.










Santelices, V., & Taut, S. (2009, September). Comprehension and use of



Research, Vienna.






35(4), 150-159.



Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive

architecture and instructional design. Educational Psychology Review,

10(3), 251-296.






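Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P.

(in press). The influence of competences and support on school

performance feedback use. Educational Studies.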





Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Vanhoof, J. (2008, March).

Understanding school performance feedback: A contribution to the

development of effective school performance feedback. Paper presented

at the annual meeting of the American Educational Research

Association, New York.




Swets & Zeitlinger.





Vlaamse overheid, Beleidsdomein Onderwijs en Vorming (2010). Vlaams

onderwijs in cijfers, 2009-2010 [Flemish education in figures,

2009-2010]. Brussels: Scheys.

Wayman, J. C., Midgley, S., & Stringfield, S. (2007). Leadership for data-

based decision making: Collaborative educator teams. In A.B. Danzig, K.

M. Borman, B.A. Jones & W.F. Wright (Eds.), Learner-centered



Yang, M., Goldstein, H., Rath, T., & Hill, N. (1999). The use of assessment

data for school improvement purposes. Oxford Review of Education,

25(4), 469-483.





SUMMARY [NEDERLANDSTALIGE SAMENVATTING]

1. Introduction

Schools are increasingly expected to turn school development into a systematic process and to position themselves as learning organizations (Nevo, 2002; Leithwood & Aiken, 1995). To support them in this, information-rich environments are being created. Among other things, schools receive feedback on their functioning and their performance through purpose-built school performance feedback systems (SPFSs). SPFSs are external systems intended to deliver performance-related information to schools in a confidential manner. The expectation is that schools will use this feedback for self-evaluation and internal school development (Visscher & Coe, 2002, p. xi). An important premise is that school feedback adds value to the information sources already available in schools and to the intuitions and experiences of school team members (Earl & Fullan, 2003).

Using information sources as a comprehensive policy instrument is, however, not self-evident. As a rule, both the use and the school improvement effects remain limited (Coe, 2002; Saunders & Rudd, 1999; Tymms, 1995; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2004; Zupanc, Urank, & Bren, 2009). Receiving school-specific feedback turns out to be a necessary but not a sufficient step toward systematic reflection at the school level. Certain conditions have to be met, both within schools and in the characteristics of the feedback systems (Visscher & Coe, 2003; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). One of the main obstacles to effective data use is a lack of data literacy among users (Earl & Katz, 2006). It is therefore not surprising that many research findings show that principals and teachers need additional support, both in interpreting the data and in using them further (Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009; Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren, 2009).


2. Conceptual framework

The theoretical basis for the studies in this dissertation draws mainly on the scientific literature on school effectiveness, data use (cf. data-driven decision making), data representation, and in-service training initiatives.

Two central concepts in the school effectiveness literature are school accountability on the one hand and school development on the other. The former applies mainly to educational contexts centered on central testing and external control. The latter refers to a more recent approach centered on data use within schools for self-evaluation and internal quality assurance. Although the two motives for data use seem contradictory at first sight, in practice both approaches often occur together (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank, & Bren, 2009). SPFSs align very explicitly with school development because they are assumed to foster self-reflection. To lead to effective results, however, SPFSs have to meet a number of quality criteria. For these we refer to the literature on performance indicators, which prove useful when they provide relevant, accurate, cost-effective, and fair information (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). Besides the inherent quality of the indicators, variables in the school setting also turn out to determine useful and successful use. The literature stresses, for instance, that institutions or organizations that deliver feedback information should do everything they can to contribute to positive effects among users (Goldstein & Myers, 1996; Fitz-Gibbon, 1996; Fitz-Gibbon & Tymms, 2002).

To describe school feedback use, we start from the conceptual framework of Visscher (2002; Visscher & Coe, 2003). Differences in school feedback use and its effects are attributed to four clusters of factors relating to (1) the characteristics of the users, (2) the feedback and the underlying SPFS, (3) the support offered, and (4) the educational context (Verhaeghe et al., 2010; Visscher & Coe, 2003). These factors influence school feedback use, which we describe in terms of phases of school feedback use and types of feedback use. Research shows that making use of school feedback calls for working through a cyclical process in a deliberate way. This cycle distinguishes (a) receiving and (b) reading and discussing the school feedback, in order to (c) arrive at a correct interpretation. After the school has analyzed the strengths and weaknesses of its results, a phase follows in which the school feedback is put to work. This phase comprises (d) diagnosing, by searching for explanations for the results, and (e) planning, (f) implementing, and (g) evaluating actions. Because of a lack of data literacy and time, schools turn out not to complete all of these steps, or to complete them only with difficulty (Earl & Fullan, 2003; Verhaeghe et al., 2010). Besides this cyclical approach, the use of feedback information is also described in terms of types of use. Helpful here is the classification of Rossi, Lipsey, and Freeman (2004), who distinguish types of use of evaluation data; this classification can be applied to school feedback use (Schildkamp, Visscher, & Luyten, 2009; Verhaeghe et al., 2010; Weiss, 1998). Schools can take action (instrumental use), start thinking (conceptual use), seek confirmation of existing views (symbolic use), use the report in an accountability context (strategic use), or use the report to stimulate or motivate team members (motivational use).

As already indicated in the introduction, the ultimate goal of school feedback use is to contribute to school development, among other things in terms of improved learning outcomes (Visscher & Coe, 2002; 2003). But school feedback use does not always result in significantly improved pupil performance (Fitz-Gibbon & Tymms, 2002; Schildkamp, Visscher, & Luyten, 2009; Visscher, 2002). When examining school improvement effects it is therefore natural to also look at mediating effects, e.g., effects on the professional development of team members (such as growing assessment literacy; Zupanc, Urank, & Bren, 2009), improved educational processes (such as intensified pupil guidance; Schildkamp & Teddlie, 2008), and/or improved school functioning (such as stronger cohesion within the school; Visscher & Coe, 2003). School feedback use can also result in unintended and undesirable effects, such as demotivation among teachers, excessive demands on teachers (Fitz-Gibbon & Tymms, 2002), or too strong a focus on tested content, also called “teaching to the test” (Schildkamp & Teddlie, 2008; Visscher, 2002).

3. The School Feedback Project: A mirror for every school

This doctoral research was set up in the context of the School Feedback Project, named “Each school its own mirror” (Verhaeghe & Van Damme, 2006). Within this project a prototype of a school performance feedback system was developed. In the context of this development research, 195 Flemish schools received annual feedback on a confidential basis, in which their school results were compared with a representative reference group from the SiBO study (Maes, Van Petegem, & Van Damme, 2005). The SiBO study collects data on a cohort of pupils who are followed from the end of kindergarten up to the transition to secondary education for mathematics and language (spelling, technical reading, and reading comprehension), supplemented with information on pupil intake characteristics. In the feedback reports, the comparison with the reference group was used to explain the meaning of the school's own performance by means of a number of central concepts: learning gain, value added, and adjusted scores. These concepts were explained in such a way that feedback users were not expected to have much prior statistical knowledge. The feedback data were further supported with graphical representations (pie charts, growth curves, and cross tables). The text was standardized across schools. School team members were expected to interpret the school-specific data themselves.

4. Research objectives and design

Five studies were set up and carried out as part of this doctoral research. The following five central research objectives (RO) guided the work:

• RO 1: Exploring the characteristics of SPFSs

Chapter 2 introduces the reader to the characteristics of SPFSs. Data were collected through questionnaires and in-depth interviews with feedback developers. A descriptive analysis of five SPFSs led to a first comparative framework to start a discussion about the characteristics of SPFSs.

• RO 2: Developing a framework for mapping school feedback use, the influencing factors, and the expected effects

Chapter 3 develops and tests a framework for describing school feedback use, the influencing factors, and the eventual effects on school functioning. To this end, principals from the School Feedback Project were interviewed.

• RO 3: Exploring the data literacy competences of SPFS users

• RO 4: Exploring the effects of alternative data representations and the data literacy competences of SPFS users

A number of central concepts from school feedback reports (e.g., value added and learning gain) were tested for interpretability in an experiment. Respondents were randomly assigned to conditions that differ in the way the central concepts are explained and represented. Calibrated tests (using IRT techniques) were used to determine the respondents' proficiency level. These results are reported in Chapter 4.

• RO 5: Exploring the effects of alternative forms of support on school feedback use

Chapters 5 and 6 address this last research objective, examining how types of support for principals from the School Feedback Project influence school feedback use. Effects of support were examined by means of questionnaires and a calibrated test (Chapter 5) and in-depth interviews (Chapter 6).
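The calibrated tests mentioned in the objectives above rely on item response theory (IRT). The summary does not state which IRT model was used, so the following sketch assumes the simplest one, the one-parameter (Rasch) model, in which the probability of a correct answer depends only on the difference between a respondent's proficiency and an item's difficulty. The function name and all values are hypothetical and serve only as illustration.

```python
import math


def rasch_probability(proficiency: float, difficulty: float) -> float:
    """Probability of a correct answer under the Rasch (1PL) model.

    Both arguments are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))


# Hypothetical respondent and items (invented values, not thesis data).
respondent = 0.0
easy_item = -1.0
hard_item = 1.5

p_easy = rasch_probability(respondent, easy_item)
p_hard = rasch_probability(respondent, hard_item)

print(f"P(correct | easy item) = {p_easy:.2f}")
print(f"P(correct | hard item) = {p_hard:.2f}")
```

Because item difficulty and respondent proficiency sit on one scale, calibrating the items makes it possible to report both how hard each item is and how proficient each respondent is, which is what the studies use the tests for.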

5. Main findings

RO 1: Exploring the characteristics of SPFSs

Until now, a clear framework for describing and comparing school performance feedback systems was lacking. This dissertation offers a first step, with the aim of informing both feedback developers and feedback users about the basic characteristics of SPFSs. The advantages and disadvantages of SPFSs are also discussed. To address this research objective, the characteristics of five SPFSs were mapped with regard to their data collection methods and data analysis techniques. Next, the content of the feedback reports was critically analyzed, including the numerical measures and graphical representations used. Separate attention was paid to the quality criteria for the feedback delivered: relevance of the feedback, cost-effectiveness, accuracy, fairness, and the emphasis on positive effects (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). The analysis shows above all that the SPFSs studied differ very strongly in their characteristics.

To get an idea of the accuracy of the data, we need a good view of the data collection method used. Both structured test instructions and measurement instruments were used. Of particular interest is the use of technology-supported applications. Especially the combination of computer-adaptive testing with tests assembled from item banks and data exchange with student administration systems appears to yield great advantages for the user.

Next, we looked at the data analysis methods used, the scale constructions, the possibilities for longitudinal measurement, and the aggregation levels reported. We also examined to what extent pupil mobility was taken into account. Central to this analysis was the extent to which the requirements for accuracy and user-friendliness were met.

After that, the feedback content of the different SPFSs was examined more closely. The emphasis turned out to lie mainly on cognitive content. We also examined which numerical measures and graphical representations were used in the reports. A very wide range of data representations was found. The choice of particular forms of representation has direct consequences for the interpretation skills assumed of school feedback users.

RO 2: Developing a framework for school feedback use, the influencing factors, and the expected effects

Starting from the conceptual framework developed by Visscher (2002; Visscher & Coe, 2003), a study was set up to map principals' perceptions of school feedback use. Attention was paid to the influencing factors, the phases of school feedback use, the types of feedback use, and the eventual effects of feedback use on school functioning. Information was collected through in-depth interviews with participants in the School Feedback Project. An analysis of these results helped to extend Visscher's conceptual model. Four clusters of factors influencing school feedback use were distinguished: factors related to the educational context, to the users/school, to the possibilities for support, and to characteristics of the SPFS. In addition to Visscher's framework, school feedback use was also described in terms of the steps to be taken in a cyclical process of data use. The principals mainly reported problems in the interpretation phase. In most cases the feedback from the School Feedback Project turned out not yet to be integrated into school functioning. As for types of feedback use, instrumental use was only rarely reported. It is therefore not surprising that many school improvement effects of school feedback use failed to materialize in this group of respondents.

RO 3: Exploring the data literacy competences of SPFS users

The results of the previous studies showed that the interpretation skills of school feedback users are limited. This is especially critical because the representations offered clearly presuppose a degree of data literacy. An experimental study was therefore set up (Chapter 4) in which two approaches to explaining the concept of value added and three different representations were compared with respect to their interpretability. Respondents followed a standardized instruction (working through a school feedback report in the role of a principal who is shown the results of his or her own school), and knowledge and skills tests were administered. The tests, whose results were analyzed using IRT techniques, help to determine the difficulty of each test item as well as the participants' proficiency in interpreting the feedback information. We also looked for patterns in the errors that might point to misconceptions among the respondents. The results show that especially the procedural test questions, which asked respondents to interpret value-added results from graphical representations, caused difficulties (only 35% of the respondents answered these correctly). This can be explained by the high demands these questions place on working memory (cf. cognitive load theory; Chandler & Sweller, 1991; Sweller, van Merriënboer, & Paas, 1998). One particular misconception turned out to be very common, in which the slope and the height of curves were misinterpreted (cf. slope-height confusion; Beichner, 1994; Clement, 1989; Kramarski, 2004; Leinhardt, Zaslavsky, & Stein, 1990).
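The slope-height confusion cited above can be made concrete with two growth curves. The sketch below uses invented mean scores for two hypothetical schools: school A has the higher curve at every measurement occasion, while school B has the steeper curve. A reader who confuses height with slope would wrongly conclude that pupils in school A gain the most.

```python
# Invented mean scores at three measurement occasions (hypothetical data).
school_a = [60.0, 63.0, 66.0]  # higher curve, flatter slope
school_b = [40.0, 48.0, 56.0]  # lower curve, steeper slope


def height(scores):
    """Level of the curve at the last measurement occasion."""
    return scores[-1]


def slope(scores):
    """Average gain per interval between measurement occasions."""
    return (scores[-1] - scores[0]) / (len(scores) - 1)


highest = "A" if height(school_a) > height(school_b) else "B"
steepest = "A" if slope(school_a) > slope(school_b) else "B"

print(f"Highest final score: school {highest}")
print(f"Largest average gain: school {steepest}")
```

The two questions have different answers here, which is exactly the distinction a growth curve asks the reader to keep apart: the height reflects the attainment level, the slope reflects the learning gain.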

In the fifth chapter a comparable data literacy test was used to determine the knowledge and skill level of principals after they had received their feedback report in the context of the School Feedback Project. Only 42% of the participants managed to answer half of the items correctly. These weak results do not match their higher estimate of their own knowledge level (five-point scale; M = 3.81, SD = 0.74).

Besides knowledge and skills, data literacy competences also comprise attitudes toward school feedback use. When we asked principals about these (Chapters 5 and 6), they turned out to hold a positive attitude and to assume that this kind of data use prompts them to self-evaluate. At the same time they indicate that their teachers view school feedback considerably less positively. Possible explanations are that teachers mainly experience the burden of data collection, feel threatened by this evaluation, and prefer pupil data from their own class to aggregated data at the school level.

RO 4: Exploring the effects of alternative data representations and the data literacy competences of SPFS users

A combination of limited prior knowledge and inherently complex feedback information appears to lead to weak test scores among the respondents in the experimental group. Explaining the concept of value added in terms of “the difference between the observed and the expected mean” led to better test scores than explaining it in terms of “the difference between the adjusted mean and the mean of the reference group”. The results further show that adding tables to the growth curves does not contribute to better feedback interpretation. This striking result raises questions about the role of the forms of representation used. Depending on which information has to be read from these figures, one or another form of representation is suitable (Schnotz & Bannert, 2003).
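The two explanations of value added compared above can describe the same quantity. The following sketch rests on an assumption that the summary does not state: that a school's adjusted mean equals its observed mean corrected by the amount its expected mean (given pupil intake) deviates from the reference group mean. Under that assumption both formulations yield the same number. The function names and all values are invented for illustration.

```python
def value_added_observed_vs_expected(observed: float, expected: float) -> float:
    """Explanation 1: observed mean minus expected mean."""
    return observed - expected


def adjusted_mean(observed: float, expected: float, reference: float) -> float:
    """Assumed definition: observed mean corrected for pupil intake."""
    return observed - (expected - reference)


def value_added_adjusted_vs_reference(
    observed: float, expected: float, reference: float
) -> float:
    """Explanation 2: adjusted mean minus reference group mean."""
    return adjusted_mean(observed, expected, reference) - reference


# Invented example: a school scores 52 where 48 was expected given its
# intake; the reference group mean is 50.
observed, expected, reference = 52.0, 48.0, 50.0

va_1 = value_added_observed_vs_expected(observed, expected)
va_2 = value_added_adjusted_vs_reference(observed, expected, reference)

print(f"Observed minus expected: {va_1}")
print(f"Adjusted minus reference: {va_2}")
```

If the assumption holds, the difference in test scores between the two conditions would reflect how the concept is worded and how many intermediate quantities the reader must keep in mind, not a difference in the underlying measure.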

RO 5: Exploring the effects of alternative support approaches on school feedback use

Chapters 5 and 6 report the results of a support intervention. Principals from the School Feedback Project (n = 195) took part in an experiment in which they were randomly assigned to one of the following three conditions: one in which support was offered at the school (ONSET, n = 7), one in which support took place at a location outside the school (INSET, n = 23), and one in which no support was offered (control group, n = 150). The INSET group was invited to a morning study session in a university building. They received an explanation of how to interpret the feedback reports and of the possibilities for use, based on a fictitious school report. The same explanation was given in the ONSET group, but there the principal was visited at the school and the school's own results were used in the training. Kirkpatrick's (1998) model for evaluating training initiatives provided the structure for evaluating the results of this study. At the reaction level, we examined to what extent participants were satisfied with the support. Next, at the learning level, we examined whether data literacy competences (knowledge, skills, attitudes) had increased. Then, at the behavior level, we examined whether what had been learned was also applied within the school. Finally, at the results level, we examined whether school improvement effects occurred in the different support conditions as a result of school feedback use.

In Chapter 5 only the INSET group and the control group were compared. The relations between the different variables were tested in a path model (χ² (df) = 11.3 (13), p = 0.58; RMSEA = 0.01; AGFI = 0.92; GFI = 0.97). Testing this model showed that the support led directly only to significantly higher scores on the knowledge and skills test and to a higher estimate of one's own data literacy. Indirect effects of the support intervention were found on the phases of use and the types of use.

The qualitative study reported in Chapter 6 used a meta-matrix in which the research participants were ordered by condition (ONSET, INSET, and control) and by degree of feedback use. ONSET participants reported higher satisfaction, a stronger command of data literacy competences, and a more intensive completion of the phases of reading and discussing, interpreting, and diagnosing.

6. Conclusion

Education authorities expect schools to use data for their internal quality assurance. The results of the studies reported here show that data use in the context of school feedback remains rather limited. Critical feedback-related concepts such as “learning gain”, “value added”, and “output measures” appear to overwhelm principals, who tend to lack statistical literacy, and as a result hardly inform them about the effectiveness of their own school. Offering school feedback does not automatically lead to self-reflection. To guarantee that school feedback is used for school improvement initiatives, a number of conditions have to be met with regard to the users, the SPFSs, the support, and the educational context. The research results indicate that much can still be improved in the accuracy, relevance, and user-friendliness of the school feedback delivered. This means that more evaluation research on school feedback initiatives is needed. What the results make very clear is that much attention must be paid to developing the data literacy competences of feedback users. Only then can the opportunities for self-reflection and autonomous quality assurance be expected to be fully used. The demand for such support goes beyond support in interpreting the data. Schools also want to be guided in making decisions based on their school feedback. Responding to this will require intensive collaboration between feedback providers, inspectors, and pedagogical advisers, especially to be able to offer tailored support. In addition, schools should be encouraged to use these feedback data to compare them with their own insights and earlier findings and to integrate them into their daily practice. Learning to use school feedback effectively is then a new task for teachers, who are currently mainly used to processing individual pupil data from their own class.

But users must also be informed about the strengths and weaknesses of the feedback delivered. Although the eventual effects of school feedback use remained limited in our studies, indications were found that demonstrate the added value of school feedback use. Given that the present research has clear limitations, for example in terms of its scale, its research design, and the dependent and mediating variables chosen, further research building on these first findings is advisable.

References







87.






Earl, L.M., & Katz, S. (2006). Leading schools in a data-rich world:

Harnessing data for school improvement. Thousand Oaks, CA: Sage.











667-699.











60(1), 1-64.




Maes, F., Van Petegem, P., & Van Damme, J. (2005). Schoolloopbanen in het

basisonderwijs (SiBO): Doelstellingen en onderzoeksopzet [School careers in primary education (SiBO): Aims and research design]. Paper

presented at the Onderwijs Research Dagen, Ghent, Belgium.






perspective (pp. 3–16). Oxford, UK: Elsevier Science.












Sussex.









Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive

architecture and instructional design. Educational Psychology Review,

10(3), 251-296.











101-119.




Verhaeghe, J.P., & Van Damme, J. (2006). School performance feedback in

Vlaanderen, een schets op basis van een projectvoorstel [School performance feedback in Flanders, an outline based on a project proposal]. Informatie

vernieuwing onderwijs (IVO), 27(103), 19-27.




Swets & Zeitlinger.











RESEARCH VALORISATION: PUBLICATIONS


1. Articles in SSCI journals (a1)

1.1. Published – in press


Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and School Improvement, 21(2), 167-188.

Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in press). The influence of competences and support on school performance feedback use. Educational Studies.

1.2. Submitted

Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Van Petegem, P., & Valcke, M. (2010). School characteristics facilitating school performance feedback use by teachers. Manuscript submitted for publication in School Effectiveness and School Improvement.

Verhaeghe, G., Schildkamp, K., & Luyten, H. (2010). Characteristics of School Performance Feedback Systems. Manuscript submitted for publication in Educational Administration Quarterly.

Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten van ondersteuning bij schoolfeedbackgebruik. Manuscript submitted for publication in Pedagogische Studiën.

Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of schools: How to represent school feedback information. Manuscript submitted for publication in The Journal of Educational Research.

2. Articles in journals not included in the SSCI (a3)

Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (in press). Datageletterdheid versterken bij scholen: Lessen uit het Schoolfeedbackproject [Strengthening data literacy in schools: Lessons from the School Feedback Project]. Kwaliteitszorg in Het Onderwijs.

Vanhoof, J., Verhaeghe, G., Van Petegem, P., Verhaeghe, J.P., & Valcke, M. (2009). Verschillen in het gebruik van schoolfeedback: Een verkenning van verklaringsgronden [Differences in school performance feedback use: An exploration of explanations]. Tijdschrift voor Onderwijsrecht & Onderwijsbeleid, 2009(4), 306-322.

Verhaeghe, G., Vanhoof, J., Van Petegem, P., Verhaeghe, J.P., & Van Damme, J. (in press). Het gebruik van outputgegevens in basisscholen: Concretiseringen en illustraties uit het Schoolfeedbackproject [The use of output results in primary schools: Concretizations and illustrations from the School Feedback Project]. Kwaliteitszorg in Het Onderwijs.

3. Chapters in books (b2)

Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Van Petegem, P., & Valcke, M. (2010). Improving data literacy in schools: Lessons from the School Feedback Project. In K. Schildkamp, M.K. Lai & L. Earl (Eds.), Data-driven decision making around the world: Challenges and opportunities. Manuscript submitted for publication.

4. Conference contributions

Verhaeghe, G., Verhaeghe, J.P. (2006, December). School Performance

Feedback als instrument voor kwaliteitszorg en middel tot reflectie over

schoolbeleid. Paper presented at the Vlaams Forum voor Onderwijsonderzoek, Antwerp.

Verhaeghe, G., Verhaeghe, J.P. (2007, June). Verstaanbare schoolfeedback

een realiteit? Paper presented at the Onderwijs Research Dagen (ORD), Groningen.

Verhaeghe, G., Verhaeghe, J.P., (2007, September). An attempt to develop

effective school performance feedback. Paper presented at the preconference of the European Conference on Educational Research, Ghent.

Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Vanhoof, J. (2008, March). Understanding school performance feedback: A contribution to the development of effective school performance feedback. Paper presented at the annual meeting of the American Educational Research Association, New York.

Verhaeghe, G., Vanhoof, J., & Van Petegem, P. (2008, June). Diepte-interviews naar het gebruik van schoolfeedback. Paper presented at the Onderwijs Research Dagen, Eindhoven.

Verhaeghe, G., Vanhoof, J., Verhaeghe, J.P., & Van Petegem, P. (2008, September). Feedback on school performance feedback: In-depth interviews about the comprehensibility and usability. Paper presented at the European Conference on Educational Research, Göteborg.

Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (2009, January). The effect of support on the interpretation and use of school feedback. Poster presented at the International Congress for School Effectiveness and School Improvement, Vancouver.

Verhaeghe, G., Vanhoof, J., Verhaeghe, J.P., & Van Petegem, P. (2009, January). Feedback on the use and interpretation of school performance feedback: Perceptions of primary school principals. Paper presented at the International Congress for School Effectiveness and School Improvement, Vancouver.

Vanhoof, J., Verhaeghe, G., & Van Petegem, P. (2009, May). Schoolfeedbackgebruik: Proces, resultaat en impact van ondersteuning. Paper presented at the Onderwijs Research Dagen, Leuven.

Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (2010, September). Does support matter in interpreting and using school feedback? Findings from a quasi-experimental study. Paper presented at the European Conference on Educational Research, Vienna.

Verhaeghe, G., Vanhoof, J., Van Petegem, P., & Valcke, M. (2010, January). Supporting school performance feedback use: An experimental study. Poster presented at the International Congress for School Effectiveness and School Improvement, Kuala Lumpur.

Vanhoof, J., Verhaeghe, G., & Van Petegem, P. (2010, January). Data use and the impact of a training initiative of data use. Symposium paper presented at the International Congress for School Effectiveness and School Improvement, Kuala Lumpur.

Verhaeghe, G., Vanhoof, J., Van Petegem, P., & Valcke, M. (2010, August). Supporting School Performance Feedback Use: An Experimental Study. Paper presented at the European Conference on Educational Research, Helsinki.