School performance feedback systems:
Design and implementation issues
Goedele Verhaeghe
Supervisor: Prof. Dr. Martin Valcke
Dissertation submitted in fulfilment of the requirements for the academic degree of Doctor of Educational Sciences
2011
This Ph.D. research project has been funded by:
The agency for Innovation by Science and Technology (IWT) GRANT NUMBER SBO 50194 (SCHOOL FEEDBACK PROJECT)
PREFACE
Those that can, do research.
Those that cannot, teach.
Those that cannot teach, teach teachers.
Those that cannot teach teachers, do educational research.
(Anonymous, source: www.vob-ond.be)
This quotation, which a natural scientist once forwarded to me with the best
of intentions, will raise the eyebrows of many a colleague. I cannot deny that
I had my doubts about this statement during my doctoral trajectory. What
exactly does doing research mean? For me, it covered many things. That I
learned a great deal from this search myself is certain. In addition, I hope
to have contributed to the research literature as well. A word of thanks is
therefore in order for the opportunities I was given in this learning process.
First of all, I thank my supervisor Prof. Dr. Martin Valcke and the project
coordinator and my office mate Dr. Jean Pierre Verhaeghe for the necessary
support and appreciation. I also thank the faculty, the university and the IWT
for the opportunities they offer to young people.
A further word of thanks goes to everyone who made this doctoral research
practically possible. My colleague from Antwerp, Prof. Dr. Jan Vanhoof,
deserves more than an honourable mention here. Then, of course, there are the
students and the school leaders from whom I obtained my data. Without their
cooperation there would be nothing to investigate. I want to thank the members
of my guidance committee (Prof. Dr. Peter Van Petegem, Prof. Dr. Patrick
Onghena, Jean Pierre and Martin) and the journal reviewers, because their
constructive comments lifted my work to a higher level.
Furthermore, I want to thank all my colleagues for the pleasant atmosphere in
the department. For every state of mind there was a listening ear. Although
everyone was busy with his or her own work, there was always room for a
pleasant and interesting conversation.
Finally, I would like to dedicate this dissertation to three important men in
my life. To you, dad, because you impressed upon us: “If you do something, do
it well”. I hope I have succeeded in this. Collin, thank you for “everything”,
a word that covers more than most people ever receive from anyone in their
lives. And little Remi, ma vraie joie de vivre, if anyone can conjure up a
smile, it is you.
Goedele
Ghent, December 2010
TABLE OF CONTENTS
CHAPTER 1: GENERAL INTRODUCTION
1. Introduction: Moving forward by looking backward
2. Conceptual framework for School Performance Feedback Systems
3. Research context: Each school its own mirror
4. Problem statement
5. Dissertation overview: Purpose, research questions and research design
References
CHAPTER 2: CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS
Abstract
1. Introduction
2. Conceptual framework
3. Method
4. Results: Application of the framework
5. Discussion
6. Conclusion
References
CHAPTER 3: PERCEPTIONS OF PRIMARY SCHOOL PRINCIPALS ABOUT SCHOOL PERFORMANCE FEEDBACK USE
Abstract
1. Introduction
2. Theoretical framework
3. Research questions
4. Research context
5. Research design
6. Findings and discussion
7. Implications, limitations and conclusion
References
CHAPTER 4: VALUE-ADDED RESULTS OF SCHOOLS: HOW TO REPRESENT SCHOOL FEEDBACK INFORMATION
Abstract
1. Introduction
2. Method
3. Results and discussion
4. General discussion and conclusion
References
CHAPTER 5: THE INFLUENCE OF COMPETENCES AND SUPPORT ON SCHOOL PERFORMANCE FEEDBACK USE
Abstract
1. Introduction and research questions
2. Theoretical framework
3. Methodology: research design, procedure and research instruments
4. Results
5. Conclusion and discussion
References
CHAPTER 6: EFFECTEN VAN ONDERSTEUNING BIJ SCHOOLFEEDBACKGEBRUIK [EFFECTS OF SUPPORT IN SCHOOL FEEDBACK USE]
Abstract
Summary in Dutch
1. Problem statement
2. Conceptual framework
3. Method
4. Results
5. Discussion and conclusion
References
CHAPTER 7: GENERAL DISCUSSION AND CONCLUSION: FEEDBACK ON FEEDBACK
1. Introduction
2. Overview of research objectives and main findings
3. General discussion: “Mirror, mirror on the wall”
4. Limitations of the studies and directions for future research
5. Implications of the results
6. Final conclusion
References
NEDERLANDSTALIGE SAMENVATTING [SUMMARY IN DUTCH]
1. Introduction
2. Conceptual framework
3. The School Feedback Project: A mirror for every school
4. Research objectives and design
5. Main findings
6. Conclusion
References
RESEARCH VALORISATION: PUBLICATIONS
CHAPTER 1: GENERAL INTRODUCTION∗
1. Introduction: Moving forward by looking backward
“There was a time in education when decisions were based on the best
judgements of the people in authority. It was assumed that school and
district leaders, as professionals in the field, had both the responsibility and
the right to make decisions about students, schools and even about
education more broadly. They did so using a combination of intimate and
privileged knowledge of the context, political savvy, experience and logical
analysis. Data played almost no part in decisions. Instead, leaders relied on
their tacit knowledge to formulate and execute plans. In the past several
decades, a great deal has changed. Accountability has become the
watchword of education and data hold a central place in the current wave
of large-scale reform. At the same time, school leaders find themselves
faced with challenges that are ill structured with more than a single, right
answer that demand reflective judgements (King & Kitchener, 1994);
judgements that require them to have knowledge and understanding in
relationship to context and evidence. School leaders are caught in the nexus
of accountability and improvement, trying to make sense of the role that
data can and should play in school leadership.” (Earl & Fullan, 2003, p. 383)
In recent years, the trend of decentralizing educational systems has spurred
researchers to focus on school-based management and internal evaluation.
Because schools are granted autonomy, governmental bodies expect them
to be accountable for continuously monitoring their internal quality policy
and improving their functioning (Hofman, Dijkstra, & Hofman, 2009;
Leithwood, Aitken, & Jantzi, 2006; Nevo, 2002). As public institutions,
schools are required to account for the resources invested in them. Besides
this external drive, schools as learning organizations are supposed to
systematically gather data on their functioning for self-evaluation
purposes. The underlying idea is that a school, as a continuously developing
organism, needs to adapt to and interact with its constantly changing
environment (Earl & Fullan, 2003). “Moving forward by looking backward”
characterizes this cyclic process. In this context, the current and past
performance level of a school serves as a starting point for developing
future plans and educational targets. Related buzzwords in current
educational jargon are data-driven decision making, school accountability,
school improvement and value added. Many of these terms are derived
from the management literature, which stresses the function of schools as
professional organizations.
∗ Based on:
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School Performance
Feedback: Perceptions of Primary School Principals. School Effectiveness and School
Improvement, 21(2), 167-188.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J. P., Valcke, M., & Van Petegem, P. (in press). The
influence of competences and support on school performance feedback use. Educational
Studies.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten van
ondersteuning bij schoolfeedbackgebruik. Manuscript submitted for publication in
Pedagogische Studiën.
Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of schools: How to
represent school feedback information. Manuscript submitted for publication in The
Journal of Educational Research.
In order to make sound decisions, schools need to be informed about
their functioning. Besides the experiences, intuitions and impressions of
school staff, several data sources are embedded in the school’s
self-evaluation process. These include not only the school’s own class tests,
school questionnaires, class lists, inspection reports and the like, but also
school performance feedback provided by specific systems. These so-called
school performance feedback systems are specifically designed to provide
schools with confidential information on their functioning. They follow the
trend of data-driven school improvement by meeting schools’ need for
accessible, information-rich environments. Several local initiatives have been
developed and implemented worldwide. However, little is known yet about the
impact of these systems on schools’ functioning and performance (Coe
& Visscher, 2002; Schildkamp, 2007; Schildkamp, Visscher, & Luyten, 2009;
Visscher & Coe, 2003). Therefore, the impact of these feedback
interventions is an interesting niche to examine in educational research.
It is worthwhile to consider not only possible school improvement effects,
but also the full range of intended, unintended, desired and undesired
outcomes. Furthermore, before looking at the final outcomes, a closer look is
warranted at the process of feedback use, including the (f)actors that
influence it.
2. Conceptual framework for School Performance Feedback Systems
2.1. Data-driven school improvement
Data-driven decision making
Data-driven (or data-based) decision making, or data-driven school
improvement, can be defined as “systematically analyzing existing data sources
within the school, applying outcomes of analyses to innovate teaching,
curricula, and school performance, and, implementing (e.g. genuine improvement
actions) and evaluating these innovations” (Schildkamp & Kuiper, 2010, p. 482).
Gathering data in order to continuously improve school actions is the
central goal of data-driven school improvement. This continuous movement
is characterized by a cyclic process, illustrated by several models described
in the educational management literature. A generic model, often applied in
educational contexts, is the Shewhart or Deming cycle (Deming, 1986), also
known as the Plan-Do-Check-Act (PDCA) cycle: a planning phase is followed by
Doing, Checking and Acting. Specifically for data-driven decision making, these
elements recur in several data-use models in both the practical and the
research literature (Abbott, 2008; Learning Point Associates, 2004; Mandinach,
Honey, Light, & Brunner, 2008; Verhaeghe, Vanhoof, Valcke, & Van Petegem,
2010; Zupanc, Urank, & Bren, 2009). In these models, data are used to inform
on the functioning of a school, to set goals and make sound decisions for
improvement, and to evaluate the outcomes of these improvement actions.
School accountability and improvement
Most literature on data use stems from studies in the United States,
and thus from contexts in which school accountability has traditionally been
stressed (e.g., Teddlie, Kochan, & Taylor, 2002). Recent studies are often
situated within an educational context in which setting high standards and
establishing measurable goals is believed to improve individual outcomes in
education, as illustrated by the No Child Left Behind Act (e.g., Schildkamp &
Teddlie, 2008). Therefore, most of these studies report on assessment data
only. However, in recent years, more studies on data-driven decision
making for school improvement have been published (e.g., Verhaeghe et
al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren, 2009). Within these
studies, data-based decision making is conceptualized in a broader sense, as
it does not focus merely on improving student outcomes and on assessment data.
Several data sources are integrated as a basis for decisions, such as
self-evaluation data, results of school and student surveys, school inspection
data, etc. (Schildkamp & Kuiper, 2010).
Systems providing data with the purpose of school accountability are
referred to as official accountability systems (Tymms, 1999). In a context in
which schools are held accountable for publicly funded activities, external
agencies generate data on the schools’ functioning to inform diverse
stakeholders on the return on investments. As opposed to these data
systems, professional monitoring systems generate data for voluntary and
internal use by schools (Tymms, 1999). Therefore, these monitoring
systems are more in accordance with data-driven school improvement.
The two motives for data use appear to be opposed at first sight. However,
several studies illustrate the complementary and interacting position of
school accountability and improvement (Earl & Fullan, 2003; Hofman, Dijkstra, &
Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank,
& Bren, 2009). For example, assessment data can be used in public rankings
for accountability, while the same data are also used internally within the
school after secondary analyses have been performed on them (e.g.,
adjustment for pupil background characteristics, calculation of value
added). The resulting improvement actions are then considered to
contribute to better pupil performance, which will be measured in the
following assessment. This can be considered as internal evaluation in the
service of external evaluation (Vanhoof & Van Petegem, 2007). Conversely,
systems especially designed to provide schools with
confidential information can be interesting for school inspectorates to gain
insight into the school’s functioning. If these inspectors act as critical
friends, supporting the school as a learning organization, this external
evaluation functions in the service of internal evaluation (Vanhoof & Van
Petegem, 2007).
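The secondary analyses mentioned above (adjustment for pupil background characteristics, calculation of value added) can be illustrated with a deliberately simplified sketch. It assumes a single prior-achievement score as the only background characteristic and an ordinary least squares adjustment, with a school's value added taken as the mean residual of its pupils. This is an assumption made for illustration, not the method of any particular feedback system, which may well use more elaborate models such as multilevel regression. All function names and data below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def school_value_added(pupils):
    """Crude value-added estimate per school.

    pupils: list of (school_id, prior_score, current_score) tuples.
    Returns {school_id: mean residual}, where the residual is the pupil's
    current score minus the score predicted from the prior score.
    """
    prior = [p[1] for p in pupils]
    current = [p[2] for p in pupils]
    a, b = fit_line(prior, current)
    residuals = defaultdict(list)
    for school, x, y in pupils:
        residuals[school].append(y - (a + b * x))
    return {school: mean(r) for school, r in residuals.items()}


# Hypothetical pupil records: (school, prior score, current score)
pupils = [
    ("A", 40, 52), ("A", 50, 61), ("A", 60, 72),
    ("B", 40, 44), ("B", 50, 55), ("B", 60, 64),
]
value_added = school_value_added(pupils)
```

In this toy data set both schools have the same intake, so the raw difference in outcomes carries over to the adjusted figures: school A ends up with a positive and school B with a negative estimate. With differing intakes, the adjusted and raw comparisons would diverge, which is the point of the adjustment.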
2.2. Performance indicator systems and school performance feedback
systems
Performance indicators
To collect data on the schools’ performance and functioning, official
accountability and professional monitoring systems make use of
performance indicators. Goldstein and Spiegelhalter define a performance
indicator as “a summary statistical measurement on an institution or
system which is intended to be related to the ‘quality’ of its functioning”
(1996, p. 385). Following Rowe and Lievesley, these performance indicators
serve as “data indices of information by which the functional quality of
institutions or systems may be measured and evaluated” (2002, p. 1).
Fitz-Gibbon and Tymms (2002) emphasize the systematic character of using
performance indicators, as they mention that these indicators are collected
at regular intervals to monitor a system’s performance. The content of
these performance indicators covers not only the output results of schools,
but also input, process and context information. These can include
indicators on resource provision and funding, participation rates of pupils,
repetition rates, class sizes, factors affecting students’ progress rates, etc.
(Rowe & Lievesley, 2002).
To successfully serve schools in their data-driven school improvement,
these indicators have to meet certain requirements. First, feedback needs
to be relevant and useful (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004;
Rowe & Lievesley, 2002). Relevant feedback corresponds to the actual
information needs of the users (Rowe & Lievesley, 2002; Schildkamp &
Teddlie, 2008; Visscher, 2002). Furthermore, feedback needs to be
accurate, which refers to the reliability and validity of the data gathered
(Fitz-Gibbon, 1996; Heck, 2006; Rowe & Lievesley, 2002). Next, the
cost-effectiveness of the indicator system is an important consideration
(Fitz-Gibbon, 1996; Rowe & Lievesley, 2002). Related to this
utility perspective, the performance indicators should be delivered in a
timely manner, which refers to both the currency and the punctuality of the
delivered feedback (Fitz-Gibbon, 1996; Heck, 2006; Rowe & Lievesley, 2002;
Visscher, 2002). Furthermore, users need to accept the performance indicators
and consider them to be fair. This fairness refers not only to the striving
for unbiased results (Heck, 2006), but also to the interpretability,
reliability, stability and incorruptibility of the reported performance
indicators (Fitz-Gibbon, 1996). Lastly, performance indicators should strive
for beneficial effects and should avoid unwarranted harm (Fitz-Gibbon, 1996;
Fitz-Gibbon & Tymms, 2002; Goldstein & Myers, 1996).
School Performance Feedback Systems (SPFSs)
School Performance Feedback Systems (SPFSs) are a particular type of
performance indicator system. They “are information systems external to
schools that provide them with confidential information on their
performance and functioning as a basis for school self-evaluation” (Visscher
& Coe, 2002, p. xi). SPFSs primarily aim at supporting school improvement
and internal quality policy.
The different components of this definition require some explanation.
• The systemic organization of the feedback initiative: The feedback
providers are bound to an organization and produce school performance
feedback not as a one-shot activity but on a systematic basis. In line
with the definition of performance indicators by Fitz-Gibbon and Tymms
(2002), data are collected at regular intervals to monitor a system’s
performance.
• The external component: SPFSs are external systems that offer their
services to schools. These services include data gathering, analysis and
reporting, and sometimes support in using the feedback provided.
• The goal of school improvement: This implies that SPFS developers
provide the school with performance feedback on a confidential basis, in
contrast with information made public for accountability reasons. By
generating data for voluntary use by schools, SPFSs are considered
professional monitoring systems (Tymms, 1999).
• The unit level of information: SPFSs offer feedback on the schools’
functioning, which contains school-level information. Therefore,
aggregation of individual pupil results is required.
• The content of the feedback: The content refers to the schools’
performance and functioning. This functioning encompasses
more than merely output results; it also refers to context, input and
process-related indicators.
• The confidential character of the data: The development and discussion
of SPFSs has received increasing attention as more studies are published on
the confidential use of information within schools. In reaction to the
negative consequences of making results public (e.g., measure fixation,
misinterpretation; Smith, 1995), educational governments support
research initiatives that promote low-stakes testing. Instead of
competition and external drives for school improvement, the inherent
need to monitor quality in school functioning is stimulated by
providing confidential information.
• The focus on feedback: In accordance with feedback intervention theories,
data delivered by SPFSs are intended to reduce the gap between the
intended and actual performance of actors (Black & Wiliam, 1998;
Hattie & Timperley, 2007). For feedback to be effective, certain conditions
regarding the task performed, the feedback itself and the situation
need to be fulfilled (Kluger & DeNisi, 1996). Applied to SPFSs, this implies
that outcomes of feedback use will be determined by characteristics of
the feedback reports, the educational context and the users. This will be
briefly discussed in the next section.
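Two of the components above lend themselves to a small worked illustration: the aggregation of individual pupil results to the school level, and feedback understood as the gap between intended and actual performance. The sketch below is a minimal example under stated assumptions. It uses the arithmetic mean as the aggregate and a single target score per school as the intended performance. Neither choice comes from the definition itself, and the function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def school_means(pupil_scores):
    """Aggregate pupil-level scores to one mean score per school.

    pupil_scores: iterable of (school_id, score) pairs.
    """
    by_school = defaultdict(list)
    for school, score in pupil_scores:
        by_school[school].append(score)
    return {school: mean(scores) for school, scores in by_school.items()}


def performance_gap(actual, intended):
    """Gap between intended and actual school performance.

    A positive value means the school is below its (assumed) target.
    """
    return {school: intended[school] - actual[school] for school in actual}


# Hypothetical pupil results and targets
scores = [("A", 60), ("A", 70), ("A", 80), ("B", 50), ("B", 60)]
actual = school_means(scores)
gaps = performance_gap(actual, intended={"A": 65, "B": 65})
```

Here school A exceeds its assumed target while school B falls short of it, so only school B receives feedback that signals a gap to be reduced.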
The research domain of school performance feedback systems is recent and
rather unexplored. Studies on the actual use of these feedback
systems and their impact on the schools’ functioning are especially scarce.
As a result, a firm overall theoretical framework for describing and
evaluating school performance feedback use and its effects is lacking.
2.3. Factors influencing school performance feedback utilization
Differences in the use of school feedback can be attributed to a variety of
factors. The most commonly used framework is that of Visscher (2002;
Visscher & Coe, 2003). It has been applied in several studies on data use
(e.g., Maier, 2010; Schildkamp & Teddlie, 2008; Schildkamp & Visscher,
2009; Verhaeghe et al., 2010; Zupanc, Urank, & Bren, 2010). This framework
discerns four sets of factors influencing the use of performance
feedback: the design process, the features of the underlying
SPFS, the implementation process and the school organizational features.
This framework served as a basis for the studies conducted in this
dissertation, although some adaptations were made. Visscher and Coe
embed the process of feedback use in the broader school environment,
which we define as context-related factors. Furthermore, we distinguish
support-related factors as a separate set instead of placing them within the
implementation process and the characteristics of the feedback system. As a
result, the following sets of influential factors are outlined: factors
related to the educational context, to school and users, to SPFSs, and to
support. As this framework is described in more detail further on in this
dissertation, only its main components and ideas are outlined here.
The educational context of SPFSs
Context-related factors that impact feedback use include policy
strategies at the regional and/or governmental level (Sun, Creemers, & De
Jong, 2007; Visscher, 2002). For instance, policies can contain clear
expectations that schools make use of feedback information. Educational
governments can stimulate feedback use by pressure and/or support
(Visscher, 2002). Furthermore, feedback will be used differently depending
on the context in which accountability and/or improvement play a role (Earl
& Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof &
Van Petegem, 2007; Visscher, 2002; Zupanc, Urank, & Bren, 2009).
Moreover, data use will depend on the accessibility of data sources. For
example, in Flanders no central examination systems are available. This
means there is no public reporting on school examination results and
almost no high-stakes testing, in contrast to the educational context in the
UK. Therefore, the data culture and the related data sources in English
schools will clearly differ from those in Flemish schools. Educational
inspectorates, in their role as guardians of quality and critical
friends, may also promote the use of data (Vanhoof & Van Petegem, 2007). For
example, Flemish schools are encouraged to inform the inspectorate on
their functioning by means of output results. Depending on the
prescriptions and expectations of these inspections, certain types of data
use will be promoted.
Users of SPFSs
School- and user-related characteristics are also key variables explaining
differences in school feedback use. Schildkamp and Kuiper (2010) mention
the following school characteristics as having an important influence on data
use: the style of school leadership, the degree of teacher collaboration, the
shared vision, norms and goals for data use, the time available to use data,
the training provided for data management and use, the presence of a
designated data expert in the school, and the pressure and support for using
data. Furthermore, school performance levels also influence feedback use
(Visscher, 2002; Visscher & Coe, 2003). Schools receiving positive feedback
(large value added) will discuss the results differently from schools
receiving a less positive picture (Schildkamp, 2007). In line with control
theory, participants receiving negative feedback are more likely to make an
effort to reduce the discrepancy between their performance and the
expected standards (Kluger & DeNisi, 1996). This will result in different
policy implications. However, this theory does not hold in all cases; it is not
unusual for school principals to withhold feedback information that does
not fit the current policy plan (Van Petegem & Vanhoof, 2004).
Considering the personal characteristics of the feedback users, we first
refer to the motivation and attitudes to use an SPFS. Motivation varies from
internal quality development or external accountability to policy
preparation (Liket, 1992; van Aanholt & Buis, 1990). According to Bosker,
Branderhorst, and Visscher (2007), a negative attitude towards SPF is
one of the main obstacles in the use of feedback information. Attitude
is the most significant factor determining a person’s willingness to
invest time and energy in dealing with information (Williams & Coles,
2007), alongside the users’ belief that they need the data in order to improve
education (Schildkamp & Kuiper, 2010). Furthermore, previous
experiences with feedback use, general experience with school-related
data, and the statistical knowledge and skills needed to interpret feedback
reports will also influence feedback use. This data literacy “encompasses
the strategies, skills and knowledge needed to define information needs,
and to locate, evaluate, synthesize, organize, present and/or communicate
information as needed” (Williams & Coles, 2007, p. 188). Whereas most
teachers have experience with school test data, pupil monitoring systems,
and self-evaluations, in several studies school staff report that they
lack the skills and confidence to use data for school policy purposes
(Earl & Fullan, 2003; Kerr, Marsh, Ikemoto, Darilek, & Barney, 2006;
Saunders, 2000; Williams & Coles, 2007). Data literacy is a condition for
being able to convert data into valuable and usable information (Earl &
Fullan, 2003). The current lack of know-how on making use of the
information is an important obstacle (Kerr et al., 2006; Saunders, 2000; Van
Petegem & Vanhoof, 2004; Williams & Coles, 2007). In addition to a lack of
the capacities needed to interpret the data, there is often a lack of
well-developed research skills, such as the formulation of research questions
and hypotheses (Earl & Fullan, 2003; Herman & Gribbons, 2001; Kerr et al.,
2006).
Characteristics of feedback reports and underlying SPFS
It is not the characteristics of the feedback (system) themselves but the
users’ perception of these characteristics that mainly determines how
feedback will be used (Visscher, 2002). Therefore, we refer to the quality
characteristics of performance indicators outlined before (Fitz-Gibbon, 1996;
Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008;
Visscher, 2002). Consistent with our definition of SPFSs, feedback systems for
school improvement should guarantee confidentiality and anonymity to the
subjects and schools. At the level of content, feedback should be perceived
as relevant, non-threatening, and corresponding to the actual informational
needs (Schildkamp & Teddlie, 2008; Van Petegem & Vanhoof, 2007;
Visscher, 2002). Information should also be up-to-date, reliable, and valid
(Schildkamp & Teddlie, 2008; Visscher, 2002; Visscher & Coe, 2003). In
terms of ethical issues, feedback should at least do no harm (Fitz-Gibbon
& Tymms, 2002). For example, in some cases feedback can be threatening
to the recipients’ self-esteem, particularly in a system of accountability
(Visscher & Coe, 2003). Moreover, feedback should not harm subjects or
schools on the basis of misleading information (Goldstein & Myers, 1996).
Features of both the feedback reports and the underlying feedback system
influence the outcomes of feedback use. No detailed
frameworks or descriptions of these different components have been
published, and research on the variety in school performance
feedback systems is lacking. The publication of Visscher and Coe (2002) is
the first overview of some SPFSs worldwide, but no detailed comparative study
has been performed. Furthermore, the question remains unanswered as to why
feedback systems have been developed in a certain way. More information
on the rationales of feedback designers for opting for certain features is
required.
Support in using SPFSs
Considering the lack of data literacy skills, school feedback users
request support, not only when interpreting the data, but also for the
further steps in data use. As a result, numerous studies stress the
importance of providing feedback support (Schildkamp & Teddlie, 2008;
Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2007;
Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren,
2009). This support can be provided by school staff within the school but
also by external parties (e.g., educational support services or feedback
suppliers). It can be organized formally or informally, as one-shot or
long-term interventions, and can involve school principals or (parts of) the
school team. Furthermore, these support initiatives can be organized within or
outside the school, which can be considered as onservice and inservice
education and training (Gardner, 1995). School staff who are involved in SPFS
training are more likely to read the feedback reports and to adopt a more
positive attitude (Tymms, 1995). However, research on the impact of support
initiatives related to the use of SPF is scarce, as current support initiatives
often lack empirical verification (Zupanc, Urank, & Bren, 2009).
2.4. School performance feedback use: Types, phases and effects
Types of school performance feedback use
School feedback can be used in several ways, depending on what feedback
users aspire to. Rossi, Lipsey, and Freeman (2004) classified the
types of evaluation use as instrumental, conceptual and symbolic/convincing
use. This classification has been applied in studies on SPF use (Schildkamp,
Visscher, & Luyten, 2009; Verhaeghe et al., 2010; Visscher & Coe, 2003;
Weiss, 1998). An instrumental use of feedback serves as a starting point for
immediate policy-making decisions. For example, a new reading method is
introduced because the previous method led to disappointing results. A
conceptual use of feedback does not result in concrete actions but
influences the decision-making process, which indirectly affects action. An
example of conceptual use is an altered way of thinking about grade
repetition when a school is confronted with remarkably high repetition rates.
Even if feedback does not influence one’s conceptualizations, it can affect
the policy-making process in a symbolic way. This means feedback results
serve to convince others of existing opinions and to support viewpoints in
discussions (Visscher, 2002). Visscher and Coe (2003) added a fourth type of
data use: strategic use. Feedback can be used in a strategic way for
accountability purposes, although this is not in line with a school
improvement discourse. These four types of feedback use can be
considered as intermediate results of feedback use that eventually will
contribute to school improvement. For example, a conceptual use results in
an altered way of thinking about pupil performance. This intermediate
result can in the end lead to effects of feedback use, such as a stronger
achievement orientation. In addition, feedback can also be used as a means
to motivate or stimulate school staff to improve (Schildkamp & Kuiper, 2010;
Verhaeghe et al., 2010). Finally, a pupil-directed use of data is observed
when pupil-level data stimulate the support of individual pupils in their
learning process (Verhaeghe et al., 2010).
Phases in school performance feedback use
In the framework of Visscher (2002; Visscher & Coe, 2003), SPFS usage is
described only in terms of types of use. In addition, phases in use can be
discerned (Verhaeghe et al., 2010). In analogy with the definition of
data-driven decision making by Schildkamp and Kuiper (2010), SPFS use
encompasses the following stages: analyzing the data, applying the outcomes of
these analyses, implementing innovations, and evaluating these
innovations. Learning Point Associates (2004) also describe data use in
terms of phases: analyzing data patterns, generating hypotheses, developing
goal-setting guidelines, designing specific strategies, defining evaluation
criteria, and making the commitment with school staff to implement and
evaluate these actions. Specifically for school performance feedback use, the
following successive stages can be discerned (Verhaeghe
et al., 2010; Verhaeghe et al., 2010):
• Receiving the school feedback
• Reading and discussing
• Interpretation
• Diagnosis
• Planning of improvement actions
• Implementation of improvement actions
• Evaluation of both the improvement actions and the process of feedback
use.
Receiving SPF has turned out to be a necessary yet insufficient step, as
both the schools and the feedback systems have to meet certain
requirements before the feedback is actually used in practice (Verhaeghe
et al., 2010; Visscher & Coe, 2003). One of the major phases in which school
staff get stuck is the interpretation phase, due to a lack of the data
literacy competences needed to process the information. Although several
studies report that school staff often struggle with data interpretation,
an examination of existing SPF systems and their related literature reveals
that research on user comprehension is scarce (Schildkamp & Teddlie,
2008). Few studies have examined the effectiveness of the various modes
of explaining and representing data in school feedback reports. This is
problematic considering the fact that SPF reports use complex concepts and
graphical representations, whilst SPF users (i.e., school staff) are often not
statistically skilled (Earl & Fullan, 2003; Kerr et al., 2006; Saunders, 2000;
Williams & Coles, 2007).
Effects of school performance feedback use
Feedback use should eventually lead to school improvement effects such as
improved student outcomes, professional development, improved
didactical approaches, and a stronger achievement orientation of staff
(Schildkamp & Teddlie, 2008). This positive feedback impact has been
observed in several studies (Hammond & Yeshanew, 2007; Schildkamp &
Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009). However, as a result
of the difficulties in data interpretation and use and the limited use of
information, current research often reports disappointing results from
school feedback use (Coe, 2002; Saunders & Rudd, 1999; Schildkamp,
Visscher, & Luyten, 2009; Tymms, 1995; Van Petegem & Vanhoof, 2004;
Verhaeghe et al., 2010; Zupanc, Urank, & Bren, 2009). Several studies show
that the actual use of school performance feedback is often limited within
schools, which may (partly) have been caused by the characteristics of
these SPFSs (Earl & Fullan, 2003; Schildkamp & Kuiper, 2010; Schildkamp &
Visscher, 2009; Verhaeghe et al., 2010; Coe & Visscher, 2002).
In contrast to the intended effects, some literature findings refer to
unintended and undesired effects of data use. For example, the
(administrative) workload of teachers and principals may increase as a
result of using an SPFS (Fitz-Gibbon & Tymms, 2002; Schildkamp & Teddlie,
2008). Moreover, participants may feel threatened by the evaluation, and
evaluations may evoke defensiveness (Fitz-Gibbon & Tymms, 2002). Finally,
using an SPFS may have a demotivating impact on teachers, especially in
poorly performing schools (Van Petegem, Vanhoof, Daems, & Mahieu,
2005).
3. Research context: Each school its own mirror
Until now, only a limited number of initiatives to develop data systems have
been undertaken in Flanders. The Flemish dislike of central examinations
and the resulting lack of systematic data collection on the performance of
pupils are in part responsible for this (Van Petegem, et al., 2005). However,
schools are required by law to monitor and improve their own quality in a
systematic manner. How they do so is a matter for the individual school and
is part of the autonomy which schools are granted in Flanders. Deregulation
and decentralization are therefore a continuing part of the educational
policy implemented in Flanders. Schools are becoming increasingly
autonomous and are achieving a greater degree of self-direction. The
Flemish government does not impose any formal systematic obligation
upon schools to carry out self-evaluation, nor does it compel them to
collect output data. Policy with regard to school feedback use is therefore
primarily one of encouragement rather than strong pressure. When
carrying out inspections, the education inspectorate is primarily
concerned with schools’ output (in relation to their context, input and
process) and this is not without consequences for the way in which schools
themselves look at their own functioning in general and their output in
particular.
Within this context of autonomy and absence of central examination
data, several data initiatives have been taken. However, an SPFS accessible
to all Flemish schools did not exist. Therefore, researchers affiliated with
three Flemish universities (Katholieke Universiteit Leuven, Ghent University
and University of Antwerp) shared their expertise to develop an SPFS for
Flemish schools, named the School Feedback Project “Each school its own
mirror”¹. The main objective of the School Feedback Project is to provide
schools with confidential information on their functioning to encourage
data-driven school improvement. The feedback project uses data from the
SIBO research project (Schoolloopbanen in het BasisOnderwijs [School
Trajectories in Primary Education]), which is a longitudinal study that has
been set up to investigate the school careers of 6,000 children from a
representative sample of Flemish schools, from the time they entered
kindergarten until the end of primary education. Data are collected by
means of standardized tests, surveys and observations, and cover child
characteristics, family background, class characteristics, classroom
practices, teacher attitudes and subjective theory, and school
characteristics (Verachtert, Van Damme, Onghena, & Ghesquiere, 2009;
Verhaeghe, Maes, Gombeir, & Peeters, 2002). The tests focus on language
learning (orthography, reading fluency, reading comprehension) and
mathematics. Item response theory based techniques are used to construct
the test scores, which makes it possible to estimate growth curves. So far,
the SPF project has delivered trial versions of school feedback reports to
the 195² participating primary school principals.
¹ This research was supported by the agency for Innovation by Science and
Technology (IWT), Grant number SBO 50194 (School Feedback Project). IWT is
a Flemish government agency stimulating and supporting innovation by
providing financial support to research institutes.
The resulting trial feedback reports were delivered to the schools on a
yearly basis. These individualized school reports informed each school about
the performance of its cohort under study. Results were reported for
mathematics, reading fluency, and orthography, supplemented with
information about pupil characteristics (child factors, home factors, and
Dutch language skills at the start of Grade 1). The school-specific results
were compared to the Flemish reference group. The central concepts in
these reports include learning gain, value added, and adjusted scores and
were explained in such a way that no prior statistical knowledge was
required. The data were supported with graphical representations (i.e.,
boxplots, bar graphs, pie graphs, growth curves, and cross tables). The text
of each report was standardized. The school principals were required to
interpret the results for their school, based on the general information
made available.
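To make the difference between two of these central concepts concrete, the
following minimal sketch computes a learning gain and a simplified
value-added estimate for one school. It is an illustration only, not the
procedure used in the School Feedback Project: it assumes two measurement
occasions on a common scale, and it assumes that value added can be
approximated as the mean residual from an ordinary least squares regression
of the end score on the start score in the reference group. All scores, all
variable names and the helper fit_line are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical reference-group pupils: (start score, end score).
reference = [(40, 52), (45, 58), (50, 61), (55, 67), (60, 72)]
# Hypothetical pupils of one school, measured on the same two occasions.
school = [(42, 57), (48, 63), (58, 72)]

# Expected end score given the start score, estimated on the reference group.
a, b = fit_line([start for start, _ in reference],
                [end for _, end in reference])

# Learning gain: mean difference between end score and start score.
learning_gain = mean(end - start for start, end in school)

# Value added (simplified assumption): mean difference between the observed
# end score and the end score expected from the reference group.
value_added = mean(end - (a + b * start) for start, end in school)

print(f"learning gain: {learning_gain:.2f}")
print(f"value added:   {value_added:.2f}")
```

In this made-up example the pupils of the school gain roughly 14.7 points on
average and score roughly 2.7 points higher than would be expected from
their start scores. The learning gain describes progress as such, whereas
the value-added estimate compares that progress with the reference group.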
The studies conducted in this dissertation take this research and
development feedback project as their starting point, in order to contribute
both to the further development of this SPFS and to scientific research on
SPF use.
4. Problem statement
The research literature on SPFSs shows some limitations that require further
examination. First, there is a lack of a firm theoretical framework for SPF
use. Neither the different components of the framework developed by
Visscher (2002; Visscher & Coe, 2003) nor the relations between its
variables have been empirically validated as an overall structure. Further
examination of the factors influencing school performance feedback use is
required.
In addition, there is a lack of detailed studies on the use and impact of
existing school performance feedback initiatives (Coe & Visscher, 2002;
Goldstein & Spiegelhalter, 1996; Schildkamp, 2007; Schildkamp, Visscher, &
Luyten, 2009; Visscher & Coe, 2003). Evaluation research on the functioning
and impact of SPFSs is warranted in order to evaluate the strengths and
weaknesses of these types of feedback interventions.
² The number of SiBO schools in the sample receiving feedback reports from
the School Feedback Project might differ slightly from study to study, due
to school mergers or dropout.
Several studies reported on the limited data literacy skills of school staff
in relation to data use. However, no detailed studies on SPFS user
comprehension have been performed. This research topic is of interest from
both a scientific and a practical point of view.
As a consequence of the limited capacity of school staff to interpret and
handle the data, there is a large need for support initiatives. Not only is
there a need for setting up more support initiatives, but the evaluation of
current support is also warranted, as these initiatives often lack
empirical verification (Zupanc, Urank, & Bren, 2009).
5. Dissertation overview: Purpose, research questions and research
design
In the following chapters, five studies will be reported and discussed. In the
next chapter, we provide a general introduction to the characteristics of
SPFSs. A framework for characteristics of SPFSs will be applied to five SPFSs
worldwide. This descriptive and analytic study both illustrates the wide
variety in features and discusses the rationales for making choices in
feedback design.
Following the framework of SPFS characteristics, Chapter 3 is devoted
to a framework for SPFS use. Parts of this framework will be used in the
studies described in the subsequent chapters. Based on the Visscher
framework, influencing factors, SPF use and the resulting effects will
be analyzed in the context of the School Feedback Project by examining
users’ perceptions.
In response to the call for research on feedback interpretability, the fourth
chapter focuses on the representation and interpretation of central SPF
concepts. Alternative representation modes of value added and learning
gain are examined, drawing on the literature on graphical data
representation. Particular attention is paid to misconceptions and
interpretation difficulties.
Chapters 5 and 6 tackle two crucial variables in SPF use: data literacy
competences and support in using SPF. These chapters report the results of
a quantitative (Chapter 5) and a qualitative (Chapter 6) study of a field
experiment with participants of the School Feedback Project, resulting in
recommendations for effective support in using SPF.
A final chapter will summarize the key findings from all studies by
answering the research questions. An overall discussion and a general
conclusion will complete this dissertation.
Figure 1. Chapter overview
An overview of the research objectives, the central research questions,
methods, data analysis and participants for each of the five studies is
provided in Table 1.
Table 1
Dissertation overview
Research objectives (chapter numbers: 2 3 4 5 6)
RO 1: Exploring the characteristics of SPFSs ✓
RO 2: Developing a framework for SPF use, including influencing factors
   and effects ✓ (✓) (✓)
RO 3: Exploring data literacy competences ✓ (✓) (✓)
RO 4: Exploring effects of alternative data representation modes on
   feedback interpretation abilities ✓
RO 5: Exploring effects of support on SPF use ✓ ✓
Research questions (chapter numbers: 2 3 4 5 6)
RQ1: What variety in SPFS characteristics can be observed? ✓
RQ2: What are the rationales behind choosing certain SPFS
   characteristics? ✓
RQ3: What phases can be observed in practice when schools use SPF?
   ✓ (✓) (✓)
RQ4: What is/are the result(s) of using SPF? ✓ (✓) (✓)
RQ5: How can differences in the interpretation and use of SPF in
   different school contexts be explained? ✓
RQ7: What is the differential impact of alternative explanations and
   representations of value added on the conceptual and procedural
   understanding of non-statistically skilled users? ✓
RQ8: To what extent are variations in SPF use influenced by data literacy
   competences? ✓
RQ9: To what extent does specific SPF support have an impact on the
   development of SPF competences, actual SPF use and resulting SPF
   effects? ✓ ✓
   RQ9.1: To what extent do INSET and ONSET for SPF use have an impact on
      the level of satisfaction of SPF users?
   RQ9.2: To what extent do INSET and ONSET for SPF use have an impact on
      the data literacy competences of SPF users?
   RQ9.3: To what extent do INSET and ONSET for SPF use have an impact on
      the use of this feedback within the school?
   RQ9.4: To what extent do INSET and ONSET for SPF use have an impact on
      the school improvement effects of SPF use?
Methods (chapter numbers: 2 3 4 5 6)
Survey research ✓
In-depth interviews ✓ ✓ ✓
Experiment ✓ ✓ ✓
Data analysis (chapter numbers: 2 3 4 5 6)
Qualitative analysis ✓ ✓ ✓
IRT-techniques ✓ ✓
Path modeling ✓
Analysis of covariance ✓
Participants (chapter numbers: 2 3 4 5 6)
School principals ✓ ✓ ✓
Students ✓
Feedback providers ✓
Note: ✓ = main goal of study; (✓) = side goal of study; SPF = school
performance feedback
References
Abbott, D.V. (2008). A functionality framework for educational
organizations: Achieving accountability at scale. In E. B. Mandinach & M.
Honey (Eds.), Data-driven school improvement: Linking data and
learning (pp. 257-276). New York: Teachers College Press.
Black, P. & Wiliam, D. (1998). Assessment and classroom learning.
Assessment in Education: Principles, Policy & Practice, 5(1), 7-75.
Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the
utilisation of management information systems in secondary schools.
School Effectiveness and School Improvement, 18(4), 451-467.
Coe, R. & Visscher, A.J. (2002). Drawing up the balance sheet for school
performance feedback systems. In A. J. Visscher & R. Coe (Eds.), School
improvement through performance feedback (pp. 221-254). Lisse, The
Netherlands: Swets & Zeitlinger.
Coe, R. (2002). Evidence on the role and impact of performance feedback in
schools. In A. J. Visscher & R. Coe (Eds.), School improvement through
performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger.
Deming, W.E. (1986). Out of the crisis. Cambridge: Massachusetts Institute
of Technology, Center for Advanced Engineering Study.
Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge
Journal of Education, 33(3), 383-394.
Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and
effectiveness. London: Cassell.
Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in
indicator systems: Doing things right and doing wrong things. Education
Policy Analysis Archives, 10(6), 1-28. Retrieved from
http://epaa.asu.edu/ojs/article/viewFile/285/411
Gardner, R. (1995). Onservice Teacher Education. In L. W. Anderson (Ed.),
International Encyclopedia of Teaching and Teacher Education (pp. 628-
632). London: Pergamon Press.
Goldstein, H. & Myers, K. (1996). Freedom of information: Towards a code
of ethics for performance indicators. Research Intelligence, 57, 12-16.
Goldstein, H. & Spiegelhalter, D.J. (1996). League tables and their
limitations: Statistical issues in comparisons of institutional
performance. Journal of the Royal Statistical Society: Series A: Statistics
in Society, 159(3), 385-443.
Hammond, P., & Yeshanew, T. (2007). The impact of feedback on school
performance. Educational Studies, 33(2), 99-113.
Hattie, J. & Timperley, H. (2007). The power of feedback. Review of
Educational Research, 77(1), 81-112.
Heck, R. (2006). Assessing school achievement progress: Comparing
alternative approaches. Educational Administration Quarterly, 42(5),
667-699.
Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support
school inquiry and continuous improvement: Final report to the Stuart
Foundation. Los Angeles: University of California, Center for the Study of
Evaluation.
Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School self-
evaluation and student achievement. School Effectiveness and School
Improvement, 20(1), 47-68.
Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006).
Strategies to promote data use for instructional improvement: Actions,
outcomes, and lessons from three urban districts. American Journal of
Education, 112, 496-520.
King, P. & Kitchener, K. (1994). Developing Reflective Judgement:
understanding and promoting intellectual growth and critical thinking in
adolescents and adults. San Francisco, CA: Jossey-Bass.
Kluger, A.N., & DeNisi, A. (1996). The effects of feedback interventions on
performance: A historical review, a meta-analysis, and a preliminary
feedback intervention theory. Psychological Bulletin, 119(2), 254–284.
Learning Point Associates. (2004). Guide to using data in school
improvement efforts: A compilation of knowledge from data retreats and
data use at learning point associates. Retrieved from
http://www.learningpt.org/pdfs/datause/guidebook.pdf
Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter:
Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press.
Liket, T.M.E. (1992). Vrijheid & rekenschap: Zelfevaluatie en externe
evaluatie in het voortgezet onderwijs [Freedom and accountability:
Self-evaluation and external evaluation in secondary education]. Amsterdam:
Meulenhoff Educatief.
Maier, U. (2010). Accountability policies and teachers' acceptance and
usage of school performance feedback - a comparative study. School
Effectiveness and School Improvement, 21(2), 145-165.
Mandinach, E.B., Honey, M., Light, D., & Brunner, C. (2008). A conceptual
framework for data-driven decision making. In E. B. Mandinach & M.
Honey (Eds.), Data-driven school improvement: Linking data and
learning (pp. 13-31). New York: Teachers College Press.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external
evaluation. In D. Nevo (Ed.), School-based evaluation: An international
perspective (pp. 3–16). Oxford, UK: Elsevier Science.
Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A
systematic approach. Thousand Oaks: Sage.
Rowe, K. & Lievesley, D. (2002). Constructing and using educational
performance indicators. Paper presented at the 2002 Asia-Pacific
Educational Research Association, Melbourne, Australia.
Rowe, K. (2004). Analysing and reporting performance indicator data:
'Caress' the data and user beware! Paper presented at the 2004 Public
Sector Performance and Reporting Conference, Sydney, Australia.
Saunders, L. (2000). Understanding schools’ use of ‘value added’ data: The
psychology and sociology of numbers. Research Papers in Education,
15(3), 241-258.
Saunders, L., & Rudd, P. (1999, September). Schools’ use of ‘value added’
data: A science in the service of an art? Paper presented at the British
Educational Research Association Conference, Brighton, University of
Sussex.
Schildkamp, K. & Teddlie, C. (2008). School performance feedback systems
in the USA and in the Netherlands: A comparison. Educational Research
and Evaluation, 14(3), 255-282.
Schildkamp, K. (2007). The utilisation of a self-evaluation instrument for
primary education. Unpublished doctoral dissertation, University of
Twente, Enschede, The Netherlands.
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform:
Which data, what purposes, and promoting and hindering factors.
Teaching and Teacher Education, 26(3), 482-496.
Schildkamp, K., & Visscher, A. (2009). Factors influencing the utilisation of a
school self-evaluation instrument. Studies in Educational Evaluation,
35(4), 150-159.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school
self-evaluation instrument. School Effectiveness and School
Improvement, 20(1), 69-88.
Smith, P. (1995). On the unintended consequences of publishing
performance data in the public sector. International Journal of Public
Administration, 18(2&3), 277-310.
Sun, H., Creemers, B.P.M., & De Jong, R. (2007). Contextual factors and
effective school improvement. School Effectiveness and School
Improvement, 18(1), 93–122.
Teddlie, C., Kochan, S., & Taylor, D. (2002). The ABC+ model for school
diagnosis, feedback, and improvement. In A. J. Visscher & R. Coe (Eds.),
School improvement through performance feedback (pp. 75-114). Lisse,
The Netherlands: Swets & Zeitlinger.
Tymms, P. (1999). Baseline assessment and monitoring in primary schools.
London: Fulton Publishers.
van Aanholt, T., & Buis, T. (1990). De school onder de loep [The school under
scrutiny]. Culemborg, The Netherlands: Educaboek.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-
indicatoren als strategisch instrument voor schoolontwikkeling
[Feedback on school performance indicators as strategic instrument for
school improvement]. Pedagogische Studiën, 81, 338–353.
Van Petegem, P., Vanhoof, J., Daems, F., & Mahieu, P. (2005). Publishing
information on individual schools. Educational Research and Evaluation,
11(1), 45-60.
Vanhoof, J. & Van Petegem, P. (2007). Matching internal and external
evaluation in an era of accountability and school development: Lessons
from a Flemish perspective. Studies in Educational Evaluation, 33(2),
101-119.
Verachtert, P., Van Damme, J., Onghena, P., & Ghesquiere, P. (2009). A
seasonal perspective on school effectiveness: Evidence from a Flemish
longitudinal study in kindergarten and first grade. School Effectiveness
and School Improvement, 20(2), 215-233.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using
School Performance Feedback: Perceptions of Primary School Principals.
School Effectiveness and School Improvement, 21(2), 167-188.
Verhaeghe, G., Vanhoof, J., Van Petegem, P., Verhaeghe, J.P., & Van
Damme, J. (in press). Het gebruik van outputgegevens in basisscholen:
Concretiseringen en illustraties uit het Schoolfeedbackproject [The use
of output results in primary schools: Concretizations and illustrations
from the School Feedback Project]. Kwaliteitszorg in Het Onderwijs.
Verhaeghe, J.P., Maes, F., Gombeir, D., & Peeters, E. (2002). Longitudinaal
onderzoek in het basisonderwijs. Steekproeftrekking [A longitudinal
study in primary education: Sampling procedure]. Leuven, Belgium:
Steunpunt Loopbanen doorheen Onderwijs naar Arbeidsmarkt.
Visscher, A.J. (2002). A framework for studying school performance
feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement
through performance feedback (pp. 41-71). Lisse, The Netherlands:
Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems:
Conceptualisation, analysis, and reflection. School Effectiveness and
School Improvement, 14(3), 321-349.
Visscher, A. J., & Coe, R. (Eds.). (2002). School improvement through
performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.
Weiss, C.H. (1998). Have we learned anything new about the use of
evaluation? American Journal of Evaluation, 19(1), 21-33.
Williams, D., & Coles, L. (2007). Teachers’ approaches to finding and using
research evidence: An information literacy perspective. Educational
Research, 49(2), 185-206.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for
effectiveness and improvement in classrooms and schools in upper
secondary education in Slovenia: Assessment of/for Learning Analytic
Tool. School Effectiveness and School Improvement, 20(1), 89-122.
CHAPTER 2
CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS
CHAPTER 2: CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS∗
Abstract
As evaluation and data-driven decision making are receiving increased
attention in education, more and more School Performance Feedback
Systems (SPFSs) are being developed and used worldwide. These systems
provide schools with data on their functioning. However, little research is
available on the characteristics of the different SPFSs. Therefore, this study
reflected on the characteristics of SPFSs to provide feedback designers and
users with arguments for making sound choices in selecting certain school
performance feedback characteristics. Based on literature on data-driven
decision making, a framework for identifying SPFS characteristics was
developed. Next, this framework was applied to five diverse SPFSs.
Interviews and surveys were administered, and documents about the
selected SPFSs were collected. By integrating the results of the survey,
semi-structured interviews, and SPFS documents, a summary meta-matrix
was created. The results illustrate the wide variety among the five selected
SPFSs with respect to features related to data gathering and data analysis
processes, the content, and the numerical measures and representation
modes used. A large variety in the complexity and accuracy of the data
modeling can be detected. These findings imply that users need to be
informed properly about the underlying rationales of SPFS features and
about the limitations and strengths of the performance indicators used.
Expanding and adjusting the preliminary framework into a set of standards
that SPFS developers and schools can use might help to develop efficient
instruments for data-driven decision making.
∗ Based on Verhaeghe, G., Schildkamp, K., & Luyten, H. (2010). Characteristics of School
Performance Feedback Systems. Manuscript submitted for publication in Educational
Administration Quarterly.
1. Introduction
Schools all over the world have been granted more autonomy.
Governmental bodies consider them as learning organizations and hold
them accountable for continuously monitoring their internal quality policy
and improving their functioning (Hofman, Dijkstra, & Hofman, 2009;
Leithwood, Aitken, & Jantzi, 2006). Therefore, schools are required to
systematically gather data on their functioning for self-evaluation
purposes. Several schools gather these data by means of School Performance
Feedback Systems (SPFSs), which “are information systems external to
schools that provide them with confidential information on their
performance and functioning as a basis for school self-evaluation” (Visscher
& Coe, 2002, p. xi). SPFSs primarily aim at supporting school improvement
and internal quality policy. These feedback initiatives contribute to the
creation of information-rich environments which are essential for schools in
their data-driven decision making. Although data from SPFSs are only one
source of information, they may provide schools with important
information on variables associated with school effectiveness, which
schools can use to improve their performance in terms of improving
teacher instruction and ultimately student achievement (Davies & Rudd,
2001; Visscher & Coe, 2003). However, empirical findings do not always
confirm the expected positive effects of SPFSs. Several studies show
that often the actual use of school performance feedback is limited within
schools, which may (partly) have been caused by the characteristics of
these SPFSs (Earl & Fullan, 2003; Schildkamp & Kuiper, 2010; Schildkamp &
Visscher, 2009; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010; Coe &
Visscher, 2002).
A wide variety of SPFSs can be discerned, all designed for specific
purposes in certain educational contexts. All adopt their own data gathering
systems, statistical methods, data representations, etc. However, little is
known about the distinct characteristics of these SPFSs or about the
rationales behind these features. Little is also known about whether their
users are capable of correctly interpreting and analyzing data derived from
these systems, which is a crucial condition for data-driven decision making.
A debate on the characteristics of SPFSs would be a starting point for
reflection for current and future feedback providers and users. Therefore,
this study has been set up, focusing on the characteristics of SPFSs,
specifically on the data gathering and data analysis processes, the content,
and the representation modes of SPFSs. We will examine the variety in
these aspects and the underlying rationales of these variations.
2. Conceptual framework
2.1. School performance feedback systems
The definition of SPFSs includes several important aspects:
• The systemic organization of the feedback initiative: The feedback
providers are bound to an organization and produce school performance
feedback not as a one-shot activity but on a systematic basis.
• The external component: This refers mainly to the data analysis and
feedback provision. The data gathering process can be conducted in
cooperation with school team members.
• The goal of school improvement: This implies that SPFS developers
provide the school performance feedback on a confidential basis, in
contrast with information made public for accountability reasons. By
generating data for voluntary use by schools, SPFSs are considered as
professional monitoring systems. They differ from official accountability
systems, by which schools are held accountable as publicly funded
institutions (Tymms, 1999).
• The unit level of information: School performance feedback goes beyond
individual pupil results. At least some indications are provided on the
school’s functioning and effectiveness by aggregating data.
• The content of the feedback: The content refers to the schools’
performance and functioning. School functioning encompasses more than
output results alone; it also covers context, input and process related
indicators.
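The unit-level aspect in the list above, moving from individual pupil
results to information about the school, can be sketched in a few lines.
This is a hypothetical illustration, not taken from any of the SPFSs
discussed: the records, the school identifiers and the function
school_means are all assumptions, and a plain mean stands in for whatever
aggregation a real system would apply.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pupil-level records: (school id, pupil id, test score).
pupil_scores = [
    ("school_A", "p1", 61), ("school_A", "p2", 55), ("school_A", "p3", 67),
    ("school_B", "p4", 48), ("school_B", "p5", 58),
]


def school_means(records):
    """Aggregate pupil-level scores into one mean score per school."""
    by_school = defaultdict(list)
    for school_id, _pupil_id, score in records:
        by_school[school_id].append(score)
    return {school_id: mean(scores) for school_id, scores in by_school.items()}


summary = school_means(pupil_scores)
for school_id, score in summary.items():
    print(school_id, score)
```

A raw school mean of this kind is only an output indicator. It says nothing
yet about context, input or process, which is why the definition asks for
more than aggregated output results.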
If one looks at the definition and characteristics of an SPFS, many
different systems might be considered as SPFSs, including central
examination systems, school inspectorate, national assessment systems,
pupil monitoring systems, research projects, school self-evaluation systems
and providers of standardized tests (see Table 1).
Although all systems described in Table 1 can function as SPFSs, they
might simultaneously function as official accountability systems. For
example, central examination data are often considered by inspectorates
and parents as a performance indicator for the school’s functioning. In
addition, these data can be transformed into confidential feedback, after
having performed secondary analyses on these results (Yang, Goldstein,
Rath, & Hill, 1999). Also, reports from inspection visits can serve both
purposes of accountability and improvement. This illustrates that the
relation between accountability and improvement may have different
configurations (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009;
Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank, & Bren, 2009).
Table 1.
Different kinds of SPFSs
SPFS                          Description
Genuine SPFSs                 The core task of these systems is providing
                              schools with confidential information on
                              their functioning.
Central examination systems   Sometimes, (raw or adjusted) results of
                              central examinations are fed back to schools
                              for school improvement, instead of or in
                              addition to making the results public.
School inspectorates          Inspection reports can be considered as
                              school feedback if they serve the purpose of
                              school improvement, instead of or in addition
                              to accountability.
National assessment systems   These differ from central examinations, as
                              the information is gathered in the first
                              place for educational governments to assess
                              the state of education at the national level.
                              However, if school-specific results are
                              confidentially fed back to schools, this can
                              be considered as school feedback.
Pupil monitoring systems      These systems are developed to assess
                              individual pupils’/students’ learning
                              progress in the first place. The results can
                              be used as school feedback when aggregated
                              reports are also provided for a group of
                              pupils/students.
Research projects             Participation in research projects can result
                              in a school feedback report, as a return on
                              investment.
School self-evaluation        These are systems developed only with the
systems                       purpose to provide schools with confidential
                              information on their performance and
                              functioning.
Standardized tests            Some (psychometric) standardized tests, taken
                              by individual pupils/students, can result in
                              aggregated scores for a class or group and
                              thus can be considered as school feedback.
2.2. Performance indicators
SPFSs gather information on the schools’ performance and functioning, by
making use of performance indicators. Following Goldstein and
Spiegelhalter “a performance indicator is a summary statistical
measurement on an institution or system which is intended to be related to
the ‘quality’ of its functioning” (1996, p. 385). Rowe and Lievesley add an
evaluative component to this definition: “performance indicators (PIs) are
defined as data indices of information by which the functional quality of
institutions or systems may be measured and evaluated” (2002, p. 1).
Applied to the context of schools and internal quality policy, Fitz-Gibbon and
Tymms (2002, p. 2) define an indicator “as an item of information collected
at regular intervals to track the performance of a system”. Hereby, they
emphasize the systematic character of the data gathering and analysis,
which corresponds to the definition of SPFSs by Visscher and Coe (2002).
School performance indicators do not only report about the output
aspect of school quality, such as pupil achievement results, but also on the
context, input and process of the school’s functioning. These can include
indicators on resource provision and funding, participation rates of pupils,
repetition rates, class sizes, factors affecting students’ progress rates, etc.
(Rowe & Lievesley, 2002).
To successfully serve schools in their internal quality policy, these
indicators have to meet certain requirements (Fitz-Gibbon, 1996; Heck,
2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008;
Visscher, 2002). First, feedback needs to be relevant and useful, which
means it corresponds to the actual information needs of the users.
Furthermore, feedback needs to be accurate, which relates to the reliability
and validity of the data gathered. Next, the cost-effectiveness of the
indicator system is an important factor to take into consideration. Related
to this utility perspective, the performance indicators should be delivered
in a timely manner, which concerns both the currency and the punctuality of
the feedback. Furthermore, users need to accept the performance indicators
and consider them to be fair. This fairness does not only refer to the striving
towards unbiased results, but also to the interpretability, reliability, stability
and incorruptibility of the reported performance indicators. Lastly,
performance indicators should strive towards beneficial effects and should
avoid unwarranted harm (Goldstein & Myers, 1996; Fitz-Gibbon, 1996;
Fitz-Gibbon & Tymms, 2002). Although there is a lack of systematic evaluation
of effects of SPFSs (Visscher & Coe, 2003), some literature findings refer to
unintended effects of data use. For example, the (administrative) workload
of teachers and principals may increase as a result of using an SPFS (Fitz-
Gibbon & Tymms, 2002; Schildkamp & Teddlie, 2008). Moreover,
participants may feel threatened by the evaluation, and evaluations may
evoke defensiveness (Fitz-Gibbon & Tymms, 2002). Finally, using an SPFS
may have a demotivating impact on teachers, especially in poorly
performing schools (Van Petegem, Vanhoof, Daems, & Mahieu, 2005).
2.3. Framework for SPFSs
The central aim of this study is to explore the variety in characteristics of
SPFSs and to reveal the underlying rationales. School performance feedback
systems, however, cover a wide range of characteristics. In this study, we
focus on the following main aspects of school performance feedback (see
Table 2): the data gathering process, the data analysis, and the content of
the feedback report, with a focus on the numerical measures and graphical
representations used. Describing the data gathering and analysis is crucial
to get a view on the reliability and validity of the feedback produced. In
order to get a view on the relevance of the feedback system, the content of
the feedback reports of the selected systems is described. Finally, we
focus on the feedback representations used. This includes both the
numerical measures and graphical representations used, to get a view on
the interpretability of what is fed back to schools.
Table 2.
A framework for comparing SPFSs
SPFS characteristics
Data gathering
- Data administrators (e.g., school team members, field workers from SPFS)
- Medium (e.g., paper pencil, computer)
- Structuredness of instruments (e.g., completely structured, semi structured,
computer adaptive)
- Types of instruments (e.g., tests, interviews, surveys, observation scales)
- Data source (e.g., pupils, teachers, parents)
- Timing (e.g., any time, fixed moments)
- Place (e.g., classroom, computer lab, playground)
- Options in test administration (e.g., fixed, flexible or demand driven supply)
Data analysis
- Type of analysis (e.g., quantitative, qualitative)
- Scaling model (e.g., Classical Test Theory, Item Response Theory)
- Model used (e.g., regression model, Ordinary Least Squares, multilevel
analysis)
- Type of value added (e.g., prior, concurrent)
- Levels of unit (e.g., pupil level, year group level, school level, cohort level,
subscale level, item level, subject level, aggregate level)
- Measurement moments (e.g., single measurements, successive
measurements, two linked measurements, longitudinal measurements)
Feedback content
- Variables (e.g., attitudinal, behavioural, cognitive, contextual)
- Subjects (e.g., language, mathematics, science, world orientation)
- Non subject specific information (e.g., school culture, pupil background
variables, pupil mobility, socio-emotional development, ADHD scale,
attitudes to school, dyslexia, study skills)
- Reference group (e.g., national average, representative sample of
population, group of participating schools)
- Type of reference (e.g., self-referenced, norm referenced, criterion
referenced)
- Reliability indication (e.g., confidence intervals, significance values)
- Text content (e.g., results, interpretation of results, explanation of statistical
concepts and graphical representations, information on how to
communicate results)
- Numerical measures (e.g., raw scores, expected scores, cut off score, gain
score, mean score, predicted score, value-added score)
- Feedback medium (e.g., static reporting, flexible tool)
- Graphical representations (e.g., bar graph, box plot, histogram, layer graph,
line graph, pie graph)
- Reliability indices (e.g., confidence intervals, significance values)
3. Method
3.1. Instruments
In order to create a framework for describing SPFSs a qualitative method
has been used. Literature on SPFSs reveals that the framework developed
by Visscher (Visscher, 2002; Visscher & Coe, 2003) is the most frequently
cited and used (e.g., in Maier, 2010; Schildkamp & Teddlie, 2008;
Schildkamp & Visscher, 2009; Verhaeghe et al., 2010; Zupanc, Urank &
Bren, 2010). This framework discerns four sets of factors influencing the use
of the performance feedback, including the design features of the
underlying SPFSs and the characteristics of the feedback report itself. In
addition to the framework of Visscher, more concrete features will be used
to compare the selected SPFSs. These were chosen based on a literature review
on performance indicators, data-driven decision making, data use and
SPFSs. This resulted in a framework that makes it possible to analyze and
describe SPFSs.
This framework for describing SPFSs was restructured in the form of a
survey. All options were listed and explained in an MS Word file,
including 46 items (11 items with background information to identify the
SPFS, 9 items on the data gathering process, 7 items on the data analysis,
14 items on the content of the feedback report and the concepts used, and
5 items on the graphical data representation).
Almost all questions were multiple-choice items, supplemented by some open
questions. Depending on the item, the respondents were requested to
provide additional explanation.
3.2. Selected SPFSs
The five systems described in this study were purposefully selected because
of their diversity in feedback characteristics and because of the availability
of information on these systems. The selection does not strive for
representativeness but serves to illustrate and describe exemplary cases. First, each
selected SPFS is briefly described:
• Assessment Tools for Teaching and learning (asTTle): AsTTle has been
developed as part of a government funded research project at the
Visible Learning Labs of the University of Auckland in New Zealand. This
SPFS offers schools a national assessment model with all characteristics
of an SPFS, without the negative consequences of high stakes testing.
This feedback production should help to make teachers acquainted with
the national curriculum, to enhance future teaching and learning. About
80% of all elementary and high schools of NZ are using the asTTle (year
4-12). Participation is voluntary and free of charge. The feedback is
offered both in English and Maori, which have two distinct curricula.
Feedback reports are delivered directly and immediately to school team
members and pupils/students and parents via a secured online website
or via software used on the local network. No results are made
public. A remarkable option of asTTle is the direct feedback delivery to
students and parents. The technological applications allow pupils to get
access to their results during their school career, across all years
and schools. In summary, asTTle functions as a professional monitoring
system as the purpose is to create a low-stakes assessment system to be
used internally within the schools. As it provides the function of
following individual learning paths, it serves as a pupil monitoring
system. However, the main function is the detection of learning needs
on an aggregate level.
• Performance Indicators in Primary Schools (PIPS): PIPS was developed
by the Centre for Evaluation and Monitoring at Durham University
(UK). It is widespread in primary schools (from reception to year 6), in
England and Scotland and, on a smaller scale, in the other parts of the UK.
Furthermore, local adaptations of the system are applied
worldwide. Within the UK, independent schools show the largest
interest in PIPS, as compared to the government funded schools, as they
lack monitoring systems and information on national testing because
they do not follow the national curriculum. As the access to PIPS is not
cost-free, schools have to use their school budgets. All participation is
voluntary, although some schools are strongly encouraged to participate
by their Local Authorities. PIPS started as a research project, which
transferred its services to schools. In some cases, Local Authorities also
get direct access to the data of their schools, if they have paid for the
assessments. They are not allowed to make these results public and are
supposed to use the data for supporting schools. The feedback is
delivered via regular mail (to the PIPS coordinator at the school) and via
a secured electronic portal. Depending on whether the assessments
were computer delivered or paper based, feedback production can take
between two days and eight weeks. PIPS functions primarily as a pupil
monitoring system, in addition to being a research project, an SPFS and a
standardized test.
• South African Monitoring system for Primary Schools (SAMP): PIPS
served as a basis for the development of this system. It has evolved
into an almost completely distinct SPFS, developed at the Centre for
Evaluation and Assessment at the University of Pretoria (SA). Due to
resource limitations, feedback is only delivered in the Tshwane Region
for the first year of primary education. Furthermore, only the
government funded schools are reached as these are the schools with
the largest need for accessible assessment systems, in contrast to the
wealthier independent schools. Therefore, this SPFS delivers feedback
for free (limited to 80 learners per school). A very specific factor in the
development of SAMP is the complicated language context of SA, which
has 11 official languages. SAMP is restricted to the three predominant
languages of instruction in that region: English, Afrikaans and Sepedi.
Therefore, SAMP is a small-scale SPFS offering feedback to 22 schools.
All of these schools participate voluntarily. The feedback users are
in the first place the school team members. They are free to
communicate the results with other stakeholders, such as parents, the
department of education, etc. Feedback supply via regular mail is not an
option as there is no assurance the package will reach its destination in
SA. Since many schools lack internet and even computer access,
electronic feedback delivery is not an option either. Therefore, feedback is
delivered at the school to the contact person. This happens four days to
two weeks after data gathering.
• Leerling- en OnderwijsVolgSysteem (LOVS) [Pupil and Educational
Monitoring System]: Similar to PIPS, LOVS is in the first place a pupil
monitoring system (reception to year 7), in addition to being a research
project, an SPFS and a standardized test in the Netherlands. Furthermore, some local
projects (e.g., in Germany, Turkey, Denmark, etc.) make use of the LOVS
software. Unlike the other systems in this study, LOVS is also an
official accountability system (e.g., it is used by the Dutch Inspectorate)
in addition to a professional monitoring system. During inspection visits,
schools may be asked for permission to show their results on these
tests. Furthermore, the inspectorate sometimes strongly encourages
(low-scoring) schools to participate if they make insufficient use of data
sources to demonstrate their functioning. This implies that some schools may
experience participation as an obligation, although voluntary
assessment is the general rule. The wide acceptance of LOVS is indicated by the
fact that 95% of all elementary schools in the Netherlands, including special
needs education, use at least one of the tests. This feedback is
provided by a private company, called CiTO [Central institute for Test
Development]. Due to this private character, schools use their budgets
for the services offered. As a consequence, they are the only owners of
their data. To disseminate their results to externals, schools need the
permission of the parents. The way of delivering feedback depends on
the tests taken. Some results are sent by regular mail, while other data is
provided via an electronic portal, via software on a disk or manually by
means of printed scoring tables. Also depending on the test taken and
the standardization process (based on previous or current reference
groups), feedback delivery takes anywhere from a second to a few months.
• Schoolfeedbackproject (SFP) [School Feedback Project]: The SFP is still
in a developmental phase and is thus not commercially available yet.
The SFP is a research and development project, initiated by three
universities in Flanders (Belgium). As there is no central assessment
system, schools lack information on their performance in relation to the
national average or to results of schools with similar characteristics.
Therefore, a government funded project has been set up for creating a
Flemish SPFS. In this study, only the system developed for primary
education will be described (year 1-6). From 2011, this will be
commercially available, whilst the current sample (representative
reference group of 195 schools) is participating for free. Although
participation was voluntary in general, some school boards decided for
their schools to participate. Results are fed back confidentially to schools
only. In addition, aggregated results are reported to school boards, as
part of the research project. School reports are delivered to the
feedback coordinator at the school by electronic mail. Due to the
developmental phase of the project, feedback generation took several
months. From 2011, feedback will be delivered in an automated and
quick way as the underlying software engine, feedback formats and
reference data are already available. Furthermore, the SFP is developing
a secured electronic portal to upload student data and download school
feedback.
3.3. Procedure
The survey was sent to the directors or coordinators of the five selected SPFSs.
They were informed about the purpose of this study. Additionally,
semi-structured in-depth telephone interviews were held to elaborate on or clarify
some of the answers from the survey and to gather information on the
rationales for opting for certain SPFS characteristics. The telephone
conversations, which took on average 90 minutes, were audio-taped with
permission of the interviewees and transcribed afterwards. The integrated
results from survey and interview were sent to the interviewees for
member checking.
Finally, the integrated files of surveys and interviews were summarized
in separate files for each feedback system. These files were integrated in a
conceptually ordered meta-matrix (Miles & Huberman, 1994) that
facilitates a variable-oriented and case-oriented analysis. Furthermore, this
meta-matrix serves to give a quick overview of the variety in feedback
systems. Parts of this meta-matrix will be illustrated and explained in the
results section.
4. Results: Application of the framework
4.1. Data gathering
Having a view on the data gathering process is of major importance for
evaluating the accuracy of the data on which the feedback is based.
Therefore, the following elements have been included in the framework:
the persons gathering the data, types and structuredness of instruments
used, data gathering medium, time and place of data collection and the
data source. Table 3 gives an overview of the instruments used.
Table 3
Overview of data gathering instruments used in selected SPFSs
asTTle PIPS SAMP LOVS SFP
Completely structured
Domain specific tests X X X X X
Survey on attitudes/socio-emotional development X X X
General achievement test X X
Observation scale X X
ADHD-scale X
Pupil background questionnaire X X
Test on study skills X
Survey on social emotional functioning X
Test of intelligence X
Test on interests X
Semi-structured
Interviews on strategies in mathematics, writing assessments X
Pupil background questionnaire X
Rating scale for evaluation of a technical piece of work X
Computer adaptive
Domain specific tests X X(1) X(2)
Screening instrument for dyslexia X
Other
Automatic upload of pupil background variables from data management system X X X
Observation notes of testing: no structured instruments X
Upload of results from Statutory Assessment Tests X
Note (1): Computer-delivered version of PIPS for Years 1 – 6; all other tests use
stopping rules based on a number of mistakes made, on increasingly difficult items
Note (2): depending on the test taken
In almost all cases (asTTle, PIPS, LOVS, SFP) teachers and/or other school
team members organized the test administration at the school, following
strict testing instructions. Only in the case of SAMP did field workers from
the SPFS guide the assessment. This choice was made to safeguard the reliability of
the data collection and to avoid interrupting teachers in their
teaching. Furthermore, teachers were not only organizing the test, but
sometimes they were also providing data on the pupil’s functioning. In PIPS
and the LOVS for example, they completed observation scales, pupil
background questionnaires and/or surveys on the socio-emotional
functioning. In asTTle, teachers have an even more active role by
composing the test based on predefined parameters and options by using
the testing software tool. Furthermore, parents can also be asked to
provide information. In the case of the SFP, a parent questionnaire is
provided for gathering home and pupil background information.
Not only the testing instructions, but also most testing instruments are
highly structured. Almost all instruments are completely structured. This
means that tests and questionnaires entirely describe and guide the data
collection. In some cases, semi-structured instruments are used. For
example, SAMP does not require schools to complete structured
questionnaires on student background variables, but just lists what
information would be favorable to deliver (due to a lack of pupil
information and lack of computerized management system). In contrast,
asTTle, LOVS and PIPS make use of advanced software options that allow
automatic import of pupil level data from the school’s management
information systems. These three SPFSs additionally provide computer
adaptive testing. This means that test items are presented to pupils
according to their ability level. For example, if a pupil performs well on an
item of intermediate difficulty, a more difficult item will be presented.
Conversely, if the pupil performs poorly, a simpler item will be
presented.
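The adaptive principle can be illustrated with a minimal sketch. It assumes a hypothetical item bank sorted from easiest to hardest and a simple rule that moves one step up or down after each response. The function names are illustrative only and are not taken from the software of any of the selected SPFSs, which rely on more refined IRT-based item selection.

```python
def next_item_index(current_index, answered_correctly, n_items):
    """Step to a harder item after a correct answer, an easier one otherwise.

    Items are assumed to be sorted from easiest (index 0) to hardest.
    """
    step = 1 if answered_correctly else -1
    # Stay within the bounds of the (hypothetical) item bank.
    return min(max(current_index + step, 0), n_items - 1)


def administer(responses, n_items, start_index=None):
    """Return the sequence of item indices presented for a list of responses.

    `responses` is a list of booleans (True = correct), one per presented item.
    The test starts at an item of intermediate difficulty unless told otherwise.
    """
    index = n_items // 2 if start_index is None else start_index
    presented = []
    for correct in responses:
        presented.append(index)
        index = next_item_index(index, correct, n_items)
    return presented
```

With a bank of nine items, a pupil who answers the first two items correctly and the next ones incorrectly is first moved towards harder items and then back towards easier ones.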
Testing pupils can be very time consuming, especially in the case of younger
children. As they have not yet mastered reading or writing, one-on-one
oral testing is often necessary. In this case, the instructor provides the
explanation following the guidelines and the pupil provides the answers
(e.g., in PIPS and SAMP). In other cases, one-on-one testing is required
because of the nature of the test (e.g., reading fluency in SFP and LOVS). In
still other cases, one-on-one testing is optional, as the testing medium allows
both individual and whole-class testing at any time. This is the case for asTTle,
as each pupil owns a personal computer and the software adapts the
standardization of the scores to the moment of testing. More rigid systems
with paper-pencil tests, computerized tests in computer labs and/or fixed
measurement moments will be more likely to adopt whole classroom
testing (PIPS, LOVS, SFP).
The place of testing is highly related to infrastructural characteristics.
Mostly, tests take place in the classroom if printed booklets are used
(asTTle, PIPS, LOVS, SFP), or in the computer lab for computerized versions
(PIPS, LOVS). Testing administration of asTTle is flexible because of
technological provisions. Even testing at home is feasible. In the case of SAMP,
each testing situation is slightly different, as an appropriate place has
to be found in each school (e.g., in the staff room, under a shady
tree).
4.2. Data analysis
In this section, we will focus on the underlying scaling model used, on the
data analysis model used including value-added measures, on the
opportunities for longitudinal measurements, on the inclusion of pupil
mobility and on the levels of aggregation used. Being informed about the
data analysis of SPFSs is a prerequisite for making judgments on the
accuracy of the feedback. In all feedback systems, testing data are analyzed
quantitatively. We will focus in this section on the variety in these
techniques used.
First, the underlying scaling models have been examined. Item Response
Theory (IRT) is underlying all SPFSs to some degree. This technique
estimates several parameters, including the difficulty level of the items and
the skill scores of the respondents. By creating one skill scale that relates
different tests in a certain domain, IRT offers opportunities for longitudinal
measurements or computer adaptive testing. In SAMP, IRT has been applied
for defining the item parameters and composing tests.
AsTTle, PIPS, LOVS and SFP go further and use IRT for defining
ability test scores for the respondents for certain test versions. The IRT
model that has been used most widely is the Rasch model (in asTTle, PIPS and SAMP).
The techniques used in LOVS depend on the test taken and SFP uses a more
complex 2-parameter model. The system taking the most advantage of IRT
is asTTle. In combination with the possibilities of the software tools,
teachers are enabled to compose tests from an item bank with different
degrees of difficulty. Besides IRT, Classical Test Theory (CTT) is applied in all
systems. This is not only used for analyzing data from interviews, surveys
and/or observation scales (asTTle, PIPS, LOVS), but also for some tests
(SAMP, SFP) which require no further analysis than a sum score.
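To make the difference between these scaling approaches concrete, the sketch below gives the standard logistic item response function and a plain sum score. It is an illustration under textbook assumptions, not the estimation procedure of any of the selected SPFSs, and the function and parameter names are chosen here for illustration. With the discrimination parameter fixed at 1, the function corresponds to the Rasch model; letting it vary per item gives a 2-parameter model.

```python
import math


def irt_probability(ability, difficulty, discrimination=1.0):
    """Probability of a correct response under a logistic IRT model.

    With discrimination fixed at 1.0 this is the Rasch model; letting it
    vary per item gives the two-parameter logistic (2PL) model.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def sum_score(item_scores):
    """Classical Test Theory style raw score: the sum of the item scores."""
    return sum(item_scores)
```

A pupil whose ability equals the item difficulty has a probability of 0.5 of answering correctly under both models. A higher discrimination value makes that probability change more steeply around the item difficulty.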
Only PIPS and SFP make explicit use of value-added measures. These
measures indicate to what extent scores (raw or adjusted) are above or
below an “expected” value. The expectations are based on statistical
analyses that estimate the impact of independent variables such as
cognitive aptitude, prior achievement and socioeconomic background.
Value added is reported in PIPS and SFP as a measure of the school’s
influence on the pupil’s performance. PIPS makes a distinction between prior and
concurrent value added. For estimating the former type, a general
achievement score is taken into account, as an aggregate of subject specific
test scores and a developed ability score. Concurrent or contextual value
added only includes the developed ability score. SFP, by contrast, uses both
student background variables and prior achievement scores in the
estimation of contextual value added. The two systems differ in this
conception: student background variables are seen either as redundant or as
necessary variables to be included in the model. Furthermore, the
value-added realizations differ significantly in their level of reporting. While SFP is
convinced that value added should only be reported at an aggregate level,
PIPS allows pupil level residual analysis. LOVS implicitly applies the notion
of value added measures by reporting the difference in growth of the
school as compared to the reference group.
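The notion of value added as the difference between an observed and an “expected” score can be sketched as follows. This is a deliberately simplified illustration, assuming a single predictor (prior achievement) and one pooled ordinary least squares regression. It does not reproduce the actual model specification of PIPS or SFP, and all names are hypothetical.

```python
from collections import defaultdict


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def school_value_added(prior, current, school):
    """Mean residual (observed minus expected score) per school.

    `prior`, `current` and `school` are parallel lists with one entry per pupil.
    A positive value means pupils scored above expectation on average.
    """
    intercept, slope = fit_simple_ols(prior, current)
    residuals = defaultdict(list)
    for p, c, s in zip(prior, current, school):
        expected = intercept + slope * p
        residuals[s].append(c - expected)
    return {s: sum(r) / len(r) for s, r in residuals.items()}
```

The pupil level residuals computed inside the loop correspond to the kind of pupil level residual analysis mentioned for PIPS, while the returned school means correspond to reporting at an aggregate level only.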
When focusing on the statistical model underlying the feedback
production, a large variety in complexity can be noticed. While some SPFSs
strive for complexity to provide a nuanced view on school performance
data (such as SFP and LOVS), others consciously avoid model complexity in favor
of transparency for feedback users. For example, in the calculation of
value added, PIPS applies ordinary least squares regression, in contrast to the
multilevel piecewise growth curve models of SFP. Other systems do not use
regression models, either because they do not intend to predict scores or calculate
value added in order to keep the low-stakes character of testing (asTTle), or
because they are still in a development phase with limited capacity for growing
complexity (SAMP).
The type of statistical models used defines the options for longitudinal
measurement. This means that scores for pupils are linked to each other
over time. Whether or not learning progress can be measured, depends on
the scale used. In case of asTTle, progress is estimated on one underlying
IRT ability scale, which links all tests in a certain domain. PIPS uses a scale of
standardized scores (either obtained by CTT or IRT) and puts these scores
on a time line. SFP, in contrast, does not just place the (rescaled) IRT scores on a
time line but provides both raw scores and scores adjusted for the
influence of both prior achievement and pupil background characteristics, using a
multilevel piecewise repeated measures model. This gives a different
conception of growth and longitudinal measurement. The adjusted scores
do not express the actual achievement levels, but the levels that would
have been achieved if the pupils had had the same background
characteristics as the reference group. Also for some tests of the LOVS,
adjustments have been applied for pupil background characteristics.
Another factor delineating opportunities for longitudinality is the
number of measurement occasions. AsTTle, PIPS, LOVS and SFP for example
offer tests with (at least) three linked measurement moments, while SAMP
only tests pupils at the start and end of the first year of primary education.
In all systems, users decide whether to participate in single or
successive measurements (repeated for different cohorts or not).
Finally, it is of major importance to stress the influence of pupil mobility,
in particular when longitudinal data are presented for a cohort. A
consequence of pupil mobility is that values are missing for pupils who left
a testing sample by changing classes or schools. Therefore, multilevel
modeling is preferred, because missing data do not prevent the estimation of
growth curves: all available pupil scores are taken into account in the
estimation procedure. However, when estimating more complex
longitudinal models, taking into account pupil mobility requires cross
classifications, which may overburden the capacities for statistical analysis.
A final aspect to be discussed in this section is the reported aggregation
level for respondents and content. With regard to the respondents, all
systems opt to report pupil level data, with exception of the SFP, which is
designed for evaluating and informing school policy with a focus on
aggregated data. The adjusted scores and value-added scores are in their
view only valid for aggregated data as this cancels out measurement errors
and bias by averaging. Furthermore, the model complexity does not allow
generating data on several aggregation levels, due to pupil mobility. The
systems that report pupil level data (asTTle, PIPS, SAMP and LOVS) also
report data at aggregated levels such as classroom level, group level, school
level, etc. In these cases, the aggregated scores are easily obtained by
averaging the pupil scores for a certain group. Besides the respondent level,
the reported content level is also determined by both convictions and
methodological considerations. All systems report at (broad) subscale and
subject level. Only PIPS reports on the item level (only for reception
feedback) as this would have the largest information value to inform
planning in the classroom. AsTTle intentionally does not report on the item
level as this would lead to teaching to the test. Items are therefore just
considered as indicators of subjects. Another restriction for reporting item
level scores depends on the objective of the test taken. As SFP developed
tests for determining learning gains (which requires avoiding ceiling effects)
and not for diagnostics (which requires identifying outliers), it is not appropriate
to report item scores. According to the SFP, this implies that these tests are not
suitable for discerning detailed subscales either, as these would not meet the
psychometric standards of reliability.
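The aggregation of pupil scores by averaging, and the reason why averages are less affected by random measurement error, can be illustrated with a short sketch. It assumes simple parallel lists of scores and group labels, and the function name is hypothetical. Note that averaging reduces random error only; it does not remove systematic bias.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def aggregate_scores(pupil_scores, groups):
    """Return {group: (mean, standard_error, n)} for parallel lists.

    The standard error of the mean shrinks with the square root of the group
    size, which is why averaging dampens random measurement error.
    """
    by_group = defaultdict(list)
    for score, group in zip(pupil_scores, groups):
        by_group[group].append(score)
    summary = {}
    for group, scores in by_group.items():
        n = len(scores)
        # stdev needs at least two scores; report no standard error otherwise.
        se = stdev(scores) / math.sqrt(n) if n > 1 else None
        summary[group] = (mean(scores), se, n)
    return summary
```

The same function can be applied at classroom, year group or school level by changing the group labels that are passed in.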
4.3. School performance feedback content
This section contains a description of the subjects and topics that are
reported in the feedback reports, the conceptual representations
(performance indicators) and reference groups used, and the sections
offered in reporting. Following the quality standards of performance
indicators, the feedback content has to be relevant and useful.
Furthermore, SPFS users should accept the performance indicators
and consider them to be fair.
Regarding the contents that have been tested in the selected SPFSs, we
can refer to Table 3, in which the data gathering instruments are described.
This table shows that all systems use domain specific tests. These refer in all
cases to language and mathematics tests with different subscales. Some
systems broadened their supply with tests for science (PIPS), English as a
foreign language (SAMP, LOVS), and/or technique and world orientation
(aggregation of geography, history and environmental science in LOVS).
The other data instruments in Table 3 report non-cognitive
measures, such as attitudinal, behavioral and contextual contents.
Concerning behavioral scales, some interesting examples can be noted. PIPS,
for example, offers a scale for detecting ADHD and LOVS one for dyslexia,
whilst handwriting is tested by asTTle and SAMP. To illustrate attitudinal
measurements, there are measures of attitudes related to subjects (asTTle,
PIPS, SAMP and LOVS), to the school culture in general (asTTle, PIPS and
SAMP) or to socio-emotional development (LOVS). Related to contextual
information, informing schools about their pupil mobility is of major
importance to get a view on their functioning. Why are pupils leaving?
Which newcomers are schools attracting? Which pupils go to special
education? Was the school aware of the huge number of pupils with
learning lags? These are some of the questions that stimulate reflection at
the school level, transcending individual learning pathways. Only the SFP
specifically reports on this.
Numerical measures
A wide range of numerical measures have been reported in the SPFSs in this
study. Table 4 gives an overview.
Table 4.
Numerical measures used in selected SPFSs
asTTle PIPS SAMP LOVS(1) SFP
Type of scores
Adjusted scores X X
Expected scores X X X X X
Predicted scores X X
Raw scores X X X X X
Numerical measures
Band score X X
Cut-off score X X
Grade score X X
Learning gain score X X X(2) X X
Mean score X X X X
Percentage score X X
Percentile score X
Rescaled score X X X
Standardized score X X
Value-added score X X
Note (1): Depending on the tests taken
Note (2): SAMP also registers loss scores besides gain scores
Raw scores are fed back in all cases, as well as expected scores, as the
latter are reflected by the average for the reference group. Predicted
scores, resulting from regression analyses, are used for making predictions
for future performance, based on the current pupil achievements (PIPS,
LOVS). Adjusted scores are rarer as these require more advanced statistical
analysis (LOVS, SFP). All these types of scores are rescaled into meaningful
units for the users. For example, scales are created with a mean of 50 and a
standard deviation of 15. All these transformations are somewhat arbitrary,
as there are no conventions on which scales, bands or grades are preferable.
Mostly, test scores have been transformed in relation to the local context.
For example, asTTle and PIPS convert scores to grades in accordance with
the national curriculum, SAMP rescales to five-point scales that teachers are
familiar with, and LOVS expresses scores in line with the preferences of the
inspection authorities.
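The rescaling to a mean of 50 and a standard deviation of 15 can be sketched as a linear transformation of standardized scores. The function below is a minimal illustration under the assumption that the mean and standard deviation are taken from a reference group; the function name, parameters and data are hypothetical and do not represent the procedure of any particular SPFS.

```python
from statistics import mean, pstdev

def rescale(scores, reference, target_mean=50.0, target_sd=15.0):
    """Express raw scores on a scale with the given mean and SD.

    The mean and SD of `reference` (the assumed reference group) define
    the standardization; `scores` are the raw scores to be reported.
    """
    ref_mean = mean(reference)
    ref_sd = pstdev(reference)
    if ref_sd == 0:
        raise ValueError("reference scores must not all be equal")
    return [target_mean + target_sd * (s - ref_mean) / ref_sd for s in scores]

# Illustrative raw scores for a reference group.
reference_raw = [10, 12, 14, 16, 18]
rescaled_reference = rescale(reference_raw, reference_raw)
```

A pupil scoring exactly at the reference mean receives 50 on the new scale; the choice of 50 and 15 is a convention, which is the sense in which such transformations are arbitrary.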
Feedback reports may contain more information than the mere testing
results. The explanation on how to interpret the results is only provided in
the feedback reports of PIPS and SFP. Other systems provide this
information in the accompanying manual. When it comes to searching for
explanations for the results for a specific school, no further help is provided
in any report. However, asTTle, LOVS and SAMP take considerable
initiatives in offering remediation material. AsTTle is the most advanced
system, offering supporting material for teachers in accordance with the
achieved grade levels per pupil and group.
What information can be deduced from the reports also depends on the
types of reference offered (norm, criterion or self reference).
These three forms of reference offer different benchmarks against which a
school can measure its own functioning. All systems offer a norm to compare
the results with. In most cases, this reference is the national average (or
that of a representative sample). Only SAMP, being a small-scale local
project, cannot realize this. Instead, SAMP offers the opportunity to compare
with schools of the same language group within the sample. AsTTle and LOVS
allow comparison with schools that are similar on certain characteristics.
These features foster fair comparison, with the same underlying idea as
providing adjusted scores (comparing like with like), although adjusted
scores are estimated with a different calculation procedure. Criterion-based
references are less prevalent (asTTle and SAMP) as these imply an absolute
instead of a relative point of comparison. In these cases, the references
are the cut-off scores used. Opportunities for self reference are offered
in all systems by allowing schools to compare results over time, either
within cohorts (cf. gain scores, longitudinal measurements), or between
cohorts (multiple measurements with different year groups).
Representation modes
With regard to representation modes used in the feedback reports, we will
discuss the medium used to present the results, the graphical
representations, and the reliability indices (see Table 5).
SPFSs differ in the feedback media used to report the results. These
media are related to the flexibility for users in choosing representations or
manipulating their feedback output. In AsTTle, PIPS and LOVS, users can
select different types of representations by software tools or Excel macros.
They may for example select a table to present exact data, and growth
curves to show trends. SAMP and SFP are less flexible: these SPFSs provide
the user with a printed or digital PDF report of the results. SAMP
additionally reports the results in Excel sheets, which users can use to
perform secondary analyses.
The graphical representations offered by the different SPFSs differ as
well. Some systems only include simpler forms of representation, such
as bar graphs, cross tables and histograms (SAMP). Others include more
complex graphical representations of the results, such as scatter plots with
regression lines (PIPS), line graphs (SFP) and layer graphs (LOVS).
The school performance feedback results are based on certain statistical
analyses, which include a certain measurement error. To enable users to
judge the accuracy and importance of their findings, information on the
uncertainty surrounding the results has been incorporated in asTTle, PIPS,
LOVS and SFP. These uncertainties can be indicated by adding confidence
intervals. AsTTle, PIPS and LOVS present confidence intervals in either bar
graphs (asTTle and LOVS) or longitudinal progress charts (PIPS). SFP
represents uncertainty by marking significant values in cross tables. SAMP
prefers not to present confidence intervals, as this would make the
interpretation of the results too complex for the users. Instead, they warn
the users not to overinterpret small differences or shifts in scores.
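To make the notion of uncertainty around a reported group mean concrete, the following Python sketch computes an approximate confidence interval for a mean score. It is a simplified illustration, not the method of any SPFS discussed: it uses a normal approximation (for small groups a t-distribution would give somewhat wider intervals), and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(scores, level=0.95):
    """Approximate confidence interval for the mean of a group of scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    centre = mean(scores)
    standard_error = stdev(scores) / sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * standard_error, centre + z * standard_error

# Illustrative data: a small class and a larger group with the same mean.
small_class = [40, 45, 50, 55, 60]
large_group = small_class * 4

small_low, small_high = mean_confidence_interval(small_class)
large_low, large_high = mean_confidence_interval(large_group)
```

With these invented data the interval for the small class is clearly wider than that for the larger group, which is why small differences between classes or cohorts should not be overinterpreted.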
Table 5.
Overview of representation modes in selected SPFSs
asTTle PIPS SAMP LOVS(1) SFP
Medium: fixed
Printed report X X X
PDF version X X X
Medium: flexible tools
Online tools X
Software applications on local network X X
Excel sheet X
Excel macros in sheet X
Graphical representations
Bar graph X X X
Box plot X X
Cross table X X X X X
Divided bar graph X X X
Grouped bar graph X
Histogram X
Layer graph X
Line graph X X X
Multipanel display X
Pie graph X X
Scatter plot with regression line X
Side by side graph X
Other: e.g., schemes, iconic representations X X
Reliability indices
Confidence intervals X X X
Significance values X
Note (1): Depending on the tests taken
5. Discussion
As evaluation and data-driven decision making are receiving increased
attention in education, more and more SPFSs are being developed and used
worldwide. However, little research is available on the characteristics of the
different SPFSs. Studies show that these characteristics may influence the
degree to which the feedback is actually used for school improvement (e.g.
Schildkamp & Visscher, 2009; Verhaeghe et al., 2010). Therefore, it is
important to carefully consider the characteristics of an SPFS, when
developing or selecting one to use. Users need to purposefully choose the
type of SPFS that corresponds to their information needs. This requires
transparency on the characteristics of different SPFSs. Therefore, in this
article, we developed an exemplary framework for identifying SPFS
characteristics, whose usability has been demonstrated by applying it to five
different SPFSs. We illustrated the variety in the data gathering processes,
the type of analyses, and the content of the feedback, including the numerical
measures and representation modes used. The goal of this study was not to
judge the different SPFSs, but to highlight some issues concerning SPFS
characteristics.
With regard to data gathering, all SPFSs studied mainly offer completely
structured instruments, such as cognitive tests, questionnaires on
socio-emotional development, and diverse types of scales. Additional
semi-structured instruments such as interviews or rating scales are provided
as well. All are accompanied by strict prescriptions on how to gather the data.
Providing highly structured instructions and instruments is a prerequisite
for standardized and reliable data collection (Fitz-Gibbon, 1996; Fitz-Gibbon
& Tymms, 2002), especially if data are gathered by school staff. Data
collection by school team members only leads to accurate data in the case
of low-stakes testing (Fitz-Gibbon & Tymms, 2002; Yang et al., 1999; Smith,
1995). As data collection is very time consuming, technological tools may
offer several advantages. Therefore, initiatives such as computer adaptive
testing or automatic upload of data from management information systems
(as in asTTle, PIPS and LOVS) facilitate efficient data gathering. These
tools not only prevent pupils and teachers from being overburdened by data
collection, but also foster targeted data collection. However, only highly
advanced SPFSs adopt these tools. Furthermore, these software tools
cannot be applied in all contexts due to infrastructural limitations.
As data-driven decision making is a cyclic process (cf. Plan-Do-Check-Act-
cycle, Deming, 1986; Verhaeghe et al., 2010), repeated testing for different
cohorts and year groups is required. Furthermore, systematic measurement
with small time lags will result in the most reliable trend (van de Grift,
2009). In all the SPFSs in this study, the users can therefore choose
the time intervals at which to test.
With respect to the content of the feedback, the SPFSs in this study focus
rather narrowly on a few cognitive outcomes (e.g. language, mathematics
and/or science), which are part of the core curriculum in all countries.
Developers of SPFSs might consider how to include other subject areas in
the SPFSs, as well as more attitudinal, behavioral and contextual
information. If school staff want to make informed decisions on how to
improve their education, they need different types of data (Schildkamp &
Kuiper, 2010). AsTTle, PIPS and LOVS have taken the first steps, but other
types of data may be considered as well, such as data on the functioning of
teachers (e.g. teacher and student questionnaires). Moreover, schools
already have other types of data available (e.g. school data such as
achievement tests from other subjects, inspection reports, parent surveys,
class tests). It is important to consider these different types of data and
data sources in schools as well, in order to make a comprehensive
evaluation of the schools’ functioning (Schildkamp & Kuiper, 2010). A
preferable scenario to foster this data triangulation would be the
development of integrated management information systems (Bosker,
Branderhorst, & Visscher, 2007). In order to obtain an integrated system,
more coherence in data conceptualization and representation is required,
not only between different data sources, but also between different
instruments of the same SPFS. SPFS developers can take a first step in
this direction by creating more conformity in data analyses and
representations.
With regard to the data analysis, it is important to find a balance between
statistically correct - and thus complicated - analyses and accurate results
on the one hand and understandable analyses and user friendly results on
the other. For example, the analyses used in PIPS are fairly straightforward
and not too complex. Schools can understand the results, and studies show
that schools feel ownership over the results (Tymms & Albone, 2002),
which directly influences the degree to which the feedback is actually used
(Kyriakides & Campbell, 2004; Schildkamp & Teddlie, 2008). However,
because the system does not use multilevel analyses, schools are
sometimes wrongly classified as, for example, underperforming. To reduce
these misclassifications, some researchers claim that it is necessary to apply
multilevel models (Goldstein & Spiegelhalter, 1996; Karsten, Visscher,
Dijkstra, & Veenstra, 2010). According to Yang et al. (1999), it is possible
to explain these multilevel models and outcomes to head teachers in an
understandable way. In contrast, others consider multilevel modeling as
inappropriate for feedback purposes and claim that the method of Ordinary
Least Squares is accurate enough and more understandable to schools (Fitz-
Gibbon, 1996; Fitz-Gibbon & Tymms, 2002; Sharp, 2006). Whatever
statistical analysis an SPFS uses, it should inform its users on the associated
constraints, as it presents an image of being a fair performance indicator
system.
Moreover, it is important to realize that any type of measurement
always includes some type of error. Statistical estimates always include
uncertainty, which needs to be taken into account in any interpretation.
This especially holds for small groups, such as classes and cohorts within
schools. An SPFS should therefore provide information on limitations and
uncertainties, and provide information on the reliability of the estimates
(Fitz-Gibbon & Tymms, 2002; Mortimore & Sammons, 1994; Rowe, 2004;
Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996; Yang et al., 1999;
Karsten et al., 2010). These reliability indices are for example applied in
AsTTle, PIPS, LOVS, and SFP.
If school level data is intended to be used for making fair comparisons
with reference groups, it is advisable to work with value-added models.
Value-added is usually defined as everything the pupil has learned at
his/her school (e.g., van de Grift, 2009). However, the concept “value-
added” is not unproblematic (van de Grift, 2009). It is not possible to assess
everything a pupil has learned, such as social and creative abilities.
Furthermore, because pupils change schools and classes, different
schools and classes have an influence on the pupils’ learning progress. It is
also not clear how this learning progress should be measured, or how to
take into account the knowledge and skills acquired outside the school. As
a result, several problems have been associated with applying value-added
modeling (van de Grift, 2009; Karsten et al., 2010). We discuss some
important issues for this study.
Firstly, there is the problem of missing values, which may distort the
results. In this study, primarily in the SFP, serious attention has been
devoted to this issue (Knipprath & Verhaeghe, 2010). Moreover, these
missing data might not just be random, but might be the result of certain
interventions (e.g. grade repeating) in schools. Incorporating the impact of
these missing values in the estimation procedures would be advisable
(Sanders, 2006; van de Grift, 2009; Yang et al., 1999).
Next, there is the instability of value-added judgments. Therefore, it is
recommended to use data on successive cohorts (at least 3 school years;
van de Grift, 2009), to use longitudinal measurements (Heck, 2006) or to
average scores over several years (OECD, 2008). Also cross-sectional data
analysis might be used as it might have several advantages as compared to
longitudinal testing (Luyten, 2006; Sammons & Luyten, 2009).
Thirdly, there are different procedures for computing value-added
models, which lead to different rankings of schools (Fitz-Gibbon, 1996;
Goldstein & Spiegelhalter, 1996; Heck, 2006; OECD, 2008; Rowe, 2004;
Sanders, 2006; van de Grift, 2009; Yang et al., 1999). For example, there is
no consensus on the inclusion of student background characteristics in the
models used in the SPFSs in this study. As student achievement results are
influenced by prior achievement and student background characteristics
(such as gender and SES), several researchers stress that corrections for
these out-of-school influences might be required (Goldstein & Thomas,
1996; Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996; Heck,
2006; Yang et al., 1999; Karsten et al., 2010; Sanders, 2006; Rowe, 2004).
Fourthly, a value-added model has only limited predictive validity for
certain schools (e.g., for schools with large SES-gaps). SPFSs that use value-
added models should therefore always be careful with categorizing schools
as underperforming, and should use labels such as durably underperforming or
durably outperforming instead of ranking schools (van de Grift, 2009).
Goldstein and Thomas (1996) and Yang et al. (1999) also recommend using
these kinds of procedures only to identify “institutions at extremes”, as a
screening device to detect problems. Furthermore, one should always keep
in mind that value-added measures are only relative indicators of school
performance that should be interpreted against the reference group (Rowe,
2004; Karsten et al., 2010). Comparing institutions based on statistical
models will always require prudence (Goldstein & Spiegelhalter, 1996).
Finally, users have difficulties when interpreting value-added data
(Karsten et al., 2010; Santelices & Taut, 2009; Vanhoof, Verhaeghe,
Verhaeghe, Valcke, & Van Petegem, in press). Users should be supported to
gradually acquire expertise in data interpretation by, for example, being
offered more, and more diverse, value-added models (Schatz, VonSecker, & Alban,
2005). Also other conceptualizations in terms of “school contribution”
(Santelices & Taut, 2009) or “residual analysis” (Fitz-Gibbon, 1996; Schatz,
VonSecker, & Alban, 2005) might foster correct understanding.
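The idea of “residual analysis” can be illustrated with a deliberately simple Python sketch: an ordinary least squares line predicts each pupil's outcome from prior achievement, and a school's indicator is the mean of its pupils' residuals. This is an assumption-laden illustration only. It uses a single predictor, ignores the multilevel structure and the uncertainty of the estimates, and is not the procedure of any SPFS discussed; all names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def school_residuals(records):
    """Mean residual per school for records of (school, prior, outcome)."""
    slope, intercept = fit_line([r[1] for r in records], [r[2] for r in records])
    by_school = defaultdict(list)
    for school, prior, outcome in records:
        predicted = intercept + slope * prior
        by_school[school].append(outcome - predicted)
    return {school: mean(res) for school, res in by_school.items()}

# Invented data: both schools have the same intake, but different outcomes.
records = [
    ("School A", 40, 47), ("School A", 50, 57), ("School A", 60, 67),
    ("School B", 40, 43), ("School B", 50, 53), ("School B", 60, 63),
]
residuals = school_residuals(records)
```

A positive mean residual indicates that pupils scored higher than predicted from their prior achievement relative to the reference group in the data; it is a relative indicator and says nothing about absolute performance.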
In addition to the discussion on the accuracy of the reported feedback
data, we want to stress that relevant and fair feedback information does
not always need complex statistical modeling. For example, information on
pupil mobility is highly relevant to inform school policy. This variable should
not be used merely as a covariate or grouping variable or to determine
missing data (van de Grift, 2009). As only SFP explicitly reports on the
number of students entering, staying in and leaving the cohort each
year, this could be a consideration for other SPFSs. Moreover, initiatives
such as asTTle that keep track of student results during their whole career
and make those results accessible to them should be encouraged. By setting
aside the value-added aspect, asTTle reports student level data for all
students, irrespective of which schools they have attended before.
Furthermore, we
stress that besides adjusted scores, the raw scores need to be reported
because of their informative value.
After having analyzed the data, SPFS developers need to carefully consider
what types of numerical measures and graphical representations to offer to
the users. Research has revealed that even simple numerical conceptions
and representations are often interpreted incorrectly (Earl & Fullan, 2003;
Zupanc et al., 2009). A sufficient level of assessment literacy is a
prerequisite for correct understanding. If not, proper support initiatives
should be provided. Furthermore, other ethical issues are related to the
fairness of the data, for example the arbitrary or unfair boundaries used in
the case of cut-off scores (Fitz-Gibbon & Tymms, 2002; Heck, 2006).
Furthermore, explorations in terms of absolute instead of relative measures
of performance need to be encouraged (Luyten, 2006). To make users aware of
all these risks, they need to be properly informed (Karsten et al., 2010).
Furthermore, it might be worthwhile to offer different types of
representations that serve different purposes (e.g. band scores for detecting
outliers and line graphs for visualizing growth; Kosslyn, 2006). This idea
has been applied in asTTle, which provides seven types of reporting serving
different purposes.
SPFS developers should think carefully about each of these characteristics
and keep in mind that the use of school performance feedback does not
always lead to improvement, but that it should at least do no harm
(Fitz-Gibbon & Tymms, 2002; Rowe, 2004). Moreover, they should consider
offering training in the interpretation and use of the results, especially
when using more complex statistical modeling, as studies have shown that
SPFS use without proper training is difficult (Schildkamp & Visscher, 2009;
Verhaeghe et al., 2010; Vanhoof et al., in press). Furthermore, it is advisable
to provide the users with indications on what instructional or organizational
processes should be improved upon (Coe & Visscher, 2002; Karsten et al.,
2010; Verhaeghe et al., 2010), which has not been done by most of the
SPFSs in this study.
6. Conclusion
We believe that the components of SPFSs discussed in this study are
important aspects in ensuring that the SPFSs that have been developed all
over the world will be used as they are intended to be: for school
improvement purposes. However, we also believe that there is much to be
gained when it comes to developing SPFSs that provide schools with
reliable, valid and user friendly data. Decisions made by SPFS developers
about the design of the SPFS impact the results in ways that are not yet
fully understood, but can have implications for determining how “strong or
poor” a school’s performance is. Expanding and adjusting the preliminary
framework we developed into a set of standards that SPFS developers and
schools can use may aid in developing efficient instruments for data-driven
decision making.
Acknowledgement
We would like to express our sincere gratitude to the directors and
researchers of the SPFSs involved in this study for their cooperation:
• Prof. John Hattie, Director of Visible Learning Labs, director of asTTle,
University of Auckland
• Dr. Christine Merrell, Director of Primary Systems, Centre for Evaluation
and Monitoring, Durham University
• Elizabeth Archer, Project coordinator of SAMP, Centre for Evaluation and
Assessment, University of Pretoria
• Geert Evers, Information Manager Primary Education, Centraal Instituut
voor Toetsontwikkeling
• Ilse Papenburg, Training and advice, Centraal Instituut voor
Toetsontwikkeling
• Dr. Jean Pierre Verhaeghe, Project coordinator of the SFP, Ghent
University and Katholieke Universiteit Leuven
References
Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the
utilisation of management information systems in secondary schools.
School Effectiveness and School Improvement, 18(4), 451–467.
Coe, R., & Visscher, A.J. (2002). Drawing Up the Balance Sheet for School
Performance Feedback Systems. In R. Coe & A.J. Visscher (Eds.), School
improvement through performance feedback. Lisse: Swets & Zeitlinger
Publishers.
Davies, D. & Rudd, P. (2001). Evaluating school self-evaluation (Research
Report No. 21). Berkshire, UK: National Foundation for Educational
Research, Local Government Association.
Deming, W.E. (1986). Out of the crisis. Cambridge: Massachusetts Institute
of Technology, Center for Advanced Engineering Study.
Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge
Journal of Education, 33(3), 383-394.
Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and
effectiveness. London: Cassell.
Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in
indicator systems: Doing things right and doing wrong things. Education
Policy Analysis Archives, 10(6), 1-28. Retrieved from
http://epaa.asu.edu/ojs/article/viewFile/285/411
Goldstein, H. & Myers, K. (1996). Freedom of information: Towards a code
of ethics for performance indicators. Research Intelligence, 57, 12-16.
Goldstein, H. & Spiegelhalter, D.J. (1996). League tables and their
limitations: Statistical issues in comparisons of institutional
performance. Journal of the Royal Statistical Society: Series A: Statistics
in Society, 159(3), 385-443.
Goldstein, H., & Thomas, S. (1996). Using examination results as indicators
of school and college performance. Journal of the Royal Statistical
Society. Series A: Statistics in Society, 159(1), 149-163.
Heck, R. (2006). Assessing school achievement progress: Comparing
alternative approaches. Educational Administration Quarterly, 42(5),
667-699.
Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School self-
evaluation and student achievement. School Effectiveness and School
Improvement, 20(1), 47-68.
Karsten, S., Visscher, A.J., Dijkstra, A.B., & Veenstra, R. (2010). Towards
standards for the publication of performance indicators in the public
sector: The case of schools. Public Administration, 88(1), 90-112.
Knipprath, H., & Verhaeghe, J.P. (2010, April). Instability of the school
population: the less favourable side of longitudinal educational
effectiveness research. Paper presented at the 2010 AERA Annual
Meeting. Denver.
Kosslyn, S.M. (2006). Graph design for the eye and mind. Oxford: Oxford
University Press.
Kyriakides, L. & Campbell, R.J. (2004). School self-evaluation and school
improvement: A critique of values and procedures. Studies in
Educational Evaluation, 30, 23-36.
Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter:
Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press.
Schatz, C.J., VonSecker, C.E., & Alban, T.R. (2005). Balancing accountability
and improvement: introducing value-added models to a large school
system. In R. Lissitz (Ed.), Value added models in education: Theory and
applications (pp. 1-18). Maple Grove, Minnesota: JAM Press.
Luyten, H. (2006a). An empirical assessment of the absolute effect of
schooling: Regression-discontinuity applied to TIMSS-95. Oxford Review
of Education, 32(3), 397-429.
Luyten, H., Tymms, P., & Jones, P. (2009). Assessing school effects without
controlling for prior achievement? School Effectiveness and School
Improvement, 20(2), 145-165.
Maier, U. (2010). Accountability policies and teachers' acceptance and
usage of school performance feedback - a comparative study. School
Effectiveness and School Improvement, 21(2), 145-165.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An
expanded sourcebook. Thousand Oaks, CA: Sage.
Mortimore, P. & Sammons, P. (1994). School effectiveness and value added
measures. Assessment in Education: Principles, Policy and Practice, 1(3),
315.
Organisation for Economic Co-operation and Development (2008).
Measuring improvements in learning outcomes: Best-practices to assess
the value-added of schools. Paris: OECD Publishing.
Rowe, K. (2004). Analysing and reporting performance indicator data:
'Caress' the data and user beware! Paper presented at the 2004 Public
Sector Performance and Reporting Conference, Sydney, Australia.
Rowe, K. & Lievesley, D. (2002). Constructing and using educational
performance indicators. Paper presented at the 2002 Asia-Pacific
Educational Research Association, Melbourne, Australia.
Sammons, P. & Luyten, H. (2009). Editorial article for special issue on
alternative methods for assessing school effects and schooling effects.
School Effectiveness and School Improvement, 20(2), 133-143.
Sanders, W.L. (2006). Comparisons among various educational assessment
value-added models. Paper presented at The Power of Two National
Value-Added Conference, Columbus, Ohio.
Santelices, V. & Taut, S. (2009, September). Comprehension and use of
value-added school performance indicators reported to teachers and
parents. Paper presented at the European Conference on Educational
Research, Vienna.
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform:
Which data, what purposes, and promoting and hindering factors.
Teaching and Teacher Education, 26(3), 482-496.
Schildkamp, K. & Teddlie, C. (2008). School performance feedback systems
in the USA and in the Netherlands: A comparison. Educational Research
and Evaluation, 14(3), 255-282.
Schildkamp, K., & Visscher, A. (2009). Factors influencing the utilisation of a
school self-evaluation instrument. Studies in Educational Evaluation,
35(4), 150-159.
Sharp, S. (2006). Assessing value-added in the first year of schooling: Some
results and methodological considerations. School Effectiveness and
School Improvement, 17(3), 329-346.
Smith, P. (1995). On the unintended consequences of publishing
performance data in the public sector. International Journal of Public
Administration, 18(2), 277-310.
Tymms, P. (1999). Baseline assessment and monitoring in primary schools.
London: Fulton Publishers.
Tymms, P., & Albone, S. (2002). Performance indicators in primary schools.
In A.J. Visscher, & R. Coe (Eds.), School improvement through
performance feedback (pp. 191-218). Lisse: Swets & Zeitlinger.
van de Grift, W. (2009). Reliability and validity in measuring the value added
of schools. School Effectiveness and School Improvement, 20(2), 269-
285.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P.
(in press). The influence of competences and support on school
performance feedback use. Educational Studies.
Vanhoof, J. & Van Petegem, P. (2007). Matching internal and external
evaluation in an era of accountability and school development: Lessons
from a Flemish perspective. Studies in Educational Evaluation, 33(2),
101-119.
Van Petegem, P., Vanhoof, J., Daems, F., & Mahieu, P. (2005). Publishing
information on individual schools. Educational Research and Evaluation,
11(1), 45-60.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using
school performance feedback: Perceptions of primary school principals.
School Effectiveness and School Improvement, 21(2), 167-188.
Visscher, A.J. (2002). A framework for studying school performance
feedback systems. In A.J. Visscher & R. Coe (Eds.), School improvement
through performance feedback (pp. 41-71). Lisse, The Netherlands:
Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through
performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems:
Conceptualisation, analysis, and reflection. School Effectiveness and
School Improvement, 14(3), 321-349.
Yang, M., Goldstein, H., Rath, T., & Hill, N. (1999). The use of assessment
data for school improvement purposes. Oxford Review of Education,
25(4), 469-483.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for
effectiveness and improvement in classrooms and schools in upper
secondary education in Slovenia: Assessment of/for Learning Analytic
Tool. School Effectiveness and School Improvement, 20(1), 89-122.
CHAPTER 3
PERCEPTIONS OF PRIMARY SCHOOL PRINCIPALS ABOUT SCHOOL PERFORMANCE FEEDBACK USE
CHAPTER 3: PERCEPTIONS OF PRIMARY SCHOOL PRINCIPALS ABOUT SCHOOL
PERFORMANCE FEEDBACK USE∗
Abstract
The present study focuses on the perception of primary school principals of
school performance feedback (SPF) and of the actual use of this
information. This study is part of a larger project which aims to develop a
new school performance feedback system (SPFS). The study builds on an
eclectic framework that integrates the literature on SPFSs. Through in-
depth interviews with 16 school principals, four clusters of factors
influencing school feedback use were identified: context, school and user,
SPFS, and support. This study refines the description of feedback use in
terms of phases and types of use, and effects on school improvement.
Although school performance feedback can be seen as an important
instrument for school improvement, no systematic use of feedback by
school principals was observed. This was partly explained by a lack of skills,
time, and support.
∗ Based on Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School
Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and
School Improvement, 21(2), 167-188.
1. Introduction
In recent years, the trend of decentralizing educational systems has
prompted researchers to focus on school-based management and internal
evaluation. Because schools are granted autonomy, governmental bodies
expect them to be accountable for monitoring their internal quality policy
(Nevo, 2002). In this context, the current performance level of a school
serves as a starting point for developing future plans and educational
targets. To assess their baseline performance level, schools can make use of
feedback offered by school performance feedback systems (SPFSs). These
external systems deliver confidential information about a school’s
performance and functioning (Visscher & Coe, 2002, 2003). Performance
feedback helps to reveal the strengths and weaknesses of a school’s
functioning and is expected to contribute to the school improvement
process by stimulating reflection and self-evaluation.
However, receiving feedback alone is not a sufficient condition to foster
self-evaluation and systematic reflection at the school level. Several other
conditions related to the school, the context, and the specific SPFS being
used, determine if and how schools will make use of the available feedback.
Empirical research on SPFSs is limited (Schildkamp & Teddlie, 2008). Studies
that have been carried out indicate that the actual use of school feedback
and its impact are rather low (Coe, 2002; Tymms, 1995; Saunders & Rudd,
1999; Van Petegem & Vanhoof, 2004). We believe that a detailed study of
the use and impact of existing school performance feedback initiatives is
warranted (Goldstein & Spiegelhalter, 1996; Schildkamp, 2007; Schildkamp,
Visscher, & Luyten, 2009; Visscher & Coe, 2002, 2003). In this study we
build on the findings of an ongoing project which focuses on the design,
development, and implementation of an SPFS in Flanders (the
Dutch-speaking community of Belgium). We investigate the perceptions of school
principals of factors that promote or hinder their understanding and use of
school performance feedback information. The results of this study are
expected to support the development of SPFSs and to further refine
theories on school feedback use.
2. Theoretical framework
Based on a literature review, we developed a conceptual framework that
integrates factors affecting SPF use and effects (Fitz-Gibbon & Tymms,
2002; Schildkamp, 2007; Van Petegem & Vanhoof, 2007; Visscher, 2002;
Visscher & Coe, 2003). This framework is presented in Figure 1.
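Figure 1. Conceptual framework of school performance feedback use
[Figure components: influencing factors (context related; school and user related; school performance feedback (system) related; support related); phases in use (among them interpretation and policy actions); results: types of use; effects.]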
2.1. School performance feedback use: Phases, types and effects
Adequate use of SPF is expected to lead to specific effects at the school and
pupil level (Visscher, 2002; Schildkamp, 2007). Its purpose is to contribute
to school improvement and lead to higher student performance (Visscher &
Coe, 2003). Apart from the intended effects of SPF, unintended effects have
also been reported in the literature, such as selective student admissions,
teaching to the test, and removing difficult students (Visscher, 2002). Other
studies refer to undesirable side effects of SPF, such as the demotivation of
school staff who become overwhelmed by the amount of the data involved
and the amount of time they have to invest (Fitz-Gibbon & Tymms, 2002;
Schildkamp & Teddlie, 2008). In this context, SPF does not always result in
significantly better student outcomes (Fitz-Gibbon & Tymms, 2002;
Schildkamp, Visscher, & Luyten, 2009; Visscher, 2002). Nevertheless,
recent research indicates that SPF can have a positive impact on pupil
achievement levels (Hammond & Yeshanew, 2007) and on the associated
school improvement processes (Schildkamp, 2007; Schildkamp & Teddlie,
2008; Schildkamp, Visscher, & Luyten, 2009). In these studies several effects
on process indicators were observed, such as an improvement in
consultation and communication about school functioning and school
quality, improved didactical approaches, and a stronger achievement
orientation of staff. However, considering the limited amount of research
available, caution is warranted in drawing conclusions about the reported
effects of SPF use (Coe, 2002; Schagen, 2004).
The way school feedback is used plays a key role in its potential impact.
In terms of a policy-making cycle (e.g., Hoy & Miskel, 2001) feedback should
be used in the following sequence. First, feedback results must reach the
proper person(s). Second, the data in the report must be read and
interpreted correctly for it to be meaningful. In the subsequent diagnostic
process, causes and explanations for the results are deliberated. The
diagnostic process results in actions that are implemented and finally
evaluated. However, research indicates that school principals do not always
disseminate feedback information or simply distribute feedback reports
without examining them (Van Petegem & Vanhoof, 2004). Other studies
found that school feedback users often get stuck in the transition from the
interpretation of SPF to active policy making (Vanhoof, 2007; Schildkamp,
2007). This is highly problematic as the interpretation of the data is
essential in deducing workable information (Earl & Fullan, 2003). These
phases of data use are outlined in the practice of data-driven decision
making (Learning Point Associates, 2004). However, in the current literature
on SPF use for school improvement these phases are not distinguished in a
systematic way.
Within this policy-making cycle different types of feedback use can be
distinguished: (1) direct/instrumental, (2) conceptual, and (3)
symbolic/convincing (Rossi, Lipsey, & Freeman, 2004). An instrumental use
of feedback serves as a starting point for immediate policy making
decisions. A conceptual use of feedback does not result in concrete actions,
but influences the decision making process, which indirectly affects action.
Even if feedback does not influence one’s conceptualizations, it can affect
the policy making process in a symbolic way. This means feedback results
serve to convince others of existing opinions and to support viewpoints in
discussions (Visscher, 2002). Furthermore, feedback can be used in a
strategic way for accountability purposes, although this is not in line with a
school improvement discourse (Visscher & Coe, 2003). These four types of
feedback use can be considered as results of feedback use. For example, a
conceptual use results in an altered way of thinking about pupil
performances. This intermediate result can in the end lead to effects of
feedback use, such as a stronger achievement orientation.
2.2. Factors influencing school performance feedback utilization
Differences in the interpretation and use of school feedback can be
attributed to a variety of factors. In the framework of Visscher (2002) and
Visscher and Coe (2003) the following set of influential factors are outlined:
context, school and user, SPFS, and support. The authors embed the
process of feedback use in the broader school environment, which we call
context related factors. They do not distinguish support related factors as a
separate set, but place them within the implementation process and
characteristics of the feedback system. These variables were selected based
on a literature review in the fields of educational innovation, educational
management, business administration, and computer science. However,
the relations between the different influencing factors and the feedback
effects are not examined (Visscher, 2002). This framework is used as a basis
for the present study.
Context related factors that impact feedback use include the school’s
policy strategies at the regional and/or governmental level (Sun, Creemers
& de Jong, 2007; Visscher, 2002). For instance, policies can contain clear
expectations that schools make use of feedback information. Educational
governments can stimulate feedback use by pressure and/or support.
Furthermore, feedback will be used differently depending on the context
(e.g., school improvement, school accountability, or a combination of both
strategies) (Vanhoof & Van Petegem, 2007; Visscher, 2002).
Secondly, school and user related characteristics seem to be key
variables explaining differences in school feedback use. First, the motivation
to use an SPFS leads to different utilizations. Motivation varies from
internal quality development or external accountability, to policy
preparation (van Aanholt & Buis, 1990; Liket, 1992). Secondly, previous
experiences with feedback use, general experience with school related
data, and the statistical knowledge and skills needed to interpret feedback
reports will also influence feedback use. While most teachers have
experience with school test data, pupil monitoring systems, and self-
evaluations, in several studies school staff report that they are lacking the
skills and confidence when using data for school policy purposes (Earl &
Fullan, 2003; Kerr, Marsh, Ikemoto, Darilek, & Barney, 2006; Saunders,
2000; Williams & Coles, 2007). Thirdly, school performance levels also
influence feedback use (Visscher, 2002; Visscher & Coe, 2003). Schools
receiving positive feedback (large value added) will discuss the results
differently compared to schools receiving a less positive picture
(Schildkamp, 2007). In line with control theory, participants receiving
negative feedback are more likely to make an effort to reduce the
discrepancy between the negative feedback and the expected standards
(Kluger & DeNisi, 1996). This will result in different policy implications.
However, this theory does not hold in all cases; it is not unusual for school
principals to withhold feedback information that does not fit the current
policy plan (Van Petegem & Vanhoof, 2004).
A third set of factors influencing school performance feedback use refers
to the characteristics of the school feedback reports and the feedback
system. In this context, the perception of the user determines how
feedback will be used (Visscher, 2002; van den Berg & Ros, 1999). At the
level of content, feedback should be perceived as relevant, non-
threatening, and corresponding to the actual informational needs
(Schildkamp & Teddlie, 2008; Visscher, 2002; Van Petegem & Vanhoof,
2007). Furthermore, the representation of both absolute and relative
school performance results also impacts the way feedback is used (Visscher,
2002; Visscher & Coe, 2003). If relative measures are used to compare the
school’s results with a reference group, these school scores should be
adjusted for the influence of pupil background characteristics and should be
linked to the relevant cohort group (Goldstein & Spiegelhalter, 1996).
Information should also be up-to-date, reliable, and valid (Visscher, 2002;
Visscher & Coe, 2003; Schildkamp & Teddlie, 2008). In terms of ethical
issues, Fitz-Gibbon and Tymms (2002) refer to the Hippocratic Oath and
state that feedback should “at least do no harm” (p. 75). For example, in
some cases feedback can be threatening to recipients’ self-esteem,
particularly in a system of accountability (Visscher & Coe, 2003). Consistent
with our definition of SPFSs, feedback systems for school improvement
should guarantee confidentiality and anonymity to the subjects and
schools. Moreover, feedback should not harm subjects or schools on the
basis of misleading information (Goldstein & Myers, 1996).
The fourth and final set of factors that affect feedback use concerns the
support experienced by feedback users (Schildkamp & Teddlie, 2008).
School staff that are involved in SPFS training are more likely to read the
feedback reports and adopt a more positive attitude (Tymms, 1995).
Numerous studies stress the importance of providing feedback support
(e.g., Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009;
Van Petegem & Vanhoof, 2007; Visscher & Coe, 2003). This can be
administered by educational and government parties, school team
members, or the feedback system itself.
3. Research questions
This study examines the perception of school feedback users. Based on the
conceptual framework discussed above, the following research questions
are asked:
• What phases can be observed in practice when schools use school
performance feedback?
• What is/are the result(s) of using school performance feedback?
• How can differences be explained in the interpretation and the further
use of school performance feedback in different school contexts?
4. Research context
This study is part of a larger SPF project called “Each school its own mirror.”
As there is currently no SPFS available in Flanders, this project is
developing and evaluating a new SPFS through collaboration
between researchers, various stakeholders, and a target group of primary
school principals and teachers. The system that has so far been developed
from the SPF project gives schools feedback on a confidential basis. These
feedback reports are designed to enable teachers and principals to
understand the value added scores of their school as compared to a
reference group. The reference group used is taken from another research
project (the SiBO project, Schoolloopbanen in het BasisOnderwijs [School
Trajectories in Primary Education]) that is currently tracking approximately
6000 children from a representative sample of Flemish schools (from the
time they entered kindergarten until the end of primary education). In the
SPF project, scores on tests and survey and observational data are being
continuously collected to gather information on child characteristics, family
background, class characteristics, classroom practices, teacher attitudes
and subjective theory, and school characteristics. The tests focus on
language learning (orthography, reading fluency, reading comprehension)
and mathematics. IRT-based techniques are used to construct the test
scores, enabling us to estimate growth curves.
The SPF project is currently able to deliver trial versions of school
feedback reports to the 198 primary school principals participating. In this
study, we build on the results from the trial versions sent to the schools in
the spring of 2007. These reports inform schools about the performance of
children and classes in the first two years of primary education. Results
were reported for mathematics, reading fluency, and orthography,
supplemented with information about pupil characteristics (child factors,
home factors, and Dutch language skills at the start of grade 1). The school
specific results were compared to the Flemish reference group. The central
concepts in these reports include learning gain, value added, and adjusted
scores and were explained in such a way that no prior statistical knowledge
was required. The data were supported with graphical representations (i.e.,
boxplots, bar graphs, pie graphs, growth curves, and cross tables). The text
of each report was standardized. The school principals were required to
interpret the results for their school, based on the general information
made available. They also received individual pupil feedback which
represents the observed scores and percentile rankings relative to the
reference group. Pupil feedback was presented to the schools shortly after
taking the class tests, but the aggregated scores at class and school level
were sent approximately 10 months later.
5. Research design
5.1. Research approach
In this study we use a qualitative design to explore the perceptions of
primary school principals of SPF use. A qualitative approach is appropriate
since we want to develop a view on “naturally occurring, ordinary events in
natural settings, so that we have a strong handle on what ‘real life’ is like”
(Miles & Huberman, 1994, p. 10). It is recommended when the knowledge
base is limited and the nature of the variables, processes, and interrelations
is less clear (Maso & Smaling, 1998), which holds for the literature about
SPF use.
5.2. Research instrument and procedure
Data were gathered on the basis of semi-structured in-depth interviews.
This type of interview creates an informal relationship between researcher
and respondent, and gives the researcher a better understanding of the
perceptions, opinions, and views of respondents (Mason, 2002). The
interview questions were largely open ended and were derived from the
conceptual framework discussed above. Respondents were invited to
describe their school situation, to propose suggestions, and to express their
concerns. To clarify remarks or to ask for elaboration, spontaneous follow-
up probes were allowed (Lindlof & Taylor, 2002). Examples of questions
include:
• Questions about feedback characteristics: These questions focused on
the perceptions of the relevance, interpretability, user-friendliness,
validity and reliability of the feedback information (e.g., Do you think the
information is relevant to draw a picture of the school’s influence on
pupils’ performances? Which information is the most relevant? Why? Do
you trust the quality of the feedback results?).
• Questions about school and user characteristics mainly focused on
interpretation skills, expectations of feedback use, and the perception of
the school’s performance (e.g., Do you feel comfortable interpreting
these feedback results? If yes, where did you acquire the knowledge and
skills for this? Which problems did you encounter?). Furthermore,
questions regarding school culture characteristics were asked (e.g., Is
there a culture of systematic reflection? To what degree do teachers
welcome school performance feedback reports? Besides this feedback
project, are there other data gathering systems used to assess the
school’s functioning?).
• Questions about support initiatives included support use and support
needs (e.g., Have you engaged the team members when interpreting the
feedback reports? Do you feel enough support from the school staff when
interpreting the results? Is there a need for more external support? For
which activities?)
• Questions on feedback use were formulated to discern different types
and phases of feedback use (e.g., Did you formulate any goals you want
to achieve by using feedback? Which initiatives are you undertaking to
communicate the feedback results to staff members? Did the feedback
report play a role in policy decision making? Has it influenced your way
of thinking about the school? Did you use the report for strategic
purposes, such as promoting your school, informing the school
inspectorate about your school’s results? Did you use the report to
legitimize your own convictions?)
• Questions about feedback effects were not stressed because it was
unlikely that effects of feedback use on the school could already have
been observed in the three-month period between the feedback
delivery and the interview. However, questions about participants’
expectations of effects were posed (e.g., What effects should take place
for your effort to have been worthwhile?).
• The perception of context related factors is limited in this study to the
influence of the inspection visits to schools.
School principals were visited in their school office by one of the two
interviewers, three months after receiving their school performance
feedback report. Interviews lasted approximately 90 minutes.
5.3. Theoretical sampling
From the 198 SiBO-principals, a sample of 16 primary school principals was
selected by means of theoretical sampling, maximizing a variety of feedback
use (Mason, 2002; Silverman, 2005). In this sampling method the choice of
cases is made on conceptual grounds, not on representative grounds (Miles
& Huberman, 1994). To gather this sample, two months after having
received feedback reports, the 198 principals were asked to fill out an
online survey. We obtained a response rate of 61%. The principals were
selected for the present study on the basis of the following variables: the
degree to which they used the school feedback, the number of children
without special needs in their school, experience in working with self-
evaluation, and school performance as represented in the feedback report.
For each variable the schools were divided into three groups (low, average,
and high), with the exception of the school performance level (positive or
negative value added). In the survey the principals were asked with whom they
had discussed the feedback report, choosing from six answer options. The
number of options marked was taken as an indicator of feedback use.
Respondents who marked more than three options were defined as high users;
principals who marked fewer than two options were defined as low feedback
users (M = 1.77, SD = 1.26).
The second variable concerns the school’s performance level (Visscher,
2002; Visscher & Coe, 2003). A distinction was made between schools with
a positive or negative value-added mathematics score at the end of grade
two. In the online survey principals were asked to report their degree of
experience in conducting self-evaluations in the school. Respondents with
scores higher than three on a 5-point Likert scale were classified as highly
experienced and those with scores less than three as having a low degree of
experience (M = 3.50, SD = 1.08). This selection criterion was used as it
indicates prior experience in data use for school improvement. The fourth
selection variable was the percentage of pupils without special educational
needs (SEN) at the school. As the feedback reports were adjusted for pupil
background characteristics, principals were expected to differ in how
relevant they judged the feedback to be. Schools with percentages between 30
and 70 were considered as having an average share of pupils without SEN
(M = 50.36, SD = 27.73).
Figure 2 gives an overview of the selected schools.
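The grouping rules described above can be summarised in a short sketch. The Python code below is illustrative only: the function names are hypothetical, and the handling of boundary values (a percentage of exactly 30 or 70, a value added score of exactly zero) is an assumption, since the text does not specify it.

```python
# Illustrative sketch of the grouping rules used for theoretical sampling.
# Function names and boundary handling are assumptions, not part of the study.

def classify_feedback_use(n_options_marked):
    """Group by the number of discussion partners marked (six options offered)."""
    if n_options_marked > 3:
        return "H"  # more than three options: high user
    if n_options_marked < 2:
        return "L"  # fewer than two options: low user
    return "A"      # two or three options: average user


def classify_self_evaluation(likert_score):
    """Group by self-reported experience on a 5-point Likert scale."""
    if likert_score > 3:
        return "H"
    if likert_score < 3:
        return "L"
    return "A"


def classify_school_population(pct_without_sen):
    """Group by the percentage of pupils without SEN.

    Assumption: exactly 30 and exactly 70 are treated as average.
    """
    if pct_without_sen > 70:
        return "H"
    if pct_without_sen < 30:
        return "L"
    return "A"


def classify_value_added(value_added_score):
    """Positive or negative value added (mathematics, end of grade two).

    Assumption: a score of exactly zero is labelled negative here,
    because the text does not cover that case.
    """
    return "+" if value_added_score > 0 else "-"


# Hypothetical principal, not taken from the study data.
profile = (
    classify_school_population(55.0),
    classify_value_added(0.4),
    classify_self_evaluation(4),
    classify_feedback_use(2),
)
```

In this sketch a principal is described by four labels in the order school population, value added, self-evaluation, and feedback use, which is the information needed to place a school in one of the groups shown in Figure 2.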
Figure 2. Overview of selected respondents.

School population   Value added   Self-evaluation   Feedback use   School(s)
H (5)               + (3)         H (3)             H (1)          2
                                                    A (1)          11
                                                    L (1)          7
                    - (2)         L (2)             A (1)          1
                                                    L (1)          16
A (7)               + (4)         H (4)             H (2)          3 & 10
                                                    A (2)          4 & 13
                    - (3)         H (3)             H (1)          15
                                                    A (2)          14 & 5
L (4)               + (1)         ? (1)             L (1)          6
                    - (3)         H (1)             L (1)          9
                                  L (2)             A (2)          8 & 12

Note: H = high, A = average, L = low, ? = information unknown, + = positive value
added score, - = negative value added score; number of respondents in
parentheses (16 interviewees in total). The last column lists the schools of
the respondents in each group.
5.4. Framework analysis
In addition to informing the design of the SPFS, the results of this study were
also used as a means to evaluate the theoretical framework presented
above. Therefore the interview data were placed in the theoretical frame to
examine whether the theoretical findings were confirmed or needed to be
altered and/or elaborated. This can inspire future studies that build on new
preliminary concepts and hypotheses (Ritchie & Spencer, 1994). These
findings can also contribute to the ecological validity of research findings on
feedback effects, as here they are applied in the context of school
improvement (Visscher & Coe, 2003).
Each interview was transcribed verbatim and was independently coded
by two researchers with ATLAS.ti, a qualitative data analysis software tool. Codes
were assigned by following the middle order approach, which allows for the
initial application of broad categories that can later be refined (Dey, 1993).
Text fragments were mainly assigned to codes in a deductive way. First, text
fragments were placed under broad categories (e.g., effects of use, phases
of use, the four groups of influencing factors, types of use, and other
relevant information) and were then assigned to a predefined coding
structure. If no predefined code was appropriate, the text fragments
considered to be of importance were placed under the suitable broader
category. New codes were created for these fragments inductively,
emerging from the data, as in the grounded theory approach (Strauss &
Corbin, 2007).
For inter-rater agreement, the first two interviews were coded
collaboratively and the coding structure was set up. Two interviews were
then coded by both researchers separately to calculate inter-rater
reliability, following the formula of Miles and Huberman (1994): ratio
between the number of agreements and the total number of attributed
codes. An inter-rater agreement value of .90 was obtained, indicating
good inter-rater reliability.
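As a minimal sketch of this agreement ratio, assuming that both researchers coded the same set of text fragments and that an agreement means an identical code for a fragment, the calculation can be written as follows. The function name and the example codes are hypothetical and are not study data.

```python
# Illustrative sketch: agreement ratio = agreements / total number of codes.
# The fragment codes below are invented for the example.

def agreement_ratio(codes_a, codes_b):
    """Return the share of fragments that both coders coded identically."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same fragments")
    if not codes_a:
        raise ValueError("no coded fragments")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return agreements / len(codes_a)


# Ten hypothetical fragments; the coders disagree on the fourth one.
coder_1 = ["phase", "type", "effect", "support", "context",
           "phase", "type", "effect", "support", "context"]
coder_2 = ["phase", "type", "effect", "context", "context",
           "phase", "type", "effect", "support", "context"]

ratio = agreement_ratio(coder_1, coder_2)
```

Note that a simple ratio of this kind does not correct for agreement that would occur by chance.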
After this coding phase, the analysis shifted from a focus on individual
interviews in a vertical analysis to a focus on the coding categories as they
occurred in all the different interviews in a horizontal analysis (variable
oriented approach; Miles & Huberman, 1994). This allows the researcher to
transcend the individual narratives of the school principals and to create a
spectrum of perceptions and interpretations.
6. Findings and discussion
6.1. What phases can be observed when schools use school performance
feedback?
The interview results confirm that school performance feedback use in
primary schools is limited. Most schools were situated at the first phase of
the policy cycle described above. Only a few schools reached the planning
phase and action phase in the policy cycle.
Concerning the dissemination of information, the first stumbling block
occurred at the moment feedback reports arrived at the school. Though all
interviewees confirmed receipt of the report, one of them could not
remember it. This stumbling block became more apparent when we
examined the various ways in which the reports were handled. In some
schools, the report was not read: “Mostly the reports arrive at the school. I
give it a glimpse and then it is classified. Then, nothing is done with it”
(School 8). Other school principals reported they only took a quick look at it.
In contrast, others distributed the report to the teachers responsible for the
class that was discussed in the report. Others handed the report over to the
special needs teachers or special care coordinators. Sometimes teachers
were intentionally not asked to be involved in reading the reports.
My opinion is that if you are not really acquainted with the
interpretation of these data, you will not spontaneously unravel the
whole report. It is not so easy. It is an extra task on top of the rest. If I do
this and draw the conclusions and give it to them, it is already a lot.
(School 5)
Occasionally reading the feedback reports led to discussion between the
principal and the special care coordinator. In other cases, teachers were
also invited into a discussion, but even then it was not guaranteed that they
would read the reports. Principals reported that informal and unplanned
discussions took place:
We have a smoking room. That’s where we discussed the report. Those
who entered the room glanced through the report. It was not
intentionally communicated to the rest of the team members. This
happened rather informally. (School 10)
Other principals reported having a formal discussion during planned
multidisciplinary team meetings. In these instances, the school principal or
special care coordinator presented a summary of the results and their
interpretations. All school principals reported that they only discussed the
feedback information within the school team, with the exception of also
reporting the information to the education inspectorate.
While we made a theoretical distinction between a reading and an
interpretation phase, it became clear that in practice these phases merged
together. The principals or special care coordinators that discussed the
results with team members proposed their own interpretations.
The new report was read and discussed by me and the care coordinator.
Afterwards the report was discussed in a team meeting with all teachers;
not just the teachers that are involved in the research. Conclusions and
underlying statistical procedures were communicated. Growth curves
were presented. (School 2)
Principals also stressed that the interpretation process was an intensive,
time-consuming, and difficult activity. Some confirmed that they were not
able to correctly interpret or understand the information. This is
problematic as the interpretation phase is crucial for developing a solid and
valid basis for the development of school policy (Earl & Fullan, 2003). While
a minority reported not having experienced difficulties, all principals
reported that successfully interpreting the report requires effort.
You really have to examine it carefully to figure it out. I went over it …but
to really master it, you have to read and examine it several times.
(School 1)
I think that …one of the reasons is that you first look at it. It is similar to
the directions for use of a new apparatus. First you set it up and
afterwards you read how it works. If the set up is successful, you are not
going to read the instructions for use. (School 14)
The laborious interpretation phase seems to have a strong impact on the
diagnostic phase. Most principals dropped out after one attempt at
understanding the feedback results. Only a few principals set up initiatives
to identify strengths and weaknesses in their school and examined the
feedback information when looking for explanations. However, this was
rarely set up in a systematic way.
Principals frequently stated that the diagnostic and action phases were
barely reached. They also linked this to the lack of cues in the feedback
reports that might direct future action. This may be a reason why school
feedback is not systematically taken into consideration when developing
internal policy.
We discuss it with the teachers involved. And, until now, the
interpretation is limited to the reading of the report and the file, but no
immediate actions follow from this. (School 4)
But this feedback is not that useful for classes and individual children. I
think this is the biggest concern. In fact it has to be as concrete as
possible. That is the request of teachers; something ready-made. In fact
this is also partly how I am. If I take a method book, I expect not to have
to search for accompanying exercises. (School 3)
6.2. What is/are the result(s) of using school performance feedback?
The findings discussed above indicate factors that can affect the outcomes
of SPF use. We found that in some schools feedback is used as a mirror
image of the school’s performance. In those cases a better understanding
of the school’s impact on pupil performance was developed. However, this
did not automatically lead to (policy) actions. This can be labeled as
conceptual feedback use; it led to reflection in schools, even when the
results confirmed prior findings and impressions.
Indeed, so far we have (…) already noticed a few things concerning the
school’s position that we were not aware of before. What we also notice
is that there is a large pupil mobility, which influences our results
significantly. These are important findings for us. (School 12)
Most important was to see where the school’s position is. How well are
we performing and whether the school realizes a value added score. This
is, for me personally, a refinement in thinking about what you are doing
as a school, about your task, about your aims … (School 7)
Illustrations of instrumental feedback usage were rare. Some principals
stated that the feedback information did not offer enough starting points
(e.g., remedial information) to direct actions. However, some principals
reported that action had been taken, such as a reorganization of rosters, an
increase in the number of teaching hours, the introduction of a new reading
method, and more intensive mentoring of new teachers. Even when
information confirmed prior findings, it led to instrumental feedback use:
“What is reported confirms what we already assumed. It is more like an
affirmation of our feelings. And we have done a few things, such as
introducing a new spelling method.” (School 10)
Feedback information was particularly used in a symbolic way.
Respondents indicated that school feedback was a useful instrument in
highlighting existing opinions and underlining various problems in the
school’s functioning. According to the respondents, the feedback was used
as input for shared decision-making. However, this did not lead to concrete
action.
I had my own vision of the school and I wanted to impose it on the team;
this was a good instrument to make out a case for it and to say it is
necessary that we deal with this. (School 4)
Examples that we found of strategic utilization referred to the use of
school feedback in the development of the self-evaluation report to be
submitted to the education inspectorate. Principals reported that they were
grateful to participate in the study because they could make use of school,
class, and pupil related information for this purpose. This type of use
deviates from the original theoretical model of SPF usage. Schools seem to have
used the feedback information in the context of being accountable to the
inspection authorities. This is in contrast to the perception of the authors
and developers of the SPFS who want feedback to be used for school
improvement.
Not all of the information gathered about feedback use could be placed
within the predefined coding scheme that was based on the literature
(Rossi, Lipsey, & Freeman, 2004; Visscher, 2002). Therefore two extra codes
were created: a motivating use and a pupil directed use. In some cases, the
feedback information helped to motivate or stimulate school team
members. In some schools, the feedback was communicated to team
members for this purpose, which sometimes implied a selective
presentation of the results.
If you are an immigrant school, as we are, sometimes it is questioned if
our performance level is high enough. And if you receive an output report
from an external organization, it partly confirms we are doing a good
job. (School 16)
For making internal plans (…) we selected some results for reading and
mathematics. We used these results for our own reports to say: ‘Look, on
this measurement occasion, we just took out these results and notice
that our children score like this. And the Flemish average is like this.
Thus, we are below this average’. (School 7)
The latter statement illustrates that lower performance results were also
used to motivate the team members to overcome shortcomings.
Conversely, some school principals kept the feedback results private,
especially if they were not as good as expected. This was explained by the
intention not to discourage team members.
For example, concerning the learning gain scores. Absolutely. If I had to
communicate it and mention that for example the learning gain in the
first grade is smaller than on average and in the second grade larger
than on average, this would be very hard to bear for the teacher involved
if this is made public. I am sure of that. (School 6)
All of the aforementioned examples indicate feedback usage at the
school level. During the interviews, principals stressed that aggregated
results were useful for policy makers, but not for teachers who prefer a
pupil directed utilization. Classroom teachers need data at the pupil level to
direct actions that correspond to the learning needs of individual pupils.
Pupil feedback is seen as complementary to pupil monitoring systems and is
also considered easier to interpret and to translate into action at
short notice.
These interview results indicate that school feedback is not extensively
used and has a limited impact. In fact, many school principals had not yet
noticed school improvement effects by using the SPF, and if they had, they
referred to the effects of using the feedback reports of the previous year.
[As a result of mentoring starting teachers and introducing a new
method; cf. instrumental use] We see the AVI-results [AVI is a Dutch
grading system for reading fluency often used in primary education].
When before almost no pupil reached an AVI-1 level at the end of the
year with that method and that young teacher, we now have several
AVI-6 levels. Thus we have good results. That partly was a result of that.
(School 1)
Some principals stated that, because of the longitudinal nature of the study
that provides the feedback services, barriers against the feedback
discussions in the group decreased and interest in the results increased.
This illustrates the valuable effects of process variables that indirectly
contribute to school improvement (Schildkamp, 2007; Schildkamp, Visscher,
& Luyten, 2009; Schildkamp & Teddlie, 2008).
6.3. How can differences be explained in the interpretation and the use of
school performance feedback in different school contexts?
In the theoretical framework, different factors/conditions were discerned
that explain differences in school feedback use. Our findings confirm the
distinction of four clusters of related factors.
Context related factors
To understand why school feedback is used to such a limited extent, we
must take into account both the research context and the Flemish
educational context. In terms of the research context, the SPF presented
information at the school level with adjusted scores. These built on a
comparison with a reference group, resulting in value added scores. This is
a very new approach that principals are not acquainted with.
In terms of the Flemish educational context, the central educational
authorities do not formally encourage or oblige schools to adopt an SPFS
approach. Indeed, some authorities are even reluctant to do so, stating that
it introduces the risk that schools will be compared and ranked on the basis
of biased information or that adjusted scores will reveal another school
performance level than expected. However, educational inspection
authorities adopt another view. They encourage schools to document
school performance on the basis of performance related information.
[On being questioned about whether it was a conscious choice to
participate in the research project] You always have the possibility to
refuse…The main reason for me to participate was that our inspectorate
often asks for output results. And yes, of course we have our own class
tests but there is no reference point, because teachers create their own
tests. We also have tests from our methods. But nowhere is there a
comparison with another school to see how we perform. (School 1)
School and user related factors
The interview analysis indicated four groups of related school and user
characteristics.
Differences in expected functions and effects of school performance
feedback. School principals differed in how strongly they expected to use
the feedback and in the goals they pursued with it. Some did not even
define goals or
targets, while others reacted in a proactive way. When schools did
formulate explicit and shared goals, the chances of observing more optimal
and successful feedback use increased. This indicates that if schools are
convinced of the potential of school feedback, they undertake actions
toward effective use (Bosker, Branderhorst, & Visscher, 2007). These
actions have to be performed by the users themselves for innovations to
become successful (Fullan, 2007).
A distinction can be made between utilization expectations and effect
oriented expectations. In the former situation, school principals expected to
use the school feedback as a mirror, helping to develop a clearer view of
the current school operation and school performance, and to detect
strengths and weaknesses. Others expected to use feedback for policy
development (e.g., for evaluating policy decisions or developing policy
plans).
We thought ‘look this research will be conducted over seven years; we
are going to follow it up. Where are we as a school? We are putting a lot
of effort into our care policy. What does this effort give us in return?’ (…)
In fact, we do have a very problematic population and it is our goal to
see what the benefit is of all our effort. (School 1)
Another utilization oriented perspective was discussed above (i.e., when
principals used the information for accountability purposes). Almost all
principals intended to use the feedback as input for their discussions with
inspection authorities but stressed that they would not do this for parents.
In terms of effect expectations, principals expected that investing time and
effort in school feedback would eventually improve education: “We expect
to improve our quality of education. So far, for the first grade, it was worth
the effort. That is the goal: an improvement of our education” (School 1).
We found no evidence that the principals systematically reflected upon
their expectations with regard to feedback use and feedback effects. In
addition, principals indicated that their expectations of the feedback did
not necessarily reflect the opinions of their staff members.
Teachers are not willing to participate because it is a lot of work for
them. Moreover, the SPF project examines the same domains as the
pupil monitoring system, thus it does not directly benefit them. (…)
Teachers participate in this research project because the previous school
principal decided they would. For them, it is ‘if it must be.’ (School 2)
Differences in statistical knowledge and skills. Most school principals
claimed not to have advanced statistical knowledge. Their statistical
knowledge was acquired during their initial teacher training and additional
training courses, and was partly based on learning to work with pupil
monitoring systems. However, they stressed that this was insufficient to
work with school performance feedback. Conversely, some did not
experience difficulties, either because everything was explained in the
report or because they had sufficient prior knowledge.
Everything [in the feedback reports] is explained in terms of how to
interpret it. Thus, if one pays enough attention to the instructions ‘to
read it this way and these numbers, if this is mentioned it means this,’
then I think no extra prior knowledge is needed. (School 4)
Differences in time available for feedback use. Some principals reported
that if more time was available, they would have made more use of the
feedback. Because principals and teachers have to divide their time over a
large number of activities, less urgent tasks such as those related to SPF use are
not prioritized. This confirms previous findings that the self-evaluation of a
school is not a priority for principals and teachers (Visscher, 1996; Williams
& Coles, 2007).
There is often a lack of time. You cannot use this as an excuse but it is
often the reason. For example at team meetings, you want to put this
and this on the agenda, but then there is not enough time to go more
deeply into it, because there are so many issues coming from the outside.
(School 11)
Differences in perceptions of positive/negative feedback results. When
school feedback reflected low performance levels, the principals were
willing to search for explanations. This confirms the control theory of Kluger
and DeNisi (1996). However, this observation cannot be generalized: when
the performance levels were far below average, sometimes feedback
results were not distributed in order not to discourage team members.
When performance results were perceived to be relatively good, further
use of the feedback reports decreased: “We are scoring on average, so
there are no severe differences. So why should we pay much attention to
it?” (School 3).
The perception of the performance results was influenced by the way
the results were represented, for example by the way value added is
calculated. The feedback reports presented both adjusted scores that took
into account the influence of pupil background characteristics and
nonadjusted scores. Our results indicate that especially in schools with a
large number of children with special needs, the adjusted performance
scores were valued positively.
The surplus value of this research for our school is that for all these years
we’ve had the impression we were doing things right. Because we have a
large number of foreign speaking and special needs children we want to
know the effects of the way we organize our education and monitor our
children. (…) Particularly in the last few years with the introduction of
adjusted scores, some attention is given to the pupils’ progress, while
taking into account certain factors. (School 7)
School performance feedback (system) related factors
Feedback has to meet a number of requirements to facilitate correct
interpretation and to promote feedback utilization.
Differences in perceived feedback relevance. All school principals
requested that feedback should fit their needs. These needs differed
between schools. Some principals expressed a primary interest in
performance results on mathematics and language; others were more
interested in socio-emotional development or other subjects. Furthermore,
schools’ preferences differed in the calculation of value added scores
(observed or adjusted scores), in the way information was aggregated
(pupil – class – school – other subgroups), in the amount of statistical
background information in the reports, and the nature of the reference
group(s). During the interviews these differences were observed between
and within schools. Differences were also related to the roles and
occupations of feedback users. Teachers prefer pupil level feedback, pupil
relevant error analysis, and remedial material, whilst policy makers prefer
aggregated information that reflects their school focus.
In my opinion, the school and class level is the most interesting, in view
of my function. I am supposed to work mainly on school and class level
and less on the pupil level. Thus for me this is more interesting than an
individual report. But of course a teacher will see it differently. I am sure
of that. This teacher will probably prefer feedback about the pupils in this
class. (School 10)
When asked for ideas on how to better meet user needs, respondents
suggested increasing the number of school subjects to be tested, focusing on
different pupil cohorts, and tailoring information. The interviewees were
not pleased about redundant information. They required feedback systems
to focus on complementary information. In particular, some principals
asked for information that would complement the available monitoring
systems. All respondents required that the performance feedback be up to
date. In particular, teachers expected feedback within the same school year
as when tests had been taken, in order to support low-scoring pupils. When
teachers shifted classes, feedback results of previous years were considered
irrelevant.
Differences in perceived feedback interpretability. For this factor no
coherent picture could be deduced from the interview data. Most principals
stated that interpreting the information was difficult. Some stressed that
interpreting the information without support was a hopeless task. Some
stated that the information could not be understood after only one reading.
But not all principals considered this to be a problem or experienced
difficulty in analyzing the reports. Some principals stated that it is important
to stress that school feedback is a complex field and cannot be simplified
without losing depth and meaning.
It is magnificent the way this report [is written]…It is not easy to explain
something complex that clearly. They [the feedback developers] largely
succeeded in it, but it is still a large amount of information. (…) Of
course, sometimes I get lost, which is not surprising, considering the
technicality of it. (School 7)
During the interviews, explanations were given for why some principals
were not able to correctly interpret the feedback. Some complained of a
lack of structure in the feedback information. Others criticized the amount,
stating that they skipped a lot of information, were selective, and focused
only on the school results.
Maybe some parts are less interesting for me, but this is not a reason to
leave out this information from the reports, because everything is
concisely described. For example, the information about pupil mobility, if
it does not interest you, just turn the page. (School 10)
In contrast, others appreciated the comprehensiveness of the feedback
reports and preferred the additional information. A third element
influencing the interpretability of feedback was the balance between
technical concepts and the way school staff label and discuss education.
Feedback was often experienced as being too abstract. Additionally,
principals seemed to be less familiar with feedback that was aggregated at
the class and school level. Both the language used and the graphical
representations (growth curves, box plots) led to difficulties in
interpretation. Some school principals stressed that the feedback is not
appropriate for teachers as they do not possess the competence or
experience to interpret the information, whilst others did not question the
competence of their staff.
Differences in perceived validity and reliability. Respondents trusted the
professionalism of the feedback developers. Nevertheless, they expressed
some concerns. Some principals valued the feedback less because the
adjusted scores do not take into account school specific process and
context variables. The feedback developers wanted to articulate these
differences, but schools preferred an adjustment model taking into account
more external influences that explain school outcomes and result in an
average school profile.
I think researchers do not have enough information [about pupil and
school characteristics]. They do not know we introduced a new reading
method, which caused problems. They do not know there was a starting
teacher. And they do not know that this teacher is not worthy of being
called a teacher. That gives different results. This information should be
on top of it [of the current adjustment procedure]. It is important for the
school. (…) Now it does not give a correct image of the school. (School 1)
The feedback was perceived as valid and reliable when the results were
congruent with the findings of pupil monitoring systems, school tests, or
intuition. When this was not the case the results were seen as less valid and
low performance was more easily attributed to external factors, such as the
difficulty of test items, atypical question methods, and incorrect results of
the reference group (i.e., some schools were thought to have falsified their
results by helping their pupils during the test). Others criticized the single-
shot nature of the data gathering. A particular problem arose when a school
was geographically distributed. Aggregation of data at the school level was
of lesser value because the school’s population, and sometimes also
school’s policies, can differ between geographical locations. Finally,
concerns were expressed when class organization or differentiation forms
were very different from the approaches adopted in the reference group.
Differences in perceived user-friendliness of the SPFS. The nature of the
overall feedback system influenced feedback use. Respondents complained
about the large investment of time and effort during the data gathering
process. Teachers and pupils perceived the tests as stressful. In addition,
questionnaires directed to parents required a considerable amount of time
and a willingness to report private information. Furthermore, test times
overlapped with other key assessment and evaluation periods in the school
year. This explains why some teachers considered participation in the
project as an extra burden on top of a heavy workload. This feeling was
reinforced when the feedback was perceived as less relevant.
User-friendliness also refers to the tailoring of the school feedback.
Some principals suggested adapting the report to the individual school
setting. Along the same lines, satisfaction with the communication between
users and the feedback system played a role. Moreover, the schools received the
feedback at a rather unexpected moment, which made it difficult to include
the new information in the policy making cycle.
Support related factors
The interviewees offered valuable information on user needs concerning
feedback support and advised us about how to fulfill these needs. The
results reveal that feedback use requires both policy oriented and research
oriented skills. These are skills that must be developed (Visscher, 2002).
Differences in support needs. As mentioned above, most users reported
not being able to interpret the information without extra support.
Nevertheless, feedback support should go further than just assuring a
correct interpretation. Almost all principals reported that they got stuck
after their attempt at interpretation. They stated that they did not feel
confident about their interpretative capacities and that they needed
recommendations on how to proceed to the next phases in feedback use.
I can only hope my interpretations are correct. But definitely with the last
report, it is so extensive that there is some – I will not say doubt, but fear
– that it might be wrong. (School 6)
It is the same problem as with pupil monitoring systems. You can go to
the teacher and say ‘these are your results. This child scores an E, Here
you are.’ That teacher will file the report and there it stops. (School 3)
The respondents asked for specific help dependent on whether they
received positive or negative feedback results. Furthermore, they requested
help in diagnosing the causes and circumstances that the results could be
attributed to. Most respondents asked for concrete instructions for action.
This suggests that consultation services could help to fulfill these additional
needs.
Differences in support characteristics. When asked for ideas on how to
organize support, some respondents requested a face-to-face introduction
to the concepts and representations in the report. These sessions should be
organized on site, but if that is not possible, regional meetings are
acceptable.
Feedback support should be functional, offering intelligible, theoretical,
and practical information. Principals expected the support to go beyond the
interpretation phase and to empower schools to diagnose their results.
Concerning the interpretation, we try to manage it. But we do not know
if we are doing it right. It would be interesting if the SPF project would
come with the report to the schools and would explain the information in
a team meeting with the teachers, with the whole team, to show us how
to look at the results. ‘What’s the next step?’ Because now we only get
the ‘sec’ results and read them as such, as how they are printed. Even though
some reading advice is provided, the impulse to really do something with
it is always lacking. (School 4)
Defining the role of external support services was a difficult issue. Some
respondents claimed that schools have to take the lead in feedback use.
This is in line with Earl and Fullan (2003), who claim that professional
development will help strengthen personal confidence and self-efficacy in
coping with complex feedback information. The respondents indicated a
preference for internal support by counselors and via in-service training.
External support from feedback suppliers should not interfere with these
initiatives. They emphasized the demand-driven nature of support. This
confirms the idea that external support must be tailored to the needs of
individual schools. A sufficient level of goodness-of-fit is a requirement to
achieve successful support (Nevo, 1995).
Principals also referred to school team members as a basis for support.
Principals mostly got support from the special care coordinator or teacher.
Often these staff members were more experienced in interpreting
statistical concepts and graphical representations. These staff members can
play a role as complementary specialists. As they have a more flexible work
schedule, they can allocate time to study feedback reports. This is not the
case for teachers who have to work according to a prescheduled roster.
Some school principals also wanted to protect team members against work
overload, thus not involving them in feedback use activities. They might
also have perceived these staff members as less important sources of
support in feedback use.
7. Implications, limitations and conclusion
The present study focuses on the perception of principals of school
performance feedback and the actual use of feedback information. This
study took place within the context of a larger project aiming to develop
and implement a new school performance feedback system. This study also
builds on an eclectic framework that integrates the literature on SPF use.
This framework was the guiding structure for interviews with 16 principals
from different primary school settings. Our results indicate that the
elements presented in the theoretical framework reappear in the
interviews. Figure 3 represents the integration of findings from the
literature and our study.
Figure 3. Integration of literature and research findings on SPF use

INFLUENCING FACTORS
Context related
◦ School improvement – accountability
◦ Pressure and support
School and user related
◦ Functions/expectations of SPF use
◦ Prior knowledge and experience in data use
◦ Priorities in task scheme
◦ Statistical knowledge and skills
◦ Perception of school performance level
School performance feedback (system) related
◦ Perception of relevance
◦ Perception of interpretability
◦ Perception of validity and reliability
◦ Perception of user-friendliness
Support related
◦ Support needs
◦ Support set up
◦ Internal – External

FEEDBACK USE
Successive phases
◦ Receiving feedback
◦ Reading and discussing
◦ Interpretation
◦ Diagnosis
◦ Planning
◦ Implementation
◦ Evaluation
Results: Types of feedback use
◦ Instrumental
◦ Conceptual
◦ Symbolic
◦ Strategic
◦ Pupil directed
◦ Motivating

EFFECTS
◦ Intended – Unintended
◦ Desirable – Undesirable
◦ Product – Process
The aim of this study was to illustrate and elaborate a framework of
factors that influence school performance feedback use. Where previous
studies have provided literature findings (Visscher, 2002; Visscher & Coe,
2003), perspectives of feedback suppliers (Schildkamp & Teddlie, 2008;
Visscher & Coe, 2002), and quantitative methods of testing feedback use
(Schildkamp, Visscher, & Luyten, 2009), this study illustrates the influence
of different variables on feedback use in a qualitative way.
From a theoretical perspective our findings can help refine the
description of feedback use. Whereas previous studies (e.g., Schildkamp,
2007; Visscher, 2002; Visscher & Coe, 2003) make a distinction between
different kinds of information use (cf. instrumental, symbolic, and
conceptual use; Rossi, Lipsey, & Freeman, 2004; cf. strategic use; Visscher &
Coe, 2003), an empirical investigation of the phases of feedback use has not
been carried out. In this study both were explored. Additional types of
feedback use emerged from the data: a motivating and pupil directed use.
The interview data also show that different types of feedback use are
related to one another and occur simultaneously or successively. While a
sequence of feedback phases can be discerned theoretically (Learning Point
Associates, 2004), the process of feedback use is less systematic in practice.
Our findings indicate that users can get stuck in the process of feedback
use. A crucial challenge for future feedback use is to detect the difficulties
in each phase and to offer appropriate support to systematize the process
involved.
Our findings indicate that interpreting school feedback, making a
diagnosis based on the results, discussing causes, and setting up actions
based on feedback results is not a clear-cut process. The results reveal that
feedback use requires both policy oriented and research oriented skills
which must be developed by users (Visscher, 2002). Educational authorities
should not neglect the importance of stimulating professional development
and providing external support. Expectations about the positive impact that
feedback use can have on school improvement will only be realized if extra
support is available (Schildkamp & Teddlie, 2008; Sun, Creemers, & de Jong,
2007).
To design appropriate support initiatives, a detailed analysis of the
difficulties encountered when interpreting feedback reports must be
conducted. For example, a recent study, which used both oral
comprehension tests and IRT-calibrated online tests, illustrated the
misconceptions that respondents reported during the interpretation of
feedback reports. The results of that study contributed to the design of
specific support initiatives (Verhaeghe, Verhaeghe, Vanhoof, & Valcke,
2009). Furthermore, experimental studies that manipulate the nature of
external support can contribute to the design of a more sophisticated SPFS
(e.g., Tymms, 1995) and the required support measures. In the design of
SPFS, it is important to integrate the characteristics which appear to have a
considerable influence on feedback use, such as relevance, interpretability,
reliability, and validity. These characteristics are mediated by the
perceptions of the feedback users. What is considered relevant by feedback
developers, policy makers, or researchers does not necessarily correspond
with what the target group perceives as relevant. However, little is known
about the effect of these differing perspectives in the context of school
feedback use.
Moreover, one cannot expect schools to successfully implement
innovations without making sufficient resources available (Davies & Rudd,
2001; Kimball, 2002). As school feedback use is not heavily promoted
(Davies & Rudd, 2001), resources are limited. When we consider the work
load of teachers and principals, our findings indicate that teachers will
prioritize their classroom related activities at the expense of school level
issues.
This study was conducted in Flanders where there is no accountability
culture or central examination system. It is not yet clear whether effective
feedback use in such a context should only function within a school
improvement perspective, as we found that feedback use was stimulated
by an accountability orientation in terms of the inspection visits. It would be
useful to examine the (in)direct influence of national and international
authorities on feedback use (Creemers, 2006). Future research could focus
on the relationship between a school improvement and an accountability
orientation of educational authorities and key stakeholders (Vanhoof & Van
Petegem, 2007) and on the balance between internal and external
evaluations (Kyriakides & Campbell, 2004), influencing feedback use in
schools.
The present study has certain limitations. The validity of our
findings is restricted to a specific educational context, with a particular
school performance feedback system. However, the aim of this study was
not to formulate generalizations but to explore and illustrate feedback use
by its users. Another limitation is the lack of a comprehensive framework
with an evidence-based set of influencing factors. Neither this
study nor previous school performance feedback studies have attempted
to meet this need. Furthermore, the link between school performance
feedback use and the more general practice of data-driven decision making
remains unexplored. Despite the focus on accountability in the data-driven
decision making literature, common points of interest with SPF use can be
further examined.
References
Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the
utilisation of management information systems in secondary schools.
School Effectiveness and School Improvement, 18(4), 451-467.
Coe, R. (2002). Evidence on the role and impact of performance feedback in
schools. In A. J. Visscher & R. Coe (Eds.), School improvement through
performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger.
Creemers, B.P.M. (2006). The importance and perspectives of international
studies in educational effectiveness. Educational Research and
Evaluation, 12(6), 499-511.
Davies, D., & Rudd, P. (2001). Evaluating school self-evaluation (LGA
research report 21). Berkshire: National Foundation for Educational
Research, Local Government Association.
Dey, I. (1993). Qualitative data analysis: A user-friendly guide for social
scientists. London: Routledge.
Earl, L., & Fullan, M. (2003). Using data in leadership for learning.
Cambridge Journal of Education, 33(3), 383-394.
Fitz-Gibbon, C.T., & Tymms, P. (2002). Technical and ethical issues in
indicator systems: Doing things right and doing wrong things. Education
Policy Analysis Archives, 10(6), 68-82.
Fullan, M. (2007). The new meaning of educational change (4th ed.).
London: Cassell.
Goldstein, H., & Myers, K. (1996). Freedom of information: Towards a code
of ethics for performance indicators. Research Intelligence, 57, 12-16.
Goldstein, H., & Spiegelhalter, D.J. (1996). League tables and their
limitations: Statistical issues in comparisons of institutional
performance. Journal of the Royal Statistical Society. Series A (Statistics
in Society), 159(3), 385-443.
Hammond, P., & Yeshanew, T. (2007). The impact of feedback on school
performance. Educational Studies, 33(2), 99-113.
Hoy, W., & Miskel, C. (2001). Educational administration: Theory, research
and practice. Boston: McGraw-Hill.
Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006).
Strategies to promote data use for instructional improvement: Actions,
outcomes, and lessons from three urban districts. American Journal of
Education, 112(4), 496-520.
Kimball, S.M. (2002). Analysis of feedback, enabling conditions and fairness
perceptions of teachers in three school districts with new standards-
based evaluation systems. Journal of Personnel Evaluation in Education,
16(4), 241-268.
Kluger, A.N., & DeNisi, A. (1996). The effects of feedback interventions on
performance: A historical review, a meta-analysis, and a preliminary
feedback intervention theory. Psychological Bulletin, 119(2), 254-284.
Kyriakides, L., & Campbell, R.J. (2004). School self-evaluation and school
improvement: A critique of values and procedures. Studies in
Educational Evaluation, 30, 23-36.
Learning Point Associates. (2004). Guide to using data in school
improvement efforts: A compilation of knowledge from data retreats and
data use at learning point associates. Retrieved October 23, 2007, from
http://www.learningpt.org/pdfs/datause/guidebook.pdf
Liket, T.M.E. (1992). Vrijheid & rekenschap: Zelfevaluatie en externe
evaluatie in het voortgezet onderwijs [Freedom and accountability: Self
evaluation and external evaluation in secondary education]. Amsterdam:
Meulenhoff Educatief.
Lindlof, T.R., & Taylor, B.C. (2002). Qualitative communication research
methods (2nd ed.). London: Sage Publications.
Maso, I., & Smaling, A. (1998). Kwalitatief onderzoek: Praktijk en theorie
[Qualitative research: Practice and theory]. Amsterdam: Boom.
Mason, J. (2002). Qualitative Researching (2nd ed.). London: Sage
Publications.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An
expanded sourcebook (2nd ed.). Thousand Oaks: Sage Publications.
Nevo, D. (1995). School-based evaluation: A dialogue for school
improvement. Oxford: Pergamon.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external
evaluation. In D. Nevo (Ed.), School-based evaluation: An international
perspective (pp. 3-16). Oxford: Elsevier Science.
Ritchie, J., & Spencer, L. (1994). Qualitative data analysis for applied policy
research. In A. Bryman & R. Burgess (Eds.), Analysing qualitative data
(pp. 173-194). London: Routledge.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic
approach (7th ed.). London: Sage Publications.
Saunders, L. (2000). Understanding schools’ use of ‘value added’ data: The
psychology and sociology of numbers. Research Papers in Education,
15(3), 241-258.
Saunders, L., & Rudd, P. (1999, September). Schools’ use of ‘value added’
data: A science in the service of an art? Paper presented at the British
Educational Research Association Conference, Brighton, University of
Sussex.
Schagen, I. (2004, November). Weighing the baby or fattening it: The use of
data to inform school evaluation. Paper presented at the NFER/ConfEd
Annual Research Conference, London.
Schildkamp, K. (2007). The utilisation of a self-evaluation instrument for
primary education. Unpublished doctoral dissertation, University of
Twente.
Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems
in the USA and in The Netherlands: A comparison. Educational Research
and Evaluation, 14(3), 255-282.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school
self-evaluation instrument. School Effectiveness and School
Improvement, 20(1), 69-88.
Silverman, D. (2005). Doing qualitative research: A practical handbook (2nd
ed.). London: Sage Publications.
Strauss, A.L., & Corbin, J. (2007). Basics of qualitative research: Grounded
theory procedures and techniques (3rd ed.). Newbury Park, CA: Sage
Publications.
Sun, H., Creemers, B., & de Jong, R. (2007). Contextual factors and effective
school improvement. School Effectiveness and School Improvement,
18(1), 93-122.
Tymms, P. (1995). Influencing educational practice through performance
indicators. School Effectiveness and School Improvement, 6(2), 123-145.
van Aanholt, T., & Buis, T. (1990). De school onder de loep [The school under
scrutiny]. Culemborg, The Netherlands: Educaboek.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-
indicatoren als strategisch instrument voor schoolontwikkeling
[Feedback on school performance indicators as strategic instrument for
school improvement]. Pedagogische Studiën, 81, 338-353.
Van Petegem, P., & Vanhoof, J. (2007). Towards a model of effective school
feedback: School heads’ point of view. Educational Research and
Evaluation, 13(4), 311-325.
Van den Berg, R., & Ros, A. (1999). The permanent importance of the
subjective reality of teachers during educational innovation: A concerns-
based approach. American Educational Research Journal, 36(4), 879-906.
Vanhoof, J. (2007). Zelfevaluatie binnenstebuiten: Een onderzoek naar het
verloop en de kwaliteit van zelfevaluaties in scholen [Self-evaluation
inside out: A study on the proceeding and quality of self-evaluations in
schools]. Mechelen: Wolters-Plantijn.
Vanhoof, J., & Van Petegem, P. (2007). Matching internal and external
evaluation in an era of accountability and school development: Lessons
from a Flemish perspective. Studies in Educational Evaluation, 33(2),
101-119.
Verhaeghe, G., Verhaeghe, J.P., Vanhoof, J., & Valcke, M. (2009). The value-
added results of schools: How to represent school feedback information?
Manuscript submitted for publication.
Visscher, A.J. (1996). The implications of how school staff handle
information for the usage of school information systems. International
Journal of Educational Research, 25(4), 323-334.
Visscher, A.J. (2002). A framework for studying school performance
feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement
through performance feedback (pp. 41-71). Lisse: Swets & Zeitlinger.
Visscher, A., & Coe, R. (Eds.). (2002). School improvement through
performance feedback. Lisse: Swets & Zeitlinger.
Visscher, A., & Coe, R. (2003). School performance feedback systems:
Conceptualisation, analysis, and reflection. School Effectiveness and
School Improvement, 14(3), 321-349.
Williams, D., & Coles, L. (2007). Teachers' approaches to finding and using
research evidence: An information literacy perspective. Educational
Research, 49(2), 185-206.
CHAPTER 4
VALUE-ADDED RESULTS OF SCHOOLS: HOW TO REPRESENT SCHOOL FEEDBACK INFORMATION
CHAPTER 4: VALUE-ADDED RESULTS OF SCHOOLS: HOW TO REPRESENT SCHOOL
FEEDBACK INFORMATION∗
Abstract
The use of data for school improvement purposes has recently gained
research interest. In order to use school performance feedback (SPF)
effectively, it is necessary to interpret feedback information correctly.
However, systematic research on this topic is scarce. Therefore, the present
experimental study was set up to examine the effectiveness of various
modes of explaining and representing the statistical concepts ‘value added’
and ‘learning gain’ in SPF reports. The results indicate that non-statistically
skilled people encounter interpretation difficulties, especially in deriving
value-added scores and for complex conceptual questions. This delineates
the importance of developing effective SPF systems and support initiatives.
∗ Based on Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of
schools: How to represent school feedback information. Manuscript submitted for publication in The Journal of Educational Research.
1. Introduction
Over the last decades, governmental bodies have required schools to be
accountable for their educational quality in return for school autonomy
(Nevo, 2002). Schools are not only expected to gain insight into their input
and process characteristics, but also to link these to their output. Therefore,
schools are required to systematically gather data on their functioning for
self-evaluation purposes. In this context, school performance feedback
(SPF) is an important source of information on pupil performance. It also
helps the school to detect the extent to which it contributes to pupils’
performance levels.
The correct interpretation of school feedback information is a crucial
condition for effective feedback use (Visscher, 2002; Visscher & Coe, 2003).
The interpretation phase is one of the most important phases in the process
of using feedback and requires a considerable amount of time, skills, and
effort (Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). An examination
of existing SPF systems and their related literature reveals that research on
user comprehension is scarce (Schildkamp & Teddlie, 2008). Few studies
have examined the effectiveness of the various modes of explaining and
representing data in school feedback reports. This is problematic
considering the fact that SPF reports use complex concepts, such as
learning gain and value added, whilst SPF users (i.e., school principals and
teachers) are often not statistically skilled (Earl & Fullan, 2003; Kerr, Marsh,
Ikemoto, Darilek, & Barney, 2006; Saunders, 2000; Williams & Coles, 2007).
In this study we explore whether alternative modes of explaining and
representing SPF information have a differential impact on user
comprehension in a group of non-statistically skilled people. The results of this
study are expected to contribute to a better understanding of the way the
target group (i.e., mainly school principals and teachers) interprets SPF
information. The results are also expected to direct the design and
development of a new SPF system.
Below we outline the central concepts investigated and present our
theoretical frame on how alternative ways of presenting complex
information influence users’ understanding. Following this, research
questions and hypotheses are presented.
1.1 School Performance Feedback
School performance feedback is conveyed to its users through school
performance feedback systems (SPFSs). Visscher and Coe (2002) define
SPFSs as “Information systems external to schools that provide them with
confidential information on their performance and functioning as a basis for
school self-evaluation” (p. xi). SPFSs primarily aim at supporting school
improvement and internal quality policy, which distinguishes them from
school accountability systems. These feedback initiatives are
complementary to central examination results (provided by government
agencies) and to school and class data. SPFSs contribute to the creation of
information-rich environments essential for schools in their data-driven
decision making process. These data provisions serve schools in their role as
learning organisations that continuously monitor and improve their quality
policy (Hofman, Dijkstra, & Hofman, 2009; Leithwood, Aitken, & Jantzi,
2006).
Examples of available SPFSs or related projects are Performance
Indicators in Primary Schools (PIPS; Durham University, Centre for
Evaluation and Monitoring, n.d.), and the VAS [Monitor and Advice System]
(CiTO [Central Institute for Test Development], n.d.). In some cases, SPF is
also provided to schools in return for their participation in large scale
research projects. This approach is adopted by Flemish researchers in
relation to the Progress in International Reading Literacy Study (Katholieke
Universiteit Leuven, Centre for Educational Effectiveness and Evaluation,
2008), the Programme for International Student Assessment, and the
Trends in International Mathematics and Science Study.
The present study focuses on the representation and understanding of
two statistical concepts that can be considered as key concepts in school
feedback reports: learning gain and value added (Saunders, 2000). In this
study, learning gain (or gain score) is defined as the progress of a learner in
a certain knowledge domain. This can be considered as the difference
between the test scores of an individual at two different moments, on the
condition that the same scale is used for both tests. The latter implies that
either the same test is administered twice or different tests are calibrated with
Item Response Theory (IRT). For example, PIPS Reception builds on the
administration of the same test at the start and at the end of the school
year while the VAS calculates the learning gain on the basis of skill scores,
estimated by the performance on different tests at three measurement
occasions.
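As a minimal illustration of this definition, the sketch below computes gain scores for three hypothetical pupils. The scale label and all scores are invented for illustration; the only point carried over from the text is that the difference is meaningful only when both scores are expressed on the same scale.

```python
from statistics import mean


def learning_gain(first, second):
    """Return the gain score between two measurement occasions.

    Each argument is a (scale, score) pair. The subtraction is only
    meaningful when both scores are expressed on the same scale
    (the same test twice, or tests calibrated onto one scale).
    """
    scale_1, score_1 = first
    scale_2, score_2 = second
    if scale_1 != scale_2:
        raise ValueError("scores must be on the same scale")
    return score_2 - score_1


# Hypothetical skill scores for three pupils at the start and end of a year.
start = [("skill", 41.0), ("skill", 55.0), ("skill", 48.0)]
end = [("skill", 50.0), ("skill", 61.0), ("skill", 60.0)]

gains = [learning_gain(a, b) for a, b in zip(start, end)]
mean_gain = mean(gains)  # average learning gain of this small group
```

Refusing to subtract scores from different scales is a design choice of this sketch; it simply makes the same-scale condition explicit.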
Value added refers to the extent to which the school has contributed to
the achievement progress of its students (Organisation for Economic Co-
operation and Development, 2008). It can be operationalised as the school
level residual, after adjusting for the effects of student background
characteristics. Two different approaches to explaining the concept of value
added can be discerned in SPF reports. Both approaches to explaining the
concept are used in an overview publication of the OECD in relation to
value-added modelling (OECD, 2008), but are not clearly distinguished or
further refined.
The first approach is based on the notion of expected values. Based on
students’ background characteristics and their prior achievement, a certain
achievement level can be expected. Students’ actual/observed achievement
level can be higher or lower than the expected/predicted achievement
level, because of measurement error, uncontrolled individual differences,
or differences between schools. Averaging the differences between
expected achievement and observed achievement within a particular
school will in principle (i.e., when there is a sufficient number of students)
cancel out measurement error and the impact of individual differences.
What is left is the shared value added for all students attending the same
school. This approach is applied in the feedback reports of the Centre for
Evaluation and Monitoring and in the Value Added National Project (Fitz-
Gibbon, 1997).
The second approach to explaining the concept of value added starts
from the notion of adjusted achievement level. The adjusted achievement
level represents the achievement level a learner in a particular school
would have had if his or her input characteristics and prior achievement
were equal to the reference group, i.e., the “average” school. If a school
does not differ from an average school in its contribution to students’
learning, its mean adjusted achievement level will be equal to that of the
average school. In that case, the difference between the school mean of the
students’ adjusted achievement scores and the mean of the average school
will be zero. However, if there is a difference, the school’s contribution to
students’ learning (value added) is higher or lower than in the average
school. This approach is used in the PIRLS reports in Flanders.
It should be noted that these two approaches refer to exactly the same
statistical procedure and resulting regression equation. They only differ in
the underlying mathematical operation used to isolate the school level
residual from that equation, as represented in Table 1.
Table 1 Regression equations of two approaches determining the school’s value added

Approach 1 – Expected means
Basic multilevel regression equation: $Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_{0j} + e_{ij}$
Observed school mean is given by: $\bar{Y}_{j} = \beta_0 + \frac{1}{n_j}\sum_{i=1}^{n_j} \beta_1 X_{ij} + u_{0j}$
Stripping the school level residual gives the predicted or expected school mean: $\hat{\bar{Y}}_{j} = \beta_0 + \frac{1}{n_j}\sum_{i=1}^{n_j} \beta_1 X_{ij}$
Subtracting predicted mean from raw mean yields: $\bar{Y}_{j} - \hat{\bar{Y}}_{j} = u_{0j}$
From which follows: $u_{0j} = \bar{Y}_{j} - \hat{\bar{Y}}_{j}$
Meaning: Value added = observed mean – expected mean

Approach 2 – Adjusted means
Basic multilevel regression equation: $Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_{0j} + e_{ij}$
Observed school mean is given by: $\bar{Y}_{j} = \beta_0 + \frac{1}{n_j}\sum_{i=1}^{n_j} \beta_1 X_{ij} + u_{0j}$
Stripping the effects of student input characteristics (= setting them equal to reference group): $-\frac{1}{n_j}\sum_{i=1}^{n_j} \beta_1 X_{ij}$
Yields adjusted school mean: $\bar{Y}_{j,\mathrm{adj}} = \beta_0 + u_{0j}$
From which follows: $u_{0j} = \bar{Y}_{j,\mathrm{adj}} - \beta_0$
Meaning: Value added = adjusted mean – grand mean
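The equivalence of the two approaches can be checked numerically. The sketch below is a simplified illustration under stated assumptions, not the multilevel model used in SPF systems: it fits one ordinary least squares line with a single predictor (prior achievement) to invented data for two hypothetical schools, and takes the school effect as the plain mean of the pupil residuals. All names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical records: (school, prior achievement, observed achievement).
pupils = [
    ("A", 40.0, 52.0), ("A", 50.0, 63.0), ("A", 60.0, 71.0),
    ("B", 45.0, 50.0), ("B", 55.0, 58.0), ("B", 65.0, 69.0),
]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


xs = [p[1] for p in pupils]
ys = [p[2] for p in pupils]
intercept, slope = fit_line(xs, ys)
grand_mean_x, grand_mean_y = mean(xs), mean(ys)


def value_added_expected(school):
    """Approach 1: observed school mean minus expected school mean."""
    rows = [p for p in pupils if p[0] == school]
    observed = mean(y for _, _, y in rows)
    expected = mean(intercept + slope * x for _, x, _ in rows)
    return observed - expected


def value_added_adjusted(school):
    """Approach 2: adjusted school mean minus grand mean."""
    rows = [p for p in pupils if p[0] == school]
    # Adjusted score: the score a pupil would have had with average intake.
    adjusted = mean(y - slope * (x - grand_mean_x) for _, x, y in rows)
    return adjusted - grand_mean_y


for school in ("A", "B"):
    print(school, round(value_added_expected(school), 3),
          round(value_added_adjusted(school), 3))
```

Both functions return the same value for each school (up to floating-point rounding), which mirrors the point that the two explanations describe one and the same estimate.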
To our knowledge, the differential impact of these two modes of
explaining value added on SPF users’ understanding of the concept has not
yet been studied. Not much research has been carried out on the way SPF is
interpreted by its users or how this interpretation is influenced by the
representational format of SPF reports. However, since school feedback
builds on numerical information, statistical concepts, and graphical
representations, literature on graph design and interpretation is used as a
starting point for the design of this study.
1.2 Representation of complex quantitative information
Numerous studies have examined the way graphical representation of
numerical information is understood (e.g., Kosslyn, 2006; Leinhardt,
Zaslavsky, & Stein, 1990). Research has also examined the representation of
textual information in combination with multimedia representations, such
as illustrations and graphs (e.g., Mayer, 2001; Mittal, Carenini, Moore, &
Roth, 1998; Schnotz & Bannert, 2003), in relation to task performance. We
do not present an overview of this field of research, but build on the
available evidence about design principles derived from these studies. We
refer to these principles when describing the experimental design of the
present study.
Principles of effective graph design
An interesting overview study that summarises best practices in graph
design is the work of Kosslyn (2006). According to this author, many graphs
are not satisfactory because they do not adequately consider the aims,
needs and competences of the user. Based on research in perceptual and
cognitive information processing, Kosslyn proposes eight design principles:
(1) The principle of relevance. This concerns limiting or reducing the amount
of information presented: Only the information necessary to get the
message across must be represented. (2) The principle of appropriate
knowledge. This means tailoring information to the prior knowledge of the
user. (3) The principle of salience. Since it is crucial to attract the audience’s
attention, this principle stresses making large perceptible differences in
information presentation. (4) The principle of discriminability. The graphical
representation should enable users to distinguish between different pieces
of information. (5) The principle of perceptual organisation. This refers to
the tendency of users to group together perceptual elements and to
remember these groups better than isolated elements. Furthermore,
Kosslyn recommends promoting understanding and memory: (6) the
principle of compatibility stresses the importance of the compatibility
between form and meaning, and (7) the principle of informative changes
indicates that readers interpret any changes in displays as conveying
information (e.g., changing the colour, adding lines). Finally, (8) the principle
of capacity limitations addresses users’ limited capacity to retain and
process information. Mayer (2001) also stresses these limitations in his
cognitive theory of multimedia learning. In this context, Sweller, van
Merriënboer, and Paas (1998) describe the concepts of intrinsic and
extraneous cognitive load. Intrinsic cognitive load refers to the difficulty
inherent to instructional materials. The degree of intrinsic cognitive load
depends on the element interactivity, or the number of elements
simultaneously manipulated in one’s working memory. While intrinsic
cognitive load is related to the material being learned, extraneous cognitive
load refers to the instructional design. This concerns the execution of
cognitive activity that is redundant to the purpose of the task (Chandler &
Sweller, 1991). The user’s working memory is overtaxed when the materials
are presented ineffectively. For example, when accompanying
text and illustrations are presented separately or inappropriately, the
reader has to invest extra cognitive effort to integrate the information. The
two types of cognitive load are additive, but only extraneous cognitive load can
be reduced or prevented by the design of the learning material.
Misconceptions in interpreting graphs
Next to research on data representation, our study also builds on previous
research on the interpretation of graphs and the common misconceptions
of inexperienced users. Smith III, diSessa, and Roschelle (1993) define a
misconception as “a student’s conception that produces a systematic
pattern of errors” (p. 119) that arises from the student’s prior learning. This
prior learning can follow from formal instruction (Smith III, diSessa, &
Roschelle, 1993), general knowledge, or intuition (Leinhardt, Zaslavsky, &
Stein, 1990). Alternative terms that are used to depict students’
misconceptions include preconceptions, alternative conceptions, naïve
beliefs, alternative beliefs, alternative frameworks, naïve theories, and
systematic errors (Mevarech & Kramarsky, 1997; Smith III, diSessa, &
Roschelle, 1993). All terms refer to students’ conceptualisations that differ
from the accepted or intended meaning of the instructed concepts. From a
constructivist point of view, misconceptions can be considered as the
incomplete acquisition of expert knowledge in a learning process, rather
than mistakes that impede learning (Smith III, diSessa, & Roschelle, 1993).
Clement (1988) and Leinhardt, Zaslavsky, and Stein (1990) present a review
of common misconceptions that occur when people interpret graphs. These
include the slope-height confusion, confusion in responding with a point or
an interval, and mistaking a graphical for an iconic representation (e.g.,
uphill and downhill for a rising and a descending curve). These
misconceptions are also mentioned in the studies of Beichner (1994),
Mevarech and Kramarsky (1997), and Kramarski (2004).
SPF systems have not yet been incorporated in Flanders’ school system,
which means that feedback users are inexperienced. Given that SPF
contains several complicated concepts (e.g., value added and learning gain)
and forms of representation (e.g., growth curves), we expect feedback
users to make several mistakes when interpreting the information. In this
study we explore which difficulties statistically unskilled
participants encounter when trying to understand learning gain and value added. We
also try to prevent interpretation difficulties by using effective
graph design principles in the SPF reports. It is therefore important to
examine what ways of representing information in the SPF report are
effective, taking into account the characteristics of its users.
Individual differences
Previous research has explored the interaction between representation
formats and individual characteristics. The effect of representation forms
on task performance appears to depend on learning styles, individual
preferences (Dekeyser, 2001), differences in ability (Mayer, 2001; Tapiero,
2001), and prior knowledge (De Westelinck, Valcke, De Craene, & Kirschner,
2005; Mayer, 2001; Shah & Hoeffner, 2002). Prior knowledge also appears
to be an important factor in determining the degree of intrinsic cognitive
load. Since more experienced users are able to handle higher element
interactivity, they experience a lower degree of intrinsic cognitive load
(Sweller, van Merriënboer, & Paas, 1998). Furthermore, a recent study was
carried out on the interaction between learner characteristics and
hypermedia learning, cognitive load, and information utilisation strategies
(Scheiter, Gerjets, Vollmann, & Catrambone, 2009). The results of that
study indicate that characteristics such as positive attitudes towards
mathematics, more complex epistemological beliefs, higher prior
knowledge, and better cognitive and metacognitive strategy use have a
positive influence on these outcome variables.
This study will incorporate several individual variables that are expected
to have an influence on feedback users’ understanding of the information.
1.3 Research questions and hypotheses
The present study examines the extent to which non-statistically skilled
users understand the explanations and representations of the concepts
learning gain and value added. We focus on feedback users’ conceptual and
procedural (i.e., deriving information from graphical representations)
understanding of these terms. The level of understanding is indicated by
the number of misconceptions and other mistakes made during the
interpretation of the information. Since the feedback users participating in
this study are inexperienced in interpreting SPF reports, we expect them to
have misconceptions and interpretation difficulties with the complex
conceptual and graphical information (Hypothesis 1).
Interpreting information accurately largely depends on the instructional
design of the learning material. Two modes of explaining the term value
added have been discussed above. Since no research has been carried out
on these two modes of explanation, we cannot formulate any expectations
with regard to the differential impact they may have on feedback users’
conceptual and procedural understanding of the SPF report. We therefore
examine this in an exploratory way (Research question 1).
Adding representations to textual information can support the
interpretation of the information presented, as indicated in the theory of
multimedia learning (Mayer, 2001). Analogously, we test the hypothesis
that graphical representations with supporting information that is supposed
to facilitate interpretation are more favourable for successful feedback
interpretation than basic representations (Hypothesis 2).
As individual learner characteristics have an effect on task performance
and cognitive load, we take these differences into account and expect them
to serve as significant control and/or moderator variables.
2. Method
2.1. Design
A 2 × 3 factorial experimental design with a post-test was used. Two variants
of explaining the concept of value added were combined with three
alternatives to represent learning gain and value added (for a schematic
overview, see Appendix A).
2.2. Participants
The target audience for SPF consists of school principals and teachers in
primary and secondary education. However, since no SPF system is
currently available in Flanders, a study was set up involving first year
students in educational sciences (N = 312, mean age 19.33 years, SD = 1.69,
88% women) at Ghent University. Not all participants started
educational studies without prior study experience, as some had already
obtained a professional bachelor’s degree in a different subject area (n = 62,
mean age 21.98 years, SD = 1.14, 81% women; versus freshmen, n = 250,
mean age 18.75 years, SD = 1.16, 90% women).
Students participated in this experiment as a formal part of their study
programme. They signed up individually for one of the eight parallel
experimental sessions.
2.3. Material
SPF Tutorial - Experimental conditions
Participants were randomly assigned to one of the six different
experimental conditions. In each condition, they received a specific version
of an SPF report via PowerPoint presentation. This medium was used as it
enables a controlled stepwise instruction for each participant at his/her
own pace and simulates the electronic version of the SPF that schools
receive.
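The six experimental conditions follow from crossing the two explanation modes with the three representation formats. The sketch below enumerates these cells and allocates participants at random. The condition labels, the seed, and the near-equal group sizes are assumptions made for illustration; the text above states only that assignment was random.

```python
import random
from itertools import product

explanations = ["expected means", "adjusted means"]             # 2 levels
formats = ["text and graphs", "added tables", "added symbols"]  # 3 levels

# The six cells of the 2 x 3 design.
conditions = list(product(explanations, formats))


def assign(participant_ids, seed=0):
    """Randomly allocate participants to conditions in near-equal numbers."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal the shuffled participants round-robin over the six conditions.
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(ids)}


# Hypothetical participant identifiers 0..311 (N = 312).
allocation = assign(range(312))
```

Shuffling first and then dealing round-robin keeps the groups balanced while leaving each participant's condition to chance; simple independent draws per participant would be an equally valid reading of "randomly assigned".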
Each of the six presentations consisted of approximately 40 slides. First,
an introduction and a slide overview were presented, followed by pie graphs
representing the background characteristics of the fictitious school
population. Next, an explanation of the concept learning gain was
presented, which was the same in all experimental conditions. Finally, the
definition and estimation of value added were shown.
Each feedback report presented the average growth curve from grade
one to grade six of a cohort of pupils and presented the school’s value
added for one single subject. The horizontal axis of the line graphs indicated
time and measurement moments; the vertical axis indicated the mean skill
score, as represented in Figure 1.
Figure 1. Example of growth curve as used in the present study.
The design of the tutorials builds on Kosslyn’s (2006) principles of relevance,
appropriate knowledge, and capacity limitations. As few slides as
needed were used to clearly explain the concepts learning gain and value
added. Therefore, only limited information was given about the underlying
statistical analyses, such as Item Response Theory and regression analysis.
Furthermore, captions were added piecewise to graphs, consistent with
Mayer’s (2001) theory of multimedia learning. Spatial contiguity was
respected to promote the integration of captions and illustrations, and to
prevent extraneous cognitive load (Chandler & Sweller, 1991; Sweller, van
Merriënboer, & Paas, 1998). For the design of the growth curves, different
colours for lines and different symbols for points were used, following the
principles of discriminability and salience (Kosslyn, 2006).
The difference between alternative presentations was based on the way
value added was explained (Research question 1). In half of the conditions,
value added was presented as the difference between the school mean of
the students’ adjusted achievement and the mean of the “average school”
(i.e., the reference group). In the other conditions, value added was
explained as the difference between the average expected achievement
and observed achievement within the fictitious school.
In addition, three different presentation formats were used in the target
group: (1) a baseline version building on text and graphs, and two
elaborated versions enriched with either (2) tables, or (3) symbolic
representations of the underlying statistical concepts. For the basic version,
we opted for text explanation and growth curves, since that is a common
way to represent longitudinal data. Two additional conditions were created
by adding representations that are supposed to support the knowledge
construction of the learners (Hypothesis 2). First, we opted to add cross
tables to the basic version to support the use of prior knowledge, since the
target audience is acquainted with this form of representation from daily
use. The second elaborated version was based on symbolic representations
to explain the simplified regression equations (without detailed equations
as in Table 2). Schematic representations were used, but the variable
names were written in full instead of using Greek symbols (see Figure 2). In
this way, Kosslyn’s principle of appropriate knowledge (2006) was
respected. This form of representation was expected to foster a deeper
understanding of the value-added estimation procedure, resulting in
higher performance scores.
Figure 2. Example of symbol representation as used in the present study.
Performance test
An online post-test was developed to measure respondents’ conceptual
understanding of the SPF and their procedural skills in deriving information
from graphs and tables (Anderson, Krathwohl, Airasian, Cruikshank, Mayer,
Pintrich, et al., 2001). The post-test consisted of two parts. In the first part
(closed version) students were not allowed to look back at the feedback
report, whilst they could do so in the second part (open version). In reality,
principals and teachers are always able to check the feedback report, as
mirrored in the open version of the test. Nevertheless, the closed version
was used to determine the potential differential learning effects of
alternative representations. Since we can expect a carry-over effect of
taking the first test on the results of the second test, half of the
respondents only participated in the second test. Building on the different
experimental conditions and test approaches, 12 different test groups can
be distinguished with the number of participants in each group varying
between 25 and 28 (see Appendix A for a schematic overview).
As we found no previous research that tests SPF users’ comprehension
of value added and learning gain, we developed a test for this study. To do
so, we created a framework which included all of the different cognitive
tasks that have to be performed to correctly interpret the feedback
information. Using this framework, suitable test items were developed
varying in degree of difficulty. Out of this list of closed items (true-untrue
and multiple choice) and open items (filling in digit values), a test was
composed that could be completed in 30 minutes. Tests consisted of
conceptual and procedural items on two central concepts in the SPF report:
• 6 items referring to conceptual knowledge of learning gain: For example:
“Learning gain is the extent to which pupils progress in a certain skill
domain. (true-untrue)”
• 5 to 6 items referring to the conceptual knowledge of value added: For
example: “Value added can only be determined if you know the input
characteristics of pupils. (true-untrue)”
• 13 items referring to reading off learning gain from graphs and
tables: For example: “Look with close attention to these growth curves.
Then, complete the blanks in the table (score school 1 at start grade 1,
score at end of grade 1 and learning gain).”
• 17 to 18 items referring to the derivation of the value-added scores from
graphs and tables: For example: “Look with close attention to these
growth curves. Determine if these statements are true or untrue. School
1 reached a higher value-added score than school 2 in grade 4.” See also
Figure 4 for a simplified example of deriving value-added scores from
growth curves (in this test, the growth curves represented 6 grades and
2 schools were presented simultaneously).
A section of each post-test was designed in accordance with the nature
of the experimental condition. This means that the terminology and curves
were adapted to the way in which the concept of value added was
explained, either in terms of adjusted or expected means.
The psychometric quality of the test was checked by first converting the
scores on the different test versions to one common scale, applying a
three-parameter IRT model. Seven test groups were defined according to the
different test variants. A satisfactory overall fit was found for 111 of the
original 127 items (LR = 508.9, SE = 556.0, p = .92). The number of badly
fitting items did not exceed the number to be expected by chance. The
empirical reliability of the tests varied from .80 to .90
depending on the test version and test group, with the exception of .72 for
one particular test group (Mdn = .84). Exploration of a two-parameter IRT
model yielded comparable results, but 4 extra items had to be removed
from the calibration to attain a good overall fit (LR = 535.4, SE = 549.0, p =
.65). The correlation between skill scores resulting from the three- and
two-parameter models is r = .95 (p < .001). The three-parameter model was
finally preferred, because its IRT scores were slightly closer to a normal
distribution.
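As a reminder of what the two models describe, the following minimal sketch gives the usual logistic form of the three-parameter item response function; setting the lower asymptote to zero yields the two-parameter model. The function name, the parameter values, and the omission of a scaling constant are assumptions made for illustration; the calibration software and settings used in the study are not reproduced here.

```python
import math

def p_correct(theta, a, b, c=0.0):
    """Probability of a correct answer under the three-parameter logistic model.

    theta: respondent skill; a: discrimination; b: item location (difficulty);
    c: lower asymptote ("guessing"). With c = 0 this is the two-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item parameters, chosen only for illustration.
easy_item = dict(a=1.0, b=-1.0, c=0.25)
hard_item = dict(a=1.0, b=1.0, c=0.25)

for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(p_correct(theta, **easy_item), 3),
          round(p_correct(theta, **hard_item), 3))
```

At any given skill level, the item with the lower location has the higher probability of a correct answer, which is why a lower item location corresponds to an easier item.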
Short survey of learner characteristics
In addition to the post-test, data was gathered on characteristics of the
participants. As discussed in the theoretical framework, individual
differences must be taken into account when designing studies on data
representation. However, since the instruction and testing time in this
experiment was limited, we could not use elaborate measurement
instruments (as, for example, in Scheiter, Gerjets, Vollmann, & Catrambone,
2009). We therefore selected a number of indicator variables.
To take into account differences in prior knowledge of the participants
in this study (De Westelinck, Valcke, De Craene, & Kirschner, 2005; Mayer,
2001; Shah & Hoeffner, 2002; Sweller, van Merriënboer, & Paas, 1998),
data was collected for the following variables:
• their study program: freshmen or students with a prior bachelor degree
• the number of hours of mathematics per week in the last and second
last year of their secondary education, and
• their mathematics exam score at the end of their secondary education
(in %)
As an indication of attitudes towards statistics (e.g., Scheiter, Gerjets,
Vollmann, & Catrambone, 2009), an item was included measuring the
degree to which participants like statistics. This was measured on a 7-point
Likert scale ranging from 1 (totally dislike) to 7 (like it very much). As an
additional moderator variable, we included the participants’ perceived
clarity of the feedback report, since it is plausible that not all respondents
experienced the clarity of the instruction material equally. This was also
measured on a 7-point Likert scale, ranging from 1 (very unclear) to 7 (very
clear).
2.4. Procedure
The entire population of freshmen in educational sciences was invited to
schedule their participation in an experimental session by means of a
learning management system. A maximum of 50 students could participate
in each session, which was set up in a computer lab. Following a brief
introduction, participants were asked to assume the fictitious role of a
school principal who had received a school performance feedback report on
the results from a longitudinal study that their school had participated in.
Participants were asked to imagine that the pupils of their school had been
monitored over the six years of primary school education and had been
tested on seven occasions. Participants were asked to read, at their own
pace, the school feedback report presented to them as a PowerPoint
presentation. At the end of the presentation, they were asked to click on
the link to the online test and short survey. If their computer screen turned
orange, they had to take the test without looking back at the presentation
(closed version). If the screen turned green, they received a printed version
of their school feedback report to guide them when answering the open
version of the post-test. Students were told they had approximately 90
minutes to read through the materials and to complete the post-test and
short survey.
2.5. Analyses
An explorative analysis of the descriptive data was carried out to screen
response patterns in relation to the content, degree of difficulty, and the
nature of the questions. To this end, scatter plots were used to represent the
locations of the items in terms of the item type (conceptual – procedural)
and item content (learning gain – value added). The item location is an IRT
parameter that is related to the percentage of correct answers (r = -.90 in
this study) and gives an indication of the item difficulty level: the lower the
location of an item, the more participants answered that item correctly, which
indicates a lower level of difficulty. To clarify the meaning of these item
locations, the percentages of correct answers for these items are reported
below. In addition, an error analysis was performed by first listing all
possible answers, then revealing error patterns, and finally reconstructing
participants’ reasoning processes.
First, the potential difference between the scores on the open and
closed test was examined using a t-test. Depending on this result, further
analyses were performed with either only the open test (if no difference) or
both tests (in case they differ).
The differential impact of experimental conditions was checked on the
basis of univariate analyses of covariance, controlling for potentially
confounding variables (mathematics level, degree of liking statistics, current
study program, etc.). Differences were tested with respect to the IRT test
scores. Additionally, pairwise comparisons were executed to determine the
differences within the categories of significant factors. Furthermore,
relevant moderator effects were included in addition to the main effects to
get a more nuanced view on the predictors in the model. These moderator
effects were added to the model stepwise. For this study, the students’
mathematics exam scores in the last year of secondary education were
brought into interaction with the hours of mathematics they received per
week. Furthermore, the moderating relation between value-added
explanations and graphical representation modes was examined. Finally,
the model examined the interaction effect between the value-added
conditions and the perceived clarity of the presentation.
The assumptions of the analysis techniques, i.e., homogeneity of
regression slopes and of residual variances, were checked; no assumption
had been violated.
3. Results and discussion
3.1. Descriptive statistics of correct answers
No differences were found between the results of the open and closed
versions of the test (t(153) = .322, p = .748). Consequently, the scores
obtained for the open version of the test were used in the subsequent
analyses.
Descriptive statistical analysis reveals that students did not experience
difficulties in reading exact values from the tables and graphs (more than
85% of the answers correct), or in calculating learning gains (more than 75%
correct). To illustrate the spread of the items, the panel display in Figure 3
shows the frequencies of the standardised item locations in relation to the
item type and item content.
Figure 3. Panel display of standardised item locations in relation to item type and
content.
For example, the upper right panel shows no items exceeding a standard
deviation above zero, indicating a mean item location for procedural
learning gain items. This implies that test items that required participants to
derive learning gain values from graphs and tables were not more difficult
than the mean test difficulty level. In contrast, the most difficult items
appear in the procedural value-added panel. On average, only 35% of the
respondents were able to derive value-added scores from the graphs
presented in the feedback report. Reading off value-added scores requires
the comparison of data (heights and slopes) from different growth curves
(e.g., the school’s adjusted growth curve and the “national” average growth
curve or the school’s expected growth curve and the school’s observed
growth curve). In contrast, deriving the average learning gain over a certain
period only requires the examination of one growth curve (see Figure 4).
Calculating value added thus requires extra processing, possibly causing
cognitive overload in the working memory due to high element interactivity
(Sweller, van Merriënboer, & Paas, 1998). This difference in mental effort,
related to intrinsic cognitive load, may explain the lower scores for
procedural value-added items in comparison to the learning-gain items.
Figure 4. Example of deriving learning gain and value added from growth curves.
For deriving learning gain, the difference in skill scores between two points of the
same curve must be calculated. For example, the learning gain of this school in the
first grade is 50 - 35 = 15. For reading off value-added results, two curves need to
be compared and a geometrical translation needs to be performed. Before
subtracting the end points, the starting points of the curves must coincide. For
example, the value added of this school in the second grade is -10.
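The two derivations described in the caption of Figure 4 can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and apart from the scores 35 and 50 taken from the caption, all curve values are invented so that the result matches the -10 mentioned there. The comparison curve stands for either the reference curve or the expected curve, depending on the explanation mode.

```python
def learning_gain(curve, start, end):
    """Difference in mean skill score between two points of the same curve."""
    return curve[end] - curve[start]

def value_added(school, comparison, start, end):
    """Compare two curves over one period after aligning their starting points.

    The comparison curve is shifted vertically so that it starts where the
    school's curve starts; the end points are then subtracted. This equals
    the school's gain minus the comparison curve's gain over the period.
    """
    shift = school[start] - comparison[start]  # the geometrical translation
    return school[end] - (comparison[end] + shift)

# Mean skill scores per measurement moment. Only 35 and 50 come from the
# caption of Figure 4; every other number is hypothetical.
school = {"start grade 1": 35, "end grade 1": 50, "end grade 2": 58}
comparison = {"start grade 1": 38, "end grade 1": 47, "end grade 2": 65}

print(learning_gain(school, "start grade 1", "end grade 1"))          # 15
print(value_added(school, comparison, "end grade 1", "end grade 2"))  # -10
```

With these hypothetical numbers, subtracting the end points without the translation would give 58 - 65 = -7 instead of -10, which shows why the starting points have to coincide first.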
An examination of the errors made in calculating value added reveals
patterns in the incorrect answers. This enables us to reconstruct
the thinking process of participants and to identify certain misconceptions.
Typical errors made when calculating value added are (1) comparing the
wrong growth curves; (2) ignoring the difference in starting points of the
curves before subtracting (no geometrical translation was performed); (3)
confusing the calculation process of value added and learning gain; (4) using
the wrong signs (+/-); and (5) confusing the heights of curves with their
slopes. This last misconception, called the slope-height confusion, has been
reported in earlier studies (Beichner, 1994; Clement, 1989; Kramarski, 2004;
Leinhardt, Zaslavsky, & Stein, 1990).
Respondents mostly gave correct answers to the conceptual questions
related to the information that was literally explained in the school
feedback presentation (87% correct answers). In contrast, low test scores
were observed when the questions required deep-level conceptual learning
(24% correct answers). For example, the statement that “The learning gain
of pupils can be calculated by tests with the same maximum score” was
incorrectly classified as ‘true’ by 86% of the participants. During the
instruction period, participants were told that learning gain can only be
calculated if both tests are on the same scale, by IRT calibration or by taking
the same test twice. Only 2% of the participants who received the item “To
estimate a school’s value added, you first have to adjust for school
characteristics,” answered it correctly. This indicates that for these
participants either the difference between school and input characteristics
was not clear or they just did not notice the difference in this sentence.
3.2. Differences between conditions
The results of the analyses of covariance in Table 2 show significant
differences in test scores in relation to the way value added was explained.
The school feedback report that explained value added in terms of
expected means resulted in higher performance (see Table 3 for descriptive
statistics; t(298) = 2.283, β = -.536, p < .05). However, the effect size is
small: the explained variance in test scores is 2.1% (partial η²).
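The reported effect size can be checked against the F statistic in Table 2. The sketch below uses the standard identity that recovers partial η² from an F value and its degrees of freedom; the function name is hypothetical, and the snippet is a plausibility check rather than a reproduction of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F = 6.238 with 1 and 298 degrees of freedom, as reported in Table 2 for the
# explanation mode of value added.
eta = partial_eta_squared(6.238, 1, 298)
print(f"{eta:.1%}")  # 2.1%
```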
A pairwise comparison of the presentation modes reveals significant
differences between the test scores for the basic SPF version and the
elaborated SPF version using tables, in favour of the basic version (∆ = .269,
SE = .117, p < .05, partial η² = 1.7%). This finding contradicts Kosslyn’s (2006)
theory on the advantage of observing the design principle of appropriate
knowledge. An explanation of this finding could be found in the structure
mapping hypothesis (Schnotz & Bannert, 2003). This hypothesis assumes
that adding representations is not beneficial in all cases but is dependent
on the kind of task being carried out. In this sense, tables may not have
been helpful in solving the tasks presented in this study because their
structure does not facilitate the construction of a task-appropriate mental
model. Indeed, adding tables may have been inappropriate for illustrating
trends in the data, since tables might be more appropriate for determining
exact numbers (Meyer, Shinar, & Leiser, 1997). Therefore, adding tables
that were not in accordance with the different task purposes may have
caused extraneous cognitive load (Chandler & Sweller, 1991).
Table 2. Results of analysis of covariance for IRT test score
                                               Test score
                                          df        F        p
Corrected model                           13    5.221    .000**
Explanation mode value added (E)           1    6.238    .013*
Presentation mode (P)                      2    2.648    .072
E x P                                      2     .738    .479
Study program                              2    2.728    .067
Degree of liking statistics                1    1.228    .269
Hours of math sec. education (H)           1     .501    .480
Math exam score sec. education (S)         1     .714    .399
H x S                                      1     .024    .878
Perceived clarity of presentation (C)      1   10.921    .001**
E x C                                      1    4.778    .030*
Error                                    298
Note. Adj. R² = .15 for IRT test score. *p ≤ .05. **p ≤ .01.
Table 3. Numbers, means and standard deviations of IRT test scores for the six conditions
                                    n        M       SD
Explanation mode value added
  Adjusted scores                 154    -.070     .950
  Expected scores                 158     .068     .829
Presentation mode
  Basic version                   102     .131     .974
  Table version                   103    -.077     .852
  Symbol version                  107    -.051     .841
Regarding the influence of individual differences on the test score, only
the perceived clarity of the presentation appears to be significant, both as a
main effect and as a moderator effect in interaction with the value-added
explanation mode (t(298) = 2.186, β = .487, p < .05). The direction of this
interaction effect shows that the perceived clarity of the presentation is
even more important when value added is explained in terms of adjusted
means than in terms of expected means. In other words, the relation “the
clearer a presentation is perceived to be, the higher the IRT test score”
holds more strongly when value added is explained by adjusted than by
expected means.
4. General discussion and conclusion
4.1. Interpretation of SPF in the present study
Since school performance feedback aims at contributing to internal school
quality policies, it is important that the target audience develops a good
understanding of the information offered. The results of the present study
reveal that at least one of the most widely used concepts in school
performance feedback, the concept of value added, is not well understood
by statistically unskilled people. The results from our experiment indicate
that there is a lack of procedural and deep conceptual understanding of this
concept. Even when comprehensive information was provided to
participants in the experimental setting, the conceptual basis of value
added was too complex for statistically unskilled people to master. These
findings confirm our first research hypothesis that users would have
difficulty interpreting complex conceptual and graphical information due to
the interplay between the inherent complexity of SPF and the respondents’
lack of prior knowledge. This interplay causes intrinsic cognitive load
(Sweller, van Merriënboer, & Paas, 1998), interpretation difficulties, and
misconceptions (e.g., slope-height confusion, see Beichner, 1994; Clement,
1989; Kramarski, 2004; Leinhardt, Zaslavsky, & Stein, 1990).
We compared the two explanations and representations of value added
in terms of their differential effect on participants’ understanding of the
concept. This proved to be helpful in detecting which explanation of value
added facilitated better conceptual and procedural understanding.
Explaining this concept in terms of the difference between observed and
expected growth appears to be better than explaining it in terms of the
difference between the school’s adjusted growth curve and the reference
growth curve. However, the effect size of the observed significant
differences is rather small. While more research is needed to confirm these
findings, they serve as a point of reflection for the designers of school
feedback systems.
In terms of the graphical representations used in our experiment, it is
rather surprising that the tables did not add to users’ understanding of the
feedback report. However, this does not imply that the use of tables in
combination with growth curves is not advisable. Previous research
indicates that different information is derived from tables and graphs
(Meyer, Shinar, & Leiser, 1997); both sources of information have merits,
depending on the task being performed (Schnotz & Bannert, 2003). An
appropriate use of tables and graphs can avoid extraneous cognitive load
and foster correct understanding.
4.2. Strengths and limitations
Earlier studies that examined school feedback reports expressed concern
about the accuracy of feedback users’ interpretation of information, but were
not able to pinpoint what was being misunderstood (Earl & Fullan, 2003;
Kerr et al., 2006; Saunders, 2000; Williams & Coles, 2007). The present
study allows us to develop a more detailed understanding of what is
misunderstood when interpreting learning gain and value added from SPF
reports. The use of IRT techniques appears to deliver detailed information
both on the item parameters and on respondents’ scores. This allows SPF
developers to examine interpretation difficulties in detail and to adapt SPF
representation forms for clients that are statistically unskilled. Furthermore,
this may inspire feedback providers to set up support initiatives.
In our experimental study particular concepts were studied in a
controlled setting. However, participants were not genuine feedback users.
This feedback experiment must therefore be considered as a first attempt
to test the understanding of diverse modes of explaining SPF concepts.
These results require further examination in future research.
It is possible that the experimental tasks in this study placed too high a
demand on the participants, in that they were expected to derive and
calculate value added from graphical representations or to indicate what
conclusions can be drawn from the tables and the growth curves presented.
This is also a point of discussion for feedback providers: Is it necessary to
expect SPF users to master the basic principles of deriving value added and
learning gain scores from representations or should SPF reports be
simplified? Providing more technical information to users also implies more
complex interpretations of the SPF information.
4.3. Implications for future research
Our findings in relation to the different modes of explaining the concept of
value added need further confirmation. The different modes of explaining
and representing SPF concepts and their influence on users’ understanding
can be tested in a number of ways. IRT techniques appear to be useful in
this regard and can be applied in less controlled settings, such as in quasi-
experimental designs. This would provide more detailed item information
and could support the external validity of our findings. An alternative way
of testing value added conceptions is to interview the feedback users. This
method has provided useful results in previous research (Santelices & Taut,
2009; Saunders, 2000). However, more in-depth analyses, such as video-
taping feedback users as they explain their understanding of the
representations and concepts in SPF reports, may provide more insight into
the reasoning process of respondents.
The individual differences and preferences that influence feedback
users’ understanding of the SPF data require further attention. It is
important to explore whether SPF developers should introduce feedback
reports that are more flexible in terms of form and content, i.e., tailored to
the individual user (Visscher & Coe, 2003). This study points to the
importance of how respondents perceive the SPF variables and data.
Indeed, it is often not the feedback characteristics as such, but the
perception of them that determines how the data will be used (Verhaeghe et
al., 2010; Visscher, 2002). Therefore, valid measures of users’ perception of
SPF variables should be developed.
4.4. Implications for practice
It is quite likely that the misconceptions observed in this study also occur in
school practices when interpreting school performance feedback.
Therefore, these findings underpin the importance of carefully examining
the interpretability of feedback reports. Feedback developers should adapt
the mode of explaining the concept of value added to the target audience;
they should be aware of the prior knowledge of feedback users and should
develop graphical representations that differ from those used in scientific
publications. The presentation of information in SPF reports should be
designed in line with the task to be performed (Kluger & DeNisi, 1996;
Schnotz & Bannert, 2003). This implies that feedback reports should be
designed according to the cognitive tasks that are necessary to understand
the information. Many studies stress the role of support when dealing with
school feedback (Bosker, Branderhorst, & Visscher, 2007; Earl & Fullan,
2003; Kerr et al., 2006; Saunders, 2000; Verhaeghe et al., 2010; Williams &
Coles, 2007). If school performance feedback is expected to contribute to
school improvement, attention must be given to the way users interpret
the information.
Appendix A: Assignment of participants to the different conditions and test formats
Note. Extra variation in the tests was added by developing parallel formats, in case students who signed up for later sessions were influenced by
fellow students from earlier sessions.
References
Anderson, L.W., Krathwohl, D. R., Airasian, P.W., Cruikshank, K.A., Mayer,
R.E., Pintrich, P.R., et al. (Eds.). (2001). A taxonomy for learning,
teaching, and assessing: A revision of Bloom’s taxonomy of educational
objectives. New York: Longman.
Beichner, R.J. (1994). Testing student interpretation of kinematics graphs.
American Journal of Physics, 62(8), 750-762.
Bosker, R.J., Branderhorst, E. M., & Visscher, A. J. (2007). Improving the
utilisation of management information systems in secondary schools.
School Effectiveness and School Improvement, 18(4), 451-467.
CiTO. (n.d.). Volg- en adviessysteem: Voor elke leerling de beste kansen
[Monitor and advice system: The best chances for each pupil]. Retrieved
December 1, 2008, from
http://www.cito.nl/vo/vas/algemeen/eind_fr.htm
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of
instruction. Cognition and Instruction, 8(4), 293-332.
Clement, J. (1989). The concept of variation and misconceptions in
Cartesian graphing. Focus on Learning Problems in Mathematics, 11, 77-
87.
De Westelinck, K., Valcke, M., De Craene, B., & Kirschner, P. (2005).
Multimedia learning in social sciences: Limitations of external graphical
representations. Computers in Human Behavior, 21(4), 555-573.
Dekeyser, H. M. (2001). Student preference for verbal, graphic or symbolic
information in an independent learning environment for an applied
statistics course. In J.F. Rouet, J.J. Levonen, & A. Biardeau (Eds.),
Multimedia learning: Cognitive and instructional issues (pp. 99-109).
Oxford: Pergamon.
Durham University, Centre for Evaluation and Monitoring. (n.d.). PIPS.
Retrieved December 1, 2008, from
http://www.cemcentre.org/RenderPage.asp?LinkID=22210000
Earl, L., & Fullan, M. (2003). Using data in leadership for learning. Cambridge
Journal of Education, 33(3), 383-394.
Fitz-Gibbon, C.T. (1997). The value added national project: Final report.
Feasibility studies for a national system of value added indicators. Hayes:
School Curriculum and Assessment Authority.
Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School self-
evaluation and student achievement. School Effectiveness and School
Improvement, 20(1), 47-68.
Katholieke Universiteit Leuven, Centre for Educational Effectiveness and
Evaluation. (2008). PIRLS: Begrijpend lezen vierde leerjaar:
Schoolfeedbackrapport n.a.v. deelname aan het PIRLS 2006 – onderzoek
[PIRLS: Comprehensive reading grade four: School feedback report in
response to participation to the PIRLS 2006 study]. Retrieved December
1, 2008, from http://ppw.kuleuven.be/pirls/voorbeeldrapport.pdf
Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006).
Strategies to promote data use for instructional improvement: Actions,
outcomes, and lessons from three urban districts. American Journal of
Education, 112, 496-520.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on
performance: A historical review, a meta-analysis, and a preliminary
feedback intervention theory. Psychological Bulletin, 119(2), 254-284.
Kosslyn, S.M. (2006). Graph design for the eye and mind. Oxford: Oxford
University Press.
Kramarski, B. (2004). Making sense of graphs: Does metacognitive
instruction make a difference on students’ mathematical conceptions
and alternative conceptions? Learning and Instruction, 14(6), 593-619.
Leinhardt, G., Zaslavsky, O., & Stein, M.K. (1990). Functions, graphs, and
graphing: Tasks, learning, and teaching. Review of Educational Research,
60(1), 1-64.
Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter:
Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press.
Mayer, R.E. (2001). Multimedia Learning. Cambridge: Cambridge University
Press.
Mevarech, Z.R. & Kramarsky, B. (1997). From verbal descriptions to graphic
representations: Stability and change in students’ alternative
conceptions. Educational Studies in Mathematics, 32(3), 229-263.
Meyer, J., Shinar, D., & Leiser, D. (1997). Multiple factors that determine
performance with tables and graphs. Human Factors, 39(2), 268-286.
Mittal, V. O., Carenini, G., Moore, J.D., & Roth, S. (1998). Describing
complex charts in natural language: A caption generation system.
Computational Linguistics, 24(3), 431-467.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external
evaluation. In D. Nevo (Ed.), School-based evaluation: An international
perspective (pp. 3-16). Oxford: Elsevier Science.
Organisation for Economic Co-operation and Development. (2008).
Measuring improvements in learning outcomes: Best-practices to assess
the value-added of schools. Paris: OECD Publishing.
Santelices, V., & Taut, S. (2009, September). Comprehension and use of
value-added school performance indicators reported to teachers and
parents. Paper presented at the European Conference on Educational
Research, Vienna.
Saunders, L. (2000). Understanding schools’ use of value-added data: The
psychology and sociology of numbers. Research Papers in Education,
15(3), 241-258.
Scheiter, K., Gerjets, P., Vollmann, B., & Catrambone, R. (2009). The impact
of learner characteristics on information utilization strategies, cognitive
load experienced, and performance in hypermedia learning. Learning
and Instruction, 19(5), 387-401.
Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems
in the USA and in The Netherlands: A comparison. Educational Research
and Evaluation, 14(3), 255-282.
Schnotz, W., & Bannert, M. (2003). Construction and interference in learning
from multiple representation. Learning and Instruction, 13(2), 141-156.
Shah, P., & Hoeffner, J. (2002). Review of graph comprehension research:
Implications for instruction. Educational Psychology Review, 14, 47-69.
Smith III, J. P., diSessa, A.A., & Roschelle, J. (1993). Misconceptions
reconceived: A constructivist analysis of knowledge in transition. The
Journal of the Learning Sciences, 3(2), 115-163.
Sweller, J., van Merriënboer, J.J.G., & Paas, F.G.W.C. (1998). Cognitive
architecture and instructional design. Educational Psychology Review,
10(3), 251-296.
Tapiero, I. (2001). The construction and the updating of a spatial mental
model from text and map: Effect of imagery and anchors. In J. F. Rouet, J.
J. Levonen, & A. Biardeau (Eds.), Multimedia learning: Cognitive and
instructional issues (pp. 45-57). Oxford: Pergamon.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using
school performance feedback: Perceptions of primary school principals.
School Effectiveness and School Improvement, 21(2), 167-188.
Visscher, A.J. (1996). The implications of how school staff handle
information for the usage of school information systems. International
Journal of Educational Research, 25(4), 323-334.
Visscher, A.J. (2002). A framework for studying school performance
feedback systems. In A. J. Visscher, & R. Coe (Eds.), School improvement
through performance feedback. Lisse: Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through
performance feedback. Lisse: Swets & Zeitlinger.
Visscher, A., & Coe, R. (2003). School performance feedback systems:
Conceptualisation, analysis, and reflection. School Effectiveness and
School Improvement, 14(3), 321-349.
Williams, D. & Coles, L. (2007). Teachers’ approaches to finding and using
research evidence: An information literacy perspective. Educational
Research, 49(2), 185-206.
CHAPTER 5
THE INFLUENCE OF COMPETENCES AND SUPPORT ON SCHOOL PERFORMANCE FEEDBACK USE∗
Abstract
Information-rich environments are created to promote data use in schools
for the purpose of self-evaluation and quality assurance. However,
providing feedback does not guarantee that schools will actually put it to
use. One of the main stumbling blocks relates to the interpretation and
diagnosis of the information. This study examines the relationship between
data literacy competences, support given in interpreting the information,
actual use of the feedback, and potential school improvement effects. A
randomized field experiment with 188 school principals from primary
education was set up and a posttest was used to investigate the effects of a
support initiative. The results revealed that a minority of schools invested
significantly in the interpretation and diagnosis of the school performance
feedback (SPF), despite the fact that most of the respondents showed an
interest in the SPF report. In addition, the effects of support on data
literacy competences and on the subsequent use of feedback were found to be
limited.
∗ Based on Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in
press). The influence of competences and support on school performance feedback use. Educational Studies.
1. Introduction and research questions
The growing autonomy of schools is going hand in hand with initiatives by
education authorities to hold schools accountable for their approach to
quality care (Nevo, 2002; Hofman, Dijkstra, & Hofman, 2009) and to create
information-rich environments. Schools are given feedback on their
functioning and performance via school performance feedback systems
(Visscher & Coe, 2002; 2003). The use of such systems as a policy
instrument is not straightforward. Providing school performance feedback
(SPF) has turned out to be a necessary yet insufficient step, as both the
schools and the feedback systems have to meet certain requirements before
the feedback is actually used in practice (Visscher & Coe, 2003; Verhaeghe,
Vanhoof, Valcke, & Van Petegem, 2010). Consequently, current research
often reports disappointing results from school feedback use (Coe, 2002;
Saunders & Rudd, 1999; Tymms, 1995; Schildkamp, Visscher, & Luyten,
2009; Van Petegem & Vanhoof, 2004; Verhaeghe et al., 2010; Zupanc,
Urank, & Bren, 2009). One important obstacle is the lack of knowledge and
skills needed to process the information. School principals are usually not
trained in carrying out research, collecting data, data management or data
interpretation. This lack of data literacy (Earl & Fullan, 2003) leads to
valuable information often being neglected. Available research reveals a
need to support school principals and teachers in both the interpretation
and further use of the feedback data (Schildkamp & Teddlie, 2008;
Schildkamp, Visscher, & Luyten, 2009; Zupanc, Urank, & Bren, 2009). A
second critical issue, arising from the research review, is the need to
evaluate the impact of support initiatives related to the use of SPF (Zupanc,
Urank, & Bren, 2009). Indeed, current support initiatives often lack
empirical verification. And when evaluation initiatives have been set up,
they often focus too much on the short-term effects, such as the
satisfaction of participants, without considering the effects on the
organization (Mathison, 1992; Rossi, Lipsey & Freeman, 2004).
The present study aims at testing insights emerging from the current
knowledge base against empirical information. Answers are sought to the
following research questions:
• How do schools use SPF (in terms of phases in use and types of use)?
What are the effects of this use?
• To what extent are variations in SPF use and perceived SPF effects
influenced by data literacy competences?
• To what extent does specific SPF support have an impact on the
development of SPF competences, actual SPF use and resulting SPF
effects?
2. Theoretical framework
In the following paragraphs, we first provide a theoretical framework used
to investigate the use of SPF. We subsequently address the question of SPF
effects. Finally, we focus on factors that are expected to influence the use
of school feedback, in particular data literacy competences and SPF usage
support. A visual representation of the theoretical framework is given in
Figure 1.
Figure 1. Framework for SPF usage in the present study
2.1. The use of SPF: Phases in use and types of use
Research shows that the process of SPF use in schools can be described in a
variety of ways (e.g., Schildkamp, 2007; Schildkamp & Kuiper, 2010). The
effective use of SPF implies a well-considered sequence of several
consecutive phases in a cyclical process (Huffman & Kalnin, 2003; Learning
Point Associates, 2004). In the process of school feedback use, Verhaeghe
et al. (2010) distinguish between receiving, reading and discussing the SPF
as a means to arrive at a correct interpretation. After the school has
performed an analysis of its results, the next stage involves putting to use
the information from the SPF, comprising a diagnosis that looks for
explanations for the school’s results.
Furthermore, SPF use refers to specific actions or changes in thinking
and processes. With reference to available research on evaluation data and
SPF use (Schildkamp, 2007; Schildkamp & Teddlie, 2008; Schildkamp,
Visscher & Luyten, 2009; Weiss, 1998), this study focuses on the following
types of use: instrumental and conceptual use. In the case of
conceptual use, we focus on changes in the thinking of the feedback users
(e.g. changes in how they think about pupil performance or about how the
school functions). In the case of instrumental use, we examine reported
changes in school policies. The way the feedback will be used (types of use)
is expected to be correlated positively to the investment in the process of
SPF use (phases in use).
2.2. Effects of SPF
School performance feedback use will not automatically result in a
significant improvement of pupil performance (Fitz-Gibbon & Tymms, 2002;
Schildkamp, Visscher & Luyten, 2009). This underlines the importance of
examining effects beyond the level of educational performance and giving
sufficient attention to process-oriented effects (Schildkamp, 2007; Visscher
& Coe, 2002; 2003). The latter are described in terms of professional
development of team members, improved educational processes and
improvements in school functioning (Zupanc, Urank, & Bren, 2009;
Schildkamp & Teddlie, 2008; Visscher & Coe, 2003). However,
unintended and undesirable effects can also be observed, for
example reduced motivation among teachers due to extra workload (Fitz-Gibbon
& Tymms, 2002; Schildkamp & Teddlie, 2008) or an excessive and
narrow focus on testing-towards-the-curriculum (Schildkamp & Teddlie,
2008; Visscher, 2002). In the present study, we map the perceived effects of
SPF use on the basis of self-reports of school improvement effects. This
approach has been successfully applied in previous studies on data use
(Huffman & Kalnin, 2003; Schildkamp & Teddlie, 2008; Schildkamp,
Visscher, & Luyten, 2009).
2.3. Influential factors: Competences and support
In our theoretical model, we distinguish between a variety of variables and
processes that influence the actual feedback use and the related effects: (1)
variables within the users that define their ability/orientation to adopt SPF
use and (2) levels of SPF support.
Data literacy competences
A competence is the ability to take satisfactory action through the
integration of knowledge, skills and attitudes. These three elements are
operationalized below in the context of school feedback use.
An attitude reveals how positively or negatively a person views a
particular matter (Petty & Wegener, 1998). A negative attitude towards SPF
is – according to Bosker, Branderhorst and Visscher (2007) – one of the
main obstacles in the use of feedback information. The attitude is the most
significant aspect that determines a person’s willingness to invest time and
energy in dealing with information (Williams & Coles, 2007) and the users’
belief that they need the data in order to improve education (Schildkamp &
Kuiper, 2010). The concept can be operationalized in analogy to self-
evaluation research in schools (Meuret & Morlaix, 2003; Vanhoof, Van
Petegem, & De Maeyer, 2009). An individual’s attitude towards SPF can be
situated on a bipolar continuum. A number of examples include: School
feedback does/does not lead to better teaching, is favored/not favored by
most team members, and so on.
The importance of knowledge and skills is evidenced by the impact of
data literacy on the SPF use process (Webber & Johnston, 2000). “Data
literacy encompasses the strategies, skills and knowledge needed to define
information needs, and to locate, evaluate, synthesize, organize, present
and/or communicate information as needed” (Williams & Coles, 2007, p.
188). Data literacy is a condition for being able to convert data into valuable
and usable information (Earl & Fullan, 2003). The current lack of know-how
on making use of the information is an important obstacle (Kerr, Marsh,
Ikemoto, Darilek, & Barney, 2006; Saunders, 2000; Van Petegem & Vanhoof,
2004; Williams & Coles, 2007). In addition to a lack of the capacities needed to
interpret the data, there is often a lack of well-developed research skills
such as the formulation of research questions and hypotheses (Earl &
Fullan, 2003; Herman & Gribbons, 2001; Kerr et al., 2006). In this context,
we also have to distinguish between the actual mastery of knowledge
and skills on the one hand, and the level at which the users estimate their
skills on the other. The concept of academic self-efficacy, a person's belief
that he or she can perform certain academic tasks to certain levels
(Bandura, 1977; Schunk, 1991), is applicable in the context of SPF. In the
present study, academic self-efficacy focuses on the extent to which users
think they have understood the terms, figures and tables used and the
extent to which they believe they are able to find explanations for their
results. It is not only important to measure the actual knowledge and skills
but also to record the level of perceived self-efficacy, since it significantly
determines a person’s motivation for action (Bandura, 1977).
SPF usage support
Providing SPF support is essential because it might influence the actual and
experienced mastery of the competences of school principals to interpret
information relating to their school. The support provided in this study is
described in more detail in the section on research
methodology. For the evaluation of support effects, we build on
Kirkpatrick’s (1998) four levels of evaluation, which can be linked to our
broader theoretical framework. Table 1 describes these levels in general
terms and in terms of the SPF focus in this study.
Table 1
Kirkpatrick's Evaluation Levels (1998)

Level 1
  Description of evaluation level: Reaction. Immediate response of the
  participants after the support. This concerns a general impression of the
  relevance and possibilities for application.
  Application in this study: This level is not reported because it could
  logically only be obtained from the experimental group.

Level 2
  Description of evaluation level: Learning. Increase in knowledge and skills
  and the change in attitudes as a result of the support.
  Application in this study: In this study, this level translates as the
  question of whether the support has contributed to an increase in data
  literacy competences, specifically in relation to the feedback report used.

Level 3
  Description of evaluation level: Behavior. Application of what has been
  learned in the organization and behavioral changes.
  Application in this study: In this particular case, it concerns the question
  of how far schools progress in the phases of SPF use and types of SPF use.

Level 4
  Description of evaluation level: Results. Effects of the support on
  achieving the organization’s aims and on the organization itself.
  Application in this study: In the context of SPF use, this evaluation level
  is represented in the variable ‘perceived effects’ of SPF.

Kirkpatrick's underlying premise (1998) is that the attainment of a higher
level can only be achieved once a lower level has been realized. This fits the
theoretical framework (see Figure 1) since SPF support provisions will only
contribute to school improvement effects when underlying SPF
competences have been affected.
3. Methodology: research design, procedure and research instruments
A between-groups field experiment with posttest was set up to investigate
the impact of SPF use support. The schools in this study can be classified
into two groups: a group with SPF support (experimental group) and a
group without SPF support (control group). The design was experimental
rather than quasi-experimental (Creswell, 2008; Field & Hole, 2003), given
that the schools were randomly assigned to either one of the two
conditions and it was possible to control the independent variable, namely
the support intervention.
The experiment was set up in the context of a large-scale project,
whereby Flemish primary schools annually receive confidential feedback
based on the comparison of their school performance results with a
reference group. The schools receiving the feedback participate in a
longitudinal study, named Schoolloopbanen in het BasisOnderwijs (SiBO),
tracking approximately 6000 pupils from a representative sample of Flemish
schools (from the start of K3 until the end of grade six and the transition to
secondary education). Item Response Theory (IRT)-based techniques are
used to construct the test scores, enabling the estimation of growth curves.
At the beginning of 2008, about 200 schools received feedback reports
containing the results (grade 1 to grade 4) of the investigated pupil cohort.
Results were reported in relation to mathematics, reading fluency, reading
comprehension, and orthography, supplemented with information about
pupil characteristics (child factors, home factors, and Dutch language skills
at the start of grade 1). The central concepts in these reports include
learning gain, value added, and adjusted scores. These concepts were
explained in such a way that no prior statistical knowledge was required. The
data were supported with graphical representations (i.e. pie charts, growth
curves, and cross tables). The content of the text of each report was
standardized. The school principals were required to interpret the results
for their school, based on the general information made available.
Forty-five of the 188 schools involved in the project, chosen at random,
received an invitation to participate in the support. The principals in the
experimental support condition participated in a professional development
activity (a half-day workshop) with the following aims: (1) being able to
describe concepts from the report in their own words; (2) being able to
interpret the figures and tables from the SPF report; (3) being able to give
an explanation why performances could be worse or better as compared to
the reference group and (4) being able to describe which function(s) the
SPF report can fulfill in the context of their own school. To this end, school
principals met in small groups outside their own school. The feedback
designers explained the feedback reports during these meetings and
participants were given the opportunity to practice using and evaluating the
feedback information interactively.
Only 23 of the 45 invited schools actually participated in the
experimental condition. Although the study participants were assigned to
the various conditions randomly, there is a real risk of selection bias caused
by the self-selection through working with volunteers (Rossi, Lipsey &
Freeman, 2004). This could endanger the internal validity of the study.
Therefore, previously collected data was used to investigate whether this
subgroup deviated from the population of schools in the project in relation
to relevant school population and functioning criteria. This proved not to be
the case.
Five months after receiving the SPF, and after the experimental group
had participated in the support provision, the school principals of the SiBO
schools were asked to fill in a questionnaire. A total of 116 schools
completed the questionnaire (response rate 62%). The response rate for the
control condition was 60% (n = 99) and that for the
experimental condition was 74% (n = 17).
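The reported response rates can be checked with a few lines of arithmetic. The sketch below is only a consistency check and rests on one assumption that the text does not state explicitly: the 22 invited schools that did not attend the workshop are counted in the control group, so that the control group comprises 188 - 23 = 165 schools.

```python
# Counts taken from the text; the size of the control group is derived
# under the assumption that non-attending invited schools count as controls.
total_schools = 188
experimental_schools = 23                      # schools that attended the workshop
control_schools = total_schools - experimental_schools  # 165 (assumed)

respondents_control = 99
respondents_experimental = 17


def response_rate(respondents: int, group_size: int) -> float:
    """Return the response rate of a group as a percentage."""
    return 100.0 * respondents / group_size


overall_rate = response_rate(
    respondents_control + respondents_experimental, total_schools
)
control_rate = response_rate(respondents_control, control_schools)
experimental_rate = response_rate(respondents_experimental, experimental_schools)

print(f"overall:      {overall_rate:.0f}%")
print(f"control:      {control_rate:.0f}%")
print(f"experimental: {experimental_rate:.0f}%")
```

Under this assumption the three percentages round to the values reported above (62%, 60% and 74%).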
The various concepts from the theoretical framework were translated into
scales consisting of specific questionnaire items. Each item presented the
principals with a statement they were to judge using a Likert scale. Table 2
presents descriptive statistics and reliability coefficients for the scales
measuring the different constructs. The reliability analyses show
good to very good internal consistency values for all scales (α > .80).
Table 2
Descriptives and reliability of the research instruments

                                  M      SD     Range   N items   α
Influencing factors
  Attitude towards SPF use        3.97   1.08   1-6     7         .91
  Academic self-efficacy          3.81   0.74   1-5     6         .92
SPF use
  Phases in SPF use               3.81   0.75   1-5     6         .86
  Conceptual SPF use              3.27   0.83   1-5     4         .86
  Instrumental SPF use            2.85   0.97   1-5     3         .90
Effects
  Perceived effects of SPF use    2.92   0.90   1-5     6         .94

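The α column in Table 2 is an internal consistency coefficient, presumably Cronbach's alpha. As an illustration of how such a coefficient is obtained from item scores, the following minimal sketch computes alpha for a small set of made-up responses. The data and the function name `cronbach_alpha` are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum scores)
    """
    k = len(rows[0])                                  # number of items
    columns = list(zip(*rows))                        # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five principals to a three-item scale (1-5).
example_scores = [
    [2, 3, 2],
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]

alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why scales with few items (such as the three-item instrumental use scale) need highly consistent items to reach the values reported in Table 2.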
A data literacy test was used to measure the knowledge and skills in
relation to feedback interpretation. The test focuses on the concepts and
representations used in the SPF reports and comprises 26 test items
mapping out both conceptual knowledge (correct conception of the terms
used) and procedural knowledge (skills in reading learning gains and added
value from graphs and tables). Both closed (true/false and multiple
choice) and open (filling in numerical values) questions were used in the test.
Test scores were constructed using IRT analysis. A good overall fit was
achieved using a two-parameter model (LR = 248.4; SE = 320.0; p = .99) and
a good empirical validity of .83 was achieved using 24 retained items. The
scores were standardized to enhance their interpretability.
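To make the two-parameter model more concrete, the sketch below assumes the common two-parameter logistic (2PL) form, in which the probability of a correct answer depends on the respondent's ability and on an item's discrimination and difficulty. The text does not specify the exact parametrization or software used, so the function, the item parameters and the ability values are illustrative assumptions only.

```python
import math
from statistics import mean, stdev


def p_correct(theta: float, discrimination: float, difficulty: float) -> float:
    """2PL item response function: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def standardize(scores):
    """Convert scores to z-scores (mean 0, sample standard deviation 1)."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


# Hypothetical items: same discrimination, different difficulty.
easy_item = {"discrimination": 1.2, "difficulty": -1.0}
hard_item = {"discrimination": 1.2, "difficulty": 1.0}

average_ability = 0.0
p_easy = p_correct(average_ability, **easy_item)
p_hard = p_correct(average_ability, **hard_item)
print(f"P(correct) easy item: {p_easy:.2f}, hard item: {p_hard:.2f}")

# Hypothetical ability estimates, standardized for easier interpretation.
z_scores = standardize([-0.8, 0.1, 0.4, 1.3])
print([round(z, 2) for z in z_scores])
```

In this form, a respondent whose ability equals the item difficulty has a 50% chance of answering correctly, and the discrimination parameter controls how sharply that probability changes around the difficulty value.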
Given the nature of the theoretical framework and the research
questions, testing the hypothesized relationships empirically requires structural
equation modeling. Path analyses were therefore used to examine whether
the relationships expected on theoretical grounds corresponded with the empirical
findings.
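The logic of a path analysis with a mediating variable can be illustrated with the smallest possible case: one predictor X, one mediator M and one outcome Y, all standardized. The sketch below derives the standardized path coefficients from three correlations and splits the total effect into a direct and an indirect part. It is a simplified illustration, not the model estimated in this study: the correlations are hypothetical and the variable labels in the comments are only meant to show how the framework maps onto the example.

```python
def path_coefficients(r_xm: float, r_xy: float, r_my: float):
    """Standardized paths for X -> M -> Y with an additional direct X -> Y path.

    Inputs are Pearson correlations between standardized variables.
    Returns (a, b, c_direct): X -> M, M -> Y given X, X -> Y given M.
    """
    a = r_xm
    denominator = 1.0 - r_xm ** 2
    b = (r_my - r_xm * r_xy) / denominator
    c_direct = (r_xy - r_xm * r_my) / denominator
    return a, b, c_direct


# Hypothetical correlations, NOT taken from the study.
r_attitude_phases = 0.50       # X (attitude) with M (phases in use)
r_attitude_conceptual = 0.45   # X (attitude) with Y (conceptual use)
r_phases_conceptual = 0.60     # M (phases in use) with Y (conceptual use)

a, b, c_direct = path_coefficients(
    r_attitude_phases, r_attitude_conceptual, r_phases_conceptual
)
indirect_effect = a * b
total_effect = c_direct + indirect_effect

print(f"a = {a:.2f}, b = {b:.2f}, direct = {c_direct:.2f}")
print(f"indirect = {indirect_effect:.2f}, total = {total_effect:.2f}")
```

In this just-identified case the total effect equals the correlation between X and Y, so a non-zero direct path means that the mediator carries only part of the effect. Models with more variables and fit statistics require dedicated structural equation modeling software.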
4. Results
4.1. Descriptive results
In this section, we focus on the sum scores and individual item scores for
the different constructs in the questionnaire. We discuss the variables as
ordered in our theoretical model: influencing factors, feedback use, and
perceived SPF effects. Finally, the results for the data literacy test are
discussed.
The sum score for the scale attitude towards SPF reveals that a large
majority of the respondents state that SPF use is (to some degree) a
valuable activity (M = 3.97, SD = 1.08). The most positive scores (M > 4)
were recorded in relation to the statements that SPF stimulates self-
evaluation, that much can be learned from SPF and that SPF results in
better management and more involvement in school policy. The statement
for which the lowest score was recorded (M = 3.46, SD = 1.22) related to
school feedback being an enjoyable activity for the majority of team
members.
In addition to a positive attitude, most of the respondents had a positive
self-efficacy score relating to the interpretation and possible uses of the
feedback report (M = 3.81, SD = .74). For example, 80% stated that they
understand the terms, figures and tables in the report and can see
connections between the terms. Only a minority (between 12 and 18%)
disagreed with the statement concerning their ability to clearly grasp the
objectives and possibilities for the use of SPF or describe terms from the
report in their own words.
As regards the phases in feedback use, only a minority of the principals
reported having invested significant time and effort in the interpretation
and diagnosis of the SPF, despite the fact that the majority of respondents
indicated having an interest in the SPF report (M = 4.37, SD = 0.72).
Although 70% of the respondents agreed with the statement that the
report had been examined thoroughly (M = 3.84, SD = 0.97), only 43% of
the respondents stated they had sought explanations for the performance
of their own schools on the basis of this report (M = 3.30, SD = 1.11).
With regard to the types of SPF use, the respondents scored significantly
higher (t (114) = 4.64, p < .001) for items pertaining to conceptual use (M =
3.27, SD = .83), as compared to items relating to instrumental use (M =
2.85, SD = .97). Half of the respondents stated that the SPF had an impact
on their perception of pupils’ performance and on the school in general
(conceptual use), while only 30% of the respondents stated that the report
had resulted in specific action (instrumental use).
The latter leads to the unsurprising result that the perceived effect of
SPF use is rather low. Only a limited number of respondents reported any
effects of the SPF (M = 2.92, SD = .90). Between 30 and 40% stated that the
SPF report has contributed to more discussions on how the school
functions, to more attention for professional development, to a better
functioning of the school principal and to skills improvement in SPF use.
Around twenty percent of the respondents indicated that the SPF report
has improved the quality of the teaching in their schools.
The results of the data literacy test reveal that only 42% of the
respondents answered at least half of the questions correctly. Only 10% of the
respondents answered more than three-quarters of the questions correctly.
Nevertheless, a few school principals (n = 5) succeeded in interpreting all the
information from the report correctly. Analysis of the difficulty of the
literacy test items indicates that most principals experience difficulties with
procedural exercises, i.e. reading the learning gains and added
value from the graphs and tables. The conceptual questions were
apparently less difficult.
4.2. Path model 1: Phases in use, types of use and perceived effects of
feedback use
The theoretical framework presented in Figure 1 shows the hypothesized
direct and indirect relations between the variables in our model: the data
literacy competences influence the perceived SPF effects via the phases in
use and types of feedback use. In a first analysis approach, the data of all
respondents were entered in the model without making a distinction
between SPF support conditions.
Figure 2. Results of path-model: Use and perceived effects of SPF use
Figure 3. Results of path model: Impact of support
In order to test the mediation hypotheses, the direct effects of all
independent variables on the endogenous variable have to be studied
(MacKinnon, 2008). This initial model was found to include various
statistically non-significant regression paths and covariances. These were
removed stepwise in order to achieve a parsimonious model. Figure 2
shows the findings of the resulting path model, including standardized path
coefficients and percentages of variance explained (χ²(8) = 8.1, p =
0.43; RMSEA = 0.01; AGFI = 0.92; GFI = 0.98). This path model can be used
to answer the second research question about the extent to which
differences in SPF use (phases and types) and perceived SPF effects can be
explained by SPF competences.
The percentages of variance explained for the variables relating to SPF
use (phases and types) are substantial. For example, 39% of the
variance in the variable ‘phases in SPF use’ can be explained by the data
literacy competences of the respondents. The higher the respondents’
estimation of their level of knowledge and skills (self-efficacy) and the more
positive their attitude towards SPF, the more they invest in the use of SPF.
However, the additional effect of the ‘actual’ knowledge and skills is
limited. The theoretical model also hypothesized that the ‘types of SPF use’
can only be explained directly by the ‘phases in SPF use’. This only holds
true for instrumental use (24% of the variance explained).
For conceptual use, the attitude towards SPF use and
self-efficacy are also relevant. Together with the ‘phases in SPF use’, these two
variables explain 43% of the variance in conceptual use. In addition,
a positive correlation (.32) exists between the unexplained
variance in the variables instrumental and conceptual use. This possibly
indicates that, after controlling for the other variables in the model, the
amounts of instrumental and conceptual use that respondents report increase
together.
The ultimate variable in our model is the perceived effect of SPF use.
The path model explains 66% of the variance in this variable. The model
suggests that the ‘types of SPF use’ play an important role. The more
intensively respondents report conceptual and instrumental use of the SPF,
the higher their perception of the SPF effects. In contrast to our initial
model, there appears to be a direct relationship between the attitude
towards SPF and the perceived effects of SPF. We therefore have to
conclude that the effect of attitude is only partly mediated by SPF use and
is more direct than expected.
4.3. Path model 2: Differential impact of support on data literacy
competences, feedback use and perceived effects
Building on the previous model, a subsequent path analysis was carried out
to test whether the SPF support condition results in significantly higher
scores than the control group as regards data literacy competences,
feedback use and perceived effects. A dummy variable was added to the
model, coded 1 for the experimental and 0 for the control condition. Figure 3
displays the results of the path model with support, including the standardized
path coefficients and the percentages of variance explained in the
endogenous variables (χ²(13) = 11.3, p = 0.58; RMSEA = 0.01; AGFI =
0.92; GFI = 0.97).
The path model immediately reveals that there is no significant direct
impact of support on the ‘phases in SPF use’, the ‘types of SPF use’ and the
‘perceived effects of SPF use’. This is consistent with the a priori
theoretical framework. Nevertheless, it has to be stressed that the
proportion of variance explained in the competence related variable and
the self-efficacy variable is limited. The results also do not confirm the
expectation that SPF support contributes to a more positive attitude. Yet
support does have a statistically significant effect on the mastery of
knowledge and skills: 11% of the variance in the test scores can be
attributed to whether or not respondents received support. This impact
remains limited as far as the self-efficacy is concerned. Only 2% of the
variance in this variable can be attributed to the experimental condition.
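The percentages of explained variance can be related to the size of the corresponding standardized path. When an endogenous variable has a single predictor, the proportion of variance explained equals the squared standardized coefficient. The sketch below applies this to the two percentages reported above. It assumes that the support dummy is the only predictor of the test score and of self-efficacy in the model, which the text suggests but which cannot be verified here because Figure 3 is not reproduced.

```python
import math


def implied_standardized_path(r_squared: float) -> float:
    """Absolute standardized coefficient implied by R^2 with one predictor."""
    return math.sqrt(r_squared)


# Proportions of variance attributed to the support condition in the text.
r2_test_score = 0.11
r2_self_efficacy = 0.02

beta_test_score = implied_standardized_path(r2_test_score)
beta_self_efficacy = implied_standardized_path(r2_self_efficacy)

print(f"implied path to test score:    {beta_test_score:.2f}")
print(f"implied path to self-efficacy: {beta_self_efficacy:.2f}")
```

Under this assumption the implied standardized paths are roughly .33 for the data literacy test score and .14 for self-efficacy, which illustrates why the second effect is described as limited.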
5. Conclusion and discussion
In the present study, we focused on the question of how schools use school
performance feedback, and what the perceived effects are of SPF use. At a
general level, the respondents reported a rather low level of perceived
impact of SPF. Nonetheless, the majority of respondents stated that they
had thoroughly read and examined the SPF report. However, fewer
respondents invested considerable time and/or
effort in interpreting the results and seeking explanations for the results of
their own schools. In line with our theoretical framework, differences in the
‘phases in SPF use’ translate into differences in the ‘types of SPF use’. There
is a considerably higher occurrence of conceptual use than instrumental
use. This can be explained by the fact that a conceptual use (control and
plan-oriented) precedes an instrumental use (goal-oriented) in a school
policy cycle (cf. the Plan-Do-Check-Act cycle; Deming, 1986). Research has
already revealed that many schools have difficulty using the findings of a
control stage in subsequent steps of quality control (Schildkamp, 2007;
Verhaeghe et al., 2010).
The results also demonstrate that differences in SPF use correspond with
differences in SPF competences. This study confirms the hypotheses related
to the second research question. With regard to the attitude towards SPF,
we found that its impact is not only mediated by the ‘phases in SPF use’,
but that a direct relationship also exists with the ‘types of SPF use’ and
‘perceived SPF effects’. Another relevant finding is that the
‘phases in SPF use’ are related more closely to the perceived mastery of
knowledge and skills (academic self-efficacy) as compared to the actual
mastery as measured with the data literacy test. We learn from this that
faith in one’s own knowledge and skills is very important in making the
transition to action (Bandura, 1977). Obviously, it should also be noted that
the actual mastery of the knowledge and skills is still relevant. School
policies should be developed on the basis of correct information (Devos &
Verhoeven, 2003).
A key research question in this study concerned the differential impact of
an SPF support provision. Building on the theoretical framework, we
expected SPF support initiatives to affect the SPF-related competences
(attitudes, data literacy and self-efficacy) directly. No direct impact was
expected on the ‘phases in SPF use’, the related ‘types of SPF use’ or the
‘perceived SPF effects’. The results largely confirm our theoretical
assumptions. Principals who participated in the SPF support condition
attained higher data literacy test scores and reported higher self-efficacy
levels. This consequently affected their process of feedback use (phases in
use). The expected indirect effect of SPF support is in line with Kirkpatrick’s
model (1998), implying that a higher level can only be achieved if lower
levels have been attained. Contrary to our expectations, participation in the
SPF support condition had no significant effect on the attitude towards SPF.
We have to stress that this attitude remains a crucial factor. Future SPF
support could focus to a larger extent on the fundamental basis of and
motives for implementing SPF, and on facilitating successful experiences
with SPF. Furthermore, SPF support initiatives that offer opportunities for
discussion and exchange of experiences, within and between schools,
should be considered (Huffman & Kalnin, 2003; Lachat & Smith, 2005;
Wayman, Midgley, & Stringfield, 2007). Some authors stress that such
discussions and exchanges are crucial for SPF to yield benefits in terms of
school improvement (Zupanc, Urank, & Bren, 2009).
Another interesting finding is the larger impact of the SPF support
initiative on data literacy test scores as compared to the impact on
academic self-efficacy. A first explanation relates to the limited scope of
the support initiative: it was a single-shot activity that focused on the
development of interpretation skills, and in this respect the SPF support
seems to have succeeded. A second explanation may be that the SPF support
raised participants’ awareness of the complexity of school feedback. This
could explain why the SPF support results mainly in higher literacy test
scores and to a lesser extent in an increased level of self-efficacy. A
third explanation is that the participants learned hardly anything from the
support intervention: they may report the same level of self-efficacy but
attain higher literacy test scores as a result of extra effort.
Future research about the impact of SPF support could adopt a
longitudinal approach with more elaborate pre- and post-intervention
measurements. This would make it possible to take the specific support
needs of respondents into account. These differences in need could also be
linked to the selection of SPF training participants. Moreover, the delayed
effect of SPF use on student achievement could be studied. Such effects can
only be expected after several SPF reports and persistent efforts towards
effective SPF use. In addition, it would be worthwhile to set up research on
more intensive support initiatives that go beyond single-shot SPF support
provisions. At a theoretical level, a cross-validation of the path model
developed in the present study would be valuable. In the present study
this was not possible because a sample size of at least 100 respondents is
required (Hoyle, 1995). Furthermore, the low data literacy test scores and
their relationship with the ‘phases in SPF use’ call for a methodological
comment. The data literacy test score was the only variable in the model
not based on perceptions of the respondents. The strong interrelations
between perception-based variables in the present study are
thought-provoking. They point to the need for research that links the
‘perceived’ to the ‘expected’ and in particular to the ‘actual’ use of SPF.
Finally, next to a focus on the competences and perceptions of principals,
future research could also turn its attention to other critical actors in
the discussion about educational quality: inspection authorities, teachers,
school teams, etc.
We finish by repeating that we observed a strong interest in SPF and a
positive attitude towards SPF among the Flemish school principals in our
study. This is in sharp contrast to their limited usage of the school
performance feedback information and the related effects on educational
practices and results. The study therefore highlights the need to develop
critical and conditional competences related to SPF use. This is
interesting from both a theoretical and a practical point of view, since
many support initiatives are being set up by feedback providers (e.g.
helpdesks, after-school information sessions, information sessions at
school, and so on) without their direct and indirect effects being
evaluated.
References
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral
change. Psychological Review, 84(2), 191-215.
Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the
utilisation of management information systems in secondary schools.
School Effectiveness and School Improvement, 18(4), 451-467.
Coe, R. (2002). Evidence on the role and impact of performance feedback in
schools. In A. J. Visscher & R. Coe (Eds.), School improvement through
performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger.
Creswell, J.W. (2008). Educational research: Planning, conducting, and
evaluating quantitative and qualitative research (3rd ed.). Upper Saddle
River, NJ: Pearson Prentice Hall.
Deming, W.E. (1986). Out of the crisis. Cambridge: Massachusetts Institute
of Technology, Center for Advanced Engineering Study.
Devos, G., & Verhoeven, J. (2003). School self-evaluation - Conditions and
caveats: The case of secondary schools. Educational Management &
Administration, 31(1), 403-420.
Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge
Journal of Education, 33(3), 383-394.
Field, A.P., & Hole, G. (2003). How to design and report experiments.
Thousand Oaks, CA: Sage.
Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in
indicator systems: Doing things right and doing wrong things. Education
Policy Analysis Archives, 10(6), 1-28. Retrieved from
http://epaa.asu.edu/ojs/article/viewFile/285/411
Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support
school inquiry and continuous improvement: Final report to the Stuart
Foundation. Los Angeles: University of California, Center for the Study of
Evaluation.
Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School self-
evaluation and student achievement. School Effectiveness and School
Improvement, 20(1), 47-68.
Hoyle, R.H. (Ed.). (1995). Structural Equation Modeling: Concepts, issues
and applications. Thousand Oaks, CA: Sage.
Huffman, D. & Kalnin, J. (2003). Collaborative inquiry to make data-based
decisions in schools. Teaching and Teacher Education, 19, 569-580.
Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006).
Strategies to promote data use for instructional improvement: Actions,
outcomes, and lessons from three urban districts. American Journal of
Education, 112, 496-520.
Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels.
San Francisco: Berrett-Koehler.
Lachat, M.A., & Smith, S. (2005). Practices that support data use in urban
high schools. Journal of Education for Students Placed at Risk, 10(3), 333-
349.
Learning Point Associates. (2004). Guide to using data in school
improvement efforts: A compilation of knowledge from data retreats and
data use at learning point associates. Retrieved from
http://www.learningpt.org/pdfs/datause/guidebook.pdf
MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. New
York: Lawrence Erlbaum Associates.
Mathison, S. (1992). An evaluation model for inservice teacher education.
Evaluation and Program Planning, 15, 255-261.
Meuret, D., & Morlaix, S. (2003). Conditions of success of a school's self-
evaluation: Some lessons of an European experience. School
Effectiveness and School Improvement, 14(1), 53-71.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external
evaluation. In D. Nevo (Ed.), School-based evaluation: An international
perspective (pp. 3-16). Oxford, UK: Elsevier Science.
Petty, R.E., & Wegener, D.T. (1998). Attitude change: Multiple roles for
persuasion variables. In D. Gilbert, S. Fiske & G. Lindzey (Eds.), The
handbook of social psychology (pp. 323-390). New York: McGraw-Hill.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic
approach. Thousand Oaks: Sage.
Saunders, L. (2000). Understanding schools’ use of ‘value added’ data: The
psychology and sociology of numbers. Research Papers in Education,
15(3), 241-258.
Saunders, L., & Rudd, P. (1999, September). Schools’ use of ‘value added’
data: A science in the service of an art? Paper presented at the British
Educational Research Association Conference, Brighton, University of
Sussex.
Schildkamp, K. (2007). The utilisation of a self-evaluation instrument for
primary education. Unpublished doctoral dissertation, University of
Twente, Enschede, The Netherlands.
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform:
Which data, what purposes, and promoting and hindering factors.
Teaching and Teacher Education, 26(3), 482-496.
Schildkamp, K. & Teddlie, C. (2008). School performance feedback systems
in the USA and in the Netherlands: A comparison. Educational Research
and Evaluation, 14(3), 255-282.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school
self-evaluation instrument. School Effectiveness and School
Improvement, 20(1), 69-88.
Schunk, D.H. (1991). Self-efficacy and academic motivation. Educational
Psychologist, 26(3&4), 207-231.
Tymms, P. (1995). Influencing educational practice through performance
indicators. School Effectiveness and School Improvement, 6(2), 123-145.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-
indicatoren als strategisch instrument voor schoolontwikkeling
[Feedback on school performance indicators as strategic instrument for
school improvement]. Pedagogische Studiën, 81, 338-353.
Vanhoof, J., Van Petegem, P., & De Maeyer, S. (2009). Attitude towards
school self-evaluation. Studies in Educational Evaluation, 35, 21-28.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using
School Performance Feedback: Perceptions of Primary School Principals.
School Effectiveness and School Improvement, 21(2), 167-188.
Visscher, A.J. (2002). A framework for studying school performance
feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement
through performance feedback (pp. 41-71). Lisse, The Netherlands:
Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems:
Conceptualisation, analysis, and reflection. School Effectiveness and
School Improvement, 14(3), 321-349.
Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through
performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.
Wayman, J.C., Midgley, S., & Stringfield, S. (2007). Leadership for data-
based decision making: Collaborative educator teams. In A. B. Danzig, K.
M. Borman, B. A. Jones & W. F. Wright (Eds.), Learner-centered
leadership: Research, policy and practice (pp. 189-205). New Jersey, USA:
Lawrence Erlbaum Associates.
Webber, S., & Johnston, B. (2000). Conceptions of information literacy: New
perspectives and implications. Journal of Information Science, 26(6),
381-397.
Weiss, C.H. (1998). Have we learned anything new about the use of
evaluation? American Journal of Evaluation, 19(1), 21-33.
Williams, D., & Coles, L. (2007). Teachers’ approaches to finding and using
research evidence: An information literacy perspective. Educational
Research, 49(2), 185-206.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for
effectiveness and improvement in classrooms and schools in upper
secondary education in Slovenia: Assessment of/for Learning Analytic
Tool. School Effectiveness and School Improvement, 20(1), 89-122.
CHAPTER 6
EFFECTS OF SUPPORT FOR SCHOOL FEEDBACK USE
CHAPTER 6: EFFECTS OF SUPPORT FOR SCHOOL FEEDBACK USE∗
Abstract
Effects of support for school performance feedback use
School development through systematic data use requires that schools be
provided with information-rich environments. However, providing school
performance feedback does not guarantee that it is used successfully.
Users’ limited data literacy competences are one of the main stumbling
blocks. Support initiatives were developed and evaluated to overcome this
shortcoming. In a randomized field study, two experimental conditions,
related to inservice and onservice education and training (INSET and
ONSET), are compared with a control group. This study examines the
relationship between data literacy competences, support provisions for
data interpretation, actual use of the feedback, and school improvement
effects. The research was based on in-depth interviews with 18 primary
school principals. The results of a case-ordered predictor-outcome
meta-matrix reveal not only difficulties in handling the information but
also incongruences in attitude towards feedback use between school
principals and teachers. The ONSET condition led to the best results, which
argues for a tailored support approach.
∗ Based on Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten
van ondersteuning bij schoolfeedbackgebruik [Effects of support for school feedback use]. Manuscript submitted for publication in Pedagogische Studiën.
Summary
Given their role in society, schools are expected to approach school
development systematically. They are therefore encouraged, among other
things, to base their internal quality assurance policy on concrete data.
School feedback initiatives are one possible source of such data. Using
this school feedback, however, turns out not to be self-evident, partly
because of a lack of data literacy competences. To meet this need, various
support activities are set up, organized either inside the school (ONSET)
or outside it (INSET). This study reports the results of an evaluation
study in which, alongside an INSET and an ONSET support design, feedback
users in a control condition were also involved. Particular attention is
paid to influencing data literacy competences and to evaluating effects at
four levels. Research data were collected through in-depth interviews with
18 school principals from the three conditions and were processed in a
case-ordered predictor-outcome meta-matrix. The results show not only a
lack of knowledge and skills for dealing with school feedback, but also
divided attitudes between school principals and teachers. Furthermore, the
ONSET condition appears to lead to the best results, which implies that
support for feedback use is best tailored to the school.
1. Problem statement
Schools are increasingly expected to make school development a systematic
process (Nevo, 2002; Leithwood & Aiken, 1995). To assist them in this,
efforts are made to create information-rich environments. For instance,
schools are provided with feedback on their functioning and performance by
school feedback systems set up specifically for that purpose. This is done
in the expectation that schools will use this feedback for self-evaluation
(Visscher & Coe, 2002; 2003).
However, using such information sources as a policy instrument turns out
not to be self-evident. Use and school improvement effects generally remain
limited (Coe, 2002; Saunders & Rudd, 1999; Tymms, 1995; Schildkamp,
Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2004; Zupanc, Urank, &
Bren, 2009). Receiving school feedback proves to be a necessary but not a
sufficient step: both the schools and the feedback systems have to meet
certain conditions (Visscher & Coe, 2003; Verhaeghe, Vanhoof, Valcke, &
Van Petegem, 2010). One of the main obstacles to effective data use is the
lack of data literacy among users (Earl & Fullan, 2003). It is therefore
not surprising that, in research, school principals and teachers report a
need for additional support both in interpreting the data and in using
them further (Schildkamp & Teddlie, 2008; Schildkamp et al., 2009;
Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc et al., 2009).
2. Conceptual framework
2.1. Phases in and types of school feedback use
School feedback use can be described in two ways. On the one hand, one can
refer to the successive steps that feedback users take to work with the
data. Research shows that making use of school feedback calls for
deliberately going through a cyclical process (Huffman & Kalnin, 2003;
Learning Point Associates, 2004; Verhaeghe et al., 2010). Within that
process, receiving, reading and discussing the school feedback are
distinguished as steps towards a correct interpretation. Once the school
has made a strengths-and-weaknesses analysis of its results, a phase
follows in which the school feedback is put to work. This phase comprises
diagnosing, by searching for explanations for the results, and planning,
carrying out and evaluating actions. Because of a lack of data literacy and
of time investment, schools turn out to go through these steps with
difficulty or not at all (Earl & Fullan, 2003; Verhaeghe et al., 2010).
On the other hand, data use within schools can be described in terms of
different types of use. Based on the classification of Rossi, Lipsey and
Freeman (2004), different kinds of use of evaluation data can be
distinguished, a classification that has also been applied in the context
of school feedback use (Schildkamp et al., 2009; Verhaeghe et al., 2010;
Weiss, 1998). Schools can, for example, take action (instrumental use),
start thinking (conceptual use), seek confirmation of existing views
(symbolic use), use the report in an accountability context (strategic
use), or use the report to stimulate or motivate team members (motivating
use).
2.2. Effects of school feedback use
The ultimate goal of school feedback use is to contribute to school
development (Visscher & Coe, 2002, 2003). However, school feedback use
does not always result in significantly improved pupil achievement
(Fitz-Gibbon & Tymms, 2002; Schildkamp et al., 2009; Visscher, 2002). When
examining school improvement effects, one should therefore look more
broadly, among other things at effects on the professional development of
team members (such as a growing degree of assessment literacy; Zupanc et
al., 2009), improved educational processes (such as intensified pupil
guidance; Schildkamp & Teddlie, 2008) and improved school functioning
(such as stronger cohesion within the school; Visscher & Coe, 2003).
Unintended and undesirable effects can also occur, such as demotivation
among teachers due to excessive workload (Fitz-Gibbon & Tymms, 2002) or
too strong a focus on tested subject matter (Schildkamp & Teddlie, 2008;
Visscher, 2002).
2.3. Influencing factors
Differences in school feedback use and its effects can be attributed to
four clusters of factors, which refer to characteristics of the users, the
feedback systems, the support and the educational context (Verhaeghe et
al., 2010; Visscher & Coe, 2003). Given users’ limited data literacy
competences and the urgent call for research on support in this area, this
study concentrates on these two factors.
Competences for school feedback use
The concept of competence refers to the integration of the knowledge,
skills and attitudes needed to act adequately in specific situations
(Gonczi, 1994). The research literature shows that the degree of
information literacy (Webber & Johnston, 2000) plays a major role in
school feedback use. This general term covers the strategies, skills and
knowledge needed to determine information needs and to collect and process
the necessary information (Williams & Coles, 2007, p. 188). Applied to the
domain of data use within the school, the term data literacy is used. Being
data literate is a necessary condition for converting data into usable
information (Earl & Fullan, 2003). However, limited knowledge of how to
work with the data, and the uncertainty that goes with it, often form an
obstacle (Earl & Fullan, 2003; Kerr et al., 2006; Saunders, 2000;
Verhaeghe et al., 2010). Not only is the capacity to interpret the data
reported to be lacking; research skills such as formulating research
questions and hypotheses are generally not well developed either (Earl &
Fullan, 2003; Herman & Gribbons, 2001; Kerr et al., 2006).
The concept of data literacy competences also calls for attention to the
attitude towards school feedback. Bosker, Branderhorst and Visscher (2007)
put forward a negative attitude towards school feedback as one of the main
obstacles to the use of feedback information. This concerns, for example,
users’ belief that they need data to improve their teaching (Schildkamp &
Kuiper, 2010). Users’ attitude towards data use thus largely determines
the extent to which they are prepared to invest time and effort in using
the information (Williams & Coles, 2007).
Supporting schools in school feedback use
Given the limited data literacy competences, school feedback users are
asking for support to be made available both for data interpretation and
for the further use of the data (e.g. Schildkamp & Teddlie, 2008;
Verhaeghe et al., 2010; Visscher & Coe, 2003). This support can be offered
both by external support providers, for example educational services and
feedback providers, and by school team members within the school.
To classify external support initiatives, Gardner’s (1995) continuum of
in-service training initiatives can be used. At its extremes are
initiatives that take place outside the school (Inservice Education and
Training, INSET) and inside the school (Onservice Education and Training,
ONSET). An advantage of INSET meetings, in which participants from
different schools are brought together outside their own school, is that
they can learn from one another, formally and informally, through social
interaction (Mathison, 1992). Because usually only one delegate per school
takes part, however, a more limited impact can be expected than with ONSET
initiatives, in which several members of the school team can be involved.
Still, there is confidence that school principals can act as a catalyst and
pass the insights they have gained on to the school team (Kerr et al.,
2006). Several studies indeed show that the most successful leaders in data
use do take the lead, but then share the tasks of data use through
distributed leadership (Wayman, Midgley, & Stringfield, 2007). ONSET
initiatives are said to be more cost-effective than in-service training
because the training takes place within the school, with the school’s own
data and problems as the starting point. Consequently, changes are more
likely to be accepted, owing to the stronger involvement and the closer
link with practice (Gardner, 1995; Murnane, Sharkey, & Boudett, 2005).
When, in addition, several school team members are present, this can prompt
more internal consultation and further follow-up. In this way, feedback use
can evolve from an individual matter into a shared responsibility, whether
or not in the form of collaborative data teams (Huffman & Kalnin, 2003;
Lachat & Smith, 2005; Wayman et al., 2007). The role of the school
principal is also very important in this evolution, among other things by
creating a clear vision and clear expectations regarding data use (Young,
2006) and by coaching the data teams (Lachat & Smith, 2005).
2.4. Evaluation model for support initiatives: research questions
To inventory the possible effects of support for school feedback use and to
integrate them into the broader conceptual framework, we draw on
Kirkpatrick’s (1998) four successive evaluation levels for professional
development activities.
First, the participants’ reactions are measured immediately after the
support. This concerns their general impression and the relevance and
applicability of the support. All too often, the evaluation of support
remains limited to this level, while the impact on the organization is not
examined (Mathison, 1992; Rossi et al., 2004). Second, the impact on
participants’ learning is examined, that is, the increase in knowledge and
abilities and the change in attitudes resulting from the support. Third, it
is examined whether what was learned during the support is transferred to
the organization and whether behavioural changes take place. Finally,
possible school improvement effects are examined at the results level. Here
the focus is on the effects of the support on the achievement of the
organization’s goals and on the organization itself.
This model can be applied to evaluate the impact of support initiatives for
school feedback use. This is explained further in Table 1. The central
research question is to what extent differences in school feedback use can
be explained by support initiatives for school feedback use.
Table 1
Influence of support on school feedback use according to Kirkpatrick’s model

Kirkpatrick evaluation level | Application to school feedback use
-----------------------------|-----------------------------------
Reaction: satisfaction of the participants | Satisfaction of participants with the support for school feedback use
Learning: increase and/or change in knowledge, skills and attitudes | Change in data literacy competences: knowledge, skills and attitudes needed for successful school feedback use
Behaviour: transfer of acquired insights to the organization | Influence on school feedback use: phases in use; types of use
Results: effects on the organization | Influence on school improvement effects through school feedback use

In this contribution we try to answer the question of the impact of a
support initiative for school feedback use by drawing on Kirkpatrick’s
model. The following research questions are asked:
1. What impact do INSET and ONSET for school feedback use have on the
satisfaction of school feedback users (Reaction)?
2. What impact do INSET and ONSET for school feedback use have on the
data literacy competences of school feedback users (Learning)?
As described earlier, we look here at the possible influence of support
on the knowledge, skills and attitudes that users need for successful
school feedback use.
3. What impact do INSET and ONSET for school feedback use have on the
use of this feedback within the school (Behaviour)?
Kirkpatrick’s model implies that a higher impact level can only be
realized once a lower level has been realized. If support is aimed at
influencing data literacy competences, there will first be an impact on
the participants’ knowledge, skills and attitudes. These changed
competences will then contribute to successful school feedback use,
which in this study is defined in terms of both the steps undertaken and
the types of feedback use.
4. What impact do INSET and ONSET for school feedback use have on the
school improvement effects of feedback use (Results)?
We expect school improvement effects to occur only if they are preceded
by successful feedback use.
3. Method
To answer the research questions, a field experiment with a posttest was
chosen. The research population (N = 195 schools) was randomly assigned to
the different conditions.
3.1. Research conditions
Starting from the continuum of in-service and on-service training
(Gardner, 1995), we chose to design and test two support variants, set
against a control group that received no additional support (n = 150). We
call the first experimental condition the INSET condition, because the
training did not take place at the participants’ workplace and the
learning content was based on a fictitious school example instead of the
school’s own results. In addition, we distinguish an ONSET condition, since
both the location of the training and the learning content offered were
close to the school’s own context. The characteristics of both conditions
are explained in Table 2.
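A minimal sketch of such a random assignment, assuming nothing beyond the figures given above (195 schools, 150 of them in the control group). The school identifiers, the fixed seed and the function name are hypothetical. How the remaining 45 schools were divided over INSET and ONSET is not specified here, so the sketch leaves them in one experimental pool.

```python
import random


def assign_conditions(school_ids, n_control, seed=2008):
    """Randomly split schools into a control group and an experimental pool."""
    if n_control > len(school_ids):
        raise ValueError("control group cannot be larger than the population")
    rng = random.Random(seed)      # fixed, hypothetical seed for reproducibility
    shuffled = list(school_ids)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_control], shuffled[n_control:]


schools = [f"school_{i:03d}" for i in range(1, 196)]   # 195 hypothetical IDs
control, experimental = assign_conditions(schools, n_control=150)
print(len(control), len(experimental))   # prints: 150 45
```

Shuffling the full list once and slicing it guarantees that every school ends up in exactly one group, which is the property that matters for a randomized design.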
The learning objectives for the two experimental support conditions were
identical. After the support, participants were expected to be able to (1)
describe the central concepts of the school feedback report in their own
words; (2) correctly interpret the figures and tables in the school
feedback report; (3) give explanations of why performance can be lower or
higher than that of the reference group; and (4) describe which
function(s) the school feedback report can fulfil in their own school
context. These learning objectives focused mainly on Kirkpatrick’s (1998)
second level, which targets knowledge, skills and attitudes. In addition,
the support also tried to influence school feedback use indirectly
(Behaviour) by familiarizing feedback users with the different steps of
systematic feedback use.
Table 2
Description of the INSET and ONSET conditions

Characteristic | INSET | ONSET
---------------|-------|------
Support providers | Two staff members of the School Feedback Project | One of the two School Feedback Project staff members from the INSET support
Format | Study morning | School visit
Target group | The person at school most involved in using the school feedback report (choice left to the school) | Preferably the school principal, the care coordinator (zorgcoördinator) and two teachers (final choice left to the school)
Participants | 23 participants from 23 schools (10 in session 1 and 13 in session 2); 20 school principals and 3 care coordinators | 13 participants from 7 schools; 6 x school principal with care coordinator; 1 x school principal, care coordinator and teacher
Timing | Just over a month after receiving the feedback report | Same as INSET
Location | University building | Own school
Content | Based on a feedback report for a fictitious school; explanation of the concepts and forms of representation in the feedback report; learning conversation about the possible uses of the feedback; explanation of the underlying school feedback system; practice and evaluation moment | Same as INSET, but in addition a link was always made back to the school's own feedback report
Working method | A variety of didactic formats, from instruction-oriented presentations to question-and-answer conversations and group discussions | Same as INSET, but only with members of the school's own team

3.2. Selection of interview respondents
This study is part of the School Feedback Project, entitled “Each school its own mirror” (Verhaeghe & Van Damme, 2006). Within that project, 195 Flemish schools received confidential feedback in spring 2008, in which their school results were compared with a representative reference group from the SiBO study (Maes, Van Petegem, & Van Damme, 2005). The data concerned a cohort of pupils who had been followed (up to that point) from the end of kindergarten through the fourth grade for mathematics and language (spelling, technical reading and reading comprehension), supplemented with information on pupil intake characteristics. The central concepts in the feedback report (learning gain, value added and adjusted scores) were explained in such a way that the need for prior statistical knowledge was kept to a minimum. The feedback data were supported by graphical representations (pie chart, growth curves and cross tables). The text was standardised, so school team members were expected to interpret their own school's data themselves with the help of the general explanation.
From this group of schools, 45 primary schools were randomly drawn and invited to take part in the support. Of these, 23 took part in the INSET condition and 7 in the ONSET condition (see Figure 1).
[Figure 1: flow chart. 195 primary schools from the School Feedback Project; 45 request support (23 INSET, 7 ONSET); 150 do not request support (150 control); 6 interviews per condition.]
Figure 1. Overview of the sampling procedure.
Although the participants were assigned to the conditions at random, a risk of selection bias remains. Because this could threaten the internal validity of the experiment, previously collected data were used to examine whether this subgroup deviated from the population (N = 195) on relevant criteria. These analyses showed that the selected schools did not differ statistically significantly in attitude towards school feedback, expected use of the school feedback, perceived relevance of the school feedback, pupil intake characteristics, or school performance as reported in the feedback reports. Subsequently, six respondents were randomly drawn from each condition to take part in the interviews.
3.3. Research instrument and procedure
Data were collected through semi-structured in-depth interviews. To this end, the school leaders were visited at their school six months after the support interventions by one of the two researchers who had delivered those interventions.
The interview questions were drawn up in line with the conceptual framework discussed earlier, fitting within the four evaluation levels. No questions probed directly for the influence of support on feedback use, so as to avoid response bias from socially desirable answers. Probing was allowed to obtain further clarification or explanation (Lindlof & Taylor, 2002). The interview instrument comprised some forty questions for a good hour of interview time. Some examples of interview questions are:
• Reaction:
- Satisfaction: Are you satisfied with the support, from inside and outside the school taken together, that you received in connection with the use of the school reports?
• Learning:
- Knowledge and skills: Do you feel sufficiently familiar with interpreting feedback data of this kind? What knowledge and skills do you think are needed to interpret this report correctly?
- Attitude: How do you currently feel about the use of school feedback? Is it worth the investment?
• Transfer:
- Phases of use: We would like to know what route the school report has followed since it arrived at the school. Could you briefly indicate which steps were taken?
- Types of use: Has the school feedback report led to concrete action points or decisions? Has the school feedback report changed the way you look at your school?
• Results:
- Effects: How would you yourself describe the effects of using this school feedback report? Do you see any undesirable side effects of using this school feedback report?
• Support:
- Support received: Did you call on others within the school to interpret the results? Did you call on external parties when interpreting, diagnosing or using the school feedback report?
3.4. Analysis
The interviews were recorded and subsequently transcribed. They were then coded independently by two researchers using the qualitative analysis tool ATLAS.ti. Codes were assigned following the middle-order approach, which allows initially broad categories to be refined later (Dey, 1993). Coding was mainly deductive, using the codes of a codebook based on the theoretical framework. Fragments were first placed under broad categories. When no predefined code could be assigned to a relevant passage, it was placed under a broad category and later assigned to new codes generated inductively from the data (Strauss & Corbin, 2007).
After the individual interviews had been coded, the data were analysed using a case-ordered predictor-outcome meta-matrix (Miles & Huberman, 1994). In this analysis, respondents are grouped by the research condition to which they belong. The aim of this design is not only to describe the cases individually but also to carry out a cross-case or variable-oriented analysis. This approach moves towards an explanatory analysis of the results. The hypothesis is that the INSET and ONSET conditions will lead to a higher degree of feedback use than the control condition, and consequently to more school improvement effects. To discuss the strength of the characteristics present, we use gradation codes (absent, weakly present, strongly present, no information). Strict criteria were drawn up for each variable to determine the extent to which the characteristic was present. These gradations make it possible to indicate the strength of the variables without quantifying the data to any great extent.
To construct this meta-matrix (see Figure 2), the following steps were taken:
1. Anonymising the transcripts to allow blind coding
2. Full coding of the transcripts according to the predefined codebook
3. Summarising each case following the structure of the codebook
4. Assigning gradation codes to every variable for each respondent
5. Entering the cases in the meta-matrix, with their gradation codes
6. Re-identifying the cases and ordering them by experimental condition and then by gradation codes
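The final ordering step can be illustrated with a short sketch. It is not the procedure used in the study, which was carried out by hand with ATLAS.ti output. The numeric scores attached to the gradation codes, the rule of ordering cases within a condition by the sum of their scores, and all names and example cases are assumptions made purely for illustration.

```python
# Hypothetical sketch: scores, ordering rule and data are illustrative assumptions.
GRADE_SCORE = {"absent": 0, "weak": 1, "strong": 2, "n/a": None}
CONDITION_ORDER = {"C": 0, "I": 1, "O": 2}  # control, INSET, ONSET


def total_strength(case):
    """Sum the gradation scores of one case, skipping 'n/a' cells."""
    scores = (GRADE_SCORE[g] for g in case["grades"].values())
    return sum(s for s in scores if s is not None)


def order_cases(cases):
    """Order cases by condition, then by increasing overall strength."""
    return sorted(
        cases,
        key=lambda c: (CONDITION_ORDER[c["condition"]], total_strength(c)),
    )


# Invented example cases (not data from the study).
cases = [
    {"id": "r3", "condition": "O", "grades": {"reading": "strong", "interpreting": "weak"}},
    {"id": "r1", "condition": "C", "grades": {"reading": "weak", "interpreting": "absent"}},
    {"id": "r2", "condition": "C", "grades": {"reading": "absent", "interpreting": "n/a"}},
    {"id": "r4", "condition": "I", "grades": {"reading": "strong", "interpreting": "strong"}},
]

ordered_ids = [c["id"] for c in order_cases(cases)]
print(ordered_ids)
```

Keeping the gradation codes as labels and mapping them to scores only for sorting stays close to the stated intent of indicating strength without quantifying the data to any great extent.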
The interviews were coded independently by two researchers. To safeguard inter-rater reliability, the codebook and the gradation rules were drawn up jointly. One interview was then coded independently by both researchers, and inter-rater reliability, calculated as the ratio of the number of agreements to the total number of codes assigned, was examined and raised to .87 (Kurasaki, 2000; Miles & Huberman, 1994).
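As an illustration of such an agreement ratio, the sketch below represents each coder's work as a set of (segment, code) pairs and divides the number of shared pairs by the number of distinct pairs assigned by either coder. Reading "total number of codes assigned" as agreements plus disagreements is an assumption, and the segment numbers and code names are invented.

```python
# Hypothetical sketch: the data and the exact denominator are illustrative assumptions.
def agreement_ratio(coder_a, coder_b):
    """Agreements divided by all distinct code assignments made by either coder."""
    agreements = coder_a & coder_b
    all_assignments = coder_a | coder_b
    if not all_assignments:
        raise ValueError("no codes were assigned")
    return len(agreements) / len(all_assignments)


# Each pair is (segment number, code); invented example, not data from the study.
coder_a = {(1, "knowledge"), (2, "attitude"), (3, "instrumental use")}
coder_b = {(1, "knowledge"), (2, "attitude"), (3, "conceptual use"), (4, "effects")}

ratio = agreement_ratio(coder_a, coder_b)
print(ratio)
```

In this example the coders agree on two assignments out of five distinct ones. Discussing the disagreements and recoding would raise the ratio, which is one way to read "examined and raised".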
4. Results
In the case-ordered predictor-outcome meta-matrix (Figure 2), the respondents are ordered by condition and by gradation in feedback use. The darker the cell, the more strongly the respondent reported the variable concerned. In the following paragraphs we discuss each variable, both in general and per condition. In each case we indicate the number of respondents per condition who made a particular statement (C = control group, I = INSET group, O = ONSET group).
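The tally notation used in the remainder of this section, for example (2C, 1O), can be derived mechanically from a list of respondents who made a statement. The sketch below is only an illustration. The respondent identifiers and their conditions are invented and do not correspond to the respondents in this study.

```python
from collections import Counter


def tally(respondents, condition_of):
    """Format counts per condition in the order C, I, O, omitting zero counts."""
    counts = Counter(condition_of[r] for r in respondents)
    return ", ".join(f"{counts[c]}{c}" for c in ("C", "I", "O") if counts[c])


# Invented mapping of respondent identifiers to conditions (illustrative only).
condition_of = {"a": "C", "b": "C", "c": "I", "d": "O"}

label = tally(["a", "b", "d"], condition_of)
print(label)
```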
4.1. Reaction
To gauge satisfaction, respondents were asked to rate the total package of support received, including the INSET or ONSET support and any additional internal and external support. In general, satisfaction with the support received increases with the intensity of that support: satisfaction is higher in the ONSET condition than in the other groups. Some respondents from the control condition could not make statements about satisfaction because (virtually) no support had taken place at the school (3C; / in Figure 2). To clarify what the satisfaction statements are based on, we briefly describe the support the schools received.
[Figure 2: case-ordered predictor-outcome meta-matrix with shaded cells. Legend: absent; weakly present; strongly present; / = not applicable / no information. Columns: respondents under the headers CONTROL GROUP (C), INSET GROUP (I) and ONSET GROUP (O), printed in the order 13, 7, 4, 15, 2, 9, 8, 17, 14, 1, 16, 12, 6, 10, 3, 5, 18, 11. Rows: Reaction (satisfaction); Learning (knowledge and skills; attitudes); Behaviour, phases of use (receiving; reading and discussing; interpreting; diagnosis; planning actions; carrying out actions; evaluating actions); Behaviour, types of use (instrumental; conceptual; symbolic; strategic; motivational); Results (effects).]
Figure 2. Impact of support on school feedback use: results in a case-ordered predictor-outcome meta-matrix.
Use of internal support
The respondents did not all call on the expertise of other team members in their school. Only two school leaders in the control group reported having received some form of internal support, whereas all ONSET respondents had. In almost all cases the support was provided by the care coordinator (2C, 6O) or a care teacher (1C, 1I), whether or not supplemented by members of a core team (1I, 1O). In one case the support was offered by the policy support officer (1I).
I know this is useful, but you have to understand that when you receive a report like this, other things are coming in as well. As school management you have to see: “How does this fit together? Like so! Short and to the point.” And if you then want to go deeper, you can pass it to your care coordinator or to an assistant or policy support officer, and they can dig into it in more detail. (Respondent 18)
Where there was no internal support, this was mostly due to lack of time (1C, 1I) or to the care coordinator being unavailable owing to circumstances (2C). Strikingly, teachers were not mentioned as a source of support.
Use of external support
By definition, external support was present only for the schools in the experimental conditions. Additional forms of external support were generally not sought. Only one respondent mentioned having had an exploratory conversation with a pedagogical adviser (1I). Some reasons were given for this limited recourse to pedagogical advisers: they were said to lack the expertise and resources to guide schools in using such feedback reports (1C, 1O), or to pay too little attention to the school's own pedagogical identity (1C, 1O).
The control-group respondents who did make appreciable use of the reports mostly turned out to have sought additional support (3C), consisting of a general study day on the research project, an informal consultation within a partnership of method schools, or a consultation with the pedagogical adviser and the school council.
4.2. Learning
Respondents were asked to describe the knowledge, skills and attitudes within the school. The focus was on the shared potential within the school rather than on the respondent's individual characteristics. We cannot say unequivocally that the support initiatives in this study led to strongly improved data literacy competences. Although the ONSET group appears to come out best, the INSET group does not appear to differ from the control group in the competences needed to use their school feedback report.
Knowledge and skills
The largest shortcomings concern the knowledge and skills needed to interpret the feedback data (2C, 5I, 2O), even with the explanation provided in the report.
The first time I really had to set out with it on my own, I was unsure and it was certainly not clear. (Respondent 6)
Other shortcomings in knowledge and skills concern conveying the information to the school team (1C, 1O) and diagnosing and planning actions (1C, 1I, 1O).
But that is the problem everywhere, for example also when you raise something with a CLB [Centrum voor Leerlingenbegeleiding, pupil guidance centre]. They run tests and they establish this and that. And where do we go from there? We often get no further. That is often where it stops. (Respondent 7)
Several explanations for these limited data competences emerge from the interviews. First, respondents indicate that existing prior knowledge is fairly limited and does not extend beyond simple statistics on class tests (4I, 2O). The lack of this prior knowledge is also attributed to teacher training, which pays too little attention to learning to use data (1C, 1I).
We are confronted with that more and more every day. But personally I also find it a serious shortcoming in the training of primary teachers that people are not familiar with it. If you put some of those terms in front of my colleagues, they are floored. (Respondent 16)
Schools that experience no difficulties mostly owe this to extensive prior knowledge from earlier training or previous work experience (2C, 3O). Another explanation for the lack of data literacy in some schools is a change of principal in which the necessary knowledge is not passed on to the successor (1C, 1I). Finally, schools lack the time to build up data literacy competences and to interpret data. As a result, experience is not gained and no further knowledge about these school feedback reports is built up. Some school leaders indicate that they would probably be able to interpret the report correctly if they could and/or would make enough time for it (2C, 1I, 1O).
I admit that I actually do not want to put time into that. I have other things that also have to be done, and I find that this demands too much time in proportion. (Respondent 18)
Attitude towards school feedback
The positive attitude among school leaders and care coordinators stems mainly from the growing interest in objective measurement instruments that chart learning gain and allow comparison with a reference group (4C, 2I, 5O). According to the respondents, the attitude among teachers is considerably more negative (3C, 3I, 3O). This negative attitude is attributed, among other things, to the heavy workload of data collection (1I, 2O), anxiety about coming out of the results badly (1C, 2O), and external evaluations being experienced as threatening (1I). Teachers are also said to prefer feedback at pupil level (1C, 1I).
Every teacher has marks, has a mark book, an Excel workbook, you name it. Those are all individual results of the children. It is about the children, they say, and the children are important. But for me the school is important. Looking beyond the individual children to the performance of a school or of a group within the school does not come naturally to us. (Respondent 1)
Some respondents put the usefulness of the feedback reports into perspective by pointing to their limitations. They refer to the limited evidential value of feedback based on a single cohort of pupils (1C, 1I, 1O), a cohort that is moreover sometimes quite unstable because of pupil mobility (1I). The overlap in content with other available data sources (3I) also calls the added value of these school feedback reports into question. In addition, the motives for taking part in the School Feedback Project reveal something about the respondents' attitude. A few take part in this study only because the commitment was once made, whether or not by a previous principal (2C, 2I, 1O).
I find it a pity myself. My predecessor started this, for whatever reason. Unfortunately I can no longer ask him why. (…) If I had been involved right from the start, I would have chosen to carry it together with the team. Then it would look a bit different. (Respondent 8)
4.3. Behaviour
Figure 2 shows that the strength of data competences is positively associated with the strength of use. Differences in feedback use between the condition groups show up mainly in the intensity of reading and discussing, interpreting and diagnosing the feedback information, to the advantage of the ONSET, INSET and control group, in that order. Notably, the schools that involve team members most extensively in these processes all come from the experimental conditions.
Phases of school feedback use
Receiving the school feedback properly seems self-evident but turns out not always to be so. Two schools never even got as far as considering further plans because the feedback never left the school leader's mailbox (1C, 1I). Consequently, the phases of reading, interpreting and diagnosing can only be mapped for the other schools. Only a few school leaders choose to involve all teachers actively in these phases (2O). In the other cases, teachers are merely informed of the results at a staff meeting (3I, 4O), through individual discussions (1C, 1O) and/or by making the reports available for optional consultation (2C, 2I, 1O). In some cases, results are first dealt with separately in a core team before being communicated at a staff meeting (1I, 2O). Overall, the dissemination of information among this group of respondents remains very limited, both in the number of people involved and in the nature of the information.
I receive it, I look at it, I present it to the teachers and I present it at the staff meeting. That is usually where it ends; it does not go much further. (Respondent 16)
Those who choose not to disseminate the results actively across the whole team have various reasons. Some have not yet done anything with the results (2C, 1I) or never disseminate such information (1C). Others believe that the results do not give a valid picture because of teacher turnover (1C) or pupil mobility (1I), or feel too unsure about the interpretation and the possible uses (2C, 1I).
To be honest, it is very difficult for me to judge this correctly. I also do not tell my teachers, because they might then think I am giving wrong information, whereas I think it is very important, but at the moment I cannot judge it correctly. (Respondent 16)
The results illustrate that only a minority of school leaders get as far as planning actions (2C, 1I, 4O). In each condition only one school turned out to have proceeded to implementing actions. Unsurprisingly, no school has yet reached the stage of evaluating the actions carried out.
Types of school feedback use
The negative picture above needs qualifying. Feedback data can influence the functioning of a school without immediately resulting in concrete actions. The results confirm this: conceptual, symbolic and strategic use are more common than instrumental use. Two thirds of the respondents report conceptual use (3C, 4I, 5O). Some valuable illustrations are looking more closely at results (1C), being more vigilant about weaker results (1C), adjusting one's judgement of individual teachers in response to good results (1C), learning to think in terms of learning trajectories rather than separate grades (1C, 1I, 2O), broadening one's view through comparison with a reference group (1I, 1O), and taking a more nuanced view thanks to adjusted scores (2I, 1O).
Some schools have proceeded to action and report instrumental use (1C, 1I, 1O), such as the decision to work on the children's handwriting motor skills, to introduce levelled reading and parent reading volunteers (“leesmoeders”), and to change the approach to reading comprehension.
In addition, school leaders turn out to use the results for strategic purposes vis-à-vis the school inspectorate (3C, 4I, 2O). Sometimes this is done in a way that emphasises accountability rather than school improvement (3I).
The inspectorate is crazy about the output file, and I have a whole folder with all kinds of data in it, and this is part of it. At a certain moment those men got to see it. (…) They always ask you to hand over all the material you have, and that folder was among it. (…) That folder contains all kinds of data I have about the children, and that is a hobbyhorse of theirs, and this fits into it perfectly. I scored well with it despite the fact that I did not understand it. (Respondent 16)
Almost all schools that have not yet had a visit from the school inspectorate (/ in Figure 2) indicate that they would present the results during an inspection (2C, 2I, 2O). In a single school the reports were used as a form of publicity to attract pupils (1O). Not every school leader is open to this, however (2C).
Maybe there are schools that would want to use it if they all stood out so far above the curve, but I do not really think it is a good way, bombarding parents or outsiders with those graphs and those figures. (Respondent 2)
Many school leaders use school feedback reports symbolically. Existing arguments are then, for example, reinforced with the results (1C, 4I, 4O). One school leader tried to convince his team members that the fact that children are non-native speakers does not mean they cannot achieve high scores (1I). One specific example concerns convincing parents rather than teachers: the school is convinced that pupils clearly make learning gains and should therefore be encouraged to choose a course of study in secondary education in line with their abilities (1O). Yet another school leader mentions that these results were used only because they matched earlier findings of the school (1O). Symbolic use can also mean that results are deliberately not discussed in the team because this would not be constructive for the functioning of the school at that moment (1C).
Respondents from the two experimental groups appear to use the results more in a symbolic way. The same holds for motivational use of the school feedback. Teachers receive, for example, a pat on the back and confirmation of their good work (1C, 3I, 4O) and/or, conversely, a signal to take further action on the weaker results (1I, 2O).
We always did have the idea that if we look at ‘the raw materials’ we get in and the quality of ‘the raw materials’, and see what we deliver, then we have to say: “Look, we have done a good job after all”. But that was always based on a feeling. And now at last we have something to hold on to, because it is confirmed by research. (Respondent 11)
The feedback also appears to be able to give school team members self-confidence by confirming that the school is doing well (1C, 1O). In some schools where weaker results were also obtained, only the positive results were emphasised, precisely to foster a positive attitude towards school feedback in the team (1I) or out of fear of wrongly pointing the finger at teachers (1I).
4.4. Results
The ultimate aim of support for feedback use is to contribute to school improvement effects. Six months after receiving the feedback report, some schools already report valuable effects (2C, 1I, 2O). No clear relationship can be demonstrated, however, between the effects achieved and the three research conditions.
Looking at these effects more closely, we see that, partly thanks to the use of this report, teachers have become more alert to pupils falling behind (1C), teachers have gained a clearer picture of pupils' progress (1C), there is more confidence in the functioning of the school (1C, 2O) and a more critical attitude towards the school's own performance has emerged (1I). Two schools feel that, thanks to the positive results, their standing in the neighbourhood has been restored (2O).
But changing the name or reputation a school has in a neighbourhood is very difficult. And in contacts outside, with parents, it still regularly comes up: “Look, is that really a good school? Are you really doing well? Wouldn't I do better to send my children to another school?” And teachers used to doubt their own abilities to some extent. Now they are much more direct about it, and in contact with parents they dare to say much more: “No, we are doing well, we have our results”. (Respondent 11)
Unwanted or unforeseen effects also occur, however. In one school the introduction of the school feedback system led to teaching to the test (1C), even though this went against the school leader's vision.
But who are they fooling in the end? Themselves! As a teacher you surely do not teach to those tests, or give them a similar test so that they will score well next week? Then those pupils simply drop out once they get to secondary school. And then as a school you are no longer credible with the results you put forward, are you? (Respondent 7)
In another school, testing the pupils produced a sense of disappointment and demotivation because the results turned out worse than expected (1O).
5. Discussion and conclusion
5.1. School feedback use and support
First, the results point to wide variation in how schools shape their use of school feedback. The effects of this use are also very diverse. In the following paragraphs we conclude that, in explaining the differences between schools, the theoretical expectations were broadly confirmed. Two variables were examined more closely.
The first variable concerned the data literacy competences needed to work with the school feedback report under study. In general, the majority of respondents still have difficulty interpreting the data. If things go wrong at this step, further successful use is not guaranteed (Earl & Fullan, 2003). Bandura (1977), for instance, states that belief in one's own knowledge and skills is important for moving to action. For the subsequent phases of use, too, limited competences still appear to be a brake.
Besides knowledge and skills, competences also comprise attitudes. We therefore probed respondents' attitudes towards school feedback use, which likewise did not yield a predominantly positive picture. The earlier finding that school leaders have a more positive attitude than teachers was confirmed by this study (Vanhoof, Van Petegem, & De Maeyer, 2009; Zupanc et al., 2009). Teachers apparently have less opportunity to experience the added value and functionality of school feedback use, yet are confronted with the burden of data collection (Ingram, Louis, & Schroeder, 2004; Verhaeghe et al., 2010). They are less familiar with using data at school level and find that group-level results are too far removed from their classroom-level activities (Schildkamp & Kuiper, 2010; Zupanc et al., 2009). On top of this comes the threatening character of external evaluations, which instils fear of individual evaluation (Ingram et al., 2004), even when school feedback use in the context of self-evaluation is aimed at the functioning of the school rather than at separate individuals (Kyriakides & Campbell, 2004).
The second variable examined concerned support for school feedback use. In this study, support was operationalised as an INSET and an ONSET condition (Gardner, 1995). Controlled experimental support interventions made it possible to examine differential effects across the conditions. Although the design was limited by its one-off intervention, small scale and exploratory results, this controlled design still offered some valuable insights. Using Kirkpatrick's (1998) evaluation model for training initiatives, intervention effects were described at four levels.
At the reaction level, satisfaction with the support received was greater when more support had been received. Respondents from the control and INSET groups indicated that they had not actively sought internal or external support. ONSET participants called on school team members more often, probably because the care coordinator had also been involved in the ONSET intervention. These respondents accordingly expressed the highest degree of satisfaction. The results indicate that respondents adopt a rather supply-driven attitude towards support, since they seldom call on school team members or external support services of their own accord. It is also striking that teachers are rarely seen as a source of support. Several explanations emerged from the results. Teachers have less opportunity to free themselves from their busy schedules (Huffman & Kalnin, 2003; Ingram et al., 2004) and hold a less positive attitude towards school feedback use (Ingram et al., 2004; Zupanc et al., 2009). Care coordinators, by contrast, are called upon because they often have the necessary data literacy competences through their experience in reading and interpreting data from tests and pupil monitoring systems. Moreover, school feedback use fits within their task of coordinating care at school.
At the learning level, the influence of support on the data competences
needed to work with the report was examined. The ONSET group came out
best, followed by the control group. Concluding on the basis of these
findings that both support conditions unambiguously had a positive effect
on school feedback use would therefore be premature. It is thus necessary
to look at the next level as well, which considers the transfer of the
insights gained from the support to the organization. For the experimental
groups we see a visible advantage in the reading, interpretation and
diagnosis phases. Only a few respondents proceed to actions, and no clear
difference between the conditions can be discerned there. This means that
only limited instrumental use is observed. We do, however, observe a
variety of conceptual, symbolic, strategic and motivating use. Just as a
change in thinking can lead to a change in acting, conceptual use precedes
instrumental use (Schildkamp & Teddlie, 2008; Vanhoof, Verhaeghe,
Verhaeghe, Valcke, & Van Petegem, in press). In that sense these results
are hopeful, since school feedback use may thus gradually find its way
into Flemish schools. However, raising this use to a higher level and
integrating it into existing quality assurance requires additional support
and resources.
Given the limited use results and the rather short time between the
support offered and the data collection, there is only limited evidence of
school improvement effects. Consequently, no differences between
conditions can be found.
5.2. Practical implications
These research results imply that, when setting up support initiatives for
school feedback use, a thorough needs analysis is best conducted
beforehand to gain insight into the support needs of school leaders and
their team members. Since the ONSET condition came out best across the
board in this study, this may have implications for the design of support.
Offering support at the school itself, using the school's own data, with
its own team apparently responds best to these support needs. This
indicates that personal, tailored support is preferred over a generalized
approach in study days. In line with this approach, reference can be made
to the available literature on setting up collaborative data teams
(Huffman & Kalnin, 2003; Lachat & Smith, 2005; Wayman et al., 2007). For
support providers this implies, among other things, that they need a clear
view of the school's situation and must be able to tailor their support.
In addition, it should be emphasized that support must not stop at the
interpretation of the data. Schools should at least receive an initial
impetus to start working with the data.
Furthermore, when setting up school feedback systems, one should bear in
mind that support is not simply the key to successful use. School leaders
perceive not only a lack of data literacy competences but also a lack of
time. As a result, school feedback use is not integrated into systematic
reflection on the functioning of the school. Additional resources that
allow both policy makers and teachers to free up time for this task can be
seen as a prerequisite. More attention to this issue in the training of
teachers and school leaders could likewise be a facilitating factor.
5.3. Scientific relevance and implications for follow-up research
First of all, this design has shown that Kirkpatrick's model is suitable
for further application in research on support for data use. In addition,
this study was a valuable attempt to set up a controlled field study in
this context, which is new to this research domain. For follow-up
research, it can be recommended to extend this line of research by setting
up quasi-experimental research in educational contexts where school
feedback use is already further developed. For instance, the possible
differential effects that can be explained by user-related characteristics
need further investigation. Complementing this study, quantitative data
collections can be used for that purpose (e.g., Vanhoof et al., in press).
We further recommend examining the effects of long-term support.
Longitudinal research can help clarify whether the differences found
between conditions are partly due to the support received or solely to
differences between users. This does not alter the fact that studies of
one-off support initiatives deserve due attention, both because they are a
reality in educational settings and because, for example, the present
research results report valuable influences. Attention is best paid to
both short- and long-term effects, as well as to effect-oriented and
process-oriented results (Schildkamp & Teddlie, 2008; Schildkamp et al.,
2009).
References
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral
change. Psychological Review, 84(2), 191-215.
Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the
utilisation of management information systems in secondary schools.
School Effectiveness and School Improvement, 18(4), 451-467.
Coe, R. (2002). Evidence on the role and impact of performance feedback in
schools. In A. J. Visscher & R. Coe (Eds.), School improvement through
performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger.
Earl, L., & Fullan, M. (2003). Using data in leadership for learning.
Cambridge Journal of Education, 33(3), 383-394.
Fitz-Gibbon, C.T., & Tymms, P. (2002). Technical and ethical issues in
indicator systems: Doing things right and doing wrong things. Education
Policy Analysis Archives, 10(6), 68-83.
Gardner, R. (1995). Onservice Teacher Education. In L. W. Anderson (Ed.),
International Encyclopedia of Teaching and Teacher Education (pp. 628-
632). London: Pergamon Press.
Gonczi, A. (1994). Competency based assessment in the professions in
Australia. Assessment in Education: Principles, Policy & Practice, 1(1), 27-
44.
Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support
school inquiry and continuous improvement: Final report to the Stuart
Foundation. Los Angeles: University of California, Center for the Study of
Evaluation.
Huffman, D., & Kalnin, J. (2003). Collaborative inquiry to make data-based
decisions in schools. Teaching and Teacher Education, 19, 569-580.
Ingram, D., Louis, K.S., & Schroeder, R.G. (2004). Accountability policies and
teacher decision making: Barriers to the use of data to improve practice.
Teachers College Record, 106(6), 1258-1287.
Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006).
Strategies to promote data use for instructional improvement: Actions,
outcomes, and lessons from three urban districts. American Journal of
Education, 112, 496-520.
Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels.
San Francisco: Berrett-Koehler.
Kurasaki, K.S. (2000). Intercoder reliability for validating conclusions drawn
from open-ended interview data. Field Methods, 12(3), 179-194.
Kyriakides, L., & Campbell, R.J. (2004). School self-evaluation and school
improvement: A critique of values and procedures. Studies in
Educational Evaluation, 30, 23-36.
Lachat, M.A., & Smith, S. (2005). Practices that support data use in urban
high schools. Journal of Education for Students Placed at Risk, 10(3), 333-
349.
Learning Point Associates. (2004). Guide to using data in school
improvement efforts: A compilation of knowledge from data retreats and
data use at Learning Point Associates. Retrieved October 23, 2007,
from http://www.learningpt.org/pdfs/datause/guidebook.pdf
Leithwood, K., & Aitken, R. (1995). Making schools smarter: A system for
monitoring school and district progress. Newbury Park, CA: Corwin.
Lindlof, T.R., & Taylor, B.C. (2002). Qualitative communication research
methods (2nd ed.). London: Sage.
Maes, F., Van Petegem, P., & Van Damme, J. (2005). Schoolloopbanen in het
basisonderwijs (SiBO): Doelstellingen en onderzoeksopzet. Paper
presented at the Onderwijs Research Dagen, Ghent, Belgium.
Mathison, S. (1992). An evaluation model for inservice teacher education.
Evaluation and Program Planning, 15, 255-261.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An
expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.
Murnane, R.J., Sharkey, N.S., & Boudett, K.P. (2005). Using student-
assessment results to improve instruction: Lessons from a workshop.
Journal of Education for Students Placed at Risk, 10(3), 269–280.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external
evaluation. In D. Nevo (Ed.), School-based evaluation: An international
perspective (pp. 3-16). Oxford, UK: Elsevier Science.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic
approach. Thousand Oaks: Sage.
Saunders, L., & Rudd, P. (1999, September). Schools’ use of ‘value added’
data: A science in the service of an art? Paper presented at the British
Educational Research Association Conference, Brighton, University of
Sussex.
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform:
Which data, what purposes, and promoting and hindering factors.
Teaching and Teacher Education, 26(3), 482-496.
Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems
in the USA and in The Netherlands: A comparison. Educational Research
and Evaluation, 14(3), 255-282.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school
self-evaluation instrument. School Effectiveness and School
Improvement, 20(1), 69-88.
Tymms, P. (1995). Influencing educational practice through performance
indicators. School Effectiveness and School Improvement, 6(2), 123-145.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-
indicatoren als strategisch instrument voor schoolontwikkeling.
Pedagogische Studiën, 81, 338-353.
Vanhoof, J., Van Petegem, P., & De Maeyer, S. (2009). Attitude towards
school self-evaluation. Studies in Educational Evaluation, 35, 21-28.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P.
(in press). The influence of competences and support on school
performance feedback use. Educational Studies.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using
school performance feedback: Perceptions of primary school principals.
School Effectiveness and School Improvement, 21(2), 167-188.
Verhaeghe, J.P., & Van Damme, J. (2006). School performance feedback in
Vlaanderen, een schets op basis van een projectvoorstel. Informatie
vernieuwing onderwijs (IVO), 27(103), 19-27.
Visscher, A. J. (2002). A framework for studying school performance
feedback systems. In A.J. Visscher & R. Coe (Eds.), School improvement
through performance feedback (pp. 41-71). Lisse, The Netherlands:
Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through
performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems:
Conceptualisation, analysis, and reflection. School Effectiveness and
School Improvement, 14(3), 321-349.
Wayman, J.C., Midgley, S., & Stringfield, S. (2007). Leadership for data-
based decision making: Collaborative educator teams. In A.B. Danzig, K.
M. Borman, B.A. Jones, & W.F. Wright (Eds.), Learner-centered
leadership: Research, policy and practice (pp. 189-205). New Jersey, USA:
Lawrence Erlbaum Associates.
Weiss, C.H. (1998). Have we learned anything new about the use of
evaluation? American Journal of Evaluation, 19(1), 21-33.
Webber, S., & Johnston, B. (2000). Conceptions of information literacy: New
perspectives and implications. Journal of Information Science, 26(6),
381-397.
Williams, D., & Coles, L. (2007). Teachers’ approaches to finding and using
research evidence: An information literacy perspective. Educational
Research, 49(2), 185-206.
Young, V.M. (2006). Teachers’ use of data: Loose coupling, agenda setting,
and team norms. American Journal of Education, 112, 521-547.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for
effectiveness and improvement in classrooms and schools in upper
secondary education in Slovenia: Assessment of/for Learning Analytic
Tool. School Effectiveness and School Improvement, 20(1), 89-122.
CHAPTER 7
GENERAL DISCUSSION AND CONCLUSION: FEEDBACK ON FEEDBACK
1. Introduction
In this final chapter of the doctoral dissertation on school performance
feedback, we reflect on the outcomes of the different studies. By
revisiting, integrating and summarizing these results, a comprehensive
picture is developed in relation to the research objectives (RO). In
addition, a general discussion is provided, which also addresses the
limitations of the different studies and directions for future research.
After giving an overview of theoretical, practical, methodological and
policy implications, we present a general conclusion.
2. Overview of research objectives and main findings
2.1. RO1: Exploring the characteristics of SPFSs
Numerous school feedback initiatives have been set up to provide schools
with confidential information about the way they function. This is expected
to foster school improvement processes by inducing continuous
self-reflection at the school level. However, up to now, no systematic
description or inventory of SPFS characteristics was available to inform
feedback users and/or designers. Given that SPFS characteristics may
influence the degree to which the feedback is actually used for school
improvement (e.g., Schildkamp & Visscher, 2009; Verhaeghe et al., 2010), it
is important that SPFS designers and/or users critically consider the
key features of an SPFS. To make decisions based on data, users need to
purposefully choose the type of SPFS that corresponds to their information
needs. This requires the availability of a transparent overview of specific
characteristics of available SPFSs, especially including their strengths and
weaknesses. In Chapter 2, a preliminary framework was developed for
describing and comparing SPFSs, and it was applied to five SPFSs. This
framework comprises analytical aspects related to the data gathering
process, the data analysis approach, the content of the feedback report and
the numerical measures and graphical representations being used.
The results of the surveys and in-depth interviews with directors of five
SPFSs illustrate the wide variety in both the feedback reports and the
underlying feedback systems. Apparently, the SPFS designers made
deliberate decisions related to their feedback design, considering the
ethical, practical, technical and infrastructural possibilities and constraints
of the educational system in which they operate. With respect to the
quality criteria of performance indicators, this descriptive and analytical
study paid specific attention to the relevance, accuracy, cost-effectiveness,
fairness and beneficence of the feedback delivered to schools (Fitz-Gibbon,
1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp &
Teddlie, 2008; Visscher, 2002). These quality criteria imply several
prerequisites that relate to all components of an SPFS.
First, with regard to the data gathering process, several procedures are
built in to guarantee accurate (i.e., reliable and valid) data. Both
testing instructions (protocols) and structured measurement instruments
are supplied to the schools. An interesting observation in relation to some
of these instruments concerns the technological features that enable
tailored testing of pupils at any moment, on any subject and at any place.
The integration of test item banks, IRT techniques, computer adaptive
testing and data compatibility with a school’s management information
system seems to be the most promising way to attain accessible and
low-stakes testing. A clear example of the latter type of SPFS is the
Assessment Tools for Teaching and Learning, developed at the University of
Auckland in New Zealand.
Next, our study focused on several aspects of the data analyses used by
SPFSs: (1) the underlying scaling models, (2) the data analysis model,
(3) the opportunities for longitudinal measurements, (4) the inclusion of
pupil mobility and (5) the levels of aggregation. A wide variety in
scaling procedures and statistical analyses could be observed in the
selected SPFSs. A key point of discussion is finding a balance between
statistically correct, and thus complicated, analyses and accurate
results on the one hand and understandable analyses and user-friendly
results on the other hand. For example, the analyses used in PIPS
(Performance Indicators in Primary Schools; Centre for Evaluation and
Monitoring) are fairly straightforward. Though this underpins the
user-friendliness of PIPS, it might also lead to less accurate data, as
schools are sometimes wrongly classified due to the lack of a
multilevel analysis perspective (Goldstein & Spiegelhalter, 1996; Karsten,
Visscher, Dijkstra, & Veenstra, 2010). Furthermore, it is important to
realize that measurement can introduce several types of error, of which
users should be informed (Fitz-Gibbon & Tymms, 2002; Mortimore & Sammons,
1994; Rowe, 2004; Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996;
Yang et al., 1999; Karsten et al., 2010). Finally, the analysis of the
five SPFSs prompted a discussion about different conceptions of value
added, in particular when it comes to a fair comparison of a school’s
performance with a reference group. The discussion about value added
illustrates that both the conceptualization and operationalization of this
concept are highly problematic, due to actual constraints (e.g., pupil
mobility), ethical constraints (e.g., adjustment for pupil characteristics),
technical constraints (e.g., model complexity), and practical constraints
(e.g., immeasurability of variables).
Next, the feedback content of the SPFSs has been analyzed in order to
evaluate the data relevance. The SPFSs in this study focus mainly on a
limited number of cognitive outcomes (e.g., in relation to language,
mathematics and/or science), which are part of the core curriculum in most
countries. Developers of SPFSs might consider how to include other subject
areas in the SPFSs, as well as more attitudinal, behavioral and contextual
information, because school staff who are expected to make data-driven
improvement decisions need a broader range of data (Schildkamp & Kuiper,
2010). The analysis of the feedback content also covered the numerical
measures and graphical representations being used. We observed a wide
range of numerical measures, comprising adjusted, expected, predicted and
raw scores. Examples are band scores, cut-off scores, grade scores, learning
gain scores, mean scores, percentages, percentiles, rescaled scores,
standardized scores, and value-added scores. A correct understanding of
these measures and the accompanying graphical representations presupposes
a sufficient level of assessment literacy among SPFS users. However,
research revealed that even simple numerical concepts and representations
are often interpreted incorrectly (Earl & Fullan, 2003; Zupanc et al.,
2009). This raises the question whether feedback suppliers also ought to
provide specific support to ensure that the feedback delivered leads to
the desired school improvement outcomes and does not result in harmful
effects (Fitz-Gibbon & Tymms, 2002; Rowe, 2004).
2.2. RO 2: Developing a framework for SPF use, including influencing factors
and effects
To develop a framework for SPF use (Chapter 3), we could build on a basic
model developed by Visscher (2002; Visscher & Coe, 2003). His framework
discerns four sets of factors influencing the use of performance
feedback: the design process of the underlying SPFS, the features of the
SPFS, the implementation process and the school organizational features.
This framework served as a basis for the studies conducted in this
dissertation, although some adaptations were made. Visscher and Coe
embed the process of feedback use in the broader school environment,
which is defined as a set of context-related factors in our framework.
Furthermore, we distinguish support-related factors as a separate set
instead of positioning them within the implementation process and the
characteristics of the feedback system. As a result, the following sets of
influential factors are outlined: factors related to the educational
context, to school and users, to SPFSs, and to support.
The second major adjustment to the Visscher framework was a
refinement of the conceptualization of SPF use. In the framework of
Visscher, only types of SPF usage are discerned. In our approach, we
additionally discern phases in SPFS use (Verhaeghe et al., 2010): (1) the
reception of the feedback in a school, (2) the reading and discussing of the
feedback information in order to come to (3) an interpretation of the
school’s results, followed by (4) a diagnosis or the search for explanations
and (5) the planning of improvement actions, which are (6) implemented
and whose outcomes are (7) evaluated. Finally, in Chapter 3, two
additional types of feedback use were added to the typology described by
Visscher (i.e., instrumental, conceptual, symbolic, and strategic use): a
motivating/stimulating type and a pupil-directed type of feedback use.
The study described in Chapter 3 verifies the components of this
updated framework by involving a sample of primary school principals,
actively engaged in the School Feedback Project in Flanders. Semi-
structured in-depth interviews and a predefined coding scheme were used
as qualitative instruments. This resulted in a validation of all framework
components and some additions to the framework. The updated and
validated framework informed the other studies set up in the context of
this doctoral thesis, in particular the studies presented in Chapters 5
and 6. These chapters discuss a quantitative study (path modeling) and a
qualitative study (case-ordered predictor-outcome meta-matrix), building
on an experimental design.
The following key findings result from the studies described in Chapters 3
and 6. First, the context-related factors that influence school performance
feedback use refer to the educational climate in which SPFSs are developed
and implemented. In Flanders, for example, there is no strong pressure on
data use, due to the lack of a national assessment policy, the lack of a
central assessment system, the non-coercive role of the educational
inspectorate, and the autonomy granted to schools. As a result, no strong
data culture is observed in schools. Second, characteristics of the feedback
and related SPFSs also influence feedback use. As described in Chapter 2,
the perceptions of feedback users about the relevance, accuracy, cost-
effectiveness, fairness and beneficence of the feedback delivered to
schools will mainly determine what efforts will be made to put feedback
use into practice. With respect to relevance, we found that some
respondents of the School Feedback Project lack information about school
subjects other than mathematics and language. Furthermore, they mention
that aggregated feedback information is especially interesting for actors
involved in mesolevel school activities, in contrast to teachers who prefer
pupil level information that can be linked to microlevel interventions.
Concerning the feedback interpretability, our findings illustrate that
principals experience difficulties in interpreting feedback information.
Remarkably, feedback that ought to have a signaling function was
perceived as valid only when it matched the prior conceptions and
experiences of its users. Third, school- and user-related factors that play
an important role in the use of the feedback could be linked to users’ data
literacy and expectations about feedback use. The priorities in task
schemes within a school and the perception of the school’s performance
level also appear to influence feedback use. Principals state that
no clear expectations were defined prior to using the school feedback, that
their data literacy skills are limited and that feedback data use is not a
priority. Furthermore, when feedback results were perceived as
unsatisfying, principals either were willing to reduce the gap between the
observed and intended outcomes, which is in line with feedback
intervention theory (Black & Wiliam, 1998; Hattie & Timperley, 2007), or
withheld feedback information so as not to discourage their school staff.
Fourth, some support-related factors were identified. Support needs are
observed during the different phases of SPFS use, from the interpretation
phase to the implementation of improvement actions. Principals suggested
two scenarios to involve external support services. These suggestions
inspired the design of the INSET and ONSET interventions discussed in
Chapter 6.
The interview results obtained in the studies in Chapters 3 and 6 show
that, in general, school feedback is not intensively used and has a limited
impact on the actual way schools function. Most schools hardly reached
the phase of planning future actions on the basis of the school feedback.
Consequently, instrumental use was limited. However, conceptual use was
reported more often, which was also found in Chapter 5. This suggests that
conceptions about SPF use are starting to enter school-related discussions
and to affect teacher thinking. In the framework, attention is paid to the
expected impact of school feedback use. Thus far, only basic indicators of
intermediate school improvement effects could be observed, such as an
increased interest of school staff in feedback results, a decreased
reluctance to start feedback team
discussions, a clearer picture of the learning gains of pupils, more
confidence in the school’s functioning and an increase in the reflection on a
school’s performance. In addition to the expected outcomes of school
feedback use, unintended outcomes could also be identified. For example,
feedback led to an increase in teaching to the test or to feelings of
disappointment among school staff when confronted with unsatisfying pupil
performance.
2.3. RO 3 & 4: Exploring data literacy competences & effects of alternative
data representation modes on feedback interpretation abilities
The analysis of SPFS characteristics in Chapter 2 revealed a typical use of
complex numerical measures and graphical representations in the feedback
reports. Furthermore, findings from the study described in Chapter 3
illustrated that the interpretation phase is one of the main stumbling
blocks in the process of feedback use. This is in line with the discussions
in the literature about the limited data literacy competences of data users.
However, no empirical assessment of data literacy competences related to
SPF use has yet been carried out and reported in the literature.
Furthermore, to our knowledge, no research findings focusing on the
interaction of data literacy competences with the characteristics of SPFSs
have been published. This explains the relevance of the study reported in
Chapter 4, which addresses research objectives 3 and 4. The research
findings reported in Chapters 5 (n = 116) and 6 (n = 18) also contribute
to RO 3, since they report on the data literacy competences of
feedback users.
An experimental design with a post-test was set up, focusing on two
alternative ways to explain value added, in combination with three
alternative approaches to represent learning gain and value added. The
participants were first-year students in educational sciences enrolled
at Ghent University (n = 312). Tests were calibrated (using IRT-based
techniques) to assess both the ability levels of the students and the item
difficulty levels. Students were asked to assume the role of a school
principal who received a school performance feedback report based on the
results from a longitudinal study in which his/her school participated
(similar to the feedback reports produced by the School Feedback
Project). The students received an introduction to the central concepts and
were given a set of related graphical representations, presented via a
PowerPoint presentation. Subsequently, they were requested to complete a
knowledge and skill test related to the interpretation of school feedback
(test reliability ranging from .72 to .90). Both conceptual
(i.e., understanding central concepts) and procedural knowledge (i.e.,
deriving information from graphical representations) were tested.
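The idea of calibrating a test so that respondent ability and item
difficulty are placed on one scale can be illustrated with a minimal
sketch. The text does not state which IRT model was used, so the
one-parameter (Rasch) model below is an assumption, and all names and
values are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the Rasch model (assumed here).

    Ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical respondents and one hypothetical item, for illustration only.
abilities = {"respondent_a": -1.0, "respondent_b": 0.0, "respondent_c": 1.5}
item_difficulty = 0.0

for name, theta in abilities.items():
    p = rasch_probability(theta, item_difficulty)
    print(f"{name}: P(correct) = {p:.2f}")
```

In this sketch a respondent whose ability equals the item difficulty has
an even chance of answering correctly, which is what makes ability levels
and item difficulty levels directly comparable.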
The descriptive results in Chapter 4 indicate that users experience major
difficulties in solving procedural value-added items (only 35% of
respondents were able to do so). We can explain this by referring to
cognitive load theory (Chandler & Sweller, 1991; Sweller, van Merriënboer,
& Paas, 1998), as interpreting value-added scores places high cognitive
demands on users. Working memory is not able to cope with too much
information at the same time. When examining the nature of the errors
participants made when calculating value added, patterns could be
observed in the incorrect answers. This enabled us to reconstruct the
thinking process of participants and to identify basic misconceptions. A
typical misconception when calculating value added was, for example,
confusing the heights of curves with their slopes, also known as
slope-height confusion (Beichner, 1994; Clement, 1989; Kramarski, 2004;
Leinhardt, Zaslavsky, & Stein, 1990). Furthermore, respondents mostly gave
correct answers to the conceptual questions related to information that
was literally explained in the school feedback presentation (87% correct
answers). In contrast, low test scores were observed when the questions
required deep-level conceptual thinking (24% correct answers).
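The slope-height confusion can be made concrete with a small numerical
sketch. It assumes a strongly simplified setting with two measurement
occasions, takes the reference curve as the expected growth, and treats
value added as observed growth minus expected growth. All scores are
hypothetical and do not come from the feedback reports.

```python
def growth(scores):
    """Learning gain over the period: last score minus first score (the slope)."""
    return scores[-1] - scores[0]


# Hypothetical mean scores at two measurement occasions.
school_scores = (52.0, 58.0)     # observed school curve
reference_scores = (45.0, 55.0)  # reference curve, taken here as expected growth

observed_growth = growth(school_scores)          # 6.0
expected_growth = growth(reference_scores)       # 10.0
value_added = observed_growth - expected_growth  # -4.0

# What a reader who confuses height with slope reads off instead:
# the vertical distance between the curves at the final occasion.
height_difference = school_scores[-1] - reference_scores[-1]  # 3.0

print(f"value added (slope-based): {value_added}")
print(f"height difference at the end: {height_difference}")
```

In this example the school curve lies above the reference curve at both
occasions, yet its value added is negative because it grows less than
expected. A reader who compares heights would reach the opposite
conclusion.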
In Chapter 5, data literacy competences of school principals
participating in the School Feedback Project were examined by means of
an IRT-calibrated data literacy test and a self-report survey
(indicators of self-efficacy with respect to data interpretation and the
subsequent diagnosis phase). The test had a reliability of .83 and
consisted of items measuring the conceptual and procedural understanding
of the feedback reports. The data literacy test results reveal that only 42%
of the respondents answered half of the questions correctly, though some
school principals succeeded in interpreting all the information from the
report. Analysis of the difficulty of the test items shows that
most principals experience difficulties with procedural items. The
conceptual questions were apparently less difficult. Although test scores
were rather disappointing, most of the respondents reported positive
self-efficacy regarding their ability to interpret and use the feedback
report (M = 3.81, SD = 0.74). The unsatisfying data literacy skills are
reconfirmed by the findings from the in-depth interviews in
Chapter 6 with school principals participating in the School Feedback
Project. Even if elaborate explanations are provided within the feedback
reports, users encounter and report interpretation difficulties.
Furthermore, communicating the feedback findings to other staff members
or looking for explanations appears to be difficult for school principals.
Chapter 7
172
Regarding attitudes towards SPF use - another aspect of data literacy - the
studies reported in Chapters 5 and 6 show a positive attitude towards SPF
use. The scale results reported in Chapter 5 (range 1-6, M = 3.97, SD = 1.08,
α = .91) imply that feedback use is considered as a relevant activity that
fosters self-evaluation. However, school principals report a less positive
attitude among their teachers (Chapter 6), who are confronted with
considerable demands related to the data collection and may feel
threatened by the feedback results. Teachers therefore seem to prefer
pupil-level information over aggregated school feedback data.
With respect to RO 4, central in the research reported in Chapter 4, we
can conclude that our findings confirm the research hypothesis that users
experience difficulties in interpreting complex conceptual and graphical
information, due to the interplay between the inherent complexity of SPF
and their lack of prior knowledge. We compared the effect of two alternative
ways of explaining value added on the final understanding of the concept. This study
proved to be helpful to detect which alternative explanation facilitated a
better conceptual and/or procedural understanding. Explaining the concept
in terms of “the difference between observed and expected growth”
appears to be better than explaining it in terms of “the difference between
the school’s adjusted growth curve and the reference growth curve”. In
terms of the alternative graphical representations used in the experiment,
it is rather surprising that the tables did not add to the users’ understanding
of the feedback report. However, this does not imply that the use of tables
in combination with growth curves is not advisable. Previous research
indicates that different information is derived from tables and graphs
(Meyer, Shinar, & Leiser, 1997); both sources of information have merits,
depending on the task being performed (Schnotz & Bannert, 2003). An
appropriate use of tables and graphs can therefore avoid extraneous
cognitive load and foster a better understanding.
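
The first explanation, value added as "the difference between observed and expected growth", can be made concrete with a short sketch. This is only an illustration in Python, not the statistical model behind the School Feedback Project reports: the function name, the scores, and the assumption that expected growth is already known (for instance from a reference growth curve) are all hypothetical. The two example schools also illustrate the slope-height confusion: the school with the higher curve is not the school with the higher value added.

```python
def value_added(start_score, end_score, expected_growth):
    """Value added = observed growth minus expected growth (illustrative only)."""
    observed_growth = end_score - start_score
    return observed_growth - expected_growth


# Hypothetical mean scores at two measurement moments; expected growth of
# 9 points is an assumed reference value, not a figure from the studies.
school_a = value_added(start_score=60.0, end_score=68.0, expected_growth=9.0)
school_b = value_added(start_score=45.0, end_score=57.0, expected_growth=9.0)

# School A has the higher curve (height), School B the steeper one (slope).
print(school_a)  # -1.0: grew 1 point less than expected
print(school_b)  # 3.0: grew 3 points more than expected
```

Reading value added from the height of the curves would rank School A first; reading it from growth relative to expectation ranks School B first.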
2.4. RO 5: Exploring effects of support on SPF use
The research findings in the studies reported in Chapters 2, 3 and 4
demonstrate that one of the main stumbling blocks in SPF use is the
interpretation phase, primarily due to a lack of data literacy competences.
This finding raises the question of which support initiatives are appropriate
for SPF use. Therefore, a field experiment with posttest (n = 195) was set
up, building on the insights developed during the previous studies. This
resulted in an experimental study, reported in Chapters 5 (IRT testing,
survey research, path modeling) and 6 (in-depth interviews, case ordered
predictor-outcome meta-matrix). In both studies, participants were
principals of schools involved in the School Feedback Project. The support
initiatives that were studied encompassed an INSET (inservice education
and training) and an ONSET (onservice education and training) initiative.
The INSET and ONSET approach can be positioned on the continuum
reported by Gardner (1995). Both support initiatives built on
suggestions of school principals (see Chapter 3), as a solution to the data
interpretation difficulties they encountered. This helped to distinguish
three research conditions to which respondents were randomly assigned:
an INSET (n = 23), an ONSET (n = 7) and a control group (n = 150). In Chapter
5, the results of the INSET and control group have been presented. The
INSET intervention included a half-a-day workshop about SPF interpretation
and use, using a fictitious feedback report as instructional material, and
organized in a university building. In contrast, the ONSET intervention was
organized in the school of the principal, where his/her own school feedback
information was discussed. In view of the evaluation of the differential
support effects, we built on Kirkpatrick’s (1998) four levels in the evaluation
of training initiatives. First, the Reaction level refers to the extent to which
participants are satisfied with the support initiative. Next, the Learning
level examines the increase in knowledge and skills and the change in
attitudes. This was studied by examining whether the support did
contribute to an increase in data literacy competences. Third, the Behavior
level checks the transfer of what has been learned to the local organization.
In our studies, this focus on the Behavioral level encompasses the influence
of the support intervention on the phases in SPF use and types of SPF use,
as defined in our theoretical framework in Chapter 3. The fourth level – the
Results level - refers to the effects of the support initiative on achieving the
organization’s aims and on the organization itself. This was measured by
asking for the perceived (school improvement) effects of SPF use.
The study about the impact of the INSET approach - as compared to the
control group - was reported in Chapter 5. A path model (χ²(13) = 11.3,
p = 0.58; RMSEA = 0.01; AGFI = 0.92; GFI = 0.97) was tested to check
whether principals in the INSET research condition attained significantly
higher data literacy competences (attitudes, knowledge and skills, and self-
efficacy), reported a higher extent of feedback use and reported a higher
number of perceived effects. Building on our theoretical framework, we
expected the INSET initiative to affect the SPF-related competences in a
direct way. This hypothesis was only partly confirmed since the support
provision did have a statistically significant effect on the mastery of
knowledge and skills and on self-efficacy, but not on the attitudes related to
SPF use. The impact on self-efficacy remained limited. This can be explained
by the limited scope of the support initiative, the rise in awareness about
the complexity of school feedback, or by the quality of the support
intervention.
The path model test results also reveal no significant direct impact of
support on phases in SPF use, types of SPF use and no significant impact on
the perceived effects of school performance feedback use. Only indirect
effects of support on these variables are found. These indirect effects are in
line with Kirkpatrick’s (1998) model, implying that a higher level can only be
achieved if lower levels have been attained. Specific for our study, this
means that the phases in SPF use (Level 3), types of SPF use (Level 3) and
the resulting school improvement effects (Level 4) will only be influenced
by a particular support intervention, when this support had a prior effect on
the data literacy competences (Level 2) of its users.
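
The logic of such indirect effects can be sketched numerically. In path analysis, an indirect effect along a chain of paths is conventionally computed as the product of the path coefficients on that chain. The sketch below assumes a simple chain (support, then competences, then use, then perceived effects) and uses hypothetical coefficients; neither the chain nor the values are the estimates reported in Chapter 5.

```python
def indirect_effect(*path_coefficients):
    """Product of the path coefficients along one chain of paths."""
    product = 1.0
    for coefficient in path_coefficients:
        product *= coefficient
    return product


# Hypothetical standardized path coefficients (NOT the Chapter 5 estimates)
support_to_competences = 0.5   # support -> data literacy competences (Level 2)
competences_to_use = 0.5       # competences -> SPF use (Level 3)
use_to_effects = 0.25          # SPF use -> perceived effects (Level 4)

effect_on_use = indirect_effect(support_to_competences, competences_to_use)
effect_on_effects = indirect_effect(
    support_to_competences, competences_to_use, use_to_effects
)
print(effect_on_use)      # 0.25
print(effect_on_effects)  # 0.0625

# If support has no effect on competences, nothing reaches the higher levels.
print(indirect_effect(0.0, competences_to_use, use_to_effects))  # 0.0
```

The last line mirrors the interpretation given above: without a prior effect on the Level 2 competences, the indirect effect on Levels 3 and 4 vanishes.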
The qualitative study, focusing on both the INSET and ONSET training
provision (Chapter 6), checked the impact on the following dependent
variables: satisfaction with the training (Level 1), data literacy competences
(Level 2), SPF use (Level 3), and perceived effects of feedback use (Level 4).
ONSET participants report a higher satisfaction level, and attain a higher
data literacy competence level. Differences in feedback use could be
observed in relation to the phases of reading and discussing,
interpreting and diagnosing. This differential impact can be linked to the
content of the ONSET approach. These contents were less prominent in the
INSET condition and were lacking in the control condition. No differences
were found in effects of data use.
3. General discussion: “Mirror, mirror on the wall”
Data-driven decision making is a buzzword that recently entered the
educational jargon (cf. the fancy abbreviation “D3M”). The related usage of
concepts, such as learning gain, output measurement, value added, etc. is
overwhelming to the (often) statistically less literate school staff. However,
teachers and school principals are supposed to master these concepts. This
expectation is implicit in the way educational authorities and related
educational quality assurance systems (e.g., the inspectorate) position
policy papers that underline autonomy, accountability, and continuous
school improvement. Central in the discourse about school improvement is
the creation of data-rich environments that inform schools about their
functioning. It is in this context that SPFSs become important and help to
present a mirror for each school. “Mirror, mirror on the wall, are we doing
well at all?” is the central question that has to be answered by school staff.
The motive and need to look into the mirror is not personal vanity, but
either external pressure (cf. accountability) or an internal motivation (cf.
school improvement) or a combination of both driving forces. Instead of
getting “wrinkled by age”, schools are expected to look better by being able
to close the gap between the observed and the desired outcomes (Black &
Wiliam, 1998; Hattie & Timperley, 2007; Kluger & DeNisi, 1996). SPFSs,
therefore, should pinpoint the strengths and weaknesses in school
performance.
In order to be effective, school feedback should be helpful to answer
three questions (Hattie & Timperley, 2007; Hattie, 2009): Where am I going
(Feed up)? How am I going (Feed back)? Where to go next (Feed forward)?
The first question refers to the learning intentions and goals, the learning
targets and expectations underlying the curricula. The second question asks
to what extent the school attains its targets, while the answer to the third
question offers directions for future action. The literature about the impact
of current SPFSs and an analysis of the nature of the SPFSs, indicate that
current SPFSs are mainly geared to answer the second question (Feed
back). Additionally, simply receiving feedback will not, as such, guarantee
that the feedback will be used. Several participants in our studies
mentioned that they would like to receive additional information; e.g.,
concrete improvement indications and concrete directions for
improvement actions. The latter implies that limiting the process of school
feedback to “holding a mirror”, will not easily lead to a sufficient or
adequate level of self-reflection and related improvement actions.
However, it can be questioned whether SPF suppliers should fulfill the
additional need for school feedback support. As they are external agents,
they have a less clear view on all input, process, and contextual variables
that influence performance within a particular school. It seems sounder
to cooperate with actors that are more closely related to the schools, such
as educational advisors. Furthermore, a debate should start about the
function of SPFSs to determine whether school feedback should be
conceptualized in a broader way, and should therefore go beyond a signal
function.
In the context of school feedback, the question “How am I going?” might
pose specific problems. Feedback users expecting that a full picture will be
presented “in the mirror” about their school, might be disillusioned. Users
have to understand that an SPFS reports on certain aspects of the school’s
functioning that have been measured at a particular moment, by involving
particular (groups of) pupils/students, and building on specific
measurement instruments and techniques. Feedback results should
therefore be linked to other data sources, and to personal experiences. In
case this results in conflicting findings, educators have to search for
explanations rather than denying specific feedback results. This was
exemplified in our own studies. In certain cases, the validity of the SPF was
questioned or even denied by principals, when the feedback did not match
the current policy or plans of the school. School feedback will not work if
users only “see what their eyes want to see”. School feedback use assumes
an open mind of its users. A second issue related to the how-am-I-going-
question, is that feedback users might only attain a blurred view in their
mirror, due to the complex nature of the feedback and the limited data literacy
competences of the user. The provision of additional support in data
interpretation would be helpful to offer “glasses” to develop a better
understanding of the feedback. It can be suggested that the provision of
SPF support is an ethical requirement (cf. feedback should at least do no
harm; Fitz-Gibbon & Tymms, 2002) that is to be delivered by the feedback
suppliers. A third issue can be raised that centers on the possibility that the
mirror presents a distorted view of the school reality. The question has to
be asked whether SPFSs offer neutral or objective information. Every
approach to develop feedback, builds on assumptions about what is
relevant and when data are accurate. As discussed in Chapter 2, these
assumptions seem to differ considerably from system to system. Therefore,
a clear insight should be available about the underlying rationales to select
certain feedback characteristics. At least, users should get informed about
the strengths and limitations of the SPFS and the feedback received.
A final discussion concerns the extent to which schools, principals and
teachers fully exploit the level of autonomy granted within the Flemish
educational system. This introduces the impact of personal characteristics
of feedback users (Kluger & DeNisi, 1996). Flemish teachers and principals
are relatively free in designing their pedagogical project, choosing learning
methods, designing curricula and monitoring their quality. In the Flemish
context, we can question whether schools adopt a clear and powerful level
of “autonomy” and translate this into a school quality assurance policy. In
our studies, we hardly observed related indicators. Some schools, for
example, only added the feedback results to the output section of the
self-evaluation report they prepared in view of a visit of the inspectorate. In
specific cases, the feedback was not read, nor screened. In this process, a
key role is played by the school principal. In most cases, the feedback report
entered the school via the desk of the school principal. He or she
determined whether the information was neglected or was distributed to
the school community members and was the starting point for a quality
related school team discussion. In the latter case, school principals
demonstrated a distributed leadership role, and school quality care became
the responsibility of all actors involved in the school system. This mirrors a
broad view on professional development, professional identity, and an
inquiry habit of mind (Earl & Katz, 2006). It is to be stressed that the latter is
one of the core competences of Flemish teachers (Flemish Government,
2007).
4. Limitations of the studies and directions for future research
In the following paragraphs, we discuss a list of main critiques and/or
shortcomings that can be raised in relation to our studies. At the same time,
this list helps to define directions for future research.
4.1. Study samples
The selection of research participants can be critiqued in a number of ways.
As the studies in this dissertation were part of a broader R&D project that
aimed at designing, developing and implementing an SPFS to be used in the
Flemish context, the recruitment of research participants was set up in a
particular way. In three studies, the samples consisted of primary school
principals, drawn from the larger pool of primary schools participating in
the SiBO project/School Feedback Project (Chapters 3, 5, & 6). This sample
is relatively small (n = 195) when compared to the 2321 schools organizing
primary education in Flanders (Vlaamse overheid, Beleidsdomein Onderwijs
en Vorming, 2010). This resulted in rather small scale studies. Also, the
involvement of the principals in a pupil monitoring project might have
introduced a sampling bias since these principals expressed a clear interest
in examining pupil performances. Furthermore, this small group was
regularly asked to fill out research instruments (surveys and tests). As a
result, the response rate declined, though it remained satisfactory for our
studies. Furthermore, the research samples were only put in a user context
linked to one particular SPFS in the Flemish educational context. This
introduces the need to expand our research by involving a larger and more
varied sample of principals, drawn from varying educational
contexts and in view of working with other SPFSs. This could help to
validate the current research findings. For example, within the UK, about
4,500 primary schools (and their principals) participate in PIPS-related
research. This number of participants creates opportunities to carry out
more advanced types of statistical analyses (e.g., multi-level analysis). Or,
better quality tests could be developed since a sound IRT calibration
approach requires a minimum of 500 respondents for each test item.
The nature and quality of the research samples is also an issue in the
studies reported in Chapters 2 and 4. In Chapter 2, only five SPFSs have
been selected. This is not a representative selection. These five systems
were selected because they reflect the wide variety in SPFSs on the one
hand, but the selection was also driven by a pragmatic issue on the other
hand: to what extent was a spokesperson available to be involved in the
qualitative study. A more comprehensive inventory of SPFSs, used
worldwide, will offer perspectives to further develop the analytical
framework focusing on characteristics of SPFSs. On this basis, an additional
research line could start that involves school feedback designers. In the
study described in Chapter 4, the discussion about the quality of the sample
takes a different direction. The decision to involve students affects the
external validity of the research findings. Although this experiment did lead
to interesting findings, the results of this study need to be validated with
a sample of principals or teachers. We cannot assume that the data literacy
competences of university freshmen are comparable to those of inservice
teachers or principals. Due to practical constraints, it is very difficult to set
up experimental studies involving school staff (e.g., administering data
literacy tests). Alternative research designs should be considered, such as
quasi-experimental designs with non-randomized groups of participants.
The studies reported in Chapters 3, 5 and 6 solely build on the
experiences of primary school principals. The involvement of other school
team members (teachers, care coordinators) can be considered. Since we
can expect that the availability of school data will increase over time, it
might be realistic that specific school staff members develop the related
data literacy competences. This type of task specialization seems to
increase in the Flemish educational system. Another approach could build
on an international project, involving teachers from countries where data
use is already better integrated in the school culture (e.g., UK, The
Netherlands, New Zealand, etc.).
As stated earlier, a selection bias may have played a role since the
research participants were volunteers (Rossi, Lipsey, & Freeman, 2004). This
is critical in view of the internal validity of the study. Future research should
examine whether this subgroup is different from the population of school
principals by checking relevant school population characteristics.
Nevertheless, efforts have been undertaken to control for this type of bias
in the studies reported in Chapters 3, 4, 5 and 6.
4.2. Research design and data analysis
A major advantage of applied research is bridging the gap between
scientific research and practice (Broekkamp, Vanderlinde, van Hout-
Wolters, & van Braak, 2009). However, applied research set up in a typical
school context, introduces several limitations. In the case of this
dissertation research, this affected the number of research participants
involved in the studies. They were all linked to the same School Feedback
Project. Furthermore, it can affect the external validity of the research
findings. Finally, to prevent the risk of putting too much pressure on the
principals in the project, the number of research instruments, the
duration of interventions, the administration of pretests and intermediate
tests, etc. had to be limited. In future research, it is preferable to set up
more continuous support provisions, to develop a baseline as to the
dependent measures (pretests), and to set up follow-up tests. Furthermore,
a longitudinal perspective could be implemented to study the growth in
data literacy competences and the changes in school feedback use during
different consecutive feedback cycles. Lastly, the delayed effects of SPF
usage on student achievement could be studied. Such effects can only be
expected after several SPF cycles and a persistent effort in taking up
effective SPF use.
A next limitation builds on the measurement instruments used in the
different studies. Most built on self-reporting (e.g., surveys and
interviews described in Chapters 3, 5 and 6) of the principals’ perceptions
about SPF. These perceptions can only be considered as proxies for their
attitudes towards SPF use, their actual feedback use at the school and the
concrete school improvement effects caused by SPF use. This limitation
introduces the need for research that links the ‘perceived’ to the ‘expected’
and the ‘actual’ use of SPF. To measure school improvement effects,
measurement techniques, such as school observations, video analyses of
staff meetings, analyses of inspection visit reports, class tests and school
documents by researchers are more optimal choices. Data resulting from
the use of these instruments help to develop a broader view of a school’s
functioning (e.g., Schildkamp & Kuiper, 2010). In the experimental studies,
in addition to the skills and knowledge tests developed in view of the
studies reported in Chapters 4 and 5, other tests could be used. For
instance, it could be interesting to present the feedback reports to school
staff and to invite them to make a concrete interpretation of the numerical
measures and graphical representation of the feedback results. These
concrete actions could be videotaped and consequently analyzed. This
could help to get more adequate information about the way principals or
teachers interpret the data representations. In the literature, this
measurement approach has not yet been applied to research on data-driven
decision making, though some preliminary results of observation studies
are reported in Santelices and Taut (2009), Van Petegem and Vanhoof
(2004), and Verhaeghe, Verhaeghe, Valcke, and Vanhoof (2008).
Furthermore, future research about SPF interpretation should also focus on
individual differences and preferences in data-interpretation, since little is
known about the impact of these differences on feedback interpretation.
We can also criticize the number of variables incorporated in the path-
model in Chapter 5 and the meta-matrix in Chapter 6. Our choice was
guided by the feasibility of the study and our prior research interest in
specific variables linked to data interpretation competences. This implies
that our research model presents a reduction of reality. It was therefore not
a surprise that not all variance in the endogenous variables could be
explained by the model used in Chapter 5 (34% unexplained variance in
total; only 11% of the variance in knowledge and skills could be related to
the support intervention). Also, remarks can be made about our scale
development and the lack of cross-validation of these instruments (Hoyle,
1995). However, as most studies were exploratory in nature, our findings
must be considered as preliminary results to be studied in depth in further
research. The studies outlined in Chapters 2 and 3 were of a descriptive
analytic nature. Though they do not result in spectacular findings, the
findings are valuable since they helped to develop the conceptual
framework for the following studies. However, an overall framework on SPF
use, comprising all relevant influencing factors is still lacking, as well as
empirical validation of all existing frameworks (Visscher, 2002; Visscher &
Coe, 2003). The literature about data usage and SPF use is growing; future
meta-evaluative research about influencing factors is advisable. However,
in this case a full validation of conceptual frameworks will remain difficult
since “not everything that can be counted counts, and not everything that
counts can be counted” (Cameron, 1963, p. 13).
The studies in Chapters 5 and 6 built on a controlled field experiment.
This is very new in the SPFS literature. But it is clear that difficulties have
been encountered. First, we ran into ethical objections since not all
respondents were provided with the advantages of the onservice training
(ONSET). Next, the experimental conditions are bound to criteria for
controllability; this is not the case in reality (Rossi, Lipsey, & Freeman,
2004). For example, the support intervention in the INSET-condition was
organized in such a way that questions of principals concerning their
personal school report could not be answered, in order to avoid
interaction with the ONSET-condition. Only in the ONSET-condition there
was room to discuss the school’s own feedback results. In normal
circumstances, we expect that principals would get input from the support
providers about particular school related questions. Finally, we still have to
question the extent to which we could control for the impact of
confounding, interacting variables in the field experiments. For example,
participants in the control condition were not prevented from searching for
support in data use. We therefore promote the design and implementation
of more controlled field experiments and quasi experimental studies to
examine the factors affecting SPF use, especially in contexts where
feedback use is an integrated part of a school’s self-evaluation process.
4.3. Results
Issues can be raised concerning the validity, the limited explained variance,
the exploratory nature and the exemplary nature of our research findings.
We have already discussed these above. However, we want to stress that the
aim of the studies reported in this dissertation was not yet to come to
generalizable findings. Rather, we wanted to explore and illustrate school
feedback characteristics (Chapter 2), feedback use (Chapter 3), difficulties in
feedback interpretation (Chapter 4), and effects of feedback support
(Chapters 5 and 6). Furthermore, we have to stress that the conceptual
frameworks presented in Chapters 2 and 3 are not to be considered as
comprehensive. Not all potentially relevant influencing factors have been
incorporated in Chapter 3, nor all school feedback characteristics in Chapter
2.
Other limitations can be linked to the grounding of the studies in the
research literature. It has to be stressed that a broad domain of the
literature had to be explored in order to develop conceptual frameworks.
This body of the literature encompasses literature about school
effectiveness and school improvement, literature about data-driven
decision making, about SPFSs, about data representations, about cognitive
load, about inservice teacher training, about the evaluation of training
initiatives, feedback theory, etc. Though a clear attempt was made to build
on the current state of the literature, we are aware that there might be
shortcomings. However, the peer review experienced in the context of
conference and article submissions was a helpful step to guarantee a basic
quality of the work presented. In future studies, the literature base should
be expanded.
In all studies, and in particular in Chapter 4, we continuously highlighted
the interpretation problems in relation to school feedback information. This
might suggest that SPF is simply too complex in view of presenting relevant
information. This does not hold for all aspects of SPF. Much depends on the
way information is gathered, reported and distributed. The Internet Testing
Unit (INTU) from the Centre for Evaluation and Monitoring, for example,
developed an “event mapper”. This is a self-assessment tool to monitor a
school’s environment by asking students to answer questions by clicking
on an online map of the school. This can become very informative to detect
risk areas in a school; e.g., to detect and prevent bullying. This example
shows that SPF can build on inspiring and innovative ways to gather data,
process the information and to distribute feedback reports. Creativity in
developing these innovative directions is central to future research.
The results of the studies in Chapters 4, 5 and 6 mainly focused on
problems that feedback users experienced during the interpretation
phase. A similar analysis of obstacles during the phases of diagnosis,
planning, implementation and evaluation should be performed. This will
again result in a better understanding of the support needs of SPF users.
This future analysis of support needs is a prerequisite for designing
adequate support initiatives. This will require cooperation with the relevant
actors (i.e., school staff, feedback suppliers, inspection members,
educational advisors).
Furthermore, we want to stress that the use of SPF is something that
needs time to grow within the Flemish educational context. Some
disappointing results in our studies indicate that feedback use remains
mainly conceptual and that only preliminary school improvement effects
can be observed. The research findings about conceptual feedback use are
nevertheless promising as they might precede more intensive types of
feedback use (Schildkamp & Teddlie, 2008; Vanhoof, Verhaeghe,
Verhaeghe, Valcke, & Van Petegem, in press). SPF is therefore to be
considered as a large scale educational innovation that takes time to get
embedded in all facets of the educational arena and the thinking processes
and strategies of the actors involved.
5. Implications of the results
Drawing on the findings from the five studies outlined above, some
theoretical, methodological, practical and policy implications are suggested.
Some overlap with the directions for future research and are therefore
described rather concisely in the next paragraphs.
5.1. Theoretical implications
A first - conceptual - implication is that a more refined description of SPF
use (see Chapter 3) has been developed and added to the research
literature. In addition to types of feedback use, phases
in feedback use are now also considered. Furthermore, the existing types of
feedback use (instrumental, conceptual, symbolic and strategic) were
elaborated with a pupil-directed and motivating/stimulating feedback
usage type. Finally, more attention is now paid to the intermediate impact
of SPF instead of the narrow focus on the improvement of student
performance as a single school improvement effect.
Chapter 2 resulted in a more detailed framework for the analysis and
comparison of characteristics of SPFSs. After expanding and adjusting a
preliminary SPF framework, we could develop a set of standards for SPFS
developers and for SPF usage. In the future, this can result in the
development of efficient instruments for data-driven decision making.
Furthermore, it might inspire educational researchers to set up quasi-
experimental designs to study the way principals develop after receiving
different types of school feedback.
Finally, the experimental approach, reported in Chapter 4, presents an
innovative theoretical direction since it links the interpretation of SPF to
research about graphical data representations. In the context of data-driven
school improvement, this theoretical research field remains largely
unexplored. We expect that this study will trigger further assumptions and
empirical research about the way users approach numerical outcomes and
graphical representations as used in SPF reports.
5.2. Methodological implications
From a methodological point of view, some characteristics of our studies
inspire future research directions. The qualitative studies about SPF use
(Chapters 3 & 6) illustrate a particular controlled selection of participants
(e.g., theoretical sampling) and a particular systematic approach to the
analysis of the results (e.g., conceptually ordered predictor-outcome meta-
matrix). The way these qualitative studies have been set up can inspire
future qualitative research designs and the way they can tackle issues
related to developing a systematic approach and a clear analysis direction.
Furthermore, our test calibration approach,
building on IRT, has proven to be adequate. The approach has several
advantages as compared to the application of classical test theory: (1) more
exact and reliable measures are obtained; (2) more information about the
quality of the individual test items and the ability levels of the respondents
can be gathered. This information makes it easier to identify
interpretation difficulties. These results helped us to rethink how to
present data to school staff and how to develop support provisions.
Furthermore, IRT allows several tests to be linked along a common
ability scale, creating opportunities to measure growth in ability over time.
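To illustrate why IRT yields item and person measures on a common scale, the
following minimal sketch assumes the one-parameter (Rasch) model. This is an
assumption made for illustration only: the text above does not specify which
IRT model was used, and the function names, item difficulties and response
patterns are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch (1PL) model.

    Ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate for one respondent.

    responses: list of 0/1 item scores.
    difficulties: calibrated item difficulties (assumed known here).
    The estimate is found with Newton-Raphson steps.
    """
    if len(responses) != len(difficulties):
        raise ValueError("one response per item is required")
    if sum(responses) in (0, len(responses)):
        # All-wrong or all-correct patterns have no finite ML estimate.
        raise ValueError("ability is not estimable for a zero or perfect score")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta


# Hypothetical calibrated difficulties for a three-item test.
item_difficulties = [-1.0, 0.0, 1.0]
print(estimate_ability([1, 1, 0], item_difficulties))
```

Because abilities and item difficulties share one scale, two tests that
contain some common calibrated items can in principle be placed on that same
scale, which is what makes growth measurement over time possible.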
Another methodological implication is the promotion of practical
research. The studies in Chapters 5 and 6 illustrate that it is possible to
evaluate the impact of workshops on different levels, looking beyond the
reaction level (Mathison, 1992; Rossi, Lipsey, & Freeman, 2004).
5.3. Practical implications
Since this dissertation focused mostly on applied research, our list
of practical implications is the longest. In particular, we can offer a
number of ideas for the design of SPFSs and the related implementation
process. Many of these recommendations are derived from the discussion
of SPFS characteristics that was elaborated in Chapter 2.
First, school feedback system designers should try to minimize the
effort that test administration requires of school staff and pupils.
Developers should try to build on data from existing management
information systems and available test item banks. A further efficiency
measure is the adoption of computer adaptive testing.
Second, with respect to the content of the feedback reports, more
attention should be paid to non-cognitive performance indicators and to
school subjects other than the predominant domains of mathematics,
language and sciences. Attention should also be paid to the development
of attitudes towards school and school subjects and to socio-emotional
variables (such as wellbeing).
Third, data analysis approaches to produce school feedback should be
upgraded in order to adopt multilevel modeling and the statistical
adjustment for student background characteristics. However, raw (or
observed) scores should always be reported, because they refer to the
actual achievement level of a school. Users should be informed about the
shortcomings and strengths of the analysis methods used. Furthermore,
SPF designers should always try to find a balance between statistically
correct and user-friendly feedback.
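The difference between raw and adjusted scores can be illustrated with a
small sketch. It is a deliberate simplification: instead of a multilevel
model, it fits a single ordinary least-squares regression of outcome on prior
achievement across all pupils and reports each school's mean residual as its
adjusted score. The data, the function name and the choice of prior
achievement as the only background characteristic are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def school_scores(pupils):
    """Return {school: (raw_mean, adjusted_score)}.

    pupils: list of (school, prior_score, outcome_score) tuples.
    The adjusted score is the school's mean residual from one pooled
    regression of outcome on prior score (not a multilevel model).
    """
    priors = [prior for _, prior, _ in pupils]
    outcomes = [outcome for _, _, outcome in pupils]
    mean_prior, mean_outcome = mean(priors), mean(outcomes)
    sxx = sum((p - mean_prior) ** 2 for p in priors)
    if sxx == 0:
        raise ValueError("prior scores must vary to fit the regression")
    sxy = sum((p - mean_prior) * (o - mean_outcome)
              for p, o in zip(priors, outcomes))
    slope = sxy / sxx
    intercept = mean_outcome - slope * mean_prior

    per_school = defaultdict(list)
    for school, prior, outcome in pupils:
        expected = intercept + slope * prior
        per_school[school].append((outcome, outcome - expected))

    return {
        school: (mean(o for o, _ in rows), mean(r for _, r in rows))
        for school, rows in per_school.items()
    }


# Hypothetical data: school A has a stronger intake than school B.
example_pupils = [
    ("A", 60, 62), ("A", 70, 72),
    ("B", 40, 45), ("B", 50, 55),
]
for school, (raw, adjusted) in sorted(school_scores(example_pupils).items()):
    print(school, round(raw, 1), round(adjusted, 1))
```

In this made-up example school A has the higher raw mean (67 against 50),
whereas school B comes out slightly ahead after adjustment (about 0.3 against
-0.3). Reporting both figures side by side shows users the actual achievement
level as well as the result that takes intake into account.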
Fourth, the presentation format of the school feedback should be well
considered. We advise pursuing greater consistency in the way data are
presented numerically and graphically. Furthermore,
feedback designers should consider graphical representations that support
the processing of the represented information (Kluger & DeNisi, 1996;
Schnotz & Bannert, 2003). Feedback reports should be designed according
to the cognitive tasks that are necessary to understand the information
(e.g., line graphs to illustrate growth). Furthermore, the interpretability of
the feedback information should be evaluated in pilot studies. This is
important to guarantee that the feedback information and presentation
format fit the prior knowledge of the SPF users.
If the data literacy competences of school staff are insufficient to result
in a correct interpretation of the SPF, the provision of proper support is to
be taken up by the SPFS designers. However, since the support needs might
exceed the interpretation phase and users also encounter difficulties during
further steps of data use, other actors have to take up a support role. A
long-term cooperation with educational advisors and the educational
inspectorate could be helpful to create tailored ONSET trajectories. This will
require that, during an initial phase, these stakeholders are thoroughly
introduced to the characteristics and possibilities of the SPFSs.
The promotion of data-literacy competences could also become a part of
teacher education programs. If teachers are expected to adopt a role in the
quality assurance cycle of their school, they should be introduced to the
prevalent numerical measures and graphical representations that are
relevant for SPF interpretation. This is expected to prevent the pitfalls in
data use.
At a practical level, recommendations for future SPF users can also be
derived from our studies. First, users should not use data from an SPFS
without being informed about its characteristics and possibilities.
Furthermore, users should expect and require that repeated measurements
are pursued to attain reliable results about the student performance being
studied. It is recommended that data from at least 3 consecutive school
years are used to develop school improvement actions (van de Grift, 2009).
In addition, data triangulation should be promoted, integrating the SPF
results with other data sources, in order to end up with grounded decisions.
Because we promote the “alert” use of SPF rather than a remedial use,
SPF should help to develop an understanding of a school’s functioning
and go beyond offering clear-cut solutions and remediation
approaches. Finally, school principals should be encouraged to involve their
school team in discussing SPF. In such a way, they foster the development
of a data-driven school improvement approach and a distributed leadership
position in developing school policies (Huffman & Kalnin, 2003; Lachat &
Smith, 2005; Wayman, Midgley, & Stringfield, 2007).
5.4. Policy implications
Finally, policy implications also follow from our research findings. First,
applied research should be promoted to drive theory development,
the design of SPFSs and the implementation of a data-driven school
improvement approach (Broekkamp et al., 2009). The study in Chapter 2
illustrates that several SPFSs emerged from projects initially sponsored by
educational governments.
More resources should be made available to schools to obtain support in
the use of SPF, to adopt the school-wide use of commercially available
SPFSs and to free up time for data use.
Furthermore, educational policy makers should be aware that the
creation of information-rich environments is not a guarantee for effective
feedback use. This requires the establishment of support initiatives. In
addition, the educational inspectorate needs to be informed about the
potential of SPFSs and should stimulate schools to effectively use the data
in their decision making, instead of merely adding it to the school quality
report. Furthermore, to stimulate school improvement and regular self-
evaluation at the school level, more initiatives to participate in low-stakes
testing should be promoted.
6. Final conclusion
To ensure that SPFSs will be used as intended (i.e., for school improvement
purposes), several conditions related to the users, the nature of SPF, the
available user support and the educational context have to be fulfilled.
Much can be gained when SPFSs provide schools with accurate, relevant
and user friendly data. Decisions made by SPFS developers about the design
of the SPFS affect school processes and learner results in ways that are not
yet fully understood. More research is needed to expand and adjust the
framework developed thus far. Furthermore, attempts to develop the data-
literacy competences of school staff are critical in view of the current trends
in macro-level school policies and the way schools have to develop their
autonomy. The first research findings in relation to SPF use in
Flemish schools are nevertheless promising. Although strong effects of SPF
use are lacking thus far, there are some indications that data use is
developing into an accepted and standard feature of an internal school
quality policy. To bring current data use to a higher level, however,
future research should
center on the evaluation of the impact of School Performance Feedback
and on support provisions in view of the development of data literacy and
feedback use.
References
Beichner, R.J. (1994). Testing student interpretation of kinematics graphs.
American Journal of Physics, 62(8), 750-762.
Black, P. & Wiliam, D. (1998). Assessment and classroom learning.
Assessment in Education: Principles, Policy & Practice, 5(1), 7-75.
Broekkamp, H., Vanderlinde, R., van Hout-Wolters, B., & van Braak, J.
(2009). De relatie tussen onderwijsonderzoek en onderwijspraktijk
verkend in Nederland en Vlaanderen [The relation between educational
research and educational practice explored in The Netherlands and in
Flanders]. Pedagogische Studiën, 86(4), 313-320.
Cameron, W.B. (1963). Informal Sociology: A casual introduction to
sociological thinking. New York: Random House.
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of
instruction. Cognition and Instruction, 8(4), 293-332.
Clement, J. (1989). The concept of variation and misconceptions in
Cartesian graphing. Focus on Learning Problems in Mathematics, 11, 77-
87.
Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge
Journal of Education, 33(3), 383-394.
Earl, L.M., & Katz, S. (2006). Leading schools in a data-rich world:
Harnessing data for school improvement. Thousand Oaks, CA: Sage.
Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in
indicator systems: Doing things right and doing wrong things. Education
Policy Analysis Archives, 10(6), 1-28. Retrieved from
http://epaa.asu.edu/ojs/article/viewFile/285/411
Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and
effectiveness. London: Cassell.
Flemish Government. (2007, February 6). 15 december 2006. - Decreet
betreffende de lerarenopleidingen in Vlaanderen [15 December 2006. -
Decree on teacher education in Flanders]. Belgian Official Gazette, pp.
5888-5897.
Gardner, R. (1995). Onservice Teacher Education. In L. W. Anderson (Ed.),
International Encyclopedia of Teaching and Teacher Education (pp. 628-
632). London: Pergamon Press.
Goldstein, H. & Myers, K. (1996). Freedom of information: Towards a code
of ethics for performance indicators. Research Intelligence, 57, 12-16.
Goldstein, H. & Spiegelhalter, D.J. (1996). League tables and their
limitations: Statistical issues in comparisons of institutional
performance. Journal of the Royal Statistical Society: Series A: Statistics
in Society, 159(3), 385-443.
Hattie, J. & Timperley, H. (2007). The power of feedback. Review of
Educational Research, 77(1), 81-112.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses
relating to achievement. New York: Routledge.
Heck, R. (2006). Assessing school achievement progress: Comparing
alternative approaches. Educational Administration Quarterly, 42(5),
667-699.
Hoyle, R.H. (Ed.). (1995). Structural equation modeling: Concepts, issues and
applications. Thousand Oaks, CA: Sage.
Huffman, D., & Kalnin, J. (2003). Collaborative inquiry to make data-based
decisions in schools. Teaching and Teacher Education, 19, 569-580.
Karsten, S., Visscher, A.J., Dijkstra, A.B., & Veenstra, R. (2010). Towards
standards for the publication of performance indicators in the public
sector: The case of schools. Public Administration, 88(1), 90-112.
Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels.
San Francisco: Berrett-Koehler.
Kluger, A.N., & DeNisi, A. (1996). The effects of feedback interventions on
performance: A historical review, a meta-analysis, and a preliminary
feedback intervention theory. Psychological Bulletin, 119(2), 254–284.
Kramarski, B. (2004). Making sense of graphs: Does metacognitive
instruction make a difference on students’ mathematical conceptions
and alternative conceptions? Learning and Instruction, 14(6), 593-619.
Lachat, M.A., & Smith, S. (2005). Practices that support data use in urban
high schools. Journal of Education for Students Placed at Risk, 10(3), 333-
349.
Leinhardt, G., Zaslavsky, O., & Stein, M.K. (1990). Functions, graphs, and
graphing: Tasks, learning, and teaching. Review of Educational Research,
60(1), 1-64.
Meyer, J., Shinar, D., & Leiser, D. (1997). Multiple factors that determine
performance with tables and graphs. Human Factors, 39(2), 268-286.
Mortimore, P. & Sammons, P. (1994). School effectiveness and value added
measures. Assessment in Education: Principles, Policy and Practice, 1(3),
315.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic
approach. Thousand Oaks: Sage.
Rowe, K. & Lievesley, D. (2002). Constructing and using educational
performance indicators. Paper presented at the 2002 Asia-Pacific
Educational Research Association, Melbourne, Australia.
Rowe, K. (2004). Analysing and reporting performance indicator data:
'Caress' the data and user beware! Paper presented at the 2004 Public
Sector Performance and Reporting Conference, Sydney, Australia.
Santelices, V., & Taut, S. (2009, September). Comprehension and use of
value-added school performance indicators reported to teachers and
parents. Paper presented at the European Conference on Educational
Research, Vienna.
Schildkamp, K. & Teddlie, C. (2008). School performance feedback systems
in the USA and in the Netherlands: A comparison. Educational Research
and Evaluation, 14(3), 255-282.
Schildkamp, K., & Visscher, A. (2009). Factors influencing the utilisation of a
school self-evaluation instrument. Studies in Educational Evaluation,
35(4), 150-159.
Schnotz, W., & Bannert, M. (2003). Construction and interference in learning
from multiple representation. Learning and Instruction, 13(2), 141-156.
Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive
architecture and instructional design. Educational Psychology Review,
10(3), 251-296.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-
indicatoren als strategisch instrument voor schoolontwikkeling
[Feedback on school performance indicators as strategic instrument for
school improvement]. Pedagogische Studiën, 81, 338–353.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P.
(in press). The influence of competences and support on school
performance feedback use. Educational Studies.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using
School Performance Feedback: Perceptions of Primary School Principals.
School Effectiveness and School Improvement, 21(2), 167-188.
Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Vanhoof, J. (2008, March).
Understanding school performance feedback: A contribution to the
development of effective school performance feedback. Paper presented
at the annual meeting of the American Educational Research
Association, New York.
Visscher, A.J. (2002). A framework for studying school performance
feedback systems. In A.J. Visscher & R. Coe (Eds.), School improvement
through performance feedback (pp. 41-71). Lisse, The Netherlands:
Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems:
Conceptualisation, analysis, and reflection. School Effectiveness and
School Improvement, 14(3), 321-349.
Vlaamse overheid, Beleidsdomein Onderwijs en Vorming (2010). Vlaams
onderwijs in cijfers, 2009-2010 [The Flemish education in numbers,
2009-2010]. Brussels: Scheys.
Wayman, J. C., Midgley, S., & Stringfield, S. (2007). Leadership for data-
based decision making: Collaborative educator teams. In A.B. Danzig, K.
M. Borman, B.A. Jones & W.F. Wright (Eds.), Learner-centered
leadership: Research, policy and practice (pp. 189-205). New Jersey, USA:
Lawrence Erlbaum Associates.
Yang, M., Goldstein, H., Rath, T., & Hill, N. (1999). The use of assessment
data for school improvement purposes. Oxford Review of Education,
25(4), 469-483.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for
effectiveness and improvement in classrooms and schools in upper
secondary education in Slovenia: Assessment of/for Learning Analytic
Tool. School Effectiveness and School Improvement, 20(1), 89-122.
NEDERLANDSTALIGE SAMENVATTING [SUMMARY IN DUTCH]
Samenvatting
192
1. Introduction
Schools are increasingly expected to turn school improvement into a
systematic process and to position themselves as learning organizations
(Nevo, 2002; Leithwood & Aiken, 1995). To support them in this,
information-rich environments are being created. Among other things,
schools are provided with feedback on their functioning and their
performance by means of school performance feedback systems (SPFSs) set
up specifically for this purpose. SPFSs are external systems intended to
deliver performance-related information to schools in a confidential
manner. This is done in the expectation that schools will use this
feedback for self-evaluation and internal school improvement (Visscher &
Coe, 2002, p. xi). An important premise is that school feedback adds value
to the information sources already available in schools and to the
intuitions and experiences of school team members (Earl & Fullan, 2003).
Using information sources as a comprehensive policy instrument, however,
turns out not to be self-evident. As a rule, both the use and the school
improvement effects remain limited (Coe, 2002; Saunders & Rudd, 1999;
Tymms, 1995; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof,
2004; Zupanc, Urank, & Bren, 2009). Receiving specific school feedback
appears to be a necessary but not a sufficient step towards fostering
systematic reflection at the school level. Certain conditions have to be
met, both within the schools and with regard to the characteristics of the
feedback systems (Visscher & Coe, 2003; Verhaeghe, Vanhoof, Valcke, & Van
Petegem, 2010). One of the main obstacles to effective data use is a lack
of data literacy among users (Earl & Katz, 2006). It is therefore not
surprising that many research findings show that principals and teachers
need additional support, both in interpreting the data and in using them
further (Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009;
Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren, 2009).
2. Conceptual framework
The theoretical basis for the studies in this dissertation was drawn mainly from the research literature on school effectiveness, data use (cf. data-driven decision making), data representation and in-service training initiatives.
Two central concepts in the school effectiveness literature are school accountability on the one hand and school improvement on the other. The former applies mainly to educational contexts in which central testing and external control predominate. The latter refers to a more recent approach that centers on data use within schools for self-evaluation and internal quality assurance. Although the two motives for data use seem contradictory at first sight, both approaches often occur together in practice (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank, & Bren, 2009). SPFSs tie in very explicitly with school improvement because they are assumed to foster self-reflection. To lead to effective results, however, SPFSs have to meet a number of quality criteria. For these, reference is made to the literature on performance indicators, which prove useful when they provide relevant, accurate, cost-effective and fair information (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). In addition to the inherent quality of the indicators, variables in the school setting also appear to determine whether use is useful and successful. The literature thus emphasizes that institutions or organizations that provide feedback information should do everything they can to contribute to positive effects among users (Goldstein & Myers, 1996; Fitz-Gibbon, 1996; Fitz-Gibbon & Tymms, 2002).
To describe school feedback use, we start from the conceptual framework of Visscher (2002; Visscher & Coe, 2003). Differences in school feedback use and its effects are attributed to four clusters of factors relating to (1) the characteristics of the users, (2) the feedback and the underlying SPFS, (3) the support provided and (4) the educational context (Verhaeghe et al., 2010; Visscher & Coe, 2003). These factors influence school feedback use, which we describe in terms of phases in school feedback use and types of feedback use. Research shows that making use of school feedback calls for working through a cyclical process in a deliberate way. In this cycle, (a) receiving and (b) reading and discussing the school feedback are distinguished, in order (c) to arrive at a correct interpretation. After the school has made a strengths and weaknesses analysis of its results, a phase follows in which the school feedback is put to work. This phase comprises (d) diagnosing, by searching for explanations for the results, and (e) planning, (f) implementing and (g) evaluating actions. Owing to a lack of data literacy and time, schools appear not to go through all of these steps, or to do so only with difficulty (Earl & Fullan, 2003; Verhaeghe et al., 2010). Besides this cyclical approach, the use of feedback information is also described in terms of types of use. Helpful here is the classification of Rossi, Lipsey and Freeman (2004), who distinguish between types of use of evaluation data; this classification can be applied in the context of school feedback use (Schildkamp, Visscher, & Luyten, 2009; Verhaeghe et al., 2010; Weiss, 1998). Schools can take action (instrumental use), start thinking (conceptual use), seek confirmation of existing views (symbolic use), use the report in an accountability context (strategic use) or use the report to stimulate or motivate team members (motivating use).
As already indicated in the introduction, the ultimate aim of school feedback use is to contribute to school improvement, among other things in terms of improved learning outcomes for learners (Visscher & Coe, 2002; 2003). However, school feedback use does not always result in significantly improved pupil performance (Fitz-Gibbon & Tymms, 2002; Schildkamp, Visscher, & Luyten, 2009; Visscher, 2002). When examining school improvement effects, it therefore makes sense to look at mediating effects as well, e.g., effects on the professional development of team members (such as increased assessment literacy; Zupanc, Urank, & Bren, 2009), improved educational processes (such as intensified pupil guidance; Schildkamp & Teddlie, 2008) and/or improved school functioning (such as stronger cohesion within the school; Visscher & Coe, 2003). School feedback use can also result in unintended and undesirable effects, such as demotivation among teachers, excessive demands on teachers (Fitz-Gibbon & Tymms, 2002) or too strong a focus on tested content, also called “teaching to the test” (Schildkamp & Teddlie, 2008; Visscher, 2002).
3. The School Feedback Project: A mirror for every school
This doctoral research was set up in the context of the School Feedback Project, named “Each school its own mirror” (Verhaeghe & Van Damme, 2006). Within this project, a prototype of a school feedback system was developed. In the context of this development research, 195 Flemish schools received annual feedback on a confidential basis, in which their school results were compared with a representative reference group from the SiBO study (Maes, Van Petegem, & Van Damme, 2005). The SiBO study collects data on a cohort of pupils who are followed from the end of kindergarten up to and including the transition to secondary education for mathematics and language (spelling, technical reading and reading comprehension), supplemented with information on the pupils' intake characteristics. In the feedback reports, the meaning of a school's own performance in comparison with the reference group was explained by means of a number of central concepts: learning gain, value added and adjusted scores. These concepts were explained in such a way that feedback users were not expected to have much prior statistical knowledge. The feedback data were also supported with graphical representations (pie charts, growth curves and cross tables). The text was standardized for every school. In addition, school team members were expected to interpret the school-specific data themselves.
4. Research objectives and design
Five studies were set up and carried out as part of this doctoral
research. The following five central research objectives guided the work:
• Research objective 1: Exploring the characteristics of SPFSs
Chapter 2 introduces the reader to the characteristics of SPFSs. Data
were collected by means of questionnaires and in-depth interviews with
feedback developers. A descriptive analysis of five SPFSs led to a first
comparative framework intended to initiate a discussion about the
characteristics of SPFSs.
• Research objective 2: Developing a framework for mapping school
feedback use, the influencing factors and the expected effects
In Chapter 3, a framework is developed and tested to describe school
feedback use, the influencing factors and the eventual effects on the
functioning of the school. To this end, principals from the School
Feedback Project were interviewed.
• Research objective 3: Exploring the data literacy competences of SPFS
users
• Research objective 4: Exploring the effects of alternative data
representations and the data literacy competences of SPFS users
A number of central concepts from school feedback reports (e.g., value
added and learning gain) are tested for their interpretability in an
experiment. Respondents were randomly assigned to conditions that differ
in the way the central concepts are explained and represented. The
ability level of the respondents was determined using tests calibrated by
means of IRT techniques. These results are reported in Chapter 4.
• Research objective 5: Exploring the effects of alternative forms of
support on school feedback use
Chapters 5 and 6 address this last research objective, examining the
influence of types of support on the school feedback use of principals
from the School Feedback Project. Effects of support were examined by
means of questionnaires and a calibrated test (Chapter 5) and in-depth
interviews (Chapter 6).
5. Main findings
Research objective 1: Exploring the characteristics of SPFSs
Until now, a clear framework for describing and comparing school feedback
systems was lacking. This dissertation makes a first attempt at one, with
the aim of informing both feedback developers and feedback users about
the basic characteristics of SPFSs. The advantages and disadvantages of
SPFSs are also discussed. To address this research objective, the
characteristics of five SPFSs were mapped with regard to their data
collection method and data analysis techniques. Next, the content of the
feedback reports was critically analyzed, including the numerical
measures and the graphical representations used. Separate attention was
paid to the quality criteria for the feedback provided: relevance of the
feedback, cost-effectiveness, accuracy, fairness and the emphasis on
positive effects (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe &
Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). The
analysis shows above all that the SPFSs examined differ very strongly in
their characteristics.
To get an idea of the accuracy of the data, we need a good view of the
data collection method used. Both structured test instructions and
measurement instruments were used. The use of technology-supported
applications is of particular interest here. Especially the combination of
computer adaptive testing with tests composed from item banks, and data
exchange with student administration systems, appear to yield major
benefits for the user.
Next, the data analysis methods used, the scale constructions, the
possibilities for longitudinal measurement and the reported aggregation
levels were examined. It was also investigated to what extent pupil
mobility was taken into account. Central to this analysis was the extent
to which the requirements of both accuracy and user-friendliness were met.
The feedback content of the different SPFSs was then examined more
closely. It turned out that the emphasis lies mainly on cognitive
content. It was also examined which numerical measures and graphical
representations were used in the reports. A very wide range of data
representations was found. The choice of particular forms of
representation has immediate consequences for the interpretation skills
that are presupposed in school feedback users.
Research objective 2: Developing a framework for school feedback use, the
influencing factors and the expected effects
Starting from the conceptual framework developed by Visscher (2002;
Visscher & Coe, 2003), a study was set up to map principals' perceptions
of school feedback use. Attention was paid to the influencing factors,
the phases in school feedback use, the types of feedback use, and the
eventual effects of feedback use on the functioning of the school.
Information was collected by means of in-depth interviews with
participants in the School Feedback Project. An analysis of these results
helped to extend Visscher's conceptual model. Four clusters of factors
that influence school feedback use were distinguished: factors related to
the educational context, to the users/school, to the possibilities for
support and to the characteristics of the SPFS. In addition to Visscher's
framework, school feedback use was also described in terms of the steps
to be taken in a cyclical process of data use. The principals mainly
reported problems in the interpretation phase. In most cases, the
feedback from the School Feedback Project turned out not yet to be
integrated into the functioning of the school. As for types of feedback
use, instrumental use was only rarely reported. It is therefore not
surprising that many school improvement effects of school feedback use
failed to materialize in this group of respondents.
Research objective 3: Exploring the data literacy competences of SFS users
The results of the previous studies showed that the interpretation skills
of school feedback users are limited. This is especially critical because
the representations offered clearly presuppose a degree of data literacy.
An experimental study was therefore set up (Chapter 4) in which two
approaches to explaining the concept of value added and three different
representations were compared in terms of their interpretability.
Respondents followed a standardised instruction (working through a school
feedback report in the role of a school principal who is shown the results
of his or her own school), and knowledge and skills tests were
administered. The test results were analysed using IRT techniques, which
make it possible to determine the difficulty of each test item as well as
the participants' level of skill in interpreting the feedback information.
Patterns in the errors were also examined, as these may point to
misconceptions among the respondents. The results show that the procedural
test questions in particular, which asked respondents to interpret
value-added results from graphical representations, caused difficulties
(only 35% of the respondents answered them correctly). This can be
explained by the high demands these questions place on working memory (cf.
cognitive load theory; Chandler & Sweller, 1991; Sweller, van Merriënboer,
& Paas, 1998). One particular misconception occurred very frequently: the
slope and the height of curves were confused (cf. slope-height confusion;
Beichner, 1994; Clement, 1989; Kramarski, 2004; Leinhardt, Zaslavsky, &
Stein, 1990).
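The summary does not state which IRT model was fitted. As a minimal sketch, assuming a one-parameter (Rasch) model, the relation between a respondent's ability, an item's difficulty and the chance of a correct answer could be illustrated as follows. The function names are hypothetical, and this is not the analysis used in the study.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under a Rasch (1PL) model.

    Both arguments are on the same logit scale (an assumption of the model).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def difficulty_for_success_rate(p_correct, ability=0.0):
    """Difficulty at which a respondent of the given ability succeeds
    with probability p_correct (the inverse of rasch_probability)."""
    return ability + math.log((1.0 - p_correct) / p_correct)


# Illustration only: an item that a respondent of ability 0 answers
# correctly with probability 0.35.
item_difficulty = difficulty_for_success_rate(0.35)
print(round(item_difficulty, 2))
print(round(rasch_probability(0.0, item_difficulty), 2))
```

Under these assumptions, an item that a respondent of ability 0 answers correctly with probability 0.35 has a difficulty of about 0.62 logits. A real analysis would estimate abilities and difficulties jointly from the full response matrix.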
In the fifth chapter, a comparable data literacy test was used to
determine the knowledge and skill level of school principals after they
had received their feedback report in the context of the School Feedback
Project. Only 42% of the participants managed to answer half of the items
correctly. These weak results do not match the higher rating they gave
their own level of knowledge (five-point scale; M = 3.81, SD = 0.74).
Besides knowledge and skills, data literacy competences also comprise
attitudes towards school feedback use. When school principals were asked
about these (Chapters 5 and 6), they turned out to hold a positive
attitude and to assume that this kind of data use prompts them to
self-evaluate. At the same time, they indicate that their teachers view
school feedback considerably less positively. Possible explanations are
that teachers are mainly confronted with the burden of data collection,
feel threatened by this evaluation, and prefer pupil data from their own
class to aggregated data at school level.
Research objective 4: Exploring the effects of alternative data
representations and the data literacy competences of SFS users
An interplay of limited prior knowledge and inherently complex feedback
information appears to lead to weak test scores among the respondents in
the experimental group. Explaining the concept of value added as “the
difference between the observed and the expected mean” led to better test
scores than explaining it as “the difference between the adjusted mean and
the mean for the reference group”. The results further show that adding
tables to the growth curves does not contribute to better feedback
interpretation. This striking result raises questions about the role of
the representations used. Which representation is suitable depends on the
information that has to be read from the figure (Schnotz & Bannert, 2003).
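The two explanations of value added can be compared in a small worked example. The sketch below assumes a single prior-achievement covariate and a known linear regression slope. These assumptions, the function name and all numbers are hypothetical and are not taken from the School Feedback Project. Under a linear adjustment of this kind, both explanations describe the same quantity.

```python
from statistics import mean


def value_added_two_ways(school_prior, school_outcome,
                         ref_prior_mean, ref_outcome_mean, slope):
    """Return value added computed under both explanations.

    Assumes outcome scores are adjusted linearly for prior achievement
    with the given slope (a simplifying assumption).
    """
    observed = mean(school_outcome)
    prior_gap = mean(school_prior) - ref_prior_mean

    # Explanation 1: observed mean minus expected mean.
    expected = ref_outcome_mean + slope * prior_gap
    observed_minus_expected = observed - expected

    # Explanation 2: adjusted mean minus reference-group mean.
    adjusted = observed - slope * prior_gap
    adjusted_minus_reference = adjusted - ref_outcome_mean

    return observed_minus_expected, adjusted_minus_reference


# Hypothetical school: lower intake scores than the reference group.
va_1, va_2 = value_added_two_ways(
    school_prior=[44, 46, 48],
    school_outcome=[57, 59, 61],
    ref_prior_mean=50,
    ref_outcome_mean=60,
    slope=0.8,
)
print(round(va_1, 2), round(va_2, 2))
```

In this example both routes give a value added of about 2.2 points. The difference in test scores reported above would then concern how easily each wording is understood rather than what is computed.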
Research objective 5: Exploring the effects of alternative support
approaches on school feedback use
Chapters 5 and 6 report the results of a support intervention. School
principals from the School Feedback Project (n = 195) took part in an
experiment in which they were randomly assigned to one of three
conditions: one in which support was offered at the school (ONSET, n = 7),
one in which support took place at a location outside the school (INSET,
n = 23) and one in which no support was offered (control group, n = 150).
The INSET group was invited to a morning study session in a university
building. They received an explanation of how to interpret the feedback
reports and of the possibilities for use, based on a fictitious school
report. The same explanation was given in the ONSET group, but there the
school principal was visited at the school and the school's own results
were included in the training.
Kirkpatrick's (1998) model for the evaluation of training initiatives
provided the structure for evaluating the results of this study. At the
reaction level, the extent to which participants were satisfied with the
support was examined. Next, at the learning level, it was examined whether
data literacy competences (knowledge, skills, attitudes) had increased.
Then, at the behaviour level, it was investigated whether what had been
learned was also applied within the school. Finally, at the results level,
it was examined whether school feedback use had produced school
improvement effects in the different support conditions.
In Chapter 5, only the INSET group and the control group were compared.
The relationships between the different variables were tested in a path
model (χ² (df) = 11.3 (13), p = 0.58; RMSEA = 0.01; AGFI = 0.92; GFI =
0.97). Testing this model showed that the support led directly only to
significantly higher scores on the knowledge and skills test and to a
higher rating of participants' own data literacy. Indirect effects of the
support intervention were found on the phases of use and the types of use.
The qualitative study, reported in Chapter 6, used a meta-matrix in which
the research participants were ordered by condition (ONSET, INSET and
control) and by degree of feedback use. ONSET participants reported a
higher degree of satisfaction, a stronger command of data literacy
competences and a more intensive passage through the phases of reading and
discussing, interpreting and diagnosing.
6. Conclusion
Education authorities expect schools to use data for their internal quality assurance. The results of the studies reported here show that data use in the context of school feedback remains rather limited. Critical feedback-related concepts such as “learning gain”, “value added” and “output measures” appear to overwhelm school principals, who tend to be statistically illiterate, and consequently hardly inform them about the effectiveness of their own school. Offering school feedback does not automatically lead to self-reflection. To ensure that school feedback is used for school improvement initiatives, a number of conditions must be met with regard to the users, the SFSs, the support and the educational context. The research results indicate that much can still be improved in the accuracy, relevance and user-friendliness of the school feedback delivered. This means that more evaluation research on school feedback initiatives is needed. What the results make very clear is that much attention must be paid to developing the data literacy competences of feedback users. Only then can the opportunities for self-reflection and autonomous quality assurance be expected to be fully exploited. The demand for such support goes beyond help with interpreting the data. Schools also want guidance in making decisions based on their school feedback. Meeting this demand will require intensive collaboration between feedback providers, inspectors and pedagogical advisers, above all to be able to offer tailored support. In addition, schools must be encouraged to use these feedback data to compare their own insights and earlier findings and to integrate them into their daily practice. Learning to use school feedback effectively is thus a new task for teachers, who are currently mainly used to processing individual pupil data from their own class.
But users must also be informed about the strengths and weaknesses of the feedback delivered. Although the eventual effects of school feedback use remained limited in our studies, indications were found that demonstrate the added value of school feedback use. Since the present research has clear limitations, for example in terms of its scale, the research design and the dependent and mediating variables chosen, further research based on these first findings is recommended.
References
Beichner, R.J. (1994). Testing student interpretation of kinematics graphs.
American Journal of Physics, 62(8), 750-762.
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of
instruction. Cognition and Instruction, 8(4), 293-332.
Clement, J. (1989). The concept of variation and misconceptions in
Cartesian graphing. Focus on Learning Problems in Mathematics, 11, 77-
87.
Coe, R. (2002). Evidence on the role and impact of performance feedback in
schools. In A. J. Visscher & R. Coe (Eds.), School improvement through
performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger.
Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge
Journal of Education, 33(3), 383-394.
Earl, L.M., & Katz, S. (2006). Leading schools in a data-rich world:
Harnessing data for school improvement. Thousand Oaks, CA: Sage.
Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in
indicator systems: Doing things right and doing wrong things. Education
Policy Analysis Archives, 10(6), 1-28. Retrieved from
http://epaa.asu.edu/ojs/article/viewFile/285/411
Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and
effectiveness. London: Cassell.
Goldstein, H. & Myers, K. (1996). Freedom of information: Towards a code
of ethics for performance indicators. Research Intelligence, 57, 12-16.
Heck, R. (2006). Assessing school achievement progress: Comparing
alternative approaches. Educational Administration Quarterly, 42(5),
667-699.
Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School self-
evaluation and student achievement. School Effectiveness and School
Improvement, 20(1), 47-68.
Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels.
San Francisco: Berrett-Koehler.
Kramarski, B. (2004). Making sense of graphs: Does metacognitive
instruction make a difference on students’ mathematical conceptions
and alternative conceptions? Learning and Instruction, 14(6), 593-619.
Leinhardt, G., Zaslavsky, O., & Stein, M.K. (1990). Functions, graphs, and
graphing: Tasks, learning, and teaching. Review of Educational Research,
60(1), 1-64.
Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter:
Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press.
Maes, F., Van Petegem, P., & Van Damme, J. (2005). Schoolloopbanen in het
basisonderwijs (SiBO): Doelstellingen en onderzoeksopzet. Paper
presented at the Onderwijs Research Dagen, Ghent, Belgium.
Maier, U. (2010). Accountability policies and teachers' acceptance and
usage of school performance feedback - a comparative study. School
Effectiveness and School Improvement, 21(2), 145-165.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external
evaluation. In D. Nevo (Ed.), School-based evaluation: An international
perspective (pp. 3–16). Oxford, UK: Elsevier Science.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic
approach. Thousand Oaks: Sage.
Rowe, K. & Lievesley, D. (2002). Constructing and using educational
performance indicators. Paper presented at the 2002 Asia-Pacific
Educational Research Association, Melbourne, Australia.
Rowe, K. (2004). Analysing and reporting performance indicator data:
'Caress' the data and user beware! Paper presented at the 2004 Public
Sector Performance and Reporting Conference, Sydney, Australia.
Saunders, L., & Rudd, P. (1999, September). Schools’ use of `value added’
data: A science in the service of an art? Paper presented at the British
Educational Research Association Conference, Brighton, University of
Sussex.
Schildkamp, K. & Teddlie, C. (2008). School performance feedback systems
in the USA and in the Netherlands: A comparison. Educational Research
and Evaluation, 14(3), 255-282.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school
self-evaluation instrument. School Effectiveness and School
Improvement, 20(1), 69-88.
Schnotz, W., & Bannert, M. (2003). Construction and interference in learning
from multiple representation. Learning and Instruction, 13(2), 141-156.
Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive
architecture and instructional design. Educational Psychology Review,
10(3), 251-296.
Tymms, P. (1995). Influencing educational practice through performance
indicators. School Effectiveness and School Improvement, 6(2), 123-145.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-
indicatoren als strategisch instrument voor schoolontwikkeling
[Feedback on school performance indicators as strategic instrument for
school improvement]. Pedagogische Studiën, 81, 338–353.
Vanhoof, J. & Van Petegem, P. (2007). Matching internal and external
evaluation in an era of accountability and school development: Lessons
from a Flemish perspective. Studies in Educational Evaluation, 33(2),
101-119.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using
School Performance Feedback: Perceptions of Primary School Principals.
School Effectiveness and School Improvement, 21(2), 167-188.
Verhaeghe, J.P., & Van Damme, J. (2006). School performance feedback in
Vlaanderen, een schets op basis van een projectvoorstel. Informatie
vernieuwing onderwijs (IVO), 27(103), 19-27.
Visscher, A.J. (2002). A framework for studying school performance
feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement
through performance feedback (pp. 41-71). Lisse, The Netherlands:
Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems:
Conceptualisation, analysis, and reflection. School Effectiveness and
School Improvement, 14(3), 321-349.
Weiss, C.H. (1998). Have we learned anything new about the use of
evaluation? American Journal of Evaluation, 19(1), 21-33.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for
effectiveness and improvement in classrooms and schools in upper
secondary education in Slovenia: Assessment of/for Learning Analytic
Tool. School Effectiveness and School Improvement, 20(1), 89-122.
RESEARCH VALORISATION: PUBLICATIONS
1. Articles in SSCI journals (a1)
1.1. Published – in press
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using
School Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and School Improvement, 21(2), 167-188.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in press). The influence of competences and support on school performance feedback use. Educational Studies.
1.2. Submitted
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Van Petegem, P., & Valcke, M. (2010). School characteristics facilitating school performance feedback use by teachers. Manuscript submitted for publication in School Effectiveness and School Improvement.
Verhaeghe, G., Schildkamp, K., & Luyten, H. (2010). Characteristics of School Performance Feedback Systems. Manuscript submitted for publication in Educational Administration Quarterly.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten van ondersteuning bij schoolfeedbackgebruik. Manuscript submitted for publication in Pedagogische Studiën.
Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of schools: How to represent school feedback information. Manuscript submitted for publication in The Journal of Educational Research.
2. Articles in journals not included in the SSCI (a3)
Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (in press). Datageletterdheid versterken bij scholen: Lessen uit het Schoolfeedbackproject [Strengthening the data literacy in schools: Lessons from the School Feedback Project]. Kwaliteitszorg in Het Onderwijs.
Vanhoof, J., Verhaeghe, G., Van Petegem, P., Verhaeghe, J.P., & Valcke, M. (2009). Verschillen in het gebruik van schoolfeedback: Een verkenning van verklaringsgronden [Differences in school performance feedback use: An exploration of explanations]. Tijdschrift voor Onderwijsrecht & Onderwijsbeleid, 2009(4), 306-322.
Verhaeghe, G., Vanhoof, J., Van Petegem, P., Verhaeghe, J.P., & Van Damme, J. (in press). Het gebruik van outputgegevens in basisscholen: Concretiseringen en illustraties uit het Schoolfeedbackproject [The use of output results in primary schools: Concretizations and illustrations from the School Feedback Project]. Kwaliteitszorg in Het Onderwijs.
3. Chapters in books (b2)
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Van Petegem, P., & Valcke, M.
(2010). Improving data literacy in schools: Lessons from the School Feedback Project. In K. Schildkamp, M.K. Lai & L. Earl (Eds.), Data-driven
decision making around the world: Challenges and opportunities. Manuscript submitted for publication.
4. Conference contributions
Verhaeghe, G., & Verhaeghe, J.P. (2006, December). School Performance
Feedback als instrument voor kwaliteitszorg en middel tot reflectie over
schoolbeleid. Paper presented at the Vlaams Forum voor Onderwijsonderzoek, Antwerp.
Verhaeghe, G., & Verhaeghe, J.P. (2007, June). Verstaanbare schoolfeedback
een realiteit? Paper presented at the Onderwijs Research Dagen (ORD), Groningen.
Verhaeghe, G., & Verhaeghe, J.P. (2007, September). An attempt to develop
effective school performance feedback. Paper presented at the preconference of the European Conference on Educational Research, Ghent.
Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Vanhoof, J. (2008, March). Understanding school performance feedback: A contribution to the
development of effective school performance feedback. Paper presented at the annual meeting of the American Educational Research Association, New York.
Verhaeghe, G., Vanhoof, J., & Van Petegem, P. (2008, June). Diepte-
interviews naar het gebruik van schoolfeedback. Paper presented at the Onderwijs Research Dagen, Eindhoven.
Verhaeghe, G., Vanhoof, J., Verhaeghe, J.P., & Van Petegem, P. (2008, September). Feedback on school performance feedback: In-depth
interviews about the comprehensibility and usability. Paper presented at the European Conference on Educational Research, Göteborg.
Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (2009, January). The effect of support on the interpretation and use of school feedback.
Poster presented at the International Congress for School Effectiveness and School Improvement, Vancouver.
Verhaeghe, G., Vanhoof, J., Verhaeghe, J.P., & Van Petegem, P. (2009, January). Feedback on the use and interpretation of school performance
feedback: Perceptions of primary school principals. Paper presented at the International Congress for School Effectiveness and School Improvement, Vancouver.
Vanhoof, J., Verhaeghe, G., & Van Petegem, P. (2009, May). Schoolfeedbackgebruik: Proces, resultaat en impact van ondersteuning. Paper presented at the Onderwijs Research Dagen, Leuven.
Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (2010, September). Does support matter in interpreting and using school
feedback? Findings from a quasi-experimental study. Paper presented at the European Conference on Educational Research, Vienna.
Verhaeghe, G., Vanhoof, J., Van Petegem, P., & Valcke, M. (2010, January). Supporting school performance feedback use: An experimental study. Poster presented at the International Congress for School Effectiveness and School Improvement, Kuala Lumpur.
Vanhoof, J., Verhaeghe, G., & Van Petegem, P. (2010, January). Data use
and the impact of a training initiative of data use. Symposium paper presented at the International Congress for School Effectiveness and School Improvement, Kuala Lumpur.
Verhaeghe, G., Vanhoof, J., Van Petegem, P., & Valcke, M. (2010, August). Supporting School Performance Feedback Use: An Experimental Study. Paper presented at the European Conference on Educational Research, Helsinki.