(Garrard 2009)

8/6/2019 (Garrard 2009)


Cognitive archaeology: Uses, methods, and results

Peter Garrard

University of Southampton School of Medicine, Division of Clinical Neurosciences, Southampton General Hospital,

LD69 South Path and Lab Block, Southampton SO16 6YD, UK

Received 9 April 2008; received in revised form 25 July 2008; accepted 29 July 2008

Abstract

The earliest stages of cognitive decline in cases of slowly progressive dementia are difficult to pinpoint,

yet detection of the preclinical period of the illness is likely to be of significant importance to

understanding Alzheimer's disease and other slowly progressive dementias at both clinical and biological levels.

A number of authors have used retrospective analysis to describe preclinical linguistic decline in written

texts and spoken language samples. This paper reviews the methods available for classifying and

comparing such samples, and presents some exploratory analyses of historical texts derived from verbatim

records of preclinical spoken activity. Change in the nature of the language used by Harold Wilson (Prime

Minister of the United Kingdom 1964–1970 and 1974–1976) is quantified in the light of a later diagnosis

of probable Alzheimer's disease and historical uncertainties about his final months in office.

© 2008 Elsevier Ltd. All rights reserved.

Keywords: Alzheimer's disease; Mild cognitive impairment (MCI); Textual analysis; Digital stylometry

1. Introduction

Functional reserve is a property of many biological systems whose performance depends on

the cumulative effects of populations of similarly-structured subunits. Mammals are, for

example, endowed with pairs of organs (lungs, kidneys, adrenal glands, and gonads), whose

physiological effects under normal conditions are not detectably changed if one of the pair is

lost or otherwise rendered inoperative, and the other partially compromised. Unpaired organs,

such as the liver and heart, will also continue to meet circulatory and metabolic/digestive

E-mail address: [email protected]

0911-6044/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.

doi:10.1016/j.jneuroling.2008.07.006

Journal of Neurolinguistics 22 (2009) 250–265

www.elsevier.com/locate/jneuroling

demands in the face of marked depletion of their constituent cells. Given the obvious adaptive

advantages of this redundancy to creatures competing for reproductive success in a Hobbesian

environment, it would be surprising if the organ system that mediates behaviour, learning,

perception, planning, decision making, and communication was not similarly endowed.

In its mature state, the analogy with other paired organs is not applicable to the hemispheric structure of the brain (though the similarity may hold during development (de Bode & Curtiss,

2000; Vicari et al., 2000)). Moreover, the demonstrable heterogeneity of function within

different regions of the cerebral cortex means that the effects of a focal insult are seldom

completely ameliorated by compensatory activity in undamaged regions. Nonetheless, the

brain's capacity to support normal levels of cognitive activity in the face of gradual decline in

the structural and functional integrity of its constituent elements implies that a degree of

redundancy is indeed built into the system's architecture, a point that is strikingly illustrated by

the not uncommon finding of marked cerebral atrophy on CT or MRI brain scans of cognitively

normal elderly subjects (Matsubayashi et al., 1992).

The existence of a functional reserve capacity in the brain (Fig. 1) is supported, and can to

a limited extent be quantified, by postmortem studies of the nigrostriatal systems of aged brains.

Neuronal depletion within these paired mid-brain structures produces the classical, idiopathic

form of Parkinson's disease (a syndrome of progressive motor dysfunction characterized by the

emergence of tremor, rigidity and loss of dexterity), whose earliest effects are often relieved by

the pharmacological supplementation of the neurotransmitter dopamine. Postmortem exami-

nation of the striatum has revealed cases with marked degrees of dopamine depletion associated

with only mild motor symptoms at the time of death, suggesting a significant, if variable,

functional reserve capacity inherent in the system (Bernheimer et al., 1973).

The notion of a reserve capacity for more global measures of cognitive function has also been upheld by postmortem studies focusing on the common causes of late-onset dementia

syndromes. The most pervasive of these is Alzheimer's disease (AD), which gives rise to

[Figure 1. Axes: Years (x), % Integrity (y, 0–100). Curves: Neuronal function, Cognitive function. Marked points: A, B, C.]

Fig. 1. Illustrative representation of the course of a neurodegenerative condition at the functional and neuronal levels.

The diagram includes three key points in the clinical evolution of dementia: the onset of the earliest symptoms (point

A); the diagnosis of the condition (point B); and death (point C). The duration of the period to the left of point A is

unknown.


a progressive and irreversible decline in a range of cognitive abilities, typically beginning with

episodic memory (Galton et al., 2000). The clinical features of AD normally begin in the sixties

or seventies, with diagnosis depending on the recognition of the typical pattern of symptom-

atology and the exclusion of other, more unusual causes of late-life cognitive decline.

Verification of the diagnosis, however, requires the demonstration of amyloid plaques (AP) and neurofibrillary tangles (NFT) within the substance of the brain, normally at autopsy.

Although postmortem examination is carried out in only a minority of cases, careful corre-

lations between clinical and pathological findings have revealed a complex relationship between

these two descriptive levels. The seminal studies of Braak and Braak (1995) traced a characteristic

sequence of NFT formation that followed a neuroanatomical pathway through entorhinal, limbic,

and finally isocortical stages. Tomlinson, Blessed, and Roth (1970) demonstrated that cognitive

function could be preserved in the presence of established degenerative disease, suggesting that

clinical dementia, like Parkinsonism, may occur when pathological change exceeds a certain

threshold level. More recent community-based surveys of autopsy findings in an unselected

sample of elderly people in the United Kingdom found evidence of vascular and/or degenerative

changes in almost 80% (Neuropathology Group of MRC CFAS, 2001), a figure that sits in striking

contrast to estimated clinically-defined dementia prevalence rates among the oldest old of

around 25% (Fichter et al., 1995). Perhaps most striking of all was a cross-sectional neuropath-

ological study of incidental Alzheimer changes in postmortem brains across a wide age spectrum,

which suggested that a disease whose clinical manifestations typically appear in the seventh or

eighth decade of life may begin to develop in early adulthood (Ohm et al., 1995).

The ability to identify and measure the earliest phase of AD – ie after the earliest pathological

changes but before the patient meets diagnostic criteria for dementia (ie anywhere to the

left of point B in Fig. 1) – could provide important insights into the phenomenon of cognitive

reserve. Since the duration of this presymptomatic period reflects the capacity of the

reserve, any marked degree of variability would clearly be of further interest – as well as

enormous socioeconomic importance – if any environmental factors (eg diet, education,

intellectual engagement in later life) could be shown to be positively or negatively correlated

with it. Before discussing existing and future attempts to acquire this information, however, the

clinical characteristics of patients with established AD will be briefly reviewed.

Neuropsychological studies of AD have revealed a cumulative pattern of deficits mapping on to

the anatomical pattern of progression described by Braak and Braak (1995): episodic memory

deficits (attributable to mesial temporal and limbic involvement) usually occur in the earliest stages,

while effects on semantic memory, visuospatial skills, word production, and executive function (indicating disruption of neocortical regions) emerge later. In a minority of (usually younger onset)

cases, bimanual praxis is the earliest and clinically dominant feature (biparietal variant AD)

(Galton et al., 2000; Ross et al., 1996). Given the language system's complexity and dependence on

multiple and widespread cortical regions, it is perhaps not surprising that detailed studies of

language processing in AD have provided some of the most valuable contributions at the neuro-

psychological level. Analyses of individuals and groups have demonstrated disruption in produc-

tion and comprehension at both word and sentence level (Croot, Hodges, & Patterson, 1999; Croot

et al., 2000; Kempler et al., 1998), and disintegration of semantic memory (Garrard et al., 1998).

2. Measuring cognitive reserve

Neuropsychological data is not normally acquired until there are already clinical grounds for

making a diagnosis of AD (ie some way down the cognitive function slope illustrated in Fig. 1),


and is therefore unable to tell us anything about the trajectory of the line prior to point B.

Moreover, many studies have employed assessment techniques based on standardised tasks

such as word fluency and picture naming, all of which may be subject to effortful

compensation, or to ceiling or floor effects depending on premorbid educational level (O'Carroll &

Ebmeier, 1995). Finally, because theoretically motivated methods of evaluation are designed to test hypotheses about functional organisation, such tests tend to be sensitive to a relatively

narrow subset of deficits.

The importance of the earlier, prediagnostic, period of patients' cognitive histories has given

rise to the concept of mild cognitive impairment (MCI), a state in which a patient may be

aware of and report symptoms of cognitive dysfunction, but which is at too early a stage of

progression to justify a diagnosis of dementia (between points A and B in Fig. 1)

(Bruscoli & Lovestone, 2004). As we have seen, diagnosing AD is probabilistic (ie a

judgement of the future likelihood of finding plaque and tangle pathology in the brain), and the

recognition of MCI adds a further layer of uncertainty – namely the need to distinguish

between an essentially stable state of mild impairment (eg due to anxiety, depression, or the

ageing process) and one that is destined to deteriorate at some future time (ie those in the

earliest stages of AD).

It is perhaps not surprising that patients with MCI are neuropsychologically heterogeneous

(Nordlund et al., 2005), nor that the proportion of MCI cases who go on to develop dementia

within a year is highly variable, in some samples lower than 10% (Bruscoli & Lovestone,

2004). Consequently, using the duration of MCI as a surrogate for the cognitive reserve

capacity is difficult to justify, and highlights the need for a retrospective approach that allows

a reliable index of the duration of the preclinical period to be reproducibly obtained.

A number of studies have already demonstrated how this might be effectively achieved using archived language samples dating back years, or even decades, before the onset of cognitive

symptoms. Such outputs are free from the distorting effects that knowledge or suspicion of

incipient cognitive decline might have on performance, and are interpreted under three basic

assumptions: 1) that the material in question is reliably datable; 2) that there are measurable

differences between the characteristics of such samples from individuals with normal and

disordered cognition; and 3) that these differences become more pronounced with progression

of the disease.

If these conditions are met, then the onset of any relevant change in a text corpus should be

identifiable. This will, in turn, allow objective and reproducible estimates of the duration of the

presymptomatic and preclinical phases of the disease to be made. If obtained from large enough cohorts of affected individuals, such measurements could provide insights into the factors

determining variations in preclinical states, and suggest strategies for optimizing them. Progress

towards this goal has come from retrospective analyses of language samples that have been

recorded or archived for various reasons.

2.1. The Nun study

This ongoing longitudinal study traces incident dementia among members of a religious

order using interval neuropsychological assessment and postmortem examination. The study

has also examined premorbid linguistic data produced by participants as many as fifty years before the appearance of the earliest symptoms of AD (Snowdon, 2003). Between 1931 and

1943, at ages of between 18 and 32 years, subjects were required to write their autobiographies

on entry into the order. When, many years later, these texts were analysed for measures of



syntactic complexity and idea density, lower scores on both dimensions predicted poorer

performance on memory and other cognitive tests many decades later. Intriguingly, a subgroup

analysis identified reliably lower initial idea density scores in individuals in whom Alzheimer

pathology was demonstrated at postmortem. The latter finding was interpreted as suggesting the

existence of common factors underpinning both neurocognitive development and susceptibility to AD (Snowdon et al., 1996).

Updated autobiographies, written by a subset of the original entrants in the late 1950s and

again in the late 1980s, were used for a longitudinal comparison, which supported an apparently

linear decline on both measures over the course of the lifespan (Kemper et al., 2001).

Surprisingly, the rate of decline did not differ between those who were later diagnosed with

dementia and those who remained cognitively healthy into later life, though a similar study using

data from more regular language assessments of a different group of volunteers did demonstrate

a difference in the rate of decline in idea density (Kemper, Thompson, & Marquis, 2001).

Several aspects of the Nun Study data are clearly relevant to the question of accurately

estimating cognitive reserve: the first is that the retrospective linguistic data employed (ie the

written diary entries) were not only naturalistic but free from any compensatory biases that

might derive from an awareness of being tested. A second is the uniformity, over a number of

behavioural and demographic dimensions, of the participants themselves. As Kemper notes

(Kemper et al., 2001):

Participants in the Nun Study have led relatively homogeneous adult lives. Participants

have the same reproductive and marital histories, have similar social activities and

support throughout their adult lives, have similar occupations and incomes, have equal

access to preventative health and medical care, and do not smoke or drink alcohol

excessively [p. 238].

Because some or all of these lifestyle factors are likely to be important in determining the

robustness of the cognitive reserve, however, informative variations in the linguistic data may

be missed when study participants are well matched. Moreover, those subjects who were

followed up over their entire lifetime were observed on at most three occasions – perhaps

insufficient to detect subtle changes in language samples predating the clinical onset of

dementia. Although Kemper et al. assumed that the decline seen in both the demented and non-

demented groups was linear, a larger number of observations might have demonstrated

a departure from linearity – for example, a longer maintenance period followed by a precipitate

decline in successfully aging subjects, and a more gradual decline in those who developed dementia (see Fig. 2). Such longitudinal differences would be compatible with variations in the

cognitive reserve.

2.2. The Iris Murdoch study

The celebrated English novelist and philosopher Iris Murdoch (1922–1999) was diagnosed

with Alzheimer's disease in 1997, following deterioration in her cognitive abilities, particularly

marked in the domain of language. Postmortem examination of her brain later confirmed the

presence of diagnostic amyloid plaques and neurofibrillary tangles as the dominant pathological

feature. Murdoch may have been the first to notice her own decline; in an interview published in The Observer, she commented on an uncharacteristic writer's block that had plagued her while

she was writing her final novel, Jackson's Dilemma, in 1995. To look for prediagnostic

Alzheimer-like characteristics in the language of this work, Garrard et al. (2005) studied stylistic,



syntactic and lexical attributes, comparing them with works composed at earlier periods in her

four-and-a-half decade long writing career. To enhance the power of the analyses, digitised

versions of the complete texts were used. Concordance software (Watt, 2002) generated word

lists, type-to-token ratios, and collocation (pairs of word types occurring at fixed intervals from

one another) statistics. Word and character counts were used to derive sentence length distributions, as an indirect index of syntactic complexity (Rosenberg & Abbeduto, 1987).
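The lexical and sentence-length measures named above can be illustrated with a minimal Python sketch. This is not the concordance software used in the study; the function names, the tokenisation rule, and the sample sentence are all assumptions made for illustration, and sentence boundaries are found by a deliberately naive split on full stops, question marks and exclamation marks.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word types divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def cumulative_ttr(tokens, step):
    """Type-to-token ratio recomputed after every `step` tokens."""
    seen, curve = set(), []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append(len(seen) / i)
    return curve


def sentence_lengths(text):
    """Words per sentence, using a naive split on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(tokenize(s)) for s in sentences]


# Hypothetical sample text, invented for this illustration.
sample = "The sea was calm. The sea was grey and the sky was grey. Was it calm?"
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)                 # 8 types / 16 tokens
ttr_curve = cumulative_ttr(tokens, step=4)     # ratio after 4, 8, 12, 16 tokens
lengths = sentence_lengths(sample)             # words in each sentence
length_distribution = Counter(lengths)         # how often each length occurs
```

Because the raw type-to-token ratio falls as a text gets longer, comparisons between texts of different lengths are usually made at matched token counts, which is what the cumulative curve allows.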

Stylistic and syntactic analyses revealed no detectable differences between the three works.

Whether this reflected a relative preservation of syntactic ability in the language disorder of

Alzheimer's disease – at least in some cases (Croot et al., 1999) – or, perhaps more likely, the

insensitivity of the methods used to assess them (Bates et al., 1995; Kempler et al., 1998), is

unclear. The comparison did, however, detect differences in the lexical characteristics of the

three books. Data indicated an initial phase of enhancement over the first twenty years of IM's

career, as measured by both the variety (cumulative type-to-token ratios) and frequency (using

published norms (Francis, 1967)) of the vocabulary. This was followed by a later decline on

both measures over the last twenty-five years of her life. All these findings –

the absence of any structural variation, and the marked difference in word frequency without a similar effect of

word length, coupled with a more repetitive and higher frequency vocabulary – mirrored the

changes that have been consistently documented in the spontaneous spoken language of early

Alzheimer's disease sufferers (Croisile et al., 1995; Croisile et al., 1996; Garrard et al., 1998;

Kremin et al., 2001).

Indirectly, the Murdoch data also implied the existence of a detectable gradient in abnormal

linguistic characteristics, as formal neuropsychological testing two years after AD was diag-

nosed demonstrated a similar, though more severe, impoverishment of vocabulary, semantic

impairment, frequency-dependent anomia, and a surface dysgraphia (Garrard et al., 2005).

A more decisive demonstration of this putative gradient, however, would clearly require a comparison of like with like, and a more extensive survey of the impressive literary output

from the last fifteen years of Murdoch's working life (Table 1) is likely to yield information

about its nature and temporal characteristics.

[Figure 2. Axes: Years (x, 0–20), % Integrity (y, 0–100). Curves: Neuronal function, Rapid cognitive decline, 'Successful aging'.]

Fig. 2. Hypothetical course of two dementia sufferers with the same rate of neuronal degradation: the slowly progressive

case possesses a larger cognitive reserve than the more rapidly progressive, and therefore enjoys more symptom free

years.



2.3. Studies of archived spoken language

Spoken output generally requires a greater degree of spontaneity than written, offers fewer opportunities for off-line revision, and may therefore be a more sensitive indicator of change.

One of the earliest retrospective language studies was carried out by Brian Butterworth, using

televised speeches of the former U.S. President Ronald Reagan. Occurrence rates for errors in

both content and syntax, and for abnormally long word-finding pauses, were significantly

higher during Reagan's debates against Walter Mondale in 1984 than during similar events in

1980, when he was campaigning against the incumbent President Carter [unpublished data].

Reagan was famously diagnosed with Alzheimer's disease in 1994 – five years after the end of

his second term as President. The announcement of the diagnosis, and the implications of its

progressive nature, gave rise to speculation about Reagan's mental performance while he was

still in office. The journalist Lesley Stahl, for example, describes an interview with the Pres-

ident in which a vacant Reagan barely seemed to realize anyone else was in the room (Stahl,

1999). Regardless of the significance of such one-off anecdotal observations, however, the

similarities between Butterworth's language error data and the language problems characteristic

of Alzheimer's disease (Schwartz & Moscovitch, 1990) would suggest that the earliest

cognitive effects of the disease were detectable at least ten years before a diagnosis was made.

3. Automated discourse analysis

The techniques used to define differences between texts and between samples of continuous discourse that have been described so far – such as deriving measures of syntactic complexity

and idea density, and comparing lexical frequency rates using published databases – have for

the most part been normative, top-down methods. Yet this general approach has obvious

disadvantages: the most obvious is its labour intensiveness, which inevitably limits the size of

the text samples to which it can be applied; for informative studies to be conducted on large,

longitudinal written and spoken samples, this is clearly impractical. A second difficulty is that

reliance on lexical frequency norms alone as an index of linguistic change is subject to error,

because i) low frequency words tend to be under-represented in the available databases, and ii)

word usage is subject to prevailing fashion and other transient influences that may not have

been current when the norms were compiled.

Automated, data-driven methods of analysis offer a potential solution to all these difficulties,

and considerable progress has been made in the field of text classification over the past decade

(Feldman & Sanger, 2007; Forsyth, 1999). Various techniques have been validated in the fields

Table 1

Titles and years of publication of Iris Murdoch's last eight novels

Title                            Year published
The Sea, The Sea                 1978
Nuns and Soldiers                1980
The Philosopher's Pupil          1983
The Good Apprentice              1985
The Book and the Brotherhood     1988
The Message to the Planet        1990
The Green Knight                 1994
Jackson's Dilemma                1995



of authorship attribution (Love, 2002), genre analysis (Stamatatos, Fakotatis, & Kokkinakis,

2000) and topic identification (Clifton, Cooley, & Rennie, 2004), providing the basis for a range

of methods for specifying differences between texts, all of which can be rapidly implemented

using digital text samples as input. A comprehensive survey of these methods is beyond the

scope of the present article, but in view of the potential usefulness to the enterprise of presymptomatic discourse analysis in cognitive ageing, they will be briefly reviewed.

3.1. Digital stylometry

Burrows pioneered a method for quantifying differences between texts based on the means

and standard deviations of the proportional frequencies of the n commonest words across

a corpus of contemporary texts of the same genre (Burrows, 2004). The mean of the z-trans-

formed values associated with each word in the target texts yields a summary statistic (Delta),

the magnitude of which varies inversely with similarity. Burrows showed that pairs of texts

originating from male and female authors, from Northern and Southern hemispheres, and from

the 19th and 20th centuries all yielded higher Delta values in between- than within-group

comparisons (Burrows, 2003).

Burrows' Delta depends on the frequency distributions of the commonest word types, the

majority of which are grammatical (function) words. Similar measures based on mid- or low-

frequency usages (which include more lexical, or content words) are differentially sensitive to

texts of differing length (Burrows, 2006).
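As a rough illustration of the Delta computation, the following Python sketch follows the commonly cited formulation, in which Delta is the mean absolute difference between the z-scores of two texts over the n commonest words of a reference corpus. It is a sketch under stated assumptions rather than Burrows' own implementation: the function names, the use of the population standard deviation, the choice to skip words with zero variance, and the toy reference corpus are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens, vocabulary):
    """Proportional frequency of each vocabulary word in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: counts[w] / total for w in vocabulary}


def burrows_delta(corpus, text_a, text_b, n=3):
    """Mean absolute z-score difference between two texts.

    corpus: list of token lists used as the reference set.
    text_a, text_b: token lists to compare.
    n: number of commonest corpus words to use (real studies use many more).
    """
    pooled = Counter(tok for doc in corpus for tok in doc)
    vocabulary = [w for w, _ in pooled.most_common(n)]

    # Mean and standard deviation of each word's frequency across the corpus.
    profiles = [relative_frequencies(doc, vocabulary) for doc in corpus]
    mu = {w: mean([p[w] for p in profiles]) for w in vocabulary}
    sigma = {w: pstdev([p[w] for p in profiles]) for w in vocabulary}

    freq_a = relative_frequencies(text_a, vocabulary)
    freq_b = relative_frequencies(text_b, vocabulary)

    differences = []
    for w in vocabulary:
        if sigma[w] == 0:
            continue  # assumption: a word with no variance cannot be z-transformed
        z_a = (freq_a[w] - mu[w]) / sigma[w]
        z_b = (freq_b[w] - mu[w]) / sigma[w]
        differences.append(abs(z_a - z_b))
    return mean(differences) if differences else 0.0


# Toy reference corpus, invented for this illustration.
reference = [
    "the cat and the dog and the bird".split(),
    "the of the of the and".split(),
    "and of and of the and".split(),
]
delta_same = burrows_delta(reference, reference[0], reference[0])
delta_diff = burrows_delta(reference, reference[0], reference[2])
```

A text compared with itself gives a Delta of zero, and larger values indicate less similar function-word profiles, which matches the inverse relationship with similarity described above.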

3.2. N-gram analysis

An extension of the word-count method is to use the frequencies of any recurring feature of

a text from letters upwards, and compare the occurrences of each across samples. N-grams

above the letter level can be flexibly defined in terms of words, parts of speech (using auto-

mated parsing routines), and letter or word collocations, allowing rapid automated comparison

of texts over a range of different dimensions of interest. In the field of forensic linguistics the

method has proved sensitive to differences at lexical, syntactic, and stylistic levels (Chaski,

2004). The approach has also proved successful as a basis for authorship attribution and topic

identification (Peng, Schuurmans, & Wang, 2004).
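The idea of profiling a text by its n-gram frequencies can be sketched briefly in Python. The example below builds word-bigram and character-trigram profiles and compares two profiles by cosine similarity; the similarity measure, the function names, and the two sample utterances are assumptions chosen for illustration, and the cited studies may have used different comparison statistics.

```python
from collections import Counter
from math import sqrt


def ngrams(items, n):
    """All contiguous n-item windows of a sequence (words or characters), as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def profile(items, n):
    """Relative frequency of each n-gram in the sequence."""
    grams = ngrams(items, n)
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two n-gram frequency profiles."""
    shared = set(p) & set(q)
    dot = sum(p[g] * q[g] for g in shared)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical utterances, invented for this illustration.
early = "i thank my honourable friend for the question".split()
late = "i thank my friend for the the question".split()

word_bigrams = ngrams(early, 2)          # n-grams defined over words
char_trigrams = ngrams("reserve", 3)     # n-grams defined over letters
similarity = cosine_similarity(profile(early, 2), profile(late, 2))
```

The same functions apply unchanged to a sequence of part-of-speech tags, provided a tagger (not shown here) has first converted the words into tags.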

3.3. Entropy

Juola (2003) has proposed a method for estimating the inherent redundancy in a piece of

continuous discourse or text. In the framework of information theory (Shannon & Weaver,

1949) entropy is proportional to the number of binary decisions required to determine an

unknown value. Where the values in question are letters of the alphabet, successful discovery of

an unknown could be achieved heuristically by asking sequentially in which half of the

alphabet, which half of that half, and so forth, the target letter is located. This algorithm would

be needed to identify any member of a truly random sequence of letters, but the multiplicity of

constraints that apply to connected discourse greatly reduces the candidate letters that may

complete a fragment of text. A method for arriving at a comprehensive estimate of similarity between two documents based on one such constraint (ie the tendency for similar strings of

characters to recur), is to determine the average number of consecutive characters in one

document that matches all possible character n-grams within the other (Wyner, 1996). In the


8/6/2019 (Garrard 2009)

9/16

extreme case in which the two texts are identical, the value would be simply determined by the

number of characters they (it) contained. In other pairs, higher values would result from more

frequent usages of larger combinations of words. The method would therefore be suitable for

estimating differences between an index text and multiple subsequent outputs by the same

author to identify and timestamp the onset of progressive degrees of deviation.

For these methods to be accepted as appropriate to the analysis of text or discourse passages

in the field of cognitive ageing, they must first be shown to be reliably associated with the

presence of underlying cerebral pathology (Garrard, in press). If they can, then in individuals

who have left behind a datable record of spontaneous verbal activity spanning the presymp-

tomatic, preclinical and symptomatic periods of disease, it should be possible robustly to

identify the earliest vestiges of cognitive change. The usefulness of such a marker to the study

of variations in cognitive reserve has already been discussed, but the detection and dating of

AD-like changes in archived language may also prove important in other spheres. By way of an

illustrative case-study I will outline the methods, and some preliminary results, from work

currently in progress relating to a British Prime Minister, Harold Wilson, and the reasons for his

sudden and unexplained resignation from office.
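Before turning to the case study, the match-length idea from Section 3.3 can be made concrete with a short Python sketch. For every starting position in a target document it finds the longest run of consecutive characters that also occurs somewhere in a reference document, then averages those run lengths. This is a brute-force illustration under stated assumptions, not the estimator used in the cited work: the function names are hypothetical, no normalisation is applied, and the running time grows quickly with document length. The first line also evaluates the halving argument for a random letter sequence, which needs about log2(26) binary decisions per letter.

```python
from math import log2

# Binary decisions needed to identify one letter of a truly random
# 26-letter sequence (roughly 4.7).
bits_per_random_letter = log2(26)


def longest_match(fragment, reference):
    """Length of the longest prefix of `fragment` that occurs anywhere in `reference`."""
    length = 0
    while length < len(fragment) and fragment[:length + 1] in reference:
        length += 1
    return length


def mean_match_length(target, reference):
    """Average, over every start position in `target`, of the longest match in `reference`."""
    if not target:
        return 0.0
    total = sum(longest_match(target[i:], reference) for i in range(len(target)))
    return total / len(target)


identical = mean_match_length("abc", "abc")    # matches of length 3, 2 and 1
unrelated = mean_match_length("abc", "xyz")    # no character recurs
partial = mean_match_length("abab", "ab")      # matches of length 2, 1, 2 and 1
```

For identical texts the value depends only on the number of characters, as noted in Section 3.3, and it falls as the target shares fewer and shorter character strings with the reference.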

4. The Harold Wilson project

The political sphere provides a source of spoken language samples, faithfully transcribed

and saved for posterity since the late 19th century, when Thomas Curson Hansard introduced

the Official Report (usually referred to simply as Hansard, after its founder). Hansard contains

transcripts of all spoken activity in the two Houses of Parliament. Although it cannot and does

not always report every word said by a Member, departures from verbatim are seldom noticeable, and typically reflect deletions of repeated words, fillers and particles, as well as

corrections of departures from grammatical convention. To illustrate this point, a recent extract

from the Hansard version (A) and a verbatim transcript (B) taken from a live recording, with

altered segments underlined, is reproduced below.

[A]

The Prime Minister: I thank my hon. Friend for taking up the cause of veterans in her

constituency. She is absolutely right; last week the Health Secretary announced that

veterans would be accorded priority treatment in the national health service, as they

should be. He also announced that there will be a new community-based veterans' mental health care service, which will run for the next two years with independent evaluation.

There are 150 mental health professionals working throughout defence, employed by the

Ministry of Defence, and we are determined to do what we can to support not only our

veterans but all those in our armed forces who do an outstanding job and to whom we owe

a debt of gratitude and a duty of care. (Hansard, 2007).

[B]

The Prime Minister: Let let let me thank my honourable Friend for taking up the cause of

vet- veterans in her constituency and she's absolutely right that last week the Health

Secretary announced that er veterans would be accorded priority treatment in the national health service, as they should be. He has also announced that there'll be a new

community-based veterans' mental healthcare service, and that will run for the next two

years, with independent evaluation. There are hundred and fifty mental health



professionals working across defence, er through employment by the MOD, and we are

determined to do what we can to support not only our veterans, but all those in our armed

forces who do an outstanding job and to whom we owe a debt of gratitude and a duty of

care.

Naturally, some parliamentary speeches recorded in Hansard would have been read from

texts that may not even have been written by the speaker, so sampling of the archive should be

limited to sessions in which verbal exchanges are less carefully planned. Prime Minister's

Questions (PMQs), a twice weekly¹ opportunity for members to interrogate the Prime Minister

(or a deputy, if he is absent) on a range of matters, would seem to meet these requirements.

Although the questions asked at PMQs are prepared in advance, these can easily be eliminated

from the text to be analysed. The Prime Minister's responses, follow-up questions, and

subsequent exchanges are, for the most part, unscripted. Indeed, the practice of speaking from

a prepared script during PMQs attracts disapproval, if not derision².

Longitudinal analysis of the speeches of one celebrated victim of late-life cognitive decline has the potential to contribute to the resolution of a longstanding historical dispute. Harold

Wilson (HW) is one of the most fascinating characters to have appeared on the British political

stage in recent times, and the motive for his unexpected resignation during a third term as Prime

Minister in 1976 remains one of the great unsolved mysteries of British politics. HW was noted

for his intellectual gifts and academic precocity, his prodigious memory, astute political sense,

and razor-sharp wit in debate (Pimlott, 1993). His resignation has been variously attributed

to an alleged involvement with the KGB (Mitrokhin, 2000), the impact of negative propaganda

spread by rogue elements within the security services (Wright, 1987), and even a plot to replace

him forcibly with an emergency administration headed by Lord Louis Mountbatten. A more prosaic

explanation, however, is that in the months leading up to March 1976, HW was becoming aware of

a progressive mental blunting which, much later, would turn out to have been the preclinical

phase of a degenerative dementia, very probably Alzheimer's disease (Pimlott, 1993).

The existence of precisely dated language samples from HW and his contemporaries

therefore raises the historically significant possibility that the time course of this preclinical

period can be determined retrospectively. For a textual analysis on so large a scale,

top-down methods would certainly be impractical. I will therefore present some

preliminary results obtained from the Hansard archive using methods broadly similar to those

described above under the heading of 'Digital stylometry'.

Transcripts of PMQs that were held while HW was Prime Minister (i.e. first between

October 1964 and June 1970, and then between March 1974 and April 1976) were obtained

and converted to ASCII format using optical character recognition software. Markers were

added to identify the date at each change of year and month, while the identity and party

affiliation of every speaker were recorded at the beginning of any speech or contribution to

debate. The texts of the questions themselves were omitted because they had been prepared in

advance and would in some cases have been read from a script. Unattributable comments and

interjections from the floor were also removed, as were entries that recorded the reading of

a report or communiqué.
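The preparation steps just described amount to a filter over speaker-tagged records. The following Python sketch illustrates such a filter under stated assumptions: the `Contribution` record, its field names and the `kind` labels are hypothetical, and do not reflect the format of the files actually used in this study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contribution:
    """One entry in a hypothetical speaker-tagged PMQs transcript."""
    year: int
    month: int
    speaker: Optional[str]  # None for unattributable interjections
    party: Optional[str]
    kind: str               # assumed labels: "question", "answer", "report"
    text: str


def select_for_analysis(entries):
    """Keep only attributable contributions that were not read from a text."""
    kept = []
    for entry in entries:
        if entry.speaker is None:
            continue  # interjection from the floor
        if entry.kind in ("question", "report"):
            continue  # prepared question, or a report/communique read aloud
        kept.append(entry)
    return kept


def split_by_speaker(entries, target="HW"):
    """Separate the target speaker's contributions from all others."""
    target_entries = [e for e in entries if e.speaker == target]
    other_entries = [e for e in entries if e.speaker != target]
    return target_entries, other_entries
```

Keeping the year and month on every record makes it straightforward to assign each retained contribution to an epoch at a later stage.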

¹ Until 1997 PMQs were held on Tuesday and Thursday mornings. Current practice is for the event to be held once

a week, on Wednesdays.

² Judging by regular entries in the record such as "Honourable Members: 'Reading!'".



Three twelve-month epochs were selected for analysis: April 1965–March 1966; April

1969–March 1970; and April 1975–March 1976 (the month immediately preceding HW's

resignation announcement). These periods contain over 200 separate PMQs sessions, in all but

16 of which HW answered in person (on the remainder, the deputy leader or a senior Cabinet

minister responded in his absence). If HW's resignation was, as hypothesised, influenced by his

growing awareness of incipient cognitive decline (or, to use modern terminology, the emergence of

the pre-Alzheimer MCI state), then ex hypothesi we should expect consistent differences in his

output across epochs that would not be detectable in the records of other speakers.

As before, Concordance (Watt, 2002) was used to generate word lists: the corpus contained

537,932 word tokens and 12,993 unique word types (proper nouns included). Words with

a frequency of 1 (hapax legomena) accounted for 4765 items. There were 1854 words with

frequency 2, and 1033 with frequency 3, resulting in a heavily skewed distribution. The mean

word frequency was 42.2, with standard deviation 588.9. Pearson analysis of a subset of these

Table 2

The 30 most frequently used words in the entire text sample, and their overall occurrence rates by epoch (expressed as

a percentage of the total number of words in the epoch) in utterances made by HW (right-hand columns) and by all other

speakers (left-hand columns)

                              All other speakers               HW
                              Percentage of all words used in: Percentage of all words used in:
Word or lemma                 1965–1966  1969–1970  1975–1976  1965–1966  1969–1970  1975–1976
THE                           7.58       7.6        8.33       7.07       7.23       7.95
BE (all grammatical forms)    4.63       4.25       4.06       4.84       3.55       4.28
OF                            3.45       3.55       3.49       6.30       3.35       3.33
TO                            3.15       3.09       3.09       3.25       3.06       2.84
THAT                          2.75       2.52       2.64       2.57       2.32       2.30
IN                            2.12       2.36       2.2        2.10       2.48       2.19
A or AN                       1.96       1.89       1.71       1.95       1.82       1.68
I or ME                       1.89       1.73       1.86       2.65       2.36       2.83
AND                           1.82       1.86       1.81       1.89       1.96       1.86
HAVE (all grammatical forms)  1.73       1.78       1.66       2.03       2.10       2.09
HONOURABLE                    1.42       1.65       1.57       1.48       1.84       1.80
WILL or WOULD                 1.33       1.35       1.43       1.07       1.22       1.05
HE or HIM                     0.99       0.99       1.1        0.54       0.70       0.60
NOT                           1.14       0.99       0.9        1.06       0.91       0.96
IT                            1.1        0.97       0.88       1.17       1.01       0.99
FOR                           0.87       0.96       1          0.87       1.02       0.99
RIGHT                         0.92       0.96       0.94       0.79       0.90       0.85
THIS                          1.1        0.96       0.75       1.14       0.99       0.89
MY                            0.63       0.88       0.93       0.62       0.93       1.03
WE or US                      1.1        0.81       0.83       1.45       0.99       0.99
AS                            0.77       0.7        0.68       1.41       0.74       0.82
WHICH                         0.66       0.77       0.62       0.69       0.82       0.73
DO (all grammatical forms)    0.68       0.64       0.71       0.62       0.59       0.64
WITH                          0.64       0.74       0.64       0.72       0.82       0.75
FRIEND(S)                     0.49       0.66       0.72       0.50       0.77       0.84
BY                            0.54       0.59       0.58       0.55       0.67       0.65
MINISTER(S)                   0.58       0.48       0.48       0.20       0.14       0.18
PRIME                         0.51       0.44       0.46       0.08       0.08       0.08
GOVERNMENT(S)                 0.42       0.44       0.44       0.40       0.38       0.44
THERE                         0.5        0.41       0.38       0.57       0.46       0.43
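
The summary figures quoted for the word list (tokens, types, hapax legomena, mean frequency and standard deviation) can be illustrated with a short Python sketch. This is not the Concordance software: the tokenizer is a simplifying assumption, the function names are hypothetical, and the population standard deviation is used only because the text does not say which form was reported.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_list_summary(text):
    """Summarise a word list: tokens, types, rare-word counts, mean and SD."""
    freqs = Counter(tokenize(text))
    counts = list(freqs.values())
    return {
        "tokens": sum(counts),                               # running words
        "types": len(counts),                                # distinct words
        "hapax_legomena": sum(1 for c in counts if c == 1),  # frequency 1
        "frequency_2": sum(1 for c in counts if c == 2),
        "mean_frequency": mean(counts),                      # tokens / types
        "sd_frequency": pstdev(counts),                      # assumed: population SD
    }


sample = "the house will hear the answer and the house will decide"
summary = word_list_summary(sample)
print(summary)
```

On this toy sentence the summary gives 11 tokens, 7 types and 4 hapax legomena. In a real corpus a handful of function words account for a large share of the tokens, which is why the standard deviation can greatly exceed the mean.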



values together with their published frequency norms (Brown, Kucera and Francis, and

Thorndike-Lorge; Brown, 1984; Francis, 1967; Thorndike & Lorge, 1944) did not reveal any

significant correlations between the internally derived and published values (R = 0.13 [Brown];

R = 0.29 [K & F]; R = 0.19 [T-L]), supporting the suggestion made earlier that word frequency

Table 3

The 30 most frequently used content words in the entire text sample, and their overall occurrence rates by epoch

(expressed as a percentage of the total number of words in the epoch) in utterances made by HW (right-hand columns)

and by all other speakers (left-hand columns)

                      All other speakers               HW
                      Proportion of all words in:      Proportion of all words in:
Word or lemma         1965–1966  1969–1970  1975–1976  1965–1966  1969–1970  1975–1976
AGREE                 0.18       0.13       0.17       0.16       0.07       0.12
ANSWER                0.17       0.14       0.13       0.17       0.15       0.15
AWARE                 0.25       0.24       0.23       0.08       0.15       0.12
BRITISH               0.11       0.10       0.11       0.08       0.07       0.08
COUNTRY               0.12       0.15       0.18       0.08       0.14       0.17
FRIEND                0.50       0.67       0.73       0.42       0.62       0.70
GENTLEMAN/GENTLEMEN   0.55       0.47       0.33       0.59       0.65       0.34
GOVERNMENT            0.42       0.45       0.45       0.37       0.38       0.39
HOUSE                 0.34       0.41       0.42       0.32       0.43       0.49
LAST                  0.19       0.18       0.19       0.22       0.21       0.21
MANY                  0.12       0.13       0.14       0.11       0.14       0.13
MATTER                0.17       0.24       0.22       0.17       0.27       0.29
MEMBER                0.20       0.21       0.23       0.23       0.22       0.29
MINISTER              0.58       0.49       0.48       0.11       0.14       0.10
MORE                  0.16       0.17       0.19       0.16       0.16       0.16
OPPOSITION            0.07       0.09       0.16       0.07       0.07       0.21
ORDER                 0.16       0.22       0.11       0.05       0.05       0.03
PART                  0.09       0.11       0.08       0.09       0.14       0.10
PARTY                 0.07       0.06       0.17       0.06       0.06       0.18
PEOPLE                0.09       0.10       0.15       0.05       0.06       0.08
POINT                 0.11       0.16       0.11       0.09       0.09       0.08
POLICY                0.15       0.11       0.19       0.13       0.10       0.16
PRIME                 0.52       0.44       0.47       0.08       0.08       0.08
PUBLIC                0.06       0.07       0.15       0.05       0.06       0.11
QUESTION              0.44       0.46       0.33       0.49       0.56       0.40
SECRETARY             0.13       0.12       0.15       0.09       0.12       0.12
STATE                 0.07       0.09       0.15       0.06       0.06       0.11
STATEMENT             0.13       0.08       0.11       0.14       0.09       0.11
THINK/THOUGHT         0.35       0.26       0.16       0.51       0.36       0.21
TIME                  0.20       0.20       0.20       0.23       0.20       0.21
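
Each cell in Tables 2 and 3 is a simple rate: the number of occurrences of a word divided by all words spoken by that group in that epoch, multiplied by 100. A minimal Python sketch of the calculation follows. The token lists are invented for illustration, the function name is hypothetical, and the grouping of inflected forms under a single lemma (e.g. THINK/THOUGHT) is not shown.

```python
from collections import Counter


def occurrence_rates(tokens, words):
    """Return each word's share of all tokens, as a percentage."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        raise ValueError("no tokens for this speaker group and epoch")
    return {w: round(100.0 * counts[w] / total, 2) for w in words}


# Toy token lists keyed by (speaker group, epoch); the contents are invented.
corpus = {
    ("HW", "1965-1966"):
        "i think the government will answer the question".split(),
    ("Others", "1965-1966"):
        "will the prime minister answer the question now".split(),
}

table = {
    key: occurrence_rates(tokens, ["the", "think", "question"])
    for key, tokens in corpus.items()
}
print(table)
```

Normalising by the epoch total, rather than comparing raw counts, is what allows HW's rates to be set alongside those of the much larger pool of other speakers.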

Table 4

Values of W for pairwise comparisons (using Wilcoxon's signed rank test) between language attributable to HW and all

other speakers during each of the three epochs studied. Comparisons reaching statistical significance (printed in bold

in the original) are marked with an asterisk

            HW 69–70   HW 75–76   All 65–66   All 69–70   All 75–76
HW 65–66    1.10       1.52       -2.02*      1.75        -2.59*
HW 69–70               1.35       0.96        1.44        -2.41*
HW 75–76                          0.46        0.42        1.44
All 65–66                                     0.01        0.95
All 69–70                                                 0.42
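
For readers unfamiliar with the test behind Table 4, the following Python sketch computes a Wilcoxon signed rank statistic for two paired lists of values (for instance, the rates of the same words in two speaker-epoch samples). It rests on assumptions: the text above does not state which items were paired or how W was standardised, so the normal approximation used here, without tie or continuity correction, is only one plausible reading, and the function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Signed rank test on paired samples, with a normal approximation.

    Returns (w_plus, w_minus, z). Zero differences are discarded and tied
    absolute differences share their average rank.
    """
    # Rounding guards against spurious non-ties from floating-point noise.
    diffs = [round(a - b, 6) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    expected = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - expected) / sd
    return w_plus, w_minus, z


w_plus, w_minus, z = wilcoxon_signed_rank([10, 12, 9, 15, 11], [8, 13, 5, 9, 10])
print(w_plus, w_minus, round(z, 3))
```

The sign of the standardised value depends only on the direction of subtraction, so a negative entry indicates that the second sample tended to have the larger values.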



complexity, are essentially top-down methods, which use the theoretical assumptions of

neuropsychological models to characterize a piece of discourse. Although these methods have the

advantage of an empirical basis that allows one to know what to look for as well as how and

why to look for it, they are limited in scope and are frequently dependent on data (e.g. lexical

frequency) that may not be universally applicable across languages, cultures and time periods.

By contrast, metrics such as cumulative type-token ratios and the relative distributions of

lexical types will vary as a result of the lexical choices of the speaker or writer. It could be

argued that such choices are likely to be highly individual-specific rather than reflections of

group membership or neuropsychological condition. Stylometric analysis has certainly been

successful in distinguishing the work of two individuals, but its sensitivity to distinctions at

group level (century, sex, and English-speaking country of birth) attests to collective as well as

individual influences.
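
As an illustration of one of the bottom-up metrics mentioned above, the following Python sketch computes a cumulative type-token ratio: the number of distinct words seen so far divided by the number of words seen so far, sampled at regular intervals through a text. The function name and the `step` parameter are hypothetical, and the example sentence is invented.

```python
def cumulative_type_token_ratio(tokens, step=1):
    """Return (tokens_seen, types_seen / tokens_seen) at every `step` tokens."""
    seen = set()
    curve = []
    for position, token in enumerate(tokens, start=1):
        seen.add(token)
        if position % step == 0:
            curve.append((position, len(seen) / position))
    return curve


tokens = "the house will hear the answer and the house will decide".split()
curve = cumulative_type_token_ratio(tokens, step=4)
print(curve)
```

Because the ratio necessarily falls as a sample grows, curves of this kind are comparable only at matched sample lengths, which bears directly on the question of sample size raised in the next paragraph.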

Of course it does not follow automatically from this that the presence or absence of

degenerative neuropathology delineates a group in the same sense, though this is an empirical

question that remains to be resolved. It also remains to be seen whether the power that

automated stylometric analysis derives from being applied to literary texts many thousands of words

in length is sufficient to deal with the very much smaller samples that are usually produced in

the course of day-to-day life. If degenerative cognitive decline does have a stylometric

signature and descriptive methods are available to detect it, then the scope for further insights

into the origins and natural history of these common and devastating disorders will be

considerably enhanced.

References

Bates, E., Harris, C., Marchman, V., Wulfeck, B., & Kritchevsky, M. (1995). Production of complex syntax in normal

aging and Alzheimer's-disease. Language and Cognitive Processes, 10(5), 487–539.

Bernheimer, H., Birkmeyer, W., Hornykiewicz, O., Jellinger, K., & Seitelberger, F. (1973). Brain dopamine and the

syndromes of Parkinson and Huntington. Clinical, morphological and neurochemical correlations. Journal of the

Neurological Sciences, 20(4), 415–455.

de Bode, S., & Curtiss, S. (2000). Language after hemispherectomy. Brain and Cognition, 43(1–3), 135–138.

Braak, H., & Braak, E. (1995). Staging of Alzheimer's-disease-related neurofibrillary changes. Neurobiology of Aging,

16(3), 271–278.

Brown, G. D. A. (1984). A frequency count of 190,000 words in the London-Lund Corpus of English Conversation.

Behavioural Research Methods Instrumentation and Computers, 16, 502–532.

Bruscoli, M., & Lovestone, S. (2004). Is MCI really just early dementia? A systematic review of conversion studies.

International Psychogeriatrics, 16(2), 129–140.

Burrows, J. (2003). Questions of authorship: attribution and beyond – a lecture delivered on the occasion of the Roberto

Busa Award ACH-ALLC 2001, New York. Computers and the Humanities, 37(1), 5–32.

Burrows, J. (2004). Textual analysis. In S. Schreibman, R. Siemans, & J. Unsworth (Eds.), A companion to digital

humanities (pp. 323–347). Oxford: Blackwell.

Burrows, J. (2006). All the way through: testing for authorship in different frequency strata. Literary and Linguistic

Computing, fqi067.

Chaski, C. E. (2004). Forensic linguistics: an introduction to language, crime and the law. International Journal of

Speech Language and the Law, 11(2), 298–303.

Clifton, C., Cooley, R., & Rennie, J. (2004). TopCat: data mining for topic identification in a text corpus. IEEE

Transactions on Knowledge and Data Engineering, 16(8), 949–964.

Croisile, B., Adelein, P., Carmoi, T., Aimard, G., & Trillet, M. (1995). Evaluation of spelling in Alzheimer's-disease.

Revue De Neuropsychologie, 5(1), 23–51.

Croisile, B., Ska, B., Brabant, M.-J., Duchenne, A., Lepage, Y., Aimard, G., et al. (1996). Comparative study of oral and

written picture description in patients with Alzheimer's disease. Brain and Language, 53(1), 1–19.

Croot, K., Hodges, J. R., & Patterson, K. (1999). Evidence for impaired sentence comprehension in early Alzheimer's

disease. Journal of the International Neuropsychological Society, 5(5), 393–404.



Stahl, L. R. (1999). Reporting live. New York: Simon and Schuster.

Stamatatos, E., Fakotatis, N., & Kokkinakis, G. (2000). Automatic text categorisation in terms of genre and author.

Computational Linguistics, 26, 471–495.

Thorndike, E. L., & Lorge, I. (1944). The teacher's word book of 30,000 words. New York: Teachers College, Columbia

University.

Tomlinson, B. E., Blessed, G., & Roth, M. (1970). Observations on the brains of demented old people. Journal of the

Neurological Sciences, 11(3), 205–242.

Vicari, S., Albertoni, A., Chilosi, A. M., Cipriani, P., Cioni, G., & Bates, E. (2000). Plasticity and reorganization during

language development in children with early brain injury. Cortex, 36(1), 31–46.

Watt, R. J. C. (2002). Concordance. Dundee.

Wright, P. (1987). Spycatcher. Heinemann.

Wyner, A. J. (1996). Entropy estimation and patterns. In Workshop on Information Systems and Information Theory.

Haifa, Israel.

