Multi-modal Patient Cohort Identification from EEG Report ...travis/papers/amia_symp_2016_slides.pdf · from EEG Report and Signal Data ... Guidelines for writing EEG reports Clinical

Multi-modal Patient Cohort Identification from EEG Report and Signal Data

Travis R. Goodwin and Sanda M. Harabagiu

The University of Texas at Dallas

Human Language Technology Research Institute

http://www.hlt.utdallas.edu

Conflicts

Nothing to disclose.

Presentation Outline

1. Introduction

2. Data

3. Methods

4. Evaluation

5. Conclusions

Introduction: The Problem

Background:

• An electroencephalogram (EEG) measures the electrical activity of the brain.

• EEGs are an important investigational tool in the diagnosis and management of epilepsies and other types of brain disorders.

Problem:

EEG interpretation is known to have only moderate inter-rater agreement (Beniczky et al., 2013).

Introduction: The Solution

Solution:

• Leverage “Big Data” of EEG reports

• Improve interpretation agreement by allowing neurologists to search for patients that exhibit similar EEG characteristics

• Automatically identifying patient cohorts can:
  • inform the clinical decisions of neurologists
  • enable comparative clinical effectiveness research

Introduction: Examples

Scenario 1:

A neurologist suspects that their patient may have epilepsy and wants to review similar cases.

Query: Patients with a history of seizures and EEG with TIRDA without sharps, spikes, or electrographic seizures.

Scenario 2:

A neurologist wants to investigate effective interventions for epilepsy accompanied by mental health disorders.

Query: Patients with a history of Alzheimer and abnormal EEG

TIRDA: Temporal Intermittent Rhythmic Delta Activity

Multi-modal Encephalogram Patient Cohort Discovery (MERCuRY)

Goal: identify patients satisfying the cohort criteria expressed in a natural language query.

MERCuRY System:

• Considers both the EEG report and the EEG signal data

• Natural language processing to identify inclusion/exclusion criteria in the query

• Deep learning to represent EEG signals and produce a multi-modal index of EEG information

• Ranks patients using relevance models



Data: TUH EEG Corpus

• Temple University Hospital (TUH) EEG Corpus

• Largest publicly available dataset of EEG data:
  • 25,000 EEG sessions
  • 15,000 patients
  • Collected over 12 years

• Contains both EEG Reports and EEG signal data

Data: EEG Reports

American Clinical Neurophysiology Society (ACNS) Guidelines for writing EEG reports

Clinical History: patient's age, gender, relevant medical conditions, and medications

Introduction: EEG technique/configuration
• “digital video EEG”, “standard 10-20 system with 1 channel EKG”

Description: notable waveform activity, patterns, or EEG events
• “sharp wave”, “burst suppression pattern”, “very quick jerks of the head”

Impression: interpretation of whether the EEG indicates normal or abnormal brain activity, as well as a list of contributing epileptiform phenomena
• “abnormal EEG due to background slowing”

Clinical Correlation: relates the EEG findings to the overall clinical picture of the patient
• “very worrisome prognostic features”
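Section identification, listed under EEG report processing in the methods, can be approximated by looking for these section names as headers. A minimal sketch follows; it assumes each section begins on its own line with a header such as `IMPRESSION:`, and the header spellings and the `split_sections` helper are hypothetical, not taken from MERCuRY.

```python
import re

# Hypothetical header spellings for the ACNS sections listed above.
SECTION_HEADERS = [
    "CLINICAL HISTORY",
    "INTRODUCTION",
    "DESCRIPTION",
    "IMPRESSION",
    "CLINICAL CORRELATION",
]

# A known header at the start of a line, followed by a colon (case-insensitive).
_HEADER_RE = re.compile(
    r"^\s*(" + "|".join(map(re.escape, SECTION_HEADERS)) + r")\s*:",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(report: str) -> dict[str, str]:
    """Split an EEG report into {SECTION NAME: section text}."""
    sections = {}
    matches = list(_HEADER_RE.finditer(report))
    for k, m in enumerate(matches):
        start = m.end()  # text begins right after the colon
        end = matches[k + 1].start() if k + 1 < len(matches) else len(report)
        sections[m.group(1).upper()] = report[start:end].strip()
    return sections
```

Text before the first recognized header is dropped; real reports with header variants would need a longer, curated header list.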

Data: EEG Signals

Each report is associated with its EEG signal recording

• EEG signals contain 24 to 36 channels with an additional annotation channel identifying events of interest to the physicians and/or technicians

• Sampled at a rate of 250 Hz or 256 Hz using 16 bits per sample

• Each EEG recording contains roughly 20 MB of raw data!

• Uses the European Data Format (EDF+) schema
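The 20 MB figure follows from the sampling parameters above: channels × sampling rate × 2 bytes per 16-bit sample × duration. The sketch below computes that estimate and also reads the fixed 256-byte EDF/EDF+ header, whose field widths come from the EDF specification. The function names are hypothetical, and the 32-channel, 20-minute example is an assumption for illustration, not a TUH statistic.

```python
from dataclasses import dataclass

# Field widths (bytes) of the fixed 256-byte EDF/EDF+ header, in file order.
_EDF_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),
    ("start_time", 8),
    ("header_bytes", 8),
    ("reserved", 44),         # "EDF+C" or "EDF+D" in EDF+ files
    ("num_records", 8),
    ("record_duration", 8),   # seconds per data record
    ("num_signals", 4),
]


@dataclass
class EdfHeader:
    reserved: str
    num_records: int
    record_duration: float
    num_signals: int


def parse_edf_header(raw: bytes) -> EdfHeader:
    """Parse the fixed part of an EDF header (ASCII, space-padded fields)."""
    if len(raw) < 256:
        raise ValueError("EDF header must be at least 256 bytes")
    fields, offset = {}, 0
    for name, width in _EDF_FIELDS:
        fields[name] = raw[offset:offset + width].decode("ascii").strip()
        offset += width
    return EdfHeader(
        reserved=fields["reserved"],
        num_records=int(fields["num_records"]),
        record_duration=float(fields["record_duration"]),
        num_signals=int(fields["num_signals"]),
    )


def raw_signal_bytes(channels: int, rate_hz: int, seconds: float, bytes_per_sample: int = 2) -> int:
    """Rough size of the raw samples: channels x rate x duration x bytes per sample."""
    return int(channels * rate_hz * seconds * bytes_per_sample)
```

For example, `raw_signal_bytes(32, 256, 20 * 60)` gives 19,660,800 bytes, roughly the 20 MB quoted above.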



Methods: Overview

• Queries are processed by natural language analysis:
  • term filtering, query formulation, query expansion

• EEG reports are identified by a relevance model:
  • Case 1: EEG report ONLY
  • Case 2: EEG report + EEG signal (multi-modal)

• The relevance model relies on a multi-modal index:
  • term/phrase dictionary, tiered inverted lists, EEG signal fingerprints

• EEG report processing:
  • section identification, medical language processing

• EEG signal processing:
  • deep neural learning: fingerprint detection
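The three query-analysis steps can be illustrated with a toy analyzer that splits a cohort description into inclusion terms and exclusion terms, the latter intended for matching against negative-polarity lists. This is a minimal sketch, not MERCuRY's analyzer: the stop-word list, the negation cues, and the `EXPANSIONS` synonym table are hypothetical stand-ins, and negation scope is crudely taken to run to the end of the query.

```python
import re

# Hypothetical resources; a real system would use curated vocabularies.
STOP_WORDS = {"patients", "patient", "with", "a", "and", "of", "history", "eeg", "or", "the"}
NEGATION_CUES = {"without", "no", "not"}
EXPANSIONS = {"sharps": ["sharp"], "spikes": ["spike"], "seizures": ["seizure"]}


def analyze_query(query: str) -> tuple[list[str], list[str]]:
    """Return (inclusion terms, exclusion terms) for a cohort description."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    include, exclude = [], []
    negated = False
    for tok in tokens:
        if tok in NEGATION_CUES:
            negated = True  # simplification: negation covers the rest of the query
            continue
        if tok in STOP_WORDS:  # 1. term filtering
            continue
        target = exclude if negated else include  # 2. query formulation
        for term in [tok] + EXPANSIONS.get(tok, []):  # 3. query expansion
            if term not in target:
                target.append(term)
    return include, exclude
```

On Scenario 1's query this yields inclusion terms `seizures`, `seizure`, `tirda` and exclusion terms for sharps, spikes, and electrographic seizures; phrase handling is omitted, so "seizures" lands in both lists.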


Methods: MERCuRY System Overview

[Figure: MERCuRY system overview. The EEG cohort description (query) undergoes query analysis (1. term filtering, 2. query formulation, 3. query expansion). EEG reports undergo EEG report processing (1. section identification, 2. medical language processing), and EEG signals are processed by a deep neural network. Both feed the multi-modal EEG index (term/phrase dictionary, tiered inverted lists, EEG signal fingerprints), which the relevance model uses to return a patient cohort for Case 1 and for Case 2.]

Methods: Indexing EEG Reports

Concept dictionary

• Identified 5 types of medical concepts:
  • PROBLEMS (e.g., “seizure”)
  • TREATMENTS (e.g., “Topamax”)
  • TESTS (e.g., “video EEG”)
  • EEG PATTERNS/ACTIVITIES (e.g., “focal slowing”, “polyspike”)
  • EEG EVENTS (e.g., “blink”, “jerk of the head”)

• Links each medical concept to the terms expressing it:
  • e.g., “spike and slow wave” → “spike”, “and”, “slow”, “wave”
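The concept dictionary can be pictured as a table from concept IDs to a concept type plus the IDs of the terms expressing the concept. The sketch below assigns those IDs and records the link for the “spike and slow wave” example; the class and method names, and the whitespace tokenization, are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConceptType(Enum):
    PROBLEM = "problem"
    TREATMENT = "treatment"
    TEST = "test"
    EEG_PATTERN = "eeg_pattern"  # patterns/activities, e.g. "focal slowing"
    EEG_EVENT = "eeg_event"      # e.g. "blink"


@dataclass
class ConceptDictionary:
    term_ids: dict[str, int] = field(default_factory=dict)
    # concept ID -> (concept text, concept type, term IDs in order)
    concepts: dict[int, tuple[str, ConceptType, list[int]]] = field(default_factory=dict)

    def term_id(self, term: str) -> int:
        """Return the ID of a term, assigning the next free ID if it is new."""
        return self.term_ids.setdefault(term, len(self.term_ids))

    def add_concept(self, text: str, ctype: ConceptType) -> int:
        """Register a concept and link it to the terms that express it."""
        ids = [self.term_id(t) for t in text.lower().split()]
        concept_id = len(self.concepts)
        self.concepts[concept_id] = (text, ctype, ids)
        return concept_id
```

Because term IDs are shared, a later concept such as “slow wave” reuses the IDs already assigned to “slow” and “wave”.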

Methods: Indexing EEG Reports

Term Dictionary

• Each term is associated with two inverted lists:
  • “positive” polarity: mentions in which the term is affirmed
  • “negative” polarity: mentions in which the term is negated

• Each cell in the inverted list contains:
  • the EEG report containing the mention
  • the name of the section containing the mention
  • the position of the mention within the section (e.g., term offset)
  • the position(s) of the term in any associated medical concepts
  • the EEG signal fingerprint associated with the EEG report
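A posting cell and the two polarity lists per term can be sketched as follows. The fields follow the list above; the `Posting`, `TermEntry`, and `TieredIndex` names, and the use of Python lists rather than the linked “Next” cells drawn in the index figure, are assumptions for readability.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Posting:
    report_id: str
    section: str                 # e.g. "IMPRESSION"
    position: int                # term offset within the section
    concept_positions: tuple[tuple[int, int], ...] = ()  # (concept ID, position in concept)
    fingerprint_id: Optional[str] = None                 # EEG signal fingerprint of the report


@dataclass
class TermEntry:
    positive: list[Posting] = field(default_factory=list)  # affirmed mentions
    negative: list[Posting] = field(default_factory=list)  # negated mentions


class TieredIndex:
    def __init__(self) -> None:
        self.terms: dict[str, TermEntry] = {}

    def add(self, term: str, posting: Posting, negated: bool) -> None:
        """Append a mention to the term's positive or negative inverted list."""
        entry = self.terms.setdefault(term, TermEntry())
        (entry.negative if negated else entry.positive).append(posting)

    def reports(self, term: str, negated: bool = False) -> set[str]:
        """IDs of reports with at least one mention of `term` in the given polarity."""
        entry = self.terms.get(term)
        if entry is None:
            return set()
        return {p.report_id for p in (entry.negative if negated else entry.positive)}
```

Keeping polarity in separate lists means a query such as “without spikes” can be answered by reading only the negative list for “spike”.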

Overview of the Tiered Multi-Modal Index

[Figure: Overview of the tiered multi-modal index. The term dictionary maps each term (e.g., alpha, beta, hypertension, lovenox, seizure, sharp, slow, spike, stroke, wave) to a term ID and to two tiered inverted lists, one for positive polarity and one for negative polarity. Each cell stores the EEG report ID, EEG signal fingerprint ID, report section, position within the section, medical concept ID, and concept position, plus a pointer to the next cell. A medical concept dictionary maps each medical concept ID to its concept type, its text (e.g., alpha, sharp and slow wave), and its term IDs; fingerprint IDs point into the EEG signal fingerprints.]

Methods: “Fingerprinting” EEG Signals

EEG signal encoded as a dense floating-point matrix, $\boldsymbol{D} \in \mathbb{R}^{N \times L}$

• $N$ is the number of electrode channels

• $L$ is the number of samples in the recording (e.g., at 250 Hz, $L / 250$ is the duration of the recording in seconds)

One pass over the EEG signals requires considering over 1.8 terabytes of information!

We need a more compact representation: EEG fingerprints
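Written column by column, each column of $\boldsymbol{D}$ is one multichannel sample $\boldsymbol{x}_t$, which is the view a recurrent network takes when it reads the signal one sample at a time. Here $f_s$ (the sampling rate, 250 or 256 Hz for TUH) is notation introduced for this illustration, and the raw size assumes the 16-bit samples described in the data section.

```latex
\boldsymbol{D} =
\begin{bmatrix}
\boldsymbol{x}_1 & \boldsymbol{x}_2 & \cdots & \boldsymbol{x}_L
\end{bmatrix}
\in \mathbb{R}^{N \times L},
\qquad \boldsymbol{x}_t \in \mathbb{R}^{N},
\qquad \text{duration (s)} = \frac{L}{f_s},
\qquad \text{raw size (bytes)} = 2NL
```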

Methods: Learning EEG Fingerprints

• Deep neural learning
  • Processes EEG signals in a matter of hours rather than weeks
  • Reduces each EEG signal from 20 MB to a few hundred bytes

• Recurrent Neural Network
  • Treats the EEG signal as a sequence of samples
  • For each sample $\boldsymbol{x}_t$, learns to predict the next sample $\boldsymbol{x}_{t+1}$

• Long Short-Term Memory (LSTM)
  • Can learn long-range interactions in the EEG signal
  • Maintains and updates an internal memory $\boldsymbol{m}_t$
  • The final internal memory $\boldsymbol{m}_L$ becomes the EEG fingerprint
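The fingerprinting idea can be made concrete with a forward pass of a single-layer LSTM over the samples, returning the final memory $\boldsymbol{m}_L$ as the fingerprint together with the next-sample prediction error that training would minimize. This is a sketch under stated assumptions: the weights are random rather than learned (backpropagation through time is omitted), the default hidden size of 64 is hypothetical, and the standard LSTM gate equations are used because the slides do not specify the exact variant.

```python
import math
import random


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _affine(W, x, U, h, b):
    """Compute W·x + U·h + b, with W and U given as lists of rows."""
    return [
        sum(w * xi for w, xi in zip(W[k], x)) + sum(u * hj for u, hj in zip(U[k], h)) + b[k]
        for k in range(len(b))
    ]


class LSTMFingerprinter:
    """Single-layer LSTM with random (untrained) weights; hypothetical sketch."""

    def __init__(self, n_channels: int, hidden: int = 64, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: input, forget, output, candidate.
        self.gates = {
            name: (mat(hidden, n_channels), mat(hidden, hidden), [0.0] * hidden)
            for name in ("i", "f", "o", "g")
        }
        self.V = mat(n_channels, hidden)  # read-out: h_t -> prediction of x_{t+1}
        self.hidden = hidden

    def _gate(self, name, x, h):
        W, U, b = self.gates[name]
        return _affine(W, x, U, h, b)

    def run(self, samples):
        """Return (final memory m_L, mean per-step squared error of next-sample predictions)."""
        h = [0.0] * self.hidden
        m = [0.0] * self.hidden
        sq_err, n_pred = 0.0, 0
        for t, x in enumerate(samples):
            i = [_sigmoid(z) for z in self._gate("i", x, h)]
            f = [_sigmoid(z) for z in self._gate("f", x, h)]
            o = [_sigmoid(z) for z in self._gate("o", x, h)]
            g = [math.tanh(z) for z in self._gate("g", x, h)]
            m = [fk * mk + ik * gk for fk, mk, ik, gk in zip(f, m, i, g)]  # memory m_t
            h = [ok * math.tanh(mk) for ok, mk in zip(o, m)]
            if t + 1 < len(samples):
                # Predict x_{t+1} from h_t; training would minimise this error.
                pred = [sum(v * hj for v, hj in zip(row, h)) for row in self.V]
                sq_err += sum((p - y) ** 2 for p, y in zip(pred, samples[t + 1]))
                n_pred += 1
        return m, sq_err / max(n_pred, 1)
```

With a hidden size of 64 stored as 32-bit floats, $\boldsymbol{m}_L$ occupies 256 bytes, in line with the “few hundred bytes” quoted above.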

Methods: Relevance Model

Purpose: measure the relevance between a query and an EEG report

Case 1: consider EEG reports only
  • BM25F ranking function
  • Gives a different weight to query matches in each field, and for each polarity
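BM25F folds per-field weights into a single saturated term frequency before applying IDF. Treating each (section, polarity) pair as a field, a minimal sketch looks like this; the field weights, `k1 = 1.2`, and `b = 0.75` are illustrative defaults rather than MERCuRY's settings, and representing a report as a dict of token lists per field is an assumption.

```python
import math
from collections import Counter

Field = tuple[str, str]          # (section name, polarity), e.g. ("IMPRESSION", "pos")
Report = dict[Field, list[str]]  # tokens of one report, grouped by field


def bm25f_scores(query: list[str], reports: dict[str, Report],
                 weights: dict[Field, float], k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Score each report against the query terms with a simple BM25F."""
    if not reports:
        return {}
    n = len(reports)
    fields = {f for r in reports.values() for f in r}
    avg_len = {f: sum(len(r.get(f, [])) for r in reports.values()) / n for f in fields}
    # Document frequency: number of reports containing the term in any field.
    df = Counter(t for r in reports.values() for t in {tok for toks in r.values() for tok in toks})
    scores = {}
    for rid, rep in reports.items():
        score = 0.0
        for t in set(query):
            # Weighted, length-normalised term frequency summed over fields.
            tf = 0.0
            for f, toks in rep.items():
                if avg_len[f] == 0:
                    continue
                norm = 1 - b + b * len(toks) / avg_len[f]
                tf += weights.get(f, 1.0) * toks.count(t) / norm
            if tf > 0:
                idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
                score += idf * tf / (k1 + tf)
        scores[rid] = score
    return scores
```

Giving negated mentions a small weight (for example 0.1) lets an affirmed “spike” in one report outrank a negated “spike” in another, which is the effect the per-polarity weights are meant to capture.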

Case 2: consider EEG report + EEG fingerprint
  • Retrieve an initial set of EEG reports as in Case 1
  • Identify the $\lambda$ top-ranked EEG reports
  • Look up the $\delta$ most similar EEG fingerprints for the top-ranked reports

In our experiments, $\lambda = 5$ and $\delta = 3$.
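The slides do not say how the fingerprint neighbours are merged into the final ranking, so the following is only one plausible reading: keep the top $\lambda$ Case 1 reports, append the reports whose fingerprints are among the $\delta$ nearest (by cosine similarity, an assumed measure) to each of them, then continue with the remaining Case 1 order. The function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def multimodal_rerank(case1_ranking: list[str], fingerprints: dict[str, list[float]],
                      lam: int = 5, delta: int = 3) -> list[str]:
    """Expand a Case 1 ranking with reports whose EEG fingerprints resemble the top lam."""
    top = case1_ranking[:lam]
    result, seen = list(top), set(top)
    for rid in top:
        if rid not in fingerprints:
            continue
        # The delta most similar fingerprints to this top-ranked report (excluding itself).
        neighbours = sorted(
            (other for other in fingerprints if other != rid),
            key=lambda other: cosine(fingerprints[rid], fingerprints[other]),
            reverse=True,
        )[:delta]
        for other in neighbours:
            if other not in seen:
                result.append(other)
                seen.add(other)
    # Fall back to the remaining Case 1 order.
    result += [rid for rid in case1_ranking if rid not in seen]
    return result
```

Under this reading, a report never matched by the text query can still enter the cohort through its fingerprint, which is one way the signal could “fill in the gaps” of a report.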



Evaluation: Queries

Asked neurologists to provide patient cohort descriptions (queries)

Patient Cohort Descriptions (Queries):
1. History of seizures and EEG with TIRDA without sharps, spikes, or electrographic seizures
2. History of Alzheimer dementia and normal EEG
3. Patients with altered mental status and EEG showing nonconvulsive status epilepticus (NCSE)
4. Patients under 18 years old with absence seizures
5. Patients over age 18 with history of developmental delay and EEG with electrographic seizures

Evaluation: Patient Cohort Quality

• For each query:
  • Identified the 10 most relevant EEG reports
  • Randomly sampled 10 EEG reports retrieved between ranks 11 and 100

• Asked neurologists to judge whether each EEG report was “relevant”:
  • 1: the patient described in the report definitely belongs to the cohort
  • 0: the patient described in the report does not belong to the cohort

• Measured using standard information retrieval metrics:
  • Mean Average Precision (MAP)
  • Normalized Discounted Cumulative Gain (NDCG)
  • Precision at Rank 10 (P@10)
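With binary judgments, the three metrics can be computed from a ranked list of 0/1 labels. The sketch below uses the textbook definitions; normalizing AP by the relevant reports found in the judged list, and treating the judged reports as one contiguous ranking, are assumptions, since the slides do not say how the sampled ranks 11 to 100 were handled.

```python
import math


def precision_at_k(rels: list[int], k: int = 10) -> float:
    """Fraction of the top k results judged relevant."""
    return sum(rels[:k]) / k


def average_precision(rels: list[int]) -> float:
    """Mean of precision@i over the ranks i holding a relevant report."""
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(runs: list[list[int]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def ndcg(rels: list[int]) -> float:
    """DCG with log2(rank + 1) discounts, divided by the DCG of the ideal ordering."""
    def dcg(rs):
        return sum(r / math.log2(i + 1) for i, r in enumerate(rs, start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0
```

For the judgments `[1, 0, 1, 0]`, AP is (1/1 + 2/3) / 2 = 5/6.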

Evaluation: Patient Cohort Quality

Relevance Model           MAP      NDCG     P@10
Baseline 1: BM25          52.05%   66.41%   80.00%
Baseline 2: LMD           50.37%   65.90%   80.00%
Baseline 3: DFR           46.22%   59.35%   70.00%
MERCuRY: Case 1 (a)       58.59%   72.14%   90.00%
MERCuRY: Case 1 (b)       57.95%   70.34%   90.00%
MERCuRY: Case 2 (a)       70.43%   84.62%   100.00%
MERCuRY: Case 2 (b)       69.87%   83.21%   100.00%

(a) With concept dictionary; (b) without concept dictionary. LMD: language model with Dirichlet smoothing; DFR: divergence from randomness.



Conclusions

• Including EEG signals with EEG reports led to improved patient cohort retrieval

• No noticeable improvement when indexing medical concepts:
  • concepts need only be recognized in the query
  • maintaining positional information is sufficient to identify medical concepts in EEG reports

• Considering polarity & section information substantially improved performance

• EEG fingerprints are able to fill in the gaps in EEG reports

• Future work:
  • Tune parameters $\delta$ and $\lambda$
  • Jointly encode EEG reports + EEG signals

Acknowledgements

Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under award number 1U01HG008468. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Questions?

Background: Multi-modal Retrieval

• Text REtrieval Conference (TREC) Medical Records Track
  • 2011 & 2012: systems were given natural language queries and text-only EHRs
  • Goal was to retrieve sets of electronic medical records relevant to the query

• Biomedical Multi-modal Retrieval (Demner-Fushman et al., 2012)
  • Considers both images and text of scientific articles
  • Allows users to discover images similar to those used in a scientific article

• AALIM (Syeda-Mahmood et al., 2007)
  • Allows users to discover similar ECG, echo, or audio recordings given the ECG, echo, or audio recording of a patient
