Multi-modal Patient Cohort Identification from EEG Report and Signal Data
Travis R. Goodwin and Sanda M. Harabagiu
The University of Texas at Dallas
Human Language Technology Research Institute
http://www.hlt.utdallas.edu
Conflicts
Nothing to disclose.
Presentation Outline
1. Introduction
2. Data
3. Methods
4. Evaluation
5. Conclusions
Introduction: The Problem
Background:
• An electroencephalogram (EEG) measures the electrical activity of the brain.
• EEGs are an important investigational tool in the diagnosis and management of epilepsies and other types of brain disorders.
Problem:
EEG interpretation is known to have only moderate agreement (Beniczky et al., 2013).
Introduction: The Solution
Solution:
• Leverage “Big Data” of EEG reports
• Improve interpretation agreement by allowing neurologists to search for patients that exhibit similar EEG characteristics
• Automatically identifying patient cohorts can
  • inform the clinical decisions of neurologists
  • enable comparative clinical effectiveness research
Introduction: Examples
Scenario 1:
Neurologist suspects their patient may have epilepsy and wants to review similar cases
Query: Patients with a history of seizures and EEG with TIRDA without sharps, spikes, or electrographic seizures.
Scenario 2:
Neurologist wants to investigate effective interventions for epilepsy accompanied by mental health disorders.
Query: Patients with a history of Alzheimer and abnormal EEG
TIRDA: Temporal Intermittent Rhythmic Delta Activity
Multi-modal Encephalogram Patient Cohort Discovery (MERCuRY)
Goal: Identify patients satisfying cohort criteria expressed in a natural language query.
MERCuRY System:
• Considers both the EEG report as well as the signal data
• Natural language processing to identify inclusion/exclusion criteria in the query
• Deep learning to represent EEG signals and produce a multi-modal index of EEG information
• Ranks patients using relevance models
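As a rough illustration of the inclusion/exclusion step, the sketch below splits a query at a negation cue. The cue list and the function name `split_criteria` are assumptions made for this example; the slides do not describe the natural language processing that MERCuRY actually uses.

```python
import re

# Hypothetical negation cues; not taken from the MERCuRY system.
NEGATION_CUES = ("without", "no evidence of", "absence of")

def split_criteria(query: str) -> tuple[str, str]:
    """Return (inclusion_text, exclusion_text) for a cohort query."""
    lowered = query.lower().rstrip(". ")
    pattern = r"\b(?:" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b"
    match = re.search(pattern, lowered)
    if match is None:
        # No cue found: the whole query is treated as inclusion criteria.
        return lowered, ""
    return lowered[:match.start()].strip(), lowered[match.end():].strip()

query = ("Patients with a history of seizures and EEG with TIRDA "
         "without sharps, spikes, or electrographic seizures.")
inclusion, exclusion = split_criteria(query)
print("include:", inclusion)
print("exclude:", exclusion)
```

Everything after the first cue is treated as excluded, which is enough for the Scenario 1 query but would not handle queries with several negated clauses.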
Data: TUH EEG Corpus
• Temple University Hospital (TUH) EEG Corpus
• Largest publicly available dataset of EEG data
  • 25,000 EEG sessions
  • 15,000 patients
  • Collected over 12 years
• Contains both EEG Reports and EEG signal data
Data: EEG Reports
American Clinical Neurophysiology Society (ACNS) Guidelines for writing EEG reports
Clinical History: patient's age, gender, relevant medical conditions and medications
Introduction: EEG technique/configuration
  • “digital video EEG”, “standard 10-20 system with 1 channel EKG”
Description: describes any notable waveform activity, patterns, or EEG events
  • “sharp wave”, “burst suppression pattern”, “very quick jerks of the head”
Impression: interpretation of whether the EEG indicates normal or abnormal brain activity, as well as a list of contributing epileptiform phenomena
  • “abnormal EEG due to background slowing”
Clinical correlation: relates the EEG findings to the overall clinical picture of the patient
  • “very worrisome prognostic features”
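Section identification can be sketched as a header-based split. This is a minimal sketch, assuming each section starts on its own line with one of the five names above followed by a colon; real reports may use other header variants, and the sample report text is invented from the example phrases on the slide.

```python
import re

# Section names follow the slide; header variants in real reports may differ.
SECTION_NAMES = ["CLINICAL HISTORY", "INTRODUCTION", "DESCRIPTION",
                 "IMPRESSION", "CLINICAL CORRELATION"]

HEADER = re.compile(r"^\s*(" + "|".join(SECTION_NAMES) + r")\s*:\s*",
                    re.IGNORECASE | re.MULTILINE)

def split_sections(report: str) -> dict[str, str]:
    """Map each recognized section name to the text that follows its header."""
    sections = {}
    matches = list(HEADER.finditer(report))
    for i, m in enumerate(matches):
        # A section runs until the next header, or to the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1).upper()] = report[m.end():end].strip()
    return sections

# Invented report text, for illustration only.
report = """CLINICAL HISTORY: 45 year old woman with seizures.
INTRODUCTION: Digital video EEG, standard 10-20 system with 1 channel EKG.
DESCRIPTION: Background slowing with occasional sharp wave.
IMPRESSION: Abnormal EEG due to background slowing.
CLINICAL CORRELATION: Very worrisome prognostic features."""

for name, text in split_sections(report).items():
    print(f"{name}: {text}")
```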
Data: EEG Signals
Each report is associated with its EEG signal recording
• EEG signals contain 24 to 36 channels with an additional annotation channel identifying events of interest to the physicians and/or technicians
• Sampled at a rate of 250 Hz or 256 Hz using 16 bits per sample
• Each EEG recording contains roughly 20 MB of raw data!
• Uses the European Data Format (EDF+) schema
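The "roughly 20 MB" figure can be sanity-checked with simple arithmetic. The channel count and the 20-minute duration below are assumed example values chosen within the ranges on the slide; the function name `raw_size_bytes` is made up for this sketch.

```python
def raw_size_bytes(channels: int, sample_rate_hz: int,
                   duration_s: float, bytes_per_sample: int = 2) -> float:
    """Uncompressed size: channels x samples per channel x bytes per sample."""
    return channels * sample_rate_hz * duration_s * bytes_per_sample

# Assumed example: 32 channels (within the 24 to 36 range), 250 Hz,
# 16-bit (2-byte) samples, and a 20-minute recording.
size = raw_size_bytes(channels=32, sample_rate_hz=250, duration_s=20 * 60)
print(f"{size / 1e6:.1f} MB")  # prints "19.2 MB"
```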
Methods: Overview
• Queries are processed by natural language analysis:
  • term filtering, query formulation, query expansion
• EEG reports are identified by a relevance model
  • Case 1: EEG report ONLY
  • Case 2: EEG report + EEG signal (Multi-modal)
• Relevance model relies on multi-modal Index
  • Term/phrase dictionary, tiered inverted lists, EEG signal fingerprints
• EEG Report Processing:
  • section identification, medical language processing
• EEG Signal Processing:
  • deep neural learning: fingerprint detection
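The query analysis steps can be pictured with a toy pipeline. This is only a sketch: the stopword list, the expansion table, and both function names are hypothetical, and the two expansions shown are simply the abbreviations spelled out elsewhere in these slides.

```python
# Hypothetical stopword list and expansion table, for illustration only.
STOPWORDS = {"a", "an", "and", "of", "or", "the", "with", "patients"}
EXPANSIONS = {
    "tirda": ["temporal intermittent rhythmic delta activity"],
    "ncse": ["nonconvulsive status epilepticus"],
}

def filter_terms(query: str) -> list[str]:
    """Term filtering: lowercase, strip punctuation, drop stopwords."""
    tokens = [t.strip(".,()").lower() for t in query.split()]
    return [t for t in tokens if t and t not in STOPWORDS]

def expand_terms(terms: list[str]) -> list[str]:
    """Query expansion: append known long forms after each abbreviation."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(EXPANSIONS.get(term, []))
    return expanded

terms = filter_terms("Patients with a history of seizures and EEG with TIRDA")
print(expand_terms(terms))
```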
Methods: MERCuRY System Overview
[Figure: MERCuRY system diagram]
• Inputs: EEG Cohort Description (query), EEG Report, EEG Signal
• EEG Report Processing: 1. Section identification, 2. Medical language processing
• EEG Signal Processing: Deep Neural Network
• Query Analysis: 1. Term Filtering, 2. Query Formulation, 3. Query Expansion
• Multi-Modal EEG Index: Term / Phrase Dictionary, Tiered Inverted Lists, EEG Signal fingerprints
• Relevance Model: Case 1 and Case 2
• Outputs: Patient Cohort (Case 1), Patient Cohort (Case 2)
Methods: Indexing EEG Reports
Concept dictionary
• Identified 5 types of medical concepts:
  • PROBLEMS (e.g. “seizure”)
  • TREATMENTS (e.g. “Topamax”)
  • TESTS (e.g. “video EEG”)
  • EEG PATTERNS/ACTIVITIES (e.g. “focal slowing”, “polyspike”)
  • EEG EVENTS (e.g. “blink”, “jerk of the head”)
• Links each medical concept to the terms expressing it:
  • e.g., “spike and slow wave” → “spike”, “and”, “slow”, “wave”
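One way to picture the concept dictionary is as a mapping from each concept to its ordered terms, plus a reverse map from terms back to concepts. This is a sketch under assumptions: the class and field names, the concept IDs, and the type assigned to “spike and slow wave” are all illustrative and do not come from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    concept_id: int
    concept_type: str        # one of the five types listed on the slide
    terms: tuple[str, ...]   # the terms that express the concept, in order

def make_concept(concept_id: int, concept_type: str, phrase: str) -> Concept:
    return Concept(concept_id, concept_type, tuple(phrase.lower().split()))

# IDs are arbitrary; the type given to "spike and slow wave" is assumed.
concepts = [
    make_concept(0, "EEG PATTERN/ACTIVITY", "spike and slow wave"),
    make_concept(1, "EEG PATTERN/ACTIVITY", "focal slowing"),
    make_concept(2, "PROBLEM", "seizure"),
]

# Reverse map: term -> [(concept_id, position of the term in the concept)].
term_to_concepts: dict[str, list[tuple[int, int]]] = {}
for c in concepts:
    for pos, term in enumerate(c.terms):
        term_to_concepts.setdefault(term, []).append((c.concept_id, pos))

print(concepts[0].terms)
print(term_to_concepts["slow"])
```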
Methods: Indexing EEG Reports
Term Dictionary
• Each term is associated with two inverted lists
  • “positive” polarity: mentions in which the term is positive
  • “negative” polarity: mentions in which the term is negated
• Each cell in the inverted list contains:
  • the EEG report containing the mention
  • the name of the section containing the mention
  • the position of the mention within the section (e.g. term offset)
  • position(s) of the term in any associated medical concepts
  • the EEG signal fingerprint associated with the EEG report
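The cell layout above maps naturally onto a small record type with one list per polarity. The sketch below is an assumption about how such an index could be laid out in memory, not the authors' implementation; the `Posting` fields mirror the bullets, while the report IDs, offsets, and helper names are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Posting:
    report_id: str
    section: str
    position: int                    # term offset within the section
    concept_id: Optional[int]        # associated medical concept, if any
    concept_position: Optional[int]  # position of the term in that concept
    fingerprint_id: str              # fingerprint of the report's EEG signal

# index[term][polarity] -> list of postings
index = defaultdict(lambda: {"positive": [], "negative": []})

def add_mention(term, polarity, posting):
    index[term][polarity].append(posting)

# Invented mentions: "sharp wave" asserted in r1, "spike" negated in r2.
add_mention("sharp", "positive", Posting("r1", "DESCRIPTION", 4, 7, 0, "f1"))
add_mention("wave", "positive", Posting("r1", "DESCRIPTION", 5, 7, 1, "f1"))
add_mention("spike", "negative", Posting("r2", "IMPRESSION", 2, None, None, "f2"))

def reports_with(term, polarity):
    return sorted({p.report_id for p in index[term][polarity]})

def adjacent(first, second, polarity="positive"):
    """Reports where `second` directly follows `first` in the same section."""
    seen = {(p.report_id, p.section, p.position) for p in index[first][polarity]}
    return sorted({p.report_id for p in index[second][polarity]
                   if (p.report_id, p.section, p.position - 1) in seen})

print(reports_with("sharp", "positive"))
print(reports_with("spike", "positive"))
print(adjacent("sharp", "wave"))
```

The `adjacent` helper shows how stored positions alone are enough to recover a multi-term phrase such as “sharp wave” at query time.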
Overview of the Tiered Multi-Modal Index
[Figure: tiered multi-modal index]
• Term dictionary: maps each term (e.g. alpha, beta, hypertension, lovenox, seizure, sharp, slow, spike, stroke, wave) to a term ID
• Tiered inverted lists: each term has a POSITIVE POLARITY list and a NEGATIVE POLARITY list; each cell holds EEG Report ID, EEG Signal Fingerprint ID, Report Section, Report Section Position, Medical Concept ID, Concept Position, and a Next pointer
• Medical concept dictionary: maps each Medical Concept ID to its Concept Type and term IDs (e.g. “alpha”, “sharp and slow wave”)
• EEG signal fingerprints
Methods: “Fingerprinting” EEG Signals
EEG signal encoded as a dense floating-point matrix, 𝑫 ∈ ℝ^(𝑁×𝐿)
• 𝑁 is the number of electrode channels
• 𝐿 is the number of samples in the recording (e.g. 𝐿 / 250 = duration of the recording in seconds)
One pass over the EEG signals requires considering over 1.8 terabytes of information!
We need a more compact representation: EEG fingerprints
Methods: Learning EEG Fingerprints
• Deep neural learning
  • Process EEG signals in a matter of hours rather than weeks
  • Reduce each EEG signal from 20 MB to a few hundred bytes
• Recurrent Neural Network
  • Consider the EEG signal as a sequence of samples
  • For each sample 𝒙𝒕, learn to predict the next sample 𝒙𝒕+𝟏
• Long Short-Term Memory
  • Can learn long-range interactions in the EEG signal
  • Maintains & updates an internal memory 𝒎𝒕
  • Final internal memory 𝒎𝑳 becomes the EEG fingerprint
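The idea of reading the fingerprint off the final memory state can be sketched with a tiny LSTM cell in plain Python. This is a sketch under strong assumptions: the weights are random and untrained (the next-sample prediction objective is not implemented), the class name `TinyLSTM` and the memory size of 8 are made up, and the input is a synthetic two-channel signal rather than real EEG.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyLSTM:
    """Untrained LSTM cell; weights are random, so output is illustrative only."""
    def __init__(self, n_inputs, n_memory, seed=0):
        rng = random.Random(seed)
        width = n_inputs + n_memory
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                      for _ in range(n_memory)] for g in "ifoc"}
        self.b = {g: [0.0] * n_memory for g in "ifoc"}
        self.n_memory = n_memory

    def _gate(self, g, v, squash):
        return [squash(sum(w * x for w, x in zip(row, v)) + b)
                for row, b in zip(self.w[g], self.b[g])]

    def step(self, x, h, m):
        v = list(x) + list(h)                 # concatenate input and hidden state
        i = self._gate("i", v, sigmoid)
        f = self._gate("f", v, sigmoid)
        o = self._gate("o", v, sigmoid)
        c = self._gate("c", v, math.tanh)
        # Memory update: keep part of the old memory, add gated candidate.
        m = [fj * mj + ij * cj for fj, mj, ij, cj in zip(f, m, i, c)]
        h = [oj * math.tanh(mj) for oj, mj in zip(o, m)]
        return h, m

    def fingerprint(self, samples):
        h = [0.0] * self.n_memory
        m = [0.0] * self.n_memory
        for x in samples:                     # one step per sample x_t
            h, m = self.step(x, h, m)
        return m                              # final memory m_L

# Toy signal: 2 channels, 500 samples of sinusoids (not real EEG).
signal = [[math.sin(0.1 * t), math.cos(0.05 * t)] for t in range(500)]
lstm = TinyLSTM(n_inputs=2, n_memory=8)
print(len(lstm.fingerprint(signal)))
```

Whatever the recording length, the result has a fixed size (here 8 floats), which is what makes the fingerprint compact enough to store alongside each report in the index.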
Methods: Relevance Model
Purpose: measure the relevance between a query and an EEG report
Case 1: consider EEG reports only
  • BM25F ranking function
  • Gives a different weight to query matches in each field, and for each polarity
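A simplified form of BM25F combines weighted, length-normalized term frequencies across fields before applying saturation. The sketch below follows that general form; treating each (section, polarity) pair as a field, the weight values, the `k1` and `b` settings, and the smoothed IDF variant are all assumptions, not the configuration used in MERCuRY.

```python
import math

def bm25f(query_terms, doc, weights, avg_len, doc_freq, n_docs, k1=1.2, b=0.75):
    """doc maps a field key to its list of terms; weights maps the same keys
    to per-field boosts. Field keys here are (section, polarity) pairs."""
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for field, terms in doc.items():
            tf = terms.count(term)
            # Per-field length normalization.
            norm = (1 - b) + b * len(terms) / avg_len[field]
            pseudo_tf += weights.get(field, 0.0) * tf / norm
        n_t = doc_freq.get(term, 0)
        idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score

# Hypothetical weights: impression counts double, negated mentions count zero.
weights = {("IMPRESSION", "positive"): 2.0,
           ("DESCRIPTION", "positive"): 1.0,
           ("DESCRIPTION", "negative"): 0.0}
avg_len = {("IMPRESSION", "positive"): 3.0,
           ("DESCRIPTION", "positive"): 2.0,
           ("DESCRIPTION", "negative"): 2.0}
doc_a = {("IMPRESSION", "positive"): ["abnormal", "eeg", "background", "slowing"],
         ("DESCRIPTION", "positive"): ["sharp", "wave"]}
doc_b = {("IMPRESSION", "positive"): ["normal", "eeg"],
         ("DESCRIPTION", "negative"): ["sharp", "wave"]}
doc_freq = {"sharp": 2, "slowing": 1}

score_a = bm25f(["sharp", "slowing"], doc_a, weights, avg_len, doc_freq, n_docs=10)
score_b = bm25f(["sharp", "slowing"], doc_b, weights, avg_len, doc_freq, n_docs=10)
print(score_a, score_b)
```

In this toy setup the report that only mentions “sharp” in a negated context receives no credit for it, which is the effect that per-polarity weighting is meant to achieve.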
Case 2: consider EEG report + EEG fingerprint
  • Retrieve initial set of EEG reports as in Case 1
  • Identify the 𝝀 top-ranked EEG reports
  • Look up the 𝜹 most-similar EEG fingerprints for the top-ranked reports
In our experiments, 𝝀 = 𝟓; 𝜹 = 𝟑
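The Case 2 steps can be sketched as a re-ranking pass over the Case 1 results. The slides do not say which similarity measure is used or how the fingerprint neighbors are merged into the ranking, so cosine similarity and the "promote neighbors directly after the top 𝝀 reports" rule below are assumptions, as are the function names and toy vectors.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def case2_rank(initial_ranking, fingerprints, lam=5, delta=3):
    """initial_ranking: report IDs ordered by the Case 1 score.
    fingerprints: report ID -> fingerprint vector."""
    top = list(initial_ranking[:lam])
    promoted = []
    for seed in top:
        # Rank all other reports by fingerprint similarity to the seed.
        others = [r for r in fingerprints if r != seed]
        others.sort(key=lambda r: cosine(fingerprints[seed], fingerprints[r]),
                    reverse=True)
        for r in others[:delta]:
            if r not in top and r not in promoted:
                promoted.append(r)
    rest = [r for r in initial_ranking if r not in top and r not in promoted]
    return top + promoted + rest

# Toy fingerprints; r3 was not retrieved by text but resembles r1's signal.
fingerprints = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [0.9, 0.1],
                "r4": [0.1, 0.9], "r5": [-1.0, 0.0]}
print(case2_rank(["r1", "r2"], fingerprints, lam=1, delta=1))
# prints ['r1', 'r3', 'r2']
```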
Evaluation: Queries
Asked neurologists to provide patient cohort descriptions (queries)
Patient Cohort Descriptions (Queries)
1. History of seizures and EEG with TIRDA without sharps, spikes, or electrographic seizures
2. History of Alzheimer dementia and normal EEG
3. Patients with altered mental status and EEG showing nonconvulsive status epilepticus (NCSE)
4. Patients under 18 years old with absence seizures
5. Patients over age 18 with history of developmental delay and EEG with electrographic seizures
Evaluation: Patient Cohort Quality
• For each query
  • Identified the 10 most relevant EEG reports
  • Drew a random sample of 10 EEG reports retrieved between ranks 11 and 100
• Asked neurologists to judge whether each EEG report was “relevant”:
  • 1: the patient described in the report definitely belongs to the cohort
  • 0: the patient described in the report does not belong to the cohort
• Measured using standard information retrieval metrics:
  • Mean Average Precision (MAP)
  • Normalized Discounted Cumulative Gain (NDCG)
  • Precision at Rank 10 (P@10)
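The three metrics can be computed from binary judgments listed in rank order. This is a minimal sketch using the standard textbook definitions; normalizing average precision by the number of relevant reports found in the judged list, and computing the ideal DCG from that same list, are assumptions, and the judgment vector is invented.

```python
import math

def precision_at_k(rels, k=10):
    return sum(rels[:k]) / k

def average_precision(rels):
    """rels: binary judgments (1 or 0) in rank order."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at each relevant rank
    return total / hits if hits else 0.0

def mean_average_precision(all_rels):
    return sum(average_precision(r) for r in all_rels) / len(all_rels)

def ndcg(rels):
    def dcg(xs):
        return sum(x / math.log2(i + 1) for i, x in enumerate(xs, start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0

# Invented judgments for the top 10 reports of one query.
judgments = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
print(precision_at_k(judgments))            # prints 0.7
print(round(average_precision(judgments), 4))
print(round(ndcg(judgments), 4))
```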
Evaluation: Patient Cohort Quality
Relevance Model         MAP      NDCG     P@10
Baseline 1: BM25        52.05%   66.41%   80.00%
Baseline 2: LMD         50.37%   65.90%   80.00%
Baseline 3: DFR         46.22%   59.35%   70.00%
MERCuRY: Case 1 (a)     58.59%   72.14%   90.00%
MERCuRY: Case 1 (b)     57.95%   70.34%   90.00%
MERCuRY: Case 2 (a)     70.43%   84.62%   100.00%
MERCuRY: Case 2 (b)     69.87%   83.21%   100.00%
(a) With concept dictionary (b) Without concept dictionary
Conclusions
• Including EEG signals with EEG reports leads to improved patient cohort retrieval
• No noticeable improvement when indexing medical concepts:
  • can recognize concepts in the query only
  • maintaining positional information is sufficient to identify medical concepts in EEG reports
• Considering polarity & section information substantially improved performance
• EEG fingerprints are able to fill in the gaps in EEG reports
• Future work:
  • Tune parameters 𝜹 and 𝝀
  • Jointly encode EEG reports + EEG signals
Acknowledgements
Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under award number 1U01HG008468. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Questions?
Background: Multi-modal Retrieval
• Text REtrieval Conference (TREC) Medical Records Track
  • 2011 & 2012: systems were given natural language queries and text-only EHRs
  • Goal was to retrieve sets of electronic medical records relevant to the query
• Biomedical Multi-modal Retrieval (Demner-Fushman et al., 2012)
  • Considers both images and text of scientific articles
  • Allows users to discover similar images to those used in a scientific article
• AALIM (Syeda-Mahmood et al., 2007)
  • Allows users to discover similar ECG, echo, or audio recordings given the ECG, echo, or audio recording of a patient