Department of Computer Science and Engineering: Yixin Chen (陈一昕), Yi Mao, Minmin Chen, Rahav Dor, Greg Hackermann, Zhicheng Yang, Chengyang Lu

School of Medicine: Kelly Faulkner, Kevin Heard, Marin Kollef, Thomas Bailey

Real-Time Clinical Warning for Hospitalized Patients via Data Mining

Background

• The direct cost per day of ICU care for survivors is between six and seven times that of non-ICU care.

• Unlike ICU patients, general hospital ward (GHW) patients are not under extensive electronic monitoring and nursing care.

• Clinical studies have found that 4–17% of patients undergo cardiopulmonary or respiratory arrest while in the GHW.

Project mission

• Sudden deteriorations (e.g., septic shock, cardiopulmonary or respiratory arrest) of GHW patients can often be severe and life-threatening.

• Goal: provide early detection and intervention based on data mining
– to prevent these serious, often life-threatening events
– using both clinical data and wireless body sensor data

• An NIH-ICTS funded project, currently under clinical trials at Barnes-Jewish Hospital, St. Louis, MO

What exactly do we predict?

Is he going to die?

What exactly do we predict?

Is he going to the ICU?

System Architecture

• Tier 1: EWS (early warning system)
– Clinical data, lab tests, manually collected, low frequency

• Tier 2: RDS (real-time data sensing)
– Body sensor data, automatically collected, wirelessly transmitted, high frequency

Agenda

1. Background and overview
2. Early warning system (EWS)
3. Real-time data sensing (RDS)
5. Future work

Medical Record (34 vital signs: pulse, temperature, oxygen saturation, shock index, respirations, age, blood pressure, …)

[Figure: two vital-sign time-series plots; x-axis: time (seconds)]

Related Work

Medical data mining

• Medical knowledge
– SCAP and PSI
– Acute Physiology Score, Chronic Health Score, and APACHE score, used to predict renal failures
– Modified Early Warning Score (MEWS)

• Machine learning methods
– Decision trees
– Neural networks
– SVM

Main problem: most previous work uses a snapshot method that takes all the features at a given time as input to a model, discarding the temporal evolution of the data.

Overview of EWS

Goal: Design a data mining algorithm that can automatically identify patients at risk of clinical deterioration based on the time series in their existing electronic medical records.

[Figure: chart comparing Non-ICU and ICU; axis from 0 to 30000]

Challenges:
• Classification of high-dimensional time series data
• Irregular data gaps
• Measurement errors
• Class imbalance

Key Techniques in the EWS Algorithm

• Temporal bucketing
• Discriminative classification
• Bootstrap aggregating (bagging)
• Exploratory under-sampling
• Exponential moving average smoothing
• Kernel-density estimation

Workflow of the System

[Figure: flowchart with two phases]

(A) Model building phase: data set (D, T), data preprocessing, bucketing, bucket bagging, logistic regression, predicted model, exploratory undersampling. The loop repeats until the model converges or the iteration count is exceeded, which yields the final model.

(B) Deployment phase: real-time data stream, generation of a 24-hour window, data preprocessing, bucketing, final model, EMA smoothing. An alarm warning is raised when the smoothed output exceeds the threshold.


Data Preprocessing

Outlier removal

Normalization

Temporal Bucketing

We retain data in a sliding window of the last 24 hours and divide it evenly into 6 buckets.

To capture temporal variations, we compute several feature values for each bucket, including the minimum, maximum, and average.

[Figure: the 24-hour window divided into Bucket 1 through Bucket 6]
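The bucketing step can be illustrated with a minimal sketch. It assumes readings arrive as (timestamp in hours, value) pairs for a single vital sign; the function name `bucket_features`, the sample pulse values, and the choice to return `None` for empty buckets are all hypothetical and not taken from the slides.

```python
from statistics import mean

def bucket_features(readings, now, window_hours=24, n_buckets=6):
    """Return (min, max, mean) per bucket for one vital sign.

    readings: list of (timestamp_in_hours, value) pairs, in any order.
    Buckets with no readings yield None; how such gaps are filled is
    left open here (an assumption, the slides do not say).
    """
    width = window_hours / n_buckets
    start = now - window_hours
    buckets = [[] for _ in range(n_buckets)]
    for t, value in readings:
        if start <= t < now:  # keep only the sliding window
            buckets[int((t - start) // width)].append(value)
    return [(min(b), max(b), mean(b)) if b else None for b in buckets]

# Hypothetical pulse readings: (hours since admission, beats per minute)
pulse = [(1.0, 72), (3.5, 80), (5.0, 76), (13.0, 90), (23.0, 110), (23.5, 118)]
features = bucket_features(pulse, now=24.0)
```

Repeating this per vital sign and concatenating the per-bucket values gives one fixed-length feature vector per patient, regardless of how irregularly the measurements were taken.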

Discriminative Classification

[Figure: pipeline of clinical data, data preprocessing, temporal bucketing, classification algorithm, and output (model, threshold)]

• Logistic regression (LR)

• Support vector machine (SVM)

• Use the max, min, and avg of each bucket and each vital sign as the input features (~400 features in total).

• Use the training data to learn the model parameters.

Bootstrap Aggregating (bagging)

[Figure: multiple bootstrap samples, each training a model; the models are combined into the final model]

Advantages:

1. Handles outliers

2. Avoids over-fitting

3. Better model quality
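As a rough illustration of bagging, the sketch below trains several base learners on bootstrap samples and averages their outputs. The base learner `fit_midpoint` and the toy data are hypothetical stand-ins; the slides use logistic regression as the base model.

```python
import random

def bagged_predictor(data, fit, n_models=25, seed=0):
    """Train n_models base learners on bootstrap samples and average them."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # draw with replacement
        models.append(fit(sample))
    return lambda x: sum(m(x) for m in models) / len(models)

def fit_midpoint(sample):
    """Hypothetical base learner: cut halfway between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:  # bootstrap sample contains a single class
        constant = 1.0 if pos else 0.0
        return lambda x: constant
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 if x > cut else 0.0

# Hypothetical one-feature data: (value, label)
data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
predict = bagged_predictor(data, fit_midpoint)
risk_high = predict(10)  # fraction of models voting "positive"
risk_low = predict(0)
```

Because each model sees a different resample, a single outlier influences only some of the models, which is one intuition behind the advantages listed above.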

Biased Bucket Bagging

[Figure: bucketing feeding multiple models that are combined into the final model]

Exploratory Undersampling

[Figure: loop between the predicted model and class balancing]

Remove the appropriate records from the majority class according to their predicted values.
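One possible reading of this step is sketched below. The slide does not state the removal rule, so the rule here is an assumption: majority-class records that the current model already scores as low risk are dropped until the classes are balanced. The function name and the toy scores are hypothetical.

```python
def undersample_majority(majority, minority, score):
    """Keep the majority records the current model finds hardest.

    score(record) is the model's predicted probability of the positive
    class. Majority (negative) records with the lowest scores are already
    handled well, so they are dropped until both classes have equal size.
    This selection rule is an assumption, not taken from the slides.
    """
    ranked = sorted(majority, key=score, reverse=True)
    return ranked[:len(minority)]

# Toy records that are simply their own predicted score (hypothetical)
majority = [0.1, 0.2, 0.9, 0.4, 0.05, 0.7]
minority = [0.8, 0.95]
kept = undersample_majority(majority, minority, score=lambda r: r)
```

In an iterative scheme, the model would be retrained on `kept` plus the minority class and the step repeated until convergence or an iteration limit.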


Exponential Moving Average (EMA)

Evaluation Criteria

AUC (area under the receiver operating characteristic (ROC) curve) represents the probability that a randomly chosen positive example is correctly rated with greater suspicion than a randomly chosen negative example.
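The probabilistic reading of AUC can be computed directly by comparing every positive example with every negative one. This is a small sketch with hypothetical scores; ties are counted as half a win, which is the usual convention.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model outputs and true labels
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)  # 8 of the 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of examples; for large data sets a rank-based computation gives the same value more efficiently.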

Results on Historical Database (at specificity = 0.95)

Method  AUC      SENS     PPV      NPV      ACCU
1       0.86809  0.44753  0.29562  0.97345  0.92747
2       0.8907   0.5135   0.3386   0.9751   0.9293
3       0.91995  0.58558  0.36864  0.97871  0.93269
4       0.92108  0.60087  0.37466  0.97948  0.93342
5       0.9221   0.60961  0.37805  0.97992  0.93384

1: bucketing + logistic regression

2: bucketing + logistic regression + bagging

3: bucketing + logistic regression + bucket bagging

4: bucketing + logistic regression + biased bucket bagging

5: bucketing + logistic regression + biased bucket bagging + exploratory undersampling
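Reporting results at a fixed specificity means choosing the alarm threshold from the negative examples and then reading off the remaining metrics. The sketch below shows one way to do this; the function name and the toy scores are hypothetical, and it assumes at least one alarm is raised so that PPV is defined.

```python
import math

def metrics_at_specificity(scores, labels, target_spec=0.95):
    """Pick the threshold giving about target_spec specificity, then score."""
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    # The negative at the target_spec quantile becomes the threshold,
    # so that about target_spec of the negatives fall at or below it.
    k = math.ceil(round(target_spec * len(neg), 9)) - 1
    threshold = neg[k]
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s > threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s <= threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s > threshold)
    tn = len(neg) - fp
    return {
        "threshold": threshold,
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical scores: 20 negatives followed by 4 positives
scores = [0.05 * i for i in range(20)] + [0.97, 0.93, 0.5, 0.99]
labels = [0] * 20 + [1] * 4
m = metrics_at_specificity(scores, labels)
```

With heavy class imbalance, PPV can stay low even at high specificity, because the few false alarms among many negatives can outnumber the true alarms.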

Comparison of Various Models

Method                  AUC     SPEC     SENS     PPV      NPV      ACCU
RPART                   0.6703  0.93     0.55     0.287    0.977    0.912
SVM (linear kernel)     0.6879  0.9762   0.3997   0.4405   0.9719   0.95033
SVM (quadratic kernel)  0.6851  0.9675   0.4028   0.3676   0.9718   0.94216
SVM (cubic kernel)      0.6792  0.9681   0.3904   0.3646   0.9713   0.94216
SVM (RBF kernel)        0.6968  0.9615   0.4321   0.3448   0.9730   0.93774
Our method 5            0.9221  0.94996  0.60961  0.37805  0.97992  0.93384

Clinical Trial at Barnes-Jewish Hospital

Dates: start date 1/24/2011, last date 11/1/2011 (277 days)

ICU Transfers  Total   With alert  W/o alert
ICU transfer   510     243         267
Total          11286   1430        9856
Ratio          4.5%    17.0%       2.7%

Deaths         Total   With alert  W/o alert
Deaths         239     138         101
Total          11286   1430        9856
Ratio          2.12%   9.65%       1.02%

Alerts have already triggered early interventions that may have prevented deaths.

Agenda

1. Background & related work
2. Early warning system (EWS)
3. Real-time data sensing (RDS)
5. Future work

Overview of RDS

A challenging problem
• Classification based on multiple high-frequency real-time time series (heart rate, pulse, oxygen saturation, CO2, temperature, etc.)


Wireless Sensor Network at BJH


Overview of Learning Algorithm

Key techniques:

Feature extraction from multiple time series

Feature selection

Classification algorithms

Exploratory undersampling

A Large Pool of Features

Features:
• Detrended fluctuation analysis (DFA) features
• Approximate entropy (ApEn)
• Spectral features
• First-order features
• Second-order features
• Cross-sign features

Detrended Fluctuation Analysis (DFA)

DFA is a method for quantifying the statistical self-affinity of a time-series signal (see, e.g., Peng et al. 1994).

Applicable to both pulse rate and SpO2
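A compact sketch of first-order DFA follows, in pure Python. The window sizes in `scales`, the synthetic test signals, and the function names are assumptions for illustration; the slides do not give the settings used. The returned value is the scaling exponent, i.e. the slope of log F(n) against log n.

```python
import math
import random

def linfit_residual_ss(y):
    """Sum of squared residuals of a least-squares line through y."""
    n = len(y)
    xm = (n - 1) / 2
    ym = sum(y) / n
    sxx = sum((i - xm) ** 2 for i in range(n))
    sxy = sum((i - xm) * (v - ym) for i, v in enumerate(y))
    slope = sxy / sxx
    return sum((v - ym - slope * (i - xm)) ** 2 for i, v in enumerate(y))

def dfa_exponent(series, scales=(4, 8, 16, 32, 64)):
    """Scaling exponent from first-order DFA (non-constant series assumed)."""
    m = sum(series) / len(series)
    profile, total = [], 0.0
    for v in series:  # integrate the mean-removed series
        total += v - m
        profile.append(total)
    log_n, log_f = [], []
    for n in scales:
        n_win = len(profile) // n
        # Detrend each non-overlapping window with its own fitted line
        ss = sum(linfit_residual_ss(profile[k * n:(k + 1) * n])
                 for k in range(n_win))
        log_n.append(math.log(n))
        log_f.append(math.log(math.sqrt(ss / (n_win * n))))
    xm = sum(log_n) / len(log_n)
    ym = sum(log_f) / len(log_f)
    return (sum((x - xm) * (y - ym) for x, y in zip(log_n, log_f))
            / sum((x - xm) ** 2 for x in log_n))

# Synthetic signals: white noise and its running sum (a random walk)
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(2048)]
walk, level = [], 0.0
for step in noise:
    level += step
    walk.append(level)
alpha_noise = dfa_exponent(noise)
alpha_walk = dfa_exponent(walk)
```

Uncorrelated noise gives an exponent near 0.5 and a random walk one near 1.5, so the exponent summarizes how strongly a vital sign's fluctuations are correlated over time.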

Spectral Analysis (FFT)

We use the component values of VLF (<0.04 Hz), LF (0.04–0.15 Hz), HF (0.15–0.4 Hz), and the ratio LF/HF for each signal.
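The band features can be sketched with a plain discrete Fourier transform. The sampling rate `fs`, the synthetic two-tone signal, and the function name are assumptions; a production system would use an FFT library rather than this quadratic-time loop.

```python
import cmath
import math

# Band edges in Hz, as listed above
BANDS = {"VLF": (0.0, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.4)}

def band_powers(signal, fs):
    """Sum spectral power inside each band and add the LF/HF ratio."""
    n = len(signal)
    m = sum(signal) / n
    x = [v - m for v in signal]  # remove the mean (DC component)
    powers = dict.fromkeys(BANDS, 0.0)
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(x))
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += abs(coeff) ** 2 / n
    powers["LF/HF"] = (powers["LF"] / powers["HF"]
                       if powers["HF"] else float("inf"))
    return powers

# Synthetic signal sampled at 1 Hz: a 0.1 Hz tone (LF) with amplitude 2
# plus a 0.25 Hz tone (HF) with amplitude 1
fs = 1.0
signal = [2 * math.sin(2 * math.pi * 0.1 * t) + math.sin(2 * math.pi * 0.25 * t)
          for t in range(200)]
powers = band_powers(signal, fs)
```

Since power scales with the square of amplitude, the LF tone with twice the amplitude carries four times the power of the HF tone in this example.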

Other Features

• Approximate entropy (ApEn): quantifies the unpredictability of fluctuations in a time series.
– A low value indicates a deterministic series
– A high value indicates an unpredictable series

• First-order features:
– Mean, standard deviation
– Skewness (symmetry of the distribution), kurtosis (peakedness of the distribution)

• Second-order features: related to the co-occurrence of patterns
– First quantize a time series into Q discrete bins, then construct a pattern matrix
– Energy (E), entropy (S), correlation (COR), inertia (F), local homogeneity (LH)

• Cross-sign features: link multiple vital signs together
– Correlation: the degree of departure of two signals from independence
– Coherence: amplitude and phase about the frequencies held in common between two signals
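The second-order features can be illustrated with a short sketch. It assumes the pattern matrix counts how often bin a is immediately followed by bin b (a lag of one sample); that lag, the equal-width binning, and the function name are assumptions, since the slides do not define the matrix in detail. Only energy and entropy are shown.

```python
import math

def cooccurrence_features(series, q=4):
    """Energy and entropy of the lag-1 co-occurrence (pattern) matrix."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant series
    # Quantize each value into one of q equal-width bins
    levels = [min(int((v - lo) / span * q), q - 1) for v in series]
    counts = [[0] * q for _ in range(q)]
    for a, b in zip(levels, levels[1:]):
        counts[a][b] += 1
    total = len(levels) - 1
    probs = [c / total for row in counts for c in row if c]
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log(p) for p in probs)
    return energy, entropy

# A strictly alternating series concentrates mass in two matrix cells
energy, entropy = cooccurrence_features([0, 1, 0, 1, 0, 1, 0, 1], q=2)
```

A regular signal concentrates the matrix in few cells (high energy, low entropy), while an erratic one spreads it out.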

Forward Feature Selection

1. Start with an empty feature set.
2. Evaluate each of the remaining features together with the current feature set.
3. Pick one feature to add into the set.
4. Repeat from step 2. If no feature brings an improvement, stop and output the final feature set.
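The selection loop can be sketched as a greedy search. The scoring function `toy_auc` and the feature names are hypothetical placeholders; in practice `evaluate` would be something like a cross-validated AUC of the classifier trained on the candidate subset.

```python
def forward_select(features, evaluate):
    """Greedily add the feature that most improves evaluate(subset)."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        score, pick = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best:  # no improvement: stop
            break
        selected.append(pick)
        remaining.remove(pick)
        best = score
    return selected, best

# Hypothetical per-feature gains with a small penalty per added feature
GAIN = {"sd_hr": 0.20, "apen_hr": 0.08, "lf_spo2": 0.03, "noise": 0.0}

def toy_auc(subset):
    return 0.5 + sum(GAIN[f] for f in subset) - 0.01 * len(subset)

selected, best = forward_select(GAIN, toy_auc)
```

The search evaluates on the order of k times the number of features to pick k features, which is far cheaper than scoring every possible subset.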

Experimental Setup

Dataset: MIMIC-II (Multiparameter Intelligent Monitoring in Intensive Care II), a public-access ICU database
• The data model can be used for both GHW patients with sensors and ICU patients

Our data: collected between 2001 and 2008 from a variety of ICUs (medical, surgical, coronary care, and neonatal)

• Prediction goal: death or survival
• Real-time vital signs: heart rate and oxygen saturation rate
• Class imbalance: most patients survived
• Evaluation: based on 10-fold cross-validation

Result – Linear and Nonlinear Classification

Method  Feature  AUC     Specificity  Sensitivity  PPV     NPV
LSVM    1        0.5759  0.9497       0.0755       0.2550  0.7781
LR      1        0.4742  0.9483       0.0729       0.3181  0.7555
KSVM    1        0.5897  0.9497       0.1265       0.3643  0.7879
LSVM    2        0.4473  0.9497       0.0346       0.1300  0.7705
LR      2        0.4902  0.9483       0.0313       0.1667  0.7473
KSVM    2        0.5016  0.9497       0.0676       0.2450  0.7768
LSVM    1 & 2    0.5757  0.9497       0.1416       0.3917  0.7694
LR      1 & 2    0.5370  0.9483       0.0521       0.2500  0.7513
KSVM    1 & 2    0.6332  0.9497       0.1428       0.4146  0.7911

Methods: LSVM: linear SVM; LR: logistic regression; KSVM: RBF kernel SVM

Features: 1: DFA of heart rate; 2: DFA of oxygen saturation

Result – Feature Combinations

Algorithm            Features                          AUC
KSVM                 DFA                               0.6332
KSVM                 DFA + cross-sign features         0.6565
KSVM                 DFA + cross-sign features + ApEn  0.6753
KSVM                 All features                      0.7079
Logistic regression  DFA                               0.5370
Logistic regression  DFA + cross-sign features         0.5731
Logistic regression  DFA + cross-sign features + ApEn  0.5974
Logistic regression  All features                      0.7402

Result – Feature Selection

Method  #Selected features  AUC     Specificity  Sensitivity  PPV     NPV
KSVM    5                   0.7752  0.9654       0.4852       0.8041  0.8651
LR      23                  0.7844  0.9483       0.5208       0.7692  0.8567

LR is our first choice: better AUC, interpretability, efficiency


First 12 Selected Features (in logistic regression)

standard deviation of heart rate

ApEn of heart rate

Energy of oxygen saturation

LF of oxygen saturation

LF of heart rate

DFA of oxygen saturation

Mean of heart rate

HF of heart rate

Inertia of heart rate

Homogeneity of heart rate

Energy of heart rate

Linear correlation of heart rate and oxygen saturation

Result – Our Final Model

Method  AUC     Specificity  Sensitivity  PPV     NPV
1       0.7402  0.9500       0.3646       0.7000  0.8185
2       0.7767  0.9500       0.4615       0.9000  0.6440
3       0.8082  0.9500       0.4865       0.9000  0.6546

Method 1: Logistic Regression + all features

Method 2: Logistic Regression + all features + exploratory undersampling

Method 3: Logistic Regression + feature selection + exploratory undersampling

Current Work: Density-based LR

• Standard logistic regression, φk(x) = xk:
– P(y=1|x) = 1/(1 + exp(−∑ wk xk))
– The probability of an event (e.g., ICU transfer, death) grows or decreases monotonically with each feature
– Not true in many cases: e.g., ICU transfer rate vs. age

• Idea: transform each feature xk

Current Work: Density-based LR

• Use a kernel-density estimator to estimate p(xk, y=1) and p(xk, y=0) for each feature xk

• This results in a nonlinear separating surface that conforms to the true distribution of the data

• Advantages over KLR and SVM
– Efficiency, interpretability
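A minimal sketch of the idea follows. The slides state only that p(xk, y=1) and p(xk, y=0) are estimated by kernel density; the specific transform used here, the log-ratio of the two estimates, is an assumption, as are the Gaussian kernel, the bandwidth, and the hypothetical age data.

```python
import math

def kde_joint(samples, n_total, bandwidth):
    """Return x -> Gaussian-kernel estimate of p(x, y=c) for one class c."""
    norm = n_total * bandwidth * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                         for s in samples) / norm

def density_transform(xs, ys, bandwidth=5.0, eps=1e-12):
    """Build phi(x) = log(p(x, y=1) / p(x, y=0)) for a single feature.

    The log-ratio form is an assumed choice of transform; eps guards
    against taking the log of zero far from any sample.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    p1 = kde_joint(pos, len(xs), bandwidth)
    p0 = kde_joint(neg, len(xs), bandwidth)
    return lambda x: math.log((p1(x) + eps) / (p0(x) + eps))

# Hypothetical ages: events occur among the youngest and oldest patients
ages = [25, 28, 30, 82, 85, 88, 45, 50, 52, 55, 58, 60, 62, 65]
event = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
phi = density_transform(ages, event)
young, middle, old = phi(27), phi(55), phi(85)
```

A linear model on the raw age cannot rank both the young and the old above the middle-aged, whereas a linear model on the transformed feature can, since the transform is positive at both ends and negative in between.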

Example of Density-based LR

[Figure: results on test data, original LR vs. density-based LR]

Future Work

• Distance-based classification algorithms for multi-dimensional time series
– Dynamic time warping, information distance

• Combination of feature-based and distance-based classification algorithms
– Include distance information in the objective function

• Combining Tier-1 and Tier-2 data
– Multi-kernel methods

• Interpretation of alerts
– Based on the magnitude and sign of model coefficients


Acknowledgement

Real-Time Simulation on Historical Data (at specificity = 0.95)

Method   AUC       SENS     PPV      NPV      ACCU
1        0.6834    0.30159  0.2345   0.9634   0.9128
1 + EMA  0.78203   0.36508  0.27059  0.96664  0.9128
2        0.74359   0.30159  0.23457  0.96342  0.9293
2 + EMA  0.777737  0.38095  0.27907  0.96342  0.92134
4        0.77689   0.38905  0.27907  0.96745  0.9336
4 + EMA  0.81411   0.39683  0.28736  0.96825  0.92212
5        0.79902   0.4127   0.29545  0.96096  0.9229
5 + EMA  0.79902   0.4127   0.29545  0.96096  0.9229


(Assuming feature Independence)

Feature                                  Coefficient
Local homogeneity of heart rate          -14.50
Standard deviation of oxygen saturation  10.20
Entropy of oxygen saturation             10.17
LF of heart rate                         8.62
Local homogeneity of oxygen saturation   7.77
LF/HF of oxygen saturation               4.53
Inertia of heart rate                    3.86
Entropy of heart rate                    2.97
Low frequency of oxygen saturation       -2.89
Mean of oxygen saturation                -2.86

Why Does Bagging Work?

Let each $(D_i, y_i)$, $1 \le i \le m$, be a bucket sample that is independently drawn from $(D, y)$. $\phi(D_i, y_i)$ is the predictor.

The aggregated predictor is: $\phi_A(D, y) = E_i[\phi(D_i, y_i)]$

The average prediction error in $\phi(D_i, y_i)$ is: $e' = E_i[(y - \phi(D_i, y_i))^2]$

The error in the aggregated predictor is: $e_A = E[(y - \phi_A(D, y))^2]$

Using the inequality $(EZ)^2 \le EZ^2$ gives us $e_A \le e'$.

Algorithm Details – Biased Bucket Bagging (BBB)

[Figure: chart for BBB with 2, 3, 4, and 5 buckets, with the standard deviation shown; axis from 0 to 180000]

A critical factor deciding how much bagging will improve accuracy is the variance of these bootstrap models. We see that BBB with 4 buckets has the largest difference between $E[\phi^2(D_i, y_i)]$ and $(E[\phi(D_i, y_i)])^2$. In addition, BBB with 4 buckets has the highest standard deviation in its prediction results. We therefore choose BBB with 4 buckets as the final method.

Algorithm Details – Bucket Bagging

[Figure: bucketing feeding multiple models that are combined into the final model]

Result on Real-Time System

[Figure: cross-validation for the EMA parameter]

All cases attain their best performance when the EMA parameter is around 0.06, showing that the choice of this parameter is robust. This small optimal value shows that historical records play an important role in prediction.
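To make the role of the EMA parameter concrete, here is a small sketch of smoothing a stream of model outputs and raising an alarm when the smoothed value crosses a threshold. It assumes the common convention s_t = alpha * x_t + (1 - alpha) * s_(t-1), under which a small alpha such as 0.06 weights history heavily; the threshold of 0.5 and the two example streams are hypothetical.

```python
def ema_alarms(outputs, alpha=0.06, threshold=0.5):
    """Smooth model outputs with an EMA and flag values above threshold."""
    smoothed, alarms = [], []
    s = outputs[0]  # initialize with the first output
    for x in outputs:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
        alarms.append(s > threshold)
    return smoothed, alarms

# Hypothetical streams: one isolated spike versus a sustained rise
spike = [0.2] * 5 + [0.9] + [0.2] * 5
sustained = [0.2] * 5 + [0.9] * 60
_, spike_alarms = ema_alarms(spike)
_, sustained_alarms = ema_alarms(sustained)
```

With a small alpha, a single outlying prediction barely moves the smoothed value, while a persistent rise does trigger the alarm after a short delay.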