
Albert-Ludwigs-Universität Freiburg

Dissertation for the attainment of the doctoral degree of the Faculty of Mathematics and Physics

Statistical Analysis of Processes with Application to Neurological Data

Submitted by Malenka Andrea Mader, née Killmann, from Göppingen

February 2016


Dean: Prof. Dr. Dietmar Kröner
Thesis supervisors: Prof. Dr. Björn Schelter & Prof. Dr. Jens Timmer
First reviewer: Prof. Dr. Jens Timmer
Second reviewer: Prof. Dr. Gerhard Stock
Examiner (Theoretical Physics): Prof. Dr. Thomas Filk
Examiner (Experimental Physics): Prof. Dr. Oskar von der Lühe
Date of the oral examination: 13.05.2016


Published Articles

2012:

  • M. Killmann, L. Sommerlade, W. Mader, J. Timmer, B. Schelter. Inference of time-dependent causal influences in networks. Biomed. Eng., 57: 387–390, 2012.

2013:

• M. Mader, W. Mader, L. Sommerlade, J. Timmer, B. Schelter. Block-bootstrapping for noisy data. J. Neurosci. Meth., 219: 285–291, 2013.

2014:

  • M. Mader, J. Klatt, F. Amtage, B. Hellwig, W. Mader, L. Sommerlade, J. Timmer, B. Schelter. Spectral and higher-order-spectral analysis of tremor time series. Clin. Exp. Pharmacol., 4: 1000149, 2014.

  • B. Schelter, M. Mader, W. Mader, L. Sommerlade, B. Platt, Y.C. Lai, C. Grebogi, M. Thiel. Overarching framework for data-based modelling. Europhys. Lett., 105: 30004, 2014.

  • L. Sommerlade, M. Mader, W. Mader, J. Timmer, M. Thiel, C. Grebogi, and B. Schelter. Optimized spectral estimation for nonlinear synchronizing systems. Phys. Rev. E, 89: 032912, 2014.

  • W. Mader, Y. Linke, M. Mader, L. Sommerlade, J. Timmer, and B. Schelter. A numerically efficient implementation of the expectation maximization algorithm for state space models. Appl. Math. Comput., 241: 222–232, 2014.

  • M. Mader, W. Mader, B. J. Gluckman, J. Timmer, B. Schelter. Statistical evaluation of forecasts. Phys. Rev. E, 90: 022133, 2014.

  • J. Jacobs, T. Golla, M. Mader, B. Schelter, M. Dümpelmann, R. Korinthenberg, and A. Schulze-Bonhage. Electrical stimulation for cortical mapping reduces the density of high frequency oscillations. Epilepsy Res., 108: 1758–1769, 2014.

2015:

  • L. Sommerlade, M. Thiel, M. Mader, W. Mader, J. Timmer, B. Platt, and B. Schelter. Assessing the strength of directed influences among neural signals: An approach to noisy data. J. Neurosci. Meth., 239: 47–64, 2015.


  • W. Mader, M. Mader, J. Timmer, M. Thiel, and B. Schelter. Networks: On the relation of bi- and multivariate measures. Sci. Rep., 5: 10805, 2015.

Book Chapters

  • B. Schelter, M. Thiel, M. Mader, and W. Mader. Signal Processing of the EEG: Approaches Tailored to Epilepsy. In: R. Tetzlaff, C. E. Elger, and K. Lehnertz, eds. Recent Advances in Predicting and Preventing Epileptic Seizures: Proceedings of the 5th International Workshop on Seizure Prediction. Singapore: World Scientific Pub Co; 2013. pp. 119–131.

  • L. Sommerlade, M. Thiel, B. Platt, A. Plano, G. Riedel, C. Grebogi, W. Mader, M. Mader, J. Timmer, and B. Schelter. Time-Variant Estimation of Connectivity and Kalman's Filter. In: K. Sameshima and L.A. Baccala, eds. Methods in Brain Connectivity Inference through Multivariate Time Series Analysis. Florida: CRC Press; 2014, pp. 161–177.

Conference Talks

  • Inference of Time-Dependent Causal Influences in Networks. Biomedical Technology Conference, 2012, Jena.

  • Block Bootstrapping for Point Processes: Statistical Evaluation of the Interaction Structure. Workshop on Point Processes, 2011, Freiburg.

  • Statistik zum Anfassen (Hands-on Statistics). Weekend Seminar for Doctoral Students, 2014, Freiburg.


Contents

Glossary

Introduction

1. Statistical Inference
   1.1. Point estimation
   1.2. Interval estimation
   1.3. Bootstrap
   1.4. Hypothesis testing
   1.5. Summary and outlook

I. Statistical Inference of Process Properties

2. Time-Dependent Network Analysis
   2.1. Methodology
      2.1.1. State-space modeling for network analysis
      2.1.2. Estimation of state-space model parameters
      2.1.3. Network reconstruction from state-space parameters
      2.1.4. Statistical inference for network reconstruction
   2.2. Application
      2.2.1. Two approaches to time-resolved interaction measures
      2.2.2. Parametric bootstrap for statistical assessment
   2.3. Summary

3. Optimal Block-Length Selection
   3.1. Methodology
      3.1.1. Traditional block-length selection
      3.1.2. Effect of noise onto block-length selection
      3.1.3. Robust block-length estimation
   3.2. Application
      3.2.1. Block-length dependence on noise-to-signal ratio
      3.2.2. Traditional vs. proposed block-length selection
      3.2.3. Block bootstrap applied to tremor data
   3.3. Summary

4. Bispectral Analysis
   4.1. Methodology
      4.1.1. Spectrum and bispectrum
      4.1.2. Estimation of spectrum and bispectrum
      4.1.3. Normalizations of the bispectrum
      4.1.4. Statistical analysis of normalized bispectra
   4.2. Application
      4.2.1. Modeling first order harmonics
      4.2.2. Performance of block bootstrap-based hypothesis test
      4.2.3. Bispectral analysis of tremor time series
   4.3. Summary

5. Phase-Amplitude Coupling
   5.1. Methodology
      5.1.1. Concept of phase-amplitude coupling
      5.1.2. Measures of phase-amplitude coupling
      5.1.3. Statistical assessment of phase-amplitude coupling
   5.2. Application
   5.3. Summary

II. Statistical Assessment of Event Predictors

6. Predicting Extreme Events
   6.1. Methodology
      6.1.1. Establishing a predictor
      6.1.2. Quantifying the performance of predictors
      6.1.3. Traditional hypothesis tests for event predictors
      6.1.4. Independent assessment of positive and negative predictions
   6.2. Application
      6.2.1. Simulating prediction-observation time series
      6.2.2. Testing the proposed random predictors
   6.3. Summary

Summary

Bibliography

Appendix

A. Maximum-likelihood estimation
   A.1. Incomplete data likelihood and Kalman filter
   A.2. Kalman smoother and lag-one covariance smoother

B. Definition of the renormalized partial directed coherence

C. Rectification of the electromyogram
   C.1. Algorithm of Rectification
   C.2. Reasoning for Rectification

D. Results of spectral and bispectral analysis of tremor data


Glossary

Abbreviations

ARN[d] – N-dimensional autoregressive process of order d
EEG – Electroencephalogram
EMG – Electromyogram
ET – Essential tremor
FN – False negative
FP – False positive
H0 – Null hypothesis
iid – Independent identically distributed
IP – Intervention period
NSR – Noise-to-signal ratio
OP – Occurrence period
PT – Parkinsonian tremor
RP – Random predictor
SSM – State-space model
TN – True negative
TP – True positive

Distributions

B(k, K; p) – Binomial distribution corresponding to k events in K trials, in which p is the probability for an event
Dβ – β-distribution
N(µ, σ) – Normal distribution with expectation µ and variance σ²
U(a, b] – Uniform distribution on the interval (a, b]

Random variables and their samples

ε – Dynamic noise
η – Observation noise
R – Random variable
r – Sample of R
X(t) – Process
x(t) – Realizations of X(t)
Y(t) – Process
y(t) – Realizations of Y(t)
Z(t) – Process
z(t) – Realizations of Z(t)

Operators

det(·) – Determinant of a matrix
tr(·) – Trace of a matrix
· – Analytic signal
· – Estimator
· – Fourier transform
· – Finite Fourier transform
· – Hilbert transform
〈 〉 – Sample mean
Cov – Covariance
E – Expectation
MSE – Mean squared error
Var – Variance
⊗ – Kronecker multiplication
vec – Vec-operator
+ – Complex conjugation
′ – Transposition

Properties of Processes

B – Bispectrum
Bc – Bispectral coefficient
Bcoh – Bicoherence
CB – Cross-bispectrum
CBc – Cross-bispectral coefficient
CBcoh – Cross-bicoherence
Coh – Coherence
CS – Cross-spectrum
γ – Autocorrelation
H – Inverse of covariance
m – Skewness
R – Autocovariance
ρ – Correlation coefficient
S – Spectrum
Σ – Covariance
σ – Standard deviation
σ² – Variance

Measures

A- – False alarm rate
B – Brier score
C – Proportion correct
CP – Cross-periodogram
D – Kullback-Leibler distance
f – Feature, i.e., a general measure of one or several processes
F+ – False prediction rate
H – Heights-ratio
M – Modulation index
P – Partial directed coherence
P – Periodogram
P – Phase-amplitude plot
R – Renormalized partial directed coherence
ρ² – Coefficient of determination, or coefficient of multiple correlation
S+ – Sensitivity
S- – Specificity
sH – Heidke skill score

Multitudes

N – Dimension of a system
n+ – Number of positive predictions
n− – Number of negative predictions
nB – Number of bootstrap realizations
nBin – Number of bins
nBl – Number of blocks for the block bootstrap
nE – Number of events
nFN – Number of false negatives
nFP – Number of false positives
nnE – Number of absent events
nR – Number of repetitions
nTP – Number of true positives
nTN – Number of true negatives
T – Total number of data points

Common Matrices

0N – N × N zero matrix
1N – N × N identity matrix
A – Process matrix of an autoregressive process

Other Symbols

∼ – Distributed as
α – Significance level
αc – Confidence level
d – Maximum lag of an autoregressive process
j, k – Index, e.g., of processes
t – Time, mostly discrete
τ – Lag of an autoregressive process
fs – Sampling rate
ν – Frequency
ω – Oscillation frequency of a process
φ, ϕ – Phase or phase shift
Θ, θ – Parameter


Introduction

Physics is the science of describing natural phenomena and systems systematically. A crucial modus operandi for this is the synergy of experiment and theory. While experimental physicists describe nature by reproducible observations, theorists build models by which observations of nature may be explained. Based on these models, predictions of the system's behavior are made in the form of hypotheses. These hypotheses are then validated by experiments, such that the limits of the model's applicability may be determined. To validate whether the observations are in accordance with the predictions, methods of statistical inference are applied. In this thesis, different methods of statistical inference are presented and applied within the field of neuroscience. This interdisciplinary field combines methods of physics, mathematics, biology, medicine, and engineering in order to understand the functioning and interplay of neurons or parts of the brain.

In the history of physics, major advances are based on the interplay of experiment and theory. Building on the extensive observations of Galileo Galilei, it was the theorist Isaac Newton who derived his well-known laws of motion in the seventeenth century. Today, these laws are considered the foundation of classical physics. Within the following 200 years, classical physics was extended by thermodynamics and electromagnetism. Most effects of contemporary everyday life were well described by the theories of classical physics. At the end of the 19th century, however, these theories no longer sufficed to explain newly discovered phenomena such as the photoelectric effect. Observed by experimenters like Heinrich Hertz, its theoretical explanation, based on quantized packets of light, was derived by Albert Einstein in 1905. The usefulness of Einstein's model was tested in experiments, e.g., by Robert Millikan, who was awarded the Nobel Prize in 1923, two years after Albert Einstein. Even though Albert Einstein was awarded the Nobel Prize for his explanation of the photoelectric effect, today he is best known for his theories of relativity. His special theory of relativity resulted from the findings of Michelson and Morley, whose experiments excluded the hitherto accepted aether theory. On the basis of the special theory of relativity, Einstein postulated the general theory of relativity. He hypothesized a bending of starlight by the Sun that would exceed the bending expected based on Newton's laws of gravity [181]. The accordance of Einstein's hypothesis with Sir Arthur Eddington's observations during the solar eclipse of 1919 was considered the start of the acceptance of general relativity. Today, general relativity is a powerful theory of physics. A similarly powerful theory is provided by the Standard Model of particle physics. At the end of the 20th century, fundamental particles hypothesized by this theory were actually detected. The Higgs boson, one of the last key particles hypothesized, was finally detected at the Large Hadron Collider in 2012 [1], supporting the power of the Standard Model.

This synergy of theory and experiment has been accompanied by the application of statistics. From the onset of quantitative physics up to today, methods of statistics have been applied in order to account for the stochasticity of natural systems. There are three major reasons to account for stochasticity [170]. First, even if the system analyzed is deterministic, smaller and clearer models may be derived when minor influences are modeled stochastically. Second, observing the system induces variability that may be modeled by stochasticity, denoted observation noise. Finally, some phenomena, such as nuclear decay, are inherently stochastic. In all three cases, statistical methods are useful to quantify properties of the observations on the one hand, and to relate them to those expected from the assumed model on the other. While the first step is mostly achieved by descriptive statistics, the second step is implemented by statistical inference. The general assumption of statistical inference is that observations are sampled from a statistical ensemble. Based on observations from this ensemble, parameters that characterize the underlying system are determined. To specify the uncertainties of the resulting parameter estimates, confidence intervals are derived. To test whether a parameter conforms to a preconceived idea about the system when taking the system's stochasticity into account, hypothesis tests are conducted.

Two fundamental methods of determining and accounting for the system's variability are resampling-based and analytic methods. Resampling-based methods, like the bootstrap [48] or surrogate methods [207], aim at sampling the variability of a parameter by realizing new observations from an observed population. Analytic methods quantify the variability of a parameter based on a mathematically derived distribution of the underlying population. The major advantage of analytic methods is that their assumptions are clearer than those made when employing resampling-based methods. This results in a better specification of the variability that is to be quantified by those methods. Moreover, analytic methods are computationally less expensive. In this thesis, both resampling-based and analytic methods for statistical inference are developed to investigate properties of stochastic systems that change their state over time. Such dynamic systems are denoted processes.

In Part I, statistical methods for the inference of properties of stochastic processes are presented. Major concepts of statistical inference are summarized in Chap. 1. These concepts are point estimation, confidence-interval estimation, and hypothesis testing.

In Chap. 2, it is discussed how stochastic complex systems may be modeled by so-called autoregressive processes in a state-space model [196,199]. This model comprises the time-dependent stochastic dynamics of system components and observation noise. Furthermore, it models the interaction of components of the complex system. Different interaction measures based on autoregressive modeling have been introduced [12,36,89,105,192,227]. While the statistical properties of these measures are known for the case of time-constant interactions, the statistical assessment has been unclear for the time-dependent case. This gap is closed by a bootstrap method to statistically infer confidence intervals for such measures of interaction [190]. One of the prerequisites of the bootstrap is independence of the resampled quantities [48]. For processes, this prerequisite is generally not met, since subsequent states are correlated. The idea of the proposed bootstrap is to resample the independent residuals of the corresponding state-space model rather than the observations directly.

An alternative to model-based bootstrapping is the block bootstrap [32,119]. To this end, the observations are subdivided into independent segments, which are resampled [32]. A key parameter of the block bootstrap is the length of the segments [32,82,119]. It has been shown that the optimal block length depends on the autocorrelation of the underlying process [82]. An explicit functional dependence of block length and autocorrelation has been derived for variance estimation of so-called smooth function models, such as the mean, the variance, and the variance of the mean [82]. So far, methods of optimal block-length estimation have not accounted for stochastic effects that diminish the autocorrelation [160,187]. In Chap. 3, it is shown that this induces a bias on the estimator of the optimal block length, and a method to overcome this bias is presented [132]. This method is tested in simulations for its robustness with respect to stochasticity. Finally, it is applied to block bootstrap the confidence intervals of estimated variances when analyzing neurological data.

In Chap. 4, the block bootstrap is employed within the scope of hypothesis testing for bispectral analysis. By conventional spectral analysis, linear properties of processes and their interactions are assessed in the frequency domain [24]. A nonlinear extension of spectral analysis is bispectral analysis [214]. The bispectrum quantifies so-called three-wave coupling, a second-order nonlinearity in the frequency domain [23]. It is estimated by subdividing the observed time series into independent segments. Based on this subdivision, a block bootstrap-based hypothesis test is proposed [130]. This block bootstrap ensures the destruction of three-wave coupling. As tested in simulations, the corresponding hypothesis test is reliable [130], while its analytic counterpart [90,91] fails if the number of data points is limited, as in most applications. Finally, the block bootstrap of bispectra is applied to neurological data when investigating first-order harmonics [130].

An analytic rather than bootstrap-based statistical method is presented in Chap. 5, where phase-amplitude coupling is addressed. This wave coupling has been claimed to be an indicator of fundamental changes in the brain. The degree of phase-amplitude coupling is asserted to depend on the task performed [26,30,210,219], and to increase during learning [211] as well as in patients with pathologies like Parkinson's disease [41,127,194] and social anxiety disorder [142]. Phase-amplitude coupling is commonly quantified by measures based on the so-called phase-amplitude plot [212]. So far, resampling-based methods have been applied to statistically assess these measures [30,121,161]. Their appropriateness has been doubted [10]. In Chap. 5, an analytic alternative is proposed, which is versatile with respect to the hypothesis tests that may be derived from it. The method is based on a transformation of the phase-amplitude plot to a statistic which is χ²-distributed.

Time series analysis methods such as those presented in Part I may be employed to predict events such as thunderstorms, the formation of stars, or epileptic seizures [71,84,99,138,143,145,149,150,185,224,230]. Due to the impact of such events, it is desirable to predict them trustworthily, which is addressed in Part II. To identify trustworthiness, methods of statistics are applied [84,145]. Trustworthiness is quantified by comparing the performance of a prediction method to that of a predictor that randomly raises alarms. In Chap. 6, a corresponding statistical method based on a homogeneous Poissonian random predictor is presented. In contrast to traditional statistical methods, this random predictor allows for an independent statistical assessment of the true positive and true negative predictions of an event predictor [131]. This thesis closes with the assessment of the reliability and power of the corresponding hypothesis tests.


1. Statistical Inference

The goal of statistical inference is to deduce properties of a system by analyzing a set of its observations. Statistical inference is based on the assumption that an observed state of a stochastic system is one realization out of a larger population of states that the system can possibly be in. This population of states is mathematically described by a random variable $R$ with probability distribution $F_\theta$. The distribution $F_\theta$ relates $R$ to the observation $r$ as a function of the parameter $\theta$. Statistical inference includes three types of deductions from a set of observations. First, the parameter $\theta$ can be deduced by point estimation, as summarized in Sec. 1.1. Second, plausible ranges of this parameter can be deduced. These ranges are determined by confidence intervals of the point estimate, as summarized in Sec. 1.2. A common method for the estimation of confidence intervals is the bootstrap. It is summarized in Sec. 1.3. The bootstrap may also be used for the third type of statistical inference, which is hypothesis testing. As summarized in Sec. 1.4, the goal of hypothesis testing is to identify parameters that are not compatible with a predefined hypothesis. Hypothesis testing is thus dual to considering confidence intervals. Within this thesis, all three approaches of statistical inference are employed, as presented in the outlook in Sec. 1.5. If not indicated otherwise, the content of this chapter follows [86].

1.1. Point estimation

Let the random variables $(R_1, \ldots, R_T) =: R$ associated to the observations $(r_1, \ldots, r_T) =: r$ be independent identically distributed (iid), denoted $R_k \overset{\mathrm{iid}}{\sim} F_\theta$. A key step of point estimation is to link the parameter of interest, $\theta$, to $R$ or to the corresponding observations $r$. The statistic
$$
\hat{\theta}_R = h(R) = h(R_1, \ldots, R_T)\,, \tag{1.1}
$$
which constitutes this linkage by a function $h$, is called an estimator of $\theta$. An estimator for which $h$ is a smooth function of the expectation $\mathrm{E}[R_k]$ of iid random variables $R_k$, $k = 1, \ldots, T$, is called a smooth function-model estimator [82]. Inserting the observed sample $r$ into Eq. (1.1) instead of the random variable $R$ yields the corresponding estimate
$$
\hat{\theta}_r = h(r) = h(r_1, \ldots, r_T) \tag{1.2}
$$
of $\theta$. The empirical distribution of estimates derived from a set of different samples $r_j$, $j = 1, \ldots, n_R$, is called the sampling distribution. While an estimate is generally a real number, or a vector or matrix of real numbers, the estimator is a random variable with a distribution assigned to it. This distribution is estimated by the corresponding sampling distribution of estimates.

Desirable properties of estimators are unbiasedness and consistency. An estimator $\hat{\theta}_R$ is unbiased if its expectation equals the true value of $\theta$, i.e.,
$$
\mathrm{E}\big[\hat{\theta}_R\big] = \theta\,. \tag{1.3}
$$
If Eq. (1.3) holds only for $T \to \infty$, the estimator is asymptotically unbiased. An estimator is called consistent if its variance,
$$
\mathrm{Var}\big[\hat{\theta}_R\big] =: \sigma^2_{\hat{\theta}_R}\,, \tag{1.4}
$$
vanishes as $T$ increases. If $\sigma^2_{\hat{\theta}_R}$ is consistently estimated by some estimator $V_R$, the standard deviation of $\hat{\theta}_R$, denoted $\sigma_{\hat{\theta}_R}$, is consistently estimated by
$$
\sqrt{V_R} =: \hat{\sigma}_{\hat{\theta}_R}\,. \tag{1.5}
$$
This fact may be used for Gaussian confidence-interval estimation as derived in the following section.
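As a simple numerical illustration of these notions (Python; an added sketch, not part of the original thesis), the sampling distribution of the smooth function-model estimator $h$, here the sample mean, may be realized by drawing $n_R$ samples from a known $F_\theta$ and checking unbiasedness and consistency empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.mean                        # estimator h of Eq. (1.1): the sample mean
theta, T, n_R = 1.0, 100, 5000     # true parameter, sample size, repetitions

# Sampling distribution: estimates from n_R different samples r_j drawn from F_theta
theta_hats = np.array([h(rng.normal(loc=theta, scale=2.0, size=T))
                       for _ in range(n_R)])
bias = theta_hats.mean() - theta   # close to 0: unbiasedness, Eq. (1.3)
var = theta_hats.var()             # shrinks as T grows: consistency, Eq. (1.4)
```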

1.2. Interval estimation

Interval estimation means estimating the upper and lower bounds, $\hat{\theta}^{(l)}_R$ and $\hat{\theta}^{(u)}_R$, of a confidence interval $(\hat{\theta}^{(l)}_R, \hat{\theta}^{(u)}_R)$. Here, only symmetric confidence intervals are considered. Estimates of the bounds are obtained from the $q_l$- and $q_u$-quantiles of the sampling distribution derived from a set of estimates $\hat{\theta}_r$, respectively. The resulting confidence interval contains $\alpha_c = q_u - q_l$ of the mass of the sampling distribution. The percentage $\alpha_c$ is denoted the confidence level. For a Gaussian estimator $\hat{\theta}_R$, the confidence interval is
$$
(\hat{\theta}_r - z_q \hat{\sigma}_{\hat{\theta}_r}\,,\; \hat{\theta}_r + z_q \hat{\sigma}_{\hat{\theta}_r})\,. \tag{1.6}
$$
It is specified by the estimated standard deviation $\hat{\sigma}_{\hat{\theta}_r}$ of the estimator, cf. Eq. (1.5), and $z_q$, the $q = q_l = (100\% - q_u)$-quantile of the standard normal distribution $\mathcal{N}(0, 1)$ with zero mean and unit variance [50,169].
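As a concrete instance of Eq. (1.6), the following sketch (Python; added for illustration, assuming the sample mean as the estimator and $s/\sqrt{T}$ as its estimated standard deviation) computes the symmetric Gaussian confidence interval.

```python
import numpy as np
from scipy.stats import norm

def gaussian_ci(r, alpha_c=0.95):
    """Symmetric Gaussian confidence interval, Eq. (1.6), for the sample mean."""
    theta_hat = np.mean(r)
    sigma_hat = np.std(r, ddof=1) / np.sqrt(len(r))  # estimated std. dev. of the mean
    z_q = norm.ppf(0.5 + alpha_c / 2)                # e.g. 1.96 for alpha_c = 95%
    return theta_hat - z_q * sigma_hat, theta_hat + z_q * sigma_hat

r = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=200)
lower, upper = gaussian_ci(r, alpha_c=0.95)
```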


[Figure 1.1: Flow chart of the parametric and nonparametric bootstrap. Both branches start from the estimate $\hat{\theta}_r$ based on the observation $r = (r_1, \ldots, r_T)$. Parametric branch: estimate $F_{\hat{\theta}_r}$ by plugging $\hat{\theta}_r$, instead of the true but unknown $\theta$, into the parametrized distribution $F_\theta$. Nonparametric branch: estimate $\hat{F}_\theta$ from the observations $r$ by considering each $r_k$ equally probable and building the histogram. In either case, realize $n_B$ bootstrap samples $r^{*(j)}$, $j = 1, \ldots, n_B$, from the estimated distribution, estimate $\hat{\theta}^{*(j)}$ from each bootstrap realization $r^{*(j)}$, and take the empirical distribution of $\{\hat{\theta}^{*(j)}\}_{j=1,\ldots,n_B}$ as the sampling distribution of $\hat{\theta}$.]

1.3. Bootstrap

The bootstrap is a Monte Carlo method proposed by Efron [48] in order to estimate the distribution of an estimator $\hat{\theta}_R$ based on one observed sample $r$ of iid random variables $R_k \overset{\mathrm{iid}}{\sim} F_\theta$, for $k = 1, \ldots, T$. The bootstrap imitates the situation in which the distribution $F_\theta$ underlying the sample $r$ is known. If $F_\theta$ were known, $n_R$ new realizations $r^{(j)}$ could be sampled, and corresponding estimates $\hat{\theta}_{r^{(j)}}$ could be derived. Their sampling distribution would estimate the distribution of the estimator $\hat{\theta}_R$, such that, e.g., confidence intervals as introduced in the previous section could be derived [49,50].

In most cases, however, the underlying distribution is not known, and only one sample $r$ is available. The idea of the bootstrap is to estimate the underlying distribution $F_\theta$ based on $r$, and to realize so-called bootstrap samples $r^{*(j)} = (r^*_1, \ldots, r^*_T)^{(j)}$, $j = 1, \ldots, n_B$, from the estimated distribution. Based on these bootstrap realizations, the distribution of the estimator $\hat{\theta}_R$ may be estimated from the set of bootstrap estimates $\hat{\theta}_{r^{*(j)}} =: \hat{\theta}^{*(j)}$. Two approaches to estimating the underlying distribution $F_\theta$ are distinguished, as summarized in Fig. 1.1. The parametric approach is based on knowledge of the form of $F_\theta$. Inserting the estimate $\hat{\theta}_r$ instead of the unknown $\theta$ yields the parametric estimate $F_{\hat{\theta}_r}$ of the underlying distribution. For the nonparametric approach, no assumptions about $F_\theta$ are made. To estimate $F_\theta$ from the sample $r$, each observation $r_k$, $k = 1, \ldots, T$, is considered equally probable to be realized. The empirical distribution of the $r_k$ then yields the nonparametric estimate of $F_\theta$, denoted $\hat{F}_\theta$. Realizing a bootstrap sample $r^{*(j)}$ from this distribution is equivalent to drawing $T$ times randomly with replacement from the original set $\{r_1, \ldots, r_T\}$ of observations [48,174]. As in the parametric case, the bootstrap samples $r^{*(1)}, \ldots, r^{*(n_B)}$ yield bootstrap estimates $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(n_B)}$ of the parameter $\theta$. Their sampling distribution estimates the distribution of $\hat{\theta}_R$ consistently, both in the parametric and the nonparametric case [174].
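The nonparametric branch of Fig. 1.1 can be made concrete in a few lines. The following sketch (Python; an illustration added here, not the thesis's code) resamples the observed sample with replacement and collects the bootstrap estimates, from which, e.g., confidence intervals follow as quantiles.

```python
import numpy as np

def bootstrap_sampling_distribution(r, h, n_B=1000, rng=None):
    """Nonparametric bootstrap: draw T times with replacement from r,
    re-evaluate the estimator h on each bootstrap sample r*(j), and
    return the n_B bootstrap estimates theta*(j)."""
    rng = np.random.default_rng(rng)
    T = len(r)
    return np.array([h(rng.choice(r, size=T, replace=True))
                     for _ in range(n_B)])

# Example: 95% confidence interval for the mean from the bootstrap quantiles
r = np.random.default_rng(0).normal(loc=1.0, scale=2.0, size=200)
theta_star = bootstrap_sampling_distribution(r, np.mean, n_B=2000, rng=1)
ci = np.quantile(theta_star, [0.025, 0.975])
```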

In time series analysis, successive states of a process are usually correlated, such that one of the assumptions of the bootstrap is violated [119]. A common example of an autocorrelated process is an autoregressive model of order $d \geq 1$, given by
$$
X(t) = \sum_{\tau=1}^{d} A(\tau) X(t - \tau) + \varepsilon(t)\,, \quad \text{for } t = 1, \ldots, T\,, \tag{1.7}
$$
with appropriate initial conditions $X(-\tau)$, $\tau = 0, \ldots, d - 1$, and process matrices $A(\tau)$. Instead of bootstrapping the corresponding realizations $x(t)$, the iid residuals $\varepsilon(t) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma^2_\varepsilon)$ are bootstrapped [39,50,70]. As an alternative to bootstrapping the residuals of autoregressive models, the block bootstrap has been proposed [32]. To this end, the time series of correlated data $x(t)$, $t = 1, \ldots, T$, is subdivided into either overlapping [119] or non-overlapping segments [32], commonly denoted blocks. Instead of single data points, these blocks are bootstrapped. Generally, the nonparametric bootstrap is applied, such that bootstrap realizations are obtained by drawing with replacement from the set of blocks [32,119]. Under certain differentiability conditions on the estimator $\hat{\theta}_R$, it may be shown that block-bootstrapped sampling distributions are asymptotically consistent [174].

Due to its adaptability, bootstrap and block bootstrap methods have been widely used [15,32,49–51,70,119,160,213]. While their power has been tested for estimators for which analytic results are known [48,49,82,119,174], the bootstrap is particularly useful if analytic statistical analysis is too complex or impossible [50].
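A minimal sketch of the non-overlapping block bootstrap described above (Python; added for illustration, with the block length taken as given, its selection being the subject of Chap. 3):

```python
import numpy as np

def block_bootstrap(x, block_len, n_B=1000, rng=None):
    """Non-overlapping block bootstrap: subdivide x(t), t = 1..T, into
    blocks of length block_len, then build each bootstrap realization by
    drawing blocks with replacement and concatenating them."""
    rng = np.random.default_rng(rng)
    n_blocks = len(x) // block_len                       # number of blocks
    blocks = x[:n_blocks * block_len].reshape(n_blocks, block_len)
    idx = rng.integers(0, n_blocks, size=(n_B, n_blocks))
    return blocks[idx].reshape(n_B, n_blocks * block_len)

# Bootstrap confidence interval for the variance of an autocorrelated series
rng = np.random.default_rng(0)
x = np.zeros(1000)
for t in range(1, 1000):                                 # AR[1] realization
    x[t] = 0.7 * x[t - 1] + rng.normal()
var_star = block_bootstrap(x, block_len=20, n_B=2000, rng=1).var(axis=1)
ci = np.quantile(var_star, [0.025, 0.975])
```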

1.4. Hypothesis testing

Besides point estimation and interval estimation, hypothesis testing is the third key method of statistical inference. A hypothesis test consists of two major components. The first component is the null hypothesis, $H_0$. It is a statement about the parameter $\theta$ that is to be inferred from an observation $r$ sampled from the underlying distribution $F_\theta$. Generally, the null hypothesis is of the form $H_0: f(\theta) = f(\theta_0)$, where $f(\cdot)$ is a function of the parameter investigated. For the sake of brevity, $f(\cdot)$ is here assumed to be the identity. In common tests like the $t$- or $F$-test, the transformations are more complex [225]. The second component of a hypothesis test is the question whether $H_0$ may be classified implausible when assuming that $r$ has actually been sampled from $F_{\theta_0}$. The term "implausible" is quantified by the significance level $\alpha$. In particular, $H_0$ is considered implausible if, under $H_0$, the probability for values equal to or more extreme than $\hat{\theta}_r$ is lower than $\alpha$. Mathematically, the hypothesis test refers to the comparison of the corresponding quantiles of the null distribution of $\hat{\theta}_R$ to the estimate $\hat{\theta}_r$ derived from the observation $r$. The null distribution is the distribution of the estimator $\hat{\theta}_R$ under the assumption that $H_0$ applies. For two-sided hypothesis tests, the reference quantiles are the $\frac{\alpha}{2}$- and $(100\% - \frac{\alpha}{2})$-quantiles of the null distribution. When rejecting $H_0$, the alternative, abbreviated $H_1: \theta \neq \theta_0$, is considered true. Alternatives corresponding to one-sided hypothesis tests are either $H_1: \theta < \theta_0$ or $H_1: \theta > \theta_0$. Reference quantiles of the respective one-sided hypothesis tests are the $\alpha$- and $(100\% - \alpha)$-quantiles of the null distribution.

While the null distribution is key to hypothesis testing, it is not known in most applications. To derive the null distribution, the bootstrap may be a powerful option. For this, it is decisive to ensure that the distribution $\hat{F}_\theta$ or $F_{\hat{\theta}_r}$ estimated within the scope of the nonparametric or parametric bootstrap is in accordance with $H_0$ [50]. Then the hypothesis test may yield correct conclusions. A hypothesis test is correct if the probability to reject $H_0$ in case $H_0$ applies is $\alpha$. A test is called reliable if it keeps the size correct, such that the probability to reject $H_0$ if $H_0$ applies does not exceed $\alpha$. If this probability is below $\alpha$, the test is called conservative; it rejects $H_0$ less often than admissible.

Performing a null hypothesis test is dual to estimating confidence intervals. The $(100\% - \alpha)$-confidence interval of a parameter $\theta$ comprises those values $\theta_0$ for which the null hypothesis test would not be rejected at the significance level $\alpha$.
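The following sketch (Python; an added illustration, assuming that $H_0$ concerns the mean so that the resampled data can be shifted into accordance with $H_0$) samples the null distribution by the nonparametric bootstrap and performs the two-sided test via its quantiles.

```python
import numpy as np

def bootstrap_test_mean(r, theta_0, alpha=0.05, n_B=2000, rng=None):
    """Two-sided bootstrap test of H0: theta = theta_0 for the mean.
    The sample is shifted so that its empirical distribution is in
    accordance with H0; the null distribution of the estimator is then
    sampled and compared to the estimate via its quantiles."""
    rng = np.random.default_rng(rng)
    r0 = r - np.mean(r) + theta_0           # enforce H0 on the resampled data
    null = np.array([np.mean(rng.choice(r0, size=len(r0), replace=True))
                     for _ in range(n_B)])
    lo, hi = np.quantile(null, [alpha / 2, 1 - alpha / 2])
    return not (lo <= np.mean(r) <= hi)     # True: reject H0

r = np.random.default_rng(0).normal(loc=1.3, scale=2.0, size=200)
reject = bootstrap_test_mean(r, theta_0=1.0, alpha=0.05, rng=1)
```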

1.5. Summary and outlook

Throughout this thesis, the statistical concepts summarized in this chapter are applied to infer properties of processes (Part I) or predictors (Part II) based on observed time series. The notion of point estimators appears in all chapters. In Chap. 2, the parameters of autoregressive processes within dynamical complex systems are estimated. Based on these parameters, measures of interaction are derived. Plausible ranges of these measures are quantified by bootstrapped confidence intervals. In Chap. 3, the block bootstrap is employed to estimate confidence intervals for smooth function models. In the remainder of this thesis, hypothesis tests rather than confidence intervals are employed. The block bootstrap is applied to sample the null distribution within the scope of bispectral analysis in Chap. 4. It is shown that it outperforms the test corresponding to an analytic null distribution. A hypothesis test based on an analytic null distribution is designed for phase-amplitude coupling in Chap. 5. For event prediction, analytic null distributions are derived in Part II, Chap. 6.


Part I.

Statistical Inference of Process Properties


2. Time-Dependent Network Analysis

This chapter is based on the publications

M. Killmann, L. Sommerlade, W. Mader, J. Timmer, B. Schelter. Inference of time-dependent causal influences in networks. Biomed. Eng., 57: 387–390, 2012. [108]

B. Schelter, M. Mader, W. Mader, L. Sommerlade, B. Platt, Y.C. Lai, C. Grebogi, M. Thiel. Overarching framework for data-based modelling. Europhys. Lett., 105: 30004, 2014. [190]

Complex systems are investigated in various fields of science, including nuclear and molecular physics [13,60,153], quantum [47,120], laser [56,179,201], and statistical physics [3]. The investigation of complex systems extends even to the social sciences [63,77,98,158] and the neurosciences [19,28,73,140]. In the latter, the interplay of neurons [19,73] or parts of the brain [28,140] is analyzed within the scope of complex systems. The analysis of complex systems aims at understanding the dynamics of each subsystem as well as its interrelation with the remainder of the system [27,117]. Among other techniques, network theory is applied [151]. The complex system is considered as a network with nodes that are interrelated by links [18,204]. Different types of links are distinguished: weighted links are distinguished from unweighted links, direct from indirect ones [94], and directed from undirected ones [151]. In the case of weighted links, numbers corresponding to the degree of interaction are assigned to the connections between nodes; unweighted links refer to binary linkage. An indirect link connects two nodes by bilateral linkages to mediating nodes, while a direct link connects two nodes without intermediate nodes [37,94]. Directed links indicate the direction of information flow, while undirected links only indicate that the attached nodes interact at all [151]. This third distinction is based on the concept of Granger causality [79]. Its fundamental assumption is that causes need to precede their effects. Node A is Granger-causal for node B given all other nodes of the network if the future of B is predicted with a smaller forecast error when knowledge about A is included in the prediction [79]. A multitude of measures quantifying Granger causality has been developed [12,36,89,105,192,227]. These measures aim at reconstructing links in the network based on recordings of the dynamics of the network nodes.

Once links are identified, networks may be described by their properties, such as the average shortest path length and the degree of centrality [16,134,222]. Based on such properties, e.g., the network's susceptibilities to errors may be identified. According to these susceptibilities, optimal mechanisms to prevent damage with high impact, such as power grid blackouts [156,232] or disease spreading [75,167], may be developed.

When drawing such conclusions, it is necessary to distinguish true links from spurious ones. Since measures of interaction are applied to observations of the complex network, methods of statistical inference need to be employed to identify true links. Common statistical methods are tailored to time-constant interaction measures [105,188,192]. Applying these methods to time-resolved interaction measures would not account for the correlation of subsequent time points. In this chapter, a parametric bootstrap method is proposed. Exemplarily, it is applied to estimate the confidence intervals of the renormalized partial directed coherence, a measure quantifying Granger causality [192].

While the methodological foundations and the bootstrap procedure are presented in Sec. 2.1, its application in simulation studies assessing the performance of the bootstrap is presented in Sec. 2.2.

2.1. Methodology

In Sec. 2.1.1, it is derived that stochastic processes may be modeled time-discretely by time-dependent autoregressive processes, as published in [190]. Observations of these processes are modeled by the state-space model [92]. A standard technique for estimating the parameters of the state-space model is maximum-likelihood estimation [196]. It is summarized in Sec. 2.1.2. Based on maximum-likelihood parameter estimates, time-constant and time-varying interaction measures, as summarized in Sec. 2.1.3, have been defined [12,36,89,105,192,199,227]. The bootstrap method by which confidence intervals may be estimated, even if the linkage of nodes varies over time, is presented in Sec. 2.1.4. It is published in [190].

2.1.1. State-space modeling for network analysis

Let a complex system consist of $N$ processes $\{X_1(t), \ldots, X_N(t)\}$, which are described by a set of coupled Ito stochastic differential equations [74],
$$
\mathrm{d}X_k(t) = f_k\big(X_1(t), \ldots, X_N(t), \theta_k\big)\,\mathrm{d}t + \mathrm{d}W_k(t)\,, \quad \text{for } k = 1, \ldots, N\,, \tag{2.1}
$$
where $t$ denotes continuous time. The dynamics of $X_k(t)$ is described by the function $f_k$ parametrized by $\theta_k$. The dynamics of $X_k(t)$ depends not only on itself but also on all other processes $X_j(t)$, $j \neq k$. Stochasticity is reflected by the Wiener process $W_k(t)$, which is defined by [169]

(a) $W_k(0) = 0$,
(b) $\mathrm{E}[W_k(t)] = 0$, for all $t > 0$,
(c) $W_k(t)$ is normal, for all $t > 0$, and
(d) stationary independent increments $\mathrm{d}W_k(t)$.


Setting $W(t) := (W_1(t), \ldots, W_N(t))'$ with independent $W_k(t)$, $k = 1, \ldots, N$, and $X(t) := (X_1(t), \ldots, X_N(t))'$, $f := (f_1, \ldots, f_N)'$, $\Theta = (\theta_1, \ldots, \theta_N)'$, where $'$ denotes transposition, the stochastic differential equation (2.1) is reformulated [74],
$$
\mathrm{d}X(t) = f\big(X(t), \Theta\big)\,\mathrm{d}t + \mathrm{d}W(t)\,. \tag{2.2}
$$
The network of $N$ processes is then described by the vector-valued solution $X(t)$. The Ito stochastic differential equation (2.2) is solved in the interval $[t, t + \Delta t]$ by [74,190]
$$
X(t + \Delta t) = X(t) + \int_t^{t+\Delta t} f\big(X(\tau), \Theta\big)\,\mathrm{d}\tau + \int_t^{t+\Delta t} \mathrm{d}W(\tau)\,, \tag{2.3}
$$
where $\Delta t$ is the integration step. The deterministic integral is
$$
\int_t^{t+\Delta t} f\big(X(\tau), \Theta\big)\,\mathrm{d}\tau = B(t)\,X(t)\,, \tag{2.4}
$$
where $B(t)$ is an appropriate time-dependent $(N \times N)$-matrix [190]. In case $X(t)$ is a linear process, the matrix entry $B_{k_1 k_2}(t)$ corresponds to the linkage strength from node $k_2$ onto node $k_1$ at time $t + \Delta t$. If $X(t)$ is nonlinear, $B(t)$ contains $X(t)$ besides the linkages of nodes.

The stochastic term
$$
\int_t^{t+\Delta t} \mathrm{d}W(\tau) =: I \tag{2.5}
$$
in Eq. (2.3) is an Ito integral [95]. Since the Ito integral is defined as the mean-square limit of a linear combination of normal random variables, $I$ itself is a random variable fully characterized by its first two moments [74]. Its expectation is zero by definition (b). Its covariance is proportional to $\Delta t$, since each component $W_k(t)$ is a solution of the Fokker-Planck equation with zero drift coefficient and diffusion coefficient $\sigma_{W_k}$ [74]. Accordingly, expression (2.5) is solved by a Gaussian process $\varsigma(t) \sim \mathcal{N}(0_N, \Delta t\,\Sigma_\varsigma)$ with diagonal matrix $\Sigma_\varsigma$, or equivalently, by $\sqrt{\Delta t}\,\zeta(t)$ with $\zeta(t) \sim \mathcal{N}(0_N, \Sigma_\zeta)$ and $\Sigma_\zeta$ a $\Delta t$-independent diagonal covariance matrix. Inserting this and Eq. (2.4) into Eq. (2.3) yields
$$
X(t + \Delta t) = A(t)\,X(t) + \sqrt{\Delta t}\,\zeta(t)\,, \tag{2.6}
$$
with time-dependent process matrix $A(t) = 1_N + B(t)$ and noise $\zeta(t) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\zeta)$ [190].

The process $X(t)$ is observed at discrete time points $t_i = i\,\delta t$, $i = 1, \ldots, T$, with a sampling rate $f_s = \frac{1}{\delta t}$. This is modeled by [190]
$$
Y(t_i) = g\big(X(t_i), \Psi\big) + \eta(t_i)\,, \tag{2.7}
$$
i.e., with observation noise $\eta(t_i)$ and an observation function $g$. The observation noise $\eta(t_i) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\eta)$ is uncorrelated across the $M$ observed nodes, such that $\Sigma_\eta$ is a diagonal matrix of size $(M \times M)$. The observation function $g$ models the measurement device specified by parameters $\Psi$. In the case of linearity, $g(X(t_i), \Psi)$ is replaced by $C X(t_i)$, with $C$ an $(M \times N)$-matrix containing the parameters $\Psi$ [190].

In general, the integration step $\Delta t$ is much smaller than the sampling time $\delta t$, e.g., $\delta t = n\Delta t$, $n > 1$. Since this may be incorporated by ensuring that only each $n$-th data point of $X(t)$ is observed in Eq. (2.7) [123], here $\Delta t = \delta t$ is set without loss of generality. The observation of the network is then modeled by [199]
$$
X(t_i) = A(t_i)\,X(t_{i-1}) + \varepsilon(t_i)\,, \quad \varepsilon(t_i) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\varepsilon)\,, \tag{2.8}
$$
$$
Y(t_i) = C X(t_i) + \eta(t_i)\,, \quad \eta(t_i) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\eta)\,. \tag{2.9}
$$
Equation (2.8) is called the state equation [196]. It is an $N$-dimensional autoregressive process of order 1 (AR$_N[1]$), cf. Eq. (1.7) [169]. The noise component $\varepsilon(t_i)$ contains the square root of the integration step, $\sqrt{\Delta t}$. Equation (2.9) is called the observation equation, containing the observation noise $\eta(t_i)$ [196]. Together, Eqs. (2.8) and (2.9) are called the state-space model (SSM) [196]. In the following, the discrete time point $t_i$ is denoted $t$, and $t_{i+1}$ is denoted $t + 1$. To distinguish the case in which the parameter matrix $A(t)$ varies over time from the case in which it is time-constant, Eqs. (2.8) and (2.9) are denoted the time-varying or time-invariant SSM, respectively.
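To fix intuition for Eqs. (2.8) and (2.9), the following sketch (Python; an illustrative two-node example added here, not taken from the thesis) simulates a time-invariant AR$_2[1]$ state process with a unidirectional link and superimposes observation noise.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
A = np.array([[0.9, 0.0],     # node 1 driven by itself only
              [0.5, 0.8]])    # node 2 additionally driven by node 1
C = np.eye(2)                 # both nodes observed directly
x, y = np.zeros((T, 2)), np.zeros((T, 2))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.normal(scale=1.0, size=2)  # state eq. (2.8)
    y[t] = C @ x[t] + rng.normal(scale=0.5, size=2)      # observation eq. (2.9)
```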

In analogy to higher-order differential equations, higher time lags $d > 1$ can be incorporated into the state equation (2.8), such that
$$
X(t) = \sum_{\tau=1}^{d} A(\tau, t)\,X(t - \tau) + \varepsilon(t) \tag{2.10}
$$
with a parameter matrix $A(\tau, t)$ for each lag $\tau = 1, \ldots, d$ [169]. This is an $N$-dimensional autoregressive process of order $d$, denoted AR$_N[d]$. By setting
$$
X(t) = \begin{pmatrix} X(t) \\ X(t-1) \\ \vdots \\ X(t-d+1) \end{pmatrix}, \tag{2.11}
$$
an AR$_N[d]$ is reducible to an AR$_{Nd}[1]$,
$$
X(t) = A(t)\,X(t-1) + \varepsilon(t) \tag{2.12}
$$
with an augmented parameter matrix
$$
A(t) = \begin{pmatrix}
A(1, t) & A(2, t) & A(3, t) & \cdots & A(d-1, t) & A(d, t) \\
1_N & 0_N & 0_N & \cdots & 0_N & 0_N \\
0_N & 1_N & 0_N & \cdots & 0_N & 0_N \\
0_N & 0_N & 1_N & \ddots & 0_N & 0_N \\
\vdots & \vdots & & \ddots & \vdots & \vdots \\
0_N & 0_N & 0_N & \cdots & 1_N & 0_N
\end{pmatrix}, \tag{2.13}
$$

containing not only the parameters $A(\tau, t)$ but also $N$-dimensional zero matrices, $0_N$, and corresponding identity matrices, $1_N$ [196]. The noise vector and its covariance matrix are augmented to $\varepsilon(t) \sim \mathcal{N}(0_{Nd}, \Sigma_\varepsilon)$ with the $(Nd \times Nd)$ covariance matrix [196],
$$
\Sigma_\varepsilon = \begin{pmatrix}
\Sigma_\varepsilon & 0_N & \cdots & 0_N \\
0_N & 0_N & \cdots & 0_N \\
\vdots & \vdots & & \vdots \\
0_N & \cdots & \cdots & 0_N
\end{pmatrix}. \tag{2.14}
$$
Choosing the observation matrix [196],
$$
C = \begin{pmatrix} C & 0_N & \cdots & 0_N \end{pmatrix} \tag{2.15}
$$
with $d - 1$ zero matrices $0_N$, ensures that the observation, $Y(t)$, remains unaltered by the transformation from an AR$_N[d]$ into an AR$_{Nd}[1]$. Without loss of generality, it is thus sufficient to consider time-invariant and time-varying SSMs that incorporate an AR$_{\tilde{N}}[1]$, where $\tilde{N} = Nd$ is chosen appropriately according to the model order $d$ and the dimension $N$ of the network [196].
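The reduction of an AR$_N[d]$ to an AR$_{Nd}[1]$ amounts to assembling the block companion matrix of Eq. (2.13); a short sketch (Python, added for illustration):

```python
import numpy as np

def companion_matrix(A_lags):
    """Augmented (Nd x Nd) parameter matrix of Eq. (2.13) from the lag
    matrices A(1), ..., A(d), each of shape (N, N)."""
    d, N, _ = A_lags.shape
    top = np.hstack(list(A_lags))       # first block row: (A(1) ... A(d))
    shift = np.eye(N * (d - 1), N * d)  # identity blocks on the sub-diagonal
    return np.vstack([top, shift])

# Example: a univariate AR_1[2] with A(1) = 0.5 and A(2) = -0.3
A_aug = companion_matrix(np.array([[[0.5]], [[-0.3]]]))
# A_aug == [[0.5, -0.3],
#           [1.0,  0.0]]
```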

2.1.2. Estimation of state-space model parameters

A powerful method to estimate the parameters $\Theta$ of the SSM is maximum-likelihood estimation [196]. It is first reviewed for the time-invariant SSM. The extensions necessary for the time-varying SSM are summarized afterwards.

Maximum-likelihood estimation for the time-invariant SSM

If not stated otherwise, this passage follows [196].

A set of random variables $\{R_k\}_{k=1,\ldots,T}$ is characterized by its joint probability density $G_\Theta(R_1, \ldots, R_T)$, parametrized by a set of parameters $\Theta$ [169]. For a fixed set of parameters $\Theta_0$, $G_{\Theta_0}(R_1, \ldots, R_T)$ is the probability of a realization $(r_1, \ldots, r_T)$ in which each random variable $R_k$ takes the value $r_k$, for $k = 1, \ldots, T$. In maximum-likelihood estimation [65], the situation is opposite. The likelihood [169]
$$
L(r_1, \ldots, r_T \,|\, \Theta) = G_\Theta(R_1 = r_1, \ldots, R_T = r_T)\,, \tag{2.16}
$$
of the observed data $(r_1, \ldots, r_T)$ is a function of the set of parameters $\Theta$ [159]. In maximum-likelihood estimation, $L(r_1, \ldots, r_T \,|\, \Theta)$ is maximized with respect to $\Theta$ [4,65,169]. Since in most applications the random variables are iid normally distributed and the logarithm is a monotonic function, it is common to minimize the log-likelihood $\mathcal{L}(r_1, \ldots, r_T \,|\, \Theta)$, i.e., the negative logarithm of the likelihood without constant terms, instead. Maximum-likelihood estimators $\hat{\Theta}$ are asymptotically unbiased [169]. They are normally distributed with asymptotic variance given by the Cramer-Rao bound [169].


The parameters of the time-invariant SSM, cf. Eqs. (2.8) and (2.9),
$$
X(t) = A X(t-1) + \varepsilon(t)\,, \quad \varepsilon(t) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\varepsilon)\,, \tag{2.17}
$$
$$
Y(t) = C X(t) + \eta(t)\,, \quad \eta(t) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\eta)\,, \tag{2.18}
$$
are $\Theta := \{\mu_0, \Sigma_0, A, \Sigma_\varepsilon, \Sigma_\eta\}$, containing also the expectation $\mu_0$ of the initial value $X(0)$ and its covariance $\Sigma_0$. To estimate these parameters by maximum-likelihood estimation from sampled observations $y(1), \ldots, y(T)$, two approaches are conceivable. For the first approach, the incomplete data log-likelihood is minimized. The incomplete data log-likelihood is the log-likelihood of the innovations,
$$
\xi_\Theta(t) := y(t) - \hat{y}(t|t-1)\,, \tag{2.19}
$$
see App. A.1. The innovations $\xi_\Theta(t)$ are the residuals of the observation $y(t)$ and the observation expected when taking previous observations into account, i.e.,
$$
\hat{y}(t|t-1) := \mathrm{E}[y(t)\,|\,y(1), \ldots, y(t-1)] = C\,\mathrm{E}[x(t)\,|\,y(1), \ldots, y(t-1)]\,. \tag{2.20}
$$
$\mathrm{E}[x(t)\,|\,y(1), \ldots, y(t)]$ is the expectation with respect to the probability distribution of the underlying state $X(t)$ when the observations $y(1), \ldots, y(t)$ have been made up to time point $t$. To derive this conditional expectation, the Kalman filter, as summarized in App. A.1, may be employed. Finally, maximum-likelihood parameters may be obtained by numerically finding the minimum of the incomplete data log-likelihood. The major drawback of this approach is that the resulting parameter estimates need not correspond to stationary processes. For exponential families of probability distributions, stationarity of the processes is ensured, on the contrary, when applying the second approach [195]. It is based on the complete data log-likelihood $\mathcal{L}_c\big(x(1), \ldots, x(T), y(1), \ldots, y(T)\,|\,\Theta\big)$. Other than the incomplete data log-likelihood of the first approach, the complete data log-likelihood incorporates the hidden states $x(t)$ as well. Since only $y(1), \ldots, y(T)$ are observed, the expectation of $\mathcal{L}_c$ conditioned on these observations is minimized in the second approach. To this end, the Expectation-Maximization algorithm [44] is applied. Each iteration $j$ of this algorithm consists of an expectation step and a maximization step. For the SSM, the expectation step consists of deriving the conditional expectation of the log-likelihood,

$$
\begin{aligned}
&\mathrm{E}\Big[\mathcal{L}_c\big(x(1), \ldots, x(T), y(1), \ldots, y(T)\,\big|\,\Theta\big)\,\Big|\,y(1), \ldots, y(T);\, \hat{\Theta}_j\Big] \\
&\quad= \ln\big(\det(\Sigma_0)\big) + \mathrm{tr}\Big[\Sigma_0^{-1}\Big(P(0|T) + \big(\hat{x}(0|T) - \mu_0\big)\big(\hat{x}(0|T) - \mu_0\big)'\Big)\Big] \\
&\qquad+ T\ln\big(\det(\Sigma_\varepsilon)\big) + \mathrm{tr}\Big[\Sigma_\varepsilon^{-1}\big(S_{11} - S_{10}A' - A S_{10}' + A S_{00} A'\big)\Big] \\
&\qquad+ T\ln\big(\det(\Sigma_\eta)\big) + \mathrm{tr}\Big[\Sigma_\eta^{-1}\sum_{t=1}^{T}\big(v v' + C P(t|T)\,C'\big)\Big]\,,
\end{aligned} \tag{2.21}
$$

with $v = y(t) - C\hat{x}(t|T)$ and
$$
S_{11} = \sum_{t=1}^{T}\Big(\hat{x}\big(t|T\big)\,\hat{x}\big(t|T\big)' - P\big(t|T\big)\Big)\,, \tag{2.22}
$$
$$
S_{00} = \sum_{t=1}^{T}\Big(\hat{x}\big(t-1|T\big)\,\hat{x}\big(t-1|T\big)' - P\big(t-1|T\big)\Big)\,, \tag{2.23}
$$
$$
S_{10} = \sum_{t=1}^{T}\Big(\hat{x}\big(t|T\big)\,\hat{x}\big(t-1|T\big)' - P\big(t, t-1|T\big)\Big)\,, \tag{2.24}
$$
where $'$ denotes transposition, $\det(\cdot)$ the determinant, and $\mathrm{tr}(\cdot)$ the trace. Equation (2.21) yields the expected complete data log-likelihood conditioned on the observations $y(1), \ldots, y(T)$ and the parameters $\hat{\Theta}_j = \{\hat{\mu}_0, \hat{\Sigma}_0, \hat{A}, \hat{\Sigma}_\varepsilon, \hat{\Sigma}_\eta\}$ of the $j$-th iteration, containing the conditional expectations

$$
\hat{x}\big(t|T\big) := \mathrm{E}\big[x(t)\,\big|\,y(1), \ldots, y(T);\, \hat{\Theta}_j\big]\,, \tag{2.25}
$$
$$
P\big(t|T\big) := \mathrm{E}\big[\big(x(t) - \hat{x}(t|T)\big)\big(x(t) - \hat{x}(t|T)\big)'\,\big|\,y(1), \ldots, y(T);\, \hat{\Theta}_j\big]\,, \tag{2.26}
$$
$$
P\big(t, t-1|T\big) := \mathrm{E}\big[\big(x(t) - \hat{x}(t|T)\big)\big(x(t-1) - \hat{x}(t-1|T)\big)'\,\big|\,y(1), \ldots, y(T);\, \hat{\Theta}_j\big]\,. \tag{2.27}
$$
In contrast to $\mathrm{E}[\,\cdot\,|\,y(1), \ldots, y(t)]$ occurring in the incomplete data log-likelihood, the expectations in Eqs. (2.25)–(2.27) are conditioned on all observed data $y(1), \ldots, y(T)$ and the $j$-th iteration parameter estimate, $\hat{\Theta}_j$. Estimates of the quantities (2.25)–(2.27) may be obtained from the Kalman smoother and the lag-one covariance smoother, as summarized in App. A.2 [178]. Initial values for both smoothers are derived from the Kalman filter, see App. A.1.

In the second step of the Expectation-Maximization algorithm, the parameters most likely having caused the observations $y(1), \ldots, y(T)$ are determined. They are the parameters that minimize the expectation of the complete data log-likelihood of the current iteration $j$. In particular, these parameters are
$$
\hat{A} = S_{10}\,S_{00}^{-1}\,, \tag{2.28}
$$
$$
\hat{\Sigma}_\varepsilon = \frac{1}{T}\big(S_{11} - S_{10}\,S_{00}^{-1}\,S_{10}'\big)\,, \tag{2.29}
$$
$$
\hat{\Sigma}_\eta = \frac{1}{T}\sum_{t=1}^{T}\Big[\big(y(t) - C\hat{x}(t|T)\big)\big(y(t) - C\hat{x}(t|T)\big)' + C P(t|T)\,C'\Big]\,. \tag{2.30}
$$
They constitute the set of parameters $\hat{\Theta}_{j+1}$ for the next iteration of the Expectation-Maximization algorithm. Iterating thus reduces maximum-likelihood estimation in the SSM to the following algorithm.

Choose an initial set of parameters $\hat{\Theta}_0$ and iterate the Expectation-Maximization algorithm for $j = 0, 1, \ldots$ until the parameter sets of subsequent iterations do not change up to a predefined precision.

E Expectation step ($j$-th iteration):
For the set of parameters $\hat{\Theta}_j$, employ the Kalman smoother to obtain $\hat{x}(t|T)$ and $P(t|T)$, and the lag-one covariance smoother to obtain $P(t, t-1|T)$, as summarized in App. A.2, for all time points $t$. The initial values of the Kalman smoother and the lag-one covariance smoother are obtained from the Kalman filter, as summarized in App. A.1.

M Maximization step ($j$-th iteration):
Evaluate the maximization equations (2.28)–(2.30), using $S_{11}$, $S_{00}$, $S_{10}$ as given by Eqs. (2.22)–(2.24), to obtain the set of parameters $\hat{\Theta}_{j+1}$ for the next iteration of the Expectation-Maximization algorithm.

It has been shown that constraints on the parameters may be built into the algorithm [195]. Furthermore, criteria that ensure convergence have been identified [228].
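A schematic implementation of one such iteration is sketched below (Python). It is not the thesis's implementation: `kalman_smoother` is a hypothetical helper standing in for the Kalman smoother and lag-one covariance smoother of Apps. A.1 and A.2, and the sums follow Eqs. (2.22)–(2.24) as printed above.

```python
import numpy as np

def em_iteration(Theta_j, y):
    """One Expectation-Maximization iteration for the time-invariant SSM.

    `kalman_smoother` is a hypothetical helper (cf. Apps. A.1-A.2): for
    the current parameters Theta_j it returns the smoothed states
    xs[t] = x(t|T) and covariances Ps[t] = P(t|T) for t = 0..T, and the
    lag-one covariances Plag[t] = P(t, t-1|T), cf. Eqs. (2.25)-(2.27)."""
    xs, Ps, Plag = kalman_smoother(Theta_j, y)           # expectation step
    T, C = len(y), Theta_j["C"]

    # Smoother sums, Eqs. (2.22)-(2.24)
    S11 = sum(np.outer(xs[t], xs[t]) - Ps[t] for t in range(1, T + 1))
    S00 = sum(np.outer(xs[t - 1], xs[t - 1]) - Ps[t - 1] for t in range(1, T + 1))
    S10 = sum(np.outer(xs[t], xs[t - 1]) - Plag[t] for t in range(1, T + 1))

    # Maximization step, Eqs. (2.28)-(2.30)
    A = S10 @ np.linalg.inv(S00)
    Sigma_eps = (S11 - S10 @ np.linalg.inv(S00) @ S10.T) / T
    Sigma_eta = sum(np.outer(y[t - 1] - C @ xs[t], y[t - 1] - C @ xs[t])
                    + C @ Ps[t] @ C.T for t in range(1, T + 1)) / T
    return {**Theta_j, "A": A, "Sigma_eps": Sigma_eps, "Sigma_eta": Sigma_eta}

# The iteration is repeated until subsequent parameter sets agree up to a
# predefined precision.
```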

Maximum-likelihood estimation in the time-varying SSM

If the parameter matrix $A(t)$ of the SSM varies over time, the maximum-likelihood method has to be adapted. This passage follows the derivation in [199], if not stated otherwise.

The key step of maximum-likelihood estimation for time-varying parameter matrices, $A(t)$, is to assume that the time scale of variation of $A(t)$ can be separated from that of $\varepsilon(t)$. To this end, each component of $A(t)$ is considered as a process with bounded variation. All components of $A(t)$ are subsumed in the vector $U(t) = \mathrm{vec}\,A(t)$. Instead of fitting the time-varying SSM, Eqs. (2.8)–(2.9), the extended model
$$
U(t) = U(t-1) + \varpi(t)\,, \quad \varpi(t) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\varpi)\,, \tag{2.31}
$$
$$
X(t) = A(t-1)\,X(t-1) + \varepsilon(t)\,, \quad \varepsilon(t) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\varepsilon)\,, \tag{2.32}
$$
$$
Y(t) = C X(t) + \eta(t)\,, \quad \eta(t) \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \Sigma_\eta)\,, \tag{2.33}
$$

is fitted. Parameter estimation in the extended SSM is possible by applying the extended Kalman filter to the joint state vector $(U(t), X(t))$, yielding a nonlinear joint state space. Alternatively, Eqs. (2.31)–(2.33) can be split up into the dual SSM. It consists of an SSM for the states $X(t)$,
$$
X(t) = A(t-1)\,X(t-1) + \varepsilon(t)\,, \tag{2.34}
$$
$$
Y(t) = C_X X(t) + \eta(t)\,, \tag{2.35}
$$
and one for the parameter process $U(t)$,
$$
U(t) = U(t-1) + \varpi(t)\,, \tag{2.36}
$$
$$
Y(t) = C_U U(t) + \eta(t)\,. \tag{2.37}
$$

The first state space remains unaltered from Eqs. (2.32) and (2.33), with $C_X = C$ in Eq. (2.33). The state equation of the parameters, Eq. (2.36), remains unaltered from Eq. (2.31) as well. It reflects the bounded variation of the parameter changes $A(t)$, due to the finite variance of $\Sigma_\varpi$. The observation matrix $C_U$ in the observation equation (2.37), on the contrary, needs to be defined such that the observation $Y(t)$ matches that given by Eq. (2.35). Maximum-likelihood estimation for dual SSMs, Eqs. (2.34)–(2.37), is performed using the dual Kalman filter [221]. To this end, the Kalman filters of both state spaces are linked such that $\hat{x}(t|t)$ is filtered based on the parameters $\hat{u}(t-1|t-1)$, while the parameter states $\hat{u}(t|t)$ are filtered based on the prior state estimates $\hat{x}(t-1|t-1)$. When Gaussian error propagation is included, the covariances $P(t|t-1)$ of both state spaces are linked as well, leading to better estimates.

2.1.3. Network reconstruction from state-space parameters

Based on the parameters of the SSM, the dynamics of a network may be reconstructed [12,79,192]. This is made explicit by the example of a network consisting of time-invariant dynamics and links between $N$ nodes, such that the process matrix $A$ is a time-constant $(Nd \times Nd)$ matrix. For the reconstruction of interactions, the time-lagged $(N \times N)$ parameter matrices $A(\tau)$, $\tau = 1, \ldots, d$, are considered instead. The influence of node $k$ onto node $j$ may be quantified in two ways. Either the entries $A_{jk}(\tau)$ of all time lags $\tau = 1, \ldots, d$ are investigated directly; the off-diagonal entries $A_{jk}(\tau)$, $j \neq k$, refer to interactions as derived in Sec. 2.1.1. Alternatively, interaction measures summarizing these entries are applied. A multitude of such measures has been introduced [12,37,105,192]. By these measures, weighted direct directed links are quantified both in the time and in the frequency domain. A time domain measure is the directed partial correlation [37,52]. In many applications, investigation in the frequency domain is more insightful. The frequency domain analog of the directed partial correlation is the partial directed coherence (PDC) [12]. Based on the parameters of an AR$_N[d]$, it is defined as [12]

$$P_{jk}(\nu) = \frac{\left|\bar{A}_{jk}(\nu)\right|}{\sqrt{\sum_l \left|\bar{A}_{lk}(\nu)\right|^2}} \in [0, 1]\,, \qquad (2.38)$$

where

$$\bar{A}(\nu) = \mathbb{1} - \sum_{\tau=1}^{d} A(\tau)\, e^{-i\nu\tau} \qquad (2.39)$$


is essentially the finite Fourier transform of the estimated parameter matrices A(τ). The PDC, $P_{jk}(\nu)$, quantifies the influence of node k onto j relative to the influence of node k onto all other nodes. Consequently, the PDC ranges from 0, if node k does not influence j, to 1, if node k influences node j exclusively. A major drawback of the PDC is its counterintuitive normalization by the total influence of node k onto the network. To address this drawback, an alternative normalization has been proposed, resulting in the renormalized partial directed coherence (rPDC) [192]. Essentially, it normalizes the squared numerator of the PDC by the covariance of the Fourier-transformed parameter estimates. To define the rPDC from node k onto j, consider the (2×1)-vector [192]

$$F_{jk}(\nu) = \begin{pmatrix} \mathrm{Re}\big(\bar{A}_{jk}(\nu)\big) \\ \mathrm{Im}\big(\bar{A}_{jk}(\nu)\big) \end{pmatrix} = \begin{pmatrix} -\sum_{\tau=1}^{d} A_{jk}(\tau)\cos(\nu\tau) \\ \phantom{-}\sum_{\tau=1}^{d} A_{jk}(\tau)\sin(\nu\tau) \end{pmatrix}, \qquad (2.40)$$

such that $F_{jk}(\nu)'\, F_{jk}(\nu) = \left|\bar{A}_{jk}(\nu)\right|^2$. It contains the real and imaginary part of the Fourier transform of the estimated parameter matrices, Eq. (2.39), for $j \neq k$ [192]. Its covariance is

$$g_{jk}(\nu) = \left\langle F_{jk}(\nu)\, F_{jk}(\nu)' \right\rangle = \sum_{\tau_1, \tau_2 = 1}^{d} \left\langle \hat{A}_{jk}(\tau_1)\, \hat{A}_{jk}(\tau_2) \right\rangle \begin{pmatrix} \cos(\tau_1\nu)\cos(\tau_2\nu) & -\cos(\tau_1\nu)\sin(\tau_2\nu) \\ -\sin(\tau_1\nu)\cos(\tau_2\nu) & \sin(\tau_1\nu)\sin(\tau_2\nu) \end{pmatrix}. \qquad (2.41)$$

As shown in App. B,

$$\left\langle \hat{A}_{jk}(\tau_1)\, \hat{A}_{jk}(\tau_2) \right\rangle = T\, \Sigma_{\varepsilon,jj}\, H_{kk}(\tau_1, \tau_2)\,. \qquad (2.42)$$

The covariance of the noise $\varepsilon_j(t)$ of process $X_j(t)$ is $\Sigma_{\varepsilon,jj}$, and the inverse covariance of the process is

$$H(\tau_1, \tau_2) = \left( \left\langle x_d(\tau_1)\, x_d(\tau_2)' \right\rangle \right)^{-1}, \qquad (2.43)$$

where $\langle x_d(\tau_1)\, x_d(\tau_2)' \rangle$ is the covariance of the delay-embedded states x(t|T) obtained from the Kalman smoother, for t = 1, ..., T. The notation $H_{kk}(\tau_1, \tau_2)$ refers to the entries of this covariance matrix corresponding to the k-th process and lags τ₁ and τ₂, respectively. For the derivation of Eq. (2.42) and an AR2[2] example of this derivation, see App. B. Note that the renormalized partial directed coherence was originally defined for $x_{d,k}(\tau_1)$, the delay-embedded but unsmoothed states [192]. This is because it was derived under the assumption of observing a realization x(t) of the underlying process directly, rather than within the SSM [192]. Finally, the rPDC is defined as

$$R_{jk}(\nu) = F_{jk}'(\nu)\, \Upsilon_{jk}^{-1}(\nu)\, F_{jk}(\nu)\,, \qquad (2.44)$$


with normalization

$$\Upsilon_{jk}(\nu) = \sum_{\tau_1, \tau_2 = 1}^{d} \Sigma_{\varepsilon,jj}\, H_{kk}(\tau_1, \tau_2) \begin{pmatrix} \cos(\tau_1\nu)\cos(\tau_2\nu) & -\cos(\tau_1\nu)\sin(\tau_2\nu) \\ -\sin(\tau_1\nu)\cos(\tau_2\nu) & \sin(\tau_1\nu)\sin(\tau_2\nu) \end{pmatrix}, \qquad (2.45)$$

instead of the covariance $g_{jk}(\nu)$, Eq. (2.41) [192].

In case the parameter matrices vary over time, i.e., A(τ, t) rather than A(τ), the time-resolved rPDC $R_{jk}(\nu, t)$ is obtained analogously, inserting A(τ, t) instead of A(τ) [199].
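As an illustration of Eqs. (2.38)–(2.39), the following minimal sketch computes the PDC from given AR parameter matrices; the function name and example values are illustrative assumptions.

```python
import numpy as np

def pdc(A, nu):
    """PDC, Eq. (2.38), from AR matrices A of shape (d, N, N) at frequency nu."""
    d, N, _ = A.shape
    taus = np.arange(1, d + 1)
    # Abar(nu) = 1 - sum_tau A(tau) exp(-i nu tau), cf. Eq. (2.39)
    Abar = np.eye(N, dtype=complex) - np.tensordot(np.exp(-1j * nu * taus), A, axes=1)
    # column-wise normalization: influence of k onto j relative to all of k's targets
    return np.abs(Abar) / np.sqrt((np.abs(Abar) ** 2).sum(axis=0))

# example: the AR2[2] of Eq. (2.46) below, with fixed coupling c = 0.5
A = np.array([[[1.3, 0.5], [0.0, 1.7]],
              [[-0.8, 0.0], [0.0, -0.8]]])
print(np.round(pdc(A, nu=0.93), 3))   # entry (0, 1): influence of node 2 onto node 1
```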

2.1.4. Statistical inference for network reconstruction

For the time-constant parameter case, the distribution of the renormalized partial directed coherence (rPDC) at each frequency is given by a noncentral χ²-distribution [192]. However, if the parameters, and thus the rPDC, are time-dependent, the rPDC distributions at subsequent time points are correlated. Statistically testing the significance of the rPDC at each time point separately would therefore result in correlated tests.

This is circumvented by the proposed parametric bootstrap of SSM residuals, published in [190]. By this bootstrap, the temporal correlation of subsequent parameters is naturally incorporated, avoiding separate tests for each time point. Its fundamental idea is to assume that the fitted parameters $\hat\Theta$ of the SSM are the true ones. The variability of the system is bootstrapped by simulating new realizations from the SSM based on a parametric bootstrap of the residuals in the SSM. The distribution of the interaction measure computed for all bootstrap realizations mimics how the variability of the system is transferred to the interaction measure. This variability is quantified by confidence intervals for the measure. In particular, the following steps are conducted to bootstrap confidence intervals for the time-resolved rPDC.

1. Estimate the parameters of the time-varying SSM, $\hat\Theta = \{\hat\mu_0, \hat\Sigma_0, \hat{A}(t), \hat\Sigma_\varepsilon, \hat\Sigma_\eta\}$. To this end, apply the Expectation-Maximization algorithm, employing the dual Kalman filter and smoother as well as the lag-one covariance smoother, to the observations y(1), ..., y(T). Derive the time-resolved rPDC $\hat{R}(\nu, t)$ according to Eqs. (2.40), (2.44) and (2.45) with time-dependent parameters.

2. Generate residuals $\varepsilon^*(t) \overset{\text{iid}}{\sim} \mathcal{N}(0, \hat\Sigma_\varepsilon)$ of the state equation, (2.34), and $\eta^*(t) \overset{\text{iid}}{\sim} \mathcal{N}(0, \hat\Sigma_\eta)$ of the observation equation, (2.35), of the time-varying SSM, for t = 1, ..., T.

3. Generate a bootstrap realization y*(1), ..., y*(T) by integrating the time-varying SSM, using the parameters estimated in 1. and the bootstrap residuals obtained in 2.


4. Estimate the parameters of the time-varying SSM from the bootstrap realization in 3., and derive the bootstrap rPDC R*(ν, t), analogously to step 1.

Steps 2.–4. of this algorithm are repeated $n_B$ times, resulting in a set of $n_B$ bootstrap rPDCs, $R^{*(1)}(\nu, t), \dots, R^{*(n_B)}(\nu, t)$. Confidence intervals are derived from the quantiles of the sampling distribution of bootstrap rPDCs.
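The generation of a bootstrap realization in steps 2. and 3. may be sketched as follows; `fit_and_rpdc`, standing in for the estimation of step 4., is a hypothetical placeholder for the full EM/dual-Kalman analysis and is not spelled out here.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ssm(A_hat, C, Sig_eps, Sig_eta):
    """Steps 2.-3.: integrate the fitted (time-varying) SSM with iid
    parametric residuals; A_hat has shape (T, N, N)."""
    T, N = A_hat.shape[0], A_hat.shape[1]
    x = np.zeros(N)
    y_star = np.empty((T, C.shape[0]))
    for t in range(T):
        eps = rng.multivariate_normal(np.zeros(N), Sig_eps)           # step 2
        eta = rng.multivariate_normal(np.zeros(C.shape[0]), Sig_eta)
        x = A_hat[t] @ x + eps                                        # Eq. (2.34)
        y_star[t] = C @ x + eta                                       # Eq. (2.35)
    return y_star

# schematic bootstrap loop (step 4. via the placeholder fit_and_rpdc):
# boot_rpdcs = [fit_and_rpdc(simulate_ssm(A_hat, C, Sig_eps, Sig_eta))
#               for _ in range(n_B)]
```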

2.2. Application

When reconstructing the interaction structure of a network based on the rPDC as a function of time, two approaches are conceivable. Either the data is cut into segments in which the dynamics are considered constant; time-constant parameters A are then obtained for each segment according to the Expectation-Maximization algorithm. As shown in Sec. 2.2.1 and published in [108], this approach does not resolve the dynamics of interaction satisfactorily if the interactions vary on a shorter time scale than the length of the segments. Estimating the parameters from the dual SSM is a promising alternative. To statistically validate the resulting time-resolved rPDC, the proposed bootstrap is applied in Sec. 2.2.2, following the publication [190].

2.2.1. Two approaches to time-resolved interaction measures

A naive way of obtaining the time-resolved renormalized partial directed coherence (rPDC) is to derive it segment-wise, as in the time-constant case. To compare this approach to a pointwise time-resolved approach, a realization of the AR2[2],

$$X(t) = \begin{pmatrix} X_1(t) \\ X_2(t) \end{pmatrix} = \begin{pmatrix} 1.3 & c(t) \\ 0 & 1.7 \end{pmatrix} X(t-1) + \begin{pmatrix} -0.8 & 0 \\ 0 & -0.8 \end{pmatrix} X(t-2) + \varepsilon(t) \qquad (2.46)$$

with Gaussian noise $\varepsilon(t) \overset{\text{iid}}{\sim} \mathcal{N}(0, \Sigma_\varepsilon)$, $\Sigma_\varepsilon = \mathbb{1}_2$, is considered. Initial values are drawn randomly. Transients are removed. The main frequencies of the two oscillators X₁(t) and X₂(t) are ω₁ = 0.93 Hz and ω₂ = 1.26 Hz, respectively. The observation is modeled as

$$Y(t) = X(t) + \eta(t) \qquad (2.47)$$

according to the SSM. The covariance of the observation noise $\eta(t) \overset{\text{iid}}{\sim} \mathcal{N}(0, \Sigma_\eta)$ is diagonal, with entries such that the noise-to-signal ratio (NSR) is 10%. The interaction is modeled by the sinusoidal coupling strength

$$c(t) = \sin\left(2\pi\, \frac{4}{5000}\, t\right), \qquad (2.48)$$


Figure 2.1.: Time-constant (a) and time-dependent (b) renormalized partial directed coherence at the oscillation frequencies of the simulated AR2[2] with sinusoidal coupling.

from X₂(t) onto X₁(t), for t = 1, ..., 5000. The interaction is reconstructed from the observed realization y(t) by computing the rPDC. Within the T = 5000 data points, the coupling c(t) completes 4 oscillations. The rPDC is not designed to discriminate negative influences from positive ones, such that 8 rPDC cycles are expected.

To compute the rPDC, parameters are fitted based on the observations y(t)

(a) by the time-constant approach in $n_{Bl} = 20$ non-overlapping segments of 250 data points each. To this end, time-constant parameters are estimated using the algorithm published in [133].

(b) by the time-resolved approach based on the dual Kalman filter. To this end, time-resolved parameters are estimated [199].

For the estimation of parameters, the model order d = 10 is chosen. This acknowledges that the exact model order is unknown in applications. From the estimated parameters, the rPDC is calculated for $j \neq k \in \{1, 2\}$, respectively, yielding

(a) the rPDC $\hat{R}^{(b)}_{jk}(\omega_k)$, $b = 1, \dots, n_{Bl}$, segment by segment.

(b) the time-resolved rPDC $\hat{R}_{jk}(\omega_k, t)$, $t = 1, \dots, T$ [199].

The resulting rPDC values are evaluated at the main frequency ω_k of the driving oscillator. Results are shown in Fig. 2.1 (a) and (b), respectively, for j = 1 and k = 2, i.e., for the direction of the actual sinusoidal interaction from process X₂(t) onto X₁(t). Note that the ordinate scale is increased for the rPDC derived segment by segment. This is due to the fact that the rPDC scales with the number of data points. To ease comparability, the two time series of rPDC are plotted such that the ordinate in (a) is increased by a factor of 250, the number of data points in each segment, with respect to the ordinate in (b).
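The simulated system of Eqs. (2.46)–(2.48) may be reproduced along the following lines; the burn-in length used to remove transients and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

T, burn = 5000, 500
tt = np.arange(T + burn)
c = np.sin(2 * np.pi * 4 * (tt - burn) / 5000)    # Eq. (2.48)
A2 = np.array([[-0.8, 0.0], [0.0, -0.8]])
x = rng.normal(0, 1, (T + burn, 2))               # random initial values
for k in range(2, T + burn):
    A1 = np.array([[1.3, c[k]], [0.0, 1.7]])      # Eq. (2.46)
    x[k] = A1 @ x[k - 1] + A2 @ x[k - 2] + rng.normal(0, 1, 2)
x = x[burn:]                                      # remove transients
y = x + rng.normal(0, np.sqrt(0.1) * x.std(axis=0), (T, 2))   # NSR = 10%
```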


Figure 2.2.: Exemplary histogram of bootstrap rPDCs (a), and time courses of the raw (red dotted) and statistically assessed (blue solid) rPDC (b, c). The sinusoidal coupling from X₂(t) onto X₁(t) is revealed (b). Absent coupling in the other direction is correctly identified, too (c).

Since the dynamics of the interaction change within segments, the sinusoidal interaction is not recovered well, cf. Fig. 2.1 (a). When applying the time-varying approach, on the contrary, 8 oscillations are revealed as expected, see Fig. 2.1 (b). To statistically assess the time-resolved rPDC, $\hat{R}_{jk}(\omega_k, t)$, the proposed bootstrap is applied, as described in the following.

2.2.2. Parametric bootstrap for statistical assessment

To statistically assess the time-resolved interaction measure $\hat{R}_{jk}(\omega_k, t)$ of an observed network, y(t), the parametric bootstrap is applied according to the algorithm presented in Sec. 2.1.4. To this end, the parameters fitted to derive $\hat{R}_{jk}(\omega_k, t)$ are used to sample $n_B = 100$ parametric bootstrap realizations y*(t) of the network. For each bootstrap realization, the parameters of the corresponding dual SSM are fitted, and $R^*_{jk}(\omega_k, t)$ is computed for $j \neq k \in \{1, 2\}$, as for the original realization y(t) [199]. Based on the bootstrap rPDCs, confidence intervals are estimated assuming a normal distribution of rPDCs. Confidence intervals of the form

$$\left( \hat{R}_{jk}(\omega_k, t) - z_{\alpha_c}\, \hat\sigma_{R^*_{jk}}\,,\ \hat{R}_{jk}(\omega_k, t) + z_{\alpha_c}\, \hat\sigma_{R^*_{jk}} \right) \qquad (2.49)$$


are obtained for each time point t = 1, ..., T at the main frequency ω_k of the driving process. These confidence intervals are based on the standard deviation $\hat\sigma_{R^*_{jk}}$ estimated from the bootstrap rPDCs $R^*_{jk}(\omega_k, t)$ at each time point t = 1, ..., T. The confidence level is chosen as $\alpha_c = 90\%$, such that the percentage of expected values lower than the lower bound of the corresponding confidence interval is 5%. The corresponding quantile of the standard normal distribution is $z_{\alpha_c} \approx 1.65$. To justify the normal approximation of the confidence intervals, the histogram of the 100 bootstrap rPDCs, evaluated at a time point at the end of the simulation and at frequency ω₂, is shown exemplarily in Fig. 2.2 (a).

Based on the estimated confidence intervals, the rPDC $\hat{R}_{jk}(\omega_k, t)$ is set to 0 if the bootstrap confidence interval includes 0. Time-resolved results of this zero-corrected rPDC are shown in Fig. 2.2 (b, c) in solid blue. While the sinusoidal coupling from X₂(t) onto X₁(t) is still revealed after bootstrapping, the bootstrap correctly identifies absent coupling from X₁(t) onto X₂(t). The percentage of false positives is 0.3%. For comparison, the raw rPDC, i.e., without zero-correction, is displayed in red dots.

To test the estimated confidence intervals for the rPDC in more complex networks

than the sinusoidally coupled AR2[2], an AR4[2] network

$$X(t) = A(t, 1)\,X(t-1) + A(t, 2)\,X(t-2) + \varepsilon(t) \qquad (2.50)$$

with $X(t) = (X_1(t), X_2(t), X_3(t), X_4(t))'$, noise $\varepsilon(t) \overset{\text{iid}}{\sim} \mathcal{N}(0, \mathbb{1}_4)$, and process matrices

$$A(t, 1) = \begin{pmatrix} 1.3 & c_{12}(t) & c_{13}(t) & c_{14}(t) \\ c_{21}(t) & 1.6 & c_{23}(t) & c_{24}(t) \\ c_{31}(t) & c_{32}(t) & 1.5 & c_{34}(t) \\ c_{41}(t) & c_{42}(t) & c_{43}(t) & 1.7 \end{pmatrix} \quad\text{and}\quad A(t, 2) = -\begin{pmatrix} 0.8 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 \\ 0 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0.8 \end{pmatrix} \qquad (2.51)$$

is considered. The time-dependent coupling is shown in Fig. 2.3. In particular, the coupling strengths are

$$c_{12}(t) = \begin{cases} 0 & \text{if } t \leq \frac{T}{3} \\ 0.7 & \text{else} \end{cases}, \qquad c_{21}(t) = \begin{cases} 0.7 & \text{if } t \leq \frac{T}{5} \\ 0 & \text{else} \end{cases},$$
$$c_{24}(t) = e^{-\frac{t}{2500}}\, \sin\left(2\pi\, \frac{5}{5000}\, t\right), \qquad c_{31}(t) = 0.5\,, \text{ for all } t\,,$$
$$c_{34}(t) = 0.8 \cdot \begin{cases} t/2500 & \text{if } t \leq \frac{T}{2} \\ 2 - t/2500 & \text{else,} \end{cases} \qquad (2.52)$$

and zero coupling otherwise. Again, T = 5000 data points are simulated, the observation matrix is an identity matrix, and observation noise is added such that the NSR is 10%, as before.
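The coupling functions of Eq. (2.52) translate directly into code, as in the following sketch; note that the frequency of the damped sinusoid in $c_{24}(t)$ is an assumption reconstructed from the garbled source, not a confirmed value.

```python
import numpy as np

T = 5000
t = np.arange(1, T + 1)
c12 = np.where(t <= T / 3, 0.0, 0.7)       # off-on step
c21 = np.where(t <= T / 5, 0.7, 0.0)       # on-off step
c24 = np.exp(-t / 2500) * np.sin(2 * np.pi * 5 * t / 5000)   # damped sinusoid (assumed frequency)
c31 = np.full(T, 0.5)                      # persistent coupling
c34 = 0.8 * np.where(t <= T / 2, t / 2500, 2 - t / 2500)     # triangular ramp
```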


Figure 2.3.: Time-resolved coupling of the AR4[2] network in matrix representation (a) and presented as a network (b).

Figure 2.4.: Statistically evaluated rPDCs at the oscillation frequencies of the respectively driving oscillator. The coupling of the AR4[2] network as simulated and shown in Fig. 2.3 is mostly revealed, while absent coupling is correctly identified as well.


Results of the raw (dotted red) and bootstrap-assessed rPDC (solid blue) are shown in Fig. 2.4. The average false positive rate is 4.0%, i.e., close to the expected 5%. For permanently absent coupling, the average false positive rate is 4.4%. For the direction from 3 to 4, the bootstrap correctly identifies all nonzero rPDCs as compatible with 0, even though rPDC values range up to 0.03. The on-off coupling from 1 to 2 as well as the off-on coupling from 2 to 1 are reconstructed with 4.4% and 0.5% false positives, respectively. The persistent coupling from 1 to 3 is recovered with 0.2% false negatives. The exponentially damped coupling from 4 to 2 is reconstructed in shape and strength. All eight oscillations are revealed. The triangular coupling from 4 to 3 is qualitatively reconstructed.

To conclude, network dynamics are recovered by the rPDC when statistically validated by the bootstrap, even for larger networks with complicated coupling. Direct connections are correctly identified and distinguished from indirect links.

2.3. Summary

To resolve properties of complex systems, network analysis is applied in various fields of science [3,13,19,28,47,56,60,63,73,77,98,120,140,153,158,179,201]. Key to network analysis is to resolve the dynamics of the interactions between the nodes of the network. As mathematically derived [190], coupled stochastic processes may be modeled by time-varying autoregressive processes. Based on the parameters of autoregressive processes, a multitude of multivariate measures quantifying the interactions of nodes, such as Granger causality, have been defined [12,36,89,105,192,227]. In contrast to the common correlation, they are capable of distinguishing direct from indirect links [37,94]. Furthermore, the direction of the interaction between nodes is identified [151]. So far, most applications assume time-constant parameters of the autoregressive processes, resulting in time-constant links [12,36,89,105,192,227]. The corresponding parameters are fitted by maximum-likelihood estimation when modeling the observation of the processes by the state-space model [196]. Generally, however, the interaction of nodes varies over time, such that a time-resolved approach is needed [199]. As presented in this chapter, a first approach is to consider segments of data, within which established time-constant measures are applied. As shown by simulations, this is insufficient if the dynamics of the interaction change within segments [108]. Choosing arbitrarily small segments may not be possible due to a finite sampling frequency. Furthermore, the variance of the parameter estimates increases considerably. The alternative approach incorporates the time dependence in the state-space model in order to obtain time-resolved measures of interaction [199]. The system's parameters are assumed to vary over time themselves, resulting in the time-varying state-space model. As shown by simulations, this renders the identification of rapid changes of the interaction in networks possible [108,190,199]. The drawback of this approach, so far, has been that statistical methods established for the time-constant measures should not be applied anymore, due to the temporal correlation of the parameters. This is overcome by a residual-based parametric bootstrap to estimate confidence intervals, as


presented in this chapter [190]. This bootstrap is exemplarily applied to estimate confidence intervals of the renormalized partial directed coherence [192] for a two- and a four-dimensional network in which different types of coupling are simulated. Interactions are dependably reconstructed when applying the proposed bootstrap. As shown by simulations, spurious interactions would be presumed if the interaction measure were not statistically assessed by the bootstrap.


3. Optimal Block-Length Selection

This chapter is based on the publication

M. Mader, W. Mader, L. Sommerlade, J. Timmer, B. Schelter. Block-bootstrapping for noisy data. J. Neurosci. Meth., 219: 285–291, 2013. [132]

In the previous chapter, a parametric bootstrap of independent residuals in the state-space model is presented [190]. The bootstrap assumption of independence is met, since independent residuals, rather than autocorrelated observations, are resampled. If the assumption of independence is not met, on the contrary, a nonparametric block bootstrap [32] presents an alternative. It is applicable if the process is mixing [32]. This is the case if states at sufficiently distant time points are independent in probability [32]. To conduct the block bootstrap, the time series to be analyzed is subdivided into segments, henceforth called blocks [32]. From the ensemble of blocks, bootstrap realizations are obtained by drawing randomly with replacement and concatenating the drawn blocks until time series of the same length as the original time series are obtained [86].

A common method to select block lengths for the block bootstrap is based on balancing bias and variance of the bootstrap estimator [82]. Longer blocks reduce the bias, yet the variance is increased. The optimal block length is defined as the length by which the mean squared error of the considered estimator and its block-bootstrapped correlate is minimized [82,160,187]. It has been shown that optimal block lengths scale with T, the number of data points used for the analysis [82]. In particular, the optimal block length scales as $T^{1/3}$ if the variance of estimators is block bootstrapped. The coefficient of proportionality of $T^{1/3}$ depends on both the process and the estimator analyzed. It is a function of the autocorrelation of the process when bootstrapping the variance of smooth function-model estimators [82]. Based on this variance, Gaussian confidence intervals may be derived [86]. For exponentially mixing processes, it has been proposed to consider the envelope of the exponentially decaying autocorrelation, rather than the autocorrelation itself, in order to select the block length optimally [160]. As shown in this chapter, such methods need to be adapted if the process investigated is governed by noise that diminishes the autocorrelation systematically. These findings follow the publication [132].

necessary adaptions for the noisy regime are presented. The necessity for adaptionsis illustrated by simulations in Sec. 3.2. Block-length selection is finally appliedto a block bootstrap in order to estimate confidence intervals within the scope ofvariance estimation based on data recorded from a tremor patient.


3.1. Methodology

The optimal block length has been defined as the number of data points, L, for which the mean squared error $\mathrm{MSE}(L) = E[(\hat\theta^*(L) - \hat\theta)^2]$ of the estimator $\hat\theta$ and its bootstrapped correlate $\hat\theta^*(L)$ is minimized [82]. The bootstrapped estimator, $\hat\theta^*(L)$, is a function of the block length L, since it depends on the realizations sampled from the ensemble of blocks, which in turn vary with the block length. Traditional selection of the block length for smooth function models is summarized in Sec. 3.1.1. The effect of noise on block-length selection is modeled in Sec. 3.1.2. Block lengths are biased if traditional methods of block-length selection are applied. Adaptions that take the effect of noise into account are presented in Sec. 3.1.3.

3.1.1. Traditional block-length selection

The optimal block length of smooth function-model estimators, $\hat\theta = h(\widehat{E[Y]})$, for univariate processes Y(t) is obtained from minimizing [82]

$$\mathrm{MSE}(L) = E\left[\left(\hat\theta^*(L) - \hat\theta\right)^2\right] = C_0^2 \left( \frac{1}{T^2 L^2}\, C_1^2 + \frac{L}{T^3}\, v\, C_2^2 \right), \qquad (3.1)$$

with v = 1 in the case of non-overlapping blocks and v = 2/3 in the case of maximally overlapping blocks, as well as constants

$$C_1 = \sum_{\tau=-\infty}^{\infty} |\tau|\, \gamma_Y(\tau) \quad\text{and}\quad C_2 = \sum_{\tau=-\infty}^{\infty} \gamma_Y(\tau)\,, \qquad (3.2)$$

containing the true autocorrelation [169]

$$\gamma_Y(\tau) = \frac{1}{\mathrm{Var}[Y]}\, E[Y(t)\, Y(t-\tau)]\,, \quad \text{for all } t. \qquad (3.3)$$

Here, Y(t) is assumed to be stationary. The constant C₀ depends on the structure of the smooth function-model estimator. In particular,

$$C_0 = \begin{cases} -\frac{1}{2}\, h''(\mu) & \text{if } \theta = \text{bias} \\ h'(\mu) & \text{if } \theta = \text{variance,} \end{cases} \qquad (3.4)$$

for bias and variance estimation, respectively. Its exact value is of minor relevance for optimal block-length selection, since it disappears when setting the first derivative of the MSE to zero,

$$0 \overset{!}{=} \frac{\mathrm{d}}{\mathrm{d}L}\, \mathrm{MSE}(L) = C_0^2 \left( \frac{-2}{T^2 L^3}\, C_1^2 + \frac{1}{T^3}\, v\, C_2^2 \right). \qquad (3.5)$$


This is a necessary condition for minimizing the mean squared error, such that the optimal block length

$$L = \left( 2T\, \frac{C_1^2}{v\, C_2^2} \right)^{\frac{1}{3}} = \left( 2T\, \frac{\left( \sum_{\tau=-\infty}^{\infty} |\tau|\, \gamma_Y(\tau) \right)^2}{v \left( \sum_{\tau=-\infty}^{\infty} \gamma_Y(\tau) \right)^2} \right)^{\frac{1}{3}} \qquad (3.6)$$

is a function of T and the autocorrelation $\gamma_Y(\tau)$, yet independent of C₀.

Since the characteristic time scale of the process is given by the envelope of the autocorrelation, it has been proposed to substitute $\gamma_Y(\tau)$, τ ≥ 0, by its envelope $\varphi^\tau$ with decay constant φ < 1 [160]. With the geometric series

$$\sum_{\tau=-\infty}^{\infty} \varphi^{|\tau|} = \sum_{\tau=-\infty}^{-1} \varphi^{|\tau|} + \varphi^0 + \sum_{\tau=1}^{\infty} \varphi^{\tau} = 2\varphi \sum_{\tau=0}^{\infty} \varphi^{\tau} + 1 = \frac{2\varphi}{1-\varphi} + 1 \qquad (3.7)$$

and its derivative,

$$\sum_{\tau=-\infty}^{\infty} |\tau|\, \varphi^{|\tau|} = 2 \sum_{\tau=0}^{\infty} \tau\, \varphi^{\tau} = \frac{2\varphi}{(1-\varphi)^2}\,, \qquad (3.8)$$

respectively, the optimal block length, Eq. (3.6), is [160]

$$L = \left( T\, \frac{\left( \frac{2\varphi}{(1-\varphi)^2} \right)^2}{v \left( 1 + \frac{2\varphi}{1-\varphi} \right)^2} \right)^{\frac{1}{3}} = \left( 4T\, \frac{\varphi^2}{v\, (1-\varphi^2)^2} \right)^{\frac{1}{3}}. \qquad (3.9)$$

Since the true autocorrelation is unknown, the optimal block length is estimated by fitting the envelope $\varphi^\tau$ to the estimated autocorrelation [169],

$$\hat\gamma_y(\tau) = \frac{1}{\hat\sigma_y^2\, T} \sum_{t=\tau+1}^{T} y(t)\, y(t-\tau)\,, \quad \tau \geq 0\,, \qquad (3.10)$$

of the zero-mean observation y(t). The estimate $\hat\varphi_y$ derived in this way is plugged into Eq. (3.9) to derive an estimate of the optimal block length [160]. The estimate of the variance, $\hat\sigma_y^2$, used to estimate the autocorrelation, Eq. (3.10), is [24]

$$\hat\sigma_y^2 = \frac{1}{T} \sum_{t=1}^{T} y^2(t)\,. \qquad (3.11)$$

The subsequent derivations correspond to block bootstrapping with non-overlapping blocks. This is the original form of block bootstrapping [32]. By introducing the factor v = 2/3 used above, the subsequent derivations apply to maximally overlapping blocks accordingly [82,119,160].


Figure 3.1.: Diminution of the autocorrelation for τ ≠ 0 due to additive noise.

3.1.2. Effect of noise on block-length selection

If the process analyzed is afflicted by noise, η(t), e.g., the process

Y (t) = X(t) + η(t) , (3.12)

instead of the underlying X(t) is observed, the block-length selection needs to be adapted. The reason for this is illustrated in Fig. 3.1. It shows the estimated autocorrelation of an autoregressive process of order 2 (AR1[2]) with parameters a₁ = 1.9975 and a₂ = −0.998 in the state-space model (SSM). Zero-mean Gaussian observation noise is added according to Eq. (3.12), such that the noise-to-signal ratio (NSR) is 1. This results in the diminution of the autocorrelation for τ ≠ 0. It is to be noted that neither the fact that the underlying process, X(t), is an autoregressive one, nor that η(t) is additive Gaussian, is decisive for the diminution at lags τ ≠ 0. It is solely the temporal independence of the noise that diminishes the autocorrelation at all time lags τ ≠ 0. When fitting $\varphi^\tau$ to the autocorrelation for τ > 0 [160], however, the decay constant is generally biased, leading to a biased optimal block length.

3.1.3. Robust block-length estimation

The diminution of the autocorrelation is accounted for when adapting the derivation of the decay constant φ. As published in [132], two appropriate adaptions are conceivable.

1. Fit $A\varphi^\tau$ instead of $\varphi^\tau$ to the decay of the autocorrelation, for τ > 0 (see the sketch after this list). If the autocorrelation oscillates, the decay is reflected by the envelope, which in turn is given by the local maxima of the autocorrelation. If the autocorrelation does not oscillate, as for an AR1[1], the decay is given by the autocorrelation itself rather than its envelope.

2. Fit an autoregressive process in the SSM to the data, as described in Chap. 2. The parameter φ is a function of the autoregressive parameters.
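The first adaption may be sketched as follows; this is a minimal sketch assuming an oscillating autocorrelation whose envelope is traced by its local maxima, with illustrative fit ranges and starting values.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import argrelmax

def robust_block_length(y, max_lag=100):
    """Fit A * phi**tau to the decay of the estimated autocorrelation for
    tau > 0 and plug phi into Eq. (3.14)."""
    y = np.asarray(y, float) - np.mean(y)
    T = len(y)
    acf = np.array([np.sum(y[lag:] * y[:T - lag]) for lag in range(max_lag + 1)])
    acf /= acf[0]                                   # cf. Eqs. (3.10)-(3.11)
    peaks = argrelmax(np.abs(acf))[0]               # envelope via local maxima
    # if the acf does not oscillate, fit the decay of the acf itself
    lags = peaks if len(peaks) >= 2 else np.arange(1, max_lag + 1)
    (A, phi), _ = curve_fit(lambda tau, A, p: A * p ** tau,
                            lags, np.abs(acf)[lags],
                            p0=(0.5, 0.9), bounds=((0, 0), (2, 0.9999)))
    return (4 * T * phi ** 2 / (1 - phi ** 2) ** 2) ** (1 / 3)   # Eq. (3.14)
```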


While the first approach is motivated by the diminution of the autocorrelation, the second does not suffer from the need to find local maxima. For the first approach, the optimal block length, cf. Eq. (3.6) [82],

$$L = \left( 2T\, \frac{\left( \sum_{\tau=-\infty}^{\infty} |\tau|\, \gamma_Y(\tau) \right)^2}{\left( \sum_{\tau=-\infty}^{\infty} \gamma_Y(\tau) \right)^2} \right)^{\frac{1}{3}}, \qquad (3.13)$$

is derived by substituting $\gamma_Y(\tau)$ by $A\varphi^{|\tau|}$ instead of $\varphi^{|\tau|}$. Arranging the amplitude A outside the sums of the geometric series, Eq. (3.7), and its derivative, Eq. (3.8), leads to cancellation of A, such that the block length, cf. Eq. (3.9),

$$L = \left( 4T\, \frac{\varphi^2}{(1-\varphi^2)^2} \right)^{\frac{1}{3}}, \qquad (3.14)$$

remains unchanged.

The second approach to robust block-length selection is particularly promising if the underlying dynamics may be modeled by a low-order, especially first or second order, autoregressive process. As derived in the next two passages, φ, and accordingly L, are functions of the process parameters. To estimate these parameters in the SSM, the maximum-likelihood approach presented in Chap. 2 may be applied.

Block length of an AR1[1]

For an AR1[1],

X(t) = aX(t− 1) + ε(t) , ε(t)iid∼N (0, σ2

ε ) , t = 1, . . . , T , (3.15)

the autocorrelation [24] is

$$\gamma_X(\tau) = \frac{E[X(t)\, X(t-\tau)]}{\mathrm{Var}[X]} = \frac{E[a^\tau X(t-\tau)\, X(t-\tau)]}{\mathrm{Var}[X]} = a^\tau\,, \quad \tau \geq 0. \qquad (3.16)$$

The process parameter a equals the decay constant φ of the autocorrelation. Accordingly, the optimal block length

$$L = \left( 4T\, \frac{a^2}{(1-a^2)^2} \right)^{\frac{1}{3}} \qquad (3.17)$$

is a function of the process parameter a.

Using the example of the AR1[1], the effect of noise on block-length selection may be made explicit. To this end, the observation is modeled as

$$Y(t) = X(t) + \eta(t)\,, \qquad \eta(t) \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma_\eta^2)\,, \quad t = 1, \dots, T\,. \qquad (3.18)$$


The autocorrelation of Y(t) is still a function of the process parameter a, yet

$$\gamma_Y(\tau) = \frac{E[Y(t)\, Y(t-\tau)]}{\mathrm{Var}[Y]} = \begin{cases} \dfrac{\mathrm{Var}[X] + \sigma_\eta^2}{\mathrm{Var}[Y]} = 1\,, & \text{if } \tau = 0 \\[2ex] \dfrac{\mathrm{Var}[X]}{\mathrm{Var}[Y]}\, a^{|\tau|}\,, & \text{else.} \end{cases} \qquad (3.19)$$

For τ = 0, the identity $E[Y^2(t)] = \mathrm{Var}[Y] = \mathrm{Var}[X] + \sigma_\eta^2$ is used in the numerator. This identity holds due to the independence of the underlying process X(t) and the observation noise η(t). From Eq. (3.19) it may be seen that, for τ ≠ 0, the autocorrelation is diminished by the factor

$$\frac{\mathrm{Var}[X]}{\mathrm{Var}[Y]} = \frac{1}{1 + \frac{\sigma_\eta^2}{\mathrm{Var}[X]}}\,. \qquad (3.20)$$

This factor depends on $\mathrm{NSR} = \frac{\sigma_\eta^2}{\mathrm{Var}[X]}$. Inserting the autocorrelation of the observation, instead of that of the underlying process, into the block-length equation (3.17) yields

$$L_Y = \left( 4T\, \frac{a_Y^2}{(1 - a_Y^2)^2} \right)^{\frac{1}{3}} = \left( 4T\, \frac{a_X^2\, \frac{\mathrm{Var}[X]^2}{\mathrm{Var}[Y]^2}}{\left( 1 - a_X^2\, \frac{\mathrm{Var}[X]^2}{\mathrm{Var}[Y]^2} \right)^2} \right)^{\frac{1}{3}}. \qquad (3.21)$$

This is a biased estimator of the block length, due to the factor $\frac{\mathrm{Var}[X]^2}{\mathrm{Var}[Y]^2}$ occurring both in the numerator and the denominator. In the first approach to robust block-length estimation, this is compensated by fitting $A\varphi^\tau$ to the decay of the autocorrelation for τ > 0 rather than $\varphi^\tau$ for τ ≥ 0. In the second approach, this is compensated by parameter estimation in the SSM rather than by neglecting the observation noise.

Block length of an AR1[2]

An AR1[2] is the discrete analog of a noise-driven damped harmonic oscillator, when identifying the oscillation frequency ω₀ and decay φ of the latter with the parameters of the former. The explicit functional dependence is [92]

$$a_1 = 2\varphi \cos\left( \sqrt{(\ln\varphi)^2 - \omega_0^2} \right), \qquad (3.22)$$
$$a_2 = -\varphi^2\,. \qquad (3.23)$$

Due to Eq. (3.23), the optimal block length of an AR1[2],

$$L = \left( 4T\, \frac{-a_2}{(1 + a_2)^2} \right)^{\frac{1}{3}}, \qquad (3.24)$$

is a function of the process parameter a2 of the AR1[2].


Figure 3.2.: Autocorrelations of AR1[1] processes with process parameters a (a = 0.5, 0.7, 0.9, 0.94, 0.99) and NSR = 1. The diminution at τ ≠ 0 is due to observation noise.

3.2. Application

In this section, block-length estimation is first applied to simulated autoregressive processes in the SSM. In Sec. 3.2.1, simulations of an AR1[1] in the SSM show that block lengths derived according to the traditional approach depend on the NSR. This block-length dependence on the NSR is resolved by the proposed approach. In Sec. 3.2.2, it is shown by simulation that the traditional approach also fails for higher order autoregressive processes, while the proposed approach remains robust. In Sec. 3.2.3, the robust approach to block-length selection is applied to a tremor time series. Tremor is the involuntary rhythmic movement of limbs [87,88]. It may vary in strength over time [148]. To assess this variation, confidence intervals of variance estimates are block bootstrapped based on the proposed approach to robust block-length selection.

3.2.1. Block-length dependence on noise-to-signal ratio

To simulate the effect of noise on block-length selection, AR1[1]-processes in theSSM,

X(t) = aX(t− 1) + ε(t) , ε(t)iid∼N (0, σ2

ε ) (3.25)

Y (t) = X(t) + η(t) , η(t)iid∼N (0, σ2

η) (3.26)

are simulated for various process parameters a ∈ [0.5, 0.99], and T = 300 000 datapoints. In Fig. 3.2, exemplary autocorrelations of AR1[1]-processes with NSR = 1and different parameters a are shown. The autocorrelation is diminished by thefactor 1

1+NSR = 12for τ 6= 0, cf. Sec. 3.1.3.

By the traditional approach of block-length selection, i.e., fitting $\varphi^{|\tau|}$ to the autocorrelation for τ ≥ 0, this diminution is ignored. This leads to block lengths with increasing bias as the NSR increases, see Fig. 3.3 (a). As shown in Fig. 3.3 (b), this NSR dependence is resolved by the proposed approach, for which $A\varphi^\tau$ is fitted for


Figure 3.3.: Block length (color-coded) as a function of a and NSR. Traditional block-length selection (a) yields block lengths that depend on the NSR, while the proposed method (b) allows for robust block-length estimation.

τ > 0. The block length increases as a increases, in both approaches. This is to be expected, since the denominator of the optimal block length is of order $(1-a^2)^2$. With a → 1, the block length increases until divergence. Physically, this increase of the block length is expected, since the temporal dependence of the AR1[1] persists longer with increasing parameter a, such that longer blocks are needed.

To derive the results shown in Fig. 3.3, the decay of the autocorrelation is estimated from the first 20 data points of the estimated autocorrelation. This restriction is made to compensate for the finite-size effect that induces variation in the estimated autocorrelation. This variation affects the autocorrelation increasingly as the lag increases. By confining the fit to the first few data points of the autocorrelation, the impact on block-length selection is reduced. Confining the fit to the first 20 data points induces a lower bound of the assessable parameter range, since the autocorrelation decays increasingly fast with decreasing parameter a. Based on the first 20 data points, the lower bound of the parameters is fixed to a = 0.5.

3.2.2. Traditional vs. proposed block-length selection

An AR1[2] process with parameters a₁ = 1.9, a₂ = −0.99, and $\sigma_\varepsilon^2 = 0.05$ is simulated for T = 50 000 data points. Due to the relation of the process parameter a₂ to the block length, $L = \left( 4T\, \frac{-a_2}{(1+a_2)^2} \right)^{1/3}$, cf. Eq. (3.24) in Sec. 3.1.3, the true optimal block length is known, L = 1296 data points. The observation y(t) of the AR1[2] is realized according to the SSM with NSR = 1. Optimal block lengths are estimated based on $\hat\varphi_y$ obtained from fitting $\varphi^\tau$ and $A\varphi^\tau$ to the envelope of the estimated autocorrelation of the observation y(t). The fits are shown in Fig. 3.4 (a) and (b), respectively.


Figure 3.4.: Traditional (a) and proposed (b) approach: fit of the envelope of the autocorrelation to estimate the decay φ necessary for the selection of the optimal block length.

Approach         | Truth     | Traditional            | Proposed               | Traditional            | Proposed
Block length     | L = 1296  | $\hat{L}^{(t)}_y$ = 702  | $\hat{L}^{(p)}_y$ = 1325 | $\hat{L}^{(t)}_x$ = 1308 | $\hat{L}^{(p)}_x$ = 1320
$(\hat{L}-L)/L$  |           | 40.6%                  | 2.2%                   | 0.9%                   | 1.9%

Table 3.1.: Optimal block lengths of the simulation study. The traditional approach underestimates the block length considerably if estimated from a noisy observation y(t). This is resolved by the proposed adaption. The resulting block length is comparable to that estimated from the noise-free regime, i.e., based on x(t).

The block lengths derived from these fits are $\hat{L}^{(t)}_y = 707$ for the traditional approach and $\hat{L}^{(p)}_y = 1325$ for the proposed approach, respectively. The traditional approach yields a deviation from the true value by 40.6%, while the block length derived from the adaption deviates by 2.2%. To relate the deviation of the block-length estimate $\hat{L}^{(p)}_y$ to a quasi-ideal estimate, the optimal block lengths derived from the underlying realization x(t), $\hat{L}^{(t)}_x$ and $\hat{L}^{(p)}_x$, are considered. Since the underlying realization is unaffected by additional noise, the resulting block lengths from the traditional and the proposed approach are comparable. In particular, they are $\hat{L}^{(t)}_x = 1308$ and $\hat{L}^{(p)}_x = 1320$, respectively. For comparison, all block lengths, with relative deviations from the known block length L, are summarized in Tab. 3.1.

Due to the proposed adaption, the block length estimated from the observation is similar to the one obtainable from the underlying realization. Both deviate from the known true optimal block length by approximately 2%, while the optimal block length is considerably underestimated by the traditional approach.


Figure 3.5.: Exemplary EMG recording of a tremor patient. The tremor strength seems to increase at the end of the 5 min recording (a). An exemplary 1 s period of EMG from the beginning (b) and the end (c) of the recording period is shown.

3.2.3. Block bootstrap applied to tremor data

A standard technique for monitoring the muscular activity of tremor patients is the electromyogram (EMG) [87,88]. The raw EMG is modeled by modulated noise [186]. To reveal the oscillatory behavior, the EMG needs to be rectified, see Appendix C [186]. Exemplarily, a rectified EMG recorded from the flexor muscle of the left hand of a Parkinsonian tremor patient is shown in Fig. 3.5. It is sampled at $f_s = 1000$ Hz. The amplitude of the signal seems to increase at the end of the 5 min recording, Fig. 3.5 (a). For a more detailed view of the data, two randomly chosen 1 s segments at the beginning and the end of the recording are shown in Fig. 3.5 (b) and (c), respectively. To investigate whether the variances of the two segments are compatible, Gaussian 95% confidence intervals are block bootstrapped for both segments. Due to the central limit theorem, the assumption of Gaussianity of the variance estimator is justified. The optimal block length for the block bootstrap is determined by the proposed adaption.

Since the high-amplitude period at the end of the recording persists for 55 s,

another 55 s segment at the beginning of the recording is chosen as a reference. For both of these segments, the block length is estimated by fitting $A\varphi^\tau$ to the estimated autocorrelation. The estimated autocorrelation (blue solid) and the fitted envelope (black dashed) are shown in Fig. 3.6 for time lags τ = 0, 1, ..., 400 data points. The respective


Figure 3.6.: Estimated autocorrelation of the EMG of a tremor patient at the beginning (a) and the end (b) of the recording period. The diminution for τ > 0 occurs as in the autoregressive processes simulated above.

optimal block lengths are derived: $L_{\mathrm{EMG},b} = 1189$ data points at the beginning and $L_{\mathrm{EMG},e} = 848$ data points at the end of the recording period. To block bootstrap the variance, each set of data is subdivided into blocks of the respective lengths $L_{\mathrm{EMG},b}$ and $L_{\mathrm{EMG},e}$. Blocks are drawn randomly with replacement from the respective pools of blocks, and bootstrap realizations of the same length as the original data are composed. From the resulting bootstrap samples, the variance is estimated. The bootstrap variances are used to estimate the standard deviation of the variance estimator. This standard deviation is incorporated into the 95% Gaussian confidence intervals corresponding to the beginning of the recording, (0.95, 1.02), and to the end of the recording, (11.05, 12.37), respectively. Since the confidence intervals do not overlap, it is reasonable to conclude that the variance has increased from the beginning of the recording to its end.

If the naive approach is used instead, the confidence intervals are underestimated. At the beginning of the recording the confidence interval is estimated as (0.97, 1.01), at the end it is (11.38, 12.05).
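The block-bootstrapped Gaussian confidence interval for the variance may be sketched as follows; the block handling (non-overlapping blocks, trailing remainder dropped) and the number of bootstrap samples are illustrative choices, not those of the analysis above.

```python
import numpy as np

rng = np.random.default_rng(4)

def variance_ci(y, L, n_boot=1000, z=1.96):
    """95% Gaussian confidence interval for the variance of y,
    block bootstrapped with block length L."""
    y = np.asarray(y, float)
    blocks = [y[i:i + L] for i in range(0, len(y) - L + 1, L)]   # non-overlapping
    n_blocks = int(np.ceil(len(y) / L))
    boot_vars = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, len(blocks), n_blocks)             # draw with replacement
        y_star = np.concatenate([blocks[i] for i in idx])[:len(y)]
        boot_vars[b] = np.var(y_star)
    s = boot_vars.std()                  # bootstrap SD of the variance estimator
    return np.var(y) - z * s, np.var(y) + z * s
```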

3.3. Summary

While the original method of the bootstrap is powerful if the underlying data are independent [48], the block bootstrap has been proposed for autocorrelated data [32]. For the block bootstrap, segments of data, instead of single data points, are randomly drawn with replacement [32,119]. The decisive parameter is the choice of the block length [82]. In this chapter, block-length selection for variance estimation of smooth function-model estimators is investigated [82,160]. It is shown that special care is necessary if the process analyzed is corrupted by noise that diminishes the autocorrelation [132]. Noise-induced effects on block-length selection are described analytically and numerically using the example of low-order autoregressive processes. While the traditional approach results in biased block-length estimation


in the noisy regime, the proposed adaption yields robust optimal block lengths up to high noise-to-signal ratios [132]. As an exemplary application of the block bootstrap, the proposed adaption of block-length selection is used to estimate confidence intervals of the variance of tremor signals. Tremor is a neurological phenomenon which, in the example chosen, manifests as a pathological involuntary trembling of the hand [87,88]. As revealed by the block bootstrap, the tremor strength, quantified by the variance, increases considerably. Block-bootstrapped 95% confidence intervals do not overlap.


4. Bispectral Analysis

This chapter is based on the publication

M. Mader, J. Klatt, F. Amtage, B. Hellwig, W. Mader, L. Sommerlade, J.Timmer, B. Schelter. Spectral and higher-order-spectral analysis of tremortime series. Clin. Exp. Pharmacol., 4: 1000149, 2014. [130]

In the previous chapter, the block bootstrap is applied to estimate the variance of smooth function models. Based on this variance, confidence intervals are estimated. In this chapter, the block bootstrap is applied to estimate null distributions for hypothesis testing within the scope of bispectral analysis, a nonlinear extension of spectral analysis.

Spectral analysis is one of the key techniques of time series analysis [24,164,169]. It is used in various fields, ranging from fluid dynamics [76] to optics [180], solar physics [33], engineering [45], and solid-state physics [215]. By spectral analysis, the frequency content of processes may be quantified [80]. Most investigations are restricted to second order spectral analysis, investigating the linear properties of processes in the frequency domain [57,101,113,126,203]. If an oscillatory process is nonlinear, spectral peaks occur not only at the fundamental oscillation frequency, but also at multiples of it [68,85]. These higher harmonics cannot be distinguished from independent oscillations at multiples of the fundamental frequency when applying second order spectral analysis [130]. This restriction is resolved by higher order spectral analysis [23].

The third order spectrum is called the bispectrum [214]. Bispectral analysis renders higher harmonics due to quadratic nonlinearity identifiable [23,214]. Two major [53] normalizations have been proposed for the bispectrum [23,110]. Kim and Powers introduced the so-called bicoherence, which is normalized to values in [0, 1] [110]. Numerical simulations suggest applying a χ²-statistic to statistically test the null hypothesis of zero bicoherence [53,54]. Other than for the bicoherence, the statistical properties of the bispectral coefficient are known analytically [23,91]. In this chapter, a block bootstrap for hypothesis testing within the scope of bispectral analysis is presented. It is published in [130]. Methodological foundations of bispectral analysis and the proposed hypothesis test are presented in Sec. 4.1. The performance of the test is compared to the performance of the established analytic statistic of the bispectral coefficient in Sec. 4.2. Finally, the block bootstrap-based test is applied to tremor data. Higher harmonics of the tremor frequency around 4 to 6 Hz are distinguished from independent oscillations in the low β-frequency band, ranging from 8 to 16 Hz.


4.1. Methodology

Higher order spectra extend the concept of second order spectra [24]. The n-th order spectrum is proportional to the Fourier transform of the so-called n-th order cumulant (n ≥ 2) [214]. It is equally called the (n−1)-th order polyspectrum [23]. In particular, the third order spectrum is the second order polyspectrum, called the bispectrum [214]. The univariate bispectrum quantifies quadratic nonlinearities of one process [169]. The bivariate analog quantifies quadratic nonlinearities between two processes [169]. Univariate and bivariate bispectra are introduced in Sec. 4.1.1. Bispectral estimation is reviewed in Sec. 4.1.2. Two normalizations and their estimation are summarized in Sec. 4.1.3. In Sec. 4.1.4, the block bootstrap in association with bispectral estimation is presented, following the publication [130].

4.1.1. Spectrum and bispectrum

The second order spectrum, or spectrum for short,

$$S_X(\nu) = \frac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} R_X(\tau)\, e^{-i\nu\tau}\,, \quad \text{for } \nu \in \mathbb{R}\,, \qquad (4.1)$$

is proportional to the Fourier transform of the second order cumulant, i.e., the autocovariance

$$R_X(\tau) = E[X(t)\, X(t+\tau)]\,, \qquad (4.2)$$

of a real-valued, zero-mean, stationary and mixing process X(t) [24]. The third order analog of the spectrum is the bispectrum

$$B_X(\nu_1, \nu_2) = \frac{1}{(2\pi)^2} \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty} R_X(\tau_1, \tau_2)\, e^{-i(\nu_1\tau_1 + \nu_2\tau_2)}\,, \qquad (4.3)$$

where $R_X(\tau_1, \tau_2) = E[X(t)\, X(t+\tau_1)\, X(t+\tau_2)]$ is the third order cumulant [24]. For linear Gaussian processes, the bispectrum is zero, since all joint cumulants higher than second order are zero [169]. If the process is non-Gaussian or nonlinear of second order, the bispectrum is nonzero [23,169]. A typical process with nonzero bispectrum is given by the quadratic phase-coupling model [152]

$$X(t) = \cos(\omega_1 t + \varphi_1) + \cos(\omega_2 t + \varphi_2) + \cos\big( (\omega_1 + \omega_2)\, t + \varphi_1 + \varphi_2 \big)\,, \qquad (4.4)$$

consisting of three oscillations at frequencies ω₁, ω₂, and ω₁ + ω₂. The phase shifts $\varphi_1, \varphi_2 \overset{\text{iid}}{\sim} \mathcal{U}(0, 2\pi]$ are fixed within each realization but independently uniformly distributed on (0, 2π] across realizations. The phases of the first two oscillators couple into the phase of the third oscillator. This induces a quadratic nonlinearity. The bispectrum $B_X(\nu_1, \nu_2)$ quantifies the nonlinear self-coupling of X(t) from the phases


corresponding to frequencies ν₁ and ν₂ onto their sum. Accordingly, $B_X(\nu_1, \nu_2) = 0$, except for the tuples $(\nu_1, \nu_2) = \pm(\omega_1, \omega_2)$, $\pm(-\omega_1, \omega_1+\omega_2)$, $\pm(-\omega_2, \omega_1+\omega_2)$, and the permutations $\pm(\omega_2, \omega_1)$, $\pm(\omega_1+\omega_2, -\omega_1)$, $\pm(\omega_1+\omega_2, -\omega_2)$, respectively [152].

To quantify the linear interaction between two processes X₁(t) and X₂(t) rather

than within one process X(t), the coherence,

$$\mathrm{Coh}_{X_1 X_2}(\nu) = \frac{\mathrm{CS}_{X_1 X_2}(\nu)}{\sqrt{S_{X_1}(\nu)\, S_{X_2}(\nu)}}\,, \qquad (4.5)$$

such that $|\mathrm{Coh}_{X_1X_2}(\nu)| \in [0, 1]$, has been introduced [24]. It is a normalized version of the cross-spectrum [24]

$$\mathrm{CS}_{X_1 X_2}(\nu) = \frac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} R_{X_1 X_2}(\tau)\, e^{-i\nu\tau}\,. \qquad (4.6)$$

The cross-spectrum is the bivariate analog of the spectrum, containing the Fourier-transformed second order joint cumulant, the cross-covariance [24]

$$R_{X_1 X_2}(\tau) = E[X_1(t)\, X_2(t+\tau)]\,. \qquad (4.7)$$

The third order analog of the cross-spectrum, Eq. (4.6), is the cross-bispectrum [24]. The cross-bispectrum is proportional to the Fourier transform of the third order joint cumulant. For two processes X₁(t) and X₂(t), there are six third order joint cumulants,

$$\begin{aligned}
R_{X_1X_1X_2}(\tau_1,\tau_2) &= E[X_1(t)\,X_1(t+\tau_1)\,X_2(t+\tau_2)]\,, \\
R_{X_1X_2X_1}(\tau_1,\tau_2) &= E[X_1(t)\,X_2(t+\tau_1)\,X_1(t+\tau_2)]\,, \\
R_{X_1X_2X_2}(\tau_1,\tau_2) &= E[X_1(t)\,X_2(t+\tau_1)\,X_2(t+\tau_2)]\,, \\
R_{X_2X_1X_1}(\tau_1,\tau_2) &= E[X_2(t)\,X_1(t+\tau_1)\,X_1(t+\tau_2)]\,, \\
R_{X_2X_1X_2}(\tau_1,\tau_2) &= E[X_2(t)\,X_1(t+\tau_1)\,X_2(t+\tau_2)]\,, \text{ and} \\
R_{X_2X_2X_1}(\tau_1,\tau_2) &= E[X_2(t)\,X_2(t+\tau_1)\,X_1(t+\tau_2)]\,,
\end{aligned} \qquad (4.8)$$

according to the six permutations of triples $(X_{j_1}, X_{j_2}, X_{j_3})$ with $j_1, j_2, j_3 \in \{1, 2\}$ [24]. This results in six cross-bispectra,

$$\mathrm{CB}_{X_{j_1}X_{j_2}X_{j_3}}(\nu_1, \nu_2) = \frac{1}{(2\pi)^2} \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty} R_{X_{j_1}X_{j_2}X_{j_3}}(\tau_1, \tau_2)\, e^{-i(\nu_1\tau_1 + \nu_2\tau_2)}\,, \qquad (4.9)$$

with respective choices $j_1, j_2, j_3 \in \{1, 2\}$ [24]. For $j_1 = j_2 = j_3$, the cross-bispectrum reduces to the bispectrum. In analogy to the univariate quadratic phase-coupling model, Eq. (4.4), quadratic phase coupling across processes is of the form [136]

$$X_1(t) = \cos(\omega_{11} t + \varphi_{11}) + \cos(\omega_{12} t + \varphi_{12}) + \cos\big( (\omega_{11} + \omega_{22})\, t + \varphi_{11} + \varphi_{22} \big)\,, \qquad (4.10)$$
$$X_2(t) = \cos(\omega_{21} t + \varphi_{21}) + \cos(\omega_{22} t + \varphi_{22}) + \cos\big( (\omega_{21} + \omega_{22})\, t + \varphi \big)\,, \qquad (4.11)$$


Figure 4.1.: Symmetries of the bispectrum. It is enough to consider ν₁, ν₂ such that ν₁ ≥ ν₂ ≥ 0, since the other frequency tuples can be reconstructed from this sector. Exemplarily, the reconstruction of the bispectrum in other sectors from $B_X(\nu_1, \nu_2)$ for ν₁ = 4 Hz and ν₂ = 9 Hz is demonstrated. A blue 'x' denotes bispectral values equal to $B_X(\nu_1, \nu_2)$, a red 'o' denotes those equal to $B_X^{+}(\nu_1, \nu_2)$.

with random but fixed phase shifts $\varphi_{11}, \varphi_{12}, \varphi_{21}, \varphi_{22}$, and $\varphi \overset{\text{iid}}{\sim} \mathcal{U}(0, 2\pi]$ for each realization. The third oscillator in process X₁(t) is influenced by the phase $\omega_{22} t + \varphi_{22}$ of the second oscillation of X₂(t). The corresponding cross-bispectrum

$$\mathrm{CB}_{X_1X_2X_1}(\nu_1, \nu_2) = \frac{1}{(2\pi)^2} \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty} R_{X_1X_2X_1}(\tau_1, \tau_2)\, e^{-i(\nu_1\tau_1 + \nu_2\tau_2)} \qquad (4.12)$$

is nonzero at frequency tuples (ν1, ν2) = ±(ω11, ω22) [136].

Symmetries of bispectrum and cross-bispectrum

Bispectra and cross-bispectra are governed by the symmetries [136,152] shown in Fig. 4.1. The bispectrum of the whole (ν₁, ν₂)-plane may be reconstructed if the bispectrum of one of the twelve sectors is known. Bispectra within one sector equal the complex-conjugated bispectra of the opposite sector. Sectors mirrored at the first angular bisector are obtained by transposition. Given the bispectrum at a frequency tuple (ν₁, ν₂),


the bispectral values at eleven tuples may be reconstructed according to [152]

$$\begin{aligned}
B_X(\nu_1, \nu_2) &= B_X(\nu_2, \nu_1) \\
&= B_X^{+}(-\nu_2, -\nu_1) = B_X^{+}(-\nu_1, -\nu_2) \\
&= B_X^{+}(-\nu_2, \nu_1+\nu_2) = B_X^{+}(\nu_1+\nu_2, -\nu_2) \\
&= B_X^{+}(-\nu_1, \nu_1+\nu_2) = B_X^{+}(\nu_1+\nu_2, -\nu_1) \\
&= B_X(-\nu_1-\nu_2, \nu_2) = B_X(\nu_2, -\nu_1-\nu_2) \\
&= B_X(-\nu_1-\nu_2, \nu_1) = B_X(\nu_1, -\nu_1-\nu_2)\,,
\end{aligned} \qquad (4.13)$$

where + denotes complex conjugation. In Fig. 4.1, the frequency tuples at which the bispectrum equals the exemplary $B_X(4\,\mathrm{Hz}, 9\,\mathrm{Hz})$ are shown in blue, while the tuples corresponding to its complex conjugate are displayed in red.

The six cross-bispectra of two processes X₁(t) and X₂(t) exhibit fewer symmetries than the univariate bispectrum. In general, only the symmetry

$$\mathrm{CB}(\nu_1, \nu_2) = \mathrm{CB}^{+}(-\nu_1, -\nu_2) \qquad (4.14)$$

holds [124]. Furthermore, only four of the six different types of cross-bispectra contain different information, since

$$\mathrm{CB}_{X_1X_2X_1}(\nu_1, \nu_2) = \mathrm{CB}_{X_2X_1X_1}(\nu_1, \nu_2)\,, \qquad (4.15)$$
$$\mathrm{CB}_{X_2X_1X_2}(\nu_1, \nu_2) = \mathrm{CB}_{X_1X_2X_2}(\nu_1, \nu_2)\,, \qquad (4.16)$$

according to the quadratic phase coupling [136].

4.1.2. Estimation of spectrum and bispectrum

Uni- and bivariate bispectra are estimated in analogy to the spectrum and the cross-spectrum [169]. Key to spectral estimation is the finite Fourier transform,

$$\tilde{z}(\nu_k) = \sum_{t=1}^{L} z(t)\, e^{-i\nu_k t}\,, \qquad (4.17)$$

of a realization z(t), t = 1, ..., L, sampled from a process Z(t). The Fourier transform yields a representation of z(t) in the frequency domain, evaluated at the natural frequencies $\nu_k = \frac{2\pi k}{L}$, k = 1, ..., L [24]. An estimator of the spectrum is the finite

Fourier transform of the estimated autocovariance

$$\hat{R}_X(\tau) = \frac{1}{L} \sum_{t=\tau+1}^{L} x(t)\, x(t-\tau) \qquad (4.18)$$

divided by 2π [169]. An equivalent estimator is the periodogram

$$P_x(\nu) = \frac{1}{L} \left| \tilde{x}(\nu) \right|^2 \qquad (4.19)$$


divided by 2π [169]. This estimator is asymptotically unbiased, yet inconsistent [24]. A consistent estimator of the spectrum is

$$\hat{S}_X(\nu) = \frac{1}{2\pi\, n_R} \sum_{k=1}^{n_R} P_{x^{(k)}}(\nu)\,, \qquad (4.20)$$

which is obtained from averaging the periodograms of $n_R$ independent realizations or segments of data, denoted $x^{(k)}(t)$, k = 1, ..., $n_R$ [169].

Analogous to the univariate spectrum, the bivariate cross-spectrum of two processes X₁(t) and X₂(t) is estimated by averaging cross-periodograms,

$$\mathrm{CP}_{x_1^{(k)} x_2^{(k)}}(\nu) = \frac{1}{L}\, \tilde{x}_1^{(k)}(\nu)\, \tilde{x}_2^{(k)+}(\nu)\,, \qquad (4.21)$$

of independent realizations $x_1^{(k)}(t)$ and $x_2^{(k)}(t)$ [169]. Again, the resulting estimator of the cross-spectrum,

$$\widehat{\mathrm{CS}}_{X_1X_2}(\nu) = \frac{1}{2\pi\, n_R} \sum_{k=1}^{n_R} \mathrm{CP}_{x_1^{(k)} x_2^{(k)}}(\nu)\,, \qquad (4.22)$$

is consistent [24]. The coherence, cf. Eq. (4.5), is estimated by replacing the cross-spectrum $\mathrm{CS}_{X_1X_2}(\nu)$ and the spectra $S_{X_1}(\nu)$ and $S_{X_2}(\nu)$ by their estimators, such that [24]

$$\widehat{\mathrm{Coh}}_{X_1X_2}(\nu) = \frac{\widehat{\mathrm{CS}}_{X_1X_2}(\nu)}{\sqrt{\hat{S}_{X_1}(\nu)\, \hat{S}_{X_2}(\nu)}}\,. \qquad (4.23)$$

Spectral and cross-spectral estimates are based on the periodogram and the cross-periodogram. The third order analogs of the periodogram and the cross-periodogram are the biperiodogram [91]

$$\mathrm{BP}_{x^{(k)}x^{(k)}x^{(k)}}(\nu_1, \nu_2) = \frac{1}{L}\, \tilde{x}^{(k)}(\nu_1)\, \tilde{x}^{(k)}(\nu_2)\, \tilde{x}^{(k)+}(\nu_1 + \nu_2)\,, \qquad (4.24)$$

and the cross-biperiodogram [124]

$$\mathrm{CBP}_{x_{j_1}^{(k)} x_{j_2}^{(k)} x_{j_3}^{(k)}}(\nu_1, \nu_2) = \frac{1}{L}\, \tilde{x}_{j_1}^{(k)}(\nu_1)\, \tilde{x}_{j_2}^{(k)}(\nu_2)\, \tilde{x}_{j_3}^{(k)+}(\nu_1 + \nu_2)\,, \qquad (4.25)$$

of the respective realizations $x^{(k)}(t)$, $x_{j_1}^{(k)}(t)$, $x_{j_2}^{(k)}(t)$, and $x_{j_3}^{(k)}(t)$, with $j_1, j_2, j_3 \in \{1, 2\}$. Analogous to the second order case, bispectra and cross-bispectra are obtained from averaging biperiodograms and cross-biperiodograms. In particular, an estimator of the bispectrum of a process X₁(t) is

$$\hat{B}_{X_1X_1X_1}(\nu_1, \nu_2) =: \hat{B}_{X_1}(\nu_1, \nu_2) = \frac{1}{(2\pi)^2\, n_R} \sum_{k=1}^{n_R} \mathrm{BP}_{x_1^{(k)}x_1^{(k)}x_1^{(k)}}(\nu_1, \nu_2)\,, \qquad (4.26)$$


and the estimator of the corresponding cross-bispectrum is

$$\widehat{\mathrm{CB}}_{X_{j_1}X_{j_2}X_{j_3}}(\nu_1, \nu_2) = \frac{1}{(2\pi)^2\, n_R} \sum_{k=1}^{n_R} \mathrm{CBP}_{x_{j_1}^{(k)} x_{j_2}^{(k)} x_{j_3}^{(k)}}(\nu_1, \nu_2)\,, \qquad (4.27)$$

with $j_1, j_2, j_3 \in \{1, 2\}$. These estimators are asymptotically unbiased and consistent, since

$$E\left[\widehat{\mathrm{CB}}_{X_{j_1}X_{j_2}X_{j_3}}(\nu_1, \nu_2)\right] = \mathrm{CB}_{X_{j_1}X_{j_2}X_{j_3}}(\nu_1, \nu_2)\,, \quad \text{and} \qquad (4.28)$$
$$\mathrm{Var}\left[\widehat{\mathrm{CB}}_{X_{j_1}X_{j_2}X_{j_3}}(\nu_1, \nu_2)\right] = \frac{L}{n_R}\, S_{X_{j_1}}(\nu_1)\, S_{X_{j_2}}(\nu_2)\, S_{X_{j_3}}(\nu_1 + \nu_2)\,, \qquad (4.29)$$

for $\frac{L}{n_R} \to 0$ as L → ∞ and $n_R \to \infty$ [91].

As is common in the literature [91,110,124,136,152], all factors of 2π within the scope of spectral and bispectral analysis are omitted in the following.
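As a concrete instance of Eqs. (4.24) and (4.26), the following sketch averages biperiodograms over segments of a single realization; the segment count, frequencies, and synthetic test signal follow Eq. (4.4) and are illustrative (constant factors of 2π omitted, as stated above).

```python
import numpy as np

def bispectrum(x, n_seg, k1, k2):
    """Average the biperiodogram, Eq. (4.24), over n_seg segments, Eq. (4.26);
    k1, k2 index the natural frequencies of one segment."""
    segs = np.array_split(np.asarray(x, float) - np.mean(x), n_seg)
    acc = 0.0 + 0.0j
    for s in segs:
        L = len(s)
        ft = np.fft.fft(s)
        acc += ft[k1] * ft[k2] * np.conj(ft[(k1 + k2) % L]) / L
    return acc / n_seg

# quadratic phase coupling as in Eq. (4.4): a peak at the coupled bins
rng = np.random.default_rng(5)
t = np.arange(2 ** 14)
p1, p2 = rng.uniform(0, 2 * np.pi, 2)
x = (np.cos(2 * np.pi * 0.05 * t + p1) + np.cos(2 * np.pi * 0.11 * t + p2)
     + np.cos(2 * np.pi * 0.16 * t + p1 + p2) + rng.normal(0, 0.5, t.size))
# 16 segments of length 1024; 0.05 and 0.11 cycles/sample ~ bins 51 and 113
print(abs(bispectrum(x, 16, 51, 113)))
```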

4.1.3. Normalizations of the bispectrum

The two most common normalizations of the bispectrum yield the bicoherence and the bispectral coefficient [53]. David R. Brillinger presented both normalizations as a result of the regression analysis of intra-wave coupling [23]. While the bicoherence is a direct result of the regression analysis, the bispectral coefficient is an approximation that ensures normalization of the bispectrum by its estimator's variance [23]. In the following, these two normalizations are presented for the univariate case. Analogous definitions corresponding to the bivariate case of two processes X₁(t) and X₂(t) lead to cross-bicoherences and cross-bispectral coefficients.

Bicoherence

The bicoherence is the coefficient of multiple correlation obtained from the regression model [110],

$$\tilde{X}(\nu_1 + \nu_2) = c_{\nu_1\nu_2}\, \tilde{X}(\nu_1)\, \tilde{X}(\nu_2)\,. \qquad (4.30)$$

Even though Brillinger had published a higher order regression model to motivate higher order spectral analysis in general [23], the resulting coefficient of multiple correlation remained unused until Kim and Powers proposed to use it as a means to quantify the "coherence between three waves due to the wave coupling" [111], as given by Eq. (4.30).

For the regression analysis, the mean squared error,

$$\begin{aligned}
\mathrm{MSE} &= E\left[ \left( \tilde{X}(\nu_1+\nu_2) - c_{\nu_1\nu_2}\, \tilde{X}(\nu_1)\tilde{X}(\nu_2) \right)^2 \right] \\
&= E\left[ \left( \tilde{X}(\nu_1+\nu_2) - c_{\nu_1\nu_2}\, \tilde{X}(\nu_1)\tilde{X}(\nu_2) \right) \tilde{X}^{+}(\nu_1+\nu_2) \right] \\
&\quad - c_{\nu_1\nu_2}^{+}\, E\left[ \left( \tilde{X}(\nu_1+\nu_2) - c_{\nu_1\nu_2}\, \tilde{X}(\nu_1)\tilde{X}(\nu_2) \right) \left( \tilde{X}(\nu_1)\tilde{X}(\nu_2) \right)^{+} \right],
\end{aligned} \qquad (4.31)$$


is minimized. At the minimum, the first derivative of the mean squared error with respect to the parameter $c_{\nu_1\nu_2}$ necessarily vanishes, i.e.,

E[(X(ν1 + ν2)− cν1ν2X(ν1)X(ν2)

) (X(ν1)X(ν2)

)+]

= 0 . (4.32)

Accordingly, the second term of the MSE, cf. Eq. (4.31), vanishes. Solving Eq. (4.32) for the regression parameter cν1ν2, i.e., for the minimum of the MSE, yields

\[
c_{\nu_1\nu_2} = \frac{\mathrm{E}\!\left[X(\nu_1 + \nu_2)\left(X(\nu_1)\, X(\nu_2)\right)^{+}\right]}{\mathrm{E}\!\left[\left|X(\nu_1)\, X(\nu_2)\right|^2\right]} = \frac{B_X^{+}(\nu_1, \nu_2)}{\mathrm{E}\!\left[\left|X(\nu_1)\, X(\nu_2)\right|^2\right]}\,. \tag{4.33}
\]

Inserting cν1ν2 into Eq. (4.31) leads to

\[
\begin{aligned}
MSE &= \mathrm{E}\!\left[\left|X(\nu_1 + \nu_2)\right|^2\right] - \frac{B_X^{+}(\nu_1, \nu_2)}{\mathrm{E}\!\left[\left|X(\nu_1)\, X(\nu_2)\right|^2\right]}\; \mathrm{E}\!\left[X(\nu_1)\, X(\nu_2)\, X^{+}(\nu_1 + \nu_2)\right] \\
    &= \mathrm{E}\!\left[\left|X(\nu_1 + \nu_2)\right|^2\right] \left(1 - \frac{\left|B_X(\nu_1, \nu_2)\right|^2}{\mathrm{E}\!\left[\left|X(\nu_1)\, X(\nu_2)\right|^2\right] \mathrm{E}\!\left[\left|X(\nu_1 + \nu_2)\right|^2\right]}\right).
\end{aligned} \tag{4.34}
\]

The ratio

\[
\frac{\left|B_X(\nu_1, \nu_2)\right|^2}{\mathrm{E}\!\left[\left|X(\nu_1)\, X(\nu_2)\right|^2\right] \mathrm{E}\!\left[\left|X(\nu_1 + \nu_2)\right|^2\right]} =: \rho^2 \tag{4.35}
\]

is the squared coefficient of multiple correlation [110,196], also called the coefficient of determination [23]. It quantifies the goodness by which the model of Eq. (4.30) fits the process analyzed. It attains values in the range [0, 1]. For ρ² = 1, the process is perfectly fitted by the model, such that the mean squared error is 0. For ρ² = 0, the mean squared error is maximal, E[|X(ν1 + ν2)|²]. The square root of ρ² defines the bicoherence [110],

\[
Bcoh_X(\nu_1, \nu_2) = \frac{\left|B_X(\nu_1, \nu_2)\right|}{\sqrt{\mathrm{E}\!\left[\left|X(\nu_1)\, X(\nu_2)\right|^2\right]}\; \sqrt{\mathrm{E}\!\left[\left|X(\nu_1 + \nu_2)\right|^2\right]}}\,. \tag{4.36}
\]

It is the analog of the coherence in that it is the coefficient of multiple correlation obtained from regression analysis [110,196]. Like the absolute value of the coherence, the bicoherence is normalized to values in [0, 1].


The estimator of the bicoherence is obtained from substituting each component in the definition by the corresponding estimator [110], resulting in

\[
Bcoh_X(\nu_1, \nu_2) = \frac{\left|\frac{1}{n_R}\sum_{k=1}^{n_R} BP_{x^{(k)}}(\nu_1, \nu_2)\right|}{\sqrt{\frac{1}{n_R}\sum_{k=1}^{n_R} \left|x^{(k)}(\nu_1)\, x^{(k)}(\nu_2)\right|^2}\; \sqrt{\frac{1}{n_R}\sum_{k=1}^{n_R} \left|x^{(k)}(\nu_1 + \nu_2)\right|^2}}\,. \tag{4.37}
\]

The major drawback of the bicoherence is that the denominator increases as L increases, such that low values do not necessarily correspond to third-order uncoupled Gaussian processes [91]. This drawback is resolved by the bispectral coefficient [91], presented in the following passage.

Bispectral coefficient

Instead of using the coefficient of determination as in the case of the bicoherence, Brillinger proposed to consider its approximation [23]

\[
Bc_X(\nu_1, \nu_2) = \frac{B_X(\nu_1, \nu_2)}{\sqrt{S_X(\nu_1)\, S_X(\nu_2)\, S_X(\nu_1 + \nu_2)}}\,. \tag{4.38}
\]

BcX is called bicoherence [152] or bicoherence spectrum [109] in the engineering literature. Those names are ambiguous since |BcX| is neither normalized to values in [0, 1], nor is it the coefficient of multiple correlation from regression analysis as the coherence is, nor does it correspond to the bicoherence defined in Eq. (4.36) [91]. Here, it is called the bispectral coefficient instead.

The interpretation of the bispectral coefficient becomes explicit when considering the linear model [91]

\[
X(t) = \sum_{\tau=-\infty}^{\infty} a_\tau\, \varepsilon(t - \tau)\,. \tag{4.39}
\]

The iid noise ε(t) with variance σ²ε and skewness mε = E[ε³(t)] is filtered by the coefficients aτ [91]. Both the spectrum SX(ν) = σ²ε |HX(ν)|² and the bispectrum

\[
B_X(\nu_1, \nu_2) = m_\varepsilon\, H_X(\nu_1)\, H_X(\nu_2)\, H_X^{+}(\nu_1 + \nu_2) \tag{4.40}
\]

are functions of the transfer function HX(ν) = a(ν) [91]. Using Eq. (4.38), the absolute value of the bispectral coefficient, |BcX(ν1, ν2)| = mε/σ³ε, is proportional to the skewness mε [91]. Accordingly, |BcX(ν1, ν2)| = 0 for Gaussian processes [169].

To estimate the bispectral coefficient, all components SX(ν1), SX(ν2), SX(ν1 + ν2), and BX(ν1, ν2) are substituted by their corresponding estimators [91]. Since the bispectral coefficient is the bispectrum divided by a factor proportional to the asymptotic variance of the bispectral estimator, cf. Eq. (4.29), the estimator of the bispectral coefficient is asymptotically complex normally distributed with unit variance [23,91].
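Both normalizations can be computed alongside the bispectral estimate. The following sketch is a rough illustration under the stated conventions: constant factors (powers of L and 2π) are omitted, so the returned values agree with Eqs. (4.37) and (4.38) only up to overall scaling.

```python
import numpy as np

def normalized_bispectra(blocks):
    """Plug-in estimates of bicoherence, cf. Eq. (4.37), and bispectral
    coefficient, cf. Eq. (4.38), from independent blocks."""
    n_R, L = blocks.shape
    F = np.fft.fft(blocks, axis=1)
    m = L // 2
    idx = (np.arange(m)[:, None] + np.arange(m)[None, :]) % L
    B = np.zeros((m, m), dtype=complex)   # averaged triple products
    P12 = np.zeros((m, m))                # estimates E[|x(nu1) x(nu2)|^2]
    Psum = np.zeros((m, m))               # estimates E[|x(nu1 + nu2)|^2]
    S = np.zeros(L)                       # spectrum estimate
    for k in range(n_R):
        f = F[k]
        prod = np.outer(f[:m], f[:m])
        B += prod * np.conj(f[idx]) / n_R
        P12 += np.abs(prod) ** 2 / n_R
        Psum += np.abs(f[idx]) ** 2 / n_R
        S += np.abs(f) ** 2 / n_R
    bicoh = np.abs(B) / np.sqrt(P12 * Psum)                # cf. Eq. (4.37)
    bc = B / np.sqrt(S[:m, None] * S[None, :m] * S[idx])   # cf. Eq. (4.38)
    return bicoh, bc
```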


4.1.4. Statistical analysis of normalized bispectra

To statistically assess whether bispectra or normalized bispectra are compatible with zero, hypothesis tests may be employed. Bispectra and corresponding normalized coefficients may be zero due to either Gaussianity or linearity [169]. When testing the null hypothesis of linearity, it is therefore useful to ensure Gaussianity of the data before estimating the bispectrum or any of its normalized coefficients. The procedure of obtaining data (y(1), . . . , y(T)) that conform to a Gaussian distribution from not necessarily Gaussian data (x(1), . . . , x(T)) is further called gaussianization. To gaussianize data, first a set {z(1), . . . , z(T)} of Gaussian random numbers is realized. Both this set and the original set of data are sorted according to their values, yielding (z(1), . . . , z(T)) and (x(1), . . . , x(T)). For each time point t, the original datum x(t) is substituted according to its rank kt in (x(1), . . . , x(T)) by the Gaussian number of the same rank kt in (z(1), . . . , z(T)). This yields the gaussianized data (z(k1), . . . , z(kT)) that correspond to (x(1), . . . , x(T)).

After gaussianization, the null hypothesis of linearity may be tested by a hypothesis test of zero-bispectrum, zero-bicoherence, or zero-bispectral coefficient.
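The rank substitution described above takes only a few lines. The following sketch assumes numpy; the function name and the explicit random-number generator are illustrative.

```python
import numpy as np

def gaussianize(x, rng=None):
    """Rank-based gaussianization: replace each datum x(t) by the
    Gaussian random number of the same rank."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.sort(rng.standard_normal(len(x)))   # sorted Gaussian numbers
    ranks = np.argsort(np.argsort(x))          # rank k_t of each x(t)
    return z[ranks]                            # gaussianized data
```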

The estimator of the bispectrum is known to be asymptotically complex normally distributed about the true bispectrum with variance given by (4.29) [91]. The estimator of the bicoherence is claimed to be χ²-distributed [53,54]. However, this result is based on simulations rather than on analytic derivations. The bispectral coefficient is defined such that its estimator is asymptotically complex normally distributed with variance 1 about the true bispectral coefficient [23]. Its asymptotic behavior is reached in the limit of L = T^u, where u is a parameter in the range (0, 1/2) [90,91]. By the identity T = nR L, the number nR of realizations or blocks for the estimation of the bispectral coefficient is determined.

As published in [130], an alternative based on the block bootstrap is proposed to derive the null distribution of zero-bispectrum, zero-bicoherence, and zero-bispectral coefficient, respectively. The fundamental idea of the block bootstrap is to resample the segments used for bispectral estimation to derive bootstrap bispectra, bicoherences, or bispectral coefficients, in which the absence of quadratic phase coupling is ensured. The following 5-step algorithm explains the block bootstrap in more detail.

(1) Consider nBl blocks of independent data {x^(k)(t), t = 1, . . . , L}_{k=1,...,nBl}. Each block contains L data points that can be derived from nBl independent recordings of the analyzed process or from nBl independent blocks of one recording.

(2) Draw randomly with replacement nBl tuples (x^(k), x^(lk))_{k=1,...,nBl} of blocks from the set obtained in (1). In particular, for each randomly drawn block k, draw a second block lk ≠ k randomly, and repeat this for all blocks k = 1, . . . , nBl.


(3) For each block k = 1, . . . , nBl in (2), compute the bootstrap biperiodograms

\[
BP_x^{(k)*}(\nu_1, \nu_2) = \frac{1}{T}\left(x^{(k)}(\nu_1)\, x^{(k)}(\nu_2)\, x^{(l_k)+}(\nu_1 + \nu_2)\right), \tag{4.41}
\]

and the components

\[
f_k(\nu_1) := \left|x^{(k)}(\nu_1)\right|^2, \tag{4.42}
\]
\[
f_k(\nu_2) := \left|x^{(k)}(\nu_2)\right|^2, \tag{4.43}
\]
\[
f_{l_k}(\nu_1 + \nu_2) := \left|x^{(l_k)}(\nu_1 + \nu_2)\right|^2, \tag{4.44}
\]
\[
f_{k,\nu_1\nu_2} := \left|x^{(k)}(\nu_1)\, x^{(k)+}(\nu_2)\right|^2, \tag{4.45}
\]

that constitute the normalizations of bicoherence and bispectral coefficient.

(4) Obtain the bootstrap bispectrum from averaging the biperiodograms of (3), such that

\[
B_x^{*}(\nu_1, \nu_2) = \frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} BP_x^{(k)*}(\nu_1, \nu_2)\,. \tag{4.46}
\]

(5) Obtain the bootstrap bicoherence

\[
Bcoh_x^{*}(\nu_1, \nu_2) = \frac{\left|B_x^{*}(\nu_1, \nu_2)\right|}{\sqrt{\frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} f_{k,\nu_1\nu_2}\; \frac{1}{n_{Bl}} \sum_{l_k=1}^{n_{Bl}} f_{l_k}(\nu_1 + \nu_2)}}\,, \tag{4.47}
\]

and bootstrap bispectral coefficient

\[
Bc_x^{*}(\nu_1, \nu_2) = \frac{B_x^{*}(\nu_1, \nu_2)}{\sqrt{\frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} f_k(\nu_1)\; \frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} f_k(\nu_2)\; \frac{1}{n_{Bl}} \sum_{l_k=1}^{n_{Bl}} f_{l_k}(\nu_1 + \nu_2)}}\,, \tag{4.48}
\]

by averaging the expressions given in Eqs. (4.42)–(4.45) across blocks k = 1, . . . , nBl.

The destruction of quadratic phase coupling is ensured by selecting tuples (x^(k), x^(lk))_{k=1,...,nBl} in (2), such that lk ≠ k. To estimate the null distributions of zero-bispectrum, zero-bicoherence, and zero-bispectral coefficient, the algorithm is repeated nB times. Null distributions are sampled by the empirical distributions of B*_x(ν1, ν2), Bcoh*_x(ν1, ν2), and Bc*_x(ν1, ν2), respectively. The null hypothesis is rejected if the original bispectral value, B_x(ν1, ν2), Bcoh_x(ν1, ν2), or Bc_x(ν1, ν2), exceeds the (1 − α)-quantile of the according sampled null distribution, with α the significance level of the test.


In step (1) of the block bootstrap algorithm, independence of the nBl realizations x^(k)(t) is required. Independence is demanded for bispectral estimation rather than for the block bootstrap; sufficient independence of blocks for the bootstrap would be achieved even by overlapping blocks [119]. If x^(k)(t) is obtained from subdividing a long realization into nBl blocks for bispectral estimation, the autocovariance may be used as a measure of independence. Choosing the block length according to the lag at which the autocovariance has decayed considerably, data of subsequent blocks are sufficiently independent. In analogy to bispectral estimation, the maximum lag of decayed autocovariances and cross-covariances is used to determine the length of blocks for cross-bispectral estimation.
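A simple block-length rule along these lines is sketched below; the decay threshold is an illustrative assumption, standing in for the visual judgment of "decayed considerably".

```python
import numpy as np

def block_length(x, threshold=0.05):
    """Smallest lag at which the normalized autocovariance of x has
    decayed below the threshold; used as block length."""
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..T-1
    acov = acov / acov[0]                                # normalize to 1 at lag 0
    below = np.flatnonzero(np.abs(acov) < threshold)
    return int(below[0]) if below.size else len(x)
```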

For cross-bispectral estimation, further adaptations of the above block bootstrap are necessary. According to the null hypothesis tested, only one of the six different cross-bispectra and respective normalizations needs to be considered. For the sake of brevity, the adaptations of the block bootstrap are made explicit for the estimated cross-bicoherence

\[
CBcoh_{x_1x_2x_1}(\nu_1, \nu_2) = \frac{\left|\frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} x_1^{(k)}(\nu_1)\, x_2^{(k)}(\nu_2)\, x_1^{(l_k)+}(\nu_1 + \nu_2)\right|}{\sqrt{\frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} \left|x_1^{(k)}(\nu_1)\, x_2^{(k)+}(\nu_2)\right|^2\; \frac{1}{n_{Bl}} \sum_{l_k=1}^{n_{Bl}} \left|x_1^{(l_k)}(\nu_1 + \nu_2)\right|^2}}\,, \tag{4.49}
\]

exemplarily. By CBcoh_{x1x2x1}(ν1, ν2), quadratic phase coupling from frequency ν1 in X1(t) and frequency ν2 in X2(t) onto frequency ν1 + ν2 in X1(t) is quantified.

(1) Consider nBl blocks of independent data {(x1^(k), x2^(k))} of both processes X1(t) and X2(t).

(2) Draw randomly with replacement nBl triples (x1^(k), x2^(k), x1^(lk)), with lk ≠ k.

(3) For each block k = 1, . . . , nBl, compute cross-biperiodograms CBP^{(k)*}_{x1x2x1}(ν1, ν2) and adapt Eqs. (4.42)–(4.45) to

\[
f_k(\nu_1) := \left|x_1^{(k)}(\nu_1)\right|^2, \tag{4.50}
\]
\[
f_k(\nu_2) := \left|x_2^{(k)}(\nu_2)\right|^2, \tag{4.51}
\]
\[
f_{l_k}(\nu_1 + \nu_2) := \left|x_1^{(l_k)}(\nu_1 + \nu_2)\right|^2, \tag{4.52}
\]
\[
f_{k,\nu_1\nu_2} := \left|x_1^{(k)}(\nu_1)\, x_2^{(k)+}(\nu_2)\right|^2. \tag{4.53}
\]

(4) In analogy to the univariate case, obtain the bootstrap cross-bispectral estimate

\[
CB^{*}_{x_1x_2x_1}(\nu_1, \nu_2) = \frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} CBP^{(k)*}_{x_1x_2x_1}(\nu_1, \nu_2)\,. \tag{4.54}
\]


(5) Obtain the bootstrap cross-bicoherence estimate

\[
CBcoh^{*}_{x_1x_2x_1}(\nu_1, \nu_2) = \frac{\left|CB^{*}_{x_1x_2x_1}(\nu_1, \nu_2)\right|}{\sqrt{\frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} f_{k,\nu_1\nu_2}\; \frac{1}{n_{Bl}} \sum_{l_k=1}^{n_{Bl}} f_{l_k}(\nu_1 + \nu_2)}}\,, \tag{4.55}
\]

and the bootstrap cross-bispectral coefficient estimate

\[
CBc^{*}_{x_1x_2x_1}(\nu_1, \nu_2) = \frac{CB^{*}_{x_1x_2x_1}(\nu_1, \nu_2)}{\sqrt{\frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} f_k(\nu_1)\; \frac{1}{n_{Bl}} \sum_{k=1}^{n_{Bl}} f_k(\nu_2)\; \frac{1}{n_{Bl}} \sum_{l_k=1}^{n_{Bl}} f_{l_k}(\nu_1 + \nu_2)}}\,, \tag{4.56}
\]

by averaging the expressions given in Eqs. (4.50)–(4.53) across blocks k = 1, . . . , nBl.

As in the univariate case, the (1 − α)-quantiles of the bootstrapped null distributions may be used in corresponding hypothesis tests.

4.2. Application

In this section, bispectral analysis is applied to distinguish first harmonics of quadratic processes from independent oscillations at twice the fundamental frequency, both in simulations and in medical data. First harmonics of quadratic processes are modeled by quadratic phase coupling. By bispectral analysis these harmonics are distinguishable from independent oscillations, as shown in Sec. 4.2.1, following publication [130]. For testing the null hypothesis of zero-bicoherence or zero-bispectral coefficient, the proposed block bootstrap is employed in Sec. 4.2.2. The performance of this test is assessed with respect to reliability and power. It is compared to the test based on the analytic null distribution of the bispectral coefficient. Finally, bispectral analysis with the proposed block bootstrap is applied to tremor data in Sec. 4.2.3, following publication [130].

4.2.1. Modeling first order harmonics

To model higher harmonics in the spectrum, the quadratic phase-coupling model as given in Eq. (4.4) is adapted. Setting ω1 = ω2 =: ω and adding Gaussian observation noise η(t) iid∼ N(0, σ²η) yields

\[
X(t) = \cos(\omega t + \varphi_1) + \cos(\omega t + \varphi_2) + \cos(2\omega t + \varphi) + \eta(t)\,. \tag{4.57}
\]

Again, phase shifts ϕ1, ϕ2 iid∼ U(0, 2π] are uniformly distributed. The spectrum exhibits peaks at ±ω and ±2ω. If ϕ = ϕ1 + ϕ2, the spectral peak at 2ω is the higher harmonic of ω. If, on the contrary, ϕ iid∼ U(0, 2π], the 2ω-oscillation corresponds to an independent oscillation.


Figure 4.2.: Comparison of independent oscillations at 4 Hz and 8 Hz (top) and quadratic phase coupling at 8 Hz (bottom). Estimated spectra (a,b) are indistinguishable. However, from the estimated bicoherence (c,d) and the estimated bispectral coefficient (e,f), the differences become apparent. For better displayability, estimators' hats are omitted.

To see the effect on the spectrum, nBl = 200 realizations of X(t), as given by Eq. (4.57), are simulated. Realizations are of length L = 10 000, with oscillation frequency ω = 2π 4 Hz and observation noise η(t) of variance σ²η such that the noise-to-signal ratio is NSR = 10%. Estimated spectra are shown in Fig. 4.2. They are identical whether the oscillation at 8 Hz is independent from the oscillation at 4 Hz (a), i.e., ϕ ∼ U(0, 2π], or is its higher harmonic (b), i.e., ϕ = ϕ1 + ϕ2. Applying bispectral analysis reveals the differences: bicoherences (c,d) and bispectral coefficients (e,f) exhibit peaks at (4 Hz, 4 Hz) if and only if a higher harmonic underlies the data (bottom: d,f).
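A simulation of the model Eq. (4.57) may be sketched as follows; the sampling rate and the interpretation of NSR as the ratio of noise variance to signal variance are illustrative assumptions.

```python
import numpy as np

def simulate_qpc(n_bl=200, L=10_000, fs=1_000.0, f=4.0, nsr=0.1,
                 harmonic=True, rng=None):
    """Realizations of Eq. (4.57). For harmonic=True the 2*omega
    component is quadratically phase coupled (phi = phi1 + phi2);
    otherwise its phase is drawn independently."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(L) / fs
    w = 2 * np.pi * f
    x = np.empty((n_bl, L))
    for k in range(n_bl):
        p1, p2, p3 = rng.uniform(0, 2 * np.pi, size=3)
        phi = p1 + p2 if harmonic else p3
        s = np.cos(w * t + p1) + np.cos(w * t + p2) + np.cos(2 * w * t + phi)
        x[k] = s + rng.standard_normal(L) * np.sqrt(nsr * s.var())
    return x
```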

For analogous bivariate considerations, the quadratic phase-coupling model given by Eqs. (4.10) and (4.11) is extended, such that

\[
X_1(t) = c_{11} \cos(\omega t + \varphi_{11}) + c_{12} \cos(\omega t + \varphi_{12}) + c_{13} \cos(2\omega t + \varphi_{13}) + c_{14} \cos(2\omega t + \varphi_{14}) + \eta_1(t)\,, \tag{4.58}
\]
\[
X_2(t) = c_{21} \cos(\omega t + \varphi_{21}) + c_{22} \cos(\omega t + \varphi_{22}) + c_{23} \cos(2\omega t + \varphi_{23}) + \eta_2(t)\,. \tag{4.59}
\]

A coherence peak occurs at ±ω and ±2ω if phase shifts are uniformly distributed except for the constraints

\[
\varphi_{11} = \varphi_{21} \quad\text{and}\quad \varphi_{13} = \varphi_{23}\,. \tag{4.60}
\]


Figure 4.3.: Comparison of linear coupling at 4 Hz and 8 Hz (top) and additionally quadratic phase coupling at 8 Hz (bottom). As in the univariate case, estimated coherences (a,b) are indistinguishable. However, from the estimated cross-bicoherence (c,d) and cross-bispectral coefficient (e,f), the differences become apparent. For better displayability, estimators' hats are omitted.

This reflects linear coupling in the frequency domain, as quantified by coherence. The system exhibits quadratic phase coupling if furthermore

\[
\varphi_{14} = \varphi_{12} + \varphi_{22}\,. \tag{4.61}
\]

In Fig. 4.3, the results of bivariate spectral and bivariate bispectral estimation based on nBl = 200 realizations of L = 10 000 data points are shown for model parameters c11 = c21 = c13 = 0.5 and all other cjk = 1, and NSR = 10%, as before. Estimated coherences are comparable (a,b) in the case of linear coupling (top, Eqs. (4.60)) and quadratic phase coupling (bottom, additionally Eq. (4.61)). Cross-bispectral analysis reveals the nonlinearity, as shown in Fig. 4.3 (c–f): estimated cross-bicoherence (c,d) and cross-bispectral coefficient (e,f) peaks occur at (4 Hz, 4 Hz) if and only if quadratic phase coupling, Eq. (4.61), persists.

To investigate whether estimated bicoherence, cross-bicoherence, bispectral coefficient, and cross-bispectral coefficient are significantly different from zero, the proposed block bootstrap may be applied. Its performance is assessed by simulations in the next section.


4.2.2. Performance of block bootstrap-based hypothesis test

Before applying the proposed block bootstrap to empirical data, its reliability and power are tested in the controlled setting of simulations. The performance of the block bootstrap-based hypothesis test is assessed for the univariate and the bivariate case of the estimated bicoherence and bispectral coefficient. The performance is compared to that of the established analytic hypothesis test of the bispectral coefficient [23]. For all bispectral estimates, two settings are tested. First, the asymptotics L = T^u of the analytic test is obeyed [91] with u ∈ (0, 1/2). Second, the asymptotics is not reached. In both settings, data are simulated according to the univariate and bivariate quadratic phase-coupling models presented in Sec. 4.2.1. Data are gaussianized prior to bispectral analysis. To test the block bootstrap in the univariate case, nR = 100 repetitions of a set of nBl realizations of the model

\[
X(t) = \cos(\omega t + \varphi_1) + \cos(\omega t + \varphi_2) + c\, \cos(2\omega t + \varphi_1 + \varphi_2) + \eta(t) \tag{4.62}
\]

are realized. For c = 0, the null hypothesis is met. Each realization is sampled at fs = 1000 Hz and consists of L = 5000 data points. Phase shifts ϕ1, ϕ2 iid∼ U(0, 2π] are random but fixed for each realization. As before, NSR = 10% and ω = 2π 4 Hz. Bicoherence and bispectral coefficient are estimated based on nBl realizations. The explicit number nBl depends on whether the asymptotics of the analytic statistic is reached or not. Significance is tested at the tuple (ω, ω) in each repetition for each coupling strength c ∈ [0, 1] by the proposed block bootstrap applied to bicoherence and bispectral coefficient estimates, and analytically for bispectral coefficient estimates. The bivariate case is tested analogously. Respective realizations are simulated according to the bivariate model,

\[
X_1(t) = 0.5\cos(\omega t + \varphi_{11}) + \cos(\omega t + \varphi_{12}) + 0.5\cos(2\omega t + \varphi_{13}) + c\,\cos(2\omega t + \varphi_{12} + \varphi_{22}) + \eta_1(t)\,, \tag{4.63}
\]
\[
X_2(t) = 0.5\cos(\omega t + \varphi_{11}) + \cos(\omega t + \varphi_{22}) + 0.5\cos(2\omega t + \varphi_{13}) + \eta_2(t)\,, \tag{4.64}
\]

with random phase shifts ϕjk and ω = 2π 4 Hz, as before. Again, the null hypotheses CBcoh_{X1X2X1}(ω, ω) = 0 and CBc_{X1X2X1}(ω, ω) = 0 are met for c = 0. While the first is tested by the block bootstrap-based hypothesis test only, the second is also tested by the analytic approach.
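Given estimates and bootstrapped null samples from the routines sketched above, the reliability and power curves reduce to an empirical rejection rate per coupling strength. The following fragment is a minimal, illustrative tabulation.

```python
import numpy as np

def rejection_rate(estimates, null_samples, alpha=0.05):
    """Fraction of repetitions whose original estimate exceeds the
    (1 - alpha)-quantile of its bootstrapped null distribution.
    estimates: shape (n_rep,); null_samples: shape (n_rep, n_boot)."""
    thresholds = np.quantile(null_samples, 1 - alpha, axis=1)
    return float(np.mean(estimates > thresholds))
```

At c = 0, the returned rate estimates the false positive rate and should not exceed α for a reliable test; for c > 0 it estimates the power.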

In Fig. 4.4, the results of the univariate (a) and the bivariate (b) simulation study are shown for coupling c ∈ (0, 0.1) with significance level α = 5% (black dotted). In this first setting, the asymptotics L = T^u, u ∈ (0, 1/2), is reached. Choosing u = 0.45, the total number of data points T = 5000^{1/0.45} = 165 933 220 is at the lower bound of T for which the asymptotics holds. The resulting number of realizations used for bispectral estimation is nBl = T/L = 33 186. Both in the univariate and in the bivariate case, the performance of the bootstrap is similarly reliable for the bicoherence (red solid) and the bispectral coefficient (blue dashed).


Figure 4.4.: Results of the univariate (a) and bivariate (b) simulation study in case the asymptotics of the analytic test holds, i.e., L = T^u with u ∈ (0, 1/2). The performance of the block bootstrap (red solid, blue dashed) is comparable to that of the analytic test (cyan dash-dotted). All tests are reliable (significance level α = 5%, black dotted) in the case of absent quadratic phase coupling (c = 0). The analytic test is more conservative than the bootstrap-based one.

In the univariate case (a), the test is conservative, yielding 2% false positives at coupling c = 0. In the bivariate case (b), the test exhibits 4% false positives under the null hypothesis. This is compatible with the significance level α. Compared to the bootstrap-based test, the analytic test (cyan dash-dotted) is more conservative. It exhibits less than 0.003% false positives both for the univariate and the bivariate case. For c > 0, its power increases more slowly than that of the bootstrap-based tests in the bivariate case.

For the asymptotics of the analytic approach to hold, the number of realizations needed to estimate the bispectral coefficient is high, here nBl = 33 186 realizations. A second simulation shows the impact on the hypothesis test if this asymptotics is not reached. As a reference for the choice of T, a typical tremor recording of approx. 5 min sampled at 1000 Hz is used [87,88]. The average number of data points needed to meet the prerequisite of independence for bispectral estimation is L = 5000, see App. D. This results in nBl = 60 blocks or realizations, respectively, and a total of T = 300 000 data points. The impact of this restriction is shown in Fig. 4.5. The block bootstrap-based tests (red solid and blue dashed) remain reliable, yet lose power. The loss of power is due to the decrease of nBl, which results in an increased variability of the estimates. In the univariate case (a), the block bootstrap-based test is conservative both for the bicoherence (red solid) and the bispectral coefficient (blue dashed). In the bivariate case (b), the 5% false positive rate for c = 0 is equivalent to the significance level. The analytic test (cyan dash-dotted), on the contrary, is susceptible to the violation of the asymptotics. In the case of the null hypothesis, the significance level is exceeded considerably. The analytic null distribution leads to an unreliable test both for the univariate and the bivariate bispectral coefficient.


Figure 4.5.: Univariate (a) and bivariate (b) hypothesis tests lose power if the asymptotics is not reached. While the block bootstrap-based test is still reliable for both the bicoherence (red solid) and the bispectral coefficient (blue dashed), the analytic test (cyan dash-dotted) becomes unreliable. Instead of 5% false positives for c = 0, the analytic test yields 31% and 35% false positives in (a) and (b), respectively.

As the simulations show, the analytic and the block-bootstrapped null distributions yield reliable and powerful hypothesis tests if the asymptotics L = T^u, u ∈ (0, 1/2), is reached. However, this may require a lot of data, which may not be available or recordable in applications. While the block bootstrap remains reliable for the limited amount of data usually available in applications, the test based on the analytic null distribution fails.

4.2.3. Bispectral analysis of tremor time series

Spectral and cross-spectral analysis have been widely used in the field of tremor analysis to understand the interplay of brain and muscle activity [87,172,182,209]. Trembling corresponding to pathological forms of tremor has been found to be generated in the contralateral motor area of the brain [87,88]. An interaction of parts of the brain and the trembling limb is expected [172,209]. Based on the electromyogram (EMG) and the electroencephalogram (EEG), spectral and coherence peaks have been found at the tremor frequency in the rectified EMG and the EEG [209]. These peaks have been related to the pathology of tremor [87,88]. However, spectral and coherence peaks have been claimed at twice the tremor frequency as well. So far, they have been assumed to be pathologies independent from the ones at tremor frequency [182]. As published in [130], it is shown in this section that the peaks at twice the tremor frequency may refer to first harmonic peaks of the ones occurring at tremor frequency instead of independent processes. Significance of nonzero-bicoherence and nonzero-cross-bicoherence is tested applying the proposed block bootstrap. First, the data base and pre-processing of data are summarized.


Figure 4.6.: Exemplary 1 s recording of EEG (a) and EMG (b) during tremor.

Figure 4.7.: Exemplary EMG autocovariance of three tremor patients (blue) and the respective choice of block lengths (black dashed): 2.5 s (a), 5 s (b), and 10 s (c).

Then, second and third order spectral and cross-spectral analysis is made explicit at the example of one patient. Finally, the results of all patients analyzed are summarized. All uni- and bivariate spectra and bispectra of this section, as well as autocovariances, are estimated quantities. For the sake of clarity, estimator hats are omitted.

Data base

The study includes 58 segments of EMG and simultaneous EEG recordings of 7 essential tremor (ET) and 5 Parkinsonian tremor (PT) patients. EMG is recorded from the trembling wrist muscles (extensor and flexor) during tremor. Since the EMG corresponds to modulated noise, it is rectified prior to spectral analysis, cf. App. C. EEG is recorded from the motor area of the brain contralateral to the trembling hand. While ET is a postural tremor, PT also occurs at rest. From each of the 12 patients, 3 to 6 EEG/EMG segments of 2–5 min are included into the analysis [87,88]. ET patients contribute 30 segments, PT patients 28 segments. All segments included into the study exhibit tremor activity in the EMG. An exemplary second of EEG/EMG during pathological tremor is shown in Fig. 4.6.


Figure 4.8.: Spectra of EEG (a) and EMG (b), as well as their coherence (c). The coherence peak at twice the tremor frequency exceeds the peak at the tremor frequency, 4.8 Hz.

To estimate second and third order spectra and cross-spectra, each 2–5 min segment is subdivided into independent blocks of data. The length of these blocks is chosen such that the autocovariance of the EEG and EMG has decayed considerably. In Fig. 4.7, the autocovariances of the EMG of three exemplary patients and the resulting choices of block lengths (black dashed vertical lines) are shown. In general, the autocovariance of the EMG decays more slowly than that of the EEG. From the set of blocks, uni- and bivariate spectra as well as uni- and bivariate bispectra are estimated by averaging corresponding periodograms and biperiodograms.

Spectral and bispectral analysis of one patient

In a first step, uni- and bivariate spectral analysis is applied to all EEG and EMG recordings. In Fig. 4.8, spectra corresponding to the EEG (a) and EMG (b), as well as the coherence between EEG and EMG (c), are shown exemplarily for one segment of a patient. From the EMG spectrum, the tremor frequency of the segment analyzed is defined as the frequency at which the first peak in the range [3.5, 8] Hz occurs. In the segment shown in Fig. 4.8, the tremor frequency is 4.8 Hz.

Uni- and bivariate spectra of corresponding EEG and EMG exhibit peaks at twice the tremor frequency as well. To investigate their third order properties, bicoherence and cross-bicoherence are estimated. In Fig. 4.9, the bicoherence corresponding to the EEG (a) and EMG (b), as well as cross-bicoherences (c–f), are shown for the exemplary segment. While the index E corresponds to EEG (instead of x1), the index M corresponds to EMG (instead of x2). The bicoherence of the EMG exhibits a peak at the tuple of the tremor frequency (b).


Figure 4.9.: Univariate bicoherences (a) and (b) of EEG (E) and EMG (M), as well as bivariate cross-bicoherences (c–f). The EMG bicoherence BcohM and the cross-bicoherence CBcohMME are significant at the tremor frequency.

This is to be expected from the rectification of the raw EMG signal. Another peak occurs in the cross-bicoherence CBcohMME(4.8 Hz, 4.8 Hz) (d).

To test the significance of bicoherences and cross-bicoherences at the tuple of the tremor frequency, EEG and EMG are gaussianized, and the proposed block bootstrap with nB = 20 bootstrap realizations is applied for a significance level α = 5%. The null hypothesis of zero-bicoherence and zero-cross-bicoherence is rejected if the uni- and bivariate bicoherence exceeds the maximum of the corresponding 20 bootstrap bicoherences. In the exemplary set of data, the cross-bicoherence CBcohMME(4.8 Hz, 4.8 Hz) is significantly different from zero. This suggests that the EMG couples nonlinearly into the EEG. The peak at twice the tremor frequency may be due to nonlinear interaction instead of being independent from the peak at the tremor frequency. The test applied to the remaining cross-bicoherences evaluated at the tremor frequencies is insignificant.

Results of spectral and bispectral analysis for all patients

The results of applying spectral and bispectral analysis to all 58 segments of the 12 patients included into the study are summarized in App. D, Tab. D.1.

Based on the coherence of the EEG and the EMG, two major groups are identified. Group A contains segments with a coherence peak at the tremor frequency exclusively. The 16 segments of this group correspond to 6 ET patients.


No PT patient contributes to this group. Group B contains segments with coherence peaks at twice the tremor frequency. The 33 segments assigned to group B correspond to 22 segments of 4 PT patients and 11 segments of 5 ET patients. In 9 of the 33 segments, the coherence peak at twice the tremor frequency exceeds the peak at the tremor frequency. All but one of these segments correspond to PT patients. Thus, PT segments are exclusively assigned to group B, while ET segments are distributed across both groups. Consequently, coherence peaks at twice the tremor frequency, i.e., potential first harmonics, occur more consistently across repetitions of recordings from PT than from ET patients.

A third group C is constituted from 1 PT (6 segments) and 1 ET patient (3 segments). In these patients, the frequencies of spectral and coherence peaks at once and twice the tremor frequency do not match in more than half of their segments. In particular, the first or the second clear peak in the coherence is more than 0.5 Hz off from once or twice the tremor frequency, respectively. Corresponding uni- and bivariate bicoherences are expected to be 0.

EMG are estimated as described in the previous passage in order to identify higherharmonics. For the hypothesis test of zero-bicoherences and zero-cross-bicoherences,the block bootstrap is employed. In all but 2 segments the bicoherence of the EMGis significant. This is to be expected due to the rectification of the EMG. The twoinsignificant EMG-bicoherences occur in group C. Since spectral peaks do not occurat twice the tremor frequency, this result is also expected. The bicoherence of theEEG is significant in 1 ET and 5 PT segments. In all but 10 segments, at leastone of the cross-bicoherences is significant. While only 4 such segments belongto groups A (3 segments) and B (1 segment), the remaining 6 segments belongto group C. Since group C only contains 9 segments, the fraction of insignificantcross-bispectra is high, 66%. This is consistent with the fact, that group C containspatients in which spectral and coherence peaks do not match at once and twice thetremor frequency. In the 48 remaining segments with at least one significant cross-bicoherence, cross-bicoherences which are most frequently significant are those thatquantify quadratic phase coupling from the EMG and the EEG onto the EMG (32segments) and from the EMG onto the EEG (30 segments). Cross-bicoherence of theEEG and EMG onto the EEG is significant in 17 segments. The cross-bicoherenceof the EEG onto the EMG is least frequently significant (13 segments).To conclude, major differences in PT and ET recordings are found by second

order spectral analysis. However, third order spectral analysis reveals that peaksoccurring at twice the tremor frequency in spectra and coherences may be firstharmonics rather than pathological oscillations independent from the oscillation atthe tremor frequency. In segments with consistent spectral and coherence peaksat once and twice the tremor frequency the fraction of significant bicoherences andcross-bicoherences exceeds that of segments without such peaks.


4.3. Summary

Spectral analysis is one of the most frequently used techniques to reveal properties of processes in the frequency domain [24,33,45,76,80,164,169,180,215]. Univariate and bivariate methods are distinguished in order to assess the frequency content of one process or the interaction of at least two processes [24,164,169]. Most spectral analyses are restricted to second order spectral analysis. However, it is only adequate to quantify linear properties in the frequency domain [57,101,113,126,203]. To investigate nonlinear spectral properties, higher order spectral analyses have to be applied [23]. This might, e.g., be the case if second order spectral peaks occur at a fundamental frequency and multiples of it, denoted higher harmonics [68,85]. To identify quadratic nonlinearities, bispectral analysis may be applied [130]. Different normalizations of the bispectrum are used for their quantification [23,110]. To statistically test the significance of normalized bispectra, a nonparametric block bootstrap is proposed in this chapter, as published in [130]. The idea of this bootstrap is to resample the blocks used for spectral or bispectral estimation, ensuring that quadratic phase coupling is destroyed. To assess the hypothesis test based on the block bootstrap, its performance is compared to that of an established analytic hypothesis test [23,91]. The block bootstrap-based test outperforms the analytic test, particularly if the number of data points available for bispectral estimation is restricted. While the analytic test becomes highly unreliable if the number of data points is restricted, as in most applications, the block bootstrap remains reliable. According to this reliability, the block bootstrap is finally applied to investigate potential first harmonics occurring in tremor data [130]. By bispectral analysis, first harmonics are distinguished from independent oscillations at twice the tremor frequency. While oscillations at twice the tremor frequency have been claimed to be independent oscillations [182], bispectral analysis reveals that they could be first harmonics describing the same aspect of the pathology [130].


5. Phase-Amplitude Coupling

In the previous chapter, bispectral analysis is applied to assess nonlinear properties of processes and their interaction. A generic model of bispectral analysis is quadratic phase coupling. Since phases of different frequency bands couple, quadratic phase coupling corresponds to phase-phase coupling [34,61,205], which in turn belongs to the wider class of cross-frequency coupling [31,61,229]. The characteristic property of cross-frequency coupling is that oscillations referring to one frequency band are coupled to those of a different frequency band [229]. It has been hypothesized that functional interactions of neurons in the brain are based on cross-frequency coupling [229]. According to this hypothesis, slow oscillations are claimed to reflect global communication between parts of the brain, while fast oscillations correspond to local communication between small assemblies of neurons [162,218,229]. Besides the basic research addressing the functioning of the brain, cross-frequency coupling has been investigated within the scope of pathologies like Parkinson's disease [41,127,194], social anxiety disorder [142], and tinnitus [2,40]. Three major types of cross-frequency coupling are distinguished [229]. Besides phase-phase coupling [34,61,205], amplitude-amplitude [25,193,220] and phase-amplitude coupling [30,41,212,219] are investigated. Amplitude-amplitude coupling reflects coupling of amplitudes corresponding to different frequency bands. Phase-amplitude coupling usually reflects the coupling of low-frequency phase with high-frequency amplitude.

Recently, this phase-amplitude coupling [2,10,40,41,194,212] has been in the focus of investigations since it has been claimed to change task-dependently [26,30,210,219], and to be increased in pathologies like social anxiety disorder and Parkinson's disease [2,40,41,127,142,194] and during learning [211]. To quantify phase-amplitude coupling, different measures of time series analysis have been applied to the electroencephalogram [40,122] or electrocorticogram [141,161], recorded from different sites of the brain. A common basis of these measures is the so-called phase-amplitude plot. It reflects the mean amplitude as a function of phase angle [212]. In the phase-amplitude coupled case, mean amplitudes are high at preferred phase angles, while in the uncoupled case, the phase-amplitude plot is constant across phases. Measures of phase-amplitude coupling quantify this deviation from constancy [30,205,212]. To statistically validate such measures, surrogate methods are employed [10,161,212]. However, the adequacy of surrogate methods has been doubted [10,208,229]. In this chapter, an analytic alternative is proposed. It is based on a transformation of the phase-amplitude plot onto a counting statistic that conforms to a χ²-distribution. After its introduction in Sec. 5.1, its performance is assessed in simulations presented in Sec. 5.2.


5.1. Methodology

Phase-amplitude coupling occurs if the phase of one frequency band modulates the amplitude of a different frequency band. For investigations of the brain, the modulation of the amplitude of the γ-band (40−200 Hz) by the phase of the θ-band (4−10 Hz) has been of primary interest, both in animals [21,122,198] and in humans [30,43,102,147,210]. In Sec. 5.1.1, the concept of phase-amplitude coupling is explained. Measures of phase-amplitude coupling are summarized in Sec. 5.1.2. The proposed χ²-statistic for the null hypothesis of absent phase-amplitude coupling is derived in Sec. 5.1.3.

5.1.1. Concept of phase-amplitude coupling

Phase-amplitude coupling is modeled by [212]

\[
X(t) = a(t)\, \cos\omega_a t + \cos\omega_\phi t\,, \tag{5.1}
\]

with the phase-modulated amplitude [212]

\[
a(t) = c\, \cos\omega_\phi t + 2 - c\,. \tag{5.2}
\]

The frequency band corresponding to ωa is modulated by the phase φ(t) = ωφ t of the ωφ-frequency band. The strength of modulation is given by the parameter c ∈ [0, 1], further denoted coupling strength. Whenever the system is phase-amplitude coupled, i.e., c > 0, the modulation

\[
\cos\omega_\phi t \cdot \cos\omega_a t = \frac{1}{2}\left[\cos(\omega_a - \omega_\phi)t + \cos(\omega_a + \omega_\phi)t\right], \tag{5.3}
\]

induces two sideband contributions in the frequency domain, at ωa − ωφ and ωa + ωφ. Exemplary temporal and spectral representations of a phase-amplitude coupled (c = 1) and an uncoupled (c = 0) process, each with ωa = 2π 25 Hz and ωφ = 2π 4 Hz, are shown in Fig. 5.1. In the coupled case (a,b), four frequencies contribute to the process, according to Eqs. (5.1)–(5.3). Spectral contributions are 1 at frequencies ωa and ωφ and 1/2 at frequencies ωa − ωφ and ωa + ωφ, see Fig. 5.1 (b). In the uncoupled case (c,d), on the contrary, the process consists of two contributions, at frequencies ωφ and ωa, see Fig. 5.1 (d). These contributions are 1 at ωφ and 2 at ωa.

In the transition from c = 0 to c = 1, power is transferred from the ωa-band into the two sidebands ωa − ωφ and ωa + ωφ, such that the total spectral contribution remains constant for all coupling strengths. In the model Eq. (5.1), this is ensured by the summand 2 − c in the amplitude a(t), Eq. (5.2).
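The model Eqs. (5.1)–(5.3) and its sideband structure can be reproduced in a few lines; the sampling rate and duration below are illustrative assumptions.

```python
import numpy as np

def simulate_pac(c=1.0, f_a=25.0, f_phi=4.0, fs=1_000.0, T=10.0):
    """Phase-amplitude coupling model, Eqs. (5.1)-(5.2)."""
    t = np.arange(int(T * fs)) / fs
    a = c * np.cos(2 * np.pi * f_phi * t) + 2 - c       # Eq. (5.2)
    return a * np.cos(2 * np.pi * f_a * t) + np.cos(2 * np.pi * f_phi * t)

# For c = 1, the amplitude spectrum exhibits the sidebands at
# f_a - f_phi = 21 Hz and f_a + f_phi = 29 Hz predicted by Eq. (5.3):
x = simulate_pac(c=1.0)
spec = 2 * np.abs(np.fft.rfft(x)) / len(x)
freqs = np.fft.rfftfreq(len(x), d=1 / 1_000.0)
print(freqs[spec > 0.2])    # approx. [ 4. 21. 25. 29.]
```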

To quantify phase-amplitude coupling in general, the time series of the modulating phase and the modulated amplitude need to be extracted from the data x(t) [212]. The extraction is illustrated in Fig. 5.2 for a phase-amplitude coupled (a–c) and an uncoupled (d–f) process.


Figure 5.1.: Phase-amplitude coupled (a,b) and uncoupled (c,d) process with subprocesses oscillating at ωφ = 2π 4 Hz and ωa = 2π 25 Hz. Processes are shown both in the time (a,c) and frequency domain (b,d).

The time series of the phase, xφ(t), is obtained from narrow band-pass filtering about the frequency ωφ of the modulating phase. The time series of the amplitude, xa(t), is obtained from band-pass filtering in the range [ωa − ωφ − ∆, ωa + ωφ + ∆]. The width ∆ ensures that the major contribution of the ωa ± ωφ-related oscillations is included in the filtered signal [212]. In Fig. 5.2, the coupled (a) and uncoupled (d) process is shown in its spectral representation (blue solid). The respective band-pass filters resulting in xφ(t) (green dashed) and xa(t) (red dash-dotted) are included as well. The temporal representations of x(t), xφ(t), and xa(t) are shown in (b,e). Phase-amplitude coupled systems (c) may be distinguished from uncoupled systems (f) when considering the envelope of the phase (green dashed) and amplitude time series (red dash-dotted). To this end, the analytic signals of xa(t) and xφ(t),

\[
\hat{x}_a(t) = A_a(t)\, e^{i\Phi_a(t)} \quad\text{and}\quad \hat{x}_\phi(t) = A_\phi(t)\, e^{i\Phi_\phi(t)}\,, \tag{5.4}
\]

are derived [212]. The amplitudes

\[
A_j(t) = \sqrt{x_j^2(t) + \tilde{x}_j^2(t)}\,, \quad\text{for } j \in \{a, \phi\}\,, \tag{5.5}
\]


Figure 5.2.: Extracting the phase Φφ(t) of the phase dynamics xφ(t) and the amplitude Aa(t) of the amplitude dynamics xa(t) from data of a coupled (a–c) and an uncoupled (d–f) process. Band-pass filtering (green dashed for phase, red dash-dotted for amplitude) of the coupled and uncoupled process X(t) (blue solid), shown in the spectral representation (a,d), leads to the phase time series xφ(t) (green dashed) and the amplitude time series xa(t) (red dash-dotted), shown in (b,e). While in the coupled case the envelope of xa(t) is modulated (b), the envelope is constant in the uncoupled case (e). The analytic signal of xφ(t) and xa(t) provides the phase Φφ(t) and amplitude Aa(t) that are used to quantify phase-amplitude coupling. In the case of coupling, Aa(t) varies as a function of the phase (c). For uncoupled systems, Aa(t) remains constant (f).

and phases

\[
\Phi_j(t) = \arctan\frac{\tilde{x}_j(t)}{x_j(t)}\,, \quad\text{for } j \in \{a, \phi\}\,, \tag{5.6}
\]

are obtained from the Hilbert transform

\[
\tilde{z}(t) = \frac{1}{\pi}\; \mathrm{p.v.}\!\int_{-\infty}^{\infty} \frac{z(\tau)}{t - \tau}\, d\tau\,, \tag{5.7}
\]

where p.v. is the Cauchy principal value [212].
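The extraction pipeline can be sketched with standard tools; the Butterworth filter, its order, and the band edges below are illustrative substitutes for the (unspecified) filters of the original analysis.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def extract_phase_amplitude(x, fs, f_phi, f_a, delta=1.0):
    """Band-pass filter about f_phi (phase dynamics) and about the band
    [f_a - f_phi - delta, f_a + f_phi + delta] (amplitude dynamics),
    then take phase and amplitude from the analytic signal."""
    b, a = butter(3, [f_phi - 0.5, f_phi + 0.5], btype="band", fs=fs)
    x_phi = filtfilt(b, a, x)                 # phase time series x_phi(t)
    b, a = butter(3, [f_a - f_phi - delta, f_a + f_phi + delta],
                  btype="band", fs=fs)
    x_a = filtfilt(b, a, x)                   # amplitude time series x_a(t)
    phase = np.angle(hilbert(x_phi))          # Phi_phi(t), cf. Eq. (5.6)
    amplitude = np.abs(hilbert(x_a))          # A_a(t), cf. Eq. (5.5)
    return phase, amplitude
```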

Based on Aa(t) and Φφ(t), a set of different measures may be applied to quantify phase-amplitude coupling [31,161,212,229]. Exemplary measures are summarized in the following.


5.1.2. Measures of phase-amplitude coupling

One of the most fundamental measures quantifying phase-amplitude coupling is a correlation coefficient of the phase and amplitude time series, xφ(t) and xa(t), respectively [25,114,161,212]. A major drawback of this correlation coefficient is that it is a linear measure while the coupling usually is nonlinear [212]. Furthermore, this measure is highly sensitive to noise [212]. It has been shown that both drawbacks of the correlation coefficient are overcome by measures based on the phase-amplitude plot [212]. The phase-amplitude plot is introduced in the subsequent passage. Measures based on the phase-amplitude plot are summarized afterwards. The remainder of this section follows [212] if not stated otherwise.

Phase-amplitude plots

Phase-amplitude plots are normalized mean amplitudes 〈Aa〉 as a function of phase angles Φφ rather than time t. In particular, phase-amplitude plots are obtained according to the following steps.

1. Assign the time points t to nPA bins [0, 2π/nPA), . . . , [2π(nPA − 1)/nPA, 2π) according to their phase values Φφ(t) mod 2π. The k-th bin contains the nk time points

\[
\left\{\, t_1(k), \ldots, t_{n_k}(k) \;\middle|\; \Phi_\phi(t_j(k)) \bmod 2\pi \in \left[2\pi\,\frac{k-1}{n_{PA}},\; 2\pi\,\frac{k}{n_{PA}}\right),\; j = 1, \ldots, n_k \right\}, \tag{5.8}
\]

for k = 1, . . . , nPA.

2. Average the amplitudes {Aa(tj(k))}_{j=1,...,nk} of each bin k = 1, . . . , nPA,

\[
\langle A \rangle_k = \frac{1}{n_k} \sum_{j=1}^{n_k} A_a(t_j(k))\,. \tag{5.9}
\]

3. Normalize 〈A〉k by its sum, i.e.,

\[
\mathcal{P}(k) = \frac{\langle A \rangle_k}{\sum_{k'=1}^{n_{PA}} \langle A \rangle_{k'}}\,, \tag{5.10}
\]

such that P(k) ≤ 1 for all k = 1, . . . , nPA.

The phase-amplitude plot P(k) is normalized according to a probability distribution. For phase-amplitude coupled processes, the probability for high amplitudes is large at preferred phases. This results in a non-uniform phase-amplitude plot. In the case of uncoupled processes, the corresponding phase-amplitude plot conforms to a uniform distribution on nPA bins in the interval (0, 2π]. Phase-amplitude plots for the above phase-amplitude coupled and uncoupled process are shown in Fig. 5.3 (a) and (b).
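The three steps above amount to a binned average; the following sketch assumes every bin is populated and uses an illustrative number of bins.

```python
import numpy as np

def phase_amplitude_plot(phase, amplitude, n_pa=50):
    """Phase-amplitude plot P(k): bin time points by phase mod 2*pi,
    average the amplitudes per bin, and normalize to unit sum."""
    bins = (np.mod(phase, 2 * np.pi) / (2 * np.pi) * n_pa).astype(int)
    bins = np.clip(bins, 0, n_pa - 1)          # guard against rounding
    mean_amp = np.array([amplitude[bins == k].mean() for k in range(n_pa)])
    return mean_amp / mean_amp.sum()           # normalization, Eq. (5.10)
```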



Figure 5.3.: Phase-amplitude plot for the phase-amplitude coupled (a) and uncoupled (b) case. Their transformation to a mass density is shown in (c) and (d), respectively.

Measures based on phase-amplitude plots

Several measures of phase-amplitude coupling have been proposed based on phase-amplitude plots [30,212]. They quantify the degree to which the phase-amplitude plot deviates from uniformity. Here, the two most powerful measures are presented as examples.

The heights-ratio

H = ( Pmax − Pmin ) / Pmax ,    (5.11)

quantifies the deviation of the highest from the lowest value of the phase-amplitude plot, Pmax and Pmin, respectively. The major advantage of the heights-ratio is that it is straightforward in its interpretation. However, its power to detect multimodality in phase-amplitude plots is limited.

This is resolved by applying the Kullback-Leibler distance [118],

D(G, H) = Σ_{k=1}^{nPA} G(k) log( G(k) / H(k) ) .    (5.12)

It quantifies the distance of discrete distributions G(k) and H(k). Accordingly, D = 0 if and only if the two distributions are identical, i.e., G(k) = H(k) for all


k = 1, . . . , nPA. The Kullback-Leibler distance of a phase-amplitude plot P(k) to a uniform distribution on nPA bins is

D(P, U) = log(nPA) + Σ_{k=1}^{nPA} P(k) log( P(k) ) .    (5.13)

A major drawback of the Kullback-Leibler distance is that it is not normalized to values in [0, 1]. This is overcome by dividing D(P, U) by log(nPA), resulting in the modulation index [205],

M = 1 + Σ_{k=1}^{nPA} P(k) log( P(k) ) / log(nPA) .    (5.14)

Its minimum value, M = 0, is obtained if and only if the phase-amplitude plot is uniform. On the contrary, M = 1 if the phase-amplitude plot is of Dirac-delta type, i.e., P(j) = 1 for one bin j and P(k) = 0 in all bins k ≠ j. The modulation index was originally introduced to quantify phase-phase coupling rather than phase-amplitude coupling [205]. Its application to phase-amplitude coupling was introduced ten years later [210]. The modulation index has been shown to outperform other measures based on the phase-amplitude plot.
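Both measures follow in a few lines from P(k); in this sketch, the clipping of empty bins is a numerical guard added here, not part of the definitions.

    import numpy as np

    def heights_ratio(P):
        # Eq. (5.11): relative difference of highest and lowest value of P(k).
        return (P.max() - P.min()) / P.max()

    def modulation_index(P):
        # Eq. (5.14): Kullback-Leibler distance to uniformity, normalized to [0, 1].
        P = np.clip(P, 1e-12, None)   # guard against log(0) in empty bins
        return 1.0 + np.sum(P * np.log(P)) / np.log(len(P))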

5.1.3. Statistical assessment of phase-amplitude coupling

Hypothesis tests for phase-amplitude measures are traditionally based on surrogate methods [121,161,207]. Those methods are similar to bootstrap methods in that surrogate time series are obtained by resampling based on the data analyzed [207]. For the statistical assessment of phase-amplitude measures, surrogate phases Φφ^(s)(t) are obtained by resampling phases within or across trials [121,161,212], or by inserting random phase-shifts into the original phase Φφ(t) [30]. Together with the original amplitude, Aa(t), the surrogate measure is derived from the surrogate phase-amplitude plot of (Φφ^(s)(t), Aa(t)). Similar to bootstrap-based resampling methods, the distribution of surrogate measures is sampled by a set of surrogate measures derived from a set of surrogate phase-time series. The resulting surrogate distribution corresponds to the null distribution of a hypothesis test on absent phase-amplitude coupling [212]. It has been stated that these methods violate the minimal distortion criterion, resulting in insufficient null distributions [10]. According to the minimal distortion criterion, the surrogate-based distribution should differ from the original distribution of interest with respect to the null hypothesis, exclusively [10]. Instead of the effortful procedure of deriving measures from phase-amplitude plots and according surrogates, here a χ2-test statistic is proposed. It is based directly on the phase-amplitude plot.

A χ2-test addresses the situation in which a number of objects are assigned to nBin bins. Let N0(k) be the number of objects expected in bin k under a certain null hypothesis, H0, and N(k) the number of objects actually observed in


bin k. Under H0 the statistic

χ2 = Σ_{k=1}^{nBin} ( N(k) − N0(k) )^2 / N0(k)    (5.15)

is χ2-distributed with nBin − 1 degrees of freedom [168]. Since N(k) and N0(k) are counts of objects, the χ2-statistic, Eq. (5.15), cannot be derived from the phase-amplitude plot directly. To resolve this, the height of the phase-amplitude plot is identified with a mass density,

C(k) = u Pmax − Σ_{j=1}^{k} P(j) , for k = 1, . . . , nPA , u ≳ 1 ,    (5.16)

shown in Fig. 5.3 (c) and (d). By this transformation, the phase-amplitude plot (a) transforms into an accumulation of points (c) in the case of coupling. In the uncoupled case, the phase-amplitude plot (b) transforms into points equally distributed across phase angles (d). Binning the mass density C yields the histogram N(k), k = 1, . . . , nBin, of the realization analyzed. The null hypothesis of uncoupled processes is ensured for

N0(k) = (1/nBin) Σ_{j=1}^{nBin} N(j) .    (5.17)

The exact form of N0(k) depends on the null hypothesis tested. The null hypothesis of phase-amplitude uncoupled processes transforms into the null hypothesis of a uniform mass-density histogram. Other null hypotheses are testable by changing N0(k) accordingly. This shows the versatility of this approach with respect to the null hypotheses testable.

The proposed transformation contains the parameter u ≳ 1. This parameter scales the nearness of successive points in the mass density. For u = 1, C(k) ≈ 0 in the case of uncoupled processes. This closeness to 0 induces an undesired susceptibility of the hypothesis test to numerical fluctuations, such that false positives occur. If, on the contrary, u is chosen too high, the deviation of the phase-amplitude plot from a uniform distribution is low compared to C. This dependence of the method on the choice of u is investigated in the following section.
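Putting Eqs. (5.15)-(5.17) together, the proposed test may be sketched as follows; reading the binning of C as an equal-width histogram of its values is an assumption of this sketch.

    import numpy as np
    from scipy.stats import chi2

    def pac_chi2_test(P, u=1.1, n_bin=5, alpha=0.05):
        # Mass density C(k), Eq. (5.16): a uniform P yields equally spaced values,
        # hence a uniform histogram under the null hypothesis of uncoupled processes.
        C = u * P.max() - np.cumsum(P)
        N, _ = np.histogram(C, bins=n_bin)        # observed counts N(k)
        N0 = np.full(n_bin, N.sum() / n_bin)      # uniform expectation, Eq. (5.17)
        stat = np.sum((N - N0) ** 2 / N0)         # chi^2-statistic, Eq. (5.15)
        return stat, stat > chi2.ppf(1.0 - alpha, df=n_bin - 1)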

5.2. Application

To assess the performance of the proposed χ2-test, the model system

X(t) = ( c cos(ωφ t + ϕ1) + (2 − c) ) cos(ωa t + ϕ2) + cos(ωφ t + ϕ1) + η(t) ,    (5.18)

with ωa = 2π · 25 Hz and ωφ = 2π · 4 Hz is simulated for coupling strengths c ∈ [0, 1]. Random phase shifts ϕ1, ϕ2 ∼ U(0, 2π] of the process are constant within each



Figure 5.4.: Performance of the proposed χ2-test for phase-amplitude coupling at a significance level α = 5% (black dotted). For all NSRs, the test is reliable. Power increases considerably at c = 0.15, where coupling becomes detectable for the deterministic system, i.e., NSR = 0 (blue solid). For increasing NSR (red dashed and green dash-dotted), the percentage of H0-rejections transitions smoothly from 0% to 100%.

realization as in the previous chapter. Gaussian white noise η(t) ∼ N(0, ση) is modeled according to noise-to-signal ratios (NSR) of 0%, 10%, and 50%. For each set of parameters, nR = 1 000 realizations of T = 5 000 data points sampled at 1 000 Hz are simulated. To obtain phase- and amplitude-time series, the realizations are band-pass filtered in the frequency bands [3, 5] Hz and [20, 30] Hz, respectively. To derive a highly resolved phase-amplitude plot, the number of bins nPA should be chosen high. However, the lower frequency band, [3, 5] Hz, induces a natural upper bound of nPA = 1 000/5 = 200. For more than 200 bins, there would be more phase bins than phases sampled. For the histogram of the mass density C(k), Eq. (5.16), the number of bins, nBin, needs to be small to be robust with respect to stochasticity. To resolve the shape of the phase-amplitude plot and the resulting mass-density histogram, here nBin = 5 is chosen. The parameter u is chosen close to 1 to obtain a powerful test that is yet applicable up to NSR = 50%. In particular, u is set to 1.1. The null hypothesis of a uniform mass density is tested at the significance level α = 5% based on the χ2-statistic derived in Sec. 5.1.3.

In Fig. 5.4, the results of the simulation study are shown as the percentage of H0-rejections for increasing coupling strength c ∈ [0, 0.3], for the three NSRs. Under the null hypothesis, i.e., c = 0, the percentage of false positives is 0% for NSR = 0% and 10%. For NSR = 50%, the false positive rate is 1.6%. A considerable increase of power occurs at c = 0.15, independent of the NSR. At the same time, the test remains reliable in the considered NSR-range.

To justify the choice of u = 1.1, the results of the χ2-test for data of the above model with NSR 0%, 10%, and 50% are shown in Fig. 5.5 for different choices of u ∈ [1.01, 1.5]. A too small choice of u leads to a test that is sensitive to noise influences (blue solid and green dashed). A too high choice of u (purple solid with dots) renders identification of phase-amplitude coupling possible only for high coupling strengths, i.e., c ≈ 1. For the choice u = 1.1 (red solid with circles), phase-amplitude coupling is detectable at low coupling strengths c, while the test remains reliable at c = 0 up to NSR = 50%.
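For concreteness, one realization of the model system, Eq. (5.18), can be generated as follows; scaling the noise variance by the NSR relative to the signal variance is an assumption of this sketch.

    import numpy as np

    rng = np.random.default_rng(0)
    fs, T = 1000.0, 5000                            # sampling rate in Hz, series length
    t = np.arange(T) / fs
    w_a, w_phi = 2 * np.pi * 25, 2 * np.pi * 4      # amplitude and phase frequencies
    c, nsr = 0.3, 0.10                              # coupling strength, noise-to-signal ratio

    phi1, phi2 = rng.uniform(0.0, 2 * np.pi, size=2)  # constant within the realization
    signal = ((c * np.cos(w_phi * t + phi1) + (2 - c)) * np.cos(w_a * t + phi2)
              + np.cos(w_phi * t + phi1))
    eta = rng.normal(0.0, np.sqrt(nsr * signal.var()), size=T)  # NSR-scaled noise (assumed)
    X = signal + eta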

model with NSR 0%, 10%, and 50% are shown in Fig. 5.5 for different choicesof u ∈ [1.01, 1.5]. A too small choice of u leads to a test that is sensitive tonoise influences (blue solid and green dashed). A too high choice of u (purple solidwith dots) renders identification of phase-amplitude coupling only possible for highcoupling strengths, i.e. c ≈ 1. For the choice u = 1.1 (red solid with circles), phase-



Figure 5.5.: Comparison of the proposed χ2-test with respect to the choice of the parameter u for different noise levels. The case NSR = 0% (a) serves as a reference for NSR = 10% (b) and 50% (c). For u < 1.1 (blue solid and green dashed), the test is not reliable due to the fluctuations in the phase-amplitude plot. For u = 1.5 (purple solid with dots), the power is considerably decreased.


5.3. Summary

Investigations of function and dysfunction of the brain have pointed to cross-frequency couplings [2,25,30,31,34,41,61,127,162,193,194,205,212,219,220,229]. Cross-frequency coupling occurs if the phase or amplitude corresponding to a narrow-band frequency component of a process modulates the phase or amplitude of a different frequency band. The most promising type of cross-frequency coupling for investigations of function and dysfunction of the brain is phase-amplitude coupling [2,26,30,40,41,122,141,142,161,194,212,219]. Common measures of phase-amplitude coupling are based on the phase-amplitude plot [30,212]. The phase-amplitude plot corresponds to the distribution of mean amplitudes across phase angles [212]. It is uniform if the process is phase-amplitude uncoupled. In the case of phase-amplitude coupling, the phase-amplitude plot deviates from uniformity. To statistically validate measures based on the phase-amplitude plot, resampling-based methods are applied [30,121,161,212]. In this chapter, an analytic alternative is proposed. The idea of this alternative is to transform the phase-amplitude plot into a distribution of countable objects such that a χ2-test [168] may be applied. The proposed transformation turns the phase-amplitude plot into the histogram of a mass distribution that follows the phase-amplitude plot. The resulting χ2-test is versatile with respect to the null hypotheses testable by it. Its performance is assessed in simulations.


Part II.

Statistical Assessment of Event Predictors


6. Predicting Extreme Events

This chapter is based on the publication

M. Mader, W. Mader, B. J. Gluckman, J. Timmer, B. Schelter. Statistical evaluation of forecasts. Phys. Rev. E, 90: 022133, 2014. [131]

Many fields of physics are based on detecting, describing, and finally predicting events, such as decays of particles in particle physics [107,154,171], merging of black holes and passing of stars in astrophysics [14,103,125,157,197], ocean waves in hydrodynamics [100,139], protein folding and DNA translation in biophysics [96,128,166], landslides, earthquakes, and heavy precipitation in geophysics [84,155,177], as well as cardiac arrhythmia and epileptic seizures in the neurosciences [11,223].

Of particular interest is the prediction of extreme events with severe impact on

the lives of people. Such events include earthquakes [143,150,185], storms [71,138,149], and heavy precipitation resulting in flash floods [99,224,230]. These events largely affect people within whole areas [84]. Events that impact the quality of life of groups of individuals independently of their regional occurrence are, e.g., epileptic seizures [145,191]. In either case, the impact of extreme events might be alleviated by efficacious precautions and interventions if they could be predicted trustworthily [58,93,226].

A predictor is considered trustworthy if both upcoming events and their absence

are correctly predicted up to predefined error probabilities [84,145]. Since extreme events tend to occur rarely [202], the focus so far has been on statistically controlling the sensitivity of predictors [145,191]. Sensitivity refers to trustworthiness with respect to the correct prediction of event occurrences [145]. The counterpart of sensitivity is specificity, quantifying the trustworthiness with respect to absent events [225]. For affected people, high specificity would result in an improvement of quality of life, since permanent fear could be alleviated [58,146]. To achieve this, negative predictions, i.e., predictions of absent events, need to be statistically specified and controlled.

So far, the specification and control of negative predictions has been circumvented by statistically controlling the sensitivity as a function of an admissible false positive rate [189,226]. The false positive rate aims at compensating for the lack of direct control of negative predictions [6,146]. In this chapter, negative predictions are first specified. Then, an independent statistical assessment of sensitivity and specificity is proposed following the publication [131]. After theoretical considerations are presented in Sec. 6.1, simulations assessing the performance of corresponding hypothesis tests are presented in Sec. 6.2.


6.1. Methodology

Predictors of events are developed under the assumption that the system analyzed changes its state prior to events. In the case of flash-flood forecasting, the state is characterized by properties such as soil moisture, land vegetation, and surface temperature [84]. In the case of epileptic-seizure prediction, linear and nonlinear measures of time series analysis, such as those presented in Part I, are so far applied [145,206] to the electroencephalogram, electromyogram, or electrocardiogram of epileptic patients [175]. In the field of seizure prediction, such measures of time series analysis are called features [146]. Based on changes of these features, a predictor is obtained, e.g., by neural networks [93,173,176,184] or by linear and nonlinear regression methods [8,17,72,83]. Using the example of logistic regression, the derivation of a predictor based on measures of time series analysis is made explicit in Sec. 6.1.1. In general, predictors transform the properties of the system either into the probability for an event or into a binary yes/no-alarm time series [9,84]. Typically, the performance of probabilistic predictors is quantified by the Brier score [81,97]. The performance of binary predictors is quantified in contingency tables or in measures derived from contingency tables [225], see Sec. 6.1.2. To test whether the performance of binary predictors is significantly better than random predictions, various hypothesis tests have been proposed [6,58,59,97,116,226]. Their null distributions are either derived analytically [189,226] or estimated by surrogate [6,58,116] and bootstrap methods [59,97]. The ideas and limitations of these hypothesis tests are summarized in Sec. 6.1.3. In Sec. 6.1.4, a method that overcomes these limitations is presented. It has been published in [131].

6.1.1. Establishing a predictor

The aim of a predictor is to correctly forecast both the occurrence and the absence of future events by corresponding predictions [146]. To this end, true positive and true negative predictions need to be specified. So far, most attention has been paid to positive predictions, i.e., predictions that an event is about to occur [58,226]. For probabilistic predictors, this corresponds to the probability for an event being ideally 1 prior to the event. For binary predictors, an alarm is to be raised instead. In order to initiate precautions and interventions based on a prediction, it is reasonable to require an event-free period preceding the event, as shown in Fig. 6.1 (e, red solid vertical line). This event-free period is called the intervention period [58,145] (IP, dotted blue box), prediction horizon [226], or lead time [62,99,230]. The IP is followed by the occurrence period (OP, red solid box) [145] or hit window [197]. The event needs to occur within this period for a positive prediction at t0 (brown dotted vertical line) to be classified as true positive. After a positive prediction, the next prediction is made after the event has happened at t1 (brown dashed vertical line), or after the OP has passed without an event occurring. While the positive prediction at t0 is true in the former case, it is classified as false positive in the latter case [145].


Figure 6.1.: Time line for the definition of predictions. After a positive prediction at t0 (brown dotted), the intervention period (IP, dotted blue box) consisting of Ti time points needs to be event free. The event (e) has to occur within the To time points of the occurrence period (OP, solid red box). Otherwise, the prediction is false positive. The next prediction is possible at t1 after the event occurred (brown dashed). The negative prediction (brown dotted) at t2 is correct if and only if no event occurs within the subsequent Ti time points. The next prediction after a negative prediction is made at t2 + 1 (brown dashed).

Even though the negative counterpart to true positive predictions would increase the feeling of security of affected people considerably, no rigorous definition similar to that of positive predictions has been established so far [6,146]. Based on the definition of positive predictions, a true negative prediction at t2 (brown dotted vertical line) is here defined as the future absence of an event within the next Ti time points. After a negative prediction at t2, the next prediction is made one time step later, at t2 + 1 (brown dashed vertical line). By this definition, an event occurring at t2 + Ti + 1 may be correctly predicted at t2 + 1 after a negative prediction at t2. A negative prediction is classified as false if an event occurs within Ti time points of the negative prediction. Based on the definitions of positive and negative predictions, predictors may be established, as exemplarily shown in the following passage.

Prediction mechanisms and resulting predictors

A prerequisite of predictions is that relevant changes of the system's state are quantifiable by a set of features, say f1(t), . . . , fN(t) for time points t = 1, . . . , T [146]. Common features applied to seizure prediction include univariate measures such as variance, Hjorth parameters, and decorrelation time, and bivariate measures such as phase synchronization or cross-correlation. For a review of important features of seizure prediction, see [146,206]. Based on such features, two ways of establishing a predictor are conceivable. Either a feature may be thresholded to derive binary predictions directly [146]; then appropriate threshold values need to be identified [226]. Alternatively, a prediction mechanism may be employed to transform the set of features into a prediction [93,97,173,176,184,225]. This is made explicit using the example of logistic regression [97,225]. By logistic regression, the probability of an


event is regressed by [225]

p(t) = e^{b0 + b1 f1(t) + ··· + bN fN(t)} / ( 1 + e^{b0 + b1 f1(t) + ··· + bN fN(t)} ) = 1 / ( 1 + e^{−b0 − b1 f1(t) − ··· − bN fN(t)} ) .    (6.1)

To obtain a predictor from Eq. (6.1), the parameters b0, . . . , bN need to be specified. This is achieved by maximum-likelihood estimation [225]. If features of subsequent time points t and t + 1 are independent, the likelihood

L(b0, b1, . . . , bN) = Π_{t=1}^{T} [ y(t) e^{b0 + b1 f1(t) + ··· + bN fN(t)} + (1 − y(t)) ] / ( 1 + e^{b0 + b1 f1(t) + ··· + bN fN(t)} )    (6.2)

is maximized [225]. The predictands y(t) correspond to ideal predictors [225] as defined by true positive and true negative predictions above. So far, y(t) = 1 in the case of an event in the corresponding OP, and 0 otherwise [97]. In order to ensure independence of features of subsequent time points, it has been proposed to consider features

f̄k(t̄) = (1/L) Σ_{t=(t̄−1)L+1}^{t̄ L} fk(t) , t̄ = 1, . . . , T ,    (6.3)

averaged in non-overlapping windows of length L, rather than the highly sampled feature-time series fk(t), t = 1, . . . , TL, for all features k = 1, . . . , N [97]. Once the parameters b0, b1, . . . , bN of the logistic regression are obtained, the predictor of Eq. (6.1) converts the averaged features f̄1(t̄), . . . , f̄N(t̄) into the probability p(t̄) for an event in the corresponding OP. From this probabilistic predictor, a binary predictor is obtained by thresholding p(t̄) appropriately [225]. The prediction is positive if the threshold is exceeded; else it is negative. To quantify the performance of predictors, a set of scores has been suggested [225]. They are summarized in the following section.
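As an illustration, such a logistic-regression predictor can be fitted with scikit-learn; the toy features and predictands are assumptions of this sketch, and LogisticRegression applies a mild L2 penalty by default, a small departure from the plain maximum-likelihood fit of Eq. (6.2).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    F = rng.normal(size=(500, 3))                     # window-averaged features (toy data)
    y = (F[:, 0] + 0.5 * rng.normal(size=500)) > 1.0  # ideal predictands (toy data)

    model = LogisticRegression().fit(F, y)            # estimates b_0, b_1, ..., b_N
    p = model.predict_proba(F)[:, 1]                  # probabilistic predictor p(t), Eq. (6.1)
    alarm = p > 0.5                                   # binary predictor by thresholding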

6.1.2. Quantifying the performance of predictors

To evaluate the performance of a predictor, it is useful to split the available data into two sets, the training data and the validation data [216]. Both sets comprise distinct events that are to be predicted. From the training data, the predictor is established as exemplarily shown in the previous section [225]. Applying this predictor to the validation data set renders the quantification of the predictor performance possible [231].

The most common [97,225] quantification of the performance of probabilistic predictors is the Brier score [22]

B = (1/T) Σ_{t=1}^{T} ( p(t) − y(t) )^2 .    (6.4)


                            observation
                     event              no event            predicted total
prediction   event   nTP                nFP                 n+ = nTP + nFP
          no event   nFN                nTN                 n− = nFN + nTN
observed total       nE = nTP + nFN     nnE = nFP + nTN     T = n+ + n− = nE + nnE

Table 6.1.: Contingency table of a binary predictor, with nTP the number of TP and nFP the number of FP predictions, as well as nFN the number of FN and nTN the number of TN predictions.

It is the mean squared error of the probabilistic prediction p(t) and the indicator function of actual events, y(t). Ideally, the Brier score is 0 [225]. In the case of an indecisive predictor, i.e., p(t) = 0.5 for all t = 1, . . . , T, the Brier score is B = 0.25 [97]. Accordingly, predictors with B > 0.25 correspond to uninformative predictors.

The performance of binary predictors may be summarized in contingency tables as shown in Tab. 6.1. They contain the number of true positive (TP) and false positive (FP) predictions, as well as the number of false negative (FN) and true negative (TN) predictions. Different scores have been proposed to quantify the performance of the corresponding predictor based on these tables [46,225]. Inserting a hypothetical binary predictor into the Brier score instead of the probabilistic predictor illustrates a fundamental effect of contingency table-based scores of event prediction. As published in [131], the Brier score of binary predictions decomposes into sums corresponding to TP, FP, FN, and TN predictions, such that

B = (1/T) ( Σ_TP 0 + Σ_FP 1 + Σ_FN 1 + Σ_TN 0 )
  = ( nTP · 0 + nFP · 1 + nFN · 1 + nTN · 0 ) / T
  = 1 − ( nTP + nTN ) / T ,    (6.5)

where nTP is the number of TPs, nFP the number of FPs, nFN the number of FNs, nTN the number of TNs, and T is their sum. TN and TP predictions are weighted equally [131]. In the case of rare events, this may result in misleadingly low values of the Brier score, since many correct negative predictions compensate for the lack of true positive predictions [131].


A score related to the Brier score for binary predictions is the proportion correct [64,225],

C = ( nTP + nTN ) / T .    (6.6)

This fraction of TPs and TNs with respect to the total number of predictions, T, incorporates the same interdependence of TPs and TNs as the Brier score. This is resolved by the sensitivity [226] or hit rate [46],

S+ = nTP / nE .    (6.7)

It is the fraction of correctly predicted events with respect to all actual events. The sensitivity corresponds to the conditional probability of a positive prediction given an observed event [225]. The conditional probability of a positive prediction given an absent event is the false alarm rate [46],

A− = nFP / nnE .    (6.8)

In seizure prediction, the false positive rate,

F+ = nFP / ∆T ,    (6.9)

so far has been considered instead of the false alarm rate [226]. The false positive rate quantifies the number of FPs during the period ∆T. The false positive rate and the sensitivity are the two scores based on which parameters of prediction methods may be optimized. For this purpose, the seizure prediction characteristic has been proposed [189,226]. It quantifies the sensitivity of a prediction method for a fixed, maximally acceptable false positive rate as a function of different sets of parameters [226]. This control of the false positive rate is a substitute for controlling the number of true negative predictions [189]. To quantify the number of true negative predictions, the specificity [189],

S− = nTN / nnE ,    (6.10)

has been defined.

To identify trustworthy predictors, the presented scores need to be statistically assessed [58,72,81,225,226]. An overview of established hypothesis tests of event prediction is given in the following section.
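Given the entries of Tab. 6.1, the scores of this section reduce to a few lines; the function name and the example counts are hypothetical.

    def prediction_scores(n_tp, n_fp, n_fn, n_tn, delta_t):
        # Scores of Eqs. (6.6)-(6.10); delta_t is the duration of the recording.
        n_e, n_ne = n_tp + n_fn, n_fp + n_tn
        return {
            "proportion correct": (n_tp + n_tn) / (n_e + n_ne),  # Eq. (6.6)
            "sensitivity": n_tp / n_e,                           # Eq. (6.7)
            "false alarm rate": n_fp / n_ne,                     # Eq. (6.8)
            "false positive rate": n_fp / delta_t,               # Eq. (6.9)
            "specificity": n_tn / n_ne,                          # Eq. (6.10)
        }

    print(prediction_scores(n_tp=8, n_fp=12, n_fn=2, n_tn=478, delta_t=500))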


6.1.3. Traditional hypothesis tests for event predictors

Statistically assessing the performance of event predictors means comparing their performance to the performance obtainable from randomly guessing future events [6,59,97,116,189,226]. A method that randomly guesses event occurrences is called a random predictor (RP) [58,189,226] or reference method [72,81,225].

Resampling-based predictors correspond to surrogate- [6,59,116] or bootstrap-RPs [59,97]. Accordingly, either inter-event intervals [6,59] or features used for prediction [116] are resampled from the original time series of predictions and observations [59]. This results in a set of random prediction-observation time series in which predictions are independent from observations [6]. Accordances of predictions and observations in these time series are due to random effects solely. The sampling distribution of the performance measure of the resampled time series estimates the null distribution of the according performance measure [6]. The corresponding null hypothesis is [6]

H0: “The performance of the event predictor is compatible with chance.” (6.11)

The term chance is partly specified by the resampling method employed [6]. This allows for versatile null hypotheses that may incorporate time-dependencies like clustering of events [6]. However, it has been claimed that the number and extent of assumptions made when employing such resampling-based methods is multitudinous, such that H0 remains fuzzy [59].

For binary predictors, this is overcome [59] by hypothesis tests based on analytic null distributions [29,225,226]. As a first idea of statistically assessing binary predictions, contingency tables may be assessed directly [29]. To this end, a Fisher–Irwin test may be employed [66,67]. Such statistics of contingency tables miss the fact that the probability of TP and TN predictions is enhanced by the duration of the IP and OP.

This has been resolved by a homogeneous Poissonian RP that corrects for the extended duration of the OP [135,189,226]. This RP has been established within the scope of the seizure prediction characteristic, by which the sensitivity is specified as a function of the maximally acceptable false positive rate, F+max [226]. Accordingly, it is called the F+max-RP in the following. The probability of the F+max-RP to raise an alarm within To data points is [189,226]

p = F+max To . (6.12)

Choosing To as the number of data points in the time frame of the OP, the probability to predict k out of K = nE events by chance follows the binomial distribution [189],

B(k, K; p) = ( K choose k ) p^k (1 − p)^{K−k} .    (6.13)


The number of events predicted by the F+max-RP is given by the (100% − α)-quantile of B(k, K; p) according to the significance level α [226]. Two conclusions about the F+max-RP may be drawn from this. First, the quantiles of TPs are directly linked to the distributional properties of the RP. This results in a computationally less expensive procedure than for resampling-based methods [59]. The second conclusion draws attention to the limitations of the F+max-RP. First, the sensitivity is statistically assessed as a function of F+max, such that sensitivity is assessed as a function of specificity. Second, the specificity by itself is not statistically assessed at all. Both drawbacks are resolved by the proposed alternative, which is published in [131].

6.1.4. Independent assessment of positive and negative predictions

In this section, a method is proposed by which separate analytic null distributions for TP and TN predictions may be derived, as published in [131]. Its fundamental idea is to employ a Poissonian RP similar to the F+max-RP after removal of the F+max-dependence. To achieve this, two steps are necessary. In the first step, the dependence of the F+max-RP on F+max is removed. This yields two RPs, one for TPs and one for TNs. Both RPs do not incorporate extended IPs and OPs. The adaptations necessary to include IPs and OPs are introduced in the second step. By this two-step procedure, the adaptability of the proposed method with respect to the set of null hypotheses that may be tested becomes explicit.

Random predictors for true positives and true negatives

The first step, i.e., the removal of the F+max-dependence of the Poissonian RP, is achieved by considering a RP that raises, and abstains from, alarms at the same rates as the predictor analyzed. Under the assumption of a constant rate, this is achieved by a homogeneous Poissonian RP that raises alarms with the constant probability

p+ = n+ / T ,    (6.14)

where n+ is the total number of positive predictions, and T the number of predictions made within the prediction-observation time series. This RP raises alarms at a constant rate, independent of occurrences of events. The null distribution of k+ TP predictions out of K+ = nE events is then binomial, similar to the F+max-RP, cf. expression (6.13). The probability of a positive prediction is yet p+ instead of p, and thus independent of F+max. The RP for TNs is derived analogously. In particular, the distribution of k− TN predictions out of K− = nnE negative observations is binomial with the time-constant probability of negative predictions,

p− = n− / T = 1 − p+ .    (6.15)


The null hypotheses tested by the two RPs are

H0,±: “The number of TPs and TNs, respectively, obtained from the event predictor analyzed is compatible with those of a homogeneous Poissonian RP with the same constant positive and negative prediction rates.”    (6.16)

As for clustering of events or circadian rhythms, it might be desirable to test null hypotheses that include time dependencies. Time dependencies may be incorporated by using an inhomogeneous Poissonian random predictor, such that p±(t) are time dependent. To incorporate the severity of events into the null hypotheses, the events need to be categorized before the according statistic may be applied. The incorporation of time dependencies and of the severity of events reflects the versatility of this method with respect to the null hypotheses testable.

Incorporating extended IPs and OPs

When incorporating the existence of extended IPs and OPs, the probabilities of true positive and true negative predictions change. A positive prediction has to be followed by the absence of an event in the IP together with the occurrence of at least one event in the OP. The probability of an absence of an event during the Ti instances of the IP is

pi = (1 − pp)^{Ti} ,    (6.17)

where pp = n+/T is the probability of a positive prediction at each time point. The probability for the occurrence of at least one event during the To instances of the OP is

po = 1 − (1 − pp)^{To} .    (6.18)

In total, the probability of a TP prediction of a Poissonian RP is altered,

p+ = pi po = (1 − pp)^{Ti} ( 1 − (1 − pp)^{To} ) .    (6.19)

The probability of TN predictions is obtained by similar considerations. The probability for negative predictions is pn = (nFP + nTN)/T = 1 − pp, cf. Eq. (6.15). TN predictions are followed by Ti time points during which no event occurs. The probability for this is pn^{Ti}. After a negative prediction, the next prediction is made without awaiting the passing of the Ti instances corresponding to the previous negative prediction. Accordingly, when a negative prediction is preceded by at least one negative prediction, a negative prediction has already been made for Ti − 1 instances. This reduces the probability for a TN to pn. After each of the nE events, the probability of a TN is pn^{Ti}. Between events, this probability is enhanced to pn. All in all, the global probability for TN predictions reads

p− = ( (1 − pp)^{Ti} nE + (1 − pp)(T − nE) ) / T .    (6.20)


Since extreme events tend to occur rarely [38,42,69,163,183], the probability of a true negative prediction may be approximated by p− ≈ 1 − pp.

The null distributions of the derived RPs of TP and TN predictions are binomial according to B(k±, K±; p±), with p± as given by Eqs. (6.19) and (6.20). The null hypotheses tested are

H0,±: “The number of TPs and TNs, respectively, obtained from the event predictor analyzed is compatible with those of a homogeneous Poissonian RP that incorporates the same duration of IPs and OPs and raises alarms at the same time-constant rates as the event predictor.”    (6.21)

Adaptations as in the case of time-dependent predictions, or of predictions according to the severity of events, may be incorporated as before.

Whether or not IPs and OPs are incorporated into the hypothesis test, the null distribution for nTP and nTN of a predictor corresponds to the binomial distribution B(k±, K±; p±), with p± the probability for a TP and TN prediction, and K± the number of observed events, nE, or absences of events, nnE, respectively. The null hypotheses for TPs and TNs are rejected if

nTP ≥ B_{100%−α+}( · , K+; p+) and    (6.22)
nTN ≥ B_{100%−α−}( · , K−; p−) ,    (6.23)

where B_{100%−α±}( · , K±; p±) are the (100% − α±)-quantiles of the according null distributions, and α± the significance levels of the tests.
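The resulting decision rule can be sketched with the binomial quantile function from SciPy; the argument names are illustrative.

    from scipy.stats import binom

    def rp_tests(n_tp, n_tn, n_e, n_ne, n_pos, T, Ti, To, alpha=0.05):
        # Probabilities of the Poissonian RP, Eqs. (6.19) and (6.20).
        p_p = n_pos / T
        p_plus = (1 - p_p) ** Ti * (1 - (1 - p_p) ** To)
        p_minus = ((1 - p_p) ** Ti * n_e + (1 - p_p) * (T - n_e)) / T
        # Reject H0 if the count reaches the (100% - alpha)-quantile, Eqs. (6.22)-(6.23).
        reject_tp = n_tp >= binom.ppf(1 - alpha, n_e, p_plus)
        reject_tn = n_tn >= binom.ppf(1 - alpha, n_ne, p_minus)
        return reject_tp, reject_tn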

6.2. Application

In this section, the Poissonian RPs presented in the previous section, 6.1.4, are tested in simulations with respect to reliability and power. To this end, the mechanism of deriving prediction-observation time series with a known degree of prediction capability is described in Sec. 6.2.1. This mechanism is used as a data-generator model to test the performance of the RPs proposed. Results of the reliability and power analysis are shown in Sec. 6.2.2. The content of this section has been published in [131].

6.2.1. Simulating prediction-observation time series

Prediction-observation time series with controllable predictability are simulated according to a stochastic model. Its fundamental idea is to generate predictions based on a β-distribution. The predictions are linked to the observations by a linear model. This idea has been used for probabilistic predictions before [20]. Here it is adapted for binary predictions incorporating the existence of extended IPs and OPs.


According to the RPs proposed, two settings of prediction-observation time series are described. First, prediction-observation time series are generated without incorporating IPs and OPs. This illustrates the general formalism of the controlled setting in which the RP is tested. Second, the extension to incorporated IPs and OPs is described.

Simulation of prediction-observation time series without IP and OP

To simulate a tuple (f, y) of a binary prediction f and its corresponding observation y, first a random number f̃ is drawn from the β-distribution [20]

Dβ(f̃) = f̃^{v−1} (1 − f̃)^{w−1} / β(v, w) , f̃ ∈ [0, 1] .    (6.24)

The β-distribution includes the β-function

β(v, w) = ∫_0^1 t^{v−1} (1 − t)^{w−1} dt .    (6.25)

The parameters v and w of the β-distribution are linked to the expectation [20]

E[F̃] = v / (v + w) =: µf̃    (6.26)

of the random variable F̃ corresponding to the random number f̃. This random number mimics a feature or a probabilistic prediction, based on which a binary prediction,

f = { 1 if f̃ > θ ; 0 else } ,    (6.27)

is obtained by thresholding. While f = 1 corresponds to a positive prediction, f = 0 corresponds to a negative prediction. In accordance with µf̃ in Eq. (6.26), the expectation of the random variable corresponding to f is denoted µf. Since f is binary, this expectation corresponds to the probability of a positive prediction. Its value is determined by the choice of the threshold θ in Eq. (6.27). Choosing θ as the (1 − µf̃)·100%-quantile of Dβ, µf = µf̃ is a function of the parameters v and w, Eq. (6.26).

Based on the prediction f ∈ {0, 1}, the observation is simulated according to a probabilistic model. In particular, the probability of an event given the prediction f is modeled [20],

µy|f = c f + u .    (6.28)

The linkage strength c determines the degree to which the prediction f affects the observation y. This observation may either be an event, such that y = 1, or no event, such that y = 0.



Figure 6.2.: Probability, µy|f, for an event after a prediction f ∈ {0, 1}. In the case of a positive prediction, f = 1 (red solid), µy|f increases with increasing linkage c. In the case of a negative prediction, f = 0 (blue dashed), µy|f decreases with increasing linkage c. For c = 0, µy|f = µf = µy = 0.2.

Linkage strength c = 0 corresponds to the incapability of the predictor to forecast event occurrences and absences. To simulate this incapability, the offset is to be chosen as

u = µf (1 − c) ,    (6.29)

resulting in µy|f = µf, independent of the prediction f. In Fig. 6.2, the linkage of a prediction and the probability for an event is made explicit for µf = 0.2. For c = 0, the probability for an event is µy|f = µf = 0.2, whether a positive prediction, f = 1 (red solid), or a negative prediction, f = 0 (blue dashed), is made. In the case of a positive prediction, the probability of an event increases linearly until, at c = 1, µy|f = 1, such that a positive prediction is certainly followed by an event (red solid). In the case of a negative prediction, the probability of an event decreases linearly until, at c = 1, µy|f = 0, such that a negative prediction is certainly followed by no event (blue dashed).

To simulate a time series of prediction-observation pairs (f, y), this procedure is repeated T times. In Fig. 6.3, exemplary prediction-observation time series of length T = 500 are shown for different linkage strengths c ∈ [0, 1]. The parameters of the underlying β-distribution are v = 1 and w = 19, such that the probability for an event is µy = µf = µf̃ = 0.05. With increasing linkage (top to bottom), the number of positive predictions (red, long vertical lines) that are in accordance with event occurrences (blue, shorter thick vertical lines) increases.
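The simulation mechanism without IPs and OPs can be sketched as follows; scipy.stats.beta supplies the quantile defining the threshold θ, and the function name is hypothetical.

    import numpy as np
    from scipy.stats import beta as beta_dist

    def simulate_pairs(T, v, w, c, rng):
        # Prediction-observation pairs (f, y) without IPs and OPs, Eqs. (6.24)-(6.29).
        mu_f = v / (v + w)                        # expectation of the beta draw, Eq. (6.26)
        theta = beta_dist.ppf(1.0 - mu_f, v, w)   # threshold: (1 - mu_f)-quantile
        f = rng.beta(v, w, size=T) > theta        # binary predictions, Eq. (6.27)
        mu_y = c * f + mu_f * (1.0 - c)           # event probability, Eqs. (6.28), (6.29)
        y = rng.random(T) < mu_y                  # observations
        return f.astype(int), y.astype(int)

    f, y = simulate_pairs(T=500, v=1, w=19, c=0.5, rng=np.random.default_rng(1))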



Figure 6.3.: Exemplary prediction-observation time series of length T = 500 for µy = µf = 0.05 with different linkage c ∈ [0, 1]. Positive predictions are displayed as red, long vertical lines; events are shown as blue, shorter thick vertical lines. For increasing linkage strength c (top to bottom), the number of TP and TN predictions increases.



Simulation of prediction-observation pairs with IP and OP

In this passage, the idea of simulating prediction-observation pairs is extended by incorporating IPs and OPs. In this scenario, a prediction must be made early enough to initiate an intervention during the Ti time points of the IP before the event finally occurs during the To time points of the OP. As before, a binary prediction f is obtained from thresholding the random number f̃ drawn from a β-distribution with parameters v and w, cf. Eq. (6.24) [20]. Unlike before, binary predictions are linked to observations y by the probabilistic model, cf. Eq. (6.28),

µy|f = c f + µf (1 − c) ,    (6.30)

in a time-shifted way. After a positive prediction, f = 1, the probability for an event,

µy|f = µf (1 − c) ,    (6.31)

remains low within the duration of the IP. This corresponds to the probability for an event after a negative prediction in the previous simulation. The probability for an event is yet increased in the OP, where

µy|f = c + µf (1 − c) .    (6.32)

This corresponds to the probability for an event after a positive prediction in the previous simulation. After the occurrence of an event, or after the passing of the OP, new binary predictions are simulated by thresholding β-distributed random numbers until the next positive prediction. The probability for an event after a negative prediction is according to Eq. (6.31), as before.

In Fig. 6.4, exemplary prediction-observation time series are shown for different linkage strengths c. IPs are simulated to last Ti = 4 data points, OPs last To = 4 data points (small black boxes). Compared to the previous simulation, TP predictions are positive predictions (red, long line) followed by an event (blue, shorter thick line) within the OP. To ensure that the probability for an event within the duration of a complex consisting of IP and OP is below 5%, the parameters are chosen as v = 1 and w = 119. This choice leads to a reduced probability for an event, µy = µf = 0.005. Due to this reduced probability for an event, the sample size shown in Fig. 6.4 is doubled compared to that shown in Fig. 6.3. The fraction of accordances of predictions and observations is increased at lower linkages in the case of simulated IPs and OPs. False predictions do not occur for c ≥ 0.7, while they do occur up to c = 0.9 when no IPs and OPs are simulated. To summarize, the incorporation of IPs and OPs enhances the probability for TP and TN predictions. The proposed hypothesis test takes this increase into account.



Figure 6.4.: Exemplary prediction-observation time series of length T = 1 000 for µy = 0.005 with different linkage c ∈ [0, 1]. Positive predictions (red, long vertical lines) are followed by an event-free IP of Ti = 4 data points and an event (blue, shorter thick vertical lines) within the OP, i.e., the To = 4 data points. The occurrence period of each positive prediction is displayed as a black box. For increasing linkage strength c (top to bottom), the number of true positive and true negative predictions increases.



Figure 6.5.: Performance of the RPs in simulations without (a) and with (b) IP-OP complexes. The proposed RPs are reliable and powerful in all settings. Probability of H0,+-rejections (red solid) and probability of H0,−-rejections (blue dashed).

6.2.2. Testing the proposed random predictors

In this section, the reliability and power of the proposed Poissonian RPs are tested, both in the setting without and with IPs and OPs. To this end, prediction-observation time series are simulated as presented in Sec. 6.2.1. For each simulated linkage strength c ∈ [0, 1] and each setting without or with IPs and OPs, 1 000 repetitions of prediction-observation time series of length T = 10 000 are statistically assessed by the according RPs presented in Sec. 6.1.4. The null hypotheses tested by the proposed tests for TPs (+) and TNs (−) are abbreviated H0,±: c = 0. The results of the simulations are summarized in Fig. 6.5 as the fraction of rejected H0,± at the significance level α = 5% as a function of the true linkage c ∈ [0, 0.3].

In Fig. 6.5 (a), the fraction of rejected H0,+ (red solid) and the fraction of rejected H0,− (blue dashed) are shown for the case that no IP-OP complexes are simulated and the according RPs are employed. For TPs, the fraction of rejected H0,+ at c = 0 is 4%, such that the significance level is met. This fraction increases rapidly with increasing linkage strength. While the hypothesis test correctly rejects H0,+ for c = 0.1, the linkage of predictions and observations is not obvious at c = 0.1 in the exemplary data shown in Fig. 6.3. This reflects the power of the proposed test as compared to the degree of predictability directly visible from the data. The hypothesis test for negative predictions is more conservative than that for positive predictions: at linkage c = 0, H0,− is rejected in 0% of the repetitions. The power of the test does not increase as rapidly as in the case of positive predictions. This is due to the fact that the number of negative observations and predictions, approximately 9 500, exceeds the number of positive ones, approximately 500, by far. Still, the power of the test is considerable when relating it to the exemplary data shown in Fig. 6.3. False positive predictions occur in the exemplary data up to linkage c = 0.5, while the hypothesis test rejects H0,− with a probability of 94% already at c = 0.1.


In Fig. 6.5 (b), performance results are shown for the simulation incorporating IP-OP complexes. The adaptations of the RPs of TPs (red solid) and TNs (blue dashed) render the results comparable to those shown in (a). Again, both hypothesis tests are reliable, and the test is more conservative for TNs (0%) than for TPs (2%). The power of the RP of TPs increases more quickly than that of the TNs. Comparing the power of the tests to the number of correctly predicted event occurrences and absences shown in Fig. 6.4, both tests are powerful: while only 2 out of 5 events are correctly predicted in the exemplary data at c = 0.1, the fraction of rejected H0,+ is 100% and the fraction of rejected H0,− is 96% at c = 0.1.

6.3. Summary

Many fields of physics aim to predict events trustworthily [11,84,107,125,128,139,155,166]. Probabilistic or binary predictions of events are derived from changes of the system's state prior to the event [138,145,150,230]. To identify these changes, the state is described by a set of properties which may be regressed to a prediction [8,17,72,83]. In the case of probabilistic predictions, the decision whether precautions or interventions shall be initialized is left open [81,97]. In the case of binary predictions, on the contrary, this decision is made [58,189]. The performance of such predictors is quantified, e.g., by sensitivity and specificity [225]. So far, the focus has been on sensitivity, because the statistical properties of negative predictions have been unclear [6,146]. To account for this, previous statistical methods for validating the performance of predictors have assessed sensitivity as a function of the false positive rate [189,226]. In this chapter, this drawback is overcome following the publication [131]. First, a statistical definition of negative predictions is proposed. Second, a method is provided by which sensitivity and specificity of a predictor may be assessed independently. This method is based on a Poissonian random predictor with the same probability of positive or negative predictions as the event predictor analyzed. Only if the performance of the event predictor exceeds that of the Poissonian random predictor is the event predictor considered significantly better than chance. Significance applies to sensitivity and specificity of the predictor independently. The proposed method is tested in simulations that show its reliability and power. Compared to resampling-based methods, a major advantage of the proposed method is its clarity with respect to the null hypotheses tested by it, since these tend to be vaguely defined for resampling-based methods [59]. Apart from that, resampling-based methods claim to be highly versatile with respect to the null hypotheses testable by them [6]. However, the proposed analytic method does not fall short with respect to its versatility. This is made explicit by an example of incorporated predefined periods in which events must not and must occur. Also time-dependencies of the predictor [6], as for clustering of events or circadian cycles, may be incorporated.


Summary


Summary in Words

Natural processes generally exhibit stochasticity, which is accounted for by methods of statistical inference. These methods range from point estimation over interval estimation to hypothesis testing, as summarized in Chap. 1. In this thesis, all three methods are applied to infer linear and nonlinear properties of single processes as well as their interrelations within complex systems. To quantify properties within and across processes, univariate as well as bi- and multivariate methods of time series analysis are applied. Besides the pure identification of process properties and interactions, time series analysis methods may be applied to quantify changes of a system's state. In the field of event prediction, such changes are assumed to be specific for a system prior to events, such as earthquakes or epileptic seizures. Identifying these changes would render the prediction of events possible, such that preventions and precautions could be initiated. While Part I of this thesis addresses statistical inference of process properties and their interactions, event predictors are statistically assessed in Part II.

In Chap. 2, a general framework of multivariate network analysis based on time-dependent autoregressive modeling in the state-space model is presented [190]. Ensuing from autoregressive models, various measures have been proposed to infer linkages between processes within a complex system [12,36,89,105,192,227]. A frequency-domain measure that has been successfully applied to investigate dynamics of the brain [199] is the renormalized partial directed coherence [192]. The statistical properties of this measure are known if the linkage is time-constant [192]. If the linkage varies over time, the statistical assessment needs to account for the temporal correlation of successively sampled linkage measures. To this end, a parametric bootstrap of residuals within the state-space model is proposed [190]. The approach is tested on simulated networks containing various types of interaction dynamics.

A prerequisite of the bootstrap is that the random variables underlying the observations are independent [48]. If autoregressive modeling in the state-space model is an inadequate option, the block bootstrap is a powerful alternative [32,119]. The key parameter of the block bootstrap is the block length [32,82,119]. For different applications, optimal choices of the block length have been proposed [82,160,187]. In Chap. 3, it is shown that they fail if the system is afflicted by noise that diminishes the autocorrelation. To resolve this shortcoming, an adaptation of common algorithms is proposed [132]. Owing to this adaptation, optimal block-length selection is robust to impacts of noise where traditional approaches fail [132]. This is shown both analytically and by simulations. Finally, the block bootstrap is applied to estimate confidence intervals for the univariate variance in a tremor application [132].

The fundamental idea of the choice of block length for block bootstrapping smooth function models is that the autocorrelation needs to have decayed sufficiently within each block. This notion is used in Chap. 4 as a guideline for the block-length selection of a block bootstrap designed for hypothesis testing in uni- and bivariate bispectral analysis [130]. While linear frequency-domain properties of processes are addressed by uni- and bivariate spectral analysis [24], the nonlinear correlate


is bispectral analysis [214]. Higher harmonics of a fundamental frequency are in-distinguishable from independent oscillations by spectral analysis. By bispectralanalysis, this distinction may be made [130]. Analytic methods for hypothesis test-ing within the scope of bispectral analysis are known [23,53,90]. However, they onlyapply asymptotically. The amount of data needed to reach this asymptotics is highsuch that it may not be available in applications. To overcome this limitation,block bootstrap-based tests are proposed, instead [130]. It is shown in simulationsthat these tests remain reliable even if less data is available, while analytic testsfail. Finally, the block bootstrap is applied to investigate higher harmonics withinthe scope of tremor analysis [130].In Chapt. 5, an analytic null distribution for hypothesis testing is derived for a

In Chap. 5, an analytic null distribution for hypothesis testing is derived for a measure quantifying phase-amplitude coupling. This univariate nonlinear property of processes has been reported in various investigations of brain dynamics, for instance in learning and memory consolidation [26,30,210,211,219], and in pathologies such as Parkinson's disease [41,127,194] and social anxiety disorder [142]. So far, the phase-amplitude plot has been the basis for a set of measures of phase-amplitude coupling, which are statistically assessed by surrogate methods [161,212]. However, the adequacy of such surrogate methods has been doubted [10,229]. In this thesis, an analytic alternative based on the phase-amplitude plot is proposed. To this end, the phase-amplitude plot is transformed into a mass density from which a counting statistic can be derived. Based on this statistic, a χ2-test can be applied; a sketch of this construction is given below. The statistic is versatile with respect to the null hypotheses that can be tested. The reliability and power of the resulting test are demonstrated in simulations.
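The following sketch illustrates the counting idea in its simplest form: the instantaneous amplitude is aggregated over phase bins and the resulting counts are tested against uniformity with a χ2 statistic. It is an illustration of the general construction, not the specific transformation derived in Chap. 5; the number of bins n_bins is an assumed parameter.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.stats import chi2

def pac_chi2(x_phase, x_amp, n_bins=18):
    """Chi-squared test of a flat phase-amplitude plot.

    x_phase: signal providing the phase (e.g. low-frequency filtered)
    x_amp:   signal providing the amplitude (e.g. high-frequency filtered)"""
    phase = np.angle(hilbert(x_phase))
    amp = np.abs(hilbert(x_amp))
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    bins = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    mass = np.array([amp[bins == b].sum() for b in range(n_bins)])
    counts = mass / mass.sum() * len(x_amp)    # rescale mass density to counts
    expected = len(x_amp) / n_bins             # uniform null hypothesis
    stat = np.sum((counts - expected) ** 2 / expected)
    return stat, chi2.sf(stat, df=n_bins - 1)  # statistic and p-value
```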

Finally, a statistical method for hypothesis testing within the scope of event prediction is presented in Chap. 6 [131]. An event predictor is considered useful if it significantly outperforms a predictor that raises alarms at random [145]. If events are predicted within extended periods after an alarm, the probability of true positive predictions is enhanced. This has been accounted for by statistical methods of event prediction [189,226]. So far, these methods assume that performance is quantified by the number of true positives as a function of admissible false positive predictions [145,226]. Since false positive predictions diminish the number of true negative predictions, an independent assessment of true positive and true negative predictions has not been possible. This drawback is resolved by the hypothesis tests proposed here [131]. They are based on a Poissonian random predictor that raises alarms at the same rate as the predictor under investigation; a sketch of this reference predictor is given below. The performance of the resulting hypothesis tests is assessed in simulations [131]. For the comparability of prediction algorithms, it is essential to unify the statistical assessment of predictive performance; the proposed method is a candidate for this.
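To make the reference predictor concrete: if alarms form a homogeneous Poisson process with rate λ and each alarm opens a prediction horizon Δ, an event is predicted with probability p = 1 − exp(−λΔ), so the number of true positives among n independent events is binomially distributed. This is a simplified reading of the construction in [131]; overlapping alarm windows and the true-negative side of the test are not treated here.

```python
import numpy as np
from scipy.stats import binom

def poisson_predictor_pvalue(n_events, n_hits, alarm_rate, horizon):
    """P-value for predicting at least n_hits of n_events correctly,
    against a Poissonian random predictor with the same alarm rate."""
    p_hit = 1.0 - np.exp(-alarm_rate * horizon)   # per-event hit probability
    return binom.sf(n_hits - 1, n_events, p_hit)  # P(X >= n_hits)

# Illustration: 20 events, 14 predicted, 0.1 alarms/h, 2 h horizon
print(poisson_predictor_pvalue(20, 14, 0.1, 2.0))
```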

As advancements in recording and storage technology allow for new experiments to test scientific theories, there is a continuing need to develop appropriate statistical methods by which theories that are incompatible with the experimental findings can be reliably rejected.


Summary as a Table

| Topic | Network analysis | Block-length selection | Bispectral analysis | Phase-amplitude coupling | Event prediction |
|---|---|---|---|---|---|
| Methodology | Autoregressive modeling | Variance estimation of smooth function models | Higher-order spectral analysis | Cross-frequency coupling | Poissonian random predictor for number of true positives/negatives |
| Estimator or measure | Renormalized partial directed coherence | Variance, in particular | Bispectrum, bicoherence, bispectral coefficient | Phase-amplitude plot | Independent from choice of time-series measures |
| Linearity / nonlinearity | Linear | Linear | Nonlinear | Nonlinear | Based on linear and nonlinear measures |
| Statistical assessment | Estimation of confidence intervals | Estimation of confidence intervals | Hypothesis testing | Hypothesis testing | Hypothesis testing |
| Major contribution | Bootstrap of residuals in the state-space model that transforms to linkage measures | Block-length selection for the block bootstrap in a noisy regime | Block bootstrap that outperforms available analytic statistics, particularly for few data | Analytic χ2 test for phase-amplitude plots as an alternative to surrogate methods | Poissonian statistic that allows for independent statistical assessment of positive and negative predictions |


Bibliography

[1] G. Aad, T. Abajyan, B. Abbott, J. Abdallah, S. Abdel Khalek, A.A. Abdelalim, O. Abdinov, R. Aben, B. Abi, M. Abolins, et al. Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Phys. Lett. B, 716:1–29, 2012.

[2] I. Adamchic, B. Langguth, C. Hauptmann, and P.A. Tass. Abnormal cross-frequency coupling in the tinnitus network. Front. Neurosci., 8:284, 2014.

[3] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks.Rev. Mod. Phys., 74:47–97, 2002.

[4] J. Aldrich. R.A. Fisher and the making of maximum likelihood 1912–1922.Stat. Sci., 12:162–176, 1997.

[5] E.A. Allen, J. Liu, K.A. Kiehl, J. Gelernter, G.D. Pearlson, N. I. Perrone-Bizzozero, and V.D. Calhoun. Components of cross-frequency modulation inhealth and disease. Front. Syst. Neurosci., 5:59, 2011.

[6] R.G. Andrzejak, F. Mormann, T. Kreuz, C. Rieke, C. E. Elger, and K. Lehn-ertz. Testing the null hypothesis of the nonexistence of a preseizure state.Phys. Rev. E, 67:010901, 2003.

[7] R.G. Andrzejak, D. Chicharro, C. E. Elger, and F. Mormann. Seizure predic-tion: Any better than chance? Clin. Neurophysiol., 120:1465–1478, 2009.

[8] M. S. Antolik. An overview of the National Weather Service’s centralizedstatistical quantitative precipitation forecasts. J. Hydrol., 239:306–337, 2000.

[9] S. Applequist, G. E. Gahrs, R. L. Pfeffer, and X.-F. Niu. Comparison ofmethodologies for probabilistic quantitative precipitation forecasting. WeatherForecast., 17:783–799, 2002.

[10] J. Aru, J. Aru, V. Priesemann, M. Wibral, L. Lana, G. Pipa, W. Singer, andR. Vicente. Untangling cross-frequency coupling in neuroscience. Curr. Opin.Neurobiol., 31:51–61, 2015.

[11] R. Aschenbrenner-Scheibe, T. Maiwald, M. Winterhalder, H.U. Voss, J. Tim-mer, and A. Schulze-Bonhage. How well can epileptic seizures be predicted?An evaluation of a nonlinear method. Brain, 126:2616–2626, 2003.


[12] L.A. Baccalá and K. Sameshima. Partial directed coherence: A new concept in neural structure determination. Biol. Cybern., 84:463–474, 2001.

[13] J. V. Barth, G. Costantini, and K. Kern. Engineering atomic and molecularnanostructures at surfaces. Nature, 437:671–679, 2005.

[14] A. Bauswein, R.A. Pulpillo, H.T. Janka, and S. Goriely. Nucleosynthesisconstraints on the neutron star-black hole merger rate. Astrophys. J., 795:L9,2014.

[15] P. Bellec, P. Rosa-Neto, O.C. Lyttelton, H. Benali, and A.C. Evans. Multi-level bootstrap analysis of stable clusters in resting-state fMRI. Neuroimage,51:1126–1139, 2010.

[16] S. Bialonski, M.-T. Horstmann, and K. Lehnertz. From brain to earth andclimate systems: Small-world interaction networks or not? Chaos, 20:013134,2010.

[17] J. B. Bremnes. Probabilistic forecasts of precipitation in terms of quantilesusing NWP model output. Mon. Weather Rev., 132:338–347, 2004.

[18] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.U. Hwang. Complex networks: Structure and dynamics. Phys. Rep., 424:175–308, 2006.

[19] C. Börgers, S. Epstein, and N. J. Kopell. Gamma oscillations mediate stimuluscompetition and attentional selection in a cortical network model. P. Natl.Acad. Sci. USA, 105:18023–18028, 2008.

[20] A.A. Bradley, S. S. Schwartz, and T. Hashino. Sampling uncertainty andconfidence intervals for the Brier score and Brier skill score. Weather Forecast.,23:992–1006, 2008.

[21] A. Bragin, G. Jandó, Z. Nádasdy, J. Hetke, K. Wise, and G. Buzsáki. Gamma(40-100 Hz) oscillation in the hippocampus of the behaving rat. J. Neurosci.,15:47–60, 1995.

[22] G.W. Brier. Verification of forecasts expressed in terms of probability. Mon.Weather Rev., 78:1–3, 1950.

[23] D.R. Brillinger. An introduction to polyspectra. Ann. Math. Stat., 36:1351–1374, 1965.

[24] D.R. Brillinger. Time Series: Data Analysis and Theory. Society for Indus-trial and Applied Mathematics, Philadelphia, 2001.

[25] A. Bruns, R. Eckhorn, H. Jokeit, and A. Ebner. Amplitude envelope correlation detects coupling among incoherent brain signals. Neuroreport, 11:1509–1514, 2000.


[26] A. Bruns and R. Eckhorn. Task-related coupling from high- to low-frequency signals among visual cortical areas in human subdural recordings. Int. J. Psychophysiol., 51:97–116, 2004.

[27] E. Bullmore and O. Sporns. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci., 10:186–198, 2009.

[28] T. J. Buschman, E. L. Denovellis, C. Diogo, D. Bullock, and E.K. Miller.Synchronous oscillatory neural ensembles for rules in the prefrontal cortex.Neuron, 76:838–846, 2012.

[29] I. Campbell. Chi-squared and Fisher–Irwin tests of two-by-two tables withsmall sample recommendations. Stat. Med., 26:3661–3675, 2007.

[30] R.T. Canolty, E. Edwards, S. S. Dalal, M. Soltani, S. S. Nagarajan,H. E. Kirsch, M. S. Berger, N.M. Barbaro, and R.T. Knight. High gammapower is phase-locked to theta oscillations in human neocortex. Science,313:1626–1628, 2006.

[31] R.T. Canolty and R.T. Knight. The functional role of cross-frequency cou-pling. Trends Cogn. Sci., 14:506–515, 2010.

[32] E. Carlstein. The use of subseries values for estimating the variance of ageneral statistic from a stationary sequence. Ann. Stat., 14:1171–1179, 1986.

[33] J.M. Chadney, M. Galand, Y.C. Unruh, T.T. Koskinen, and J. Sanz-Forcada.XUV-driven mass loss from extrasolar giant planets orbiting active stars.Icarus, 250:357–367, 2015.

[34] M.X. Cohen. Assessing transient cross-frequency coupling in EEG data.J. Neurosci. Meth., 168:494–499, 2008.

[35] A.D.D. Craik. The origins of water wave theory. Annu. Rev. Fluid Mech.,36:1–28, 2004.

[36] R. Dahlhaus. Graphical interaction models for multivariate time series.Metrika, 51:157–172, 2000.

[37] R. Dahlhaus and M. Eichler. Causality and graphical models for time series.In P. Green, N. Hjort, and S. Richardson, eds., Highly Structured StochasticSystems, pp. 115–137. Oxford University Press, 2003.

[38] Z. Danku, F. Kun. Creep rupture as a non-homogeneous Poissonian process.Sci. Rep., 3:2688, 2013.


[39] W.W. Davis. Robust interval estimation of the innovation variance of anARMA model. Ann. Stat., 5:700–708, 1977.

[40] D. DeRidder, S. Vanneste, B. Langguth, and R. Llinas. Thalamocorticaldysrhythmia: A theoretical update in tinnitus. Front. Neurol., 6:124, 2015.

[41] C. deHemptinne, E. S. Ryapolova-Webb, E. L. Air, P.A. Garcia, K. J. Miller,J.G. Ojemann, J. L. Ostrem, N.B. Galifianakis, and P.A. Starr. Exaggeratedphase-amplitude coupling in the primary motor cortex in Parkinson disease.P. Natl. Acad. Sci. USA, 110:4780–4785, 2013.

[42] H. L.D. de S.Cavalcante, M. Oriá, D. Sornette, E. Ott, and D. J. Gauthier.Predictability and suppression of extreme events in a chaotic system. Phys.Rev. Lett., 111:198701, 2013.

[43] T. Demiralp, Z. Bayraktaroglu, D. Lenz, S. Junge, N.A. Busch, B. Maess,M. Ergen, and C. S. Herrmann. Gamma amplitudes are coupled to theta phasein human EEG during visual perception. Int. J. Psychophysiol., 64:24–30,2007.

[44] A. P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood fromincomplete data via the EM algorithm. J. Roy. Stat. Soc. B Met., 39:1–38,1977.

[45] L. Dosiek. Extracting electrical network frequency from digital recordingsusing frequency demodulation. IEEE Signal Proc. Let., 22:691–695, 2015.

[46] C.A. Doswell III, R. Davies-Jones, and D. L. Keller. On summary measures ofskill in rare event forecasting based on contingency tables. Weather Forecast.,5:576–585, 1990.

[47] L.-M. Duan and C. Monroe. Colloquium: Quantum networks with trappedions. Rev. Mod. Phys., 82:1209–1224, 2010.

[48] B. Efron. Bootstrap methods: Another look at the jackknife. Ann. Stat.,7:1–26, 1979.

[49] B. Efron and G. Gong. A leisurely look at the bootstrap, the jackknife, andcross-validation. Am. Stat., 37:36–48, 1983.

[50] B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidenceintervals, and other measures of statistical accuracy. Stat. Sci., 1:54–75, 1986.

[51] B. Efron. Second thoughts on the bootstrap. Stat. Sci., 18:135–140, 2003.

[52] M. Eichler. Graphical modeling of dynamic relationships in multivariate timeseries. In B. Schelter, M. Winterhalder, and J. Timmer, eds., Handbook ofTime Series Analysis, pp. 335–372. Wiley-VCH, Weinheim, 2006.


[53] S. Elgar and R.T. Guza. Statistics of bicoherence. IEEE T. Acoust. Speech,36:1667–1668, 1988.

[54] S. Elgar and G. Sebert. Statistics of bicoherence and biphase. J. Geophys.Res.-Oceans, 94:10993–10998, 1989.

[55] R. J. England, R. J. Noble, K. Bane, D.H. Dowell, C.-K. Ng, J. E. Spencer,S. Tantawi, Z. Wu, R. L. Byer, E. Peralta, et al. Dielectric laser accelerators.Rev. Mod. Phys., 86:1337–1389, 2014.

[56] L. Fabiny, P. Colet, R. Roy, and D. Lenstra. Coherence and phase dynamicsof spatially coupled solid-state lasers. Phys. Rev. A, 47:4287–4296, 1993.

[57] Z. L. Fang, H. F. Rong, Z. L. Ya, and P. Qi. In-situ synthesis of CdS/g-C3N4 hybrid nanocomposites with enhanced visible photocatalytic activity.J. Mater. Sci., 50:3057–3064, 2015.

[58] H. Feldwisch-Drentrup, B. Schelter, M. Jachan, J. Nawrath, J. Timmer, andA. Schulze-Bonhage. Joining the benefits: Combining epileptic seizure pre-diction methods. Epilepsia, 51:1598–1606, 2010.

[59] H. Feldwisch-Drentrup, A. Schulze-Bonhage, J. Timmer, and B. Schelter. Sta-tistical validation of event predictors: A comparative study based on the fieldof seizure prediction. Phys. Rev. E, 83:066704, 2011.

[60] P. Felfer, A.V. Ceguerra, S. P. Ringer, and J.M. Cairney. Detecting andextracting clusters in atom probe data: A simple, automated method usingvoronoi cells. Ultramicroscopy, 150:30–36, 2015.

[61] J. Fell and N. Axmacher. The role of phase synchronization in memory pro-cesses. Nat. Rev. Neurosci., 12:105–118, 2011.

[62] J. Feynman and A. Ruzmaikin. Problems in the forecasting of solar particleevents for manned missions. Radiat. Meas., 30:275–280, 1999.

[63] C. Fields. How small is the center of science? Short cross-disciplinary cyclesin co-authorship graphs. Scientometrics, 102:1287–1306, 2014.

[64] J. P. Finley. Tornado predictions. Amer. Meteor. J., 1:85–88, 1884.

[65] R.A. Fisher. On the mathematical foundations of theoretical statistics. Philos.T. R. Soc. Lond., 222:309–368, 1922.

[66] R.A. Fisher. On the interpretation of χ2 from contingency tables, and thecalculation of p. J. R. Stat. Soc., 85:87–94, 1922.

[67] R.A. Fisher. The logic of inductive inference. J. R. Stat. Soc., 98:39–82, 1935.


[68] P.A. Franken, A. E. Hill, C.W. Peters, and G. Weinreich. Generation ofoptical harmonics. Phys. Rev. Lett., 7:118–119, 1961.

[69] C. Franzke. Predictability of extreme events in a nonlinear stochastic-dynamical model. Phys. Rev. E, 85:031134, 2012.

[70] D. Freedman. On bootstrapping two-stage least-squares estimates in station-ary linear models. Ann. Stat., 12:827–842, 1984.

[71] C. Frei and C. Schär. Detection probability of trends in rare events: The-ory and application to heavy precipitation in the Alpine region. J. Climate,14:1568–1584, 2001.

[72] P. Friederichs and A. Hense. Statistical downscaling of extreme precipitationevents using censored quantile regression. Mon. Weather Rev., 135:2365–2378,2007.

[73] P. Fries. A mechanism for cognitive dynamics: Neuronal communicationthrough neuronal coherence. Trends Cogn. Sci., 9:474–480, 2005.

[74] C.W. Gardiner. Handbook of Stochastic Methods. Springer, Berlin, 1997.

[75] A. Gautreau, A. Barrat, and M. Barthélemy. Global disease spread: Statisticsand estimation of arrival times. J. Theor. Biol., 251:509–522, 2008.

[76] Z. Ge. Significance testing for wavelet bicoherence and its application inanalyzing nonlinearity in turbulent shear flows. Phys. Rev. E, 81:056311,2010.

[77] M. Golitko and G.M. Feinman. Procurement and distribution of pre-Hispanic Mesoamerican obsidian 900 BC–AD 1520: A social network analysis.J. Archaeol. Method Th., 22:206–247, 2014.

[78] H. Grabert, U. Weiss, and P. Talkner. Quantum theory of the damped har-monic oscillator. Z. Phys. B – Condensed Matter, 55:87–94, 1984.

[79] J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37:424–438, 1969.

[80] M.P. Grubb, A. J. Orr-Ewing, and M.N.R. Ashfold. KOALA: A program forthe processing and decomposition of transient spectra. Rev. Sci. Instrum.,85:064104, 2014.

[81] J. Gundermann, S. Siegert, and H. Kantz. Improved predictions of rare eventsusing the Crooks fluctuation theorem. Phys. Rev. E, 89:032112, 2014.

[82] P. Hall, J. L. Horowitz, and B.-Y. Jing. On blocking rules for the bootstrapwith dependent data. Biometrika, 82:561–574, 1995.


[83] T.M. Hamill, J. S. Whitaker, and X. Wei. Ensemble reforecasting: Improvingmedium-range forecast skill using retrospective forecasts. Mon. Weather Rev.,132:1434–1447, 2004.

[84] H.A.P. Hapuarachchi, Q. J. Wang, and T.C. Pagano. A review of advancesin flash flood forecasting. Hydrol. Process., 25:2771–2784, 2011.

[85] M.A. Hassan, D. Coats, K. Gouda, Y.-J. Shin, and A. Bayoumi. Analysis of nonlinear vibration-interaction using higher order spectra to diagnose aerospace system faults. Aerosp. Conf. Proc., pp. 1–8. IEEE, Big Sky, 2012.

[86] L. Held and D. S. Bové. Applied Statistical Inference. Springer, Berlin, 2014.

[87] B. Hellwig, S. Häußler, M. Lauk, B. Guschlbauer, B. Köster, R. Kristeva-Feige, J. Timmer, and C.H. Lücking. Tremor-correlated cortical activity de-tected by electroencephalography. Clin. Neurophysiol., 111:806–809, 2000.

[88] B. Hellwig, S. Häußler, B. Schelter, M. Lauk, B. Guschlbauer, J. Timmer, andC.H. Lücking. Tremor-correlated cortical activity in essential tremor. Lancet,357:519–523, 2001.

[89] W. Hesse, E. Möller, M. Arnold, and B. Schack. The use of time-variantEEG Granger causality for inspecting directed interdependencies of neuralassemblies. J. Neurosci. Meth., 124:27–44, 2003.

[90] M. J. Hinich. Testing for gaussianity and linearity of a stationary time series.J. Time Ser. Anal., 3:169–176, 1982.

[91] M. J. Hinich and M. Wolinsky. Normalizing bispectra. J. Stat. Plan. Infer.,130:405–411, 2005.

[92] J. Honerkamp. Statistical Physics: An Advanced Approach with Applications.Springer, Heidelberg, third edt., 2012.

[93] W.-C. Hong. Rainfall forecasting by technological machine learning models.Appl. Math. Comput., 200:41–57, 2008.

[94] C. Hsiao. Autoregressive modeling and causal ordering of economic variables.J. Econ. Dyn. Control, 4:243–259, 1982.

[95] K. Itô. Stochastic integral. Proc. Imp. Acad., 20:519–524, 1944.

[96] D.N. Ivankov and A.V. Finkelstein. Prediction of protein folding rates fromthe amino acid sequence-predicted secondary structure. P. Nat. Acad. Sci.USA, 101:8942–8944, 2004.


[97] M. Jachan, H. Feldwisch-Drentrup, F. Posdziech, A. Brandt, D.-M. Alten-müller, A. Schulze-Bonhage, J. Timmer, and B. Schelter. Probabilistic fore-casts of epileptic seizures and evaluation by the Brier score. In J. VanderSloten, P. Verdonck, M. Nyssen, and J. Haueisen, eds., ECIFMBE 2008,IFMBE Proceedings 22, pp. 1701–1705. Springer, Berlin, 2009.

[98] M.O. Jackson and A. Watts. On the formation of interaction networks insocial coordination games. Game. Econ. Behav., 41:265–291, 2002.

[99] J. Jang and S.-Y. Hong. Quantitative forecast experiment of a heavy rainfallevent over Korea in a global model: Horizontal resolution versus lead timeissues. Meteorol. Atmos. Phys., 124:113–127, 2014.

[100] P.A. E.M. Janssen. Progress in ocean wave forecasting. J. Comput. Phys.,227:3572–3594, 2008.

[101] N. J. Jasmine, P.T. Muthiah, C. Arunagiri, and A. Subashini. Vibrationalspectra (experimental and theoretical), molecular structure, natural bond or-bital, HOMO–LUMO energy, Mulliken charge and thermodynamic analysisof N’-hydroxy-pyrimidine-2-carboximidamide by DFT approach. Spectrochim.Acta A, 144:215–225, 2015.

[102] O. Jensen and L. L. Colgin. Cross-frequency coupling between neuronal oscil-lations. Trends Cogn. Sci., 11:267–269, 2007.

[103] S.W. Kahler, E.W. Cliver, and A.G. Ling. Validating the proton predictionsystem (PPS). J. Atmos. Sol.-Terr. Phy., 69:43–49, 2007.

[104] R. E. Kalman. A new approach to linear filtering and prediction problems.J. Basic Eng.-T. ASME, 82:35–45, 1960.

[105] M. J. Kamiński and K. J. Blinowska. A new method of the description of theinformation flow in the brain structures. Biol. Cybern., 65:203–210, 1991.

[106] M. J. Kamiński, M. Ding, W.A. Truccolo, and S. L. Bressler. Evaluatingcausal relations in neural systems: Granger causality and directed transferfunction and statistical assessment of significance. Biol. Cybern., 85:145–157,2001.

[107] V. Khachatryan, A.M. Sirunyan, A. Tumasyan, W. Adam, T. Bergauer, M. Dragicevic, J. Erö, M. Friedl, R. Frühwirth, V.M. Ghete, et al. Study of vector boson scattering and search for new physics in events with two same-sign leptons and two jets. Phys. Rev. Lett., 114:051801, 2015.

[108] M. Killmann, L. Sommerlade, W. Mader, J. Timmer, and B. Schelter. Inference of time-dependent causal influences in networks. Biomed. Eng., 57:387–390, 2012.


[109] Y.C. Kim and E. J. Powers. Digital bispectral analysis of self-excited fluctu-ation spectra. Phys. Fluids, 21:1452–1453, 1978.

[110] Y.C. Kim and E. J. Powers. Digital bispectral analysis and its applicationsto nonlinear wave interactions. IEEE T. Plasma Sci., 7:120–131, 1979.

[111] Y.C. Kim and E. J. Powers. Digital bispectral analysis and its applicationsto nonlinear wave interactions. IEEE T. Plasma Sci., 7:123, 1979.

[112] K. Kirihara, A. J. Rissling, N.R. Swerdlow, D. L. Braff, and G.A. Light. Hier-archical organization of gamma and theta oscillatory dynamics in schizophre-nia. Biol. Psychiat., 71:873–880, 2012.

[113] P. Kowalczyk, S. Nema, P. Glendinning, I. Loram, and M. Brown. Auto-regressive moving average analysis of linear and discontinuous models of hu-man balance during quiet standing. Chaos, 24:022101, 2014.

[114] M.A. Kramer, A.B. L. Tort, and N. J. Kopell. Sharp edge artifacts and spu-rious coupling in EEG frequency comodulation measures. J. Neurosci. Meth.,170:352–357, 2008.

[115] A.V. Kravtsov and S. Borgani. Formation of galaxy clusters. Annu. Rev. Astron. Astrophys., 50:353–409, 2012.

[116] T. Kreuz, R.G. Andrzejak, F. Mormann, A. Kraskov, H. Stögbauer,C. E. Elger, K. Lehnertz, and P. Grassberger. Measure profile surrogates: Amethod to validate the performance of epileptic seizure prediction algorithms.Phys. Rev. E, 69:061915, 2004.

[117] M.-T. Kuhnert, C. Geier, C. E. Elger, and K. Lehnertz. Identifying impor-tant nodes in weighted functional brain networks: A comparison of differentcentrality approaches. Chaos, 22:023142, 2012.

[118] S. Kullback and R.A. Leibler. On information and sufficiency. Ann. Math.Stat., 22:79–86, 1951.

[119] H.R. Kunsch. The jackknife and the bootstrap for general stationary obser-vations. Ann. Stat., 17:1217–1241, 1989.

[120] C. Kurz, M. Schug, P. Eich, J. Huwer, P. Müller, and J. Eschner. Experimental protocol for high-fidelity heralded photon-to-atom quantum state transfer. Nat. Commun., 5:5527, 2014.

[121] J.-P. Lachaux, E. Rodriguez, J. Martinerie, and F. J. Varela. Measuring phasesynchrony in brain signals. Hum. Brain Mapp., 8:194–208, 1999.


[122] P. Lakatos, A. S. Shah, K.H. Knuth, I. Ulbert, G. Karmos, andC.E. Schroeder. An oscillatory hierarchy controlling neuronal excitability andstimulus processing in the auditory cortex. J. Neurophysiol., 94:1904–1911,2005.

[123] M. Lenz, M. Musso, Y. Linke, O. Tüscher, J. Timmer, C. Weiller, andB. Schelter. Joint EEG/fMRI state space model for the detection of di-rected interactions in human brains – a simulation study. Physiol. Meas.,32:1725–1736, 2011.

[124] K. S. Lii and K.N. Helland. Cross-bispectrum computation and variance es-timation. ACM T. Math. Software, 7:284–294, 1981.

[125] D. Liu, A. Pe'er, and A. Loeb. A two-component jet model for the tidal disruption event Swift J164449.3+573451. Astrophys. J., 798:13, 2015.

[126] Y.H. Liu, S.J. Young, L. Ji, and S.J. Chang. Noise properties of Mg-doped ZnO nanorods visible-blind photosensors. IEEE J. Sel. Top. Quant., 21:3800405, 2015.

[127] J. López-Azcárate, M. Tainta, M.C. Rodríguez-Oroz, M. Valencia,R. González, J. Guridi, J. Iriarte, J. A. Obeso, J. Artieda, and M. Alegre.Coupling between beta and high-frequency activity in the human subthala-mic nucleus may be a pathophysiological mechanism in Parkinson’s disease.J. Neurosci., 30:6667–6677, 2010.

[128] J. Lv and L. F. Luo. Statistical analyses of protein folding rates from the viewof quantum transition. Sci. China Life Sci., 57:1197–1212, 2014.

[129] H. Lütkepohl. New Introduction to Multiple Time Series Analysis. Springer,Berlin, 2005.

[130] M. Mader, J. Klatt, F. Amtage, B. Hellwig, W. Mader, L. Sommerlade,J. Timmer, and B. Schelter. Spectral and higher-order-spectral analysis oftremor time series. Clin. Exp. Pharmacol., 4:1000149, 2014.

[131] M. Mader, W. Mader, B. J. Gluckman, J. Timmer, and B. Schelter. Statisticalevaluation of forecasts. Phys. Rev. E, 90:022133, 2014.

[132] M. Mader, W. Mader, L. Sommerlade, J. Timmer, and B. Schelter. Block-bootstrapping for noisy data. J. Neurosci. Meth., 219:285–291, 2013.

[133] W. Mader, Y. Linke, M. Mader, L. Sommerlade, J. Timmer, and B. Schel-ter. A numerically efficient implementation of the expectation maximizationalgorithm for state space models. Appl. Math. Comput., 241:222–232, 2014.


[134] W. Mader, M. Mader, J. Timmer, M. Thiel, and B. Schelter. Networks: On the relation of bi- and multivariate measures. Sci. Rep., 5:10805, 2015.

[135] T. Maiwald, M. Winterhalder, R. Aschenbrenner-Scheibe, H.U. Voss,A. Schulze-Bonhage, and J. Timmer. Comparison of three nonlinear seizureprediction methods by means of the seizure prediction characteristic. PhysicaD, 194:357–368, 2004.

[136] S. Marceglia, A.M. Bianchi, S. Cerutti, and D. Servello. Cross-bispectralanalysis of local field potentials. In 4th International IEEE/EMBS Conf.Neural Engin., pp. 494–497. IEEE, Antalya, 2009.

[137] C. Marzban. Scalar measures of performance in rare-event situations. Weather Forecast., 13:753–763, 1998.

[138] C. Marzban. Bayesian probability and scalar performance measures in Gaus-sian models. J. Appl. Meteorol., 37:72–82, 1998.

[139] J. J. Metzger, R. Fleischmann, and T. Geisel. Statistics of extreme waves inrandom media. Phys. Rev. Lett., 112:203903, 2014.

[140] E.K. Miller and T. J. Buschman. Working memory capacity: Limits on thebandwidth of cognition. Daedalus, 144:112–122, 2015.

[141] M. Miyakoshi, A. Delorme, T. Mullen, K. Kojima, S. Makeig, and E.Asano. Automated detection of cross-frequency coupling in the electrocor-ticogram for clinical inspection. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2013,2013:3282–3285, 2013.

[142] V. Miskovic, D.A. Moscovitch, D. L. Santesso, R. E. McCabe, M.M. Antony,and L.A. Schmidt. Changes in EEG cross-frequency coupling during cognitivebehavioral therapy for social anxiety disorder. Psychol. Sci., 22:507–516, 2011.

[143] I. A. Moldovan, A. Apostol, A. Moldovan, C. Ionescu, and A.O. Placinta. Thebio-location method used for stress forecasting in Vrancea (Romania) seismiczone. Rom. Rep. Phys., 65:261–270, 2013.

[144] L.V. Moran and L.E. Hong. High vs low frequency neural oscillations in schizophrenia. Schizophrenia Bull., 37:659–663, 2011.

[145] F. Mormann, R.G. Andrzejak, C. E. Elger, and K. Lehnertz. Seizure predic-tion: The long and winding road. Brain, 130:314–333, 2007.

[146] F. Mormann, T. Kreuz, C. Rieke, R.G. Andrzejak, A. Kraskov, P. David, C.E. Elger, and K. Lehnertz. On the predictability of epileptic seizures. Clin. Neurophysiol., 116:569–587, 2005.


[147] F. Mormann, J. Fell, N. Axmacher, B. Weber, K. Lehnertz, C. E. Elger, andG. Fernández. Phase/amplitude reset and theta–gamma interaction in thehuman medial temporal lobe during a continuous word recognition memorytask. Hippocampus, 15:890–900, 2005.

[148] S. Morrison, G. Kerr, K.M. Newell, and P.A. Silburn. Differential time- and frequency-dependent structure of postural sway and finger tremor in Parkinson's disease. Neurosci. Lett., 443:123–128, 2008.

[149] A.H. Murphy. Probabilities, odds, and forecasts of rare events. WeatherForecast., 6:302–307, 1991.

[150] M. Namvaran and A. Negarestani. Measurement of soluble radon in Jooshanspa (SE of Iran) and study its performance in earthquake forecasting process.Rom. J. Phys., 58:373–382, 2013.

[151] M.E. J. Newman. The structure and function of complex networks. SIAMRev., 45:167–256, 2003.

[152] C. L. Nikias and M.R. Raghuveer. Bispectrum estimation: A digital signalprocessing framework. P. IEEE, 75:869–891, 1987.

[153] S. Nowakowska, A. Wäckerlin, S. Kawai, T. Ivas, J. Nowakowski, S. Fatayer,C. Wäckerlin, T. Nijs, E. Meyer, J. Björk, et al. Interplay of weak interactionsin the atom-by-atom condensation of xenon within quantum boxes. Nat.Commun., 6:6071, 2015.

[154] K.A. Olive, K. Agashe, C. Amsler, M. Antonelli, J. F. Arguin, D.M. Asner,H. Baer, H.R. Band, R.M. Barnett, T. Basaglia, et al. Review of particlephysics: Particle Data Group. Chinese Phys. C, 38:090001, 2014.

[155] A. Oth and A.E. Kaiser. Stress release and source scaling of the 2010–2011Canterbury, New Zealand earthquake sequence from spectral inversion ofground motion data. Pure Appl. Geophys., 171:2767–2782, 2014.

[156] G.A. Pagani and M. Aiello. The power grid as a complex network: A survey.Physica A, 392:2688–2700, 2013.

[157] A. Parikh, J. José, and G. Sala. Classical novae and type I X-ray bursts:Challenges for the 21st century. AIP Advances, 4:041002, 2014.

[158] J.W. Patty and E.M. Penn. Analyzing big data: Social choice and measure-ment. PS-Polit. Sci. Polit., 48:95–101, 2015.

[159] Y. Pawitan. In All Likelihood: Statistical Modelling and Inference Using Like-lihood. Oxford University Press, Oxford, 2001.


[160] M. Peifer, B. Schelter, B. Guschlbauer, B. Hellwig, C.H. Lücking, andJ. Timmer. On studentising and blocklength selection for the bootstrap ontime series. Biometrical J., 47:346–357, 2005.

[161] W.D. Penny, E. Duzel, K. J. Miller, and J.G. Ojemann. Testing for nestedoscillation. J. Neurosci. Meth., 174:50–61, 2008.

[162] M. Penttonen and G. Buzsáki. Natural logarithmic relationship between brainoscillators. Thalamus Relat. Syst., 2:145–152, 2003.

[163] S. L. Pepke and J.M. Carlson. Predictability of self-organizing systems. Phys.Rev. E, 50:236–242, 1994.

[164] D.B. Percival and A.T. Walden. Spectral Analysis for Physical Applications:Multitaper and Conventional Univariate Techniques. Cambridge UniversityPress, Cambridge, 1993.

[165] M. Pfützner, M. Karny, L.V. Grigorenko, and K. Riisager. Radioactive decaysat limits of nuclear stability. Rev. Mod. Phys., 84:567–619, 2012.

[166] C. Plesa and C. Dekker. Data analysis methods for solid-state nanopores.Nanotechnology, 26:084003, 2015.

[167] C. Poletto, M.F. Gomes, A. Pastore y Piontti, L. Rossi, L. Bioglio, D. L. Chao,I.M. Longini, M.E. Halloran, V. Colizza, and A. Vespignani. Assessing theimpact of travel restrictions on international spread of the 2014 West AfricanEbola epidemic. Euro Surveill., 19:8–13, 2014.

[168] W.H. Press, S. A. Teukolsky, W.T. Vetterling, and B.P. Flannery. NumericalRecipes in C. Cambridge University Press, New York, second edt., 2002.

[169] M.B. Priestley. Spectral Analysis and Time Series. Academic Press, London,1989.

[170] M.B. Priestley and T. Subba Rao. A test for nonstationarity of time series.J. Roy. Statist. Soc. B, 31:140–149, 1969.

[171] F. S. Queiroz, K. Sinha, and A. Strumia. Leptoquarks, dark matter, andanomalous LHC events. Phy. Rev. D, 91:035006, 2014.

[172] J. Raethjen, M. Lindemann, H. Schmaljohann, R. Wenzelburger, G. Pfister,and G. Deuschl. Multiple oscillators are causing Parkinsonian and essentialtremor. Mov. Disord., 15:84–94, 2000.

[173] I. I. A. Rahman and N.M.A. Alias. Rainfall forecasting using an artificialneural network model to prevent flash floods. In High Capacity OpticalNetworks and Enabling Technologies (HONET), 2011, pp. 323–328. IEEE,Riyadh, 2011.


[174] M.B. Rajarshi. Statistical Inference for Discrete Time Stochastic Processes. Springer, New Delhi, 2012.

[175] S. Ramgopal, S. Thome-Souza, M. Jackson, N. E. Kadish, I. S. Fernández,J. Klehm, W. Bosl, C. Reinsberger, S. Schachter, and T. Loddenkemper.Seizure detection, seizure prediction, and closed-loop warning systems inepilepsy. Epilepsy Behav., 37:291–307, 2014.

[176] M.C.V. Ramirez, H. F. deCamposVelho, and N. J. Ferreira. Artificial neuralnetwork technique for rainfall forecasting applied to the Sao Paulo region.J. Hydrol., 301:146–162, 2005.

[177] S. Rampone and A. Valente. Neural network aided evaluation of landslidesusceptibility in Southern Italy. Int. J. Mod. Phys. C, 23:1250002, 2012.

[178] H. E. Rauch, F. Tung, and C.R. Striebel. Maximum likelihood estimates oflinear dynamic systems. AIAA J., 3:1445–1450, 1965.

[179] I. Reidler, M. Nixon, Y. Aviad, S. Guberman, A.A. Friesem, M. Rosenbluh,N. Davidson, and I. Kanter. Coupled lasers: Phase versus chaos synchroniza-tion. Opt. Lett., 38:4174–4177, 2013.

[180] S. Rengaraj, S. Venkataraj, S.H. Jee, Y. Kim, C.W. Tai, E. Repo, A. Koistinen, A. Ferancova, and M. Sillanpää. Cauliflower-like CdS microspheres composed of nanocrystals and their physicochemical properties. Langmuir, 27:352–358, 2011.

[181] W. Rindler. Relativity: Special, General and Cosmological. Oxford UniversityPress, Oxford, 2001.

[182] M. Rivlin-Etzion, O. Marmor, G. Heimer, A. Raz, A. Nini, and H. Bergman.Basal ganglia oscillations and pathophysiology of movement disorders. Curr.Opin. Neurobiol., 16:629–637, 2006.

[183] J. B. Rundle, J. R. Holliday, W.R. Graves, D. L. Turcotte, K. F. Tiampo, andW. Klein. Probabilities for large events in driven threshold systems. Phys.Rev. E, 86:021106, 2012.

[184] P. Sanò, G. Panegrossi, D. Casella, F. Di Paola, L. Milani, A. Mugnai,M. Petracca, and S. Dietrich. The passive microwave neural network precipi-tation retrieval (PNPR) algorithm for AMSU/MHS observations: Descriptionand application to European case studies. Atmos. Meas. Tech., 8:837–857,2015.

[185] G. Scasserra, J. P. Stewart, P. Bazzurro, G. Lanzo, and F. Mollaioli. A com-parison of NGA ground-motion prediction equations to Italian data. B. Seis-mol. Soc. Am., 99:2961–2978, 2009.


[186] B. Schelter. Analyzing Multivariate Dynamical Processes – From Linear toNonlinear Approaches. Thesis. Freiburg, 2006.

[187] B. Schelter, M. Winterhalder, R. Dahlhaus, J. Kurths, and J. Timmer. Partialphase synchronization for multivariate synchronizing systems. Phys. Rev.Lett., 96:208103, 2006.

[188] B. Schelter, M. Winterhalder, M. Eichler, M. Peifer, B. Hellwig,B. Guschlbauer, C.H. Lücking, R. Dahlhaus, and J. Timmer. Testing fordirected influences among neural signals using partial directed coherence.J. Neurosci. Methods, 152:210–219, 2006.

[189] B. Schelter, M. Winterhalder, T. Maiwald, A. Brandt, A. Schad, A. Schulze-Bonhage, and J. Timmer. Testing statistical significance of multivariate timeseries analysis techniques for epileptic seizure prediction. Chaos, 16:013108,2006.

[190] B. Schelter, M. Mader, W. Mader, L. Sommerlade, B. Platt, Y.-C. Lai,C. Grebogi, and M. Thiel. Overarching framework for data-based modelling.Europhys. Lett., 105:30004, 2014.

[191] B. Schelter, J. Timmer, and A. Schulze-Bonhage, eds. Seizure Prediction inEpilepsy: From Basic Mechanisms to Clinical Applications. Wiley, Weinheim,2008.

[192] B. Schelter, J. Timmer, and M. Eichler. Assessing the strength of directedinfluences among neural signals using renormalized partial directed coherence.J. Neurosci. Methods., 179:121–130, 2009.

[193] D. J. L.G. Schutter and G.G. Knyazev. Cross-frequency coupling of brainoscillations in studying motivation and emotion. Motiv. Emotion, 36:46–54,2012.

[194] S.A. Shimamoto, E. S. Ryapolova-Webb, J. L. Ostrem, N.B. Galifianakis,K. J. Miller, and P.A. Starr. Subthalamic nucleus neurons are synchronizedto primary motor cortex local field potentials in Parkinson’s disease. J. Neu-rosci., 33:7220–7233, 2013.

[195] R.H. Shumway and D. S. Stoffer. An approach to time series smoothing andforecasting using the EM algorithm. J. Time Ser. Anal., 3:253–264, 1982.

[196] R.H. Shumway and D.S. Stoffer. Time Series Analysis and its Applications: With R Examples. Springer, New York, 2010.

[197] Z.K. Smith, T.R. Detman, M. Dryer, C.D. Fry, C.-C. Wu, W. Sun, and C.S. Deehr. A verification method for space weather forecasting models using solar data to predict arrivals of interplanetary shocks at Earth. IEEE T. Plasma Sci., 32:1498–1505, 2004.

[198] I. Soltesz and M. Deschênes. Low- and high-frequency membrane potential oscillations during theta activity in CA1 and CA3 pyramidal neurons of the rat hippocampus under ketamine-xylazine anesthesia. J. Neurophysiol., 70:97–116, 1993.

[199] L. Sommerlade, M. Thiel, B. Platt, A. Plano, G. Riedel, C. Grebogi, J. Tim-mer, and B. Schelter. Inference of Granger causal time-dependent influencesin noisy multivariate time series. J. Neurosci. Meth., 203:173–185, 2012.

[200] L. Sommerlade, M. Mader, W. Mader, J. Timmer, M. Thiel, C. Grebogi,and B. Schelter. Optimized spectral estimation for nonlinear synchronizingsystems. Phys. Rev. E, 89:032912, 2014.

[201] M.C. Soriano, J. García-Ojalvo, C.R. Mirasso, and I. Fischer. Complex pho-tonics: Dynamics and applications of delay-coupled semiconductors lasers.Rev. Mod. Phys., 85:421–470, 2013.

[202] D. Sornette. Dragon-kings, black swans and the prediction of crises. Int. J.Terraspace Sci. Engin., 2:1–18, 2009.

[203] P. Stoica and P. Babu. Maximum-likelihood nonparametric estimation ofsmooth spectra from irregularly sampled data. IEEE T. Signal Proces.,59:5746–5758, 2011.

[204] S.H. Strogatz. Exploring complex networks. Nature, 410:268–276, 2001.

[205] P. Tass, M.G. Rosenblum, J. Weule, J. Kurths, A. Pikovsky, J. Volkmann,A. Schnitzler, and H.-J. Freund. Detection of n:m phase locking from noisydata: Application to magnetoencephalography. Phys. Rev. Lett., 81:3291–3294, 1998.

[206] C.A. Teixeira, B. Direito, H. Feldwisch-Drentrup, M. Valderrama,R. P. Costa, C. Alvarado-Rojas, S. Nikolopoulos, M. LeVanQuyen, J. Tim-mer, B. Schelter, and A. Dourado. EPILAB: A software package for studieson the prediction of epileptic seizures. J. Neurosci. Meth., 200:257–271, 2011.

[207] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J.D. Farmer. Testingfor nonlinearity in time series: The method of surrogate data. Physica D,58:77–94, 1992.

[208] J. Timmer. Power of surrogate data testing with respect to nonstationarity.Phys. Rev. E, 58:5153–5156, 1998.


[209] J. Timmer, M. Lauk, S. Häußler, V. Radt, B. Köster, B. Hellwig,B. Guschlbauer, C.H. Lücking, M. Eichler, and G. Deuschl. Cross-spectralanalysis of tremor time series. Int. J. Bifurcat. Chaos, 10:2595–2610, 2000.

[210] A.B. L. Tort, M.A. Kramer, C. Thorn, D. J. Gibson, Y. Kubota, A.M. Gray-biel, and N. J. Kopell. Dynamic cross-frequency couplings of local field po-tential oscillations in rat striatum and hippocampus during performance of aT-maze task. Proc. Natl. Acad. Sci. USA, 105:20517–20522, 2008.

[211] A.B. L. Tort, R.W. Komorowski, J. R. Manns, N. J. Kopell, and H. Eichen-baum. Theta–gamma coupling increases during the learning of item–contextassociations. Proc. Natl. Acad. Sci. USA, 106:20942–20947, 2009.

[212] A.B. L. Tort, R. Komorowski, H. Eichenbaum, and N. Kopell. Measuringphase-amplitude coupling between neuronal oscillations of different frequen-cies. J. Neurophysiol., 104:1195–1210, 2010.

[213] N. Tremblay, A. Barrat, C. Forest, M. Nornberg, J.-F. Pinton, and P. Borgnat.Bootstrapping under constraint for the assessment of group behavior in humancontact networks. Phys. Rev. E, 88:052812, 2013.

[214] J.W. Tukey. An introduction to the measurement of spectra. In The Collected Works of John W. Tukey. U. Grenander, ed., pp. 359ff. Chapman & Hall, London, 1984.

[215] N. Ubbelohde, C. Fricke, C. Flindt, F. Hohls, and R. J. Haug. Measurementof finite-frequency current statistics in a single-electron transistor. Nat. Com-mun., 3:612, 2012.

[216] F. Valach, M. Revallo, P. Hejda, and J. Bochníček. Predictions of SEP eventsby means of a linear filter and layer-recurrent neural network. Acta Astronaut.,69:758–766, 2011.

[217] P. van de Kamp. The nearby stars. Annu. Rev. Astron. Astrophys., 9:103–126, 1971.

[218] A. von Stein and J. Sarnthein. Different frequencies for different scales of cortical integration: From local gamma to long range alpha/theta synchronization. Int. J. Psychophysiol., 38:301–313, 2000.

[219] B. Voytek, R.T. Canolty, A. Shestyuk, N. E. Crone, J. Parvizi, andR.T. Knight. Shifts in gamma phase-amplitude coupling frequency from thetato alpha over posterior cortex during visual tasks. Front. Hum. Neurosci.,4:191, 2010.


[220] B. Voytek, L. Secundo, A. Bidet-Caulet, D. Scabini, S. I. Stiver, A.D. Gean,G.T. Manley, and R.T. Knight. Hemicraniectomy: A new model for humanelectrophysiology with high spatio-temporal resolution. J. Cognitive Neu-rosci., 22:2491–2502, 2010.

[221] E.A. Wan and A.T. Nelson. Neural dual extended Kalman filtering: Applica-tions in speech enhancement and monaural blind signal separation. In Conf.Proc. IEEE Neural Netw. Signal Process., pp. 466–475. IEEE, Amelia Island,1997.

[222] D. J. Watts and S.H. Strogatz. Collective dynamics of ’small-world’ networks.Nature, 393:440–442, 1998.

[223] N. Wessel, U. Meyerfeldt, C. Ziehmann, A. Schirdewan, and J. Kurths. Sta-tistical versus individual forecasting of life-threatening cardiac arrhythmias.In AIP Conf. Proc., pp. 110–115. American Institute of Physics, New York,2002.

[224] M. S. Wigmosta, L.W. Vail, and D.P. Lettenmaier. A distributed hydrology-vegetation model for complex terrain. Water Resour. Res., 30:1665–1679,1994.

[225] D. S. Wilks. Statistical Methods in the Atmospheric Sciences. Academic PressInc., Burlington, second edt., 2005.

[226] M. Winterhalder, T. Maiwald, H.U. Voss, R. Aschenbrenner-Scheibe, J. Tim-mer, and A. Schulze-Bonhage. The seizure prediction characteristic: A gen-eral framework to assess and compare seizure prediction methods. EpilepsyBehav., 4:318–325, 2003.

[227] M. Winterhalder, B. Schelter, W. Hesse, K. Schwab, L. Leistritz, D. Klan,R. Bauer, J. Timmer, and H. Witte. Comparison of linear signal processingtechniques to infer directed interactions in multivariate neural systems. Signal.Process., 85:2137–2160, 2005.

[228] C. F. Wu. On the convergence properties of the EM algorithm. Ann. Stat.,11:95–103, 1983.

[229] C.K. Young and J. J. Eggermont. Coupling of mesoscopic brain oscillations:Recent advances in analytical and theoretical perspectives. Prog. Neurobiol.,89:61–78, 2009.

[230] I. Yucel, A. Onen, K.K. Yilmaz, and D. J. Gochis. Calibration and evaluationof a flood forecasting system: Utility of numerical weather prediction model,data assimilation and satellite-based rainfall. J. Hydrol., 523:49–66, 2015.


[231] R. Zelmann, F. Mari, J. Jacobs, M. Zijlmans, R. Chander, and J. Gotman.Automatic detector of high frequency oscillations for human recordings withmacroelectrodes. In Conf. Proc. Eng. Med. Biol. Soc. Ann., pp. 2329–2333.IEEE, Buenos Aires, 2010.

[232] Y. Zhao, Y. An, and Q. Ai. Research on size and location of distributed gen-eration with vulnerable node identification in the active distribution network.IET Gener. Transm. Dis., 8:1801–1809, 2014.


Appendix


A. Maximum-likelihood estimation

In this appendix, maximum-likelihood methods of parameter estimation for autoregressive processes in the state-space model (SSM) are described in further detail. The so-called incomplete data likelihood is derived in Sec. A.1 under the assumption that the underlying process is not observed directly. In the course of this derivation, the Kalman filter is introduced. The Kalman filter is also a prerequisite for an alternative maximum-likelihood approach, which is based on the complete data likelihood within the scope of the Expectation-Maximization algorithm, as described in Sec. 2.1.2. For this approach, the Kalman smoother and the lag-one covariance smoother are needed; they are summarized in Sec. A.2. This appendix follows the considerations in [196], unless indicated otherwise.

A.1. Incomplete data likelihood and Kalman filter

The SSM is given by, cf. Eqs. (2.17) and (2.18),

X(t) = A X(t−1) + ε(t) ,  with ε(t) ∼ N(0, Σ_ε) ,   (A.1)
Y(t) = C X(t) + η(t) ,  with η(t) ∼ N(0, Σ_η) ,   (A.2)

where X(t) and Y(t) are vector-valued and the initial value X_0 ∼ N(µ_0, Σ_0) is unknown. Given a set of sampled observations {y(t)}_{t=1,…,T}, the incomplete data log-likelihood is

L_i(y(1), …, y(T) | Θ) = (1/2) Σ_{t=1}^{T} ln det(Σ_{ξ_Θ}) + (1/2) Σ_{t=1}^{T} ξ_Θ(t)′ Σ_{ξ_Θ}^{−1} ξ_Θ(t) ,   (A.3)

with parameters Θ = {µ_0, Σ_0, A, Σ_ε, Σ_η} and det(·) denoting the determinant. It is based on the innovations

ξ_Θ(t) := y(t) − y(t|t−1) ,   (A.4)

the residuals of the observed y(t) with respect to the expected observation

y(t|t−1) = E[ y(t) | y(1), …, y(t−1) ] ,   (A.5)

conditioned on the previous observations y(1), …, y(t−1). In particular,

y(t|t−1) = C x(t|t−1) = C E[ x(t) | y(1), …, y(t−1) ] .   (A.6)


The expectation E[· | y(1), …, y(t)] denotes the expectation with respect to the probability density induced by the observations y(1), …, y(t) up to time point t when assuming the SSM, Eqs. (A.1) and (A.2). The covariance of the innovations ξ_Θ(t), Eq. (A.4), is

Σ_{ξ_Θ} := C P(t|t−1) C′ + Σ_η ,   (A.7)

with

P(t|t−1) = E[ (x(t) − x(t|t−1)) (x(t) − x(t|t−1))′ | y(1), …, y(t−1) ] ,   (A.8)

the covariance of the state x(t|t−1) given the observations.

All constituents of the log-likelihood, Eq. (A.3), may be derived from the iterative Kalman filter [104]. It consists of the iterations, for t = 1, …, T,

x(t|t−1) = A x(t−1|t−1) ,   (A.9)
P(t|t−1) = A P(t−1|t−1) A′ + Σ_ε ,   (A.10)
K(t) = P(t|t−1) C′ ( C P(t|t−1) C′ + Σ_η )^{−1} ,   (A.11)
x(t|t) = x(t|t−1) + K(t) ( y(t) − C x(t|t−1) ) ,   (A.12)
P(t|t) = P(t|t−1) − K(t) C P(t|t−1) ,   (A.13)

for a given set of parameters Θ, including the initial state x(0|0) := µ_0 with covariance P(0|0) := Σ_0. The Kalman gain K(t), Eq. (A.11), weights the one-step-ahead predictions, Eqs. (A.9) and (A.10), against the actual observation in the filter equations (A.12) and (A.13) [104]. Using E[x(t) | y(1), …, y(t−1)] = x(t|t−1), the innovations ξ_Θ(t), Eq. (A.4), and their covariances Σ_{ξ_Θ}, Eq. (A.7), are derived from the prediction equations, Eqs. (A.9) and (A.10), and constitute the incomplete data log-likelihood, Eq. (A.3). This likelihood refers to the parameters Θ_j chosen to iterate the Kalman filter, Eqs. (A.9)–(A.13). To obtain the set of parameters that fits the observations best, the following iterative procedure has to be conducted:

Define an initial set of parameters Θ_0, then iterate for j = 0, 1, … until the parameters do not change up to a pre-defined precision.

1. Apply the Kalman filter, Eqs. (A.9)–(A.13), to obtain x(t|t−1) and P(t|t−1) for t = 1, …, T, using the current set of parameters Θ_j; see the sketch after this list.

2. Constitute the incomplete data log-likelihood, Eq. (A.3), using x(t|t−1) and P(t|t−1) from the Kalman filter.

3. Numerically find the minimum of the incomplete data log-likelihood to obtain an update of the set of parameters, Θ_{j+1}, see e.g. [168].
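As an illustration of steps 1 and 2, the following sketch runs the filter, Eqs. (A.9)–(A.13), and accumulates the objective of Eq. (A.3); it assumes NumPy arrays of suitable dimensions and makes no claim about a numerically efficient implementation (cf. [133]).

```python
import numpy as np

def kalman_filter_nll(y, A, C, Se, Sh, mu0, S0):
    """Kalman filter, Eqs. (A.9)-(A.13), returning the incomplete data
    objective of Eq. (A.3).  y: (T, ny) observations; Se, Sh: state and
    observation noise covariances; mu0, S0: initial state and covariance."""
    x, P = mu0, S0
    nll = 0.0
    for t in range(y.shape[0]):
        x_pred = A @ x                        # (A.9)  state prediction
        P_pred = A @ P @ A.T + Se             # (A.10) prediction covariance
        S_xi = C @ P_pred @ C.T + Sh          # (A.7)  innovation covariance
        xi = y[t] - C @ x_pred                # (A.4)  innovation
        K = P_pred @ C.T @ np.linalg.inv(S_xi)  # (A.11) Kalman gain
        x = x_pred + K @ xi                   # (A.12) filtered state
        P = P_pred - K @ C @ P_pred           # (A.13) filtered covariance
        nll += 0.5 * (np.log(np.linalg.det(S_xi))
                      + xi @ np.linalg.inv(S_xi) @ xi)  # (A.3) accumulation
    return nll
```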

The alternative complete data log-likelihood approach is summarized in Sec. 2.1.2.It uses the results of Sec. A.2.


A.2. Kalman smoother and lag-one covariance smoother

While the Kalman filter, App. A.1, Eqs. (A.9)–(A.13), uses only previous observations, the Kalman smoother is based on a backward recursion, such that all observations y(1), …, y(T) are incorporated at each time t = 1, …, T. The Kalman smoother is given by

x(t−1|T) = x(t−1|t−1) + J(t−1) ( x(t|T) − x(t|t−1) ) ,   (A.14)
P(t−1|T) = P(t−1|t−1) + J(t−1) ( P(t|T) − P(t|t−1) ) J(t−1)′ ,   (A.15)
J(t−1) = P(t−1|t−1) A′ P^{−1}(t|t−1) ,   (A.16)

with initial states obtained from the final states and covariances of the Kalman filter. The resulting smoothed states x(t|T) and covariances P(t|T) are based on all observations y(1), …, y(T). The analog of the Kalman gain K(t) of the filter is the Kalman smoothing gain J(t) of the smoother. From the Kalman smoother, most constituents of the Expectation-Maximization algorithm applied to the complete data log-likelihood may be derived, cf. Sec. 2.1.2. A code sketch of the backward recursion is given below.

The constituent that is not derived from the Kalman smoother is given by the lag-one covariance smoother. It iterates the covariance of the states at two successive time points,

P(t−1, t−2|T) = E[ (x(t−1) − x(t−1|T)) (x(t−2) − x(t−2|T))′ | y(1), …, y(T) ] ,   (A.17)

according to

P(t−1, t−2|T) = P(t−1|t−1) J(t−2)′ + J(t−1) ( P(t, t−1|T) − A P(t−1|t−1) ) J(t−2)′ ,   (A.18)

with initial value P(T, T−1|T) = (1 − K(T) C) A P(T−1|T−1), and with P(T|T), K(t), and J(t) obtained from the Kalman filter and smoother, Eqs. (A.11) and (A.16), respectively.
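A minimal sketch of Eqs. (A.14)–(A.16), assuming the filtered quantities x(t|t), P(t|t) and the predictions x(t|t−1), P(t|t−1) have been stored while running the filter above; the lag-one recursion (A.18) is omitted for brevity.

```python
import numpy as np

def kalman_smoother(A, x_filt, P_filt, x_pred, P_pred):
    """Backward recursion of Eqs. (A.14)-(A.16).

    x_filt[t] = x(t|t), P_filt[t] = P(t|t) for t = 0..T-1;
    x_pred[t] = x(t|t-1), P_pred[t] = P(t|t-1) from the filter run."""
    T = len(x_filt)
    x_s, P_s = x_filt.copy(), P_filt.copy()   # initialized at t = T
    for t in range(T - 1, 0, -1):
        # (A.16) smoothing gain J(t-1) = P(t-1|t-1) A' P(t|t-1)^{-1}
        J = P_filt[t - 1] @ A.T @ np.linalg.inv(P_pred[t])
        # (A.14) smoothed state
        x_s[t - 1] = x_filt[t - 1] + J @ (x_s[t] - x_pred[t])
        # (A.15) smoothed covariance
        P_s[t - 1] = P_filt[t - 1] + J @ (P_s[t] - P_pred[t]) @ J.T
    return x_s, P_s
```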


B. Definition of the renormalized partial directed coherence

The renormalized partial directed coherence (rPDC), cf. Eq. (2.44), [192]

R_jk(ν) = F_jk(ν)′ Υ_jk^{−1}(ν) F_jk(ν) ,  j ≠ k ,   (B.1)

is a normalized version of the absolute value of [192]

F_jk(ν) = ( −Σ_{τ=1}^{p} A_jk(τ) cos(ντ) ,  Σ_{τ=1}^{p} A_jk(τ) sin(ντ) )′ .   (B.2)

The vector F_jk contains the real and the imaginary part of the Fourier transform of the parameter matrices A(τ), cf. Eq. (2.40). The normalization is given by [192]

Υ_jk(ν) = Σ_{τ1,τ2=1}^{d} H_kk(τ1, τ2) Σ_{ε,jj} ( cos(τ1 ν) cos(τ2 ν)   −cos(τ1 ν) sin(τ2 ν)
                                                  −sin(τ1 ν) cos(τ2 ν)    sin(τ1 ν) sin(τ2 ν) ) ,   (B.3)

which is the covariance

g_jk(ν) = Σ_{τ1,τ2=1}^{d} cov( A_jk(τ1), A_jk(τ2) ) ( cos(τ1 ν) cos(τ2 ν)   −cos(τ1 ν) sin(τ2 ν)
                                                      −sin(τ1 ν) cos(τ2 ν)    sin(τ1 ν) sin(τ2 ν) )   (B.4)

of F_jk(ν), divided by the number of data points, T, used for estimation. In this appendix, the identity

cov( A_jk(τ1), A_jk(τ2) ) = T H_kk(τ1, τ2) Σ_{ε,jj} ,   (B.5)

cf. Eq. (2.42), is derived. Since the rPDC has been proposed under the assumption of directly observed states x(t) instead of y(t), this appendix presents the results for directly observed x(t), following the considerations in [129]. When the states are observed through an observation function, as modeled by the state-space model (SSM), the key adaptation is to consider the smoothed state x(t|T), as obtained from the Kalman smoother, instead of x(t). This is due to the fact that, for parameter estimation in the SSM, it is not the complete data likelihood that is maximized, but its expectation conditioned on the observations y(t); this conditional expectation is carried by the smoothed states x(t|T) obtained from the Kalman smoother.
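For concreteness, the following sketch evaluates Eqs. (B.1)–(B.3) for one pair (j, k) from given estimates of the AR coefficients A(τ), the matrix of entries H_kk(τ1, τ2), and the noise covariance Σ_ε; the argument layout and calling convention are assumptions made for this illustration.

```python
import numpy as np

def rpdc(A, Hkk, Seps, j, k, nu):
    """Renormalized partial directed coherence, Eqs. (B.1)-(B.3).

    A:    (p, N, N) AR coefficient matrices A(1), ..., A(p)
    Hkk:  (p, p) matrix of entries H_kk(tau1, tau2) for the chosen source k
    Seps: (N, N) driving-noise covariance
    nu:   normalized angular frequency"""
    p = A.shape[0]
    taus = np.arange(1, p + 1)
    c, s = np.cos(nu * taus), np.sin(nu * taus)
    a = A[:, j, k]
    F = np.array([-(a @ c), a @ s])           # (B.2) Re/Im of the Fourier transform
    # (B.3): 2x2 normalization built from the cos/sin outer products
    blocks = np.array([[np.outer(c, c), -np.outer(c, s)],
                       [-np.outer(s, c), np.outer(s, s)]])
    V = Seps[j, j] * np.einsum('abtu,tu->ab', blocks, Hkk)
    return F @ np.linalg.solve(V, F)          # (B.1) quadratic form
```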


By the example of an AR2[2], i.e., a two-dimensional autoregressive process of order two, the entries of H_kk(τ1, τ2) are made explicit. Since all quantities in this appendix are estimates, all hat signs are omitted for the sake of simplicity. To prove Eq. (B.5), it is useful to introduce a notation based on the vec-operator, which stacks all columns of its argument underneath each other; e.g., for the process matrices of an AR2[2], the vec-operator yields

vec(A) = vec( A11(1) A12(1) A11(2) A12(2)
              A21(1) A22(1) A21(2) A22(2) )
       = ( A11(1), A21(1), A12(1), A22(1), A11(2), A21(2), A12(2), A22(2) )′ .   (B.6)

Note that here, only the dynamic parameters of the process are contained in the parameter matrix,

A := ( A11(1) A12(1) A11(2) A12(2)
       A21(1) A22(1) A21(2) A22(2) ) .   (B.7)

The covariance of vec(A) is the (8×8)-matrix of all pairwise covariances of its entries,

cov( vec(A), vec(A)′ ) = ( cov( A_jk(τ1), A_lm(τ2) ) ) ,   (B.8)

with rows and columns ordered as in Eq. (B.6), i.e., A11(1), A21(1), A12(1), A22(1), A11(2), A21(2), A12(2), A22(2). For the rPDC, and thus for identity (B.5), only the terms cov( A_jk(τ1), A_jk(τ2) ) with j ≠ k of this matrix are needed. Here and in the following, only these relevant entries are written out explicitly; all other entries are abbreviated by ∗.

Now, consider the more general case of an ARN[d],

x(t) = A(1) x(t−1) + … + A(d) x(t−d) + ε(t) ,   (B.9)

with the (N×N) parameter matrices

A(τ) = ( A11(τ) … A1N(τ) ; … ; AN1(τ) … ANN(τ) ) ,  τ = 1, …, d ,

and x(t) = (x1(t), x2(t), …, xN(t))′. Delay-embedding yields the (Nd×1)-vector

z(t) = ( x(t)′, x(t−1)′, …, x(t−d+1)′ )′ .   (B.10)

Collecting all time points into one (Nd×T)-matrix,

Ξ = [ z(1), …, z(T) ] ,   (B.11)

and defining

ξ := vec( x(1), …, x(T) )   (B.12)

and

vec(ε) := vec( ε(1), …, ε(T) ) ,   (B.13)

the autoregressive process, Eq. (B.9), is rewritten as

ξ = (Ξ′ ⊗ 1_N) vec(A) + vec(ε) ,   (B.14)

with 1_N the (N×N) identity matrix and the Kronecker product ⊗ defined by

V ⊗ W = ( V11 W … V1r W ; … ; Vr1 W … Vrr W )   (B.15)

for matrices V = (V_ij) of size (r×r) and W of size (s×s). The rewriting in Eq. (B.14) can be verified numerically, as sketched below.


When the parameters A are determined by maximum-likelihood estimation, their covariance is obtained from the negative inverse of the second derivative of the corresponding log-likelihood. The relevant terms of the log-likelihood are those which contain the parameters A, yielding

L_r := −(1/2) [ ξ − (Ξ′ ⊗ 1_N) vec(A) ]′ (1_T ⊗ Σ_ε^{−1}) [ ξ − (Ξ′ ⊗ 1_N) vec(A) ] ,   (B.16)

with vec(ε) ∼ N(0, 1_T ⊗ Σ_ε). Note that, in the case of the SSM, Ξ contains the delay-embedded smoothed states x(t|T) instead of the delay-embedded x(t), according to Eqs. (B.10) and (B.11); all following considerations are then analogous. Calculating the second derivative of this log-likelihood with respect to vec(A) leads to the covariance cov( vec(A), vec(A)′ ). The first derivative of L_r is

∂L_r / ∂vec(A) = (Ξ ⊗ 1_N) (1_T ⊗ Σ_ε^{−1}) [ ξ − (Ξ′ ⊗ 1_N) vec(A) ]
               = (Ξ ⊗ Σ_ε^{−1}) ξ − (ΞΞ′ ⊗ Σ_ε^{−1}) vec(A) .   (B.17)

The second-order derivative is

∂²L_r / ( ∂vec(A) ∂vec(A)′ ) = −ΞΞ′ ⊗ Σ_ε^{−1} .   (B.18)

Its expectation is the negative Fisher information matrix. Inversion yields the covariance of vec(A),

E[ ΞΞ′ / (T−d) ]^{−1} ⊗ Σ_ε = T H ⊗ Σ_ε ,   (B.19)

with

H := (T−d)/T · E[ ΞΞ′ ]^{−1} .   (B.20)

Effectively, H it is the inverse of the covariance of the delay-embedded processes ifx(t) is observed directly, or x(t|T ) if parameters are fit from the SSM.For the two-dimensional AR2[2],

ΞΞ ′

T − d =

R11(1, 1) R12(1, 1) R11(1, 2) R12(1, 2)R21(1, 1) R22(1, 1) R21(1, 2) R22(1, 2)R11(2, 1) R12(2, 1) R11(2, 2) R12(2, 2)R21(2, 1) R22(2, 1) R21(2, 2) R22(2, 2)

(B.21)

with

$$
R_{jk}(\tau_1, \tau_2) = \frac{1}{T-d} \left[ x_j(\tau_1)\,x_k(\tau_2) + \cdots + x_j(T-d+\tau_1)\, x_k(T-d+\tau_2) \right] \tag{B.22}
$$

the covariance function of $x_j$ and $x_k$ at the respective lags $\tau_1$ and $\tau_2$.

With $\Sigma_\varepsilon = \begin{pmatrix} \Sigma_{\varepsilon,11} & 0\\ 0 & \Sigma_{\varepsilon,22} \end{pmatrix}$, the second-order derivative of the negative log-likelihood is

$$
-\frac{\partial^2 L_r}{\partial \operatorname{vec}(A)\,\partial \operatorname{vec}(A)'} = \Xi\Xi' \otimes \Sigma_\varepsilon^{-1} = (T-d)
\begin{pmatrix}
R_{11}(1,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{12}(1,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{11}(1,2)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{12}(1,2)\Sigma_{\varepsilon,11}^{-1} & 0\\
0 & R_{11}(1,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{12}(1,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{11}(1,2)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{12}(1,2)\Sigma_{\varepsilon,22}^{-1}\\
R_{21}(1,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{22}(1,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{21}(1,2)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{22}(1,2)\Sigma_{\varepsilon,11}^{-1} & 0\\
0 & R_{21}(1,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{22}(1,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{21}(1,2)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{22}(1,2)\Sigma_{\varepsilon,22}^{-1}\\
R_{11}(2,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{12}(2,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{11}(1,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{12}(1,1)\Sigma_{\varepsilon,11}^{-1} & 0\\
0 & R_{11}(2,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{12}(2,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{11}(1,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{12}(1,1)\Sigma_{\varepsilon,22}^{-1}\\
R_{21}(2,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{22}(2,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{21}(1,1)\Sigma_{\varepsilon,11}^{-1} & 0 & R_{22}(1,1)\Sigma_{\varepsilon,11}^{-1} & 0\\
0 & R_{21}(2,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{22}(2,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{21}(1,1)\Sigma_{\varepsilon,22}^{-1} & 0 & R_{22}(1,1)\Sigma_{\varepsilon,22}^{-1}
\end{pmatrix}. \tag{B.23}
$$

Note that, by stationarity, $R_{jk}(2,2) = R_{jk}(1,1)$, which is why the lower right block repeats the entries of the upper left block.

The covariance of $\operatorname{vec}(A)$ is

$$
E\!\left[ \frac{\Xi\Xi'}{T-d} \right]^{-1} \otimes\, \Sigma_\varepsilon = T
\begin{pmatrix}
* & * & * & * & * & * & * & *\\
* & H_{11}(1,1)\Sigma_{\varepsilon,22} & * & * & * & H_{11}(1,2)\Sigma_{\varepsilon,22} & * & *\\
* & * & H_{22}(1,1)\Sigma_{\varepsilon,11} & * & * & * & H_{22}(1,2)\Sigma_{\varepsilon,11} & *\\
* & * & * & * & * & * & * & *\\
* & * & * & * & * & * & * & *\\
* & H_{11}(2,1)\Sigma_{\varepsilon,22} & * & * & * & H_{11}(2,2)\Sigma_{\varepsilon,22} & * & *\\
* & * & H_{22}(2,1)\Sigma_{\varepsilon,11} & * & * & * & H_{22}(2,2)\Sigma_{\varepsilon,11} & *\\
* & * & * & * & * & * & * & *
\end{pmatrix}. \tag{B.24}
$$

Entries that are relevant for the rPDC are displayed in red, as above; for the sake of brevity, all other entries are abbreviated by an asterisk ($*$). Comparing this to the covariance

$$
\operatorname{vec}(A)\operatorname{vec}(A)' =
\begin{pmatrix}
* & * & * & * & * & * & * & *\\
* & A_{21}(1)A_{21}(1) & * & * & * & A_{21}(1)A_{21}(2) & * & *\\
* & * & A_{12}(1)A_{12}(1) & * & * & * & A_{12}(1)A_{12}(2) & *\\
* & * & * & * & * & * & * & *\\
* & * & * & * & * & * & * & *\\
* & A_{21}(2)A_{21}(1) & * & * & * & A_{21}(2)A_{21}(2) & * & *\\
* & * & A_{12}(2)A_{12}(1) & * & * & * & A_{12}(2)A_{12}(2) & *\\
* & * & * & * & * & * & * & *
\end{pmatrix} \tag{B.25}
$$

(cf. Eq. (B.8)) yields the result

$$
A_{jk}(\tau_1)A_{jk}(\tau_2) = T\, H_{kk}(\tau_1, \tau_2)\, \Sigma_{\varepsilon,jj}\,, \tag{B.26}
$$

i.e., $A_{jk}(\tau_1)A_{jk}(\tau_2)$ is proportional to $H_{kk}(\tau_1, \tau_2)\,\Sigma_{\varepsilon,jj}$, as used in Sec. 2.1.3.
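This proportionality can be checked by Monte-Carlo simulation. The sketch below is our own construction, not part of the thesis: it fits a bivariate AR[2] by least squares over many realizations and compares the empirical variance of the estimate of $A_{21}(1)$ with the corresponding entry of $(E[\Xi\Xi'])^{-1} \otimes \Sigma_\varepsilon$ obtained from the Fisher information.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, T, runs = 2, 2, 500, 400
A1 = np.array([[0.5, 0.2], [0.3, 0.4]])     # true coefficients (our choice)
A2 = np.array([[-0.2, 0.0], [0.1, -0.3]])
sigma = np.array([1.0, 2.0])                # diagonal of Sigma_eps (our choice)

estimates = []
mean_XiXi = np.zeros((N * d, N * d))
for _ in range(runs):
    x = np.zeros((N, T + d))
    for t in range(d, T + d):
        x[:, t] = (A1 @ x[:, t - 1] + A2 @ x[:, t - 2]
                   + rng.normal(scale=np.sqrt(sigma)))
    Xi = np.vstack([x[:, d - 1:T + d - 1], x[:, d - 2:T + d - 2]])
    A_hat = x[:, d:] @ Xi.T @ np.linalg.inv(Xi @ Xi.T)   # least-squares fit
    estimates.append(A_hat.flatten(order="F"))           # estimate of vec(A)
    mean_XiXi += Xi @ Xi.T / runs

emp_cov = np.cov(np.array(estimates).T)     # empirical covariance of vec(A)
fisher_cov = np.kron(np.linalg.inv(mean_XiXi), np.diag(sigma))
# index 1 of vec(A) is A_21(1), cf. the ordering in Eq. (B.8)
print(emp_cov[1, 1], fisher_cov[1, 1])      # agree up to Monte-Carlo error
```

The two printed numbers should agree up to Monte-Carlo error, illustrating that the variance of the estimated $A_{21}(1)$ is governed by the corresponding entry of the inverse covariance of the delay-embedded process and by $\Sigma_{\varepsilon,22}$.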


C. Rectification of the electromyogram

C.1. Algorithm of Rectification

The rectified electromyogram (EMG) is the absolute value of the $[30, 250]\,$Hz band-pass filtered EMG. The band-pass filtering avoids aliasing and movement artifacts [186]. Within this dissertation, the rectified EMG is demeaned if not noted otherwise.
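A minimal implementation of this rectification might look as follows. The fourth-order Butterworth filter and the zero-phase filtering via `filtfilt` are our assumptions; the thesis only specifies the $[30, 250]\,$Hz band.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def rectify_emg(emg, fs, band=(30.0, 250.0), order=4, demean=True):
    """Band-pass filter the raw EMG, take the absolute value and, by
    default, remove the mean. The sampling rate fs (in Hz) must exceed
    twice the upper band edge (Nyquist)."""
    nyq = fs / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="band")
    rectified = np.abs(filtfilt(b, a, emg))    # zero-phase filtering, then |.|
    return rectified - rectified.mean() if demean else rectified
```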

C.2. Reasoning for Rectification

The EMG of a tremor patient is modeled by [186]

$$
X(t) = \left( 1 + \cos(\omega t + \varphi) \right) \varepsilon(t)\,, \qquad \varepsilon(t) \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma_\varepsilon^2)\,, \tag{C.1}
$$

with independent identically distributed (iid) Gaussian noise with zero mean and variance $\sigma_\varepsilon^2$. Uniformly distributed random phases $\varphi \sim \mathcal{U}(0, 2\pi]$ ensure mixing of the process $X(t)$. The spectrum of $X(t)$, see Chap. 4, is flat, since it consists of the convolution of the flat spectrum of $\varepsilon(t)$ with the peaks at $\pm\omega$ of the cosinusoid.

To reveal the cosinusoid oscillation, the signal is rectified. The key step of rectification is taking the absolute value, which is equivalent to first squaring and then taking the square root of the signal. The squared model (C.1) is

$$
X^2(t) = \left( 1 + \cos(\omega t + \varphi) \right)^2 \varepsilon^2(t)
= \varepsilon^2(t) + 2\,\varepsilon^2(t) \cos(\omega t + \varphi) + \varepsilon^2(t) \cos^2(\omega t + \varphi)\,. \tag{C.2}
$$

Defining the iid residuals $\eta(t)$ such that

$$
\varepsilon^2(t) = \eta(t) + E[\varepsilon^2(t)] = \eta(t) + \sigma_\varepsilon^2\,, \tag{C.3}
$$

Eq. (C.2) is rewritten as

$$
X^2(t) = \left[ \eta(t) + \sigma_\varepsilon^2 \right]
+ \left[ 2\,\eta(t) \cos(\omega t + \varphi) + 2\,\sigma_\varepsilon^2 \cos(\omega t + \varphi) \right]
+ \left[ \eta(t) \cos^2(\omega t + \varphi) + \sigma_\varepsilon^2 \cos^2(\omega t + \varphi) \right]. \tag{C.4}
$$

[Figure C.1: Undemeaned rectified realization of the model of the EMG given by Eq. (C.1) (a), as compared to the rectified EMG of a tremor patient (b); both panels show the signal over one second.]

The spectrum of $X^2(t)$ is given by the sum of the spectra of the three bracketed processes in Eq. (C.4). Since $\eta(t)$ is iid, the spectrum corresponding to the first bracket is flat. The spectrum corresponding to the second bracket exhibits peaks at $\pm\omega$: it is the sum of the flat spectrum of $2\,\eta(t)\cos(\omega t + \varphi)$ and the peaks at $\pm\omega$ from $2\,\sigma_\varepsilon^2 \cos(\omega t + \varphi)$. For the third bracket, the identity

$$
\cos^2(\omega t) = \frac{1}{2} \left( \cos(2\omega t) + 1 \right), \tag{C.5}
$$

is obtained from the angle sum and difference identities. Accordingly, the third bracket equals

$$
\frac{1}{2}\,\eta(t) \left( \cos(2\omega t + 2\varphi) + 1 \right) + \frac{1}{2}\,\sigma_\varepsilon^2 \left( \cos(2\omega t + 2\varphi) + 1 \right). \tag{C.6}
$$

The spectrum corresponding to the first term in Eq. (C.6) is flat. The spectrum corresponding to the second term exhibits peaks at $\pm 2\omega$ due to the cosinusoid term. All in all, the spectrum of $X^2(t)$ exhibits peaks at $\pm\omega$ and $\pm 2\omega$, such that, owing to the squaring, the fundamental frequency $\omega$ contained in $X(t)$, i.e., the dynamics of interest, becomes detectable by spectral analysis. Taking the square root does not change the spectral contributions, due to the monotonicity of the square root.

As an example, the rectified process $X(t)$ with $\sigma_\varepsilon^2 = 1$ and $\omega = 4\,$Hz is shown in Fig. C.1 (a). For comparison, an exemplary rectified EMG of a tremor patient is shown in (b). Here, the rectified data are not demeaned.
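Numerically, the derivation can be reproduced by simulating Eq. (C.1) and comparing the periodogram of $X(t)$ with that of the rectified, demeaned signal: the former is flat, while the latter peaks at the tremor frequency. The sampling rate, duration, and seed below are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, dur, f0 = 1000.0, 100.0, 4.0     # sampling rate (Hz), duration (s), tremor frequency
t = np.arange(0.0, dur, 1.0 / fs)
phi = rng.uniform(0.0, 2.0 * np.pi)  # random phase, as in Eq. (C.1)
X = (1.0 + np.cos(2.0 * np.pi * f0 * t + phi)) * rng.normal(size=t.size)

for label, sig in [("raw", X), ("rectified", np.abs(X) - np.abs(X).mean())]:
    spec = np.abs(np.fft.rfft(sig)) ** 2 / sig.size   # periodogram
    freqs = np.fft.rfftfreq(sig.size, 1.0 / fs)
    mask = freqs > 1.0                                # ignore the DC region
    # raw: maximum at an arbitrary frequency; rectified: close to 4 Hz
    print(label, freqs[mask][np.argmax(spec[mask])])
```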


D. Results of spectral and bispectral analysis of tremor data

Here, the results of the spectral and bispectral analyses, both for univariate and bivariate considerations, are summarized in Tab. D.1. All spectral quantities are estimated by averaging over independent blocks of data; independence is specified by the decay of the autocorrelation of the EMG (a sketch of such a block-length rule follows the list of abbreviations below).

Patients 1–10 exhibited consistent spectral peaks at the tremor frequency and at twice the tremor frequency. Patients 11–12 had inconsistent first-order spectral and cross-spectral peaks at the tremor frequency and at twice the tremor frequency. The asterisk (*) marks the one segment in which no coherence peak at the tremor frequency occurred.

Abbreviations in the table are

twice the tremor frequency. Patients 11-12 had inconsistent first order spectraland cross-spectral peaks at tremor frequency and twice the tremor frequency. Theasterisk * refers to the one segment in which no coherence peak at the tremorfrequency occurred.Abbreviations in the table are

• ’ID’ = patient identity number

• ’Type’ = type of tremor (ET = essential tremor, PT = Parkinsonian tremor)

• ’Seg’ = segment number

• ’Side’ = hand side (L = left, R = right)

• ’Muscle’ = type of muscle (’E’ = extensor, ’F’ = flexor)

• ’TF’ = tremor frequency in Hz

• ’Group’ = categorization according to linear cross-spectral properties (A = no coherence peak at twice the tremor frequency, B = coherence peak at twice the tremor frequency, C = inconsistent peaks in spectrum and coherence at once and twice the tremor frequency)

• ’BL’ = block length for spectral and bispectral analysis,

• ’BE’ = bicoherence of the EEG

• ’BM’ = bicoherence of the EMG

• ’XB1’ = cross-bicoherence of EEG and EMG onto EEG

• ’XB2’ = cross-bicoherence of EMG and EEG onto EMG

• ’XB3’ = cross-bicoherence of EMG onto EEG

• ’XB4’ = cross-bicoherence of EEG onto EMG.
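The thesis fixes the block length via the decay of the EMG autocorrelation without spelling out a formula. A minimal sketch of one such rule, under our assumption that the first lag at which the normalized autocorrelation drops below $1/e$ is used, could look as follows:

```python
import numpy as np

def block_length_from_acf(x, fs, threshold=np.exp(-1.0)):
    """Return the smallest lag, in seconds, at which the normalized
    autocorrelation of x falls below `threshold`; blocks of at least
    this length are treated as approximately independent."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[x.size - 1:]  # lags 0, 1, 2, ...
    acf /= acf[0]                                       # normalize: acf[0] = 1
    first_below = int(np.argmax(acf < threshold))       # index of first True
    return first_below / fs
```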


Table D.1.: Results of the spectral and bispectral analysis of the tremor data.

ID  Type  Seg  Side  Muscle  TF(Hz)  Group  BL(s)  BE  BM  XB1  XB2  XB3  XB4
1   ET    1    L     E       7.0     B      2.5    0   1   1    1    0    0
    ET    2    R     E       4.2     A      2.5    0   1   1    1    1    0
    ET    3    L     E       5.4     A      2.5    0   1   0    0    0    0
2   ET    1    R     F       5.9     A      10     0   1   1    1    0    0
    ET    2    R     E       5.9     A      5      0   1   0    1    0    0
    ET    3    R     F       5.9     A      5      0   1   0    1    0    0
3   ET    1    R     E       5.6     B      5      0   1   0    0    1    0
    ET    2    L     F       5.3     B      5      0   1   0    1    1    1
    ET    3    R     F       5.6     B      5      0   1   0    0    1    0
    ET    4    R     E       5.9     B      2.5    0   1   1    1    1    0
    ET    5    R     F       5.9     B      2.5    0   1   1    1    1    0
    ET    6    R     E       5.9     A      2.5    0   1   1    1    1    0
4   ET    1    R     F       5.4     A      5      0   1   1    0    1    0
    ET    2    L     E       5.1     A      2.5    0   1   0    0    1    0
    ET    3    L     F       5.1     B      2.5    0   1   0    0    1    0
    ET    4    L     E       5.4     A      2.5    0   1   0    1    0    0
    ET    5    L     F       5.4     A      2.5    0   1   0    1    1    1
5   ET    1    L     E       5.2     B      5      1   1   1    1    0    1
    ET    2    L     F       5.2     B      5      0   1   0    1    0    1
    ET    3    R     E       5.4     A      2.5    0   1   0    0    0    0
    ET    4    L     F       5.4     B      5      0   1   0    0    1    0
6   ET    1    L     E       4.6     A      2.5    0   1   0    1    0    1
    ET    2    L     F       4.6     A      2.5    0   1   0    1    0    0
    ET    3    L     E       4.4     B      2.5    0   1   0    1    1    1
    ET    4    L     F       4.4     A      2.5    0   1   0    1    0    0
    ET    5    L     E       4.4     A      5      0   1   0    1    0    1
    ET    6    L     F       4.4     A      2.5    0   1   0    0    0    0
7   PT    1    L     E       4.0     B      10     0   1   0    1    1    0
    PT    2    L     F       4.0     B      10     1   1   0    1    1    0
    PT    3    L     E       3.9     B      2.5    0   1   0    1    0    0
    PT    4    L     F       4.2     B      2.5    0   1   1    0    0    0
    PT    5    L     E       4.2     B      5      0   1   1    0    1    0
    PT    6    L     F       4.2     B      5      0   1   0    0    0    0
8   PT    1*   R     E       4.8     B      5      0   1   0    0    1    0
    PT    2    R     F       4.8     B      10     1   1   1    1    1    1
    PT    3    R     E       4.8     B      10     0   1   1    1    1    0
    PT    4    R     F       4.8     B      10     0   1   1    1    1    0
    PT    5    R     E       4.7     B      10     0   1   1    1    1    0
    PT    6    R     F       4.7     B      10     1   1   0    1    0    1
9   PT    1    R     E       4.6     B      10     1   1   0    0    1    1
    PT    2    R     F       4.7     B      10     0   1   0    0    1    0
    PT    3    R     E       4.6     B      10     0   1   1    1    1    0
    PT    4    R     F       4.6     B      10     1   1   1    1    1    1
    PT    5    R     E       4.4     B      2.5    0   1   1    0    1    0
    PT    6    R     F       4.3     B      5      0   1   0    0    1    0
10  PT    1    R     E       5.2     B      5      0   1   0    1    0    0
    PT    2    R     F       5.2     B      5      0   1   0    1    0    0
    PT    3    R     E       5.1     B      5      0   1   0    1    0    1
    PT    4    R     F       5.1     B      5      0   1   0    1    0    1
Sum (patients 1–10)                                6   49  17   32   27   13
11  PT    1    R     E       4.4     C      5      0   1   0    0    0    0
    PT    2    R     F       4.4     C      5      0   1   0    0    0    0
    PT    3    R     E       4.4     C      5      0   1   0    0    0    0
    PT    4    R     F       4.5     C      10     0   1   0    0    1    0
    PT    5    R     E       4.4     C      2.5    0   1   0    0    0    0
    PT    6    R     F       4.2     C      10     0   0   0    0    0    0
12  ET    1    L     E       5.6     C      2.5    0   1   0    0    1    0
    ET    2    L     E       5.6     C      2.5    0   1   0    0    0    0
    ET    3    L     E       5.6     C      2.5    0   0   0    0    1    0
Sum (patients 11–12)                               0   7   0    0    3    0


Acknowledgements

First of all, I want to thank my supervisors Prof. Dr. Jens Timmer and Prof. Dr. Björn Schelter. They provided me with the topic of my thesis while leaving enough scope to develop my own ideas. Particularly, I want to thank them for enabling me to attend conferences and to spend seven months at the Pennsylvania State University in State College, Pennsylvania, visiting my collaboration partner Prof. Dr. Bruce J. Gluckman. I want to thank the latter for his warm welcome. He introduced me to the field of seizure prediction and taught me to challenge my ideas.

Furthermore, I would like to thank all my colleagues for their cooperation. Particularly, I thank Madineh Sarvestani for her unbelievable support at the Pennsylvania State University, and for proof-reading my thesis. I thank Juliane Klatt for the cooperation in the field of bispectral analysis. Furthermore, I would like to thank all members of Prof. Dr. Jens Timmer's group for the great working atmosphere, and the proof-readers of this dissertation: Clemens Kreutz, Raphael Engesser, Daniel Kaschek, Marcus Rosenblatt, and Helge Hass. I want to thank Berenika Siddons, Dr. Elke Martinez, and Monika Hattenbach for their relentless commitment in mastering all administrative challenges, ranging from formal changes of position to travel expense accounting.

I further thank all collaborators at the University Medical Center Freiburg. Particularly, I thank Dr. Florian Amtage and PD Dr. Bernhard Hellwig for the collaboration in the field of tremor analysis. Thank you for the fruitful discussions and for the patience to explain medical details while showing great interest in the statistical methodology. Furthermore, I want to thank PD Dr. Julia Jacobs for her confidence in letting me advise her Ph.D. students with respect to the statistical assessment of their data. I also want to thank these students, Josephine Schwind, Christina Klus, and Jena Selvalingam, for their cooperation and the data they provided me with. Of course, I thank all the authors and coauthors of our joint papers. Their contributions have always made me look at our investigations from a different perspective.

Last but certainly not least, I would like to express my gratitude to my family and friends for supporting me during all my years at the University. I want to address special thanks to my parents, Jutta and Rainer Killmann, who made it possible for me to write this thesis in several respects: not only did they support my studies financially, they also strengthened me in all my decisions and lent helping hands in every stressful situation. I also want to thank them and my parents-in-law for being such wonderful grandparents to my daughter Leana. Special thanks to her for the happiness and joy she keeps bringing into my life. Particularly, I want to thank my husband Wolfgang Mader, who has not only been a stable mental support during the last five years but also an alert, firm, credible, and valuable discussion partner, listener, and proof-reader of all papers, including this thesis. Thank you for your love and patience.


Index

Autocorrelation, 31–41, 139
Autocovariance, 44, 47, 54, 61, 62
Autoregressive process, 8, 16, 17, 21, 22, 24, 25, 27–29, 31, 34–38, 41, 127, 132–134

Bicoherence, 43, 49, 50, 55, 56, 58–60, 62–64, 139
    Cross-, 49, 54, 55, 57, 58, 60, 62–64, 139
Bispectral analysis, 43, 55, 56, 65
Bispectral coefficient, 43, 49, 51, 52, 55, 56, 58, 59
    Cross-, 49, 57, 58
Bispectrum, 43, 44, 46, 47, 51, 65
    Cross-, 45–47, 49, 54
Block length, 31–42, 54, 56, 58, 61, 62, 139
Bootstrap, 5, 7–9, 23, 24, 26, 27, 29, 53–55, 73, 82, 87
    Block-, 8, 31–33, 37, 40–43, 52, 54, 55, 57–60, 63–65
    Nonparametric-, 7–9, 31, 65
    Parametric-, 7–9, 14, 23, 26

Coherence, 45, 48, 50, 51, 56, 57, 60, 62–64, 139
Confidence
    Interval, 6, 7, 9, 23, 24, 26, 27, 29, 31, 37, 40–42
    Level, 6, 27, 40
Cross-covariance, 45, 54
Cross-frequency coupling, 67, 68, 76
Cumulant, 44
    Joint-, 44, 45

Distribution, 5, 7–9, 18, 23, 71–73, 88
    Null-, 9, 52, 53, 55, 59, 60, 73, 87, 88, 90
    Sampling-, 5–8, 24, 87

Estimation, 5–9, 32, 41, 53, 54, 131
    Interval-, 6, 8, 14, 24, 26, 27, 29, 43
    Point-, 5, 8, 14, 17–21, 23, 25, 27, 31–34, 36, 38–41, 43, 47–49, 51, 52, 54–59, 61, 62, 64, 65
Expectation-Maximization algorithm, 18, 19, 23, 127, 129

Gaussianization, 52, 58, 63
Granger causality, 13, 14, 29

Hypothesis
    Null-, 8, 9, 53, 63, 87, 89, 90, 96
    Test, 8, 9, 52, 54, 55, 60, 73–75, 82, 90, 97

Kalman
    Filter, 19, 127–129
        Dual-, 21, 23, 25
    Gain, 129
    Smoother, 19, 20, 127, 129
Kullback-Leibler distance, 72, 73

Lag-one covariance smoother, 19, 20, 127, 129
Likelihood, 17
    Log-, 17–19, 127–129

Maximum likelihood
    Estimation, 14, 17–21, 29, 35, 84, 127, 134
Mixing, 31, 44

Network, 13, 15, 24, 27, 29
Noise-to-signal ratio, 24, 27, 34, 36–38, 42, 56, 58, 75

Observation noise, 15, 24, 27, 34, 36, 37, 55, 56

Partial directed coherence
    Renormalized-, 22–30
Phase-amplitude
    Coupling, 67, 68, 70–73, 75, 76
    Histogram, 72
    Plot, 71–76
Prediction, 13, 81–97, 128

Quadratic phase coupling, 44, 45, 47, 55–59, 64
Quantile, 6, 9, 24, 27, 53, 55, 88, 91

Regression
    Analysis, 49, 50
    Logistic-, 82, 83

Significance, 58, 60, 64, 82
    Level, 8, 58, 59, 63, 75, 96
Smooth function model, 5, 31, 32, 41
Spectral analysis, 43, 60, 62, 65
    Cross-, 60, 62
Spectrum, 44, 47, 48, 61, 62, 64, 139
    Cross-, 45, 47, 48, 61
State-space model, 14, 16–24, 29, 34, 37, 38, 127, 128, 131
    Dual-, 20
Surrogate, 67, 73, 82, 87