
Clustering of Financial Time Series and Evidence of Memory Effects

Gabriele Pompa

14/12/2012


Outline

1 Introduction
  - Financial Markets
  - Technical Trading
  - Pattern Recognition

2 Feedback on Investment Strategies
  - Supports & Resistances: Bounce Probability

3 Statistical Analyses
  - Bounces: Recurrence Time, Duration & Typical Price Fluctuations

4 Clustering Model
  - Bayesian Characterization of the Best Clustering
  - MCMC Step, Splitting & Merging

5 Results
  - Toy Model & Real Series
  - Cause-Effect Analysis

6 Conclusions



Efficient Market Hypothesis (EMH)

- Information set $\theta_t$: the body of knowledge on which investments are based (a company's expected profit, interest rate, expected dividend).
- Jensen (1978): a market is efficient with respect to $\theta_t$ $\iff$ it is impossible to make profits by trading on $\theta_t$ ⟹ martingale: $E[p_{t+1} \mid p_0, p_1, \ldots, p_t] = p_t$
- Price increments are not predictable: Random-Walk models (Bachelier 1900, Samuelson 1973, ...)

[Figure, two panels: price (ticks) vs. time (hours).]

Vodafone, 110th trading day of 2002, vs. a random walk $p_{t+1} = p_t + \mathcal{N}(\mu, \sigma)$
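The random-walk benchmark used in the comparison above can be simulated in a few lines. The following is a minimal sketch in Python; the function name `simulate_random_walk`, the starting price and the parameter values are illustrative assumptions, not values taken from the analysis.

```python
import random

def simulate_random_walk(p0, mu, sigma, n_steps, seed=None):
    """Return a price path p_0, ..., p_n with p_{t+1} = p_t + N(mu, sigma)."""
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_steps):
        # Each increment is an independent Gaussian draw.
        prices.append(prices[-1] + rng.gauss(mu, sigma))
    return prices

# Hypothetical values: 500 steps, zero drift, unit volatility.
path = simulate_random_walk(p0=100.0, mu=0.0, sigma=1.0, n_steps=500, seed=0)
```

With `mu = 0` the path is a martingale by construction, which is the property the EMH slide refers to.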



Some "Stylized Facts"

- Returns $r_\tau(t) = p(t + \tau) - p(t)$ are not Gaussian-distributed: their tails are fatter than those of a normal distribution.
- Returns are uncorrelated, $\langle r_\tau(t)\, r_\tau(t + T) \rangle \sim 0$, but not independent: $\langle |r_\tau(t)|\, |r_\tau(t + T)| \rangle$ and $\langle r_\tau^2(t)\, r_\tau^2(t + T) \rangle$ are not negligible.
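The second stylized fact can be checked numerically by comparing the autocorrelation of signed returns with that of absolute returns. The sketch below is a toy illustration, not the analysis from the talk: the helper names `returns` and `autocorrelation` are hypothetical, and the synthetic series (Gaussian increments whose volatility alternates between two regimes every 50 steps) is an assumption chosen only to produce volatility clustering.

```python
import random

def returns(prices, tau=1):
    """r_tau(t) = p(t + tau) - p(t)."""
    return [prices[t + tau] - prices[t] for t in range(len(prices) - tau)]

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag (equals 1 at lag 0)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var

# Synthetic prices: volatility switches between 0.5 and 2.0 every 50 steps.
rng = random.Random(0)
steps = [rng.gauss(0.0, 2.0 if (t // 50) % 2 else 0.5) for t in range(4000)]
prices = [100.0]
for step in steps:
    prices.append(prices[-1] + step)

r1 = returns(prices, tau=1)
acf_signed = autocorrelation(r1, lag=1)                    # close to zero
acf_abs = autocorrelation([abs(v) for v in r1], lag=1)     # clearly positive
```

In this toy series the signed returns carry no linear memory, while their magnitudes do, which is the distinction between "uncorrelated" and "independent" drawn above.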

[Figure: probability density function (log scale) vs. log-returns; curves: BNPP.PA, Gaussian, Student.]

FIG. 1. (Top) Empirical probability density function of the normalized 1-minute S&P500 returns between 1984 and 1996. Reproduced from Gopikrishnan et al. (1999). (Bottom) Empirical probability density function of BNP Paribas unnormalized log-returns over a period of time $\tau = 5$ minutes.

[...] trading. Except where mentioned otherwise in captions, this data set will be used for all empirical graphs in this section. On figure 2, cumulative distribution in log-log scale from Gopikrishnan et al. (1999) is reproduced. We also show the same distribution in linear-log scale computed on our data for a larger time scale $\tau = 1$ day, showing similar behaviour.

[Figure: cumulative distribution (log scale) vs. absolute log-returns; curves: SP500, Gaussian, Student.]

FIG. 2. Empirical cumulative distributions of S&P 500 daily returns. (Top) Reproduced from Gopikrishnan et al. (1999), in log-log scale. (Bottom) Computed using official daily close price between January 1st, 1950 and June 15th, 2009, i.e. 14956 values, in linear-log scale.

Many studies obtain similar observations on different sets of data. For example, using two years of data on more than a thousand US stocks, Gopikrishnan et al. (1998) find that the cumulative distribution of returns asymptotically follows a power law $F(r_\tau) \sim |r|^{-\alpha}$ with $\alpha > 2$ ($\alpha \approx 2.8 - 3$). With $\alpha > 2$, the second moment (the variance) is well-defined, excluding stable laws with infinite variance. There have been various suggestions for the form of the distribution: Student's-t, hyperbolic, normal inverse Gaussian, exponentially truncated stable, and others, but no general consensus exists on the exact form of the tails. Although being the most widely acknowledged and the most elementary one, this stylized fact is not easily met by all financial modelling. Gabaix et al. (2006) or Wyart and Bouchaud (2007) recall that efficient market theory has difficulties in explaining fat tails. Lux and Sornette (2002) have shown that models known as "rational expectation bubbles", popular in economics, produced very fat-tailed distributions ($\alpha < 1$) that were in disagreement with the statistical evidence.

2. Absence of autocorrelations of returns

On figure 3, we plot the autocorrelation of log-returns defined as $\rho(T) \sim \langle r_\tau(t + T)\, r_\tau(t) \rangle$ with $\tau = 1$ minute and 5 minutes. We observe here, as it is widely known (see e.g. Pagan (1996); Cont et al. (1997)), that there is no evidence of correlation between successive returns, which is the second "stylized-fact". The autocorrelation function decays very rapidly to zero, even for a few lags of 1 minute.

[Figure, two panels: PDF of 5-minute returns (left); autocorrelation vs. T (minutes) (right).]

BNP Paribas, 2007/08: distribution of returns at $\tau = 5$ min (left). Vodafone, 2002, 110th day: autocorrelation of returns, squared returns and absolute returns at $\tau = 1$ min (right).


Technical Trading

Main assumptions, openly at odds with the EMH:
- The market discounts everything: the price "history" is the only $\theta_t$ needed.
- Prices move in trends: bullish, bearish or flat.
- History repeats itself: investors react in the same way when similar conditions recur (figures ⟹ technical indicators).

[Figure, two panels: price (ticks) vs. time (hours), and price (ticks) vs. time (months).]

CHAPTER 5. TECHNICAL ANALYSIS

Figure 5.2: Example of support (green line) and resistance (red line). Source: stockcharts.com

[...] suggests that when the price falls toward the support, buyers are more inclined to buy and sellers less inclined to sell. When the price reaches the support level, demand is believed to exceed supply and to prevent the price from falling below the support. A resistance is the price level at which supply is thought to be strong enough to prevent the price from rising further. When the price approaches the resistance, sellers are more inclined to sell and buyers become less inclined to buy. When the price reaches the resistance level, supply is believed to exceed demand, so that the price is prevented from rising further (cf. [29]). Figure 5.2 shows an example of support and resistance.

Supports and resistances are treated extensively in Chapter 6, which discusses the possibility of introducing a quantitative definition to be tested through statistical analysis of financial time series.

Unilever, 66th day, 2002: 5-minute moving average (left), and an example of Support and Resistance, Eli Lilly & Co. on 2/2/2000 (right)
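A moving average like the one in the caption is among the simplest technical indicators. The sketch below shows a trailing simple moving average; the function name `moving_average` is hypothetical, and the choice of a trailing (rather than centered) window is an assumption, since the slide does not say which variant was plotted.

```python
def moving_average(prices, window):
    """Trailing simple moving average; entry t averages prices[t-window+1 .. t]."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for t, p in enumerate(prices):
        running += p
        if t >= window:
            # Drop the sample that just left the window.
            running -= prices[t - window]
        if t >= window - 1:
            out.append(running / window)
    return out

# For second-by-second prices, a 5-minute average corresponds to window=300.
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=2)
```

The output is shorter than the input by `window - 1` samples, because the first full window is only available once that many prices have been seen.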



Pattern Recognition (PR)

- Examples: spam filter, recycling sorting machine, OCR protocol
- A PR system $\alpha$ extracts features $x$ from an object and assigns it to a class/model:

$$\alpha : x \to \{\omega_1, \omega_2, \ldots, \omega_c\}$$

- Known classes: Supervised Learning through a training set of known associations

$$D_i = \{x_1, \ldots, x_{n_i}\} \implies \text{class } \omega_i$$

- Unknown classes: Clustering ⟹ information comes only from the similarity between data ⟹ for time series, various methodologies:
  - raw data, after detrending and standardization
  - probabilistic models: regressions, Bayesian learning
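The raw-data route requires each series to be detrended and standardized before similarities are compared. The following is a minimal sketch of one common way to do this; the function name `detrend_and_standardize` is hypothetical, and the use of a least-squares linear trend is an assumption, since the slide does not specify the detrending method.

```python
def detrend_and_standardize(series):
    """Remove a least-squares linear trend, then scale to zero mean, unit variance."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points")
    t_mean = (n - 1) / 2
    x_mean = sum(series) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_tx = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(series))
    slope = s_tx / s_tt
    # Residuals of the linear fit have zero mean by construction.
    residuals = [x - x_mean - slope * (t - t_mean) for t, x in enumerate(series)]
    std = (sum(e * e for e in residuals) / n) ** 0.5
    if std == 0:
        return [0.0] * n  # the series was exactly linear
    return [e / std for e in residuals]

base = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
trended = [x + 3.0 * t + 7.0 for t, x in enumerate(base)]
z = detrend_and_standardize(trended)
```

After this step two series that differ only by a linear drift and an overall scale map to the same standardized shape, so a clustering algorithm compares shapes rather than price levels.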



Bayesian Learning

Given the dataset $D_N = \{x_1, \ldots, x_N\}$ and descriptive models $m \in M$, the goal is to select the $m$ that best describes $D_N$ according to Bayes' rule:

$$P(m \mid D_N) = \frac{P(D_N \mid m)\, P(m)}{P(D_N)}, \qquad \text{where } \sum_{m \in M} P(m) = 1$$

- Prior $P(m)$: confidence in $m$ before measuring $D_N$
- Likelihood $P(D_N \mid m)$: probability of $D_N$ if $m$ is the right model
- Posterior $P(m \mid D_N)$: updated confidence in $m$, given $D_N$

Model $m$ with parameters $\{\theta\}$ ⟹ (Marginal) Likelihood:

$$P(D_N \mid m) = \int P(D_N \mid \theta, m)\, P(\theta)\, d\theta$$
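To see how the marginal likelihood trades fit against complexity, consider a worked toy case that is not part of the talk: a sequence of coin flips explained either by a model $m_1$ with no free parameter ($\theta = 0.5$) or by a model $m_2$ with an unknown bias $\theta$ and a uniform prior on $[0, 1]$. All names below are hypothetical; the integral over $\theta$ has the closed form $h!\, t! / (h + t + 1)!$ for $h$ heads and $t$ tails.

```python
from math import factorial

def evidence_fair_coin(heads, tails):
    """P(D | m1): no free parameter, theta fixed at 0.5."""
    return 0.5 ** (heads + tails)

def evidence_unknown_bias(heads, tails):
    """P(D | m2): integral of theta^h (1 - theta)^t over a uniform prior.

    The integral equals the Beta function B(h+1, t+1) = h! t! / (h+t+1)!.
    """
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)

def posterior(evidences, priors):
    """Bayes' rule over a finite model set; priors are assumed to sum to 1."""
    joint = [e * p for e, p in zip(evidences, priors)]
    total = sum(joint)  # this is P(D)
    return [j / total for j in joint]

# Balanced data (5 heads, 5 tails) vs. lopsided data (10 heads, 0 tails).
balanced = posterior([evidence_fair_coin(5, 5), evidence_unknown_bias(5, 5)], [0.5, 0.5])
lopsided = posterior([evidence_fair_coin(10, 0), evidence_unknown_bias(10, 0)], [0.5, 0.5])
```

With equal priors, the balanced sequence favors the simpler model and the lopsided one favors the flexible model: the flexible model spreads its probability over more possible datasets, so it is rewarded only when the data actually need it.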

[Figure: marginal likelihood $P(D \mid m_i)$ vs. all possible data sets $D$, for three models labelled "too simple", "too complex" and "just right".]

Figure 4: The marginal likelihood (evidence) as a function of an abstract one dimensional representation of "all possible" data sets of some size $N$. Because the evidence is a probability over data sets, it must normalise to one. Therefore very complex models which can account for many datasets only achieve modest evidence; simple models can reach high evidences, but only for a limited set of data. When a dataset $D$ is observed, the evidence can be used to select between model complexities.

11.1 Laplace approximation

It can be shown that under some regularity conditions, for large amounts of data $N$ relative to the number of parameters in the model, $d$, the parameter posterior is approximately Gaussian around the MAP estimate, $\hat{\theta}$:

$$p(\theta \mid D, m) \approx (2\pi)^{-d/2}\, |A|^{1/2} \exp\left\{ -\frac{1}{2} (\theta - \hat{\theta})^\top A\, (\theta - \hat{\theta}) \right\} \tag{51}$$

Here $A$ is the $d \times d$ negative of the Hessian matrix which measures the curvature of the log posterior at the MAP estimate:

$$A_{ij} = -\left. \frac{d^2}{d\theta_i\, d\theta_j} \log p(\theta \mid D, m) \right|_{\theta = \hat{\theta}} \tag{52}$$

The matrix $A$ is also referred to as the observed information matrix. Equation (51) is the Laplace approximation to the parameter posterior.

By Bayes rule, the marginal likelihood satisfies the following equality at any $\theta$:

$$p(D \mid m) = \frac{p(\theta, D \mid m)}{p(\theta \mid D, m)} \tag{53}$$

The Laplace approximation to the marginal likelihood can be derived by evaluating the log of this expression at $\hat{\theta}$, using the Gaussian approximation to the posterior from equation (51) in the denominator:

$$\log p(D \mid m) \approx \log p(\hat{\theta} \mid m) + \log p(D \mid \hat{\theta}, m) + \frac{d}{2} \log 2\pi - \frac{1}{2} \log |A| \tag{54}$$

11.2 The Bayesian information criterion (BIC)

One of the disadvantages of the Laplace approximation is that it requires computing the determinant of the Hessian matrix. For models with many parameters, the Hessian matrix can be very large, and computing its determinant can be prohibitive.

The Bayesian Information Criterion (BIC) is a quick and easy way to compute an approximation to the marginal likelihood. BIC can be derived from the Laplace approximation by dropping all terms that do not depend on $N$, the number of data points. Starting from equation (54), we note that the first and third terms are constant with respect to the number of data points. Referring to the definition of the Hessian, we can see that its elements grow linearly with $N$. In the limit of large $N$ we can therefore write $A = N\tilde{A}$, where [...]



The Dataset

- $D_N$: second-by-second time series of 9 stocks traded on the LSE in 2002: Shell, Unilever, Vodafone, Royal Bank of Scotland, ...
- Price in ticks ⟹ tick-min: 0.01, 0.25, 0.5, 1 pounds sterling
- Series $P(t)$ ⟹ rescaling every $T = 45, 60, 90, 180$ seconds: $P_T(t)$
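The time rescaling from $P(t)$ to $P_T(t)$ can be sketched as a simple subsampling of the second-by-second series. The function name `resample` is hypothetical, and keeping one price every $T$ seconds is an assumption: the slide does not state whether the rescaled series uses the sampled price, an average, or another rule.

```python
def resample(prices, step):
    """Keep one price every `step` samples: P_T(t) for second-by-second input."""
    if step <= 0:
        raise ValueError("step must be a positive number of samples")
    return prices[::step]

# Placeholder second-by-second series covering one hour (hypothetical data).
second_by_second = [float(t) for t in range(3600)]
rescaled = {T: resample(second_by_second, T) for T in (45, 60, 90, 180)}
```

Larger values of `T` give shorter, smoother series, which is why the later analyses are always quoted together with the `T` they were run at.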

[Figure: price (ticks) vs. time (hours); curves labelled T = 5 min, T = 10 min, T = 15 min.]

Shell, 55th trading day of 2002, T = 0, 5, 10, 15 min


Characterization of Supports and Resistances

Stripe width $\delta$: typical daily price fluctuations

[Figure: price (ticks) vs. time (hours).]

Heritage Financial Group, 103rd day, 2002 - 2 examples of Resistances


Characterization of Supports and Resistances

[Figure: price (ticks) vs. time (hours).]

Royal Bank of Scotland, 54th day, 2002 - example of Support


p(bounce | number of previous bounces)

[Figure: P(bounce | i) vs. number of previous bounces (i = 1, 2, 3, 4, 5).]

p(b | i = 1, 2, 3, 4, 5) - Resistances - T = 45 seconds, compared with a compatible random walk (RW)
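The conditional probability plotted here can be estimated by counting, for each number $i$ of previous bounces, how often the next test of the level ends in a bounce rather than a crossing. The sketch below assumes a hypothetical encoding in which each level's history is a string of outcomes (`'b'` for bounce, `'c'` for crossing); neither the encoding nor the function name `bounce_probability` comes from the talk, and the plain frequency estimate stands in for whatever estimator was actually used.

```python
from collections import defaultdict

def bounce_probability(histories):
    """Estimate p(bounce | i previous bounces) from per-level outcome strings.

    Each history is a string over {'b', 'c'}: 'b' means the price bounced on
    the level, 'c' means the price crossed it (hypothetical encoding).
    """
    bounces = defaultdict(int)
    trials = defaultdict(int)
    for history in histories:
        previous = 0
        for outcome in history:
            trials[previous] += 1
            if outcome == "b":
                bounces[previous] += 1
                previous += 1
            elif outcome != "c":
                raise ValueError(f"unknown outcome {outcome!r}")
    return {i: bounces[i] / trials[i] for i in sorted(trials)}

p = bounce_probability(["bc", "bbc", "c", "bbb"])
```

Running the same estimator on simulated random-walk series gives the baseline against which a memory effect would show up as a probability that grows with `i`.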


Time between Bounces

[Figure: log(PDF / bin widths) vs. log(recurrence time).]

Distribution of the recurrence time - Resistances & Supports - T = 45 sec
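The axis label "PDF / bin widths" indicates a histogram whose counts are divided by the width of each bin, which is what makes logarithmically spaced bins comparable. A minimal sketch of such a log-binned density follows; the function name `log_binned_pdf` and the choice of equal-ratio bin edges are assumptions, not details given in the talk.

```python
import math

def log_binned_pdf(samples, n_bins):
    """Histogram with logarithmically spaced bins, normalized by bin width.

    Returns (edges, densities), with len(edges) == n_bins + 1.
    """
    positive = [s for s in samples if s > 0]
    if not positive or n_bins <= 0:
        raise ValueError("need positive samples and at least one bin")
    lo, hi = min(positive), max(positive)
    if lo == hi:
        hi = lo * 2.0  # degenerate case: widen so bins have non-zero width
    ratio = (hi / lo) ** (1.0 / n_bins)
    edges = [lo * ratio ** k for k in range(n_bins + 1)]
    counts = [0] * n_bins
    for s in positive:
        # Index of the bin containing s; the maximum goes into the last bin.
        k = min(int(math.log(s / lo) / math.log(ratio)), n_bins - 1)
        counts[k] += 1
    total = len(positive)
    densities = [counts[k] / (total * (edges[k + 1] - edges[k]))
                 for k in range(n_bins)]
    return edges, densities

edges, dens = log_binned_pdf([1, 2, 4, 8, 16, 32, 64, 128], n_bins=3)
```

Dividing by the bin width keeps the estimate normalized, so that the densities times the widths sum to one even though the bins grow geometrically.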



Time within Bounces: $\tau$

[Figure: log(PDF / bin widths) vs. log($\tau$).]

Distribution of the window size - Resistances & Supports - T = 45 sec



Typical Price Fluctuations around Bounces

[Figure: occurrences vs. max fluctuation / tick-min.]

AstraZeneca, 2002 - typical fluctuations around bounces on Support levels / tick-min (0.5 pounds sterling) - window width: $\tau = 150$



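Bayesian Characterization of the Best Partition

- Dataset: $D_N = \{x_i\}_{i=1,\ldots,N}$, where $x_i$ is the series $x_i = \{x_{i1}, x_{i2}, \ldots, x_{i\tau}\}$
- Goal: find the best partition

$$m = \{(x_4, x_{17}, x_{25}), (x_1, x_{27}), \ldots, (x_N), \ldots\} = \{C_1, C_2, \ldots, C_n\}$$

⟹ the one that maximizes the posterior $P(m \mid D_N)$:

$$P(m \mid D_N) \propto \underbrace{P(D_N \mid m)}_{\text{likelihood}} \times \overbrace{P(m)}^{\text{prior}} = \prod_{C_k \in m} \overbrace{\prod_{x_i \in C_k} \underbrace{\mathcal{N}_{x_i}(\mu_k, \Sigma_{i,k})}_{\text{likelihood of series}}}^{\text{cluster likelihood}} \times \underbrace{\mathcal{N}_n(0, \sigma_p)}_{n \text{ clusters}}$$

where:
- intra-cluster mean series: $\mu_k = \langle x_i \rangle_{C_k}$
- intra-cluster variance series: $\sigma_k^2 = \langle x_i^2 \rangle_{C_k} - \mu_k^2$
- covariance matrix of a single series: $\Sigma_{i,k} = \mathrm{diag}\left(\sqrt{\delta_i^2 + \sigma_k^2}\right)$
- $\delta_i$: stripe width ⟹ intrinsic dispersion of the series $x_i$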
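The posterior above can be evaluated, up to a constant, as a sum of Gaussian log-densities. The sketch below is one possible reading, not the author's implementation: the names `gaussian_logpdf` and `log_posterior`, the data layout, and in particular the interpretation of the diagonal entries of $\Sigma_{i,k}$ as per-time-step standard deviations $\sqrt{\delta_i^2 + \sigma_k^2(t)}$ are all assumptions.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log-density of a one-dimensional Gaussian N(mean, std)."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def log_posterior(partition, deltas, sigma_prior):
    """Unnormalized log P(m | D_N) for a partition.

    partition:   list of clusters, each a list of (index, series) pairs
    deltas:      dict mapping series index -> stripe width delta_i (> 0)
    sigma_prior: width sigma_p of the Gaussian prior on the number of clusters
    """
    total = 0.0
    for cluster in partition:
        size = len(cluster)
        length = len(cluster[0][1])
        for t in range(length):
            values = [series[t] for _, series in cluster]
            mu = sum(values) / size                      # mu_k at time t
            var = sum(v * v for v in values) / size - mu * mu
            var = max(var, 0.0)                          # guard against rounding
            for i, series in cluster:
                # Assumed reading: std_i,k(t) = sqrt(delta_i^2 + sigma_k^2(t))
                std = math.sqrt(deltas[i] ** 2 + var)
                total += gaussian_logpdf(series[t], mu, std)
    # Prior N_n(0, sigma_p) evaluated at the number of clusters n.
    total += gaussian_logpdf(len(partition), 0.0, sigma_prior)
    return total

a = (0, [0.0, 0.0, 0.0, 0.0]); b = (1, [0.1, -0.1, 0.1, -0.1])
c = (2, [1.0, 1.0, 1.0, 1.0]); d = (3, [0.9, 1.1, 0.9, 1.1])
deltas = {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.1}
good = log_posterior([[a, b], [c, d]], deltas, sigma_prior=1.0)
merged = log_posterior([[a, b, c, d]], deltas, sigma_prior=1.0)
singletons = log_posterior([[a], [b], [c], [d]], deltas, sigma_prior=1.0)
```

In this hypothetical example the two-cluster partition scores above both the fully merged one and the all-singletons one; the prior term is what penalizes splitting every series into its own cluster.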



MCMC Step

Chain in the space of partitions ⟹ a new partition is proposed:
- if the posterior increases: accept!
- otherwise: compare against a random threshold $u \sim U[0, 1]$
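The acceptance rule can be sketched as follows. The slide only states that a random threshold $u \sim U[0, 1]$ is drawn when the posterior does not increase; comparing $u$ with the posterior ratio, as in the standard Metropolis rule, is an assumption, and the function name `accept_move` is hypothetical.

```python
import math
import random

def accept_move(log_post_current, log_post_proposed, rng=random):
    """Metropolis-style rule on log-posteriors.

    Always accept an increase; otherwise accept when the posterior ratio
    exceeds a uniform threshold u ~ U[0, 1] (assumed standard rule).
    """
    if log_post_proposed >= log_post_current:
        return True
    u = rng.random()
    return u < math.exp(log_post_proposed - log_post_current)

# A proposal that halves the posterior should be accepted about half the time.
rng = random.Random(0)
rate = sum(accept_move(0.0, math.log(0.5), rng) for _ in range(10000)) / 10000
```

Working with log-posteriors avoids numerical underflow, since the posterior of a partition is a product of many small densities.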




Splitting Step & Merging Step

- Splitting: a division in 2 is proposed, cluster by cluster ⟹ accepted if the likelihood gain is greater than $\Delta_{\text{RANDOM}}$
- Merging: a union of clusters is proposed, 2 at a time ⟹ accepted if the likelihood loss is smaller than $\Delta_{\text{RANDOM}}$

[Figure: SPLITTING (left) and MERGING (right); labels: N, N1, N2, $\Delta_{\text{RANDOM}}$.]



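Standard Series $\in [t_{\text{bounce}} - \tau/2 : t_{\text{bounce}} + \tau/2] \times [-1 : 1]$

[Figure: TRANSLATION and RESCALING of a price window around the 4th bounce; axes: rescaled price (from -1 to 1) vs. time (T units); stripe width labelled $\delta$.]

Rio Tinto, 202nd day, 2002 - T = 45 sec - $\tau = 100$ - centered on the 4th bounce on a Resistance - rescaled to $[t_{4\text{th}} - \tau/2 : t_{4\text{th}} + \tau/2] \times [-1 : 1]$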
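The construction of a standard series can be sketched as cutting a window of width $\tau$ centered on the bounce and mapping its prices linearly onto $[-1, 1]$. The function name `standard_window` is hypothetical, and the min-max map is an assumption: the figure labels the two steps TRANSLATION and RESCALING but the exact transformation is not given in the text.

```python
def standard_window(prices, t_bounce, tau):
    """Cut [t_bounce - tau/2, t_bounce + tau/2] and map prices linearly to [-1, 1]."""
    half = tau // 2
    start, stop = t_bounce - half, t_bounce + half
    if start < 0 or stop >= len(prices):
        raise ValueError("window does not fit inside the series")
    window = prices[start:stop + 1]
    lo, hi = min(window), max(window)
    if hi == lo:
        return [0.0] * len(window)  # flat window: nothing to rescale
    # Assumed min-max map: lo -> -1, hi -> +1.
    return [2.0 * (p - lo) / (hi - lo) - 1.0 for p in window]

prices = [float(p) for p in range(11)]
w = standard_window(prices, t_bounce=5, tau=4)
```

After this step all windows share the same time span and price range, which is what allows windows from different stocks and days to be compared within one clustering.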



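Toy Model

- 5 mother series: $\{x_{\text{mother}}\}_{k=1,2,3,4,5}$, where $(x_{k1}, \ldots, x_{k\tau}) \sim U[-1, 1]$ ⟹ the correct partition is known: 5 clusters
- N daughter series: $\{x_{\text{daughter } i \text{ of mother } k}\}_{i=1,\ldots,N} = \mathcal{N}(\{x_{\text{mother}}\}_k, \mathrm{diag}(\sigma))$

[Figure: example of a mother series in [-1:1] and its daughters: Gaussian fluctuations around it.]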
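The toy dataset is straightforward to generate. In the sketch below the function name `make_toy_dataset` and all default values are hypothetical, and drawing the same number of daughters for every mother is an assumption, since the slide does not say how the N daughters are distributed among the mothers.

```python
import random

def make_toy_dataset(n_mothers=5, n_daughters=10, tau=100, sigma=0.1, seed=None):
    """Return (mothers, daughters, labels) for the toy clustering problem.

    Each mother series has tau points drawn from U[-1, 1]; each daughter is
    its mother plus independent Gaussian noise of width sigma at every point.
    labels[j] is the index of the mother that generated daughters[j].
    """
    rng = random.Random(seed)
    mothers = [[rng.uniform(-1.0, 1.0) for _ in range(tau)]
               for _ in range(n_mothers)]
    daughters, labels = [], []
    for k, mother in enumerate(mothers):
        for _ in range(n_daughters):
            daughters.append([rng.gauss(x, sigma) for x in mother])
            labels.append(k)
    return mothers, daughters, labels

mothers, daughters, labels = make_toy_dataset(n_daughters=4, tau=20, seed=0)
```

Because `labels` records the true partition, the output of a clustering run on `daughters` can be checked directly against it.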



Results: Toy Model

[Figure: number of clusters vs. $\sigma_{\text{prior}}$; the correct partition has 5 clusters.]

Number of clusters vs. $\sigma_{\text{prior}}$, and an example of a sub-optimal partition



Results: Real Series - T = 180 sec - $\tau = 100$

[Figure: rescaled price vs. time (two panels); occurrences by stock ticker (HBOS and RBS labelled) and by trading day.]

Example of a cluster $C_k$ - confidence bands at 1 $\sigma_k$ and 2 $\sigma_k$ - occurrences by stock and by day


Cause-Effect Analysis: Series - T = 180 sec - $\tau = 100$

[Figure: rescaled price vs. time, panels "Time left" and "Time right"; occurrences by cause-cluster, with the 9th marked.]

Effect-cluster $C_5$ - occurrences of cause-clusters - cause-cluster $C_9$


Conclusions and Future Analyses

We conclude that:
- Memory effects are confirmed by the increase of p(bounce | number of previous bounces)
- There is visual evidence of similarity among price series
- There are possible cause-effect relations in the dynamics of events

Prospects for future analyses:
- longer series ⟹ seasonal dynamics
- series standardized to mean increment = 1 ⟹ study of relative price variations
- series in tick-time ⟹ net effect of transactions



Thank you for your attention!