Page 1: Quantum Statistics

Quantum Statistics

Madalin Guta

School of Mathematics

University of Nottingham

1

Page 2: Quantum Statistics

The old paradigm

Quantum Mechanics up to the 80’s

I Quantum measurements have random results

I Only probability distributions can be predicted

I Perform measurements on huge ensembles

I Observe averages

Old Paradigm: It makes no sense to talk about individual systems

E. Schrödinger [1952]: We are not experimenting with single particles, any more than we can raise Ichthyosauria in the zoo

2

Page 3: Quantum Statistics

The new paradigm

Individual quantum systems are carriers of a new type of information

[Figure: experimental realisations of individual quantum systems. Delft qubit [2003]; Naik et al, Nature, 2006; Haffner et al, Nature, 2005; Monroe, Nature, 2002]

3

Page 4: Quantum Statistics

Quantum Information Science

I Quantum Information

  I quantum entropy
  I correlations (entanglement) between quantum systems
  I capacity of a quantum channel for information transmission

I Quantum Computation
  I algorithms for quantum computers (e.g. Shor's factoring algorithm)
  I error correction theory
  I different practical implementations of quantum circuits (ion traps, photons, solid state...)

I Quantum Filtering and Control
  I stochastic evolution and continuous time measurements
  I protecting quantum systems from 'decoherence'
  I steering systems towards a desired state

I Quantum Probability and Statistics
  I unified framework for classical and quantum stochastics
  I measurement design for optimal statistical inference
  I use of probabilistic ideas in operator algebra theory

4

Page 5: Quantum Statistics

Quantum measurements

I Every quantum system has an associated Hilbert space, e.g. ℂ^d

I The density matrix (quantum state) encodes all information about the preparation of the system

ρ = ( ρ_11  ρ_12  …  ρ_1d
      ρ_21  ρ_22  …  ρ_2d
       ⋮     ⋮    ⋱   ⋮
      ρ_d1  ρ_d2  …  ρ_dd )  ≥ 0,   Tr(ρ) = 1

I A measurement M with values in Ω = {1, 2, …, k} is given by a 'positive operator valued measure'

M_i ∈ M(ℂ^d),  M_i ≥ 0,  Σ_{i=1}^k M_i = 1

I Statistical interpretation: the outcome X is random, and the probability that X = i when the system is prepared in state ρ is

P^{(M)}_ρ([X = i]) = Tr(ρ M_i)
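The Born rule above is easy to check numerically. A minimal sketch (the state and POVM below are an assumed toy example, not from the slides):

```python
import numpy as np

# Numerical sketch of the Born rule P[X = i] = Tr(rho M_i), for an assumed
# measurement in the standard basis of C^2.
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)  # a density matrix

# the POVM: here simply the two basis projections
M = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]

# state check: rho = rho*, rho >= 0, Tr(rho) = 1
assert np.allclose(rho, rho.conj().T)
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)
assert np.isclose(np.trace(rho).real, 1.0)

# POVM check: M_i >= 0 and sum_i M_i = 1
assert np.allclose(sum(M), np.eye(2))

probs = [np.trace(rho @ Mi).real for Mi in M]  # outcome probabilities
print(probs)  # [0.75, 0.25]
```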

5

Page 6: Quantum Statistics

Quantum Statistics

I Quantum mechanics makes predictions about the direct map

M : ρ ↦ P^{(M)}_ρ

I What if ρ is not known? A quantum statistical model (experiment) is a family of states indexed by a parameter θ belonging to a space Θ

Q := {ρ_θ : θ ∈ Θ}

I For each M we obtain a classical statistical model P_M := {P^{(M)}_{ρ_θ} : θ ∈ Θ} and we can apply 'classical' statistical tools to solve inverse problems like

X ∼ P^{(M)}_{ρ_θ} ↦ θ̂(X) ≈ θ

I Questions:
  I for which measurements is θ identifiable?

I which measurements are optimal for a given statistical problem?

I how much statistical information does Q contain?

I can we develop a theory of statistical models at the quantum level ?

6

Page 7: Quantum Statistics

Motivation/Applications

Quantum measurement scheme:

ρ_θ ∼∼∼→ [ M : measurement apparatus ] → X ∼ P^M_θ (result) → θ̂(X) (estimator)

Quantum Engineering

I statistical validation through measurements of new quantum states and devices

I quantum state/process estimation

Quantum Information and Computation

I encoding and decoding information with quantum states

I state discrimination

Statistics

I extend statistical decision theory to non-commutative models

I connections with Quantum Probability, Quantum Control...

7

Page 8: Quantum Statistics

History of Quantum Statistics

R.L. Stratonovich

[1966] Transmission rates for quantum channels

C. W. Helstrom

[1967-1976] “Quantum Detection and Estimation Theory”

V. P. Belavkin

[1975] Optimal multiple hypothesis testing
[1976] Generalised uncertainty relations

8

Page 9: Quantum Statistics

History of Quantum Statistics

A. S. Holevo

[1972] non-commutative statistical decision theory
[1982] "Probabilistic and Statistical Aspects of Quantum Theory"

The Japanese School

H. Nagaoka, A. Fujiwara, M. Hayashi, K. Matsumoto

[1987-] Differential geometric aspects of quantum state estimation
[1996-] Quantum Fisher information and asymptotic estimation

R. D. Gill

[1998-] Asymptotic bounds and optimal quantum estimation
[2001-] Statistical approach to Bell inequalities

9

Page 10: Quantum Statistics

Useful references

I BOOKS

1. C. W. Helstrom, Quantum Detection and Estimation Theory (1976)

2. A. S. Holevo: Probabilistic and statistical aspects of quantum theory (1982)

3. M.A. Nielsen and I.L. Chuang: Quantum computation and quantum information (2000)

I ONLINE LECTURE NOTES

1. R. Gill et al.: Quantum Statistics [book draft]
   http://www.math.leidenuniv.nl/~gill/teaching/quantum/pages from Qbook.pdf

2. H. Maassen: Quantum Probability Theory
   http://www.math.ru.nl/~maassen/lectures/qp.pdf

3. N.P. Landsman: Lecture notes on C*-algebras, Hilbert C*-modules and Quantum Mechanics
   http://xxx.lanl.gov/pdf/math-ph/9807030

I PAPERS

1. Artiles, L., Gill, R., Guta, M., An invitation to quantum tomography, J. Royal Statist. Soc. B, 67, (2005), 109-134.

2. Barndorff-Nielsen, O.E., Gill, R., Jupp, P.E., On quantum statistical inference (with discussion), J. R. Statist. Soc. B, 65, (2003), 775-816.

3. Guta, M., Janssens, B., Kahn, J., Optimal estimation of qubit states with continuous time measurements, Commun. Math. Phys., 277, (2008), 127-160.

10

Page 11: Quantum Statistics

Color code

red is used for keywords

brown is used for notions which are defined in appendices

11

Page 12: Quantum Statistics

Quantum Mechanics as non-commutative probability theory

                         classical                                |  quantum
'Space':                 (Ω,Σ,ν) measure space                    |  H Hilbert space
Observables:             L∞(Ω,Σ,ν) bounded random variables       |  B(H) bounded selfadjoint operators
States:                  p ∈ L¹(Ω,Σ,ν) probability densities      |  ρ ∈ T1(H) density matrices
Pairing = expectations:  (p, f) ↦ ∫ p(ω)f(ω)ν(dω)                 |  (ρ, A) ↦ Tr(ρA)
Duality:                 L∞(Ω,Σ,ν) = L¹(Ω,Σ,ν)*                   |  B(H) = T1(H)*
Transformations:         T : L¹(Ω,Σ,ν) → L¹(Ω,Σ,ν) randomisations |  C : T1(H) → T1(H) quantum channel
                         (positive, normalised)                   |  (completely positive, normalised)

12

Page 13: Quantum Statistics

Hilbert spaces

I Inner product space

I Hilbert space

I Orthonormal basis

I Physical examples

13

Page 14: Quantum Statistics

Inner product spaces

Definition
An inner product on a ℂ-linear space V is a map 〈·,·〉 : V × V → ℂ satisfying the following conditions for all u, v, w ∈ V, λ ∈ ℂ:

I 〈u, u〉 ≥ 0, and 〈u, u〉 = 0 if and only if u = 0
I 〈u, v + w〉 = 〈u, v〉 + 〈u, w〉
I 〈u, λv〉 = λ〈u, v〉
I 〈u, v〉 is the complex conjugate of 〈v, u〉

Example

I ℂ^n: n-tuples u := (u_1, u_2, …, u_n) of complex numbers with

〈u, v〉 = Σ_{j=1}^n ū_j v_j

I C[a, b]: continuous complex-valued functions on [a, b] with

〈f, g〉 = ∫_a^b f̄(x) g(x) dx

14

Page 15: Quantum Statistics

Hilbert spaces

Definition (Hilbert space)

An inner product space (H, 〈·,·〉) is called a Hilbert space if it is complete with respect to the norm ‖h‖ := √〈h, h〉.

Example

I L²([a, b]): the space of square integrable functions on [a, b] with

〈f, g〉 := ∫ f̄(x) g(x) dx

I L²(Ω, Σ, P): the space of square integrable random variables on (Ω, Σ, P) with

〈X, Y〉 := E(X̄ Y) = ∫ X̄(ω) Y(ω) P(dω)

15

Page 16: Quantum Statistics

Orthonormal basis (ONB) in a separable Hilbert space

Definition
Let (H, 〈·,·〉) be a Hilbert space. A sequence of vectors {e_k}_{1≤k≤N} is an ONB of H if its linear span is dense in H and 〈e_i, e_j〉 = δ_{i,j} for all i, j.

H is separable if and only if it has a countable ONB.

Properties

I Any vector x ∈ H has a unique decomposition x = Σ_k x_k e_k, where x_k = 〈e_k, x〉 are the Fourier coefficients w.r.t. the ONB {e_k}. The following Parseval equality holds:

‖x‖² = Σ_k |〈e_k, x〉|²

I If K is a closed subspace of H and K⊥ its orthogonal complement, then x has a unique decomposition x = y + y⊥ with y ∈ K and y⊥ ∈ K⊥, and ‖x‖² = ‖y‖² + ‖y⊥‖². The vector y is called the orthogonal projection of x onto K and satisfies

y = arg min_{z∈K} ‖z − x‖
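The Fourier decomposition and the Parseval equality are easy to verify numerically; a small sketch (the ONB below is an assumed random one, built from a QR factorization, not part of the slides):

```python
import numpy as np

# Build an ONB of C^3 as the columns of a unitary matrix from a QR factorization.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
basis = [Q[:, k] for k in range(3)]

x = rng.normal(size=3) + 1j * rng.normal(size=3)
coeffs = [np.vdot(e, x) for e in basis]            # x_k = <e_k, x> (vdot conjugates e)
x_rec = sum(c * e for c, e in zip(coeffs, basis))  # x = sum_k x_k e_k

assert np.allclose(x_rec, x)                       # unique decomposition
assert np.isclose(sum(abs(c) ** 2 for c in coeffs), np.vdot(x, x).real)  # Parseval
```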

16

Page 17: Quantum Statistics

Direct sum and tensor products of Hilbert spaces

Definition
Let H_1, H_2 be Hilbert spaces.

1. The direct sum H_1 ⊕ H_2 is the Hilbert space consisting of ordered pairs

h_1 ⊕ h_2 ≡ (h_1, h_2) ∈ H_1 × H_2

with inner product

〈g_1 ⊕ g_2, h_1 ⊕ h_2〉 = 〈g_1, h_1〉 + 〈g_2, h_2〉

H_1 and H_2 can be seen as orthogonal complements in H_1 ⊕ H_2 by identifying h_1 ∈ H_1 with h_1 ⊕ 0 and h_2 ∈ H_2 with 0 ⊕ h_2.

2. The tensor product H_1 ⊗ H_2 is the Hilbert space obtained as the norm completion of the algebraic tensor product w.r.t. the inner product

〈g_1 ⊗ g_2, h_1 ⊗ h_2〉 := 〈g_1, h_1〉〈g_2, h_2〉

If {e_i} and {f_j} are ONBs in H_1 and H_2, then {e_i ⊗ f_j} is an ONB in H_1 ⊗ H_2.
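On coordinates, the tensor product of finite-dimensional vectors can be realised with the Kronecker product; a minimal sketch of the defining identity (random vectors are an assumed example):

```python
import numpy as np

# Check <g1⊗g2, h1⊗h2> = <g1,h1><g2,h2> with np.kron realising ⊗ on coordinates.
rng = np.random.default_rng(1)
g1, g2, h1, h2 = (rng.normal(size=2) + 1j * rng.normal(size=2) for _ in range(4))

lhs = np.vdot(np.kron(g1, g2), np.kron(h1, h2))
rhs = np.vdot(g1, h1) * np.vdot(g2, h2)
assert np.isclose(lhs, rhs)
```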

17

Page 18: Quantum Statistics

Which Hilbert space corresponds to a given quantum system ?

I ℂ² for a spin, 2-level system, qubit

I ℂ² ⊗ ℂ² ⊗ ⋯ ⊗ ℂ² for n qubits

I L²(ℝ) for a particle in one dimension, harmonic oscillator

I F = ⊕_{n=0}^∞ H^{⊗_s n} (the bosonic Fock space) for bosonic many-particle systems, quantum noise

I L²(Ω, Σ, ν): square integrable random variables on (Ω, Σ, ν)

18

Page 19: Quantum Statistics

Hilbert space Operators

I Bounded operators

I The adjoint

I Selfadjoint operators

I Unbounded selfadjoint operators

19

Page 20: Quantum Statistics

Bounded selfadjoint operators

Definition
Let H be a Hilbert space. A linear map A : H → H is called a bounded linear operator on H if

‖A‖ := sup_{h≠0} ‖Ah‖/‖h‖ < ∞.

The space of bounded operators on H is denoted B(H).

Example (exercise)

I Any linear transformation of ℂ^d is bounded
I The shift S_y given by (S_y f)(x) = f(x − y) is a bounded operator on L²(ℝ)
I The Volterra operator (Tf)(s) = ∫_0^s k(s, t) f(t) dt with |k(s, t)| < C is a bounded operator on L²([0, 1])

Theorem
(B(H), ‖·‖) is a Banach algebra, i.e. a Banach space which is also an algebra and satisfies ‖A·B‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ B(H)

20

Page 21: Quantum Statistics

Adjoint, selfadjoint, C∗-property

Definition

I Let A ∈ B(H). The adjoint A∗ of A is defined by

〈g ,A∗h〉 = 〈Ag , h〉

I A is called selfadjoint if A = A∗

Lemma (C∗-property)

Let A ∈ B(H). Then ‖A∗‖2 = ‖A‖2 = ‖A∗A‖.

Proof.
From

‖Ah‖² = 〈Ah, Ah〉 = 〈h, A*Ah〉 ≤ ‖h‖² ‖A*A‖

we get ‖A‖² ≤ ‖A*A‖. Together with ‖A*A‖ ≤ ‖A‖ ‖A*‖ this implies ‖A‖ ≤ ‖A*‖. A similar argument shows that ‖A*‖ ≤ ‖A‖.

21

Page 22: Quantum Statistics

Examples of bounded selfadjoint operators

I Let H = ℂ^d and let {e_i} be the standard basis in ℂ^d. Then B(H) ≡ M(ℂ^d) by identifying A with the matrix [A_{i,j}], where

A_{i,j} = 〈e_i, A e_j〉

Then A ∈ B(H) is selfadjoint iff A_{i,j} = Ā_{j,i} (hermitian matrix)

I The Pauli matrices

σ_x = ( 0  1 ; 1  0 ),   σ_y = ( 0  −i ; i  0 ),   σ_z = ( 1  0 ; 0  −1 )

are selfadjoint and, together with the identity 1, form a basis of M(ℂ²).

I Let K be a closed subspace of H and let x = y + y⊥ be the unique orthogonal decomposition of x ∈ H, with y ∈ K and y⊥ ∈ K⊥.

The orthogonal projection P_K onto K, defined by P : x ↦ y, is a selfadjoint operator satisfying P = P²:

〈x_1, P x_2〉 = 〈x_1, y_2〉 = 〈y_1, y_2〉 = 〈P x_1, P x_2〉
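The Pauli-basis statement can be checked numerically: any hermitian 2×2 matrix decomposes as A = a_0·1 + a_x σ_x + a_y σ_y + a_z σ_z with a_k = Tr(σ_k A)/2. A small sketch (the matrix A is an assumed example):

```python
import numpy as np

# {1, σx, σy, σz} as a real basis of the hermitian 2x2 matrices.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

A = np.array([[2.0, 1 - 1j], [1 + 1j, -1.0]])              # hermitian
coeffs = [np.trace(s @ A).real / 2 for s in (I2, sx, sy, sz)]  # a_k = Tr(σ_k A)/2
A_rec = sum(c * s for c, s in zip(coeffs, (I2, sx, sy, sz)))
assert np.allclose(A_rec, A)
```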

22

Page 23: Quantum Statistics

Examples of unbounded selfadjoint operators

Definition
An (unbounded) linear operator on H is a linear map R : D(R) → H whose domain D(R) is a dense linear subspace of H.

The domain of R* consists of those h for which there exists g := R*(h) such that

〈Rk, h〉 = 〈k, g〉  for all k ∈ D(R)

R is selfadjoint if D(R) = D(R*) and R = R* on their common domain.

Example

The position and momentum operators Q and P are selfadjoint on L²(ℝ).

I Q : h ↦ (Qh)(x) = x h(x), with domain D(Q) = {h : ∫ |x h(x)|² dx < ∞}

I P : f ↦ −i df/dx, with domain D(P) = {f : f(b) − f(a) = ∫_a^b g(x) dx, g ∈ H} (i.e. absolutely continuous f with derivative g = f′ ∈ L²)

23

Page 24: Quantum Statistics

Spectral Theorem

I Spectral Theorem in finite dimensions

I Spectrum and resolvent

I Projection valued measures

I Spectral Theorem for bounded selfadjoint operators

I Continuous functional calculus

I Spectral Theorem: multiplication operator form

I L∞ functional calculus

I Multiplicity Theory

24

Page 25: Quantum Statistics

Spectral theorem in finite dimensions

Theorem (diagonalisation = spectral theorem)

Let A be a selfadjoint operator on ℂ^d. Then there exists an ONB of eigenvectors of A:

A f_k = λ_k f_k,  k = 1, …, d

where λ_k ∈ ℝ are the eigenvalues of A. Let P_k be the one-dimensional projections associated to f_k; then

A = Σ_{k=1}^d λ_k P_k = ( λ_1  0  …  0
                          0  λ_2  …  0
                          ⋮   ⋮   ⋱  ⋮
                          0   0  …  λ_d )

Remark

I If A = A* ∈ M(ℂ^d) then ‖A‖ = max(|λ_1|, …, |λ_d|)
I If A, B are selfadjoint and commute, i.e. AB = BA, then they have a common eigenbasis, so they can be diagonalised simultaneously. If AB ≠ BA no such basis exists.
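The finite-dimensional statement can be illustrated with `numpy.linalg.eigh`; a minimal sketch (the hermitian matrix is an assumed random example):

```python
import numpy as np

# A = sum_k λ_k P_k with rank-one eigenprojections P_k, via np.linalg.eigh.
rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = (B + B.conj().T) / 2                                    # selfadjoint

lam, F = np.linalg.eigh(A)                                  # eigenvalues, ONB columns
P = [np.outer(F[:, k], F[:, k].conj()) for k in range(3)]   # projections P_k

assert np.allclose(sum(l * p for l, p in zip(lam, P)), A)   # A = Σ λ_k P_k
assert np.isclose(np.linalg.norm(A, 2), np.abs(lam).max())  # ‖A‖ = max|λ_k|
```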

25

Page 26: Quantum Statistics

Resolvent and spectrum

Definition
Let A ∈ B(H). A complex number α is said to be in the resolvent set ρ(A) if α1 − A is a bijection with bounded inverse. The spectrum of A is defined as σ(A) = ℂ \ ρ(A).

Properties

The spectrum σ(A) is

I contained in the set {α ∈ ℂ : |α| ≤ ‖A‖}
I compact
I non-empty

If A is selfadjoint then σ(A) ⊂ ℝ and r(A) := sup_{λ∈σ(A)} |λ| = ‖A‖

Example (exercise)

I The matrix σ_+ := ( 0 1 ; 0 0 ) has spectrum σ(σ_+) = {0}
I Let f ∈ C([0, 1]). The multiplication operator M_f ∈ B(L²([0, 1])) has spectrum σ(M_f) = {y : f(x) = y for some x ∈ [0, 1]}
I If U is unitary (UU* = U*U = 1) then σ(U) ⊂ {λ ∈ ℂ : |λ| = 1}

26

Page 27: Quantum Statistics

Projection valued measure (PVM)

Definition
Let {A_n} be a sequence of operators in B(H). We say that

I A_n converges in norm to A ∈ B(H) if lim_{n→∞} ‖A_n − A‖ = 0
I A_n converges strongly to A ∈ B(H) if lim_{n→∞} ‖(A_n − A)h‖ = 0 for any h ∈ H

Definition
A projection valued measure (PVM) over a measure space (Ω, Σ) is a map P : Σ → B(H) which satisfies

I P(E) is an orthogonal projection for each E ∈ Σ
I P is σ-additive: for any countable family {E_i} of mutually disjoint sets, P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) (the sum converging strongly)
I P(Ω) = 1

For any unit vector h ∈ H we define the probability measure on (Ω, Σ)

P_h(E) = 〈h, P(E)h〉

27

Page 28: Quantum Statistics

Examples of projection valued measures

Example

I Let {e_i}_{i=1,…,N} be an ONB in H. Then the corresponding orthogonal projections P_i define a PVM over {1, …, N} with P(E) = Σ_{i∈E} P_i. The measure P_h is given by P_h({i}) = |〈h, e_i〉|²

I Let H = L²(Ω, Σ, ν) and let P(E) be the projection onto the subspace of functions with support in E:

P(E) : f ↦ f · χ_E

The measure P_h is given by P_h(dω) = |h(ω)|² ν(dω)

28

Page 29: Quantum Statistics

Spectral Theorem

Theorem (Spectral Theorem)

Let A ∈ B(H) be selfadjoint. Then there exists a PVM P over ℝ such that

A = ∫ λ P(dλ)

in the sense that 〈h, Ah〉 = ∫ λ P_h(dλ) for every h ∈ H.

The PVM is supported by the spectrum: P(σ(A)) = 1.

Example (exercise)

The multiplication operator M_x : f(x) ↦ x f(x) on L²([0, 1]) does not have any eigenvalue, but has a 'continuous' PVM with

P(E) : f 7→ f · χE

the projection onto the subspace of L2([0, 1]) functions with support in E

29

Page 30: Quantum Statistics

Main steps of the proof

I Continuous functional calculus: define f(A) for f ∈ C(σ(A))

I Spectral Theorem (multiplication operator form): A is unitarily equivalent to a multiplication operator on L²

I L∞ calculus: define f(A) for f ∈ L∞(σ(A), µ) such that P(E) = χ_E(A)

30

Page 31: Quantum Statistics

Proof of the Spectral Theorem (I)

Theorem (Continuous functional calculus)

There is a unique map φ : C(σ(A)) → B(H) with the properties

(i) φ is a C*-algebra morphism, i.e.

φ(fg) = φ(f)φ(g),  φ(λf) = λφ(f),  φ(f̄) = φ(f)*,  φ(1) = 1

(ii) φ is isometric: ‖φ(f)‖ = ‖f‖_∞
(iii) let Id be the function Id(λ) = λ; then φ(Id) = A

Proof.

1. Let P(λ) = Σ_{n=1}^p a_n λ^n be a polynomial and let P(A) := Σ_{n=1}^p a_n A^n. Then (i) and (iii) are satisfied for polynomials by choosing φ(P) = P(A)

2. Show that σ(P(A)) = {P(λ) : λ ∈ σ(A)} (exercise)

3. Show that ‖P(A)‖ = sup_{λ∈σ(A)} |P(λ)|:

‖P(A)‖² = ‖P(A)*P(A)‖ = ‖(P̄P)(A)‖ = sup_{λ∈σ((P̄P)(A))} |λ| = sup_{λ∈σ(A)} |(P̄P)(λ)| = ( sup_{λ∈σ(A)} |P(λ)| )²

where the second-to-last equality uses step 2.

4. Extend φ by continuity from polynomials to the whole of C(σ(A))

31

Page 32: Quantum Statistics

Proof of the Spectral Theorem (II)

Definition
Let h ∈ H be a unit vector. Then f ↦ 〈h, f(A)h〉 is a positive linear functional on C(σ(A)), and by the Riesz-Markov Theorem there exists a probability measure µ_h on σ(A) such that

〈h, f(A)h〉 = ∫_{σ(A)} f(λ) µ_h(dλ)

The measure µ_h is called the spectral measure associated to h.

Definition
A vector h is cyclic for A ∈ B(H) if the span of {A^n h}_{n=0}^∞ is dense in H.

Theorem
Let A ∈ B(H) be selfadjoint with cyclic vector h. Then there exists a unitary U : H → L²(σ(A), µ_h) such that

(U A U⁻¹ f)(λ) = λ f(λ)

32

Page 33: Quantum Statistics

Proof of the Spectral Theorem (III)

Proof.

1. Define U by U : φ(f)h ↦ f for all f ∈ C(σ(A)). Then U is norm preserving, by

‖φ(f)h‖² = 〈h, φ(f)*φ(f)h〉 = 〈h, φ(f̄f)h〉 = ∫ |f(λ)|² µ_h(dλ)

2. Since h is cyclic and C(σ(A)) is dense in L², U can be extended to a unitary operator

3. Check that U A U⁻¹ acts as multiplication by λ on functions in C(σ(A)):

(U A U⁻¹ f)(λ) = [U A φ(f)h](λ) = [U φ(Id)φ(f)h](λ) = [U φ(Id·f)h](λ) = λ f(λ)

33

Page 34: Quantum Statistics

Proof of the Spectral Theorem (IV)

Remark
In general there may not exist a cyclic vector, for example if A has a degenerate eigenvalue, i.e. there exist at least two linearly independent eigenvectors for the same eigenvalue.

Theorem (Spectral Theorem: multiplication operator form)

Let A ∈ B(H) be selfadjoint. Then there exist unit vectors {h_i}_{i=1}^N in H and a unitary operator U : H → ⊕_{i=1}^N L²(ℝ, µ_{h_i}) such that

(U A U⁻¹ f)_i(λ) = λ f_i(λ),  i ≥ 1

Proof.
Using Zorn's lemma we can split H into a direct sum of subspaces H_i such that

I A leaves each H_i invariant, i.e. Ah ∈ H_i for all h ∈ H_i
I for each i there exists a vector h_i ∈ H_i which is cyclic for A|_{H_i}

We then apply the previous Theorem to each cyclic subspace.

34

Page 35: Quantum Statistics

Proof of the Spectral Theorem (V)

Theorem (L∞ functional calculus)

Let µ be a probability measure on σ(A) such that µ ∼ {µ_{h_i}}_{i≥1}, i.e. µ(E) = 0 iff µ_{h_i}(E) = 0 for all i.

Then there exists a unique morphism φ : L∞(σ(A), µ) → B(H) such that

(i) φ is an extension of φ : C(σ(A)) → B(H) (Continuous Functional Calculus)

(ii) φ is isometric

(iii) P(E) = φ(χ_E)

(iv) φ is normal, i.e. it is continuous with respect to the weak* topologies on L∞(σ(A), µ) and B(H)

35

Page 36: Quantum Statistics

Proof of the Spectral Theorem (V)

Proof of (i)-(iii).

1. By the previous Theorem, for any f ∈ C(σ(A)) we have

φ(f) = U⁻¹ [⊕_i M_i(f)] U

where M_i(f) is multiplication by f on L²(σ(A), µ_{h_i})

2. This map can be extended to L∞(σ(A), µ). Indeed, since µ_{h_i} ≪ µ, the operator M_i(f) : g ↦ f·g is well defined for f ∈ L∞(σ(A), µ) (exercise)

3. φ is isometric: for any f ∈ L∞(σ(A), µ) there is an i such that ‖f‖_∞ = ‖M_i(f)‖

4. We have

f(A) := φ(f) = ∫ f(λ) P(dλ),  f ∈ L∞(σ(A), µ)

where the spectral projections of A are

P(E) = χ_E(A) = φ(χ_E) = U⁻¹ [⊕_i M_i(χ_E)] U

36

Page 37: Quantum Statistics

Further spectral analysis: multiplicity theory

The choice of cyclic vectors h_i and spectral measures µ_{h_i} is not unique, and it is not clear how to use them to answer the following natural question:

Question: Given two selfadjoint operators A, B, does there exist a unitary V such that A = V B V⁻¹?

Answer in finite dimensions: two selfadjoint matrices are unitarily equivalent iff they have the same spectrum and the same multiplicities for each eigenvalue.

Theorem (Hahn-Hellinger)

Any selfadjoint operator is unitarily equivalent to the multiplication operator on

⊕_{i=1}^{ℵ₀} ⊕_{k=1}^{i} L²(σ(A), µ_i^{(A)})

where all measures µ_i^{(A)} are mutually disjoint.

Two operators are unitarily equivalent if and only if all their measures are equivalent, µ_i^{(A)} ∼ µ_i^{(B)}, i.e. they have the same sets of measure zero.

Reference: V.S. Sunder, Functional Analysis: Spectral Theory, Birkhäuser (1998)

37

Page 38: Quantum Statistics

Trace-class operators

I The trace

I Polar decomposition

I Trace-class operators

I Duality between T1(H) and B(H)

38

Page 39: Quantum Statistics

The trace

Definition
The trace of a positive operator A ∈ B(H) is defined by

Tr(A) = Σ_k 〈e_k, A e_k〉

where {e_k} is an ONB. The trace has the following properties:

I Tr(A) is independent of the ONB {e_i}
I Tr(A + B) = Tr(A) + Tr(B)
I Tr(λA) = λ Tr(A), λ ≥ 0
I Tr(UAU*) = Tr(A) for all unitaries U
I if 0 ≤ A ≤ B then Tr(A) ≤ Tr(B)
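The basis-independence of the trace (proved on the next slide) is easy to confirm numerically; a sketch with an assumed random positive operator and two assumed random ONBs:

```python
import numpy as np

# Tr(A) = Σ_k <e_k, A e_k> gives the same value for any ONB.
rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = B.conj().T @ B                                          # positive operator

# two random ONBs as the columns of unitaries from QR factorizations
E, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
F, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))

tr_e = sum(np.vdot(E[:, k], A @ E[:, k]) for k in range(4))
tr_f = sum(np.vdot(F[:, k], A @ F[:, k]) for k in range(4))
assert np.isclose(tr_e, tr_f) and np.isclose(tr_e, np.trace(A))
```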

39

Page 40: Quantum Statistics

The trace is independent of the ONB

Proof.
Given the ONB {e_k} define Tr_e(A) = Σ_k 〈e_k, A e_k〉. Let {f_j} be another ONB. Then

Tr_e(A) = Σ_k 〈e_k, A e_k〉 = Σ_k ‖A^{1/2} e_k‖²
        = Σ_k ( Σ_j |〈f_j, A^{1/2} e_k〉|² )
        = Σ_j ( Σ_k |〈A^{1/2} f_j, e_k〉|² )
        = Σ_j ‖A^{1/2} f_j‖²
        = Tr_f(A)

where the sums can be exchanged since all terms are positive.

The other properties are left as an exercise.

40

Page 41: Quantum Statistics

The polar decomposition

Definition

I An operator W ∈ B(H) is called a partial isometry if both WW* and W*W are orthogonal projections
I The absolute value of B ∈ B(H) is defined by |B| = √(B*B) ≥ 0

Theorem (polar decomposition)

Let B ∈ B(H). Then there exists a partial isometry W such that B = W|B|. W is uniquely determined by the condition Ker(W) = Ker(B).

Sketch of the proof.

I The map W : Ran(|B|) → Ran(B) given by W : |B|h ↦ Bh is well defined since

‖|B|h‖² = 〈|B|h, |B|h〉 = 〈h, |B|²h〉 = 〈h, B*Bh〉 = ‖Bh‖²

I Extend W to an isometry from the closure of Ran(|B|) to the closure of Ran(B), and to zero on Ran(|B|)⊥
I Since Bh = 0 ⇔ |B|h = 0, we have Ker(W) = Ker(|B|) = Ker(B)
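In finite dimensions the polar decomposition falls out of the SVD; a sketch with an assumed random (generically invertible) matrix B:

```python
import numpy as np

# If B = U diag(s) Vh (SVD), then |B| = Vh* diag(s) Vh and W = U @ Vh gives B = W|B|.
rng = np.random.default_rng(4)
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

U, s, Vh = np.linalg.svd(B)
absB = Vh.conj().T @ np.diag(s) @ Vh             # |B| = sqrt(B*B)
W = U @ Vh                                       # unitary here, since B is generic

assert np.allclose(W @ absB, B)                  # B = W|B|
assert np.allclose(absB @ absB, B.conj().T @ B)  # |B|^2 = B*B
```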

41

Page 42: Quantum Statistics

Trace-class operators

Definition
The space of trace-class operators is

T1(H) = {τ ∈ B(H) : ‖τ‖₁ := Tr|τ| < ∞}.

Properties

1. T1(H) is a Banach space

2. Let A ∈ T1(H) be selfadjoint. Then A has a complete basis of eigenvectors {e_i} with eigenvalues {λ_i} such that A = Σ_i λ_i P_{e_i} and ‖A‖₁ = Σ_i |λ_i|

3. If A ∈ T1(H) and B ∈ B(H) then

   I A*, AB, BA ∈ T1(H)
   I Tr(AB) = Tr(BA)
   I Tr(A*) is the complex conjugate of Tr(A)
   I |Tr(AB)| ≤ ‖A‖₁ · ‖B‖

Remark
Point 2 is a particular case of the following: any selfadjoint compact operator has discrete spectrum with λ_i → 0.

Point 3 can be proved using point 2 and the polar decomposition (exercise).

42

Page 43: Quantum Statistics

Point 1.: proof of the triangle inequality

I Let A,B ∈ T1. We will show that ‖A + B‖1 ≤ ‖A‖1 + ‖B‖1. Consider thepolar decompositions

A + B = U|A + B|, A = V |A|, B = W |B|

Then

Tr(|A+B|) =X

k

〈ek ,U∗(A+B)ek〉 ≤

Xk

|〈ek ,U∗V |A|ek〉|+

Xk

|〈ek ,U∗W |B|ek〉|

I Now by applying the Cauchy-Schwarz inequality twiceXk

|〈ek ,U∗V |A|ek〉| ≤

Xk

‖|A|1/2V ∗Uek‖ · ‖|A|1/2ek‖

≤ (X

k

‖|A|1/2V ∗Uek‖2)1/2 · (X

k

‖|A|1/2ek‖2)1/2

I The sums on the right side are equal to Tr(U∗V |A|V ∗U) and Tr(|A|).

I Using the fact that U,V are partial isometries one can show thatTr(U∗V |A|V ∗U) ≤ Tr(|A|) hence the left side is smaller that ‖A‖1

43

Page 44: Quantum Statistics

B(H) = T1(H)∗

Definition
Let V be a Banach space. The dual V* is the space of continuous linear maps t : V → ℂ. V* is a Banach space when endowed with the norm

‖t‖ = sup_{‖v‖=1} |t(v)|

Theorem
The space (B(H), ‖·‖) is the dual of T1(H) with the pairing

T1(H) × B(H) ∋ (τ, A) ↦ Tr(τA)

Sketch of the proof.

1. Show that B(H) ⊂ T1(H)*. Let B ∈ B(H). Since |Tr(Bτ)| ≤ Tr(|τ|)‖B‖, the linear functional τ ↦ Tr(τB) is bounded on T1(H)

2. Show that T1(H)* ⊂ B(H). Let ℓ ∈ T1(H)*. Then ℓ(|h〉〈k|) = 〈k, Bh〉 = Tr(B|h〉〈k|) for some B ∈ B(H). Use the fact that finite rank operators are ‖·‖₁-dense in T1(H)

44

Page 45: Quantum Statistics

States, Observables and Measurements

I States and observables in Quantum Mechanics

I The weak∗-topology

I Measurements as (completely) positive maps

I Positive operator valued measures

I Naimark’s dilation Theorem

45

Page 46: Quantum Statistics

States and observables in quantum mechanics

Definition
Let H be the Hilbert space associated to a quantum system.

I A (bounded) observable is a selfadjoint operator A ∈ B(H)
I A density matrix is a positive trace-class operator ρ with Tr(ρ) = 1
I A state on B(H) is a linear functional ϕ : B(H) → ℂ of the form

ϕ(A) = Tr(ρA)

where ρ is a density matrix.

Lemma (exercise)

Let A = ∫_{σ(A)} λ P(dλ) be an observable and let ϕ be a state with density matrix ρ. Then

P_ρ(E) = ϕ(χ_E(A)) = Tr(P(E)ρ),  for Borel sets E ⊂ σ(A)

defines a probability distribution over σ(A).

46

Page 47: Quantum Statistics

Probabilistic interpretations

I Probabilistic interpretation for measurements of observables

If we measure the observable A = ∫ λ P(dλ) of a system prepared in a state ϕ with density matrix ρ, we obtain a random result X ∈ σ(A) with distribution P_ρ.

I Probabilistic interpretation for mixtures of states

Recall that any selfadjoint τ ∈ T1(H) has the spectral decomposition τ = Σ λ_i P_i. In particular, if τ is a density matrix then λ_i ≥ 0 and Σ_i λ_i = 1.

The space of density matrices (states) S(H) is convex and its extremal points are the one dimensional projections |h〉〈h|, called pure states.

If a system is prepared randomly in state ρ_i with probability µ_i (µ_i ≥ 0, Σ µ_i = 1) and i is unknown, then the corresponding state is

ρ = Σ_i µ_i ρ_i

47

Page 48: Quantum Statistics

The weak∗-topology

Definition
Let V be a Banach space. The weak*-topology on the dual V* is defined by the convergence criterion (on nets): ℓ_n →^{w*} ℓ iff ℓ_n(v) → ℓ(v) for all v ∈ V.

Example

I L∞(Ω, Σ, µ) = L¹(Ω, Σ, µ)*:

f_n →^{w*} f iff ∫ p(ω) f_n(ω) µ(dω) → ∫ p(ω) f(ω) µ(dω) for all p ∈ L¹(Ω, Σ, µ)

I B(H) = T1(H)*:

A_n →^{w*} A iff Tr(τ A_n) → Tr(τA) for all τ ∈ T1(H)

Theorem
Let V be a Banach space. The linear functionals on V* which are continuous with respect to the weak*-topology are precisely those of V ⊂ V**, i.e. v(ℓ) := ℓ(v) for v ∈ V and ℓ ∈ V*.

48

Page 49: Quantum Statistics

weak∗-continuity of φ

Recall
The L∞ functional calculus Theorem associates to the selfadjoint operator A a morphism φ : L∞(σ(A), µ) → B(H).

Lemma
φ is continuous with respect to the weak*-topology.

Sketch of the proof.

I Let ρ ∈ S(H). P_ρ is dominated by µ. Indeed,

µ(E) = 0 ⇒ P(E) = 0 ⇒ P_ρ(E) = Tr(ρP(E)) = 0

I Thus P_ρ has density p_ρ = dP_ρ/dµ ∈ L¹(σ(A), µ)

I If f_n →^{w*} f then Tr(ρ φ(f_n)) = ∫ f_n(ω) p_ρ(ω) µ(dω) → ∫ f(ω) p_ρ(ω) µ(dω)

49

Page 50: Quantum Statistics

Measurements as (completely) positive unital maps (I)

Definition
Let φ : L∞(σ(A), µ) → B(H) be the weak*-continuous morphism constructed previously. We define

φ* : T1(H) → L¹(σ(A), µ)

by the duality

Tr(τ φ(f)) = ∫ f(ω) p_τ(ω) µ(dω),  p_τ := φ*(τ)

φ* has the following properties:

I it is linear and positive, i.e. φ*(τ) ≥ 0 if τ ≥ 0
I it is normalised, i.e. p_ρ is a probability density if ρ is a density matrix

50

Page 51: Quantum Statistics

Measurements as (completely) positive unital maps (II)

Theorem
Let M : L∞(Ω, Σ, µ) → B(H) be a linear map such that:

I M is positive, i.e. M(f) ≥ 0 if f ≥ 0
I M is unital, i.e. M(1) = 1
I M is continuous with respect to the w*-topology

Then there exists a linear map M* : T1(H) → L¹(Ω, Σ, µ) which satisfies

Tr(τ M(f)) = ∫ p_τ(ω) f(ω) µ(dω),  p_τ := M*(τ)

and is

I positive, i.e. M*(τ) ≥ 0 for τ ≥ 0
I normalised, i.e. p_ρ is a probability density if ρ is a density matrix

Conversely, any linear map M* with these properties has a dual M.

Hints for the proof.

⇒: show that f ↦ Tr(τM(f)) is weak*-continuous, and hence is given by some p_τ
⇐: show that M* is ‖·‖₁-continuous. Then define M(f) as an element of the dual of T1(H)

51

Page 52: Quantum Statistics

Measurements as (completely) positive unital maps (III)

Definition (general definition of a measurement)

Let B(H) be the algebra of observables of a quantum system. A measurement with outcomes in the measure space (Ω, Σ) is given by a dual pair (M, M*) as above. The result X ∈ Ω of M has probability density p_ρ := M*(ρ) ∈ L¹(Ω, Σ, µ).

Example

I Let {r_i} be 3 coplanar unit vectors in ℝ³ forming 120 degree angles. The triad, or Mercedes-Benz, measurement on ℂ² consists of the 3 operators

M_i = (1/3)(1 + r⃗_i · σ⃗) = (1/3) ( 1 + r_{i,z}        r_{i,x} − i r_{i,y}
                                    r_{i,x} + i r_{i,y}  1 − r_{i,z} )

I Randomised measurement: let M, N be two measurements with outcomes in (Ω, Σ). Then R(f) := λM(f) + (1 − λ)N(f) defines a measurement obtained by randomly choosing M or N with probabilities (λ, 1 − λ)
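The triad measurement can be written down concretely; a sketch taking the three Bloch vectors r_i in the x-z plane at 120 degrees (this particular orientation of the r_i is an assumed choice):

```python
import numpy as np

# Trine ('Mercedes-Benz') POVM: M_i = (1 + r_i · σ)/3 with r_i at 120 degrees.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
M = [(np.eye(2) + np.sin(a) * sx + np.cos(a) * sz) / 3 for a in angles]

assert np.allclose(sum(M), np.eye(2))                           # Σ M_i = 1
assert all(np.linalg.eigvalsh(Mi).min() >= -1e-12 for Mi in M)  # M_i >= 0

rho = np.array([[1, 0], [0, 0]], dtype=complex)                 # pure state |0><0|
probs = [np.trace(rho @ Mi).real for Mi in M]
assert np.isclose(sum(probs), 1.0)
```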

52

Page 53: Quantum Statistics

Positive operator valued measures (POVM)

Definition
Let (Ω, Σ) be a measure space. A map M : Σ → B(H) is called a positive operator valued measure (POVM) if it has the following properties:

I positivity: M(E) ≥ 0 for all E ∈ Σ
I σ-additivity: M(∪_i E_i) = Σ_i M(E_i) (in the sense of strong convergence) for any countable family of mutually disjoint sets E_i ∈ Σ
I normalisation: M(Ω) = 1

Theorem
Let M : L∞(Ω, Σ, µ) → B(H) be a measurement. Then the operators M(E) := M(χ_E) form a POVM over (Ω, Σ).

Conversely, for every POVM {M(E) : E ∈ Σ} over (Ω, Σ) with values in B(H) there exists a probability measure µ and a measurement M : L∞(Ω, Σ, µ) → B(H) with M(χ_E) = M(E).

53

Page 54: Quantum Statistics

Measurements and POVM’s

Proof.
⇒: we only need to prove the σ-additivity of M(E).

Lemma
Let M_n be an increasing net of positive operators converging to a bounded operator M w.r.t. the weak*-topology. Then M_n converges strongly to M, and M is the least upper bound l.u.b.(M_n).

Since M is weak*-continuous we have

M(∪_i E_i) = w*-lim_{k→∞} Σ_{i=1}^k M(E_i).

By the previous lemma, Σ_{i=1}^k M(E_i) → M(∪_i E_i) strongly.

⇐: Given {M(E)} we construct M* : T1(H) → L¹(Ω, Σ, µ) as follows.

For every density matrix τ define the probability measure µ_τ(E) := Tr(τ M(E)). We only need to find a common dominating measure. Let ρ be a density matrix with strictly positive eigenvalues and let µ = µ_ρ. Then µ_τ ≪ µ, because µ(E) = 0 ⇒ M(E) = 0 ⇒ Tr(τM(E)) = 0. Thus M* : τ ↦ p_τ := dµ_τ/dµ ∈ L¹(Ω, Σ, µ) has the desired properties.

54

Page 55: Quantum Statistics

Naimark’s dilation Theorem

Example

Let P(E) be a PVM with values in B(K) and let V : H → K be an isometry. Then M(E) := V*P(E)V is a POVM with values in B(H).

Theorem (Naimark's dilation Theorem)

Let M : L∞(Ω, Σ, µ) → B(H) be a measurement. Then there exists a projection valued measure P : Σ → B(K) and an isometry V : H → K such that

M(E) = V*P(E)V,  E ∈ Σ

Remark

I Naimark's Theorem is a consequence of Stinespring's Theorem for commutative C*-algebras
I Since V is isometric, we can identify H with VH ⊂ K and write M(E) = P_H P(E) P_H

55

Page 56: Quantum Statistics

Proof of Naimark’s Theorem for finite measure spaces

Let Ω = {1, …, n} with POVM {M_1, …, M_n}.

I Define the (positive) inner product over the direct sum of n copies of H:

〈h, k〉_M = 〈(h_1, …, h_n), (k_1, …, k_n)〉_M := Σ_{i=1}^n 〈h_i, M_i k_i〉

I Let K be the Hilbert space (⊕_{i=1}^n H)/N, where

N := {h ∈ ⊕_{i=1}^n H : ‖h‖_M = 0}

is the space of null vectors of 〈·,·〉_M.

I Define V : H → K by V : h ↦ (h, …, h). Then V is an isometry:

〈Vh, Vk〉 = 〈h, k〉_M = Σ_i 〈h, M_i k〉 = 〈h, k〉

I Let P_i ∈ B(K) be the orthogonal projection onto the i-th copy of H
I Verify that V*P_i V = M_i:

〈h, V*P_i Vk〉 = 〈Vh, P_i Vk〉 = 〈(h, …, h), (0, …, k, …, 0)〉_M = 〈h, M_i k〉
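For a rank-one POVM the dilation can be written down explicitly: if M_i = |v_i〉〈v_i| with Σ_i M_i = 1, then V h := (〈v_1, h〉, …, 〈v_n, h〉) is an isometry into ℂ^n and the coordinate projections dilate the M_i. A sketch (the trine vectors below are an assumed example):

```python
import numpy as np

# Explicit Naimark dilation of a rank-one POVM M_i = |v_i><v_i| on C^2.
angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
vs = [np.sqrt(2 / 3) * np.array([np.cos(a / 2), np.sin(a / 2)], dtype=complex)
      for a in angles]
M = [np.outer(v, v.conj()) for v in vs]
assert np.allclose(sum(M), np.eye(2))             # a POVM on C^2

V = np.array([v.conj() for v in vs])              # 3x2 matrix with rows <v_i|
assert np.allclose(V.conj().T @ V, np.eye(2))     # V is an isometry

for i, Mi in enumerate(M):
    Pi = np.zeros((3, 3), dtype=complex)
    Pi[i, i] = 1                                  # projection onto i-th coordinate
    assert np.allclose(V.conj().T @ Pi @ V, Mi)   # V* P_i V = M_i
```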

56

Page 57: Quantum Statistics

Further topics related to measurements

I Measurements are ‖ · ‖1 contractive

I Bures (fidelity) distance on density matrices

I Measurements are contractive w.r.t the Bures-Hellinger distance

I Convex structure of measurements space. Extremal measurements

I In finite dimensions measurements have densities

57

Page 58: Quantum Statistics

Measurements are ‖ · ‖1-contractive maps

Lemma
Let M* : T1(H) → L¹(Ω, Σ, µ) be a measurement. Let ρ, τ be density matrices and p_ρ := M*(ρ), p_τ := M*(τ). Then

‖p_ρ − p_τ‖₁ ≤ ‖ρ − τ‖₁

Proof.

I Note that if f, g are probability densities,

‖f − g‖₁ = ∫ |f(ω) − g(ω)| µ(dω) = 2 sup_E ∫_E (f(ω) − g(ω)) µ(dω)

I Similarly, if ρ, τ are density matrices then we can write ρ − τ = δ₊ − δ₋, where δ± are positive operators with orthogonal supports. Then

‖ρ − τ‖₁ = Tr(|ρ − τ|) = Tr(δ₊ + δ₋) = 2 Tr(δ₊) = 2 sup_M Tr(M(ρ − τ))

where the supremum is taken over all operators 0 ≤ M ≤ 1.

I Finally,

‖p_ρ − p_τ‖₁ = 2 sup_E ∫_E (p_ρ(ω) − p_τ(ω)) µ(dω) = 2 sup_E Tr((ρ − τ)M(E)) ≤ ‖ρ − τ‖₁

58

Page 59: Quantum Statistics

Bures (fidelity) distance on density matrices

Definition
Let ρ, τ be two density matrices on H. The Bures (or fidelity) distance between ρ and τ is defined as

b(ρ, τ) := ( 2 − 2‖ρ^{1/2} τ^{1/2}‖₁ )^{1/2} = ( 2 − 2 Tr √(ρ^{1/2} τ ρ^{1/2}) )^{1/2}

Definition
Let ρ be a density matrix on H. A purification of ρ is any pure state P_ψ = |ψ〉〈ψ| on an extended space H ⊗ K such that ρ = Tr_K(P_ψ), or equivalently Tr(ρA) = 〈ψ, A ⊗ 1 ψ〉 for all A ∈ B(H).

Theorem

The fidelity F(ρ₁, ρ₂) := Tr √(ρ₁^{1/2} ρ₂ ρ₁^{1/2}) is equal to max |〈ψ₁, ψ₂〉|, where the maximum is taken over all purifications ψ₁, ψ₂ of ρ₁, ρ₂.

59

Page 60: Quantum Statistics

Fidelity and transition probability

Sketch of the proof.

1. Let τ ∈ T1(H). Then max |Tr(Uτ)| = Tr(|τ|), with the maximum taken over all unitaries U. (Use the polar decomposition τ = V|τ| with a unitary V.)

2. Let ρi = Σ_k λ_k^{(i)} |e_k^{(i)}〉〈e_k^{(i)}| be the spectral decompositions of ρi. Any purification of ρi is of the Schmidt form

ψi = Σ_k (λ_k^{(i)})^{1/2} e_k^{(i)} ⊗ f_k^{(i)} ∈ H ⊗ H

with {f_k^{(1)}} and {f_k^{(2)}} orthonormal sets in H (K can be taken to be H).

3. There exist unitaries Ui : f_k^{(i)} ↦ e_k^{(1)} for i = 1, 2 and V : e_k^{(2)} ↦ e_k^{(1)}. Check that

〈ψ1, ψ2〉 = Tr( ρ1^{1/2} ρ2^{1/2} U2^T V V^T U1^T )

4. Optimise over U2 and use point 1. to obtain the equality

max |〈ψ1, ψ2〉| = Tr( |ρ1^{1/2} ρ2^{1/2}| )
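The two expressions for the fidelity can be checked against each other numerically: Tr((ρ1^{1/2} ρ2 ρ1^{1/2})^{1/2}) equals the trace norm ‖ρ1^{1/2} ρ2^{1/2}‖1, i.e. the sum of singular values. A small sketch, assuming numpy (the random-state generator is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_density(d):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def sqrtm_psd(A):
    # square root of a positive semi-definite matrix via eigendecomposition
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0, None))) @ U.conj().T

rho1, rho2 = random_density(3), random_density(3)
s1, s2 = sqrtm_psd(rho1), sqrtm_psd(rho2)

# F = Tr (rho1^{1/2} rho2 rho1^{1/2})^{1/2} ...
F_direct = np.trace(sqrtm_psd(s1 @ rho2 @ s1)).real
# ... equals ||rho1^{1/2} rho2^{1/2}||_1, the sum of singular values
F_tracenorm = np.linalg.svd(s1 @ s2, compute_uv=False).sum()

assert abs(F_direct - F_tracenorm) < 1e-10
bures = np.sqrt(2 - 2 * F_direct)   # Bures distance b(rho1, rho2)
```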

60

Page 61: Quantum Statistics

Measurements are contractive w.r.t. the Bures-Hellinger distance

Definition
Let p, q be two probability densities in L1(Ω, Σ, µ). The Hellinger distance between p and q is defined as

h(p, q) := ‖√p − √q‖2 = ( 2 − 2 ∫ (p(ω) q(ω))^{1/2} µ(dω) )^{1/2}

Theorem
Let M∗ : T1(H) → L1(Ω, Σ, µ) be a measurement. Let ρ, τ be density matrices and pρ := M∗(ρ), pτ := M∗(τ). Then

h(pρ, pτ) ≤ b(ρ, τ)

61

Page 62: Quantum Statistics

Proof of contractivity with respect to the Bures-Hellinger distance

Proof in the case of a discrete measure space.

Let Ω = {1, . . . , n} and let {M1, . . . , Mn} be the measurement POVM. The theorem is equivalent to

Σ_{k=1}^n √pk √qk ≥ F(ρ, τ)

I By Naimark's Theorem we can embed H into a larger space K such that the measurement is given by a PVM {P1, . . . , Pn}. This operation leaves F(ρ, τ) invariant.

I F(ρ, τ) = sup_{ψ,φ} |〈ψ, φ〉|, where ψ, φ ∈ K ⊗ K are purifications of ρ, τ. Thus it suffices to show that for any ψ, φ

Σ_{k=1}^n √pk √qk ≥ |〈ψ, φ〉|

I But pk = 〈ψ, Pk ⊗ 1 ψ〉 = ‖Pk ⊗ 1 ψ‖² and qk = 〈φ, Pk ⊗ 1 φ〉 = ‖Pk ⊗ 1 φ‖².
By using Cauchy-Schwarz we finally get

Σ_{k=1}^n √pk √qk = Σ_{k=1}^n ‖Pk ⊗ 1 ψ‖ · ‖Pk ⊗ 1 φ‖ ≥ Σ_{k=1}^n |〈Pk ⊗ 1 ψ, Pk ⊗ 1 φ〉|

≥ |〈ψ, Σ_{k=1}^n Pk ⊗ 1 φ〉| = |〈ψ, φ〉|
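The contractivity statement itself can be sanity-checked numerically: draw two random states, measure them in a random orthonormal basis, and compare the Hellinger distance of the outcome laws with the Bures distance of the states. A sketch assuming numpy (the random choices are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_density(d):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def sqrtm_psd(A):
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0, None))) @ U.conj().T

def fidelity(a, b):
    # F(a, b) = ||a^{1/2} b^{1/2}||_1 = sum of singular values
    return np.linalg.svd(sqrtm_psd(a) @ sqrtm_psd(b), compute_uv=False).sum()

d = 3
rho, tau = random_density(d), random_density(d)

# random PVM = measurement in a random orthonormal basis
Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
p = np.array([(Q[:, i].conj() @ rho @ Q[:, i]).real for i in range(d)])
q = np.array([(Q[:, i].conj() @ tau @ Q[:, i]).real for i in range(d)])

h = np.sqrt(2 - 2 * np.sqrt(np.clip(p * q, 0, None)).sum())  # Hellinger distance of outcome laws
b = np.sqrt(2 - 2 * fidelity(rho, tau))                      # Bures distance of the states
assert h <= b + 1e-10
```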

62

Page 63: Quantum Statistics

Extremal measurements

Definition
A subset C of a vector space V is convex if for any u ≠ v ∈ C, the vectors w := λu + (1 − λ)v are in C for all 0 < λ < 1.

w ∈ C is called an extremal point of C if it cannot be decomposed as above.

Problem. Characterise the extremal points of the convex set of measurements M : L∞(Ω, Σ, µ) → B(H).

Theorem
Let {M1, . . . , Mn} be the POVM of a measurement M with values in Ω = {1, . . . , n}. Let

Mi = Σ_{j=1}^{ri} m_j^{(i)} |v_j^{(i)}〉〈v_j^{(i)}|

with 〈v_j^{(i)}, v_k^{(i)}〉 = δ_{j,k} and m_j^{(i)} > 0. Then M is extremal if and only if the rank-one operators {|v_j^{(i)}〉〈v_k^{(i)}| : i = 1, . . . , n; j, k = 1, . . . , ri} are linearly independent.

References:
K. R. Parthasarathy: Inf. Dim. Analysis, Quantum Probability Rel. Topics, 2, 557-568, (1999)
G. M. D'Ariano, et al: J. Phys. A: Math. Gen. 38, 5979-5991, (2005)

63

Page 64: Quantum Statistics

Extremal measurements: solution in the case of a finite POVM

Sketch of the proof.

1. M is not extremal iff there exist selfadjoint operators D1, . . . , Dn (not all equal to zero) such that Σ_i Di = 0 and Mi ± Di ≥ 0 for all i.
Indeed, {M1 ± D1, . . . , Mn ± Dn} are then POVMs and Mi = ½(Mi + Di) + ½(Mi − Di).

2. Mi ± Di ≥ 0 implies Ker(Mi) ⊂ Ker(Di).
This follows from 〈h, (Mi ± Di)h〉 ≥ 0 by writing h = αh1 + βh2 with h1 ∈ Ker(Mi) and h2 ∈ Ker(Mi)^⊥.

3. If Mi has spectral decomposition Mi = Σ_{j=1}^{ri} m_j^{(i)} |v_j^{(i)}〉〈v_j^{(i)}| with m_j^{(i)} > 0, then point 2. implies that Di can be expressed as

Di = Σ_{j,k} d_{j,k}^{(i)} |v_j^{(i)}〉〈v_k^{(i)}|

Then Σ_i Di = 0 is equivalent to the linear dependence of the operators {|v_j^{(i)}〉〈v_k^{(i)}|}.

64

Page 65: Quantum Statistics

In finite dimensions measurements have densities

Lemma (measurement density)

Let M(E) be a POVM over (Ω, Σ) with values in M(Cd). Then there exists a measure µ on (Ω, Σ) and a positive density function m ∈ L1(Ω, Σ, µ) ⊗ M(Cd) such that

M(E) = ∫_E m(ω) µ(dω),  E ∈ Σ

Moreover, µ can be chosen such that ‖m(ω)‖ ≤ 1 almost surely.

Proof.
Let tr(A) := Tr(A)/d and define the probability measure µ(E) := tr(M(E)).

Each matrix element Mij(E) := 〈ei, M(E)ej〉 is a measure on Ω, dominated by µ. Thus there exists a density mij ∈ L1(Ω, Σ, µ) such that

Mij(E) = ∫_E mij(ω) µ(dω).

Moreover, from tr(M(E)) = µ(E) = ∫_E tr(m(ω)) µ(dω) it follows that tr(m(ω)) = 1, µ-almost surely. In particular, m is bounded.

65

Page 66: Quantum Statistics

Notions of statistical inference

I Statistical models

I Parametric estimation

I Fisher Information

I Cramer-Rao bound

I Efficient estimators

I Repeated coin toss example

66

Page 67: Quantum Statistics

What is statistical inference?

Given some random data X from an unknown distribution, one aims to makean ‘educated guess’ about some property of the underlying distribution

Example

I Density estimation: given X1, . . . ,Xn independent identically distributed(i.i.d.) with unknown density p ∈ L1([0, 1]), estimate the value of p(x) forsome x ∈ [0, 1]

I Hypothesis testing: given X drawn from either P0 or P1 decide from whichof the two distributions it comes

I Confidence intervals: together with an estimator θ̂(x) of θ, provide a neighbourhood C of θ̂(x) such that θ belongs to C with probability p

I Sufficient statistic: can data X ∼ Pθ be 'summarised' into a 'simpler' statistic f(X) without losing information about θ?

I Optimality: how do we compare the performance of estimators and whichare the optimal ones?

I Asymptotics: what happens in the limit of ‘large number of data’?

67

Page 68: Quantum Statistics

Statistical models

Definition
Let Θ be a parameter space. A statistical model (experiment) over Θ is a family {Pθ : θ ∈ Θ} of probability distributions on a measure space (Ω, Σ).

Example

I Repeated coin toss: X1, . . . , Xn i.i.d. with Pθ([Xi = 1]) = θ and Pθ([Xi = 0]) = 1 − θ. The joint distribution is:

Pθ^n([X1 = x1, . . . , Xn = xn]) = Π_{i=1}^n Pθ([Xi = xi]) = θ^{Σ_i xi} (1 − θ)^{n − Σ_i xi}

I Gaussian shift on Rk: family of Gaussian distributions N(θ, V) with unknown mean θ ∈ Rk and known k × k covariance matrix V

I Tomography: an unknown probability density p over R² is probed through its marginals along random directions φ in the plane. For each φ we get data X ∼ R[p](x, φ), where R[p] is the Radon transform

R[p](x, φ) = ∫ p(x cos φ + t sin φ, x sin φ − t cos φ) dt

68

Page 69: Quantum Statistics

Parametric estimation

Problem
Given

I an (open) subset Θ of Rk

I data X ∼ Pθ, with Pθ a probability distribution on (Ω, Σ) and θ ∈ Θ

I a loss function W : Θ × Θ → R+, e.g. W(θ̂, θ) = ‖θ̂ − θ‖²

devise an estimator θ̂ = θ̂(X) such that the risk

R(θ̂, θ) := Eθ(W(θ̂(X), θ))

is small.

Remark

I The same problem can be formulated for 'non-parametric' Θ, and/or for estimation of a function t = t(θ)

I In general the estimator may be randomised, for example
  I θ̂ = θ̂(X, U), where U is an additional random variable with fixed, known distribution
  I if X = x, choose θ̂ ∼ K(x, ·), where K : Ω × ΣΘ → [0, 1] is a Markov kernel

69

Page 70: Quantum Statistics

Unbiased estimators

Definition
Let {Pθ : θ ∈ Θ ⊂ Rk} be a parametric statistical model and let X ∼ Pθ.
An estimator θ̂(X) is called unbiased if Eθ(θ̂(X)) = θ for all θ.

Example

I Let X1, . . . , Xn be i.i.d. Bernoulli with Pθ([X = 1]) = θ and Pθ([X = 0]) = 1 − θ. Then X̄ = (Σ Xi)/n is an unbiased estimator of θ

I Let Y1, . . . , Yn be i.i.d. normally distributed with Pθ = N(θ, V). Then Ȳ = (Σ Yi)/n is an unbiased estimator of θ

Remark (exercise)

Let {Pθ : θ ∈ Θ ⊂ Rk} be a parametric statistical model and let X ∼ Pθ. The mean square error of θ̂(X) can be written as the sum of a variance and a squared bias term:

Eθ((θ̂ − θ)²) = ∫ (θ̂ − θ)² pθ(dθ̂) = ∫ (θ̂ − Eθ(θ̂))² pθ(dθ̂) + (Eθ(θ̂) − θ)² = V(θ̂) + B(θ̂)²

If θ̂ is unbiased, then the mean square error is equal to V(θ̂).
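The decomposition is easy to confirm by simulation. In the sketch below (numpy assumed), the shrunk estimator (Σ Xi + 1)/(n + 2) is a deliberately biased choice introduced only for illustration; the empirical mean square error splits exactly into variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 0.3, 20, 100000

S = rng.binomial(n, theta, size=reps)     # sufficient statistic sum_i X_i, one per replication
est = (S + 1) / (n + 2)                   # a deliberately biased estimator of theta

mse = np.mean((est - theta) ** 2)         # empirical mean square error
var = np.var(est)                         # V(theta_hat)
bias = np.mean(est) - theta               # B(theta_hat)

# the decomposition MSE = V + B^2 is an exact algebraic identity for the empirical moments
assert abs(mse - (var + bias ** 2)) < 1e-10
```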

70

Page 71: Quantum Statistics

Fisher information matrix

Let Pθ : θ ∈ Θ ⊂ Rk be a parametric statistical model with Pθ probabilitymeasures on (Ω,Σ) dominated by µ.

Smooth model
Throughout the following we will assume that the densities pθ = dPθ/dµ satisfy sufficient 'regularity conditions' allowing for differentiation w.r.t. θ and for the exchange of integral and derivative.

Definition
Let ℓθ := log pθ be the log-likelihood and let ℓ̇θ,i := ∂ℓθ/∂θi be the score function(s). The Fisher information matrix is defined by

I_{i,j}(θ) := Eθ(ℓ̇θ,i ℓ̇θ,j) = ∫ pθ(ω)^{-1} (∂pθ/∂θi) (∂pθ/∂θj) µ(dω)

71

Page 72: Quantum Statistics

Properties of the Fisher information matrix

I I(θ) is a positive definite real k × k matrix

I I(θ) is additive for products of independent models (exercise):
if Pθ = Pθ^{(1)} × Pθ^{(2)} then I(θ) = I^{(1)}(θ) + I^{(2)}(θ)

I The Hellinger distance between infinitesimally close densities pθ and pθ+dθ is determined by the Fisher information:

h(pθ, pθ+dθ)² = ∫ (√pθ(ω) − √pθ+dθ(ω))² µ(dω) = ¼ I(θ)(dθ)² + o((dθ)²)

I The Fisher information matrix defines a Riemannian metric on Θ, and the corresponding geodesic distance is the Bhattacharya distance

d(pθ1, pθ2) = 2 arccos( ∫ √pθ1(ω) √pθ2(ω) µ(dω) )

I Let qθ be the probability density of a randomisation Y of X (randomised statistic, Markov kernel), where X ∼ Pθ. Then

d(qθ1, qθ2) ≤ d(pθ1, pθ2) and h(qθ1, qθ2) ≤ h(pθ1, pθ2)

I I(θ) is the unique metric contracting under all randomisations

72

Page 73: Quantum Statistics

The Cramer-Rao bound

Theorem (Cramer-Rao)

The following matrix inequality holds for any unbiased estimator θ̂:

Eθ((θ̂ − θ)(θ̂ − θ)^T) = Var(θ̂) ≥ I(θ)^{-1}

where I(θ) is the Fisher information matrix.

Proof.
Let θ be one dimensional; the general case is left as an exercise.
By Cauchy-Schwarz

Var(θ̂) · I(θ) = Eθ((θ̂ − θ)²) · Eθ(ℓ̇θ²) ≥ |Eθ((θ̂ − θ) ℓ̇θ)|²

The right side is

Eθ((θ̂ − θ) ℓ̇θ) = Eθ(θ̂ ℓ̇θ) − θ Eθ(ℓ̇θ)

= ∫ θ̂(ω) (dpθ/dθ)(ω) µ(dω) − θ ∫ (dpθ/dθ)(ω) µ(dω)

= (d/dθ) ∫ θ̂(ω) pθ(ω) µ(dω) − θ (d/dθ) ∫ pθ(ω) µ(dω) = (d/dθ) Eθ(θ̂) = 1

73

Page 74: Quantum Statistics

Remarks on the Cramer-Rao bound

I One can similarly define unbiased estimators ĝ of g(θ) for a function g : Θ → Rp. The Cramer-Rao bound in this case is

Var(ĝ) ≥ J(θ) I(θ)^{-1} J(θ)^T

where J(θ)_{l,i} = ∂g(θ)_l/∂θi is the p × k Jacobian matrix.

I For certain models there exist no unbiased estimators, e.g. the binomial distribution b(θ, n) and the function g(θ) = θ^{-1} (exercise).

I Even if unbiased estimators exist, their variance may be too big.

I The Cramer-Rao bound is in general not attainable, but it becomes an equality if and only if the distributions form an exponential family

pθ(ω) = exp( Σ_{i=1}^s ηi(θ) gi(ω) − B(θ) ) h(ω)

74

Page 75: Quantum Statistics

Asymptotic efficiency

The theory of asymptotic efficiency shows that the Cramer-Rao bound isasymptotically attained in the following sense.

Definition
Let {Pθ : θ ∈ Θ ⊂ Rk} be a parametric statistical model. Let X1, . . . , Xn be i.i.d. with distribution Pθ. An estimator θ̂n = θ̂n(X1, . . . , Xn) is called asymptotically efficient if

√n (θ̂n − θ) →L N(0, I(θ)^{-1})

In particular, if θ is one dimensional then n Eθ((θ̂n − θ)²) → I(θ)^{-1}.

Theorem
Under regularity conditions, the maximum likelihood estimator

θ̂n(X1, . . . , Xn) = arg max_τ Π_{i=1}^n pτ(Xi)

is asymptotically efficient.

75

Page 76: Quantum Statistics

Repeated coin toss example

Let Pθ be the Bernoulli distribution: Pθ([X = 1]) = θ and Pθ([X = 0]) = 1 − θ. Let X1, . . . , Xn be i.i.d. with distribution Pθ. Then

I X̄n := (Σ_{i=1}^n Xi)/n is an unbiased estimator of θ.

I Var(X̄n) = Var(X)/n, with Var(X) = θ(1 − θ)² + (1 − θ)(0 − θ)² = θ(1 − θ)

I The Fisher information of a single toss is I(θ) = θ^{-1} + (1 − θ)^{-1} = 1/(θ(1 − θ))

I Thus X̄n attains the Cramer-Rao bound: Var(X̄n) = (n I(θ))^{-1}.
Moreover, by the Central Limit Theorem we have

√n (X̄n − θ) = (1/√n) Σ_{i=1}^n (Xi − θ) →L N(0, Var(X)) = N(0, θ(1 − θ))

I Hence X̄n is asymptotically efficient

I The maximum likelihood estimator is obtained by equating the derivative of the log-likelihood to zero:

(d/dθ) log pθ(X1, . . . , Xn) = (d/dθ) log[ θ^{Σ_i Xi} (1 − θ)^{n − Σ_i Xi} ] = (Σ_i Xi)/θ − (n − Σ_i Xi)/(1 − θ) = 0

with solution θ̂n = X̄n!
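The claims above can be checked by simulation: repeating the n-toss experiment many times, the empirical frequency is (approximately) unbiased and its variance matches the Cramer-Rao bound θ(1 − θ)/n. A minimal sketch, assuming numpy (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 0.7, 500, 20000

X = rng.binomial(1, theta, size=(reps, n))
theta_hat = X.mean(axis=1)            # MLE = empirical frequency, one per replication

# unbiasedness, and variance matching the Cramer-Rao bound theta(1-theta)/n
assert abs(theta_hat.mean() - theta) < 5e-3
crb = theta * (1 - theta) / n         # = (n I(theta))^{-1}
assert abs(theta_hat.var() - crb) < 1e-4
```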

76

Page 77: Quantum Statistics

Hypothesis testing

Problem
Let {P0, P1} be a binary statistical model over (Ω, Σ). Given X ∼ Pi, decide which of the two hypotheses is true, i = 0 or i = 1.

The test t : Ω → {0, 1} is 'good' if its error probabilities are small:

I type I error: P0([t(X) = 1])

I type II error: P1([t(X) = 0])

There are two main approaches to optimality:

1. fix a level α ∈ (0, 1) and look for a test that minimises β := P1([t(X) = 0]) under the constraint P0([t(X) = 1]) ≤ α

2. fix a prior (π0, π1) and find a test that minimises the average error probability Pe_π := π0 P0([t(X) = 1]) + π1 P1([t(X) = 0])

Remark
One can extend the problem to

I more hypotheses P1, . . . , Pk

I composite hypotheses: θ ∈ Θ0 vs θ ∈ Θ1, where {Θ0, Θ1} is a partition of Θ and X ∼ Pθ

I randomised tests t = t(X, U), with U uniform on [0, 1]

77

Page 78: Quantum Statistics

Optimal tests

Let {P0, P1} be a binary statistical model over (Ω, Σ) and let p0 and p1 be the densities of P0 and P1 w.r.t. a probability measure µ.

Lemma (Neyman-Pearson lemma)
Let α ∈ (0, 1) be a fixed level. Then there exists a constant k such that the likelihood ratio test

t(ω) := 0 if p0(ω)/p1(ω) > k;  t(ω) := 1 if p0(ω)/p1(ω) ≤ k

satisfies P0([t(X) = 1]) = α and minimises the type II error P1([t(X) = 0]) among the α-level tests.

Lemma (optimal Bayes test)
Let (π0, π1) be a (non-degenerate) prior distribution. Then the likelihood ratio test

t(ω) := 0 if p0(ω)/p1(ω) > π1/π0;  t(ω) := 1 if p0(ω)/p1(ω) ≤ π1/π0

has minimal average error

Pe_π := π0 P0([t(X) = 1]) + π1 P1([t(X) = 0]) = ½ (1 − ‖π1 p1 − π0 p0‖1)
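For a finite outcome space the Bayes test and its error can be computed by direct enumeration, which also verifies the closed-form expression for Pe_π. A sketch assuming numpy (the two densities and the prior are arbitrary illustrative numbers):

```python
import numpy as np

# two hypotheses as discrete densities on Omega = {0, 1, 2, 3}
p0 = np.array([0.5, 0.3, 0.1, 0.1])
p1 = np.array([0.1, 0.2, 0.3, 0.4])
pi0, pi1 = 0.4, 0.6

# likelihood ratio test: decide 1 where p0/p1 <= pi1/pi0, i.e. where pi1*p1 >= pi0*p0
t = (pi1 * p1 >= pi0 * p0)

type_I = p0[t].sum()        # P0(t = 1)
type_II = p1[~t].sum()      # P1(t = 0)
Pe = pi0 * type_I + pi1 * type_II

# minimal average error equals (1 - ||pi1 p1 - pi0 p0||_1) / 2
assert abs(Pe - 0.5 * (1 - np.abs(pi1 * p1 - pi0 * p0).sum())) < 1e-12
```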

78

Page 79: Quantum Statistics

Asymptotics: Stein’s Lemma and Chernoff’s bound

Let {P0, P1} be a binary statistical model and let X1, . . . , Xn be i.i.d. with Xk ∼ Pi.

Theorem (Stein's Lemma)

Let tn(X1, . . . , Xn) be the most powerful level-α test. Then

lim_{n→∞} (1/n) log P1^n([tn = 0]) = −D(p0, p1)

where D(p0, p1) is the relative entropy

D(p0, p1) = ∫ p0(ω) log(p0/p1) µ(dω).

Theorem (Chernoff's bound)

Let (π0, π1) be a non-degenerate prior and let tn(X1, . . . , Xn) be the optimal Bayes test. Then

lim_{n→∞} (1/n) log Pe,n_π = −C(p0, p1)

where C(p0, p1) is the Chernoff distance

C(p0, p1) = −log( inf_{0≤s≤1} ∫ p0^s(ω) p1^{1−s}(ω) µ(dω) )

79
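For two Bernoulli hypotheses the limit can be probed directly, since the optimal Bayes error for n tosses is computable exactly from the binomial likelihoods. A numerical sketch (numpy assumed; the parameter values and the grid over s are arbitrary choices):

```python
import numpy as np
from math import comb, log

p0, p1 = 0.3, 0.6                 # Bernoulli(p0) vs Bernoulli(p1)

# Chernoff distance C = -log inf_{0<=s<=1} sum_x p0(x)^s p1(x)^{1-s}, by grid search
s = np.linspace(0.0, 1.0, 20001)
psi = p0**s * p1**(1 - s) + (1 - p0)**s * (1 - p1)**(1 - s)
C = -np.log(psi.min())

def bayes_error(n, pi0=0.5, pi1=0.5):
    # exact minimal average error: sum_k min(pi0 q0(k), pi1 q1(k)) over k successes
    err = 0.0
    for k in range(n + 1):
        q0 = comb(n, k) * p0**k * (1 - p0)**(n - k)
        q1 = comb(n, k) * p1**k * (1 - p1)**(n - k)
        err += min(pi0 * q0, pi1 * q1)
    return err

n = 400
rate = -log(bayes_error(n)) / n    # finite-n error exponent
assert abs(rate - C) < 0.02        # approaches C as n grows
```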

Page 80: Quantum Statistics

The quantum Cramer-Rao bound

I Quantum statistical models

I Quantum state estimation

I The L2(ρ) Hilbert space

I The quantum Fisher-Helstrom information matrix

I Quantum Cramer-Rao bound(s)

I The quantum Cramer-Rao bound is achievable for Θ ⊂ R

I Achievability of the quantum Cramer-Rao bound for Θ ⊂ Rk with k > 1

I The right Cramer-Rao bound

I The Holevo bound

80

Page 81: Quantum Statistics

Quantum statistical models

Definition
Let Θ be a parameter space. A quantum statistical model (experiment) over Θ is a family {ρθ : θ ∈ Θ} of density matrices ρθ ∈ T1(H), for a given Hilbert space H.

Example

I qubit states: indexed by the Bloch vector r = (rx, ry, rz) ∈ R³ with ‖r‖ ≤ 1 (the Bloch ball):

ρr = ½ [ 1 + rz     rx − iry
         rx + iry   1 − rz ]

I coherent spin states: ρr^n = ρr ⊗ · · · ⊗ ρr, for ‖r‖ = 1 (pure states)

I unitary family: ρt = exp(−iHt) ρ exp(iHt) for t ∈ R, with H selfadjoint

I quantum exponential family:

ρθ = e^{−k(θ)} exp( Σ_i γ̄i(θ) Ti^* ) ρ0 exp( Σ_i γi(θ) Ti ),  γi(θ) ∈ C, Ti ∈ B(H)

I Gaussian states of a quantum harmonic oscillator Φ(z, V), with mean z ∈ C and complex 2 × 2 'covariance matrix' V

81

Page 82: Quantum Statistics

Quantum state estimation

Problem
Given

I a quantum statistical model {ρθ : θ ∈ Θ}

I a loss function W : Θ × Θ → R+, e.g. ‖θ̂ − θ‖² for Θ ⊂ Rk, or ‖ρ̂ − ρ‖1 if Θ ⊂ S(H), etc.

design a measurement M and an estimator θ̂(X), where X is the outcome of the measurement, such that

R(M, θ̂, θ) = Eθ(W(θ̂(X), θ))

is small.

[Diagram: quantum measurement — the state ρθ enters the measurement apparatus M, which produces a result X ∼ Pθ^M, from which the estimator θ̂(X) is computed]

Remark

I the same problem can be formulated for estimating a function g(θ)

I the main quantum feature is the optimisation over the measurement step

I measurement and estimator can be 'bundled' into a single measurement with values in Θ

82

Page 83: Quantum Statistics

The L2(ρ) Hilbert space

Definition
Let ρ be a positive operator in T1(H). On the R-linear space of bounded selfadjoint operators B(H)sa define the inner product

〈A, B〉ρ := Tr(ρ A ∘ B),  A ∘ B = ½ (AB + BA)

L²R(ρ) is the Hilbert space completion of B(H)sa with respect to 〈·, ·〉ρ.

Remark

I A, B ∈ B(H) correspond to the same vector in L²R(ρ) if Tr(ρ(A − B)²) = 0 (relevant when ρ has eigenvalues equal to zero).

I It can be shown that each vector in L²R(ρ) can be identified with (the equivalence class of) a square summable operator w.r.t. ρ, i.e. an unbounded symmetric linear operator satisfying

Σ_i λi ‖X ei‖² < ∞

where ρ = Σ_i λi |ei〉〈ei| is the spectral decomposition of ρ

I equivalently, X is square summable iff X√ρ is a Hilbert-Schmidt operator, i.e.

‖X√ρ‖²₂ = Tr( (X√ρ)^* (X√ρ) ) = ‖X‖²ρ < ∞

83

Page 84: Quantum Statistics

The quantum Fisher-Helstrom information matrix

Let {ρθ : θ ∈ Θ} be a quantum statistical model with ρθ ∈ T1(H) and Θ ⊂ Rk open. Let (L²R(ρθ), 〈·, ·〉θ) be the L² space w.r.t. ρθ.

Assume that

I θ ↦ ρθ is differentiable as a function with values in T1(H)

I the linear functional on B(H)

A ↦ (∂/∂θi) Tr(A ρθ) = Tr( (∂ρθ/∂θi) A )

can be extended to a continuous functional on L²R(ρθ), for all i = 1, . . . , k.

Then by the Riesz Theorem there exists a unique vector Lθ,i ∈ L²R(ρθ), called the symmetric logarithmic derivative (s.l.d.), such that

Tr( (∂ρθ/∂θi) A ) = 〈Lθ,i, A〉θ = Tr( (ρθ ∘ Lθ,i) A )

or equivalently,

∂ρθ/∂θi = Lθ,i ∘ ρθ

The quantum Fisher-Helstrom information matrix is defined as

H(θ)_{i,j} = 〈Lθ,i, Lθ,j〉θ

84

Page 85: Quantum Statistics

Example (exercise)

Let ρr ∈ M(C2) be the state with Bloch vector r represented in polarcoordinates r↔ (r , θ, φ)

ρr =1

2

„1 + r cos θ r sin θe−iφ

r sin θe−iφ 1− r cos θ

«=

1

2(1 + rσ)

The symmetric logarithmic derivatives are the solutions of

∂ρr

∂r= Lr,r ρr,

∂ρθ∂θ

= Lr,θ ρr,∂ρr

∂φ= Lr,φ ρφ

and are given by

Lr =1

1 + r(1 + rσ/r), Lθ =

∂r

∂θσ, Lφ =

∂r

∂φσ.

The quantum Fisher-Helstrom information matrix is

H(r) =

0@ 11−r2 0 0

0 r 2 00 0 r 2 sin θ2

1A
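The s.l.d. equation ∂ρ = ½(Lρ + ρL) can be solved numerically in the eigenbasis of ρ, where it reads L_{jk} = 2(∂ρ)_{jk}/(λj + λk). The sketch below (numpy assumed) checks the (r, r) entry H_rr = 1/(1 − r²) for a Bloch vector along z:

```python
import numpy as np

sz = np.diag([1.0, -1.0])

def sld(rho, drho):
    # solve drho = (L rho + rho L)/2 in the eigenbasis of rho
    w, U = np.linalg.eigh(rho)
    D = U.conj().T @ drho @ U
    L = 2 * D / (w[:, None] + w[None, :])
    return U @ L @ U.conj().T

r = 0.6
rho = 0.5 * (np.eye(2) + r * sz)       # state with Bloch vector (0, 0, r)
drho = 0.5 * sz                        # d(rho)/dr

L = sld(rho, drho)
H_rr = np.trace(drho @ L).real         # = <L, L>_rho, the (r, r) entry of H
assert abs(H_rr - 1 / (1 - r**2)) < 1e-10
```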

85

Page 86: Quantum Statistics

Properties of the quantum Fisher-Helstrom information matrix

I H(θ) is a real positive definite matrix

I additivity: if ρθ = ρθ^{(1)} ⊗ ρθ^{(2)} then H(θ) = H^{(1)}(θ) + H^{(2)}(θ) (exercise)

I metric: the Bures (fidelity) distance between infinitesimally close states ρθ and ρθ+dθ is given by the quantum Fisher-Helstrom information:

b(ρθ, ρθ+dθ)² = ¼ H(θ)(dθ)² + o((dθ)²)

I contractivity: let C : T1(H) → T1(K) be a quantum channel (completely positive, trace preserving linear map), and let τθ := C(ρθ) be the quantum model obtained by applying the 'quantum randomisation' C to ρθ. Then

b(ρθ1, ρθ2) ≥ b(τθ1, τθ2), and H(ρθ) ≥ H(τθ)

I unlike the classical case, H is not the unique contractive metric. Such metrics are in one-to-one correspondence with operator monotone functions f : R+ → R (i.e. f(A) ≥ f(B) for all A ≥ B ≥ 0 in B(H)) satisfying f(t) = t f(t^{-1}) and f(1) = 1.
Reference: D. Petz, Linear Algebra Appl. 244, 81-96 (1996)

86

Page 87: Quantum Statistics

Quantum Cramer-Rao bound (I)

Theorem
Let Q := {ρθ : θ ∈ Θ ⊂ Rk} be a quantum statistical model with ρθ ∈ B(H), and denote by H(θ) the associated quantum Fisher information matrix.

Let M be a measurement with outcomes in (Ω, Σ) and let Pθ^{(M)} := M∗(ρθ).

Let PM := {Pθ^{(M)} : θ ∈ Θ} be the classical model associated to (Q, M) and let IM(θ) be its Fisher information matrix. Then the matrix inequality

IM(θ) ≤ H(θ)

holds, and in particular, for any unbiased estimator θ̂ of θ we have

Var(θ̂) ≥ IM(θ)^{-1} ≥ H(θ)^{-1}

Remark

I In the last display, the left inequality is the 'classical' Cramer-Rao bound.

I The right inequality follows by applying the operator monotone function f(x) = −x^{-1} to the inequality IM(θ) ≤ H(θ).

I A function f is called operator monotone if f(A) ≤ f(B) for all A, B ∈ B(H) satisfying 0 ≤ A ≤ B. Not all monotone functions are operator monotone (exercise).

87

Page 88: Quantum Statistics

Proof: the case of a PVM

1. Suppose first that M is a PVM; the general case is reduced to this by Naimark's Theorem (next page).

We show that there exists an isometry I : L²R(pθ) → L²R(ρθ) such that I^*(Lθ,i) = ℓ̇θ,i, which implies IM(θ) ≤ H(θ).

I Let L²R(pθ) = {f : Ω → R : Eθ(f²) < ∞} be the Hilbert space with inner product 〈f, g〉θ = Eθ(fg)

I The score functions ℓ̇θ,i are elements of L²R(pθ), and IM(θ)_{i,j} = 〈ℓ̇θ,i, ℓ̇θ,j〉θ

I Recall that we defined M : L∞(Ω, Σ, µ) → B(H). Since M is a PVM,

〈f, g〉θ = Eθ(fg) = Tr(ρθ M(f · g)) = 〈M(f), M(g)〉θ

so M can be extended to an isometry I : L²R(pθ) → L²R(ρθ)

I We show that I^*(Lθ,i) = ℓ̇θ,i. Indeed, for every f ∈ L²(pθ)

〈f, ℓ̇θ,i〉θ = ∫ f(ω) (∂pθ/∂θi)(ω) µ(dω) = ∂Eθ(f)/∂θi = ∂Tr(ρθ M(f))/∂θi = Tr( (∂ρθ/∂θi) M(f) ) = 〈I(f), Lθ,i〉θ

88

Page 89: Quantum Statistics

Proof : the Naimark Theorem argument

I Then IM(θ) ≤ H(θ), since

Σ_{i,j} ci cj IM(θ)_{i,j} = ‖Σ_i ci ℓ̇θ,i‖²θ = ‖I^* Σ_i ci Lθ,i‖²θ ≤ ‖Σ_i ci Lθ,i‖²θ = Σ_{i,j} ci cj H(θ)_{i,j}

2. Now let M be a general measurement given by a POVM on H.

I By Naimark's Theorem there exists an isometry V : H → K such that M(B) = V^* P(B) V, with P(B) a PVM.

I The map A ↦ V A V^* : B(H)sa → B(K)sa extends to an isometric isomorphism O : L²(ρθ) → L²(ρ̃θ), where ρ̃θ := V ρθ V^* is the embedded state (exercise).

I In particular L̃θ,i = O(Lθ,i) and H(ρ̃θ) = H(ρθ) = H(θ)

I When measuring ρ̃θ with P(B) we get the same distribution Pθ as when measuring ρθ with M(B), and hence the same Fisher information.

We can now apply the proof for the PVM case.

89

Page 90: Quantum Statistics

Quantum Cramer-Rao bound (II)

Theorem (Helstrom, Belavkin, Holevo)

Let Q := {ρθ : θ ∈ Θ ⊂ Rk} be a quantum statistical model with ρθ ∈ B(H), and denote by H(θ) the associated quantum Fisher information matrix.

Let M be an unbiased measurement with values in Θ, i.e. the result θ̂ ∼ Pθ^{(M)} is an unbiased estimator of θ.

Define the operators

Xi^M = ∫ xi M(dx),  i = 1, . . . , k

as elements of L²(ρθ), and the 'quantum covariance matrix'

V^M(θ)_{i,j} := 〈Xi^M − θi, Xj^M − θj〉θ

Then

Var(θ̂) ≥ V^M(θ) ≥ H(θ)^{-1}

90

Page 91: Quantum Statistics

Proof of quantum Cramer-Rao Theorem (II)

1. We first prove Var(θ̂) ≥ V^M(θ).

I We use again Naimark's Theorem (M(dx) = V^* P(dx) V) to obtain

Xi^M = ∫ xi M(dx) = V^* ( ∫ xi P(dx) ) V = V^* Xi^P V

I Let Y^M(c) := Σ_i ci (Xi^M − θi) and Y^P(c) := Σ_i ci (Xi^P − θi).
Then (exercise)

c^T V^M(θ) c = Tr(ρθ (Yc^M)²) = Tr(ρθ (V^* Yc^P V)²)

= Tr(ρ̃θ Yc^P V V^* Yc^P) ≤ Tr(ρ̃θ (Yc^P)²)

= Eθ( (Σ_i ci (θ̂i − θi))² ) = c^T Var(θ̂) c

91

Page 92: Quantum Statistics

Proof of quantum Cramer-Rao Theorem (II)

2. We now prove the second inequality for one dimensional θ; the general case is left as an exercise.

I By Cauchy-Schwarz we have

‖Lθ‖²θ · ‖Y^M‖²θ ≥ |〈Lθ, Y^M〉θ|²

I Since H(θ) = ‖Lθ‖²θ and V^M(θ) = ‖Y^M‖²θ, it suffices to show that 〈Lθ, Y^M〉θ = 1.

I By using the isomorphism O : L²(ρθ) → L²(ρ̃θ), the isometry I : L²(pθ) → L²(ρ̃θ), the fact that I^* L̃θ = ℓ̇θ, and Y^P = I(f) for f(x) = x − θ, we get

〈Lθ, Y^M〉θ = 〈L̃θ, Y^P〉θ = 〈L̃θ, I(f)〉θ = 〈I^*(L̃θ), f〉θ = ∫ ℓ̇θ(x)(x − θ) pθ(x) µ(dx) = 1.

92

Page 93: Quantum Statistics

The quantum Cramer-Rao bound is asymptotically achievable for Θ ⊂ R

By measuring Lθ0 in the state ρθ0, for some fixed θ0, we obtain a random variable L with mean and variance

Eθ0(L) = Tr(ρθ0 Lθ0) = 0,  Varθ0(L) = Tr(ρθ0 Lθ0²) = H(θ0)

Then

θ̂ := L/H(θ0) + θ0

is a locally unbiased estimator of θ around θ0, since

Eθ(θ̂) = θ0 + Tr(ρθ Lθ0)/H(θ0) = θ0 + dθ · Tr( (dρθ/dθ) Lθ0 )/H(θ0) + o(dθ)

= θ0 + dθ · Tr(ρθ0 Lθ0²)/H(θ0) + o(dθ) = θ + o(dθ)

and its variance is

Var(θ̂) = Var(L)/H(θ0)² = H(θ0)^{-1}

93

Page 94: Quantum Statistics

The quantum Cramer-Rao bound is asymptotically achievable for Θ ⊂ R

However, the measurement depends on θ0 and is only 'locally optimal'. The argument can be made rigorous in the asymptotic framework using an adaptive measurement procedure:

1. Consider n independent, identically prepared quantum systems. The corresponding statistical model is Qn := {ρθ^{⊗n} : θ ∈ Θ}

2. The s.l.d. is given by the sum of the individual s.l.d.'s

Lθ^{(n)} = Lθ ⊗ 1 ⊗ · · · ⊗ 1 + · · · + 1 ⊗ · · · ⊗ Lθ

and the Fisher-Helstrom information is H^{(n)}(θ) = n H(θ)

3. We perform a simple measurement (e.g. separate, identical, informationally complete measurements on each system) on a small fraction ñ ≪ n of the systems and compute a rough estimator θ̃n of θ

4. On the rest of the systems we measure the s.l.d. Lθ^{(n)} at θ = θ̃n and compute the locally unbiased estimator θ̂n.

5. This estimator is efficient:

√n (θ̂n − θ) →L N(0, H(θ)^{-1})

94

Page 95: Quantum Statistics

Achievability of the quantum Cramer-Rao bound for Θ ⊂ Rk with k > 1

1. If the s.l.d.'s commute with each other, i.e. [Lθ,i, Lθ,j] = 0 for all i, j = 1, . . . , k, then they can be measured simultaneously (exercise), and the previous argument leads to an efficient estimator θ̂n.

2. However, if the s.l.d.'s do not commute with each other, there may exist no locally unbiased estimator which achieves the quantum Cramer-Rao bound.
Asymptotically, the variance H(θ)^{-1} can be achieved iff the weaker form of commutativity Tr(ρθ [Lθ,i, Lθ,j]) = 0 holds.

3. Although the bound is in general not achievable, it is sharp in the sense that if V(M, θ) ≥ K^{-1}(θ) for all locally unbiased measurements, then H(θ)^{-1} ≥ K^{-1}(θ).

4. What is a 'good estimator' in this case? The answer depends on the particular form of the loss function. If G ∈ M(Rk) is a positive matrix we define the loss function

W(θ̂, θ) = Σ_{i,j} (θ̂i − θi) G_{i,j} (θ̂j − θj) = (θ̂ − θ)^T G (θ̂ − θ)

The risk is given by R(θ̂, θ, G) = Eθ W(θ̂, θ) = Tr(G Var(θ̂)), and the optimal measurement procedure will depend on G ...

95

Page 96: Quantum Statistics

The right logarithmic derivative

Definition

1. Let ρ ∈ T1(H) be a state. Define L²+(ρ) to be the complex Hilbert space obtained as the completion of B(H) with respect to the inner product

(X, Y)ρ := Tr(ρ Y X^*)

2. Let Q := {ρθ : θ ∈ Θ ⊂ Rk} be a quantum statistical model on H. Assume that

I ρθ is differentiable in T1(H)

I the functional A ↦ ∂Tr(ρθ A)/∂θi = Tr( (∂ρθ/∂θi) A ) on B(H) can be extended to a continuous linear functional on L²+(ρθ).

The right logarithmic derivative L̃θ,i is defined as the vector in L²+(ρθ) satisfying Tr( (∂ρθ/∂θi) A ) = (L̃θ,i, A)θ or, equivalently,

∂ρθ/∂θi = ρθ L̃θ,i

3. The right information matrix is defined by

J(θ)_{i,j} = (L̃θ,i, L̃θ,j)θ

96

Page 97: Quantum Statistics

The right Cramer-Rao bound

Theorem (Yuen and Lax, Belavkin)

Let Q := {ρθ : θ ∈ Θ ⊂ Rk} be a quantum statistical model with ρθ ∈ B(H), and denote by J(θ) the associated right information matrix.

Let M be an unbiased measurement with values in Θ, i.e. the result θ̂ ∼ Pθ^{(M)} is an unbiased estimator of θ.

Define the operators

Xi^M = ∫ xi M(dx),  i = 1, . . . , k

as elements of L²+(ρθ), and the 'right quantum covariance matrix'

V+^M(θ)_{i,j} := (Xi^M − θi, Xj^M − θj)θ

Then

Var(θ̂) ≥ V+^M(θ) ≥ J(θ)^{-1}

where all matrices are considered as elements of M(Ck).

97

Page 98: Quantum Statistics

Comparison of the symmetric and right (left) Cramer-Rao bounds

1. If θ is one dimensional then the symmetric bound is at least as informative as the right bound:

H(θ) ≤ J(θ)

Indeed, the variance H(θ)^{-1} is achieved by measuring Lθ (a locally unbiased measurement), hence the right bound implies that H(θ)^{-1} ≥ J(θ)^{-1}.

2. For certain multi-parameter models the right bound is better than the symmetric one. For example, in the case of mixed Gaussian states of a harmonic oscillator G(z, V) with fixed V and unknown z, the right bound is achieved in the sense that for any fixed positive matrix G there exists an unbiased estimator ẑ such that

Tr(G Var(ẑ)) = Tr(G J(z)^{-1})

The measurements leading to these estimators depend however on G, and are incompatible with each other.

98

Page 99: Quantum Statistics

The Holevo bound for quadratic risk

Let Q = {ρθ : θ ∈ Θ ⊂ Rk} be a quantum statistical model on H, and let W(θ̂, θ) be a quadratic loss function, i.e.

W(θ̂, θ) = Σ_{i,j} (θ̂i − θi) G_{ij} (θ̂j − θj) = (θ̂ − θ)^T G (θ̂ − θ)

The risk of an unbiased estimator θ̂ is given by

R(θ̂, θ, G) = Eθ(W(θ̂, θ)) = Σ_{i,j} G_{ij} Eθ((θ̂i − θi)(θ̂j − θj)) = Tr(G Var(θ̂))

Theorem (Holevo bound)

Let M(dθ̂) be an unbiased measurement. Then

Tr(G Var(θ̂)) ≥ inf_{X⃗θ} { Tr( √G Re(Z(X⃗θ)) √G ) + Tr( |√G Im(Z(X⃗θ)) √G| ) }

where X⃗θ := (Xθ,1, . . . , Xθ,k), with Xθ,i symmetric elements of L²+(ρθ) satisfying

Tr(ρθ Xθ,i) = 0,  Tr( (∂ρθ/∂θi) Xθ,j ) = δ_{i,j},

and Z(X⃗θ)_{i,j} := (Xθ,i, Xθ,j)θ = Tr(ρθ Xθ,j Xθ,i).

99

Page 100: Quantum Statistics

Proof of the Holevo bound

For simplicity we take G = 1; the general case is left as an exercise.

I It is enough to prove the bound for the special choice

Xθ,i = ∫ xi M(dx) − θi

I Check the duality between Xθ,j and ∂ρθ/∂θi (the first term below vanishes by unbiasedness):

Tr( (∂ρθ/∂θi) Xθ,j ) = (∂/∂θi) Tr(ρθ Xθ,j) − Tr( ρθ (∂Xθ,j/∂θi) ) = 0 + Tr(ρθ) δ_{i,j} = δ_{i,j}

I As in the Cramer-Rao bound (II) it can be shown that

Var(θ̂) ≥ Z(X⃗θ)

I Lemma (proof left as an exercise): if V is a real symmetric k × k matrix, Z is a hermitian (complex) matrix and V ≥ Z, then

Tr(V) ≥ Tr(Re(Z)) + Tr(|Im(Z)|)

I Apply the Lemma with V = Var(θ̂) and Z = Z(X⃗θ).

100

Page 101: Quantum Statistics

The Holevo bound is achievable (asymptotically)

1. The Holevo bound is achieved in the case of quantum Gaussian shift models, i.e. Gaussian states of quantum oscillators with unknown means and fixed, known covariance. This will be discussed in detail in the following sections.

2. The Holevo bound is achieved asymptotically for i.i.d. models of finite dimensional states, i.e. ρθ ⊗ · · · ⊗ ρθ with ρθ ∈ M(Cd).

The measurement consists of a two-step adaptive procedure (as in the case of a one-dimensional parameter), with the difference that in the second step one needs to perform a joint measurement (not separable) on the n − ñ systems. The measurement can be understood by showing that the n-particle model 'converges' to a Gaussian model for which the solution is known.

I A proof based on Cramer-Rao analysis is given for d = 2 in
M. Hayashi and K. Matsumoto: arXiv:quant-ph/0411073

I For the general case d < ∞ the result follows from the theory of 'local asymptotic normality' developed in
J. Kahn and M. Guta: arXiv:0804.3876 [quant-ph]

101

Page 102: Quantum Statistics

Covariant measurements

I Group covariant quantum statistical models

I Covariant measurements

I The covariant quantum estimation problem

I Optimal covariant measurements

I Structure of covariant measurements

I The optimal measurement in the case of irreducible representations

I Example: estimation of pure states

Reference:A. S. Holevo: Probabilistic and statistical aspects of quantum theory (1982)

102

Page 103: Quantum Statistics

Covariant quantum statistical models

Definition (covariant statistical models)

Let G be a group of transformations of a set Θ and denote the action by θ ↦ gθ for θ ∈ Θ, g ∈ G.

Let U : G → U(H) be a unitary representation of G on H.

A quantum statistical model {ρθ : θ ∈ Θ} on H is called covariant if

ρgθ = U(g) ρθ U(g)^*,  g ∈ G, θ ∈ Θ

Example

I the set of pure states ρP = P, with P a one dimensional projection in Cd, is covariant under the action of SU(d) given by P ↦ UPU^*

I shift parameter: the time evolved states ρt := exp(−iHt) ρ exp(iHt) are covariant with respect to the representation of R given by U(t) = exp(iHt)

I orientation parameter: let U : SO(3) → U(H) be a unitary representation and let n ↦ gn be the action on S² by rotations. The model {ρn := U(g) ρn0 U(g)^* : n ∈ S²} is covariant, provided that ρn0 = U(g) ρn0 U(g)^* for all g s.t. gn0 = n0

103

Page 104: Quantum Statistics

Covariant measurements

Definition (covariant measurements)

Let G be a group of (measurable) transformations of a measure space (Ω, Σ) and denote the action by ω ↦ gω for ω ∈ Ω, g ∈ G.

Let U : G → U(H) be a unitary representation of G on H.

A measurement M on H with outcomes in Ω is called covariant if

U(g)^* M(B) U(g) = M(g^{-1}B)

where gB = {ω : ω = gω′, ω′ ∈ B}.

Example

I Let Q, P be the position and momentum of a quantum particle, so that exp(−ixP) Q exp(ixP) = Q − x1. The measurement of Q is covariant with respect to U(x) := exp(ixP) (exercise).

I The triad measurement is covariant with respect to a unitary representation of S(3) (exercise).

104

Page 105: Quantum Statistics

The covariant quantum estimation problem

Problem (covariant quantum estimation)

Given

I an action of G on Θ

I a unitary representation U : G → U(H)

I a covariant model {ρθ : θ ∈ Θ} on H

I an invariant loss function: W(θ̂, θ) = W(gθ̂, gθ) for all g ∈ G

find the ‘optimal’ measurement for estimating θ.

Remark
Let θ̂ be the result of a measurement M. The risk at θ is R(θ, M) := Eθ W(θ̂, θ).
By optimal we mean a measurement that minimises the maximum risk

R(M) = supθ R(θ, M)

Alternatively, we can look for a measurement that minimises the Bayesian risk

R(π, M) = ∫ π(dθ) R(θ, M)

for a prior π on Θ that is invariant under the action of G.


Transitive actions on compact groups

From now on we will assume for simplicity that

I Θ ⊂ Rk is a smooth manifold

I G is a compact Lie group

I the action of G on Θ is continuous and transitive, i.e. for any θ ∈ Θ there exists g ∈ G such that θ = gθ0, for some fixed θ0

I H is finite dimensional

Remark

I On a compact Lie group there is a unique left (and right) invariant probability measure µ, i.e. µ(A) = µ(gA) = µ(Ag), called the Haar measure

I Let G0 = {g : gθ0 = θ0} ⊂ G be the stationary group of θ0. Transitivity implies Θ ≅ G/G0.

I on Θ there is a unique invariant measure ν, given by ν(B) = µ({g : gθ0 ∈ B}) (exercise)


Optimal covariant measurements

Theorem (covariant measurements achieve the optimal risk)

In the covariant quantum estimation problem the minima of the Bayesian risk R(π, M) and of the maximum risk R(M) are achieved on a covariant measurement. Moreover, if M is covariant, then

R(π, M) = R(M) = R(θ, M) for all θ ∈ Θ


Optimal covariant measurements: proof

I For any measurement M and g ∈ G we can define a new measurement Mg by

Mg(B) = U(g)∗M(gB)U(g)

I Using the covariance of {ρθ : θ ∈ Θ} and the invariance of W(θ̂, θ) we get

R(Mg, θ) = ∫ W(θ̂, θ) Tr(ρθ Mg(dθ̂)) = ∫ W(θ̂, θ) Tr(ρθ U(g)∗M(g dθ̂)U(g))
         = ∫ W(θ̂, gθ) Tr(U(g)ρθU(g)∗ M(dθ̂)) = R(M, gθ)

I In particular if M is covariant then R(M, θ) = R(M, gθ)

I Thus R(ν, M) = R(ν, Mg), since ν is an invariant measure on Θ


Optimal covariant measurements: proof

I The averaged measurement

M̄(B) = ∫G Mg−1(B) µ(dg)

is covariant and

R(M̄, ν) = ∫ R(Mg−1, ν) µ(dg) = R(M, ν) (5.1)

I Moreover

R(M) = supθ R(M, θ) ≥ R(M, ν) = R(M̄, ν) (5.2)

I (5.1) and (5.2) say that the covariant measurement M̄ is at least as good as M


Structure of covariant measurements

Theorem (structure of covariant measurements)

Let m0 ∈ B(H) be a positive operator which commutes with the operators {U(g) : g ∈ G0} and satisfies

∫G U(g)m0U(g)∗ µ(dg) = 1.

Define m(θ) := U(g)m0U(g)∗, where g is any element with gθ0 = θ. Then

M(B) := ∫B m(θ) ν(dθ), B ∈ Σ(Θ)

is (the POVM of) a covariant measurement.

Conversely, any covariant measurement is of this form.

Remark
By considering B such that ν(B) is small enough we get ‖M(B)‖ < 1. Thus a covariant measurement on a finite dimensional space cannot be projection valued.
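The normalisation condition ∫G U(g)m0U(g)∗ µ(dg) = 1 can be checked numerically. Below is a small sanity check (my addition, not part of the notes, using numpy) for the qubit case: with seed m0 = d|e1〉〈e1| and G = SU(2) acting irreducibly on C², the Haar average of U m0 U∗ should reproduce the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
e1 = np.zeros(d, dtype=complex); e1[0] = 1.0
m0 = d * np.outer(e1, e1.conj())      # seed m0 = d |e1><e1|

def haar_unitary(d, rng):
    """Haar-random unitary via QR decomposition of a complex Ginibre matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))   # fix the column phases

n = 20000
avg = sum(  # Monte Carlo approximation of the integral over the Haar measure
    (u := haar_unitary(d, rng)) @ m0 @ u.conj().T for _ in range(n)
) / n
err = np.linalg.norm(avg - np.eye(d))  # should vanish as n grows
```

The averaged operator converges to 1 at the Monte Carlo rate O(1/√n), which is exactly the Schur-lemma argument of the next slides in numerical form.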


Structure of covariant measurements: proof

1. Direct statement

I Note that m(θ) is well defined due to the property m0 = U(g)m0U(g)∗ for g ∈ G0.

I Positivity and σ-additivity follow directly from definitions

I Using ν(B) = µ({g : gθ0 ∈ B}) we obtain the normalisation

∫Θ M(dθ) = ∫Θ m(θ) ν(dθ) = ∫G U(g)m0U(g)∗ µ(dg) = 1

2. Converse

I We apply the measurement density Lemma to obtain that M(dθ) = m(θ)ν(dθ), where m(θ) is a unique positive operator density (ν-almost surely)

I The covariance implies

∫B U(g)∗m(θ)U(g) ν(dθ) = ∫g−1B m(θ) ν(dθ) = ∫B m(g−1θ) ν(dθ)

and since the density is unique, we obtain U(g)∗m(θ)U(g) = m(g−1θ)

I Choose m0 = m(θ0) and check that it satisfies the conditions (exercise)


Covariant measurements for irreducible representations

Definition (irreducible representation)

A unitary representation U : G → B(H) is called irreducible (irrep) if the only subspaces of H that are invariant under U are H and {0}.

Lemma (Schur lemma)

Let U : G → B(H) be an irreducible representation. An operator A ∈ B(H) commutes with U(g) for all g ∈ G iff A = c1 for some c ∈ C.

Proposition (measurement seed for irreps)

There exists a one-to-one correspondence between covariant measurements with respect to an irreducible representation U : G → B(H) and density matrices s0 commuting with {U(g) : g ∈ G0}:

M(dθ) = d U(g)s0U(g)∗ ν(dθ), d = dim(H) (5.3)


Covariant measurements for irreducible representations

Proof.

I All irreps of a compact group are finite dimensional.

I The expression (5.3) defines a measurement iff m0 := d s0 satisfies the normalisation

∫ U(g)m0U(g)∗ µ(dg) = 1 (5.4)

I Since the integral (5.4) commutes with U(g), and U is irreducible, it is proportional to 1 for arbitrary m0 (Schur’s Lemma).

I By taking the trace on both sides of (5.4)

∫ Tr(U(g)m0U(g)∗) µ(dg) = Tr(m0) = Tr(1) = d

hence s0 = m0/d is a density matrix.


The optimal measurement in the case of irreducible representations

Proposition (optimal seed for irreps)

Let U : G → U(H) be an irreducible representation of G acting on Θ. Let {ρθ : θ ∈ Θ} be a covariant quantum statistical model on H = Cd.

1. The risk of a covariant measurement M(dθ) = d sθ ν(dθ) is equal to

R(M) = d Tr(W0s0)

where W0 is the positive operator

W0 = ∫ W(θ, θ0) U(g)∗ρθ0U(g) ν(dθ).

2. The optimal covariant measurement has ‘seed’ s0 given by

s0 = Pmin/dmin

where Pmin is the projection onto the eigenspace of W0 corresponding to the minimal eigenvalue, and dmin is the dimension of that eigenspace.


The optimal measurement in the case of irreducible representations

Proof.
As shown before, we can restrict to covariant measurements, and the risk is

R(M) = R(M, θ0) = d ∫ W(θ, θ0) Tr(ρθ0 s(θ)) ν(dθ)
     = d ∫ W(θ, θ0) Tr(U(g)∗ρθ0U(g) s0) ν(dθ) = d Tr(W0s0)

where W0 is the positive operator (exercise: verify selfadjointness)

W0 = ∫ W(θ, θ0) U(g)∗ρθ0U(g) ν(dθ)

The minimum of Tr(W0s0) over all density matrices s0 is achieved at Pmin/dmin, with Pmin the eigenprojection corresponding to the minimal eigenvalue of W0 (exercise)


Example: estimation of pure states

I Let Q := {ρθ = |θ〉〈θ| : θ ∈ Θ} be the family of pure states where

|θ〉 = Σ_{i=1}^{d} θi |ei〉 ∈ Cd and Σ_{i=1}^{d} |θi|² = 1

I Let W(θ̂, θ) = 1 − |〈θ̂, θ〉|² be the fidelity distance

Remark (exercise)

I Strictly speaking, the state determines the vector |θ〉 only up to a phase factor. This can be taken into account by fixing the phase of one of the coefficients

I The quantum statistical model Q is covariant with respect to the (irreducible) representation of the special unitary group SU(d) on Cd

I The loss function W (θ, θ) is invariant under the action of SU(d)


Example: estimation of pure states

Theorem (optimal estimation of pure states)

The optimal covariant measurement for the above quantum estimation problem is

M(dθ) = d |θ〉〈θ| ν(dθ)

where ν is the unique SU(d)-invariant measure on Θ.
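A quick Monte Carlo check of this result (my addition, not part of the notes): the risk of the covariant measurement at θ0 is ∫ (1 − F) dF ν(dθ̂) with F = |〈θ̂, θ0〉|², which evaluates to (d − 1)/(d + 1), i.e. 1/3 for a qubit. Haar-random pure states are obtained as normalised complex Gaussian vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 200_000
# Haar-random pure states: normalised complex Gaussian vectors
psi = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
F = np.abs(psi[:, 0]) ** 2             # fidelity |<theta_hat, theta_0>|^2, theta_0 = e1
risk = np.mean(d * F * (1 - F))        # Monte Carlo risk of M at theta_0
exact = (d - 1) / (d + 1)              # = 1/3 for d = 2
```

This reproduces the known value 2/(d + 1) for the optimal mean fidelity achievable from a single copy of an unknown pure state.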


Example: estimation of pure states

Proof.

I Let θ0 = (1, 0, . . . , 0). The stationary group of θ0 is G0 ≅ SU(d − 1), consisting of unitaries acting on Cd−1 = Lin{e2, . . . , ed} and leaving e1 fixed.

I According to the Proposition ‘measurement seed for irreps’

s0 = λP0 + ((1 − λ)/(d − 1)) P0⊥, λ ∈ [0, 1]

where P0 = |θ0〉〈θ0|.

I According to the Proposition ‘optimal seed for irreps’ we need to optimise the affine functional λ 7→ Tr(W0s0); thus the minimum is achieved at one of the extremal points λ = 0 or λ = 1.

I By direct calculation one can verify that λ = 1 is the minimum (exercise)


Quantum harmonic oscillators and Gaussian states

I The quantum harmonic oscillator/quantum particle

I Creation and annihilation operators, phase-shift operator

I Coherent states

I Squeezed states

I Thermal equilibrium states

I All gaussian states

Reference: U. Leonhardt, Measuring the quantum state of light, Cambridge University Press, 1997


The quantum harmonic oscillator

Definition (position and momentum)

A quantum harmonic oscillator (quantum particle) is characterised by its position and momentum (unbounded) observables, acting on L2(R) as

Q : h 7→ xh(x), h ∈ D(Q)
P : h 7→ −i dh/dx, h ∈ D(P)

Formally, Q and P satisfy the Heisenberg commutation relation

[Q,P] = i1

Lemma (exercise)

Let F : L2(R) → L2(R) be the (unitary) Fourier transform

F[f](p) = (1/√2π) ∫ f(q)e−ipq dq,  F−1[g](q) = (1/√2π) ∫ g(p)eipq dq

The operators Q, P are Fourier transforms of each other:

Q = FPF∗, P = F∗QF


Weyl operators

Theorem (Baker–Hausdorff formula)

Let F, G be operators such that [F, G] commutes with both F and G. Then

exp(F + G) = exp(−(1/2)[F, G]) exp(F) exp(G)

Definition (Weyl operators)

The Weyl operators are defined as U(a) := exp(iaP) and V(b) := exp(ibQ). From the Baker–Hausdorff formula we get the Weyl relations

U(a)V(b) = exp(iab) V(b)U(a)

Alternatively we will use the displacement operators

D(q, p) := exp(ipQ − iqP) = exp(−ipq/2) exp(ipQ) exp(−iqP)


Projective unitary representation of R2

Remark (projective unitary representation of R2)

I Note that U(a) and V(b) act as displacement operators

U(a)QU(a)∗ = Q + a1, V(b)PV(b)∗ = P − b1

I The unitaries D(q, p) satisfy

D(q, p)D(q′, p′) = exp((i/2)(pq′ − qp′)) D(q + q′, p + p′),

hence we have a projective unitary representation of R2.

I The theory of covariant measurements can be extended to projective unitary representations; in particular the statistical model

ρq,p = D(q, p)ρ0,0D(q, p)∗, (q, p) ∈ R2

is covariant w.r.t. displacements in R2.


Weyl/CCR algebra

Definition (Weyl/CCR algebra)

The C∗-algebra generated by the Weyl operators is called the Weyl or CCR (canonical commutation relations) algebra.

Lemma (irreducibility of the defining representation)

The representation of CCR on L2(R) is irreducible (exercise).

Hint. To prove this, verify that if 〈g, U(a)V(b)f〉 = 0 for all (a, b) then g = 0 or f = 0, by using properties of the Fourier transform.

Theorem (von Neumann’s uniqueness Theorem)

All weakly continuous (w.r.t. (a, b)) irreducible representations of the Weyl algebra are unitarily equivalent to each other.

Proof.
See page 225 in A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982)


Creation and annihilation operators

Definition (Fock basis)

The Fock, or number, O.N. basis in L2(R) is defined by

ψn(x) = Hn(x)e−x²/2/(√π 2^n n!)^{1/2}, n ≥ 0

where Hn are the Hermite polynomials. We will denote the vectors ψn by |n〉

Definition (creation and annihilation operators)

The creation and annihilation operators on L2(R) are defined as

a∗ := (Q − iP)/√2 and a := (Q + iP)/√2

and satisfy the commutation relation [a, a∗] = 1.
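These relations are easy to verify in a truncated Fock basis (a numerical aside, not part of the notes); the commutation relation holds exactly below the truncation boundary.

```python
import numpy as np

N = 12
a = np.diag(np.sqrt(np.arange(1.0, N)), 1)   # annihilation: a|n> = sqrt(n)|n-1>
ad = a.conj().T                              # creation a*
comm = a @ ad - ad @ a                       # [a, a*]

# ladder action on |3>
e3 = np.zeros(N); e3[3] = 1.0
lowered = a @ e3                             # = sqrt(3)|2>

# number operator N = a*a = (P^2 + Q^2 - 1)/2
Q = (a + ad) / np.sqrt(2)
P = (a - ad) / (1j * np.sqrt(2))
num = ((Q @ Q + P @ P - np.eye(N)) / 2).real
```

Away from the last Fock level the commutator is the identity and (Q² + P² − 1)/2 is diagonal with entries 0, 1, 2, . . ., as stated on the next slides.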


Number and phase-shift operators

Lemma (ladder operators)

I |n〉 are the eigenvectors of the number operator

N := a∗a = (P2 + Q2 − 1)/2 and N|n〉 = n|n〉

I a,and a∗ act as ‘ladder operators’ on |n〉 : n ≥ 0:

a∗|n〉 =√

n + 1|n + 1〉, aψn =√

n|n − 1〉, a|0〉 = 0

Lemma
The phase-shift unitary Γ(φ) := exp(−iφN) acts as

Γ(φ)∗aΓ(φ) = a exp(−iφ)

or equivalently as a rotation of the phase space variables

Γ(φ)∗ (Q, P)T Γ(φ) = R(φ) (Q, P)T, where R(φ) := ( cosφ  sinφ ; −sinφ  cosφ )

Proof. By differentiating Γ(φ)∗aΓ(φ) w.r.t. φ we get −iΓ(φ)∗aΓ(φ)


Coherent states

Definition (vacuum and coherent states)

I The vector |0〉 is called vacuum or ground state.

I The displacement operators D(q, p) can be rewritten in complex form

D(z) := exp(za∗ − z̄a), z := (q + ip)/√2 ∈ C

I |z〉 := D(z)|0〉 is called a coherent vector, and

|z〉 = exp(−|z|²/2) Σ_{n=0}^{∞} (zⁿ/√n!) |n〉

I In particular, N has a Poisson distribution with intensity |z|² w.r.t. |z〉〈z|

P|z〉〈z|(n) = |〈n|z〉|² = exp(−|z|²) |z|²ⁿ/n!

I Γ(φ) acts on coherent vectors as a phase-shift

Γ(φ)|z〉 = |e−iφz〉
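The Poisson number statistics can be checked directly in a truncated Fock basis (a numerical aside, not part of the notes):

```python
import numpy as np
from math import factorial

N, z = 40, 0.8 + 0.3j
n = np.arange(N)
fact = np.array([float(factorial(k)) for k in n])
coeff = np.exp(-abs(z) ** 2 / 2) * z ** n / np.sqrt(fact)   # <n|z>
pn = np.abs(coeff) ** 2                                     # number distribution
poisson = np.exp(-abs(z) ** 2) * abs(z) ** (2 * n) / fact   # Poisson(|z|^2)
norm = pn.sum()                                             # ~ 1 (tiny truncation tail)
mean_n = (n * pn).sum()                                     # ~ |z|^2
```

For |z| < 1 the coefficients decay faster than any geometric sequence, so the truncation at N = 40 is far below machine precision.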


Overcompleteness of the coherent states

Lemma (Overcompleteness of the coherent states)

I The wave function of the coherent vector |z〉 is

ψz(x) = ψ0(x − q) exp(ipx − ipq/2), z = (q + ip)/√2

I The inner product of two coherent vectors is

〈z|z′〉 = exp(−|z − z′|²/2) exp(i Im(z̄z′))

I The coherent states form an overcomplete set of projections

∫ (dq dp/2π) |z〉〈z| = 1, z = (q + ip)/√2

In particular the linear span of the coherent vectors is dense in L2(R)

Proof.
The first two items follow from the definitions. The overcompleteness can be checked by taking matrix elements w.r.t. the Fock basis.


Squeezed states

Definition (displaced squeezed vacuum)

The unitary operator S(ξ) is called a squeezing operator:

S(ξ) := exp((ξ/2)(a² − a∗²)), ξ ∈ R

The vector states |φ, ξ, z〉 := Γ(φ)S(ξ)D(z)|0〉 are called (pure) squeezed states.

Lemma (squeezing of coordinates)

S(ξ) has the following action on Q and P

S(ξ)∗QS(ξ) = Qe−ξ, S(ξ)∗PS(ξ) = Peξ

Consequently, squeezing, phase shifting and displacing has the action

Ad[D(z)∗Γ(φ)∗S(ξ)∗] : (Q, P)T 7→ ( cosφ e−ξ  sinφ eξ ; −sinφ e−ξ  cosφ eξ ) (Q, P)T + (q, p)T 1

Proof. By differentiating S(ξ)∗QS(ξ) w.r.t. ξ we obtain −S(ξ)∗QS(ξ)
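The differentiation step can be checked at the level of generators (a numerical aside, not part of the notes): with G := a² − a∗², the proof amounts to the commutator identity (1/2)[Q, G] = −Q, which holds exactly in a truncated Fock basis away from the truncation boundary.

```python
import numpy as np

N = 30
a = np.diag(np.sqrt(np.arange(1.0, N)), 1)
Q = (a + a.T) / np.sqrt(2)
G = a @ a - a.T @ a.T                  # squeezing generator: S(xi) = exp((xi/2) G)
C = (Q @ G - G @ Q) / 2                # (1/2)[Q, G], should equal -Q
err = np.abs(C + Q)[: N - 2, : N - 2].max()
```

Only matrix elements within two levels of the cutoff feel the truncation, hence the restriction to the upper-left block.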


Gaussian states

Definition (Gaussian state)

I The quadrature observables are defined by

Xφ := Q cosφ+ P sinφ = Γ(φ)∗QΓ(φ)

I A state ρ is called Gaussian if Xφ has Gaussian distribution for all φ

Lemma
A Gaussian state ρ is completely characterised by the mean values (q, p) := (Tr(ρQ), Tr(ρP)) and the ‘covariance matrix’ of (Q, P)

Vρ := ( Tr(ρ(Q − q)²)          Tr(ρ(Q − q) ◦ (P − p)) ;
        Tr(ρ(Q − q) ◦ (P − p))  Tr(ρ(P − p)²) ) = Tr(ρ X ◦ XT)

where A ◦ B := (AB + BA)/2 denotes the symmetrised product and

X = ( Q − q1 ; P − p1 )

In particular, the distribution of Xφ is

N(q cosφ + p sinφ, [R(φ)VρR(φ)T]11)

129

Page 130: Quantum Statistics

Characterisation of Gaussian states

Proof.
Any (classical) Gaussian distribution is uniquely determined by its mean and variance. The mean and variance of Xφ are

Tr(ρXφ) = Tr(ρ(Q cosφ + P sinφ)) = q cosφ + p sinφ

and

Tr(ρ(Xφ − (q cosφ + p sinφ))²) = Tr(ρ[R(φ) X XT R(φ)T]11) = [R(φ)VρR(φ)T]11

The fact that there can be only one Gaussian state with a given mean and variance can best be seen by associating to ρ, in a one-to-one fashion, its Wigner function Wρ, which in the case of Gaussian states is just the Gaussian density N((q, p), V).

Wigner functions will be studied in the next section.


The squeezed states are Gaussian

Lemma (the vacuum state is Gaussian)

The vacuum state is a Gaussian state, with each quadrature having distribution N(0, 1/2). The covariance matrix of (Q, P) is

V|0〉 = ( Tr(|0〉〈0|Q²)   Tr(|0〉〈0|Q ◦ P) ; Tr(|0〉〈0|Q ◦ P)   Tr(|0〉〈0|P²) ) = ( 1/2  0 ; 0  1/2 )

Corollary (all squeezed (coherent) states are Gaussian)

The squeezed and coherent states are Gaussian.

The distribution of Xφ′ with respect to ρ := |φ, ξ, z〉〈φ, ξ, z| is the marginal along the direction φ′ of the bivariate Gaussian N((q, p), V) with covariance matrix

V = ( Tr(ρ(Q − q)²)   Tr(ρ(Q − q) ◦ (P − p)) ; Tr(ρ(Q − q) ◦ (P − p))   Tr(ρ(P − p)²) ) = (1/2) R(φ) ( e−2ξ  0 ; 0  e2ξ ) R(φ)T

Moreover these are the only states of minimum uncertainty (exercise)

Varρ(Q)Varρ(P) = 1/4


Thermal equilibrium states

Definition (Thermal equilibrium state)

The thermal equilibrium state at inverse temperature β > 0 is defined by

ρβ := (1 − e−β) Σ_{n=0}^{∞} e−nβ |n〉〈n|

Remark

I The thermal equilibrium states are faithful, i.e. all eigenvalues of ρβ are strictly positive

I The thermal equilibrium states are invariant under phase shifts

I The number operator N has a geometric distribution with mean N̄ = (eβ − 1)−1:

Pρβ(n) = (1 − e−β)e−nβ
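A numerical aside (not part of the notes): truncating the geometric distribution at a large N reproduces the stated mean and, since 〈n|Q²|n〉 = n + 1/2, also the quadrature variance coth(β/2)/2 that appears on the following slides.

```python
import numpy as np

beta, N = 0.7, 200
n = np.arange(N)
p = (1 - np.exp(-beta)) * np.exp(-n * beta)   # P(n) = (1 - e^-beta) e^-n beta
mean_N = (n * p).sum()                        # should be 1/(e^beta - 1)
var_Q = ((n + 0.5) * p).sum()                 # Tr(rho_beta Q^2) = mean_N + 1/2
coth_half = np.cosh(beta / 2) / np.sinh(beta / 2)
```

The identity N̄ + 1/2 = coth(β/2)/2 is the content of the corollary on thermal Gaussian states below.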


Thermal states as mixtures of coherent states

Lemma
The thermal equilibrium state is a mixture of coherent states with Gaussian weight of variance σ² = 1/(eβ − 1):

ρβ = (1/2πσ²) ∫ e−(q²+p²)/(2σ²) |z〉〈z| dq dp, z = (q + ip)/√2

Proof.
By rotation (phase) symmetry it suffices to verify that the diagonal matrix elements of the two sides coincide:

(1/σ²) ∫0∞ e−r²/(2σ²) e−r²/2 ((r²/2)ⁿ/n!) d(r²/2) = (1 − e−β)e−nβ


The thermal equilibrium states are Gaussian

Corollary

The thermal equilibrium state ρβ is a centered Gaussian state with covariance matrix

( Tr(ρQ²)   Tr(ρQ ◦ P) ; Tr(ρQ ◦ P)   Tr(ρP²) ) = (coth(β/2)/2) ( 1  0 ; 0  1 ),

where coth(β/2) = (eβ/2 + e−β/2)/(eβ/2 − e−β/2).


All Gaussian states

Theorem (general form of a Gaussian state)

Any Gaussian state of a quantum harmonic oscillator is of the form

ρ = D(z)∗Γ(φ)∗S(ξ)∗ ρβ S(ξ)Γ(φ)D(z),

i.e. a displaced, rotated, squeezed thermal state. The corresponding bivariate Gaussian is N((q, p), V) with covariance matrix

V = (coth(β/2)/2) R(φ) ( e−2ξ  0 ; 0  e2ξ ) R(φ)T

A positive real matrix V is the covariance matrix of a Gaussian state iff

Det(V) ≥ 1/4

Proof: exercise.


Estimation of Gaussian states

I Gaussian shift models

I Gaussian estimation problems

I One dimensional Gaussian shift

I Two dimensional Gaussian shift

I The optimal covariant measurement

Reference: A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982)


Gaussian shift models

Definition
Denote by G(z, V) the density matrix of the Gaussian state with displacement z and covariance matrix V.

A quantum Gaussian shift model is a family of the form

QV := {G(θ, V) : θ ∈ Θ}, Θ ⊂ C

where Θ is a real linear subspace of C and V is a fixed and known covariance matrix.

Remark
In the case of a single quantum oscillator considered so far, there are only two possible types of Gaussian shift models: the one-dimensional and the two-dimensional (full) shift.


Equivalence with displaced thermal equilibrium states

Lemma
By applying an appropriate (unitary) squeezing operation we can transform the model QV into an equivalent model consisting of displaced thermal equilibrium (or coherent) states.

Proof.
By the Theorem on the general form of a Gaussian state we have

G(z, V) = D(z)∗Γ(φ)∗S(ξ)∗ ρβ S(ξ)Γ(φ)D(z)

We have

S(ξ)Γ(φ)D(z)Γ(φ)∗S(ξ)∗ = D(z′), where z′ = eξ Re(eiφz) + i e−ξ Im(eiφz)

From the above equations we get

G(z′, σ²1) = S(ξ)Γ(φ) G(z, V) Γ(φ)∗S(ξ)∗, σ² = Det(V)1/2


Gaussian estimation problems

We will consider the following two estimation problems:

1. estimation of the parameter θ in the one-dimensional Gaussian shift

{ρθ := exp(−iθP)ρβ exp(iθP) : θ ∈ R}

for the quadratic risk R(θ̂, θ) = Eθ((θ̂ − θ)²)

2. estimation of the parameter θ = (q, p) in the two-dimensional Gaussian shift

{ρθ := D(q, p)ρβD(q, p)∗ : (q, p) ∈ R2}

for the quadratic risk

R(θ̂, θ, G) = Eθ((θ̂ − θ)T G(θ̂ − θ))

By rotation symmetry of ρβ we can pass to a coordinate system in which G = Diag(gq, gp) is diagonal and write

R(θ̂, θ, G) = gqEθ((q̂ − q)²) + gpEθ((p̂ − p)²)


One-dimensional Gaussian shift

Theorem (optimal estimation for one-dimensional shift)

Let

ρθ := exp(−iθP)ρβ exp(iθP), θ ∈ R,

be a Gaussian shift with 0 < β < ∞ fixed and known.

The symmetric logarithmic derivative Lθ, defined by

dρθ/dθ = Lθ ◦ ρθ := (Lθρθ + ρθLθ)/2,

is equal to 2(Q − θ1)/coth(β/2), and the quantum Cramér–Rao bound is achieved by measuring Q. The resulting unbiased estimator θ̂ has risk

R(θ̂, θ) = Varθ(θ̂) = H(θ)−1 = coth(β/2)/2
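The defining equation −i[P, ρβ] = L0 ◦ ρβ with L0 = 2Q/coth(β/2) can be verified entrywise in a truncated Fock basis (a numerical check, not part of the notes); since all operators involved only couple neighbouring number states, the truncation introduces no error here.

```python
import numpy as np

beta, N = 0.7, 30
n = np.arange(N)
rho = np.diag((1 - np.exp(-beta)) * np.exp(-n * beta)).astype(complex)
a = np.diag(np.sqrt(np.arange(1.0, N)), 1)
Q = (a + a.T) / np.sqrt(2)
P = (a - a.T) / (1j * np.sqrt(2))
L0 = 2 * np.tanh(beta / 2) * Q                # = 2Q / coth(beta/2)
lhs = -1j * (P @ rho - rho @ P)               # -i [P, rho_beta]
rhs = (L0 @ rho + rho @ L0) / 2               # symmetrised product L0 ∘ rho_beta
err = np.abs(lhs - rhs).max()
```

This is exactly the matrix-element computation carried out in the proof on the next slide, with 2 tanh(β/2) = 2/coth(β/2).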


One-dimensional Gaussian shift: proof

Proof.

I We have

dρθ/dθ = −i[P, ρθ] = −i exp(−iθP)[P, ρβ] exp(iθP)

Thus the s.l.d. is of the form Lθ = exp(−iθP)L0 exp(iθP), where L0 is the solution of

−i[P, ρβ] = L0 ◦ ρβ

I By writing the matrix elements w.r.t. the Fock basis we get

−i(e−nβ − e−mβ)〈m|P|n〉 = ((e−nβ + e−mβ)/2)〈m|L0|n〉

with solution L0 = 2Q/coth(β/2). Hence Lθ = 2(Q − θ1)/coth(β/2)

I The Helstrom–Fisher information is H(θ) = Tr(ρθL²θ) = 2/coth(β/2)

I The result θ̂ of measuring Q is an unbiased estimator of θ and

Varθ(θ̂) = H(θ)−1


One dimensional Gaussian shift as a covariant family

Remark

I The s.l.d.’s {Lθ : θ ∈ R} form a commutative family, and in this case the Cramér–Rao bound can be achieved ‘in one shot’, not only in the sense of locally unbiased measurements, which provide only asymptotic optimality

I The Helstrom–Fisher information and the risk do not depend on θ, due to the fact that we deal with a covariant family. The methods developed for covariant measurements with respect to compact groups can be extended to R and lead to the same optimal measurement (see Holevo)

I The same result can be obtained in the case of coherent states (‘β = ∞’) with the difference that Lθ is not uniquely defined as an operator but only as an element of L2(ρθ)


Two-dimensional Gaussian shift

Theorem (optimal estimation for two-dimensional shift)

Let

ρθ := D(q, p)ρβD(q, p)∗, θ = (q, p) ∈ R2,

be a Gaussian shift model with 0 < β < ∞ fixed and known.

Let us fix the quadratic risk for an estimator θ̂ = (q̂, p̂)

R(θ̂, θ, G) = gqEθ((q̂ − q)²) + gpEθ((p̂ − p)²)

1. The following covariant measurement is optimal for the above estimation problem

M(dq̂ dp̂) = |ξ, ẑ〉〈ξ, ẑ| dq̂ dp̂/2π

where |ξ, ẑ〉 is the pure squeezed state with ẑ = (q̂ + ip̂)/√2 and squeezing parameter e−2ξ = √(gp/gq)

2. The components of the corresponding estimator θ̂ = (q̂, p̂) are unbiased estimators of q and respectively p, and their covariance matrix achieves the lower bound in the right Cramér–Rao bound.


Simple implementation of the optimal measurement (I)

The measurement M(dq̂ dp̂) can be dilated to a PVM as follows.

I Let Q, P be the coordinates of the system and let Q′, P′ be the coordinates of an independent copy of the oscillator. On the joint space L2(R) ⊗ L2(R) we have

(Q, P) ≡ (Q ⊗ 1, P ⊗ 1), (Q′, P′) ≡ (1 ⊗ Q′, 1 ⊗ P′)

I Define the rotated coordinates (50% beam-splitter transformation)

(Q1, P1) = (1/√2)(Q + Q′, P + P′), (Q2, P2) = (1/√2)(Q − Q′, P − P′),

and note that P1 commutes with Q2. Thus we can define the PVM

E(dq dp) = E Q̃(dq) · E P̃(dp)

with Q̃ := Q − Q′ and P̃ := P + P′
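That Q̃ = Q − Q′ and P̃ = P + P′ indeed commute follows from [Q̃, P̃] = [Q, P] ⊗ 1 − 1 ⊗ [Q, P] = 0. A numerical check on two truncated oscillators (not part of the notes; boundary Fock levels are excluded, since there the truncated commutator deviates from i1):

```python
import numpy as np

D = 8
a = np.diag(np.sqrt(np.arange(1.0, D)), 1)
Q = (a + a.T) / np.sqrt(2)
P = (a - a.T) / (1j * np.sqrt(2))
I = np.eye(D)
Qt = np.kron(Q, I) - np.kron(I, Q)     # Q~ = Q - Q'
Pt = np.kron(P, I) + np.kron(I, P)     # P~ = P + P'
C = Qt @ Pt - Pt @ Qt                  # = [Q,P]⊗1 - 1⊗[Q,P]
# restrict to basis states |m>|n> with m, n below the truncation level
keep = [m * D + n for m in range(D - 1) for n in range(D - 1)]
err = np.abs(C[np.ix_(keep, keep)]).max()
```

The two truncation defects sit in identical diagonal positions of the two factors and cancel on the retained subspace.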


Simple implementation of the optimal measurement (II)

I If the oscillator (Q′, P′) is prepared in the state |ξ〉 then we obtain the effective measurement M on the first copy of L2(R):

Tr(ρM(B)) = Tr(ρ ⊗ |ξ〉〈ξ| E(B)), B ∈ Σ(R2)

I The measurement M is uniquely fixed by its characteristic functions

φMρ(u, v) := Tr(ρM(eiuq̂+ivp̂)) = Tr(ρ ⊗ |ξ〉〈ξ| E(eiuq̂+ivp̂)), ρ ∈ T1(L2(R))

Thus it is enough to show that the right hand side is equal to

∫ eiuq̂+ivp̂ Tr(ρ |ẑ, ξ〉〈ẑ, ξ|) dq̂ dp̂/2π

I Finally, up to a density argument, it is enough to show this for rank one ρ of the form |z1, ξ〉〈z2, ξ|. In this case the equality reduces to a computation with Gaussian integrals (exercise)


Proof of the optimal estimation for two-dimensional shift Theorem

I It is easy to see that the measurement is unbiased and has the same risk for all θ

I To show that the measurement is optimal one has to show that Tr(G Var(θ̂)) achieves the smallest possible value.

I For this it suffices to show that the right Cramér–Rao bound is achieved, in the sense that

Tr(G Varθ(θ̂)) = Tr(G J−1θ)

I Both sides can be explicitly computed, similarly to the one-dimensional case.


Alternative proof using covariant measurements

Lemma
Any covariant measurement w.r.t. the displacement operators has a POVM of the form

M(dq̂ dp̂) = D(ẑ)ρD(ẑ)∗ dq̂ dp̂/2π, ẑ = (q̂ + ip̂)/√2

where ρ is an arbitrary state.

Proof.
Extension of the results on covariant measurements for irreducible representations to the case of the non-compact (but abelian) group R2

Theorem
The covariant measurement which minimises the risk R(θ̂, θ, G) is the one with seed ρ = |ξ〉〈ξ|,

M(dq̂ dp̂) = D(ẑ)|ξ〉〈ξ|D(ẑ)∗ dq̂ dp̂/2π

with e−2ξ = √(gp/gq).


Alternative proof using covariant measurements

I Since the risk is affine w.r.t. ρ, the minimum is achieved on an extremal point, so we can restrict to pure states ρ = |ψ〉〈ψ|

I Let M be the measurement with seed ψ. Then M can be dilated to a PVM on two oscillators as before, by simply replacing |ξ〉 by |ψ〉, and

Eθ(q̂) = ∫ q̂ Tr(ρθ ⊗ |ψ〉〈ψ| E Q̃(dq̂) E P̃(dp̂)) = Tr(ρθ ⊗ |ψ〉〈ψ| Q̃)
      = Tr(ρθQ) − 〈ψ|Q′|ψ〉 = q − 〈ψ|Q′|ψ〉

Similarly, Eθ(p̂) = p + 〈ψ|P′|ψ〉. The measurement is thus unbiased up to a constant shift.

I The risk of M is constant as a function of θ and equal to

R(M, G) = gq[Tr(ρβQ²) + 〈ψ|Q′²|ψ〉] + gp[Tr(ρβP²) + 〈ψ|P′²|ψ〉]
        = (coth(β/2)/2) Tr(G) + [gq〈ψ, Q′²ψ〉 + gp〈ψ, P′²ψ〉]


Alternative proof using covariant measurements

I We have

(gq〈ψ, Q′²ψ〉 + gp〈ψ, P′²ψ〉)/2 ≥ √(gqgp) √(〈ψ, Q′²ψ〉〈ψ, P′²ψ〉) ≥ (1/2)√(gqgp)

where we used Heisenberg’s uncertainty relation in the second inequality.

I Equalities are obtained if |ψ〉 is a minimum uncertainty state, i.e. a pure squeezed state |ψ〉 = |ξ〉, and if

gqe−2ξ = gpe2ξ, i.e. e−2ξ = √(gp/gq)

I The minimum risk is

R(G) = (coth(β/2)/2) Tr(G) + √(Det G)
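The last minimisation is easy to reproduce numerically (a sketch, not part of the notes), using the squeezed-vacuum variances 〈ξ|Q′²|ξ〉 = e−2ξ/2 and 〈ξ|P′²|ξ〉 = e2ξ/2 from the earlier slides:

```python
import numpy as np

gq, gp = 2.0, 0.5
xi = np.linspace(-2, 2, 100_001)
seed_term = gq * np.exp(-2 * xi) / 2 + gp * np.exp(2 * xi) / 2
min_numeric = seed_term.min()          # should equal sqrt(gq * gp) = sqrt(Det G)
xi_star = xi[seed_term.argmin()]       # optimal squeezing: e^{-2 xi*} = sqrt(gp/gq)
```

The grid minimum matches √(gqgp) = 1 and the minimiser satisfies the stated optimality condition.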


Wigner functions and quantum homodyne tomography

I Hilbert-Schmidt operators

I Isometry between T2 and L2(R2)

I The Wigner function

I Examples of Wigner function

I Quantum homodyne tomography

I Estimation of matrix elements using pattern functions

References:
A. S. Holevo, Probabilistic and statistical aspects of quantum theory (1982)
L. Artiles, R. D. Gill and M. Guta, J. Royal Stat. Soc. B, 67, 109-134 (2005)


Hilbert-Schmidt operators

Definition (Hilbert-Schmidt operators)

Let H be a Hilbert space. The class of Hilbert–Schmidt operators is defined by

T2(H) := {τ ∈ B(H) : Tr(|τ|²) < ∞}

with ‖τ‖2 = Tr(|τ|²)1/2.

Properties

I T2(H) is a Hilbert space with inner product

〈τ, σ〉2 = Tr(τ∗σ)

I The finite rank operators are dense in T2(H)

I Any τ ∈ T2(H) has a singular value decomposition

τ = Σ_{i=1}^{∞} µi |ei〉〈fi|,

where µi ≥ 0, {ei} and {fi} are O.N. bases, and Σi µi² = ‖τ‖2² < ∞.

I The trace-class operators T1(H) form a subset of T2(H)


Isometry between T2(L2(R)) and L2(R2)

Proposition (Isometry between T2(L2(R)) and L2(R2))

1. Let ψ1, ψ2 be vectors in L2(R). Then the function

(u, v) 7→ W̃|ψ2〉〈ψ1|(u, v) := 〈ψ1, exp(−i(uQ + vP))ψ2〉

is square integrable, i.e. W̃(u, v) ∈ L2(R2)

2. If {ei} is an ONB of L2(R) then the functions

(1/√2π) W̃|ei〉〈ej|(u, v), i, j ≥ 1

form an ONB of L2(R2)

3. The transformation

τ 7→ (1/√2π) W̃τ(u, v) := (1/√2π) Tr(τ exp(−i(uQ + vP)))

maps the Hilbert space T2(L2(R)) unitarily onto L2(R2).


Isometry between T2(L2(R)) and L2(R2)

Proof.
1. Using the definition of the Weyl operators we have

(1/√2π) 〈ψ1, exp(−i(uQ + vP))ψ2〉 = (1/√2π) e(i/2)uv ∫ ψ̄1(x) e−iux ψ2(x − v) dx
  = (1/2π) e(i/2)uv ∫∫ ψ̄1(x) ψ̂2(y) eixy e−i(yv+ux) dx dy

where ψ̂2 is the Fourier transform of ψ2.

Since ψ1, ψ2 are square integrable, ψ̄1(x)ψ̂2(y)eixy is a square integrable function of (x, y), and the double integral is its Fourier transform.

2. If {ei}i is an ONB in L2(R) then {ēi(x)êj(y)eixy}i,j is an ONB in L2(R2). The result follows from the fact that the Fourier transform is a unitary operator on L2(R2).

3. This is a consequence of 2. and of the fact that the finite rank operators are dense in T2(L2(R)).


Wigner function

Definition (Wigner function)

Let τ ∈ T2(L2(R)).

I The ‘characteristic function’ of τ is defined as

W̃τ(u, v) := Tr(τ exp(−i(uQ + vP))), (u, v) ∈ R2

I The Wigner function of τ is defined as

Wτ(q, p) = (1/(2π)²) ∫∫ exp(iuq) exp(ivp) W̃τ(u, v) du dv
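As an illustration (my addition, not part of the notes): for the vacuum state the characteristic function is W̃(u, v) = e−(u²+v²)/4, and the inversion formula above gives the Gaussian Wigner function W(q, p) = (1/π) e−(q²+p²), which can be checked by numerical integration of the separable double integral.

```python
import numpy as np

u = np.linspace(-12, 12, 2001)
du = u[1] - u[0]
char_1d = np.exp(-u ** 2 / 4)          # vacuum characteristic function factorises

def wigner_vac(q, p):
    """Numerical inverse Fourier transform of the vacuum characteristic function."""
    wq = (np.exp(1j * u * q) * char_1d).sum().real * du / (2 * np.pi)
    wp = (np.exp(1j * u * p) * char_1d).sum().real * du / (2 * np.pi)
    return wq * wp

w00 = wigner_vac(0.0, 0.0)             # expect 1/pi
w10 = wigner_vac(1.0, 0.0)             # expect e^{-1}/pi
```

The Riemann sum is spectrally accurate here because the integrand is a smooth, rapidly decaying Gaussian.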


(some) Properties of Wigner functions

I Let τ ∈ T2(L2(R)). Then Wτ is a square integrable function and

Wτ∗(q, p) = W̄τ(q, p) (complex conjugate)

I (Overlap formula) Let τ1, τ2 ∈ T2(L2(R)). Then

Tr(τ1∗τ2) = 2π ∫∫ W̄τ1(q, p) Wτ2(q, p) dq dp

I Let ρ be a density matrix. Then the one-dimensional marginal along the direction φ of the Wigner function Wρ is equal to the probability density of the quadrature Xφ in the state ρ:

pXφρ(q) = ∫−∞∞ Wρ(q cosφ − p sinφ, q sinφ + p cosφ) dp

I Wρ is a quasi-probability distribution of Q and P: its marginals are probability densities, but Wρ may take negative values.

I Displacements, phase rotations and squeezings act in the obvious way on the space of Wigner functions.


Examples of Wigner functions

[Figures: Wigner function of a squeezed state; Wigner function of a single-photon-added state; Wigner function of a ‘Schrödinger cat’ state (superposition of two coherent vectors); Wigner function of the one photon state ψ1]


Quantum homodyne tomography

Quantum homodyne tomography is a measurement technique developed in quantum optics for the estimation of the state of a quantum oscillator (monochromatic mode of light).

G. Breitenbach, S. Schiller and J. Mlynek, Measurement of the quantum states of squeezed light, Nature 387, 471-475 (1997)


Quantum homodyne tomography: the measurement procedure

Measurement procedure

1. One chooses a random, uniformly distributed phase φ ∈ [0, π]

2. The quadrature Xφ is measured on a quantum system prepared in state ρ.This is a so called homodyne measurement (see figure)

3. Steps 1. and 2. are repeated on independent copies of ρ, and one collects i.i.d. data (Φ1, X1), . . . , (Φn, Xn) with probability density

pρ(φ, x) = p(φ) · pρ(x|φ) = (1/π) pXφρ(x), x ∈ R, φ ∈ [0, π]

[Figure: homodyne detection scheme — the signal is mixed with a local oscillator z = |z|eiφ at a beam splitter; the normalised difference photocurrent I2 − I1 of the two detectors is distributed according to pρ(x, φ)]
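The data generation is easy to simulate classically once the quadrature law is known. A sketch (my addition, not part of the notes) for a coherent state |z〉, z = (q + ip)/√2, for which Xφ ∼ N(q cosφ + p sinφ, 1/2); averaging the data against cosφ and sinφ recovers (q, p):

```python
import numpy as np

rng = np.random.default_rng(3)
q, p, n = 1.2, -0.4, 100_000
phi = rng.uniform(0, np.pi, n)                     # uniformly distributed phase
x = q * np.cos(phi) + p * np.sin(phi) + rng.normal(0.0, np.sqrt(0.5), n)
# unbiased reconstruction: E[2 cos^2 phi] = 1 and E[2 sin phi cos phi] = 0 on [0, pi]
qhat = 2 * np.mean(x * np.cos(phi))
phat = 2 * np.mean(x * np.sin(phi))
```

This is the simplest instance of the inversion problem: for general states one needs the Radon transform or pattern functions discussed next.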


Quantum homodyne tomography: the Radon transform

Definition (Radon transform)

Let Wρ : R2 → R be the Wigner function of ρ. The Radon transform of Wρ is the function on R × [0, π] given by

R[Wρ](q, φ) := ∫−∞∞ Wρ(q cosφ − p sinφ, q sinφ + p cosφ) dp = pXφρ(q)

One cannot measure Q and P simultaneously:

QP − PQ = i1 =⇒ E(QP) = Tr[ρQP] ≠ Tr[ρPQ] = E(PQ)

Quadratures: Xφ = cosφ Q + sinφ P, with probability density pρ(x|φ).

[Diagram: the state ρ, its Wigner function Wρ(q, p) and the quadrature densities pρ(x|φ) determine each other: ρ and Wρ are related via the characteristic function W̃ρ(u, v) = Tr[ρ exp(−iuQ − ivP)] and Fourier transforms F1, F2, while Wρ and pρ(x|φ) are related by the Radon transform

R[f](x, φ) = ∫−∞∞ f(x cosφ + t sinφ, x sinφ − t cosφ) dt ]

Remark

I From the above diagram we conclude that the map ρ 7→ pρ(x, φ) is injective, and hence ρ is identifiable from the data with distribution Pnρ.

I If one aims at estimating Wρ rather than ρ, the problem is closely related to the ‘classical’ positron emission tomography method, with the difference that Wρ has intrinsic quantum properties, e.g. it may take negative values.

159
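As an aside (not part of the lecture), the Radon transform defined above is easy to evaluate numerically. The sketch below uses an illustrative Gaussian standing in for a Wigner function and checks that the marginal of a rotation-invariant function does not depend on the phase φ; the grid sizes and the normalisation are assumptions made for the example.

```python
import numpy as np

def radon(f, x, phi, t):
    """Numerical Radon transform R[f](x, phi): integrate f along the line
    (x cos(phi) + t sin(phi), x sin(phi) - t cos(phi)), t in R."""
    u = x * np.cos(phi) + t * np.sin(phi)
    v = x * np.sin(phi) - t * np.cos(phi)
    return np.sum(f(u, v)) * (t[1] - t[0])  # Riemann sum on a uniform grid

# Rotation-invariant Gaussian playing the role of a Wigner function
# (illustrative normalisation only).
w = lambda q, p: np.exp(-(q**2 + p**2)) / np.pi

t = np.linspace(-8.0, 8.0, 4001)
p0 = radon(w, 0.7, 0.0, t)
p1 = radon(w, 0.7, 1.1, t)
# By rotational symmetry the marginal does not depend on phi,
# and for this Gaussian it equals exp(-x^2)/sqrt(pi).
print(abs(p0 - p1) < 1e-12)
```

The Riemann sum is accurate here because the integrand decays rapidly; in practice one would truncate and discretise according to the tails of the state.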

Page 160: Quantum Statistics

Quantum homodyne tomography: the statistical problem

Problem
Given i.i.d. data (X1, Φ1), . . . , (Xn, Φn) with distribution Pρ, construct an estimator ρn of ρ such that

I ρn is consistent, i.e. d(ρn, ρ) → 0 as n → ∞ for some relevant distance d, e.g. norm-one, fidelity...

I R(ρn, ρ) := Eρ d(ρn, ρ) is small for all states ρ

Similar problem for the estimation of Wρ or some functional of ρ, e.g. Tr(ρ²)

Remark

I If some information is available about ρ we can encode it in the parametrisation ρ = ρθ for θ ∈ Θ

I If the dimension of Θ is infinite, we deal with a non-parametric estimation problem for which the risk R(ρn, ρ) may decrease slower than the 1/n rate typical for parametric problems

I A more restrictive statistical model {ρθ : θ ∈ Θ} leads to faster estimation rates, but one should avoid "model misspecification"

160

Page 161: Quantum Statistics

Estimation of matrix elements using pattern functions

Lemma
Let (X, Φ) ∼ Pρ. Let wρ(s|φ) := W̃ρ(s cos φ, s sin φ) and define

Fi,j(x, φ) := (1/2) F⁻¹[ w|j⟩⟨i|(−s|φ) |s| ](x)

where F⁻¹ is the inverse Fourier transform with respect to s. Then

Eρ(Fi,j(X, Φ)) = ρi,j.

Moreover Fj,k(x, φ) is of the form

Fj,k(x, φ) = fj,k(x) exp(i(j − k)φ)

with fj,k bounded oscillatory functions called pattern functions.

161

Page 162: Quantum Statistics

Estimation of matrix elements using pattern functions

Proof.
We have

ρi,j = Tr(ρ|j⟩⟨i|) = (1/2π) ∫∫ W̃ρ(u, v) W̃i,j(−u, −v) du dv

= (1/2π) ∫₀^π ∫_{−∞}^{∞} wρ(s|φ) w|j⟩⟨i|(−s|φ) |s| ds dφ

= (1/π) ∫₀^π ∫_{−∞}^{∞} Fi,j(x, φ) pρ(x|φ) dx dφ

= Eρ(Fi,j(X, Φ))

In the first equality we used the isometry between T2(L²(ℝ)) and L²(ℝ²).

In the second equality we passed to polar coordinates (u, v) = (s cos φ, s sin φ), with s ∈ ℝ and φ ∈ [0, π).

In the third equality we used the relation w(s|φ) = F[p(x|φ)](s); the last equality holds since Φ is uniform on [0, π) with density 1/π.

The dependence of Fj,k on φ follows from the definition of W̃j,k.

162

Page 163: Quantum Statistics

Pattern functions

[Figure: plots of the pattern functions f5,5, f20,20, f5,20 and f10,30 against q]

Pattern functions for different matrix elements

163

Page 164: Quantum Statistics

Estimators based on pattern functions

Definition (pattern function estimator for matrix elements)

Let (X1, Φ1), . . . , (Xn, Φn) be i.i.d. with distribution Pρ. The pattern function estimator of ρj,k is

ρ(n)j,k := (1/n) Σ_{i=1}^n fj,k(Xi) exp(i(j − k)Φi)

Theorem (consistency of truncated pattern function estimators)

Let ρ(n) be the density matrix estimator whose matrix elements are the pattern function estimators ρ(n)j,k for j, k ≤ d(n), and 0 when max(j, k) > d(n). Here d(n) is the effective dimension of the estimator and satisfies d(n) ↑ ∞ as n → ∞ and d(n) = o(n^{3/7}).

Then ρ(n) is consistent with respect to the ‖·‖₂ distance, i.e.

lim_{n→∞} Eρ‖ρ(n) − ρ‖₂² = 0

164
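The estimator and its truncation can be sketched in a few lines (not part of the lecture). Genuine pattern functions fj,k have to be built from the harmonic oscillator eigenfunctions, which is not reproduced here, so the sketch takes them as user-supplied callables and the toy check uses a made-up function in their place.

```python
import numpy as np

def pattern_estimate(X, Phi, f, j, k):
    """Pattern-function estimate of rho_{j,k}: the empirical mean of
    f_{j,k}(X_i) * exp(i (j - k) Phi_i) over the homodyne samples."""
    return np.mean(f(X) * np.exp(1j * (j - k) * Phi))

def truncated_estimate(X, Phi, f, d):
    """(d+1) x (d+1) truncated density-matrix estimate; entries with
    max(j, k) > d are zero by truncation. `f(j, k)` must return the
    pattern function f_{j,k} as a callable (an assumption of the sketch)."""
    rho = np.zeros((d + 1, d + 1), dtype=complex)
    for j in range(d + 1):
        for k in range(d + 1):
            rho[j, k] = pattern_estimate(X, Phi, f(j, k), j, k)
    return rho

rng = np.random.default_rng(0)
X = rng.normal(size=1000)
Phi = rng.uniform(0.0, np.pi, size=1000)
# For j == k the phase factor is 1, so the estimate is just the mean of f(X_i).
est = pattern_estimate(X, Phi, lambda x: x**2, 1, 1)
print(np.isclose(est, np.mean(X**2)))
```

The toy check only exercises the averaging formula; consistency of the full estimator relies on the truncation rule d(n) = o(n^{3/7}) from the theorem.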

Page 165: Quantum Statistics

Estimators based on pattern functions

Proof.

I We write the risk as

E(‖ρ(n) − ρ‖₂²) = Σ_{j,k=0}^∞ Eρ(|ρ(n)j,k − ρj,k|²)

= Σ_{j,k=0}^{d(n)} Eρ(|ρ(n)j,k − ρj,k|²) + Σ_{max(j,k)>d(n)} |ρj,k|².

The variance term (first) decreases with n but increases with d, while the bias term (second) decreases with d. Thus, if the variance is controlled while increasing d with n, we obtain the consistency result.

I Now

Eρ(|ρ(n)j,k − ρj,k|²) = (1/n) ∫ |Fj,k(x, φ) − ρj,k|² pρ(x|φ) (dφ/π) dx ≤ (1/n) ‖fj,k‖∞²

The result follows from the bound (see J. Royal Stat. Soc. B, 67, 109-134, (2005))

Σ_{j,k=0}^d ‖fj,k‖∞² = O(d^{7/3})

165

Page 166: Quantum Statistics

Dependence of the risk on dimension d

The graphs below show the risk as a function of dimension d for two estimation methods (pattern functions and sieve max. lik.) and several choices of n.

The tradeoff between bias and variance is reflected in the existence of a minimum for a certain 'oracle' dimension d∗(n) which depends on ρ.

[Figure: risk as a function of d ∈ {5, . . . , 25} for n = 100, 200, 400, 800, 1600, 3200, 6400, 12800, and a log–log plot of the optimal L² risk against n for the pattern function and maximum likelihood estimators]

166
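The U-shape of these risk curves is easy to reproduce in a synthetic toy model (a sketch, not the lecture's data): exponentially decaying "matrix elements" standing in for a nice state, perturbed by noise of variance 1/n, with the risk evaluated at every truncation dimension.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic coefficients with exponential decay (a stand-in for a state in a
# class like E(alpha, beta)) plus noise of variance 1/n on each estimate.
alpha, n, D = 0.5, 400, 40
j, k = np.meshgrid(np.arange(D), np.arange(D), indexing="ij")
rho = np.exp(-alpha * (j + k) / 2)
rho_hat = rho + rng.normal(scale=1.0 / np.sqrt(n), size=rho.shape)

def risk(d):
    """Squared L2 error of the estimator truncated to dimension d."""
    trunc = np.where((j <= d) & (k <= d), rho_hat, 0.0)
    return np.sum((trunc - rho) ** 2)

risks = [risk(d) for d in range(D)]
d_star = int(np.argmin(risks))
# Large bias for small d, large variance for large d: the minimum is interior.
print(d_star, 0 < d_star < D - 1)
```

The interior minimiser plays the role of the 'oracle' dimension d∗(n); as in the figure, it moves to larger d as n grows.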

Page 167: Quantum Statistics

Methods for choosing the dimension d(n)

Deterministic choice of dimension
Suppose that ρ belongs to a 'nice' class of states, e.g.

ρ ∈ E(α, β) := {τ : Tr(τ e^{αN}) ≤ β}, α, β > 0

Then

Σ_{max(j,k)>d} |ρj,k|² ≤ C(α, β) e^{−αd}

and the risk is upper bounded by

Eρ(‖ρn − ρ‖₂²) ≤ C d^{7/3}/n + C′ e^{−αd}

By choosing d = (1/α) log n we get

Eρ(‖ρn − ρ‖₂²) = O((log n)^{7/3}/n)

which is only slightly worse than the parametric rate 1/n

167
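The choice d = (1/α) log n can be checked numerically. The sketch below (with hypothetical constants C = C′ = α = 1, which are not specified in the slides) compares the bound at the log-rule dimension with its brute-force minimum over d.

```python
import numpy as np

# Hypothetical constants for the risk bound C d^{7/3}/n + C' e^{-alpha d}.
C, Cp, alpha = 1.0, 1.0, 1.0

def bound(d, n):
    """Upper bound on the L2 risk for truncation dimension d and n samples."""
    return C * d ** (7 / 3) / n + Cp * np.exp(-alpha * d)

ratios = []
for n in [10**3, 10**5, 10**7]:
    d_grid = np.arange(1, 200)
    d_star = d_grid[np.argmin(bound(d_grid, n))]  # brute-force minimiser
    d_rule = np.log(n) / alpha                    # d = (1/alpha) log n
    ratios.append(bound(d_rule, n) / bound(d_star, n))

# The log-rule is within a constant factor of the best possible dimension,
# and attains the (log n)^{7/3}/n rate from the slide.
print([float(r) < 3 for r in ratios])
```

At d = (1/α) log n the exponential term equals Cp/n, so the bound is dominated by the (log n)^{7/3}/n term, which is the rate stated above.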

Page 168: Quantum Statistics

Methods for choosing the dimension d(n)

Data driven choice of dimension
The previous method selects a d which works for all states in a certain class. It would be nicer to adapt the dimension to the particular state by making use of the data.

There are many 'model selection' methods, e.g. penalised maximum likelihood, cross-validation, Akaike's information criterion, thresholding...

In general one would like to find

d∗ = argmin_d R(d) := argmin_d Eρ‖ρ(n)d − ρ‖₂²

where ρ(n)d is the estimator of dimension d for n samples.

However R(d) depends on ρ, so we can try to estimate it and minimise the estimator.

168

Page 169: Quantum Statistics

Methods for choosing the dimension d(n)

The risk as a function of d is

R(d) = Eρ‖ρ(n)d − ρ‖₂² = Σ_{j,k=0}^d Eρ|ρ(n)j,k|² − 2 Σ_{j,k=0}^d |ρj,k|² + ‖ρ‖₂²

We try to change this into an expression depending only on the data (not on ρj,k), which is an estimator of the risk.

I The last term does not depend on d and can be dropped without changing the minimum

I The first term can be estimated from the data by simply taking Σ_{j,k=0}^d |ρ(n)j,k|²

I The second term can be estimated unbiasedly by

−(2/(n(n − 1))) Σ_{j,k=0}^d Σ_{a≠b} Fj,k(Xa, Φa) F̄j,k(Xb, Φb)

I Thus our estimator for R(d) − ‖ρ‖₂² is

Σ_{j,k=0}^d |ρ(n)j,k|² − (2/(n(n − 1))) Σ_{j,k=0}^d Σ_{a≠b} Fj,k(Xa, Φa) F̄j,k(Xb, Φb)

169
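As a sketch (not from the lecture), the a ≠ b double sum in the risk estimator need not be computed pairwise: it collapses via the identity Σ_{a≠b} z_a z̄_b = |Σ_a z_a|² − Σ_a |z_a|², giving O(n) cost per matrix entry. The array layout below, and the conjugate on the second factor, are assumptions of this illustration.

```python
import numpy as np

def cv_risk_estimate(F):
    """Estimate of R(d) - ||rho||_2^2 from pattern-function values, where
    F[j, k, a] = F_{j,k}(X_a, Phi_a) for 0 <= j, k <= d and a = 1..n.
    The a != b double sum collapses via
    sum_{a != b} z_a conj(z_b) = |sum_a z_a|^2 - sum_a |z_a|^2."""
    n = F.shape[-1]
    rho_hat = F.mean(axis=-1)                      # rho^(n)_{j,k}
    first = np.sum(np.abs(rho_hat) ** 2)           # sum_{j,k} |rho^(n)_{j,k}|^2
    cross = np.abs(F.sum(axis=-1)) ** 2 - np.sum(np.abs(F) ** 2, axis=-1)
    second = 2.0 / (n * (n - 1)) * np.sum(cross)
    return first - second

rng = np.random.default_rng(1)
F = rng.normal(size=(3, 3, 50)) + 1j * rng.normal(size=(3, 3, 50))

# Brute-force check of the collapsed double sum on one matrix entry.
n = F.shape[-1]
brute = sum(F[0, 0, a] * np.conj(F[0, 0, b])
            for a in range(n) for b in range(n) if a != b)
fast = np.abs(F[0, 0].sum()) ** 2 - np.sum(np.abs(F[0, 0]) ** 2)
print(np.isclose(brute, fast))
```

One then evaluates this quantity for each candidate d and selects the minimiser, as on the following slide.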

Page 170: Quantum Statistics

Methods for choosing the dimension d(n)

[Figure: L² norm of the error as a function of dimension D, theoretical and for the CV estimator, for 1000 observations]

The graph represents the risk R(d) as a function of dimension d (red) and the estimated risk for three samples of data from a squeezed state (blue)

170