Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures … · Sun. Eﬃcient...

Faster and Sample Near-Optimal Algorithms for Proper Learning

Mixtures of GaussiansConstantinos Daskalakis, MIT

Gautam Kamath, MIT

What’s a Gaussian Mixture Model (GMM)?

• Interpretation 1: PDF is a convex combination of Gaussian PDFs• 𝑝 𝑥 = Σ𝑖𝑤𝑖𝒩 𝜇𝑖 , 𝜎𝑖

2, 𝑥

• Interpretation 2: Several unlabeled Gaussian populations, mixed together

• Focus on mixtures of 2 Gaussians (2-GMM) in one dimension

PAC (Proper) Learning Model

• Output (with high probability) a 2-GMM 𝑋 which is close to 𝑋 (in statistical distance*)

• Algorithm design goals:• Minimize sample size• Minimize time

𝑋 ∈ ℱ 𝒜𝑥1, 𝑥2, … , 𝑥𝑚~ 𝑋 𝑋 ∈ ℱ

*statistical distance = total variation distance = ½ × 𝐿1 distance

Learning Goals

Improper Proper Parameter

Target

Learning Goals

Improper Proper Parameter

Target

Prior Work

Learning Model Sample Complexity Time Complexity

Improper Learning

Proper Learning

Parameter Estimation 𝑝𝑜𝑙𝑦 1/휀 𝑝𝑜𝑙𝑦 1/휀 [KMV10]

Prior Work

Improper Learning 𝑂 1/휀2 𝑝𝑜𝑙𝑦 1/휀 [CDSS14]

Proper Learning

Prior Work

Proper Learning 𝑂 1/휀2

𝑶 𝟏/𝜺𝟐 𝑂 1/휀7 [AJOS14] 𝑶 𝟏/𝜺𝟓 [DK14]

Prior Work

Proper Learning 𝑂 1/휀2

𝑶 𝟏/𝜺𝟐 𝑂 1/휀7 [AJOS14] 𝑶 𝟏/𝜺𝟓 [DK14]

• Sample lower bounds:• Improper and Proper Learning: Ω 1/휀2 [folklore]

• Parameter Estimation: Ω 1/휀6 [HP14]• Matching upper bound, but not immediately extendable to proper learning

The Plan

1. Generate a set of hypothesis GMMs

2. Pick a good candidate from the set

The Plan

Some Tools Along the Way

a) How to remove part of a distribution which we already know

b) How to robustly estimate parameters of a distribution

c) How to pick a good hypothesis from a pool of hypotheses

The Plan

Who Do We Want In Our Pool?

• Hypothesis: ( 𝑤, 𝜇1, 𝜎1, 𝜇2, 𝜎2)

• Need at least one “good” hypothesis

• Parameters are close to true parameters• Implies desired statistical distance bound

• Want:• 𝑤 − 𝑤 ≤ 휀

• 𝜇𝑖 − 𝜇𝑖 ≤ 휀𝜎𝑖• 𝜎𝑖 − 𝜎𝑖 ≤ 휀𝜎𝑖

Warm Up: Learning one Gaussian

• Easy!

• 𝜇 = sample mean

• 𝜎2 = sample variance

The Real Deal: Mixtures of Two Gaussians

• Harder!

• Sample moments mix up samples from each component

• Plan:• Tall, skinny Gaussian* stands out –

learn it first

• Remove it from the mixture

• Learn one Gaussian (easy?)

*Component with maximum 𝑤𝑖

𝜎𝑖

The First Component

• Claim: Using 𝑂12 sample size, can

generate 𝑂13 candidates

𝑤, 𝜇1, 𝜎1 , at least one is close to the taller component

• If we knew which candidate was right, could we remove this component?

Dvoretzky–Kiefer–Wolfowitz (DKW) inequality

• Using sample of size 𝑂12 from a distribution 𝑋, can output 𝑋 such

that 𝑑𝐾 𝑋, 𝑋 ≤ 휀

• Kolmogorov distance – CDFs of distributions are close in 𝐿∞distance• Weaker than statistical distance

• Works for any probability distribution!

Subtracting out the known component

PDF to CDF

DKW inequality

Subtract component

Monotonize

Warm Up (?): Learning one (almost) Gaussian

Robust Statistics

• Broad field of study in statistics

• Median

• Interquartile range

• Recover original parameters (approximately), even for distributions at distance 휀

• Entirely determined by the other component

• Still 𝑂13 candidates!

The Plan

Hypothesis Selection

• 𝑁 candidate distributions

• At least one is 휀-close to 𝑋 (in statistical distance)

• Goal: Return candidate which is 𝑂 휀 -close to 𝑋

𝑐휀

• Classical approaches [Yat85], [DL01]• Scheffé estimator, computation of the “Scheffé set” (potentially hard)

• 𝑂(𝑁2) time

• Acharya et al. [AJOS14]• Based on Scheffé estimator

• 𝑂(𝑁 log𝑁) time

• Our result [DK14]• General estimator, minimal access to hypotheses

• 𝑂(𝑁 log𝑁) time

• Milder dependence on error probability

• Input:• Sample access to 𝑋 and hypotheses ℋ = 𝐻1, … , 𝐻𝑁• PDF comparator for each pair 𝐻𝑖 , 𝐻𝑗• Accuracy parameter 휀 , confidence parameter 𝛿

• Output:• 𝐻 ∈ ℋ• If there is a 𝐻∗ ∈ ℋ such that 𝑑𝑠𝑡𝑎𝑡 𝐻

∗, 𝑋 ≤ 휀, then 𝑑𝑠𝑡𝑎𝑡 𝐻,𝑋 ≤ 𝑂(휀) with probability ≥1 − 𝛿

• Sample complexity: 𝑂log 1/𝛿

2 log𝑁

• Time complexity: 𝑂log 1/𝛿

2 𝑁 log𝑁 + log21

• Expected time complexity: 𝑂𝑁 log 𝑁/𝛿

• Naive: Tournament among candidate hypotheses; compare every pair; output hypothesis with most wins

• Us: Set up a single-elimination tournament• Issue: error doubles at every level of tree; log𝑁 levels →Ω(2log 𝑁 휀) error

• Better analysis via double-window argument:• Great hypotheses: those within 휀 of target

• Good hypotheses: those within 8휀 of target

• Bad hypotheses: the rest

• Show: if density of good hypotheses small, error propagation won’t happen

• If density large, sub-sample 𝑁 hypotheses; run naive tournament

Putting It All Together

• 𝑁 = 𝑂13 candidates

• Use hypothesis selection algorithm to pick one

• Sample complexity: 𝑂 log(1/𝛿) /휀2

• Time complexity: 𝑂 log3(1/𝛿) /휀5

Open Questions

• Faster algorithms for 2-GMMs

• Time complexity of k-GMMs

• High dimensions

Bibliography

• [AJOS14] Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theerta Suresh. Near-optimal-sample estimators for spherical Gaussian mixtures.

• [CDSS14] Siu On Chan, Ilias Diakonikolas, Rocco A. Servedio, and Xiaorui Sun. Efficient Density Estimation via Piecewise Polynomial Approximation.

• [DL01] Luc Devroye and Gabor Lugosi. Combinatorial Methods in Density Estimation.

• [HP14] Moritz Hardt and Eric Price. Sharp bounds for learning a mixture of two Gaussians.

• [KMV10] Adam Kalai, Ankur Moitra, Gregory Valiant. Efficiently Learning Mixtures of Two Gaussians.

• [Yat85] Yannis Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov’s entropy.

Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures … · Sun. Eﬃcient...

Documents

Peningkatan Kualitas Citra - Afif Supianto Blogafif.lecture.ub.ac.id/files/2013/09/Slide-03-Peningkatan-Kualitas...Contrast Stretching c. Histogram Equalization ... Piecewise-Linear

Energy‐Eﬃcient VLSI Interconnectslph.ece.utexas.edu/merez/uploads/LACSS2010/LACSS2010_PatrickChia… · – Router backplane (line‐card to backplane ... Low‐Swing Bitcell

Evalua&ng)STT,RAMas)an)) Energy,Eﬃcient)Main)Memory ... · Evalua&ng)STT,RAMas)an)) Energy,Eﬃcient)Main)Memory)Alterna&ve) EmreK ültürsay, Mahmut"Kandemir," Anand"Sivasubramaniam*,"and"Onur"Mutlu†

Eﬃcient Computation of Many-to-Many Shortest Paths

Data-Efficient Performance Learning for ConfigurableSystems17em… · CART, that combines CART with resampling and parameter tuning for performance prediction of configurable systems

Evalua&ng)STT,RAMas)an)) Energy,Eﬃcient)Main)Memory ...omutlu/pub/kultursay_ispass13_talk.pdf · Evalua&ng)STT,RAMas)an)) Energy,Eﬃcient)Main)Memory)Alterna&ve) EmreK ültürsay*,

Provable approximation properties for deep neural networks1).pdf · three layer networks containing McCulloch-Pitts (Heaviside) units (hence computing piecewise linear functions)

Piecewise Functions - Belton Independent School … Algebra 2 9-2 Piecewise Functions A piecewise function is a function that is a combination of one or more functions. Piecewise functions

Piecewise-Afﬁne Approximations for a Powertrain Control ...jdeshmuk/Papers/arch15.pdf · Powertrain Control Veriﬁcation Benchmark Jyotirmoy V. Deshmukh, Hisahiro Ito, Xiaoqing

K H P S T Sebagai Upaya Pencegahan S P K S D R S “X” S M P …repository.its.ac.id/304/2/1311100081-presentationpdf.pdf · 2016-06-09 · piecewise polynomial 1 dan piecewise

Unsere weiteren Termine: Jahreskonzert 2018 · The Rock N. Glennie-Smith/H. Zimmer/H. Gregson-Williams, arr. Pascal Devroye Symphonic Highlights from “Frozen

Eﬃcient solution of a three-dimensional inverse heat ...herbert/pubs/pool...Eﬃcient solution of a three-dimensional inverse heat conduction problem in pool boiling Herbert Egger1,

Eﬃcient Attribute-Based Signatures for Non …Eﬃcient Attribute-Based Signatures for Non-Monotone Predicates in the Standard Model Tatsuaki Okamoto1 and Katsuyuki Takashima2 1

Harmonic maps on domains with piecewise Lipschitz continuous

Phase space Feynman path integrals via piecewise ... · PDF filePhase space Feynman path integrals via piecewise bicharacteristic paths and their ... (cf. L.S. Schulman [28, ... space

Splines and Piecewise Interpolation - 國立臺灣師範大學berlin.csie.ntnu.edu.tw/Courses/Numerical Methods/Lectures2012S... · Splines and Piecewise Interpolation ... • Example

Structure-Property Relationships of Oligothiophene-Isoindigo Polymers …liu.diva-portal.org/smash/get/diva2:656978/FULLTEXT01.pdf · oligothiophene–isoindigo polymers for eﬃcient

Devroye Cognitive COMMAG

Lecture 2 Piecewise-linear optimization - Engineering ...vandenbe/ee236a/lectures/pwl.pdf · Lecture 2 Piecewise-linear optimization ... • equivalent LP ... modeling tools simplify

Álgebras de incidência hereditárias por partesAbstract MOREIRA, M. Piecewise hereditary incidence algebras. 2016. 273 p.Tese (Doutorado) - Instituto de Matemática e Estatística,