Topic Models, LDA and all that
Xiao Zhibo, 2011-03-23
Just make sure we are on the same page
LDA
• Linear Discriminant Analysis – Fisher Linear Discriminant Analysis – supervised learning – finds the projection direction that maximizes the ratio of between-class to within-class scatter – viewed from a matrix perspective
• Latent Dirichlet Allocation – unsupervised learning – viewed from a graphical-model perspective
• Required mathematical background
• Latent Dirichlet Allocation
• Methods for approximating the posterior
• Evolution of topic models: exploring relations between topics
• Supervised LDA --- MedLDA
• Applications of topic models
Roadmap
• posterior
• approximation
• sampling
• variational
• optimization
Keywords
• Approximation methods are useful!
  – Gibbs Sampling
  – Variational Methods
• (Convex) Optimization is useful!
• Math is almighty!
My afterthoughts
• Probability refresher
• Beta and Gamma functions
• Dirichlet distribution
• Multinomial distribution
• Conjugate distributions
• A brief introduction to Bayesian networks
Overview
Probability refresher
• Chain rule (conditional independence)
• Bayes rule
• Marginal distribution
Probability Recap
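These three rules can be sanity-checked numerically; the diagnostic-test numbers below are hypothetical, chosen only for illustration:

```python
# Bayes' rule on a toy diagnostic-test example:
# P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_disease = 0.01                 # prior: 1% prevalence
p_pos_given_disease = 0.99       # test sensitivity
p_pos_given_healthy = 0.05       # false-positive rate

# Marginal P(positive), by the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior, by Bayes' rule
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # 0.1667
```

Even with a 99%-sensitive test, the posterior is only about 17%, because the prior is small: this is exactly the marginalization-plus-Bayes pattern the slide lists.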
• The Gamma function
• The Beta function
Gamma and Beta functions
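Both functions are available from the Python standard library, and the Beta function follows directly from Gamma; a quick sketch:

```python
import math

# Gamma generalizes the factorial: Gamma(n) = (n - 1)! for positive integers n.
assert math.gamma(5) == math.factorial(4)

# Beta function, defined through Gamma: B(a, b) = G(a) G(b) / G(a + b)
def beta(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# For integers: B(2, 3) = 1! * 2! / 4! = 2 / 24 = 1/12
print(beta(2, 3))
```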
Dirichlet Distribution
• probability density function
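A minimal sketch of the Dirichlet density, implemented directly from its definition (the evaluation point and the parameter values below are arbitrary):

```python
import math

def dirichlet_pdf(x, alpha):
    """Density of Dirichlet(alpha) at a point x on the probability simplex."""
    assert abs(sum(x) - 1.0) < 1e-9, "x must lie on the simplex"
    # Normalizer is the multivariate Beta: prod Gamma(a_i) / Gamma(sum a_i)
    norm = math.prod(math.gamma(a) for a in alpha) / math.gamma(sum(alpha))
    return math.prod(xi ** (ai - 1.0) for xi, ai in zip(x, alpha)) / norm

# alpha = (1, 1, 1) gives the uniform distribution over the 2-simplex,
# whose constant density is Gamma(3) / Gamma(1)^3 = 2.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))  # 2.0
```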
Multinomial Distribution
• probability density function
• expectation
• variance
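The pmf, with its multinomial coefficient, fits in a few lines (the fair-die example is made up for illustration):

```python
import math

def multinomial_pmf(counts, probs):
    """P(N_1 = c_1, ..., N_k = c_k) for n = sum(counts) trials."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)     # multinomial coefficient n! / prod c_i!
    p = 1.0
    for c, q in zip(counts, probs):
        p *= q ** c
    return coef * p

# Expectation of each count is n * p_i; variance is n * p_i * (1 - p_i).
# Example: roll a fair die 3 times; probability of one each of faces 1, 2, 3
# is 3! * (1/6)^3 = 6 / 216 = 1/36.
print(multinomial_pmf([1, 1, 1, 0, 0, 0], [1 / 6] * 6))
```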
Conjugate distribution
If the likelihood function and the prior distribution belong to the same family, the two are said to be conjugate. A conjugate prior makes the posterior distribution convenient to compute.
Very Important!
Conjugate distribution
Similar forms
David Barber: Bayesian Reasoning and Machine Learning
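A tiny sketch of why conjugacy is convenient, using the Dirichlet-multinomial pair that appears in LDA (the counts below are hypothetical):

```python
# Dirichlet-multinomial conjugacy: with a Dirichlet(alpha) prior over topic
# proportions and multinomially distributed counts, the posterior is again
# a Dirichlet, with the observed counts simply added to the prior
# pseudo-counts. No integral needs to be evaluated explicitly.
alpha_prior = [1.0, 1.0, 1.0]    # symmetric prior over 3 topics
observed_counts = [5, 2, 0]      # hypothetical topic counts from data

alpha_posterior = [a + c for a, c in zip(alpha_prior, observed_counts)]
print(alpha_posterior)           # [6.0, 3.0, 1.0]

# Posterior mean of component i is alpha_i / sum(alpha)
total = sum(alpha_posterior)
print([a / total for a in alpha_posterior])   # [0.6, 0.3, 0.1]
```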
Bayesian Network
Bayesian Network (cont.)
• How do we represent a distribution that satisfies particular independencies? – the representation problem
• How do we exploit those independencies to compute efficiently? – the inference problem
• How do we identify such independencies from data? – the learning problem
Bayesian Network: problems to solve
• David Barber – Bayesian Reasoning and Machine Learning
• Daphne Koller, Nir Friedman – Probabilistic Graphical Models
• Bishop – Pattern Recognition and Machine Learning, Ch. 8
• Eric Xing – Probabilistic Graphical Models
Bayesian Network: where to learn more
This section: topic models
• Topic models
• LDA
• Inference methods
• Relations between topics
• MedLDA
• Applications of LDA
Topic Model Overview
• LDA based Topic Models
• Hyperspace Analogue to Language (HAL) (Lund and Burgess, 1996)
• Bound Encoding of the Aggregate Language Environment (BEAGLE) (Jones and Mewhort, 2007)
Researchers in Topic Models
David Blei
Andrew McCallum
Michael I. Jordan, Andrew Ng, John Lafferty
Eric Xing, Fei-Fei Li
Mark Steyvers
Researchers in Topic Models (cont.)
Hanna Wallach, Yee Whye Teh, Jun Zhu, David Mimno
Why Latent?
Rethinking Bayesian models
• A suitable model should cover every situation that might occur.
• A proper prior should avoid assigning low probability to situations that can plausibly occur, yet it should also not lump nearly impossible events together with everything else. To avoid this, we must consider the relationships among model parameters. One strategy is to introduce latent variables into the model; another is to introduce hyperparameters. Both approaches remain tractable.
From Radford Neal’s CSC2541 “Bayesian Methods for Machine Learning”
The goal is to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments.
Goal
Goal and Motivation of Topic Model
Topic models: previous work
Timeline: tf-idf (1983)
tf-idf counts term frequencies, but cannot capture statistical regularities within or across documents
Previous Work : tf-idf
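For concreteness, a minimal tf-idf computation over a made-up three-document corpus; note that the score is only a per-document word weight, with no model of co-occurrence structure:

```python
import math

# Toy corpus (the documents are made up for illustration).
docs = [
    "topic models find latent topics",
    "latent dirichlet allocation is a topic model",
    "gibbs sampling approximates the posterior",
]
tokenized = [d.split() for d in docs]

def tf_idf(term, doc_tokens, corpus):
    tf = doc_tokens.count(term) / len(doc_tokens)   # term frequency
    df = sum(1 for d in corpus if term in d)        # document frequency
    idf = math.log(len(corpus) / df)                # inverse document frequency
    return tf * idf

# "posterior" occurs in only one document, so it is weighted up there;
# "latent" occurs in two of the three documents, so its idf is smaller.
print(tf_idf("posterior", tokenized[2], tokenized))
print(tf_idf("latent", tokenized[0], tokenized))
```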
Timeline: tf-idf (1983) → LSI (1990)
LSI: Latent Semantic Indexing
Applies SVD to the term-by-document matrix; linear combinations of tf-idf features can capture some linguistic structure
Previous work : LSI
Topic models: previous work
Timeline: tf-idf (1983) → LSI (1990) → pLSI (1999)
pLSI (aka the Aspect Model)
The number of parameters grows with the size of the corpus, so it overfits easily; there is no statistical model at the document level, so probabilities cannot be assigned to documents
Previous work : pLSI
Topic models: previous work
Timeline: tf-idf (1983) → LSI (1990) → pLSI (1999) → LDA (2003)
LDA
A mixture model that makes the bag-of-words assumption and treats both words and documents as exchangeable
LDA
Topic models: previous work
LDA
LDA in graphical model
LDA example: an online music community
An analogy: Modeling Shared Tastes in Online Communities, Laura Dietz, NIPS '09 workshop
LDA
For each document in the corpus, LDA is the following hierarchical Bayesian network with variable parameters:
1. Choose the number of words N
2. Choose the document's topic proportions θ
3. For each word:
   a) choose a topic z
   b) choose the word w from the chosen topic's word distribution β
LDA procedure
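The generative process can be sketched directly in code. The topic-word table `beta`, the prior `alpha`, and the vocabulary below are toy values, and a fixed word count is passed in rather than drawn from a Poisson:

```python
import random

random.seed(0)

K, V = 2, 4                       # number of topics, vocabulary size
beta = [                          # beta[k][v] = P(word v | topic k), toy values
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]

def sample_dirichlet(alpha):
    # A Dirichlet draw via normalized Gamma variates
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

def generate_document(n_words, alpha=(1.0, 1.0)):
    theta = sample_dirichlet(alpha)                        # topic proportions
    doc = []
    for _ in range(n_words):
        z = random.choices(range(K), weights=theta)[0]     # a) pick a topic
        w = random.choices(range(V), weights=beta[z])[0]   # b) pick a word
        doc.append(w)
    return doc

print(generate_document(10))
```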
LDA
Given the hyperparameters α and β, the joint probability of the topic mixture, topics, and words is
Integrating over θ and summing over z yields the marginal probability of a document
Taking the product of the marginals over all documents gives the probability of the corpus
LDA : to see a document
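Written out in the standard notation of Blei et al. (2003), with N words per document and a corpus D of M documents, the three quantities are:

```latex
% Joint probability of topic mixture, topics, and words given alpha, beta:
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)

% Marginal probability of one document: integrate out theta, sum out z:
p(\mathbf{w} \mid \alpha, \beta)
  = \int p(\theta \mid \alpha)
    \Bigl( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \Bigr) d\theta

% Probability of the corpus: product of the marginals over its M documents:
p(D \mid \alpha, \beta) = \prod_{d=1}^{M} p(\mathbf{w}_d \mid \alpha, \beta)
```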
Why integrate?
From Jerry Zhu’s CS 731 Advanced Artificial Intelligence
Frequentist
• Probability is obtained from frequencies
• The data are random, so expectations (estimators) are random too
• Parameters are fixed unknown constants, with no probability distribution attached
Bayesian
• Probability expresses degree of belief
• The expectation of a parameter is obtained from its probability distribution
• Unknown parameters are estimated via their marginal (posterior) probabilities
LDA by Human and Computer
"An interesting thing that happened during winter vacation"
• Central idea of the essay
• Central idea of each paragraph
• Elaboration
Training vs. testing
Reading…
LDA : Topic
LDA: Five topics from a 50-topic LDA model fit to Science from 1980–2002
LDA : Personas
Demo: http://personas.media.mit.edu/personasWeb.html
LDA: where to learn more --- Surveys
1. David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003
2. David M Blei and John D Lafferty. Topic models. Taylor and Francis, 2009.
3. Ali Daud, Juanzi Li, Lizhu Zhou, and Faqir Muhammad. Knowledge discovery through directed probabilistic topic models: a survey. Frontiers of Computer Science in China, 4(2):280–301, January 2010.
4. Mark Steyvers and Tom Griffiths. Probabilistic topic models. In Latent Semantic Analysis: A Road to Meaning. Laurence Erlbaum, July 2006.
How to obtain the parameters of LDA --- inference
• the topic probabilities of each document
• the topic assignment probability of each word
The most important computational task in the LDA model is computing the posterior of the latent variables
Two main approaches: Variational Inference and Gibbs Sampling
Inference --- get important parameters in LDA
Inference Methods Overview
Inference methods
• Stochastic methods (sampling)
  – MCMC, Metropolis-Hastings, Gibbs, etc.
  – computationally expensive, but relatively accurate
• Deterministic methods (optimization)
  – Mean Field, Belief Propagation
  – Variational Bayes, Expectation Propagation
  – computationally cheap and inexact, but can provide bounds
Inference Methods : Comparison of two major methods
Variational Inference
Variational ≈ Optimization
Variational ≈ Convex Optimization
"The basic idea of convexity-based variational inference is to make use of Jensen's inequality to obtain an adjustable lower bound on the log likelihood. Essentially, one considers a family of lower bounds, indexed by a set of variational parameters. The variational parameters are chosen by an optimization procedure that attempts to find the tightest possible lower bound."
Variational Inference
Mean field
• Basic idea
  – approximate the posterior with a simple, factorized distribution q
  – choose the q that minimizes the KL divergence
• Why the name?
  – the distribution factorizes completely
Mean field variational inference
Variational Inference in LDA: Overview
Goal: compute the posterior of the latent variables
Variational Inference in LDA: Beautiful math
Jensen's inequality
Variational Inference in LDA: Beautiful math (cont.)
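The Jensen's-inequality step behind this bound, in its standard form (q is the variational distribution over the latent variables):

```latex
\log p(\mathbf{w} \mid \alpha, \beta)
  = \log \int \sum_{\mathbf{z}}
      q(\theta, \mathbf{z})\,
      \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{q(\theta, \mathbf{z})}
      \, d\theta
  \;\ge\;
  \mathbb{E}_q\bigl[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\bigr]
  - \mathbb{E}_q\bigl[\log q(\theta, \mathbf{z})\bigr]

% The gap between the two sides equals KL(q || p(theta, z | w, alpha, beta)),
% so maximizing this lower bound over q is the same as minimizing that KL.
```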
Denoting the lower bound as above, we then have
Since the distributions involved are all factorizable, we have
Variational Inference in LDA: Beautiful math (cont.)
Applying Lagrange multipliers, we obtain
Summary: Variational Inference in LDA
Goal: compute the posterior of the latent variables
Variational Inference: where to learn more
1. Martin Wainwright. Graphical models and variational methods: Message-passing, convex relaxations, and all that. ICML2008 Tutorial
2. M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, Vol. 1, Numbers 1--2, pp. 1--305, December 2008
MCMC in LDA
MCMC Overview
• Sampling in general
  – Why sampling is necessary and why it is hard
  – Importance sampling, rejection sampling
  – Markov Chain Monte Carlo
  – Metropolis-Hastings, Gibbs sampling
• Collapsed Gibbs in LDA
Pioneers of sampling methods
MCMC Overview
Nicholas C. Metropolis, Josiah W. Gibbs, Andrey Markov
National Bureau of Statistics of China, March 16, 2006: With the approval of the State Council, China carried out a 1% national population sample survey at the end of 2005. The sample size was 17.05 million people, or 1.31% of the total population. Within the population, 67.64 million people had university-level education (junior college and above), 150.83 million had senior-high-school education (including technical secondary school), 467.35 million had junior-high-school education, and 407.06 million had primary-school education.
Sampling Example: population statistics
Sampling: the problems to solve
1. Generate samples from a given probability distribution
2. Estimate the expectation of a function under a given probability distribution
Example: measuring the concentration of a substance in a lake
Sampling: why is it so damn hard?
Rejection sampling
Sampling : Rejection sampling
Accept
Reject
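A minimal accept/reject loop; the target density p(x) = 2x on [0, 1], the uniform proposal, and the envelope constant M = 2 are all made up for illustration:

```python
import random

random.seed(0)

def target_pdf(x):
    # Toy target density p(x) = 2x on [0, 1]
    return 2.0 * x

def rejection_sample():
    # Proposal q is uniform on [0, 1]; envelope: p(x) <= M * q(x) with M = 2.
    while True:
        x = random.random()            # draw from the proposal
        u = random.random()            # uniform draw for the accept test
        if u < target_pdf(x) / 2.0:    # accept with probability p(x) / (M q(x))
            return x

samples = [rejection_sample() for _ in range(20000)]
# Under p(x) = 2x, E[x] = 2/3; the sample mean should be close.
print(sum(samples) / len(samples))
```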
Importance sampling
Sampling : Importance sampling
In rejection sampling, throwing away an x seems wasteful, and the rejection itself is the only thing we learn about the original distribution.
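The same toy target as in the rejection-sampling sketch (p(x) = 2x on [0, 1], uniform proposal), but now no draw is discarded: each sample is kept and reweighted by w(x) = p(x)/q(x).

```python
import random

random.seed(0)

# Importance sampling: estimate E_p[f(x)] from draws of a proposal q by
# weighting each draw with w(x) = p(x) / q(x). Here f(x) = x.
n = 20000
xs = [random.random() for _ in range(n)]   # draws from q, uniform so q(x) = 1
weights = [2.0 * x for x in xs]            # w(x) = p(x) / q(x) = 2x

estimate = sum(w * x for w, x in zip(weights, xs)) / n
print(estimate)   # should approach E_p[x] = 2/3
```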
Metropolis-Hastings Method
Sampling: Metropolis-Hastings Method
Use the Markov property: each state depends only on the state before it.
At step t, the proposal can be any distribution we know how to sample from, e.g. a Gaussian.
For each proposed new state, consider the acceptance ratio
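A sketch of the algorithm with a Gaussian random-walk proposal and a standard-normal target; note that only the unnormalized target density is needed, and the symmetric proposal terms cancel in the acceptance ratio:

```python
import math
import random

random.seed(0)

def unnorm_target(x):
    # Unnormalized standard-normal density
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_steps, step=1.0):
    x = 0.0
    chain = []
    for _ in range(n_steps):
        x_new = x + random.gauss(0.0, step)        # symmetric random-walk move
        a = unnorm_target(x_new) / unnorm_target(x)  # acceptance ratio
        if random.random() < a:
            x = x_new                              # accept the move
        chain.append(x)                            # record state either way
    return chain

chain = metropolis_hastings(50000)
print(sum(chain) / len(chain))   # should be near 0, the target mean
```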
Gibbs Sampling
Sampling : Gibbs Sampling
1. Initialize all variables
2. Pick a dimension i
3. Sample the chosen variable from its conditional distribution given all the others
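A sketch of these steps for a bivariate normal with correlation rho, where both full conditionals are one-dimensional Gaussians (rho = 0.8 is an arbitrary choice):

```python
import math
import random

random.seed(0)

rho = 0.8
sd = math.sqrt(1.0 - rho * rho)   # conditional std for unit-variance marginals

x, y = 0.0, 0.0                   # step 1: initialize all variables
xs = []
for _ in range(50000):
    x = random.gauss(rho * y, sd)   # steps 2-3: sample x | y
    y = random.gauss(rho * x, sd)   # steps 2-3: sample y | x
    xs.append(x)

print(sum(xs) / len(xs))                       # marginal mean, near 0
print(sum(v * v for v in xs) / len(xs))        # marginal variance, near 1
```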
Gibbs Sampling in LDA : Joint distribution
Gibbs Sampling in LDA
Collapsed:
Substituting the expression above,
Several formulas omitted here…
Gibbs Sampling in LDA: Marginal distribution
Several formulas omitted here…
Gibbs Sampling in LDA: Python code
# Initialization step of the collapsed Gibbs sampler (an excerpt from inside
# a sampler class): assign every word a random topic and build the count
# tables the sampler maintains.
for m in range(n_docs):
    for i, w in enumerate(word_indices(matrix[m, :])):
        z = np.random.randint(self.n_topics)   # random initial topic
        self.nmz[m, z] += 1     # count of topic z in document m
        self.nm[m] += 1         # word count of document m
        self.nzw[z, w] += 1     # count of word w under topic z
        self.nz[z] += 1         # total words assigned to topic z
        self.topics[(m, i)] = z
Really simple!
1. D.J.C. MacKay. Information theory, inference, and learning algorithms. Cambridge Univ Pr,2003.
2. Gregor Heinrich. Parameter estimation for text analysis. Technical Report, 2009.
3. Michael I. Jordan and Yair Weiss. Graphical models: Probabilistic inference.
4. Christophe Andrieu, N De Freitas, A Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine learning, pages 5–43, 2003.
5. Yi Wang. Distributed Gibbs Sampling of Latent Dirichlet Allocation : The Gritty Details. Technical Report, 2007.
Gibbs Sampling: where to learn more
Evolution of Topic Models
• Correlated topic models
• Dynamic topic models
• Temporal topic models
1. David M. Blei and John D Lafferty. Correlated Topic Models. In Advances in Neural Information Processing Systems 18, 2006.
2. David M. Blei and John D Lafferty. A correlated topic model of Science. The Annals of Applied Statistics, 1(1):17–35, 2007.
3. David M. Blei and John D Lafferty. Dynamic topic models. Proceedings of the 23rd international conference on Machine learning - ICML ’06, pages 113–120, 2006.
Correlated + Dynamic Topic Models
Correlated Topic Models
LDA cannot capture relations between topics; the CTM's logistic-normal prior is not conjugate to the multinomial distribution, so inference is done with variational methods
Correlated Topic Models (cont.)
Controlling sparsity
Dynamic Topic Models
In DTM, all documents are assumed to be divided into time slices
Dynamic Topic Models: top 10 words and example articles from Science
Topics over time
Published in: KDD '06 Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Topics over time
Topics over time : Topic discovery
Topics over time : State-of-the-Union Addresses
TOT vs. LDA
Topics over time : Topic evolution
Topics over time : Topic evolution on NIPS data
Topics over time: Co-occurring Topics
Topics over time : Topic evolution on NIPS data
Topics over time : Review
Topics over time: Review. See also "A survey of LDA-based topic evolution methods", Journal of Chinese Information Processing, November 2010.
Work by Hanna Wallach
NIPS ’09
ICML ’09
Asymmetric priors
Rethinking LDA : Why priors matter
Advantages of asymmetric priors
Rethinking LDA : Why priors matter
• What differs is that, across different documents, an asymmetric prior acts to smooth the topics
• For the topic–word prior, adjusting its concentration controls whether topics are sparser or closer to uniform; because it acts on the whole corpus, making it asymmetric weakens the model's characterization of within-document structure
• The topic proportions are a per-document parameter, so an asymmetric prior there is well suited to characterizing documents
1. David Blei and Jon D. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems, pages 1–22, 2008.
2. Daniel Ramage, David Hall, Ramesh Nallapati, and C.D. Manning. Labeled LDA : A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, pages 248–256. Association for Computational Linguistics, 2009.
3. Jun Zhu, Amr Ahmed, and Eric Xing. MedLDA: Maximum Margin Supervised Topic Models. Journal of Machine Learning Research, 1:1–48, 2010.
Supervised LDA
Supervised LDA: goal
On top of the unsupervised model, add a description of the labels in order to do classification and regression.
The naïve approach: run LDA first, then classify using the topics.
Model the labels with a probability distribution
Supervised topic models
Supervised LDA
Model the labels with a Generalized Linear Model (GLM)
Supervised LDA: Why use a GLM
A GLM can flexibly model any label distribution that can be written in exponential-family form:
• Gaussian
• Binomial
• Multinomial
• Poisson
• Gamma
• Weibull
• …
Semi-Supervised LDA
Unlabeled / Labeled
MedLDA
Maximum Entropy Discrimination Latent Dirichlet Allocation
Combines max-margin theory with topic models by optimizing a single objective function under a set of margin constraints
MedTM
Maximum Entropy Discrimination Topic Models
Advantages:
1. classifies correctly, using max-margin theory
2. describes the data better
MedLDA vs. LDA
MedLDA : topic discovery
2D embedding on 20Newsgroups data
MedLDA : classification
Classification on 20Newsgroups data
Binary classification / Multi-class classification
Applications of Topic Models
NIPS ’09 Workshop on Applications for Topic Models: Text and Beyond
• Social networks, microblogs, email: informal language (abbreviations, misspellings, quotations, irregular citations, @, RT, …)
• Protein expression analysis
• Hierarchical structure, finer granularity
• Topic tracking, evolution, and extinction
• Text summarization
• Multimedia (images, audio, video)
Reconstructing Pompeian Households
stew pots, pastry molds
Is it so?
room ≈ document, function ≈ topic distribution, artifact ≈ word
The artifacts' uses do not match their names
Component mining: Software Analysis with Unsupervised Topic Models
• 12,151 Java projects from Sourceforge and Apache
• 4,632 projects
• 366,287 source files
• 38.7 million lines of code
• written by 9,250 developers
Future Directions
• LDA as a dimensionality-reduction method, applied before k-means
• Further extensions built on MedLDA
• LDA + active learning
• Semi-supervised topic models
• Asymmetric Dirichlet priors
• Better approximation methods (data augmentation)
• Text summarization
• Topic models for unstructured data (microblogs, forums, gene sequences)
• Applying LDA to images
Topics not covered
• Hierarchical Topic Models
• nested Chinese Restaurant Process
• Indian Buffet Process
Topic Models Background : General
Topic Models Background
• D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
• D. Blei and M. Jordan. Variational inference for Dirichlet process mixtures. Journal of Bayesian Analysis, 1:121–144, 2006.
• M. Steyvers and T. Griffiths. Probabilistic Topic Models. In Latent Semantic Analysis: A Road to Meaning, T. Landauer, Mcnamara, S. Dennis, and W. Kintsch eds. Laurence Erlbaum, 2006.
• Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566-1581, 2006.
• J. Zhu, A. Ahmed and E. P. Xing. MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification. The 26th International Conference on Machine Learning, 2009.
Inference and Evaluation
Topic Models Background
• A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On Smoothing and Inference for Topic Models. In Uncertainty in Artificial Intelligence, 2009.
• W. Li, and A. McCallum. Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. In International Conference on Machine Learning, 2006.
• I. Porteous, A. Asuncion, D. Newman, A. Ihler, P. Smyth, and M. Welling. Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation. In Knowledge Discovery and Data Mining, 2008.
• H. Wallach, I. Murray, R. Salakhutdinov and D. Mimno. Evaluation Methods for Topic Models. In International Conference on Machine Learning, 2009.
• M. Welling, Y. Teh and B. Kappen. Hybrid Variational/Gibbs Inference in Topic Models. In Uncertainty in Artificial Intelligence, 2008.
Biology
Topic Models Background
• E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed-membership stochastic blockmodels. Journal of Machine Learning Research, 9: 1981-2014, 2008.
• P. Agius, Y. Ying, and C. Campbell. Bayesian Unsupervised Learning with Multiple Data Types. Statistical Applications in Genetic and Molecular Biology, 3(1):27, 2009.
• P. Flaherty, G. Giaever, J. Kumm, Michael I. Jordan, Adam P. Arkin. A Latent Variable Model for Chemogenomic Profiling. Bioinformatics 2005 Aug 1;21(15):3286-93.
• S. Shringarpure and E. P. Xing. mStruct: Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations. Genetics, Vol 182, issue 2, 2009.
Natural Language Processing
Topic Models Background
• J. Boyd-Graber and D. Blei. Syntactic topic models. In Neural Information Processing Systems, 2009.
• T. Griffiths, M. Steyvers, D. Blei, and J. Tenenbaum. Integrating topics and syntax. In Neural Information Processing Systems, 2005.
• K. Toutanova and M. Johnson. A Bayesian LDA-based Model for Semi-Supervised Part-of-speech Tagging. In Neural Information Processing Systems, 2008.
Social Science
Topic Models Background
• J. Chang, J. Boyd-Graber, and D. Blei. Connections between the lines: Augmenting social networks with text. Knowledge Discovery and Data Mining, 2009.
• L. Dietz, S. Bickel, and T. Scheffer. Unsupervised Prediction of Citation Influences. In International Conference on Machine Learning, 2007.
• D. Hall, D. Jurafsky, and C. Manning. Studying the History of Ideas Using Topic Models. In Empirical Methods in Natural Language Processing, 2008.
Temporal and Network Models
Topic Models Background
• D. Blei and J. Lafferty. Dynamic topic models. In International Conference on Machine Learning, 2006.
• J. Chang and D. Blei. Relational topic models for document networks. Artificial Intelligence and Statistics (in print), 2009.
• E.P. Xing, W. Fu, and L. Song. A State-Space Mixed Membership Blockmodel for Dynamic Network Tomography. Annals of Applied Statistics, 2009.
• H. Wallach. Topic Modeling: Beyond Bag-of-Words. In International Conference on Machine Learning, 2006.
Vision
Topic Models Background
• J. Philbin, J. Sivic, and A. Zisserman. Geometric LDA: A Generative Model for Particular Object Discovery In British Machine Vision Conference, 2008.
• L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models for 101 object categories. In Computer Vision and Image Understanding, 2007.
• C. Wang, D. Blei and L. Fei-Fei. Simultaneous Image Classification and Annotation. In Computer Vision and Pattern Recognition, 2009.
ONE MORE THING…
Structure, relations, broad applications
room ≈ document, function ≈ topic distribution, artifact ≈ word
Q & A