
Page 1: Topic model, LDA and all that

Topic Models, LDA and all that

肖智博, 2011-03-23

Page 2: Topic model, LDA and all that

Just make sure we are on the same page

LDA

• Linear Discriminant Analysis
– Fisher linear discriminant analysis
– Supervised learning
– Finds the projection direction that maximizes the ratio of between-class to within-class scatter
– Approached from a matrix (linear algebra) perspective

• Latent Dirichlet Allocation
– Unsupervised learning
– Approached from a graphical-model perspective

Page 3: Topic model, LDA and all that

• Required mathematical background
• Latent Dirichlet Allocation
• Methods for approximating the posterior
• The evolution of topic models: exploring relations among topics
• Supervised LDA: MedLDA
• Applications of topic models

Roadmap


Page 4: Topic model, LDA and all that

• Posterior
• Approximation
• Sampling
• Variational methods
• Optimization

Keywords


Page 5: Topic model, LDA and all that

• Approximation methods are useful!
– Gibbs sampling
– Variational methods

• (Convex) optimization is useful!
• Math is almighty!

My afterthoughts


Page 6: Topic model, LDA and all that
Page 7: Topic model, LDA and all that

• Probability refresher
• The Beta and Gamma functions
• The Dirichlet distribution
• The multinomial distribution
• Conjugate distributions
• A brief introduction to Bayesian networks


Overview

Page 8: Topic model, LDA and all that

Probability refresher
• Chain rule (conditional independence)

• Bayes rule

• Marginal distribution
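For reference, the standard forms of these three identities (z denotes the variable summed or integrated out):

p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1})

p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}

p(x) = \sum_{z} p(x, z) \quad \text{or} \quad p(x) = \int p(x, z)\, dz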

Probability Recap

Page 9: Topic model, LDA and all that

• The Gamma function

• The Beta function
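In standard form (the Beta function reappears inside the Dirichlet density two slides on):

\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt, \qquad \mathrm{B}(\alpha, \beta) = \int_0^1 t^{\alpha-1} (1-t)^{\beta-1}\, dt = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}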

Gamma and Beta functions

Page 10: Topic model, LDA and all that

Dirichlet distribution
• Probability density function
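For θ on the (K−1)-simplex with parameters α_1, …, α_K > 0, the density is

p(\theta \mid \alpha) = \frac{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)} \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}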

Dirichlet Distribution

Page 11: Topic model, LDA and all that

Multinomial distribution
• Probability density function
• Mean
• Variance
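In standard form, for n trials over K outcomes with probabilities θ:

p(x \mid \theta, n) = \frac{n!}{\prod_{i} x_i!} \prod_{i=1}^{K} \theta_i^{x_i}, \qquad \mathbb{E}[x_i] = n\theta_i, \qquad \operatorname{Var}(x_i) = n\theta_i (1 - \theta_i)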

Multinomial Distribution

Page 12: Topic model, LDA and all that

Conjugate distributions: if the posterior p(θ | x) belongs to the same distribution family as the prior p(θ), the prior is said to be conjugate to the likelihood. A conjugate prior makes the posterior available in closed form.
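The pairing used throughout LDA: the Dirichlet prior is conjugate to the multinomial likelihood, so observing counts n = (n_1, …, n_K) simply shifts the parameters:

\theta \sim \mathrm{Dir}(\alpha), \quad x \mid \theta \sim \mathrm{Mult}(\theta) \;\Rightarrow\; \theta \mid x \sim \mathrm{Dir}(\alpha + n)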

Conjugate distributions

Very Important!

Page 13: Topic model, LDA and all that

Conjugate distributions

Similar functional forms

Page 14: Topic model, LDA and all that
Page 15: Topic model, LDA and all that

David Barber: Bayesian Reasoning and Machine Learning

Page 16: Topic model, LDA and all that


Bayesian Network

Page 17: Topic model, LDA and all that

Bayesian Network (continued)

Page 18: Topic model, LDA and all that

• How can we represent a distribution that satisfies particular independences?
– The representation problem

• How can we exploit those independences to compute efficiently?
– The inference problem

• How can we identify such independences from data?
– The learning problem


Bayesian Network : problems to solve

Page 19: Topic model, LDA and all that

• David Barber– Bayesian Reasoning and Machine Learning

• Daphne Koller, Nir Friedman– Probabilistic Graphical Models

• Bishop– Pattern Recognition and Machine Learning, Ch. 8

• Eric Xing– Probabilistic Graphical Models


Bayesian Network : where to learn more

Page 20: Topic model, LDA and all that
Page 21: Topic model, LDA and all that

This section: topic models
• Topic models
• LDA
• Inference methods
• Relations among topics
• MedLDA
• Applications of LDA

Topic Model Overview

Page 22: Topic model, LDA and all that


Topic Model Overview

• LDA based Topic Models

• Hyperspace Analogue to Language (HAL) (Lund and Burgess, 1996)

• Bound Encoding of the Aggregate Language Environment (BEAGLE) (Jones and Mewhort, 2007)

Page 23: Topic model, LDA and all that
Page 24: Topic model, LDA and all that


Researchers in Topic Model

David Blei, Andrew McCallum, Michael I. Jordan, Andrew Ng, John Lafferty, Eric Xing, Fei-Fei Li, Mark Steyvers

Page 25: Topic model, LDA and all that


Researchers in Topic Model

Hanna Wallach, Yee Whye Teh, Jun Zhu, David Mimno

Page 26: Topic model, LDA and all that

Why Latent?

Page 27: Topic model, LDA and all that

Rethinking Bayesian modeling
• A suitable model should account for every way the data could plausibly arise.
• A proper prior should avoid assigning low probability to plausible outcomes, but it also should not lump nearly impossible events together with everything else. Avoiding this requires modeling the relationships among parameters. One strategy is to introduce latent variables into the model; another is to introduce hyperparameters. Both remain computationally tractable.

From Radford Neal’s CSC2541 “Bayesian Methods for Machine Learning”

Page 28: Topic model, LDA and all that

The goal is to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments.

Goal

Goal and Motivation of Topic Model

Page 29: Topic model, LDA and all that


tf-idf (1983)

tf-idf counts term frequencies, but it cannot capture statistical structure within or across documents.

Previous Work : tf-idf

Page 30: Topic model, LDA and all that

tf-idf (1983) → LSI (1990)

LSI: Latent Semantic Indexing. Applies SVD to the term-by-document matrix; the resulting linear combinations of tf-idf features capture some linguistic regularities.

Previous work : LSI


Page 31: Topic model, LDA and all that

tf-idf (1983) → LSI (1990) → pLSI (1999)

pLSI (a.k.a. the aspect model): the number of parameters grows with the size of the corpus, so it overfits easily; and it has no statistical model at the document level, so it cannot assign a probability to a new document.

Previous work : pLSI


Page 32: Topic model, LDA and all that

tf-idf (1983) → LSI (1990) → pLSI (1999) → LDA (2003)

LDA: a mixture model built on the bag-of-words assumption that treats both words and documents as exchangeable.

LDA


Page 33: Topic model, LDA and all that

LDA

LDA in graphical model

Page 34: Topic model, LDA and all that

An LDA example: an online music community

An analogy : Modeling Shared Tastes in Online Communities, Laura Dietz, NIPS '09 workshop

Page 35: Topic model, LDA and all that

LDA

For each document w in corpus D, LDA is the following hierarchical Bayesian model with a varying number of parameters:
1. Choose the number of words N ~ Poisson(ξ)
2. Choose the document's topic proportions θ ~ Dir(α)
3. For each of the N words w_n:
a) Choose a topic z_n ~ Multinomial(θ)
b) Choose the word w_n from p(w_n | z_n, β)
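A minimal numpy sketch of this generative process; the function and parameter names (generate_document, alpha, beta, xi) are chosen here for illustration and are not from the slides:

import numpy as np

def generate_document(alpha, beta, xi):
    """Sample one document from the LDA generative model.

    alpha: Dirichlet parameter, length K
    beta:  K x V topic-word matrix, each row sums to 1
    xi:    Poisson rate for the document length
    """
    K, V = beta.shape
    N = np.random.poisson(xi)            # 1. number of words
    theta = np.random.dirichlet(alpha)   # 2. topic proportions for this document
    words = []
    for _ in range(N):
        z = np.random.choice(K, p=theta)               # 3a. topic z_n
        words.append(np.random.choice(V, p=beta[z]))   # 3b. word w_n
    return words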

LDA procedure

Page 36: Topic model, LDA and all that

LDA

Given the hyperparameters α and β, the joint probability of a topic mixture θ, topic assignments z, and words w is

Integrating over θ and summing over z yields the marginal probability of a document

Finally, taking the product of the documents' marginal probabilities gives the probability of the corpus
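In the notation of Blei et al. (2003), these three quantities are:

p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)

p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta

p(D \mid \alpha, \beta) = \prod_{d=1}^{M} p(\mathbf{w}_d \mid \alpha, \beta)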

LDA : to see a document

Page 37: Topic model, LDA and all that

LDA


LDA : to see a document

Why take the integral?

Page 38: Topic model, LDA and all that

From Jerry Zhu’s CS 731 Advanced Artificial Intelligence

Frequentist
• Probability is obtained from frequencies;
• The data are random, so expectations are random too;
• Parameters are fixed but unknown constants, with no probability distributions attached.

Bayesian
• Probability expresses degree of belief;
• The expectation of a parameter θ is obtained from its probability distribution;
• Unknown parameters are estimated via their marginal probabilities.

Page 39: Topic model, LDA and all that

LDA by Human and Computer

"An interesting thing that happened during winter break"

• The central idea
• The central idea of each paragraph
• Elaboration

Training / testing

Reading…

Page 40: Topic model, LDA and all that

LDA : Topic

LDA : Five topics from a 50-topic LDA model fit to Science from 1980–2002

Page 41: Topic model, LDA and all that

LDA : Personas

Demo: http://personas.media.mit.edu/personasWeb.html

Page 42: Topic model, LDA and all that


LDA : where to learn more --- Surveys

1. David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003

2. David M Blei and John D Lafferty. Topic models. Taylor and Francis, 2009.

3. Ali Daud, Juanzi Li, Lizhu Zhou, and Faqir Muhammad. Knowledge discovery through directed probabilistic topic models: a survey. Frontiers of Computer Science in China, 4(2):280–301, January 2010.

4. Mark Steyvers and Tom Griffiths. Probabilistic topic models. Latent Semantic Analysis: A Road to Meaning. Laurence Erlbaum, July 2006.

Page 43: Topic model, LDA and all that
Page 44: Topic model, LDA and all that

How do we obtain LDA's parameters? Inference: the topic proportions of each document, and the topic assignment probability of each word.

The central computational task in LDA is computing the posterior distribution of the latent variables.
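That posterior is

p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)}

whose denominator is intractable to compute exactly; hence the two approximation families below.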

Variational inference | Gibbs sampling

Inference --- get important parameters in LDA

Page 45: Topic model, LDA and all that


Inference Methods Overview

Page 46: Topic model, LDA and all that

Inference methods
• Stochastic methods (sampling)
– MCMC, Metropolis-Hastings, Gibbs, etc.
– Computationally heavy, but relatively accurate

• Deterministic methods (optimization)
– Mean field, belief propagation
– Variational Bayes, expectation propagation
– Computationally light and inexact, but able to provide bounds

Inference Methods : Comparison of two major methods

Page 47: Topic model, LDA and all that

Variational Inference

Variational ≈ Optimization

Variational ≈ Convex Optimization

The basic idea of convexity-based variational inference is to make use of Jensen's inequality to obtain an adjustable lower bound on the log likelihood. Essentially, one considers a family of lower bounds, indexed by a set of variational parameters. The variational parameters are chosen by an optimization procedure that attempts to find the tightest possible lower bound.

Variational Inference

Page 48: Topic model, LDA and all that

Mean field

• Basic idea
– Approximate the posterior with a simple, fully factorized distribution q
– Choose the q that minimizes the KL divergence to the posterior

• Why the name?
– The probability factorizes completely
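Schematically, for LDA's latent variables:

q(\theta, \mathbf{z}) = q(\theta) \prod_{n} q(z_n), \qquad q^{*} = \arg\min_{q} \mathrm{KL}\big( q(\theta, \mathbf{z}) \,\|\, p(\theta, \mathbf{z} \mid \mathbf{w}) \big)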

Mean field variational inference

Page 49: Topic model, LDA and all that


Variational inference in LDA Overview

Goal: find the variational distribution q(θ, z | γ, φ) that best approximates p(θ, z | w, α, β)

Page 50: Topic model, LDA and all that


Variational Inference : Beautiful math

Jensen's inequality
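Applied to the log likelihood with a variational distribution q(θ, z), Jensen's inequality yields the lower bound that the next slides manipulate:

\log p(\mathbf{w} \mid \alpha, \beta) \;\ge\; \mathbb{E}_q[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)] - \mathbb{E}_q[\log q(\theta, \mathbf{z})] \;=:\; L(\gamma, \phi; \alpha, \beta)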

Page 51: Topic model, LDA and all that


Variational Inference : Beautiful math

Write the bound as L(γ, φ; α, β). Because both p and q factorize, the bound decomposes into a sum of tractable expectations.
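Expanded term by term, following Blei et al. (2003):

L = \mathbb{E}_q[\log p(\theta \mid \alpha)] + \mathbb{E}_q[\log p(\mathbf{z} \mid \theta)] + \mathbb{E}_q[\log p(\mathbf{w} \mid \mathbf{z}, \beta)] - \mathbb{E}_q[\log q(\theta)] - \mathbb{E}_q[\log q(\mathbf{z})]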

Page 52: Topic model, LDA and all that


Variational Inference : Beautiful math

Page 53: Topic model, LDA and all that


Variational Inference : Beautiful math

Applying Lagrange multipliers (to enforce the normalization constraints), we obtain
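The resulting coordinate-ascent updates, in the notation of Blei et al. (2003), with Ψ the digamma function:

\phi_{ni} \propto \beta_{i w_n} \exp\!\Big( \Psi(\gamma_i) - \Psi\big(\textstyle\sum_{j} \gamma_j\big) \Big), \qquad \gamma_i = \alpha_i + \sum_{n=1}^{N} \phi_{ni}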

Page 54: Topic model, LDA and all that


Variational Inference : Review

Goal: find the variational distribution q(θ, z | γ, φ) that best approximates p(θ, z | w, α, β)

Page 55: Topic model, LDA and all that


Variational Inference : Where to learn more

1. Martin Wainwright. Graphical models and variational methods: Message-passing, convex relaxations, and all that. ICML2008 Tutorial

2. M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, Vol. 1, Nos. 1–2, pp. 1–305, December 2008

Page 56: Topic model, LDA and all that
Page 57: Topic model, LDA and all that

MCMC in LDA

MCMC Overview

• Sampling in general
– Why sampling is necessary and why it is hard
– Importance sampling, rejection sampling
– Markov chain Monte Carlo
– Metropolis-Hastings, Gibbs sampling

• Collapsed Gibbs in LDA

Page 58: Topic model, LDA and all that

Pioneers behind sampling methods

MCMC Overview

Nicholas C. Metropolis, Josiah W. Gibbs, Andrey Markov

Page 59: Topic model, LDA and all that

National Bureau of Statistics of China, 16 March 2006: with State Council approval, China conducted a 1% national population sample survey at the end of 2005. The sample covered 17.05 million people, 1.31% of the total population. Within the population, 67.64 million people had university-level education (junior college and above), 150.83 million had senior-high-school education (including technical secondary school), 467.35 million had junior-high-school education, and 407.06 million had primary-school education.


Sampling Example : Population statistics

Page 60: Topic model, LDA and all that

1. Generate samples from a given probability distribution p(x)
2. Estimate the expectation of a function f(x) under a given distribution p(x)

The problems sampling must solve

Sampling
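The second task in symbols, using samples x^(s) drawn from p:

\mathbb{E}_{p}[f] = \int f(x)\, p(x)\, dx \;\approx\; \frac{1}{S} \sum_{s=1}^{S} f\big(x^{(s)}\big)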

Page 61: Topic model, LDA and all that

Example: measuring the concentration of a substance in a lake

Sampling : Why is it so damn hard?

Page 62: Topic model, LDA and all that

Rejection sampling

Sampling : Rejection sampling

(Figure: accept and reject regions under the envelope distribution.)
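The procedure, assuming an envelope k·q(x) ≥ p̃(x) for all x (as presented in MacKay's book, cited later):

x \sim q(x), \quad u \sim \mathrm{Uniform}\big(0,\, k\,q(x)\big); \qquad \text{accept } x \text{ iff } u \le \tilde{p}(x)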

Page 63: Topic model, LDA and all that

Importance sampling

Sampling : Importance sampling

In rejection sampling, throwing away an x seems wasteful, and the rejection itself is the only thing it tells us about the original distribution.
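Importance sampling instead keeps every proposal sample and reweights it:

\mathbb{E}_p[f] \;\approx\; \frac{1}{S} \sum_{s=1}^{S} w_s\, f\big(x^{(s)}\big), \qquad w_s = \frac{p(x^{(s)})}{q(x^{(s)})}, \quad x^{(s)} \sim q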

Page 64: Topic model, LDA and all that

Metropolis-Hastings Method

Sampling : Metropolis-Hastings Method

Use the Markov property: each state depends only on the previous one. At step t, the proposal q(x′ | x_t) can be any distribution we can sample from, for example a Gaussian. For the proposed new state, consider the acceptance probability.
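The standard acceptance rule, with p̃ the (possibly unnormalized) target:

A(x' \mid x_t) = \min\!\left( 1,\; \frac{\tilde{p}(x')\, q(x_t \mid x')}{\tilde{p}(x_t)\, q(x' \mid x_t)} \right)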

Page 65: Topic model, LDA and all that

Gibbs Sampling

Sampling : Gibbs Sampling

1. Initialize all variables
2. Pick a dimension i
3. Sample x_i from the conditional p(x_i | x_−i), holding all other dimensions fixed
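One sweep, cycling through the D dimensions:

x_i^{(t+1)} \sim p\big( x_i \,\big|\, x_1^{(t+1)}, \dots, x_{i-1}^{(t+1)}, x_{i+1}^{(t)}, \dots, x_D^{(t)} \big), \qquad i = 1, \dots, D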

Page 66: Topic model, LDA and all that

Gibbs Sampling in LDA : Joint distribution

Gibbs Sampling in LDA

Page 67: Topic model, LDA and all that

Gibbs Sampling in LDA : Joint distribution

Gibbs Sampling in LDA

Collapsed: integrate out the multinomial parameters analytically, which Dirichlet-multinomial conjugacy makes possible. Substituting the expression above into the joint distribution gives
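In the notation of Heinrich's "Parameter estimation for text analysis" (cited below), with Δ(·) the multivariate Beta function, n_k the topic-word counts, and n_m the document-topic counts:

p(\mathbf{z}, \mathbf{w} \mid \alpha, \beta) = \prod_{k=1}^{K} \frac{\Delta(\mathbf{n}_k + \beta)}{\Delta(\beta)} \cdot \prod_{m=1}^{M} \frac{\Delta(\mathbf{n}_m + \alpha)}{\Delta(\alpha)}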

Page 68: Topic model, LDA and all that

Gibbs Sampling in LDA : Joint distribution

Gibbs Sampling in LDA

(Several formulas omitted here…)

Page 69: Topic model, LDA and all that

Gibbs Sampling in LDA : Marginal dist.

Gibbs Sampling in LDA

(Several formulas omitted here…)
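The omitted derivation ends in the standard collapsed Gibbs update (see Heinrich, cited below), stated here for symmetric priors; the counts n^(−i) exclude the current token i of document m:

p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n^{(-i)}_{k, w_i} + \beta}{n^{(-i)}_{k} + V\beta}\, \big( n^{(-i)}_{m, k} + \alpha \big)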

Page 70: Topic model, LDA and all that

Gibbs Sampling in LDA in Python Code

Sampling : Gibbs Sampling code in Python

# Initialization step of a collapsed Gibbs sampler (an excerpt from a larger
# sampler class): assign every token a random topic and build the count
# tables that the sampling updates above will maintain.
for m in xrange(n_docs):
    # word_indices() yields one entry per token of document m's count vector
    for i, w in enumerate(word_indices(matrix[m, :])):
        z = np.random.randint(self.n_topics)  # random initial topic
        self.nmz[m, z] += 1     # document-topic counts
        self.nm[m] += 1         # token count of document m
        self.nzw[z, w] += 1     # topic-word counts
        self.nz[z] += 1         # token count of topic z
        self.topics[(m, i)] = z

Really simple!

Page 71: Topic model, LDA and all that

1. D.J.C. MacKay. Information theory, inference, and learning algorithms. Cambridge University Press, 2003.

2. Gregor Heinrich. Parameter estimation for text analysis. Technical Report, 2009.

3. Michael I. Jordan and Yair Weiss. Graphical models: Probabilistic inference.

4. Christophe Andrieu, N De Freitas, A Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine learning, pages 5–43, 2003.

5. Yi Wang. Distributed Gibbs Sampling of Latent Dirichlet Allocation : The Gritty Details. Technical Report, 2007.


Gibbs Sampling : where to learn more

Page 72: Topic model, LDA and all that
Page 73: Topic model, LDA and all that


Evolution of Topic Models

• Correlated topic models• Dynamic topic models• Temporal topic models

Page 74: Topic model, LDA and all that

1. David M. Blei and John D Lafferty. Correlated Topic Models. In Advances in Neural Information Processing Systems 18, 2006.

2. David M. Blei and John D Lafferty. A correlated topic model of Science. The Annals of Applied Statistics, 1(1):17–35, 2007.

3. David M. Blei and John D Lafferty. Dynamic topic models. Proceedings of the 23rd international conference on Machine learning - ICML ’06, pages 113–120, 2006.

Correlated + Dynamic TM

Correlated + Dynamic Topic Models

Page 75: Topic model, LDA and all that

Correlated + Dynamic TM

Correlated + Dynamic Topic Models

Page 76: Topic model, LDA and all that

Correlated TM

Correlated Topic Models

LDA cannot capture relations among topics. The logistic normal is not conjugate to the multinomial, so variational methods are used for inference.

Page 77: Topic model, LDA and all that

Correlated TM

Correlated Topic Models

Page 78: Topic model, LDA and all that

Correlated TM

Correlated Topic Models

Controlling sparsity

Page 79: Topic model, LDA and all that

Dynamic TM

Dynamic Topic Models

In DTM, all documents are assumed to be divided into time slices.

Page 80: Topic model, LDA and all that

Dynamic TM

Dynamic Topic Models : Top-10 words and example articles from Science

Page 81: Topic model, LDA and all that

Topics over time

Published in: KDD '06 Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Page 82: Topic model, LDA and all that

Topics over time

Page 83: Topic model, LDA and all that

Topics over time : Topic discovery

Topics over time : State-of-the-Union Addresses

TOT vs. LDA

Page 84: Topic model, LDA and all that

Topics over time : Topic evolution

Topics over time : Topic evolution on NIPS data

Page 85: Topic model, LDA and all that

Topics over time : Co-occurring Topics

Topics over time : Topic evolution on NIPS data

Page 86: Topic model, LDA and all that

Topics over time : Review

"A Survey of Topic Evolution Methods Based on LDA" (基于 LDA 话题演化研究方法综述), Journal of Chinese Information Processing, November 2010

Page 87: Topic model, LDA and all that

Work by Hanna Wallach

NIPS ’09

ICML ’09

Page 88: Topic model, LDA and all that

Asymmetric priors

Rethinking LDA : Why priors matter

Page 89: Topic model, LDA and all that

The advantages of asymmetric priors

Rethinking LDA : Why priors matter

• For the document-topic distributions, an asymmetric prior smooths topics differently across documents
• For the topic-word distributions, the concentration parameter controls whether topics are sparser or more uniform; since these distributions act on the whole corpus, an asymmetric prior here weakens the model's account of within-document structure
• The topic proportions are per-document parameters, so an asymmetric prior suits them well

Page 90: Topic model, LDA and all that

1. David Blei and Jon D. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems, pages 1–22, 2008.

2. Daniel Ramage, David Hall, Ramesh Nallapati, and C.D. Manning. Labeled LDA : A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, pages 248–256. Association for Computational Linguistics, 2009.

3. Jun Zhu, Amr Ahmed, and Eric Xing. MedLDA: Maximum Margin Supervised Topic Models. Journal of Machine Learning Research, 1:1–48, 2010.

Supervised LDA

Page 91: Topic model, LDA and all that

Supervised LDA : goals

Supervised LDA

On top of the unsupervised model, add a description of the label for classification and regression. The naïve approach: run LDA first, then classify using the topics.

Model the label with a probability distribution

Page 92: Topic model, LDA and all that

Supervised topic models

Supervised LDA

Model the label with a generalized linear model (GLM)
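In sLDA (Blei and McAuliffe, cited earlier), the GLM's linear predictor is the document's empirical topic frequencies; for a real-valued label this reduces to a Gaussian response:

y_d \mid \mathbf{z}_d, \eta, \sigma^2 \;\sim\; \mathcal{N}\!\big( \eta^{\top} \bar{z}_d,\; \sigma^2 \big), \qquad \bar{z}_d = \frac{1}{N} \sum_{n=1}^{N} z_{d,n}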

Page 93: Topic model, LDA and all that

Supervised topic models

Supervised LDA : Why use GLM

A GLM can flexibly model any label whose distribution can be written in exponential-family form:
• Gaussian
• Binomial
• Multinomial
• Poisson
• Gamma
• Weibull
• …

Page 94: Topic model, LDA and all that

Semi-Supervised LDA

Unlabeled / Labeled

Page 95: Topic model, LDA and all that

MedLDA

Maximum Entropy Discrimination Latent Dirichlet Allocation

Integrates max-margin learning with topic models by optimizing a single objective function under a set of margin constraints.

Page 96: Topic model, LDA and all that

MedTM

Maximum Entropy Discrimination Topic Models

Advantages:
1. Uses max-margin learning for accurate classification
2. Describes the data better

Page 97: Topic model, LDA and all that

MedLDA vs. LDA

MedLDA : topic discovery

2D embedding on 20Newsgroups data

Page 98: Topic model, LDA and all that

MedLDA : classification

Classification on 20Newsgroups data

Binary classification / multi-class classification

Page 99: Topic model, LDA and all that

Applications of topic models

NIPS ’09 Workshop on Applications for Topic Models: Text and Beyond

• Social networks, microblogs, email: nonstandard language (abbreviations, misspellings, quotations, irregular citations, @, RT, …)

• Protein expression analysis

• Hierarchical structure at a finer granularity
• Topic tracking, evolution, and extinction
• Text summarization
• Multimedia (images, audio, video)

Page 100: Topic model, LDA and all that

Reconstructing Pompeian Households

Page 101: Topic model, LDA and all that

Reconstructing Pompeian Households

Page 102: Topic model, LDA and all that

Reconstructing Pompeian Households

Stew pots / pastry molds

Page 103: Topic model, LDA and all that

Reconstructing Pompeian Households

Stew pots / pastry molds

Is it so?

Page 104: Topic model, LDA and all that

Reconstructing Pompeian Households

Room functions ↔ topic distributions; objects ↔ words

Page 105: Topic model, LDA and all that

Reconstructing Pompeian Households

Page 106: Topic model, LDA and all that

Reconstructing Pompeian Households

Objects' uses do not match their names

Page 107: Topic model, LDA and all that

Component mining

Software Analysis with Unsupervised Topic Models

12,151 Java projects from Sourceforge and Apache

4,632 projects

366,287 source files

38.7 million lines of code

written by 9,250 developers

Page 108: Topic model, LDA and all that

Component mining

Software Analysis with Unsupervised Topic Models

Page 109: Topic model, LDA and all that

Component mining

Software Analysis with Unsupervised Topic Models

Page 110: Topic model, LDA and all that
Page 111: Topic model, LDA and all that

Future Directions

• LDA as a dimensionality-reduction method, e.g., feeding k-means
• Further extensions built on MedLDA
• LDA + active learning
• Semi-supervised topic models
• Asymmetric Dirichlet priors
• Better approximation methods (data augmentation)

• Text summarization
• Topic models for unstructured data (microblogs, forums, gene sequences)
• Applying LDA to images

Page 112: Topic model, LDA and all that

Topics not covered

• Hierarchical topic models
• Nested Chinese restaurant process
• Indian buffet process

Page 113: Topic model, LDA and all that

Topic Models Background : General

Topic Models Background

• D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.

• D. Blei and M. Jordan. Variational inference for Dirichlet process mixtures. Journal of Bayesian Analysis, 1:121–144, 2006.

• M. Steyvers and T. Griffiths. Probabilistic Topic Models. In Latent Semantic Analysis: A Road to Meaning, T. Landauer, D. McNamara, S. Dennis, and W. Kintsch eds. Laurence Erlbaum, 2006.

• Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566-1581, 2006.

• J. Zhu, A. Ahmed and E. P. Xing. MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification. The 26th International Conference on Machine Learning, 2009.

Page 114: Topic model, LDA and all that

Inference and Evaluation

Topic Models Background

• A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On Smoothing and Inference for Topic Models. In Uncertainty in Artificial Intelligence, 2009.

• W. Li, and A. McCallum. Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. In International Conference on Machine Learning, 2006.

• I. Porteous, A. Asuncion, D. Newman, A. Ihler, P. Smyth, and M. Welling. Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation. In Knowledge Discovery and Data Mining, 2008.

• H. Wallach, I. Murray, R. Salakhutdinov and D. Mimno. Evaluation Methods for Topic Models. In International Conference on Machine Learning, 2009.

• M. Welling, Y. Teh and B. Kappen. Hybrid Variational/Gibbs Inference in Topic Models. In Uncertainty in Artificial Intelligence, 2008.

Page 115: Topic model, LDA and all that

Biology

Topic Models Background

• E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed-membership stochastic blockmodels. Journal of Machine Learning Research, 9: 1981-2014, 2008.

• P. Agius, Y. Ying, and C. Campbell. Bayesian Unsupervised Learning with Multiple Data Types. Statistical Applications in Genetics and Molecular Biology, 3(1):27, 2009.

• P. Flaherty, G. Giaever, J. Kumm, Michael I. Jordan, Adam P. Arkin. A Latent Variable Model for Chemogenomic Profiling. Bioinformatics 2005 Aug 1;21(15):3286-93.

• S. Shringarpure and E. P. Xing. mStruct: Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations. Genetics, Vol 182, issue 2, 2009.

Page 116: Topic model, LDA and all that

Natural Language Processing

Topic Models Background

• J. Boyd-Graber and D. Blei. Syntactic topic models. In Neural Information Processing Systems, 2009.

• T. Griffiths, M. Steyvers, D. Blei, and J. Tenenbaum. Integrating topics and syntax. In Neural Information Processing Systems, 2005.

• K. Toutanova and M. Johnson. A Bayesian LDA-based Model for Semi-Supervised Part-of-speech Tagging. In Neural Information Processing Systems, 2008.

Page 117: Topic model, LDA and all that

Social Science

Topic Models Background

• J. Chang, J. Boyd-Graber, and D. Blei. Connections between the lines: Augmenting social networks with text. Knowledge Discovery and Data Mining, 2009.

• L. Dietz, S. Bickel, and T. Scheffer. Unsupervised Prediction of Citation Influences. In International Conference on Machine Learning, 2007.

• D. Hall, D. Jurafsky, and C. Manning. Studying the History of Ideas Using Topic Models. In Empirical Methods in Natural Language Processing, 2008.

Page 118: Topic model, LDA and all that

Temporal and Network Models

Topic Models Background

• D. Blei and J. Lafferty. Dynamic topic models. In International Conference on Machine Learning, 2006.

• J. Chang and D. Blei. Relational topic models for document networks. Artificial Intelligence and Statistics (in print), 2009.

• E.P. Xing, W. Fu, and L. Song. A State-Space Mixed Membership Blockmodel for Dynamic Network Tomography. Annals of Applied Statistics, 2009.

• H. Wallach. Topic Modeling: Beyond Bag-of-Words. In International Conference on Machine Learning, 2006.

Page 119: Topic model, LDA and all that

Vision

Topic Models Background

• J. Philbin, J. Sivic, and A. Zisserman. Geometric LDA: A Generative Model for Particular Object Discovery In British Machine Vision Conference, 2008.

• L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models for 101 object categories. In Computer Vision and Image Understanding, 2007.

• C. Wang, D. Blei and L. Fei-Fei. Simultaneous Image Classification and Annotation. In Computer Vision and Pattern Recognition, 2009.

Page 120: Topic model, LDA and all that

ONE MORE THING…

Page 121: Topic model, LDA and all that

Structure, relations, broad applications

Room functions ↔ topic distributions; objects ↔ words

Page 122: Topic model, LDA and all that
Page 123: Topic model, LDA and all that

Q & A