Konstantin Vorontsov - Additive regularization of matrix decompositions and probabilistic topic modeling


Transcript

  • 1. Additive Regularization of Matrix Factorization for Probabilistic Topic Modeling. Konstantin Vorontsov (Yandex, CC RAS, MIPT, HSE, MSU). Analysis of Images, Social Networks and Texts, Ekaterinburg, 10-12 April 2014. (voron@forecsys.ru) 1 / 51

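2. [Outline slide: three section titles, lost in extraction; the first section covers PLSA and LDA.]

3. Given: a vocabulary of terms $W$; a collection of documents $D$; and $n_{dw}$, the number of occurrences of term $w \in W$ in document $d \in D$. Find: $p(w|t)$, the distribution of terms in topic $t$, and $p(t|d)$, the distribution of topics in document $d$. Both are fitted through the document model $p(w|d)$.

4. [List of applications, lost in extraction; the only surviving item is expert search.]

5. A document $d$ is represented by its terms $\{w_1, \dots, w_{n_d}\}$. [Remaining text lost in extraction.]

6. [List of questions, lost in extraction.]

7. [Four-point list attributed to a 1987 source, lost in extraction.]

8. A document $d$ is described by the mixture
$$p(w|d) = \sum_{t \in T} p(w|t)\, p(t|d).$$
[Figure: generation of document text from $p(w|t)$ and $p(t|d)$; only numeric probability labels survive.]

9. The collection $D$ is a sample of triples $(d_i, w_i, t_i)_{i=1}^{n}$ drawn from $p(d, w, t)$; $d_i$ and $w_i$ are observed, $t_i$ is latent.
Conditional independence hypothesis: $p(w|d,t) = p(w|t)$.
Model:
$$p(w|d) = \sum_{t \in T} \underbrace{p(w|t)}_{\phi_{wt}}\, \underbrace{p(t|d)}_{\theta_{td}},$$
where $\phi_{wt} \equiv p(w|t)$ for $t \in T$ and $\theta_{td} \equiv p(t|d)$ for $d \in D$.
Direct problem: given $\phi_{wt}$, $\theta_{td}$, generate a document $d$.
Inverse problem: given $\hat p(w|d) \equiv n_{dw}/n_d$, find $\phi_{wt}$, $\theta_{td}$.

10. Maximum log-likelihood problem:
$$\mathcal{L}(\Phi,\Theta) = \sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \phi_{wt}\theta_{td} \to \max_{\Phi,\Theta},$$
subject to $\phi_{wt} \ge 0$, $\sum_{w \in W} \phi_{wt} = 1$, $\theta_{td} \ge 0$, $\sum_{t \in T} \theta_{td} = 1$.
Equivalent matrix factorization: $F_{W \times D} \approx \Phi_{W \times T}\, \Theta_{T \times D}$, where $F = (\hat p(w|d))_{W \times D}$, $\Phi = (\phi_{wt})_{W \times T}$ with $\phi_{wt} = p(w|t)$, and $\Theta = (\theta_{td})_{T \times D}$ with $\theta_{td} = p(t|d)$.

11. Probabilistic Latent Semantic Analysis (PLSA) [Hofmann, 1999]. A maximizer $(\Phi, \Theta)$ of the likelihood satisfies the system:
E-step:
$$n_{dwt} = n_{dw} \frac{\phi_{wt}\theta_{td}}{\sum_{s \in T} \phi_{ws}\theta_{sd}};$$
M-step:
$$\phi_{wt} = \frac{n_{wt}}{n_t}; \quad n_{wt} = \sum_{d \in D} n_{dwt}; \quad n_t = \sum_{w \in W} n_{wt};$$
$$\theta_{td} = \frac{n_{td}}{n_d}; \quad n_{td} = \sum_{w \in d} n_{dwt}; \quad n_d = \sum_{t \in T} n_{td}.$$
The EM algorithm alternates the E-step and the M-step.

12. Interpretation of the two steps.
The E-step is Bayes' rule:
$$p(t|d,w) = \frac{\phi_{wt}\theta_{td}}{\sum_{s \in T} \phi_{ws}\theta_{sd}}, \qquad n_{dwt} = n_{dw}\, p(t|d,w),$$
where $n_{dwt}$ estimates the number of triples $(d, w, t)$ in the collection.
The M-step gives frequency estimates of the conditional probabilities:
$$\phi_{wt} = \frac{n_{wt}}{n_t} \equiv \frac{\sum_{d \in D} n_{dwt}}{\sum_{d \in D} \sum_{w \in d} n_{dwt}}, \qquad \theta_{td} = \frac{n_{td}}{n_d} \equiv \frac{\sum_{w \in d} n_{dwt}}{\sum_{w \in W} \sum_{t \in T} n_{dwt}}.$$
Shorthand with the proportionality sign: $p(t|d,w) \propto \phi_{wt}\theta_{td}$; $\phi_{wt} \propto n_{wt}$; $\theta_{td} \propto n_{td}$.

13. EM algorithm with the E-step computed inside the M-step pass.
Input: collection $D$, number of topics $|T|$, number of iterations $i_{\max}$.
Output: $\Phi$, $\Theta$.
  1. initialize $\phi_{wt}$, $\theta_{td}$ for all $d \in D$, $w \in W$, $t \in T$;
  2. for all iterations $i = 1, \dots, i_{\max}$:
  3.   $n_{wt}, n_{td}, n_t, n_d := 0$ for all $d \in D$, $w \in W$, $t \in T$;
  4.   for all documents $d \in D$ and all terms $w \in d$:
  5.     $p(t|d,w) = \phi_{wt}\theta_{td} \big/ \sum_{s} \phi_{ws}\theta_{sd}$ for all $t \in T$;
  6.     $n_{wt}, n_{td}, n_t, n_d \mathrel{+}= n_{dw}\, p(t|d,w)$ for all $t \in T$;
  7.   $\phi_{wt} := n_{wt}/n_t$ for all $w \in W$, $t \in T$;
  8.   $\theta_{td} := n_{td}/n_d$ for all $d \in D$, $t \in T$.

14. EM algorithm over batches of documents $D_j$. (The tilde on the per-batch counters and the decay symbol $\rho_j$ were lost in extraction and are restored here by context.)
  1. initialize $\phi_{wt}$ for all $w \in W$, $t \in T$;
  2. $n_{wt} := 0$, $n_t := 0$ for all $w \in W$, $t \in T$;
  3. for all batches $D_j$, $j = 1, \dots, J$:
  4.   $\tilde n_{wt} := 0$, $\tilde n_t := 0$ for all $w \in W$, $t \in T$;
  5.   for all documents $d \in D_j$:
  6.     initialize $\theta_{td}$ for all $t \in T$;
  7.     repeat
  8.       $p(t|d,w) = \phi_{wt}\theta_{td} \big/ \sum_{s} \phi_{ws}\theta_{sd}$ for all $w \in d$, $t \in T$;
  9.       $\theta_{td} := \frac{1}{n_d} \sum_{w \in d} n_{dw}\, p(t|d,w)$ for all $t \in T$;
  10.    until $\theta_d$ converges;
  11.    $\tilde n_{wt}, \tilde n_t \mathrel{+}= n_{dw}\, p(t|d,w)$ for all $w \in d$, $t \in T$;
  12.  $n_{wt} := \rho_j n_{wt} + \tilde n_{wt}$; $n_t := \rho_j n_t + \tilde n_t$ for all $w \in W$, $t \in T$;
  13.  $\phi_{wt} := n_{wt}/n_t$ for all $w \in W$, $t \in T$.

15. Latent Dirichlet Allocation (LDA) [Blei, Ng, Jordan, 2003]. Estimates of $\phi_{wt} \equiv p(w|t)$ and $\theta_{td} \equiv p(t|d)$:
PLSA: $\phi_{wt} = \dfrac{n_{wt}}{n_t}$, $\theta_{td} = \dfrac{n_{td}}{n_d}$.
LDA: $\phi_{wt} = \dfrac{n_{wt} + \beta_w}{n_t + \beta_0}$, $\theta_{td} = \dfrac{n_{td} + \alpha_t}{n_d + \alpha_0}$.
Asuncion A., Welling M., Smyth P., Teh Y. W. On smoothing and inference for topic models // Intl conf. on Uncertainty in Artificial Intelligence, 2009.
[Authors and title lost in extraction] EM-[...] // [...], 2013. Vol. 1, No. 6. Pp. 657-686.

16. Jianwen Zhang, Yangqiu Song, Changshui Zhang, Shixia Liu. Evolutionary Hierarchical Dirichlet Processes for Multiple Correlated Time-varying Corpora // KDD'10, July 25-28, 2010.

17. Weiwei Cui, Shixia Liu, Li Tan, Conglei Shi, Yangqiu Song, Zekai J. Gao, Xin Tong, Huamin Qu. TextFlow: Towards Better Understanding of Evolving Topics in Text // IEEE Transactions on Visualization and Computer Graphics, Vol. 17, No. 12, December 2011.

18-19. n-gram topic models. Shoaib Jameel, Wai Lam. An N-Gram Topic Model for Time-Stamped Documents // 35th ECIR 2013, Moscow, March 24-27. Pp. 292-304.

20. Laura Dietz, Steffen Bickel, Tobias Scheffer. Unsupervised prediction of citation influences // ICML-2007. Pp. 233-240.

21. D. Blei, J. Lafferty. A correlated topic model of Science // Annals of Applied Statistics, 2007. Vol. 1. Pp. 17-35.

22. I. Vulic, W. De Smet, J. Tang, M.-F. Moens.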
Probabilistic topic modeling in multilingual settings: a short overview of its methodology with applications // NIPS, 7-8 December 2012. Pp. 1-11.

23. A. Chaney, D. Blei. Visualizing topic models // International AAAI Conference on Social Media and Weblogs, 2012.

24. Jason Chuang, Christopher D. Manning, Jeffrey Heer. Termite: Visualization Techniques for Assessing Textual Topic Models // Advanced Visual Interfaces, 2012.

25. Ali Daud, Juanzi Li, Lizhu Zhou, Faqir Muhammad. Knowledge discovery through directed probabilistic topic models: a survey // Frontiers of Computer Science in China, Vol. 4, No. 2, 2010. Pp. 280-301.
(www.MachineLearning.ru)
Topic Modeling Bibliography: http://mimno.infosci.cornell.edu/topics.html

26. The factorization is not unique:
$$\Phi\Theta = (\Phi S)(S^{-1}\Theta)$$
for any $S_{T \times T}$ such that $\Phi S$ and $S^{-1}\Theta$ remain stochastic matrices.
[Figure: four panels comparing PLSA and LDA, two at parameter value 0.01 and two at 0.1; axis labels and the slide's conclusion are lost in extraction.]

27. Additive regularization. In addition to the likelihood, let $n$ criteria $R_i(\Phi,\Theta)$, $i = 1, \dots, n$, be given. The problem becomes
$$\underbrace{\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \phi_{wt}\theta_{td}}_{\text{log-likelihood } \mathcal{L}(\Phi,\Theta)} + \underbrace{\sum_{i=1}^{n} \tau_i R_i(\Phi,\Theta)}_{R(\Phi,\Theta)} \to \max_{\Phi,\Theta},$$
subject to $\phi_{wt} \ge 0$, $\sum_{w \in W} \phi_{wt} = 1$, $\theta_{td} \ge 0$, $\sum_{t \in T} \theta_{td} = 1$, where $\tau_i > 0$ are the regularization coefficients.

28. Regularized EM algorithm. A maximizer of the regularized problem satisfies the system:
E-step:
$$n_{dwt} = n_{dw} \frac{\phi_{wt}\theta_{td}}{\sum_{s \in T} \phi_{ws}\theta_{sd}};$$
M-step:
$$\phi_{wt} = \frac{n_{wt}}{n_t}; \quad n_{wt} = \Bigl( \sum_{d \in D} n_{dwt} + \phi_{wt} \frac{\partial R}{\partial \phi_{wt}} \Bigr)_{+}; \quad n_t = \sum_{w \in W} n_{wt};$$
$$\theta_{td} = \frac{n_{td}}{n_d}; \quad n_{td} = \Bigl( \sum_{w \in d} n_{dwt} + \theta_{td} \frac{\partial R}{\partial \theta_{td}} \Bigr)_{+}; \quad n_d = \sum_{t \in T} n_{td}.$$
With $R(\Phi,\Theta) = 0$ this reduces to the EM algorithm for PLSA.

29. Kullback-Leibler divergence between discrete distributions $P = (p_i)_{i=1}^{n}$ and $Q = (q_i)_{i=1}^{n}$:
$$KL(P \| Q) \equiv KL_i(p_i \| q_i) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}.$$
  1. $KL(P \| Q) \ge 0$, and $KL(P \| Q) = 0 \iff P = Q$.
  2. Minimizing KL is equivalent to maximizing likelihood (the parameter is written $\alpha$ here; its symbol was lost in extraction):
$$KL(P \| Q(\alpha)) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i(\alpha)} \to \min_{\alpha} \iff \sum_{i=1}^{n} p_i \ln q_i(\alpha) \to \max_{\alpha}.$$
  3.
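If $KL(P \| Q) < KL(Q \| P)$, then $P$ is nested in $Q$ more strongly than $Q$ is in $P$.
[Figure: three panels showing distributions $P$ and $Q$. Panel 1: $KL(P \| Q) = 0.442$, $KL(Q \| P) = 2.966$. Panel 2: $KL(P \| Q) = 0.444$, $KL(Q \| P) = 0.444$. Panel 3: $KL(P \| Q) = 2.969$, $KL(Q \| P) = 2.969$.]

30. Regularizer 1 (coincides with LDA). The distributions $\phi_{wt}$ are pulled toward a given distribution $\beta_w$, and $\theta_{td}$ toward a given distribution $\alpha_t$:
$$\sum_{t \in T} KL_w(\beta_w \| \phi_{wt}) \to \min_{\Phi}; \qquad \sum_{d \in D} KL_t(\alpha_t \| \theta_{td}) \to \min_{\Theta}.$$
Equivalently, maximize the regularizer
$$R(\Phi,\Theta) = \beta_0 \sum_{t \in T} \sum_{w \in W} \beta_w \ln \phi_{wt} + \alpha_0 \sum_{d \in D} \sum_{t \in T} \alpha_t \ln \theta_{td} \to \max.$$
Substituting into the M-step gives the LDA estimates:
$$\phi_{wt} \propto n_{wt} + \beta_0\beta_w, \qquad \theta_{td} \propto n_{td} + \alpha_0\alpha_t.$$
D. Blei, A. Ng, M. Jordan. Latent Dirichlet allocation // Journal of Machine Learning Research, 2003. Vol. 3. Pp. 993-1022.

31. Regularizer 2 (generalizes LDA). Given: 1) topic subsets $T_d \subset T$ for documents $d \in D_0$; 2) term subsets $W_t \subset W$ for topics $t \in T_0$. Let $\phi^0_{wt}$ be a reference distribution on $W_t$ and $\theta^0_{td}$ a reference distribution on $T_d$. The regularizer is
$$R(\Phi,\Theta) = \beta_0 \sum_{t \in T_0} \sum_{w \in W_t} \phi^0_{wt} \ln \phi_{wt} + \alpha_0 \sum_{d \in D_0} \sum_{t \in T_d} \theta^0_{td} \ln \theta_{td} \to \max.$$
Substituting into the M-step gives a generalization of LDA:
$$\phi_{wt} \propto n_{wt} + \beta_0 \phi^0_{wt}, \qquad \theta_{td} \propto n_{td} + \alpha_0 \theta^0_{td}.$$
Nigam K., McCallum A., Thrun S., Mitchell T. Text classification from labeled and unlabeled documents using EM // Machine Learning, 2000, no. 2-3.

32. Regularizer 2, generalized form (generalizes LDA). The logarithm is replaced by a function written $\mu$ here (its symbol was lost in extraction):
$$R(\Phi,\Theta) = \beta_0 \sum_{t \in T_0} \sum_{w \in W_t} \phi^0_{wt}\, \mu(\phi_{wt}) + \alpha_0 \sum_{d \in D_0} \sum_{t \in T_d} \theta^0_{td}\, \mu(\theta_{td}) \to \max.$$
Substituting into the M-step gives another generalization of LDA:
$$\phi_{wt} \propto n_{wt} + \beta_0 \phi^0_{wt}\, \phi_{wt}\, \mu'(\phi_{wt}), \qquad \theta_{td} \propto n_{td} + \alpha_0 \theta^0_{td}\, \theta_{td}\, \mu'(\theta_{td}).$$
For $\mu(z) = z$ the criterion corresponds to $\mathrm{cov}(\theta^0_d, \theta_d)$. [Closing remark about $\theta^0_{td}$, $T_d$ and $\theta_{td}$ lost in extraction.]

33. Regularizer 3 (anti-LDA). The regularizer of slide 30 is taken with the opposite sign, which drives small values of $\phi_{wt}$ and $\theta_{td}$ to zero:
$$R(\Phi,\Theta) = -\beta_0 \sum_{t \in T} \sum_{w \in W} \beta_w \ln \phi_{wt} - \alpha_0 \sum_{d \in D} \sum_{t \in T} \alpha_t \ln \theta_{td} \to \max.$$
Substituting into the M-step gives the "anti-LDA" estimates:
$$\phi_{wt} \propto \bigl( n_{wt} - \beta_0\beta_w \bigr)_{+}, \qquad \theta_{td} \propto \bigl( n_{td} - \alpha_0\alpha_t \bigr)_{+}.$$
Varadarajan J., Emonet R., Odobez J.-M. A sparsity constraint for topic models - application to temporal activity mining // NIPS-2010 Workshop on Practical Applications of Sparse Modeling: Open Issues and New Directions.

34. Regularizer 4. With $p(t) = \sum_{d} p(d)\theta_{td}$, a KL-based criterion on the distribution $p(t)$ gives
$$R(\Theta) = -\tau \sum_{t \in T} \ln \sum_{d \in D} p(d)\theta_{td} \to \max.$$
Substituting into the M-step:
$$\theta_{td} \propto \Bigl( n_{td} - \tau \frac{n_d}{n_t} \theta_{td} \Bigr)_{+},$$
where $n_t = \sum_{d} \sum_{w} n_{dwt}$. The correction is largest for topics $t$ with small $n_t$.

35. Regularizer 5. Penalizes the overlap between the term distributions of different topics $t$:
$$R(\Phi) = -\frac{\tau}{2} \sum_{t \in T} \sum_{s \in T \setminus t} \sum_{w \in W} \phi_{wt}\phi_{ws} \to \max.$$
Substituting into the M-step:
$$\phi_{wt} \propto \Bigl( n_{wt} - \tau \phi_{wt} \sum_{s \in T \setminus t} \phi_{ws} \Bigr)_{+}.$$
Tan Y., Ou Z. Topic-weak-correlated latent Dirichlet allocation // 7th Intl Symp. Chinese Spoken Language Processing (ISCSLP), 2010. Pp. 224-228.

36. Regularizer 6. For terms $u, w \in W$, let $C_{uw}$ be a co-occurrence estimate with $\hat p(w|u) = N_{uw}/N_u$. The distribution $\phi_{wt} \equiv p(w|t)$ is compared with the estimate p̂(w|t) = Σ_u p̂(w|u) p(u|t) = (1/n_t) Σ_u C_uw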