Semantic Indexing Using Deep CNN and GMM Supervectors
Nakamasa Inoue and Koichi Shinoda, Tokyo Institute of Technology
System Overview
• A hybrid system of Gaussian-mixture-model (GMM) supervectors and deep convolutional neural networks.
Results & Conclusion
• Our best result was a Mean InfAP of 0.299, which ranked 3rd among participating teams.
• Future work: audio and motion analysis using deep neural networks.
Gaussian-Mixture-Model Supervectors
• Each video shot is modeled by a GMM. Maximum a posteriori (MAP) adaptation is used to estimate the parameters.
• Six types of low-level features: Harris SIFT, Hessian SIFT, Dense SIFT, Dense HOG, Dense LBP, and MFCC.
[Figure: prior GMM → MAP adaptation → GMM supervector]
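The MAP adaptation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes mean-only adaptation with diagonal covariances, a hypothetical relevance factor `tau`, and a supervector built by stacking the adapted means after scaling by the prior weights and variances. The function name and all parameters are illustrative.

```python
import numpy as np

def gmm_supervector(features, weights, means, variances, tau=20.0):
    """Mean-only MAP adaptation of a prior GMM to one shot (illustrative).

    features:  (N, D) low-level features extracted from one video shot
    weights:   (K,)   mixture weights of the prior GMM
    means:     (K, D) means of the prior GMM
    variances: (K, D) diagonal variances of the prior GMM
    tau:       assumed relevance factor controlling trust in the prior
    """
    # Log-density of every feature under every Gaussian, shape (N, K).
    diff = features[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)          # responsibilities

    # MAP update: interpolate between the prior means and the data statistics.
    n_k = post.sum(axis=0)                           # soft counts, (K,)
    first_order = post.T @ features                  # (K, D)
    adapted = (tau * means + first_order) / (tau + n_k)[:, None]

    # Stack the normalized adapted means into one vector of length K * D.
    normalized = np.sqrt(weights)[:, None] * adapted / np.sqrt(variances)
    return normalized.ravel()
```

With a very large `tau` the supervector stays at the prior means, and with a small `tau` it follows the shot's own statistics.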
Convolutional Neural Network
• The 16-layer convolutional network in [1] is used to extract 4096-dimensional features.
• Parameters of the CNN are trained on ImageNet 2012.
[Figure: system overview. A video shot is processed by two branches, (1) deep convolutional neural network (CNN) → SVM and (2) audio and visual low-level features → GMM supervectors → SVMs, followed by score fusion]
TokyoTech at TRECVID 2015
• Features are extracted from multiple frames in each video shot by using convolutional neural networks.
[Figure: video shot → CNN (one per frame) → max pooling → SVM]
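The pooling of per-frame CNN features into one shot-level feature can be sketched as below. This is a sketch under the assumption that max pooling is applied element-wise across the frames of a shot; the function name is hypothetical and the toy features are 4-dimensional instead of 4096-dimensional.

```python
import numpy as np

def pool_shot_features(frame_features):
    """Element-wise max over the per-frame CNN features of one video shot."""
    frames = np.asarray(frame_features, dtype=float)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (num_frames, dim) array")
    return frames.max(axis=0)

# Toy example: 3 frames with 4-dim. features.
shot = np.array([[0.1, 0.9, 0.0, 0.3],
                 [0.5, 0.2, 0.0, 0.4],
                 [0.2, 0.1, 0.7, 0.0]])
print(pool_shot_features(shot))  # [0.5 0.9 0.7 0.4]
```

The pooled vector has the same dimensionality as a single frame feature, so the SVM input size does not depend on the number of sampled frames.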
[1] K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proc. of ICLR, 2015.
Method             Mean InfAP
Deep CNN           0.274
GMM Supervector    0.226
Fusion             0.299
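The poster does not state how the CNN and GMM-supervector scores are fused. As one common possibility, a weighted average of the two SVM scores per shot can be sketched as follows. The function name and the weight `cnn_weight` are assumptions for illustration only.

```python
import numpy as np

def fuse_scores(cnn_scores, gmm_scores, cnn_weight=0.5):
    """Late fusion by weighted average (assumed; not stated in the poster)."""
    cnn = np.asarray(cnn_scores, dtype=float)
    gmm = np.asarray(gmm_scores, dtype=float)
    if cnn.shape != gmm.shape:
        raise ValueError("both score arrays must cover the same shots")
    if not 0.0 <= cnn_weight <= 1.0:
        raise ValueError("cnn_weight must lie in [0, 1]")
    return cnn_weight * cnn + (1.0 - cnn_weight) * gmm

# Shots are then ranked by fused score, highest first.
fused = fuse_scores([0.9, 0.1, 0.4], [0.2, 0.3, 0.8], cnn_weight=0.5)
ranking = np.argsort(-fused)
```

In practice the weight would be tuned on held-out data for each concept.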
[Figure: InfAP of the TokyoTech runs. Legend: highest score in 2015, ours, median]
[Figure: CNN architecture. Input → convolution → max pooling → full connection → 4096-dim. feature]
[Figure: low-level features → mean vector of a GMM]
[Figure: I-frame F-score, mean pixel F-score, and harmonic mean; vertical axis up to 0.8]
TokyoTech at TRECVID 2015
Localization with Spatio-Temporal Selective Search and SPPnet
Ryosuke Yamamoto, Nakamasa Inoue, Koichi Shinoda, Tokyo Institute of Technology
Our System
Results & Conclusion, Future Works
Novelty 1: Spatio-Temporal Region Proposals
Novelty 2: Multi-Frame Score Fusion
Novelty 3: Neighbor Frame Score Boosting
• Selective Search [1] is extended along the temporal dimension to generate region proposals.
• This produces a large number of temporally continuous region proposals.
• As with Selective Search, the results contain many useless proposals.
• An efficient method to extract CNN scores from a large number of object regions of an image.
• Achieved a state-of-the-art result in object localization.
• The Selective Search results are used as region proposals.
• To avoid noise or object deformation, feature maps are fused among several frames.
• Useless proposals are excluded.
[1] J. R. R. Uijlings, K. E. A. Sande, T. Gevers, A. W. M. Smeulders, Selective search for object recognition. In IJCV, vol. 104, pp. 154-171, 2013.
[2] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition. In IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1904-1916, 2015.
• If the system fails to localize in some frame, Neighbor Frame Score Boosting recovers the result using neighboring frames.
Harmonic mean of F-scores
Method                                               Val       Test
Selective Search + SPPnet                            0.4481    0.5656
+ ST-Region Proposals, Multi-Frame Score Fusion      0.4518    0.5716
+ Neighbor-Frame Score Boost                         2RS4TU    2R4V42
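The metric reported above is a harmonic mean of F-scores. A small sketch of the arithmetic follows, assuming (from the figure labels on this poster) that the two values being combined are the I-frame F-score and the mean pixel F-score. The function names are illustrative and not part of any evaluation tool.

```python
def f_score(precision, recall):
    """F1 score from precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def harmonic_mean(a, b):
    """Harmonic mean of two non-negative scores; 0.0 when both are zero."""
    if a + b == 0:
        return 0.0
    return 2.0 * a * b / (a + b)

# Hypothetical values, for illustration only.
iframe_f = f_score(precision=0.75, recall=0.5)   # 0.6
pixel_f = 0.3
combined = harmonic_mean(iframe_f, pixel_f)      # 0.4 up to rounding
```

The harmonic mean is pulled toward the smaller of the two scores, so a system has to do well on both to score well overall.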
• Multi-Frame Score Fusion and Neighbor-Frame Score Boosting improved the score.
• We achieved 3rd place among all teams in the harmonic mean of F-scores.
• Future work: the detection results strongly depend on the quality of the ST-Region Proposals. Improve the quality of the ST-Region Proposals. Localization without region candidates.
• Generate regions from feature maps.
[Figure: system output and ground truth]
[Figure: system architecture. Region proposals by Selective Search [1]; I-Frame and P-Frame inputs → CNN → SPP layer → FC + ReLU → SVM → score fusion]
[Figure legend: MediaMill Best, CCNY Best, Our Best, Trimps Best, PicSOM Best]
[Figure: SPPnet applied to successive frames along the time axis]
Previous Work: Selective Search [1] + Spatial Pyramid Pooling net [2]
• Selective Search produces a large number of object region proposals from a still image.
• An image is segmented hierarchically with several segmentation strategies, including useless ones.
[Figure: example image from [1]]
Feature extraction
• Four types of features are extracted:
  1. GMM supervectors for dense HOG (DHOG)
  2. GMM supervectors for RGB-SIFT (SIFT)
  3. GMM supervectors for dense trajectories from HOG, HOF, and MBH (DT)
  4. VideoStory representations (new)
• Maximum a posteriori (MAP) adaptation and a Universal Background Model (UBM) are used to make the GMM supervectors.
TokyoTech at TRECVID Multimedia Event Detection 2015
Combination of VideoStory and GMM Supervectors
Tran Hai Dang, Nakamasa Inoue, and Koichi Shinoda
Tokyo Institute of Technology
Setting               infAP200 (%)
Without VideoStory    1E2WW
With VideoStory       >LMNO
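[Figure: VideoStory embedding of a video and its title terms (drive, safe, electric, car). x: GMM supervector for DT-MBH; y: term vector; B: visual projection; A: textual projection]
[Figure: infAP200 (%) chart; labels: PS 100Ex, EvalFull, EvalSub; vertical axis from 0 to 40]
[Figure: system overview. Video → GMM supervectors → SVMs; VideoStory representation → SVM; fusion]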
System overview
• We combine the VideoStory representation with the GMM supervector system.
VideoStory [1]
[1] Amirhossein Habibian, Thomas Mensink, Cees G. M. Snoek. VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events. ACM Multimedia, 2014.
Results
Conclusions
• VideoStory is effective for events such as "Rock climbing", "Fixing musical instruments", "Parking a vehicle", and "Tuning musical instruments".
• More training data is needed to improve the performance.
• The visual projection and the textual projection are trained jointly using videos and their titles from the VideoStory46K dataset.
• Only the visual projection is used to compute the VideoStory representations of TRECVID videos.
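The two projections can be sketched with plain matrix products, following the relations s = Bx and ŷ = As shown on the poster. This is a sketch only: the toy dimensions and values, and the function names, are assumptions for illustration, and the joint training of A and B is not shown.

```python
import numpy as np

def videostory_representation(x, B):
    """Project a visual feature x into the VideoStory space: s = B x."""
    x = np.asarray(x, dtype=float)
    B = np.asarray(B, dtype=float)
    if B.ndim != 2 or x.shape != (B.shape[1],):
        raise ValueError("B must be (embed_dim, feature_dim) and x (feature_dim,)")
    return B @ x

def predict_terms(s, A):
    """Reconstruct the term vector from the embedding: y_hat = A s."""
    return np.asarray(A, dtype=float) @ np.asarray(s, dtype=float)

# Toy sizes: 3-dim. visual feature, 2-dim. embedding, 3 title terms.
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 0.0]])      # stands in for the pre-trained visual projection
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # stands in for the textual projection
x = np.array([1.0, 2.0, 3.0])        # stands in for a GMM supervector

s = videostory_representation(x, B)  # used as the SVM input for TRECVID videos
y_hat = predict_terms(s, A)          # only needed during pre-training
```

For TRECVID videos only `videostory_representation` is needed, since no titles are available and the textual projection is not used.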
3rd among c teams
[Figure: pre-training on the VideoStory46K dataset, with VideoStory representation s = Bx and term prediction ŷ = As]
[Figure: VideoStory representations of TRECVID videos. The pre-trained visual projection B gives the VideoStory representation s = Bx, which is passed to an SVM]
Comparison of our different settings under the PS 10Ex EvalSub condition.