
Localization using Faster R-CNN and Multi-Frame Fusion

Ryosuke Yamamoto, Nakamasa Inoue, Koichi Shinoda
Tokyo Institute of Technology

Outline

Motivation: detect the action concept "Sitting Down"

Our method: Faster R-CNN + LSTM + Re-scoring

Annotation: frame-wise annotation for Sitting Down, key-frame annotation for the other concepts

Results: 2nd among 3 teams, best result for Sitting Down

[Figure: results chart; metrics: 0.5 iframe_fscore, mean_pixel_fscore]

Motivation

・The localization task covers not only static objects but also action concepts
・We focus on Sitting Down, one of the action concepts
・How can Sitting be distinguished from Sitting Down? → Dynamic information is important for precise detection

[Figure: example frames, Sitting vs. Sitting Down]

Our Method

・Faster R-CNN (Ren 2015)
  - Efficient object localization

・LSTM (Donahue 2015)
  - Precise action localization
  - Applied to Sitting Down

・Re-scoring (Yamamoto 2015)
  - Multi-frame Score Fusion
  - Multi-shot Score Boosting

[Figure: system overview along the time sequence, with blocks labeled Faster R-CNN, Prediction, Fusion, LSTM, and Boost]

Faster R-CNN (Ren 2015)

Efficient end-to-end object localization
1. Generate region proposals with a network
2. Predict scores for each region using CNN features

Example CNNs:
- ZFNet (Zeiler 2014) ← we use
- VGG-16 (Simonyan 2014)
- GoogLeNet (Szegedy 2015)
- ResNet (He 2016)

[Figure: Faster R-CNN (region proposals, ROI pooling) applied to each frame of a time sequence, producing one prediction per frame]
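Step 2 scores each proposal from CNN features; in Faster R-CNN each proposal is first reduced to a fixed-size grid by RoI pooling (the "ROI Pooling" block in the diagram). A minimal single-channel sketch of RoI max pooling, assuming integer proposal coordinates already mapped onto the feature map. The function name `roi_max_pool` is hypothetical and this is not the authors' code:

```python
from typing import List, Tuple

def roi_max_pool(feature: List[List[float]],
                 roi: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> List[List[float]]:
    """Max-pool one region of a 2-D feature map into a fixed out_h x out_w grid.

    roi = (x0, y0, x1, y1) in feature-map cells; x1 and y1 are exclusive.
    Assumes the region is at least 1 cell high and wide.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row bin i: floor of the start, ceil of the end, so bins are never empty
        r0 = y0 + (i * h) // out_h
        r1 = y0 + ((i + 1) * h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = x0 + ((j + 1) * w + out_w - 1) // out_w
            # Keep the strongest activation inside this bin
            row.append(max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled

if __name__ == "__main__":
    fmap = [[4 * r + c for c in range(4)] for r in range(4)]
    print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))  # [[5, 7], [13, 15]]
```

Whatever the proposal size, the output grid is always `out_h x out_w`, which is what lets a single classifier head score every region.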

Long Short-Term Memory (LSTM)

An LSTM layer is introduced into Faster R-CNN
  - memorizes long- and short-term information
  - applied only to Sitting Down
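The slides do not say which features feed the LSTM or how its output is turned into a score. The sketch below assumes one feature vector per frame and a sigmoid Sitting Down score per frame. `TinyLSTM` and its random, untrained weights are hypothetical; they only illustrate the standard LSTM recurrence that carries information from earlier frames into later ones:

```python
import math
import random
from typing import List

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Single-layer LSTM (hidden size `hidden`) followed by a sigmoid scorer."""

    def __init__(self, input_dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        # One weight matrix per gate: input (i), forget (f), candidate (g), output (o).
        # Each has shape hidden x (input_dim + hidden), plus a bias vector.
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden)]
                      for _ in range(hidden)] for k in "ifgo"}
        self.b = {k: [0.0] * hidden for k in "ifgo"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def _gate(self, k: str, z: List[float]) -> List[float]:
        # Affine map W_k z + b_k for gate k
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[k], self.b[k])]

    def scores(self, frames: List[List[float]]) -> List[float]:
        h = [0.0] * self.hidden  # short-term state (hidden output)
        c = [0.0] * self.hidden  # long-term state (cell memory)
        out = []
        for x in frames:
            z = x + h  # concatenate current frame feature with previous hidden state
            i = [sigmoid(a) for a in self._gate("i", z)]
            f = [sigmoid(a) for a in self._gate("f", z)]
            g = [math.tanh(a) for a in self._gate("g", z)]
            o = [sigmoid(a) for a in self._gate("o", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            out.append(sigmoid(sum(w * v for w, v in zip(self.w_out, h))))
        return out

if __name__ == "__main__":
    lstm = TinyLSTM(input_dim=3, hidden=4)
    print(lstm.scores([[0.1, 0.2, 0.3], [0.0, 1.0, 0.5], [0.4, 0.4, 0.4]]))
```

Here `hidden` plays the role of the number of LSTM units that the runs vary (4096 vs. 64).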

Multi-Frame and Multi-Shot (Yamamoto 2015)

・Multi-Frame Score Fusion: average pooling of scores over 5 frames in a shot

・Multi-Shot Score Boosting: add adjacent shot scores

Key-frame(I-frame)

Average
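The two re-scoring steps can be sketched as follows. Assumptions not stated on the slides: the frame scores of one shot arrive as a list for a single concept, "adjacent" means the immediately preceding and following shots, and neighbours are added with a hypothetical weight of 1.0. The function names are illustrative only:

```python
from typing import List

def fuse_shot(frame_scores: List[float], window: int = 5) -> float:
    """Multi-frame score fusion: average the scores of (up to) `window` frames of one shot."""
    used = frame_scores[:window]
    if not used:
        raise ValueError("a shot needs at least one frame score")
    return sum(used) / len(used)

def boost_shots(shot_scores: List[float], weight: float = 1.0) -> List[float]:
    """Multi-shot score boosting: add the (weighted) scores of the neighbouring shots."""
    boosted = []
    for k, s in enumerate(shot_scores):
        prev_s = shot_scores[k - 1] if k > 0 else 0.0
        next_s = shot_scores[k + 1] if k + 1 < len(shot_scores) else 0.0
        boosted.append(s + weight * (prev_s + next_s))
    return boosted

if __name__ == "__main__":
    shots = [fuse_shot(s) for s in ([0.1, 0.1, 0.1, 0.1, 0.1],
                                    [0.9, 0.7, 0.8, 0.6, 0.5],
                                    [0.2, 0.3, 0.1, 0.2, 0.2])]
    print(boost_shots(shots))
```

Averaging suppresses single-frame false alarms within a shot, while boosting lets a confident neighbouring shot raise a borderline one, since a concept often spans consecutive shots.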

Key-Frame Annotations

Bounding-box annotation on the representative key-frame of each shot labeled as positive in the collaborative annotation

Concept                  #frames   #boxes
Animal                    11,545    9,155
Bicycling                    599    1,355
Boy                        1,848    2,492
Dancing                    2,118    5,199
Explosion Fire             2,483    2,402
Instrumental Musician      4,923    7,229
Running                      945    1,394
Sitting Down                   -        -
Baby                         898      895
Skier                        320      521
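As a consistency check, summing the #boxes values of the key-frame table (our reading of the flattened layout) with the 515 Sitting Down boxes from the I-frame annotation reproduces the roughly 31K boxes quoted in the conclusion:

```python
# #boxes per concept from the key-frame annotation (Sitting Down is annotated separately)
key_frame_boxes = {
    "Animal": 9155, "Bicycling": 1355, "Boy": 2492, "Dancing": 5199,
    "Explosion Fire": 2402, "Instrumental Musician": 7229, "Running": 1394,
    "Baby": 895, "Skier": 521,
}
sitting_down_boxes = 515  # from the I-frame annotation for Sitting Down

total = sum(key_frame_boxes.values()) + sitting_down_boxes
print(total)  # 31157
```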

I-Frame Annotations for Sitting Down

・I-frame annotation for Sitting Down to train the LSTM
・Annotation results

#shots = 92, #frames = 481, #bounding-boxes = 515

*We found Sitting Down in only 92 of the 3K shots labeled as positive in the collaborative annotation

Results

[Figure: results chart; metrics: 0.5 iframe_fscore, mean_pixel_fscore]
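The slides do not define the metrics. For intuition only, a minimal sketch of a pixel-level F-score for one system box against one ground-truth box, under our assumption that pixel precision is the overlap area divided by the system box area and pixel recall is the overlap area divided by the ground-truth box area. The helper names are hypothetical, and the official evaluation may aggregate differently:

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; x1 and y1 exclusive

def area(b: Box) -> int:
    # Empty or inverted boxes have zero area
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def pixel_fscore(system: Box, truth: Box) -> float:
    """Harmonic mean of pixel precision and recall (assumed definitions, see lead-in)."""
    overlap = area((max(system[0], truth[0]), max(system[1], truth[1]),
                    min(system[2], truth[2]), min(system[3], truth[3])))
    if overlap == 0:
        return 0.0
    precision = overlap / area(system)
    recall = overlap / area(truth)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    print(pixel_fscore((0, 0, 10, 10), (0, 0, 10, 20)))
```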

TokyoTech Runs

ID   Method                                     Run ID
1*   Faster R-CNN + Multi-Frame Score Fusion    fusion
2*   1 + Multi-Shot Score Boosting              boost
3*   1 + LSTM (4096 units) for Sitting Down     fusion.lstm
4*   2 + LSTM (4096 units) for Sitting Down     boost.lstm
5    2 + LSTM (64 units) for Sitting Down       (post exp.)

・2nd among 3 teams

Results for Sitting Down

ID   Method                   I-Frame F-score   Pixel F-score
2*   Fusion + Boosting
4*   2 + LSTM (4096 units)
5    2 + LSTM (64 units)

0.63 0.00

0.22 0.00 4.51

Best result for Sitting Down with run #2. The LSTM with 4096 units (run #4) did not work → the LSTM with 64 units (run #5) avoided over-fitting and worked in a post-submission experiment
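One plausible reason the 4096-unit LSTM over-fitted is its size relative to the 481 annotated Sitting Down frames. A standard LSTM layer with input size d and hidden size h has 4(h(d + h) + h) parameters (four gates, each with input weights, recurrent weights, and a bias). Assuming d = 4096 (the slides do not state the input dimension), shrinking to 64 units cuts the count by about 126 times:

```python
def lstm_params(input_dim: int, hidden: int) -> int:
    """Parameter count of one standard LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (hidden * (input_dim + hidden) + hidden)

INPUT_DIM = 4096  # assumption: 4096-d per-frame feature (not stated on the slides)
for units in (4096, 64):
    print(units, lstm_params(INPUT_DIM, units))
# 4096 134234112
# 64 1065216
```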

Sitting Down

[Figure: good and bad cases (system output vs. ground truth) for the network re-trained with an LSTM of 64 units; panels: "Moving but not sitting down", "Moving around a chair", "Sitting down"]

Animal, Good Results

[Figure: system output vs. ground truth for Faster R-CNN, Score Fusion, and Score Boosting; examples: cat (no movement), dog (walking)]

Animal, Bad Results

[Figure: Faster R-CNN, Score Fusion, and Score Boosting outputs; examples: many animals, bird (flying fast)]

Others

[Figure: Faster R-CNN, Score Fusion, and Score Boosting outputs for Bicycling, Dancing, Explosion Fire, Instrumental Musician, and Running]

Conclusion & Future Work

・We proposed a localization system
  - Faster R-CNN + LSTM + Re-scoring

・Manual annotation
  - 31K bounding boxes

・Results
  - 2nd among 3 teams, best result for Sitting Down
  - LSTM with 64 units was effective for Sitting Down

・Future work
  - Find a better way to localize actions
