NeurIPS 2019 Vancouver Dec. 8-14, 2019 GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou State Key Lab of CAD&CG, ZJU-Sensetime Joint Lab of 3D Vision, Zhejiang University 1. Background 2. Group Feature & Equivariance Property Problem: How to make image feature descriptors invariant to scaling and rotation? Can we learn an invariant descriptor with theoretical guarantees? 1) Equivariance definition 4. Pipeline 3. Group Convolution & Bilienar Pooling 5. Results 1) Sparse correspondence estimation 2) Dense correspondence estimation GIFT SuperPoint[3] GIFT Reference VCNN Daisy[4] 3) Robustness of GIFT 1) Group convolution i) encodes local structures of group feature ii) preserves equivariance property 2) Bilinear Pooling GIFT descriptor Group Feature 1 Group Feature 2 Advantages over Max/Average Pooling: i) Second order statistics are more informative. ii) Generalized form of previous descriptors [1, 2, 3]. [1] Tal Hassner, et al. On sifts and their scales. In CVPR, 2012. [2] Zhenhua Wang, et al. Affine subspace representation for feature description. In ECCV,2014. [3] Tsun-Yi Yang, et al. Accumulated stability voting: A robust descriptor fromdescriptors of multiple scales. In CVPR, 2016. [4] Daniel DeTone, et al. SuperPoint: Self-supervised interest pointdetection and description. In CVPR Workshops, 2018. [5] James Philbin, et al. Descriptor learning for efficient retrieval. In ECCV, 2010. Only few covariant regions can be reliably detected 1) Traditional DoG detector Convolutions are not invariant to geometric transformation naturally. 2) CNN-based descriptor Small scaling and rotation Large scaling and rotation group feature Local structure of group feature 2) Equivariance property of group feature extraction 3) Invariance of local structure ( ) () g g fT I T fI I g T I ( ) g fT I () fI f f g T g T I input data. f feature extractor g T transformation of input g T transformation of feature Equivariance Transformation results in permutation.

NeurIPS 2019 Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng ... · [2] Zhenhua Wang, et al. Affine subspace representation for feature description. In ECCV,2014. [3] Tsun-Yi Yang,

Download PDF Report

Upload
others
View
1
Download
0

Embed Size (px)

Citation preview

Page 1: NeurIPS 2019 Yuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng ... · [2] Zhenhua Wang, et al. Affine subspace representation for feature description. In ECCV,2014. [3] Tsun-Yi Yang,

NeurIPS 2019Vancouver

Dec. 8-14, 2019

GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNsYuan Liu, Zehong Shen, Zhixuan Lin, Sida Peng, Hujun Bao, Xiaowei Zhou

State Key Lab of CAD&CG, ZJU-Sensetime Joint Lab of 3D Vision, Zhejiang University

1. Background

2. Group Feature & Equivariance Property

Problem: How to make image feature descriptors invariant to scaling and rotation?

Can we learn an invariant descriptor with theoretical guarantees?

1) Equivariance definition

4. Pipeline

3. Group Convolution & Bilienar Pooling 5. Results

1) Sparse correspondence estimation

2) Dense correspondence estimation

GIFT SuperPoint[3]

GIFTReference VCNN Daisy[4]

3) Robustness of GIFT

1) Group convolution

i) encodes local structures of group feature ii) preserves equivariance property

2) Bilinear Pooling

GIFT descriptor

Group Feature 1

Group Feature 2

Advantages over Max/Average Pooling:i) Second order statistics are more informative.ii) Generalized form of previous descriptors [1, 2, 3].

[1] Tal Hassner, et al. On sifts and their scales. In CVPR, 2012.[2] Zhenhua Wang, et al. Affine subspace representation for feature description. In ECCV,2014.[3] Tsun-Yi Yang, et al. Accumulated stability voting: A robust descriptor fromdescriptors of multiple scales. In CVPR, 2016.

[4] Daniel DeTone, et al. SuperPoint: Self-supervised interest pointdetection and description. In CVPR Workshops, 2018.[5] James Philbin, et al. Descriptor learning for efficient retrieval. In ECCV, 2010.

Only few covariant regions can be reliably detected

1) Traditional DoG detector

Convolutions are not invariant to geometric transformation naturally.

2) CNN-based descriptor

Small scaling

and rotation

Large scaling

and rotation

group feature

Local structure of

group feature

2) Equivariance property of group feature extraction 3) Invariance of local structure