AI Advanced Learning Program: Convolution Neural Network
Computer vision
Vision and Learning Laboratory, 강준규
2020.11.02 ~ 2020.11.05
Introduction
◼ Source: cs231n.stanford.edu
Contents
◼ What is computer vision?
◼ Understanding images (e.g., RGB channels)
◼ Convolution neural network example
◼ Filter
◼ Why Convolution neural network?
◼ Convolution neural network
◼ Convolution layer
◼ Stride, padding, pooling
◼ Receptive field
◼ ImageNet
◼ VGG, Resnet, Googlenet
◼ QnA & Training
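What is computer vision?
Computer vision means the way a computer sees.
In other words, it is the field that studies how, and by what methods, to analyze the images and videos fed into a computer and find what is in them.
Source: https://medium.com/@miccowang/computer-vision-the-closet-thing-to-ai-on-our-personal-device-d2ff63994856
Source: https://www.annalect.com/7-ways-computer-vision-helps-marketers-see-better-performance/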
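What is computer vision?
For a computer to see an object, the object has to be stored in a form the computer can read, and there are various storage formats.
A typical image uses RGB channels: each channel is filled with numbers from 0 to 255, which turns the image into numbers.
Source: https://qits.tistory.com/entry/RGB%EC%83%89%EC%83%81%ED%91%9C
[Figure: R, G, B channels]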
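As a small illustration of storing an image as numbers, the sketch below represents a 2x2 RGB image as nested Python lists and splits it into its three channels. The pixel values and the helper name `split_channels` are made up for this example.

```python
# A tiny 2x2 RGB image: each pixel is an (R, G, B) triple of integers in 0..255.
# The pixel values are invented for illustration.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def split_channels(img):
    """Return three 2D grids (R, G, B), each holding one number per pixel."""
    return tuple(
        [[pixel[c] for pixel in row] for row in img]
        for c in range(3)
    )


red, green, blue = split_channels(image)
print("R:", red)
print("G:", green)
print("B:", blue)
```

Each channel is just a grid of intensities, which is the form the filters in the following slides operate on.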
Convolution neural network example
Filter?
Filter (3x3):
 0 -1  0
-1  8 -1
 0 -1  0
Image patch 1:
18  8  9
19 12  7
13 14 30
Filter x patch 1 (element-wise):
  0  -8   0
-19  96  -7
  0 -14   0
Image patch 2:
 8  9  6
12  7 17
14 30 12
Filter x patch 2 (element-wise):
  0  -9   0
-12  56 -17
  0 -30   0
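The element-wise products on this slide can be checked with a few lines of Python. This is a minimal sketch using the filter and the two patches shown above; the function names are chosen for this example, and summing the products into a single response per patch is the usual filtering step.

```python
kernel = [
    [0, -1, 0],
    [-1, 8, -1],
    [0, -1, 0],
]
patch_1 = [[18, 8, 9], [19, 12, 7], [13, 14, 30]]
patch_2 = [[8, 9, 6], [12, 7, 17], [14, 30, 12]]


def elementwise_product(patch, kern):
    """Multiply each pixel by the filter value at the same position."""
    return [
        [p * k for p, k in zip(patch_row, kern_row)]
        for patch_row, kern_row in zip(patch, kern)
    ]


def filter_response(patch, kern):
    """Sum the element-wise products into one output value."""
    return sum(sum(row) for row in elementwise_product(patch, kern))


products_1 = elementwise_product(patch_1, kernel)
products_2 = elementwise_product(patch_2, kernel)
print(products_1, filter_response(patch_1, kernel))
print(products_2, filter_response(patch_2, kernel))
```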
Filter?
[Figure: the element-wise product grids from successive patches, such as (0 -8 0 / -19 96 -7 / 0 -14 0) and (0 -9 0 / -12 56 -17 / 0 -30 0), are combined with "+" to produce the filtered output]
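Filter?
[Figure: four panels, each labeled Filter]
With filters chosen by a person, we can apply special effects to a photo.
Then could a machine find filters automatically and build specialized filters on its own?
Ex) a filter that finds eyes, a filter that finds corners, a filter in some special dimension that people are not aware of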
Convolution
Image 3x3x3 (channels R, G, B), one 3x3 grid per channel:
4 3 5    4 3 5    𝑥1 𝑥2 𝑥3
2 9 3    7 9 3    𝑥4 𝑥5 𝑥6
2 5 2    1 1 8    𝑥7 𝑥8 𝑥9
Filter 2x2x3, one 2x2 grid per channel:
1 2    1 2    𝑤1 𝑤2
2 1    2 1    𝑤3 𝑤4
Activation map 2x2x1, first entry 𝑦1:
𝑦1 = 𝑤1𝑥1 + 𝑤2𝑥2 + 𝑤4𝑥4 + 𝑤5𝑥5 + 𝑤10𝑥10 + 𝑤11𝑥11 + 𝑤13𝑥13 + 𝑤14𝑥14 + 𝑤19𝑥19 + 𝑤20𝑥20 + 𝑤22𝑥22 + 𝑤23𝑥23
Why Convolution neural network?
How many weights does the model have to find?
16 (input size) x 4 (hidden nodes) + 4 (hidden nodes) x 2 (outputs) = 72
Now suppose the input size is 224 x 224,
there are 3 hidden layers
with 1024, 2048, and 4096 nodes respectively,
and the output has 1000 classes.
How many weights are needed?
50176 x 1024 + 1024 x 2048 + 2048 x 4096 + 4096 x 1000 = 65,961,984
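The two counts on this slide follow one rule: every pair of adjacent fully connected layers contributes (inputs x outputs) weights. A minimal sketch, with biases left out as on the slide; the function name is chosen for this example.

```python
def dense_weight_count(layer_sizes):
    """Weights in a fully connected network, biases excluded.

    layer_sizes lists the node count of every layer, input first.
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


small = dense_weight_count([16, 4, 2])
large = dense_weight_count([224 * 224, 1024, 2048, 4096, 1000])
print(small)  # 72
print(large)  # 65961984
```

Almost all of the 65,961,984 weights come from the first layer (50176 x 1024), because every input pixel is connected to every hidden node.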
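Why Convolution neural network?
Input size 224x224
3 convolution layers, kernel size 3x3
Channel sizes 128, 256, 512
Last fully connected layer 1024
Output classes 1000
9 x 128 + 9 x 256 + 9 x 512 + 1024 x 1000 = 1,032,064
(This is a simplified count: 9 weights per filter, ignoring input-channel depth, biases, and the weights feeding the 1024-node layer.)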
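The sketch below reproduces the slide's count and, for comparison, a count that also multiplies by the number of input channels of each convolution layer. The 3 input channels (an RGB image) are an assumption not stated on the slide, biases are excluded, and the function names are chosen for this example.

```python
def simplified_conv_count(kernel_size, out_channels):
    """The slide's count: kernel_size^2 weights per output channel."""
    return sum(kernel_size * kernel_size * c for c in out_channels)


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Count kernel_size^2 * C_in * C_out weights per layer, biases excluded."""
    total = 0
    c_in = in_channels
    for c_out in out_channels:
        total += kernel_size * kernel_size * c_in * c_out
        c_in = c_out  # this layer's output feeds the next layer
    return total


slide_total = simplified_conv_count(3, [128, 256, 512]) + 1024 * 1000
full_conv = conv_weight_count(3, 3, [128, 256, 512])  # assumes RGB input
print(slide_total)  # 1032064
print(full_conv)    # 1478016
```

Either way the convolution layers stay far below the fully connected count, because a filter's weights are reused at every image position instead of being tied to one pixel.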
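Convolution neural network
Fully connected layer
Image 3x3x3 (channels R, G, B), one 3x3 grid per channel:
4 3 5    4 3 5    3 3 2
2 9 3    7 9 3    2 9 1
2 5 2    1 1 8    2 5 1
Flattened vector (27 values, as shown): 3 2 5 6 2 3 4 2 4 3 9 6 4 5 1 1 3 3 2 7 4 3 1 1 3 9 1
[Figure: the 27 inputs 𝑥1, 𝑥2, …, 𝑥27 are each paired with a weight 𝑤1, 𝑤2, …, 𝑤27; two weight vectors of length 27 are drawn]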
Convolution layer
Image 3x3x3 (channels R, G, B), one 3x3 grid per channel:
4 3 5    4 3 5    𝑥1 𝑥2 𝑥3
2 9 3    7 9 3    𝑥4 𝑥5 𝑥6
2 5 2    1 1 8    𝑥7 𝑥8 𝑥9
Filter 2x2x3, one 2x2 grid per channel:
1 2    1 2    𝑤1 𝑤2
2 1    2 1    𝑤3 𝑤4
Activation map 2x2x1:
𝑦1 𝑦2
𝑦3 𝑦4
𝑦1 = 𝑤1𝑥1 + 𝑤2𝑥2 + 𝑤4𝑥4 + 𝑤5𝑥5 + 𝑤10𝑥10 + 𝑤11𝑥11 + 𝑤13𝑥13 + 𝑤14𝑥14 + 𝑤19𝑥19 + 𝑤20𝑥20 + 𝑤22𝑥22 + 𝑤23𝑥23
Convolution layer
Image 3x3x3 (channels R, G, B), one 3x3 grid per channel:
4 3 5    4 3 5    3 3 2
2 9 3    7 9 3    2 9 1
2 5 2    1 1 8    2 5 1
Filter 2x2x3, one 2x2 grid per channel:
1 2    1 2    1 2
2 1    2 1    2 1
Activation map 2x2x1, filled in one entry at a time (42, then 56, …):
42 56
17 95
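The sliding-window computation can be sketched in plain Python as below. It uses the three 3x3 channel grids from the slide and assumes the 2x2 grid 1 2 / 2 1 for every channel of the filter, so the printed map is this sketch's own result and need not match the numbers drawn on the slide. The function name `convolve` is chosen for this example.

```python
channels = [
    [[4, 3, 5], [2, 9, 3], [2, 5, 2]],
    [[4, 3, 5], [7, 9, 3], [1, 1, 8]],
    [[3, 3, 2], [2, 9, 1], [2, 5, 1]],
]
# Assumption: the same 2x2 grid is used for each of the 3 filter channels.
filter_slices = [[[1, 2], [2, 1]] for _ in range(3)]


def convolve(image_channels, kernel_slices, stride=1):
    """Slide a multi-channel filter over the image; one number per position."""
    size = len(image_channels[0])
    k = len(kernel_slices[0])
    out_size = (size - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            # Sum over every channel and every cell of the filter window.
            for channel, kernel in zip(image_channels, kernel_slices):
                for di in range(k):
                    for dj in range(k):
                        pixel = channel[i * stride + di][j * stride + dj]
                        total += pixel * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


activation_map = convolve(channels, filter_slices)
print(activation_map)  # [[78, 94], [86, 74]]
```

A 2x2x3 filter covers 12 input values at each position, which matches the 12 terms in the formula for 𝑦1, and a 3x3 image leaves room for 2x2 positions, hence the 2x2x1 activation map.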
Convolution layer
Image 3x3x3 (channels R, G, B), one 3x3 grid per channel:
4 3 5    4 3 5    3 3 2
2 9 3    7 9 3    2 9 1
2 5 2    1 1 8    2 5 1
Second filter 2x2x3, one 2x2 grid per channel:
1 2    1 2     8 21
2 1    2 1    -2 -1
Activation map 2x2x1, filled in one entry at a time (17, then -5, …):
17 -5
 1 35
Convolution layer
Image 3x3x3 (channels R, G, B), one 3x3 grid per channel:
4 3 5    4 3 5    3 3 2
2 9 3    7 9 3    2 9 1
2 5 2    1 1 8    2 5 1
Two filters, each 2x2x3:
Filter 1:  (1 2 / 2 1), (1 2 / 2 1), (1 2 / 2 1)
Filter 2:  (1 2 / 2 1), (1 2 / 2 1), (8 21 / -2 -1)
Each filter produces its own activation map 2x2x1:
42 56    17 -5
17 95     1 35
Convolution layer
One number: the result of applying the filter at a single position
Convolution layer stride
7x7 image / 3x3 filter / stride 1
…
7x7 image / 3x3 filter / stride 2
…
Output activation map, 7x7 image / 3x3 filter / stride 1
Output activation map, 7x7 image / 3x3 filter / stride 2
Output activation map, 7x7 image / 3x3 filter / stride 3
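To see what the stride does, the sketch below lists the positions at which a 3x3 filter can start on a 7x7 image for strides 1, 2, and 3. The function name is chosen for this example.

```python
def window_starts(image_size, filter_size, stride):
    """Start indices where the filter still fits entirely inside the image."""
    return list(range(0, image_size - filter_size + 1, stride))


results = {}
for stride in (1, 2, 3):
    starts = window_starts(7, 3, stride)
    # The filter tiles the image evenly only if the last window ends at the edge.
    covers_edge = starts[-1] + 3 == 7
    results[stride] = (starts, covers_edge)
    print(f"stride {stride}: starts {starts}, "
          f"{len(starts)}x{len(starts)} map, reaches last column: {covers_edge}")
```

Strides 1 and 2 give 5x5 and 3x3 activation maps. With stride 3 the windows start at 0 and 3 and never reach the last column, so a 3x3 filter does not fit a 7x7 image evenly at that stride.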
Convolution layer padding
[Figure: a 7x7 image surrounded by a one-pixel border of zeros, giving a 9x9 padded input]
7x7 image / 3x3 filter / stride 1 / padding 1
Output activation map, 7x7 image / 3x3 filter / stride 1 / padding 1
Output activation map, 7x7 image / 3x3 filter / stride 2 / padding 1
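Zero padding itself is simple to write down. A minimal sketch, assuming a single-channel image stored as a list of rows; the 7x7 image of ones is a stand-in for real pixel data and the function name is chosen for this example.

```python
def zero_pad(image, pad):
    """Surround a 2D image with `pad` rows and columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    bottom = [[0] * width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    return top + middle + bottom


image = [[1] * 7 for _ in range(7)]  # placeholder 7x7 image
padded = zero_pad(image, 1)
for row in padded:
    print(" ".join(str(v) for v in row))
```

The padded input is 9x9, so a 3x3 filter with stride 1 now has 7 start positions per axis and the activation map keeps the 7x7 size of the original image.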
Convolution layer pooling
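Pooling shrinks an activation map by summarizing each small window with one value. The sketch below assumes 2x2 max pooling with stride 2, since the slide does not say which pooling variant it shows; the 4x4 input values are made up and the function name is chosen for this example.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_rows = (len(feature_map) - size) // stride + 1
    out_cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

Unlike a convolution filter, pooling has no weights to learn; it only reduces the spatial size.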
Filter size formula
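Assuming this slide refers to the usual relation between input size N, filter size F, stride S, and padding P, the output size is (N + 2P - F) / S + 1. A minimal sketch that returns `None` when the filter does not fit evenly; the function name and that convention are choices made for this example.

```python
def output_size(n, f, stride=1, padding=0):
    """Output width of a convolution: (N + 2P - F) / S + 1.

    Returns None when the filter does not tile the padded input evenly.
    """
    span = n + 2 * padding - f
    if span < 0 or span % stride != 0:
        return None
    return span // stride + 1


print(output_size(7, 3, stride=1))              # 5
print(output_size(7, 3, stride=2))              # 3
print(output_size(7, 3, stride=3))              # None
print(output_size(7, 3, stride=1, padding=1))   # 7
print(output_size(7, 3, stride=2, padding=1))   # 4
```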
Classification
Cat 0.7
Dog 0.08
Car 0.01
Bird 0.01
Lion 0.2
Model
Detection
1
2
3 22
4
Model
Segmentation
1 0 1 1 0 1
1 0 1 1 0 1
1 1 1 1 1 1
2 1 1 1 1 1
1 1 1 1 0 1
2 1 2 2 0 1
2 2 2 2 0 2
0 0 2 2 0 2
Model
Convolution neural network
[Figure: stacked filters; the first filters ask "circle?" and "horizontal line?", later filters ask "eye?", "whiskers?", and "paw?", and the output is "cat"]
convolution layer 1: simple features
convolution layer 2: high-level features
convolution layer 3: very high-level features
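Receptive field
3x3 filter conv, then fully connected layer
Cat: 78%, Lion: 20%, Dog: 2%
In this case the model makes its decision from information within a 3x3 region only.
That is, it does not gather information from a wider area; it predicts from what it sees inside each 3x3 region.
Ex) "It has ears and it also has paws": the model cannot know this, because the ears and the paws are more than 3x3 apart.
3x3 filter conv, 3x3 filter conv, then fully connected layer
Cat: 78%, Lion: 20%, Dog: 2%
In this case the model predicts from information within a 5x5 region.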
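The 3x3 and 5x5 figures can be computed by walking through the layers: each layer widens the receptive field by (kernel size - 1) times the spacing between neighboring outputs so far. A minimal sketch, assuming square kernels and no dilation; the function name is chosen for this example.

```python
def receptive_field(layers):
    """Receptive field of one output value after a stack of conv layers.

    layers is a list of (kernel_size, stride) pairs, first layer first.
    """
    field = 1  # a single input pixel
    jump = 1   # distance in input pixels between neighboring outputs
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


print(receptive_field([(3, 1)]))                  # 3
print(receptive_field([(3, 1), (3, 1)]))          # 5
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
```

Every extra 3x3 layer with stride 1 adds 2 pixels to the region a single output can see.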
Receptive field
5x5 filter conv
3x3 filter conv, 3x3 filter conv
What is the difference between the two?
Pros and cons
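One commonly cited difference is the weight count: both options see a 5x5 region, but two 3x3 layers use fewer weights than one 5x5 layer. The sketch below assumes 64 input and 64 output channels for every layer, a value picked only for illustration, and excludes biases; the function name is chosen for this example.

```python
def stacked_conv_weights(kernel_sizes, channels):
    """Weights in a stack of conv layers with the same channel count throughout."""
    return sum(k * k * channels * channels for k in kernel_sizes)


channels = 64  # assumed channel count, not from the slide
single_5x5 = stacked_conv_weights([5], channels)
double_3x3 = stacked_conv_weights([3, 3], channels)
print(single_5x5)  # 102400
print(double_3x3)  # 73728
```

The two-layer stack needs 18 weights per channel pair instead of 25 and can place a nonlinearity between its layers, at the cost of one more layer to compute and store.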
Dilated convolution
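LeNet-5
Parameter size ≒ 60,000
LeNet is an early CNN model from 1998. It was developed by Yann LeCun, who is currently a professor at New York University and a technical director at Facebook's research lab. It was originally built to recognize handwriting on postal codes and checks.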
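ImageNet
ImageNet is the well-known dataset used by ILSVRC, a computer vision competition. The competition ran from 2010 to 2017, and the dataset is still one of the most widely used in papers today.
It contains 1,000 classes and a total of 1,281,167 training images, and takes up more than 200 GB.
ILSVRC: ImageNet Large Scale Visual Recognition Challenge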
AlexNet
Parameter size ≒ 62,000,000
AlexNet is the CNN architecture that won ILSVRC 2012. Notably, it follows the structure of LeNet-5 described earlier, but as computing power improved, the model was designed to run in parallel on 2 GPUs.
ImageNet Top-5 error: 15.3%
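VGG
Ex) VGG16 (D type)
Parameter size ≒ 138,000,000
VGG is the CNN architecture that took 2nd place at ILSVRC 2014. Notably, it has a simpler structure than the winner, GoogleNet, yet the performance gap is small and it is easy to adapt.
ImageNet Top-5 error: 7.5%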
GoogleNet
GoogleNet is the CNN architecture that won ILSVRC 2014. Notably, it uses the Inception module.
ImageNet Top-5 error: 6.7%
[Figure: Inception module]
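Resnet
[Figure: structure of an ordinary network vs. structure of a residual network]
Resnet is the CNN architecture that won ILSVRC 2015. Notably, unlike earlier models it uses the residual concept, which made it possible to design much deeper networks.
ImageNet Top-5 error: 3.57%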
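The residual idea can be sketched as "output = F(x) + x": the block learns a correction F(x) that is added back to its own input. The sketch below swaps the convolutions for small fully connected layers to stay short, so it illustrates the skip connection only, not the actual ResNet layers; all names and values are made up for this example.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def linear(weights, values):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def plain_block(x, w1, w2):
    """Ordinary two-layer block: the output depends only on the weights' path."""
    return relu(linear(w2, relu(linear(w1, x))))


def residual_block(x, w1, w2):
    """Residual block: the input is added back to F(x) before the activation."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]  # a block that has learned nothing yet
print(plain_block(x, zeros, zeros))     # [0.0, 0.0, 0.0]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
```

With all-zero weights the plain block wipes out its input while the residual block passes it through unchanged, which is one intuition for why stacking many residual blocks is easier to train.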
Q & A
◼ Thank You