Developing the Deep Learning Library Coyote (CNN Edition)
Kotaro Tanahashi
2015.10.16 (Silver Week homework)
Previous implementation
• Implemented fully connected (FC) layers and logistic regression
• 97% accuracy on the MNIST dataset
[Figure: handwritten digit images -> fully connected layer -> fully connected layer -> logistic classifier -> labels (0, 1, 7, 3, …)]
What was done this time
• Redesigned the data structure
• Implemented a Convolutional Neural Network (CNN)
• 98% accuracy on the MNIST dataset
(figure: stackoverflow.com)
CNN structure
Characteristics of CNNs
[Figure (overall and in detail): RGB input (width=28, height=28, depth=3) -> convolution (filter size=5, depth=5) -> width=(28-5+1)=24, height=(28-5+1)=24, depth=5 -> pooling (pool size=2) -> width=24/2=12, height=24/2=12, depth=5]
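The shape arithmetic in the figure above can be sketched as two small helpers (a minimal sketch, assuming a 'valid' convolution and non-overlapping pooling with the border ignored, as the figure implies):

```python
def conv_out_size(size, filter_size):
    # 'valid' convolution: the output shrinks by filter_size - 1
    return size - filter_size + 1

def pool_out_size(size, pool_size):
    # non-overlapping pooling, border ignored
    return size // pool_size

# 28x28 input, 5x5 filter, 2x2 pooling, as in the figure
w = conv_out_size(28, 5)   # 28 - 5 + 1 = 24
w = pool_out_size(w, 2)    # 24 / 2 = 12
print(w)  # -> 12
```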
• Unlike FC layers, the data has a different, two-dimensional hierarchical structure
• The output dimensions depend on the dimensions of the input image (so the output dimensions are not obvious in advance)
FC interface (the previous FC implementation)
# make neural network
layers = []
layers.append(FullyConnect(n_in=28 * 28, n_out=100, activation='tanh'))
layers.append(FullyConnect(n_in=100, n_out=50, activation='tanh'))
layers.append(LogisticRegression(n_in=50, n_out=10))
The input dimension of each layer had to be specified by hand
-> the input dimension of each layer is now computed automatically
(that is, layers are constructed in order from the top)
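The idea of computing input dimensions automatically can be sketched like this (a hypothetical minimal model, not Coyote's actual classes: `Dense` and `connect` are illustrative names). Each layer declares only its output size, and the builder walks the list from the top, feeding each layer the previous layer's output size:

```python
class Dense(object):
    """Toy FC layer: knows only its output size until connected."""
    def __init__(self, n_out):
        self.n_in = None   # filled in automatically by connect()
        self.n_out = n_out

def connect(layers, n_input):
    # walk the layers top-down, propagating each output size
    # as the next layer's input size
    n_in = n_input
    for layer in layers:
        layer.n_in = n_in
        n_in = layer.n_out

layers = [Dense(100), Dense(50), Dense(10)]
connect(layers, 28 * 28)
print([(l.n_in, l.n_out) for l in layers])
# -> [(784, 100), (100, 50), (50, 10)]
```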
CNN interface

# make neural network
layers = []
layers.append(ImageInput(out_sx=28, out_sy=28, out_depth=1))
layers.append(LeNetConvPoolLayer(out_depth=7, filter_size=5))
layers.append(LeNetConvPoolLayer(out_depth=3, filter_size=5))
layers.append(Flatten())
layers.append(LogisticRegression(n_out=10))
• A CNN needs only the minimal settings: out_depth and filter_size
• For convenience, data is divided into (1) image vectors and (2) 1-dim vectors
• Flatten converts (1) -> (2)
• Images are fed in through ImageInput
Philosophy
Write it casually and it will still run!!
Like Lego
Just one simple rule: arrows of the same kind can be connected to each other
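The "same kind of arrow" rule can be sketched as a type-compatibility check (a hypothetical sketch: the kind table and `check_chain` are illustrative, not part of Coyote's API; the in/out kinds follow the layer diagram):

```python
# (consumes, produces) kind of each layer; None = source layer
LAYER_KINDS = {
    'ImageInput':         (None, 'image'),
    'LeNetConvPoolLayer': ('image', 'image'),
    'Flatten':            ('image', '1dim'),
    'FullyConnect':       ('1dim', '1dim'),
    'LogisticRegression': ('1dim', '1dim'),
}

def check_chain(names):
    # adjacent layers connect only if the arrow kinds match
    for a, b in zip(names, names[1:]):
        out_kind = LAYER_KINDS[a][1]
        in_kind = LAYER_KINDS[b][0]
        if out_kind != in_kind:
            raise TypeError('%s -> %s: %s arrow does not match %s'
                            % (a, b, out_kind, in_kind))

# valid chain: image arrows, then Flatten, then 1-dim arrows
check_chain(['ImageInput', 'LeNetConvPoolLayer', 'Flatten',
             'LogisticRegression'])
```

Connecting `ImageInput` directly to `LogisticRegression` would raise, because an image arrow cannot plug into a 1-dim input.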
[Figure: connection diagram. Input layers: Input, ImageInput, SequenceInput. Layers: FullyConnect, LeNetConv, Flatten, LogisticRegression, RNN (with Repeat). Arrow kinds: image vector, 1-dim vector, sequence vector]
Layer class

class Layer(object):
    def __init__(self):
        self.input = T.matrix()
        self.output = None
        self.batch_size = 1
        self.in_sx = 1
        self.in_sy = 1
        self.in_depth = 1

    # shape convention: (batch, depth, sx, sy), matching set_in_shape
    def in_shape(self):
        return self.batch_size, self.in_depth, self.in_sx, self.in_sy

    def out_shape(self):
        return self.batch_size, self.in_depth, self.in_sx, self.in_sy

    def set_in_shape(self, pre_out_shape):
        self.batch_size = pre_out_shape[0]
        self.in_depth = pre_out_shape[1]
        self.in_sx = pre_out_shape[2]
        self.in_sy = pre_out_shape[3]
Every neural net layer is built by inheriting the Layer class
Example: Flatten

class Flatten(Layer):
    def __init__(self):
        super(Flatten, self).__init__()
        self.input = T.matrix()

    def out_shape(self):
        return self.batch_size, self.in_sx * self.in_sy * self.in_depth, 1, 1

    def set_input_image_shape(self, image_shape):
        self.image_shape = image_shape
        self.n_out = self.image_shape[2] * self.image_shape[3] * self.in_depth

    def get_output(self):
        tmp = self.input.flatten(3)
        return tmp.reshape((self.batch_size,
                            self.in_sx * self.in_sy * self.in_depth, 1, 1))
Along with the conversion from (1) image vectors to (2) 1-dim vectors, sx and sy in the out shape are set to 1
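The shape change Flatten performs can be sketched with NumPy in place of Theano (a minimal sketch, assuming the (batch, depth, sx, sy) layout used above): a 4-D image tensor becomes a (batch, depth * sx * sy, 1, 1) "1-dim vector" per sample, with the same elements.

```python
import numpy as np

# a toy batch of 2 images, depth 3, 12x12 (the post-pooling size above)
batch, depth, sx, sy = 2, 3, 12, 12
x = np.arange(batch * depth * sx * sy).reshape(batch, depth, sx, sy)

# flatten each sample to a 1-dim vector, keeping the trailing 1x1 dims
flat = x.reshape(batch, depth * sx * sy, 1, 1)
print(flat.shape)  # -> (2, 432, 1, 1)
```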
Example: LeNetConvPoolLayer

class LeNetConvPoolLayer(Layer):
    """Conv + pool layer of a convolutional network"""

    def __init__(self, out_depth, filter_size, poolsize=(2, 2)):
        super(LeNetConvPoolLayer, self).__init__()
        self.image_shape = None
        self.in_depth = None
        self.input = T.matrix()
        self.rng = np.random.RandomState(4711)
        self.poolsize = poolsize
        self.filter_size = filter_size
        self.out_depth = out_depth

    def init_params(self):
        self.filter_shape = (self.out_depth, self.in_depth,
                             self.filter_size, self.filter_size)
        # there are "num input feature maps * filter height * filter width"
        # inputs to each hidden unit
        fan_in = np.prod(self.filter_shape[1:])
        # each unit in the lower layer receives a gradient from:
        # "num output feature maps * filter height * filter width" / pooling size
        fan_out = (self.filter_shape[0] * np.prod(self.filter_shape[2:]) /
                   np.prod(self.poolsize))
        # initialize weights with random weights
        W_bound = np.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(
            np.asarray(
                self.rng.uniform(low=-W_bound, high=W_bound,
                                 size=self.filter_shape),
                dtype=theano.config.floatX
            ),
            borrow=True
        )
        # the bias is a 1D tensor -- one bias per output feature map
        b_values = np.zeros((self.filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)
        # store parameters of this layer
        self.params = [self.W, self.b]

    def set_input_image_shape(self, image_shape):
        self.image_shape = image_shape

    def get_output_image_shape(self):
        conv_out_shape = (self.image_shape[2] - self.filter_size + 1,
                          self.image_shape[3] - self.filter_size + 1)
        return (self.batch_size, self.out_depth,
                conv_out_shape[0] / self.poolsize[0],
                conv_out_shape[1] / self.poolsize[1])

    def out_shape(self):
        conv_out_shape = (self.in_sx - self.filter_size + 1,
                          self.in_sy - self.filter_size + 1)
        return (self.batch_size, self.out_depth,
                conv_out_shape[0] / self.poolsize[0],
                conv_out_shape[1] / self.poolsize[1])

    def get_output(self):
        # convolve input feature maps with filters
        self.conv_out = conv.conv2d(
            input=self.input,
            filters=self.W,
            filter_shape=self.filter_shape,
            image_shape=self.image_shape
        )
        # downsample each feature map individually, using max-pooling
        self.pooled_out = downsample.max_pool_2d(
            input=self.conv_out,
            ds=self.poolsize,
            ignore_border=True
        )
        return T.tanh(self.pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
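The weight-initialization bound in init_params is the Glorot/Xavier uniform heuristic; it can be checked in isolation with plain NumPy (a minimal sketch, using the example sizes from the figure: filter 5x5, input depth 3, output depth 5, 2x2 pooling):

```python
import numpy as np

out_depth, in_depth, filter_size = 5, 3, 5
poolsize = (2, 2)
filter_shape = (out_depth, in_depth, filter_size, filter_size)

# inputs to each hidden unit
fan_in = np.prod(filter_shape[1:])                      # 3 * 5 * 5 = 75
# gradient contributions per lower-layer unit, reduced by pooling
fan_out = filter_shape[0] * np.prod(filter_shape[2:]) / np.prod(poolsize)

W_bound = np.sqrt(6.0 / (fan_in + fan_out))
W = np.random.uniform(-W_bound, W_bound, size=filter_shape)
print(W.shape)  # -> (5, 3, 5, 5)
```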
Overview of the model class

# set batch size at the initial input layer
self.layers[0].set_batchsize(batch_size)

for i, layer in enumerate(self.layers[1:]):
    pre_layer = self.layers[i]
    layer.set_in_shape(pre_layer.out_shape())

# init parameters
for layer in self.layers:
    layer.init_params()

# set input and output
self.layers[0].setInput(self.x2)
for i, layer in enumerate(self.layers[1:]):
    layer.setInput(self.layers[i].get_output())

for layer in self.layers:
    if hasattr(layer, 'params'):
        try:
            self.params += layer.params
        except AttributeError:
            self.params = layer.params

self.cost = self.layers[-1].get_cost(self.y)
self.grads = T.grad(self.cost, self.params)
self.updates = [
    (param_i, param_i - self.learning_rate * grad_i)
    for param_i, grad_i in zip(self.params, self.grads)
]
<- compute shapes (dimensions) in order from the top
<- initialize each layer's parameters
<- assign each upper layer's output to the lower layer's input
<- collect the parameters of all layers
<- take the cost from the classification layer
<- specify how the parameters are updated
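The update rule built at the end is plain gradient descent; its effect can be sketched without Theano (a minimal NumPy sketch: `sgd_updates` is an illustrative name, not part of the model class):

```python
import numpy as np

def sgd_updates(params, grads, learning_rate):
    # param <- param - learning_rate * grad, for every parameter
    return [p - learning_rate * g for p, g in zip(params, grads)]

params = [np.array([1.0, 2.0])]
grads = [np.array([0.5, -0.5])]
print(sgd_updates(params, grads, 0.1))  # -> [array([0.95, 2.05])]
```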
To be continued