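Deep Learning Implementation Techniques “using Caffe”
2016-11-11
DGIST Future Automotive Convergence Research Center
Heechul Jung
2016 Institute of Embedded Engineering of Korea (IEMEK) Fall Conference Tutorial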
Learning Deep Learning. (Research-oriented)
Install Linux (Ubuntu, Fedora, ...).
Read papers (arXiv, CVPR, ICCV, NIPS, ICML, ICLR).
Install a deep learning tool (Caffe, Torch, TensorFlow, ...).
Learn the deep learning tool.
Reproduce state-of-the-art algorithms (baseline).
1 day
several weeks
1 day
1 day
Implementing Idea.
few weeks
Writing a paper?
Real-time Object Recognition
CPU: Intel i5-4690 CPU 3.5GHz
RAM: 18GB
GPU: NVIDIA GeForce GTX 770
How to use Caffe.
What is Caffe?
Open framework, models, and worked examples for deep learning
- < 2 years
- 600+ citations, 100+ contributors, 6,000+ stars
- 3,400+ forks
- focus has been vision, but branching out: sequences, reinforcement learning, speech + text
Prototype Train Deploy
What is Caffe?
Open framework, models, and worked examples for deep learning
- Pure C++ / CUDA architecture for deep learning
- Command line, Python, MATLAB interfaces
- Fast, well-tested code
- Tools, reference models, demos, and recipes
- Seamless switch between CPU and GPU
Prototype Train Deploy
Caffe is a Community project pulse
Open Model Collection
The Caffe Model Zoo
open collection of deep models to share innovation
- VGG ILSVRC14 + Devil models in the zoo
- Network-in-Network / CCCP model in the zoo
- MIT Places scene recognition model in the zoo
help disseminate and reproduce research
bundled tools for loading and publishing models
Share Your Models! with your citation + license of course
Recipe for Brewing
Buy NVIDIA graphics cards.
Install Caffe.
Convert the data to a Caffe format:
lmdb, leveldb, hdf5 / .mat, list of images, etc.
Define the Net (prototxt).
Configure the Solver (prototxt).
caffe train -solver solver.prototxt -gpu 0
Examples are your friends:
caffe/examples/mnist, cifar10, imagenet
caffe/examples/*.ipynb
caffe/models/*
Choose your graphics card.
NVIDIA K40
Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees 1 GB of GPU memory.
Best settings with ECC off and maximum clock speed in standard Caffe:
• Training is 26.5 secs / 20 iterations (5,120 images)
• Testing is 100 secs / validation set (50,000 images)
Best settings with Caffe + cuDNN acceleration:
• Training is 19.2 secs / 20 iterations (5,120 images)
• Testing is 60.7 secs / validation set (50,000 images)
NVIDIA Titan
• Training: 26.26 secs / 20 iterations (5,120 images). Testing: 100 secs / validation set (50,000 images).
• cuDNN Training: 20.25 secs / 20 iterations (5,120 images). cuDNN Testing: 66.3 secs / validation set (50,000 images).
NVIDIA K20
• Training: 36.0 secs / 20 iterations (5,120 images). Testing: 133 secs / validation set (50,000 images).
NVIDIA GTX 770
• Training: 33.0 secs / 20 iterations (5,120 images). Testing: 129 secs / validation set (50,000 images).
• cuDNN Training: 24.3 secs / 20 iterations (5,120 images). cuDNN Testing: 104 secs / validation set (50,000 images).
Installation
detailed documentation:
http://caffe.berkeleyvision.org/installation.html
required packages: CUDA, OpenCV
BLAS (Basic Linear Algebra Subprograms): operations like matrix multiplication and matrix addition, with implementations for both CPU (cBLAS) and GPU (cuBLAS); provided by MKL (Intel), ATLAS, OpenBLAS, etc.
Boost: a C++ library.
> Caffe uses some of its math functions and shared_pointer.
glog, gflags: provide logging & command line utilities.
> Essential for debugging.
leveldb, lmdb: database I/O for your program.
> Need to know this for preparing your own data.
protobuf: an efficient and flexible way to define data structures.
> Need to know this for defining new layers.
Define your task
Dog?Cat?
Next step
Preparing data => LevelDB, LMDB
Model Definition (train_val.prototxt)
Solver (solver.prototxt)
Download Image Data
LevelDB,LMDB
train_val.prototxt
solver.prototxt
Preparing data
If you want to run a CNN on another dataset:
Caffe reads data in a standard database format.
You have to convert your data to leveldb/lmdb manually.
Creating image set
for imagenet dataset…
Using LMDB
./convert_imageset --resize_height=256 --resize_width=256 --shuffle ./data/imagenet ./data/imagenet/train.txt ./ilsvrc12_train_lmdb --backend=lmdb
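The command above takes a root folder, a list file (train.txt), and an output database. Each line of the list file holds an image path relative to the root folder, followed by an integer label. A minimal sketch of how such a list file could be generated, assuming (hypothetically) one sub-folder per class under the root; build_image_list and write_list_file are illustrative names, not part of Caffe:

```python
import os


def build_image_list(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (lines, class_names) for an assumed root/<class_name>/<image> layout."""
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    lines = []
    for label, name in enumerate(class_names):
        for fname in sorted(os.listdir(os.path.join(root, name))):
            if fname.lower().endswith(extensions):
                # Paths are relative to the root folder given to convert_imageset.
                lines.append("%s/%s %d" % (name, fname, label))
    return lines, class_names


def write_list_file(root, list_path):
    """Write one 'relative/path label' line per image and return the class names."""
    lines, class_names = build_image_list(root)
    with open(list_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return class_names
```

The label of each class is simply its position in the sorted folder names, so the same mapping has to be reused when building the validation list.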
Define your network (train_val.prototxt)
Define data layer.
Define specific layers.
Convolution layer.
Fully connected layer (Inner product layer)
Pooling layer.
Activation function layer.
Define loss function.
[Figure: example nets of increasing size: LogReg; LeNet (examples/mnist/lenet_train.prototxt); ImageNet, Krizhevsky 2012. Schematic net "dummy-net" made of layers { name: "data" … }, { name: "conv" … }, { name: "pool" … }, more layers …, { name: "loss" … }. Blue: layers you need to define; yellow: data blobs.]
Data Layer
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
Define your network (train_val.prototxt)
Conv Layer
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
data
convolution
Convolution
• Layer type: Convolution
• CPU implementation: ./src/caffe/layers/convolution_layer.cpp
• CUDA GPU implementation: ./src/caffe/layers/convolution_layer.cu
• Parameters (ConvolutionParameter convolution_param)
  • Required
    • num_output (c_o): the number of filters
    • kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
  • Strongly Recommended
    • weight_filler [default type: 'constant' value: 0]
  • Optional
    • bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
    • pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
    • stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
    • group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.
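To see how num_output, kernel_size, pad, stride, and group interact, the standard convolution arithmetic can be sketched in a few lines. This is a sketch, not Caffe code; the helper names are hypothetical, and the 32x32x3 input is taken from the CIFAR-10 examples in these slides:

```python
def conv_output_size(in_size, kernel_size, pad=0, stride=1):
    """Spatial output size: floor((in + 2*pad - kernel) / stride) + 1."""
    return (in_size + 2 * pad - kernel_size) // stride + 1


def conv_param_count(in_channels, num_output, kernel_size, group=1, bias_term=True):
    """Number of learnable values: one kernel per filter over its input group, plus biases."""
    weights = num_output * (in_channels // group) * kernel_size * kernel_size
    biases = num_output if bias_term else 0
    return weights + biases


# conv1 from the slide: 32 filters, 5x5 kernel, pad 2, stride 1 on a 32x32x3 image.
out = conv_output_size(32, kernel_size=5, pad=2, stride=1)
params = conv_param_count(in_channels=3, num_output=32, kernel_size=5)
print(out, params)
```

With pad 2 and a 5x5 kernel at stride 1 the spatial size is preserved, which is why the padding is chosen as (kernel_size - 1) / 2.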
Define your network (train_val.prototxt)
Fully connected layer
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 64
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}
Inner Product
• Layer type: InnerProduct
• CPU implementation: ./src/caffe/layers/inner_product_layer.cpp
• CUDA GPU implementation: ./src/caffe/layers/inner_product_layer.cu
• Parameters (InnerProductParameter inner_product_param)
  • Required
    • num_output (c_o): the number of filters
  • Strongly recommended
    • weight_filler [default type: 'constant' value: 0]
  • Optional
    • bias_filler [default type: 'constant' value: 0]
    • bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
Define your network (train_val.prototxt)
Pooling layer
Pooling
• Layer type: Pooling
• CPU implementation: ./src/caffe/layers/pooling_layer.cpp
• CUDA GPU implementation: ./src/caffe/layers/pooling_layer.cu
• Parameters (PoolingParameter pooling_param)
  • Required
    • kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
  • Optional
    • pool [default MAX]: the pooling method. Currently MAX, AVE, or STOCHASTIC
    • pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
    • stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
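The effect of kernel_size and stride on the output size of pool1 can be checked by hand. The sketch below assumes that Caffe rounds the pooling output size up (ceil) rather than down, which is my understanding of its pooling layer; pool_output_size is a hypothetical helper name:

```python
import math


def pool_output_size(in_size, kernel_size, pad=0, stride=1):
    """Pooling output size with ceil rounding (assumed to match Caffe's behaviour)."""
    out = int(math.ceil((in_size + 2 * pad - kernel_size) / float(stride))) + 1
    # With padding, the last window must still start inside the (padded) image.
    if pad > 0 and (out - 1) * stride >= in_size + pad:
        out -= 1
    return out


# pool1 from the slide: 3x3 max pooling with stride 2 on a 32x32 feature map.
print(pool_output_size(32, kernel_size=3, stride=2))
```

Because of the ceil rounding, a 32x32 map pooled with a 3x3 window and stride 2 gives 16x16, not 15x15.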
Define your network (train_val.prototxt)
Activation function
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
ReLU / Rectified-Linear and Leaky-ReLU
• Layer type: ReLU
• CPU implementation: ./src/caffe/layers/relu_layer.cpp
• CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
• Parameters (ReLUParameter relu_param)
  • Optional
    • negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
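What negative_slope does can be written in one line. A minimal sketch in plain Python (not the Caffe implementation; the function name relu is chosen here for illustration):

```python
def relu(x, negative_slope=0.0):
    """Keep positive inputs; scale negative inputs by negative_slope (0 gives plain ReLU)."""
    return x if x > 0 else negative_slope * x


values = [-2.0, -0.5, 0.0, 1.5]
plain = [relu(v) for v in values]
leaky = [relu(v, negative_slope=0.1) for v in values]
print(plain, leaky)
```

In the layer definition above, bottom and top are both "pool1", so the activation is computed in place and needs no extra blob.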
Define your network (train_val.prototxt)
Loss layer
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
Classification: SoftmaxWithLoss, HingeLoss
Linear Regression: EuclideanLoss
Attributes / Multiclassification: SigmoidCrossEntropyLoss
Others…
New Task: NewLoss
The network does not need to be linear.
[Figure: a linear network, a chain of Data, Convolve, Pool, Convolve, Pool, Inner Prod, Rectify, ..., Predict, Label, and Loss blocks]
[Figure: a directed acyclic graph, with branches of layers merged by a Sum node]
-> a little more about the network
Solving: Training a Net (solver.prototxt)
Optimization, like model definition, is configuration.
train_net: "lenet_train.prototxt"
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
max_iter: 10000
snapshot_prefix: "lenet_snapshot"
All you need to run things on the GPU:
> caffe train -solver lenet_solver.prototxt -gpu 0
Solver types: Stochastic Gradient Descent (SGD) + momentum · Adaptive Gradient (ADAGRAD) · Nesterov’s Accelerated Gradient (NAG)
Solving: Training a Net (solver.prototxt)
net: "train_val.prototxt"    # Model definition file
test_iter: 100               # Iterations for test
test_interval: 500           # Test interval
base_lr: 0.001               # Learning rate
momentum: 0.9                # Momentum
weight_decay: 0.004          # Weight decay
lr_policy: "fixed"           # Learning rate policy
display: 100
max_iter: 4000               # Max iteration number for training
snapshot: 4000               # Save
snapshot_prefix: "examples/cifar10/cifar10_quick"    # Save filename
solver_mode: GPU             # CPU/GPU
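How base_lr, momentum, and weight_decay enter a single update can be sketched as follows. This is a simplified illustration of SGD with momentum and L2 weight decay, using the values from the CIFAR-10 solver above as defaults; it ignores the per-layer lr_mult / decay_mult multipliers, and sgd_momentum_step is a hypothetical name:

```python
def sgd_momentum_step(weights, grads, velocity,
                      base_lr=0.001, momentum=0.9, weight_decay=0.004):
    """One SGD + momentum update on flat lists of floats."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g_total = g + weight_decay * w         # add the L2 regularization gradient
        v = momentum * v - base_lr * g_total   # update the velocity
        new_velocity.append(v)
        new_weights.append(w + v)              # move the weight along the velocity
    return new_weights, new_velocity


w, v = [1.0], [0.0]
w, v = sgd_momentum_step(w, [0.5], v)
print(w, v)
```

With lr_policy "fixed", base_lr stays constant for all max_iter iterations, so this step is applied with the same learning rate throughout training.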
Additional details
Download Caffe (https://github.com/BVLC/caffe)
Install dependencies:
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler libatlas-base-dev
cuDNN (optional): download from NVIDIA, then
sudo cp lib* /usr/local/cuda/lib64/
sudo cp cudnn.h /usr/local/cuda/include/
Change the config file name. In the caffe folder: “Makefile.config.example” -> “Makefile.config”
To use cuDNN (optional). In file “Makefile.config”: “# USE_CUDNN := 1” -> “USE_CUDNN := 1”
Compile. In the caffe folder: make all -j8 (the -j8 option makes the build faster)
Download data. Execute in folder “caffe/data/cifar10”: sh get_cifar10.sh
Create data. Move file: “caffe/examples/cifar10/create_cifar10.sh” -> “caffe/create_cifar10.sh”
Execute in folder “caffe/”: sh create_cifar10.sh
Generated files:
cifar10_test_lmdb
cifar10_train_lmdb
mean.binaryproto
Train. Move file: “caffe/examples/cifar10/train_quick.sh” -> “caffe/train_quick.sh”
Execute in folder “caffe/”: sh train_quick.sh
Result shown on the slide: 75.11%
Distribute your network.
“.caffemodel”: weight parameters.
“_deploy.prototxt”: model definition file.
name: "CIFAR10_full_deploy"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 32
input_dim: 32
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
...
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
    decay_mult: 250
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 10
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip1"
  top: "prob"
}
Finetuning models
Example: ImageNet dataset => Style dataset
A different database and a different number of classes.
-> What if you want to transfer the weights of an existing model to finetune on another dataset / task?
● Simply change a few lines in the layer definition: new name = new params.
Input: a different source
Original (ImageNet):
layers {
  name: "data"
  type: "Data"
  data_param {
    source: "ilsvrc12_train_leveldb"
    mean_file: "./data/ilsvrc12"
    ...
  }
  ...
}
Finetuning (Style):
layers {
  name: "data"
  type: "Data"
  data_param {
    source: "style_leveldb"
    mean_file: "./data/ilsvrc12"
    ...
  }
  ...
}
...
Last layer: a different classifier
Original (ImageNet):
layers {
  name: "fc8"
  type: "InnerProduct"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 1000
    ...
  }
}
Finetuning (Style):
layers {
  name: "fc8-style"
  type: "InnerProduct"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 20
    ...
  }
}
Old Caffe:
> finetune_net.bin solver.prototxt model_file
New Caffe:
> caffe train -solver models/finetune_flickr_style/solver.prototxt -weights bvlc_reference_caffenet.caffemodel
Roughly, in pseudocode:
net = new Caffe::Net("style_solver.prototxt");
net.CopyTrainedNetFrom(pretrained_model);
solver.Solve(net);
Making Deep Residual Network.
Revolution of Depth
ImageNet Classification top-5 error (%)
ILSVRC'15  ResNet     152 layers  3.57
ILSVRC'14  GoogleNet  22 layers   6.7
ILSVRC'14  VGG        19 layers   7.3
ILSVRC'13             8 layers    11.7
ILSVRC'12  AlexNet    8 layers    16.4
ILSVRC'11             shallow     25.8
ILSVRC'10             shallow     28.2
Depth labels on the slide: Deep, Very deep, Ultra deep
Deep Residual Network
Making .prototxt
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
A 44-layer network: 2734 lines of prototxt.
What about 1202 layers? 2734 / 44 * 1202 = 74687.909… lines.
If writing 1 line takes 1 sec, that is approximately 20 hours.
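The estimate above can be reproduced directly. A small sketch, assuming (as the slide does) that the number of prototxt lines grows linearly with the number of layers; the function name is hypothetical:

```python
def estimate_prototxt_lines(ref_lines, ref_layers, target_layers):
    """Scale the line count of a reference prototxt linearly with the layer count."""
    return ref_lines / float(ref_layers) * target_layers


lines = estimate_prototxt_lines(2734, 44, 1202)
hours = lines / 3600.0  # one line per second, 3600 seconds per hour
print(round(lines, 3), round(hours, 2))
```

This is the motivation for generating the network definition from a script rather than typing it by hand.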
Building block: CONV -> Batch Normalization -> Scale Layer -> ReLU
PYTHON Implementation
from caffe import layers as L

def conv_bn_cifar10(bottom, nout, ks=3, stride=1, pad=0, is_test=True, learn=True):
    # Building block: CONV -> Batch Normalization -> Scale -> ReLU.
    # The convolution has no bias (bias_term=False), so it owns a single
    # parameter blob and needs a single param entry.
    if learn:
        param = [dict(lr_mult=1, decay_mult=1)]
    else:
        param = [dict(lr_mult=0, decay_mult=0)]  # freeze the weights
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad, param=param,
                         weight_filler=dict(type="msra"),
                         bias_filler=dict(type="constant"), bias_term=False)
    # The three BatchNorm blobs hold running statistics; the solver must not update them.
    bn = L.BatchNorm(conv, param=[dict(lr_mult=0), dict(lr_mult=0), dict(lr_mult=0)],
                     batch_norm_param=dict(use_global_stats=is_test))
    scale = L.Scale(bn)
    relu = L.ReLU(scale)
    return conv, bn, scale, relu
Issues when Training Neural Networks
Most slides in this part were obtained from Stanford CS231n (Prof. Fei-Fei Li).
ConvNets need a lot of data to train.
Transfer Learning
1. Train on ImageNet.
2. If you have a small dataset: fix all weights (treat the CNN as a fixed feature extractor) and retrain only the classifier, i.e. swap the Softmax layer at the end.
3. If you have a medium-sized dataset, “finetune” instead: use the old weights as initialization and train the full network or only some of the higher layers. Retrain a bigger portion of the network, or even all of it.
E.g. Caffe Model Zoo: lots of pretrained ConvNets
https://github.com/BVLC/caffe/wiki/Model-Zoo
...
Tuning Learning Rate.
Double check that the loss is reasonable:
- the loss function returns the loss and the gradient for all parameters
- disable regularization
- loss ~2.3: “correct” for 10 classes
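The value 2.3 is what a softmax classifier gives when it has no preference between 10 classes: each class gets probability 1/10, and the loss is -ln(1/10) = ln(10) ≈ 2.3026. A minimal sketch in plain Python (softmax_loss is an illustrative helper, not the code from the original slide):

```python
import math


def softmax_loss(scores, label):
    """Cross-entropy loss of a softmax over raw class scores, for one example."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]


# Untrained net with near-zero weights: all 10 class scores are (almost) equal.
initial_loss = softmax_loss([0.0] * 10, label=3)
print(initial_loss)
```

If the initial loss with regularization disabled is far from ln(number of classes), the initialization or the loss code is worth checking before training.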
Double check that the loss is reasonable:
crank up regularization
loss went up, good. (sanity check)
Let's try to train now…
Tip: Make sure that you can overfit a very small portion of the training data.
The code on the slide:
- takes the first 20 examples from CIFAR-10
- turns off regularization (reg = 0.0)
- uses simple vanilla ‘sgd’
Result: very small loss, train accuracy 1.00, nice!
I like to start with small regularization and find a learning rate that makes the loss go down.
Loss not going down (barely changing): the learning rate is probably too low.
Notice that train/val accuracy goes to 20% though. What’s up with that? (Remember this is softmax.)
Okay, now let's try learning rate 1e6. What could possibly go wrong?
cost: NaN almost always means a high learning rate...
Loss exploding: learning rate too high.
3e-3 is still too high; the cost explodes.
=> The rough range of learning rates we should be cross-validating is somewhere in [1e-3 … 1e-5].
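When cross-validating over such a range, learning rates are commonly sampled uniformly in the exponent rather than uniformly in the value, so that each order of magnitude is covered equally. A minimal sketch, assuming the [1e-5, 1e-3] range from the slide; sample_learning_rates and its seed parameter are illustrative:

```python
import random


def sample_learning_rates(n, low_exp=-5.0, high_exp=-3.0, seed=0):
    """Draw n learning rates log-uniformly from [10**low_exp, 10**high_exp]."""
    rng = random.Random(seed)  # fixed seed so the candidate list is reproducible
    return [10 ** rng.uniform(low_exp, high_exp) for _ in range(n)]


candidates = sample_learning_rates(5)
for lr in sorted(candidates):
    print("base_lr: %.2e" % lr)
```

Each candidate would then be written into base_lr of a solver.prototxt and trained for a short number of iterations to compare the loss curves.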
Monitor and visualize the loss curve
Monitor and visualize the accuracy:
big gap = overfitting
=> increase regularization strength?
no gap => increase model capacity?
Squeezing out the last few percent.
VGG Net (Oxford)
Second place in the ILSVRC'14 classification task. Stacked 3x3 filters, no LRN layers, stride 1.
The power of small filters
Suppose we stack two CONV layers with receptive field size 3x3 (and stride 1).
=> Each neuron in 1st CONV sees a 3x3 region of input.
Q: What region of input does each neuron in 2nd CONV see?
Answer: [5x5]
Suppose we stack three CONV layers with receptive field size 3x3.
Q: What region of input does each neuron in 3rd CONV see?
Answer: [7x7]
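The two answers follow one rule: with stride 1, every additional k x k layer widens the receptive field by k - 1 pixels. A short sketch of that rule (the function name is chosen for illustration):

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field on the input of num_layers stacked stride-1 convolutions."""
    rf = 1  # a single input pixel
    for _ in range(num_layers):
        rf += kernel_size - 1  # each layer adds (kernel_size - 1) pixels
    return rf


for n in (1, 2, 3):
    size = stacked_receptive_field(n)
    print("%d x CONV 3x3 -> %dx%d" % (n, size, size))
```

So three stacked 3x3 layers cover the same 7x7 input region as a single 7x7 layer.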
The power of small filters
Suppose input has depth C & we want output depth C as well.
1x CONV with 7x7 filters
Number of weights:
C*(7*7*C)
= 49 C^2
3x CONV with 3x3 filters
Number of weights:
C*(3*3*C) + C*(3*3*C) + C*(3*3*C)
= 3 * 9 * C^2
= 27 C^2
Fewer parameters and more nonlinearities = GOOD.
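The comparison can be checked numerically for a concrete depth. A small sketch, with C = 64 picked only as an example value and biases ignored as on the slide:

```python
def stacked_conv_weights(channels, kernel_size, num_layers):
    """Weights of num_layers conv layers, each mapping `channels` to `channels` (no biases)."""
    per_layer = channels * (kernel_size * kernel_size * channels)
    return num_layers * per_layer


C = 64  # example depth, not taken from the slides
one_7x7 = stacked_conv_weights(C, kernel_size=7, num_layers=1)    # 49 * C^2
three_3x3 = stacked_conv_weights(C, kernel_size=3, num_layers=3)  # 27 * C^2
print(one_7x7, three_3x3)
```

The stack of three 3x3 layers needs 27/49, or about 55%, of the weights of the single 7x7 layer while covering the same receptive field.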
The power of small filters
“More non-linearities” and “deeper” usually give better performance.
=> 1x1 CONV! (Usually follows a normal CONV, e.g. [3x3 CONV - 1x1 CONV].)
[Network in Network, Lin et al. 2013]
[Figure: 3x3 CONV view of input; 1x1 CONV view of output of 3x3 CONV]
=> Evidence that using 3x3 instead of 1x1 works better
[Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan et al., 2014]
Data Augmentation
• i.e. simulating “fake” data
• explicitly encoding image transformations that shouldn’t change object identity
[Figure: What the computer sees]
1. Flip horizontally
2. Random crops/scales
Sample these during training (also helps a lot during test time),
e.g. it is common to see even up to 150 crops used.
3. Random mix/combinations of:
- translation
- rotation
- stretching
- shearing
- lens distortions, … (go crazy)
4. Color jittering (maybe even contrast jittering, etc.)
- Simple: change contrast by small amounts, jitter the color distributions, etc.
- Vignette, ... (go crazy)
Fancy PCA way:
1. Compute PCA on all [R,G,B] pixel values in the training data.
2. Sample some color offset along the principal components at each forward pass.
3. Add the offset to all pixels in a training image.
(As seen in [Krizhevsky et al. 2012])
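The first two augmentations in the list above (horizontal flip and random crop) are simple enough to sketch without any image library. The sketch below treats an image as a list of rows of pixel values, which is an assumption made for illustration; in practice an array library or Caffe's own transform_param options would be used:

```python
import random


def flip_horizontal(image):
    """Mirror each row of the image left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng=random):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)   # randint includes both end points
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(flip_horizontal(image))
print(random_crop(image, 2, 2, rng=random.Random(0)))
```

Applying a fresh random flip and crop every time an image is drawn gives the network a slightly different sample each epoch without changing the label.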
Notice the more general theme:
1. Introduce a form of randomness in the forward pass.
2. Marginalize over the noise distribution during prediction.
Examples: DropConnect, Dropout, Data Augmentation, Model Ensembles
Real-time application using Caffe.
ImageNet Competition
Total 1000 classes
Each class has about 1300 images for training. (1300 x 1000 = 1,300,000)
It takes about one week to train a CNN model.
[Figure labels: Deep Learning, Hand-crafted]
Real-time Object Recognition
VGG Net (Oxford): second place in the ILSVRC'14 classification task. Stacked 3x3 filters, no LRN layers, stride 1.
Real-time Object Recognition (contd.)
CPU: Intel i5-4690 CPU 3.5GHz
RAM: 18GB
GPU: NVIDIA GeForce GTX 770
Implementation Detail
Step 1. Windows 7 64bit / NVIDIA Graphic card (optional)
Step 2. Install CUDA 6.5
Step 3. Download Caffe-windows (http://github.com/niuzhiheng/caffe)
Step 4. Download 3rd party libraries (http://github.com/niuzhiheng/caffe)
Step 5. Download VGG pre-trained weights / architecture
(http://www.robots.ox.ac.uk/~vgg/research/very_deep/)
Step 6. Implement Code
Implementation Detail (contd.)
Pipeline: Camera Frame (Image) -> Resizing -> CNN (Forward Propagation) -> Result (Top5)
Libraries: OpenCV, CUDA, Caffe
Implementation Detail (contd.)
// Test mode
Caffe::set_phase(Caffe::TEST);
// mode setting - CPU/GPU
Caffe::set_mode(Caffe::GPU);
// gpu device number
int device_id = 0;
Caffe::SetDevice(device_id);
// prototxt
Net<float> caffe_test_net("VGG_ILSVRC_19_layers_deploy.prototxt");
// caffemodel (weight)
caffe_test_net.CopyTrainedLayersFrom("VGG_ILSVRC_19_layers.caffemodel");
<prototxt> (http://caffe.berkeleyvision.org/)
name: "VGG_ILSVRC_19_layers"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
// Copy the resized camera frame into the input blob (channel, row, column),
// subtracting the per-channel mean.
for (k = 0; k < 3; k++) {
  for (i = 0; i < IMAGE_SIZE; i++) {
    for (j = 0; j < IMAGE_SIZE; j++) {
      blob.mutable_cpu_data()[blob.offset(0, k, i, j)] =
          (float)(unsigned char)small_image->imageData[i * small_image->widthStep + j * small_image->nChannels + k] - mean_val[k];
    }
  }
}
input_vec.push_back(&blob);

// forward propagation
float loss;
const vector<Blob<float>*>& result = caffe_test_net.Forward(input_vec, &loss);

// copy output
for (i = 0; i < 1000; i++) {
  output[i] = result[0]->cpu_data()[i];
}
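The pipeline ends with a Top5 result, but the selection step is not shown on the slides. One way it could be done on the 1000 copied output scores is sketched below in Python for brevity; top_k is a hypothetical helper, and the original application presumably does the equivalent in C++:

```python
def top_k(scores, k=5):
    """Return the k (class_index, score) pairs with the highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


# Toy example with 6 classes instead of the 1000 ImageNet classes.
output = [0.05, 0.40, 0.10, 0.25, 0.15, 0.05]
for class_index, score in top_k(output, k=3):
    print(class_index, score)
```

The class indices would then be mapped to human-readable names through the label list that ships with the dataset.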
Thank You!!
E-mail: [email protected]