DEEP LEARNING APPLICATIONS IN EMBEDDED SYSTEMS
Ferhat Kurt https://embedded.openzeka.com
Microsoft & Google “Superhuman” Image Recognition
Microsoft “Super Deep Network”
Berkeley’s Brett: End-to-End Reinforcement Learning
Deep Speech 2: One Network, 2 Languages
A New Computing Model Hits Pop Culture
AlphaGo Rivals a World Champion
TU Delft: Deep-Learning Amazon Picking Champion
ARTIFICIAL INTELLIGENCE MILESTONES
Deep Learning and Computer Vision
Graphics GPU Compute
NVIDIA GPU: MORE THAN GRAPHICS
GPUs deliver superior performance and efficiency
Integrated sensing and deep learning enable autonomy
THE RISE OF AUTONOMOUS MACHINES
New use cases that require autonomy
PIONEERING JETSON TECHNOLOGY: Powering the Next Generation of Autonomous Machines
Jetson TX1: Supercomputer on a Module
• Unmatched performance under 10 W
• Advanced technology for autonomous machines
• Smaller than a credit card
JETSON TX1
GPU: 1 TFLOP/s, 256-core Maxwell
CPU: 4x 64-bit ARM A57 CPUs | 1.6 GHz
Memory: 4 GB LPDDR4 | 25.6 GB/s
Video decode: 4K 60 Hz H.264
Video encode: 4K 30 Hz H.264
CSI: Up to 6 cameras | 1400 Mpix/s
Display: 2x DSI, 1x eDP 1.4, 1x DP 1.2/HDMI
Wi-Fi: 802.11ac 2x2
Networking: 1 Gigabit Ethernet
PCIe: Gen 2, 1x1 + 1x4
Storage: 16 GB eMMC, SDIO, SATA
Other: 3x UART, 3x SPI, 4x I2C, 4x I2S, GPIOs
Power: 10-15 W, 6.6-19.5 V DC
Size: 50 mm x 87 mm
System on Module
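The 25.6 GB/s memory figure in the table is consistent with a simple peak-bandwidth calculation. This assumes, although the slide does not state it, a 64-bit LPDDR4 interface clocked at 1600 MHz with two transfers per clock:

```latex
B_{\text{peak}}
= \underbrace{\left(1600 \times 10^{6}\,\text{Hz} \times 2\right)}_{\text{transfers per second}}
\times \underbrace{\frac{64\ \text{bit}}{8\ \text{bit/byte}}}_{\text{bytes per transfer}}
= 3.2 \times 10^{9} \times 8\ \text{B/s}
= 25.6\ \text{GB/s}
```

This is a theoretical peak; sustained bandwidth in real workloads is lower.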
Jetson TX1 Developer Kit
Jetson TX1 Developer Board 5MP Camera
DIGITS Workflow VisionWorks Jetson Multimedia SDK
and other technologies:
CUDA, Linux4Tegra, NSIGHT EE, OpenCV4Tegra, OpenGL, Vulkan, System Trace, Visual Profiler, Ubuntu 14.04
Deep Learning SDK
NVIDIA JETPACK
Linux for Tegra
Compute (CUDA)
Jetson TX1
Vision / Machine Learning
cuSPARSE
cuSolver
cuFFT
cuBLAS NPP
cuRAND Thrust
CUDA Math Library
Graphics
Tools
NVTX (NVIDIA Tools eXtension)
Source code editor
Debugger
Profiler
System Trace
Vertically Integrated Packages
V4L2
libjpeg
JETSON SDK: DETAILS
VISIONWORKS™
CUDA-accelerated Computer Vision Toolkit
• Full OpenVX 1.1 implementation
• Easy integration with existing CV pipelines
• Custom extensions
Applications
VisionWorks
CUDA
Jetson TX1
VisionWorks™
Toolkit
Robotics Augmented Reality Drones
Example Applications
Feature Tracking
Structure from Motion
Object Tracking
Dense Optical Flow
VisionWorks™ API + Frameworks
IMAGE ARITHMETIC: Absolute Difference, Accumulate Image, Accumulate Squared, Accumulate Weighted, Add / Subtract / Multiply, Channel Combine, Channel Extract
GEOMETRIC TRANSFORMS: Affine Warp + Perspective Warp, Flip Image, Gaussian Pyramid, Remap, Scale Image
FEATURES: Canny Edge Detector, FAST Corners + FAST Track, Harris Corners + Harris Track, Hough Circles, Hough Lines
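To make two of the listed image-arithmetic primitives concrete, here is a plain C++ sketch of Absolute Difference and Accumulate Weighted on 8-bit grayscale images. It is not the VisionWorks or OpenVX API: the function names and the flat row-major `Gray8` layout are hypothetical. The weighted accumulation uses the same formula as the OpenVX definition, accum = (1 - alpha) * accum + alpha * input, but with a float accumulator for clarity (an assumption of this sketch).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical grayscale image: width * height 8-bit pixels in row-major order.
using Gray8 = std::vector<std::uint8_t>;

// Absolute Difference: out(x,y) = |a(x,y) - b(x,y)|, computed per pixel.
Gray8 absDiff(const Gray8& a, const Gray8& b) {
    assert(a.size() == b.size());
    Gray8 out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        int d = static_cast<int>(a[i]) - static_cast<int>(b[i]);
        out[i] = static_cast<std::uint8_t>(std::abs(d));
    }
    return out;
}

// Accumulate Weighted: accum = (1 - alpha) * accum + alpha * input.
// A float accumulator is used here instead of an 8-bit one (sketch assumption).
void accumulateWeighted(std::vector<float>& accum, const Gray8& input, float alpha) {
    assert(accum.size() == input.size());
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (std::size_t i = 0; i < input.size(); ++i) {
        accum[i] = (1.0f - alpha) * accum[i] + alpha * static_cast<float>(input[i]);
    }
}
```

A small alpha makes the accumulator change slowly, which is why this operation is commonly used as a running background estimate in motion detection.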
• Jetpack SDK
• Libraries
• Developer tools
• Design collateral
• Developer Forum
• Training and Tutorials
• Ecosystem
http://developer.nvidia.com/embedded-computing
Comprehensive Developer Platform
GETTING STARTED
JETSON COMMUNITY
Developer Forums: devtalk.nvidia.com
eLinux Wiki: eLinux.org/Jetson_TX1
• Infrared devices:
  • SICK LIDAR (LMS 200); Hokuyo; rpLIDAR
  • Asus Xtion Pro Live (PrimeSense)
  • Intel RealSense (multiple generations)
• Stereo and color cameras:
  • StereoLabs ZED (consumer-oriented)
  • Point Grey Research USB3 and GigE
  • e-con Systems CSI-MIPI cameras with external ISP
THE PERIPHERALS JETSON CONNECTS WITH, including Community Contributions
INTEGRATING THE JETSON TX1 MODULE: Modular Ecosystem
• ConnectTech Orbitty
• ConnectTech Rosie
• Auvidea J120
• Colorado Engineering TX1-SOM
TX1 MODULE
Deploying Real-Time Deep Learning Networks with GPU Inference Engine
[Chart: ImageNet classification accuracy by year. 2010: 72%, 2011: 74%, 2012: 84%, 2013: 88%, 2014: 93%, 2015: 96.4%; human level: 94.9%; chart label: "Deep learning on GPUs"]
HOW FAR ARE WE FROM AUTONOMY? ImageNet classification accuracy
DEEP LEARNING: What's the Difference?
Deep Learning: DNN + Data + HPC
Traditional Computer Vision: Experts + Time
A NEW COMPUTING MODEL
Autonomous Machines: Onboard Intelligence
Object Classification
Segmentation
Collision Avoidance
3D Reconstruction
Localization / Mapping
POWERING THE DEEP LEARNING ECOSYSTEM: NVIDIA SDK Accelerates Every Major Framework
developer.nvidia.com/deep-learning-software
DEEP LEARNING FRAMEWORKS
COMPUTER VISION | SPEECH AND AUDIO | NATURAL LANGUAGE PROCESSING
Object Detection | Voice Recognition | Language Translation | Recommendation Engines | Sentiment Analysis
Mocha.jl
Image Classification
NVIDIA DEEP LEARNING SDK
NCCL | cuDNN | cuBLAS | GIE | cuSPARSE
A COMPLETE COMPUTE PLATFORM: MANAGE, TRAIN, DEPLOY
DIGITS
DATACENTER | AUTOMOTIVE
TRAIN / TEST
MANAGE / AUGMENT
EMBEDDED
GPU INFERENCE ENGINE
NVIDIA DIGITS
Test Image
developer.nvidia.com/digits
Interactive Deep Learning GPU Training System
Process Data | Configure DNN | Monitor Progress | Visualization
FIRST Team 900: ROBUST DATA COLLECTION
ZEBRACORNS
team900.org
GPU INFERENCE ENGINE: Workflow
DIGITS
OPTIMIZATION ENGINE
EXECUTION ENGINE
PLAN
NEURAL NETWORK
developer.nvidia.com/gpu-inference-engine
NVIDIA GPU Inference Engine (GIE) provides even higher efficiency and performance for neural network inference. Tests were performed using GoogLeNet. CPU-only: single-socket Intel Xeon (Haswell) E5-2698 v3 @ 2.3 GHz with HT. GPU: NVIDIA Tesla M4 + cuDNN 5 RC. GPU + GIE: NVIDIA Tesla M4 + GIE.
GPU INFERENCE ENGINE: Optimizations
• Fuse network layers
• Eliminate concatenation layers
• Kernel specialization
• Auto-tuning for target platform
• Select optimal tensor layout
• Batch size tuning
TRAINED NEURAL NETWORK
OPTIMIZED INFERENCE RUNTIME
developer.nvidia.com/gpu-inference-engine
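The "select optimal tensor layout" optimization comes down to how a 4-D activation tensor is ordered in memory. As a hedged illustration in plain C++ (not GIE code), the two common layouts NCHW and NHWC map the same logical element (n, c, h, w) to different linear offsets. Which one is faster depends on the kernels and the hardware, which is why an optimizer may choose per target. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Logical shape of a batch of feature maps.
struct Shape { std::size_t N, C, H, W; };

// Offset of element (n, c, h, w) when channels are outer to the spatial dims (NCHW).
std::size_t offsetNCHW(const Shape& s, std::size_t n, std::size_t c, std::size_t h, std::size_t w) {
    return ((n * s.C + c) * s.H + h) * s.W + w;
}

// Offset of the same element when channels are innermost (NHWC).
std::size_t offsetNHWC(const Shape& s, std::size_t n, std::size_t c, std::size_t h, std::size_t w) {
    return ((n * s.H + h) * s.W + w) * s.C + c;
}

// Reorders a dense NCHW buffer into NHWC order, element by element.
std::vector<float> nchwToNhwc(const std::vector<float>& src, const Shape& s) {
    assert(src.size() == s.N * s.C * s.H * s.W);
    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < s.N; ++n)
        for (std::size_t c = 0; c < s.C; ++c)
            for (std::size_t h = 0; h < s.H; ++h)
                for (std::size_t w = 0; w < s.W; ++w)
                    dst[offsetNHWC(s, n, c, h, w)] = src[offsetNCHW(s, n, c, h, w)];
    return dst;
}
```

In NHWC all channel values of one pixel are adjacent, which suits per-pixel operations such as 1x1 convolutions; in NCHW each channel plane is contiguous.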
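[Figure] Graph Optimization: unoptimized Inception module. From the input, four branches: 1x1 conv; 1x1 conv → 3x3 conv; 1x1 conv → 5x5 conv; max pool → 1x1 conv. Every convolution is followed by separate bias and ReLU layers, and all branches meet in a concat layer that produces the next input.
[Figure] Graph Optimization: Vertical fusion. Each convolution + bias + ReLU sequence becomes a single CBR layer: 1x1 CBR; 1x1 CBR → 3x3 CBR; 1x1 CBR → 5x5 CBR; max pool → 1x1 CBR; then concat → next input.
[Figure] Graph Optimization: Horizontal fusion. The 1x1 CBR layers that read the same input are merged into one wider 1x1 CBR layer, which feeds the 3x3 CBR and 5x5 CBR branches; max pool → 1x1 CBR remains; then concat → next input.
[Figure] Graph Optimization: Concat elision. The concat layer is removed; the 3x3 CBR, 5x5 CBR and 1x1 CBR outputs are written directly into their slices of the next input.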
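Vertical fusion can be illustrated without any GPU code. The plain C++ sketch below is an illustration, not GIE's implementation. It treats a 1x1 convolution as a per-pixel matrix-vector product and compares an unfused pipeline, which materializes a buffer after the convolution and another after the bias add, with a fused CBR loop that applies convolution, bias and ReLU in one pass. The results agree up to floating-point rounding (the bias is added at a different point); the fused version avoids writing and re-reading two intermediate tensors, which is where the memory-bandwidth saving comes from on a GPU. The names and the pixel-major layout are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Feature map with `pixels` positions and `channels` values per position (pixel-major).
struct FeatureMap {
    std::size_t pixels, channels;
    std::vector<float> data;  // size = pixels * channels
};

// Unfused: 1x1 convolution, then bias add, then ReLU, each in its own pass and buffer.
// weights is outC x inC in row-major order; bias has outC entries.
FeatureMap convBiasReluUnfused(const FeatureMap& in, const std::vector<float>& weights,
                               const std::vector<float>& bias, std::size_t outC) {
    FeatureMap conv{in.pixels, outC, std::vector<float>(in.pixels * outC, 0.0f)};
    for (std::size_t p = 0; p < in.pixels; ++p)            // pass 1: convolution
        for (std::size_t o = 0; o < outC; ++o)
            for (std::size_t i = 0; i < in.channels; ++i)
                conv.data[p * outC + o] += weights[o * in.channels + i] * in.data[p * in.channels + i];
    FeatureMap biased = conv;                               // pass 2: bias (new buffer)
    for (std::size_t p = 0; p < in.pixels; ++p)
        for (std::size_t o = 0; o < outC; ++o)
            biased.data[p * outC + o] += bias[o];
    FeatureMap out = biased;                                // pass 3: ReLU (new buffer)
    for (float& v : out.data) v = std::max(v, 0.0f);
    return out;
}

// Fused CBR: one pass, one write per output element, no intermediate buffers.
FeatureMap convBiasReluFused(const FeatureMap& in, const std::vector<float>& weights,
                             const std::vector<float>& bias, std::size_t outC) {
    FeatureMap out{in.pixels, outC, std::vector<float>(in.pixels * outC)};
    for (std::size_t p = 0; p < in.pixels; ++p)
        for (std::size_t o = 0; o < outC; ++o) {
            float acc = bias[o];
            for (std::size_t i = 0; i < in.channels; ++i)
                acc += weights[o * in.channels + i] * in.data[p * in.channels + i];
            out.data[p * outC + o] = std::max(acc, 0.0f);
        }
    return out;
}
```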
• Baseline is cuDNN / cuBLAS
• Direct convolution kernels for small batch
• Custom Winograd & Implicit GEMM for Half2
• Custom Deconvolution for filter size == stride case
• Weight pre-transform for Winograd
• Optimal T/N choice for BLAS
• Run cudnnFindConvolutionForwardAlgorithmEx() with multiple iterations
Autotuning: Choose the fastest kernel for each layer
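The idea behind autotuning (time each candidate kernel on the actual target and keep the fastest) can be sketched in portable C++ with `std::chrono`. This is a simplified illustration, not GIE's or cuDNN's implementation: the candidates are plain callables, each is run several times and its best wall-clock time is kept, in the spirit of the "multiple iterations" bullet. A real GPU version would time with CUDA events and synchronize the stream; the function name `autotune` is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs every candidate `iterations` times and returns the index of the one
// with the lowest best-case time. Taking the minimum over several runs
// reduces the influence of warm-up effects and scheduling noise.
std::size_t autotune(const std::vector<std::function<void()>>& candidates, int iterations = 3) {
    assert(!candidates.empty() && iterations > 0);
    using Clock = std::chrono::steady_clock;
    std::size_t best = 0;
    auto bestTime = Clock::duration::max();
    for (std::size_t k = 0; k < candidates.size(); ++k) {
        auto fastest = Clock::duration::max();
        for (int it = 0; it < iterations; ++it) {
            auto start = Clock::now();
            candidates[k]();
            auto elapsed = Clock::now() - start;
            if (elapsed < fastest) fastest = elapsed;
        }
        if (fastest < bestTime) {
            bestTime = fastest;
            best = k;
        }
    }
    return best;
}
```

The winning index would typically be cached per layer configuration so the search runs once at build time, not on every inference.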
// create the network definition
INetworkDefinition* network = infer->createNetwork();
// create a map from caffe blob names to GIE tensors
std::unordered_map<std::string, infer1::Tensor> blobNameToTensor;
// populate the network definition and map
CaffeParser* parser = new CaffeParser;
parser->parse(deployFile, modelFile, *network, blobNameToTensor);
// tell GIE which tensors are required outputs
for (auto& s : outputs)
    network->setOutput(blobNameToTensor[s]);
Build: Importing a Caffe Model
// specify the maximum batch size and scratch size
CudaEngineBuildContext buildContext;
buildContext.maxBatchSize = maxBatchSize;
buildContext.maxWorkspaceSize = 1 << 20;
// create the engine
ICudaEngine* engine = infer->createCudaEngine(buildContext, *network);
// serialize to a C++ stream
engine->serialize(gieModelStream);
Build: Engine Creation
// get array bindings for input and output
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME),
    outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
// set array of input and output buffers
void* buffers[2];
buffers[inputIndex] = gpuInputBuffer;
buffers[outputIndex] = gpuOutputBuffer;
Runtime: Binding Buffers
// specify the batch size
CudaEngineContext context;
context.batchSize = batchSize;
// add GIE kernels to the given stream
engine->enqueue(context, buffers, stream, NULL);
<…>
// wait on the stream
cudaStreamSynchronize(stream);
Runtime: Running the Engine
Training organizations and individuals to solve challenging problems using Deep Learning
On-site workshops and online courses presented by certified experts
Covering complete workflows for proven application use cases: image classification, object detection, natural language processing, recommendation systems, and more
www.nvidia.com/dli
Hands-on Training for Data Scientists and Software Engineers
NVIDIA Deep Learning Institute
Deep Reinforcement Learning
PLAYING ATARI WITH DEEPMIND
From Pixels to Actions: Human-level control through Deep Reinforcement Learning
http://arxiv.org/abs/1602.01783
Inside Google’s DeepMind AlphaGo GPU cluster
END-TO-END LEARNING
Motor PWM
Sensory Inputs
Perceptron
RNN
Recognition
Inference
Goal/Reward
user task
Short-term
Long-term
MOTION CONTROL
AUTONOMOUS NAVIGATION
OpenAI Gym
Gazebo
Unreal4Torch
PhysX
Others
SIMULATION: Physical Intuition
A reinforcement learning agent includes: state (environment), actions (controls), reward (feedback)
A value function predicts the future reward of performing actions in the current state
Given the most recent state, the action with the maximum estimated future reward is chosen for execution
For agents with complex state spaces, deep networks are used as Q-value approximators
A numerical solver (gradient descent) optimizes the network on the fly based on reward inputs
Q-LEARNING: How Does It Work?
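The update behind the bullets above can be shown with a tabular Q-learning sketch in C++. Note the substitution: the slide describes deep networks as Q-value approximators trained by gradient descent, whereas this illustration stores Q-values in a small table so that the update Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)] is visible directly. The environment (a 1-D corridor with a reward at the right end), the hyperparameters and all names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <random>

// Hypothetical environment: states 0..4 in a corridor; action 0 = left, 1 = right.
// Entering state 4 yields reward 1 and ends the episode.
constexpr int kStates = 5;
constexpr int kActions = 2;
constexpr int kGoal = kStates - 1;
using QTable = std::array<std::array<double, kActions>, kStates>;

// Deterministic transition: returns the next state and stores the reward.
int step(int s, int a, double& reward) {
    int next = std::min(std::max(s + (a == 1 ? 1 : -1), 0), kGoal);
    reward = (next == kGoal) ? 1.0 : 0.0;
    return next;
}

// Greedy policy: the action with the maximum estimated future reward.
int bestAction(const QTable& q, int s) { return q[s][1] > q[s][0] ? 1 : 0; }

// Epsilon-greedy behaviour policy; ties are broken at random so that an
// all-zero table does not always pick the same action.
int chooseAction(const QTable& q, int s, double epsilon, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<int> anyAction(0, kActions - 1);
    if (coin(rng) < epsilon || q[s][0] == q[s][1]) return anyAction(rng);
    return bestAction(q, s);
}

// Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
QTable train(int episodes, double alpha = 0.5, double gamma = 0.9,
             double epsilon = 0.2, unsigned seed = 42) {
    QTable q{};  // all estimates start at zero
    std::mt19937 rng(seed);
    for (int e = 0; e < episodes; ++e) {
        int s = 0;
        for (int t = 0; t < 100 && s != kGoal; ++t) {
            int a = chooseAction(q, s, epsilon, rng);
            double r = 0.0;
            int next = step(s, a, r);
            // A terminal state has no future reward.
            double future = (next == kGoal)
                ? 0.0 : *std::max_element(q[next].begin(), q[next].end());
            q[s][a] += alpha * (r + gamma * future - q[s][a]);
            s = next;
        }
    }
    return q;
}
```

After training, `bestAction` picks "right" in every non-terminal state. Replacing the table with a neural network that maps a state to Q-values, and the in-place update with a gradient step toward the same target, gives the deep Q-learning setup the slide describes.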
LSTM ACCELERATION: Launch a 2D grid of RNN cells
Multiple layers in a single call are faster
Mitigates the vanishing gradient problem
Able to learn long-term strategies
Supports:
Partially-observable environments
Uni/Bidirectional RNNs
Non-uniform length minibatches
Dropout between layers
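To show what a single LSTM cell computes at each time step (the unit that accelerated RNN routines launch in a grid, presumably over layers and time steps), here is a naive C++ forward step. It is a sketch for illustration, not the cuDNN implementation; the gate ordering (input, forget, candidate, output), the stacked weight layout and all names are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Parameters for one LSTM layer with input size X and hidden size H.
// W is (4H x X), U is (4H x H), b has 4H entries, all row-major, with the
// four gates stacked in the order: input i, forget f, candidate g, output o.
struct LstmParams {
    std::size_t X, H;
    Vec W, U, b;
};

inline float sigmoid(float z) { return 1.0f / (1.0f + std::exp(-z)); }

// One time step: updates hidden state h and cell state c in place.
void lstmStep(const LstmParams& p, const Vec& x, Vec& h, Vec& c) {
    assert(x.size() == p.X && h.size() == p.H && c.size() == p.H);
    Vec pre(4 * p.H);
    for (std::size_t r = 0; r < 4 * p.H; ++r) {     // pre-activations W x + U h + b
        float z = p.b[r];
        for (std::size_t j = 0; j < p.X; ++j) z += p.W[r * p.X + j] * x[j];
        for (std::size_t j = 0; j < p.H; ++j) z += p.U[r * p.H + j] * h[j];
        pre[r] = z;
    }
    for (std::size_t k = 0; k < p.H; ++k) {
        float i = sigmoid(pre[k]);                  // input gate
        float f = sigmoid(pre[p.H + k]);            // forget gate
        float g = std::tanh(pre[2 * p.H + k]);      // candidate cell value
        float o = sigmoid(pre[3 * p.H + k]);        // output gate
        c[k] = f * c[k] + i * g;                    // additive cell-state update
        h[k] = o * std::tanh(c[k]);                 // new hidden state
    }
}
```

The additive update of `c`, rather than repeated multiplication through a nonlinearity, is the usual explanation for why LSTMs cope better with vanishing gradients over long sequences.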
DEEP-LEARNING RESEARCH ROVER: TURBO 2.0
github.com/dusty-nv
Deep Learning Servers (including libraries, datasets, network architectures and models)
Request preprocessing and result-return layer
User interface (Web + API support)
Image Analysis | Audio Analysis | Data Analysis | Customer-Specific Analysis
Input: Image, Video, Audio (signal), Data
Output: Real-time classified or interpreted results
Open Zeka Architecture
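The layered design above (a Web/API front end, a request-preprocessing and result-return layer, and analysis services per input type) can be sketched as a small dispatcher. This is purely hypothetical C++ for illustration: `Router`, `InputKind`, `Request`, `Result` and the analyzer callbacks are invented names, and the slide does not describe the actual Open Zeka service at this level.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical request types mirroring the inputs listed on the slide.
enum class InputKind { Image, Video, Audio, Data };

struct Request { InputKind kind; std::string payload; };
struct Result  { std::string label; };

// Hypothetical routing layer: validates a request, dispatches it to the
// analysis service registered for its input kind, and returns the result.
class Router {
public:
    using Analyzer = std::function<Result(const std::string&)>;

    void registerAnalyzer(InputKind kind, Analyzer fn) { analyzers_[kind] = std::move(fn); }

    Result handle(const Request& req) const {
        if (req.payload.empty()) throw std::invalid_argument("empty payload");  // preprocessing check
        auto it = analyzers_.find(req.kind);
        if (it == analyzers_.end()) throw std::runtime_error("no analyzer for this input kind");
        return it->second(req.payload);  // result goes back to the Web/API layer
    }

private:
    std::map<InputKind, Analyzer> analyzers_;
};
```

Keeping the analyzers behind a single dispatch point is one way to let new analysis types (for example, a customer-specific model) be added without changing the front end.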
Open Zeka API
Embedded Systems over the GPU and CPU Cloud
Jetson TX1 / TK1, Raspberry Pi 3 (testing in progress)
Frame Conversion | Audio Decomposition
What does the image say? What does the audio say?
Image
Sources | Type | Model
Photo
Video frame
RGB
Thermal (LWIR/SWIR)
Monochrome
Object Detection
Face Recognition
Concept
Concept
MSI/HSI
Audio
Text/Data
Data
Open Zeka Service
Delivering image, audio and data analysis to end users in the cloud, at a level close to human perception
Model hosting service (developer interface support)
Algorithm development and hosting service (flexible architecture)
Where It Will Be Used
• Real-time interpretation of camera footage (still images and streams)
• Entertainment industry
• Driver assistance systems
• Autonomous and robotic systems (embedded technology)
• Bringing artificial intelligence to sensor-based architectures in the defense industry (decision support systems)
• Image and data analysis in healthcare
• Big data analysis (finance)
Real-time analysis of security cameras in the cloud
Open Zeka is the Jetson TX1 supplier for Turkey.
Turkey Deep Learning Group page: https://www.linkedin.com/grp/home?gid=8334641
Ankara Deep Learning Meetup page: http://www.meetup.com/Ankara-Deep-Learning
Deep Learning Group page: https://www.facebook.com/groups/derin.ogrenme
http://www.derinogrenme.com
“If we knew what it was we were doing, it would not be called research, would it?” (Einstein)
THANK YOU.