Deep Learning Applications on Embedded Systems


Ferhat Kurt https://embedded.openzeka.com

http://openzeka.com/



AI MILESTONES

• Microsoft & Google: “Superhuman” Image Recognition
• Microsoft: “Super Deep Network”
• Berkeley’s BRETT: End-to-End Reinforcement Learning
• Deep Speech 2: One network, 2 languages
• A New Computing Model Hits Pop Culture
• AlphaGo: Rivals a World Champion
• TU Delft: Deep-Learning Amazon Picking Champion

NVIDIA GPU: MORE THAN GRAPHICS

Graphics | GPU Compute | Deep Learning and Computer Vision

GPUs deliver superior performance and efficiency

Integrated perception and deep learning enable autonomy

THE RISE OF AUTONOMOUS MACHINES

New use cases that require autonomy

PIONEERING JETSON TECHNOLOGY
Powering the Next Generation of Autonomous Machines

Jetson TX1: A Supercomputer on a Module

• Unmatched performance under 10 W
• Advanced technology for autonomous machines
• Smaller than a credit card

JETSON TX1
System on Module

GPU: 1 TFLOP/s, 256-core Maxwell
CPU: 4x 64-bit ARM A57 CPUs | 1.6 GHz
Memory: 4 GB LPDDR4 | 25.6 GB/s
Video decode: 4K 60 Hz H.264
Video encode: 4K 30 Hz H.264
CSI: Up to 6 cameras | 1400 Mpix/s
Display: 2x DSI, 1x eDP 1.4, 1x DP 1.2/HDMI
Wi-Fi: 802.11 2x2 ac
Networking: 1 Gigabit Ethernet
PCI-E: Gen 2 1x1 + 1x4
Storage: 16 GB eMMC, SDIO, SATA
Other: 3x UART, 3x SPI, 4x I2C, 4x I2S, GPIOs
Power: 10-15 W, 6.6 V-19.5 V DC
Size: 50 mm x 87 mm
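The headline GPU and memory figures in the table can be sanity-checked with simple arithmetic. The sketch below is illustrative only. The GPU clock of roughly 1 GHz, the 2 FLOPs per core per cycle for a fused multiply-add, the doubling for packed half precision, and the 64-bit LPDDR4 bus at 3200 MT/s are all assumptions that do not appear on the slide.

```cpp
#include <cassert>

// Peak throughput in GFLOP/s = cores * clock (GHz) * FLOPs per core per cycle.
// The inputs used in the usage note are assumptions, not values from the slide.
double peakGflops(int cudaCores, double clockGHz, int flopsPerCorePerCycle) {
    assert(cudaCores > 0 && clockGHz > 0.0 && flopsPerCorePerCycle > 0);
    return cudaCores * clockGHz * flopsPerCorePerCycle;
}

// Peak memory bandwidth in GB/s = bus width (bytes) * transfer rate (MT/s) / 1000.
double peakBandwidthGBs(int busWidthBits, double megaTransfersPerSec) {
    assert(busWidthBits > 0 && megaTransfersPerSec > 0.0);
    return (busWidthBits / 8.0) * megaTransfersPerSec / 1000.0;
}
```

Under these assumptions, peakGflops(256, 1.0, 4) gives 1024 GFLOP/s, which is consistent with the 1 TFLOP/s headline if that figure refers to half precision, and peakBandwidthGBs(64, 3200.0) gives about 25.6 GB/s.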

Jetson TX1 Developer Kit

• Jetson TX1 Developer Board
• 5MP Camera
• DIGITS Workflow
• VisionWorks
• Jetson Multimedia SDK

and other technologies:

CUDA, Linux4Tegra, NSIGHT EE, OpenCV4Tegra, OpenGL, Vulkan, System Trace, Visual Profiler, Ubuntu 14.04

NVIDIA JETPACK
JETSON SDK: DETAILS

Deep Learning SDK | Vision | Machine Learning | Graphics

Compute (CUDA): cuSPARSE, cuSolver, cuFFT, cuBLAS, NPP, cuRAND, Thrust, CUDA Math Library

Tools: NVTX (NVIDIA Tools eXtension), Source code editor, Debugger, Profiler, System Trace

Vertically Integrated Packages: V4L2, libjpeg

Linux for Tegra

Jetson TX1

VISIONWORKS™

CUDA-accelerated Computer Vision Toolkit

• Full OpenVX 1.1 implementation
• Easy integration with existing CV pipelines
• Custom extensions

Software stack (top to bottom): Applications, VisionWorks, CUDA, Jetson TX1

VisionWorks™ Toolkit

Application areas: Robotics, Augmented Reality, Drones

Example Applications

• Feature Tracking
• Structure from Motion
• Object Tracking
• Dense Optical Flow

VisionWorks™ API + Frameworks

IMAGE ARITHMETIC: Absolute Difference, Accumulate Image, Accumulate Squared, Accumulate Weighted, Add / Subtract / Multiply, Channel Combine, Channel Extract

GEOMETRIC TRANSFORMS: Affine Warp + Perspective Warp, Flip Image, Gaussian Pyramid, Remap, Scale Image

FEATURES: Canny Edge Detector, Fast Corners + Fast Track, Harris Corners + Harris Track, Hough Circles, Hough Lines

Comprehensive Developer Platform

• JetPack SDK
• Libraries
• Developer tools
• Design collateral
• Developer Forum
• Training and Tutorials
• Ecosystem

http://developer.nvidia.com/embedded-computing

GETTING STARTED

JETSON COMMUNITY

• Developer Forums: devtalk.nvidia.com
• eLinux Wiki: eLinux.org/Jetson_TX1

THE PERIPHERALS JETSON CONNECTS WITH
including Community Contributions

• Infrared devices:
  • SICK LIDAR (LMS 200); Hokuyo; rpLIDAR
  • Asus Xtion Pro Live (PrimeSense)
  • Intel RealSense (multiple generations)
• Stereo and color cameras:
  • StereoLabs ZED (consumer-oriented)
  • Point Grey Research USB3 and GigE
  • e-con Systems CSI-MIPI cameras with external ISP

EMBEDDING THE JETSON TX1 MODULE
Modular Ecosystem

• ConnectTech Orbitty
• ConnectTech Rosie
• Auvidea J120
• Colorado Engineering TX1-SOM

TX1 MODULE

Deploying Real-Time Deep Learning Networks with GPU Inference Engine

HOW FAR ARE WE FROM AUTONOMY?
ImageNet classification accuracy

Year   Accuracy
2010   72%
2011   74%
2012   84%
2013   88%
2014   93%
2015   96.4%

H: 94.9%

Deep Learning on GPU

DEEP LEARNING
What's the Difference?

Deep Learning: DNN + Data + HPC

Traditional Computer Vision: Experts + Time

A NEW COMPUTING MODEL

Autonomous Machines: Onboard Intelligence

• Object Classification
• Segmentation
• Collision Avoidance
• 3D Reconstruction
• Localization/Mapping

POWERING THE DEEP LEARNING ECOSYSTEM
NVIDIA SDK Accelerates Every Major Framework

COMPUTER VISION: Object Detection, Image Classification
SPEECH AND AUDIO: Voice Recognition, Language Translation
NATURAL LANGUAGE PROCESSING: Recommendation Engines, Sentiment Analysis

DEEP LEARNING FRAMEWORKS: Mocha.jl

NVIDIA DEEP LEARNING SDK: cuDNN, cuBLAS, cuSPARSE, NCCL, GIE

developer.nvidia.com/deep-learning-software

A COMPLETE COMPUTE PLATFORM
MANAGE | TRAIN | DEPLOY

DIGITS: MANAGE / AUGMENT, TRAIN, TEST

GPU INFERENCE ENGINE: DATACENTER, AUTOMOTIVE, EMBEDDED

NVIDIA DIGITS
Interactive Deep Learning GPU Training System

Data Processing | DNN Configuration | Process Monitoring | Visualization

Test Image

developer.nvidia.com/digits

FIRST Team 900 ZEBRACORNS: ROBUST DATA COLLECTION

team900.org

GPU INFERENCE ENGINE
Workflow

DIGITS -> NEURAL NETWORK -> OPTIMIZATION ENGINE -> PLAN -> EXECUTION ENGINE

developer.nvidia.com/gpu-inference-engine

NVIDIA GPU Inference Engine (GIE) provides even higher efficiency and performance for neural network inference. Tests performed using GoogLeNet. CPU-only: Single-socket Intel Xeon (Haswell) E5-2698 [email protected] with HT. GPU: NVIDIA Tesla M4 + cuDNN 5 RC. GPU + GIE: NVIDIA Tesla M4 + GIE.

GPU INFERENCE ENGINE
Optimizations

• Fuse network layers
• Eliminate concatenation layers
• Kernel specialization
• Auto-tuning for target platform
• Select optimal tensor layout
• Batch size tuning

TRAINED NEURAL NETWORK -> OPTIMIZED INFERENCE RUNTIME

developer.nvidia.com/gpu-inference-engine

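Graph Optimization

Figure: unoptimized module. The input feeds four branches (1x1 conv.; 1x1 conv. then 3x3 conv.; 1x1 conv. then 5x5 conv.; max pool then 1x1 conv.). Each convolution is followed by separate bias and relu nodes. The branch outputs are joined by concat and passed to the next input.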

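Graph Optimization: Vertical fusion

Figure: each convolution, bias, and relu sequence is fused into a single CBR node. The module becomes input -> {1x1 CBR; 1x1 CBR then 3x3 CBR; 1x1 CBR then 5x5 CBR; max pool then 1x1 CBR} -> concat -> next input.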
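The vertical fusion step can be illustrated on the CPU. The following is a minimal sketch, not GIE code: the `Conv1x1` struct and both function names are hypothetical, and a 1x1 convolution is reduced to a per-pixel matrix-vector product. It shows that running convolution, bias, and ReLU as three passes or as one fused pass gives the same values, while the fused version avoids re-reading the intermediate buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1x1 convolution applied to a single pixel.
// weights is row-major [outChannels][inChannels]; bias has outChannels entries.
struct Conv1x1 {
    std::size_t inChannels;
    std::size_t outChannels;
    std::vector<float> weights;
    std::vector<float> bias;
};

// Unfused: three separate passes, each reading and writing the whole buffer.
std::vector<float> convBiasReluUnfused(const Conv1x1& layer, const std::vector<float>& in) {
    assert(in.size() == layer.inChannels);
    std::vector<float> out(layer.outChannels, 0.0f);
    for (std::size_t o = 0; o < layer.outChannels; ++o)          // convolution
        for (std::size_t i = 0; i < layer.inChannels; ++i)
            out[o] += layer.weights[o * layer.inChannels + i] * in[i];
    for (std::size_t o = 0; o < layer.outChannels; ++o)          // bias
        out[o] += layer.bias[o];
    for (std::size_t o = 0; o < layer.outChannels; ++o)          // relu
        out[o] = std::max(out[o], 0.0f);
    return out;
}

// Fused "CBR": one pass; the accumulator stays in a local variable
// until the final activated value is written once.
std::vector<float> convBiasReluFused(const Conv1x1& layer, const std::vector<float>& in) {
    assert(in.size() == layer.inChannels);
    std::vector<float> out(layer.outChannels);
    for (std::size_t o = 0; o < layer.outChannels; ++o) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < layer.inChannels; ++i)
            acc += layer.weights[o * layer.inChannels + i] * in[i];
        out[o] = std::max(acc + layer.bias[o], 0.0f);
    }
    return out;
}
```

On a GPU the benefit comes from launching one kernel instead of three and from keeping intermediate values out of global memory. This CPU version only mirrors the data flow.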

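Graph Optimization: Horizontal fusion

Figure: the 1x1 CBR nodes that read the same input are merged into one 1x1 CBR node. The remaining nodes are input, the merged 1x1 CBR, 3x3 CBR, 5x5 CBR, max pool followed by 1x1 CBR, concat, and next input.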
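Horizontal fusion can be pictured as stacking the weights of sibling layers. The sketch below is an illustration under assumptions, not the GIE implementation: `PointwiseLayer`, `runCbr`, and `mergeHorizontally` are hypothetical names, and each layer is a per-pixel 1x1 convolution with bias and ReLU. Layers that read the same input are merged into one wider layer whose output is the concatenation of the individual outputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-pixel 1x1 layer; weights is row-major [outChannels][inChannels].
struct PointwiseLayer {
    std::size_t inChannels;
    std::size_t outChannels;
    std::vector<float> weights;
    std::vector<float> bias;
};

// Fused convolution + bias + ReLU for one pixel.
std::vector<float> runCbr(const PointwiseLayer& layer, const std::vector<float>& in) {
    assert(in.size() == layer.inChannels);
    std::vector<float> out(layer.outChannels);
    for (std::size_t o = 0; o < layer.outChannels; ++o) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < layer.inChannels; ++i)
            acc += layer.weights[o * layer.inChannels + i] * in[i];
        out[o] = std::max(acc + layer.bias[o], 0.0f);
    }
    return out;
}

// Merge layers that share an input into a single wider layer by
// appending their weight rows and biases in order.
PointwiseLayer mergeHorizontally(const std::vector<PointwiseLayer>& layers) {
    assert(!layers.empty());
    PointwiseLayer merged{layers.front().inChannels, 0, {}, {}};
    for (const PointwiseLayer& layer : layers) {
        assert(layer.inChannels == merged.inChannels);
        merged.outChannels += layer.outChannels;
        merged.weights.insert(merged.weights.end(), layer.weights.begin(), layer.weights.end());
        merged.bias.insert(merged.bias.end(), layer.bias.begin(), layer.bias.end());
    }
    return merged;
}
```

Running the merged layer once reads the input a single time and produces all branch outputs in one call. Downstream layers then consume their own channel ranges of the merged output.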

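Graph Optimization: Concat elision

Figure: the concat node is removed. The remaining nodes are input, 1x1 CBR, 3x3 CBR, 5x5 CBR, max pool followed by 1x1 CBR, and next input; branch outputs go directly to the next input.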
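One way to picture concat elision is to let every branch write straight into its slice of a preallocated output buffer, so no copy step is needed. This is a minimal sketch under that assumption. `Branch`, `runBranch`, and the two driver functions are hypothetical, and each branch simply fills its channels with a constant to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical branch that produces `channels` output values.
struct Branch {
    std::size_t channels;
    float value;
};

// Writes the branch output into a caller-provided destination.
void runBranch(const Branch& branch, float* dst) {
    for (std::size_t c = 0; c < branch.channels; ++c)
        dst[c] = branch.value;
}

// With an explicit concat: each branch fills a temporary buffer,
// which is then copied into the joined output.
std::vector<float> runWithConcat(const std::vector<Branch>& branches) {
    std::vector<float> joined;
    for (const Branch& branch : branches) {
        std::vector<float> tmp(branch.channels);
        runBranch(branch, tmp.data());
        joined.insert(joined.end(), tmp.begin(), tmp.end());
    }
    return joined;
}

// Concat elided: the output is allocated once and each branch
// writes directly at its channel offset.
std::vector<float> runConcatElided(const std::vector<Branch>& branches) {
    std::size_t total = 0;
    for (const Branch& branch : branches)
        total += branch.channels;
    std::vector<float> out(total);
    std::size_t offset = 0;
    for (const Branch& branch : branches) {
        runBranch(branch, out.data() + offset);
        offset += branch.channels;
    }
    assert(offset == total);
    return out;
}
```

Both functions return the same values; the second avoids the temporary buffers and the extra copy.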

Autotuning
Choose the fastest kernel for each layer

• Baseline is cuDNN / cuBLAS
• Direct convolution kernels for small batch
• Custom Winograd & Implicit GEMM for Half2
• Custom Deconvolution for filter size == stride case
• Weight pre-transform for Winograd
• Optimal T/N choice for BLAS
• Run cudnnFindForwardConvolutionEx() with multiple iterations
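The autotuning idea, timing several candidate implementations and keeping the fastest, can be sketched generically. The code below is an assumption-laden illustration rather than the GIE mechanism: it times plain CPU callables with `std::chrono`, it does not call cuDNN, and the names `Kernel`, `averageMillis`, `indexOfFastest`, and `autotune` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// A candidate implementation of one layer (hypothetical stand-in for a GPU kernel).
using Kernel = std::function<void()>;

// Average wall-clock time per run in milliseconds, measured over several iterations
// to smooth out noise. Real GPU kernels run asynchronously, so a GPU version would
// need to synchronize or use device-side timers before reading the clock.
double averageMillis(const Kernel& kernel, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        kernel();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}

// Index of the smallest timing; ties resolve to the earliest candidate.
std::size_t indexOfFastest(const std::vector<double>& timings) {
    assert(!timings.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < timings.size(); ++i)
        if (timings[i] < timings[best])
            best = i;
    return best;
}

// Time every candidate and return the index of the fastest one.
std::size_t autotune(const std::vector<Kernel>& candidates, int iterations) {
    std::vector<double> timings;
    for (const Kernel& kernel : candidates)
        timings.push_back(averageMillis(kernel, iterations));
    return indexOfFastest(timings);
}
```

In this picture the selection would run once per layer on the target device, and the chosen index would be recorded so that inference does not repeat the measurement.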

Build: Importing a Caffe Model

    // create the network definition
    INetworkDefinition* network = infer->createNetwork();

    // create a map from caffe blob names to GIE tensors
    std::unordered_map<std::string, infer1::Tensor> blobNameToTensor;

    // populate the network definition and map
    CaffeParser* parser = new CaffeParser;
    parser->parse(deployFile, modelFile, *network, blobNameToTensor);

    // tell GIE which tensors are required outputs
    for (auto& s : outputs)
        network->setOutput(blobNameToTensor[s]);

Build: Engine Creation

    // Specify the maximum batch size and scratch size
    CudaEngineBuildContext buildContext;
    buildContext.maxBatchSize = maxBatchSize;
    buildContext.maxWorkspaceSize = 1 << 20;

    // create the engine
    ICudaEngine* engine =
        infer->createCudaEngine(buildContext, *network);

    // serialize to a C++ stream
    engine->serialize(gieModelStream);

Runtime: Binding Buffers

    // get array bindings for input and output
    int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME),
        outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);

    // set array of input and output buffers
    void* buffers[2];
    buffers[inputIndex] = gpuInputBuffer;
    buffers[outputIndex] = gpuOutputBuffer;

Runtime: Running the Engine

    // Specify the batch size
    CudaEngineContext context;
    context.batchSize = batchSize;

    // add GIE kernels to the given stream
    engine->enqueue(context, buffers, stream, NULL);

    <…>

    // wait on the stream
    cudaStreamSynchronize(stream);

NVIDIA Deep Learning Institute
Hands-on Training for Data Scientists and Software Engineers

• Training organizations and individuals to solve challenging problems using Deep Learning
• On-site workshops and online courses presented by certified experts
• Covering complete workflows for proven application use cases: image classification, object detection, natural language processing, recommendation systems, and more

http://www.nvidia.com/dli

Deep Reinforcement Learning

PLAYING ATARI WITH DEEPMIND

From Pixels to Actions: Human-level control through Deep Reinforcement Learning

http://googleresearch.blogspot.com/2015/02/from-pixels-to-actions-human-level.html

http://arxiv.org/abs/1602.01783




Inside Google’s DeepMind AlphaGo GPU cluster


END-TO-END LEARNING

Diagram labels: Sensory Inputs, Perceptron, RNN, Recognition, Inference, Goal/Reward, user task, Short-term, Long-term, Motor PWM

MOTION CONTROL

AUTONOMOUS NAVIGATION

SIMULATION
Physical Intuition

• OpenAI Gym
• Gazebo
• Unreal4Torch
• PhysX
• Others

Q-LEARNING
How does it work?

• A reinforcement learning agent includes: state (environment), actions (controls), and reward (feedback)
• A value function predicts the future reward of performing actions in the current state
• Given the most recent state, the action with the maximum estimated future reward is chosen for execution
• For agents with complex state spaces, deep networks are used as the Q-value approximator
• A numerical solver (gradient descent) optimizes the network on the fly based on reward inputs
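The update behind these bullets can be shown with a small table instead of a deep network. The sketch below is a deliberate simplification and not the approach on the slide, which uses a network as the approximator. It implements the standard tabular rule Q(s,a) += alpha * (r + gamma * max Q(s',a') - Q(s,a)) with greedy action selection. The class name `QTable` and the parameters `alpha` (learning rate) and `gamma` (discount factor) are illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal tabular Q-function: one value per (state, action) pair, initialized to zero.
class QTable {
public:
    QTable(std::size_t states, std::size_t actions)
        : actions_(actions), q_(states * actions, 0.0) {
        assert(states > 0 && actions > 0);
    }

    double value(std::size_t state, std::size_t action) const {
        return q_[state * actions_ + action];
    }

    // Greedy policy: the action with the highest estimated future reward.
    // Ties resolve to the lowest action index.
    std::size_t bestAction(std::size_t state) const {
        const auto first = q_.begin() + static_cast<std::ptrdiff_t>(state * actions_);
        const auto last = first + static_cast<std::ptrdiff_t>(actions_);
        return static_cast<std::size_t>(std::max_element(first, last) - first);
    }

    // One Q-learning step: move Q(s,a) toward reward + gamma * max_a' Q(s',a').
    void update(std::size_t state, std::size_t action, double reward,
                std::size_t nextState, double alpha, double gamma) {
        const double target = reward + gamma * value(nextState, bestAction(nextState));
        double& q = q_[state * actions_ + action];
        q += alpha * (target - q);
    }

private:
    std::size_t actions_;
    std::vector<double> q_;
};
```

A deep Q-network replaces the table lookup with a forward pass and replaces the in-place update with a gradient step on the squared difference between the prediction and the same target.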

LSTM ACCELERATION
Launch a 2D grid of RNN cells

• Multiple layers in a single call are faster
• Doesn’t suffer from vanishing gradient
• Able to adopt long-term strategy

Supports:

• Partially-observable environments
• Uni/Bidirectional RNNs
• Non-uniform length minibatches
• Dropout between layers

DEEP-LEARNING RESEARCH ROVER: TURBO 2.0

github.com/dusty-nv

Open Zeka Architecture

• Deep learning servers (contain the libraries, datasets, network architectures, and models)
• Request preprocessing and result return layer
• User interface (Web + API support)

Analysis types: Image analysis, Audio analysis, Data analysis, Customer-specific analysis pipeline

Input: Image, Video, Audio (signal), Data
Output: Real-time classified or interpreted output

Open Zeka API

Embedded Systems on a GPU and CPU Cloud

Jetson TX1-TK1 | Raspberry Pi 3 | Testing in progress

Frame conversion | Audio decomposition

What does the image say? | What does the audio say?

Image

Sources: Photo, Video frame
Type: RGB, Thermal (LWIR/SWIR), Monochrome, MSI/HSI
Model: Object Detection, Face Recognition, Concept

Audio

Text/Data

Data

Open Zeka Service

• Delivering image, audio, and data analysis to end users in the cloud at a level close to human perception
• Model hosting service (developer interface support)
• Algorithm development and hosting service (flexible architecture)

Where It Will Be Used

• Real-time interpretation of camera imagery (still images and streams)
• Entertainment industry
• Driver assistance systems
• Autonomous and robotic systems (embedded technology)
• Adding artificial intelligence to sensor-based architectures in the defense industry (decision support systems)
• Image and data analysis in healthcare
• Big data analysis (finance)
• Real-time analysis of security cameras in the cloud

Open Zeka is the Jetson TX1 supplier in Turkey.

https://embedded.openzeka.com/

Turkey Deep Learning Group page: https://www.linkedin.com/grp/home?gid=8334641

Ankara Deep Learning Meetup page: http://www.meetup.com/Ankara-Deep-Learning

Deep Learning group page: https://www.facebook.com/groups/derin.ogrenme

http://www.derinogrenme.com

“If we knew what it was we were doing, it would not be called research, would it?”
Einstein

THANK YOU.


