Dong Li
Seaway Technology Inc.
ICT, CAS
2019-11-15
Towards Benchmarking AIOT Device based on MCU
Bench19 Seaway tech. 2
Outline
Contents
MCU-based AIOT Device and Benchmarking
SeawayRTOS Intro. & Auditing Kernel
BenchMarking Goal and Method
Early Experiments for BenchMarking
01 MCU-based AIOT Device and Benchmarking
MCU-based AIOT Device
1. Tiny smart devices with computing ability are already cheap and everywhere.
2. The future of machine learning will be tiny.
MCUs and Sensors Are Already in the Milliwatt Range
6 in. display: 400 mW
4G cell radio: 800 mW
LP BLE4.0 & WIFI: 100 mW
Gyroscope sensor: 130 mW
GPS: 180 mW
1/4 CMOS camera: 300 mW
- ARM & Princeton [arXiv:1905.12107]
Deep Learning Works Well and Is Energy-Efficient on MCUs
1. ARM CMSIS-5 for Cortex M
- CMSIS-NN
- uTensor
2. TensorFlow Lite For MCU
- Person detection
- Speech Keyword spotting
- Classify physical gestures
3. Microsoft Embedded Learning Library (ELL)
Example boards:
- ESP32 SoC with WIFI and BLE
- SparkFun Edge with Apollo3
- Nordic nRF52840 with BLE
- STM32F746 Discovery kit
Existing MCUs and New AIOT Low-Power Processors
Existing MCU/DSP
1. MCU clock 40~200 MHz
2. RAM (SDRAM) 32KB ~ 512KB
3. ROM (Flash) 512KB ~ 1MB
4. Energy ~100 uA/MHz (1.2V - 5V)
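As a rough worked example of the ~100 uA/MHz figure: active current scales with clock frequency, and power is current times supply voltage. The sketch below assumes a purely linear model with no static or peripheral current. The function name and the 3.3 V supply are illustrative choices, not values from the slides.

```python
def mcu_active_power_mw(freq_mhz, volts, ua_per_mhz=100.0):
    """Estimate active power in milliwatts from a uA/MHz rating.

    Assumption: current is linear in frequency (no leakage, no peripherals).
    """
    current_ua = ua_per_mhz * freq_mhz      # microamps
    return current_ua * volts / 1000.0      # uA * V = uW; divide by 1000 for mW


# 3.3 V is an assumed example supply inside the 1.2V - 5V range.
for f in (40, 200):
    print(f"{f} MHz @ 3.3 V: {mcu_active_power_mw(f, 3.3):.1f} mW")
```

Under these assumptions a 40 MHz part draws about 13 mW and a 200 MHz part about 66 mW, which is the same order of magnitude as the sensor and radio figures listed earlier.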
New AIOT Processor (MCU/DSP+NPU)
1. MCU+NPU by ARM or RISC-V
2. MCU+DSP+ Spec. NN Accelerator by ARM/RISC-V/FPGA
3. MCU+PIM (Processing in Memory) chip
Examples:
- ESP32 with TFLite for face recognition
- ICT RISC-V MCU+NPU FPGA board
Benchmarking Goal: The Best Shape
[Figure: radar chart with axes Accuracy, Energy Consumption (picojoules per op), Max RAM, Cost, Max ROM, Computing Performance]
A spindle-shaped profile is the best shape.
02 SeawayRTOS Intro. & Auditing Kernel
SeawayRTOS for AIOT Devices
KB-Level Runtime
- Online AIOT App Store
- Support JavaScript and Python
- ROM<100K, RAM<2K
KB-Level EdgeStack
- Function Migration
- Support for MQTT, CoAP and HTTP
- WIFI, BLE, LoRA, NB-IOT and Zigbee
- ROM<32K, RAM<2K
- Resp to Req <200 ms
KB-Level Seaway RTOS Kernel
- Auditing Kernel
- Active Sleep Mode
- ROM<10K & RAM<1K & TCB<10B
- Task Fail Rate <0.1%
[Figure: architecture diagram. Hardware: Little Core (Sensor Hub), Big Core (OS), AI Core (Inference), Memory Controller, Comm. Controller, Sensors, Actuators, Data/Ins. Bus, I/O Bus. Software: HAL & BSP, Seaway Kernel, EdgeStack, Seaway Runtime, AIOT Framework, Energy Opt., Files, Apps.]
Seaway Runtime
Technical Features
1. AIOT App Store
- AIOT apps execute without being written to local storage
- Quasi-single-machine programming for the edge domain
2. AIOT Runtime Development
- On Kernel: native C/C++
- On Runtime: JavaScript/Python
- Dynamic task allocation and execution
3. Less code than traditional embedded programs
Experiment result (by ECMA-262 benchmark):
Evaluation index | WebletScript | JerryScript | Duktape | Espruino
Compatibility (%) | 58.6 | 99.7 | 99.4 | 66.5
Footprint (KB) | 80 | 168 | 184 | 231
Seaway EdgeSuite
End AIOT Device: Seaway RTOS
Edge AIOT Device: Seaway Edge
Cloud: Seaway Cloud
The developer now needs only one application for the whole end-edge-cloud system.
Auditing Kernel Design
Design Goals
- Enable kernel information monitoring for an event-driven RTOS
  - Should be in the kernel
- A lightweight resource auditing tool
  - Less than 1KB ROM and 1KB RAM
- Early security warning when an abnormal resource usage pattern is captured
Auditing Kernel Design
- Process
  - Confirm the execution entity of a task
  - Locate the executable code segment
- Event
  - The event statistics of a task in the kernel
  - Identify abnormal event usage
- Hardware resource usage
  - Quantity and pattern of hardware resource consumption, including processor, memory, radio and sensors
Seaway Resource Auditing Overview
1. The Resource Auditor module collects the running information and generates the log data of an AIoT device.
2. Seaway analyzes the log data on edge devices according to the corresponding resource usage model.
3. The AIoT devices receive the performance status.
Kernel Auditing Architecture
- Data Hook
  - Process-Event Model
  - Hardware Time-Base Model
- Data Processing Module
- Warning Handling Module
- Kernel inner loop function
  - The entity of a task
  - The executable code segment
  - Set up hooks in basic kernel functions such as do_poll / do_event
  - Save the data in the local file system
  - Or send it out to the gateway for analysis
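The hook placement described here can be illustrated with a toy model. This is only a sketch, not SeawayRTOS code: the real hooks would live in C inside the kernel, and apart from the names do_poll and do_event, every class, method and event name below is a hypothetical stand-in.

```python
from collections import Counter


class Auditor:
    """Collects (process, event) counts; a stand-in for the data hook."""

    def __init__(self):
        self.counts = Counter()

    def record(self, process, event):
        self.counts[(process, event)] += 1

    def flush(self):
        """Return the counts of the finished period and start a new one."""
        snapshot, self.counts = dict(self.counts), Counter()
        return snapshot


class ToyKernel:
    """Hypothetical event-driven kernel loop with audit hooks."""

    def __init__(self, auditor):
        self.auditor = auditor
        self.handlers = {}        # process name -> handler(event)
        self.event_queue = []     # pending (process, event) pairs
        self.poll_requests = []   # processes that asked to be polled

    def register(self, process, handler):
        self.handlers[process] = handler

    def post(self, process, event):
        self.event_queue.append((process, event))

    def request_poll(self, process):
        self.poll_requests.append(process)

    def do_poll(self):
        while self.poll_requests:
            process = self.poll_requests.pop(0)
            self.auditor.record(process, "poll")    # audit hook
            self.handlers[process]("poll")

    def do_event(self):
        if self.event_queue:
            process, event = self.event_queue.pop(0)
            self.auditor.record(process, event)     # audit hook
            self.handlers[process](event)

    def run_once(self):
        self.do_poll()
        self.do_event()


auditor = Auditor()
kernel = ToyKernel(auditor)
kernel.register("tcpip", lambda event: None)
kernel.post("tcpip", "packet_in")
kernel.request_poll("tcpip")
kernel.run_once()
log = auditor.flush()   # per-period log, ready to store locally or send to a gateway
```

Recording inside the two dispatch functions keeps the auditor out of application code, so a task cannot bypass it without bypassing the scheduler itself.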
Capture the kernel data for hardware resources
- Hardware resource scheduling
  - Quantity and pattern of the consumption of events and tasks
Category | Component | Parameter | Kernel Events
Network Data Package | Network | wifi_init_result | WiFi init
 | | wifi_mode | WiFi set_mode
 | | wifi_state | WiFi On/Off
 | | source | source IP
 | | destination | destination IP
 | | package_transfer |
System Scheduling Data | Task Information | taskID | xTaskCreate
 | | task_running_frequency | portYIELD, xPortSysTickHandler
Hardware Module Usage | CPU | CPU_Frequency | CPU frequency switch
 | Sensors | enviroment_data | sensor_get_data
 | | Sensors_Frequency | sensor frequency switch
03 Experiments for Getting Bench Score
Experiment Setup
- SeawayRTOS
  - An event-driven scheduling system
  - Multi-threaded
  - Lightweight threading technology: Protothreads
  - File system (Coffee)
  - Network support: LwIP
  - OTA
- CC2538 + ESP32
  - An ARM Cortex-M3 with up to 32MHz clock speed
  - 32KB of RAM
  - 256KB flash
  - Zigbee in CC2538
  - WIFI/BLE in ESP32
EVALUATION
We capture the kernel event data and process information of a benchmark task running on SeawayRTOS.
The analysis result of the TCP/IP experiment with the Process-Event Model
- The Process-Event Analysis Result
  - Operations in periods 1056 & 1057 differ from the base behavior of this benchmarking task
  - The system is using the radio to send data
[Figure: process-event analysis plot, x-axis: period, annotated "Warning generated"]
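One simple way to compare each period against a base behavior is to flag any (process, event) pair whose count exceeds what the baseline allows. The sketch below is an assumption about how such a check could look, not the analysis actually used in the experiment. The function name, the event names and all counts are made up for illustration. Only the period numbers echo the slide.

```python
def find_abnormal_periods(period_logs, baseline, tolerance=0):
    """Return {period: [(process, event, count), ...]} for out-of-profile use.

    period_logs: {period_number: {(process, event): count}}
    baseline:    {(process, event): highest count seen in normal behavior}
    """
    warnings = {}
    for period, counts in sorted(period_logs.items()):
        extra = [
            (proc, ev, n)
            for (proc, ev), n in sorted(counts.items())
            if n > baseline.get((proc, ev), 0) + tolerance
        ]
        if extra:
            warnings[period] = extra
    return warnings


# Illustrative, made-up data: not measurements from the experiment.
baseline = {("tcpip", "packet_in"): 2, ("sensor", "timer"): 1}
period_logs = {
    1055: {("tcpip", "packet_in"): 2, ("sensor", "timer"): 1},
    1056: {("tcpip", "packet_in"): 2, ("tcpip", "radio_send"): 5},
    1057: {("tcpip", "radio_send"): 4},
}
warnings = find_abnormal_periods(period_logs, baseline)
print(sorted(warnings))
```

An event that never appears in the baseline has an allowed count of zero, so a task that suddenly starts sending over the radio is flagged in the first period in which it does so.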
The analysis result of the Time-Base Model
- The Time-Base Analysis Result
  - We obtained the working-state information of the CPU, memory, radio and sensors
  - There are suspicious operations in periods 5 & 6 compared with the normal behavior of this application
  - The system is using the radio to listen for other data
  - A warning is generated, and the task should be suspended until the administrator decides
[Figure: time-base analysis plot, x-axis: period]
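A time-based view can be derived from state-change events such as WiFi On/Off by accumulating how long a component stays on within each fixed-length period. The following sketch assumes that approach. The function names, the period length, the threshold and the timestamps are all hypothetical and only chosen so that periods 5 and 6 stand out, as on the slide.

```python
def on_time_per_period(transitions, period_len, n_periods):
    """Split ON intervals across fixed-length periods.

    transitions: time-ordered list of (timestamp, is_on) state changes.
    Returns a list with the ON time accumulated in each period.
    """
    totals = [0.0] * n_periods
    end_of_log = period_len * n_periods
    # Pair each state change with the next one; the last state lasts to the end.
    edges = list(transitions) + [(end_of_log, False)]
    for (start, is_on), (stop, _) in zip(edges, edges[1:]):
        if not is_on:
            continue
        t = start
        while t < stop:
            idx = int(t // period_len)
            if idx >= n_periods:
                break
            chunk_end = min(stop, (idx + 1) * period_len)
            totals[idx] += chunk_end - t
            t = chunk_end
    return totals


def suspicious_periods(totals, expected, margin):
    """1-based period numbers whose ON time exceeds expected + margin."""
    return [i + 1 for i, t in enumerate(totals) if t > expected + margin]


# Made-up radio trace: short bursts in periods 1-4, one long listen from t=42 to t=58.
radio = [(0, True), (1, False), (10, True), (11, False),
         (20, True), (21, False), (30, True), (31, False),
         (42, True), (58, False)]
totals = on_time_per_period(radio, period_len=10, n_periods=6)
flagged = suspicious_periods(totals, expected=1.0, margin=1.0)
```

Splitting an interval at period boundaries matters here: the single long listen contributes to both period 5 and period 6 instead of being charged entirely to the period in which it started.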
04 BenchMarking Goal and Method
1. An open-source Testbed Board with sensors and radios
the main processor
A: Low Power BLE/WIFI Module
B: MIC
C: Accelerometers
D: Temperature & Humidity
E: multi-threaded Protothreads
F: CMOS Image Sensor
G: PIR (motion) sensor
H: GPS
Run the Benchmark tasks on DataSets
- MNIST database: handwritten digits
- CIFAR-10: object classification
- Chars74K dataset: character recognition
- WeChat Audio 100 Keyword Spotting (by Seaway Tech.)
- Band Accelerometer Data (100 hours) and Band Heart Rate (100 hours): pattern recognition for DL and SVM algorithms (by Seaway Tech.)
We can provide some baseline results on these datasets with our own implementations on STM32 and ESP32.
Benchmark Design
First Satisfy:
1. Benchmark Alg. Accuracy > baseline
2. Max ROM < baseline
3. Max RAM < baseline
4. Processor Cost
Compare:
How much energy a single benchmark task costs, given in picojoules per op
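The two-stage design above (satisfy the constraints first, then compare energy) can be sketched as a gate followed by a ranking. This is a minimal illustration under stated assumptions: the record fields, units and example numbers are hypothetical, and the "Processor Cost" item is left out because the slide gives no criterion for it.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    accuracy: float     # fraction in [0, 1]
    max_rom_kb: float
    max_ram_kb: float
    energy_uj: float    # measured energy of one benchmark task, in microjoules
    ops: int            # operations executed by the task

    @property
    def pj_per_op(self):
        return self.energy_uj * 1e6 / self.ops   # 1 uJ = 1e6 pJ


def passes_gate(r, base):
    """First-satisfy checks: accuracy above, ROM and RAM below the baseline."""
    return (r.accuracy > base.accuracy
            and r.max_rom_kb < base.max_rom_kb
            and r.max_ram_kb < base.max_ram_kb)


def rank(results, base):
    """Keep results that pass the gate, lowest energy per op first."""
    return sorted((r for r in results if passes_gate(r, base)),
                  key=lambda r: r.pj_per_op)


# Made-up numbers for illustration only.
base = Result("baseline", 0.90, 512, 128, 1000.0, 10_000_000)
a = Result("A", 0.92, 400, 100, 600.0, 10_000_000)
b = Result("B", 0.95, 600, 100, 200.0, 10_000_000)   # fails the ROM check
c = Result("C", 0.91, 300, 64, 300.0, 10_000_000)
ranking = [r.name for r in rank([a, b, c], base)]
```

Gating before ranking means a submission cannot win on energy alone by giving up accuracy or exceeding the memory budget.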
Thanks
Dong Li, Seaway Technology Inc.
lidong@haiwei.tech
Comparison
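Feature | AliOS Things | Amazon FreeRTOS | Microsoft ThreadX | Seaway
License | Community edition open source | Small part open source | Closed source | Community edition open source
Base kernel footprint | 8KB | 8KB | 10KB | 8KB
Device-side application-layer protocol stack | Separate protocols, 80K | MQTT stack, 20K | Proprietary protocol, 80K | MCH integrated stack, 32KB
ML inference model support | - | Yes | Yes | Yes
Low-power control | - | - | Yes | Yes (<0.1 W)
Edge computing support | - | Yes | Yes | Yes
Native security mechanism | - | - | - | Yes
Third-party application support | Device and cloud separate | Device-cloud integrated | Device-cloud integrated | End-edge-cloud integrated
IoT cloud service | Bound to Alibaba Cloud | Bound to AWS | Bound to Azure | Free choice
AI math library support | - | Down to Cortex-A class | - | Down to Cortex-M class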