Dong Li Seaway Technology Inc. ICT, CAS 2019-11-15 Towards Benchmarking AIOT Device based on MCU

Source: benchcouncil.org/bench19/file/slides/invite2.pdf



Outline

1. MCU-based AIOT Device and Benchmarking
2. SeawayRTOS Intro. & Auditing Kernel
3. Early Experiments for Benchmarking
4. Benchmarking Goal and Method

Contents

01 MCU-based AIOT Device and Benchmarking

MCU-based AIOT Device

1. Tiny smart devices with computing ability are already cheap and everywhere.

2. The future of machine learning will be tiny.

MCU and Sensors Are Already in the Milliwatt Range

- 6 in. display: 400 mW
- 4G cell radio: 800 mW
- LP BLE 4.0 & WiFi: 100 mW
- Gyroscope sensor: 130 mW
- GPS: 180 mW
- 1/4 CMOS camera: 300 mW

- ARM & Princeton [arXiv:1905.12107]

Deep Learning Works Well and Energy-Efficiently on MCUs

1. ARM CMSIS-5 for Cortex-M
- CMSIS-NN
- uTensor

2. TensorFlow Lite for MCU
- Person detection
- Speech keyword spotting
- Classify physical gestures

3. Microsoft Embedded Learning Library (ELL)

Boards shown: ESP32 SoC (WiFi and BLE); SparkFun Edge with Apollo3; Nordic nRF52840 (BLE); STM32F746 Discovery kit

Existing MCUs and New AIOT Low Power Processor

Existing MCU/DSP

1. MCU 40~200 MHz
2. RAM (SDRAM) 32KB ~ 512KB
3. ROM (Flash) 512KB ~ 1MB
4. Energy ~100 uA/MHz (1.2V - 5V)

New AIOT Processor (MCU/DSP+NPU)

1. MCU+NPU by ARM or RISC-V
2. MCU+DSP+Spec. NN Accelerator by ARM/RISC-V/FPGA
3. MCU+PIM (Process in Memory) chip

Figures: ESP32 by TFLite for Face Recognition; ICT RISC-V MCU+NPU FPGA Board
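As a rough worked example of what the ~100 uA/MHz figure implies, the sketch below converts a current-per-MHz rating into active power. The function name and the 3.3 V supply are assumptions chosen for illustration (3.3 V simply falls inside the 1.2V - 5V range quoted on the slide); real parts vary with process, peripherals and sleep modes.

```python
def active_power_mw(freq_mhz: float, ua_per_mhz: float = 100.0, supply_v: float = 3.3) -> float:
    """Estimate active power in milliwatts from a current-per-MHz rating.

    ua_per_mhz and supply_v are illustrative assumptions, not measured values.
    """
    current_ma = ua_per_mhz * freq_mhz / 1000.0  # microamps -> milliamps
    return current_ma * supply_v                 # mA * V = mW


# Clock speeds taken from the 40~200 MHz range on the slide.
for f in (40, 100, 200):
    print(f"{f} MHz -> {active_power_mw(f):.1f} mW")
```

Under these assumptions the core alone stays in the tens of milliwatts even at 200 MHz, which is consistent with the milliwatt-range framing earlier in the deck.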

Benchmarking Goal: The Best Shape

Radar-chart axes:
- Accuracy
- Energy Consumption (picojoules per op)
- Max RAM
- Cost
- Max ROM
- Computing Performance

A spindle-shaped profile is the best shape.

02 SeawayRTOS Intro. & Auditing Kernel

SeawayRTOS for AIOT Devices

KB-level Runtime
- Online AIOT App Store
- Supports JavaScript and Python
- ROM < 100K, RAM < 2K

KB-level EdgeStack
- Function migration
- Supports MQTT, CoAP and HTTP
- WiFi, BLE, LoRa, NB-IoT and Zigbee
- ROM < 32K, RAM < 2K
- Response to request < 200 ms

KB-level Seaway RTOS Kernel
- Auditing kernel
- Active sleep mode
- ROM < 10K, RAM < 1K, TCB < 10B
- Task fail rate < 0.1%

Architecture diagram (labels):
- Hardware: Big Core (OS), Little Core (Sensor Hub), AI Core (Inference), Memory Controller, Comm. Controller, Sensors and Actuators, Data/Ins. Bus, I/O Bus
- Software: HAL & BSP; Seaway Kernel, EdgeStack, Energy Opt., Files; Seaway Runtime; AIOT Framework; Apps

Seaway Runtime

Technical features

1. AIOT App Store
- A method for executing AIOT apps without writing them to storage
- Quasi-single-machine programming for the edge domain

2. AIOT Runtime Development
- On the kernel: native C/C++
- On the runtime: JavaScript/Python
- Dynamic task allocation and execution

3. Less code than a traditional embedded program

Experiment result (by ECMA-262 benchmark)

Evaluation index | WebletScript | JerryScript | Duktape | Espruino
Compatibility (%) | 58.6 | 99.7 | 99.4 | 66.5
Footprint (KB) | 80 | 168 | 184 | 231

Seaway EdgeSuite

- End AIOT device: Seaway RTOS
- Edge AIOT device: Seaway Edge
- Cloud: Seaway Cloud

The developer now needs only one application for the whole end-edge-cloud system.

Auditing Kernel Design

Design Goals

- Enable kernel information monitoring for an event-driven RTOS
  - Should be in the kernel
- A lightweight resource auditing tool
  - Less than 1KB ROM and 1KB RAM
- Early security warning when an abnormal resource usage pattern is captured

Auditing Kernel Design

- Process
  - Confirm the execution entity of a task
  - Locate the executable code segment
- Event
  - The event statistics data of a task in the kernel
  - Identify abnormal event usage
- Hardware resource usage
  - Quantity and pattern of the consumption of hardware resources, including processor, memory, radio and sensors

Seaway Resource Auditing Overview

1. The Resource Auditor module collects the running information and generates the log data of an AIoT device.

2. Seaway analyzes the log data on edge devices according to the corresponding resource usage model.

3. The AIoT devices receive the performance status.

Kernel Auditing Architecture

- Data Hook
  - Process-Event Model
  - Hardware Time-Base Model
- Data Processing Module
- Warning Handle Module

- Kernel inner loop function
  - The entity of a task
  - The executable code segment
  - Set up hooks in basic kernel functions such as do_poll / do_event
  - Save the data in the local file system
  - Or send it out to the gateway for analysis
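The hook idea above can be sketched in a few lines. This is only an illustration of the pattern, not SeawayRTOS code: the slide names `do_poll` and `do_event` as hook points, but their signatures, the `audit_hook` decorator, the record fields and the in-memory `audit_log` (standing in for the local file system or the gateway uplink) are all hypothetical.

```python
import functools
import time

# Hypothetical sink: stands in for the local file system or a gateway uplink.
audit_log = []


def audit_hook(event_name):
    """Wrap a kernel-style function so that every call appends one audit record."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(task_id, *args, **kwargs):
            audit_log.append(
                {"t": time.monotonic(), "task": task_id, "event": event_name}
            )
            return func(task_id, *args, **kwargs)
        return wrapper
    return decorate


@audit_hook("poll")
def do_poll(task_id):
    # Placeholder body; the real kernel function is not shown in the slides.
    return f"polled {task_id}"


@audit_hook("event")
def do_event(task_id, ev):
    # Placeholder body; the real kernel function is not shown in the slides.
    return f"{task_id} handled {ev}"


do_poll("sensor_task")
do_event("net_task", "tcp_rx")
print([(r["task"], r["event"]) for r in audit_log])
```

Keeping the hook to a single append per call reflects the design goal of a very small ROM and RAM footprint; any heavier analysis is left to the edge device or gateway.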

Capture the Kernel Data for Hardware Resources

- Hardware resource scheduling
  - Quantity and pattern of the consumption of events and tasks

Category | Component | Parameter | Kernel Events
Network Data Package | Network | wifi_init_result | WiFi init
 | | wifi_mode | WiFi set_mode
 | | wifi_state | WiFi On/Off
 | | source | source IP
 | | destination | destination IP
 | | package_transfer |
System Scheduling Data | Task Information | taskID | xTaskCreate
 | | task_running_frequency | portYIELD, xPortSysTickHandler
Hardware Module Usage | CPU | CPU_Frequency | CPU frequency switch
 | Sensors | environment_data | sensor_get_data
 | | Sensors_Frequency | sensor frequency switch

03 Experiments for getting bench score

Experiment Setup

- SeawayRTOS
  - An event-driven scheduling system
  - Multi-threaded
  - Lightweight threading technology: Protothreads
  - File system (Coffee)
  - Network support: LwIP
  - OTA
- CC2538 + ESP32
  - An ARM Cortex-M3 with up to 32 MHz clock speed
  - 32KB of RAM
  - 256KB flash
  - Zigbee in CC2538
  - WiFi/BLE in ESP32

Evaluation

We capture the kernel event and process data of a benchmark task running on SeawayRTOS.

The Analysis Result of the TCP/IP Experiment with the Process-Event Model

- The Process-Event analysis result
  - Periods 1056 and 1057 contain operations that differ from the base behavior of this benchmarking task
  - The system is using the radio to send data
  - Warning generated

Figure axis: period
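A minimal sketch of how a process-event comparison like this could be computed is shown below. It is not the authors' analysis: the function `abnormal_periods`, the event names and the per-period counts are made up for illustration, and only the period numbers 1056 and 1057 echo the slide.

```python
from collections import Counter


def abnormal_periods(baseline, observed, tolerance=0):
    """Return the periods whose event counts differ from the baseline profile.

    baseline: dict mapping event name -> expected count per period
    observed: dict mapping period number -> list of event names seen
    tolerance: allowed absolute difference per event before flagging
    """
    flagged = []
    for period in sorted(observed):
        counts = Counter(observed[period])
        events = set(baseline) | set(counts)
        if any(abs(counts[e] - baseline.get(e, 0)) > tolerance for e in events):
            flagged.append(period)
    return flagged


# Illustrative data only: one timer tick and one sensor read per period is "normal".
baseline = {"timer": 1, "sensor_read": 1}
observed = {
    1055: ["timer", "sensor_read"],
    1056: ["timer", "sensor_read", "radio_tx"],
    1057: ["timer", "radio_tx", "radio_tx"],
    1058: ["timer", "sensor_read"],
}
print(abnormal_periods(baseline, observed))  # [1056, 1057]
```

An exact-match baseline is the simplest possible model; a real deployment would likely need a tolerance or a learned profile to avoid flagging benign jitter.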

The Analysis Result of the Time-Base Model

- The Time-Base analysis result
  - We obtained the working-state information of the CPU, memory, radio and sensors
  - Periods 5 and 6 contain suspicious operations compared with the normal behavior of this application
  - The system is using the radio to listen for other data
  - A warning is generated, and the task should be suspended until the administrator decides

Figure axis: period

04 Benchmarking Goal and Method

1. An open-source testbed board with sensors and radios

The main processor

A: Low Power BLE/WiFi Module

B: MIC

C: Accelerometers

D: Temperature & Humidity

E: multi-threaded Protothreads

F: CMOS Image Sensor

G: PIR (motion) sensor

H: GPS

Run the Benchmark Tasks on Datasets

- MNIST database: handwritten digits
- CIFAR-10: object classification
- Chars74K dataset: character recognition
- WeChat Audio 100 Keyword Spotting (by Seaway Tech.)
- Band Accelerometer Data, 100 hours: pattern recognition
- Band Heart Rate, 100 hours: for DL and SVM algorithms (by Seaway Tech.)

We can provide some baseline results on these datasets with our own implementation on STM32 and ESP32.

Benchmark Design

First satisfy:

1. Benchmark algorithm accuracy > baseline

2. Max ROM < baseline

3. Max RAM < baseline

4. Processor cost

Compare:

How much energy a single benchmark task costs, given in picojoules per op.
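The two-stage scheme above (gate on the baseline, then compare energy) could be sketched as follows. The `Result` fields, function names and all numbers are assumptions for illustration; the processor-cost criterion is left out because the slide does not say how it is compared.

```python
from dataclasses import dataclass


@dataclass
class Result:
    accuracy: float    # fraction in [0, 1]
    max_rom_kb: float  # peak ROM (flash) footprint
    max_ram_kb: float  # peak RAM footprint
    energy_uj: float   # energy of one benchmark task, in microjoules
    ops: int           # operations executed by that task


def passes_gate(candidate: Result, baseline: Result) -> bool:
    """Stage 1: accuracy above baseline, ROM and RAM below baseline."""
    return (
        candidate.accuracy > baseline.accuracy
        and candidate.max_rom_kb < baseline.max_rom_kb
        and candidate.max_ram_kb < baseline.max_ram_kb
    )


def picojoules_per_op(r: Result) -> float:
    """Stage 2 metric: energy per operation (1 uJ = 1e6 pJ)."""
    return r.energy_uj * 1e6 / r.ops


# Hypothetical numbers, not measurements from the talk.
baseline = Result(accuracy=0.90, max_rom_kb=512, max_ram_kb=128, energy_uj=500, ops=10_000_000)
candidate = Result(accuracy=0.92, max_rom_kb=400, max_ram_kb=96, energy_uj=300, ops=10_000_000)

if passes_gate(candidate, baseline):
    print(f"{picojoules_per_op(candidate):.1f} pJ/op vs baseline {picojoules_per_op(baseline):.1f} pJ/op")
```

Separating the gate from the energy metric means a device cannot win on picojoules per op by trading away accuracy or exceeding the memory budget.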

Thanks

Dong Li, Seaway Technology Inc.

[email protected]

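Comparison

Item | AliOS Things | Amazon FreeRTOS | Microsoft ThreadX | Seaway
License | Community edition open source | Small part open source | Closed source | Community edition open source
Base kernel footprint | 8KB | 8KB | 10KB | 8KB
Device-side application-layer protocol stack | Separate stack per protocol, 80K | MQTT stack, 20K | Proprietary protocol, 80K | MCH integrated stack, 32KB
ML inference model support | - | Supported | Supported | Supported
Low-power control | - | - | Supported | Supported (<0.1 W)
Edge computing support | - | Supported | Supported | Supported
Native security mechanism | - | - | - | Supported
Third-party application support | Device and cloud separate | Device-cloud integrated | Device-cloud integrated | End-edge-cloud integrated
IoT cloud service | Tied to Alibaba Cloud | Tied to AWS | Tied to Azure | Free choice
AI math library support | - | To Cortex-A level | - | To Cortex-M level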