“Low-Power, Real-Time Object-Recognition Processors for Mobile Vision Systems”,
IEEE Micro 2012.
Jinwook Oh ; Gyeonghoon Kim ; Injoon Hong ; Junyoung Park ; Seungjin Lee ; Joo-Young Kim ;
Jeong-Ho Woo ; Hoi-Jun Yoo
Presenter: Juseong Lee, 2013021037
Outline
• Introduction
• Background
• Main Idea
• Implementation
• Conclusion
• Evaluation
Object Recognition by Juseong Lee
Introduction
• Object recognition systems
– Require real-time operation
  • High performance
  • Low power in mobile systems
• How can this be implemented?
– Find a suitable algorithm
  • SIFT algorithm
– Hardware optimization
  • Algorithm optimization
  • Build a dedicated processor
– Parallel computation
  • Multi-threading
  • NoC
SIFT - Scale Invariant Feature Transform
NoC - Network on Chip
Source: VOLVO
Background Knowledge
• What is the SIFT algorithm?
– Scale Invariant Feature Transform
– The most popular candidate
  • For extracting interest points from an object and describing them
– Robust against changes in translation, scaling, and rotation
Figure: Image matching by SIFT
Background Knowledge
• What’s the problem with SIFT-based object recognition?
– It consumes a lot of power
  • Owing to the heavy computation required in descriptor generation and matching
– Today’s high-resolution image sensors and tight power budgets
  • Make real-time SIFT implementation on mobile devices even harder
Scarce-resources problem
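To see why matching is expensive, the sketch below does brute-force nearest-neighbour matching between two descriptor sets. The function name, the tiny 4-dimensional descriptors, and the 0.8 ratio threshold are assumptions for illustration (standard SIFT descriptors are 128-dimensional), and this is not the matching scheme of the paper. The point is only that the number of distance evaluations is the product of the two set sizes.

```python
import math


def match_descriptors(query, database, ratio=0.8):
    """Brute-force matching with a nearest / second-nearest ratio test.

    Returns (matches, distance_evaluations); matches are (query_idx, db_idx).
    """
    matches, evaluations = [], 0
    for qi, q in enumerate(query):
        best, second, best_idx = math.inf, math.inf, -1
        for di, d in enumerate(database):
            dist = math.dist(q, d)  # Euclidean distance between descriptors
            evaluations += 1
            if dist < best:
                best, second, best_idx = dist, best, di
            elif dist < second:
                second = dist
        # Accept only if the best match is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches, evaluations


# Hypothetical toy descriptors, not real SIFT output.
query = [(1.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.5, 0.5)]
database = [(0.9, 0.1, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)]
matches, evaluations = match_descriptors(query, database)
print(matches, evaluations)
```

With 2 query and 3 database descriptors this already needs 6 distance evaluations; with thousands of keypoints per high-resolution frame the count grows accordingly.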
Main Idea
• How can we solve the problem?
– Make an object-recognition processor
  • Using an attention-based recognition algorithm
    – For energy efficiency
  • A heterogeneous multicore architecture
    – For data and thread parallelism
  • Network-on-Chip (NoC) communication
    – For high bandwidth
• The processor determines the Regions of Interest (ROI) of the image
– To minimize unnecessary computation
• The heterogeneous multicore architecture
– Provides several types of parallelism
– Achieves high throughput
– Keeps power consumption low
• The high-bandwidth NoC serves as the communications backbone
Why find ROI?
• A plain image-processing algorithm pays no regard to throughput
– Image size: 480 x 360
– Example) Edge detection over the whole image: 172,800 computations!
• Objects have the features!
– You can select only part of the image to reduce computation!
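The saving can be sketched in a few lines of Python. The 480 x 360 image size and the 172,800 per-pixel count come from the slide; the 40 x 40 tile size, the synthetic frame, and the contrast threshold are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch: per-pixel work with and without ROI tiles.
WIDTH, HEIGHT = 480, 360   # image size from the slide
TILE = 40                  # assumed tile edge; chosen because it divides both dimensions


def make_frame():
    """Synthetic grayscale frame: flat background with one textured 120 x 80 object."""
    frame = [[0] * WIDTH for _ in range(HEIGHT)]
    for y in range(120, 200):
        for x in range(160, 280):
            frame[y][x] = 255 if (x + y) % 2 else 0  # checkerboard texture
    return frame


def roi_tiles(frame, threshold=50):
    """Return (tile_x, tile_y) for tiles whose max-min intensity exceeds threshold.

    This cheap selection pass still touches every pixel once; the saving
    applies to the expensive stage that runs afterwards.
    """
    selected = []
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            pixels = [frame[y][x]
                      for y in range(ty, ty + TILE)
                      for x in range(tx, tx + TILE)]
            if max(pixels) - min(pixels) > threshold:
                selected.append((tx // TILE, ty // TILE))
    return selected


full_cost = WIDTH * HEIGHT             # expensive stage on every pixel
tiles = roi_tiles(make_frame())
roi_cost = len(tiles) * TILE * TILE    # expensive stage on ROI tiles only
print(full_cost, len(tiles), roi_cost)  # 172800 6 9600
```

In this toy frame only 6 of the 108 tiles contain texture, so the expensive stage would run on 9,600 pixels instead of 172,800.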
Main Idea – BONE-V
Figure panels: "Using conventional method" and "Using main idea"
Main Idea – Algorithm
• Attention-based object recognition
Main Idea – Architecture
• Pixel-level parallelism
• Very long instruction word (VLIW)
• 3-stage task-level pipeline
– 1.5x↓ power consumption
• 5-stage fine-grained pipeline
– 3.45x↑ pipeline throughput
BONE-V5: SMT-enabled heterogeneous multicore processor
• Throughput-optimized SFEC
– Finds ROI tiles for energy efficiency
– Memory locality with high bandwidth utilization
• Latency-optimized FMP
– ROI tiles and the NoC help reduce latency
• Power-optimized MLE
– Dynamically changes the cores’ thread allocation, operating voltage, and frequency
SFEC: SMT-enabled Feature Extraction Cluster
FMP: Feature Matching Processor
MLE: Machine Learning Engine
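As a rough illustration of why scaling voltage and frequency together saves power, the sketch below uses the textbook CMOS dynamic-power relation, in which power is proportional to C·V²·f. The operating points are made-up numbers, not BONE-V5 measurements, and the function name is hypothetical.

```python
def relative_dynamic_power(voltage, freq, ref_voltage, ref_freq):
    """Dynamic power relative to a reference point, using P ~ C * V^2 * f."""
    return (voltage / ref_voltage) ** 2 * (freq / ref_freq)


# Hypothetical operating points as (volts, MHz); not taken from the paper.
nominal = (1.2, 200.0)
low_power = (0.9, 100.0)

ratio = relative_dynamic_power(*low_power, *nominal)
print(f"{ratio:.3f}")
```

Under this model, halving the frequency alone would halve dynamic power, while lowering the voltage at the same time reduces it further because of the squared term.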
Implementation
Implementation – Comparison
Implementation – Comparison (continued)
Conclusion
• An energy-efficient system is important for improving performance
• The algorithm and the architecture have to be optimized together
• BONE-V multicore processors can be applied to real-time object-recognition systems
• Future BONE-V processors will further lower the power consumption.
Evaluation
• Table 3 should include results comparing against other recognition processors
• For hardware optimization, not only the overall algorithm but also individual algorithm blocks need to be optimized
– e.g., CORDIC-based gradient and magnitude computation
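For reference, a generic vectoring-mode CORDIC that turns a gradient (gx, gy) into magnitude and orientation can be sketched as follows. This is the textbook algorithm, not the block from the paper; it uses floating point for readability, and the 16-iteration count is an assumption. In hardware the multiplications by 2^-i become bit shifts.

```python
import math

ITERATIONS = 16  # assumed; more iterations give finer angle resolution
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Each micro-rotation stretches the vector; GAIN is the accumulated stretch.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_vectoring(gx, gy):
    """Return (magnitude, angle_radians) of the vector (gx, gy)."""
    x, y, angle = float(gx), float(gy), 0.0
    # Pre-rotate by 180 degrees into the right half-plane,
    # because the iterations below only converge for x >= 0.
    if x < 0:
        x, y = -x, -y
        angle = math.pi if gy >= 0 else -math.pi
    for i in range(ITERATIONS):
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        if y > 0:
            # Rotate clockwise to push y toward zero.
            x, y = x + y * shift, y - x * shift
            angle += ANGLES[i]
        else:
            # Rotate counter-clockwise to push y toward zero.
            x, y = x - y * shift, y + x * shift
            angle -= ANGLES[i]
    return x / GAIN, angle


magnitude, orientation = cordic_vectoring(3, 4)
print(round(magnitude, 3), round(orientation, 3))
```

The result can be checked against `math.hypot(3, 4)` and `math.atan2(4, 3)`; the two agree to within the resolution set by the iteration count.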