Update on Super HDTV Decoder Project
Youn-Long Lin Department of Computer Science
National Tsing Hua University
IC-DFN 2007, Rizhao
YLLIN NTHU-CS 2
More Pixels
NHK Proposes UHD TV Broadcast
Super Hi-Vision: 7680x4320 pixels at 60 fps (16x HDTV)
The baseband signal is 24 Gbps. Using 16 MPEG-2 encoder chips, it was compressed to 250 Mbps for transmission.
Current HDTV signals are 1.5 Gbps at baseband and 20 Mbps after compression.
High-performance compression/decompression and transmission/storage are needed to go from 24 Gbps to about 300 Mbps.
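A quick arithmetic check shows where the 24 Gbps figure comes from and how much compression the 250 Mbps link implies. This is a sketch under an assumed sampling format (8-bit 4:2:0, i.e. 12 bits per pixel on average), which the slide does not state.

```python
# Back-of-envelope check of the Super Hi-Vision numbers.
# ASSUMPTION: 8-bit 4:2:0 sampling = 12 bits per pixel (not stated on the slide).
WIDTH, HEIGHT, FPS = 7680, 4320, 60
BITS_PER_PIXEL = 12


def baseband_gbps(width: int, height: int, fps: int, bpp: int) -> float:
    """Uncompressed video bit rate in Gbit/s."""
    return width * height * fps * bpp / 1e9


raw = baseband_gbps(WIDTH, HEIGHT, FPS, BITS_PER_PIXEL)
pixel_ratio = (WIDTH * HEIGHT) / (1920 * 1080)  # relative to 1920x1080 HDTV
compression_ratio = raw * 1e9 / 250e6           # relative to the 250 Mbps stream

print(f"baseband ~ {raw:.1f} Gbps")            # baseband ~ 23.9 Gbps
print(f"pixels vs HDTV: {pixel_ratio:.0f}x")   # pixels vs HDTV: 16x
print(f"compression ~ {compression_ratio:.0f}:1")  # compression ~ 96:1
```

Under this assumption the raw rate lands within a few percent of the quoted 24 Gbps, and the broadcast link needs roughly 96:1 compression.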
[Figure: relative frame sizes of SDTV, 1920x1080 HDTV, 3840x2160 QFHD TV and 7680x4320 UHD TV]
Applications
[Figure: applications by resolution: QFHD, 1080HD, 720HD, D2, CIF, QCIF]
Video Coding Technology Trend
[Chart: video coding technology trend, with H.264 annotated at 50% and 69%]
Features of Video Coding Standards
| Feature | MPEG-1 | MPEG-2 | MPEG-4 | H.264/AVC |
| MB size | 16x16 | 16x16 (frame) | 16x16 | 16x16 |
| Block size | 8x8 | 8x8 | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 |
| Transform | DCT | DCT | DCT/Wavelet | 4x4 integer transform |
| Entropy coding | VLC | VLC | VLC | VLC, CAVLC and CABAC |
| ME, MC | Yes | Yes | Yes | 41 MVs per MB |
| Pixel accuracy | 1/2 pel | 1/2 pel | 1/4 pel | 1/4 pel |
| Reference frames | One frame | One frame | One frame | Multiple (5) frames |
| Picture type | I, P, B | I, P, B | I, P, B | I, P, B |
| Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps-2 Mbps | 64 kbps-150 Mbps |
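The "41 MVs per MB" entry is consistent with counting every sub-block an encoder's motion search evaluates across all H.264 partition modes. The sketch below reproduces that count, assuming this reading of the slide; a decoder only ever sees one partitioning per MB (at most 16 MVs per reference list).

```python
# A 16x16 MB can be one 16x16, two 16x8, two 8x16, or four 8x8 partitions;
# each 8x8 sub-macroblock can be one 8x8, two 8x4, two 4x8, or four 4x4 blocks.


def blocks(part_w: int, part_h: int, region_w: int, region_h: int) -> int:
    """Number of part_w x part_h blocks that tile a region_w x region_h region."""
    return (region_w // part_w) * (region_h // part_h)


MB_PARTITIONS = [(16, 16), (16, 8), (8, 16)]
SUB_MB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]

mb_level = sum(blocks(w, h, 16, 16) for w, h in MB_PARTITIONS)   # 1 + 2 + 2 = 5
per_8x8 = sum(blocks(w, h, 8, 8) for w, h in SUB_MB_PARTITIONS)  # 1 + 2 + 2 + 4 = 9
total = mb_level + 4 * per_8x8                                   # 5 + 36 = 41
max_mvs_in_one_mb = blocks(4, 4, 16, 16)  # all-4x4 split: 16 MVs per list

print(total, max_mvs_in_one_mb)  # 41 16
```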
Not all H.264/AVC systems are equal
[Chart: Relative Computational Complexity vs. search range (8, 16, 32) and number of reference frames]
Source: J. Ostermann et al., "Video Coding with H.264/AVC: Tools, Performance, and Complexity," IEEE Circuits and Systems Magazine, Q1 2004.
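One reason search range dominates encoder complexity: with exhaustive block matching over a +/-SR window, candidate positions per block grow as (2*SR + 1)^2 and are repeated for every reference frame. This sketch assumes full search; fast-search encoders scale differently, and the chart's measured numbers also reflect other coding tools.

```python
# ASSUMPTION: exhaustive (full-search) motion estimation over a +/-SR window.


def full_search_candidates(search_range: int, num_ref_frames: int = 1) -> int:
    """Candidate motion vectors evaluated per block."""
    side = 2 * search_range + 1
    return side * side * num_ref_frames


for sr in (8, 16, 32):
    print(sr, full_search_candidates(sr))
# 8 289
# 16 1089
# 32 4225
```

Going from SR = 8 to SR = 32 multiplies the candidate count by about 14.6 before the reference-frame multiplier is applied.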
Quality vs Bit-rate vs Decoding Throughput
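Decoding Capability of a 600MHz CPU
[Table: decoding frame rate (fps) and bit rate (kbps) at several QP values]
Source: M. Horowitz et al., "H.264/AVC Baseline Profile Decoder Complexity Analysis," IEEE Transactions on Circuits and Systems for Video Technology, July 2003.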
Our Target
Single-chip decoder for QFHD (3840x2160) H.264/AVC High Profile video
H.264 bitstream in, video out
CABAD (advanced entropy coding)
Mixed 4x4/8x8 transform
Commodity external memory
Platform-based design
Resolution vs. Needed Frequency
[Chart: clock frequency (MHz, 50-800) needed by published decoders for 720HD, 1080HD and QFHD: Lin [30], Chen [32], Chien [46], Peng [48], Lin [31], Liu [42], Conexant [36], C&S [39], Kawakami [44]]
66% frequency saving
4x larger frame size
Frequency Budget
| Resolution | Relative size (SQCIF = 1) | Clock frequency | Application |
| QFHD (3840 x 2160) | 675.0 | 249 MHz | Digital signage, medical video, satellite image, space exploration |
| 1080HD (1920 x 1088) | 170.0 | 62 MHz | Home theater |
| 720HD (1280 x 720) | 75.0 | 30 MHz | Home theater |
| D2 (720 x 480) | 28.1 | 10 MHz | Car TV, surveillance |
| CIF (352 x 288) | 8.3 | 3 MHz | Mobile TV |
| QCIF (176 x 144) | 2.0 | 0.8 MHz | Video phone |
| SQCIF (128 x 96) | 1.0 | 0.4 MHz | Video phone |
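The budget scales linearly with frame size, which suggests a simple model: required clock = macroblocks per frame x frames per second x cycles spent per macroblock. The sketch below assumes 30 fps and 256 cycles/MB; neither value appears on the slide, they are simply parameters that reproduce most rows of the table.

```python
import math

# ASSUMPTIONS: 30 fps and a fixed budget of 256 decoder cycles per macroblock.


def macroblocks(width: int, height: int) -> int:
    """16x16 macroblocks per frame (partial border MBs count as whole MBs)."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def required_mhz(width: int, height: int, fps: float = 30, cycles_per_mb: int = 256) -> float:
    return macroblocks(width, height) * fps * cycles_per_mb / 1e6


for name, w, h in [("QFHD", 3840, 2160), ("1080HD", 1920, 1088), ("D2", 720, 480), ("CIF", 352, 288)]:
    print(f"{name}: {required_mhz(w, h):.1f} MHz")
# QFHD: 248.8 MHz
# 1080HD: 62.7 MHz
# D2: 10.4 MHz
# CIF: 3.0 MHz
```

The 720HD row comes out at about 27.6 MHz, a little under the table's 30 MHz, so the model is only an approximation of how the budget was derived.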
Essential Issues
Memory: tradeoff between the size of internal memory and the bandwidth of external access
Massive parallelism (pipelining)
Macroblock decoding scheduling
Power
NTHU H.264 Decoder Architecture
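[Figure: decoder architecture. Inside the H.264 video decoder: Parser, CAVLD/CABAD, IQ & IT, MVG, IPRED, INTERP, BSG and DF modules, connected to the system through a MAU & AMBA interface translator; internal data paths carry parameters and prediction info, residuals, motion vectors and reference indices, and coefficients. The decoder sits on the AHB bus with the CPU, display, memory controller and Ethernet.]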
Memory
Memory Size vs. Bandwidth in ME
[Chart: memory bandwidth (MB/s) vs. memory size (bytes) for data-reuse levels A-D]
Full HD, 30 fps, 1 reference frame, vertical and horizontal search ranges SRV = SRH = 64:
| Level | Memory size (bytes) | Memory bandwidth (MB/s) |
| A | 240 | 19,658 |
| B | 1,200 | 1,516 |
| C | 4,977 | 317 |
| D | 124,929 | 62 |
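The Level D entry can be sanity-checked: when the whole search area is buffered on chip, each reference pixel is read from external memory once per frame. This sketch assumes 1 byte per luma sample, ignores chroma, and takes 1 MB/s = 1e6 bytes/s; it also turns the table into memory-vs-bandwidth ratios relative to Level A.

```python
# ASSUMPTIONS: 8-bit luma only (1 byte per sample), 1 MB/s = 1e6 bytes/s.
WIDTH, HEIGHT, FPS, REF_FRAMES = 1920, 1080, 30, 1
BYTES_PER_SAMPLE = 1

level_d_mb_per_s = WIDTH * HEIGHT * BYTES_PER_SAMPLE * FPS * REF_FRAMES / 1e6
print(f"Level D ~ {level_d_mb_per_s:.1f} MB/s")  # Level D ~ 62.2 MB/s

# (on-chip bytes, external MB/s) copied from the table above
LEVELS = {"A": (240, 19658), "B": (1200, 1516), "C": (4977, 317), "D": (124929, 62)}


def tradeoff(level: str, ref: str = "A") -> tuple:
    """Return (memory growth, bandwidth reduction) of `level` relative to `ref`."""
    size, bw = LEVELS[level]
    ref_size, ref_bw = LEVELS[ref]
    return size / ref_size, ref_bw / bw


for lv in LEVELS:
    mem_x, bw_saving = tradeoff(lv)
    print(f"Level {lv}: {mem_x:.0f}x memory, {bw_saving:.0f}x less bandwidth than A")
```

Moving from Level A to Level D trades roughly 500x more on-chip memory for roughly 300x less external bandwidth.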
IME Block Diagram
[Figure: IME block diagram with CB memory, rf0 and rf1 memories, CB AG and rf AG, CMB registers, rf register array and rf router, eight MVGen units, comparators, MV memory and MV AG]
Memory Size vs. Bandwidth in ME (Our Design)
[Chart: memory bandwidth (MB/s) vs. memory size (bytes) for levels A-D, with the operating point of our design ("ours") added]
Reference-data Pre-fetch System
No redundant fetching: collect the motion vectors of several MBs and read shared reference data with a single operation.
Minimize the number of burst initiations: on average, 2 burst initiations per MB (1 for luma, 1 for chroma).
Legend: a group of sequentially read data (one burst read).
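The idea of collecting several MBs' motion vectors and fetching shared data once can be sketched as interval coalescing. This is a simplified 1-D illustration, not the authors' design: it looks at one row of reference samples, uses only the integer part of each horizontal MV, and assumes the 6-tap luma interpolation filter needs 2 extra samples on the left and 3 on the right of each 16-pixel block. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) column range


def luma_fetch_interval(mb_x: int, mv_x: int, frame_width: int) -> Interval:
    """Columns needed by one MB: 16 pixels at the MV, plus 2 left / 3 right for the 6-tap filter."""
    x0 = 16 * mb_x + mv_x
    return max(0, x0 - 2), min(frame_width, x0 + 16 + 3)


def coalesce(requests: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching ranges so each sample is fetched only once."""
    bursts: List[Interval] = []
    for start, end in sorted(requests):
        if bursts and start <= bursts[-1][1]:
            # Extend the current burst instead of starting a new one.
            bursts[-1] = (bursts[-1][0], max(bursts[-1][1], end))
        else:
            bursts.append((start, end))
    return bursts


reqs = [luma_fetch_interval(mb, mv, 720) for mb, mv in [(0, 0), (1, 1), (2, -1), (3, 0)]]
bursts = coalesce(reqs)
print(len(reqs), "requests ->", len(bursts), "burst(s):", bursts)
# 4 requests -> 1 burst(s): [(0, 67)]
```

Here four separate 19- to 21-sample requests (82 samples in total) collapse into one 67-sample burst, which illustrates both goals on the slide: no sample is read twice, and only one burst is initiated.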
Reference-data Pre-fetch System (Cont.)
[Figure: the Motion Vector Generator passes per-MB motion vectors (MB0-MB10) to a Translator and a Region Analyzer/Searcher, which maintain a Reference Region & Index Register and an R0-R7 region buffer managed by an OES manager; region data (R0/R1 from the buffer, R2 from SDRAM) is delivered to Interp through the MAU interface]
Massive Parallelism
IQ/IDCT Timing Diagram
[Figure: IQ/IDCT timing diagram showing coeflag and coeff memory reads, IQ stages 1 and 2, IDCT stages 1 and 2, and residual memory writes for luma blocks (ac_0_1 to ac_14_15) and chroma blocks (dc, ac_0_1 to ac_6_7)]
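IDCT stages 1 and 2 in the timing diagram correspond to the row and column passes of a separable transform. For reference, here is a software sketch of the H.264/AVC 4x4 inverse integer transform as defined in the standard, applied to already dequantized coefficients; the High Profile 8x8 transform is not shown, and this models the arithmetic only, not the NTHU hardware.

```python
from typing import List

Block = List[List[int]]


def _butterfly(c0: int, c1: int, c2: int, c3: int) -> List[int]:
    """1-D 4-point inverse transform of H.264/AVC (adds and shifts only)."""
    e0 = c0 + c2
    e1 = c0 - c2
    e2 = (c1 >> 1) - c3
    e3 = c1 + (c3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform_4x4(d: Block) -> Block:
    """Dequantized coefficients d[row][col] -> residual samples."""
    f = [_butterfly(*row) for row in d]                                # stage 1: rows
    g = [_butterfly(*(f[r][c] for r in range(4))) for c in range(4)]   # stage 2: columns, g[c][r]
    return [[(g[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]  # final rounding


print(inverse_transform_4x4([[64, 0, 0, 0], [0] * 4, [0] * 4, [0] * 4]))
# [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
```

Because the row pass must finish before the column pass can read a full column, the two stages map naturally onto the pipelined IDCT stage 1 / stage 2 slots in the diagram.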
Deblocking Filter Timing Diagram
[Figure: deblocking filter datapath over the left (L00-L33), middle (M00-M33) and right (R00-R33) 4x4 pixel blocks, showing the strong filter (Bs=4) / left delta calculation and the right weak filter]
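The "delta calculation" and "weak filter" in the figure correspond to the H.264/AVC luma edge filter for Bs < 4. Below is a sketch of that standard filter for one line of samples across an edge, assuming 8-bit video; alpha, beta and tC0 normally come from the standard's QP-indexed tables and are passed in directly here. It models the standard's arithmetic, not the authors' datapath.

```python
from typing import List, Tuple


def clip3(lo: int, hi: int, x: int) -> int:
    return max(lo, min(hi, x))


def luma_weak_filter(p: List[int], q: List[int], alpha: int, beta: int,
                     tc0: int) -> Tuple[List[int], List[int]]:
    """Normal (Bs < 4) luma edge filter for one line.
    p = [p0, p1, p2] (p0 next to the edge), q = [q0, q1, q2]; 8-bit samples."""
    p0, p1, p2 = p
    q0, q1, q2 = q
    # Filter only steps that look like blocking artifacts, not real image edges.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return list(p), list(q)
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    tc = tc0 + (ap < beta) + (aq < beta)
    # Delta applied symmetrically to the two samples nearest the edge.
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p, new_q = list(p), list(q)
    new_p[0] = clip3(0, 255, p0 + delta)
    new_q[0] = clip3(0, 255, q0 - delta)
    avg = (p0 + q0 + 1) >> 1
    if ap < beta:
        new_p[1] = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1)
    if aq < beta:
        new_q[1] = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1)
    return new_p, new_q


print(luma_weak_filter([60, 60, 60], [70, 70, 70], alpha=20, beta=5, tc0=2))
# ([64, 62, 60], [66, 68, 70])
```

The 10-level step between the two flat blocks is smoothed into a ramp, while a large step (for example 60 vs. 100 with alpha = 20) is left untouched as a genuine edge.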
System-Level Optimization
Cyclic-Queue-Based IP Interface
[Figure: cyclic-queue-based IP interface. A decoder controller (main controller with a main control FSM) coordinates per-module FSMs for PARSER, CABAD, CAVLD, IQ/IT, MVG, INTERP, IPRED, BSG, DF and MFU inside the H.264 video decoder; the decoder connects over AHB to the CPU, display, memory controller and Ethernet.]
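A cyclic queue between two modules lets the producer run ahead by several MBs instead of waiting for the consumer after every MB. The class below is a minimal software model of such a ring buffer, e.g. between CABAD and IQ/IT; the depth, entry type and class name are hypothetical and not taken from the design.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class CyclicQueue(Generic[T]):
    """Fixed-depth ring buffer; a ping-pong buffer is the depth-2 special case."""

    def __init__(self, depth: int) -> None:
        self.slots: List[Optional[T]] = [None] * depth
        self.head = 0   # next slot to read
        self.tail = 0   # next slot to write
        self.count = 0

    def full(self) -> bool:
        return self.count == len(self.slots)

    def empty(self) -> bool:
        return self.count == 0

    def push(self, item: T) -> bool:
        """Producer side: returns False (producer must stall) when full."""
        if self.full():
            return False
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self) -> Optional[T]:
        """Consumer side: returns None (consumer idles) when empty."""
        if self.empty():
            return None
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return item


q: CyclicQueue[str] = CyclicQueue(depth=3)
print([q.push(f"MB{i}") for i in range(4)])  # [True, True, True, False]
print(q.pop(), q.push("MB3"), q.pop())        # MB0 True MB1
```

The full and empty conditions are exactly the stall and idle signals a per-module FSM would react to; a deeper queue absorbs more per-MB variation at the cost of more SRAM.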
Performance Gap in Elastic Pipeline
[Chart: idle and processing cycles per MB for each pipeline stage, actual vs. ideal]
Test conditions: pattern pedestrian, QP 28, resolution 720x480, GOP: all I frames (III), 30 frames
| Stage | Processing cycles/MB | Idle cycles/MB |
| Whole decoder | 160.85 | 0 |
| CABAD | 124.69 | 36.16 |
| IQ/IT | 70.98 | 89.87 |
| IPRED | 129 | 31.85 |
| BSG | 44 | 116.85 |
| DF | 118 | 42.85 |
About 25% performance drop between the actual and the ideal situation
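The numbers in the table are self-consistent: each stage's processing plus idle cycles equals the whole decoder's 160.85 cycles/MB. Assuming (our interpretation) that "ideal" means the MB slot shrinks to the slowest stage's processing time, the gap can be computed directly:

```python
# ASSUMPTION: ideal cycles/MB = processing cycles of the bottleneck stage.
ACTUAL_CYCLES_PER_MB = 160.85
processing = {"CABAD": 124.69, "IQ/IT": 70.98, "IPRED": 129.0, "BSG": 44.0, "DF": 118.0}

idle = {stage: round(ACTUAL_CYCLES_PER_MB - cyc, 2) for stage, cyc in processing.items()}
bottleneck = max(processing, key=processing.get)
ideal = processing[bottleneck]
gap = (ACTUAL_CYCLES_PER_MB - ideal) / ideal

print(idle)  # {'CABAD': 36.16, 'IQ/IT': 89.87, 'IPRED': 31.85, 'BSG': 116.85, 'DF': 42.85}
print(bottleneck, f"{gap:.1%}")  # IPRED 24.7%
```

The resulting 24.7% matches the "about 25%" drop on the slide, and BSG, idle for 116.85 of every 160.85 cycles, shows how much slack the fixed pipeline wastes.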
Elastic Pipeline Decoder Timing Diagram (I Frame)
[Figure: per-MB timing of the CABAD, PARSER, IQ/IT, IPRED, BSG and DF stages: header information decode, initial context table and condition offset, then MB-level decode, with bubble cycles between stages]
Bubble cycles: 100 to 1000 cycles per MB
Timing Diagram after ASAP Scheduling
[Figure: the same CABAD, PARSER, IQ/IT, IPRED, BSG and DF timing after ASAP scheduling, with reduced bubble cycles]
However, bubble cycles still exist.
Timing Diagram after Cyclic Queue Insertion
[Figure: the same stages after cyclic-queue insertion between modules]
Reduced remaining bubble cycles
Total reduced bubble cycles
Elastic Pipeline Decoder Timing Diagram (P/B Frame)
[Figure: per-MB timing of the CABAD, PARSER, IQ/IT, MVG, INTERP, BSG and DF stages for P/B frames: header information decode, initial context table and condition offset, then MB-level decode]
Cyclic Queue Decoder Timing Diagram (P/B Frame)
[Figure: the same P/B-frame stages with cyclic queues between modules]
Reduced bubble cycles
Reduced processing cycles
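Why do deeper queues remove bubbles? A toy model makes it concrete. Assumptions (all ours, not the authors'): a stage may start MB i once the upstream stage has finished MB i, it has itself finished MB i-1, and the queue to the next stage has a free slot, i.e. the downstream stage has finished MB i-depth. Depth 1 approximates a single shared buffer, depth 2 a ping-pong pair, larger depths a cyclic queue. The latencies are made-up numbers.

```python
from typing import List


def pipeline_makespan(latency: List[List[int]], depth: int) -> int:
    """latency[s][i] = cycles stage s spends on MB i; returns total cycles for all MBs."""
    n_stages, n_mbs = len(latency), len(latency[0])
    finish = [[0] * n_mbs for _ in range(n_stages)]
    for i in range(n_mbs):
        for s in range(n_stages):
            start = 0
            if s > 0:                                   # input ready from upstream
                start = max(start, finish[s - 1][i])
            if i > 0:                                   # stage free again
                start = max(start, finish[s][i - 1])
            if s < n_stages - 1 and i - depth >= 0:     # output queue has a free slot
                start = max(start, finish[s + 1][i - depth])
            finish[s][i] = start + latency[s][i]
    return finish[-1][-1]


latency = [
    [1, 5, 1, 5],  # stage 0, e.g. entropy decoding (made-up cycles per MB)
    [5, 1, 5, 1],  # stage 1, e.g. reconstruction (made-up cycles per MB)
]
for depth in (1, 2, 4):
    print(depth, pipeline_makespan(latency, depth))
# 1 24
# 2 13
# 4 13
```

When per-MB workloads alternate between stages, a depth-1 buffer forces each stage to wait on the other's slow MBs; with depth 2 or more the slow MBs overlap, and the total approaches the busier stage's own work. Real CABAD workloads vary far more than this example, which is where depths beyond a ping-pong pair can help.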
Comparison of Different Scheduling Methods
[Chart: SRAM usage (KB) and turnaround/processing cycles per MB for four scheduling methods]
| Scheduling method | SRAM usage (KB) | Turnaround cycles/MB | Processing cycles/MB |
| Sequential | 2.62 | 486 | 486 |
| Elastic pipeline | 5.6 | 620 | 161 |
| ASAP ping-pong | 5.6 | 644 | 159 |
| ASAP cyclic-queue | 8.3 | 540 | 140 |
Test pattern: pedestrian, resolution 720x480, QP 28, GOP: all I frames (III), 30 frames
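The table can be read as a throughput-versus-SRAM tradeoff. This sketch assumes that processing cycles/MB is the steady-state interval between MBs, so throughput speedup is the sequential figure divided by each method's figure; the values are copied from the table.

```python
# ASSUMPTION: processing cycles/MB = steady-state issue interval per MB.
TABLE = {  # method: (SRAM KB, turnaround cycles/MB, processing cycles/MB)
    "Sequential": (2.62, 486, 486),
    "Elastic pipeline": (5.6, 620, 161),
    "ASAP ping-pong": (5.6, 644, 159),
    "ASAP cyclic-queue": (8.3, 540, 140),
}


def speedup(method: str, baseline: str = "Sequential") -> float:
    return TABLE[baseline][2] / TABLE[method][2]


def sram_cost(method: str, baseline: str = "Sequential") -> float:
    return TABLE[method][0] / TABLE[baseline][0]


for m in TABLE:
    print(f"{m}: {speedup(m):.2f}x throughput, {sram_cost(m):.1f}x SRAM")
# Sequential: 1.00x throughput, 1.0x SRAM
# Elastic pipeline: 3.02x throughput, 2.1x SRAM
# ASAP ping-pong: 3.06x throughput, 2.1x SRAM
# ASAP cyclic-queue: 3.47x throughput, 3.2x SRAM
```

Compared with the elastic pipeline, the cyclic queue cuts processing cycles per MB by about 13% (161 to 140) while needing about 48% more SRAM (5.6 KB to 8.3 KB).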
Clock Gating
Manual clock gating:
    assign gclk = clk & ip_en;
[Figure: clk gated by cabac_en, picrec_en, idct_en, ipred_en, df_en and mc_en to clock the cabac, picrec, idct, ipred, df and mc modules]
Two Clock Gating Methods
| | Module-based | Register-based |
| # of clock gating elements | 6 | 1420 |
| # of gated registers | 7952 (67.72%) | 10619 (91.1%) |
| # of un-gated registers | 3790 (32.28%) | 1037 (8.9%) |
| Viability | Yes (functional) | No (malfunction) |
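Verification Environment
[Figure: directory tree of the verification environment. The H264 top level holds filelist, fpga_lib, asic_lib, syn, jm11.0, mem, netlist, rtl_sim, gate_sim, tbench, lm_wrap, nlint and vn, with memory libraries xilinx_mem, altera_mem and artisan_mem; sub-IP directories (mfu, parser, cabad, idct, ipred, interp, df, bsg, main_ctrl, top, amba_wrap, mvg, hd_amba) each contain def, rtl, syn, vn, nlint, gate_sim, rtl_sim, filelist and tbench]
Easy bug tracing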
Verification Flow
[Figure: verification flow with three roles.
IP designer: IP spec -> IP design and IP testbench design -> RTL code -> IP linting (no error?) -> RTL simulation (functionally correct?) -> coverage analysis (meets criterion?) -> IP synthesis and gate-level simulation (functionally correct?) -> IP delivery.
System designer: system spec -> system design and system testbench design -> IP/system integration -> IP/system RTL simulation (system fail? IP fail?) -> IP/system synthesis and gate-level simulation (functionally correct?) -> HW image delivery.
SW designer: SW spec -> SW design -> C code -> compilation -> SW image.
HW image and SW image -> system building -> prototype.
Shared inputs: reference SW, SW profiling, design spec modification, C model, test data extraction and golden test data.]
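A Multimedia SoC Platform
[Figure: platform block diagram with a CPU and an FPGA accelerator on a daughter board, a high-speed bus and a peripheral bus (via an APB bridge); on-chip blocks include VIC, USB 2.0 (PHY), static memory controller, 4-channel SDRAM controller, JPEG codec, DMA, SRAM, capture and display controller, plus peripherals PWM, WDT, timer, DAI, SSI, SD, SM, UART, GPIO and I2C; external devices include ROM/flash memory, SRAM, SDRAM, audio codec (I2S), flash memory with SSI, flash card, button, LED, video-in (CCIR601) and TV/LCD]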
Summary
Super High Definition Video Capturing, Delivery and Display are on the Horizon
Massive Parallelism is Essential for Making Consumer Applications Possible
Tradeoff Among Memory Usage, Bandwidth and Logic Has Profound Impact on the Overall System Performance
System Design Should Be Adaptable to Content and Quality Variation
Thank You!!