NCTU, EE, Vision Lab
Implementation of H.264 Based System on Multi-DSPs Board
陳奕安 2008.02.13
Outline
System description
○ Architecture
○ MEX Board
○ TMS320DM642
Communication interface
Software development
Error resilience
Architecture
[Figure: system architecture. Nodes: PC 1, MEX Board 1, MEX Board 2, PC 2.]
Sending path: Capture Frame → H.264 Encode → Send to Network
Receiving path: Receive from Network → H.264 Decode → Display
MEX Board
The MEX board is composed of:
○ 4 DSPs TMS320DM642 for data stream compression (video/audio) and their memory
○ 2 FPGAs for a flexible architecture
○ 8 video chips SAA7113H (ADC)
○ 4 audio stereo chips CS4221 (ADC)
MEX Board
[Figure: 4 DM642, 2 FPGA, Video/Audio Chip]
Block Diagram of MEX board [1]
MEX Board Block Diagram
Block Diagram of MEX board[1]
TMS320DM642
Performance: 4000-4800 MIPS
Two-level cache:
○ L2: 256 KB, L1P: 16 KB, L1D: 16 KB
3 video ports
8-bit McASP
Ethernet MAC
32-bit HPI
66 MHz PCI
64-bit EMIF
DSP DM642 block diagram [2]
TMS320DM642
Peripherals that will be used:
○ Enhanced DMA (EDMA)
○ Video ports (VP0~VP2)
○ Inter-integrated circuit (I2C) bus
○ External memory interface (EMIF)
○ Ethernet media access controller (EMAC)
○ Management data input/output (MDIO)
Outline
System description
Communication interface
○ Host/MEX communication
○ Video capturing/displaying
○ Network transmit
Software development
Error resilience
Host/MEX Communication
Data transfer from the 4 DSP (SDRAM) to PCI [1]
MEX (DSP side):
○ DSP started: fill memory
○ Initialize transfer: set DSP FIFO direction, set FIFO Full Flag value, DSP FIFO is reset
○ DSP to PCI transfer request
○ Start transfer: start EDMA, unreset DSP1 FIFO, clear PCI interrupt
○ Transfer finished
PC (PCI side):
○ PCI started: wait for interrupt
○ Initialize transfer: set transfer size, set PCI FIFO direction, select DSP data sources, set transfer destination address, start PCI FIFO, clear DSP interrupt
○ PCI to DSP start transfer request
○ Wait for transfer finished
○ Transfer finished
Video Capture
[Figure: Camera → Video Chip SAA7113H (ADC) on the MEX board → DM642 video ports (VP0, VP1, VP2) → DMA → Raw Data; I2C bus]
Camera output: NTSC (analog, 525 lines per frame, 30 frames per second) or PAL (analog, 625 lines per frame, 25 frames per second)
Video chip output: ITU656 (digital, for PAL or NTSC)
TMS320DM642 Video Port
[Figure: TMS320DM642 video port [3]]
Network Architecture
MEX Board 1: DM642 (EMAC, MDIO), PHY LXT971ALC
MEX Board 2: DM642 (EMAC, MDIO), PHY LXT971ALC
Connector: RJ45
TMS320DM642 EMAC
DM642 Networking Using EMAC and MDIO
[Figure: DM642 Networking [4]]
Outline
System description
Communication interface
Software development
○ H.264 Codec
○ Optimization
○ Parallelization
○ Memory Issue
Error resilience
H.264 Encoder Block Diagram
[Figure: H.264 encoder.
Inputs: Fn (current); F'n-1 (reference, 1 or 2 previously encoded frames).
Blocks: ME, MC, Choose Intra prediction, Intra prediction, T, Q, Reorder, Entropy encode, T^-1, Q^-1, Filter.
Signals: Dn, X, D'n, P (Inter or Intra), uF'n, F'n (reconstructed).
Output: NAL.]
H.264 Decoder Block Diagram
[Figure: H.264 decoder.
Input: NAL.
Blocks: Entropy decode, Reorder, Q^-1, T^-1, MC, Intra prediction, Filter.
Signals: X, D'n, P (Inter or Intra), uF'n.
Frames: F'n-1 (reference), F'n (reconstructed).]
Optimization on Single Chip
Realization and Optimization of DSP Based H.264 Encoder [5]
Optimization of H.264 on DSP platform
○ Code transplant and primary optimization
○ Optimization of the key module
○ Using TI C64x IMGLIB
Data scheduling and storage allocation
○ Data scheduling with EDMA
○ Storage allocation (code section/data section)
Parallelization on Chips
One GOP in one DSP
○ Each DSP handles IPPP… or IBBPBB… .
○ There are no dependences between groups of pictures (GOPs).
One frame / one macroblock in one DSP
○ Each DSP handles one frame or one macroblock.
○ There are dependences between frames and between macroblocks.
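The GOP-level option can be sketched as a static assignment of GOPs to the four DM642 chips. This is only an illustration: the round-robin policy, the GOP length of 4 frames, and the name `assign_gops` are assumptions, not details given on the slide.

```python
def assign_gops(num_frames, gop_size=4, num_dsps=4):
    """Map each DSP index to the list of (start, end) frame ranges it encodes.

    Round-robin is an assumed policy. Any assignment is valid as long as
    the GOPs do not reference each other.
    """
    schedule = {dsp: [] for dsp in range(num_dsps)}
    for gop_index, start in enumerate(range(0, num_frames, gop_size)):
        end = min(start + gop_size, num_frames)  # the last GOP may be shorter
        schedule[gop_index % num_dsps].append((start, end))
    return schedule


schedule = assign_gops(num_frames=18)
for dsp, gops in schedule.items():
    print(f"DSP {dsp}: frame ranges {gops}")
```

Because each GOP is self-contained, the DSPs need no synchronization while encoding; the cost is latency, since a DSP must hold a whole GOP before its output can be merged in display order.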
Macroblock Dependencies
Data dependencies induced by inter-prediction:
○ The motion vector MVcur is predicted from MVA~D
[Figure: current frame with MVD, MVB, MVC in the row above and MVA to the left of MVcur; reference frame]
Data dependencies induced from MV prediction [6]
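To make this dependency concrete, here is a minimal sketch of a median motion vector predictor. In H.264 the default predictor is the component-wise median of MVA, MVB and MVC, with MVD standing in when MVC is unavailable. The special cases for 16x8 and 8x16 partitions and for reference-index matching are left out, and the function names are illustrative.

```python
def median3(a, b, c):
    """Middle value of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c, mv_d=None):
    """Component-wise median predictor for the current block.

    Vectors are (x, y) tuples; None marks an unavailable neighbor.
    """
    if mv_c is None:
        mv_c = mv_d  # D replaces C when C is unavailable
    if mv_b is None and mv_c is None and mv_a is not None:
        return mv_a  # only the left neighbor exists
    zero = (0, 0)
    a, b, c = (mv if mv is not None else zero for mv in (mv_a, mv_b, mv_c))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


print(predict_mv((2, 0), (4, -2), (3, 5)))
```

The predictor reads the final motion vectors of the left and upper neighbors, so a macroblock cannot start motion vector coding until those neighbors are done.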
Macroblock Dependencies
Data dependencies induced by intra-prediction:
○ Left, upper-left, upper, and upper-right MBs
Data dependencies induced from intra prediction [6]
Macroblock Dependencies
Data dependencies induced by deblocking filter:
○ Top 4 rows of pixels and leftmost 4 columns
Data dependencies induced from deblocking filter [6]
Macroblock Dependencies
[Figure: neighbors of the current MB.
Upper-left MB: Intra Pred., MV Pred.
Upper MB: Intra Pred., MV Pred., Deblocking Filter
Upper-right MB: Intra Pred., MV Pred.
Left MB: Intra Pred., MV Pred., Deblocking Filter]
Possible spatial data dependencies for a macroblock [6]
Macroblock Dependencies
Macroblock dependencies:
○ Data dependencies between frames
○ Data dependencies between MB rows in the same frame
○ Data dependencies in the same MB row
Wave-front parallelization
Partition for MB region
Wave-front of Macro-block Region Partition [7]
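The wave-front idea can be sketched as follows, assuming each macroblock depends on its left, upper-left, upper and upper-right neighbors. Under that assumption the macroblock at column x and row y can be processed at step x + 2y, and all macroblocks sharing a step are independent. This is a generic illustration; the region partition in [7] may differ, and the function names are hypothetical.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks (x, y) into waves that can run in parallel.

    Wave index x + 2*y guarantees that the left, upper-left, upper and
    upper-right neighbors all belong to earlier waves.
    """
    waves = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[step] for step in sorted(waves)]


def neighbors(x, y, mb_cols):
    """Left, upper-left, upper and upper-right MBs that lie inside the frame."""
    candidates = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < mb_cols and cy >= 0]


waves = wavefront_schedule(mb_cols=4, mb_rows=3)
for step, wave in enumerate(waves):
    print(step, wave)
```

For a frame of W by H macroblocks the schedule needs (W - 1) + 2(H - 1) + 1 steps, so the available parallelism grows with frame width and is low at the start and end of each frame.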
Wave-front parallelization
Partition for frames
Wave-front of Frame Partition [7]
Memory Issue
[Figure: DM642 DSP core; L1P cache (direct mapped, 16 Kbytes total); L1D cache (2-way set associative, 16 Kbytes total); L2 cache/memory (256 Kbytes total); EDMA controller; peripherals]
Two-level cache architecture of DM642
Limited memory of DM642
Use a memory buffer to reduce memory accesses
Memory Issue
Memory hierarchy for inter prediction
Memory hierarchy [8]
Memory Issue
Slice memory buffer for intra prediction and deblocking filter
Slice Memory [9]
Outline
System description
Communication interface
Software development
Error resilience
○ Error-resilience tools in H.264/AVC
○ Error resilience of JM source code
Error Resilience Tools in H.264/AVC
Redundant slices (RSs) [10]
For an MB, an encoder can place redundant representations of the same MB into the same bit stream.
e.g.
○ One slice is coded using different quantization parameters (QPs).
○ If the slice of low QP is available, the decoder discards the RS; otherwise, the RS is reconstructed by the decoder.
[Figure: Slice A coded with QP1 and Slice A coded with QP2 are both sent to the decoder]
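The decoder-side rule for redundant slices can be sketched as below. The dictionary fields `slice_id`, `redundant` and `qp` are hypothetical names chosen for this illustration; they are not JM data structures.

```python
def select_slices(received):
    """Pick one representation per slice, preferring the primary (low-QP) one.

    `received` is a list of dicts with the assumed keys
    'slice_id', 'redundant' (bool) and 'qp'.
    """
    chosen = {}
    for s in received:
        current = chosen.get(s["slice_id"])
        # Keep the first copy seen, but let a primary slice replace a redundant one.
        if current is None or (current["redundant"] and not s["redundant"]):
            chosen[s["slice_id"]] = s
    return chosen


received = [
    {"slice_id": "A", "redundant": True, "qp": 36},
    {"slice_id": "A", "redundant": False, "qp": 28},
    {"slice_id": "B", "redundant": True, "qp": 36},  # primary copy of B was lost
]
chosen = select_slices(received)
for slice_id, s in chosen.items():
    print(slice_id, "redundant" if s["redundant"] else "primary", "QP", s["qp"])
```

Slice A is decoded from its primary copy and its redundant copy is discarded, while slice B falls back to the coarser redundant copy.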
Error Resilience Tools in H.264/AVC
Parameter sets [10]
Including picture size, entropy coding method, MV resolution, and so on.
Sequence parameter set (SPS)
○ Containing all information related to the picture sequence between two IDR (Instantaneous Decoding Refresh) pictures.
Picture parameter set (PPS)
○ Containing all information related to all slices in a picture.
e.g. Sending multiple copies of SPSs to raise the chance that at least one arrives.
e.g. SPSs can be sent out-of-band.
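The benefit of sending several copies of a parameter set can be estimated with a short calculation. The sketch assumes that each copy is lost independently with the same probability, which real networks with burst losses do not guarantee; the loss rate of 10% is an example value only.

```python
def arrival_probability(loss_rate, copies):
    """Probability that at least one of `copies` transmissions arrives,
    assuming independent losses with probability `loss_rate` each."""
    return 1.0 - loss_rate ** copies


for copies in (1, 2, 3):
    print(copies, f"{arrival_probability(0.10, copies):.3f}")
```

Under this assumption each extra copy multiplies the failure probability by the loss rate, which is why a few repetitions of a small SPS are cheap protection.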
Error Resilience Tools in H.264/AVC
Flexible macro-block ordering (FMO) [10]
7 modes
Overhead bits depend strongly on the picture format, the content, and the QP.
○ < 5% penalty at QP = 16; on average 20% at QP = 28.
6 modes of FMO [10]
Error Concealment of H.264/AVC
Error concealment scheme provided in JM
Intra
Inter
○ $\arg\min_{dir \in \{top,\, bot,\, left,\, right\}} d_{sm} = \left\{ \sum_{j=1}^{N} \left| Y(mv^{dir})_j^{IN} - Y_j^{OUT} \right| \right\} / N$
Error concealment for macro-blocks [11]
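The inter concealment criterion can be sketched as a side-match test: for each candidate motion vector taken from a neighboring macroblock, the boundary pixels of the motion-compensated block (IN) are compared with the adjacent pixels of the correctly received neighbors (OUT), and the candidate with the smallest mean absolute difference wins. This is a simplified illustration and not the JM code. Frames are plain lists of luma rows, the block size `n` is a parameter, every candidate is assumed to keep the displaced block inside the reference frame, and at least one neighboring pixel is assumed to exist.

```python
def side_match_distortion(ref, cur, bx, by, n, mv):
    """Mean absolute difference between IN and OUT boundary pixels.

    ref, cur: frames as lists of rows; (bx, by): top-left of the lost block;
    n: block size; mv: candidate motion vector (dx, dy).
    """
    dx, dy = mv
    height, width = len(cur), len(cur[0])
    total, count = 0, 0
    for i in range(n):
        # (IN position inside the block, OUT position just outside it)
        pairs = [
            ((bx + i, by), (bx + i, by - 1)),          # top edge
            ((bx + i, by + n - 1), (bx + i, by + n)),  # bottom edge
            ((bx, by + i), (bx - 1, by + i)),          # left edge
            ((bx + n - 1, by + i), (bx + n, by + i)),  # right edge
        ]
        for (ix, iy), (ox, oy) in pairs:
            if 0 <= ox < width and 0 <= oy < height:
                total += abs(ref[iy + dy][ix + dx] - cur[oy][ox])
                count += 1
    return total / count


def conceal_mv(ref, cur, bx, by, n, candidates):
    """Name of the candidate with the smallest side-match distortion."""
    return min(candidates,
               key=lambda name: side_match_distortion(ref, cur, bx, by, n, candidates[name]))


# Toy frames: a horizontal gradient that moved one pixel to the right.
ref = [[10 * x for x in range(8)] for _ in range(8)]
cur = [[10 * (x - 1) for x in range(8)] for _ in range(8)]
candidates = {"top": (0, 0), "bot": (0, 1), "left": (-1, 0), "right": (1, 0)}
best = conceal_mv(ref, cur, bx=2, by=2, n=2, candidates=candidates)
print(best, candidates[best])
```

In the toy example the candidate (-1, 0) matches the true motion, so it gives the smoothest transition across the block boundary and is selected.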
Future Work
Optimize the H.264 codec for real-time operation
Implement different concealment methods
Propose corresponding error resilience methods
Reference
[1] VITEC MULTIMEDIA, "MEX User manual Revision 1.7".
[2] Texas Instruments, Incorporated, "TMS320C64x DSP Generation Product Bulletin" (sprt236).
[3] Texas Instruments, Incorporated, "TMS320DM64x Video Port to Video Port Communication" (spraaf3).
[4] Texas Instruments, Incorporated, "TMS320C6000 DSP Ethernet Media Access Controller (EMAC)/Management Data Input/Output (MDIO) Module Reference Guide" (spru628a).
[5] Zhe Wei and Canhui Cai, "Realization and Optimization of DSP Based H.264 Encoder", ISCAS 2006 Circuits and Systems, May 2006.
[6] Chen, Y., Li, E., Zhou, X., Ge, S., "Implementation of H.264 Encoder and Decoder on Personal Computers", Journal of Visual Communication and Image Representation 17 (2006).
[7] Zhuo Zhao and Ping Liang, "Data partition for wave-front parallelization of H.264 video encoder", 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (2006).
[8] Denolf, K., De Vleeschouwer, et al., "Memory centric design of an MPEG-4 video encoder", IEEE Trans. CSVT, Vol. 15, No. 5, pp. 609-619, May 2005.
[9] Tsu-Ming Liu et al., "A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications", ISSCC Digest of Technical Papers, pp. 402-403, Feb. 2006.
[10] S. Wenger, "H.264/AVC over IP", IEEE Trans. Cir. Syst. Video Technol., vol. 13, pp. 645-656, July 2003.
[11] "Non-normative error concealment algorithms", ITU-T VCEG-N62, 2001-09.
H.264 Partitions
Frame partitions
Macroblock partitions
[Figure: a 16x16 macroblock divided into 16x16 blocks, 8x8 blocks, and 4x4 blocks]
H.264 Intra-Mode Decision
H.264 Intra-Mode Decision
16*16 plane
4*4 horizontal
Fast integer & fractional pixel motion estimation
Integer pixel search scheme
[Figure: search points on a grid with both axes ticked from -15 to 15]
Assume the guessed starting point is (0,0).
Step 1. Unsymmetrical-cross search
Step 2-1. Local full search around the starting point
Step 2-2. Uneven multi-hexagon search
Step 3-1. Extended hexagon-based search. The search continues until the point with the minimal matching error is the center of the new hexagon.
Step 3-2. Center-biased search
The scheme covers both small and large motions. The search point that gives the smallest matching error in one step is the starting point of the next step.
Around 130 points are searched in this algorithm, so the saving is (33x33-130)/(33x33) ≈ 88%. If 3 starting points are tried, the saving is around 64%.
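The savings quoted for the integer search can be checked with a few lines. The search range of ±16 pixels is inferred from the 33x33 full-search window, and 130 is the approximate point count given on the slide.

```python
def search_saving(search_range, points_searched):
    """Fraction of full-search points avoided for a +/- search_range window."""
    full_search_points = (2 * search_range + 1) ** 2
    return (full_search_points - points_searched) / full_search_points


print(f"one starting point:    {search_saving(16, 130):.1%}")
print(f"three starting points: {search_saving(16, 3 * 130):.1%}")
```

With one starting point the search visits 130 of 1089 positions; with three starting points the cost triples to about 390 positions, which still avoids roughly two thirds of the full search.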
Fast integer & fractional pixel motion estimation
Fractional pixel search scheme
Start from the best matching integer point coming from the integer motion search.
1. Search its 1/2-pixel neighbors
2. Search its 1/4-pixel neighbors
3. Search its 1/8-pixel neighbors
The optimal point of each step is the search center of the next step.
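The three refinement steps can be sketched as a coarse-to-fine search in which the best point of each level becomes the center of the next. Positions are stored in 1/8-pixel units so that every level uses integer steps. The cost function here is a toy stand-in for the matching error on interpolated samples, and all names are illustrative.

```python
def refine_fractional(cost, best_integer, steps=(4, 2, 1)):
    """Refine an integer-pixel motion vector to 1/8-pixel accuracy.

    Positions are in 1/8-pixel units; steps 4, 2, 1 correspond to the
    1/2-, 1/4- and 1/8-pixel neighbor searches.
    """
    center = (best_integer[0] * 8, best_integer[1] * 8)
    for step in steps:
        candidates = [(center[0] + dx * step, center[1] + dy * step)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        center = min(candidates, key=cost)  # best point becomes the next center
    return center


def toy_cost(position):
    """Assumed matching error: squared distance to a made-up true motion."""
    return (position[0] - 20.6) ** 2 + (position[1] + 9.3) ** 2


result = refine_fractional(toy_cost, best_integer=(3, -1))
print(result, (result[0] / 8, result[1] / 8))
```

Each level evaluates the center and its 8 neighbors, so three levels cost at most 27 evaluations instead of an exhaustive scan of every 1/8-pixel position around the integer point.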