H.264 Video Coding Technology for Digital Television (數位電視之 H.264 視訊編解碼技術), 楊士萱 (Shih-Hsuan Yang), Department of Computer Science and Information Engineering, National Taipei University of Technology


  • H.264

  • 2

    Outline: MPEG-4 / H.264 standards; H.264 coding techniques; DVB-H, RTP, and H.264 NALU encapsulation

  • MPEG-4 / H.264

  • 4

    MPEG-4 originally referred to MPEG-4 part 2 (1999).

    MPEG-2 video

    Compared with MPEG-2, MPEG-4 SP (simple profile) and ASP (advanced simple profile) improve compression by roughly 20%-30%; widely used for DVD rips and .avi files.

    MPEG-4 part 10 (ver. 1, 2003), advanced video coding (AVC), is identical to H.264.

    Compared with MPEG-2, MPEG-4 AVC reduces the bit rate by about 50% at comparable quality; within MPEG, it is better known as H.264.

    At 1~2 Mbps, H.264 delivers DVD quality; within a 6 MHz channel (19 Mbps), it can carry 1080i HDTV.

    Adopted by digital broadcasting (DVB-T HDTV, DVB-H) and by Blu-ray.

  • 5

    MPEG-4 (part 2)

  • 6

    DVB-H

    H.264 VCL data are organized by the H.264 NAL into NAL units (NALUs); HE-AAC v2 audio is organized into Access Units (AUs). ETSI TS 102 005 (IP Datacast) targets handheld devices with severe limitations on computational resources and battery. H.264 NALUs are packetized into RTP packets according to RFC 3984; RTP runs over UDP (User Datagram Protocol) and IP (Internet Protocol). Each IP datagram is encapsulated by MPE (optionally protected by MPE-FEC) and carried by MPEG-2 Systems in a Transport Stream (TS).

  • 7

    ETSI TS 102 005 specifies the H.264 capabilities required for DVB-H.

    Capability classes A, B, and C are defined; the baseline profile (class B accepts streams with constraint_set1_flag equal to 1, i.e., baseline-compatible) is required. The baseline error-resilience tools FMO, ASO, and RS are generally unnecessary for DVB-H, since MPE-FEC already protects the H.264 stream.

  • 8

    MPEG Licensing Fees

    MPEG-2: For encoders, decoders, and consumer products (e.g., digital camcorders, DVD-RW), $2.50 per unit. For products, $4.00 per data stream.

    MPEG-4 part 2: For encoders and decoders, $0.25 per unit; no charge on the first 20k units/year; annual cap of $1M. For encoder/decoder use (say, CATV operators), $0.25/subscriber or $0.02/hour, subject to the $1M annual cap; no charge on the first 50k subscribers/year.

    MPEG-4 AVC: Baseline is conditionally royalty-free.

  • H.264

  • 10

    H.264/MPEG-4 AVC

  • 11

    H.264

    H.264 was standardized jointly by ITU-T (as H.264) and ISO/IEC MPEG (as MPEG-4 AVC).

    H.264

  • 12

    Functional Blocks and Features of H.264.

    Yu-Wen Huang et al., "Analysis and Complexity Reduction of Multiple Reference Frames Motion Estimation in H.264/AVC," IEEE Trans. CSVT, vol. 16, no. 4, April 2006.

  • 13

    H.264 key coding tools:

    1/4-pixel motion estimation and compensation

    4×4 integer DCT

  • 14

    Variable block size motion compensation

    Seven luma block sizes are allowed, chosen to match the local texture: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4, with motion vectors of 1/4-pixel accuracy.

  • 15

    Runtime Percentages of Functional Blocks in H.264/AVC Baseline Encoder

    Motion estimation (along with mode decision) is the most time-consuming process for an H.264 video encoder.

    Yu-Wen Huang et al., "Analysis and Complexity Reduction of Multiple Reference Frames Motion Estimation in H.264/AVC," IEEE Trans. CSVT, vol. 16, no. 4, April 2006.

  • 16

    H.264 Inter-Mode Decision

    M (macroblock) types: partitions with luma block sizes of 16×16, 16×8, 8×16, and 8×8.

    SKIP mode: a special subclass of the 16×16 partitioning, where the best reference frame, motion vector, and transform coefficients are all equal to the predicted values.

    8×8 (sub-macroblock) types: further division of an 8×8 partition into smaller regions of 8×4, 4×8, or 4×4 luma samples. It occurs when the 8×8 partitioning produces the lowest RDcost among the four M types.

    [Figure: M types (16×16; 16×8 with partitions 0/1; 8×16 with partitions 0/1; 8×8 with partitions 0-3) and 8×8 types (each 8×8 partition further split into 8×8, 8×4, 4×8, or 4×4); Skip mode is shown as an unpartitioned 16×16 block.]

  • 17

    RDcost and Inter-Mode Decision

    Rate-Constrained Mode Decision: Rate-Distortion Cost (RDcost)

    J_MODE(s, c, MODE | λ_MODE) = SSD(s, c, MODE) + λ_MODE · R(s, c, MODE)

    where SSD denotes the sum of the squared differences between the original block s and its reconstruction c, QP is the quantization parameter, and MODE is an overall partition type for a 16×16 MB.

    λ_MODE = 0.85 × 2^((QP − 12)/3)
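The mode-decision rule above can be sketched in a few lines of Python (an illustrative sketch, not the JM reference implementation; the SSD and rate values are assumed to be supplied by the encoder):

```python
# Rate-constrained mode decision: lambda_mode = 0.85 * 2^((QP-12)/3),
# and the mode with the smallest Lagrangian cost J = SSD + lambda * R wins.

def lambda_mode(qp: int) -> float:
    """Lagrange multiplier for mode decision."""
    return 0.85 * 2 ** ((qp - 12) / 3)

def rd_cost(ssd: float, rate_bits: float, qp: int) -> float:
    """J = SSD + lambda_mode * R."""
    return ssd + lambda_mode(qp) * rate_bits

def best_mode(candidates: dict, qp: int):
    """candidates maps mode name -> (SSD, rate in bits);
    returns the mode minimizing the RDcost."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], qp))
```

A cheap-to-code SKIP mode (1 bit) can still lose to a 16×16 mode whose much smaller distortion outweighs its rate, which is exactly the trade-off the Lagrangian expresses.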

  • 18

    Mode selection example (Paris sequence)

    M types (including the SKIP mode) are adequate for slow-moving or homogeneous regions to reduce the overhead information and complexity. Sub-macroblock types are preferred in high-motion or detailed regions to increase the matching accuracy.

  • 19

    Test Sequences (examples)

    QCIF (176×144): foreman, carphone, news

    CIF (352×288): hall_monitor, container, coastguard

    SIF (352×240): tennis, football, garden, mobile

  • 20

    Optimal Mode Distribution (QP = 28)

    Resolution      Sequence      p(SKIP)  p(16×16)  p(16×8)  p(8×16)  p(8×8)   p(I4MB)  p(I16MB)
    QCIF (176×144)  Foreman       28.56%   28.59%    11.62%   15.38%   15.50%   0.15%    0.20%
                    Container     79.89%   9.57%     3.47%    3.47%    3.47%    0.00%    0.13%
                    Carphone      34.83%   27.82%    9.82%    11.28%   14.63%   0.55%    1.06%
                    News          73.69%   7.32%     3.82%    5.01%    9.78%    0.31%    0.08%
                    Silent        63.64%   12.19%    4.58%    6.45%    11.12%   0.95%    0.15%
    SIF (352×240)   Tennis        38.21%   18.17%    10.17%   9.45%    19.56%   1.83%    2.62%
                    Football      13.51%   20.21%    9.59%    9.41%    38.22%   8.98%    0.08%
                    Garden        11.47%   22.53%    12.63%   6.58%    46.26%   0.08%    0.45%
    CIF (352×288)   Foreman       30.41%   33.41%    10.26%   12.37%   10.86%   1.47%    1.23%
                    Container     73.75%   15.43%    3.46%    3.71%    3.06%    0.02%    0.58%
                    Hall_Monitor  64.12%   17.53%    5.16%    3.36%    6.52%    0.42%    2.89%
                    Coastguard    12.59%   35.73%    14.40%   15.14%   20.52%   1.34%    0.28%
                    Stefan        25.41%   26.73%    11.92%   10.30%   23.74%   0.73%    1.16%
                    Mobile        5.22%    28.97%    16.85%   16.08%   32.66%   0.05%    0.18%
                    Tempete       12.52%   32.02%    15.43%   14.01%   24.66%   1.01%    0.34%
    Average                       37.85%   22.41%    9.55%    9.47%    18.70%   1.19%    0.76%

  • 21

    Differences between prior MPEG standards and H.264

    Decoupling of referencing order from display order. In prior standards such as MPEG-2, there was a strict dependency between the ordering of pictures for motion-compensation referencing purposes and the ordering of pictures for display purposes. In H.264, these restrictions are largely removed, constrained only by a total memory capacity bound imposed to ensure decoding ability. Removing the restriction also eliminates the extra delay previously associated with bi-predictive coding.

  • 22

    Inter-Frame Prediction in P Slices

    More than one prior coded picture can be used as a reference for motion-compensated prediction of P slices. Multiframe motion-compensated prediction requires both encoder and decoder to store the reference pictures used for inter prediction in a multipicture buffer (the reference picture list is called list 0). Unless the size of the multipicture buffer is set to one picture, the index at which the reference picture is located inside the multipicture buffer has to be signaled. The reference index parameter is transmitted for each motion-compensated 16×16, 16×8, 8×16, or 8×8 luma block. Motion compensation for regions smaller than 8×8 uses the same reference index for prediction of all blocks within the 8×8 region. P_Skip mode: no extra side information is transmitted.

  • 23

    Inter-Frame Prediction in B Slices

    The concept of B slices is generalized in H.264: other pictures can reference pictures containing B slices. Thus, the substantial difference between B and P slices is that some macroblocks or blocks in B slices may use a weighted average of two distinct motion-compensated prediction values. B slices utilize two distinct lists of reference pictures, list 0 and list 1. Which pictures are actually located in each reference picture list is a matter of multipicture buffer control. An operation very similar to conventional MPEG-2 B pictures can be enabled if desired by the encoder. There are four types of inter-picture prediction: list 0, list 1, bi-predictive, and direct prediction. For the bi-predictive mode, the prediction signal is formed by a weighted average of motion-compensated list 0 and list 1 prediction signals. The direct prediction mode is inferred from previously transmitted syntax elements and can be either list 0 or list 1 prediction or bi-predictive.

    Direct Modes

    B_Skip mode (no further data is present for the macroblock in the bitstream): the functions MbPartWidth( B_Skip ) and MbPartHeight( B_Skip ) are used in the derivation process for motion vectors and reference frame indices. B_Direct mode (no motion vector differences or reference indices are present for the macroblock in the bitstream): the functions MbPartWidth( B_Direct_16x16 ) and MbPartHeight( B_Direct_16x16 ) are used in the derivation process for motion vectors and reference frame indices.

  • 24

    Rate-Constrained Motion Estimation

    Motion estimation is also performed in a rate-constrained framework. The encoder minimizes the Lagrangian cost function

    with the motion vector m, the predicted motion vector p, the reference frame parameter r, and the Lagrange multiplier λ_SAD:

    J_SAD(m, r | p, λ_SAD) = SAD(m, r) + λ_SAD · R(m − p, r)

    λ_SAD = sqrt(λ_MODE)

    SAD is used as the distortion measure; note that SAD is simpler to compute than SSD. The rate term R represents the number of bits associated with the motion information and with choosing reference picture r. The rate is estimated by table lookup using the universal variable-length code (UVLC) table, even if the arithmetic entropy-coding method is used. For the integer-pixel search, SAD is the summed absolute difference between the original luminance signal and the motion-compensated luminance signal. In the subpixel refinement search, the sum of the absolute Hadamard-transform coefficients of the difference between the original and motion-compensated luminance signals is calculated.

  • 25

    1/4-Pixel Accuracy (Fractional-Sample Motion Compensation)

    6-tap filtering at half-sample positions (luma):

    b = [(E − 5F + 20G + 20H − 5I + J) + 16] >> 5
    h = [(A − 5C + 20G + 20M − 5R + T) + 16] >> 5
    j = [(cc − 5dd + 20h + 20m − 5ee + ff) + 16] >> 5

    Quarter-sample positions (luma), e.g.:

    a = (G + b + 1) >> 1
    c = (H + b + 1) >> 1
    e = (b + h + 1) >> 1

    The prediction values for the chroma component are always obtained by bilinear interpolation.
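The half-sample filter above translates directly into code. The following Python sketch is illustrative (sample names follow the slide's notation; the clip to [0, 255] assumes 8-bit video):

```python
# Luma half-sample interpolation with the (1, -5, 20, 20, -5, 1) 6-tap
# filter, and the two-sample average used at quarter-sample positions.

def clip255(x: int) -> int:
    """Clip to the 8-bit sample range."""
    return max(0, min(255, x))

def half_sample(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """b = [(E - 5F + 20G + 20H - 5I + J) + 16] >> 5"""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)

def quarter_sample(p: int, q: int) -> int:
    """Quarter positions average two neighboring samples, e.g. a = (G + b + 1) >> 1."""
    return (p + q + 1) >> 1
```

On a flat region (all samples equal), the filter reproduces the input exactly, since its coefficients sum to 32 and the result is shifted right by 5.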

  • 26

    Blocking Artifacts

    Two sources:

    Block-based integer DCT: coarse quantization of the transform coefficients can cause visually disturbing discontinuities at the block boundaries. It is also well known that the coding errors are larger near the block boundaries than in the middle of the block.

    Motion-compensated prediction: motion-compensated blocks are generated by copying interpolated pixel data from different locations of possibly different reference frames.

    Two approaches to integrating deblocking filters into video codecs:

    Post filters operate on the display buffer outside of the coding loop, and thus are not normative in the standardization process. Because their use is optional, post filters offer maximum freedom for decoder implementations.

    Loop filters operate within the coding loop. That is, the filtered frames are used as reference frames for motion compensation of subsequent coded frames. This forces all standard-conformant decoders to perform identical filtering in order to stay in synchronization with the encoder.

  • 27

    Advantages of In-Loop Deblocking Filtering

    To guarantee a certain level of quality: with a loop filter in the codec design, content providers can safely assume that their material is filtered on every decoder, guaranteeing the quality level expected by the producer.

    No need for an extra frame buffer in the decoder: in the post-filtering approach, the frame is typically decoded into a reference frame buffer, and an additional frame buffer may be needed to store the filtered frame to be passed to the display device. In the loop-filtering approach, however, filtering can be carried out macroblock-wise during the decoding process, and the filtered output stored directly to the reference frame buffers.

    Loop filtering typically improves visual quality with significant reduction in decoder complexity compared to post filtering.

    Quality improvements are mainly due to the fact that filtered reference frames offer higher quality prediction for motion compensation. Reductions in computational complexity can be achieved by taking into account the fact that the image area in past frames is already filtered, and thereby optimizing the filtering process accordingly.

  • 28

    H.264 In-Loop Deblocking Filtering Phase I: Boundary Analysis (1/2)

    Block modes and conditions                                      Bs
    One of the blocks is Intra and the edge is a macroblock edge    4
    One of the blocks is Intra                                      3
    One of the blocks has coded residuals                           2
    Difference of block motion ≥ 1 luma sample distance             1
    Motion compensation from different reference frames             1
    Else                                                            0

    Boundary-Strength (Bs) parameter is assigned to every edge between two 4×4 luminance sample blocks. The conditions are evaluated from top to bottom, until one of the conditions holds true. Bs determines the strength of the filtering performed on the edge. A value of 4 means the strongest filtering, whereas a value of 0 means no filtering is applied on this edge. The Bs values for filtering of chrominance block edges are copied from the values calculated for their corresponding luminance edges.

  • 29

    H.264 In-Loop Deblocking Filtering Phase I: Boundary Analysis (2/2)

    In deblocking filtering, it is crucially important to distinguish between true edges in the image and those created by quantization. To separate these two cases, the sample values across every edge to be filtered are analyzed. Let us denote one line of samples across the edge as p2, p1, p0 | q0, q1, q2, with p0 and q0 adjacent to the edge. For edges with nonzero Bs values, a pair of quantization-dependent parameters, referred to as α and β, are used in the content activity check that determines whether each set of samples is filtered. Filtering on a line of samples only takes place if the following three conditions all hold:

    |p0 − q0| < α(IndexA)
    |p1 − p0| < β(IndexB)
    |q1 − q0| < β(IndexB)
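Phase I of the deblocking filter can be sketched as follows (an illustrative sketch: the Bs conditions are evaluated top to bottom as in the table on the previous slide, and the actual α/β values come from the standard's QP-indexed tables, passed in here as plain parameters):

```python
# Phase I of H.264 in-loop deblocking: derive the boundary strength (Bs)
# for an edge, then run the sample-level activity check.

def boundary_strength(intra_p, intra_q, mb_edge, coded_p, coded_q,
                      mv_diff_ge_1, diff_ref):
    """Evaluate the Bs conditions from top to bottom; first match wins."""
    if (intra_p or intra_q) and mb_edge:
        return 4                      # Intra block on a macroblock edge
    if intra_p or intra_q:
        return 3                      # Intra block on an internal edge
    if coded_p or coded_q:
        return 2                      # coded residuals present
    if mv_diff_ge_1 or diff_ref:
        return 1                      # motion discontinuity
    return 0                          # no filtering

def filter_samples(p1, p0, q0, q1, alpha, beta):
    """Activity check: |p0-q0| < alpha, |p1-p0| < beta, |q1-q0| < beta."""
    return abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta
```

A large step across the edge (|p0 − q0| ≥ α) is treated as a true image edge and left untouched, which is the point of the activity check.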

  • 30

    H.264 In-Loop Deblocking Filtering Phase II: Filtering (1/2)

    Filtering operations are conducted on a macroblock basis, with horizontal filtering (vertical edges) performed first, followed by vertical filtering. Both directions of filtering on each macroblock must be completed before moving on to the next macroblock. The macroblocks are filtered in raster-scan order throughout the picture. For each luminance macroblock, the left-most edge of the macroblock is filtered first, followed from left to right by the three vertical edges that are internal to the macroblock. Similarly, in the vertical filtering pass, the top edge of the macroblock is filtered first, followed by the three internal horizontal edges from top to bottom. Chrominance filtering follows a similar order, with one external edge and one internal edge in each direction for each 8×8 chrominance block. Filtering is conducted in place, so that the modified sample values after filtering each line of samples across an edge are used as input values to subsequent operations.

  • 31

    H.264 In-Loop Deblocking Filtering Phase II: Filtering (2/2)

    Two filtering modes are defined and are selected based on the Bs parameter for a set of samples. Stronger filtering is applied when Bs is equal to 4.

    Intra coding in H.264 tends to use 16×16 modes when coding nearly uniform image areas. However, due to the Mach band effect, even very small differences in the intensity values at the macroblock boundaries are perceived as abrupt steps in these cases. To compensate for this tiling effect, stronger filtering is applied on boundaries between two macroblocks with smooth image content.

    For Bs = 1, 2, or 3, p0 and q0 are modified by +Δ and −Δ, respectively. p1 and q1 are only modified if (p0, p2) and (q0, q2) are close enough. Clipping (limiting the modified values) is performed when too much low-pass filtering (blurring) would occur.

  • 32

    Complexity Issues of In-Loop Deblocking

    Even after a tremendous effort in speed optimization of the filtering algorithms, the filter can easily account for one-third of the computational complexity of a decoder, even though the loop filter can be implemented without any multiplication or division operations. The complexity is mainly due to the high adaptivity of the filter, which requires conditional processing on the block-edge and sample levels. As a consequence, conditional branches almost inevitably appear in the innermost loops of the algorithm; these are very time-consuming and are also quite a challenge for parallel processing in DSP hardware or SIMD (Single Instruction Multiple Data) code on general-purpose processors. Another reason for the high complexity is the small block size employed for residual coding in H.264. With 4×4 blocks and a typical filter length of 2 samples in each direction, almost every sample in a picture must be loaded from memory, either to be modified or to determine whether neighboring samples will be modified.

  • 33

    Intra Prediction: in H.264 4×4 Intra prediction, the 16 samples of the current block (a~p) are predicted from the neighboring previously decoded samples (A~M). Nine prediction modes (eight directional modes plus DC) are defined based on A~M.

  • 34

    4 4 Intra Prediction Example

    The 9 prediction modes (0-8) are calculated for the 4×4 block shown above. The left figure shows the prediction block P created by each of the modes. The Sum of Absolute Errors (SAE) for each prediction indicates the magnitude of the prediction error. In this case, the best match to the actual current block is given by mode 7.
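Three of the nine modes (vertical, horizontal, DC) are simple enough to sketch. The fragment below is illustrative only (the helper names are hypothetical; `top` and `left` hold the decoded neighboring samples A-D and I-L):

```python
import numpy as np

def sae(block, pred):
    """Sum of absolute errors between the current block and a prediction."""
    return int(np.abs(block - pred).sum())

def intra4x4_candidates(block, top, left):
    """block: 4x4 samples; top: 4 decoded samples above (A-D);
    left: 4 decoded samples to the left (I-L).
    Returns {mode: SAE} for vertical (0), horizontal (1), and DC (2)."""
    preds = {
        0: np.tile(top, (4, 1)),                # vertical: copy column down
        1: np.tile(left.reshape(4, 1), (1, 4)), # horizontal: copy row right
        2: np.full((4, 4), (int(top.sum()) + int(left.sum()) + 4) // 8),  # DC mean
    }
    return {mode: sae(block, p) for mode, p in preds.items()}
```

The encoder would evaluate all nine modes this way and pick the one with the smallest SAE (or, in a rate-constrained encoder, the smallest RDcost).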

  • 35

    16×16 luma Intra prediction modes and chroma Intra prediction

    16×16 luma Intra prediction: Mode 0 (vertical): extrapolation from upper samples (H). Mode 1 (horizontal): extrapolation from left samples (V). Mode 2 (DC): mean of upper and left-hand samples (H+V). Mode 3 (plane): a linear plane function is fitted to the upper and left-hand samples H and V; this works well in areas of smoothly varying luminance.

    Chroma samples use the same set of Intra prediction modes.

  • 36

    4×4 integer DCT

    MPEG-4 AVC replaces the traditional 8×8 DCT of earlier standards with a 4×4 integer transform, a DCT approximation that can be computed exactly in integer arithmetic (no encoder/decoder mismatch).

    The High Profile additionally supports an 8×8 integer DCT.

    T = | 1   1   1   1 |
        | 2   1  −1  −2 |
        | 1  −1  −1   1 |
        | 1  −2   2  −1 |
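Applying the transform is a pair of matrix multiplications, Y = T·X·Tᵀ. A minimal sketch (the scaling and quantization that H.264 folds into one step are omitted here):

```python
import numpy as np

# Forward 4x4 integer transform of H.264, using the matrix on this slide.
T = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def forward_transform(x):
    """x: 4x4 residual block; returns the unscaled coefficient block Y = T X T^T."""
    return T @ np.asarray(x) @ T.T
```

For a constant residual block all energy lands in the DC coefficient Y[0, 0], and the remaining 15 coefficients are exactly zero, which is what makes the integer transform drift-free.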

  • 37

    Scan order of transform coefficient levels

    zig-zag scan: A specific sequential ordering of transform coefficient levels from (approximately) the lowest spatial frequency to the highest. Zig-zag scan is used for transform coefficient levels in frame macroblocks.

    field scan: A specific sequential ordering of transform coefficients that differs from the zig-zag scan by scanning columns more rapidly than rows. Field scan is used for transform coefficients in field macroblocks.

  • 38

    Coded Block Pattern (CBP)

    CBP specifies which of the six 8×8 blocks in an MB (for 4:2:0 subsampling) contain non-zero transform coefficients.

    Some CBP examples and their implications:

    CBP = 0 (000000): all the luma and chroma blocks are uncoded, which implies that motion estimation is very accurate.

    CBP = 1, 2, 4, or 8: only one of the four luma blocks (and no chroma block) requires residual coding.

    CBP = 16 or 32: the luma blocks are uncoded while some chroma blocks require residual coding.

    CodedBlockPatternChroma   Description
    0   All chroma transform coefficient levels are equal to 0.
    1   One or more chroma DC transform coefficient levels shall be non-zero valued. All chroma AC transform coefficient levels are equal to 0.
    2   Zero or more chroma DC transform coefficient levels are non-zero valued. One or more chroma AC transform coefficient levels shall be non-zero valued.
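The bit layout of CBP for 4:2:0 can be illustrated with a small helper (a sketch, not decoder code: the low 4 bits flag the four 8×8 luma blocks and the remaining high bits form CodedBlockPatternChroma):

```python
# Decompose a coded_block_pattern value (4:2:0) into its luma and
# chroma parts, matching the examples on this slide.

def parse_cbp(cbp: int):
    """Returns ([luma block 0..3 coded?], CodedBlockPatternChroma)."""
    luma = [bool(cbp & (1 << i)) for i in range(4)]  # bits 0-3: 8x8 luma blocks
    chroma = cbp >> 4   # 0: none, 1: chroma DC only, 2: chroma DC and/or AC
    return luma, chroma
```

For instance, CBP = 16 or 32 leaves all luma flags clear and sets CodedBlockPatternChroma to 1 or 2, matching the table above.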

  • 39

    Entropy Coding in MPEG-4 AVC

    Three methods: CAVLC, CABAC, and the universal variable-length code (UVLC, based on Exp-Golomb codes).

    CAVLC (context-adaptive variable-length coding) is available from the Baseline Profile; CABAC (context-adaptive binary arithmetic coding) from the Main Profile. Both code the syntax elements.

    CABAC typically saves 5%-15% in bit rate compared with CAVLC.

  • 40

    Exp-Golomb Codes (Exponential-Golomb code)

    To encode a nonnegative integer codeNum: write codeNum + 1 in binary, and call it x. Suppose that x contains p bits. Prepend p − 1 zeros to x. The resulting binary string z of 2p − 1 bits is the Exp-Golomb codeword (denoted as Bit string) of codeNum.

    Examples:codeNum = 3. x = 100. z = 00100.codeNum = 8. x = 1001. z = 0001001.

    Decoding method (obtaining codeNum from Bit string z): codeNum = 2^leadingZeroBits − 1 + read_bits(leadingZeroBits), where read_bits(leadingZeroBits) is the value of the leadingZeroBits bits following the first 1. Example: z = 0001001. leadingZeroBits = 3. read_bits(3) = 001₂ = 1. So codeNum = 2³ − 1 + 1 = 8.
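The encoding and decoding procedures above transcribe directly into Python (a sketch operating on bit strings rather than a real bitstream reader):

```python
# Unsigned Exp-Golomb codes (ue(v)) as described on this slide.

def exp_golomb_encode(code_num: int) -> str:
    """Binary of codeNum + 1 (p bits), with p - 1 zeros prepended."""
    x = bin(code_num + 1)[2:]
    return "0" * (len(x) - 1) + x

def exp_golomb_decode(z: str) -> int:
    """codeNum = 2^leadingZeroBits - 1 + value of the bits after the first 1."""
    leading_zero_bits = z.index("1")
    suffix = z[leading_zero_bits + 1: 2 * leading_zero_bits + 1]
    read_bits = int(suffix, 2) if suffix else 0
    return 2 ** leading_zero_bits - 1 + read_bits
```

The code is prefix-free: the run of leading zeros tells the decoder exactly how many bits follow, so no length field is needed.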

  • 41

    CAVLC in H.264

    In this scheme, VLC tables for various syntax elements are switched depending on already transmitted syntax elements. Since the VLC tables are designed to match the corresponding conditioned statistics, the entropy coding performance is improved in comparison to schemes using a single VLC table. In the CAVLC entropy coding method, the number of nonzero quantized coefficients (N) and the actual size and position of the coefficients are coded separately. The numbers of nonzero coefficients in adjacent blocks are highly correlated. After zig-zag scanning of transform coefficients, their statistical distribution typically shows large values for the low-frequency part, decreasing to small values later in the scan for the high-frequency part. Furthermore, the trailing nonzero coefficients are mostly +1 or −1.

  • 42

    CAVLC Sections (1/2)

    Section A: Number Coding. Encode (TotalCoeff, TrailingOnes) as a function (context) of nC.

    TotalCoeff = number of nonzero coefficients. TrailingOnes = number of coefficients with absolute value equal to 1 at the end of the scan. nC = average number of nonzero coefficients (given by TotalCoeff) of the blocks to the left of and above the current block. For luma blocks, 4 VLC tables correspond to 4 ranges of nC:

    0 ≤ nC < 2, 2 ≤ nC < 4, 4 ≤ nC < 8, and nC ≥ 8

  • 43

    CAVLC Sections (2/2)

    Section B: Value Coding. Coefficient values are coded in reverse scan order. The TrailingOnes need only sign specification. A starting VLC is used for the first coefficient. When coding the next coefficient, a new VLC may be used based on the just-coded coefficient. Six Exp-Golomb code tables are available for this adaptation.

    Section C: Distribution CodingEncode the distribution of zeros.

    total_zeros: the total number of zeros excluding those after the last nonzero coefficient (EOB). 15 VLC tables, in the context of TotalCoeff (in the range 1~15), are available.

    run_before: the number of 0s between a nonzero coefficient and the next nonzero coefficient (in reverse scan order).

    zerosLeft: the total number of 0s between the current nonzero coefficient and the DC coefficient; run_before ≤ zerosLeft.

    (run_before, zerosLeft) is coded as a pair.

  • 44

    Example: 7,6,-2,0,-1,0,0,1,0,0,0,0,0,0,0,0

    Section A: to encode (TotalCoeff = 5, TrailingOnes = 2) in the context of nC. Suppose that nC = 3; then A = 0000101.

    Section B: to encode the values of the nonzero coefficients in reverse scan order: +, −, −2, +6, +7 (the two trailing ±1s need only their signs).

    Section C: to encode total_zeros (3): C0 = 111. To encode (run_before = 2, zerosLeft = 3) for the coefficient +1: C1 = 01. To encode (run_before = 1, zerosLeft = 1) for the coefficient −1: C2 = 1.

  • 45

    Error Protection Mechanisms

    Transmission errors propagate (error propagation) in compressed video because of prediction and variable-length coding.

    Countermeasures include forward error correction (FEC, adding parity-check bits), error-resilience tools in the source coder, and error concealment at the decoder.

  • 46

    Use Case of Error Protection

  • 47

    Error Resilience Tools (also available in older standards) 1/2

    Picture segmentation

    H.264 supports picture segmentation in the form of slices. A slice is formed by an integer number of MBs of one picture. Macroblocks are assigned to slices in raster-scan order, unless FMO is used. The main motivation for slices is the adaptation of the coded slice size to different MTU sizes, but they can also be used to implement schemes such as interleaved packetization.

    Placement of Intra MBs, intra slices, and intra pictures

    Intra placement is used primarily to combat drifting effects. H.264 has two forms of slices that contain Intra MBs only: Intra slices and IDR slices. IDR slices must always form a complete IDR picture. An IDR picture invalidates all short-term reference memory buffers and hence has a stronger re-synchronization property than a picture that contains only Intra slices.

    Reference picture selection (with and without feedback)

  • 48

    Error Resilience Tools (also available in older standards) 2/2

    Data partitioning

    Normally, all symbols of a macroblock are coded together in a single bit string that forms a slice. Data partitioning, however, creates more than one bit string (called partitions) per slice, and allocates the symbols of a slice that have a close semantic relationship with each other into an individual partition. In H.264, three different partition types are used.

    Type A: header information, including MB types, quantization parameters, and motion vectors. It is the most important, because the other partitions cannot be used without it.

    Type B: the Intra partition. It carries Intra CBPs and Intra coefficients. The type B partition requires the availability of the type A partition of a given slice to be useful.

    Type C: the Inter partition. It contains only Inter CBPs and Inter coefficients, but is in many cases the biggest partition of a coded slice. To be used, it requires the availability of the type A partition, but not the type B partition.

  • 49

    Parameter Sets in H.264

    Calling parameter sets an error-resilience tool is not entirely appropriate; they are generally used in all H.264 bit streams. The sequence parameter set contains all information related to a sequence of pictures (defined as all pictures between two IDR pictures), and a picture parameter set contains all information related to all the slices belonging to a single picture. Multiple different sequence and picture parameter sets can be available at the decoder in numbered storage positions. The encoder chooses the appropriate picture parameter set by referencing its storage location in the slice header of each coded slice. The picture parameter set itself contains a reference to the sequence parameter set to be used. The key to using parameter sets in an error-prone environment is to ensure that they arrive reliably, and in a timely fashion, at the decoder. They can, for example, be sent out-of-band, using a reliable control protocol, and early enough to reach the decoder in time. Alternatively, they can be sent in-band, but with appropriate application-layer protection (e.g., by sending multiple copies). A third option is for an application to hard-code a few parameter sets in both encoder and decoder, which would then be the only operation points of the codec.

  • 50

    New Error Resilience Tools in H.264 (1/2): FMO (Flexible Macroblock Ordering)

    These tools are available only in the Baseline and Extended profiles. FMO allows assigning MBs to slices in an order other than the scan order. To do so, each MB is assigned to a slice group using a macroblock allocation map (MBAmap). Each slice group can be divided into several slices. Within a slice group, MBs are coded in the normal scan order. In-picture prediction mechanisms, such as Intra prediction or motion vector prediction, are only allowed if the spatially neighboring MBs belong to the same slice group, so FMO usually lowers coding efficiency.

    Seven map types are defined. Type 1: suitable in conjunction with error concealment. Type 2: foreground/background separation for different transmission priorities. Type 6: user defined.

  • 51

    New Error Resilience Tools in H.264 (2/2): ASO and RS

    Arbitrary Slice Ordering (ASO)

    ASO enables sending and receiving the slices of a picture in any order relative to each other. Since each slice of a coded picture can be (approximately) decoded independently of the other slices, this capability (combined with FMO) can improve end-to-end delay in real-time applications, particularly on networks with out-of-order delivery behavior (e.g., IP networks).

    Redundant Slices (RSs)

    RSs allow an encoder to place one or more redundant representations of the same MBs in the bitstream. The key difference between transport-based redundancy, such as packet duplication, and the use of RSs is that the redundant representation in RSs can be coded using different coding parameters (e.g., QPs).

  • 52

    H.264 Error Concealment

  • 53

    The H.264 JM reference software includes (informative, non-normative) error concealment, handling both frame loss and slice loss.

    Intra concealment: spatial interpolation of each lost pixel from the boundary pixels of the neighboring MBs.

    Inter concealment: temporal concealment that estimates the lost motion vectors, e.g., by boundary matching.

    These algorithms are implemented in the JM decoder.

  • 54

    Error Concealment (EC) Algorithms

    Intra concealment (weighted pixel averaging): each pixel Y~(x, y) of a corrupted MB is interpolated from the boundary pixels Y1(i, j)~Y4(i, j) of the four neighboring MBs, with weights w1~w4 inversely related to the distances d1~d4 from the pixel to the corresponding boundaries:

    Y~(x, y) = Σ_i w_i · Y_i(i, j) / Σ_i w_i

    [Figure: a corrupted MB with the distances d1~d4 to its four boundaries, and a candidate MB showing the "inside pixels" (boundary samples of the candidate block) and "outside pixels" (samples of the correctly received neighboring MBs).]

    Inter BMA (boundary match algorithm): for each candidate motion vector mv, the boundary match distortion

    d_sm = (1/N) Σ_{i=1}^{N} | Ŷ_i^IN(mv) − Y_i^OUT |,  dir ∈ {top, bot, left, right}

    is computed between the N inside pixels of the motion-compensated candidate block and the outside pixels of the surrounding MBs; the candidate minimizing the distortion is selected.
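The BMA selection can be sketched as follows (illustrative only; extracting the matched inside/outside boundary samples for each candidate motion vector is assumed to be done by the caller):

```python
import numpy as np

# Boundary match algorithm (BMA) for inter error concealment: pick the
# candidate MV whose motion-compensated block best matches the pixels
# just outside the corrupted macroblock.

def boundary_distortion(inside, outside):
    """d = (1/N) * sum(|Y_in - Y_out|) over the N matched boundary samples."""
    inside = np.asarray(inside, dtype=int)
    outside = np.asarray(outside, dtype=int)
    return float(np.abs(inside - outside).mean())

def best_candidate_mv(candidates):
    """candidates: iterable of (mv, inside_samples, outside_samples);
    returns the motion vector with the smallest boundary distortion."""
    return min(candidates, key=lambda c: boundary_distortion(c[1], c[2]))[0]
```

Typical candidates are the zero MV and the MVs of the correctly received neighboring MBs, so the search is cheap compared with a full motion search.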

  • 55

    Effects of JM Intra Error Concealment


  • 56

    Results of Inter Error Concealment (packet loss rate = 5%, QP=24)

    Upper left: correct frame. Upper right: corrupted frame without error concealment. Lower left: concealed frame by the proposed method. Upper middle: concealed frame by the fixed 8×8 method. Lower right: concealed frame by JM (fixed 16×16).

  • 57

    H.264 baseline

    CAVLC

  • 58

    Summary of H.264

    H.264 achieves substantially higher coding efficiency than MPEG-1/2, at the cost of higher complexity (e.g., 1/4-pixel motion compensation, CABAC).

    Complexity: MPEG-2 : SP (ASP) : AVC = 1 : 2-3 : 4-5. Coding efficiency: MPEG-2 : SP (ASP) : AVC = 1 : 1.5 : 2-3.

    H.264 is the leading codec for digital TV applications.

  • 59

    H.264 Profiles (1/4)

    Baseline: I and P slices (no B slices), CAVLC (no CABAC), Flexible Macroblock Ordering (FMO), Arbitrary Slice Ordering (ASO), and redundant slices (RS). Adopted for mobile broadcasting such as DVB-H, DMB (Digital Multimedia Broadcasting), and 3GPP MBMS (Multimedia Broadcast/Multicast Service); Baseline is also used by the Apple video iPod.

    Main: adds to Baseline interlaced coding, B slices, and CABAC, but drops FMO, ASO, and RS. Main targets SDTV.

  • 60

    H.264 Profiles (2/4)

    Extended: a superset of Baseline, keeping FMO and ASO and adding SI/SP slices; no CABAC.

    High: adds the 8×8 integer DCT to Main.

    High, rather than Main, was chosen by DVB-T and ISDB-T for HDTV services.

    High is also adopted by Blu-ray Disc.

  • 61

    H.264 Profiles (3/4)

    High 10: extends High to 10 bits per sample.

    High 4:2:2: extends High 10 with 4:2:2 chroma sampling.

    High 4:4:4: extends High 4:2:2 with 4:4:4 chroma sampling and up to 12 bits per sample.

    These fidelity-range profiles build on High 10.

  • 62

    H.264 Profiles (4/4)

  • 63

    H.264

    1/4-pixel ME/MC
    Multiple and flexible reference frames
    Adaptive MB size selection for Inter prediction

    Intra prediction
    Integer transform

    CABAC

    In-loop deblocking filter

  • 64

    Comparing key features of H.264 with MPEG-2 (MP@ML) and MPEG-4 ASP

  • DVB-H: RTP and H.264 NALU Encapsulation

  • 66

    DVB-H

    H.264 VCL data are organized by the H.264 NAL into NAL units (NALUs); HE-AAC v2 audio is organized into Access Units (AUs). ETSI TS 102 005 (IP Datacast) targets handheld devices with severe limitations on computational resources and battery. H.264 NALUs are packetized into RTP packets according to RFC 3984; RTP runs over UDP (User Datagram Protocol) and IP (Internet Protocol). Each IP datagram is encapsulated by MPE (optionally protected by MPE-FEC) and carried by MPEG-2 Systems in a Transport Stream (TS).

  • 67

    RTP (Real-time Transport Protocol): encapsulation of H.264 NALUs in RTP packets

    RTP defines a standardized packet format for delivering audio and video over the Internet. It was developed by the Audio-Video Transport Working Group of the IETF as RFC 3550. Protocols like SIP, RTSP, H.225, and H.245 are used for session initiation, control, and termination. Other standards like H.264, MPEG, H.263, etc., are used to encode the payload data (specified via an RTP Profile). An RTP sender captures the multimedia data, which are then encoded as frames and transmitted as RTP packets, with appropriate timestamps and increasing sequence numbers. Depending on the RTP Profile in use, the Payload Type field is set. The RTP header has a minimum size of 12 bytes. After the header, optional header extensions may be present. This is followed by the RTP payload, the format of which is determined by the particular class of application. The combined size of the IP/UDP/RTP headers is 20 + 8 + 12 = 40 bytes. The fields in the header are as follows:

    Bit Offset 0-1 2 3 4-7 8 9-15 16-31

    0 Ver. P X CC M Payload Type Sequence Number

    32 Timestamp

    64 SSRC identifier

    96 CSRC identifiers (optional)

  • 68

    RTP Header (1/2)

    Ver.: (2 bits) Indicates the version of the protocol. The current version is 2.

    P (Padding): (1 bit) Indicates whether there are extra padding bytes at the end of the RTP packet, to make the payload a multiple of 32 bits.

    X (Extension): (1 bit) Indicates the presence of an extension header between the standard header and the payload data. This is application/profile specific.

    CC (CSRC Count): (4 bits) Contains the number of CSRC identifiers that follow the fixed header.

    M (Marker): (1 bit) Used at the application level and defined by a profile. If it is set, the current data has some special relevance for the application.

    Video: mark the end of a frame.Audio: mark the beginning of a talk spurt.

    PT (Payload Type): (7 bits) Indicates the format of the payload and determines its interpretation by the application. This is specified by an RTP profile. Some examples: PCM -law (0), GSM (3), MPEG Audio (14), G.728 (15); Motion JPEG (26), H.261 (31), MPEG-1 video (32), MPEG-2 video (33), H.263 (34).See http://www.iana.org/assignments/rtp-parameters for the latest update.The RTP payload type of H.264 is not specified. For DVB-H, it is otherwise specified by SDP (as payload of FLUTE/ALT).


  • 69

    RTP Header (2/2)

    Sequence Number: (16 bits) Used by the receiver to detect packet loss. The sequence number is incremented by one for each RTP data packet sent. RTP itself takes no action on packet loss; the desired action is left to the application.
    Timestamp: (32 bits) Reflects the sampling instant of the first audio/video byte in the RTP packet. The granularity of the timing is application specific, derived from a sampling clock at the sender. For example, for an audio application with 8k samples/sec, the timestamp clock increments by one every 125 μs; if the audio application generates chunks of 160 samples, the timestamp increases by 160 per packet. The timestamp can be used to remove jitter at the receiver via a delay buffer. When several media streams are present, the timestamps are independent in each stream and may not be relied upon for media synchronization -> RTCP.
    SSRC: (32 bits) The synchronization source identifier uniquely identifies the source of a stream.

      - Assigned randomly by the source.
      - Distinct SSRC for each stream in an RTP session (defined by RTP/RTCP port no. + IP address).

    CSRC (optional): Contributing source IDs enumerate the contributing sources to a stream that has been generated from multiple sources. The number of CSRC identifiers is given by CC (CSRC Count).
    Extension header: (optional).
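The loss detection implied by the Sequence Number field has to be wraparound-aware, since the counter is only 16 bits. A minimal sketch (hypothetical helper names; arithmetic modulo 2^16):

```python
def seq_delta(prev: int, curr: int) -> int:
    """Forward distance from prev to curr in 16-bit sequence-number space."""
    return (curr - prev) & 0xFFFF

def packets_lost(prev: int, curr: int) -> int:
    """Packets missing between two consecutively received RTP packets."""
    return seq_delta(prev, curr) - 1
```

For example, receiving 2 right after 65534 means sequence numbers 65535, 0 and 1 were lost.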

  • 70

    RTP over UDP

    RTP typically runs on top of UDP. The sending side encapsulates a media chunk within an RTP packet, then encapsulates the packet in a UDP segment, and then hands the segment to IP. Recall that the UDP header has only four fields, each consisting of two bytes:

    - Source port number
    - Destination port number
    - Length: entire datagram (header plus data) in bytes
    - Checksum: one's complement of the sum of 2-byte chunks
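The checksum rule above (one's complement of the one's-complement sum of 16-bit words) can be sketched as follows. Note that the real UDP checksum also covers a pseudo-header with the IP addresses, which is omitted here for brevity:

```python
def udp_checksum_body(data: bytes) -> int:
    """One's complement of the one's-complement sum of 16-bit words.
    Simplified sketch: the UDP pseudo-header is not included."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)     # fold carry back in
    return (~total) & 0xFFFF
```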

  • 71

    RTCP (Real-Time Control Protocol)

    RTCP is a separate control protocol that provides feedback to the RTP data source and to all other session participants (IP multicast fashion). RTCP packets are transmitted by each participant in an RTP session to all other participants in the session. RTCP uses the same transport service as RTP (usually UDP).

    RTP and RTCP packets are distinguished from each other through the use of distinct port numbers: RTCP port number = RTP port number + 1. RTCP packets do not encapsulate chunks of audio or video. Instead, RTCP packets are sent periodically and contain sender and/or receiver reports that announce statistics useful to the application, such as the number of packets sent, the number of packets lost, and the interarrival jitter.

  • 72

    RTCP (cont.)

    For each RTP stream that a sender is transmitting, the sender creates and transmits RTCP sender report packets. These packets include:

    - The SSRC of the RTP stream
    - The timestamp and wall clock time of the most recently generated RTP packet in the stream
    - The number of packets/bytes sent in the stream

    For typical multimedia applications, the (relative) timestamps (32 bits, in units of the associated sampling period) of the video and audio RTP packets are individually generated by the video and audio clock sources; a third clock source, called the wall clock, is necessary to resolve the audio-video synchronization issue (lip synchronization). The wall clock is an absolute Unix time (64 bits, in units of 2^-32 second).

    Sender reports can be used to synchronize different media streams within an RTP session. Each RTCP sender report contains, for the most recently generated RTP packet in the associated RTP stream, the timestamp of the RTP packet and the wall clock time when the packet was created. Thus the RTCP sender report associates the sampling clock with the real-time clock. Receivers can use this association in RTCP sender reports to synchronize the playout of audio and video.
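The sampling-clock/wall-clock association carried in a sender report can be applied as sketched below (illustrative names; a receiver would also need to track 32-bit timestamp wraparound, which is handled here by a signed modular difference):

```python
def rtp_ts_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                        sr_wallclock: float, clock_rate: int = 90000) -> float:
    """Map an RTP timestamp to wall-clock seconds using the (RTP timestamp,
    wall clock) pair from the most recent RTCP sender report."""
    # signed difference in 32-bit timestamp space (wraparound-aware)
    diff = ((rtp_ts - sr_rtp_ts + (1 << 31)) & 0xFFFFFFFF) - (1 << 31)
    return sr_wallclock + diff / clock_rate
```

Doing this independently for the audio and the video stream puts both on the common wall-clock axis, which is what makes lip synchronization possible.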

  • 73

    Lip Synchronization in DVB-H

    The timestamp of the wall clock, along with either the video or the audio timestamp, is provided in the RTCP packet. Call the RTCP packet that contains the timestamps of both the audio (or video) clock and the wall clock the A_RTCP (or V_RTCP). The following outlines the basic steps of lip synchronization.

    1) Empty the audio output buffer.
    2) Search for an A_RTCP. Let the receiver reference clock (RRC) be its wall clock TS. Feed the decoded audio samples to the audio output buffer.
    3) When a V_RTCP is present, the RRC timestamp associated with the current video frame is denoted as TV. If the V_RTCP is not present, advance TV by ΔTV = ΔV / 90,000, where ΔV is the difference of the timestamps given in the previous and present (video) RTP packets. Decode the video frame.
    4) Monitor the consumed number of audio samples (per channel) SC in the audio output buffer. If necessary, decode the next audio packet. Update TS by TS = TS + SC / Fs, where Fs is the audio sampling rate.
    5) If TS < TV, go back to step 4. If TS > TV + ΔTV, drop the current video frame. Otherwise (TV ≤ TS ≤ TV + ΔTV), render (show) the current frame.
    6) If an A_RTCP is present for the current audio packet, set TS = TS − SR / Fs, where SR is the number of samples remaining in the audio output buffer. The discontinuity of TS should be used to determine whether deletion or insertion of audio samples is necessary to accommodate the sampling-rate difference between the transmitter and the receiver.
    7) Decode the audio packet and go back to step 3.

  • 74

    RFC 3984 (H.264RTP)

    Marker bit (M): 1 bit
    Set for the very last packet of the access unit indicated by the RTP timestamp, in line with the normal use of the M bit in video formats, to allow efficient playout buffer handling. For aggregation packets (STAP and MTAP), the marker bit in the RTP header MUST be set to the value that the marker bit of the last NAL unit of the aggregation packet would have had if it were transported in its own RTP packet. Decoders MAY use this bit as an early indication of the last packet of an access unit, but MUST NOT rely on this property.
    Informative note: Only one M bit is associated with an aggregation packet carrying multiple NAL units. Thus, if a gateway has re-packetized an aggregation packet into several packets, it cannot reliably set the M bit of those packets.

    Payload type (PT): 7 bits
    The assignment of an RTP payload type for this new packet format is outside the scope of RFC 3984 and is not specified there. The payload type has to be assigned either through the profile used or in a dynamic way. For DVB-H, the payload type (such as H.264) is signaled by SDP (carried as payload of FLUTE/ALC), along with IP address, port number, program title, etc.

  • 75

    RFC 3984 (cont.)

    Sequence number (SN): 16 bits
    Set and used in accordance with RFC 3550. For the single NAL unit and non-interleaved packetization modes, the sequence number is used to determine the decoding order of the NALUs.

    Timestamp: 32 bits
    The RTP timestamp is set to the sampling timestamp of the content. A 90 kHz clock rate MUST be used. If the NAL unit has no timing properties of its own (e.g., parameter set and SEI NAL units), the RTP timestamp is set to the RTP timestamp of the primary coded picture of the access unit in which the NAL unit is included.

    Receivers SHOULD ignore any picture timing SEI messages included in access units that have only one display timestamp. Instead, receivers SHOULD use the RTP timestamp for synchronizing the display process.

    If one access unit has more than one display timestamp carried in a picture timing SEI message, then the information in the SEI message SHOULD be treated as relative to the RTP timestamp, with the earliest event occurring at the time given by the RTP timestamp and subsequent events later, as given by the differences in the SEI message picture timing values. Let tSEI1, tSEI2, ..., tSEIn be the display timestamps carried in the SEI message of an access unit, where tSEI1 is the earliest of all such timestamps. Let tmadjst() be a function that adjusts the SEI message's time scale to a 90-kHz time scale. Let TS be the RTP timestamp. Then, the display time for the event associated with tSEI1 is TS, and the display time for the event with tSEIx, where x is [2..n], is TS + tmadjst(tSEIx - tSEI1).
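The display-time rule above can be sketched as follows, assuming (for illustration) an integer SEI clock rate that the hypothetical tmadjst rescales to 90 kHz:

```python
def display_times(rtp_ts: int, sei_ts: list, sei_clock_rate: int) -> list:
    """Display times in 90 kHz ticks for the events of one access unit:
    the earliest SEI event lands on the RTP timestamp, later events are
    offset by the SEI differences rescaled to 90 kHz (tmadjst)."""
    t0 = min(sei_ts)            # tSEI1, the earliest timestamp

    def tmadjst(t):             # rescale SEI clock ticks to 90 kHz ticks
        return t * 90000 // sei_clock_rate

    return [rtp_ts + tmadjst(t - t0) for t in sei_ts]
```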

  • 76

    NALU Header

    All NAL units consist of a single NAL unit type octet, which also co-serves as the payload header of the RTP payload format. A NAL unit contains an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. The format of NAL units for packet-oriented transport and for the byte stream is identical, except that in the byte stream format each NAL unit can be preceded by a start code prefix (0x000001 or 0x00000001) and extra padding bytes. Note: DVB-H is a packet-oriented system.

    The NAL unit type octet has the following format:

    Bit Position   0   1-2   3-7
                   F   NRI   Type

    The semantics of the components of the NAL unit type octet, as specified in the H.264 specification, are described briefly below.

    F (forbidden_zero_bit): 1 bit. The H.264 specification declares a value of 1 as a syntax violation.
    NRI (nal_ref_idc): 2 bits
    Type (nal_unit_type): 5 bits
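Splitting the NAL unit type octet into these three fields is plain bit masking; a minimal illustrative helper:

```python
def parse_nalu_header(octet: int) -> dict:
    """Split the NAL unit type octet into F / NRI / Type."""
    return {
        "forbidden_zero_bit": octet >> 7,    # F: must be 0
        "nal_ref_idc": (octet >> 5) & 0x3,   # NRI
        "nal_unit_type": octet & 0x1F,       # Type
    }
```

For instance, the first byte 0x65 commonly seen at the start of an IDR slice decodes to F = 0, NRI = 3, Type = 5.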

  • 77

    Byte stream NAL unit syntax and semantics

    byte_stream_nal_unit( NumBytesInNALunit ) {
        while( next_bits( 24 ) != 0x000001 && next_bits( 32 ) != 0x00000001 )
            leading_zero_8bits               /* equal to 0x00 */
        if( next_bits( 24 ) != 0x000001 )
            zero_byte                        /* equal to 0x00 */
        start_code_prefix_one_3bytes         /* equal to 0x000001 */
        nal_unit( NumBytesInNALunit )
        while( more_data_in_byte_stream( ) &&
               next_bits( 24 ) != 0x000001 && next_bits( 32 ) != 0x00000001 )
            trailing_zero_8bits              /* equal to 0x00 */
    }
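A simplified sketch of the byte-stream syntax above: scan for 0x000001 start-code prefixes and strip surrounding zero bytes. Stripping trailing zeros is safe because the last byte of a NAL unit is never 0x00 (this also absorbs the zero_byte of a following 4-byte start code):

```python
def split_byte_stream(stream: bytes) -> list:
    """Split an Annex B byte stream into NAL units (simplified sketch)."""
    nalus = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        start = pos + 3                      # skip start_code_prefix_one_3bytes
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # drop trailing_zero_8bits / the zero_byte of the next start code
        nalus.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return nalus
```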

  • 78

    NALU syntax

    nal_unit( NumBytesInNALunit ) {      // NumBytesInNALunit: size of the NAL unit in bytes
        forbidden_zero_bit               // = 0
        nal_ref_idc                      // 2 bits
        nal_unit_type                    // 5 bits
        NumBytesInRBSP = 0               // initialization
        for( i = 1; i < NumBytesInNALunit; i++ ) {
            if( i + 2 < NumBytesInNALunit && next_bits( 24 ) == 0x000003 ) {
                rbsp_byte[ NumBytesInRBSP++ ]    // 0x00
                rbsp_byte[ NumBytesInRBSP++ ]    // 0x00
                i += 2
                emulation_prevention_three_byte  /* equal to 0x03 */
            } else
                rbsp_byte[ NumBytesInRBSP++ ]
        }
    }

  • 79

    NALU semantics (1/2)

    nal_ref_idc equal to 0 for a NAL unit containing a slice or slice data partition indicates that the slice or slice data partition is part of a non-reference picture; it is not 0 otherwise. A non-reference picture is not used for inter prediction of any other pictures. A reference picture (with nal_ref_idc not equal to 0) is marked as "used for short-term reference" or "used for long-term reference".

    nal_unit_type specifies the type of RBSP data structure contained in the NAL unit. VCL NAL units are specified as those NAL units having nal_unit_type equal to 1 to 5, inclusive. All remaining NAL units are called non-VCL NAL units.

    raw byte sequence payload (RBSP): A syntax structure containing an integer number of bytes that is encapsulated in a NAL unit.

    emulation_prevention_three_byte is a byte equal to 0x03. When an emulation_prevention_three_byte is present in the NAL unit, it shall be discarded by the decoding process. (Only the 0x03 of the sequence 0x000003 is discarded; the two 0x00 bytes are kept.) The emulation prevention byte 0x03 is deliberately inserted by an encoder to ensure that no sequence of consecutive byte-aligned bytes in the NAL unit contains a start code prefix.

  • 80

    NALU semantics (2/2)

    Within the NAL unit, the following three-byte sequences shall not occur at any byte-aligned position:

      0x000000
      0x000001
      0x000002

    Within the NAL unit, any four-byte sequence that starts with 0x000003, other than the following sequences, shall not occur at any byte-aligned position:

      0x00000300
      0x00000301
      0x00000302
      0x00000303

    When nal_unit_type is equal to 0, particular care must be exercised in the design of encoders to avoid the presence of the above-listed three-byte and four-byte patterns at the beginning of the NAL unit syntax structure, as the syntax element emulation_prevention_three_byte cannot be the third byte of a NAL unit.
    The last byte of the NAL unit shall not be equal to 0x00.
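These constraints are what emulation prevention enforces. The escaping and its inverse can be sketched as follows (illustrative helpers, not the normative process itself):

```python
def insert_epb(rbsp: bytes) -> bytes:
    """Escape an RBSP into NALU payload bytes: insert 0x03 whenever two
    zero bytes would otherwise be followed by 0x00, 0x01, 0x02 or 0x03."""
    out, zeros = bytearray(), 0
    for b in rbsp:
        if zeros >= 2 and b <= 3:
            out.append(3)            # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def remove_epb(payload: bytes) -> bytes:
    """Inverse operation: discard the 0x03 of every 0x000003 sequence,
    keeping the two preceding zero bytes."""
    out, zeros = bytearray(), 0
    for b in payload:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue                 # drop the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```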

  • 81

    NAL unit type codes

  • 82

    NALU type code specified in RFC 3984 as RTP Payload

    The payload format defines three different basic payload structures. A receiver can identify the payload structure by the first byte of the RTP payload. This byte is always structured as a NAL unit header. The NAL unit type field indicates which structure is present. The possible structures are as follows:

    Single NAL Unit Packet: Contains only a single NAL unit in the payload. The NAL header type field equals the original NAL unit type, i.e., in the range of 1 to 23, inclusive. Specified in section 5.6 of RFC 3984.

    Aggregation packet: Used to aggregate multiple NAL units into a single RTP payload. Exists in four versions: Single-Time Aggregation Packet type A (STAP-A), Single-Time Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet with 24-bit offset (MTAP24). The NAL unit type numbers assigned to STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and 27, respectively. Specified in section 5.7 of RFC 3984.

    Fragmentation unit: Used to fragment a single NAL unit over multiple RTP packets. Exists in two versions, FU-A and FU-B, identified by the NAL unit type numbers 28 and 29, respectively. Specified in section 5.8 of RFC 3984.

  • 83

    Packetization Modes

    Three packetization modes are defined:

    - Single NAL unit mode
    - Non-interleaved mode
    - Interleaved mode

    The single NAL unit mode is targeted at conversational systems that comply with ITU-T Recommendation H.241. The non-interleaved mode is targeted at conversational systems that may not comply with ITU-T Recommendation H.241; in this mode, NAL units are transmitted in NAL unit decoding order. The interleaved mode is targeted at systems that do not require very low end-to-end latency; it allows transmission of NAL units out of NAL unit decoding order. The packetization mode in use MAY be signaled by the value of the OPTIONAL packetization-mode MIME parameter or by external means, and it governs which NAL unit types are allowed in RTP payloads. Table 3 summarizes the allowed NAL unit types for each packetization mode. Some NAL unit type values are reserved for future extensions.

  • 84

    Summary of allowed NAL unit types for each packetization mode (yes = allowed, no = disallowed, ig = ignore)

    Type   Packet      Single NAL Unit Mode   Non-Interleaved Mode   Interleaved Mode
    0      undefined   ig                     ig                     ig
    1-23   NAL unit    yes                    yes                    no
    24     STAP-A      no                     yes                    no
    25     STAP-B      no                     no                     yes
    26     MTAP16      no                     no                     yes
    27     MTAP24      no                     no                     yes
    28     FU-A        no                     yes                    yes
    29     FU-B        no                     no                     yes
    30-31  undefined   ig                     ig                     ig
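The table can be mirrored as a small lookup. This is a sketch: the undefined types 0 and 30-31 are simply treated as "not allowed" here rather than "ignored":

```python
# Allowed NALU types per packetization mode, mirroring the table above.
ALLOWED = {
    "single":          set(range(1, 24)),                  # 1-23
    "non-interleaved": set(range(1, 24)) | {24, 28},       # + STAP-A, FU-A
    "interleaved":     {25, 26, 27, 28, 29},               # STAP-B, MTAPs, FUs
}

def nalu_allowed(nalu_type: int, mode: str) -> bool:
    """True if this NALU type may appear in RTP payloads of the given mode."""
    return nalu_type in ALLOWED[mode]
```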

  • H.264

  • 86

    Coding Layers in H.264 (1/2)

    1. coded video sequence
    A coded video sequence consists of a series of access units that are sequential in the NAL unit stream and use only one sequence parameter set. Each coded video sequence can be decoded independently of any other coded video sequence, given the necessary parameter set information, which may be conveyed in-band or out-of-band. At the beginning of a coded video sequence is an instantaneous decoding refresh (IDR) picture.

    1.5 access unit (see next page)

    2. picture: field or frame
    IDR picture: A coded picture in which all slices are I or SI slices, and which causes the decoding process to mark all reference pictures as "unused for reference" immediately after decoding the IDR picture. After the decoding of an IDR picture, all following coded pictures in decoding order can be decoded without inter prediction from any picture decoded prior to the IDR picture.

  • 87

    Access Unit

    A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. Some supplemental enhancement information (SEI) containing data such as picture timing information may also precede the primary coded picture.

    Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of data. If the coded picture is the last picture of a coded video sequence, an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to indicate that the stream is ending.

  • 88

    Coding Layers in H.264 (2/2)3. slice

    I slice (intra slice): A slice that is not an SI slice and that is decoded using prediction only from decoded samples within the same slice.
    P slice (predictive slice): A slice that may be decoded using intra prediction from decoded samples within the same slice, or inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block.
    B slice (bi-predictive slice): A slice that may be decoded using intra prediction from decoded samples within the same slice, or inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block.
    SI slice, SP slice (Extended profile)

    4. macroblock

  • 89

    Hierarchical structure of NALU stream

  • 90

    Parameter Sets in H.264

    One very fundamental design concept of H.264 is to generate self-contained packets, making mechanisms such as the header duplication of RFC 2429 or MPEG-4's Header Extension Code (HEC) unnecessary. This was achieved by decoupling information relevant to more than one slice from the media stream. This higher-layer meta information should be sent reliably, asynchronously, and in advance of the RTP packet stream that contains the slice packets. (Provisions for sending this information in-band are also available.) The combination of these higher-level parameters is called a parameter set.

    The H.264 specification includes two types of parameter sets: sequence parameter set and picture parameter set. An active sequence parameter set remains unchanged throughout a coded video sequence, and an active picture parameter set remains unchanged within a coded picture. The sequence and picture parameter set structures contain information such as picture size, optional coding modes employed, and the macroblock to slice group map.

    To be able to change picture parameters (such as the picture size) without having to transmit parameter set updates synchronously with the slice packet stream, the encoder and decoder can maintain a list of more than one sequence and picture parameter set. Each slice header contains a codeword that indicates the sequence and picture parameter set to be used.

  • 91

    Important Definitions

    byte stream: An encapsulation of a NAL unit stream containing start code prefixes and NAL units.

    raw byte sequence payload (RBSP): A syntax structure containing an integer number of bytes that is encapsulated in a NAL unit. An RBSP is either empty or has the form of a string of data bits containing syntax elements followed by an RBSP stop bit and zero or more subsequent bits equal to 0.

    sequence parameter set: A syntax structure containing syntax elements that apply to zero or more entire coded video sequences, as determined by the content of a seq_parameter_set_id syntax element found in the picture parameter set referred to by the pic_parameter_set_id syntax element found in each slice header.

    picture parameter set: A syntax structure containing syntax elements that apply to zero or more entire coded pictures, as determined by the pic_parameter_set_id syntax element found in each slice header.

    quantisation parameter: A variable used by the decoding process for scaling of transform coefficient levels.

  • 92

    Parameter Set use with reliable "out-of-band" parameter set exchange

  • 93

    Sequence parameter set RBSP (1/2)

    seq_parameter_set_rbsp( ) {
        profile_idc              // profile indicator (8 bits). Baseline = 66, Main = 77.
        constraint_set0_flag     // = 1 for Baseline profile constraints.
        constraint_set1_flag     // = 1 for Main profile constraints.
        constraint_set2_flag     // = 1 for Extended profile constraints.
        constraint_set3_flag
        reserved_zero_4bits      /* equal to 0 */
        level_idc                // level indicator (8 bits)
        seq_parameter_set_id     /* identifies the sequence parameter set that is referred to
                                    by the picture parameter set; in the range 0 to 31,
                                    inclusive. */
        if( profile_idc == 100 || profile_idc == 110 ||
            profile_idc == 122 || profile_idc == 144 ) {  // Main profile or above
            ...
        }
        log2_max_frame_num_minus4  // MaxFrameNum = 2^( log2_max_frame_num_minus4 + 4 )

  • 94

    Sequence parameter set RBSP (2/2)

        pic_order_cnt_type       // specifies the method to decode picture order count
        num_ref_frames           // specifies the maximum number of reference frames
        gaps_in_frame_num_value_allowed_flag
        pic_width_in_mbs_minus1
        pic_height_in_map_units_minus1  // map = slice group map
        frame_mbs_only_flag      /* = 1: every coded picture of the coded video sequence is a
                                    coded frame containing only frame macroblocks;
                                    = 0: either fields or frames are possible. */
        if( !frame_mbs_only_flag )
            mb_adaptive_frame_field_flag  /* = 1 specifies the possible use of switching
                                             between frame and field macroblocks within
                                             frames. */
        direct_8x8_inference_flag  // method for derivation of MVs in Direct mode
        frame_cropping_flag      /* = 1 specifies that the frame cropping offset parameters
                                    follow next in the sequence parameter set. */
        ...
    }

  • 95

    Picture parameter set RBSP (1/2)

    pic_parameter_set_rbsp( ) {
        pic_parameter_set_id     /* identifies the picture parameter set that is referred to
                                    in the slice header; in the range 0 to 255, inclusive. */
        seq_parameter_set_id     // refers to the active sequence parameter set
        entropy_coding_mode_flag // = 0: Exp-Golomb and CAVLC; = 1: CABAC
        pic_order_present_flag   // presence of picture order count syntax elements in the
                                 // slice headers
        num_slice_groups_minus1
        if( num_slice_groups_minus1 > 0 ) {
            slice_group_map_type /* specifies how the mapping of slice group map units to
                                    slice groups is coded */
            ...
        }

  • 96

    Picture parameter set RBSP (2/2)

        pic_init_qp_minus26      /* relative to 26 */
        pic_init_qs_minus26      /* relative to 26 */
        chroma_qp_index_offset
        deblocking_filter_control_present_flag
        constrained_intra_pred_flag
        redundant_pic_cnt_present_flag
        if( more_rbsp_data( ) ) {
            transform_8x8_mode_flag          // whether the 8x8 transform is in use
            pic_scaling_matrix_present_flag
            if( pic_scaling_matrix_present_flag )
                ...
            second_chroma_qp_index_offset
        }
        rbsp_trailing_bits( )
    }

  • 97

    Slice layer without partitioning RBSP / Slice header

    slice_layer_without_partitioning_rbsp( ) {  // one possible (simplest) slice RBSP
        slice_header( )
        slice_data( )            /* all categories of slice_data( ) syntax */
        rbsp_slice_trailing_bits( )
    }

    slice_header( ) {
        first_mb_in_slice        // address of the first macroblock in the slice
        slice_type               // see next page
        pic_parameter_set_id     // specifies the picture parameter set in use
        frame_num                /* an identifier for pictures. If the current frame is an
                                    IDR frame, frame_num = 0 and PrevRefFrameNum = 0. */
        if( !frame_mbs_only_flag ) {  // frame_mbs_only_flag appears in the sequence PS
            field_pic_flag       // = 1 specifies that the slice is a slice of a coded field
            if( field_pic_flag )
                bottom_field_flag
        }
        if( nal_unit_type == 5 )
            idr_pic_id           // an IDR picture

  • 98

    Slice header (cont.)

        if( pic_order_cnt_type == 0 ) { ... }
        if( pic_order_cnt_type == 1 && !delta_pic_order_always_zero_flag ) { ... }
        if( redundant_pic_cnt_present_flag )
            redundant_pic_cnt    // = 0 for a primary coded picture;
                                 // > 0 for a redundant coded picture
        if( slice_type == B )
            direct_spatial_mv_pred_flag  // method for deriving Direct MVs:
                                         // spatial or temporal
        if( slice_type == P || slice_type == SP || slice_type == B ) { ... }
        ref_pic_list_reordering( )
        if( ( weighted_pred_flag && ( slice_type == P || slice_type == SP ) ) ||
            ( weighted_bipred_idc == 1 && slice_type == B ) )
            pred_weight_table( )
        if( nal_ref_idc != 0 )
            dec_ref_pic_marking( )
        if( entropy_coding_mode_flag && slice_type != I && slice_type != SI )
            cabac_init_idc       /* specifies the initialisation for context variables */
        slice_qp_delta           // specifies the initial value of QPY for the MBs in the slice
    }

  • 99

    Slice Type

    slice_type   Name of slice_type
    0            P
    1            B
    2            I
    3            SP
    4            SI
    5            P
    6            B
    7            I
    8            SP
    9            SI

    slice_type values in the range 5..9 specify, in addition to the coding type of the current slice, that all other slices of the current coded picture shall have a value of slice_type equal to the current value of slice_type, or equal to the current value of slice_type − 5.

  • 100

    Variables derived in slice headers

    The variable MbaffFrameFlag (Mbaff = macroblock-adaptive frame/field decoding) is derived as:
        MbaffFrameFlag = mb_adaptive_frame_field_flag && !field_pic_flag

    The picture height in units of macroblocks:
        PicHeightInMbs = FrameHeightInMbs / ( 1 + field_pic_flag )

    The picture height for the luma component:
        PicHeightInSamplesL = PicHeightInMbs * 16

    The picture height for the chroma component:
        PicHeightInSamplesC = PicHeightInMbs * MbHeightC

    The variable PicSizeInMbs for the current picture:
        PicSizeInMbs = PicWidthInMbs * PicHeightInMbs

    The variable MaxPicNum:
        - If field_pic_flag is equal to 0, MaxPicNum is set equal to MaxFrameNum.
        - Otherwise (field_pic_flag is equal to 1), MaxPicNum is set equal to 2 * MaxFrameNum.

    The variable CurrPicNum:
        - If field_pic_flag is equal to 0, CurrPicNum is set equal to frame_num.
        - Otherwise (field_pic_flag is equal to 1), CurrPicNum is set equal to 2 * frame_num + 1.
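The derivations above can be collected into one illustrative helper (flags are 0/1 integers, as in the specification):

```python
def derive_slice_vars(frame_height_in_mbs: int, pic_width_in_mbs: int,
                      mb_adaptive_frame_field_flag: int, field_pic_flag: int,
                      frame_num: int, max_frame_num: int):
    """Sketch of the slice-header variable derivations above."""
    mbaff = mb_adaptive_frame_field_flag and not field_pic_flag
    pic_height_in_mbs = frame_height_in_mbs // (1 + field_pic_flag)
    pic_size_in_mbs = pic_width_in_mbs * pic_height_in_mbs
    max_pic_num = 2 * max_frame_num if field_pic_flag else max_frame_num
    curr_pic_num = 2 * frame_num + 1 if field_pic_flag else frame_num
    return mbaff, pic_height_in_mbs, pic_size_in_mbs, max_pic_num, curr_pic_num
```

For example, field coding (field_pic_flag = 1) halves the picture height in macroblocks and doubles the picture-number space.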

  • 101

    Reference Pictures

    reference picture: A picture with nal_ref_idc not equal to 0. A reference picture contains samples that may be used for inter prediction in the decoding process of subsequent pictures in decoding order.

    reference picture list: A list of reference pictures used for inter prediction of a P, B, or SP slice. For the decoding process of a P or SP slice there is one reference picture list; for the decoding process of a B slice there are two.

    reference picture list 0: A reference picture list used for inter prediction of a P, B, or SP slice. All inter prediction used for P and SP slices uses reference picture list 0.

    reference picture list 1: A reference picture list used for inter prediction of a B slice. Reference picture list 1 is one of the two reference picture lists used for inter prediction of a B slice, the other being reference picture list 0.

    direct prediction: An inter prediction for a block for which no motion vector is decoded. Two direct prediction modes are specified, referred to as spatial direct prediction and temporal direct prediction.

  • 102

    Reference picture list reordering syntax and semantics (1/2)

    ref_pic_list_reordering( ) {
        if( slice_type != I && slice_type != SI ) {
            ref_pic_list_reordering_flag_l0  /* = 1 specifies that
                                                reordering_of_pic_nums_idc is present for
                                                specifying reference picture list 0.
                                                Note that l0 = list 0. */
            if( ref_pic_list_reordering_flag_l0 )
                do {
                    reordering_of_pic_nums_idc  /* together with abs_diff_pic_num_minus1 or
                                                   long_term_pic_num specifies which of the
                                                   reference pictures are re-mapped */
                    if( reordering_of_pic_nums_idc == 0 ||
                        reordering_of_pic_nums_idc == 1 )
                        abs_diff_pic_num_minus1
                    else if( reordering_of_pic_nums_idc == 2 )
                        long_term_pic_num
                } while( reordering_of_pic_nums_idc != 3 )
        }

  • 103

    Reference picture list reordering syntax and semantics (2/2)

        if( slice_type == B ) {
            ref_pic_list_reordering_flag_l1  /* = 1 specifies that
                                                reordering_of_pic_nums_idc is present for
                                                specifying list 1. Note that l1 = list 1. */
            if( ref_pic_list_reordering_flag_l1 )
                do {
                    reordering_of_pic_nums_idc
                    if( reordering_of_pic_nums_idc == 0 ||
                        reordering_of_pic_nums_idc == 1 )
                        abs_diff_pic_num_minus1  /* plus 1 specifies the absolute difference
                                                    between the picture number of the picture
                                                    being moved to the current index in the
                                                    list and the picture number prediction
                                                    value */
                    else if( reordering_of_pic_nums_idc == 2 )
                        long_term_pic_num  /* specifies the long-term picture number of the
                                              picture being moved to the current index in the
                                              list */
                } while( reordering_of_pic_nums_idc != 3 )
        }
    }

  • 104

    Reordering_of_pic_nums_idc operations for reordering of reference picture lists

    reordering_of_pic_nums_idc   Reordering specified
    0   abs_diff_pic_num_minus1 is present and corresponds to a difference to subtract from a picture number prediction value
    1   abs_diff_pic_num_minus1 is present and corresponds to a difference to add to a picture number prediction value
    2   long_term_pic_num is present and specifies the long-term picture number for a reference picture
    3   End loop for reordering of the initial reference picture list

  • 105

    Decoding process for reference picture lists construction

    A reference index is an index into a reference picture list. When decoding a P or SP slice, there is a single reference picture list RefPicList0. When decoding a B slice, there is a second independent reference picture list RefPicList1 in addition to RefPicList0. At the beginning of decoding of each slice, reference picture list RefPicList0, and for B slices RefPicList1, are derived by the initialisation process.

    The number of entries in the modified reference picture list RefPicList0 is num_ref_idx_l0_active_minus1 + 1, and for B slices the number of entries in the modified reference picture list RefPicList1 is num_ref_idx_l1_active_minus1 + 1. A reference picture may appear at more than one index in the modified reference picture lists RefPicList0 or RefPicList1.

  • 106

    Decoded reference picture marking syntax

    dec_ref_pic_marking( ) {
        if( nal_unit_type == 5 ) {  // an IDR picture
            no_output_of_prior_pics_flag
            long_term_reference_flag
        } else {
            adaptive_ref_pic_marking_mode_flag
            if( adaptive_ref_pic_marking_mode_flag )
                do {
                    memory_management_control_operation
                    if( memory_management_control_operation == 1 ||
                        memory_management_control_operation == 3 )
                        difference_of_pic_nums_minus1
                    if( memory_management_control_operation == 2 )
                        long_term_pic_num
                    if( memory_management_control_operation == 3 ||
                        memory_management_control_operation == 6 )
                        long_term_frame_idx
                    if( memory_management_control_operation == 4 )
                        max_long_term_frame_idx_plus1
                } while( memory_management_control_operation != 0 )
        }
    }

  • 107

    Memory management control operation

    memory_management_control_operation   Memory Management Control Operation
    0   End memory_management_control_operation syntax element loop
    1   Mark a short-term reference picture as "unused for reference"
    2   Mark a long-term reference picture as "unused for reference"
    3   Mark a short-term reference picture as "used for long-term reference" and assign a long-term frame index to it
    4   Specify the maximum long-term frame index and mark all long-term reference pictures having long-term frame indices greater than the maximum value as "unused for reference"
    5   Mark all reference pictures as "unused for reference" and set the MaxLongTermFrameIdx variable to "no long-term frame indices"
    6   Mark the current picture as "used for long-term reference" and assign a long-term frame index to it

  • 108

    Reference Picture Marking

A reference picture (i.e., a picture with nal_ref_idc != 0) is stored in the Decoded Picture Buffer (DPB) and is marked as "used for short-term reference" or "used for long-term reference". A short-term reference picture is identified by its FrameNum; a long-term reference picture is identified by its LongTermFrameIdx. A reference picture can be used as a reference for inter prediction when decoding a frame until the picture is marked as "unused for reference". A picture can be marked as "unused for reference" either by the sliding window reference picture marking process, a first-in, first-out mechanism, or by a customised adaptive memory control reference picture marking process. For use in the decoding process, a short-term reference picture is identified by its variables FrameNum and FrameNumWrap and its picture number PicNum, and a long-term reference picture is identified by its long-term picture number LongTermPicNum.
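The sliding window (first-in, first-out) behaviour described above can be sketched as follows. This is a simplified Python illustration, not the normative process; the `dpb` list of dicts and its field names are assumptions made for this sketch:

```python
def sliding_window_mark(dpb, max_num_ref_frames):
    """If the reference-picture count has reached the limit, mark the
    oldest short-term picture (smallest FrameNumWrap) as unused."""
    short_term = [p for p in dpb if p["ref"] == "short-term"]
    long_term = [p for p in dpb if p["ref"] == "long-term"]
    if len(short_term) + len(long_term) >= max_num_ref_frames and short_term:
        oldest = min(short_term, key=lambda p: p["frame_num_wrap"])
        oldest["ref"] = "unused"  # first-in, first-out removal

dpb = [{"frame_num_wrap": n, "ref": "short-term"} for n in (150, 151, 152)]
sliding_window_mark(dpb, max_num_ref_frames=3)
# frame 150 (the oldest short-term picture) is now marked "unused"
```

Long-term pictures are deliberately never touched by this process; they can only be removed by an explicit memory management control operation.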

  • 109

    Decoding process for picture numbers

When the current picture is not an IDR picture, the assignment of the variables FrameNum, FrameNumWrap, PicNum and LongTermPicNum is as follows. To each short-term reference picture the variables FrameNum and FrameNumWrap are assigned as follows. First, FrameNum is set equal to the syntax element frame_num that has been decoded in the slice header(s) of the corresponding short-term reference picture. Then the variable FrameNumWrap is derived as

    if( FrameNum > frame_num )
        FrameNumWrap = FrameNum - MaxFrameNum
    else
        FrameNumWrap = FrameNum

Each long-term reference picture has an associated value of LongTermFrameIdx. To each short-term reference picture a variable PicNum is assigned, and to each long-term reference picture a variable LongTermPicNum is assigned.
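The FrameNumWrap derivation can be tried directly. A minimal Python sketch (the function name and the literal MaxFrameNum value below are illustrative only):

```python
def frame_num_wrap(FrameNum, frame_num, MaxFrameNum):
    # A stored FrameNum larger than the current frame_num means the
    # frame_num counter has wrapped around modulo MaxFrameNum since
    # that picture was coded, so shift it into the past.
    if FrameNum > frame_num:
        return FrameNum - MaxFrameNum
    return FrameNum

# With MaxFrameNum = 16 and current frame_num = 2:
frame_num_wrap(15, 2, 16)  # a pre-wrap picture maps to -1
frame_num_wrap(1, 2, 16)   # a recent picture keeps FrameNum = 1
```

The shift keeps short-term pictures ordered by coding recency even across the modulo wrap of frame_num.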

  • 110

    Initialisation process for the reference picture list for P slices

The reference picture list RefPicList0 is ordered so that short-term reference frames have lower indices than long-term reference frames. The short-term reference frames are ordered starting with the frame with the highest PicNum value and proceeding in descending order to the frame (or complementary field pair) with the lowest PicNum value. The long-term reference frames are ordered starting with the frame with the lowest LongTermPicNum value and proceeding in ascending order to the frame with the highest LongTermPicNum value.
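This ordering rule is simply a pair of sorts. A Python sketch (the picture records and key names are assumptions for illustration):

```python
def init_ref_pic_list0(short_term, long_term):
    """Initial RefPicList0 for a P slice: short-term frames by
    descending PicNum, then long-term frames by ascending
    LongTermPicNum."""
    return (sorted(short_term, key=lambda p: p["PicNum"], reverse=True)
            + sorted(long_term, key=lambda p: p["LongTermPicNum"]))

short = [{"PicNum": n} for n in (151, 153, 152)]
longt = [{"LongTermPicNum": n} for n in (3, 1)]
init_ref_pic_list0(short, longt)
# short-term part: PicNum 153, 152, 151; long-term part: 1, 3
```

So the most recently coded pictures, which are usually the best inter-prediction references, get the smallest (cheapest-to-code) reference indices by default.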

  • 111

    Reference picture ordering example (P slice, list0).

Operation                                   list0(0)  list0(1)  list0(2)  list0(3)  list0(4)
Initial state                                   -         -         -         -         -
Encode frame 150                               150        -         -         -         -
Encode 151                                     151       150        -         -         -
Encode 152                                     152       151       150        -         -
Encode 153                                     153       152       151       150        -
Encode 154                                     154       153       152       151       150
Encode 155                                     155       154       153       152       151
Assign 154 to LongTermPicNum 3                 155       153       152       151       (3)
Encode 156 and mark it as LongTermPicNum 1     155       153       152       (1)       (3)
Encode 157                                     157       155       153       (1)       (3)

The reference picture list is initially empty; the current frame_num is 150; the maximum size of the DPB is 5 frames. Values in parentheses indicate a LongTermPicNum.

  • 112

Slice data syntax

    slice_data( ) {
        if( entropy_coding_mode_flag )      // specified in the picture parameter set
            while( !byte_aligned( ) )
                cabac_alignment_one_bit     // stuffed 1s up to byte boundary
        CurrMbAddr = first_mb_in_slice * ( 1 + MbaffFrameFlag )  // specified and derived in slice header
        moreDataFlag = 1
        prevMbSkipped = 0
        do {
            if( slice_type != I && slice_type != SI )
                if( !entropy_coding_mode_flag ) {
                    mb_skip_run             // number of consecutive skipped MBs (P_Skip or B_Skip)
                    prevMbSkipped = ( mb_skip_run > 0 )
                    for( i = 0; i < mb_skip_run; i++ )
                        CurrMbAddr = NextMbAddress( CurrMbAddr )
                    if( mb_skip_run > 0 )
                        moreDataFlag = more_rbsp_data( )
                } else {
                    mb_skip_flag
                    moreDataFlag = !mb_skip_flag
                }

  • 113

Slice data syntax (cont.)

            if( moreDataFlag ) {
                if( MbaffFrameFlag && ( CurrMbAddr % 2 == 0 ||
                    ( CurrMbAddr % 2 == 1 && prevMbSkipped ) ) )
                    mb_field_decoding_flag  // = 1 for a field MB, = 0 for a frame MB
                macroblock_layer( )
            }
            if( !entropy_coding_mode_flag )
                moreDataFlag = more_rbsp_data( )
            else {
                if( slice_type != I && slice_type != SI )
                    prevMbSkipped = mb_skip_flag
                if( MbaffFrameFlag && CurrMbAddr % 2 == 0 )
                    moreDataFlag = 1
                else {
                    end_of_slice_flag   /* = 0: another MB follows in the slice;
                                           = 1: end of the slice, no further macroblock follows */
                    moreDataFlag = !end_of_slice_flag
                }
            }
            CurrMbAddr = NextMbAddress( CurrMbAddr )
        } while( moreDataFlag )
    }

  • 114

Macroblock layer syntax

    macroblock_layer( ) {
        mb_type                             // macroblock type; its semantics depend on the slice type
        if( mb_type == I_PCM ) {
            while( !byte_aligned( ) )
                pcm_alignment_zero_bit      // stuffed 0s up to byte boundary
            for( i = 0; i < 256; i++ )
                pcm_sample_luma[ i ]        // luma sample values
            for( i = 0; i < 2 * MbWidthC * MbHeightC; i++ )
                pcm_sample_chroma[ i ]      // chroma sample values
        } else {                            // mb_type != I_PCM
            noSubMbPartSizeLessThan8x8Flag = 1
            if( mb_type != I_NxN &&
                MbPartPredMode( mb_type, 0 ) != Intra_16x16 &&
                NumMbPart( mb_type ) == 4 ) {
                sub_mb_pred( mb_type )      // mode prediction for sub-macroblocks
                for( mbPartIdx = 0; mbPartIdx < 4; mbPartIdx++ )
                    if( sub_mb_type[ mbPartIdx ] != B_Direct_8x8 ) {
                        if( NumSubMbPart( sub_mb_type[ mbPartIdx ] ) > 1 )
                            noSubMbPartSizeLessThan8x8Flag = 0
                    } else if( !direct_8x8_inference_flag )
                        noSubMbPartSizeLessThan8x8Flag = 0

  • 115

Macroblock layer syntax (cont.)

            } else {
                if( transform_8x8_mode_flag && mb_type == I_NxN )
                    transform_size_8x8_flag // = 1 for 8x8 transform, = 0 (default) for 4x4 transform
                mb_pred( mb_type )          // mode prediction for macroblocks
            }
            if( MbPartPredMode( mb_type, 0 ) != Intra_16x16 ) {
                coded_block_pattern         // which 8x8 blocks may contain non-zero transform coefficient levels
                if( CodedBlockPatternLuma > 0 && transform_8x8_mode_flag &&
                    mb_type != I_NxN && noSubMbPartSizeLessThan8x8Flag &&
                    ( mb_type != B_Direct_16x16 || direct_8x8_inference_flag ) )
                    transform_size_8x8_flag
            }
            if( CodedBlockPatternLuma > 0 || CodedBlockPatternChroma > 0 ||
                MbPartPredMode( mb_type, 0 ) == Intra_16x16 ) {
                mb_qp_delta                 // can change the value of QPY in the macroblock layer
                residual( )
            }
        }
    }

  • 116

mb_type semantics

Tables and semantics are specified for the various macroblock types for I, SI, P, SP, and B slices. Each table presents the value of mb_type, the name of mb_type, the number of macroblock partitions used (given by the NumMbPart( mb_type ) function), the prediction mode of the macroblock when it is not partitioned, or of the first partition (given by the MbPartPredMode( mb_type, 0 ) function), and the prediction mode of the second partition (given by the MbPartPredMode( mb_type, 1 ) function).

One example: macroblock type values 0 to 4 for P and SP slices

mb_type   Name of mb_type   NumMbPart     MbPartPredMode   MbPartPredMode   MbPartWidth   MbPartHeight
                            ( mb_type )   ( mb_type, 0 )   ( mb_type, 1 )   ( mb_type )   ( mb_type )
0         P_L0_16x16        1             Pred_L0          na               16            16
1         P_L0_L0_16x8      2             Pred_L0          Pred_L0          16            8
2         P_L0_L0_8x16      2             Pred_L0          Pred_L0          8             16
3         P_8x8             4             na               na               8             8
4         P_8x8ref0         4             na               na               8             8
inferred  P_Skip            1             Pred_L0          na               16            16
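For illustration, the rows of this table can be mirrored in a small lookup. This is a hypothetical helper, not part of any standard API; the dictionary below covers only the P/SP values 0 to 4 shown above:

```python
# (name, NumMbPart, MbPartWidth, MbPartHeight) per mb_type, P/SP slices
P_MB_TYPES = {
    0: ("P_L0_16x16",   1, 16, 16),
    1: ("P_L0_L0_16x8", 2, 16,  8),
    2: ("P_L0_L0_8x16", 2,  8, 16),
    3: ("P_8x8",        4,  8,  8),
    4: ("P_8x8ref0",    4,  8,  8),
}

def mb_partitions(mb_type):
    """Return the name, partition count, and partition size for a
    P/SP-slice mb_type value from the table above."""
    name, num_part, w, h = P_MB_TYPES[mb_type]
    return name, num_part, (w, h)

mb_partitions(1)  # -> ("P_L0_L0_16x8", 2, (16, 8))
```

Note that mb_type 3 and 4 (P_8x8 variants) defer the prediction-mode decision to the sub-macroblock level, which is why their prediction modes are listed as "na" in the table.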

  • 117

Residual data syntax

    residual( ) {
        if( !entropy_coding_mode_flag )     // CAVLC
            residual_block = residual_block_cavlc
        else                                // CABAC
            residual_block = residual_block_cabac
        if( MbPartPredMode( mb_type, 0 ) == Intra_16x16 )
            residual_block( Intra16x16DCLevel, 16 )
        for( i8x8 = 0; i8x8 < 4; i8x8++ )   /* each luma 8x8 block */
            if( !transform_size_8x8_flag || !entropy_coding_mode_flag )
                for( i4x4 = 0; i4x4 < 4; i4x4++ ) {  /* each 4x4 sub-block of the block */
                    if( CodedBlockPatternLuma & ( 1 << i8x8 ) )
                        ...

  • 118

Residual block CAVLC syntax

    residual_block_cavlc( coeffLevel, maxNumCoeff ) {
        for( i = 0; i < maxNumCoeff; i++ )
            coeffLevel[ i ] = 0
        coeff_token
        if( TotalCoeff( coeff_token ) > 0 ) {
            if( TotalCoeff( coeff_token ) > 10 && TrailingOnes( coeff_token ) < 3 )
                suffixLength = 1
            else
                suffixLength = 0
            for( i = 0; i < TotalCoeff( coeff_token ); i++ )
                if( i < TrailingOnes( coeff_token ) ) {
                    trailing_ones_sign_flag
                    level[ i ] = 1 - 2 * trailing_ones_sign_flag
                } else {
                    level_prefix
                    ...
                }   // end of else and for

  • 119

Residual block CAVLC syntax (cont.)

            if( TotalCoeff( coeff_token ) < maxNumCoeff ) {
                total_zeros
                zerosLeft = total_zeros
            } else
                zerosLeft = 0
            for( i = 0; i < TotalCoeff( coeff_token ) - 1; i++ ) {
                if( zerosLeft > 0 ) {
                    run_before
                    run[ i ] = run_before
                } else
                    run[ i ] = 0
                zerosLeft = zerosLeft - run[ i ]
            }
            run[ TotalCoeff( coeff_token ) - 1 ] = zerosLeft
            coeffNum = -1
            for( i = TotalCoeff( coeff_token ) - 1; i >= 0; i-- ) {
                coeffNum += run[ i ] + 1
                coeffLevel[ coeffNum ] = level[ i ]
            }
        }
    }
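The final loop of residual_block_cavlc, which maps the decoded levels and run_before values back onto coefficient positions in scan order, can be sketched in Python (the function and variable names here are illustrative, not from the standard):

```python
def place_levels(levels, runs, maxNumCoeff):
    """Rebuild coeffLevel from decoded levels and runs, mirroring the
    last loop of residual_block_cavlc.  Index 0 of levels/runs is the
    highest-frequency nonzero coefficient; runs[i] is the number of
    zeros immediately preceding level i in scan order."""
    coeffLevel = [0] * maxNumCoeff
    coeffNum = -1
    for i in range(len(levels) - 1, -1, -1):   # lowest frequency first
        coeffNum += runs[i] + 1
        coeffLevel[coeffNum] = levels[i]
    return coeffLevel

# Five nonzero levels in a 4x4 (16-coefficient) block:
place_levels([1, -1, -2, 6, 7], [2, 1, 0, 0, 0], 16)
# -> [7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because levels are transmitted from the high-frequency end backwards, the decoder walks the arrays in reverse to emit coefficients in forward scan order, inserting runs[i] zeros before each level.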

  • 120

Jae-Beom Lee and Hari Kalva, The VC-1 and H.264 Video Compression Standards for Broadband Video Services, Springer, 2008.
I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, John Wiley & Sons, 2003.
Tutorial Issue on the MPEG-4 Standard, Signal Processing: Image Communication, vol. 15, no. 4, Jan. 2000.
Special Issue on the H.264/AVC Video Coding Standard, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003.
G. J. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard," Proceedings of the IEEE, vol. 93, no. 1, pp. 18-31, Jan. 2005.
D. Marpe, T. Wiegand, and G. J. Sullivan, "The H.264/MPEG4 advanced video coding standard and its applications," IEEE Communications Magazine, vol. 44, no. 8, pp. 134-143, Aug. 2006.
T. Wiegand and G. J. Sullivan, "The H.264/AVC video coding standard," IEEE Signal Processing Magazine, vol. 24, no. 2, pp. 148-153, March 2007.
H.264/AVC (invited paper), 2007.

  • 121

(cont.)

ISO/IEC International Standard 14496-10, Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding, third edition, Dec. 2005; corrected version, March 2006.
H.264/AVC Software Coordination (JM reference software), http://iphome.hhi.de/suehring/tml/
IETF, RFC 3984, RTP Payload Format for H.264 Video, 2005.
IETF, RFC 3640, RTP Payload Format for Transport of MPEG-4 Elementary Streams, 2003.
EBU, Digital Video Broadcasting (DVB); Transmission System for Handheld Terminals (DVB-H), ETSI EN 302 304, v1.1.1, 2004.
EBU, Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in DVB services delivered directly over IP protocols, ETSI TS 102 005, v1.3.1, 2007.
G. Faria, J. A. Henriksson, E. Stare, and P. Talmola, "DVB-H: digital broadcast services to handheld devices," Proceedings of the IEEE, vol. 94, no. 1, pp. 194-209, Jan. 2006.
M. Kornfeld and G. May, "DVB-H and IP datacast - broadcast to handheld devices," IEEE Trans. Broadcasting, vol. 53, no. 1, pp. 161-170, March 2007.


  • 122

    (due 10/28, Wed.)

    H.264 baseline profileMPEG-2 IPPP

    H.264parameter setsMPEG-2 sequence headerpicture header parameter sets

    H.264(transcoding)MPEG-2

    JM H.264
