Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Kazuya YOKOHARI, Koyo NITTA, Mitsuo IKEDA, and Atsushi SHIMIZU
NTT Media Intelligence Laboratories

DAC50, Designer Track, 156-VB543
6/5/2013

Copyright(c) 2013 Nippon Telegraph and Telephone Corporation



Page 2: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Outline

• Introduction

• Proposed Design Methodology

• Case Study: 4K HEVC Intra Codec

• Evaluation

• Conclusion

Page 3: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Video Codec LSI

• MPEG-2 and H.264/AVC are major video coding standards.

• We have developed an MPEG-2 video codec LSI (VASA) and an H.264/AVC codec LSI (SARA).

• Developing a video codec LSI requires many simulations.

[Figure: codec LSI diagram. Labels: Test data; Codec LSI; VASA (MPEG-2); SARA (H.264/AVC); Bit Stream (Coded Image).]

• Coded images should be evaluated both subjectively and objectively.

• Some degradations in coded images are not detected by objective evaluation.

• Real-time subjective evaluation is important for finding these degradations.

Objective evaluation examples: BD-Bitrate, SSIM, PSNR
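As an illustration of one of the objective metrics named above, PSNR can be computed from the mean squared error between the original and coded pixels. This is a minimal sketch, assuming 8-bit samples held in flat Python lists; the function name `psnr` and the sample values are hypothetical and do not come from the slides.

```python
import math


def psnr(original, coded, max_value=255):
    """Return the PSNR in dB between two equal-length sequences of pixel values."""
    if len(original) != len(coded):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, coded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-pixel example (values are illustrative only).
original = [52, 55, 61, 66, 70, 61, 64, 73]
coded = [54, 55, 60, 66, 69, 62, 64, 72]
print(f"PSNR: {psnr(original, coded):.2f} dB")
```

A single number like this averages the error over the whole picture, which is one reason a locally visible degradation can go undetected by objective evaluation.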

Page 4: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Existing LSI Design Flow

[Figure: existing LSI design flow. Steps: Stimulus, Verification, Behavioral Synthesis, Verification, Logic Synthesis, P & R, with Pass/Fail branches after each verification. Inputs: SystemC source codes, Verilog-RTL codes, Verilog-RTL codes (already verified), Technology Library. Targets: ASIC, FPGA, IP core. The existing architecture exploration loop is marked on the flow.]

• In the existing design flow, even behavioral design, the fastest simulation environment, takes 100 times as long as real time to simulate.

• A fast simulation environment is important because video codec LSI design requires many simulations.

Simulation Speed

• Behavioral design: X100 (on CPU)

• RTL design: X1,000 (on CPU), X100 (on emulator)

• Gate-level design: X10,000 (on CPU), X1,000 (on emulator)
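To make these factors concrete, the sketch below converts a slowdown factor into wall-clock simulation time. It assumes the factors are multiples of real time and that they pair with the behavioral, RTL, and gate-level design levels in the order listed on the slide. The 60-second clip length and the function names are hypothetical.

```python
# Slowdown factors from the slide; the pairing with design levels is assumed.
SLOWDOWN = {
    ("behavioral", "CPU"): 100,
    ("RTL", "CPU"): 1_000,
    ("RTL", "emulator"): 100,
    ("gate-level", "CPU"): 10_000,
    ("gate-level", "emulator"): 1_000,
}


def simulation_seconds(video_seconds, slowdown):
    """Wall-clock seconds needed to simulate video_seconds of video."""
    return video_seconds * slowdown


def format_duration(seconds):
    """Format a duration in seconds as hours, minutes, and seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


clip_seconds = 60  # hypothetical test clip length, not from the slides
for (level, host), factor in SLOWDOWN.items():
    needed = simulation_seconds(clip_seconds, factor)
    print(f"{level:>10} on {host:<8}: {format_duration(needed)}")
```

Under these assumptions, even the fastest existing environment needs well over an hour per minute of video, which is impractical when many simulations are required.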

Page 5: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

The Problems of The Video Codec LSI Development

• Developing a video codec LSI requires many simulations.

• In the existing LSI design flow, simulation takes 100 times as long as real time.

• To resolve these problems, we need simulation and circuit design environments that make it easy to check and improve codec LSI performance.

• Simulation environment: FPGA-based platform. Using FPGAs makes real-time simulation possible.

• Circuit design environment: High-level synthesis. Using high-level synthesis makes rapid prototyping possible.

Page 6: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Video Codec Design Platform

• The video codec design platform can run large-scale circuit simulation in real time using many FPGAs.

• The proposed platform inputs and outputs image data in real time through SDI interfaces.

[Figure: platform board. Labels: FPGA1, FPGA2, FPGA3, FPGA4, FPGA (Center), SDI interface.]

• The proposed platform has many FPGAs because a product-level video codec LSI is very large.

• Using many FPGAs, the platform can simulate a product-level circuit.

Page 7: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Proposed Video Codec Design Flow (1/2)

[Figure: proposed design flow, with the same steps, inputs, and targets as the Existing LSI Design Flow slide (Page 4). A proposed architecture exploration loop is added alongside the existing architecture exploration loop. The slide also carries "GOOD" and "NOT GOOD" annotations.]

• The proposed design flow enables rapid prototyping using high-level synthesis.

• The proposed design flow enables real-time simulation using the proposed platform.

Simulation Speed

• Existing architecture exploration loop: X100 (on CPU) for behavioral design; X1,000 (on CPU) or X100 (on emulator) for RTL design; X10,000 (on CPU) or X1,000 (on emulator) for gate-level design

• Proposed architecture exploration loop: X1 (on video codec design platform)

• When a single architecture exploration loop is used, repeating each design step costs feedback time.

Page 8: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Proposed Video Codec Design Flow (2/2)

[Figure: proposed design flow, with the same steps, inputs, and targets as the Existing LSI Design Flow slide (Page 4), showing both the existing and the proposed architecture exploration loops.]

• The circuit design is subdivided and the parts are designed in parallel, to reduce the feedback time caused by repeating each design step.

• With parallel design, architecture exploration proceeds at high speed.

Simulation Speed

• Existing architecture exploration loop: X100 (on CPU) for behavioral design; X1,000 (on CPU) or X100 (on emulator) for RTL design; X10,000 (on CPU) or X1,000 (on emulator) for gate-level design

• Proposed architecture exploration loop: X1 (on video codec design platform)
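The effect of overlapping design loops can be sketched with a toy schedule model. This is only an illustration, not the authors' scheduling method. It assumes each loop has a fixed length and that, in the parallel case, a later loop may start before the earlier one finishes. All durations and offsets are hypothetical and are not taken from the slides.

```python
def sequential_finish(durations):
    """Finish time when each design loop starts only after the previous one ends."""
    return sum(durations)


def overlapped_finish(durations, start_offsets):
    """Finish time when each loop starts at its own offset and loops run concurrently."""
    if len(durations) != len(start_offsets):
        raise ValueError("need one start offset per loop")
    return max(start + length for start, length in zip(start_offsets, durations))


# Hypothetical loop lengths and start offsets, in weeks (illustrative only).
durations = [6, 5, 4]
start_offsets = [0, 2, 4]

print("sequential:", sequential_finish(durations), "weeks")
print("overlapped:", overlapped_finish(durations, start_offsets), "weeks")
```

In this model the saving comes entirely from the overlap, so it depends on how finely the circuit design can be subdivided into units that can progress independently.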

Page 9: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Summary of The Proposed Design Methodology

The proposed parallel design methodology has three features.

1. High-level synthesis.
   – With high-level synthesis, the target circuit architecture can be changed and tuned more easily than with an RTL design methodology.

2. Video codec design platform.
   – With the video codec design platform, subjective image evaluation can be performed, because the platform can simulate in real time.

3. Parallel design.
   – With parallel design and high-level synthesis, functions can be added in smaller units, which reduces feedback time.

Combining these three features, the effect of each function on subjective image quality can be evaluated and used for architecture exploration.

Page 10: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Case Study: 4K HEVC Intra Codec

[Figure: Video Coding block diagram. Input Data feeds Intra Prediction, then Transform and Quantization, then Entropy Coding, which produces the Output Stream.]

• HEVC (High Efficiency Video Coding) is a next-generation video coding standard.

• The HEVC intra codec consists of three blocks: intra prediction, transform and quantization, and entropy coding.

Intra Prediction: generates the prediction difference image from the input data and the predicted image data.

Transform and Quantization: generates quantized values from the transformed difference image, and the reconstruction image from the quantized values.

Entropy Coding: generates the bit stream from the quantized values.
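The data flow between the three blocks can be sketched for a single 4x4 block. This is a simplified illustration, not the HEVC algorithm or the authors' circuit. It assumes DC prediction only, a 4x4 Hadamard transform standing in for the HEVC integer transform, a uniform quantizer, and signed Exp-Golomb codes standing in for HEVC's CABAC entropy coder. All function names and sample values are hypothetical.

```python
# 4x4 Hadamard matrix (symmetric, H4 * H4 = 4 * I); a stand-in transform.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def predict_dc(top, left):
    """Intra prediction stand-in: fill the block with the rounded mean of the neighbours."""
    dc = (sum(top) + sum(left) + 4) // 8
    return [[dc] * 4 for _ in range(4)]


def transform_and_quantize(residual, step):
    """Return quantized levels and the residual reconstructed from those levels."""
    coeffs = matmul(matmul(H4, residual), H4)
    levels = [[round(c / step) for c in row] for row in coeffs]
    dequantized = [[lv * step for lv in row] for row in levels]
    # Inverse transform: H4 * D * H4 / 16, because H4 * H4 = 4 * I.
    inverse = matmul(matmul(H4, dequantized), H4)
    recon_residual = [[round(v / 16) for v in row] for row in inverse]
    return levels, recon_residual


def exp_golomb_signed(value):
    """Signed order-0 Exp-Golomb code as a bit string (stand-in for CABAC)."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    bits = bin(code_num + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def encode_block(block, top, left, step):
    """Run prediction, transform/quantization, and entropy coding on one block."""
    pred = predict_dc(top, left)
    residual = [[block[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    levels, recon_residual = transform_and_quantize(residual, step)
    recon = [[pred[i][j] + recon_residual[i][j] for j in range(4)] for i in range(4)]
    bits = "".join(exp_golomb_signed(lv) for row in levels for lv in row)
    return bits, recon


# Hypothetical input block and neighbouring pixels.
block = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
top = [50, 54, 60, 64]
left = [51, 62, 61, 63]
bits, recon = encode_block(block, top, left, step=8)
print(len(bits), "bits; first reconstructed row:", recon[0])
```

The reconstruction is computed inside the encoder because later blocks are predicted from reconstructed pixels rather than from the original input, which is why the transform and quantization block outputs both quantized values and a reconstruction image.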

Page 11: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

The Specifications of the HEVC Intra Codec

* CU stands for Coding Unit.
* PU stands for Prediction Unit.
* TU stands for Transform Unit.
* HM is the reference software of HEVC.

[Figure: Prediction Mode. 0: Planar, 1: DC; direction labels 2, 10, 18, 26, 34.]

Specification items by block (slide table columns: STEP1, STEP2 (LOOP#1), STEP2 (LOOP#2), STEP2 (LOOP#3))

Intra Prediction

• PU: 32x32

• Prediction Mode: 4

• Prediction Mode: 7

• PU: 64x64, 16x16

Transform and Quantization

• TU: 32x32

• TU: 16x16

Entropy Coding

• CU: 32x32

• CU: 64x64

Base Algorithm

• HM3.0

• HM7.0

This slide’s scope.
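The mode numbering in the Prediction Mode figure can be expressed as a small lookup. This sketch assumes the numbering shown on the slide (0: Planar, 1: DC, directional modes 2 to 34). The horizontal and vertical labels for modes 10 and 26 follow the HEVC standard's numbering and are an addition here, and the function name is hypothetical.

```python
def intra_mode_name(mode):
    """Map an intra prediction mode number (0 to 34) to a descriptive name."""
    if not 0 <= mode <= 34:
        raise ValueError("intra prediction mode must be in the range 0..34")
    if mode == 0:
        return "Planar"
    if mode == 1:
        return "DC"
    # Modes 2..34 are directional; two of them are the pure axis directions.
    special = {10: "Angular (horizontal)", 26: "Angular (vertical)"}
    return special.get(mode, "Angular")


for mode in (0, 1, 2, 10, 18, 26, 34):
    print(mode, intra_mode_name(mode))
```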

Page 12: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Evaluation (1/2)

[Figure: Circuits Performances and Design Period. Charts of Area and Cycle against Design Period (Month) for the IPD, TQ, and EC blocks, covering STEP1 and STEP2 (LOOP#1, LOOP#2, LOOP#3). The Subjective Evaluation Periods and the point where feedback data is available are marked.]

The main changes in each loop:

• LOOP#1: Version upgrade of the base algorithm of each block

• LOOP#2: Functional expansion of IPD

• LOOP#3: Functional expansion of each block

• The circuit performance of each expanded function is evaluated at STEP2.

• Feedback data from other design loops is available at STEP2.

Page 13: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Evaluation (2/2)

• Using the proposed parallel design methodology, we were able to try three design loops in only seven months.

• Using the proposed parallel design methodology, the cycle*area product was reduced to 1/5 in the four months after the preliminary design of LOOP#1, and to 1/4 in the three months after the preliminary design of LOOP#2.

[Figure: Cycle*Area against Design Period (Month), months 1 to 16, for STEP1, STEP2 (LOOP#1), STEP2 (LOOP#2), and STEP2 (LOOP#3). Annotations: 90% down; LOOP#1: 80% down (four months); LOOP#2: 75% down (three months).]
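The "1/5" and "1/4" figures in the bullets and the "80% down" and "75% down" annotations on the chart describe the same reductions. The arithmetic can be checked with a short sketch. The cycle and area numbers in the example are hypothetical and chosen only to produce a 1/5 ratio, and the function names are illustrative.

```python
from fractions import Fraction


def cycle_area(cycles, area):
    """Cost metric used on the slide: processing cycles multiplied by circuit area."""
    return cycles * area


def percent_down(remaining_fraction):
    """Convert 'reduced to this fraction' into a percentage reduction."""
    return float((1 - Fraction(remaining_fraction)) * 100)


print(percent_down(Fraction(1, 5)))  # reduction reported for LOOP#1
print(percent_down(Fraction(1, 4)))  # reduction reported for LOOP#2

# Hypothetical before/after values in arbitrary units (not from the slides).
before = cycle_area(3000, 10)
after = cycle_area(1200, 5)
print(percent_down(Fraction(after, before)))
```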

Page 14: Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform

Conclusion

• We proposed a new design methodology for video codec LSI. With it, we can reduce feedback time, run simulations in real time, and evaluate coded images in real time.

• Using the proposed design methodology, we were able to try three design loops in only seven months.

• Using the proposed design methodology, the cycle*area product was reduced to 1/5 in the four months after the preliminary design of LOOP#1, and to 1/4 in the three months after the preliminary design of LOOP#2.

• To realize an HEVC codec, we need to add or expand some functional tools while checking each tool by subjective evaluation.