Using The New Flash Stage3D Web Technology To Build Your Own Next 3D Browser MMOG

Daosheng Mu, Lead Programmer

Eric Chang, CTO

XPEC Entertainment Inc.

Outline

• Brief of Speakers
• Introduction of Adobe Flash Stage3D API
• XPEC Flash 3D Engine
• Optimization for Flash Program
• Future Works
• Conclusion
• Q & A

Brief of Speakers

• Eric Chang
  – 19 years of game industry experience
  – Cross-platform 3D game engine development
  – PC/Console/Web

Brief of Speakers

• Daosheng Mu
  – 4.5 years of cross-platform 3D game engine development experience
  – PC/Console/Web

Why Flash? Native C/C++ vs. Unity vs. Flash

                          Native C/C++   Unity   Flash
Development difficulty    High           Low     Mid
Ease of cross-platform    Low            High    High
Performance               High           Mid     Low
Market popularity         Low            Mid     High (>95%)

Project C4 Demo Video

Introduction of Adobe Flash Stage3D API

Stage3D

• Supported in all browsers

Stage3D

• Stage3D includes GPU-accelerated 3D APIs
  – Z-buffering
  – Stencil/color buffer
  – Vertex shaders
  – Fragment shaders
  – Cube textures
  – More…

Stage3D

• Pros:
  – GPU-accelerated API
  – Relies on DirectX, OpenGL, OpenGL ES
  – Programmable pipeline

• Cons:
  – No support for alpha test
  – No support for high-precision texture formats

Stage3D

Resource           Number allowed   Total memory
Vertex buffers     4096             256 MB
Index buffers      4096             128 MB
Programs           4096             16 MB
Textures           4096             128 MB*
Cube textures      4096             256 MB
Draw call limits   32,768

* 350 MB is the absolute limit for textures; 340 MB is the result we measured.

AGAL

• Adobe Graphics Assembly Language
  – No support for 'if-else' statements
  – No support for 'constants'
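
Without 'if-else', a conditional in AGAL is usually expressed arithmetically: a comparison opcode such as sge (set if greater or equal) yields 1.0 or 0.0, and that mask blends the two candidate results. The Python sketch below only emulates the idea on scalars. The helper names sge and select and the sample values are illustrative assumptions, not Stage3D API calls; in a real shader the threshold would also have to be uploaded as a program constant, since AGAL has no literal constants.

```python
def sge(a: float, b: float) -> float:
    """Emulates AGAL's sge opcode on scalars: 1.0 if a >= b, else 0.0."""
    return 1.0 if a >= b else 0.0


def select(mask: float, when_true: float, when_false: float) -> float:
    """Branch-free choice: mask * when_true + (1 - mask) * when_false."""
    return mask * when_true + (1.0 - mask) * when_false


# Equivalent of: color = lit if n_dot_l >= 0.5 else shadow
lit, shadow = 1.0, 0.3
bright = select(sge(0.8, 0.5), lit, shadow)
dark = select(sge(0.2, 0.5), lit, shadow)
```

Both candidate values are always computed, which is the cost of this pattern compared with a real branch.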

Program3D

XPEC Flash 3D Engine

Model Pipeline

• Action Message Format (AMF):
  – Native ByteArray compression
  – Native object serialization

[Pipeline diagram, labels: 3DS Max, Exporter, Collada, Converter, Binary AMF, Engine Loader, Engine Render]

XPEC Flash 3D Engine

• Application: update/render on the CPU
• Command buffer: stores graphics API instructions

[Diagram labels: Application, Driver, CPU]

XPEC Flash 3D Engine: Application

• Object3D
  – Material
  – Geometry
• Update
  – UpdateDeltaTime
  – UpdateTransform
• Scene management
  – Scene partition
  – Frustum culling
• Update
  – UpdateHierarchy
• Draw
  – SetMaterial
  – SetGeometry
• Stage3D
  – Set Stage3D APIs

Scene Management

• Goal: minimize draw calls as much as possible
• Indoor scene
  – BSP tree
• Outdoor scene
  – Octree/Quadtree
  – Cell
  – Grid

Scene Management: Project C4

• Grid partition
• Object3D: (MinX, MaxX), (MinY, MaxY)

[Figure: x/y grid from (0, 0) to (4, 4); two objects labeled with the cell ranges (0,0),(1,2) and (3,4),(0,2)]

Scene Management: Project C4

• Frustum: (MinX, MaxX), (MinY, MaxY)

[Figure: the same x/y grid from (0, 0) to (4, 4); frustum labeled with the cell range (1,4),(0,4), objects labeled (0,0),(1,2) and (3,4),(0,2)]
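
With objects and the frustum both reduced to (MinX, MaxX), (MinY, MaxY) cell ranges, culling becomes a range-overlap test per axis. The following Python sketch shows that test using the numbers from the figures; the class and function names, the object names "a" and "b", and the reading of each label as inclusive cell ranges are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellRange:
    """Inclusive grid-cell extent: (min_x, max_x), (min_y, max_y)."""
    min_x: int
    max_x: int
    min_y: int
    max_y: int

    def overlaps(self, other: "CellRange") -> bool:
        # Two ranges overlap only if they overlap on both axes.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


def visible(objects: dict, frustum: CellRange) -> list:
    """Return the names of objects whose cell range touches the frustum's."""
    return [name for name, cells in objects.items() if cells.overlaps(frustum)]


# Values read off the slide figures; the object names are hypothetical.
objects = {
    "a": CellRange(0, 0, 1, 2),   # (0,0),(1,2)
    "b": CellRange(3, 4, 0, 2),   # (3,4),(0,2)
}
frustum = CellRange(1, 4, 0, 4)   # (1,4),(0,4)
result = visible(objects, frustum)
```

Under these assumptions only object "b" survives culling, because the frustum starts at column 1 and object "a" lies entirely in column 0.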

XPEC Flash 3D Engine: Command Buffer

Initialize
• createVertex/Index Buffer
• createTexture
• createProgram

Begin
• clear
• setRenderToTexture

Draw
• setVertex/Index Buffer
• setProgram
• setProgramConstants
• setRenderState
• setTextureAt
• drawTriangles

• Avoid user/kernel mode transitions
• Decrease shader patching
  – "Material sorting"
• Reduce draw calls
  – "Shared buffers"
  – "Dynamic batching"

Material Sorting

• Opaque/Translucent

Material Sorting

• State management
• 1047/2598 draw calls

[Charts: elapsed time (ms), split into "CPU waiting GPU" and "Render loop"]

NVIDIA 8800 GT                 Before sorting (ms)   After sorting (ms)
1047 draw calls
  Render loop elapsed time     16                    16
  Total elapsed time           41                    40
2598 draw calls
  Render loop elapsed time     36                    36
  Total elapsed time           50                    50

NVIDIA 6600 GT                 Before sorting (ms)   After sorting (ms)
1047 draw calls
  Render loop elapsed time     34                    31
  Total elapsed time           53                    48
2598 draw calls
  Render loop elapsed time     81                    64
  Total elapsed time           89                    89
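
The slides name the opaque/translucent split and state management but do not show the sort itself. A minimal sketch of one common approach, assuming opaque items are grouped by a material key and translucent items are drawn back to front; the DrawItem fields and the sample scene are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrawItem:
    name: str
    material_id: int      # hypothetical key identifying program + render state
    translucent: bool
    depth: float          # distance from the camera


def sort_for_rendering(items):
    """Opaque items first, grouped by material; translucent items back to front."""
    opaque = sorted((i for i in items if not i.translucent),
                    key=lambda i: i.material_id)
    translucent = sorted((i for i in items if i.translucent),
                         key=lambda i: -i.depth)
    return opaque + translucent


def count_material_switches(items):
    """Number of times the bound material changes while walking the list."""
    switches, current = 0, None
    for item in items:
        if item.material_id != current:
            switches += 1
            current = item.material_id
    return switches


items = [
    DrawItem("rock1", 1, False, 5.0),
    DrawItem("tree1", 2, False, 3.0),
    DrawItem("rock2", 1, False, 8.0),
    DrawItem("glass", 3, True, 2.0),
    DrawItem("tree2", 2, False, 6.0),
    DrawItem("smoke", 4, True, 9.0),
]
before = count_material_switches(items)
ordered = sort_for_rendering(items)
after = count_material_switches(ordered)
```

In this toy scene the number of material switches drops from 6 to 4; how much that helps in practice depends on the GPU and driver, as the measurements above suggest.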

Shared Buffers

• Problem:
  – The number of buffers is limited

Resource           Number allowed   Total memory
Vertex buffers     4096             256 MB
Index buffers      4096             128 MB
Programs           4096             16 MB

Shared Buffers

[Diagram: three Vertex Buffer / Index Buffer pairs]
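
One way to stay under the buffer-count limit is to pack several meshes into a single vertex buffer and a single index buffer, then draw each mesh by its index range. The sketch below models the bookkeeping with plain Python lists; the SharedBuffer class and its fields are assumptions, not the engine's actual data structures.

```python
class SharedBuffer:
    """Packs many meshes into one vertex list and one index list."""

    def __init__(self):
        self.vertices = []   # one entry per vertex (e.g. a position tuple)
        self.indices = []    # indices into self.vertices
        self.ranges = {}     # mesh name -> (first_index, index_count)

    def add_mesh(self, name, vertices, indices):
        base = len(self.vertices)          # where this mesh's vertices start
        first_index = len(self.indices)
        self.vertices.extend(vertices)
        # Re-base the mesh-local indices so they point into the shared list.
        self.indices.extend(base + i for i in indices)
        self.ranges[name] = (first_index, len(indices))
        return self.ranges[name]


shared = SharedBuffer()
tri = shared.add_mesh("tri", [(0, 0), (1, 0), (0, 1)], [0, 1, 2])
quad = shared.add_mesh("quad", [(0, 0), (1, 0), (1, 1), (0, 1)],
                       [0, 1, 2, 0, 2, 3])
```

The stored (first_index, index_count) pair maps naturally onto the firstIndex and numTriangles arguments of Stage3D's drawTriangles, with numTriangles being index_count / 3.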

Particle System

• Each particle property is computed on the CPU every frame
  – Alpha, Color, LinearForce, Size, Speed, UV
  – Facing

Particle System

• Index buffer
  – Indices do not change
• Vertex buffer
  – Problem:
    • Particle count varies from frame to frame
    • Data is uploaded to the vertex buffer frequently

Particle System

[Diagram labels: Static Index Buffer, Dynamic Vertex Buffer, Vertex Data]
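
Because the indices never change, the index buffer can be generated once for the maximum particle count and only the vertex data re-uploaded each frame. A minimal sketch, assuming each particle is a quad of four vertices drawn as two triangles (the slides do not state the particle geometry); the function names are illustrative.

```python
def quad_indices(max_particles: int) -> list:
    """Two triangles per quad, (0,1,2) and (0,2,3), offset by 4 per particle."""
    indices = []
    for p in range(max_particles):
        base = 4 * p
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return indices


def triangles_to_draw(live_particles: int) -> int:
    """Only the quads of live particles are drawn in a given frame."""
    return 2 * live_particles


static_indices = quad_indices(3)
```

Each frame, only the first triangles_to_draw(n) triangles of the static index buffer are submitted, so a changing particle count never requires touching the indices.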

Skinned Model

• Problem:
  – Fewer vertex constants allowed
    • 128 constants per vertex program
  – Global vertex constants
    • Lighting, Fog, Const

Skinned Model

• 4x3 matrix
• Bone count per geometry is limited to 29
  – "Split mesh"

128 constants / 3 = 42.67 bones
3 * 29 bones = 87 constants
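
The arithmetic above can be checked directly: a 4x3 matrix takes three 4-component constants, so 29 bones use 87 of the 128 constants and leave 41 for the global ones. The helper names below are illustrative, and min_submeshes is only a lower bound, since a real "split mesh" step must also keep each triangle's bones together.

```python
TOTAL_VERTEX_CONSTANTS = 128   # per vertex program, from the slide
REGISTERS_PER_BONE = 3         # one 4x3 matrix = three 4-component constants


def max_bones(reserved_for_globals: int = 0) -> int:
    """Whole bones that fit after setting aside the global constants."""
    return (TOTAL_VERTEX_CONSTANTS - reserved_for_globals) // REGISTERS_PER_BONE


def min_submeshes(bone_count: int, limit: int = 29) -> int:
    """Lower bound on the pieces a mesh must be split into (ceiling division)."""
    return -(-bone_count // limit)


bone_constants = REGISTERS_PER_BONE * 29
left_for_globals = TOTAL_VERTEX_CONSTANTS - bone_constants
```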

Shadow Map


[Diagram: per-frame loop with the steps clear() (clear back buffer), setRenderToTexture() (set shadow map), clear shadow map, draw to shadow map, setRenderToBackBuffer(), present() (end frame)]

Shadow Map

• Problem:
  – Texture format: RGBA8
  – Artifacts
    • Aliasing
    • Popping while moving
• Size: 1024x1024
• RGBA8 R32
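
Since only RGBA8 is available, a common workaround is to spread one depth value across the four 8-bit channels. The slides do not say whether the engine does this, so the sketch below is an assumption: it emulates a widely used shader packing scheme in Python, with pack_depth and unpack_depth as illustrative names.

```python
import math

SCALES = (1.0, 255.0, 255.0 ** 2, 255.0 ** 3)


def _frac(x: float) -> float:
    """Fractional part, like the frc opcode."""
    return x - math.floor(x)


def pack_depth(depth: float) -> tuple:
    """Spread a depth in [0, 1) over four 8-bit channels (R, G, B, A)."""
    enc = [_frac(depth * s) for s in SCALES]
    # Remove from each channel the part that the next channel already stores.
    for i in range(3):
        enc[i] -= enc[i + 1] / 255.0
    # Quantize to bytes, as writing to an RGBA8 target would.
    return tuple(min(255, max(0, round(c * 255.0))) for c in enc)


def unpack_depth(rgba: tuple) -> float:
    """Rebuild the depth as a weighted sum of the four channels."""
    return sum((byte / 255.0) / s for byte, s in zip(rgba, SCALES))


restored = unpack_depth(pack_depth(0.25))
```

In a fragment shader the same steps would be a multiply, a frc, a subtract on write, and a single dp4 on read.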

Shadow Map


• Percentage Closer Filtering (PCF) solution:
  – Hard shadow
  – Aliasing
  – Popping while moving

Shadow Map

• PCF

pw = 1/mapWidth
ph = 1/mapHeight

• Result = 0.5 * texel(0, 0)
         + 0.125 * texel(-pw, +ph) + 0.125 * texel(-pw, -ph)
         + 0.125 * texel(+pw, +ph) + 0.125 * texel(+pw, -ph)

[Diagram: sample positions, (0, 0) at the centre with (-pw, +ph), (+pw, +ph), (-pw, -ph), (+pw, -ph) at the corners]
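
The weighted sum above translates directly into a five-tap filter. In this Python sketch, shadow_test stands in for texel(): it is assumed to return 1.0 for a lit sample and 0.0 for a shadowed one, which the slides do not spell out.

```python
def pcf(shadow_test, u, v, map_width, map_height):
    """Weighted 5-tap filter: 0.5 for the centre, 0.125 for each diagonal tap."""
    pw = 1.0 / map_width
    ph = 1.0 / map_height
    taps = [
        (0.5, 0.0, 0.0),
        (0.125, -pw, +ph),
        (0.125, -pw, -ph),
        (0.125, +pw, +ph),
        (0.125, +pw, -ph),
    ]
    return sum(w * shadow_test(u + du, v + dv) for w, du, dv in taps)


# Hypothetical shadow edge at u = 0.5: everything to the left is in shadow.
edge = lambda u, v: 1.0 if u >= 0.5 else 0.0
on_edge = pcf(edge, 0.5, 0.5, 1024, 1024)
fully_lit = pcf(edge, 0.75, 0.5, 1024, 1024)
```

A sample exactly on the edge comes out at 0.75 instead of a hard 0 or 1, which is the softening that reduces aliasing.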

Shadow Map

• PCF-based solution:

[Chart: elapsed time (ms), split into "CPU waiting GPU" and "Render loop", for NVIDIA 6600GT and NVIDIA 8800GT at 1047 draw calls, each with and without PCF]

Toon Shading

• Single pass
  – Problem: dependent on the number of faces
• Two passes
  – Scale the vertex position along the vertex normal
  – Not dependent on the number of faces

[Diagram: $\vec{v}$: view vector; $\vec{N}$: vertex normal; angle $\theta$. If $\theta > \text{threshold}$, draw toon color]

Toon Shading

• First pass:
  – Enable back face
  – Scale vertex position
  – Draw color
• Second pass:
  – Enable front face
  – Draw material

[Figure: result, Toon vs. General]
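
The "scale vertex position" step of the two-pass outline can be sketched as moving every vertex outward along its normal before drawing the back faces in the outline color. The function name and the scale value are assumptions; in the engine this would run in the vertex shader rather than on the CPU.

```python
def extrude_along_normals(positions, normals, scale):
    """Outline pass: push each vertex outward along its (unit) normal."""
    return [
        tuple(p + scale * n for p, n in zip(pos, nrm))
        for pos, nrm in zip(positions, normals)
    ]


positions = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
normals = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
outline = extrude_along_normals(positions, normals, 0.5)
```

Because the work is per vertex, the cost of the outline does not depend on detecting silhouette faces, which matches the "not dependent on the number of faces" point.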

Alpha Test

• Problem:
  – Stage3D has no alpha test
  – "kil opcode in AGAL"
    • Performance penalty on mobile devices
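
AGAL's kil opcode discards the fragment when its scalar operand is negative, so an alpha test can be emulated by subtracting a threshold from the alpha value first. The Python functions below only model that logic; the names and the threshold are illustrative.

```python
def kil(value: float) -> bool:
    """Emulates AGAL's kil: the fragment is discarded when value < 0."""
    return value < 0.0


def alpha_test(alpha: float, threshold: float) -> bool:
    """Return True if the fragment survives (alpha >= threshold)."""
    # In AGAL this would be a 'sub' into a temporary followed by 'kil'.
    return not kil(alpha - threshold)


kept = alpha_test(0.8, 0.5)
dropped = alpha_test(0.2, 0.5)
```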

Alpha Test

• Solution: replace alpha-test with alpha-blend

                       Render loop time (ms)   Total time (ms)
6600GT alpha test      17~19                   47
6600GT alpha blend     18~19                   65~67
8800GT alpha test      0.16                    37
8800GT alpha blend     0.3                     36

• 304 draw calls
• Alpha-test performance is better on desktop

Post Effect

[Figures: Origin, Glow, DOF, Color Filter]

Static Lightmap

• Pros:
  – Pre-computation
  – Global illumination

• Cons:
  – More textures

Optimization for Flash Program


• Problem:
  – "for each" is slow
    • "Replace it with a for-loop"
  – Memory management
    • "Recycle manager"
    • "Strengthen garbage collection"

Optimization for Flash Program

• Solution:
  – Recycle manager
    • Reduces garbage collection load
    • Saves object initialization time
    • public function recycleObject3D( obj:IObject3D ):void
    • public function requestObject3D( classType:int, searchKey:*, renderHandle:int = 0 ):*
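
The two signatures suggest an object pool: released objects are kept and handed back on the next matching request instead of being garbage collected and rebuilt. The Python sketch below illustrates that pattern only. It keys the pool on a class plus a search key rather than the int classType and renderHandle of the original, and the Mesh class is a hypothetical stand-in for an IObject3D implementation.

```python
from collections import defaultdict


class RecycleManager:
    """Keeps released objects in per-key free lists so they can be reused."""

    def __init__(self):
        self._free = defaultdict(list)   # (class, search_key) -> idle objects
        self.created = 0                 # how many objects were newly constructed

    def request_object3d(self, cls, search_key):
        pool = self._free[(cls, search_key)]
        if pool:
            return pool.pop()            # reuse: no allocation, no re-initialization
        self.created += 1
        obj = cls(search_key)
        obj.pool_key = (cls, search_key)
        return obj

    def recycle_object3d(self, obj):
        self._free[obj.pool_key].append(obj)


class Mesh:
    """Stand-in for an engine object; the real IObject3D is not shown in the slides."""

    def __init__(self, asset_name):
        self.asset_name = asset_name


manager = RecycleManager()
first = manager.request_object3d(Mesh, "tree")
manager.recycle_object3d(first)
second = manager.request_object3d(Mesh, "tree")
```

Here the second request returns the very object that was recycled, so only one Mesh is ever constructed for the "tree" key.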

Optimization for Flash Program

• Solution:
  – Strengthen garbage collection
    • Avoid inner functions
    • Force dereferencing of function pointers
    • Dereference attributes in the object destructor

[Figures: "Without inner function" vs. "Use inner function"]

Optimization for Flash Program

• Experiment: before vs. after
  – Switching among levels

[Figures: "Before improvement" vs. "After improvement"]

Rapid loading


• Streaming
  – Data compression
    • PNG: swf compression: 20%~55%
    • Package: zip compression: 25%~30%
  – Batch loading
    • Separate resources into several packages
    • Download only what you really need

Rapid loading

At 5 Mb/s:

                       Enter avatar stage   Enter game stage   After loading picture finished
Elapsed time (sec)     15                   6                  12

• game code
• ui
• game scene
• scene textures

Future Works

• Adobe Texture Format (ATF)
  – Support for compressed/mipmapped textures across different GPU chipsets
• FlasCC
  – C++ to AS3 compilation
• AS3 Workers
  – Multi-thread support
• MovieClip
  – Replace with a Stage3D UI framework, e.g. Starling

Conclusion

• Cross-Device/Cross-OS/Cross-Browser
  – Browser + Cloud Computing
  – Write Once, Run Anywhere
• Flash vs. HTML5
• Cross-Compiling Technology Trend
  – C/C++ + Flash/ActionScript
  – C/C++ + HTML5/JavaScript

Acknowledgements

• XPEC - Project C4 Team
• XPEC - RDO Team

Q & A

Ellison_Mu@xpec.com
Eric_Chang@xpec.com