

AGH University of Science and Technology

Faculty of Electrical Engineering, Automatics, Computer Science and Electronics

Institute of Computer Science

Dissertation for the degree of Doctor of Philosophy in Computer Science

Design and Implementation

of Parallel Light Transport Algorithms

based on quasi-Monte Carlo Ray Tracing

mgr inż. Michał Radziszewski

supervisor: dr hab. inż. Krzysztof Boryczko, prof. n. AGH

Kraków, June 2010


Abstract

Photorealistic rendering is a part of computer graphics which concentrates on creating images and animations based on 3D models. Its goal is the creation of pictures that are indistinguishable from real world scenes. This work is dedicated to a particular class of photorealistic rendering algorithms – quasi-Monte Carlo ray tracing based global illumination. Global illumination is a very useful concept for creating realistically lit images of artificial 3D scenes. Through automatic and correct computation of a vast diversity of optical phenomena, it enables the creation of rendering software which allows specification of what is to be rendered instead of a detailed description of how to render a given scene. Its current applications range from many CAD systems to special effects in movies. In the future, when computers become sufficiently powerful, real time global illumination may be the best choice for computer games and virtual reality.

Currently, only Monte Carlo and quasi-Monte Carlo ray tracing based algorithms are general enough to support full global illumination. Unfortunately, they are very slow compared to other techniques, e.g. hardware accelerated rasterization. The main purpose of this thesis is the improvement of the efficiency of physically correct rendering. The thesis concentrates on the enhancement of robustness of rendering algorithms, as well as their parallel realization. These two elements together can substantially increase global illumination applicability, and are a step towards the ultimate goal of being able to run true global illumination in real time.


Acknowledgements

I am deeply indebted to my supervisor, Prof. Krzysztof Boryczko. Without his inspiration, advice and encouragement this work would not have been completed. I would like to sincerely thank Dr Witold Alda from the AGH University of Science and Technology. During work on this dissertation we spent many hours in conversation, and the expertise I gathered working with him contributed substantially to the dissertation.

I would like to acknowledge the financial support of the AGH University of Science and Technology – two one-year scholarships for PhD students. Finally, I would also like to acknowledge ZPORR (Zintegrowany Program Operacyjny Rozwoju Regionalnego) for the scholarship "Małopolskie Stypendium Doktoranckie", co-funded by the European Union.


Contents

Abstract

Acknowledgements

1 Introduction
  1.1 Global Illumination and its Applications
  1.2 Thesis Purpose and Original Contributions
  1.3 Thesis Organization

2 Light Transport Theory
  2.1 Geometric Optics
    2.1.1 Assumptions
    2.1.2 Radiometric Quantities
  2.2 Light Transport Equation
    2.2.1 Surface Only Scattering
    2.2.2 Volumetric Scattering Extension
    2.2.3 Properties of Scattering Functions
    2.2.4 Analytic Solutions
    2.2.5 Simplifications
  2.3 Image Formation
    2.3.1 Importance
    2.3.2 Integral Formulation
    2.3.3 Image Function

3 Monte Carlo Methods
  3.1 Monte Carlo Integration
    3.1.1 Statistical Concepts
    3.1.2 Estimators of Integrals
    3.1.3 Biased and Unbiased Methods
  3.2 Variance Reduction Techniques
    3.2.1 Importance and Multiple Importance Sampling
    3.2.2 Russian Roulette and Splitting
    3.2.3 Uniform Sample Placement
  3.3 Quasi-Monte Carlo Integration
    3.3.1 Desired Properties and Quality of Sample Sequences
    3.3.2 Low Discrepancy Sequences
    3.3.3 Randomized Quasi-Monte Carlo Sampling
    3.3.4 Comparison of Monte Carlo and Quasi-Monte Carlo Integration
    3.3.5 Quasi-Monte Carlo Limitations

4 Light Transport Algorithms
  4.1 Ray Tracing vs. Other Algorithms
    4.1.1 View Dependent vs. View Independent Algorithms
    4.1.2 Ray Tracing Algorithms
    4.1.3 Hardware Accelerated Rasterization
    4.1.4 Radiosity Algorithms
  4.2 Light Transport Paths
    4.2.1 Classification of Paths
    4.2.2 Construction of Paths
    4.2.3 Local Path Sampling Limitation
  4.3 Full Spectral Rendering
    4.3.1 Necessity of Full Spectrum
    4.3.2 Representing Full Spectra
    4.3.3 Efficient Sampling of Spectra
    4.3.4 Results and Discussion
  4.4 Analysis of Selected Light Transport Algorithms
    4.4.1 Path Tracing
    4.4.2 Bidirectional Path Tracing
    4.4.3 Metropolis Light Transport
    4.4.4 Irradiance and Radiance Caching
    4.4.5 Photon Mapping
  4.5 Combined Light Transport Algorithm
    4.5.1 Motivation
    4.5.2 Merging of an Unbiased Algorithm with Photon Mapping
    4.5.3 Results and Conclusion

5 Parallel Rendering
  5.1 Stream Processing
    5.1.1 Stream Processing Basics
    5.1.2 Extended Stream Machines with Cache
    5.1.3 Stream Monte Carlo Integration
  5.2 Parallel Ray Tracing
    5.2.1 Algorithm Initialization and Scene Description
    5.2.2 Frame Buffer as an Output Stream
    5.2.3 Multipass Rendering
    5.2.4 Ray Tracing on an Extended Stream Machine
    5.2.5 Results and Conclusions
  5.3 Choice of Optimal Hardware
    5.3.1 Shared Memory vs. Clusters of Individual Machines
    5.3.2 Multiprocessor Machines vs. Graphics Processors
    5.3.3 Future-proof Choice
  5.4 Interactive Visualization of Ray Tracing Results
    5.4.1 Required Server Output
    5.4.2 Client and Server Algorithms
    5.4.3 MIP-mapping Issues
    5.4.4 Results and Discussion

6 Rendering Software Design and Implementation
  6.1 Core Functionality Interface
    6.1.1 Quasi-Monte Carlo Sampling
    6.1.2 Ray Intersection Computation
    6.1.3 Spectra and Colors
    6.1.4 Extension Support
  6.2 Procedural Texturing Language
    6.2.1 Functional Languages
    6.2.2 Syntax and Semantic
    6.2.3 Execution Model and Virtual Machine API
    6.2.4 Results and Conclusion
  6.3 New Glossy Reflection Models
    6.3.1 Related Work
    6.3.2 Properties of Reflection Functions
    6.3.3 Derivation of Reflection Function
    6.3.4 Results and Conclusions

7 Results
  7.1 Image Comparison
  7.2 Full Spectral Rendering
  7.3 Comparison of Rendering Algorithms

8 Conclusion
  8.1 Contributions Summary
  8.2 Final Thoughts and Future Work

Bibliography

Index


List of Symbols

Φ         radiant flux
E         irradiance
L         radiance
W         importance
λ         wavelength
fr        bidirectional reflection distribution function (BRDF)
ft        bidirectional transmission distribution function (BTDF)
fs        bidirectional scattering distribution function (BSDF)
fp        phase function
N         normal direction
ω         unit directional vector
ωi        incident ray direction
ωo        outgoing ray direction
θ         angle between ω and N
x, y      points – either on a surface A or in a volume V
x2 − x1   normalized vector pointing from x1 to x2, x1 ≠ x2
x, x[k]   a light transport path; a path with k segments has k + 1 vertexes
Λ         space of all visible wavelengths, Λ = [λmin, λmax]
Ω         space of all unit directional vectors ω
Ω+        space of all unit directional vectors such that ω · N ≥ 0
A         space of all points on scene surfaces
V         space of all points in scene volume, without surfaces
X         space of all light transport paths
µ         arbitrary measure
σ(ω)      solid angle measure
σ⊥(ω)     projected solid angle measure
A(x)      area measure
V(x)      volumetric measure
σa        absorption coefficient
σs        scattering coefficient
σe        extinction coefficient
pdf       probability density function
cdf       cumulative distribution function
ξ         canonical uniform random variable
δ         Dirac delta distribution


List of Figures

2.1 Radiance definition
2.2 Results of simplifications of light transport equation

3.1 Uniform sample placement
3.2 Quasi-Monte Carlo sampling patterns
3.3 Comparison of Monte Carlo and quasi-Monte Carlo integration error
3.4 Undesired correlations between QMC sequences
3.5 Rendering with erroneous QMC sampling

4.1 An example light path
4.2 Extended light path notation
4.3 Difficult light path
4.4 Local path sampling limitation
4.5 Full spectral and RGB reflection
4.6 Full spectral and RGB refraction on a prism
4.7 Different methods for wavelength dependent scattering
4.8 Selection of optimal number of spectral samples
4.9 Various methods of sampling spectra
4.10 Imperfect refraction with dispersion
4.11 Analysis of behaviour of spectral sampling
4.12 A path generated by Path Tracing algorithm
4.13 Results of Path Tracing
4.14 Simplified Path Tracing
4.15 Batch of paths generated with BDPT
4.16 Batch of paths generated with optimized BDPT
4.17 Comparison of BDPT and Photon Mapping
4.18 One pass versus two pass Photon Mapping
4.19 Quickly generated images with one pass Photon Mapping
4.20 Light transport algorithms comparison
4.21 Light transport algorithms comparison

5.1 Stream machine basic architecture
5.2 Extended stream machine architecture
5.3 Test scenes for parallel rendering
5.4 Parallel rendering run times
5.5 Main loop of visualization client process
5.6 Glare effect applied as a postprocess on the visualization client
5.7 Comparison of MIP-mapping and custom filtering based blur quality
5.8 Results of Interactive Path Tracing
5.9 Results of Interactive Photon Mapping
5.10 Noise reduction based on variance analysis of Path Tracing

6.1 Ray-primitives interaction
6.2 Semi-transparent surfaces intersection optimization
6.3 Comparison between different gamut mapping techniques
6.4 Textured and untextured models
6.5 Sample procedural texture
6.6 Images generated using noise primitive
6.7 Procedurally defined materials
6.8 Mandelbrot and Julia Fractals
6.9 Comparison of different glossy BRDFs with little gloss
6.10 Latitudal scattering only
6.11 Longitudal scattering only
6.12 Product of latitudal and longitudal scattering
6.13 Scattering with perpendicular and grazing illumination
6.14 Complex dragon model rendered with our glossy material

7.1 Comparison of spectral rendering algorithms
7.2 Full spectral rendering of a scene with imperfect refraction
7.3 Rendering of indirectly visible caustics


List of Tables

4.1 Numerical error of spectral sampling

6.1 Spectral functions for RGB colors
6.2 Texturing language grammar
6.3 Notation used in BRDF derivation

7.1 Convergence of spectral sampling techniques
7.2 Convergence of selected rendering algorithms


List of Algorithms

4.1 Optimized Metropolis sampling
4.2 Construction of photon maps
5.1 Kernel for Monte Carlo integration
5.2 Single- and multipass rendering on extended stream machine
5.3 Rasterization of point samples by visualization client
5.4 Visualization client repaint processing
6.1 Gamut mapping by desaturation
6.2 Sample usage of procedural texturing


Chapter 1

Introduction

Photorealistic rendering is a part of computer graphics which concentrates on creating still images and animations based on 3D models. Its goal is the creation of pictures that are indistinguishable from real world scenes. This work is dedicated to a particular class of photorealistic rendering algorithms – ray tracing based global illumination. In the rest of this chapter, a brief description of global illumination is presented, followed by an outline of the most interesting original contributions, and finally, the thesis organization.

1.1 Global Illumination and its Applications

The term global illumination is used to name two distinct types of capabilities of rendering algorithms – there are two commonly used and substantially different definitions of it. According to the first definition, global illumination effects are simply opposed to local illumination effects, where only direct lighting is accounted for while rendering any part of the scene. Thus, any phenomenon which depends on knowledge of other scene parts while rendering a given primitive is a global illumination effect. Such effects are, for example, shadow casting and environment mapping. On the other hand, according to the second definition, any rendering algorithm capable of global illumination must be able to simulate all possible interreflections of light between scene primitives. Since the second definition is much more useful and precise, it is used throughout the rest of the thesis.

Global illumination is a very useful concept for creating realistically lit images of artificial 3D scenes. By computing a vast diversity of optical phenomena automatically and correctly, it creates a solid basis for rendering software which allows specification of what is to be rendered instead of a detailed description of how to render a given scene. Its current applications range from many CAD systems to special effects in movies.

Global illumination algorithms are responsible for the rendering process. Within the framework presented later they are easy to implement; however, the design of an effective, robust and physically correct global illumination algorithm is a difficult and still not fully solved task. By physical correctness we understand complete support of geometric optics based phenomena. Currently, only ray tracing based algorithms are general enough to render all geometric optics effects. Unfortunately, the price to pay for such automation is that the evaluation of global illumination is slow compared to other techniques, e.g. hardware accelerated rasterization.

However, it is widely believed that, when computers become fast enough, global illumination is likely to replace other, less physically accurate techniques. This statement is often supported by the fact that a similar breakthrough is already happening in the modeling domain, where physical models compete with more traditional approaches to good effect. For example, nowadays it is possible to model a cloth animation either as a mesh with a given mass, stiffness, etc., letting the model calculate its appearance over time, or as a sequence of keyframes, each frame laboriously created by an animator. Currently both models can be run in real time, while such simulation was not plausible a few years ago.

1.2 Thesis Purpose and Original Contributions

The main purpose of this thesis is to improve the efficiency of physically correct rendering. The idea is to provide working software with a rigorous theoretical basis. Our assumption is to avoid two extremes – overly theoretical work, without care for algorithm implementation, and algorithms created by trial and error, designed to produce good enough visual effects, which sometimes work and sometimes do not. The latter approach is, unfortunately, surprisingly common in real time graphics, despite providing no information about algorithm correctness. Moreover, accounting for hardware development trends is very important to us, since these trends can significantly affect the performance of algorithms.

The thesis concentrates on the enhancement of robustness of rendering algorithms, as well as their parallel realization. These two elements together can substantially increase global illumination applicability, and are a step towards the ultimate goal of being able to run true global illumination in real time. In an effort to realize it, this dissertation provides several original contributions in the field of computer graphics. This section enumerates the most important and interesting of them, listed in the order in which they appear in the thesis.

Light transport theory. Typically, light transport theory assumes creation of an image from pixels, by convolution of radiance with a filter function associated with each pixel. There is nothing incorrect with this concept, but it limits the generality of rendering algorithms. Instead, we represent the image as a 3D function defined over [0, 1]² × Λ, where the unit square represents the image film surface and Λ is the space of light wavelengths. This generalization allows using much more sophisticated post-processing techniques; however, it invalidates any rendering algorithm dependent on the pixel abstraction. This approach is explained in detail in Section 2.3.

Full spectral rendering. Despite it having been proven incorrect, a lot of global illumination algorithms are designed to use an RGB color model. Only a few of the most advanced approaches attempt to accurately simulate visually pleasing full spectral phenomena. We have designed and implemented improved full spectral support, based on Multiple Importance Sampling. This new technique is much more efficient at simulating non-idealized wavelength dependent phenomena, and fits elegantly into Monte Carlo and quasi-Monte Carlo ray tracing algorithms. The novel spectral sampling technique is defined in Section 4.3.

Light transport algorithms. First, we have adapted Photon Mapping to be a one pass technique. The modified algorithm starts by storing just a few photons, and later the photon map size is increased. This allows rendering images with progressive quality improvement, a feature impossible to obtain in the original, two pass variants of this technique. Surprisingly, the dynamic photon map comes at only a small performance penalty for typical rendered scenes. Second, we have provided a detailed potential error prediction technique for some of the most significant ray tracing algorithms. This feature is then used in the presented new rendering algorithm, which tries to select the most appropriate method to render each part of an image. Both algorithms are presented in Sections 4.4 and 4.5.

Parallel rendering. We provide an extension of the stream processor model to support read-write memory. The extension guarantees coherency of all pieces of written data, but the order of different reading and writing operations is not preserved. Therefore, the correctness of any algorithm must not depend on the content of this kind of memory, but the algorithm may use it to accelerate its operations. This mechanism is used in the parallel implementation of the one pass version of photon mapping, as well as in our new rendering algorithm. The extended stream machine is described in Chapter 5. Moreover, we have designed and implemented an interactive viewer of ray tracing results, based on the processing power of GPUs. The viewer works in parallel with the CPU based renderer, which generates new data while the previous data is displayed. This concept is explained in Section 5.4.

Sampling oriented interface. We have designed an interface between 3D objects, cameras and rendering algorithms based entirely on sampling. This interface provides a clear abstraction layer, general enough to express the majority of ray tracing algorithms. It is designed to simplify the implementation of bidirectional light transport methods. Furthermore, the interface provides support for spectral rendering and a carefully designed quasi-Monte Carlo sequence generation infrastructure. Additionally, we have developed a technique for storing 2D surfaces and 3D participating media in the same ray intersection acceleration structure, using the sampling interface. If a rendered scene contains roughly similar numbers of 2D and 3D entities, the improved algorithm is nearly twice as fast as an algorithm using two separate structures. The design of this interface is explained in Section 6.1.

Materials. We have provided a shading language optimized for ray tracing. The new concept is based on the usage of a functional language for this purpose. The language is computationally complete and enables easy creation of complex material scripts. The script style resembles mathematical notation much more than classic imperative programming languages do. The language is presented in Section 6.2. Moreover, we have developed a new glossy reflection model, which is both symmetric and energy preserving. The derivation of its formulae is given in Section 6.3.

1.3 Thesis Organization

The second and third chapters provide the theoretical basis used in the rest of the thesis. The second chapter presents a brief introduction to light transport theory under the assumption of geometric optics applicability. It explains how illumination in a 3D scene can be described mathematically as an integral equation (the so-called Light Transport Equation), and how to use its solution to create images. The third chapter describes Monte Carlo integration as a general purpose numerical technique, giving details on selected approaches to improve its efficiency.

The fourth chapter shows how to apply Monte Carlo integration to solve the Light Transport Equation, which leads to a variety of so-called non-deterministic ray tracing algorithms. The main point of this chapter is, however, the analysis of the strong and weak points of the major existing light transport algorithms, as well as the proposal of a new algorithm, designed to efficiently cope with many of their issues. Finally, this chapter shows how to efficiently incorporate full spectral rendering into the presented methods. The fifth chapter illustrates the potential of parallel rendering. Furthermore, it gives the idea of how to express ray tracing as an algorithm dedicated to a slightly extended stream machine, and describes a variety of hardware as potential candidates for the implementation of the stream machine. An interactive previewer of ray tracing results is also presented in this chapter. The sixth chapter presents the design of the rendering software used for developing and evaluating light transport algorithms, together with some of the most interesting implementation details.

The seventh chapter provides the most important results, including a detailed comparison of the efficiency of all presented light transport algorithms. The last chapter summarizes the original contributions and, finally, presents some ideas for future work dedicated to rendering.


Chapter 2

Light Transport Theory

Light transport theory provides the theoretical basis for the image creation process. The theory used in this thesis is based on the assumption of geometric optics applicability. It describes the equilibrium radiance distribution over an entire scene, and additionally, a method for the calculated radiance to form a final image. In computer graphics, light transport was first described formally by Kajiya [Kajiya, 1986]. More recent works which cover this area are [Veach, 1997] and [Pharr & Humphreys, 2004]. Arvo's course [Arvo, 1993] and Pauly's thesis [Pauly, 1999] provide a detailed description of light transport in volumes.

This chapter starts with an explanation of the assumptions of geometric optics. Next, it presents the equation describing radiance distribution in both forms – for surfaces placed in vacuum only, and for volumetric effects as well. Because all these equations are commonly known, they are discussed briefly. Finally, it is shown how the computed radiance is used to form a final image. The presented image formation theory is a modified approach, designed to remove the assumption that an image is built from pixels.

2.1 Geometric Optics

There are a few physical theories describing the behaviour of light. The most important of them are: geometric optics, wave optics and quantum optics. In simplification, geometric and wave optics explain the transport of light, and quantum optics describes its interaction with matter. Each of these theories predicts selected real-world phenomena with a certain degree of accuracy, and likely none of them is completely correct. The choice of a physical theory for simulation is a tradeoff between the desired accuracy of a solution and its computational cost. An interesting fact is that, for computer graphics needs, the simplest theory, geometric optics, is typically sufficient to provide a high degree of realism. Geometric optics based rendering is perfectly fine at capturing phenomena such as soft shadows, indirect lighting, dispersion, etc. However, despite its physical simplicity, very few rendering systems provide full support of geometric optics based rendering, and any application which attempts to do so is far too slow to be used in real time.

2.1.1 Assumptions

The theoretical model of geometric optics is based on the following simplifying assumptions, which significantly improve computation efficiency and still allow simulation of the majority of commonly seen phenomena:

• the number of photons is huge while the photon energies are extremely small – any distribution of photons may be treated as a continuous value;

• photons do not interact with each other, thus effects such as interference cannot be simulated;


• photon collisions with surfaces and with particles of non-transparent volumes (e.g. fog or dust) are elastic, which means that photons cannot change wavelength during scattering;

• diffraction, continuously varying refractive indexes and all other phenomena which could affect the movement of photons are neglected, so between collisions photons travel along straight lines;

• the speed of photons is infinitely large, so the scene is assumed to be always in an equilibrium state;

• optical properties of materials do not depend on illumination power; therefore illumination is linear, i.e. it can be computed independently for each light source and summed to form the final result.

Using all these assumptions, light transport can be described by means of radiometric quantities, such as flux or radiance. Some phenomena which require solving the wave equation of light transport, like diffraction or interference, cannot be simulated using these quantities at all. However, some other selected non-geometric effects can easily be simulated by simple extension of these quantities. For example, spectral radiance is used to simulate wavelength dependent effects. In a similar way, radiance can be extended to support polarization. Moreover, scattering can be extended to support fluorescence. If spectral radiance is represented as a vector, then reflectivity can be represented as a matrix and a scattering event as a matrix-vector multiplication. The simplified case of elastic photon scattering is then represented with diagonal matrices. The implemented software extends geometric optics to support spectral radiance only.
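As a concrete illustration of this vector extension, the sketch below represents spectral radiance as a fixed-size vector of samples; elastic scattering then reduces to a diagonal matrix, i.e. a component-wise product, while fluorescence would require a full matrix. This is a minimal sketch with assumed names (kSpectralSamples, SpectralRadiance), not the data layout of the implemented software.

    #include <array>

    constexpr int kSpectralSamples = 8;   // assumed sample count, for illustration
    using SpectralRadiance = std::array<float, kSpectralSamples>;

    // Elastic scattering: a diagonal matrix acting on the radiance vector,
    // i.e. each wavelength band is attenuated independently.
    SpectralRadiance scatterElastic(const SpectralRadiance& L,
                                    const SpectralRadiance& reflectivity) {
        SpectralRadiance out{};
        for (int i = 0; i < kSpectralSamples; ++i)
            out[i] = reflectivity[i] * L[i];  // no energy transfer between bands
        return out;
    }

    // Fluorescence would need the full matrix: band j may gain energy from band i.
    SpectralRadiance scatterFluorescent(const SpectralRadiance& L,
                                        const float M[kSpectralSamples][kSpectralSamples]) {
        SpectralRadiance out{};
        for (int j = 0; j < kSpectralSamples; ++j)
            for (int i = 0; i < kSpectralSamples; ++i)
                out[j] += M[j][i] * L[i];
        return out;
    }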

2.1.2 Radiometric Quantities

Under the assumption of applicability of geometric optics, it is enough to use a few radiometric quantities to fully describe light transport in any 3D scene. Each quantity is defined by measuring the distribution of radiation energy with respect to some parameters. Any of these quantities can be defined in a standard and a spectral (denoted by dependence on λ) version.

Radiant Flux

Radiant flux is defined as radiated energy per time:

\Phi(t) = \frac{dQ(t)}{dt}, \qquad \Phi_\lambda(t, \lambda) = \frac{d^2 Q(t, \lambda)}{dt \, d\lambda}.    (2.1)

Radiant flux is measured in watts. This quantity is useful for description of total emission from a 3D object or a single light source.

Irradiance

Irradiance at a point x is defined as radiant flux per area:

E(x) = \frac{d\Phi(x)}{dA(x)}, \qquad E_\lambda(x, \lambda) = \frac{d^2 \Phi_\lambda(x, \lambda)}{dA(x) \, d\lambda}.    (2.2)

Irradiance is measured in watts per square meter. It is used to describe how strong the illumination on a given surface is.


Radiance

Radiance is considered to be the most basic quantity in radiometry, and is defined as power per area per solid angle (see Figure 2.1):

L(x, \omega) = \frac{d^2 \Phi(x, \omega)}{dA^{\perp}(x) \, d\sigma(\omega)}, \qquad L_\lambda(x, \omega, \lambda) = \frac{d^3 \Phi_\lambda(x, \omega, \lambda)}{dA^{\perp}(x) \, d\sigma(\omega) \, d\lambda},    (2.3)

where dA⊥(x) is a projected area measure. Radiance is measured in watts per square meter per steradian. It may be rewritten in a more convenient form:

L(x, \omega) = \frac{d^2 \Phi(x, \omega)}{|\omega \cdot N(x)| \, dA(x) \, d\sigma(\omega)} = \frac{d^2 \Phi(x, \omega)}{dA(x) \, d\sigma^{\perp}(\omega)},    (2.4)

which uses the standard area measure on surfaces. Light transport equations are based on radiance, which has a useful property – it is constant when light travels along straight lines in vacuum. To render an image it is enough to know the radiance on the camera lens; however, some techniques try to compute radiance everywhere. During scattering of photons it is important to distinguish between incident and outgoing radiance. These quantities (defined at the same x) are often marked as Li and Lo.

Figure 2.1: Radiance of a conical bundle of rays is defined as the derivative of the bundle power with respect to the angular divergence dω of the cone and the perpendicular area dA of its base. The radiance can be measured on a surface non-perpendicular to the bundle direction by projecting the measured area.

Radiant Intensity

Radiant intensity is defined as power per unit solid angle:

I(\omega) = \frac{d\Phi(\omega)}{d\sigma(\omega)}, \qquad I_\lambda(\omega, \lambda) = \frac{d^2 \Phi_\lambda(\omega, \lambda)}{d\sigma(\omega) \, d\lambda}.    (2.5)

Radiant intensity is measured in watts per steradian. It is used to describe how strong the illumination in a particular direction is. In strictly physically based systems, the radiant intensity from any particular point x is always zero; however, the quantity I(x, ω) is useful in the description of emission from point-based light sources, which are commonly used in modeling.

Volume Emittance

Volume emittance is similar to radiance, but is defined with respect to volume instead of surface:

L_v(x, \omega) = \frac{d^2 \Phi(x, \omega)}{dV(x) \, d\sigma(\omega)}, \qquad L_{v,\lambda}(x, \omega, \lambda) = \frac{d^3 \Phi_\lambda(x, \omega, \lambda)}{dV(x) \, d\sigma(\omega) \, d\lambda},    (2.6)

where V(x) means volume measure. Volume emittance is measured in watts per cubic meter per steradian. This quantity is used to describe volumetric phenomena, for example emission from correctly modeled fire.


2.2 Light Transport Equation

All rendering algorithms have to solve the light transport problem. In 1986 J. Kajiya [Kajiya, 1986] first noticed that his theoretically correct approach (under the assumption of geometric optics applicability) solves the equation described in his paper, and that all other available algorithms make simplifications of some kind, trading the accuracy of the solution for speed. This section presents in short the derivation of this equation and its extension to support light transport in volumes. Next, it explains the properties of scattering functions. Then, it presents analytical solutions to the simplest cases of this equation, and finally, it describes the simplifications of the light transport equation made by various accelerated rendering algorithms.

2.2.1 Surface Only Scattering

The following derivation is due to [Veach, 1997]. The global light transport equation is based on a formal definition of local surface scattering. Whenever a beam of light from a direction ωi hits a surface at a point x, it generates irradiance equal to:

dE(x, \omega_i) = L_i(x, \omega_i) \, |\omega_i \cdot N(x)| \, d\sigma(\omega_i) = L_i(x, \omega_i) \, d\sigma^{\perp}(\omega_i).    (2.7)

It can be observed that the radiance reflected from a particular point on a surface, Lo, is proportional to the irradiance at that point: dLo(x, ωo) ∝ dE(x, ωi). The Bidirectional Scattering Distribution Function (BSDF), called the scattering function for short, is, by definition, the constant of this proportionality:

f_s(x, \omega_i, \omega_o) = \frac{dL_o(x, \omega_o)}{dE(x, \omega_i)} = \frac{dL_o(x, \omega_o)}{L_i(x, \omega_i) \, d\sigma^{\perp}(\omega_i)}.    (2.8)

The local surface scattering equation is defined as:

L_s(x, \omega_o) = \int_{\Omega} f_s(x, \omega_o, \omega_i) \, L_i(x, \omega_i) \, d\sigma^{\perp}(\omega_i).    (2.9)

This equation describes how much light is reflected from a surface point x in a particular direction ωo, knowing the incident illumination Li. The total light outgoing from a particular surface point x in a particular direction ωo is the sum of the scattered light Ls and the emitted light Le:

L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} f_s(x, \omega_o, \omega_i) \, L_i(x, \omega_i) \, d\sigma^{\perp}(\omega_i).    (2.10)

Incident radiance at a particular surface can be computed using outgoing radiance from another surface:

L_i(x, \omega_i) = L_o(T(x, \omega_i), -\omega_i).    (2.11)

The ray casting operator T(x, ω) finds the nearest ray-surface intersection point for a ray starting from x in direction ω. In order to avoid the special case when a ray escapes to infinity, the whole scene may be enclosed in a huge, ideally black sphere. Equation 2.11 holds because radiance does not change as light travels along straight lines in vacuum. Substituting 2.11 into 2.10 leads to the final form of the rendering equation:

L(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} f_s(x, \omega_o, \omega_i) \, L(T(x, \omega_i), -\omega_i) \, d\sigma^{\perp}(\omega_i).    (2.12)

The incident radiance Li no longer appears in this equation, so the subscript of Lo is dropped. This equation is valid for spectral radiance Lλ as well.
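Equation 2.12 maps directly onto a recursive one-sample Monte Carlo estimator, which is the core of the path tracing methods analyzed in Chapter 4 (Monte Carlo estimation itself is introduced in Chapter 3). The sketch below only illustrates this correspondence; Scene, Spectrum, Vec3 and the sampling interface are hypothetical stand-ins, not the interfaces of the implemented renderer.

    #include <cmath>

    // One-sample estimator of eq. 2.12: L = Le + integral of fs * L over the
    // projected solid angle. All types and methods here are assumed interfaces.
    Spectrum radiance(const Scene& scene, const Vec3& x, const Vec3& wo, int depth) {
        Spectrum L = scene.emittedRadiance(x, wo);          // Le(x, wo)
        if (depth >= 16) return L;                          // crude recursion cap

        Vec3 wi;
        float pdf;                                          // density of sampled wi
        Spectrum fs = scene.sampleBSDF(x, wo, &wi, &pdf);   // fs(x, wo, wi)
        if (pdf == 0.0f) return L;

        Vec3 y = scene.raycast(x, wi);                      // y = T(x, wi)
        Spectrum Li = radiance(scene, y, -wi, depth + 1);   // L(T(x, wi), -wi)

        // Dividing by the pdf makes the sample an unbiased estimate of the
        // integral; the cosine converts solid angle to projected solid angle.
        return L + fs * Li * std::fabs(dot(wi, scene.normal(x))) / pdf;
    }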

2.2.2 Volumetric Scattering Extension

The classic rendering equation cannot handle light transport in so-called participating media. These media affect radiance when light travels between surfaces. The assumption of surfaces placed in vacuum is a good approximation when rays travel short distances in clean air. It fails, however, in the simulation of phenomena such as a dusty or foggy atmosphere, emission from fire, or large open environments, where light paths may be many kilometres long.

The vacuum version of the rendering equation (2.12) can be extended to support volumetric effects by modifying the ray casting operator T to account for radiance changes. The extensions are due to [Arvo, 1993], [Pauly, 1999] and [Pharr & Humphreys, 2004].

Participating media may affect a ray by increasing or decreasing its radiance while it travels. The increase is due to in-scattering and emission, the decrease due to out-scattering and absorption. The whole participating medium may be described by defining three coefficients – absorption, emission and scattering – at every point of 3D space. Moreover, at every point where the scattering coefficient is larger than zero, a phase function must also be provided, which plays a role similar to the BSDF in the classic equation.

The emission from a volumetric medium is described by the following expression:

\frac{dL(x + t\omega, \omega)}{dt} = L_{ve}(x + t\omega, \omega),    (2.13)

where L_{ve}(x, ω) is volume emittance. The absorption coefficient is defined as follows:

\frac{dL(x + t\omega, \omega)}{dt} = -\sigma_a L(x + t\omega, \omega).    (2.14)

Intuitively, the absorption coefficient describes the fractional decrease of radiance per unit distance travelled by light. The absorption coefficient is measured in m⁻¹ and can take any non-negative real value. Out-scattering decreases ray radiance in a similar way as absorption, but uses the scattering coefficient σs instead of σa. The total decrease of ray radiance is expressed by the extinction coefficient σe = σa + σs.

The fraction of light which is transmitted between points x and x + sω (the beam transmittance) is given by the following formula:

t_r(x, x + s\omega) = \exp\left( -\int_0^s \sigma_e(x + t\omega, \omega) \, dt \right).    (2.15)

Beam transmittance has two useful properties:

t_r(x_1, x_2) = t_r(x_2, x_1),    (2.16)

t_r(x_1, x_3) = t_r(x_1, x_2) \, t_r(x_2, x_3).    (2.17)
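Equation 2.15 rarely has a closed form for heterogeneous media, but it is easy to approximate numerically. The self-contained sketch below ray-marches the optical depth with the midpoint rule and exponentiates the result; sigmaAlongRay is any user-supplied extinction profile along the ray (an illustration only, not the thesis implementation).

    #include <cmath>
    #include <cstdio>
    #include <functional>

    // Beam transmittance tr(x, x + s*w) = exp(-integral_0^s sigma_e(x + t*w) dt),
    // approximated with the midpoint rule over n equal steps.
    double transmittance(const std::function<double(double)>& sigmaAlongRay,
                         double s, int n = 256) {
        double opticalDepth = 0.0;
        const double dt = s / n;
        for (int i = 0; i < n; ++i)
            opticalDepth += sigmaAlongRay((i + 0.5) * dt) * dt;
        return std::exp(-opticalDepth);
    }

    int main() {
        // Homogeneous medium, sigma_e = 0.5: tr over distance 2 must equal exp(-1).
        double tr = transmittance([](double) { return 0.5; }, 2.0);
        std::printf("tr = %f (expected %f)\n", tr, std::exp(-1.0));
    }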

The scattering in the participating medium is described by:

L_{vs}(x, \omega) = \sigma_s(x) \int_{\Omega} f_p(x, \omega_o, \omega_i) \, L_i(x, \omega_i) \, d\sigma(\omega_i).    (2.18)

The radiance added to a ray per unit distance due to in-scattering and emission can be expressed as:

L_{vo}(x, \omega) = L_{ve}(x, \omega) + L_{vs}(x, \omega).    (2.19)

Assuming that a ray travels through the participating medium infinitely long, i.e. it never hits a surface, the total ray radiance due to the participating medium is:

L_i(x, \omega) = \int_0^{\infty} t_r(x, x + t\omega) \, L_{vo}(x + t\omega, \omega) \, dt.    (2.20)

When participating media are mixed with surfaces, similarly as in surface-only rendering, the ray casting operator T can be used to find the nearest ray-surface intersection. Let y = T(x, ωi) and s = ‖x − y‖.


The radiance Li incoming at a surface can then be expressed in terms of the radiance outgoing from another surface, modified by the encountered participating media:

L_i(x, \omega_i) = t_r(x, y) \, L_o(y, -\omega_i) + \int_0^s t_r(x, x + t\omega_i) \, L_{vo}(x + t\omega_i, \omega_i) \, dt.    (2.21)

Expression 2.21 can be substituted into 2.10, which leads to the light transport equation generalized to support participating media mixed with surfaces:

L(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} f_s(x, \omega_o, \omega_i) \left[ t_r(x, y) \, L(y, -\omega_i) + \int_0^s t_r(x, x + t\omega_i) \, L_{vo}(x + t\omega_i, \omega_i) \, dt \right] d\sigma^{\perp}(\omega_i).    (2.22)

As in the surface-only version, the subscript of Lo is dropped, and the equation holds for spectral radiance as well. The generalized equation is substantially more complex than the surface-only version, and therefore it may be expected that rendering participating media dramatically hurts performance. Because of that, many existing volumetric rendering algorithms simplify this general form of the volumetric rendering equation in some way.

2.2.3 Properties of Scattering Functions

The domain of the scattering function fs, which describes scattering from all surfaces, is the whole Ω. This function is often defined as a union of simpler functions – reflection fr (Bidirectional Reflection Distribution Function, BRDF) and transmission ft (Bidirectional Transmission Distribution Function, BTDF). Reflection happens when both ωi and ωo are on the same side of a surface; transmission happens when ωi and ωo are on opposite sides of the surface. Transmission is typically modeled as one two-directional function. Therefore, for reflection only the direction of the surface normal N is important, while for transmission the direction as well as the sign of N has to be accounted for.

Scattering functions have several important properties. First, to conform to the laws of physics, BRDFs must be symmetric, i.e. swapping incident and outgoing directions must not change the BRDF value:

Scattering functions have several important properties. First, to conform the laws of physics,BRDFs must be symmetric, i.e. swapping incident and outgoing directions must not change BRDFvalue:

∀ωi,ωo ∈ Ω+ fr(ωi,ωo) = fr(ωo,ωi). (2.23)

However, when a surface transmits light, the BTDF typically is not symmetric; the asymmetry is strictly defined as a function of the refraction coefficients of the media on the opposite sides of the surface. The same rule applies to phase functions, although in this case there is obviously no difference between reflection and transmission:

\forall \omega_i, \omega_o \in \Omega \quad f_p(\omega_i, \omega_o) = f_p(\omega_o, \omega_i).    (2.24)

Moreover, all BRDFs and BTDFs (and therefore BSDFs) must be energy conserving, i.e. surfaces cannot reflect more light than they receive:

\forall \omega_o \in \Omega \quad R(\omega_o) = \int_{\Omega} f_s(\omega_i, \omega_o) \, d\sigma^{\perp}(\omega_i) \leq 1.    (2.25)

An analogous relationship holds for phase functions:

\forall \omega_o \in \Omega \quad R(\omega_o) = \int_{\Omega} f_p(\omega_i, \omega_o) \, d\sigma(\omega_i) = 1.    (2.26)

Equation 2.26 differs from 2.25 in two ways. First, the integration is done with respect to the ordinary solid angle, and second, there is a strict requirement that the phase function is a probability distribution (i.e. it must integrate to one).
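Constraint 2.26 is easy to verify numerically for a concrete phase function: draw uniform directions on the sphere and average fp divided by the uniform density 1/(4π). The self-contained check below uses the Henyey-Greenstein function, a standard phase function model serving here only as a test subject (it is not discussed in the text); any other fp could be substituted.

    #include <cmath>
    #include <cstdio>
    #include <random>

    // Henyey-Greenstein phase function; it depends on cos(theta) only.
    double henyeyGreenstein(double cosTheta, double g) {
        const double kPi = 3.14159265358979323846;
        double d = 1.0 + g * g - 2.0 * g * cosTheta;
        return (1.0 - g * g) / (4.0 * kPi * d * std::sqrt(d));
    }

    int main() {
        // Estimate integral of f_p over the sphere by uniform direction sampling:
        // average f_p(w) / p(w) with p(w) = 1/(4*pi). The result must be 1 (eq. 2.26).
        const double kPi = 3.14159265358979323846, g = 0.6;
        std::mt19937 rng(7);
        std::uniform_real_distribution<double> u(0.0, 1.0);

        const int N = 1000000;
        double sum = 0.0;
        for (int i = 0; i < N; ++i) {
            double cosTheta = 1.0 - 2.0 * u(rng);   // uniform in [-1, 1]
            sum += henyeyGreenstein(cosTheta, g) * 4.0 * kPi;
        }
        std::printf("integral = %f (should be 1)\n", sum / N);
    }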


2.2.4 Analytic Solutions

The light transport equation can be solved analytically only in trivial cases, useless in practice. However, these analytical solutions, despite being unable to produce any interesting image, provide a valuable tool for testing light transport algorithms. Obviously, such tests cannot definitely prove that an algorithm which passes them is correct, but they nevertheless aid in removing algorithms' errors and in evaluating their speed of convergence.

A very simple test scene is a unit sphere, with constant emission Le(x, ω) ≡ 1/(2π) and constant BRDF fr(x, ωi, ωo) ≡ 1/(2π) for each point on the sphere and each direction inside the sphere. Since the scene is symmetric (neither geometry nor emission or scattering can break the symmetry), it is easy to verify that the radiance L measured along each ray inside the sphere must be identical and equal to 1. This scene can be made a bit more complex with a BRDF which is not necessarily constant, but still with a constant reflectivity of R = 0.5. The sphere can be filled with a homogeneous participating medium with an absorption coefficient σa ≡ 0 and an arbitrary scattering coefficient σs. These modifications must not change the result returned by the tested light transport algorithms.

2.2.5 Simplifications

A variety of rendering algorithms, especially those which run in real time, do not attempt to solve the full light transport equation. These algorithms may be described theoretically by the (substantially) simplified equations which they actually solve. Figure 2.2 shows the results of simplifications of the light transport equation compared with a full global illumination solution. The simplified results are given without any additional terms, like ambient light; they just show what the simplified equations actually describe.

Figure 2.2: Difference between the results of a hardware rasterization algorithm (left), classic ray tracing (middle) and full global illumination (right). All guessed lighting terms (e.g. ambient light) were disabled deliberately, to show what the given technique actually computes.

First, one-pass hardware accelerated rasterization implemented in commonly available libraries, such as OpenGL or DirectX, computes only one scattering event and supports only point light sources. Therefore rasterization solves the following equation:

L(x, \omega_o) = \sum_{i=1}^{n} f_r(x, \omega_o, y_i - x) \, I_i(x - y_i),    (2.27)

where n is the number of light sources, yi is the position of the ith light source, and Ii is its radiant intensity. Only recently, due to advancements in programmable graphics hardware [Rost & Licea-Kane, 2009], can the fr function be arbitrarily complex, and non-point lights be approximated with reasonable precision. Such dramatic simplifications make real time rendering possible, but at the cost of very poor illumination quality.

The classic Whitted ray tracer [Whitted, 1980] (see Section 4.1.2) handles multiple reflections, but only from ideally specular surfaces and for point light sources. This may be seen as replacing the integral with a sum of radiances of ideally reflected and transmitted rays:

L(x, \omega_o) = \sum_{i=1}^{n} f_r(x, \omega_o, y_i - x) \, I_i(x - y_i) + \alpha L(T(x, \omega_r), -\omega_r) + \beta L(T(x, \omega_t), -\omega_t),    (2.28)

where ωr is the reflected ray direction, ωt is the transmitted ray direction, and α and β are color coefficients such that 0 ≤ α < 1, 0 ≤ β < 1 and α + β < 1. An improved approach – Cook's Distributed Ray Tracing [Cook et al., 1984] – computes the right-hand integral, but only once for area light sources or twice for glossy reflections, so it also does not support global illumination.

On the other hand, the radiosity algorithms [Cohen & Wallace, 1993] (see Section 4.1.4) handle multiple reflections, but are limited to matte surfaces only. They solve the full light transport equation, but under the assumption that fr ≡ k/π, 0 < k < 1. Less restrictive radiosity algorithms exist, but they are impractical due to excessive memory consumption.

Similarly, the volumetric version of the light transport equation is often simplified. In the rasterization approach, this equation is typically ignored completely – the color of all scene elements exponentially fades to an arbitrarily chosen fog color with the distance from the viewer. This is a very poor approximation, and it cannot simulate the majority of volumetric effects, like visible beams of light. However, that is the high price of the ability to run rasterization in real time. On the other hand, physically based rendering systems use less drastic simplifications of volumetric effects. For example, Pharr and Humphreys [Pharr & Humphreys, 2004] implemented a single scattering approximation, which seems to be a reasonable trade-off between rendering speed and image quality.
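For illustration, a single-scattering approximation of the kind mentioned above can be sketched as follows: march along the camera ray and, at each step, gather only direct light from a source, attenuated by transmittance on both path segments. This is a hedged sketch with assumed types (Medium, Light, Ray, Spectrum), not the API of the cited system.

    // Single scattering along a camera ray of parametric length s, in n steps.
    // Multiple volumetric scattering is deliberately ignored here.
    Spectrum singleScattering(const Medium& medium, const Light& light,
                              const Ray& ray, double s, int n) {
        Spectrum L(0.0);                       // accumulated in-scattered radiance
        const double dt = s / n;
        for (int i = 0; i < n; ++i) {
            Vec3 x = ray.at((i + 0.5) * dt);
            // tr(camera -> x) * sigma_s(x) * f_p(x, -dir, toLight) * Li * dt
            L += medium.transmittance(ray.origin, x)
               * medium.sigmaS(x)
               * medium.phase(x, -ray.dir, light.directionFrom(x))
               * light.radianceTowards(x)      // assumed to include tr(x -> light)
               * dt;
        }
        return L;
    }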

2.3 Image Formation

This section shows how to create an image from the computed radiance. It starts by describing a camera as a device emitting importance. Next, it explains the conversion of the light transport equation domain from points and unit direction vectors to points only. Finally, it shows how the equation can be rewritten as an integral over the space of all light transport paths, which is sometimes easier to solve. These formulae are based on [Veach, 1997], [Pauly, 1999] and [Pharr & Humphreys, 2004], with the modification that the image is a function defined on real values instead of an array of numbers.

2.3.1 Importance

Cameras may be formally described as devices emitting importance. The importance might be thought of as hypothetical particles, like photons, which propagate from the camera against the direction of light flow. Intuitively, tracing importance approximates how important various scene parts are for a rendered image. That is, the importance distribution alone is enough to define a camera, and different cameras have different importance distributions. The creation of an image is defined as an integral of the product of the emitted importance (from the camera) and the radiance (from the light sources):

I^i = \int_{A_{lens}} \int_{\Omega} W_e^i(x_{lens}, \omega) \, L_i(x_{lens}, \omega) \, d\sigma^{\perp}(\omega) \, dA(x_{lens}),    (2.29)

where I^i is the ith pixel, W_e^i is its emitted importance, and the lens is treated as a special surface in the scene – an idealized measuring device – able to record radiance without interfering with light transport.

The pixel dependence can be removed in the following way. Let the image domain be a unit square, i.e. [0, 1]², and u, v image coordinates: (u, v) ∈ [0, 1]². The importance is defined as a function of the point xlens and the direction ω, as well as the location on the image plane (u, v). The image function I is then evaluated using the expression:

I(u, v) = \int_{A_{lens}} \int_{\Omega} W_e(u, v, x_{lens}, \omega) \, L_i(x_{lens}, \omega) \, d\sigma^{\perp}(\omega) \, dA(x_{lens}).    (2.30)


The importance We can be obtained from W_e^i using a filter function:

W_e(u, v) = \sum_i W_e^i \, f^i(u, v).    (2.31)

The image equation 2.29 uses importance W_e^i based on a particular filter during the image creation process. This seriously limits the available image post-processing algorithms. On the other hand, the basic modification in equation 2.30 removes this flaw. The modification seems very simple in the theoretical formulation of light transport, but it has significant implications for the design of rendering algorithms and their parallelization, described later. Obviously, both these equations are correct for spectral radiance as well.

The functional representation of the image, however, can cause potential problems when emitted radiance Le described with a δ distribution is directly visible. For example, point light sources rendered with pinhole cameras cannot be represented with finite I(u, v) values. Therefore, all algorithms implemented for the purpose of this thesis explicitly omit directly visible lights described with δ distributions.
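A practical consequence of equations 2.30 and 2.31 is that a renderer may store raw (u, v, value) samples and defer all pixel filtering to a post-process, so the pixel grid and the filter f^i can be changed without re-rendering. Below is a minimal sketch of such deferred reconstruction (hypothetical names; a box filter stands in for any f^i of equation 2.31):

    #include <algorithm>
    #include <vector>

    struct FilmSample { float u, v, value; };  // (u, v) in [0,1]^2, one spectral band

    // Reconstruct a w x h pixel grid from pixel-free samples with a box filter.
    std::vector<float> reconstruct(const std::vector<FilmSample>& samples,
                                   int w, int h) {
        std::vector<float> pixels(w * h, 0.0f), weight(w * h, 0.0f);
        for (const FilmSample& s : samples) {
            int px = std::min(static_cast<int>(s.u * w), w - 1);
            int py = std::min(static_cast<int>(s.v * h), h - 1);
            pixels[py * w + px] += s.value;    // box filter: one pixel per sample
            weight[py * w + px] += 1.0f;
        }
        for (int i = 0; i < w * h; ++i)
            if (weight[i] > 0.0f) pixels[i] /= weight[i];
        return pixels;
    }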

2.3.2 Integral Formulation

The light transport equation can be reformulated to an integral over all possible light transportpaths. The following transformation is based on [Veach, 1997] and [Pharr & Humphreys, 2004].First, radiance and scattering functions can be expressed in different domains:

L(x_1, \omega) = L(x_1 \to x_2), where \omega = x_2 - x_1,  (2.32)
f_s(x_2, \omega_o, \omega_i) = f_s(x_3 \to x_2 \to x_1), where \omega_o = x_3 - x_2 and \omega_i = x_2 - x_1.  (2.33)

The projected solid angle measure \sigma^\perp(\omega) is transformed to an area measure A(x) as follows:

d\sigma^\perp(\omega) = V(x_1 \leftrightarrow x_2) \frac{|\cos\theta_1| |\cos\theta_2|}{\|x_1 - x_2\|^2} \, dA(x_2) = G(x_1 \leftrightarrow x_2) \, dA(x_2),  (2.34)

where \omega = x_2 - x_1, \theta_1 and \theta_2 are the angles between \omega and N(x_1) or N(x_2), respectively, and V(x_1 \leftrightarrow x_2) is the visibility factor, which is equal to 1 if x_1 and x_2 are mutually visible, and 0 otherwise. Substituting 2.33, 2.32 and 2.34 into 2.12 leads to a light transport equation defined over all scene surfaces instead of over all direction vectors:

L(x_2 \to x_1) = L_e(x_2 \to x_1) + \int_A f_s(x_3 \to x_2 \to x_1) \, L(x_3 \to x_2) \, G(x_3 \leftrightarrow x_2) \, dA(x_3),  (2.35)

and, assuming x_0 \equiv x_{lens} and W_e \equiv W_e(u, v), the image creation equation:

I(u, v) = \int_{A^2} W_e(x_0 \to x_1) \, L(x_1 \to x_0) \, G(x_1 \leftrightarrow x_0) \, dA(x_1) \, dA(x_0).  (2.36)

Next, recursively substituting the right-hand side of 2.35 for L in 2.36, one obtains:

I(u, v) = \int_{A^2} W_e(x_0 \to x_1) G(x_1 \leftrightarrow x_0) L_e(x_1 \to x_0) \, dA(x_1) \, dA(x_0) +

+ \int_{A^3} W_e(x_0 \to x_1) G(x_1 \leftrightarrow x_0) f_s(x_2 \to x_1 \to x_0) G(x_2 \leftrightarrow x_1) L_e(x_2 \to x_1) \, dA(x_2) \, dA(x_1) \, dA(x_0) +

+ \int_{A^4} W_e(x_0 \to x_1) G(x_1 \leftrightarrow x_0) f_s(x_2 \to x_1 \to x_0) G(x_2 \leftrightarrow x_1) f_s(x_3 \to x_2 \to x_1) G(x_3 \leftrightarrow x_2) L_e(x_3 \to x_2) \, dA(x_3) \cdots dA(x_0) +

+ \cdots = \sum_{i=0}^{\infty} \int_{A^{i+2}} W_e(x_0 \to x_1) \, \alpha_i \, L_e(x_{i+1} \to x_i) \, d\mu(x_i),  (2.37)


where

\alpha_i = G(x_1 \leftrightarrow x_0) \prod_{j=1}^{i} f_s(x_{j+1} \to x_j \to x_{j-1}) \, G(x_{j+1} \leftrightarrow x_j),

d\mu(x_i) = dA(x_0) \, dA(x_1) \cdots dA(x_{i+1}).

The expressions 2.12, 2.35, and 2.37 for evaluating radiance L are obviously equivalent, but different rendering algorithms often prefer a particular form over another.

The volumetric version of the light transport equation written as an integral over light paths was formulated by Pauly [Pauly, 1999]. The main idea of this concept is integration over paths built from any combination of surface and volume scattering. Let b_i be the ith bit of the binary representation of b, b \in N. Let x = x_0 x_1 \ldots x_k be a light transport path with k + 1 vertexes. The integration domain is defined as:

\Psi_b^k = \psi_0 \times \psi_1 \times \cdots \times \psi_k, \quad \psi_i = \begin{cases} A, & \text{if } b_i = 0 \\ V, & \text{if } b_i = 1 \end{cases}.  (2.38)

The integration measure is:

d\mu_b^k(x) = \psi_0 \psi_1 \cdots \psi_k, \quad \psi_i = \begin{cases} dA(x_i), & \text{if } b_i = 0 \\ dV(x_i), & \text{if } b_i = 1 \end{cases}.  (2.39)

The geometric factor is redefined as:

G_x(x_1 \leftrightarrow x_2) = V(x_1 \leftrightarrow x_2) \, tr(x_1 \leftrightarrow x_2) \frac{c(x_1) \, c(x_2)}{\|x_1 - x_2\|^2}, \quad c(x) = \begin{cases} |\cos\theta|, & \text{if } x \in A \\ 1, & \text{if } x \in V \end{cases}.  (2.40)

The emitted radiance is

L_{ex}(x_2 \to x_1) = \begin{cases} L_e(x_2 \to x_1), & \text{if } x_2 \in A \\ L_{ev}(x_2 \to x_1), & \text{if } x_2 \in V \end{cases},  (2.41)

and the scattering function is

f_x(x_2 \to x_1 \to x_0) = \begin{cases} f_s(x_2 \to x_1 \to x_0), & \text{if } x_1 \in A \\ \sigma_s f_p(x_2 \to x_1 \to x_0), & \text{if } x_1 \in V \end{cases}.  (2.42)

Finally, the integral formulation of the volumetric light transport equation is:

I(u, v) = \sum_{i=0}^{\infty} \sum_{b=1}^{2^{i+1}-1} \int_{\Psi_{2b}^{i+1}} W_e(x_0 \to x_1) \, \alpha_x^i \, L_{ex}(x_{i+1} \to x_i) \, d\mu_{2b}^{i+1}(x),  (2.43)

where \alpha_x^i is similar to \alpha_i, but is defined in terms of G_x and f_x instead of G and f_s. This equation implicitly makes the common sense assumption that x_0, which is a point on the camera lens, is always a surface point, i.e. no volumetric sensors are allowed and importance is always emitted from the lens surface. However, radiance emission from a volume is accounted for properly.

2.3.3 Image Function

In its basic version, the image contains only intensity values. However, we found that the depth of the first scattering event (relative to the camera) is potentially very useful in many postprocessing techniques. Thus, the image function obtained during the rendering process is:

I : [0, 1]^2 \times \lambda \longrightarrow R^+,  (2.44)

I_d : [0, 1]^2 \longrightarrow R^+ \cup \{\infty\} \cup \{V\},  (2.45)

where \infty means that the ray escapes to infinity and V means that the ray was scattered in a participating medium. In the latter case there is no way to reliably define the depth of the first intersection.


Chapter 3

Monte Carlo Methods

The light transport equation has an analytical solution only for very simple scenes, which are useless in practice, thus appropriate numerical algorithms are necessary to solve it. Classic quadrature rules, e.g. Newton-Cotes or Gaussian quadratures, are not well suited to light transport. Light transport integrals are very high-dimensional and the integrated functions are discontinuous, which results in poor convergence of quadrature rules based on regular grids. Non-deterministic sampling of the integrated functions, however, gives much better results.

Non-deterministic algorithms use random numbers to compute the result. According to [Pharr & Humphreys, 2004], they can be grouped into two broad classes – Las Vegas algorithms, where random numbers are used only to accelerate computations in the average case, with a final deterministic result (e.g. Quick Sort with a randomly selected pivot), and Monte Carlo algorithms, which give correct results on average. For example, the result of Monte Carlo integration is not certainly correct, but nevertheless has strict probabilistic bounds on its error. From a mathematical point of view, all non-deterministic rendering algorithms can be seen as variants of Monte Carlo integration.

The purpose of this chapter is the explanation of statistical methods used in rendering algorithms. Since these methods are well known, the discussion is brief. Mathematical statistics is explained in detail in [Plucinska & Plucinski, 2000]. Good reference books on general Monte Carlo methods are [Fishman, 1999] and [Gentle, 2003], and [Niederreiter, 1992] on quasi-Monte Carlo techniques. Applications of Monte Carlo methods in computer graphics are presented in [Veach, 1997] and [Pharr & Humphreys, 2004].

The rest of this chapter starts with a short review of statistical terms and basic Monte Carlo integration techniques. Next, the distinction between biased and unbiased integration algorithms is explained. This is followed by a description of selected, most useful variance (i.e. error) reduction techniques. Finally, some quasi-Monte Carlo methods are presented.

3.1 Monte Carlo Integration

This section starts with a description of concepts employed in statistics. These ideas are then used to construct estimators which approximate integrals of arbitrary functions. Finally, there is a brief analysis of the error and convergence rate of Monte Carlo integration, based on variance.

3.1.1 Statistical Concepts

Let \Psi be a set (called the sample space)^1. Let A be a \sigma-algebra on \Psi and B be the \sigma-algebra of Borel sets on R. A measure P defined on A is a probability if P(\Psi) = 1. The function X : \Psi \to R

^1 Typically the sample space is denoted by \Omega. However, in this work \Omega is reserved for the space of unit directional vectors, thus the sample space is denoted by \Psi to avoid confusion.



is a single-dimensional random variable if:

\forall x \in R : X^{-1}((-\infty, x)) = \{\psi : X(\psi) < x\} \in A.  (3.1)

The probability distribution of a random variable is defined as:

P_X(S) = P(\{\psi : X(\psi) \in S\}), \quad \forall S \in B.  (3.2)

The cumulative distribution function (cdf) is:

cdf(x) = P_X((-\infty, x)), \quad \forall x \in R.  (3.3)

The cumulative distribution function cdf(x) may be interpreted as the probability of the event that the randomized value of X happens to be less than or equal to a given x:

cdf(x) = \Pr\{X \leq x\}.  (3.4)

The corresponding probability density function (pdf or p) is:

pdf(x) = \frac{d\,cdf(x)}{dx}.  (3.5)

Let X_1, X_2, \ldots, X_n be random variables and B^n be the \sigma-algebra of Borel sets on R^n. The vector X = (X_1, X_2, \ldots, X_n) is a multidimensional random variable if

P_X(S) = P(\{\psi : X(\psi) \in S\}), \quad \forall S \in B^n.  (3.6)

The cdf of a multidimensional random variable is:

cdf(x) = P_X((-\infty, x)), \quad \forall x \in R^n,  (3.7)

and its corresponding pdf is:

pdf(x) = \frac{\partial^n cdf(x)}{\partial x_1 \partial x_2 \cdots \partial x_n}.  (3.8)

The relationship between pdf and cdf can be expressed in a more general way using measure theory:

pdf(x) = \frac{d\,cdf(x)}{d\mu(x)} \quad \text{and} \quad cdf(x) = \int_D pdf(x) \, d\mu(x).  (3.9)

The expected value of a random variable (single- or multidimensional) Y = f(X) is defined as:

E[Y] = \int_\Psi f(x) \, pdf(x) \, d\mu(x),  (3.10)

and its variance is:

V[Y] = E\left[(Y - E[Y])^2\right].  (3.11)

The standard deviation \sigma, which is useful in error estimation, is defined as the square root of the variance:

\sigma[X] = \sqrt{V[X]}.  (3.12)

The expected value and variance have the following properties for each \alpha \in R:

E[\alpha X] = \alpha E[X],  (3.13)
V[\alpha X] = \alpha^2 V[X].  (3.14)

The expected value of a sum of random variables is the sum of the expected values:

E\left[\sum_{i=1}^{N} X_i\right] = \sum_{i=1}^{N} E[X_i].  (3.15)

A similar equation holds for variance if and only if the random variables are independent. Using these expressions and some algebraic manipulation, the variance can be reformulated:

V[X] = E\left[(X - E[X])^2\right] = E\left[X^2 - 2X E[X] + E[X]^2\right] = E\left[X^2\right] - E[X]^2.  (3.16)


3.1.2 Estimators of Integrals

Let I be the integral to evaluate:

I = \int_\Psi f(x) \, d\mu(x).  (3.17)

The basic Monte Carlo estimator of this integral is:

I \approx F_N = \frac{1}{N} \sum_{i=1}^{N} \frac{f(X_i)}{pdf(X_i)},  (3.18)

where pdf(x) > 0 wherever f(x) \neq 0. Using the definition of expected value, it may be shown that the expected value of estimator 3.18 is equal to integral 3.17:

E[F_N] = E\left[\frac{1}{N} \sum_{i=1}^{N} \frac{f(X_i)}{pdf(X_i)}\right] = \frac{1}{N} \sum_{i=1}^{N} \int_\Psi \frac{f(x)}{pdf(x)} \, pdf(x) \, d\mu(x) = \int_\Psi f(x) \, d\mu(x),  (3.19)

thus the estimator 3.18 produces the correct result on average. The variance of this estimator can be expressed as:

V[F_N] = V\left[\frac{1}{N} \sum_{i=1}^{N} X_i\right] = \frac{1}{N^2} V\left[\sum_{i=1}^{N} X_i\right] = \frac{1}{N^2} \sum_{i=1}^{N} V[X_i] = \frac{1}{N} V[F].  (3.20)

This expression for the variance is valid if and only if the X_i are independent. The variance of a single sample V[F] is equal to:

V[F] = E[F^2] - E[F]^2 = \int_\Psi \frac{f^2(x)}{pdf(x)} \, d\mu(x) - \left(\int_\Psi f(x) \, d\mu(x)\right)^2.  (3.21)

The convergence rate of the estimator F_N can be obtained from Chebyshev's inequality:

\Pr\left\{|X - E[X]| \geq \sqrt{\frac{V[X]}{\delta}}\right\} \leq \delta,  (3.22)

which holds for any fixed threshold \delta > 0 and any random variable X whose variance V[X] < \infty. Substituting the estimator F_N into Chebyshev's inequality yields:

\Pr\left\{|F_N - I| \geq \frac{1}{\sqrt{N}} \sqrt{\frac{V[F]}{\delta}}\right\} \leq \delta,  (3.23)

thus for any fixed threshold \delta the error decreases at a rate of O(1/\sqrt{N}).
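To make estimator 3.18 concrete, the following minimal C++ sketch integrates a one-dimensional function over [0, 1) with a uniform pdf (pdf(x) = 1); the function and variable names are illustrative only and do not come from the renderer described in this thesis:

    #include <cmath>
    #include <cstdio>
    #include <functional>
    #include <random>

    // Basic Monte Carlo estimator (equation 3.18) over [0, 1) with a
    // uniform pdf, i.e. pdf(x) = 1, so each sample contributes f(X_i).
    double estimate(const std::function<double(double)>& f, int N, unsigned seed) {
        std::mt19937 gen(seed);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        double sum = 0.0;
        for (int i = 0; i < N; ++i)
            sum += f(u(gen));          // f(X_i) / pdf(X_i), pdf = 1
        return sum / N;                // F_N = (1/N) * sum
    }

    int main() {
        // The integral of x^2 over [0, 1) is 1/3; the estimates approach
        // it at the O(1/sqrt(N)) rate derived in equation 3.23.
        for (int N : {100, 10000, 1000000})
            std::printf("N = %7d  F_N = %.6f\n", N,
                        estimate([](double x) { return x * x; }, N, 42u));
        return 0;
    }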

3.1.3 Biased and Unbiased Methods

All non-deterministic integration algorithms can be divided into two fundamental groups – unbiased and biased. In statistics, the bias \beta is defined as the difference between the expected value of the estimator F_N and the true value of the estimated quantity Q:

\beta[F_N] = E[F_N] - Q.  (3.24)

Thus, unbiased algorithms produce correct results on average, without any systematic errors. However, the shortcoming of these algorithms is variance, which is likely to be larger than in biased ones. The variance appears as noise in the rendered image. Non-trivial scenes require a lot of computation to reduce this distracting artifact to an acceptable level. On the other hand, biased methods exhibit systematic error. For example, a given point in a rendered image can be


consistently too bright regardless of the number of samples N evaluated. Biased methods can still be consistent (asymptotically unbiased), as long as the error decreases to zero with an increasing amount of computation:

\lim_{N \to \infty} E[F_N] = Q.  (3.25)

The bias is usually difficult to estimate, and even if a rendered image does not appear noisy, it can still have substantial inaccuracies, typically seen as excessive blurring or illumination artifacts on fine geometrical details. However, despite their non-zero bias, biased methods tend to converge substantially faster (they have lower variance) than unbiased ones.

3.2 Variance Reduction Techniques

The basic Monte Carlo estimator can potentially have high variance, which directly leads to poor efficiency. This section briefly reviews general methods which substantially reduce the variance of this estimator without introducing bias. Biased methods used in computer graphics, on the other hand, are often dedicated to particular light transport algorithms, and are therefore described in Chapter 4.

3.2.1 Importance and Multiple Importance Sampling

The variance of the Monte Carlo estimator 3.18 can be decreased when pdf(x) is made more nearly proportional to f(x). Intuitively, this approach tends to place relatively more samples wherever the integrand is large, therefore reducing integration error. In particular, when pdf(x) \propto f(x) is satisfied exactly, the variance is zero:

pdf(x) = c f(x), \quad c = \left(\int_\Psi f(x) \, d\mu(x)\right)^{-1},  (3.26)

and therefore

V[F] = \int_\Psi \frac{f^2(x)}{pdf(x)} \, d\mu(x) - \left(\int_\Psi f(x) \, d\mu(x)\right)^2 = c^{-1} \int_\Psi f(x) \, d\mu(x) - \left(\int_\Psi f(x) \, d\mu(x)\right)^2 = 0.

However, in order to achieve zero variance, the function must be integrated analytically to obtain the normalization constant c. This is impossible, since otherwise there would be no need to use Monte Carlo integration at all. Fortunately, using a pdf which is proportional, or almost proportional, to at least one of the factors of f typically decreases variance. This technique is called Importance Sampling.

Suppose that f(x) can be decomposed into a product f(x) = f_1(x) f_2(x) \cdots f_n(x) and there exist probability densities pdf_i proportional (or roughly proportional) to each factor f_i. If standard Importance Sampling is used, the pdf_i used for sampling f(x) has to be chosen at algorithm design time. This can have disastrous consequences for algorithm efficiency if the chosen pdf_i poorly matches the overall shape of f(x). In this case, Importance Sampling can actually increase variance over sampling with uniform probability, and Multiple Importance Sampling [Veach & Guibas, 1995] can be used instead. This technique was designed to improve the reliability of Importance Sampling when the appropriate pdf cannot be chosen at design time. The main idea of this method is to define more than one pdf (each of them a potentially good candidate for importance sampling) and let the algorithm choose the best one at runtime, when the actual shape of the integrand is known. The algorithm does this by computing appropriate weights and returning the estimator as a weighted sum of samples from these pdfs:

F_{nm} = \sum_{i=1}^{n} \frac{1}{m} \sum_{j=1}^{m} w_i(X_{ij}) \frac{f(X_{ij})}{pdf_i(X_{ij})}, \quad \forall x \; \sum_{i=1}^{n} w_i(x) = 1.  (3.27)


The appropriate choice of the weights w_i is crucial for obtaining a low variance estimator. According to [Veach & Guibas, 1995], the following set of weights is the optimal choice:

w_i(x) = \frac{pdf_i(x)}{\sum_{j=1}^{n} pdf_j(x)}.  (3.28)

However, Multiple Importance Sampling causes an issue that has a large impact on the design and implementation of sampling routines. The standard Importance Sampling technique requires just two methods – sampling points x_i with a given probability pdf(x_i) and evaluating f(x_i). Multiple Importance Sampling, however, requires an additional operation – computing pdf(x) for an arbitrary argument x. Intuitively, the operation may be interpreted as 'compute the hypothetical probability of returning the given x'. This operation is usually more difficult than computing the probability while sampling x_i, since the algorithm has no knowledge of the random choices necessary to select an arbitrary value, as it has in the sampling procedure. Fortunately, when approximate probabilities are computed, the Multiple Importance Sampling estimator is still correct, but a crude approximation hurts performance.
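As an illustration of estimator 3.27 with the balance heuristic weights 3.28, the following C++ sketch combines two sampling techniques (n = 2, m = 1) for a simple one-dimensional integral; the densities, the integrand and all names are arbitrary assumptions made for this example, not taken from the thesis implementation:

    #include <cmath>
    #include <cstdio>
    #include <random>

    // Two sampling techniques for x in [0, 1):
    //   p1: uniform density, p1(x) = 1
    //   p2: linear density,  p2(x) = 2x, sampled by CDF inversion as sqrt(u)
    static double p1(double) { return 1.0; }
    static double p2(double x) { return 2.0 * x; }

    // Balance heuristic weight (equation 3.28) for the technique with pdf pi.
    static double weight(double pi, double pother) { return pi / (pi + pother); }

    // One Multiple Importance Sampling estimate (equation 3.27, n = 2, m = 1)
    // of the integral of f over [0, 1).
    template <typename Fn>
    double misEstimate(Fn f, std::mt19937& gen) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        double x1 = u(gen);               // sample drawn from p1
        double x2 = std::sqrt(u(gen));    // sample drawn from p2
        double e1 = weight(p1(x1), p2(x1)) * f(x1) / p1(x1);
        double e2 = weight(p2(x2), p1(x2)) * f(x2) / p2(x2);
        return e1 + e2;                   // expectation equals the integral of f
    }

    int main() {
        std::mt19937 gen(7);
        auto f = [](double x) { return x * x; };  // integral over [0,1) is 1/3
        double sum = 0.0;
        const int N = 100000;
        for (int i = 0; i < N; ++i) sum += misEstimate(f, gen);
        std::printf("MIS estimate: %.6f (exact 1/3)\n", sum / N);
        return 0;
    }

Note that both densities are evaluated at both sample points; this is precisely the 'compute pdf(x) for an arbitrary argument' operation discussed above.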

3.2.2 Russian Roulette and Splitting

Russian roulette and Splitting are designed to adaptively change sampling density without introducing bias. These two techniques were introduced to computer graphics by Arvo and Kirk [Arvo & Kirk, 1990]. Suppose that the estimator F is a sum of estimators F = F_1 + F_2 + \ldots + F_n. Russian roulette allows randomly skipping the evaluation of these terms:

F_i' = \begin{cases} \dfrac{F_i - (1 - q_i)c}{q_i}, & \text{with probability } q_i, \\ c, & \text{with probability } 1 - q_i, \end{cases}  (3.29)

where c is an arbitrary constant, typically c = 0. When the estimator is a sum of an infinite number of terms S = F_1 + F_2 + \ldots + F_i + \ldots, Russian roulette can still be applied, provided that the sum S is finite. Let S_i = F_i + F_{i+1} + \ldots be the partial sum. Then S can be re-expressed as S = S_1, S_1 = F_1 + S_2, S_2 = F_2 + S_3, and so on. Each sum S_i is then evaluated with probability q_i, and set to 0 otherwise. Provided that at most a finite number of the q_i are equal to 1 and \exists \varepsilon > 0 : q_i < 1 - \varepsilon for almost all q_i, the evaluation of the sum S is randomly terminated with probability 1 after some n terms. This leads to the expression:

S' = \frac{1}{q_1}\left(F_1 + \frac{1}{q_2}\left(F_2 + \ldots + \frac{1}{q_n} F_n\right) \ldots\right) = \sum_{i=1}^{n} \left(\prod_{j=1}^{i} \frac{1}{q_j}\right) F_i.  (3.30)

Russian roulette, however, increases the variance of the estimator. Nevertheless, since it reduces its computational cost, Russian roulette can improve the estimator efficiency (the product of variance and cost) if the probabilities q_i are chosen carefully. Moreover, Russian roulette can be used to terminate the computation of infinite series without introducing statistical bias.
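A minimal C++ sketch of the termination scheme 3.30 applied to an infinite series; here the first term is always evaluated (q_1 = 1), and the magnitude-based choice of the remaining probabilities q_i is a common heuristic assumed for this example, not a rule prescribed by the text:

    #include <algorithm>
    #include <cmath>
    #include <random>

    // Evaluates S = F_1 + F_2 + ... by Russian roulette (equation 3.30):
    // the continuation to term i+1 survives with probability q and is
    // reweighted by 1/q, so the expected value of the returned quantity
    // equals the full sum. nextTerm(i) is assumed to return term F_i.
    template <typename NextTerm>
    double rouletteSum(NextTerm nextTerm, std::mt19937& gen) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        double weight = 1.0;   // product of 1/q_j over the surviving prefix
        double sum = 0.0;
        for (int i = 1;; ++i) {
            double term = nextTerm(i);
            sum += weight * term;
            // Heuristic: survival probability tied to the magnitude of the
            // current term, capped below 1 so termination is guaranteed.
            double q = std::min(0.95, std::abs(term));
            if (u(gen) >= q) break;   // terminate with probability 1 - q
            weight /= q;              // reweight to keep the estimator unbiased
        }
        return sum;
    }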

Splitting works in the opposite direction to Russian roulette. Splitting increases the number of samples in order to reduce variance. It increases computation time, but nevertheless, if performed carefully, it can improve sampling efficiency. According to [Veach, 1997], the splitting technique works as follows:

F_i' = \frac{1}{n} \sum_{j=1}^{n} F_{ij},  (3.31)

where the F_{ij} are independent samples from F_i.

3.2.3 Uniform Sample Placement

Random samples tend to clump together, leaving large portions of the domain relatively empty. An uneven distribution of samples leads to increased variance of estimators and therefore low sampling


efficiency. More advanced sampling techniques try to spread samples as evenly as possible over the entire integration domain, which is typically assumed to be a unit s-dimensional hypercube.

Stratified Sampling

The stratified sampling method splits the integration domain \Psi into k non-overlapping smaller subdomains \Psi_1, \ldots, \Psi_k, called strata, and draws n_i samples from each \Psi_i. The total number of samples is not modified, but the samples are better distributed. Provided that no n_i is equal to zero, the result is still correct, and provided that each n_i is proportional to the relative volume of the respective \Psi_i, stratified sampling never increases variance. According to [Veach, 1997], stratified sampling works most efficiently if the integrand mean values in different strata are as different as possible; if the mean values are identical, stratified sampling does not help at all. An example stratified pattern of 16 samples is presented in Figure 3.1. Each of the 16 strata contains exactly one sample.
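A minimal C++ sketch of generating a jittered (stratified) 2D pattern with one sample per stratum, like the pattern shown in Figure 3.1; the function name and output layout are arbitrary choices made for illustration:

    #include <random>
    #include <utility>
    #include <vector>

    // Generates n*n stratified (jittered) samples in [0,1)^2: the unit
    // square is split into an n-by-n grid of strata, and exactly one
    // uniformly placed sample is drawn inside each stratum.
    std::vector<std::pair<double, double>> stratified2D(int n, std::mt19937& gen) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        std::vector<std::pair<double, double>> samples;
        samples.reserve(static_cast<size_t>(n) * n);
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                samples.emplace_back((i + u(gen)) / n, (j + u(gen)) / n);
        return samples;
    }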

Unfortunately, stratified sampling has two major drawbacks. First, whenever the dimensionality s of the integration domain \Psi is high, the number of subdomains k quickly becomes prohibitive. For example, suppose that s = 10 and stratification splits the domain into four parts along each dimension. In this case, k = 4^{10} \approx 10^6, which is typically far too much. Second, in order to optimally stratify the integration domain, the number of drawn samples should be known in advance. This is a major limitation if the algorithm is designed to draw as many samples as necessary to achieve a desired level of accuracy of the solution.

Latin Hypercube Sampling

Latin hypercube sampling stratifies the projections of the integration domain onto each axis. The cost of this technique does not increase with the dimensionality s of the integration domain; however, Latin hypercube sampling does not provide multidimensional stratification. An example Latin hypercube pattern of 16 samples is presented in Figure 3.1. Each of the 16 horizontal and each of the 16 vertical stripes contains exactly one sample. Latin hypercube sampling can be implemented very efficiently using the following formula:

X_i^j = \frac{\pi^j(i) - \xi_i^j}{N},  (3.32)

where i is the sample number, j is the dimension number, N is the number of samples, \pi^j is the jth random permutation of the sequence of numbers 1, 2, \ldots, N, and all \xi_i^j are independent canonical random numbers. Latin hypercube sampling works best when the single-dimensional components of the integrand are much more important than the others, i.e. f(x_1, x_2, \ldots, x_s) = f(x_1) + f(x_2) + \ldots + f(x_s) + f_{res}(x_1, x_2, \ldots, x_s) and |f_{res}(x_1, x_2, \ldots, x_s)| \ll |f(x_1, x_2, \ldots, x_s)|. Nevertheless, the variance of Latin hypercube sampling is never much worse than the variance of common unstratified sampling:

\forall N \geq 2 \quad V[F'] \leq \frac{N}{N-1} V[F],  (3.33)

where V[F] is the variance of an estimator F using unstratified sampling and V[F'] is the variance of the same estimator with Latin hypercube sampling. Thus, in the worst case, using Latin hypercube sampling can result in the variance of standard sampling with one observation less. Since N has to be known in advance, Latin hypercube sampling, similarly to stratified sampling, does not allow an adaptive choice of the number of samples.
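Formula 3.32 translates almost directly into code; in the following C++ sketch the only assumptions are the per-dimension random permutation of 1, 2, ..., N and the sample-major output layout:

    #include <algorithm>
    #include <numeric>
    #include <random>
    #include <vector>

    // Latin hypercube sampling (equation 3.32): returns N samples in [0,1)^s,
    // stored sample-major, i.e. result[i][j] is coordinate j of sample i.
    // On each axis, every slab of width 1/N receives exactly one sample.
    std::vector<std::vector<double>> latinHypercube(int N, int s, std::mt19937& gen) {
        std::uniform_real_distribution<double> xi(0.0, 1.0);
        std::vector<std::vector<double>> result(N, std::vector<double>(s));
        std::vector<int> perm(N);
        for (int j = 0; j < s; ++j) {
            std::iota(perm.begin(), perm.end(), 1);      // 1, 2, ..., N
            std::shuffle(perm.begin(), perm.end(), gen); // permutation pi^j
            for (int i = 0; i < N; ++i)
                result[i][j] = (perm[i] - xi(gen)) / N;  // (pi^j(i) - xi) / N
        }
        return result;
    }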

3.3 Quasi-Monte Carlo Integration

Almost all implementations of Monte Carlo algorithms use random numbers in theory and some kind of pseudorandom number generator in practice. This is not strictly correct, but if a sequence of pseudorandom numbers satisfies some constraints, the convergence of a Monte Carlo algorithm


Figure 3.1: Comparison of uniform sample placement techniques. Left image: reference pure random sampling. Middle image: stratified sampling. Right image: Latin hypercube sampling.

with a pseudorandom number generator does not differ much from purely random Monte Carlo. This approach can be pushed further. So-called quasi-Monte Carlo methods use carefully designed deterministic sequences of numbers for sampling the integrated functions, which, in practice, provides slightly better convergence than random Monte Carlo.

This section starts with a brief explanation of selected methods of evaluating the quality of sequences of quasi-Monte Carlo samples. Next, it lists the properties which such sequences should have in order to be applicable to integrating the Light Transport Equation. Then, there is a short description of a few particularly useful methods of generating them, and finally, a comparison of Monte Carlo and quasi-Monte Carlo integration and a summary of quasi-Monte Carlo limitations and potential issues related to using this approach.

3.3.1 Desired Properties and Quality of Sample Sequences

A sequence of sample points which is to be used to solve the Light Transport Equation by means of quasi-Monte Carlo integration must have at least three properties. First, the number of samples necessary to obtain the desired accuracy of the solution is not known in advance. Therefore, the sequence must be infinite. That is, when a given method is used to obtain two sequences with n and n + 1 sample points, the n sample points from the first sequence must be a prefix of the second sequence. Second, each individual light scattering event increases the dimensionality of the integration domain by two. Since, at least in theory, light can bounce indefinitely, the dimensionality of the sequence must be unlimited, too. Additionally, since individual samples are evaluated in parallel, it is desirable to be able to compute the ith sample point without prior evaluation of points 0, \ldots, i - 1.

True random numbers clearly satisfy these properties, but with some drawbacks – they may be obtained only by means of external devices attached to a computer, the sequence obtained in one algorithm run is unrepeatable, and the integration error resulting from using true random numbers is higher than when carefully designed deterministic sequences are used. Thus, infinite pseudorandom sequences which minimize integration error have to be designed.

The sequence domain is an s-dimensional unit hypercube, where s may be infinite. Sample points defined in the hypercube are then transformed to the necessary domain independently of the sample generation algorithm. The quality of a low discrepancy sequence determines how large the mean error of integrating functions using sample points from the given sequence is. Intuitively, the more evenly the sample points are spread in the s-dimensional hypercube and in its lower dimensional projections, the better the sequence quality is.

Unfortunately, there is no perfect, precise measure of sequence quality which is directly related to integration error. The commonly used measure is discrepancy. For each axis-aligned box inside the unit hypercube, the absolute difference between the fraction of sample points falling into the box and the volume of the box is calculated; the discrepancy is the supremum of these values over all such boxes. The star discrepancy quality measure limits the considered boxes to ones


which include the origin of the hypercube. The star discrepancy of N samples is then defined by the following formula:

D_N^* = \sup_{b \in B} \left| \frac{\#\{x_i \in b\}}{N} - \mu(b) \right|,  (3.34)

where B is the set of origin-anchored boxes inside the unit hypercube, \mu(b) is the volume of box b, and x_i is the ith sample point. The star discrepancy of true random numbers is, with probability one, asymptotically equal to:

D_N^* = O\left(\sqrt{\frac{\log \log N}{N}}\right) \approx O\left(\frac{1}{\sqrt{N}}\right).  (3.35)

The best known low discrepancy sequences in s-dimensional space have a discrepancy of:

D_N^* = O\left(\frac{(\log N)^s}{N}\right),  (3.36)

while regular grids have the following discrepancy:

D_N^* = O\left(\frac{1}{\sqrt[s]{N}}\right),  (3.37)

which explains why they behave so poorly when the dimensionality is large. Low discrepancy sequences obviously have the lowest discrepancy, but, in practice, when s is large, the number of samples N at which these sequences start to be better than random numbers is far too large to be useful.

3.3.2 Low Discrepancy Sequences

There is a number of techniques for generating low discrepancy sequences. Simple and popular methods are based on radical inverses. Let the sample number be i. A radical inverse in a base b is found by evaluating the digits of the representation of i in base b, and then reflecting the string of these digits around the decimal point. If i is written as:

i = d_{n-1} d_{n-2} \ldots d_0 = \sum_{k=0}^{n-1} d_k b^k,  (3.38)

then the radical inverse r is:

r = 0.d_0 d_1 \ldots d_{n-1} = \sum_{k=0}^{n-1} d_k b^{-k-1}.  (3.39)

The radical inverse in base 2 is called the van der Corput sequence. This sequence is one-dimensional, and cannot be converted to multiple dimensions by simply taking subsequent samples; e.g. if every even sample point is used as an x coordinate and every odd point as a y coordinate, all sample points fall onto diagonal lines across the 2D domain. A solution to this problem is the Halton sequence [Halton, 1960]. The Halton sequence is built from radical inverses for each dimension, with the restriction that all base numbers must be relatively prime. Typically, the first s primes are used for sampling in an s-dimensional space. Unfortunately, the sequence quality, and therefore integration accuracy, degrades quickly with increasing base. A partial solution to this issue is the Faure sequence [Faure, 1982]. Suppose that the sampling domain is s-dimensional. The Faure sequence is constructed over a finite field of prime order p, with p not less than s. Since sequences built over the smallest possible primes have the best properties, the smallest such prime is typically chosen. This feature is also an obvious drawback of the Faure sequence – s has to be known in advance, or a crude overestimation of it is necessary. The first coordinate of a point from a Faure sequence is a radical inverse in base p. Subsequent coordinates are evaluated by multiplying the digit vector used to construct the radical inverse by a matrix defined as the Pascal triangle modulo p. The modified digit vector is then reflected around the decimal point. All operations are performed in the p-element finite field. Such


multiplications can be performed p - 1 times before the results start to repeat. For example, the digit vector of the ith coordinate of a point P from a Faure sequence in base 3 is constructed as follows:

P_i = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & \cdots \\
0 & 1 & 2 & 0 & 1 & 2 & \cdots \\
0 & 0 & 1 & 0 & 0 & 1 & \cdots \\
0 & 0 & 0 & 1 & 1 & 1 & \cdots \\
0 & 0 & 0 & 0 & 1 & 2 & \cdots \\
0 & 0 & 0 & 0 & 0 & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}^i
\times
\begin{pmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \\ \vdots \end{pmatrix},
\quad i = 0, 1, 2.  (3.40)

There is an alternative concept for evaluating the quality of low discrepancy sequences – (t, m, s)-nets and (t, s)-sequences. For example, suppose that the sampling domain is two dimensional, and there are n^2 points in a sequence. Stratified sampling would then result in exactly one sample in each [i/n, (i+1)/n] \times [j/n, (j+1)/n] block. On the other hand, Latin hypercube sampling would place exactly one sample in each [0, 1] \times [i/n^2, (i+1)/n^2] and [j/n^2, (j+1)/n^2] \times [0, 1] block. It is desirable to satisfy stratified sampling, Latin hypercube sampling, and any other similar domain partitionings together. Techniques for generating (t, m, s)-nets do exactly this.

Let B be an elementary interval in base b, which is an axis-aligned box inside the unit s-dimensional hypercube C:

B = \prod_{j=1}^{s} \left[ \frac{t_j}{b^{k_j}}, \frac{t_j + 1}{b^{k_j}} \right),  (3.41)

where 0 \leq t_j < b^{k_j}. A (t, m, s)-net in base b is then a set of N = b^m points placed in the hypercube C such that each elementary interval with volume b^{t-m} contains exactly b^t points, for each m \geq t \geq 0. Intuitively, t is a quality parameter, and the best nets are those with t = 0. In this case, each elementary interval contains exactly one sample, which results in as uniform a sample distribution as possible.

A (t, s)-sequence in base b is an infinite sequence in which \forall k \geq 0 each subsequence:

x_{kb^m}, \ldots, x_{(k+1)b^m - 1}  (3.42)

forms a (t, m, s)-net in base b. It is worth noting that the Faure sequence in base b is actually a (t, s)-sequence in that base. Additionally, any (t, s)-sequence is a low discrepancy sequence. Unfortunately, the good properties of (t, m, s)-nets, and therefore of (t, s)-sequences in base b, are obtained for sample numbers N = b^m, m = 0, 1, 2, \ldots, thus when N samples are not enough to obtain the desired accuracy, N should be increased b times. If b is large (from a practical point of view: larger than 2), (t, s)-sequences are of little use in rendering. Moreover, base 2 is also convenient due to very efficient implementations of (t, s)-sequence generators, which perform logic operations on individual bits instead of relatively costly integer multiplication, division and modulus. Algorithms for constructing (t, s)-sequences in base 2 are among the best choices for the generation of quasi-Monte Carlo sample points. Unfortunately, for a given s there may exist no sequence with the desired quality t. In particular, it has been proven that the minimal possible t grows linearly with s, that is, t_{min} = O(s). For example, the best (with t = 0) infinite sequence in base 2 exists only up to s = 2. If the quality requirements are relaxed a bit, to t = 1, the highest available s is 4.

The most common solution, and the solution chosen in our implementation, is to use a (t, s)-sequence for the first s dimensions of an s'-dimensional space, and then fill the remaining dimensions with pseudorandom numbers, for example based on hashing functions. If the sampling space is defined so that the first few dimensions affect most of the integration error, which is typically satisfied by integrals in the Light Transport Equation, a good quality infinitely dimensional sequence is achievable in this way.


3.3.3 Randomized Quasi-Monte Carlo Sampling

Quasi-Monte Carlo methods use deterministic, carefully designed sample sequences in order to minimize integration error. However, when these methods are used for rendering, regular sampling patterns are sometimes visible in the resulting images – see Figure 3.2, for example. What is more, all methods for estimating random error by means of variance are invalid with quasi-Monte Carlo sampling. Nevertheless, despite these drawbacks, quasi-Monte Carlo sampling tends to produce less error, even if the integrands are discontinuous and highly dimensional, which is common in graphics. These issues can be removed by randomized quasi-Monte Carlo methods [Hong & Hickernell, 2003]. High quality results are obtained by using randomly scrambled (t, m, s)-nets and (t, s)-sequences. The integration error resulting from these algorithms can be analyzed by means of variance, and they still have the good sample distribution properties of deterministic quasi-Monte Carlo samples.
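As a minimal illustration of randomizing a deterministic sequence, the C++ sketch below applies a Cranley-Patterson rotation (a shared random toroidal shift) to the van der Corput sequence; this is a simpler randomization than the scrambled nets and sequences cited above, and is shown only to convey the idea:

    #include <cmath>
    #include <random>
    #include <vector>

    // Radical inverse in base b, as sketched in Section 3.3.2.
    static double radicalInverse(unsigned i, unsigned b) {
        double r = 0.0, scale = 1.0 / b;
        for (; i > 0; i /= b, scale /= b) r += (i % b) * scale;
        return r;
    }

    // Cranley-Patterson rotation: every point of the deterministic sequence
    // is shifted by one shared random offset, modulo 1. Each rotated point
    // is then uniformly distributed, so variance-based error analysis
    // applies again, while the relative spacing of the points is preserved.
    std::vector<double> rotatedVanDerCorput(unsigned n, std::mt19937& gen) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        const double shift = u(gen);
        std::vector<double> pts(n);
        for (unsigned i = 0; i < n; ++i) {
            double x = radicalInverse(i, 2) + shift;
            pts[i] = x - std::floor(x);  // wrap around the unit interval
        }
        return pts;
    }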

3.3.4 Comparison of Monte Carlo and Quasi-Monte Carlo Integration

It is interesting to compare the behaviour of Monte Carlo sequences and various techniques of quasi-Monte Carlo sampling. This comparison is based on the integration of two functions. The first is a case which is well suited for QMC methods – a smooth 2D function given by the equation f(x, y) = \arctan(x + y), integrated over the square [-1, 1] \times [-1, 1]. The second is a more difficult example – a discontinuous 4D function g(x, y, z, w) = sign(x) sign(y) sign(z) sign(w) \exp(x^2 + y^2 + z^2 + w^2), integrated over the analogous 4D domain. Both functions can be integrated analytically in an obvious way; the result in both cases is 0. True random numbers for Monte Carlo integration are simulated by the Mersenne Twister pseudorandom number generator [Matsumoto & Nishimura,

Figure 3.2: Patterns resulting from quasi-Monte Carlo sampling. Top image: visible regular patterns due to the Niederreiter-Xing sequence. Bottom image: pseudorandom numbers do not produce noticeable patterns. Both images have been rendered at low quality to show the error more clearly.



Figure 3.3: Comparison of Monte Carlo and quasi-Monte Carlo integration error with respect to the number of samples taken. Left image: integration error of the smooth 2D function f(x, y) = \arctan(x + y) over [-1, 1]^2. Right image: integration error of the discontinuous 4D function g(x, y, z, w) = sign(x) sign(y) sign(z) sign(w) \exp(x^2 + y^2 + z^2 + w^2) over [-1, 1]^4. In both cases QMC converges substantially faster, yet ordinary MC is better for small numbers of samples.

1998], while QMC is based on the Halton and the base-2 Faure sequences in the 2D case, and the Niederreiter-Xing [Niederreiter & Xing, 1996] sequence in the 4D case. The error of integrating these functions with respect to the number of samples taken is shown in Figure 3.3.

3.3.5 Quasi-Monte Carlo Limitations

Quasi-Monte Carlo sampling behaves substantially differently from classic random Monte Carlo sampling. If quasi-Monte Carlo is used, there are pitfalls which must be avoided. We found two of them worth mentioning. First, it is a serious error to select every nth sample from a QMC sequence (see Figure 3.4), while it is perfectly fine with random numbers. For example, if only samples with even indexes are chosen from a van der Corput sequence, exactly half of the domain is sampled, and the other half contains no sample points. Second, samples from different sequences may correlate. For example, if Faure sequences in bases 2 and 3 are mixed, some 2D projections exhibit visible correlations, see Figure 3.4. Therefore, a single, well designed and proven to be correct, multidimensional sequence must be used for integration. It is a serious error if, for example, camera rays are generated using one sequence and light sources are sampled with another. This important aspect influenced the design of our rendering software, see Section 6.1.1. Figure 3.5 presents what may happen if this is not assured.
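The first pitfall is easy to reproduce with the radical inverse routine sketched in Section 3.3.2: the short C++ check below prints only even-indexed van der Corput points, all of which land in [0, 0.5), leaving the other half of the domain empty:

    #include <cstdio>

    // Radical inverse in base b, as sketched in Section 3.3.2.
    static double radicalInverse(unsigned i, unsigned b) {
        double r = 0.0, scale = 1.0 / b;
        for (; i > 0; i /= b, scale /= b) r += (i % b) * scale;
        return r;
    }

    int main() {
        // An even index has a zero least significant bit; after digit
        // reflection this becomes the leading binary digit, so every
        // even-indexed van der Corput point falls into [0, 0.5).
        for (unsigned i = 0; i < 16; i += 2)
            std::printf("x_%u = %f\n", i, radicalInverse(i, 2));
        return 0;
    }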


Figure 3.4: Undesired correlations between quasi-Monte Carlo samples. Left image: 2D projection of an erroneous 3D sequence generated from a van der Corput sequence, using the 3n, 3n + 1, 3n + 2 sequence points as the coordinates of the nth 3D point. Right image: 2D projection of an erroneous 5D sequence constructed from two Faure sequences in bases 2 and 3. The projection, where the two dimensions come from sequences in different bases, forms a 4 \times 3 grid with differences in sampling density between its individual rectangles.

Figure 3.5: Results of rendering with erroneous quasi-Monte Carlo sampling. The scene contains uniform gray matte walls and a sphere, illuminated by a rectangular light source with uniform intensity. All these conspicuous patterns are the results of correlation between camera rays and light source sampling.


Chapter 4

Light Transport Algorithms

Light transport algorithms provide numerical solutions to the light transport equation – these algorithms are responsible for the rendering process. Currently, only Monte Carlo or quasi-Monte Carlo based ray tracing techniques are capable of solving the light transport equation in its original form, without any simplifications. This chapter concentrates on an analysis of inefficiencies and difficult input scenes for different ray tracing algorithms, as well as the design of an improved algorithm which is not prone to the majority of these flaws.

Ray tracing and global illumination algorithms are not new. The earliest well known ray tracer was created by Whitted [Whitted, 1980]. Despite being a significant improvement over earlier methods, this algorithm still creates artificial-looking images. Its main drawbacks are the lack of global illumination, sharp shadows, sharp reflections and support for point light sources only. An improved version of this algorithm is [Cook et al., 1984]. It can create smooth shadows and reflections, but still cannot simulate global illumination. The first physically correct solution (under the assumption of geometric optics applicability) is Path Tracing [Kajiya, 1986]. There is, however, an earlier interesting algorithm for global illumination [Goral et al., 1984], but it is based on the radiosity technique and supports only ideal matte (Lambertian) reflection, and thus is not general enough to be considered a full global illumination solution. Two important rendering techniques – Russian roulette and Particle Tracing – were introduced by Arvo and Kirk [Arvo & Kirk, 1990]. Later, Particle Tracing led to bidirectional rendering techniques, and Russian roulette became one of the main probabilistic algorithms used in virtually every ray tracer.

The original Path Tracing method, despite being correct, is usually not particularly efficient. The improved algorithm [Pharr & Humphreys, 2004] is somewhat better, but still cannot cope with many common lighting conditions. Most notably, all variants of Path Tracing fail when simulating caustics or strong indirect illumination. Since the appearance of Path Tracing, a lot of research has been dedicated to making this class of algorithms more effective.

The important unbiased methods are Bidirectional Path Tracing [Veach, 1997] and Metropolis Light Transport [Veach, 1997]. The bidirectional algorithm handles indirect illumination and caustics much better than ordinary Path Tracing due to its ability to trace light paths from the camera and from light sources as well. The Metropolis algorithm is designed to handle cases in which much of the light is transported by relatively few paths, and the majority of paths have negligible or zero contribution. The Metropolis algorithm's ability to use mutations of previously found paths to create new ones, together with carefully evaluated mutation acceptance probabilities, ensures its high efficiency. However, a major drawback of the Metropolis method is the fact that it cannot compute absolute image brightness. Moreover, the Metropolis algorithm does not stratify samples well. Kelemen et al. [Kelemen et al., 2002] proposed mutations defined over a unit cube, which increases the mutation acceptance rate. A more recent algorithm – Energy Redistribution Path Tracing [Cline et al., 2005] – also uses a mutation scheme, but is free of some of the defects of Metropolis sampling.

The well known biased methods are Irradiance Caching [Ward et al., 1988, Ward & Heckbert,


1992] and Photon Mapping [Jensen, 2001]. Irradiance Caching is well suited only for diffuse scattering, while Photon Mapping works well with arbitrary reflection functions. Irradiance caching was recently extended to radiance caching [Krivanek, 2005, Krivanek et al., 2006, Jarosz et al., 2008]. The latter algorithm is capable of handling effects similar to Photon Mapping.

The major limitation of Photon Mapping is its excessively large memory consumption, making this technique difficult to use in complex environments. Since the appearance of the original Photon Mapping, a lot of work has been dedicated to improving or modifying it. The paper of Fradin et al. [Fradin et al., 2005] describes how Photon Mapping can be modified to effectively use external memory in order to render huge scenes, far beyond the capabilities of the original implementation. Fan et al. [Fan et al., 2005] illustrate how to incorporate the advantages of Metropolis sampling into the Photon Mapping algorithm. The original Photon Mapping fails when the scene contains difficult visibility conditions between light sources and the camera, because a lot of stored photons can potentially be invisible and therefore useless. The improved technique, on the other hand, tries to take the viewer position into account while building the photon map. This enhancement seems to be more reliable than the three pass version of Photon Mapping [Jensen, 2001]. The original final gathering was designed to substantially improve rendered image quality by reducing the blurriness resulting from using the photon map directly. However, this technique causes a lot of additional rays to be traced and therefore hurts Photon Mapping rendering speed drastically. Havran et al. [Havran et al., 2005] and Arikan et al. [Arikan et al., 2005] improve the efficiency of final gathering. Herzog et al. [Herzog et al., 2007] show a different algorithm for estimating irradiance when rendering the photon map. The technique is used to improve density estimation in diffuse or moderately glossy environments.

The rest of this chapter starts with a brief comparison of ray tracing with alternative rendering algorithms. Next, the concept of light transport paths and the local path sampling technique are described, together with its limitations, which happen to be a major handicap for a variety of unbiased ray tracing algorithms. Later, the chapter explains a novel approach to full spectral rendering, which provides more reliability and integrates well with Monte Carlo and quasi-Monte Carlo ray tracing. This is followed by a detailed analysis of the strengths and weaknesses of important existing light transport algorithms. Finally, an improved technique, designed to reduce the impact of some of the flaws of current light transport algorithms, is proposed.

4.1 Ray Tracing vs. Other Algorithms

Ray tracing algorithms are among a few well-known popular rendering methods. Based on point sampling of the scene radiance distribution, they are substantially different from other approaches. These algorithms are not always the best choice. However, when light transport has to be simulated exactly and the scene contains complex materials and complex geometry, there is currently no alternative to ray tracing algorithms. This section starts with the fundamental distinction between view dependent and view independent algorithms. Next, the principles of ray tracing are examined. This is followed by a brief description of selected alternative approaches – hardware accelerated rasterization and radiosity. Finally, there is some advice on when ray tracing is the best choice for rendering.

4.1.1 View Dependent vs. View Independent Algorithms

View dependent techniques compute a solution that is valid only for a particular view (i.e. camera location). The output of these algorithms is usually an image that can be displayed immediately. Their general advantage is that the majority of these techniques require a very small amount of additional memory and are capable of rendering huge scenes without using external storage. However, some algorithms are based on storing and interpolating already computed results, which accelerates rendering. Nevertheless, this storage is not necessary for the algorithm output, and if memory costs become too high, it is always possible to switch to a different algorithm which does not impose extra memory costs.


On the other hand, view independent methods compute the solution for all views simultaneously. The output is some kind of intermediate data, which requires additional processing to be displayed. The most common representation is light maps. Light maps are simple additional grayscale textures for all scene primitives which, when mixed with the normal textures, give the appearance of illuminated polygons without using any lights. The advantage of this approach is that light maps can be quickly rendered by graphics hardware, allowing real-time walkthroughs in globally illuminated scenes. Unfortunately, a scene textured with light maps must be static – any change to the scene, even the smallest, makes the entire solution invalid. What is more, the size of the intermediate solution of the full, unsimplified light transport equation is unacceptably huge. If a scene contains only matte, flat surfaces, the dimensionality of the entire solution is two (the solution domain is the union of all scene surfaces). On the other hand, the full solution requires a 6D domain (3D participating media instead of surfaces, 2D directions and 1D spectral data, S = V \times \Omega \times \lambda).

4.1.2 Ray Tracing Algorithms

Ray tracing is a class of view dependent techniques for generating images by tracing paths of light and accounting for their intersections with scene objects. In nature, light is emitted from light sources, and after scattering it may hit the surface of a sensor. The earliest ray tracing algorithm [Whitted, 1980] reverses this process, and follows light paths from the sensor, through scene objects, to light sources. Nowadays, much more complex ray tracing algorithms have been introduced. The broad class of these techniques consists of methods using two basic ray tracing operations – ray casting, which looks for the nearest intersection of a ray with scene geometry, and ray scattering, which generates a new, bounced ray after an intersection has been found.

From a mathematical point of view, ray tracing is a way of point sampling the scene radiance distribution. The image is formed by integrating radiance over many rays. In simplified versions of ray tracing, only specific phenomena can be rendered – e.g. perfect specular reflection and refraction, point light sources and sharp shadows. On the other hand, in ray tracing based full global illumination, any radiance distribution can be sampled by means of casting and scattering rays, albeit at a high computational cost.
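The two basic operations admit a very small interface; the following C++ sketch illustrates this structure only, and all type and member names are our assumptions rather than the interface of the renderer described in this thesis:

    #include <optional>

    struct Vec3 { double x, y, z; };
    struct Ray  { Vec3 origin, direction; };

    // Result of the ray casting operation: the nearest intersection.
    struct Hit {
        Vec3 position;
        Vec3 normal;
        int  materialId;
    };

    // Result of the ray scattering operation: a sampled bounce direction,
    // with the scattering function value and the sampling pdf needed by
    // the Monte Carlo estimator (equation 3.18).
    struct ScatterSample {
        Vec3   direction;
        double fs;    // scattering function value for the sampled direction
        double pdf;   // probability density of sampling that direction
    };

    // Abstract interface: the broad class of ray tracing algorithms can be
    // built from these two operations alone.
    class Scene {
    public:
        virtual ~Scene() = default;
        virtual std::optional<Hit> cast(const Ray& ray) const = 0;
        virtual ScatterSample scatter(const Hit& hit, const Vec3& incoming,
                                      double u1, double u2) const = 0;
    };

Here u1 and u2 stand for the canonical sample values driving the random scattering choice, which is where the (quasi-)Monte Carlo sequences of Chapter 3 enter the algorithm.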

4.1.3 Hardware Accelerated Rasterization

Rasterization is an algorithm for rendering which draws individual primitives pixel-by-pixel into a raster image. Like ray tracing, rasterization is view dependent. Typically, rasterization assumes that models are built from triangles and vertexes. In general, rasterization consists of three steps. First, model vertexes are transformed from model space to screen space by model, view and projection matrixes. Then, primitives are assembled as simple triangles, or as more complex polygons if a triangle happens to be projected at the edge of the screen. Eventually, the assembled polygons are rasterized (converted to fragments, which are then written into the raster frame buffer). For decades these steps were frozen in graphics hardware designs, and only recently has some programmability appeared. As of 2009, vertex, geometry and fragment processing is partially programmable.

Obviously, rasterization is far inferior to ray tracing when high quality images are to be generated. It is not based on the physics of light transport and thus is not capable of correctly calculating many important illumination features. In rasterization, individual pixels can be colored by local illumination algorithms. Attempts at global illumination use various quirks based on multipass rendering, using the results of initial rendering passes as textures for subsequent passes. These tricks make the rasterization design particularly unclean, and moreover, these effects are merely approximations, sometimes very poor ones. The tradeoffs between ray tracing and rasterization are pertinently described by Luebke and Parker [Luebke & Parker, 2008]: rasterization is fast – but needs cleverness to support complex visual effects; ray tracing supports complex visual effects – but needs cleverness to be fast.


Figure 4.1: An example light path.

4.1.4 Radiosity Algorithms

Radiosity algorithms [Cohen & Wallace, 1993] are radically different from ray tracing and rasterization. The most fundamental difference is the view independent approach – radiosity algorithms calculate the radiance distribution L(x, \omega) over the entire scene. The radiance distribution, however, is not represented exactly – a linear combination of carefully selected basis functions L is used. The dependence 2.12 can be concisely written with a light transport operator A: L = L_e + AL. The light transport operator A is also not represented exactly. Typically a sparse matrix A, which approximates the original equation, L = L_e + A \times L, is used. Modern radiosity methods solve the approximated rendering equation iteratively:

L^{(1)} = L_e,
L^{(n)} = L_e + A \times L^{(n-1)},

however, the first approaches used Gaussian elimination instead [Goral et al., 1984].
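A minimal C++ sketch of this fixed-point iteration for a discretized system; a dense matrix is used only for brevity, whereas real implementations use the sparse representation mentioned above:

    #include <utility>
    #include <vector>

    // Solves L = Le + A * L by the iteration L(1) = Le,
    // L(n) = Le + A * L(n-1). A is the discretized light transport
    // operator, stored here as a dense row-major matrix for clarity.
    std::vector<double> iterateTransport(const std::vector<std::vector<double>>& A,
                                         const std::vector<double>& Le,
                                         int iterations) {
        std::vector<double> L = Le;                 // L(1) = Le
        const size_t n = Le.size();
        for (int it = 1; it < iterations; ++it) {
            std::vector<double> next(n);
            for (size_t i = 0; i < n; ++i) {
                double acc = 0.0;
                for (size_t j = 0; j < n; ++j) acc += A[i][j] * L[j];
                next[i] = Le[i] + acc;              // L(n) = Le + A * L(n-1)
            }
            L = std::move(next);
        }
        return L;
    }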

Since radiosity is well suited only for matte materials, there are some more advanced algorithms designed to cope with this flaw. For example, multi-pass techniques [Wallace et al., 1987], which mix ray tracing and radiosity, are designed to incorporate reflections into scenes rendered with radiosity. However, these hybrid approaches are barely satisfactory when applied to the general light transport equation. The radiosity component is still a serious handicap, and ray-traced reflections are added in a view-dependent way, effectively killing the main advantage of radiosity.

4.2 Light Transport Paths

Light transport paths are the basic concept of ray tracing based global illumination algorithms. A light transport path is a sequence of rays which connect a point on a light source with a point on the camera lens. Any point of interaction (i.e. scattering) of light is represented as a vertex of the light transport path. A ray segment connects two adjacent vertexes and represents radiance transport between them along a straight line. An example light transport path is shown in Figure 4.1.

Virtually all unbiased ray tracing algorithms generate images by sampling the space of all possible light paths. The computed image is then defined as the contribution of a sampled path divided by the probability of its generation, averaged over a large number of such samples. This is a direct application of the Monte Carlo formula (Equation 3.18) to the image formation formula (Equation 2.37 or 2.43). Different algorithms use different approaches to sample the path space and therefore


paths are sampled with different probability densities. The quality of the sampling, and therefore of the light transport algorithm, can be estimated by the proportionality of its path sampling probability density to the contributions of the paths. This is a consequence of importance sampling (Section 3.2.1).

The rest of this section starts with a description of the classification of light paths. Then methods of construction of light paths are described, and finally, an inherent limitation of the local path sampling technique is explained.

4.2.1 Classification of Paths

Light transport paths can be classified according to the types of scattering at the path vertexes. Scattering can be specular, if all possible scattering directions form a less than two dimensional set, or diffuse otherwise. Examples of specular scattering are an ideal mirror (with a unique possible scattering direction) and an idealized anisotropic reflection (where the scattered directions form a 1D set). In the case of specular scattering, the scattering function f_s is infinite and its values cannot be represented as real numbers. In this case a \delta distribution with an appropriate coefficient is used.

The distinction between types of scattering based on ideal specular and non-specular scattering events is somewhat suboptimal. If f_s represents a highly polished, yet non-ideal, material, f_s is always finite, but the scattering behaviour is almost as in the specular case. It is far better to use an appropriate threshold to determine the scattering type. If f_s for a given scattering event is greater than the threshold, the scattering is considered to be glossy. Otherwise, the scattering is matte.

Light paths can be described as sequences of scattering events. Heckbert [Heckbert, 1990] developed a regular expression notation for light paths. In this notation, paths are described by expressions of the form L(S|D)*E, where L is a vertex on a light source, E is a vertex on the camera lens, and S or D represent scattering, specular or diffuse respectively. To avoid confusion, in the rest of this thesis the L(S|D)*E notation is used for the ideal vs. non-ideal classification, and the L(G|M)*E notation, where G means glossy and M matte scattering, for the threshold based classification.

The regular expression based description of light paths was extended by Veach [Veach, 1997]. In this approach, light and importance emission is split into spatial and directional components. The light path is then extended by two vertexes at the side of the light source and two additional vertexes at the side of the camera lens. The first and last vertexes of the path represent the spatial component of emission. This component is specular in the case of pinhole cameras and point light sources. Area or volumetric emitters and cameras with a finite aperture are accounted diffuse. The second and one-before-last vertexes represent the directional component of emission. This component is specular in the case of an orthographic projection camera or an idealized laser light source. In the extended path notation, special cases, i.e. light L and camera E vertexes, no longer exist – these symbols indicate only the direction of light flow through the path. Paths are described by expressions like L(S|D)(S|D)(S|D)*(S|D)(S|D)E or L(G|M)(G|M)(G|M)*(G|M)(G|M)E. The extended path notation is presented more intuitively in Figure 4.2.

4.2.2 Construction of Paths

All known unbiased methods are based on a local sampling scheme. Local sampling is a technique of constructing light transport paths by adding one ray segment at a time. Thus a path can be constructed using the following operations only:

• random selection of a path vertex, typically on a light source or camera lens, but possibly also on an ordinary surface or in a volume;

• addition of the next vertex to an already constructed subpath; the addition is typically performed by scattering a ray in a random direction from the last path vertex and searching for the nearest intersection;

• deterministic connection of two subpaths.



Figure 4.2: Extended light path notation. A pinhole camera can be seen as an additional specular scattering and an area light source as an additional diffuse scattering.

On the other hand, an operation like calculating the point of reflection R on a mirror in such a way that light travels from A to B through R is not local sampling, because it adds two segments at once. Non-local sampling is prohibitively complex to integrate into any ray tracing algorithm, and therefore is not used.

4.2.3 Local Path Sampling Limitation

A light transport path can be constructed with non-zero probability using local path sampling if and only if it has two subsequent diffuse scattering events, i.e. its expression contains the DD substring [Veach, 1997]. In particular, local sampling cannot generate light paths having specular reflections separated by single diffuse reflections when point light sources and pinhole cameras are used. An example of such a path is a caustic on a seabed, caused by a specular refraction at the water surface, viewed indirectly through the same water surface (another specular refraction; both specular refractions are separated by only one diffuse scattering at the seabed), presented in Figure 4.3. This fact becomes more intuitive when all possible ways of constructing a light transport path with local path sampling are considered. First, a light source or a camera may be intersected at random. This can happen if and only if they occupy a finite area and the light or importance emission is not specular. In this case, the path begins or ends with a DD substring. Second, a subpath starting from the camera can be connected to a subpath starting from a light. This connection can be made only between two adjacent diffuse vertexes.

Figure 4.3: A difficult light path. (Diagram labels: pinhole camera, point light source, with S and D marks on the path vertexes.)


Even if a light source or a camera is not point based, yet relatively small, and reflections are highly glossy, the image created with any unbiased method can have huge error. A method for estimating such error, based on threshold path classification, is used in the algorithm presented in Section 4.5. On the other hand, the most advanced biased methods, which do not rely on the concept of light paths, do not have such a flaw. The illumination in difficult lighting conditions can be excessively blurry, but important features are never missing completely. Moreover, increasing the rendering time progressively reduces blurring. Results of local path sampling and of a biased method (our implementation of Photon Mapping) in a highly glossy, yet not specular environment are presented in Figure 4.4.

Figure 4.4: Top left image: reference rendering with local path sampling. Top right image: rendering using only those light paths which do not cause excessive noise. Bottom left image: rendering with problematic paths only. Bottom right image: a biased method applied to problematic paths.

4.3 Full Spectral Rendering

The color phenomenon is caused by a spectral mixture of light, perceived by the human visual system. However, the human visual system cannot distinguish between arbitrary spectral light distributions. Different spectra which are indistinguishable by human observers are called metamers. The space of colors recognizable by human observers contains only three independent values, hence the popularity of three component color models.

There are many color models in computer graphics; however, most are designed for a specific purpose only. The most common are: RGB, designed for displaying images, CMYK for printing, and HSV for easy color selection by the user. All of these models are to some degree hardware dependent. There is, however, a standard model based on the XYZ color space, which is independent of any hardware and can represent all the colors recognizable by a human observer. It was defined by the CIE (Commission Internationale de l'Eclairage) as three weighting functions used to obtain the x, y, and z components from arbitrary spectra. Nevertheless, none of these models is well suited for rendering, where direct calculations on spectra are the only way to produce correct results [Evans & McCool, 1999, Johnson & Fairchild, 1999].

A general description of many popular color models can be found in Stone [Stone, 2003]. Devlin et al. [Devlin et al., 2002] provide references related to data structures for full spectral rendering and algorithms for displaying spectral data. There are several works dedicated to the simulation of particular spectrum based phenomena. Wilkie et al. [Wilkie et al., 2000] simulated dispersion by means of classic (deterministic) ray tracing. Rendering of optical effects based on interference has attracted a fair amount of attention. Reflection from optical disks is presented in Stam [Stam, 1999] and Sun et al. [Sun et al., 2000]. Algorithms for accurate light reflection from thin layers can be found in Gondek et al. [Gondek et al., 1994] and Durikovic and Kimura [Durikovic & Kimura, 2006]. The latter paper also shows how this algorithm can be run on contemporary GPUs.

Many papers present methods for representing and operating on spectral data. Peercy [Peercy, 1993] designed a spectral color representation as a linear combination of basis functions, chosen in a scene dependent manner. A different algorithm using basis functions is described by Rougeron and Peroche [Rougeron & Peroche, 1997]; it uses adaptive projection of spectra onto hierarchical basis functions. Sun et al. [Sun et al., 2001] proposed a decomposition of spectra into smooth functions and a set of spikes. Evans and McCool [Evans & McCool, 1999] used clusters of many randomly selected spectral point samples. Johnson and Fairchild [Johnson & Fairchild, 1999] extended OpenGL hardware rasterization to support full spectra.

Dong [Dong, 2006] points out that typically only a part of the scene needs a full spectral simulation, and that using RGB together with the full spectrum can accelerate rendering at the cost of only a slight quality loss. Ward and Eydelberg-Vileshin [Ward & Eydelberg-Vileshin, 2002], however, designed a three component model optimized for rendering, which typically produces images of acceptable yet imperfect quality; the model is not general enough and cannot simulate wavelength dependent phenomena like dispersion.

The rest of this section starts with an explanation of why rendering with the full spectrum is necessary. Next, random point sampling as a method of representing spectra is presented. This is followed by a detailed description of our novel sampling technique, designed to substantially reduce the variance of rendering of many wavelength dependent phenomena. Finally, a method for combining the sampling of light transport paths with spectrum sampling is explained. The research results explained in this section are also presented in [Radziszewski et al., 2009].

4.3.1 Necessity of Full Spectrum

The RGB model is often used for rendering color images. However, this is an abuse of the model, since RGB based rendering does not have any physical justification. The model was designed for storage and effective display of images on a monitor screen, not for physically accurate rendering. The light reflection computation, under the assumption of elastic photon scattering, is performed by a multiplication of a spectrum that represents the illumination and a spectrum describing the surface reflectance. This multiplication must be performed on spectral distribution functions, not on RGB triplets, in order to get correct results.

The RGB based reflection of white light, or of light with a smoothly varying spectrum, from a surface with smoothly varying reflectance typically does not produce substantial inaccuracies. However, when at least one of the spectra has large variation, the simulation using the RGB model becomes visibly incorrect (see Figure 4.5, for example). Moreover, in global illumination, due to multiple light scattering, even white light becomes colorful, causing scattering inaccuracies to accumulate. This makes RGB based global illumination rendering unable to accurately capture the physical phenomena.

Figure 4.5: Left image: a copper sphere illuminated by D65 white light. Right image: a copper sphere illuminated by a triangular spectral distribution stretched from 535nm to 595nm. Top left halves: an RGB model with 645nm, 526nm and 444nm wavelengths. Bottom right halves: our full spectral model. For clarity, only diffuse reflection is calculated.

In addition, the most visually distracting error from using an RGB model appears in the simulation of phenomena like dispersion. Whenever RGB based light, from a light source with almost parallel output rays, hits a prism, it is scattered into three bands instead of a continuous full spectrum, and the rest of the image remains dark (see Figure 4.6), which looks unrealistic. Using a full spectrum representation gives a continuous rainbow of colors. However, good-looking results may be obtained with an RGB representation if the light source angular distribution is conical and divergent enough. A similar trick is the basis of a simple Nvidia shader demo [Fernando & Kilgard, 2003]: the address of a texture on a surface, which is seen through glass, is offset independently for each channel. If the texture data is blurred enough, the resulting 'spectrum' is smooth. Nevertheless, both of these methods do not have any physical significance and are obviously incorrect, but, in some conditions, they can look convincing.

Figure 4.6: Dispersion on a prism. Top row: RGB model with 645nm, 526nm and 444nm wavelengths. Bottom row: physically correct full spectrum. The light collimation is controlled by a Phong-like function I cosⁿ(φ), with the exponent n decreased four times in each subsequent column, and the intensity I doubled to compensate for light scattering.


4.3.2 Representing Full Spectra

Full spectral rendering requires an efficient method for representing spectral data. The most common techniques are based on linear combinations of carefully selected basis functions [Peercy, 1993, Rougeron & Peroche, 1997, Sun et al., 2001] and on point sampled continuous functions [Evans & McCool, 1999]. The efficiency of the linear combination approach strongly depends on the actual functions and their match to the scene's spectral distributions. However, the natural solution in a Monte Carlo based rendering system is random point sampling.

Random Point Sampling

Random point sampling produces noise at low sampling rates, but well-designed variants of this technique converge quickly. Point sampling can effectively handle smooth light distributions (like tungsten bulbs) and very narrow spikes (like neon bulbs) in the same scene. The two greatest strengths of this technique are randomly selected wavelengths and a well defined wavelength value for each spectral sample. The first ensures correctness: as more samples are computed, more different wavelengths are explored, and due to the law of large numbers the rendering result converges to the true value. The second allows simulating wavelength dependent effects like dispersion, at the cost of additional color noise.

It is worth noting that wavelength dependent phenomena cannot be simulated correctly with algorithms based on linear combinations of basis functions with non-zero extent in wavelength space. Even if spectra are represented by unique non-zero coefficients, the corresponding basis functions still have some finite extent, which prevents exact computations when an explicit wavelength is required.

The simplest approach to point sampled spectra is the generation of a single spectral sample per light transport path. However, according to Evans and McCool [Evans & McCool, 1999], this technique is inefficient, since it causes a lot of color noise. They proposed using a fixed number of spectral samples (called a cluster of samples), traced simultaneously along a single light path, which substantially reduces variance with minimal computational overhead.

Basic Operations

The implementation of the multiplication, addition, minimum, etc. operators is obvious, since it is enough to perform the appropriate calculation per component, as in the RGB model. However, when using a full spectrum, computing luminance is a bit more difficult. In particular, the luminance of a spectrum which describes the reflectivity of a surface must by definition be in the [0, 1] range.

However, computing luminance as a Monte Carlo quadrature of the product of the reflectance spectrum r(λ) and the scaled CIE y weighting function may randomly lead to numerical errors, causing the luminance to exceed the 1.0 threshold. The equation:

l ≈ [ ∑_{i=1}^{n} r(λ_i) y(λ_i) / p(λ_i) ] / [ ∑_{i=1}^{n} y(λ_i) / p(λ_i) ],   (4.1)

where r(λ) is the reflectance, y(λ) is the CIE y weight and p(λ_i) is the probability of selecting a given λ_i, solves the issue. It guarantees that the luminance is in the [0, 1] range, provided that r(λ) is also in the specified range.

Wavelength dependent effects can be handled as proposed by Evans and McCool [Evans & McCool, 1999] for specular dispersion – by dropping all but one spectral sample from a cluster. This is done by randomly selecting a sample to preserve, with uniform probability. All the samples except the selected one are then set to zero, and the power of the chosen one is multiplied by the cluster size. The wavelength parameter then becomes well defined, and further computations are performed using its actual value. However, when the simulated phenomenon is not optically perfect, as in Phong-based glossy refraction, it may be more efficient to trace the whole cluster, scaling the power of each sample independently. We examine this approach in detail in the next section.
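
The sample selection just described can be sketched as follows; the fixed-size Spectrum cluster and the uniform random number u are assumptions of this illustration.

    #include <cstddef>

    const std::size_t C = 8;                 // cluster size used in this work

    struct Spectrum { double s[C]; };        // hypothetical cluster of samples

    // Drop all but one randomly chosen sample (uniform probability 1/C) and
    // scale the survivor by C, so the estimator stays unbiased.
    std::size_t collapseToSingleSample(Spectrum& power, double u)
    {
        std::size_t keep = static_cast<std::size_t>(u * C);
        if (keep >= C) keep = C - 1;         // guard against u == 1.0
        for (std::size_t j = 0; j < C; ++j)
            power.s[j] = (j == keep) ? power.s[j] * C : 0.0;
        return keep;                         // index of the now well defined wavelength
    }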


4.3.3 Efficient Sampling of Spectra

Evans and McCool [Evans & McCool, 1999] simulate wavelength dependent phenomena by tracing only one spectral sample per path. This particular approach is always correct, and it is necessary when a phenomenon is optically perfect, such as refraction on idealized glass. However, when the scattering is not ideal, dropping all but one spectral sample from a cluster, while still correct, might be extremely wasteful and inefficient. In this section we propose a substantially improved technique.

Single Scattering Model

For testing purposes, a refraction model with an adjustable, wavelength dependent refraction and an imperfection introduced by Phong-based scattering [Phong, 1975] with controllable glossiness is used. Perhaps an extension of the Walter et al. microfacet based refraction [Walter et al., 2007] supporting dispersion would give better results, but their model is much more complicated and would therefore make evaluation of spectral sampling difficult. Nonetheless, since we make no assumptions about the scattering model, our results are general and applicable to any wavelength dependent phenomena. For clarity, all tests are based on a single scattering simplification (i.e. light is refracted only once, when it enters the glass). The x component in the CIE XYZ space in the outgoing direction ω_o is then described by the following formula:

I_CIEx(ω_o) = ∫_Λ ∫_Ω f_s(ω_i, ω_o, λ) L_λ(ω_i, λ) · w_CIEx(λ) dσ⊥(ω_i) dλ,   (4.2)

where Λ is the space of all visible wavelengths, Ω is the space of all direction vectors, L_λ(ω_i, λ) is the radiance incoming from direction ω_i, w_CIEx is the CIE weight for the x component, and σ⊥(ω_i) is the projected solid angle measure. The y and z components can be evaluated in a similar way. In the rest of this section, Formula (4.2) is written in a simplified, yet not confusing, form:

I = ∫_Λ ∫_Ω f(ω, λ) L(ω, λ) w(λ) dσ⊥(ω) dλ.   (4.3)

Basic and Cluster Based Monte Carlo Estimators

The Monte Carlo method (Equation 3.18) can be applied to evaluate the two integrals from Formula (4.3), which leads to the following estimator:

I ≈ (1/N) ∑_{i=1}^{N} [ f(ω_i, λ_i) / pdf_σ⊥(ω_i, λ_i) ] · [ w(λ_i) / pdf_λ(λ_i) ] · L(ω_i, λ_i),   (4.4)

where pdf_σ⊥ is the probability of selecting a given ω_i, evaluated with the σ⊥(ω) measure on Ω, and pdf_λ is the probability of selecting a given λ_i. The quality of this estimator, and of all further estimators in this section, relies on the assumption that the scattering model offers proper importance sampling (Section 3.2.1), i.e. f(ω, λ) ∝ pdf_σ⊥(ω, λ) is roughly satisfied. However, this basic estimator is inefficient, because it forces the numbers of spectral and directional samples to be equal. Each directional sample requires additional rays to be traced, which is computationally expensive, while spectral samples are almost free. This explains the advantage of clusters of spectral samples over the single spectral sample approach.

The main improvement over the Evans and McCool method is tracing a full cluster of spectral samples even when a wavelength dependent phenomenon is encountered. Wavelength dependence can be defined precisely as the dependence of pdf_σ⊥ on λ. If scattering is not wavelength dependent, directional sampling is not wavelength dependent either, i.e. pdf_σ⊥(ω, λ) ≡ pdf_σ⊥(ω). In our method, a particular spectral sample λ^s_i is selected at random from a cluster, and its value is used for sampling ω^s_i. This leads to the color estimator in the form:

I ≈ (1/NC) ∑_{i=1}^{N} ∑_{j=1}^{C} [ f(ω^s_i, λ^j_i) / pdf_σ⊥(ω^s_i, λ^s_i) ] · [ w(λ^j_i) / pdf_λ(λ^j_i) ] · L(ω^s_i, λ^j_i) =
  = (1/NC) ∑_{i=1}^{N} [ 1 / pdf_σ⊥(ω^s_i, λ^s_i) ] · ∑_{j=1}^{C} f(ω^s_i, λ^j_i) [ w(λ^j_i) / pdf_λ(λ^j_i) ] L(ω^s_i, λ^j_i),   (4.5)

where N is the number of traced clusters, C is the number of samples in each cluster, and pdf_σ⊥ is the probability of selecting the scattering direction, calculated for the selected wavelength λ^s_i. The estimator (4.5) can be more efficient than the estimator (4.4), since it traces C spectral samples at minimal additional cost. On the other hand, it may deteriorate the importance sampling quality significantly. This happens because all samples, with potentially wildly different f(ω^s_i, λ^j_i) values, are traced, and just one probability pdf_σ⊥(ω^s_i, λ^s_i), which matches the shape of f(ω^s_i, λ^s_i) only, is used. Whenever a direction ω^s_i with low probability pdf_σ⊥(ω^s_i, λ^s_i) is chosen at random, and at least one of the f(ω^s_i, λ^j_i) has a relatively large value in that direction, the value is no longer cancelled by the probability, leading to excessively high variance in the rendered image. Moreover, the estimator (4.5) is incorrect whenever ∃ λ^s_i, ω^s_i : pdf_σ⊥(ω^s_i, λ^s_i) = 0 and ∃ λ^j_i : f(ω^s_i, λ^j_i) > 0, particularly when a wavelength dependent phenomenon is optically perfect, i.e. its f is described by a δ distribution. Thus, the initial version of our new approach is not always better than the traditional technique of tracing only one spectral sample. The question is when the new technique exhibits lower variance and when it does not.

Multiple Importance Sampling Estimator

Fortunately, the variance issue can be solved automatically. A simple modification of the estimator (4.5), which incorporates Multiple Importance Sampling [Veach & Guibas, 1995] (see Section 3.2.1), gives a better estimator with variance as low as possible in a variety of conditions. The new, improved estimator is constructed from the estimator (4.5) by multiplying each cluster by C and a weight W^s_i equal to:

W^s_i = pdf_σ⊥(ω^s_i, λ^s_i) / ∑_{j=1}^{C} pdf_σ⊥(ω^s_i, λ^j_i),   (4.6)

where pdf_σ⊥(ω^s_i, λ^s_i) is the probability with which the scattering direction is selected, and the values pdf_σ⊥(ω^s_i, λ^j_i) are hypothetical probabilities of selecting the sampled direction if the λ^j_i value were used instead. This leads to the final estimator:

I ≈ (1/NC) ∑_{i=1}^{N} [ C W^s_i / pdf_σ⊥(ω^s_i, λ^s_i) ] · ∑_{j=1}^{C} f(ω^s_i, λ^j_i) [ w(λ^j_i) / pdf_λ(λ^j_i) ] L(ω^s_i, λ^j_i) =
  = (1/N) ∑_{i=1}^{N} [ 1 / ∑_{j=1}^{C} pdf_σ⊥(ω^s_i, λ^j_i) ] · ∑_{j=1}^{C} f(ω^s_i, λ^j_i) [ w(λ^j_i) / pdf_λ(λ^j_i) ] L(ω^s_i, λ^j_i).   (4.7)

Assuming that a scattering model provides proper importance sampling, the estimator (4.7) leads to a low variance result. Moreover, the estimator (4.7) is correct whenever the scattering model is correct, i.e. whenever ∀ ω, λ : f(ω, λ) > 0 ⇒ pdf_σ⊥(ω, λ) > 0, so it is applicable even to optically perfect wavelength dependent phenomena. However, in this case it does not provide any benefit over the estimator (4.4). The comparison between the new estimators (4.5) and (4.7) and the previous single sample estimator (4.4) is presented in Figure 4.7. The glass sphere has a linearly varying refraction index, from 1.35 for 360nm to 1.2 for 830nm, and uses Phong based scattering with n = 1000. Images are created using only two 16-sample clusters, to show the error more clearly.
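
A sketch of one cluster's contribution to the estimator (4.7) follows; the per-wavelength arrays are hypothetical stand-ins for the scattering model interface. Note that the factor C·W^s_i / pdf_σ⊥(ω^s_i, λ^s_i) from Equation (4.6) cancels to 1 / ∑_j pdf_σ⊥(ω^s_i, λ^j_i), so λ^s_i never appears explicitly in the code.

    #include <cstddef>

    const std::size_t C = 8;  // spectral samples per cluster

    // Contribution of one traced direction (one cluster) to estimator (4.7).
    // pdfDir[j] holds pdf_sigma(omega^s, lambda_j): the probability of the
    // sampled direction as if wavelength j had driven the direction sampling.
    double clusterContribution(const double f[C],       // f(omega^s, lambda_j)
                               const double pdfDir[C],  // pdf_sigma(omega^s, lambda_j)
                               const double w[C],       // w(lambda_j)
                               const double pdfLam[C],  // pdf_lambda(lambda_j)
                               const double L[C])       // incoming radiance
    {
        double denom = 0.0;  // sum of hypothetical direction sampling pdfs
        for (std::size_t j = 0; j < C; ++j) denom += pdfDir[j];
        if (denom <= 0.0) return 0.0;
        double sum = 0.0;    // inner sum of Equation (4.7)
        for (std::size_t j = 0; j < C; ++j)
            sum += f[j] * w[j] * L[j] / pdfLam[j];
        return sum / denom;  // the caller averages over the N traced clusters
    }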

Generation of Clusters

Figure 4.7: Comparison between the new initial estimator (left), the new improved estimator (middle) and the previous method (right). The new initial estimator exhibits more variance due to the lack of proper importance sampling. The color noise from the single sample approach makes the rightmost image barely legible.

Figure 4.8: Selection of the optimal number of spectral samples for a single cluster: 4 samples (left), 8 samples (middle), 12 samples (right). All images were rendered in 640x480, with 200k image samples (i.e. spectral clusters).

In order to generate clusters efficiently, two issues have to be solved: how many samples a single cluster should contain, and how to generate them. The number of spectral samples in a cluster is an important decision for achieving the best possible performance. Unfortunately, the optimal number of such samples is highly scene dependent. The more variation the emission and reflectance spectra have, the more spectral samples a single cluster should contain. Assuming that a scene contains rather smoothly varying spectra (this assumption is typically satisfied), it is possible to balance excessive color noise against computational overhead. After a few tests we have found that eight spectral samples are optimal¹. Four samples cause significant noise and twelve give barely visible improvement (see Figure 4.8). Rendering time differences between these images have been less than 1%, which confirms the efficiency of the cluster approach.

The efficient generation of spectral samples proves to be more difficult. Spectra should be importance sampled, but there are at least three factors which should affect the choice of pdf_λ, namely: sensor (camera, human eye, etc.) sensitivity, the light source spectral distribution and the reflectance properties of materials. However, often only the sensor is taken into account, and it is assumed that its sensitivity is well described by the CIE y weighting function. Unfortunately, despite producing good quality grayscale images, importance sampling the wavelength space with respect to the y function causes excessive color noise and, contrary to common knowledge, is suboptimal. Ideally, a sampling probability should take into account all three x, y and z components. After some experiments, we found that the following probability gives good results:

pdf_λ(λ) = N⁻¹ f(λ),   f(λ) = 1 / cosh²(A(λ − B)),   (4.8)

¹ Due to Intel SSE instruction set optimization, our implementation requires the number of samples to be divisible by four.


Figure 4.9: Various methods of sampling spectra. Top row: 2000K blackbody radiation. Bottom row: D65 spectrum. Left column: spectra sampled using random numbers and our importance sampling, with various numbers of samples. Middle column: comparison of luminance based importance sampling (top halves) with our pdf_λ (bottom halves), using 128 spectral samples. Right column: spectra sampled using the Sobol low discrepancy sequence and our pdf_λ, using 4 and 8 spectral samples.

where A = 0.0072 nm⁻¹ and B = 538.0 nm are empirically evaluated constants and

N = ∫_Λ f(λ) dλ = (1/A) ( tanh(A(λ_max − B)) − tanh(A(λ_min − B)) )   (4.9)

is the normalization factor. Results of this improved technique are presented in Figure 4.9.
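
Sampling proportionally to the pdf (4.8) can be done by inverting its CDF analytically. The sketch below is our own illustration of the technique, assuming the visible range [360nm, 830nm]; all names are hypothetical.

    #include <cmath>

    // Importance sampling of the spectral pdf from Equation (4.8) by CDF
    // inversion; the integral of 1/cosh^2 is tanh, hence atanh appears in
    // the inverse mapping.
    const double A = 0.0072;               // 1/nm, empirical constant
    const double B = 538.0;                // nm, empirical constant
    const double LMIN = 360.0, LMAX = 830.0;

    // Maps a uniform number u in [0,1) to a wavelength distributed as
    // pdf(lambda) = N^-1 / cosh^2(A (lambda - B)); also returns the pdf.
    double sampleWavelength(double u, double* pdf)
    {
        const double tMin = std::tanh(A * (LMIN - B));
        const double tMax = std::tanh(A * (LMAX - B));
        const double N = (tMax - tMin) / A;            // normalization, Eq. (4.9)
        const double lambda = B + std::atanh(tMin + u * (tMax - tMin)) / A;
        const double c = std::cosh(A * (lambda - B));
        *pdf = 1.0 / (N * c * c);
        return lambda;
    }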

Moreover, since spectra are typically smooth, sampling them with quasi-Monte Carlo (QMC) low discrepancy sequences instead of random numbers improves results. However, care must be taken when QMC sampling is applied to cluster based spectra. When a wavelength dependent effect is to be simulated, a single sample from the cluster has to be chosen. This choice is tricky due to peculiarities of QMC sampling. In the case of true random numbers, selecting the first sample from a cluster always works correctly. On the other hand, it is a serious error to select every n-th sample from a low discrepancy sequence. In the latter case, we assign a separate (pseudo)random sequence for such a selection of a spectral sample, in addition to the sequence used for randomizing the cluster samples. Results of QMC sampling are presented in Figure 4.9.
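
The separation of the two sequences can be sketched as follows; QmcSequence is a hypothetical low discrepancy generator interface, and std::mt19937 stands in for the independent pseudorandom selection stream.

    #include <cstddef>
    #include <random>

    const std::size_t C = 8;

    // Wavelengths of one cluster come from a low discrepancy sequence, but
    // the index of the sample kept for a wavelength dependent event must come
    // from an INDEPENDENT pseudorandom stream: picking e.g. every first QMC
    // component would correlate the choice with the sampled wavelengths.
    struct SpectralSampler {
        std::mt19937 rng;                       // independent selection stream

        template <typename QmcSequence>
        void nextCluster(QmcSequence& qmc, double lambdaU[C], std::size_t* keep)
        {
            for (std::size_t j = 0; j < C; ++j)
                lambdaU[j] = qmc.next();        // QMC points for the cluster
            std::uniform_int_distribution<std::size_t> pick(0, C - 1);
            *keep = pick(rng);                  // pseudorandom sample selection
        }
    };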

4.3.4 Results and Discussion

Some more comparisons between the single spectral sample approach and the improved technique are presented in Figure 4.10. Images in the top row use the previous settings (refraction coefficient from 1.35 for 360nm to 1.2 for 830nm and glossiness coefficient n = 1000). Images in the bottom row use much sharper settings (refraction coefficient from 1.5 for 360nm to 1.2 for 830nm and glossiness coefficient n = 4000). Images from the first and second columns are rendered to have approximately the same quality, and images from the second and third columns are rendered with the same number of samples (i.e. traced rays). The average numerical error for various numbers of rays for the scene from Figure 4.10 is summarized in Table 4.1.

Figure 4.10: Imperfect refraction with dispersion. The top left image uses the previous approach with a massive number of 900 samples per pixel. The top middle image uses the new technique with just 50 samples per pixel, yet it has similar quality. The top right image again uses the previous approach, but with 50 samples per pixel. However, gains from using the new technique are less spectacular when glossiness or dispersion is increased. The bottom row images use 900, 100, and 100 samples, respectively.

Settings            C     MIS          SSS
n = 1000            1     1.26 · 10⁻¹  2.47 · 10⁻¹
η = [1.35, 1.20]    4     6.67 · 10⁻²  2.02 · 10⁻¹
                    16    2.63 · 10⁻²  1.34 · 10⁻¹
                    64    1.22 · 10⁻²  7.56 · 10⁻²
                    256   5.32 · 10⁻³  3.83 · 10⁻²
n = 4000            1     2.07 · 10⁻¹  2.46 · 10⁻¹
η = [1.50, 1.20]    4     1.33 · 10⁻¹  1.96 · 10⁻¹
                    16    7.39 · 10⁻²  1.29 · 10⁻¹
                    64    3.84 · 10⁻²  7.37 · 10⁻²
                    256   1.74 · 10⁻²  3.72 · 10⁻²

Table 4.1: Comparison of the error of our method (MIS) and the single spectral sample approach (SSS), for C 8-sample spectral clusters per pixel. The error is evaluated as the difference between the tested image and the reference image, averaged over all pixels and color components. The pixel values are normalized to the [0, 1] range.

Limit Cases

An analysis of two limit cases gives more insight into how the new technique works and when it is most effective. The analysis is based on the assumption that f(ω, λ) ∝ pdf_σ(ω, λ) is roughly satisfied. Otherwise, the multiple importance sampling approach cannot help much in reducing variance.

First, when the wavelength dependence is small and the reflection is fairly matte, all the scattering probabilities become more and more independent of λ: pdf_σ(ω^s_i, λ^j_i) ≈ pdf_σ(ω^s_i). The weight W^s_i then becomes:

W^s_i = pdf_σ(ω^s_i, λ^s_i) / ∑_{j=1}^{C} pdf_σ(ω^s_i, λ^j_i) ≈ pdf_σ(ω^s_i) / ∑_{j=1}^{C} pdf_σ(ω^s_i) → 1/C,   (4.10)


Figure 4.11: Analysis of the behaviour of the estimator (4.7) with increasing glossiness and wavelength dependence of scattering. Wavelength independent scattering (leftmost image). Optically perfect wavelength dependent scattering (rightmost image). Intermediate cases (middle). All the images are rendered with just four clusters.

and the estimator:

I ≈ (1/N) ∑_{i=1}^{N} [ W^s_i / pdf_σ(ω^s_i, λ^s_i) ] · ∑_{j=1}^{C} f(ω^s_i, λ^j_i) [ w(λ^j_i) / pdf_λ(λ^j_i) ] L(ω^s_i, λ^j_i) →
  → (1/NC) ∑_{i=1}^{N} ∑_{j=1}^{C} [ f(ω^s_i, λ^j_i) / pdf_σ(ω^s_i) ] [ w(λ^j_i) / pdf_λ(λ^j_i) ] L(ω^s_i, λ^j_i),   (4.11)

which is an estimator of simple, wavelength independent scattering.

Second, when scattering becomes more and more glossy and the wavelength dependence is significant, with probability close to one the f becomes close to zero for all directions except ω^s_i. The rare cases, when f(ω^s_i, λ^j_i) is large and j ≠ s, have a low weight W^s_i, and therefore they cannot affect the estimator significantly. Moreover, all the probabilities but the selected one go to zero, and therefore the weight W^s_i for directions preferred by f_s for λ^s_i goes to one, which leads to the estimator equal to:

I ≈ (1/N) ∑_{i=1}^{N} [ W^s_i / pdf_σ⊥(ω^s_i, λ^s_i) ] · ∑_{j=1}^{C} f(ω^s_i, λ^j_i) [ w(λ^j_i) / pdf_λ(λ^j_i) ] L(ω^s_i, λ^j_i) →
  → (1/N) ∑_{i=1}^{N} [ f(ω^s_i, λ^s_i) / pdf_σ⊥(ω^s_i, λ^s_i) ] [ w(λ^s_i) / pdf_λ(λ^s_i) ] L(ω^s_i, λ^s_i),   (4.12)

which is equivalent to the one sample estimator. This behaviour of the estimator (4.7) is presented in Figure 4.11.

The former approach to spectral rendering separates scattering into two artificial cases: standard wavelength independent scattering, and costly simulation of wavelength dependent phenomena using the single spectral sample estimator. On the other hand, our method does not depend on such a classification. Due to the automatically computed weights, it adjusts itself to these two limit cases, and to the broad spectrum of intermediate cases, when scattering is wavelength dependent but imperfect. The computational cost of our method depends on the strength of the wavelength dependence and the optical perfection of the material. These factors cause the computational cost to increase, but it never exceeds the cost of the single spectral sample estimator.

Sampling of Light Transport Paths

Our spectral sampling was derived for a single scattering model. However, it is easy to generalize it to light transport path sampling – a case when more than one wavelength dependent scattering event can be encountered on the same light path. The wavelength λ^s_i is selected once for a whole path, and reused at each scattering. The weight W^s_i is therefore computed for the whole path, using products of probabilities instead of probabilities of single scatterings. For example, assuming that the sampled path is built by recursively sampling f_s and tracing rays in the sampled directions, the W^s_i is given by the following expression:

W^s_i = ∏_{k=1}^{m} pdf_σ⊥(ω^s_ki, λ^s_i) / ∑_{j=1}^{C} ∏_{k=1}^{m} pdf_σ⊥(ω^s_ki, λ^j_i),   (4.13)

where k is the number of a scattering event and m is the length of the sampled path. Intuitively, the weight W^s_i is the ratio of the probability of generating the whole path using the selected wavelength λ^s_i to the sum of the probabilities of generating such a path using each wavelength from the cluster. If a light transport algorithm generates a path in a different way, or does not use the concept of light transport paths, the weight W^s_i has to be computed in a different manner. The integration of our spectral sampling with individual light transport algorithms is presented in detail in Section 4.4.

4.4 Analysis of Selected Light Transport Algorithms

There is no perfect rendering algorithm. Each one is well suited to particular input scenes, but is inferior in rendering others. This section contains a detailed analysis of the strengths and weaknesses of selected rendering algorithms. The analysis is based, among others, on the classification of light paths described in Section 4.2.1. Moreover, methods for the integration of full spectral sampling (Section 4.3) with the presented algorithms are given. The described algorithms are: Path Tracing, Bidirectional Path Tracing, Metropolis Light Transport, Energy Redistribution Path Tracing, Irradiance and Radiance Caching, and Photon Mapping. Additionally, we have proposed optimizations for some of these algorithms, and tested Path Tracing and Photon Mapping in restricted versions – which are potential candidates for real time global illumination.

4.4.1 Path Tracing

The Path Tracing algorithm, historically the first mathematically correct solution to the light transport problem (Equation 2.12), was given by Kajiya [Kajiya, 1986], who also formulated Equation 2.12 for the first time in the same paper. However, today the original formulation of Path Tracing is considered ineffective, and our analysis is based on the Pharr and Humphreys version [Pharr & Humphreys, 2004]. This version of Path Tracing improves its convergence and provides support for volumetric rendering (Equation 2.22). The algorithm is based on the integral over paths formulation (Equation 2.43) of the light transport.

The Path Tracing method is based on generating light paths from a camera towards light sources. First, a vertex on the camera lens and a sampled direction are chosen at random. Then the path is constructed incrementally: in a loop, the nearest intersection is found, and a ray is scattered from the intersection point in a random direction. Light sources can either be intersected at random or their illumination can be accounted for directly (Figure 4.12). In the second case, a point y_{i+1} on a light source is chosen at random, and a visibility test between it and a point x_i on the light transport path is performed. The loop stops when either a ray exits the scene and escapes to infinity, or absorption happens. The absorption is a variant of Russian roulette (Equation 3.30), which terminates the light path evaluation with some probability, and therefore finishes the loop after a finite number of steps with probability one.
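
The loop just described can be sketched as follows. This is an illustrative outline only: Scene, Ray, Intersection, Sampler and the helper routines are hypothetical placeholders, and the MIS weighting of the two light source strategies (Equation 4.20) is omitted for brevity.

    // Illustrative Path Tracing loop; all types and helpers are hypothetical.
    Spectrum trace(const Scene& scene, Sampler& smp)
    {
        Ray ray = scene.camera().sampleRay(smp);       // lens vertex + direction
        Spectrum L(0.0), beta(1.0);                    // radiance, path throughput
        while (true) {
            Intersection it;
            if (!scene.intersect(ray, &it))            // ray escaped to infinity
                break;
            L += beta * it.emittedRadiance(-ray.d);    // light source hit at random
            L += beta * sampleDirectLight(scene, it, smp); // explicit connection
            double pdf;
            Vector wi = it.sampleScattering(smp, &pdf);    // random direction
            double q = continuationProbability(it, wi, pdf); // Russian roulette
            if (smp.next1D() >= q)                     // absorption ends the loop
                break;
            beta *= it.f(wi) / (pdf * q);              // update path throughput
            ray = Ray(it.p, wi);
        }
        return L;
    }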

Radiance Estimator

Figure 4.12: A path generated by the Path Tracing algorithm. The solid lines represent rays traced towards the nearest intersection points. The dashed lines represent visibility tests.

According to Equation 2.43, the image is defined as an integral over all possible light transport paths x. In order to evaluate the standard Monte Carlo estimator (Equation 3.18), or better the Multiple Importance Sampling estimator (Section 3.2.1), of the integral 2.43, the path contribution f(x) and the probability density pdf(x) have to be evaluated. In Path Tracing a light transport path x_j of length j can be generated in two ways, with probabilities equal to:

pdf_1(λ, x[j]) = pdf_W(x_0) pdf_W(x_0 → x_1) pdf_tr(x_0 → x_1) G_x(x_0 ↔ x_1) ·
  · ∏_{i=1}^{j−1} [ pdf_σx(x_{i−1} → x_i → x_{i+1}) pdf_tr(x_i → x_{i+1}) G_x(x_i ↔ x_{i+1}) · pdf_c(x_{i−1} → x_i → x_{i+1}) ]   (4.14)

pdf_2(λ, x[j]) = pdf_W(x_0) pdf_W(x_0 → x_1) pdf_tr(x_0 → x_1) G_x(x_0 ↔ x_1) ·
  · ∏_{i=1}^{j−2} [ pdf_σx(x_{i−1} → x_i → x_{i+1}) pdf_tr(x_i → x_{i+1}) G_x(x_i ↔ x_{i+1}) · pdf_c(x_{i−1} → x_i → x_{i+1}) ] · pdf_Lx(y_j),   (4.15)

where pdf_1 is the probability of generating paths with randomly intersected light sources, pdf_2 is the probability of generating paths with special treatment of light sources,

pdf_σx(x_{i−1} → x_i → x_{i+1}) = { pdf_σ⊥(x_{i−1} → x_i → x_{i+1}), if x_i ∈ A
                                    { pdf_σ(x_{i−1} → x_i → x_{i+1}),  if x_i ∈ V   (4.16)

is the scattering probability, measured with respect to the projected solid angle in the case of surface scattering or the ordinary solid angle in the case of volumetric scattering, pdf_L(x) and pdf_W(x) are the probabilities of selecting a point on a light source or on a camera lens, pdf_L(x → y) and pdf_W(x → y) are the probabilities of emission in a given direction, and

pdf_c(x_{i−1} → x_i → x_{i+1}) = min( 1, ‖tr(x_i → x_{i+1}) f_x(x_{i−1} → x_i → x_{i+1})‖_λ / [ pdf_tr(x_i → x_{i+1}) pdf_σx(x_{i−1} → x_i → x_{i+1}) ] )   (4.17)

is the Russian roulette continuation probability after scattering at vertex x_i (i.e. 1 − pdf_c is the absorption probability). The tr factor is defined by Equation 2.15, G_x by 2.40, and f_x by 2.42. All these probabilities might be dependent on the wavelength λ. The norm ‖·‖_λ converts a function of wavelength into a real value. Typically, ‖f(λ)‖_λ = luminance(f(λ)) is used, but we found that using

‖f(λ)‖_λ = f(λ_s),   (4.18)

where λ_s is the wavelength chosen from a cluster for generating a light transport path, together with spectral Multiple Importance Sampling (pdf_c used with the norm 4.18 is wavelength dependent), slightly reduces the color noise in rendered images. The error reduction of applying 4.18 instead of luminance, calculated as the L1 norm of the difference from the reference image, is about 7% ± 2%. This is not substantial, but the resulting images have fewer spots of wrong color, with less intensity, and the technique does not increase rendering time, so it is worth applying.


The path contribution is given by the following equation:

f(λ, x_j) = W_e(x_0 → x_1) G_x(x_0 ↔ x_1) · ∏_{i=1}^{j−1} [ f_x(x_{i−1} → x_i → x_{i+1}) G_x(x_i ↔ x_{i+1}) ] · L_ex(x_{j−1} → x_j),   (4.19)

where W_e is the camera sensitivity and L_ex is the emitted radiance (Equation 2.40). Obviously, the path contribution f(λ, x_j) has the same form as the integrand of the integral form of the Light Transport Equation (Equation 2.43).

These equations omit the cases of short light paths in open environments. However, the fix is obvious: if a camera ray escapes to infinity (a one vertex path), the path contribution to the image is assumed to be zero. Due to the finite absorption probability 1 − q_i at each i-th scattering event, with probability one the path is terminated after some k scattering events. Both techniques used for generating paths are combined using Multiple Importance Sampling. The spectral radiance estimator along a ray x_0 → x_1 is then the sum:

L_λi(x_0 → x_1) = ∑_{j=2}^{k} ( C W^1_j f(λ_i, x^1_j) / pdf_1(λ_s, x^1_j) + C W^2_j f(λ_i, x^2_j) / pdf_2(λ_s, x^2_j) ),   i = 1, 2, . . . , C   (4.20)

W^α_j = pdf_α(λ_s, x_j) / ( ∑_{i=1}^{C} pdf_1(λ_i, x_j) + ∑_{i=1}^{C} pdf_2(λ_i, x_j) ),

where W^α_j is the Multiple Importance Sampling weight. The cluster of C spectral samples is evaluated at once, using a path constructed with a randomly chosen λ_s value. It is not necessary that the numbers of samples taken from techniques 1 and 2 be equal, but in practice this approach gives good results. When Equations 4.14, 4.15 and 4.19 are substituted into 4.20, many factors cancel between numerators and denominators, so the actual implementation is much simpler than it appears to be.

Algorithm Flaws

Unfortunately, Path Tracing is not a particularly efficient approach. Besides the local path sampling limitation, which affects virtually all unbiased light transport algorithms, it exhibits a few major flaws. The most common scene configurations which cause Path Tracing to fail are:

• Dominant indirect illumination – the directly illuminated surface area is small compared to the total scene surface area.

• Caustics, especially from relatively small light sources.

• Inappropriate local sampling – directions favored by local scattering do not match the light transport over whole paths.

The first two problems can be attributed to the asymmetry of Path Tracing light path generation. Light paths are started from the sensor, while this is not always optimal. Mirrors placed near the camera are therefore handled efficiently, but ones placed near light sources are not. The difficult case of dominant indirect illumination is presented in Figure 4.13. In 1990 Arvo and Kirk [Arvo & Kirk, 1990] presented an algorithm which represents light as a stream of particles, photons, which are traced from light sources. After photons hit surfaces, they are scattered in random directions, while every hit (if visible) is recorded by the camera. This technique solves the inefficient handling of mirrors placed near light sources (thus allowing effective rendering of caustics), but it fails when mirrors appear next to the camera, which is an easy case for Path Tracing. In fact it does not solve the asymmetry problem, but inverts it. Moreover, particle transport suffers from very poor performance when the scene contains many light sources and the majority of them are invisible to the camera. That is, scenes are rendered with one camera, but might contain a lot of light sources, which is a potential issue for algorithms which trace rays from lights towards the camera. The issue with inappropriate local sampling is discussed more thoroughly in Sections 4.4.2 and 4.4.3.


Figure 4.13: Comparison of the results of Path Tracing rendering of scenes with different illumination. Left image: a mostly directly illuminated scene with matte surfaces. Right image: moving the sphere under the light source increases the role of indirect illumination, which causes an utter failure of Path Tracing. Both images were rendered with 4M samples at 640x480 resolution.

Figure 4.14: Results of simplified path tracing. Top left: direct illumination only, 2M samples, 55 sec. Top right: one indirect bounce, 2M samples, 1 min 20 sec. Bottom left: two indirect bounces, 2M samples, 2 min. Bottom right: reference full global illumination, 8M samples, 13 min.

Towards Real Time Global Illumination

Despite its poor quality at simulating sophisticated lighting phenomena, Path Tracing is fairly good at rendering simple global illumination effects, especially when rendering time is more important than final image quality. Path Tracing does not employ any complex analysis of gathered samples, and ray casting is the only time consuming procedure in this algorithm. If illumination is restricted to a direct component and, say, at most two indirect ray bounces, a scene can be rendered in reasonable quality and in a reasonable amount of time. What counts as reasonable is a matter of personal taste. One can assess the quality of the images presented in Figure 4.14, since this method never provides any meaningful and objective error bounds. Moreover, if the scene contains caustic illumination, it is missing from the image, and the direct component of the illumination should be more important than the indirect one. In order to avoid black patches in images, an ambient light term must be manually introduced, even if two indirect bounces are computed. Nevertheless, the ambient light has much smaller influence on the illumination than if direct illumination only is evaluated.

Figure 4.15: A batch of light transport paths generated by the Bidirectional Path Tracing algorithm. Each pair of vertexes from the camera and light subpaths is connected, forming a full path.

4.4.2 Bidirectional Path Tracing

Nowadays it is a well known fact that neither Path Tracing nor Particle Tracing is robust: sometimes it is most efficient to trace paths from the camera, and sometimes from a light source. The Bidirectional Path Tracing algorithm does exactly this. This algorithm is far more reliable, since it is able to handle well all scenes which are easy for Path Tracing or Particle Tracing, and some other configurations as well. Moreover, if Bidirectional Path Tracing happens to fail, Path Tracing and Particle Tracing would also fail on such a scene.

Radiance Estimator

The algorithm works by generating one subpath starting from the camera and another subpath starting from a randomly selected light source. Then these paths are connected in a deterministic step. In the original algorithm [Veach, 1997] full paths are generated by connecting every pair of vertexes from both subpaths (see Figure 4.15). Therefore, a path of length k can be generated in k different ways, varying the number of vertexes taken from the camera and light subpaths. Path contributions are then weighted by Multiple Importance Sampling. The probability of generating a camera subpath of length m is given by the following formula:

pdf_E(λ, x[0]) = 1
pdf_E(λ, x[1]) = pdf_A(x_0)
pdf_E(λ, x[m]) = pdf_A(x_0) pdf_W(x_0 → x_1) pdf_tr(x_0 → x_1) G_x(x_0 ↔ x_1) ·
  · ∏_{i=1}^{m−2} [ pdf_σx(x_{i−1} → x_i → x_{i+1}) pdf_tr(x_i → x_{i+1}) G_x(x_i ↔ x_{i+1}) · pdf_c(x_{i−1} → x_i → x_{i+1}) ],   (4.21)


where pdf_σx is defined by Equation 4.16 and G_x by Equation 2.40. The formula for the probability of generating a light subpath of length n is defined in a similar way:

pdf_L(λ, y[0]) = 1
pdf_L(λ, y[1]) = pdf_{A|V}(y_0)
pdf_L(λ, y[n]) = pdf_{A|V}(y_0) pdf_L(y_0 → y_1) pdf_tr(y_0 → y_1) G_x(y_0 ↔ y_1) ·
  · ∏_{i=1}^{n−2} [ pdf_σx(y_{i−1} → y_i → y_{i+1}) pdf_tr(y_i → y_{i+1}) G_x(y_i ↔ y_{i+1}) · pdf_c(y_{i−1} → y_i → y_{i+1}) ].   (4.22)

The Bidirectional Path Tracing radiance estimator is then a sum:

L_λi = ∑_{m=0}^{∞} ∑_{n=0}^{∞} C W_{m,n} f(λ_i, xy[m+n+1]) / ( pdf_E(λ_s, x[m]) pdf_L(λ_s, y[n]) ),   i = 1, 2, . . . , C,   (4.23)

where

W_{m,n} = pdf_E(λ_s, x[m]) pdf_L(λ_s, y[n]) / ( T_1 + T_2 )

T_1 = ∑_{i=0}^{m} ∑_{j=1}^{C} pdf_E(λ_j, x_0 . . . x_i) pdf_L(λ_j, x_{i+1} . . . x_m y_n . . . y_0)   (4.24)

T_2 = ∑_{i=0}^{n} ∑_{j=1}^{C} pdf_E(λ_j, x_0 . . . x_m y_n . . . y_{i+1}) pdf_L(λ_j, y_i . . . y_0)

is the Multiple Importance Sampling weight, and

xy[m+n+1] = x_0 x_1 . . . x_m y_n y_{n−1} . . . y_0   (4.25)

is the concatenated light transport path, whose contribution f(λ, xy) is defined by Equation 4.19. The infinite sum is terminated by Russian roulette based absorption. Similarly as in Path Tracing, with probability one there exist finite subpath lengths s and t. Each term in the sum in the estimator 4.23 is assumed to be zero for m > s and n > t. Again, the C spectral samples are evaluated at once, using subpaths generated with a randomly chosen λ_s value.

The Bidirectional Path Tracing radiance estimator 4.23 contributes to the radiance at several points on the image plane at once, not only at the point defined by the camera ray x_0 → x_1. In fact, each evaluated path with just one camera vertex provides an additional, different camera ray. These additional radiance samples are to be stored at seemingly random positions on the image plane. The image therefore consists of two different, independently stored subimages: one for the initial camera rays x_0 → x_1, and the second for the additional rays of the form x_0 → y_i, i = 0, 1, . . . , t. These two subimages are reconstructed with different algorithms, and the final image is the sum of the two [Veach, 1997].

Optimizations

Due to connections between each pair of vertexes from the light and camera subpaths, the Bidirectional Path Tracing estimator requires a lot of rays to be traced: s + t rays are necessary for generating the subpaths and st rays are required for visibility tests. This can hurt application performance, especially in highly glossy environments, when early absorption is unlikely. Veach proposed a technique called efficiency optimized Russian roulette to solve this issue [Veach, 1997]. Unfortunately, Veach's approach assumes that the partially rendered image is stored in a pixel array, so it cannot be used in our implementation. However, it is enough to generate a path of a given length just once to obtain an estimator with low variance. The actual number of camera and light subpath vertexes used to obtain a full path of length k is chosen at random, which leads to an unbiased estimator with somewhat higher variance, but nevertheless better efficiency due to the significantly reduced computational cost. Our technique requires exactly s + t + max(s, t) rays to be traced if the subpaths have lengths s and t.

Figure 4.16: A batch of light transport paths generated by the optimized Bidirectional Path Tracing algorithm. The algorithm uses specialized direct lighting and a reduced number of visibility tests. Direct illumination points y′_i are used instead of y_0, and paths of a given length [k] are concatenated just once.

The basic Bidirectional Path Tracing algorithm uses just one vertex at a light source for each camera subpath. This is inefficient, since specialized direct lighting techniques provide much less variance while increasing rendering time only slightly. We have implemented such an optimization exactly as suggested by Veach [Veach, 1997]. Both of these optimizations are presented in Figure 4.16.
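
The random choice of the subpath split described above can be sketched as follows: for every obtainable full path length, one pair of subpath vertexes is picked at random, and the contribution is weighted by the inverse of the probability of that pick to stay unbiased. The range computation and all names are our own illustration, not the dissertation's actual code.

    #include <cstddef>
    #include <random>

    // For a camera subpath with s vertexes and a light subpath with t vertexes,
    // a full path of length k can be formed from several (cameraLen, lightLen)
    // splits. Instead of testing them all (s*t visibility rays), pick one split
    // uniformly and weight its contribution by the number of candidates.
    // (The pure camera/light cases are assumed to be handled separately.)
    struct Split { std::size_t cameraLen, lightLen; double invPdf; };

    Split pickSplit(std::size_t k, std::size_t s, std::size_t t, std::mt19937& rng)
    {
        // valid camera prefix lengths: max(1, k - t) .. min(s, k - 1)
        std::size_t lo = (k > t) ? k - t : 1;
        std::size_t hi = (s < k - 1) ? s : k - 1;
        std::uniform_int_distribution<std::size_t> pick(lo, hi);
        std::size_t c = pick(rng);
        double count = static_cast<double>(hi - lo + 1);
        return Split{c, k - c, count};   // invPdf = number of possible splits
    }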

Open Issues

While being far superior to Path Tracing and Particle Tracing, even when fine tuned for performance, Bidirectional Path Tracing is still not an algorithm of choice for rendering complex scenes. Besides the local path sampling limitation, it exposes some issues. The most notable are:

• Mainly invisible illumination – the majority of light subpath to camera subpath connections pass through opaque surfaces.

• Inappropriate local sampling – directions favored by local scattering do not match the light transport over whole paths.

Due to the first flaw, Bidirectional Path Tracing is unsuitable for large open environments. It is very unlikely that a light subpath started somewhere in the sky, and generated independently of the camera subpath, is visible to the camera. The inappropriate local sampling occurs when, for example, a directly visible highly polished surface reflects most camera rays into an unlit part of the scene. The majority of the illumination is then transported by paths which are less likely to be generated, which results in very noisy images.

A similar case is a sparse participating medium illuminated by a very bright light source. Since the medium is not dense, almost all rays pass through it and escape into a darker area of the scene, while few of them interact with the medium. Because interaction occurs with low probability, and interaction points receive relatively strong direct illumination, the resulting image is dark with white spots. The main image feature is then the illuminated participating medium, while the algorithm prefers tracing rays into the far darker surroundings, effectively wasting computational resources.


4.4.3 Metropolis Light Transport

Path Tracing and Bidirectional Path Tracing generate independent light transport paths. If a path with a relatively high radiance contribution is found, it is dropped after being sampled, and a new path is generated from scratch. If at least some important paths happen to be difficult to sample at random (for example, due to inappropriate local sampling), the resulting image has large variance. Metropolis Light Transport [Veach, 1997], on the other hand, is capable of reusing previously generated paths while sampling new ones.

The Metropolis algorithm generates samples from a function f : X → R, where X is the space of all light transport paths. After the initial sample x_0 is constructed, the algorithm creates a sequence of samples x_1, x_2, . . . , x_n. A sample x_i is obtained as a random mutation of the sample x_{i−1}. The mutation is then randomly accepted or rejected, and x_i is set to be either the mutation result, or the unmodified x_{i−1} otherwise. The mutation acceptance probability is chosen in such a way that the pdf of sampling the function f in the limit becomes proportional to f itself.

Path Mutations

In order to obtain the desired sampling density over X, the mutation acceptance probability must be appropriately evaluated. Suppose that a mutation transforms a path x into y. The condition which holds when Metropolis sampling reaches equilibrium, regardless of the initial sample x_0, is called detailed balance:

f(x) pdf_T(x → y) pdf_a(x → y) = f(y) pdf_T(y → x) pdf_a(y → x),   (4.26)

where pdf_T(x → y) is the conditional probability of generating a mutated path y provided that the current path is x, and pdf_a is the mutation acceptance probability. This equation leaves some freedom in choosing an appropriate pdf_a, and since a high mutation acceptance probability improves convergence of the algorithm, the following expression provides the maximal possible acceptance probability:

pdf_a(x → y) = min( 1, [ f(y) pdf_T(y → x) ] / [ f(x) pdf_T(x → y) ] ).   (4.27)

Additionally, in order to properly explore the light transport path space X, the mutation scheme must be ergodic. The algorithm has to converge to the equilibrium state no matter how x_0 is chosen. To ensure ergodicity, it is enough to have a mutation with pdf_T(x → y) > 0 for each x and y such that f(x) > 0 and f(y) > 0.

Mutation Strategies

Mutations of light transport paths are designed to efficiently capture a variety of illumination phenomena. A good mutation strategy should have the following properties:

• large changes to the light transport path,

• high acceptance probability,

• stratification over image plane,

• low cost.

All these properties are difficult to obtain with a single mutation algorithm, so a proper mutation strategy offers a set of individual mutations, which are designed to satisfy different goals. Veach proposed using bidirectional mutations and lens, caustic, and multichain perturbations [Veach, 1997]. Pauly extended this mutation set with a propagation mutation [Pauly, 1999], which enhances the algorithm's robustness in the presence of participating media.


A particular mutation, which is to be used to modify the current path, can be selected at random from a mutation set. The optimal probabilities for mutation selection are somewhat scene dependent. Assigning them roughly equal values produces a strategy which is fairly robust and less prone to excessive error resulting from difficult scene configurations; however, this strategy can be suboptimal for simple scenes without sophisticated illumination phenomena.

Algorithm Initialization and Radiance Estimator

The Metropolis Light Transport algorithm exhibits two issues which have to be addressed – it is prone to the start-up bias phenomenon, and it can compute only the relative brightness of parts of the image. Fortunately, both these problems can be solved by a clever algorithm initialization, due to Veach [Veach, 1997].

The start-up bias is caused by the arbitrary choice of the initial light transport path x_0. Although the algorithm samples paths according to a probability proportional to f(x), it does so only in the limit, when n → ∞. The initial path, however, affects the probability of sampling for any finite n. To avoid start-up bias, it is enough to multiply each sample x_i by a weight W_0 = f(x_0)/pdf(x_0), where x_0 is a path generated by an arbitrary capable light transport algorithm with probability pdf(x_0).

The image generated by weighted Metropolis sampling is unbiased and provides absolute brightness information as well, yet the algorithm is still unusable in practice. For example, if the path x_0 happens to have zero contribution, f(x_0) = 0, the resulting image is a black square. To solve this issue, the Metropolis algorithm can be initialized using m samples generated by a different light transport algorithm. The weight is then evaluated as:

W_0 = (1/m) ∑_{j=1}^{m} f(x_j) / pdf(x_j).   (4.28)

The initial path of the Metropolis algorithm is then chosen at random from the generated set of m paths, x_0 ← x_j, where the j-th path is chosen with probability proportional to f(x_j)/pdf(x_j). The sampling algorithm can be run several times to generate the total of n required mutations, each time starting with a different, randomly chosen initial path x_0. This feature enables efficient parallelization of the algorithm (see Section 5.2.4), as well as a variance based error estimation [Veach, 1997].
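
A minimal sketch of this initialization follows, assuming the m candidate paths and their f(x_j)/pdf(x_j) ratios have already been generated by another algorithm; Seed and the array layout are hypothetical.

    #include <cstddef>
    #include <random>
    #include <vector>

    // Start-up bias elimination sketch: W0 from Equation (4.28) plus selection
    // of the initial path with probability proportional to f(x)/pdf(x).
    struct Seed { double ratio; /* plus the candidate path itself */ };

    std::size_t pickInitialPath(const std::vector<Seed>& cand,
                                std::mt19937& rng, double* W0)
    {
        double sum = 0.0;
        for (const Seed& s : cand) sum += s.ratio;        // sum of f/pdf
        *W0 = sum / static_cast<double>(cand.size());     // Equation (4.28)
        std::uniform_real_distribution<double> u(0.0, sum);
        double r = u(rng), acc = 0.0;                     // roulette-wheel pick
        for (std::size_t i = 0; i < cand.size(); ++i) {
            acc += cand[i].ratio;
            if (r <= acc) return i;
        }
        return cand.size() - 1;
    }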

The Metropolis algorithm generates samples according to a function g : X → R. Since the path contribution f(x) is a spectrum, it has to be converted to a real value. A good solution is to set g(x) = ‖f(x)‖_λ, where Equation 4.18 can be used for the norm ‖·‖_λ. The image generated by the Metropolis algorithm is formed by storing the spectral samples f(x) at the locations (u, v) on the image plane defined by the directions of the camera rays from the paths x. The samples are weighted by W_0 (Equation 4.28).

Optimizations

Metropolis Light Transport tends to concentrate the sampling effort in the relatively brighter parts of the image. While this is a welcome feature if bright and dark parts are mixed in an area smaller than or comparable to the image resolution, dark areas which span several pixels receive few samples as well. It is even possible that some parts of an image would not be sampled at all. The Metropolis algorithm can be optimized to address this issue, though [Veach, 1997]. First, it is possible to record rejected mutations as well, if the samples are weighted appropriately, using Algorithm 4.1.

Moreover, the function f can be modified to directly account for differences in image brightness. Using samples from the Metropolis algorithm initialization, a tentative image I with a tiny resolution (say, 32×32) can be generated. Then, the Metropolis algorithm samples according to a function g, which is defined as the ratio of f to the value of I at a given sample location. The magnitude of the g function typically does not exhibit substantial variation over different image regions.


Algorithm 4.1: Metropolis sampling optimized for recording both accepted and rejected mutations.

for i = 1 to n do
    y ← mutate(x_{i−1})
    α ← pdf_a(x_{i−1} → y)
    record(α · f(y))
    record((1 − α) · f(x_{i−1}))
    ξ ← random()
    if ξ < α then
        x_i ← y
    else
        x_i ← x_{i−1}
    end if
end for

When a scene does not contain sophisticated lighting phenomena, and the majority of the illumination is simple direct illumination, Metropolis Light Transport yields to basic Path Tracing. Veach [Veach, 1997] proposed excluding the direct illumination component from the Metropolis algorithm and evaluating it with a more efficient approach. However, as he noted, if the scene contains a lot of invisible light sources, this optimization may in fact turn out to be a serious handicap.

Comparison of MLT and BDPT Algorithms

Metropolis Light Transport is substantially more robust than Bidirectional Path Tracing. It solves issues related to inappropriate local sampling and to difficult visibility between light sources and the camera. Moreover, the MLT algorithm reduces the impact of the local path sampling limitation; however, this flaw is still not completely solved – when light sources are sufficiently small and scene materials glossy, MLT is bound to fail as well. Unfortunately, the mutation based sampling scheme is less efficient on simpler scenes, which can be rendered more efficiently with BDPT.

Energy Redistribution Path Tracing

Energy Redistribution Path Tracing [Cline et al., 2005] is a relatively recent light transport algorithm based on light path mutations, similar to Metropolis Light Transport. The initial step of this algorithm generates light transport paths using simple path tracing. The number of such paths necessary to reduce noise to an acceptable level is, however, much smaller than in pure path tracing. Then, the generated paths are mutated, possibly passing through image areas associated with neighboring pixels, therefore redistributing the energy of a path tracing sample over a larger image area.

4.4.4 Irradiance and Radiance Caching

Irradiance and Radiance Caching algorithms are biased solutions to the light transport equation, based on the assumption that indirect illumination is likely to change slowly across the rendered scene, and often can be interpolated using already computed nearby irradiance/radiance values. All algorithms which use this kind of interpolation must solve three tasks: decide when to compute new values and when to interpolate among nearby ones, provide a structure for storage of cached data, and efficiently evaluate new samples if interpolation happens to be unwanted. Images produced by irradiance/radiance caching typically are not noisy (only the direct illumination component can introduce some noise), but indirect illumination tends to be blurred.

The irradiance caching approach assumes that the cached value is, as the name suggests, irradiance, and therefore caching and interpolation can take place only on perfectly diffuse surfaces, making the algorithm severely limited. The caching scheme is often used on other matte surfaces [Pharr & Humphreys, 2004]; however, this inevitably causes error which cannot be reduced by simply taking more samples. The radiance caching technique is an improvement which does not exhibit such a major flaw – it allows caching and interpolation on diffuse and moderately glossy surfaces. The radiance caching approach caches the full directional light distribution, using spherical harmonics basis functions for its representation.

Since both irradiance and radiance caching trace rays only in the direction from the camera to the light sources, they are prone to substantial error when rendering phenomena like caustics. It has long been known that reliable rendering of such effects requires tracing rays in the opposite direction [Veach, 1997, Jensen, 2001]. Nowadays irradiance and radiance caching techniques are rarely used alone – instead they serve as a fairly important optimization of the final gathering step of the Photon Mapping approach (see Section 4.4.5 for a detailed description of this approach).

4.4.5 Photon Mapping

Photon Mapping is a biased, two pass algorithm for solving the light transport equation. In the first pass, light subpaths are traced from light sources, and the carried energy is stored as so called photons at points where rays intersect scene geometry. Then, a structure specialized for fast search (the so called photon map) is built. Finally, in the second pass, an image is rendered using flux estimation from the photon map as a source of indirect illumination. The key innovation in Photon Mapping is the usage of a specialized data structure for photon storage, independent of scene geometry. Photon Mapping is particularly effective in rendering of caustics and is not prone to the local path sampling limitation. Since Photon Mapping is biased, the method error is not just random noise – in general it produces images with parts which are consistently too bright or too dark. Fortunately, the method is at least consistent – by increasing the number of stored photons the algorithm error can be decreased to an arbitrarily small value.

Building of Photon Maps

In the original algorithm [Jensen, 2001], photons are emitted from light sources with probability proportional (or roughly proportional) to the emitted radiance. The photons are then scattered through the scene using BSDF sampling and Russian roulette based absorption. In particular, photon tracing is equivalent to the construction of light subpaths in Bidirectional Path Tracing. At each intersection of a ray with scene geometry, if fs at the intersection point contains a diffuse part, the photon is stored. A stored photon contains three pieces of information:

• point of intersection,

• incoming photon direction,

• photon weight,

which are necessary for rendering from a photon map. Additionally, photons are flagged as direct, caustic or indirect, and this flag is stored as well. For efficiency reasons, caustic photons are stored both in the normal map and in a separate caustic map. Moreover, since photons stored on scene surfaces are treated differently than ones stored in the scene volume, these two kinds of photons are stored in different maps. Thus, four maps are constructed: surface global and caustic, and volumetric global and caustic. Some of these maps might be empty if, for example, the scene contains no specular surfaces capable of generating caustics. The method for tracing photons is presented in Algorithm 4.2.

Photons are initially stored in arrays. After the necessary number of photons is traced, the appropriate photon maps are built. Jensen [Jensen, 2001] proposed a kd-tree structure for storing photons. This structure is very well suited for the purpose, since it poses no substantial storage overhead over a simple array, can be constructed in O(n log n) time, and the expected time of searching for photons is O(log n), where n is the number of photons stored in the tree.
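
The sketch below illustrates a search in such a tree, assuming Jensen's heap-like array layout, where the children of node i are stored at 2i+1 and 2i+2 and each node remembers its splitting axis; the max-heap keeps the k nearest photons found so far, so the search radius can shrink as better candidates appear.

    #include <queue>
    #include <utility>
    #include <vector>

    struct Photon {
        float pos[3];   // intersection point
        float dir[3];   // incoming photon direction
        float power[3]; // photon weight
        int   axis;     // splitting dimension assigned during tree construction
    };

    // Collects up to k photons around x; maxDist2 starts as the squared maximum
    // search radius and shrinks once k candidates have been gathered.
    void locatePhotons(const std::vector<Photon>& tree, int node, const float x[3],
                       std::size_t k, float& maxDist2,
                       std::priority_queue<std::pair<float, int>>& found)
    {
        if (node >= static_cast<int>(tree.size())) return;
        const Photon& p = tree[node];

        float planeDist = x[p.axis] - p.pos[p.axis];
        int nearChild = 2 * node + (planeDist < 0 ? 1 : 2);
        int farChild  = 2 * node + (planeDist < 0 ? 2 : 1);

        locatePhotons(tree, nearChild, x, k, maxDist2, found);
        if (planeDist * planeDist < maxDist2)   // far side may still contain hits
            locatePhotons(tree, farChild, x, k, maxDist2, found);

        float d2 = 0;
        for (int i = 0; i < 3; ++i)
            d2 += (p.pos[i] - x[i]) * (p.pos[i] - x[i]);
        if (d2 < maxDist2) {
            found.push({d2, node});
            if (found.size() > k) {             // keep only the k nearest photons
                found.pop();
                maxDist2 = found.top().first;   // shrink the search radius
            }
        }
    }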


Algorithm 4.2: Tracing photons for construction of photon maps.

N ← 0;  // number of shot photons
S ← 0;  // number of stored photons
while S < required photons do
    N ← N + 1;
    emit(photon, position, direction);
    flag(photon, as direct);
    while true do
        position ← nearest intersection(position, direction);
        if position = ∞ then break;  // ray escaped from scene
        if material(position) has matte part then
            if position ∈ A then store(photon, surface global map);
            else store(photon, volume global map);
            if flagged(photon, as caustic) then
                if position ∈ A then store(photon, surface caustic map);
                else store(photon, volume caustic map);
            end
            S ← S + 1;
        end
        direction ← scatter(position, direction, fs, pdf);
        absorption prob ← 1 − min(1, luminance(fs)/pdf);
        if random() < absorption prob then break;
        else scale(photon, fs/(pdf · (1 − absorption prob)));  // divide by survival probability
        if not flagged(photon, as indirect) and glossy(fs) then flag(photon, as caustic);
        else flag(photon, as indirect);
    end
    if N > maximum shots then break;  // give up
end

Rendering of Photon Maps

When all the necessary photon maps are constructed, the scene is rendered with specialized ray tracing. The light transport paths which are to be accounted for are partitioned into a few sets:

• direct illumination (paths like L(M|G)²M?G*(M|G)²E, i.e. a directly illuminated matte surface or a light source, seen through zero or more glossy reflections),

• caustics (paths like L(M|G)²G+MG*(M|G)²E, i.e. a matte surface illuminated through one or more glossy reflections, seen through zero or more glossy reflections),

• indirect illumination (paths like L(M|G)²(G|M)*M(G|M)*MG*(M|G)²E, i.e. paths which contain at least two matte scattering events).

Each set is evaluated in a different manner. Direct illumination (and illumination visible through an arbitrary number of glossy reflections) is always evaluated by sampling light sources, without relying on photon maps. In this case, subpaths are traced from the camera through an arbitrary number of glossy (G) reflections until either a matte (M) reflection or an absorption occurs. Directly visible caustics (and caustics visible through an arbitrary number of glossy reflections) are rendered using the separate caustic maps. The indirect illumination (the rest of the possible light transport paths) is rendered using the global photon maps.

When a photon map is decided to be used, the incoming radiance is based on the incoming flux estimated from the map content, instead of being calculated exactly. The reflected radiance is evaluated from the incoming radiance using formula (2.9) if x ∈ A, or (2.18) if x ∈ V. The incoming radiance can be expressed in terms of flux:

$$L_i(x,\omega) = \begin{cases}\dfrac{d^2\Phi_i(x,\omega)}{dA(x)\, d\sigma^{\perp}(\omega)}, & \text{if } x \in A,\\[2ex]\dfrac{d^2\Phi_i(x,\omega)}{\sigma_s(x)\, dV(x)\, d\sigma(\omega)}, & \text{if } x \in V,\end{cases} \qquad (4.29)$$

which after substitution into Equations (2.9) or (2.18), respectively, gives the following expressionfor the reflected radiance:

$$L_r(x,\omega_o) = \begin{cases}\displaystyle\int_{\Omega} f_s(x,\omega_i,\omega_o)\,\frac{d\Phi_i(x,\omega_i)}{dA}, & \text{if } x \in A,\\[2ex]\displaystyle\int_{\Omega} f_p(x,\omega_i,\omega_o)\,\frac{d\Phi_i(x,\omega_i)}{dV}, & \text{if } x \in V.\end{cases} \qquad (4.30)$$

The incoming flux can be approximated using the stored photons:

$$L_r(x,\omega_o) \approx \begin{cases}\dfrac{1}{\Delta A}\displaystyle\sum_{i=1}^{M} f_s(x,\omega_{pi},\omega_o)\,\Delta\Phi^{A}_{pi}(x,\omega_{pi}), & \text{if } x \in A,\\[2ex]\dfrac{1}{\Delta V}\displaystyle\sum_{i=1}^{M} f_p(x,\omega_{pi},\omega_o)\,\Delta\Phi^{V}_{pi}(x,\omega_{pi}), & \text{if } x \in V,\end{cases} \qquad (4.31)$$

where M is the number of photons used in the flux estimate and ∆Φpi is the flux associated with the ith photon. The photons stored on surfaces cannot be used to estimate volumetric flux and vice versa, hence the superscripts Φ^A and Φ^V. The estimation is performed in an area ∆A or in a volume ∆V centered around the point of interest x. The simplest way of flux estimation is expanding a sphere around x until it contains the required number M of photons or a prespecified maximum search radius is reached (in the latter case M is reduced, possibly even to zero in unlit regions of the scene). If a sphere with radius r is used for photon selection, then ∆V = (4/3)πr³ and ∆A = πr² (the intersection of the sphere with a surface, which is assumed to be locally flat within a small radius around x). Jensen [Jensen, 2001] proposed a variety of optimizations for more effective flux approximation, the majority of which we have implemented.
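
For the surface case, the estimate (4.31) reduces to a few lines of code. The sketch below assumes the photons within radius r have already been gathered and that the material system supplies an fs evaluation; it is an illustration, not a full implementation.

    #include <vector>

    struct Vec3 { float x, y, z; };
    struct GatheredPhoton { Vec3 dir; Vec3 flux; };   // ω_pi and ΔΦ_pi

    // Surface radiance estimate: L_r ≈ (1/πr²) Σ fs(x, ω_pi, ω_o) ΔΦ_pi.
    Vec3 estimateRadiance(const std::vector<GatheredPhoton>& found, float r,
                          const Vec3& wo,
                          Vec3 (*fs)(const Vec3& wi, const Vec3& wo))
    {
        const float invArea = 1.0f / (3.14159265f * r * r);  // ΔA = πr²
        Vec3 lr{0, 0, 0};
        for (const GatheredPhoton& p : found) {
            Vec3 f = fs(p.dir, wo);
            lr.x += f.x * p.flux.x;
            lr.y += f.y * p.flux.y;
            lr.z += f.z * p.flux.z;
        }
        lr.x *= invArea; lr.y *= invArea; lr.z *= invArea;
        return lr;
    }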

The flux estimation using photon maps is the source of bias in the Photon Mapping algorithm. The method is consistent, however, because under some circumstances, as the number of emitted photons N increases, the estimation error can be made arbitrarily small [Jensen, 2001]:

$$\forall_{\alpha\in(0,1)} \quad \lim_{N\to\infty}\sum_{i=1}^{\lfloor N^{\alpha}\rfloor} f_s(x,\omega_{pi},\omega_o)\,\Delta\Phi_{pi}(x,\omega_{pi}) = L_r(x,\omega_o). \qquad (4.32)$$

The number of photons M = ⌊N^α⌋ used in the radiance estimate increases to infinity together with N. Because of α, the two infinities are of different order, so the photon search radius r becomes arbitrarily small. Equation (4.32) is valid if f does not contain δ distributions and if, whenever the point x lies on a surface, the surface is locally flat around x. This formula is the key to ensuring convergence of the one pass variant of Photon Mapping described later in this section.

Full Spectral Rendering Support

Multiple Importance Sampling based spectral rendering (see Section 4.3) integrates flawlessly with Path Tracing and Bidirectional Path Tracing. Unfortunately, the notable case where spectral sampling causes difficulties is Photon Mapping, designed by Jensen to work optimally with RGB triplets only [Jensen, 2001]. There are two issues to solve – first, there are no complete light transport paths which connect a light source and the camera, and second, millions of individual photons have to be stored, causing excessive memory consumption if a full spectrum is used to describe them. A recent work [Lai & Christensen, 2007] addresses the memory issues. Unfortunately, this algorithm converts photons' spectra to RGB prior to storing them in a map, and converts RGB back to spectra when searching through photons.

Figure 4.17: Comparison of rendering results between Photon Mapping (left image) and Bidirectional Path Tracing (right image), at 800x450 resolution. Both images were rendered in approximately the same time, using 256k image samples with 32k photons, and 512k image samples, respectively. BDPT tends to produce a somewhat noisy image, while PM samples are more expensive to evaluate, and fewer of them can be generated in the same time.

Our approach, on the other hand, is designed to always converge to the true result as the number of photons increases, and therefore significant compression of spectral data is unsuitable. We trace and store clusters of photons with different wavelengths, instead of describing them by RGB triplets. First, in order to explore the wavelength space properly, each emitted photon cluster must have individually chosen wavelengths. The obvious place for optimization is that one emitted photon cluster typically corresponds to several stored photon clusters, and therefore cluster wavelengths are stored once for each emission. Moreover, for storing energy, one can experiment with a non-standard floating point format instead of IEEE single precision. Using 8-sample clusters requires 32B of data for an individual stored photon, not to mention an additional 32B for each emission, which is far more than the 12B required by an RGB based implementation. If a compact float format with a shared exponent is used, the latter can be compressed even to 4B, however, with potential loss of image quality. We have left this for further research.
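
The storage layout can be sketched as follows; the field names and the emission indexing scheme are illustrative, but the sizes match the figures quoted above: 32B of energy per stored cluster, plus 32B of wavelengths shared by all clusters of one emission.

    #include <cstdint>

    // One record per emitted cluster: the 8 individually sampled wavelengths
    // are shared by every photon cluster stored along this emission's path.
    struct EmissionRecord {
        float wavelengths[8];           // 32B
    };

    // One record per stored cluster; position and incoming direction are kept
    // alongside, exactly as in an RGB based implementation.
    struct StoredPhotonCluster {
        float    energy[8];             // 32B of spectral energy
        uint32_t emission;              // index into the EmissionRecord array
    };

    static_assert(sizeof(EmissionRecord) == 32, "expected 8-sample clusters");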

When a photon is about to be stored, its energy is multiplied by the weight given by Equation (4.13), which accounts for all encountered wavelength dependent scattering events. In the second pass, rendering from the photon map is performed. Camera rays should be weighted similarly prior to photon map lookups. In classic Photon Mapping, photons are searched for in a sphere centered around the intersection point. The sphere radius should be chosen carefully – too small causes noise and too large causes blurriness. We extend this approach to wavelength search as well. If a photon cluster is selected for a flux estimate by the sphere test, additional tests are performed on its individual photons (with their associated wavelengths) using a spectral search distance in wavelength space. As with the spatial radius, the spectral distance must be chosen carefully in order to avoid noise and blurriness.

Optimizations

The basic Photon Mapping structure can be optimized in numerous ways. First, final gathering [Jensen, 2001] is added to reduce the blurriness resulting from using the global photon map directly to estimate indirect flux. Since final gathering is computationally expensive, its cost is often reduced with irradiance or radiance caching (see Section 4.4.4). Apart from these two most important improvements, there are several other useful Photon Mapping optimizations – e.g. flagging all specular surfaces and explicitly shooting photons onto them, or using shadow photons for accelerated direct illumination evaluation (see [Jensen, 2001]). The results of Photon Mapping compared with Bidirectional Path Tracing are presented in Figure 4.17.


Figure 4.18: Comparison of rendering results between one pass Photon Mapping (left image) and two pass Photon Mapping (right image). The direct illumination is rendered using the photon map to show the rendering error more clearly. Both images were rendered at 800x600 resolution, in approximately the same time, using 256k image samples. The left image rendering progressively increased the number of photons from 1k to 128k, while the right one used 128k photons from the start. The most notable artifact in the left image is noise in the shadowed area on the left side of the ring – a remnant of the blurriness from rendering with too few photons.

One Pass Photon Mapping

In fact, Photon Mapping can be done in one pass, with only minor losses in efficiency compared to the original approach – see Figure 4.18. This approach is very useful if an interactive preview of rendering results is required (described in more detail in Section 5.4). The new algorithm uses a linear function of the number of image samples (n) to estimate the minimal photon count necessary in the photon map to obtain an image with quality determined by n. Therefore, the photon map is no longer a static structure – new photons are added while new image samples are rendered.

Two issues immediately have to be solved – synchronization of read and write accesses to the photon map structure in parallel photon mapping, and kd-tree balancing. Synchronization can be performed with simple read-write locks (the classic readers-writers problem). Kd-tree balancing, on the other hand, requires a significant modification of the algorithm. We have chosen to balance the scene space instead of the photons. The original algorithm starts with the bounding box of all photons (unknown in our approach) and in each iteration places the splitting plane at a position such that half of the photons remain on one side of the plane. In contrast, our algorithm starts with the bounding box of the entire scene, and in each iteration splits it in half across the dimension in which the box is longest. Splitting stops when all nodes contain fewer photons than a certain threshold (5-6 seems to be optimal) or a maximum recursion depth is reached. Adding new photons requires just splitting those nodes which happen to contain too many photons.
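
A sketch of the splitting rule follows; names and the exact bookkeeping are illustrative. A node whose photon count exceeds the threshold is split in half across the longest dimension of its bounding box, and its photons are distributed between the two children.

    #include <cstddef>
    #include <vector>

    struct Box  { float lo[3], hi[3]; };
    struct Node {
        Box box;
        std::vector<int> photons;          // photon indices; leaves only
        int children[2] = {-1, -1};
        int axis = -1;
        float plane = 0;
    };

    constexpr std::size_t kMaxPhotonsPerLeaf = 6;  // 5-6 appears to be optimal

    void maybeSplit(std::vector<Node>& nodes, int n, int depth, int maxDepth,
                    const float (*photonPos)[3])
    {
        if (nodes[n].photons.size() <= kMaxPhotonsPerLeaf || depth >= maxDepth)
            return;

        // The longest extent of the node's box determines the splitting axis.
        int axis = 0;
        float best = 0;
        for (int d = 0; d < 3; ++d) {
            float ext = nodes[n].box.hi[d] - nodes[n].box.lo[d];
            if (ext > best) { best = ext; axis = d; }
        }
        float plane = 0.5f * (nodes[n].box.lo[axis] + nodes[n].box.hi[axis]);

        Node left, right;
        left.box = right.box = nodes[n].box;
        left.box.hi[axis] = right.box.lo[axis] = plane;
        for (int p : nodes[n].photons)
            (photonPos[p][axis] < plane ? left : right).photons.push_back(p);

        nodes[n].axis = axis;
        nodes[n].plane = plane;
        nodes[n].photons.clear();
        nodes[n].children[0] = static_cast<int>(nodes.size()); nodes.push_back(left);
        nodes[n].children[1] = static_cast<int>(nodes.size()); nodes.push_back(right);

        maybeSplit(nodes, nodes[n].children[0], depth + 1, maxDepth, photonPos);
        maybeSplit(nodes, nodes[n].children[1], depth + 1, maxDepth, photonPos);
    }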

The idea is somewhat similar to the Irradiance Caching algorithm [Ward et al., 1988]. As in that method, our approach starts with an empty structure and fills it during rendering. However, Irradiance Caching calculates irradiance samples when they are needed by camera rays, while our modified Photon Mapping traces photons in a view independent manner.

Algorithm Flaws

Apart from being biased, Photon Mapping is not always the best known solution for every scene. First, when a scene contains a lot of illuminated, but invisible, surfaces, many photons are stored on them, which leads to a waste of computational resources and excessively blurry indirect illumination. The Fradin algorithm [Fradin et al., 2005] is designed to mitigate this issue. Bidirectional Path Tracing is no better in such situations, producing noisy images instead, whereas Metropolis Light Transport is significantly more reliable. Moreover, we have found that Photon Mapping has problems with direct illumination from strongly directional light sources, like car headlights or torches. In such cases, it is better to evaluate the illumination by sampling paths starting from the light sources, which is precisely what Bidirectional Path Tracing does.

Figure 4.19: Quickly generated images with one pass Photon Mapping. The left image was generated in 8 seconds, while the progressively refined right image took 30 seconds. Both images were rendered at 640x480 resolution, in parallel, using a 2.4GHz Intel Core 2 Duo.

Towards Real Time Global Illumination – Second Approach

The Photon Mapping algorithm is a good candidate for real time global illumination. The obvious approach – using relatively few photons in the photon map – causes blurry indirect illumination, yet it does not adversely affect the direct illumination component and renders quickly. If the one pass variant is used, the image can be progressively refined as more photons are added to the photon map. The results are presented in Figure 4.19.

4.5 Combined Light Transport Algorithm

Section 4.4 gave a brief description of the most important currently used ray tracing algorithms for global illumination evaluation. The important remark is that some of these algorithms can be better than others when rendering a particular scene, while the situation can be the opposite with a different scene. Therefore, the best algorithm cannot be chosen without detailed knowledge of what the scene contains, and the user has to decide which one to use – a situation which should be avoided if fully automatic global illumination evaluation is the goal.

The rest of this section starts with the motivation behind the proposed light transport algorithm. Next, there is a description of the choice of the unbiased part and of combining it with Photon Mapping. The main difficulty here is to decide when a light transport path should be sampled by an unbiased technique and when estimated using the photon map. Then, results of the combined approach are presented, and finally, possible future improvements are described.

4.5.1 Motivation

Our idea is to combine more than one algorithm in order to achieve the best possible result. Usage of one of the unbiased techniques, e.g. Bidirectional Path Tracing, is a good idea for a lot of scenes, due to their modest memory requirements, ease of parallelization, and error which is easy to estimate. The combined algorithm, however, should detect difficult cases, resulting primarily from the local path sampling limitation, and when the estimated error exceeds some threshold, switch to one of the Photon Mapping variants. Therefore, Photon Mapping is used only when strictly necessary. Thus, the proposed combined algorithm can reduce the risk of excessive memory requirements and the risk of unpredictable output due to bias.

The combined light transport algorithm has the potential to be much more reliable than any unbiased technique. On the other hand, the basic Photon Mapping technique is often the algorithm of choice, used no matter what the rendered scene represents. We argue that this approach can lead to poor results, even on simple scenes. For example, the scene presented in Figure 4.20 contains strong direct illumination from a highly directional light source, and caustics due to a metallic ring. The direct illumination component is evaluated inefficiently using the photon map, and even more poorly using the direct lighting strategy recommended for Photon Mapping. In fact, the best possible way to account for this kind of illumination is to trace rays from the light source and record their intersections with scene geometry directly by the camera – exactly what Bidirectional Path Tracing does. Moreover, caustics are not always created by focusing illumination (caustics inside versus outside of the ring). If a caustic is created without the focusing effect, the photon density is far lower, and BDPT is able to render such a caustic far more efficiently than Photon Mapping, which is considered the algorithm of choice for rendering caustics. On the other hand, BDPT fails in rendering indirectly visible caustics (i.e. their reflections in mirrors).

Considering the examples mentioned above, it is clear that neither Bidirectional Path Tracing nor Photon Mapping working alone can be reliable on a wide variety of real world scenes. In fact, they cannot efficiently render all parts of the very simple test scene from Figure 4.20. This is the main reason for the development of an algorithm which contains both BDPT and PM components, and selects one of them at runtime, on a per light transport path basis.

4.5.2 Merging of an Unbiased Algorithm with Photon Mapping

The combined algorithm contains two parts – Bidirectional Path Tracing and Photon Mapping. In order to render scenes properly, the algorithm has to decide which part is used for which light transport paths. No path may be skipped, since this would cause a too dark resulting image, and similarly, no path may be accounted for twice. The main component of the algorithm, which drives the computation, is the Bidirectional Path Tracing part. Having constructed a light transport path, it decides whether to add its contribution to the result, or pass it for evaluation by Photon Mapping.

The BDPT algorithm can generate a light transport path with k vertexes using k + 1 techniques, by varying the number of vertexes generated by tracing rays from the camera and from light sources. These k + 1 techniques are assigned weights in such a way that the weighted sum of their contributions produces an image with as low variance as possible. That is, a technique which can be a source of high variance is assigned a low weight. However, if the algorithm is to be correct, the weights must be normalized to add up to one. Therefore, if all techniques for generating a given light transport path have low weights, the weighted sum does not help at all, and BDPT works very inefficiently, producing an image with overly high variance. Moreover, if the scene contains point light sources and perfectly specular materials and is rendered with a pinhole camera, BDPT is likely to miss certain illumination features completely. The task of the combined light transport algorithm is thus to detect such cases, omit the evaluation of these paths by BDPT (if they are not missed anyway), and estimate their contribution using the photon map instead.

The modified BDPT part of the combined algorithm is designed to look for special cases of light transport paths – paths of the form LG+XG+E, X → D | DG+XG+D. Such paths have at least five vertexes, have all matte scattering events separated by glossy ones, and have glossy scattering events next to light sources and the camera. Thorough testing of the BDPT algorithm shows that such paths have low weights for all BDPT sampling techniques, and are the main source of error in images produced by this algorithm. If such a path is detected, it is immediately omitted from further processing by the BDPT component.
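
A possible detector for these paths, matching the verbal characterization above (at least five vertexes, glossy events adjacent to the light and the camera, at least one matte event, and no two matte events adjacent), is sketched below; the vertex type enumeration is hypothetical.

    #include <cstddef>
    #include <vector>

    enum class VertexType { Light, Matte, Glossy, Eye };

    // Returns true if the path should bypass BDPT and be estimated with the
    // photon map instead.
    bool isPhotonMapPath(const std::vector<VertexType>& path)
    {
        if (path.size() < 5) return false;                  // at least 5 vertexes
        if (path.front() != VertexType::Light ||
            path.back()  != VertexType::Eye) return false;
        if (path[1] != VertexType::Glossy) return false;    // glossy next to light
        if (path[path.size() - 2] != VertexType::Glossy)
            return false;                                   // ... and next to camera

        bool sawMatte = false;
        for (std::size_t i = 1; i + 1 < path.size(); ++i) {
            if (path[i] == VertexType::Matte) {
                sawMatte = true;
                // matte events must be separated by glossy scattering
                if (path[i - 1] == VertexType::Matte ||
                    path[i + 1] == VertexType::Matte) return false;
            }
        }
        return sawMatte;    // X derives at least one matte vertex
    }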

The Photon Mapping component starts with an empty photon map. If the BDPT component detects glossy surfaces next to the camera and light sources, it orders the PM component to start filling the photon map – it is highly likely that BDPT is about to generate a high variance light transport path, or even miss some illumination features completely. Samples from the photon map are then added to the BDPT generated image. These samples are restricted to contain illumination only from the paths omitted by the BDPT component.

Figure 4.20: Comparison of results of various light transport algorithms. Top left image: Bidirectional Path Tracing, 4M samples. Top right image: Photon Mapping, 4M samples, 0.5M photons. Bottom image: Combined algorithm, 4M samples, 0.5M photons, which uses either BDPT or PM, estimating which one is likely to perform better. BDPT is roughly 2.5 times faster than PM and the proposed combined algorithm.

Additionally, the combined algorithm employs one important optimization – the same subpaths traced from light sources are used both for BDPT samples and for photon map filling. This optimization significantly reduces the number of traced rays, at the cost of having the photon map highly correlated with BDPT sampling. During our tests, however, this correlation appeared to be harmless.

4.5.3 Results and Conclusion

The comparison of images generated by the newly proposed algorithm with Path Tracing, Bidirectional Path Tracing, and Photon Mapping is given in Figure 4.21. A detailed numerical comparison of the convergence of these algorithms is presented in Chapter 7, together with the reference image in Figure 7.3. As expected, the local path sampling limitation does not cause the combined algorithm to fail, and the bias from Photon Mapping is not a serious issue either. On the other hand, the algorithm cannot cope well with scenes where light sources are separated from the camera by difficult visibility blockers. This is not surprising, since such scenes cannot be efficiently rendered by either Photon Mapping or Bidirectional Path Tracing.


Figure 4.21: Comparison of results of various light transport algorithms. Top left: Path Tracing,2M samples. Top right: Bidirectional Path Tracing, 1M samples. Bottom left: Photon Mapping,256k samples, 128k photons. Bottom right: Combined algorithm, 1M samples, 32k photons, whichuses either BDPT or PM, estimating which one is likely to perform better. All images were renderedin approximately the same time.


Chapter 5

Parallel Rendering

Currently, to achieve the best possible performance, rendering must be run in parallel. Due to recent advancements in microprocessor technology, significant performance improvements come from the multiplication of computational cores rather than from substantial increases in the efficiency of sequential processing. Fortunately, the majority of image synthesis algorithms exhibit a very high degree of inherent parallelism. In the simplest case, each of millions of image fragments can be evaluated independently of the others. However, more advanced techniques often introduce some dependencies in order to, sometimes significantly, accelerate computations.

Parallel rendering is not a new idea, though. A lot of research is dedicated to this area, including parallelization of well-known sequential algorithms. Jensen [Jensen, 2000] showed how to run his Photon Mapping in a parallel environment. Debattista et al. [Debattista et al., 2006] parallelized Irradiance Caching. Parallelization allows effective rendering of huge data sets. Dietrich et al. [Dietrich et al., 2005] and Stephens et al. [Stephens et al., 2006] described techniques for interactive rendering of massive models using a shared memory approach. Some important advancements allow running parallelized ray tracing algorithms in real time. Parker et al. [Parker et al., 1999] showed a method for running a real-time classic ray tracer. Dmitriev et al. [Dmitriev et al., 2002], Wald et al. [Wald et al., 2002], Benthin et al. [Benthin et al., 2003] and Dachsbacher et al. [Dachsbacher et al., 2007], on the other hand, implemented more advanced algorithms which still yield interactive frame rates. Benthin [Benthin, 2006] also proposed using coherent rays in addition to parallelization. These algorithms are variants of severely restricted global illumination, though, and do not try to solve the light transport equation correctly. A recently popular model for parallelization is stream processors [Purcell, 2004, Gummaraju & Rosenblum, 2005, Gummaraju et al., 2007]. Purcell [Purcell, 2004] describes how this model works, expresses ray tracing based algorithms with it, and finally implements a stream processor on a DirectX 9 class GPU. Unfortunately, this design and implementation is inflexible and suboptimal, due to quirks and peculiarities of the GPU programming model of that time. Currently (2008/2009) it seems that technologies such as OpenCL (Open Computing Language) [Munshi, 2009, Trevett, 2008] or Intel Ct (C for throughput computing) [Ghuloum et al., 2007] have a lot of potential to replace GPU shading languages, which still have insufficient programmability. They are, however, still in a development stage.

The rest of this chapter starts with a detailed description of the concept of stream processing. Next, there is an explanation of how to express ray tracing in terms of stream computation. Then, there is a discussion of the choice of optimal hardware for the implementation of an extended stream machine, and finally, a technique for interactive visualization of ray traced results using the CPU cooperating with the GPU is presented.


5.1 Stream Processing

Stream processing is a constrained form of general parallel computation. The main idea behind this concept is a simplification of an arbitrary parallel program and of the hardware necessary to execute it, which results in increased performance, however at the cost of limited flexibility. Thus, not every parallel algorithm can be run on stream machines. Fortunately, some ray tracing techniques can easily be expressed as stream algorithms. Stream processing is best suited for compute-intensive tasks which exhibit no dependencies between operations on different data and locality of memory accesses. It is also possible to offload the machine's central processor by running such parts of an application on specialized streaming hardware.

This section starts with a detailed description of the principles of stream processing, followed by our novel extension to the stream machine concept – a software cache – which is very convenient for some more advanced ray tracing algorithms. Finally, there is a description of quasi-Monte Carlo integration as a stream computation.

5.1.1 Stream Processing Basics

The stream processing concept is based on so called streams and kernels. Streams are sets of records, while operations on them are performed by kernels. A kernel is an arbitrary function which takes exactly one stream as input and one or more streams as output. A kernel operates on individual records of an input stream independently, thus many such records can be processed in parallel. When a record is processed, the kernel writes zero or more result records into any combination of output streams. The kernel is executed once for each input record, but due to the parallel nature of processing, the order of output records does not necessarily match the order of input ones. In some definitions, however, this order is forced to be kept [Purcell, 2004]. Kernel execution must be stateless, that is, a kernel cannot store static variables of any kind. Additionally, kernels have access to arbitrary constant data structures, which can be read during processing of an input stream. The concept of stream processing is presented in Figure 5.1.


Figure 5.1: Stream machine basic architecture.

Kernels and streams can be combined into more complex designs. For example, the output of one kernel can be the input of another. Additionally, more than one kernel can write to the same stream. Kernels can be organized into feedback loops, where one of the outputs is written into the input stream. However, there is no possibility of having more than one input stream, since the kernel would not know how to match input records for processing.

5.1.2 Extended Stream Machines with Cache

The major issue, seriously limiting the variety of algorithms suitable for stream machines, is the total lack of any possibility of data transfer between executions of a kernel on different records of the input stream. We argue that adding an additional read-write memory can significantly improve the performance and potential of stream machines. We refer to this extra memory as a cache. Basically, the presented cache design guarantees atomicity of read and write operations, although the order of individual operations is not guaranteed. That is, if processing of an earlier element in a stream by a kernel induces a cache write operation, and processing of a later element causes a cache read, the read operation would see the previous portion of data either written completely or not written at all. In the actual stream machine design, an algorithm can use an arbitrary number of independently synchronized caches (limited only by the capacity of machine memory). The extended stream machine is presented in Figure 5.2.


Figure 5.2: Extended stream machine architecture.

Motivation

The motivation behind this extension is, among others, to enable expressing, in a stream style, algorithms which adjust themselves to the input data at runtime. Such algorithms could gather some data during processing of an initial part of an input stream, and then use the gathered information to accelerate processing of the rest of the input stream. Such an algorithm can never fail, because even if a cache write operation is slow, and a cache read comes before the write is completed, finding no data at all, the algorithm gains nothing, but is still correct. An example of such an algorithm is presented in Section 5.1.3.

Cache Synchronization

The cache synchronization in an extended stream machine is a classic readers-writers problem in the domain of concurrent programming [Ben-Ari, 2006]. This problem is known to have a universal, provably correct solution, which unfortunately cannot be tuned to be optimal in all cases from a performance point of view. Three well known general solutions are worth mentioning:

• Favour readers. Whenever a reader already has access to a shared resource, each newcoming reader is also granted access. Typically this is the best performing solution, because readers never wait unnecessarily (they wait if and only if access is granted to a writer). However, the solution is incorrect because of the risk of starvation of writers.

• Favour writers. Whenever a writer waits for access to a shared resource, no reader is given such access. This solution is typically far from optimal, because of a lot of unnecessary waiting by readers. Moreover, it is possible to have at least one writer waiting at all times, which causes starvation of readers.

• Fair lock. Readers and writers wait in one common queue. A newcoming reader gains access if and only if the resource is not locked for writing and no writer is waiting in the queue. A writer gains access if and only if the resource is unlocked. Since there is one common queue, neither readers nor writers are prone to starvation. However, blocking a reader just because a writer is waiting can potentially deteriorate the algorithm's performance.

The fair lock algorithm can be improved using priority queues for readers and writers. By adjusting priorities, priority queues can be tuned for near optimal performance, but this tuning is highly sensitive to the parameters of a particular application – how many read and write requests the application generates, and how long they last. There are also some possibilities with temporal priority boosts. For example, whenever a reader is already granted access, all newcoming readers get a slight, temporary boost in priority. Again, when the resource becomes free and there are few waiting readers, a writer may have its priority boosted, in order to gather more readers in the queue and grant them access simultaneously. Moreover, asynchronous writes can also be useful. In this case the writing process is always released and does not wait, but the actual data is written into the cache if and only if the cache is unblocked. Otherwise the data is copied into a private, temporary buffer, and the write operation is executed at the next opportunity. All these concepts are evaluated with selected parallel ray tracing algorithms, and the results are presented in Section 5.2.5.
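
A minimal sketch of the basic fair lock, written with standard C++ synchronization primitives, is given below. It implements the two admission rules stated above; a production version with the priority refinements would replace the single condition variable with explicit queues.

    #include <condition_variable>
    #include <mutex>

    class FairRWLock {
        std::mutex m;
        std::condition_variable cv;
        int  activeReaders  = 0;
        int  waitingWriters = 0;
        bool writerActive   = false;
    public:
        void lockRead() {
            std::unique_lock<std::mutex> lk(m);
            // A reader enters only if no writer holds the lock or waits for it.
            cv.wait(lk, [this] { return !writerActive && waitingWriters == 0; });
            ++activeReaders;
        }
        void unlockRead() {
            std::unique_lock<std::mutex> lk(m);
            if (--activeReaders == 0) cv.notify_all();
        }
        void lockWrite() {
            std::unique_lock<std::mutex> lk(m);
            ++waitingWriters;
            // A writer enters only when the resource is completely unlocked.
            cv.wait(lk, [this] { return !writerActive && activeReaders == 0; });
            --waitingWriters;
            writerActive = true;
        }
        void unlockWrite() {
            std::unique_lock<std::mutex> lk(m);
            writerActive = false;
            cv.notify_all();
        }
    };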

5.1.3 Stream Monte Carlo Integration

This section presents a high level description of a simple algorithm which can potentially perform better on a stream machine with the cache extension. Assume that a standard Monte Carlo estimator (Equation 3.18) has to be computed. Let the data necessary to calculate f(x) be placed in the constant memory of a stream machine. The input stream may contain successive values of a canonical random variable ξi. The kernel should then map the values ξi to values of the desired random variable Xi with the requested probability density pdf, evaluate pdf(Xi) and f(Xi), and eventually write the ratio f(Xi)/pdf(Xi) into the output stream. The main processor would then have the trivial task of summing all the numbers from the output stream and dividing the sum by the number of records in the stream. The algorithm is simple; however, it is inflexible and cannot adapt itself to the shape of f(x). In some cases such a possibility can be extremely important.
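
As a concrete illustration, the sketch below integrates the example function f(x) = x² over [0,1] with the importance density pdf(x) = 2x, sampled by inversion (X = √ξ); the function names and the choice of f and pdf are purely illustrative.

    #include <cmath>
    #include <vector>

    double f(double x) { return x * x; }            // example integrand

    // The kernel body: maps a canonical sample ξ in (0,1) to X with
    // pdf(x) = 2x and writes f(X)/pdf(X) to the output stream.
    double kernelRecord(double xi) {
        double X   = std::sqrt(xi);                 // inversion of the CDF x²
        double pdf = 2.0 * X;
        return f(X) / pdf;
    }

    // The host side: sum the output stream, divide by the record count.
    double estimate(const std::vector<double>& canonicalSamples) {
        double sum = 0.0;
        for (double xi : canonicalSamples) sum += kernelRecord(xi);
        return sum / canonicalSamples.size();       // converges to 1/3
    }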

Monte Carlo Integration with Cache

If the stream machine offers some form of read-write memory, there is much more flexibility in the design of algorithms. Let the pseudocode of the kernel function be:

Algorithm 5.1: Kernel for Monte Carlo integration on an extended stream machine.

Check the cache for information about the shape of f(x);
if something was found then
    use the adjusted pdf(x);
else
    use the basic (hard-coded) pdf(x);
end
Evaluate Xi from ξi using the selected adaptive pdf(x);
Evaluate f(Xi);
Update the cache according to the value of f(Xi);
Write f(Xi)/pdf(Xi) to the output;

This algorithm has a chance to adapt itself to the behaviour of the function f(x). This concept is the basis for expressing some advanced ray tracing algorithms as extended stream computations (see Section 5.2.4 for details).

Errors in Numerical Algorithms

There is one caveat with this method, however. Numerical errors can be roughly grouped into three categories – input data errors (not relevant to this discussion), roundoff errors and method errors. When the integration is performed with quasi-random (repeatable) values and in a sequential way, in each subsequent algorithm run the errors from all three categories will be identical, and therefore the results will be the same to the last bit. A parallel algorithm, on the other hand, causes some issues. First, when the cache is not used, the ratios in the output stream can appear in a different order in each algorithm run. Since typically (a ⊕ b) ⊕ c ≠ a ⊕ (b ⊕ c) in computer arithmetic, the results are then equivalent only up to roundoff errors. If the results are to be compared, bit-to-bit comparison of outputs no longer works. Moreover, the cache makes things even worse. In a parallel algorithm, the content of the cache can be different in each run when, say, the ith element of the input stream is processed. Therefore the results of any two runs of a parallel algorithm with a cache typically differ by roundoff and method errors, even if quasi-random numbers are used. This makes comparison of results even more difficult. However, if the algorithm is mathematically correct, the results will inevitably converge to the true result (up to roundoff errors), independently of the cache content, as the number of stream elements increases without bound.
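
The non-associativity of floating point addition is easy to demonstrate; the following three-line example prints two different values for the two groupings.

    #include <cstdio>

    int main() {
        float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
        // (a + b) + c = 1, but a + (b + c) = 0, because b + c rounds to b.
        std::printf("%g vs %g\n", (a + b) + c, a + (b + c));
        return 0;
    }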

5.2 Parallel Ray Tracing

Running ray tracing in parallel seems easy at first, but in fact there are some issues to solve in order to make this approach efficient. This section starts with a description of ray tracing initialization and the trade-offs offered by different ray intersection acceleration structures. Next, techniques for managing a frame buffer during parallel rendering are presented, followed by a description of streaming multipass rendering. Finally, selected ray tracing algorithms are targeted to the extended stream machine, and the efficiency of the design is evaluated in a multi-threaded environment.

5.2.1 Algorithm Initialization and Scene Description

A major task to be performed during ray tracing initialization is building the ray intersection acceleration structure. Other work, e.g. loading textures, typically consumes much less time. It is not uncommon for this time to be roughly a few minutes for large scenes with millions of primitives. If parallel rendering is used to obtain interactive frame rates, such an initialization cost is unacceptable. Some research has been carried out on this topic. Wald et al. [Wald et al., 2006] proposed using grids instead of kd-trees as an acceleration structure for dynamic scenes. However, if the initialization cost is unimportant, kd-trees are the structures with the best possible performance [Havran, 2001, Popov et al., 2007]. Popov et al. [Popov et al., 2006] also investigated a streaming kd-tree building approach. An algorithm for parallel building of a kd-tree has been proposed [Shevtsov et al., 2007], which seems to fit best into already parallelized ray tracing.

The cache is particularly well suited for lazy kd-tree building. Typically, in a streaming ray tracer, the scene description is static and is prepared before the actual stream computations. In the presented design, however, the scene description can be just read from a file, with additional processing postponed and performed lazily, placing results in the cache. Unfortunately, lazy kd-tree building does not seem to be a big win over parallel kd-tree construction. We suspect, but have not verified, that the lazy approach can offer more substantial performance gains if a scene contains a lot of invisible and unlit geometry, which is not the case in our tests.

5.2.2 Frame Buffer as an Output Stream

The output streams of the presented rendering machine contain image samples randomly scattered over the image plane. Samples from such a stream can be either immediately converted to a raster image, or stored for further processing. The first option is used by the GPU-based interactive previewer, presented in Section 5.4. This approach is effective from the memory usage point of view, but somewhat limits image post processing capabilities. Alternatively, if all samples were stored individually, for example in a file in external memory, the image could be formed by sophisticated external tools. This approach allows, for example, adaptive or non-linear filtering in order to remove noise [Rushmeier & Ward, 1994, Suykens & Willems, 2000]. Animations could be generated in a frameless mode [Bishop et al., 1994, Dayal et al., 2005], which could be more efficient. We leave these aspects for further research.

The previewer can generate onscreen images, as well as offscreen ones, in arbitrary resolution. The processing power of the GPU is enough to justify the transfer of the stream of samples to it, and of the final image back to main memory and then to a file in external storage. The only limitation is GPU memory size, which in inexpensive consumer level hardware is currently (2010) typically 512MB or 1GB. This memory size limits the maximum size of an offscreen buffer to roughly 10M or 20M pixels, respectively.

The output stream needs synchronization, since it can be written to by the rendering stream machine and read by the interactive previewer. Double buffering is a necessary and sufficient solution to avoid rendering stalls when output stream data is sent to the GPU previewer. The previewer can become a bottleneck only while rendering very simple scenes with top CPUs (e.g. Intel Core i7). The problem then, however, is not stream synchronization, but the insufficient speed of input processing by the GPU. These issues are examined in detail in Section 5.2.5.
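
The sketch below shows the double buffering idea, with hypothetical types: rendering threads append to the back buffer, while the previewer periodically swaps the buffers under a brief lock and uploads the front one to the GPU.

    #include <mutex>
    #include <utility>
    #include <vector>

    struct Sample { float u, v; float rgb[3]; };    // image plane sample

    class SampleExchange {
        std::vector<Sample> back, front;
        std::mutex m;
    public:
        void push(const Sample& s) {                // called by rendering threads
            std::lock_guard<std::mutex> lk(m);
            back.push_back(s);
        }
        const std::vector<Sample>& acquireFront() { // called by the previewer
            std::lock_guard<std::mutex> lk(m);
            front.clear();
            std::swap(front, back);                 // renderers never wait on the GPU
            return front;
        }
    };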

5.2.3 Multipass Rendering

Multipass rendering cannot be executed by just one run of a stream machine. Examples of algorithms for which multipass rendering is necessary are, among others, the initialization of Metropolis Light Transport and the photon map building step of the original Photon Mapping algorithm. The presented extended stream machine design supports this feature by splitting rendering into two functions – preprocess and render – which have to be implemented as the definition of a rendering algorithm. The preprocess function is run before a stream rendering run. It can be performed sequentially or, if necessary, manually parallelized. After preprocessing, the stream of image samples is evaluated in parallel.

The preprocess function signals whether another pass is necessary. In this case, the number of necessary samples is defined by the preprocess routine, and then these samples are evaluated in a streaming way. The output streams are ignored, and the only means of communication is the extended stream machine cache. When stream evaluation is finished, preprocess is executed again. Only the final rendering pass can be run indefinitely, without prior specification of the required number of samples. However, if an interactive preview of partially rendered results is required, the preprocessing must be performed quickly, since it increases the latency between the start of rendering and the availability of an initial result. The sequence of rendering procedures is presented below:

Algorithm 5.2: Single- and multipass rendering on an extended stream machine.

// Initial multipass rendering passes.
repeat
    MultipassSamples ← preprocess()
    if MultipassSamples > 0 then
        render(MultipassSamples)    // rendering without output streams
        // At this point all cache writes are guaranteed to be finalized.
    end if
until MultipassSamples = 0

// Final rendering pass.
render(until aborted)    // rendering with output streams
cleanup()

5.2.4 Ray Tracing on an Extended Stream Machine

Expressing ray tracing as a stream computation gives a lot of freedom in design. Purcell [Purcell, 2004] proposed splitting the full application into several small kernels, like ray-triangle intersection, grid traversal, ray scattering, generation of new rays, and so on. This design choice is well suited for graphics chips with limited programmability. We argue that it is possible, and justified performance wise, to express the whole ray tracing algorithm with just one kernel. When the extended stream machine is used, Photon Mapping is suitable for this design, too. Unfortunately, this particular approach limits the choice of plausible hardware as an implementation platform for such a stream machine. In fact, only general purpose CPUs are suitable; even OpenGL 4.0/DirectX 11 class GPUs are not programmable enough. This issue is discussed in detail in Section 5.3.

The single kernel design is very simple yet extremely efficient. The input stream contains just sample numbers, implemented as a continuous range of integers. In other words, a task description is: evaluate samples from s1 to s2. All the work is performed in the kernel. This work consists of converting sample numbers into quasi-Monte Carlo sequences, generating and tracing rays, and finally returning radiance values along the traced rays. The output contains two streams and is a bit more complex, but not excessively so. If an algorithm traces rays only from the camera towards the scene, each input number corresponds to one camera ray and one radiance value along it, which is written into the first output stream. If an algorithm traces rays from lights towards the camera, each input number corresponds to zero or more camera rays, with radiance values written into the second output stream. Additionally, it is necessary to store the image plane location for each calculated radiance value. Due to the mathematics underneath the light transport algorithms (discussed in Section 4.4), the contents of the two streams cannot be mixed. Sample density in the first stream does not affect the local brightness of an image (it affects only the precision of the output), while in the second stream it does. In fact, data from each stream is used to form an individual image, with a slightly different algorithm, and the final image is the sum of these two. Note that the final image formation is performed after stream ray tracing, but its impact on performance is minimal, since its computational cost is substantially smaller.
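
The kernel interface can be sketched as follows; all names are illustrative. The input is just a range of sample numbers, and the renderer appends weighted radiance samples to one of the two output streams depending on whether the path was traced from the camera or from a light.

    #include <cstdint>
    #include <vector>

    struct ImageSample { float u, v; float radiance[3]; };

    struct OutputStreams {
        std::vector<ImageSample> cameraSamples; // density does not affect brightness
        std::vector<ImageSample> lightSamples;  // density does affect brightness
    };

    // 'Renderer' is any light transport algorithm exposing evaluate(), which
    // turns a sample number into quasi-Monte Carlo points, traces the paths,
    // and appends the resulting radiance samples.
    template <typename Renderer>
    void kernel(Renderer& r, uint64_t s1, uint64_t s2, OutputStreams& out)
    {
        for (uint64_t s = s1; s <= s2; ++s)
            r.evaluate(s, out.cameraSamples, out.lightSamples);
    }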

Path Tracing and Bidirectional Path Tracing

These algorithms are ideal candidates for parallelization, and therefore near linear speedups can be expected. All the samples generated by them are independent, so the only potential bottleneck is algorithm initialization, which includes the costly construction of the acceleration structure. The Path Tracing algorithm uses only the first output stream, so it is even better suited for parallelization. If real time rendering is to be achieved through massive parallelization, a simplified version of Path Tracing (see Section 4.4.1) seems to be the algorithm of choice. The parallelization efficiency and speedup of these algorithms are discussed in Section 5.2.5.

Photon Mapping and Combined Approach

The one pass variant of Photon Mapping starts with an empty cache. When the initial image samples are to be generated, the algorithm finds no photons in the photon map, and switches to emitting and tracing a pack of photons and storing them in the map. The minimal required number of photons is expressed as a linear function f(n) = an + b of the number n of evaluated samples, where a and b are adjustable constants. For performance reasons, somewhat more photons than required by the function f are generated. Eventually, the map contains enough photons for rendering the range of n1 to n2 image samples. When a sample ni > n2 is to be evaluated, the photon map is filled with additional photons. The algorithm run time, and therefore image quality, is limited only by machine memory size.
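
The budgeting rule can be sketched as follows; the constants a, b and the slack factor are illustrative placeholders for the adjustable parameters mentioned above.

    #include <cstdint>

    struct PhotonBudget {
        double   a     = 0.5;     // photons per image sample (assumed value)
        uint64_t b     = 1024;    // initial photon count (assumed value)
        double   slack = 1.25;    // trace 25% more than strictly required

        uint64_t required(uint64_t n) const {       // f(n) = a*n + b
            return static_cast<uint64_t>(a * n) + b;
        }
        // Photons to trace when sample n arrives and the map holds 'stored'.
        uint64_t deficit(uint64_t n, uint64_t stored) const {
            uint64_t need = required(n);
            return stored >= need
                 ? 0
                 : static_cast<uint64_t>(slack * (need - stored));
        }
    };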

The combined approach is implemented in a similar way, with the difference that not all image samples are evaluated using the photon map. In some cases, when the predicted error of all bidirectional samples never exceeds a prespecified threshold, the photon map is never filled and remains empty throughout the entire algorithm run.


Figure 5.3: Test scenes for parallel rendering. Left image: scene a) with roughly a million primitives. Right image: scene b) with a few primitives.

Figure 5.4: Parallel Path Tracing, Bidirectional Path Tracing, and Photon Mapping speedups as a function of the number of threads, compared with the ideal speedup. Left image: results for scene a). Right image: results for scene b).

5.2.5 Results and Conclusions

For test purposes a machine with an Intel Core i7 CPU has been used. The Core i7 is a quad core processor capable of running eight threads simultaneously thanks to SMT technology (each core can run up to two independent threads). The two test scenes are presented in Figure 5.3. Scene a) is complex, being made from about a million primitives. Scene b) is much simpler, built from a few primitives which occupy just one node of the kd-tree. Both scenes are rendered with Path Tracing, Bidirectional Path Tracing and one pass Photon Mapping (Figure 5.4), forcing the stream machine to run on various numbers of threads.

The streaming approach to ray tracing is a very convenient way of expressing the algorithms and of exhibiting their potential for parallelization. Our approach differs from previous ones mainly in that we have assumed that a general purpose processor is available, and that just one kernel is enough for the whole ray tracing. The two substantial benefits from streaming are high scalability and memory latency hiding. The scalability is likely to be crucial in the near future, when CPUs with more cores are introduced. In our case memory latency hiding is not the result of data locality. In fact, our input stream is virtually empty (the integers representing ranges of samples to evaluate consume a few kilobytes at most), while the read only memory containing the scene description is typically hundreds of megabytes or even gigabytes large. Access patterns to the scene description depend on the ray generation algorithms. The selected algorithms prefer to scatter rays into the scene as much as possible, since this improves the statistical properties of the generated samples. That is, mixed and scattered rays provide much more information than coherent ones. Unfortunately, this approach is not especially cache friendly – the presented implementation reads data in chunks of several bytes from seemingly random locations. However, when CPUs provide technologies like SMT (Simultaneous Multi-Threading), memory latencies are not an issue. When one thread is stuck waiting for data, the processor core can continue to execute another thread. Creating many additional threads is very easy thanks to the employed streaming model. Therefore, only memory bandwidth can become a potential bottleneck.

The memory latency hiding is clearly visible in the parallel rendering efficiency of scene a). The total speedup obtained for Path Tracing and Bidirectional Path Tracing is roughly 5.5 on a quad core CPU. The speedup using four threads is far from ideal though, being roughly 3.2 instead of the expected 4.0 (the Intel Turbo Boost technology, which increases the CPU frequency by approx. 5% if only one core is used, is insignificant here). This seems to be a hardware issue – independent cores compete for access to a shared L3 cache and a memory controller. Both above mentioned algorithms use just read-only data structures with no synchronization, so the obtained suboptimal speedup is not a software problem. To additionally support this claim, we present a different scene b), containing exactly seven non-textured primitives. Bidirectional Path Tracing produces two streams of output data, while Path Tracing produces just one. In fact, BDPT generates roughly 4-5 times more data, measured in bytes, than PT over the same time period. In the rendering of scene b), the necessity of transferring ray traced samples to the GPU (see Section 5.4) is the bottleneck for BDPT. In the case of scene a), both PT and BDPT exhibit almost identical speedups, so the transfer of samples to the GPU does not affect the speedups.

Completely different results are obtained with the Photon Mapping algorithm. In the tests, our one pass variant of the method has been used. The photon map is a read-write structure, so it must be properly locked, which limits scalability. The most efficient synchronization solution appears to be the simplest fair readers-writer lock; for more elaborate schemes the efficiency gain is questionable, since their parameter tuning is highly scene dependent. In fact, using more cores allows rendering more complex scenes in the same time, rather than rendering the same scene in a shorter time. This is clearly visible for scene b), where the maximum obtained speedup is roughly 1.4 for two threads. Adding more threads only makes matters worse. The one pass Photon Mapping can be run efficiently in parallel for real world scenes, though. The combined algorithm speedup can be anywhere between Bidirectional Path Tracing and Photon Mapping, depending on the rendered scene. This is due to the fact that it chooses one of these methods depending on the currently sampled light transport path.
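
A minimal sketch of such locking around the shared photon map, assuming C++17 and using std::shared_mutex as a stand-in for the fair readers-writer lock (the kd-tree query and insert routines are hypothetical):

    #include <shared_mutex>
    #include <vector>

    struct Photon { float pos[3]; float power[3]; };

    class SharedPhotonMap {
        mutable std::shared_mutex lock_;
        // ... kd-tree of photons ...
    public:
        // Many rendering threads may perform lookups concurrently.
        std::vector<Photon> nearestPhotons(const float point[3], int k) const {
            std::shared_lock<std::shared_mutex> reader(lock_);
            return findKNearest(point, k);    // hypothetical kd-tree query
        }
        // Filling the map requires exclusive access and blocks all readers,
        // which is what limits scalability on simple scenes such as scene b).
        void addPhotons(const std::vector<Photon>& batch) {
            std::unique_lock<std::shared_mutex> writer(lock_);
            insertIntoTree(batch);            // hypothetical kd-tree insert
        }
    private:
        std::vector<Photon> findKNearest(const float[3], int) const;
        void insertIntoTree(const std::vector<Photon>&);
    };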

During the tests, the Intel Core i7 platform has shown very good rendering times, compared to the previous Intel architecture with the memory controller separated from the CPU by an inefficient FSB (Front Side Bus), yet it has not provided the expected speedups in parallel rendering. We suspect that the L2 ↔ L3 cache or memory throughput is too low. Moreover, it would be interesting to see how a CPU would perform if any core could process any thread – a solution known from GPUs. In the current Core i7 architecture, each core can process one of just two threads, and if both threads wait for data, the core is stalled.

5.3 Choice of Optimal Hardware

The choice of the appropriate hardware for high performance rendering is crucial, especially when parallel implementations are necessary for achieving reasonable computation time, since it affects the design and performance of the application. Currently, single processor sequential machines
do not have enough computational power for full global illumination. This will probably remain true in the future, because modern advancements in microprocessor technology devote substantially more attention to producing more computational cores on a single machine than to producing ultra fast single processor cores.

The rest of this section starts with a comparison of two models of parallel programming – shared memory on multiprocessor machines and explicit message passing on clusters of individual machines. Next, there is a comparison of multiprocessor machines and GPUs, both of which are programmed in a shared memory style. Finally, there is a justification of our choice of optimal hardware for the implementation of an extended stream machine.

5.3.1 Shared Memory vs. Clusters of Individual Machines

From the programmer's perspective, there are two popular basic parallel techniques – shared memory and message passing. The shared memory model is used on multiprocessor machines, whereas message passing is used in distributed environments. Shared memory tends to be faster, but lacks scalability. On the other hand, explicit message passing allows massive parallelization, at the cost of delays introduced even when only a few machines are used. In ray tracing, neither technique provides additional available memory. In the shared memory model this fact is obvious. Ray tracing implemented by means of message passing in a cluster environment requires data replication on each machine, to avoid substantial performance degradation.

With the recent development of multi core processors, shared memory gains an advantage. Today, having twelve processor cores in a single workstation is not uncommon. The recent introduction of six core CPUs – the Intel Xeon 'Dunnington' – with four socket boards results in 24 core supercomputers. Unfortunately, machines with more than two CPU sockets on a motherboard are substantially more expensive, while two socket machines roughly double the cost over single socket ones. Considering the price-to-performance ratio, two processor machines with twelve cores in total currently seem to be optimal for ray tracing.

5.3.2 Multiprocessor Machines vs. Graphics Processors

Contemporary rendering algorithms are written for either CPUs or GPUs. The important differences between these are the degree of parallelism and the flexibility of the programming model. For example, as of 2010 the Nvidia architecture contains up to 480 processing cores [Nvidia Corporation, 2009b], while popular Intel CPU based servers contain two CPUs with six cores each. However, despite these numbers, GPU based global illumination ray tracers are not significantly faster than CPU ones. The computational power of a modern GPU can be fully utilized only in a narrow class of algorithms, due to severe limitations imposed on GPU programs. The most significant are, among others, severe performance penalties for data dependent branching, lack of virtual function calls, lack of memory management techniques, and inefficient double precision processing, suffering a roughly 8x performance drop, not to mention the very limited amount of available memory. Ghuloum [Ghuloum, 2007] described the majority of these cases. This obviously is a major handicap for any GPU based ray tracer, limiting its features drastically. The direct comparison of the computational power of GPUs and CPUs, presented in [Nvidia Corporation, 2009a], is not trustworthy, because of these restrictions and the fact that a powerful GPU also needs a powerful CPU to run the driver.

There are technologies for installing multiple graphics cards in a single machine – ATI CrossFire allows joining up to four units, and Nvidia SLI up to three. Both technologies offer a questionable price-to-performance ratio, however. First, despite the fact that all GPUs are placed in a single machine, the shared memory model does not work. That is, a GPU cannot directly access the memory of another GPU. In fact, memory management is not exposed and is performed indirectly by the graphics driver, which forces data replication between individual units. Moreover, to make things worse, two physical GPUs are not seen as one logical unit with a doubled number of cores and the same amount of memory. That is, software which is well suited for a standard GPU may not work at all in SLI or CrossFire mode. This is not an uncommon issue, and there exist a few computer games,
which cannot take advantage of these technologies. Since multisocket motherboards do not exhibit such design flaws, and because the low level details of graphics cards are not published, we have never seriously considered SLI and CrossFire as technologies well suited for implementing more powerful stream machines.

In the author's opinion, the term GPU (Graphics Processing Unit) is seriously misleading. In fact, there are many graphics algorithms which cannot be run efficiently on any contemporary GPU. Actually, GPGPU is unfortunately not general purpose enough for a flexible and efficient implementation of ray tracing. The major handicap is the architecture of GPUs. The 480 core Nvidia Fermi consists of only 15 truly independent cores, each of them capable of processing 32 threads. The caveat is that each of the 32 threads must execute the same instruction at the same time. If some of the threads execute the 'if' branch, and others choose the 'else' branch, these branches are executed sequentially, forcing parts of the threads to stall.

It is important to realize which class of algorithms can be effectively run on GPUs. For example, matrix multiplication or the FFT are good candidates, but sorting, tree building and tree search are not. The two latter algorithms are the major time consuming processes in ray tracing, and therefore there has been work dedicated to avoiding these difficulties. Purcell [Purcell, 2004] does not use a tree data structure at all, using a less flexible grid as the ray intersection accelerator, and Popov et al. [Popov et al., 2007] modify tree searching into a stackless algorithm, at the cost of a large memory overhead. It is unclear whether GPU performance benefits can overcome the penalties for using suboptimal algorithms, just to make ray tracing compatible with contemporary GPUs. The presented implementation does not try to achieve this compatibility, focusing solely on optimization for Intel CPU architectures. However, since modern GPUs process regular data structures very efficiently, they are very useful in the visualization of ray tracing results. This topic is investigated in detail in Section 5.4.

5.3.3 Future-proof Choice

Today the future of rendering is unclear. Recently, GPGPU programs have become significantly more popular. Nvidia designed for this purpose a new API called CUDA [Nvidia Corporation, 2009a]. This API is much better than using OpenGL or DirectX shaders for non-graphic computations, but it is still not as general purpose as traditional CPU programming tools. GPGPU obviously has a lot of potential, see for example [Harris, 2005], but CPU vendors also improve their products significantly. There are also ideas of producing a processor from a massive number of x86 cores [Seiler et al., 2008]. Moreover, a chip based on FPGA technology, dedicated to ray tracing operations, has even been proposed [Woop et al., 2005].

Therefore, targeting a rendering application at a hardware platform with peculiarities requiring serious algorithmic restrictions, which is likely to become obsolete in a few years' time, is not a good idea. Instead, we have expressed ray tracing computations for a virtual stream machine model, and have implemented this machine on the best, fully programmable platform which is currently available. Today, the best platform seems to be one of the Intel workstations, but if the situation changes in the near future, implementing the stream machine on a more flexible, future version of today's immature GPGPU should not be very difficult. Perhaps a new revision of our rendering software will be rewritten in the Intel Ct language, which is currently being developed.

5.4 Interactive Visualization of Ray Tracing Results

Just after the appearance of the first programmable DirectX 9 class graphics processors there were first attempts to use them for ray tracing [Purcell et al., 2002]. Nowadays, the vast majority of contemporary real time global illumination algorithms are based on the computational power of modern GPUs, e.g. [McGuire & Luebke, 2009, Wang et al., 2009]. Unfortunately, they still put restrictions, often quite severe, on scene content (limited range of material and geometry representation), scene size, and the illumination phenomena which are possible to capture.

The true, unrestricted global illumination algorithms, which solve the Rendering Equation [Kajiya, 1986], are not well suited for OpenGL 4.0 class GPUs. Such an implementation is possible, as has been shown numerous times, but severely restricted when compared with classic multi-core CPU solutions, because GPUs cannot process irregular data structures effectively [Ghuloum, 2007]. However, this is not the only way to obtain interactivity – nowadays multi-CPU workstations can perform interactive ray tracing [Pohl, 2009], yet true global illumination is still unachievable. Interactivity can also be obtained using clusters of machines with CPU rendering [Benthin, 2006].

On the other hand, the approach presented here is substantially different from those above – placing absolutely no restrictions on the scene and illumination effects, it uses a GPU based visualization client just to display and postprocess an image made from CPU ray traced point samples, in a resolution dynamically adjusted for real time performance. Our renderer, designed for the flexibility of CPUs, is based on significantly modified Bidirectional Path Tracing and Photon Mapping with a quasi-Monte Carlo (QMC) approach (see Chapters 3 and 4). Such traditionally CPU based algorithms are very difficult to port to GPUs. When, despite all problems, they are eventually ported, the performance benefits of GPUs over multicore CPUs are questionable [Ghuloum, 2007].

Pure ray tracing algorithms are based on point sampling of scene primitives, not using scan line rasterization at all. This gives much freedom in the way samples are chosen; however, QMC ray tracing algorithms produce a huge number of samples which do not fit into a raster RGB grid. Converting these data to a 3x8bit integer based RGB image at interactive frame rates may be impossible even for multi-core CPUs, especially when the dynamic image resolution has to be adjusted to the server rendering speed and scene complexity, with some non-trivial post-processing added. As we will show, the conversion of ray tracing output to a displayable image and many post-processing effects can be expressed purely by rasterization operations, in which GPUs excel. The main idea behind the presented approach is therefore the usage of the best suited processor for a given algorithm, instead of porting everything to GPUs.

The rest of this section starts with a characterization of the output of the rendering server, required to be compatible with the presented visualization client. Then, there are detailed descriptions of a wrapper for the rendering server and of the algorithms used in the visualization client. Finally, the obtained results are discussed. The research results explained in this section are also presented in [Radziszewski et al., 2010].

5.4.1 Required Server Output

In general the server may run any point sampling algorithm, but in this project we rely on QMC ray tracing. The visualization client assumes a specific format of the server's output. The following subsections give a detailed description of the conditions which should be fulfilled to make the client work properly.

Color Model

Having in mind further processing, it may be useful to output full spectral images (see Section 4.3 for a detailed description of full spectral rendering). However, a full spectral representation requires a huge amount of memory. For example, a full HD spectral image in 16bit floating point precision and with 3nm wavelength sampling from 400nm to 700nm needs as much as 1920 × 1080 × 100 × 2B ≈ 400MB, while an RGB one requires 1920 × 1080 × 3 × 2B ≈ 12MB.

The standard CIE XYZ space [Fraser et al., 2005] seems to be the best option instead, since an RGB space, which depends on particular display hardware, is not a plausible choice. For this reason the visualization client accepts CIE XYZ color samples. The rendering server natively generates full spectral data, and a server wrapper converts it internally from the full spectrum to the three component color space.
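
This conversion amounts to integrating the spectrum against the CIE color matching functions. A minimal sketch, assuming the 100 bins of 3nm used in the size estimate above (the tabulated, resampled matching functions cieX/cieY/cieZ are omitted):

    struct XYZ { float x, y, z; };

    XYZ spectrumToXYZ(const float spectrum[100],
                      const float cieX[100], const float cieY[100],
                      const float cieZ[100]) {
        XYZ c = {0.0f, 0.0f, 0.0f};
        const float dLambda = 3.0f;          // 3nm bin width
        for (int i = 0; i < 100; ++i) {      // Riemann sum over spectral bins
            c.x += spectrum[i] * cieX[i] * dLambda;
            c.y += spectrum[i] * cieY[i] * dLambda;
            c.z += spectrum[i] * cieZ[i] * dLambda;
        }
        return c;
    }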

Format of Output Streams

Some of the most advanced ray tracing algorithms trace rays in both directions – from the camera towards the lights (camera rays), and in the opposite direction (light rays). Such approaches produce two kinds of samples, which must be processed differently in order to produce displayable images [Veach, 1997].

The client accepts two input streams. The format of samples is identical in both streams: ([u, v], [x, y, z, w]), where [u, v] are screen space coordinates in the [0, 1]² range, or, perhaps, with slight overscan to avoid postprocess filtering edge artifacts, x, y, z is the sample color value in the CIE standard, and w is the sample weight.
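
A sketch of one stream element following this layout (field names are illustrative):

    struct StreamSample {
        float u, v;        // screen position in [0,1]^2, possibly with overscan
        float x, y, z;     // sample color in CIE XYZ
        float w;           // sample weight
    };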

The two streams differ only in the interpretation of sample density. The pixels of the image from camera rays are evaluated by averaging local samples using any suitable filter – the sum of weighted samples is divided by the sum of weights. On the other hand, the pixels of the light image are formed using a suitable density estimation technique – samples are filtered and summed, but not divided by the sum of weights. Therefore, the sample density affects only the quality of the camera image, while it affects both the quality and brightness of the light image. The final, displayable image is the sum of both the camera and light images, the latter divided by the number of traced paths. Unfortunately, samples from the light image can potentially be scattered very nonuniformly over screen space. This, however, is not an issue when the sample density is roughly proportional to image brightness.
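
Written out explicitly (with assumed notation: c_i and w_i are the colors and weights of the samples falling under a pixel's filter support, and N_L is the number of traced light paths), the two estimators and the final image are:

\[
  I_{\mathrm{camera}} = \frac{\sum_i w_i c_i}{\sum_i w_i},
  \qquad
  I_{\mathrm{light}} = \frac{1}{N_L} \sum_i w_i c_i,
  \qquad
  I = I_{\mathrm{camera}} + I_{\mathrm{light}}.
\]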

Obviously, not all ray tracing algorithms need both output streams – camera and light. For example, Path Tracing [Kajiya, 1986] and Photon Mapping [Jensen, 2001] produce camera samples only, while Particle Tracing [Arvo & Kirk, 1990] needs only the light image. Therefore, the visualization client employs an obvious optimization – it skips the processing of a stream if no samples were generated into it.

Stream Uniformity

The server should provide a stream of color values scattered uniformly at random locations in screen space. The uniformity of sampling ensures acceptable image quality even at low sampling rates, which is typical due to the high computational cost of ray tracing.

Additionally, the output stream should be generated roughly uniformly in time. Otherwise the client might fail to maintain interactive refresh rates. The original, two pass variant of Photon Mapping is therefore unsuitable for interactive visualization. This is the main motivation for the development of the one pass version of this technique, described in detail in Section 4.4.5.

Strictly speaking, the new approach does not generate batches of samples in exactly uniform time. Due to the computational complexity of kd-tree lookup, as well as the linear dependence between the number of photons in the kd-tree and the number of samples computed, the average time to calculate the nth sample is of the order O(log n). The logarithm, however, changes slowly, which is acceptable for the client.

Coherent vs. Non-Coherent Rays

For some time now it has often been claimed that it is beneficial to trace camera rays in a coherent way, because it can significantly accelerate rendering [Wald et al., 2001, Benthin, 2006]. This is true, but only for primary rays (sent directly from the camera or a light source). Unfortunately, rays which are scattered through the scene do not follow any coherent pattern, and caching does not help much. Since true global illumination algorithms typically trace paths of several rays, these algorithms do not benefit much from coherent ray tracing.

What is more, coherent ray tracing tends to provide new image data in tiles, which makes progressive improvement of image quality difficult. We have therefore chosen to spread even the primary rays as evenly as possible, using a carefully designed Niederreiter-Xing QMC sequence [Niederreiter & Xing, 1996] as the source of pseudorandom numbers. Therefore, it can
be expected that very few traced rays provide a reasonable estimate of the colour of the entire image, and subsequently traced rays improve image quality evenly.

5.4.2 Client and Server Algorithms

The GPU's task is the conversion of point samples into a raster image. The conversion is done with a resolution dynamically adjusted to the number and variance of the point samples. On the image, a color conversion from XYZ to the RGB space of the current monitor, together with gamut mapping, tone mapping, gamma correction and other post-processing effects, is performed.

As a target platform we have chosen a GPU compatible with OpenGL 3.2 [Segal & Akeley, 2010, Shreiner, 2009] and GLSL 1.5 [Kessenich et al., 2010, Rost & Licea-Kane, 2009]. The major part of the algorithm is coded as a GLSL shader, which suits our needs very well. Recent technologies, such as Nvidia CUDA, ATI Stream, or OpenCL [Munshi, 2009], are not necessary for this kind of algorithm.

The rendering task is split into two processes (or threads in one process, if a single application is used as both client and server) running in parallel: a server wrapper process and a visualization process. The rendering process may be further split into independent threads, if multicore CPUs or multiple CPU machines are used.

Server Wrapper Process

Ray tracing can produce a virtually unlimited number of samples, being limited only theoretically by the machine numerical precision (our implementation can generate as many as 2^64 samples before sample locations eventually start to overlap). Therefore, the ray tracing process is reset only immediately after user input which modifies the scene. Otherwise, it runs indefinitely, progressively improving image quality.

The server wrapper runs on a separate thread, processing commands. The wrapper recognizes four commands: term, abort, render and sync. The term command causes the wrapper to exit its command loop, and is used to terminate the application. The abort command aborts the current rendering, and is used to reset the server after new user input (for example, a camera position change).

The render command orders the server to perform rendering. The rendering is aborted when either the abort or term command is issued. The maximum time to abort rendering is the time necessary to generate just one sample. Any algorithm capable of generating the specified output (see Section 5.4.1) can be used. In our server implementation, rendering is performed in parallel on multicore CPUs.

The wrapper allows registering asynchronous finish and sync events. The finish event is generated when rendering is finished (either a prespecified number of samples was generated or abort was issued). The sync command, when executed, triggers a sync event, passing to it any data specified in the sync message. When a sync event is triggered, all commands sent before the sync command are guaranteed to be finished. These events can be used to synchronize the visualization client with the rendering server. Apart from sending asynchronous messages, the wrapper can be queried synchronously for already rendered samples. Since this query just copies the data to the provided buffer, server blocking due to the necessary synchronization takes little time.
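
The command loop can be sketched as follows; the message queue type and the callbacks are illustrative assumptions, standing in for the actual event mechanism:

    #include <functional>

    enum class Command { Term, Abort, Render, Sync };
    struct Message { Command cmd; void* syncData; };

    template <typename MessageQueue>
    void wrapperLoop(MessageQueue& queue,
                     const std::function<void()>& renderUntilAborted,
                     const std::function<void(void*)>& onSyncEvent) {
        for (;;) {
            Message m = queue.pop();       // blocks until a command arrives
            switch (m.cmd) {
            case Command::Term:   return;                      // exit the loop
            case Command::Abort:  break;                       // nothing to do when idle
            case Command::Render: renderUntilAborted(); break; // polls abort/term inside
            case Command::Sync:   onSyncEvent(m.syncData); break;
            }
        }
    }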

Client Process

The client is responsible for visualizing the samples generated by the server, and additionally it processes GUI window system messages. The client stores its internal data in two screen-aligned, two layer texture arrays, in the IEEE 32bit floating point format. A 4-channel [X, Y, Z, W] texture and a two component variance [Var, W] texture are stored, with one layer each for the camera and light input streams. Therefore, the client stores 48 bytes of data per screen pixel, apart from the standard integer front and back buffers. The details of the client main loop are presented in Figure 5.5.

[Flowchart: start → process input → rasterize samples → repaint back buffer → swap buffers with vsync → get new samples → back to process input; quit on exit.]

Figure 5.5: Main loop of visualization client process.

When all GUI messages are processed, the client rasterizes the new samples, generated by the server, into its internal textures. This task is performed using the render-to-texture feature of the Framebuffer Object (FBO). The client uses an almost empty vertex program, which only passes data through. The geometry program is equivalent to the fixed functionality rendering of textured point sprites, and additionally it routes input samples to the appropriate texture layer. The input is a stream of the following elements – a two component screen position (u, v), a four component color (x, y, z, w) and a flag, which encodes whether the sample belongs to the camera or the light stream. The input is placed in a Vertex Buffer Object (VBO), and is then rendered with the GL 'render points' command. Points are rendered with the blending mode set to perform addition, ensuring that all samples add up instead of overwriting the previous texture content.

An additional input is a monochromatic HDR filter texture, used to draw the point sprites. The texture is normalized (all the texel values add up to one) and the texture border value is set to zero. The filter texture is applied without rescaling and with bilinear filtering, thus preserving filter normalization, which is crucial for algorithm correctness. We have found that a 5x5 texel windowed Gaussian blur gives good results.

The rendering is performed in two passes. First, the color texture array is updated. In the second pass, using the already up-to-date color texture, the variance texture array is updated. In both passes, the same samples are rendered. The variance is updated using the following formula:

\[
  V_j = V_{j-1} + \sum_i \left( Y_i - \bar{Y}_j \right)^2, \qquad (5.1)
\]

for the jth batch of i samples. The formula does not give the best theoretically possible results, since the mean of Y is approximated using only the already evaluated samples. The alternative formula:

\[
  V_j = Y'_j - \bar{Y}_j^2, \qquad
  Y'_j = Y'_{j-1} + \sum_i Y_i^2, \qquad (5.2)
\]

which requires storing the sum of squares (Y′) instead of the variance, should be avoided due to poor numerical stability (even negative variance results are possible). In both formulas the division by the n − 1 factor, where n is the total number of samples in a given stream, is omitted. This division is performed when the variance data is read from its texture. The details of rasterizing new samples are presented in Algorithm 5.3.
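
A CPU-side sketch of the preferred, (5.1)-style accumulation (the running mean stands in for the value read from the already updated color texture; names are illustrative):

    #include <cstddef>

    struct VarianceAccum {
        double mean = 0.0;   // current luminance estimate, refreshed elsewhere
        double V = 0.0;      // accumulated sum of squared deviations, eq. (5.1)
        std::size_t n = 0;   // total number of samples so far

        void addBatch(const double* Y, std::size_t count) {
            for (std::size_t i = 0; i < count; ++i) {
                const double d = Y[i] - mean;   // deviation from current estimate
                V += d * d;                     // eq. (5.1) update
            }
            n += count;
        }
        // Division by n - 1 is deferred to read time, as in the client.
        double variance() const { return n > 1 ? V / double(n - 1) : 0.0; }
    };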

Algorithm 5.3: Rasterization of point samples by the visualization client.
1. The content of the client sample buffer (triples [u, v], [x, y, z, w], flag) is loaded into a VBO; there is one buffer for both streams – camera and light samples are distinguished by the flag;
2. The monochromatic float texture with the filter image is selected and the point draw command is issued; the texture is used as a sprite for the emulated point sprites;
3. The geometry program routes the samples to the appropriate texture layer;
4. The fragment program performs multiplication of the 'color' attribute [X, Y, Z, W] by the texture value;
5. After rasterization, the color texture array is detached from the FBO, and the GPU MIP-map build command is issued;
6. Texture LoDs (used by the 'repaint back buffer' processing) for both streams are evaluated as LoD_i = log_4(P/S_i), where P is the number of pixels on the screen and S_i is the number of samples from the ith stream computed so far;
7. A second draw is issued, this time with the variance textures as output. The variance is evaluated only for the luminance (Y) component, since a three component variance typically does not help much and substantially complicates the algorithm. The variance output for each stream is (Y_avg − Y)², where Y_avg is read from the previously generated color texture, and Y is the luminance of the currently processed sample, multiplied by the filter texture;
8. Similarly to the color texture, the variance texture array is detached from the FBO, and the GPU MIP-map build command is issued;

In order to repaint the back buffer, the client draws a screen-sized quad, using the four textures as input. The screen is filled with a custom fragment program. The program accepts the following control parameters: level of detail (LoD) for both streams, light image weight (Lw), image brightness (B), contrast (C), gamma (G), color profile matrix (P), and variance masking strength (Vm). The level of detail (LoD) is already evaluated during rasterization. Now, the LoD values are used by the fragment program to blur texture data if not enough samples are computed. The light image weight is obtained from the server along with the samples, and its value is equal to the number of paths traced from the light sources. This parameter is used to scale the light image texture appropriately, such that the texture can be summed with the camera image texture.

Image brightness, contrast, gamma and color profile are set by the user, and their values adjust the image appearance. Additionally, the visualization client is able to add a glare effect (see Figure 5.6) as an additional post-process, implemented as a convolution with an HDR glare texture, generated according to [Spencer et al., 1995]. However, sufficiently large glare filters are far beyond the computational power of contemporary GPUs at real-time screen refresh rates. Since these parameters are defined only for the client, and do not affect server rendering at all, their values can be modified freely without resetting the server rendering process.

The variance of samples is estimated only for luminance (the CIE Y channel), using the standard variance estimator V ≈ (1/(N−1)) Σᵢ (E(Y) − Yᵢ)², where N is the number of samples, the Yᵢ are luminance values, and E(Y) is the luminance value estimated from the samples computed so far. The client is able to increase blurriness according to the local changes in estimated variance, hence slightly masking the noise produced by stochastic ray tracing. The noise to blurriness ratio can be controlled by the Vm parameter.

The blurriness is created by a low pass filter or by bilateral filtering [Paris et al., 2008] guided by the variance estimation, which can potentially be much better at preserving image features than a simple low pass filter. However, bilateral filtering works correctly only if the noise is less intense than the image features. When the image is heavily undersampled, this assumption may not be satisfied, and a low pass filter remains the only viable option. For example, in Figure 5.8, the two leftmost images cannot be enhanced by bilateral filtering. On the other hand, this technique does a good job improving the quality of the middle image from Figure 5.10.

Unfortunately, the noise masking feature can hide only the random error which is the result of variance. It cannot hide (in fact, it cannot even detect) the other kind of error, resulting from bias.

Figure 5.6: Glare effect applied as a postprocess on the visualization client. The effect is not generated in real time; it took roughly 10 sec to render the image and 1 sec to visualize it on an Nvidia 9800 GT in 512x512 resolution.

The variance is the only source of error in Bidirectional Path Tracing, while the Photon Mapping error is dominated by bias. The details of the back buffer repaint processing are presented in Algorithm 5.4.

Algorithm 5.4: Visualization client repaint processing.
1. The program reads data from both variance maps, using the requested LoDs through hardware MIP-mapping;
2. LoDs for both streams are evaluated according to the initial LoDs, the variance and Vm; for the ith stream: LoD′_i ← LoD_i + Vm log_4([Var]);
3. The [X, Y, Z, W] textures of both streams are sampled, this time using the just evaluated LoD′ and a custom filtering technique (hardware MIP-mapping produces very poor results, see Section 5.4.3 for a more detailed discussion);
4. The texture samples of both streams are normalized, i.e. [X, Y, Z, W] → [X/W, Y/W, Z/W, 1] (if W = 0, the sample is considered to be [0, 0, 0, 1]). Then, the light texture sample, divided by Lw, is added to the camera texture sample, producing a single result for further processing;
5. Optionally, the glare effect is applied here. Our glare texture is generated to be applied in the XYZ color space instead of an RGB one;
6. Tone mapping of luminance (Y) is performed, using a very simple yet effective procedure: Y′ ← 1 − exp(−(B · Y)^C), while the X and Z components are scaled by the Y′/Y ratio. If Y = 0, the image is black at that point and X′Y′Z′ ← (0, 0, 0) is used;
7. The resulting X′Y′Z′ is multiplied by the matrix P, and a basic gamut mapping is performed (see Section 6.1.3). The output is now in RGB format, normalized to the [0, 1] range;
8. Gamma correction using G is performed;
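
A CPU-side sketch of tone mapping step 6 (the actual implementation is a GLSL fragment program, which is not reproduced here):

    #include <cmath>

    struct XYZ { float x, y, z; };

    // Tone map luminance and rescale X and Z to preserve chromaticity.
    // B and C are the user controlled brightness and contrast parameters.
    XYZ toneMap(XYZ c, float B, float C) {
        if (c.y <= 0.0f) return XYZ{0.0f, 0.0f, 0.0f};      // black pixel
        const float yPrime = 1.0f - std::exp(-std::pow(B * c.y, C));
        const float scale = yPrime / c.y;
        return XYZ{c.x * scale, yPrime, c.z * scale};
    }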

Next, the client swaps the front and back buffers, in synchronization with the screen refresh period. This guarantees a constant frame rate (typically 60Hz for common LCDs).¹ Finally, the client reads new samples from the server. The reading is performed with synchronization, blocking the server for a moment. However, the client does not display the samples immediately, blocking the server just for copying this portion of data to its internal buffer for later processing.

5.4.3 MIP-mapping Issues

Images produced by rasterizing ray traced samples are created as screen-sized textures. Should enough samples be generated, these images could be used immediately without any resampling. Unfortunately, contemporary CPUs are far too slow to generate at least #screen pixels of such samples in, say, 1/30 sec, which is required for real time performance. Therefore, some kind of blurring of the texture data, according to the fraction of necessary samples generated and the local sample variance, has to be performed.

While MIP-mapping is reasonably good at filtering out texture details which would otherwise cause aliasing, it cannot be used reliably to blur the texture image. Blurring by using the LoD bias parameter of the texture sampling function produces an extremely conspicuous and distracting square pattern, with severe bilinear filtering artifacts (see Figure 5.7 for details). This is not surprising, since a GPU uses a box filter to generate MIP-maps and linear interpolation between texels to evaluate the texture value at the sampled point. Moreover, MIP-mapping with polynomial reconstruction instead of a linear one fails as well. We have used custom texture sampling with Catmull-Rom spline interpolation for this purpose.

Visually good results can be obtained by using Gaussian blur:

\[
  I(u, v) = \frac{\sum_i \sum_j T_{ij}\, g_{ij}(u, v)}{\sum_i \sum_j g_{ij}(u, v)}.
\]

Here I is the texture sample, (u, v) is the sample position, the T_ij are texel values, and g_ij = exp(−σ d_ij²) is the filter kernel, with σ controlling blurriness and d_ij being the distance between the (u, v) position and the texel T_ij. Unfortunately, a direct implementation of Gaussian blur requires sampling an entire texture for the evaluation of any texture sample, which is far beyond the computational capabilities of contemporary GPUs. The weight of the Gaussian filter, however, quickly drops to zero with increasing distance from the evaluated sample. Truncating the filter to a fixed size window containing a limited number of samples is a commonly used practice.

The simple truncation is not always optimal, since the quality of a truncated Gaussian filter depends strongly on the σ parameter – to obtain similar quality with different sigmas, an O(1/σ) number of texels has to be summed. That is, if a Gaussian filter is truncated too much, it starts to resemble a box filter. In our case, σ varies substantially, and therefore a more advanced technique should be used. We may notice that decreasing the resolution of the original image twice, and increasing σ four times, approximates the original filter on the original image. Eventually, the following algorithm is employed: the initial MIP-map level is set to zero, and while σ is smaller than a threshold t, σ is multiplied by four and the MIP-map level is increased by one.

The threshold t and the number of summed texels have been adjusted empirically to balance blur quality and computational cost. First, we have found that a truncation range R of roughly 2.5 is the maximum value which ensures reasonable performance. For such truncation, setting t ≈ 1 is reasonable. Additionally, if truncation is used, it is better to use a product of g and a smooth windowing function w instead of the original g. The window w = 1 − smoothstep(0, R, d)^E, where E controls how quickly w drops to zero with distance, works quite well. The value E = 8 yields good results.
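
The level selection loop and the windowed kernel weight can be sketched as follows (t, R and E default to the empirically chosen values above; smoothstep is reimplemented CPU-side with its usual GLSL definition):

    #include <algorithm>
    #include <cmath>

    // Choose the MIP-map level and rescale sigma accordingly, so the
    // truncated kernel always covers a similar number of texels.
    inline int chooseLevel(float& sigma, float t = 1.0f) {
        int level = 0;
        while (sigma < t) { sigma *= 4.0f; ++level; }
        return level;
    }

    inline float smoothstep(float a, float b, float x) {
        const float s = std::clamp((x - a) / (b - a), 0.0f, 1.0f);
        return s * s * (3.0f - 2.0f * s);
    }

    // Windowed Gaussian weight of a texel at distance d from the sample.
    inline float filterWeight(float d, float sigma,
                              float R = 2.5f, float E = 8.0f) {
        const float g = std::exp(-sigma * d * d);
        const float w = 1.0f - std::pow(smoothstep(0.0f, R, d), E);
        return g * w;
    }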

What is more, the transition between MIP-map levels is noticeable and decreases image quality. This is especially distracting if σ varies across the image, which is the case because the blur is adjusted to the locally estimated variance. Therefore, similarly as in trilinear filtering, the Gaussian blur is performed on the two most appropriate MIP-map levels, and the results are linearly interpolated, avoiding sudden pops when the MIP-map level changes. Truncation to the range 2.5 therefore causes blurring to use 2 · [(2 · 2.5)²] = 50 texture fetches on average, which is costly, yet acceptable on contemporary GPUs.

¹ The GPU class must be properly matched to the monitor resolution. If the GPU is too weak, interactivity is not obtained. We found that the best contemporary single processor GPU (an Nvidia GTX 285, at the time of testing) is enough for a refresh rate of 30Hz in full HD. Such an issue, however, does not slow down the server – the same number of samples is still rendered in the same amount of time; they are just displayed more rarely, in larger batches.

The sophisticated filtering scheme is used only for the [X, Y, Z, W] textures. The variance [Var] textures, not being displayed directly, do not have to be sampled with anything more complicated than basic MIP-mapping. This saves some computational power of the GPU, yet does not produce noticeable visual artifacts.

Figure 5.7: Comparison of MIP-mapping and custom filtering based blur quality. From left: reference image, hardware MIP-mapping, custom reconstruction based on Catmull-Rom polynomials, windowed Gaussian blur.

5.4.4 Results and Discussion

The quality of rendered images obviously depends mostly on the rendering algorithm used. We have tested the visualization client in cooperation with Path Tracing (Figure 5.8) and Photon Mapping (Figure 5.9). Both figures present the initial image rendered after 1/30 sec and show the speed of image quality improvement. All the tests were performed on an Intel Core i7 CPU and an Nvidia 9800 GT GPU, in 512x512 resolution.

Figure 5.8: Results of Path Tracing (from left: after 1/30 sec, 1/3 sec, 3 sec, 30 sec). The Path Tracing error appears as noise; the blur in the first two images is caused by undersampling (far less than 1 sample per pixel was evaluated).

The client is responsible merely for visualization and postprocessing, assuming that it is provided with a stream of point samples scattered roughly evenly over the entire image. The only algorithm for image quality improvement is noise reduction based on variance analysis. The error due to variance (seen as high frequency noise) is much more prominent in the results of Path Tracing than in Photon Mapping, so the noise reduction has been tested on the former algorithm. The results are presented in Figure 5.10.

Figure 5.9: Results of Photon Mapping (from left: after 1/30 sec, 1/3 sec, 3 sec, 30 sec). Photon Mapping does not produce much noise, but due to the overhead caused by photon tracing and final gathering, fewer image samples than with Path Tracing were computed, which causes some blurriness.

Figure 5.10: Noise reduction based on variance analysis of the Path Tracing image (from left: no noise reduction, with noise reduction, variance image). The difference is not large, but noticeable, especially in the shadowed area beneath the sphere and on the indirectly illuminated ceiling.

When multiple processors are used in the same application, good load balancing is important. While it is well known how to balance ray tracing work between multiple CPUs, in our application it is impossible to balance loads between the visualization client and the ray tracing server. The subtasks performed by the CPUs and the GPU are substantially different and suited to the different architectures of these two processors, so work cannot be moved to the less busy unit as needed. In fact, on contemporary machines the rendering server is always at full load, while the GPU may not be fully utilized, especially when low resolution images are displayed. However, it is good to have some reserve of GPU power to ensure real time client response.

We have presented an interactive GUI visualization client for displaying ray traced images online, written mainly in GLSL. Apart from visualization, the client can hide noise in the input data by means of variance analysis. Moreover, the client can apply a glare effect as a postprocessing technique, which is performed quite efficiently on the GPU. The client is able to obtain interactivity regardless of the ray tracing speed. However, the price to pay is the blurriness of images rendered at interactive rates. Nevertheless, the image quality improves quickly with time whenever the rendered scene is not changed.

Additionally, we have modified the Photon Mapping algorithm to be a one-pass technique, with the photon map being updated interactively during the whole rendering process. This enables using Photon Mapping with the presented visualization client, which can then ensure progressive image quality improvement, without any latencies resulting from the construction of the photon map structure.

Our approach scales well with an increasing number of CPU cores for ray tracing, as well as with an increasing number of shader processors on a GPU. Moreover, the program never reads results back from the GPU, so it does not cause synchronization bottlenecks, and should be friendly to multi-GPU technologies like SLI or CrossFire.

Our visualization client has a lot of potential for future upgrades. The adaptive filtering technique [Suykens & Willems, 2000] seems to be a good approach to significantly reduce image noise on
the side of the visualization client. Moreover, the client can be extended to support frameless rendering [Bishop et al., 1994, Dayal et al., 2005]. This very interesting and promising technique can improve image quality substantially using samples from previous frames, provided that subsequent images do not differ too much.

In the future we plan to introduce stereo capability to our client, using the OpenGL quad-buffered stereo technology. Ray tracing algorithms can easily be converted to render images from two cameras at once, and a lot of them can do this even more efficiently than rendering two images sequentially (for example, Photon Mapping can employ one photon map for both cameras, and similarly, Bidirectional Path Tracing can generate one light subpath for two camera subpaths). Unfortunately, stereo rendering doubles the load on the GPU shaders, as well as on the GPU memory. However, it seems that interactive stereo can be obtained by a slight decrease of the custom texture filtering quality.

Chapter 6

Rendering Software Design and Implementation

Global illumination algorithms alone are not enough to create realistic images. Equally important is the input data, which can properly describe real world geometry, materials, and light sources. Without sufficiently complex artificial scenes, even globally illuminated images look plain, dull and unbelievable. It is a good design idea to separate rendering algorithms from input data management functions by a well designed layer of abstraction. All the classes and functions available to rendering algorithms in this thesis are called environment functions.

In order to achieve a satisfactory degree of realism, environment functions must operate on a huge amount of data. An efficient implementation of this task is very difficult, because of the storage limitations of contemporary machines and the required performance of the functions. In fact, the requirements of low memory consumption and high execution speed are contradictory, and any implementation must seek a reasonable compromise. The efficiency of a given rendering algorithm decides how many environment function calls are required for rendering an image with the requested quality. The final rendering speed, therefore, cannot be satisfactory if a good rendering algorithm is paired with slow environment functions.

Apart from environment functions, a clear and effective interface between them and the rendering algorithms is also a necessity. A well designed interface can make the implementation of the software easy, while a poor one can even preclude the implementation of certain algorithms. The extended stream machine and parallelism are also hidden behind a specialized interface; however, this interface is not fully transparent.

The rest of this chapter starts with a description of the framework for core functionality and its interfaces. These interfaces define the communication between 3D objects and rendering algorithms. Next, an integrated texturing language, specialized for ray tracing and global illumination, is presented, and finally, a new reflection model, also optimized for global illumination algorithms, is derived.

6.1 Core Functionality Interface

The framework mainly defines the interfaces between rendering algorithms, cameras and 3D objects, and some auxiliary functions and classes. Careful design of the framework is an extremely important task in creating a high quality rendering system. The framework decides what can and what cannot ever be implemented. Interface modifications typically cause tedious and time consuming changes of large parts of the system.

Our idea is to define the framework as an interface in the object oriented sense, with the addition of a few auxiliary classes. The framework contains only a variety of sampling methods. That is, algorithms
can be implemented only by the usage of sampling routines, and it is enough to define sampling operations to define new 3D objects or cameras. What is more, the design exhibits the symmetry of light transport, which enables giving 3D objects and cameras very similar and consistent interfaces. This feature substantially simplifies the implementation of bidirectional ray tracing algorithms.

The idea of a modular rendering system, supporting a variety of ray tracing concepts, is not new. Some early works on this topic are [Kirk & Arvo, 1988] and [Shirley et al., 1991]. Kirk and Arvo [Kirk & Arvo, 1988] described a framework for classic ray tracing, whereas Shirley et al. [Shirley et al., 1991] presented a design of a global illumination system. However, Shirley's framework is designed for the zonal method, incorporates a lot of details of particular algorithms and therefore cannot be easily modified to support different rendering techniques. A more recent work in this area [Urena et al., 1997] presents a very general object oriented design for not only ray tracing, but also z-buffer rasterization and radiosity. Unfortunately, the approach is overcomplicated and inconvenient if only quasi-Monte Carlo ray tracing is to be supported. Greenberg et al. [Greenberg et al., 1997] describe the current (for the year 1997) rendering techniques and research areas, rather than defining an object oriented framework. This paper divides the image synthesis task into light reflection models, light transport simulation and perceptual issues. This allows independent research in any of these domains. Moreover, it gives a lot of attention to the correctness of rendered images, obtained by using carefully measured reflection data and validation of images of artificial scenes through comparison with photographs of real ones. Geimer and Muller's [Geimer & Muller, 2003] main point is interactivity and support of Intel's SSE* instruction sets [Intel Corporation, 2009]. Leeson et al. [Leeson et al., 2000] designed an interface around mathematical concepts like functions and integration, besides the rendering ones. This system also provides debugging elements – independent testing of implementations of particular interfaces and a graphical viewer of traced rays. Wald et al. [Wald et al., 2005] present a system suitable for both geometric design accuracy and realistic visualization based on ray tracing, executed in real time.

All the previously mentioned projects assume that rays travel along straight lines in 3D space. However, there are some exotic approaches which break this assumption. Hanson and Weiskopf [Hanson & Weiskopf, 2001] do not assume that the speed of light is infinite and visualize relativity effects by using a 4D (3D + time) space. A novel and interesting approach to ray tracing [Ihrke et al., 2007] allows rendering of volumetric objects with varying refractive indexes. Due to that, rays no longer travel along straight lines. Instead, their trajectories are evaluated by solving partial differential equations. These methods require a substantially different approach to tracing rays.

The most similar to our approach is the one of Pharr and Humphreys [Pharr & Humphreys, 2004]. They designed a framework for ray tracing based global illumination and implemented a few well known algorithms within it. However, the framework mixes some aspects of 3D object representation with the implementation of rendering algorithms. While this can sometimes be useful in fine tuning the rendering for the best achievable performance, it could be very time consuming and error prone when integrating new concepts into the rendering platform.

The interface is based on a few structures: a photon event, a spectral color and sample data. These structures, with some other elements, are arguments of the sampling routines. A photon event describes the intersection of a ray with object geometry and photon scattering. It is filled by the intersection function, and can be read by subsequent function calls. A spectral color describes, depending on the routine, an emission spectrum, a camera sensitivity or a scattering factor. The sample data is used for generating quasi-Monte Carlo sample points.

6.1.1 Quasi-Monte Carlo Sampling

The sampling interface allows a very elegant and concise way of specifying what can be done with cameras and 3D objects. The sampling methods are general enough to implement the majority of ray tracing algorithms. Basic sampling of a function means selecting arguments of a function y = f(x) at random, with probability depending on its shape. The argument of each function is a variable in some space Ω₁. Similarly, y = f(x) is not necessarily a real number; in general f transforms x ∈ Ω₁ → y ∈ Ω₂. In our sampling three basic operations can be distinguished, as listed below.

• Sampling the argument x (e.g. direction of scattering).

• Evaluation of the sampling probability density (e.g. what is the probability of scattering a ray in a given direction ω). The result is often difficult to compute exactly and can be approximated; however, a crude approximation hurts performance.

• Evaluation of the value y = f(x) for a given argument x (e.g. what is the scattering factor in a given direction ω).

All the sampling routines can be grouped into four different categories listed below.

• Queries about a point on a surface or in a volume – in scene objects these queries randomize emission points, while in cameras they select a point on the lens. The y value denotes a spectrum, which describes spatial emission or sensitivity, respectively. In the case of point lights or pinhole cameras, y is a δ distribution coefficient. The probability is measured with respect to the lens or light source area, or a light source volume in the case of volumetric emission. These query functions are used by ray tracing algorithms to start a light transport path.

• Queries about transmission – such as 'find the nearest ray intersection with an object or lens' or 'check if and where a ray interacts with a medium' if the object contains volumetric effects. The argument x represents a point in a volume or a point on a surface, while the y value is a spectrum describing attenuation along the ray path. The probability is measured with respect to the distance.

• Queries about emission/sensitivity – such as what is the emission or camera sensitivity at a point in a given direction. Points are generated by the point generating group of routines. The argument x is a direction vector, and the y value is a spectrum representing emission or sensitivity. Probability is measured with respect to solid angle for volumetric emission, or projected solid angle for surface emission and cameras.

• Queries about scattering – the direction in which a ray scatters after a collision with a scene object. The argument x is a direction vector, while the y value is a BSDF or a phase function represented as a spectrum. Probability is measured with respect to solid angle for volumetric scattering, projected solid angle for surface scattering, or is a δ distribution for perfect specular surfaces. These functions are used only with respect to points acquired by one of the two previous methods – either surface or volume sampling, or transmission.

Eventually, the basic interface contains twelve functions altogether. However, to make the implementation more efficient, the query about transmission attenuation is divided into two functions. One tests a ray for the nearest intersection with an opaque surface, returning a binary value indicating whether the ray hits a surface or not, and a distance if a hit occurred, while the other calculates the attenuation of ray radiance due to participating media and semi-transparent surfaces. Finally, the interface contains one additional function – a routine for freeing temporary memory, since most implementations require storing data between the calls which generate points and the queries about scattering. A condensed sketch of such an interface is shown below.
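
The sketch condenses the four query categories into one abstract class; the names and signatures are illustrative assumptions, not the actual twelve-function interface:

    struct PhotonEvent;    // intersection / scattering record
    struct Spectrum;       // spectral color
    struct SampleData;     // access to QMC sample point coordinates
    struct Ray; struct Point; struct Direction;

    class SceneEntity {    // common interface of 3D objects and cameras
    public:
        virtual ~SceneEntity() = default;

        // Point queries: emission points on objects, lens points on cameras.
        virtual Spectrum samplePoint(SampleData& s, Point& out, float& pdf) = 0;

        // Transmission queries, split in two for efficiency: nearest opaque hit...
        virtual bool intersect(const Ray& r, PhotonEvent& out, float& dist) = 0;
        // ...and attenuation due to media and semi-transparent surfaces.
        virtual Spectrum attenuation(const Ray& r, float maxDist) = 0;

        // Emission or camera sensitivity at a point, in a sampled direction.
        virtual Spectrum sampleEmission(SampleData& s, const Point& p,
                                        Direction& out, float& pdf) = 0;

        // Scattering at a previously generated photon event.
        virtual Spectrum sampleScattering(SampleData& s, PhotonEvent& at,
                                          Direction& out, float& pdf) = 0;

        // Free temporary data stored between point and scattering calls.
        virtual void releaseTemporaries() = 0;
    };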

Quasi-Monte Carlo numbers are used instead of truly random ones. In fact, despite not being strictly correct without modification of the theory behind Monte Carlo integration, this approach allows the reproduction of any sequence of operations, and often better convergence. There are two kinds of random number generators. The first are generators which preserve state, with an initialization routine and a 'next' routine. These generators can produce numbers only sequentially, thus random access to the middle of a sequence is extremely inefficient. The second kind of generators are the so-called explicit ones – hash functions or low discrepancy sequences. They allow immediate access to any pseudorandom number. The low discrepancy sequences allow better convergence of MC integration, but only for low-dimensional functions. An algorithm generating a good quality high dimensional low discrepancy sequence is still an unsolved task.

The presented system needs a random access sequence with theoretically unbounded dimensionality. It uses for this purpose a complex generator made from two basic ones. The first four dimensions are generated by a Niederreiter (t, s)-sequence in base 2 with s = 4 and quality t = 1 [Niederreiter
& Xing, 1996] (see Section 3.3.2 for a brief description of (t, s)-sequences). Further dimensions are generated by a pseudorandom number generator. The 64-bit seed is formed as the sample number xor-ed with the dimension number multiplied by a large prime number. The number generator consists of three 64-bit congruential RNG steps, separated by translations of high order bits to mix with the low ones. The Mersenne Twister generator [Matsumoto & Nishimura, 1998], known for its good statistical properties, cannot be used here, because it does not offer random access to an arbitrary sequence number, providing a parameterless 'next' routine instead. Nevertheless, our simple function works quite well, and does not exhibit significant flaws.
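
An explicit generator of this kind may be sketched as follows; the multiplier and mixing constants below are common illustrative choices, not the values used in the thesis:

    #include <cstdint>

    // Randomly accessible pseudorandom value for (sample, dimension).
    inline double explicitRandom(std::uint64_t sample, std::uint32_t dimension) {
        const std::uint64_t prime = 0x9E3779B97F4A7C15ull;        // assumed constant
        std::uint64_t s = sample ^ (std::uint64_t(dimension) * prime);
        s = s * 6364136223846793005ull + 1442695040888963407ull;  // congruential step 1
        s ^= s >> 33;                                             // mix high bits into low ones
        s = s * 6364136223846793005ull + 1442695040888963407ull;  // congruential step 2
        s ^= s >> 29;
        s = s * 6364136223846793005ull + 1442695040888963407ull;  // congruential step 3
        s ^= s >> 32;
        return (s >> 11) * (1.0 / 9007199254740992.0);            // top 53 bits -> [0, 1)
    }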

Each sample point is described by a 64-bit sample number, and each coordinate of a point is described by a separate 32-bit coordinate number. The light transport algorithm provides these numbers to the sampling routines implemented by 3D objects and cameras. Implementations of these entities actually must use the provided QMC sample point generator, instead of defining independent sample point generating routines, due to peculiarities of QMC sampling (see Section 3.3.5 for a more detailed explanation). Each time an object or a camera uses a pseudorandom number, the coordinate number is increased by one. This operation ensures that all object intersections generated by subsequent rays on the same path use the appropriate coordinates of a sample point. On the other hand, each individual path is assigned a different sample point, so no accidental correlation between sample numbers can occur.

6.1.2 Ray Intersection Computation

Efficient computation of intersections of rays with scene geometry is crucial from the performance point of view. Many so-called acceleration structures have been proposed, which provide average logarithmic complexity of this computation with respect to the number of primitives in the scene. However, we are not aware of an efficient implementation of an acceleration structure that can keep surface and volumetric primitives together. Typically, two separate structures are used, or even worse, volumetric primitives are just kept in an array. In this section we propose a substantially better method, which is significantly faster if a scene contains a lot of volumetric primitives. Moreover, our approach is capable of handling semi-transparent surfaces, such as thin stained glass, with no light path vertex for ray scattering on such a surface.

Due to their best known performance, we chose kd-trees as acceleration structures. Volumetric primitives are inserted into trees similarly to common surfaces – they can be bounded by boxes, and tests whether such a volume has a common part with a tree node are easily performed. Almost any well known tree construction algorithm can therefore be immediately adapted to insert volumetric primitives into the tree alongside surfaces. More tricky is efficient traversal of such a tree. The effect of a volumetric primitive on the radiance of a ray depends on, among others, how long the ray travels through the primitive. This quantity, however, is unknown during traversal, because a surface which is eventually hit by the ray may be found after a ray-volume interaction has already been computed. Figure 6.1 shows an example ray interacting with various primitives.

Whenever a ray intersection search is performed, a reference and a distance to the nearest opaque surface encountered so far are kept. Meanwhile, references and distances to semi-transparent surfaces and volumetric primitives encountered along the ray are stored in a heap. The key used in the heap is a distance, which in the case of volumetric primitives is the distance to the initial intersection of the ray with the primitive boundary. If a nearer opaque surface is found, references to primitives that are intersected farther away are removed from the heap. When the tree traversal is finished, ray radiance is modified according to each semi-transparent primitive stored in the heap. Therefore, independent traversals for surface and volumetric primitives are unnecessary.
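A minimal sketch of this bookkeeping is given below; all type and member names are hypothetical, and a real implementation would integrate this with the kd-tree traversal loop.

    #include <functional>
    #include <queue>
    #include <vector>

    // Non-opaque hits (semi-transparent surfaces and volume entry points) are
    // kept in a min-heap keyed by distance along the ray; only the nearest
    // opaque hit is tracked explicitly.
    struct Hit {
        double distance;  // for volumes: distance to the boundary entry point
        int primitiveId;  // semi-transparent surface or volumetric primitive
        bool operator>(const Hit& o) const { return distance > o.distance; }
    };

    struct TraversalState {
        double nearestOpaque = 1e30; // nearest opaque hit found so far
        std::priority_queue<Hit, std::vector<Hit>, std::greater<Hit>> heap;

        void foundOpaque(double d) {
            if (d < nearestOpaque) nearestOpaque = d;
        }
        void foundTransparentOrVolume(const Hit& h) {
            if (h.distance < nearestOpaque) heap.push(h);
        }
        // After traversal: process surviving entries front to back; entries
        // behind the final opaque hit are discarded lazily here.
        template <typename AttenuateFn>
        void finish(AttenuateFn attenuate) {
            while (!heap.empty()) {
                Hit h = heap.top(); heap.pop();
                if (h.distance < nearestOpaque)
                    attenuate(h.primitiveId, nearestOpaque); // clip volumes here
            }
        }
    };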

Moreover, semi-transparent surfaces which affect ray radiance but do not scatter rays into different directions, like thin stained glass where the offset due to refraction is negligible, can be added 'for free', without the necessity of generating extra intersections with these primitives. Effectively, this optimization shortens light transport paths by one vertex for each such intersection. These paths are faster to generate and less prone to high variance. Typically, a semi-transparent surface combines a transparent and an opaque element. For example, a thin glass surface may either reflect a ray or let it through, attenuating it but not changing its direction.


Figure 6.1: Interaction of a ray with surface and volumetric primitives. The ray is terminated when it hits an opaque surface. The interaction with a volume takes place only up to the point of termination. Semi-transparent surfaces which do not scatter the ray may be accounted for without generating additional ray-surface intersections.

Figure 6.2: Optimization of intersections of rays with semi-transparent surfaces. Left image: standard ray intersection. Right image: omission of explicit generation of ray intersections with such surfaces. Both techniques obviously converge to the same result, but much faster with this optimization.

The implemented software assigns a degree-of-transparency coefficient to each surface, which describes how much light is able to pass through the surface. This coefficient is analogous to the scattering coefficient σs used to describe participating media. The coefficient may depend on several variables, e.g. the cosine of the angle between the ray direction and the surface normal. Whenever such a surface is intersected, a random test is performed to determine if the ray passes through the surface or is scattered on it. The effect of this improvement is presented in Figure 6.2. Here, a semi-transparent stained glass is modeled with the described technique. The difference is striking, especially in the coloured shadow viewed indirectly through the glass.
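The test itself reduces to a single comparison against a quasi-Monte Carlo sample; the sketch below uses hypothetical names:

    // 'transparency' is the degree-of-transparency coefficient of the surface,
    // possibly dependent on the cosine between the ray direction and the
    // surface normal; 'u' is a QMC sample in [0, 1).
    enum class HitEvent { PassThrough, Scatter };

    inline HitEvent classifyHit(double transparency, double u) {
        // With probability equal to the transparency coefficient the ray
        // continues in the same direction (only its radiance is attenuated),
        // adding no vertex to the light transport path; otherwise it is
        // scattered by the surface BSDF.
        return (u < transparency) ? HitEvent::PassThrough : HitEvent::Scatter;
    }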

6.1.3 Spectra and Colors

An efficient technique for sampling spectra (described in Section 4.3) is not enough for full spectral rendering. Typically, obtaining an input scene description with spectral data is a problem in itself. The spectral data is often unavailable, and the physical simulations necessary to obtain it are far too complicated.


In this case a conversion from an RGB color representation has to be performed. One of the popular algorithms is described by Smits [Smits, 1999]. We have provided an alternative approach for this purpose, however.

Additionally, the result of any spectral simulation is a spectrum, which cannot be displayed directly. It has to be converted to an RGB model, taking into account properties of the human visual system as well as the color gamut of the target hardware. Since the implemented rendering core outputs image samples in the CIE XYZ format, the conversion to an RGB model can easily be replaced by a more sophisticated approach.

Acquiring spectral data

Spectral data typically can be obtained by employing physically based calculations, using measured data, or by conversion from RGB images. Common examples of physically based calculations are Planck's blackbody radiation formula and reflection from metals described by the Fresnel equations. These formulae are not computationally expensive when compared to the full global illumination cost and give physically plausible results, so they should be applied whenever possible. If there are no simple physical equations describing a given phenomenon, measured data can be applied. An example of such an approach is the CIE illuminant D65, which is a tabulated spectrum of daylight illumination.

However, typically there exist neither physical formulae nor measured spectral data. In this case the only solution is to convert RGB colors to the full spectral representation. This conversion is, obviously, not well-defined – there are infinitely many spectra for any one RGB color, and the implementation must arbitrarily choose one of them. Compared to the previously created conversion algorithm [Smits, 1999], our approach is simpler, produces a smoother output spectrum, and is applicable to point sampled spectra representations instead of piecewise constant basis functions. Because such a conversion is not well defined, it cannot be said which approach is more accurate in general.

Actually, plausible RGB to spectrum conversions for material reflectivity and for light sources are different. An idealized white material is a material which reflects all the received light, and therefore absorbs nothing. So the conversion for materials maps an RGB triplet (1, 1, 1) to a spectrum which has the constant value of one. If this conversion is used for a light source, the effect is a reddish illumination. This is because white light is not light with a constant spectrum, but daily Sun light, which is a complex result of the scattering of Sun rays in the Earth's atmosphere. The human visual system adapts to these conditions, and perceives this as neutral, colorless light.

The basic conversion implemented in our model is defined for reflectances. A plausible conversion means that the resulting spectra satisfy some requirements. First, triplets of the form (c, c, c) should be mapped to appropriate constant valued spectra. Second, the perceived hue of an RGB color should be preserved – when a textured surface is illuminated with white light, the output image should match the texture color. This can be precisely expressed as:

$$\mathrm{spectrum}\big((r, g, b)\big) \cdot D65 \rightarrow XYZ \rightarrow RGB = c_1 (r, g, b), \qquad (6.1)$$

where $c_1$ is an arbitrary constant, using the sRGB profile for the $XYZ \rightarrow RGB$ transform. Moreover, in most cases the resulting spectrum should be smooth.

The basis of the conversion is three almost arbitrarily chosen functions: r(λ), g(λ) and b(λ). For simplicity, in our model these functions are spline based, and the actual spectral curve for the blue component is dependent: b(λ) = 1.0 − (r(λ) + g(λ)), which guarantees that the functions sum to one. The converted spectrum is calculated as s(λ) = R·r(λ) + G·g(λ) + B·b(λ), where (R, G, B) is a given color triplet. The conversions for light sources are defined as a product of the daylight illumination spectrum (given as the standard CIE illuminant D65) and a converted (R, G, B) triplet. Thus the RGB data acts as a modulator of D65 white light. One possible choice of r(λ), g(λ) and b(λ) is presented in Table 6.1. The precision of the algorithm cannot be perfect, because it involves a lot of measured physiological and hardware data. Particularly, there is no unique RGB standard, and the conversion of D65 light to one of these standards does not necessarily result in an ideal gray color.


R(λ) function, for a ≤ λ < b:

    a [nm]     b [nm]     R(λ)
    0          404.50     R = 0.165
    404.50     447.29     R = 0.165 (1 − P(a, b, λ))
    447.29     574.34     R = 0
    574.34     603.73     R = P(a, b, λ)
    603.73     ∞          R = 1

G(λ) function, for a ≤ λ < b:

    a [nm]     b [nm]     G(λ)
    0          468.22     G = 0
    468.22     494.75     G = P(a, b, λ)
    494.75     573.99     G = 1
    573.99     601.77     G = 1 − P(a, b, λ)
    601.77     ∞          G = 0

Table 6.1: Piecewise polynomial spectral functions for RGB colors. The R(λ) and G(λ) functions are presented. The function B(λ) is defined as B(λ) = 1 − (R(λ) + G(λ)). The polynomial P(a, b, x) is a function P(a, b, x) = t²(3 − 2t), where t = clamp((x − a)/(b − a), 0, 1).

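The functions of Table 6.1 transcribe directly into code; the following sketch (C++17 for std::clamp) evaluates the converted reflectance spectrum s(λ) = R·r(λ) + G·g(λ) + B·b(λ):

    #include <algorithm>

    // Smoothstep-like polynomial P(a, b, x) = t^2 (3 - 2t) from Table 6.1.
    static double P(double a, double b, double x) {
        double t = std::clamp((x - a) / (b - a), 0.0, 1.0);
        return t * t * (3.0 - 2.0 * t);
    }

    static double R(double l) { // wavelength l in nanometers
        if (l < 404.50) return 0.165;
        if (l < 447.29) return 0.165 * (1.0 - P(404.50, 447.29, l));
        if (l < 574.34) return 0.0;
        if (l < 603.73) return P(574.34, 603.73, l);
        return 1.0;
    }

    static double G(double l) {
        if (l < 468.22) return 0.0;
        if (l < 494.75) return P(468.22, 494.75, l);
        if (l < 573.99) return 1.0;
        if (l < 601.77) return 1.0 - P(573.99, 601.77, l);
        return 0.0;
    }

    static double B(double l) { return 1.0 - (R(l) + G(l)); }

    // Reflectance conversion: sample the spectrum of an (R, G, B) triplet.
    double spectrumAt(double r, double g, double b, double lambda) {
        return r * R(lambda) + g * G(lambda) + b * B(lambda);
    }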

Conversion to RGB from XYZ

For the XYZ to RGB transformation our idea is to examine a simple mapping approach which can still produce good quality results. The presented technique does not account for luminance and chromatic adaptation, but it leaves a few parameters to be adjusted arbitrarily. The algorithm presented here contains two independent parts. The first is luminance (tone) mapping, and the second is the actual color conversion. Luminance mapping allows setting sensitivity and contrast parameters, while color conversion takes a color profile matrix.

The luminance mapping is necessary because common display hardware has very limited contrast in comparison with the contrasts that appear in nature. Usually mapping algorithms map the computed luminance from the range $[0, \infty)$ to the range $[0, 1]$, which is then quantized linearly (typically to 256 levels) and displayed. The simplest solutions are to clamp all too large values to unity, or to scale linearly by the brightest value. These methods, however, produce too many white patches or display dark regions far too dark. Effective conversion needs a nonlinear mapping. In our model we use the function $y' = 1 - 2^{-(\sigma y)^C}$ for this purpose, where $y$ is the computed luminance, $y'$ is the mapped luminance, $\sigma$ is a brightness scaling parameter and $C$ is contrast. In advanced approaches, local contrast preserving techniques are used. They use different scaling factors in different parts of the image, based on local average brightness. The result is better contrast in the whole image, but at the price of a non-monotonic mapping.
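The global variant of the mapping is a one-liner; the sketch below is a direct transcription of the formula above:

    #include <cmath>

    // Tone mapping curve y' = 1 - 2^(-(sigma*y)^C); maps luminance from
    // [0, infinity) to [0, 1). 'sigma' is sensitivity, 'contrast' is C.
    inline double mapLuminance(double y, double sigma, double contrast) {
        return 1.0 - std::exp2(-std::pow(sigma * y, contrast));
    }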

After mapping the luminance, the RGB output has to be calculated. Due to the limited color gamut of the RGB model, the exact result cannot be produced. We assume that the mapped luminance ($y'$) must be preserved exactly in order to keep the whole mapping monotonic, the hue also must be roughly preserved (it cannot be changed e.g. from red to blue while mapping), and the only parameter that may be modified by a large amount is saturation. In this technique all problems that emerge while mapping result in a fade-to-white effect. Particularly, when sufficiently strong lighting is used, each color becomes pure white, i.e. (1, 1, 1) in an RGB model. This should not be seen as a harmful effect, since it is similar to overexposure in photography. Careful adjustment of the $\sigma$ and $C$ parameters gives reasonably good results in almost all lighting conditions.

The color conversion algorithm requires a color profile matrix $P$. The matrix multiplication can produce out of gamut colors. However, simple desaturation of such colors to render them displayable works reasonably well. The desaturation algorithm uses the color profile matrix $P$ and the second row of its inverse, $P^{-1}$, which affects the $y$ component in the expression $XYZ = P^{-1} \cdot RGB$. These elements must be nonnegative and must sum to one, a requirement satisfied by the sRGB color matrix. The algorithm assumes that the $y$ component of the input is in the $[0, 1]$ range. The idea of this approach, presented in Algorithm 6.1, is to compute the clamped RGB color first, then check how clamping affected luminance, and finally adjust the RGB color to compensate for the luminance change.

In Figure 6.3 there is a comparison between desaturation and clamping, using the sRGB color profile. The colors are defined by blackbody radiation (1500K for red and 6600K for gray) with carefully chosen mapping sensitivity.


Algorithm 6.1: Gamut mapping by desaturation.

    RGB ← P * XYZ
    clRGB ← clamp(RGB, 0, 1)
    Δy ← y' − (P⁻¹ * clRGB).y
    if Δy ≤ 0 then
        ΔRGB ← clRGB
    else
        ΔRGB ← 1 − clRGB
    end
    result ← clRGB + ΔRGB · Δy / (P⁻¹ · ΔRGB).y
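A runnable transcription of Algorithm 6.1 might look as follows (C++17); the Vec3 type and the representation of the matrix rows are assumptions made for this sketch:

    #include <algorithm>

    struct Vec3 { double r, g, b; };

    static double dot(const Vec3& w, const Vec3& v) {
        return w.r * v.r + w.g * v.g + w.b * v.b;
    }

    // 'rgb' is P * XYZ; 'yPrime' is the mapped luminance; 'lumRow' is the
    // second row of P^{-1}, so that (P^{-1} * v).y == dot(lumRow, v).
    Vec3 desaturate(const Vec3& rgb, double yPrime, const Vec3& lumRow) {
        Vec3 cl = { std::clamp(rgb.r, 0.0, 1.0),
                    std::clamp(rgb.g, 0.0, 1.0),
                    std::clamp(rgb.b, 0.0, 1.0) };
        double dy = yPrime - dot(lumRow, cl);   // luminance lost by clamping
        Vec3 dir = (dy <= 0.0)
            ? cl                                          // compensate toward black
            : Vec3{ 1.0 - cl.r, 1.0 - cl.g, 1.0 - cl.b }; // compensate toward white
        double s = dy / dot(lumRow, dir);       // restore the mapped luminance
        return { cl.r + dir.r * s, cl.g + dir.g * s, cl.b + dir.b * s };
    }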

Each subsequent color patch has its sensitivity set to a value 1.25 times larger than the previous one. The first row is computed with luminance correction – it exposes the fade-to-white effect. On the other hand, the saturated images in the second row cannot approach full luminance, no matter how high the sensitivity is.

Figure 6.3: Comparison between different gamut mapping techniques. The upper row is generated with desaturation, while the lower one is due to clamping. The red color is defined as 1500K blackbody radiation, and gray has a temperature of 6600K.

6.1.4 Extension Support

Extension support is a very handy feature if a ray tracer implementation is used for experiments with a variety of light transport algorithms. Unfortunately, the C++ language does not support machine code loading at runtime. The rendering software uses the Windows DLL mechanism [Richter, 1999] for this purpose. Due to the lack of a C++ language standard for importing pure virtual functions from dynamic libraries, arrays of virtual functions are created manually, from standard C-like functions with an explicitly passed 'this' pointer. This tedious work is necessary to provide compatibility if different compilers are used for the platform core and individual plugins.
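The pattern reduces to plain C structs of function pointers; a hypothetical fragment for a camera plugin could look like this:

    // Plain C functions with an explicit 'this' pointer keep the plugin ABI
    // independent of the compiler used to build the core and the plugins.
    extern "C" {

    struct CameraVTable {
        void (*generateRay)(void* self, const float sample[2], float rayOut[6]);
        void (*destroy)(void* self);
    };

    struct CameraPlugin {
        const CameraVTable* vtable; // filled in by the plugin DLL
        void* state;                // plugin private data, opaque to the core
    };

    } // extern "C"

    // Core-side call through the manual virtual function table:
    inline void generateRay(CameraPlugin& c, const float s[2], float ray[6]) {
        c.vtable->generateRay(c.state, s, ray);
    }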

The majority of rendering functionality is defined by abstract interfaces, and therefore can be implemented as plugins. The most important objects which can be implemented as plugins are listed below:

• Light transport algorithms,

• 3D objects,

• Cameras,

Each plugin provides an independent part of rendering functionality, which can be modified and experimented with independently of the rest of the system, possibly by an independent group of people.

Typically, implementation of the necessary functions of the above mentioned objects requires so much computational resources that the additional layer of virtual function calls has little effect on performance. On the other hand, the functionality listed below is not available as plugins:

• Spectrum representation,

Page 101: Design and Implementation of Parallel Light Transport Algorithms based on quasi-Monte Carlo

CHAPTER 6. RENDERING SOFTWARE DESIGN AND IMPLEMENTATION 90

• Quasi-Monte Carlo sample generation.

These functions are much simpler and much more frequently executed, so making them virtual would noticeably hurt performance. As a consequence of being integrated into the core functionality, any modification to them requires recompilation of the rendering core and all plugins.

6.2 Procedural Texturing Language

Texturing is a technique used in rendering. Its purpose is improvement of image quality without increasing the geometrical complexity of 3D objects. A simple application of a texture, presented in Figure 6.4, shows how much in contemporary computer graphics is modeled using textures.

Figure 6.4: Left image: human model with the head built from geometric primitives only. Right image: the result of texture application onto the geometrical model.

Classic (non-procedural) texturing, presented in Figure 6.4, is a painting of 2D maps onto 3D models. In spite of being able to define arbitrary appearance of any 3D model, this technique exhibits some flaws. First, texture maps have to be created, which is often time consuming. Second, storing maps during rendering consumes a lot of machine memory. Moreover, sometimes 3D textures are also useful. These textures require far more storage still. As a consequence, complex scenes are frequently constructed using the same texture several times or using textures with reduced resolution. A partial solution to this problem is rendering with compressed textures [Beers et al., 1996, Wang et al., 2007, Radziszewski & Alda, 2008b].

However, there are a lot of objects which can be textured procedurally. Procedural texturing is based on executing typically short programs, which evaluate material properties at textured points. An example result of procedural texturing is presented in Figure 6.5. Procedural texturing, despite the possibility of generating a wide variety of object appearances with minimal storage, is not always an ideal solution. First, it is not always possible to apply it, as with the human face in Figure 6.4. Second, more complicated programs can significantly increase rendering time. A simple array lookup, even with sophisticated filtering, typically is faster than executing a general program. Because of this, the most common solution is a hybrid approach – a procedural texturing language enhanced with commands supporting classic texture images.

Rendering, especially in real time, used to be based on a hard-coded shading formula, defined as a sum of a matte reflection with texture color and a white highlight with adjustable glossiness [Shreiner, 2009]. More modern approaches allowed using a few textures combined by simple mathematical operations, like sum or product. Obviously, these methods were totally inadequate for simulation of the vast diversity of real life phenomena. The first well known approach to flexible shading were shade trees [Cook, 1984]. Since that time a lot of far more sophisticated languages designed for procedurally defining the appearance of surfaces have been created. Many useful techniques for procedural texturing are presented in [Ebert et al., 2002]. A popular shading language for off-line rendering is part of the RenderMan software [Cortes & Raghavachary, 2008]. Typical shading languages used in modern, hardware accelerated rendering are GLSL [Rost & Licea-Kane, 2009], HLSL [Luna, 2006], and Cg [Fernando & Kilgard, 2003].


Figure 6.5: Sample procedural texture.

These languages have syntax similar to C, but they are compiled for specialized graphics hardware. Currently, these languages impose serious restrictions on programs. Nevertheless, these restrictions are forced by deficiencies of contemporary GPUs, rather than being an inherent limit of the languages. Therefore, it is widely expected that these languages will become far more powerful in the near future.

As a part of the rendering software, we have designed and implemented a texturing language optimized for ray tracing, with a functional language syntax. The rest of this section starts with a description of functional programming aspects. Next, our language syntax and semantics are explained. This is followed by the language execution model and its virtual machine API, targeted at integration with rendering software. Finally, example results are presented.

6.2.1 Functional Languages

Functional programming is a programming technique which treats computation as evaluation of mathematical functions. This evaluation is emphasized in favour of state management, which is a major part of the traditional, imperative programming style. In practice, the difference between a mathematical function and the concept of a function used in imperative programming is that imperative functions can have side effects, reading and writing state variables besides their formal input parameters and output result. Because of this, the same language expression can result in different values at different times, depending on the state of the executing program. On the other hand, in functional code the output value of a function depends only on the arguments that are input to the function, so calling a function f multiple times with the same value for an argument x will produce the same result f(x) every time. Eliminating side-effects can make it easier to understand the behavior of a program. Moreover, it makes it easier to implement compiler optimizations, improving performance of functional programs. This is one of the key motivations for the development of functional languages. One such language is Haskell [Hudak et al., 2007].

The main idea of imperative programming is the description of subsequent tasks to perform. These languages are good as general purpose languages. They can describe typical mathematical functions, as well as arbitrary other tasks, e.g. a web server or a computer game. However, in some programming tasks programs always evaluate a function of already defined input variables and constants, and return the result of this function. In such cases, functional programming seems to be much more convenient and less error prone. This is precisely the case of programmable texturing and shading, and therefore we argue that functional languages could be better for such tasks than the general purpose, C-like languages so often used for this purpose.

Functional languages have been used in computer graphics already [Karczmarczuk, 2002, Elliott, 2004]. Karczmarczuk used the functional language Clean to implement a library with image synthesis and processing tools. This library is not integrated with any rendering application; the toolset can just create 2D images, display and save them. Elliott, on the other hand, created the Vertigo language, used for programming graphics processors.


The language compiler outputs assembly in the DirectX format, so the language can be treated as an HLSL replacement. To our knowledge, there is no functional language designed to cooperate with ray tracing based global illumination, which is the motivation for our research in this area.

6.2.2 Syntax and Semantics

The language design is largely affected by the language purpose. In this case, the language is designed for texturing and cooperation with ray tracing based global illumination software. Therefore, the language is not intended to be used as a tool for creating standalone applications. Its usage model is similar to that of languages like GLSL, HLSL or Cg – the programs are compiled at runtime and executed by the rendering application in order to define objects' appearance. The rest of this section starts with a distinction between texturing and shading. This is followed by a description of the presented language grammar and the semantics of its grammatical constructions.

Texturing vs. Shading

Texturing and shading are in fact substantially different operations, despite the fact that these terms are frequently used interchangeably. The shading operation is an evaluation of the final color of an image fragment which is about to be placed in the frame buffer. Shading takes into account material properties and illumination as well. On the other hand, the texturing operation defines only the material properties at the textured point.

This differentiation is important, because different types of rendering algorithms use either texturing or shading. Shading is used in rasterization algorithms supporting local illumination, where shaders are expected to produce the final fragment color. Local illumination uses information about the shaded point and a tiny set of constants, like light source descriptions. This information obviously is not enough for full global illumination calculations. On the other hand, global illumination algorithms automatically calculate illumination simulating the laws of physics. Any intervention into these procedures by shaders is harmful, so global illumination algorithms are designed to cooperate with texturing only. Since the thesis concentrates on the latter approach only, the presented language is designed for texturing purposes only.

Language Grammar and Semantics

The simplified language grammar is presented in Table 6.2. At the highest level, the program contains surface and volume descriptors, together with function, constant and type definitions. The descriptors specify particular outputs, which have to be defined. The outputs are script-defined expressions returning values of a particular type. All possible outputs of surface descriptors are listed below, with required types given in brackets:

• BSDF (material, default value is matte),

• scattering coefficient (spectrum),

• absorption coefficient (spectrum),

• emission function (material),

• emission coefficient (spectrum),

• surface height (real),

• surface height gradient (real[2]).

The volume descriptors have similar fields, except height and gradient:

• phase function (material, default value is isotropic),

Page 104: Design and Implementation of Parallel Light Transport Algorithms based on quasi-Monte Carlo

CHAPTER 6. RENDERING SOFTWARE DESIGN AND IMPLEMENTATION 93

• scattering coefficient (spectrum),

• absorption coefficient (spectrum),

• emission function (material),

• emission coefficient (spectrum).

The scattering coefficients in both descriptors describe how likely the primitive is to scatter rays (see Section 2.2.2 for an explanation of the volume scattering coefficient). The scattering coefficient in surfaces can be used to create transparent or semi-transparent surfaces (see Section 6.1.2). Whenever the scattering coefficient is non-zero, scattering may occur. The scattering is described by BSDFs for surfaces and phase functions for volumes (see Section 2.2.3). Whenever no scattering occurs, ray radiance is attenuated according to the absorption coefficient. If a surface or a volume has a non-zero emission coefficient, emission may occur. Emission is defined by materials, similarly to scattering. The surface descriptors have two additional outputs – height, used for displacement mapping, and gradient, which can be used for bump mapping if the surface primitive does not support displacement. Apart from descriptors, all non-nested functions and constants are visible from outside of the script. The alias construction can be used to explicitly expose elements outside of the script under different names.

The materials are opaque handles describing the interaction of light with scene primitives, describing a surface as matte, glossy, and so on. Currently, materials cannot be defined from scratch in scripts, and appropriate standard library functions which return them have to be used. The spectra are opaque handles for the internal full spectral representation of colors. There are a lot of standard library functions, as well as operators, for operating on spectra.

At the lowest level, the script is made from function definitions and function calls. The functions have only formal input parameters and have to return exactly one value. A function definition is given by any number of assignments to temporary variables (which are assigned a value once; the value cannot be redefined), followed by an expression defining the returned value. Therefore, iterative constructions cannot be used, and are replaced by recursion. Expressions are created from operators, function calls, constants and environment variables. These variables, accessible with the $ident construction, provide access to the ray-primitive intersection data, like input and output directions, surface normal, intersection point and so on.

The language offers an abundance of features for making programs more concise and programming easier. To name a few: generic type support is similar to templates known from C++, function types and anonymous functions provide functionality similar to lambda expressions from the new C++200x standard, and library functions for vector and matrix operations and complex numbers enable concise representation of many mathematical concepts.

6.2.3 Execution Model and Virtual Machine API

The texturing programs, which are part of the rendered scene description, are compiled at runtime, during preprocessing just before rendering. The compiler outputs code targeted at a specialized stack based virtual machine. Although this approach provides inferior performance compared to CPU native code, we have chosen it due to hardware independence. That is, if in future the software is to be moved to another platform, it is enough to just recompile its code, which would not be the case if native CPU instructions were explicitly generated at runtime. During rendering, texturing program functions may be executed whenever a ray intersects scene geometry, in order to evaluate material properties at the intersection point. In the rest of this section two sets of functions are described – the language standard library, which is intended for use by texturing programs, and the virtual machine API, targeted at its integration with ray tracing software.


program    → (import | function | typeid | const | surface | volume | alias)*
import     → import string ;*
function   → type ident? ( formals? ) nested definition ;*
typeid     → ident typeid type ;*
type       → basic | enum | structure | array | functype | generic | ident
basic      → logic | integer | real | complex | spectrum | material
enum       → enum ident? ident (, ident)*
structure  → struct ident? (type ident (, ident)* ;)+
array      → type [ ] | type [ n ] | type [ n , m ]
functype   → function formals -> type
generic    → generic ( ident )?
formals    → formal (, formal)*
formal     → type ident (, ident)*
nested     → (function | typeid | const)*
const      → const ident = expression ;*
definition → (ident = expression ;)* return expression ;
expression → expression op expression | prefix expression | expression suffix | term
term       → ( expression ) | call | selection | constant | $ ident | # ident
call       → ident (( arguments ))?
arguments  → (expression (, expression)*)?
selection  → select (expression , if expression ;)* expression otherwise ; end
constant   → true | false | integer | real | real i | string
surface    → surface (ident | string) (ident = expression ;)* ;*
volume     → volume (ident | string) (ident = expression ;)* ;*
alias      → (ident | string) alias ident ;*

Table 6.2: Simplified texturing language grammar. In the original typesetting, different fonts distinguish non-terminal symbols, terminal symbols written literally, and complex terminal symbols like identifiers or real numbers. Symbols: | means selection, ? means optional, * means zero or more occurrences, and + means one or more occurrences.

Standard Library

The standard library provides a rich set of functions to assist program development. These functions provide, among others, materials, images from external files, noise generation, physical equations, and mathematical functions like sin or exp.

Basic materials available in the library are, among others: matte reflective and matte translucent materials, glossy reflective and refractive materials based on microfacet and Phong models, and ideal specular reflective and refractive materials. The materials are parametrized by inputs like color or glossiness (if applicable). The materials can be combined to form more complex ones, using complex materials, available in the standard library as well. A complex material takes two materials and a combination factor as input.

The standard library provides exhaustive support for access to images from external files. These images can be stored in memory either directly or in compressed form (saving storage at the price of slower access). The images can be read with just a reconstruction filter, or with a low pass, blurring filter. Additionally, gradients of image functions can be evaluated. The images can be either monochromatic or in color. In the latter case, the result is implicitly converted to the full spectral representation. Additionally, the library provides optional conversion from the sRGB color space before the transform to a full spectrum.

Because of their usefulness, the library offers a variety of noise primitives [Perlin, 1985, Ebert et al., 2002]. These functions produce real valued n-dimensional noise and gradients of noise. The noise is guaranteed to be restricted to the [−1, 1] range, to have a smooth first derivative, and to have limited frequency content. Sample results generated using the noise primitive are presented in Figure 6.6.


Figure 6.6: Images generated using the noise primitive. From left: simple noise, sum of six noise octaves, noise as an argument of the sine function, noise as an argument of the exponential function. Each image is generated by varying some of its input as a function of image location.

Moreover, the library defines a few functions based on physical equations – namely blackbody radiation as a function of temperature, and the Fresnel equations for dielectrics and metals as a function of the indices of refraction and the angles between the incoming and outgoing rays and the surface normal. These functions can be used to improve the realism of generated images.

Virtual Machine API

In order to use texturing programs, the rendering software must compile the appropriate scripts. The compilation is divided into two parts. First, the program source is compiled into an internal graph representation. At this stage, just syntax checking is performed. An arbitrary number of scripts can be compiled independently. Second, linking of selected compiled scripts is performed to generate programs. During linking, semantic checks are performed, name dependences are resolved, and various code optimizations take place. After being successfully linked, a program is guaranteed to be error free. Finally, individual surface and volume descriptions can be generated from the program. An example of generating a surface description from scripts is presented in Algorithm 6.2.

Algorithm 6.2: Sample usage of procedural texturing.

    MMsGraph* g1 = mmsMkGraph();
    MMsGraph* g2 = mmsMkGraph();
    if (!mmsParseString(*g1, script)) throw Error(mmsGetGraphLog(*g1));
    if (!mmsParseFile(*g2, "marble.txt")) throw Error(mmsGetGraphLog(*g2));
    MMsProgram* p = mmsMkProgram();
    mmsAddGraph(*p, *g1);
    mmsAddGraph(*p, *g2);
    if (!mmsLinkProgram(*p, mms::O2)) throw Error(mmsGetProgramLog(*p));
    Surface* s = mmsGetSurface(*p, "mysurf");
    if (!s) throw Error("No surface description \"mysurf\" exists.");
    // ... constructed surface can be used here
    mmsDeleteProgram(p);
    mmsDeleteGraph(g2);
    mmsDeleteGraph(g1);

6.2.4 Results and Conclusion

The presented language is a very convenient tool for procedural texturing. It employs a lot of functionality dedicated to making the texturing task easier. The language is designed and optimized for cooperation with ray tracing based global illumination algorithms.


Figure 6.7: Procedurally defined materials.

Figure 6.8: Mandelbrot and Julia Fractals.

One thing to consider is whether it is better to have the programs executed by a virtual machine, or to provide a compiler for, say, x86/x64 SSEx native code. Compilation to native code would ensure significantly better performance; however, the large effort needed to write a good compiler could be wasted if in future the rendering software were moved to a different platform. Obviously, all material scripts would remain unchanged, regardless of whether the language is compiled to native code or not.

Figure 6.7 presents some rendering results of spheres with surfaces described by the presented language. All these scripts are simple, with just a few calls to standard library functions and mathematical operations. On the other hand, the Mandelbrot and Julia fractals presented in Figure 6.8 use some more advanced features of the language – support for complex numbers and recursion.

6.3 New Glossy Reflection Models

Modeling reflection properties of surfaces is very important for rendering. Traditionally, in global illumination the fraction of light which is reflected from a surface is described by the BRDF (Bidirectional Reflection Distribution Function) abstraction. This function is defined over all scene surface points, as well as two light directions – incident and outgoing. As the name suggests, to conform to the laws of physics, all BRDFs must be symmetric, i.e. swapping the incident and outgoing directions must not change the BRDF value. Moreover, the function must be energy preserving – it cannot reflect more light than it receives.


To achieve the best results of rendering with global illumination, energy preservation of the BRDF should satisfy stricter requirements. It is desirable that the basic BRDF model reflects exactly all the light that arrives at a surface. The actual value of reflection is then modeled by a texture. If the BRDF is unable to reflect all incident light, even a white texture appears to absorb some part of it. In local illumination algorithms this can be corrected somewhat by making the reflection value more than unity, but in global illumination such a trick can have fatal consequences due to multiple light scattering. Our model is strictly energy preserving, while it still maintains other desirable properties.

The subsequent section gives a brief description of former research related to the BRDF concept. Next, the requirements which should be satisfied by a plausible reflection model are presented. Then the derivation of our reflection function is explained, followed by a comparison of our results with previous ones. Finally, we present a summary which describes what was achieved during our research and what is left for future development. The research results explained in this section are also presented in [Radziszewski & Alda, 2008a].

6.3.1 Related Work

The first well known attempt to create glossy reflection is the Phong model [Phong, 1975]. This model is, however, neither symmetric nor energy conserving. An improved version of it was created by Neumann et al. [Neumann et al., 1999a, Neumann et al., 1999c]. Lafortune et al. [Lafortune et al., 1997] used a combination of generalized Phong reflection functions to adjust the scattering model to measured data.

There are popular reflection models based on microfacets. Blinn [Blinn, 1977] and Cook and Torrance [Cook & Torrance, 1982] assumed that scattering from each individual microfacet is specular, while Oren and Nayar [Oren & Nayar, 1994] used diffuse reflection instead.

A lot of work was dedicated to anisotropic scattering models. The first well known approach is Kajiya's [Kajiya, 1985], which uses a physical model of surface reflection. Ward [Ward, 1992] presented a new technique for modeling anisotropic reflection, together with a method to measure real-world material reflectances. Walter's technical report [Walter, 2005] describes how to efficiently implement Ward's model in a Monte Carlo renderer. Ashikhmin and Shirley [Ashikhmin & Shirley, 2000] showed how to modify the Phong reflection model to support anisotropy.

Some approaches are based on physical laws. He et al. [He et al., 1991] developed a model that supports many different types of surface reflection well. Stam [Stam, 1999] used wave optics to accurately model diffraction of light. Westin et al. [Westin et al., 1992] used a different approach to obtain this goal. They employed Monte Carlo simulation of light scattering from surface microgeometry to obtain coefficients to be fitted into their BRDF representation.

On the other hand, Schlick's model [Schlick, 1993] is purely phenomenological. It accounts for diffuse and glossy reflection, in isotropic and anisotropic versions, through a small set of intuitive parameters. Pellacini et al. [Pellacini et al., 2000] used a physically based model of reflection and modified its parameters in a way which makes them perceptually meaningful.

A novel approach of Edwards et al. [Edwards et al., 2006] is designed to preserve all energy while scattering, however at the cost of a non-symmetric scattering function. A different approach was taken by Neumann et al. [Neumann et al., 1999b]. They modified the Phong model to increase its reflectivity at grazing angles as much as possible while still satisfying energy conservation and symmetry.

Some general knowledge on light reflection models can be found in Lawrence's thesis [Lawrence, 2006]. More information on this topic is in a Siggraph Course [Ashikhmin et al., 2001] and in Westin's et al. technical report [Westin et al., 2004b]. Westin et al. [Westin et al., 2004a] also provided a detailed comparison of different BRDF models. Stark et al. [Stark et al., 2005] showed that many BRDFs can be expressed in a more convenient, less than 4D space (two directional vectors). Shirley et al. [Shirley et al., 1997] described some general issues which are encountered when reflection models are created.


6.3.2 Properties of Reflection Functions

In order to create visually plausible images, all reflection functions should satisfy some well defined basic requirements.

Energy conservation. In global illumination it is not enough to ensure that no surface scatters more light than it receives. It is desirable to have a function which scatters exactly all light. We are aware of only Edwards's et al. work [Edwards et al., 2006] which satisfies this requirement, but at the high price of lack of symmetry. Neumann et al. [Neumann et al., 1999b] improved the reflectivity of the Phong model, but its energy preservation still is not ideal.

Symmetry. The symmetry of a BRDF (see Equation 2.23) is very important when bidirectional methods (which trace rays from the viewer and from the lights as well) are used. When a BRDF is not symmetric, appropriate corrections similar to those described in [Veach, 1996] must be made in order to get proper rendering results. Since these corrections are not part of the BRDF model itself, BRDF sampling may turn out to be extremely inefficient. Obviously, the best option is to minimize usage of non-symmetric BRDFs in Monte Carlo renderings. This is reasonable, since the majority of currently used basic BRDF models are symmetric.

Everywhere positive. If a reflection function happens to be equal to zero on part of its domain, the respective surface may potentially render to black, no matter how strong the illumination is. Having a 'blackbody' in the scene is a severe artifact, which is typically mitigated by a complex BRDF with an additional additive diffuse component. However, this option produces a dull matte color and is not visually plausible. All reflection functions based on the Phong model contain a factor equal to $\max(\cos\theta, 0)$, where $\theta$ is the angle between the viewing direction and the ideal reflection direction. If illumination is not perpendicular, all these BRDFs are prone to exhibit black patches.

Everywhere smooth. The human eye happens to be particularly sensitive to discontinuities of the first derivative of illumination, especially on smooth, curved surfaces. This artifact occurs in any BRDF which uses functions such as min or max. Particularly, many microfacet based models use a so-called geometric attenuation factor with a min function, and look unpleasant at low glossiness values.

Limit #1 – diffuse. It is very helpful in modeling if a glossy BRDF can be made 'just a bit' more glossy than a matte surface. That is, a good quality reflection model should be arbitrarily close to matte reflection when glossiness is near zero. Surprisingly, few BRDF models satisfy this useful and easy to achieve property.

Limit #2 – specular. Similarly, it is convenient if a glossy BRDF becomes close to ideal specular reflection when glossiness approaches infinity. Unfortunately, this property is in part much more difficult to achieve than Limit #1. First, all glossy BRDFs are able to scatter light near the ideal reflection direction, which is correct. Second, energy preservation typically is not satisfied. While at perpendicular illumination the majority of popular BRDFs are fine, whenever grazing angles are encountered these reflection functions tend to absorb more and more light.

Ease of sampling. Having a probability distribution proportional (or almost proportional) to the BRDF value, which can be integrated and then inverted analytically, allows efficient BRDF sampling in Monte Carlo rendering. This feature is roughly satisfied by the majority of popular BRDF models.

Numerical stability. Numerical stability is an extremely important property of any computational algorithm, yet it is rarely mentioned in BRDF related works. Particularly, any reflection function whose normalization degenerates over any part of its domain is a potential source of significant inaccuracy. An example of such a case are microfacet models based on so-called halfangle vectors. The halfangle vector is a normalized componentwise sum of the viewing and illumination directions. In some cases, when these two directions are opposite, the halfangle vector is calculated as $\omega_h = [0, 0, 0]/\|[0, 0, 0]\|$, causing a serious error.

Our work is an attempt to create a BRDF which satisfies all these conditions together.


6.3.3 Derivation of Reflection Function

In this section a detailed derivation of the new reflection model is presented. Since this model is purely phenomenological, all mathematical functions chosen for it are selected just because of the desirable properties they have. This particular choice has no physical basis and, of course, is not unique. Through the rest of this section the notation presented in Table 6.3 is used.

    Symbol        Meaning
    $f_r$         Reflection function (BRDF)
    $R$           Reflectivity of BRDF
    $\omega_i$    Direction of incident light
    $\omega_o$    Direction of outgoing light
    $\omega_r$    Ideal reflection direction of outgoing light
    $N$           Surface normal
    $u, v$        Arbitrary orthogonal tangent directions
    $\theta_i$    Angle between $\omega_i$ and $N$
    $\theta_r$    Angle between $\omega_r$ and $N$
    $\phi_i$      Angle between $\omega_i$ and $u$
    $\phi_r$      Angle between $\omega_r$ and $u$
    $\Omega$      Hemisphere above surface, BRDF domain

Table 6.3: Notation used in BRDF derivation.

By convention, all direction vectors are in $\Omega$, i.e. the cosine of the angle between any of them and $N$ is non-negative. Moreover, these vectors are of unit length.

Energy Conservation

Energy conservation requires that the reflectivity of the BRDF must not be greater than one (see Equation 2.25); it is desirable for it to be exactly one. The reflectivity can be expressed in a different domain. The following expression is used through the rest of this section:

$$R(\theta_o, \phi_o) = \int_0^{2\pi} \int_0^{\pi/2} f_r(\theta_i, \phi_i, \theta_o, \phi_o) \cos\theta_i \sin\theta_i \, d\theta_i \, d\phi_i. \qquad (6.2)$$

It is very useful if the reflection function $f_r$ can be separated into a product:

$$f_r(\theta_i, \phi_i, \theta_o, \phi_o) = f_\theta(\theta_i, \theta_o) \, f_\phi(\theta_i, \phi_i, \theta_o, \phi_o), \qquad (6.3)$$

where $f_\theta$ is the latitudal reflection and $f_\phi$ is the longitudal reflection. If $f_\phi$ integrates to unit regardless of $\theta_i$ and $\theta_o$, this separation significantly simplifies reflectivity evaluation, which now can be re-expressed as:

$$R(\theta_o, \phi_o) = \int_0^{\pi/2} \left[ \int_0^{2\pi} f_\phi(\theta_i, \phi_i, \theta_o, \phi_o) \, d\phi_i \right] f_\theta(\theta_i, \theta_o) \cos\theta_i \sin\theta_i \, d\theta_i, \qquad (6.4)$$

and energy conservation as:

$$\int_0^{2\pi} f_\phi(\theta_i, \phi_i, \theta_o, \phi_o) \, d\phi_i \le 1 \quad \text{and} \quad \int_0^{\pi/2} f_\theta(\theta_i, \theta_o) \cos\theta_i \sin\theta_i \, d\theta_i \le 1. \qquad (6.5)$$

Due to this feature, latitudal and longitudal reflection functions can be treated separately.


Latitudal Reflection Function

The domain of the latitudal function is very inconvenient due to the sine and cosine factors in the integrand:

$$R_\theta(\theta_o) = \int_0^{\pi/2} f_\theta(\theta_i, \theta_o) \cos\theta_i \sin\theta_i \, d\theta_i. \qquad (6.6)$$

However, substituting $x = \cos^2\theta_i$, $y = \cos^2\theta_o$ and $dx = -2\sin\theta_i \cos\theta_i \, d\theta_i$ leads to a much simpler expression for the reflectivity:

$$R_y(y) = 0.5 \int_0^1 f_\theta(x, y) \, dx. \qquad (6.7)$$

Despite being much simpler, this space is still not well suited for developing a reflection function, mainly because of the necessity of symbolic integration. Using a final transformation one obtains:

$$F_\theta(x, y) = \int_0^y \!\! \int_0^x f_\theta(s, t) \, ds \, dt \quad \text{and} \quad f_\theta(x, y) = \frac{\partial^2 F_\theta(x, y)}{\partial x \, \partial y}. \qquad (6.8)$$

Designing a function $F_\theta$ is much easier than $f_\theta$. The requirements that $F_\theta$ must satisfy are the following:

$$\forall_{x,y} \; F_\theta(x, y) = F_\theta(y, x) \qquad (6.9)$$
$$\forall_x \; F_\theta(x, 1) = x \qquad (6.10)$$
$$\forall_{x_1 \le x_2} \; F_\theta(x_1, y) \le F_\theta(x_2, y) \qquad (6.11)$$

The requirement (6.10) can be relaxed a bit. If it is not satisfied, it is enough if $F_\theta(1, 1) = 1$ and $F_\theta(0, 1) = 0$ are satisfied instead. In the latter case, applying:

$$x' = F^{-1}(x, 1) \quad \text{and} \quad y' = F^{-1}(1, y) \qquad (6.12)$$

guarantees that $F_\theta(x', y')$ satisfies the original requirements (6.9-6.11).

A matte BRDF in this space is expressed as $F_\theta = xy$. We have found that the following (unnormalized) function is a plausible initial choice for latitudal glossy reflection:

$$f_\theta(x, y) = \mathrm{sech}^2\big(n(x - y)\big). \qquad (6.13)$$

Transforming this equation into the $F_\theta$ space leads to:

$$F_\theta(x, y) = \frac{\ln\cosh(nx) + \ln\cosh(ny) - \ln\cosh\big(n(x - y)\big)}{2 \ln\cosh n}. \qquad (6.14)$$

This function satisfies only the relaxed requirements, so it is necessary to substitute:

$$x' = \frac{1}{n} \,\mathrm{artanh}\!\left(\frac{1 - e^{-2\ln(\cosh n)\,x}}{\tanh n}\right) \qquad (6.15)$$

for $x$, and an analogous expression for $y$. After the substitution and transformation back to the $f_\theta$ space one obtains:

$$f_\theta(x, y) = \frac{m \tanh^2 n \cdot e^{-m(x+y)}}{\big(\tanh^2 n - (1 - e^{-mx})(1 - e^{-my})\big)^2}, \qquad (6.16)$$

where $m = 2\ln\cosh n$. Finally, $x = \cos^2\theta_i$ and $y = \cos^2\theta_r$ should be substituted. Considering how complex the final expression is, it is clear why it is difficult to guess the form of a plausible reflection function directly, and how useful these auxiliary spaces are.
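For reference, Equation (6.16) transcribes directly into code in the auxiliary space; the function name below is ours:

    #include <cmath>

    // Latitudal lobe f_theta of Equation (6.16), evaluated in the space
    // x = cos^2(theta_i), y = cos^2(theta_r), with m = 2 ln cosh n.
    double latitudalLobe(double x, double y, double n) {
        double m = 2.0 * std::log(std::cosh(n));
        double t2 = std::tanh(n) * std::tanh(n);
        double d = t2 - (1.0 - std::exp(-m * x)) * (1.0 - std::exp(-m * y));
        return m * t2 * std::exp(-m * (x + y)) / (d * d);
    }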

Page 112: Design and Implementation of Parallel Light Transport Algorithms based on quasi-Monte Carlo

CHAPTER 6. RENDERING SOFTWARE DESIGN AND IMPLEMENTATION 101

Longitudal Reflection Function

The longitudal reflection function should be a function of $\cos(\phi_i - \phi_r)$. It has to integrate to unit over $[-\pi, \pi]$, so it is reasonable to choose a function that can be integrated analytically:

$$f_\phi(\phi_i, \phi_r) = C_n \frac{1}{\big[n(1 - \cos(\phi_i - \phi_r)) + 1\big]^6}, \quad \text{and} \quad C_n = \frac{(2n+1)^{5.5}}{2\pi P_5(n)}, \qquad (6.17)$$

where $P_5(n) = 7.875n^5 + 21.875n^4 + 25n^3 + 15n^2 + 5n + 1$. When $n = 0$, the function becomes constant. When $n$ increases, the function is largest when $\phi_i = \phi_r$. In the limit, when $n$ approaches infinity, the function converges to $\delta(\phi_i - \phi_r)$.

There is still one issue – whenever either $\omega_i$ or $\omega_r$ is almost parallel to $N$, $\phi_i$ or $\phi_r$ is poorly defined. In these cases the function should progressively become constant. The simple substitution $n = n' \sin\theta_i \sin\theta_r$ works fine.
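Equation (6.17) is equally direct to evaluate; again, the function names are ours:

    #include <cmath>

    // P_5(n) = 7.875 n^5 + 21.875 n^4 + 25 n^3 + 15 n^2 + 5 n + 1, in Horner form.
    static double P5(double n) {
        return ((((7.875 * n + 21.875) * n + 25.0) * n + 15.0) * n + 5.0) * n + 1.0;
    }

    // Normalized longitudal lobe f_phi of Equation (6.17).
    double longitudalLobe(double phiI, double phiR, double n) {
        const double PI = 3.14159265358979323846;
        double Cn = std::pow(2.0 * n + 1.0, 5.5) / (2.0 * PI * P5(n));
        return Cn / std::pow(n * (1.0 - std::cos(phiI - phiR)) + 1.0, 6.0);
    }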

Reflection Model

Combining the latitudal and longitudal scattering functions leads to the final BRDF:

$$f_r(\theta_i, \phi_i, \theta_o, \phi_o) = \frac{(2 n_\phi \sin\theta_i \sin\theta_r + 1)^{5.5}}{2\pi P_5(n_\phi) \big(n_\phi \sin\theta_i \sin\theta_r (1 - \cos(\phi_i - \phi_r)) + 1\big)^6} \cdot \frac{m_\theta \tanh^2 n_\theta \cdot e^{-m_\theta(\cos^2\theta_i + \cos^2\theta_r)}}{\big(\tanh^2 n_\theta - (1 - e^{-m_\theta \cos^2\theta_i})(1 - e^{-m_\theta \cos^2\theta_r})\big)^2}. \qquad (6.18)$$

The parameters $n_\theta$ and $n_\phi$ do not have to satisfy $n_\theta = n_\phi = n$. Using various functions of the form $n_\theta = f_1(n)$ and $n_\phi = f_2(n)$ leads to a variety of different single parameter glossy scattering models.

The reflection angles ($\theta_r$ and $\phi_r$) may be computed from the outgoing angles $\theta_o$ and $\phi_o$ in a few ways. For example, ideal reflection, ideal refraction or backward scattering can be used for this purpose, leading to a variety of useful BRDFs.

The reflection model is strictly energy preserving, so the cosine weighted BRDF forms probability density functions (pdfs) to sample $\theta_i$ and $\phi_i$ from. Obviously, both pdfs are integrable analytically, which is very helpful. Care must be taken, since the probability of selecting a given direction vector $\omega_i$ is defined over the projected solid angle around $N$, instead of the ordinary solid angle. However, a probability density defined over the projected solid angle is often more convenient for use in ray tracing algorithms than one over the ordinary solid angle.

6.3.4 Results and Conclusions

The following results are generated using a white sphere and a complex dragon model illuminated by a point light source. The proportions of latitudal and longitudal gloss are set to $n_\theta = n$ and $n_\phi = 0.75 n \sqrt{n} \sin\theta_i \sin\theta_r$.

Fig. 6.9 examines how selected well-known scattering models cope with low glossiness. Both Phong-based models exhibit zero reflectivity for certain directions, while the max-Phong and microfacet models have shading discontinuities due to the use of max or min functions. Neither of these models is fully energy conserving.

Fig. 6.10 examines the latitudal component of our reflection model. The scattering is increased at grazing angles to achieve full energy conservation. Similarly, Fig. 6.11 presents longitudal scattering only.

Fig. 6.12 and Fig. 6.13 present our BRDF model, defined as a product of latitudal and longitudal scattering. Fig. 6.12 shows how the BRDF behaves when glossiness is increased, while Fig. 6.13 changes the illumination angle using the same glossiness. The BRDF exhibits some anisotropy at non-perpendicular illumination, but this is not a problem with complex models.


Figure 6.9: Comparison of different glossy BRDFs with gloss 'just a bit' more than matte. From left: diffuse reference, reciprocal Phong, max-Phong, microfacet.

Figure 6.10: Latitudal scattering only. From left: glossiness 'just a bit' more than matte, medium glossiness, large glossiness, similarity between θi and θr.

Figure 6.11: Longitudal scattering only with varying glossiness.

Figure 6.12: Product of latitudal and longitudal scattering with increasing glossiness.

In Fig. 6.14 there is a dragon model rendered with our BRDF and two different glossiness values.

We have presented a novel approach to creating BRDFs, with which we have designed an energy preserving, symmetric reflection function. Energy conservation allows improved rendering results. For example, when a model rendered with our function is placed into an environment with uniform illumination, it vanishes.


Figure 6.13: Scattering with perpendicular (left) and grazing (right) illumination.

Figure 6.14: Complex dragon model rendered with glossiness n = 2 (left) and n = 4 (right).

On the other hand, the majority of other models lose some energy, especially at grazing angles. In this case, models have dark borders, which are impossible to control. Moreover, our BRDF behaves intuitively: when glossiness is decreased, it progressively becomes matte.

This function, however, still has some flaws. Most notably, it has difficulty controlling anisotropy at non-perpendicular illumination, visible on very smooth surfaces. Secondly, it becomes numerically unstable when glossiness is increased. Minimizing the impact of these disadvantages requires further research in this area.


Chapter 7

Results

This chapter summarizes the most important results of the research on which the thesis is based. First, selected aspects of image quality assessment and various numerical metrics are discussed. These metrics are used later in this chapter to compare images generated by the algorithms presented in the thesis with alternative commonly available methods. The comparison is provided for the novel full spectral rendering (see Section 4.3) and the combined light transport algorithm (described in Section 4.5). Finally, since the results of parallelization of light transport algorithms do not affect image quality, there is no need to provide an image comparison. The parallelization efficiency is given in Section 5.2.5.

7.1 Image Comparison

Before performing any reasonable assessment of rendering algorithm quality, and therefore of the quality of images produced by the algorithm, the method used to compare images and the reference image must be precisely defined. In the case of geometric optics simulation, the best possible reference image would be a photograph of a real-world scene. Unfortunately, to make the comparison of rendering output with photographs meaningful, the input data for rendering, e.g. scene geometry, surface reflectance, and light source radiance, have to be precisely measured. Without a laboratory well equipped with specialized measuring devices, this task is infeasible.

In the majority of rendering research a much simpler, yet still reasonable, approach is used. The reference image is produced with a well-known and provably correct reference algorithm (for example, Bidirectional Path Tracing) using an extremely long rendering time, to minimize potential errors in the reference image. Since BDPT is unbiased, when the image exhibits no visible variance one can be highly confident that it is correct. The tested algorithms are then run on the same input, and the resulting images are compared with the reference one. This method catches all rendering errors except those arising from the assumption that geometric optics applies.

The method used to measure the difference between the test image and the reference image is very important. There has been some research into measuring image difference, e.g. [Wilson et al., 1997]; these methods take into account various aspects of the human visual system. However, since no sophisticated metric is widely used and accepted for image comparison, we have employed the standard RMS (Root Mean Square) norm:

d(I_1, I_2) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( p_i^1 - p_i^2 \right)^2 },    (7.1)

where d(I_1, I_2) is the distance between images I_1 and I_2, N is the number of pixels in the images, and p_i^j is the value of the i-th pixel of the j-th image, normalized to the [0, 1] range.


The norm (7.1) is obviously imperfect, but it seems to be the one most often used. The norm is defined for grayscale images; for RGB images it can be evaluated for each channel separately and the results combined using sRGB-to-luminance weights, i.e. d_RGB = 0.21 d_R + 0.72 d_G + 0.07 d_B.
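
For illustration, a direct C++ transcription of the norm (7.1) and its RGB weighting might look as follows; the flat-array image layout with values already normalized to [0, 1] is an assumption made for brevity.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // RMS distance (Eq. 7.1) between two grayscale images stored as flat
    // arrays of pixel values normalized to [0, 1].
    double rmsDistance(const std::vector<double>& img1,
                       const std::vector<double>& img2) {
        double sum = 0.0;
        for (std::size_t i = 0; i < img1.size(); ++i) {
            double d = img1[i] - img2[i];
            sum += d * d;
        }
        return std::sqrt(sum / img1.size());
    }

    // RGB extension used in the text: per-channel RMS distances combined
    // with sRGB-to-luminance weights.
    double rmsDistanceRGB(const std::vector<double>& r1, const std::vector<double>& g1,
                          const std::vector<double>& b1, const std::vector<double>& r2,
                          const std::vector<double>& g2, const std::vector<double>& b2) {
        return 0.21 * rmsDistance(r1, r2)
             + 0.72 * rmsDistance(g1, g2)
             + 0.07 * rmsDistance(b1, b2);
    }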

7.2 Full Spectral Rendering

This section numerically compares the convergence of Bidirectional Path Tracing equipped with the standard and the Multiple Importance Sampling based full spectral rendering. The reference image of a test scene is presented in Figure 7.2. The numerical comparison using the norm (7.1) is performed on a rectangle containing the glass figure (see Figure 7.1), and its results are presented in Table 7.1.

Figure 7.1: Comparison of spectral rendering algorithms. Left: Multiple Importance Sampling based full spectrum rendering, 1 sample/pixel. Middle: single spectral sample per cluster rendering, 1 sample/pixel. Right: reference image, 256 samples/pixel.

                 Number of samples
    Algorithm    1M      2M      4M      8M      16M
    MIS          0.203   0.156   0.121   0.103   0.085
    basic        0.376   0.285   0.229   0.175   0.132

Table 7.1: Comparison of convergence of spectral sampling techniques used with Bidirectional Path Tracing. The table contains the differences between the reference image and images rendered by a given algorithm with a given number of samples.

7.3 Comparison of Rendering Algorithms

This section numerically compares the convergence of four light transport algorithms: Path Tracing, Bidirectional Path Tracing, Photon Mapping, and our new combined algorithm. The


reference image of a test scene is presented in Figure 7.3. The results of the comparison using the norm (7.1) are presented in Table 7.2.

                 Rendering time
    Algorithm    15 sec   1 min   4 min   16 min
    PT           13.9     13.2    12.4    11.1
    BDPT         9.97     9.64    8.95    7.94
    PM           9.93     9.94    8.86    8.28
    Combined     9.51     8.75    8.14    7.19

Table 7.2: Comparison of convergence of selected rendering algorithms. The table contains the differences between the reference image and images rendered by a given algorithm in a given time. The difference is evaluated using the RMS norm; the results are scaled by a factor f = 10^2.


Figure 7.2: Full spectral rendering of a scene with imperfect refraction on glass. The image is rendered with Bidirectional Path Tracing, using 0.5G samples, at 1920x1080 resolution, in 3h27min on a 2.93GHz Intel Core i7 CPU. In the bottom right corner of the image there is a miniature showing estimated relative variance of various parts of the main image.


Figure 7.3: Scene containing indirectly visible caustics, rendered with the combined light transport algorithm presented here, using 64M samples, up to 8M photons, at 1920x1080 resolution, in 1h50min on a 2.93GHz Intel Core i7 CPU. In the bottom right corner of the image there is a miniature showing estimated relative variance of various parts of the main image.


Chapter 8

Conclusion

This chapter starts with a summary of the most important original contributions, somewhat more detailed than in Chapter 1. Then possible future improvements of the developed techniques are presented, and finally some closing thoughts on rendering, and global illumination in particular, are given.

8.1 Contributions Summary

First, we have dropped the assumption that an image is a 2D array of pixels with a fixed resolution. Instead, the image is logically assumed to be a function defined over the [0, 1]^2 × λ space, where the unit square represents the film surface and λ is the wavelength. The rendering output is therefore not a pixel grid, but a stream of real-valued samples, indexed by position and wavelength. As a consequence, ray tracing algorithms cannot use information about image resolution, values of particular pixels, and so on. However, this design has proven its strength when real-time global illumination is the main goal: moving many algorithms out of the core renderer to a postprocessing stage is now feasible. Samples of a real-valued function give many more possibilities than an array of pixels. In our implementation, a stream of samples is sent to the GPU for further processing and final conversion to a pixel-based, displayable image. For example, the variance of the samples can be estimated and used, together with the number of evaluated samples, to blur the image appropriately, masking undersampling and noise artifacts.
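
A minimal sketch of what such a sample stream and a resolution-dependent reconstruction step could look like follows. The ImageSample layout and the CPU-side box-filter reconstruction are illustrative assumptions; in the implementation described above, this processing runs on the GPU and uses more careful filtering.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // One rendering output sample: a point on the image plane in [0,1]^2,
    // a wavelength, and the estimated value (hypothetical layout).
    struct ImageSample {
        float x, y;     // film position in [0,1]^2
        float lambda;   // wavelength in nanometers
        float value;    // radiance estimate carried by this sample
    };

    // Resolution enters only here, outside the core renderer: samples are
    // binned into a pixel grid at display time.
    std::vector<float> reconstruct(const std::vector<ImageSample>& stream,
                                   uint32_t width, uint32_t height) {
        std::vector<float> pixels(width * height, 0.0f);
        std::vector<uint32_t> counts(width * height, 0);
        for (const ImageSample& s : stream) {
            uint32_t px = std::min(uint32_t(s.x * width), width - 1);
            uint32_t py = std::min(uint32_t(s.y * height), height - 1);
            pixels[py * width + px] += s.value;  // box filter for brevity; a real
            counts[py * width + px] += 1;        // system also weights by the
        }                                        // wavelength response curves
        for (std::size_t i = 0; i < pixels.size(); ++i)
            if (counts[i] > 0) pixels[i] /= counts[i];
        return pixels;
    }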

Second, we have developed full spectral rendering with a significantly improved dispersion simulation. Despite having been proven incorrect, many global illumination algorithms are designed to use an RGB color model, due to its lower computational cost. Our implementation, by contrast, uses just eight spectral samples with randomized wavelengths for each image sample, and is not much slower than RGB-based renderers. If the spectral samples are chosen carefully, eight samples are enough to produce a color-noise-free image. The dispersion handling is based on Multiple Importance Sampling. This new technique is much more efficient at simulating non-idealized wavelength dependent phenomena, and fits elegantly into Monte Carlo and quasi-Monte Carlo ray tracing algorithms. To our knowledge, this is the first use of the MIS technique for full spectral simulations. Moreover, we have modified the Photon Mapping algorithm to work correctly with the novel spectral sampling technique.
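
As an illustration of the cluster idea, the following hedged sketch draws eight stratified wavelengths sharing one random offset per image sample. The wavelength range and the stratification scheme are assumptions for illustration; the exact cluster construction and the MIS weighting used in the thesis differ in detail.

    #include <array>
    #include <random>

    constexpr int kSpectralSamples = 8;
    constexpr double kLambdaMin = 380.0;   // assumed visible range [nm]
    constexpr double kLambdaMax = 730.0;

    // One cluster of eight wavelengths per image sample: stratified over the
    // visible range, randomized by a single shared jitter.
    std::array<double, kSpectralSamples> sampleWavelengths(std::mt19937& rng) {
        std::uniform_real_distribution<double> u01(0.0, 1.0);
        double offset = u01(rng);  // one jitter shared by the whole cluster
        std::array<double, kSpectralSamples> lambda;
        for (int i = 0; i < kSpectralSamples; ++i) {
            double t = (i + offset) / kSpectralSamples;  // stratified in [0,1)
            lambda[i] = kLambdaMin + t * (kLambdaMax - kLambdaMin);
        }
        return lambda;
    }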

Third, we have adapted Photon Mapping to be a one-pass rendering algorithm. The modified approach starts by storing just a few photons, and the photon map size is increased later. This allows rendering images with progressive quality improvement, a feature impossible to obtain in two-pass variants of the technique. The dynamic photon map appears to come at only a little performance penalty for typical rendered scenes. Additionally, we have provided a detailed potential-error prediction technique for Bidirectional Path Tracing. This feature is then used in the novel rendering algorithm, which combines BDPT with Photon Mapping. The new algorithm first tries to use the BDPT rendering branch, and if the predicted error for a given light


transport path risks being larger than a certain threshold, it skips the path. The paths problematic for BDPT are mainly caused by the local path sampling limitation. Since simply skipping problematic computations would make the final rendered image too dark, all skipped paths are later evaluated with the Photon Mapping component. The algorithm therefore produces images with little bias, since the majority of the image is calculated by an unbiased technique, and with little noise, because the noise-causing paths are evaluated from the photon map, which tends to trade noise for blurring.
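
The following C++ sketch shows the decision structure only; all names are hypothetical placeholders standing in for the thesis' actual predictor and estimators, and the stubs exist only so the sketch is self-contained.

    struct Path { /* opaque light transport path */ };

    // Placeholder stubs for the real components.
    double predictBdptError(const Path&) { return 0.0; }   // BDPT error predictor
    double evaluateBdpt(const Path&) { return 0.0; }       // unbiased BDPT estimate
    double evaluatePhotonMap(const Path&) { return 0.0; }  // biased, low-noise estimate

    double combinedEstimate(const Path& p, double errorThreshold) {
        // Unbiased branch for the majority of paths...
        if (predictBdptError(p) <= errorThreshold)
            return evaluateBdpt(p);
        // ...but paths whose predicted BDPT error exceeds the threshold would
        // be skipped and darken the image, so they are evaluated with the
        // photon mapping component instead.
        return evaluatePhotonMap(p);
    }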

Next, we have provided an extension to the stream processor model to support read-write memory. The basic streaming model states that operations on different elements of the input stream are entirely independent. This simplifies the implementation of streaming processors, yet it is inefficient for some algorithms. The extension guarantees coherency of all written data, but the order of different read and write operations is not preserved. The necessary synchronization is done with a carefully tuned variant of a solution to the readers-writers problem. Consequently, the correctness of an algorithm must not depend on the content of this cache memory, but the algorithm may use it to accelerate its operation. The extended stream machine is much more useful than the basic one; the cache mechanism is crucial for a parallel implementation of the one-pass version of photon mapping, and therefore also for the combined Bidirectional Path Tracing with Photon Mapping. Moreover, we have designed and implemented an interactive viewer of ray tracing results, based on the processing power of GPUs. The viewer works in parallel with the CPU-based renderer, which generates new samples into output streams while previously computed data is displayed. This concept exploits the strong points of the different architectures of CPUs and GPUs, allowing interactive preview of partially rendered results without slowing down further rendering.
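
As an illustration of the cache contract (coherent writes, unordered reads, and correctness that never depends on a hit), here is a minimal C++ sketch using the standard readers-writers primitive std::shared_mutex; the thesis' carefully tuned variant of the readers-writers solution is not reproduced here.

    #include <mutex>
    #include <shared_mutex>
    #include <unordered_map>

    // A read-write cache attached to a stream machine. Kernels may consult it
    // to skip work, but must remain correct when a lookup misses.
    template <typename Key, typename Value>
    class KernelCache {
    public:
        bool tryGet(const Key& k, Value& out) const {
            std::shared_lock<std::shared_mutex> lock(mutex_);  // many readers
            auto it = map_.find(k);
            if (it == map_.end()) return false;  // a miss is always legal
            out = it->second;
            return true;
        }
        void put(const Key& k, const Value& v) {
            std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive writer
            map_[k] = v;  // written data stays coherent; ordering is not promised
        }
    private:
        mutable std::shared_mutex mutex_;
        std::unordered_map<Key, Value> map_;
    };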

Finally, since complex input scenes are as important for good quality images as efficient light transport algorithms, we have provided a few improvements in this area. We have designed an interface between 3D objects, cameras, and rendering algorithms based entirely on sampling. This interface provides a clear abstraction layer, general enough to express the majority of ray tracing algorithms. It is designed to simplify the implementation of bidirectional light transport methods and provides mutation functionality for Metropolis sampling. Furthermore, the interface incorporates support for full spectral rendering and a carefully designed quasi-Monte Carlo sequence generation infrastructure. Then, we have provided a shading language optimized for ray tracing. The new concept is the use of a functional language for this purpose. The language is computationally complete and enables easy creation of complex material scripts; the script style resembles mathematical notation much more than classic imperative programming languages do. Moreover, we have developed a new reflection model which is both symmetric and energy preserving. This reflection model makes the task of describing glossy surfaces easier. Lastly, we have developed a specialized technique to store 2D surfaces and 3D participating media in the same ray intersection acceleration structure. If a rendered scene contains roughly similar numbers of different 2D and 3D entities, the improved algorithm is nearly twice as fast as an algorithm using two separate structures. The ray traversal is performed in two steps. In the first step, the acceleration structure is searched for the first non-transparent intersection point, while all intersections with transparent elements are stored. In the second step, when the nearest intersection with a non-transparent element is known, the ray power is reduced according to the attenuation of every intersected transparent element.
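
A compact sketch of this two-step traversal logic follows; the Hit record and the traverse() stub are illustrative assumptions standing in for the actual acceleration structure interface.

    #include <limits>
    #include <vector>

    // One intersection reported by the shared acceleration structure.
    struct Hit {
        double t;            // ray parameter of the intersection
        bool transparent;    // transparent (medium) element vs. opaque surface
        double attenuation;  // multiplicative power loss of a transparent element
    };

    // Step 1 is assumed done by the structure: traverse() reports every element
    // the ray cuts, in arbitrary order. (Stub here for illustration only.)
    std::vector<Hit> traverse(/* const Ray& ray */) { return {}; }

    // Step 2: once the nearest opaque hit is known, reduce the carried power
    // by the attenuation of each transparent element in front of it.
    double shadeRay(double initialPower) {
        std::vector<Hit> hits = traverse();
        double tOpaque = std::numeric_limits<double>::infinity();
        for (const Hit& h : hits)            // nearest non-transparent point
            if (!h.transparent && h.t < tOpaque) tOpaque = h.t;
        double power = initialPower;
        for (const Hit& h : hits)            // attenuate along the way to it
            if (h.transparent && h.t < tOpaque) power *= h.attenuation;
        return power;
    }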

8.2 Final Thoughts and Future Work

One of the basic assumptions in the design of the presented rendering algorithms was correctness under the assumption that geometric optics applies. Convergence of the output image to the ideal one as rendering time increases is therefore the crucial requirement. The research described in this thesis is a step toward improving the reliability of algorithms satisfying this requirement. Additionally, an interactive previewer of partially rendered images was provided. However, due to the correctness requirement, the implemented algorithms cannot match the performance of alternative approaches designed to produce the best possible images in a short time, without any concern for progressive convergence to the ideal image.


After a few years spent on research in this domain, it has become clear to the author that an amazing amount of research into rendering has been done, yet real-time rendering is still based on scan-line rasterization and cannot currently target global illumination and ray tracing, due to performance limitations. However, hardware, and especially graphics hardware, develops at an incredible pace, changing programming models dramatically in just a few years. Two notable changes took place during the development of the algorithms for the presented research, namely the introduction of partially programmable GPUs and the switch to multicore CPUs. Today it is not clear which model, CPU or GPU, is likely to be better for ray tracing in the future, but it seems certain that within a few years one of them will have enough computational power to run true global illumination in real time. The contributions described in this thesis can bring this day nearer by making global illumination algorithms more efficient and reliable.


Bibliography

[Arikan et al., 2005] Arikan, O., Forsyth, D. A., & O'Brien, J. F. (2005). Fast and Detailed Approximate Global Illumination by Irradiance Decomposition. In SIGGRAPH 2005 Proceedings (pp. 1108–1114). New York, NY, USA: ACM.

[Arvo, 1993] Arvo, J. (1993). Transfer Equations in Global Illumination. SIGGRAPH 1993 Course Notes.

[Arvo & Kirk, 1990] Arvo, J. & Kirk, D. (1990). Particle Transport and Image Synthesis. In SIGGRAPH 1990 Proceedings, volume 24 (pp. 63–66). New York, NY, USA: ACM.

[Ashikhmin & Shirley, 2000] Ashikhmin, M. & Shirley, P. (2000). An Anisotropic Phong BRDF Model. Journal of Graphics Tools, 5(2), 25–32, A. K. Peters, Ltd., Natick, MA, USA.

[Ashikhmin et al., 2001] Ashikhmin, M., Shirley, P., Marschner, S., & Stam, J. (2001). State of the Art in Modeling and Measuring of Surface Reflection. SIGGRAPH 2001 Course #10.

[Beers et al., 1996] Beers, A. C., Agrawala, M., & Chaddha, N. (1996). Rendering from Compressed Textures. In SIGGRAPH 1996 Proceedings, volume 30 (pp. 373–378).

[Ben-Ari, 2006] Ben-Ari, M. (2006). Principles of Concurrent and Distributed Programming. Addison Wesley, second edition.

[Benthin, 2006] Benthin, C. (2006). Realtime Ray Tracing on Current CPU Architectures. PhD thesis, Saarland University, Saarbrücken, Germany.

[Benthin et al., 2003] Benthin, C., Wald, I., & Slusallek, P. (2003). A Scalable Approach to Interactive Global Illumination. Computer Graphics Forum (Proceedings of Eurographics 2003), 22(3).

[Bishop et al., 1994] Bishop, G., Fuchs, H., McMillan, L., & Scher Zagier, E. (1994). Frameless Rendering: Double Buffering Considered Harmful. In SIGGRAPH 1994 Proceedings, volume 28 (pp. 175–176). New York, NY, USA: ACM.

[Blinn, 1977] Blinn, J. F. (1977). Models of Light Reflection for Computer Synthesized Pictures. In SIGGRAPH 1977 Proceedings (pp. 192–198). New York, NY, USA: ACM.

[Cline et al., 2005] Cline, D., Talbot, J., & Egbert, P. (2005). Energy Redistribution Path Tracing. In SIGGRAPH 2005 Proceedings (pp. 1186–1195). New York, NY, USA: ACM.

[Cohen & Wallace, 1993] Cohen, M. F. & Wallace, J. R. (1993). Radiosity and Realistic Image Synthesis. Academic Press Professional.

[Cook, 1984] Cook, R. L. (1984). Shade Trees. In SIGGRAPH 1984 Proceedings, volume 18 (pp. 223–231). New York, NY, USA: ACM.

[Cook et al., 1984] Cook, R. L., Porter, T., & Carpenter, L. (1984). Distributed Ray Tracing. In SIGGRAPH 1984 Proceedings, volume 18 (pp. 137–145). New York, NY, USA: ACM.


[Cook & Torrance, 1982] Cook, R. L. & Torrance, K. E. (1982). A Reflectance Model for Computer Graphics. ACM Transactions on Graphics, 1(1), 7–24, ACM, New York, NY, USA.

[Cortes & Raghavachary, 2008] Cortes, R. & Raghavachary, S. (2008). The RenderMan Shading Language Guide. Thomson Course Technology.

[Dachsbacher et al., 2007] Dachsbacher, C., Stamminger, M., Drettakis, G., & Durand, F. (2007). Implicit Visibility and Antiradiance for Interactive Global Illumination. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007), 26(3).

[Dayal et al., 2005] Dayal, A., Woolley, C., Watson, B., & Luebke, D. (2005). Adaptive Frameless Rendering. In Rendering Techniques 2005 (pp. 265–275).

[Debattista et al., 2006] Debattista, K., Santos, L. P., & Chalmers, A. (2006). Accelerating the Irradiance Cache through Parallel Component-Based Rendering. In EGPGV2006 – 6th Eurographics Symposium on Parallel Graphics Visualization (pp. 27–34). Eurographics.

[Devlin et al., 2002] Devlin, K., Chalmers, A., Wilkie, A., & Purgathofer, W. (2002). Tone Reproduction and Physically Based Spectral Rendering. In State of the Art Reports, Eurographics 2002 (pp. 101–123).

[Dietrich et al., 2005] Dietrich, A., Wald, I., & Slusallek, P. (2005). Large-Scale CAD Model Visualization on a Scalable Shared-Memory Architecture. In G. Greiner, J. Hornegger, H. Niemann, & M. Stamminger (Eds.), Proceedings of the 10th International Fall Workshop on Vision, Modeling, and Visualization (VMV) 2005 (pp. 303–310). Erlangen, Germany: Akademische Verlagsgesellschaft Aka.

[Dmitriev et al., 2002] Dmitriev, K., Brabec, S., Myszkowski, K., & Seidel, H.-P. (2002). Interactive Global Illumination Using Selective Photon Tracing. In Proceedings of the 13th Eurographics Workshop on Rendering.

[Dong, 2006] Dong, W. (2006). Rendering Optical Effects Based on Spectra Representation in Complex Scenes. In Computer Graphics International (pp. 719–726).

[Durikovic & Kimura, 2006] Durikovic, R. & Kimura, R. (2006). GPU Rendering of the Thin Film on Paints with Full Spectrum. In Proceedings of the IEEE Conference on Information Visualization (pp. 751–756).

[Ebert et al., 2002] Ebert, D. S., Musgrave, F. K., Peachey, D., Perlin, K., & Worley, S. (2002). Texturing and Modeling: A Procedural Approach. Morgan Kaufmann, third edition.

[Edwards et al., 2006] Edwards, D., Boulos, S., Johnson, J., Shirley, P., Ashikhmin, M., Stark, M., & Wyman, C. (2006). The Halfway Vector Disk for BRDF Modeling. ACM Transactions on Graphics, 25(1), 1–18, ACM, New York, NY, USA.

[Elliott, 2004] Elliott, C. (2004). Programming Graphics Processors Functionally. In Haskell '04: Proceedings of the 2004 ACM SIGPLAN Workshop on Haskell (pp. 45–56). New York, NY, USA: ACM.

[Evans & McCool, 1999] Evans, G. F. & McCool, M. D. (1999). Stratified Wavelength Clusters for Efficient Spectral Monte Carlo Rendering. In Graphics Interface (pp. 42–49).

[Fan et al., 2005] Fan, S., Chenney, S., & Lai, Y.-c. (2005). Metropolis Photon Sampling with Optional User Guidance. In Rendering Techniques (pp. 127–138).

[Faure, 1982] Faure, H. (1982). Discrépance de suites associées à un système de numération (en dimension s). Acta Arithmetica, 41, 337–351.

[Fernando & Kilgard, 2003] Fernando, R. & Kilgard, M. J. (2003). The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.


[Fishman, 1999] Fishman, G. S. (1999). Monte Carlo: Concepts, Algorithms and Applications. Springer-Verlag, New York, USA.

[Fradin et al., 2005] Fradin, D., Meneveaux, D., & Horna, S. (2005). Out-of-Core Photon Mapping for Large Buildings. In Proceedings of the Eurographics Symposium on Rendering EGSR 2005, Konstanz, Germany.

[Fraser et al., 2005] Fraser, B., Murphy, C., & Bunting, F. (2005). Real World Color Management. Peachpit Press, Berkeley, CA, USA, second edition.

[Geimer & Muller, 2003] Geimer, M. & Müller, S. (2003). A Cross-Platform Framework for Interactive Ray Tracing. Tagungsband Graphiktag der Gesellschaft für Informatik (pp. 25–34). Frankfurt/Main, Germany.

[Gentle, 2003] Gentle, J. E. (2003). Random Number Generation and Monte Carlo Methods. Springer-Verlag.

[Ghuloum, 2007] Ghuloum, A. (2007). The Problem(s) with GPGPU. http://blogs.intel.com/research/2007/10/the_problem_with_gpgpu.php.

[Ghuloum et al., 2007] Ghuloum, A., Sprangle, E., Fang, J., Wu, G., & Zhou, X. (2007). Ct: A Flexible Parallel Programming Model for Tera-scale Architectures. Technical report, Intel Corporation.

[Gondek et al., 1994] Gondek, J. S., Meyer, G. W., & Newman, J. G. (1994). Wavelength Dependent Reflectance Functions. In SIGGRAPH 1994 Proceedings, volume 28 (pp. 213–220). New York, NY, USA: ACM.

[Goral et al., 1984] Goral, C. M., Torrance, K. E., Greenberg, D. P., & Battaile, B. (1984). Modeling the Interaction of Light Between Diffuse Surfaces. In SIGGRAPH 1984 Proceedings, volume 18 (pp. 213–222). New York, NY, USA: ACM.

[Greenberg et al., 1997] Greenberg, D. P., Torrance, K. E., Shirley, P., Arvo, J., Lafortune, E., Ferwerda, J. A., Walter, B., Trumbore, B., Pattanaik, S., & Foo, S.-C. (1997). A Framework for Realistic Image Synthesis. In SIGGRAPH 1997 Proceedings, volume 31 (pp. 477–494). New York, NY, USA: ACM Press/Addison-Wesley Publishing Co.

[Gummaraju et al., 2007] Gummaraju, J., Erez, M., Coburn, J., Rosenblum, M., & Dally, W. J. (2007). Architectural Support for the Stream Execution Model on General-Purpose Processors. In 16th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[Gummaraju & Rosenblum, 2005] Gummaraju, J. & Rosenblum, M. (2005). Stream Programming on General-Purpose Processors. In MICRO 38: Proceedings of the 38th Annual ACM/IEEE International Symposium on Microarchitecture, Barcelona, Spain.

[Halton, 1960] Halton, J. H. (1960). On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrands. Numerische Mathematik, 2, 84–90.

[Hanson & Weiskopf, 2001] Hanson, A. J. & Weiskopf, D. (2001). Visualizing Relativity. SIGGRAPH 2001 Course #15.

[Harris, 2005] Harris, M. (2005). Mapping Computational Concepts to GPUs. In SIGGRAPH '05: ACM SIGGRAPH 2005 Courses (p. 50). New York, NY, USA: ACM.

[Havran, 2001] Havran, V. (2001). Heuristic Ray Shooting Algorithms. PhD thesis, Czech Technical University, Prague, Czech Republic.

[Havran et al., 2005] Havran, V., Herzog, R., & Seidel, H.-P. (2005). Fast Final Gathering via Reverse Photon Mapping. Computer Graphics Forum (Proceedings of Eurographics 2005), 24(3), 323–333.


[He et al., 1991] He, X. D., Torrance, K. E., Sillion, F. X., & Greenberg, D. P. (1991). A Comprehensive Physical Model for Light Reflection. In SIGGRAPH 1991 Proceedings, volume 25 (pp. 175–186). New York, NY, USA: ACM.

[Heckbert, 1990] Heckbert, P. S. (1990). Adaptive Radiosity Textures for Bidirectional Ray Tracing. In SIGGRAPH 1990 Proceedings, volume 24 (pp. 145–154). New York, NY, USA: ACM.

[Herzog et al., 2007] Herzog, R., Havran, V., Kinuwaki, S., Myszkowski, K., & Seidel, H.-P. (2007). Global Illumination using Photon Ray Splatting. Computer Graphics Forum, 26(3), 503–513.

[Hong & Hickernell, 2003] Hong, H. S. & Hickernell, F. J. (2003). Implementing Scrambled Digital Sequences. ACM Transactions on Mathematical Software, 29(2), 95–109, ACM, New York, NY, USA.

[Hudak et al., 2007] Hudak, P., Hughes, J., Jones, S. P., & Wadler, P. (2007). A History of Haskell: Being Lazy with Class. In HOPL III: Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages (pp. 12-1–12-55). New York, NY, USA: ACM.

[Ihrke et al., 2007] Ihrke, I., Ziegler, G., Tevs, A., Theobalt, C., Magnor, M., & Seidel, H.-P. (2007). Eikonal Rendering: Efficient Light Transport in Refractive Objects. In SIGGRAPH 2007 Proceedings (pp. 59-1–59-9). New York, NY, USA: ACM.

[Intel Corporation, 2009] Intel Corporation (2009). Intel 64 and IA-32 Architectures Software Developer's Manual, volume 1: Basic Architecture. http://www.intel.com/products/processor/manuals/.

[Jarosz et al., 2008] Jarosz, W., Donner, C., Zwicker, M., & Jensen, H. W. (2008). Radiance Caching for Participating Media. ACM Transactions on Graphics, 27(1), 1–11, ACM, New York, NY, USA.

[Jensen, 2000] Jensen, H. W. (2000). Parallel Global Illumination using Photon Mapping. In SIGGRAPH 2000 Course #30.

[Jensen, 2001] Jensen, H. W. (2001). Realistic Image Synthesis using Photon Mapping. A. K. Peters, Ltd., Natick, MA, USA.

[Johnson & Fairchild, 1999] Johnson, G. M. & Fairchild, M. D. (1999). Full-Spectral Color Calculations in Realistic Image Synthesis. IEEE Computer Graphics and Applications, 19(4), 47–53.

[Kajiya, 1985] Kajiya, J. T. (1985). Anisotropic Reflection Models. In SIGGRAPH 1985 Proceedings, volume 19 (pp. 15–21). New York, NY, USA: ACM.

[Kajiya, 1986] Kajiya, J. T. (1986). The Rendering Equation. In SIGGRAPH 1986 Proceedings, volume 20 (pp. 143–150). New York, NY, USA: ACM.

[Karczmarczuk, 2002] Karczmarczuk, J. (2002). Functional Approach to Texture Generation. In PADL '02: Proceedings of the 4th International Symposium on Practical Aspects of Declarative Languages (pp. 225–242). London, UK: Springer-Verlag.

[Kelemen et al., 2002] Kelemen, C., Szirmay-Kalos, L., Antal, G., & Csonka, F. (2002). A Simple and Robust Mutation Strategy for the Metropolis Light Transport Algorithm. Computer Graphics Forum, 21(3), 531–540.

[Kessenich et al., 2010] Kessenich, J., Baldwin, D., & Rost, R. (2010). The OpenGL Shading Language, language version 4.00.

[Kirk & Arvo, 1988] Kirk, D. & Arvo, J. (1988). The Ray Tracing Kernel. In Proceedings of Ausgraph '88 (pp. 75–82). Melbourne, Australia.

[Krivanek, 2005] Krivanek, J. (2005). Radiance Caching for Global Illumination Computation on Glossy Surfaces. PhD thesis, Czech Technical University, Prague, Czech Republic.


[Krivanek et al., 2006] Krivanek, J., Bouatouch, K., Pattanaik, S. N., & Zara, J. (2006). Making Radiance and Irradiance Caching Practical: Adaptive Caching and Neighbor Clamping. In T. Akenine-Möller & W. Heidrich (Eds.), Rendering Techniques 2006, Eurographics Symposium on Rendering (pp. 127–138). Nicosia, Cyprus: Eurographics Association.

[Lafortune et al., 1997] Lafortune, E. P. F., Foo, S.-C., Torrance, K. E., & Greenberg, D. P. (1997). Non-Linear Approximation of Reflectance Functions. In SIGGRAPH 1997 Proceedings, volume 31 (pp. 117–126). New York, NY, USA: ACM Press/Addison-Wesley Publishing Co.

[Lai & Christensen, 2007] Lai, G. & Christensen, N. J. (2007). A Compression Method for Spectral Photon Map Rendering. In WSCG 2007 Proceedings (pp. 95–102).

[Lawrence, 2006] Lawrence, J. (2006). Acquisition and Representation of Material Appearance for Editing and Rendering. PhD thesis, Princeton University, Princeton, NJ, USA.

[Leeson et al., 2000] Leeson, W., O'Sullivan, C., & Collins, S. (2000). EFFIGI: An Efficient Framework for Implementing Global Illumination. In WSCG 2000 Proceedings.

[Luebke & Parker, 2008] Luebke, D. & Parker, S. (2008). Interactive Ray Tracing with CUDA. Nvision 2008 Proceedings.

[Luna, 2006] Luna, F. (2006). Introduction to 3D Game Programming with DirectX 9.0c: A Shader Approach. Wordware Publishing, Inc.

[Matsumoto & Nishimura, 1998] Matsumoto, M. & Nishimura, T. (1998). Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Transactions on Modeling and Computer Simulation, 8(1), 3–30.

[McGuire & Luebke, 2009] McGuire, M. & Luebke, D. (2009). Hardware-Accelerated Global Illumination by Image Space Photon Mapping. In Proceedings of the 2009 ACM SIGGRAPH/Eurographics Conference on High Performance Graphics. New York, NY, USA: ACM.

[Munshi, 2009] Munshi, A. (2009). The OpenCL Specification. Khronos OpenCL Working Group. http://www.khronos.org/registry/cl/.

[Neumann et al., 1999a] Neumann, L., Neumann, A., & Szirmay-Kalos, L. (1999a). Compact Metallic Reflectance Models. In P. Brunet & R. Scopigno (Eds.), Computer Graphics Forum (Eurographics '99), volume 18(3) (pp. 161–172). The Eurographics Association and Blackwell Publishers.

[Neumann et al., 1999b] Neumann, L., Neumann, A., & Szirmay-Kalos, L. (1999b). Reflectance Models by Pumping up the Albedo Function. Machine Graphics and Vision.

[Neumann et al., 1999c] Neumann, L., Neumann, A., & Szirmay-Kalos, L. (1999c). Reflectance Models with Fast Importance Sampling. Computer Graphics Forum, 18(4), 249–265.

[Niederreiter, 1992] Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. Society for Industrial and Applied Mathematics, Philadelphia, USA.

[Niederreiter & Xing, 1996] Niederreiter, H. & Xing, C. (1996). Low-Discrepancy Sequences and Global Function Fields with Many Rational Places. Finite Fields and Their Applications, 2(3), 241–273.

[Nvidia Corporation, 2009a] Nvidia Corporation (2009a). Nvidia CUDA Programming Guide. http://www.nvidia.com/object/cuda_develop.html.

[Nvidia Corporation, 2009b] Nvidia Corporation (2009b). Nvidia's Next Generation CUDA Compute Architecture: Fermi. http://www.nvidia.com/object/fermi_architecture.html.

[Oren & Nayar, 1994] Oren, M. & Nayar, S. K. (1994). Generalization of Lambert's Reflectance Model. In SIGGRAPH 1994 Proceedings, volume 28 (pp. 239–246). New York, NY, USA: ACM.


[Paris et al., 2008] Paris, S., Kornprobst, P., Tumblin, J., & Durand, F. (2008). A Gentle Introduction to Bilateral Filtering and its Applications. SIGGRAPH 2008 Course Notes.

[Parker et al., 1999] Parker, S., Martin, W., Sloan, P., Shirley, P., Smits, B., & Hansen, C. (1999). Interactive Ray Tracing. In Symposium on Interactive 3D Graphics (pp. 119–126).

[Pauly, 1999] Pauly, M. (1999). Robust Monte Carlo Methods for Photorealistic Rendering of Volumetric Effects. Master's thesis, University of Kaiserslautern, Kaiserslautern, Germany.

[Peercy, 1993] Peercy, M. S. (1993). Linear Color Representations for Full Spectral Rendering. In SIGGRAPH 1993 Proceedings, volume 27 (pp. 191–198).

[Pellacini et al., 2000] Pellacini, F., Ferwerda, J. A., & Greenberg, D. P. (2000). Toward a Psychophysically-Based Light Reflection Model for Image Synthesis. In SIGGRAPH 2000 Proceedings (pp. 55–64). New York, NY, USA: ACM Press/Addison-Wesley Publishing Co.

[Perlin, 1985] Perlin, K. (1985). An Image Synthesizer. In SIGGRAPH 1985 Proceedings, volume 19 (pp. 287–296). New York, NY, USA: ACM.

[Pharr & Humphreys, 2004] Pharr, M. & Humphreys, G. (2004). Physically Based Rendering: From Theory to Implementation. Morgan Kaufmann, San Francisco, CA, USA.

[Phong, 1975] Phong, B. T. (1975). Illumination for Computer Generated Pictures. Communications of the ACM, 18(6), 311–317, ACM, New York, NY, USA.

[Plucinska & Plucinski, 2000] Plucińska, A. & Pluciński, E. (2000). Rachunek Prawdopodobieństwa. Wydawnictwa Naukowo-Techniczne, Warszawa, Poland.

[Pohl, 2009] Pohl, D. (2009). Light It Up! Quake Wars Gets Ray Traced. Intel Visual Adrenaline, 2, 34–39.

[Popov et al., 2006] Popov, S., Günther, J., Seidel, H.-P., & Slusallek, P. (2006). Experiences with Streaming Construction of SAH KD-Trees. In Proceedings of the 2006 IEEE Symposium on Interactive Ray Tracing (pp. 89–94).

[Popov et al., 2007] Popov, S., Günther, J., Seidel, H.-P., & Slusallek, P. (2007). Stackless KD-Tree Traversal for High Performance GPU Ray Tracing. In Computer Graphics Forum, volume 26 (pp. 415–424).

[Purcell, 2004] Purcell, T. J. (2004). Ray Tracing on a Stream Processor. PhD thesis, Stanford University, Stanford, CA, USA.

[Purcell et al., 2002] Purcell, T. J., Buck, I., Mark, W. R., & Hanrahan, P. (2002). Ray Tracing on Programmable Graphics Hardware. ACM Transactions on Graphics, 21(3), 703–712.

[Radziszewski & Alda, 2008a] Radziszewski, M. & Alda, W. (2008a). Family of Energy Conserving Glossy Reflection Models. Lecture Notes in Computer Science, 5102, 46–55.

[Radziszewski & Alda, 2008b] Radziszewski, M. & Alda, W. (2008b). Optimization of Frequency Filtering in Random Access JPEG Library. Computer Science, 9, 109–120, Uczelniane Wydawnictwa Naukowo-Dydaktyczne AGH, Kraków, Poland.

[Radziszewski et al., 2010] Radziszewski, M., Alda, W., & Boryczko, K. (2010). Interactive Ray Tracing Client. In WSCG 2010 Proceedings (pp. 271–278).

[Radziszewski et al., 2009] Radziszewski, M., Boryczko, K., & Alda, W. (2009). An Improved Technique for Full Spectral Rendering. Journal of WSCG, 17(1), 9–16, UNION Agency, Plzeň, Czech Republic.

[Richter, 1999] Richter, J. (1999). Programming Applications for Microsoft Windows. Microsoft Press, fourth edition.


[Rost & Licea-Kane, 2009] Rost, R. J. & Licea-Kane, B. (2009). OpenGL Shading Language. Addison Wesley, third edition.

[Rougeron & Peroche, 1997] Rougeron, G. & Péroche, B. (1997). An Adaptive Representation of Spectral Data for Reflectance Computations. In Rendering Techniques '97 (Proceedings of the 8th Eurographics Workshop on Rendering) (pp. 127–138).

[Rushmeier & Ward, 1994] Rushmeier, H. E. & Ward, G. J. (1994). Energy Preserving Non-Linear Filters. In SIGGRAPH 1994 Proceedings, volume 28 (pp. 131–138). New York, NY, USA: ACM.

[Schlick, 1993] Schlick, C. (1993). A Customizable Reflectance Model for Everyday Rendering. In Fourth Eurographics Workshop on Rendering (pp. 73–84). Paris, France.

[Segal & Akeley, 2010] Segal, M. & Akeley, K. (2010). The OpenGL Graphics System: A Specification, version 4.0.

[Seiler et al., 2008] Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., & Hanrahan, P. (2008). Larrabee: A Many-Core x86 Architecture for Visual Computing. ACM Transactions on Graphics, 27(3), 1–15, ACM, New York, NY, USA.

[Shevtsov et al., 2007] Shevtsov, M., Soupikov, A., & Kapustin, A. (2007). Highly Parallel Fast KD-Tree Construction for Interactive Ray Tracing of Dynamic Scenes. In Computer Graphics Forum (pp. 395–404).

[Shirley et al., 1997] Shirley, P., Smits, B., Hu, H., & Lafortune, E. (1997). A Practitioners' Assessment of Light Reflection Models. In Pacific Graphics '97 Proceedings (p. 40). Washington, DC, USA: IEEE Computer Society.

[Shirley et al., 1991] Shirley, P., Sung, K., & Brown, W. (1991). A Ray Tracing Framework for Global Illumination Systems. In Proceedings of Graphics Interface '91 (pp. 117–128). Toronto, Ontario: Canadian Information Processing Society.

[Shreiner, 2009] Shreiner, D. (2009). OpenGL Programming Guide: The Official Guide to Learning OpenGL, Versions 3.0 and 3.1. Addison-Wesley Professional, seventh edition.

[Smits, 1999] Smits, B. (1999). An RGB to Spectrum Conversion for Reflectances. Journal of Graphics Tools (pp. 11–22).

[Spencer et al., 1995] Spencer, G., Shirley, P., Zimmerman, K., & Greenberg, D. P. (1995). Physically-Based Glare Effects for Digital Images. In SIGGRAPH 1995 Proceedings, volume 29 (pp. 325–334). New York, NY, USA: ACM.

[Stam, 1999] Stam, J. (1999). Diffraction Shaders. In SIGGRAPH 1999 Proceedings (pp. 101–110). New York, NY, USA: ACM Press/Addison-Wesley Publishing Co.

[Stark et al., 2005] Stark, M. M., Arvo, J., & Smits, B. (2005). Barycentric Parameterizations for Isotropic BRDFs. IEEE Transactions on Visualization and Computer Graphics, 11(2), 126–138, IEEE Educational Activities Department, Piscataway, NJ, USA.

[Stephens et al., 2006] Stephens, A., Boulos, S., Bigler, J., Wald, I., & Parker, S. G. (2006). An Application of Scalable Massive Model Interaction using Shared Memory Systems. In Proceedings of the 2006 Eurographics Symposium on Parallel Graphics and Visualization (pp. 19–26).

[Stone, 2003] Stone, M. (2003). A Field Guide to Digital Color. A K Peters, Natick, MA, USA.

[Sun et al., 2000] Sun, Y., Fracchia, D. F., Drew, M. S., & Calvert, T. W. (2000). Rendering Iridescent Colors of Optical Disks. In 11th Eurographics Workshop on Rendering (EGRW) (pp. 341–352).

[Sun et al., 2001] Sun, Y., Fracchia, D. F., Drew, M. S., & Calvert, T. W. (2001). A Spectrally Based Framework for Realistic Image Synthesis. The Visual Computer, 17(7), 429–444.


[Suykens & Willems, 2000] Suykens, F. & Willems, Y. D. (2000). Adaptive Filtering for Progressive Monte Carlo Image Rendering. In Proceedings of the 8th International Conference in Central Europe on Computer Graphics, Visualization and Interactive Digital Media (WSCG) 2000 (pp. 220–227).

[Trevett, 2008] Trevett, N. (2008). OpenCL: The Open Standard for Heterogeneous Parallel Programming. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC08).

[Urena et al., 1997] Ureña, C., Torres, J., Revelles, J., Cano, P., del Sol, V., & Cabrera, M. (1997). Designing an Object-Oriented Rendering System. In 6th Eurographics Workshop on Programming Paradigms in Graphics (pp. 23–42).

[Veach, 1996] Veach, E. (1996). Non-Symmetric Scattering in Light Transport Algorithms. In Proceedings of the Eurographics Workshop on Rendering Techniques '96 (pp. 81–90). London, UK: Springer-Verlag.

[Veach, 1997] Veach, E. (1997). Robust Monte Carlo Methods for Light Transport Simulation. PhD thesis, Stanford University, Stanford, CA, USA.

[Veach & Guibas, 1995] Veach, E. & Guibas, L. J. (1995). Optimally Combining Sampling Techniques for Monte Carlo Rendering. In SIGGRAPH 1995 Proceedings, volume 29 (pp. 419–428). New York, NY, USA: ACM.

[Wald et al., 2005] Wald, I., Benthin, C., Efremov, A., Dahmen, T., Günther, J., Dietrich, A., Havran, V., Slusallek, P., & Seidel, H.-P. (2005). A Ray Tracing based Virtual Reality Framework for Industrial Design. Technical Report TR-2005-02, Computer Graphics Group, Saarland University.

[Wald et al., 2002] Wald, I., Benthin, C., & Slusallek, P. (2002). Interactive Global Illumination Using Fast Ray Tracing. In Proceedings of the 13th Eurographics Workshop on Rendering (pp. 15–24).

[Wald et al., 2006] Wald, I., Ize, T., Kensler, A., Knoll, A., & Parker, S. G. (2006). Ray Tracing Animated Scenes using Coherent Grid Traversal. In SIGGRAPH 2006 Proceedings (pp. 485–493). New York, NY, USA: ACM.

[Wald et al., 2001] Wald, I., Slusallek, P., Benthin, C., & Wagner, M. (2001). Interactive Rendering with Coherent Ray Tracing. In Computer Graphics Forum (pp. 153–164).

[Wallace et al., 1987] Wallace, J. R., Cohen, M. F., & Greenberg, D. P. (1987). A Two-Pass Solution to the Rendering Equation: A Synthesis of Ray Tracing and Radiosity Methods. In SIGGRAPH 1987 Proceedings, volume 21 (pp. 311–320). New York, NY, USA: ACM.

[Walter, 2005] Walter, B. (2005). Notes on the Ward BRDF. Technical Report PCG-05-06, Cornell University.

[Walter et al., 2007] Walter, B., Marschner, S. R., Li, H., & Torrance, K. E. (2007). Microfacet Models for Refraction through Rough Surfaces. In Eurographics Symposium on Rendering (pp. 195–206). Grenoble, France: Eurographics Association.

[Wang et al., 2007] Wang, L., Wang, X., Sloan, P.-P., Wei, L.-Y., Tong, X., & Guo, B. (2007). Rendering from Compressed High Dynamic Range Textures on Programmable Graphics Hardware. In I3D '07: Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games (pp. 17–24). New York, NY, USA: ACM.

[Wang et al., 2009] Wang, R., Wang, R., Zhou, K., Pan, M., & Bao, H. (2009). An Efficient GPU-based Approach for Interactive Global Illumination. ACM Transactions on Graphics, 28(3), 1–8, ACM, New York, NY, USA.


[Ward, 1992] Ward, G. J. (1992). Measuring and Modeling Anisotropic Reflection. In SIGGRAPH 1992 Proceedings, volume 26 (pp. 265–272). New York, NY, USA: ACM.

[Ward & Eydelberg-Vileshin, 2002] Ward, G. J. & Eydelberg-Vileshin, E. (2002). Picture Perfect RGB Rendering Using Spectral Prefiltering and Sharp Color Primaries. In EGRW '02: Proceedings of the 13th Eurographics Workshop on Rendering (pp. 117–124). Aire-la-Ville, Switzerland: Eurographics Association.

[Ward & Heckbert, 1992] Ward, G. J. & Heckbert, P. (1992). Irradiance Gradients. In Third Eurographics Workshop on Rendering (pp. 85–98). Bristol, UK.

[Ward et al., 1988] Ward, G. J., Rubinstein, F. M., & Clear, R. D. (1988). A Ray Tracing Solution for Diffuse Interreflection. In SIGGRAPH 1988 Proceedings, volume 22 (pp. 85–92). New York, NY, USA: ACM.

[Westin et al., 1992] Westin, S. H., Arvo, J. R., & Torrance, K. E. (1992). Predicting Reflectance Functions from Complex Surfaces. In SIGGRAPH 1992 Proceedings, volume 26 (pp. 255–264). New York, NY, USA: ACM.

[Westin et al., 2004a] Westin, S. H., Li, H., & Torrance, K. E. (2004a). A Comparison of Four BRDF Models. Technical Report PCG-04-02, Cornell University.

[Westin et al., 2004b] Westin, S. H., Li, H., & Torrance, K. E. (2004b). A Field Guide to BRDF Models. Technical Report PCG-04-01, Cornell University.

[Whitted, 1980] Whitted, T. (1980). An Improved Illumination Model for Shaded Display. Communications of the ACM, 23(6), 343–349, ACM, New York, NY, USA.

[Wilkie et al., 2000] Wilkie, A., Tobler, R., & Purgathofer, W. (2000). Raytracing of Dispersion Effects in Transparent Materials. In WSCG 2000 Conference Proceedings.

[Wilson et al., 1997] Wilson, D. L., Baddeley, A. J., & Owens, R. A. (1997). A New Metric for Grey-Scale Image Comparison. International Journal of Computer Vision, 24(1), 5–17, Kluwer Academic Publishers, Hingham, MA, USA.

[Woop et al., 2005] Woop, S., Schmittler, J., & Slusallek, P. (2005). RPU: A Programmable Ray Processing Unit for Realtime Ray Tracing. In SIGGRAPH 2005 Proceedings (pp. 434–444). New York, NY, USA: ACM.


Index

absorption coefficient, 8

bidirectional path tracing, 46
    issues, 48
    optimizations, 47
    radiance estimator, 46
Borel sets, 14
BRDF, 9
    requirements, 98
    symmetry, 9, 98
BSDF, 7
    energy conservation, 9
    properties, 9
BTDF, 9

cdf, see cumulative distribution function
Chebyshev's inequality, 16
CIE XYZ color space, 32
coherent ray tracing, 73
CrossFire, 70
cumulative distribution function, 15

discrepancy, 20
    star, 20
dispersion, 33

estimators, 16
    bias, 16
    efficiency, 18
    variance reduction, 17
extensions, 89
extinction coefficient, 8

Faure sequence, 21
full spectrum, 32, 72
    acquiring data, 87
    basic operations, 35
    cluster sampling, 36
    conversion to RGB, 88
    representation, 35
    sample cluster generation, 37
    sampling light paths, 41
functional programming, 91

general purpose GPU, 71
geometric optics, 4
    assumptions, 4
global illumination, 1
GPGPU, see general purpose GPU
GPU, see graphics processing unit
graphics processing unit, 70
    algorithms, 71
    limitations, 70

Halton sequence, 21
hardware for rendering, 69

importance, 11
importance sampling, 17
interactive visualization, 71
    back buffer repaint, 75
    client and server, 74
    hardware platform, 74
    load balancing, 79
    MIP-mapping issues, 78
    rasterization of samples, 74
    server wrapper, 74
    variance estimation, 76
irradiance, 5
irradiance caching, 51

Latin hypercube sampling, 19
    variance, 19
light transport equation, 7
    analytic solutions, 10
    integral formulation, 12
    simplifications, 10
light transport paths, 29
    classification, 30
    construction, 30
    local sampling, 31
linear illumination, 5
local illumination, 1
low discrepancy sequences, 21

message passing, 70
metamers, 32
Metropolis light transport, 49
    initialization, 50
    mutations, 49
    optimizations, 50
Monte Carlo integration, 14
multiple importance sampling, 17

Niederreiter-Xing sequence, 24
noise, 94
non-deterministic algorithms, 14
    Las Vegas, 14
    Monte Carlo, 14

participating media, 7
path tracing, 42
    limitations, 44
    radiance estimator, 42
pdf, see probability density function
phase function, 8
    properties, 9
photon mapping, 52
    full spectral, 54
    limitations, 56
    on stream machine, 67
    one pass, 56
    optimizations, 55
probability density function, 15
procedural texturing, 90

quadrature rules, 14
quasi-Monte Carlo, 19
    limitations, 24
    randomized, 23
    sampling, 83

radiance, 6
radiance caching, 51
radiant flux, 5
radiant intensity, 6
radical inverse, 21
radiosity, 29
random variable, 15
    expected value, 15
    multidimensional, 15
    standard deviation, 15
    variance, 15
rasterization, 28
ray casting operator, 7
    computation, 85
ray tracing, 28
readers-writers problem, 63
rendering equation, see light transport equation
RGB color space, 32
RMS norm, 104
root mean square, see RMS norm
Russian roulette, 18

scattering coefficient, 8
shading, 92
shared memory, 70
σ-algebra, 14
SLI, 70
splitting, 18
stratified sampling, 19
stream processing, 62
    cache synchronization, 63
    kernels, 62
    model with cache, 62
    Monte Carlo integration, 64
    multipass rendering, 66
    ray tracing, 66
    streams, 62

texturing, 92
(t, m, s)-nets, 22
transmittance, 8
(t, s)-sequences, 22

van der Corput sequence, 21
view dependent, 27
view independent, 27
volume emittance, 6

wavelength dependent phenomena, 36