
BSP2: Parallel Solid Modeling using Dataflow Binary Space Partition

Giorgio Scorzelli∗

Dip. Informatica e Automazione, Università Roma Tre, Italy

Alberto Paoluzzi†

Dip. Informatica e Automazione, Università Roma Tre, Italy

Valerio Pascucci‡

Center for Applied Scientific Computing, L. Livermore National Lab., USA

Abstract

We introduce a parallel approach to geometric modeling of complex objects and scenes, combining a dataflow streaming of BSP trees with a partition of the object space into independent portions, to be evaluated in parallel with minimal interprocess communication. Binary Space Partition (BSP) is a space index used in graphics for hidden-surface removal and animation. We use BSP trees with fuzzy leaves as a progressive representation of solid meshes. Our approach is implemented as a dataflow with processes that progress concurrently, where each refinement of the input to a process is mapped instantly to a refinement of the output, so that the result is also a stream of progressive refinements. This framework allows for progressive generation of complex geometric parts and large-scale assemblies. We have adapted several graphics techniques, including BSP, boundary polygons, CSG, splines and subdivision methods, to fit into our dataflow graph, where four types of processes produce, transform, combine or consume mesh cells. This approach is scalable over different kinds of HPC hardware and different numbers of computing nodes, by way of the decomposition of the object space and of the distribution of computational processes. Compiling a generative geometric expression into a dataflow graph is well suited to SMP machines, whereas a space decomposition into independent portions fits well with computing clusters and grids.

CR Categories: I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Boolean-ops; I.3.6 [Computer Graphics]: Methodology and Techniques—BSP-tree, NURBS

Keywords: dataflow, real-time, on-demand, streaming, BSP-tree, progressive modeling, adaptive refinement

1 Introduction

The purpose of this paper is to introduce a novel parallel technology for geometric modeling of very large-scale assemblies and very complex shapes. Our motivation is the development of a real-time framework for the generation of locally adaptive model meshes, to support the simultaneous design of shape and properties and to facilitate the exploration of the solution space and design optimization.

The method we discuss can be classified as embarrassingly parallel and relies on (a) the concurrency of the processes of a dataflow network fed by a continuous stream of shape refinements, and on (b) the generation of model portions to be (progressively) evaluated in parallel as a queue of independent jobs. Both parallel computational mechanisms are based on a space index, well known in graphics as BSP (Binary Space Partition), and normally used for software rendering with hidden surfaces removed and for animation within static scenes. A BSP is a binary tree of hyperplanes where the leaves correspond to the convex cells of the space partition induced by such hyperplanes. They are labeled either as solid (IN) or empty space (OUT). We use a further FUZZY label for BSP leaves whose solidity is yet undecided. Whereas the solid or empty cells are usually not further detailed (they could be for simulation purposes), a fuzzy cell leaf may be later split by a properly generated hyperplane, giving rise to two new leaves, one of which is usually either solid or empty.

∗ e-mail: [email protected]   † e-mail: [email protected]   ‡ e-mail: [email protected]

The goal is to produce from the very first moment a mesh decomposition of an approximation of the geometric model generated by a symbolic expression, which may contain primitive or imported objects, affine or projective transformations, geometric operations (Boolean ops, Cartesian product, Minkowski sum, etc.) and parentheses. Such a generative expression is therefore compiled into a dataflow graph pipeline using four types of concurrent processes, which respectively produce, transform, combine or consume cells of the generated mesh. The mesh is continuously improved while new refinement cells traverse the pipeline. The dataflow parallelism works as follows. A continuous stream of BSP nodes is generated by each specialized producer (or builder) process and traverses the pipeline (see Figure ??), getting transformed or suitably operated on by each traversed process, implemented as an independent thread. This kind of parallelism is well suited for SMP machines and other kinds of shared-memory HPC hardware.

This approach is embarrassingly parallel, because there is essentially no communication between processors: the communication overhead is negligible and the speed-up is nearly linear with the number of processors. In particular, (a) the generating code (either source or compiled, depending on the link throughput and on the availability of a specialized machine pool) is broadcast to all nodes of a cluster; (b) a specification of scene fragments, made by clipping hyperplanes, is distributed to the computing nodes, and the scene portions are elaborated independently of each other; (c) the resulting (refinements of) sub-meshes are collected at suitable time intervals. For this purpose we use a large-grain Bulk Synchronous Parallel approach, from which the technology name BSP2 is derived. The name stands for Binary Space Partition over Bulk Synchronous Parallel hardware. The approach is very simple, as the clipping of each scene fragment is also generated within the same BSP framework, just by setting the first hyperplanes, so that various subtrees are independently detailed by different computing nodes. The approach easily supports local adaptive refinement (e.g. driven by simulations) or hierarchical dynamical distribution of the computational load within the same BSP framework. After the initial distribution of the generating expression (or executable code) to all of the nodes, each job is simply specified by the clipping BSP.

Furthermore, the computation can be easily organized as a queue of independent jobs, which may also be generated hierarchically, and submitted to some computational infrastructure by using high-throughput batch tools for clusters or grids, like Condor [?] or the Globus toolkit [?; ?]. In this case it is not necessary that the job splitting be computed according to the space distribution of the model: an octree or k-d-tree partition of either the bounding volume or the embedding space is sufficient, since jobs that contain some empty portion of the scene will quickly return and will be replaced by new jobs from the queue. Further jobs can be dynamically added to the queue in case of hierarchical adaptive load distribution. Priority queues may be used to speed up the computation with adaptive local focus. Looking ahead, it is not difficult to imagine an implementation of our approach using OGSA XML-based web services [?].
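The job-queue organization described above can be illustrated with a minimal single-machine sketch. It is not the Condor or Globus interface: `run_queue` and the `evaluate` callback are hypothetical names, and a job is assumed to return both its computed cells and any further jobs produced by hierarchical refinement.

```python
import heapq
from itertools import count


def run_queue(jobs, evaluate):
    """Process (priority, job) pairs, lowest priority value first.

    `evaluate(job)` is a hypothetical callback returning (cells, sub_jobs):
    the mesh cells computed for the job, and any further (priority, job)
    pairs generated by hierarchical adaptive refinement.
    """
    tie = count()  # tie-breaker, so that job objects are never compared
    heap = [(prio, next(tie), job) for prio, job in jobs]
    heapq.heapify(heap)
    collected = []
    while heap:
        _, _, job = heapq.heappop(heap)
        cells, sub_jobs = evaluate(job)
        collected.extend(cells)        # jobs over empty space add nothing
        for prio, sub in sub_jobs:     # jobs added dynamically to the queue
            heapq.heappush(heap, (prio, next(tie), sub))
    return collected
```

A job covering only empty space returns immediately with no cells, so its slot is taken by the next job in the queue; smaller priority values model the adaptive local focus.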

To the authors' knowledge, the approach we introduce in this paper is not only the first complete framework for parallel geometric modeling, in which most geometric and graphics techniques are accommodated, but is also optimal when used in a distributed computational environment. As a matter of fact, the computation of model fragments on the nodes is completely independent, i.e. there is negligible interprocess communication. Some inter-node transfer of data is needed only to distribute the load to each processor (just by giving a BSP-tree path), and to return the results, i.e. the computed mesh cells, to the master node.

The paper is organized as follows. Section 2 recalls the previous work related both to parallel solid modeling and to the present approach. In Section ?? the architecture of the approach and its fitting of several modeling and graphics techniques are briefly discussed. In Section ?? the dataflow model of a geometric computation is described, our strategy of scene-splitting into independent jobs is detailed, and the interface reconstruction problem is solved. In Section ?? we discuss some examples and report on the timing of our approach. The Conclusion section outlines our ongoing work and open problems. The Appendix contains the detailed runtimes on various cluster configurations.

2 Related work

More than a hundred papers could be cited about parallel rendering and visualization of both volume and surface geometric models. Conversely, the computation of complex solid models was traditionally considered very hard to parallelize, and very few previous attempts at parallel shape generation can be found in the literature. A 2D subdivision approach was recently used to improve the robustness and speed of Sandia National Laboratory's 3D MEMS geometry modeler through a combination of mask subdivision and code parallelization [?]. An algorithm for parallel intersection of solids using the Connection Machine and specialized data structures had been previously introduced by [?].

The lack of parallel approaches to solid modeling can be explained by the very high complexity of boundary data structures, normally used to represent a non-manifold solid model [?], and their lack of implicit space indexing. In our approach a double representation of geometry and topology is used instead, by combining BSP trees [?; ?], which do not store any information about the topology of the space they split, with the complete representation [?] of the stepwise-generated mesh topology, by using the Hasse diagram of the polytopal complex [?; ?]. This design choice allows one to split the model to be generated into independent fragments. The algorithm for progressive Boolean operations (see Figure ??) is given in [?]; the integration of several graphics and modeling techniques in this framework is discussed in [?]. Another significant difference from standard approaches, which makes possible the parallel one presented here, is its focus on solid mesh decomposition rather than on boundary representations. A trade-off with much higher memory size is unavoidable when the model is used for simulations and not only for visualization.

Our approach is a hybridization of the dataflow and BSP parallel models of computation. The Bulk Synchronous Parallel (BSP) model, proposed by [?] and developed by [?], aims to provide a unified framework for the design and programming of general purpose parallel computer systems made by a set of processor-memory pairs and by a communication network. The name dataflow comes from the conceptual notion that a program in a dataflow computer is a directed graph, and that data flows between instructions along its arcs [?]. In some sense, our progressive approach to geometric and solid modeling is a demonstration of the power and flexibility of the dataflow approach [?], often associated with a functional language. An introduction to the visual prototype of PLASM [?], a geometric extension of the FL language [?; ?], is given in [?].

3 Dataflow streaming architecture

A monolithic geometric kernel is replaced in the dataflow side of our technology by a set of concurrent processes, which produce a more and more detailed representation of a complex geometric scene, depending on the available resources (time, processors, memory, bandwidth). This result is obtained by compiling a shape-generating geometric expression into a dataflow graph of processes whose components produce, transform, combine, consume (or observe) suitable data structures, depending on the task they have to perform. The approach is aimed at supporting generative geometric modeling, starting from either primitive or imported shapes, which may come from external data stores. Such atomic shapes may be either transformed with affine or projective transformations, or aggregated within hierarchical assemblies, or combined by several operators, including Boolean operations, Cartesian product and Minkowski sums of point-sets. A combination of Cartesian product and Minkowski sum is used to implement the operations of extrusion, sweeping and offset [?].

Our dataflow approach results in a large-grain streamlined parallelism where suitable data structures flow between specialized processes, with the resulting shape produced by a set of progressive refinements of a first approximation of the result, which is generated nearly instantly. The data tokens conceptually flowing between different computations are a pair of pointers to the double representation of the mesh, i.e. (a) a BSP-tree node (actually either a linear hyperplane or a leaf label) and (b) its associated d-cell in the Hasse diagram of the current mesh.

[Figure 1 graph nodes: cylinder (builder); translation, rotation, rotation, translation (transformers); union, difference (combiners); cube (builder); viewer (driver)]

Figure 1: Dataflow graph of the generating expression that produces the mechanical piece in Figure ??. The various processes run concurrently in a shared memory environment.

3.1 Framework components

Notice that we only handle collections of piecewise-linear bounded and convex sets, i.e. complexes of polytopal cells. Notice also that the approach is dimension-independent, in the sense that both the data structures and the operations may accommodate point-sets of arbitrary dimension. A complex is said to be weak when the affine hull of a k-face of a d-cell, 1 ≤ k ≤ d, may contain more than one k-face. In particular, we exploit the one-to-one mapping between weak polytopal d-complexes in E^d and BSP trees, where the presence of T-junctions between cells is allowed. In the following, for the sake of simplicity, we will say mesh instead of weak polytopal d-complex.

Figure 2: Progressive generation of a CSG object. (a) The FUZZY cells, to be split at the next step, are shown in light gray. The IN cells are in dark gray. (b) Progressive and distributed computation, with object fragments distributed on 4 nodes of a cluster. (c) Shaded image of the generated object.

A short summary of the main properties of BSP trees may be useful for the reader. In particular, each node: (a) is associated with a convex cell; (b) if non-leaf, contains a hyperplane splitting its cell; (c) if leaf, contains a label that characterizes its cell as either IN, OUT or FUZZY; (d) has its cell defined as the point-set intersection of the halfspaces associated with the (unique) path from the node to the root; (e) has a cell equal to the point-set union of the cells associated with the subtree rooted in it (see Figure ??).
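The properties listed above can be made concrete with a small sketch of a BSP tree with three-valued leaves. The class and function names (`BspNode`, `side`, `classify`) and the hyperplane encoding as a coefficient tuple are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

IN, OUT, FUZZY = "IN", "OUT", "FUZZY"


@dataclass
class BspNode:
    # Internal node: plane = (a_1, ..., a_d, b), meaning a.x + b = 0.
    # Leaf node: plane is None and label is IN, OUT or FUZZY.
    plane: Optional[Tuple[float, ...]] = None
    below: Optional["BspNode"] = None   # subtree for a.x + b < 0
    above: Optional["BspNode"] = None   # subtree for a.x + b >= 0
    label: str = FUZZY


def side(plane, point):
    """Signed value of the hyperplane equation at the given point."""
    *a, b = plane
    return sum(ai * xi for ai, xi in zip(a, point)) + b


def classify(node, point):
    """Return (label, path): the label of the leaf cell containing the
    point, and the halfspace signs met on the way down from the root."""
    path = []
    while node.plane is not None:
        if side(node.plane, point) < 0:
            path.append("-")
            node = node.below
        else:
            path.append("+")
            node = node.above
    return node.label, path


# Two hyperplanes in the plane: x = 0 and x = 1.
tree = BspNode(
    plane=(1, 0, 0),
    below=BspNode(label=OUT),
    above=BspNode(plane=(1, 0, -1),
                  below=BspNode(label=IN),
                  above=BspNode(label=FUZZY)),
)
```

The path returned by `classify` is exactly the list of halfspaces whose intersection defines the leaf cell, as in property (d); a FUZZY leaf is one that a later hyperplane may still split.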

Figure 3: (a) Space partition with hyperplanes. (b) Corresponding BSP tree.

Figure 4: Splitting of a d-cell c with a hyperplane h, and corresponding Hasse diagram.

The basic operation performed by our system is the splitting of a mesh cell with a hyperplane (the current BSP-tree node). This operation is implemented in a very fast and robust way [?; ?]. A complete representation of a d-mesh is given by its Hasse diagram, the directed graph of the cover relation of cells, which represents the inclusion between k-faces, 0 ≤ k ≤ d, and hence the structure of d-polytopes and polytopal meshes.
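To give a feeling for the split operation, here is a deliberately simplified two-dimensional sketch that splits a convex polygon with a line. It is a stand-in, not the authors' dimension-independent algorithm: it works on a plain vertex list rather than on a Hasse diagram, and the function name and the tolerance `eps` are assumptions.

```python
def split_polygon(poly, plane, eps=1e-9):
    """Split a convex polygon by the line a*x + b*y + c = 0.

    poly is a list of (x, y) vertices in boundary order.
    Returns (below, above): the parts where the line equation is
    <= 0 and >= 0. A part with fewer than 3 vertices is returned as [].
    """
    a, b, c = plane
    below, above = [], []
    n = len(poly)
    for i in range(n):
        p, q = poly[i], poly[(i + 1) % n]
        sp = a * p[0] + b * p[1] + c
        sq = a * q[0] + b * q[1] + c
        if sp <= eps:
            below.append(p)
        if sp >= -eps:
            above.append(p)
        # The edge p-q properly crosses the line: add the crossing point
        # to both parts.
        if (sp < -eps and sq > eps) or (sp > eps and sq < -eps):
            t = sp / (sp - sq)
            x = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
            below.append(x)
            above.append(x)
    below = below if len(below) >= 3 else []
    above = above if len(above) >= 3 else []
    return below, above
```

For example, splitting the square with corners (0, 0) and (2, 2) by the line x = 1 gives two rectangles that share the two new vertices on the line, which is the 2D analogue of the new (d-1)-face created by a split.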

4 Parallel modeling with BSP2

As said in the Introduction, we have a hybridization of dataflow and BSP parallel models. The first, implementing a multithreaded dataflow of elementary data structures, is used at the level of the single computational node, either single- or multi-processor. The second, implementing a distributed computation, is employed to subdivide the load between the nodes of a cluster, with a minimum amount of communication, corresponding just to the input distribution and the output collection. Any combination of the two approaches is possible, scaling from a grid of supercomputers to a single-processor machine.

At the application level, a third type of parallelism could be exploited, mapping the user geometric environment, abstracted as a directed graph of geometric values (parts and hierarchical assemblies) linked by functional dependencies, into a distributed computation with minimal (close to zero) interprocess communication, by assigning the generating code associated with the various objects to different master nodes, and by allowing them to redistribute the load on the basis of a top-level partition of the object with BSP hyperplanes.

4.1 Dataflow-based parallelization

A dataflow network made up of four kinds of processes is automatically set up, depending on the expression DAG of the geometric value or, better, of the class of parameterized geometric values, to be generated. In the following, we summarize the progressive operations available in our prototype implementation.

Producers (or builders) are concurrent processes with one output stream and no input stream. Each producer is specialized for either the progressive generation of a type of geometric object, or the importing of geometric data from external stores, e.g. the importing of obj files. Other producers generate the d-sphere, the d-cylinder, the d-cuboid, the d-simplex. Producer processes have the role of source of tokens (BSP nodes) into the dataflow network.

Figure 5: BSP2 progressive generation of the 2-circle. The dataflow refinement based on progressive splits of convex cells with BSP nodes (hyperplanes) is shown, together with the model partition induced by the BSP subtree closest to the root, to be detailed independently on different nodes (processor-memory pairs) of the BSP computer. Notice that the load partition may be done in an arbitrary way. The OUT, IN and FUZZY cells are shown white, dark gray and light gray, respectively.

Figure 6: Progressive refinement of a portion of biquadratic rational B-spline through a dataflow of BSP nodes. Notice that each refinement is generated by splitting and is contained within the previous cell. In this case (approximation of a surface with a solid mesh) all of the cells are either OUT or FUZZY, i.e. there are no solid cells.

Transformers are concurrent processes with one input stream and one output stream. Typical transformers are affine or perspective transformations, which are applied as a suitable matrix product to both the covectors in a BSP stream and the 0-cells in the associated mesh-cell stream. Other transformers correspond to unary operations, like the Boolean complement of a point-set, the extrusion of a d-mesh into a (d+1)-mesh, and the embedding of a d-mesh into a d-subspace of a higher-dimensional space. For the inverse operation of projection of a mesh onto a lower-dimensional space we do not yet have an efficient progressive solution. This class of processes also includes the builders of k-splines, both polynomial and rational, which may be seen as transformations from a k-index tensor of control points in E^d to a more and more refined solid approximation of a k-variate manifold in E^d, 1 ≤ k ≤ d (see Figure ??). We call solid a mesh of d-cells in E^d. Since solid approximations are generated, also for surfaces and curves, Boolean operations are allowed between both solids and surfaces, and solids and curves. Finally, integration filters may extract the value of volume integrals from the cells flowing into the pipeline. In particular, our inertia filter computes the 4×4 inertia tensor (with volume, first and second moments and products of inertia) using the algorithm for (multidimensional) domain integration of polynomials given by [?].
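The way a transformation acts on the two streams follows from requiring that point/hyperplane incidence be preserved. The derivation below is a short sketch under assumed conventions that are not stated in the paper: 0-cells are homogeneous column vectors $p$, hyperplanes are row covectors $h$, and the transformation matrix $M$ is invertible.

```latex
h\,p = 0, \qquad p' = M\,p, \qquad h' = h\,M^{-1}
\quad\Longrightarrow\quad
h'\,p' = h\,M^{-1}M\,p = h\,p = 0 .
```

Because $h'\,p' = h\,p$ for every point, not only for points on the hyperplane, the sign of the hyperplane equation is unchanged, so each transformed point stays on the same side of the transformed hyperplane and the leaf labels of the BSP tree remain valid.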

Combiners are concurrent processes with more than one input stream and a single output stream. Typical combiners are the Boolean operations of union, intersection, difference and symmetric difference. Other combiners are the Cartesian product of point sets, actually implemented as a proper embedding of arguments followed by intersection [?], and the Minkowski sum of a mesh with a polytope, useful to implement common operations, like the offset and the sweeping of a mesh. It may be useful to recall that each process maps a continuous stream of refinements of its inputs to a continuous stream of refinements of its output, so that a large-grain approximation of the result is obtained from the very first moment, and then continuously updated to more and more fine-grained approximations (see, e.g., Figure ??). The refinement cells generated independently on each node of a BSP computer may be collected at the end of each computation superstep.
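A label-level view of the Boolean combiners can be sketched with a three-valued logic on IN, OUT and FUZZY. This is only an illustration of why a coarse result is available immediately; it assumes both operands are classified over the same cell, and it is not the progressive Boolean algorithm of the cited paper. All function names are hypothetical.

```python
IN, OUT, FUZZY = "IN", "OUT", "FUZZY"


def complement(a):
    """Swap solid and empty; an undecided cell stays undecided."""
    return {IN: OUT, OUT: IN, FUZZY: FUZZY}[a]


def union(a, b):
    if IN in (a, b):
        return IN                 # solid in either operand: decided
    if a == OUT and b == OUT:
        return OUT                # empty in both operands: decided
    return FUZZY                  # otherwise wait for further refinement


def intersection(a, b):
    # De Morgan: A and B = not (not A or not B)
    return complement(union(complement(a), complement(b)))


def difference(a, b):
    return intersection(a, complement(b))


def symmetric_difference(a, b):
    if FUZZY in (a, b):
        return FUZZY
    return IN if a != b else OUT
```

Note that `union(IN, FUZZY)` is already `IN` and `intersection(OUT, FUZZY)` is already `OUT`: such cells are decided before the other operand has been refined, which is what lets a combiner emit output refinements as soon as its inputs arrive.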

Consumers are processes with one input stream and no output stream. They have the role of sink nodes in our dataflow network. Consumer processes are used to drive the computation, by sending control signals upstream along the pipeline, and to exploit the resulting model. An interactive viewer component is a process of this kind, as well as a simulation process using the generated mesh. Exporter processes into standard file formats for geometric data, like VRML, 3DXML, STEP or IGES, also belong to this class. An array of consumer processes may be attached to the same progressive model, i.e. to the same dynamic repository of geometric data. It is useful to have multiple viewers or computations on the same dataset.

4.2 Space-based parallelization

The Bulk Synchronous Parallelism part of our approach is very simple, and can be subdivided into six supersteps, denoted here as compilation, broadcasting, distribution, computation, collection and resolution, respectively. A further rendering superstep could actually be added either to the collection or resolution phases, or to both, depending on the rendering problem at hand. Using some High Throughput Computing system like Condor, some supersteps could be condensed into a job definition. In this case, the collection phase is not explicit, being contained in the output blob returned by a finished job.

Compilation. A master machine compiles the symbolic expression of the geometric model to be generated into multithreaded executable code. The generated dataflow graph is a DAG corresponding to the reversed expression tree, with source nodes associated with builder processes (better: threads), in general with a single sink node corresponding to the geometric value to be generated. More than one sink is actually possible, if different views or computations with different resolutions have to be supported.
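The thread-per-process structure produced by the compilation step can be sketched with a three-stage pipeline connected by queues. This is a minimal sketch and not the authors' generated code: the stage functions, the `DONE` sentinel and the encoding of a token as a bare hyperplane tuple are assumptions (the paper's tokens also carry a pointer to the associated mesh cell).

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out, items):
    """Source process: emits tokens and then closes its stream."""
    for item in items:
        out.put(item)
    out.put(DONE)


def transformer(inp, out, fn):
    """One input stream, one output stream: maps each token as it arrives."""
    while (item := inp.get()) is not DONE:
        out.put(fn(item))
    out.put(DONE)


def consumer(inp, sink):
    """Sink process: accumulates the refinements it receives."""
    while (item := inp.get()) is not DONE:
        sink.append(item)


def translate(plane, t=(2.0, 3.0)):
    """Translate the hyperplane a.x + c = 0 by the vector t."""
    *a, c = plane
    return (*a, c - sum(ai * ti for ai, ti in zip(a, t)))


def run_pipeline(items, fn):
    a, b, sink = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=transformer, args=(a, b, fn)),
        threading.Thread(target=consumer, args=(b, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink
```

Each stage forwards a token as soon as it is available, so the sink starts receiving refinements before the producer has finished; a combiner would be a stage reading from two input queues.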

Broadcasting. The generating code is broadcast to all the nodes of a cluster. The code can actually be either symbolic or executable. In the first case the compiler must be present on all the nodes of a cluster, to execute the translation locally. A smaller communication time is clearly expended using the symbolic broadcasting, but this requires a higher level of privileges on all of the nodes. On the other hand, a symbolic deployment is mandatory when using a multiplatform cluster or grid. It is interesting to remark that all of the nodes work using the same program code. The (part of the) model generated by the actual execution will depend on the portion of the space partition (BSP-path) actually assigned to the various nodes.

Distribution. The master machine computes a partition of either the space E^d, which embeds the geometric value to be generated, or a bounding d-box (if known), as a balanced BSP tree with k levels, k being a small number. The embedding space or bounding box is therefore partitioned into 2^k non-overlapping portions, whose union is the space or box itself. At this point each one of such space-partition portions is distributed to one of the n nodes of the cluster, with 2^k ≤ n. In particular, each space portion will be represented by a path on the BSP tree from the root to one of the leaves. If the number of portions is higher than the number of computing nodes, the BSP model of computation described here, with explicitly visible progression of the computation, should be abandoned in favor of the submission of a queue of independent jobs to some High Throughput Computing infrastructure.
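The distribution step can be sketched for the bounding-box case as follows. The choice of axis-aligned midpoint splits that cycle through the coordinates is an assumption made here for simplicity (the paper only requires a balanced BSP tree with k levels), and the function name and path encoding are hypothetical.

```python
def partition_box(lo, hi, k, axis=0, path=()):
    """Split the box [lo, hi] k times at midpoints, cycling through the axes.

    Returns a list of 2**k pairs (path, (lo, hi)). Each path is a tuple of
    (axis, threshold, sign) steps from the root to a leaf, i.e. the
    clipping hyperplanes that specify one job.
    """
    if k == 0:
        return [(path, (tuple(lo), tuple(hi)))]
    d = len(lo)
    mid = (lo[axis] + hi[axis]) / 2
    left_hi = list(hi)
    left_hi[axis] = mid          # portion below the splitting hyperplane
    right_lo = list(lo)
    right_lo[axis] = mid         # portion above the splitting hyperplane
    nxt = (axis + 1) % d
    return (
        partition_box(lo, left_hi, k - 1, nxt, path + ((axis, mid, "-"),))
        + partition_box(right_lo, hi, k - 1, nxt, path + ((axis, mid, "+"),))
    )


# One BSP-tree path per computing node (requires 2**k <= n nodes).
jobs = dict(enumerate(partition_box((0, 0), (4, 4), k=2)))
```

Only the path has to be sent to a node: together with the generating code already broadcast, it fully specifies which portion of the model that node must detail.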

Computation. The computation phase consists of the parallel progressive processing of the convex sets previously distributed to every node, each being the intersection of the halfspaces on the corresponding BSP-tree path (see Figure ??). The task is performed by the multithreaded executable code broadcast at the beginning. If the node is a multiprocessor (typically an SMP computer), then the operating system may take care of the concurrency, getting the best performance from the dataflow model of the progressive computation. In a hierarchical architecture, where some nodes work as master for others, a further distribution of the load may arise, with more splitting of its own convex set by each master, and subsequent distribution of subregions (updated BSP-tree paths) to its slaves.

Collection The generated refinement cells are collected from each node at the end of every computational superstep of the BSP computer. Once again, our approach is very simple and natural. In short, all the cells that arrived at the sink nodes of each dataflow graph since the previous superstep must be transferred to the master node. In a hierarchical architecture, each master may collect the most recently generated cells from its slaves and transfer to its own master the cells collected in the previous superstep. It is worth noting that cells are handled differently depending on their label. In particular, OUT cells are simply discarded, IN cells are accumulated on the master, whereas FUZZY cells are replaced by the IN or FUZZY ones they were split into (see Figures ?? and ??).
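The label-dependent handling of collected cells can be summarized by a small sketch. The data layout is an assumption: each arriving cell is a hypothetical triple `(cell_id, parent_id, label)`, and the master keeps one set of accumulated IN cells and one set of still undecided FUZZY cells.

```python
IN, OUT, FUZZY = "IN", "OUT", "FUZZY"


def collect(solid, fuzzy, arrived):
    """Merge the cells that arrived during one superstep into the master state.

    solid   -- set of ids of IN cells accumulated so far
    fuzzy   -- set of ids of FUZZY cells not yet split
    arrived -- iterable of (cell_id, parent_id, label) triples (hypothetical)
    """
    for cell_id, parent_id, label in arrived:
        # A FUZZY parent that has been split is replaced by its children.
        fuzzy.discard(parent_id)
        if label == IN:
            solid.add(cell_id)      # accumulated on the master
        elif label == FUZZY:
            fuzzy.add(cell_id)      # may be split in a later superstep
        # OUT cells are simply discarded.
    return solid, fuzzy
```

Calling `collect` once per superstep keeps the master's view of the model consistent with the stream of refinements: the solid set only grows, while the fuzzy set shrinks as cells are resolved.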

Resolution The last superstep is executed only on the master node(s), whereas the progressive computation proceeds on the non-master nodes until the next collection, which can be set up to coincide with the end of the resolution phase on the master(s). The aim of this superstep is to glue together the most recently collected refinements coming from different nodes, and to glue them to the previously generated IN cells, if present. The task is not hard, since (a) the (oldest) (d−1)-cells of each refinement portion must be linked to the in boundaries they originated from, and (b) adjacent refinements must be linked on their common ordered boundaries, which are known in advance. A recursive traversal of the common boundaries of each pair of adjacent portions will suffice.

Rendering Last but not least, the load distribution to nodes via BSP-tree paths can be used to rapidly produce engineering drawings with hidden surfaces removed, and even to strongly speed up the interactive user interface with the rendered assembly of object fragments, by using standard computer graphics techniques. In fact, by properly traversing its own BSP subtree, each computing node can obtain in linear time a depth-sort of the boundary 2-faces of the computed object fragment. Both this ordered set and a raster image of the fragment rendering can be returned to the master node. The image may be used as a mask to be added with a priority index to the master’s frame buffer, using the standard 2.5D ordered display of groups of primitives, as in illustration programs. The subtree associated with the depth ordering can be attached to the BSP-path assigned to the computing node, and used to produce quick engineering animations. We used the first technique to generate the images in Figure ??.
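The linear-time depth-sort mentioned above is the classical back-to-front traversal of a BSP tree. The sketch below illustrates it under simplifying assumptions: the `Node` class and its fields are hypothetical, and each node stores only the faces lying on its own splitting hyperplane.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Internal BSP node: hyperplane normal . x = offset (hypothetical layout)."""
    normal: Tuple[float, ...]
    offset: float
    faces: List[str] = field(default_factory=list)  # faces on the hyperplane
    below: Optional["Node"] = None                  # subtree with normal . x < offset
    above: Optional["Node"] = None                  # subtree with normal . x > offset


def back_to_front(node, eye, out=None):
    """Return the faces ordered from farthest to nearest with respect to `eye`.

    Each node is visited exactly once, so the cost is linear in the tree size.
    """
    if out is None:
        out = []
    if node is None:
        return out
    s = sum(a * x for a, x in zip(node.normal, eye)) - node.offset
    # The subtree on the viewer's side is nearer and must be drawn last.
    near, far = (node.above, node.below) if s >= 0 else (node.below, node.above)
    back_to_front(far, eye, out)
    out.extend(node.faces)
    back_to_front(near, eye, out)
    return out
```

Drawing the faces in the returned order overwrites farther faces with nearer ones, which removes hidden surfaces without a depth buffer. Moving the viewpoint only changes the traversal order, not the tree.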

5 Example and timing

In this section we show some examples of parallel evaluation of the geometric model of the Leaning Tower of Pisa shown in Figure ??. The timing of its evaluation on a cluster of workstations using a BSP model of computation is shown in Table ??. The setup for the experiment is described below.

The model of the tower is generated as a symbolic expression with about 2600 primitives, i.e. producers of elementary data, most of which are cylinders of different radii. The dataflow network also contains a similar number of affine transformers and some hundreds of Boolean operations, each with between 2 and 12 arguments, for a total of more than 5,000 threads. The hardest operation is the Boolean difference of the internal kernel of the tower with the spiral-stair volume, built as the union of 293 parallelepipeds.

The platform for the computation summarized in Tables ??–?? and displayed in Figure ?? is a cluster of IBM Intellistations M-Pro, each with a Pentium 4 at 3 GHz, 1 GB RAM and SuSE Linux. The model was split into either 1, 2, 4, 8 or 16 fragments of similar size using a splitting BSP-tree made of some vertical planes passing through the tower axis and some other planes. Twelve of the 16 fragments of the maximal decomposition are shown in Figure ?? from different viewpoints, assembled with a small translation depending on scaled barycenters.

As can be seen in Table ??, the whole model is computed on a single node in 5.17 seconds, and in 0.61 seconds when distributed over 16 nodes. The gap between this maximum and the average node time of 0.37 seconds arises because the computation performed by a node depends on the actual number of non-empty operations on the model fragment it owns, which is not known in advance. The last row of the table reports the sum of computation times on the cluster. It is possible to see that the decomposition method scales very well with respect to the number of fragments of the model partition. The communication times are not reported, since they scale linearly with the output size and also depend linearly on the network and/or file-system performance.

Figure 7: Rendering of fragments of the Leaning Tower of Pisa from different viewpoints. Each image shows 12 parts of the tower’s partition into 16 fragments.

Table 1: Runtimes (sec) of distributed evaluation of the tower model.

nodes          1     2     4     8    16
max time    5.17  3.42  1.71  0.86  0.61
avg time    5.17  2.63  1.31  0.66  0.37
min time    5.17  1.84  0.93  0.45  0.09
time sum    5.17  5.26  5.26   5.3  6.05
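The scaling behaviour can be quantified directly from the figures of Table 1. The short sketch below computes the speedup, the parallel efficiency and the load imbalance for each cluster size. The metric definitions are the standard ones and are our addition, not part of the original experiment; the elapsed time of a run is taken to be the maximum node time, as in the text above.

```python
# Figures copied from Table 1 (seconds).
nodes = [1, 2, 4, 8, 16]
max_time = [5.17, 3.42, 1.71, 0.86, 0.61]
time_sum = [5.17, 5.26, 5.26, 5.3, 6.05]


def scaling_metrics(nodes, max_time, time_sum):
    """Return {n: (speedup, efficiency, imbalance)} for each cluster size n.

    speedup    = single-node time / slowest-node time
    efficiency = speedup / n
    imbalance  = slowest-node time / mean node time (1.0 means perfect balance)
    """
    t1 = max_time[0]
    result = {}
    for n, t_max, t_sum in zip(nodes, max_time, time_sum):
        speedup = t1 / t_max
        efficiency = speedup / n
        imbalance = t_max / (t_sum / n)
        result[n] = (speedup, efficiency, imbalance)
    return result


metrics = scaling_metrics(nodes, max_time, time_sum)
```

With 16 nodes the speedup is about 8.5, i.e. an efficiency slightly above 0.5, while the total work (the time sum) grows by less than 20% over the single-node run. Most of the lost efficiency therefore comes from the uneven load of the fragments rather than from extra computation.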

It is more interesting to see how the computation times vary with the number of cells in the output mesh, i.e. with the progressive detailing of the model. In the Appendix we show the number of both solid and boundary cells and the corresponding runtimes for various mesh resolutions (i.e. at successive moments of the progressive model generation), and for various configurations of the cluster. Only the computing times are reported, since the communication time is negligible. Some close views of tower portions at various resolutions are given in Figure ??. Ray-traced images of tower portions from different viewpoints are shown in Figure ??.

6 Conclusion

In this paper we have introduced the first (to the authors’ knowledge) parallel framework for high-performance geometric and solid modeling. Two kinds of parallelism are achieved, at both process and data levels, by (a) compiling the generating expression into a dataflow network of concurrent threads, and (b) splitting the object into fragments to be distributed to nodes and generated independently. This approach is rooted in symbolic geometric modeling with a functional design language [?], and is motivated by real-time generation of locally adaptive model meshes, to support both shape and properties design and facilitate the exploration of solution space and design optimization.

Our prototype system is presently implemented as a multithreaded library written in C and named XGE, for eXtreme Geometric Environment [?]. Integration with the PLASM design language has just started, with the aim of compiling the user functional environment into a distributed dataflow, capable of exploiting the available resources on demand. We are pleased to observe that all the important techniques of geometric and solid modeling are well supported by this technology. It also allows for automatic generation of models at different levels of detail, even in an adaptive, viewing-driven way. Our next step will concentrate on a close integration of solid and physical modeling, with the goal of supporting progressive simulations and adaptive, simulation-driven refinements of the generated mesh.

Acknowledgments

We gratefully acknowledge valuable comments and friendly encouragement from Fausto Bernardini, Antonio Di Carlo and Vadim Shapiro. Franco Milicchio and Simone Portuesi are working on simulation and language integration, respectively. We acknowledge support from IBM Corp. through a SUR (Shared University Resources) Award and PLM Lab donation to Roma Tre, the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48, and the Grant 2001-2004 from Spike Consortium, under MIUR Italian National Project SPI-09.

A Appendix: Runtime and mesh dimension

In Tables ??–?? we show the load distribution and the corresponding runtimes of the cluster nodes previously described. The tables are organized as follows: three groups of three columns are dedicated to data related to the low, medium and high resolution of the progressive model of the Leaning Tower. Some close views of each model resolution are given in the respective columns of Figure ??.

Figure 8: Close views of the tower at different resolutions, i.e. at different times during the progressive evaluation.

For each resolution of the model we give: (a) the number of 3D cells in each portion of the mesh; (b) the number of its boundary 2D cells; (c) the time needed to generate the model portion assigned to the node. In the last row of each table we report, for each model resolution, the total number of 3D cells, the total number of 2D boundary cells and the average node runtime.
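As a worked example of how the last row of each table is obtained, the sketch below recomputes it for the low-resolution columns of the four-node run (Table 4). The per-node figures are copied from that table; the function name `summary_row` is hypothetical.

```python
# Low-resolution columns of Table 4 (n = 4): (3-cells, 2-cells, seconds) per node.
rows = [
    (7952, 18551, 0.64),
    (8017, 18788, 0.67),
    (3626, 11093, 0.32),
    (3524, 10644, 0.32),
]


def summary_row(rows):
    """Return (total 3-cells, total 2-cells, average node runtime)."""
    total_3 = sum(r[0] for r in rows)
    total_2 = sum(r[1] for r in rows)
    avg_sec = sum(r[2] for r in rows) / len(rows)
    return total_3, total_2, avg_sec


total_3, total_2, avg_sec = summary_row(rows)
```

The totals are 23,119 solid or fuzzy 3-cells and 59,076 boundary 2-cells, and the average runtime is about 0.49 seconds, in agreement with the last row of the table.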

We remark that only solid and fuzzy cells are counted here. A comparable number of cells in the empty space are actually generated by our computational environment, but they are not considered in this experiment, which is aimed at understanding the model generation. The empty cells are very useful for applications in industrial automation and motion planning, since the associated BSP tree allows for fast computation of the distance field (from the model boundary) and fast point location.
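Point location on a labeled BSP tree amounts to a single descent from the root to a leaf. The following sketch assumes a hypothetical encoding in which a leaf is a label string and an internal node is a tuple `(normal, offset, below, above)`; it does not cover the distance-field computation.

```python
def locate(tree, point):
    """Return the label of the leaf cell of `tree` that contains `point`.

    A leaf is a label string ("IN", "OUT" or "FUZZY"); an internal node is
    a tuple (normal, offset, below, above) for the hyperplane
    normal . x = offset. (Hypothetical encoding.)
    """
    while not isinstance(tree, str):
        normal, offset, below, above = tree
        s = sum(a * x for a, x in zip(normal, point)) - offset
        # Points on the hyperplane are sent to the 'above' side by convention.
        tree = above if s >= 0 else below
    return tree


# A solid slab 0 <= x < 1 embedded in empty space.
slab = ((1.0, 0.0, 0.0), 0.0, "OUT",
        ((1.0, 0.0, 0.0), 1.0, "IN", "OUT"))
```

The cost of a query is proportional to the depth of the leaf reached, which is why keeping the empty cells in the tree pays off for applications such as motion planning.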


Table 2: Model resolution and runtimes, for number of nodes n = 1

               low resolution          medium resolution         high resolution
Node       3-cells  2-cells    Sec  3-cells  2-cells    Sec  3-cells  2-cells    Sec
1           22,537   57,003   2.07   69,845  121,648   7.30  151,216  236,318  18.09

Table 3: Model resolution and runtimes, for number of nodes n = 2

               low resolution          medium resolution         high resolution
Node       3-cells  2-cells    Sec  3-cells  2-cells    Sec  3-cells  2-cells    Sec
1           15,809   36,715   1.35   46,879   76,457   3.89   99,371  147,891   9.43
2            7,150   21,589   0.64   23,743   47,424   1.98   53,192   92,241   4.02
Tot./avg    22,959   58,304   1.00   70,622  123,881   2.94  152,563  240,132   6.73

Table 4: Model resolution and runtimes, for number of nodes n = 4

               low resolution          medium resolution         high resolution
Node       3-cells  2-cells    Sec  3-cells  2-cells    Sec  3-cells  2-cells    Sec
1            7,952   18,551   0.64   23,490   38,493   1.90   49,751   74,272   3.67
2            8,017   18,788   0.67   23,591   38,762   1.90   49,894   74,583   3.71
3            3,626   11,093   0.32   11,956   24,067   1.00   26,719   46,509   2.02
4            3,524   10,644   0.32   11,787   23,569   0.94   26,473   45,944   1.97
Tot./avg    23,119   59,076   0.49   70,824  124,891   1.43  152,837  241,308   2.84

Table 5: Model resolution and runtimes, for number of nodes n = 8

               low resolution          medium resolution         high resolution
Node       3-cells  2-cells    Sec  3-cells  2-cells    Sec  3-cells  2-cells    Sec
1            1,755    5,314   0.17    5,883   11,788   0.49   13,221   22,971   0.95
2            4,016    9,563   0.34   11,860   19,692   0.94   25,071   37,737   1.87
3            1,769    5,397   0.14    5,904   11,880   0.49   13,252   23,072   0.97
4            4,029    9,395   0.32   11,789   19,353   0.94   24,916   37,210   1.85
5            1,798    5,532   0.17    5,953   12,023   0.46   13,324   23,232   0.99
6            4,073    9,530   0.34   11,854   19,515   0.94   25,004   37,402   1.82
7            1,828    5,629   0.14    6,003   12,144   0.49   13,395   23,377   0.99
8            3,907    9,191   0.34   11,694   19,261   0.89   24,840   37,234   1.83
Tot./avg    23,175   59,551   0.25   70,940  125,656   0.70  153,023  242,235   1.41

Table 6: Model resolution and runtimes, for number of nodes n = 16

               low resolution          medium resolution         high resolution
Node       3-cells  2-cells    Sec  3-cells  2-cells    Sec  3-cells  2-cells    Sec
1            1,673    5,013   0.17      417    1,218   0.05   12,411   21,215   0.91
2              148      516   0.06      785    1,870   0.09    1,098    2,515   0.12
3            2,162    3,952   0.19    1,321    3,096   0.13   10,387   13,062   1.36
4            2,050    6,217   0.19    1,523    3,663   0.14   15,016   25,605   2.25
5            1,593    4,863   0.15    4,554    8,725   0.42   11,563   20,155   1.63
6              258      806   0.05    4,660    9,040   0.39    2,000    3,770   0.26
7            2,545    4,887   0.20    4,811    9,513   0.40   12,759   17,279   0.86
8            1,639    4,991   0.19    5,234    7,128   0.37   12,668   21,279   0.97
9            1,445    4,528   0.14    5,281   10,496   0.45   10,386   18,011   0.79
10             430    1,322   0.06    5,558    7,904   0.42    3,251    6,182   0.26
11           2,355    4,318   0.18    5,594   10,955   0.45   11,028   14,700   0.76
12           1,881    5,718   0.19    5,775   11,006   0.49   14,449   24,003   1.10
13           1,380    4,317   0.14    6,332    9,218   0.47   10,113   17,150   0.78
14             542    1,645   0.08    6,591   12,436   0.58    3,603    7,179   0.26
15           2,760    5,696   0.26    6,847   13,206   0.60   15,214   21,278   1.10
16           1,284    3,909   0.15    7,381   11,207   0.55   10,023   17,033   0.77
Tot./avg    24,145   62,698   0.15   72,664  130,681   0.37  155,969  250,416   0.89