
Page 1:

Partitioning

1998. 5. 19

조 준 동

VADA Lab., SungKyunKwan Univ.

Page 2:

Partitioning in VLSI CAD

• Partitioning is a technique widely used to solve diverse problems in VLSI CAD. Applications of partitioning can be found in logic synthesis, logic optimization, testing, and layout synthesis.

• High-quality partitioning is critical in high-level synthesis. To be useful, high-level synthesis algorithms must be able to handle very large systems. Typically, designers partition high-level design specifications manually into procedures, each of which is then synthesized individually. However, this logical decomposition of the design into procedures may not be appropriate for high-level and logic-level synthesis [60]. Different partitionings of the high-level specification can produce substantial differences in the resulting IC chip area and overall system performance.

• Some technology mapping programs use partitioning techniques to map a circuit, specified as a network of modules performing simple Boolean operations, onto a network composed of the specific modules available in an FPGA.

Page 3:

Partitioning in VLSI CAD

• Since test generation for large circuits can be extremely expensive computationally, circuit partitioning may provide a means to speed it up. In general, the test pattern generation problem is NP-complete. To date, all test generation algorithms that guarantee finding a test for a given fault exhibit worst-case behavior requiring CPU time that grows exponentially with circuit size. If the circuit can be partitioned into k parts (k not fixed), each of bounded size c, then the worst-case test generation time grows only linearly with the circuit size.

• Partitioning is often used in layout synthesis to produce and/or improve the placement of circuit modules. Partitioning finds strongly connected subcircuits in the design, and some placement algorithms use this information to place the components of such subcircuits close together, thereby minimizing delays and routing lengths.

Page 4:

Partitioning in VLSI CAD

• Another important class of partitioning problems occurs at the system design level. Since IC packages can hold only a limited number of logic components and external terminals, the components must be partitioned into subcircuits small enough to be implemented in the available packages.

• Partitioning has also been used to estimate properties of physical IC designs, such as the expected IC area.

Page 5:

Circuit Partitioning

• Early attempts to solve the circuit partitioning problem were based on representing the circuit as a graph G = (V,E), where V is the set of nodes (vertices) representing the fundamental components, such as gates, flip-flops, inputs and outputs, and E is the set of edges representing the nets of the network. Graph partitioning problems that model VLSI design problems usually involve separating the set of graph nodes into disjoint subsets while optimizing some objective function defined on the graph vertices and edges. In the partitioned graph, edges fall into two classes: inter-subset edges, whose endpoints belong to different subsets, and intra-subset edges, whose endpoints belong to the same subset. The objective functions of graph partitioning problems usually treat these two classes of edges differently.

• One classic graph partitioning problem is the minimum cut (mincut) problem. Its objective is to divide V into two disjoint parts, U and W, such that the number of inter-subset edges is minimized. The set of edges e(U,W) is referred to as the cut set, and the number of edges in the cut set as the cut value.
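As a concrete illustration of these definitions, here is a minimal Python sketch (not from the slides; the small graph and the partitions below are hypothetical) that counts the inter-subset edges of a two-way partition:

    def cut_value(edges, U):
        """Number of inter-subset edges: edges with exactly one endpoint in U."""
        U = set(U)
        return sum(1 for (i, j) in edges if (i in U) != (j in U))

    # Hypothetical 6-node circuit graph
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
    print(cut_value(edges, U={0, 1, 2}))   # -> 1: only edge (2, 3) crosses the cut
    print(cut_value(edges, U={0, 2, 4}))   # -> 5: a much worse bipartition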

Page 6:

Circuit Partitioning

• graph and physical representation

Page 7:

VHDL Example

• Behavioral description
• Control/data flow graph
• Process communication

Page 8:

Mincut Partitioning

• An exact solution to the mincut problem was provided by Ford and Fulkerson [11], who transformed the mincut problem into the maximum flow (maxflow) problem. The maxflow-mincut algorithm finds a maximum flow in a network; the maxflow value is equal to the mincut value. The first heuristic algorithm for two-way graph partitioning into equal-sized subsets was proposed by Kernighan and Lin. Their method chooses an initial partition randomly and reduces the cut value by exchanging appropriately selected pairs of nodes between the subsets. After being exchanged, nodes are locked in their new positions. In subsequent steps, pairs of unlocked nodes are selected and exchanged until all nodes are locked. The algorithm stops when it reaches a local minimum.

• Most nets in digital circuits are multi-point connections among more than two modules (logic gates, flip-flops, etc.). Therefore, modeling VLSI circuit partitioning problems as graph partitioning problems may lead to poor results caused by the inadequate representation of multi-point nets, which have to be decomposed into two-point connections. One way to approximate circuit partitioning problems is to transform the circuit into a weighted graph G' via a net model. For example, a multi-point net connecting n nodes may be modeled as a complete graph (clique) spanned on those nodes, i.e., containing all possible edges among them.
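A minimal sketch of the clique net model (the nets below are hypothetical, and the per-edge weight 2/n is an assumed common choice; the slides do not specify a weighting):

    from itertools import combinations
    from collections import defaultdict

    def nets_to_weighted_graph(nets):
        """nets: list of iterables of node ids. Returns {(u, v): weight}."""
        weights = defaultdict(float)
        for net in nets:
            nodes = sorted(set(net))
            n = len(nodes)
            if n < 2:
                continue
            w = 2.0 / n          # assumed per-edge weight; other net models use 1/(n-1), etc.
            for u, v in combinations(nodes, 2):
                weights[(u, v)] += w
        return dict(weights)

    print(nets_to_weighted_graph([("a", "b", "c"), ("c", "d")]))
    # each 3-pin net becomes a triangle of weighted edges; the 2-pin net stays a single edge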

Page 9:

Clustering (Cont'd)

• Clustering based on criterion B below the first cut-line, then criterion A

• Clustering based on criterion A below the second cut-line, then criterion B

Page 10:

Clustering Example

• Two-cluster partition

• Three-cluster partition

Page 11:

Survey on Partitioning

• We discuss the traditional min-cut and ratio-cut bipartitioning formulations along with multi-way extensions and newer problem formulations, e.g., constraint-driven partitioning (for FPGAs) and partitioning with module replication. Our discussion of solution approaches is divided into four major categories: move-based approaches, geometric representations, combinatorial formulations, and clustering approaches. Move-based algorithms iteratively explore the space of feasible solutions according to a neighborhood operator; such methods include greedy improvement, iterative exchange, simulated annealing, and evolutionary algorithms. Algorithms based on geometric representations embed the circuit netlist in some type of "geometry", e.g., a 1-dimensional linear ordering or a multi-dimensional vector space; the embeddings are commonly constructed using spectral methods. Combinatorial methods transform the partitioning problem into another type of optimization, e.g., one based on network flows or mathematical programming. Finally, clustering algorithms merge the netlist modules into many small clusters; we discuss methods that combine clustering with existing algorithms (e.g., two-phase partitioning).

Page 12:

Survey on Partitioning

• The F-M (Fiduccia-Mattheyses) partitioning algorithm is perhaps the most widely adopted algorithm, due to its linear time complexity, its efficiency, and its ease of implementation. Many enhancements of the algorithm have been proposed. Both Krishnamurthy and Ng et al. reported that the quality of the solutions produced by the F-M algorithm is very erratic for circuit partitioning. Subsequently, Krishnamurthy extended the Fiduccia-Mattheyses implementation with a look-ahead technique, which considerably improved the average performance. Sanchis extended this work to partition hypergraphs into k parts. Sechen proposed a new, improved objective function for mincut circuit partitioning, based on a statistical model that estimates the expected number of net crossings of the cut line. Many further improvements of the F-M algorithm have been published, using techniques such as clustering, replication, and other refinements of the basic F-M heuristic.

Page 13:

Survey on Partitioning

• An important class of partitioning approaches consists of so-called constructive methods, among which methods based on graph spectra have received the most attention to date. They use eigenvalues and eigenvectors of matrices derived from the netlist graph. Early theoretical work by Barnes, Donath, and Hoffman established the relationship between the spectral properties and the partitioning properties of a graph. More recently, eigenvector and eigenvalue methods have been used for both component placement and graph minimum-width bisection. Hadley et al. used an eigenvector approach to obtain good initial partitions of the netlist as starting solutions for a subsequent iterative improvement algorithm. Hagen and Kahng applied eigenvector decomposition of the graph to the ratio-cut partitioning problem. They found that the second smallest eigenvalue of a matrix representation of the graph yields a lower bound on the optimal ratio-cut cost. More recent work by Alpert et al. and Chan et al. showed that more extensive eigenvector computation leads to better partitioning results. Other constructive partitioning approaches are based on placement techniques, vertex orderings and clustering, dynamic and Boolean programming, and geometric embeddings.

Page 14:

Complexity of Partitioning

In general, computing the optimal partitioning is an NP-complete problem, which means that the best known algorithms take time that is an exponential function of n = |N| and p, and it is widely believed that no algorithm whose running time is a polynomial function of n = |N| and p exists (see "Computers and Intractability", M. Garey and D. Johnson, W. H. Freeman, 1979, for details). Therefore we need heuristics to get approximate solutions when n is large. The figure below illustrates a larger graph partitioning problem; it was generated using the spectral partitioning algorithm as implemented in the graph partitioning software by Gilbert et al., described below. The partition is N = Nblue U Nblack, with red edges connecting nodes in the two partitions.

Page 15:

Chaco

• Before a calculation can be performed on a parallel computer, it must first be decomposed into tasks which are assigned to different processors. Efficient use of the machine requires that each processor have about the same amount of work to do and that the quantity of interprocessor communication is kept small.

• Partitioning G means dividing N into the union of P disjoint pieces N = N1 U N2 U ... U NP, where the nodes (jobs) in Ni are assigned to processor Pi. This partitioning is done subject to the optimality conditions below; a small evaluation sketch follows this slide's bullets.

– 1. The sums of the weights Wn of the nodes n in each Ni are approximately equal. This means the load is approximately balanced across processors.

– 2. The sum of the weights We of the edges connecting nodes in different Ni and Nj should be minimized. This means that the total size of all messages communicated between different processors is minimized.

• Chaco is used at over 150 institutions for parallel computing, sparse matrix reordering, circuit placement and a range of other applications. More information about Chaco can be found at [email protected]
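To make the two optimality conditions concrete, here is a minimal sketch (not part of Chaco; the node weights, edge weights, and assignment below are hypothetical) that evaluates load balance and total cut weight for a given assignment of nodes to processors:

    from collections import defaultdict

    def evaluate_partition(node_weight, edge_weight, part):
        """part maps node -> processor id; returns (per-processor load, cut weight)."""
        load = defaultdict(float)
        for n, w in node_weight.items():
            load[part[n]] += w
        cut = sum(w for (i, j), w in edge_weight.items() if part[i] != part[j])
        return dict(load), cut

    node_weight = {0: 1, 1: 1, 2: 2, 3: 1, 4: 1, 5: 2}
    edge_weight = {(0, 1): 1, (1, 2): 1, (2, 3): 3, (3, 4): 1, (4, 5): 1}
    part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # two processors
    print(evaluate_partition(node_weight, edge_weight, part))
    # -> ({0: 4, 1: 4}, 3): balanced loads, cut weight 3 from edge (2, 3)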

Page 16:

Chaco

• A good solution to the graph partitioning problem assigns nodes to processors so that:

• 1. The sums of node weights are approximately equal for each processor. This means that each processor has an equal amount of floating point work to do, i.e., the problem is load balanced.

• 2. As few edges cross processor boundaries as possible. This minimizes communication, since each crossing edge Ai,j means that xj must be sent to the processor owning xi.

• The figure below illustrates such a partitioning onto 4 processors (colored blue, red, green and magenta). Crossing edges, which require communication, are colored black, and noncrossing edges, which require no communication, have the same color as their processor.

Page 17:

Chaco

• Given n nodes and p processors, there are exponentially many ways to assign the n nodes to the p processors, some of which satisfy the optimality conditions more nearly than others. To illustrate, the following figure shows two partitions of a graph with 8 nodes onto 4 processors, with 2 nodes per processor. The partitioning on the left has 6 edges crossing processor boundaries and so is superior to the partitioning on the right, which has 10 edges crossing processor boundaries. The reader is invited to find another 6-edge-crossing partition, and to show that no partition with fewer edge crossings exists.

Page 18:

Edge Separator and Vertex Separator

Bisecting a graph G=(N,E) can be done in two ways. In the last section, we discussed finding the smallest subset Es of E such that removing Es from E divides G into two disconnected subgraphs G1 and G2, with nodes N1 and N2 respectively, where N1 U N2 = N and N1 and N2 are disjoint and equally large. (If the number of nodes is odd, we obviously cannot make |N1| = |N2|. So we will call Es an edge separator if |N1| and |N2| are sufficiently close; we will be more explicit about how different |N1| and |N2| may be only when necessary.) The edges in Es connect nodes in N1 to nodes in N2. Since removing Es disconnects G, Es is called an edge separator. The other way to bisect a graph is to find a vertex separator, a subset Ns of N, such that removing Ns and all incident edges from G also results in two disconnected subgraphs G1 and G2 of G. In other words, N = N1 U Ns U N2, where all three subsets of N are disjoint, N1 and N2 are equally large, and no edges connect N1 and N2.

The following figure illustrates these ideas. The green edges, Es1, form an edge separator, as do the blue edges, Es2. The red nodes, Ns, are a vertex separator, since removing them and the incident edges (Es1, Es2, and the purple edges) leaves two disjoint subgraphs.

Theorem (Lipton, Tarjan, "A separator theorem for planar graphs", SIAM J. Appl. Math., 36:177-189, April 1979). Let G=(N,E) be a planar graph. Then we can find a vertex separator Ns, so that N = N1 U Ns U N2 is a disjoint partition of N, |N1| <= (2/3)*|N|, |N2| <= (2/3)*|N|, and |Ns| <= sqrt(8*|N|).

Page 19:

Inertial Partitioning

1. Choose a straight line L, given by a*(x-xbar) + b*(y-ybar) = 0. This is a straight line through (xbar,ybar) with slope -a/b. We assume without loss of generality that a^2 + b^2 = 1.

2. For each node ni = (xi,yi), compute the coordinate Si = -b*(xi-xbar) + a*(yi-ybar). Si is the distance from (xbar,ybar) of the projection of (xi,yi) onto the line L.

3. Find the median value Sbar of the Si's.

4. Let the nodes (xi,yi) satisfying Si <= Sbar be in partition N1, and the nodes with Si > Sbar be in partition N2.

Page 20:

Inertial Partitioning

In mathematical terms, we want to pick a line such that the sum of squares of the lengths of the green lines in the figure is minimized; this is also called a total least squares fit of a line to the nodes. In physical terms, if we think of the nodes as unit masses, we choose the line to be the axis about which the moment of inertia of the nodes is minimized. This is why the method is called inertial partitioning. This means choosing a, b, xbar and ybar so that a^2 + b^2 = 1 and the following quantity is minimized:

  sum_{i=1..|N|} (length of i-th green line)^2
    = sum_{i=1..|N|} [ (xi-xbar)^2 + (yi-ybar)^2 - ( -b*(xi-xbar) + a*(yi-ybar) )^2 ]   ... by the Pythagorean theorem
    = a^2 * ( sum_{i=1..|N|} (xi-xbar)^2 ) + b^2 * ( sum_{i=1..|N|} (yi-ybar)^2 ) + 2*a*b * ( sum_{i=1..|N|} (xi-xbar)*(yi-ybar) )
    = a^2 * X2 + b^2 * Y2 + 2*a*b * XY
    = [ a b ] * [ X2  XY ] * [ a ]
                [ XY  Y2 ]   [ b ]
    = [ a b ] * M * [ a ]
                    [ b ]

where X2, Y2 and XY are the summations in the previous lines. One can show that the answer is to choose xbar = sum_{i=1..|N|} xi / |N| and ybar = sum_{i=1..|N|} yi / |N|, i.e. (xbar,ybar) is the "center of mass" of the nodes, and (a,b) is the unit eigenvector corresponding to the smallest eigenvalue of the matrix M.
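A minimal runnable sketch of the whole method (assuming numpy is available; the coordinates below are hypothetical): compute the center of mass, form the 2x2 inertia matrix M, take the eigenvector of its smallest eigenvalue as (a,b), and split at the median projection.

    import numpy as np

    def inertial_partition(coords):
        """coords: (n, 2) array of node positions. Returns boolean mask for N1."""
        coords = np.asarray(coords, dtype=float)
        center = coords.mean(axis=0)              # (xbar, ybar), the center of mass
        d = coords - center
        X2, Y2 = (d[:, 0] ** 2).sum(), (d[:, 1] ** 2).sum()
        XY = (d[:, 0] * d[:, 1]).sum()
        M = np.array([[X2, XY], [XY, Y2]])
        vals, vecs = np.linalg.eigh(M)            # symmetric matrix -> eigh, ascending eigenvalues
        a, b = vecs[:, 0]                         # eigenvector of the smallest eigenvalue
        S = -b * d[:, 0] + a * d[:, 1]            # projection coordinates Si
        return S <= np.median(S)                  # N1 = nodes at or below the median

    coords = [(0, 0), (1, 0.1), (2, 0), (3, 0.1), (4, 0), (5, 0.1)]
    print(inertial_partition(coords))
    # splits the roughly horizontal chain into its left and right halves
    # (which half is marked True depends on the sign of the eigenvector)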

Page 21:

Partitioning on Planar Graphs

• A very simple partitioning algorithm is based on breadth first search (BFS) of a graph. It is reasonably effective on planar graphs, and probably does well on overlap graphs as defined above. Given a connected graph G=(N,E) and a distinguished node r in N that we call the root, breadth first search produces a subgraph T of G (with the same nodes and a subset of the edges), where T is a tree with root r. In addition, it associates a level with each node n, which is the number of edges on the path from r to n in T. The implementation requires a data structure called a Queue, or First-In-First-Out (FIFO) list, which holds the objects to be processed. There are two operations on a Queue: Enqueue(x) adds an object x to the left end of the Queue, and y = Dequeue() removes the rightmost entry of the Queue and returns it in y. In other words, if x1, x2, ..., xk are Enqueued in that order, then k consecutive Dequeue operations (possibly interleaved with the Enqueue operations) return x1, x2, ..., xk.

  NT = {(r,0)}              ... initially T is just the root r, which is at level 0
  ET = empty set            ... T = (NT, ET) at each stage of the algorithm
  Enqueue((r,0))            ... the Queue is the list of nodes to be processed
  Mark r                    ... mark the root r as having been processed
  While the Queue is nonempty               ... while nodes remain to be processed
      (n, level) = Dequeue()                ... get a node to process
      For all unmarked children c of n
          NT = NT U {(c, level+1)}          ... add child c to the node list NT of T
          ET = ET U {(n, c)}                ... add the edge (n,c) to the edge list ET of T
          Enqueue((c, level+1))             ... add child c to the Queue for later processing
          Mark c                            ... mark c as having been visited
      End for
  End while

Page 22:

Breadth First Search

• Partitioning the graph into nodes at level L or lower, and nodes at level L+1 or higher, guarantees that only tree edges and interlevel edges will be cut. There can be no "extra" edges connecting, say, the root to the leaves of the tree. This is illustrated in the figure above, where the 10 nodes above the dotted blue line are assigned to partition N1, and the 10 nodes below the line are assigned to N2.

• For example, suppose one had an n-by-n mesh with unit distance between nodes. Choose any node r as the root from which to build a BFS tree. Then the nodes at level L or lower approximately form a diamond centered at r with a diagonal of length 2*L. This is shown below, where nodes are visited counterclockwise starting with the north.

Page 23:

Kernighan and Lin Algorithm

• B. Kernighan and S. Lin ("An efficient heuristic procedure for partitioning graphs", The Bell System Technical Journal, pp. 291-308, Feb. 1970) gave an algorithm which takes O(|N|^3) time per iteration. A more complicated and efficient implementation, which takes only O(|E|) time per iteration, was presented by C. Fiduccia and R. Mattheyses, "A linear-time heuristic for improving network partitions", Technical Report 82CRD130, General Electric Co., Corporate Research and Development Center, Schenectady, NY, 1982.

• We start with an edge-weighted graph G=(N,E,WE) and a partitioning N = A U B into equal parts: |A| = |B|. Let w(e) = w(i,j) be the weight of edge e=(i,j), where the weight is 0 if no edge e=(i,j) exists. The goal is to find equal-sized subsets X in A and Y in B such that exchanging X and Y reduces the total cost of edges from A to B. More precisely, we let

  T = sum[ a in A, b in B ] w(a,b) = cost of edges from A to B

and seek X and Y such that new_A = A - X U Y and new_B = B - Y U X has a lower cost new_T. To compute new_T efficiently, we introduce:

  E(a) = external cost of a = sum[ b in B ] w(a,b)
  I(a) = internal cost of a = sum[ a' in A, a' != a ] w(a,a')
  D(a) = cost of a = E(a) - I(a)

and analogously

  E(b) = external cost of b = sum[ a in A ] w(a,b)
  I(b) = internal cost of b = sum[ b' in B, b' != b ] w(b,b')
  D(b) = cost of b = E(b) - I(b)

Then it is easy to show that swapping a in A and b in B changes T to

  new_T = T - ( D(a) + D(b) - 2*w(a,b) ) = T - gain(a,b)

In other words, gain(a,b) = D(a) + D(b) - 2*w(a,b) measures the improvement in the partitioning from swapping a and b. The remaining D values also change:

  new_D(a') = D(a') + 2*w(a',a) - 2*w(a',b)   for all a' in A, a' != a
  new_D(b') = D(b') + 2*w(b',b) - 2*w(b',a)   for all b' in B, b' != b
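A minimal sketch (not the original KL code; the weighted graph and the partition below are hypothetical) that computes D values and the swap gain exactly as defined above:

    def D(node, side, other, w):
        """D(n) = E(n) - I(n): external minus internal cost of a node on `side`."""
        external = sum(w.get(frozenset((node, m)), 0) for m in other)
        internal = sum(w.get(frozenset((node, m)), 0) for m in side if m != node)
        return external - internal

    def gain(a, b, A, B, w):
        """gain(a,b) = D(a) + D(b) - 2*w(a,b): cut-cost reduction from swapping a and b."""
        return D(a, A, B, w) + D(b, B, A, w) - 2 * w.get(frozenset((a, b)), 0)

    # Hypothetical weighted graph on nodes 0..3 with partition A = {0, 1}, B = {2, 3}
    w = {frozenset((0, 1)): 1, frozenset((0, 2)): 2,
         frozenset((1, 3)): 2, frozenset((2, 3)): 1}
    A, B = {0, 1}, {2, 3}
    print(gain(0, 3, A, B, w))   # -> 2: swapping 0 and 3 reduces the cut cost from 4 to 2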

Page 24:

Page 25:

Kernighan and Lin Algorithm

  (0) Compute T = cost of partition N = A U B                          ... cost = O(|N|^2)
  Repeat
      (1) Compute costs D(n) for all n in N                            ... cost = O(|N|^2)
      (2) Unmark all nodes in G                                        ... cost = O(|N|)
      (3) While there are unmarked nodes                               ... |N|/2 iterations
          (3.1) Find an unmarked pair (a,b) maximizing gain(a,b)       ... cost = O(|N|^2)
          (3.2) Mark a and b (but do not swap them)                    ... cost = O(1)
          (3.3) Update D(n) for all unmarked n, as though a and b had been swapped   ... cost = O(|N|)
      End while
      ... At this point we have computed a sequence of pairs (a1,b1), ..., (ak,bk)
      ... and gains gain(1), ..., gain(k), where k = |N|/2, ordered by the order
      ... in which we marked them
      (4) Pick j maximizing Gain = sum_{i=1..j} gain(i)
          ... Gain is the reduction in cost from swapping (a1,b1), ..., (aj,bj)
      (5) If Gain > 0 then
          (5.1) Update A = A - {a1,...,aj} U {b1,...,bj}               ... cost = O(|N|)
          (5.2) Update B = B - {b1,...,bj} U {a1,...,aj}               ... cost = O(|N|)
          (5.3) Update T = T - Gain                                    ... cost = O(1)
      End if
  Until Gain <= 0

Page 26:

Spectral Partitioning

• This is a powerful but expensive technique, based on techniques introduced by Fiedler in the 1970s, but popularized in 1990 by A. Pothen, H. Simon, and K.-P. Liou, "Partitioning sparse matrices with eigenvectors of graphs", SIAM J. Matrix Anal. Appl., 11:430-452. We will first describe the algorithm, and then give three related justifications for its efficacy. Let G=(N,E) be an undirected, unweighted graph without self edges (i,i) or multiple edges from one node to another. We define two matrices related to this graph.

• Definition. The incidence matrix In(G) of G is an |N|-by-|E| matrix, with one row for each node and one column for each edge. Suppose edge e=(i,j). Then column e of In(G) is zero except for the i-th and j-th entries, which are +1 and -1, respectively.

• Note that there is some ambiguity in this definition, since G is undirected; writing edge e as (i,j) instead of (j,i) is equivalent to multiplying column e of In(G) by -1. We will see that this ambiguity is not important for our purposes.

• Definition. The Laplacian matrix L(G) of G is an |N|-by-|N| symmetric matrix, with one row and column for each node. It is defined by (L(G))(i,j) = degree of node i (number of incident edges) if i = j; = -1 if i != j and there is an edge (i,j); = 0 otherwise.
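A minimal sketch (assuming numpy; the edge list is hypothetical) that builds In(G) and L(G) from an edge list and checks the identity In(G)*In(G)' = L(G) stated on the following slides:

    import numpy as np

    def incidence_and_laplacian(n, edges):
        """n nodes labeled 0..n-1; edges is a list of pairs (i, j), i != j, no multiple edges."""
        In = np.zeros((n, len(edges)))
        L = np.zeros((n, n))
        for col, (i, j) in enumerate(edges):
            In[i, col], In[j, col] = 1, -1      # the sign choice per column is arbitrary
            L[i, i] += 1                        # diagonal = node degree
            L[j, j] += 1
            L[i, j] = L[j, i] = -1              # off-diagonal = -1 where an edge exists
        return In, L

    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]    # hypothetical 4-node graph
    In, L = incidence_and_laplacian(4, edges)
    print(np.allclose(In @ In.T, L))            # -> True: In(G)*In(G)' = L(G)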

Page 27:

Spatial Locality: Hardware Partitioning

• The interface logic should be properly partitioned for area and timing reasons. Minimizing global busses leads to lower bus capacitance and thus lower interconnect power.

• Signal values within clusters tend to be more highly correlated.

• The data path should be partitioned into pieces of approximately equal size.

• In the DSP area, data paths tend to occupy far more area than the control paths.

• Wiring is still one of the dominant area consumers.

• The method used to identify clusters is based on the eigenvalues and eigenvectors of the Laplacian of the graph.

• The eigenvector corresponding to the second smallest eigenvalue provides a 1-D placement of the nodes which minimizes the mean-squared connection length.

Page 28:

Spectral Partitioning in VLSI placement

Page 29:

Spectral Partitioning in VLSI Placement

• Setting the derivative of the Lagrangian L to zero gives:

  (Q - lambda*I) x = 0

• The solutions of this equation are the pairs in which lambda is an eigenvalue of Q and x is the corresponding eigenvector.

• The smallest eigenvalue, 0, gives a trivial solution with all nodes at the same point. The eigenvector corresponding to the second smallest eigenvalue minimizes the cost function while giving a non-trivial solution.

Page 30:

Key Ideas in Spectral Partitioning

Page 31:

Spectral Partitioning

Page 32:

Spectral Partitioning

The following theorem states some important facts about In(G) and L(G). It introduces the idea that the eigenvalues and eigenvectors of L(G) are related to the connectivity of G.

Theorem 1. Given a graph G, its associated matrices In(G) and L(G) have the following properties.

1. L(G) is a symmetric matrix. This means the eigenvalues of L(G) are real, and its eigenvectors are real and orthogonal.

2. Let e = [1,...,1]', where ' means transpose, i.e. the column vector of all ones. Then L(G)*e = 0.

3. In(G)*(In(G))' = L(G). This is independent of the signs chosen in each column of In(G).

4. Suppose L(G)*v = lambda*v, where v is nonzero. Then

   lambda = norm(In(G)'*v)^2 / norm(v)^2,  where norm(z)^2 = sum_i z(i)^2
          = ( sum_{all edges e=(i,j)} (v(i)-v(j))^2 ) / ( sum_i v(i)^2 )

5. The eigenvalues of L(G) are nonnegative: 0 <= lambda1 <= lambda2 <= ... <= lambdan.

6. The number of connected components of G is equal to the number of eigenvalues lambdai equal to 0. In particular, lambda2 != 0 if and only if G is connected.

Page 33:

Spectral Partitioning

  Compute the eigenvector v2 corresponding to lambda2 of L(G)
  For each node n of G
      if v2(n) < 0, put node n in partition N-
      else put node n in partition N+
  End for

First we show that this partition is at least reasonable, because it tends to give connected components N- and N+:

Theorem 2. (M. Fiedler, "A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory", Czech. Math. J. 25:619-637, 1975.) Let G be connected, and let N- and N+ be defined by the above algorithm. Then N- is connected. If no v2(n) = 0, N+ is also connected.

There are a number of reasons lambda2 is called the algebraic connectivity. Here is another.

Theorem 3. (Fiedler). Let G=(N,E) be a graph and G1=(N,E1) a subgraph, i.e. with the same nodes and a subset of the edges, so that G1 is "less connected" than G. Then lambda2(L(G1)) <= lambda2(L(G)), i.e. the algebraic connectivity of G1 is less than or equal to the algebraic connectivity of G.

Motivation for spectral bisection, by analogy with a vibrating string: how does a taut string vibrate when it is plucked? From our background in either physics or music, we know that it has certain modes of vibration, or harmonics. If we were to take snapshots of these modes, they would look like this:
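The algorithm at the top of this slide can be written in a few lines. Here is a minimal sketch (assuming numpy; the graph below is hypothetical):

    import numpy as np

    def spectral_bisection(n, edges):
        """Split nodes 0..n-1 by the sign of the Fiedler vector v2 of L(G)."""
        L = np.zeros((n, n))
        for i, j in edges:
            L[i, i] += 1
            L[j, j] += 1
            L[i, j] -= 1
            L[j, i] -= 1
        vals, vecs = np.linalg.eigh(L)       # eigenvalues returned in ascending order
        v2 = vecs[:, 1]                      # eigenvector of the second smallest eigenvalue
        N_minus = [k for k in range(n) if v2[k] < 0]
        N_plus = [k for k in range(n) if v2[k] >= 0]
        return N_minus, N_plus

    # Hypothetical graph: two triangles joined by the single edge (2, 3)
    edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
    print(spectral_bisection(6, edges))      # -> ([0, 1, 2], [3, 4, 5]) or the reverse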

Page 34:

Spectral Partitioning

Page 35:

Multilevel Kernighan-Lin

Gc is computed in step (1) of Recursive_partition as follows. We define a matching of a graph G=(N,E) as a subset Em of the edges E with the property that no two edges in Em share an endpoint. A maximal matching is one to which no more edges can be added while remaining a matching. We can compute a maximal matching by a simple random algorithm:

  let Em be empty
  mark all nodes in N as unmatched
  for i = 1 to |N|                               ... visit the nodes in a random order
      if node i has not been matched
          choose an edge e=(i,j) where j is also unmatched (if such an edge exists), and add it to Em
          mark i and j as matched
      end if
  end for

Given a matching, Gc is computed as follows. We let there be a node r in Nc for each edge in Em. Then we construct Ec as follows:

  for r = 1 to |Em|                              ... for each node in Nc
      let (i,j) be the edge in Em corresponding to node r
      for each other edge e=(i,k) in E incident on i
          let ek be the edge in Em incident on k, and let rk be the corresponding node in Nc
          add the edge (r,rk) to Ec
      end for
      for each other edge e=(j,k) in E incident on j
          let ek be the edge in Em incident on k, and let rk be the corresponding node in Nc
          add the edge (r,rk) to Ec
      end for
  end for
  if there are multiple edges between pairs of nodes of Nc, collapse them into single edges

Page 36:

Multilevel Kernighan-Lin

Note that we can take node weights into account by letting the weight of a node (i,j) in Nc be the sum of the weights of the nodes i and j. We can similarly take edge weights into account by letting the weight of an edge in Ec be the sum of the weights of the edges "collapsed" into it. Furthermore, we can choose the edge (i,j) which matches j to i in the construction of Nc above to have the largest weight of all edges incident on i; this will tend to minimize the weights of the cut edges. This is called heavy edge matching in METIS, and is illustrated on the right.

Page 37:

Multilevel Kernighan-Lin

Given a partition (Nc+, Nc-) from step (2) of Recursive_partition, it is easily expanded to a partition (N+, N-) in step (3) by associating with each node in Nc+ or Nc- the nodes of N that comprise it. This is again shown below:

Finally, in step (4) of Recursive_partition, the approximate partition from step (3) is improved using a variation of Kernighan-Lin.

Page 38:

Multilevel Spectral Partitioning

Now we turn to the divide-and-conquer algorithm of Barnard and Simon, which is based on spectral partitioning rather than Kernighan-Lin. The expensive part of spectral bisection is finding the eigenvector v2, which requires a possibly large number of matrix-vector multiplications with the Laplacian matrix L(G) of the graph G. The divide-and-conquer approach of Recursive_partition will dramatically decrease the cost. Barnard and Simon perform step (1) of Recursive_partition, computing Gc = (Nc,Ec) from G=(N,E), slightly differently than above: they find a maximal independent subset Nc of N. This means that N contains Nc and E contains Ec, no nodes in Nc are directly connected by edges in E (independence), and Nc is as large as possible (maximality).

There is a simple "greedy" algorithm for finding an Nc:

  Nc = empty set
  for i = 1 to |N|
      if node i is not adjacent to any node already in Nc
          add i to Nc
      end if
  end for

This is shown below in the case where G is simply a chain of 9 nodes with nearest-neighbor connections, in which case Nc consists simply of every other node of N.
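A minimal sketch of the greedy maximal independent set step (the chain graph below is hypothetical and mirrors the 9-node example):

    def greedy_maximal_independent_set(n, edges):
        """Greedily pick nodes 0..n-1 that are not adjacent to any node already picked."""
        adj = {i: set() for i in range(n)}
        for i, j in edges:
            adj[i].add(j)
            adj[j].add(i)
        Nc = []
        chosen = set()
        for i in range(n):                   # visit the nodes in index order
            if not (adj[i] & chosen):        # independent of everything chosen so far
                Nc.append(i)
                chosen.add(i)
        return Nc

    chain = [(i, i + 1) for i in range(8)]   # chain of 9 nodes with nearest-neighbor edges
    print(greedy_maximal_independent_set(9, chain))   # -> [0, 2, 4, 6, 8]: every other node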

Page 39:

hMETIS

• hMETIS is a set of programs for partitioning hypergraphs such as those corresponding to VLSI circuits. The algorithms implemented by hMETIS are based on the multilevel hypergraph partitioning scheme described in [KAKS97].

• hMETIS produces bisections that cut 10% to 300% fewer hyperedges than those cut by other popular algorithms such as PARABOLI, PROP, and CLIP-PROP, especially for circuits with over 100,000 cells and circuits with non-unit cell areas. It is also extremely fast: a single run of hMETIS is faster than a single run of simpler schemes such as FM, KL, or CLIP. Furthermore, because of its very good average cut characteristics, it produces high-quality partitionings in significantly fewer runs. It can bisect circuits with over 100,000 vertices in a couple of minutes on Pentium-class workstations.

• The performance of hMETIS on the new ISPD98 benchmark suite can be found in the paper by Chuck Alpert.

http://www.users.cs.umn.edu/~karypis/metis/metis.html

Page 40:

How Good is Recursive Bisection?

• Horst D. Simon and Shang-Hua Teng, Report RNR-93-012, August 1993.

• The most commonly used p-way partitioning method is recursive bisection. It first "optimally" divides the graph (mesh) into two equal-sized pieces and then recursively divides the two pieces. We show that, due to its greedy nature and lack of global information, recursive bisection may, in the worst case, produce a partition that is very far from the optimal one. This negative result is complemented by two positive ones. First, we show that for some important classes of graphs that occur in practical applications, such as well-shaped finite element and finite difference meshes, recursive bisection is normally within a constant factor of the optimum. Second, we show that if the balance condition is relaxed so that each block in the partition is bounded by (1+e)n/p, then there exists an approximately balanced recursive partitioning scheme that finds a partition whose cost is within an O(log p) factor of the cost of the optimal p-way partition.

Page 41:

Partitioning Algorithm with Multiple Constraints

1998. 5. 19

조 준 동

Page 42:

Partitioning with Pin and Area Constraints

When the circuit is represented as a graph G(V,E), V is the set of all n nodes, V = {v_1, v_2, ..., v_n}, and each node v_i has area a_i. An edge e_ij connects nodes v_i and v_j, and E is the set of all edges between nodes. Graph partitioning divides the set of all nodes into k non-overlapping blocks V1, V2, ..., Vk. The blocks have areas A1, A2, ..., Ak and pin counts P1, P2, ..., Pk, respectively. Each block is subject to several constraints, including area and pin constraints: the maximum area a block may have is A_upper, the minimum area is A_lower, and the maximum number of pins is P_upper. C_ij denotes the sum of the weights of the edges connecting blocks Vi and Vj. The goal of partitioning is to satisfy these constraints while keeping the total weight of the edges connecting different blocks small. Letting K denote the set of k subgraphs, partitioning is the problem of finding an optimal mapping Γ: V -> K that satisfies the constraints and minimizes the following objective function:

  minimize    W = sum_{i=1..k} sum_{j=1..k, j != i} C_ij
  subject to  A_lower <= A_i <= A_upper  and  P_i <= P_upper,   i = 1, ..., k
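A minimal sketch (hypothetical data; not the algorithm of these slides) that evaluates this objective and checks the area and pin constraints for a candidate mapping Γ. Pins are approximated here as the number of crossing edges incident to a block, which is a simplification:

    from collections import defaultdict

    def evaluate(mapping, area, edges, A_lower, A_upper, P_upper):
        """mapping: node -> block id; edges: {(u, v): weight}. Returns (W, feasible)."""
        A = defaultdict(float)                      # block areas A_i
        pins = defaultdict(set)                     # crossing edges incident to each block
        W = 0.0
        for v, blk in mapping.items():
            A[blk] += area[v]
        for (u, v), w in edges.items():
            if mapping[u] != mapping[v]:
                W += w                              # contributes to the inter-block cost
                pins[mapping[u]].add((u, v))
                pins[mapping[v]].add((u, v))
        feasible = all(A_lower <= A[b] <= A_upper for b in A) and \
                   all(len(pins[b]) <= P_upper for b in A)
        return W, feasible

    area = {0: 2, 1: 1, 2: 2, 3: 1}
    edges = {(0, 1): 1, (1, 2): 2, (2, 3): 1, (0, 3): 1}
    mapping = {0: 0, 1: 0, 2: 1, 3: 1}
    print(evaluate(mapping, area, edges, A_lower=2, A_upper=4, P_upper=3))
    # -> (3.0, True): inter-block weight 3, all constraints satisfied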

Page 43:

Charging and Discharging Due to Switching

• Accounts for up to 90% of total power dissipation.

(Figure: a CMOS gate with a PMOS pull-up network and an NMOS pull-down network between Vdd and ground; labels show the short-circuit + leakage current, the charge and discharge paths, and the load capacitance C_L.)

Page 44:

Partitioning for Low Power

• Conventional approach: minimize the number of edges crossing the cut.
• For low power: minimize the number of switching activities on the edges crossing the cut.

(Figure: an example graph with edge switching activities of 0.25 and 0.75; (a) cut by the number of edges, (b) cut by the number of switching activities.)

Page 45:

Minimum-Cost Flow Algorithm

• A method for sending a given amount of flow to the desired destination at the lowest cost.
  – Each path has a capacity and a cost.

• Max-flow min-cut: considers only the number of edges.
• Min-cost flow: each edge is given a weight based on its switching activity.
  – Cost: switching activity vs. number of edges.
  – Capacity: the maximum amount of flow an edge can carry.

• Capacities are made large so that lower-cost edges are preferred.

W S Ci i i ( )1
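To make the min-cost-flow idea concrete, here is a minimal sketch using networkx (an assumption; the slides do not name a library, and the small network, capacities, and switching-activity-based costs below are hypothetical):

    import networkx as nx

    # Directed network from source S to sink T. Edge 'capacity' limits flow;
    # edge 'weight' is the cost, here scaled from switching activity so the
    # min-cost flow avoids high-activity edges.
    G = nx.DiGraph()
    G.add_edge("S", "a", capacity=2, weight=1)
    G.add_edge("S", "b", capacity=2, weight=1)
    G.add_edge("a", "b", capacity=1, weight=75)   # high switching activity (0.75 scaled)
    G.add_edge("a", "T", capacity=1, weight=25)   # low switching activity (0.25 scaled)
    G.add_edge("b", "T", capacity=2, weight=25)

    G.nodes["S"]["demand"] = -2                   # push 2 units of flow from S ...
    G.nodes["T"]["demand"] = 2                    # ... to T

    flow = nx.min_cost_flow(G)                    # networkx min-cost flow solver
    print(nx.cost_of_flow(G, flow))               # -> 52: the cheapest routing avoids the 0.75 edge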

Page 46:

Network and Min-Cost Flow

(Figure: an example flow network; each edge is labeled with a pair of numbers such as 10 / 100, 45 / 55, 15 / 30.)

Page 47:

Graph Transformation Algorithm

• Min-cost flow finds a path through the network.
• To find a cut, the graph must be transformed.
• Nodes are topologically sorted by level.

(Figure: the circuit graph arranged into Level 1 through Level 5.)

Page 48:

Graph Transformation Algorithm

• Added nodes and edges

(Figure: levels i and i+1 of the transformed graph, with a legend distinguishing newly created edges, existing edges, newly created nodes, and existing nodes, plus the source and sink.)

Page 49:

Graph Transformation

(Figure: the leveled graph (Level 1 through Level 5) with a source node S and a sink node T added.)

Page 50:

Algorithm

Input: flow f, network
Output: partition of the network into f subnetworks

Step 1:
  Push flow f through the graph and run the minimum-cost flow algorithm.
  If every partition satisfies A_upper and P_upper, stop;
  otherwise set f = f + 1 and repeat Step 1 until the upper bounds are satisfied.

Step 2:
  If there are two partitions p and q that do not satisfy A_lower or P_lower, and

    A_lower <= A_p + A_q <= A_upper
    P_lower <= P_p + P_q <= P_upper

  then p and q can be merged; apply minimum-cost matching over all feasible {p,q} sets to reduce the number of partitions.
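A minimal sketch of the Step 2 merge-feasibility test together with a greedy merging pass (hypothetical data; the slides apply minimum-cost matching rather than this simple greedy pairing, and summing pin counts on a merge is a simplification):

    def can_merge(p, q, A, P, A_lower, A_upper, P_lower, P_upper):
        """Step 2 test: the merged block must satisfy both the area and pin bounds."""
        return (A_lower <= A[p] + A[q] <= A_upper) and (P_lower <= P[p] + P[q] <= P_upper)

    def greedy_merge(blocks, A, P, bounds):
        """Repeatedly merge the first feasible undersized pair (greedy stand-in for min-cost matching)."""
        A_lower, A_upper, P_lower, P_upper = bounds
        merged = True
        while merged:
            merged = False
            small = [b for b in blocks if A[b] < A_lower or P[b] < P_lower]
            for i, p in enumerate(small):
                for q in small[i + 1:]:
                    if can_merge(p, q, A, P, *bounds):
                        A[p] += A.pop(q)       # merge q into p
                        P[p] += P.pop(q)
                        blocks.remove(q)
                        merged = True
                        break
                if merged:
                    break
        return blocks

    blocks = [1, 2, 3]
    A = {1: 1.0, 2: 1.5, 3: 4.0}
    P = {1: 2, 2: 3, 3: 6}
    print(greedy_merge(blocks, A, P, bounds=(2.0, 5.0, 4, 8)), A, P)
    # -> blocks 1 and 2 merge into one block with area 2.5 and 5 pins; block 3 is unchanged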

Page 51:

References

[1] J. D. Cho and P. D. Franzon, "High-Performance Design Automation for Multi-Chip Modules and Packages", World Scientific Pub. Co., 1996.
[2] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits", IEEE JSSC, pp. 468-473, August 1984.
[3] H. B. Bakoglu, "Circuits, Interconnections and Packaging for VLSI", pp. 81-112, Addison-Wesley Publishing Co., 1990.
[4] K. M. Hall, "An r-dimensional quadratic placement algorithm", Management Sci., vol. 17, pp. 219-229, Nov. 1970.
[5] Cadence Design Systems, "A Vision for Multi-Chip Module Design in the Nineties", Tech. Rep., Cadence Design Systems Inc., Santa Clara, CA, 1993.
[6] R. Raghavan, J. Cohoon, and S. Sahni, "Single Bend Wiring", Journal of Algorithms, 7(2):232-257, June 1986.
[7] B. W. Kernighan and S. Lin, "An efficient heuristic procedure to partition graphs", Bell System Technical Journal, 49(2):291-307, Feb. 1970.
[8] Y. C. Wei and C. K. Cheng, "Ratio-Cut Partitioning for Hierarchical Designs", IEEE Trans. on Computer-Aided Design, 10(7):911-921, 1991.
[9] S. W. Hadley, B. L. Mark, and A. Vannelli, "An Efficient Eigenvector Approach for Finding Netlist Partitions", IEEE Trans. on Computer-Aided Design, vol. CAD-11, pp. 885-892, July 1992.
[10] L. R. Ford, Jr. and D. R. Fulkerson, "Flows in Networks", Princeton University Press, Princeton, NJ, 1962.
[11] H. Liu and D. F. Wong, "Network Flow Based Multi-Way Partitioning With Area and Pin Constraints", IEEE/ACM International Symposium on Physical Design, pp. 12-17, 1997.
[12] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, "Optimization by simulated annealing", Science, 220(4598):671-680, May 1983.
[13] M. Pedram, "Power Minimization in IC Design: Principles and Applications", ACM Trans. on Design Automation of Electronic Systems, 1(1), pp. 3-56, Jan. 1996.
[14] A. H. Farrahi and M. Sarrafzadeh, "FPGA Technology Mapping for Power Minimization", in International Workshop on Field-Programmable Logic and Applications, pp. 66-77, Sep. 1994.
[15] M. A. Breuer, "Min-Cut Placement", J. Design Automation and Fault-Tolerant Computing, pp. 343-382, Oct. 1977.

Page 52:

[16] M. Hanan and J. M. Kurtzberg, "A Review of the Placement and the Quadratic Assignment Problem", Apr. 1972.
[17] N. R. Quinn, "The Placement Problem as Viewed from the Physics of Classical Mechanics", Proc. of the 12th Design Automation Conference, pp. 173-178, 1975.
[18] C. Sechen and A. Sangiovanni-Vincentelli, "The TimberWolf placement and routing package", IEEE Journal of Solid-State Circuits, SC-20, pp. 501-522, 1985.
[19] K. Shahookar and P. Mazumder, "A Genetic Approach to Standard Cell Placement", First European Design Automation Conference, Mar. 1990.
[20] J. D. Cho, S. Raje, M. Sarrafzadeh, M. Sriram, and S. M. Kang, "Crosstalk Minimum Layer Assignment", in Proc. IEEE Custom Integrated Circuits Conf., San Diego, CA, pp. 29.7.1-29.7.4, 1993.
[21] J. M. Ho, M. Sarrafzadeh, G. Vijayan, and C. K. Wong, "Layer Assignment for Multichip Modules", IEEE Trans. on Computer-Aided Design, CAD-9(12):1272-1277, Dec. 1990.
[22] G. Devaraj, "Distributed placement and crosstalk driven router for multichip modules", MS Thesis, Univ. of Cincinnati, 1994.
[23] J. D. Cho, "Min-Cost Flow based Minimum-Cost Rectilinear Steiner Distance-Preserving Tree", International Symposium on Physical Design, pp. 82-87, 1997.
[24] A. Vittal and M. Marek-Sadowska, "Minimal Delay Interconnection Design using Alphabetic Trees", in Design Automation Conference, pp. 392-396, 1994.
[25] M. C. Golumbic, "Algorithmic Graph Theory and Perfect Graphs", pp. 80-103, New York: Academic, 1980.
[26] R. Vemuri, "Genetic Algorithms for partitioning, placement, and layer assignment for multichip modules", Ph.D. Thesis, Univ. of Cincinnati, 1994.
[27] J. L. Kennington and R. V. Helgason, "Algorithms for Network Programming", John Wiley, 1980.
[28] J. Y. Cho and J. D. Cho, "Improving Performance and Routability Estimation in MCM Placement", in InterPack'97, Hawaii, June 1997.
[29] J. Y. Cho and J. D. Cho, "Partitioning for Low Power Using Min-Cost Flow Algorithm", submitted to 한국반도체학술대회 (Korean Conference on Semiconductors), Feb. 1998.