4.3 Partitioning algorithms used in circuit design

4.3.4 Software packages incorporating the multilevel paradigm

Inspired by the success of multigrid methods in other research areas, Barnard et al. [45] introduced the multilevel paradigm for graph partitioning. The first multilevel implementation was a recursive spectral bipartitioning algorithm designed for partitioning large unstructured meshes for distributed-memory architectures. This first attempt was later followed by two well-known software packages, Chaco [46,47] and Metis [48], which made the multilevel approach a quasi-standard in graph partitioning. Although the two solutions differed in details, both relied on the multilevel approach and produced competitive results. Over the years, new versions of both packages were released, and the family of multilevel partitioners was extended with new software packages (e.g. Scotch [49], JOSTLE [50], Parkway [51]). The research group that developed Chaco created a more powerful software package, called Zoltan [52], to support high-performance parallel partitioning, parallel graph coloring and dynamic load balancing. Metis was followed by hMetis [53], which was designed for partitioning hypergraphs arising in VLSI circuits, and by ParMetis [54], which was created for the MPI-based parallel partitioning of large graphs.

The focus of partitioning programs has since moved to parallel execution to keep up with the growing size of graphs, which challenges modern computing architectures. As the graphs investigated in the dissertation are relatively small, the lack of parallel execution can be tolerated. Here, I review the algorithms of the two best-known software packages, Chaco and hMetis, which still provide the basis of more complex parallel partitioning packages. To demonstrate the limitations of naive partitioning for the problem proposed in the dissertation, hMetis was selected as a representative of state-of-the-art partitioners.

From the perspective of combined partitioning and placement, the terminal propagation feature of Chaco is also discussed, as the technique is related to the algorithm proposed in Section 4.5.

4.3.4.1 Chaco

The Chaco [47] software package is based on the spectral bipartitioning technique I reviewed in Section 4.3.2.1. The key contribution of the work is the application of the multilevel paradigm to spectral bipartitioning. As a typical multilevel algorithm, it has three phases: coarsening, spectral partitioning, and uncoarsening with local refinement. The algorithm owes its success mainly to the low cost of the coarsening and uncoarsening phases, which is proportional to the number of edges. During uncoarsening, only the partition, rather than the eigenvectors, is transferred to the next level. The relatively expensive spectral partitioning is carried out only on the coarsest graph, which is small. To preserve constraints related to cluster size or edge weights, the weights of vertices and edges can also be adjusted during coarsening.
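The spectral step applied to the coarsest graph can be sketched as follows. This is a minimal illustration that assumes NumPy is available, that vertex weights are all 1 and that the graph is connected. The function name `spectral_bisect` and the median split are illustrative choices, not Chaco's interface. The vertices are ordered by their entries in the Fiedler vector, which is the eigenvector of the second-smallest Laplacian eigenvalue, and the ordering is then split at the median into two equal halves.

```python
import numpy as np


def spectral_bisect(n, edges):
    """Split vertices 0..n-1 into two halves using the Fiedler vector.

    edges: list of (u, v, w) tuples of an undirected weighted graph.
    Assumes unit vertex weights and a connected graph (sketch only).
    Returns a list part[v] in {0, 1}.
    """
    # Build the weighted graph Laplacian L = D - A.
    lap = np.zeros((n, n))
    for u, v, w in edges:
        lap[u, v] -= w
        lap[v, u] -= w
        lap[u, u] += w
        lap[v, v] += w
    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, vecs = np.linalg.eigh(lap)
    fiedler = vecs[:, 1]
    # Median split: the n // 2 vertices with the smallest entries form side 0,
    # which balances the two halves by vertex count.
    order = np.argsort(fiedler, kind="stable")
    part = [1] * n
    for v in order[: n // 2]:
        part[int(v)] = 0
    return part
```

For example, when two triangles are joined by a single edge, this function separates them along that edge. On a real coarse graph, the split would balance the accumulated vertex weights rather than vertex counts.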

As the algorithm is designed for ordinary graphs, in which every edge connects exactly two vertices, coarsening contracts such edges. When an edge is selected for contraction, its two endpoints are merged, and the weight of the new vertex equals the sum of the weights of the original vertices. Additionally, edges that connected the two original vertices to a common neighbor are merged, and the weight of the new edge equals the sum of the weights of the original edges.
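The contraction rule above can be illustrated with a small Python sketch. It assumes a hypothetical representation in which edge weights are stored in a dict keyed by `(u, v)` with `u < v`. It also contracts all edges of a matching in one step. The function `contract` and its parameters are illustrative and are not taken from Chaco.

```python
def contract(n, vwgt, edges, matching):
    """Contract the matched edges of a weighted graph (sketch).

    n: number of vertices; vwgt[v]: vertex weight;
    edges: dict {(u, v): w} with u < v; matching: list of (u, v) pairs.
    Returns (cmap, coarse_vwgt, coarse_edges), where cmap[v] is the
    coarse vertex that fine vertex v was merged into.
    """
    cmap = [-1] * n
    nc = 0
    for u, v in matching:          # each matched pair becomes one coarse vertex
        cmap[u] = cmap[v] = nc
        nc += 1
    for v in range(n):             # unmatched vertices are copied unchanged
        if cmap[v] == -1:
            cmap[v] = nc
            nc += 1
    coarse_vwgt = [0] * nc
    for v in range(n):             # vertex weights are summed
        coarse_vwgt[cmap[v]] += vwgt[v]
    coarse_edges = {}
    for (u, v), w in edges.items():
        cu, cv = cmap[u], cmap[v]
        if cu == cv:               # the contracted edge itself disappears
            continue
        key = (min(cu, cv), max(cu, cv))
        # edges to a common neighbor are merged and their weights summed
        coarse_edges[key] = coarse_edges.get(key, 0) + w
    return cmap, coarse_vwgt, coarse_edges
```

Take a triangle with edge weights 1 (0-1), 2 (0-2) and 3 (1-2). Contracting edge 0-1 yields a coarse vertex of weight 2, which is joined to vertex 2 by an edge of weight 5.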

At the beginning of each coarsening step, a maximal matching is generated to determine the edges to be contracted. A maximal matching of a graph G(V, E) is a subset of E in which no two edges share a common vertex and which cannot be extended by any further edge of E. In the program, the maximal matching is generated by visiting the edges in a random order, which takes time proportional to the number of edges.
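A greedy random maximal matching of this kind can be sketched in a few lines. The sketch assumes an edge list over vertices 0..n-1, and the function name and its `seed` parameter are hypothetical. Because every edge is visited exactly once, the running time stays proportional to the number of edges.

```python
import random


def random_maximal_matching(n, edges, seed=None):
    """Greedy maximal matching over a random edge order (sketch).

    edges: list of (u, v) pairs over vertices 0..n-1.
    An edge is kept whenever neither endpoint is matched yet.
    """
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)             # random visiting order, O(|E|)
    matched = [False] * n
    matching = []
    for u, v in order:
        if u != v and not matched[u] and not matched[v]:
            matched[u] = matched[v] = True
            matching.append((u, v))
    return matching
```

An edge is rejected only when one of its endpoints is already matched. As a result, no further edge can be added afterwards, which is exactly the maximality property.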

In the uncoarsening phase, the partition is projected back to the previous level: each vertex inherits the cluster of the coarse vertex it was merged into. As the projected partition is not necessarily at a local optimum, a local refinement algorithm (Fiduccia-Mattheyses, described in Section 4.3.1.2) is applied for fine-tuning.
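The projection and a refinement pass in the style of FM can be sketched as follows. This is a simplified illustration with several assumptions:

- a bipartition is stored as a 0/1 list;
- adjacency is given as dicts of edge weights;
- balance is expressed as a hypothetical `max_side` vertex count.

Unlike the original Fiduccia-Mattheyses algorithm, this sketch recomputes gains naively instead of using bucket lists. It is therefore only suitable for small examples.

```python
def project(cmap, coarse_part):
    """Each fine vertex inherits the side of the coarse vertex it was merged into."""
    return [coarse_part[c] for c in cmap]


def fm_pass(n, adj, part, max_side):
    """One Fiduccia-Mattheyses-style pass on a graph bipartition (sketch).

    adj[v]: dict {neighbor: edge weight}; part[v] in {0, 1};
    max_side: maximum number of vertices allowed on either side.
    Every vertex moves at most once (then it is locked); the best prefix
    of the move sequence is kept and later moves are undone.
    Returns (refined partition, cut reduction).
    """
    part = list(part)
    size = [part.count(0), part.count(1)]

    def gain(v):
        # cut reduction if v switched sides: external minus internal weight
        return sum(w if part[u] != part[v] else -w for u, w in adj[v].items())

    locked = [False] * n
    moves, total, best_total, best_len = [], 0, 0, 0
    for _ in range(n):
        cands = [v for v in range(n)
                 if not locked[v] and size[1 - part[v]] + 1 <= max_side]
        if not cands:
            break
        v = max(cands, key=gain)   # highest gain, accepted even if negative
        total += gain(v)
        size[part[v]] -= 1
        part[v] = 1 - part[v]
        size[part[v]] += 1
        locked[v] = True
        moves.append(v)
        if total > best_total:
            best_total, best_len = total, len(moves)
    for v in moves[best_len:]:     # roll back the moves after the best prefix
        part[v] = 1 - part[v]
    return part, best_total
```

As an example, take the path 0-1-2-3 with the alternating partition [0, 1, 0, 1], which has a cut of 3, and set `max_side=3`. The pass keeps only its first move and returns [0, 0, 0, 1] with a gain of 2, which leaves a cut of 1.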

An interesting feature of the program is terminal propagation. Terminal propagation was originally proposed in [55] and later integrated into the spectral partitioning technique [56] as well. It was inspired by the problem of partitioning meshes for parallel architectures with a fixed topology (e.g. hypercubes of different dimensions). Such architectures require a partitioning that penalizes cut edges (communications) connecting nonadjacent clusters (computing nodes). The idea was to add virtual vertices representing the clusters already formed by the recursive bipartitioning. For each unpartitioned vertex, a virtual edge was added to the graph, connecting the vertex with one of the virtual vertices. Via the virtual edges, the preference to place a vertex close to an already formed cluster could be incorporated into the model for the remaining bipartitionings.
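Terminal propagation for one bisection step can be sketched as below. The sketch illustrates the idea under several assumptions and is not Chaco's implementation. The inputs `cluster_of`, the target positions `pos` of the two new halves and the distance function `dist` are all hypothetical. Each vertex of the cluster being split receives at most one virtual edge. That edge goes to the virtual vertex (terminal 0 or 1) of the half that lies closer to the clusters of the vertex's already partitioned neighbors.

```python
def add_terminal_edges(subset, adj, cluster_of, pos, dist):
    """Terminal propagation for one recursive bisection step (sketch).

    subset: vertices of the cluster being bisected;
    adj[v]: dict {neighbor: edge weight};
    cluster_of[u]: already assigned cluster position of an outside vertex u;
    pos: (p0, p1), hypothetical positions of the two halves to be created;
    dist(a, b): hypothetical distance between positions in the topology.
    Returns virtual edges (v, terminal, w), where terminal 0 or 1 is a
    fixed virtual vertex standing for half 0 or half 1.
    """
    inside = set(subset)
    virtual = []
    for v in subset:
        pull = 0  # > 0 pulls v toward half 0, < 0 toward half 1
        for u, w in adj[v].items():
            if u in inside or u not in cluster_of:
                continue  # only neighbors in already formed clusters count
            d0 = dist(cluster_of[u], pos[0])
            d1 = dist(cluster_of[u], pos[1])
            if d0 < d1:
                pull += w
            elif d1 < d0:
                pull -= w
        # one virtual edge per vertex, to the preferred terminal only
        if pull > 0:
            virtual.append((v, 0, pull))
        elif pull < 0:
            virtual.append((v, 1, -pull))
    return virtual
```

Consider a line of processors with `dist = lambda a, b: abs(a - b)`. A vertex whose outside neighbor sits in the cluster at position 0 is pulled toward a new half at position 1 rather than toward one at position 2.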

Chaco supports partitioning for two types of architectures: hypercubes and grids. In both cases, the possible dimensions are 1, 2, and 3. In the case of grids, the exact size of the grid has to be specified; that is, the number of clusters has to be known before partitioning. Terminal propagation is applied both in the recursive spectral bipartitioning of the coarsest graph and in the move-based local refinement.
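One way to quantify how strongly a cut edge should be penalized on such architectures is to weight it by the number of hops between the computing nodes its endpoints are mapped to. The sketch below assumes this hop-weighted cost model. It illustrates the idea and is not necessarily the exact objective used by Chaco, and all function names are hypothetical. Hypercube nodes are labeled by binary IDs, so neighboring nodes differ in exactly one bit. Grid nodes are labeled by coordinate tuples.

```python
def hypercube_hops(a, b):
    # Hypercube node IDs of adjacent nodes differ in exactly one bit,
    # so the hop count is the Hamming distance of the IDs.
    return bin(a ^ b).count("1")


def grid_hops(a, b):
    # Grid nodes are coordinate tuples; on a mesh without wrap-around
    # the hop count is the Manhattan distance.
    return sum(abs(x - y) for x, y in zip(a, b))


def architecture_cost(edges, cluster_of, hops):
    """Sum of edge weights times the hop distance between the clusters
    (computing nodes) of the endpoints; uncut edges cost 0 hops."""
    return sum(w * hops(cluster_of[u], cluster_of[v]) for u, v, w in edges)
```

Under this model, an edge between hypercube nodes 000 and 011 is charged twice its weight, because the message must travel two hops.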

The constraint of a fixed parallel architecture is partly similar to the problem presented in the dissertation: in both cases, the penalty of a cut edge depends on the topology of the clusters. The difference is that in my case the topology is not fixed.

For a given graph (mathematical expression), it is not known a priori how many clusters it should contain or which cluster topology yields the maximum performance. By leaving these parameters free in the optimization procedure, one can answer these questions and reach better quality. In this sense, the algorithm proposed in Section 4.5.2 can be regarded as a smarter technique that addresses the problem without compromise. Nevertheless, if these compromises are accepted, an algorithm could be designed using the terminal propagation technique. A possible application of the technique is discussed and related to my solution in Section 4.5.2.6.

4.3.4.2 hMetis

The hMetis [53] software package is one of the state-of-the-art partitioning programs used for VLSI circuit partitioning. It is based on a multilevel hypergraph partitioning scheme and it is capable of minimizing several objective functions. The multilevel idea is to create a sequence of successive approximations of the original hypergraph and to

The key contribution of the program lies in the approximation (called coarsening) and in its inverse (called uncoarsening). During coarsening, a smaller hypergraph is created from a hypergraph. During uncoarsening, a partition of an approximating hypergraph is projected back to the original hypergraph and then refined. The approximating hypergraph can be partitioned with a standard partitioning algorithm such as Kernighan-Lin (described in Section 4.3.1.1) or Fiduccia-Mattheyses (described in Section 4.3.1.2).
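The objective functions mentioned at the beginning of this subsection can be made concrete with two common hypergraph metrics:

- the hyperedge cut, which is the total weight of hyperedges spanning more than one cluster;
- the sum of external degrees (SOED), in which a cut hyperedge spanning k clusters counts k times.

Both are among the objectives hMetis can minimize. The Python functions below are a hedged sketch of the metrics themselves. Their names are hypothetical, and they assume a hypergraph stored as a list of vertex lists.

```python
def hyperedge_cut(hyperedges, part, weights=None):
    """Total weight of hyperedges whose vertices lie in more than one cluster."""
    weights = weights or [1] * len(hyperedges)
    return sum(w for e, w in zip(hyperedges, weights)
               if len({part[v] for v in e}) > 1)


def soed(hyperedges, part, weights=None):
    """Sum of external degrees: a hyperedge spanning k > 1 clusters
    contributes k times its weight; uncut hyperedges contribute 0."""
    weights = weights or [1] * len(hyperedges)
    total = 0
    for e, w in zip(hyperedges, weights):
        k = len({part[v] for v in e})
        if k > 1:
            total += k * w
    return total
```

Consider the hyperedges [[0, 1, 2], [2, 3], [3, 4, 5], [0, 2, 4]] with the partition [0, 0, 1, 1, 2, 2]. Three hyperedges are cut, and the SOED is 2 + 2 + 3 = 7.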

Several strategies are available in hMetis for coarsening, uncoarsening, and refinement. In each case, the objective of coarsening is to find and contract vertices that would belong to the same partition class anyway. Common coarsening heuristics are Edge Coarsening (EC), Hyperedge Coarsening (HC) and Modified Hyperedge Coarsening (MHC):

- In EC, special weights are assigned to hyperedges so that smaller hyperedges rank higher. Pairs of vertices incident to the hyperedges with the largest weights are then contracted.
- In HC, hyperedges are sorted in decreasing order of weight, and hyperedges of the same weight are sorted in increasing order of size. All vertices of a hyperedge are then contracted together if none of them has been contracted yet.
- MHC is an enhanced version of HC: after contraction, the list of remaining hyperedges is traversed again, and the uncontracted vertices of each hyperedge are also contracted together.

Uncoarsening simply projects the partition back to the original hypergraph, while refinement fine-tunes the quality of the projected partition via a move-based algorithm (e.g. FM again).
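The three coarsening heuristics can be sketched in Python as grouping functions that return, for every vertex, the ID of the coarse vertex it is merged into. This is a simplified sketch with hypothetical names. In particular, the EC score w_e / (|e| - 1) is an assumed concrete choice for ranking smaller hyperedges higher. In this sketch, EC visits vertices in random order and pairs each one with its highest-scoring unmatched neighbor.

```python
import random


def hyperedge_coarsening(n, hyperedges, weights, modified=False):
    """HC / MHC grouping (sketch). Returns group[v], the coarse vertex id of v.

    HC: visit hyperedges by decreasing weight, ties broken by increasing
    size, and contract a hyperedge only if none of its vertices is
    contracted yet. MHC (modified=True): then traverse the hyperedges again
    and contract the still-uncontracted vertices of each hyperedge together.
    """
    group = [-1] * n
    ng = 0
    order = sorted(range(len(hyperedges)),
                   key=lambda i: (-weights[i], len(hyperedges[i])))
    for i in order:
        e = hyperedges[i]
        if all(group[v] == -1 for v in e):
            for v in e:
                group[v] = ng
            ng += 1
    if modified:
        for i in order:
            rest = [v for v in hyperedges[i] if group[v] == -1]
            if len(rest) > 1:
                for v in rest:
                    group[v] = ng
                ng += 1
    for v in range(n):             # leftover vertices stay on their own
        if group[v] == -1:
            group[v] = ng
            ng += 1
    return group


def edge_coarsening(n, hyperedges, weights, seed=None):
    """EC grouping (sketch): pair vertices that share heavy, small hyperedges.
    Assumed score: each hyperedge e adds w_e / (|e| - 1) to every vertex pair in it."""
    nbr = [dict() for _ in range(n)]
    for e, w in zip(hyperedges, weights):
        if len(e) < 2:
            continue
        s = w / (len(e) - 1)       # smaller hyperedges score higher
        for u in e:
            for v in e:
                if u != v:
                    nbr[u][v] = nbr[u].get(v, 0.0) + s
    rng = random.Random(seed)
    visit = list(range(n))
    rng.shuffle(visit)
    group = [-1] * n
    ng = 0
    for u in visit:
        if group[u] != -1:
            continue
        cands = [(s, v) for v, s in nbr[u].items() if group[v] == -1]
        group[u] = ng
        if cands:
            group[max(cands)[1]] = ng  # highest-scoring unmatched neighbor
        ng += 1
    return group
```

Take the unit-weight hyperedges [[0, 1], [1, 2, 3, 5], [3, 4]]. HC contracts {0, 1} and {3, 4} and leaves 2 and 5 alone. MHC additionally merges 2 and 5, which are the uncontracted vertices of the second hyperedge.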

4.4 Empirically validating the advantage of locally

Source: Csaba Nemes, Efficient implementation of computationally intensive algorithms on parallel computing platforms (pages 68-71)