Partitioning and Placement

4.2 Partitioning problem

In this section, a high-level partitioning problem is described in which the data-flow graph representation of a mathematical expression is partitioned to determine the locally controlled parts of the resulting arithmetic unit.

4.2.1 Problem formulation

After converting the mathematical expression into a data-flow graph (a directed acyclic hypergraph), the number of cut arcs can be minimized by graph partitioning techniques, and the size and I/O connections of the clusters can also be balanced or constrained. The number of cut arcs shall be minimized to reduce the area requirements of the circuit, while constraining the number of I/O connections of the clusters provides high-speed control units. The mathematical foundation of the problem is established in the following definitions.

Definition 1 A directed hypergraph, denoted by G(V, E), is a pair <V, E>, where V is a non-empty set of nodes (vertices) and E is a set of hyperarcs (hyperedges); a hyperarc e is an ordered pair <S, T>, with S ⊂ V, S ≠ ∅, and T ⊂ (V \ S).

Elements of S and T are called the sources and targets of the arc, respectively. The sources and targets of a hyperarc e are denoted by S(e) and T(e), respectively.

In a hypergraph G(V, E), a path p_st between nodes s and t is an alternating sequence of distinct nodes and hyperarcs s = v_0, e_1, v_1, e_2, ..., e_k, v_k = t such that v_{i−1} ∈ S(e_i) and v_i ∈ T(e_i) for all i = 1, ..., k. A path p_st is called a cycle if s = t. A directed hypergraph is called acyclic if there is no cycle in the hypergraph.

In our case, the data-flow graph of the high-level circuit can be mapped to a special acyclic hypergraph, in which each hyperarc e has only one source: |S(e)| = 1. The special property comes from the fact that every signal is driven by one source in the design. The acyclic property comes from the assumption that we deal with a simple evaluation of a mathematical expression which can be implemented without accumulators or recursion.
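As a minimal sketch of Definition 1 and of the single-source, acyclic special case, the following Python fragment stores each hyperarc as a (sources, targets) pair and checks both properties. The names make_arc, is_single_source and is_acyclic, as well as the example expression (a + b) * (a - b), are illustrative assumptions and are not taken from the original design flow. Acyclicity is tested with Kahn's topological ordering on the node-level expansion (one edge from each source to each target); for single-source hyperarcs this expansion contains a cycle exactly when the hypergraph does.

```python
from collections import defaultdict, deque

def make_arc(sources, targets):
    """Build a hyperarc <S, T> with S non-empty and T disjoint from S."""
    s, t = frozenset(sources), frozenset(targets)
    if not s:
        raise ValueError("S(e) must be non-empty")
    if s & t:
        raise ValueError("T(e) must be a subset of V \\ S(e)")
    return (s, t)

def is_single_source(arcs):
    """True if |S(e)| = 1 for every hyperarc e."""
    return all(len(s) == 1 for s, _ in arcs)

def is_acyclic(nodes, arcs):
    """Kahn's algorithm on the source-to-target expansion of the hyperarcs."""
    succ = defaultdict(set)
    for s, t in arcs:
        for u in s:
            succ[u] |= t
    indeg = {v: 0 for v in nodes}
    for targets in succ.values():
        for v in targets:
            indeg[v] += 1
    queue = deque(v for v, d in indeg.items() if d == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ.get(u, ()):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Every node is removed only if no cycle blocks the ordering.
    return removed == len(indeg)

# Hypothetical data-flow graph of (a + b) * (a - b)
nodes = ["a", "b", "add", "sub", "mul"]
arcs = [
    make_arc({"a"}, {"add", "sub"}),
    make_arc({"b"}, {"add", "sub"}),
    make_arc({"add"}, {"mul"}),
    make_arc({"sub"}, {"mul"}),
]
```

Feeding the result of mul back into a, for example, would introduce a cycle and make is_acyclic return False, which corresponds to the accumulator or recursion case excluded above.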

Definition 2 Given a hypergraph G(V, E), a decomposition P of V into disjoint subsets V_1, V_2, ..., V_n such that ⋃_i V_i = V is called a partitioning of G. The terms subdomain, cluster, or partition class are used to refer to each one of these V_i sets.
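The two conditions of Definition 2 (pairwise disjoint classes whose union is V) can be checked mechanically. The helper below is only a sketch; the function name is_partitioning and the example node names are assumptions made for illustration.

```python
def is_partitioning(nodes, classes):
    """Check that the classes are pairwise disjoint and cover all nodes."""
    seen = set()
    for cls in classes:
        cls = set(cls)
        if seen & cls:          # a node appears in two classes
            return False
        seen |= cls
    return seen == set(nodes)   # the union of the classes must equal V

# Hypothetical node set of the expression (a + b) * (a - b)
nodes = ["a", "b", "add", "sub", "mul"]
```

For instance, [{"a", "b"}, {"add", "sub"}, {"mul"}] is accepted, while a decomposition that omits mul or lists add twice is rejected.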

The proposed optimization can be described as a hypergraph partitioning with special cost functions assigned to cut hyperarcs or partition classes. In the presented model, an area cost is defined for each cut arc describing the number of required FIFOs:

\[
f_{Area}(e, P, G) := \left| \{\, j : (S(e) \cup T(e)) \cap V_j \neq \emptyset \,\} \right| - 1 \qquad (4.1)
\]

Each cut arc, that is, an arc which has targets in classes different from the source class, shall be replaced by one FIFO for each such target class. The area cost of a partition P can be computed as the sum of the costs of the arcs:

\[
F_{Area}(P, G) := \sum_{e \in E} f_{Area}(e, P, G) \qquad (4.2)
\]
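Equations (4.1) and (4.2) translate directly into a few lines of Python. The sketch below assumes single-source arcs stored as (source, targets) pairs and a partition given as a dictionary part_of from node to class index; these names and the example partition are hypothetical and chosen only for illustration.

```python
def f_area(arc, part_of):
    """Eq. (4.1): number of classes touched by the arc, minus one."""
    source, targets = arc
    touched = {part_of[source]} | {part_of[t] for t in targets}
    return len(touched) - 1

def F_area(arcs, part_of):
    """Eq. (4.2): sum of the area costs of all arcs."""
    return sum(f_area(e, part_of) for e in arcs)

# Hypothetical data-flow graph of (a + b) * (a - b)
arcs = [
    ("a", {"add", "sub"}),
    ("b", {"add", "sub"}),
    ("add", {"mul"}),
    ("sub", {"mul"}),
]
# Hypothetical partition: inputs in class 0, add in class 1, sub and mul in class 2
part_of = {"a": 0, "b": 0, "add": 1, "sub": 2, "mul": 2}

costs = [f_area(e, part_of) for e in arcs]   # [2, 2, 1, 0]
total = F_area(arcs, part_of)                # 5
```

The arcs leaving a and b each reach two foreign classes and therefore need two FIFOs, the arc from add to mul needs one, and the arc from sub to mul stays inside class 2 and needs none.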

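To formulate the I/O constraint, a control cost is defined for each arc and partition class:

\[
f_{Control}(e, V_i, P, G) :=
\begin{cases}
0 & \text{if } (S(e) \cup T(e)) \cap V_i = \emptyset \\
1 & \text{if } S(e) \cap V_i = \emptyset \text{ and } T(e) \cap V_i \neq \emptyset \\
f_{Area}(e, P, G) & \text{if } S(e) \subseteq V_i
\end{cases}
\qquad (4.3)
\]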

If an arc is completely outside a class, the control cost for the class is zero. If an arc has only targets in a class, the control cost for the class is one. Finally, if an arc has the source in a class, the control cost equals the area cost, because all added FIFOs will be controlled by the class. The control cost of a partition class can be computed as the sum of the cost of the arcs:

\[
F_{Control}(V_i, P, G) := \sum_{e \in E} f_{Control}(e, V_i, P, G) \qquad (4.4)
\]

The relation

\[
2 \cdot F_{Area}(P, G) = \sum_{V_i \in P} F_{Control}(V_i, P, G)
\]

can be concluded by observing that every FIFO is connected to two control units. Hence, every FIFO is counted twice in the control cost computation. As a consequence of the relation, either cost function can be minimized for our purposes; however, the constraint cannot be skipped, otherwise the control costs of some classes may exceed the user-defined limit.
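The three cases of the control cost and the factor-of-two relation can be checked numerically on a small instance. The following sketch assumes single-source arcs stored as (source, targets) pairs and a dictionary part_of mapping nodes to class indices; the example graph and partition are hypothetical.

```python
def f_area(arc, part_of):
    """Area cost: number of classes touched by the arc, minus one."""
    source, targets = arc
    return len({part_of[source]} | {part_of[t] for t in targets}) - 1

def f_control(arc, cls, part_of):
    """Control cost of one arc for one class, following the three cases in the text."""
    source, targets = arc
    if part_of[source] == cls:
        return f_area(arc, part_of)      # source in the class: controls all added FIFOs
    if any(part_of[t] == cls for t in targets):
        return 1                         # only targets in the class
    return 0                             # arc completely outside the class

def F_control(arcs, cls, part_of):
    """Eq. (4.4): sum of the control costs of all arcs for one class."""
    return sum(f_control(e, cls, part_of) for e in arcs)

# Hypothetical data-flow graph of (a + b) * (a - b) and a hypothetical partition
arcs = [
    ("a", {"add", "sub"}),
    ("b", {"add", "sub"}),
    ("add", {"mul"}),
    ("sub", {"mul"}),
]
part_of = {"a": 0, "b": 0, "add": 1, "sub": 2, "mul": 2}

area = sum(f_area(e, part_of) for e in arcs)                      # 5
control = {k: F_control(arcs, k, part_of) for k in (0, 1, 2)}     # {0: 4, 1: 3, 2: 3}
```

Here the control costs add up to 10, twice the area cost of 5, as the relation predicts for this instance.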

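The mathematical formulation of the proposed partitioning can be given as the following constrained optimization:

Problem 1

\[
\min_{P} \; F_{Area}(P, G) \quad \text{subject to} \quad F_{Control}(V_i, P, G) \le c \quad \forall V_i \in P
\]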

where c is an upper limit for the number of FIFOs controlled by one control unit (proposed in Section 4.1.2). Note that two partition classes containing the global input and output nodes of the data-flow graph are fixed to avoid a trivial solution in which all vertices belong to the same partition class. Also note that the data-flow graphs where the number of global I/Os is less than the previously described upper limit can be implemented with one control unit efficiently, and there is no need for optimization.
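For very small graphs the constrained optimization can be explored by exhaustive enumeration, which helps to see how the limit c shapes the result. The sketch below is not the partitioning method of this work; it is a brute-force illustration under several assumptions: the area cost is the minimized objective, the input and output nodes sit in fixed classes labelled "in" and "out", the limit c is checked only for the non-fixed classes, and the function name solve_brute_force and the example expression (a + b) * (c + d) are hypothetical.

```python
from itertools import product

def f_area(arc, part_of):
    source, targets = arc
    return len({part_of[source]} | {part_of[t] for t in targets}) - 1

def f_control(arc, cls, part_of):
    source, targets = arc
    if part_of[source] == cls:
        return f_area(arc, part_of)
    return 1 if any(part_of[t] == cls for t in targets) else 0

def solve_brute_force(arcs, fixed, free_nodes, labels, limit):
    """Try every assignment of free nodes to labels; return (area, part_of) or None."""
    best = None
    for choice in product(labels, repeat=len(free_nodes)):
        part_of = dict(fixed)
        part_of.update(zip(free_nodes, choice))
        # Assumption: only the non-fixed classes are subject to the limit.
        if any(sum(f_control(e, k, part_of) for e in arcs) > limit for k in labels):
            continue
        area = sum(f_area(e, part_of) for e in arcs)
        if best is None or area < best[0]:
            best = (area, part_of)
    return best

# Hypothetical data-flow graph of y = (a + b) * (c + d)
arcs = [
    ("a", {"add1"}), ("b", {"add1"}),
    ("c", {"add2"}), ("d", {"add2"}),
    ("add1", {"mul"}), ("add2", {"mul"}),
    ("mul", {"y"}),
]
fixed = {"a": "in", "b": "in", "c": "in", "d": "in", "y": "out"}
free_nodes = ["add1", "add2", "mul"]

loose = solve_brute_force(arcs, fixed, free_nodes, [0, 1], limit=5)
tight = solve_brute_force(arcs, fixed, free_nodes, [0, 1], limit=4)
```

With a limit of 5 a single class can hold all three operations at an area cost of 5. Lowering the limit to 4 forces one adder into a separate class and raises the area cost to 6, and with a limit of 3 no feasible solution exists with only two free classes. The enumeration grows exponentially with the number of free nodes, so it is usable only as a reference for tiny instances.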

connections of each net of the circuit. According to the objective functions used in the partitioning, the netlist representation can be reduced to a simpler representation (e.g. hypergraphs, directed graphs). Netlists can be represented by hypergraphs if vertices and hyperedges are assigned to circuit modules and nets, respectively. In this case, pins belonging to the same module are not distinguished and hyperedges naturally represent nets, as they can connect more than two modules. The hypergraph representation can be further simplified to graphs; however, in this case hyperedges have to be modeled by extra edges and vertices [28]. In the clique net model, each hyperedge is replaced by edges connecting each pair of vertices incident to the given hyperedge. In the directed graph model, each source vertex of a net/hyperedge is connected to each target vertex of the hyperedge via a new edge. In the bipartite graph model, each hyperedge is replaced with a new vertex, and an old and a new vertex are connected if the old vertex was incident to the corresponding hyperedge. Unfortunately, for most partitioning objectives, hyperedges cannot be equivalently replaced [28].
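The three graph models described above can be sketched as small conversion routines. The code assumes single-source hyperedges stored as (source, targets) pairs; the function names and the ("net", i) labels used for the extra vertices of the bipartite model are illustrative assumptions.

```python
from itertools import combinations

def clique_model(arcs):
    """Clique net model: one edge for each pair of vertices incident to a hyperedge."""
    edges = set()
    for source, targets in arcs:
        pins = sorted({source, *targets})
        edges.update(combinations(pins, 2))
    return edges

def directed_model(arcs):
    """Directed graph model: one edge from the source to each target."""
    return {(source, t) for source, targets in arcs for t in targets}

def bipartite_model(arcs):
    """Bipartite model: one new vertex per hyperedge, linked to its incident vertices."""
    edges = set()
    for i, (source, targets) in enumerate(arcs):
        net = ("net", i)
        for v in {source, *targets}:
            edges.add((v, net))
    return edges

# One hypothetical net with a single source and two targets
arcs = [("a", {"add", "sub"})]
```

For this single net, the clique model yields three undirected edges, the directed model yields two edges leaving a, and the bipartite model yields three edges to the new net vertex.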

Similarly, in the case of the F_Area objective, I have not found a replacement for hyperedges that does not alter the F_Area objective function. From the aspect of the proposed optimization problem, both the fan-out and the directions of nets are important, which explains the application of the more complex hypergraph model.
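A single hand-made net is enough to illustrate why plain edge-cut counting on the simplified graph models need not reproduce the area cost. The sketch below compares the area cost of one hyperarc with the number of cut edges in its directed and clique replacements; it is only an illustration on one assumed example and not a proof that no equivalent replacement exists.

```python
from itertools import combinations

def f_area(arc, part_of):
    """Area cost: number of classes touched by the arc, minus one."""
    source, targets = arc
    return len({part_of[source]} | {part_of[t] for t in targets}) - 1

def cut_edges(edges, part_of):
    """Number of edges whose end points lie in different classes."""
    return sum(1 for u, v in edges if part_of[u] != part_of[v])

# Hypothetical net: source in class 0, both targets in class 1
arc = ("a", {"add", "sub"})
part_of = {"a": 0, "add": 1, "sub": 1}

directed = [(arc[0], t) for t in arc[1]]
clique = list(combinations(sorted({arc[0], *arc[1]}), 2))

fifo_count = f_area(arc, part_of)           # 1: one FIFO serves the whole target class
directed_cut = cut_edges(directed, part_of) # 2
clique_cut = cut_edges(clique, part_of)     # 2
```

Both targets share one FIFO because they lie in the same class, yet each graph model reports two cut edges, so the fan-out of the net is counted differently.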

Finding an optimal solution for Problem 1 is NP-complete because if we could solve this problem in polynomial time, it would yield a polynomial algorithm for the "graph partitioning problem" (Problem ND15 in [29]), which is known to be NP-complete. For a proof, note that any ordinary graph is also a hypergraph and the effects of I/O constraints can be eliminated by selecting a very large upper limit.

Source: Csaba Nemes, Efficient implementation of computationally intensive algorithms on parallel computing platforms, pp. 58-61.