# Congestion Reduction during Placement with Provably Good Approximation Bound<sup>1</sup>

Xiaojian Yang<sup>†</sup> Maogang Wang<sup>‡</sup> Ryan Kastner<sup>§</sup> Soheil Ghiasi<sup>¶</sup> Majid Sarrafzadeh<sup>¶</sup>

- <sup>†</sup> Synplicity Inc. Sunnyvale, California
- Cadence Design Systems Inc. San Jose, California
- § Department of Electrical and Computer Engineering University of California, Santa Barbara
- Computer Science Department University of California, Los Angeles

This paper presents a novel method to reduce routing congestion during placement stage. The proposed approach is used as a post-processing step in placement. Congestion reduction is based on local improvement on the existing layout. However, the approach has a global view of the congestion over the entire design. It uses integer linear programming (ILP) to formulate the problem of conflicts between multiple congested regions, and performs local improvement according to the solution of the ILP problem. The approximation algorithm of the formulated ILP problem is studied and good approximation bounds are given and proved. Experiments show that the proposed approach can effectively alleviate the congestion of global routing result. The low computational complexity of the proposed approach indicates its scalability on large designs.

Categories and Subject Descriptors: B.7.2 [Integrated Circuits]: Design Aids

 $General \ Terms: \ Algorithms, \ Experimentation$ 

Additional Key Words and Phrases: Physical Design, Placement, Routability, Congestion

## 1. INTRODUCTION

As VLSI system complexity continues to increase, physical design is getting more and more difficult. Traditional placement tools focus on minimizing total wirelength to obtain better routability and smaller layout area [1; 2; 3]. Despite the pervasive use of half-perimeter wirelength objective, there is a mismatch between wirelength and congestion objectives in placement [4]; a placement with less total wirelength does not necessarily

<sup>&</sup>lt;sup>1</sup>This work was support in part by the National Science Foundation (NSF) grant CCR-0090203. A preliminary version of this paper appeared in the Proceedings of 2001 International Conference on Computer-Aided Design (ICCAD).

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, February 2003, Pages 1–17.

correspond to a better layout after routing. Congestion – an important objective indicating routability – has not drawn enough research attention in placement related studies. Dealing with congestion is widely addressed in routing algorithms. However, in most cases, a portion of routing violation cannot be removed given fixed cell locations. It is of value to consider routability in placement stage where the effort on congestion reduction would be more effective [5]. Two recently proposed congestion estimation models, a Rent's rule based model [6] and a probabilistic model [7], showed the trend of addressing congestion problem at early design stages.

In [8], a routability model was proposed and incorporated in the annealing based placement. The use of the model effectively reduces the congestion. However, the proposed approach discards the extensive research work on wirelength minimization, and it significantly degrades placement speed. A multi-partitioning technique using pre-determined Steiner trees was introduced in [9]. The restriction on the number of partitions confines the performance of the approach. A congestion driven placement approach was proposed in [10]. It uses area router to evaluate local congestion during placement. Several other approaches, e.g. [11; 12; 13], also incorporate routing within placement. In practice, combining global router and placer is an effective way to improve routability, yet researchers study on more efficient approaches to handle the increasing design size.

A recent study [12] shows that a post-processing technique is effective to minimize congestion because the congestion correlates with wirelength in a global view. However, reducing congestion after a wirelength-driven placement is a non-trivial problem. Traditionally, people perturb existing placement within a window around the congested area [11; 13]. Local improvement within small windows has limited effect, whereas expanding search windows will cause interactions between congested areas, resulting in unpredictable results.

This paper presents a novel technique based on integer linear programming (ILP) to alleviate congestion in placement. The proposed approach is used as a post-processing step during detailed placement stage. We study the difference between placement congestion and routing congestion, propose the congestion expansion technique to reduce congestion, and transform the expansion problem of multiple congested areas into an ILP problem. We also discuss the approximation algorithms for the proposed ILP problem and provide good alternative approach to solve the ILP problem efficiently. Our goal is to achieve a less congested layout after global routing by modifying the existing placement. To demonstrate the effectiveness of the proposed method, we use the overflow after global routing as a measurement of the placement quality. The proposed approach effectively reduces the overflow of global routing, as well as the total routed wirelength.

The rest of this paper is organized as follows. Section 2 gives preliminaries. Section 3 describes the routing estimation and congestion measurement used in this work. In section 4, we introduce an ILP based algorithm which alleviates congestion during the detailed placement. Additionally, an approximation algorithm to solve the ILP problem is proposed and analyzed. Section 5 presents experimental results to show the effectiveness of the approach. We conclude in section 6.

# 2. PRELIMINARIES

A *circuit* can be modeled with a hypergraph G(C, N), where C is a set of cells and N is a set of nets. A *net*  $e \in N$  is a subset of C which contains two or more cells. A *placement* is

a set of locations for all cells within a rectangular chip area.

During the detailed placement and global routing, the core area is divided into  $m \times n$  rectangular *global bins*. For standard-cell designs, we set *n* to the number of standard-cell rows; *m* is set so that the average number of cells per global bin is less than 5.

The boundaries of the global bins are *global bin edges*. For each global bin edge *e*, the *routing demand* d(e) is the number of wires that cross this boundary; the *routing supply* s(e) is the number of wires that are allowed to cross the boundary. The *overflow* of a boundary, *overflow*(b), is max(d(b) - s(b), 0). The *total overflow* of a design is the sum of the overflows over all the global bin edges of the design. A lower overflow usually indicates better routability of the circuit.

The *bounding box* of a net is the minimum rectangle which contains all the cells belonging to this net. The *total bounding box wirelength* of a design is the summation of the half-perimeter of the bounding box over all the nets. If we assume that the width and the height of global bins are unit length, the *normalized total bounding box wirelength* of a design has similar definition with total bounding box wirelength, but measured by global bin grid units. The *total routed wirelength* is the sum of actual wirelength over all the nets (including each wire segment), measured by global bin grid units.

# 3. CONGESTION IN PLACEMENT

# 3.1 Routing Estimation

To evaluate the congestion during placement, fast and accurate routing estimation is required [8; 7]. Selecting a routing estimation model highly depends on the internal mechanism of global router. For a general maze router, three conventional routing estimation models are widely used. They are bounding box model, star model and minimum spanning tree (MST) model. The MST model is accurate but it is also computationally expensive. Bounding box model requires the least computation for updating. It also generates reasonable estimation. In this work we adopt the bounding box model of Cheng [8], as illustrated in Fig. 1.



Fig. 1. Bounding box routing estimation model

For each global bin b(i, j) at column *i* and row *j*, let  $C_{ij,h}^{(k)}$  denote the number of horizontal wire crossings on the right edge of global bin b(i, j) induced by net *k*. Similarly, let  $C_{ij,v}^{(k)}$  denote the number of vertical wire crossings on the bottom edge of global bin b(i, j) induced by net *k*. If we use xmin(k), xmax(k), ymin(k) and ymax(k) to describe the bounding box of net *k*, we have,



where q(k) is a compensation factor defined in [8]. The bounding box wirelength underestimates the actual wiring for nets with more than three terminals. Therefore q(k) has been introduced in order to model multi-terminal nets. q(k) depends on the number of terminals of net k. q(k) is 1 for 2-terminal or 3-terminal nets, and slowly increases to 2.79 for nets with 50 terminals.

With the routing estimation for each net, we can calculate the total estimated number of crossings for global bin edges. For each global bin b(i, j), the routing demand of its right and bottom edge are:

$$C_{ij,h} = \sum_{k=1}^{N} C_{ij,h}^{(k)}$$
$$C_{ij,v} = \sum_{k=1}^{N} C_{ij,v}^{(k)}$$

# 3.2 Congestion Cost

We study the standard cell placement problem without consideration of macro cells. Therefore we assume uniformly distributed routing tracks for the entire core area. Let  $Cap_h$  and  $Cap_v$  be the number of tracks for vertical and horizontal global bin edges, respectively. For bin b(i, j), the overflow of the right edge  $OF_{ij,h}$  is  $max(C_{ij,h} - Cap_h, 0)$ , and the overflow of the bottom edge  $OF_{ij,v}$  is  $max(C_{ij,v} - Cap_v, 0)$ .

The congestion cost function of the design can be modeled using overflow only. A more reasonable cost function would be a combination of wirelength and overflow (for a study of different congestion cost functions, see [14]). In this work, we employ a combination of wirelength and quadratic function of overflow. The horizontal congestion of the design  $COST_h$  is,

$$COST_{h} = \sum_{i=1}^{m-1} \sum_{j=1}^{n} (C_{ij,h} + OF_{ij,h}^{2})$$

The vertical congestion  $Cost_v$  is:

$$COST_{\nu} = \sum_{i=1}^{m} \sum_{j=1}^{n-1} (C_{ij,\nu} + OF_{ij,\nu}^2)$$

The total congestion cost *COST* is the sum of  $COST_h$  and  $COST_v$ .

Additionally, the total overflow *OF* of the layout is the sum of overflow over all the global bin edges:

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, February 2003.

$$OF = \sum_{i=1}^{m-1} \sum_{j=1}^{n} OF_{ij,h} + \sum_{i=1}^{m} \sum_{j=1}^{n-1} OF_{ij,v}$$

# 3.3 Bin Congestion Degree

To identify the congested area, we define a *congestion degree*  $C_{ij}$  for each global bin b(i, j) as the average relative congestion of its four edges.

$$C_{ij} = \frac{1}{4} \left( \frac{C_{ij,h}}{C_{avg,h}} + \frac{C_{(i-1)j,h}}{C_{avg,h}} + \frac{C_{ij,\nu}}{C_{avg,\nu}} + \frac{C_{i(j-1),\nu}}{C_{avg,\nu}} \right)$$

where  $C_{avg,h}$  and  $C_{avg,v}$  are the average numbers of horizontal and vertical crossings for all of the bin, respectively. They are obtained by,

$$C_{avg,h} = \frac{1}{(m-1)n} \sum_{i=1}^{m-1} \sum_{j=1}^{n} C_{ij,h}$$
$$C_{avg,v} = \frac{1}{m(n-1)} \sum_{i=1}^{m} \sum_{j=1}^{n-1} C_{ij,v}$$

## 4. CONGESTION REDUCTION IN DETAILED PLACEMENT



Fig. 2. Congested areas expand from placement to routing. (a) estimated routing congestion in placement. (b) congestion after maze-routing using large capacity without ripup and re-route. (c) congestion after maze-routing using tight capacity with rip-up and re-route. Note that the congestion threshold in placement is higher than that in routing, resulting in the expansion of the congested areas.

We focus on the integration of the congestion reduction technique into the detailed placement stage for the following reasons. Firstly, the total routing demand of the design globally correlates to the total wirelength. Minimizing total wirelength indirectly reduces congestion. Traditionally people use the half perimeter of bounding box to estimate routed wirelength of a net. Extensive research on minimizing bounding box wirelength can be utilized in global congestion reduction.<sup>2</sup> Secondly, congestion reduction in detailed place-

 $<sup>^{2}</sup>$ The authors also believe that a congestion-driven global placement (different from minimizing wirelength) would be more effective, and it should draw research attention as well.

ment costs less in terms of computation time. In most cases, combining the routing estimation into global placement will dramatically degrade the placement speed. Considering that a main portion of placement time should be allocated to timing issues, the excessive computational time for congestion reduction is discouraged. Finally, in detailed placement stage, more accurate layout information has been revealed. Therefore the routing estimation more accurate; the reduction of the estimated congestion is more likely to be transformed to the reduction of routing congestion.

#### 4.1 Congested Region Expansion

Based on routing estimation, there are two ways to identify a congested area in placement. We can define the congested bin as a global bin with a congestion degree (described in Section 3.3) greater than a certain threshold value. Or we can define the congested bin if at least one of its edges is congested, i.e. the overflow of at least one edge is greater than zero. Congested regions are unevenly distributed throughout the chip area.

As expected, a congested region in placement will shrink after global routing, since routers "intelligently" handle congestion. However, if we change the point of view, by setting a higher threshold value in placement than in routing, the congested region is actually enlarged due to the detours in the congested spot. The tighter the routing resources, the larger the congested area. Fig. 2 gives an example of this phenomenon. It is the expansion which makes the placement congestion problem hard: the effort on reducing the congestion in placement may be unnecessary to routing, or may cause new congested area.

# 4.2 Multiple Congested Area Expansion based on Integer Programming

The expansion of a congested region suggests: (a) the congestion reduction should be performed within a larger region than the congested region; (b) certain techniques are required to handle the conflicts between the expansions of multiple congested regions. We name the congestion optimization region the *expansion area*. For a single congested region, it is desired to use a larger expansion area so the congestion can be effectively reduced. However, we should bound the expansion area since a larger expansion area requires longer running time. Additionally, the expansion area of one region may overlap with that of another region if these two congested regions are close. This may cause unexpected congested regions. A larger expansion area increases the likelihood of the existence of the overlapped regions. An arbitrative mechanism is needed to determine the expansion range for each congested region. We transform this arbitrating problem into an integer programming problem.

Assume we have K congested regions. Two rectangular expansion areas are assigned to each congested region, a smaller one and larger one. These expansion areas are overlapped as shown in Fig. 3. If we try to reduce congestion for one congested region without consideration of the other congested region's expansion, two expansion areas may overlap and a new congested region may be created.

The problem is to find a combination of expansion schemes for all of the congested regions, such that the maximum congestion over the core area is minimized. First, we use a simplified model to describe the expansion. As shown in Fig. 4, a congested region has two expansion areas:  $E_0$  and  $E_1$ . For  $E_0$ , the expected average congestion degrees (or average density)  $d_0$  is:

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, February 2003.

6



Fig. 3. Overlaps between expansion areas for multiple congested regions



Fig. 4. Two expansion areas for a congested region

$$d_0 = \frac{1}{A_0} \sum_{(i,j) \in E_0} C_{ij}$$

where  $A_0$  is the area of  $E_0$ .

Similarly, the expected average congestion degree  $d_1$  for area  $E_1$  is:

$$d_1 = \frac{1}{A_1} \sum_{(i,j)\in E_1} C_{ij}$$

where  $A_1$  is the area of  $E_1$ .

For any global bin b(i, j) in the expansion area  $E_0$ , its congestion degree before and after expansion scheme 0 are  $C_{ij}$  and  $d_0$ , respectively. Thus, for congested region k, we define the *incremental degree* for any global bin b(i, j) at expansion scheme 0:

$$d_{ij}^{k,0} = \begin{cases} d_0 - C_{ij} & \text{if } b(i,j) \in E_0 \\ \\ 0 & \text{otherwise} \end{cases}$$

The incremental degree for any global bin b(i, j) at expansion scheme 1:

$$d_{ij}^{k,1} = \begin{cases} d_1 - C_{ij} & \text{if } b(i,j) \in E_1 \\ 0 & \text{otherwise} \end{cases}$$

For each congested region k, there is a corresponding binary variable  $x_k$ .  $x_k$  is 0 if the expansion  $E_0$  is chosen, or 1 if  $E_1$  is chosen.

The expansion scheme problem can be transformed into a 0-1 integer linear program (ILP) problem:

inimize 
$$C_{max}$$
 (1)

s.t. 
$$C_{ij} + \sum_{k=1}^{K} d_{ij}^{k,0}(1-x_k) + d_{ij}^{k,1}x_k \le C_{max}$$
 (2)

$$i = 1, ..., m$$
  
 $j = 1, ..., n$   
 $x_k \in \{0, 1\}$   $k = 1, ..., K$  (3)

where  $C_{max}$  is the maximum congestion degree over all the global bins. For each global bin there is one constraint. If a global bin b(i, j) is located in the expansion area  $E_0$  of congested region k, a term  $d_{ij}^{k,0}(1-x_k) + d_{ij}^{k,1}x_k$  will be added to its constraint. If bin b(i, j) is located in the expansion area  $E_1$  but not  $E_0$ , an item  $d_{ij}^{k,1}x_k$  will be generated. If bin b(i, j) is located in neither of these two areas, no constraint is created from the congested region k. If a bin is not covered by any of the congested regions, it has a simple constraint:  $C_{ij} \leq C_{max}$ .

The transformed ILP problem can be optimally solved if the number of congested regions is limited. The problem solution determines the expansion scheme for each congested region. Local congestion reduction will be performed within the pre-determined expansion areas.

# 4.3 Approximation Algorithm for ILP

m

In this section, we will discuss the approximation approach of the formulated ILP problem. When the number of congested regions is large (e.g. > 100), the ILP problem can not be solved efficiently. Even if it is a moderate number (e.g.  $\sim 50 - 60$ ), a certain amount of running time still prevents the approach to be used in the inner loop of the optimization flow. We need to solve the transformed ILP problem as quickly as possible. On the other hand, we do not require the optimality of the solution because the solution only provides an estimated target for congestion reduction approach. Therefore we look into approximations of the original problem.

One approach to solve this problem is using the linear programming relaxation and *threshold rounding*. First we relax the integrality constraint  $x_k \in \{0, 1\}$  for all k and obtain the relaxation linear programming (LP) problem in which  $x_k \in [0, 1]$  for all k. This LP problem can be solved efficiently. We solve this relaxed LP problem to obtain the fractional solution  $\hat{x}_k$ . Then we use threshold rounding to obtain a solution of original ILP problem: for all k, if  $\hat{x}_k \ge 0.5$ , set  $x_k$  to 1, otherwise set it to 0. Let  $C^*$  denote the optimal value of the objective function for the original ILP problem, and let  $C_{app}$  denote the objective value obtained by the approximation approach (relaxation and rounding), We have:

THEOREM 4.1. For the ILP problem described by (1), (2) and (3), the relaxation and threshold rounding approach finds a solution with an objective value  $C_{app}$  such that  $C_{app} \leq 2C^*$ , where  $C^*$  is the optimal solution for the ILP problem.

PROOF. Consider the left side of the constraint inequality in the ILP problem. For each global bin b(i, j), each congested region k corresponds to a portion of the constraint:

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, February 2003.

8

 $d_{ij}^{k,0}(1-x_k) + d_{ij}^{k,1}x_k$ . Let  $\hat{D}_{ij}^k$  denote the value of this portion before rounding and  $D_{ij}^k$  denote the value after rounding. Assume for the optimal solution of relaxed LP problem,  $\hat{x}_k < 0.5$  and  $x_k$  is set to 0, we have:

$$\frac{D_{ij}^{k}}{\hat{D}_{ij}^{k}} = \frac{d_{ij}^{k,0}(1-x_{k}) + d_{ij}^{k,1}x_{k}}{d_{ij}^{k,0}(1-\hat{x}_{k}) + d_{ij}^{k,1}\hat{x}_{k}} \leq \frac{d_{ij}^{k,0}(1-\hat{x}_{k}) + d_{ij}^{k,1}\hat{x}_{k}}{d_{ij}^{k,0}(1-\hat{x}_{k}) + d_{ij}^{k,1}\hat{x}_{k}} \leq \frac{d_{ij}^{k,0}}{d_{ij}^{k,0}(1-\hat{x}_{k})} \leq 2$$

Similarly, for  $\hat{x}_k \ge 0.5$  and  $x_k = 1$ , the ratio between  $D_{ij}^k$  and  $\hat{D}_{ij}^k$  is also less than or equal to 2. Therefore,

$$\frac{C_{ij} + \sum_{k=1}^{K} D_{ij}^{k}}{C_{ij} + \sum_{k=1}^{K} \widehat{D}_{ij}^{k}} \leq 2$$
  
 $i = 1, ..., m \quad j = 1, ..., n$ 

Since the objective function  $C_{max}$  is determined by the maximum left side of inequality constraint for all *i* and *j*, the value of objective function after rounding  $C_{app} \leq \hat{C}$ . Here  $\hat{C}$  is the optimal objective of the relaxed LP problem. Since the solution of the relaxed LP problem is the dual bound of the original ILP problem, i.e.  $\hat{C} \leq C^*$ , we have  $C_{app} \leq 2C^*$ 

Theorem 4.1 gives a 2-approximation algorithm for the proposed ILP problem because the algorithm has a performance ratio<sup>3</sup> at most 2. In practice, there exists better approximation feature for the algorithm. This is because the solution of the relaxed LP problem is not uniformly distributed over [0, 1]. Fig. 5 shows the solution distribution of  $\hat{x}_k$  for a series of problems. The figure is obtained in the following way. For a given circuit, we assign two expansion plans for each congested region, the ILP problem is then formulated, relaxed and solved. We repeat this process by setting different expansion plans. Finally, we count the total number of variables for all of the problems, and number of variables within a given range.

According to the experimental observation, we may use a distribution model to describe the  $x_k$  distribution. Here we adopt a square distribution f(x) as illustrated in Fig. 6. This is not an accurate model to describe the solution feature, however, we will show later that it is useful if it bounds the real distribution.

Assume the probability distribution function is  $f(x) = c(x - 1/2)^2$ . *c* is a constant such that

$$\int_0^1 c(x - \frac{1}{2})^2 dx = 1 \tag{4}$$

From (4) we obtain c = 12. Thus,  $f(x) = 12(x - 1/2)^2$ . For any given x', 0 < x' < 1/2, the probability  $p_1$  that  $x_k \le x'$  or  $x_k \ge 1 - x'$  can be computed by

<sup>&</sup>lt;sup>3</sup>Performance ratio of an algorithm is the ratio between the solution delivered by this algorithm and the optimal solution.



Fig. 5. Relaxed ILP solution distribution. Each bar shows the percentage of variables that have values in the corresponding range, e.g., the left most bar shows that there are 35% of variables with values between 0 and 0.05; the second left bar shows that there are 2% of variables with values between 0.05 and 0.10, etc.



Fig. 6. Assumed distribution of the solution for the relaxed ILP

$$p_1 = 2 \int_0^{x'} f(x) dx$$
 (5)

We thus have

THEOREM 4.2. For the ILP problem described by (1), (2) and (3), if the solution distribution  $f(x) = 12(x - 1/2)^2$ , the relaxation and threshold rounding approach finds an  $\alpha$ -approximate solution, with probability at least p. The relationship between  $\alpha$  and p is

$$p = \left(8\left(\frac{\alpha-1}{\alpha}\right)^3 - 12\left(\frac{\alpha-1}{\alpha}\right)^2 + 6\left(\frac{\alpha-1}{\alpha}\right)\right)^K$$

where K is the number of variables in the ILP problem.

PROOF. According to (5) and f(x), for a given variable  $x_k$  and a given value x' (0 < x' < 1/2), the probability p' with  $x_k \le x'$  or  $x_k \ge 1 - x'$  is

$$p' = 8x'^3 - 12x'^2 + 6x' \tag{6}$$

When  $x_k \le x'$ , in the worst case, rounding  $x_k$  to 0 will increase the left side of a constraint from  $d_0(1-x_k) + d_1x_k$  to  $d_0$ . So, the ratio between the constraint value with or without rounding is:

$$\frac{d_0}{d_0(1-x_k)+d_1x_k} \le \frac{1}{1-x_k} \le \frac{1}{1-x'}$$

The similar case occurs when  $x_k \ge x'$ . The ratio is:

$$\frac{d_1}{d_1(1-x_k)+d_1x_k} \le \frac{1}{x_k} \le \frac{1}{1-x'}$$

So the threshold rounding approach is an  $\alpha$ -approximation algorithm and the approximation factor

$$\alpha = \frac{1}{1 - x'}$$

Thus, we have

$$x' = \frac{\alpha - 1}{\alpha}$$

Plug it into (6), we have

$$p' = 8(\frac{\alpha-1}{\alpha})^3 - 12(\frac{\alpha-1}{\alpha})^2 + 6(\frac{\alpha-1}{\alpha})$$

This is the probability for one variable. The probability p that all K variables are less than x' or greater than 1 - x' is  $p'^{K}$ .  $\Box$ 

Theorem 4.2 gives the probability of producing a bounded solution. For instance, the probability of producing a 1.5-approximate solution for 40 variables is 0.22.

Noting that the actual distribution does not necessarily obey the square distribution, we have

COROLLARY 4.3. Theorem 4.2 holds for any distribution g(x) if

$$\int_0^{x'} g(x)dx \ge \int_0^{x'} f(x)dx, \quad and$$
$$\int_{1-x'}^1 g(x)dx \ge \int_{1-x'}^1 f(x)dx \quad x' = \frac{\alpha - 1}{\alpha}$$

The analysis on the approximation algorithm for the ILP problem is not directly used in our congestion reduction approach. However, it reveals the problem features and suggests a fast approximation method to solve the ILP problem. In some specific cases where the ILP problem contains too many variables, and can not be solved efficiently, the approximation algorithm can be used to obtain a reasonable solution within a short amount of time.

# 4.4 Congestion Reduction in Detailed Placement

Based on the congestion analysis, we propose an approach to alleviate the routing congestion in detailed placement stage. The entire flow of the approach is described by Algorithm 1.

Algorithm 1 Congestion Reduction Algorithm for Multiple Congested Area

| Input: Circuit G and a detailed placement $P_0$ ,                                            |
|----------------------------------------------------------------------------------------------|
| Output: Placement $P_1$ with alleviated congestion                                           |
| Snap cells into global bins according to their current position;                             |
| for all net $n \in N$ do                                                                     |
| Do routing estimation for net <i>n</i> ;                                                     |
| end for                                                                                      |
| Calculate average horizontal/vertical routing demand over all of the edges: Cave, h, Cave, v |
| for all global bin $b(i, j)$ , $i = 1, \dots, m, j = 1, \dots, n$ do                         |
| Assign an estimated congestion degree $C_{ij}$ to this bin;                                  |
| end for                                                                                      |
| Identify congested regions;                                                                  |
| Assign two expansion ranges for each congested region;                                       |
| Formulate the ILP problem and solve it;                                                      |
| Determine expansion range for each congested region by ILP;                                  |
| for all congested region do                                                                  |
| Expand this region according to pre-determined expansion range;                              |
| Do local improvement within the range to reduce congestion;                                  |
| end for                                                                                      |
| Create detailed placement by spreading cells in each global bin.                             |

The congestion reduction approach starts from an existing placement after wirelength minimization. The core area is divided into uniform global bins. Cells are assigned into global bins according to their current position. Based on the routing estimation model and bin congestion degree described in Section 3, we have a congestion distribution map for the current placement.

The next step is to identify the congested regions. This is accomplished by picking a congested global bin as the seed, checking the neighborhood bins and including the congested bins into this group of congested bins. Then we use the minimum rectangle that contains these connected congested bins as one congested region. A new seed is then picked to form the next congested region. For some designs, many of congested bins are connected. A large congested region will be found based on the approach above. However, large congested regions may degrade the effect of the congestion reduction within its range. In this case, we set a maximum area for the congested regions to prevent the formation of too large congested regions.

Once we have all of the congested regions, we assign two expansions to each region. The size of the expansion area is proportional to the original congested region. We use x% to denote the expansion scale of a congested region, i.e., the width and height of the expanded region will be x% longer than that of the original region. For each congested region, we have two expansion plans: a larger one and a smaller one. The selection between these two plans is made by formulating and solving the ILP problem described in Section 4.2.

The expansion area determines the range of local improvement for a congested region. The local improvement is based on cell swapping. A pair of cells are randomly chosen and swapped. Routing estimations of the nets that connect these two cells are re-evaluated. The swap will be accepted if the total congestion cost in the expansion area is lower after swapping, and will be rejected otherwise. In order to speed up the performance, cell swap-

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, February 2003.

12

ping is performed based on global bin structure, i.e. cells are located at the center of global bins. This may cause the cost function mismatch between the current placement and the final placement after resolving the overlaps. To limit the mismatch, row balance and bin balance are maintained during the congestion reduction<sup>4</sup>.

A *cell re-ordering* technique is used in this approach to enhance the congestion-driven swapping. This technique is described in Fig. 7. In Fig. 7(a) cell A has an interconnect to cell C. Swapping cell A and B will reduce the congestion cost on edge *e*. However, A and B may not be actually swapped. In Fig. 7(b), since cell A is wider than cell B, after swapping and resolving overlap, cell A is still in its original global bin. If we use the re-ordering technique to put cell A at the right side in the global bin when it enters the bin, the probability of a valid swap will be much higher.



Fig. 7. Cell re-ordering in congestion reduction approach: (a) Cell A and B are swapped to reduce congestion on edge *e*; (b) Swapping in global bin stage will be nullified in detailed placement; (c) Re-order cell A after swapping to ensure the effect of swapping.

Local improvement is performed for each congested region. After a given number of improvement iterations, the algorithm terminates by spreading cells in each global bin. A final detailed placement is generated and the global routing will be executed on this detailed placement.

## 5. EXPERIMENTAL RESULTS

The proposed approach has been implemented in C. All the experiments were done in Linux environment on a PC with a 733MHz CPU. The experimental circuits are chosen from IBM-PLACE benchmarks [15]. Table I shows the circuit statistics. Number of grids and vertical/horizontal capacities in global routing are also shown.

Table II shows the effect of congestion reduction as a post-processing. We compare two placement flows in the experiments. First, we place each circuit using a wirelengthdriven placer, Dragon [16], then route it using a global router Labyrinth [17] which is based on maze routing and rip-up and re-route. In the second run, we apply proposed congestion reduction approach on the same placement, followed by global routing using same parameters. We compare the global routing results (overflow and routed wirelength) for placements with or without congestion reduction. For each circuit, we intentionally adjust the vertical/horizontal capacity in global routing to obtain reasonable amount of overflow.

<sup>&</sup>lt;sup>4</sup>We use row balance factor 0.01 and bin balance factor 0.50 in this work.

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, February 2003.

| ckt   | #cells | #nets  | grids           | #c/bin | V/H Cap |
|-------|--------|--------|-----------------|--------|---------|
| ibm01 | 12,036 | 13,056 | 64× 64          | 2.9    | 12 / 14 |
| ibm02 | 19,062 | 19,291 | $80 \times 64$  | 3.7    | 22 / 34 |
| ibm03 | 21,924 | 26,104 | 80× 64          | 4.3    | 20 / 30 |
| ibm04 | 26,346 | 31,328 | 96× 64          | 4.3    | 20 / 23 |
| ibm05 | 28,146 | 29,647 | $128 \times 64$ | 3.4    | 42 / 63 |
| ibm06 | 32,185 | 34,935 | $128 \times 64$ | 3.9    | 19 / 34 |
| ibm07 | 45,135 | 46,885 | 192× 64         | 3.6    | 21/36   |
| ibm08 | 50,977 | 49,228 | 192× 64         | 4.1    | 21 / 32 |
| ibm09 | 57,746 | 59,454 | 256× 64         | 3.1    | 14 / 28 |
| ibm10 | 67,692 | 72,760 | $256 \times 64$ | 4.1    | 27 / 40 |

Table I. Tested circuit statistics, including number of cells, number of nets, number of global bins, average number of cells per bin and vertical/horizontal capacity in global routing.

|       | overf | low | BB-WL |       | grid-BB-WL |       | grid-routed-WL |       | runtime |
|-------|-------|-----|-------|-------|------------|-------|----------------|-------|---------|
| ckt   | NCR   | CR  | NCR   | CR    | NCR        | CR    | NCR            | CR    | (secs)  |
| ibm01 | 398   | 309 | 4.79  | 0     | 52742      | 0     | 76517          | -0.9% | 11      |
| ibm02 | 492   | 292 | 13.82 | -0.7% | 142240     | -0.1% | 204734         | -2.4% | 33      |
| ibm03 | 209   | 181 | 13.02 | 0     | 129654     | 0     | 185194         | -1.8% | 25      |
| ibm04 | 882   | 778 | 16.31 | +0.1% | 147943     | 0     | 196920         | -0.7% | 39      |
| ibm05 | 251   | 0   | 37.14 | 0     | 472110     | 0     | 689671         | -9.8% | 134     |
| ibm06 | 834   | 540 | 20.67 | -0.1% | 241014     | -0.1% | 346137         | -0.9% | 169     |
| ibm07 | 697   | 686 | 30.76 | +0.1% | 327253     | -0.1% | 449213         | +0.4% | 164     |
| ibm08 | 665   | 654 | 33.56 | -0.1% | 339442     | -0.1% | 469666         | -0.2% | 181     |
| ibm09 | 505   | 268 | 30.38 | -0.1% | 367378     | -0.1% | 481176         | -1.2% | 232     |
| ibm10 | 588   | 383 | 59.81 | +1.1% | 513215     | +1.2% | 679606         | -0.3% | 389     |

Table II. Global routing results comparison between placement without congestion reduction (NCR) and with CR. The total overflow, bounding box wirelength, normalized bounding box wirelength and normalized routed wirelength are compared. Lower overflows are shown in boldface. Wirelength with CR is relative to that without CR. Runtime is in CPU seconds.

As can be seen, the congestion reduction approach considerably reduces the total overflow after global routing. For the best case among all the results, circuit *ibm05*, the total overflow turns to zero by congestion reduction. As of total routed wirelength, almost all the circuits have a shorter wirelength after congestion reduction. This indicates that wirelength is not sacrificed due to the reduction of overflow. We also noted that the total bounding box wirelength (both actual or normalized grid wirelength) with or without congestion reduction do not differ significantly. This reveals: (a) the bounding box wirelength is no longer a good metric of routability in detailed placement, and (b) our proposed approach does small perturbation on the existing placement. Finally, the short amount of running time shows that the method can scale well on large circuits.

Table III shows the effect of ILP based expansion technique in congestion reduction. For a given circuit ibm02 which contains 9 congested regions, we use single expansion technique with different expansion areas from 10% to 100%. The double expansion technique is also applied with different combination of expansion areas. To solve the formulated ILP problem, we use ILP solver<sup>5</sup> or relax the problem and solve the corresponding LP problem. All of these methods are tested by reporting the total overflow after global routing.

<sup>&</sup>lt;sup>5</sup>CPLEX 7.0

ACM Transactions on Design Automation of Electronic Systems, Vol. V, No. N, February 2003.

| single expansion |     |       |     | double expansion |     |        |     |
|------------------|-----|-------|-----|------------------|-----|--------|-----|
|                  |     |       | ILP |                  | LP  |        |     |
| exp %            | OF  | exp % | OF  | exp %            | OF  | exp %  | OF  |
| 10               | 447 | 60    | 433 | 10/60            | 433 | 10/60  | 412 |
| 20               | 457 | 70    | 439 | 20/70            | 396 | 20/70  | 381 |
| 30               | 473 | 80    | 355 | 30/80            | 436 | 30/80  | 458 |
| 40               | 427 | 90    | 457 | 40/90            | 388 | 40/90  | 421 |
| 50               | 430 | 100   | 424 | 50/100           | 380 | 50/100 | 425 |

Table III. Comparison between single expansion and ILP based expansion for circuit ibm02 (9 congested regions). Overflow is reported for each expansion area x% in single expansion case, and for each combination of two expansion areas  $x_1$ %/ $x_2$ % in double expansion case. Both ILP and relaxed LP are used in double expansion case.

As can be seen, in general the double expansion scheme produces better placement compared to the single expansion, e.g., 20%/70% combination is better than either 20% or 70% expansion.



Fig. 8. Comparison of different layouts with or without congestion reduction for circuit ibm10. Overflown edges are shown in black. (a) layout without congestion reduction, overflow=588, (b) layout with congestion reduction, overflow=383.

Fig. 8 shows the effect of our congestion reduction approach on circuit ibm10. Fig. 8(a) is the overflown edge map after global routing without congestion reduction. Fig. 8(b) is the map with congestion reduction, containing less overflown edges.

# 6. CONCLUSION AND DISCUSSION

We have developed an algorithm to alleviate congestion during placement. In the congestion reduction process, routing estimation model is used to evaluate edge congestion and bin congestion. Congested spots on the design are relieved using the local improvement within a search window. Such a window size is determined by formulating and solving an ILP problem when dealing with multiple congested regions. We also study the approximation algorithm for the proposed ILP problem. Experimental results on overflow of routed designs show that the proposed method is an effective way to improve design routability.

In this paper, we focus on a post-processing congestion reduction problem. However, circuit performance is becoming more and more critical for modern designs. Meeting timing constraints and producing routable designs are two indispensable objectives in placement stage. It should be noted that congestion and timing are two highly related objectives

in placement. In many cases, addressing congestion issue would greatly help timing optimization. An obvious example is the following. If some regions of a placed circuit are highly congested, it is more likely to have many detoured wires in the final layout. Excessive detoured wires cause bad timing prediction in placement and nullify any timing-driven placement method. Another example is that reducing congestion is helpful for buffering, since the newly added buffers have little impact on the routability if the current congestion is under control. In some other cases, however, reducing congestion may pose negative effect on timing improvement. For better routability, some cells are moved out of high density region. This may cause longer interconnections on critical paths. An appropriate trade-off between congestion and timing optimization is desired in this case.

# 7. ACKNOWLEDGMENT

The authors would like to thank the reviewers for their detailed feedback. The authors are also grateful to Prof. Jason Cong, Prof. Miodrag Potkonjak, Prof. Lieven Vandenberghe for their valuable comments on the manuscript.

#### REFERENCES

A. E. Dunlop and B. W. Kernighan. "A Procedure for Placement of Standard Cell VLSI Circuits". *IEEE Transactions on Computer Aided Design*, 4(1):92–98, January 1985.

W. J. Sun and C. Sechen. "Efficient and Effective Placement for Very Large Circuits". *IEEE Transactions on Computer Aided Design*, 14(3):349–359, March 1995.

J. M. Kleinhans, G. Sigl, F. M. Johannes, and K. J. Antreich. "GORDIAN: VLSI Placement by Quadratic Programming and Slicing Optimization". *IEEE Transactions on Computer Aided Design*, 10(3):365–365, 1991.

A. E. Caldwell, A. B. Kahng, and I. L. Markov. "Can Recursive Bisection Alone Produce Routable Placements?". In *Design Automation Conference*, pages 477–482. IEEE/ACM, June 2000.

A. B. Kahng, S. Mantik, and D. Stroobandt. "Requirements for Models of Achievable Routing". In *International Symposium on Physical Design*, pages 4–11. ACM, April 2000.

X. Yang, R. Kastner, and M. Sarrafzadeh. "Congestion Estimation During Top-down Placement". In *International Symposium on Physical Design*, pages 164–169. ACM, April 2001.

J. Lou, S. Krishnamoorthy, and H. S. Sheng. "Estimating Routing Congestion using Probabilistic Analysis". In *International Symposium on Physical Design*, pages 112–117. ACM, April 2001.

C. E. Cheng. "RISA: Accurate and Efficient Placement Routability Modeling". In International Conference on Computer-Aided Design, pages 690–695, 1994.

S. Mayrhofer and U. Lauther. "Congestion-Driven Placement Using a New Multi-partitioning Heuristic". In *International Conference on Computer-Aided Design*, pages 332–335. IEEE, 1990.

P. N. Parakh, R. B. Brown, and K. A. Sakallah. "Congestion Driven Quadratic Placement". In *Design Automation Conference*, pages 275–278. IEEE/ACM, June 1998.

R. S. Tsay and S. C. Chang. "Early Wirability Checking and 2-D Congestion-Driven Circuit Placement". In *International Conference on ASIC*. IEEE, 1992.

M. Wang, X. Yang, and M. Sarrafzadeh. "Congestion Minimization During Top-Down Placement". *IEEE Transactions on Computer Aided Design*, 19(10):1140–1148, 2000.

M. Wang, X. Yang, K. Eguro, and M. Sarrafzadeh. "Multi-Center Congestion Estimation and Minimization During Placement". In *International Symposium on Physical Design*, pages 147–152. ACM, April 2000.

J. Cong and P. Madden. "Performance Driven Multi-Layer General Area Routing for PCB/MCM Designs". In *Design Automation Conference*, pages 356–361. IEEE/ACM, 1998.

ERLAB. "IBM-PLACE benchmark". http://er.cs.ucla.edu/benchmarks/ibm-place/.

M. Wang, X. Yang, and M. Sarrafzadeh. "Dragon2000: Fast Standard-cell Placement for Large Circuits". In *International Conference on Computer-Aided Design*, pages 260–263. IEEE, 2000.

ERLAB. "Labyrinth — A global router and routing development tool". http://www.cs.ucla.edu/~kastner/labyrinth/.

Received December 2001; revised December 2002; accepted Feburary 2003