# Reviewing 4-to-2 Adders for Multi-Operand Addition* 

Peter Kornerup<br>Dept. of Mathematics and Computer Science<br>Southern Danish University, Odense, Denmark<br>E-mail: kornerup@imada.sdu.dk


#### Abstract

Recently there has been quite a number of papers discussing the use of redundant 4-to-2 adders for the accumulation of partial products in multipliers, claiming one type to be superior to other types. This paper analyses the use of various 3- and 4-element redundant digit sets for radix 2, and compare their adder implementations using various encodings of the digits and carries. It is shown that theoretically they are equivalent, and differences in their implementations need only be very marginal. Another recent proposal for the use of the digit-set $\{0,1,2,3\}$, with a special 3-bit encoding of digits, is analyzed and some optimizations are shown, including the possibility of using a 2-bit encoding, simplifying the wiring of a multiplier tree. All these proposed designs are shown to be equivalent to a standard 4-to-2, carry-save adder, except possibly for a few signal inversions.


## 1. Introduction

When implementing fast multipliers in VLSI, a major part of area and time is spent on accumulating partial products using some kind of tree structures. Originally these were based on the use of full-adders (and occasionally at the ends some half-adders), reducing the sum of three rows to the sum of two rows, using either the Wallace- [17] or Dadda- organizations [2] of the tree structure. As these structures are not very regular to lay out due to the 3 -to- 2 structure, it was suggested to use 4-to- 2 adders which allow the use of binary tree structures. [18] seems to be the first to propose their use, based on the carry-save representation, where two addends are considered an encoding of a single operand represented using the redundant digit-set $\{0,1,2\}$. Thus the 4 -to- 2 adder can be considered an adder taking two such operands and adding them to produce the result in the same representation. This addition can be performed in digit parallel, and thus in constant time, using an array of such digit adders. Radix 2 signed-digits for 4 -to- 2 adders over the digit-set $\{-1,0,1\}$ were then later proposed in [15, 5]. Recently [3] proposed using the digit-set $\{0,1,2,3\}$ with a special 3 -bit encoding of the digits, and most recently [10] compared various 3- and 4 -element digit sets using some particular encodings, claiming significant differences in the implementation of these. Other publications discussing multiplier organizations are [16, 11, 6, 7, 8, 9, 12, 19], and plenty more.

After an introduction in Section 2 on the standard carry-save 4-to-2 adder, and some results on the use of signals with negative weights, Section 3 goes through a detailed review of a series of 4-to2 adders recently presented in [10]. These are all based on digit codings of the form $d= \pm 2 d^{h} \pm d^{l}$ for redundant digit-sets with 3 or 4 elements, and using a special technique denoted equal-weight grouping, or EWG. Here columns of bits of the same weight are accumulated, but producing a

[^0]sum represented in the various digit-sets. Results from SPICE simulations on the designs were presented, whereupon the authors then postulate some claims on the relative performances when using the investigated digit-sets.

The following Section 4 then shows that all these designs can be implemented using the very same basic 4-to-2 carry-save adder, just using alternative external connections and possibly a few inverters. Hence there are no principal differences between the use of the various digit sets, as opposed to the claims of the paper, nor to the use of the digit encodings proposed or the 4-element digit-sets.

Section 5 reviews some other recent results from [3], where the use of the digit-set $\{0,1,2,3\}$ and a special 3-bit encoding of that digit-set is employed. The authors claim to achieve fairly significant speed-ups by this encoding. By a detailed analysis and identification of some basic building blocks, it is shown that it is not necessary to use the particular 3-bit encoding to obtain the same effect, there is an equivalent 2 -bit encoding. It is found that a saving in the wiring can be obtained by reorganizing the building blocks. Finally it is then shown that this design is also equivalent to one based on the standard 4-to-2 carry-save adder. In Section 6 conclusions are drawn.

## 2. A basic building block.

We will start by looking at the basic functionality of a 4 -to- 2 adder, whose purpose in a multiplier (and other multi-operand addition problems) is reducing the sum of four binary operands to the sum of two. The operands can be partial products as generated and delivered in parallel at the leaves of the tree, or at the internal nodes of the tree they can be the result delivered as pairs of operands, forming the result of other 4-to-2 adders. In general we shall assume that such operands and results can have a specific positive or negative weight (is to be added or subtracted).

A pair of two bits of positive weight can be interpreted as a digit in the set $\{0,1,2\}$, the carrysave digit-set, employing the carry-save encoding:

$$
\begin{array}{llll}
0 & \sim 00 & & \\
1 & \sim 01 & \text { or } & 10  \tag{1}\\
2 & \sim 11 & &
\end{array}
$$

where it may be noted that the digit value 1 has two encodings.
Usually the 4 -to- 2 adder is considered an adder taking two such carry-save operands, and delivering the sum of these in the same carry-save representation. But alternatively interpreted at the digit level, it may also be considered a digit-set converter from the set $\{0,1,2\}+\{0,1,2\}=\{0,1,2,3,4\}$ into the set $\{0,1,2\}$, where such a conversion can be pictured in a diagram [4]:


The converter or 4-to-2 adder can be realized by a combination of two full adders as in Figure 1, where the tuple $\left(i_{1}, i_{2}\right)$, respectively $\left(i_{3}, i_{4}\right)$, is the carry-save encoding of the operands, and ( $o_{1}, o_{2}$ ) represents the result. Note that carries $\left(c_{i n}^{\prime}, c_{i n}^{\prime \prime}\right)$ similarly is a carry-save encoding of the incoming "double" carry, and $\left(c_{o u t}^{\prime}, c_{o u t}^{\prime \prime}\right)$ of the outgoing carry, corresponding to the previous conversion diagram.


Figure 1. 4-to-2 carry-save adder composed of two full adders, and "box" version.
A particularly efficient realization of a 4-to-2 adder was shown in [7], as illustrated in Figure 2, where the XOR-gates were implemented by MUXes using pass-transistors.


Figure 2. 4-to-2 carry-save adder built of multiplexers.
Note that the addition of the carry $c_{i n}^{\prime \prime}$ is obtained at no cost in Figures 1 and 2, corresponding to the second conversion in the conversion diagram above being without cost in logic.

The transistor implementation of the XOR-gates in [7] uses a "dual-rail" representation of signals, these being provided and used in true as well as in inverted form. This implies that inversion of a signal can be realized by twisting wires, and thus at no cost in logic.

For the following discussion we will use the simplified "box"-version of Figure 1, not being concerned with the actual implementation of the logic. Signals are marked with their relative weight.

In Figures 1 and 2, and according to the carry-save representation, the signals $i_{1}, i_{2}, i_{3}, i_{4}, c_{i n}^{\prime}, c_{i n}^{\prime \prime}$, $o_{1}$ and $o_{2}$ all have the weight 1 , while $c_{o u t}^{\prime}$ and $c_{o u t}^{\prime \prime}$ have weight 2 , corresponding to the equation

$$
\begin{equation*}
2\left(c_{\text {out }}^{\prime}+c_{\text {out }}^{\prime \prime}\right)+\left(o_{1}+o_{2}\right)=i_{1}+i_{2}+i_{3}+i_{4}+c_{\text {in }}^{\prime}+c_{\text {in }}^{\prime \prime} . \tag{2}
\end{equation*}
$$

We will in the following assume that $o_{2}=c_{i n}^{\prime \prime}$, since this allows $c_{i n}^{\prime \prime}$ to be added in at no cost. Thus we have the following defining equations

$$
\begin{align*}
o_{1} & =\left(i_{1}+i_{2}+i_{3}+i_{4}+c_{\text {in }}^{\prime}\right) \bmod 2 \\
o_{2} & =c_{\text {in }}^{\prime \prime} \\
c_{\text {out }}^{\prime}+c_{\text {out }}^{\prime \prime} & =\left(i_{1}+i_{2}+i_{3}+i_{4}+c_{\text {in }}^{\prime}\right) \operatorname{div} 2, \tag{3}
\end{align*}
$$

where the pair $c_{o u t}^{\prime}, c_{\text {out }}^{\prime \prime}$ provides a carry-save encoding of the combined carry in $\{0,1,2\}$.

To allow changes in the weights for other digit sets, following [1] we note this theorem:
Theorem 1 Let a binary signal $b \in\{0,1\}$ have associated weight $w$, so that the value of the signal is $v=w b$. Inverting the signal into $1-b$, while at the same time negating the sign of the weight, changes its value into $v^{\prime}=v-w$, i.e., the value is being biased by the amount $-w$.

Proof: Trivial since $v^{\prime}=(-w)(1-b)=w b-w=v-w$.
When changing the interpretation of input signals as representation of values, it is then necessary to perform equivalent changes in the interpretation of the output signals. E.g., when the the domain of input to an adder or a digit-set conversion is being biased, the output must be equivalently biased.

Now consider the base 2 digit-set $\{-1,0,1\}$ using the signed-digit encoding (also denoted borrow-save), where two bit strings are considered a string of digits:

$$
\begin{aligned}
& \times \not \times \times \cdot \cdots \times \text { Neg. weight } \\
& \times \otimes \times \cdot \cdot \cdot \times \times \text { Pos. weight }
\end{aligned}
$$

obtained by pairing bits using the digit encoding:

$$
\begin{array}{rlll}
-1 & \sim 10 \\
0 & \sim 00 & \text { or } &  \tag{4}\\
1 & \sim 01
\end{array}
$$

where the left-most bit has negative weight. The conversion diagram for the addition of two signeddigit operands then may look as:


If in Figure 1 we change the input so that $\left(i_{1}, i_{2}\right)$ and $\left(i_{3}, i_{4}\right)$ represent signed-digits, with $i_{1}$ and $i_{3}$ respectively having negative weight and the signals thus delivered inverted, then the input is being biased by -2 . The output must then according to equation (2) be equivalently biased, which is possible by changing the sign of $c_{o u t}^{\prime}$ by inverting its signal value. But then $c_{i n}^{\prime}$ must also change sign, which changes the bias in the input to -3 . Finally, to compensate we must change the sign of the weight of $o_{1}$, corresponding to the output $\left(o_{1}, o_{2}\right)$ now representing a signed-digit, hence the signals must now satisfy the following equations:

$$
\begin{aligned}
2\left(\left(1-c_{\text {out }}^{\prime}\right)+c_{\text {out }}^{\prime \prime}\right)+\left(\left(1-o_{1}\right)+o_{2}\right) & =\left(1-i_{1}\right)+i_{2}+\left(1-i_{3}\right)+i_{4}+\left(1-c_{\text {in }}^{\prime}\right)+c_{\text {in }}^{\prime \prime} \\
\text { or } 2\left(-c_{\text {out }}^{\prime}+c_{\text {out }}^{\prime \prime}\right)+\left(-o_{1}+o_{2}\right) & =-i_{1}+i_{2}-i_{3}+i_{4}-c_{\text {in }}^{\prime}+c_{\text {in }}^{\prime \prime}
\end{aligned}
$$

and the equivalent equations derived from (3). Inverting the signals appropriately in Figure 1 we obtain Figure 3.


Figure 3. 4-to-2 signed-digit adder obtained from a carry-save adder
Theorem 2 In a computational model where inversion is without cost in area and time, radix 2 signed-digit addition can be realized at exactly the same cost as carry-save addition.

Observe that if an array of such adders are connected (here vertically) to form an $n$-digit adder, then no inversions are needed between the individual adders. Similarly if a tree of such arrays are formed to perform multi-operand addition (here horizontal connections), then all the inversions internally in the tree can be eliminated.

Theorem 3 Multi-operand addition of radix 2 signed-digit operands can be implemented by a tree of carry-save adders, by inverting all negatively weighted signals on input as well as on output, but with no changes internally to the tree.

## 3. Codings of the form $\pm 2 d^{h} \pm d^{l}$.

Recently in [10] various radix 2 redundant digit-sets, using encodings of the form $\left(d^{h}, d^{l}\right)$ with $d= \pm 2 d^{h} \pm d^{l}$, were suggested and analyzed, employing a particular way of implementation denoted equal-weight grouping or EWG, to be described below. The digit-sets investigated were

$$
\begin{array}{rlrl}
\mathcal{D}^{(S D)} & =\{-1,0,1\} & \mathcal{D}^{(C S 2)}=\{0,1,2\} \\
\mathcal{D}^{\left(S D 3^{(-)}\right)} & =\{-2,-1,0,1\} & \mathcal{D}^{(C S 3)}=\{0,1,2,3\}  \tag{5}\\
\mathcal{D}^{\left(S D 3^{(+)}\right)} & =\{-1,0,1,2\} . & &
\end{array}
$$

The digit-sets $\mathcal{D}^{(S D)}$ and $\mathcal{D}^{\left(S D 3^{(-)}\right)}$are coded as $d=-2 d^{h}+d^{l}$, corresponding to a 2's complement encoding of the digit $d$. Since the digit-set $\mathcal{D}^{(S D)}$ does not include -2 , the bit-pattern $\left(d^{h}, d^{l}\right)=(1,0)$ is not valid in the $S D$ representation. The set $\mathcal{D}^{\left(S D 3^{(+)}\right)}$is realized by changing the sign of both components, i.e., $d=2 d^{h}-d^{l}$. The $\mathcal{D}^{(C S 3)}$ and $\mathcal{D}^{(C S 2)}$ digit-sets are coded with $d=2 d^{h}+d^{l}$, where the bit-pattern $\left(d^{h}, d^{l}\right)=(1,1)$ is invalid in the $\mathcal{D}^{(C S 2)}$ representations.

Since the encodings employ weights differing by a factor of 2 , neighboring digits $d_{i}$ and $d_{i-1}$ in a radix representation overlap one another, i.e., $d_{i}^{l}$ has the same weight as $d_{i-1}^{h}$,

but accumulation of several bit-vectors from the encoding of digit vectors can still be performed in vertical columns. This is what the authors of [10] denote equal-weight grouping or EWG-form.

Normally redundant adders are used to reduce the sum of two digit-vectors to a single redundant digit-vector, absorbing and emitting suitable carries. With additions of the form $\mathcal{D}+\mathcal{D} \rightarrow \mathcal{D}$, where $\mathcal{D}$ is one of the above listed $C S$ digit-sets, e.g., with $\mathcal{D}=\mathcal{D}^{(C S 3)}$, then $\mathcal{D}+\mathcal{D}=\{0,1,2,3,4,5,6\}$. With EWG, bits from the encoding of two neighboring positions are added, the summation takes bits in a column forming a sum in $\{0,1,2,3,4\}$, but producing the resulting digit in $\mathcal{D}^{(C S 3)}=\{0,1,2,3\}$ as shown in the following diagram (without the carries):


The coding used for $\mathcal{D}^{(C S 2)}$ and $\mathcal{D}^{(C S 3)}$ is $d=2 d^{h}+d^{l}$. Since the bit-vectors all have positive weight, it is a digit-set conversion of the form:

for the case of addition in $\mathcal{D}^{(C S 3)}$, where using EWG the conversion is from $\{0,1,2,3,4\}$ back into $\mathcal{D}^{(C S 3)}=\{0,1,2,3\}$. When the coding is $d=-2 d^{h}+d^{l}$, two of the bit-vectors have negative weight, and by EWG the mapping is from $\{-2,-1,0,1,2\}$ to $\{-1,0,1\}$.

For comparison the authors then consider four cases:

1. $S D+S D \rightarrow S D:{ }^{1}$ Here they employ the signed-digit, 4-to-2 adder from [5] using the encoding $-1 \sim 11,0 \sim 00$ and $1 \sim 0$, corresponding to 2 's complement encoding (but it could also be interpreted as a sign-magnitude). The carry-set $\mathcal{C}^{(S D)}=\{-1,0,1\}$ is assumed to be in the same encoding.


2. $\underline{S D 3^{(-)}+S D 3^{(-)} \rightarrow S D 3^{(-)}}$: The carry-set used is $\mathcal{C}^{\left(S D^{(-)}\right)}=\{-1,0,1\}$, despite the fact that $\{0,1\}$ is sufficient, as can be seen from this conversion diagram:


The signals in their mux-based design are assumed to have the weights shown in the diagram:


It is noted that the addition $S D 3^{(+)}+S D 3^{(+)} \rightarrow S D 3^{(+)}=\{-1,0,1,2\}$ can be realized by interchanging the positive and negative inputs.
3. $\underline{C S 2+C S 2 \rightarrow C S 2: ~ T h e ~ c a r r y-s e t ~ h e r e ~ i s ~ p r o v e n ~ t o ~ b e ~} \mathcal{C}^{(C S 2)}=\{0,1\}$, as the two outgoing carries in the diagram is shown never simultaneously to be 1 . Another mux-based design is used in [10], with assumed weights as indicated:


[^1]4. $\underline{C S 3}+C S 3 \rightarrow C S 3$ : The carry-set is again $\mathcal{C}^{(C S 3)}=\{0,1\}$, hence a simplified conversion is sufficient. To implement the adder they employ the 4-to-2 compressor (carry-save based) from [7], as described in Figure 2. Again they do not describe the connections and weights, but we assume they are as described here:

where the lower right-hand connectors are not used (they are just connected by a "jumper".)
Note that for all four designs above, input is provided in the EWG-form by combining signals (bits of the same absolute value weight) from two neighboring digit positions, but output is delivered in the proper encoding for the particular digit-set. The authors of [10] performed SPICE simulations of the four designs above, yielding the following table (their Table 8):

| Adder Cell | Critical Path Delay (ns) |
| :---: | :---: |
| SD | 0.78750 |
| $S D 3^{(-)}$ | 0.96025 |
| $C S 2$ | 0.66100 |
| $C S 3$ | 0.46580 |

On page 1276 of [10] it is stated:
"multipliers based on CS3 can be expected to outperform multipliers based on other redundant representations"
and furthermore using arguments on digit-set cardinalities:
"Therefore, cells such as 1 and 3 from [the listing above] are fundamentally more complex, hence, bigger and slower."

## 4. Alternative implementations.

Using the results of Section 2, and contrary to the above statements, we shall now show that all the four adder cases can be implemented by the same hardware, e.g., the efficient design from [7] as used in Figure 2 and case $C S 3$ above, just using an alternative signal interpretation. It will then be apparent that there are no advantages in using the particular codings of the form $\pm 2 d^{h} \pm d^{l}$, as opposed to the form $d^{\prime} \pm d^{\prime \prime}$, traditionally used for carry-save and signed-digit encodings.

We will do this by showing that in the above four cases there exist alternative implementations, using any generic 4 -to- 2 carry-save adder. In particular we may employ the mux-based design from [7], where it is possible at no cost in transistor count to add properly placed signal inversions.

1. $\underline{S D+S D \rightarrow S D}$

where the lower right-hand connectors are not used (they are just connected by a "jumper".)
 it delivers a result in $\{-1,0,1\} \subset S D 3^{(-)}=\{-2,-1,0,1\}$.
2. $\underline{C S 2+C S 2 \rightarrow C S 2 \text { : The standard carry-save adder can directly be used, just using different }}$ output connections to deliver the result in the $2 d^{h}+d^{l}$ encoding.

3. $C S 3+C S 3 \rightarrow C S 3$ : The same 4-to-2, carry-save adder as in the previous case can be used since $\{0,1,2\} \subset \mathcal{D}^{(S D 3)}=\{0,1,2,3\}$. Note that this was the solution used in [10].
Observe again, that in any array-structure of such adders (e.g., in a multiplier tree), there is no need for inversions internally in the array. We can thus conclude that it is possible to implement addition in these five digit-sets (including $S D 3^{(+)}$) with the very same logic, except possibly for a few inversions, which may even be trivially realized at no cost in logic.

## 5. Another example using the digit-Set $\{0,1,2,3\}$.

In [3] another implementation was proposed employing the digit-set $\{0,1,2,3\}$, albeit with a 3-bit encoding of digits, where the value of the code can be obtained by adding the values of the three bits ( $d, e, f$ ) for all the valid combinations, $v=d+e+f$, where $(e, f) \neq(1,0)$. Internally a unique 2 -bit coding $(t, w)$ was also used:

| Value | 0 | 1 | - | 2 | 1 | 2 | - | 3 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $d, e, f$ | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| $t, w$ | 00 | 01 | - | 10 | 01 | 10 | - | 11 |

Observe that $(t, w)$ is the same encoding as used in [10] for the digit-sets $\mathcal{D}^{(C S 2)}=\{0,1,2\}$ and $\mathcal{D}^{(C S 3)}=\{0,1,2,3\}$, with digit value $v=2 d^{h}+d^{l}$. Mapping from $(d, e, f)$ into $(t, w)$ is performed by the logic in Figure 4 (slightly reorganized from [3]), shown together with a simplified block symbol.


Figure 4. Mapping from $(d, e, f)$ into $(t, w)$.
This logic is functionally equivalent to a full-adder, except for the restrictions on the input $(d, e, f)$ (which we will denote the def-encoding), and coding of the output in the $(t, w)$-form (denoted the $t w$-encoding). [3] then describes various adders for combinations of operands in the redundant $d e f$-encoding and ordinary binary, with output in def-encoding. We shall only go into details with two of these. The first we describe in Figure 5 is an adder taking four ordinary binary operands, and delivering their sum in the def-encoding, here shown slightly modified from [3].

Observe that the left-most part of Figure 5, computing $\left(t_{\text {out }}, d\right)$, is a standard full-adder, and the right-most part is a recoding where arithmetically $e+f=i+t$, but assures that only legal


Figure 5. Adder taking four binary signals into ( $d, e, f$ ) encoding.
combinations of $d, e, f$ are produced $((e, f) \neq(1,0))$. This is the logic needed at the first level of a multiplier tree, the critical path here is through two XOR gates, and the authors claim a speedup of about 1.7 compared to an ordinary 4 -to- 2 adder.

For the case of two redundant operands the critical path is two XORs, plus an AND and an OR gate, where the authors have found a speedup of about 1.2. The following conversion diagram shows how the addition is performed:


Figure 6, reproduced with slight modifications from [3], shows their implementation, using a carry in $\{0,1,2\}$ in carry-save representation $\left(t^{1}, t^{2}\right)$, plus a carry $u$ in $\{0,1\}$.

A)

Figure 6. A: Adder with two redundant operands, B: Block diagram.
Observe that the dot-framed part is again a standard full-adder, and the logic producing $e$ and $f$ is again the recoding assuring $(e, f) \neq(1,0)$. Figure 6-A can also be described in a simplified way by the block diagram of Figure 6-B.

Now note that instead of using the def-coding at input and output, we may as well remove the two $\mathrm{FA}_{t w}$ adders at the top, and add one at the bottom, thus using the $t w$-coding on input and output, as illustrated in Figure 7.


Figure 7. Reorganized 4-to-2 tw-adder.
We have thus shown that it is not necessary to use the def-encoding at the external interface to the adder, and that the 4-to-2 adder in tw-encoding can be implemented with only about two thirds of the logic needed for the adder using $d e f$-encoding, as presented in [3]. The critical paths here are of the same length as in Figure 6.

We finally need the adder-type to use at the first level of an adder tree, i.e., taking four binary signals and producing their sum in $t w$-encoding, and a recoding to use at the root of the adder tree. Both of these are trivially found as shown in Figure 8.

B)
A)

Figure 8. A: First level adder for $t w$-encoded result. B: Root conversion.
However, the simplifications of the nodes do not help in an adder tree, since there still is a defto $t w$-conversion between each level in the tree, and at the first level an extra similar conversion is needed now as seen in Figure 8-A. Note though that at the root of the tree, instead of a FA ${ }_{t w}$ adder to convert from $d e f$ - to tw-encoding, now only the simple re-wiring in Figure 8 -B is needed to deliver the ordinary carry-save encoding of the result. Thus the critical path of the tree is unchanged.

But more important, the wiring of the tree is now simplified, since only two signals instead of three are needed in each connection between the layers of the adder tree.

Now observe that in all places where signals $e, f$ are fed into an $\mathrm{FA}_{t w}$-adder, there is a recoding taking place, to assure that the pattern $(e, f) \neq(1,0)$. If the logic of that recoding is combined with the logic of the $\mathrm{FA}_{t w}$-adder, it appears that the result is an ordinary full-adder. Just consider

Figure 8-A with input ( $d, w, t$ ) to the combination of a recoder and $\mathrm{FA}_{t w}$-adder, with output denoted $\left(t^{\prime}, w^{\prime}\right)$, then from the combined logic

we find $t^{\prime}=d w+d t+w t$ and $w^{\prime}=d \oplus w \oplus t$, which is the functionality of a standard fulladder. We have thus shown that the proposed adder is again equivalent to the standard carry-save, 4 -to- 2 adder based on the digit-set $\{0,1,2\}$, and the speed difference observed is due to different implementations of the full-adders used.

## 6. Conclusions

Using some trivial results on signal weights it has been shown that there are no fundamental differences between the redundant, radix 2, 4-to-2 adders based on carry-save, and those based on signed-digit representations, i.e., digit sets $\{0,1,2\}$ and $\{-1,0,1\}$ with their usual encodings. There need only be very small differences between implementations, as only a few inverters are needed to change one type of adder into the other. For multi-operand addition, as in multiplication using an array- or tree-structure of adders, it has been shown that inversions are only needed on appropriate input and output signals, no changes are necessary internal to the array- or tree-structure. In a computational model where inversion is without cost in area and time, signed-digit arithmetic can be performed at exactly the same cost as carry-save.

After a review of some redundant adder designs from [10] for a variety of digit-sets (3-and 4element), for which claims were made about principal differences and relative speeds, it was shown that they can in fact all be implemented by the same basic 4-to-2 carry-save adder, just using some other interface wiring to the environment and possibly a few inverters. Hence there is no principal difference between the use of, say, the digit-sets $\{0,1,2,3\}$ and $\{0,1,2\}$, the difference is just a question of interpretation of which signal-pairs constitute encodings of digits.

Finally another proposal in [3] to use a radix-2, 4-element digit-set with a 3-bit digit encoding, has been reviewed. It is shown that this design is only marginally different from the adder in [10] also employing the digit-set $\{0,1,2,3\}$, and thus can also be interpreted as a normal 4-to-2, carrysave adder. It was furthermore shown that the 3-bit encoding is not necessary, a standard 2-bit encoding can be used. Actually, it is found that the difference is only in the positioning of the boundaries between the levels in a multiplier tree.

All the proposed 4-to-2 adder designs, employing digit or carry encodings of the form $v=d^{\prime} \pm d^{\prime \prime}$ or $v= \pm 2 d^{\prime} \pm d^{\prime \prime}$ (including also the sign-magnitude encoding), have thus been shown to be equivalent to a carry-save adder over the standard carry-save digit-set $\{0,1,2\}$, except possibly for a few signal inversions. It is conjectured that there are no other fundamentally different digit encodings that will allow for faster implementations. Any single, or two or more alternating, optimal adder designs employing encodings of the form $v=d^{\prime} \pm d^{\prime \prime}$ can be used internally in an array (or tree) for multi-operand addition; only at the external interfaces to the array it may be necessary to add inverters, adapting to the digit set and encoding used in the environment.

## References

[1] T. Aoki, Y. Sawada, and T. Higuchi. Signed-Weight Arithmetic and its Application to a Field-Programmable Digital Filter Architecture. IEICE Trans. Electron., E82(9):1687-1698, 1999.
[2] L. Dadda. Some Schemes for Parallel Multipliers. Alta Freq., 34:349-356, 1965. Reprinted in [13].
[3] M.D. Ercegovac and T. Lang. Effective Coding for Fast Redundant Adders using the Radix-2 Digit Set $\{0,1,2,3\}$. In Proc. 31st Asilomar Conf. Signals Systems and Computers, pages 1163-1167, 1997.
[4] P. Kornerup. Digit-Set Conversions: Generalizations and Applications. IEEE Transactions on Computers, C-43(6):622-629, May 1994.
[5] S. Kuninobu, T. Nishiyama, H. Edamatsu, T. Taniguchi, and N. Takagi. Design of High Speed MOS Multiplier and Divider Using Redundant Binary Representation. In Proc. 8th IEEE Symposium on Computer Arithmetic, pages 80-86. IEEE Computer Society, 1987.
[6] Z. J. Mou and F. Jutand. "Overturned-stairs" adder trees and multiplier design. IEEE Transactions on Computers, 41(8):940-948, August 1992.
[7] N. Ohkubo, M. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, and Y. Nakagome. A 4.4 ns CMOS $54 \times 54-b$ Multiplier Using Pass Transistor Multiplexer. IEEE Journal of Solid State Circuits, 30(3):251-257, 1995.
[8] V.G. Oklobdzija and D. Villeger. Improving Multiplier Design by using Improved Column Compression Tree and Optimized Final Adder in CMOS Technology. IEEE Transactions on VLSI, 3(2):292-301, 1995.
[9] V.G. Oklobdzija, D. Villeger, and S.S Liu. A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach. IEEE Transactions on Computers, 45(3):294-306, March 1996.
[10] D.S. Phatak, T. Goff, and I. Koren. Constant-Time Addition and Simultaneous Format Conversion Based on Redundant Binary Representations. IEEE Transactions on Computers, 50(11):1267-1278, 2001.
[11] M.R. Santoro and M.R. Horowitz. A Pipelined 64x64b Iterative Array Multiplier. Proc. IEEE International SolidState Circuit Conference, pages 36-37, 1988.
[12] P.F. Stelling, C. U. Martel, V.G. Oklobdzija, and R. Ravi. Optimal Circuits for Parallel Multipliers. IEEE Transactions on Computers, 47(3):273-285, March 1998.
[13] Earl E. Swartzlander, editor. Computer Arithmetic, Vol I. Dowden, Hutchinson and Ross, Inc., 1980. Reprinted by IEEE Computer Society Press, 1990.
[14] Earl E. Swartzlander, editor. Computer Arithmetic, Vol II. IEEE Computer Society Press, 1990.
[15] N. Takagi, H. Yasuura, and S. Yajima. High-Speed VLSI Multiplication Algorithm with a Redundant Binary Addition Tree. IEEE Transactions on Computers, C-34(9):789-796, September 1985.
[16] J. Vuillemin. A Very Fast Multiplication Algorithm for VLSI Implementation. INTEGRATION, the VLSI Journal, 1:39-52, 1983. Reprinted in [14].
[17] C. S. Wallace. A Suggestion for a Fast Multiplier. IEEE Transactions on Electronic Computers, EC-13:14-17, 1964. Reprinted in [13].
[18] A. Weinberger. 4-2 Carry-Save Adder Module. IBM Technical Disclosure Bulletin, 23, January 1981.
[19] W.-C. Yeh and C.-W Jen. High-Speed Booth Encoded Parallel Multiplier Design. IEEE Transactions on Computers, 49(7):692-701, 2000.


[^0]:    * This work has been supported by the the Danish Natural Science Foundation, grant no. 9801811

[^1]:    ${ }^{1}$ This is the notation used in [10], note that due to EWG this is not set addition.

