Fine-Grain Control of Multiple Functional Blocks with Lookup Table-Based Adaptive Body Biasing

Byunghee Choi and Youngsoo Shin
Department of Electrical Engineering, KAIST
Daejeon 305-701, Korea

Abstract—A reduced supply voltage must be accompanied by a reduced threshold voltage, which makes this approach to power saving susceptible to process variation in transistor parameters, as well as resulting in increased subthreshold leakage. We propose a new adaptive body biasing scheme, based on a lookup table for independent control of multiple functional blocks on a chip, which controls leakage and also compensates for process variation at the block level. An adaptive body bias is applied to blocks in active mode and a large reverse body bias is applied to blocks in standby mode. This is achieved by a central body bias controller, which has a low overhead in terms of area, delay, and power consumption. A design methodology for semicustom design using standard-cell elements is developed and verified with benchmark circuits.

I. INTRODUCTION

The supply voltage of CMOS circuits continues to be reduced with each technology generation in order to manage power consumption. Because this increases circuit delay, the threshold voltage is also reduced to compensate, which causes an exponential increase in subthreshold leakage, the main component of standby power consumption. A reduced supply voltage has another implication for circuit design: process variations in transistor parameters such as channel length and threshold voltage have a greater impact on speed and leakage current [1]. Process variation can cause a $20\times$ variation in chip leakage and a $30\%$ variation in chip frequency [2]. This wide spread in frequency and leakage reduces yield, since chips with excessive leakage and chips that are too slow must be discarded.

To accommodate process variation and to reduce leakage current, body bias circuits are used to control the body (or substrate) bias dynamically. The threshold voltage of an MOS transistor is a function of its body-to-source potential. A forward body bias (FBB) lowers the threshold voltage and so increases performance. FBB can also reduce switching power, since it allows the same frequency to be achieved at a lower supply voltage [3]. A reverse body bias (RBB) raises the threshold voltage and thus reduces standby leakage current; in [4], the leakage current of a circuit is monitored and a feedback controller adjusts the body voltage until a predetermined leakage target is met. FBB and RBB can also be used together; this is called adaptive body bias (ABB), and it has been shown to be very effective in minimizing the impact of both die-to-die and within-die parameter variations on frequency and active leakage power [5].
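To make the FBB/RBB terminology concrete, the sketch below evaluates the standard long-channel body-effect expression $V_T = V_{T0} + \gamma(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F})$ for an NMOS device, together with the usual first-order approximation that subthreshold leakage falls by one decade for every increase of $S$ (the subthreshold swing) in $V_T$. The parameter values ($V_{T0} = 0.45$ V, $\gamma = 0.4\ \mathrm{V}^{1/2}$, $2\phi_F = 0.8$ V, $S = 90$ mV/decade) are illustrative assumptions, not values for the 180nm process used in this paper.

```python
import math

# Illustrative NMOS parameters (assumptions, not the paper's 180nm process).
VT0 = 0.45        # zero-bias threshold voltage [V]
GAMMA = 0.4       # body-effect coefficient [V^0.5]
TWO_PHI_F = 0.8   # surface-potential term 2*phi_F [V]
SWING = 0.09      # subthreshold swing S [V/decade]

def threshold_voltage(v_sb: float) -> float:
    """Long-channel body-effect model.

    v_sb > 0 is a reverse body bias (raises V_T);
    v_sb < 0 is a forward body bias (lowers V_T).
    """
    if TWO_PHI_F + v_sb <= 0:
        raise ValueError("forward bias too large for the square-root model")
    return VT0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))

def leakage_ratio(v_sb: float) -> float:
    """Subthreshold leakage relative to zero body bias: 10^(-dV_T / S)."""
    delta_vt = threshold_voltage(v_sb) - VT0
    return 10 ** (-delta_vt / SWING)

for v_sb in (-0.3, 0.0, 0.5, 1.5):
    print(f"V_SB = {v_sb:+.1f} V: V_T = {threshold_voltage(v_sb):.3f} V, "
          f"leakage x{leakage_ratio(v_sb):.3g}")
```

This first-order model ignores short-channel effects and junction leakage, so it only indicates the direction and rough size of the effect: FBB lowers $V_T$ and raises leakage, while RBB does the opposite.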

Although body biasing is effective, biasing circuits impose a large overhead in terms of area, power consumption, and the delay required to adjust the body bias. Most body-biasing techniques therefore target an entire chip or a group of functional blocks, where the overhead of the biasing circuits is acceptable relative to the size of the circuits they control; the drawback is that the blocks cannot be controlled independently. Fine-grain control of leakage and compensation for intra-die process variation require several functional blocks on the same chip to be controlled independently, which is only practical with biasing circuits of very low overhead.

In this paper, we propose a new ABB scheme in which multiple functional blocks (macros) are controlled independently, depending on their mode of operation. ABB compensates for process variation in the performance of a macro when it is in active mode, and RBB reduces its leakage current in standby mode. The salient feature of the proposed scheme is a lookup table that holds, for each macro, a binary code corresponding to its active-mode body bias voltage. The binary code is fetched by a power management unit, and the controller then generates the corresponding body bias voltage.

II. LOOKUP TABLE-BASED ADAPTIVE BODY BIASING

A. Overall Operation

Fig. 1 outlines the way in which a lookup table can be used for adaptive body biasing. Suppose we have $n$ independent macro functional blocks (macros for brevity) on a chip. A power management unit (PMU) detects a change in the state of each macro. When a macro changes its state from standby to active mode, the PMU identifies the macros that are in active mode; for each of them, the corresponding body bias voltage is obtained from the lookup table, generated by the adaptive body bias controller, and applied to the macro.

![Fig. 1. Adaptive body biasing using a lookup table.](image-url)

More specifically, when a macro changes its state from standby to active mode, the PMU fetches the codeword for that macro from the lookup table. The codeword is input to the adaptive body bias controller, which is shown as a block in Fig. 1. The controller then generates a pair of active-mode body bias voltages for the macro (one for the NMOS and the other for the PMOS transistors). When a macro changes its state from active to standby mode, the controller directly generates a predetermined large reverse body bias without using the lookup table.

The lookup table holds a codeword for each macro corresponding to the active-mode body bias of that macro. The number of bits in each codeword determines the number of available bias voltages for compensating process variation: a $k$-bit codeword selects one of $2^k$ voltages. More bias voltages allow finer compensation, but more bits mean a larger overhead for the adaptive body bias controller, so the codeword length must be chosen carefully. The lookup table entries are determined and programmed after fabrication: the delay of each macro is measured for each codeword, and a codeword that allows the macro to meet its delay target is selected.
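The post-fabrication calibration step can be sketched as a search over the $2^k$ codewords of one macro. The sketch assumes a hypothetical `measure_delay(codeword)` callback that returns the measured delay of the macro with that codeword applied. The text does not say which codeword to program when several meet the delay target; here we assume the codewords are ordered from the strongest reverse bias (slowest, least leaky) to the strongest forward bias, and we take the first one that meets the target, i.e. the least-leaky passing choice.

```python
from typing import Callable, Optional

def calibrate_macro(measure_delay: Callable[[int], float],
                    delay_target: float,
                    bits: int = 3) -> Optional[int]:
    """Return the codeword to program into the lookup table for one macro.

    Assumption: codeword 0 is the strongest reverse body bias and
    codeword 2**bits - 1 the strongest forward body bias, so delay
    decreases (and leakage increases) as the codeword grows.
    """
    for code in range(2 ** bits):           # 2^k available bias voltages
        if measure_delay(code) <= delay_target:
            return code                     # least-leaky codeword that meets timing
    return None                             # macro cannot be compensated

# Made-up delay model: each codeword step speeds the macro up by 2%.
slow_macro = lambda code: 1.10 * (1 - 0.02 * code)
print(calibrate_macro(slow_macro, delay_target=1.00))   # prints 5
```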

The proposed architecture allows multiple macros, each of which can operate in more than one mode, to be controlled independently. In active mode, either FBB or RBB is used for process compensation, depending on the process variation of the macro. In standby mode, a large RBB is used to suppress the leakage current.
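The mode-dependent behaviour of the controller can be summarised as a small behavioural model. The class and method names (`BodyBiasController`, `set_mode`) and the codeword-to-voltage values are hypothetical; using $V_{DDL}$ and $V_{DDH}$ (the rail values quoted in Section III) as the standby bias pair is also an assumption. An active-mode transition consults the lookup table; a standby transition bypasses it and selects the fixed large reverse body bias.

```python
from dataclasses import dataclass, field
from enum import Enum

class Mode(Enum):
    ACTIVE = "active"
    STANDBY = "standby"

# Rail values quoted in Section III; using them as the standby pair is an assumption.
V_DDH, V_DDL = 3.3, -1.5
STANDBY_BIAS = (V_DDL, V_DDH)   # (NMOS body V_n, PMOS body V_p): large RBB

@dataclass
class BodyBiasController:
    """Behavioural model of lookup table-based ABB (hypothetical names)."""
    lookup_table: dict[str, int]                   # macro -> codeword, programmed after fabrication
    code_to_bias: dict[int, tuple[float, float]]   # codeword -> (V_n, V_p), illustrative values
    applied: dict[str, tuple[float, float]] = field(default_factory=dict)

    def set_mode(self, macro: str, mode: Mode) -> tuple[float, float]:
        """Called on a mode change reported by the PMU; returns the bias pair applied."""
        if mode is Mode.ACTIVE:
            codeword = self.lookup_table[macro]    # only active mode consults the table
            bias = self.code_to_bias[codeword]
        else:
            bias = STANDBY_BIAS                    # standby bypasses the lookup table
        self.applied[macro] = bias
        return bias

ctrl = BodyBiasController(
    lookup_table={"macro0": 3, "macro1": 1},
    code_to_bias={1: (-0.10, 1.90), 3: (0.00, 1.80)},
)
print(ctrl.set_mode("macro1", Mode.ACTIVE))    # prints (-0.1, 1.9)
print(ctrl.set_mode("macro0", Mode.STANDBY))   # prints (-1.5, 3.3)
```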

B. Body Bias Controller

Once the PMU has fetched a codeword for a macro, the decoder shown in Fig. 1 converts it into a one-hot address: each possible codeword value sets exactly one bit of the address to 1. The body bias generator then uses this address to generate the body biases.
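A one-hot decoder of this kind can be sketched in a few lines. The bit ordering is an assumption chosen to match the worked example later in this section (codeword 001 decodes to 01000000), i.e. address bit `addr0` is written leftmost.

```python
def decode(codeword: int, bits: int = 3) -> list[int]:
    """One-hot decode a k-bit codeword into a 2**k-bit address.

    Index i of the returned list is addr_i; exactly one entry is 1.
    """
    width = 2 ** bits
    if not 0 <= codeword < width:
        raise ValueError(f"codeword must fit in {bits} bits")
    return [1 if i == codeword else 0 for i in range(width)]

print("".join(map(str, decode(0b001))))   # prints 01000000
```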

The body bias generator consists of a level shifter, a demultiplexer (DEMUX), and a resistor tree. The resistor tree requires voltages of \( V_{DDH} \) (higher than \( V_{DD} \)) and \( V_{DDL} \) (lower than \( V_{SS} \)) instead of \( V_{DD} \) and \( V_{SS} \). A level shifter converts the address from the decoder, which uses \( V_{DD} \) as logic 1 and \( V_{SS} \) as logic 0, into a pair of addresses: one for the PMOS switches in the resistor tree and the other for the NMOS switches. The address for the PMOS switches uses the levels \( V_{DDH} \) and \( V_{SS} \); the address for the NMOS switches uses \( V_{DD} \) and \( V_{DDL} \). The details are explained in the description of the resistor tree below.

After generation, the addresses are routed to the resistor tree through the DEMUX. Note that the resistor tree requires a pair of addresses for each macro, and so there are 2\( n \) addresses between the DEMUX and the resistor tree. The select signal, which is \( \lceil \log_2 n \rceil \) bits wide, selects the macro to which level-shifted addresses are routed. The on signal, which turns on the DEMUX, is important in the operation of the body bias generator. Normally the DEMUX is turned off by de-asserting the on signal, decoupling the resistor tree from the level shifter. When the PMU wants to apply the active body bias to a particular macro, the corresponding values appear on the select lines. However, it takes time for the decoder and the level shifter to generate the required signals. Thus, the on signal must only be asserted after the delay for decoding and level shifting, so that the selected macro receives the correctly decoded and level-shifted addresses. Once the DEMUX has transferred the required addresses, on is de-asserted again, turning off the DEMUX.
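The ordering of the select and on signals can be expressed as a simple event schedule. Only the ordering (select applied first, on asserted once decoding and level shifting have settled, then on de-asserted) comes from the text; the delay values and function names are hypothetical. The sketch also computes the $\lceil \log_2 n \rceil$-bit select width.

```python
import math

def select_width(n_macros: int) -> int:
    """Number of DEMUX select bits: ceil(log2 n), at least 1."""
    return max(1, math.ceil(math.log2(n_macros)))

def demux_schedule(macro: int, n_macros: int,
                   t_decode: float, t_shift: float, t_transfer: float):
    """Return (time, event) pairs for routing one macro's address pair."""
    if not 0 <= macro < n_macros:
        raise ValueError("macro index out of range")
    sel_bits = select_width(n_macros)
    t_on = t_decode + t_shift   # wait until decoded, level-shifted addresses are valid
    return [
        (0.0, f"select <= {macro:0{sel_bits}b}; codeword applied to decoder"),
        (t_on, "on asserted: DEMUX routes the address pair to the selected macro"),
        (t_on + t_transfer, "on de-asserted: resistor tree decoupled from level shifter"),
    ]

# Hypothetical delays in ns.
for t, event in demux_schedule(macro=5, n_macros=8,
                               t_decode=0.4, t_shift=1.2, t_transfer=0.5):
    print(f"{t:4.1f} ns  {event}")
```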

1) Resistor Tree: To generate the active-mode body bias voltages, we use a resistor tree, as shown in Fig. 2. This tree consists of \( N \) identical transistors connected in series, which divide the potential difference between \( V_{DDH} \) and \( V_{DDL} \) into \( N \) equal steps of \( (V_{DDH} - V_{DDL})/N \). A set of predetermined bias voltages can then be obtained by connecting switches to the appropriate intermediate nodes.
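As a quick check of the tap spacing, the sketch below lists the node voltages of a chain of $N$ identical devices between $V_{DDH}$ and $V_{DDL}$. With the values quoted in Section III ($N = 96$, $V_{DDH} = 3.3$ V, $V_{DDL} = -1.5$ V) the step is $4.8\ \mathrm{V}/96 = 50$ mV. The helper `nearest_tap` is a hypothetical illustration of how a target bias would be mapped to a switch position.

```python
def tap_voltages(n_devices: int, v_high: float, v_low: float) -> list[float]:
    """Node voltages of n equal series devices: n + 1 nodes including both rails.

    Node 0 is v_high; each device drops (v_high - v_low) / n.
    """
    step = (v_high - v_low) / n_devices
    return [v_high - k * step for k in range(n_devices + 1)]

def nearest_tap(target: float, taps: list[float]) -> int:
    """Index of the tap closest to a desired bias voltage (hypothetical helper)."""
    return min(range(len(taps)), key=lambda k: abs(taps[k] - target))

taps = tap_voltages(96, 3.3, -1.5)
print(f"step = {(taps[0] - taps[1]) * 1000:.0f} mV")   # prints step = 50 mV
k = nearest_tap(1.9, taps)      # e.g. a mild PMOS reverse bias when V_DD = 1.8 V
print(k, round(taps[k], 3))
```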

We use PMOS switches to obtain the PMOS body bias voltage \( V_p \), since the bias voltage for the PMOS body is around \( V_{DD} \) (higher than \( V_{DD} \) for reverse body biasing). We therefore apply \( V_{DDH} \) to the gate of any PMOS switch that needs to be turned off. Similarly, NMOS switches are used to produce the NMOS body bias voltage, and we apply \( V_{DDL} \) to the gates of switches that are to be turned off. For instance, suppose that macro 1 in Fig. 1 makes the transition from standby to active mode. The PMU fetches the codeword 001, which is decoded to yield 01000000. The logic levels are then shifted: for the PMOS switches (see Fig. 2), addr1 corresponds to \( V_{SS} \), turning its switch on, while the remaining bits correspond to \( V_{DDH} \); for the NMOS switches, addr1 corresponds to \( V_{DD} \) while the remaining bits correspond to \( V_{DDL} \).
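Following the switch polarities stated in this paragraph (an off PMOS switch has $V_{DDH}$ on its gate, an off NMOS switch has $V_{DDL}$ on its gate), the level shifter can be modelled as a mapping from each one-hot address bit to a gate voltage. The sketch assumes that the selected PMOS switch is turned on by driving its gate to $V_{SS}$ and uses the rail values of Section III; it is an illustrative logic-level model, not the transistor-level level shifter.

```python
V_DD, V_SS = 1.8, 0.0
V_DDH, V_DDL = 3.3, -1.5   # rail values quoted in Section III

def pmos_gate_levels(address: list[int]) -> list[float]:
    """Gate voltages for the PMOS switches (levels V_DDH / V_SS, active low)."""
    return [V_SS if bit else V_DDH for bit in address]

def nmos_gate_levels(address: list[int]) -> list[float]:
    """Gate voltages for the NMOS switches (levels V_DD / V_DDL, active high)."""
    return [V_DD if bit else V_DDL for bit in address]

address = [0, 1, 0, 0, 0, 0, 0, 0]   # codeword 001 after one-hot decoding
print(pmos_gate_levels(address))     # only addr1 at V_SS: that PMOS switch conducts
print(nmos_gate_levels(address))     # only addr1 at V_DD: that NMOS switch conducts
```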

The body of each PMOS device in the resistor tree is tied to its own source, so the n-well of each device must be isolated. This costs area, but it frees the PMOS devices from the body effect. It also keeps the bias voltages generated by the resistor tree stable even if \( V_i \) changes. In other words, because all the series devices are identical, the bias voltages are set by the number of serially connected PMOS devices and are largely insensitive to process variations. This is an important property of a body bias controller.

Since the same resistor tree is used to bias all $n$ macros, each macro has its own dedicated switches, as shown in Fig. 2. While the resistor tree is being used to bias one macro, the switch states of all the other macros must be maintained; this is achieved by latches at the gate inputs of all switches.
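The role of the per-macro latches can be illustrated with a behavioural model in which each macro keeps its own latched one-hot address over a shared set of tap voltages. The names (`SharedResistorTree`, `load`, `bias`) are hypothetical, and every macro is simplified to see the same small set of taps; the point is only that loading one macro's latches leaves every other macro's switch state, and hence its bias, untouched.

```python
from typing import Optional

class SharedResistorTree:
    """One resistor tree shared by n macros, each with latched switch gates."""

    def __init__(self, taps: list[float], n_macros: int):
        self.taps = taps                                   # voltages seen by each macro's switches
        self.latches: list[Optional[list[int]]] = [None] * n_macros

    def load(self, macro: int, address: list[int]) -> None:
        """One DEMUX 'on' pulse: only this macro's latches capture a new address."""
        if len(address) != len(self.taps) or sum(address) != 1:
            raise ValueError("address must be one-hot over the available taps")
        self.latches[macro] = list(address)

    def bias(self, macro: int) -> Optional[float]:
        """Voltage passed by the macro's single closed switch (None if never loaded)."""
        address = self.latches[macro]
        return None if address is None else self.taps[address.index(1)]

tree = SharedResistorTree(taps=[0.1, 0.05, 0.0, -0.05], n_macros=3)
tree.load(0, [0, 0, 1, 0])
tree.load(2, [1, 0, 0, 0])   # programming macro 2 does not disturb macro 0
print(tree.bias(0), tree.bias(1), tree.bias(2))   # prints 0.0 None 0.1
```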

2) Amplifier: The PMOS devices in the resistor tree operate in the subthreshold region, so the only current they conduct is subthreshold leakage, which is far too small to drive the body of a macro. An amplifier, shown in Fig. 3, is therefore required to buffer the weak output of the resistor tree for NMOS body biasing.

A simple two-stage amplifier is used: the first gain stage is a differential-input single-ended output stage, and the second is a common-source stage. The circuit that generates the control signals (wakeup, standby and amp_on) from the sleep signal received from the PMU is also shown in Fig. 3.

For the transition from active to standby mode, the amp_on signal is de-asserted first, which turns off the transistors highlighted in Fig. 3 and so reduces the power consumption of the amplifier during standby mode. The standby signal is then asserted, which turns on M21. This transistor applies the predetermined large reverse body bias ($V_{DDL}$) to the bodies of the NMOS devices in the macro. Note that M22 remains off because the wakeup signal is de-asserted. M3 and M4 are important for the safe operation of the amplifier. Since the gate of M6 is connected to the bodies of the NMOS devices in the macro, a large reverse body bias applied through M21 might pull down $V_n$, the output of the amplifier, at the gate input to M5. This would disturb the potential of the resistor tree in the opposite direction, which might in turn affect the body bias of other macros in active mode, since a single resistor tree is shared by all macros. This problem is avoided by turning off M3 and M4, which cuts the path from M6 to M5.

For the transition from standby to active mode, the standby signal is de-asserted, which turns off M21. M22 is then turned on by wakeup, and the body potential of the NMOS devices quickly rises from $V_{DDL}$ to $V_{SS}$. Once the body is stable at $V_{SS}$, M22 is turned off, and the amplifier is then turned on by the amp_on signal. The bodies of the NMOS devices gradually settle to the potential required to compensate for the process variation of their macro. M22 is therefore also important in the transition from standby to active mode: if we switched directly from a large reverse body bias to the active-mode body bias, which is around $V_{SS}$ for NMOS devices, the potential at the gate of M6 could disturb the gate potential of M5. We avoid this by using M22 to raise the body potential from $V_{DDL}$ to $V_{SS}$ before turning on the amplifier with the amp_on signal.
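The two transition sequences above can be written down as ordered lists of control-signal events. This is a behavioural sketch only: the signal names (`amp_on`, `standby`, `wakeup`) come from Fig. 3, while the function names and the invariants checked in `check` are our reading of the text rather than a stated specification.

```python
Signals = dict[str, bool]

# Orderings described in the text; signal names follow Fig. 3.
ACTIVE_TO_STANDBY = [("amp_on", False),   # power down the amplifier first
                     ("standby", True)]   # M21 then applies the large RBB (V_DDL)
STANDBY_TO_ACTIVE = [("standby", False),  # release M21
                     ("wakeup", True),    # M22 pulls the NMOS bodies up to V_SS
                     ("wakeup", False),   # issued once the body has settled at V_SS
                     ("amp_on", True)]    # amplifier drives the compensating bias

def check(s: Signals) -> None:
    """Invariants as we read them from the text (not a stated specification)."""
    if s["standby"] and s["wakeup"]:
        raise ValueError("M21 and M22 must not conduct at the same time")
    if s["amp_on"] and (s["standby"] or s["wakeup"]):
        raise ValueError("amplifier must be off while M21 or M22 drives the body")

def run(state: Signals, sequence: list[tuple[str, bool]]) -> list[Signals]:
    """Apply control-signal events in order, checking invariants after each one."""
    trace = []
    for name, value in sequence:
        state = {**state, name: value}
        check(state)
        trace.append(state)
    return trace

active = {"amp_on": True, "standby": False, "wakeup": False}
standby_trace = run(active, ACTIVE_TO_STANDBY)
active_trace = run(standby_trace[-1], STANDBY_TO_ACTIVE)
print(active_trace[-1] == active)   # prints True
```

Asserting standby before de-asserting amp_on, for example, makes `run` raise a `ValueError`, which mirrors the ordering constraint described for M21 and the amplifier.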

C. Design Methodology for Cell-Based Semicustom Design

To validate the proposed lookup table-based adaptive body biasing in semicustom designs using standard cells, we developed a custom cell library and an associated layout methodology. We took 21 cells (four inverters, three 2-input NAND gates, one 3-input NAND gate, one 4-input NAND gate, one 2-input NOR gate, one tri-state buffer, six flip-flops, and four latches) from a commercial 180nm cell library, removed their body contacts, optimized their layouts, and re-characterized them using SPICE simulations. The optimized layouts reduce the height of each cell by 11%, which saves area.

Our layout methodology is shown in Fig. 4. A new tap cell [6] was designed to deliver the body biases supplied by the adaptive body bias controller to the n-well and p-well. The tap cells are inserted in a regular pattern, as shown in Fig. 4; they are fixed in place, and the logic cells are then placed and routed automatically. The columns of tap cells are separated by 50$\mu$m [6]. The figure also shows the layout of a tap cell and of a 2-input NAND gate. The application of this layout methodology to example circuits is demonstrated in Section III.
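For a quick estimate of how many tap-cell columns a block needs, the sketch places columns at the 50 µm pitch quoted above. The placement rule (first column at the left edge, then one every 50 µm) and the function name are assumptions for illustration; the layout in Fig. 4 may offset the columns differently.

```python
import math

TAP_PITCH_UM = 50.0   # column separation quoted in the text [6]

def tap_column_positions(block_width_um: float,
                         pitch_um: float = TAP_PITCH_UM) -> list[float]:
    """x-coordinates (µm) of tap-cell columns, first column at x = 0 (assumption)."""
    if block_width_um <= 0:
        raise ValueError("block width must be positive")
    count = math.floor(block_width_um / pitch_um) + 1
    return [k * pitch_um for k in range(count)]

# Example: a hypothetical 291 µm wide block.
print(tap_column_positions(291.0))   # prints [0.0, 50.0, 100.0, 150.0, 200.0, 250.0]
```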
TABLE I

EXPERIMENTAL RESULTS ON ISCAS BENCHMARK CIRCUITS AT ROOM TEMPERATURE, FOR $V_{DD} = 1.8$ V

<table>
<thead>
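<tr>
<th rowspan="2">Circuits</th>
<th rowspan="2">Gates</th>
<th colspan="3">Original circuit</th>
<th colspan="5">Custom library with proposed ABB</th>
</tr>
<tr>
<th>Area (µm²)</th>
<th>Standby leakage (nA)</th>
<th>Delay (ns)</th>
<th>Area (µm²)</th>
<th>$\Delta V_T$ (mV)</th>
<th>Standby leakage (nA)</th>
<th>Compensated delay (ns)</th>
<th>Body bias $V_n$ (V) / $V_p$ (V)</th>
</tr>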
</thead>
<tbody>
<tr>
<td>c2540</td>
<td>1659</td>
<td>120 × 105</td>
<td>512</td>
<td>2.394</td>
<td>107 × 109</td>
<td>-30</td>
<td>4.12</td>
<td>1291</td>
<td>-0.2 / +1.6</td>
</tr>
<tr>
<td>c6288</td>
<td>2416</td>
<td>315 × 115</td>
<td>910</td>
<td>0.513</td>
<td>291 × 112</td>
<td>-10</td>
<td>13.43</td>
<td>0.812</td>
<td>-0.17 / +1.7</td>
</tr>
<tr>
<td>s1423</td>
<td>731</td>
<td>121 × 105</td>
<td>170</td>
<td>3.135</td>
<td>110 × 107</td>
<td>10</td>
<td>4.76</td>
<td>3.135</td>
<td>-0.05 / +1.85</td>
</tr>
<tr>
<td>s9234</td>
<td>5008</td>
<td>315 × 70</td>
<td>1001</td>
<td>0.763</td>
<td>291 × 71</td>
<td>30</td>
<td>24.59</td>
<td>0.759</td>
<td>-0.15 / +1.95</td>
</tr>
</tbody>
</table>

III. EXPERIMENTAL RESULTS

We performed experiments on four benchmark circuits taken from the ISCAS'85 and ISCAS'89 suites. Table I gives the characteristics of the original circuits. Each circuit was mapped onto a commercial 180nm triple-well, 1.8V gate library. Because both versions use the same 21 gates from that library, we were able to compare each original circuit with the version mapped to our custom library. Each circuit was placed and routed, occupying the area shown in the third column. The transistor-level netlist was then extracted from the layout and simulated to determine the standby leakage current and the active-mode circuit delay.

The sixth column of Table I shows the area of each circuit when mapped onto our custom cell library, as explained in Section II. Compared with the original circuits, the custom cells give area savings of between about 6% and 10%, even including tap cells, owing to the reduced cell height. The controller occupies an area of 70µm × 105µm, of which 57% is taken up by the resistor tree; this large share is due to the well isolation required by the PMOS devices. The resistor tree consists of 96 PMOS devices connected between $V_{DDH}=3.3V$ and $V_{DDL}=-1.5V$, so that bias voltages between $V_{DDL}$ and $V_{DDH}$ can be generated in steps of 50mV ($4.8\,\mathrm{V}/96$). The negative voltage $V_{DDL}$ could be supplied off-chip or generated on-chip by a charge pump; this is beyond the scope of this paper. A 3-bit codeword is used, which gives good process compensation while keeping the controller overhead low, in line with the trade-off discussed in Section II.

To simulate the effects of process variation, we assumed that the threshold voltage of each circuit deviates from its nominal value by the amount shown in the seventh column of Table I. The eighth column gives the standby leakage current of each circuit. Compared with the original circuits, the leakage is reduced by a factor of between about 36 and 124, due to the large reverse body bias applied in standby mode. The ninth column shows the delay of each circuit when the active-mode body bias is applied; the bias voltages are listed in the last column of the table. In contrast with the original circuits, all the circuits are now compensated for the assumed threshold-voltage shifts.
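The summary figures quoted in this section can be recomputed directly from Table I. The short script below copies the area and standby-leakage entries from the table and derives, for each circuit, the area saving of the custom-library version and the standby leakage reduction factor.

```python
# (original area, custom area) as width x height in µm, then standby leakage
# in nA for the original and the proposed version; values copied from Table I.
TABLE_I = {
    "c2540": ((120, 105), (107, 109), 512, 4.12),
    "c6288": ((315, 115), (291, 112), 910, 13.43),
    "s1423": ((121, 105), (110, 107), 170, 4.76),
    "s9234": ((315, 70), (291, 71), 1001, 24.59),
}

def summarize(table):
    """Return {circuit: (area saving in %, leakage reduction factor)}."""
    results = {}
    for name, ((w0, h0), (w1, h1), leak_orig, leak_abb) in table.items():
        saving_pct = 100.0 * (1 - (w1 * h1) / (w0 * h0))
        results[name] = (saving_pct, leak_orig / leak_abb)
    return results

for name, (saving, factor) in summarize(TABLE_I).items():
    print(f"{name}: area saving {saving:4.1f}%, leakage reduced {factor:5.1f}x")
```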

IV. CONCLUSION

Adaptive body biasing has been used to compensate for process variation and to reduce subthreshold leakage current, but the overhead of the biasing circuits has limited its use to the chip level. In this paper, we have proposed a new adaptive body biasing scheme that can be applied on a block-by-block basis. The proposed scheme uses a lookup table that holds, for each block on a chip, a codeword corresponding to its active-mode body bias, which compensates for process variation when applied. A predetermined reverse body bias is used to reduce subthreshold leakage in standby mode. Since a fixed number of predetermined bias voltages is used, these voltages must be chosen carefully. We have presented a layout methodology for applying the proposed scheme to semicustom designs using standard cells. Experiments with benchmark circuits demonstrate that the proposed scheme compensates for process variation and significantly reduces standby leakage current.

ACKNOWLEDGMENT

This work was supported by Samsung Electronics.

REFERENCES