Fault Simulation Model for \textit{iDDT} Testing: An Investigation

Abhishek Singh, Chintan Patel and Jim Plusquellic

Department of Computer Engineering, University of Maryland, Baltimore County

\texttt{abhishek, cpatel2, plusquel@cs.umbc.edu}

\section*{Abstract}

In today's technologies, resistive shorting and open defects are becoming more predominant. Conventional fault models, and tools based on these models, are becoming inadequate for addressing the defects that result from these new failure mechanisms. In prior work, \textit{iDDT} testing techniques have been shown to detect resistive defects. However, incorporating \textit{iDDT} based methods into production test flows requires a fault simulation strategy that enables ATPG and the determination of fault coverage. To our knowledge, no practical technique exists to perform fault simulation for \textit{iDDT} based methods. At the heart of the difficulty is the analog nature of the test observable. In this paper we investigate a fault simulation model that partitions the task of simulating the CUT (chip under test) into linear and non-linear components. More specifically, an impulse response (IR) based method is derived for the linear component that eliminates the need for transient simulations of the entire CUT. We also propose a path isolation strategy for the core logic as a means of reducing the computational complexity involved in deriving \textit{iDDT} signals in the non-linear component.

\section{1.0 Introduction}

Existing fault models are becoming increasingly ineffective at modeling defect mechanisms in deep-submicron (DSM) technologies. For example, the change from a subtractive aluminum process to damascene Cu may lead to more particle-related blocked-etch resistive opens [1]. Also, technology scaling increases the probability of resistive vias caused by incomplete etch. This suggests the need for screening mechanisms that can target such resistive defects.

Previous research on \textit{iDDT} testing, such as [2-8] and [11-12], shows that changes in the circuit configuration caused by a defect manifest themselves as anomalies in the transient signals measured at the power supply ports. The methodology presented in this paper is applicable to any of these \textit{iDDT} test methods. However, we believe that the method described in [17], called Transient Signal Analysis (TSA), is better able to exploit the full potential of \textit{iDDT} testing. Therefore, we use TSA as the defect screening vehicle in the proposed fault simulation flow. TSA is an \textit{iDDT} test technique based on the cross-correlation of multiple \textit{iDDT} signals measured at the power supply ports of a CUT.

The integration of TSA or other \textit{iDDT} test techniques into existing production test flows requires tools for performing ATPG and fault simulation. A fault simulation engine for such techniques must generate \textit{iDDT} signals that are subsequently processed by the defect screening procedure. This implies the need to perform transient simulations of the entire CUT, whose memory and time requirements are prohibitive. In this paper we propose a model that can be used to implement a practical fault simulator for \textit{iDDT} testing. This is achieved by decomposing the CUT into two systems and separately reducing the simulation complexity of each. The two systems are 1) a linear constituent, namely the power grid, and 2) a non-linear core logic circuit. A method based on impulse response (IR) and convolution is proposed for the linear power grid component. This method allows the power grid response to transient inputs produced by the core logic (non-linear) component to be computed without per-vector transient simulation of the grid. Furthermore, a path isolation scheme is proposed to address the simulation complexity of the non-linear component.

\section{2.0 Background}

Testing methods based on the analysis of power supply transient signals are described in [2-6] and [11-12] for digital circuits and in [7-8] for analog circuits. However, we have not found any prior work that proposes models to enable fault simulation of \textit{iDDT} test vectors. In addition to providing a practical methodology for \textit{iDDT} fault simulation, we believe that this work can be leveraged for power verification and signal integrity analysis, because these tasks share the requirement of transient simulations. \textit{iDDT} testing analyzes these signals to identify anomalies caused by defects, whereas power verification and signal integrity analysis use them to evaluate dynamic IR and \(L\,di/dt\) drop and package/on-chip resonance.

Due to signal integrity problems caused by aggressively increasing device densities, simulation of transient power distribution in a chip has become an essential step in power verification. To meet this growing requirement, several static and transient simulation techniques have been proposed in the past decade. These include mathematical tools that speed up power grid simulation, such as [15] and [18-20], and methods based on Transient Current Simulation of logic circuits, such as [9], [13] and [16]. The latter allow the transient currents drawn by logic circuits to be derived without running SPICE-level transient simulations.

### 3.0 System Overview

Any digital CMOS chip can be modeled as a combination of two complementary electrical systems: a linear RC system formed by the power ($V_{DD}$ and GND) grid and a non-linear RC-transistor system formed by the underlying CMOS logic circuit. For example, Figure 1 shows a portion of a typical row-based standard-cell design. The upper half of the figure depicts the power grid, laid out as a stack of uniformly spaced metal runners on alternating layers connected by stacked contacts. (For simplicity, only a two-layer structure is shown.) The parasitic resistance and capacitance of the power grid, along with the on-chip decoupling capacitors, form a linear RC system, referred to as the Power Grid Circuit (PGC). In reality, the metal rails of a power grid also contribute a significant amount of inductance (L), making the grid a three-dimensional RLC ladder circuit. In this work, we ignore the inductive component of the power grid for simplicity. This simplification has no bearing on the proposed system partitioning, because L, like R and C, is a linear element and does not change the linear nature of the PGC.

The lower half of the figure shows the core logic, comprising the standard cells (MOS transistors) and the signal routing of the chip. Since MOS transistors are inherently non-linear devices, this portion forms a non-linear electrical system, referred to as the Core Logic Circuit (CLC). The local $V_{DD}$ and GND runners on the standard cells are connected to the global power grid at various points through special nets/vias called follow-pins, as indicated in the figure.

A transition sequence applied at the primary inputs (or the outputs of scan latches) causes the gates along a sensitized path to switch in a temporal sequence. Each switching gate draws a transient current ($i_{DS}$) from the power grid, sourced by the external power supply pads, or C4s. The PGC transforms these $i_{DS}$ signals into composite current transients ($i_{DDT}$) that are measured at the C4s. Therefore, the CUT can be modeled as a cascade of two electrical systems in which the outputs of the CLC feed the inputs of the PGC. The transient current drawn by a CMOS gate can be modeled as a piecewise-linear (PWL) current source connected between the $V_{DD}$ and GND grids. The metal layer at which the CUT is partitioned into these two systems determines the sparseness of the input ports of the PGC and the output ports of the CLC.
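As a minimal sketch of the PWL current-source abstraction, assuming SPICE-style semantics (linear interpolation between breakpoints, end values held outside them), a gate's $i_{DS}$ can be evaluated and sampled on a uniform time grid for later convolution. The helper names `pwl_current` and `sample_pwl` and the breakpoint values are hypothetical, not taken from the authors' tools.

```python
from bisect import bisect_right


def pwl_current(points, t):
    """Evaluate a SPICE-style PWL current source at time t.

    points: (time, current) breakpoints with strictly increasing times.
    The first value is held before the first breakpoint and the last
    value after the last breakpoint.
    """
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    k = bisect_right(times, t) - 1  # segment [times[k], times[k+1]) contains t
    (t0, i0), (t1, i1) = points[k], points[k + 1]
    return i0 + (i1 - i0) * (t - t0) / (t1 - t0)


def sample_pwl(points, dt, n):
    """Sample the PWL source on the uniform grid t = 0, dt, ..., (n-1)*dt."""
    return [pwl_current(points, k * dt) for k in range(n)]


# Hypothetical i_DS pulse of one switching gate: 0 -> 1 mA in 20 ps, back to 0 at 100 ps.
gate_ids = [(0.0, 0.0), (20e-12, 1e-3), (100e-12, 0.0)]
samples = sample_pwl(gate_ids, 10e-12, 12)
```

Uniform sampling is chosen here because the discrete convolution used later assumes the input and the IR share one time step.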

### 4.0 Fault Simulation Flow for $i_{DDT}$ Testing

The fault simulation procedure for $i_{DDT}$ testing is summarized by the flow diagram in Figure 2. The subsequent sections of this paper elaborate on each step of the flow.

![Flow diagram of the fault simulation procedure for i_{DDT} testing](Figure 2)

**Step1: System Partitioning:** The most important step in the flow is the partitioning of the CUT into a linear (PGC) and a non-linear (CLC) electrical system.

**Step2: Power Grid Characterization:** Power grid characterization involves deriving the IR functions between each input-output port pair of the power grid. By definition, the IR functions completely characterize the power grid and can be used to derive its response to any arbitrary input using convolution.

**Step3: Generation of Iso-IR Bands:** An iso-IR band defines a physical region in the grid layout consisting of input locations that are characterized by a similar IR. Such a categorization helps in significantly reducing the number of convolution operations required to generate the power grid response.

**Step4: Fault Injection:** A fault is injected between two nodes in the layout of the CUT by introducing a resistive connection that models a resistive short/bridge or open. The advantages of using layout information are that 1) the fault model closely represents the physical defects and 2) only physically adjacent nets are considered as short/bridge candidates. We believe that this fault injection scheme closely follows the defect-based test paradigm.

**Step5: Isolated-Path Transient Simulation:** An isolated path corresponds to a sensitized path that is physically separated from the chip layout by breaking its connections with the unsensitized logic and the power grid, while preserving the fanout loads and the signal connections of the gates within the sensitized path. The transient current waveform generated by the core logic at each PGC input location can be derived through SPICE or time-domain Current Waveform Simulations of these isolated paths.

**Step6: Iso-IR Band Selection:** Based on the physical location of the input current source, the iso-IR band corresponding to that input is identified.

**Step7: Convolution:** The $i_{DDT}$ or $i_{DS}$ waveforms measured at specific nodes during the isolated path simulations are convolved with the IR functions of the iso-IR band selected in the previous step.

**Step8: Linear Superposition:** By virtue of the superposition property of linear time-invariant (LTI) systems, the power grid responses to individual current input sources can be combined to derive the overall response of the grid at a given $C_4$. This property allows all the sensitized paths to be simulated independently. It also limits the maximum number of convolution operations required to derive the overall response of the PGC, as detailed later in the paper.

**Step9: $i_{DDT}$ analysis for fault detection:** The $i_{DDT}$ thus obtained at the $C_4$s, includes the current drawn by the defect and the defect-free logic. These $i_{DDT}$s can be used as the input for TSA or as input to a similar technique using the RLC model of the probe card as a means of obtaining the overall response. In either case, the output of this fault simulation procedure is the basis for a) determining the $i_{DDT}$-test coverage of a given set of test patterns, b) guiding the ATPG algorithm to generate vectors that can enhance the detectability of the defect and c) determining the range of resistance values for which a resistive short, open or bridging defect is detectable.

### 5.0 System Partitioning

The CUT is decomposed into a linear and a non-linear system by breaking the physical connections between the power grid and the core logic cells. The metal layer at which the connections are broken defines the partitioning scheme. For example, Figure 3 shows two possible partitioning schemes. In the first, partitioning is done at the follow-pins, whereas in the second, partitioning is done at the nodes where the MOS transistors in each standard cell connect to the local $V_{DD}$/GND rails. The “cross” symbol indicates the nodes at which the connections are broken in each scheme; the GND grid, not shown in the figure, is partitioned in the same way. In the first case, referred to as the FP-scheme (for follow-pins), the linear system (PGC) consists of the global power grid, and the non-linear system (CLC) consists of the standard cells, including their local $V_{DD}$/GND rails, and the signal routing. In the second case, referred to as the TR-scheme (for transistors), the PGC includes the global power grid as well as the local $V_{DD}$/GND rails in each row of the core logic, and the CLC includes the standard cells (without $V_{DD}$/GND rails) and the signal routing. The locations (or nodes) at which the systems are partitioned represent the input ports of the PGC and the output ports of the CLC. For the TR-scheme, when the source nodes of more than one MOS transistor are connected directly to the local $V_{DD}$/GND rails, which may be true for complex standard cells, any one of these nodes can be considered as the output. This is a reasonable assumption, given that the impedance between any two points on the local $V_{DD}$ (or GND) rail within a standard cell is sufficiently small. The FP-scheme offers fewer input-output ports, reducing the complexity of PGC characterization, but may increase the complexity of the core-logic (non-linear) simulation because the local $V_{DD}$/GND metal rails are included in the CLC.
In contrast, the TR-scheme offers a larger number of input-output ports, which increases the complexity of PGC characterization but simplifies the core-logic simulations. These trade-offs are revisited in the subsequent sections.

### 6.0 Power Grid Circuit (PGC) Characterization

In current technologies, the $V_{DD}$/GND routing can occupy more than 25% of the total routing area on a chip, and its extracted model can therefore consist of millions of linear elements. SPICE simulations of these grids can be very expensive; therefore, several techniques, such as [15], [18], [19] and [20], have been proposed for fast power grid simulation. Even these tools may prove infeasible as the base simulation engine for an $i_{DDT}$ fault simulator, due to the large number of faults and test vectors. We demonstrate a convolution procedure, based on the linearity of the PGC, that computes the power grid response to the input switching transients from the core logic without running a transient grid simulation for each fault and test vector; the grid is simulated only once, to characterize it.

Any linear time-invariant (LTI) system can be completely characterized by its impulse response (IR) function, denoted $h(t)$. The impulse response $h(t)$ is the output of the system in response to a unit impulse, $\delta(t)$. Once the IR of a linear system is known, the response of the system to an arbitrary input signal can be constructed as a sum of suitably delayed and scaled impulse responses. This process is called convolution and is described mathematically by Eq. 1, where $f(t)$ is the input signal, $g(t)$ is the output signal

\[
g(t) = h(t) \ast f(t) = \int_{-\infty}^{\infty} h(u)f(t-u)du \tag{1}
\]

and $h(t)$ is the IR function. The response of a linear system to an arbitrary input signal can thus be computed by convolution using the IR function in time domain.
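To make Eq. 1 concrete, the following sketch approximates the convolution integral by a Riemann sum on a uniform time grid. It assumes a hypothetical single-pole RC section with $h(t) = e^{-t/\tau}/\tau$ in place of a real PGC impulse response, so the response to a unit step should approach $1 - e^{-t/\tau}$; `convolve_continuous` is an illustrative name.

```python
import math


def convolve_continuous(h, f, dt):
    """Approximate g(t) = integral of h(u) f(t - u) du (Eq. 1) on a uniform grid.

    h, f: samples of causal signals starting at t = 0, spaced dt apart.
    Uses a left-rectangle rule, so g[k] ~ g(k * dt). Returns len(h) + len(f) - 1
    samples; only the first min(len(h), len(f)) are meaningful when the
    signals have been truncated.
    """
    g = [0.0] * (len(h) + len(f) - 1)
    for m, hm in enumerate(h):
        for k, fk in enumerate(f):
            g[m + k] += hm * fk * dt
    return g


# Hypothetical single-pole RC section with time constant tau: h(t) = exp(-t/tau)/tau.
tau, dt, n = 1e-9, 1e-11, 500
h = [math.exp(-k * dt / tau) / tau for k in range(n)]
f = [1.0] * n  # unit step u(t), truncated to n samples
g = convolve_continuous(h, f, dt)
# Analytically g(t) = 1 - exp(-t/tau); at t = tau (k = 100) that is about 0.632.
```

The rectangle rule introduces an error on the order of $\Delta t/\tau$ (here under 1%), so the time step must be small relative to the fastest grid time constant.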

The PGC is a multi-input, multi-output linear system. Each input on the grid sees a different RC network to each of the outputs ($C_4$s); therefore, there exists a unique IR function for each input-output pair, denoted \( h_{ij}(t) \), where \( i \) and \( j \) represent the input and output ports of the PGC, respectively. A set of such IR functions can be used to characterize the PGC. Once the grid is characterized, its response to transient inputs can be determined by convolving the transients with the corresponding IR functions. The superposition and shift-invariance properties of an LTI system are used to determine the outputs due to 1) multiple switching gates within a sensitized path and 2) multiple sensitized paths under the same input sequence. During fault simulation, this replaces the simulation of millions of RC elements with a fixed number of multiplication and addition operations.
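The superposition and shift-invariance arguments above can be sketched for a toy grid with two input ports and one output port. The IR values, the dictionary layout of $h_{ij}$ and the function names are assumptions for illustration only.

```python
def convolve(h, f):
    """Full discrete convolution: g[n] = sum_k h[k] * f[n - k]."""
    g = [0.0] * (len(h) + len(f) - 1)
    for m, hm in enumerate(h):
        for k, fk in enumerate(f):
            g[m + k] += hm * fk
    return g


def grid_response(h_ij, inputs, j, length):
    """Output at port j by superposition: y_j = sum over inputs i of h_ij * f_i.

    h_ij: dict mapping (input i, output j) -> sampled IR.
    inputs: dict mapping input i -> sampled current waveform.
    length: number of output samples to return.
    """
    y = [0.0] * length
    for i, f in inputs.items():
        for n, v in enumerate(convolve(h_ij[(i, j)], f)[:length]):
            y[n] += v
    return y


def delayed(f, d):
    """Shift invariance: a gate that switches d samples later is a delayed input."""
    return [0.0] * d + list(f)


# Hypothetical two-input, one-output grid (IR values are illustrative only).
h_ij = {(0, 0): [0.5, 0.25, 0.125], (1, 0): [0.2, 0.2, 0.1]}
inputs = {0: [1.0], 1: delayed([1.0], 1)}
y0 = grid_response(h_ij, inputs, 0, 4)
```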

### 7.0 Experimental Setup

Figure 4(a) shows a portion of the commercial power grid subsequently referred to as the Quad. The Quad occupies a 10,000 by 10,000 unit area and interfaces to a set of external power supplies through an area array of \( V_{DD} \) and GND C4 pads. As indicated in the figure, there are 4 \( V_{DD} \) C4s and 6 GND C4s in this portion of the grid. Figure 4(b) shows that the grid is constructed over 4 layers of metal, with metal 1 (M1) and metal 3 (M3) running vertically and M2 and M4 running horizontally. The C4s are connected to wide runners of vertical M5, shown in Figure 4(a), that are in turn connected to the M1-M4 grid. In each layer of metal, the \( V_{DD} \) and GND rails alternate. Stacked contacts are placed at the appropriate crossings of the horizontal and vertical rails.

We derived an RC model of the Quad using an extraction script that preserves the physical structure of the metal interconnect in the topology of the RC network, i.e., no network reduction heuristics are applied. The sheet resistance (resistance per square) and the overlap capacitance per unit area of TSMC’s 0.25 µm five-metal process used in the extraction were obtained from parameters published by MOSIS [21].

In this experiment we consider the Quad as the PGC and a custom designed 16-bit logarithmic adder as the CLC. We inserted ~3000 labels at the M1-M2 crossovers in the grid layout to represent possible input locations on the PGC.

The impulse response from each input port to the C4s is obtained by first stimulating the grid at that input port with a unit step input. This gives the step response, \( s_{ij}(t) \), of the grid with respect to input location \( i \) and output \( j \). The impulse response, \( h_{ij}(t) \), can then be derived from the step response by differentiation in the time domain. This again follows from the linearity of the system and is described by Eq. 2, where \( \delta(t) \) and \( u(t) \) represent the unit impulse and unit step input functions, respectively, and \( h(t) \) and \( s(t) \) represent the unit impulse and step responses of the linear system, respectively.

\[
\delta(t) = \frac{d}{dt} u(t) \Rightarrow h(t) = \frac{d}{dt} s(t) \tag{2}
\]
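Eq. 2 can be applied numerically to a sampled step response. The sketch below uses central differences and, as a stand-in for a SPICE-generated $s_{ij}(t)$, assumes a hypothetical single-pole response $s(t) = 1 - e^{-t/\tau}$ whose derivative is known analytically; `impulse_from_step` is an illustrative name.

```python
import math


def impulse_from_step(s, dt):
    """h(t) = ds/dt (Eq. 2), using central differences and one-sided ends.

    s: step response samples spaced dt apart (at least two samples).
    """
    n = len(s)
    h = [0.0] * n
    h[0] = (s[1] - s[0]) / dt
    h[-1] = (s[-1] - s[-2]) / dt
    for k in range(1, n - 1):
        h[k] = (s[k + 1] - s[k - 1]) / (2 * dt)
    return h


# Hypothetical single-pole step response; its exact derivative is exp(-t/tau)/tau.
tau, dt = 1e-9, 1e-11
s = [1 - math.exp(-k * dt / tau) for k in range(500)]
h = impulse_from_step(s, dt)
```

Numerical differentiation amplifies noise and time-step artifacts in the step response, so a fine, uniform output time step in the characterization simulations is preferable.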

Figure 5. (a) A subset of the possible input locations in the Quad, indicating the IR functions that exist between a source location \( i \) and all the C4s. (b) Step response. (c) Impulse response.

Figure 6(a) shows a triangle input current applied to the PGC, and Figure 6(b) superimposes the responses obtained using SPICE and using convolution. As shown in the figure, the two curves are almost identical.

![Figure 4](image.png)

Figure 4. The “Quad”: A portion of the commercial power grid used in the simulation experiments.

![Figure 6](image.png)

Figure 6. (a) Triangle input current to the PGC. (b) Superimposed SPICE and convolution responses.

#### 7.1 Iso-IR Contours

Eq. 3 gives the expression for the convolution sum of discrete signals, where \( g[i] \), \( h[i] \) and \( f[i] \) represent the output, IR and input of the linear system, respectively:

\[
g[i] = \sum_{j=0}^{M-1} h[j]\, f[i-j], \qquad i = 0, 1, \ldots, N+M-2 \tag{3}
\]

where \( f[i-j] \) is taken as zero outside \( 0 \le i-j \le N-1 \). If \( f[i] \) is an \( N \)-point signal and \( h[i] \) is an \( M \)-point signal, then \( g[i] \) is an \( (N+M-1) \)-point signal. Eq. 3 shows that the convolution-based method requires \( N \cdot M \) multiplications and additions. Thus, the complexity of computing the grid response is significantly reduced compared to a SPICE-based approach, which must solve a large system of coupled differential equations for the entire grid at every time step.
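A direct implementation of the convolution sum in Eq. 3 makes the cost argument explicit: the output has $N+M-1$ points and exactly $N \cdot M$ multiply-add operations are performed. This is a sketch; `convolve_sum` and its operation counter are illustrative additions.

```python
def convolve_sum(f, h):
    """Eq. 3: g[i] = sum_j h[j] * f[i - j]. Returns (g, multiply-add count)."""
    N, M = len(f), len(h)
    g = [0.0] * (N + M - 1)
    ops = 0
    for i in range(N + M - 1):
        # Only j with 0 <= j < M and 0 <= i - j < N contribute to g[i].
        for j in range(max(0, i - N + 1), min(i, M - 1) + 1):
            g[i] += h[j] * f[i - j]
            ops += 1
    return g, ops


g, ops = convolve_sum([1.0, 2.0, 3.0], [1.0, 1.0])  # N = 3, M = 2
```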

Comparison of IR function curves from adjacent input locations suggests that their amplitude and shape characteristics vary slowly as a function of distance. Thus, IR functions from adjacent input locations that agree within a user-defined threshold can be grouped into regions, or bands. The threshold is chosen based on the tolerable difference in the output waveforms obtained using Eq. 3. However, selecting a suitable threshold requires a means of quantifying the similarity between two waveforms in terms of their shape characteristics. This similarity analysis is performed using cross-correlation and auto-correlation operations.

Cross-correlation of two waveforms results in a third waveform, the amplitude of which indicates the degree of similarity between the two waveforms. Also, the location of its peak indicates the time shift required in the second signal to obtain the maximum match with the first waveform. This is expressed mathematically by Eq. 4, where \( r_{xy}[i] \) represents the cross-correlation of two waveforms \( x[n] \) and \( y[n] \) at lag \( i \):

\[
r_{xy}[i] = \sum_{n} x[n]\, y[n+i] \tag{4}
\]

When \( x \) and \( y \) are identical, the operation is termed auto-correlation, \( a_x(t) \). The peak value of an auto-correlation function gives the maximum expected value for the degree of similarity in the IR functions.
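The peak-location property of cross-correlation can be illustrated with a short sketch. It assumes the convention $r_{xy}[i] = \sum_n x[n]\,y[n+i]$, under which a copy of $x$ delayed by $d$ samples produces a peak at lag $d$ with the same height as the auto-correlation peak. Function names are hypothetical.

```python
def cross_correlation(x, y):
    """r_xy[lag] = sum_n x[n] * y[n + lag] for every lag with any overlap.

    Returns a dict mapping lag -> correlation value.
    """
    r = {}
    for lag in range(-(len(x) - 1), len(y)):
        r[lag] = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
    return r


def peak(r):
    """Return (lag, value) of the largest correlation value."""
    lag = max(r, key=r.get)
    return lag, r[lag]


x = [0, 1, 3, 1, 0, 0, 0]
y = [0, 0, 0, 1, 3, 1, 0]  # x delayed by 2 samples
```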

First, the input locations are sorted in ascending order of their Euclidean distance from a reference location. The creation of a new iso-IR band begins with the selection of a representative location, referred to as the focus of the iso-IR band, which is the first input location in the distance-sorted list that is not part of any previously identified iso-IR band. The IR function of the focus is auto-correlated to obtain the maximum expected degree of similarity, \( \max\{a_f(t)\} \). The algorithm then searches the entire grid space for all input locations whose peak cross-correlation with the focus differs from the peak auto-correlation value of the focus by no more than 5% (the selected difference tolerance). This condition is given by Eq. 5.

\[
\left| \max\left\{ a_f(t) \right\} - \max\left\{ r_{h_f h_j}(t) \right\} \right| \leq 0.05 \cdot \max\left\{ a_f(t) \right\} \tag{5}
\]

Here, \( r_{h_f h_j}(t) \) is the cross-correlation of the focus's IR function, \( h_f(t) \), with the IR function \( h_j(t) \) of a candidate location \( j \), and \( a_f(t) \) is the auto-correlation of the focus's IR function.
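The band-formation procedure described above can be sketched as a greedy loop. This is an illustrative reading, not the authors' code: it assumes the similarity test keeps a location when its peak cross-correlation with the focus lies within 5% of the focus's peak auto-correlation, and it assumes that a location already assigned to a band is not reassigned. The inputs (location coordinates and per-location IRs to a single C4) and all names are hypothetical.

```python
import math


def xcorr_peak(x, y):
    """Peak over all lags of r_xy[i] = sum_n x[n] * y[n + i]."""
    best = -math.inf
    for lag in range(-(len(x) - 1), len(y)):
        r = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        best = max(best, r)
    return best


def iso_ir_bands(locations, irs, ref, tol=0.05):
    """Greedy grouping of input locations into iso-IR bands.

    locations: dict name -> (x, y) coordinates of a PGC input location.
    irs: dict name -> sampled IR from that location to one C4.
    ref: (x, y) reference location used to order the search.
    Returns a list of (focus, members) tuples.
    """
    order = sorted(locations, key=lambda name: math.dist(locations[name], ref))
    assigned, bands = set(), []
    for focus in order:
        if focus in assigned:
            continue  # already covered by an earlier band
        a_peak = xcorr_peak(irs[focus], irs[focus])  # peak auto-correlation
        members = [
            name for name in order
            if name not in assigned
            and abs(a_peak - xcorr_peak(irs[focus], irs[name])) <= tol * a_peak
        ]
        assigned.update(members)
        bands.append((focus, members))
    return bands


# Hypothetical data: "a" and "b" have nearly identical IRs, "c" does not.
locs = {"a": (0, 0), "b": (1, 0), "c": (5, 0)}
irs = {"a": [1.0, 0.5, 0.25], "b": [1.0, 0.5, 0.26], "c": [0.5, 0.25, 0.1]}
bands = iso_ir_bands(locs, irs, (0, 0))
```

Because the test compares only peak values, two IRs of identical shape whose amplitudes differ by less than the tolerance land in the same band, which matches the intent of bounding the output difference.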

Figure 7 shows the Quad with the iso-IR band limit contours. A band is enclosed between two iso-IR limit contours. For clarity, only every other contour is shown in the figure. Using a difference tolerance of 5%, 28 iso-IR contours were obtained. This represents roughly a 100-fold reduction in the number of IR functions (~3000 to ~30) required to generate the response of the PGC within the given difference tolerance.

![Figure 7. Iso-IR contours depicting the regions with similar impulse response at C4 Vdd0.](image)

The categorization of IR functions into iso-IR bands reduces the maximum number of convolution operations to the total number of identified iso-IR bands. This is due to the superposition property of a linear system, expressed by Eq. 6, where \( B \) represents the total number of iso-IR bands on the grid and \( S_j \) represents the number of input locations inside iso-IR band \( j \):

\[
y_{C4_0}[n] = \sum_{j=1}^{B} y_j[n], \qquad y_j[n] = \sum_{k=1}^{S_j} h_j[n] \ast f_{j,k}[n] = h_j[n] \ast \sum_{k=1}^{S_j} f_{j,k}[n] \tag{6}
\]

Here, \( h_j[n] \) is the IR of band \( j \) and \( f_{j,k}[n] \) is the current waveform at the \( k \)-th input location in that band. As the equation shows, the output \( y_{C4_0}[n] \) at the C4 of interest is a linear superposition of the responses \( y_j[n] \) due to each iso-IR band. Computing each \( y_j[n] \) directly requires \( S_j \) convolutions; however, because all inputs in a band share a single IR, their currents can be summed first, so each \( y_j[n] \) requires only a single convolution. This reduces the total number of convolutions to \( B \).
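The single-convolution-per-band argument relies on the distributivity of convolution, $\sum_k h_j \ast f_{j,k} = h_j \ast \sum_k f_{j,k}$. The sketch below, with hypothetical names and data, sums the input currents of each band first and then convolves once with the band's representative IR, which gives the same result as convolving each input separately.

```python
def convolve(h, f):
    """Full discrete convolution: g[n] = sum_k h[k] * f[n - k]."""
    g = [0.0] * (len(h) + len(f) - 1)
    for m, hm in enumerate(h):
        for k, fk in enumerate(f):
            g[m + k] += hm * fk
    return g


def add(signals):
    """Sample-wise sum of signals, zero-padded to the longest one."""
    out = [0.0] * max(len(s) for s in signals)
    for s in signals:
        for n, v in enumerate(s):
            out[n] += v
    return out


def band_response(h_band, inputs):
    """y_j = h_j * (sum_k f_{j,k}): one convolution per band instead of S_j."""
    return convolve(h_band, add(inputs))


def c4_response(bands):
    """Sum of band responses at one C4; bands is a list of (h_j, inputs) pairs."""
    return add([band_response(h, fs) for h, fs in bands])


h = [0.5, 0.25]
fs = [[1.0, 0.0], [0.0, 2.0]]  # two hypothetical inputs in the same band
```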

### 8.0 Experimental Design

The accuracy of the iso-IR band based fault simulation procedure is verified using a full-custom designed 16-bit logarithmic adder as a representative of the sensitized portion in the CLC. Figure 8(a) shows the approximate location of the adder in the lower left corner of the PGC. The layout for the 16-bit logarithmic adder is shown in Figure 8(b). The gates in the layout consist of transistors with W/L ratios ranging from 2 to 5 for NMOS and 3 to 7 for PMOS (however, most are minimum size). The power rails of the adder are connected to SPICE voltage sources at the six labeled points, $V_{a0}$ through $V_{a5}$. These points are determined by locating the intersection of each label placed in the PGC with the local $V_{DD}$ and GND rails of the adder. Therefore, it may be considered a case of partitioning using the FP-scheme as the local rails are considered part of the CLC. The GND connection points (not shown) are adjacent to the $V_{DD}$ connection points.

The six $V_{DD}$ input locations in the PGC ($V_{a0}$ through $V_{a5}$) were found to lie in 4 different iso-IR bands. Therefore, computing the power grid response for a given input sequence entails six convolution operations using four IR functions. Figure 9(a) shows the current transients measured at the six input locations using SPICE simulations. The response of the PGC at $V_{DD0}$ obtained by convolving each of the current waveforms with its corresponding IR function is shown in Figure 9(b). The overall response of the PGC to the sensitized adder logic measured at $V_{DD0}$ is obtained by linear superposition of its responses to the individual current sources. Figure 10(a) shows the overall response of the PGC to the sensitized adder logic at $V_{DD0}$ obtained using the convolution-based method, overlaid with the response obtained using SPICE simulations.

![Adder Layout](image)

Figure 8. (a) A 16-bit logarithmic adder connected to the PGC. (b) Layout of the adder.

![Adder Waveforms](image)

Figure 9. (a) Adder $i_{DDT}$ waveforms that form the inputs to the PGC. (b) Response of the PGC to the individual $i_{DDT}$s in Figure 9(a).

![Convolution vs. SPICE](image)

Figure 10. (a) Overall response using SPICE superimposed with convolution-derived results. (b) Overall response using SPICE superimposed with convolution-derived results for PWL fits to the $i_{DDT}$s.

The complexity of computing the power grid response by convolution can be further reduced by creating piecewise-linear (PWL) abstractions of the input current waveforms. An algorithm based on detecting changes in the polarity of the slope of a waveform is used to derive its PWL abstraction. This significantly reduces the number of points in each input signal and thus the number of multiplication and addition operations. Figure 10(b) shows the overall response of the grid obtained using convolution based on PWL-abstracted inputs. Overlaid with this curve is the SPICE response obtained using the original SPICE waveforms as inputs. The peak amplitude and width values obtained using SPICE, and the percentage errors of the convolution results for the original and PWL inputs relative to them, are listed in Table 1. The width of an $i_{DDT}$ is measured as the time interval between the points at which the waveform attains 5% of its peak value.
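The paper does not detail its slope-polarity algorithm, so the following is only one plausible reading, stated as an assumption: keep the first and last samples plus every local extremum, i.e., every sample at which the sign of the slope flips. `pwl_abstraction` is a hypothetical name.

```python
def pwl_abstraction(t, x):
    """Keep the end points and every sample where the slope changes sign.

    t, x: equal-length sample lists. Returns the breakpoints as (t, x) pairs.
    Zero-slope steps are skipped, so a plateau does not count as a sign change.
    """
    keep = [0]
    prev_sign = 0
    for k in range(1, len(x)):
        d = x[k] - x[k - 1]
        sign = (d > 0) - (d < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            keep.append(k - 1)  # sample k-1 is a local extremum
        if sign != 0:
            prev_sign = sign
    if keep[-1] != len(x) - 1:
        keep.append(len(x) - 1)
    return [(t[k], x[k]) for k in keep]


t = list(range(7))
x = [0, 1, 3, 2, 1, 2, 0]
breakpoints = pwl_abstraction(t, x)
```

A flat-topped peak is represented only by its last sample in this sketch; a production version would likely need to retain plateau end points as well.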

<table>
<thead>
<tr>
<th>Metric</th>
<th>SPICE value</th>
<th>% error, convolution with original $i_{DDT}$s</th>
<th>% error, convolution with PWL-fitted $i_{DDT}$s</th>
</tr>
</thead>
<tbody>
<tr>
<td>5% width</td>
<td>3.26 ns</td>
<td>0.4%</td>
<td>3%</td>
</tr>
<tr>
<td>Peak</td>
<td>0.594 mA</td>
<td>1.8%</td>
<td>10%</td>
</tr>
</tbody>
</table>

Table 1: Width and peak errors of the convolution-derived responses relative to SPICE.
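The 5% width and percentage errors reported in Table 1 can be computed along the lines below. The sketch assumes the width is taken between the first and last samples at or above 5% of the peak, without interpolating between samples, since the paper does not state how the crossings are located; function names are hypothetical.

```python
def width_at_fraction(t, x, frac=0.05):
    """Time between the first and last samples at or above frac * peak(x)."""
    threshold = frac * max(x)
    above = [k for k, v in enumerate(x) if v >= threshold]
    return t[above[-1]] - t[above[0]]


def percent_error(reference, estimate):
    """Absolute percentage difference of estimate relative to reference."""
    return 100.0 * abs(estimate - reference) / abs(reference)


# Hypothetical sampled waveform (arbitrary units) on a 1-unit time grid.
t = [0, 1, 2, 3, 4, 5, 6]
x = [0.0, 0.01, 1.0, 2.0, 1.0, 0.2, 0.0]
```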

### 9.0 Isolated Path (CLC) Simulation

SPICE simulations of isolated paths can be used to simulate the CLC paths and derive the current waveforms that feed the PGC inputs. The complexity of such a scheme can be analyzed by estimating the average number of gates sensitized under a given test sequence. If we assume a maximum fanout ($F_O$) of 4, a logic depth ($D$) of 5 and a maximum fanin ($F_I$) of 3, the maximum number of sensitized gates, $N_{max}$, can be computed as the geometric progression given by Eq. 7.

\[
N_{max} = \sum_{k=0}^{D-1} F_O^{\,k} = \frac{F_O^{\,D} - 1}{F_O - 1} = \frac{4^5 - 1}{4 - 1} = 341 \tag{7}
\]

With $2F_I = 6$ transistors per gate, this corresponds to 2046 transistors. Assuming that the minimum number of gates sensitized under a test sequence is 5 (30 transistors), the average of these two extremes is 1038 transistors per test sequence. This is a significant reduction in the number of transistors that must be simulated under a given test sequence compared to the entire CLC of the CUT. Note that under any given test sequence, several paths may be sensitized independently. All such independently sensitized paths can be isolated and simulated in parallel, and their results can be combined during the PGC convolution process. Instead of using SPICE to simulate the isolated paths, we may also use tools that perform Transient Current Simulation on switch-level Verilog netlists. We are currently investigating the accuracy and complexity of Transient Current Simulation. Also, results for paths already simulated when processing a previous test vector can be reused for other test sequences.
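The transistor-count estimate can be checked with a few lines of arithmetic. The sketch assumes, as the 30-transistors-for-5-gates figure implies, that each gate uses $2F_I$ transistors, and that the average is taken as the midpoint of the minimum and maximum counts; the function names are hypothetical.

```python
def max_sensitized_gates(fanout, depth):
    """Geometric series 1 + F_O + F_O**2 + ... + F_O**(D-1)."""
    return (fanout ** depth - 1) // (fanout - 1)


def transistors(gates, fanin):
    """Static CMOS gate with fanin F_I: F_I NMOS plus F_I PMOS transistors."""
    return 2 * fanin * gates


F_O, D, F_I, MIN_GATES = 4, 5, 3, 5
n_max = max_sensitized_gates(F_O, D)  # 341 gates
t_max = transistors(n_max, F_I)       # 2046 transistors
t_min = transistors(MIN_GATES, F_I)   # 30 transistors
t_avg = (t_max + t_min) // 2          # 1038 transistors
```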

### 10.0 Conclusions

This paper investigates the practical issues concerning the implementation of a fault simulation methodology for $i_{DDT}$ testing. A new convolution-based model is proposed that computes the power grid response using precomputed impulse response (IR) functions. This circumvents the need for running time- and memory-intensive transient simulations of the entire CUT. The categorization of IR functions into iso-IR bands is shown to further reduce the number of convolutions required to compute the transient response of the grid to at most the number of iso-IR bands. An approach based on isolating sensitized paths from the layout is proposed as a means of enabling the simulation of the core logic circuit. The accuracy and complexity of these methods are evaluated on part of a commercial power grid, using a 16-bit logarithmic adder as the core logic.

### References