TITLE: TIMING-LOGIC DERATING COMPUTATION USING EVENT PROPAGATION PROBABILITIES
Soft errors, also called transient errors, are intermittent hardware malfunctions that are not
reproducible~\cite{Nguyen03}. These errors, which can occur more often than hard (permanent)
errors~\cite{Karlsson94}, arise from Single Event Upsets (SEUs). The Soft Error Rate (SER) of a
device is defined as its error rate due to SEUs.
The SER of integrated circuits increases with device scaling: as the feature size shrinks, the
charge stored per device decreases, so a particle strike becomes much more likely to cause an
error~\cite{Shivakumar02}. As a result, lower-energy particles, which are far more plentiful, can
generate sufficient charge to cause a soft error.
While memory elements have been more susceptible to SEUs than combinational logic, analytical
models predict that the SER of combinational logic will be comparable to that of memory elements
by 2011~\cite{Shivakumar02}. Unlike memory elements, which can be protected with 10--20\% area and
power overhead~\cite{Thaller03}, redundancy techniques for logic circuits impose significant
overhead in area, power, timing, and cost~\cite{Mohanram03a}.
Such overhead may not be affordable in mainstream applications, where cost, area, and power are
the main concerns.
The first step in developing efficient soft-error-tolerant schemes is to accurately estimate the
SER of the design and the contribution of each circuit node to the overall system SER. Although the
particle flux is uniform across the whole circuit, different circuit nodes have different error
propagation probabilities.
To compute the error rate of a node $n_i$ in a digital circuit, three factors must be
computed~\cite{Nguyen03}:
Nominal FIT:
the rate at which SEUs at node $n_i$ cause a glitch at the output of the gate. This parameter
depends on the energy of the particle, the type and size of the gate, and the device
characteristics.
Logic Derating:
the probability that node $n_i$ is functionally sensitized by the input vectors, so that the
erroneous value propagates from the error site to the system outputs or flip-flops.
Timing Derating:
the probability that the erroneous value propagated from node $n_i$ to the flip-flop inputs is
latched by the flip-flops.
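If the three factors are treated as independent, the error rate of a node is their product, and the circuit SER can be approximated by summing the contributions of all nodes. In general this independence does not hold. The following is a minimal Python sketch of that bookkeeping. The class and function names, the node names, and all numeric values are hypothetical placeholders, not data from this work.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeFactors:
    """Hypothetical per-node inputs; the values used below are illustrative only."""
    name: str
    nominal_fit: float      # glitch rate at the gate output, in FIT (failures per 1e9 hours)
    logic_derating: float   # P(error is functionally sensitized to a flip-flop or output)
    timing_derating: float  # P(a propagated glitch is latched by the flip-flop)


def node_ser(node: NodeFactors) -> float:
    """Product of the three factors; valid only if the derating factors are independent."""
    return node.nominal_fit * node.logic_derating * node.timing_derating


def circuit_ser(nodes: Iterable[NodeFactors]) -> float:
    """Sum of node contributions, treating upsets at different nodes as independent rare events."""
    return sum(node_ser(n) for n in nodes)


nodes = [
    NodeFactors("n1", nominal_fit=1e-3, logic_derating=0.5, timing_derating=0.2),
    NodeFactors("n2", nominal_fit=2e-3, logic_derating=0.1, timing_derating=0.4),
]
for n in nodes:
    print(n.name, node_ser(n))
print("circuit", circuit_ser(nodes))
```

Sorting the nodes by \texttt{node\_ser} gives the per-node contribution to the overall system SER. The text identifies this as the first step toward efficient soft-error-tolerant schemes.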
An erroneous system state occurs in the following scenario. A particle strike must cause a glitch
at the output of the gate (nominal FIT). This glitch must propagate through the combinational logic
to the flip-flop inputs (logic derating). Finally, the erroneous glitch must be captured in a
flip-flop, i.e., the erroneous transient must overlap sufficiently with the latching window of the
flip-flop (timing derating).
Note that, in general, logic derating and timing derating are not independent, since the
propagation of an erroneous glitch depends on the logic state of the entire circuit. Hence, an
accurate analysis must compute the two jointly, as a combined timing-logic derating.
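The dependence can be written out explicitly. The notation here is ours and purely illustrative:
\begin{itemize}
\item $R_{\mathrm{nom}}(n_i)$ is the nominal FIT of node $n_i$;
\item $v$ ranges over circuit states (input vectors), each with probability $\Pr[v]$;
\item $S_i(v)$ is the event that $v$ sensitizes a path from $n_i$ to a flip-flop;
\item $L_i$ is the event that the propagated glitch is latched.
\end{itemize}
\begin{equation}
\mathrm{SER}(n_i) = R_{\mathrm{nom}}(n_i)\,\mathrm{TLD}(n_i), \qquad
\mathrm{TLD}(n_i) = \sum_{v} \Pr[v]\,\mathbf{1}[S_i(v)]\,\Pr[L_i \mid v].
\end{equation}
Suppose $\Pr[L_i \mid v]$ equals the same constant $t_i$ for every sensitizing vector. Then the sum factors as $t_i\,\mathrm{LD}(n_i)$ with $\mathrm{LD}(n_i) = \sum_{v} \Pr[v]\,\mathbf{1}[S_i(v)]$, which recovers the product of logic derating and timing derating. When different vectors sensitize paths with different delays, or block some of the reconvergent paths, $\Pr[L_i \mid v]$ varies with $v$ and the product is only an approximation.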
Nominal FIT can be readily obtained from the layout information of library cells, technology
parameters, and particle energy~\cite{Maheshwari03,Mohanram03a,Nguyen03}. Hence, in this work we
do not focus on estimating this parameter.
Computing logic derating requires the probability that the node is functionally sensitized by the
input vectors, so that the erroneous value propagates from the error site to the
outputs~\cite{Mohanram03a}. Previous logic derating estimation methods use fault injection based on
random-vector simulation~\cite{Maheshwari03,Mohanram03a,Nguyen03,Omana03,Reorda03,Shivakumar02,Zhou04b}.
The number of input vectors grows exponentially with the number of primary inputs, so the execution
time of such estimation grows exponentially with circuit size. Fault injection therefore becomes
intractable for large circuits, and restricting it to a feasible number of random vectors makes it
inaccurate. We have proposed an analytical method that accurately estimates logic derating in
combinational circuits using signal probabilities~\cite{asadi-date05,asadi-iscas05}. The proposed
algorithm has linear computational complexity. Previous research on probabilistic testability
analysis also computes the observability of internal nodes~\cite{Jain84,Jone95,Seth85}. However,
these algorithms either sacrifice accuracy at reconvergent fanouts or have non-linear computational
complexity.
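The following minimal Python sketch contrasts the two approaches on a hypothetical three-input circuit $y = (a \wedge b) \vee c$, with the error site at the AND-gate output $n$. The first function estimates logic derating by random-vector fault injection. The second gives a one-line signal-probability estimate: an OR gate passes a flipped value only when its other input is 0, so the estimate is $1 - \mathrm{SP}(c)$. This illustrates the general idea only and is not the authors' algorithm. The simple rule is exact here only because the circuit has no reconvergent fanout, and all function names are hypothetical.

```python
import random


def circuit(a: int, b: int, c: int, flip: bool = False) -> int:
    """Hypothetical toy circuit y = (a AND b) OR c; the error site is n = a AND b."""
    n = a & b
    if flip:
        n ^= 1  # model the SEU as a bit flip at node n
    return n | c


def logic_derating_mc(trials: int, seed: int = 1) -> float:
    """Fault-injection estimate: fraction of random vectors for which flipping n changes y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b, c = (rng.randint(0, 1) for _ in range(3))
        if circuit(a, b, c) != circuit(a, b, c, flip=True):
            hits += 1
    return hits / trials


def logic_derating_sp(sp_c: float) -> float:
    """Signal-probability estimate: an OR gate passes a flip on n only when c == 0.

    Exact here because the circuit has no reconvergent fanout.
    """
    return 1.0 - sp_c


print(logic_derating_mc(100_000))
print(logic_derating_sp(0.5))
```

With uniformly random inputs the analytical value is 0.5, and the Monte Carlo estimate should land close to it. The sampling cost of fault injection grows with the accuracy required and must be paid again for every node. The signal-probability estimate, in this fanout-free case, is a single arithmetic step per gate.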
Timing derating depends on three quantities:
\begin{itemize}
\item the width of the glitch caused by a particle strike at the output of the gate;
\item the latching windows of the reachable flip-flops;
\item the propagation delay from the output of the victim gate to the inputs of the reachable flip-flops.
\end{itemize}
Moreover, the logic propagation probability of the erroneous glitch must also be considered
carefully, since transients propagating through reconvergent paths may be blocked depending on the
logic state of the circuit. Unlike logic derating, which requires only static analysis, timing
derating requires dynamic analysis of transient propagation. Therefore, fault injection for timing
derating estimation requires timing-accurate simulation, which makes it even more time-consuming
than fault injection for logic derating.
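A common first-order model of latching-window masking makes the dependence on glitch width concrete. This is a minimal Python sketch under assumptions that are ours, not the paper's:
\begin{itemize}
\item the glitch reaches the flip-flop at a uniformly random time within one clock period;
\item it is latched only if it covers the entire setup-plus-hold window around the clock edge;
\item logic masking and reconvergent paths are ignored.
\end{itemize}
The function name and parameters are hypothetical.

```python
def latch_probability(glitch_width: float, setup: float, hold: float,
                      clock_period: float) -> float:
    """First-order latching-window model (hypothetical parameters, consistent time units).

    Assumes the glitch reaches the flip-flop at a uniformly random time within one
    clock period and is latched only if it covers the whole window from `setup`
    before to `hold` after the clock edge.
    """
    if clock_period <= 0:
        raise ValueError("clock_period must be positive")
    window = setup + hold
    # The glitch start times that cover the window form an interval of length
    # (glitch_width - window); a fixed path delay shifts that interval but not its length.
    return min(1.0, max(0.0, glitch_width - window) / clock_period)


print(latch_probability(100.0, setup=20.0, hold=10.0, clock_period=1000.0))
```

Consider a 100 ps glitch, a 30 ps latching window, and a 1000 ps clock period. The glitch is latched for a 70 ps range of arrival times, giving 0.07. Under the uniform-arrival assumption the path delay drops out. It starts to matter once arrival times are not uniform, or once glitches reaching the same flip-flop over several paths must be combined.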
In this talk, we present a technique for timing-logic derating estimation of sequential circuits. We
use an enhanced static timing analysis method to compute all propagated waveforms from a stricken
gate to the reachable flip-flops. From these waveforms we calculate the probability of latching an
incorrect value in a flip-flop (i.e., an incorrect system state). We also use a technique based on
signal probabilities to estimate propagation probabilities. Experiments show that our SER
estimation technique is 4--5 orders of magnitude faster than Monte Carlo simulation, while the
difference between the two estimates is less than 2\% on average.
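The waveform-propagation step can be illustrated with a deliberately simplified Python sketch. Everything in it is a hypothetical stand-in, not the enhanced static timing analysis described above. Gate names, delays (in ps), the glitch width, and the setup and hold values are invented. The sketch also makes these assumptions:
\begin{itemize}
\item every path is logically sensitized;
\item gates only delay a pulse without reshaping it;
\item pulses that reach the same node over reconvergent paths combine as a union, as same-polarity pulses do at an OR-type gate;
\item all arrivals fall within one clock period.
\end{itemize}
The sketch enumerates paths from the stricken gate and records each pulse's arrival interval. It then merges overlapping pulses and computes the fraction of uniformly distributed strike times for which some merged pulse fully covers the latching window.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) of a pulse, relative to the strike time


def propagate_glitch(fanout: Dict[str, List[str]], delay: Dict[str, float],
                     source: str, width: float) -> Dict[str, List[Interval]]:
    """Arrival interval of the glitch at every reachable node, one per path.

    Assumes an acyclic `fanout` graph, every path sensitized, and gates that
    only delay the pulse (no broadening, attenuation, or electrical masking).
    """
    arrivals: Dict[str, List[Interval]] = defaultdict(list)
    arrivals[source].append((0.0, width))
    stack = [(source, 0.0)]  # (node, pulse start time at that node)
    while stack:  # explicit path enumeration: simple, but exponential in the worst case
        node, start = stack.pop()
        for succ in fanout.get(node, []):
            t = start + delay.get(succ, 0.0)
            arrivals[succ].append((t, t + width))
            stack.append((succ, t))
    return dict(arrivals)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union of overlapping pulses (assumed to combine, e.g. same polarity at an OR)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ff_latch_probability(intervals: List[Interval], setup: float, hold: float,
                         clock_period: float) -> float:
    """P(some merged pulse covers the whole setup+hold window), strike time uniform.

    A pulse of length L covers a window of length W for an (L - W)-long range of
    strike times; disjoint pulses give disjoint ranges. Assumes all arrivals fall
    within one clock period, so the ranges do not wrap around.
    """
    window = setup + hold
    covered = sum(max(0.0, (end - start) - window) for start, end in merge(intervals))
    return min(1.0, covered / clock_period)


# Hypothetical circuit: g1 is struck; g2 and g3 reconverge at g4, which drives "ff".
fanout = {"g1": ["g2", "g3"], "g2": ["g4"], "g3": ["g4"], "g4": ["ff"]}
delay = {"g2": 30.0, "g3": 50.0, "g4": 10.0, "ff": 0.0}  # ps
at_ff = propagate_glitch(fanout, delay, "g1", width=40.0)["ff"]
p = ff_latch_probability(at_ff, setup=20.0, hold=10.0, clock_period=1000.0)
print(merge(at_ff), p)
```

In this example, each path alone delivers a 40 ps pulse, which covers the 30 ps window for only a 10 ps range of strike times. The two reconvergent pulses, however, overlap into one 60 ps pulse. So \texttt{p} is 0.03, rather than the 0.02 obtained by treating the paths separately.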
===============================
References:
\bibitem{asadi-date05} -, ``An Accurate SER Estimation Method Based on Propagation Probability,'' Proc. Design, Automation and Test in Europe (DATE) Conf., pp. 306--307, 2005.
\bibitem{asadi-iscas05} -, ``An Analytical Approach for Soft Error Rate Estimation in Digital Circuits,'' Proc. IEEE Int'l Symp. on Circuits and Systems (ISCAS), 2005.
\bibitem{Jain84} S. K. Jain and V. D. Agrawal, ``STAFAN: An Alternate to Fault Simulation,'' Proc. 21st Design Automation Conf., pp. 18--23, 1984.
\bibitem{Jone95} W. B. Jone and S. R. Das, ``CALOP -- A Random Pattern Testability Analyzer,'' IEEE Trans. on Systems, Man and Cybernetics, vol. 25, May 1995.
\bibitem{Karlsson94} J. Karlsson, P. Ledan, P. Dahlgren, and R. Johansson, ``Using Heavy-Ion Radiation to Validate Fault Handling Mechanisms,'' IEEE Micro, vol. 14, no. 1, pp. 8--23, Feb. 1994.
\bibitem{Maheshwari03} A. Maheshwari, I. Koren, and W. Burleson, ``Techniques for Transient Fault Sensitivity Analysis and Reduction in VLSI Circuits,'' Proc. IEEE Int'l Symp. on Defect and Fault Tolerance, pp. 597--604, 2003.
\bibitem{Mohanram03a} K. Mohanram and N. A. Touba, ``Cost-Effective Approach for Reducing Soft Error Failure Rate in Logic Circuits,'' Proc. Int'l Test Conf., pp. 893--901, 2003.
\bibitem{Nguyen03} H. T. Nguyen and Y. Yagil, ``A Systematic Approach to SER Estimation and Solutions,'' Proc. Int'l Reliability Physics Symp., pp. 60--70, 2003.
\bibitem{Omana03} M. Omana, G. Papasso, D. Rossi, and C. Metra, ``A Model for Transient Fault Propagation in Combinatorial Logic,'' Proc. IEEE Int'l On-Line Testing Symp., 2003.
\bibitem{Reorda03} M. Sonza Reorda and M. Violante, ``Accurate and Efficient Analysis of Single Event Transients in VLSI Circuits,'' Proc. IEEE Int'l On-Line Testing Symp., 2003.
\bibitem{Seth85} S. C. Seth, L. Pan, and V. D. Agrawal, ``PREDICT -- Probabilistic Estimation of Digital Circuit Testability,'' Proc. 15th Int'l Symp. on Fault-Tolerant Computing, pp. 220--225, 1985.
\bibitem{Shivakumar02} P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, ``Modeling the Effect of Technology Trends on the Soft Error Rate of Combinatorial Logic,'' Proc. Int'l Conf. on Dependable Systems and Networks (DSN), pp. 389--398, 2002.
\bibitem{Thaller03} K. Thaller and A. Steininger, ``A Transparent Online Memory Test for Simultaneous Detection of Functional Faults and Soft Errors in Memories,'' IEEE Trans. on Reliability, vol. 52, no. 4, pp. 413--422, Dec. 2003.
\bibitem{Zhou04b} Q. Zhou and K. Mohanram, ``Cost-Effective Radiation Hardening Technique for Combinational Logic,'' Proc. Int'l Conf. on Computer-Aided Design (ICCAD), 2004.