BUGS are like a career partner for a verification engineer. Every new bug discovery makes you feel elated and satisfied. A verification engineer probably says "bugs" more often than his or her own kid's name :)
However, bugs aren't always welcome, especially when they are discovered late. Intel, the top semiconductor company, has recently been bitten by another nasty bug.
OLD BUG – In late 1994, Intel confirmed a bug, popularly known as the FDIV bug (FDIV is the x86 assembly-language floating-point divide instruction), in the hardware divide block of the Pentium processor. According to Intel, the bug is rare (about once in every 9 to 10 billion operand pairs), and its impact depends on how often floating-point instructions are used, the input operands, how the output of the divide unit propagates into further computation, and how the final results are interpreted. The bug was root-caused to a few missing entries in the lookup table used by the division algorithm and was fixed with a mask change in a re-spin. The total cost of replacing the processors was estimated at $475 million in 1995.
One example used to test for the bug: (824633702441.0) × (1/824633702441.0) should equal 1, while the affected chips return 0.999999996274709702 for this calculation.
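The check above can be sketched in a few lines of Python. This is only an illustration, not Intel's or anyone's official test: the function names and the tolerance of 1e-12 are assumptions made here. A tolerance is used instead of an exact comparison because ordinary rounding can leave a correct result a tiny fraction away from 1, whereas the value reported for affected chips is off by roughly 3.7e-9.

```python
# Operand and flawed result as quoted in the text above.
OPERAND = 824633702441.0
REPORTED_FLAWED_RESULT = 0.999999996274709702


def reciprocal_product(x: float) -> float:
    """Compute x * (1/x), which should be (very nearly) 1 on correct hardware."""
    return x * (1.0 / x)


def deviates_from_one(value: float, tolerance: float = 1e-12) -> bool:
    """Return True if value is further from 1.0 than the assumed tolerance."""
    return abs(value - 1.0) > tolerance


if __name__ == "__main__":
    product = reciprocal_product(OPERAND)
    print("this machine suspicious: ", deviates_from_one(product))
    print("reported value suspicious:", deviates_from_one(REPORTED_FLAWED_RESULT))
```

On a processor without the flaw, the first check should come out clean, while the value quoted for the affected chips is flagged.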
NEW BUG – A few weeks back, Intel confirmed another bug, this time in the Cougar Point chipsets recently announced at CES, which have two sets of SATA ports (3 Gbps and 6 Gbps). The problem was traced to a transistor in the 3 Gbps PLL clocking tree that has a very thin gate oxide, allowing it to turn on at a very low voltage. This transistor is biased with a high voltage, which leads to high leakage current. Continuous use of this port is estimated to cause the transistor to fail within 2 to 3 years. The problem was confirmed in the Intel reliability lab during accelerated lifetime testing (a time machine of sorts). The remedy is a metal-layer fix. Intel has decided to replace the chipsets and has put the cost at approximately $700 million in the worst case.
Points to ponder
- The time between the two bugs (both of which reached customers) is about 16 years.
- Intel's handling of this kind of crisis-like situation has improved a lot since the first one.
- Corrected samples took months for the first bug but only weeks for the second.
- The FDIV bug could have been discovered with random verification (which was still evolving at that time).
- The new one is a reliability issue. We probably need modeling techniques to uncover such issues early enough.
- The cost of the bug to Intel is more than the annual revenues of many semiconductor companies.
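To make the random verification point concrete, here is a minimal sketch of the idea: drive random operand pairs into a design under test and compare each result against a golden reference model. Everything in it is hypothetical. `dut_divide` is a stand-in model with a made-up injected fault that fires for a narrow set of divisors, loosely imitating a lookup table with missing entries; it is not a model of the actual Pentium divider, and the real bug was far rarer than the fault used here.

```python
import random
from fractions import Fraction


def reference_divide(a: int, b: int) -> Fraction:
    """Golden reference model: exact rational division."""
    return Fraction(a, b)


def dut_divide(a: int, b: int) -> Fraction:
    """Hypothetical device under test with an injected fault.

    The trigger condition below is invented for illustration only.
    """
    result = Fraction(a, b)
    if b % 1024 == 777:  # made-up corner case standing in for a missing table entry
        result -= Fraction(1, 2**14)  # small error in the quotient
    return result


def random_check(trials: int, seed: int = 1) -> list:
    """Compare DUT and reference on random operands; return mismatching pairs."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    mismatches = []
    for _ in range(trials):
        a = rng.randrange(1, 1 << 24)
        b = rng.randrange(1, 1 << 24)
        if dut_divide(a, b) != reference_divide(a, b):
            mismatches.append((a, b))
    return mismatches


if __name__ == "__main__":
    failures = random_check(trials=20000)
    print(f"{len(failures)} mismatches found, first few: {failures[:3]}")
```

The design choice worth noting is the fixed seed: a mismatch found by a random run is only useful if the same stimulus can be replayed while debugging. For a fault as rare as the one quoted for FDIV, purely uniform random operands would need a very large number of trials, which is why operand distributions are usually constrained towards suspected corner cases.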
BUGS, the inevitable part of our careers, can be costly at times. Ongoing developments in verification methodologies, standards, modeling and EDA technology all work towards taming those hidden defects that tend to prove Murphy's law (if anything can go wrong, it will go wrong when you least expect it) some day.
Be cautious & Happy bug hunting!