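When you look at datasheets for electronic equipment, especially the expensive kind, you will often see an MTBF (mean time between failures) or mean life value. Naturally, higher values are better, but that number can mislead in exactly the same way as the average salary figures quoted in the press: most people earn less. Under the constant failure rate model that usually sits behind an MTBF figure, about 63% of units fail before they reach the MTBF.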
How is the MTBF number obtained? Typically, a manufacturer takes a sample of products and tests it under severe use conditions (but still within specs) until enough time has passed to prove the target MTBF value. The proven MTBF is calculated with a very old formula based on the chi-squared distribution.
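For a test that is stopped after a fixed amount of time, the formula commonly used for this purpose can be written as follows. The symbols T, r, and CL are notation chosen for this sketch.

```latex
\mathrm{MTBF}_{\text{proven}} = \frac{2T}{\chi^2_{CL,\;2r+2}}
```

Here T is the total test time accumulated by all samples, r is the number of failures observed, CL is the confidence level, and the denominator is the CL quantile of the chi-squared distribution with 2r+2 degrees of freedom. With zero failures this reduces to T / (-ln(1 - CL)), so proving a target MTBF at 90% confidence takes roughly 2.3 times that MTBF in failure-free test hours.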
Accelerated Life Test
Unfortunately, the formula requires a huge amount of failure-free testing time to prove the target. That is why in most cases the testing is done at elevated stress, so that the test hours can be multiplied by the Acceleration Factor.
Another thing: you may have noticed that the formula does not specify the number of test samples. That is because the test time is the sum of the test time accumulated by all samples. This means we can try to cheat the procedure by testing a very large number of samples for a short time. But few companies can afford that, especially for a new design.
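The trade-off between test time, sample count, and Acceleration Factor can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual chi-squared bound for a time-terminated test; the function names and the example numbers (100,000 h target, 90% confidence, 20 samples, Acceleration Factor of 10) are made up for the example.

```python
import math

def chi2_cdf_even_dof(x, dof):
    """CDF of the chi-squared distribution for an even number of degrees
    of freedom, using the closed-form Poisson sum."""
    k = dof // 2
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even_dof(p, dof):
    """Invert the CDF above by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_dof(hi, dof) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_dof(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def proven_mtbf(total_hours, failures, confidence):
    """Lower bound on MTBF proven by a time-terminated test."""
    return 2.0 * total_hours / chi2_quantile_even_dof(confidence, 2 * failures + 2)

def required_hours_per_sample(target_mtbf, confidence, samples,
                              acceleration_factor=1.0, failures=0):
    """Clock hours each sample must run to prove target_mtbf."""
    quantile = chi2_quantile_even_dof(confidence, 2 * failures + 2)
    equivalent_hours = target_mtbf * quantile / 2.0
    return equivalent_hours / (samples * acceleration_factor)

if __name__ == "__main__":
    hours = required_hours_per_sample(100_000, 0.90, samples=20,
                                      acceleration_factor=10)
    print(f"Each sample must run about {hours:.0f} h without failure")
```

With these illustrative numbers each sample has to run for roughly 1,150 hours without a failure. Doubling the number of samples or the Acceleration Factor halves that time, which is exactly the lever described above.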
Finding the Acceleration Factor for your test is not an easy task. First of all, you have to specify the target failure mode that you want to accelerate. This should be the critical part of your design, the one that has to be checked carefully, and you design your test around it. Next, you will have to check the research literature on failure modes similar to yours and try to find values describing how much the degradation accelerates with, for example, increasing test temperature. If you cannot find anything, don't let that stop you: find the next most similar case and use it as an assumption.
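For temperature-driven failure modes, one widely used way to turn a literature value into an Acceleration Factor is the Arrhenius model. The text above does not name a model, so treat the sketch below as one possible choice; the activation energy of 0.7 eV and the 40 °C and 85 °C temperatures are purely illustrative assumptions.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Acceleration Factor between use and stress temperature (Arrhenius model).

    activation_energy_ev is the value you would look up in the literature
    for the failure mode you are targeting.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)

if __name__ == "__main__":
    af = arrhenius_acceleration_factor(0.7, use_temp_c=40, stress_temp_c=85)
    print(f"Acceleration Factor: {af:.1f}")
```

The result is very sensitive to the activation energy, so an assumed value borrowed from a similar failure mode should be recorded as an assumption next to the test results.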
Next, design the test setup and decide how much you want to increase the test stress. Remember, it cannot go beyond the limits given in the specs, and it has to exceed normal use by enough to bring the test time down to something reasonable.
What is left is to prepare and run the test. But don't just let it run: observe it closely, simply because it is a great opportunity to learn about the degradation processes in the product. And that learning takes place before the product is released to customers, which makes it much safer.
Testing to failure
Remember that critical part of your design? Maybe it's a new part, a new supplier, or a new technology that we don't know yet. How about we test a sample to failure to truly discover its reliability? However, this can take a lot of time, which we typically don't have.
We could accelerate it, just like before, but then we really don't know how the reliability at an elevated stress level translates to normal use. To answer that, one has to run an additional test on an additional sample at a different elevated stress level. With time-to-failure data from two stress levels, we can extrapolate to use conditions.
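A minimal sketch of that extrapolation, assuming temperature is the stress and that life follows an Arrhenius relation, ln(life) = a + b / T. Both the model and the numbers (2,000 h at 85 °C, 600 h at 105 °C, use at 40 °C) are assumptions made for illustration; with real data you would fit a characteristic life from several failures per stress level rather than a single point each.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def fit_two_point_arrhenius(temp1_c, life1_h, temp2_c, life2_h):
    """Fit ln(life) = a + b / T through two (temperature, life) points."""
    inv_t1 = 1.0 / (temp1_c + 273.15)
    inv_t2 = 1.0 / (temp2_c + 273.15)
    b = (math.log(life1_h) - math.log(life2_h)) / (inv_t1 - inv_t2)
    a = math.log(life1_h) - b * inv_t1
    return a, b

def life_at(temp_c, a, b):
    """Extrapolated life in hours at the given temperature."""
    return math.exp(a + b / (temp_c + 273.15))

if __name__ == "__main__":
    a, b = fit_two_point_arrhenius(85, 2000.0, 105, 600.0)
    print(f"Implied activation energy: {b * BOLTZMANN_EV_PER_K:.2f} eV")
    print(f"Extrapolated life at 40 C: {life_at(40, a, b):.0f} h")
```

Two points always fit a line perfectly, so this gives no indication of how well the model holds. A third stress level is the usual way to check it.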
Because we need at least two samples and different test conditions, we may need more than one test setup (money!) to run all of them at the same time. And because we want to see failures, we can't say exactly how long it will take, which is a difficult thing to tell a program manager. That is why this methodology is more useful for testing new parts early in the design process.
But what if there is a way to measure the level of degradation of a sample, e.g. a drop in performance or a change in the signals it produces?
We could measure how it changes with time, and in that case we don't need to wait for failures to happen: we can extrapolate from the measurements to find the time when performance will drop below the accepted level, for example the luminosity of an LED. If the measurements can be taken online, it is also possible to project how much more testing time is needed.
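The extrapolation from measurements can be sketched as follows. This is a minimal example that assumes exponential decay of the measured quantity and a failure threshold of 70% of the initial value; the decay model, the threshold, and the LED readings are all illustrative assumptions.

```python
import math

def fit_exponential_decay(times, values):
    """Least-squares fit of ln(value) = ln(b) - alpha * t.

    Returns (b, alpha): the fitted initial value and the decay rate.
    """
    n = len(times)
    logs = [math.log(v) for v in values]
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope

def time_to_threshold(times, values, threshold):
    """Projected time at which the fitted curve crosses the threshold."""
    b, alpha = fit_exponential_decay(times, values)
    if alpha <= 0:
        return math.inf  # no measurable degradation in the data so far
    return math.log(b / threshold) / alpha

if __name__ == "__main__":
    # Hypothetical relative luminous flux of an LED, measured every 1,000 h.
    hours = [0, 1000, 2000, 3000, 4000, 5000, 6000]
    flux = [1.000, 0.991, 0.979, 0.970, 0.961, 0.951, 0.942]
    print(f"Projected time to 70%: {time_to_threshold(hours, flux, 0.70):.0f} h")
```

Re-running the fit after each new measurement shows whether the projection has stabilized, which is one way to judge how much more testing time is needed.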
The outcome of such a test is the most accurate information you can get about how the tested part will perform in the field.
This is by far the most advanced approach to reliability testing. But because of the difficulties with test setup and sample preparation, it is also the most difficult to perform.