RAMS stands for Reliability, Availability, Maintainability, and Safety. RAMS data is essential in the Integrated Logistics Support (ILS) framework. While much of the RAMS analysis can be done in a spreadsheet, it is best done with dedicated software tools. Here is why.
The motivation for this text is the Relyence software in our offer and its capability to perform all essential RAMS activities in the ILS framework. From now on we refer to software tools in general, although Relyence screenshots are used to illustrate the methods discussed.
The role of ILS is to provide optimal support for a product. The first requirement is a clear representation of what “a product” means. This is done in a Product Breakdown Structure (PBS), where a system is divided into subsystems, sub-subsystems, and components. Each level carries its own data that has to be stored, e.g. requirements, maintenance activities, and vendor information, and in this way a product database is created. A PBS resembles a folder or tree structure, with the main system as the root and the subsystems as branches. The PBS “tree” is a standard form of system breakdown in many RAMS-oriented software tools.
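
The tree idea can be sketched in a few lines of Python. This is only an illustration: the class name `PbsNode`, its fields, and the example product are assumptions made for this sketch, not the data model of any particular RAMS tool.

```python
from dataclasses import dataclass, field


@dataclass
class PbsNode:
    """One element of a Product Breakdown Structure (hypothetical schema)."""
    name: str
    data: dict = field(default_factory=dict)  # e.g. vendor, requirements
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a sub-element and return it, so branches can be chained."""
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, node) pairs from the root down, parents first."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Invented example product; names and data are placeholders.
pump_unit = PbsNode("Pump unit")
drive = pump_unit.add(PbsNode("Drive subsystem"))
drive.add(PbsNode("Electric motor", {"vendor": "Vendor A"}))
drive.add(PbsNode("Coupling"))
pump_unit.add(PbsNode("Control cabinet"))

for depth, node in pump_unit.walk():
    print("  " * depth + node.name)
```

Every other analysis discussed here (failure modes, maintenance tasks, failure rates) can then be attached to a node of such a tree, which is what makes the data reusable.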

Next is Failure Mode and Effects Analysis (FMEA) in its various forms. It can be a reliability-oriented FMECA (Failure Mode, Effects, and Criticality Analysis) or a maintenance-oriented FMEA performed as part of Reliability Centered Maintenance (RCM) analysis. Essentially, it is a repository of the failure modes present in a product and of the effects they have on the product and its mission (or on the user). That list is the heart of RAMS activities. A great deal of paperwork and thinking about the physics of failure goes into understanding how a product may fail in the field. Where field data (warranty, service) from similar products exists, it is used, and it is the best source of information. FMEA is traditionally done in a spreadsheet, although modern FMEA standards in practice push it toward dedicated software tools. FMEA information is vital for many engineering disciplines, from design, system integration, safety, and manufacturing to maintenance and spare parts planning. It should be kept in a database that is easy to maintain and update and that is accessible to everyone involved.

FMEA data overlaps heavily with Fault Tree Analysis (FTA), which is most commonly used by safety analysts to understand how failures propagate through technical systems and what consequences they have. It is a standard method in the aerospace, railway, and nuclear industries and is spreading to others. It is easy to draw a simple fault tree on a piece of paper, but for a complex product it can take many sheets, so it is better to use dedicated software (and there is plenty on the market).
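
To show what propagation means numerically, here is a minimal sketch of the two basic fault tree gates. It assumes that all basic events are statistically independent and that no event appears more than once in the tree; real FTA tools handle repeated events and common causes, which this sketch does not. The event names and probabilities are invented.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Invented example: cooling is lost if both redundant pumps fail,
# or if the single controller fails.
p_pump_a = 0.02
p_pump_b = 0.02
p_controller = 0.001

p_both_pumps = and_gate([p_pump_a, p_pump_b])
p_top = or_gate([p_both_pumps, p_controller])
print(f"P(top event) = {p_top:.6f}")
```

Even in this tiny tree the single controller, not the redundant pump pair, dominates the top event probability, which is the kind of insight a fault tree is meant to provide.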

Reliability Centered Maintenance (RCM), already mentioned, takes the FMEA data on failure modes and, through discussion and calculation, assigns an optimal maintenance strategy to each one. This is the strategy level: not yet a detailed instruction on what to do and how, but a high-level approach to maintenance. Examples are a periodic inspection with a condition check, replacement every x miles, flight hours, or work cycles, or simply “run to failure” with replacement afterwards if the item is not essential. RCM is the most detailed methodology for maintenance planning and is recommended for use in an ILS program. Its outcome is a general list of maintenance tasks, with intervals where applicable. The list can be expanded with detailed maintenance planning: man-hours needed (repair time), maintenance tools, spare parts, and Level of Repair Analysis (LORA), i.e. whether an item can be repaired in the field, in the maintenance depot, or by the vendor only.

An even more detailed approach to maintenance analysis is repair time estimation from a count of low-level repair operations, following MIL-HDBK-472. Low-level, in the context of the handbook, means very basic manual operations, e.g. the time to unscrew each bolt, open an enclosure, close an enclosure, etc. It can be cumbersome, but it is also the most detailed and complete calculation. Repair times are an important input to availability simulations. For selected components this information can sometimes be found in published sources, but its validity for a merely similar product is questionable.
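
The arithmetic behind such an estimate can be sketched as follows. This is written in the spirit of the elemental-task approach, not as a reproduction of any specific MIL-HDBK-472 procedure or of its time tables. All task names, step times, and failure rates are invented, and weighting the repair times by failure rate to obtain a mean time to repair (MTTR) is the common textbook convention assumed here.

```python
# Hypothetical repair tasks: failure rate of the item (failures per hour)
# and the elemental manual steps of its repair (minutes each).
repair_tasks = {
    "fan replacement": {
        "failure_rate": 20e-6,
        "steps_min": {
            "open enclosure": 2.0,
            "unscrew 4 bolts": 4.0,
            "swap fan": 3.0,
            "close enclosure": 2.0,
        },
    },
    "control board replacement": {
        "failure_rate": 5e-6,
        "steps_min": {
            "open enclosure": 2.0,
            "disconnect harness": 5.0,
            "swap board": 10.0,
            "functional test": 8.0,
            "close enclosure": 2.0,
        },
    },
}


def repair_time_min(task):
    """Repair time of one task as the sum of its elemental steps."""
    return sum(task["steps_min"].values())


def mean_time_to_repair_min(tasks):
    """Failure-rate-weighted average repair time over all tasks."""
    total_rate = sum(t["failure_rate"] for t in tasks.values())
    weighted = sum(t["failure_rate"] * repair_time_min(t) for t in tasks.values())
    return weighted / total_rate


for name, task in repair_tasks.items():
    print(f"{name}: {repair_time_min(task):.1f} min")
print(f"MTTR: {mean_time_to_repair_min(repair_tasks):.1f} min")
```

The weighting matters: the frequent but quick fan replacement pulls the MTTR well below the simple average of the two repair times.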

We mentioned that RAMS data must be accessible. This is because it is reused in multiple ILS activities. For example, reliability analysis, which involves statistical calculations, is best done with FMEA data on failure modes and, ideally, with failure rates from similar products already deployed in the field. It is perhaps the most mathematically involved part of ILS: established statistical tools are used to calculate the probability of failure of individual components, subsystems, and finally the whole system. The system-level calculation can also be performed in FTA software, but first the probability of failure of the individual components must be estimated. As mentioned, this can be done from field data or from reliability test results on multiple samples. Such data must be analyzed by trained individuals, with care taken both in preparing the input data and in interpreting the results. One important question, for example, is how the failure rate behaves in the long term. Is it stable over time? Is it increasing with time? Or is it decreasing initially (which points to quality issues)? The analysis also differs between one-time-use products and repairable products. Product reliability estimates are essential in an ILS program.
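
One common way to express the failure rate question is the shape parameter of a Weibull life distribution. The sketch below assumes that a Weibull model fits the data and that the product is non-repairable (time to first failure); repairable systems call for different models. The function names and parameter values are chosen for illustration only.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) of a Weibull life distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def failure_rate_trend(shape):
    """Classify long-term behavior from the Weibull shape parameter."""
    if shape < 1.0:
        return "decreasing (early-life failures)"
    if shape > 1.0:
        return "increasing (wear-out)"
    return "constant (random failures)"


# Invented shape values; scale (characteristic life) fixed at 1000 hours.
for shape in (0.7, 1.0, 2.5):
    rates = [weibull_hazard(t, shape, 1000.0) for t in (100.0, 500.0, 1000.0)]
    print(shape, failure_rate_trend(shape), [f"{r:.2e}" for r in rates])
```

In practice the shape and scale would be fitted to field or test data, including units that have not failed yet, which is where the trained analyst comes in.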

The least desirable (but often unavoidable) sources of failure rates are published databases from specific industries, e.g. OREDA from oil & gas or the US NRC Performance Estimates for nuclear power plants. The issue lies in the uncertainty about how similar the products in the database are to the product analyzed under the ILS program. It is no small assumption that a product designed by someone else, manufactured in another facility, and maintained by a different organization will have a similar failure rate.
The probability of failure, however, is not the entire picture. Availability is equally important. It can be estimated with simulation tools that take:
- the PBS tree
- the failure rates from field data on similar products, or from FTA estimates
- the maintenance time estimates from RCM
From these inputs the RAM model is built.
The outcome of a RAM model simulation is the expected number of failures per unit of time, the total downtime per unit of time (e.g. per year), and availability, i.e. the percentage of mission time (e.g. of the year) during which the system is available for use (not failed and not under maintenance or inspection). These models are very useful. They can be used to estimate the number of spare parts needed, the redundancy required where parts of a system perform poorly, the frequency of inspections, and finally reliability allocation. Reliability allocation determines reliability targets for the components of a system that are achievable and that together meet the reliability target of the system itself.
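
A heavily simplified Monte Carlo sketch of such a simulation is shown below. It models a single repairable item, not a full PBS tree, and assumes exponentially distributed times to failure and to repair. The function name, the parameter values, and the fixed random seed are choices made for this illustration.

```python
import random


def simulate_availability(mtbf_h, mttr_h, mission_h, runs, seed=0):
    """Monte Carlo estimate for one repairable item.

    Returns (mean availability, mean number of failures per mission).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total_downtime = 0.0
    total_failures = 0
    for _ in range(runs):
        t = 0.0
        while True:
            t += rng.expovariate(1.0 / mtbf_h)  # operate until next failure
            if t >= mission_h:
                break
            total_failures += 1
            repair = rng.expovariate(1.0 / mttr_h)
            # Count only the downtime that falls inside the mission.
            total_downtime += min(repair, mission_h - t)
            t += repair
    availability = 1.0 - total_downtime / (runs * mission_h)
    return availability, total_failures / runs


# Invented inputs: MTBF 1000 h, MTTR 10 h, one year of operation.
availability, failures_per_year = simulate_availability(
    mtbf_h=1000.0, mttr_h=10.0, mission_h=8760.0, runs=2000
)
print(f"availability ~ {availability:.4f}")
print(f"failures per year ~ {failures_per_year:.2f}")
print(f"steady-state check: {1000.0 / (1000.0 + 10.0):.4f}")
```

For this simple case the result can be checked against the steady-state formula MTBF / (MTBF + MTTR). The value of simulation shows with redundancy, inspections, and limited spares, where no such closed form exists.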

We should not forget MTBF (mean time between failures) prediction tools. These software tools use technical standards such as MIL-HDBK-217 (the most popular) or Bellcore/Telcordia to populate the BOM (bill of materials) table with failure rate values for individual electronic components and to estimate the MTBF of a product. The calculation takes parameters that describe the usage environment and models their impact on the reliability numbers. The approach can also cover mechanical parts. MIL-HDBK-217 is a popular route to an MTBF estimate when the budget for reliability testing is limited and no field data exists from which a RAM model could be built. In fact, many companies in the defense sector make a MIL-HDBK-217 calculation a standard requirement for their supplier base.
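
The core of a parts-count style prediction is a sum over the BOM. The sketch below assumes a series system (any part failure fails the product) and constant failure rates. The part list, the generic failure rates, and the quality factors are invented placeholders, not values taken from MIL-HDBK-217 or Telcordia tables.

```python
import math

# Hypothetical BOM: (part, quantity, generic failure rate in failures per
# million hours, quality factor). None of these numbers come from a handbook.
bom = [
    ("ceramic capacitor", 40, 0.002, 1.0),
    ("resistor", 60, 0.001, 1.0),
    ("microcontroller", 1, 0.05, 2.0),
    ("connector", 4, 0.01, 1.0),
]


def total_failure_rate_fpmh(parts):
    """Sum of quantity * generic rate * quality factor over all parts."""
    return sum(qty * rate * quality for _, qty, rate, quality in parts)


def mtbf_hours(parts):
    """MTBF of a series system with constant failure rates."""
    return 1e6 / total_failure_rate_fpmh(parts)


print(f"total failure rate: {total_failure_rate_fpmh(bom):.3f} per 10^6 h")
print(f"predicted MTBF: {mtbf_hours(bom):,.0f} h")
```

A prediction tool does essentially this bookkeeping, with the rates and factors looked up from the chosen standard for the stated environment.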

Finally, a systematic approach to field (service) data analysis should be explained. As mentioned, field data is the best source of the reliability estimates used in other types of analysis. The FRACAS (Failure Reporting and Corrective Action System) methodology was developed to address this. It is a process, but in practice it takes the form of a software system in which field failures are reported by field or service technicians. The reports are gathered in a shared database, and the failure data is then processed by reliability and quality engineers. Reliability engineers run statistical failure rate analysis. Quality engineers look at individual failure modes, identify causes, and propose improvements for recurring problems (8D or DMAIC process).
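
The two views of the same FRACAS data can be sketched as follows. The report records, the field names, and the fleet operating hours are invented, and the failure rate estimate assumes a constant failure rate over the observed period.

```python
from collections import Counter

# Hypothetical failure reports as they might be exported from a FRACAS database.
reports = [
    {"unit": "SN001", "failure_mode": "fan seized"},
    {"unit": "SN004", "failure_mode": "connector corrosion"},
    {"unit": "SN007", "failure_mode": "fan seized"},
    {"unit": "SN010", "failure_mode": "firmware lockup"},
    {"unit": "SN013", "failure_mode": "fan seized"},
    {"unit": "SN021", "failure_mode": "connector corrosion"},
]
fleet_operating_hours = 120_000.0  # summed over all units in the field

# Reliability engineer's view: point estimate of failure rate and MTBF.
failure_rate_per_h = len(reports) / fleet_operating_hours
mtbf_h = fleet_operating_hours / len(reports)

# Quality engineer's view: failure modes ranked by number of occurrences.
pareto = Counter(r["failure_mode"] for r in reports).most_common()

print(f"failure rate: {failure_rate_per_h:.2e} per hour, MTBF: {mtbf_h:,.0f} h")
for mode, count in pareto:
    print(f"{count} x {mode}")
```

The ranked list tells the quality engineer where a corrective action pays off first, while the failure rate feeds the RAM model and the reliability analysis described earlier.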
In summary, below is a list (not intended to be exhaustive) of required or recommended RAMS activities that produce the data needed in an ILS program:
- Product Breakdown Structure
- FMEA or FMECA (list of product failure modes)
- FTA (Fault Tree Analysis)
- RCM (Reliability Centered Maintenance)
- Maintainability analysis by MIL-HDBK-472
- Statistical reliability (failure rate) analysis
- RAM modeling and simulation (availability estimation)
- MTBF prediction (MIL-HDBK-217)
- FRACAS – field data analysis (Failure Reporting and Corrective Action System)
These activities are often performed by analysts from different organizational units within a company. Most of them reuse data from the other listed activities. RAMS software tools that enable data sharing in a user-friendly fashion are recommended for all ILS-oriented companies.