AI/ML | FEATURE

VALIDATED AND DEPLOYABLE AI/ML FOR NDT DATA DIAGNOSTICS
BY ERIC LINDGREN

While artificial intelligence/machine learning (AI/ML) methods have shown promise for the analysis of image and signal data, applications that use nondestructive testing (NDT) to manage the safety of systems must meet a high level of quantified capability. Engineering decisions require technique validation with statistical bounds on performance to enable integration into critical analyses, such as life management and risk analysis. The Air Force Research Laboratory (AFRL) has pursued several projects that apply a hybrid approach, integrating AI/ML methods with heuristic and model-based algorithms to assist inspectors in accomplishing complex NDT evaluations. Three such examples are described in this article, including a method that was validated through a probability of detection (POD) study and deployed by the Department of the Air Force (DAF) in 2004 (Lindgren et al. 2005). Key lessons learned include the importance of considering the wide variability present in NDT applications upfront and maintaining a critical role for human inspectors to ensure NDT data quality and address outlier indications.

Introduction

Interest in AI/ML, a set of statistical methods for data analysis, is growing. The promise of AI/ML is to use statistical methods to self-extract attributes of the data, such as relationships and trends, that are not identified as quickly or reliably through typical human observation. The DAF has embraced these tools for applications where they can accelerate decision-making in representative campaigns, as shown in Figure 1. The objective defined for one of these efforts is summarized as: “The Air Force aims to harness and wield the most optimal forms of artificial intelligence to accomplish all mission-sets of the service with greater speed and accuracy” (USAF n.d.).
Figure 1. The Department of the Air Force artificial intelligence/machine learning campaign illustration. US Air Force graphic by Travis Burcham.

Materials Evaluation, July 2023

With the potential to secure more NDT data through the transformation to fully digital instruments connected as envisioned by the Internet of Things (IoT) and NDE 4.0, there is increased interest in using AI/ML methods as the diagnostic tool to determine whether a flaw is present in NDT data. Justification for the use of AI/ML includes improved accuracy, improved reliability, and faster disposition time by decreasing or eliminating dependence on human interpretation and analysis of NDT data. The initial focus for the use of AI/ML is the detection of flaw indications, although the use of AI/ML to provide additional information characterizing the size and location of discontinuities is also being explored.

When considering the applicability of AI/ML for flaw detection, it is important to recall that these technical approaches are based on statistical methods, namely regression or classification of data. The concept includes the use of multiple statistical methods in parallel, combined with multiple layers of analysis, to extract statistical trends in the data and enable decisions that are not readily made through more classical methods. These multidimensional data analysis methods are frequently called neural networks. These approaches can either be trained using data with known ground truths, called supervised AI/ML, or be allowed to form the statistical relationships without labeled training data, called unsupervised AI/ML. As these methods rely on data, critical attributes of the data must be considered for their use. These include the amount of available data, the accuracy of the data, and the noise present in the data.

The intent of this article is to discuss some of the challenges of using AI/ML exclusively for the analysis of NDT data, illustrated by a representative result that considers noise and data quantity. The approach being used by researchers at the AFRL to enhance manual interpretation of NDT data is discussed, and several representative examples that integrate attributes of AI/ML into diagnostic capability are presented. The goal is to highlight the capabilities and opportunities within the NDT community to facilitate and accelerate the analysis of NDT data.

AI/ML Requirements for Engineering Decisions

The detection of flaws using NDT capabilities is an engineering decision that requires a statistical metric of capability to ensure the safety of systems. In aviation, the capability is frequently validated by a POD study that follows the guidance provided in MIL-HDBK-1823A (US DOD 2009). To make these types of assessments possible, it is necessary to have metrics on the data that include such factors as quantity, quality, and fidelity, including relatively simple measures such as the signal-to-noise ratio (SNR). The outcome of a POD study that follows the guidelines of MIL-HDBK-1823A is a set of statistical metrics appropriate for the risk calculations that ensure the safety of systems. In the DAF, this is part of the Aircraft Structural Integrity Program (ASIP) (US DOD 2016) and the Propulsion Systems Integrity Program (PSIP) (US DOD 2008).

The same factors of the data that affect POD studies also affect the use of AI/ML. These factors become more critical as the risk to a system from a flaw missed during an inspection cycle increases.
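As a rough illustration of the kind of statistical metric a POD study yields, the sketch below fits a hit/miss POD curve using a logistic model in log flaw size, a common choice for hit/miss data, and inverts it to estimate the flaw size detected with 90% probability. It is a minimal sketch under stated assumptions, not the procedure of any specific study: the inspection data are simulated, the function names (`pod`, `simulate_inspections`, `fit_hit_miss`, `size_at_pod`) and the flaw-size range are hypothetical, and the confidence bound required for an a90/95 value is omitted.

```python
import math
import random


def pod(a, b0, b1):
    """Logistic POD model in log flaw size: probability of a hit at size a."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(a))))


def simulate_inspections(n, b0, b1, seed=0):
    """Simulated hit/miss data (illustrative only, not real inspection results)."""
    rng = random.Random(seed)
    # Flaw sizes spread uniformly in log space over a hypothetical 0.2 to 5.0 range.
    sizes = [math.exp(rng.uniform(math.log(0.2), math.log(5.0))) for _ in range(n)]
    hits = [1 if rng.random() < pod(a, b0, b1) else 0 for a in sizes]
    return sizes, hits


def fit_hit_miss(sizes, hits, iters=50):
    """Maximum-likelihood estimate of (b0, b1) by Newton-Raphson iteration."""
    x = [math.log(a) for a in sizes]
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, hits):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def size_at_pod(target, b0, b1):
    """Invert the fitted model: flaw size whose estimated POD equals target."""
    logit = math.log(target / (1.0 - target))
    return math.exp((logit - b0) / b1)
```

For example, `fit_hit_miss(*simulate_inspections(400, 0.0, 3.0))` returns fitted coefficients, and `size_at_pod(0.9, b0, b1)` gives the point estimate of the size detected 90% of the time. A risk calculation would also need the statistical bound on that estimate, which this sketch does not compute.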
A detailed understanding of the data being used is therefore important to enable proper use of AI/ML algorithms when extracting information from this data. Recent work has illustrated the impact of data quantity and SNR on the performance of a supervised neural network–based classifier (Lindgren 2022). The study used a synthetic dataset and introduced Gaussian noise at different percent levels while varying the number of data points used to train the AI/ML algorithm. The neural network used for this study was a multilayer perceptron with four layers and 50 nodes in each hidden layer. The results of this evaluation are shown in Figure 2. The plot shows the log of the mean squared error of the neural network as a function of SNR for varying numbers of data points in each dataset. The SNR varies from infinite to a poor value of only 10 to 1. The number of data points in each dataset varies from 50 to 14 000. The outcomes are presented as standard box plots showing the interquartile range (IQR) and whiskers based on 1.5 times the IQR, with outliers indicated in red for each dataset size.

It is clear from this study that higher SNR and larger datasets result in a lower mean squared error. This outcome is intuitive, as more data with higher fidelity is expected to improve model outcomes. However, this example highlights some of the challenges of using AI/ML for NDT data analysis. Even at the highest SNR, training on smaller datasets produces outliers that deviate considerably from the mean values. When considering the impact on safety of systems, these outliers are the equivalent of a large, missed

Figure 2. Multilayer perceptron results illustrating mean squared error as a function of data quantity and signal-to-noise ratio. Plot labels: mean squared error (log scale); signal-to-noise ratio (SNR: inf, 100, 50, 25, 10); dataset size (50, 100, 250, 500, 1k, 2k, 4k, 8k, 14k).
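The structure of the noise and data-quantity study described above (add Gaussian noise at a chosen SNR, vary the training-set size, repeat the fit, and summarize the error with box-plot statistics) can be sketched as follows. This is not the study's code: an ordinary least-squares line fit stands in for the multilayer perceptron to keep the example short, the target function 2x + 1 is arbitrary, SNR is assumed to mean signal RMS divided by noise standard deviation, and all function names are hypothetical.

```python
import math
import random
import statistics


def make_dataset(n, snr, rng):
    """Sample n points of a known line and add Gaussian noise.

    Assumption: SNR = signal RMS / noise standard deviation;
    snr=math.inf means no noise is added.
    """
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    clean = [2.0 * x + 1.0 for x in xs]
    rms = math.sqrt(sum(c * c for c in clean) / n)
    sigma = 0.0 if math.isinf(snr) else rms / snr
    ys = [c + rng.gauss(0.0, sigma) for c in clean]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (stand-in for the MLP)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def truth_mse(slope, intercept):
    """Mean squared error against the noise-free line on a fixed grid."""
    grid = [i / 50.0 - 1.0 for i in range(101)]
    errors = [(slope * x + intercept - (2.0 * x + 1.0)) ** 2 for x in grid]
    return statistics.fmean(errors)


def box_stats(values):
    """Median, quartiles, and points beyond the 1.5 * IQR fences."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return med, (q1, q3), outliers


def sweep(sizes=(10, 100, 1000), snrs=(math.inf, 100, 10), trials=30, seed=0):
    """Repeat the fit for every (SNR, dataset size) pair and summarize the MSE."""
    rng = random.Random(seed)
    results = {}
    for snr in snrs:
        for n in sizes:
            errors = []
            for _ in range(trials):
                xs, ys = make_dataset(n, snr, rng)
                errors.append(truth_mse(*fit_line(xs, ys)))
            results[(snr, n)] = box_stats(errors)
    return results
```

Calling `sweep()` returns, for each (SNR, dataset size) pair, the median error, the quartiles, and any points beyond the 1.5 IQR fences. With a linear stand-in the median error should fall as the dataset grows and the SNR rises; the outlier behavior of a neural network trained on small datasets is not reproduced by this sketch.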











































































































