data, critical attributes of the data must be considered for their use. These include the amount of available data, the accuracy of the data, and the noise present in the data. The intent of this article is to discuss some of the challenges of using AI/ML exclusively for the analysis of NDT data through a representative outcome when considering noise and data quantity. The approach being used by researchers at the AFRL to enhance manual interpretation of NDT data is discussed, and several representative examples that integrate attributes of AI/ML into diagnostic capability are presented. The goal is to highlight the capabilities and opportunities within the NDT community to facilitate and accelerate the analysis of NDT data.

AI/ML Requirements for Engineering Decisions

The detection of flaws using NDT capabilities is an engineering decision that requires a statistical metric of capability to ensure the safety of systems. In aviation, the capability is frequently validated by a POD study that follows the guidance provided in MIL-HDBK-1823A (US DOD 2009). To make these types of assessments possible, it is necessary to have metrics on the data that include such factors as quantity, quality, and fidelity, which includes such relatively simple factors as signal-to-noise ratios (SNRs). The outcome of a POD study that follows the guidelines of MIL-HDBK-1823A is a set of appropriate statistical metrics for risk calculations, ensuring the safety of systems. In the DAF, this is part of the Aircraft Structural Integrity Program (ASIP) (US DOD 2016) and the Propulsion Systems Integrity Program (PSIP) (US DOD 2008). The same data factors that govern POD studies also affect the use of AI/ML, and these factors become more critical as a function of the risk to a system if a flaw is not detected during an inspection cycle.
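To make the POD idea concrete, the following is a minimal sketch of the signal-response (a-hat versus a) regression at the heart of one common POD analysis: a linear fit of measured response against flaw size, with POD taken as the probability that the fitted response exceeds a decision threshold. All data, model coefficients, and the threshold below are simulated assumptions for illustration; the full MIL-HDBK-1823A procedure (censored responses, confidence bounds, model checks) is deliberately omitted.

```python
# Illustrative a-hat vs a POD fit (simulated data; not the handbook's full method)
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Simulate 40 characterized flaws with a linear signal response plus scatter
a = rng.uniform(0.5, 5.0, size=40)             # flaw size (hypothetical units)
beta0, beta1, tau = 0.2, 1.0, 0.3              # assumed true response model
a_hat = beta0 + beta1 * a + rng.normal(0.0, tau, size=40)

# Ordinary least squares fit of a_hat on a
X = np.column_stack([np.ones_like(a), a])
coef, *_ = np.linalg.lstsq(X, a_hat, rcond=None)
resid = a_hat - X @ coef
sigma = resid.std(ddof=2)                      # residual scatter estimate

ahat_dec = 1.5                                 # assumed decision threshold on a_hat

def pod(size):
    """POD(a): probability the fitted response exceeds the decision threshold."""
    return norm.cdf((coef[0] + coef[1] * size - ahat_dec) / sigma)

print(pod(0.5), pod(3.0))                      # POD rises with flaw size
```

The 40 simulated flaws match the handbook's stated minimum for signal-response assessments; in practice each flaw must be an independently characterized physical specimen, which is exactly the cost driver discussed below.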
Therefore, a detailed understanding of the data being used is important to enable proper use of AI/ML algorithms when using them to extract information from this data. Recent work has illustrated the impact of data quantity and SNR on the performance of a supervised neural network–based classifier (Lindgren 2022). The study used a synthetic dataset and introduced Gaussian noise at different percent levels while varying the number of data points used to train the AI/ML algorithm. The neural network used for this study was a multilayer perceptron with four hidden layers and 50 nodes in each hidden layer. The results of this evaluation are shown in Figure 2.

Figure 2. Multilayer perceptron results illustrating mean square error as a function of data quantity and signal-to-noise ratio. (Vertical axis: mean squared error, log scale.)

FEATURE | AI/ML
Materials Evaluation • July 2023

The plot illustrates the log of the mean squared error of the neural network as a function of SNR for varying numbers of data points in each dataset. The SNR varies from an infinite value down to a poor value of only 10 to 1. The number of data points in each dataset varies from 50 up to 14 000. The outcomes are presented in standard box plots showing the interquartile range (IQR), with whiskers based on the 1.5 IQR value and outliers indicated by red markers for each dataset size. It is clear from this study that improved SNR and larger datasets result in lower values of the mean squared error. This outcome is intuitively anticipated, as more data with higher fidelity is expected to produce improved model outcomes. However, this example highlights some of the challenges of using AI/ML for NDT data analysis. Even with the highest level of SNR, using smaller datasets for training will produce outliers that deviate considerably from the mean values. When considering the impact on the safety of systems, these outliers are the equivalent of a large, missed
flaw that could lead to an increased risk of a catastrophic outcome. It is important to recall that it is not the smallest flaw that can be detected, but the largest flaw that could be missed, that impacts the safety of a system. This is especially true in aviation, where single-load-path structures are expected to have an extraordinarily low risk of failure when risk is managed by damage tolerance (US DOD 2016).

This data sensitivity study demonstrates two critical issues that need to be considered when applying AI/ML algorithms to NDT data. The first is the number of data points required to achieve improved performance of AI/ML methods. Large training sets of actual flaws are hard to generate due to the time and cost of preparing such samples. A common complaint about POD studies that follow the guidance of MIL-HDBK-1823A is the high cost of preparing samples with characterized flaws. The minimum number of flaws for a versus a-hat assessments (i.e., flaw size versus magnitude of the signal response from the measurement system) is 40, and for hit/miss assessments it is 60. Large datasets of flaw responses in NDT data from service are difficult to find, since the engineering response to the detection of a growing number of flaws is either to modify or replace the structural element of concern before a large population of flaws is present. One option that has been pursued is the use of simulation to generate the required datasets for training. However, the challenge is to create simulations that are representative of the flaws found in actual structures. This approach would require a validation process with a substantial amount of empirical data covering the wide range of test conditions expected from an engineering perspective.

The second issue is the ability to address outliers and nuances in data that can be indicators of flaws. The concern is the tendency of statistical methods to ignore such features when using large datasets.
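A sensitivity study of the kind described above can be sketched as follows. The sine-wave target, the treatment of SNR as a power ratio, the use of scikit-learn, and the reduced sweep of dataset sizes and SNR levels are all illustrative assumptions, not the original study's code; only the network shape (four hidden layers of 50 nodes) follows the description in the text.

```python
# Hypothetical reconstruction of the data-quantity / SNR sensitivity study
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_dataset(n_points, snr, rng):
    """Synthetic 1D signal with additive Gaussian noise at a given SNR
    (SNR interpreted here as a power ratio, an assumption)."""
    x = rng.uniform(0.0, 1.0, size=(n_points, 1))
    y_clean = np.sin(2.0 * np.pi * x[:, 0])       # noiseless target
    if np.isinf(snr):
        return x, y_clean
    noise_std = y_clean.std() / np.sqrt(snr)
    return x, y_clean + rng.normal(0.0, noise_std, size=n_points)

results = {}
for n in (50, 500):                               # subset of the 50..14 000 sweep
    for snr in (np.inf, 10.0):                    # subset of the inf..10:1 sweep
        x, y = make_dataset(n, snr, rng)
        # Four hidden layers of 50 nodes each, as described for the study's MLP
        model = MLPRegressor(hidden_layer_sizes=(50, 50, 50, 50),
                             max_iter=2000, random_state=0)
        model.fit(x, y)
        results[(n, snr)] = mean_squared_error(y, model.predict(x))

for key, mse in results.items():
    print(key, mse)
```

Repeating each cell of the sweep with many random seeds and plotting the spread of errors is what produces the box plots of Figure 2; even this reduced sketch shows how the error statistics depend jointly on dataset size and noise level.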
Unless the attributes of the outlier and nuanced changes in data are included in sufficiently large quantities in training, the approach will tend to dismiss such features in the data, which could result in missed flaws. Conversely, if the AI/ML is sensitive to outliers, the concern becomes that a large number of false calls could decrease the value of implementing the AI/ML algorithm.

Thus, the lessons learned from the analysis of representative data include the need to have the right data for training, including multiple flaws that are independent of each other. It is extremely important to recall that resampling the same data is not acceptable unless proper statistical methods to address correlated data are included in the analysis. Similarly, it is not acceptable to test AI/ML methods using the same data that was used for training. Another aspect is to ensure that factors that can affect the statistical analysis of the data (such as SNR) are included in the training datasets. In addition, if simulation data is used in training, it must come from validated models that capture all the anticipated variances found in the NDT data for the inspection. Lastly, the desired precision and accuracy of the diagnostics to be performed by AI/ML must be defined to ensure the amount of available data is sufficient to meet these objectives. This last consideration is especially true if unsupervised methods are being considered.

Challenges for AI/ML in NDT

As indicated by the sensitivity studies in the previous section, a significant challenge for the use of AI/ML with NDT data is to capture the effect of all the factors that can influence the capability to detect the flaws of interest. Figure 3 is a representation of these factors that the author has used extensively to illustrate the additional challenges when migrating from a laboratory to an operational environment.
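The data-hygiene lessons above, that resampled measurements are correlated and that test data must be independent of training data, can be enforced mechanically with a group-aware split: all repeated measurements of the same physical flaw stay on one side of the split. The flaw counts, feature dimensions, and use of scikit-learn below are illustrative assumptions.

```python
# Group-aware train/test split: no flaw contributes measurements to both sets
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(2)

n_flaws, repeats = 30, 5
flaw_id = np.repeat(np.arange(n_flaws), repeats)   # group label per measurement
signal = rng.normal(size=(n_flaws * repeats, 4))   # stand-in NDT features

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(signal, groups=flaw_id))

# Verify that no flaw appears on both sides of the split
overlap = set(flaw_id[train_idx]) & set(flaw_id[test_idx])
print(len(overlap))   # prints 0
```

A plain random split over measurements would almost certainly leak copies of training flaws into the test set, inflating apparent performance in exactly the way the text warns against.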
The three general classes of challenges can be summarized as equipment variability, structural complexity and variability, and flaw complexity and variability. In addition, these parameters can change as a function of the life of a system, which increases the difficulty of validating the capability of the NDT system when it is integrated into system life management.

Equipment variability is the easiest of the three sources of variability to address from a research and development perspective. The variability in equipment settings can be defined and managed, but the unknown that frequently needs to be quantified is sensor variability and its impact on the diagnostics of flaws. Common NDT procedures address this with calibration processes, which alleviate many of these

Figure 3. Representative increase in challenges when migrating from a laboratory environment (left) to an operational environment (right).
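The calibration processes mentioned above can be sketched in their simplest amplitude-normalization form: each probe's measurements are divided by that probe's response to a known reference reflector, so probe-to-probe gain differences cancel. The gain values and amplitudes below are simulated assumptions, not a real procedure.

```python
# Sketch of amplitude calibration against a reference reflector (simulated)
import numpy as np

true_amplitude = np.array([0.2, 0.5, 1.0])          # flaw responses, ideal units
for probe_gain in (0.8, 1.0, 1.3):                  # probe-to-probe variability
    reference = probe_gain * 1.0                    # response to the reference reflector
    measured = probe_gain * true_amplitude          # what this probe actually records
    calibrated = measured / reference               # gain cancels out
    print(np.allclose(calibrated, true_amplitude))  # prints True for every probe
```

Real calibration must also handle drift, nonlinearity, and coupling variation, which is why sensor variability still needs to be quantified rather than assumed away.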