long-term dependencies in weld sequences, convolutional long short-term memory (ConvLSTM) architectures (Shi et al. 2015) were explored. All investigations were conducted with TensorFlow (Abadi et al. 2015) and Keras (Chollet et al. 2015) for Python. Because ConvLSTMs process sequences sequentially and the sequence items are relatively small (resized A-scans; see the Model Training and Performance Evaluation section), this task is far less amenable to parallelization, and preliminary work consequently found that CPUs were faster than GPUs for both training and inference. Sequence processing using a ConvLSTM differs in this respect from, for example, a pure CNN or transformer architecture, which is highly parallelizable and benefits greatly from GPU computation. Thus, all computations were performed on an Intel® Core™ i7 CPU.

FEASIBILITY STUDY

The feasibility of a ConvLSTM-based architecture was investigated with input M-scans (i.e., arbitrary-length sequences of A-scans) resized vertically to 128 pixels. This investigation was designed to estimate the upper limit on the number of filters per layer and the number of layers, given the production-environment processing budget of 1 ms per A-scan, which includes input preprocessing and potential communications overhead. Preliminary tests determined that the production environment ran inference approximately 35–45% faster than the development environment due to, for example, the removal of training overhead in the exported network graph, differences in TensorFlow compilation, differences in programming language, and so on. Accounting for this speedup, along with overhead from preprocessing and so forth, a cutoff of 1.1 ms per A-scan was imposed in the development environment.
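The sequential constraint described above can be illustrated with a minimal NumPy sketch of one 1D ConvLSTM layer's per-time-step update: each A-scan's states depend on the previous step's states, so A-scans must be processed in order. The kernel size, filter count, and random initialization below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_same(x, w):
    """'Same' 1D convolution: x is (length, in_ch), w is (k, in_ch, out_ch)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
                     for i in range(x.shape[0])])

def convlstm_step(x, h, c, Wx, Wh, b):
    """One ConvLSTM time step: the gates are convolutions over the input
    and the previous hidden state (the H and C states of a standard LSTM)."""
    z = conv1d_same(x, Wx) + conv1d_same(h, Wh) + b
    i, f, g, o = np.split(z, 4, axis=-1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

# One layer with 8 filters over a toy M-scan of five 128-pixel A-scans.
rng = np.random.default_rng(0)
height, in_ch, filters, k = 128, 1, 8, 3
Wx = rng.normal(0, 0.1, (k, in_ch, 4 * filters))
Wh = rng.normal(0, 0.1, (k, filters, 4 * filters))
b = np.zeros(4 * filters)
h = np.zeros((height, filters))  # states start as zeros
c = np.zeros((height, filters))
for _ in range(5):  # states must be updated one A-scan at a time
    x = rng.normal(size=(height, in_ch))
    h, c = convlstm_step(x, h, c, Wx, Wh, b)
print(h.shape)  # (128, 8)
```

Because the loop over A-scans cannot be unrolled across time without the previous states, per-step latency rather than throughput dominates, which is why CPU execution was competitive here.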
An overarching architecture (Figure 4) was designed with one ConvLSTM module and one max pooling operation per layer; variants were tested having 1–5 layers and an initial layer with 8–32 filters, with the number of filters doubling per layer. Following a flattening, the last layer was a time-distributed (i.e., shared across all time steps) fully connected layer with five outputs. To guard against extra computational overhead from initial resource allocation, one M-scan was fully processed prior to recording inference times. Subsequently, 10 randomly selected, arbitrary-length M-scans, comprising a total of 2589 A-scans, were processed, during which inference times were recorded. Because M-scan length has no impact on mean inference time per A-scan, though it may subtly affect the variance of inference times, the selected M-scans were held constant through all trials so that exactly the same A-scans were processed in each trial. The largest feasible model was used for further training and evaluation.

MODEL TRAINING AND PERFORMANCE EVALUATION

At both training and testing time, M-scan images were cropped vertically to focus tightly on the welded stackup, resized to a height of 128 pixels, and cropped horizontally from the current-on timing to the end of the weld process. Data augmentation was conducted only at training time and was designed to desensitize the model to various situations that could occur in a production system (e.g., electromagnetic interference, slight misreporting of current-on timing, gain and contrast variance, shift in A-scan gating, etc.). Thus, augmentation involved typical image augmentation steps such as random vertical shifts of both the top and bottom image cropping positions prior to resizing vertically to 128 pixels, random horizontal shifts of the current-on (image left edge) position, addition of artificial noise, and random contrast adjustments.
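The augmentation steps above can be sketched as follows with NumPy. All parameter ranges (shift magnitudes, noise level, contrast bounds) and the nearest-neighbor resize are illustrative assumptions; the study does not specify these values.

```python
import numpy as np

def augment(m_scan, rng, max_shift=6, noise_sd=0.02, contrast_range=(0.8, 1.2)):
    """Training-time augmentation of one M-scan (height x width), valued
    in [0, 1]. Parameter values here are assumptions for illustration."""
    h, w = m_scan.shape
    # Random vertical shifts of top/bottom crop positions, then resize to 128
    top = int(rng.integers(0, max_shift + 1))
    bottom = h - int(rng.integers(0, max_shift + 1))
    cropped = m_scan[top:bottom]
    rows = np.round(np.linspace(0, cropped.shape[0] - 1, 128)).astype(int)
    out = cropped[rows]  # nearest-neighbor vertical resize to 128 pixels
    # Random horizontal shift of the current-on (image left edge) position
    left = int(rng.integers(0, max_shift + 1))
    out = out[:, left:]
    # Artificial noise and a random contrast adjustment
    out = out + rng.normal(0, noise_sd, out.shape)
    out = np.clip(out * rng.uniform(*contrast_range), 0.0, 1.0)
    return out

aug = augment(np.full((160, 300), 0.5), np.random.default_rng(0))
print(aug.shape[0])  # 128
```

In practice such augmentations would be applied on the fly per epoch so the model never sees the identical distorted sample twice.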
In addition, random horizontal resizing of M-scans to uniformly distributed, randomly selected widths of 75–400 pixels was conducted to desensitize the model to the weld timing distribution of the training data, with the aim of producing a more robust model that can correctly interpret data from welds whose weld times differ vastly from those typically observed in the training data. Key event timings and MNS curves were adjusted according to any augmentations performed. Due to the random horizontal resizing, inputs and targets were zero-padded after the end of the sequence.

Figure 4. An unrolled schematic diagram of the ConvLSTM architecture used in this study. Data flow over the depth of the network is from bottom to top; data flow over weld time is from left to right. Input A-scans (x_t, t = 1…n) are fed to the network. ConvLSTM layer (1…k) states (denoted s, composed of the C and H states observed in standard LSTMs) are initially zeros and are modified over time given previous states and new inputs from the previous layer. A max pooling operation follows each ConvLSTM. Outputs of the last ConvLSTM layer are fed into the time-distributed (i.e., shared across all time steps) fully connected decision-making layer (denoted FC in the figure). Input and output dimensionalities are depicted.

JULY 2023 • MATERIALS EVALUATION 65

Three models were trained using Monte Carlo validation and evaluated on a held-out testing dataset. Of the 18 223 labeled M-scan samples, 16 400 were used for training, 1640 for validation, and 1823 for testing. Each model was trained using the Adam optimizer (Kingma and Ba 2015) for 400 epochs with
a batch size of 32, with early stopping and learning rate reduction based on validation loss. Binary cross-entropy loss was used for the event outputs, while mean-squared error was used for the MNS regression output. Inputs and loss were masked such that models would skip the zero-vector A-scans during training and not backpropagate loss on those inputs; thus, the model did not learn from the zero-padded regions.

To evaluate performance, a number of different performance indicators were used for each task. With respect to event detection, performance was evaluated using sensitivity as a function of the absolute error between ground-truth and predicted event timings from 0–30 ms, overall specificity, and histograms of timing error for true positives. MNS regression performance was assessed using the percentage of A-scans that were correct within an absolute difference of 0.1.

Results

The results of the feasibility study and performance evaluation are discussed next.

Feasibility Study

The feasibility study results (Table 1) demonstrated that architectures starting with eight filters in the first layer were feasible up to three layers (Figure 5a). The three-layer architecture with eight filters in the first layer had an inference time of 1.06 ms (SD = 0.13 ms), whereas the four-layer architecture had an inference time of 1.22 ms (SD = 0.14 ms).
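The loss masking and the two evaluation criteria described above can be sketched as follows. The function names and the event-matching convention (a `None` prediction meaning "not detected") are illustrative assumptions, not the study's exact definitions.

```python
import numpy as np

def masked_bce(y_true, y_pred, mask, eps=1e-7):
    """Binary cross-entropy averaged only over unmasked time steps,
    so zero-padded A-scans contribute no loss."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    ll = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float((ll * mask).sum() / mask.sum())

def event_sensitivity(true_ms, pred_ms, tol_ms):
    """Fraction of ground-truth event timings matched by a prediction
    within tol_ms absolute error (None = event not detected)."""
    hits = sum(1 for t, p in zip(true_ms, pred_ms)
               if p is not None and abs(t - p) <= tol_ms)
    return hits / len(true_ms)

def mns_accuracy(y_true, y_pred, tol=0.1):
    """Percentage of A-scans whose predicted MNS value lies within an
    absolute difference of tol from ground truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(100.0 * np.mean(np.abs(y_true - y_pred) <= tol))

print(event_sensitivity([100, 200, 300], [105, None, 290], tol_ms=10))
# 2 of 3 events fall within tolerance
```

Sweeping `tol_ms` over 0–30 ms yields the sensitivity-versus-timing-error curve described in the text.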
With 16 filters (Figure 5b),

Table 1. Summary of feasibility study results

  Architecture (filters per ConvLSTM layer)   Parameters   Inference time (ms)
  8                                                 3461        0.866 (0.193)
  8-16                                              8133        1.024 (0.155)
  8-16-32                                         26 693        1.060 (0.129)
  8-16-32-64                                     100 677        1.221 (0.142)
  8-16-32-64-128                                 396 101        1.428 (0.155)
  16                                                8453        0.952 (0.304)
  16-32                                           27 013        0.954 (0.160)
  16-32-64                                       100 997        1.085 (0.151)
  16-32-64-128                                   396 421        1.344 (0.132)
  16-32-64-128-256                             1 577 093        1.992 (0.154)
  32                                              23 045        0.999 (0.268)
  32-64                                           97 029        0.970 (0.108)
  32-64-128                                      392 453        1.264 (0.126)
  32-64-128-256                                1 573 125        1.942 (0.144)
  32-64-128-256-512                            6 293 765        4.271 (0.189)

Note: Mean inference time per A-scan over 2589 A-scans; standard deviation in parentheses.

Figure 5. Inference time per A-scan in milliseconds (orange line) and parameter count (blue dashed line) per architecture (x-axis: 1–5 layers), with first-layer number of filters: (a) 8; (b) 16; and (c) 32. Inference time generally grows superlinearly with respect to both number of layers and number of parameters. Means across all 2589 A-scans are plotted, with 1 standard deviation of the mean shown with error bars.
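The selection of the largest feasible model can be reproduced from the Table 1 data with a short script. Ranking by parameter count under the 1.1 ms cutoff is one plausible reading of "largest"; the study's final choice may weigh additional factors.

```python
# Table 1 data: (architecture, parameters, mean inference time in ms)
RESULTS = [
    ("8", 3461, 0.866), ("8-16", 8133, 1.024), ("8-16-32", 26693, 1.060),
    ("8-16-32-64", 100677, 1.221), ("8-16-32-64-128", 396101, 1.428),
    ("16", 8453, 0.952), ("16-32", 27013, 0.954), ("16-32-64", 100997, 1.085),
    ("16-32-64-128", 396421, 1.344), ("16-32-64-128-256", 1577093, 1.992),
    ("32", 23045, 0.999), ("32-64", 97029, 0.970), ("32-64-128", 392453, 1.264),
    ("32-64-128-256", 1573125, 1.942), ("32-64-128-256-512", 6293765, 4.271),
]

CUTOFF_MS = 1.1  # development-environment budget per A-scan

# Keep architectures meeting the cutoff, then take the largest by parameters
feasible = [r for r in RESULTS if r[2] <= CUTOFF_MS]
largest = max(feasible, key=lambda r: r[1])
print(largest[0])  # 16-32-64
```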