the results. Simulation 2 (Section 4.3) is a large Monte Carlo
study that compares how the distribution of critical values can
change as the model complexity increases. It illustrates how
using an overly simple model may lead to biased estimates
of $a_{90}$ and $a_{90/95}$. Code for simulating the data, fitting the
models, and estimating POD is available at https://github.com/christieknott/Multivariate-POD. The provided code uses the
R programming language. Appendix B provides some important
information about how to interpret the variance-related
outputs from R and SAS software.
4.1. POD Experimental Study: The Inspiration for This Work
The US Air Force Research Laboratory performed an in-house
study of multilayer metal plates with bolt holes, using bolt-hole
eddy current (BHEC) as the nondestructive inspection method.
Four-layer metal plates were stacked and clamped together,
aligning the six bolt holes in the plates. The metal plates were
made of three different materials. The specimens contained both
fatigue cracks and notches propagating from the bolt holes, and
these defects had a range of realistic sizes. The fatigue cracks
were either corner cracks or mid-bore cracks [16].
Analysis of variance (ANOVA) tests revealed that crack/notch size ($x$) was not the only variable related to the eddy current response. Material type ($m_A$, $m_B$, and $m_C$) was a significant factor. Additionally, the response changed when Material A was in the layer below the actively scanned layer ($b_A$). Cracks ($c$) and notches ($d$) also showed different responses, and there were interactions between some of the variables [16].
Varying these significant variables produced eight different combinations. Within the study, three types of models were fit:
1. One “collapsed” model, which ignored all the variables
except defect size.
2. Eight “by case” models, each run on a subset of the data.
3. A “multiple” linear model that included all the significant
variables.
Using Akaike information criterion (AIC) and Bayesian
information criterion (BIC), the “multiple” linear model [22]
was the best fit to the data, and its form was:

$$
\begin{aligned}
\hat{y} = {} & 0.6254 + 7.3595\,x + 1.9627\,m_B + 1.3290\,m_C + 1.4834\,d + 0.9609\,b_A + 0.8584\,x\,m_B \\
& + 3.5873\,x\,m_C - 3.5155\,m_B\,d - 0.9717\,m_C\,d - 1.1020\,d\,b_A + 2.7590\,x\,m_A\,d \\
& + 14.7659\,x\,m_B\,d - 0.4538\,x\,m_C\,d - 0.6907\,x\,c\,b_A - 0.6247\,x\,d\,b_A.
\end{aligned}
$$
The methods described in Section 3.1 were used to estimate POD curves, and the results differed substantially across the three modeling types. The hypothesis that a more accurate linear model would yield a more accurate POD curve could not be tested with the experimental data, because the true POD curve is unknown [16]. In a simulation study, however, the true POD can be determined from prior knowledge (i.e., knowing the input function) and from Monte Carlo sampling. Simulations 1 and 2 were conducted for this reason.
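A minimal sketch of why the true POD is available in a simulation: when the mean response function and the noise level are known, POD at size $x$ is simply the probability that the response exceeds the decision threshold. It can be computed analytically from the normal CDF or approximated by counting Monte Carlo draws. For illustration we borrow Material A from the simulation model in Section 4.2 (mean $10x^2 + 2$, $\sigma = 1$, $y_{dec} = 3$); the function names and the number of draws are our own choices, not part of the authors' code.

```python
import numpy as np
from statistics import NormalDist


def true_pod(x, mean_fn, sigma, y_dec):
    """Analytic POD for a known input function:
    P(mean_fn(x) + eps > y_dec) with eps ~ Normal(0, sigma^2)."""
    return 1.0 - NormalDist(mean_fn(x), sigma).cdf(y_dec)


def monte_carlo_pod(x, mean_fn, sigma, y_dec, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the same probability: simulate many
    responses at size x and return the fraction above y_dec."""
    rng = np.random.default_rng(seed)
    responses = mean_fn(x) + rng.normal(0.0, sigma, size=n_draws)
    return float(np.mean(responses > y_dec))


def mean_material_a(x):
    """Material A mean response from the Section 4.2 simulation model."""
    return 10.0 * x**2 + 2.0


for size in (0.3, 0.5, 0.7):
    exact = true_pod(size, mean_material_a, sigma=1.0, y_dec=3.0)
    approx = monte_carlo_pod(size, mean_material_a, sigma=1.0, y_dec=3.0)
    print(f"x={size:.1f}  analytic POD={exact:.4f}  Monte Carlo POD={approx:.4f}")
```

The two columns should agree to within Monte Carlo error, which is what allows an estimated POD curve to be compared against a known truth.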
4.2. Simulation 1: Simple Example
A simulated dataset was created to represent an NDE system
whose response depends on both the material being inspected
(A or B) and the area of a discontinuity. This simulated case is
intended to loosely represent eddy current measurements cor-
related with the cross-sectional area of quarter-penny (corner)
discontinuities at edges, where varying responses are observed
for materials with different conductivity. The formulas for the
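simulation model are given in Equation 42, where $x$ is the continuous length of the discontinuity, $x \in (0.2, 1)$, of which there are 50 values, and $\boldsymbol{\varepsilon}$ is the random noise. A plot of the data is given in Figure 1. The decision threshold was set at $y_{dec} = 3$. The variables $\mathbf{0}_{50\times1}$ and $\mathbf{1}_{50\times1}$ are column vectors of zeros and ones, respectively, each of length 50.

$$
\begin{aligned}
\mathbf{y}_A &= 10\,\mathbf{x}^2 + 2\cdot\mathbf{1}_{50\times1} + \boldsymbol{\varepsilon} \\
\mathbf{y}_B &= 20\,\mathbf{x}^2 + \mathbf{1}_{50\times1} + \boldsymbol{\varepsilon} \\
\boldsymbol{\varepsilon} &\sim \mathrm{Normal}\left(\boldsymbol{\mu} = \mathbf{0}_{50\times1},\ \sigma^2\mathbf{1}_{50\times1} = \mathbf{1}_{50\times1}\right)
\end{aligned}
\tag{42}
$$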
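One way to draw a single instance of Equation 42 and to compute the true $a_{90}$ for each material is sketched below. The source does not say how the 50 sizes were generated, so drawing them uniformly on (0.2, 1), sharing them between the two materials, and using independent noise for each material are assumptions; `simulate_eq42` and `true_a` are hypothetical helper names. Because the true POD is $\Phi((\mu(x) - y_{dec})/\sigma)$ with $\mu(x) = b_2x^2 + b_0$, solving $\mu(x) - y_{dec} = z_p\sigma$ gives the true size at POD $p$ in closed form.

```python
import math
import numpy as np
from statistics import NormalDist

Y_DEC = 3.0  # decision threshold y_dec from Section 4.2
SIGMA = 1.0  # noise standard deviation (sigma^2 = 1 in Equation 42)

# (constant term, coefficient on x^2) for each material in Equation 42
TRUE_COEFS = {"A": (2.0, 10.0), "B": (1.0, 20.0)}


def simulate_eq42(n=50, seed=None):
    """Draw one random instance of Equation 42.

    Assumptions (not stated in the source): sizes are uniform on (0.2, 1),
    the same sizes are used for both materials, and each material gets
    its own independent Normal(0, SIGMA^2) noise vector.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.2, 1.0, size=n)
    responses = {}
    for material, (const, coef) in TRUE_COEFS.items():
        noise = rng.normal(0.0, SIGMA, size=n)
        responses[material] = coef * x**2 + const * np.ones(n) + noise
    return x, responses


def true_a(p, material):
    """True size at which POD equals p for the given material.

    Solves coef * x^2 + const - Y_DEC = z_p * SIGMA for x > 0.
    """
    const, coef = TRUE_COEFS[material]
    z_p = NormalDist().inv_cdf(p)
    return math.sqrt((Y_DEC - const + z_p * SIGMA) / coef)


x, y = simulate_eq42(seed=2025)
print({m: round(true_a(0.90, m), 4) for m in TRUE_COEFS})
```

With these settings the true $a_{90}$ is about 0.478 for Material A and 0.405 for Material B, which gives a fixed benchmark for the estimated $a_{90}$ values from each modeling approach.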
Categorical variables must be encoded with a coding scheme before they can be included in a model. Equation 21 gives the material coding schemes for the data in Figure 1. Because variables other than discontinuity size affect the signal response, the simple linear model from Equation 2 is probably insufficient. Three approaches are considered for these data:
• By case: Divide the data by material and fit two separate models (A only or B only), with 50 observations each.
• Collapsed: Ignore the effect of material (Combo), with 100 observations.
• Multiple: Extend the linear model to include discontinuity size and material as variables (Both), with 100 observations.
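A minimal sketch of the three approaches for the model in $x$, using ordinary least squares, which coincides with the maximum likelihood estimate of the mean coefficients for a normal linear model. The material coding $m = 0$ for A and $m = 1$ for B is a hypothetical choice for this example (the article's own coding is given in Equation 21), the multiple model includes the $m\,x$ interaction, and the function names are illustrative.

```python
import numpy as np


def fit_mle(X, y):
    """Least-squares fit of y on the columns of X.

    For a normal linear model this gives the MLE of the mean coefficients;
    the returned variance is the MLE residual variance RSS / n.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid) / len(y)


def fit_three_approaches(x, y_a, y_b):
    """Fit the by-case, collapsed, and multiple approaches in terms of x.

    Hypothetical coding: m = 0 for Material A, m = 1 for Material B.
    """
    ones = np.ones_like(x)
    by_case = {  # two separate models, 50 observations each
        "A": fit_mle(np.column_stack([ones, x]), y_a),
        "B": fit_mle(np.column_stack([ones, x]), y_b),
    }
    x_all = np.concatenate([x, x])
    y_all = np.concatenate([y_a, y_b])
    m = np.concatenate([np.zeros_like(x), np.ones_like(x)])
    ones_all = np.ones_like(x_all)
    # Collapsed: material ignored, 100 observations
    collapsed = fit_mle(np.column_stack([ones_all, x_all]), y_all)
    # Multiple: size, material, and their interaction, 100 observations
    multiple = fit_mle(np.column_stack([ones_all, x_all, m, m * x_all]), y_all)
    return by_case, collapsed, multiple


# Usage with one draw from the Section 4.2 model (uniform sizes are an assumption)
rng = np.random.default_rng(1)
x = rng.uniform(0.2, 1.0, 50)
y_a = 10 * x**2 + 2 + rng.normal(0.0, 1.0, 50)
y_b = 20 * x**2 + 1 + rng.normal(0.0, 1.0, 50)
by_case, collapsed, multiple = fit_three_approaches(x, y_a, y_b)
for label, (_, var) in [("A only", by_case["A"]), ("B only", by_case["B"]),
                        ("Combo", collapsed), ("Both", multiple)]:
    print(f"{label:7s} residual variance = {var:.4f}")
```

Adding $x^2$ columns (and their products with $m$) extends the same pattern to the quadratic models.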
ME | POD MODELING
Figure 1. Plot of the simulated data for each material (signal response $y$ versus $x$; Material A and Material B).
MATERIALS EVALUATION AUGUST 2025

Within each of these approaches, models with respect to discontinuity size ($x$), discontinuity area ($x^2$), and discontinuity volume ($x^3$) are considered. When models include higher-order terms (e.g., squared values) or interactions (e.g., between material and discontinuity size), we recommend centering the continuous predictor to reduce the potential for multicollinearity [17, 23]. When predictor variables are highly correlated (such as $x$ and $x^2$), the variances of the parameter estimates may be inflated, which can affect variance estimation for the POD curve. Although more complex methods exist to reduce multicollinearity, simply centering the variables often resolves the issue and requires no structural changes to the $a_{90/95}$ estimation methods in Section 3. The variance inflation factor (VIF) can be used to determine whether centering is needed [17, 23].
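A small sketch of the VIF check for a model containing $x$ and $x^2$. It uses evenly spaced illustrative sizes (an assumption, not the simulated data) and the two-predictor shortcut $\mathrm{VIF} = 1/(1 - r^2)$; models with more terms, such as those with material interactions, need the general regression-based VIF.

```python
import numpy as np


def vif_two_predictors(u, v):
    """VIF when the model has exactly two predictors (plus an intercept).

    Regressing one predictor on the other gives R^2 = r^2, so
    VIF = 1 / (1 - r^2) and is the same for both predictors.
    """
    r = np.corrcoef(u, v)[0, 1]
    return 1.0 / (1.0 - r**2)


x = np.linspace(0.2, 1.0, 50)  # illustrative, evenly spaced sizes on [0.2, 1]
xc = x - x.mean()              # centered size

vif_raw = vif_two_predictors(x, x**2)
vif_centered = vif_two_predictors(xc, xc**2)
print(f"VIF, x and x^2: {vif_raw:.1f}")
print(f"VIF, centered x and its square: {vif_centered:.3f}")
```

Before centering, the VIF is well above commonly quoted thresholds such as 10; after centering it is essentially 1, because these evenly spaced sizes are symmetric about their mean. With unevenly spread sizes, centering usually reduces the correlation between $x$ and $x^2$ without removing it entirely.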
Simulation 1 considers a single random instance from Equation 42, and Simulation 2 considers 10 000 random instances. Figure 2 shows the fitted linear models for all approaches from Simulation 1 plotted against the data, with the top graph showing Material A and the bottom showing Material B. The combined models ($y = x$ and $y = x^2 + x$) are the same in both graphs because they do not account for material. The by-case models ($y = x$ for Material A, $y = x$ for Material B, $y = x^2 + x$ for Material A, and $y = x^2 + x$ for Material B) and those that include material ($y = x + m$ and $y = x^2 + x + m + mx + mx^2$) differ between Material A and Material B. Models that include $x^2$ naturally show more curvature, and the models fit separately to A and B show a steeper increase for Material B than for Material A.
For Material A, the line $y = x$ is identical to the $y = x + m$ line, and similarly for Material B. The line $y = x^2 + x$ for Material A is identical to the $y = x^2 + x + m + mx + mx^2$ line, and similarly for Material B. Because these pairs of models have the same mean, their lines coincide in Figure 2, so one line of each pair may appear to be missing. However, the variances differ between these models, so they produce different POD curves (see Figure 3).
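The coincidence of the fitted means can be seen by writing out the model with material for each value of the indicator. This sketch assumes a 0/1 coding with $m = 1$ for Material B (a hypothetical choice; the article's coding is given in Equation 21) and includes the $m\,x$ interaction, as in the $y = x + m + mx$ model:

$$
\hat{y} = \hat{\alpha}_0 + \hat{\alpha}_1 x + \hat{\alpha}_2 m + \hat{\alpha}_3 m x =
\begin{cases}
\hat{\alpha}_0 + \hat{\alpha}_1 x, & m = 0 \ (\text{Material A}),\\
(\hat{\alpha}_0 + \hat{\alpha}_2) + (\hat{\alpha}_1 + \hat{\alpha}_3)\,x, & m = 1 \ (\text{Material B}).
\end{cases}
$$

Each material therefore has its own free intercept and slope, so minimizing the total squared error splits into two independent fits and reproduces the by-case lines. The difference is that this model estimates a single residual variance shared by both materials. The same argument applies to the quadratic model with $m$, $mx$, and $mx^2$ terms.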
The methods in Section 3.1 were used to create the models in Tables 1, 2, 4, and 5. Tables 4 and 5 list the $\hat{\mu}_{pod}$ and $\hat{\sigma}_{pod} = \sigma_x/\alpha_1$ values for Materials A and B. Because $\alpha_1$ differs between the materials, each material has a different $\hat{\sigma}_{pod}$. The values of $\hat{\mu}_{pod}$ and $\hat{\sigma}_{pod}$ in Table 4 are in terms of $x$, while those in Table 5 are in terms of $x^2$; the numbers differ because they are on different scales.
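A minimal sketch of how $\hat{\mu}_{pod}$ and $\hat{\sigma}_{pod}$ turn a fitted line into a POD curve. It assumes the conventional $\hat{a}$-versus-$a$ form, $\hat{\mu}_{pod} = (y_{dec} - \hat{\alpha}_0)/\hat{\alpha}_1$ and $\mathrm{POD}(s) = \Phi((s - \hat{\mu}_{pod})/\hat{\sigma}_{pod})$, where $s$ is the size on the model's own scale and `sigma` is the fitted residual standard deviation (our reading of $\sigma_x$ above); Section 3.1 remains the authoritative description, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def pod_parameters(alpha0, alpha1, sigma, y_dec):
    """Return (mu_pod, sigma_pod) for a fitted line y = alpha0 + alpha1 * s + eps.

    Assumes the conventional a-hat-versus-a relationship with a positive slope.
    """
    if alpha1 <= 0:
        raise ValueError("this form requires a positive slope alpha1")
    mu_pod = (y_dec - alpha0) / alpha1
    sigma_pod = sigma / alpha1
    return mu_pod, sigma_pod


def pod(s, mu_pod, sigma_pod):
    """POD at size s (on the same scale as the fitted model)."""
    return NormalDist(mu_pod, sigma_pod).cdf(s)


def size_at_pod(p, mu_pod, sigma_pod):
    """Size on the model's scale at which POD equals p (p = 0.90 gives a_90)."""
    return NormalDist(mu_pod, sigma_pod).inv_cdf(p)


# Example on the x^2 scale, plugging in the *true* Material A parameters of
# Equation 42 (constant 2, coefficient 10 on x^2, sigma 1, y_dec 3), not fitted ones.
mu, s_pod = pod_parameters(alpha0=2.0, alpha1=10.0, sigma=1.0, y_dec=3.0)
a90_sq = size_at_pod(0.90, mu, s_pod)  # a_90 expressed in x^2 units
print(mu, s_pod, math.sqrt(a90_sq))    # square root back-transforms to x
```

Back-transforming the $x^2$-scale $a_{90}$ with a square root gives about 0.478 in $x$ units, which illustrates why $\hat{\mu}_{pod}$ and $\hat{\sigma}_{pod}$ values reported on different size scales are not directly comparable.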
When the data are split into subsets (model $y = x$ for each material), the variance for Material A is estimated as $1.07^2 = 1.1449$ and that for Material B as $1.86^2 = 3.4596$. However, when the materials are modeled together ($y = x + m + mx$), a single variance is estimated for both materials simultaneously: $1.52^2 = 2.3104$. This differs from the $y = x$ model that ignores material, which yields $2.38^2 = 5.6644$; this larger variance occurs because the important material variable is not included in the mean. The same applies to the quadratic models, although their variances are estimated in $x^2$-space rather than $x$-space.
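The pooled value can be checked against the by-case values. Because the $y = x + m + mx$ model reproduces the by-case fitted means, each material's residual sum of squares ($\mathrm{RSS}$) is unchanged; with divide-by-$n$ (maximum likelihood) variance estimates and $n_A = n_B = 50$:

$$
\hat{\sigma}^2_{\text{pooled}} = \frac{\mathrm{RSS}_A + \mathrm{RSS}_B}{n_A + n_B}
= \frac{n_A\,\hat{\sigma}^2_A + n_B\,\hat{\sigma}^2_B}{n_A + n_B}
= \frac{1.1449 + 3.4596}{2} \approx 2.302
$$

This agrees with the reported 2.3104 to within the rounding of the standard deviations 1.07, 1.86, and 1.52 to two decimals. The same simple average results with $n - p$ denominators, since both subsets have 50 observations and two mean parameters each.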
Table 1 provides each fitted model resulting from maximum likelihood estimation (MLE). The MLE fit information is provided in Table 2, in which smaller values of −2 log likelihood (−2LL), AIC, and BIC indicate better models when comparing models fit to the same dataset. These statistics are useful even for comparing nonlinear models [16, 20]. Because Table 2 contains information from linear models, it is possible to
Figure 2. Linear model fits for each case and material: (a) Material A; (b) Material B. (Signal response versus discontinuity size; fitted lines for $y = x$, $y = x$ for A, $y = x$ for B, $y = x^2 + x$, $y = x^2 + x$ for A, $y = x^2 + x$ for B, $y = x^2 + x$ + material, and $y = x$ + material.)

Figure 3. Probability of detection (POD) curves for each model: (a) Material A; (b) Material B. (POD versus discontinuity size; curves for $y = x$, $y = x$ for A, $y = x$ for B, $y = x^2 + x$, $y = x^2 + x$ for A, $y = x^2 + x$ for B, $y = x^2 + x$ + material for A and for B, and $y = x$ + material for A and for B.)