approximate the R² and the adjusted R² (called R²_a), which is adjusted to penalize for unnecessary variables. For example, R²_a = 0.77 indicates that 77% of response variability (y) can be explained by the model.
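These two quantities can be sketched with the standard formulas (a generic illustration, not code from this article; the data in the usage example are invented):

```python
# Generic sketch of R^2 and adjusted R^2_a; the example data below are
# made up for illustration and do not come from the article's simulation.

def r_squared(y, y_hat):
    """Fraction of response variability explained by the fitted values."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, p):
    """R^2_a penalizes R^2 for the p predictors (intercept excluded)."""
    n = len(y)
    return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - p - 1)

y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y, y_hat))              # ≈ 0.98
print(adjusted_r_squared(y, y_hat, 1))  # ≈ 0.97, smaller: one predictor penalized
```

Because the penalty grows with p, R²_a can only match R² when every added variable earns its keep.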
When fitting a model, it is important to balance the number of model parameters to estimate (p) with the number of observations (n), often referred to as “the curse of dimensionality.” In practice, the goal is to estimate parameters that improve the model’s ability to represent the data (avoiding underfitting), but not estimate parameters that represent noise within the data (avoiding overfitting). The number of model parameters to fit is described by the degrees of freedom (DoF). The residual DoF represent the number of observations that are free to vary, where residual DoF = number of observations − DoF. These residual DoF are important for estimating the residuals, which describe how much the fitted model differs from the response (y).
This discussion of residual DoF highlights a difference between models fit on a subset of the data versus models fit on all the data with a categorical variable included. For example, using Table 2, consider the y = x model for Material A, which has DoF = 3 (with three parameters: β_0, β_1, and ε) but only 50 observations, so the residual DoF = 47. In comparison, the y = x + m + xm model has DoF = 5 (with five parameters: β_0, β_1, β_2, β_3, and ε), but it is fit on the entire dataset of 100 observations, so the residual DoF = 95. Thus, if m is a significant variable, then the y = x + m + xm model is better than the y = x subset models in describing the residuals.
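The bookkeeping above is simple enough to sketch directly (a trivial illustration using the counts quoted from Table 2):

```python
def residual_dof(n_obs, dof):
    """Residual DoF = number of observations minus the model DoF."""
    return n_obs - dof

# y = x fit on the 50 Material A observations (beta_0, beta_1, epsilon):
print(residual_dof(50, 3))   # 47
# y = x + m + xm fit on all 100 observations (five parameters):
print(residual_dof(100, 5))  # 95
```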
When fitting a linear model, several hypothesis tests are performed. For this application, P-values are compared against α = 0.05, which is a measure of the Type 1 error: rejecting the null hypothesis when it is actually true. When testing for significance on the subset models, each is compared to α = 0.05. For two subsets (like Materials A and B), the cumulative allowable Type 1 error is therefore α ≈ 0.1, rather than the single α = 0.05 for the model that includes material as a variable.
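The α ≈ 0.1 figure follows from the family-wise error rate for independent tests, 1 − (1 − α)^k (a standard statistics identity, not code from this article):

```python
def familywise_alpha(alpha, k):
    """Probability of at least one Type 1 error across k independent tests."""
    return 1 - (1 - alpha) ** k

# Two subset models, each tested at alpha = 0.05:
print(familywise_alpha(0.05, 2))  # ≈ 0.0975, i.e. roughly alpha = 0.1
```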
The likelihood ratio test (LRT) is also useful for comparing models, as it tests whether two nested models are statistically significantly different from each other. Models are nested if one has fewer terms than the other. For example, model y = x is nested inside model y = x² + x. If the test returns a P-value > 0.05, the models are not significantly different, and the additional variables in the larger model are not needed. Table 3 shows the LRT results for all nested models, and in all cases, the additional terms are useful because they result in significantly different likelihoods (P-values < 0.001).
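The mechanics of the LRT can be sketched from the log-likelihoods of the two fits: the statistic 2(ln L_full − ln L_reduced) is compared to a chi-square distribution with DoF equal to the number of extra parameters. The log-likelihood values below are invented, not the article's fitted values:

```python
import math

def chi2_sf(x, df):
    """Chi-square survival function (closed form, valid for even df)."""
    assert df % 2 == 0 and x >= 0
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i)
                                  for i in range(df // 2))

def likelihood_ratio_test(llf_reduced, llf_full, df_diff):
    """P-value for two nested fits; df_diff = extra parameters in the full model."""
    stat = 2 * (llf_full - llf_reduced)
    return chi2_sf(stat, df_diff)

# Hypothetical log-likelihoods for nested models differing by two terms:
print(likelihood_ratio_test(-230.0, -218.0, 2))  # stat = 24, p ≈ 6.1e-6
```

A P-value this small would mean the extra terms produce a significantly better likelihood, matching the Table 3 pattern.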
When using all the data, the worst model is y = x, with AIC = 463.40. Adding the x² term (y = x² + x) leads to a smaller AIC of 442.91, and adding the material term reduces it further to 377.26. The best model for all the data includes the most variables, with the form y = x² + x + m + mx + mx², and an AIC of 299.93. When modeling each material separately
TABLE 4
POD mean and variance terms for each fitted linear model, with statistical tests of whether the linear model assumptions of normality and constant variance are met

| Model (y =) | Material | μ̂_pod | σ_x | σ̂_pod | Normality P-value | Constant variance P-value |
|---|---|---|---|---|---|---|
| x | Combo | 0.27 | 2.38 | 0.156 | 0.3316 | 0.001* |
| x | A subset | 0.26 | 1.07 | 0.104 | 0.3894 | 0.3204 |
| x | B subset | 0.27 | 1.86 | 0.092 | 0.3270 | 0.9688 |
| x + m + xm | A | 0.26 | 1.52 | 0.148 | 0.1785 | 0.3100 |
| x + m + xm | B | 0.27 | 1.52 | 0.075 | 0.1785 | 0.3100 |

*P-values < 0.05 indicate that the assumption is violated.
TABLE 5
POD mean and variance terms for each fitted linear model, with statistical tests of whether the linear model assumptions of normality and constant variance are met

| Model (y =) | Material | μ̂_pod | σ_x | σ̂_pod | Normality P-value | Constant variance P-value |
|---|---|---|---|---|---|---|
| x² + x | Combo | 3 | 2.13 | 3.02 | 0.5652 | 0.001* |
| x² + x | A subset | 3 | 0.81 | 1.15 | 0.5747 | 0.0177* |
| x² + x | B subset | 3 | 1.18 | 1.67 | 0.6543 | 0.7282 |
| x² + x + m + mx + mx² | Both | 3 | 1.01 | 2.03 | 0.3316 | 0.3972 |

Note: Values in this table are in terms of z = f(x) (for some function f), not x, as described in Section 3.1.
*P-values < 0.05 indicate that the assumption is violated.
AUGUST 2025 | MATERIALS EVALUATION | 67
(A only or B only), the y = x² + x model performs better than the y = x model.
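These comparisons use the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where smaller is better; a minimal sketch with invented log-likelihoods (not the article's fitted values):

```python
import math

def aic(llf, k):
    """Akaike information criterion: 2k - 2 ln(L); smaller is better."""
    return 2 * k - 2 * llf

def bic(llf, k, n):
    """Bayesian information criterion: penalizes parameters more as n grows."""
    return k * math.log(n) - 2 * llf

# Invented log-likelihoods for two nested fits on n = 100 observations:
print(aic(-100.0, 3))  # 206.0 for the simpler model
print(aic(-90.0, 5))   # 190.0: richer model wins despite the parameter penalty
print(bic(-100.0, 3, 100), bic(-90.0, 5, 100))
```

The penalty term is what separates the criteria from raw likelihood: a model only "wins" if its likelihood gain outpaces the cost of its extra parameters.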
Since the data was simulated from a quadratic function (of the form y = x² + x + m + mx + mx²), a model that includes a cubic term should result in overfitting. To test this, a higher-order model was also fit: y = x³ + x² + x + m + xm + x²m + x³m. In the simulation, the estimated coefficients on the x³ and x³m terms were not significantly different from zero. This indicates that these terms do not improve the model and should not be included. If this model were written explicitly, it would be ŷ = 0 × x³ + 5.5155 + 29.5736x + 2.2768m + 28.8131mx + 6.9602x² + 7.4635mx² + 0 × x³m, which is identical to the last model in Table 1. Although the estimated model is the same, adding the cubic terms leads to a poorer fit, as indicated by the larger (and therefore worse) AIC and BIC values in Table 2 when comparing the overfit model (labeled x³ + … + mx³) to the best model (labeled y = x² + x + m + mx + mx²). Overfitting is also evident in the difference between the approximate R² of 0.9593 and the adjusted R² (which is adjusted to penalize for unnecessary variables), R²_a = 0.9562, which is smaller. Finally, the critical values a_90 and a_90/95 become negative, so this overfit model does not provide usable POD estimates.
Tables 4 and 5 present the normality (via Shapiro–Wilk) and constant variance (via Durbin–Watson) tests [17]. Normality is met in all cases, but when material is not accounted for in the model, the constant variance assumption is violated (P-values < 0.05). These tables also list the μ̂_pod and σ̂_pod values. Note that the values are reported on different scales and cannot be compared directly. While Equation 6 describes how to calculate these values for the simple linear model, these values will be calculated differently as the models grow in complexity.
Figure 3 shows the resulting POD curve for all considered models (from Table 2), and Table 6 lists the a_50, a_90, and a_90/95 values for each model. In this simple simulation, several of the more complex models provided smaller critical values, suggesting that a poor-fitting model may underestimate the capability of the NDE system. The common practice of using μ̂_pod = a_50 does not apply when a change of variables is used, as with the quadratic models.
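For the simple linear case where POD(a) = Φ((a − μ_pod)/σ_pod) holds on the original scale (so μ̂_pod = a_50 still applies), the critical values are just normal quantiles. A sketch using the y = x "Combo" values from Table 4; the a_90/95 bound additionally requires a confidence adjustment on the standard error, which is omitted here:

```python
from statistics import NormalDist

def critical_value(mu_pod, sigma_pod, p):
    """Flaw size detected with probability p under POD(a) = Phi((a - mu)/sigma)."""
    return mu_pod + NormalDist().inv_cdf(p) * sigma_pod

# mu_pod = 0.27, sigma_pod = 0.156 (y = x, Combo row of Table 4):
a50 = critical_value(0.27, 0.156, 0.50)  # 0.27: here a50 equals mu_pod
a90 = critical_value(0.27, 0.156, 0.90)  # ≈ 0.47, cf. 0.4706 in Table 6
```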
A better linear fit to the data will provide a more accurate estimate of the POD. In Simulation 1, a general trend is seen where the a_90 and a_90/95 values decreased as the quality of the model increased. This is not surprising: better-fitting models have smaller standard errors, and these critical values are sensitive to the standard error.
The next section describes Simulation 2, in which
Simulation 1 is repeated 10 000 times. This Monte Carlo study
is intended to provide estimates of how the results could
change with the quality of the model, given a large population
of datasets.
4.3. Simulation 2: Monte Carlo Study
To estimate how much the results change with the quality of the model, a Monte Carlo simulation was run using 10 000 runs of Equation 42, refitting all models after each run. Across the 10 000 simulations, the AIC and BIC values were smaller (and therefore better) for the quadratic models. Consistent with the behavior observed in a single simulation (see Table 3), the LRT results showed that more complex models outperformed simpler models in all but three of the 10 000 simulations (0.03%).
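A stripped-down version of this refit-and-compare loop can be sketched as follows. The data-generating model and noise level here are stand-ins, not Equation 42, and the fits use plain least squares via the normal equations:

```python
import math
import random

def polyfit(x, y, deg):
    """Least-squares polynomial fit via normal equations + Gaussian elimination."""
    n = deg + 1
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    for col in range(n):  # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n  # back-substitution; coef[i] multiplies x**i
    for i in range(n - 1, -1, -1):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef

def aic_of_fit(x, y, coef):
    """Gaussian AIC: parameters are the coefficients plus the error variance."""
    n = len(y)
    sse = sum((yi - sum(c * xi ** i for i, c in enumerate(coef))) ** 2
              for xi, yi in zip(x, y))
    llf = -0.5 * n * (math.log(2 * math.pi * sse / n) + 1)
    return 2 * (len(coef) + 1) - 2 * llf

def monte_carlo(n_runs=200, n=100, seed=1):
    """Regenerate quadratic data, refit both models, tally AIC wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        x = [rng.uniform(0, 1) for _ in range(n)]
        y = [2 * xi ** 2 + xi + rng.gauss(0, 0.1) for xi in x]
        lin = aic_of_fit(x, y, polyfit(x, y, 1))
        quad = aic_of_fit(x, y, polyfit(x, y, 2))
        wins += quad < lin
    return wins / n_runs
```

Because the true model is quadratic, the quadratic fit should win on AIC in essentially every run, mirroring the near-unanimous results reported above.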
The best overall model was y = x² + x + m + mx + mx². The y = x² + x model outperformed y = x, and y = x + m + xm also outperformed y = x. Although the LRT cannot be used to compare models y = x² + x and y = x + m + xm directly (since they are not nested), a direct comparison of log-likelihoods, AIC, and BIC revealed that y = x + m + xm was always a better fit to the data than y = x² + x.
According to the LRT results, the x + m + xm model is always different from the x³ + x² + x + m + xm + x²m + x³m model. However, 93.6% of the best models (quadratic with interactions, y = x² + x + m + mx + mx²) showed no significant improvement (P-value > 0.05) when the cubic terms were added (i.e., y = x³ + x² + x + m + xm + x²m + x³m).
Contrary to expectation, the median (via Hodges–Lehmann) critical values of a_50, a_90, and a_90/95 from the quadratic model (y = x² + x + m + mx + mx²) were more similar to those from the y = x + m + xm model than to those from the y = x + material model. Figures 4, 5, and 6 show box plots and density plots for each of the critical values, and Table 7 provides the Hodges–Lehmann estimates of the median critical values with 95% confidence intervals for the 10 000 simulations.
TABLE 6
Critical values for the example simulation (corresponding to the values in Figure 3)

| Model (y =) | Material | a_50 | a_90 | a_90/95 |
|---|---|---|---|---|
| x | Combo | 0.2702 | 0.4706 | 0.5055 |
| x² + x | Combo | 0.2089 | 0.2874 | 0.3076 |
| x | A subset | 0.2645 | 0.3980 | 0.4320 |
| x² + x | A subset | 0.3742 | 0.4335 | 0.4503 |
| x + m | A | 0.2645 | 0.4544 | 0.4958 |
| x² + x + m* | A | 0.2707* | 0.3474* | 0.3652* |
| x | B subset | 0.2731 | 0.3911 | 0.4213 |
| x² + x | B subset | 0.2464 | 0.2923 | 0.3053 |
| x + m | B | 0.2731 | 0.3693 | 0.3917 |
| x² + x + m* | B | 0.1771* | 0.2176* | 0.2271* |

*The critical values corresponding to the model that best fit the data.