approximate the R² and the adjusted R² (called Ra), which is adjusted to penalize for unnecessary variables. For example, Ra = 0.77 indicates that 77% of response variability can be explained by the model.
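Both quantities follow directly from the residual and total sums of squares. The sketch below, on hypothetical data (the slope, noise level, and sample size are illustrative assumptions, not values from the article), shows how the adjusted R² penalizes each extra parameter:

```python
import numpy as np

# Hypothetical data: 10 observations of a response y against a predictor x.
rng = np.random.default_rng(0)
x = np.linspace(0, 9, 10)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=10)

# Fit y = b0 + b1*x by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

n, p = len(y), X.shape[1]            # observations, fitted coefficients
ss_res = np.sum(resid**2)            # residual sum of squares
ss_tot = np.sum((y - y.mean())**2)   # total sum of squares

r2 = 1 - ss_res / ss_tot
# Adjusted R^2 shrinks toward zero as parameters are added without payoff.
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
print(r2, r2_adj)
```

The adjusted value is always at or below the plain R² whenever more than one coefficient is fit, which is the penalty the text describes.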
When fitting a model, it is important to balance the number of model parameters to estimate against the number of observations, a tension often referred to as "the curse of dimensionality." In practice, the goal is to estimate parameters that improve the model's ability to represent the data (avoiding underfitting) but not to estimate parameters that represent noise within the data (avoiding overfitting). The number of model parameters to fit is described by the degrees of freedom (DoF). The residual DoF represent the number of observations that are free to vary, where residual DoF = number of observations − DoF. These residual DoF are important for estimating the residuals, which describe how much the fitted model differs from the response.
This discussion of residual DoF highlights a difference between models fit on a subset of the data and models fit on all the data with a categorical variable included. For example, using Table 2, consider the y = x model for Material A, which has DoF = 3 (with three parameters: β0, β1, and ε) but only 50 observations, so the residual DoF = 47. In comparison, the y = x + m + xm model has DoF = 5 (with five parameters: β0, β1, β2, β3, and ε), but it is fit on the entire dataset of 100 observations, so the residual DoF = 95. Thus, if m is a significant variable, then the y = x + m + xm model is better than the y = x subset models in describing the residuals.
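The arithmetic above can be stated compactly (following the article's convention of counting the error term ε as one of the fitted parameters):

```python
# Residual DoF = observations - fitted parameters, where the parameter count
# includes the error term, as the article counts it.
def residual_dof(n_obs, n_params):
    return n_obs - n_params

subset_dof = residual_dof(50, 3)   # y = x fit on Material A alone
full_dof = residual_dof(100, 5)    # y = x + m + xm fit on both materials
print(subset_dof, full_dof)        # 47 and 95, matching the text
```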
When fitting a linear model, several hypothesis tests are performed. For this application, P-values are compared against α = 0.05, which is a measure of the Type I error: rejecting the null hypothesis when it is actually true. When testing for significance on the subset models, each is compared to α = 0.05. For two subsets (like Materials A and B), the combined allowable Type I error is therefore α = 0.1, rather than the α = 0.05 of the single model that includes material as a variable.
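The inflation of the Type I error over two tests can be checked with a few lines. The additive α = 0.1 figure is the simple Bonferroni-style bound; the exact family-wise rate for two independent tests is slightly smaller:

```python
# Running k independent tests at level alpha inflates the chance of at least
# one false rejection across the family of tests.
alpha = 0.05
k = 2                              # two subset models (Materials A and B)
bonferroni = k * alpha             # additive bound: 0.10
exact = 1 - (1 - alpha) ** k       # exact rate for independent tests: 0.0975
print(bonferroni, exact)
```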
The likelihood ratio test (LRT) is also useful for comparing models, as it tests whether two nested models are statistically significantly different from each other. Models are nested if one has fewer terms than the other. For example, model y = x is nested inside model y = x² + x. If the test returns a P-value > 0.05, the models are not significantly different, and the additional variables in the larger model are not needed. Table 3 shows the LRT results for all nested models, and in all cases, the additional terms are useful because they result in significantly different likelihoods (P-values < 0.001).
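For Gaussian linear models, the LRT statistic reduces to a ratio of residual sums of squares. The sketch below uses synthetic data with genuine curvature (the data, noise level, and seed are assumptions for illustration, not the article's dataset), so the x² term should test as significant:

```python
import numpy as np
from scipy.stats import chi2

# Synthetic data with real curvature, so the x^2 term should matter.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 100)
y = 0.5 * x**2 + x + rng.normal(scale=0.5, size=x.size)

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

X_small = np.column_stack([np.ones_like(x), x])        # y = x
X_big = np.column_stack([np.ones_like(x), x, x**2])    # y = x^2 + x

# For Gaussian errors the LRT statistic is n * ln(RSS_small / RSS_big),
# compared against a chi-squared with df = difference in parameter count.
stat = x.size * np.log(rss(X_small, y) / rss(X_big, y))
p_value = chi2.sf(stat, df=1)
print(stat, p_value)
```

A tiny P-value here mirrors the article's Table 3 outcome: the larger model's likelihood is significantly better, so the extra term is kept.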
When using all the data, the worst model is y = x, with AIC = 463.40. Adding the x² term (y = x² + x) leads to a smaller AIC of 442.91, and adding the material term reduces it further to 377.26. The best model for all the data includes the most variables, with the form y = x² + x + m + mx + mx², and an AIC of 299.93. When modeling each material separately
TABLE 4
POD mean and variance terms for each fitted linear model, with statistical tests of whether the linear model assumptions of normality and constant variance are met

Model y =    Material   μ̂pod   σ̂x     σ̂pod    Normality P-value   Constant variance P-value
x            Combo      0.27    2.38   0.156   0.3316              0.001*
x            A subset   0.26    1.07   0.104   0.3894              0.3204
x            B subset   0.27    1.86   0.092   0.3270              0.9688
x + m + xm   A          0.26    1.52   0.148   0.1785              0.3100
x + m + xm   B          0.27    1.52   0.075   0.1785              0.3100

*P-values < 0.05 indicate that the assumption is violated.
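The article does not name the specific tests behind the two P-value columns; the sketch below uses a Shapiro-Wilk check on the residuals (normality) and a Breusch-Pagan-style check (constant variance) purely as illustrative stand-ins, on synthetic homoskedastic data:

```python
import numpy as np
from scipy import stats

# Hypothetical well-behaved data: linear trend plus constant-variance noise.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 50)
y = 1.0 + 0.3 * x + rng.normal(scale=0.4, size=x.size)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Normality of residuals (Shapiro-Wilk).
_, normality_p = stats.shapiro(resid)

# Breusch-Pagan style: regress squared residuals on x; LM stat = n * R^2.
u = resid**2
g, *_ = np.linalg.lstsq(X, u, rcond=None)
fitted = X @ g
r2 = 1 - np.sum((u - fitted) ** 2) / np.sum((u - u.mean()) ** 2)
bp_stat = x.size * r2
variance_p = stats.chi2.sf(bp_stat, df=1)
print(normality_p, variance_p)
```

P-values below 0.05 from either check would flag the corresponding assumption, matching the asterisk convention in Tables 4 and 5.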
TABLE 5
POD mean and variance terms for each fitted linear model, with statistical tests of whether the linear model assumptions of normality and constant variance are met

Model y =               Material   μ̂pod   σ̂x     σ̂pod   Normality P-value   Constant variance P-value
x² + x                  Combo      3       2.13   3.02   0.5652              0.001*
x² + x                  A subset   3       0.81   1.15   0.5747              0.0177*
x² + x                  B subset   3       1.18   1.67   0.6543              0.7282
x² + x + m + mx + mx²   Both       3       1.01   2.03   0.3316              0.3972

Note: Values in this table are in terms of z = f(x) (for some function f), not x, as described in Section 3.1.
*P-values < 0.05 indicate that the assumption is violated.
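The AIC ranking discussed in the text can be sketched with the standard Gaussian formula AIC = n·ln(RSS/n) + 2k (up to an additive constant). The data below are synthetic stand-ins, so the absolute values will not match the article's 463.40, 442.91, 377.26, and 299.93; only the ordering behavior is the point:

```python
import numpy as np

# Synthetic data: curvature plus a material offset, so richer models win.
rng = np.random.default_rng(3)
n = 100
x = np.linspace(-2, 2, n)
m = np.repeat([0.0, 1.0], n // 2)        # material indicator (A = 0, B = 1)
y = x**2 + x + 2 * m + rng.normal(scale=0.5, size=n)

def aic(X, y):
    """Gaussian AIC up to a constant: n*ln(RSS/n) + 2k."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1                   # coefficients plus error variance
    return len(y) * np.log(rss / len(y)) + 2 * k

one = np.ones(n)
models = {
    "y = x": np.column_stack([one, x]),
    "y = x^2 + x": np.column_stack([one, x, x**2]),
    "y = x^2 + x + m": np.column_stack([one, x, x**2, m]),
}
for name, X in models.items():
    print(name, round(aic(X, y), 2))
```

As in the article, each genuinely useful term lowers the AIC, and the 2k penalty guards against adding terms that only fit noise.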
AUGUST 2025 • MATERIALS EVALUATION 67