MEASUREMENT UNCERTAINTY
ANALYSIS
Measurement uncertainty analysis,
based on the Guide to the Expression
of Uncertainty in Measurement (GUM)
framework [59], assesses both
systematic and random errors in NDE
inspections. It is essential in engineering
metrology, ensuring that measurement
deviations are accurately quantified and
reported. Unlike methods that assume
a predefined probability model, this
approach identifies the individual
uncertainty sources (such as
instrument calibration, environmental
conditions, and operator variability) and
combines them into a comprehensive
uncertainty budget.
For example, if we want to estimate
discontinuity depth from an ultrasonic
echo, the workflow of the GUM-based
uncertainty analysis is as follows:
• Identify input quantities. List every
variable that influences the depth $d$,
such as sound velocity, time of flight,
probe angle, temperature, and
calibration block tolerances.
• Classify and evaluate. Each input is
then classified: Type A inputs come
from statistical data obtained through
repeat scans, whereas Type B inputs
rely on specifications or expert
knowledge. Each input is assigned a
standard uncertainty $u(x_i)$.
• Uncertainty propagation. Build a
measurement model $d = f(x_1, x_2, \ldots)$
to propagate these individual
uncertainties. Insert the measured (or
nominal) values of all inputs
into $f$ to obtain the best point
estimate $\hat{d} = f(\hat{x}_1, \hat{x}_2, \ldots)$, which
is the central value about which the
confidence interval will be built.
Linearize or run a Monte Carlo
simulation to obtain the combined
standard uncertainty:
(4) $U_c(d) = \sqrt{\sum_i \left( \frac{\partial f}{\partial x_i}\, u(x_i) \right)^2}$
• Expand uncertainty. Multiply $U_c(d)$
by a coverage factor $k$ (typically $k = 2$
for approximately 95% confidence) to
get the expanded uncertainty,
$U = k\,U_c(d)$, which is used in the
final uncertainty statement.
• Uncertainty estimation. Report the
inspection result as $d = \hat{d} \pm U$.
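As an illustration of the workflow above, the following minimal Python sketch propagates uncertainty through a simple pulse-echo depth model. The model d = v t / 2, the function names, and all numerical values are assumptions chosen for illustration and are not taken from the cited references. The inputs are treated as uncorrelated, and the partial derivatives in Equation 4 are approximated by central finite differences.

```python
import math


def depth_model(v, t):
    """Assumed pulse-echo model: sound travels to the reflector and back."""
    return v * t / 2.0


def combined_standard_uncertainty(f, values, uncertainties, rel_step=1e-6):
    """Equation 4: root-sum-square of (sensitivity coefficient * u(x_i)).

    Sensitivity coefficients are estimated by central finite differences;
    inputs are assumed to be uncorrelated.
    """
    total = 0.0
    for i, (x, u) in enumerate(zip(values, uncertainties)):
        h = abs(x) * rel_step or rel_step      # fall back to rel_step if x == 0
        upper = list(values)
        lower = list(values)
        upper[i] = x + h
        lower[i] = x - h
        sensitivity = (f(*upper) - f(*lower)) / (2.0 * h)
        total += (sensitivity * u) ** 2
    return math.sqrt(total)


# Illustrative (assumed) inputs, not measured data.
v, u_v = 5900.0, 30.0          # sound velocity in m/s, Type B (specification)
t, u_t = 8.0e-6, 0.02e-6       # time of flight in s, Type A (repeat scans)

d_hat = depth_model(v, t)                                   # best point estimate
u_c = combined_standard_uncertainty(depth_model, [v, t], [u_v, u_t])
k = 2.0                                                     # coverage factor
U = k * u_c                                                 # expanded uncertainty

print(f"d = {d_hat * 1e3:.3f} mm +/- {U * 1e3:.3f} mm (k = {k:g})")
```

With these assumed inputs the velocity term contributes more to the budget than the time-of-flight term, which is the kind of insight an itemized uncertainty budget is meant to expose.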
Using GUM makes NDE uncertainty
statements transparent, traceable, and
comparable across laboratories and
inspection procedures. For example, in
phased array ultrasonic testing (PAUT)
for weld inspections, uncertainties arise
from variations in ultrasonic wave speed,
probe positioning, and couplant layer
thickness, all of which affect
discontinuity sizing accuracy. GUM has been
applied to analyze the uncertainty of
beam parameters in NDE [60].
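The Monte Carlo route mentioned in the workflow can be sketched in the same spirit. The example below assumes a hypothetical depth model with a couplant delay term, d = v (t - t_c) / 2, and made-up distributions for wave speed, time of flight, and couplant delay. None of these values or names come from [60]; they only show how such sources could be sampled and combined.

```python
import random
import statistics


def depth_model(v, t, t_couplant):
    """Hypothetical model: depth from velocity and delay-corrected time of flight."""
    return v * (t - t_couplant) / 2.0


def monte_carlo_uncertainty(n_trials=100_000, seed=0):
    """Sample the assumed input distributions and summarize the output spread."""
    rng = random.Random(seed)                  # fixed seed for repeatability
    samples = []
    for _ in range(n_trials):
        v = rng.gauss(5900.0, 30.0)            # wave speed in m/s (normal, assumed)
        t = rng.gauss(8.0e-6, 0.02e-6)         # time of flight in s (normal, assumed)
        t_c = rng.uniform(0.0, 0.1e-6)         # couplant delay in s (rectangular, assumed)
        samples.append(depth_model(v, t, t_c))
    samples.sort()
    d_hat = statistics.fmean(samples)          # point estimate
    u_c = statistics.stdev(samples)            # combined standard uncertainty
    lower = samples[int(0.025 * n_trials)]     # approximate 95% coverage interval
    upper = samples[int(0.975 * n_trials) - 1]
    return d_hat, u_c, (lower, upper)


d_hat, u_c, interval = monte_carlo_uncertainty()
print(f"d = {d_hat * 1e3:.3f} mm, u_c = {u_c * 1e3:.3f} mm")
print(f"95% interval: {interval[0] * 1e3:.3f} to {interval[1] * 1e3:.3f} mm")
```

Unlike the linearized formula, this approach reads the coverage interval directly from the sampled output, so it does not require the output distribution to be normal.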
Recent research explores the
integration of advanced computational
techniques such as deep learning to improve
uncertainty quantification in dynamic
inspection environments [24].
CONFIDENCE INTERVALS AND
HYPOTHESIS TESTING
Confidence intervals (CIs) and
hypothesis testing (HT) provide a statistical
framework for quantifying uncertainty
in discontinuity detection and material
property measurements. Confidence
intervals estimate the probable range
where a discontinuity parameter, such
as discontinuity size or wall thickness,
is likely to fall, providing a measure of
quantification precision. Hypothesis
testing evaluates whether a detected
discontinuity is statistically significant
or merely an artifact caused by sensor
noise, material variability, or operator
inconsistencies, helping to reduce false
positives in inspections.
For instance, bootstrap confidence
intervals are used in process capability
analysis to evaluate performance indices
for non-normal data, as shown by Kashif
et al. [61] and Rao et al. [62]. These
methods provide reliable coverage
probabilities and narrower interval widths,
making them suitable for asymmetric
data distributions common in NDE.
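A percentile bootstrap, one common bootstrap variant, can be sketched in a few lines. The data below are invented right-skewed depth readings, and the function name bootstrap_ci and its parameters are hypothetical. The cited studies [61, 62] apply bootstrap intervals to process capability indices, whereas this sketch uses the sample mean for simplicity; another statistic can be passed through the stat argument.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, then take quantiles."""
    rng = random.Random(seed)                  # fixed seed for repeatability
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Illustrative (made-up) right-skewed depth readings in mm.
depths_mm = [1.8, 2.0, 2.1, 2.1, 2.3, 2.4, 2.6, 2.9, 3.7, 5.2]

lower, upper = bootstrap_ci(depths_mm)
print(f"mean = {statistics.fmean(depths_mm):.2f} mm")
print(f"95% bootstrap CI: {lower:.2f} to {upper:.2f} mm")
```

Because the interval endpoints are read from the resampled distribution, the interval is not forced to be symmetric about the mean, which is why the method suits skewed data.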
Hypothesis testing is particularly
useful in NDE applications where
measurement variability can lead to false
indications. In radiographic testing (RT)
of welded joints, HT distinguishes actual
discontinuities from false signals caused
by scatter radiation or imaging
inconsistencies. By reducing false alarms and
misclassifications, HT enhances inspection
accuracy and helps prevent unnecessary
repairs [63].
Generally, CIs quantify how precisely
the discontinuity size (or other
measurand) is known, while HT provides a
yes-or-no decision against an acceptance
criterion, with controlled false alarm risk.
It is often beneficial to combine both
methods to deliver a complete, safety-
focused reliability assessment for NDE
inspections.
For example, to set the acceptance
threshold $d_{\mathrm{crit}}$ for discontinuity depth
from an ultrasonic echo, the workflow is
as follows:
• Collect representative data and
compute descriptive statistics.
Acquire repeated measurements
$x_1, x_2, \ldots, x_n$ under controlled,
reproducible conditions. Calculate the
mean $\bar{x}$ and standard deviation $s$.
• Construct the CI. Select confidence
level $1 - \alpha$ (commonly 95%). The
two-sided CI for the mean is:
(5) $\mathrm{CI} = \bar{x} \pm t_{n-1,\,\alpha/2}\, \frac{s}{\sqrt{n}}$
where $t_{n-1,\,\alpha/2}$ is the Student's
t-value for $n - 1$ degrees of freedom,
and $t_{n-1,\,\alpha/2}\, \frac{s}{\sqrt{n}}$
is the half-width, which is the
amount added and subtracted to
create the confidence band around
that mean.
• Formulate hypotheses. Define the
null and alternative hypotheses:
$H_0: \mu \le d_{\mathrm{crit}}$ (discontinuity acceptance)
$H_1: \mu > d_{\mathrm{crit}}$ (discontinuity rejection)
• Calculate test statistic. Choose
significance level $\alpha$ (commonly 0.05),
and then calculate the one-sample
t-statistic:
(6) $t = \frac{\bar{x} - d_{\mathrm{crit}}}{s / \sqrt{n}}$
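• Decision-making. Reject $H_0$ when $t$
exceeds the one-sided critical value
$t_{n-1,\,\alpha}$; otherwise, accept the
discontinuity. The CI quantifies
measurement precision, while HT
provides a rigorously justified
pass/fail call. It is useful to combine both
results to strengthen the reliability
assessment in NDE inspections.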
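The CI and hypothesis-test steps above can be sketched with the Python standard library alone. The depth readings, the threshold value, and all function names are assumptions made for illustration. The Student's t quantile is obtained here by numerically integrating the t density and bisecting, a stand-in for a statistics library or a printed t-table.

```python
import math
import statistics


def t_pdf(x, dof):
    """Student's t probability density."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)


def t_cdf(x, dof, steps=2000):
    """CDF by composite Simpson integration of the density from 0 to |x|."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half


def t_quantile(p, dof):
    """Quantile by bisection; valid for 0.5 < p < 1 and quantiles below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ci_and_test(data, d_crit, alpha=0.05):
    """Two-sided CI (Equation 5) and one-sided t-test (Equation 6)."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)            # s / sqrt(n)
    half_width = t_quantile(1 - alpha / 2, n - 1) * se
    t_stat = (mean - d_crit) / se
    reject_h0 = t_stat > t_quantile(1 - alpha, n - 1)     # H1: mu > d_crit
    return (mean - half_width, mean + half_width), t_stat, reject_h0


# Illustrative (made-up) repeated depth readings in mm and an assumed threshold.
depths_mm = [2.61, 2.58, 2.70, 2.65, 2.55, 2.72, 2.60, 2.66, 2.63, 2.59]
ci, t_stat, reject_h0 = ci_and_test(depths_mm, d_crit=3.0)

print(f"95% CI for mean depth: {ci[0]:.3f} to {ci[1]:.3f} mm")
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

For these assumed readings the mean lies well below the threshold, so the null hypothesis is not rejected and the discontinuity would be accepted, while the CI reports how tightly the mean depth is known.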
AUGUST 2025 • MATERIALS EVALUATION 29