Transform (GCC-PHAT) method has become widely preferred
due to its robustness against noise and reverberation (Knapp
and Carter 1976). Additionally, GCC-PHAT demonstrates resil-
ience in high signal-to-interference ratio (SIR) environments,
effectively managing interference from additional sources
(Kwon et al. 2010). This characteristic aligns well with the
experimental conditions in our study, where high SIR levels are
preserved despite background noise from air conditioners, lab-
oratory equipment, and the robots themselves. These factors
support the selection of GCC-PHAT for TDOA estimation in
our setup.
To provide a comprehensive comparison with alterna-
tive TDOA estimation algorithms, we also implemented the
coherence-based method (Carter 1987) and the Smoothed
Coherence Transform (SCOT) method (Carter et al. 1973). The
results reveal that TDOA estimates are consistent across all
methods, exhibiting a standard deviation of × 10⁻⁶ relative to
GCC-PHAT. However, in terms of computation time, SCOT and
the coherence-based method performed slightly better, with
average times of 0.050 s and 0.0519 s, respectively, compared to
GCC-PHAT’s 0.0685 s.
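The GCC-PHAT method operates in the frequency domain,
starting from the cross-spectrum shown in Equation 2:
$$\mathrm{CC}_F[f] = A_1[f]\,A_2[f]^{*} \tag{2}$$
where
$A_1$ and $A_2$ are the Fourier transforms of the signals recorded at microphones 1 and 2,
respectively, and
the $\{\cdot\}^{*}$ operator denotes the complex conjugate.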
By applying a weighting function $\psi[f]$, the generalized
cross-correlation (GCC) is obtained, as shown in Equation 3:
$$\mathrm{GCC}_F[f] = \psi[f]\,A_1[f]\,A_2[f]^{*} \tag{3}$$
The phase transform (PHAT) weighting, given in Equation 4,
normalizes the magnitude of the cross-spectrum, preserving
phase information and enhancing robustness to amplitude
variations:
$$\psi[f] = \frac{1}{\left|A_1[f]\,A_2[f]^{*}\right|} \tag{4}$$
Finally, the GCC-PHAT function is expressed as:
$$\mathrm{PHAT}_F[f] = \frac{A_1[f]\,A_2[f]^{*}}{\left|A_1[f]\,A_2[f]^{*}\right|} \tag{5}$$
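The cross-spectrum, PHAT weighting, and peak search described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `gcc_phat`, the zero-padding length, the small `eps` guard against division by zero, and the synthetic white-noise test signal are all hypothetical choices made for the example.

```python
import numpy as np

def gcc_phat(a1, a2, fs, eps=1e-12):
    """Estimate the delay of a1 relative to a2, in seconds, using GCC-PHAT."""
    n = len(a1) + len(a2)                 # zero-pad so the circular correlation behaves linearly
    A1 = np.fft.rfft(a1, n=n)             # Fourier transform of the first signal
    A2 = np.fft.rfft(a2, n=n)             # Fourier transform of the second signal
    cross = A1 * np.conj(A2)              # cross-spectrum (Equation 2)
    phat = cross / (np.abs(cross) + eps)  # PHAT-weighted cross-spectrum (Equations 3 to 5)
    cc = np.fft.irfft(phat, n=n)          # return to the lag domain
    max_lag = n // 2
    # Reorder so that index 0 corresponds to lag -max_lag and the last index to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lag = int(np.argmax(cc)) - max_lag    # lag (in samples) of the correlation peak
    return lag / fs

# Synthetic check (assumed values): white noise delayed by 25 samples at 400 kHz.
fs = 400_000
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay_samples = 25
a2 = s
a1 = np.concatenate((np.zeros(delay_samples), s))[:len(s)]  # a1 lags a2
tau = gcc_phat(a1, a2, fs)
print(f"estimated TDOA: {tau * 1e6:.2f} microseconds")
```

With this sign convention, a positive result means the first signal arrives later than the second. The argmax is taken after the inverse transform, so the peak index is a lag in samples that is then divided by the sampling rate.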
TDOA is computed by taking an argmax over $\mathrm{PHAT}_F$, evaluated in the lag domain after an inverse Fourier transform. Once the
TDOA is calculated, it can be used to determine the difference in distance from
the sound source to the two microphones:
$$\Delta d = c\,\Delta t \tag{6}$$
$$\Delta d = \sqrt{(x_2 - x)^2 + (y_2 - y)^2} - \sqrt{(x_1 - x)^2 + (y_1 - y)^2} \tag{7}$$
where
c is the speed of sound (set to 343 m/s for this study),
∆t denotes TDOA, and
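the coordinates $(x_1, y_1)$ and $(x_2, y_2)$ represent the known
positions of the microphones, and $(x, y)$ is the unknown source position; for a given ∆d, the points satisfying Equation 7 form a hyperbola.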
In two dimensions, sound source localization (SSL) can
be achieved by calculating two or more hyperbolas and iden-
tifying their intersections, as indicated in Figure 1. To deter-
mine these intersections, we apply the nonlinear least-squares
method to solve the resulting system of equations (Coleman
and Li 1996; Levenberg 1944).
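The intersection step can be illustrated with a small nonlinear least-squares solver written directly in NumPy. The sketch below is an assumption-laden example rather than the solver used in the study: it uses a hand-written Levenberg-style damped Gauss-Newton iteration (not the trust-region method of Coleman and Li), and the function names, microphone layout, source position, and initial guess are hypothetical values chosen only so the example is well conditioned.

```python
import numpy as np

C = 343.0  # speed of sound in m/s, as set in the text

def residuals(p, m1, m2, dd):
    """Equation 7 mismatch per microphone pair: (|m2 - p| - |m1 - p|) - measured path difference."""
    return np.linalg.norm(m2 - p, axis=1) - np.linalg.norm(m1 - p, axis=1) - dd

def jacobian(p, m1, m2):
    """Derivative of each residual with respect to the source position p = (x, y)."""
    u2 = (p - m2) / np.linalg.norm(p - m2, axis=1, keepdims=True)
    u1 = (p - m1) / np.linalg.norm(p - m1, axis=1, keepdims=True)
    return u2 - u1

def locate(m1, m2, tdoa, p0, iters=100, lam=1e-3):
    """Damped Gauss-Newton (Levenberg-style) fit of the source position to the TDOAs."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    dd = C * np.asarray(tdoa, dtype=float)   # Equation 6: path difference from TDOA
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        r = residuals(p, m1, m2, dd)
        J = jacobian(p, m1, m2)
        step = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)
        p_new = p + step
        if np.sum(residuals(p_new, m1, m2, dd) ** 2) < np.sum(r ** 2):
            p, lam = p_new, lam * 0.5        # step reduced the cost: accept, relax damping
        else:
            lam *= 10.0                      # step did not help: reject, increase damping
        if np.linalg.norm(step) < 1e-12:
            break
    return p

# Hypothetical layout (metres): four microphone pairs on the corners of a unit square.
m1 = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
m2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
src = np.array([0.3, 0.7])                   # assumed true source, used to synthesize TDOAs
tdoa = (np.linalg.norm(m2 - src, axis=1) - np.linalg.norm(m1 - src, axis=1)) / C
est = locate(m1, m2, tdoa, p0=[0.5, 0.5])
print("estimated source:", est)
```

Each microphone pair contributes one residual, so two pairs give the minimum of two hyperbolas and additional pairs overdetermine the fit. With noisy TDOAs or nearly collinear microphones the problem becomes poorly conditioned, and the result depends more strongly on the initial guess.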
In this study, we focused on two-dimensional source localization
for the following reasons: (1) it simplifies the problem,
making it easier to understand path planning strategies that can improve
source localization accuracy; (2) the source was assumed to be
at approximately the same level as the moving platform; (3)
the source was considered far enough away that the Z dimension
was less significant than the X and Y dimensions; and
(4) the platform’s motion was limited, and we did not account
for sensor rotation around the X or Y axes.
Acoustical Data Acquisition
Gas leaks typically produce acoustic frequencies ranging from
10 kHz to 100 kHz, with the most pronounced energy differ-
ence between leak signals and ambient noise occurring around
40 kHz. This makes 40 kHz an ideal frequency for gas leak
detection due to its clear distinction from background noise.
For this study, directional optical microphones were used due
to their broad detection range (10 Hz to 1 MHz) and low
self-noise (Fischer 2016; Delic 2019).
Data were collected using an NI data acquisition system
capable of sampling at up to 20 MS/s per channel. Under the
Nyquist criterion, detecting 40 kHz content requires a sampling
rate above 80 kHz; the system used a sampling rate of up to
400 kHz, well above this minimum, ensuring
accurate signal representation. To simulate a gas leak,
compressed air was released through an open valve.
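The sampling figures quoted above can be checked with a short calculation, which also shows what one sample of lag means for the path difference in Equation 6. This is a worked example only: the variable names are hypothetical, and it assumes the TDOA is read at whole-sample resolution with no interpolation of the correlation peak.

```python
C = 343.0          # speed of sound in m/s (value used in the text)
F_SIGNAL = 40e3    # frequency of interest in Hz
FS = 400e3         # upper sampling rate quoted in the text, in Hz

nyquist_min = 2 * F_SIGNAL            # sampling must exceed this rate to capture 40 kHz content
samples_per_cycle = FS / F_SIGNAL     # samples per period of a 40 kHz tone at 400 kHz
lag_step_s = 1 / FS                   # smallest TDOA step at whole-sample resolution
path_step_mm = C * lag_step_s * 1e3   # Equation 6 applied to one sample of lag, in mm

print(f"Nyquist minimum: {nyquist_min / 1e3:.0f} kHz")
print(f"samples per 40 kHz cycle: {samples_per_cycle:.0f}")
print(f"lag step: {lag_step_s * 1e6:.2f} microseconds")
print(f"path-difference step: {path_step_mm:.4f} mm")
```

At 400 kHz, each sample of lag corresponds to 2.5 microseconds, or a little under 1 mm of path difference at 343 m/s.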
Figure 1. Sound source localization (SSL) using formed hyperbolas. [Plot of X (mm) versus Y (mm) showing microphones M1 and M2, hyperbolas labeled ±10.29 and ±19.7225, the estimated location, the sound source, and the error.]
APRIL 2025 • MATERIALS EVALUATION 53