as its magnitude, location, and potential impact on the pipe-
line’s integrity (Liang et al. 2013). The main advantages of using
acoustic signals for gas leak diagnosis include swift response
time, long range, and precise localization (S. Wang and Yao
2020). A key drawback of acoustic signals is that sensitivity drops as the sensors move farther from the source, owing to significant attenuation (Liu et al. 2017). This attenuation is even more pronounced at higher (ultrasonic) frequencies, which are often associated with smaller leaks.
One effective way to mitigate this problem is to employ an array of microphones combined with acoustic imaging techniques (Li et al. 2021; Fluke 2021; Drives & Controls 2021). However, this method generates a substantial amount of data, which is ill-suited to the near-real-time processing required for a semi-autonomous platform capable of quick response.
Biological systems have addressed this issue through evolution, sensing sound with only two ears (binaural sensing). Instead of relying on more sensors to enhance sensitivity, these systems evolved external ear (pinna) shapes that act as hardware for amplifying sound arriving from specific directions. This was further complemented by strategic motion and control of the pinna and head (Populin and Yin 1998; Fletcher 2014), allowing animals to dynamically change their acoustic directionality and sensitivity as needed. Inspired by these biological mechanisms, we propose an acoustic sensing system comprising two microphones, which mimic two ears and binaural hearing, together with a framework for optimizing sensor movement to improve source localization accuracy.
Sound source localization (SSL) is a prominent topic in
robotics, with a detailed review of common methods provided
by Rascon and Meza (2017). While gas and air leak local-
ization is essential for industrial inspections and has been
explored in earlier research, many existing approaches rely on
fixed microphone arrays (Eret and Meskell 2012), handheld
devices (Liao et al. 2013) for manual detection, and statistical
time-domain features for pinpointing leaks in pipes (F. Wang
et al. 2017). Yan et al. (2018) used a four-element linear array to
identify multiple sources with the MUSIC algorithm within the
63–187 kHz emission range. Focusing on ultrasonic emissions from leaks in pressurized pipes, Schenck, Daems, and Steckel (2019) used a 32-element microphone array and performed a peak search on the beamformed spectrum. By combining measurements from multiple robot poses with simultaneous localization and mapping (SLAM), they effectively localized potential leaks. Most recently,
G. K. J. Fischer et al. (2024) introduced an autonomous robotic
system for comprehensive plant inspection, equipped with a
variety of sensors, including lidar, stereo, UV/IR/RGB cameras,
electronic noses, and five microphones. These sensors monitor factors such as methane leaks, flow rates, and infrastructure issues. The system was tested at a wastewater treatment site, where it localized gas leaks acoustically with an error of 50 cm.
Current robotic systems for industrial inspection rely on
various sensors, such as lidar, stereo, and depth cameras, to
map environments for navigation and distance estimation.
However, little attention has been given to optimizing the
robot’s movement and positioning to enhance leak localiza-
tion once an anomaly, such as sound generated by a leak, is
detected by the microphones. Typically, these systems use
microphone arrays to estimate the direction of arrival (DOA)
of sound, combined with techniques like SLAM and lidar to
locate the leak. While multi-microphone array techniques can perform well from fixed positions, they often require more complex hardware, higher costs, and more physical space. In contrast, the proposed method achieves accurate
localization using a simpler hardware setup (two microphones)
with lower computational demands. This is accomplished by
strategically leveraging the robot’s mobility, effectively trading
hardware complexity and computational power for movement.
Specifically, this paper explores how motion, combined with
a two-microphone array inspired by an animal’s external
auditory system, enhances sound source localization. In other
words, we focus on optimizing the robot’s movement strat-
egies to acquire samples that enable precise sound source
localization while maintaining a lightweight system configu-
ration. First, we examine the effect of motion using a fixed-
base collaborative robotic arm. Based on the insights gained
from this study, we propose a motion strategy to enhance the
robot’s source localization. Finally, we implement this strategy
on a quadruped robot to explore its effectiveness for mobile
platforms.
Method
This section provides a comprehensive overview of our meth-
odology, covering the sound source localization approach,
acoustic data acquisition process, robotic systems, and experi-
mental setup.
Source Localization Method
The time difference of arrival (TDOA) between two signals is
commonly estimated by calculating a cross-correlation vector
(CCV). The Pearson correlation coefficient, as shown in Equation 1,
provides the simplest way of calculating the CCV (Rascon and
Meza 2017):
$$\mathrm{CCV}[\tau] = \frac{\sum_t \left(a_1[t]-\bar{a}_1\right)\left(a_2[t-\tau]-\bar{a}_2\right)}{\sqrt{\sum_t \left(a_1[t]-\bar{a}_1\right)^2}\,\sqrt{\sum_t \left(a_2[t-\tau]-\bar{a}_2\right)^2}} \tag{1}$$
where
$a_1$ and $a_2$ are the two discrete signals being compared,
$\tau$ is the time shift applied to $a_2$, and
$\bar{a}_1$ and $\bar{a}_2$ are the mean values of $a_1$ and $a_2$, respectively.
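As a minimal sketch of Equation 1 (not the authors' implementation), the Python snippet below evaluates CCV[τ] over a range of integer lags and returns the lag with the largest value. It assumes that the sum runs only over indices t where both a1[t] and a2[t − τ] exist; this edge handling and the helper names `ccv` and `tdoa_ccv` are our own illustrative choices.

```python
import math

def ccv(a1, a2, tau):
    """Pearson cross-correlation value CCV[tau] from Equation 1.

    Assumption: the sum runs over the indices t where both a1[t] and
    a2[t - tau] exist; the means are those of the full signals.
    """
    m1 = sum(a1) / len(a1)
    m2 = sum(a2) / len(a2)
    pairs = [(a1[t] - m1, a2[t - tau] - m2)
             for t in range(len(a1)) if 0 <= t - tau < len(a2)]
    num = sum(x * y for x, y in pairs)
    den = (math.sqrt(sum(x * x for x, _ in pairs))
           * math.sqrt(sum(y * y for _, y in pairs)))
    return num / den if den > 0 else 0.0

def tdoa_ccv(a1, a2, max_lag):
    """Lag (in samples) within [-max_lag, max_lag] that maximizes CCV."""
    return max(range(-max_lag, max_lag + 1), key=lambda tau: ccv(a1, a2, tau))
```

Under the convention of Equation 1, if a2 is a copy of a1 delayed by D samples, the peak occurs at τ = −D. Dividing the lag by the sampling rate gives the TDOA in seconds.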
Although computationally efficient and straightforward, CCV-based TDOA estimation is highly susceptible to environmental noise and reverberation, often leading to inaccurate estimates (Brandstein and Silverman 1997). To mitigate these challenges, the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method has become widely preferred because of its robustness against noise and reverberation (Knapp and Carter 1976). GCC-PHAT also performs well in high signal-to-interference ratio (SIR) environments, effectively handling interference from additional sources (Kwon et al. 2010). This characteristic aligns well with the experimental conditions of our study, in which high SIR levels are maintained despite background noise from air conditioners, laboratory equipment, and the robots themselves. These factors support the selection of GCC-PHAT for TDOA estimation in our setup.
ME | LEAK LOCALIZATION | MATERIALS EVALUATION, APRIL 2025
To provide a comprehensive comparison with alterna-
tive TDOA estimation algorithms, we also implemented the
coherence-based method (Carter 1987) and the Smoothed
Coherence Transform (SCOT) method (Carter et al. 1973). The
results reveal that TDOA estimates are consistent across all
methods, exhibiting a standard deviation of × 10⁻⁶ relative to
GCC-PHAT. However, in terms of computation time, SCOT and
the coherence-based method performed slightly better, with
average times of 0.050 s and 0.0519 s, respectively, compared to
GCC-PHAT’s 0.0685 s.
The GCC-PHAT operates in the frequency domain, as
shown in Equation 2:
$$\mathrm{CC}_F[f] = A_1[f]\,A_2[f]^{*} \tag{2}$$
where
$A_1$ and $A_2$ are the Fourier transforms of $a_1$ and $a_2$, respectively, and
the $\{\cdot\}^{*}$ operator denotes the complex conjugate.
By applying a weighting function $\psi[f]$, the generalized cross-correlation (GCC) is obtained, as shown in Equation 3:
$$\mathrm{GCC}_F[f] = \psi[f]\,A_1[f]\,A_2[f]^{*} \tag{3}$$
The phase transform (PHAT) weighting, given in Equation 4,
normalizes the magnitude of the cross-spectrum, preserving
phase information and enhancing robustness to amplitude
variations:
$$\psi[f] = \frac{1}{\left|A_1[f]\,A_2[f]^{*}\right|} \tag{4}$$
Finally, the GCC-PHAT function is expressed as:
$$\mathrm{PHAT}_F[f] = \frac{A_1[f]\,A_2[f]^{*}}{\left|A_1[f]\,A_2[f]^{*}\right|} \tag{5}$$
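Equations 2 through 5, together with the lag search that yields the TDOA, can be sketched in plain Python. This is an illustration, not the authors' code: it uses a hand-written radix-2 FFT only to stay dependency-free (a library FFT would normally be used), zero-pads both signals so that circular correlation equals linear correlation, and adds a small constant `eps` to the PHAT denominator to avoid division by zero, which is our assumption. The function names are hypothetical.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def ifft(X):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]

def gcc_phat(a1, a2, max_lag, eps=1e-12):
    """Lag (samples) in [-max_lag, max_lag] maximizing the GCC-PHAT output."""
    n = 1
    while n < len(a1) + len(a2):  # zero-pad so no circular wrap-around occurs
        n *= 2
    if max_lag >= n // 2:
        raise ValueError("max_lag too large for the padded length")
    A1 = fft(list(a1) + [0.0] * (n - len(a1)))
    A2 = fft(list(a2) + [0.0] * (n - len(a2)))
    cross = [u * v.conjugate() for u, v in zip(A1, A2)]  # Eq. 2: A1[f] A2[f]*
    phat = [c / (abs(c) + eps) for c in cross]           # Eqs. 4-5: keep phase only
    r = [v.real for v in ifft(phat)]                      # back to the lag domain
    # Negative lags wrap to the end of the array (index n + lag).
    return max(range(-max_lag, max_lag + 1), key=lambda lag: r[lag % n])
```

The lag convention matches Equation 1, so a copy of a1 delayed by D samples in a2 produces a peak at −D; dividing by the sampling rate (for example, 400 kHz) converts the lag to seconds.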
The TDOA is obtained as the time lag that maximizes the inverse Fourier transform of $\mathrm{PHAT}_F$. Once calculated, the TDOA can be used to determine the difference between the distances from the sound source to the two microphones:
$$\Delta d = c\,\Delta t \tag{6}$$
$$\Delta d = \sqrt{(x_2 - x)^2 + (y_2 - y)^2} - \sqrt{(x_1 - x)^2 + (y_1 - y)^2} \tag{7}$$
where
$c$ is the speed of sound (set to 343 m/s for this study),
$\Delta t$ denotes the TDOA,
$(x_1, y_1)$ and $(x_2, y_2)$ are the known positions of the microphones, and
$(x, y)$ is the unknown source position; the points satisfying Equation 7 form one branch of a hyperbola whose foci are the two microphones.
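To make Equations 6 and 7 concrete, the sketch below converts a TDOA into a range difference and then generates points on the corresponding hyperbola branch. It assumes a local frame with M1 at (−f, 0) and M2 at (+f, 0), that is, a microphone spacing of 2f; this frame and the function names are illustrative choices, not the paper's. Mapping the points into world coordinates would additionally require the robot pose, which is omitted here.

```python
import math

C = 343.0  # speed of sound (m/s), as set in this study

def range_difference(dt):
    """Equation 6: range difference (m) from a TDOA dt (s)."""
    return C * dt

def hyperbola_points(delta_d, f, ts):
    """Points (x, y) satisfying Equation 7 in an assumed local frame
    with M1 at (-f, 0) and M2 at (+f, 0).

    Every returned point satisfies dist(p, M2) - dist(p, M1) = delta_d.
    """
    a = delta_d / 2.0
    if abs(a) >= f:
        raise ValueError("|delta_d| must be smaller than the microphone spacing")
    b = math.sqrt(f * f - a * a)
    # x = -a*cosh(t) selects the branch nearer M1 when delta_d > 0
    # and nearer M2 when delta_d < 0.
    return [(-a * math.cosh(t), b * math.sinh(t)) for t in ts]
```

Because the range difference cannot exceed the microphone spacing, a measured |Δd| ≥ 2f signals an erroneous TDOA, which the sketch rejects.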
In two dimensions, SSL can be achieved by calculating two or more hyperbolas and identifying their intersection, as illustrated in Figure 1. To determine this intersection, we apply the nonlinear least-squares method to solve the resulting system of equations (Coleman and Li 1996; Levenberg 1944).
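The intersection step can be illustrated with a small Levenberg–Marquardt loop written in plain Python. This is not the solver used in the paper, whose implementation details are not given; the measurement layout (m1, m2, Δd), the damping schedule, and the names `residuals`, `jacobian`, and `localize` are hypothetical. Each microphone pair, possibly recorded at a different robot pose, contributes one residual based on Equation 7.

```python
import math

def residuals(p, meas):
    """Equation 7 residuals: (d2 - d1) - delta_d for each (m1, m2, delta_d)."""
    x, y = p
    return [math.hypot(m2[0] - x, m2[1] - y) - math.hypot(m1[0] - x, m1[1] - y) - dd
            for m1, m2, dd in meas]

def jacobian(p, meas):
    """Partial derivatives of each residual with respect to x and y."""
    x, y = p
    rows = []
    for m1, m2, _ in meas:
        d1 = math.hypot(m1[0] - x, m1[1] - y)
        d2 = math.hypot(m2[0] - x, m2[1] - y)
        rows.append(((x - m2[0]) / d2 - (x - m1[0]) / d1,
                     (y - m2[1]) / d2 - (y - m1[1]) / d1))
    return rows

def localize(meas, p0, iters=200, lam=1e-3, tol=1e-12):
    """Damped Gauss-Newton (Levenberg-Marquardt style) for a 2D source."""
    p = list(p0)
    r = residuals(p, meas)
    cost = sum(ri * ri for ri in r)
    for _ in range(iters):
        J = jacobian(p, meas)
        a = sum(jx * jx for jx, _ in J)
        b = sum(jx * jy for jx, jy in J)
        d = sum(jy * jy for _, jy in J)
        g0 = sum(jx * ri for (jx, _), ri in zip(J, r))
        g1 = sum(jy * ri for (_, jy), ri in zip(J, r))
        # Marquardt damping: scale the diagonal of J^T J by (1 + lam).
        a_l, d_l = a * (1.0 + lam), d * (1.0 + lam)
        det = a_l * d_l - b * b
        if det == 0.0:
            break
        # Solve the 2x2 system (J^T J + damping) [dx, dy] = -J^T r.
        dx = (-g0 * d_l + b * g1) / det
        dy = (-g1 * a_l + b * g0) / det
        trial = [p[0] + dx, p[1] + dy]
        r_trial = residuals(trial, meas)
        cost_trial = sum(ri * ri for ri in r_trial)
        if cost_trial < cost:      # accept step, trust the model more
            p, r, cost = trial, r_trial, cost_trial
            lam *= 0.3
        else:                      # reject step, damp more strongly
            lam *= 10.0
        if abs(dx) + abs(dy) < tol:
            break
    return p[0], p[1]
```

Collecting pairs from poses with different orientations spreads the hyperbola gradients and improves the conditioning of the fit, which is the intuition behind moving the sensors.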
In this study, we focused on two-dimensional source localization for the following reasons: (1) it simplifies the problem, allowing a clearer understanding of path-planning strategies that can improve source localization accuracy; (2) the source is assumed to be at approximately the same height as the moving platform; (3) the source is considered far enough away that the Z dimension is less significant than the X and Y dimensions; and (4) the platform's motion was limited, and sensor rotation about the X and Y axes was not accounted for.
Acoustical Data Acquisition
Gas leaks typically produce acoustic emissions at frequencies from 10 kHz to 100 kHz, with the largest energy difference between leak signals and ambient noise occurring around 40 kHz. This clear distinction from background noise makes 40 kHz well suited for gas leak detection. For this study, directional optical microphones were used because of their broad detection range (10 Hz to 1 MHz) and low self-noise (B. Fischer 2016; Delic 2019).
Data were collected using an NI data acquisition system capable of sampling at up to 20 MS/s per channel. By the Nyquist theorem, detecting 40 kHz components requires a sampling rate above 80 kHz; the system used a sampling rate of up to 400 kHz, providing ample margin for accurate signal representation. To simulate a gas leak, compressed air was released through an open valve.
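The sampling figures above can be checked with a short calculation. The snippet is illustrative and assumes that the TDOA is resolved to one sample with no sub-sample interpolation, which the text does not state; it derives the Nyquist minimum for 40 kHz, the oversampling factor at 400 kHz, and, through Equation 6 with c = 343 m/s, the range-difference change corresponding to one sample of lag.

```python
C = 343.0        # speed of sound (m/s), as set in this study
F_LEAK = 40e3    # leak frequency of interest (Hz)
FS = 400e3       # sampling rate used in this study (Hz)

nyquist_min = 2.0 * F_LEAK        # minimum sampling rate for a 40 kHz component (Hz)
oversampling = FS / nyquist_min   # factor above the Nyquist minimum
lag_step = 1.0 / FS               # one-sample TDOA step (s)
range_step = C * lag_step         # matching range-difference step via Eq. 6 (m)

print(f"Nyquist minimum: {nyquist_min / 1e3:.0f} kHz")
print(f"Oversampling factor: {oversampling:.1f}")
print(f"One-sample lag: {lag_step * 1e6:.2f} us, range step: {range_step * 1e3:.4f} mm")
```

At 400 kHz, one sample corresponds to 2.5 µs, or about 0.86 mm of range difference; finer resolution would require interpolating the correlation peak.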
[Figure 1 plot: hyperbolas formed with microphones M1 and M2, with legend entries Estimated location, Sound source, and Error; axes X (mm) and Y (mm).]
Figure 1. Sound source localization (SSL) using formed hyperbolas.