the reliance on rail temperature in Equation 2. During feature
extraction, frequencies could then be constrained to those
associated with the RNT directly instead of the stress through
Equation 2. Removing the reliance on rail temperature also
tests whether the impact of rail temperature can be modeled
adequately using only PSD features, without providing the
temperature explicitly. Note that for
feature extraction, models included only the lateral FDD as
input to demonstrate performance using a single direction.
The mRMR and the NCA algorithms were used to identify
the bandwidths most sensitive to stress and boundary condi-
tions. The first algorithm maximizes the relevance and mini-
mizes the redundancy of a set of data with respect to a certain
variable. This is done by using the maximum mutual informa-
tion quotient (MIQ):
(4) $$\max_{x \in S^{c}} \mathrm{MIQ}_{x} = \max_{x \in S^{c}} \frac{I(x, y)}{\frac{1}{|S|} \sum_{z \in S} I(x, z)}$$
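where
S is the set of features and |S| is the number of features in it,
S^c is the set of features not in S,
x is a candidate feature, y is the response variable (here, the RNT), and
I represents the mutual information.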
In this study, the number of features is 700 to start because the PSD from which
the bandwidths are identified spans from 0 to
700 Hz at 1 Hz frequency resolution. The mutual information I
measures how much the uncertainty of a random variable can
be reduced by knowing another variable. This is determined by
the following equation:
(5) $$I(X, Z) = \sum_{i,j} P(X = x_i, Z = z_j) \log \frac{P(X = x_i, Z = z_j)}{P(X = x_i)\, P(Z = z_j)}$$
where
X and Z are random variables.
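As a rough illustration of Equations 4 and 5, the following sketch estimates mutual information from discrete samples and then performs a greedy MIQ selection. It is not the implementation used in this study: the function names, the toy bit-valued features, and the small constant `eps` that guards against a zero redundancy term are all assumptions made for the example. Real PSD amplitudes are continuous and would first need binning or a continuous mutual information estimator.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(xs, zs):
    """Discrete mutual information I(X, Z) in nats (Equation 5)."""
    n = len(xs)
    joint = Counter(zip(xs, zs))
    count_x = Counter(xs)
    count_z = Counter(zs)
    total = 0.0
    for (x, z), count in joint.items():
        # P(x, z) * log(P(x, z) / (P(x) * P(z))), written with raw counts.
        total += (count / n) * math.log(count * n / (count_x[x] * count_z[z]))
    return total


def mrmr_miq(features, target, k, eps=1e-12):
    """Greedy MIQ selection (Equation 4).

    `features` maps a feature name to its list of discrete samples.
    `eps` avoids division by zero when redundancy is exactly zero
    (an assumption of this sketch, not part of Equation 4).
    """
    k = min(k, len(features))
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    # Start with the single most relevant feature.
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < k:
        scores = {}
        for name, col in features.items():
            if name in selected:
                continue
            # Mean mutual information with the already selected features.
            redundancy = sum(mutual_information(col, features[s])
                             for s in selected) / len(selected)
            scores[name] = relevance[name] / max(redundancy, eps)
        selected.append(max(scores, key=scores.get))
    return selected


# Hypothetical toy data: every combination of three independent bits.
bits = list(product([0, 1], repeat=3))
bin_a = [a for a, b, c in bits]
bin_b = [b for a, b, c in bits]
bin_noise = [c for a, b, c in bits]
target = [2 * b + a for a, b, c in bits]  # depends on bin_a and bin_b only

features = {
    "bin_a": bin_a,
    "bin_a_copy": list(bin_a),  # redundant duplicate of bin_a
    "bin_b": bin_b,
    "bin_noise": bin_noise,
}
selected = mrmr_miq(features, target, k=2)
print(selected)  # ['bin_a', 'bin_b']
```

In this toy case the duplicate feature is as relevant as the original, but its redundancy with the already selected feature lowers its quotient, so the second pick is the independent informative feature.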
If the two variables are independent, I(X, Z) is equal to
zero. Therefore, to determine relevance and redundancy in the
PSD spectra, the mRMR algorithm maximizes the
mutual information between a frequency and the RNT while
minimizing the mutual information between neighboring frequencies.
NCA is a nonparametric method that seeks
features that maximize the prediction accuracy of a regression
problem, and it serves as an alternative method for determining the
most relevant features. This is accomplished by making use of
gradient descent to learn a distance metric by finding a linear
transformation of the input space. The distance metric takes
the following form:
(6) $$d_w(x_i, x_j) = \sum_{z=1}^{P} w_z^{2} \, |x_{iz} - x_{jz}|$$
where
P represents the number of features (frequencies),
w_z is the weight of feature z, and
x_iz and x_jz are the values of feature z for observation i and for a
randomly selected reference observation j.
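A minimal sketch of the distance in Equation 6 is shown below. The function name and the numeric values are illustrative assumptions only.

```python
def weighted_distance(w, x_i, x_j):
    """Equation 6: sum over features z of w_z**2 * |x_iz - x_jz|."""
    if not (len(w) == len(x_i) == len(x_j)):
        raise ValueError("w, x_i, and x_j must have the same length")
    return sum(wz ** 2 * abs(a - b) for wz, a, b in zip(w, x_i, x_j))


# Hypothetical values for three features.
w = [2.0, 0.0, 1.0]
x_i = [1.0, 5.0, 3.0]
x_j = [0.0, -4.0, 1.0]
d = weighted_distance(w, x_i, x_j)  # 4*1 + 0*9 + 1*2 = 6.0
```

A feature with zero weight, such as the second one here, contributes nothing to the distance no matter how far apart the two observations are along it.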
The weight vector w is optimized by treating the choice of reference
observation as a probability distribution, which allows the residual
error between a prediction and its target to be minimized with
optimization techniques such as gradient descent. The distance
metric can then be utilized in the objective function where the
residual error between predicted and actual is weighted by the
probability of a sample belonging to the reference point. More
detailed descriptions of both methods can be found in the
literature (Ding and Peng 2005; Yang et al. 2012).
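The NCA weighting described above can be sketched as follows, under several assumptions that are not taken from this study: an exponential kernel with width `sigma`, an absolute-error loss, a ridge penalty `lam` on the weights, a finite-difference gradient, and a simple step-halving rule. The data are a hypothetical six-sample toy set in which the first feature tracks the target and the second is a scrambled distractor.

```python
import math


def weighted_distance(w, x_i, x_j):
    """Equation 6: sum over features z of w_z**2 * |x_iz - x_jz|."""
    return sum(wz ** 2 * abs(a - b) for wz, a, b in zip(w, x_i, x_j))


def nca_regression_loss(w, X, y, sigma=1.0, lam=0.01):
    """Mean expected absolute error when each sample is predicted from a
    randomly drawn reference point, plus a ridge penalty on the weights."""
    n = len(X)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        dists = [weighted_distance(w, X[i], X[j]) for j in others]
        d_min = min(dists)  # shift keeps the exponentials well scaled
        kernels = [math.exp(-(d - d_min) / sigma) for d in dists]
        norm = sum(kernels)
        # Residuals weighted by the probability of picking each reference.
        total += sum(k / norm * abs(y[i] - y[j])
                     for k, j in zip(kernels, others))
    return total / n + lam * sum(wz ** 2 for wz in w)


def numerical_gradient(f, w, eps=1e-6):
    """Central finite-difference gradient (an assumption of this sketch)."""
    grad = []
    for r in range(len(w)):
        up, down = list(w), list(w)
        up[r] += eps
        down[r] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def gradient_descent(f, w0, step=0.5, iters=50):
    """Plain gradient descent that halves the step when a move does not help."""
    w, best = list(w0), f(w0)
    for _ in range(iters):
        grad = numerical_gradient(f, w)
        trial = [wz - step * g for wz, g in zip(w, grad)]
        trial_loss = f(trial)
        if trial_loss < best:
            w, best = trial, trial_loss
        else:
            step *= 0.5
    return w, best


# Hypothetical toy data: feature 0 equals the target, feature 1 is scrambled.
X = [[float(t), float(s)] for t, s in zip(range(6), [3, 0, 4, 1, 5, 2])]
y = [float(t) for t in range(6)]


def loss(w):
    return nca_regression_loss(w, X, y)


loss_informative = loss([1.0, 0.0])  # weight only on the informative feature
loss_noise = loss([0.0, 1.0])        # weight only on the distractor
initial_loss = loss([0.5, 0.5])
w_opt, final_loss = gradient_descent(loss, [0.5, 0.5])
print(loss_informative, loss_noise, w_opt, final_loss)
```

Placing all of the weight on the informative feature gives a lower loss than placing it on the distractor, which is the behavior that makes the learned weights usable as feature importances.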
Feature Extraction and RNT Prediction
mRMR was performed on all 700 features to gauge the impor-
tance across different frequencies. The results are presented in
Figure 5a, where the importance is mapped against each indi-
vidual frequency. As mRMR is a mutual information quotient
that applies normalization by the total mutual information with
respect to other variables (i.e., frequencies), importance is a
value between 0 and 1, where a higher value indicates a more
important frequency. The locations of the most significant frequencies
are emphasized in Figure 5b. Here the FDD of all the training
and validation signals (35 + 15) are plotted, and the color map
indicates the normalized amplitude of each signal. The dashed
lines signify the 30 most relevant features (i.e., frequencies). The
figure demonstrates that many extracted features coincide with
or are close to the peaks. In addition, the features align with very
low frequency information (i.e., 0–130 Hz).
Similar to Figures 5a and 5b, Figures 6a and 6b present the
results obtained using NCA, which shares the same scope of
identifying the importance within a certain set of data but using
a different mathematical formulation with respect to mRMR.
The ordinate of the NCA plot is the weight associated with
each feature, so importance values can exceed one. Similar to
what was determined with mRMR, some of the most important
frequencies lie around 50 Hz. This peak corresponds to the
so-called mode B, a translational mode in which the rail under-
goes a lateral deformation as a rigid body. This mode is strongly
affected by the boundary conditions and is highly governed by
the rail mass and the lateral resistance of the ties and fasteners.
One noticeable difference between Figure 5b and Figure 6b is
that the relevant features determined with the NCA fall on the
peaks of the PSDs, whereas the relevant features determined with
mRMR reside around the valleys between consecutive peaks.
This is shown in Figure 7 for the peak around 350 Hz. Features
from both also tended to align on the 500 Hz peak, which is
linked to the so-called mode E, a temperature-dependent mode
of vibration. These results are not surprising because the ground truth
RNT displayed in Figure 2 demonstrates that the boundary conditions
change daily with the ambient temperature.
Finally, the following analyses were performed to quantify
the advantages of using a few frequencies instead of
the full 0–700 Hz spectrum. Four feature sets were considered:
one with all 700 frequencies, and three with the 100,
30, and 20 most relevant features, respectively. The latter
three were extracted with mRMR. The results are presented in
Figure 8, where the mean absolute error (MAE) between the
[ME | RAILROADS. Materials Evaluation, January 2024]
[Figure 5 image not reproduced. Both panels use a Frequency (Hz) axis from 0 to 700 Hz; the importance scale in panel (a) spans 0 to 1. Frequency labels recovered from the figure, in Hz: 51, 122, 56, 58, 288, 399, 375, 495, 482, 674, 640, 487, 321, 132, 134, 123, 43, 42, 41, 67, 64, 136, 196, 198, 116, 113, 97, 81, 57, 85.]
Figure 5. (a) Features extracted via the mRMR method; a higher amplitude means higher importance. (b) Top 30 mRMR features
overlaid on the FDD signals in the training set. Most of the features align with, or fall within the bandwidth of, well-established peaks.
[Figure 6 image not reproduced. Both panels use a Frequency (Hz) axis from 0 to 700 Hz; the weight scale in panel (a) spans 0 to 3. Frequency labels recovered from the figure, in Hz: 51, 56, 57, 53, 58, 54, 52, 64, 61, 146, 245, 350, 341, 349, 353, 244, 476, 500, 508, 505, 506, 509, 502, 205, 199, 65, 60, 55, 151, 49, 50.]
Figure 6. (a) NCA feature importance; a higher score means a higher weight and therefore higher importance. (b) Top 30 NCA features
overlaid as dashed lines on the training set. NCA localizes the relevant RNT-related peaks in the PSD
spectra well.
[Figure 7 image not reproduced. Frequency (Hz) axis from 280 to 420 Hz; scale from 0 to 1.]
Figure 7. Zoomed-in view of the 350 Hz peak, where the dashed lines
correspond to the locations of features extracted with mRMR. Solid lines are
features extracted with NCA on the same peak.
[Figure 8 image not reproduced. Models compared: LR, Tree, SVM, Ensemble, GPR, ANN, and Kernel. Legend: 700 features, 100 features, 30 features, 20 features. Axis ticks run from 0 to 3.]
Figure 8. Algorithm performance comparison using a select number of mRMR
features.
[Axis labels recovered from the figures: Signal number, Importance, MAE (°C).]