wireless sensor. The FDD was the only input to the models for feature extraction, which removed any reliance on temperature and associated the features strictly with the RNT. Seven different algorithms were considered: LR, decision trees, SVM, ensembles, GPR, and ANN, as well as kernel approximation methods. Alongside the base algorithms provided in MATLAB, variations in a few base parameters were also tested, including the kernel type for GPRs and SVMs and the number of hidden layers for ANNs. All the models and their parameter variations are listed in Table 1.
To compare the different algorithms, the input vector
consisted of the two full FDD amplitude directions, the corre-
sponding frequencies, and the temperature manually recorded
from the railhead probe at those excitations. The performance
of each model was determined by calculating the mean-
squared error (MSE) associated with the RNT:
(3) \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2

where
Y_i is the ground truth RNT,
\hat{Y}_i is the neutral temperature predicted by the algorithm, and
n is the total number of experimental measurements.
The MSE was chosen because its squared term penalizes outliers during the training procedure.
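As a concrete sketch, Equation 3 can be computed directly with NumPy; the `mse` helper and the sample values below are illustrative, not taken from the study:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean-squared error of Equation 3: the squared residual
    penalizes large errors (outliers) more heavily than an
    absolute-error metric would."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative values only: ground-truth RNT vs. model predictions (degrees C)
rnt_true = [32.0, 31.5, 33.0, 30.8]
rnt_pred = [31.5, 31.7, 32.4, 31.0]
print(mse(rnt_true, rnt_pred))
```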
TABLE 1
Types of machine learning algorithms tested with their variants

Model    | Type                            | Note
Linear   | Linear                          | Terms: linear
Linear   | Interactions                    | Terms: interactions
Linear   | Robust                          | Terms: linear, robust
Tree     | Fine                            | Minimum leaf size 4
Tree     | Medium                          | Minimum leaf size 12
Tree     | Coarse                          | Minimum leaf size 36
SVM      | Linear                          | Linear kernel
SVM      | Quadratic                       | Quadratic kernel
SVM      | Cubic                           | Cubic kernel
SVM      | Fine Gaussian                   | Gaussian kernel, kernel scale 6.6
SVM      | Medium Gaussian                 | Gaussian kernel, kernel scale 26
SVM      | Coarse Gaussian                 | Gaussian kernel, kernel scale 110
GPR      | Rational quadratic              | Rational quadratic kernel, constant basis
GPR      | Squared exponential             | Squared exponential kernel, constant basis
GPR      | Matern 5/2                      | Matern 5/2 kernel, constant basis
GPR      | Exponential                     | Exponential kernel, constant basis
Ensemble | Boosted trees                   | Minimum leaf size 8, 30 learners, 0.1 learning rate
Ensemble | Bagged trees                    | Minimum leaf size 8, 30 learners
ANN      | Narrow                          | 1 layer, ReLU activation, 10 nodes
ANN      | Medium                          | 1 layer, ReLU activation, 25 nodes
ANN      | Wide                            | 1 layer, ReLU activation, 100 nodes
ANN      | Bilayered                       | 2 layers, ReLU activation, 10 nodes each
ANN      | Trilayered                      | 3 layers, ReLU activation, 10 nodes each
Kernel   | SVM kernel                      | SVM kernel learner
Kernel   | Least-squares kernel regression | Least-squares kernel learner

JANUARY 2024 | MATERIALS EVALUATION | 71

The RNT was chosen as the target instead of the stress due to the reliance on rail temperature in Equation 2. During feature
extraction, frequencies could then be constrained to those
associated with the RNT directly instead of the stress through
Equation 2. Removing the reliance on rail temperature additionally tests whether the impact of rail temperature can be modelled adequately using only PSD features, as opposed to providing it explicitly. It is noted here that for feature extraction, models included only the lateral FDD as input to demonstrate performance using a single direction.
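The study ran its models in MATLAB; purely as an illustration, a rough Python analogue of the comparison loop, with scikit-learn stand-ins for a few of the Table 1 variants and synthetic data in place of the FDD features, might look like:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic stand-ins for the real inputs (FDD amplitudes, corresponding
# frequencies, railhead temperature) and for the RNT target -- none of
# these values come from the study.
X = rng.normal(size=(60, 5))
y = 30.0 + X @ rng.normal(size=5) + rng.normal(scale=0.1, size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# A few Table 1 variants mapped onto approximate scikit-learn equivalents
models = {
    "Linear": LinearRegression(),
    "Tree (fine, min leaf 4)": DecisionTreeRegressor(min_samples_leaf=4, random_state=0),
    "SVM (cubic kernel)": SVR(kernel="poly", degree=3),
    "GPR (squared exponential)": GaussianProcessRegressor(kernel=RBF(), random_state=0),
}

# Score each candidate by the MSE of Equation 3 on held-out data
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = mean_squared_error(y_te, model.predict(X_te))

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MSE = {score:.4f}")
```

The scikit-learn hyperparameters only approximate the MATLAB presets (for example, `min_samples_leaf` stands in for MATLAB's minimum leaf size); they are not a one-to-one port.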
The mRMR and the NCA algorithms were used to identify
the bandwidths most sensitive to stress and boundary condi-
tions. The first algorithm maximizes the relevance and mini-
mizes the redundancy of a set of data with respect to a certain
variable. This is done by using the maximum mutual informa-
tion quotient (MIQ):
(4) \max_{x \in S_c} \mathrm{MIQ}_x = \max_{x \in S_c} \frac{I(x, y)}{\frac{1}{|S|}\sum_{z \in S} I(x, z)}

where
S is the set of features (so |S| is the number of features),
S_c is the set of candidate features,
y is the target variable (here, the RNT), and
I represents the mutual information.
In this study, |S| = 700 to start because the PSD from which
the bandwidths are going to be identified spans from 0 to
700 Hz at 1 Hz frequency resolution. The mutual information I
measures how much the uncertainty of a random variable can
be reduced by knowing another variable. This is determined by
the following equation:
(5) I(X, Z) = \sum_{i,j} P(X = x_i, Z = z_j)\,\log\frac{P(X = x_i, Z = z_j)}{P(X = x_i)\,P(Z = z_j)}
where
X and Z are random variables.
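For discrete distributions, Equation 5 can be sketched directly; the `mutual_information` helper and the joint probability table below are illustrative, not from the study:

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X, Z) of Equation 5, computed from a discrete
    joint probability table P(X = x_i, Z = z_j) (rows: x_i, columns: z_j)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal P(X = x_i)
    pz = joint.sum(axis=0, keepdims=True)   # marginal P(Z = z_j)
    mask = joint > 0                        # 0 * log(0) terms contribute nothing
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px * pz)[mask])))

# Independent variables: the joint factorizes into the marginals,
# so I(X, Z) is zero (up to floating-point rounding)
independent = np.outer([0.5, 0.5], [0.3, 0.7])
print(mutual_information(independent))
```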
If the two variables are independent, I(X, Z) is equal to
zero. Therefore, to determine relevance and redundancy in the
PSD spectra, the mRMR algorithm is used to maximize the
mutual information between a frequency and RNT whereas it
minimizes the mutual information between neighboring fre-
quencies. NCA is a nonparametric method that seeks to obtain
features that maximize the prediction accuracy of a regression
problem, and it serves as an alternative method for determining the most relevant features. This is accomplished by using
gradient descent to learn a distance metric by finding a linear
transformation of the input space. The distance metric takes
the following form:
(6) d_w(x_i, x_j) = \sum_{z=1}^{P} w_z^2\,|x_{iz} - x_{jz}|

where
P represents the number of features (frequencies),
w_z is the weight of feature z, and
x_{iz} and x_{jz} are the values of feature z for observation x_i and a randomly chosen reference observation x_j.
The weight vector w is optimized by treating the choice of reference observation as a probability distribution, which allows the residual error between prediction and target to be minimized with optimization techniques such as gradient descent. The distance metric can then be used in the objective function, where the residual error between predicted and actual values is weighted by the probability of a sample belonging to the reference point. More detailed implementations of both methods can be found in the literature (Ding and Peng 2005; Yang et al. 2012).
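For concreteness, the weighted distance of Equation 6 can be sketched as follows; the weights and observations are illustrative:

```python
import numpy as np

def nca_distance(x_i, x_j, w):
    """Weighted distance of Equation 6:
    d_w(x_i, x_j) = sum_z w_z^2 * |x_iz - x_jz|.
    Features whose learned weight w_z is near zero drop out of the
    metric, which is what makes the weights usable as importances."""
    x_i, x_j, w = (np.asarray(a, dtype=float) for a in (x_i, x_j, w))
    return float(np.sum(w ** 2 * np.abs(x_i - x_j)))

# Illustrative: three features, the second carries most of the weight,
# the third is ignored entirely
w = np.array([0.1, 1.5, 0.0])
print(nca_distance([1.0, 2.0, 3.0], [1.5, 1.0, 9.0], w))
```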
Feature Extraction and RNT Prediction
mRMR was performed on all 700 features to gauge the impor-
tance across different frequencies. The results are presented in
Figure 5a, where the importance is mapped against each individual frequency. Because mRMR applies a mutual information quotient, normalized by the total mutual information with respect to the other variables (i.e., frequencies), importance is a value between 0 and 1, where a higher value indicates a more important feature. The locations of the most significant frequencies
are emphasized in Figure 5b. Here, the FDDs of all the training and validation signals (35 + 15) are plotted, and the color map indicates the normalized amplitude of each signal. The dashed lines signify the 30 most relevant features (i.e., frequencies). The
figure demonstrates that many extracted features coincide with
or are close to the peaks. In addition, the features align with very
low frequency information (i.e., 0–130 Hz).
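The greedy MIQ selection of Equation 4 can be sketched as follows; the `mrmr_rank` helper and the tiny relevance/redundancy tables are illustrative, not the study's implementation:

```python
import numpy as np

def mrmr_rank(relevance, redundancy, k):
    """Greedy MIQ ranking (Equation 4): at each step, pick the candidate
    feature whose relevance I(x, y), divided by its mean redundancy
    I(x, z) against the already-selected set, is largest.

    relevance:  (n_features,) array of I(x, y) values against the target
    redundancy: (n_features, n_features) array of pairwise I(x, z) values
    """
    n = len(relevance)
    selected = [int(np.argmax(relevance))]        # seed with the most relevant feature
    while len(selected) < k:
        candidates = [f for f in range(n) if f not in selected]
        def miq(f):
            mean_red = np.mean([redundancy[f, s] for s in selected])
            return relevance[f] / max(mean_red, 1e-12)  # guard against divide-by-zero
        selected.append(max(candidates, key=miq))
    return selected

# Tiny illustrative problem: feature 2 is relevant but highly redundant
# with feature 0, so the greedy loop should avoid it
relevance = np.array([0.9, 0.2, 0.8, 0.5])
redundancy = np.array([
    [1.0, 0.1, 0.9, 0.1],
    [0.1, 1.0, 0.1, 0.1],
    [0.9, 0.1, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])
print(mrmr_rank(relevance, redundancy, k=3))
```

In this toy example the redundant feature (index 2) is skipped despite its high relevance, which is the behavior that lets mRMR spread its selections away from a single peak.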
Similar to Figures 5a and 5b, Figures 6a and 6b present the
results obtained using NCA, which shares the same scope of
identifying the importance within a certain set of data but using
a different mathematical formulation with respect to mRMR.
NCA’s ordinate axis is represented by the weights associated with
each feature, enabling importances greater than one. Similar to
what was determined with mRMR, some of the most important frequencies lie around 50 Hz. This peak corresponds to the
so-called mode B, a translational mode in which the rail under-
goes a lateral deformation as a rigid body. This mode is strongly
affected by the boundary conditions and is highly governed by
the rail mass and the lateral resistance of the ties and fasteners.
One noticeable difference between Figure 5b and Figure 6b is
that the relevant features determined with the NCA fall on the
peaks of the PSDs, whereas the relevant features determined with
mRMR reside around the valleys between consecutive peaks.
This is shown in Figure 7 for the peak around 350 Hz. Features
from both also tended to align on the 500 Hz peak, which is
linked to the so-called mode E, a temperature-dependent mode
of vibration. These results are not surprising. The ground truth
RNT displayed in Figure 2 demonstrates that the boundary con-
ditions change daily with the ambient temperatures.
Finally, the following analyses were performed to quantify the advantage of using a few frequencies with respect to the full 0–700 Hz spectrum. Four feature sets were considered: one with all 700 frequencies, and three with the 100, 30, and 20 most relevant features, respectively. The latter
three were extracted with mRMR. The results are presented in
Figure 8, where the mean absolute error (MAE) between the
ME | RAILROADS