Abstract
The automatic identification of radionuclides is essential for remote, unmanned monitoring and the rapid detection of radioactive contamination. While deep learning techniques have significantly enhanced recognition accuracy, their "black-box" nature and reliance on large-scale datasets pose challenges. These issues include poor interpretability, high overfitting risk, and uncontrollable errors, all of which limit their use in high-reliability fields such as the nuclear industry. This paper introduces FhyMetric-Net, a novel interpretable model for mixed radionuclide identification that integrates physical priors and feature metric constraints. The core innovations of this work include: (1) Embedding radionuclide characteristic peak physical information into neural networks in a differentiable manner, for the first time. This technique constrains the feature optimization space, enhancing both the reliability and interpretability of the model. (2) Proposing a feature space metric constraint method for mixed radionuclide samples to improve the model's ability to extract discriminative features. During training, we used only 1% of the sample size of the test dataset. In extensive performance tests involving various detector types and more complex radionuclide mixtures, FhyMetric-Net achieved a 97.589% F1 Score. This result places it at the forefront of the field, with a parameter count of just 0.051032M, only 1.316% of ResNet-18. Ablation studies revealed that the physical prior constraints and feature metric constraints played a significant role in improving Precision and Recall, respectively. Qualitative analysis of the model's weights further confirmed its interpretability. This work advances the application of automated mixed radionuclide identification in high-reliability scenarios within the nuclear industry.
FhyMetric-Net: Interpretable mixed radioisotope identification model integrating prior characteristic peak physical information and feature metric constraints
Cao-Lin Zhang, Jiang-Mei Zhang, Hao-Lin Liu, Guo-Wei Yang, Jia-Qi Wang, and Rui Tang
1 School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China
2 CAEA Innovation Center of Nuclear Environmental Safety Technology, Southwest University of Science and Technology, Mianyang 621010, China
Keywords
Radioisotope Identification, Deep Learning, Physical Interpretability, Feature Metric Constraints, Interdisciplinary
INTRODUCTION
Radioisotope identification (RIID) is a key technology for detecting radioactive contamination and identifying radioactive sources, and it plays a crucial role in nuclear safety and environmental protection [ ]. Traditional RIID methods mainly search for characteristic peaks based on derivative, Fourier transform, or wavelet analysis techniques, and match them against a predefined nuclide library (containing key information such as energy and branching ratio) to determine the type of nuclide [ ]. However, due to factors such as statistical fluctuations and peak overlapping, these methods typically require manual iterative adjustment of parameters to achieve optimal noise smoothing and peak-finding performance. This process is not only cumbersome but also susceptible to human factors, which limits the efficiency and accuracy of identification.
In recent years, methods based on Deep Learning (DL), which can automatically adjust parameters to extract deep features from energy spectra, have been widely studied. Existing research results generally indicate that DL models have significant advantages in identification accuracy when facing challenges such as low count rates and low SNR [ ]. The mainstream DL methods rely on large-scale training data and deep neural networks (NNs) to create an end-to-end mapping between input energy spectra and output probabilities, achieving high recognition accuracy. In other words, the performance of these methods is typically determined by the scale of the training data [17].
Supported by the Sichuan Province "Open competition to select the best candidate" Project (No. 24zs9102).
However, the shape of radioactive nuclide energy spectra is influenced by random factors, such as environmental background and noise, making it challenging to obtain large-scale real energy spectrum samples. RIID tasks clearly exhibit the characteristics of "few-shot + multi-label" problems.
So far, purely data-driven methods have faced challenges in other fields, including high data acquisition costs, poor interpretability, and susceptibility to noise, as seen in cancer, fusion plasma, earthquakes, weather, and climate change. The key challenge lies in the fact that, without appropriate constraints, the high dimensionality of the data leads to an excessively large model search space [ ]. Due to the black-box nature of NN models, issues such as poor interpretability, high overfitting risk, and uncontrollable errors are inevitable. These issues limit the widespread application of such methods in the nuclear industry, particularly in high-reliability scenarios.
A key issue that needs to be addressed is how to enable NNs to learn in a way that aligns with human expert knowledge and exhibits strong generalization ability, even with insufficient data samples. Data-knowledge dual-driven methods have brought transformative innovations in DL [ ]. These methods integrate domain-specific prior knowledge into the learning process, effectively constraining the model's hypothesis space [ ]. Research shows that this approach can reduce data requirements, improve generalization capabilities [ ], and enhance interpretability [ ]. We discuss recent studies in two main areas: the exploration of model interpretability and methods for enhancing model reliability.
Researchers have studied the interpretability of NN models to understand the reasons behind their high identification accuracy. Mario Gomez-Fernandez et al. [ ] showed the connection between the region of interest (ROI) and the physical characteristics (e.g., photoelectric peaks) using thermal vector maps. Yu Wang et al. [ ] used Class Activation Mapping (CAM) to visualize and explain the key regions learned by the model.
Findings from these studies suggest that effective classification models tend to concentrate their learning on physically meaningful regions of the gamma-ray spectrum, such as characteristic photopeaks, rather than on irrelevant areas like background noise. This observation underscores the necessity of establishing explicit constraints between model predictions and spectral regions that embody physical characteristics. For instance, the presence of a full-energy peak at 661.7 keV is a strong indicator of the radionuclide 137Cs. Embedding such relationships into model training has the potential to simultaneously enhance both the reliability and physical interpretability of NNs.
Some researchers have worked to improve the robustness of the model. Zakariya Chaouai et al. [ ] improved model robustness by using adversarial learning techniques to reduce the likelihood of the NN being misled. Hao-Lin Liu et al. [ ] performed classification by integrating local and global characteristics with a deep convolutional neural network (CNN), enhancing intra-class similarity and inter-class differences. For RIID in urban environments [ ], a detection model based on a weighted k-nearest neighbors (KNN) framework was developed to extract discriminative features from energy spectra and minimize inter-class similarity for the same radioactive material.
Nevertheless, current approaches remain largely data-driven, with limited incorporation of domain-specific physical knowledge. In particular, these methods often overlook the probabilistic constraints that link the presence of radionuclides with specific spectral features, resulting in a lack of intuitive physical explainability.
Attention has become a key concept in the rapidly advancing field of DL. In computer vision, salient visual attention models have been shown to extract low-level visual features to identify potential key regions [ ]. In multi-label image classification tasks, attention has been proven to effectively capture semantic and spatial relationships between labels in images [ ]. These studies align with RIID, as each category of radionuclide is associated with specific characteristics of the energy spectrum region, such as the energy and intensity of full-energy peaks.
Jiaqian Sun et al. [ ] presented an NN model that combines convolution and self-attention mechanisms. Wang et al. [33] proposed a DL-based method for recognizing multiple radioactive nuclides using a channel attention module. This model explains how it utilizes feature information from the photopeak and Compton edge by interpreting spectral features. However, existing methods are purely data-driven and do not incorporate prior physical information to establish constraints between prediction results and the physical characteristic regions in the energy spectrum.
To address these issues, we have developed a novel model that combines physical constraints with data-driven approaches for mixed RIID. The model integrates prior physical information about the radionuclide characteristic peaks with feature space metric constraints, and can effectively capture the latent relationship between radionuclide presence probabilities and characteristic peaks through a learnable convolutional network, offering intuitive physical interpretability.
The specific contributions are outlined as follows:
(1) A residual network incorporating multi-scale dilated convolutions was utilized for energy spectrum feature extraction, allowing the model to efficiently integrate both local details and global spatial semantic features.
(2) The optimization space of NN weights was constrained by prior information on radionuclide physical characteristic peaks, thereby enhancing the reliability and interpretability of the model.
(3) A feature metric constraint method tailored for the multi-label classification (MLC) of mixed radionuclides was proposed, constructing a discriminative feature space.
(4) A clear and intuitive computational process was established that links radionuclide prediction probabilities with feature weights, enhancing the causal logic and physical interpretability of the model inference.
MATERIALS AND METHODS
Backbone Network
The overall architecture of the proposed model, shown in Fig. , consists of four components: the backbone network, prior physical constraints, feature metric constraints, and MLC.
Deep features of the energy spectrum were extracted using a one-dimensional CNN. Delicate local features need to be extracted for stacked, narrow or weak peaks. However, some radionuclide characteristic peaks, such as those of Eu, are distributed across multiple energy ranges, necessitating the extraction of global spatial semantic features.
A novel NN architecture called Multi-Scale Dilated Residual Network (MSDR-Net) was proposed. This architecture integrates local detail features and global spatial semantic features in the energy spectrum, effectively addressing the vanishing gradient problem while maintaining a lightweight structure. The model input is a one-dimensional vector of the gamma energy spectrum, X ∈ R^(1×1024). Each radionuclide category was labeled with a binary sequence [p_1, . . . , p_C], where p_i = 1 if category i is present, and p_i = 0 otherwise. Additionally, each energy spectrum sample was normalized to a maximum value of 1 to ensure consistent scaling across the dataset. Three parallel residual modules then process the input features at multiple scales using different dilation rates. Dilated convolutions expand the receptive field by inserting gaps between kernel elements, without increasing the number of parameters.
The output of the dilated convolution can be expressed as:

Y[i] = ( Σ_{k=0}^{K−1} W[k] · X[i + r · (k − ⌊K/2⌋)] ) + b   (1)

The receptive field is defined by the following formula:

Receptive Field = (K − 1) × r + 1   (2)

where Y is the output feature, X is the input feature, W[k] is the convolution kernel, b is the bias, K = 3 is the kernel size, and r = 4, 6, 8 are the dilation rates. Each module captured contextual information from distinct energy ranges using two layers of dilated convolution and adjusted the channel dimension with a convolution.
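As an illustrative sketch of Eqs. (1) and (2), the following pure-Python functions (our own minimal helpers, not the authors' implementation, which uses learned kernels over 1024-channel spectra) compute one "same"-padded dilated convolution output and the corresponding receptive field:

```python
def dilated_conv1d(x, w, b, r):
    """Eq. (1): Y[i] = sum_k W[k] * X[i + r*(k - floor(K/2))] + b,
    with zero-padding for taps that fall outside the input."""
    K = len(w)
    y = []
    for i in range(len(x)):
        acc = b
        for k in range(K):
            j = i + r * (k - K // 2)  # dilated tap position
            if 0 <= j < len(x):
                acc += w[k] * x[j]
        y.append(acc)
    return y

def receptive_field(K, r):
    """Eq. (2): one dilated layer spans (K - 1) * r + 1 input channels."""
    return (K - 1) * r + 1
```

With the paper's settings (K = 3, r = 4, 6, 8), a single layer sees 9, 13, or 17 channels respectively, which is how the three branches cover different energy ranges at the same parameter cost.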
The Leaky ReLU activation function was applied in this paper, which is defined as follows:

σ(x) = { x,  x ≥ 0;  kx,  x < 0 }   (3)

where k is a small constant (0.15 in this study) that allows a small gradient to flow through negative inputs, preventing neurons from becoming inactive. It preserves the nonlinear activation characteristics and enhances the stability of gradient propagation.
To accelerate model convergence and reduce overfitting, a Batch Normalization layer was applied after each convolutional layer, as expressed by the following formula:

y = (x − E[x]) / √(Var[x] + ϵ) · γ + β   (4)

where E[x] denotes the mean of the input data, Var[x] represents the variance, and ϵ = 1e−5 is introduced to prevent division by zero. The learnable parameters γ and β are introduced to restore the representational capacity of the model, allowing flexible adjustments to the output distribution of each layer.
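A minimal sketch of the normalization step, assuming a single 1-D batch of activations (the `gamma`/`beta` arguments stand in for the learnable parameters; in the real network they are per-channel tensors):

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean / unit variance, then rescale:
    y = gamma * (x - E[x]) / sqrt(Var[x] + eps) + beta."""
    m = sum(xs) / len(xs)                          # E[x]
    var = sum((x - m) ** 2 for x in xs) / len(xs)  # Var[x]
    return [gamma * (x - m) / math.sqrt(var + eps) + beta for x in xs]
```

With gamma = 1 and beta = 0 the output batch has (approximately) zero mean and unit variance; the learnable parameters then let each layer shift and scale this distribution as training requires.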
Each branch output was padded to a fixed length of 1024 using dilated convolutions. The outputs were then concatenated along the channel dimension into 384-dimensional features, which were assigned to the number of classes by a convolutional layer, resulting in a class feature F_MSDR ∈ R^(C×1024):

F_MSDR = f_MSDR(X; θ_MSDR),  F_MSDR ∈ R^(C×1024)   (5)

where f_MSDR represents the parametric function of the backbone network, while θ_MSDR refers to the learned parameters, including the convolutional kernel weights and biases. In general, the input energy spectrum samples undergo feature decomposition for each class using MSDR-Net, facilitating subsequent physical information processing and feature space constraints.
Prior Physical Constraints
The characteristic peak energies of gamma radionuclides are known a priori, as is the associated Full Width at Half Maximum (FWHM), which is a function of energy. To constrain the model to focus on these physically significant regions, we introduce a constant gamma characteristic-peak prior information matrix M_PhyPeak ∈ R^(C×N), where C is the number of nuclide categories and N = 1024 is the number of channels.
For each characteristic peak of a radionuclide, the corresponding channel addresses within the FWHM range are assigned a weight of 1, while the remaining channels are assigned a weight of α (α ∈ [0, 1]). Based on the energy calibration, the matrix M_PhyPeak indexed by channels was obtained:

M_PhyPeak = [V_rad1, V_rad2, . . . , V_radC]^T   (6)

In the above equation, V_radn = [w_1, w_2, . . . , w_N] is the weight mask for the n-th radionuclide, with w_i = 1 if channel i is within a characteristic peak region, and w_i = α otherwise.
This matrix is applied to the features extracted by the backbone network via element-wise multiplication to obtain physically-attentive features:

F_PhyAtt = M_PhyPeak ⊙ F_MSDR   (7)

Subsequent adaptive average pooling and a convolution layer are applied for dimensionality reduction and fine-tuning, yielding the final physical attention features F_PhyAtt. During training, M_PhyPeak is a constant attention weight matrix derived from physical prior information. Traditional data-driven methods rely on weakly supervised learning to generate attention weights adaptively. Therefore, the method proposed in this paper imposes stronger constraints.
Adaptive Average Pooling was applied to reduce the feature dimensionality from 1024 to 128 and smooth the weights. Additionally, a convolution operation was employed for further weight fine-tuning.
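The mask construction and its element-wise application can be sketched as follows (pure Python, with a hypothetical `peak_channels` input mapping each nuclide to its FWHM channel windows; the real matrix is derived from the energy calibration):

```python
def build_phypeak_mask(peak_channels, n_channels=1024, alpha=0.05):
    """One weight-mask row V_radn per nuclide (Eq. 6): w_i = 1 inside a
    characteristic-peak FWHM window, alpha elsewhere.
    peak_channels: {nuclide: [(lo, hi), ...]} channel windows (hypothetical)."""
    mask = {}
    for nuc, windows in peak_channels.items():
        row = [alpha] * n_channels
        for lo, hi in windows:
            for i in range(lo, hi + 1):
                row[i] = 1.0
        mask[nuc] = row
    return mask

def apply_mask(features, row):
    """Element-wise product of one class's feature vector with its mask row,
    i.e. one row of F_PhyAtt = M_PhyPeak (.) F_MSDR."""
    return [f * w for f, w in zip(features, row)]
```

Because the mask is a constant rather than a learned attention map, features outside every characteristic-peak window are attenuated by the fixed factor α, which is the "stronger constraint" the text describes.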
Feature Metric Constraints
Implementing feature space metric constraints to separate dissimilar classes and group similar ones aids the model in extracting discriminative features, thereby improving model reliability, as shown in previous studies [ ]. However, in MLC tasks involving mixed radionuclides, sample labels often overlap, making it challenging to apply metric constraints at the sample level. In this study, we propose a novel feature space metric constraint method specifically designed for mixed RIID tasks.
For a training batch, the model outputs a feature tensor F_Batch ∈ R^(B×C×D), where B is the batch size, C the number of categories, and D the feature dimension (e.g., 128). Based on the multi-label ground truth y ∈ {0, 1}^(B×C), we construct the following sets for each nuclide category i:
Presence Feature Set E_i: E_i = { f_b,i | y_b,i = 1 }, containing the feature vectors for nuclide i from all samples in the batch where it is present. Here, f_b,i denotes the feature vector for the i-th nuclide from the b-th sample.
Absent Feature Set A_i: A_i = { f_b,i | y_b,i = 0 }, containing the feature vectors for nuclide i from all samples where it is absent.
The distance metric d(·, ·) is defined using the cosine distance, which focuses on the angular difference between vectors and is less sensitive to absolute magnitudes, making it suitable for comparing weight distributions:

d(a, b) = 1 − (a · b) / (∥a∥₂ ∥b∥₂)   (9)
Presence Constraint (Triplet Loss): For each nuclide i present in the batch, we aim to pull features of the same nuclide closer and push features of different nuclides apart. The triplet loss for nuclide i is:

L^(i)_triplet = max(0, d(f_a, f_p) − d(f_a, f_n) + γ)   (10)

where f_a is the feature vector of nuclide i selected as the anchor. f_p is the feature vector of the same nuclide as the anchor (both being nuclide i) but farthest from it, representing the difficult positive sample. Similarly, f_n is the feature vector of a different nuclide (not nuclide i) but closest to the anchor, representing the difficult negative sample. The γ is a margin that enforces a minimum distance between positive and negative pairs, ensuring that the model learns to distinguish between them effectively.
The overall presence constraint loss is the average over all present nuclides in the batch:

L_triplet = (1 / |I|) Σ_{i ∈ I} L^(i)_triplet   (11)

where I is the set of indices for nuclides present in the batch, and |I| is the size of this set.
Absence Constraint (Contrastive Loss): This constraint ensures that features for an absent nuclide are distant from all features of present nuclides. For each feature f_a,i ∈ A_i of an absent nuclide i, we calculate its minimum distance to any feature of any present nuclide:

d_min(f_a,i) = min_{j ∈ I, f ∈ E_j} d(f_a,i, f)   (12)

The absence constraint loss is:

L_contrastive = (1 / N_a) Σ_{i ∉ I} Σ_{f_a,i ∈ A_i} max(0, γ − d_min(f_a,i))   (13)

where N_a is the total number of absent-nuclide features in the batch. The loss penalizes absent-nuclide features that are too close to any present-nuclide features, pushing them apart by at least the margin γ.
The total feature metric loss is the sum of the two constraints:

L_Metric = L_triplet + L_contrastive   (14)
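The three building blocks above, the cosine distance of Eq. (9), the per-nuclide triplet term of Eq. (10), and the per-feature contrastive term of Eqs. (12) and (13), can be sketched in plain Python (illustrative helper names; batch mining of the hardest positive/negative is assumed to have happened already):

```python
import math

def cos_dist(a, b):
    """Eq. (9): d(a, b) = 1 - a.b / (||a||_2 ||b||_2)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def triplet_loss(anchor, hard_pos, hard_neg, margin=0.3):
    """Eq. (10): pull the hardest positive in, push the hardest negative
    out by at least the margin."""
    return max(0.0, cos_dist(anchor, hard_pos) - cos_dist(anchor, hard_neg) + margin)

def contrastive_loss(absent_feat, present_feats, margin=0.3):
    """Eqs. (12)-(13), one term: penalize an absent-nuclide feature that
    lies within `margin` of the closest present-nuclide feature."""
    d_min = min(cos_dist(absent_feat, p) for p in present_feats)
    return max(0.0, margin - d_min)
```

Averaging `triplet_loss` over present nuclides and `contrastive_loss` over absent-nuclide features, then summing, gives L_Metric of Eq. (14).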
Multi-Label Classification and Total Loss Function
The multi-label classification head processes the feature F_PhyAtt ∈ R^(C×D) to produce the predicted probability P̂_i for each nuclide category using global sum pooling followed by a sigmoid function:

S_i = Σ_{k=1}^{D} F_PhyAtt(i, k)   (15)

P̂_i = Sigmoid(S_i)   (16)

This linear-additive process enhances the causal interpretability between the feature weights and the final prediction. The classification loss is the Binary Cross-Entropy (BCE) loss:

L_BCE = − (1/C) Σ_{i=1}^{C} [ y_i log(P̂_i) + (1 − y_i) log(1 − P̂_i) ]   (17)
The total loss function for training is a weighted sum of the classification loss and the feature metric loss:

L_Total = L_BCE + λ L_Metric   (18)

where λ is a hyperparameter controlling the weight of the metric constraint. The values of γ and λ were determined via grid search, as detailed in Section IV.A.
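A compact sketch of the head and the total loss, Eqs. (15) to (18), in plain Python (feature values are toy inputs; the real F_PhyAtt comes from the constrained backbone):

```python
import math

def predict_probs(feat):
    """Eqs. (15)-(16): global sum pooling per class row, then a sigmoid.
    feat: list of C rows, each a length-D feature vector."""
    return [1.0 / (1.0 + math.exp(-sum(row))) for row in feat]

def bce_loss(p_hat, y, eps=1e-12):
    """Eq. (17): mean binary cross-entropy over the C labels."""
    return -sum(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps)
                for pi, yi in zip(p_hat, y)) / len(y)

def total_loss(l_bce, l_metric, lam=0.005):
    """Eq. (18): L_Total = L_BCE + lambda * L_Metric."""
    return l_bce + lam * l_metric
```

Because each probability is just a sigmoid of the summed (masked) feature weights, a large prediction for a nuclide can be traced back additively to the channels, ideally the characteristic-peak channels, that contributed to it.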
DATA PREPARATION
Monte Carlo Simulation
To quantitatively assess the generalizability of the model, a particle transport model based on Geant4-11.1.3 [ ] was developed. This allows us to accurately control multiple variables for quantitative analysis. In this study, two scintillators were considered: a 3-inch NaI(Tl) and a 1-inch CeBr3, as shown in Fig. .
The detectors consist of a 2-mm-thick stainless steel shell containing Fe, Cr, Ni, and C, a 0.3-mm-thick MgO reflective layer, and a SiPM photoelectric converter. The NaI detector is widely used due to its low cost and ease of processing into larger sizes. In contrast, CeBr3 has emerged as an alternative to traditional NaI(Tl) and LaBr3:Ce crystals due to its excellent energy resolution, negligible background, and increasing application in environmental monitoring [41-43].
The simulated gamma-ray energy response range spans from 30 to 3000 keV, with 100 million photons simulated per calculation. The statistical uncertainty of the simulated results ranged from 0.0002 to 0.0017. The energy deposition spectrum of gamma photons in the detector was obtained by inputting the target radionuclide energy and branching ratio.
To simulate the finite energy resolution of real detectors, we apply Gaussian broadening to the simulated energy deposition values E_0:

FWHM(E_0) = C_1 + C_2 √E_0   (19)

E = N(E_0, FWHM / (2√(2 ln 2)))   (20)

Specifically, for each energy deposition event, a broadened energy E is generated according to Equation (20), which is then used as the pulse height recorded in the final simulated spectrum. The FWHM in the equation is calculated using Equation (19), modeling the intrinsic resolution of the detection system at energy E_0.
We considered eight artificial radioactive nuclides (133Ba, …). These include common emission contaminants after nuclear accidents [ ] and pose challenges such as low-energy Compton plateau interference and peak overlapping, as shown in Fig. .
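The broadening step can be sketched as follows (the coefficient values in the usage note are placeholders, not the paper's fitted Gaussian broadening coefficients):

```python
import math
import random

def broaden(e_dep, c1, c2, rng=random):
    """Gaussian detector-resolution broadening, Eqs. (19)-(20):
    FWHM(E0) = C1 + C2 * sqrt(E0); sigma = FWHM / (2 * sqrt(2 ln 2));
    the recorded pulse height is drawn from N(E0, sigma)."""
    fwhm = c1 + c2 * math.sqrt(e_dep)
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # ~ FWHM / 2.355
    return rng.gauss(e_dep, sigma)
```

For example, `broaden(661.7, 1.0, 0.03)` smears a 661.7 keV deposition (the 137Cs photopeak energy) by a fraction of a percent, which is how sharp Monte Carlo deposition spectra acquire realistic peak widths.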
Data Augmentation
Data augmentation was conducted by controlling the gross count, SNR, and radionuclide mixing ratio. These methods have been widely used in previous research and proven to effectively simulate the random noise in measured energy spectra [ ]. The parameters to be adjusted and their corresponding ranges are listed in Table . Specifically, the SNR is defined in terms of N_s, the counts of photon events emitted by the nuclides, and N_b, the number of photon events that come from the background. The background spectrum was experimentally measured with a NaI(Tl) or CeBr3 detector in a lead-shielded environment over 3600 seconds. To generate a mixed energy spectrum, we randomly selected k radionuclides from the eight candidates and combined their individual spectra, weighted by randomly assigned activity levels.

Dataset | Control Variables | Challenges
Test-ALL | All parameters in Table 1 | Comprehensive performance
Test-Detector | Detector and Gaussian broadening coefficients | Variation in the energy resolution
Test-NumofMixed | Number of Mixed | Larger number of nuclide mixtures
Test-LowCounts | Gross Count | Low Gross Counts
Test-LowSNR | SNR | Low SNR

[Fig.: (a) Energy spectrum samples in the NaI-1 detector. (b) Energy spectrum samples in the NaI-2 detector.]
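The mixing step can be sketched as follows (a hypothetical helper; the activity-weight range is an illustrative assumption, and the real pipeline additionally rescales counts and adds a measured background):

```python
import random

def mix_spectra(spectra, k, rng=random):
    """Randomly pick k single-nuclide spectra and sum them with random
    activity weights, as in the augmentation described above.
    spectra: {name: [counts per channel]}."""
    names = rng.sample(sorted(spectra), k)               # choose k nuclides
    weights = {n: rng.uniform(0.1, 1.0) for n in names}  # random activities
    n_ch = len(next(iter(spectra.values())))
    mixed = [0.0] * n_ch
    for n in names:
        for i, c in enumerate(spectra[n]):
            mixed[i] += weights[n] * c                   # weighted sum
    return mixed, names
```

The returned `names` list doubles as the multi-label ground truth for the synthesized sample.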
Dataset Splitting
To rigorously evaluate the model's generalization ability under controlled distribution shifts, as shown in Table , we quantitatively assess performance by adjusting multiple parameters. The training set is only 1% of the sample size of each subgroup in the test set, to simulate a few-shot training scenario.
The primary approach involves intentionally introducing discrepancies between the training and test data using the "hold-out method." This allows us to assess the model's ability to handle complex, unseen scenarios.
Specifically, the training set NaI-Train uses a NaI(Tl) detector with a limited number of mixed radionuclides (k_train), creating a simpler environment for model learning. In contrast, the test set introduces new challenges that the model has not encountered during training.
(1) Increased number of mixed radionuclides: Test sets with k_test = 3 or 4 require the model to predict new spectral patterns and radionuclide combinations, going beyond the training set's scope.
(2) Variation in Gaussian broadening coefficients: Altering these parameters simulates different detector responses, introducing instrumental uncertainties.
This strategy ensures that no radionuclide combinations from the test set appear in the training set. As a result, the model faces more demanding generalization requirements than models evaluated using conventional partitioning methods (e.g., randomly splitting all data into 70% and 30% for training and testing). Importantly, this design exposes the model to entirely new data distributions during testing, providing a more rigorous challenge to its robustness.
Evaluation Metrics
In this study, we use Precision, Recall, and F1 Score as the key evaluation metrics to assess the performance of our multi-label classification model. To handle the multi-label nature of the problem, we compute these metrics using Micro-Averaging, which aggregates the performance across all labels by treating the problem as a single binary classification task. This approach ensures a balanced evaluation.
Micro-Averaging combines the predictions for all labels by summing the true positives, false positives, and false negatives across all labels. The overall Precision, Recall, and F1 Score are then computed based on these aggregated counts, providing a comprehensive measure of model performance.
The formulas for the Micro-averaged Precision, Recall, and F1 Score are as follows:

Precision_micro = Σ_{i=1}^{N} TP_i / Σ_{i=1}^{N} (TP_i + FP_i)   (21)

Recall_micro = Σ_{i=1}^{N} TP_i / Σ_{i=1}^{N} (TP_i + FN_i)   (22)

F1_micro = (2 × Precision_micro × Recall_micro) / (Precision_micro + Recall_micro)   (23)

In these equations, TP_i, FP_i, and FN_i represent the true positives, false positives, and false negatives for the i-th label, respectively. The final Micro-averaged metrics are computed by summing over all labels, providing a global evaluation of the model's performance across all labels.
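Eqs. (21) to (23) can be sketched as a single function that pools counts over every (sample, label) pair before dividing (an illustrative helper, not the evaluation code used in the paper):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged Precision / Recall / F1 (Eqs. 21-23).
    y_true, y_pred: lists of binary label rows, one row per sample."""
    tp = fp = fn = 0
    for yt_row, yp_row in zip(y_true, y_pred):
        for yt, yp in zip(yt_row, yp_row):
            if yp == 1 and yt == 1:
                tp += 1
            elif yp == 1 and yt == 0:
                fp += 1
            elif yp == 0 and yt == 1:
                fn += 1
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Pooling the counts first is what distinguishes micro-averaging from macro-averaging, which would compute the three ratios per label and then average them.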
Compared Methods
To evaluate the recognition performance of the proposed method, we considered three classic CNN models: AlexNet [ ], VGG-16 [ ], and ResNet-18 [ ]. VGG-16 is a representative of dense and deep CNNs, while ResNet-18 represents the residual network structure. These models were originally designed as 2D CNNs for image recognition tasks but have been adapted to 1D in this paper based on their original architectures.
Model Training
During training, the Train dataset was split into an 8:2 ratio for training and validation. During each data-reading step in training, Gaussian random noise with a mean of 0.1 and variance of 0.04 was added to augment the training dataset. The initial learning rate was set to 1e-4, and parameter optimization was performed using the Adam optimizer. All compared models were trained from scratch with random initialization. All layers of the networks were updated during training to ensure a fair comparison.
[Table: Methods | F1 Score (%) | Precision (%) | Recall (%) | Params (M) | FLOPs (G), with rows for VGG-16-CNN, ResNet-18-CNN, AlexNet-CNN, CLS+PHY, and FhyMetric-Net; numerical values not recovered from the extraction.]

Determination of Parameter α
The weight ratio between the characteristic peak region and the non-characteristic peak region (1 : α) is a critical hyperparameter of the model. To determine its optimal value without relying on empirical settings, we conducted a systematic sensitivity analysis.
We fixed the weight of the characteristic peak region to 1 and tested multiple values of α within the interval [0.01, 1.0] (specifically α = …). For each value of α, the model was trained from scratch and evaluated on a unified comprehensive test set, with the primary evaluation metric being the micro-averaged F1 Score. The experimental results are shown in Fig. .
The analysis revealed that while the model achieved the optimal F1 Score (95.61%) with α = 0.7 (ratio 1:0.7), a slightly lower performance (94.81%) was observed at α = 0.05 (ratio 1:0.05), which was only about 0.8% lower but still superior to most configurations.
After considering the trade-off between performance and physical interpretability, we chose α = 0.05 as the default parameter. This decision was based on two factors: first, a smaller α enhances physical interpretability by concentrating the model's feature weight around the characteristic peak region, which strengthens the physical transparency of the model's decision-making; second, the performance cost of a 0.8% drop in F1 Score is minimal compared to the substantial benefits in model reliability and physical consistency, aligning with the study's focus on embedding physical priors for improved interpretability.
Determination of Parameters γ and λ
A comprehensive grid search was performed to determine the optimal values of γ and λ. The search space was evaluated using the micro-averaged F1 Score. The optimal performance was achieved with γ = 0.3 and λ = 0.005, as shown in Fig. 5.
To comprehensively evaluate the performance of the FhyMetric-Net model, we compared it with several benchmark models and conducted ablation experiments to verify the contribution of each component. The experiments were conducted on the Test-ALL dataset.
Comparison with Traditional Models: We first compared FhyMetric-Net with several classic deep learning models, including AlexNet, VGG-16, and ResNet-18. The results clearly demonstrate that FhyMetric-Net significantly outperforms these models in key performance metrics. FhyMetric-Net achieved an F1 Score of 97.589%, outperforming ResNet-18, VGG-16, and AlexNet by 3.999%, 5.812%, and 5.846%, respectively. This advantage is primarily attributed to its excellent recall rate of 95.304%, which, while maintaining a high precision of 99.986%, enables the model to more comprehensively identify existent isotopes and effectively reduce false negatives. In contrast, although traditional models also exhibit high precision, their relatively lower recall rates indicate issues with missed detections in complex mixed energy spectrum scenarios.
Notably, FhyMetric-Net not only leads in performance but also has a significantly smaller parameter count (0.051032 M), just 1.316% of the next-best ResNet-18 and only 0.057% of VGG-16. This highlights the efficiency of the lightweight network structure designed specifically for gamma energy spectrum characteristics, effectively capturing key features while avoiding the substantial parameter redundancy present in traditional general-purpose architectures.
Ablation Experiments:
To analyze the contribution of each innovative component in FhyMetric-Net, we conducted a series of systematic ablation experiments.
Baseline Model (CLS): The model trained with only binary cross-entropy loss (BCE Loss) performed poorly, achieving an F1 Score of 87.707%.
Incorporating Physical Priors (CLS+PHY): After applying prior physical constraints to the baseline model, the performance improved dramatically. The F1 Score increased from 87.707% to 94.812%, and the recall rate rose from 82.924% to 91.634%. This strong improvement convincingly demonstrates that incorporating physical prior information provides crucial inductive bias and optimization guidance for model training. It helps the model focus on physically meaningful regions, such as characteristic peaks, greatly alleviating training difficulties and enhancing both the reliability of features and the model's generalization ability.
Introducing Feature Metric Constraints (FhyMetric-Net): By further adding feature metric constraints on top of CLS+PHY, forming the complete FhyMetric-Net model, the F1 Score improved to 97.589% and the recall rate improved to 95.304%. This indicates that the feature metric constraint, by imposing intra-class compactness and inter-class separation in the high-dimensional feature space, sharpened the model's decision boundary. It guided the model to learn feature representations that are more robust to complex situations like background fluctuations and peak overlaps, thereby optimizing the recall rate on top of the already high precision.
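A hedged sketch of the two feature metric constraints follows. It is not the paper's exact equations: the presence term here pulls features of spectra containing nuclide k toward their class mean (intra-class compactness), while the absence term pushes features of spectra without k at least a margin away from that mean (inter-class separation); the shapes and margin are illustrative assumptions:

```python
import numpy as np

def presence_absence_losses(feats, labels, margin=1.0):
    """Illustrative presence/absence metric losses.
    feats:  (batch, n_nuclides, dim) per-nuclide feature vectors
    labels: (batch, n_nuclides) multi-hot presence flags"""
    lp, la = [], []
    for k in range(feats.shape[1]):
        pos = feats[labels[:, k] == 1, k]
        neg = feats[labels[:, k] == 0, k]
        if len(pos) == 0:
            continue
        centre = pos.mean(axis=0)
        # presence: squared distance of present-nuclide features to their mean
        lp.append(np.mean(np.sum((pos - centre) ** 2, axis=1)))
        if len(neg):
            # absence: hinge penalty when absent-nuclide features sit closer
            # than `margin` to the present-nuclide cluster centre
            dist = np.linalg.norm(neg - centre, axis=1)
            la.append(np.mean(np.maximum(0.0, margin - dist) ** 2))
    return float(np.mean(lp)), float(np.mean(la)) if la else 0.0
```

Both terms are zero when same-nuclide features coincide and absent-nuclide features stay beyond the margin, which mirrors the compact-cluster/large-separation geometry described above.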
Based on the above results, we can conclude that the introduction of physical prior constraints is key to the model's success. It addresses the core issue of insufficient data-driven model training in Few-shot scenarios, providing a reliable initial optimization direction. The feature metric constraint then refines the decision boundary by improving the distribution quality in feature space, thus enhancing the model's discriminative power, especially on difficult samples (e.g., low activity, complex mixtures). There is a clear synergistic effect between the physical and metric constraints: the physical constraints provide a solid starting point for feature learning, while the metric constraints "refine" it, leading to FhyMetric-Net achieving exceptional classification performance while maintaining high efficiency.
Control Variables
To further investigate the effects of the physical prior constraints and feature metric constraints, we conducted quantitative tests on four different datasets, covering varying detector types, mixed isotopes, low gross counts, and low SNR, as shown in Table . As shown in Fig. , the results demonstrate that the proposed method achieves state-of-the-art (SOTA) F1 Score performance across all tests.
As shown in Fig. , the F1 Score of the model fluctuates between 94.597% and 96.329% when facing different detector types with varying energy resolution, demonstrating strong overall performance. The performance of CLS+PHY is generally superior to the other methods, indicating that incorporating prior physical information to optimize constraint weights effectively enhances the utilization of the limited parameters. The precision of CLS+PHY is comparable to that of FhyMetric-Net, with the primary performance gain coming from increased Recall. This suggests that the presence constraints enhance the discriminative power of features to ensure recognition precision, while the absence constraints reduce the false positive rate.
As shown in Fig. , when faced with mixed isotopic spectra
(containing more than three isotopes) not encountered during training, FhyMetric-Net outperforms the other methods. The performance gap primarily stems from Recall. When the number of isotopes in the mixture is less than three, all methods achieve an F1 Score greater than 95%: this is because methods based on deep learning (DL) perform well on test data from the same distribution as the training set. When the mixture contains three isotopes, FhyMetric-Net maintains an F1 Score above 95%. However, when the mixture includes four isotopes, its performance drops to 87.599%, though it still outperforms the other methods. CLS+PHY shows a more significant performance improvement over CLS, demonstrating that the physical prior constraints contribute to the model's generalization ability.
In Fig. , we evaluate the model's performance on low-count and low-SNR spectra. As in the previous analysis, CLS+PHY and FhyMetric-Net exhibit comparable precision, and the primary improvement from the feature metric constraints is observed in Recall, further confirming their role. In Fig. , the F1 Score of CLS+PHY is on par with that of ResNet-18. In extreme low-SNR conditions (0.1-0.2), ResNet-18 outperforms CLS+PHY in F1 Score, likely due to its residual network structure and larger parameter count. However, overall, ResNet-18 has lower precision than CLS+PHY, indicating that the incorporation of physical prior information helps the model focus on key regions with physical characteristics, enhancing precision.
Overall, all methods maintain high recognition precision, with performance differences arising primarily in Recall. Notably, CLS+PHY and FhyMetric-Net show comparable precision, and FhyMetric-Net with feature metric constraints further improves Recall, consistent with the overall performance evaluation in Table .
Physical priors (such as the characteristic peak information in this paper) can be regarded as domain knowledge independent of detector type when energy resolution fluctuations are minimal. By implementing the necessary constraints, the features learned by the model inherently focus on the "identity" of the nuclide rather than on a specific data distribution. Through the application of feature metric constraints and limited weight fine-tuning, the model's recall rate can be further improved, thereby enhancing its overall generalization capability.
[Figure: t-SNE visualizations of the PhyAtt weights. (a) CLS+PHY model; (b) FhyMetric-Net model.]
Table: Average loss values of the presence and absence constraints.
Methods         Presence loss   Absence loss
CLS+PHY         0.013368        0.001707
FhyMetric-Net   0.011823        0.000958

Feature Metrics Analysis
To better demonstrate the effectiveness of the metric constraints in the feature space, we visualized the PhyAtt weight distribution using t-distributed Stochastic Neighbor Embedding (t-SNE) [47] on the Test-ALL dataset.
The t-SNE visualization of the PhyAtt features (Fig. ) reveals that after training with the metric constraints, the feature distributions for different nuclides become more separable in the embedding space. Features of the same nuclide form compact clusters, while features of absent nuclides are well separated from those of existent nuclides, validating the effectiveness of the proposed metric learning approach.
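A t-SNE projection of this kind can be sketched with scikit-learn as below; the random feature matrix is a stand-in for the PhyAtt features (one row per spectrum), and the perplexity is an illustrative choice:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for per-spectrum PhyAtt feature vectors (60 spectra, 32-dim).
rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 32))

# Project to 2-D for visual inspection of cluster structure;
# perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=10.0,
                 random_state=0).fit_transform(feats)
```

In the paper's setting, colouring each 2-D point by its nuclide label is what reveals the compact same-nuclide clusters and the separation of absent-nuclide features.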
For quantitative analysis, we computed the average loss values of the model for the presence and absence constraints separately, using Equation and Equation . The results are presented in Table .
After applying the feature metric constraints, the loss associated with the presence constraint decreased from 0.013368 to 0.011823. This suggests a reduction in intra-class distance and an increase in inter-class distance, indicating improved feature separation. The absence constraint loss also decreased, from 0.001707 to 0.000958, demonstrating that the feature metric constraints effectively adjust the feature weights to maximize the distance between spectra with and without a given nuclide, which significantly reduces the false alarm rate and further improves recall.
EXPLAINABLE ANALYSIS
To investigate the impact of feature metric constraints on
the model's discriminative mechanism, we performed a visual analysis of its PhyAtt feature weight distribution using the same test spectrum (containing four nuclides: 131I, 60Co, 134Cs, and 137Cs), as shown in Fig. . By comparing the model's weights before and after applying the constraints, we uncover the mechanism that enhances decision reliability.
As shown in Fig. , the 604.721 keV characteristic peak of 134Cs has a relatively low count and overlaps with the 661.7 keV peak of 137Cs. This presents a challenge for the model, as it must effectively handle low gross count spectra with low SNR levels. In contrast, the 364.489 keV peak of 131I has a significantly higher count, which may interfere with the model's ability to correctly assign weights to the key regions of other nuclides, such as the 1173.2 keV and 1332.5 keV peaks of 60Co.
It is important to note that the feature weights output by the model include both positive and negative values. In the visualization, positive weights (bright colors) indicate areas the model is emphasizing, while negative weights (dark colors) reflect the model's active suppression of certain regions. This suppression mechanism is vital for excluding irrelevant features or background interference, which enhances the model's discriminative precision.
[Figure: explainable analysis of the PhyAtt weights for (a) the CLS+PHY model and (b) the FhyMetric-Net model. Labeled peaks: 137Cs 661.7 keV; 131I 364.489 keV; 134Cs 604.721 keV and 795.84 keV; 60Co 1173.2 keV and 1332.5 keV; 133Ba 356.013 keV (panel (a) only).]
Without feature metric constraints (CLS+PHY), the model mistakenly assigned high weights to the non-existent 133Ba feature peak at 356.013 keV, leading to false positives. This suggests that relying solely on physical prior constraints may still cause the model to overfit to noise or spectral similarities that occasionally appear in the training data. However, after introducing the feature metric constraints (FhyMetric-Net), the model completely suppressed the response to 133Ba. The underlying mechanism is the absence constraint (Equation 13), which forces the model to learn that "the absence feature representation of 133Ba must remain distinct from those of existing nuclides," thus eliminating false positives in the feature space and improving specificity.
For 60Co, which was present but missed by the CLS+PHY model, the weight distribution provides further insight. Although CLS+PHY assigned strong weights to the 1173.2 keV and 1332.5 keV peaks of 60Co, it failed to suppress weights in other regions. As a result, the SNR of the feature was insufficient, and the classification confidence fell below the required threshold (the sum of weights is -50.518).
In contrast, FhyMetric-Net exhibited global optimization: it maintained moderate weights at the key characteristic peaks while applying significant negative-weight suppression in other regions, thus improving the discriminative power of the target features.
At the 604.721 keV feature peak of 134Cs, the CLS+PHY weight distribution was skewed to the right, while FhyMetric-Net's weights were shifted left, aligning more accurately with the actual peak position. This subtle shift suggests that the feature metric constraint guides the model to avoid potential overlap interference from the 661.7 keV peak of 137Cs, further refining feature localization and enhancing the ability to distinguish complex spectral shapes. This pattern results from the joint optimization of intra-class feature compactness and inter-class feature separation, achieved by the presence constraint. This strategy successfully prevents the missed detection of 60Co and fine-tunes the weight distribution of 134Cs, improving feature discriminability.
The visual analysis above demonstrates that the feature metric constraints optimize decision-making by guiding the model to learn globally discriminative feature distribution patterns.
The core mechanism involves the synergistic optimization of positive and negative weights: positive weights reinforce key features, while negative weights suppress interference. This dual mechanism enables more reliable identification in complex spectral backgrounds.
CONCLUSIONS
This paper introduces FhyMetric-Net, a lightweight and interpretable hybrid RIID model. By deeply integrating domain knowledge (physical priors of characteristic peaks) with data-driven approaches (feature metric learning), the model effectively addresses key challenges of traditional deep learning models in Few-shot, multi-label scenarios, such as poor interpretability, weak generalization, and low reliability. The main contributions of this work can be summarized as follows:
1. A novel NN paradigm, Physical Constraint-Driven Metric Learning Sharpening, is proposed. By constructing a physical prior constraint matrix, the model receives high-value initialization guidance, significantly reducing reliance on large datasets and mitigating overfitting risks.
2. A feature space metric constraint method for mixed radionuclide recognition tasks is designed. By enforcing intra-class compactness and inter-class separation in the feature space, the model is compelled to learn more discriminative feature representations, achieving SOTA performance in complex scenarios such as low gross counts, low SNR levels, and multi-nuclide mixtures.
3. The model achieves both a lightweight design (only 0.051M parameters) and high accuracy (comprehensive F1 Score > 95%), while providing intuitive weight visualizations. This creates a clear causal link between predicted results and physical features, laying a solid foundation for its deployment in high-reliability nuclear industry applications.
Current Issues:
• The proposed method requires tuning several hyperparameters during the training process, which presents a challenge.
• There is room for further improvement in the model's performance on spectra with more mixed radionuclides.
Future Works:
• Explore dynamic or adaptive physical constraint strengths (e.g., parameter α) and metric constraint boundaries (e.g., parameter γ) to allow the model to intelligently adapt to data of varying quality.
• Investigate more elegant feature metric constraint methods to better balance the geometric structure of the feature space with the final classification objective, further enhancing the model's generalization ability.
• Explore how to incorporate richer physical information (e.g., Compton plateau profiles, peak-to-Compton ratios) in a differentiable manner, leveraging techniques such as transfer learning to address challenges in broader detector applications, including CdTe and HPGe detectors.
• Address the critical issue of how to systematically and fairly evaluate the model's generalization ability. Designing appropriate test datasets will be key to moving the model toward practical applications.
• Systematically evaluate various noise models, including Poisson noise, and explore the correlation between their parameters and the noise characteristics of real gamma spectrometers, aiming to obtain better and more targeted data augmentation solutions.
We believe that FhyMetric-Net offers a promising solution to the common challenges of "Few-shot, High-reliability, Interpretable" scientific computing, and its core ideas may serve as a reference for future radionuclide automatic identification methods.
[1] J. Klusoň, Environmental monitoring and in situ gamma spectrometry. Radiat. Phys. Chem., 209-216 (2001).
[2] H. Kofuji, In situ measurement of 134Cs and 137Cs in seabed using underwater γ-spectrometry systems: application in surveys following the Fukushima Dai-ichi Nuclear Power Plant accident. J. Radioanal. Nucl. Chem., 1575-1579 (2015).
[3] D. Connor, P.G. Martin, T.B. Scott, Airborne radiation mapping: overview and application of current and future aerial systems. Int. J. Remote Sens., 5953-5987 (2016).
[4] E.G. Androulakaki, M. Kokkoris, C. Tsabaris, et al., In situ γ-ray spectrometry in a marine environment using full spectrum analysis for natural radionuclides. Appl. Radiat. Isot., 76-86 (2016).
[5] D. Fagan, S. Robinson, R. Runkle, Statistical methods applied to gamma-ray spectroscopy algorithms in nuclear security missions. Appl. Radiat. Isot., 2428-2439 (2012).
[6] J. Routti, S. Prussin, Photopeak method for the computer analysis of γ-ray spectra from semiconductor detectors. Nucl. Instrum. Methods (1969).
[7] I.A. Slavić, S.P. Bingulac, A simple method for full automatic gamma-ray spectra analysis. Nucl. Instrum. Methods, 261-268 (1970).
[8] G. Xiao, L. Deng, B. Zhang et al., A nonlinear wavelet method for smoothing low-level gamma-ray spectra. Nucl. Technol. (2004).
[9] C.J. Sullivan, S.E. Garner, K.B. Butterfield, Wavelet analysis of gamma-ray spectra. IEEE Nuclear Science Symposium Conference Record 2004, pp. 281-286, Vol. 1 (2004).
[10] C.J. Sullivan, M.E. Martinez, S.E. Garner, Wavelet analysis of sodium iodide spectra. IEEE Nuclear Science Symposium Conference Record (2005).
[11] M. Monterial, K. Nelson, S. Labov, et al., Benchmarking Algorithm for Radio Nuclide Identification (BARNI) Literature Review (2019).
[12] S. Qi, W. Zhao, Y. Chen et al., Comparison of machine learning approaches for radioisotope identification using NaI(Tl) gamma-ray spectrum. Appl. Radiat. Isot., 110212 (2022).
[13] S.M. Galib, P.K. Bhowmik, A.V. Avachat, et al., A comparative study of machine learning methods for automated identification of radioisotopes using NaI gamma-ray spectra. Nucl. Eng. Technol., 4072-4079 (2021).
[14] C. Li, S. Liu, C. Wang, et al., A new radionuclide identification method for low-count energy spectra with multiple radionuclides. Appl. Radiat. Isot., 110219 (2022).
[15] S. Qi, S. Wang, Y. Chen, et al., Radionuclide identification method for NaI low-count gamma-ray spectra using artificial neural network. Nucl. Eng. Technol., 269-274 (2022).
[16] H.L. Liu, H.B. Ji, J.M. Zhang, et al., Identification algorithm of low-count energy spectra under short-duration measurement based on heterogeneous sample transfer. Nucl. Sci. Tech., 42 (2025).
[17] I.H. Sarker, Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput. Sci., 420 (2021).
[18] P.A.K. Reinbold, L.M. Kageorge, M.F. Schatz et al., Robust learning from noisy, incomplete, high-dimensional experimental data. Nat. Commun. 12, 3219 (2021).
[19] J. Nie, J. Jiang, Y. Li et al., Data and domain knowledge dual-driven artificial intelligence: Survey, applications, and challenges. Expert Syst. 42(1), e13425 (2025).
[20] L. Von Rueden, S. Mayer, K. Beckh et al., Informed machine learning: a taxonomy and survey of integrating prior knowledge into learning systems. IEEE Trans. Knowl. Data Eng. 35(1) (2021).
[21] A. Borghesi, F. Baldo, M. Milano, Improving deep learning models via constraint-based domain knowledge: a brief survey (2020).
[22] R. Roscher, B. Bohn, M.F. Duarte et al., Explainable machine learning for scientific insights and discoveries. IEEE Access (2020).
[23] M. Gomez-Fernandez, W.-K. Wong, A. Tokuhiro, et al., Isotope identification using deep learning: An explanation. Nucl. Instrum. Meth. A, 164925 (2021).
[24] Y. Wang, Q. Yao, Q. Zhang, et al., Explainable radionuclide identification algorithm based on convolutional neural network class activation mapping. Nucl. Technol. (2022).
[25] Z. Chaouai, G. Daniel, J.-M. Martinez, et al., Application of adversarial learning for the identification of radionuclides in gamma-ray spectra. Nucl. Instrum. Meth. A (2022).
[26] Zhang, et al., A novel approach for feature extraction from a gamma-ray energy spectrum based on image descriptor transferring for radionuclide identification. Nucl. Sci. Tech., 158 (2022).
[27] H.L. Liu, H.B. Ji, J.M. Zhang, et al., Novel algorithm for detection and identification of radioactive materials in an urban environment. Nucl. Sci. Tech., 154 (2023).
[28] A. Vaswani, N. Parmar, et al., Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17) (Red Hook, USA, 2017).
[29] Z. Niu, G. Zhong, H. Yu, A review on the attention mechanism of deep learning. Neurocomputing (2021).
[30] F. Zhu, H. Li, W. Ouyang, et al., Learning Spatial Regularization With Image-Level Supervisions for Multi-Label Image Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Hawaii, USA, 2017).
[31] B.-B. Gao, H.-Y. Zhou, et al., Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition. IEEE Transactions on Image Processing 30, 5920-5932 (2021).
[32] J. Sun, D. Niu, J. Liang, et al., Rapid nuclide identification algorithm based on self-attention mechanism neural network. Ann. Nucl. Energy, 110708 (2024).
[33] Y. Wang, Q. Zhang, Q. Yao, et al., Multiple radionuclide identification using deep learning with channel attention module and visual explanation. Front. Phys., 1036557 (2022). doi:10.3389/fphy.2022.1036557
[34] T. Schlagenhauf, Y. Lin, B. Noack, Discriminative feature learning through feature distance loss. Machine Vision and Applications, 25 (2023).
[35] M. Kaya, H.Ş. Bilge, Deep Metric Learning: A Survey. Symmetry (2019).
[36] J. Lu, J. Hu, J. Zhou, Deep Metric Learning for Visual Understanding: An Overview of Recent Advances. IEEE Signal Processing Magazine (2017).
[37] S. Agostinelli, J. Allison, K. Amako, et al., Geant4 - a simulation toolkit. Nucl. Instrum. Methods Phys. Res. A, 250-303 (2003).
[38] J. Li, S. Liu, Y. Zhang et al., Pre-assessment of dose rates to marine biota from discharges of the Haiyang Nuclear Power Plant, China. J. Environ. Radioactiv. (2015).
[39] S. Ueda, H. Hasegawa, H. Kakiuchi et al., Fluvial discharges of radiocaesium from watersheds contaminated by the Fukushima Dai-ichi Nuclear Power Plant accident, Japan. J. Environ. Radioactiv., 96-104 (2013).
[40] G. Katata, M. Ota, H. Terada et al., Atmospheric discharge and dispersion of radionuclides during the Fukushima Dai-ichi Nuclear Power Plant accident. Part I: Source term estimation and local-scale atmospheric dispersion in early phase of the accident. J. Environ. Radioactiv., 103-113 (2012).
[41] C. Tsabaris, E.G. Androulakaki, A. Prospathopoulos, et al., Development and optimization of an underwater in-situ cerium bromide spectrometer for radioactivity measurements in the aquatic environment. J. Environ. Radioactiv., 12-20 (2019).
[42] M. Wang, Y. Gu, M.L. Xiong, et al., Method for rapid warning and activity concentration estimates in online water γ-spectrometry systems. Nucl. Sci. Tech., 49 (2024).
[43] Y. Gu, K. Sun, L.Q. Ge, et al., Investigating the minimum detectable activity concentration and contributing factors in airborne gamma-ray spectrometry. Nucl. Sci. Tech., 110 (2021).
[44] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems (NIPS) (Lake Tahoe, USA, 2012).
[45] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations (San Diego, USA, 2015).
[46] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Las Vegas, USA, 2016).
[47] L. Van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579-2605 (2008).