Transfer learning empowers material Z classification with muon tomography
Submitted 2025-06-24 | ChinaXiv: chinaxiv-202506.00229




Haochen Wang,¹,∗ Zhao Zhang,²,³,∗ Pei Yu,⁴,² Yuxin Bao,¹ Jiajia Zhai,⁴,² Yu Xu,⁴,² Li Deng,⁴,² Sa Xiao,⁵ Xueheng Zhang,²,⁶,⁴ Yuhong Yu,²,⁶,⁴ Weibo He,⁵,† Liangwen Chen,⁴,²,⁶,‡ Yu Zhang,¹,§ Lei Yang,²,⁴,⁶ and Zhiyu Sun²,⁴,⁶

¹School of Physics, Hefei University of Technology, Hefei 230601, China
²Institute of Modern Physics, CAS, Lanzhou 730000, China
³Frontiers Science Center for Rare Isotopes, Lanzhou University, Lanzhou, 730000, China
⁴Advanced Energy Science and Technology Guangdong Laboratory, Huizhou 516000, China
⁵Institute of Materials, China Academy of Engineering Physics, Jiangyou 621907, China
⁶School of Nuclear Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China

Cosmic-ray muons exhibit distinct scattering angle distributions when interacting with materials of different atomic numbers (Z values), enabling the identification of various Z-class materials—particularly radioactive high-Z nuclear materials. Traditional identification methods rely primarily on complex statistical iterative reconstruction or simple trajectory approximation. While supervised machine learning offers some improvement, it heavily depends on prior knowledge of target materials, significantly limiting practical applicability for detecting concealed materials. This work introduces transfer learning to muon tomography for the first time. We propose two lightweight neural network models for fine-tuning and adversarial transfer learning that utilize muon scattering data from bare materials to predict the Z-class of materials coated with typical shieldings (e.g., aluminum or polyethylene), simulating practical scenarios such as cargo inspection and arms control. By introducing a novel inverse cumulative distribution-based sampling method, we obtain more accurate scattering angle distributions from data, yielding nearly 4% improvement in prediction accuracy compared to traditional random sampling-based training. When applied to coated materials with limited or even unlabeled muon tomography data, our method achieves overall prediction accuracy exceeding 96%, with high-Z materials reaching nearly 99%. Simulation results indicate that transfer learning improves prediction accuracy by approximately 10% compared to direct prediction without transfer. This study demonstrates the effectiveness of transfer learning in overcoming physical challenges associated with limited labeled/unlabeled data and highlights its promising potential in muon tomography.

Keywords: Transfer learning · Muon scattering · Z-class identification · Neural network

Introduction

Muon tomography has played an important role in detection and imaging over the past decades [4–7]. Muons were discovered in cosmic rays by Carl D. Anderson and Seth H. Neddermeyer in 1937. These particles are generated when primary cosmic rays collide with atomic nuclei in the upper atmosphere, initiating nuclear-electromagnetic cascades. Most cosmic-ray muons originate from decays of charged pions and kaons produced in these interactions. Below 1 GeV, the cosmic-ray muon energy spectrum is nearly flat, but it steepens in the 10–100 GeV range, closely following the primary cosmic-ray spectrum. Above 100 GeV, the spectrum becomes even steeper because high-energy pions are more likely to interact in the atmosphere before decaying into muons [1]. At sea level, muons are the most abundant charged particles, with an intensity of approximately 1 cm⁻² min⁻¹ [2, 3]. Like other charged particles, muons interact with matter, losing energy and undergoing multiple scattering. However, their interactions are purely electroweak, resulting in significantly lower energy loss than for most other particles and granting them exceptional penetration capability.

In 1970, muon transmission imaging was first used by Alvarez et al. to search for hidden chambers inside a pyramid [8]. Since then, muon techniques have been widely applied to nuclear safeguards [9], volcano studies [10], and underground tunneling [11]. Cheng et al. applied muon radiography to investigate the internal density distribution of the Laoheishan volcanic cone [12]. In 2003, Los Alamos National Laboratory (LANL) first introduced muon scattering tomography to security detection and material identification [13–15], underscoring its immense potential for detecting special nuclear materials such as illicit uranium concealed within cargo and containers. Leveraging these distinctive physical properties, muon tomography has become an effective method for detecting large-scale, high-density objects.

Z classification—grouping materials based on atomic number, typically divided into low-Z, mid-Z, and high-Z categories—reflects both physical characteristics (e.g., scattering behavior) and practical needs in inspection and nuclear verification tasks [16]. Z-class identification based on muon scattering is crucial for security screening and industrial applications. Xiao et al. presented a modified multi-group model to improve image resolution of high-Z materials [17]. Ji et al. proposed a method for imaging materials using the ratio of secondary particles produced by muons [18]. Yu et al. proposed a novel imaging reconstruction method with large voxels and angle capping to reduce time and storage consumption [19]. However, most traditional identification methods rely on complicated reconstruction of muon events and track fitting processes, which significantly increase algorithm design and computational costs.

Compared to traditional physics-based reconstruction methods, deep learning models can automatically extract and learn complex features from data. This end-to-end paradigm significantly improves computational accuracy and efficiency [20, 21]. Deep learning techniques have demonstrated outstanding performance in compressed sensing, which is well-suited for image reconstruction and array processing [22–24]. Meanwhile, deep learning has shown great potential in physics, particularly in nuclear and particle physics [25–27]. Common deep learning methods rely on supervised learning based on labeled samples. Gao et al. proposed a convolutional neural network for feature extraction to classify materials based on muon scattering [28]. Our previous work also explored a feasible solution for muon-based material identification using supervised deep learning [29, 30]. However, in practical scenarios, obtaining labeled scattering angle data for coated materials is the major challenge. This scarcity of labeled data limits the practical application of supervised learning for coated material prediction.

As a critical deep learning strategy, transfer learning enables knowledge learned in one domain (the source domain) to be transferred to another related domain (the target domain), effectively mitigating limited labeled data problems [31–33]. This inspires our proposal of a lightweight neural network based on transfer learning for Z-class identification of coated materials using muon scattering data. We define Z-class identification of bare materials as the source domain task and that of coated materials as the target domain task. This formulation is motivated by realistic applications—such as cargo inspection [34], nuclear safeguard verification, and arms control [35, 36]—where internal composition is unknown or only partially known. By introducing a novel data preprocessing and sampling method, the model transfers the feature-label mapping learned from bare material scattering angles to predict Z-class for coated materials. We designed two coated material scenarios using Al and PE as coating materials (Fig. 1 [FIGURE:1]). Utilizing two transfer learning paradigms—fine-tuning [37] and Domain-Adversarial Neural Network (DANN) [38]—we achieve Z classification for nine coated materials (three each from high, mid, and low-Z categories).

To evaluate our approach, we conducted simulations on Z-class classification tasks with muon scattering angle data using Geant4 Monte Carlo simulation [39]. Results demonstrate that this approach effectively transfers scattering angle features from bare materials, enabling accurate classification of coated ones. Our method not only reduces reliance on large-scale labeled data for coated materials but also maintains excellent Z-class classification accuracy within minutes, particularly achieving superior accuracy for the application-critical high-Z class identification. Our contributions are threefold:

  1. We propose a novel sampling method combining inverse cumulative distribution function (CDF), integration, and interpolation that improves the feature expression capability of muon scattering angle samples, optimizing neural network training effectiveness.

  2. We develop a fine-tuning-based transfer learning model for fast Z-class prediction when coated material labels are scarce, and a DANN-based transfer learning model built on adversarial principles for stable Z-class prediction when coated material labels are completely unavailable.

  3. We provide comprehensive result analysis and physical correlation interpretation, demonstrating potential application scenarios for each method and offering a diversified scheme for applying and extending transfer learning in muon techniques such as cargo inspection and nuclear safeguards.

To our knowledge, this is the first study to apply transfer learning to muon tomography. This research provides an alternative machine learning scheme to traditional identification methods and demonstrates the great potential of transfer learning in mitigating high-cost reconstruction and data scarcity challenges in muon-based applications with similar scenarios.

[FIGURE:1]

II. Muon Scattering Simulation and Sampling Method

A. Simulation Setups

Scattering angle data were generated using Geant4 simulation incorporating the CRY (Cosmic-Ray Shower Library) software [40], which produces muons with the energy and angular distribution of cosmic muons at sea level. The object dimensions are 10 cm × 10 cm × 10 cm; it may be coated on all six sides with a 1 cm thick layer, resulting in an overall size of 12 cm × 12 cm × 12 cm for coated objects. Four position-sensitive detectors (30 cm × 30 cm) are placed above and below the object to measure incident and emergent muon trajectories for angle calculations. A 20 cm spacing between the second and third detectors accommodates tested materials at the center of this gap. The distance between detectors 1-2 and detectors 3-4 is 35 mm each. Notably, the simulation uses ideal detectors without characteristics such as detector noise. This setup is illustrated in Fig. 2 [FIGURE:2].

[FIGURE:2]

First, we established a muon source with 1 GeV energy and conducted scattering angle simulations for nine bare materials (Mg, Al, Ti, Fe, Cu, Zn, W, Pb, U) to verify data reliability. Figure 3 [FIGURE:3] presents probability distribution statistics of simulated scattering angle data using kernel density estimation (KDE), categorized by both Z classes and individual materials. Results indicate significant differences in scattering angle distributions among different Z-class materials, as well as among materials within the same Z class. In the formal experiment, we performed simulations based on the energy and angular distributions of cosmic-ray muons at sea level. For each material in different scenarios (bare, Al-coated, or PE-coated), we simulated 500,000 muon scattering angle data points. The scattering angle in our simulation is the included angle between incoming and outgoing muon tracks in 3D space, calculated as:

$$\theta = \arccos\left(\frac{\vec{\mu}_{in} \cdot \vec{\mu}_{out}}{|\vec{\mu}_{in}||\vec{\mu}_{out}|}\right)$$
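Eq. 1 amounts to a few lines of NumPy; the track vectors below are illustrative, not simulation output:

```python
import numpy as np

def scattering_angle(mu_in: np.ndarray, mu_out: np.ndarray) -> float:
    """3D included angle (radians) between incoming and outgoing muon tracks (Eq. 1)."""
    cos_theta = np.dot(mu_in, mu_out) / (np.linalg.norm(mu_in) * np.linalg.norm(mu_out))
    # Clip to guard against floating-point values marginally outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: a downward track deflected by 10 mrad in the x-z plane.
v_in = np.array([0.0, 0.0, -1.0])
v_out = np.array([np.sin(0.01), 0.0, -np.cos(0.01)])
theta = scattering_angle(v_in, v_out)  # 0.01 rad
```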

B. Inverse CDF Sampling

As shown in Fig. 3, the primary distinguishing feature of muon scattering through different materials lies in variations of their scattering angle distributions. Additionally, high-quality training samples contribute to more effective neural networks [41, 42]. When employing a neural network as a mapping function between scattering angle data and material properties, the model should learn features that facilitate differentiation between materials based on their scattering characteristics. To effectively capture distribution characteristics of simulation data, we introduce a sampling method based on the inverse cumulative distribution function.

First, we uniformly select a set of quantile points within the overall scattering angle distribution range for a given material and use non-parametric methods to estimate the probability density function (PDF) at these quantile points. Corresponding CDF values are then computed from these PDF values. Finally, training samples are obtained through inverse CDF sampling. The complete sampling process is shown in Fig. 4 [FIGURE:4], with pseudocode provided in Appendix 1. Compared with traditional random sampling (RS), this method generates samples sharing the same probability distribution as the overall simulation data.

[FIGURE:4]

We employed two non-parametric estimation methods—KDE [43] and histogram estimation (HE)—to compute probability density values at selected quantile points. The fundamental idea of KDE is to use a smooth kernel function to perform weighted averaging of data points, thereby obtaining a probability density estimate. In other words, it calculates the weighted contribution of data points at a given quantile point. Selecting specific quantile points rather than using entire data ensures computational efficiency and accurate distribution estimation while mitigating impact from fine-grained details that could disrupt overall probability distribution smoothness. For a selected quantile point $X_q$, its kernel density estimate of the PDF is given by:

$$f_k(X_q) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{X_q - X_i}{h}\right)$$

where $K$ is the kernel function, $h$ is the bandwidth parameter controlling estimation smoothness, $n$ is the number of data points, and $X_i$ are the observed data values. We employ a Gaussian kernel function with the Silverman method for automatic bandwidth adjustment.
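As an illustration, SciPy's `gaussian_kde` implements exactly this Gaussian-kernel estimate with Silverman bandwidth selection. The gamma-distributed synthetic angles and the number of quantile points below are placeholders for the simulated scattering data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Stand-in for simulated scattering angles: a peaked, long-tailed distribution (mrad).
angles = rng.gamma(shape=2.0, scale=5.0, size=50_000)

# Uniformly spaced quantile points spanning the observed range (Q = 200 is an assumed choice).
Q = 200
grid = np.linspace(angles.min(), angles.max(), Q)

# Gaussian-kernel KDE with Silverman's rule for the bandwidth h (Eq. 2).
kde = gaussian_kde(angles, bw_method="silverman")
pdf_kde = kde(grid)  # PDF estimates at the quantile points
```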

In contrast, the HE method is more straightforward and intuitive. It uniformly divides the overall data range into $b$ bins and counts data points falling within each bin:

$$f_h(x) = \frac{1}{nh}\sum_{i=1}^{n} \mathbf{1}(X_i \in B_j), \quad x \in B_j$$

where $h$ is the bin width, $n$ is the total number of samples, and $1(X_i \in B_j)$ is an indicator function that equals 1 if data point $X_i$ falls within bin $B_j$ and 0 otherwise. KDE provides smoother PDF estimates, enabling acquisition of higher-quality samples. However, for sparse data or data with extreme values, KDE may lose accuracy. HE is more stable, but improper bin selection can lead to overfitting or underfitting. Given the peaked and long-tailed characteristics of our simulation data, we incorporate both methods into the sampling process.

To convert discrete PDF values into continuous CDF values, we employ the compound trapezoidal rule (Eq. 4) and cubic spline interpolation (Eq. 5). The integral interval is divided into multiple sub-intervals with the trapezoidal rule applied within each to improve integration accuracy:

$$F(X_q) \approx \frac{X_q - X_{q-1}}{2}\left[f(X_1) + 2\sum_{p=2}^{q-1}f(X_p) + f(X_q)\right]$$

where $f(X_q)$ is obtained from Eq. 2 or Eq. 3 depending on the PDF calculation method. A smooth, continuous CDF is obtained using cubic spline interpolation between discrete CDF values, with the parameters $a_j$, $b_j$, $c_j$, and $d_j$ determined by zeroth-, first-, and second-order continuity conditions between adjacent CDF value pairs:

$$F_j(x) = a_j(x - X_j)^3 + b_j(x - X_j)^2 + c_j(x - X_j) + d_j$$

Based on the calculated CDF values, we conduct $N_s$ rounds of sampling for each material's simulation data under different experimental scenarios. In each iteration, we generate $n$ random numbers $u_i$ drawn from the uniform distribution $U(0, 1)$ and apply the inverse CDF to obtain $n$ scattering angle data points $\{x_1, x_2, \ldots, x_n\}$:

$$x_i = F^{-1}(u_i)$$
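As a concrete sketch, the trapezoidal integration (Eq. 4), spline smoothing (Eq. 5), and inverse-CDF draw (Eq. 6) can be chained as follows. The grid size, toy PDF, and dense-grid factor are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def inverse_cdf_sample(grid, pdf, n, rng):
    """Draw n values whose distribution follows the estimated pdf on `grid`."""
    # Composite trapezoidal rule turns discrete PDF values into CDF values (Eq. 4).
    dx = np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * dx * (pdf[:-1] + pdf[1:]))])
    cdf /= cdf[-1]                              # normalise so F spans [0, 1]
    # Cubic spline interpolation gives a smooth, continuous CDF (Eq. 5).
    spline = CubicSpline(grid, cdf)
    dense_x = np.linspace(grid[0], grid[-1], 10 * len(grid))
    dense_F = np.clip(spline(dense_x), 0.0, 1.0)
    dense_F = np.maximum.accumulate(dense_F)    # enforce monotonicity for inversion
    # Uniform random numbers mapped through the inverse CDF (Eq. 6).
    u = rng.uniform(0.0, 1.0, size=n)
    return np.interp(u, dense_F, dense_x)       # x_i = F^{-1}(u_i)

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 50.0, 200)
pdf = np.exp(-grid / 10.0)                      # toy long-tailed PDF (unnormalised)
sample = inverse_cdf_sample(grid, pdf, 1000, rng)
```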

To prevent overfitting and loss of robustness from high sample similarity, we incorporate a similarity check mechanism. By calculating Euclidean distance between samples generated in each iteration, we filter out excessively similar samples, ultimately producing $N_s$ diverse samples aligned with the overall simulation data distribution.
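A minimal version of this similarity check might look as follows; the greedy acceptance strategy and the distance threshold are illustrative assumptions, since the paper does not specify them:

```python
import numpy as np

def diverse_samples(candidates, min_dist):
    """Greedy similarity check: keep a candidate only if its Euclidean distance
    to every already-accepted sample exceeds min_dist (assumed threshold)."""
    kept = []
    for c in candidates:
        if all(np.linalg.norm(c - k) > min_dist for k in kept):
            kept.append(c)
    return np.array(kept)

rng = np.random.default_rng(2)
cands = rng.normal(size=(50, 16))   # 50 candidate samples of 16 angles each (toy sizes)
filtered = diverse_samples(cands, min_dist=1.0)
```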

Sampling process parameters are listed in Table 1 [TABLE:1]. Sampled scattering angle data are stored as features, while corresponding material Z-classes serve as labels (0, 1, and 2 for low-, mid-, and high-Z), forming feature-label pairs.

[TABLE:1]

For each material in different scenarios (bare, Al-coated, or PE-coated), we acquired 1,000 samples. In specific training and prediction phases, a consistent train-test split is applied across materials. Detailed partitioning ratios for various phases are provided in corresponding parameter tables. Additionally, we applied the traditional random sampling method (RS), which directly samples from raw simulated scattering angle data, using the same sample size as inverse CDF sampling. Comparing training accuracy demonstrates the superiority of our inverse CDF sampling method.

III. Transfer Learning-Based Z-Class Identification

Implicit correlation features exist between source and target domain tasks, constituting the practical feasibility basis for transfer learning [31–33]. In this study, we adopt two transfer learning paradigms: fine-tuning and adversarial transfer learning with DANN. Fine-tuning performs limited parameter adaptation in the target domain based on a pre-trained source domain model, enabling efficient learning of feature-label relationships even with scarce target domain data. Adversarial transfer learning, conversely, applies when target domain labels are completely unknown. By extracting shared discriminative features, it aligns feature distributions between source and target domains, enabling unsupervised classification.

A. Pre-training and Fine-Tuned Transfer

Fine-tuning is an essential neural network transfer technique widely applied due to its low computational cost and high training efficiency under limited target data conditions. We constructed a unified lightweight neural network with two hidden layers for both pre-training and fine-tuning (P&F model). Scattering angle sample data are received by the input layer, processed through two hidden layers for feature extraction, and finally classified by the output layer. Detailed network structure is shown in Fig. 5 [FIGURE:5].

[TABLE:2]

Since the P&F model is designed for multi-classification tasks, we use cross-entropy as the loss function [44]:

$$L_{P\&F} = -\sum_{i=1}^{N}\sum_{j=1}^{C} y_{i,j}\log\left(\text{Softmax}(z_{i,j})\right)$$

where $N$ is the batch size and $C$ is the number of classes (low, mid, and high). For sample $i$, the ground-truth class label $y_{i,j}$ is the one-hot encoding of its Z class, converted automatically by PyTorch. The logits $z_{i,j}$ are the raw neural network outputs, which are normalized by the Softmax activation function inside the loss. The model is trained on the training set, and its prediction accuracy on the test set is recorded to evaluate generalization performance and robustness. Specific parameter settings are detailed in Table 2.
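Eq. 7 can be checked with a few lines of NumPy; up to the mean reduction that PyTorch applies by default, this mirrors what `nn.CrossEntropyLoss` computes from logits and integer labels (the example logits are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch (Eq. 7); labels are integer Z-classes
    (0 = low, 1 = mid, 2 = high), with one-hot encoding handled implicitly."""
    p = softmax(logits)
    n = logits.shape[0]
    return -np.mean(np.log(p[np.arange(n), labels]))

logits = np.array([[4.0, 0.5, 0.1],   # confidently low-Z
                   [0.2, 0.1, 3.0]])  # confidently high-Z
labels = np.array([0, 2])
loss = cross_entropy(logits, labels)  # small, since both predictions are correct
```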

For pre-training, we trained the P&F model with feature-label pairs $(x_s, y_s)$ from the source domain using three sampling methods. The detailed training process is shown in Fig. 6 [FIGURE:6], aiming to learn the mapping between bare material scattering angle data and Z categories. After pre-training, hidden layer parameters are optimized to effectively extract high-order features from raw scattering angle data. In subsequent fine-tuning, input and hidden layer parameters are frozen while only training parameters between the last hidden layer and output layer using fewer target domain training samples.

[FIGURE:6]

In the pre-training stage, we performed supervised training on nine materials from high, mid, and low-Z categories. Results in Table 3 [TABLE:3] indicate that the pre-trained model achieves high prediction accuracy on the test dataset, confirming neural network effectiveness in learning the mapping between scattering angle data and Z categories. Accuracy is defined as the ratio of correct predictions to total sample number.

Before applying fine-tuned transfer learning, we directly evaluated the pre-trained model on two target domains with Al and PE coatings. Test results are shown in Table 4 [TABLE:4] (Pre-train). With inverse CDF sampling (KDE and HE), classification accuracy in the Al-coated target domain decreased by approximately 12%, with the most significant drop in low-Z material prediction accuracy (about 30%). This occurs because, as a low-Z material, Al coating's relative influence on scattering angle distribution is inversely proportional to the coated material's intrinsic Z value. In the PE-coated target domain, pre-trained model prediction accuracy remained almost unchanged (decreasing only about 1%), since PE, as a hydrocarbon compound, is an ultra-low-Z material whose coating minimally impacts metallic material scattering angle distributions. In contrast, RS-trained models showed physically inconsistent patterns due to sampling randomness.

[FIGURE:5]

In our P&F architecture, the first hidden layer extracts fundamental features reflecting low-level scattering physics. These features are largely invariant across domains given shared input data structure. Freezing this layer preserves transferable, physically meaningful representations. The second hidden layer captures more domain-specific cues, but empirical studies showed that fine-tuning it while keeping the first frozen often leads to overfitting due to limited target dataset size. Freezing both hidden layers stabilizes performance by retaining shared representations and reducing domain-specific noise influence. Additionally, freezing both layers reduces trainable parameters, benefiting low-cost scenarios.
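To make the freezing scheme concrete, the following PyTorch sketch freezes both hidden layers and leaves only the output layer trainable. The layer widths and input dimension are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

# Minimal stand-in for the P&F network: two hidden layers plus an output layer.
model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),   # hidden layer 1 (frozen for fine-tuning)
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 2 (frozen for fine-tuning)
    nn.Linear(64, 3),               # output layer: low/mid/high Z (trainable)
)

# Freeze everything except the final classification layer.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

# Only the unfrozen output-layer parameters reach the optimiser.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```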

For fine-tuning, we freeze corresponding parameters and adapt the P&F network with fewer target domain training samples $(x_t, y_t)$. This enables efficient adaptation to target domain tasks. Classification accuracy of the fine-tuned model on target test datasets for Al-coated and PE-coated materials is presented in Table 4 (Fine-tune). Benefiting from well-optimized pre-training parameters, the fine-tuned P&F model demonstrates excellent predictive performance across both tasks. Detailed training processes are shown in Fig. 7 [FIGURE:7].

[FIGURE:7]

Due to globally shared neural network parameters, fine-tuning aims to enhance overall classification accuracy rather than optimizing each class individually. As discriminative ability improves for certain categories, others may deteriorate slightly, causing parameter competition and trade-offs. This may produce minor accuracy decreases for specific classes, though overall target task performance still improves. The observation that training loss exceeds testing loss, with training curves showing greater fluctuation while test curves remain smooth, primarily results from limited training set size. Using only 30% of data for training makes the model more sensitive to outliers and difficult samples, leading to higher average loss and increased variance during optimization. In contrast, the larger test set (70%) provides more reliable performance estimates, effectively averaging out anomalies for smoother, lower loss curves. Furthermore, since parameter updates occur only on the training set, its loss fluctuates across batch iterations, whereas test loss remains relatively stable.

B. Transfer Learning with DANN Model

Identifying Z-class of unknown coated materials presents a more significant challenge as an unlabeled target domain problem. However, by identifying common scattering angle features between coated and bare materials, we can train a neural network to achieve superior classification in the unknown domain. Traditional domain alignment methods typically compute specific mathematical relationships between source and target domains and incorporate them into the loss function [45, 46]. Given that different transfer learning tasks exhibit distinct data characteristics, determining optimal mathematical relationships as training objectives remains highly challenging.

The adversarial concept in neural networks was first introduced in [47], where adversarial models generate samples to enhance robustness. DANN extends adversarial training to transfer learning by incorporating a domain discriminator that enforces feature distribution alignment between source and target domains through adversarial training, thereby facilitating unsupervised learning in the target domain. This approach fully exploits neural network fitting capabilities and enables effective feature alignment without requiring explicit definition of feature relationships between specific source and target domains.

Our DANN model, illustrated in Fig. 8 [FIGURE:8], consists of three main components: a feature extractor, a classifier, and a domain discriminator. During training, labeled scattering angle data from bare materials and unlabeled data from coated materials are fed into the feature extractor. Extracted features are passed to the domain discriminator, while source domain features and corresponding labels serve as classifier input.

[FIGURE:8]

First, the domain discriminator determines whether input features originate from source or target domain while computing loss function $L_{disc}$ to optimize itself, aiming to improve domain feature discrimination. Since the discriminator only classifies whether features belong to source or target domain, we use binary cross-entropy:

$$L_{disc} = -\sum_{i=1}^{N}\left[y_i\log(\text{Sigmoid}(z_i)) + (1-y_i)\log(1-\text{Sigmoid}(z_i))\right]$$

where the ground-truth label $y_i \in \{0, 1\}$ indicates the source or target domain class. The logits $z_i$ represent raw predictions before the Sigmoid activation. Unlike Softmax in cross-entropy, binary cross-entropy requires explicitly defining Sigmoid as the output layer activation function.

Second, when receiving a sample $(F(x_s), y_s)$ from the source domain, the classifier computes loss function $L_{cls}$ for conventional multi-classification, similar to pre-training and fine-tuning. Consequently, $L_{cls}$ is also defined as cross-entropy with the same expression as Eq. 7. Finally, the overall loss function:

$$L_{total} = L_{cls} - \lambda L_{disc}$$

trains the extractor, where $L_{cls}$ improves extractor and classifier prediction accuracy while $-\lambda L_{disc}$ trains the extractor to gradually extract shared features between domains, achieving domain alignment. Gradient reversal ensures an adversarial relationship between extractor and discriminator during training, eventually reaching equilibrium. Gradient reversal parameter $\lambda$ balances feature alignment and classification accuracy improvement. In this study, feature alignment is more challenging than classification, and due to minimal feature distribution discrepancy between domains, $L_{disc}$ remains relatively low. Therefore, we assign higher weight to $\lambda$ to enhance training performance. After training, the extractor and classifier can extract common features from both domains and perform effective classification. Detailed hyperparameters are shown in Table 5 [TABLE:5].
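The gradient reversal that couples the two terms of $L_{total}$ is typically implemented as a custom autograd function. The paper does not list its code, so the following PyTorch sketch shows the standard construction:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the
    backward pass, driving the extractor to fool the domain discriminator."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient w.r.t. lam

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# The reversed gradient is what makes L_total = L_cls - lambda * L_disc
# act adversarially on the feature extractor.
x = torch.ones(3, requires_grad=True)
y = grad_reverse(x, lam=2.0).sum()
y.backward()
# x.grad is now -2 everywhere: the upstream gradient of +1, reversed and scaled.
```

In a full DANN training loop, `grad_reverse` sits between the feature extractor and the domain discriminator, so the discriminator minimises $L_{disc}$ while the extractor receives the negated gradient.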

[FIGURE:9]

Training processes and final DANN results are presented in Fig. 9 [FIGURE:9] and Table 6 [TABLE:6]. Results indicate that the inverse CDF-sampled DANN exhibits slightly lower prediction accuracy for low-Z materials than for mid- and high-Z materials. This is consistent with the pre-trained model's inference in the target domain and can be attributed to significant changes in low-Z material scattering angle distributions after Al coating, which make feature alignment more challenging. Meanwhile, the large gradient-reversal coefficient $\lambda$ causes the overall training loss to become negative, since the weighted adversarial discriminator loss is subtracted in $L_{total}$. This does not affect test set evaluation, where the domain discriminator and gradient reversal are not involved; the test loss, computed solely from the standard cross-entropy, remains positive. This expected behavior does not impact model performance assessment on the target classification task.

[TABLE:6]

IV. Results and Discussion

Overall prediction accuracies of pre-trained, fine-tuned, and DANN models in the target domain are summarized in Table 7 [TABLE:7]. Results indicate that inverse CDF sampling effectively improves sample quality, thereby enhancing prediction accuracy. Due to RS method randomness and neural networks' black-box nature, RS-trained results are difficult to interpret physically. However, since inverse CDF sampling effectively captures scattering angle data distribution, it offers stronger result interpretability. Additionally, as stated in [48], 1,400 scattering instances are sufficient to construct a statistically reliable muon scattering angle probability distribution. Although we compute CDF at selected points, our global interpolation acts on 500,000 instances—far beyond this threshold. Therefore, in practical applications, total training data requirements can be further reduced as long as sample diversity is maintained.

[TABLE:7]

For our two target domain tasks, since PE coating minimally impacts muon scattering angle distribution, model prediction accuracy in the PE task only slightly improved (approaching 100%) after transfer learning. However, in the Al task, transfer learning improved prediction accuracy by approximately 10%. Comparing the two transfer learning methods, fine-tuning achieves slightly higher prediction accuracy than DANN. As summarized in Section III, fine-tuning benefits from supervised learning, leading to more balanced accuracy across Z-class materials. In contrast, although DANN improves low-Z material prediction accuracy by over 20% compared to non-transfer learning, its accuracy remains slightly lower than fine-tuning due to unsupervised target domain training. However, this highlights a key DANN advantage: it requires no target domain labels yet achieves accuracy comparable to fine-tuning. Moreover, since DANN focuses on extracting shared features between domains while fine-tuning learns the entire source domain feature space before transfer (potentially capturing irrelevant features), DANN demonstrates greater stability and robustness. This makes DANN particularly valuable for real-world applications requiring fully unsupervised transfer learning.

The two models contain parameters on the order of $10^4$–$10^5$ (P&F: 72,579 parameters; DANN: 323,332 parameters). In our implementation, both transfer learning methods require runtime memory on the order of $10^2$–$10^3$ MB and run within minutes on a local Intel Core i9-14900K (24-core, 32-thread, up to 6.0 GHz) machine. Model architectures, training processes, and data loading were implemented with PyTorch. This analysis demonstrates that our models and algorithms have low deployment and training costs.
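Parameter counts of this magnitude are easy to verify in PyTorch. The snippet below counts the parameters of an illustrative fully connected classifier; the layer sizes are assumptions for demonstration only, not the paper's exact architecture:

```python
import torch.nn as nn

# Illustrative three-class (low/mid/high-Z) classifier; sizes are
# hypothetical and chosen only to show the counting idiom.
model = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 3),   # three Z classes
)

# Sum the element counts of all weight and bias tensors.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 25,856 + 32,896 + 387 = 59,139
```

The same one-liner applied to the actual P&F and DANN models yields the counts quoted above.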

V. Conclusion

We developed a novel transfer learning method for identifying material Z-class from muon scattering angle data, providing an alternative to traditional, complex physical reconstruction methods. First, we conducted Monte Carlo simulations with Geant4 to obtain scattering angle data for the specified materials in both bare and coated states. We then derived each material's scattering angle probability distribution through density estimation, CDF computation, and cubic spline interpolation. Sampling from these distributions yielded datasets that conform better to the underlying distributions, improving target-domain prediction accuracy by approximately 4% and enhancing the physical interpretability of the training process.

Meanwhile, in real-world applications, the correspondence between scattering angle data and coated materials is often unknown. To address this, we introduced two lightweight neural networks trained via transfer learning. Using either supervised fine-tuning or unsupervised adversarial learning on coated materials, these models transfer source-domain knowledge from bare material data to the coated material target domains. In the PE target domain task, where the scattering angle distribution changes minimally after coating, prediction accuracy reaches 99%. In the more challenging Al-coated task, prediction accuracy improves by approximately 10% over the pre-transfer model. These results demonstrate that our method achieves high prediction accuracy even when the mapping between coated materials and scattering data is scarce or unknown. We analyzed the results across different tasks and scenarios, verifying that machine learning-based Z-class identification is consistent with the physics of muon interactions and validating the feasibility of Z-class prediction for coated materials via transfer learning.

Furthermore, this study reveals that features learned through machine learning exhibit transferability rather than merely relying on repeated application of domain-specific expertise. This suggests that transfer learning-based machine learning methods can serve as cost-effective training approaches for physics research tasks in similar situations.

Future research incorporating additional physical variables beyond the scattering angle, such as muon momentum, is expected to further improve Z classification accuracy and robustness. Real-world detection challenges include detector noise and electronic fluctuations that bias scattering angle measurements, resolution limits that reduce the ability to distinguish materials with similar scattering properties, and system inefficiencies such as dead time and low detection efficiency that degrade data quality. We may introduce noise-aware training to boost model robustness and generalization in real environments. Furthermore, with larger statistics and enhanced model generalization, transfer learning methods are expected to extend to Z-value identification in more complex scenarios.
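As a sketch of the noise-aware training idea mentioned above, one simple variant perturbs each input batch with Gaussian noise during training to mimic detector and electronics fluctuations; the noise scale `sigma` is a hypothetical tuning parameter, not a value from this work:

```python
import torch

# Noise-aware augmentation sketch: add zero-mean Gaussian noise to the
# scattering-angle inputs at every training step, so the model learns
# features robust to measurement fluctuations.  `sigma` is hypothetical.
def noisy_batch(x: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    return x + sigma * torch.randn_like(x)

x = torch.zeros(4, 10)            # stand-in batch of angle features
x_noisy = noisy_batch(x, sigma=0.1)
```

In practice `sigma` would be matched to the measured angular resolution of the detector.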

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (Grants No. 12405402, 12475106, 12105327, 12405337) and the Guangdong Basic and Applied Basic Research Foundation, China (Grant No. 2023B1515120067). Computing resources were primarily provided by the supercomputing system in the Dongjiang Yuan Intelligent Computing Center.

Appendix: Algorithmic Pseudo-Code

Algorithm 1: Data Conversion and Sampling Process

Input: total material set $M$, Z-class dictionary of materials $Z$, total simulation data .root
Output: training dataset $D_{train}$ ($M \times N_s \times n \times r_1$), test dataset $D_{test}$ ($M \times N_s \times n \times r_2$)

Parameters: number of simulation data per material $N^*$; number of samples $N_s$; number of scattering angles $n$ per sample; similarity threshold $k$; number of mean quantile points $q$; training:test ratio $r_1:r_2$.

// data format conversion
// time complexity: O(N*) space complexity: O(N*)
1. for material m in M do
2.   read .root data for material m;
3.   if .root data is not None then
4.     remove NaN values and trim to same size;
5.     (features, label) ← (processed .root, Z-class of the material);
6.     total dataset table D*_M ← (features, label) with .csv format;
7.   end
8. end
9. return D*

// sampling process
// time complexity: O(M × (N* + N_s² × n)) space complexity: O(M × N* + M × N_s × n)
10. for data of material m D*_m in D* do
11.   X ← features in D*_m
12.   Y ← label in D*_m
13.   X_q ← selected q points in X;
14.   f(X_q) ← PDF values calculated on X_q;
15.   CDF with the composite trapezoidal rule: F(X_p) ← (X_q - X_1)/(2(q-1)) × [f(X_1) + 2∑_{j=2}^{p-1} f(X_j) + f(X_p)], for p = 2, ..., q;
16.   interpolate F globally with a cubic spline;
17.   build sample list D_m ← ∅ for material m;
18.   for I ← 1 to N_s do
19.     u_1, ..., u_n ← n numbers drawn i.i.d. from U(0, 1);
20.     x_i ← F^{-1}(u_i), where i = 1, 2, ..., n;
21.     x_I ← {x_1, ..., x_n};
22.     if D_m is not empty and similarity between (x_I, y) and samples in D_m > k then
23.       discard x_I and resample: I ← I - 1;
24.     else add (x_I, y) to D_m;
25.   end
26.   return D_m = {(x_1, y), ..., (x_{N_s}, y)};
27. end
28. integrate all material sample lists into dataset D_M;
29. divide training and test datasets: D_train ← D_M × r_1, D_test ← D_M × r_2;
30. return D_train and D_test;
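The inverse CDF sampling stage of Algorithm 1 can be sketched in a few lines of Python. Here a toy angle distribution stands in for the Geant4 data, and linear interpolation of the empirical CDF stands in for the cubic spline; everything below is an illustrative assumption, not the paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the simulated scattering-angle data; the real angles
# come from the Geant4 simulation described in the paper.
angles = np.abs(rng.normal(loc=0.0, scale=5.0, size=50_000))

# Empirical PDF on q bins; CDF accumulated with the trapezoidal rule.
q = 200
pdf, edges = np.histogram(angles, bins=q, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
cdf = np.concatenate(
    [[0.0], np.cumsum((pdf[:-1] + pdf[1:]) / 2 * np.diff(centers))]
)
cdf /= cdf[-1]  # normalize so F ranges over [0, 1]

# Inverse-CDF sampling: push uniform u through F^{-1} by interpolating
# the (F, angle) pairs with the roles of x and y exchanged.
u = rng.uniform(size=1000)
samples = np.interp(u, cdf, centers)
```

Because the samples are drawn through F⁻¹, their empirical distribution tracks the estimated PDF rather than the raw fluctuations of any one random subset, which is the property the RS comparison in the text exploits.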

Algorithm 2: Pre-training and Fine-tune Transfer Learning

Input: source training dataset $D_{train}^S$, source test dataset $D_{test}^S$, target training dataset $D_{train}^T$, target test dataset $D_{test}^T$
Output: pre-trained model $\Theta_P(x, \theta_p)$, fine-tuned model $\Theta_F(x, \theta_f)$

Parameters: batch-size $N$, number of epochs $T$, learning rate $\alpha$, optimizer Adam, neural network model $\Theta(x, \theta)$.

// Pre-training process
1. for t ← 1 to T do
2.   for D_train in D_{train}^S with batch-size N
3.     // Iterate over dataset D_{train}^S
4.   X_i ← features in D_train
5.   Y_i ← labels in D_train
6.   \hat{Y}_i ← Θ(X_i, θ);
7.     loss function CrossEntropy: L = -(1/N) ∑_{i=1}^{N} ∑_{j=1}^{3} Y_{i,j} log \hat{Y}_{i,j}, where j indexes the three Z classes;
8.     loss.backward();
9.     θ_p ← update model weights θ with Adam at learning rate α;
10.    acctrain ← accumulate accuracies with Y_i and \hat{Y}_i;
11.  end
12.  for D_test in D_{test}^S with batch-size N
13.    X_i^* ← features in D_test
14.    Y_i^* ← labels in D_test
15.    \hat{Y}_i^* ← Θ(X_i^*, θ);
16.    acctest ← accumulate accuracies with Y_i^* and \hat{Y}_i^*;
17.    (acc1, acc2, acc3) ← accumulate accuracies of (Low-Z, Mid-Z, High-Z) with Y_i^* and \hat{Y}_i^*;
18.  end
19.  store acctrain, acctest, acc1, acc2, acc3;
20. end
21. return pre-trained model Θ_P(x, θ_p);

// Fine-tuning process
22. load model Θ_P(x, θ_p);
23. Θ_F(x, θ) ← freeze the earlier layers of Θ_P(x, θ_p) and update only the last fully connected layer;
24. train model Θ_F(x, θ) with D_{train}^T and test with D_{test}^T (same training process as pre-training);
25. store acctrain, acctest, acc1, acc2, acc3;
26. return fine-tuned model Θ_F(x, θ_f);
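The freeze-and-retrain step of Algorithm 2 maps onto a few lines of PyTorch. The architecture and batch below are illustrative stand-ins, not the paper's model:

```python
import torch
import torch.nn as nn

# Minimal fine-tuning sketch: freeze all layers of a pre-trained network
# and re-train only the last fully connected layer.  Sizes are hypothetical.
model = nn.Sequential(
    nn.Linear(100, 64), nn.ReLU(),
    nn.Linear(64, 3),  # last FC layer: three Z classes
)

for p in model.parameters():
    p.requires_grad = False        # freeze everything...
for p in model[-1].parameters():
    p.requires_grad = True         # ...then unfreeze the last layer

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 100)           # stand-in target-domain batch
y = torch.randint(0, 3, (32,))     # stand-in Z-class labels
loss = loss_fn(model(x), y)
loss.backward()                    # gradients flow only into the last layer
optimizer.step()
```

Only the unfrozen parameters receive gradients, so the source-domain feature extractor is preserved while the classifier head adapts to the coated-material target domain.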

Algorithm 3: Training Process of DANN

Input: source training dataset $D_{train}^S$, target training dataset $D_{train}^T$, target test dataset $D_{test}^T$
Output: trained feature extractor $F(x, f)$, classifier $C(x, c)$, and domain discriminator $D(x, d)$

Parameters: batch-size $N$, number of epochs $T$, learning rates $\alpha_1$, $\alpha_2$, inversion parameter $\lambda$; feature extractor $F$, classifier $C$, domain discriminator $D$.

// Training process of DANN
1. for t ← 1 to T do
2.   (X_S, Y_S), (X_T, ∅) ← mini-batches of size N drawn from D_{train}^S and D_{train}^T;
3.   F_S ← F(X_S), F_T ← F(X_T);
4.   \hat{Y}_S ← C(F_S);
5.   cls_loss ← CrossEntropy(\hat{Y}_S, Y_S);
6.   // Update domain discriminator
7.   domain_labels ← [1 for F_S, 0 for F_T];
8.   domain_pred ← D([F_S, F_T]);
9.   disc_loss ← BinaryCrossEntropy(domain_pred, domain_labels);
10.  disc_loss.backward();
11.  d* ← update D(x, d) weights d with Adam at learning rate α_2;
12.  // Update feature extractor and classifier
13.  total_loss ← cls_loss - λ × disc_loss;
14.  total_loss.backward();
15.  f*, c* ← update F(x, f) and C(x, c) weights f and c with Adam at learning rate α_1;
16.  acctrain ← accumulate accuracies with \hat{Y}_S and Y_S;
17. end

// Evaluation on target domain
18. set every model to evaluation mode (disable gradient updating);
19. for D_test in D_{test}^T with batch-size N
20.   (X, Y) ← D_test
21.   F_T ← F(X);
22.   \hat{Y} ← C(F_T);
23.   acctest ← accumulate accuracies with \hat{Y} and Y;
24.   compute accuracies for each category (Low-Z, Mid-Z, High-Z);
25. end
26. store acctrain, acctest, and the accuracy for each category (Low-Z, Mid-Z, High-Z);
27. return trained model F(x, f*), C(x, c*) and D(x, d*);
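The adversarial coupling in Algorithm 3 is commonly implemented with a gradient reversal layer: the identity in the forward pass, but scaling gradients by -λ in the backward pass, so the feature extractor is trained to fool the domain discriminator. A minimal sketch in PyTorch (λ = 0.5 is an arbitrary illustrative value):

```python
import torch

# Gradient reversal layer (GRL): forward is the identity, backward
# multiplies the incoming gradient by -λ.
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Negated, scaled gradient for x; no gradient for λ itself.
        return -ctx.lamb * grad_output, None

feats = torch.randn(8, 16, requires_grad=True)  # stand-in features
out = GradReverse.apply(feats, 0.5)
out.sum().backward()   # upstream gradient of 1 arrives at feats as -0.5
```

Placing this layer between the feature extractor F and the domain discriminator D lets a single backward pass realize the opposing objectives written as total_loss = cls_loss - λ × disc_loss in Algorithm 3.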

References

[1] P. Shukla, S. Sankrith, Energy and angular distributions of atmospheric muons at the Earth (2016). arXiv:1606.06907
[2] S.H. Neddermeyer, C.D. Anderson, Cosmic-ray particles of intermediate mass. Phys. Rev. 54, 88-89 (1938). doi:10.1103/PhysRev.54.88.2
[3] M. Tanabashi, P.D. Grp, K. Hagiwara et al., Review of particle physics: particle data group. Phys. Rev. D. 98, 030001 (2018). doi:10.1103/PhysRevD.98.030001
[4] N. Su, Y.Y. Liu, L. Wang et al., A Comparison of Muon Flux Models at Sea Level for Muon Imaging and Low Background Experiments. Front. Energy. Res. 9, 750159 (2021). doi:10.3389/fenrg.2021.750159
[5] H. Miyadera, C.L. Morris, Muon scattering tomography: review. Appl. Opt. 61, C154-C161 (2022). doi:10.1364/AO.445806
[6] A. Hannah, A.F. Alrheli, D. Ancius et al., Muon Imaging. (International Atomic Energy Agency, Vienna, 2022), pp. 5-8
[7] H.K.M. Tanaka, C. Bozza, A. Bross et al., Muography. Nat. Rev. Methods Primers 3, 88 (2023). doi:10.1038/s43586-023-00245-4
[8] L.W. Alvarez, J.A. Anderson, F.E. Bedwei et al., Search for hidden chambers in the pyramids. Science. 167, 3919 (1970). doi:10.1126/science.167.3919.832
[9] S. Chatzidakis, C.K. Choi, L.H. Tsoukalas, Analysis of spent nuclear fuel imaging using multiple coulomb scattering of cosmic muons. IEEE. T. Nucl. Sci. 63, 2866-2874 (2016). doi:10.1109/TNS.2016.2618009
[10] G. Leone, H.K.M. Tanaka, M. Holma et al., Muography as a new complementary tool in monitoring volcanic hazard: implications for early warning systems. Proc. R. Soc. A. 477, 2255 (2021). doi:10.1098/rspa.2021.0320
[11] R. Han, Q. Yu, Z. Li et al., Cosmic muon flux measurement and tunnel overburden structure imaging. JINST. 15, 06019 (2020). doi:10.1088/1748-0221/15/06/P06019
[12] Y.P. Cheng, R. Han, Z.W. Li et al., Imaging internal density structure of the Laoheishan volcanic cone with cosmic ray muon radiography. Nucl. Sci. Tech. 33, 88 (2022). doi:10.1007/s41365-022-01072-4
[13] K.N. Borozdin, G.E. Hogan, C. Morris et al., Radiographic imaging with cosmic-ray muons. Nature. 422, 277 (2003). doi:10.1038/422277a
[14] W.C. Priedhorsky, K.N. Borozdin, G.E. Hogan et al., Detection of high-Z objects using multiple scattering of cosmic ray muons. Rev. Sci. Instrum. 74, 4294–4297 (2003). doi:10.1063/1.1606536
[15] L.J. Schultz, K.N. Borozdin, J.J. Gomez et al., Image reconstruction and material Z discrimination via cosmic ray muon radiography. Nucl. Instrum. Meth. A. 519, 687-694 (2004). doi:10.1016/j.nima.2003.11.035
[16] P. Lalor, A. Danagoulian, Fundamental limitations of dual energy X-ray scanners for cargo content atomic number discrimination. Appl. Radiat. Isot. 206, 111201 (2024). doi:10.1016/j.apradiso.2024.111201
[17] S. Xiao, W.B. He, M.C. Lan et al., A modified multi-group model of angular and momentum distribution of cosmic ray muons for thickness measurement and material discrimination of slabs. Nucl. Sci. Tech. 29, 28 (2018). doi:10.1007/s41365-018-0386-1
[18] X.T. Ji, S.Y. Luo, Y.H. Huang et al., A novel 4D resolution imaging method for low and medium atomic number objects at the centimeter scale by coincidence detection technique of cosmic-ray muon and its secondary particles. Nucl. Sci. Tech. 33, 2 (2022). doi:10.1007/s41365-022-00989-0
[19] P. Yu, Z.w. Pan, Z.Y. He et al., A new efficient imaging reconstruction method for muon scattering tomography. Nucl. Instrum. Meth. A. 1069, 169932 (2024). doi:10.1016/j.nima.2024.169932
[20] X.C. Ming, H.F. Zhang, R.R. Xu et al., Nuclear mass based on the multi-task learning neural network method. Nucl. Sci. Tech. 33, 48 (2022). doi:10.1007/s41365-022-01031-z
[21] G.L. Wang, K.M. Chen, S.W. Wang et al., Beam based alignment using a neural network. Nucl. Sci. Tech. 35, 75 (2024). doi:10.1007/s41365-024-01436-y
[22] Y. Xie, Q. Li, A review of deep learning methods for compressed sensing image reconstruction and its medical applications. Electronics. 11(4), (2022). doi:10.3390/electronics11040586
[23] A. Bazzi, D.T.M. Slock, L. Meilhac et al., A comparative study of sparse recovery and compressed sensing algorithms with application to AoA estimation. Paper presented at IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Edinburgh, UK, July. 2016. doi:10.1109/SPAWC.2016.7536780
[24] A. Bazzi, D.T. Slock, and L. Meilhac, A Newton-type Forward Backward Greedy method for multi-snapshot compressed sensing. Paper presented at 51st Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, Oct. 2017. doi:10.1109/ACSSC.2017.8335537
[25] A. Radovic, M. Williams, D. Rousseau et al., Machine learning at the energy and intensity frontiers of particle physics. Nature. 560, 41 (2018). doi:10.1038/s41586-018-0361-2
[26] G. Karagiorgi, G. Kasieczka, S. Kravitz et al., Machine learning in the search for new fundamental physics. Nat. Rev. Phys. 4, 399–412 (2022). doi:10.1038/s42254-022-00455-1
[27] A. Boehnlein, M. Diefenthaler, N. Sato et al., Colloquium: Machine learning in nuclear physics. Rev. Mod. Phys. 94, 031003 (2022). doi:10.1103/RevModPhys.94.031003
[28] C.Y. Gao, X.Z. Tang, X.N. Chen et al., Convolutional neural network algorithm for material discrimination in muon scattering tomography. Atom. Energy. Sci. Techno. 57, 353-361 (2023). doi:10.7538/yzk.2022.youxian.0055
[29] W.B. He, Dissertation, Nuclear Science and Technology University of Science and Technology of China, 2019 (in Chinese)
[30] W.B. He, D.Y. Chang, R.G. Shi et al., Material discrimination using cosmic ray muon scattering tomography with an artificial neural network. Radiat. Detect. Technol. Methods. 6, 254–261 (2022). doi:10.1007/s41605-022-00319-3
[31] S.J. Pan, Q. Yang, A Survey on Transfer Learning. IEEE. T. Knoml. Data. En. 22, 1345 (2010). doi:10.1109/TKDE.2009.191
[32] F. Zhuang, Z.Y. Qi, K.Y. Duan et al., A Comprehensive Survey on Transfer Learning. in Proceedings of the IEEE. 109, 43 (2021). doi:10.1109/JPROC.2020.3004555
[33] F.Z. Zhuang, P. Luo, Q. He et al., Survey on Transfer Learning Research. J. Softw. 26, 26-39 (2015). doi:10.13328/j.cnki.jos.004631
[34] L.J. Schultz, G.S. Blanpied, K.N. Borozdin et al., Statistical Reconstruction for Cosmic Ray Muon Tomography. IEEE. T. Image Process. 16, 1985-1993 (2007). doi:10.1109/TIP.2007.901239
[35] A. Clarkson, D.G. Ireland, R.A. Jebali et al., Characterising encapsulated nuclear waste using cosmic-ray Muon Tomography (MT). Paper presented at the 4th International Conference on Advancements in Nuclear Instrumentation Measurement Methods and their Applications (ANIMMA), Lisbon, Portugal, Apr. 2015. doi:10.1109/ANIMMA.2015.7465529
[36] J.M. Durham, Cosmic Ray Muon Radiography Applications in Safeguards and Arms Control (2018). arXiv:1808.06681
[37] J. Yosinski, J. Clune, Y. Bengio et al., in Advances in Neural Information Processing Systems 27 (NeurIPS 2014), Montreal, Canada, December 2014 (Curran Associates, Inc., 2014), pp. 3320-3328
[38] Y. Ganin, E. Ustinova, H. Ajakan et al., in Domain Adaptation in Computer Vision Applications, ed. by G. Csurka (Springer, Cham, 2017), pp. 189–209. doi:10.1007/978-3-319-58347-1_10
[39] S. Agostinelli, J. Allison, K. Amako et al., Geant4—a simulation toolkit. Nucl. Instrum. Meth. A. 506, 250-303 (2003). doi:10.1016/S0168-9002(03)01368-8
[40] C. Hagmann, D. Lange, and D. Wright, Cosmic-ray shower generator for Monte Carlo transport codes. Paper presented at the 2007 IEEE Nuclear Science Symposium Conference Record, Honolulu, HI, USA, Oct. 2007. doi:10.1109/NSSMIC.2007.443720
[41] A. Halevy, P. Norvig, F. Pereira, The unreasonable effectiveness of data. IEEE. INTELL. SYST. 24, 8-12 (2009). doi:10.1109/MIS.2009.36
[42] J. Deng, W. Dong, R. Socher et al., ImageNet: A large-scale hierarchical image database. Paper presented at the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, June. 2009. doi:10.1109/CVPR.2009.5206848
[43] Y.C. Chen, A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology, 1, 161–187 (2017). doi:10.1080/24709360.2017.1396742
[44] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning. (MIT Press, Cambridge, 2016), pp. 131-134
[45] B. Sun, K. Saenko, Deep CORAL: Correlation Alignment for Deep Domain Adaptation. Paper presented at the ECCV 2016 Workshops, Amsterdam, The Netherlands, Oct. 2016. doi:10.1007/978-3-319-49409-8_35
[46] A. Gretton, K.M. Borgwardt, M.J. Rasch et al., A Kernel Two-Sample Test. J. MACH. LEARN. RES. 13, 723-773 (2012). http://jmlr.org/papers/v13/gretton12a.html
[47] I.J. Goodfellow, J. Pouget-Abadie, M. Mirza et al., in Advances in Neural Information Processing Systems 27 (NeurIPS 2014), Montreal, Canada, December 2014 (Curran Associates, Inc., 2014), pp. 2672–2680
[48] Y.P. Li, X.Z. Tang, X.N. Chen et al., Experimental study on material discrimination based on muon discrete energy. Acta Phys. Sin. 72, 026501 (2023). doi:10.7498/aps.72.20221645
