ChinaRxiv

Residual resampling-based physics-informed neural network for neutron diffusion equations

Heng Zhang, Yun-Ling He, Dong Liu, Qin Hang, He-Min Yao, Di Xiang

Submitted 2025-07-26 | ChinaXiv: chinaxiv-202508.00041

Note: Figures in this paper have not yet been translated.

Abstract

The neutron diffusion equation plays a pivotal role in the analysis of nuclear reactors. Nevertheless, employing the Physics-Informed Neural Network (PINN) method for its solution entails certain limitations. Traditional PINN approaches often utilize a fully connected network (FCN) architecture, which is susceptible to overfitting, training instability, and gradient vanishing issues as the network depth increases. These challenges result in accuracy bottlenecks in the solution. In response to these issues, the Residual-based Resample Physics-Informed Neural Network (R2-PINN) is proposed. It is an improved PINN architecture that replaces the FCN with a Convolutional Neural Network with a shortcut (S-CNN), incorporating skip connections to facilitate gradient propagation between network layers. Additionally, the incorporation of the Residual Adaptive Resampling (RAR) mechanism dynamically increases sampling points, enhancing the spatial representation capabilities and overall predictive accuracy of the model. The experimental results illustrate that our approach significantly improves the model's convergence capability, achieving high-precision predictions of physical fields. In comparison to traditional FCN-based PINN methods, R2-PINN effectively overcomes the limitations inherent in current methods, providing more accurate and robust solutions for neutron diffusion equations.

Full Text

Preamble

Residual resampling-based physics-informed neural network for neutron diffusion equations∗

Heng Zhang,1 Yun-Ling He,1 Dong Liu,2, 3, † Qin Hang,1 He-Min Yao,1 and Di Xiang2, 3

1 College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2 Science and Technology on Reactor System Design Technology Laboratory, Nuclear Power Institute of China, Chengdu, 610213, China
3 CNNC Engineering Research Center of Nuclear Energy Software and Digital Reactor, Chengdu, 610213, China

The neutron diffusion equation plays a pivotal role in nuclear reactor analysis. Nevertheless, employing the physics-informed neural network (PINN) method for its solution entails certain limitations. Conventional PINN approaches generally utilize a fully connected network (FCN) architecture that is susceptible to overfitting, training instability, and gradient vanishing as the network depth increases. These challenges result in accuracy bottlenecks in the solution. In response to these issues, the residual-based resample physics-informed neural network (R2-PINN) is proposed. It is an improved PINN architecture that replaces the FCN with a convolutional neural network with a shortcut (S-CNN). It incorporates skip connections to facilitate gradient propagation between network layers. Additionally, the incorporation of the residual adaptive resampling (RAR) mechanism dynamically increases the number of sampling points. This, in turn, enhances the spatial representation capabilities and overall predictive accuracy of the model. The experimental results illustrate that our approach significantly improves the convergence capability of the model and achieves high-precision predictions of the physical fields. Compared with conventional FCN-based PINN methods, R2-PINN effectively overcomes the limitations inherent in current methods. Thus, it provides more accurate and robust solutions for neutron diffusion equations.

Keywords: Neutron diffusion equation, Physics-informed neural network, CNN with shortcut, Residual adaptive resampling

I. INTRODUCTION

Nuclear reactor core analysis is crucial to ensure the safe operation of nuclear reactors. The neutron diffusion equation describes the neutron movement within a medium and is fundamental to this analysis [1]. Numerical methods have been widely applied to numerous physical scenarios, with methods such as finite difference [2] and finite element [3] being continuously developed and improved. Many related works are available in the reactor domain. For example, Hamada [4] proposed higher-order compact finite-difference schemes to solve neutron diffusion equations. Yuk [5] utilized the finite element method to solve a time-dependent neutron diffusion equation. Li [6] designed an algorithm based on the finite volume method to solve multi-group neutron diffusion equations. For the same problem, Matheus Gularte Tavares [7] and K. Zhuang [8] used the source iterative and variational nodal methods, respectively. Additionally, many researchers have successfully employed CFD software such as COMSOL [9], ANSYS FLUENT [10], and OpenFOAM [11] to address neutron diffusion problems.

However, these methods require discretization of the solution domain, which can be computationally complex and time-consuming when high-precision physics reconstruction is required [12]. Additionally, the environment inside a nuclear reactor is complex and involves coupled multi-physics fields such as neutron transport and heat transfer, so numerical methods must simplify and approximate the model to solve the equations. This introduces solution errors. Meanwhile, owing to the development of neural networks (NNs), interest in machine learning-based approaches has been increasing [13]. A trained NN can predict the physical field faster than conventional numerical methods. To make the NN predictions more consistent with physical laws, prior engineering knowledge should be incorporated into the network. PINNs, introduced by Raissi [14], incorporate partial differential equations (PDEs) as constraints during network training by penalizing predictions that violate the PDE at the sampled data points. Thus, they yield more precise and physically consistent solutions. To date, PINNs have been applied successfully to various scientific computation problems in engineering fields such as fluids [15, 16], heat transfer [17, 18], flows [19, 20], and solid mechanics [21, 22], and have yielded significant results [23]. The differences between PINNs and conventional numerical methods are listed in Table 1 [TABLE:1]. The Monte Carlo method [24, 25] and CFD [26] can also be used in nuclear reactor core analysis. In recent years, deep neural networks (DNNs) have been used increasingly in reactor cores [27, 28]. Dong [29, 30] applied PINNs to solve multiple neutron-diffusion benchmark equations and demonstrated highly accurate predictive results. Utilizing an FCN to capture the neutron distribution, they achieved a neutron flux distribution solution with an accuracy of 10^-7 and successfully applied it to the search for critical parameters.

However, FCNs are vulnerable to gradient vanishing and can fail to converge during training. When the gradient vanishes, the NN parameters are not updated, and it is challenging for the network to learn new knowledge [31]. The network may then have difficulty converging to an optimal solution and may miss a large amount of information, which degrades the expressive capability of the model. In terms of solving the neutron diffusion equation, the error in the predicted result renders the criticality assessment inaccurate. Furthermore, it was observed that regions with large gradients exhibited insufficient training under uniform sampling, particularly when the number of sampling points was limited. This resulted in significant errors in regions with large gradients, thereby limiting further improvement in network accuracy. When solving the neutron diffusion equation, the limited accuracy may result in inefficient parameter searches that significantly increase the search time.

Recent studies proposed adaptive sampling based on gradient information [32] and assigned adaptive weights to sample points in loss calculations [33, 34]. These enhancements improved the prediction performance of PINNs in regions with significant gradients and limited sampling points. However, the vanishing-gradient problem in FCNs remains unresolved. Researchers have evaluated the replacement of FCNs with other neural network techniques [35] such as CNNs [36] and recurrent neural networks (RNNs) [37] to achieve better performance and more precise results.

To address these issues, this study proposes a novel framework called R2-PINN. It combines the S-CNN architecture with the RAR method to solve neutron diffusion equations. The proposed model effectively alleviates the gradient vanishing problem by adding gradient backpropagation paths. Moreover, the network can balance the loss between regions and achieve higher accuracy by using a resampling mechanism. The R2-PINN was evaluated against the FCN on the neutron diffusion equation benchmark problems (introduced in Section II). The results demonstrate that our method effectively suppressed the loss function oscillation and achieved high-precision field prediction. In addition, our method significantly reduced the time required for an eigenvalue search. This enabled us to obtain an accuracy of 10^-5 within 10 min and an accuracy of 10^-4 in 250 s.

The remainder of this paper is organized as follows: Section II introduces the benchmark problems used in Section IV. Section III introduces the basic architectures including the structure of the S-CNN and RAR resample methods and the overall R2-PINN architecture. Section IV presents the multiple experiments conducted to optimize the hyperparameters and verify the superiority of the proposed model. In particular, different search algorithms are compared for the parameter search to reduce the search time. It also presents generalizability validation experiments using the same model for multiple benchmark problems. Section V analyzes and discusses the experimental results. Finally, Section VI concludes the study and discusses future research directions.

II. PROBLEM SETUP

A. One-dimensional reactor diffusion equation for a single energy group

In this section, a single-group k-eigenvalue problem is introduced for the criticality calculations. The largest value of k (known as the effective neutron multiplication factor, keff) should be determined. The neutron diffusion model is given by Eq. 1:

$$\frac{1}{v}\frac{\partial \phi(r, E, t)}{\partial t} = \nabla \cdot D\nabla\phi(r, E, t) - \Sigma_t(r, E)\phi(r, E, t) + S(r, E, t) + \chi(E)\int_0^{\infty} \nu(E')\Sigma_f(r, E')\phi(r, E', t)\,dE' + \int_0^{\infty} \Sigma_s(r, E' \to E)\phi(r, E', t)\,dE' \tag{1}$$

where ϕ(r, E, t) denotes the neutron flux at energy E and position r at time t, v represents the neutron velocity, D denotes the diffusion coefficient, ν represents the average number of neutrons released per fission, χ(E) represents the prompt neutron spectrum, Σt represents the total macroscopic cross-section, Σs represents the macroscopic scattering cross-section from energy E′ to energy E, Σf represents the macroscopic fission cross-section, and S(r, E, t) represents the neutron source.

In the absence of S(r, E, t) and with an initial neutron flux density that is symmetric along the x-axis, Eq. 1 can be simplified into Eq. 2 [1]:

$$\frac{1}{Dv}\frac{\partial \phi(r, t)}{\partial t} = \nabla^2\phi(r, t) + \frac{k_\infty - 1}{L^2}\phi(r, t) \tag{2}$$

Here, k∞ represents the infinite multiplication factor, and L denotes the diffusion length. Consider a uniform bare reactor [41] shaped as an infinite plate, with infinite length and width and a thickness (including the extrapolation distance) of a. This is illustrated in Fig. 1 [FIGURE:1]. The analytical solution for the neutron flux can be obtained using the separation-of-variables method, as shown in Eq. 3:

$$\phi(x, t) = \sum_{n=1}^{\infty} \frac{2}{a}\left[\int_{-a/2}^{a/2} \phi_0(x')\cos\frac{(2n-1)\pi x'}{a}\,dx'\right]\cos\frac{(2n-1)\pi x}{a}\; e^{(k_n - 1)t/l_n} \tag{3}$$

where ϕ0 is the initial flux distribution. Substituting each cosine mode into Eq. 2 gives k_n = k∞/(1 + L^2 B_n^2) and l_n = L^2/[Dv(1 + L^2 B_n^2)], with B_n = (2n − 1)π/a.

Fig. 1. Infinite Plate Reactor.

The critical condition for a bare reactor in the single-group approximation is given by Eq. 4:

$$k_{\mathrm{eff}} = \frac{k_\infty}{1 + L^2 B^2} = 1 \tag{4}$$

where keff is the effective neutron multiplication factor. When the system is in a critical state, the neutron flux-density distribution is the fundamental eigenfunction of the wave equation, corresponding to the minimum eigenvalue B_g^2. This is given by Eq. 5:

$$\nabla^2\phi(r) + B_g^2\,\phi(r) = 0 \tag{5}$$

If the composition of the system material (i.e., k∞ and L^2) is given, a unique critical size (denoted by a0) results in keff = 1. The critical size a0 corresponds to the critical state of the reactor. For reactor sizes larger than a0, keff > 1, which indicates that the reactor is in a supercritical state. Conversely, for reactor sizes smaller than a0, keff < 1, which indicates that the reactor is in a subcritical state [42]. Alternatively, if the reactor size is given, it is feasible to determine a fuel enrichment (material composition) that satisfies Eq. 4 and ensures that the reactor attains criticality. When the system is in the critical state, the neutron flux density distribution within the reactor can be described as follows:

$$\phi(x) = A\cos\frac{\pi x}{a} \tag{6}$$

B. Two-dimensional reactor diffusion equation for a single energy group

Based on Section II A, consider a two-dimensional neutron diffusion equation formulated as follows:

$$\frac{1}{Dv}\frac{\partial \phi(x, y, t)}{\partial t} = \nabla^2\phi(x, y, t) + \frac{k_\infty - 1}{L^2}\phi(x, y, t) \tag{7}$$

By discretizing Eq. 7 and approximating the Laplace operator and partial derivatives with finite differences, the equation can be converted into the following form:

$$\frac{1}{Dv}\frac{\phi^{n+1}_{i,j} - \phi^{n}_{i,j}}{\Delta t} = \frac{\phi^{n}_{i+1,j} - 2\phi^{n}_{i,j} + \phi^{n}_{i-1,j}}{\Delta x^2} + \frac{\phi^{n}_{i,j+1} - 2\phi^{n}_{i,j} + \phi^{n}_{i,j-1}}{\Delta y^2} + \frac{k_\infty - 1}{L^2}\phi^{n}_{i,j} \tag{8}$$

where ϕ^n_{i,j} denotes the value of the neutron flux at the spatial grid point (i, j) at time step n. ∆t is the time step. ∆x and ∆y are the spatial steps in the x- and y-directions, respectively. D is the diffusion coefficient, and v is the neutron velocity. k∞ denotes the infinite multiplication factor, and L is the diffusion length.

According to Eq. 8, the spatial domain is meshed and the finite difference method is used to solve for the flux over the entire domain. The numerically solved data are used as a test set to verify the model accuracy.
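The explicit finite-difference update of Eq. 8 can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the solver used in the paper: the function names (`step_flux`, `cosine_mode`), the grid size, and all parameter values are assumptions chosen so that the fundamental cosine mode is stationary, which makes the result easy to check by eye.

```python
import math


def step_flux(phi, dv, k_inf, l2, dx, dy, dt):
    """Advance phi[i][j] by one explicit time step; boundary values are set to zero."""
    nx, ny = len(phi), len(phi[0])
    new = [[0.0] * ny for _ in range(nx)]
    source = (k_inf - 1.0) / l2
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            lap_x = (phi[i + 1][j] - 2.0 * phi[i][j] + phi[i - 1][j]) / dx ** 2
            lap_y = (phi[i][j + 1] - 2.0 * phi[i][j] + phi[i][j - 1]) / dy ** 2
            new[i][j] = phi[i][j] + dt * dv * (lap_x + lap_y + source * phi[i][j])
    return new


def cosine_mode(n, a):
    """Fundamental mode cos(pi x / a) cos(pi y / a) on an n-by-n grid over [-a/2, a/2]^2."""
    h = a / (n - 1)
    xs = [-a / 2 + i * h for i in range(n)]
    grid = [[math.cos(math.pi * x / a) * math.cos(math.pi * y / a) for y in xs] for x in xs]
    return grid, h


# Illustrative values only (assumptions, not the paper's settings).
A, N, DV, L2, DT = 1.0, 21, 1.0, 1.0e-2, 1.0e-4
K_CRITICAL = 1.0 + 2.0 * L2 * (math.pi / A) ** 2  # makes the fundamental mode stationary

phi, h = cosine_mode(N, A)
for _ in range(10):
    phi = step_flux(phi, DV, K_CRITICAL, L2, h, h, DT)
print(phi[N // 2][N // 2])  # stays close to 1 for the critical k_inf
```

For the pure diffusion part, a forward-Euler scheme of this kind is stable only when Dv·∆t·(1/∆x^2 + 1/∆y^2) ≤ 1/2, so the time step must shrink quadratically as the mesh is refined.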

C. Two-dimensional rectangular geometry multi-group multi-material diffusion problem

In a nuclear reactor, neutron transport can be described by the multi-group diffusion theory. In this case, the fast- and thermal-group neutron fluxes satisfy the following diffusion equations:

$$-D_1\nabla^2\phi_1(r) + \Sigma_{t1}\phi_1(r) = [\Sigma_{s1\to1}\phi_1(r) + \Sigma_{s2\to1}\phi_2(r)] + \chi_1[\nu\Sigma_{f1}\phi_1(r) + \nu\Sigma_{f2}\phi_2(r)] \tag{9}$$

$$-D_2\nabla^2\phi_2(r) + \Sigma_{t2}\phi_2(r) = [\Sigma_{s1\to2}\phi_1(r) + \Sigma_{s2\to2}\phi_2(r)] + \chi_2[\nu\Sigma_{f1}\phi_1(r) + \nu\Sigma_{f2}\phi_2(r)] \tag{10}$$

where D1 and D2 are the diffusion coefficients of fast- and thermal-group neutrons, respectively. ϕ1 and ϕ2 are the fast- and thermal-group neutron flux densities, respectively. νΣf1 and νΣf2 are the neutron production cross sections of the fast- and thermal-group neutrons, respectively. χ1 and χ2 are the fractions of fission neutrons born in the fast and thermal groups, and Σs1→2 is the scattering cross-section from the fast group to the thermal group.

For the pressurized water reactor example, the domain is divided into two different material regions, as shown in Fig. 2 [FIGURE:2]. The material parameters for each region are listed in Table 2 [TABLE:2]. The boundary energy between the fast and thermal groups is low enough that no thermal neutrons are produced directly by nuclear fission and no neutrons are scattered from the thermal group up to the fast group. As a result, χ1 = 1, χ2 = 0, and Σs2→1 = 0. Eq. 9 and Eq. 10 can then be simplified to Eq. 11 and Eq. 12:

$$-D_1\nabla^2\phi_1(r) + \Sigma_{r1}\phi_1(r) = \nu\Sigma_{f1}\phi_1(r) + \nu\Sigma_{f2}\phi_2(r) \tag{11}$$

$$-D_2\nabla^2\phi_2(r) + \Sigma_{a2}\phi_2(r) = \Sigma_{s1\to2}\phi_1(r) \tag{12}$$

where Σr1 = Σt1 − Σs1→1 is the fast-group removal cross-section and Σa2 = Σt2 − Σs2→2 is the thermal-group absorption cross-section.

Fig. 2 [FIGURE:2]. Material Distribution [30].

Table 2 [TABLE:2]. Material properties of the calculation regions.

Material (1, 2) | Energy group | D_g (cm) | Σ_a (cm^-1) | νΣ_f (cm^-1) | Σ_s1→2 (cm^-1)
Of the table entries, only the value 0.02767 is preserved in this version.

Based on the multi-group diffusion theory, the distribution of the neutron flux in the core is obtained by iteratively solving the discretized diffusion equation. The obtained dataset was used as a test set in the experiment to evaluate the model prediction accuracy.

D. 2D-IAEA benchmark problem

The 2D-IAEA PWR benchmark problem is a two-dimensional static problem with two neutron groups but without delayed neutron precursors [43]. It is modeled by the following two-dimensional two-group diffusion equations:

$$\begin{cases} -D_1\nabla^2\phi_1 + (\Sigma_{a1} + \Sigma_{s1\to2})\phi_1 = \lambda\chi_1(\nu\Sigma_{f1}\phi_1 + \nu\Sigma_{f2}\phi_2) \\ -D_2\nabla^2\phi_2 + \Sigma_{a2}\phi_2 - \Sigma_{s1\to2}\phi_1 = \lambda\chi_2(\nu\Sigma_{f1}\phi_1 + \nu\Sigma_{f2}\phi_2) \end{cases}$$

The reactor has a two-zone core containing 177 fuel assemblies, each with a width of 20 cm. The core is radially reflected by 20 cm of water. Owing to the symmetry along the x- and y-axes, only one quarter of the reactor is modeled. This domain is denoted by Ω and is composed of four sub-regions with different physical properties, Ω1, Ω2, Ω3, and Ω4. The reactor is shown in Fig. 3 [FIGURE:3]. Neumann boundary conditions were enforced at the left and bottom boundaries. The group constants for this problem are listed in Table 3 [TABLE:3].

Fig. 3. Geometric Layout of the 2D-IAEA Benchmark Problem [43].

Table 3 [TABLE:3]. Group constants of the 2D-IAEA benchmark.

Region | Group | D_g (cm) | Σ_ag (cm^-1) | νΣ_fg (cm^-1) | Σ_1→2 (cm^-1)

III. METHODS

A. PINN loss formulation

In a PINN, consider the general form of a parameterized and nonlinear PDE, F(u, x, y, t, :) = 0, (x, y) ∈ Ω, t ∈ [0, T], where u represents the latent solution and Ω represents the solution domain. This form can express PDEs in almost all fields of mathematical physics.

N_f data points were sampled to measure the physical consistency. This type of loss is collectively referred to as the PDE loss. It has the following form:

$$\mathrm{Loss}_{PDE} = \sum^{N_f} \left| F(u, x, y, t, :) \right|^2, \quad x \in \Omega$$

Subsequently, the initial and boundary losses owing to the known initial and boundary conditions are introduced. N_i and N_b data points are collected to calculate Loss_Initial and Loss_Boundary, respectively. In addition, a small number N_d of labeled data points are used for training through Loss_Data.

$$\mathrm{Loss}_{Initial} = \sum^{N_i} \left| \phi_{truth} - \phi_{predict} \right|^2, \quad t = 0,\ x \in \Omega$$

$$\mathrm{Loss}_{Boundary} = \sum^{N_b} \left| \phi_{truth} - \phi_{predict} \right|^2, \quad x \in \partial\Omega$$

$$\mathrm{Loss}_{Data} = \sum^{N_d} \left| \phi_{truth} - \phi_{predict} \right|^2$$

Incorporating a small amount of labeled data provides information regarding the correct order of magnitude. This enables the PINN to better calibrate its predictions and prevent unrealistic or divergent solutions. The inclusion of such labeled data significantly contributes to the stability and accuracy of the training process for PINNs. By combining the above losses, networks can be penalized, and the training process is constrained effectively. This ensures that the obtained solutions adhere more closely to the fundamental laws of physics. Therefore, the final network loss formulation is as follows:

$$\mathrm{Loss}_{Total} = \mathrm{Loss}_{PDE} + (\mathrm{Loss}_{Initial} + \mathrm{Loss}_{Boundary} + \mathrm{Loss}_{Data}) \cdot w$$

The weights of the different loss functions and number of data points used to calculate these losses significantly affect the convergence speed and accuracy of the model. The final sampling ratio was determined through multiple experiments. The sampling ratio for the PDE loss, boundary loss, initial loss, and data loss was approximately 30:10:10:1. Multiple losses can be balanced by adjusting the weights w. This prevents the network from prioritizing a single loss, particularly when significant differences in the magnitude between losses exist.
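How the four terms are combined with the weight w can be made concrete with a short sketch. It assumes that each term is a sum of squared residuals (the normalization is not spelled out in the text) and that the residual lists have already been evaluated by the network; `sum_squared`, `total_loss`, and the sample numbers are hypothetical names and values used only for illustration.

```python
def sum_squared(residuals):
    """Sum of squared residuals; an empty list contributes 0.0."""
    return sum((r * r for r in residuals), 0.0)


def total_loss(pde_res, initial_err, boundary_err, data_err, w):
    """Loss_PDE + w * (Loss_Initial + Loss_Boundary + Loss_Data)."""
    loss_pde = sum_squared(pde_res)
    loss_initial = sum_squared(initial_err)
    loss_boundary = sum_squared(boundary_err)
    loss_data = sum_squared(data_err)
    return loss_pde + w * (loss_initial + loss_boundary + loss_data)


# Two PDE residuals, one initial and one boundary error, no labeled data.
print(total_loss([1.0, -1.0], [2.0], [0.5], [], w=10.0))  # 44.5
```

Because the supervised terms share one weight, a large w pulls the fit toward the initial, boundary, and labeled values, while a small w lets the PDE residual dominate.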

B. S-CNN architecture

When solving the neutron diffusion equation, the performance of the NN can be affected significantly by the vanishing gradient problem as the network depth increases. This problem can directly result in training failure and render the results unreliable. Moreover, it restricts the number of layers that can be used, which reduces the expressive power of the network and limits its prediction accuracy.

The ResNet architecture [44], developed in the image domain, effectively alleviates the gradient vanishing problem by introducing residual learning: the network learns residual mappings (the difference between the input and desired output), which facilitates the training of very deep neural networks. Inspired by this, a skip-connection mechanism was introduced into the PINN architecture.

The input sampling features are the coordinates, namely, x, y, and t. These are independent. Therefore, a separate filter was applied to each feature in the experiment. A one-dimensional convolution was used as the basis for each network layer. Each hidden layer is expressed as follows:

$$z_l = f_l(W_l z_{l-1} + b_l)$$

where zl denotes hidden layer l between the input and output layers; Wl and bl are the weight and bias, respectively; and fl(·) denotes the activation function (e.g., the Tanh function).

On this basis, skip connections were added between different layers. The corresponding hidden layer is expressed as

$$z_l = f_l\big((W_l z_{l-1} + b_l) + z_{l-n-1}\big), \quad l > n + 1$$

where n represents the skip distance, that is, the number of crossed hidden layers. To determine where a skip connection should be added, the contribution of each layer to the training of the entire network was measured by calculating the gradient norm for each layer. Here, the gradient norm was computed as in [40]:

||g|| = sqrt(sum(g^2))

For example, in the 10-layer network, the gradient contribution was calculated for each layer of the network (Fig. 4 [FIGURE:4]). When the gradient of a layer is significant, its parameters can be updated readily. An excessively large gradient, however, makes the model unstable, whereas a small gradient indicates that the layer may need more iterations to attain the optimal state and may encounter the gradient vanishing problem. Thus, shallow-to-deep skip connections are established. They provide additional gradient paths to the shallow layers and prevent vanishing gradients in the deep layers. The detailed 10-layer S-CNN architecture used in Section IV is presented in Table 4 [TABLE:4]. Here, B, C, and L represent the batch size, channel, and length, respectively.
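The skip-connected hidden layer can be illustrated with a small forward pass. The sketch below relies on the fact that a one-dimensional convolution with kernel size one acting on a length-one input reduces to an affine map across channels. It is not the authors' implementation: `affine`, `forward`, `random_layers`, the Gaussian scale 0.5, and the input values are all assumptions made for illustration, and the layer widths are chosen to follow Table 4.

```python
import math
import random


def affine(weights, bias, x):
    """Kernel-size-1 convolution at a single position: an affine map across channels."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(x, layers, n=3):
    """z_l = tanh(W_l z_{l-1} + b_l), plus z_{l-n-1} when l > n + 1; the last layer is linear."""
    z = [list(x)]  # z[0] is the input
    for l, (weights, bias) in enumerate(layers[:-1], start=1):
        pre = affine(weights, bias, z[l - 1])
        if l > n + 1:  # the skip source z[l-n-1] is then a hidden layer of equal width
            pre = [p + s for p, s in zip(pre, z[l - n - 1])]
        z.append([math.tanh(p) for p in pre])
    out_w, out_b = layers[-1]
    return affine(out_w, out_b, z[-1])


def random_layers(sizes, rng):
    """Gaussian-initialized (W, b) pairs for consecutive layer widths in `sizes`."""
    return [([[rng.gauss(0.0, 0.5) for _ in range(m)] for _ in range(k)],
             [rng.gauss(0.0, 0.5) for _ in range(k)])
            for m, k in zip(sizes, sizes[1:])]


rng = random.Random(0)
layers = random_layers([2] + [26] * 9 + [1], rng)  # 2 inputs, 26 channels, 1 output
print(forward([0.1, 0.005], layers))        # with skip connections (n = 3)
print(forward([0.1, 0.005], layers, n=99))  # n larger than the depth disables every skip
```

The condition l > n + 1 guarantees that the skip source is a hidden layer rather than the two-channel input, so the element-wise addition never mixes vectors of different widths.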

Fig. 4. Gradient Norm of Each Layer.

Table 4. S-CNN Structure.

Layer Name   | Input Size (B,C,L) | Output Size (B,C,L) | Param
input layer  | (None,2,1)         | (None,26,1)         |
conv1 layer  | (None,26,1)        | (None,26,1)         |
conv2 layer  | (None,26,1)        | (None,26,1)         |
conv3 layer  | (None,26,1)        | (None,26,1)         |
conv4 layer  | (None,26,1)        | (None,26,1)         |
conv5 layer  | (None,26,1)        | (None,26,1)         |
conv6 layer  | (None,26,1)        | (None,26,1)         |
conv7 layer  | (None,26,1)        | (None,26,1)         |
conv8 layer  | (None,26,1)        | (None,26,1)         |
output layer | (None,26,1)        | (None,1,1)          |

In the experiment, the skip distance n was set to 3 for larger gradient propagation spans. It is the best parameter for shallow networks. When n < 3, the jump distance is insufficient. This may limit the network's capability to learn complex physical fields and does not prevent the gradient vanishing problem. When n > 3, the number of skipped layers is excessively large. This may pose a challenge to gradient propagation and, thus, increase the difficulty of optimizing the network. Experiments were conducted on S-CNN models with varying depths, and high-precision results were obtained consistently. This validated the effectiveness of the residual module in addressing the gradient problem.

C. RAR Mechanism

Because the solution domain contains specific regions with large gradients, a network trained on uniformly sampled points fits these local regions inadequately. This issue can be addressed by increasing the weights of specific sampling points (as shown in Eq. 23) or by resampling using Eq. 24:

$$X_{new} = X_{raw} + w \cdot X_f \tag{23}$$

$$X_{new} = X_{raw} + X_{resample} \tag{24}$$

where Xraw represents the original dataset, Xnew represents the new dataset, Xf represents the set of sampling points with more significant PDE residuals, Xresample represents the set of resampling points, and w represents the penalty weight.

Analogous to the mesh refinement used in conventional numerical computation methods, fine-grained samples should be collected from regions with large gradients to improve the prediction accuracy of the network in these regions. Based on this, Eq. 24 is used to update the dataset; that is, RAR [38] is introduced to improve the distribution of residual points during the training of PINNs. This addresses the bottleneck phenomenon that originates from the difficulty of reducing the PDE residuals in certain regions and ultimately enhances the predictive accuracy of the model.

By selectively sampling more points in regions where the PDE residuals are significant, this approach enables the network to focus on challenging areas and adjust the sampling density accordingly. This, in turn, results in improved learning and prediction capabilities. The method is particularly effective for capturing the complex behavior of PDE solutions and identifying sharp gradient regions.

The pseudocode for the RAR method is as follows:

Algorithm 1: RAR Algorithm.
Input: Set S with randomly sampled initial points
Output: Updated set S
Divide the solution domain into α^2 subintervals;
Train PINN for n iterations;
repeat
    Compute LossPDE for points in set S;
    Calculate the average residual of each subinterval;
    Randomly sample S′ from the subinterval with the highest average residual;
    Update set S: S = S ∪ S′;
    Train PINN for n iterations;
until the maximum number of iterations is attained or the total number of points attains the limit;

This algorithm divides the solution domain into α^2 subintervals with α subdivisions in both x- and t-directions. The Latin hypercube sampling (LHS) method [39] was used for random sampling. This ensured that sample points covered the entire space without clustering or bias.

As shown in Fig. 5 [FIGURE:5], by adaptively adding points to regions with more significant residuals, more intensive sampling was conducted in one of the subdomains of the entire domain. This enabled the network to better capture the PDE solution's behavior. This adaptive sampling approach improved the distribution of residual points and mitigated the bottleneck phenomenon. Ultimately, it enhanced the accuracy and performance of the model and accelerated its convergence.
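The resampling loop can be sketched as follows. This is a simplified illustration rather than the paper's code: the PINN residual is replaced by a toy function that is large in one corner of the domain, new points are drawn uniformly inside the selected subinterval instead of by LHS, and the names `cell_index`, `rar_step`, and `toy_residual` as well as the point counts are assumptions.

```python
import random


def cell_index(value, lo, hi, alpha):
    """Index (0 .. alpha-1) of the subinterval that contains `value`."""
    return min(int((value - lo) / (hi - lo) * alpha), alpha - 1)


def rar_step(points, residual, bounds, alpha, m, rng):
    """Return (worst_cell, new_points): m points drawn in the cell with the largest mean squared residual."""
    (x_lo, x_hi), (t_lo, t_hi) = bounds
    totals, counts = {}, {}
    for x, t in points:
        cell = (cell_index(x, x_lo, x_hi, alpha), cell_index(t, t_lo, t_hi, alpha))
        totals[cell] = totals.get(cell, 0.0) + residual(x, t) ** 2
        counts[cell] = counts.get(cell, 0) + 1
    worst = max(totals, key=lambda c: totals[c] / counts[c])
    dx, dt = (x_hi - x_lo) / alpha, (t_hi - t_lo) / alpha
    new_points = [(x_lo + (worst[0] + rng.random()) * dx,
                   t_lo + (worst[1] + rng.random()) * dt) for _ in range(m)]
    return worst, new_points


def toy_residual(x, t):
    """Stand-in for the PDE residual (an assumption): large only near x = 1, t = 1."""
    return 1.0 if x > 0.75 and t > 0.75 else 0.01


rng = random.Random(0)
bounds = ((0.0, 1.0), (0.0, 1.0))
points = [(rng.random(), rng.random()) for _ in range(400)]
max_points = 600
while len(points) < max_points:
    # A real loop would train the PINN for n iterations here before resampling.
    worst, extra = rar_step(points, toy_residual, bounds, alpha=4, m=50, rng=rng)
    points.extend(extra[: max_points - len(points)])
print(worst, len(points))
```

Averaging the squared residual per subinterval, rather than summing it, keeps a densely sampled cell from being selected again merely because it already holds more points.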

D. Proposed Framework: R2-PINN

Our study proposes the R2-PINN architecture, which combines the PINN structure, S-CNN, and RAR mechanisms. The structure of R2-PINN is shown in Fig. 6 [FIGURE:6]. In R2-PINN, S-CNN is used as the backbone of the network. It can better capture the features of the PDE solution and improve the training efficiency. The RAR mechanism is employed to balance LossPDE across regions of the domain. This ensures that the network focuses on regions with large errors and adaptively refines the sampling density there. By combining these components, the R2-PINN can solve PDEs with improved accuracy and computational efficiency. It leverages the capability of deep learning and adaptive refinement to capture the complex features of a PDE solution, which enables a more precise representation of the physical phenomena. Section IV describes the verification of the superiority of the proposed R2-PINN network and each mechanism through a series of experiments.

IV. EXPERIMENTS

In Section IV A, the generation of the datasets used for testing the accuracy is introduced. Section IV B compares S-CNN networks with different depths to validate the superiority of the improved PINN network architecture. Section IV C investigates how resampling affects the model accuracy. In Section IV D, R2-PINN is implemented to search for k∞, and its search efficiency is further enhanced in Section IV E. In Section IV F, the generalization capabilities of the improved PINN architecture are presented. Furthermore, Sections IV G to IV I evaluate the generalizability of the models in two-dimensional neutron diffusion, including multi-group and multi-material scenarios. Finally, Section IV J explains the strategy for selecting optimal hyperparameters for the R2-PINN.

Fig. 6. Overall Architecture.

Finally, all the samples in the dataset were used to validate the model accuracy.

A. Dataset preparation

1. One-dimensional reactor diffusion equation for a single energy group

According to Eq. 3 and assuming symmetric initial conditions, two initial distributions were used for the dataset.

$$\phi_1(x, 0) = \cos(\pi x / a) - 0.4\cos(2\pi x / a) - 0.4$$

$$\phi_2(x, 0) = 0.5\cos(2\pi x / a) + 0.5, \quad x \in [-0.5, 0.5]$$

The numerical values used in Eq. 2 are as follows: v = 2.2 × 10^3 m/s, D = 0.211 × 10^-2 m, L^2 = 2.1037 × 10^-4 m^2, and a = 1 m. A Dirichlet boundary condition is applied, and the flux distribution is predicted over 0.015 s. Data on how the flux varies with time for different parameter settings can be obtained by modifying the value of k∞. Our test dataset consists of 10000 data points with 100 grid points in both x- and t-directions. The experimental section tests the accuracy of the model prediction on this dataset.
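A reference grid of this kind can be generated from the cosine-series solution. The sketch below is a hedged illustration, not the authors' data pipeline: it projects the initial distribution onto the modes cos(B_n x) with B_n = (2n − 1)π/a by a midpoint rule and lets each mode evolve with the rate Dv[(k∞ − 1)/L^2 − B_n^2] that follows directly from Eq. 2. The helper names, the number of modes, the quadrature size, and the choice of an exactly critical k∞ are assumptions.

```python
import math


def mode_coefficients(phi0, a, n_modes, n_quad=400):
    """Midpoint-rule projections (2/a) * integral of phi0(x) cos(B_n x) over [-a/2, a/2]."""
    h = a / n_quad
    nodes = [-a / 2 + (k + 0.5) * h for k in range(n_quad)]
    coeffs = []
    for n in range(1, n_modes + 1):
        b_n = (2 * n - 1) * math.pi / a
        coeffs.append(2.0 / a * h * sum(phi0(x) * math.cos(b_n * x) for x in nodes))
    return coeffs


def flux(x, t, coeffs, a, dv, k_inf, l2):
    """Series solution: mode n evolves as exp(Dv * ((k_inf - 1)/L^2 - B_n^2) * t)."""
    total = 0.0
    for n, c in enumerate(coeffs, start=1):
        b_n = (2 * n - 1) * math.pi / a
        rate = dv * ((k_inf - 1.0) / l2 - b_n ** 2)
        total += c * math.cos(b_n * x) * math.exp(rate * t)
    return total


A, V, D, L2 = 1.0, 2.2e3, 0.211e-2, 2.1037e-4  # values quoted in the text
K_INF = 1.0 + L2 * (math.pi / A) ** 2           # assumption: exactly critical slab


def phi1(x):
    """First initial distribution from the text."""
    return math.cos(math.pi * x / A) - 0.4 * math.cos(2 * math.pi * x / A) - 0.4


coeffs = mode_coefficients(phi1, A, n_modes=30)
xs = [-0.5 + i / 99 for i in range(100)]
ts = [0.015 * j / 99 for j in range(100)]
dataset = [(x, t, flux(x, t, coeffs, A, D * V, K_INF, L2)) for t in ts for x in xs]
print(len(dataset))  # 10000 samples on a 100 x 100 grid
```

Changing `K_INF` shifts every mode's growth rate by the same amount, which is how a family of time histories for different k∞ values could be produced from one set of coefficients.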

2. Two-dimensional reactor diffusion equation for a single energy group

To ensure that the boundary flux was zero and that the full-domain flux was continuous, we set the initial flux distribution at time t = 0 to the following equation:

$$\phi = \exp\left(-(x^2 + y^2)/20\right) - \exp(-100), \quad x \in [-10, 10],\ y \in [-10, 10]$$

An iterative solution was obtained using 100 grids in the x-, y-, and t-directions to solve the flux distribution for 1 s, which was used as a dataset for testing the model.

3. Two-dimensional rectangular geometry multi-group multi-material diffusion problem

A source iteration method was used to solve for the two-group neutron flux and test the accuracy of the model. The ranges of the values of x and y are shown in Fig. 2. The total dataset consists of 10000 data points with 100 grid points in both x- and y-directions. A Dirichlet boundary condition is applied.

4. 2D-IAEA problem

The reference solution for the two-dimensional two-group diffusion equations was obtained using the general-purpose finite-element solver FreeFem++ [45]. The eigenvalue problem was solved using ARPACK++, an object-oriented version of the ARPACK eigenvalue package [46]. The total number of samples in the dataset was 12286 [47], of which 76 data points were fed into the network as labeled data.

Fig. 5. PDE Points Distribution. (a) Before Resampling; (b) After Resampling.

B. S-CNN depth and kernel size ablation experiment

To compare the FCN and S-CNN architectures, all the parameters except for the base network were kept identical. The total number of sampled data points was 5000, with 3000 data points used to calculate LossPDE, 1000 data points sampled for LossInitial, and 1000 data points sampled for LossBoundary. No labeled data were fed to the network during training, so LossData was not used. The Tanh activation function and L-BFGS optimizer were used. The network weights and biases were initialized by random sampling from a Gaussian distribution. The number of hidden neurons per layer in the network was set to 26.

To investigate the impact of the layer number and kernel size on the model performance, ablation experiments were conducted on S-CNN networks with different depths and kernel sizes. The padding was set to zero for kernel size one and to one for kernel size three. The detailed experimental results are listed in Table 5 [TABLE:5] and Fig. 7 [FIGURE:7].

In Table 5 and Fig. 7, Ω MSE refers to the mean squared error (MSE) over the entire domain, and Ω1 MSE refers to the MSE at t = 0.015 s, which is calculated separately to assess the capability of the model to extrapolate in time. The baseline refers to the results of the FCN. Table 5 and Fig. 7 show that when the number of layers is less than 10, the accuracy is positively correlated with the number of layers. As the number of layers increases further, the accuracy no longer improves. Although the expressive capability of a neural network increases with depth, beyond a certain threshold additional layers do not significantly improve it. They only increase the training time and expose the model to overfitting. Hence, it is preferable to select fewer layers so that the NN trains faster while still achieving high accuracy. Moreover, because the input features are coordinates that are not related to each other, a kernel size of one treats each channel as an independent feature, which matches the structure of the input. Therefore, a kernel size of one achieves a higher accuracy than a kernel size of three. Based on the above analysis, the 10-layer S-CNN with a kernel size of one exhibited the best predictive accuracy with the minimum number of parameters. Consequently, this configuration was used in the subsequent experiments.
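The two metrics can be computed as in the following sketch, which assumes that predictions and reference values are stored as grids indexed by [time][space] and that the last row corresponds to t = 0.015 s. The function names and the tiny sample grids are hypothetical.

```python
def mse(pred, truth):
    """Mean squared error between two equally long sequences."""
    return sum((p - q) ** 2 for p, q in zip(pred, truth)) / len(pred)


def domain_and_final_mse(pred, truth):
    """Return (MSE over the whole grid, MSE over the final time slice)."""
    flat_pred = [v for row in pred for v in row]
    flat_truth = [v for row in truth for v in row]
    return mse(flat_pred, flat_truth), mse(pred[-1], truth[-1])


pred = [[0.0, 0.0], [1.0, 1.0]]
truth = [[0.0, 0.0], [0.0, 2.0]]
print(domain_and_final_mse(pred, truth))  # (0.5, 1.0)
```

Reporting the final slice separately exposes errors that accumulate in time and would otherwise be averaged out by the accurate early-time region.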

C. Ablation experiment on resampling parameters in S-CNN

According to the RAR algorithm, the resampling granularity α defines the granularity of the subdomains in the solution domain. The solution domain was divided into α subintervals along each of the x and t dimensions, resulting in α² subdomains. Every 1000 epochs, the network calculates and compares the MSE of each subdomain and performs resampling in the subdomain with the largest MSE. The number of resampling points is denoted by m.

The initial number of PDE sampling points was set to 3000 to prevent excessive sampling and training. In addition, 2000 data points were sampled, with 1000 points used to calculate the initial loss and 1000 points used to calculate the boundary loss. The maximum number of PDE samples was set to 5000, and resampling was stopped once the total number of PDE samples reached 5000 after multiple iterations. The Latin hypercube sampling (LHS) method was used for resampling.
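The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`lhs_points`, `rar_step`), the x extent of the domain, and the toy residual are hypothetical, and a real run would evaluate the PDE residual of the trained network instead.

```python
import random


def lhs_points(n, x_range, t_range, rng):
    """Latin hypercube sample of n points in the rectangle x_range x t_range."""
    def stratified(lo, hi):
        # One uniform draw inside each of n equal-width strata, then shuffle.
        width = (hi - lo) / n
        values = [lo + (i + rng.random()) * width for i in range(n)]
        rng.shuffle(values)
        return values
    return list(zip(stratified(*x_range), stratified(*t_range)))


def rar_step(points, residual, alpha, m, x_range, t_range, max_points, rng):
    """Add up to m LHS points to the subdomain with the largest residual MSE."""
    if len(points) >= max_points:
        return points  # sampling budget exhausted, so resampling stops
    (x0, x1), (t0, t1) = x_range, t_range
    dx, dt = (x1 - x0) / alpha, (t1 - t0) / alpha
    sq_sum = [[0.0] * alpha for _ in range(alpha)]
    count = [[0] * alpha for _ in range(alpha)]
    for x, t in points:
        i = min(int((x - x0) / dx), alpha - 1)
        j = min(int((t - t0) / dt), alpha - 1)
        sq_sum[i][j] += residual(x, t) ** 2
        count[i][j] += 1

    def cell_mse(cell):
        i, j = cell
        # Cells without points cannot be ranked, so they sort last.
        return sq_sum[i][j] / count[i][j] if count[i][j] else float("-inf")

    cells = [(i, j) for i in range(alpha) for j in range(alpha)]
    i, j = max(cells, key=cell_mse)
    n_new = min(m, max_points - len(points))
    cell_x = (x0 + i * dx, x0 + (i + 1) * dx)
    cell_t = (t0 + j * dt, t0 + (j + 1) * dt)
    return points + lhs_points(n_new, cell_x, cell_t, rng)


rng = random.Random(0)
x_range, t_range = (0.0, 1.0), (0.0, 0.015)    # x extent is a hypothetical choice
points = lhs_points(3000, x_range, t_range, rng)
toy_residual = lambda x, t: x * t              # placeholder for the PDE residual
for _ in range(5):                             # one call per 1000 training epochs
    points = rar_step(points, toy_residual, 2, 500, x_range, t_range, 5000, rng)
print(len(points))
```

With α = 2 and m = 500, the budget of 5000 PDE points is reached after four resampling rounds, and any later call leaves the point set unchanged.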

Ablation experiments were conducted in two dimensions (resampling granularity and number of resampling points) while maintaining an identical S-CNN architecture (10 layers) and the other hyperparameters. The S-CNN network was compared with an FCN (baseline) using identical initial conditions, that is, ϕ0 = ϕ1. The detailed experimental results are presented in Tables 6 and 7.

Table 6 [TABLE:6]. Ablation Study on Resampling Numbers.

Resample Numbers m Baseline Ω MSE 8.9 × 10−7 9.1 × 10−8 1.3 × 10−7 9.5 × 10−8 1.3 × 10−7 9.8 × 10−8 8.0 × 10−8 1.3 × 10−7 1.0 × 10−7 3.9 × 10−8 Ω1 MSE 3.8 × 10−6 2.1 × 10−8 8.4 × 10−8 9.8 × 10−9 5.4 × 10−8 4.1 × 10−8 4.9 × 10−9 3.1 × 10−8 2.0 × 10−8 1.8 × 10−8

Table 7 [TABLE:7]. Ablation Study on Resampling Granularity.

Resample Granularity α Baseline Ω MSE 8.9 × 10−7 8.0 × 10−8 1.5 × 10−7 1.2 × 10−8 6.4 × 10−7 1.0 × 10−7 Ω1 MSE 3.8 × 10−6 4.9 × 10−9 8.5 × 10−8 9.1 × 10−8 3.5 × 10−8 8.8 × 10−8

Tables 6 and 7 show that, under different resampling hyperparameters, our network consistently achieved a better Ω1 MSE than the FCN baseline, typically by about two orders of magnitude. With the optimal hyperparameters, the R2-PINN achieved an Ω1 MSE on the order of 10−8 or even 10−9. This indicates that our method has a significant advantage in determining whether the flux values attain a steady state at a specific instant. This advantage was demonstrated during the k∞ search described in Section IV D.

Furthermore, the results were obtained by adopting another initial condition, where ϕ0 = ϕ2. A boxplot analysis of the resampling hyperparameters for all the results is shown in Fig. 8 [FIGURE:8] and Fig. 9 [FIGURE:9]. When the number of resamples is 500 and the resample granularity is set to two, the model achieves the best accuracy.

Table 8 [TABLE:8]. Training Performance with Different Resample Numbers.

m Training Epochs Training Time (s) Avg. Time per Epoch(s)

Fig. 8. MSE Comparison Between Parameters. (a) m Comparison; (b) α Comparison.

Fig. 11 [FIGURE:11] shows the distribution of the values predicted by the model. When t approaches zero, there is a specific deviation between the predicted result in the boundary region and zero. This guided us to appropriately increase the weights of the LossBoundary and LossInitial.

Fig. 10 [FIGURE:10]. Error Field. (a) k∞ = 1.0001; (b) k∞ = 1.0041.

Fig. 9. Resample Parameter 3D Visualization.

Given the introduction of the RAR mechanism, it is necessary to consider whether it incurs a significant time overhead. Hence, the training times for different numbers of resampling points were compared. The results are listed in Table 8. When the number of resampling points was set to 500, the training time required by the model was significantly lower than when it was set to zero. The average time per epoch did not increase significantly. Thus, setting an appropriate number of resampling points effectively reduced the overall network training time. Although the RAR method increases the number of sampling points, it yields a different sample density in each region, with relatively more sampling points in complex regions. This contributes substantially to the convergence speed of the network.

The model uses the optimal resampling parameter configuration to predict the flux distribution under different k∞ values where ϕ0 = ϕ1. The error plot is shown in Fig. 10. Except for the relatively large errors at t=0 s, the errors in the other regions were relatively flat. Notably, the errors increased only marginally as time increased.

Fig. 11. Predict Result. (a) k∞ = 1.0001; (b) k∞ = 1.0041.

To evaluate the performance of the model and compare the loss terms, Fig. 12 [FIGURE:12] shows the training and testing losses. R2-PINN converged in 2000 epochs. In Fig. 12, LossPDE is larger than LossBoundary and LossInitial. This may result in LossPDE dominating the optimization process, so that the other losses cannot be optimized effectively. To address this issue, w in Eq. 19 was set to 100 to impose higher penalties on the errors in the boundary and initial regions.
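The effect of the weight w can be illustrated with a small sketch. It assumes that Eq. 19 has the form LossPDE + w (LossBoundary + LossInitial); that form, the function names, and the sample values are assumptions made for illustration only.

```python
def mse(pred, target=None):
    """Mean squared error; the target defaults to zero for residual-type terms."""
    if target is None:
        target = [0.0] * len(pred)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def total_loss(pde_residual, boundary_pred, initial_pred, initial_target, w=100.0):
    """Weighted sum of the three loss terms (assumed form of Eq. 19)."""
    loss_pde = mse(pde_residual)                      # PDE residual should vanish
    loss_boundary = mse(boundary_pred)                # boundary flux is compared with zero
    loss_initial = mse(initial_pred, initial_target)  # match the initial distribution
    return loss_pde + w * (loss_boundary + loss_initial)


# Hypothetical values: a small boundary error dominates once w = 100 is applied.
loss = total_loss([0.1, -0.1], [0.0, 0.2], [1.0, 0.5], [1.0, 0.5])
print(loss)
```

With w = 1 the boundary term in this example would contribute about as much as the PDE term; with w = 100 it dominates, which is the intended penalty on the boundary and initial regions.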

Based on the experimental results in Section IV B and Section IV C, an optimized S-CNN architecture with an optimized RAR mechanism was used to compose the R2-PINN for further use in searching critical parameters. This is discussed in Section IV D and Section IV E.

Fig. 13 [FIGURE:13]. ϕ Distribution. (a) ϕ0 = ϕ1, k∞ = 1.0001; (b) ϕ0 = ϕ1, k∞ = 1.0041; (c) ϕ0 = ϕ2, k∞ = 1.0001; (d) ϕ0 = ϕ2, k∞ =

D. k∞ Search with R2-PINN

For a given geometric shape and volume of the reactor core, k∞ and L² can be adjusted by modifying the reactor core size or modifying the composition of the materials within the reactor such that keff is one. When the system attains a steady state after a sufficient period, the neutron flux density follows the distribution described by Eq. 6, and the reactor is in a critical state.

In this experiment, L² was not altered. Different networks were trained with different values of k∞ to predict the evolution of the neutron flux for each value. By continuously adjusting k∞, we attempted to bring keff to one, thereby attaining the critical state. Initially, the parameter search range was set as [1.0001, 1.0041]. It has been verified that when k∞ is 1.0001, keff < 1. This indicates a subcritical state in which ϕ(x, t) decays exponentially with time t. When k∞ is 1.0041, keff > 1, which indicates a supercritical state in which ϕ(x, t) increases continuously. The specific distributions are shown in Fig. 13.

In the absence of delayed neutrons, when the system approaches criticality it converges to a critical state within a very short time (5–8 ms) regardless of the boundary conditions, after which ϕ no longer varies. At this point, the ϕ-distribution is the steady-state analytical solution. The parameter search interval is partitioned into n equal parts, yielding multiple k∞ values. A network is trained for each value, ϕt is calculated after a specific time tτ, and the search continues until ϕt approaches zero within a reasonable accuracy range. The process is illustrated in Fig. 14 [FIGURE:14]. In each network, Eq. 2 is used as the equation for LossPDE, and Eq. 25 is used as the equation for LossInitial. The MSE between the predicted boundary values and zero is used to compute LossBoundary.

Using the automatic differentiation mechanism of the NN, the partial derivative of ϕ for t was calculated after obtaining the predicted values. Given the network's capability to predict flux variations within a time interval of 0.015 s, the experiment used the last five time points to calculate ∂ϕ(r, t)/∂t and to determine whether the system was in a critical state [29].
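The steady-state check on the last five time points can be sketched as follows. This is an illustration under stated assumptions: a central finite difference stands in for automatic differentiation, and the toy flux, the time spacing `dt`, the step `h`, and the threshold are hypothetical choices rather than values from the experiment.

```python
import math


def mean_abs_phi_t(phi, xs, t_end=0.015, n_last=5, dt=1e-4, h=1e-7):
    """Mean |d(phi)/dt| over xs at the last n_last time points up to t_end."""
    times = [t_end - k * dt for k in range(n_last)]
    total = 0.0
    for t in times:
        for x in xs:
            # Central difference stands in for automatic differentiation.
            total += abs((phi(x, t + h) - phi(x, t - h)) / (2.0 * h))
    return total / (len(times) * len(xs))


def toy_flux(omega):
    """Hypothetical separable flux sin(pi x) * exp(omega t)."""
    return lambda x, t: math.sin(math.pi * x) * math.exp(omega * t)


xs = [0.1 * i for i in range(1, 10)]
steady = mean_abs_phi_t(toy_flux(0.0), xs)       # flux constant in time
decaying = mean_abs_phi_t(toy_flux(-50.0), xs)   # subcritical-like decay
is_critical = steady < 1e-6                      # hypothetical threshold
print(steady, decaying, is_critical)
```

In a PINN setting, `phi` would be the trained network, and the derivative would come from the framework's automatic differentiation rather than from finite differences.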

Following the procedure in Fig. 14, searches for k∞ were conducted with the initial distribution following ϕ1 and ϕ2, and each search result was recorded. ϕt as a function of k∞ is shown in Fig. 15 [FIGURE:15]. When n = 2, the grid method degenerates into a binary search, and only approximately 20 iterations are required to obtain the search results with an accuracy level of 10−5. The total runtime of the program was approximately 30 min.
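The interval-refinement search can be sketched as below. It is a minimal illustration, not the procedure of Fig. 14 itself: `phi_t` is a hypothetical stand-in for "train a network at this k∞ and return ϕt", and the toy function with its zero at 1.0017 is invented for the example.

```python
import math


def search_k_inf(phi_t, lo, hi, n=2, tol=1e-5):
    """Refine [lo, hi] in n equal parts until it is narrower than tol.

    Returns the midpoint of the final bracket and the number of phi_t
    evaluations (each evaluation corresponds to one trained network).
    """
    f_lo, f_hi = phi_t(lo), phi_t(hi)
    if f_lo * f_hi > 0:
        raise ValueError("phi_t must change sign on [lo, hi]")
    evaluations = 2
    while hi - lo > tol:
        step = (hi - lo) / n
        new_lo, new_f_lo, new_hi, new_f_hi = lo, f_lo, hi, f_hi
        for i in range(1, n):                 # interior grid points only
            k = lo + i * step
            f_k = phi_t(k)
            evaluations += 1
            if new_f_lo * f_k <= 0:           # sign change: zero lies to the left of k
                new_hi, new_f_hi = k, f_k
                break
            new_lo, new_f_lo = k, f_k         # otherwise move the left end up
        lo, f_lo, hi, f_hi = new_lo, new_f_lo, new_hi, new_f_hi
    return 0.5 * (lo + hi), evaluations


# Hypothetical monotone phi_t(k) that crosses zero at k = 1.0017.
toy_phi_t = lambda k: math.exp(800.0 * (k - 1.0017)) - 1.0

k_binary, evals_binary = search_k_inf(toy_phi_t, 1.0001, 1.0041, n=2)
k_grid, evals_grid = search_k_inf(toy_phi_t, 1.0001, 1.0041, n=5)
print(k_binary, evals_binary, k_grid, evals_grid)
```

With n = 2 each refinement costs one evaluation and halves the bracket; a larger n shrinks the bracket faster per refinement but may need up to n − 1 evaluations each time.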

From Fig. 15, ϕt varies exponentially with k∞. This suggests that gradient-based methods such as gradient descent or Newton's method could be explored to search for parameter values more rapidly and efficiently. Alternatively, a curve can be fitted to the scattered data, and the intersection of the fitted curve with the x-axis yields the value of k∞ at the critical state. This process can be completed in a small number of search iterations. The k∞ value corresponding to the critical state is identified by progressively refining the interval. The search results are presented in Table 9. Among these, ∆ϕ at the last five time points is recorded to determine whether ϕ attains a steady state. Compared with the results of the FCN [29], the R2-PINN has a smaller ∆ϕ. This implies that the R2-PINN search has a higher accuracy than the FCN. Furthermore, the search results were verified for steady-state behavior, as shown in Fig. 16.

Fig. 15. Search Result of ϕt. (a) ϕ0 = ϕ1; (b) ϕ0 = ϕ2.

Fig. 16. Steady-State Verification. (a) ϕ0 = ϕ1; (b) ϕ0 = ϕ2.

From Fig. 16, the flux tends to stabilize as t increases. This implies that the system has attained a critical state. This indicates that our network can effectively search for optimal parameters corresponding to the critical state.

E. k∞ Search efficiency improvement

At the start of the k∞ search, the search interval is large, and the k∞ values being tested differ considerably from the critical value. High prediction accuracy is therefore not required at this stage: the prediction is used only to compute ϕt, which serves as a priori information for the next interval refinement. Therefore, during training, we checked at regular intervals whether the rate of variation of the flux had converged. When ϕt converges, training is stopped, and training of the network for the next k value begins. This results in a faster parameter search without loss of accuracy. The criterion for stopping the network iteration is as follows:

$$\frac{(\phi_t)_{i+1} - (\phi_t)_i}{(\phi_t)_i - (\phi_t)_{i-1}} < \lambda \tag{27}$$

The parameter λ was set to 0.01. Every 200 iterations, Eq. 27 was evaluated to determine whether training should stop. The comparative results for ϕ0 = ϕ1 are listed in Table 10 [TABLE:10]. The early termination mechanism significantly reduces the search time, to 0.56 times that without early termination.
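The stopping rule of Eq. 27 can be sketched as a small helper that is called once per check interval. Two points are assumptions added here and not stated in the text: the ratio is taken in absolute value so that a sign flip is not mistaken for convergence, and a zero denominator is treated as converged only if the latest value is also unchanged. The ϕt sequence in the usage example is hypothetical.

```python
def should_stop(phi_t_history, lam=0.01):
    """Eq. 27 applied to the three most recent phi_t values."""
    if len(phi_t_history) < 3:
        return False                      # not enough checks yet
    prev, curr, new = phi_t_history[-3:]
    denom = curr - prev
    if denom == 0.0:
        return new == curr                # no change twice in a row
    return abs((new - curr) / denom) < lam


history = []
stopped_at = None
for check in range(1, 51):                      # one check every 200 iterations
    history.append(1.0 + 0.5 ** (7 * check))    # hypothetical phi_t settling to 1
    if should_stop(history):
        stopped_at = check * 200
        break
print(stopped_at)
```

In a training loop, `history` would be filled with the ϕt value computed from the current network every 200 iterations, and training for that k∞ would end as soon as `should_stop` returns true.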

Table 10. Time Consumption Comparison.

R2-PINN R2-PINN(Early Termination) Cost Time 7841 s 1837 s 1027 s Search Result

Furthermore, as shown in Fig. 15, ϕt is exponentially related to k∞. Over a narrow search interval, this relation can be approximated by a quadratic function, so the critical value of k∞ can be estimated from the zero of a fitted quadratic. To optimize the search algorithm, an experiment was conducted to compare three search methods: binary, grid, and quadratic fitting, each using R2-PINN with the early termination mechanism. The results are shown in Fig. 17 [FIGURE:17].
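A quadratic-fitting search of this kind can be sketched as follows. The sketch assumes three (k∞, ϕt) samples and an interpolating quadratic; the function names, the choice of three samples, and the toy ϕt with its zero at 1.0017 are hypothetical and are not taken from the experiment.

```python
import math


def fit_quadratic(points):
    """Coefficients (a, b, c) of a*u^2 + b*u + c through three (u, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = y0 / ((x0 - x1) * (x0 - x2))
    d1 = y1 / ((x1 - x0) * (x1 - x2))
    d2 = y2 / ((x2 - x0) * (x2 - x1))
    a = d0 + d1 + d2
    b = -(d0 * (x1 + x2) + d1 * (x0 + x2) + d2 * (x0 + x1))
    c = d0 * x1 * x2 + d1 * x0 * x2 + d2 * x0 * x1
    return a, b, c


def critical_k(points):
    """Zero of the fitted quadratic that lies inside the sampled k range."""
    shift = sorted(k for k, _ in points)[1]        # centre on the middle sample
    shifted = [(k - shift, y) for k, y in points]  # improves conditioning near k = 1
    a, b, c = fit_quadratic(shifted)
    lo = min(u for u, _ in shifted)
    hi = max(u for u, _ in shifted)
    if a == 0.0:
        if b == 0.0:
            raise ValueError("fitted curve is constant")
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            raise ValueError("fitted quadratic has no real root")
        root = math.sqrt(disc)
        roots = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    inside = [u for u in roots if lo <= u <= hi]
    if not inside:
        raise ValueError("no root inside the sampled interval")
    return shift + inside[0]


# Hypothetical exponential phi_t(k) with its zero at k = 1.0017.
toy_phi_t = lambda k: math.exp(800.0 * (k - 1.0017)) - 1.0
samples = [(k, toy_phi_t(k)) for k in (1.0001, 1.0021, 1.0041)]
k_critical = critical_k(samples)
print(k_critical)
```

Because the toy ϕt is exponential rather than quadratic, the estimate is only approximate; narrowing the sampled interval around the first estimate and refitting would reduce the error.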

Fig. 17. Comparison of Search Method.

The quadratic fitting search method determined k∞ rapidly in 250 s and attained an accuracy of 10−4. Meanwhile, the grid search method determined k∞ in 522 s and attained an accuracy of 10−5. Different methods can be selected for parameter searches based on the actual requirement for time and accuracy.

F. Network prediction for different k∞: R2-PINN's robustness

The network was trained at different k∞ values. The results were compared with the analytical solution generated to obtain the accuracy at each k∞ value, as shown in Fig. 18, to validate the robustness of the R2-PINN.

The robustness of R2-PINN is illustrated in Fig. 18. Across the tested values, the network consistently maintained a high accuracy of at least 10−7. This indicated its capability to handle various scenarios and maintain reliable predictions. The observed differences in MSE across parameter values were relatively small. This further highlights the stability and effectiveness of the network. The robustness of the R2-PINN network is evident from its consistently high accuracy for different parameter values.

Fig. 18 [FIGURE:18]. MSE Accuracy Comparison for Different k∞.

G. R2-PINN for solving a two-dimensional reactor diffusion equation for a single energy group

In this experiment, an 11-layer S-CNN network was used for training. The remaining hyperparameters were selected as described in Section IV D.

To validate the generalizability of the models, an experiment was conducted between different k∞. The results are shown in Table 11 [TABLE:11]. Here, the network achieved an accuracy of 10−5 under different parameters, and the MSE for the extrapolation region (that is, the region with t = 1 s) still achieved an accuracy of 10−5. This demonstrates that the model performs well in terms of temporal extrapolation capability and can be used to search for parameters corresponding to the critical state.

Table 11. MSE Results in Different k∞.

4.4 × 10−6 1.1 × 10−6 3.2 × 10−6 1.2 × 10−6 1.8 × 10−6 8.5 × 10−7 2.4 × 10−6 1.1 × 10−5 5.5 × 10−6 2.2 × 10−6 4.5 × 10−6
t=1 s MSE 9.0 × 10−6 1.7 × 10−6 5.2 × 10−6 2.2 × 10−6 3.0 × 10−6 1.6 × 10−6 4.4 × 10−6 2.2 × 10−5 9.3 × 10−6 4.7 × 10−6 8.2 × 10−6

Searching for the k∞ parameter in the manner described in Section IV D yields k∞ = 1.1378. Using the source iteration method, we determined that k∞ = 1.1202. The relative error is approximately 1.6%. The prediction results and the reference solution are presented in Fig. 19 [FIGURE:19]. The error plots are shown in Fig. 20 [FIGURE:20].

Fig. 19. Comparison Between Result and Truth. (a) Prediction Result (t=0 s); (b) Prediction Result (t=1 s); (c) Truth (t=0 s); (d) Truth (t=1 s).

Fig. 20. Loss Field. (a) t=0 s; (b) t=1 s.

H. R2-PINN for solving two-dimensional rectangular geometry multi-group multi-material diffusion problem

In this two-group diffusion problem, the flux of each neutron group is predicted independently to enhance the accuracy. Specifically, we separately model the fast group neutron flux (ϕ1) and the thermal group neutron flux (ϕ2) using dual PINNs with an effective multiplication factor of keff = 0.9693. Given the coupling between the fluxes of these two energy groups, the dual PINNs are designed to share loss functions and are optimized sequentially. This shared loss structure ensures that the interaction between the energy groups is captured effectively while allowing for customized optimization paths for each flux. To train the models, a four-layer S-CNN was employed. The other PINN-related parameters, such as the sampling ratios, activation functions, and optimization strategies, remained consistent with those detailed in Section IV D.

Using R2-PINN, the MSE of ϕ1 attains 1.36 × 10−6 and that of ϕ2 attains 2.49 × 10−7. To better illustrate the flux distribution, particularly the abrupt variations in the neutron flux at the material interfaces for different neutron groups, cross-sectional views of the neutron flux were adopted with slices extracted from the x-z and y-z planes. The results and reference solutions are presented in Fig. 21 [FIGURE:21] and Fig. 22 [FIGURE:22].

Fig. 21. Predict Result. (a) ϕ1 Distribution in x-z Plane; (b) ϕ2 Distribution in x-z Plane; (c) ϕ1 Distribution in y-z Plane; (d) ϕ2 Distribution in y-z Plane.

Fig. 22. Truth Result. (a) ϕ1 Distribution in x-z Plane; (b) ϕ2 Distribution in x-z Plane; (c) ϕ1 Distribution in y-z Plane; (d) ϕ2 Distribution in y-z Plane.

The error fields for each energy group are presented in Fig. 23 [FIGURE:23]. Because the flux values are small, the loss rate was also calculated to make the errors easier to interpret; it is shown in Fig. 23 as well. The loss rate is defined in Eq. 28.

$$\text{loss rate} = \frac{\left|\phi_{\text{predict}} - \phi_{\text{truth}}\right|}{\phi_{\text{truth}}} \tag{28}$$

Fig. 23. Loss Field and Loss Rate Field. (a) ϕ1 Loss; (b) ϕ2 Loss; (c) ϕ1 Loss Rate; (d) ϕ2 Loss Rate.

The results show that we can effectively determine the flux distribution under the specified keff according to the steady-state equations. Additionally, according to the experiments presented in Section IV D and Section IV G, we can determine the critical parameters according to the transient equations. Thus, the model can be adapted well to solve the two major problems of nuclear reactors, that is, solving for the fluxes and searching for the steady-state parameters.

I. R2-PINN for solving 2D-IAEA problem

Using a 6-layer R2-PINN structure, we incorporated keff as a parameter for iterative optimization in neural network training. The model utilized 18000 points for computing LossPDE, 500 points for LossBoundary, and 76 points for LossData. Finally, the relative error and relative L∞ error were used to evaluate the accuracy of the R2-PINN. Herein, the relative L∞ error is particularly important in the nuclear engineering domain. The specific equations are expressed in Eq. 29 and Eq. 30. The prediction results and reference truth are presented in Fig. 24 [FIGURE:24]. The absolute error plots are shown in Fig. 25 [FIGURE:25].

Fig. 24. Comparison Between Results and Reference. (a) R2-PINN result of ϕ1; (b) R2-PINN result of ϕ2; (c) FreeFem++ result of ϕ1; (d) FreeFem++ result of ϕ2.

Fig. 25 [FIGURE:25]. Absolute Loss Field. (a) Absolute Error of ϕ1; (b) Absolute Error of ϕ2.

Considering the engineering acceptance criteria for the 2D IBP, the flux calculation error in fuel assemblies with a relative flux higher than 0.9 should be less than 5%. In fuel assemblies with a relative flux less than 0.9, the flux calculation error should be less than 8%. In addition, the relative error of keff should be less than 0.005 [47]. The predicted results shown in Table 12 [TABLE:12] satisfy these acceptance criteria. This indicates that R2-PINN also holds practical engineering application value.
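The acceptance check described above can be sketched with the relative error and relative L∞ error named in this subsection. The function names and the sample numbers are hypothetical, and treating a relative flux of exactly 0.9 under the 8% limit is an assumption, since the criteria do not state that case.

```python
def relative_error(pred, truth):
    """Pointwise relative error |pred - truth| / |truth|."""
    return abs(pred - truth) / abs(truth)


def relative_linf_error(pred, truth):
    """Relative L-infinity error: max |pred - truth| divided by max |truth|."""
    num = max(abs(p - q) for p, q in zip(pred, truth))
    return num / max(abs(q) for q in truth)


def meets_acceptance(flux_truth, flux_pred, keff_pred, keff_ref):
    """Check the flux limits (5% above 0.9, 8% otherwise) and the keff limit (0.005)."""
    for truth, pred in zip(flux_truth, flux_pred):
        limit = 0.05 if truth > 0.9 else 0.08   # exactly 0.9 falls under 8% (assumption)
        if relative_error(pred, truth) >= limit:
            return False
    return relative_error(keff_pred, keff_ref) < 0.005


# Hypothetical assembly-averaged relative fluxes and keff values.
ok = meets_acceptance([1.2, 0.5], [1.25, 0.53], keff_pred=1.0298, keff_ref=1.0296)
bad = meets_acceptance([1.2], [1.3], keff_pred=1.0298, keff_ref=1.0296)
e_inf = relative_linf_error([1.0, 2.0], [1.0, 4.0])
print(ok, bad, e_inf)
```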

Table 12. Results of 2D-IAEA Benchmark.

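e∞ of ϕ1 e∞ of ϕ2 er of keff 1.02977 1.797 × 10−4

$$e_r = \frac{\left|\phi_{\text{predict}} - \phi_{\text{truth}}\right|}{\phi_{\text{truth}}} \tag{29}$$

$$e_\infty = \frac{\left\|\phi_{\text{predict}} - \phi_{\text{truth}}\right\|_\infty}{\left\|\phi_{\text{truth}}\right\|_\infty} \tag{30}$$

J. The strategy for hyperparameter selection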

The hyperparameter selection strategy was formulated meticulously to balance the model expressiveness, training stability, and prediction accuracy across the computational domain. This subsection explains the important hyperparameter selections for neural networks.

First, a 10-layer network was selected to achieve an optimal tradeoff between complexity and expressiveness. Compared with shallower networks, deeper networks more effectively approximate complex nonlinear functions. This is typical in physical modeling. As shown in Table 5, our evaluation across configurations indicated that a 10-layer S-CNN structure consistently outperforms the others. This makes it the optimal option. This selection method was similarly applied to different benchmark cases to ensure robustness.

Second, the Tanh activation function was selected over ReLU and Sigmoid because of its smoother nonlinearity, which is critical for approximating continuous functions in physical problems. Although ReLU is effective in mitigating vanishing gradients, it has insufficient stability in capturing smooth physical fields and generally encounters difficulty with highly nonlinear PDEs. In contrast, Tanh enhances numerical stability. This makes it more suitable for PINNs.

Third, the LBFGS optimization algorithm was selected for its faster convergence in high-dimensional problems and capability to prevent gradient explosions. Compared with first-order optimizers such as Adam and SGD, LBFGS provides a second-order approximation. This results in more stable training and better performance, particularly in data-limited scenarios. During testing, Adam and SGD were vulnerable to early convergence and local minima, thereby yielding suboptimal predictions. Meanwhile, LBFGS provided better stability and training completeness.

Finally, the selection of 26 hidden neurons per layer achieved a balance between the model capacity and generalization. Excessively few neurons result in underfitting, whereas excessively many neurons risk overfitting and instability. Through extensive experimentation, 26 neurons per layer were observed to provide an optimal trade-off. This ensured accurate and stable predictions.

V. RESULTS AND DISCUSSION

An ablation study on the number of layers revealed that increasing the depth of the S-CNN architecture improved the accuracy. Notably, a kernel size of one yielded superior results. However, an accuracy bottleneck makes excessively deep networks unnecessary. As described in Section IV B, the S-CNN outperformed the FCN architecture across layer configurations. Even when the number of layers was doubled, no vanishing gradients were observed. The addition of skip connections effectively mitigated the vanishing gradients and enhanced the stability, robustness, and overall accuracy of the model.

In Section IV C, a sensitivity analysis of the resampling parameters revealed that the optimal R2-PINN configuration is a resample granularity of 2 and 500 resampling points. Setting the loss weight coefficient w to 100 achieved the best prediction performance, resulting in boundary and initial losses lower than the PDE loss. The test loss in Fig. 12 shows a smooth training process without significant fluctuations. At approximately 2000 epochs, the LBFGS optimizer automatically stopped training, having achieved an MSE on the order of 10−7. This indicates a high precision.

Furthermore, k∞ was determined by searching for the critical state in Section IV D using the tuned R2-PINN. R2-PINN converged rapidly in 1000 epochs and attained an accuracy of 10−8. The fast convergence and high convergence accuracy of the model indicate that it is suitable for parameter search tasks that require the training of multiple networks.

Then, multiple search methods are compared in Section IV E to improve the search efficiency. From the experimental results, the quadratic fitting search method can rapidly identify k∞ in only two k-value search processes, with an accuracy of 10−4 in 250 s. This is of significant value for scenarios with high real-time requirements. The grid method can also achieve a parameter search with an accuracy of 10−5 within 10 min. When a high accuracy is required, the grid method can set the optimal grid refinement numbers to attain a reasonable duration with a higher accuracy.

The results of experiments conducted at different k∞ values (described in Section IV F) show that our R2-PINN network exhibited exceptional performance in capturing the system dynamics within the domain Ω1. This region exhibited a significant improvement in accuracy compared with FCN networks. By accurately representing the intricate features and sharp gradients of this specific region, our network enables a more precise determination of whether the system is in a steady state. This enhanced accuracy is highly effective when conducting parameter searches because it allows for higher precision and more reliable results.

Finally, to verify the generalizability of the model, experiments were conducted using the S-CNN architecture for a 2D single-group problem (Section IV G). The parameter search error was approximately 1.6%. A solution with an accuracy of 10−6 was obtained by using the S-CNN to solve the 2D multi-group multi-material problem (Section IV H). For the standardized 2D-IAEA benchmark, the keff search attained an accuracy of 10−4 (Section IV I). These results show that the model can effectively predict the variation in physical quantities in the physical field under multiple equations, different initial conditions, and boundaries. Moreover, it generalizes well to different scenarios.

To summarize, the R2-PINN network performed better than the FCN network in solving the neutron diffusion equations. The accuracy improvement of one to two orders of magnitude is noteworthy considering the computational efficiency achieved by our framework. This efficiency reduces the computational load while maintaining a high accuracy. Thus, it is a promising approach for practical applications. Using a suitable search method, the proposed architecture exhibits good real-time performance.

VI. CONCLUSIONS

This study introduced a novel framework called R2-PINN. It addresses the persistent challenge of vanishing gradients in deep neural networks (DNNs) and is designed to enhance the accuracy and computational efficiency of PINNs when solving neutron diffusion equations. The R2-PINN can determine k∞ with an accuracy of 10−4 in 250 s. The MSE of a single network attained 10−8 on average, which is an order of magnitude better than that of the FCN. The S-CNN architecture is integrated into the R2-PINN framework to overcome vanishing gradients. By leveraging cross-layer connections, our model effectively learns the residual information. This allows a greater depth and expressive power of the network. This architectural enhancement ensures that the network can effectively propagate gradients through the layers, thereby enabling more accurate and stable learning. Furthermore, the RAR method is introduced to enhance the representation and sampling strategies within the network. The RAR method allows for adaptive collection of sample points, which ensures that the model captures the essential features and gradients in the solution space. This refinement strategy substantially improves the capacity of the network to handle sharp gradients and intricate features in the PDE solutions.

As described in the experimental section, comprehensive comparative experiments were conducted to optimize the R2-PINN framework. Through careful parameter tuning, including adjusting the number of layers, kernel size, and resampling hyperparameters, R2-PINN achieved accuracy improvements of one to two orders of magnitude compared with the FCN. This demonstrated its effectiveness in enhancing PDE solutions. The parameter search capability of the R2-PINN was validated by efficiently determining the value of k∞ at which the system enters the critical state within the specified search interval using high-precision network predictions. To evaluate the robustness and accuracy of the model under varying parameters, MSE validation experiments with different values of k∞ were conducted. These consistently yielded highly accurate results. Finally, for complex models such as the two-dimensional single-group neutron diffusion equation, the two-group two-material neutron diffusion equation, and the 2D-IAEA benchmark, the search for the effective multiplication factor and the solution of the steady-state flux distribution were conducted successfully.

Overall, the experimental results verify that the integration of S-CNN and the RAR mechanism in R2-PINN enables the network to capture intricate features and sharp gradients in PDE solutions. The adaptive refinement strategy significantly improves the distribution of residual points. This, in turn, enhances the capability of the network to accurately represent complex physical systems. Our observations provide compelling evidence of the potential for the R2-PINN to advance the field of deep-learning-based PDE solving. Our framework outperforms existing methods and exhibits potential for application in real-world physical systems. The contributions of this study have substantial implications for the development of more accurate and efficient models in various scientific and engineering domains.

[1] Z. S. Xie, Physical Analysis of Nuclear Reactors. (Xi’an Jiaotong University Press, Xi’an, 2004), pp. 49-74. (in Chinese)

[2] M.N. Özişik, H.R. Orlande, M.J. Colaço et al., Finite difference methods in heat transfer, 2nd edn. (CRC Press, 2017)

[3] A. Younes, P. Ackerer, F. Delay, Mixed finite elements for solving 2-d diffusion-type equations. Rev. Geophys. 48, 2008RG000277 (2010). doi:10.1029/2008RG000277

[4] Y. M. Hamada, Higher-order compact finite difference schemes for steady and transient solutions of space–time neutron diffusion model. Ann. Nucl. Energy. 175, 109177 (2022). doi:10.1016/j.anucene.2022.109177

[5] S. Yuk, J. Cho, C. Jo et al., Time-dependent neutron diffusion analysis using finite element method for a block-type vhtr core design. Nucl. Eng. Des. 360, 110512 (2020). doi:10.1016/j.nucengdes.2020.110512

[6] X.Y. Li, K. Cheng, T. Huang et al., Research on neutron diffusion equation and nuclear thermal coupling method based on gradient updating finite volume method. Ann. Nucl. Energy. 195, 110158 (2024). doi:10.1016/j.anucene.2023.110158

[7] M.G. Tavares, C.Z. Petersen, M. Schramm et al., Solution for the multigroup neutron space kinetics equations by source iterative method. Braz. J. Radiat. Sci. 9(2A), Suppl (2021). doi:10.15392/bjrs.v9i2A.731

[8] K. Zhuang, W. Shang, T. Li et al., Variational nodal method for three-dimensional multigroup neutron diffusion equation based on arbitrary triangular prism. Ann. Nucl. Energy. 158, 108285 (2021). doi:10.1016/j.anucene.2021.108285

[9] Z. Huang, Y. Yuan, G.M. Liu et al., Solution of Neutron Diffusion Problem Based on COMSOL Multiphysics and Its Application Analysis on Micro Gas-cooled Reactor. Atomic Energy Sci. Technol. 57, 565–575 (2023). doi:10.7538/yzk.2022.youxian.0354. (in Chinese)

[10] K. Sidi-Ali, E.M. Medouri, D. Ailem et al., Neutronic calculations and thermalhydraulic application using CFD for the nuclear research reactor NUR at steady state mode. Prog. Nucl. Energ. 159, 104640 (2023). doi:10.1016/j.pnucene.2023.104640

[11] Y. Ma, Y.H. Wang, J.H. Yang, ntkFoam: An OpenFOAM based neutron transport kinetics solver for nuclear reactor simulation. Comput. Math. Appl. 81, 512–531 (2021). doi:10.1016/j.camwa.2019.09.015

[12] G.E. Karniadakis, I.G. Kevrekidis, L. Lu et al., Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440 (2021). doi:10.1038/s42254-021-00314-5

[13] S. Cuomo, V.S. Di Cola, F. Giampaolo et al., Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next. J. Sci. Comput. 92, 88 (2022). doi:10.1007/s10915-022-01939-z

[14] M. Raissi, P. Perdikaris, G.E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019). doi:10.1016/j.jcp.2018.10.045

[15] S.Z. Cai, Z.P. Mao, Z.C. Wang et al., Physics-informed neural networks (PINNs) for fluid mechanics: a review. Acta. Mech. Sinica. 37, 1727–1738 (2021). doi:10.1007/s10409-

[16] B.C. Xu, X.Z. Zhang, Y.X. Wang et al., Self-adaptive physical information neural network model for prediction of two-phase flow annulus pressure. Acta. Petrol. Sin. 44, 545 (2023). doi:10.7623/syxb202303012. (in Chinese)

[17] S.Z. Cai, Z.C. Wang, S.F. Wang et al., Physics-Informed Neural Networks for Heat Transfer Problems. J. Heat. Transf. 143(6), 060801 (2021). doi:10.1115/1.4050542

[18] B. Yu, Z.Y. Gan, S.L. Zhang et al., Prediction of 2d/3d unsteady-state temperature fields and heat sources upon the physics-informed neural networks. Engineer. Mechan. 41, 1–13 (2023). doi:10.6052/j.issn.1000-4750.2023.04.0282. (in Chinese)

[19] Z. Mao, A.D. Jagtap, G.E. Karniadakis, Physics-informed neural networks for high-speed flows. Comput. Method. Appl. M. 360, 112789 (2020). doi:10.1016/j.cma.2019.112789

[20] C. Chen, G.T. Zhang, Deep learning method based on physics informed neural network with resnet block for solving fluid flow problems. Water-sui. 13(4), 423 (2021). doi:10.3390/w13040423

[21] J.S. Bai, T. Rabczuk, A. Gupta et al., A physics-informed neural network technique based on a modified loss function for computational 2d and 3d solid mechanics. Comput. Mech. 71, 543–562 (2023). doi:10.1007/s00466-022-02252-0

[22] T.S.J. Feng, W. Liang, The buckling analysis of thin-walled structures based on physics-informed neural networks. Chinese Journal of Theoretical and Applied Mechanics. 55(11), 2539–2553 (2023). doi:10.6052/0459-1879-23-277. (in Chinese)

[23] H. Zhang, X. Lyu, D. Liu et al., Nuclear Power AI Applications: Status, Challenges and Opportunities. Nuclear Power Engineering. 44(1), 1–8 (2023). doi:10.13832/j.jnpe.2023.01.0001. (in Chinese)

[24] P. Liu, D.F. Shi, R. Li et al., Simulation of Core Physics Benchmark VERA Based on Monte Carlo Code JMCT. Atomic Energy Sci. Technol. 57, 1131–1139 (2023). doi:10.7538/yzk.2022.youxian.0593 (in Chinese)

[25] H. Guo, Y.W. Wu, Q.F. Song et al., Development of multi-group Monte-Carlo transport and depletion coupling calculation method and verification with metal-fueled fast reactor. Nucl. Sci. Tech. 34, 163 (2023). doi:10.1007/s41365-023-

[26] D.H. Daher, M. Kotb, A.M. Khalaf et al., Simulation of a molten salt fast reactor using the COMSOL Multiphysics software. Nucl. Sci. Tech. 31, 115 (2020). doi:10.1007/s41365-

[27] Q.H. Yang, Y. Yang, Y.T. Deng et al., Physics-constrained neural network for solving discontinuous interface K-eigenvalue problem with application to reactor physics. Nucl. Sci. Tech. 34, 161 (2023). doi:10.1007/s41365-023-01313-0

[28] Y.S. Hao, Z. Wu, Y.H. Pu et al., Research on inversion method for complex source-term distributions based on deep neural networks. Nucl. Sci. Tech. 34, 195 (2023). doi:10.1007/s41365-023-01327-8

[29] D. Liu, Q. Luo, L. Tang et al., Solving multi-dimensional neutron diffusion equation using deep machine learning technology based on PINN model. Nuclear Power Engineering. 43(2), 1–8 (2022). doi:10.13832/j.jnpe.2022.02.0001. (in Chinese)

[30] D. Liu, L. Tang, P. An et al., The Deep Learning Method to Search Effective Multiplication Factor of Nuclear Reactor Directly. Nuclear Power Engineering. 44(5), 6–14 (2023). doi:10.13832/j.jnpe.2023.05.0006 (in Chinese)

[31] C. Shen, L. Yan, Recent development of hydrodynamic modeling in heavy-ion collisions. Nucl. Sci. Tech. 31, 122 (2020). doi:10.1007/s41365-020-00829-z

[32] L. Lu, X.H. Meng, Z.P. Mao et al., DeepXDE: A deep learning library for solving differential equations. SIAM Rev. 63(1), 208–228 (2021). doi:10.1137/19M1274067

[33] L.D. McClenny, U.M. Braga-Neto, Self-adaptive physics-informed neural networks. J. Comput. Phys. 474, 111722 (2023). doi:10.1016/j.jcp.2022.111722

[34] Z. Wang, H. Xia, S. Zhu et al., Cross-domain fault diagnosis of rotating machinery in nuclear power plant based on improved domain adaptation method. J. Nucl. Sci. Technol. 59(1), 67–77 (2022). doi:10.1080/00223131.2021.1953630

[35] Z.K. Lawal, H. Yassin, D.T.C. Lai et al., Physics-informed neural network (PINN) evolution and beyond: a systematic literature review and bibliometric analysis. Big Data Cogn. Comput. 6(4), 140 (2022). doi:10.3390/bdcc6040140

[36] Z.W. Fang, A high-efficient hybrid physics-informed neural networks based on convolutional neural network. IEEE. T. Neur. Net. Lear. 33(10), 5514–5526 (2021). doi:10.1109/TNNLS.2021.3070878

[37] X.Y. Yang, Z.X. Zhou, L.H. Li et al., Collaborative robot dynamics with physical human–robot interaction and parameter identification with PINN. Mech. Mach. Theory. 189, 105439 (2023). doi:10.1016/j.mechmachtheory.2023.105439

[38] C.X. Wu, M. Zhu, Q.Y. Tan et al., A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks. Comput. Method. Appl. M. 403, 115671 (2023). doi:10.1016/j.cma.2022.115671

[39] M.D. McKay, R.J. Beckman, W.J. Conover, A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics. 21(2), 239–245 (1979). doi:10.2307/1268522

[40] M. Liu, L.M. Chen, X.H. Du et al., Activated Gradients for Deep Neural Networks. IEEE. T. Neur. Net. Lear. 34(4), 2156–2168 (2023). doi:10.1109/TNNLS.2021.3106044

[41] W.M. Stacey, Nuclear Reactor Physics, 2nd edn. (Wiley, Weinheim, 2010), pp. 43–60

[42] J.J. Duderstadt, Nuclear Reactor Analysis. (Wiley, New York, 1976), pp. 74–88

[43] Argonne Code Center: Benchmark problem book. (Argonne National Lab. (ANL), Argonne, 1977), pp. 277–284

[44] K.M. He, X.Y. Zhang, S.Q. Ren et al., Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90

[45] F. Hecht, New development in FreeFem++. J. Numer. Math. 20(3–4), 251–265 (2012). doi:10.1515/jnum-2012-0013

[46] R.B. Lehoucq, D.C. Sorensen, C. Yang, ARPACK Users’ Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. (SIAM, 1998), pp. 21–40. doi:10.1137/1.9780898719628

[47] Y. Yang, H. Gong, S. Zhang et al., A data-enabled comprehensive network with physics-informed numerical study on solving neutron diffusion eigenvalue problems. Ann. Nucl. Energy. 183, 109656 (2023). doi:10.1016/j.anucene.2022.109656

Submission history

[v1] 2025-07-26