Article No.: 1007-4627(2025)03-0001-10
Zhang Shisheng<sup>1</sup>, Zeng Linxing<sup>1</sup>
Submitted 2025-07-21 | ChinaXiv: chinaxiv-202508.00060 | Mixed source text

Optimized Neural Network Models for Accurate Prediction of Proton Separation Energies in Lutetium and Rhenium Isotopic Chains

Zhang Shisheng<sup>1</sup>, Zeng Linxing<sup>1</sup>

(1. School of Physics, Beihang University, Beijing 102206)

Abstract: Nuclear masses play an important role in nuclear structure and nuclear astrophysics. For exotic nuclei near the drip line, differences of several hundred keV in the separation energies derived from masses may lead to discrepancies of at least three orders of magnitude in half-lives. At present, \(^{149}\mathrm{Lu}\) is the shortest-lived proton emitter observed experimentally; the separation energies of the surrounding, as-yet undetected nuclides affect the nucleosynthesis path. To this end, we employ an Artificial Neural Network (ANN) and a Bayesian Neural Network (BNN) to constrain proton separation energies. In practice, the two methods suffer from overfitting and limited accuracy, respectively. To improve the accuracy and alleviate overfitting, we developed the BNN-Beihang (BH) model, which builds on the BNN model and takes into account both the uncertainty of the label values and the statistical error of the model. For the proton separation energies of \(^{148-151}\mathrm{Lu}\) and \(^{160,161}\mathrm{Re}\) predicted by this model, the uncertainties fall below 100 keV for the first time; these results will effectively constrain the lifetimes of proton-emitting nuclei in the lutetium and rhenium isotopic chains.

Keywords: proton separation energy; neural-network model; lutetium isotopic chain; rhenium isotopic chain

CLC number: O571.22  Document code: A  DOI: 10.11804/NuclPhysRev.32.01.01

Introduction

The development of radioactive nuclear-beam physics has made it possible to study unusual properties of exotic nuclei, such as halo structures[1-2], shape inversion[3-4], and new magic numbers[5]. Weakly bound nuclei near the drip line, far from the \(\beta\)-stability line, may form halo nuclei or proton emitters[6]; the decay modes and lifetimes of these proton emitters affect the nucleosynthesis paths in the evolution of celestial bodies[7-8]. In general, a difference of several hundred keV in separation energy may cause a discrepancy of at least three orders of magnitude in the lifetime of a proton emitter[9-10]. Therefore, accurate theoretical predictions of separation energies will help us understand nucleosynthesis in stellar evolution as well as nuclear decay processes.

So far, \(^{149}\mathrm{Lu}\) is the shortest-lived proton emitter observed experimentally[11]. Under the assumption of axial deformation, K. Auranen used the nonadiabatic quasiparticle model to explore the relation between the deformation and lifetime of \(^{149}\mathrm{Lu}\), reaching the preliminary conclusion that it is oblate[11]. Subsequently, also under the assumption of axial deformation, we used the deformed relativistic Hartree-Bogoliubov theory in continuum (DRHBc)[12] to explicitly rule out a prolate \(^{149}\mathrm{Lu}\). The proton-emission half-life then calculated with the semiclassical Wentzel-Kramers-Brillouin (WKB) approximation was found to lie within the experimental uncertainty[13]. Recently, we went beyond the axial-deformation assumption and adopted the triaxial relativistic Hartree-Bogoliubov theory in continuum (TRHBc)[14] to recalculate the ground-state binding energy \(E_B\) of \(^{149}\mathrm{Lu}\), thereby confirming that \(^{149}\mathrm{Lu}\) has a triaxially deformed ground state; combined with the WKB approximation, this gave a proton-emission half-life still within the experimental uncertainty[15]. It is worth noting that the quantity that determines the lifetime, \(\sqrt{2\mu|E - V(r,\theta,\varphi)|}\), appears in the exponent of the WKB tunneling probability. Here, \(V\) is the potential given by the microscopic theory, and the decay energy \(E\) is determined by the nuclear mass. Clearly, a difference of several hundred keV in the decay energy \(E\) can easily lead to orders-of-magnitude discrepancies in the emitter lifetime. Therefore, we need to improve the accuracy of nuclear-mass predictions in this region as much as possible.

There are many theoretical models for describing nuclear masses, including macroscopic liquid-drop models (LDM), the Duflo-Zuker (DZ) model[^16], and microscopic mean-field theory (MFT). P. Möller, using the masses given by the finite-range droplet model (FRDM), achieved an accuracy of 570 keV[^17]. In 1995, Duflo and Zuker used the 28-parameter DZ model and achieved an accuracy of 375 keV for 1751 nuclear masses[^16]. Within the framework of non-covariant density functional theory, in 2016 S. Goriely employed the Skyrme-Hartree-Fock-Bogoliubov (SHFB) method and achieved an accuracy of 561 keV for 2353 nuclear masses[^18]. In 2010, Zhao Pengwei used covariant density functional theory (CDFT) to predict the masses of 60 nuclei with an accuracy of 1330 keV[^19]. At present, the most accurate phenomenological model is the Weizsäcker-Skyrme 4 (WS4) model[^20], which reaches an accuracy of 298 keV for the directly measured nuclear masses in Atomic Mass Evaluation 2012 (AME2012)[^21], and has a prediction error of 474 keV for all nuclear masses in AME2020[^22]. However, for proton emission, the predicted masses or separation energies must deviate by less than 100 keV for the uncertainty of the lifetimes to be constrained within 1–2 orders of magnitude. Therefore, to improve model accuracy, we apply machine-learning methods to constrain the proton separation energy $S_p$.

Received: 2025-07-15; Revised: 2025-07-15
Funding: National Natural Science Foundation of China (12175010)
Author profile: Zhang Shisheng (1976–), female, from Shenyang, Liaoning, professor, doctoral supervisor, engaged in research on atomic nuclei and nuclear astrophysics; E-mail: zss76@buaa.edu.cn

In recent years, owing to their powerful fitting capability, machine-learning methods have been widely applied across nuclear physics. Neural-network models were first used in nuclear physics to classify the stability of nuclei[^23]. Subsequently, machine-learning models were used to predict other nuclear properties. For example, neural networks have been used to predict proton separation energies[^24]; support vector machines (SVM) have been used to predict several nuclear properties (mass, $\beta$-decay half-life, and ground-state spin and parity)[^25]; Bayesian neural network (BNN) models[^26-28] and convolutional neural network (CNN) models[^29] have been used to predict nuclear charge radii; the reduced-basis method has been used to reduce computational cost in few-body systems[^30]; variational autoencoders (VAE)[^31] and ensemble learning[^32] have been used to infer the neutron-star equation of state; and VAEs have also been used to construct collective variables in many-fermion systems[^33]. Interpretability methods from machine learning, such as SHapley Additive exPlanations (SHAP)[^34], have been used to quantify the importance of input features in models[^35-36].

Recently, machine-learning models have been widely used for high-precision prediction of nuclear masses. Among them, decision trees[^37] and the decision-tree-based Light Gradient Boosting Machine (LightGBM)[^38] fit well and are less prone to overfitting; kernel ridge regression (KRR) can alleviate overfitting in nonlinear problems[^39-40], but because of its high computational cost it can only be applied to small datasets; symbolic regression (SR) represents predicted values with analytic expressions and is therefore more interpretable[^41]; and neural-network-based models, including multilayer perceptrons (MLP)[^42-43], artificial neural networks (ANN)[^44-45], Kolmogorov-Arnold networks (KAN)[^46], and radial basis functions (RBF)[^47-51], possess strong fitting and generalization capabilities.

However, the above models provide only a single predicted value, not its distribution. In contrast, Gaussian processes (GP)[^52-53], BNN models[^54-56], and mixture density networks (MDN)[^35,57-58] not only have strong fitting capability but also provide the distribution of predicted values. Among them, MDN models still suffer from overfitting, and GP models are unsuitable for large datasets because of their high time complexity. Therefore, we use a BNN model to predict proton separation energies.

According to the prediction target, machine-learning approaches can usually be divided into two types: fitting experimental values (or the predictions of traditional physical models) and fitting residuals (the differences between theoretical-model predictions and experimental values). Fitting experimental values requires no additional assumptions, but the accuracy is relatively low; models that fit residuals are more accurate, but rely on physical models. Among models that fit experimental nuclear masses, the ANN model, after appropriate features are selected and the Garvey-Kelson (GK) relations[^59-60] are introduced as constraints, achieves a test-set accuracy of 260 keV and effectively alleviates overfitting, making it currently the most accurate model of this type[^45]. Among models that fit residuals, Bayesian machine learning (BML) achieves an accuracy of 84 keV[^61], and the CNN model achieves 70 keV[^62]. Although these residual-fitting models are highly accurate, they still suffer from overfitting.

In practice, existing models face a trade-off between accuracy and overfitting. Overfitting means that a model fits the data it was trained on well, but its extrapolation accuracy is far worse than its training-set accuracy. For example, the uncertainties of the BML model for nuclei far from the $\beta$-stability line are significantly larger than those for nuclei near it; the best phenomenological model, WS4, achieves a prediction accuracy of about 468 keV for all nuclear masses in AME2016[^63], whereas for the nuclei newly introduced in AME2020 the accuracy is about 1295 keV, a difference of nearly a factor of 3. Therefore, it is necessary to construct a machine-learning model that balances accuracy and overfitting.

This paper aims to construct machine-learning models with high predictive accuracy and low overfitting for \(S_{\mathrm{p}}\) in the lutetium and rhenium isotopic chains. It is divided into three parts: the first part introduces the theoretical frameworks of the ANN and BNN models; the second part presents computational details and discusses the results, focusing on the problems of the ANN and BNN models, proposing the improved BNN-Beihang (BH) model that combines the strengths of both, and evaluating its predictive performance for the lutetium and rhenium isotopic chains; the third part gives the summary and outlook.

1 Theoretical Framework

1.1 Artificial Neural Network (ANN)

As shown in Fig. 1, the ANN model consists of an input layer, several hidden layers, and an output layer. Adjacent layers \(L_1\) and \(L_2\) are connected through a parameter matrix \(W\) and a bias vector \(b\): \(L_2=f(W\times L_1+b)\). Here, the nonlinear activation function \(f\) serves to enhance the fitting capability of the neural network. In this work, the Gaussian Error Linear Unit (GELU)\(^{[64]}\) is selected as the activation function.

Fig. 1 Schematic diagram of the fully connected neural-network ANN model. Blue dots represent the input layer, yellow dots represent hidden layers, and the red dot is the output layer.

The root mean square deviation (RMSD) is used to evaluate model accuracy,

\[ \sigma_{\mathrm{rms}} = \sqrt{ \frac{ \sum_i^N (O_i-L_i)^2 }{N} }, \]

and the ratio of the RMSD of the test set to that of the training set,

\[ r_{\mathrm{set}}=\frac{\sigma_{\mathrm{rms}}^{\mathrm{test}}}{\sigma_{\mathrm{rms}}^{\mathrm{train}}}, \]

is used to evaluate the degree of overfitting. Here, \(O\) is the model prediction, \(L\) is the label value, \(N\) is the total number of data points, and \(\sigma_{\mathrm{rms}}^{\mathrm{test}}\) and \(\sigma_{\mathrm{rms}}^{\mathrm{train}}\) represent the RMSD of the test set and training set, respectively. In this work, \(r_{\mathrm{set}}<1.1\) is regarded as slight overfitting, \(r_{\mathrm{set}}>1.1\) as evident overfitting, \(r_{\mathrm{set}}>1.5\) as relatively severe overfitting, and \(r_{\mathrm{set}}>2\) as severe overfitting.
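The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and because the text does not say how the boundary value \(r_{\mathrm{set}}=1.1\) is classified, the sketch assigns it to "slight" as an assumption.

```python
import numpy as np

def rmsd(pred, label):
    """Root mean square deviation between predictions O and labels L."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.sqrt(np.mean((pred - label) ** 2)))

def r_set(rmsd_test, rmsd_train):
    """Ratio of test-set RMSD to training-set RMSD."""
    return rmsd_test / rmsd_train

def overfitting_level(r):
    """Map r_set to the qualitative levels used in the text.

    Assumption: r exactly equal to 1.1 is treated as "slight".
    """
    if r > 2.0:
        return "severe"
    if r > 1.5:
        return "relatively severe"
    if r > 1.1:
        return "evident"
    return "slight"

# Training and test RMSD of 193 keV and 220 keV give r_set of about 1.14.
print(overfitting_level(r_set(220.0, 193.0)))  # evident
```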

Following the ANN7 model\(^{[45]}\), this work uses seven inputs as the model features, namely \(Z\), \(N\) (proton number and neutron number), \(Z_{\mathrm{EO}}\), \(N_{\mathrm{EO}}\) (\(Z\) and \(N\) are assigned 1 if odd and 0 if even), \(\Delta Z\), \(\Delta N\) (the differences between \(Z\), \(N\) and the nearest magic numbers), and \(E_{\mathrm{sym}}\) (the symmetry energy in the WS4 model). For the training set, the feature inputs and label-value data are subjected to Z-score standardization (subtracting the mean and dividing by the standard deviation for each group of features), in order to balance the scales of each feature input and output.
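As a rough sketch of how such a feature vector and the Z-score step might be assembled, consider the Python below. Several points are assumptions rather than details from the source: the list `MAGIC` of magic numbers, the use of the same list for protons and neutrons, the use of absolute (unsigned) distances for \(\Delta Z\) and \(\Delta N\), and the helper names. The WS4 symmetry energy is not computed here; it is passed in as `e_sym`.

```python
import numpy as np

# Assumed set of magic numbers; the text does not list the set it uses.
MAGIC = np.array([8, 20, 28, 50, 82, 126, 184])

def nearest_magic_distance(n):
    """Absolute distance from n to the closest assumed magic number."""
    return int(np.min(np.abs(MAGIC - n)))

def build_features(Z, N, e_sym):
    """Return [Z, N, Z_EO, N_EO, dZ, dN, E_sym] for one nucleus.

    e_sym must be supplied by the caller; it is not derived here.
    """
    return [Z, N, Z % 2, N % 2,
            nearest_magic_distance(Z), nearest_magic_distance(N), e_sym]

def zscore(columns):
    """Standardize each column: subtract its mean, divide by its std.

    A column with zero spread would divide by zero and needs special care.
    """
    x = np.asarray(columns, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Example for Z=71, N=78; the e_sym value 0.0 is only a placeholder.
print(build_features(71, 78, 0.0))
```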

During model training, the loss function measures the difference between the predicted values and the label values. Applying the model to an input vector \(I\) yields the corresponding output vector \(O\), and the loss function \(LOSS(O,L)\) is determined jointly by \(O\) and the label value \(L\). The parameters of the neural network are updated through the backpropagation (BP) algorithm\(^{[65]}\). Training consists of repeating this cycle, from feeding each input \(I\) into the model to updating the model parameters. During training, the Adam optimization algorithm\(^{[66]}\) is used to obtain a better convergence rate.

The magnitude of model-parameter updates is set by the learning rate. If the learning rate is too large, the model oscillates and struggles to converge; if it is too small, convergence is slow. The choice of learning rate therefore needs to balance convergence speed and model accuracy. Specifically, if the model parameters before updating are \(w\) and backpropagation of the loss function gives the gradient \(\delta w\), the updated model parameters are

\[ w'=w-\delta w\times l_{\mathrm{r}}, \]

where \(l_{\mathrm{r}}\) is the learning rate.
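The effect of the learning rate on this update rule can be seen on a toy problem. The sketch below applies the plain update \(w'=w-\delta w\times l_{\mathrm{r}}\) to the hypothetical loss \(w^2\); it illustrates the rule only and is not the Adam optimizer used in this work.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize the toy loss w**2 using w' = w - grad * lr."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w          # derivative of w**2
        w = w - grad * lr
        history.append(w)
    return history

# Each step multiplies w by (1 - 2*lr):
#   lr = 1.1   -> factor -1.2, sign flips and |w| grows (oscillation)
#   lr = 0.1   -> factor  0.8, steady convergence toward 0
#   lr = 0.001 -> factor  0.998, convergence is very slow
for lr in (1.1, 0.1, 0.001):
    print(lr, gradient_descent(lr)[-1])
```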

To ensure that the parameters of a deep network can be updated effectively, Xavier (normalized) initialization\(^{[67]}\) is used before training to set the initial parameters of the ANN model. Different random-number seeds lead to different random draws during Xavier initialization, and hence to different model parameters and different predicted values.
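To make the layer rule \(L_2=f(W\times L_1+b)\) and the seed dependence concrete, here is a minimal NumPy sketch. It assumes the normal-distribution variant of Xavier initialization (standard deviation \(\sqrt{2/(n_{\mathrm{in}}+n_{\mathrm{out}})}\)), zero initial biases, and no activation on the output layer; none of these details is stated in the text. The layer sizes \((7,32,16,1)\) are those quoted in Sec. 2.1, and all function names are hypothetical.

```python
import math
import numpy as np

def gelu(x):
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

def xavier_normal(fan_in, fan_out, rng):
    """Draw a (fan_out, fan_in) matrix with std sqrt(2 / (fan_in + fan_out))."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))

def init_network(sizes, seed):
    """Build (W, b) pairs for consecutive layers; biases start at zero (assumption)."""
    rng = np.random.default_rng(seed)
    return [(xavier_normal(m, n, rng), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Apply W @ x + b layer by layer, with GELU on all but the last layer."""
    for i, (W, b) in enumerate(params):
        x = W @ x + b
        if i < len(params) - 1:
            x = gelu(x)
    return x

# The same input gives different outputs for differently seeded networks.
x = np.ones(7)
for seed in (30, 40, 50):
    print(seed, forward(init_network((7, 32, 16, 1), seed), x))
```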

1.2 Bayesian Neural Network (BNN)

Unlike the ANN model, each parameter of the BNN model is represented by a probability distribution, and thus the output is also a probability distribution. In general, the probability distribution of the output physical quantity is represented by the central value and standard deviation (i.e., uncertainty) obtained through sampling. Here, the probability distribution of the model parameters is represented by a Gaussian distribution \(N(\mu,\sigma)\). Therefore, the number of parameters in the BNN model is twice that of the ANN model.

In the training and prediction of the BNN model, each model parameter is sampled repeatedly from its Gaussian distribution \(N(\mu,\sigma)\), yielding predicted values \(O_1\), \(O_2\), etc., from which the mean \(\overline{O}\) of these predicted values and its uncertainty \(\sigma_O\) are obtained. To ensure the reproducibility of the sampling process of the BNN model, a random seed must be set for the random sampling. Although the seed affects the predicted values of both the ANN model and the BNN model, it does so in different ways: in the ANN model, the seed determines the initial parameters of the model through Xavier initialization; in the BNN model, the seed causes differences in each output through random sampling of the model parameters.
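The sampling procedure can be illustrated with a deliberately tiny stand-in for the network. In the sketch below the "model" is a single linear map whose weights are drawn from \(N(\mu,\sigma)\); the function name, the linear form, and the number of samples are illustrative assumptions, not the architecture used in this work.

```python
import numpy as np

def bnn_predict(x, w_mu, w_sigma, n_samples=100, seed=0):
    """Sample weights w ~ N(w_mu, w_sigma) and return the mean and
    standard deviation of the resulting outputs."""
    rng = np.random.default_rng(seed)   # fixed seed makes sampling reproducible
    outputs = []
    for _ in range(n_samples):
        w = rng.normal(w_mu, w_sigma)   # one independent draw per parameter
        outputs.append(float(w @ x))    # toy linear model O = w . x
    outputs = np.asarray(outputs)
    return float(outputs.mean()), float(outputs.std())

x = np.array([3.0, 4.0])
w_mu = np.array([1.0, 2.0])
w_sigma = np.array([0.1, 0.1])
mean_o, sigma_o = bnn_predict(x, w_mu, w_sigma, seed=30)
print(mean_o, sigma_o)
```

With all `w_sigma` set to zero the spread vanishes and the output reduces to that of an ordinary deterministic model.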

The loss function of the BNN model consists of two parts, namely

\[ LOSS_{\mathrm{BNN}} = Loss(\overline{O}, L) + 0.01D_{\mathrm{KL}}. \]

Here, \(Loss(\overline{O}, L)\) denotes the loss function between the predicted mean \(\overline{O}\) and the label value \(L\); \(D_{\mathrm{KL}}(p||q)=\int p(x)\log[p(x)/q(x)]\,dx\) is the Kullback-Leibler Divergence (KL divergence), which is used to evaluate the difference between the predicted distribution \(q\) and the prior distribution \(p\). The training process is the same as that of the ANN model.
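When both distributions are one-dimensional Gaussians, the KL integral has a well-known closed form, which the following sketch evaluates together with the combined loss. Restricting to a single Gaussian pair, and the names `kl_gaussian` and `bnn_loss`, are simplifications made here for illustration; in a full BNN the term would be summed over all parameters.

```python
import math

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form D_KL(p || q) for p = N(mu_p, sigma_p), q = N(mu_q, sigma_q)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def bnn_loss(data_loss, kl_total, kl_weight=0.01):
    """Combine the data term and the KL term with the 0.01 weight from the text."""
    return data_loss + kl_weight * kl_total

print(kl_gaussian(0.0, 1.0, 0.0, 1.0))   # identical distributions give 0.0
print(kl_gaussian(0.0, 1.0, 1.0, 1.0))   # unit shift at unit width gives 0.5
print(bnn_loss(2.0, 50.0))               # 2.0 + 0.01 * 50.0 = 2.5
```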

2 Computational Details and Results Discussion

2.1 ANN7 Model for Fitting Residuals

The label value used in training the ANN7 model is the residual, defined as the difference between the theoretical value of the physical model and the experimental value:

\[ \Delta S_{\mathrm{p}} = S_{\mathrm{p}}^{\mathrm{Theory}} - S_{\mathrm{p}}^{\mathrm{Exp}}. \]

Here, \(S_{\mathrm{p}}^{\mathrm{Theory}}\) is the predicted value obtained using the WS4 model; \(S_{\mathrm{p}}^{\mathrm{Exp}}\) denotes the experimental value, taken from the National Nuclear Data Center (NNDC) website \({}^{[68]}\).

Because our results need to be compared with those of the WS4 model, the selected range of nuclear data is the same as that of the WS4 model, namely \(Z,N\ge 8\), with a total of 3287 nuclei. The scheme in which a randomly selected 70% of the total dataset is used as the training set and the remaining 30% as the test set is denoted sub70. The neural-network architecture is \((7,32,16,1)\) (as shown in Fig. 1: the input layer \(L_1\) has 7 nodes, the two hidden layers \(L_2\) and \(L_3\) have 32 and 16 nodes, respectively, and the output layer \(L_4\) has 1 node). In addition, the ANN7 model is trained using the L1 loss function, defined as

\[ \frac{\sum_i^N |O_i-L_i|}{N}. \]

The learning rate is set to 0.001.
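A sub70-style split and the L1 loss might look as follows in NumPy. The rounding of the training-set size and the use of a seed to drive the split are assumptions; the text only states that 70% of the 3287 nuclei are chosen at random.

```python
import numpy as np

def split_sub70(n_total, seed, train_fraction=0.7):
    """Randomly split indices 0..n_total-1 into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    n_train = int(round(train_fraction * n_total))  # rounding rule is an assumption
    return order[:n_train], order[n_train:]

def l1_loss(pred, label):
    """Mean absolute difference between predictions O and labels L."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.mean(np.abs(pred - label)))

train_idx, test_idx = split_sub70(3287, seed=30)
print(len(train_idx), len(test_idx))
```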

To evaluate the overfitting of the ANN7 model, we take the training result with the minimum test-set RMSD as the final result, as shown in Table 1. The RMSD of the training set lies between 180 keV and 200 keV, while that of the test set lies between 210 keV and 220 keV; \(r_{\mathrm{set}}\ge 1.14\), indicating that the model shows evident overfitting. The RMSD for the predicted lutetium isotopic chain lies between 100 keV and 150 keV. Clearly, the model with seed 30 describes the lutetium isotopic chain best. Unless otherwise specified, the results of the ANN7 model with seed 30 are discussed below. We also attempted to fit \(E_B\), but \(r_{\mathrm{set}}\) was approximately 2, indicating more severe overfitting.

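Table 1 RMSD of the ANN7 model fitted to residuals for the training set, test set, and predicted \(S_{\mathrm{p}}\) of the lutetium isotopic chain, in keV

| Training scheme | Random seed | Training set | Test set | \(r_{\mathrm{set}}\) | Lu isotopes |
| --- | --- | --- | --- | --- | --- |
| sub70 | 30 | 193 | 220 | 1.14 | 109 |
| sub70 | 40 | 185 | 215 | 1.16 | 121 |
| sub70 | 50 | 186 | 217 | 1.17 | 143 |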

To examine the influence of different seeds, Fig. 2 shows the deviations of the \(S_{\mathrm{p}}\) values predicted for the lutetium isotopic chain by the ANN7 model fitted to residuals. The predictions of models initialized with different seeds fluctuate considerably for the same nucleus, by more than 100 keV for some nuclei. This indicates that the model carries a non-negligible statistical error.

Fig. 2 Deviations of the \(S_{\mathrm{p}}\) values predicted for the lutetium isotopic chain by the ANN7 model fitted to residuals, relative to the experimental central values. The purple, blue, and green dashed lines represent the deviations of the ANN7 predictions from the experimental values for initialization with seeds 30, 40, and 50, respectively. The experimental values (black line) are taken from the NNDC website, and the shaded area indicates the experimental error range.

2.2 BNN Model for Fitting Experimental Values

Because the error information in the experimental data may be over-learned by the ANN model, leading to overfitting, we introduce the BNN model and fit experimental values directly (taking the \(S_{\mathrm{p}}\) data from the NNDC website as label values). The advantage of the BNN model lies in assigning a probability distribution to each predicted value, thereby effectively alleviating overfitting.

To avoid the influence of outliers, the loss function used here is the square root of the mean squared error (MSE), defined as

\[ \sqrt{\frac{\sum_i^N (O_i-L_i)^2}{N}} . \]

The BNN model adopts two neural-network architectures, \((7,128,1)\) and \((7,32,32,1)\), denoted as \<128> and \<32-32>, respectively. The learning rate is 0.01. The dataset selection method is the same as that of the ANN7 model fitted to residuals.

Table 2 RMSD of the training set, test set, and indium isotopic chain \(S_p\) calculated by the BNN model fitted to experimental values, in keV

| Network architecture | Random seed | Training set | Test set | \(r_{\mathrm{set}}\) | Indium isotopes |
| --- | --- | --- | --- | --- | --- |
| \<128> | 30 | 604 | 691 | 1.14 | 324 |
| \<128> | 40 | 608 | 690 | 1.13 | 319 |
| \<128> | 50 | 602 | 687 | 1.14 | 257 |
| \<32-32> | 30 | 540 | 590 | 1.09 | 292 |
| \<32-32> | 40 | 536 | 592 | 1.10 | 271 |
| \<32-32> | 50 | 522 | 586 | 1.12 | 353 |

Next, the model is evaluated in terms of accuracy and overfitting. Table 2 lists the RMSD of the training set, test set, and indium isotopic chain calculated with the BNN model fitted to experimental values. Although the accuracy of both BNN architectures is inferior to that of the ANN7 model, the \(r_{\mathrm{set}}\) of \<32-32> is reduced to 1.09–1.12, below the 1.14–1.17 of the ANN7 model fitted to residuals, which effectively alleviates the overfitting problem. Regarding overfitting, the two-hidden-layer model also performs better than the one-hidden-layer model. However, the accuracy in describing the indium isotopic chain \(S_p\) is about 300 keV, which has not yet reached the expected accuracy.

2.3 Improved BNN-BH model

Although the BNN model effectively alleviates overfitting, its accuracy in fitting experimental values is not high. Below, based on the BNN model, we attempt to improve the accuracy by fitting residuals, seeking a solution to the overfitting problem.

First, in view of the fact that uncertainty in label values is one source of overfitting, we incorporate it into the BNN model to alleviate overfitting. Second, by considering multiple physical quantities, the model can learn more information. We adopt a scheme that simultaneously fits \(E_B\) and \(S_p\). Furthermore, considering that the uncertainty of the BNN model is only part of the total error, additional error sources must be included. In fact, in addition to the uncertainty of the BNN model, we also consider statistical error. Thus, we obtain the improved BNN-BH model.

For the first point above, we include the uncertainty of the label values as weights in the loss function. Specifically, the difference between the predicted value and the label value in the loss function is multiplied by the reciprocal of the label-value uncertainty, \(1/\sigma_{\mathrm{Exp}}\), which increases the weight of high-precision label values in training and thereby alleviates overfitting. To prevent a few label values with very small uncertainties from dominating the loss, so that the model fails to learn from all samples, label-value uncertainties smaller than 20 keV are set to 20 keV.
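The weighting scheme can be sketched as below. The choice of an absolute-value (L1-type) base loss is an assumption made for illustration, since the text does not specify which loss the weights are applied to; the 20 keV floor follows the text, and the names are hypothetical.

```python
import numpy as np

SIGMA_FLOOR_KEV = 20.0  # uncertainties below this are raised to it, per the text

def weighted_l1_loss(pred, label, sigma_exp):
    """Mean of |O - L| / sigma_Exp, with sigma_Exp floored at 20 keV.

    Assumption: an L1-type base loss; all quantities are in keV.
    """
    sigma = np.maximum(np.asarray(sigma_exp, dtype=float), SIGMA_FLOOR_KEV)
    diff = np.asarray(pred, dtype=float) - np.asarray(label, dtype=float)
    return float(np.mean(np.abs(diff) / sigma))

# A 5 keV uncertainty is treated as 20 keV, so it cannot dominate the loss.
print(weighted_l1_loss([10.0, 0.0], [0.0, 0.0], [5.0, 100.0]))  # 0.25
```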

For the second point above, in addition to fitting \(E_B\) and \(S_p\) simultaneously, we extend the source of residuals to five mass models. Specifically, we train the BNN-BH model on the residuals of \(E_B\) and \(S_p\) for five mass models, namely WS4, BW2[^69], FRDM2012[^17], KTUY05[^70], and RMF[^71], with the experimental values taken from AME2020 and the NNDC website, respectively. As a result, the number of nuclei in the total dataset covered by all five models and the experimental values is reduced to 3056. The residual predictions and uncertainties of the BNN-BH model for the five mass models are then combined into a single weighted average, defined as follows:

\[ O(Z,N)= \frac{ \sum_i^{n_{\mathrm{model}}} \frac{O_i(Z,N)}{\sigma_i^2(Z,N)} }{ \sum_i^{n_{\mathrm{model}}} \frac{1}{\sigma_i^2(Z,N)} }, \tag{1} \]

\[ \sigma(Z,N)= \sqrt{ \frac{n_{\mathrm{model}}}{ \sum_i^{n_{\mathrm{model}}} \frac{1}{\sigma_i^2(Z,N)} } }, \tag{2} \]

where \(O_i\) and \(\sigma_i\) are, respectively, the predicted value and uncertainty of the BNN-BH model for the label value \(L_i\) of the \(i\)-th mass model, and \(n_{\mathrm{model}}=5\).
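Equations (1) and (2) translate directly into a short function. The sketch below follows the two formulas term by term; the function name and the sample numbers are illustrative only.

```python
import numpy as np

def combine_predictions(values, sigmas):
    """Inverse-variance weighted mean, Eq. (1), and combined uncertainty, Eq. (2)."""
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    inv_var = 1.0 / sigmas ** 2
    mean = np.sum(values * inv_var) / np.sum(inv_var)   # Eq. (1)
    sigma = np.sqrt(len(values) / np.sum(inv_var))      # Eq. (2)
    return float(mean), float(sigma)

# With equal uncertainties the result is the plain mean and the common sigma.
print(combine_predictions([1.0, 3.0], [1.0, 1.0]))  # (2.0, 1.0)
```

Note that, because of the factor \(n_{\mathrm{model}}\) in the numerator of Eq. (2), the combined uncertainty does not shrink as more models with the same uncertainty are added.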

For the third point above, the error sources are not unique. In addition to the uncertainty \(\sigma_{\mathrm{BNN}}\) provided by the BNN model, we must also consider the statistical error \(\sigma_{\mathrm{stat}}\). Therefore, the total uncertainty \(\sigma_{\mathrm{total}}\) consists of these two parts, namely

\[ \sigma_{\mathrm{total}} = \chi_\nu \sqrt{\sigma_{\mathrm{BNN}}^2+\sigma_{\mathrm{stat}}^2}, \tag{3} \]

where \(\chi_\nu\) is the amplification factor of the uncertainty, expressed as

\[ \chi_\nu = \sqrt{ \sum_i^N \frac{1}{N} \frac{(O_i-L_i)^2}{\sigma_{\mathrm{mix},i}^2} }, \tag{4} \]

where the summation is only over the isotopic chain that needs to be predicted. In addition, \(\sigma_{\mathrm{mix}}\) is the mixed error, defined as \(\sigma_{\mathrm{mix}}^2=\sigma_{\mathrm{stat}}^2+\sigma_{\mathrm{BNN}}^2\).
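A minimal numerical reading of Eqs. (3) and (4) is given below. The arrays are assumed to hold only the nuclei of the isotopic chain being predicted, as the text requires; the function names and test numbers are hypothetical.

```python
import numpy as np

def chi_nu(pred, label, sigma_stat, sigma_bnn):
    """Amplification factor of Eq. (4), with sigma_mix^2 = sigma_stat^2 + sigma_bnn^2."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    sigma_mix_sq = (np.asarray(sigma_stat, dtype=float) ** 2
                    + np.asarray(sigma_bnn, dtype=float) ** 2)
    return float(np.sqrt(np.mean((pred - label) ** 2 / sigma_mix_sq)))

def sigma_total(chi, sigma_bnn, sigma_stat):
    """Total uncertainty of Eq. (3) for one nucleus."""
    return float(chi * np.sqrt(sigma_bnn ** 2 + sigma_stat ** 2))

# If every deviation equals its mixed error, chi_nu is exactly 1.
chi = chi_nu([5.0, 10.0], [0.0, 0.0], [3.0, 6.0], [4.0, 8.0])
print(chi, sigma_total(chi, 4.0, 3.0))  # 1.0 5.0
```

A value of \(\chi_\nu\) above 1 means the deviations on the chain are larger than the mixed errors suggest, so the quoted uncertainty is scaled up accordingly.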

To obtain the uncertainty $\sigma_{\mathrm{BNN}}$ of the model predictions, we construct 20 BNN models with the neural-network architecture $(7,32,16,10)$, using 20 different seeds to initialize the sampling processes of these models and train the models. Then, for the same nucleus, these models yield 20 different mean values $S_{\mathrm{p}}^{\mathrm{seed}}$ and uncertainties $\sigma_{\mathrm{seed}}$; using Eqs. (1) and (2), the 20 predicted values and uncertainties are combined to obtain the weighted average $S_{\mathrm{p}}^{\mathrm{BNN}}$ and uncertainty $\sigma_{\mathrm{BNN}}$. Furthermore, we consider the statistical error $\sigma_{\mathrm{stat}}$. Here, only the differences in the model predictions caused by initialization with different seeds are included, defined as follows:

\[ \sigma_{\mathrm{stat}}^{2} = \frac{ \sum\limits_{\mathrm{seed}} \frac{\left(S_{\mathrm{p}}^{\mathrm{seed}}-S_{\mathrm{p}}^{\mathrm{BNN}}\right)^{2}}{\sigma_{\mathrm{seed}}^{2}} }{ \sum\limits_{\mathrm{seed}} 1/\sigma_{\mathrm{seed}}^{2} }. \tag{5} \]
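Equation (5) is a weighted variance of the per-seed means around their combined value. The sketch below computes $S_{\mathrm{p}}^{\mathrm{BNN}}$ with the inverse-variance weights of Eq. (1) and then evaluates Eq. (5); the function name and the two-seed example are illustrative, whereas the text uses 20 seeds.

```python
import numpy as np

def sigma_stat(seed_means, seed_sigmas):
    """Statistical error of Eq. (5) from per-seed means and uncertainties."""
    means = np.asarray(seed_means, dtype=float)
    weights = 1.0 / np.asarray(seed_sigmas, dtype=float) ** 2
    s_bnn = np.sum(weights * means) / np.sum(weights)        # weighted mean, Eq. (1)
    variance = np.sum(weights * (means - s_bnn) ** 2) / np.sum(weights)
    return float(np.sqrt(variance))

# Two seeds predicting 1 and 3 with equal uncertainty scatter by 1 around 2.
print(sigma_stat([1.0, 3.0], [1.0, 1.0]))  # 1.0
```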

The BNN-BH model improved in this way has the following advantages. On the one hand, the sources of error are considered more fully: both the uncertainty $\sigma_{\mathrm{BNN}}$ of the model prediction and the statistical error $\sigma_{\mathrm{stat}}$ are taken into account. On the other hand, by restricting the calculation of $\chi_\nu$ to local nuclei, the uncertainty in $S_{\mathrm{p}}$ for nuclei in that region can be predicted more accurately.

Table 3 Comparison between the $S_{\mathrm{p}}$ values predicted by the BNN-BH model and the experimental values for some nuclei near the proton drip line in the lutetium isotopic chain, in keV

| Nucleus | Predicted value | Experimental value |
| --- | --- | --- |
| $^{148}\mathrm{Lu}$ | $-2017(98)$ | |
| $^{149}\mathrm{Lu}$ | $-1900(73)$ | $-1920(20)$ |
| $^{150}\mathrm{Lu}$ | $-1397(69)$ | $-1270(23)$ |
| $^{151}\mathrm{Lu}$ | $-1239(73)$ | $-1241(18)$ |

In view of the importance of the recently observed $^{149}\mathrm{Lu}$ in nuclear structure and nuclear astrophysics, and because the neighboring nuclide $^{148}\mathrm{Lu}$ will be the next proton-emission candidate to be measured, we use the newly developed BNN-BH model to predict the $S_{\mathrm{p}}$ values of $^{148-151}\mathrm{Lu}$, as shown in Table 3. The $S_{\mathrm{p}}$ values predicted by the BNN-BH model for $^{149}\mathrm{Lu}$ and $^{151}\mathrm{Lu}$ agree with experiment to within 20 keV. Although the deviation of the predicted value for $^{150}\mathrm{Lu}$ is larger (127 keV), it is significantly improved compared with the deviation of the BML model (as shown in Fig. 3). At the same time, the uncertainties of the predicted $S_{\mathrm{p}}$ values for the four nuclei from $^{148}\mathrm{Lu}$ to $^{151}\mathrm{Lu}$ are similar, with no large increase as the extrapolation distance grows. This indicates that the BNN-BH model has relatively stable extrapolation capability for nuclei near the proton drip line in the lutetium isotopic chain.

At present, the best model that can provide uncertainties is the BML model. This provides a reference for evaluating the effectiveness of the BNN-BH model. Figure 3 gives the deviations of the $S_{\mathrm{p}}$ values predicted by the BNN-BH model and the BML model for the lutetium isotopic chain. The results show that, among the 38 nuclei in the lutetium isotopic chain, the predicted values of the BNN-BH model for 10 nuclei have deviations within the range from $1\sigma$ to $2\sigma$. It can be seen that the uncertainty estimates of the BNN-BH model predictions (the orange shaded region in Fig. 3) are relatively reliable. For the experimentally unobserved $^{148}\mathrm{Lu}$, the uncertainty of the BNN-BH model prediction for its $S_{\mathrm{p}}$ is 98.44 keV, below 100 keV. In addition, for the predicted $S_{\mathrm{p}}$ values in the lutetium isotopic chain, the BNN-BH model has $\sigma_{\mathrm{rms}}=97.65$ keV, which is close to the uncertainty of the predicted value for the as-yet unmeasured $^{148}\mathrm{Lu}$, indicating that the BNN-BH model has stable extrapolation capability. Thus, the BNN-BH model has high predictive accuracy for the lutetium isotopic chain and can stably extrapolate the unknown nuclide $^{148}\mathrm{Lu}$; compared with the BML model (the blue shaded region in Fig. 3), it has more stable extrapolation capability.

Fig. 3 Deviations of the $S_{\mathrm{p}}$ values in the lutetium isotopic chain described by the improved BNN-BH model, the BML model, and the ANN7 model. The blue and orange shaded regions represent the uncertainty ranges of the BML model and BNN-BH model, respectively; the blue, orange, and purple dashed lines represent the predicted values of the BML model, BNN-BH model, and ANN7 model, respectively. The experimental values (black line) are taken from the NNDC website, and the gray shaded region represents the experimental-error range. Since there is no experimental value for $^{148}\mathrm{Lu}$, its deviation is set to zero.

As mentioned above, the BNN model fitted to experimental values is less accurate than the ANN7 model fitted to residuals. Therefore, taking the lutetium isotopic chain as an example, we compare the improved BNN-BH model with the ANN7 model; the results are shown in Table 4. On the training set and the test set, the RMSD values of the ANN7 model and the BNN-BH model differ little; however, the BNN-BH model has a smaller $r_{\mathrm{set}}$ (1.08 versus 1.14), indicating that it alleviates overfitting more effectively. At the same time, for the lutetium isotopic chain the RMSD of the ANN7 model is greater than 100 keV, whereas that of the BNN-BH model is less than 100 keV. It can be seen that the BNN-BH model has advantages over the ANN7 model in both alleviating overfitting and improving accuracy.

Table 4 RMSD of the residuals after fitting $S_{\mathrm{p}}$ with the BNN-BH model and the ANN7 model, respectively, for the training set, test set, and rhenium isotopic chain; unit: keV

| Model | Training set | Test set | $r_{\mathrm{set}}$ | Rhenium isotopic chain |
| --- | --- | --- | --- | --- |
| ANN7 | 193 | 220 | 1.14 | 109 |
| BNN-BH | 195 | 211 | 1.08 | 98 |
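The quantities in Table 4 can be illustrated with a short sketch. It assumes that $r_{\mathrm{set}}$ is the ratio of the test-set RMSD to the training-set RMSD; this definition is not restated in this section, but it is consistent with the tabulated values. The helper names `rmsd` and `overfit_ratio` are hypothetical.

```python
import math
from typing import Sequence

def rmsd(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square deviation between predictions and reference values."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))

def overfit_ratio(rmsd_train: float, rmsd_test: float) -> float:
    """Assumed definition of r_set: test-set RMSD divided by training-set RMSD."""
    return rmsd_test / rmsd_train

# (training RMSD, test RMSD) in keV, taken from Table 4.
table4 = {"ANN7": (193.0, 220.0), "BNN-BH": (195.0, 211.0)}
ratios = {name: round(overfit_ratio(tr, te), 2) for name, (tr, te) in table4.items()}
print(ratios)
```

A ratio closer to 1 means the model performs almost as well on unseen nuclei as on the training nuclei, which is why the smaller value for BNN-BH is read as reduced overfitting.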

To demonstrate the feasibility of the BNN-BH model, we further carry out a systematic analysis of the rhenium isotopic chain and its neighboring nuclei. The deviations of $S_{\mathrm{p}}$ for the rhenium isotopic chain ($Z=75$) are shown in Fig. 4. Near the proton drip line, for example for the proton emitters $^{160,161}\mathrm{Re}$, the uncertainty of the BML model predictions grows to nearly 400 keV, whereas the uncertainties of the BNN-BH model predictions are 86.9 keV and 81.2 keV, respectively, both below 100 keV. Moreover, the uncertainty bands of the BNN-BH predictions cover the experimental values of these two nuclei. The BNN-BH model can thus also predict proton-emitting nuclei in the rhenium isotopic chain with comparatively high accuracy.

Figure 4 Same as Fig. 3, but describing the deviations of $S_{\mathrm{p}}$ for the rhenium isotopic chain ($Z=75$). The green lines represent proton emitters.

3 Summary

The ANN7 model, which fits residuals, suffers from overfitting, while the BNN model, which fits experimental values, suffers from limited accuracy. To improve the accuracy and alleviate overfitting, we take into account both the uncertainty of the label values and the statistical error of the model, and obtain the improved BNN-BH model. The results show that these improvements are effective. For the still unmeasured $^{148}\mathrm{Lu}$ and the measured proton emitters $^{160,161}\mathrm{Re}$, the uncertainties of the $S_{\mathrm{p}}$ predictions of this model (98.44 keV, 86.9 keV, and 81.2 keV, respectively) all fall below 100 keV for the first time and are significantly lower than those of the BML model. This will provide effective theoretical support for subsequent estimates of the lifetimes of proton-emitting nuclei in the rhenium isotopic chain and other nearby nuclei.

4 Acknowledgements

We thank Professor Geng Lizheng for discussions and suggestions on the issue of statistical errors.

We thank Professor Niu Zhongming for discussions and suggestions on the error analysis of the BNN model.

References

[1] TANIHATA I, HAMAGAKI H, HASHIMOTO O, et al. Phys Rev Lett, 1985, 55(24): 2676–2679. https://www.scopus.com/inward/record.uri?eid=2-s2.0-4243333684&doi=10.1103%2FPhysRevLett.55.2676&partnerID=40&md5=46319023cce8e7c88e80793263ce5154. DOI: 10.1103/PhysRevLett.55.2676.

[2] KOBAYASHI N, NAKAMURA T, KONDO Y, et al. Phys Rev Lett, 2014, 112: 242501. https://link.aps.org/doi/10.1103/PhysRevLett.112.242501.

[3] NAKAMURA T, et al. Phys Rev Lett, 2009, 103: 262501. DOI: 10.1103/PhysRevLett.103.262501.

[4] ZHANG S S, SMITH M S, KANG Z S, et al. Phys Lett B, 2014, 730: 30. DOI: 10.1016/j.physletb.2014.01.023.

[5] OZAWA A, KOBAYASHI T, SUZUKI T, et al. Phys Rev Lett, 2000, 84: 5493. https://link.aps.org/doi/10.1103/PhysRevLett.84.5493.

[6] DELION D, LIOTTA R, WYSS R. Physics Reports, 2006, 424(3): 113. https://www.sciencedirect.com/science/article/pii/S0370157305004655. DOI: https://doi.org/10.1016/j.physrep.2005.11.001.

[7] HE M, ZHANG S S, KUSAKABE M, et al. The Astrophysical Journal, 2020, 899(2): 133. https://dx.doi.org/10.3847/1538-4357/aba7b4.

[8] ZHANG S, XU S, HE M, et al. Eur Phys J A, 2021, 57(4): 114[2025-06-12]. https://link.springer.com/10.1140/epja/s10050-021-00434-7.

[9] XIAO Y, XU S Z, ZHENG R Y, et al. Phys Lett B, 2023, 845: 138160. DOI: 10.1016/j.physletb.2023.138160.

[10] QI C, LIOTTA R, WYSS R. Progress in Particle and Nuclear Physics, 2019, 105: 214. https://www.sciencedirect.com/science/article/pii/S0146641018301017. DOI: https://doi.org/10.1016/j.ppnp.2018.11.003.

[11] AURANEN K, BRISCOE A D, FERREIRA L S, et al. Phys Rev Lett, 2022, 128: 112501. https://link.aps.org/doi/10.1103/PhysRevLett.128.112501.

[12] LI L, MENG J, RING P, et al. Phys Rev C, 2012, 85: 024312. https://link.aps.org/doi/10.1103/PhysRevC.85.024312.

[13] XIAO Y, XU S Z, ZHENG R Y, et al. Phys Lett B, 2023, 845: 138160. https://www.sciencedirect.com/science/article/pii/S037026932300494X. DOI: https://doi.org/10.1016/j.physletb.2023.138160.

[14] ZHANG K, ZHANG S, MENG J. Phys Rev C, 2023, 108(4): L041301. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85174524020&doi=10.1103%2fPhysRevC.108.L041301&partnerID=40&md5=ab6886dd7589c5642556e6f969901b1e. DOI: 10.1103/PhysRevC.108.L041301.

[15] LU Q, ZHANG K Y, ZHANG S S. arXiv: 2406.09803, 2024.

[16] DUFLO J, ZUKER A. Phys Rev C, 1995, 52: R23. https://link.aps.org/doi/10.1103/PhysRevC.52.R23.

[17] MÖLLER P, MYERS W D, SAGAWA H, et al. Phys Rev Lett, 2012, 108(5): 052501. https://api.semanticscholar.org/CorpusID:32876864.

[18] GORIELY S, CHAMEL N, PEARSON J M. Phys Rev C, 2016, 93: 034337. https://link.aps.org/doi/10.1103/PhysRevC.93.034337.

[19] ZHAO P W, LI Z P, YAO J M, et al. Phys Rev C, 2010, 82: 054319. https://link.aps.org/doi/10.1103/PhysRevC.82.054319.

[20] WANG N, LIU M, WU X, et al. Phys Lett B, 2014, 734: 215. https://www.sciencedirect.com/science/article/pii/S037026931400358X. DOI: https://doi.org/10.1016/j.physletb.2014.05.049.

[21] WANG M, AUDI G, WAPSTRA A H, et al. Chin Phys C, 2012, 36(12): 1603. DOI: 10.1088/1674-1137/36/12/003.

[22] WANG M, HUANG W, KONDEV F, et al. Chin Phys C, 2021, 45(3): 030003. https://dx.doi.org/10.1088/1674-1137/abddaf.

[23] GAZULA S, CLARK J, BOHR H. Nuclear Physics A, 1992, 540(1): 1. https://www.sciencedirect.com/science/article/pii/037594749290191L. DOI: https://doi.org/10.1016/0375-9474(92)90191-L.

[24] ATHANASSOPOULOS S, MAVROMMATIS E, GERNOTH K A, et al. arXiv: nucl-th/0509075, 2005. https://arxiv.org/abs/nucl-th/0509075.

[25] CLARK J W, LI H. International Journal of Modern Physics B, 2006, 20(30n31): 5015. https://doi.org/10.1142/S0217979206036053.

[26] DONG X, AN R, LU J X, et al. Phys Rev C, 2021. https://api.semanticscholar.org/CorpusID:237571528.

[27] AN R, DONG X X, CAO L G, et al. arXiv: 2112.03829, 2023. https://arxiv.org/abs/2112.03829.

[28] DONG X X, AN R, LU J X, et al. Phys Lett B, 2023, 838: 137726. DOI: 10.1016/j.physletb.2023.137726.

[29] CAO Y Y, GUO J Y, ZHOU B. Nuclear Science and Techniques, 2023, 34(10): 152[2023-11-15]. https://link.springer.com/10.1007/s41365-023-01308-x.

[30] CHENG R Y, GODBEY K, NIU Y B, et al. Phys Rev C, 2025, 111: 064315. https://link.aps.org/doi/10.1103/4ccs-66c6.

[31] FERREIRA M, BEJGER M. Phys Rev D, 2025, 111: 023035. https://link.aps.org/doi/10.1103/PhysRevD.111.023035.

[32] FUJIMOTO Y, FUKUSHIMA K, KAMATA S, et al. Phys Rev D, 2024, 110: 034035. https://link.aps.org/doi/10.1103/PhysRevD.110.034035.

[33] LASSERI R D, REGNIER D, FROSINI M, et al. Phys Rev C, 2024, 109: 064612. https://link.aps.org/doi/10.1103/PhysRevC.109.064612.

[34] LUNDBERG S, LEE S I. arXiv: 1705.07874, 2017. https://arxiv.org/abs/1705.07874.

[35] MUMPOWER M R, SPROUSE T M, LOVELL A E, et al. Phys Rev C, 2022, 106: L021301. https://link.aps.org/doi/10.1103/PhysRevC.106.L021301.

[36] YÜKSEL E, SOYDANER D, BAHTIYAR H. Phys Rev C, 2024, 109: 064322. https://link.aps.org/doi/10.1103/PhysRevC.109.064322.

[37] CARNINI M, PASTORE A. Journal of Physics G: Nuclear and Particle Physics, 2020, 47(8): 082001. https://dx.doi.org/10.1088/1361-6471/ab92e3.

[38] GAO Z P, WANG Y J, LÜ H L, et al. Nuclear Science and Techniques, 2021, 32(10): 109[2022-08-03]. https://link.springer.com/10.1007/s41365-021-00956-1.

[39] WU X, GUO L, ZHAO P. Phys Lett B, 2021, 819: 136387. https://www.sciencedirect.com/science/article/pii/S0370269321003270. DOI: https://doi.org/10.1016/j.physletb.2021.136387.

[40] WU X H, PAN C, ZHANG K Y, et al. Phys Rev C, 2024, 109: 024310. https://link.aps.org/doi/10.1103/PhysRevC.109.024310.

[41] MUNOZ J M, UDRESCU S M, RUIZ R F G. Discovering Nuclear Models from Symbolic Machine Learning[M/OL]. arXiv, 2024[2024-04-19]. http://arxiv.org/abs/2404.11477.

[42] YÜKSEL E, SOYDANER D, BAHTIYAR H. International Journal of Modern Physics E, 2021, 30(03): 2150017. https://doi.org/10.1142/S0218301321500178.

[43] BAHTIYAR H, SOYDANER D, YÜKSEL E. Applied Soft Computing, 2022, 128: 109470. https://www.sciencedirect.com/science/article/pii/S1568494622005762. DOI: https://doi.org/10.1016/j.asoc.2022.109470.

[44] LI C Q, TONG C N, DU H J, et al. Phys Rev C, 2022, 105(6): 064306. DOI: 10.1103/PhysRevC.105.064306.

[45] ZENG L X, YIN Y Y, DONG X X, et al. Phys Rev C, 2024, 109(3): 034318. DOI: 10.1103/PhysRevC.109.034318.

[46] LIU H, LEI J, REN Z. From Complexity to Clarity: Kolmogorov-Arnold Networks in Nuclear Binding Energy Prediction[M/OL]. arXiv, 2024[2024-07-31]. http://arxiv.org/abs/2407.20737.

[47] SHI M, NIU Z M, LIANG H Z. Chin Phys C, 2019, 43(7): 074104. DOI: 10.1088/1674-1137/43/7/074104.

[48] NIU Z M, SUN B H, LIANG H Z, et al. Phys Rev C, 2016, 94(5): 054315. DOI: 10.1103/PhysRevC.94.054315.

[49] ZHENG J S, WANG N Y, WANG Z Y, et al. Phys Rev C, 2014, 90(1): 014303. DOI: 10.1103/PhysRevC.90.014303.

[50] NIU Z M, ZHU Z L, NIU Y F, et al. Phys Rev C, 2013, 88: 024325. DOI: 10.1103/PhysRevC.88.024325.

[51] WANG N, LIU M. Phys Rev C, 2011, 84(5): 051303[2025-06-16]. https://link.aps.org/doi/10.1103/PhysRevC.84.051303.

[52] SHELLEY M, PASTORE A. Universe, 2021, 7(5). https://www.mdpi.com/2218-1997/7/5/131.

[53] QIU M, CAI B J, CHEN L W, et al. Phys Lett B, 2024, 849: 138435. https://www.sciencedirect.com/science/article/pii/S0370269323007682. DOI: https://doi.org/10.1016/j.physletb.2023.138435.

[54] UTAMA R, PIEKAREWICZ J, PROSPER H B. Phys Rev C, 2016, 93: 014311. https://link.aps.org/doi/10.1103/PhysRevC.93.014311.

[55] NIU Z, LIANG H. Phys Lett B, 2018, 778: 48. https://www.sciencedirect.com/science/article/pii/S0370269318300091. DOI: https://doi.org/10.1016/j.physletb.2018.01.002.

[56] NIU Z M, FANG J Y, NIU Y F. Phys Rev C, 2019, 100(5): 054311. DOI: 10.1103/PhysRevC.100.054311.

[57] LOVELL A E, MOHAN A T, SPROUSE T M, et al. Phys Rev C, 2022, 106: 014305. https://link.aps.org/doi/10.1103/PhysRevC.106.014305.

[58] MUMPOWER M, LI M, SPROUSE T M, et al. Front in Phys, 2023, 11: 1198572. DOI: 10.3389/fphy.2023.1198572.

[59] GARVEY G T, KELSON I. Phys Rev Lett, 1966, 16: 197. https://link.aps.org/doi/10.1103/PhysRevLett.16.197.

[60] GARVEY G T, GERACE W J, JAFFE R L, et al. Reviews of Modern Physics, 1969, 41: 1. https://api.semanticscholar.org/CorpusID:120300336.

[61] NIU Z M, LIANG H Z. Phys Rev C, 2022, 106: L021303. https://link.aps.org/doi/10.1103/PhysRevC.106.L021303.

[62] LU Y, SHANG T, DU P, et al. arXiv: 2404.14948, 2024. https://arxiv.org/abs/2404.14948.

[63] WANG M, AUDI G, KONDEV F, et al. Chin Phys C, 2017, 41(3): 481.

[64] HENDRYCKS D, GIMPEL K. arXiv: 1606.08415, 2023. https://arxiv.org/abs/1606.08415.

[65] RUMELHART D E, HINTON G E, WILLIAMS R J. Nature, 1986, 323(6088): 533[2023-10-16]. https://www.nature.com/articles/323533a0. DOI: 10.1038/323533a0.

[66] KINGMA D P, BA J. arXiv: 1412.6980, 2017. https://arxiv.org/abs/1412.6980.

[67] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C/OL]//TEH Y W, TITTERINGTON M. Proceedings of Machine Learning Research: volume 9 Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Chia Laguna Resort, Sardinia, Italy: PMLR, 2010: 249. https://proceedings.mlr.press/v9/glorot10a.html.

[68] NNDC | National Nuclear Data Center[EB/OL]. [2025-06-12]. https://www.nndc.bnl.gov/.

[69] KIRSON M W. Nuclear Physics A, 2008, 798(1): 29. https://www.sciencedirect.com/science/article/pii/S0375947407007531. DOI: https://doi.org/10.1016/j.nuclphysa.2007.10.011.

[70] KOURA H, TACHIBANA T, UNO M, et al. Progress of Theoretical Physics, 2005, 113(2): 305[2024-07-29]. https://academic.oup.com/ptp/article-lookup/doi/10.1143/PTP.113.305.

[71] GENG L, TOKI H, MENG J. Progress of Theoretical Physics, 2005, 113(4): 785. https://doi.org/10.1143/PTP.113.785.

Exactly Predicting the Proton Separation Energy of Lu and Re Chain via Optimized Neural Network Models

ZHANG Shi-Sheng$^{1}$, ZENG Lin-Xing$^{1}$

(1. School of Physics, Beihang University, Beijing 102206, China)

Abstract: Nuclear mass plays an important role in nuclear structure and nuclear astrophysics. For exotic nuclei far away from the stability valley, a difference of hundreds of keV in the separation energy given by the nuclear mass may lead to at least three orders of magnitude difference in the half-lives of emitters. At present, $^{149}\mathrm{Lu}$ is the proton emitter with the shortest lifetime discovered in experiments. The separation energies of neighboring undetected nuclei will influence the nucleosynthesis path. To this end, we apply an Artificial Neural Network (ANN) model and a Bayesian Neural Network (BNN) model to constrain the proton separation energy, and practice indicates that the two approaches have overfitting and precision problems, respectively. In order to improve the precision and mitigate the overfitting problem, based on the BNN model, and taking into account both the uncertainties of the labels and the statistical error of the models, we develop the BNN-Beihang (BH) model. The uncertainties of $^{148}\mathrm{Lu}$ and $^{160,161}\mathrm{Re}$ predicted by the model decreased to less than 100 keV for the first time. These results will effectively improve the precision of theoretical predictions for the lifetimes of Lu- and Re-chain proton emitters.

Key words: proton separation energy; neural network model; Lu chain; Re chain

Received date: 15 Jul. 2025; Revised date: 15 Jul. 2025
Foundation item: National Natural Science Foundation of China (12175010)
Corresponding author: ZHANG Shi-Sheng, E-mail: zss76@buaa.edu.cn
