Analysis of Clock Difference Prediction Algorithms for Domestic Optically Pumped Cesium Atomic Clocks (Postprint)
Du Hongqiang, Gong Jianjun, Wu Dan, Qu Lili, Wu Wenjun, Dong Shaowu, Zhang Shougang
Submitted 2025-10-11 | ChinaXiv: chinaxiv-202510.00051

Abstract

Cesium atomic clocks are the key core equipment in modern atomic timekeeping. In recent years, the domestically produced optically pumped cesium atomic clock TA1000 has been widely applied in timekeeping operations. The clock difference prediction algorithm for atomic clocks is one of the factors affecting timekeeping performance. Different types of atomic clocks exhibit different noise characteristics, and noise affects the stability and accuracy of clock difference prediction. To investigate clock difference prediction algorithms suitable for TA1000, three classical clock difference prediction algorithms applied to cesium clocks are analyzed, including the first-order Linear Regression (LR) model, the Autoregressive Integrated Moving Average (ARIMA) model, and the Kalman model. By utilizing clock difference sequences with different sampling durations for modeling, the subsequent clock difference data for four different durations are predicted. Based on this, the clock difference prediction effects of the three models are analyzed and compared, and the advantages and disadvantages of each clock difference prediction model when applied to TA1000 are summarized. Experiments show that among the three models, the Autoregressive Integrated Moving Average model is more suitable for short-term clock difference prediction of TA1000.

Vol. 66 No. 5
Sept., 2025

Acta Astronomica Sinica
doi: 10.15940/j.cnki.0001-5245.2025.05.011

Analysis of Clock Difference Prediction Algorithm for Domestic Optically-pumped Cesium Atomic Clock

DU Hong-qiang¹, GONG Jian-jun¹, WU Dan¹, QU Li-li¹, WU Wen-jun¹, DONG Shao-wu¹,²†, ZHANG Shou-gang¹,²

(1 Key Laboratory of Time Reference and Applications, Chinese Academy of Sciences, Xi'an 710600)
(2 School of Astronomy and Space Science, University of Chinese Academy of Sciences, Beijing 100049)

Keywords: time, methods: statistical, methods: data analysis

1 Introduction

Time and frequency signals are widely used in major national defense projects such as navigation and positioning, weapon systems, precision strikes, and coordinated operations, as well as in critical civilian applications including 5G communications, finance, power grid synchronization, and seismic monitoring. Time and frequency standard signals are typically generated and maintained by timekeeping systems, and the quality of these signals largely depends on the composition of the timekeeping system. Small cesium clocks are one of the core devices in modern timekeeping, utilizing quantum transitions of cesium atoms to generate standard time and frequency signals. Small cesium clocks exhibit excellent frequency accuracy and stability performance, with compact size and light weight, making them the most numerous and widely used atomic clocks for timekeeping internationally.

Many countries have established independent standard time and frequency service systems, with each timekeeping laboratory needing to use multiple atomic clocks to form a clock ensemble for establishing and maintaining a time reference. The National Time Service Center (NTSC) of the Chinese Academy of Sciences has established a timekeeping system to independently maintain the local Coordinated Universal Time (UTC). Traditionally, timekeeping laboratories worldwide have used the American magnetic state-selection small cesium clock 5071A. In recent years, due to tense international situations and to meet the strategic demand for autonomous generation of national standard time, the number of domestic optically-pumped cesium clocks TA1000 has been continuously increasing, gradually replacing the 5071A.

An important task in timekeeping is clock difference prediction, which estimates the future behavior of an atomic clock from its historical clock difference data. Clock difference prediction algorithms form the basis for comprehensive atomic time scale algorithms and master clock frequency steering algorithms. The International Bureau of Weights and Measures has already adopted predictability-based weighting for atomic clocks from timekeeping laboratories worldwide. Different types of atomic clocks have different noise characteristics, so as the TA1000 is gradually introduced into timekeeping operations, suitable short-term clock difference prediction algorithms need to be studied; however, such research is currently lacking. To ensure the continuity, stability, and reliability of UTC(NTSC), we selected for analysis three clock difference prediction algorithms that have been applied in practical operations: the first-order linear regression (LR) model, the autoregressive integrated moving average (ARIMA) model, and the Kalman model.

2 Optically-pumped Cesium Clock TA1000

Domestic demand for small cesium clocks is substantial, but it has been met primarily by imports. In recent years, domestic research institutions have successively conducted research on cesium atomic clocks to avoid technological bottlenecks. Significant progress has been made in key technology development, with several prototypes produced. However, constrained by core components such as electron multipliers, there remains a considerable gap before engineering and mass production can be achieved. With advances in laser technology, laser-pumped cesium beam tubes completely avoid the extremely difficult technologies and processes of magnetic state-selection and electron multipliers, and they are gradually replacing traditional magnetic state-selection small cesium clocks. The adoption of optically-pumped schemes has accelerated the domestic production of small cesium clocks, and the TA1000 has now achieved mass production. Compared with foreign magnetic state-selection small cesium clocks, optically-pumped cesium clocks use lasers for atomic state preparation and transition detection, achieving higher atomic utilization efficiency. Theoretical calculations indicate that the frequency stability of optically-pumped small cesium clocks is approximately one order of magnitude better than that of traditional magnetic state-selection small cesium clocks.

[FIGURE:1] shows a schematic diagram of the TA1000 appearance. The TA1000 consists of three parts: the circuit system, the optical system, and the physical system. The laser system unit is responsible for the preparation and detection of atomic beam quantum states. The cesium beam tube unit serves as the physical system of the cesium clock, providing a vacuum environment for laser-microwave-atom interactions and generating Ramsey transition signals of the cesium atomic beam. The circuit system processes the atomic clock transition signals, generates locking signals, and ultimately outputs high-stability time and frequency reference signals.

3 Clock Difference Prediction Algorithms

A good atomic clock should be predictable. For a highly predictable atomic clock, the deviation between predicted and actual measured clock difference values is small. During clock difference data preprocessing, missing data can be filled in or abnormal data can be promptly detected based on clock difference prediction algorithms. Maintaining a continuous, stable, and reliable time reference requires timely detection of anomalies through clock difference data when atomic clocks malfunction, followed by reasonable adjustment of atomic clock weights according to the anomaly conditions to minimize the impact on the time reference.

[FIGURE:1] Schematic diagram of the optically-pumped cesium atomic clock TA1000

Clock difference data is obtained by comparing sub-clocks within the timekeeping ensemble with a reference clock through a time interval counter, representing equally-spaced phase measurements. The greatest advantage of cesium atomic clocks is their near-zero drift. Analysis of TA1000 clock difference data trends reveals that clock difference data generally exhibits linear trends. Additionally, relevant studies have shown that fluctuations in atomic clock differences result from the linear superposition of various noise types, all of which provide feasibility for clock difference prediction. Through mathematical modeling and analysis of clock difference data, the accuracy of atomic clock difference prediction can be improved.

3.1 First-order Linear Regression Model

Linear regression fits a model that is linear in its regression coefficients to given data and then uses the fitted model to predict new data. The first-order linear regression model has a single independent variable $x$ and a single dependent variable $y$, with no higher-order terms such as squares. The expression is

$$y = kx + b \tag{1}$$

The goal of the first-order linear regression model is to determine the regression coefficient $k$ and constant term $b$ such that the model predictions are as close as possible to the observed values. The model allows for errors between predicted and observed values; observed values may lie above or below the regression line. These residuals are assumed to follow a normal distribution with zero expectation, so their average approaches zero as the amount of data increases. The parameters $k$ and $b$ are obtained by the least squares method, whose optimization objective is to minimize the sum of squared errors (equivalent to minimizing the mean square error, which differs only by the constant factor $1/m$), expressed as

$$J(k;b) = \sum_{i=1}^{m}(y_i - \hat{y}_i)^2 \tag{2}$$

where $J(k;b)$ is the total sum of squared deviations between all observed values and their corresponding regression estimates, $y_i$ represents observed values, $\hat{y}_i = kx_i + b$ represents the corresponding regression estimates, and $m$ is the sample size. Taking partial derivatives of $J(k;b)$ with respect to $k$ and $b$ and setting them equal to zero yields the optimal solution.
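As an illustration of equations (1) and (2), the following minimal sketch computes the closed-form least squares solution obtained by setting the partial derivatives of $J(k;b)$ to zero. The function names `fit_line` and `predict_line` and the sample values are hypothetical and are not taken from the experiments.

```python
def fit_line(x, y):
    """Least squares fit of y = k*x + b; returns (k, b)."""
    m = len(x)
    if m < 2 or m != len(y):
        raise ValueError("need at least two paired samples")
    x_mean = sum(x) / m
    y_mean = sum(y) / m
    # Setting dJ/dk = 0 and dJ/db = 0 gives the normal equations,
    # whose solution is k = Sxy / Sxx and b = y_mean - k * x_mean.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    k = sxy / sxx
    b = y_mean - k * x_mean
    return k, b


def predict_line(k, b, x_new):
    """Evaluate the fitted line at new abscissae."""
    return [k * xi + b for xi in x_new]


# Hypothetical hourly clock differences (illustrative values only).
hours = [0, 1, 2, 3, 4, 5]
clock_diff = [10.0, 12.1, 13.9, 16.0, 18.1, 19.9]
k, b = fit_line(hours, clock_diff)
forecast = predict_line(k, b, [6, 7])  # extrapolate two hours ahead
```

Here $x$ is taken to be the sample time and $y$ the clock difference, which is an assumption about how the model is applied to a clock difference sequence.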

3.2 Autoregressive Integrated Moving Average Model

The autoregressive integrated moving average (ARIMA) model is one of the most common statistical models used for time series prediction, consisting of three components: the autoregressive model, the moving average model, and differencing. The ARIMA model has three parameters and can be expressed as ARIMA($p$, $d$, $q$), where $p$ is the number of autoregressive terms, $q$ is the number of moving average terms, and $d$ is the number of differencing operations required to make the series stationary.

The autoregressive moving average (ARMA) model combines autoregressive and moving average components, containing $p$ autoregressive terms and $q$ moving average terms, expressed as:

$$y_t = \phi + \sum_{i=1}^{p}\phi_i y_{t-i} + \epsilon_t + \sum_{i=1}^{q}\theta_i \epsilon_{t-i} \tag{3}$$

where $y_t$ represents the value at time $t$, $\phi$ is the constant term, $\phi_i$ are autoregressive coefficients, $\epsilon_t$ is the white noise error term at time $t$, and $\theta_i$ are moving average coefficients.

The ARMA model indicates that a random time series can be composed of its own historical values and random disturbance terms. If the series is stationary, meaning that its statistical properties such as mean and variance do not change over time, future values can be predicted from the historical values of the series.

The ARIMA model combines the ARMA model with differencing: the data are first transformed into a stationary series through differencing, and a model is then established that regresses the differenced series only on its own historical values and on the present and lagged values of the random error terms. The model's advantages include simplicity and ease of training, requiring only endogenous variables without exogenous variables. Its disadvantages are that the time series must be stationary or become stationary after differencing, and that the model can capture only linear relationships, not nonlinear ones.
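The differencing, modeling, and re-integration steps can be sketched for the simplest non-trivial case, ARIMA(1, 1, 0). The order is chosen here purely for illustration, since the orders used in the experiments are not stated. The autoregressive coefficient is estimated by conditional least squares, which is a simplification of the maximum likelihood estimation typically used in statistical packages, and all function names are hypothetical.

```python
def difference(series):
    """First-order differencing (d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(w):
    """Conditional least squares fit of w_t = c + phi * w_{t-1} + eps_t."""
    x, y = w[:-1], w[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three differenced samples")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        phi = 0.0  # constant differences: pure drift, no AR component
    else:
        phi = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    c = y_mean - phi * x_mean
    return c, phi


def forecast_arima110(series, steps):
    """Forecast `steps` values ahead with an ARIMA(1, 1, 0) sketch."""
    w = difference(series)
    c, phi = fit_ar1(w)
    level, diff = series[-1], w[-1]
    out = []
    for _ in range(steps):
        diff = c + phi * diff   # predict the next difference (noise mean is 0)
        level += diff           # undo the differencing by cumulative summation
        out.append(level)
    return out


# Hypothetical clock difference sequence with a linear trend.
predicted = forecast_arima110([0.0, 2.0, 4.0, 6.0, 8.0, 10.0], 3)
```

For a series with a purely linear trend the differences are constant, so the sketch reduces to extrapolating the drift, which is consistent with the near-linear behavior of cesium clock differences described above.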

3.3 Kalman Model

Kalman Filtering (KF) is an algorithm that uses linear system state equations to optimally estimate system states from a set of measurements containing system noise and interference noise. Kalman filtering calculates the joint distribution at different times based on measurement values at various moments to estimate unknown factors, making it more accurate than estimation methods based on single measurements alone.

The Kalman filtering algorithm calculates the current state estimate using the previous state estimate and current state observation values. The algorithm consists of two parts—prediction and update—which iterate continuously to obtain optimal estimates. In the prediction phase, the previous state estimate is used to predict the current state, including uncertainty estimates. In the update phase, the current state observation values and prediction values obtained from the prediction phase are weighted and averaged for optimization, where observation values contain certain errors such as random noise. Variables with higher certainty receive greater weight, yielding a more accurate new estimate.

Let $\hat{X}_{t|j}$ denote the state estimate at time $t$ given information up to time $j$. The Kalman filter state is represented by the following variables: $\hat{X}_{t|t-1}$ represents the prediction of the state at time $t$ given past states up to time $t-1$, and $P_{t|t-1}$ represents the prediction of the error covariance matrix at time $t$ given the error covariance matrix up to time $t-1$.

$P_{t|t}$ represents the posterior estimate error covariance matrix at time $t$, indicating the precision of the estimate. In the prediction phase, the current state is predicted based on the previous state and control inputs. This prediction is an estimate because it does not consider current observations. The error covariance matrix of the prediction is calculated from the previous error covariance matrix and the system noise covariance matrix. The prediction phase expressions are:

$$
\begin{cases}
\hat{X}_{t|t-1} = F_t \hat{X}_{t-1|t-1} + B_t u_t \\
P_{t|t-1} = F_t P_{t-1|t-1} F_t^T + Q_t
\end{cases} \tag{4}
$$

where $F_t$ is the state transition matrix, $F_t^T$ is its transpose, $B_t$ is the control matrix, $u_t$ is the control vector, and $Q_t$ is the process noise covariance matrix.

In the update phase, the current state estimate is calculated based on current observations and prediction values. This estimate is more accurate because current observations participate in the calculation. The error covariance matrix of the state estimate is obtained from the error covariance matrix calculated in the prediction step, the observation noise covariance matrix, and the Kalman gain. The update phase expressions are:

$$
\begin{cases}
K_t = P_{t|t-1} H_t^T (H_t P_{t|t-1} H_t^T + R_t)^{-1} \\
\hat{X}_{t|t} = \hat{X}_{t|t-1} + K_t (Z_t - H_t \hat{X}_{t|t-1}) \\
P_{t|t} = (I - K_t H_t) P_{t|t-1}
\end{cases} \tag{5}
$$

where $Z_t$ is the observation value, $R_t$ is the observation noise covariance matrix, $H_t$ is the measurement matrix, $H_t^T$ is its transpose, $K_t$ is the Kalman gain, and $I$ is the identity matrix.
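The prediction and update phases of equations (4) and (5) can be sketched for a two-state clock model. The state vector (phase and frequency offset), the transition matrix $F = \begin{pmatrix}1 & \tau\\ 0 & 1\end{pmatrix}$, the measurement matrix $H = (1\ \ 0)$, and the absence of a control input are all assumptions made for this illustration; the state model actually used in the experiments is not specified. Function names are hypothetical.

```python
def kalman_step(x, P, z, tau, Q, R):
    """One predict/update cycle for an assumed two-state clock model.

    x = [phase, frequency], F = [[1, tau], [0, 1]], H = [1, 0], B u = 0.
    P and Q are 2x2 nested lists; R is the scalar measurement noise variance.
    """
    # Prediction, equation (4): X_pred = F X and P_pred = F P F^T + Q.
    x_pred = [x[0] + tau * x[1], x[1]]
    a = P[0][0] + tau * P[1][0]
    b = P[0][1] + tau * P[1][1]
    P_pred = [
        [a + tau * b + Q[0][0], b + Q[0][1]],
        [P[1][0] + tau * P[1][1] + Q[1][0], P[1][1] + Q[1][1]],
    ]
    # Update, equation (5). With H = [1, 0] the term H P H^T + R is a scalar.
    S = P_pred[0][0] + R
    K = [P_pred[0][0] / S, P_pred[1][0] / S]
    innovation = z - x_pred[0]
    x_new = [x_pred[0] + K[0] * innovation, x_pred[1] + K[1] * innovation]
    P_new = [
        [(1 - K[0]) * P_pred[0][0], (1 - K[0]) * P_pred[0][1]],
        [P_pred[1][0] - K[1] * P_pred[0][0], P_pred[1][1] - K[1] * P_pred[0][1]],
    ]
    return x_new, P_new


def run_filter(measurements, tau, Q, R, x0, P0):
    """Filter a whole measurement sequence and return the final (x, P)."""
    x, P = list(x0), [row[:] for row in P0]
    for z in measurements:
        x, P = kalman_step(x, P, z, tau, Q, R)
    return x, P


def kalman_forecast(x, tau, steps):
    """Propagate the final state forward without updates (prediction only)."""
    phase, freq = x
    out = []
    for _ in range(steps):
        phase += tau * freq
        out.append(phase)
    return out


# Hypothetical noise-free linear clock difference sequence, 1 h spacing.
zs = [5.0 + 2.0 * t for t in range(50)]
zero_Q = [[0.0, 0.0], [0.0, 0.0]]
x_est, P_est = run_filter(zs, 1.0, zero_Q, 1.0, [0.0, 0.0],
                          [[1e6, 0.0], [0.0, 1e6]])
future = kalman_forecast(x_est, 1.0, 3)
```

The large initial covariance expresses that nothing is known about the state before the first measurement. The filter therefore needs a number of observations before the frequency estimate settles, which may help explain why the Kalman model benefits from longer modeling data.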

4 Experiments and Results Analysis

We conducted clock difference prediction and testing using phase comparison data between four optically-pumped cesium clocks from the NTSC reference laboratory and China's time reference UTC(NTSC). The prediction models included the first-order linear regression model, the ARIMA model, and the Kalman filter model. The original clock difference data had a sampling interval of 1 h, and the predicted clock difference data also had a time interval of 1 h. To investigate suitable short-term clock difference prediction models for optically-pumped cesium clocks, each prediction model utilized clock difference data of 1 d, 10 d, and 30 d durations for modeling, then predicted clock difference data for the next 12 h, 1 d, 2 d, and 5 d (i.e., test data volumes of 12, 24, 48, and 120 points per clock), comparing predictions with actual measurements to evaluate model prediction stability and accuracy.
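The modeling and test windows described above can be sketched as follows. With a 1 h sampling interval, modeling durations of 1 d, 10 d, and 30 d correspond to 24, 240, and 720 points, and the prediction horizons correspond to 12, 24, 48, and 120 points. The function name `split_series` and the choice of a common forecast start index are assumptions made for illustration.

```python
SAMPLES_PER_DAY = 24              # 1 h sampling interval
MODEL_DAYS = (1, 10, 30)          # modeling durations in days
HORIZON_POINTS = (12, 24, 48, 120)  # 12 h, 1 d, 2 d, 5 d at 1 h spacing


def split_series(series, start, model_days, horizon_points):
    """Return (modeling data, test data) around forecast start index `start`.

    The modeling window ends just before `start`; the test window begins
    at `start` and contains `horizon_points` samples.
    """
    n_model = model_days * SAMPLES_PER_DAY
    if start - n_model < 0 or start + horizon_points > len(series):
        raise ValueError("series too short for this window")
    return series[start - n_model:start], series[start:start + horizon_points]


# Hypothetical hourly series long enough for the largest configuration.
series = [float(i) for i in range(30 * SAMPLES_PER_DAY + 120)]
windows = {
    (d, h): split_series(series, 30 * SAMPLES_PER_DAY, d, h)
    for d in MODEL_DAYS
    for h in HORIZON_POINTS
}
```

This yields the 3 × 4 = 12 combinations of modeling duration and prediction horizon evaluated for each clock.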

We employed Root Mean Squared Error (RMSE) and Range ($R$) as performance metrics to evaluate prediction accuracy and stability, respectively. The specific definitions of RMSE and Range are:

$$
\begin{cases}
\epsilon_t = test_t - pred_t \\
RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\epsilon_t^2} \\
R = |\epsilon_{max} - \epsilon_{min}|
\end{cases} \tag{6}
$$

where $test_t$ is the measured clock difference at time $t$, $pred_t$ is the corresponding predicted value, $\epsilon_t$ is the residual between them, $n$ is the number of predicted points, and $\epsilon_{max}$ and $\epsilon_{min}$ represent the maximum and minimum values in the residual sequence $\epsilon_t$.
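Equation (6) translates directly into code. The sketch below is illustrative only; the function name `prediction_metrics` and the sample values are hypothetical.

```python
import math


def prediction_metrics(test, pred):
    """Return (RMSE, Range) of the residuals test_t - pred_t, per equation (6)."""
    if len(test) != len(pred) or not test:
        raise ValueError("test and pred must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(test, pred)]
    rmse = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    value_range = abs(max(residuals) - min(residuals))
    return rmse, value_range


# Hypothetical measured and predicted clock differences.
rmse, value_range = prediction_metrics([1.0, 2.0, 3.0], [1.0, 1.0, 5.0])
```

RMSE summarizes the typical size of the prediction error, while Range captures the spread between the largest and smallest residuals, so a constant offset raises RMSE but leaves Range unchanged.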

[TABLE:1] shows the frequency stability of the four TA1000 clocks relative to the reference signal UTC(NTSC) at 1 h, 1 d, and 5 d intervals. [TABLE:2] and [TABLE:3] display the average statistical values of RMSE and Range for the four TA1000 clocks when using 1 d of data for modeling to predict different durations. For all three models, RMSE and Range increase with prediction duration. The first-order linear regression model generally outperforms the ARIMA model, while the Kalman model's errors are significantly larger than those of the other two models. This indicates that the first-order linear regression and ARIMA models can learn clock difference trends with only small amounts of modeling data, whereas the Kalman model produces inaccurate clock difference state estimates due to insufficient modeling data.

[TABLE:4] and [TABLE:5] show the RMSE and Range for the four TA1000 clocks when using 10 d of data for modeling. Prediction error continues to increase with prediction duration. Compared with [TABLE:2] and [TABLE:3], as the modeling data volume increases, the Kalman model's RMSE and Range generally decrease because it can more accurately estimate clock difference state parameters. The first-order linear regression model's mean square error increases due to clock noise interference, while the ARIMA model, which first differences the observation data to eliminate non-stationary interference, demonstrates more robust performance.

[TABLE:6] and [TABLE:7] present the average statistical values of RMSE and Range for the four TA1000 clocks when using 30 d of data for modeling. Because the first-order linear regression model is affected by cesium clock stability factors during long-duration modeling, causing the prediction starting position not to be at zero, direct RMSE calculation yields large deviations. Therefore, when using 30 d of data, the first-order linear regression model was combined with differencing for modeling. With further increases in modeling data volume, the first-order linear regression model's RMSE and Range decrease significantly compared with using 10 d of data, indicating that long-duration modeling requires combination with differencing. The Kalman model's RMSE and Range further decrease, related to its good noise suppression capability, while also demonstrating that the Kalman model requires a certain amount of modeling data. The ARIMA model's RMSE and Range show small fluctuations with changes in modeling data volume, indicating its robust noise suppression capability for clock difference data.

[FIGURE:2] displays the model fitting residuals for the four atomic clocks using 30 d of data for modeling. All three models inevitably have model errors. The ARIMA and Kalman model residuals fluctuate near zero, while the first-order linear regression model shows large residual deviations during long-duration modeling due to noise and other factors.

[FIGURE:3] shows the prediction residuals calculated using equation (6) for the four TA1000 atomic clocks when predicting a 1 d duration using 30 d of modeling data. The first-order linear regression model performs differently across the four clocks: it performs poorly for Clock 1 but is suitable for the remaining three clocks. The ARIMA and Kalman models show similar performance. As prediction duration increases, the ARIMA model's overall fluctuations are more stable than the Kalman model's.

[TABLE:8] and [TABLE:9] compare the average RMSE and Range values for the three models under three modeling data volumes when predicting four durations. Based on Range values, the first-order linear regression and ARIMA models show similar prediction stability, both significantly outperforming the Kalman model. Based on RMSE values, the ARIMA model clearly demonstrates superior prediction accuracy. The ARIMA model's ability to eliminate non-stationary components means that changes in modeling data volume have little impact on its prediction accuracy, without the phenomenon of rapidly deteriorating accuracy with increasing prediction duration. Additionally, using smaller modeling data volumes can maintain prediction accuracy.

5 Conclusion

To enable the optically-pumped cesium atomic clock TA1000 to perform better in timekeeping ensembles and ensure the continuity, stability, and reliability of China's time reference UTC(NTSC) during the localization of atomic clocks, we analyzed three classical clock difference prediction algorithms applied to cesium clocks: the first-order linear regression model, the ARIMA model, and the Kalman model. By comparing the prediction stability and accuracy of the three models, we identified their respective advantages and disadvantages when applied to the TA1000:

(1) Under the same modeling data volume conditions, prediction errors of the same model increase with prediction duration.
(2) For short-term clock difference prediction, the first-order linear regression model produces the best results with only small amounts of clock difference modeling data, though long-duration modeling requires further investigation. The Kalman model requires sufficient clock difference data. The ARIMA model demonstrates more robust performance across different modeling data volumes.
(3) Based on prediction stability and accuracy, the ARIMA model is more suitable for short-term clock difference prediction of the optically-pumped cesium clock TA1000.

