Calibration Reference Data Recommendation Method Postprint
Bowei Zhao, Wei Shoulin, Juanjuan Ren, Wang Feng, Ling Chenxiaoji, Liu Chao
Submitted 2025-07-02 | ChinaXiv: chinaxiv-202507.00052

Abstract

The efficient operation of automated astronomical data processing pipelines relies on precise calibration reference data recommendation mechanisms. Consequently, comprehensive calibration reference data systems have emerged. This paper systematically reviews the calibration reference data recommendation methods adopted by mainstream international telescopes and provides an in-depth analysis of their respective advantages and disadvantages. It focuses on introducing a novel recommendation strategy based on textual rules and its accompanying calibration reference data system, as well as their flexibility and efficiency in automated data processing. Furthermore, it elaborates on the critical role and potential application value of this system in the scientific data processing of the China Space Station Telescope (CSST), and discusses the prospects for its future development. This research provides new ideas and methods for the recommendation of calibration reference data in astronomical data processing, holding significant theoretical and practical importance.

Full Text

Preamble

Progress in Astronomy Vol. 43, No. 2 June 2025 doi: 10.3969/j.issn.1000-8349.2025.02.11

Calibration Reference Data Recommendation Methods

ZHAO Bowei¹,², WEI Shoulin³, REN Juanjuan¹, WANG Feng⁴, LING Chenxiaoji¹, LIU Chao¹,²

(1. Key Laboratory of Space Astronomy and Technology, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; 4. Center for Astrophysics and Great Bay Center of National Astronomical Data Center, Guangzhou University, Guangzhou 510006, China)

Abstract

The efficient operation of automated astronomical data processing pipelines relies on precise calibration reference data recommendation mechanisms, necessitating the development of comprehensive calibration reference data systems. This paper systematically reviews the calibration reference data recommendation methods employed by major international telescopes and analyzes their respective advantages and disadvantages. We particularly introduce a novel text-based rule recommendation strategy and its accompanying calibration reference data system, highlighting its flexibility and efficiency in automated data processing. Furthermore, we elaborate on the critical role and potential applications of this system in the scientific data processing pipeline for the China Space Station Survey Telescope (CSST), and discuss its future development prospects. This research provides new insights and methodologies for calibration reference data recommendation in astronomical data processing, holding significant theoretical and practical importance.

Keywords: astronomical data processing; calibration reference data; space telescopes

1. Introduction

Astronomy is the study of celestial objects and phenomena in the universe, and it depends fundamentally on observations. Observations yield rich datasets that must be processed before they can be used for measurement and analysis to reveal the fundamental physical properties of celestial objects, including brightness, temperature, spectral type, radial velocity, chemical composition, and structure. These data not only provide essential support for exploring the universe and understanding the formation and evolution of various celestial bodies, but also establish the core position of astrophysics in modern astronomy. Astronomical observation techniques have undergone revolutionary development, evolving from naked-eye observations and hand-drawn sky charts to modern telescopes, forming a comprehensive observational system that includes astrometry, photometry, spectroscopy, full-wavelength astronomy, and digital sky surveys. Concurrently, astronomical telescopes have progressed from small to large apertures, from single to multi-mirror systems, and from ground-based to space-based platforms. Modern astronomical observations now rely primarily on large-aperture telescopes equipped with advanced terminal instruments, providing unprecedented observational capabilities.

The application of astronomical telescopes extends beyond merely collecting observational data; their core objective is to advance scientific research and generate new knowledge. Therefore, transforming raw observational data into scientific knowledge is a crucial process, with data processing and analysis software serving as the essential bridge. Raw astronomical data contains not only genuine signals from celestial objects but also atmospheric and instrumental effects, mixed with various noise sources. Consequently, raw data must undergo complex processing procedures, including instrument effect corrections, cosmic ray removal, and various calibrations, before they become scientific data suitable for astronomical analysis. However, as modern astronomical data processing grows increasingly complex, a single terminal instrument often involves multiple data types, including scientific observations and calibration data. For example, the X-shooter instrument on the European Southern Observatory's Very Large Telescope involves over a hundred data types in its raw data processing pipeline. Moreover, different astronomical telescopes are designed around specific scientific objectives from their inception. Achieving these objectives and providing high-quality scientific data to researchers requires precise calibration of both terminal equipment and scientific observations.

The complete calibration workflow includes calibration observations, calibration data processing (generating calibration reference files), and selecting the optimal calibration reference files for scientific data processing. Since obtaining ideal calibration observations typically consumes substantial observing time, and time allocation strategies directly affect the rules for selecting calibration reference data in subsequent scientific data processing, a trade-off inevitably emerges between conducting more calibration observations versus dedicating more time to scientific observations. For ground-based observations, ideally, calibration data from the same night should be used to calibrate scientific observations from that night. However, if the terminal instrument exhibits high stability, calibration data from preceding or following nights can also be used to calibrate current scientific data, thereby improving observational efficiency.

Space-based telescopes operate outside Earth's atmosphere, enabling full-wavelength, high-precision, long-duration observations that yield high-quality data. However, the complex space environment can cause radiation damage to detectors, leading to changes in instrument performance that affect data processing and scientific product quality. Therefore, on-orbit calibration is critical for space telescopes. Renowned space telescopes such as the Hubble Space Telescope (HST), the James Webb Space Telescope (JWST), and the Nancy Grace Roman Space Telescope all incorporate periodic on-orbit calibration observations. Additionally, astronomical telescopes are typically equipped with multiple terminal instruments, each with unique observation modes requiring specific calibration observations. Different calibration observation types usually have different monitoring cycles because of their distinct objectives, further increasing the complexity of calibration reference data.

With the explosive growth in astronomical data volume, automated scientific data processing pipelines have become indispensable. To process scientific data accurately, such pipelines must automatically identify and select the optimal calibration reference files while taking both scientific observation strategies and calibration schemes into account. Efficiently and accurately selecting the best calibration reference data during scientific data processing has therefore become a complex yet critical problem. In response, numerous international telescope projects have developed specialized systems for recommending optimal calibration reference data, which play vital roles, particularly in complex space telescope missions. Among these, JWST's Calibration Reference Data System (CRDS) stands out for its distinctive and advanced design. Building on HST's long-term experience and designed to meet JWST's data processing requirements, CRDS is a comprehensive calibration reference data system centered on constructing calibration reference file recommendation rules and automating the recommendation of optimal calibration reference files, while also managing calibration reference file storage. It serves as the core link between calibration reference files and scientific data processing pipelines: from the many on-orbit calibration reference files of multiple types and versions, it automatically recommends the best ones according to established rules. Furthermore, this text-based rule system is highly flexible and scalable, applicable not only to calibration reference file selection for astronomical telescope observations but also to other scenarios requiring automated rule-based recommendations.

Chapter 2 of this paper systematically reviews and summarizes the primary calibration reference data recommendation methods adopted by major international telescopes. Chapter 3 provides a detailed introduction to the currently more flexible and advanced text-based rule calibration reference data system. Chapter 4 focuses on the application of this system to the China Space Station Survey Telescope (CSST) scientific data processing. Chapter 5 concludes with a summary and future outlook.

2. Historical Review

Based on scientific data processing requirements, telescopes periodically interleave various necessary calibration observations during actual astronomical observations for different terminal instruments and observation modes. These periodically acquired calibration observations generate different versions of various calibration reference files. Consequently, during subsequent large-scale automated processing of astronomical observation data, automatically selecting the optimal files from numerous versions of calibration reference files for scientific observation data calibration significantly impacts processing results and accuracy, requiring focused research.

Telescope projects have adopted diverse solutions for calibration reference data recommendation. A relatively simple method involves retrieving calibration reference files closest in observation time to the scientific data from databases using Structured Query Language (SQL) for use by scientific data processing pipelines. However, this approach suffers from numerous drawbacks, such as difficulty reproducing historical processing results, excessive database dependency, limited portability, and inability to customize reference files according to specific needs. Therefore, international astronomical telescopes have adopted different calibration reference data recommendation methods based on their actual requirements.
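The simple time-proximity method described above can be sketched as follows. This is only a minimal illustration, assuming a hypothetical `ref_files` table with invented column names and file names; it is not the schema of any actual telescope archive. It also shows why such a method struggles to reproduce historical results: the answer changes as soon as a new file is ingested.

```python
import sqlite3

# Hypothetical schema (an assumption): one row per calibration reference file,
# with the observation time stored as a Modified Julian Date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref_files (filename TEXT, ref_type TEXT, obs_time REAL)")
conn.executemany(
    "INSERT INTO ref_files VALUES (?, ?, ?)",
    [
        ("dark_001.fits", "dark", 60000.0),
        ("dark_002.fits", "dark", 60010.0),
        ("flat_001.fits", "flat", 60004.0),
    ],
)


def nearest_reference(conn, ref_type, sci_time):
    """Return the reference file of ref_type closest in time to sci_time."""
    row = conn.execute(
        "SELECT filename FROM ref_files WHERE ref_type = ? "
        "ORDER BY ABS(obs_time - ?) LIMIT 1",
        (ref_type, sci_time),
    ).fetchone()
    return row[0] if row else None


before = nearest_reference(conn, "dark", 60003.0)

# Ingesting a newer calibration silently changes the answer for the same query.
conn.execute("INSERT INTO ref_files VALUES ('dark_003.fits', 'dark', 60002.0)")
after = nearest_reference(conn, "dark", 60003.0)

print(before, after)
```

Because the query result depends on the current database contents rather than on a versioned rule set, an earlier processing run cannot be repeated without restoring the database to its earlier state.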

The Chandra X-ray Observatory (CXO), launched by NASA in 1999 as the third satellite in the Great Observatories program, employs a method based on the "Chandra Calibration Database (CALDB)." This method follows the CALDB system standard model developed by NASA's High Energy Astrophysics Science Archive Research Center (HEASARC), which is used by most high-energy astronomical telescopes. Despite its name suggesting a database, the CALDB method does not essentially involve a real database. Its core principle stores calibration reference file selection rules in index files that list all available calibration reference files and their usage conditions. While not simple to construct, this method offers advantages such as reproducibility of past recommendations and software portability.

The Spitzer Space Telescope (SST), NASA's final Great Observatories mission launched in 2003, possessed sensitive infrared detection capabilities that enabled observations of some of the most distant and faint celestial objects, revealing many secrets of the universe. For scientific data processing, SST used the "caltrans" (calibration transfer) software, developed in C by the Spitzer Science Center (SSC), which interacted with Informix databases. SST employed what is likely the purest database-based method, in which calibration reference file selection rules are essentially expressed as SQL queries on the database. While this system supported simple update and revocation operations, it struggled to reproduce previous recommendations and was not easily portable to other environments.

The Gemini Telescope, consisting of two 8-meter-class ground-based optical telescopes operated by the Association of Universities for Research in Astronomy (AURA) and commissioned in 2000, employs a hybrid approach. While most information is stored in databases, calibration reference file selection is performed through Python-generated SQL queries to the database. Simple SQL queries often fail to meet requirements for complex selection rules, whereas this hybrid method can construct arbitrarily complex rules. Although it provides a web service for external users to select calibration reference files, the system does not support portability, multi-version rules, or easy user customization.

The Hubble Space Telescope (HST), a large space optical telescope jointly developed by the United States and Europe and named after American astronomer Edwin Hubble, was launched in 1990 aboard the Space Shuttle and operates in low Earth orbit. To better support scientific data processing and analysis, particularly the operation of scientific data calibration pipelines, HST began planning and designing its Calibration Database System (CDBS) before launch. This system served HST successfully for over 20 years, with its working methods refined and upgraded during this period while maintaining the same overall design framework.

CDBS adopts a database-centric design, with the database containing information for all types of calibration reference files. For each calibration reference file of every HST terminal instrument, CDBS adds a file description containing terminal instrument observation mode parameters that define the exposure type. These descriptions are stored in separate database tables for each HST terminal instrument, with each table containing selectable parameters for all observation modes of that instrument's calibration reference file set. Calibration reference file descriptions also include a "UseAfter" timestamp marking the earliest time the reference file should be used. Additionally, all optimal calibration reference file selection rules are stored in XML (eXtensible Markup Language) files, which specify search conditions used when recommending reference files for scientific data.

CDBS's primary function is to query the database for various types of calibration reference files required for each new scientific dataset (or datasets requiring reprocessing) and return the filename of the best reference file. The "bestref" program then updates this name to the value of specific keywords in the scientific dataset's header file. The scientific data processing pipeline retrieves the required best reference file from these header keywords for scientific data calibration.
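The UseAfter lookup and the subsequent header update can be illustrated with a short sketch. The table contents, the `BIASFILE` and `DATE-OBS` keyword choices, and the function names are assumptions made for illustration; this is not the actual CDBS or "bestref" implementation.

```python
from bisect import bisect_right

# Hypothetical UseAfter table for one instrument mode: (useafter, filename),
# sorted by time. ISO 8601 strings sort chronologically, so plain string
# comparison is sufficient here.
BIAS_FILES = [
    ("2009-05-01T00:00:00", "bias_v1.fits"),
    ("2012-01-15T00:00:00", "bias_v2.fits"),
    ("2016-07-01T00:00:00", "bias_v3.fits"),
]


def select_by_useafter(table, obs_time):
    """Return the file with the latest UseAfter that is not later than obs_time."""
    times = [useafter for useafter, _ in table]
    idx = bisect_right(times, obs_time)
    if idx == 0:
        return None  # the observation predates every reference file
    return table[idx - 1][1]


def update_header(header, keyword, table):
    """Write the selected reference filename into a header keyword (bestref-style)."""
    best = select_by_useafter(table, header["DATE-OBS"])
    updated = dict(header)
    updated[keyword] = best if best is not None else "N/A"
    return updated


science_header = {"INSTRUME": "ACS", "DATE-OBS": "2013-03-02T10:30:00"}
new_header = update_header(science_header, "BIASFILE", BIAS_FILES)
print(new_header["BIASFILE"])
```

The calibration pipeline then only needs to read the keyword from the header; it does not have to know how the file was chosen.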

Although CDBS served HST successfully for over 20 years, numerous limitations emerged during long-term use. These include difficulties in testing new reference files, inability to revoke erroneous submissions, lack of remote usage support, absence of personalized customization for reference file selection rules, and near-impossibility of reproducing historical results. Since selection rules are stored in the database, the truly effective rules are difficult to understand. The database-centric design philosophy also limits the extension of selection rule types, making it challenging to implement new rule types. These limitations ultimately stem from CDBS's database dependency, specifically storing selection rules in the database.

Given CDBS's inherent limitations, the Space Telescope Science Institute (STScI) developed a new system for JWST: the Calibration Reference Data System (CRDS). This system stores reference file selection rules in text files, effectively solving the database dependency problem and making the system portable. After successful trials on JWST, HST gradually transitioned from CDBS to CRDS. NASA's next-generation flagship space mission, the Roman Space Telescope, will also use this calibration reference data system. Unlike the general-purpose space telescopes JWST and HST, the Roman Space Telescope also features survey capabilities. Table 1 [TABLE:1] summarizes the calibration reference data recommendation methods used by several international telescopes.

This paper will provide a detailed introduction to the new CRDS system in Chapter 3.

Table 1 Calibration reference data recommendation methods used by international telescopes

| Telescope Name | Operational Period | Recommendation Method | References |
|---|---|---|---|
| Chandra X-ray Observatory | July 1999–present | Calibration Database (CALDB) | [29] |
| Spitzer Space Telescope | August 2003–January 2020 | Calibration Transfer (caltrans) | [30] |
| Gemini Telescope | 2000–present | Hybrid Database+Python Query Method | [35] |
| Hubble Space Telescope | April 1990–present | CDBS for first 24 years, CRDS currently | [37, 39] |
| James Webb Space Telescope | Launched December 2021, 10-year planned operation | Calibration Reference Data System (CRDS) | [28] |
| Roman Space Telescope | Expected launch 2026 or 2027, 5-year operation | Calibration Reference Data System (CRDS) | |

3. The Text-Based Rule Calibration Reference Data System (CRDS)

3.1 Overview

With the development of space astronomy, scientists designed the Calibration Reference Data System (CRDS) to serve the James Webb Space Telescope, Hubble Space Telescope, and Roman Space Telescope. This is a comprehensive system for managing and distributing calibration reference files. For scientific data processing pipelines, CRDS manages two categories of reference files: "data reference files" containing calibration data or information, and "parameter reference files" containing various configuration parameter information used in scientific data processing pipelines. Figure 1 [FIGURE:1] illustrates the relationship between the Roman Space Telescope's Wide Field Instrument (WFI) Level 2 data processing pipeline (used to generate calibrated single-exposure data) and CRDS.

CRDS is termed a "system" because its core function, optimal reference file recommendation, requires a complete supporting service platform. The result is a comprehensive system comprising Python libraries, command-line programs, web servers, and databases that manages and distributes all types of calibration reference files for all terminal instruments of an astronomical telescope, serving automated scientific data processing efficiently and flexibly.

CRDS represents a complete redesign, abandoning the traditional approach of storing calibration reference file selection rules in databases. Instead, it adopts a more concise and flexible method in which all selection rules for a given version are contained in simple text files. This approach offers multiple advantages. First, the textual format makes selection rules for the various calibration reference file types clear and easy to understand. Second, the rule files are version-controlled, so different versions neither conflict nor can be confused with one another, which simplifies management and makes historical processing results reproducible. Third, text-format rule files are easily portable, allowing remote users to use them without installing complex software. Finally, the relatively simple content of rule files lets users customize and modify rules according to their actual needs when necessary.

To realize CRDS's core function of "optimal reference file recommendation," the prerequisite is constructing text-based rule files for various calibration reference data types, enabling optimal reference file recommendations based on these rules. As a complete system supporting this requirement, CRDS must possess multiple functions, including submission of new reference files with simultaneous reconstruction of new rule files, optimal reference file recommendation, scientific data reprocessing, and various auxiliary functions such as verification of reference and rule files (detailed in Table 2 [TABLE:2]). Figure 2 [FIGURE:2] provides an overall schematic of CRDS functions, showing that different functions involve different users. CRDS involves three main user categories: calibration scientists, scientific data processing pipeline operators, and CRDS system administrators. Calibration scientists primarily create and generate various calibration products (calibration reference files) and submit them to CRDS, providing input for the system. Scientific data processing pipeline operators are the users of CRDS's optimal reference file recommendation function. CRDS system administrators are responsible for overall system maintenance and management.

Primarily developed in Python, CRDS implements various usage modes including script command-line tools, Python API interfaces, and simple web visual interfaces. CRDS software has both client and server versions, with the client version released on GitHub while the server-side software (controlling file submission, management, and network services) is not publicly released for security reasons. Through CRDS client software or websites, users can access functions including reference file submission and verification, optimal reference file recommendation, scientific data reprocessing, and various auxiliary functions (see Table 2).

Table 2 Detailed description of CRDS functions

| Function | Description |
|---|---|
| Recommend best reference files (bestrefs) | Recommends optimal calibration reference files for scientific datasets, with modes including file mode, reprocessing mode, rule file testing mode, and comparison mode |
| Verify reference/rule files (certify) | Performs verification checks on reference/rule files, including basic document format, semantics, and parameter constraints |
| Checksums (checksum) | Adds, deletes, or verifies hash values (sha1sum) of rule files or checksums/datasums of reference files |
| Compare files (diff/rowdiff) | Compares different versions of rule or reference files and returns differences; diff compares FITS file table data column-by-column, while rowdiff specifically compares row-by-row |
| Download reference files (get_synphot) | Downloads reference files mappable by a pmap file version from the CRDS server to local storage |
| Query affected datasets (query_affected) | Automatically determines scientific datasets requiring reprocessing due to rule or reference file updates |
| List/query information (list) | Lists or queries various CRDS information, such as local installation configuration, or reference/rule files on the server or in the local cache |
| Explain rules (matches) | Interprets CRDS rules to find matching rule content for a reference file |
| Generate context files (newcontext) | Automatically generates new version context files (imap or pmap files) |
| Refactor rmap files (refactor) | Automatically reconstructs new version rmap files |
| Submit reference files (submit/rc_submit) | Submits reference files to the CRDS server (requires CRDS account permissions) |
| Sync files (sync) | Synchronizes reference/rule files from the server to local storage |
| Rename files (uniqname) | Renames reference files with official HST unique CRDS names |
| View dependencies (uses) | Views or lists dependency files for a reference/rule file |

3.2 Selection Rule Files

The CRDS system revolves around selection rule files that describe how to select or assign the most appropriate calibration reference files for scientific data. CRDS rule files are hierarchically designed text files, where hierarchical mapping relationships between rule files help scientific observation data simply and clearly select the required optimal calibration reference files. The rule hierarchy has three levels (see Table 3 [TABLE:3]): pmap files (pipeline-level, one per telescope), imap files (instrument-level, one per instrument), and rmap files (reference type-level, one per calibration data type). These three mapping file types have hierarchical mapping/selection relationships and are collectively called mapping files, with imap and pmap files also referred to as context files. Based on these rule files, CRDS implements a nested organizational structure corresponding to different functional levels. Figure 3 [FIGURE:3] illustrates the hierarchical structure of CRDS rule files. All observation mode reference files for a specific calibration type of a telescope's instrument are encompassed in versioned rmap files (see columns 3 and 4 in Figure 3). An instrument's imap file collects specific versions of rmap files for all calibration types of that instrument, forming the effective recommendation version of the imap file (see columns 2 and 3 in Figure 3). The pipeline-level pmap file for a telescope collects specific versions of imap files for all instruments of that telescope (see columns 1 and 2 in Figure 3). In summary, CRDS primarily operates based on four file types: three hierarchical mapping/rule files and the final mapped calibration reference files (see Figure 3).

Table 3 Introduction to CRDS rule files at different hierarchical levels

| File Type | Level | Function |
|---|---|---|
| pmap (pipeline context) | Telescope pipeline level | Manages all instruments of a telescope, used to map imap files |
| imap (instrument context) | Instrument level | Manages all reference file types for an instrument, used to map rmap files |
| rmap (reference type mapping) | Reference file type level | Manages all versions of reference files for a reference type, used to map specific reference files |
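The way a single pmap version pins down an entire rule set can be sketched with nested dictionaries. All file names and contents below are invented for illustration and do not follow the naming scheme of any real CRDS deployment; the point is only that choosing a pmap fixes the imap versions, which in turn fix the rmap versions.

```python
# Hypothetical rule files keyed by versioned filename (names are assumptions).
RULE_FILES = {
    "tel_0001.pmap": {"INSTR_A": "tel_instr_a_0001.imap"},
    "tel_0002.pmap": {"INSTR_A": "tel_instr_a_0002.imap"},
    "tel_instr_a_0001.imap": {
        "dark": "tel_instr_a_dark_0001.rmap",
        "flat": "tel_instr_a_flat_0001.rmap",
    },
    "tel_instr_a_0002.imap": {
        "dark": "tel_instr_a_dark_0002.rmap",  # only the dark rules changed
        "flat": "tel_instr_a_flat_0001.rmap",
    },
}


def rmaps_in_context(pmap_name):
    """List every rmap pinned by a pmap by walking pmap -> imap -> rmap."""
    rmaps = []
    for imap_name in RULE_FILES[pmap_name].values():
        rmaps.extend(RULE_FILES[imap_name].values())
    return sorted(rmaps)


old = rmaps_in_context("tel_0001.pmap")
new = rmaps_in_context("tel_0002.pmap")
changed = sorted(set(new) - set(old))
print(changed)
```

Recording only the pmap name alongside a processed dataset is therefore enough to recover the full set of rules that were in effect, which is what makes earlier recommendations reproducible.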

The three hierarchical mapping/rule files in CRDS (pmap, imap, and rmap) share identical structure and syntax, comprising two main sections: a "header" section and a "selector" section (see Figure 4 [FIGURE:4]). The header section provides descriptive information about the mapping file, including file origin, filename, telescope name, mapping file type, and integrity verification information (see header content in Figure 4). A particularly important field is "parkey," which lists the dataset parameters used by the selector for optimal reference file lookup (typically keywords from FITS file headers or JWST/Roman data model names). The selector section contains the matching rules for finding mapping results, typically in nested structures that can contain high-level selectors and nested sub-selectors. Additionally, comments can be added between the header and selector sections as needed.

Regarding the selector section of rule files, pmap and imap rule files have relatively simple selectors (see column 1 in Figure 4). The "parkey" field in pmap rule files is typically "INSTRUME," representing the instrument keyword, and its selector section simply matches and maps to imap filenames corresponding to different instruments. The "parkey" field in imap rule files is typically "REFTYPE," representing the calibration reference file type keyword, and its selector section also simply matches and maps to rmap filenames corresponding to different reference file types for that instrument. In contrast, rmap rule files are more complex (see column 2 in Figure 4), with "parkey" fields including multiple keywords as appropriate, and selectors typically being nested hierarchical selectors that match various header keywords. CRDS rule file selectors support various selection rules including exact matching, expression matching (Match), start time (UseAfter), software version selection, closest-time selection, nearest distance selection, and range selection, as detailed in Table 4 [TABLE:4]. Match and UseAfter selectors are the most commonly used (see selector section in Figure 4). Depending on scientific objectives, observation strategies, calibration schemes, and other factors, different telescopes' scientific data processing pipelines may employ different selection rules for reference files. Additionally, scientists or other users with complex reference file selection needs can add and use new selection rules by modifying the content of rule files in their locally installed CRDS path.

Table 4 Introduction to selector rules in CRDS rule files

| Rule Type | Description |
|---|---|
| Match | Finds tuples most closely matching scientific data header information, supporting exact matching, enumerated lists, wildcards, regular expressions, literal expressions, relational expressions, range expressions, and exclusion expressions |
| UseAfter (Start Time) | Finds the most recent reference file with time earlier than the scientific data observation time |
| SelectVersion (Software Version) | Uses pipeline software versions and various relational expressions for selection, where different versions of scientific data processing software may use different reference file versions |
| ClosestTime (Closest Time) | Selects the entry whose time is closest to the lookup time |
| GeometricallyNearest (Nearest Distance) | Selects the entry at the smallest distance from the lookup value |
| Bracket (Range Selection) | Returns two selection results (the entries bracketing the lookup value) |
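A heavily simplified sketch of a Match-style selector is given below. It supports only wildcards (`*`) and enumerated lists (`A|B`), and it ranks candidate rules by the number of non-wildcard fields; both the pattern syntax and the ranking are assumptions chosen for illustration and do not reproduce the actual CRDS matching algorithm.

```python
def field_matches(pattern, value):
    """A pattern matches if it is '*' or if value is one of its '|'-separated options."""
    return pattern == "*" or value in pattern.split("|")


def match_select(parkey, rules, header):
    """Return the result of the most specific rule whose every field matches the header."""
    values = [header[key] for key in parkey]
    best, best_score = None, -1
    for patterns, result in rules.items():
        if all(field_matches(p, v) for p, v in zip(patterns, values)):
            score = sum(p != "*" for p in patterns)  # count non-wildcard fields
            if score > best_score:
                best, best_score = result, score
    return best


# Hypothetical parkey and rules; keyword and file names are invented.
PARKEY = ("DETECTOR", "FILTER")
RULES = {
    ("CCD1", "*"): "flat_ccd1_generic.fits",
    ("CCD1", "G|R"): "flat_ccd1_gr.fits",
    ("*", "*"): "flat_default.fits",
}

specific = match_select(PARKEY, RULES, {"DETECTOR": "CCD1", "FILTER": "R"})
fallback = match_select(PARKEY, RULES, {"DETECTOR": "CCD2", "FILTER": "U"})
print(specific, fallback)
```

In a nested rmap selector, the value returned by such a match would typically be another selector (for example a UseAfter table) rather than a filename, so that the time-based choice is made only among files valid for the matched observation mode.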

3.3 Reference File Submission and Verification

The prerequisite for CRDS to recommend optimal reference files is the proper storage and management of submitted calibration reference files, along with reconstructing corresponding rule files during submission (see CRDS reference file submission function in Figure 2). Since space telescope on-orbit calibration generates numerous multi-version, multi-type reference files that are updated over time—especially daily monitoring items like background and dark current with daily update frequencies—and since post-launch data processing algorithm updates also lead to reference file generation updates, reference file submission occurs frequently, sometimes in large batches.

Reference file submissions generally fall into two categories: (1) adding a reference file type that did not previously exist, or replacing existing reference files (for example, with better flat-field reference files); and (2) adding new versions of existing reference file types for processing scientific observations from specific time intervals.

Since a single problematic reference file can affect the processing of large batches of scientific data, quality verification of reference files is crucial—reference files submitted to CRDS must be valid and absolutely error-free. Therefore, when submitting calibration reference files to CRDS, a series of rigorous verification checks are performed (see CRDS verification function in Table 2). Only reference files passing verification can be successfully submitted to the CRDS server and trigger reconstruction of new version rule files. The reconstructed new version rule files must also undergo verification before being stored in the CRDS server for scientific users.

Verification of reference and rule files relies primarily on constraint definition files with the .tpn suffix defined in CRDS. These CRDS-specific .tpn files define the comprehensive checks required for reference files and rule files (rmap files). For HST, .tpn files define nearly all CRDS checks; they are inherited from CDBS checks, which is also the origin of the CRDS .tpn file syntax. For JWST/Roman, .tpn checks extend the JWST/Roman data model checks, enabling "required" check options, matrix dimension checks, keyword relationship checks, and distinguishing acceptable values for different backend modules. The .tpn file syntax supports explicit instructions, synthetic instructions, include instructions, replace instructions, and constraint instructions. The most frequently used are constraint instructions, with each line defining a constraint check that typically includes five space-separated fields: name, keytype, datatype, presence (whether required), and values (a comma-separated list of values allowing Python expressions). Through internally defined .tpn files, CRDS can perform comprehensive checks on reference files, from header keywords to data matrices and tables.
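To make the idea of a five-field constraint line concrete, the following sketch parses and applies such lines to a header. The field names (`name`, `keytype`, `datatype`, `presence`, `values`), the single-letter codes, and the `ANY` placeholder are assumptions for this example; the sketch does not implement the real .tpn syntax and omits Python-expression values entirely.

```python
from typing import NamedTuple


class Constraint(NamedTuple):
    name: str       # header keyword to check
    keytype: str    # where the keyword lives, e.g. "H" for header (assumed code)
    datatype: str   # "C" character, "I" integer, "R" real (assumed codes)
    presence: str   # "R" required, "O" optional (assumed codes)
    values: tuple   # allowed values; an empty tuple means any value


def parse_constraint(line):
    """Parse one whitespace-separated, five-field constraint line."""
    name, keytype, datatype, presence, values = line.split()
    allowed = () if values == "ANY" else tuple(values.split(","))
    return Constraint(name, keytype, datatype, presence, allowed)


CASTS = {"C": str, "I": int, "R": float}


def check(header, constraint):
    """Return a list of error messages (empty if the header satisfies the constraint)."""
    if constraint.name not in header:
        return [f"{constraint.name}: missing"] if constraint.presence == "R" else []
    raw = header[constraint.name]
    try:
        CASTS[constraint.datatype](raw)
    except ValueError:
        return [f"{constraint.name}: wrong type"]
    if constraint.values and str(raw) not in constraint.values:
        return [f"{constraint.name}: value {raw!r} not allowed"]
    return []


rules = [parse_constraint(line) for line in ("DETECTOR H C R CCD1,CCD2", "GAIN H R O ANY")]
header = {"DETECTOR": "CCD3", "GAIN": "1.5"}
errors = [message for rule in rules for message in check(header, rule)]
print(errors)
```

Keeping the constraints in a declarative text file, as the paragraph above describes, means new checks can be added without changing the verification code itself.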

In addition to CRDS-specific .tpn format checks, CRDS verification (the "certify" function) also includes: JWST/Roman data model checks, fitsverify checks, table row checks, FITS/ASDF/JSON/YAML format checks, rmap rule file update checks, and file hash value verification. When reference or rule files are submitted, warnings or errors are issued if problems are detected. Users can modify the reference or rule files based on these warnings and errors to complete the submission. Only verified reference files undergo versioned renaming and are used to construct the hierarchical rule files stored on the CRDS server for scientific users.

3.4 Optimal Reference File Recommendation

The core function of CRDS is recommending optimal calibration reference files for scientific data processing based on the constructed rule files, known as the "bestrefs" function (see Figure 2 and Table 2). Calibrating a single scientific observation image typically requires many types of calibration reference files, which vary with the instrument, observation mode, and other factors. As calibration data products, reference files therefore carry header keywords that record the relevant information, including the observing instrument, configuration parameters, reference file type, usage time, creation time, and type-specific keywords. CRDS uses this header information to rebuild new versions of the rule files and then assigns optimal calibration reference files through the hierarchical mapping structure. Each update to the pmap rule file defines the CRDS configuration (i.e., the currently effective rule file versions) used by the scientific data processing pipelines at a given point in time. To make a recommendation, CRDS reads the header of the scientific data awaiting processing and, based on the selected pmap file, proceeds in three steps. First, it maps to an imap file (listed in the pmap file's selector) according to the value of the "INSTRUME" keyword (the "parkey" field in pmap files). Second, it maps to an rmap file (listed in the imap file's selector) according to the required reference file type "REFTYPE" (the "parkey" field in imap files); if "REFTYPE" is not set, all rmap file types listed in the imap file are matched. Finally, it recommends the optimal calibration reference file according to the selection rules and "parkey" fields in the rmap file (see Figures 3 and 4).
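The three-level lookup described above can be sketched in Python with nested dictionaries standing in for the pmap, imap, and rmap files. This is an illustration under stated assumptions, not CRDS code: the rule tables, the file names, the use of "DETECTOR" as the rmap parkey, and the choice of the latest applicable start date as the final selection rule are all hypothetical.

```python
from datetime import datetime

# Hypothetical, hand-written rule tables; real CRDS rule files are richer.
PMAP = {"parkey": "INSTRUME", "selector": {"ACS": "acs_imap"}}

IMAPS = {
    "acs_imap": {
        "parkey": "REFTYPE",
        "selector": {"DARK": "acs_dark_rmap", "FLAT": "acs_flat_rmap"},
    }
}

RMAPS = {
    "acs_dark_rmap": {
        "parkey": ("DETECTOR",),
        "selector": {
            ("WFC",): [("2020-01-01", "dark_wfc_0001.fits"),
                       ("2023-06-01", "dark_wfc_0002.fits")],
        },
    },
    "acs_flat_rmap": {
        "parkey": ("DETECTOR",),
        "selector": {
            ("WFC",): [("2020-01-01", "flat_wfc_0001.fits")],
        },
    },
}


def best_ref(header: dict, reftype: str) -> str:
    """Walk pmap -> imap -> rmap and return one reference file name."""
    imap = IMAPS[PMAP["selector"][header[PMAP["parkey"]]]]   # step 1: INSTRUME
    rmap = RMAPS[imap["selector"][reftype]]                  # step 2: REFTYPE
    key = tuple(header[k] for k in rmap["parkey"])           # step 3: rmap parkeys
    observed = datetime.fromisoformat(header["DATE-OBS"])
    candidates = [
        (datetime.fromisoformat(start), name)
        for start, name in rmap["selector"][key]
        if datetime.fromisoformat(start) <= observed
    ]
    if not candidates:
        raise LookupError(f"no {reftype} reference applies to {header}")
    return max(candidates)[1]   # latest file whose start date is not in the future


def best_refs(header: dict, reftypes=None) -> dict:
    """Recommend files for the given types, or for every type in the imap."""
    imap = IMAPS[PMAP["selector"][header[PMAP["parkey"]]]]
    types = reftypes or sorted(imap["selector"])
    return {t: best_ref(header, t) for t in types}
```

Calling best_refs without a list of types corresponds to the case in which "REFTYPE" is not set, so every rmap listed in the imap is matched.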

For HST, calling CRDS's "bestrefs" function recommends optimal calibration reference files and updates reference filenames in scientific data headers for subsequent processing. For JWST and Roman, CRDS is fully integrated into their scientific data processing pipelines, which automatically map and select optimal calibration reference files during operation based on the selected pmap file. The default pmap file is the operational status pmap, though older versions can be selected as needed. Therefore, when publishing articles using JWST data, authors must specify not only the JWST scientific data processing pipeline version but also the pmap file version and CRDS client software version, as these determine the calibration reference files used for data processing. Additionally, for traceability, the used pmap file version and optimal calibration reference filenames are ultimately recorded in header keywords of processed scientific data.

Using the optimal reference file function requires network connection to the CRDS server and proper configuration of key environment variables, including CRDS_SERVER_URL (CRDS server URL) and CRDS_PATH (local path for CRDS cache), to download required CRDS rule and reference files to local storage for data processing. This also enables users to conveniently view and flexibly modify rule files to meet customized requirements for calibration reference file usage in data processing.
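As a minimal sketch of this configuration step, the two environment variables can be set from Python before any CRDS client code runs. The helper name configure_crds, the server address in the usage comment, and the cache location are assumptions for illustration; the appropriate values depend on the telescope and the local installation.

```python
import os
from pathlib import Path


def configure_crds(server_url: str, cache_dir: str) -> dict:
    """Set the two CRDS environment variables and create the cache directory."""
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)    # local store for rule/reference files
    os.environ["CRDS_SERVER_URL"] = server_url  # CRDS server to contact
    os.environ["CRDS_PATH"] = str(cache)        # local CRDS cache path
    return {"CRDS_SERVER_URL": server_url, "CRDS_PATH": str(cache)}


# Example call (assumed values):
# configure_crds("https://jwst-crds.stsci.edu", "~/crds_cache")
```

Environment variables are typically read when the client software starts, so a call like this would normally come before the pipeline or CRDS client is imported.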

3.5 Scientific Data Reprocessing Function

Calibration reference files and raw scientific observation images together serve as important inputs for scientific data processing software, determining the quality of scientific data products. Scientific data inevitably requires reprocessing, typically due to updates in data processing algorithms or calibration reference files. For space telescopes, instrument degradation caused by the space environment necessitates periodic on-orbit monitoring, generating time-varying new versions of calibration reference files. Additionally, iterative updates to calibration schemes and product generation algorithms during operation also lead to updates in calibration reference files. Therefore, calibration reference files require frequent updates and submissions to CRDS, with corresponding rule file updates. Since CRDS recommends optimal calibration reference files based on rule files, updates to selection rules or addition of new reference files inevitably lead to changes in recommended reference files for the same scientific data, meaning previously processed scientific data may require reprocessing.

The reprocessing function represents another important and novel feature of CRDS. CRDS's reprocessing function typically collaborates with archival databases to determine scientific datasets requiring reprocessing due to new reference file submissions. When scientific data processing pipelines select a new version pmap file, CRDS automatically calculates which scientific data require reprocessing due to rule file changes, thereby automatically triggering scientific data reprocessing (see CRDS reprocessing function in Figure 2). The reprocessing function calculates affected scientific datasets by comparing reference files assigned by old and new rule files, enabling CRDS to automatically determine which processed scientific datasets need reprocessing. Upon completion, the reprocessing system stores logs and recommended reprocessing datasets, accessible via e-mail, client programs, or web interfaces. Finally, scientific data processing pipeline operators or scientific data users decide whether to reprocess affected scientific data and execute reprocessing operations.
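The comparison at the heart of the reprocessing calculation can be sketched as follows. This is a simplified illustration, not the CRDS algorithm: the function name affected_datasets is hypothetical, and the old and new rule sets are modelled as plain Python callables that map a dataset header to a dictionary of recommended files per reference type.

```python
def affected_datasets(datasets: dict, old_rules, new_rules) -> dict:
    """Return the datasets whose recommended reference files differ.

    datasets maps a dataset id to its header (a dict).
    old_rules and new_rules map a header to {reftype: filename}.
    The result maps each affected id to {reftype: (old_file, new_file)}.
    """
    affected = {}
    for dataset_id, header in datasets.items():
        old = old_rules(header)
        new = new_rules(header)
        changed = {
            reftype: (old.get(reftype), new.get(reftype))
            for reftype in sorted(set(old) | set(new))
            if old.get(reftype) != new.get(reftype)
        }
        if changed:   # unchanged datasets do not need reprocessing
            affected[dataset_id] = changed
    return affected
```

The returned mapping plays the role of the recommended reprocessing list: it records which datasets are affected and which reference files changed, leaving the decision to reprocess with the pipeline operators or data users.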

3.6 Other Auxiliary Functions

Beyond constructing selection rule files and assigning optimal reference files based on hierarchical rule files, CRDS provides diverse auxiliary tools to facilitate rule and reference file management. These auxiliary functions include: downloading and managing CRDS rule/reference/status information, constraint and format checks, file comparisons, content viewing, and dependency viewing (see Table 2). These functions and tools enhance CRDS system usability and manageability.

Furthermore, as a complete system, CRDS employs dedicated network servers to facilitate various functions. CRDS servers primarily manage CRDS rule files, reference files, and metadata. Different telescopes have their own CRDS server URLs. CRDS servers provide various network services through JSONRPC and HTTP interfaces, including reference/rule file submission and archiving, optimal reference file recommendation, reprocessing functions, and file distribution downloads, to support client software functions. Figure 5 illustrates the workflow of scientific data processing pipelines obtaining optimal calibration reference files from CRDS servers.

4. CSST Application

4.1 CSST Introduction

The China Space Station Survey Telescope (CSST) is China's first large-scale space optical astronomical telescope, featuring a 2-meter primary mirror and an effective field of view of 1.72 square degrees. It will operate in low Earth orbit, co-orbiting with the space station for maintenance and upgrades, with a planned 10-year operational lifetime. The telescope comprises five backend modules (see Figure 6): a multicolor imaging and slitless spectroscopy survey camera, a multichannel imager, an integral field spectrograph, an exoplanet imaging coronagraph, and a terahertz spectrometer. Together they enable large-field survey observations, a range of precision astronomical measurements, and deep-field and ultra-deep-field observations. As China's flagship space astronomy project, CSST is expected to achieve major breakthroughs in cosmology, galaxies and active galactic nuclei, the Milky Way and nearby galaxies, stellar science, exoplanets and solar system objects, astrometry, and transient and variable sources.

To achieve "large field of view, high image quality, and multi-band" capabilities, CSST's five first-generation instruments are highly distinctive. The main survey camera (MSC) focal plane consists of 30 tiled 9,000×9,000 CCDs (charge-coupled devices) with about 2.5 billion pixels in total, including 18 detectors for multiband imaging and 12 for slitless spectroscopy; its astrometric calibration components use CCDs identical to those of the main focal plane. The short-wavelength infrared module has eight 640×512 detectors, four for near-infrared imaging and four for slitless spectroscopy. The multichannel imager (MCI) can observe simultaneously in ultraviolet, visible, and near-infrared channels, each equipped with a filter wheel for subdividing bands and with detectors consistent with those of the survey module. The integral field spectrograph (IFS) employs an improved image slicer-based integral field unit; because space observations are free of atmospheric seeing limitations, it enables imaging spectroscopy with high spatial resolution across the entire optical band. The exoplanet imaging coronagraph (CPI-C) uses starlight suppression to block the stellar light in the surrounding field of view so that faint planetary light can be observed, and it is used primarily for exoplanet observations in the visible to near-infrared bands. The high-sensitivity terahertz detection module (HSTDM) operates in the terahertz band between the infrared and microwave, and is used for wideband molecular line searches, studies of molecular cloud formation and evolution in the Milky Way, observations of extragalactic neutral carbon fine structure lines, and studies of cold gas in the nearby universe.

4.2 CSST Calibration Data Recommendation System

The CSST scientific data processing system is a dedicated system for CSST data processing, with primary tasks including observation data simulation, scientific observation requirement planning, and data processing. Data processing is the core task: raw data (Level 0) from all instruments must be processed into Level 1 and Level 2 data that meet scientific user requirements, ultimately providing the observational basis for scientific research. CSST combines multiple backend modules, diverse observation modes, simultaneous multicolor imaging and slitless spectroscopy in the survey module, a large field of view with tiled CCDs, and diverse scientific objectives. Together with complex space environment factors, these characteristics make scientific data calibration highly demanding and the calibration data extremely complex, which in turn complicates the management and distribution of calibration reference data. Therefore, developing a dedicated calibration reference data recommendation system for CSST (the CSST Calibration Data System, CCDS) is a core task in developing the scientific data processing system. This system must address the complexity of calibration data during CSST's on-orbit operation and manage large numbers of time-varying calibration reference files of various types.

By establishing reasonable recommendation rules, this system ensures efficient and accurate recommendation of optimal calibration reference files to scientific data processing pipelines under the complex on-orbit working modes of the space telescope.

Based on CSST's actual requirements, calibration products can be obtained through two approaches: a calibration product generation toolkit and a calibration product generation pipeline. The toolkit mainly covers calibration products for the survey module that are monitored at relatively low frequency, as well as the various calibration products for the precision measurement modules. These products are submitted to CCDS manually, through web interfaces or Python APIs, and only after manual verification. For the survey module, the daily detector monitoring items (such as background, dark current, and internal flat fields) are monitored at high frequency and involve large data volumes. They therefore need a calibration product generation pipeline that generates, verifies, and submits products to CCDS automatically and supports parallel processing, which poses new challenges for CCDS. In addition, some pipeline products depend on other calibration products: for example, dark current products depend on combined backgrounds, while internal flat-field products depend on both combined dark current and combined backgrounds. This dependency logic adds complexity to CCDS. Figure 7 illustrates the interaction between CCDS and the calibration product generation pipeline, which is under development. As shown, CCDS faces higher requirements in order to support large-scale automated calibration product generation and submission. Furthermore, given the diversity of CSST instruments and data, calibration product data types will be highly varied. Therefore, in addition to the FITS/ASDF/JSON/YAML reference data formats, CCDS will also support additional reference file formats, including TOML and PICKLE files. As CSST's scientific data processing system and calibration product generation toolkit/pipeline develop, CCDS requirements will continue to evolve, which will drive CCDS toward greater extensibility.
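The product dependencies mentioned above (dark current depends on combined backgrounds; internal flat fields depend on both combined dark current and combined backgrounds) imply a generation order. One possible way to derive that order is a topological sort, sketched here with Python's standard graphlib module. The product names and the helper generation_order are illustrative assumptions, not part of CCDS.

```python
from graphlib import TopologicalSorter

# Each product maps to the set of products it depends on (illustrative names).
DEPENDS_ON = {
    "background": set(),
    "dark": {"background"},
    "flat": {"dark", "background"},
}


def generation_order(depends_on: dict) -> list:
    """Return an order in which every product follows its dependencies.

    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = generation_order(DEPENDS_ON)
```

For these three products the only valid order is background, dark, flat. Products with no dependency between them could be generated in parallel, which matches the parallel processing requirement of the pipeline.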

5. Summary and Outlook

Calibration reference data recommendation systems play crucial roles in automated astronomical data processing, automatically assigning optimal calibration reference files for scientific data from different periods, instruments, and observation modes. Through decades of international research and practice, comprehensive calibration reference data systems integrating calibration reference data management, rule formulation and reconstruction, and automated reference file recommendation have gradually developed. These systems have been implemented on renowned international space telescopes including JWST/HST and the planned Roman Space Telescope, demonstrating broad flexibility for application to survey and general-purpose telescopes for calibration reference data management, distribution, and support for automated scientific data processing pipelines.

With continuous development in observational astronomy and the emergence of various survey and medium-to-large aperture telescopes, the need for automated processing of massive astronomical observation data makes research and construction of comprehensive calibration reference data management and distribution systems critically important. Particularly for telescopes like CSST, which not only features a survey module capable of high-resolution multicolor imaging and slitless spectroscopy surveys over large sky areas but also integrates multiple precision measurement modules for various fine astronomical observations, this complexity makes calibration reference data management and distribution extremely challenging, urgently requiring a comprehensive calibration reference data system. With continuous advancement in computer science and interdisciplinary fields, calibration reference data systems can leverage more advanced network technologies and artificial intelligence to continuously optimize and improve, supporting rapid upload and management of large-scale and large-volume reference files, intelligent construction of selection rules and reference file recommendation, and driving the entire astronomical data processing workflow toward greater efficiency and intelligence. This will form an intelligent link from observational data to scientific objectives, ultimately optimizing telescope scientific output.

This research was supported by the China Manned Space Engineering CSST Special Scientific Research Program (CMS-CSST-2025-A19). We thank the China Space Station Survey Telescope Scientific Data Processing System for supporting this work and the reviewers for their valuable comments.

References

[1] Harris D. Astronomy: A Study of Celestial Objects. New York: Murphy & Moore Publishing, 2022: 10
[2] "China Disciplines and Frontier Fields Development Strategy Research (2021-2035)" Project Team. China's Astronomy 2035 Development Strategy. Beijing: Science Press, 2023: 8
[3] Huang Youran, Xu Aiao, Tang Yuhua, et al. Observational Astrophysics. Beijing: Science Press, 1987: 3
[4] Xiao Naiyuan, Xuan Huancan. Illustrated History of Astronomy. Nanjing: Nanjing University Press, 2012: 5
[5] Cheng Jingquan. The Era of Giant Telescopes: Modern Optical Astronomical Telescopes. Nanjing: Nanjing University Press, 2023: 1
[6] Djorgovski S G, Mahabal A A, Drake A J, et al. Planets, Stars and Stellar Systems. Dordrecht: Springer Reference, 2013: 223
[7] Ferguson H C, Greenfield P, Axelrod T, et al. astro2010: The Astronomy and Astrophysics Decadal Survey, Washington: The National Academies Press, 2010: 15
[8] Lena P, Rouan D, Lebrun F, et al. Observational Astrophysics. 3rd ed. Heidelberg: Springer-Verlag, 2012:
[9] Freudling W, Romaniello M, Bramich D M, et al. A&A, 2013, 559: A96
[10] Vernet J, Dekker H, D'Odorico S, et al. A&A, 2011, 536: A105
[11] Zombeck M V. Handbook of Space Astronomy and Astrophysics. Third Edition. Cambridge: Cambridge University Press, 2007: 597
[12] Zhan H. ChSBu, 2021, 66: 1290
[13] Gu Y D. ChSBu, 2022, 37: 1031
[14] Hands A D P, Ryden K A, Meredith N P, et al. Space Weather, 2018, 16: 1216
[15] Doxsey R. SpaceOps 2006 Conference. Rome: AIAA, 2006: 5936
[16] Yoshioka K, Miyoshi Y, Kurita S, et al. Space Weather, 2021, 19: e02611
[17] Bohlin R C. AJ, 2016, 152: 60
[18] https://www.stsci.edu/hst/instrumentation/wfc3/calibration-plan, 2025
[19] https://www.stsci.edu/hst/instrumentation/acs/calibration/calibration-plans, 2025
[20] Menzel M, Davis M, Parrish K, et al. PASP, 2023, 135: 058002
[21] Rigby J, Perrin M, McElwain M, et al. PASP, 2023, 135: 048001
[22] https://www.stsci.edu/jwst/science-execution/approved-programs/calibration, 2025
[23] Freudling W, Zampieri S, Coccato L, et al. A&A, 2024, 681: A93
[24] Gong Q, Bergkoetter M, Berrier J, et al. Journal of Astronomical Telescopes, Instruments, and Systems, 2020, 6: 045008
[25] Bailey V P, Bendek E, Monacelli B, et al. Proc. SPIE, 2023, 12680: 126800T
[26] Scaramella R, Amiaux J, Mellier Y, et al. A&A, 2022, 662: A112
[27] Euclid Collaboration, Paterson K, Schirmer M, et al. A&A, 2023, 674: A172
[28] Greenfield P, Miller T. Astronomy and Computing, 2016, 16: 41
[29] Graessle D E, Evans I N, Glotfelty K, et al. SPIE, 2006, 6270: 62701X-1
[30] Lee W, Laher R, Fowler J W, et al. ASPC, 2005, 347: 594
[31] https://www.stsci.edu/stsci/meetings/calhst/home.html, 2025
[32] Li Jian, Cui Chenzhou, He Boliang, et al. Progress in Astronomy, 2013, 31: 1
[33] Werner M W, Roelling T T, Low F J, et al. ApJS, 2004, 154: 1
[34] Fazio G G, Hora J L, Allen L E, et al. ApJS, 2004, 154: 10
[35] Hirst P, Cardenes R. Proc. SPIE, 2016, 9913: 99131E
[36] Jenkner H. European Southern Observatory Conference and Workshop Proceedings, 1988, 28: 357
[37] Cox C R, Tullos C. Proc. SPIE, 1993, 1945: 69
[38] Cox C R, Lubow S, Tullos C. Observatory Operations to Optimize Scientific Return, 1998, 3349: 218
[39] Swam M S, Lubow S, Hurt L. Astronomical Data Analysis Software and Systems, 2004, 314: 824
[40] Spergel D, Gehrels N, Baltay C, et al. https://arxiv.org/abs/1503.03757, 2025
[41] https://readthedocs.org/projects/jwst-pipeline/downloads/pdf/latest/, 2025
[42] https://roman.gsfc.nasa.gov/science/roses/Roman WFI Processing v6.pdf, 2025
[43] https://hst-crds.stsci.edu/static/users guide/index.html, CRDS User Manual, 2025
[44] Boyett K, Mascia S, Pentericci L, et al. ApJ, 2022, 940: L52
[45] Withers S, Muzzin A, Ravindranath S, et al. ApJ, 2023, 958: L14
[46] Gao M, Zhao G H, Gu Y D. Bull Chin Acad Sci, 2015, 30: 721
[47] Zhan H. Sci Sin-Phys Mech Astron, 2011, 41: 1441
[48] CSST Science Team. CSST Science White Paper, 2024: 16
[49] Manned Space Station Engineering Survey Telescope Scientific Data Processing System. Software System Design Specification, 2022: 5
