Abstract
Data acquisition (DAQ) systems face significant challenges in addressing the high-throughput and real-time processing demands of next-generation nuclear and particle physics experiments. Generic platforms offer solutions to reduce costs and improve efficiency but require innovations to address heterogeneous data formats, hardware interfaces, and processing requirements. This paper introduces the SHARE principles (Standardized, Heterogeneous, Adaptable, Reusable, and Extensible) for generic DAQ systems. Guided by these principles, we propose D-Matrix, a generic firmware-software co-designed stream processing platform. The key components of D-Matrix include: (1) A standardized data protocol that abstracts spatiotemporal properties to accommodate diverse detector payloads and processing requirements; (2) A hardware abstraction layer that standardizes access to heterogeneous resources and interfaces; and (3) Cascadable and configurable processing modules deployable across these resources under a unified execution model. This platform enables flexible construction of DAQ workflows and the potential for automated design generation. The DAQ system built on this platform has been successfully deployed at the HIRFL-CSR External Target Experiment (CEE), achieving a throughput of 73.6 Gbps and a processing rate of 20,890 events per second. By establishing a foundation of standardization and generalization across data, hardware, and processing paradigms, D-Matrix represents a significant step towards highly usable and generic heterogeneous DAQ systems.
Full Text
Preamble
D-Matrix: An Approach to Generic Data Acquisition System Design for Nuclear and Particle Physics Experiment
Zheng-Yang Sun, Jun-Feng Yang, Jin-Rui Zeng, Ke Sun, Lun Li, Zi-Long Xiong, Yu-Lin Bi, and Yi-Ting Huang
State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026, China
Department of Modern Physics, University of Science and Technology of China, Hefei 230026, China
Keywords
Data acquisition, stream processing, unified data protocol, automated design, hardware abstraction.
INTRODUCTION
DAQ systems are vital components in nuclear and particle physics experiments, functioning as central hubs for data aggregation, event building, operational control, and more.
The rapid pace of experimental upgrades and the growing diversity of detector technologies strain custom-built DAQ systems [ ]. Frequent iterations demand costly, time-consuming custom DAQ development. This custom-built approach struggles to efficiently accommodate heterogeneous detectors and varying physics requirements. Ultimately, it results in extended development timelines, high costs, and limited reuse of expertise, potentially slowing scientific progress.
A generic DAQ platform offers a flexible framework that addresses these issues. It significantly reduces development costs and time by enabling extensive reuse of standardized components (hardware interfaces, protocols, processing modules) across experiments and upgrades. This reuse enhances system maturity, reliability, and performance, while inherent flexibility allows rapid adaptation to new detectors and requirements, boosting scientific productivity [ ].
Supported by the National Key Research and Development Program of China (No. 2022YFA1604702, No. 2022YFA1602200, and No. 2023YFA1607200), the National Natural Science Foundation of China (NSFC) (No. 11927901, No. 12275268, No. 12341501, No. 12341503, and No. 12341504), and the international partnership program of the Chinese Academy of Sciences (No. 211134KYSB20200057). We thank the Hefei Comprehensive National Science Center for their strong support on the STCF key technology research project.
Corresponding author: Jun-Feng Yang, University of Science and Technology of China, 96 Jinzhai Road, Baohe District, Hefei, Anhui 230026, China.
A common and successful approach in modern DAQ systems employs a software-processing-oriented paradigm, where front-end hardware focuses on data collection and transparent transmission to server clusters for computational tasks. This approach builds on standardized components to enhance generality:
• Generic Transport Channels: To reduce custom FPGA development while enhancing system usability and optimizing performance, a key component is the establishment of a transparent data transportation pathway from front-end electronics to software processing units, exemplified by implementations such as GBT (GigaBit Transceivers) [ ] and FELIX (Front-End LInk eXchange) [ ].
• Unified Software Frameworks: To abstract detector-specific complexities and enable algorithm reuse across experiments, it is crucial to rely on standardized frameworks (e.g., artdaq [ ], xDAQ [ ], and FairMQ [ ]), which define common data containers and provide a unified approach to constructing generic data processing units.
A key motivation for adopting this paradigm is to reduce the development complexity inherent in FPGA-based processing. Shifting computational loads to software frameworks can accelerate deployment cycles and simplify algorithm maintenance. However, this approach can also increase infrastructure costs, as it typically demands high-performance computational resources and substantial bandwidth capacity.
From the perspective of achieving full generality, these established approaches have made significant strides, yet opportunities for further integration remain. A key area for potential enhancement lies in achieving a more complete and synergistic integration between FPGA and server software components. Such deeper integration would allow for a more optimal distribution of processing tasks, potentially better leveraging the inherent parallelism and power efficiency of FPGAs alongside the flexibility of software frameworks.
Given these opportunities for deeper integration, we posit that the next evolutionary step involves establishing a more deeply integrated hardware/software co-design architecture. To guide this evolution, we first propose a more comprehensive definition of DAQ generality through the SHARE principles (Standardized, Heterogeneous, Adaptable, Reusable, and Extensible). To realize these principles, we have developed D-Matrix, a generic firmware-software co-designed stream processing platform. This platform provides a holistic solution by establishing a standardized data protocol, a hardware abstraction layer, and a modular stream processing paradigm, creating a cohesive and extensible DAQ ecosystem.
DESIGN PHILOSOPHY
This section presents the SHARE generalization principles, along with an overview of the D-Matrix platform, its design philosophy, and motivations.
SHARE principles
A clear definition of generality is a prerequisite for the systematic development of a generic DAQ platform. Thus, we propose a set of five foundational principles, encapsulated by the acronym SHARE. We argue that a generic DAQ platform should exhibit the following characteristics:
1. Standardized
Standardization is the essential pathway to generalization, requiring efforts across data, hardware, and processing aspects.
• In terms of data, it requires the definition of a globally unified data container and communication protocols.
• For hardware, it necessitates defining a standardized hardware abstraction approach and hardware interconnection interfaces.
• Regarding processing, it calls for the establishment of a standardized processing module model and standardized communication interfaces between modules.
2. Heterogeneous
Modern large-scale experimental data acquisition and processing systems are invariably built on heterogeneous architectures, most commonly integrating front-end FPGAs and back-end CPU server clusters, potentially with GPU computing cards. A generic platform must provide support for these heterogeneous computing units and, ideally, manage and utilize them in a more unified manner. This represents a key limitation in the current state of general-purpose DAQ research.
As mature processing and encapsulation solutions in software already satisfy the basic requirements for generalization, the challenge of unifying heterogeneous architecture usage within the DAQ context shifts primarily to FPGA development. Therefore, a key objective for a general-purpose DAQ system is to create abstracted encapsulations for FPGAs, enabling their processing to follow the same paradigms as CPU software and lowering the barrier for developers to leverage heterogeneous computing resources. Achieving this goal necessitates solving the following challenges:
• FPGA hardware abstraction: While CPUs utilize mature standardized software abstraction layers (e.g., OS kernel drivers and high-performance runtime libraries) to isolate hardware specifics, FPGAs lack similar encapsulation standards, forcing developers to directly manage details of chips, boards, IP cores, and physical interfaces.
• Interconnecting heterogeneous systems: Efficient and reliable data exchange requires resolving interface heterogeneity between FPGA-to-FPGA and FPGA-to-CPU interconnects.
• General-purpose data processing foundation: Enabling algorithm portability and reuse across detectors/tasks necessitates hardware-agnostic, fine-grained data frame standards that decouple processing logic from detector-specific formats.
• Balancing efficiency: Computational effectiveness must be preserved when enabling general processing capabilities, avoiding the performance penalties imposed by high-level synthesis (HLS) tools.
3. Adaptable
Adaptability is the most immediate advantage afforded by a general-purpose design. This is manifested in the following aspects:
• Adaptability to data formats: Different detectors may have varying data output formats. This necessitates the design of a unified data container within the generic DAQ platform, which must also follow the principle of low protocol overhead to conserve precious data bandwidth in nuclear and particle physics experiments.
• Adaptability to processing needs: Modern experiments can generally be categorized into two types: triggered and trigger-less. These two categories often employ different processing workflows, and individual experiments typically have custom requirements. A generic DAQ platform must provide robust adaptability to accommodate these diverse and complex processing needs.
• Adaptability to scale and development process: The platform must be flexible and scalable to serve experiments of all sizes. It must also support the entire development process, from building compact laboratory prototypes for individual detectors to full-scale engineering deployment in a large experiment. Therefore, the generic DAQ platform should possess both the capability for large-scale expansion and the agility for rapid deployment of compact test systems.
4. Reusable
The reuse of components is fundamentally enabled by the widely recognized modular design paradigm, which can be implemented in two key aspects: the modular design of hardware boards, and the modular design of processing logic (encompassing both software and firmware).
To address the common issue of repetitive design in fields like FPGA logic development, generic designs must be distilled into truly reusable modules governed by well-defined standards or specifications.
Such a design enhances reusability, which can be applied within a single experiment and, more importantly, across different experiments. Notably, upgrades to an experiment often coincide with hardware iterations. In such scenarios, the existing hardware often remains functionally sound. Therefore, strong reusability can significantly reduce the costs associated with these experimental upgrades.
5. Extensible
Extensibility encompasses but is not limited to scalability in size and functionality. We envision a broader sense of extensibility for the generic DAQ platform. Building upon its standardized architecture, the platform should transcend its initial purpose as a dedicated data acquisition system and mature into an open and iterative ecosystem. This ecosystem empowers users to introduce new standards-compliant hardware, develop custom processing modules, and adopt emerging technologies.
Overview of D-Matrix platform
To address the challenges in generalization and to realize the SHARE principles in a unified framework, this paper proposes D-Matrix (Fig. ): a generic firmware-software co-designed stream processing DAQ platform, structured around three foundational layers:
• Protocol layer: Defines standard D-Matrix data frame containers that abstract heterogeneous detector payloads. Abstracts spatiotemporal properties into stream data spaces, partitioning hierarchical data domains by granular attributes. These attributes directly govern module processing behaviors.
[Figure: Overview of the D-Matrix platform, spanning the protocol layer (standard D-Matrix data frame, spatiotemporal properties, data array/triad/cluster and stream organization), the hardware abstraction layer (BSP, MPP model, FPGA/FMC boards and servers over optical fiber, PCIe, and Ethernet), and the processing layer (dataflow control and processing module templates such as distributor, multiplexer, router, transcoder, extractor, filter, and reorganizer, connected via SDMF/SDMS ports, stream file descriptors, and ZeroMQ), supporting custom DAQ construction and computer-aided/computer-driven design automation.]
• Hardware abstraction layer: Implements a Board Support Package (BSP) and Multiple Point-to-Point (MPP) transmission model, standardizing access to heterogeneous compute resources and transport interfaces.
• Processing layer: Defines a model for configurable, reusable, and cascadable modules implemented across FPGA/software platforms. Establishes standardized interfaces for inter-module communication within each platform.
These foundational layers collectively enable two advanced capabilities: (1) custom DAQ system construction through modular cascading and domain configuration, and (2) the potential for computer-aided design automation. This paper serves as a comprehensive architectural overview, explaining the core design philosophy and the overall coordination of the platform's main layers, while the intricate technical details of individual components are detailed in companion papers [9-12].
Stream processing paradigm
In large-scale physical experiments, the DAQ system assumes critical responsibilities that include data transmission and aggregation, device access and control, and operational status monitoring.
These functionalities are delivered through distinct information carriers: scientific data information, control and feedback information, and operational status information.
This operational context naturally leads to the generation of diverse and mixed data. This mixture stems fundamentally from the traditional system's approach to data organization, where the hardware unit serves as the primary packaging entity. A direct outcome of this approach is that a single data frame concurrently contains various elements originating from the same source unit. Such intrinsic data heterogeneity poses a key challenge to DAQ system generalization: excessive variability in data definition and composition, making standardized processing unachievable.
Moreover, this lack of abstraction standardization in data definition inherently prevents unified hardware support. FPGA modules require consistent, well-defined data structures to implement efficient processing pipelines. The intrinsic heterogeneity of mixed data frames violates this requirement, forcing custom-designed FPGA implementations for each specific data mixture. Consequently, system generalization is often limited to the transport architecture level. For instance, a transparent transport layer can be realized via encapsulated protocols, while deeper hardware-level integration remains fundamentally unattainable.
In scenarios with mixed data transmission, all information types share the same physical link while maintaining logical independence. To overcome this data mixture challenge, D-Matrix establishes the stream abstraction as independent information units carrying pure data. Each stream is dedicated to carrying one and only one of these types, for example, scientific data, commands, command feedback, status indicators, or urgent alerts. Furthermore, this purity extends beyond mere categorization by information type, as distinct metadata types within the scientific data stream necessitate their existence as separate streams.
The stream abstraction offers significant advantages:
• Due to its decoupled nature, the transmission and processing of each stream are largely isolated, minimizing mutual interference. This allows different spatiotemporal generation granularities for different streams, thereby reducing bandwidth waste caused by redundant data duplication in mixed data transmissions.
• Due to its pure nature, modules within a stream's processing path focus exclusively on single responsibilities, enabling generalized processing through abstraction of both data and operations.
Building on this stream abstraction alongside modular processing, the D-Matrix platform constructs DAQ systems by defining the required processing modules within each stream's execution path.
STANDARD DATA PROTOCOL
Based on the characteristics of DAQ data, this section details the data protocol definition adopted by the D-Matrix platform.
Constructing a generic DAQ system necessitates a standardized definition for data containers. The incorporation of FPGAs into a generic, unified processing framework imposes stricter requirements on the granularity of data format specifications. However, achieving such generality and integration faces significant challenges arising from the inherent diversity of detector technologies and the divergent spatiotemporal granularities required across different processing stages.
A. Stream data space and data organization
A fundamental characteristic of scientific data in DAQ systems is its inherent spatiotemporal nature. Consequently, data points can be effectively represented within a two-dimensional matrix defined by time and space coordinates. Here, time and space constitute the matrix dimensions, with scientific data values populating the corresponding points. The D-Matrix platform adopts this matrix-based representation by introducing a unified stream data space, explaining its name 'Data Matrix'. The data space of a stream comprises all points of interest: temporal occurrences (such as trigger identifiers in triggered systems or time-slice identifiers in trigger-less systems), spatial locations (represented by channel identifiers), and corresponding data values.
Several typical data organization schemes are categorized as follows (a minimal illustrative sketch follows this list):
• Data Array: For continuous waveform sampling under a trigger-less mode, the output data points are continuous in both temporal and spatial dimensions and are processed as data arrays.
• Data Triad: Hit-producing detectors (e.g., Time-of-Flight (TOF) detectors) output sparse data structures containing fields such as coarse timestamps, channel identifiers, and fine time measurements, which are organized as data triads.
• Data Cluster: Waveform-recording detectors output data in clusters, typically structured as a channel identifier followed by a timestamp and a contiguous series of data points representing the full signal waveform.
• Sub-stream: The fourth organization scheme is not specific to a particular detector type. During system processing scenarios such as aggregating diverse information streams (e.g., global event building), or in cases where detailed data content is irrelevant, one or multiple data frames within a sub-stream are directly utilized as the data payload.
Data domain
The DAQ data transmission and processing workflows inherently operate at multiple distinct levels, each exhibiting well-defined physical spatiotemporal significance. As illustrated in Fig. , the spatiotemporally hierarchical data space is partitioned into nested regions, which constitute formally defined data domains.
A key characteristic of this model is the nested relationship between data domains, where a father data domain (representing a coarser spatiotemporal scale) encompasses and aggregates multiple child data domains. Crucially, the raw data points themselves remain unchanged as they are interpreted within these different domains; what evolves is the spatiotemporal context and granularity used to interpret them. This abstraction effectively separates the immutable data payload from the dynamic processing context.
[Figure: Nested data domains within the stream data space (father, current, and child domains along the time and space axes), together with the standard D-Matrix data frame layout: a header (stream frame head, stream frame length, father domain start time/space indices), data blocks in data array, data triad, or data cluster form (each with start time/space indices, block length, properties, and data), and a trailer (stream frame tail), framed by front and rear outband fields.]
For example, during front-end electronics packaging, temporal resolution operates at fine-grained levels (e.g., 10-ps Time-to-Digital Converter (TDC) bins), with spatial references bound to localized channel identifiers. Conversely, at the event-building phase, the temporal context transitions to coarse time slices or global trigger identifiers (e.g., 25-ns Large Hadron Collider (LHC) bunch-crossing IDs), while spatial references shift to globally remapped identifiers derived from assembly logic.
Standard D-Matrix data frame
The D-Matrix platform implements a unified frame format specification for all heterogeneous information streams. As shown in Fig. , the frame structure comprises three distinct segments: header, data block, and trailer sections. The data block functions as a subordinate data structure within the frame, enhancing the organizational flexibility of the payload. In the payload segment, this data model provides four data organization schemes to address heterogeneous output formats across subsystems, as previously described. The detailed specifications of this data frame container are provided in [ ].
Within the same data stream, the frame format definitions differ across data domains. The data block represents the data organization relationship within the current data domain. The Father Data Domain Start Time/Space Indices (FDSTI/FDSSI) in the frame header indicate the spatiotemporal coordinate position of the current frame within the father data domain.
Redundancy reduction in data representation
In large-scale physics experiments, the demand for high data rates poses a major challenge for transmission bandwidth, computational resources, and storage infrastructure, making it a primary cost driver in the construction of DAQ systems.
Therefore, the design of DAQ protocols must thoroughly consider minimizing redundant data transmission to alleviate bandwidth and storage pressures.
The data organization schemes mentioned above directly embody this approach to redundancy reduction and can be categorized based on their spatiotemporal density as follows (see the estimate sketched after this list):
• Extreme Density: When every spatiotemporal coordinate carries transmitted data, the format need only document the initial time-space coordinates. Subsequent data points are sequentially arranged, with each internal point's coordinates derivable from its positional offset within the stream. This constitutes the 'data array' format.
• Extreme Sparsity: Using data arrays for sparse data introduces excessive zero-fill invalid data points. Instead, each isolated sparse point is directly represented as a 'data triad' containing explicit [time, space, value] parameters, eliminating null data overhead.
• Locally Dense Clusters: For globally sparse yet locally dense distributions, a 'data cluster' format applies: one starting coordinate followed by sequential dense data points describes each concentrated region, balancing efficiency and precision.
Furthermore, the two-tier structure (frame and block levels) of the data frame inherently factors out high-order temporal-spatial information, which enables a substantial reduction in the bit widths required for temporal-spatial coordinates within the data payload.
Additionally, the transmission of specific fields within data frames can be configured per stream. The guiding principle is that D-Matrix frames transmit only mutable fields, while persistently static fields use preconfigured default values. For instance, in fixed-length frames, the Stream Frame Length field is configured not to be transmitted; stream processing modules instead derive frame length details from their default configuration parameters.
Configurable stream parameters
In the D-Matrix platform, both the data organization schemes and the content and length of data frame fields are highly flexible. Accordingly, a corresponding configuration mechanism is required.
Two configurable property tables are used to describe, respectively, the properties of a stream and the properties of a specific data domain within that stream: the Data Space Properties Table (DSPT) and the Data Domain Description Table (DDDT). These configuration tables are finalized during the system design phase. In software, they are distributed as global configuration files. In the FPGA implementation, these static configurations are set as parameter values when the stream processing modules are instantiated. When a stream processing module receives a data frame, it uses the Stream ID and data domain number contained in the frame header to look up the corresponding attribute tables and proceeds with subsequent processing.
The DSPT is uniquely determined by the Stream ID and is primarily used to configure the data organization scheme (i.e., data array, data triad, data cluster, or sub-stream) for that stream, as well as basic bit-width parameters (such as the bit width of a single data point within a data array).
The DDDT is jointly determined by the Stream ID and the data domain number. It is mainly used to configure the data frame format within the corresponding data domain, including whether each field is transmitted; if transmitted, its configured bit width; and if not transmitted, its default value. Additionally, it contains configuration information regarding data block organization schemes.
STANDARD HARDWARE AND INTERFACE ABSTRACTION
This section describes the abstraction of transmission interfaces and computing resources, aiming at arbitrary interconnectivity between hardware components and ease of hardware utilization.
A. Unified transmission model
To achieve arbitrary interconnectivity among heterogeneous devices, a unified transmission model is indispensable. DAQ systems in large-scale physics experiments incorporate diverse components, including DAQ-specific electronic boards, computing servers, computational accelerators, front-end electronics from detector subsystems, and ancillary devices. This hardware heterogeneity directly leads to a diversity of interfaces between hardware components, exemplified by optical fibers, electrical ports, PCIe buses, and Ethernet.
To address such interface diversity, application-layer logic requires decoupling from the transport layer, aligning with scalable system design principles that minimize protocol-specific dependencies.
[Figure: The MPP model: multiple independent streams between applications on servers or FPGA boards share one physical interface (optical fiber, LVDS, PCIe, IPBus, or ZeroMQ), with each stream connecting its own source and destination application.]
As established in the preceding discussion, D-Matrix abstracts heterogeneous information types into distinct data streams. Multiple independent data streams share one physical interface, with each stream connecting exclusively to its own source and destination application modules. D-Matrix introduces this multi-stream transmission over shared interfaces as the Multiple Point-to-Point (MPP) transmission model (Fig. ). The MPP module incorporates critical features such as priority scheduling, retransmission, and backpressure.
D-Matrix encapsulates heterogeneous connection types into this unified model, designed to support the following (a software analogy of the stream multiplexing follows this list):
• Hardware-to-hardware connections: Low-Voltage Differential Signaling (LVDS), Multipoint Low-Voltage Differential Signaling (M-LVDS), and optical fiber;
• Hardware-to-software connections: PCIe and IPbus [ ], with future extensions for Compute Express Link (CXL) [ ];
• Software-to-software connections: ZeroMQ [ ], with planned support for high-performance protocols such as Remote Direct Memory Access (RDMA).
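The following software-only analogy (not the firmware MPP implementation) shows the essence of the model: frames tagged with a stream ID share one channel, yet each stream retains point-to-point semantics between its own endpoints.

```python
# Analogy of MPP multiplexing: several logically independent streams share
# one physical link; demultiplexing by stream ID restores point-to-point
# semantics. Priority scheduling, retransmission, and backpressure from the
# real MPP module are omitted here.
from collections import defaultdict

class MppLink:
    def __init__(self):
        self.sinks = defaultdict(list)     # stream_id -> destination modules

    def bind(self, stream_id: int, sink) -> None:
        self.sinks[stream_id].append(sink)

    def send(self, stream_id: int, frame: bytes) -> None:
        for sink in self.sinks[stream_id]:
            sink(frame)

link = MppLink()
link.bind(1, lambda f: print("data stream:", f))
link.bind(2, lambda f: print("command stream:", f))
link.send(2, b"\x01 start_run")    # reaches only the command endpoint
```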
Hardware abstraction layer
Within DAQ systems, diverse computing resources are often employed, such as CPUs for the most versatile generic processing, GPUs specialized for massive parallelism, and FPGAs providing critical real-time efficiency and pipelining capabilities. This inherent resource heterogeneity introduces significant challenges to unified processing across the system, thereby necessitating an abstraction layer that decouples the underlying hardware specifics from the application layer implementation.
While different architectures exist, abstraction solutions for CPU hardware variations are well established. Operating systems, device drivers, and high-performance runtime libraries abstract away hardware-specific differences, enabling hardware-agnostic application development. Consequently, driven by these abstraction layers, the same program can be deployed and operated across diverse hardware implementations within the same architecture, while some specific cross-platform libraries can enable its execution even across differing architectures.
[Figure: BSP structure: MPP ports connecting to other entities; command, status, and urgent hubs routing the command/command feedback, status, and urgent streams; clock/synchronization signal handling; default local devices attached via a local bus or local processing signals; and pure stream interfaces exposed to the application layer.]
Unified architectures like CUDA enable GPU computing development in workflows similar to CPU-based systems, while technically, GPUs primarily serve as coprocessors within CPU-centered computing infrastructures.
In contrast, development on FPGAs presents significant challenges due to the lack of standardized tools. The absence of a unified abstraction layer forces developers to directly confront heterogeneity across FPGA models, including divergences in physical interfaces, available IP cores, and even computational primitives, while requiring substantial effort to navigate intricate implementation constraints.
For the generalized system to integrate these diverse hardware components, the D-Matrix platform introduces the Board Support Package (BSP, Fig. ), derived from the concept in the VxWorks real-time operating system (RTOS) [ ], as the hardware abstraction layer for the FPGA.
The core function of the BSP is to encapsulate the hardware details of the FPGA and provide a consistent interface for FPGA applications. Within a stream processing system where all interactions occur in the form of streams, the BSP consequently must provide a set of standardized stream interfaces as well as some default operations. These interfaces include:
• Interfaces to other entities: The BSP employs multiple MPP ports to communicate with other hardware entities in the system. All streams are transmitted through these MPP ports, with each port dedicated to connecting a specific external entity.
• Interfaces to local devices: All board-specific local hardware resources are connected to the BSP via a local bus or direct signal connections.
• Interface to the application layer: The BSP exposes pure stream interfaces to the application layer. By abstracting both local hardware resources and external connections, the BSP enables application-layer development to focus exclusively on stream processing logic without concern for underlying hardware details.
In addition to providing interface encapsulation, the BSP also implements a set of default operations specific to each board's characteristics. These default operations include:
• Command and command feedback for configuring, initializing, and controlling the hardware;
• Aggregation of the local hardware status (e.g., temperature, humidity, and current);
• Generation of default urgent messages (e.g., upon temperature or current anomalies).
These messages are then routed through their respective hubs for communication with other entities. It is noteworthy that the BSP also manages clock and synchronization signal processing. These signals can be configured to be generated locally, sourced via dedicated external links, or recovered directly from the data transmission links [ ].
The BSP fundamentally constitutes a dedicated code module that abstracts hardware-specific components, which traditionally required custom handling at the application layer, into a cohesive logical design. By performing this individualized design process once for each supported hardware platform, it lays the foundation for reusability, creating modules that can be utilized by subsequent generalized application-layer designs.
Since the Board Support Package (BSP) encapsulates hardware-logic mapping, a board comprising multiple hardware components will have a final BSP that is a composition of its individual component-level BSPs. For example, D-Matrix's hardware designs can adopt the FPGA Mezzanine Card (FMC) standard [18], which uses a carrier board for computational resources and a mezzanine module for interface flexibility. Each of these cards has its own dedicated BSP, and they are combined to form a complete composite BSP for the integrated board.
This hardware encapsulation introduces a critical isolation layer between the core processing logic and the physical hardware. This abstraction provides a unified, standardized interface, which not only simplifies the interconnection of heterogeneous hardware but also significantly enhances its reusability across different experimental setups. By abstracting away low-level details, the BSP transforms hardware from single-use, experiment-specific components into reusable resources. This allows the same hardware to be deployed across diverse scientific missions without requiring duplicative interface development or reapplying complex pin constraints. Consequently, the application logic can be ported across different platforms with minimal effort, significantly accelerating the development and deployment cycles.
Expanding FPGA Applications
Compared to other computing platforms, the most defining characteristic of D-Matrix lies in its heterogeneous nature, specifically its support for the generic computing of FPGAs. In the common multi-level aggregation architecture within DAQ systems, FPGA boards typically serve as intermediate nodes in a tree-like structure. However, in a generic computing platform, the topological possibilities for FPGAs extend far beyond this conventional arrangement.
FPGAs are inherently constrained by their available hardware resources in general-purpose computing, which limits the complexity of the processing tasks they can handle. Although FPGA capacities continue to grow, this trend often leads to significantly higher costs. Moreover, even these high-capacity FPGAs can be insufficient for certain demanding applications, such as real-time AI processing. Therefore, by analogy with high-performance computing clusters composed of CPUs and GPUs, one potential approach is to form a computing cluster of FPGAs interconnected by a network. In such an architecture, the interconnection can be implemented using dedicated commercial network switches to form the fabric, or by designating specific FPGA nodes as communication and switching hubs to achieve a more tightly integrated system. This allows the cluster to distribute complex processing tasks across multiple FPGA nodes efficiently.
In another usage scenario, instead of building a cluster purely from FPGAs, the FPGAs are integrated into existing CPU/GPU computing clusters to act as coprocessors. In this model, the CPU remains the primary unit for task scheduling, while the decomposed tasks are offloaded to the FPGAs for processing. The results are then collected and aggregated by the CPU.
These usage scenarios heavily rely on the hardware isolation of FPGAs and a well-defined encapsulation of the resources available for general-purpose computing. Furthermore, the stream processing paradigm is inherently suitable for the subtask distribution and result retrieval in coprocessor scenarios. Although these application models have not been fully engineered and deployed, their potential for future expansion should be considered during the architectural design phase.
STANDARD PROCESSING PARADIGM
This section presents the modular processing paradigm in D-Matrix, which encompasses standardized inter-module interconnection interfaces and generic computational considerations based on basic module patterns.
Module processing paradigm
The processing architecture of a generic DAQ system is simultaneously constrained by two fundamental sets of requirements derived from its scientific application scenarios: those on performance and those on functional complexity.
On one hand, these systems must satisfy inherently stringent real-time constraints and exceptionally high throughput demands, necessitating high-speed data acquisition and immediate processing capabilities. On the other hand, they must implement core data processing functionalities such as event building, High-Level Trigger (HLT) processing, and real-time data compression, which require substantial and flexible computational resources.
[Table: Standard module interface implementations. For software: the SDMS port internally; file I/O and ZeroMQ network sockets externally (RDMA planned).]
In response to the challenge of processing complexity, a modular design provides a fundamental approach for achieving key system objectives, such as enhanced flexibility, scalable architecture, and clear separation of responsibilities among components. As previously mentioned, D-Matrix employs the spatiotemporal data matrix as the fundamental form for data representation. Correspondingly, the stream processing modules within D-Matrix also utilize the spatiotemporal data matrix as the basic unit for data processing. By defining the spatiotemporal attributes of the data that a module handles, the same module can exhibit different specific behaviors. This approach fulfills diverse processing requirements and enhances reusability.
Each module is responsible for a single specific duty. For a given complex processing task, its implementation can be approached by decomposing the task into a cascading series of simpler stream processing modules, thereby constructing the intended functionality through their sequential or parallel arrangement. This requires that modules have the property of arbitrary interconnectivity; therefore, the definition of standard module interfaces is a prerequisite.
Where the underlying computing hardware is heterogeneous, D-Matrix enhances adaptability by providing hardware-specific implementations of its modules. Many modules are available in both FPGA firmware and CPU software versions, enabling flexible task placement and execution during system design.
In the modular stream processing paradigm, where data flows continuously through a chain of modules, the primary performance metric is often throughput, which can take precedence over latency. Provided the throughput requirement is met, a tolerable increase in latency is acceptable within buffer limits to ensure an uninterrupted data flow. At the software level, methods like increased parallelism and load balancing are employed to enhance throughput. However, these approaches are not directly applicable to FPGA processing. Instead, we impose a controllable pipelining requirement on FPGA modules to boost their throughput.
Standard module interface
To ensure the arbitrary interconnection capability between modules required for such flexible construction, standardized inter-module interfaces are a prerequisite.
For intra-FPGA module communication, the Standard D-Matrix FPGA (SDMF) port [ ] is implemented. The SDMF port is extended from the AXI4-Stream interface. Its key characteristic is that the spatiotemporal information of the data frame is transmitted synchronously with the data payload, ensuring that downstream modules can simultaneously access all information required for processing, thereby enhancing the pipeline performance of modules within the FPGA.
For intra-server module communication, the Standard D-Matrix Software (SDMS) port [ ] governs software interaction. The core design principle of SDMS is to minimize data copying overhead. It employs a message queue coupled with a dual shared-memory architecture, which partitions a shared memory segment into a payload area and a descriptor area. Within a single server, all modules operating on the same data stream utilize the same shared memory segment. Data transfer between these modules is achieved by sending the starting address of the current data frame within the shared memory through the message queue, thereby eliminating the copying overhead for data transmission within the node. This design allows operations such as data frame reorganization to be performed by modifying only the descriptor area. By altering the order in which descriptors point to the data payloads, the logical relationship of the data can be restructured without physically moving large volumes of data, enhancing processing efficiency.
As to inter-entity communication, transcoder modules at the transmission boundary handle protocol conversion between the SDMF port and AXI4-Stream. Similarly, for software modules, universal interfaces like files or ZeroMQ sockets are provided for external interaction, with data being serialized. Furthermore, we plan to develop an RDMA-based interface in the future to enable direct memory-to-memory data transfer between nodes, thereby significantly reducing the overhead associated with memory copies and serialization/deserialization. These interface implementations are categorized in Table .
Such standard module interface definitions grant the flexibility of free interconnection between modules, thereby allowing the construction of data processing paths for stream processing systems through the cascading of modules.
Standard stream processing modules
The D-Matrix platform categorizes standard DAQ modules into two primary classes: dataflow control modules and data processing modules.
Dataflow control modules encompass: (1) 1-to-N distributors for fan-out operations, (2) N-to-1 multiplexers for data concentration, (3) fully connected switching modules for N-to-N routing, and (4) transcoding modules for interface conversion and cross-node transmission.
The data processing modules are primarily responsible for the actual implementation of functionalities, including the split module, feature extractor module, trigger module, and flowmeter module, among others. Furthermore, event building constitutes the core functionality of the DAQ system. In D-Matrix, the merge module implements a generic event building algorithm based on the configuration of data domain attributes, as detailed in [ ].
Basic module pattern and complex task decomposition
Beyond the basic functions of data distribution and event assembly in DAQ, this streaming framework actually possesses the capability to implement more complex processing.
Implementing such complexity directly in FPGA firmware often relies on custom programming, which challenges processing generality and reuse. To overcome this, we draw upon a fundamental systems theory: the behavior of a complex real-time system is determined by both its current internal state and sequences of historical states. This state-dependent nature implies that a complex task can be conceptually decomposed into a series of simpler, more manageable subtasks or states.
However, mere theoretical decomposability does not guarantee efficient hardware implementation. The critical engineering challenge lies in standardizing the implementation of these common subtask patterns to avoid reinventing the wheel for each new application. It is this need for standardization that logically leads to the adoption of a template-based methodology.
Analogous to the template-derivation paradigm, D-Matrix abstracts three base patterns [ ]. During the design phase of each basic module pattern, we first ensure the pipelining characteristics of its individual components and the overall structure. Then, different algorithmic functions are incorporated into this base pattern to derive various concrete functional modules. This approach is fundamentally different from HLS: while HLS focuses on translating a computational function into hardware, our basic pattern framework primarily emphasizes the satisfaction of pipelining constraints first.
Base patterns and their exemplary instantiations include (a software analogy of pattern derivation follows below):
• Extractor: e.g., 1-D peak finder, multiplicity extractor;
• Filter: e.g., trigger filter, 2-D weighted average calculator;
• Reorganizer: e.g., merge, incremental nearest-neighbor clusterer.
In addition to basic pattern derivation, more complex tasks can also be accomplished through the cascading of multiple derived modules. Although it has not been theoretically proven that these three basic templates can be combined to address all stream processing tasks, their practical validity has been demonstrated in applications such as Time Projection Chamber (TPC) cluster reconstruction and two-dimensional Hough transforms [ ], among others.
Module configurability
To fulfill generic design requirements, all modules feature standardized configurability, categorized into interfacial adaptability and behavioral configurability. Behavioral configurability encompasses two aspects: predefined configuration rules bound to specific data characteristics (implemented via data stream and data domain properties), and module-customized configuration parameters. In addition to the parameters configured via the configuration file, standard modules also feature a command interface that can connect to the command network, enabling dynamic updates to their operational parameters.
UNIFIED CONTROL MECHANISM
Control mechanisms in DAQ systems face operational variability due to heterogeneous hardware resources and diverse experimental modes, demanding standardized approaches. Generic requirements encompass device-agnostic hardware control and generic execution of preset operational procedures.
Component access
To achieve device-agnostic hardware control, the platform necessitates unified access protocols and standardized identification mechanisms. As previously discussed, large-scale physics experiments incorporate numerous connection types. While each connection method itself may possess mature communication and node identification schemes, a more universally unified solution is still essential for a heterogeneous platform. The connections between these devices are typically wired bidirectional links. Benefiting from the communication and encapsulation capabilities of the MPP model in the D-Matrix generic platform, component access is achieved without establishing additional physical links.
To address the hierarchical topology commonly found in DAQ systems, we have developed a multi-root tree-like topology model complemented by a command forwarding and routing mechanism [10]. This design enables automatic node traversal, yielding a routing-based identification outcome that fulfills clustering and hierarchical node management requirements. Within this node identification algorithm, CPU-based servers are treated as distinct nodes, while GPUs function as attachments to CPU nodes and do not possess independent identifiers. In contrast, FPGA boards are consistently regarded as independent nodes, whether they are standalone units in a chassis or installed in server slots. Theoretically, this approach can be generalized to arbitrary graph-based topological structures, provided the graph remains connected without isolated nodes.
Run control
A key design principle is the decoupling of the DAQ software from detector-specific front-end electronics (FEE) control, facilitated by open and extensible interfaces. Simultaneously, given the variability in operational modes and detector configurations, the DAQ platform must deploy generic workflow control mechanisms to minimize redundant development efforts. D-Matrix implements a universal control module leveraging Python's dynamic loading capability, which resolves functions by name at runtime to achieve architectural decoupling between the control kernel and the operational logic components. Designers can extend functionality by adding new functional groups or extending existing groups. The control module dynamically invokes these functions through a standardized syntax: group_name + command_name [+ optional_parameters] [ ] (a minimal dispatch sketch is given after this subsection).
Furthermore, during experimental operations, D-Matrix partitions workflows into distinct operational phases, including initialization, parameter configuration, synchronization, acquisition start/stop, and global reset. Configuration and control behaviors for individual detectors across these phases are defined via phase-specific functions in Python scripts. These scripts, authored entirely by detector designers, implement device-specific configurations through register read/write commands. Scripts for different operational modes can be independently configured and modified, enabling runtime switching based on mode selection. Additionally, DAQ functionalities leverage script-based approaches for enhanced flexibility in implementing simpler tasks.
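A minimal sketch of this name-resolved dispatch follows, assuming a layout in which each functional group is a Python module; the module and function names are illustrative, not D-Matrix's actual ones.

```python
# Toy version of the group_name + command_name [+ optional_parameters]
# dispatch: the control kernel resolves functions by name at runtime and
# never needs compile-time knowledge of detector-specific logic.
import importlib

def dispatch(command_line: str):
    group_name, command_name, *params = command_line.split()
    group = importlib.import_module(group_name)   # e.g., a detector group module
    func = getattr(group, command_name)           # resolved by name at runtime
    return func(*params)

# A detector designer would supply, e.g., a hypothetical tof_fee.py exposing
# phase-specific functions (initialize, configure, start_run, ...) built from
# register read/write commands:
# dispatch("tof_fee configure high_gain")
```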
Online reconfiguration
Changing experimental requirements and scales between runs demand online reconfigurability. For FPGA-based deployments, remote logic updates are implemented using vendor-specific partial reconfiguration mechanisms, with Xilinx devices utilizing MultiBoot [ ] technology and Intel FPGAs employing Remote System Update (RSU) [ ] schemes.
For software reconfiguration, modules utilize configuration file swaps during operation, with execution management handled by Supervisor [ ], which determines the active modules and configuration file paths per server. Batch updates across cluster nodes are orchestrated using Ansible automation tools.
DAQ SYSTEM CONSTRUCTION
Construction process
Based on the D-Matrix components introduced earlier for generalized design, the construction process of an actual DAQ system is illustrated through a multi-level merging scenario as follows:
1. Requirements Assessment: Evaluates overall system data attribute requirements and processing demands to determine the number of merging stages and hardware units required to satisfy performance and I/O constraints.
2. Hardware Connection: Establishes the physical hardware topology by interconnecting hardware components according to the determined merging hierarchy and hardware quantity.
3. Stream Attribute Planning: Partitions scientific data streams into data domains based on design-phase processing requirements, generating a data domain attribute table while maintaining consistent attributes for the status, command, feedback, and urgent message streams across systems.
4. Core Processing Module Design: Identifies hardware levels for each merging stage and designs configuration parameters for merging modules through functional module selection and configuration file generation from the library.
Step 3 (Stream Attribute Planning) and Step 4 (Core Processing Module Design) can be executed before or after Step 2 (Hardware Connection), depending on the design scenario. In requirements-driven approaches, stream attributes and modules are defined first to guide hardware integration. Conversely, hardware connections may be prioritized based on factors like cost or scalability. This flexibility allows the D-Matrix platform to adapt to various experimental needs without compromising modularity or performance.
5. Auxiliary Module Design: Supplements the design with inter-node dataflow control modules, transcoding modules, and frame checkers associated with merging modules, alongside command/status/urgent message modules per hardware node. Additionally, this step requires comprehensive consideration of the performance requirements and processing capabilities of each data processing path to configure an appropriate buffer size for the inter-module interfaces.
6. Node-Level Configuration File Generation: Generates final module lists and configuration files for each hardware node, including inter-module connection relationships based on the preceding module designs (a hypothetical example follows step 7).
7. Code Generation and Deployment: The software phase initiates with pre-deployment of binary executables using infrastructure tools such as Ansible, followed by per-node customization of module and supervisor configuration files. For FPGA cards, this involves combining the card-level BSP and connecting functional modules to generate application-layer HDL code. Subsequently, the BSP and application-layer code are integrated to construct the complete logic implementation, which is ultimately deployed through remote bitstream update capabilities.
Towards automation in DAQ design
Leveraging computers to assist human designers is a common practice, exemplified by CAD software in the industrial domain and EDA software in the electronics field. The adoption of computer-aided methods in DAQ design can also greatly enhance development efficiency. The D-Matrix platform already inherently possesses some prerequisites for realizing Computer-Aided System Design for DAQ (CASD-DAQ).
[Figure: Structure of the CASD framework: a resource library of entities and modules (database/YAML files); a CASD_core runtime with entity and module loaders; an optimizer and topology loader producing the final stream graph; a stream-graph simulator reporting behavior (stream properties per interface) and performance (throughput, latency); and a user-setting interface (resource URLs, optimization targets).]
The realization of CASD can be broadly divided into two phases. The first phase involves using computers to accelerate the development process, while the second phase aims for fully automated computer-driven design.
1. Computer-Aided Design Phase
During the Computer-Aided Design Phase, the overall system-level design is still undertaken by the DAQ designer. The designer must plan the hardware connectivity topology and the system processing flow, select the specific processing modules to be employed, and then interconnect and configure these processing modules via GUI operations or configuration files. The subsequent development steps, including automatic code generation and deployment, can be assumed by computational tools.
Realizing this process relies on several prerequisites. First, the configuration and connectivity of processing modules are ensured by D-Matrix's standardized abstraction of modules and the standardization of module interfaces. Second, automated code deployment depends on a unified access model and online reconfigurability, both of which are addressed in preceding sections.
Regarding automated code generation, the software domain inherently supports system deployment via configuration files, since the software modules adopted by D-Matrix are already configured through these files; this eliminates the need for code generation or modification. In the FPGA domain, by contrast, code generation is required, but it is fully achievable with the support of the BSP, and relevant developments are under active investigation.
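As an indicative sketch of what BSP-assisted generation could look like, the short Python fragment below emits HDL instantiations from a module list. The module types, port names, and splicing step are assumptions for illustration, not the platform's actual code generator.

    # Hedged sketch: emit application-layer HDL text from a module list;
    # module/port names and the BSP template are hypothetical.
    modules = [
        {"type": "event_merger", "name": "merger_0"},
        {"type": "frame_checker", "name": "chk_0"},
    ]
    # Standardized interfaces make connections a pure naming exercise.
    connections = [("chk_0.m_stream", "merger_0.s_stream0")]

    def emit_instance(mod):
        return f"{mod['type']} {mod['name']} ( /* standardized stream ports */ );"

    def emit_wire(src, dst):
        return f"// wire {src} -> {dst} via a standard stream interface"

    app_layer = "\n".join(
        [emit_instance(m) for m in modules]
        + [emit_wire(s, d) for s, d in connections]
    )
    # The generated block is spliced into the card-level BSP top module,
    # then synthesized and deployed via remote bitstream update.
    print(app_layer)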
2. Computer-Driven Design Phase
Within the Computer-Driven Design Phase, after the system designer defines coarse-grained processing requirements, the computer autonomously refines the processing flow, selects the necessary processing modules, and determines their interconnections and configurations. One viable structure for such automated design is outlined below (Fig. ):

Resources: Maintain the resources required for system design through a resource library, including available hardware boards, functional modules, and their detailed attributes.

Input: The target system's hardware connection topology and a coarse-grained flow graph (including core processing module configurations) are input by means of configuration files.

Optimizer: Perform iterative optimization based on resource constraints and optimization strategies, automatically allocating processing modules to target hardware entities.

Simulator: During iteration, output behavioral simulation and performance simulation results via the simulator for designer supervision.

Output: The deployment configuration for each target entity module is similarly output in the form of configuration files.
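For illustration, the Input stage above could be expressed as a configuration file along the following lines. This is a speculative sketch consistent with the YAML-based stream graphs shown in the figure; all keys and values are invented for the example.

    # Hypothetical CASD-DAQ input: hardware topology plus a
    # coarse-grained flow graph (all names illustrative).
    topology:
      entities: [fpga_card_0, fpga_card_1, merge_server_0]
      links:
        - [fpga_card_0, merge_server_0]
        - [fpga_card_1, merge_server_0]
    streamgraph:
      - stage: merge_l1
        module: event_merger
        placement: auto            # to be resolved by the Optimizer
      - stage: merge_l2
        module: event_merger
        placement: auto
    optimize_target: throughput    # user setting driving the iteration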
Subsequent code generation and deployment follow an identical procedure to that of the preceding design phase.

VIII. CONCLUSION

This paper has presented D-Matrix, a generic firmware-software co-designed stream processing platform for constructing DAQ systems in large-scale physics experiments. Built upon the SHARE principles, D-Matrix provides a holistic solution to critical challenges such as heterogeneous data formats, diverse hardware interfaces, and stringent real-time processing demands. D-Matrix employs a strategy of standardization and generalization across its core components. This approach, applied to data protocols, hardware and interface abstraction, and processing paradigms, is key to significantly enhancing the generality and usability of heterogeneous systems, especially those integrating FPGAs. The platform significantly reduces development costs and time-to-deployment by enabling extensive reuse of standardized components while maintaining adaptability to diverse detectors and experiments.

Based on the development of the D-Matrix platform, the DAQ system has been deployed at the CEE experiment [ ], achieving 73.6 Gbps throughput and processing 20,890 events/s. Future implementations target the High Energy Fragment Separator (HFRS) [ ] and Super Tau-Charm Facility (STCF) [ ].

Furthermore, the inherent modularity and standardization within D-Matrix provide essential foundations for automated DAQ design and an extensible ecosystem. By standardizing component definitions, the platform enables third-party contributions of compliant hardware, firmware, and software modules, ensuring long-term extensibility and iterative evolution. With the enhanced generality of the platform, significant improvements have been achieved in its applicable scenarios, the diversity of hardware connectivity, and the complexity of processing tasks it can support. Consequently, such a platform now transcends its original purpose, evolving into a generic heterogeneous real-time computing platform rather than being confined to the single application scenario of data acquisition.

Future work will expand module libraries, explore new high-performance communication methods, enhance cross-platform portability, and advance CASD-DAQ capabilities.
REFERENCES

[1] V. Friese, CBM Collaboration, et al., The high-rate data challenge: computing for the CBM experiment. Journal of Physics: Conference Series, 112003 (2017). 10.1088/1742-6596/898/11/112003
[2] R. Bartoldus, C. Bernius, D. W. Miller, Innovations in trigger and data acquisition systems for next-generation physics facilities. arXiv preprint (2022). arXiv:2203.07620 [hep-ex]
[3] J. Gutleber, S. Murray, L. Orsini, Towards a homogeneous architecture for high-energy physics data acquisition systems. Computer Physics Communications, 153, 155–163 (2003). 10.1016/S0010-4655(03)00161-9
[ ] P. Moreira, R. Ballabriga, S. Baron, et al., The GBT Project. In: Proceedings of the Topical Workshop on Electronics for Particle Physics. (CERN, Geneva, Switzerland, 2009), p. 342–
[ ] W. Wu, FELIX: the New Detector Interface for the ATLAS Experiment. IEEE Transactions on Nuclear Science, 986–992 (2019).
[ ] K. Biery, C. Green, J. Kowalkowski, et al., artdaq: An Event-Building, Filtering, and Processing Framework. IEEE Transactions on Nuclear Science, 3764–3771 (2013). 10.1109/TNS.2013.2251660
[7] J. Gutleber, L. Orsini, Software Architecture for Processing Clusters Based on I2O. Cluster Computing, 55–64 (2002).
[ ] T. Stockmanns, PANDA Collaboration, et al., FairMQ for Online Reconstruction - An example on PANDA test beam data. Journal of Physics: Conference Series, 032021 (2017).
[ ] T. Wang, J. Yang, H. Wang, et al., A Generic Streaming Software Platform Design for High-Energy Physics Data Acquisition Systems. IEEE Transactions on Nuclear Science, 101–109 (2021).
[ ] Z. Sun, J. Yang, L. Zhang, T. Wang, R. Liu, K. Song, A generic node identification and routing algorithm in a distributed data acquisition platform: D-Matrix. Journal of Instrumentation, P12012 (2023).
[ ] L. Zhang, J. Yang, T. Wang, Z. Sun, K. Sun, J. Zeng, Generic Event Building Algorithm in a Distributed Stream Processing Data Acquisition Platform: D-Matrix. IEEE Transactions on Nuclear Science, 105–112 (2023). 10.1109/TNS.2023.3235904
[ ] Yang, Generic Control Software Structure in a Distributed Data Acquisition Platform: D-Matrix. PyHEP 2023: Python in High Energy Physics Users Workshop. (CERN and PyHEP Collaboration, Online, Oct. 2023), Lightning talk.
[ ] L. Zhang, J. Yang, Z. Sun, et al., D-Matrix: FPGA-Based Solutions for General Stream Processing in High-Energy Physics Experiments. IEEE Transactions on Nuclear Science, 2209–2218 (2024).
[ ] B. Li, K. K. Gakhal, MultiBoot with 7 Series FPGAs, Application Note XAPP1247. (Xilinx, 2017).
[ ] C. G. Larrea, K. Harder, D. Newbold, D. Sankey, A. Rose, A. Thea, T. Williams, IPbus: a flexible Ethernet-based control system for xTCA hardware. Journal of Instrumentation, C02019 (2015).
[ ] Intel Corporation, Agilex™ 7 Configuration User Guide, Document 683673, Chapter 5: Remote System Update (RSU). (Intel Corporation, 2025).
[ ] D. Das Sharma, R. Blankenship, D. Berger, An Introduction to the Compute Express Link (CXL) Interconnect. ACM Computing Surveys, 1–37 (2024).
[ ] Agendaless Consulting and Contributors, Supervisor: A Process Control System. (2025).
[15] ZeroMQ: An open-source universal messaging library. (2025).
[ ] Wind River Systems, VxWorks 7 BSP Development Guide. (2025).
[ ] Red Hat, Inc., Ansible Collaborative. (2025).
[ ] L. Lü, H. Yi, Z. Xiao, et al., Conceptual design of the HIRFL-CSR external-target experiment. Science China Physics, Mechanics & Astronomy, 60, 012021 (2017). 10.1007/s11433-016-0342-x
[ ] J. Zeng, J. Yang, L. Zhang, Z. Sun, K. Sun, FPGA Implementation of Fixed-Latency Command Distribution Based on Aurora 64B/66B. IEEE Transactions on Nuclear Science, 1348–1356 (2024).
[ ] L. Sheng, X. Zhang, J. Zhang, et al., Ion-optical design of High energy FRagment Separator (HFRS) at HIAF. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 1–9 (2020).
[18] R. Seelam, I/O design flexibility with the FPGA mezzanine card (FMC). Xilinx White Paper WP315 (2009).
[ ] L. Zhang, Research on the Design of a General Stream Processing Architecture Based on FPGA in the Nuclear and Particle Physics Experiment Data Acquisition System. Ph.D. dissertation, University of Science and Technology of China, Hefei, China (2025).
[ ] M. Achasov, X. Ai, L. An, et al., STCF conceptual design report (Volume 1): Physics & detector. Frontiers of Physics, 14701 (2024).