Diversity-repeatability trade‑off governs hierarchical cell fate coding in multicellular development
Chunxiuzi Liu, Yu Liu, Guoye Guan
Submitted 2025-12-08 | ChinaXiv: chinaxiv-202512.00064 | Original in English

Abstract

Multicellular development in plants and animals gives rise to a rich spectrum of cell fates performing complex physiological functions. How these cells differentiate in concert from a single-celled zygote to collectively form tissues and organs across the whole body remains a fundamental question in developmental biology. In this study, we applied the Ladderpath approach to decompose the spatially-aligned cell fate sequence of the embryo of the worm \textit{Caenorhabditis elegans}, which produces 671 uniquely identifiable cells with defined fates (\textit{incl.} germline, intestine, muscle, neuron, pharynx, skin, and other). This approach automatically dissects how sub-sequence modules (Ladderons) self-repeat and mutually assemble in a hierarchical architecture, uncovering a delicate trade-off between diversity and repeatability in cell fate coding. This architecture, consisting of highly diverse yet repeated Ladderons, indicates that the actual cell fate sequence deviates substantially from both homogeneous and heterogeneous extremes, achieving an exceptional level of hierarchical complexity near the theoretical maximum. Genetic-algorithm-based virtual evolution further reveals such complexity as an optimization goal in development, with pseudo sequences spontaneously converging toward similar hierarchical architectures designed through moderate Ladderon numbers and lengths. Notably, the longest Ladderons highlight intercellular Notch signaling as a key mechanism for enhancing hierarchical complexity via coordinating lineal differentiation. More broadly, the repeated lineal differentiation programs serve as an essential strategy for enhancing hierarchical complexity, a pattern recurrently observed in reality. Together, this work offers new insights into design rules of cell fate coding operating from molecular to multicellular scales and provides a theoretical framework for identifying and interpreting the regulatory mechanisms beneath.

Full Text

Preamble

Chunxiuzi Liu a,b,c, Yu Liu a,b,1, and Guoye Guan d,e,1

This manuscript was compiled on December 7, 2025.

Significance

Multicellular life encodes a variety of cell fates (intestinal, muscular, neuronal, pharyngeal, hypodermal, etc.), all of which arise from a single zygote to form a physiologically functional body with multiple tissues and organs. How these complex cell fates are organized in concert is a fundamental question in developmental biology. In this study, the fate sequence of all terminal cells in the embryo of the worm Caenorhabditis elegans is decomposed, highlighting two complementary organizing modes: the mutual assembling of sub-sequence modules and their self-repeating. This diversity-repeatability trade-off yields a hierarchical architecture of these modules approaching the theoretical maximum in complexity, suggesting an outcome of evolutionary optimization. Such exceptional complexity emerges when the organism repeats the same differentiation programs across lineages (via intercellular signaling).

Keywords: hierarchical complexity | evolutionary optimization | diversity-repeatability trade-off | cell fate | differentiation program

Life is encoded in a sequence-based manner, with complexity across multiple scales: from the associated DNA (deoxyribonucleic acid, encoded with four types of nucleotide bases), RNA (ribonucleic acid, with four types of ribonucleoside bases), and protein (polypeptide, encoded with 20 types of amino acids) to their joint molecular profiles differentiated in cells, which encode spatial sequences of cell types or cell fates (hereafter uniformly referred to as “cell fate”) with specific physiological roles. How a pool of elements (e.g., nucleotide bases, ribonucleoside bases, amino acids, and cell fates) is organized into specific sequences within an immense design space to achieve desired functions is not only an experimental question but also a theoretical one.

Author affiliations: a Department of Systems Science, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai 519087, China; b International Academic Center of Complex Systems, Beijing Normal University, Zhuhai 519087, China; c School of Systems Science, Beijing Normal University, Beijing 100875, China; d Department of Systems Biology, Harvard Medical School, Boston 02115, USA; e Department of Data Science, Dana-Farber Cancer Institute, Boston 02215, USA

Competing interests statement: The authors declare no competing interest.

2 To whom correspondence should be addressed.

The discovery of molecular-scale organization has faithfully demonstrated how biological sequences are designed to deviate from the extremes of homogeneity (repeatability) and heterogeneity (diversity) in order to achieve complexity; for example, promoter sub-sequences are repeatedly encoded in a DNA sequence for different genes (repeatability), while other sequence domains also encode potentially repeated exons, introns, start codons, stop codons, open reading frames, other promoters, and so on (diversity).

After the helical structures of DNA and RNA were revealed in the mid-20th century, the sequencing of complete genomes, first in the worm (Caenorhabditis elegans, C. elegans) and later in human (Homo sapiens, H. sapiens), was accomplished around the turn of the 21st century, further revealing a remarkable degree of conservation underlying the shared repertoire of both genes and cell fates. Moreover, protein sequences (associated with the sequences of their DNA/RNA templates) and structures have been continuously characterized, revealing that proteins fold toward low free-energy states to form canonical functional secondary and tertiary structures such as α-helices and β-sheets; today, their sequences, structures, and functions can be accurately predicted through molecular dynamics, physics-informed deep learning, and more general artificial intelligence, catalyzed by both experimental validation and applications.

Large-scale in silico experiments, particularly virtual evolution, have been instrumental in extracting key features and unveiling design rules underlying biological system optimizations, spanning molecular and multicellular scales and extending even to organismal and behavioral scales. For instance, at the molecular scale, amino acid sequences of polypeptides are optimized to ensure stable protein folding under a range of biophysical constraints, such as thermodynamic stability, structural designability, hydrophobic demixing, structural atypicality, kinetic accessibility, translational regulation, and so forth. Bridging the molecular and cellular/multicellular scales, the molecular interaction networks underlying cell polarity in the worm Caenorhabditis elegans and cell fate in the fruit fly Drosophila melanogaster, both of which are sequentially distributed along the body axis, are optimized to stabilize an asymmetric interface and to maximize positional information (alongside robustness), respectively. These cases indicate that the complexity of living systems originates from the dynamic interplay between functional demands and biophysical constraints, optimized over billions of years of natural selection.

However, full sequences and design rules for cell fates within multicellular organisms remain relatively underexplored, owing to the difficulty of precisely tracking individual cell fates throughout the whole body and to the overwhelming complexity of the body itself. Nevertheless, such multi-element, digit-like spatial sequences, primarily along the anterior-posterior axis, are conserved across species from higher to lower levels: (1) During somitogenesis in vertebrates represented by human, mouse, fish, and frog, binary gene expression patterns exhibit a periodic oscillation. (2) Prior to gastrulation in invertebrates represented by fruit fly and beetle, regional gene expression patterns exhibit variable region numbers, positions, and sizes, depending on the combined activations and inhibitions of dozens of genes. (3) Similar cell fate sequences can even realize cellular resolution and individual reproducibility in worm and ascidian, which are hallmarks of an invariant sequence-based lineage (Fig. ).

However, how sequences encompassing cell fates such as death, germline, intestine, muscle, neuron, pharynx, skin, and others are designed within a complete developing organism remains largely unclear. Unveiling their design rules will not only help us understand fundamental science in terms of biology and physics, but also provide knowledge for the fields of synthetic biology and materials science. Previous decompositions and simulations of cell fate sequences in worm and ascidian embryos revealed that self-repeating modules (corresponding to cellular differentiation programs executed upon cell division that give rise to the following cell fate modules) are favored in certain lineal parts, such as the C. elegans ABarapp lineage (1/64 of the complete lineage). To comprehensively address how cell fates are encoded to form tissues and organs differentiated from a single-celled zygote in a complete multicellular life, this study adopts C. elegans as the model organism, considering its well-characterized, cell-resolved differentiation programs and strong conservation with humans in genetics, genomics, and cell fates.

Decomposition of the full C. elegans embryonic cell fate sequence along body axes uncovers a complex hierarchical architecture consisting of diverse yet repeated sub-sequence modules, with a complexity level approaching the theoretical maximum. This exceptional hierarchical complexity is optimized quantitatively and integratively for the entire body rather than tailored to individual body parts. On one hand, in silico experiments suggest that the trade-off between diversity and repeatability among sub-sequence modules is achieved through their moderate numbers and lengths, as well as through the repeated lineal differentiation programs; on the other hand, in vivo experiments suggest that known mechanisms like intercellular Notch signaling contribute to such a trade-off and its resulting exceptional hierarchical complexity.

Together, this work presents design rules of cell fate coding in a complete multicellular life, bridging realistic mechanisms with theoretical interpretation.

Results

C. elegans Cell Fate Sequence Exhibits Exceptional Hierarchical Complexity with Diverse, Repeated Sub-Sequence Modules.

Building on over forty years of genetic research that has fully mapped the stereotyped cell fate sequence in C. elegans embryos, we treat it as a natural developmental blueprint and compare it with random sequences generated artificially. After fertilization, the C. elegans zygote undergoes four consecutive rounds of cell divisions, producing a lineage tree of 671 terminal cells that no longer divide during embryogenesis (Fig. , Fig.S1).

The cell division orientations are primarily aligned along the anterior-posterior (a-p) axis (89.44%), with fewer aligned along the left-right (l-r) axis (9.91%) and the dorsal-ventral (d-v) axis (0.65%). Thus, a systematic nomenclature was devised that appends the letter “a”, “p”, “l”, “r”, “d”, or “v” to each mother cell’s name, thereby unambiguously naming its daughter cells based on their relative initial positions. This yields a 1 × 671 one-dimensional cell fate sequence at the end of embryogenesis, with each digit denoting death (abbr., “D”, 16.84%), germline (abbr., “G”, 0.30%), intestine (abbr., “I”, 2.98%), muscle (abbr., “M”, 12.07%), neuron (abbr., “N”, 36.07%), pharynx (abbr., “P”, 11.92%), skin (abbr., “S”, 13.26%), or other (abbr., “O”, 6.56%), ordered by well-documented lineal and positional information (Fig. , Fig.S1; Table S1).
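The composition quoted above can be checked for internal consistency by converting the percentages back into integer cell counts. The following is a minimal sketch under stated assumptions: simple rounding is assumed, and all function names (`fate_counts`, `daughter_names`, `homogeneous_sequence`, `heterogeneous_sequence`, `composition`) are illustrative rather than taken from the study. It also builds two reference sequences with the same composition, one arranged block by block and one shuffled at random, in the spirit of the homogeneous and heterogeneous extremes discussed in this work.

```python
import random
from collections import Counter

N_CELLS = 671
# Percentages quoted in the text for each fate abbreviation.
FATE_PERCENT = {"D": 16.84, "G": 0.30, "I": 2.98, "M": 12.07,
                "N": 36.07, "P": 11.92, "S": 13.26, "O": 6.56}

def fate_counts(percent=FATE_PERCENT, n_cells=N_CELLS):
    """Convert the rounded percentages back to integer cell counts."""
    return {fate: round(n_cells * p / 100) for fate, p in percent.items()}

def daughter_names(mother, axis="ap"):
    """Name two daughters by appending one axis letter each to the mother's name."""
    first, second = axis  # e.g. "a" and "p"
    return mother + first, mother + second

def homogeneous_sequence(counts):
    """Same-fate digits arranged block by block."""
    return "".join(fate * n for fate, n in counts.items())

def heterogeneous_sequence(counts, seed=0):
    """Same composition, with positions shuffled at random."""
    digits = list(homogeneous_sequence(counts))
    random.Random(seed).shuffle(digits)
    return "".join(digits)

def composition(seq):
    """Count how often each fate occurs in a sequence."""
    return Counter(seq)

counts = fate_counts()
```

Both reference sequences share the element types, numbers, and proportions of the real sequence, so any difference in a downstream metric would come from ordering alone.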

Here, we apply the Ladderpath approach, a method specially devised to decompose a generic one-dimensional sequence into self-repeating and mutually assembling sub-sequence modules of varying lengths (termed “Ladderons”, which together deterministically constitute a Laddergraph that intuitively visualizes their hierarchical architecture). The decomposed sequence is characterized by two key parameters, one previously proposed and one newly introduced (with mathematical details provided in Materials and Methods): (1) “Diversity level” measures the number of distinct sub-sequence modules; (2) “Repeatability level” measures the degree to which a sequence is constituted by repeating rather than diversifying sub-sequence modules; (3) “Hierarchical complexity” measures the degree to which a complex hierarchical architecture emerges from the integration of both repeating and diversifying sub-sequence modules.

Figure caption. C. elegans embryonic cell lineage tree that gives rise to 671 terminal cells, each with a distinct fate. Considering that cell divisions are primarily oriented along the anterior-posterior (a-p) axis (89.44%), with fewer oriented along the left-right (l-r) axis (9.91%) and dorsal-ventral (d-v) axis (0.65%), each cell name is systematically extended by combining its mother’s name as a prefix with “a”, “p”, “l”, “r”, “d”, or “v” as a suffix denoting its initial location relative to its sister; accordingly, anterior daughters (and, to a lesser extent, those oriented more left or dorsal) are positioned on the left side of the lineage tree, with their posterior (or right/ventral) sisters on the right. To manage the illustration size, the full lineage tree is split into sublineages derived from the embryo’s first eight cells (i.e., ABal, ABar, ABpl, ABpr, MS, E, C, and P3 consisting of D and P4); an unsplit illustration is provided in Fig.S1. Both the cell lineage tree nomenclature and layout strictly adhere to the conventions established in the original literature. Legend: a, anterior; p, posterior; l, left; r, right; d, dorsal; v, ventral. Fate labels: Death, Germline (Z2/Z3), Intestine, Muscle, Neuron, Pharynx, Other.

For comparison, an extremely homogeneous sequence like AAAAAAAAAAAAAAAAAAAAAAAAA yields a low diversity level, a high repeatability level, and a low hierarchical complexity; an extremely heterogeneous sequence like ABCDEFGHIJKLMNOPQRSTUVWXY yields a high diversity level, a low repeatability level, and a low hierarchical complexity; most importantly, a sequence integrating both diverse and repeated sub-sequence modules yields a moderate diversity level, a moderate repeatability level, and a high hierarchical complexity, reflecting its complex hierarchical architecture.

Ladderpath decomposition of the full C. elegans fate sequence outputs a complex, hierarchical Laddergraph (Fig. (a)) consisting of 68 sub-sequence modules that both self-repeat and mutually assemble frequently (Table S2). Strikingly, its hierarchical complexity approaches the theoretical maximum of 1. In comparison, such complexity vanishes (falls to 0) if the sub-sequence modules only self-repeat (extremely homogeneous sequence) or only mutually assemble (extremely heterogeneous sequence) while preserving the element types, numbers, and proportions of C. elegans (Table ). This strongly supports that the C. elegans cell fate sequence is designed to achieve an exceptional hierarchical complexity. Note that by definition, each sub-sequence module appears at least twice within the terminal cell fate sequence. Therefore, on one side, self-repeating symbolizes the same differentiation program appearing across close lineage positions; for example, the progeny of the E lineage end only in intestine and the progeny of the D lineage end only in muscle. On the other side, mutual assembling symbolizes different differentiation programs appearing across distinct lineage positions; for example, the progeny of the MS lineage end in death, muscle, pharynx, and other (Fig. , Fig.S1; Table S1). Apparently, the integration of these two modes enables a concurrent expansion of both cell number and cell fate across the entire body of an embryo.

Hierarchical Complexity of C. elegans Cell Fate Sequence Is Optimized at the Whole-Organism Scale, Not for Every Individual Body Part.

Whereas C. elegans exhibits exceptional hierarchical complexity near the theoretical maximum, far away from that of either an extremely homogeneous sequence or an extremely heterogeneous sequence, we ask whether this optimization operates not only in the entire body but also in individual body parts derived from the somatic founder cell lineages. Hence, we examine the hierarchical complexity of partial cell fate sequences derived from all the somatic founder cell lineages, each originating from the first to fourth rounds of consecutive asymmetric cell divisions (driven by molecular interaction networks capable of cell polarization): AB (with P0 as mother and P1 as sister), EMS (with P1 as mother and P2 as sister), C (with P2 as mother and P3 as sister), and D (with P3 as mother and P4 as sister) (Fig. , Fig.S1; Table S1). Their cell fate sequences exhibit hierarchical complexity values from 0 to 0.942, spanning nearly the full range from the theoretical minimum (0) to the theoretical maximum (1) (Table ). Comparable to the full lineage, the AB lineage (deriving cells destined for death, muscle, pharynx, and skin) and C lineage (deriving cells destined for death, muscle, neuron, and skin) also exhibit exceptional hierarchical complexity, 0.918 and 0.942 respectively, as both accommodate at least four distinct cell fates. However, the D lineage (deriving cells destined for muscle only) exhibits zero hierarchical complexity as expected, attributed to its repeatability without diversity.

Last but not least, although the EMS lineage accommodates at least five cell fates, it exhibits only moderate hierarchical complexity (0.586), suggesting that the way cell fates are encoded, beyond their mere element numbers, types, and proportions, influences their collective hierarchical architecture. Although both the EMS and C lineages accommodate multiple cell fates, the EMS lineage with over twice the element number still exhibits a lower diversity level and a higher repeatability level, consistent with previous observations that certain C. elegans lineal parts (e.g., the ABarapp lineage) tend toward the repeatability mode to avoid redundancy in differentiation programs. To sum up, while the hierarchical complexity of the full cell fate sequence represents an evolutionary optimization goal for a complete multicellular life as a whole, the organism’s individual body parts can adjust their developmental manners to fulfill specific physiological functions required in reality.

Cell Fate Sequence Evolves Exceptional Hierarchical Complexity through an Optimal Trade-Off between Sub-Sequence Module Diversity and Repeatability.

Inspired by the C. elegans cell fate sequence, which contains an abundance of self-repeating and mutually assembling sub-sequence modules and near-maximum hierarchical complexity, we hypothesize that hierarchical complexity serves as a quantitative metric or developmental constraint orchestrating these two modes in concert. Hereafter, we refer to them as the repeatability and diversity modes. We employ a genetic algorithm to evolve a one-dimensional sequence while preserving the element types, numbers, and proportions of C. elegans (see Materials and Methods). A population of 4,000 in silico cell fate sequences is initialized from each of two extreme cases with zero hierarchical complexity: the extremely homogeneous sequence with element types arranged sequentially and the extremely heterogeneous sequence with element types mixed randomly (Fig. (a)).

For each iterative round, the 1,200 sequences (30%) with the highest hierarchical complexity values are retained; the remaining 2,800 (70%) are first formed into 1,400 pairs of parents subjected to crossover, with a crossover probability of 0.8 to exchange a randomly picked digit between them, and then subjected to a mutation probability of 0.03 to reset a randomly picked digit into one of the other fates, accompanied by another randomly picked digit undergoing the opposite resetting. Updating the pool of 4,000 samples through 3,000 iterative rounds, each case continuously evolves toward a hierarchical complexity of one as set (Fig. (b)); the terminal Laddergraphs consistently exhibit complex hierarchical architectures in which sub-sequence modules self-repeat and mutually assemble frequently, mirroring that of C. elegans in nature (Fig.S2). Remarkably, such continuous evolution of hierarchical complexity is equivalent to the product of two terms positively linear with diversity level and repeatability level respectively, indicating that the optimization goal in natural evolution is the strong bonding of these two modes in trade-off, rather than a bias toward either of them alone (Fig. (c), Fig.S5). More specifically regarding sub-sequence module (Ladderon) organization, it is noteworthy that both evolutionary trajectories converge on a moderate Ladderon number, meaning that a moderate level of diversity mode is required (Fig. (d)); coherently, the maximum Ladderon length is also moderate, meaning that a moderate level of repeatability mode is required (Fig. (e)).

Figure caption. Laddergraphs decomposing the C. elegans cell fate sequence alongside two extremes (i.e., one homogeneous and one heterogeneous). While the homogeneous and heterogeneous sequences exploit pure self-repeating and pure mutually assembling respectively, the C. elegans sequence integrates both modes; accordingly, the extreme sequences exhibit zero complexity, whereas the C. elegans sequence attains exceptional hierarchical complexity near the theoretical maximum, reflected in its intricate Laddergraph. Fate labels: Death, Germline, Intestine, Muscle, Neuron, Pharynx, Other.

Table (caption fragment): C. elegans cell fate sequences, in comparison to those of random sequences. Column headers include Lineage Origin and Death.

Moreover, the dynamics of the above parameters along evolutionary trajectories initiated from the two extreme cases demonstrate a clear trade-off between the diversity and repeatability modes, where an increase in one is inherently coupled with a decrease in the other, and vice versa. To our surprise, the optimal Ladderon number obtained from virtual evolution is 4.5, very close to the C. elegans value of 76 in the real world, providing strong evidence that hierarchical complexity governs the design of the C. elegans cell fate sequence by enforcing a delicate trade-off between diversity and repeatability.
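The selection and mutation steps of the virtual evolution described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the real fitness is the hierarchical complexity computed by Ladderpath (defined in Materials and Methods), which is replaced here by a hypothetical stand-in `toy_fitness`; the crossover step is omitted because the text does not specify how it preserves composition; and all function names are illustrative. The mutation is read as a swap of two positions holding different fates, which matches "a reset accompanied by the opposite reset" and keeps element proportions constant.

```python
import random
from collections import Counter

ELITE_FRACTION = 0.3   # 1,200 of 4,000 sequences retained per round
MUTATION_PROB = 0.03   # mutation probability quoted in the text

def toy_fitness(seq):
    """Hypothetical stand-in score: number of distinct adjacent digit pairs."""
    return len({seq[i:i + 2] for i in range(len(seq) - 1)})

def same_composition(a, b):
    """True if two sequences contain each fate the same number of times."""
    return Counter(a) == Counter(b)

def mutate(seq, rng):
    """Reset one digit to another fate and apply the opposite reset elsewhere.

    This equals swapping two positions that hold different fates, so the
    element types, numbers, and proportions stay unchanged.
    """
    digits = list(seq)
    if len(set(digits)) < 2:
        return seq  # nothing to exchange in a single-fate sequence
    i = rng.randrange(len(digits))
    others = [j for j, d in enumerate(digits) if d != digits[i]]
    j = rng.choice(others)
    digits[i], digits[j] = digits[j], digits[i]
    return "".join(digits)

def next_generation(population, fitness, rng):
    """Keep the top 30% unchanged; pass the remaining 70% through mutation."""
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = round(len(ranked) * ELITE_FRACTION)
    elite = ranked[:n_elite]
    rest = [mutate(s, rng) if rng.random() < MUTATION_PROB else s
            for s in ranked[n_elite:]]
    return elite + rest
```

Calling `next_generation` repeatedly on a pool of equal-composition sequences gives one evolutionary trajectory; swapping `toy_fitness` for an actual hierarchical complexity function would be needed to reproduce anything resembling the reported curves.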

Intercellular Notch Signaling and Downstream Differentiation Programs Contribute to Hierarchical Complexity of C. elegans

Since hierarchical complexity arises from a delicate trade-off between repeating and diversifying cell fate sub-sequence modules (Ladderons), we wonder how this is realized across a cleaving lineage (Fig. ; Fig.S1; Table S1). The representative natural developmental blueprint in C. elegans is known to have roughly half of its cell fate coding realized by intercellular signaling. Among the signaling pathways involved, Notch signaling is highly conserved across species of varying complexity (e.g., worm, fly, frog, zebrafish, mouse, and human). It comprises a set of cell-membrane-bound molecules (i.e., ligands and receptors) that transmit signals from a signaling cell to a responding cell via physical contact, after which downstream differentiation programs induce the formation of various tissues and organs (e.g., the head and kidney in C. elegans). Within the longest Ladderons of C. elegans (Fig. ; Fig.S2; Table S2), we identified two overlapping with downstream differentiation programs of the 3rd and 4th Notch signaling events (corresponding to the fifth- and first-longest Ladderons respectively); namely, the ABplaaa lineage differentiates from the ABpraaa lineage after receiving the 3rd Notch signaling for head development, and the ABplpapp lineage differentiates from the ABprpapp lineage after receiving the 4th Notch signaling for kidney development (Fig. (a)). To study the role of these two signaling-modulated Ladderons, we implement three types of cell fate sub-sequence module disruptions: 1. the left lineage is replaced by the right lineage; 2. the right lineage is replaced by the left lineage; 3. the left and right lineages are swapped (Fig. (a)). Note that prior experiments have already verified that blocking the 3rd Notch signaling event results in a Type 1 disruption and blocking the 4th Notch signaling event results in a Type 2 disruption. Intriguingly, while none of the Type 1-3 disruptions elevates the hierarchical complexity, those matching experimentally verified signaling-blocking outcomes cause the most pronounced reduction (Fig. (b)).

Further, we ask whether the full cell fate sequence, which is known to be shaped by numerous differentiation programs including not only intercellular Notch signaling but also intercellular Wnt signaling, cell polarization, and cell size segregation, among others, is globally organized under the principle of hierarchical complexity.

To answer this question, we implement a more general sequence disruption via stochastic digit mutations. When the digit at each position is assigned an equal probability of 0.5 of being switched to one of the other cell fates, the hierarchical complexity declines to zero through 1,000 iterative rounds (Fig. (c)).
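One round of the stochastic digit mutation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the replacement fate is assumed to be drawn uniformly from the other seven fates, which the text does not specify, and the function name `stochastic_mutation` is illustrative. Unlike the composition-preserving mutation used in virtual evolution, this disruption is not constrained to keep element proportions constant.

```python
import random

FATES = "DGIMNPSO"  # death, germline, intestine, muscle, neuron, pharynx, skin, other

def stochastic_mutation(seq, p=0.5, rng=None, fates=FATES):
    """Switch each digit, with probability p, to one of the other fates."""
    rng = rng or random.Random()
    out = []
    for digit in seq:
        if rng.random() < p:
            # Assumption: the new fate is chosen uniformly among the others.
            out.append(rng.choice([f for f in fates if f != digit]))
        else:
            out.append(digit)
    return "".join(out)
```

Applying this function repeatedly and recomputing hierarchical complexity after each round would give one simulated trajectory of the kind averaged in the corresponding figure panel.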

This strongly suggests that the exceptional hierarchical complexity of C. elegans relies not only on the intercellular Notch signaling just examined, but also on the coordinated action of differentiation programs operating throughout embryogenesis. As with Ladderons 1 and 5 linked to Notch signaling, other Ladderons may be linked to additional intercellular signaling or differentiation programs, warranting further investigation and serving as a valuable resource for identifying underexplored differentiation programs or regulatory mechanisms (Table ).

In Real Developmental Context, Cellular Differentiation Programs May or May Not Be Repeated Across Lineages, Roughly in a Half-Half Ratio.

Provided that downstream differentiation programs modulated by intercellular Notch signaling give rise to cell fate sub-sequence modules (Ladderons) repeated across distinct lineage positions (Table S2), we next inspect whether other differentiation programs are repeated, too (Table S3). To this end, we define the stemness of any cell in the complete lineage by the fates of all its progeny (represented as a 1 × 8 vector in which 0 and 1 denote the absence and presence of each unique fate); its cellular differentiation program is then defined as the combined stemness of this cell and its two daughters (represented as a 3 × 8 matrix in which the 1st row corresponds to the mother, while the 2nd and 3rd rows correspond to the anterior/left/dorsal and posterior/right/ventral daughters respectively) (Fig. (a)).
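The stemness vector and the 3 × 8 differentiation program defined above can be sketched in a few lines. This is an illustration under stated assumptions: progeny are identified by name prefix, which only holds where a daughter's name is its mother's name plus one letter; the fate ordering in `FATES`, the function names, and the four-cell toy lineage are all hypothetical and not real C. elegans data.

```python
FATES = "DGIMNPSO"  # death, germline, intestine, muscle, neuron, pharynx, skin, other

def stemness(cell, terminal_fates):
    """1 x 8 binary vector: which fates occur among the cell's terminal progeny."""
    present = {fate for name, fate in terminal_fates.items()
               if name.startswith(cell)}
    return tuple(int(f in present) for f in FATES)

def differentiation_program(cell, suffixes, terminal_fates):
    """3 x 8 matrix: the mother's row followed by the two daughters' rows."""
    first, second = suffixes  # e.g. ("a", "p")
    return (stemness(cell, terminal_fates),
            stemness(cell + first, terminal_fates),
            stemness(cell + second, terminal_fates))

# Hypothetical four-cell toy lineage (terminal cell name -> fate).
toy = {"Xaa": "N", "Xap": "D", "Xpa": "M", "Xpp": "M"}
program = differentiation_program("X", ("a", "p"), toy)
```

Because each program is a plain tuple of tuples, identical programs in different cells compare equal, so counting repeated programs across a lineage reduces to counting duplicates.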

Remarkably, we identify a diverse pool of 118 non-identical cellular differentiation programs across 670 non-identical dividing cells (Fig. ; Fig.S1; Table S1), of which 58.47% are repeated in at least two cells, suggesting that the delicate diversity-repeatability trade-off operates not only in the sub-sequence modules of the full cell fate sequence but also in the cellular differentiation programs of the complete lineage that derives them (Table S3).

Moreover, we introduce two parameters to assess how cellular differentiation programs are repeated from one cell to another across distinct lineage positions:

1. “Lineage Distance” assesses the degree of separation between two cells with respect to the complete lineage, based on the generation of their most recent common ancestor (e.g., the lineage distance between Da and Dp is as low as 1, whereas that between ABalapappaa and ABplapapppp is as high as 9). The observed lineage distances span the theoretically allowed range, indicating that a cellular differentiation program can be repeated immediately in two daughter cells following cell division or spontaneously in two distant cells that have presumably undergone substantial differentiation before somehow converging on the same fate (Fig. (b)).

2. “Lineage Coherence” assesses the degree of symmetry between two distant cells with respect to the complete lineage, based on the difference of their lineage positions (e.g., the lineage coherence between ABalapaaa and ABprppppp is low, whereas that between ABalapaaa and ABplapaaa is high). The observed lineage coherence spans the theoretically allowed range, indicating that cellular differentiation programs can either preserve or break symmetric lineage positions (just like those obeying or disobeying anterior-posterior, left-right, and dorsal-ventral lineage symmetry). The ones with high lineage coherence appear to involve single fates repeated (e.g., N → N & N, S → S & S) or diversified (e.g., N/D → N & D), while the ones with low lineage coherence appear to be modulated by mechanisms such as Notch signaling and are hardly repeated (e.g., N/D/P/S → D/P/S & N/D/P/S in ABalpa and ABara, which receive Notch signaling and thereby lose their originally high lineage coherence) (Fig. (c)).

Overall, lineage distance and lineage coherence collectively suggest that while a part of the cellular differentiation programs take place only once or a few times, the other part is widespread across the complete lineage.

Figure. Virtual evolution for cell fate sequences from low to high hierarchical complexity (γ). (a) Schematic diagram of cell fate sequence and decomposition. Beginning with two extremes (i.e., one homogeneous and one heterogeneous, constrained by the element types, numbers, and proportions observed in C. elegans (Table )), a genetic algorithm is applied to evolve these sequences by incrementally increasing their hierarchical complexity (γ). (b) Increase of hierarchical complexity (γ) over iteration rounds. (c) Increase of the product of two terms positively linear to the diversity level and the repeatability level (diversity × repeatability). (d) Increase of hierarchical complexity (γ) when the Ladderon number approaches a moderate value (80 ± 9.35), suggesting that a middle level of Ladderon diversity strategically emerges. (e) Increase of hierarchical complexity (γ) when the maximum Ladderon length approaches a moderate value (17 ± 8.07), suggesting that a middle level of Ladderon repeatability strategically emerges.

Figure. Contribution of intercellular signaling and differentiation programs to hierarchical complexity (γ) in C. elegans. (a) Schematic diagram of three types of in silico lineage disruption (indicated by arrows; symbolized by “Type 1”, “Type 2”, “Type 3”) deviating from the one in nature (plotted lineage tree; symbolized by “Nature”), according to the Notch signaling events related to Ladderons (differentiating lineages: ABplaaa and ABpraaa; ABplpapp and ABprpapp). (b) Change of hierarchical complexity (γ) in natural and disrupted lineage cooperations. Here, the deviated lineage cooperations reported by previous in vivo genetic perturbation experiments are marked by stars, whereas the others are marked by empty circles. (c) Decreasing trend of hierarchical complexity (γ) over iteration rounds of stochastic digit mutations. For each discrete value, independent stochastic simulations are taken into account, and the mean (line) and standard deviation (shade) of their values are shown.
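The lineage distance defined in this section can be illustrated with a minimal sketch. It assumes, beyond what the text states, that cell names follow the lineage-based nomenclature in which a daughter's name is its mother's name plus one letter, that both cells belong to the same founder sublineage (so the shared string prefix names the most recent common ancestor), and that the distance is counted from the deeper of the two cells. The function name `lineage_distance` is hypothetical.

```python
def lineage_distance(name_a: str, name_b: str) -> int:
    """Generations separating two cells from their most recent common ancestor.

    Assumption: the shared string prefix of the two names is the name of the
    most recent common ancestor (true only within one founder sublineage).
    """
    shared = 0
    for char_a, char_b in zip(name_a, name_b):
        if char_a != char_b:
            break
        shared += 1
    if shared == 0:
        raise ValueError("names share no prefix; ancestry cannot be read from the strings")
    # Count divisions from the common ancestor down to the deeper cell.
    return max(len(name_a), len(name_b)) - shared


print(lineage_distance("Da", "Dp"))                    # 1
print(lineage_distance("ABalapappaa", "ABplapapppp"))  # 9
```

Both printed values reproduce the examples quoted in the text (sister cells Da and Dp, and two cells whose common ancestor is AB).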

Lineal Differentiation Program Repeatability Significantly Enhances Hierarchical Complexity in a Cleaving Lineage from Zygotic Stemness to Differentiated Fates.

Given that our top-down analysis of C. elegans data reveals how differentiation programs are orchestrated (Figure ), can a bottom-up analysis of a general cleaving lineage better elucidate how such programs might emerge from scratch? To test this idea, we numerically sample cell lineage cleavages along with random cellular differentiation programs at large scale, so as to construct a comprehensive lineage-to-sequence design space (see Materials and Methods).

Here, we initiate round-by-round bifurcation starting from one cell and expanding to 512 cells through 9 rounds of division, during which each cell division may be accompanied by random differentiation (Fig. (a)). The initial zygote possesses full stemness, represented as a 1×8 vector of ones denoting the potential to give rise to fates A, B, C, D, E, F, G, and H (the number and types of elements are adjustable in principle); after each division, every fate is independently deleted in a daughter cell with a given differentiation probability. By varying this probability up to 0.9 through both magnitude and equidistance changes, we obtain a total of 16,000 sampled lineages, each with a clear record of all cell fate codes across 1023 cells and of the cellular differentiation programs across 511 dividing cells. To assess more quantitatively how “cellular differentiation programs” are orchestrated across distinct lineage positions, we likewise extend this concept from a focal cell and its two daughters to all its progeny by appending rows of binary cell fate codes for all its anterior/left/dorsal and posterior/right/ventral progeny, generation by generation; we call this the “lineal differentiation program”. Meanwhile, the terminal cell fate sequence is derived from the non-identical cell fate codes, from which the hierarchical complexity is subsequently calculated.
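The sampling procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `differentiate`, `sample_lineage`, and `terminal_sequence` are hypothetical, the two daughters are assumed to differentiate independently, cells whose code becomes all zeros are simply kept, and distinct terminal codes are assumed to be treated as distinct sequence elements.

```python
import random

FATES = "ABCDEFGH"  # eight illustrative fates, as in the text


def differentiate(code, p, rng):
    """Return a daughter code: each fate still present is deleted with probability p."""
    return tuple(bit if bit == 0 or rng.random() >= p else 0 for bit in code)


def sample_lineage(p, rounds=9, seed=None):
    """Simulate round-by-round bifurcation; returns one list of fate codes per generation."""
    rng = random.Random(seed)
    generations = [[(1,) * len(FATES)]]  # zygote with full stemness
    for _ in range(rounds):
        daughters = []
        for mother in generations[-1]:
            daughters.append(differentiate(mother, p, rng))  # anterior/left/dorsal daughter
            daughters.append(differentiate(mother, p, rng))  # posterior/right/ventral daughter
        generations.append(daughters)
    return generations


def terminal_sequence(generations):
    """Label each distinct terminal fate code with an integer, in order of first appearance."""
    labels = {}
    return [labels.setdefault(code, len(labels)) for code in generations[-1]]


lineage = sample_lineage(p=0.2, seed=1)
print(len(lineage[-1]), sum(len(g) for g in lineage))  # 512 1023
```

With 9 rounds the sketch yields 512 terminal cells, 1023 cells in total, and 511 dividing cells, matching the counts given in the text; the resulting terminal sequence would then be passed to a Ladderpath decomposition, which is not reproduced here.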

As the differentiation probability for a single cell fate varies from low to high, hierarchical complexity exhibits a pronounced pulse-like pattern (Figure (b)). This pulse is consistent with the results above: a low value produces lineages with an extremely homogeneous cell fate sequence (repeatability), whereas a high value produces lineages with an extremely heterogeneous cell fate sequence (diversity). Unexpectedly, a similar pulse also exists for the proportion of cells with a repeated lineal differentiation program (Figure (c)). Their joint relationship directly shows that this proportion is positively correlated with the hierarchical complexity and monotonically raises its lower bound, with values already touching the theoretical maximum when the proportion reaches 0.2 (Fig. (d); Fig. S6).

Interestingly, C. elegans makes full use of this strategy: it operates at a proportion as high as 0.694, effectively pushing its hierarchical complexity to γ = 0.972.

Discussion

How biological sequences are designed to fulfill specific functions across scales has remained a central question for decades, from DNA sequences at the molecular scale (which are transcribed into RNA and then translated into functional proteins) to cell fate sequences at the multicellular scale (which form functional tissues, organs, and bodies). In this study, we aim to address a fundamental question in developmental biology: how are cell fates encoded in development to form a complete multicellular organism?

We took advantage of the Ladderpath approach to systematically decompose the full cell fate sequence in the worm C. elegans, revealing a complex, hierarchical architecture consisting of diverse yet repeated sub-sequence modules (Fig. , Fig. ). These modes of diversity and repeatability symbolize two compensatory ways in which the full cell fate sequence is constituted: mutual assembling between modules and self-repeating of modules.

Further virtual evolution based on a genetic algorithm indicated that the near-maximum hierarchical complexity is achieved by a delicate trade-off between diversity and repeatability (with moderate Ladderon numbers and lengths) that maximizes the product of their levels in a simple mathematical relationship (Fig. ).

In silico disruption of the realistic C. elegans lineage demonstrated that the previously reported Notch signaling transduced between neighboring cells, together with its downstream differentiation programs, substantially contributes to the hierarchical complexity observed in vivo (Fig. ). In addition, cellular differentiation programs were broadly distributed across the complete lineage, with roughly half being repeated in other cells and the remaining half being isolated (exemplified by cases modulated by Notch signaling) (Fig. ). Motivated by these statistics of cellular differentiation programs, further large-scale lineage sampling that models these programs suggested that repeating lineal differentiation programs (incl. cellular differentiation programs within the lineage derived from an ancestor cell) across multiple cells is essential to achieve an exceptional hierarchical complexity, a strategy naturally exploited by C. elegans (Fig. ).

The above-mentioned findings highlight promising future research directions worth investigation.

First, although C. elegans cell fate sequence is proven to be near-optimal in theory, many alternative sequence designs can achieve comparable fitness.

Perhaps the current Ladderpath approach decomposes only the cell fate sequence itself, while neglecting other important information about space (i.e., how differentiated cells are positioned in space to form functional tissues, organs, and body) or resources (i.e., how limited polarity and signaling molecules are positioned in space to trigger cellular differentiation programs). In addition, more finely specified cell fates, such as those grouped under “Other” and the sub-fates within one major group, may conceal deeper hierarchical architectures and additional design rules. In the future, Ladderon deployment and lineage bifurcations could be mapped onto the three-dimensional geometry of the embryo to link cell position cues with cell fate modules and gain insight into how spatial patterning is coordinated, potentially alongside optimization goals proposed before in this scheme (e.g., cell volume segregation ratio and cell migration).

Second, because the 3 Notch signaling events are found to be related to the longest Ladderons, it is worth investigating whether other Ladderons are related to other regulatory mechanisms or differentiation programs, including but not limited to those relevant to Wnt signaling and cell polarization (Table S2). This might provide useful clues for identifying their theoretical roles, especially when analyzed jointly with cell-lineage-resolved gene-expression profiles from thousands of publicly available embryo samples.

Third, supported by the exceptional hierarchical complexity in C. elegans as well as the contribution of Notch signaling, it is worth studying whether designing the cell lineage and fate sequence for complexity, with a delicate trade-off between diversity and repeatability, is conserved among organisms and serves as a general principle. Public datasets with cell fate documentation in the lineage tree (e.g., other nematodes and ascidians) or in spatial distribution (e.g., spatial transcriptomics in vitro) may also be amenable to exploration.

The combination of bottom-up simulation (virtual evolution and random sampling) and top-down statistics (C. elegans cell fate and lineage) used in this study suggests that biological sequence analyses need not be limited to the typical molecular scale; instead, they allow exploration, by computer, of vast design spaces beyond those accessible in nature. These methods may also be applicable to other sequence studies, such as those of DNA, RNA, and proteins, which likewise involve self-repeating and mutually assembling sub-sequence modules. Besides, whether the optimal diversity-repeatability trade-off principle is also at work in other, similar systems remains a fascinating question. For example, in cooperative systems involving teamwork among agents (e.g., academic labs or research institutions), it may likewise be optimal to distribute members across a moderate range of research directions and collaborations (diversity) with certain resources in each (repeatability), rather than concentrating everyone on a single project or dispersing them into unrelated pursuits. This principle may also operate in broader social and cultural systems, which could be explored through a generalized agent-based model (instead of an intact sequence), where agents differ in identity yet cooperate to achieve both collective and individual functions.

Figure. Lineage position relationship of non-identical cellular differentiation programs repeated from one cell to another. (a) Schematic diagram of lineage distance and lineage coherence. (b) Distribution (illustrated by a fan chart) of lineage distance for all dividing cells in C. elegans embryogenesis. (c) Distribution (illustrated by a histogram) of lineage coherence for all dividing cells in C. elegans embryogenesis.

Materials and Methods

Ladderpath Approach for Sequence Decomposition.

Among various sequence analytical approaches, the Ladderpath approach provides a transparent, white-box methodology that explicitly decomposes a sequence into self-repeating and mutually assembling components of varying lengths, thereby quantifying their repeatability and diversity in a principled manner. This decomposition intuitively displays how these components collectively constitute a full sequence with an underlying hierarchical structure. Owing to its ability to handle sequences of arbitrary length and arbitrary numbers of element types, the Ladderpath approach has been successfully applied in design rule discovery and engineering guidance across a wide range of scenarios, including amino acid sequences in protein synthesis and parameter sequences in artificial neural networks.

To illustrate how the Ladderpath approach decomposes a sequence, consider an arbitrary 25-bit sequence as an example: ABACDA CDA B DB ABACDA D DB CDAB. First, the single-bit components (i.e., the individual element types) constitute the most foundational layer, written as A, B, C, D. Next, all multi-bit components that repeat at least twice are regarded as “Ladderons” and placed on upper hierarchical layers according to the following rules. Ladderons in which a single-bit component self-repeats or multiple single-bit components assemble mutually constitute the second layer, written after the first layer as:

A, B, C, D // CDA, DB

(here, CDA self-repeats four times and DB self-repeats twice in the full sequence). The remaining Ladderons, those mutually assembled from Ladderons in lower layers, are then placed onto successively higher layers, so the third layer is written after the second layer as:

A, B, C, D // CDA, DB // ABACDA, CDAB

(here, ABACDA self-repeats twice and CDAB self-repeats twice), and so on.

To represent the hierarchical structure, each Ladderon in the formula is annotated with the cumulative number of times (>1) it self-repeats or mutually assembles with another Ladderon to form a higher-level Ladderon or the full sequence, which is equal to the number of arrows emanating from that Ladderon in the Laddergraph (a graph that intuitively visualizes how all Ladderons self-repeat and mutually assemble to constitute the full sequence); to avoid redundancy, and because every multi-bit Ladderon is already repeated at least twice by definition, we omit annotating a repeat number of 2. Eventually, the formula becomes A(3), B(3), C, D(3) // CDA, DB // ABACDA, CDAB (Fig. ). While the sum of the annotated numbers (3+3+1+3+1+1+1+1 = 14) represents how many operations (incl. self-repeating or mutually assembling) are required to constitute the full sequence, subtracting it from the sequence length (ω = 25 − 14 = 11) represents the extent to which the sequence is constituted through repeating rather than diversifying its components.

Figure. Random simulation for cellular and lineal differentiation programs. (a) Schematic diagram of cell lineage cleavage and differentiation. Beginning with the zygote, each cell’s stemness is encoded by a 1×8 vector in which 0 and 1 denote the partial stemness to derive a single cell fate A, B, C, D, E, F, G, H, or a combination of them in the future. (b) Pulse distribution of hierarchical complexity (γ) over differentiation probability for a single cell fate. For each discrete value, a total of 1,000 independent random simulations are executed, and the mean (point) and standard deviation (bar) of their values are shown. (c) Pulse distribution of cell proportion with repeated lineal differentiation program over differentiation probability for a single cell fate. For each discrete value, a total of 1,000 independent random simulations are executed, and the mean (point) and standard deviation (bar) of their values are shown. (d) Raised lower bound of hierarchical complexity (γ) by cell proportion with repeated lineal differentiation program. For each discrete value, a total of 1,000 random simulations are executed, and the range (line) between the upper bound (point) and lower bound (point) of their values is shown.

The ω value reflects how the hierarchical structure of sequence components gains complexity as it deviates from two extreme cases. In the case of an absolutely homogeneous sequence, exemplified with S = 25 (i.e., AAAAAAAAAAAAAAAAAAAAAAAAA), its Ladderpath representation A(4) // AAA // AAAAAA // AAAAAAAAAAAA yields a large ω, indicating an intensive self-repeating mode (repeatability) (Fig. (b)). In the case of an absolutely heterogeneous sequence, exemplified with S = 25 (i.e., ABCDEFGHIJKLMNOPQRSTUVWXY), its Ladderpath representation A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y yields ω = 0, indicating an intensive mutual-assembling mode (diversity) (Fig. (c)). Normalizing ω to the 0-1 range allows us to define a new metric, also ranging from 0 to 1, that evaluates the degree to which a sequence integrates the two modes cooperatively to gain hierarchical structural complexity.

$$\gamma = 1 - \left| \, 2\,\frac{\omega(x) - \omega_0(S)}{\omega_{\max}(S) - \omega_0(S)} - 1 \, \right| \qquad [1]$$

where ω_0(S) and ω_max(S) denote, respectively, the average and maximum values of ω obtained from numerous random sequences of length S sharing the same digit pool.
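The bookkeeping of the 25-bit example and Eq. [1] can be checked with a short sketch. It does not implement the Ladderpath decomposition itself: the annotated counts are copied from the sum quoted in the text, the function names are hypothetical, and the values passed for ω_0 and ω_max in the last line are placeholders chosen only to exercise the formula (the text does not report them for this example). No clamping is applied if ω falls outside the interval between ω_0 and ω_max.

```python
def count_occurrences(sequence: str, module: str) -> int:
    """Count (possibly overlapping) occurrences of module in sequence."""
    return sum(sequence.startswith(module, i)
               for i in range(len(sequence) - len(module) + 1))


def omega_from_annotations(sequence_length: int, annotated: dict) -> int:
    """Sequence length minus the sum of the annotated numbers, as described in the text."""
    return sequence_length - sum(annotated.values())


def hierarchical_complexity(omega: float, omega_0: float, omega_max: float) -> float:
    """Eq. [1]: gamma = 1 - |2 * (omega - omega_0) / (omega_max - omega_0) - 1|."""
    if omega_max <= omega_0:
        raise ValueError("omega_max must exceed omega_0")
    normalized = (omega - omega_0) / (omega_max - omega_0)
    return 1.0 - abs(2.0 * normalized - 1.0)


# The 25-bit example sequence from the text, written module by module.
sequence = "ABACDA" "CDA" "B" "DB" "ABACDA" "D" "DB" "CDAB"

# Terms of the sum quoted in the text: 3+3+1+3+1+1+1+1 = 14.
annotated = {"A": 3, "B": 3, "C": 1, "D": 3,
             "CDA": 1, "DB": 1, "ABACDA": 1, "CDAB": 1}

print(len(sequence))                                     # 25
print(count_occurrences(sequence, "CDA"))                # 4
print(omega_from_annotations(len(sequence), annotated))  # 11

# Placeholder bounds (hypothetical): gamma peaks at 1 midway between them.
print(hierarchical_complexity(omega=5.0, omega_0=0.0, omega_max=10.0))  # 1.0
```

The metric is 0 when ω sits at either bound and 1 when it sits exactly halfway, which is how Eq. [1] rewards sequences that balance self-repeating and mutual assembling.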

Assessing the lineage coherence between two cells with the same cellular differentiation program.

The central idea is to assess how closely the lineage positions of two cells or their names align after excluding their shared prefix, since a lineage-based nomenclature was implemented.

To quantify their difference of lineage positions, we define a difference vector , which encodes mismatches between corresponding characters of the two strings ( ) in a length of in a binary form: otherwise here, denotes the -th character of the string counted from the left; denotes the -th character of the string counted from the right.

The lineage coherence between two cells is then computed

LineageCoherence ( s 1 , s 2 ) = � L − 1 i =0 d i · 2 i

i =0 d i · 2 L − 2 [3]

The denominator in the formula scales the value between 0 and 1, and the numerator measures the cumulative impact of character mismatches as a binary-weighted sum of differences.

If two cells share an ancestor many generations back but occupy similar or symmetric lineal positions (e.g., ABalaaa and ABplaaa) and share the same differentiation program, they are assigned a high lineage coherence. For example, the lineage coherence between ‘ABalaaaa’ and ‘ABplaaaa’ is

2 1 · 2 6 − 2 = 1 ,

so this pair is the most symmetric. Conversely, if two cells occupy divergent lineal positions (e.g., ABalaaa and ABprppp) and share the same differentiation program, they are assigned a low lineage coherence. For example, ‘ABprpppp’ is very different from ‘ABalaaaa’, and their lineage coherence is

2 6 · 2 6 − 2 = 0.0615 .

Data, Materials, and Software Availability. The code generated and analyzed during the current study is available on GitHub. All other study data are included in the article and/or supporting information.

ACKNOWLEDGMENTS. We thank Prof. Ke Zhang and Prof. He Liu for their fruitful discussion on this project. We also thank all members in ecsLab.

The large-scale computation was partly conducted using approximately 200 CPU cores in the Interdisciplinary Intelligent Supercomputing Center of Beijing Normal University at Zhuhai. This work was supported by the National Natural Science Foundation of China (Grant No. 12205012 to Y.L.) and Guangdong Basic and Applied Basic Research Foundation (Grant No. 2025A1515012923 to Y.L.).

Figure. Hierarchical structure (illustrated by Laddergraph) of cell fate sub-sequence modules self-repeating (denoted by each independent box) and mutually assembling (denoted by arrows between boxes) in (a) an arbitrary 25-bit sequence example, in comparison with its (b) extremely homogeneous and (c) extremely heterogeneous cases.

  1. L Chen, et al., Characterization and complete genomic analysis of two salmonella phages, senalz1 and senasz3, new members of the genus cba120virus.

Arch. Virol , 1475–1478 (2019). 18. S Arnott, et al., X-ray diffraction studies of double helical ribonucleic acid.

Nature 227–232 (1966).

19. C elegans Sequencing Consortium*, Genome sequence of the nematode c. elegans: a

platform for investigating biology. Science , 2012–2018 (1998). 20. ES Lander, et al., Initial sequencing and analysis of the human genome.

Nature 860–921 (2001). EA Baker, A Woollard, How weird is the worm? evolution of the developmental gene toolkit in caenorhabditis elegans.

J. Dev. Biol , 19 (2019). 22. J Chai, et al., Structural basis of caspase-7 inhibition by xiap. , 769–780 (2001). 23. Y Xu, et al., Structure of the protein phosphatase 2a holoenzyme. , 1239–1251 (2006).

24. Z Wang, et al., Molecular and structural basis of the dual regulation of the polycystin-2 ion

channel by small-molecule ligands. Proc. Natl. Acad. Sci , e2316230121 (2024).

25. G Yang, et al., Structural basis of

-secretase inhibition and modulation by small molecule drugs. , 521–533 (2021).

26. M Chen, Q Su, Y Shi, Molecular mechanism of ige-mediated fc

ri activation. Nature 453–460 (2025). Y Zhao, et al., Cryo-em structures of apo and antagonist-bound human cav3. 1.

Nature 492–497 (2019). 28. X Yao, X Fan, N Yan, Cryo-em analysis of a membrane protein embedded in the liposome.

Proc. Natl. Acad. Sci , 18497–18503 (2020). T Wu, X Yang, X Jin, N Yan, Z Li, Critical role of extracellular loops in differential modulations of ttx-sensitive and ttx-resistant nav channels.

Proc. Natl. Acad. Sci , e2510355122 (2025).

30. L Xue, N Yan, C Song, Deciphering ca 2+ permeation and valence selectivity in cav1:

Molecular dynamics simulations reveal the three-ion knock-on mechanism.

Proc. Natl. Acad. , e2424694122 (2025).

31. T Wang, et al., Cryoseek ii: Cryo-em analysis of glycofibrils from freshwater reveals

well-structured glycans coating linear tetrapeptide repeats.

Proc. Natl. Acad. Sci e2423943122 (2025).

32. L Pauling, RB Corey, HR Branson, The structure of proteins: two hydrogen-bonded helical

configurations of the polypeptide chain. Proc. Natl. Acad. Sci , 205–211 (1951).

EG Emberly, NS Wingreen, C Tang, Designability of -helical proteins.

Proc. Natl. Acad. Sci , 11163–11168 (2002). J Skolnick, A Kolinski, CL Brooks, A Godzik, A Rey, A method for predicting protein structure from sequence.

Curr. Biol , 414–423 (1993). TN Sasaki, H Cetin, M Sasai, A coarse-grained langevin molecular dynamics approach to de novo protein structure prediction.

Biochem. Biophys. Res. Commun , 500–506 (2008).

36. I Kolossv´ary, Conceptual framework for performing simultaneous fold and sequence

optimization in multi-scale protein modeling. arXiv:1502.05592 (2014). 37. AJ Faure, et al., The genetic architecture of protein stability.

Nature , 995–1003 (2024). X Guan, et al., Predicting protein conformational motions using energetic frustration analysis and alphafold2.

Proc. Natl. Acad. Sci , e2410662121 (2024). 39. M AlQuraishi, Alphafold at casp13.

Bioinformatics , 4862–4865 (2019). 40. J Jumper, et al., Highly accurate protein structure prediction with alphafold.

Nature 583–589 (2021). J Abramson, et al., Accurate structure prediction of biomolecular interactions with alphafold 3.

Nature , 493–500 (2024).

42. M Baek, et al., Accurate prediction of protein structures and interactions using a three-track

neural network. Science , 871–876 (2021). 43. D Baker, A Sali, Protein structure prediction and structural genomics.

Science , 93–96 (2001). NG Espinoza, et al., Uncovering mechanisms of interleukin-4 biology using a novel cytokine mimetic in

2025 AIChE Annual Meeting

. (AIChE), (2025).

45. NE Gregorio, Z Li, JW Hoye, D Baker, CA DeForest, Stimuli-triggered formation of de

novo-designed protein biomaterials. Cell Biomater . (2025). 46. J Zhang, et al., Predicting protein-protein interactions in the human proteome.

Science eadt1630 (2025).

47. J Ishikawa, et al., Structural insights into rna-guided rna editing by the cas13b–adar2

complex. Nat. Struct. Mol. Biol . pp. 1–12 (2025). 48. ND Rochman, et al., Ongoing global and regional adaptive evolution of sars-cov-2.

Proc. Natl. Acad. Sci , e2104241118 (2021). 49. J Strecker, et al., Rna-guided dna insertion with crispr-associated transposases.

Science , 48–53 (2019). 50. D Patsch, et al., Enriching productive mutational paths accelerates enzyme evolution.

Chem. Biol , 1662–1669 (2024). 51. QY Tang, K Kaneko, Dynamics-evolution correspondence in protein structures.

Phys. Rev. , 098103 (2021). M Schwersensky, M Rooman, F Pucci, Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness.

BMC Biol , 146 (2020).

53. B Batut, DP Parsons, S Fischer, G Beslon, C Knibbe, In silico experimental evolution: a tool

to test evolutionary scenarios. BMC Bioinforma , S11 (2013).

Y Chen, G Guan, LH Tang, C Tang, Balancing reaction-diffusion network for cell polarization pattern with stability and asymmetry. eLife , RP96421 (2025).

D Zhang, Y Cao, Q Ouyang, Y Tu, Altruistic resource-sharing mechanism for synchronization:

The energy-speed-accuracy trade-off. Phys. Rev. Lett , 037401 (2025). 56. T Tlusty, A Libchaber, Life sets off a cascade of machines.

Proc. Natl. Acad. Sci e2418000122 (2025). ML Wong, et al., On the roles of function and selection in evolving systems.

Proc. Natl. Acad. , e2310223120 (2023). elin, H Li, NS Wingreen, C Tang, Designability, thermodynamic stability, and dynamics in protein folding: a lattice model study.

The J. Chem. Phys , 1252–1262 (1999).

59. H Li, R Helling, C Tang, N Wingreen, Emergence of preferred structures in a simple model of

protein folding. Science , 666–669 (1996). H Li, C Tang, NS Wingreen, Nature of driving force for protein folding: a result from analyzing the statistical potential.

Phys. Rev. Lett , 765 (1997). H Li, C Tang, NS Wingreen, Are protein folds atypical?

Proc. Natl. Acad. Sci , 4987–4990 (1998). 62. C Tang, Simple models of the protein folding problem.

Phys. A: Stat. Mech. its applications , 31–48 (2000).

A Trusina, FR Papa, C Tang, Rationalizing translation attenuation in the network architecture of the unfolded protein response.

Proc. Natl. Acad. Sci , 20280–20285 (2008).

64. P Mitra, D Shultis, Y Zhang, Evodesign: de novo protein design based on structural and

evolutionary profiles. Nucleic Acids Res , W273–W280 (2013).

H Sahakyan, SG Babajanyan, YI Wolf, EV Koonin, In silico evolution of globular protein folds from random sequences.

Proc. Natl. Acad. Sci , e2509015122 (2025). W Ma, L Lai, Q Ouyang, C Tang, Robustness and modular design of the drosophila segment polarity network.

Mol. Syst. Biol , 70 (2006). 67. F Tostevin, M Howard, Modeling the establishment of par protein polarity in the one-cell c. elegans embryo.

Biophys. J , 4512–4522 (2008). TR Sokolowski, T Gregor, W Bialek, G Tka cik, Deriving a genetic regulatory network from an optimization principle.

Proc. Natl. Acad. Sci , e2402925121 (2025). 69. RB Azevedo, et al., The simplicity of metazoan cell lineages.

Nature , 152–156 (2005).

70. Z Li, et al., Reconstructing cell lineage trees with genomic barcoding: approaches and

applications. J. Genet. Genomics , 35–47 (2024).

71. M Yuan, et al., Alignment of cell lineage trees elucidates genetic programs for the

development and evolution of cell types. Iscience (2020).

72. F Mohammadi, et al., A lineage tree-based hidden markov model quantifies cellular

heterogeneity and plasticity. Commun. Biol , 1258 (2022).

O Pourqui e, The segmentation clock: converting embryonic time into spatial pattern.

Science , 328–330 (2003).

74. A Sampath Kumar, et al., Spatiotemporal transcriptomic maps of whole mouse embryos at

the onset of organogenesis. Nat. Genet , 1176–1185 (2023).

75. A Nakamoto, et al., Changing cell behaviours during beetle embryogenesis correlates with

slowing of segmentation. Nat. Commun , 6635 (2015).

76. JE Sulston, E Schierenberg, JG White, JN Thomson, The embryonic cell lineage of the

nematode caenorhabditis elegans. Dev. Biol , 64–119 (1983).

77. H Nishida, Cell lineage analysis in ascidian embryos by intracellular injection of a tracer

enzyme: Iii. up to the tissue restricted stage. Dev. Biol , 526–541 (1987).

78. JC Lee, et al., Instructional materials that control cellular activity through synthetic notch

receptors. Biomaterials , 122099 (2023). 79. T Hayes, et al., Simulating 500 million years of evolution with a language model.

Science , 850–858 (2025).

80. N Chou, et al., Impact-absorbing helmet design inspired by walnut texture reaction-diffusion

mechanisms. Acta Biomater , 244–256 (2025).

81. T Yamada, et al., Synthetic organizer cells guide development via spatial and biochemical

instructions. , 778–795 (2025). T Kaletta, MO Hengartner, Finding function in novel targets: C. elegans as a model organism.

Nat. Rev. Drug Discov , 387–399 (2006).

83. AK Corsi, B Wightman, M Chalfie, A transparent window into biology: a primer on

caenorhabditis elegans. Genetics , 387–407 (2015).

84. X Ma, et al., A 4d single-cell protein atlas of transcription factors delineates spatiotemporal

patterning during embryogenesis. Nat. Methods , 893–902 (2021).

85. G Guan, et al., Cell lineage-resolved embryonic morphological map reveals signaling

associated with cell fate and size asymmetry. Nat. Commun , 3700 (2025).

86. Y Liu, Z Di, P Gerlee, Ladderpath approach: how tinkering and reuse increase complexity

and information. Entropy , 1082 (2022).

87. Z Zhang, et al., Evolutionary tinkering enriches the hierarchical and nested structures in

amino acid sequences. Phys. Rev. Res , 023215 (2024).

88. S Li, et al., Discovery of highly bioactive peptides through hierarchical structural information

and molecular dynamics simulations. J. Chem. Inf. Model , 8164–8175 (2024).

89. Z Xu, et al., Correlating measures of hierarchical structures in artificial neural networks with

their performance. npj Complex , 15 (2024).

90. L Rose, P G¨onczy, Polarity establishment, asymmetric division and segregation of fate

determinants in early c. elegans embryos. WormBook (2014). 91. L Hubatsch, et al., A cell-size threshold limits cell polarity and asymmetric division potential.

Nat. Phys , 1078–1085 (2019). 92. JR Priess, Notch signaling in the c. elegans embryo.

WormBook (2005). 93. H Sawa, H C. Korswagen, Wnt signaling in c. elegans.

WormBook (2013).

94. KM Mickey, CC Mello, MK Montgomery, A Fire, JR Priess, An inductive interaction in 4-cell

stage c. elegans embryos involves apx-1 expression in the signalling cell.

Development 1791–1798 (1996). CC Mello, BW Draper, JR Prless, The maternal genes apx-1 and glp-1 and establishment of dorsal-ventral polarity in the early c. elegans embryo. , 95–106 (1994).

96. G Weinmaster, VJ Roberts, G Lemke, A homolog of Drosophila Notch expressed during mammalian development. Development , 199–205 (1991).

97. C Coffman, W Harris, C Kintner, Xotch, the Xenopus homolog of Drosophila Notch. Science , 1438–1441 (1990).

98. A Raya, et al., Activation of Notch signaling pathway precedes heart regeneration in zebrafish. Proc. Natl. Acad. Sci , 11889–11895 (2003).

99. J Luis de la Pompa, et al., Conservation of the Notch signalling pathway in mammalian neurogenesis. Development , 1139–1148 (1997).

100. AL Penton, LD Leonard, NB Spinner, Notch signaling in human development and disease in Seminars in Cell and Developmental Biology. (Elsevier), Vol. 23, pp. 450–457 (2012).

101. IP Moskowitz, JH Rothman, lin-12 and glp-1 are required zygotically for early embryonic cellular interactions and are regulated by maternal glp-1 signaling in Caenorhabditis elegans. Development , 4105–4117 (1996).

102. EJ Lambie, J Kimble, Two homologous regulatory genes, lin-12 and glp-1, have overlapping functions. Development , 231–240 (1991).

103. AL Zacharias, T Walton, E Preston, JI Murray, Quantitative differences in nuclear β-catenin and TCF pattern embryonic cells in C. elegans. PLoS Genet , e1005585 (2015).

104. LR Girard, et al., WormBook: the online review of Caenorhabditis elegans biology. Nucleic Acids Res , D472–D475 (2007).

105. YW Lim, FL Wen, P Shankar, T Shibata, F Motegi, A balance between antagonizing PAR proteins specifies the pattern of asymmetric and symmetric divisions in C. elegans embryogenesis. Cell Reports (2021).

106. G Guan, MK Wong, Z Zhao, LH Tang, C Tang, Volume segregation programming in a nematode’s early embryogenesis. Phys. Rev. E , 054409 (2021).

107. JE Sulston, E Schierenberg, JG White, JN Thomson, The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev. Biol , 64–119 (1983).

108. Z Du, et al., The regulatory landscape of lineage differentiation in a metazoan embryo. , 592–607 (2015).

109. R Hunt-Newbury, et al., High-throughput in vivo analysis of gene expression in Caenorhabditis elegans. PLoS Biol , e237 (2007).

110. JI Murray, et al., Automated analysis of embryonic gene expression with cellular resolution in C. elegans. Nat. Methods , 703–709 (2008).

111. JS Packer, et al., A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science , eaax1971 (2019).

112. L Li, et al., Spatiotemporal single-cell architecture of gene expression in the Caenorhabditis elegans germ cells. Cell Discov , 26 (2025).

113. P Xie, et al., Digital reconstruction of full embryos during early mouse organogenesis. (2025).

114. YB Tzur, et al., Spatiotemporal gene expression analysis of the Caenorhabditis elegans germline uncovers a syncytial expression switch. Genetics , 587–605 (2018).

115. A Camproux, P Tuffery, J Chevrolat, J Boisvieux, S Hazout, Hidden Markov model approach for identifying the modular framework of the protein backbone. Protein Eng , 1063–1073 (1999).

116. TL Bailey, N Williams, C Misleh, WW Li, MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res , W369–W373 (2006).

117. RD Finn, J Clements, SR Eddy, HMMER web server: interactive sequence similarity searching. Nucleic Acids Res , W29–W37 (2011).

118. QY Tang, K Kaneko, Long-range correlation in protein dynamics: Confirmation by structural data and normal mode analysis. PLoS Comput. Biol , e1007670 (2020).

119. G Rix, et al., Continuous evolution of user-defined genes at 1 million times the genomic mutation rate. Science , eadm9073 (2024).

120. J Wu, B Liu, Y Cui, Are sutural structures in biology the optimal topological design? Compos. Struct , 118825 (2025).

121. B Seelig, IA Chen, Intellectual frameworks to understand complex biochemical systems at the origin of life. Nat. Chem , 11–19 (2025).

122. C Xia, Z Zhang, X Guan, Q Tang, Protein structural bioinformatics empowered by statistical physics and artificial intelligence. Synth. Biol. J , 547 (2025).

123. Y Liu, et al., Exploring and mapping chemical space with molecular assembly trees. Sci. Adv , eabj2465 (2021).
