The Fine Structure of the Transcriptome: Does It Reflect the Inverse Symmetry of the Genome?
Gregory Warr, Les Hatton
Frontiers in Bioscience-Landmark ›› 2025, Vol. 30 ›› Issue (11) : 45912
The nucleotide “words” (k-mers) of the genome exhibit two essentially universal properties that follow probabilistically from the Conservation of Hartley-Shannon Information (CoHSI): (1) a Zipfian rank-ordered distribution of frequencies and (2) universal inverse symmetry. Here, we ask whether these two properties are also present in the transcriptome, a question of interest given the strong and specific structure/function constraints on RNAs, especially the protein-coding (CDS) sequences.
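To make the two properties concrete, the following is a minimal sketch of how k-mers could be counted and paired for an inverse-symmetry comparison. It assumes that inverse symmetry means a k-mer and its reverse complement occur at approximately equal frequency, and that sequences are given in the DNA alphabet (ACGT). The function names `count_kmers`, `reverse_complement` and `inverse_pairs` are illustrative and are not taken from the article.

```python
from collections import Counter

# Translation table for complementing bases (DNA alphabet assumed).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a k-mer, e.g. ATG -> CAT."""
    return kmer.translate(COMPLEMENT)[::-1]


def count_kmers(sequence, k):
    """Count overlapping k-mers, skipping windows with ambiguous bases."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def inverse_pairs(counts):
    """Map each canonical k-mer to (its count, count of its reverse complement).

    The canonical member of a pair is the lexicographically smaller one,
    so every unordered pair appears exactly once.
    """
    pairs = {}
    for kmer in counts:
        rc = reverse_complement(kmer)
        canonical, other = min(kmer, rc), max(kmer, rc)
        pairs[canonical] = (counts.get(canonical, 0), counts.get(other, 0))
    return pairs


if __name__ == "__main__":
    toy = "ATGGCCATTAA"  # hypothetical toy sequence, not real data
    for kmer, (fwd, rev) in sorted(inverse_pairs(count_kmers(toy, 3)).items()):
        print(kmer, reverse_complement(kmer), fwd, rev)
```

Under perfect inverse symmetry the two numbers in each pair would be equal; sorting the raw counts in descending order gives the rank-ordered distribution whose Zipfian form is the first property.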
CDS and ncRNA (non-coding RNA) databases were accessed at Ensembl. To establish a power law, statistical tests of both necessity (linearity of the rank-ordered frequency distribution) and sufficiency (confidence that a power-law distribution could not be rejected) were applied. Compliance with inverse symmetry was assessed by linearity and residual standard error.
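The linearity checks described above can be sketched with an ordinary least-squares fit. This is an illustration under stated assumptions rather than the authors' pipeline: the necessity test is taken to be a straight-line fit of log frequency against log rank, and the inverse-symmetry check a straight-line fit of reverse-complement counts against k-mer counts on the raw scale. The helper names `ols`, `zipf_fit` and `symmetry_fit` are hypothetical. The sufficiency test, which requires a goodness-of-fit procedure for power laws, is not reproduced here.

```python
import math


def ols(x, y):
    """Simple linear regression of y on x with one predictor.

    Returns slope, intercept, adjusted R-squared and residual standard error.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1.0 - ss_res / ss_tot
    return {
        "slope": slope,
        "intercept": intercept,
        # One predictor, so the residual degrees of freedom are n - 2.
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - 2),
        "rse": math.sqrt(ss_res / (n - 2)),
    }


def zipf_fit(frequencies):
    """Fit log(frequency) against log(rank) for a rank-ordered distribution."""
    ordered = sorted((f for f in frequencies if f > 0), reverse=True)
    log_rank = [math.log(r) for r in range(1, len(ordered) + 1)]
    log_freq = [math.log(f) for f in ordered]
    return ols(log_rank, log_freq)


def symmetry_fit(pairs):
    """Fit reverse-complement counts against k-mer counts.

    `pairs` maps a k-mer to (count, reverse-complement count). A slope near 1
    with a small residual standard error indicates compliance.
    """
    forward = [fwd for fwd, _ in pairs.values()]
    inverse = [rev for _, rev in pairs.values()]
    return ols(forward, inverse)


if __name__ == "__main__":
    ideal = [1000.0 / rank for rank in range(1, 51)]  # synthetic Zipfian data
    print(zipf_fit(ideal))
```

A slope close to -1 with an adjusted R-squared near 1 in `zipf_fit` is the pattern expected for an ideal Zipfian distribution; real k-mer data would normally be fitted over all observed k-mers of the chosen lengths.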
The CDS and non-coding RNAs for 53 species were analyzed separately and the data presented as short movies. The results were consistent for all species analyzed, and taking the bonobo (Pan paniscus) as a representative species, the following results were obtained. For the Zipfian distribution of k-mer frequencies, statistically robust tests of both necessity (adjusted R² = 0.9932 and p ≤ 2.2 × 10⁻¹⁶) and sufficiency were obtained for the CDS; for non-coding RNAs the test of necessity was robust (adjusted R² = 0.9982 and p ≤ 2.2 × 10⁻¹⁶). Perturbations of inverse symmetry were observed in both CDS (slope = 0.91, adjusted R² = 0.77) and non-coding RNAs (slope = 1.02, adjusted R² = 0.84). The disruption of inverse symmetry in the CDS particularly affected the 3- and 6-mers and was shown to be associated with codon (especially stop codon) frequency in the open reading frame.
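One intuition for the stop-codon effect can be sketched as follows. In an intact open reading frame the stop codons TAA, TAG and TGA can occur in frame only at the terminal position, whereas their reverse complements TTA, CTA and TCA are sense codons in the standard genetic code (leucine, leucine and serine) and can occur throughout. The code below counts in-frame codons for a single CDS and compares each stop codon with its reverse complement. It is an illustration only, with hypothetical function names and a made-up toy sequence, and is not the analysis performed in the article.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon):
    """Return the reverse complement of a codon, e.g. TAA -> TTA."""
    return codon.translate(COMPLEMENT)[::-1]


def in_frame_codons(cds):
    """Count non-overlapping codons read in frame from the first base."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def stop_vs_inverse(cds):
    """For each stop codon, return (in-frame count, in-frame count of its reverse complement)."""
    codons = in_frame_codons(cds)
    return {
        stop: (codons[stop], codons[reverse_complement(stop)])
        for stop in STOP_CODONS
    }


if __name__ == "__main__":
    toy_cds = "ATGTTACTATCATTATAA"  # hypothetical toy CDS: ATG TTA CTA TCA TTA TAA
    for stop, (n_stop, n_inverse) in stop_vs_inverse(toy_cds).items():
        print(stop, n_stop, reverse_complement(stop), n_inverse)
```

Summed over many coding sequences, an imbalance of this kind between a stop codon and its reverse complement is one way that in-frame 3-mer counts could depart from inverse symmetry.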
The CoHSI-predicted Zipfian distribution of k-mer frequencies was observed in both the protein-coding and non-coding RNAs of 53 species, whereas compliance with inverse symmetry was weaker. This weakening of compliance was greater in the CDS than in the non-coding portions of the transcriptome and may be associated with the necessity to maintain the integrity of the reading frame in the CDS. These results illustrate the general principle that local perturbations of an overall CoHSI-guided equilibrium state of a biological system can provide insight into the underlying causes of such perturbations.
transcriptome / k-mer frequency / Zipfian distribution / universal inverse symmetry / Conservation of Hartley-Shannon Information / CoHSI / mRNA / non-coding RNA