Evolution of tissue-specific expression of ancestral genes across vertebrates and insects

Data availability

The FASTQ and processed files of the RNA-seq samples generated for this project are available at GEO under series GSE205498. The Supplementary Dataset is available via Mendeley Data at https://doi.org/10.17632/22m3dwhzk6.2 (ref. 67).

Code availability

All code used for analysis and figure generation is available on GitHub at https://github.com/fedemantica/bilaterian_GE (ref. 68).

References

  1. Evans, S. D., Hughes, I. V., Gehling, J. G. & Droser, M. L. Discovery of the oldest bilaterian from the Ediacaran of South Australia. Proc. Natl Acad. Sci. USA 117, 7845-7850 (2020).

  2. Brusca, R. C., Moore, W. & Shuster, S. M. Invertebrates 345-372 (Sinauer Associates, 2016).

  3. Paps, J. & Holland, P. W. H. Reconstruction of the ancestral metazoan genome reveals an increase in genomic novelty. Nat. Commun. 9, 1730 (2018).

  4. Fernández, R. & Gabaldón, T. Gene gain and loss across the metazoan tree of life. Nat. Ecol. Evol. 4, 524-533 (2020).

  5. Lopez-Bigas, N., De, S. & Teichmann, S. A. Functional protein divergence in the evolution of Homo sapiens. Genome Biol. 9, R33 (2008).

  6. Davidson, E. H. & Erwin, D. H. Gene regulatory networks and the evolution of animal body plans. Science 311, 796-800 (2006).

  7. King, M.-C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107-116 (1975).

  8. Carroll, S. B. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25-36 (2008).

  9. True, J. R. & Carroll, S. B. Gene co-option in physiological and morphological evolution. Annu. Rev. Cell Dev. Biol. 18, 53-80 (2002).

  10. Arntfield, M. E. & van der Kooy, D. β-Cell evolution: how the pancreas borrowed from the brain: the shared toolbox of genes expressed by neural and pancreatic endocrine cells may reflect their evolutionary relationship. Bioessays 33, 582-587 (2011).

  11. Almudi, I. et al. Genomic adaptations to aquatic and aerial life in mayflies and the origin of insect wings. Nat. Commun. 11, 2631 (2020).

  12. Clark-Hachtel, C. M. & Tomoyasu, Y. Two sets of candidate crustacean wing homologues and their implication for the origin of insect wings. Nat. Ecol. Evol. 4, 1694-1702 (2020).

  13. Bruce, H. S. & Patel, N. H. Knockout of crustacean leg patterning genes suggests that insect wings and body walls evolved from ancient leg segments. Nat. Ecol. Evol. 4, 1703-1712 (2020).

  14. Martín-Durán, J. M. et al. Convergent evolution of bilaterian nerve cords. Nature 553, 45-50 (2018).

  15. Thomas, J. A., Welch, J. J., Lanfear, R. & Bromham, L. A generation time effect on the rate of molecular evolution in invertebrates. Mol. Biol. Evol. 27, 1173-1180 (2010).

  16. Wyder, S., Kriventseva, E. V., Schröder, R., Kadowaki, T. & Zdobnov, E. M. Quantification of ortholog losses in insects and vertebrates. Genome Biol. 8, R242 (2007).

  17. Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-348 (2011).

  18. Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature 571, 505-509 (2019).

  19. Chen, J. et al. A quantitative framework for characterizing the evolutionary history of mammalian gene expression. Genome Res. 29, 53-63 (2019).

  20. Fukushima, K. & Pollock, D. D. Amalgamated cross-species transcriptomes reveal organ-specific propensity in gene expression evolution. Nat. Commun. 11, 4459 (2020).

  21. Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587-1593 (2012).

  22. Lê Cao, K.-A., Boitard, S. & Besse, P. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12, 253 (2011).

  23. Burkhardt, P. & Sprecher, S. G. Evolutionary origin of synapses and neurons - bridging the gap. Bioessays 39, 1700024 (2017).

  24. Sebé-Pedrós, A. et al. Cnidarian cell type diversity and regulation revealed by whole-organism single-cell RNA-seq. Cell 173, 1520-1534.e20 (2018).

  25. Inaba, K. Sperm flagella: comparative and phylogenetic perspectives of protein components. Mol. Hum. Reprod. 17, 524-538 (2011).

  26. Daldello, E. M., Luong, X. G., Yang, C.-R., Kuhn, J. & Conti, M. Cyclin B2 is required for progression through meiosis in mouse oocytes. Development 146, dev172734 (2019).

  27. Li, J., Ouyang, Y.-C., Zhang, C.-H., Qian, W.-P. & Sun, Q.-Y. The cyclin B2/CDK1 complex inhibits separase activity in mouse oocyte meiosis I. Development 146, 648053 (2019).

  28. Zeng, Y. et al. Bi-allelic mutations in MOS cause female infertility characterized by preimplantation embryonic arrest. Hum. Reprod. 37, 612-620 (2022).

  29. Tay, J., Hodgman, R., Sarkissian, M. & Richter, J. D. Regulated CPEB phosphorylation during meiotic progression suggests a mechanism for temporal control of maternal mRNA translation. Genes Dev. 17, 1457-1462 (2003).

  30. Gąsiorowski, L. et al. Molecular evidence for a single origin of ultrafiltration-based excretory organs. Curr. Biol. 31, 3629-3638.e2 (2021).

  31. Thakurela, S. et al. Mapping gene regulatory circuitry of Pax6 during neurogenesis. Cell Discov. 2, 15045 (2016).

  32. Eckler, M. J. & Chen, B. Fez family transcription factors: controlling neurogenesis and cell fate in the developing mammalian nervous system. Bioessays 36, 788-797 (2014).

  33. Taylor, M. V. & Hughes, S. M. Mef2 and the skeletal muscle differentiation program. Semin. Cell Dev. Biol. 72, 33-44 (2017).

  34. Mathiyalagan, N. et al. Meta-analysis of grainyhead-like dependent transcriptional networks: a roadmap for identifying novel conserved genetic pathways. Genes 10, 876 (2019).

  35. Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650-659 (2005).

  36. Roelofs, D. et al. Multi-faceted analysis provides little evidence for recurrent whole-genome duplications during hexapod evolution. BMC Biol. 18, 57 (2020).

  37. Marlétaz, F. et al. Amphioxus functional genomics and the origins of vertebrate gene regulation. Nature 564, 64-70 (2018).

  38. Oji, A. et al. Tesmin, metallothionein-like 5, is required for spermatogenesis in mice. Biol. Reprod. 102, 975-983 (2020).

  39. Jiang, J., Benson, E., Bausek, N., Doggett, K. & White-Cooper, H. Tombola, a tesmin/TSO1-family protein, regulates transcriptional activation in the Drosophila male germline and physically interacts with always early. Development 134, 1549-1559 (2007).

  40. Hines, J. H. Evolutionary origins of the oligodendrocyte cell type and adaptive myelination. Front. Neurosci. 15, 757360 (2021).

  41. Ramirez, M. D. & Oakley, T. H. Eye-independent, light-activated chromatophore expansion (LACE) and expression of phototransduction genes in the skin of Octopus bimaculoides. J. Exp. Biol. 218, 1513-1520 (2015).

  42. Iram, T. et al. Young CSF restores oligodendrogenesis and memory in aged mice via Fgf17. Nature 605, 509-515 (2022).

  43. Hartenstein, V. & Martinez, P. Structure, development and evolution of the digestive system. Cell Tissue Res. 377, 289-292 (2019).

  44. Ottaviani, E., Malagoli, D. & Franceschi, C. The evolution of the adipose tissue: a neglected enigma. Gen. Comp. Endocrinol. 174, 1-4 (2011).

  45. Kryuchkova-Mostacci, N. & Robinson-Rechavi, M. Tissue-specificity of gene expression diverges slowly between orthologs, and rapidly between paralogs. PLoS Comput. Biol. 12, e1005274 (2016).

  46. Lien, S. et al. The Atlantic salmon genome provides insights into rediploidization. Nature 533, 200-205 (2016).

  47. Fernández, R. et al. Selection following gene duplication shapes recent genome evolution in the pea aphid Acyrthosiphon pisum. Mol. Biol. Evol. 37, 2601-2615 (2020).

  48. Farré, D. & Albà, M. M. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. Mol. Biol. Evol. 27, 325-335 (2010).

  49. Clark, J. W. & Donoghue, P. C. J. Constraining the timing of whole genome duplication in plant evolutionary history. Proc. Biol. Sci. 284, 20170912 (2017).

  50. Macqueen, D. J. & Johnston, I. A. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proc. Biol. Sci. 281, 20132881 (2014).

  51. Donoghue, P. C. J. & Purnell, M. A. Genome duplication, extinction and vertebrate evolution. Trends Ecol. Evol. 20, 312-319 (2005).

  52. Almudí, I. & Pascual-Anaya, J. in Old Questions and Young Approaches to Animal Evolution (eds Martín-Durán, J. M. & Vellutini, B. C.) 107-132 (Springer, 2019).

  53. Derelle, R., Philippe, H. & Colbourne, J. K. Broccoli: combining phylogenetic and network analyses for orthology assignment. Mol. Biol. Evol. 37, 3389-3396 (2020).

  54. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525-527 (2016).

  55. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  56. Tapial, J. et al. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res. 27, 1759-1768 (2017).

  57. Rohart, F., Gautier, B., Singh, A. & Lê Cao, K.-A. mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017).

  58. Kolberg, L., Raudvere, U., Kuzmin, I., Vilo, J. & Peterson, H. gprofiler2 - an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler. F1000Research 9, ELIXIR-709 (2020).

  59. Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE 6, e21800 (2011).

  60. Cunningham, F. et al. Ensembl 2022. Nucleic Acids Res. 50, D988-D995 (2022).

  61. Gramates, L. S. et al. FlyBase: a guided tour of highlighted features. Genetics 220, iyac035 (2022).

  62. Jin, L. et al. A pig BodyMap transcriptome reveals diverse tissue physiologies and evolutionary dynamics of transcription. Nat. Commun. 12, 3715 (2021).

  63. Wang, Z.-Y. et al. Transcriptome and translatome co-evolution in mammals. Nature 588, 642-647 (2020).

  64. Guschanski, K., Warnefors, M. & Kaessmann, H. The evolution of duplicate gene expression in mammalian organs. Genome Res. 27, 1461-1474 (2017).

  65. Touceda-Suárez, M. et al. Ancient genomic regulatory blocks are a source for regulatory gene deserts in vertebrates after whole-genome duplications. Mol. Biol. Evol. 37, 2857-2864 (2020).

  66. Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2021).

  67. Mantica, F. & Irimia, M. Pervasive evolution of tissue-specificity of ancestral genes differentially shaped vertebrates and insects, V2. Mendeley Data https://doi.org/10.17632/22m3dwhzk6.2 (2023).

  68. fedemantica. bilaterian_GE. GitHub https://github.com/fedemantica/bilaterian_GE (2023).

  69. Kumar, S. et al. TimeTree 5: an expanded resource for species divergence times. Mol. Biol. Evol. 39, msac174 (2022).

Acknowledgements

We thank Q. T. Ramon for the original drawing of tissue icons; N. Arecco, N. B. Morais, A. Sebé-Pedrós and D. Weghorn for critical feedback on the manuscript; and the CRG Genomics Unit for the RNA sequencing. This research was funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (ERC-StG-LS2-637591 and ERCCoG-LS2-101002275 to M.I.), by the Spanish Ministry of Economy and Competitiveness (BFU-2017-89201-P and PID2020-115040GB-I00 to M.I.) and by the 'Centro de Excelencia Severo Ochoa 2013-2017' (SEV-2012-0208). F.M. holds an FPI fellowship associated with the grant BFU-2017-89201-P. Additional support for this research was provided by the Spanish MINECO (PGC2018-098427-B-I00 to D.M. and X.F.-M.), the Czech Science Foundation (22-21244S to M.N.), the Australian Research Council (grant DP200103219 to P.D.C. and F.T.) and the National Institutes of Health-NIAID (grant R21AI167849 to F.G.N.).

Author information

Authors and Affiliations

  1. Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain

    Federica Mantica, Luis P. Iñiguez, Yamile Marquez, Jon Permanyer, Antonio Torres-Mendez, Demian Burguera & Manuel Irimia

  2. Universitat Pompeu Fabra, Barcelona, Spain

    Federica Mantica & Manuel Irimia

  3. Institute of Evolutionary Biology (IBE, CSIC-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain

    Josefa Cruz, Xavier Franch-Marro & David Martin

  4. Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia

    Frank Tulenko & Peter D. Currie

  5. Sorbonne Université, CNRS, Biologie Intégrative des Organismes Marins; BIOM, Banyuls-sur-Mer, France

    Stephanie Bertrand & Hector Escriva

  6. Centre for Ecology and Conservation, University of Exeter, Penryn, UK

    Toby Doyle & Karl R. Wotton

  7. Institute of Parasitology, CAS, České Budějovice, Czech Republic

    Marcela Nouzova

  8. EMBL Australia; Victorian Node, Monash University, Clayton, Victoria, Australia

    Peter D. Currie

  9. Biology and BSI, Florida International University, Miami, FL, USA

    Fernando G. Noriega

  10. Department of Parasitology, University of South Bohemia, České Budějovice, Czech Republic

    Fernando G. Noriega

  11. Stazione Zoologica Anton Dohrn, Napoli, Italy

    Maria Ina Arnone

  12. Eugene Bell Center for Regenerative Biology and Tissue Engineering, Marine Biological Laboratory, Woods Hole, MA, USA

    Caroline B. Albertin

  13. Department of Genetics, Microbiology and Statistics and IRBio, Universitat de Barcelona, Barcelona, Spain

    Isabel Almudi

  14. ICREA, Barcelona, Spain

    Manuel Irimia

Contributions

F.M. performed most analyses and generated most figures and tables. L.P.I. built the motif dataset, designed and performed all motif-related analysis, and contributed to intellectual discussion. Y.M. and A.T.-M. performed additional analyses and contributed to intellectual discussion. J.P., A.T.-M., J.C., X.F.-M., F.T., D.B., S.B., T.D., M.N., P.D.C., F.G.N., H.E., M.I.A., C.B.A., K.R.W., I.A. and D.M. contributed RNA and/or tissue samples. F.M. and M.I. wrote the manuscript.

Corresponding author

Correspondence to Manuel Irimia.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks Marie Sémon, Emily Wong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Gene annotation refinements and statistics of bilaterian-conserved orthogroups.

a. Schematic representation of broken (left) and chimeric (right) genes and how they potentially influence gene orthology inferences. Animal silhouettes were downloaded from http://phylopic.org/. Credits to Gareth Monger for the hoverfly icon (https://creativecommons.org/licenses/by/3.0/). b. Examples of broken (left) and chimeric (right) genes corrected in the silkworm gene annotation. c. Statistics of corrected and unresolved broken and chimeric genes across all species. d. Results from a BUSCO run (options -m proteins -l metazoa_odb10) assessing the status of 954 metazoan single-copy orthologs in the proteomes of all the species. CS: complete and single-copy, CD: complete and duplicated, F: fragmented, M: missing. e. Barplot representing the number of bilaterian-conserved (red) or more recent (grey) protein-coding genes across all species. The line plot represents the number of bilaterian-conserved orthogroups (OGs; that is, orthogroups conserved in at least 12 species) in which genes from each species are represented. f. Proportions of bilaterian-conserved orthogroups based on the number of species in which they are conserved.

Extended Data Fig. 2 Definition of best-ancestral and best-TS orthogroups.

a. Schematic and relative example for the selection of bilaterian-conserved, best-ancestral orthologs in each species and tissue (see Supplementary Methods). b. Distributions of Pearson's correlation coefficients from all intra-tissue, species pairwise comparisons of gene expression upon distinct procedures for paralog selection and gene expression quantification. The expression measure for each species in each orthogroup corresponds to the expression of its best-ancestral ortholog (Best_anc), the average expression among all its paralogs (Average), the summed expression among all its paralogs (Summed) and the expression of a randomly selected paralog (Random). Significance levels of two-sided Wilcoxon rank-sum tests comparing the Best_anc distribution to each of the others are reported at the top, while the median value of each distribution is printed at the bottom. Correlations are performed on z-scored expression matrices (see Supplementary Methods). Only the 2,436 gene orthogroups conserved in all species (n = 20) were considered. P-value significance levels are defined as follows: **** = p-value ≤ 0.0001, *** = p-value ≤ 0.001, ** = p-value ≤ 0.01, * = p-value ≤ 0.05. The boxplot features are defined as follows: the center line represents the median; the lower and upper hinges correspond to the 25th and 75th percentiles; the lower and upper whiskers extend respectively to the lowest and highest points, to a limit of 1.5 multiplied by the interquartile range from the closest hinge. c. Schematic of the procedure adopted to associate all tissue-specific genes in each species (Tau ≥ 0.75) with the tissue(s) in which they show tissue-specificity. This association (which we also evaluated for non-tissue-specific genes) will be considered for the inference of tissue-specificity gains (Extended Data Fig. 5). Additionally, we identified the top tissue(s) (that is, the tissue(s) with the highest expression) for all bilaterian-conserved genes, which will be considered for the selection of the best-TS orthogroups and the inference of tissue-specificity losses in each tissue (panel d and Extended Data Fig. 5c, respectively). d. Schematic and relative example for the selection of the best-TS ortholog in each species (see Supplementary Methods). Animal silhouettes were downloaded from http://phylopic.org/.
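
As an illustration of the Tau ≥ 0.75 cutoff and the "top tissue(s)" notion used in panel c, the following is a minimal sketch of the Tau index as it is commonly defined (Yanai et al., ref. 35). The function names, the example values and the choice to apply Tau directly to untransformed expression values are assumptions made for illustration; the exact procedure used by the authors is described in their Supplementary Methods and may differ.

```python
def tau_index(expression):
    """Tau tissue-specificity index for one gene.

    `expression` holds one non-negative expression value per tissue.
    Tau is 0 for uniform expression and 1 for expression in a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs at least two tissues")
    x_max = max(expression)
    if x_max == 0:
        return 0.0  # gene not expressed anywhere: treat as non-specific
    return sum(1 - x / x_max for x in expression) / (n - 1)


def top_tissues(expression_by_tissue):
    """Return the tissue(s) with the highest expression (ties are kept)."""
    x_max = max(expression_by_tissue.values())
    return [t for t, x in expression_by_tissue.items() if x == x_max]


# Hypothetical gene with expression restricted to one tissue.
gene = {"neural": 0.0, "testis": 10.0, "muscle": 0.0, "ovary": 0.0}
tau = tau_index(list(gene.values()))
is_tissue_specific = tau >= 0.75  # cutoff quoted in the caption
print(tau, is_tissue_specific, top_tissues(gene))
```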

Extended Data Fig. 3 Partial conservation of tissue-specific expression profiles among ancestral bilaterian genes.

a. Coordinates of the second (PC2; x axis) and third (PC3; y axis) components of a PCA performed on the best-ancestral orthogroups normalized gene expression matrix. Only the 2,436 best-ancestral orthogroups conserved in all species were considered. Tissue identity is represented by colors and species by shape. The left panel shows all tissues, while the right panel highlights neural and testis samples compared to all others. Coordinate distributions of these three groups of meta-samples are shown on the side of the relative component. The percentage of variance explained by each PC is reported on the relative axis. b. Percentage of variance explained by the first 15 principal components from the PCA described in a. c. -log10(p-value) of two-sided ANOVA tests performed among the coordinates of the specified groups on each component. For the left panel (green) we tested if there was a significant difference between tissues or species groups. For the center and right panel (blue and orange) we tested if there was a significant difference between any query group (that is, column) versus all other collapsed groups. All tests were performed with the aov function in R, and p-values were Bonferroni corrected. d. Heatmap showing the clustering of tissues and species (rows) based on the expression across tissues of best-ancestral bilaterian-conserved orthogroups (columns). Expression values were z-scored across tissues of the same species in order to minimize the inter-species variability (see Supplementary Methods for the definition of the best-ancestral orthogroups z-scored expression matrix). Only the 2,436 best-ancestral orthogroups conserved in all species were considered. The heatmap was generated by the pheatmap function in R with ward.D2 clustering method. Tissue colors refer to panel a.
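
The z-scoring across tissues of the same species mentioned in panel d can be sketched as follows. This is a minimal illustration only: the function name and example values are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not state which estimator was used.

```python
from statistics import mean, stdev


def zscore_across_tissues(values):
    """Z-score one gene's expression across the tissues of a single species.

    Assumes the sample standard deviation (n - 1 denominator); a gene with
    identical expression in all tissues is mapped to zeros.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Hypothetical expression of one orthogroup in three tissues of one species.
print(zscore_across_tissues([1.0, 2.0, 3.0]))
```

Scaling within each species before comparing species removes differences in overall expression level, which is the stated purpose of the z-scoring in the caption.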

Extended Data Fig. 4 Expression profiles of ancestral bilaterian tissue-specific expression modules.

a-f. Coordinates of components returned by a sparse partial least squares discriminant analysis (sPLS-DA) run separating the meta-samples of each tissue group (depicted with the relative colors) from all the others (grey). All 7,178 best-ancestral orthogroups were considered. The loadings of these components will be used to define the ancestral bilaterian tissue-specific modules (see Fig. 2a,b). The percentage of variance explained by each component is reported on the relative axis. g-l. Expression profiles across tissues of best-ancestral orthogroups in the ancestral tissue-specific modules (see Fig. 2c,d for neural and testis modules). (g) ovary module (n = 42); (h) muscle module (n = 112); (i) excretory module (n = 29); (j) epidermis module (n = 17); (k) digestive module (n = 51); (l) adipose module (n = 6). Expression values were first z-scored by species, and each dot represents the median expression among vertebrates, insects or outgroups. The boxplot features are defined as follows: the center line represents the median; the lower and upper hinges correspond to the 25th and 75th percentiles; the lower and upper whiskers extend respectively to the lowest and highest points, to a limit of 1.5 multiplied by the interquartile range from the closest hinge.

Extended Data Fig. 5 Examples and criteria for phylogenetic inferences of tissue-specificity gains and losses.

a. Examples and criteria for the inference of tissue-specificity gains on either the deuterostome or protostome branches with the strict approach (left panel) and the relaxed approach (right panel). b. Example and criteria for the inference of ancestral bilaterian tissue-specificity. c. Examples and criteria for the inference of tissue-specificity losses. NB: the best-TS orthogroups are the ones considered for all inferences of tissue-specificity gains and losses (see Methods and Extended Data Fig. 2c,d).

Extended Data Fig. 6 Validation of inferred tissue-specificity gains.

a-g. Orthogonal validation of all the inferred tissue-specificity gains in each tissue for which we could implement an OUs comparison method (see Supplementary Methods and Supplementary Discussion). The first bar always corresponds to the selected tissue-specificity gains (TS gains), while the second and third bars represent control sets (of the same size as the test set) sampled from either all best-ancestral orthogroups (BA) or best-ancestral orthogroups without tissue-specificity gains (BA no TS), to which we randomly assigned the tissue-specificity labels of the corresponding test set (see Methods). Left barplot: proportions of orthogroups based on the OU model (either a double-optima or a single-optimum) that better fits the relative expression levels. The double-optima OU model postulates different expression optima for the species with and without tissue-specificity, where the latter also include all species with losses. Right barplot: proportions of orthogroups better fitting a double-optima OU model (in red on the left barplot) depending on whether the species with tissue-specificity show higher/lower average relative expression compared to species without (TS greater/lower, respectively).

Extended Data Fig. 7 Extra statistics of tissue-specificity gains and losses.

a. Barplots representing the number of inferred tissue-specificity gains (left) and losses (right) across all nodes/species (rows) and tissues (columns). Best-TS, bilaterian-conserved orthogroups were considered for these inferences. b. Proportion of tissue-specificity gains in each node/species occurring in best-TS orthogroups that include 2R-ohnologs. Deuterostome nodes/species are distinguished between those diverging before (transparent color) or after (full color) the two rounds of vertebrate WGDs. The black line represents the proportion of 2R-ohnologs across all tissue-specificity gains. c. Proportions of duplicated (that is, with at least one paralog) or non-duplicated (that is, single-copy) genes with tissue-specific, species-specific gains in all species. The background line represents the overall proportion of duplicated genes in each species. d,e. Same data represented in Fig. 4f, but plotted separately across all nodes (d) and species (e). NB: Bilaterian "gains" indicate ancestral bilaterian tissue-specificity, which might have been acquired either in the last bilaterian ancestor or previously in evolution. Abbreviations: Euarch: Euarchontoglires.

Extended Data Fig. 8 Expression profiles of tissue-specific genes compared to non-tissue-specific orthologs and paralogs.

a. Barplot: proportions of duplicated (that is, with at least one paralog) or non-duplicated (that is, single-copy) tissue-specific genes in each species. Boxplot: proportions of duplicated tissue-specific genes in each species upon the ten randomizations of the original orthogroups (see Methods). The asterisks indicate a significant difference (one-sided binomial test, alternative = "less"; p-value ≤ 0.05) between the observed proportion of duplicated tissue-specific genes and the median of such proportions coming from the randomization trials. The background line represents the overall proportion of duplicated genes in each species. The boxplot features are defined as follows: the center line represents the median; the lower and upper hinges correspond to the 25th and 75th percentiles; the lower and upper whiskers extend respectively to the lowest and highest points, to a limit of 1.5 multiplied by the interquartile range from the closest hinge. Outlier points are plotted individually. The total number of considered genes is reported above each species' bar. b. Scheme illustrating how tissue-specific expression can be gained following gene duplication and specialization. Color dots indicate expression in the relative tissue, white dots represent lack of expression. c. For each tissue, median gene expression in each bilaterian-conserved orthogroup for species possessing at least one tissue-specific and one non-tissue-specific gene. Expression of tissue-specific genes is plotted on the left, while expression of their non-tissue-specific paralogs is shown on the right. Each data point in each tissue's boxplot is the median of the relative expression in that tissue for all corresponding genes and species. The total number of considered genes is reported in the relative plot. See panel a for description of boxplot features. d. Median gene expression across tissues for bilaterian-conserved orthogroups with tissue-specific gains in each tissue. Left: best-TS orthologs of the species with tissue-specificity. Right: best-TS orthologs in the other species. Each data point in each tissue's boxplot is the median of the relative expression in that tissue for all corresponding genes and species. See panel a for description of boxplot features. Distributions for gains within single nodes/species are available in the Supplementary Dataset.
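
The one-sided binomial test (alternative = "less") cited in panel a can be sketched as below. This is a hedged illustration rather than the authors' code: it assumes the test compares the observed count of duplicated tissue-specific genes against a null probability set to the median proportion from the randomizations, and the function name and example counts are hypothetical.

```python
from math import comb


def binom_test_less(k, n, p):
    """One-sided binomial p-value, P(X <= k) for X ~ Binomial(n, p).

    k: observed number of successes (here, duplicated tissue-specific genes)
    n: number of trials (all tissue-specific genes in the species)
    p: null success probability (here, the randomized median proportion)
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical counts: 40 duplicated genes out of 100, null proportion 0.5.
p_value = binom_test_less(40, 100, 0.5)
print(p_value, p_value <= 0.05)
```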

Extended Data Fig. 9 Divergent and convergent evolution of tissue-specificity gains.

a. Alluvial plot representing the best-TS, bilaterian-conserved orthogroups with tissue-specificity gains in distinct tissues between deuterostome (left) or protostome (right) nodes and species. Only orthogroups with gains in exclusively one tissue on each branch were considered. b. Number of parallel tissue-specificity gains between the deuterostome and protostome branch for all pairs of tissues represented in panel a. c. Plot from a Gene Set Enrichment Analysis (GSEA) testing for over-representation of developmental categories (760 out of 5779) among categories with high proportions of orthogroups that undergo species-specific gains of tissue-specificity. Only categories including at least 10 gene orthogroups were considered. The shown p-value refers to GSEA. d. Proportions of developmental GO categories among the top 5% (that is, 95th percentile) of all GO categories ranked based on the proportions of their annotated orthogroups that undergo species-specific gains. The plotted values derive from 1,000 randomizations of the developmental labels among all GO categories, with the vertical dashed line corresponding to the observed proportion. Abbreviations: N: neural, T: testis, O: ovary, M: muscle, X: excretory, E: epidermis, D: digestive, A: adipose, NES: normalized enrichment score.
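
The label randomization described in panel d can be sketched as follows. This is a minimal illustration under stated assumptions: each GO category is represented by a score (the proportion of its orthogroups with species-specific gains) and a developmental flag, the top 5% is taken by rank, and all names and toy data are hypothetical rather than taken from the authors' code.

```python
import random


def top_fraction_developmental(scores, is_dev, top=0.05):
    """Proportion of developmental categories among the top-ranked ones."""
    n_top = max(1, int(len(scores) * top))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(is_dev[i] for i in ranked[:n_top]) / n_top


def randomized_proportions(scores, is_dev, n_rand=1000, seed=0):
    """Shuffle the developmental labels n_rand times and recompute the proportion."""
    rng = random.Random(seed)
    labels = list(is_dev)
    results = []
    for _ in range(n_rand):
        rng.shuffle(labels)
        results.append(top_fraction_developmental(scores, labels))
    return results


# Toy data: 100 categories, the 5 highest-scoring ones flagged as developmental.
scores = list(range(100))
is_dev = [i >= 95 for i in range(100)]
observed = top_fraction_developmental(scores, is_dev)
null = randomized_proportions(scores, is_dev)
# Fraction of randomizations reaching the observed value (empirical p-value).
print(observed, sum(v >= observed for v in null) / len(null))
```

Shuffling the labels while keeping the scores fixed preserves the number of developmental categories, so the null distribution reflects only the chance placement of those labels among the top-ranked categories.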

Extended Data Fig. 10 Developmental and adult expression of FGF17 in mammalian species.

a-f: Expression values (RPKMs) for human FGF17 (a) and its orthologs in five mammalian species (b-f) across several developmental and adult timepoints in seven tissues. Data from ref. 18.

Supplementary information

Supplementary Information

Supplementary Methods, Discussion, Figs. 1-12, Description of Supplementary Dataset content and References.

Supplementary Data

Supplementary Data 1. Annotation corrections. Catalogue of corrected broken genes, resolved and unresolved chimaeric genes, together with the correspondences between original and new GeneIDs.

Supplementary Data 2. Gene orthogroups statistics. Statistics of protein-coding genes from each species included in the gene orthogroups before and after correction and filtering, together with the statistics of bilaterian-conserved orthogroups. The 2R-ohnologue groups from ref. 6 used for one of the correction steps are reported. Original and corrected gene orthogroups files are provided in the Supplementary Dataset, together with separate files for bilaterian-conserved orthogroups.

Supplementary Data 3. RNA-seq metadata information. For each species and tissue, this includes a list of all samples, sample to metasample correspondence, sample origin (that is, public or in-house), SRA identifier, BioProject identifier, read number, read length, sequencing strategy (that is, paired or single end), sequencing technology and mapping statistics.

Supplementary Data 4. Ancestral bilaterian tissue-specific modules. A list of orthogroup IDs and human and fruit fly best-hit gene symbols for all the orthogroups included in each module.

Supplementary Data 5. Neural and testis phenotypes in human, mouse and fruit fly associated with neural and testis ancestral bilaterian tissue-specific modules. Neural phenotypes were defined as anything matching 'neuro', 'behaviour', 'brain', 'glia' or 'CNS' (case insensitive), while testis phenotypes were defined as anything matching 'sperm', 'infert', 'sterile' or 'testis' (case insensitive).

Supplementary Data 6. GO enrichments for each bilaterian ancestral tissue-specific module as provided by gprofiler2 and using the GO transfers derived from the human GO annotation. All P values were FDR corrected.

Supplementary Data 7. GO enrichments for each bilaterian ancestral tissue-specific module as provided by gprofiler2 and using the GO transfers derived from the fruit fly GO annotation. All P values were FDR corrected.

Supplementary Data 8. GO enrichments for non-tissue-specific orthogroups as provided by gprofiler2. All P values were FDR corrected.

Supplementary Data 9. Tissue-specificity gains and losses throughout the phylogeny. For each tissue, a list of inferences is provided that includes the orthogroup ID, the relative node/species, the inference type (that is, gain or loss) and the inference criteria (that is, strict, relaxed or merged).

Supplementary Data 10. GO enrichments for orthogroups with tissue-specific gains in each node and species, as provided by gprofiler2 and using the GO transfers derived from the human GO annotation. We grouped GO enrichments from tissue-specific gains in the same tissue, adding a column reporting the relative node/species. All P values were FDR corrected.

Supplementary Data 11. GO enrichments for orthogroups with tissue-specific gains in Vertebrata or more recent nodes and vertebrate species, as provided by gprofiler2 and using the GO transfers derived from the vertebrate-specific GO annotation. We grouped GO enrichments from tissue-specific gains in the same tissue, adding a column reporting the relative node/species. All P values were FDR corrected.

Supplementary Data 12. GO enrichments for orthogroups with tissue-specific gains in Insecta or more recent nodes and insect species, as provided by gprofiler2 and using the GO transfers derived from the insect-specific GO annotation. We grouped GO enrichments from tissue-specific gains in the same tissue, adding a column reporting the relative node/species. All P values were FDR corrected.

Supplementary Data 13. GO enrichments for orthogroups with tissue-specific gains (as provided by gprofiler2 and using the GO transfers derived from the human GO annotation) uniquely enriched in each node and species compared with all other gains in the same tissue. We grouped GO enrichments from tissue-specific gains in the same tissue, adding a column reporting the relative node/species. All P values were FDR corrected.
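The keyword-based phenotype definitions given for Supplementary Data 5 could be expressed roughly as in the Python sketch below. The keyword lists are those quoted in the description; the function name and the example phenotype strings are hypothetical, and treating "matching" as a case-insensitive substring search is an assumption.

```python
import re

# Keyword lists quoted in the description of Supplementary Data 5.
NEURAL = re.compile(r"neuro|behaviour|brain|glia|CNS", re.IGNORECASE)
TESTIS = re.compile(r"sperm|infert|sterile|testis", re.IGNORECASE)


def classify_phenotype(phenotype: str) -> set[str]:
    """Return the labels whose keywords occur anywhere in the phenotype text."""
    labels = set()
    if NEURAL.search(phenotype):
        labels.add("neural")
    if TESTIS.search(phenotype):
        labels.add("testis")
    return labels


# Hypothetical phenotype descriptions for illustration only.
examples = [
    "Abnormal Brain morphology",
    "male infertility",
    "reduced body size",
]
for text in examples:
    print(text, "->", sorted(classify_phenotype(text)))
```

With a literal substring search of this kind, a phenotype spelled "behavior" would not match the 'behaviour' keyword, so spelling variants in the source databases would need to be handled separately.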

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mantica, F., Iñiguez, L.P., Marquez, Y. et al. Evolution of tissue-specific expression of ancestral genes across vertebrates and insects. Nat Ecol Evol (2024). https://doi.org/10.1038/s41559-024-02398-5
