The FASTQ and processed files of the RNA-seq samples generated for this project are available at GEO under series GSE205498. The Supplementary Dataset is available via Mendeley Data at https://doi.org/10.17632/22m3dwhzk6.2 (ref. 67). All code used for analysis and figure generation is available on GitHub at https://github.com/fedemantica/bilaterian_GE (ref. 68).

Evans, S. D., Hughes, I. V., Gehling, J. G. & Droser, M. L. Discovery of the oldest bilaterian from the Ediacaran of South Australia. Proc. Natl Acad. Sci. USA 117, 7845-7850 (2020).
Brusca, R. C., Moore, W. & Shuster, S. M. Invertebrates 345-372 (Sinauer Associates, 2016).
Paps, J. & Holland, P. W. H. Reconstruction of the ancestral metazoan genome reveals an increase in genomic novelty. Nat. Commun. 9, 1730 (2018).
Fernández, R. & Gabaldón, T. Gene gain and loss across the metazoan tree of life. Nat. Ecol. Evol. 4, 524-533 (2020).
Lopez-Bigas, N., De, S. & Teichmann, S. A. Functional protein divergence in the evolution of Homo sapiens. Genome Biol. 9, R33 (2008).
Davidson, E. H. & Erwin, D. H. Gene regulatory networks and the evolution of animal body plans. Science 311, 796-800 (2006).
King, M.-C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107-116 (1975).
Carroll, S. B. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134, 25-36 (2008).
True, J. R. & Carroll, S. B. Gene co-option in physiological and morphological evolution. Annu. Rev. Cell Dev. Biol. 18, 53-80 (2002).
Arntfield, M. E. & van der Kooy, D. β-Cell evolution: how the pancreas borrowed from the brain: the shared toolbox of genes expressed by neural and pancreatic endocrine cells may reflect their evolutionary relationship. Bioessays 33, 582-587 (2011).
Almudi, I. et al. Genomic adaptations to aquatic and aerial life in mayflies and the origin of insect wings. Nat. Commun. 11, 2631 (2020).
Clark-Hachtel, C. M. & Tomoyasu, Y. Two sets of candidate crustacean wing homologues and their implication for the origin of insect wings. Nat. Ecol. Evol. 4, 1694-1702 (2020).
Bruce, H. S. & Patel, N. H. Knockout of crustacean leg patterning genes suggests that insect wings and body walls evolved from ancient leg segments. Nat. Ecol. Evol. 4, 1703-1712 (2020).
Martín-Durán, J. M. et al. Convergent evolution of bilaterian nerve cords. Nature 553, 45-50 (2018).
Thomas, J. A., Welch, J. J., Lanfear, R. & Bromham, L. A generation time effect on the rate of molecular evolution in invertebrates. Mol. Biol. Evol. 27, 1173-1180 (2010).
Wyder, S., Kriventseva, E. V., Schröder, R., Kadowaki, T. & Zdobnov, E. M. Quantification of ortholog losses in insects and vertebrates. Genome Biol. 8, R242 (2007).
Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343-348 (2011).
Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature 571, 505-509 (2019).
Chen, J. et al. A quantitative framework for characterizing the evolutionary history of mammalian gene expression. Genome Res. 29, 53-63 (2019).
Fukushima, K. & Pollock, D. D. Amalgamated cross-species transcriptomes reveal organ-specific propensity in gene expression evolution. Nat. Commun. 11, 4459 (2020).
Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587-1593 (2012).
Lê Cao, K.-A., Boitard, S. & Besse, P. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12, 253 (2011).
Burkhardt, P. & Sprecher, S. G. Evolutionary origin of synapses and neurons - bridging the gap. Bioessays 39, 1700024 (2017).
Sebé-Pedrós, A. et al. Cnidarian cell type diversity and regulation revealed by whole-organism single-cell RNA-seq. Cell 173, 1520-1534.e20 (2018).
Inaba, K. Sperm flagella: comparative and phylogenetic perspectives of protein components. Mol. Hum. Reprod. 17, 524-538 (2011).
Daldello, E. M., Luong, X. G., Yang, C.-R., Kuhn, J. & Conti, M. Cyclin B2 is required for progression through meiosis in mouse oocytes. Development 146, dev172734 (2019).
Li, J., Ouyang, Y.-C., Zhang, C.-H., Qian, W.-P. & Sun, Q.-Y. The cyclin B2/CDK1 complex inhibits separase activity in mouse oocyte meiosis I. Development 146, 648053 (2019).
Zeng, Y. et al. Bi-allelic mutations in MOS cause female infertility characterized by preimplantation embryonic arrest. Hum. Reprod. 37, 612-620 (2022).
Tay, J., Hodgman, R., Sarkissian, M. & Richter, J. D. Regulated CPEB phosphorylation during meiotic progression suggests a mechanism for temporal control of maternal mRNA translation. Genes Dev. 17, 1457-1462 (2003).
Gąsiorowski, L. et al. Molecular evidence for a single origin of ultrafiltration-based excretory organs. Curr. Biol. 31, 3629-3638.e2 (2021).
Thakurela, S. et al. Mapping gene regulatory circuitry of Pax6 during neurogenesis. Cell Discov. 2, 15045 (2016).
Eckler, M. J. & Chen, B. Fez family transcription factors: controlling neurogenesis and cell fate in the developing mammalian nervous system. Bioessays 36, 788-797 (2014).
Taylor, M. V. & Hughes, S. M. Mef2 and the skeletal muscle differentiation program. Semin. Cell Dev. Biol. 72, 33-44 (2017).
Mathiyalagan, N. et al. Meta-analysis of grainyhead-like dependent transcriptional networks: a roadmap for identifying novel conserved genetic pathways. Genes 10, 876 (2019).
Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650-659 (2005).
Roelofs, D. et al. Multi-faceted analysis provides little evidence for recurrent whole-genome duplications during hexapod evolution. BMC Biol. 18, 57 (2020).
Marlétaz, F. et al. Amphioxus functional genomics and the origins of vertebrate gene regulation. Nature 564, 64-70 (2018).
Oji, A. et al. Tesmin, metallothionein-like 5, is required for spermatogenesis in mice. Biol. Reprod. 102, 975-983 (2020).
Jiang, J., Benson, E., Bausek, N., Doggett, K. & White-Cooper, H. Tombola, a tesmin/TSO1-family protein, regulates transcriptional activation in the Drosophila male germline and physically interacts with always early. Development 134, 1549-1559 (2007).
Hines, J. H. Evolutionary origins of the oligodendrocyte cell type and adaptive myelination. Front. Neurosci. 15, 757360 (2021).
Ramirez, M. D. & Oakley, T. H. Eye-independent, light-activated chromatophore expansion (LACE) and expression of phototransduction genes in the skin of Octopus bimaculoides. J. Exp. Biol. 218, 1513-1520 (2015).
Iram, T. et al. Young CSF restores oligodendrogenesis and memory in aged mice via Fgf17. Nature 605, 509-515 (2022).
Hartenstein, V. & Martinez, P. Structure, development and evolution of the digestive system. Cell Tissue Res. 377, 289-292 (2019).
Ottaviani, E., Malagoli, D. & Franceschi, C. The evolution of the adipose tissue: a neglected enigma. Gen. Comp. Endocrinol. 174, 1-4 (2011).
Kryuchkova-Mostacci, N. & Robinson-Rechavi, M. Tissue-specificity of gene expression diverges slowly between orthologs, and rapidly between paralogs. PLoS Comput. Biol. 12, e1005274 (2016).
Lien, S. et al. The Atlantic salmon genome provides insights into rediploidization. Nature 533, 200-205 (2016).
Fernández, R. et al. Selection following gene duplication shapes recent genome evolution in the pea aphid Acyrthosiphon pisum. Mol. Biol. Evol. 37, 2601-2615 (2020).
Farré, D. & Albà, M. M. Heterogeneous patterns of gene-expression diversification in mammalian gene duplicates. Mol. Biol. Evol. 27, 325-335 (2010).
Clark, J. W. & Donoghue, P. C. J. Constraining the timing of whole genome duplication in plant evolutionary history. Proc. Biol. Sci. 284, 20170912 (2017).
Macqueen, D. J. & Johnston, I. A. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proc. Biol. Sci. 281, 20132881 (2014).
Donoghue, P. C. J. & Purnell, M. A. Genome duplication, extinction and vertebrate evolution. Trends Ecol. Evol. 20, 312-319 (2005).
Almudí, I. & Pascual-Anaya, J. in Old Questions and Young Approaches to Animal Evolution (eds Martín-Durán, J. M. & Vellutini, B. C.) 107-132 (Springer, 2019).
Derelle, R., Philippe, H. & Colbourne, J. K. Broccoli: combining phylogenetic and network analyses for orthology assignment. Mol. Biol. Evol. 37, 3389-3396 (2020).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525-527 (2016).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Tapial, J. et al. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res. 27, 1759-1768 (2017).
Rohart, F., Gautier, B., Singh, A. & Lê Cao, K.-A. mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017).
Kolberg, L., Raudvere, U., Kuzmin, I., Vilo, J. & Peterson, H. gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler. F1000Research 9, ELIXIR-709 (2020).
Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE 6, e21800 (2011).
Cunningham, F. et al. Ensembl 2022. Nucleic Acids Res. 50, D988-D995 (2022).
Gramates, L. S. et al. FlyBase: a guided tour of highlighted features. Genetics 220, iyac035 (2022).
Jin, L. et al. A pig BodyMap transcriptome reveals diverse tissue physiologies and evolutionary dynamics of transcription. Nat. Commun. 12, 3715 (2021).
Wang, Z.-Y. et al. Transcriptome and translatome co-evolution in mammals. Nature 588, 642-647 (2020).
Guschanski, K., Warnefors, M. & Kaessmann, H. The evolution of duplicate gene expression in mammalian organs. Genome Res. 27, 1461-1474 (2017).
Touceda-Suárez, M. et al. Ancient genomic regulatory blocks are a source for regulatory gene deserts in vertebrates after whole-genome duplications. Mol. Biol. Evol. 37, 2857-2864 (2020).
Korotkevich, G. et al. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2021).
Mantica, F. & Irimia, M. Pervasive evolution of tissue-specificity of ancestral genes differentially shaped vertebrates and insects, V2. Mendeley Data https://doi.org/10.17632/22m3dwhzk6.2 (2023).
fedemantica. bilaterian_GE. GitHub https://github.com/fedemantica/bilaterian_GE (2023).
Kumar, S. et al. TimeTree 5: an expanded resource for species divergence times. Mol. Biol. Evol. 39, msac174 (2022).
We thank Q. T. Ramon for the original drawing of tissue icons; N. Arecco, N. B. Morais, A. Sebé-Pedrós and D. Weghorn for critical feedback on the manuscript; and the CRG Genomics Unit for the RNA sequencing. This research was funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (ERC-StG-LS2-637591 and ERCCoG-LS2-101002275 to M.I.), by the Spanish Ministry of Economy and Competitiveness (BFU-2017-89201-P and PID2020-115040GB-I00 to M.I.) and by the 'Centro de Excelencia Severo Ochoa 2013-2017' (SEV-2012-0208). F.M. holds an FPI fellowship associated with the grant BFU-2017-89201-P. Additional support for this research was provided by the Spanish MINECO (PGC2018-098427-B-I00 to D.M. and X.F.-M.), the Czech Science Foundation (22-21244S to M.N.), the Australian Research Council (grant DP200103219 to P.D.C. and F.T.) and the National Institutes of Health-NIAID (grant R21AI167849 to F.G.N.).

Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain: Federica Mantica, Luis P. Iñiguez, Yamile Marquez, Jon Permanyer, Antonio Torres-Mendez, Demian Burguera & Manuel Irimia
Universitat Pompeu Fabra, Barcelona, Spain: Federica Mantica & Manuel Irimia
Institute of Evolutionary Biology (IBE, CSIC-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain: Josefa Cruz, Xavier Franch-Marro & David Martin
Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia: Frank Tulenko & Peter D. Currie
Sorbonne Université, CNRS, Biologie Intégrative des Organismes Marins; BIOM, Banyuls-sur-Mer, France: Stephanie Bertrand & Hector Escriva
Centre for Ecology and Conservation, University of Exeter, Penryn, UK: Toby Doyle & Karl R. Wotton
Institute of Parasitology, CAS, České Budějovice, Czech Republic: Marcela Nouzova
EMBL Australia; Victorian Node, Monash University, Clayton, Victoria, Australia: Peter D. Currie
Biology and BSI, Florida International University, Miami, FL, USA: Fernando G. Noriega
Department of Parasitology, University of South Bohemia, České Budějovice, Czech Republic: Fernando G. Noriega
Stazione Zoologica Anton Dohrn, Napoli, Italy: Maria Ina Arnone
Eugene Bell Center for Regenerative Biology and Tissue Engineering, Marine Biological Laboratory, Woods Hole, MA, USA: Caroline B. Albertin
Department of Genetics, Microbiology and Statistics and IRBio, Universitat de Barcelona, Barcelona, Spain: Isabel Almudi
ICREA, Barcelona, Spain: Manuel Irimia

F.M. performed most analyses and generated most figures and tables. L.P.I. built the motif dataset, designed and performed all motif-related analysis, and contributed to intellectual discussion. Y.M. and A.T.-M. performed additional analyses and contributed to intellectual discussion. J.P., A.T.-M., J.C., X.F.-M., F.T., D.B., S.B., T.D., M.N., P.D.C., F.G.N., H.E., M.I.A., C.B.A., K.R.W., I.A. and D.M. contributed RNA and/or tissue samples. F.M. and M.I. wrote the manuscript.

The authors declare no competing interests.

Nature Ecology & Evolution thanks Marie Sémon, Emily Wong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

a. Schematic representation of broken (left) and chimeric (right) genes and how they potentially influence gene orthology inferences. Animal silhouettes were downloaded from http://phylopic.org/. Credits to Gareth Monger for the hoverfly icon (https://creativecommons.org/licenses/by/3.0/). b. Examples of a broken (left) and a chimeric (right) gene corrected in the silkworm gene annotation. c. Statistics of corrected and unresolved broken and chimeric genes across all species. d.
Results from a BUSCO run (options -m proteins -l metazoa_odb10) assessing the status of 954 metazoa single-copy orthologs in the proteomes of all the species. CS: complete and single-copy, CD: complete and duplicated, F: fragmented, M: missing. e. Barplot representing the number of bilaterian-conserved (red) or more recent (grey) protein-coding genes across all species. The line plot represents the number of bilaterian-conserved orthogroups (OGs; that is, orthogroups conserved in at least 12 species) in which genes from each species are represented. f. Proportions of bilaterian-conserved orthogroups based on the number of species in which they are conserved.

a. Schematic and relative example for the selection of bilaterian-conserved, best-ancestral orthologs in each species and tissue (see Supplementary Methods). b. Distributions of Pearson's correlation coefficients from all intra-tissue, species pairwise comparisons of gene expression upon distinct procedures for paralog selection and gene expression quantification. The expression measure for each species in each orthogroup corresponds to the expression of its best-ancestral ortholog (Best_anc), the average expression among all its paralogs (Average), the summed expression among all its paralogs (Summed) and the expression of a randomly selected paralog (Random). Significance levels of two-sided Wilcoxon rank-sum tests comparing the Best_anc distribution to each of the others are reported at the top, while the median value of each distribution is printed at the bottom. Correlations are performed on z-scored expression matrices (see Supplementary Methods). Only the 2,436 gene orthogroups conserved in all species (n = 20) were considered. P-value significance levels are defined as follows: **** = p-value ≤ 0.0001, *** = p-value ≤ 0.001, ** = p-value ≤ 0.01, * = p-value ≤ 0.05.
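The asterisk notation defined in this caption can be illustrated with a minimal Python sketch. The function name `significance_stars` and the `"ns"` label for p-values above 0.05 are assumptions made for illustration; the caption defines only the four thresholds.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisk notation used in the caption.

    Thresholds follow the caption: **** <= 0.0001, *** <= 0.001,
    ** <= 0.01, * <= 0.05. The "ns" label is an assumption.
    """
    thresholds = [
        (0.0001, "****"),
        (0.001, "***"),
        (0.01, "**"),
        (0.05, "*"),
    ]
    # Check the strictest cutoff first so the most stars win.
    for cutoff, stars in thresholds:
        if p_value <= cutoff:
            return stars
    return "ns"  # hypothetical label, not defined in the caption
```

The strictest threshold is tested first, so a p-value that satisfies several cutoffs receives the largest number of asterisks.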
The boxplot features are defined as follows: the center line represents the median; the lower and upper hinges correspond to the 25th and 75th percentiles; the lower and upper whiskers extend respectively to the lowest and highest points, to a limit of 1.5 multiplied by the interquartile range from the closest hinge. c. Schematic of the procedure adopted to associate all tissue-specific genes in each species (Tau ≥ 0.75) with the tissue(s) with tissue-specificity. This association (which we also evaluated for non-tissue-specific genes) will be considered for the inference of tissue-specificity gains (Extended Data Fig. 5). Additionally, we identified the top tissue(s) (that is, the tissue(s) with the highest expression) for all bilaterian-conserved genes, which will be considered for the selection of the best-TS orthogroups and the inference of tissue-specificity losses in each tissue (panel d and Extended Data Fig. 5c, respectively). d. Schematic and relative example for the selection of the best-TS ortholog in each species (see Supplementary Methods). Animal silhouettes were downloaded from http://phylopic.org/.

a. Coordinates of the second (PC2; x axis) and third (PC3; y axis) components of a PCA performed on the best-ancestral orthogroups normalized gene expression matrix. Only the 2,436 best-ancestral orthogroups conserved in all species were considered. Tissue identity is represented by colors and species by shape. The left panel shows all tissues, while the right panel highlights neural and testis samples compared to all others. Coordinate distributions of these three groups of meta-samples are shown on the side of the relative component. The percentage of variance explained by each PC is reported on the relative axis. b. Percentage of variance explained by the first 15 principal components from the PCA described in a. c. -log10(p-value) of two-sided ANOVA tests performed among the coordinates of the specified groups on each component.
For the left panel (green) we tested if there was a significant difference between tissues or species groups. For the center and right panel (blue and orange) we tested if there was a significant difference between any query group (that is, column) versus all other collapsed groups. All tests were performed with the aov function in R, and p-values were Bonferroni corrected. d. Heatmap showing the clustering of tissues and species (rows) based on the expression across tissues of best-ancestral bilaterian-conserved orthogroups (columns). Expression values were z-scored across tissues of the same species in order to minimize the inter-species variability (see Supplementary Methods for the definition of the best-ancestral orthogroups z-scored expression matrix). Only the 2,436 best-ancestral orthogroups conserved in all species were considered. The heatmap was generated by the pheatmap function in R with ward.D2 clustering method. Tissue colors refer to panel a.

a-f. Coordinates of components returned by a sparse partial least squares discriminant analysis (sPLS-DA) run separating the meta-samples of each tissue group (depicted with the relative colors) from all the others (grey). All 7,178 best-ancestral orthogroups were considered. The loadings of these components will be used to define the ancestral bilaterian tissue-specific modules (see Fig. 2a,b). The percentage of variance explained by each component is reported on the relative axis. g-l: Expression profiles across tissues of best-ancestral orthogroups in the ancestral tissue-specific modules (see Fig. 2c,d for neural and testis modules). (g) ovary module (n = 42); (h) muscle module (n = 112); (i) excretory module (n = 29); (j) epidermis module (n = 17); (k) digestive module (n = 51); (l) adipose module (n = 6). Expression values were first z-scored by species, and each dot represents the median expression among vertebrates, insects or outgroups.
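The two steps named in the last sentence, z-scoring expression within each species and then taking a median across the species of a clade, can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, median, stdev


def z_score(values):
    """Z-score one species' expression values across its tissues."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation
    if sd == 0:
        # A flat profile carries no tissue signal; return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def clade_median_profile(species_profiles):
    """Median z-scored expression per tissue across the species of one clade.

    species_profiles: one list per species, with tissues in the same order.
    """
    z_profiles = [z_score(profile) for profile in species_profiles]
    # zip(*...) regroups the values tissue by tissue.
    return [median(tissue_values) for tissue_values in zip(*z_profiles)]
```

Z-scoring within each species before pooling puts species with different overall expression scales on a common footing, which is the stated purpose of the normalization.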
The boxplot features are defined as follows: the center line represents the median; the lower and upper hinges correspond to the 25th and 75th percentiles; the lower and upper whiskers extend respectively to the lowest and highest points, to a limit of 1.5 multiplied by the interquartile range from the closest hinge.

a. Examples and criteria for the inference of tissue-specificity gains on either the deuterostome or protostome branches with the strict approach (left panel) and the relaxed approach (right panel). b. Example and criteria for the inference of ancestral bilaterian tissue-specificity. c. Examples and criteria for the inference of tissue-specificity losses. NB: the best-TS orthogroups are the ones considered for all inferences of tissue-specificity gains and losses (see Methods and Extended Data Fig. 2c,d).

a-g. Orthogonal validation of all the inferred tissue-specificity gains in each tissue for which we could implement an OUs comparison method (see Supplementary Methods and Supplementary Discussion). The first bar always corresponds to the selected tissue-specificity gains (TS gains), while the second and third bars represent control sets (of the same size as the test set) sampled from either all best-ancestral orthogroups (BA) or best-ancestral orthogroups without tissue-specificity gains (BA no TS), to which we randomly assigned the tissue-specificity labels of the corresponding test set (see Methods). Left barplot: proportions of orthogroups based on the OU model (either a double-optima or a single-optimum) that better fits the relative expression levels. The double-optima OU model postulates different expression optima for the species with and without tissue-specificity, where the latter also include all species with losses.
Right barplot: proportions of orthogroups better fitting a double-optima OU model (in red on the left barplot) depending on whether the species with tissue-specificity show higher/lower average relative expression compared to species without (TS greater/lower, respectively).

a. Barplots representing the number of inferred tissue-specificity gains (left) and losses (right) across all nodes/species (rows) and tissues (columns). Best-TS, bilaterian-conserved orthogroups were considered for these inferences. b. Proportion of tissue-specificity gains in each node/species occurring in best-TS orthogroups that include 2R-ohnologs. Deuterostome nodes/species are distinguished between those diverging before (transparent color) or after (full color) the two rounds of vertebrate WGDs. The black line represents the proportion of 2R-ohnologs across all tissue-specificity gains. c. Proportions of duplicated (that is, with at least one paralog) or non-duplicated (that is, single-copy) genes with tissue-specific, species-specific gains in all species. The background line represents the overall proportion of duplicated genes in each species. d,e. Same data represented in Fig. 4f, but plotted separately across all nodes (d) and species (e). NB: Bilaterian "gains" indicate ancestral bilaterian tissue-specificity, which might have been acquired either in the last bilaterian ancestor or previously in evolution. Abbreviations: Euarch: Euarchontoglires.

a. Barplot: proportions of duplicated (that is, with at least one paralog) or non-duplicated (that is, single-copy) tissue-specific genes in each species. Boxplot: proportions of duplicated tissue-specific genes in each species upon the ten randomizations of the original orthogroups (see Methods).
The asterisks indicate a significant difference (one-sided binomial test, alternative = "less"; p-value ≤ 0.05) between the observed proportion of duplicated tissue-specific genes and the median of such proportions coming from the randomization trials. The background line represents the overall proportion of duplicated genes in each species. The boxplot features are defined as follows: the center line represents the median; the lower and upper hinges correspond to the 25th and 75th percentiles; the lower and upper whiskers extend respectively to the lowest and highest points, to a limit of 1.5 multiplied by the interquartile range from the closest hinge. Outlier points are plotted individually. The total number of considered genes is reported above each species' bar. b. Scheme illustrating how tissue-specific expression can be gained following gene duplication and specialization. Color dots indicate expression in the relative tissue, white dots represent lack of expression. c. For each tissue, median gene expression in each bilaterian-conserved orthogroup for species possessing at least one tissue-specific and one non-tissue-specific gene. Expression of tissue-specific genes is plotted on the left, while expression of their non-tissue-specific paralogs is shown on the right. Each data point in each tissue's boxplot is the median of the relative expression in that tissue for all corresponding genes and species. The total number of considered genes is reported in the relative plot. See panel a for description of boxplot features. d. Median gene expression across tissues for bilaterian-conserved orthogroups with tissue-specific gains in each tissue. Left: best-TS orthologs of the species with tissue-specificity. Right: best-TS orthologs in the other species. Each data point in each tissue's boxplot is the median of the relative expression in that tissue for all corresponding genes and species. See panel a for description of boxplot features.
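The boxplot conventions stated in these captions (median line, hinges at the 25th and 75th percentiles, whiskers limited to 1.5 times the interquartile range from the closest hinge, outliers drawn individually) can be made concrete with a short Python sketch. The function name is hypothetical, and the quartile interpolation rule (`method="inclusive"`) is an assumption, since the captions do not state which rule the plotting software applied.

```python
from statistics import quantiles


def boxplot_features(values):
    """Compute the boxplot features described in the captions."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation between data points.
    lower_hinge, med, upper_hinge = quantiles(data, n=4, method="inclusive")
    iqr = upper_hinge - lower_hinge
    lower_limit = lower_hinge - 1.5 * iqr
    upper_limit = upper_hinge + 1.5 * iqr
    # Whiskers stop at the most extreme points still inside the limits.
    lower_whisker = min(v for v in data if v >= lower_limit)
    upper_whisker = max(v for v in data if v <= upper_limit)
    # Points beyond the limits are reported as individual outliers.
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "median": med,
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }
```

For the toy input `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]`, the value 100 lies beyond the upper limit and is returned as an outlier, while the upper whisker stops at 9.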
Distributions for gains within single nodes/species are available in the Supplementary Dataset.

a. Alluvial plot representing the best-TS, bilaterian-conserved orthogroups with tissue-specificity gains in distinct tissues between deuterostome (left) or protostome (right) nodes and species. Only orthogroups with gains in exclusively one tissue on each branch were considered. b. Number of parallel tissue-specificity gains between the deuterostome and protostome branch for all pairs of tissues represented in panel a. c. Plot from a Gene Set Enrichment Analysis (GSEA) testing for over-representation of developmental categories (760 out of 5779) among categories with high proportions of orthogroups that undergo species-specific gains of tissue-specificity. Only categories including at least 10 gene orthogroups were considered. The shown p-value refers to GSEA. d. Proportions of developmental GO categories among the top 5% (that is, 95th percentile) of all GO categories ranked based on the proportions of their annotated orthogroups that undergo species-specific gains. The plotted values derive from 1,000 randomizations of the developmental labels among all GO categories, with the vertical dashed line corresponding to the observed proportion. Abbreviations: N: neural, T: testis, O: ovary, M: muscle, X: excretory, E: epidermis, D: digestive, A: adipose, NES: normalized enrichment score.

a-f: Expression values (RPKMs) for human FGF17 (a) and its orthologs in five mammalian species (b-f) across several developmental and adult timepoints in seven tissues. Data from ref. 18.

Supplementary Methods, Discussion, Figs. 1-12, Description of Supplementary Dataset content and References.

Supplementary Data 1. Annotation corrections. Catalogue of corrected broken genes, resolved and unresolved chimaeric genes, together with the correspondences between original and new GeneIDs.

Supplementary Data 2. Gene orthogroups statistics.
Statistics of protein-coding genes from each species included in the gene orthogroups before and after correction and filtering, together with the statistics of bilaterian-conserved orthogroups. The 2R-ohnologue groups from ref. 6 used for one of the correction steps are reported. Original and corrected gene orthogroups files are provided in Supplementary Dataset, together with separate files for bilaterian-conserved orthogroups.

Supplementary Data 3. RNA-seq metadata information. For each species and tissue, including a list of all samples, sample to metasample correspondence, sample origin (that is, public or in-house), SRA identifier, BioProject identifier, read number, read length, sequencing strategy (that is, paired or single end), sequencing technology and mapping statistics.

Supplementary Data 4. Ancestral bilaterian tissue-specific modules. A list of orthogroups IDs, human and fruit fly best-hits gene symbols for all the orthogroups included in each module.

Supplementary Data 5. Neural and testis phenotypes in human, mouse and fruit fly associated with neural and testis ancestral bilaterian tissue-specific modules. Neural phenotypes were defined as anything matching 'neuro', 'behaviour', 'brain', 'glia' and 'CNS' (case insensitive), while testis phenotypes were defined as anything matching 'sperm', 'infert', 'sterile', 'testis' (case insensitive).

Supplementary Data 6. GO enrichments for each bilaterian ancestral tissue-specific module as provided by gprofiler2 and using the GO transfers derived from the human GO annotation. All P values were FDR corrected.

Supplementary Data 7. GO enrichments for each bilaterian ancestral tissue-specific module as provided by gprofiler2 and using the GO transfers derived from the fruit fly GO annotation. All P values were FDR corrected.

Supplementary Data 8. GO enrichments for non-tissue-specific orthogroups as provided by gprofiler2. All P values were FDR corrected.

Supplementary Data 9.
Tissue-specificity gains and losses throughout the phylogeny. For each tissue, a list of inferences is provided that includes the orthogroup ID, the corresponding node/species, the inference type (that is, gain or loss) and the inference criteria (that is, strict, relaxed or merged).
Supplementary Data 10. GO enrichments for orthogroups with tissue-specific gains in each node and species, as provided by gprofiler2 and using the GO transfers derived from the human GO annotation. We grouped GO enrichments from tissue-specific gains in the same tissue, adding a column reporting the corresponding node/species. All P values were FDR corrected.
Supplementary Data 11. GO enrichments for orthogroups with tissue-specific gains in Vertebrata or more recent nodes and vertebrate species, as provided by gprofiler2 and using the GO transfers derived from the vertebrate-specific GO annotation. We grouped GO enrichments from tissue-specific gains in the same tissue, adding a column reporting the corresponding node/species. All P values were FDR corrected.
Supplementary Data 12. GO enrichments for orthogroups with tissue-specific gains in Insecta or more recent nodes and insect species, as provided by gprofiler2 and using the GO transfers derived from the insect-specific GO annotation. We grouped GO enrichments from tissue-specific gains in the same tissue, adding a column reporting the corresponding node/species. All P values were FDR corrected.
Supplementary Data 13. GO enrichments for orthogroups with tissue-specific gains (as provided by gprofiler2 and using the GO transfers derived from the human GO annotation) uniquely enriched in each node and species compared with all other gains in the same tissue. We grouped GO enrichments from tissue-specific gains in the same tissue, adding a column reporting the corresponding node/species. All P values were FDR corrected.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Mantica, F., Iñiguez, L.P., Marquez, Y. et al. Evolution of tissue-specific expression of ancestral genes across vertebrates and insects. Nat Ecol Evol (2024). https://doi.org/10.1038/s41559-024-02398-5

Received: 02 August 2023. Accepted: 08 March 2024. Published: 15 April 2024.
Extended data
Extended Data Fig. 1 Gene annotation refinements and statistics of bilaterian-conserved orthogroups.
Extended Data Fig. 2 Definition of best-ancestral and best-TS orthogroups.
Extended Data Fig. 3 Partial conservation of tissue-specific expression profiles among ancestral bilaterian genes.
Extended Data Fig. 4 Expression profiles of ancestral bilaterian tissue-specific expression modules.
Extended Data Fig. 5 Examples and criteria for phylogenetic inferences of tissue-specificity gains and losses.
Extended Data Fig. 6 Validation of inferred tissue-specificity gains.
Extended Data Fig. 7 Extra statistics of tissue-specificity gains and losses.
Extended Data Fig. 8 Expression profiles of tissue-specific genes compared to non-tissue-specific orthologs and paralogs.
Extended Data Fig. 9 Divergent and convergent evolution of tissue-specificity gains.
Extended Data Fig. 10 Developmental and adult expression of FGF17 in mammalian species.