
Widespread HCD-tRNA derived SINEs in bivalves rely on multiple LINE partners and accumulate in genic regions

Abstract

Background

Short interspersed nuclear elements (SINEs) are non-autonomous non-LTR retrotransposons widespread across eukaryotes. They exist both as lineage-specific, fast-evolving elements and as ubiquitous superfamilies characterized by highly conserved domains (HCD). Several of these superfamilies have been described in bivalves; however, their overall distribution and impact on host genome evolution are still unknown due to the extreme scarcity of transposon libraries for the clade. In this study, we examined more than 40 bivalve genomes to uncover the distribution of HCD-tRNA-related SINEs, discover novel SINE-LINE partnerships, and understand their possible role in shaping bivalve genome evolution.

Results

We found that bivalve HCD SINEs have an ancient origin and can rely on at least four different LINE clades. Consistent with a “mosaic” evolutionary scenario, multiple LINE partners can promote the amplification of the same HCD SINE superfamily, while homologous LINE-derived tails are present across different superfamilies. Multiple SINEs were found to be highly similar between phylogenetically related species separated by extremely long evolutionary timescales, up to ~ 400 million years. Studying their genomic distribution in a subset of five species, we observed different patterns of SINE enrichment in various genomic compartments, as well as differences in the tendency of SINEs to form tandem-like and palindromic structures, including within intronic sequences. Despite these differences, we observed that SINEs, especially older ones, tend to accumulate preferentially within genes or in their close proximity, consistent with a model of survival bias for less harmful, short non-coding transposons in euchromatic genomic regions.

Conclusion

Here we conducted a wide characterization of tRNA-related SINEs in bivalves revealing their taxonomic distribution and LINE partnerships across the clade. Moreover, through the study of their genomic distribution in five species, we highlighted commonalities and differences with other previously studied eukaryotes, thus extending our understanding of SINE evolution across the tree of life.

Background

Bivalves (Class Bivalvia) are a rich and widespread clade of exclusively aquatic molluscs that diversified in the early Cambrian, more than 500 Mya [1]. This class includes multiple economically and ecologically important species. For example, they have colonized freshwater environments [2] and deep-sea vents multiple times during their evolutionary history [3]. They can be useful bioindicators for marine pollutants [4] and can serve as biological models for the study of adaptations to climate change [5], innate immunity [6], sex determination [7], longevity [8, 9] and mitochondrial biology [10]. Moreover, they are characterized by peculiar genomic features that have been hypothesized to be linked to transposable element (TE) activity, such as transmissible cancers [11], high levels of hemizygosity [12] and gene presence-absence variation [13].

Their important role as a promising model system for addressing both general biology and human health questions, together with the increasingly cost-efficient accessibility of third-generation sequencing technologies, has led to a major increase in their genomic resources in recent years [14]. This has opened the possibility of exploring, in a broader context, usually neglected genomic components and their evolutionary dynamics, such as TEs and other repetitive sequences, which constitute a high proportion of bivalve genomes [15].

Repetitive DNA elements usually replicate in a selfish manner, independently from host genome replication, with a variety of effects on host fitness ranging from neutral to deleterious. However, multiple cases of co-option for novel functions have been described in the literature [16]. Furthermore, their evolutionary trajectory can be influenced by the dynamics of the host population, which in turn may be affected by changes in TE activity [17]. Therefore, understanding TE distribution and evolution across the tree of life represents an important step towards a broader understanding of the evolution of living forms.

Short Interspersed Nuclear Elements (SINEs) are a sub-class of non-autonomous, non-LTR retrotransposons that depend on the protein machinery of their autonomous counterparts, LINEs (Long Interspersed Nuclear Elements), to reintegrate into the genome after their transcription by RNA polymerase III (Pol III) [18,19,20]. Moreover, while many non-autonomous elements usually originate from their autonomous counterparts through sequence decay or internal deletion, such as Miniature Inverted-repeat Transposable Elements (MITEs [21]) or Short Internally Deleted Elements (SIDEs [22]), SINE emergence is only partially dependent on their LINE partners [19]. Their canonical structure comprises a head, a body, and a tail region [20]. The head can originate from one of the three RNA types synthesized by RNA Pol III, namely tRNAs, 5S rRNAs, or 7SL RNAs, and contains its promoter region. Although elements originating from all three RNA types have been observed across a wide range of eukaryotes, tRNA-derived SINEs appear to be the most common [19]. The body, when present, contains a domain of unknown origin and function, which appears to be element-specific [20]. However, in some instances, SINEs may carry bodies with highly conserved domains (HCD) that are shared across distinct SINE lineages hosted by distantly related species. Although the role of HCDs is still unclear, they have been useful for classifying SINEs at the superfamily level [23,24,25]. Finally, the 3’ tail region serves as the recognition site for the LINE-derived reverse transcriptase (RT) and may terminate with tandem repeats or an A-rich segment [20]. The SINE-LINE partnership can be specific, if a LINE-derived segment (usually originating from the 3’ UTR of the LINE) is required for RT recognition, or aspecific, when homology is not necessary [19]. The modular structure of SINEs suggested a characteristic evolutionary model called “mosaic evolution”, under which different SINE lineages can exchange their modules through recombination [26]. This feature could allow their long-term persistence under a strict vertical inheritance scenario in different genomic contexts, for example after the extinction of the original LINE partner [24, 25].

A few analyses in bivalves have already identified HCD SINEs belonging to the superfamilies Core [23, 27], V [28, 29], Meta [23], Deu [30], and MD [23], the latter being formed by the dimerization of Meta and Deu domains. Despite covering a low percentage of bivalve genomes [15], their emergence was traced back to the most recent common ancestor of bivalves [23], similarly to what was hypothesized for the diversity of LINE clades [15]. Moreover, for some of these elements Nishihara et al. [23] and Matetovici et al. [31] also identified their putative autonomous partners: CR1, L2, and Nimb. However, these studies were limited by the reduced number of whole genome assemblies available at that time and by the lack of a comprehensive LINE reference library for bivalves.

Here, we leveraged the recent increase in bivalve genomic resources to comprehensively characterize HCD SINE diversity and richness across bivalve evolutionary history. The newly generated SINE library was used to screen for putative and previously unknown SINE-LINE partnerships, revealing that at least four different LINE lineages could act as RT donors in seven different SINE-LINE partnerships. Moreover, since SINEs can be important contributors to gene and genome evolution, we conducted a case study on a subset of five species to investigate the possible impact of SINEs on genome evolution. Our findings showed that gene-related genomic regions are enriched in SINEs, particularly in old copies, and that SINEs can be organized in tandem-like and palindromic structures, potentially affecting epigenetic regulation. With respect to the recent TE survey conducted in bivalves [15], this analysis further expands and refines the SINE dataset, along with that of partner LINEs, and provides a deeper analysis of their genomic distribution and organization in bivalve genomes.

Methods

Genomic dataset for de novo SINE prediction and manual curation

We selected 25 genomes from NCBI [32] and GigaDB (Supplementary Table 1) for the de novo mining and manual curation of SINE elements. SINE candidates were mined from each assembly using RepeatModeler2 [33] and SINE_Scan v1.1.1 [34, 35]. For each species, we merged SINE_Scan representative sequences with all TE consensus sequences produced by RepeatModeler2 and annotated as SINEs by RepeatClassifier or by deepTE [36], which was run on “Unknown” elements to increase the chance of including as many SINE candidates as possible. Candidate elements were then subjected to a “Blast-Extend-Extract” process [37]: each element was blasted back against its source genome (Blastn v2.6.0: qcov_hsp_perc 70, perc_identity 70; [38]), the top 50 hits plus 300 bp at both ends were extracted with bedtools v2.26.0 [39], and the extracted sequences were aligned with MAFFT v7.475 [40]. From each alignment, we built a new consensus sequence using the online Advanced Consensus Maker tool (https://www.hiv.lanl.gov/content/sequence/CONSENSUS/consensus.html). Boundaries of the elements were manually identified by looking for the characteristic decay of the alignment towards the terminal regions, and the consensus sequences were curated using a majority-rule approach, following the guidelines of Goubert et al. [37] and Peona et al. [41]. To confirm a candidate as a tRNA-related SINE we required: (a) the presence of a microsatellite or a poly-A region at the 3’ end; (b) the presence of a tRNA-related region at the 5’ end, predicted by tRNAscan-SE [42], by homology searches against GtRNAdb 2.0 ([43], http://gtrnadb.ucsc.edu), or by manually looking for RNA Pol III A and B boxes; and (c) a length between 200 and 700 nucleotides. The presence of characteristic TSDs between 6 and 18 bp was manually checked, although it was not required to confirm a candidate as a SINE.
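The "extend" step of the Blast-Extend-Extract process described above can be illustrated with a minimal Python sketch. It keeps the 50 best hits and pads each by 300 bp, clamping to the contig boundaries. The `Hit` class, the function name, and the use of bit score to rank hits are illustrative assumptions; the text does not state how the top hits were ranked, and the actual extraction was done with bedtools.

```python
from dataclasses import dataclass

FLANK = 300    # bp added on each side of a hit, as stated in the text
TOP_HITS = 50  # number of best hits kept per candidate, as stated in the text


@dataclass
class Hit:
    """Hypothetical container for one blastn hit (0-based, half-open coordinates)."""
    contig: str
    start: int
    end: int
    bitscore: float


def extend_top_hits(hits, contig_lengths, flank=FLANK, top=TOP_HITS):
    """Keep the best-scoring hits and pad each with `flank` bp, clamped to the contig."""
    # Assumption: "top" hits are ranked by bit score.
    best = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:top]
    extended = []
    for h in best:
        start = max(0, h.start - flank)
        end = min(contig_lengths[h.contig], h.end + flank)
        extended.append((h.contig, start, end))
    return extended


# Toy example with made-up coordinates.
hits = [Hit("chr1", 100, 400, 250.0), Hit("chr1", 5000, 5300, 310.0)]
regions = extend_top_hits(hits, {"chr1": 5400})
```

Clamping matters for hits near contig ends, where a fixed 300 bp extension would otherwise run past the sequence.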

SINE-LINE partnerships

To identify partnerships of SINEs with their autonomous LINE counterparts, we queried all confirmed SINEs (blastn: word size = 7, gap opening penalty = 2, gap extension penalty = 2, match score = 2, mismatch score = -3, E-value = 0.01) against a bivalve-specific library of LINE elements [15]. All positive hits were manually checked to confirm that the homologous region fell at the 3’ tail of the SINE and within the 3’ UTR of the LINE element.

Co-evolutionary dynamics of SINEs and their LINE counterparts were studied in the genomes in which we found evidence of SINE-LINE partnerships. For this purpose, we first selected all assemblies for which we identified homology between the tail region of a confirmed SINE mined from the same assembly and any LINE 3’ UTR region. We then attempted to build a species-specific representative sequence of the LINE counterpart by blasting the original LINE element against the genome with decreasing thresholds of identity and required alignment length. In this way, we obtained a new consensus sequence following a Blast-Extend-Extract process, as previously described (see “Genomic dataset for de novo SINE prediction and manual curation”). When no homology was identified with a blastn search, we performed more sensitive tblastn searches (E-value 1E-05) using the amino acid translation of ORF2. Species-specific LINE consensus sequences were then checked for conservation of the region homologous to the species-specific SINE tail. We assessed the completeness of the LINE consensus sequences with TE-aid [37] (min ORF length = 300 aa). For confirmed partnerships, we used all species-specific SINE-LINE partner pairs as custom libraries for RepeatMasker in sensitive mode against the source assembly. TE landscapes, describing the divergence of each TE copy from its consensus sequence as percentage Kimura distance after CpG correction, were calculated using the calcDivergenceFromAlign.pl script provided with the RepeatMasker installation. Concurrent activity between SINE-LINE partners was further tested for each species with Spearman's rank correlation tests between the accumulation profiles (i.e., number of base pairs occupied in each bin of CpG-corrected Kimura divergence) of the two elements.
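As a rough illustration of the final step, Spearman's rank correlation between two accumulation profiles can be computed by ranking each profile (with tied values sharing their mean rank) and taking the Pearson correlation of the ranks. The sketch below uses only the standard library; the function names and the per-bin base-pair counts are invented for illustration, and the software used for the test in the study is not specified in the text.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks; undefined if either profile is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical bp occupied per Kimura-divergence bin for a SINE and its LINE partner.
sine_bp = [1200, 5400, 9800, 7600, 3100, 900]
line_bp = [800, 4100, 12000, 9900, 2500, 400]
rho = spearman_rho(sine_bp, line_bp)
```

Because only ranks enter the statistic, two profiles with very different absolute copy numbers can still be perfectly correlated if their peaks fall in the same divergence bins.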

Superfamily and family level classification of confirmed SINEs

For HCD SINE classification we followed the superfamily classification scheme of Nishihara et al. [23], based on the presence of characteristic central domains previously identified in bivalve genomes (Meta, V, Deu, Core). We started by provisionally annotating each element using the RepeatClassifier utility from the RepeatModeler package. Elements expected to share the same central domain were aligned using MAFFT, and we then manually checked for the presence of the characteristic domain.

To obtain species-specific SINE families, we clustered all HCD SINEs mined from the same source genome following the 80–80 rule [18] using cd-hit-est v4.7 (-G 0 -c 0.8 -aS 0.8 -t 1; [44]). Clusters were further refined into families following the definition from SINE Base [20], where a SINE family is described as “a set of elements sharing the same modules in the same order, excluding the tail region”, and where the tail represents the putative LINE-derived region plus the poly-A/microsatellite. To achieve this, we ensured that each cluster contained only elements with the same modules; when this criterion was not met, the original cluster was split into different families. BLASTn was used to identify homologous regions within SINE tails between different families, with a permissive E-value of 0.05 but considering only alignments ≥ 50 bp. Homologous relationships were visualized as a network with Cytoscape v3.10.2 [45].
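The family refinement rule quoted above (same modules, same order, tail excluded) amounts to grouping the members of a cluster by their ordered module list. A minimal sketch follows, assuming each element has already been annotated with a list of module labels; the labels, element names and function name are hypothetical, and in the study this check was a curation step rather than an automated one.

```python
from collections import defaultdict


def split_into_families(cluster):
    """Group elements of one cluster by their ordered modules, ignoring the tail."""
    families = defaultdict(list)
    for name, modules in cluster.items():
        # The tail (LINE-derived region plus poly-A/microsatellite) is excluded.
        key = tuple(m for m in modules if m != "tail")
        families[key].append(name)
    return dict(families)


# Hypothetical cluster: two Meta elements and one MD-like element (Meta + Deu).
cluster = {
    "sineA": ["tRNA-Ser", "Meta", "tail"],
    "sineB": ["tRNA-Ser", "Meta", "tail"],
    "sineC": ["tRNA-Ser", "Meta", "Deu", "tail"],
}
families = split_into_families(cluster)
```

Here the single sequence-similarity cluster would be split into two families, because `sineC` carries an additional module.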

Finally, all families were merged into a multi-species SINE library together with 19 elements previously described and deposited in RepBase (Supplementary Table 2) and re-clustered with the same methods described above to identify families shared by multiple species (cross-species families).

Copy number estimation of tRNA-related SINEs across bivalve diversity

To estimate the distribution of the four SINE superfamilies across bivalve diversity we downloaded an additional 20 genomes from NCBI (Supplementary Table 1) and performed homology searches with blastn using all species-specific SINE families as queries (E-value 1E-05). To avoid cross-matches with tRNA donors and LINE homologous regions, we excluded hits shorter than 150 bp (i.e., shorter than approximately 50% of the entire SINE length). After this step, we merged overlapping hits resulting from different families of the same superfamily using bedtools merge and counted the number of occurrences of each superfamily in each genome. A maximum of 150 random copies belonging to the superfamilies V, Meta, and Core were extracted from each genome and aligned using MAFFT in auto mode. TrimAl v1.4 [46] was used to remove gap positions (–gappyout mode) and spurious sequences from the alignment (-resoverlap 0.50 -seqoverlap 55). We inferred a Maximum Likelihood tree with FastTree v2.1.10 [47] using a GTR + Gamma model. After adding the gastropod Biomphalaria glabrata (GCF_947242115.1) to the genomic dataset as outgroup, we inferred phylogenetic relationships between the analysed species by extracting and concatenating the 331 complete and single-copy BUSCO genes (Metazoa odb10 dataset; [48]) present in at least 90% of the species. We provided the partitioned supermatrix to IQ-TREE2 [49] with ModelFinder [50] and 1,000 ultrafast bootstrap replicates [51] to find the best-fit partitioning scheme and evolutionary model and to assess nodal support of the inferred maximum likelihood tree, respectively. Divergence time estimation was performed with LSD2 [52] within IQ-TREE2 using the diversification of analysed bivalves and the crown node of all bivalve orders (except for Adapedonta and Cardiida) as calibration points, based on the median divergence time reported in the TimeTree5 online database [53] (last accessed in July 2024).
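The copy-counting logic (drop hits shorter than 150 bp, then merge overlapping hits of the same superfamily before counting) can be sketched in plain Python as follows. This is only an illustration of the interval arithmetic; the study used bedtools merge, and the tuple layout and function name here are assumptions. Book-ended intervals are merged, which matches the default behaviour of bedtools merge.

```python
from collections import defaultdict

MIN_HIT_LEN = 150  # hits shorter than this are excluded, as stated in the text


def count_superfamily_copies(hits, min_len=MIN_HIT_LEN):
    """hits: (contig, start, end, superfamily), half-open. Returns merged copy counts."""
    by_key = defaultdict(list)
    for contig, start, end, superfamily in hits:
        if end - start >= min_len:
            by_key[(superfamily, contig)].append((start, end))

    counts = defaultdict(int)
    for (superfamily, _contig), intervals in by_key.items():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                counts[superfamily] += 1  # a new, non-overlapping copy begins
                current_end = end
            else:
                current_end = max(current_end, end)  # extend the merged copy
    return dict(counts)


# Toy hits: two overlapping V hits, one short V hit, one Meta hit, one V hit on c2.
hits = [
    ("c1", 0, 200, "V"),
    ("c1", 150, 400, "V"),
    ("c1", 1000, 1100, "V"),
    ("c1", 2000, 2300, "Meta"),
    ("c2", 0, 180, "V"),
]
copies = count_superfamily_copies(hits)
```

Merging per superfamily prevents one genomic copy that is hit by several related family queries from being counted more than once.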

Additionally, we estimated the taxonomic distribution of species-specific SINE families by blasting each SINE against our extended bivalve genomic dataset. We considered a family present in a genome when we could identify at least 10 hits with a query sequence identity and coverage of at least 80%.
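The presence criterion above reduces to a simple threshold rule, sketched here under the assumption that each blast hit has been summarised as a (family, percent identity, percent query coverage) tuple. The tuple layout, function name and toy values are illustrative only.

```python
def families_present(hits, min_hits=10, min_identity=80.0, min_coverage=80.0):
    """Return the families with at least `min_hits` hits passing both thresholds."""
    counts = {}
    for family, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            counts[family] = counts.get(family, 0) + 1
    return {family for family, n in counts.items() if n >= min_hits}


# Toy input: famA has 10 qualifying hits; famB has only 9, plus one below 80% identity.
hits = (
    [("famA", 92.0, 95.0)] * 10
    + [("famB", 85.0, 90.0)] * 9
    + [("famB", 70.0, 90.0)]
)
present = families_present(hits)
```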

Genomic distribution of SINEs and prediction of tandem-like SINE structures

All cross-species HCD SINE families were used as input library for RepeatMasker v4.1.0 [54] in sensitive mode (-s) to study their genomic occurrence in five species with available gene annotation. Specifically, for Crassostrea gigas (Ostreida), Mytilus californianus (Mytilida), and Pecten maximus (Pectinida) the RefSeq gene annotation was downloaded from the NCBI repository, while for Ruditapes philippinarum (Venerida) and Scapharca broughtonii (Arcida) annotations were retrieved from Xu et al. [55] and Bai et al. [56], respectively. We considered five different features: exons, introns, annotated UTRs, 2,500 bp flanking the genes, and all other intergenic sequences (thus excluding the 2,500 bp gene flanking regions). For each feature, we counted the number of intersections with SINE insertions with Bedtools intersect. Over- and under-representation of SINEs in each feature was tested by using Bedtools shuffle to construct null distributions from 1,000 random reshuffling iterations of all annotated SINE insertions across the genome (excluding genomic gaps). At each iteration, the number of intersections between each feature and the random intervals was counted. The observed number of intersections of SINEs with each feature was then compared to the null expectation. To directly test the hypothesis of preferential SINE accumulation in 2,500 bp gene flanking regions compared to all other intergenic regions, we split both features into intervals with a window of 500 bp with Bedtools windows and selected 10,000 random intervals for 100 iterations. We then counted the number of overlaps with SINE annotations at each iteration as previously described. Results for intergenic and gene-flanking genomic regions were statistically compared using a t-test. Taking advantage of the high-quality repeat annotation of C. gigas, whose repeatome is almost completely characterized [15], we also studied the accumulation patterns of LINEs across the same genomic intervals. Briefly, all LINEs from C. gigas available in RepBase, as well as those identified by Martelossi et al. [15], were combined, reduced in redundancy with cd-hit-est following the 80–80 rule, and used to annotate the genome with RepeatMasker. Overlaps between LINE insertions and genomic features were counted and statistically tested as previously performed for SINEs. Finally, we additionally hypothesized that gene-related genomic regions, here defined as UTRs + exons + introns + 2,500 bp gene flanking regions, are characterized by older SINE copies compared to intergenic ones. To test this, all gene-related genomic regions were merged, and we calculated, for both gene-related and intergenic genomic regions, the percentage Jukes-Cantor (JC) distance of each SINE copy to its consensus as a proxy for the time of insertion. Distributions were then compared with a t-test. The same analyses were also performed for LINE insertions in C. gigas.
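The reshuffling test can be illustrated with a simplified, single-chromosome Python sketch: insertions are relocated uniformly at random while keeping their lengths, and the overlap count with a feature is recorded at each iteration to build the null distribution. Several details are assumptions made for illustration and are not taken from the study: the genome is treated as one gap-free sequence, an insertion is counted once if it overlaps any feature interval, and significance is summarised as empirical one-sided p-values with a +1 correction.

```python
import random


def count_overlaps(intervals, features):
    """Number of intervals overlapping at least one feature (half-open coordinates)."""
    return sum(any(s < fe and fs < e for fs, fe in features) for s, e in intervals)


def shuffle_test(intervals, features, genome_len, n_iter=1000, seed=0):
    """Compare the observed overlap count with a null built by random relocation."""
    rng = random.Random(seed)
    observed = count_overlaps(intervals, features)
    null = []
    for _ in range(n_iter):
        shuffled = []
        for s, e in intervals:
            length = e - s
            new_start = rng.randrange(0, genome_len - length + 1)
            shuffled.append((new_start, new_start + length))
        null.append(count_overlaps(shuffled, features))
    # Empirical one-sided p-values (assumed summary statistic).
    p_enriched = (sum(n >= observed for n in null) + 1) / (n_iter + 1)
    p_depleted = (sum(n <= observed for n in null) + 1) / (n_iter + 1)
    return observed, p_enriched, p_depleted


# Toy data: three of four insertions fall inside a 2 kb feature of a 100 kb genome.
features = [(10_000, 12_000)]
sines = [(10_100, 10_400), (10_900, 11_200), (11_500, 11_800), (50_000, 50_300)]
observed, p_enriched, p_depleted = shuffle_test(sines, features, 100_000)
```

In this toy case the feature covers about 2% of the sequence, so observing three of four insertions inside it is very unlikely under random placement, and the enrichment p-value is correspondingly small.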

The same five genomes were scanned to identify the presence of tandem SINE arrays. For this purpose, we only kept high-scoring SINEs, i.e. RepeatMasker-annotated insertions with a score higher than 400 and a length of at least 150 bp. This was necessary to remove possible misannotations such as host tRNA genes. We considered as tandem-like SINE structures those cases in which multiple elements from the same family were detected one after the other.
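A minimal sketch of this filtering and run detection is given below. It assumes annotations are available as (contig, start, end, family, score) tuples and interprets "one after the other" as adjacency among the retained SINE annotations on the same contig, with no maximum distance between copies; that interpretation, the tuple layout and the function name are assumptions, since the text does not give a distance threshold. Strand, and therefore palindromic arrangements, are not handled here.

```python
MIN_SCORE = 400  # RepeatMasker score must be higher than this, as stated in the text
MIN_LEN = 150    # minimum insertion length in bp, as stated in the text


def tandem_like_runs(annotations, min_score=MIN_SCORE, min_len=MIN_LEN):
    """Return runs of two or more consecutive same-family, high-scoring SINEs."""
    kept = sorted(
        a for a in annotations if a[4] > min_score and a[2] - a[1] >= min_len
    )  # tuples sort by contig, then start
    runs, current = [], []
    for ann in kept:
        same_contig_and_family = (
            current and ann[0] == current[-1][0] and ann[3] == current[-1][3]
        )
        if same_contig_and_family:
            current.append(ann)
        else:
            if len(current) >= 2:
                runs.append(current)
            current = [ann]
    if len(current) >= 2:
        runs.append(current)
    return runs


# Toy annotations: (contig, start, end, family, score).
annotations = [
    ("c1", 100, 400, "famA", 900),
    ("c1", 450, 760, "famA", 850),
    ("c1", 800, 1100, "famA", 300),   # dropped: score not above 400
    ("c1", 2000, 2300, "famB", 700),
    ("c1", 2400, 2500, "famB", 950),  # dropped: shorter than 150 bp
]
runs = tandem_like_runs(annotations)
```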

Collection of seed alignments for DFAM submission

SINE family consensus sequences were used to build up seed alignments for DFAM submission [57]. For this purpose, we used the generateSeedAlignments.pl script provided with RepeatModeler installation with the flags –taxon, specifying the species name as reported in NCBI taxonomy, and –assemblyID followed by the NCBI accession number of the assembly. Resulting Stockholm files were submitted to DFAM. Consensus sequences can be found in Supplementary Data S1.

Results

An improved HCD tRNA-related SINE library for bivalves

By combining RepeatModeler, SINE_Scan, and homology searches, we identified 201 SINEs across the 25 bivalve genomes selected for the initial screening of SINE candidates (Supplementary Table 2). All confirmed elements exhibited signatures of tRNA-related origin based on tRNA prediction analyses, homology searches against GtRNAdb and/or manual identification of putative A and B boxes, the typical RNA Pol III promoter elements (Supplementary Table 2, Fig. 1). The tail region of candidate SINEs was also checked for the presence of microsatellites, and we successfully identified characteristic TSDs with sizes ranging between 6 and 18 bp for 181 (90%) of these elements (Supplementary Table 2). Comparative analyses using the domains described in Nishihara et al. [23] allowed us to subdivide these elements into the five known HCD superfamilies: Meta, V, MD, Core, and Deu. Specifically, we classified 31 elements as Core, 16 as Deu, 34 as MD, 40 as Meta, and 53 as V. Additionally, we found 27 other SINEs without clear homology to the aforementioned domains, which we simply classified as tRNA-related SINEs. Within the five known HCD superfamilies, we identified 10 putative different tRNA donors, which are also shared between different superfamilies (Supplementary Table 2).

Fig. 1

Schematic representation of identified HCD tRNA-derived SINEs in bivalves. For each superfamily we reported the tRNA-related heads identified with tRNA-Scan SE and the putative LINE donors

These 201 elements were first clustered into species-specific families (Supplementary Table 2) and then, along with 19 publicly available bivalve SINEs, into 76 distinct cross-species families based on criteria of reciprocal homology and order of SINE modules (Supplementary Table 2). Among these families, 17 are composed of unknown SINEs, 14 of Core SINEs, eight of Deu SINEs, six of MD SINEs, eight of Meta SINEs, and 23 of V SINEs (Supplementary Table 2). Unknown families were excluded from subsequent analyses. The presence of 22 families shared by multiple species (five Core, four Deu, one MD, five Meta, seven V) belonging to the same bivalve order (Supplementary Table 2) underlines the possible long-term conservation of HCD SINEs. Some notable examples are the families Bpla_SINE-1_Meta (tRNA head: Ser), shared between Mytilinae and Bathymodiolinae; Tgra_SINE-7_Meta (tRNA head: Ser), shared between all Arcidae; and Oden_SINE-1_CORE (tRNA head: Ser), shared between O. denselamellosa, C. gigas and S. glomerata.

Bivalve HCD SINEs depend on at least four different LINE lineages

Using curated LINE libraries previously obtained from mollusc genomes [15] together with all newly generated SINE consensus sequences, we searched for putative SINE-LINE partnerships. Our results highlight that at least four different LINE clades can match SINE tails (Fig. 1; see Supplementary Table 2 for all recognized homologies). Homologies between SINE and LINE 3’ ends can be shared between different superfamilies and span between 35 and 61 bp, with an identity ranging from 72% to 95% (Supplementary Fig. 1). Nishihara et al. [23] and Matetovici et al. [31] found similarities between tail regions of V and Core families and CR1 and L2 elements, and between Meta SINEs and Nimb LINEs (LINE I superfamily). Here we found that not CR1, but CR1-Zenon elements, a LINE clade closely related to CR1 and widespread in bivalves but apparently poorly represented in other molluscs [15], are likely responsible for the retrotranscription/reintegration of V, Core, Meta, and Deu families, while Nimb LINEs may promote V and Meta replication. An interesting case is the M.phylippinarum_91 CR1-Zenon LINE element, which shows clear homology to Meta, Deu, V and Core elements (Supplementary Fig. 1). Additionally, we found only one family, from the genome of the Venerida Archivesica marissinica, with a tail region highly similar to the 3’ ends of CR1 elements (Amar_SINE-2_CORE).

The homology relationships of SINE tail regions strongly support previous results. Indeed, we identified 21 different clusters characterized by SINEs with homologous or partially homologous tail regions (Supplementary Fig. 2). Sixteen of these clusters contain only families without a clear LINE-derived region, whereas the others are characterized by at least one element with one of the previously described LINE homologies. Interestingly, while tails coming from the same SINE superfamily tend to be more closely associated within the similarity network (i.e., more similar), multiple cases of homologous or partially homologous tails can be found between elements coming from different HCD superfamilies. Notably, the two biggest clusters contain almost all SINE tails that we identified as Nimb and CR1-Zenon related.

To study the co-evolutionary dynamics between autonomous and non-autonomous elements, we first reconstructed, where possible, a full-length species-specific LINE counterpart. We managed to reconstruct 10 species-specific SINE-LINE partnerships across 9 different species. All reconstructed LINE partners were longer than 4,000 bp, and all but one (Myes-2_LINE-I) possess an ORF2 encoding RT and EN domains. Among these, eight out of nine also present an ORF1, demonstrating the completeness of their structure (Supplementary Fig. 3). Repeat landscape analyses revealed contrasting patterns between LINE and SINE activity (Supplementary Fig. 4). Indeed, despite a positive and significant correlation between all accumulation profiles (all p-values < 0.05, Supplementary Fig. 5), both visual inspection of repeat landscape profiles and correlation analyses point to possibly different evolutionary scenarios. Specifically, the accumulation patterns of BivaV-SINE2_CrGi#V, Medu_SINE-2#Meta, and Sbro_SINE-2#Meta were more weakly and less significantly correlated with those of their LINE counterparts (0.3 < Spearman’s rho < 0.44; 0.002 < p-values < 0.03; Supplementary Fig. 5) compared to the other analysed partnerships. In contrast, the partnerships Cgig_SINE-10#CORE / Cgig-1_LINE#L2, BivaV-SINE1_MiYe#V / Myes-2_LINE#Nimb, and Amar_SINE-2#CORE / Amar-1_LINE#CR1 show both strong correlations (Spearman’s rho > 0.8) and overlapping activity profiles.

HCD tRNA-derived SINEs are widespread in bivalves and maintained activity after bivalve order diversification

To obtain a broader picture of HCD SINE family and superfamily distribution across bivalve diversity, we added 20 further assemblies to our starting genomic dataset, for a total of 45 analyzed species representative of 10 different bivalve orders (Supplementary Table 1). After inferring phylogenetic relationships and estimating divergence times based on 331 single-copy BUSCO genes present in at least 90% of the species, we used these genomes as databases for homology searches using all previously confirmed SINEs as queries. Recovered phylogenetic relationships are largely coherent with published phylogenomic results (Fig. 2A and B; [58, 59]), identifying all analysed bivalve orders as monophyletic with high bootstrap values. The only exception is Gari tellinella (superfamily Psammobiidae, Gtell in Fig. 2A and B), which is recovered as sister to the analyzed Adapedonta species and separated from other Cardiida belonging to the Cardioidea superfamily, as already reported [58].

Fig. 2

Taxonomic Distribution of HCD superfamilies and families in bivalves. A Taxonomic distribution of the 5 known HCD SINE superfamilies and B number of shared families between each pair of analyzed species. Species name abbreviations refer to Supplementary Table 1. Phylogenetic relationships and divergence time estimation are represented at the top and bottom of the two panels. All nodes receive a bootstrap value ≥ 95 except those labelled in B

Concerning HCD distribution, the Core superfamily was found across all members of the orders Pectinida, Ostreida (except for Pinna nobilis), Unionida, of the superfamily Cardioidea as well as in two Adapedonta (Solen grandis and Sinonovacula constricta), two Arcida species (Anadara kagoshimensis and Scapharca broughtonii) and in four Venerida genomes (Archivesica marissinica, Cyclina sinensis, Mactra quadrangularis, Ruditapes philippinarum) (Fig. 2A). No Core element could be identified in Mytilida, Myida and Lucinida representatives.

The Meta superfamily appears widespread across analyzed Arcida, Mytilida, Unionida (except for Margaritifera margaritifera), Adapedonta and in the three Venerida Saxidomus purpuratus, Spisula solida and Mercenaria mercenaria (Fig. 2A).

We identified the Deu superfamily in the majority of the Arcida, Cardioidea, and Venerida as well as in some Mytilida (Mytilus coruscus, Mytilus edulis, Mytilus californianus, and Mytilisepta virgata) and Ostreida (Ostrea denselamellosa, Saccostrea glomerata and P. nobilis) (Fig. 2A). Interestingly, the deep-sea symbiotic clam A. marissinica hosts ~ 17 times more Deu elements than the second richest species, S. glomerata (118,575 and 6,816, respectively), confirming an increase in the activity of specific TE groups in this lineage, possibly related to its colonization of hydrothermal vents [15, 60].

The MD superfamily appears to be the least represented across bivalves, while the V superfamily was the most ubiquitous, with elements identified across all species except for S. glomerata, Congeria kusceri, and Fragum whitleyi, confirming what was previously found by Nishihara et al. [23] (Fig. 2A). Interestingly, both the Meta and V superfamilies are present in similarly high copy numbers across four out of the six Unionida species analyzed here (Hyriopsis cumingii, Unio delphinus, Megalonaias nervosa and P. streckersoni, from 27,000 to 101,712 copies). Similarly, the LINE complement of M. nervosa and P. streckersoni also appeared to be different from that of other bivalves in Martelossi et al. [15]. However, the apparent absence of V and Meta SINE amplification in Venustaconcha ellipsiformis, the sister species of P. streckersoni, points to two possible evolutionary scenarios: an amplification in their most recent common ancestor with subsequent genomic deletion in V. ellipsiformis, or independent amplifications along different Unionida lineages. The fast increase in high-quality Unionida genomic resources will possibly shed light on which process is driving the different TE landscape of this group.

Consistent with the results obtained from the cross-species family clustering (see Results: “An improved HCD tRNA-related SINE library for bivalves”), we found that bivalves belonging to the same order and those that are phylogenetically close are more likely to possess shared SINEs (Fig. 2B). For example, we found additional evidence for shared families between Mytilinae and Bathymodiolinae, which diverged around 324 Mya in our estimation and ~ 400 Mya according to Lee et al. [61], as well as between all Arcida (divergence time: ~ 178 Mya here and ~ 177 Mya in [62]). Interestingly, we also found shared families between Adapedonta, G. tellinella, Myida, and Venerida, which diverged ~ 400 Mya in our study but up to ~ 500 Mya based on mitochondrial markers in Wang et al. [63]. Despite this general trend, we also observed some shared families that are more difficult to explain under a strict vertical evolutionary scenario, such as between Unionida and Venerida, as well as between Pinna nobilis (Ostreida, Pnib in Fig. 2B) and A. marissinica (Venerida; Amar in Fig. 2B).

Phylogenetic analyses of 150 random copies of the V and Meta superfamilies for each species (Fig. 3A-B) indicate that the great majority of elements are specific to a given bivalve order, as seen for Unionida, Mytilida, Arcida, and Venerida, while a few other elements are shared by different bivalve orders. In contrast, the phylogenetic pattern of the Core superfamily is less clear, as multiple groups of SINEs can be observed within the same bivalve order (Fig. 3C).

Fig. 3

Phylogeny of HCD superfamilies in bivalves. Phylogenetic trees of 150 random copies extracted from each genome for the superfamilies V (A), Meta (B), and Core (C). Colours of the tip labels represent bivalve order and reflect the colouring scheme of Fig. 2

SINE accumulation in gene-related genomic regions and organization in complex tandem-like structures

To detect potential preferences in the genomic occurrence of SINEs with respect to coding regions, we carried out a case study on five species with available gene annotations, testing the hypothesis that older SINEs accumulate more in gene-related than in intergenic genomic regions. HCD SINE insertions were found within 0.9%, 10%, 8.8%, and 4.1% of the genes in C. gigas, M. californianus, P. maximus, and S. broughtonii, respectively, but reached 41% in R. philippinarum. Compared to the null expectation, exons and UTRs consistently exhibited significantly fewer insertions, while gene-flanking and intergenic genomic regions were generally enriched in SINEs (Table 1). On the other hand, we observed a significant overrepresentation of insertions in the introns of R. philippinarum and S. broughtonii, where we found up to 61 and 145 insertions within a single gene, respectively (Table 1). These two species exhibited different accumulation patterns of SINEs within introns: the former showed a low number of insertions in a high number of introns, while the latter showed a high number of insertions in a low number of introns (Supplementary Fig. 6).
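The comparison against a null expectation can be illustrated with a minimal sketch. It assumes a simple permutation scheme in which insertions are repositioned uniformly at random along a single sequence; the simulation actually used to produce Table 1 may differ, and the function names, coordinates, and number of permutations below are hypothetical.

```python
import random


def count_overlaps(positions, regions):
    """Count insertion positions falling inside any half-open [start, end) region."""
    return sum(any(start <= p < end for start, end in regions) for p in positions)


def permutation_enrichment(positions, regions, genome_length, n_perm=1000, seed=0):
    """Compare the observed overlap count with counts from randomly placed insertions.

    Returns (observed, mean of the null counts, one-sided empirical p-value
    for enrichment). Uniform random placement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    observed = count_overlaps(positions, regions)
    null_counts = []
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_length) for _ in positions]
        null_counts.append(count_overlaps(shuffled, regions))
    mean_null = sum(null_counts) / n_perm
    # The +1 correction keeps the empirical p-value strictly above zero.
    p_enrich = (1 + sum(c >= observed for c in null_counts)) / (n_perm + 1)
    return observed, mean_null, p_enrich


# Toy example: 20 insertions, all inside one 100 bp region of a 10 kb sequence.
toy_positions = list(range(0, 100, 5))
toy_regions = [(0, 100)]
observed, mean_null, p_enrich = permutation_enrichment(toy_positions, toy_regions, 10_000)
```

A depletion test (as for exons and UTRs) would count null values less than or equal to the observed one instead.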

Table 1 Genomic distribution of observed and simulated SINE insertions with respect to different genomic backgrounds

Moreover, for C. gigas, M. californianus, R. philippinarum, and S. broughtonii, gene-flanking regions (2,500 bp flanking the gene) showed an enrichment of SINEs compared to intergenic ones (t-test; p-value < 0.01), whereas for P. maximus we observed the opposite trend (Fig. 4A). Gene-related genomic regions (defined as exons, introns, UTRs, and 2,500 bp gene-flanking regions) also appear to be characterized by older SINEs than intergenic ones, based on the Jukes-Cantor distance from consensus sequences, across all species (Fig. 4B; t-test, all p-values < 0.001). Interestingly, we did not observe the same accumulation pattern when analysing the LINE counterparts in C. gigas. Here, intergenic genomic regions were significantly more affected by insertions than gene-flanking ones (Fig. 4C; t-test, p-value < 0.001; Supplementary Table 3), and particularly by older insertions (Fig. 4D; t-test, p-value < 0.001).
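As an illustration of the age proxy used here, the Jukes-Cantor distance corrects the observed proportion of mismatches p between a copy and its consensus as d = -(3/4) ln(1 - 4p/3). The sketch below assumes a pre-computed pairwise alignment of equal length and skips gapped or ambiguous columns; the function name and these handling choices are assumptions, not the pipeline used in the study.

```python
import math


def jukes_cantor_distance(copy_seq, consensus_seq):
    """Jukes-Cantor distance between an aligned SINE copy and its consensus.

    Columns containing gaps or ambiguous bases are ignored (an assumption
    of this sketch).
    """
    if len(copy_seq) != len(consensus_seq):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(copy_seq.upper(), consensus_seq.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = mismatches / compared
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


# One mismatch over ten comparable positions (p = 0.1).
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
```

Larger distances are read as older insertions, under the usual assumption that copies diverge from the consensus roughly neutrally after insertion.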

Fig. 4

Genomic occurrence of HCD SINEs. A Number of overlaps between SINE insertions and random gene-flanking regions (2,500 bp upstream and downstream of the gene) or random intergenic genomic regions. B Jukes-Cantor (JC) distance of each SINE insertion from its consensus sequence, used as a proxy for the time of insertion. Gene-related = insertions found within genes (exons, introns, and UTRs) or in their 2,500 bp flanking regions. C and D mirror A and B, respectively, but refer to LINE insertions in C. gigas. C Number of overlaps between LINE insertions and random gene-flanking regions versus random intergenic genomic regions. D JC distance of LINE copies in gene-related versus intergenic genomic intervals. All comparisons are statistically significant (t-test, p-value < 0.01). Cgig = C. gigas, Mcal = M. californianus, Pmax = P. maximus, Rphi = R. philippinarum, Sbro = S. broughtonii

The same five genomes were also scanned for tandem-like HCD SINEs, considering only high-scoring insertions. All tandem arrays consist of two or three elements in all genomes except S. broughtonii, where we found 64 elements organized in tandem arrays of 4-15 units. Furthermore, while C. gigas hosts the smallest number of tandem-like SINE structures (three), in S. broughtonii 3% of the high-scoring SINEs are organized in tandem arrays or palindromic structures, with 137 of these structures also incorporating one or more elements from a different family (Fig. 5A). The high number of tandem arrays in the blood clam S. broughtonii, including within intronic sequences (659 tandem arrays), could be an important contributor to the previously observed pattern of few introns carrying a high number of insertions. We suggest that these SINE-rich introns could also drive the observed enrichment of SINE insertions in intronic sequences despite the low number of affected genes. Direct tandem arrays constitute most of the tandemly repeated SINEs, while palindromes account for 23% (403 structures), of which 126 overlap with gene annotations, particularly within intronic sequences (Fig. 5B-C).
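The distinction between direct tandem arrays, palindromic (inverted) structures, and arrays that mix families can be sketched as follows. This is only an illustration: the criterion of grouping neighbouring insertions separated by at most `max_gap` bp, the value of `max_gap`, and the tuple layout are assumptions of this sketch rather than the screening procedure applied in the study.

```python
def find_tandem_like(insertions, max_gap=50):
    """Group neighbouring SINE insertions on one sequence into tandem-like structures.

    insertions: iterable of (start, end, strand, family) tuples.
    max_gap: hypothetical maximum distance (bp) between consecutive units.
    """
    clusters = []
    current = []
    for ins in sorted(insertions):
        # Close the current cluster when the next insertion is too far away.
        if current and ins[0] - current[-1][1] > max_gap:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(ins)
    if len(current) >= 2:
        clusters.append(current)

    results = []
    for cluster in clusters:
        strands = {strand for _, _, strand, _ in cluster}
        families = {family for _, _, _, family in cluster}
        results.append({
            "n_units": len(cluster),
            # Same strand throughout: direct repeats; otherwise inverted.
            "orientation": "direct" if len(strands) == 1 else "inverted",
            "mixed_families": len(families) > 1,
        })
    return results


toy = [
    (100, 300, "+", "A"), (310, 500, "+", "A"), (520, 700, "+", "A"),
    (5000, 5200, "+", "B"), (5210, 5400, "-", "B"),
    (9000, 9200, "+", "A"),
]
structures = find_tandem_like(toy)
```

In this toy input the first three insertions form a direct array, the next two an inverted pair, and the isolated last insertion is not reported.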

Fig. 5

Tandem-like SINE structures in bivalve genomes. A Number of tandem-like SINE structures identified in each of the five analysed bivalve genomes. “Tandem-like + Different SINEs” indicates structures in which, together with tandem SINEs from the same family, we also detected elements from different families. Cgig = C. gigas, Mcal = M. californianus, Pmax = P. maximus, Rphi = R. philippinarum, Sbro = S. broughtonii. B and C Examples of direct and inverted SINE repeats, respectively, present in intronic sequences of the S. broughtonii genome. D and E Organization and structure of direct tandem-like SINE structures identified in the S. broughtonii genome. The tandem repeat in D is the same as that represented in B

Due to the high number of tandem-like SINEs in S. broughtonii, we looked deeper into the structures and arrangement of direct tandem arrays composed of more than 3 elements belonging to the same SINE family. We identified multimers of tandemly arranged SINEs with two main structures: TSD1-(SINE-TSD1)n and TSD1-(SINE)n-TSD1. The former is composed of highly similar SINE elements truncated at the same position of the 3’ tail and separated by the same TSDs (Fig. 5D), whereas the latter usually consists of a first SINE with a complete tRNA-related head but a truncated 3’ tail, multiple repetitions of a truncated SINE at both the 5’ tRNA head and 3’ tail, and a final SINE with the same tRNA head truncation but a complete tail (Fig. 5E).

Discussion

Transposable elements (TEs) are among the most significant sources of genetic variation across the eukaryotic tree of life. Advances in sequencing are leading to a greater appreciation of TEs in the context of genome evolution, gene regulation, and species diversification. However, while genomic resources are rapidly expanding for most eukaryotic clades, accurate TE identification and annotation are still lacking in non-model species [64], hindering our ability to comprehend their taxonomic distribution and effects on host biology. In this study, we took advantage of the increased number of bivalve genomes to comprehensively characterize tRNA-related HCD-containing SINEs across bivalve diversity and to investigate their genomic distribution patterns. Our examination of 49 assemblies confirms that the five identified HCD-SINE superfamilies have an ancient origin in bivalves, have been retained over a long evolutionary timescale [23], and can be derived from at least 10 different tRNA genes. At the same time, we observed substantial order-specific activity of the V and Meta superfamilies based on their phylogenetic clustering patterns. Based on analyses of LINE-derived tail regions, we found that at least four different LINE lineages (CR1; CR1-Zenon; L2; Nimb) can act as RT donors to four different SINE superfamilies, for a total of seven SINE-LINE relationships, of which three were previously unknown (specifically the partnerships between SINE V and LINE Nimb, SINE Core and LINE CR1, and SINE Meta and CR1-Zenon). We therefore greatly increase the number of putative SINE-LINE partnerships compared to the previous studies of Nishihara et al. [23] and Matetovici et al. [31].
The presence of multiple LINE partners for SINE families with the same HCD, together with the homologies identified between tail regions of families with different central domains, suggests that recombination and module shuffling might have played a role in the emergence and subsequent amplification of new SINE families during bivalve diversification, consistent with a “mosaic” evolutionary scenario of SINEs [26]. Because of the tight relationship between SINEs and their LINE counterparts, we might expect almost overlapping landscapes of activity for two partner elements. However, in multiple instances more complex evolutionary scenarios emerge, possibly due to different competitive dynamics for the LINE-derived enzymatic machinery [65, 66]. Indeed, specific SINE lineages could be particularly efficient in parasitizing their LINE counterparts, preventing them from expanding. The hijacked LINE, in turn, might increase its replication rate only when the SINE partially loses its parasitizing capacity and, consequently, its own replication rate. Another limitation in inferring the co-evolutionary dynamics of SINEs and LINEs from repeat landscape profiles is the inability to account for different deletion rates among transposons. Some TEs might be more susceptible to genomic elimination than others, resulting in their underrepresentation in older divergence bins. These phenomena could contribute to the different patterns that emerge from our analyses. Indeed, despite a consistently significant and positive correlation between the accumulation profiles of SINEs and their LINE counterparts, the strength of the correlation varied considerably between species, and the repeat landscapes showed substantial overlap only in a few instances. Another possible explanation for the incongruence between SINE and LINE activity is that we might have missed the true LINE partner, annotating a sibling LINE lineage as the autonomous counterpart. While we tried to overcome this issue by attempting to reconstruct putative species-specific full-length partner LINEs, we cannot rule out this possibility, which should be further addressed in future studies.
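The comparison of SINE and LINE accumulation profiles can be illustrated with a minimal sketch. It assumes that each repeat landscape has been summarized as a vector of values (for example, base pairs occupied) over the same divergence bins, and it uses a Pearson coefficient; the correlation statistic, binning, and toy values are assumptions of this sketch and are not taken from the study.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equally binned accumulation profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 bins)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical landscapes over five divergence bins (youngest to oldest).
sine_profile = [10, 40, 80, 30, 5]
line_profile = [5, 35, 60, 45, 20]
r = pearson_correlation(sine_profile, line_profile)
```

A positive coefficient with only partial overlap of the two landscapes, as in this toy case, is the kind of pattern discussed above.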

Interestingly, when performing the cross-species family clustering, we identified 22 HCD-SINE families characterized by the same highly similar modules in species separated by exceptionally long evolutionary times. For instance, the family Bpla_SINE-1_Meta was found in both Bathymodiolinae and Mytilinae, which diverged more than 300 Mya in our study and ~ 400 Mya in Lee et al. [61]. These results were strongly supported by the presence of multiple families shared between phylogenetically related but anciently diverged lineages when analysing the complete dataset of 45 bivalve genomes. Here we identified shared families even between groups of species that diverged ~ 400 Mya in our results and ~ 500 Mya in Wang et al. [63]. These represent extreme cases of what was previously observed in grasses, where SINE families were found to be far more conserved than LTR and TIR elements and retained for at least ~ 60 million years [34, 35]. The apparent long-term retention of TE families could also be explained by horizontal transposon transfer (HTT), which has already been observed for SINEs in a few instances [28, 67, 68]. Indeed, we also found patterns of shared families that are difficult to explain under a strictly vertical evolutionary scenario, such as between the Ostreida Pinna nobilis and the Venerida A. marissinica.

The ability to reconstruct consensus sequences shared between distantly related species could be favoured by the persistence of old insertions across the genome. In this context, it is interesting that, despite being underrepresented in exons and UTRs, SINE insertions tend to accumulate in gene-flanking regions (except in the Pectinida P. maximus) and, in the case of R. philippinarum and S. broughtonii, also within intronic sequences. It must be noted that we did not consider different GC contexts in our permutation tests, and different transposable elements have been shown to be enriched in genomic regions with different GC content [69, 70]. Even though we could not distinguish between preferential accumulation in GC-rich genomic regions, which are usually gene-dense [71], and accumulation in the genes per se, the close association of HCD SINEs with gene bodies could increase the probability of their co-option by the host genome. Indeed, SINEs can frequently act as cis-regulatory elements, provide novel exons, or contribute to mRNA processing, potentially enhancing the plasticity of tissue-specific transcripts, as recently observed in Drosophila [72]. A close association between SINEs and genes was also observed in plants [34, 35, 73, 74], fishes [75], mammals [76], and insects [67]. Open-chromatin genomic regions are known to be enriched in short and fragmented TEs [76, 77]. Furthermore, older SINE families were found to be more represented in euchromatic genomic regions than younger ones both in grasses [34, 35] and in the coelacanth [75], consistent with our results. Indeed, we found that gene-related genomic regions (i.e., intragenic + 2,500 bp gene-flanking regions) are enriched in older HCD SINE insertions, in terms of distance from their consensus sequence, compared to intergenic ones.
If we assume no difference in insertion preference between old and young HCD SINEs, this pattern may suggest that gene-related regions could serve as safe ecological niches where short, non-coding transposons can survive. Consistent with this, we did not observe the same accumulation pattern when analysing LINE elements in the model bivalve C. gigas. One explanation is that short transposon insertions, like SINEs, could be favoured in the proximity of genes by a combination of (1) reduced competition with longer, more harmful TEs and (2) lower efficiency of TE-purging processes [34, 35, 78]. Indeed, deletions of transposable elements are mainly caused by ectopic DNA repair mechanisms, such as non-allelic homologous recombination and microhomology-mediated end joining [79, 80]. All these processes promote genome instability and may affect flanking genomic sequences, giving rise to complex and potentially harmful variants if a gene or a gene-interacting region is involved [81].

Methylation of SINE-derived direct repeats has been linked to the epigenetic regulation of downstream genes in Arabidopsis thaliana [82], and double-stranded hairpin structures in the mRNA derived from palindromic structures, resulting from alternating orientations of SINE insertions, might serve as substrates for DICER enzymes [74]. We found that HCD SINEs in bivalves can be organized in such tandem-like and palindromic structures, including within genes, with an increased tendency in the Arcida S. broughtonii. In this species, approximately 3% of HCD SINEs are organized in this manner. For comparison, in the potato genome about 2% of the SINEs are included in tandem-like arrays [74]. Recently, Vassetzky et al. [83] also described common tandem repeats derived from different portions of SINE transposons in squamates. Even though this appears to be a much less common phenomenon in the other four species analysed here, it must be noted that we applied stringent cutoffs in terms of hit length (minimum length 150 bp) and RepeatMasker score (minimum 400) to avoid spurious matches. Therefore, we might have missed tandem structures composed of highly fragmented and/or divergent SINEs, and more systematic and targeted screening is needed to quantify more precisely the impact of SINEs on the formation of tandem repeats in bivalves. Finally, the high number of SINE direct repeats identified in S. broughtonii raises interesting hypotheses about their potential origin and the genome evolutionary dynamics of this species. Indeed, one possible outcome of unequal homologous recombination between target site duplications (TSDs) is the formation and expansion of SINE tandem arrays [84]. The high number of such structures in S. broughtonii could therefore imply higher recombination rates in this species compared to the other analysed bivalves.

Conclusions

Here we present, for the first time, a wide characterization of tRNA-related HCD SINEs in bivalves, looking at their distribution, LINE partnerships, and genomic occurrence. Thanks to a novel, manually curated SINE library, we found that bivalve HCD SINEs could derive from at least 10 different tRNAs and depend on at least four different LINE lineages. Different LINEs can promote the amplification of the same HCD superfamily, whereas homologous tails are shared between different superfamilies, suggesting a “mosaic” evolutionary scenario of SINE modules. Additionally, some SINE families are apparently shared between distantly related species, highlighting the possible long-term retention of highly similar HCD SINE lineages characterized by the same tRNA-related head, central domain, and LINE-derived tail. Genomic occurrence analyses across five different bivalve species highlighted their potentially different effects on genome evolution. Indeed, different species show overrepresentation of SINE insertions in different genomic compartments, as well as different tendencies to form tandem-like and palindromic structures, which can be present in intronic sequences. Despite these differences, we found a consistent trend of accumulation of old SINEs in close proximity to genes, as previously observed in plants and other metazoans. This result suggests that the evolutionary dynamics of SINEs might partially follow a common route across eukaryotes in which euchromatic genomic regions serve as safe niches for their survival. Overall, this study represents a step forward in a broader understanding of TE evolutionary dynamics in a highly overlooked but important taxonomic group like bivalves, and opens interesting questions about the possible role of SINEs in bivalve genome organization, biology, and evolution.

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information files. SINE consensus sequences can be found in Supplementary Data S1. Seed alignments generated from the consensus sequences have been deposited to DFAM (https://www.dfam.org/home) under the Creative Commons CC0 1.0 public domain license.

References

  1. Kocot KM, Poustka AJ, Stöger I, Halanych KM, Schrödl M. New data from Monoplacophora and a carefully-curated dataset resolve molluscan relationships. Sci Rep. 2020;10:101. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41598-019-56728-w.

  2. Graf DL. Patterns of freshwater bivalve global diversity and the state of phylogenetic studies on the unionoida, sphaeriidae, and cyrenidae. Am Malacological Bull. 2013;31(1):135–53. https://doiorg.publicaciones.saludcastillayleon.es/10.4003/006.031.0106.

  3. Guo Y, Meng L, Wang M, Zhong Z, Li D, Zhang Y, Li H, Zhang H, Seim I, Li Y, Jiang A, Ji Q, Su X, Chen J, Fan G, Li C, Liu S. Hologenome analysis reveals independent evolution to chemosymbiosis by deep-sea bivalves. BMC Biol. 2023;21:51. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12915-023-01551-z.

  4. Farrington JW, Tripp BW, Tanabe S, Subramanian A, Sericano JL, Wade TL, Knap AH. Edward D. Goldberg’s proposal of “the Mussel Watch”: reflections after 40 years. Mar Pollut Bull. 2016;110:501–10. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.marpolbul.2016.05.074.

  5. Gazeau F, Parker LM, Comeau S, Gattuso J-P, O’Connor WA, Martin S, Pörtner H-O, Ross PM. Impacts of ocean acidification on marine shelled molluscs. Mar Biol. 2013;160:2207–45. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s00227-013-2219-3.

  6. Saco A, Novoa B, Greco S, Gerdol M, Figueras A. Bivalves present the largest and most diversified repertoire of toll-like receptors in the animal kingdom, suggesting broad-spectrum pathogen recognition in marine waters. Mol Biol Evol. 2023;40(6):msad133. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/molbev/msad133.

  7. Nicolini F, Ghiselli F, Luchetti A, Milani L. Bivalves as emerging model systems to study the mechanisms and evolution of sex determination: a genomic point of view. Genome Biol Evol. 2023;15(10):evad181. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evad181.

  8. Blier PU, Abele D, Munro D, Degletagne C, Rodriguez E, Hagen T. What modulates animal longevity? Fast and slow aging in bivalves as a model for the study of lifespan. Semin Cell Dev Biol. 2017;70:130–40. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.semcdb.2017.07.046.

  9. Iannello M, Forni G, Piccinini G, Xu R, Martelossi J, Ghiselli F, Milani L, et al. Signatures of Extreme Longevity: A Perspective from Bivalve Molecular Evolution. Genome Biol Evol. 2023;15(11):evad159. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evad159.

  10. Ghiselli F, Iannello M, Piccinini G, Milani L. Bivalve molluscs as model systems for studying mitochondrial biology. Integr Comp Biol. 2021;61:1699–714. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/icb/icab057.

  11. Metzger MJ, Villalba A, Carballal MJ, Iglesias D, Sherry J, Reinisch C, Muttray AF, Baldwin SA, Goff SP. Widespread transmission of independent cancer lineages within multiple bivalve species. Nature. 2016;534:705–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/nature18599.

  12. Calcino AD, Kenny NJ, Gerdol M. Single individual structural variant detection uncovers widespread hemizygosity in molluscs. Philos Trans R Soc Lond B Biol Sci. 2021;376:20200153. https://doiorg.publicaciones.saludcastillayleon.es/10.1098/rstb.2020.0153.

  13. Gerdol M, Moreira R, Cruz F, Gómez-Garrido J, Vlasova A, Rosani U, Venier P, Naranjo-Ortiz MA, Murgarella M, Greco S, Balseiro P, Corvelo A, Frias L, Gut M, Gabaldón T, Pallavicini A, Canchaya C, Novoa B, Alioto TS, Posada D, Figueras A. Massive gene presence-absence variation shapes an open pan-genome in the Mediterranean mussel. Genome Biol. 2020;21:275. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-020-02180-3.

  14. Davison A, Neiman M. Mobilizing molluscan models and genomes in biology. Philos Trans R Soc Lond B Biol Sci. 2021;376:20200163. https://doiorg.publicaciones.saludcastillayleon.es/10.1098/rstb.2020.0163.

  15. Martelossi J, Nicolini F, Subacchi S, Pasquale D, Ghiselli F, Luchetti A. Multiple and diversified transposon lineages contribute to early and recent bivalve genome evolution. BMC Biol. 2023;21:145. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12915-023-01632-z.

  16. Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvák Z, Levin HL, Macfarlan TS, Mager DL, Feschotte C. Ten things you should know about transposable elements. Genome Biol. 2018;19:199. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-018-1577-z.

  17. Venner S, Feschotte C, Biémont C. Dynamics of transposable elements: towards a community ecology of the genome. Trends Genet. 2009;25:317–23. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.tig.2009.05.003.

  18. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8:973–82. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/nrg2165.

  19. Kramerov DA, Vassetzky NS. Origin and evolution of SINEs in eukaryotic genomes. Heredity. 2011;107:487–95. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/hdy.2011.43.

  20. Vassetzky NS, Kramerov DA. SINEBase: a database and tool for SINE analysis. Nucleic Acids Res. 2013;41:D83–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/nar/gks1263.

  21. Fattash I, Rooke R, Wong A, Hui C, Luu T, Bhardwaj P, Yang G. Miniature inverted-repeat transposable elements: discovery, distribution, and activity. Genome. 2013;56:475–86. https://doiorg.publicaciones.saludcastillayleon.es/10.1139/gen-2012-0174.

  22. Wang PL, Luchetti A, Alberto Ruggieri A, Xiong XM, Xu MR, Zhang XG, Zhang HH. Successful invasions of Short Internally Deleted Elements (SIDEs) and its partner CR1 in lepidoptera insects. Genome Biol Evol. 2019;11(9):2505–16. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evz174.

  23. Nishihara H, Plazzi F, Passamonti M, Okada N. MetaSINEs: broad distribution of a novel SINE superfamily in animals. Genome Biol Evol. 2016;8:528–39. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evw029.

  24. Luchetti A, Mantovani B. Conserved domains and SINE diversity during animal evolution. Genomics. 2013;102:296–300. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.ygeno.2013.08.005.

  25. Luchetti A, Mantovani B. Rare horizontal transmission does not hide long-term inheritance of SINE highly conserved domains in the metazoan evolution. Current Zoology. 2016;62:667–74. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/cz/zow095.

  26. Ziętkiewicz E, Labuda D. Mosaic evolution of rodent B1 elements. J Mol Evol. 1996;42:66–72. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/BF00163213.

  27. Gilbert N, Labuda D. CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs. Proc Natl Acad Sci. 1999;96:2869–74. https://doiorg.publicaciones.saludcastillayleon.es/10.1073/pnas.96.6.2869.

  28. Luchetti A, Šatović E, Mantovani B, Plohl M. RUDI, a short interspersed element of the V-SINE superfamily widespread in molluscan genomes. Mol Genet Genomics. 2016;291:1419–29. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s00438-016-1194-z.

  29. Ogiwara I, Miya M, Ohshima K, Okada N. V-SINEs: a new superfamily of vertebrate SINEs that are widespread in vertebrate genomes and retain a strongly conserved segment within each repetitive unit. Genome Res. 2002;12:316–24. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/gr.212302.

  30. Nishihara H, Smit AFA, Okada N. Functional noncoding sequences derived from SINEs in the mammalian genome. Genome Res. 2006;16:864–74. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/gr.5255506.

  31. Matetovici I, Sajgo S, Ianc B, Ochis C, Bulzu P, Popescu O, Damert A. Mobile element evolution playing jigsaw—SINEs in gastropod and bivalve mollusks. Genome Biol Evol. 2016;8:253–70. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evv257.

  32. Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, Marchler-Bauer A, Lanczycki C, Lathrop S, Lu Z, Thibaud-Nissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, Sherry ST. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022;50:D20–6. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/nar/gkab1112.

  33. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. RepeatModeler2: automated genomic discovery of transposable element families (preprint). Genomics. 2019. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/856591.

  34. Mao H, Wang H. SINE_scan: an efficient tool to discover short interspersed nuclear elements (SINEs) in large-scale genomic datasets. Bioinformatics. 2017;33:743–5. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bioinformatics/btw718.

  35. Mao H, Wang H. Distribution, diversity, and long-term retention of grass short interspersed nuclear elements (SINEs). Genome Biol Evol. 2017;9:2048–56. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evx145.

  36. Yan H, Bombarely A, Li S. DeepTE: a computational method for de novo classification of transposons with convolutional neural network. Bioinformatics. 2020;36:4269–75. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bioinformatics/btaa519.

  37. Goubert C, Craig RJ, Bilat AF, Peona V, Vogan AA, Protasio AV. A beginner’s guide to manual curation of transposable elements. Mob DNA. 2022;13:7. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13100-021-00259-7.

  38. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/S0022-2836(05)80360-2.

  39. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bioinformatics/btq033.

  40. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/molbev/mst010.

  41. Peona V, Martelossi J, Almojil D, Bocharkina J, Brännström I, Brown M, Cang A, Carrasco-Valenzuela T, DeVries J, Doellman M, Elsner D, Espíndola-Hernández P, Montoya GF, Gaspar B, Zagorski D, Hałakuc P, Ivanovska B, Laumer C, Lehmann R, Boštjančić LL, Mashoodh R, Mazzoleni S, Mouton A, Nilsson MA, Pei Y, Potente G, Provataris P, Pardos-Blas JR, Raut R, Sbaffi T, Schwarz F, Stapley J, Stevens L, Sultana N, Symonova R, Tahami MS, Urzì A, Yang H, Yusuf A, Pecoraro C, Suh A. Teaching transposon classification as a means to crowd source the curation of repeat annotation – a tardigrade perspective. Mob DNA. 2024;15:10. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13100-024-00319-8.

  42. Lowe TM, Chan PP. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 2016;44:W54–7. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/nar/gkw413.

  43. Chan PP, Lowe TM. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Res. 2016;44:D184–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/nar/gkv1309.

  44. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–2. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bioinformatics/bts565.

  45. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/gr.1239303.

  46. Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bioinformatics/btp348.

  47. Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5.

  48. Seppey M, Manni M, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness. Methods Mol Biol. 2019;1962:227–45. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/978-1-4939-9173-0_14.

  49. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/molbev/msaa015.

  50. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017;14:587–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/nmeth.4285.

  51. Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. UFBoot2: improving the ultrafast bootstrap approximation. Mol Biol Evol. 2018;35:518–22. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/molbev/msx281.

  52. To T-H, Jung M, Lycett S, Gascuel O. Fast dating using least-squares criteria and algorithms. Syst Biol. 2016;65:82–97. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/sysbio/syv068.

  53. Kumar S, Suleski M, Craig JM, Kasprowicz AE, Sanderford M, Li M, Stecher G, Hedges SB. TimeTree 5: an expanded resource for species divergence times. Mol Biol Evol. 2022;39(8):msac174. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/molbev/msac174.

  54. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009;25:4.10.1–4.10.14. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/0471250953.bi0410s25.

  55. Xu R, Martelossi J, Smits M, Iannello M, Peruzza L, Babbucci M, Milan M, Dunham JP, Breton S, Milani L, Nuzhdin SV, Bargelloni L, Passamonti M, Ghiselli F. Multi-tissue RNA-seq analysis and long-read-based genome assembly reveal complex sex-specific gene regulation and molecular evolution in the Manila clam. Genome Biol Evol. 2022;14(12):evac171. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evac171.

  56. Bai C-M, Xin L-S, Rosani U, Wu B, Wang Q-C, Duan X-K, Liu Z-H, Wang C-M. Chromosomal-level assembly of the blood clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C. GigaScience. 2019;8(7):giz067. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gigascience/giz067.

  57. Storer J, Hubley R, Rosen J, Wheeler TJ, Smit AF. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA. 2021;12:2. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13100-020-00230-y.

  58. González-Delgado S, Rodríguez-Flores PC, Giribet G. Testing ultraconserved elements (UCEs) for phylogenetic inference across bivalves (Mollusca: Bivalvia). Mol Phylogenet Evol. 2024;198:108129. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.ympev.2024.108129.

  59. González VL, Andrade SCS, Bieler R, Collins TM, Dunn CW, Mikkelsen PM, Taylor JD, Giribet G. A phylogenetic backbone for Bivalvia: an RNA-seq approach. Proc R Soc B Biol Sci. 2015;282:20142332. https://doiorg.publicaciones.saludcastillayleon.es/10.1098/rspb.2014.2332.

  60. Ip JC-H, Xu T, Sun J, Li R, Chen C, Lan Y, Han Z, Zhang H, Wei J, Wang H, Tao J, Cai Z, Qian P-Y, Qiu J-W. Host-endosymbiont genome integration in a deep-sea chemosymbiotic clam. Mol Biol Evol. 2021;38:502–18. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/molbev/msaa241.

  61. Lee Y, Kwak H, Shin J, Kim S-C, Kim T, Park J-K. A mitochondrial genome phylogeny of Mytilidae (Bivalvia: Mytilida). Mol Phylogenet Evol. 2019;139:106533. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.ympev.2019.106533.

  62. Sun W, Gao L. Phylogeny and comparative genomic analysis of Pteriomorphia (Mollusca: Bivalvia) based on complete mitochondrial genomes. Mar Biol Res. 2017;13:255–68. https://doiorg.publicaciones.saludcastillayleon.es/10.1080/17451000.2016.1257810.

  63. Wang Y, Yang Y, Kong L, Sasaki T, Li Q. Phylogenomic resolution of Imparidentia (Mollusca: Bivalvia) diversification through mitochondrial genomes. Mar Life Sci Technol. 2023;5:326–36. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s42995-023-00178-x.

  64. Sproul J, Hotaling S, Heckenhauer J, Powell A, Marshall D, Larracuente AM, Kelley J, Pauls SU, Frandsen PB. 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges. Genome Res. 2023;33(10):1708–17. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/gr.277387.122.

  65. Yang L, Scott L, Wichman HA. Tracing the history of LINE and SINE extinction in sigmodontine rodents. Mob DNA. 2019;10:22. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13100-019-0164-5.

  66. Ray DA, Grimshaw JR, Halsey MK, Korstian JM, Osmanski AB, Sullivan KAM, Wolf KA, Reddy H, Foley N, Stevens RD, Knisbacher BA, Levy O, Counterman B, Edelman NB, Mallet J. Simultaneous TE analysis of 19 heliconiine butterflies yields novel insights into rapid TE-based genome diversification and multiple SINE births and deaths. Genome Biol Evol. 2019;11:2162–77. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evz125.

  67. Han G, Zhang N, Jiang H, Meng X, Qian K, Zheng Y, Xu J, Wang J. Diversity of short interspersed nuclear elements (SINEs) in lepidopteran insects and evidence of horizontal SINE transfer between baculovirus and lepidopteran hosts. BMC Genomics. 2021;22:226. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12864-021-07543-z.

  68. Piskurek O, Okada N. Poxviruses as possible vectors for horizontal transfer of retroposons from reptiles to mammals. Proc Natl Acad Sci U S A. 2007;104:12046–51. https://doiorg.publicaciones.saludcastillayleon.es/10.1073/pnas.0700531104.

  69. Adrion JR, Song MJ, Schrider DR, Hahn MW, Schaack S. Genome-wide estimates of transposable element insertion and deletion rates in Drosophila melanogaster. Genome Biol Evol. 2017;9:1329–40. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evx050.

  70. Zhang Y, Mager DL. Gene properties and chromatin state influence the accumulation of transposable elements in genes. PLoS ONE. 2012;7.

  71. Vinogradov AE. DNA helix: the importance of being GC-rich. Nucleic Acids Res. 2003;31:1838–44. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/nar/gkg296.

  72. Coronado-Zamora M, González J. Transposons contribute to the functional diversification of the head, gut, and ovary transcriptomes across Drosophila natural strains. Genome Res. 2023;33:1541–53. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/gr.277565.122.

  73. Lenoir A, Lavie L, Prieto J-L, Goubely C, Cote J-C, Pélissier T, Deragon J-M. The evolutionary origin and genomic organization of SINEs in Arabidopsis thaliana. Mol Biol Evol. 2001;18:2315–22. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/oxfordjournals.molbev.a003778.

  74. Seibt KM, Wenke T, Muders K, Truberg B, Schmidt T. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization. Plant J. 2016;86:268–85. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/tpj.13170.

  75. Luchetti A, Plazzi F, Mantovani B. Evolution of two short interspersed elements in Callorhinchus milii (Chondrichthyes, Holocephali) and related elements in sharks and the coelacanth. Genome Biol Evol. 2017;9(6):1406–17. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evx094.

  76. Buckley RM, Kortschak RD, Raison JM, Adelson DL. Similar evolutionary trajectories for retrotransposon accumulation in mammals. Genome Biol Evol. 2017;9:2336–53. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/gbe/evx179.

  77. Ruggieri AA, Livraghi L, Lewis JJ, Evans E, Cicconardi F, Hebberecht L, Ortiz-Ruiz Y, Montgomery SH, Ghezzi A, Rodriguez-Martinez JA, Jiggins CD, McMillan WO, Counterman BA, Papa R, Belleghem SMV. A butterfly pan-genome reveals that a large amount of structural variation underlies the evolution of chromatin accessibility. Genome Res. 2022;32:1862–75. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/gr.276839.122.

  78. Devos KM, Brown JKM, Bennetzen JL. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 2002;12:1075–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/gr.132102.

  79. Hedges DJ, Deininger PL. Inviting instability: transposable elements, double-strand breaks, and the maintenance of genome integrity. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis (dedicated in memory of Dr. Tony Carrano). 2007;616:46–59. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.mrfmmm.2006.11.021.

  80. Morales ME, White TB, Streva VA, DeFreece CB, Hedges DJ, Deininger PL. The contribution of Alu elements to mutagenic DNA double-strand break repair. PLoS Genet. 2015;11.

  81. Balachandran P, Walawalkar IA, Flores JI, Dayton JN, Audano PA, Beck CR. Transposable element-mediated rearrangements are prevalent in human genomes. Nat Commun. 2022;13:7115. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41467-022-34810-8.

  82. Kinoshita Y, Saze H, Kinoshita T, Miura A, Soppe WJJ, Koornneef M, Kakutani T. Control of FWA gene silencing in Arabidopsis thaliana by SINE-related direct repeats. Plant J. 2007;49:38–45. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/j.1365-313X.2006.02936.x.

  83. Vassetzky NS, Kosushkin SA, Ryskov AP. SINE-derived satellites in scaled reptiles. Mob DNA. 2023;14:21. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13100-023-00309-2.

  84. Lee W, Mun S, Kang K, Hennighausen L, Han K. Genome-wide target site triplication of Alu elements in the human genome. Gene. 2015;561:283–91. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.gene.2015.02.052.

Acknowledgements

We thank the EVO·COM lab members for useful comments and discussions about the analyses and interpretation of the results.

Funding

This work was supported by the Canziani bequest, awarded to F.G. and A.L., and by ‘Ricerca Fondamentale Orientata’ (RFO) funding from the University of Bologna to F.G. and A.L.

Author information

Contributions

JM, AL, and FG designed the study. JM and MI collected the data and performed the bioinformatic analyses. JM curated the data. JM wrote the first version of the manuscript and additional supplementary files. JM, AL, FG, and MI revised the manuscript. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Jacopo Martelossi or Fabrizio Ghiselli.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

13100_2024_332_MOESM1_ESM.docx

Additional file 1: Supplementary Table 1. Species and assembly accession numbers used for de novo and homology-based mining of SINEs. Taxonomic information was retrieved from NCBI Taxonomy.

13100_2024_332_MOESM2_ESM.xlsx

Additional file 2: Supplementary Table 2. Details about confirmed SINE sequences mined with RepeatModeler2 and SINE_Scan. An asterisk in the tRNA column means that the tRNA was predicted through homology searches against GtRNAdb (http://gtrnadb.ucsc.edu); “Undet” means that the A and B boxes were manually verified; in all other instances the tRNA donor was predicted with tRNAscan-SE. For each element, we report the species from which it was mined, following the abbreviations in Supplementary Table 1.

13100_2024_332_MOESM3_ESM.docx

Additional file 3: Supplementary Table 3. Genomic distribution of observed and simulated LINE insertions in C. gigas with respect to different genomic backgrounds. Gene flanking = 2,500 bp at both ends of genes; Intergenic = intergenic genomic regions after excluding gene flanking; SD = standard deviation. Positive and negative Z-scores indicate more and fewer observed insertions than the null expectation, respectively.
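The Z-scores described for Supplementary Table 3 can be illustrated with a minimal sketch. It assumes the Z-score is the observed count minus the mean of the simulated counts, divided by their sample standard deviation; the source does not state which SD estimator was used. The function name `insertion_z_score` and the counts below are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def insertion_z_score(observed, simulated):
    """Z-score of an observed insertion count against simulated counts.

    Assumption: the null expectation is the mean of the simulated counts
    and SD is the sample standard deviation (n - 1 denominator).
    """
    if len(simulated) < 2:
        raise ValueError("need at least two simulated counts")
    sd = stdev(simulated)
    if sd == 0:
        raise ValueError("simulated counts have zero variance")
    return (observed - mean(simulated)) / sd


# Hypothetical counts for one genomic compartment (illustration only).
simulated_counts = [90, 100, 110]
z_enriched = insertion_z_score(120, simulated_counts)  # positive: more than expected
z_depleted = insertion_z_score(80, simulated_counts)   # negative: fewer than expected
```

A positive value corresponds to more observed insertions than the simulations predict, and a negative value to fewer, matching the sign convention in the table legend.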

13100_2024_332_MOESM4_ESM.pdf

Additional file 4: Supplementary Figure 1. SINE-LINE partnerships. Representative alignments between SINE-LINE homologous regions identified in this study. Identical nucleotides are included in grey boxes with asterisks. All LINE families were already identified in Martelossi et al. (2023). (A) A.marissinica_126=LINE/CR1; (B) T.granosa_0=LINE/CR1-Zenon; (C) S.constricta_0=LINE/I; (D) M.philippinarum_91=LINE/CR1-Zenon; (E) B.platrifrons_81=LINE/Nimb (I superfamily).

13100_2024_332_MOESM5_ESM.pdf

Additional file 5: Supplementary Figure 2. Network visualisation of homologous relationships between SINE tails obtained with BLASTn. Nodes represent LINE tails and edges represent homologous relationships.

13100_2024_332_MOESM6_ESM.pdf

Additional file 6: Supplementary Figure 3. TE-Aid results of all identified LINE partners. (A) Amar-1_LINE#CR1; (B) Cgig-1_LINE#L2; (C) CR1-14_CGi#CR1-Zenon; (D) Gpla-4_LINE#I; (E) Mcal-1_LINE#I; (F) Medu-1_LINE#I; (G) Myes-2_LINE#I; (H) Rphi-1_LINE#I; (I) Sbro-1_LINE#I; (L) Tgra-1_LINE#I.

13100_2024_332_MOESM7_ESM.pdf

Additional file 7: Supplementary Figure 4. Co-evolutionary dynamics between SINEs and their LINE counterparts. Repeat landscape profiles of species-specific SINE-LINE partners (see Materials and Methods: “Prediction of SINE-LINE partnerships”). The plots represent the total number of base pairs (y axis) occupied in each bin of CpG-corrected Kimura divergence (x axis).

13100_2024_332_MOESM8_ESM.pdf

Additional file 8: Supplementary Figure 5. Scatterplot of ranked values of number of base pairs occupied by SINEs (x axis) and their LINE counterparts (y axis) with respect to bins of size 1 of % CpG corrected Kimura divergence of each SINE and LINE copy to their consensus sequence. Rho: Spearman’s rank correlation coefficient (*p-value<0.05; **p-value<0.01; ***p-value<0.001). Ranked values reflect Supplementary Figure 3. Each box represents one SINE-LINE partnership.
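The rank correlation reported in Supplementary Figure 5 can be sketched as follows. This is a plain-Python illustration of Spearman's rho (Pearson correlation computed on average ranks), not the authors' code; the function names and the per-bin base-pair values are hypothetical, and no p-value is computed here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical base pairs per 1% divergence bin (illustration only).
sine_bp = [1200, 3400, 5600, 2100, 800]
line_bp = [900, 2800, 6100, 1500, 400]
rho = spearman_rho(sine_bp, line_bp)
```

Because only ranks enter the calculation, the statistic captures whether SINE and LINE profiles rise and fall across the same divergence bins, regardless of their absolute copy numbers.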

13100_2024_332_MOESM9_ESM.pdf

Additional file 9: Supplementary Figure 6. Different SINE accumulation patterns in introns of S. broughtonii and R. philippinarum. Number of SINE insertions per intron in R. philippinarum (Rphi) and S. broughtonii (Sbro).

Additional file 10: Supplementary Data S1. Consensus sequences of newly generated species-specific SINE families.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Martelossi, J., Iannello, M., Ghiselli, F. et al. Widespread HCD-tRNA derived SINEs in bivalves rely on multiple LINE partners and accumulate in genic regions. Mobile DNA 15, 22 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13100-024-00332-x

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13100-024-00332-x

Keywords