Analysis of pericentromere composition and structure elucidated the history of the Hieracium alpinum L. genome, revealing waves of transposable elements insertions
Mobile DNA volume 15, Article number: 26 (2024)
Abstract
Background
The centromere is one of the key regions of the eukaryotic chromosome. While maintaining its function, centromeric DNA may differ among closely related species. Here, we explored the composition and structure of the pericentromeres (a chromosomal region including a functional centromere) of Hieracium alpinum (Asteraceae), a member of one of the most diverse genera in the plant kingdom. Previously, we identified a pericentromere-specific tandem repeat that made it possible to distinguish reads within the Oxford Nanopore library attributed to the pericentromeres, separating them into a discrete subset and allowing comparison of the repeatome composition of this subset with the remaining genome.
Results
We found that the main satellite DNA (satDNA) monomer forms long arrays of linear and block types in the pericentromeric heterochromatin of H. alpinum, and very often, single reads contain forward and reverse arrays that mirror each other. Besides the major family, two new minor satDNA families were discovered. In addition to satDNAs, high amounts of LTR retrotransposons (TEs), dominated by the Tekay lineage, were detected in the pericentromeres. We were able to reconstruct four main TEs of the Ty3-gypsy and Ty1-copia superfamilies and compare their relative positions with satDNAs. This comparison showed that the conserved domains (CDs) of the TE proteins are located between the newly discovered satDNAs, which appear to be parts of ancient Tekay LTRs that we were able to reconstruct. The dominant satDNA monomer shows a certain similarity to the GAG CD of the Angela retrotransposon.
Conclusions
The species-specific pericentromeric arrays of the H. alpinum genome are heterogeneous, exhibiting both linear and block type structures. High amounts of forward and reverse arrays of the main satDNA monomer point to multiple microinversions that could be the main mechanism for rapid structural evolution, stochastically creating the uniqueness of an individual pericentromeric structure. Traces of TE insertion waves remain in pericentromeres for a long time, thus “keeping memories” of past genomic events. We counted at least four waves of TE insertions. In pericentromeres, parts of TEs can be transformed into satDNA, which constitutes a background pool of minor families that, under certain conditions, can replace the dominant one(s).
Background
The centromere is one of the key regions of eukaryotic chromosomes and plays an important role in the precise segregation of chromosomes during cell division [1,2,3]. Cytologically, centromeres are recognized as primary constrictions where two identical sister chromatids are in the closest contact. In the pericentromeric region, blocks of structural heterochromatin, consisting mainly of satellite DNAs (satDNAs) and transposable elements (TEs), are located [4] (in this study, we define this block of heterochromatin in the region of the primary constriction that includes the functional centromere as the “pericentromeric region” or “pericentromere”). While maintaining chromosome segregation across eukaryotes, centromeric DNA is among the fastest evolving [5] and may differ even among closely related species [6]. Hence, rapidly evolving pericentromeric components may be responsible for reproductive isolation or, in other words, may be involved in the speciation process [7]. Insight into the structural features and sequence variation of pericentromeres is essential for a more comprehensive understanding of both centromere functions [8] and macroevolutionary processes in groups of species.
Therefore, Hieracium s. str. (Asteraceae, Cichorieae, Hieracium subgen. Hieracium, hawkweed), which is one of the most diverse genera in the entire plant kingdom, is of particular interest. The recent morphological variability in the genus likely reflects immense reticulate evolution in the past. Hieracium is predominantly a Eurasian complex, with species numbers varying between approximately 500 and 5,000, depending on the species concept [9, 10]. It consists of ca. 30 diploid sexual species (2n = 2x = 18) and a vast number of morphologically more or less easily distinguishable apomictic polyploids with prevailing tri- and tetraploids and rare pentaploids [11]. Although the chromosome numbers in the genus Hieracium are well studied [12], genomic studies at the molecular level have focused predominantly on ribosomal sites [13] and chloroplast DNA [14]. Low-coverage next-generation sequencing has been used only a limited number of times, for developing a marker system for molecular cytogenetic analysis of Hieracium chromosomes [15, 16] and for repeatome analysis [17]. For any Hieracium species, there are no data on third-generation (nanopore) sequencing or assembled genomes, whereas in related taxa of Cichorieae, namely Taraxacum [18], Cichorium [19] and Lactuca [20], sequenced genomes have been used in fundamental and applied studies [21,22,23].
In the present research, for the first time, we used the Oxford Nanopore Technology (ONT) sequencing data from the Hieracium alpinum genome. The section Alpina is central to the system of the genus Hieracium [24]. H. alpinum is an Arcto-alpine species growing in open-canopy grasslands on mountain summits and on the highest slopes, in dwarf shrub communities, on rock ledges, on bare stony slopes, and in Nardus grasslands in northern Europe, the highest central European mountains and Greenland (Fig. 1A). It consists of allopatric sexual diploids (2n = 2x = 18, Eastern and Southern Carpathians) and apomictic triploids (2n = 3x = 27, the remainder of the range). We explored the genome of diploid H. alpinum (sexual diploid) because it can serve as a benchmark for further research on apomictic speciation, with a special emphasis on the role of pericentromeric chromosomal regions in this process.
The chromosomal molecular marker system for Hieracium that we developed [15] includes a pericentromere-specific tandem repeat HintCl-18 (Fig. 1B). This repeat forms a block of pericentromeric heterochromatin that includes the primary constriction (the centromere) (Fig. 1B Box 1). This has made it possible to distinguish individual reads attributed to the pericentromeric regions of H. alpinum chromosomes within the ONT sequencing library [25] and separate them into a discrete subset. In the present research, we compared the repetitive DNA composition of the pericentromeric regions with that of the remaining genome. We assume that separation should reveal any: (i) specificity of pericentromeric repeatome structural organization, and (ii) possible structural connections between the components of this region.
Object of the study and workflow. A Hieracium alpinum, whole plant and inflorescence. (Photos by Begoña Quirós de la Peña). B Chromosomal position of the HintCl-18 satDNA element determined by FISH (chromosomes and interphase nucleus). A comparison of the positions of the primary constriction (centromere) and block of HintCl-18 on the same chromosome is shown in Box (1). The blue fluorescence represents DAPI staining, and the red fluorescence represents the HintCl-18 probe. The scale bar represents 5 μm. C Graphical representation of the HintCl-18 satDNA family monomer as a sequence logo. D Workflow for comparative analysis of the p- and g-subsets of the ONT library. E Workflow for reconstructing complete transposable elements from their parts
Methods
Plant material, DNA extraction, library preparation, and Oxford Nanopore Technology sequencing
For both the preparation of the DNA libraries and the cytogenetic experiments, H. alpinum plants from the Mt. Bliznitsya (Ukraine) population (ID number: PAI 33838) were used. The plants were cultivated in the experimental field of the Institute of Botany in Průhonice. Leaves were collected, and DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. For in situ hybridization experiments, the tips of young roots were collected and fixed as described by Belyayev et al. [15] and then stored until use.
The ONT sequence data for the H. alpinum genomic sample were generated on the PromethION platform. Genomic DNA was obtained from KeyGene, and quality control was performed, which involved concentration measurements (before and after Ampure purification), and a check of the length distribution via a Femto Pulse instrument. Size selection was performed on the genomic DNA via the PacBio/Circulomics SRE Kit. The SQK-LSK114 library prep kit was used to construct a 1D library according to the manufacturer’s protocol. Approximately 290 ng of the library was loaded on each FLO-PRO114M (R10.4.1 pore) flow cell. After 21 h a nuclease flush was performed and ~ 175 ng of the remaining library was loaded. The sample was sequenced on a PromethION P24 for 72 h, and run at a translocation speed of ~ 400 bps. Basecalling was performed in real time by the PromethION compute module via MinKNOW version 22.10.5 (Guppy 6.3.8) with the superaccurate basecalling model. All reads that passed the default quality filter (q ≥ 10) are provided in the *.fastq format.
ONT library samples of 2, 1.5 and 1 million reads were created with the “random sampling” pipeline of RepeatExplorer (RE) [26]. The TE quantities in each sample were determined by another component of RE, namely Domain based ANnotation of Transposable Elements (DANTE). To compare the samples, the filtered count for each of the determined TEs was recalculated to a relative value; specifically, we calculated the number of hits per 10 Mbp (hits/10Mbp). The obtained values revealed no significant differences in the number of TEs among the three samples. Thus, for further analysis, we used a 1 million-read sample (trimmed) of the ONT library (total length 16 420 720 704 bp, genome coverage 2.22x [27]).
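The normalization to hits/10Mbp described above can be illustrated with a minimal Python sketch. The function name and the sample counts below are hypothetical and serve only to show the arithmetic; they are not values from this study.

```python
def hits_per_10mbp(hit_count: int, total_bp: int) -> float:
    """Convert a raw hit count into hits per 10 Mbp of sequence."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    # Multiply first so that simple inputs stay exact in floating point.
    return hit_count * 10_000_000 / total_bp


# Hypothetical (made-up) filtered counts for one TE lineage in three samples.
samples = {
    "2.0M reads": (6_000, 32_000_000_000),
    "1.5M reads": (4_500, 24_000_000_000),
    "1.0M reads": (3_000, 16_000_000_000),
}

for name, (hits, total_bp) in samples.items():
    print(f"{name}: {hits_per_10mbp(hits, total_bp):.3f} hits/10Mbp")
```

Because the value is a density, samples of different size become directly comparable, which is what allows the smallest sample to be used for the downstream analysis.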
Dividing the ONT library into subsets and screening for satDNA
On the basis of our previous data [15], a conserved motif within the HintCl-18 monomer was determined (Fig. 1C). This motif was used to search for HintCl-18 arrays. In silico scanning of the ONT library was performed by using the “search for motifs” command of Geneious Prime software version 2023.2.1 (https://www.geneious.com) [28] with zero mismatches. Reads containing arrays of pericentromeric repeats were separated into a discrete subset (p-subset). Thus, it became possible to compare the repetitive DNA composition of the pericentromeric regions (p-subset) with that of the remaining genome (g-subset). The workflow for comparative analysis of the p- and g-subsets is presented in Fig. 1D. For analysis of the structure of the major repeat arrays and for the search for possible additional minor satDNAs in the p-subset, two publicly available online tools were used: tandem repeat finder (TRF) (https://tandem.bu.edu/trf/trf.html) [29] and the YASS genomic similarity tool (http://bioinfo.lifl.fr/yass/yass.php) [30]. The latter builds a self-to-self comparison of the ONT reads displayed as a dot-plot where parallel lines indicate tandem repeats and the distance between the diagonals is equal to the length of the motif. In all dot-plots in the paper, green lines indicate forward sequences, and red lines indicate reverse sequences (Figs. 2A-C and 3C-F and H-J). For newly discovered satDNA families that did not show any BLAST similarities, a consensus monomer was determined. Conserved motifs of 14 bp were distinguished within the consensus monomers for further in silico scanning of the p- and g-subsets to quantify the content of the repeatome components (Table 1).
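The separation of reads into p- and g-subsets by an exact (zero-mismatch) motif search can be sketched as follows. This is only an illustration of the idea, not the Geneious Prime procedure: the motif string is a hypothetical placeholder (the actual HintCl-18 motif is given in Fig. 1C and Table 1), and the `min_hits` threshold for calling an "array" is an assumption.

```python
from typing import Dict, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_reads(
    reads: Dict[str, str], motif: str, min_hits: int = 1
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split reads into (p_subset, g_subset) by exact motif matches.

    A read goes to the p-subset when the motif, on either strand,
    occurs at least `min_hits` times (non-overlapping count).
    """
    fwd = motif.upper()
    rev = reverse_complement(fwd)
    p_subset, g_subset = {}, {}
    for name, seq in reads.items():
        s = seq.upper()
        hits = s.count(fwd)
        if rev != fwd:  # avoid double counting palindromic motifs
            hits += s.count(rev)
        (p_subset if hits >= min_hits else g_subset)[name] = seq
    return p_subset, g_subset


# Hypothetical placeholder motif, NOT the real HintCl-18 motif.
PLACEHOLDER_MOTIF = "GGATCCTTAGCA"
```

In practice a higher `min_hits` value would make the filter stricter, so that only reads carrying genuine tandem arrays, rather than isolated motif copies, enter the p-subset.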
Determination of the TE content and reconstruction of the intact TEs
The conserved protein domains (CDs) of TEs were retrieved from the ONT library by DANTE. CDs were filtered from the Viridiplantae_v3.0 database with the following parameters: minimum identity, 0.35; minimum similarity, 0.45; minimum alignment length, 0.8; interruptions (frameshifts + stop codons), 3; and maximal length proportion, 1.2. For comparison of p- and g-subsets, filtered counts for each determined TE were recalculated to relative values of hits/10Mbp.
satDNA of H. alpinum pericentromeres. A Read 2002 with a linear-type HintCl-18 array (blue). B Read 263 with block-type HintCl-18 arrays (HintCl-18 - blue, linkers - red). C Enlargement of three blocks of the block-type HintCl-18 array structure (dot-plot only). D Chromosomal position of the linker sequences determined by FISH. In Boxes 1 and 2, the second round of hybridization of the same metaphase chromosomes with HintCl-18 is shown. The scale bar represents 5 μm
To determine possible matches (in nucleotide composition and/or positioning) between TEs and pericentromeric satDNAs, the four most abundant TEs in H. alpinum pericentromeres, two of the Ty3-gypsy superfamily (CRM and Tekay) and two of the Ty1-copia superfamily (SIRE and Angela), were reconstructed. The algorithm for reconstruction was as follows: (1) the sequences for key conserved domains, particularly reverse transcriptase (RT), GAG and PROT or RVE, for each element were determined by DANTE; (2) conserved motifs for each CD were determined from 20 complete CD sequences without stop codons (Table 1); (3) the particular motif was used for a CD position search in the ONT library by the “find motifs” command in Geneious Prime software; (4) in the corresponding read where three CDs of each TE were found together, a fragment of approximately 30,000 bp around the CD positions was captured and analyzed for the presence of: (i) LTRs, which can be seen in dot-plots (YASS program output) as two parallel lines at the ends of the element (i.e. identical sequences), with the 5’ and 3’ ends of the LTRs determined by pairwise sequence alignment (https://www.ebi.ac.uk/jdispatcher/psa/emboss_needle), and (ii) a set of CDs typical for specific elements (BLAST CD search: https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi); (5) the CD composition and the relative positions of the CDs were determined on the basis of 10–12 determined elements (which were generally incomplete but still contained different complete CDs), and the complete element with LTRs was reconstructed (Fig. 1E).
For the reconstruction of two ancient Tekay elements (see below), reads that met the following requirements were selected: (i) contain Halp-aLTR-1 or Halp-aLTR-2 monomer arrays that are several thousand bp apart; (ii) between the two monomer arrays there should be one or more CDs; and (iii) the YASS results should show a configuration specific for LTR retrotransposons (two parallel lines at the end of the element, see below).
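Step (4) of the reconstruction algorithm, capturing a fragment of about 30,000 bp around co-occurring CD motifs, could look roughly like the sketch below. The function name, the forward-strand-only search and the way the window is centred are assumptions made for illustration; the study performed this step interactively in Geneious Prime.

```python
from typing import Optional, Sequence, Tuple


def capture_window(
    read: str, cd_motifs: Sequence[str], window: int = 30_000
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of a window around co-occurring CD motifs.

    Returns None unless every motif is found in the read. Only the
    forward strand and the first occurrence of each motif are
    considered, which is a simplification.
    """
    positions = []
    for motif in cd_motifs:
        pos = read.find(motif)
        if pos == -1:
            return None  # the CDs are not all present in this read
        positions.append(pos)
    # Centre the window on the midpoint between the outermost motifs.
    center = (min(positions) + max(positions)) // 2
    start = max(0, center - window // 2)
    end = min(len(read), start + window)
    return start, end
```

The captured slice `read[start:end]` would then be passed to a dot-plot tool and a pairwise aligner to look for the flanking LTRs, as described in the text.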
In situ probe preparation and FISH procedure
To characterize the chromosomal distribution of the satDNAs, fluorescence in situ hybridization (FISH) experiments were performed. The root fixation, slide preparation, probe labeling and FISH procedures were performed as described for Hieracium species by Belyayev et al. [15]. For visualization of the “linker” sequences (see Results section), we used a 65 bp synthetic probe (ATGATCCTGTTACCAAAGGGGCTTTACAGAAATAGTCATAACATGGGCTACGGAGTCCATTTTA). The probe was prepared and labeled with Cy3 by Eurofins Genomics, Ebersberg, Germany. The slides were examined and photographed with a Zeiss Axio Imager Z2 microscope system.
Results
satDNA composition of H. alpinum pericentromeres
The main component of the pericentromeric regions of the H. alpinum genome is arrays of HintCl-18 tandem repeats of 21 bp monomers (Fig. 1B, C), which we described previously [15]. We found three variations of the monomer with two substitutions at positions 7 and 21. In silico scanning of the ONT library by Geneious Prime revealed that monomers and their higher-order repeat (HOR) derivatives (which are larger repeat units consisting of multiple basic repeat units [31]) form arrays that may extend beyond 55,000 bp (Supplementary S1). HORs are common in HintCl-18 arrays; for example, we counted more than 60 HOR variants (determined by TRF) with different lengths (up to 312 bp) within an array ≈ 50,000 bp long. Very often, a single read contains forward and reverse arrays of HintCl-18, which may indicate high levels of microinversions (Supplementary S1). Within the array, the monomers were mostly oriented in the same direction (linear-type arrays) (Fig. 2A), but approximately 2% of the reads in the p-subset possessed an unusual structure in which the array consisted of repeated ≈ 650 bp blocks of forward and reverse HintCl-18 repeats with “linkers” between them (block-type arrays) (Fig. 2B, C; Supplementary S1). The linker lengths ranged from approximately 150 to 300 bp, and they differed in nucleotide composition from HintCl-18 but were similar to each other. We analyzed twenty randomly selected linkers from the longest reads of the p-subset, and their identity ranged from 74.48% to 92.18%, which may indicate shared ancestry. Owing to the low copy number in the genome (in the p-subset, linkers occur with a frequency of 73.84 hits/10Mbp, whereas HintCl-18 monomers occur with a frequency of 57,899.56 hits/10Mbp), we doubted the possibility of detecting the FISH signal from the linker probe, expecting that it would be at the level of the background signal (noise), which is usually discriminated during the processing of high-copy-number in situ probes.
However, FISH with the synthetic linker probe revealed weak but distinct signals on two chromosome pairs at the centromeric position (Fig. 2D). It is possible that there was a very low signal on several other chromosomes, but we could not confidently assert this because it was at the limit of resolution for the method. A second round of hybridization of the same metaphase chromosomes with HintCl-18, which shows the entire block of pericentromeric heterochromatin, revealed that linkers occupy a predominantly central position, thus forming a small subblock (Fig. 2D Boxes 1 and 2).
Additionally, within the p-subset, by using TRF we found two new minor satDNA repeats unrelated to HintCl-18, namely, Halp-aLTR-1 (abbreviations are explained below), with a short monomer of ≈ 20 bp (consensus: ATCAGGCCATTCCGGCCCAT), and Halp-aLTR-2, with a monomer of ≈ 40 bp (consensus: ATCACAGTCATATATAGGTTAGTTACCATTGAGGTAACTATGTTC) (Supplementary S2). These two new satDNAs are present not only in the pericentromeres but also in the g-subset, although they are in greater numbers in the p-subset. Thus, Halp-aLTR-1 occurred with a frequency of approximately 52.11 hits/10Mbp in the p-subset and a frequency of 2.22 hits/10Mbp in the g-subset, and Halp-aLTR-2 occurred with a frequency of approximately 245.32 hits/10Mbp in the p-subset and a frequency of 17.99 hits/10Mbp in the g-subset (Table 2). The Halp-aLTR-1 monomer is often represented by copies usually located at a distance of 8,500–10,000 bp from each other, although in read 1265 of the g-subset, we discovered two arrays oriented in different directions, in which ≈ 50 copies of Halp-aLTR-1 were located on a DNA segment of approximately 5,000 bp (Supplementary S2). The Halp-aLTR-2 monomer is represented by short arrays of 2–15 copies that are often located at a distance of 4,500–8,500 bp from each other.
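The observation that single reads carry forward and reverse arrays can be illustrated with a small Python sketch that groups exact monomer hits into runs of the same orientation. The Halp-aLTR-1 consensus is taken from the text above; the function itself, and the use of exact matching (real arrays contain monomer variants and sequencing errors), are simplifying assumptions.

```python
import re
from itertools import groupby
from typing import List, Tuple

HALP_ALTR_1 = "ATCAGGCCATTCCGGCCCAT"  # consensus monomer given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orientation_runs(read: str, monomer: str) -> List[Tuple[str, int, int, int]]:
    """Group exact monomer hits into consecutive same-strand runs.

    Each run is (strand, copies, start, end), where strand is "+" for
    the monomer as given and "-" for its reverse complement.
    """
    read = read.upper()
    fwd = monomer.upper()
    rev = reverse_complement(fwd)
    hits = [(m.start(), "+") for m in re.finditer(re.escape(fwd), read)]
    if rev != fwd:
        hits += [(m.start(), "-") for m in re.finditer(re.escape(rev), read)]
    hits.sort()
    runs = []
    for strand, group in groupby(hits, key=lambda h: h[1]):
        positions = [pos for pos, _ in group]
        runs.append((strand, len(positions), positions[0], positions[-1] + len(fwd)))
    return runs
```

A read whose run list alternates between "+" and "-" would be a candidate for the mirrored forward and reverse arrays described above; the dot-plots remain the authoritative view of the structure.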
TEs components of H. alpinum pericentromeres
Partitioning the ONT library into subsets allowed us to compare the presence of different TE families in the pericentromeres and the remaining genome. According to this analysis using DANTE, all TEs that were present in the genome were also present in the pericentromeric regions, but their content, expressed as hits/10Mbp, was more than tenfold lower than that in the remaining genome (Fig. 3A; Table 3). However, two elements were exceptions. The copy number of the centromere-specific Ty3-gypsy chromovirus CRM was higher (which was expected). The content of the Ty3-gypsy chromovirus of the Tekay lineage, which is dominant in the genomes of Hieracium [17] and in the Asteraceae in general [32], was only 19% lower in the pericentromeric regions (Fig. 3A).
TEs components of H. alpinum pericentromeres and their relationships with satDNAs. A The frequency of the major TE family in the conversion to hits/10Mbp (Z axis) in subsets of the ONT library shows a significant copy-number reduction, except for Tekay and CRM in pericentromeres (see also Table 3). B Example of the intermixing of satDNAs and TEs in the longest read of the p-subset. C Insertion of the Tekay element between two arrays of HintCl-18 satDNA (read 4772). D Insertion of the CRM element in the array of HintCl-18 satDNA (read 1047). E Localization of the RT (red) and PROT (green) conserved domains of the Tekay element between arrays of Halp-aLTR-1 (blue) (read 1819). F Location of the conserved RT domain (red) of the Tekay element between arrays of Halp-aLTR-2 (blue) (read 1719). G Alignment of the Angela GAG CD and HintCl-18 monomer. H TE-like structure of HintCl-18 satDNA arrays. I Comparison of the LTRs structure via dot-plots of two present and two ancient Tekay elements. J Dendrogram of LTR sequences
For further analysis of the putative relationship between satDNAs and TEs, we reconstructed the modern CRM and Tekay Ty3-gypsy elements of Hieracium based on 10 and 12 almost complete elements that were found in the ONT library of H. alpinum (Supplementary S3). The length of the CRM element was approximately 6,300 bp, with an LTR of approximately 515 bp. The Tekay element was more than twice as long at approximately 13,500 bp, with an LTR of approximately 3,500 bp. The latter appears to be a complex mix of forward and reverse monomers of different lengths and conserved motifs that are typical for Ty3-gypsy elements (see below) [33]. The interior of the reconstructed CRM and Tekay elements contained CDs that were identified by DANTE as retrotransposons of the corresponding family. Based on the obtained data, we identified conserved CD and LTR motifs for the subsequent identification of full copies of TEs and their separate parts in the p-subset (Table 1).
The percentage of Ty1-copia superfamily elements was relatively low in the pericentromeric regions (Fig. 3A). The main elements in the H. alpinum genome were SIRE and Angela, which we were also able to reconstruct (Supplementary S3). The length of the SIRE element was 17,433 bp, with an LTR of ≈ 876 bp. The length of the Angela element was 9,668 bp, with an LTR of ≈ 450 bp. The interior of the reconstructed SIRE and Angela elements contained conserved domains that were identified by DANTE as the retrotransposons of the corresponding family.
Additionally, we attempted to reconstruct the Athila element. Although we determined that the length of the element was ≈ 13,000 bp and the length of the LTRs was ≈ 1,600 bp, and although we succeeded in finding conserved motifs in the protein domains (Table 1), we were unable to reconstruct the entire element because no complete CDs were found in the ONT library.
Relationship between TEs and satDNAs in pericentromeric regions
Retrieval of TEs from the ONT library via a conserved motif search made it possible to determine the relative position and possible affinity of the repeatome components in the pericentromeric regions of the H. alpinum chromosomes. Generally, the pericentromeric regions of H. alpinum appear to be a complex mix of satDNAs and TEs (Fig. 3B; Supplementary S4). Within the p-subset, we found almost complete and incomplete TEs as well as their solo components. For example, in read 4772, an insertion of an almost complete Tekay element in the HintCl-18 array was detected (Fig. 3C; Supplementary S4). In the same read, on the basis of the presence of putative LTRs, another inserted TE near the 3’ end could be observed, but we failed to define it owing to its strong degradation. Examples of insertions can be found for each of the studied TEs (Fig. 3D; Supplementary S4). Additionally, many solo LTRs and CDs were found in the pericentromeric regions, most of which belong to the Tekay element (Fig. 3B; Supplementary S4).
A comparison of the relative positions of the newly discovered satDNAs and solo CDs revealed that, very often, solo CDs of the Tekay element were located in the spaces between the arrays of Halp-aLTR-1 and Halp-aLTR-2 (Fig. 3E, F; Supplementary S2). On dot-plots, these look like decayed TEs in which the newly discovered satDNAs belong to the LTRs. We also noted that Tekay CDs were more common in the spaces between Halp-aLTR-1 arrays than in those between Halp-aLTR-2 arrays. In the p-subset, we counted 66 cases in which the decayed Tekay CDs were between Halp-aLTR-1 arrays oriented in the same direction and only 14 cases in which the decayed Tekay CDs were between Halp-aLTR-2 arrays. For example, read 1819 contains three putative decayed TEs with Halp-aLTR-1 in the LTR regions and an incomplete RT and PROT of Tekay in between (Fig. 3E; Supplementary S2), and read 1719 contains Halp-aLTR-2 and an incomplete RT in between (Fig. 3F; Supplementary S2).
Notably, the main pericentromeric monomer HintCl-18 is similar in nucleotide composition to the fragments of the Angela Ty1-copia retrotransposon GAG domain (Fig. 3G; Supplementary S4) and sometimes forms dot-plot structures resembling decayed TEs (Fig. 3H; Supplementary S4).
Reconstruction of ancient Tekay elements
Using the strategy of reconstructing complete elements from their parts, we attempted to at least partially reconstruct the ancient Tekay retrotransposons, which, as we assumed, were sources for arrays of Halp-aLTR-1 and Halp-aLTR-2 satDNAs in the H. alpinum genome (Supplementary S5). The reconstructed Tekay element that was the source for the Halp-aLTR-1 satDNA was 10,456 bp in length, with an LTR of 1,233 bp (see read 1819 of the p-subset). The reconstructed Tekay element that was the source for the Halp-aLTR-2 satDNA was 11,779 bp in length, with an LTR of 3,682 bp (see read 1719 of the p-subset). Notably, although the two elements have approximately the same total length, the distance between the two LTRs (where CDs were located) was almost half as long in the Halp-aLTR-2 element (4,415 bp versus 7,990 bp in the Halp-aLTR-1 element). The CDs of recent and ancient elements were highly conserved. The most numerous residues of the RT domain presented approximately 81.8–84.7% homologous fragment similarity between the ancient and recent CDs, but all CDs from ancient elements were incomplete. The LTRs were most different, as expected. We were able to reconstruct the LTRs of the relatively young (according to the quantity and preservation of CDs) Halp-aLTR-1 and the more ancient Halp-aLTR-2 (Supplementary S5, S6). We also compared the LTRs of the present Tekay retrotransposon of H. alpinum (Cichorioideae, Asteraceae) with the LTRs of similar elements from the genomes of other Asteraceae species that belong to a different subfamily, in particular Marshallia obovata (Asteroideae, Asteraceae) (GenBank KX396599.1). The percent identity of the four analyzed LTRs (two present and two ancient) was very low and did not exceed 47.87% (Supplementary S6). In addition, the structures of the LTRs were completely different (Fig. 3I). However, conserved Ty3-gypsy motifs [33, 34] were still present in all the analyzed LTRs (Supplementary S6), which made them comparable.
The application of the Clustal Omega 2.1 program [35] resulted in a dendrogram showing that LTRs containing the Halp-aLTR-1 monomer were closer to the Tekay LTRs of H. alpinum and LTRs containing the Halp-aLTR-2 monomer were closer to the Tekay LTRs of M. obovata (Fig. 3J).
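As a consistency check, the two inter-LTR distances quoted for the ancient Tekay elements follow directly from the reported element and LTR lengths, assuming that each element is bounded by two LTRs of the stated length:

```latex
\begin{align*}
d_{\text{inter-LTR}} &= L_{\text{element}} - 2\,L_{\text{LTR}} \\
\text{Halp-aLTR-1 element:}\quad d &= 10\,456 - 2 \times 1\,233 = 7\,990\ \text{bp} \\
\text{Halp-aLTR-2 element:}\quad d &= 11\,779 - 2 \times 3\,682 = 4\,415\ \text{bp} \\
\frac{7\,990}{4\,415} &\approx 1.81
\end{align*}
```

The ratio of about 1.81 corresponds to the "almost twofold" difference in the internal region described in the text.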
Discussion
The application of the ONT-separated libraries approach made it possible to reveal features of the pericentromeric regions of H. alpinum chromosomes. The dominant 21-bp-long satDNA repeat HintCl-18 forms forward and reverse linear-type arrays that often mirror each other in blocks of pericentromeric heterochromatin (Fig. 3B). The opposite direction of the arrays may indicate high levels of microinversions that occur in the pericentromeric regions of H. alpinum chromosomes. In general, inversions in plant and animal kingdoms are fundamental drivers of genome evolution [36,37,38] including centromere shifts [39]. Microinversions can be defined as cytologically undetectable inversions with sizes ranging from 23 bp to 62 Mb [40]. In pericentromeres, which are treated as recombination cold spots [4], microinversions may be the main mechanism for rapid structural evolution while maintaining nucleotide composition (similar to that of mitochondrial DNA [41]). For the possible mechanisms of microinversion formation, alternative template-switching [42] and microhomology-mediated BIR models [43] have been proposed.
In addition to linear-type HintCl-18 satDNA arrays, we identified much rarer block-type arrays. According to our FISH results, these arrays occupy the central part of pericentromeric heterochromatin, i.e., they are concentrated in the centromeric region (Fig. 2D). We do not know whether block-type arrays have any special function, how they were formed, or which type, linear or block, was the primary one. We can only propose that these arrays may also result from microinversions, but they mostly resemble aged TEs (see below), especially if the similarity of the linker sequences is considered. We also cannot exclude the possibility that aged TEs in combination with microinversions were involved in the formation of block-type arrays.
TEs are another major component of pericentromeres [3, 8, 25]. Comparative analysis of the p- and g-subsets of the ONT library revealed that all major TEs (predominantly LTR retrotransposons) that are present in the H. alpinum genome were also present in the pericentromeres, but their quantities decreased significantly. The exceptions were the Ty3-gypsy retrotransposon Tekay, which is dominant in Hieracium [17], and the centromere-specific Ty3-gypsy retrotransposon CRM (Fig. 3A). Using the strategy of reconstructing complete elements from parts that contain complete CDs and LTRs, we reconstructed the four most common TEs in the H. alpinum genome: two of the Ty3-gypsy superfamily and two of the Ty1-copia superfamily. This made it possible to connect the TE and the main satDNA pericentromeric monomer (similar to that reported for potato [44]) and to identify waves of ancient TE insertions. Notably, there is growing evidence of the involvement of TEs in generating a library of tandem repeats that can be dispersed throughout the genome and, in some cases, amplified into long arrays of new satDNAs [45,46,47,48]. In the H. alpinum genome, we also found that the newly discovered satDNAs, particularly Halp-aLTR-1 and Halp-aLTR-2, are fragments of the LTRs of ancient Tekay elements. Although these satDNAs are found throughout the genome, their abundance in pericentromeric heterochromatin is significantly greater, possibly due to the high conservation of chromosome pericentromeric regions [1, 8, 49], and the older the element insertion is, the greater the number of specific derivatives that can already be considered a minor satDNA family (>10 monomers per array [50]) (Table 2). At the end point of this transformation, the highest copy number is exhibited by the dominant HintCl-18 satDNA, which shows a certain similarity to the GAG CD of the Angela retrotransposon (Fig. 3G, Supplementary S4). This similarity made it possible to propose that the oldest determined insertions in H. alpinum pericentromeres were Ty1-copia elements (Angela), and the subsequent waves of insertions were Ty3-gypsy Tekay elements. Thus, four major waves of insertions can be proposed: (1) Ty1-copia, Angela (source for HintCl-18 satDNA), (2) Ty3-gypsy, Tekay (source for Halp-aLTR-2 satDNA), (3) Ty3-gypsy, Tekay (source for Halp-aLTR-1 satDNA), and (4) Ty3-gypsy, Tekay (modern elements). In this manner, the composition of the pericentromeres of the present H. alpinum genome was formed. A similar dominance of Ty3-gypsy Tekay elements (especially in centromeric regions) was discovered in the genome of the related species Lactuca sativa [23]. These data are consistent with the conclusions of Staton and Burke [32] that, in Asteraceae, during evolution, there has been a directional increase in the copy number of Ty3-gypsy retrotransposons. Therefore, there is a consistent accumulation of TE derivatives of different insertion ages in pericentromeres [4], which is understandable, given their certain “closedness”. In other words, past genomic events seem to be “frozen” in the H. alpinum pericentromere.
Over time, the typical LTR-type arrangement of derivative satDNAs may transform into a linear-type arrangement (linearization). For example, in read 1265 of the g-subset, we detected a linear array based on the Halp-aLTR-1 monomer: instead of the common arrangement, in which arrays with a few copies of Halp-aLTR-1 are located 8,500–10,000 bp from each other, two arrays of ≈ 50 copies were observed on DNA segments of approximately 5,000 bp (Supplementary S2). The existence of minor satDNA families in the genome has long been known [50]. In 1976, Salser et al. [51] suggested the “library hypothesis”, which proposes that in each species, certain members of a common satDNA library may be amplified and appear as a major satellite, while other satellite sequences are present at low, undetectable levels. Under this hypothesis, the rapid evolutionary changes undergone by satellite DNAs would, for the most part, be the quantitative amplification of one of the satellites already present at a low level in the “library” rather than a de novo appearance, as in Southern’s [52] model. Although this hypothesis does not address several important questions (e.g., how novel satellites emerge), it assumes the existence of minor satDNA families in the genomic background and their possible amplification over time. Apparently, we detected the origin of a low-copy-number satDNA family, which, under certain conditions (for example, speciation-related repeatome purification [53, 54]), may become dominant (whole-scale replacement of the repeat type [55]).
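The difference between the LTR-type and the linear-type arrangement can be expressed as a simple rule on monomer coordinates within a single read. The sketch below is an illustration only: the function names, the 100 bp monomer length and the 50 bp gap tolerance are arbitrary assumptions, and the copy threshold borrows the >10 monomers per array criterion for a minor satDNA family [50]. It is not the procedure used in the study.

```python
from typing import List


def group_into_arrays(starts: List[int], monomer_len: int, max_gap: int) -> List[List[int]]:
    """Group monomer start positions into tandem arrays.

    A monomer joins the current array when the gap between the end of the
    previous monomer and its own start is at most max_gap.
    """
    arrays: List[List[int]] = []
    for pos in sorted(starts):
        if arrays and pos - (arrays[-1][-1] + monomer_len) <= max_gap:
            arrays[-1].append(pos)
        else:
            arrays.append([pos])
    return arrays


def classify_read(starts: List[int], monomer_len: int,
                  max_gap: int = 50, min_copies: int = 10) -> str:
    """Label a read by the arrangement of its monomer hits."""
    arrays = group_into_arrays(starts, monomer_len, max_gap)
    if any(len(a) > min_copies for a in arrays):
        return "linear-type"   # at least one long tandem array
    if len(arrays) >= 2:
        return "LTR-type"      # short arrays separated by long spacers
    return "unclassified"


# Two short arrays about 9 kb apart, then one uninterrupted run of 50 copies.
ltr_like = [0, 100, 200, 9200, 9300, 9400]
linear_like = [i * 100 for i in range(50)]
print(classify_read(ltr_like, monomer_len=100))
print(classify_read(linear_like, monomer_len=100))
```

With these toy coordinates the first read is labeled LTR-type and the second linear-type. Real ONT reads would need tolerant thresholds, because sequencing errors and diverged monomers interrupt arrays.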
Based on the obtained results, we propose a simple scheme showing the possible pathways for satDNA family formation in pericentromeres (Fig. 4). After a wave of TE insertions, fragmentation and disintegration of the elements occur [56]. At the insertion stage, owing to the nesting phenomenon [57], a cloud of CD fragments is formed, which can act as a source for novel satDNA families. This process has been described for the genomes of several plant species [45, 47, 48], and the main pericentromeric satDNA family of Hieracium (HintCl-18) probably also follows this pathway (as a derivative of the Angela GAG CD). Another possible pathway is LTR-based. We detected an “empty” element (“shell elements” in the scheme), wherein the LTRs are still in their positions, approximately maintaining the length of the original element, but the internal CDs are already undetectable. On the basis of such shell elements (with Halp-aLTR-1 denoting LTRs) or solo LTRs that result from unequal homologous recombination between the two LTRs of a single element [8], linearization of the LTR fragment may occur (Fig. 4; Supplementary S2). Similar LTR-based satDNA formation has been described in the Zea mays genome [46]. Along with the relatively rapid decay of TEs, slow degradation also occurs (aging elements), in which CDs successively disappear and the length of the element decreases, as evidenced by the large number of truncated elements in the pericentromeric regions of H. alpinum chromosomes (Figs. 3E, F and 4). We have already noted a decrease in TE length upon the elimination of a single CD [58]. When we considered Halp-aLTR-1 and Halp-aLTR-2 as traces of successive waves of Tekay insertions, the distance between the two LTRs (where the CDs are located) was almost half as long in the more ancient Halp-aLTR-2 (4,415 bp versus 7,990 bp for Halp-aLTR-1). It is quite possible that arrays of long tandem repeats recently discovered in cereals are also formed similarly by aged elements [59]. This transformation may ultimately lead to the appearance of a block-type structure of the satDNA array, especially given the constant microinversions in pericentromeric heterochromatin.
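The element states named in the scheme (intact, aged, shell, solo LTR) can be told apart with a few rules on annotation output. The following is a minimal sketch under stated assumptions: the `ElementCall` record, the function name, the 20% length tolerance and the domain list (GAG, PROT, RT, RH, INT, the usual Ty3-gypsy order) are illustrative choices, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ElementCall:
    n_ltrs: int                 # LTR copies found for this element (0, 1 or 2)
    inner_len: int              # bp between the two LTRs (0 if fewer than two)
    cds_found: Tuple[str, ...]  # conserved domains detected between the LTRs


def classify_element(call: ElementCall, expected_cds: Tuple[str, ...],
                     ref_inner_len: int, len_tolerance: float = 0.2) -> str:
    """Assign a decay state to one annotated element (hypothetical rules)."""
    if call.n_ltrs == 1 and not call.cds_found:
        return "solo LTR"
    if call.n_ltrs < 2:
        return "fragment"
    missing = [cd for cd in expected_cds if cd not in call.cds_found]
    length_kept = abs(call.inner_len - ref_inner_len) <= len_tolerance * ref_inner_len
    if not call.cds_found:
        # LTRs in place and length roughly preserved, but no CDs left: shell.
        return "shell element" if length_kept else "aged element"
    if missing:
        return "aged element"   # CDs are being lost one by one
    return "intact element"


# Illustrative calls; 7990 and 4415 bp echo the inter-LTR distances in the text.
gypsy_cds = ("GAG", "PROT", "RT", "RH", "INT")
print(classify_element(ElementCall(2, 7800, ()), gypsy_cds, ref_inner_len=7990))
print(classify_element(ElementCall(2, 4415, ("RT",)), gypsy_cds, ref_inner_len=7990))
```

Treating an element without CDs and with a shortened interior as "aged" rather than "shell" is a design choice of this sketch; the boundary between the two states is not quantified in the text.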
The next salient point derived from our data concerns LTR turnover. Comparison of the LTRs of the reconstructed Tekay elements of different ages in the H. alpinum genome with those of the same element from M. obovata revealed significant structural and sequence differences (Fig. 3I; Supplementary S6). This finding raises questions regarding the tempo of LTR evolution, and it seems that the transformation has occurred faster than previously thought [60]. Nevertheless, Ty3-gypsy-specific conserved motifs [33] were found in all the analyzed LTRs (Supplementary S6), which made them comparable. Consequently, a biologically reasonable dendrogram was constructed in which the steps of divergence between the Tekay LTRs of H. alpinum and M. obovata can be traced through two ancient LTRs: the more recent LTRs that contain the Halp-aLTR-1 monomer, which are closer to the Tekay LTRs of H. alpinum, and the more ancient LTRs that contain the Halp-aLTR-2 monomer, which are closer to the Tekay LTRs of M. obovata (Fig. 3J).
We suggest the following putative scenario for H. alpinum pericentromere formation. During the divergence from the ancestral species, there were at least four waves of TE insertions (similar to the invasion of ATHILA transposons into the Arabidopsis thaliana centromere repeat arrays [25]). The earliest wave came from the Angela element (Ty1-copia), followed by three subsequent waves from the Tekay element (Ty3-gypsy). The dominant satDNA monomer (HintCl-18) arose from the first detected wave of TE insertions and became dominant, apparently due to subsequent TE insertions. A burst of TEs may cause a replacement of the dominant satDNA family, as we observed in other Asteraceae, particularly in South African species of the genus Pteronia [54], and in cereals [53] (the whole-scale replacement of a repeat type). We can hypothesize that a similar process occurred during the evolution of Hieraciinae, since HintCl-18 was not detected in the younger, closely related genus Pilosella [15]. Moreover, several Hieracium species, such as the triploid H. telekianum [16] or the diploid H. sparsum [data in preparation], have several chromosomes in their sets that lack the centromeric HintCl-18 FISH signal (similar to Phaseolus vulgaris, where two distinct sets of centromere sequences coexist on different chromosomes of the same genome [61]). Notably, of the five statistically well-supported major lineages of Cichorieae, lineages 4 and 5 (the latter including Hieraciinae) comprise more than 80% of the species, indicating that repeated rapid radiation and diversification must have occurred in several evolutionary stages [62,63,64], which may be linked to the TE insertion activity that we detected.
Conclusions
- Species-specific pericentromeric arrays of the H. alpinum genome are heterogeneous, exhibiting both linear-type and block-type structures. It is difficult to say which of them was primary. The block-type structure could be a degraded remnant of the TEs from which the linear structure arises, or alternatively, it may arise from multiple microinversions of the linear structure, or both.
- High amounts of forward and reverse arrays of the main satDNA monomer may indicate multiple microinversions in pericentromeric heterochromatin. Microinversions seem to be the main mechanism of rapid structural evolution that maintains nucleotide composition. Microinversions are stochastic and, consequently, may cause the uniqueness of an individual pericentromere structure.
- The traces of TE insertion waves remain in the pericentromeres for a long time. As recombinational cold spots, pericentromeres “keep memories” of past genomic events.
- In pericentromeres, TE particles can be transformed into satDNA by known mechanisms (amplification of structural parts, their fragmentation, or degradation of the whole element, etc.). However, due to the relative “closedness” of the pericentromere [65], pericentromere-specific satDNAs do not spread to the remaining genome.
- Newly formed satDNAs constitute a background pool of minor families that, under certain conditions, can replace the dominant one(s).
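The forward and reverse arrays mentioned in the conclusions can be located in a read by following the strand of consecutive monomer hits. The sketch below assumes that hits are available as hypothetical (start, strand) tuples from any similarity search; the function names are illustrative. A strand switch is only consistent with a microinversion breakpoint and does not prove one.

```python
from itertools import groupby
from typing import List, Tuple

Hit = Tuple[int, str]  # (start position in the read, strand "+" or "-")


def strand_runs(hits: List[Hit]) -> List[Tuple[str, int]]:
    """Collapse position-sorted hits into runs of identical strand.

    Returns (strand, number_of_monomers) for each consecutive run.
    """
    ordered = sorted(hits)  # tuples sort by start position first
    return [(strand, len(list(group)))
            for strand, group in groupby(ordered, key=lambda h: h[1])]


def count_strand_switches(hits: List[Hit]) -> int:
    """Number of forward/reverse transitions along the read."""
    return max(len(strand_runs(hits)) - 1, 0)


# Toy read: a forward array, a reversed stretch, then forward again.
toy_hits = [(0, "+"), (100, "+"), (200, "-"), (300, "-"), (400, "+")]
print(strand_runs(toy_hits))            # [('+', 2), ('-', 2), ('+', 1)]
print(count_strand_switches(toy_hits))  # 2
```

Counting switches per read gives a rough, comparable measure of how often arrays change orientation, which is the pattern interpreted above as a sign of microinversions.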
Data availability
All generated data are included in this published paper and the Supplementary Information. The ONT dataset analyzed during the current study is available in the public Zenodo repository, DOI: 10.5281/zenodo.10952637; https://zenodo.org/records/10952638.
Abbreviations
- CDs: Conserved protein domains
- DANTE: Domain based ANnotation of Transposable Elements
- FISH: Fluorescent in situ hybridization
- LTRs: Long terminal repeats
- ONT: Oxford Nanopore Technology
- satDNA: Satellite DNA
- TEs: Transposable elements
- TRF: Tandem Repeats Finder
References
Jiang J, Birchler JA, Parrott WA, Dawe RK. A molecular view of plant centromeres. Trends Plant Sci. 2003;8(12):570–5.
Oliveira LC, Torres GA. Plant centromeres: genetics, epigenetics and evolution. Mol Biol Rep. 2018;45(5):1491–7.
Naish M, Henderson IR. The structure, function, and evolution of plant centromeres. Genome Res. 2024;34(2):161–78.
Plohl M, Mestrovic N, Mravinac B. Centromere identity from the DNA point of view. Chromosoma. 2014;123(4):313–25.
Henikoff S, Ahmad K, Malik HS. The centromere paradox: stable inheritance with rapidly evolving DNA. Science. 2001;293(5532):1098–102.
Thakur J, Packiaraj J, Henikoff S. Sequence, chromatin and evolution of satellite DNA. Int J Mol Sci. 2021;22(9).
Fukagawa T. Speciation mediated by centromeres. Dev Cell. 2013;27(4):367–8.
Ma J, Wing RA, Bennetzen JL, Jackson SA. Plant centromere organization: a dynamic structure with conserved functions. Trends Genet. 2007;23(3):134–9.
Zahn KH. Compositae – Hieracium. In: Das Pflanzenreich. Edited by A. E, vol. IV. Leipzig: W. Engelmann; 1921–1923: 280.
Majeský Ľ, Krahulec F, Vašut RJ. How apomictic taxa are treated in current taxonomy: a review. Taxon. 2017;66(5):1017–40.
Mraz P, Zdvorak P. Reproductive pathways in Hieracium s.s. (Asteraceae): strict sexuality in diploids and apomixis in polyploids. Ann Bot. 2019;123(2):391–403.
Chrtek jun J, Mráz P, Zahradníček J, Mateo G, Szelag Z. Chromosome numbers and DNA ploidy levels of selected species of Hieracium s.str. (Asteraceae). Folia Geobotanica. 2007;42(4):411–30.
Fehrer J, Slavikova R, Pastova L, Josefiova J, Mraz P, Chrtek J, Bertrand YJK. Molecular evolution and Organization of Ribosomal DNA in the Hawkweed Tribe Hieraciinae (Cichorieae, Asteraceae). Front Plant Sci. 2021;12:647375.
Fehrer J, Krak K, Chrtek J Jr. Intra-individual polymorphism in diploid and apomictic polyploid hawkweeds (Hieracium, Lactuceae, Asteraceae): disentangling phylogenetic signal, reticulation, and noise. BMC Evol Biol. 2009;9:239.
Belyayev A, Paštová L, Fehrer J, Josefiová J, Chrtek J, Mráz P. Mapping of Hieracium (Asteraceae) chromosomes with genus-specific satDNA elements derived from next-generation sequencing data. Plant Systematics and Evolution; 2017.
Mráz P, Filipaş L, Bărbos MI, Kadlecová J, Paštová L, Belyayev A, Fehrer J. An unexpected new diploid Hieracium from Europe: integrative taxonomic approach with a phylogeny of diploid Hieracium taxa. Taxon. 2019;68(6):1258–77.
Zagorski D, Hartmann M, Bertrand YJK, Pastova L, Slavikova R, Josefiova J, Fehrer J. Characterization and Dynamics of Repeatomes in closely related species of Hieracium (Asteraceae) and their synthetic and apomictic hybrids. Front Plant Sci. 2020;11:591053.
Lin T, Xu X, Ruan J, Liu S, Wu S, Shao X, Wang X, Gan L, Qin B, Yang Y, et al. Genome analysis of Taraxacum kok-saghyz Rodin provides new insights into rubber biosynthesis. Natl Sci Rev. 2018;5(1):78–87.
Zhang B, Wang Z, Han X, Liu X, Wang Q, Zhang J, Zhao H, Tang J, Luo K, Zhai Z, et al. The chromosome-scale assembly of endive (Cichorium endivia) genome provides insights into the sesquiterpenoid biosynthesis. Genomics. 2022;114(4):110400.
Reyes-Chin-Wo S, Wang Z, Yang X, Kozik A, Arikit S, Song C, Xia L, Froenicke L, Lavelle DO, Truco MJ, et al. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat Commun. 2017;8:14953.
Lin T, Xu X, Du H, Fan X, Chen Q, Hai C, Zhou Z, Su X, Kou L, Gao Q, et al. Extensive sequence divergence between the reference genomes of Taraxacum kok-saghyz and Taraxacum mongolicum. Sci China Life Sci. 2022;65(3):515–28.
Xiong W, van Workum DM, Berke L, Bakker LV, Schijlen E, Becker FFM, van de Geest H, Peters S, Michelmore R, van Treuren R et al. Genome assembly and analysis of Lactuca virosa: implications for lettuce breeding. G3 (Bethesda) 2023;13(11).
Wang K, Jin J, Wang J, Wang X, Sun J, Meng D, Wang X, Wang Y, Guo L. The complete telomere-to-telomere genome assembly of lettuce. Plant Commun 2024:101011.
Chrtek jun J. Taxonomy of the Hieracium alpinum group in the Sudeten Mts., the West and the Ukrainian East Carpathians. Folia Geobotanica. 1997;32(1):69–97.
Wlodzimierz P, Rabanal FA, Burns R, Naish M, Primetis E, Scott A, Mandakova T, Gorringe N, Tock AJ, Holland D, et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature. 2023;618(7965):557–65.
Novak P, Neumann P, Pech J, Steinhaisl J, Macas J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics. 2013;29(6):792–3.
Mráz P, Chrtek J, Šingliarová B. Geographical parthenogenesis, genome size variation and pollen production in the arctic-alpine species Hieracium alpinum. Bot Helv. 2009;119(1):41–51.
Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–9.
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–80.
Noe L, Kucherov G. YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res. 2005;33(Web Server issue):W540–3. https://doi.org/10.1093/nar/gki478.
Sujiwattanarat P, Thapana W, Srikulnath K, Hirai Y, Hirai H, Koga A. Higher-order repeat structure in alpha satellite DNA occurs in New World monkeys and is not confined to hominoids. Sci Rep. 2015;5:10315.
Staton SE, Burke JM. Evolutionary transitions in the Asteraceae coincide with marked shifts in transposable element abundance. BMC Genomics. 2015;16(1):623.
Llorens C, Futami R, Covelli L, Dominguez-Escriba L, Viu JM, Tamarit D, Aguilar-Rodriguez J, Vicente-Ripolles M, Fuster G, Bernet GP, et al. The Gypsy database (GyDB) of mobile genetic elements: release 2.0. Nucleic Acids Res. 2011;39(Database issue):D70–74.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
Madeira F, Pearce M, Tivey ARN, Basutkar P, Lee J, Edbali O, Madhusoodanan N, Kolesnikov A, Lopez R. Search and sequence analysis tools services from EMBL-EBI in 2022. Nucleic Acids Res. 2022;50(W1):W276–9. https://doi.org/10.1093/nar/gkac240.
Chaisson MJ, Raphael BJ, Pevzner PA. Microinversions in mammalian evolution. Proc Natl Acad Sci U S A. 2006;103(52):19824–9.
Corbett-Detig RB, Said I, Calzetta M, Genetti M, McBroome J, Maurer NW, Petrarca V, Della Torre A, Besansky NJ. Fine-mapping complex inversion breakpoints and investigating somatic pairing in the Anopheles gambiae species Complex using proximity-ligation sequencing. Genetics. 2019;213(4):1495–511.
Huang K, Rieseberg LH. Frequency, origins, and evolutionary role of chromosomal inversions in plants. Front Plant Sci. 2020;11:296.
Ahmed HI, Heuberger M, Schoen A, Koo DH, Quiroz-Chavez J, Adhikari L, Raupp J, Cauet S, Rodde N, Cravero C, et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature. 2023;620(7975):830–8.
Braun EL, Kimball RT, Han KL, Iuhasz-Velez NR, Bonilla AJ, Chojnowski JL, Smith JV, Bowie RC, Braun MJ, Hackett SJ, et al. Homoplastic microinversions and the avian tree of life. BMC Evol Biol. 2011;11:141.
Fan W, Liu F, Jia Q, Du H, Chen W, Ruan J, Lei J, Li DZ, Mower JP, Zhu A. Fragaria mitogenomes evolve rapidly in structure but slowly in sequence and incur frequent multinucleotide mutations mediated by microinversions. New Phytol. 2022;236(2):745–59.
Walker CR, Scally A, De Maio N, Goldman N. Short-range template switching in great ape genomes explored using pair hidden Markov models. PLoS Genet. 2021;17(3):e1009221.
Potapova NA, Kondrashov AS, Mirkin SM. Characteristics and possible mechanisms of formation of microinversions distinguishing human and chimpanzee genomes. Sci Rep. 2022;12(1):591.
Gong Z, Wu Y, Koblizkova A, Torres GA, Wang K, Iovene M, Neumann P, Zhang W, Novak P, Buell CR, et al. Repeatless and repeat-based centromeres in potato: implications for centromere evolution. Plant Cell. 2012;24(9):3559–74.
Kapitonov VV, Jurka J. Molecular paleontology of transposable elements from Arabidopsis thaliana. Genetica. 1999;107(1–3):27–37.
Sharma A, Wolfgruber TK, Presting GG. Tandem repeats derived from centromeric retrotransposons. BMC Genomics. 2013;14:142.
Meštrović N, Mravinac B, Pavlek M, Vojvoda-Zeljko T, Šatović E, Plohl M. Structural and functional liaisons between transposable elements and satellite DNAs. Chromosome Res. 2015;23(3):583–96.
Belyayev A, Josefiova J, Jandova M, Mahelka V, Krak K, Mandak B. Transposons and satellite DNA: on the origin of the major satellite DNA family in the Chenopodium genome. Mob DNA. 2020;11:20.
Talbert PB, Henikoff S. What makes a centromere? Exp Cell Res. 2020;389(2):111895.
Satovic-Vuksic E, Plohl M. Satellite DNAs-From localized to highly dispersed Genome Components. Genes (Basel) 2023;14(3).
Salser W, Bowen S, Browne D, el-Adli F, Fedoroff N, Fry K, Heindell H, Paddock G, Poon R, Wallace B, et al. Investigation of the organization of mammalian chromosomes at the DNA sequence level. Fed Proc. 1976;35(1):23–35.
Southern EM. Base sequence and evolution of Guinea-pig α-Satellite DNA. Nature. 1970;227(5260):794–8.
Belyayev A. Bursts of transposable elements as an evolutionary driving force. J Evol Biol. 2014;27(12):2573–84.
Chumová Z, Belyayev A, Mandáková T, Zeisek V, Hodková E, Šemberová K, Euston-Brown D, Trávníček P. The relationship between transposable elements and ecological niches in the Greater Cape Floristic Region: a study on the genus Pteronia (Asteraceae). Frontiers in Plant Science 2022;13.
Comai L, Maheshwari S, Marimuthu MPA. Plant centromeres. Curr Opin Plant Biol. 2017;36:158–67.
Bennetzen JL, Wang H. The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu Rev Plant Biol. 2014;65:505–30.
SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, et al. Nested retrotransposons in the intergenic regions of the maize genome. Science. 1996;274(5288):765–8.
Belyayev A, Josefiova J, Jandova M, Kalendar R, Mahelka V, Mandak B, Krak K. The structural diversity of CACTA transposons in genomes of Chenopodium (Amaranthaceae, Caryophyllales) species: specific traits and comparison with the similar elements of angiosperms. Mob DNA. 2022;13(1):8.
Kapustova V, Tulpova Z, Toegelova H, Novak P, Macas J, Karafiatova M, Hribova E, Dolezel J, Simkova H. The Dark Matter of large cereal genomes: long Tandem repeats. Int J Mol Sci 2019;20(10).
SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20(1):43–5.
Iwata A, Tek AL, Richard MM, Abernathy B, Fonseca A, Schmutz J, Chen NW, Thareau V, Magdelenat G, Li Y, et al. Identification and characterization of functional centromeres of the common bean. Plant J. 2013;76(1):47–60.
Kilian N, Gemeinholzer B, Lack HW. Cichorieae. In: Systematics, evolution and biogeography of Compositae. Edited by Funk VA, Susanna A, Stuessy TE, Bayer RJ. Vienna, Austria: IAPT; 2009: 343–383.
Panero JL, Crozier BS. Macroevolutionary dynamics in the early diversification of Asteraceae. Mol Phylogenet Evol. 2016;99:116–32.
Mandel JR, Dikow RB, Siniscalchi CM, Thapa R, Watson LE, Funk VA. A fully resolved backbone phylogeny reveals numerous dispersals and explosive diversifications throughout the history of Asteraceae. Proc Natl Acad Sci U S A. 2019;116(28):14083–8.
Kalitsis P, Choo KH. The evolutionary life cycle of the resilient centromere. Chromosoma. 2012;121(4):327–40.
Funding
This work was supported by the Czech Science Foundation (GACR-22-16651S) to JC and PM and the long-term research development project RVO 67985939 to AB.
Author information
Contributions
AB conceived the idea for the study. PM and JC collected the plant material. AB, BQ, BT, ZS and JJ performed or supervised the wet laboratory work. AB, SC, SL and YB contributed datasets and analyzed the data. AB wrote the manuscript and Supporting Information with input from PM, JC, BQ, YB, SC and EZ.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
S1. Examples of HintCl-18 linear array, linear arrays with microinversions and block array
S2. Examples of Halp-aLTR-1 and Halp-aLTR-2 arrays (Fig. 3E, F)
S3. Reconstruction of the intact LTR retrotransposons of Hieracium alpinum genome
S4. Examples of interrelations between TEs and satDNA in pericentromeric regions
S5. Reconstruction of the ancient Tekay retrotransposons
S6. Comparison of LTR sequences of Tekay elements
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Belyayev, A., de la Peña, B.Q., Corrales, S.V. et al. Analysis of pericentromere composition and structure elucidated the history of the Hieracium alpinum L. genome, revealing waves of transposable elements insertions. Mobile DNA 15, 26 (2024). https://doi.org/10.1186/s13100-024-00336-7
DOI: https://doi.org/10.1186/s13100-024-00336-7