Fig. 1
From: Identification of transposable element families from pangenome polymorphisms

Obtaining a transposable element library from a pangenome. a A 20 kb section of a pangenome of chromosome 2R built from seven high-quality genomes of Drosophila melanogaster (only five are shown; figure generated with odgi). Gaps in the bands represent structural variants, i.e. insertions or deletions in some of the genomes relative to the others. These structural variants can also be visualised as loops or “bubbles” in the graph representation. Here, four structural variants, each thousands of bases long, arise from four different TE insertions. b Number of bases in a pangenome of two Drosophila melanogaster genomes (A1 and A2), split by whether the bases are fully aligned (shared) or do not align, and binned by the size of the insertion or mismatch. c Workflow of pantera. First, it selects from the GFA file the segments that are polymorphic and may therefore belong to a TE. To reduce the number of false positives, only segments for which there are at least two almost identical polymorphic sequences are retained (clustering within narrow size bands). Then, a less stringent clustering is performed to reduce redundancy and generate the final TE library, which can be classified with any existing tool. d Annotations of the A1 Drosophila melanogaster genome obtained with RepeatMasker using three different libraries. Green: curated reference library. Pink: pantera de novo library. Grey: RepeatModeler de novo library. e Example of an LTR element (Blood) for which pantera correctly identified the full element, including its LTR components (f), which in this case are not fully reported by RepeatModeler, either as part of the full consensus (g) or as a solo LTR element.
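The first two steps of the workflow in panel c can be illustrated with a toy example. The following is a minimal sketch, not pantera's actual implementation: the function names, the parameters (`min_len`, `band_width`, `min_identity`) and the use of Python's `difflib` ratio as a stand-in for sequence identity are all assumptions made for illustration. It reads GFA v1 `S` (segment) and `P` (path) records, treats a segment as polymorphic if at least one path does not traverse it, and keeps only candidates that have a near-identical partner within the same size band. Reverse-complement matches and the final, less stringent clustering step are omitted.

```python
from difflib import SequenceMatcher


def parse_gfa(lines):
    """Return (segments, paths) from GFA v1 'S' and 'P' records."""
    segments, paths = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "P":
            # Each step is a segment name followed by an orientation sign.
            paths[fields[1]] = {step[:-1] for step in fields[2].split(",")}
    return segments, paths


def polymorphic_segments(segments, paths, min_len):
    """Segments of at least min_len bases that are absent from some path."""
    return {
        name: seq
        for name, seq in segments.items()
        if len(seq) >= min_len
        and any(name not in steps for steps in paths.values())
    }


def supported_candidates(candidates, band_width, min_identity):
    """Keep candidates with a near-identical partner in the same size band.

    Identity is approximated with difflib (hypothetical choice); a real
    tool would use a proper aligner and handle reverse complements.
    """
    bands = {}
    for name, seq in candidates.items():
        bands.setdefault(len(seq) // band_width, []).append(name)

    kept = set()
    for names in bands.values():
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                matcher = SequenceMatcher(
                    None, candidates[a], candidates[b], autojunk=False
                )
                if matcher.ratio() >= min_identity:
                    kept.update((a, b))
    return sorted(kept)
```

Calling `parse_gfa` on the lines of a GFA file, then passing the result through `polymorphic_segments` and `supported_candidates`, yields the names of segments that would proceed to the final clustering step. The pairwise comparison is quadratic in the number of candidates per band, so this is only suitable for small examples.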