Fig. 2 | GigaScience

Fig. 2

From: LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads

LINKS algorithm. Contigs (three thick black rectangles) are optionally shredded into k-mers, and those k-mers are used to construct a Bloom filter (green arrows). Long reads (blue rectangles) are processed and k-mer pairs i’ and i” are extracted at an interval corresponding to the input distance (-d) and window step (-t), but a pair is stored in memory (step 1, matrix on the right) only if both of its k-mers (dark blue arrows, green checkmarks) are found in the Bloom filter. k-mers that are not in the Bloom filter are represented by light blue arrows (red checkmark). Contigs are shredded into k-mers once more (step 2) using the same k value, but a k-mer is stored in memory (step 2, matrix on the right) only when its pair, identified in step 1, exists in memory. In step 3, contigs are paired when the two k-mers of a pair are not observed in the same sequence. Iterating through the data structure from step 1 (circular arrows) and verifying placement (Start, End) and multiplicity (Multi) from the data structure in step 2 provides contig linkages (dotted arrows), which are stored in memory (step 3, matrix on the right). In step 4, the scaffold layout is produced by incorporating all contigs into a scaffold, verifying neighbours and merging only when user-defined parameters support it (≥ l, the minimum number of links, and ≤ a, the maximum ratio of alternate-to-primary linkages). In the final layout, the positive symbols following the contig numbers indicate that the orientation of those contigs is consistent and unchanged, whereas the negative symbol indicates that contig 3 is on the reverse strand relative to contigs 1 and 2. In this example, contigs 1, 2 and 3 are merged into a single scaffold, with average gap/overlap sizes between contigs calculated using the distance (d) between the k-mers and their positions (p) in the respective contigs of length L, such that the gap (positive value) or overlap (negative value) length = d - ((L1 - p1) + (p2 + k)).
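
The pair-extraction step and the gap/overlap formula in the caption can be illustrated with a minimal Python sketch. This is not the LINKS implementation: a plain Python set stands in for the Bloom filter, reverse complements are ignored, and the function names (`shred`, `extract_kmer_pairs`, `gap_or_overlap`) are hypothetical. It also assumes that d is measured from the start of the first k-mer to the end of the second, which is the reading under which the caption's formula gives the true gap.

```python
def shred(seq, k):
    """Return the set of all k-mers in seq (a set stands in for the Bloom filter)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extract_kmer_pairs(read, k, d, t, known_kmers):
    """Collect k-mer pairs spanning distance d, sliding by step t.

    Assumption: d runs from the start of the first k-mer to the end of
    the second. A pair is kept only if both k-mers are in known_kmers.
    """
    pairs = []
    for start in range(0, len(read) - d + 1, t):
        first = read[start:start + k]
        second = read[start + d - k:start + d]
        if first in known_kmers and second in known_kmers:
            pairs.append((first, second))
    return pairs


def gap_or_overlap(d, L1, p1, p2, k):
    """Caption formula: positive result is a gap, negative is an overlap."""
    return d - ((L1 - p1) + (p2 + k))


# Toy data (made up for illustration): two contigs joined by a 2-base gap.
contig1 = "AAACCCGGGT"
contig2 = "TTGACGTCAA"
read = contig1 + "GA" + contig2
k, d, t = 4, 16, 2

known = shred(contig1, k) | shred(contig2, k)
pairs = extract_kmer_pairs(read, k, d, t, known)
gaps = [
    gap_or_overlap(d, len(contig1), contig1.find(a), contig2.find(b), k)
    for a, b in pairs
]
print(gaps)
```

In this toy case every retained pair yields a gap of 2, matching the two bases inserted between the contigs. With real long reads the estimates vary between pairs because of sequencing errors, which is why the caption refers to average gap/overlap sizes.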
