Table 2 Identity-by-state (Hamming distance) and complete linkage clustering times (sec)

From: Second-generation PLINK: rising to the challenge of larger and richer datasets

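Calculation       Dataset  Machine      PLINK 1.07  PLINK 1.90  Ratio
IBS matrix only   synth1p  Mac-2        2233.6      1.9         1.2 k
IBS matrix only   synth1p  Mac-12       1320.4      1.2         1.1 k
IBS matrix only   synth1p  Linux32-8    1937.2      2.8         690
IBS matrix only   synth1p  Linux64-512  1492        3.7         400
IBS matrix only   synth1p  Win32-2      3219.0      7.2         450
IBS matrix only   synth1p  Win64-2      2674.4      1.5         1.8 k
IBS matrix only   synth2p  Mac-2        190 k       118.8       1.6 k
IBS matrix only   synth2p  Mac-12       99 k        23.5        4.2 k
IBS matrix only   synth2p  Linux32-8    152.5 k     214.3       710
IBS matrix only   synth2p  Linux64-512  98 k        25.3        3.9 k
IBS matrix only   synth2p  Win32-2      270 k       654.5       410
IBS matrix only   synth2p  Win64-2      200 k       104.6       1.9 k
IBS matrix only   chr1snp  Mac-2        26 k        17.5        1.5 k
IBS matrix only   chr1snp  Mac-12       13.4 k      12.6        1.06 k
IBS matrix only   chr1snp  Linux32-8    18.4 k      30.9        600
IBS matrix only   chr1snp  Linux64-512  14 k        43.1        320
IBS matrix only   chr1snp  Win32-2      32.7 k      95.9        341
IBS matrix only   chr1snp  Win64-2      26 k        15.3        1.7 k
Basic clustering  synth1p  Mac-2        2315.7      2.7         860
Basic clustering  synth1p  Mac-12       1317.9      2.0         660
Basic clustering  synth1p  Linux32-8    1898.7      4.1         460
Basic clustering  synth1p  Linux64-512  1496        4.5         330
Basic clustering  synth1p  Win32-2      3301.7      9.1         360
Basic clustering  synth1p  Win64-2      2724.5      1.9         1.4 k
Basic clustering  synth2p  Mac-2        230 k       245.6       940
Basic clustering  synth2p  Mac-12       140 k       123.9       1.1 k
Basic clustering  synth2p  Linux32-8    197.1 k     395.6       498
Basic clustering  synth2p  Linux64-512  125 k       143.3       872
Basic clustering  synth2p  Win32-2      440 k       976.7       450
Basic clustering  synth2p  Win64-2      270 k       127.9       2.1 k
Basic clustering  chr1snp  Mac-2        26 k        18.4        1.4 k
Basic clustering  chr1snp  Mac-12       13.6 k      13.5        1.01 k
Basic clustering  chr1snp  Linux32-8    18.5 k      33.4        554
Basic clustering  chr1snp  Linux64-512  14 k        44.2        320
Basic clustering  chr1snp  Win32-2      33.2 k      95.0        349
Basic clustering  chr1snp  Win64-2      26 k        15.8        1.6 k
IBD report        synth1p  Mac-2        2230.1      12.4        180
IBD report        synth1p  Mac-12       1346.2      2.4         560
IBD report        synth1p  Linux32-8    2019.9      12.4        163
IBD report        synth1p  Linux64-512  1494        5.0         300
IBD report        synth1p  Win32-2      3446.3      42.2        81.7
IBD report        synth1p  Win64-2      2669.8      15.1        177
IBD report        synth2p  Mac-2        190 k       447.1       420
IBD report        synth2p  Mac-12       99 k        50.3        2.0 k
IBD report        synth2p  Linux32-8    161.4 k     618.7       261
IBD report        synth2p  Linux64-512  98 k        57.4        1.7 k
IBD report        synth2p  Win32-2      270 k       1801.1      150
IBD report        synth2p  Win64-2      200 k       541.0       370
IBD report        chr1snp  Mac-2        26 k        24.8        1.0 k
IBD report        chr1snp  Mac-12       13.4 k      14.6        918
IBD report        chr1snp  Linux32-8    18.5 k      53.5        346
IBD report        chr1snp  Linux64-512  14 k        46.5        300
IBD report        chr1snp  Win32-2      33.1 k      199.2       166
IBD report        chr1snp  Win64-2      26 k        25.1        1.0 k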
  1. Computation of the basic distance matrix is expensive, but has an “embarrassingly parallel” structure. Clustering requires an additional serial step, while the identity-by-descent report includes a pairwise population concordance test which does not benefit from bit-level parallelism, but speedups for both remain greater than 100x on 64-bit systems.
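
The note above mentions bit-level parallelism and the "embarrassingly parallel" structure of the distance matrix. The sketch below illustrates the general idea only and is not PLINK's code. It assumes a simplified encoding of one bit per site (PLINK stores genotypes in two bits each), and the helper names `pack_bits`, `hamming`, and `distance_matrix` are hypothetical. A Hamming distance between two bit-packed samples reduces to one XOR plus a count of set bits, and every pair of samples can be processed independently of the others.

```python
from itertools import combinations


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one int (bit i holds bits[i])."""
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word


def hamming(a, b):
    """Hamming distance of two packed words: XOR marks mismatches, then count set bits."""
    return bin(a ^ b).count("1")


def distance_matrix(samples):
    """Symmetric matrix of pairwise Hamming distances.

    Assumption: every sample has the same number of sites, one bit per site.
    Each (i, j) pair is computed without reference to any other pair, which
    is what makes this step easy to split across threads or machines.
    """
    packed = [pack_bits(s) for s in samples]
    n = len(packed)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = hamming(packed[i], packed[j])
        dist[i][j] = d
        dist[j][i] = d
    return dist


if __name__ == "__main__":
    toy = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
    for row in distance_matrix(toy):
        print(row)
```

Packing many sites into one machine word is what lets a single XOR compare all of them at once. The serial clustering step and the concordance test mentioned in the note have no such per-word shortcut, so they are not covered by this sketch.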