Table 2 Identity-by-state (Hamming distance) and complete linkage clustering times (sec)

From: Second-generation PLINK: rising to the challenge of larger and richer datasets

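| Calculation | Dataset | Machine | PLINK 1.07 | PLINK 1.90 | Ratio |
|---|---|---|---|---|---|
| IBS matrix only | synth1p | Mac-2 | 2233.6 | 1.9 | 1.2 k |
| | | Mac-12 | 1320.4 | 1.2 | 1.1 k |
| | | Linux32-8 | 1937.2 | 2.8 | 690 |
| | | Linux64-512 | 1492 | 3.7 | 400 |
| | | Win32-2 | 3219.0 | 7.2 | 450 |
| | | Win64-2 | 2674.4 | 1.5 | 1.8 k |
| | synth2p | Mac-2 | 190 k | 118.8 | 1.6 k |
| | | Mac-12 | 99 k | 23.5 | 4.2 k |
| | | Linux32-8 | 152.5 k | 214.3 | 710 |
| | | Linux64-512 | 98 k | 25.3 | 3.9 k |
| | | Win32-2 | 270 k | 654.5 | 410 |
| | | Win64-2 | 200 k | 104.6 | 1.9 k |
| | chr1snp | Mac-2 | 26 k | 17.5 | 1.5 k |
| | | Mac-12 | 13.4 k | 12.6 | 1.06 k |
| | | Linux32-8 | 18.4 k | 30.9 | 600 |
| | | Linux64-512 | 14 k | 43.1 | 320 |
| | | Win32-2 | 32.7 k | 95.9 | 341 |
| | | Win64-2 | 26 k | 15.3 | 1.7 k |
| Basic clustering | synth1p | Mac-2 | 2315.7 | 2.7 | 860 |
| | | Mac-12 | 1317.9 | 2.0 | 660 |
| | | Linux32-8 | 1898.7 | 4.1 | 460 |
| | | Linux64-512 | 1496 | 4.5 | 330 |
| | | Win32-2 | 3301.7 | 9.1 | 360 |
| | | Win64-2 | 2724.5 | 1.9 | 1.4 k |
| | synth2p | Mac-2 | 230 k | 245.6 | 940 |
| | | Mac-12 | 140 k | 123.9 | 1.1 k |
| | | Linux32-8 | 197.1 k | 395.6 | 498 |
| | | Linux64-512 | 125 k | 143.3 | 872 |
| | | Win32-2 | 440 k | 976.7 | 450 |
| | | Win64-2 | 270 k | 127.9 | 2.1 k |
| | chr1snp | Mac-2 | 26 k | 18.4 | 1.4 k |
| | | Mac-12 | 13.6 k | 13.5 | 1.01 k |
| | | Linux32-8 | 18.5 k | 33.4 | 554 |
| | | Linux64-512 | 14 k | 44.2 | 320 |
| | | Win32-2 | 33.2 k | 95.0 | 349 |
| | | Win64-2 | 26 k | 15.8 | 1.6 k |
| IBD report | synth1p | Mac-2 | 2230.1 | 12.4 | 180 |
| | | Mac-12 | 1346.2 | 2.4 | 560 |
| | | Linux32-8 | 2019.9 | 12.4 | 163 |
| | | Linux64-512 | 1494 | 5.0 | 300 |
| | | Win32-2 | 3446.3 | 42.2 | 81.7 |
| | | Win64-2 | 2669.8 | 15.1 | 177 |
| | synth2p | Mac-2 | 190 k | 447.1 | 420 |
| | | Mac-12 | 99 k | 50.3 | 2.0 k |
| | | Linux32-8 | 161.4 k | 618.7 | 261 |
| | | Linux64-512 | 98 k | 57.4 | 1.7 k |
| | | Win32-2 | 270 k | 1801.1 | 150 |
| | | Win64-2 | 200 k | 541.0 | 370 |
| | chr1snp | Mac-2 | 26 k | 24.8 | 1.0 k |
| | | Mac-12 | 13.4 k | 14.6 | 918 |
| | | Linux32-8 | 18.5 k | 53.5 | 346 |
| | | Linux64-512 | 14 k | 46.5 | 300 |
| | | Win32-2 | 33.1 k | 199.2 | 166 |
| | | Win64-2 | 26 k | 25.1 | 1.0 k |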

  1. Computation of the basic distance matrix is expensive, but has an “embarrassingly parallel” structure. Clustering requires an additional serial step, while the identity-by-descent report includes a pairwise population concordance test which does not benefit from bit-level parallelism, but speedups for both remain greater than 100x on 64-bit systems.
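
The bit-level parallelism and pairwise independence mentioned in the note can be illustrated with a minimal Python sketch. It is not PLINK's implementation: it assumes one bit per binary call (real genotype data is encoded differently and must handle missing values), and the names `pack_bits`, `hamming`, and `distance_matrix` are hypothetical helpers introduced here for illustration only.

```python
from itertools import combinations


def pack_bits(calls):
    """Pack a sequence of 0/1 calls into one integer (bit i holds calls[i])."""
    word = 0
    for i, call in enumerate(calls):
        if call:
            word |= 1 << i
    return word


def hamming(a, b):
    """Count differing positions with a single XOR followed by a bit count."""
    return bin(a ^ b).count("1")


def distance_matrix(samples):
    """Return the symmetric matrix of pairwise Hamming distances."""
    packed = [pack_bits(s) for s in samples]
    n = len(packed)
    dist = [[0] * n for _ in range(n)]
    # Each (i, j) pair depends only on samples i and j, so the pairs could be
    # divided among threads or processes without any coordination.
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = hamming(packed[i], packed[j])
    return dist


# Toy data (assumed for illustration): three samples, eight binary calls each.
samples = [
    [0, 1, 1, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 1, 0],
]
dist = distance_matrix(samples)
```

The XOR compares every packed position at once, which is the sense in which the distance computation benefits from bit-level parallelism. A clustering pass that repeatedly merges the closest groups would have to consume `dist` one merge at a time, which is one way to picture the additional serial step.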