| Algorithm | Time complexity | Space complexity | Running time (Drosophila) | Running time (human) |
|---|---|---|---|---|
| OPTIMA | \(O((m-c) \cdot \delta^{3} \cdot \#\text{seeds})\) | \(O((m-c)^{2} + c \cdot n)\) | **54 min** | **36 days** |
| Gentig v.2 (d) | \(O(\#\text{it} \cdot m \cdot \delta^{3} \cdot \#\text{hashes})\) | \(O(m^{2} + n + \lvert\text{HashTable}\rvert)\) | 1.32 h | 75 days |
| Gentig v.2 (tp) | | | 1.85 h | 174 days |
| SOMA v.2 (v) | \(O(m^{2} \cdot n^{2})\) | \(O(m \cdot n)\) | 1.28 years | 1,067 years |
| Likelihood (d+a) | \(O(m \cdot n \cdot \delta^{2})\) | \(O(m \cdot n)\) | 22.22 h | 2.72 years |
| Likelihood (d+a+t) | | | 19.62 h | 2.38 years |
| Likelihood (p+a+t) | | | 41.73 h | 5.53 years |
- Running times are estimated from 2,100 maps and extrapolated to the full datasets (82,000 Drosophila maps and 2.1 million human maps, for 100× coverage; single-core computation on Intel x86 64-bit Linux workstations with 16 GB RAM). The best running time in each column is shown in bold. Note that including the permutation-based statistical tests for SOMA and the likelihood method would increase their running times by a factor of more than 100. The complexity analysis refers to map-to-sequence glocal alignment per map, where n is the total length of the in silico maps (\(\sim\)500,000 fragments for the human genome) and m ≪ n is the length of the experimental map in fragments (17 fragments on average). #seeds, c (default of two) and δ are as defined in the “Methods” section, and #it (number of iterations), #hashes (geometric hashes found to match) and |HashTable| are as specified in [17, 24].
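
The extrapolation described in the note can be illustrated with a small sketch. It assumes that running time scales linearly with the number of maps, which is our reading of "extrapolated" and is not stated explicitly in the source. The function names are hypothetical, and the per-map figures are back-computed from the reported OPTIMA totals rather than taken from the original measurements.

```python
# Minimal sketch of linear runtime extrapolation (assumption: cost per map is
# roughly constant, so total time is proportional to the number of maps).

def extrapolate_runtime(sample_seconds: float, sample_maps: int, total_maps: int) -> float:
    """Scale a runtime measured on a sample of maps up to the full dataset."""
    if sample_maps <= 0:
        raise ValueError("sample_maps must be positive")
    return sample_seconds / sample_maps * total_maps


def implied_seconds_per_map(total_seconds: float, total_maps: int) -> float:
    """Back-compute the average per-map cost implied by a reported total."""
    if total_maps <= 0:
        raise ValueError("total_maps must be positive")
    return total_seconds / total_maps


# Dataset sizes quoted in the table note.
SAMPLE_MAPS = 2_100
DROSOPHILA_MAPS = 82_000
HUMAN_MAPS = 2_100_000

# Reported OPTIMA totals: 54 minutes (Drosophila) and 36 days (human).
optima_drosophila = implied_seconds_per_map(54 * 60, DROSOPHILA_MAPS)
optima_human = implied_seconds_per_map(36 * 24 * 3600, HUMAN_MAPS)

print(f"OPTIMA, Drosophila: {optima_drosophila:.4f} s per map")
print(f"OPTIMA, human:      {optima_human:.4f} s per map")

# Under the same assumption, the implied time for the 2,100-map sample:
sample_estimate = optima_human * SAMPLE_MAPS
print(f"Implied human sample runtime: {sample_estimate / 60:.1f} min")
```

The per-map cost is higher for the human dataset than for Drosophila, which is consistent with the complexity terms that depend on n, the total length of the in silico maps.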