From: A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data
Dataset   Organism                                                              Size (Gb)
I         A. thaliana                                                           1.4
S         A. thaliana, artificial dataset created using the Samtools package    100.0