Skip to main content


Springer Nature is making SARS-CoV-2 and COVID-19 research free. View research | View latest news | Sign up for updates

Figure 5 | GigaScience

Figure 5

From: A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data

Figure 5

Ratios of the short reads alignment time to the communication time. All of the runs use Dataset S as an input. Two HPC scenarios are shown as ‘HPC SLURM’ and ‘HPC random’, which correspond to standard SLURM behavior and a modified behavior that allocates nodes from random racks. Linear fit was done with the least-squares method. One can see two defined linear regions for the HPC approach with very different tangents, which are caused by the network saturation. For the HPC platform, as the number of nodes increases a greater amount of time is spent on communication than on mapping. The Hadoop approach reveals better scaling properties.

Back to article page