Tentacle: distributed quantification of genes in metagenomes

GigaScience

Table 1 Tentacle applied to three use cases. All use cases used the same read data, consisting of 1,238,598,682 reads with a total size of 407 GiB as compressed FASTQ (2,213 GiB uncompressed) [6]. The examples were run on 30 nodes, with the cluster's login node hosting the master process. The following options were used. pBLAT: -threads=16 -minIdentity=90 -out=blast8; GEM: -T 16 -m 0.04 -e 0.04 --min-matched-bases 0.80 --granularity 2500000; USEARCH: -usearch_local -query_cov 1.0 -id 0.9 -blast6out

Use case	1	2	3
Description	Reads mapped to their contigs	Reads mapped to large DB	Reads mapped to peptide DB
Mapper	pBLAT	GEM	USEARCH
Type of reference	Per-sample contigs (nucleotide) [6]	BGI Refseq geneset (nucleotide) [6]	Resqu antibiotic resistance gene database (peptide) [53]
Reference size (on disk)	approx. 160 MiB per sample	3.0 GiB	1 MiB
Reference size (sequences)	6,589,348	3,305,138	3,019
Runtime (core hours)	720	3,072	296
Runtime (wall-clock)	1h 30m	6h 24m	0h 37m
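The two runtime rows are consistent with core hours = nodes × cores per node × wall-clock hours. For example, 30 × 16 × 1.5 h = 720 for use case 1. The sketch below reproduces that arithmetic and also derives an aggregate read throughput from the shared input of 1,238,598,682 reads. It assumes 16 cores per node, which the table does not state but which matches the 16-thread settings (-threads=16, -T 16) in the caption; the function names are illustrative only and are not part of Tentacle.

```python
# Sketch: relate wall-clock runtime to core hours for the runs in Table 1.
# Assumption (not stated in the table): each of the 30 nodes contributes
# 16 cores, matching the -threads=16 / -T 16 settings in the caption.
NODES = 30
CORES_PER_NODE = 16  # assumed

# Wall-clock runtimes from Table 1 as (hours, minutes).
WALL_CLOCK = {
    "pBLAT": (1, 30),
    "GEM": (6, 24),
    "USEARCH": (0, 37),
}

TOTAL_READS = 1_238_598_682  # reads in the shared input data set


def core_hours(hours: int, minutes: int) -> float:
    """Core hours = nodes x cores per node x wall-clock hours.

    Working in whole minutes keeps the division exact for these inputs.
    """
    return NODES * CORES_PER_NODE * (hours * 60 + minutes) / 60


def reads_per_second(hours: int, minutes: int) -> float:
    """Aggregate throughput across the whole cluster, in reads per second."""
    return TOTAL_READS / ((hours * 60 + minutes) * 60)


if __name__ == "__main__":
    for mapper, (h, m) in WALL_CLOCK.items():
        print(f"{mapper:8s} {core_hours(h, m):7.1f} core h  "
              f"{reads_per_second(h, m):12,.0f} reads/s")
```

Under this assumption all three core-hour values in the table are recovered exactly from the wall-clock times, which suggests that each run occupied all 480 cores for its full duration.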

ISSN: 2047-217X