
Table 1 Tentacle applied to three use cases. All use cases used the same read data consisting of 1,238,598,682 reads with a total size of 407 GiB in compressed FASTQ (2,213 GiB uncompressed) [6]. The examples were run on 30 nodes, with the cluster system login node hosting the master process. The following options were used, pBLAT: -threads=16 -minIdentity=90 -out=blast8; GEM: -T 16 -m 0.04 -e 0.04 --min-matched-bases 0.80 --granularity 2500000; USEARCH: -usearch_local -query_cov 1.0 -id 0.9 -blast6out

From: Tentacle: distributed quantification of genes in metagenomes

| Use case | 1 | 2 | 3 |
|---|---|---|---|
| | Reads mapped to their contigs | Reads mapped to large DB | Reads mapped to peptide DB |
| Mapper | pBLAT | GEM | USEARCH |
| Type of reference | Per sample contigs (nucleotide) [6] | BGI Refseq geneset (nucleotide) [6] | Resqu; antibiotic resistance gene database (peptide) [53] |
| Reference size (bytes) | approx. 160 MiB per sample | 3.0 GiB | 1 MiB |
| Reference size (sequences) | 6,589,348 | 3,305,138 | 3,019 |
| Runtime (core hours) | 720 | 3,072 | 296 |
| Runtime (wall-clock) | 1h 30m | 6h 24m | 0h 37m |
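
The two runtime rows in Table 1 appear to be related by simple arithmetic: core hours = wall-clock time × number of nodes × cores per node. The sketch below checks that relationship. The 30 nodes come from the caption; the figure of 16 cores per node is an assumption, inferred from the 16-thread options given for pBLAT and GEM, and is not stated in the table. The function names are illustrative only.

```python
import re

NODES = 30            # stated in the table caption
CORES_PER_NODE = 16   # assumption: inferred from the 16-thread mapper options


def wall_clock_minutes(text):
    """Parse a duration written like '1h 30m' into whole minutes."""
    match = re.fullmatch(r"(\d+)h\s+(\d+)m", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def core_hours(text, nodes=NODES, cores_per_node=CORES_PER_NODE):
    """Convert a wall-clock duration into core hours for the whole allocation."""
    # Multiply before dividing so whole-number results stay exact.
    return wall_clock_minutes(text) * nodes * cores_per_node / 60


# Wall-clock values copied from the table.
for mapper, wall in [("pBLAT", "1h 30m"), ("GEM", "6h 24m"), ("USEARCH", "0h 37m")]:
    print(f"{mapper}: {core_hours(wall):.0f} core hours")
```

Under that assumption the computed values (720, 3072 and 296 core hours) agree with the "Runtime (core hours)" row, which suggests the core-hour figures count every core on all 30 nodes for the full duration of each run.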