
Table 4 Datasets used to evaluate the efficiency and impact of LoRDEC read correction on the assembly

From: Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads

 

| | E. coli | Yeast |
|---|---|---|
| Reference organism | | |
| Name | Escherichia coli | Saccharomyces cerevisiae |
| Strain | K-12 substr. MG1655 | W303 |
| Reference sequence | NC_000913 | S288C |
| Genome size | 4.6 Mbp | 12 Mbp |
| PacBio Data | | |
| Accession number | PacBio reads | DevNet PacBio |
| Number of reads | 75152 | 261964 |
| Average read length (bp) | 2415 | 5891 |
| Max. read length (bp) | 19416 | 30164 |
| Number of bases | 181 Mbp | 1.5 Gbp |
| Coverage | 30 × | 129 × |
| Illumina Data | | |
| Accession number | Illumina reads | SRR567755 |
| Number of reads (millions) | 11 | 2.25 |
| Read length (bp) | 114 | 100 |
| Number of bases | 1.276 Gbp | 225 Mbp |
| Coverage | 277 × | 18 × |

  1. For the short-read data of yeast, we used only half of the available reads. The reference yeast genome is available from [40].
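
The coverage rows in the table can be related to the other rows by simple arithmetic: sequenced bases divided by genome size. The following is a minimal sketch of that calculation, not the authors' code. The function name `coverage` and the constants are illustrative assumptions, and the genome sizes are the rounded values listed in the table.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Rounded genome sizes as listed in the table (assumed exact here).
ECOLI_GENOME = 4.6e6
YEAST_GENOME = 12e6

# PacBio yeast: number of reads times average read length gives total bases.
yeast_pacbio = coverage(261964 * 5891, YEAST_GENOME)
# Illumina rows: use the "Number of bases" entries directly.
ecoli_illumina = coverage(1.276e9, ECOLI_GENOME)
yeast_illumina = coverage(225e6, YEAST_GENOME)

print(f"{yeast_pacbio:.1f} {ecoli_illumina:.1f} {yeast_illumina:.2f}")
```

Under these assumptions the three values come out close to the listed 129 ×, 277 × and 18 ×. The same arithmetic applied to the E. coli PacBio row (181 Mbp over 4.6 Mbp) gives roughly 39, which does not reproduce the listed 30 ×, so that entry may rest on a different basis.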