Table 4 Datasets used to evaluate the efficiency and impact of LoRDEC read correction on the assembly

From: Colib’read on galaxy: a tools suite dedicated to biological information extraction from raw NGS reads

| | E. coli | Yeast |
|---|---|---|
| Reference organism | | |
| Name | Escherichia coli | Saccharomyces cerevisiae |
| Strain | K-12 substr. MG1655 | W303 |
| Reference sequence | NC_000913 | S288C |
| Genome size | 4.6 Mbp | 12 Mbp |
| PacBio data | | |
| Accession number | PacBio reads | DevNet PacBio |
| Number of reads | 75152 | 261964 |
| Average read length (bp) | 2415 | 5891 |
| Max. read length (bp) | 19416 | 30164 |
| Number of bases | 181 Mbp | 1.5 Gbp |
| Coverage | 30× | 129× |
| Illumina data | | |
| Accession number | Illumina reads | SRR567755 |
| Number of reads (millions) | 11 | 2.25 |
| Read length (bp) | 114 | 100 |
| Number of bases | 1.276 Gbp | 225 Mbp |
| Coverage | 277× | 18× |
  1. For the yeast short-read data, we used only half of the available reads. The reference yeast genome is available from [40].
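
The coverage rows in the table can be related to the other rows by the usual depth estimate: total sequenced bases divided by genome size. The following is a minimal sketch under that assumption; the function name `coverage` is hypothetical and not part of LoRDEC or Colib'read, and the table does not state how its figures were rounded or whether reads were filtered first, so recomputed values need not match every entry exactly.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Estimate sequencing depth as total bases / genome size (assumed formula)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Yeast PacBio: bases estimated as number of reads x average read length,
# genome size taken as 12 Mbp (values copied from the table).
yeast_pacbio_bases = 261964 * 5891
yeast_pacbio_cov = coverage(yeast_pacbio_bases, 12e6)

# E. coli Illumina: 1.276 Gbp of reads over a 4.6 Mbp genome.
ecoli_illumina_cov = coverage(1.276e9, 4.6e6)

print(f"Yeast PacBio coverage:     {yeast_pacbio_cov:.1f}x")
print(f"E. coli Illumina coverage: {ecoli_illumina_cov:.1f}x")
```

For these two rows the estimate comes out close to the tabulated 129× and 277×.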