Table 1 Transcriptome assembly and annotation statistics compared with other Tephritid transcriptomes and the Drosophila melanogaster genome

From: Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae

| Species | B. cucurbitae | B. dorsalis a | C. capitata b | D. melanogaster c |
|---|---|---|---|---|
| Number of read pairs used in assembly (SRA accession number) | | | | |
| Egg (SRA: SRS691534) | 43741314 | 12462204 | - | - |
| Larvae (SRA: SRS691533) | 51568835 | 11753084 | - | - |
| Pupae (SRA: SRS691532) | 47093178 | 13291147 | 93256673 | - |
| Adult (SRA: SRS691531) | 46515243 | 47250123 | 96929532 | - |
| Total | 188918570 | 84756558 | 190186205 | - |
| Normalized reads (in silico normalization) | 12792085 | 7796491 | 17217414 | - |
| Unfiltered assembly | | | | |
| Number of unigenes (or Drosophila genes) | 50220 | 47216 | 118793 | - |
| N50 longest transcript/unigene | 2191 | 1882 | 1187 | - |
| Sum longest transcript/unigene (Mb) | 49.63 | 40.20 | 81.56 | - |
| Number of transcripts | 76688 | 80345 | 190958 | - |
| N50 transcript length (bp) | 2626 | 2802 | 2686 | - |
| Sum transcript length (Mb) | 100.20 | 109.48 | 236.18 | - |
| Transcripts per unigene | 1.53 | 1.70 | 1.61 | - |
| GC % | 38.10 | 39.11 | 36.21 | - |
| Filtered de novo assembly or current Drosophila release | | | | |
| Number of unigenes | 10425 | 10784 | 10741 | 15504 |
| N50 unigene length (longest transcript/unigene) | 3464 | 3043 | 3383 | 2979 |
| Sum longest transcript/unigene (Mb) | 28.12 | 24.46 | 28.34 | 30.53 |
| Number of transcripts | 17654 | 23539 | 21761 | 25205 |
| N50 transcript length (bp) | 3477 | 3460 | 3913 | 3633 |
| Sum transcript length (Mb) | 48.28 | 62.06 | 66.65 | 68.47 |
| Isoforms per unigene | 1.69 | 2.18 | 2.03 | 1.63 |
| GC % | 40.17 | 40.32 | 39.41 | 49.70 |
| N50 protein length (amino acids) | 323 | 301 | 310 | 370 |
| Number of proteins with complete ORF (%) | 12936 (73.2) | 13017 (55.3) | 15740 (72.3) | - |
| Annotation statistics | | | | |
| Number of proteins with PFAM domains identified | 13029 | 16612 | 13646 | - |
| Number of proteins with Gene Ontology Terms | 10640 | - | 13648 | - |
| Number of proteins with gene names | 15956 | 17093 | 15841 | - |
| Number of proteins with significant hit to Drosophila proteins d | 16070 | 20713 | 19245 | - |

a Data from Geib et al., 2014 [2]; b Data from Calla et al., 2014 [8]; c Data from FlyBase r6.03 [11]; d BLASTP hit with e-value cutoff 1e-5.
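
Several rows of Table 1 are derived quantities: the read-pair total, the transcripts (or isoforms) per unigene ratio, and the N50 lengths. The sketch below shows one common way to compute them. It is not taken from the article, and the table does not say which tool produced the published values. The function names `n50` and `per_unigene` are hypothetical. The N50 definition used here is the usual one: the length at which sequences of that length or longer account for at least half of the summed length. The numeric inputs are copied from the B. cucurbitae column.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def per_unigene(n_transcripts, n_unigenes):
    """Average number of transcripts per unigene, rounded to two decimals."""
    return round(n_transcripts / n_unigenes, 2)


# Read pairs per life stage, B. cucurbitae column of Table 1.
stage_read_pairs = {
    "Egg": 43741314,
    "Larvae": 51568835,
    "Pupae": 47093178,
    "Adult": 46515243,
}
total_read_pairs = sum(stage_read_pairs.values())

# Transcript and unigene counts, B. cucurbitae column of Table 1.
unfiltered_ratio = per_unigene(76688, 50220)
filtered_ratio = per_unigene(17654, 10425)

print(total_read_pairs, unfiltered_ratio, filtered_ratio)
```

With these inputs the total and both ratios agree with the "Total", "Transcripts per unigene", and "Isoforms per unigene" rows for B. cucurbitae. The N50 rows cannot be re-derived from the table alone, because `n50` needs the individual transcript lengths from the assembly.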