This article has Open Peer Review reports available.
Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae
© Sim et al.; licensee BioMed Central. 2015
Received: 31 December 2014
Accepted: 9 March 2015
Published: 31 March 2015
Bactrocera cucurbitae is a serious global agricultural pest. Basic genomic information is lacking for this species, and this would be useful to inform methods of control, damage mitigation, and eradication efforts. Here, we have sequenced, assembled, and annotated a comprehensive transcriptome for a mass-rearing sexing strain of this species. This forms a foundational genomic and transcriptomic resource that can be used to better understand the physiology and biochemistry of this insect as well as being a useful tool for population genetics.
A transcriptome assembly was constructed containing 17,654 transcript isoforms derived from 10,425 unigenes. This transcriptome size is similar to reports from other Tephritid species and probably includes about 70-80% of the protein-coding genes in the genome. The dataset is publicly available in NCBI and GigaDB as a resource for researchers.
Foundational knowledge on the protein-coding genes in B. cucurbitae will lead to improved resources for this species. Through comparison with a model system such as Drosophila as well as a growing number of related Tephritid transcriptomes, improved strategies can be developed to control this pest.
KeywordsBactrocera cucurbitae Translocation RNA-Seq White-pupae Tephritidae Melon fly SIT Sterile insect technique
Bactrocera cucurbitae (Diptera: Tephritidae) is an important agricultural pest attacking many fruits and vegetables in tropical and subtropical regions. To maximize efforts to control, mitigate the damage, and maintain eradication of this invasive species in the mainland United States, we must accumulate foundational information that describes the many aspects of the biology of this species. The data we present is a comprehensive transcriptome of the T1 translocated sexing strain of B. cucurbitae and represents genes expressed across all major life stages . The white pupae trait of this line is sex-linked and makes this strain conducive to large-scale rearing and male-only mass release of sterilized flies for the sterile insect technique (SIT).
Samples were derived from the T1 white pupae translocated line of B. cucurbitae maintained at the USDA-ARS Daniel K. Inouye Pacific Basin Agricultural Research Center Insectary in Hilo, Hawaii, USA . To generate samples that are representative of a broad range of life stages and ages, daily samples were collected from eggs (0–2 days old), larvae (~0-10 day post-hatch), pupae (0–10 days post-pupation) and adults (both unmated and mated males and females) as previously described . Total RNA was extracted from samples across each stage, and then representative, stage-specific samples were generated through pooling to generate four RNA samples for sequencing which are described as NCBI BioSamples SAMN03010448- SAMN03010451 associated with BioProject PRJNA259566. For each sample, RNA was extracted using the Zymo Quick-RNA miniprep extraction kit (Zymo Research, Irvine, CA) following recommended procedures for whole-tissue extraction. The resulting RNA was quantified with the Qubit Broad Range RNA assay on a Qubit 2.0 fluorimeter (Life Technologies, Carlsbad, CA), and size and quality determined with an RNA 6000 nano chip on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA).
Total RNA was sent to the Beijing Genomics Institute (BGI Americas, at University of California, Davis, USA) and eukaryotic mRNA libraries were prepared using TruSeq technology (TruSeq RNA Kit v2). The resulting four libraries (egg, larvae, pupae, and adult) were barcoded and sequenced together on a single lane of Illumina HiSeq 2000, generating approximately 39.31 Gb of raw data from approximately 196 million 2 × 100 bp paired reads. These raw reads were filtered by quality and for adaptor contamination using an in-house pipeline at BGI, targeting reads containing adaptor, reads with more than 5% ambiguous bases, or reads with greater than 50% of bases with Phred quality score below 10. This reduced the reads to approximately 37.78 Gb after filtering, removing approximately 4% of the data. This filtered data was used as the input to the de novo assembly and also deposited at GenBank under the SRA accessions SRS691531- SRS691534.
Transcriptome assembly and annotation statistics compared with other Tephritid transcriptomes and the Drosophila melanogaster genome
B. dorsalis a
C. capitata b
D. melanogaster c
Number of read pairs used in assembly (SRA accession number)
Egg (SRA: SRS691534)
Larvae (SRA: SRS691533)
Pupae (SRA: SRS691532)
Adult (SRA: SRS691531)
Normalized reads (in silico normalization)
Number of unigenes (or Drosophila genes)
N50 longest transcript/unigene
Sum longest transcript/unigene (Mb)
Number of transcripts
N50 transcript length (bp)
Sum transcript length (Mb)
Transcripts per unigene
Filtered de novo assembly or current Drosophila release
Number of unigenes
N50 unigene length (longest transcript/unigene)
Sum longest transcript/unigene (Mb)
Number of transcripts
N50 transcript length (bp)
Sum transcript length (Mb)
Isoforms per unigene
N50 protein length (amino acids)
Number of proteins with complete ORF (%)
Number of proteins with PFAM domains identified
Number of proteins with Gene Ontology Terms
Number of proteins with gene names
Number of proteins with significant hit to Drosophila proteinsd
Annotation was performed at the peptide level, and annotations were used to generate a transcript name and product in addition to functional annotations. All predicted proteins were subjected to analysis through InterProScan5, searching all available databases including Gene Ontology and InterPro term lookup . In addition, proteins were subjected to a BLASTP search against the UniProt SwissProt database (downloaded 10 November 2013). Transcripts were annotated with UniProt and InterProScan results using Annie , a program that extracts qualified gene names and products by cross-referencing SwissProt BLAST hits and performs database cross-referencing from InterProScan5 results. The resulting annotation file was provided to Transvestigator, described above, to include functional annotations in the resulting.gff3 and .tbl files. The filtered and annotated transcriptome was deposited at GenBank as a TSA under the accession GBXI00000000 associated with BioProject PRJNA259566. Annotation statistics are listed in Table 1.
Comparison of B. cucurbitae transcriptome with other published datasets
Availability of supporting data
The filtered and annotated transcriptome has been deposited at GenBank as a transcriptome shotgun assembly (TSA) under the accession GBXI00000000 associated with BioProject PRJNA259566. Supporting data and analysis results also available from the GigaScience GigaDB database .
We thank Steven Tam for assistance in colony rearing and fruit fly sample collections used in this study. Funding was provided by USDA-ARS and SBS and BH were supported by USDA Farm Bill Project 3.0251. Bioinformatic analysis was performed on computing resources at USDA-ARS Pacific Basin Agricultural Research Center (Moana cluster; Hilo, HI, http://moana.dnsalias.org) and the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575XSEDE utilizing allocation TG-MCB140032 to SMG. Opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the USDA. USDA is an equal opportunity provider and employer.
- Faircloth BC, Branstetter MG, White ND, Brady SG. Target enrichment of ultraconserved elements from arthropods provides a genomic perspective on relationships among Hymenoptera. Mol Ecol Resour. 2014;doi:10.1111/1755-0998.12328.Google Scholar
- Geib S, Calla B, Hall B, Hou S, Manoukis N. Characterizing the developmental transcriptome of the oriental fruit fly, Bactrocera dorsalis (Diptera: Tephritidae) through comparative genomic analysis with Drosophila melanogaster utilizing modENCODE datasets. BMC Genomics. 2014;15:942.View ArticlePubMedPubMed CentralGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotech. 2011;29:644–52.View ArticleGoogle Scholar
- Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494–512.View ArticlePubMedGoogle Scholar
- Li B, Dewey C. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.View ArticlePubMedPubMed CentralGoogle Scholar
- Eddy SR. Accelerated Profile HMM Searches. PLoS Comput Biol. 2011;7:e1002195.View ArticlePubMedPubMed CentralGoogle Scholar
- Theodore DeRego, Brian Hall, Reuben Tate, Scott Geib (2014): Transvestigator early release. ZENODO. http://doi.org/10.5281/zenodo.10471
- Calla B, Hall B, Hou S, Geib S. A genomic perspective to assessing quality of mass-reared SIT flies used in Mediterranean fruit fly (Ceratitis capitata) eradication in California. BMC Genomics. 2014;15:98.View ArticlePubMedPubMed CentralGoogle Scholar
- Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30(9):1236–40. doi:10.1093/bioinformatics/.View ArticlePubMedPubMed CentralGoogle Scholar
- Tate, Reuben, Brian Hall, Theodore DeRego (2014): Annie the functional annotator - initial release. ZENODO. http://doi.org/10.5281/zenodo.10470
- Ashburner M, Drysdale R. FlyBase—The Drosophila genetic database. Development. 1994;120:2077–9.PubMedGoogle Scholar
- Sim, SB; Calla, B; Hall, B; DeRego, T, Geib, SM. (2015) Supporting data and materials from "Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae". GigaScience Database. http://doi.org/10.5281/10.5524/100135
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.