De novo construction of an expanded transcriptome assembly for the western tarnished plant bug, Lygus hesperus
© Tassone et al. 2016
Received: 4 September 2015
Accepted: 6 January 2016
Published: 28 January 2016
The plant bug Lygus hesperus Knight is a polyphagous pest of many economically important crops. Despite its pest status, little is known about the molecular mechanisms responsible for much of the biology of this species. Earlier Lygus transcriptome assemblies were limited by low read depth, or because they focused on specific conditions. To generate a more comprehensive transcriptome, we supplemented previous datasets with new reads corresponding to specific tissues (heads, antennae, and male reproductive tissues). This transcriptome augments current Lygus molecular resources and provides the foundational knowledge critical for future comparative studies.
An expanded, Trinity-based de novo transcriptome assembly for L. hesperus was generated using previously published whole body Illumina data, supplemented with 293 million bp of new raw sequencing data corresponding to five tissue-specific cDNA libraries and 11 Illumina sequencing runs. The updated transcriptome consists of 22,022 transcripts (average length of 2075 nt), 62 % of which contain complete open reading frames. Significant coverage of the BUSCO (benchmarking universal single-copy orthologs) dataset and robust metrics indicate that the transcriptome is a quality assembly with a high degree of completeness. Initial assessment of the new assembly’s utility revealed that the length and abundance of transcripts predicted to regulate insect physiology and chemosensation have improved, compared with previous L. hesperus assemblies.
This transcriptome represents a significant expansion of Lygus transcriptome data, and improves foundational knowledge about the molecular mechanisms underlying L. hesperus biology. The dataset is publicly available in NCBI and GigaDB as a resource for researchers.
Keywords: Transcriptome, Lygus hesperus, Plant bug, Miridae, RNA-Seq, Trinity
The western tarnished plant bug Lygus hesperus Knight is a polyphagous pest with an extensive host plant range including many economically important food, fiber, and seed crops. While control measures have traditionally relied on broad-spectrum insecticides, negative ecological ramifications and evolving insecticide resistance have reduced the continued viability of this approach. As a consequence, there is growing interest in biorational-based strategies; however, the development of such approaches requires a comprehensive understanding of a species’ underlying biology. Towards this end, we previously reported on the sequencing and assembly of two L. hesperus transcriptomes: a general Roche 454-based assembly, and a second Illumina-based assembly incorporating sequence information from adults under thermal stress. Those databases were developed using sequence data derived from whole bodies. Although this approach yields substantial data, whole body analysis tends to mask underrepresented genes that are expressed primarily in specific tissues or under specific conditions. To generate a more comprehensive transcriptome, here we supplement our previous thermal dataset with reads from specific tissues: heads, antennae, and male reproductive tissues. Incorporation of these new datasets expands the current L. hesperus database, provides greater depth of coverage, and enables new research for the better understanding of Lygus biology.
All samples and tissues were derived from an L. hesperus laboratory colony maintained at the United States Department of Agriculture-Agricultural Research Service (USDA-ARS) Arid Land Agricultural Research Center (ALARC) in Maricopa, Arizona, USA. The colony was reared at 27–29 °C under 20 % humidity with an L14:D10 photoperiod, and fed an artificial diet. Nymphs and adults used for RNA preparation were from eggs deposited in agar oviposition packets and maintained as described previously. Our initial Illumina-based transcriptome was generated using 10-day old adults exposed for 4 h to one of three temperatures (4 °C, 25 °C, or 39 °C). To provide deeper coverage of transcripts encoding proteins functioning in olfaction, central nervous system-mediated behaviors, and male reproduction, sex-specific antennae, heads, and male accessory glands were dissected and stored at −20 °C in RNALater (Ambion/Life Technologies, Carlsbad, CA). The antennae samples represent ~500 unmated 7–9-day old adult males, and ~600 unmated 7–9-day old adult females. Heads (8–12 per stage/age per replicate) without antennae were collected across three biological replicates from 3rd instar nymphs, 4th instar nymphs, late 5th instar nymphs, and unmated adults of both sexes at 1, 3, 7, 10, and 15 days post-eclosion. Accessory glands (30 per replicate) were dissected in phosphate-buffered saline from 7 to 8-day-old adult males 24 h post-mating and from similarly aged unmated cohorts. Total RNA extraction and library generation (TruSeq RNA Sample Preparation Kit v2; Illumina Inc., San Diego, USA) were performed as described previously at the University of Arizona Genomics Center. All samples were sequenced using an Illumina HiSeq2000 or HiSeq2500 in Rapid Run mode (paired-end 100-bp reads).
Table 1 Accession numbers for L. hesperus sequence reads and assembled transcripts
Short Read Archive: SRX483635, SRX483674, SRX483877, SRX483950, SRX484037, SRX484042, SRX484076, SRX484077, SRX484079, SRX1072689, SRX1155625, SRX1155629
Data used for assembly corresponded to the ~145 million bp of sequence reads generated previously, and 293 million bp of new data from 11 Illumina runs covering five tissue-specific libraries. Prior to assembly, the four datasets (thermal-based, head, antennae, and accessory gland) were concatenated, and read abundance was normalized to 50X coverage using the in silico normalization tool in Trinity to improve assembly time and minimize memory requirements. Filtering and normalization reduced the dataset to 15 Gb, comprising approximately 32 million normalized read pairs, which were then assembled using default parameters in Trinity (r2014_07-17). Transcript expression levels were estimated with RSEM and open reading frames (ORFs) were predicted using TransDecoder. HMMER3 was used to identify additional ORFs matching Pfam-A domains. Following transcriptome assembly, reads were filtered, sorted, and prepared for NCBI transcriptome shotgun assembly (TSA) submission as previously described.
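The idea behind coverage-based read normalization can be illustrated with a short sketch. It is not Trinity's implementation: it assumes a common formulation in which each read's coverage is estimated as the median abundance of its k-mers, and reads above the target depth are retained with probability target/coverage. All function names, the k-mer size, and the random seed are illustrative assumptions.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of a read."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Build a k-mer abundance table over all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def median_coverage(read, counts, k):
    """Median abundance of the read's k-mers, used as its coverage estimate."""
    abundances = [counts[km] for km in kmers(read, k)]
    # Reads shorter than k have no k-mers; treat them as zero coverage.
    return median(abundances) if abundances else 0


def keep_probability(coverage, target):
    """1.0 at or below the target depth, target/coverage above it."""
    return 1.0 if coverage <= target else target / coverage


def normalize(reads, k=25, target=50, seed=0):
    """Down-sample reads so that deeply covered regions approach the target depth."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        coverage = median_coverage(read, counts, k)
        if rng.random() < keep_probability(coverage, target):
            kept.append(read)
    return kept
```

Under this scheme a read with an estimated coverage of 200X is retained about one time in four at a 50X target, while reads from rare transcripts are always kept, which is why normalization reduces data volume without discarding low-abundance sequences.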
Functional annotation was performed at the peptide level using a custom pipeline that defines protein products and assigns transcript names. Predicted proteins/peptides were analyzed using InterProScan5, which searched all available databases including Gene Ontology (GO). BLASTp analysis of the resulting proteins was performed with the UniProt Swiss-Prot database (downloaded 11 February 2015). Annie, a program that cross-references Swiss-Prot BLAST and InterProScan5 results to extract qualified gene names and products, was used to generate the transcript annotation file. The resulting .gff3 and .tbl files were further annotated with functional descriptors in Transvestigator.
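One step that annotation pipelines of this kind typically include is reducing BLAST output to a single best hit per query before names are assigned. The sketch below shows that step only, and it is not Annie's actual logic. It assumes the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the e-value cutoff, function name, and all identifiers in the example data are hypothetical.

```python
import csv
import io


def best_hits(blast_tsv, max_evalue=1e-5):
    """Return {query_id: (subject_id, evalue, bitscore)}, keeping the top bit score per query."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hit is too weak to support an annotation
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up rows for illustration; none of these identifiers are real accessions.
EXAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ["tx1", "sp|X00001|DEMO1", "62.0", "210", "80", "0", "1", "210", "5", "214", "1e-40", "155"],
        ["tx1", "sp|X00002|DEMO2", "75.0", "300", "75", "0", "1", "300", "1", "300", "1e-90", "410"],
        ["tx2", "sp|X00003|DEMO3", "30.0", "50", "35", "0", "1", "50", "1", "50", "0.5", "28"],
    ]
)
```

Calling `best_hits(EXAMPLE)` keeps the second row for `tx1` because it has the higher bit score, and omits `tx2` because its only hit fails the e-value cutoff.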
Quality, completeness and depth of the comprehensive L. hesperus transcriptome
Transcriptome assembly and annotation statistics compared with previous Lygus transcriptomes
Assemblies compared: L. lineolaris (a); L. hesperus (454) (b); L. hesperus (thermal) (c); L. hesperus (current) (d)
Metrics reported: total no. read pairs; normalized reads (in silico normalization); total no. transcripts; average transcript length; total assembled bases (all transcripts); total assembled bases (longest transcript per unigene); N50 (all transcripts); N50 (longest transcript per unigene); proteins with complete ORF (%); no. transcripts with a BLAST hit; no. transcripts with GO term
Proteins with complete ORF (%): 13,689 (62.1 %)
No. transcripts with a BLAST hit: 3126 (44.9 %); 19,393 (54 %); 16,942 (76.9 %)
No. transcripts with GO term: 2196 (31.5 %); 7898 (21 %); 12,114 (54.9 %)
3705 (22.2 %); 14,575 (66.1 %)
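The N50 metrics listed above can be computed as in the following minimal sketch. It assumes the usual definition of N50 (the length at which sequences of that length or longer contain at least half of all assembled bases) and a hypothetical input of (unigene id, transcript length) pairs for the per-unigene variant; the function names are illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def longest_per_unigene(transcripts):
    """Keep only the longest transcript length for each unigene."""
    longest = {}
    for gene, length in transcripts:
        longest[gene] = max(length, longest.get(gene, 0))
    return list(longest.values())
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8: the two longest sequences already account for 16 of the 32 total bases. Passing the output of `longest_per_unigene` to `n50` gives the per-unigene variant, which avoids counting multiple isoforms of the same gene.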
BUSCO analysis of assembly completeness
L. hesperus transcriptomes
Select insect transcriptomes: Nilaparvata lugens (GI:604923024); Musca domestica (GI:510208131); Spodoptera exigua (GI:556694752); Drosophila serrata (GI:570485056)
Select insect genomes: Pediculus humanus (PhumU2); Acyrthosiphon pisum (GCA_000142985.2); Drosophila melanogaster (Dmel_r5.55)
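A BUSCO-style completeness comparison comes down to expressing category counts as percentages of the ortholog set searched. The sketch below assumes the four usual categories (complete single-copy, complete duplicated, fragmented, missing); the function name and any counts passed to it are hypothetical and are not values from this study.

```python
def busco_percentages(complete_single, complete_duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages of the full ortholog set."""
    total = complete_single + complete_duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one ortholog must be counted")

    def pct(count):
        return round(100 * count / total, 1)

    return {
        # Duplicated orthologs are complete too, so they count toward "complete".
        "complete": pct(complete_single + complete_duplicated),
        "duplicated": pct(complete_duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
        "n": total,
    }
```

With made-up counts of 700, 100, 120, and 80, the function reports 80.0 % complete (10.0 % duplicated), 12.0 % fragmented, and 8.0 % missing out of 1000 orthologs.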
Availability of supporting data
The filtered and annotated transcriptome was deposited at GenBank as a TSA under the accession GDHC01000000, associated with BioProject PRJNA284294. NCBI accession identifiers for all of the associated SRA, BioSample, and BioProject data repositories are listed in Table 1. Datasets further supporting the results of this article are available in the GigaScience repository, GigaDB.
ALARC: Arid Land Agricultural Research Center
BUSCO: benchmarking universal single-copy ortholog
GPCR: G protein-coupled receptor
ORF: open reading frame
SRA: short read archive
TSA: transcriptome shotgun assembly
USDA-ARS: United States Department of Agriculture-Agricultural Research Service
XSEDE: Extreme Science and Engineering Discovery Environment
The authors thank both Daniel Langhorst (ALARC) and Lynn Forlow Jech (ALARC) for technical support, as well as Brooks Silversmith (ALARC) and Anna Cervantes (ALARC) for maintaining the L. hesperus colony. The research described in this manuscript was partially supported by funds from Cotton Inc. (CSB, project no. 12–373). Bioinformatic analysis was performed on computing resources at the USDA-ARS Daniel K. Inouye Pacific Basin Agricultural Research Center (Moana cluster; Hilo, HI) and the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575, under allocation TG-MCB140032 to SMG. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. USDA is an equal opportunity provider and employer.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Scott DR. An annotated listing of host plants of Lygus hesperus Knight. Entomol Soc Am Bull. 1977;23:19–22.
- Hull JJ, Geib SM, Fabrick JA, Brent CS. Sequencing and de novo assembly of the western tarnished plant bug (Lygus hesperus) transcriptome. PLoS ONE. 2013;8:e55105.
- Hull JJ, Chaney K, Geib SM, Fabrick JA, Brent CS, Walsh D, et al. Transcriptome-based identification of ABC transporters in the western tarnished plant bug Lygus hesperus. PLoS ONE. 2014;9:e113046.
- Debolt JW. Meridic diet for rearing successive generations of Lygus hesperus. Ann Entomol Soc Am. 1982;75:119–22.
- Brent CS, Hull JJ. Characterization of male-derived factors inhibiting female sexual receptivity in Lygus hesperus. J Insect Physiol. 2014;60:104–10.
- Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
- Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494–512.
- Sim SB, Calla B, Hall B, DeRego T, Geib SM. Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae. GigaScience. 2015;4:14.
- Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–40.
- Tate R, Hall B, DeRego T. Annie the functional annotator – initial release. ZENODO; 2014. Available from: http://doi.org/10.5281/zenodo.10470.
- Magalhaes LC, van Kretschmar JB, Donohue KV, Roe RM. Pyrosequencing of the adult tarnished plant bug, Lygus lineolaris, and characterization of messages important in metabolism and development. Entomol Exp Appl. 2013;146:364–78.
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2. doi:10.1093/bioinformatics/btv351.
- Tanaka Y, Suetsugu Y, Yamamoto K, Noda H, Shinoda T. Transcriptome analysis of neuropeptides and G-protein coupled receptors (GPCRs) for neuropeptides in the brown planthopper Nilaparvata lugens. Peptides. 2014;53:125–33.
- Megy K, Emrich SJ, Lawson D, Campbell D, Dialynas E, Hughes DST, et al. VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics. Nucleic Acids Res. 2012;40(Database issue):D729–34.
- Latorre-Estivalis JM, de Oliveira ES, Beiral Esteves B, Santos Guimarães L, Neves Ramos M, Lorenzo MG. Patterns of expression of odorant receptor genes in a Chagas disease vector. Insect Biochem Mol Biol. Forthcoming 2015. doi:10.1016/j.ibmb.2015.05.002
- Tassone EE, Geib SM, Hall B, Fabrick JA, Brent CS, Hull JJ. Supporting data for “De novo construction of an expanded transcriptome assembly for the western tarnished plant bug, Lygus hesperus”. GigaScience Database. 2016. http://dx.doi.org/10.5524/100172