Bacterial and viral identification and differentiation by amplicon sequencing on the MinION nanopore sequencer
© Kilianski et al.; licensee BioMed Central. 2015
Received: 24 December 2014
Accepted: 27 February 2015
Published: 26 March 2015
The MinION™ nanopore sequencer was recently released to a community of alpha-testers for evaluation using a variety of sequencing applications. Recent reports have tested the ability of the MinION™ to act as a whole genome sequencer and have demonstrated that nanopore sequencing has tremendous potential utility. However, the current nanopore technology still has limitations with respect to error-rate, and this is problematic when attempting to assemble whole genomes without secondary rounds of sequencing to correct errors. In this study, we tested the ability of the MinION™ nanopore sequencer to accurately identify and differentiate bacterial and viral samples via directed sequencing of characteristic genes shared broadly across a target clade.
Using a 6 hour sequencing run time, sufficient data were generated to identify an E. coli sample down to the species level from 16S rDNA amplicons. Three poxviruses (cowpox, vaccinia-MVA, and vaccinia-Lister) were identified and differentiated down to the strain level, despite over 98% identity between the vaccinia strains. The ability to differentiate strains by amplicon sequencing on the MinION™ was accomplished despite an observed per-base error rate of approximately 30%.
While nanopore sequencing, using the MinION™ platform from Oxford Nanopore in particular, continues to mature into a commercially available technology, practical uses are sought for the current versions of the technology. This study offers evidence of the utility of amplicon sequencing by demonstrating that the current versions of MinION™ technology can accurately identify and differentiate both viral and bacterial species present within biological samples via amplicon sequencing.
KeywordsMinION™ Oxford Nanopore NGS Nanopore Amplicon Pathogen detection Poxvirus Escherichia coli Whole genome sequencing Long-read sequencing
Nanopore-based genomic sequencing has been a concept in development since the first observations of protein nanopores as a means of differentiating nucleotides on a single strand of DNA [1-3]. Protein and solid-state nanopores have been presented as a potential way to determine the sequence of both nucleic acid and protein, and there have been multiple demonstrations of both in the last decade [4-6]. The recent release of the MinION™ by Oxford Nanopore Technologies to a group of alpha-testers makes it the first nanopore-based genomic sequencer to be available for independent assessment. The MinION™ utilizes a customized protein nanopore and combines the sequencing flow cell and electronics into a palm-sized device that is powered and operated via a USB 3.0 connection.
To date, there have been four peer-reviewed publications examining the capabilities of the MinION™ with respect to whole genome sequencing. An initial report by Mikheyev and Tin detailed the performance of the first generation MinION™ reagents and protocols when sequencing purified lambda phage DNA over a 36 hour sequencing run . To demonstrate the utility of the MinION™ for bacterial whole-genome sequencing projects, Quick et al. sequenced E. coli K-12 DNA for 72 hours . The MinION™ was also used to decipher the gene organization of a chromosomally-inserted antibacterial resistance cassette in Salmonella Typhi . Finally, alignment and SNV software was developed and then utilized to resolve the copy number of a cancer-testis gene family . The first two experiments primarily assessed the per-base accuracy of randomly sheared shotgun sequence reads against a known reference genome, rather than the utility of the MinION™ for de novo identification of microbial species and strains. The third study utilized nanopore sequencing with Illumina HiSeq data for error correction. The long reads generated by the MinION™ were instrumental for inferring the gene organization, but only after Illumina sequence data had constructed a scaffold for read mapping. The most recent study utilized only MinION™ reads to resolve human gene copy number while demonstrating the utility of novel alignment approaches for nanopore data.
The main advantages of the MinION™ are its portability and small footprint, easy and quick sample prep, long reads, and flexible run time for data generation. This study targets an application that benefits from these characteristics and determines whether the MinION™ is able to differentiate between closely related viral and bacterial species and strains using amplicon sequencing and 6 hour sequencing runtimes.
Raw statistics for bacterial and viral sequencing runs on the MinION™
Number of reads
Number of bases
Mean read length
Number of reads aligned
Mean aligned read length
Mean alignment identity
Number of reads aligned
Mean aligned read length
Mean alignment identity
MinION™ sequencing libraries were prepared (see Methods) from PCR amplicons generated from cultures of E. coli as well as from three poxviruses of varying relatedness (cowpox and vaccinia strains MVA and Lister). From the four datasets that were produced, numbers of reads ranged from 296 – 1,335 with a mean read length of 770 bp - 2,863 bp (Table 1). These reads were generated on R7.0 flow-cell chemistry (Oxford Nanopore) and QC analysis prior to the runs determined between 50–100 active channels per sequencing run. A minority of the data output by the MinION™ aligned well to the known reference sequence using BLASR , ranging from 18 – 270 reads aligning with a mean alignment length of 353 bp - 651 bp. Using LAST  as the read aligner generated more aligning reads, 47–751, with a greater read length average, 750-1143 bp. The average identity of each alignment ranged from 69.9% – 70.8% using BLASR, with little variation between datasets (Table 1) to 61.6-66.1% with LAST. The origin of the non-aligning reads output by the MinION™ is not known. Most do not align to the CS control DNA and are unidentifiable by BLAST search, with only a small handful aligning to potentially contaminating references. The availability of the datasets  will aid in the identification of unaligned reads as the analysis tools for nanopore sequencing continue to be developed.
BLASR/LAST read mapping statistics to genus and species levels from bacterial and viral reads
Reads aligning to genus
Bases aligning to genus
Reads aligning to species
Bases aligning to species
Reads aligning to non-genus
Bases aligning to non-genus
Bases aligning to genus/bases aligning to non-genus
Amplicon sequencing has been utilized as an approach in microbiome studies, cancer mutation identification, and pathogen detection [15-17]. Having the ability to sequence amplicons in a mobile, small-footprint platform is attractive when collecting and analyzing samples in the field, and these qualities are also desired as sequencing methods move towards clinical applications. These methods, data, and results represent a practical and novel application for the current stage of MinION™ nanopore sequencing technology. These data were generated on R7.0 chemistry flow cells, which resulted in as little as 10% of the channels within the flow cell being operational. This is lower than what has been reported by others , but it has been suggested that the quick evolution of the flow-cell development might have led to lower yields from R7.0 flow-cell chemistry. It is also possible that the overnight incubation of libraries (Methods) with the supplied motor protein lowered data yield from the amplicon experiments. Recent preliminary reports within the MinION™ alpha-testing community suggest that the latest versions of flow-cell chemistry, in addition to newer DNA preparation methodology, yield better pore usage and generate more data. As this is the first report of nanopore sequenced amplicons, hopefully this will set a benchmark for the improvement of data yield and utility from these types of samples. We expect that as the MinION™ technology improves, amplicon sequencing will generally become cheaper, faster, and more accurate.
These results confirm the observation by others that the per-base accuracy of MinION™ data (65-80%) is below that of other DNA sequencing methods [7,8]. In order for amplicon sequencing to be useful in the face of this high error rate, the rate at which reads align to the correct species and strain must be far higher than the rate of spurious alignments that would lead to false positives. We found that each dataset aligned to the correct reference genome at a 1.95 to 9.04-fold higher rate than the spurious alignments to non-target organisms using BLASR (Table 2), even when using the permissive default settings for the BLASR alignment algorithm . The datasets aligned to the correct reference at a rate of 0.51 to 2.04 using the LAST alignment, but a majority of the top identifications were correct (Table 2). These datasets will serve as a resource to other bioinformatic researchers who are designing pathogen detection algorithms to work with MinION™ data.
Utilizing the option to precisely specify the length of a given sequencing run, sufficient coverage depth to accurately call the amplicon sequence was achieved in 6 hours. The was true for both bacterial and viral species, and the amplicon data generated for cowpox, vaccinia-MVA, and vaccinia-Lister was able to be used to correctly identify these closely related strains. With read-level accuracy at 65-80%, generating significant coverage depth is necessary for extracting accurate consensus sequences. When analyzing a complex mixed sample that might contain an agent of interest such as E. coli (5-6mb) or poxviruses (150-280 kb) with a majority of DNA from host or other organisms present, it is likely that the amount of coverage for the agent of interest would be extremely limited after only 6 hours of runtime thus making it difficult to identify or characterize any pathogens within the sample. More accurate identification and characterization can occur using the MinION™ by concentrating the data collection over a particular genomic region via targeted amplification when using nanopore sequencing to determine the biological agent milieu of a complex sample.
While the field of taxonomic classification and pathogen detection within complex mixed DNA samples has developed a set of highly sophisticated algorithms and tools (e.g. [18-21], and many others), none of these have yet been modified or validated for use with nanopore sequence data. Therefore, in this study we have chosen to focus on the rate of alignment of a set of reads against a small reference dataset as a proxy for the broader utility of nanopore data for pathogen detection. The importance of finely tuned and validated detection algorithms is shown clearly by the subtly different results returned by two algorithms, BLASR and LAST, which both attempt to find the optimal set of correct alignments of the same set of reads against the same reference database. The difference between those two aligners barely begins to scratch the surface of the vast parameter space that must be explored to find the best method for translating raw nanopore data into actionable information on the presence of specific organisms within a complex environmental or clinical sample. However, these results indicate that the input nanopore sequence data contains sufficient information in order to drive that detection in a reliable and accurate manner.
To date, the MinION™ work being reported has demonstrated the enormous potential of nanopore sequencing, while also highlighting that for whole genome sequencing approaches improvements will need to be made as the technology matures. By utilizing amplicon sequencing as a method for bacterial and viral detection and differentiation, accurate depictions of what agents are in a sample can be generated. Importantly, this is a practical use for the technology as it exists now, and the methods described here will only become more accurate as nanopore sequencing technology evolves. The methods and results presented here and deposited in GigaDB  will help inform and guide the community as applications are sought for the current generation as well as for future iterations of nanopore sequencing technology.
Cowpox, Vaccinia-MVA, and Vaccinia-Lister were grown in Vero (Modified Vaccinia Ankara or MVA) or BHK cells (cowpox, Lister) until the development of CPE (cytopathogenic effect). Cells were then harvested and DNA extracted using the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA). Poxvirus amplicons were then generated by using the consensus primers EACP1 (ATGACACGATTGCCAATAC) and EACP2 (CTAGACTTTGTTTTCTG) [22,23] and HotStarTaq PCR kit (Qiagen). E. coli K12 (Affymetrix, Santa Clara, CA, USA) DNA was amplified using the 16S-specific primers adapted from probeBase  S-D-Bact-0008-c-S-20-ONT (GCCATCAGATTGTGTTTGTTAGTCGCTAGRGTTYGATYMTGGCTCAG) and S-D-Bact-1391-a-A-17-ONT (GCTTACGGTTCACTACTCACGACGATGGACGGGCGGTGWGTRCA). PCR products were verified via gel electrophoresis then cleaned up with Agencourt AMPure beads (Beckman Coulter, Miami, FL, USA). The amplicons were prepped using genomic preparation kits (SQK-MAP-002) and the methodology from Oxford Nanopore, which is briefly described below and performed as detailed by the nanopore literature from Oxford Nanopore. 1ug of purified amplicon DNA with 50 ng added lambda phage control DNA was end repaired (NEBNext End Repair Module, New England Biolabs, Beverly, MA, USA) and dA-tailed (NEBNext dA-Tailing Module, New England Biolabs). A hairpin adapter was then ligated to the end repaired and dA-tailed amplicons, followed by an overnight incubation with a motor protein (to facilitate entry of the hairpin and complementary DNA strand into the pore). 6ul of the resulting prepared library was diluted in 140ul of MinION™ loading buffer with 4ul MinION™ fuel mix (Oxford Nanopore Technologies, Oxford, UK). The MinION™ flow cells were primed with 300ul MinION™ loading buffer followed by the loading of the sequencing mix. The flow cell was run for 6 hours for each sequencing run.
Template reads in FASTA format were extracted using poRe  from the HDF5 files output by the MinION™ base-calling software. Those reads were aligned against a reference dataset containing the set of microbial genomic regions predicted to be amplified by the primers used for each dataset using BLASR  with default settings, or LAST  using the settings described previously for MinION™ data: match score (r) of 1, gap opening penalty (a) of 1, and gap extension penalty (b) of 1 . Alignment summaries were output by BLASR and LAST and ‘pileup’ files were generated using Samtools . Read percentage identity was calculated as described previously for LAST output , and the percentage identity values for BLASR were used as output. R and ggplot2  were used to plot the percent identity and alignment span for each read against its known reference, as well as the number of bases aligned to each reference genome in the larger database. A complete set of instructions for that analysis is provided as supplemental information.
Availability and requirements
Project name: Minion viral and bacterial species amplicon sequencing shell scripts
Project home page: https://github.com/gigascience/paper-kilianski2015
Operating system: Unix
Programming language: Bash
Other requirements: Unix
License: GPL v3.
Availability of supporting data
The datasets supporting the results of this article are available in the GigaDB repository , and the European Nucleotide Archive under accession number ERP009678 and BioProject PRJEB8656.
Basic Local Alignment with Successive Refinement
Hierarchical Data Format 5
Modified Vaccinia Ankara
AK is supported by the National Academy of Science and DTRA as a National Research Council (NRC) fellow and E.C is supported through ORISE. This work was funded with seed money from NRC/DTRA to AK. Conclusions and opinions presented here are those of the authors and are not the official policy of the US Army, ECBC, or the US Government. Information in this report is cleared for public release and distribution is unlimited.
- Akeson M, Branton D, Kasianowicz JJ, Brandin E, Deamer DW. Microsecond time-scale discrimination among polycytidylic acid, polyadenylic acid, and polyuridylic acid as homopolymers or as segments within single RNA molecules. Biophys J. 1999;77:3227–33.View ArticlePubMedPubMed CentralGoogle Scholar
- Meller A, Nivon L, Brandin E, Golovchenko J, Branton D. Rapid nanopore discrimination between single polynucleotide molecules. Proc Natl Acad Sci U S A. 2000;97:1079–84.View ArticlePubMedPubMed CentralGoogle Scholar
- Vercoutere W, Winters-Hilt S, Olsen H, Deamer D, Haussler D, Akeson M. Rapid discrimination among individual DNA hairpin molecules at single-nucleotide resolution using an ion channel. Nat Biotechnol. 2001;19:248–52.View ArticlePubMedGoogle Scholar
- Stoloff DH, Wanunu M. Recent trends in nanopores for biotechnology. Curr Opin Biotechnol. 2013;24:699–704.View ArticlePubMedGoogle Scholar
- Pastoriza-Gallego M, Breton M-F, Discala F, Auvray L, Betton J-M, Pelta J. Evidence of unfolded protein translocation through a protein nanopore. ACS Nano. 2014;8(11):11350–60
- Shankla M, Aksimentiev A. Conformational transitions and stop-and-go nanopore transport of single-stranded DNA on charged graphene. Nat Commun. 2014;5:5171.View ArticlePubMedPubMed CentralGoogle Scholar
- Mikheyev AS, Tin MMY. A first look at the Oxford Nanopore MinION sequencer. Mol Ecol Resour. 2014;14(6):1097–102.
- Quick J, Quinlan AR, Loman NJ. A reference bacterial genome dataset generated on the MinION(TM) portable single-molecule nanopore sequencer. Gigascience. 2014;3:22.View ArticlePubMedPubMed CentralGoogle Scholar
- Ashton PM, Nair S, Dallman T, Rubino S, Rabsch W, Mwaigwisya S, et al. MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat Biotechnol. 2014, advance on [Epub ahead of print].
- Jain M, Fiddes IT, Miga KH, Olsen HE, Paten B, Akeson M. Improved data analysis for the MinION nanopore sequencer. Nat Methods. 2015, [Epub ahead of print].
- Kilianski A, Haas JL, Corriveau EJ, Liem AT, Willis KL, Kadavy DR, Rosenzweig CN, Minot SS. Supporting data and materials from “Bacterial and viral identification and differentiation by amplicon sequencing on the MinION nanopore sequencer”. GigaScience Database; 2015 http://dx.doi.org/10.5524/100134
- Watson M, Thomson M, Risse J, Talbot R, Santoyo-Lopez J, Gharbi K, et al. poRe: an R package for the visualization and analysis of nanopore sequencing data. Bioinformatics. 2014;31(1):114–5
- Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics. 2012;13:238.View ArticlePubMedPubMed CentralGoogle Scholar
- Frith MC, Hamada M, Horton P. Parameters for accurate genome alignment. BMC Bioinformatics. 2010;11:80.View ArticlePubMedPubMed CentralGoogle Scholar
- Chang F, Li MM. Clinical application of amplicon-based next-generation sequencing in cancer. Cancer Genet. 2013;206:413–9.View ArticlePubMedGoogle Scholar
- Samuel N, Villani A, Fernandez C V, Malkin D. Management of familial cancer: sequencing, surveillance and society. Nat Rev Clin Oncol. 2014;11(12):723-31
- Zumla A, Al-Tawfiq JA, Enne VI, Kidd M, Drosten C, Breuer J, et al. Rapid point of care diagnostic tests for viral and bacterial respiratory tract infections—needs, advances, and future prospects. Lancet Infect Dis. 2014;14(11):1123–35.
- Ames SK, Hysom DA, Gardner SN, Lloyd GS, Gokhale MB, Allen JE: Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics 2013;29(18):2253–60
- Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods. 2012;9:811–4.View ArticlePubMedPubMed CentralGoogle Scholar
- Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15:R46.View ArticlePubMedPubMed CentralGoogle Scholar
- Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, et al. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 2014;24:1180–92.View ArticlePubMedPubMed CentralGoogle Scholar
- Ropp SL, Jin Q, Knight JC, Massung RF, Esposito JJ. PCR strategy for identification and differentiation of small pox and other orthopoxviruses. J Clin Microbiol. 1995;33:2069–76.PubMedPubMed CentralGoogle Scholar
- Meyer H, Damon IK, Esposito JJ. Orthopoxvirus diagnostics. Methods Mol Biol. 2004;269:119–34.PubMedGoogle Scholar
- Loy A, Maixner F, Wagner M, Horn M. probeBase–an online resource for rRNA-targeted oligonucleotide probes: new features 2007. Nucleic Acids Res. 2007;35(Database issue):D800–4.View ArticlePubMedGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Wickham H. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.