EMPeror: a tool for visualizing high-throughput microbial community data
© Vázquez-Baeza et al.; licensee BioMed Central Ltd. 2013
Received: 17 October 2013
Accepted: 20 November 2013
Published: 26 November 2013
As microbial ecologists take advantage of high-throughput sequencing technologies to describe microbial communities across ever-increasing numbers of samples, new analysis tools are required to relate the distribution of microbes among larger numbers of communities, and to use increasingly rich and standards-compliant metadata to understand the biological factors driving these relationships. In particular, the Earth Microbiome Project drives these needs by profiling the genomic content of tens of thousands of samples across multiple environment types.
Features of EMPeror include: ability to visualize gradients and categorical data, visualize different principal coordinates axes, present the data in the form of parallel coordinates, show taxa as well as environmental samples, dynamically adjust the size and transparency of the spheres representing the communities on a per-category basis, dynamically scale the axes according to the fraction of variance each explains, show, hide or recolor points according to arbitrary metadata including that compliant with the MIxS family of standards developed by the Genomic Standards Consortium, display jackknifed-resampled data to assess statistical confidence in clustering, perform coordinate comparisons (useful for procrustes analysis plots), and greatly reduce loading times and overall memory footprint compared with existing approaches. Additionally, ease of sharing, given EMPeror’s small output file size, enables agile collaboration by allowing users to embed these visualizations via emails or web pages without the need for extra plugins.
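One of the features listed above, scaling the axes according to the fraction of variance each explains, can be illustrated with a minimal sketch. The function name `scale_by_variance` and the choice to normalize against the largest percentage are assumptions made for illustration; they are not taken from the EMPeror source code.

```python
def scale_by_variance(coords, percent_explained):
    """Rescale each principal coordinate axis by its share of explained variance.

    coords: list of samples, each a list of coordinates (one value per axis).
    percent_explained: percent of variance explained by each axis.

    Assumption: every axis is scaled relative to the axis that explains
    the most variance, so that axis keeps its original range.
    """
    largest = max(percent_explained)
    if largest <= 0:
        raise ValueError("at least one axis must explain a positive share of variance")
    factors = [p / largest for p in percent_explained]
    return [[value * f for value, f in zip(sample, factors)] for sample in coords]


# Hypothetical example: three axes explaining 50%, 25% and 10% of the variance.
scaled = scale_by_variance([[1.0, 1.0, 1.0], [-2.0, 4.0, 0.5]], [50.0, 25.0, 10.0])
```

With this scaling, distances along axes that explain little variance shrink, so the visual spread of the points reflects the relative importance of each axis.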
Here we present EMPeror, an open source and web browser enabled tool with a versatile command line interface that allows researchers to perform rapid exploratory investigations of 3D visualizations of microbial community data, such as the widely used principal coordinates plots. EMPeror includes a rich set of controllers to modify features as a function of the metadata. By being specifically tailored to the requirements of microbial ecologists, EMPeror thus increases the speed with which insight can be gained from large microbiome datasets.
Keywords: Microbial ecology, QIIME, Data visualization
Rapid increases in sequencing capacity are greatly expanding our ability to understand the microbial world: scaling from a handful of samples to hundreds, or thousands, allows a rich picture of trends over temporal and spatial scales that were previously unattainable. Human microbiome studies are not the only beneficiaries of this ability to perform increased sampling: large-scale patterns are now being discovered in communities ranging from soils to oceans, including the efforts from the International Census of Marine Microbes (ICoMM). We can now process thousands of samples in a single sequencing run, and in turn computational tools must also scale to fulfill these needs.
There are several existing methods for displaying PCoA results, but none to date is specifically designed to account for the common use cases in this research field; furthermore, each of the most representative solutions has its own limitations. For example, QIIME, an open source framework for upstream and downstream analysis of microbial community samples generated via high-throughput sequencing instruments, typically generates 3D plots using KiNG, which was originally designed as a molecular graphics viewer. KiNG requires static files containing each metadata field to be produced in advance, replicating the coordinates for each of these categories and resulting in long load times and large file sizes when the metadata are rich. SpotFire is a commercial solution whose cost is beyond the budget of many research laboratories. Generic packages that provide 3D plotting functionalities, such as MATLAB, Mathematica, R, Excel or Matplotlib, can always be used, but custom code or manual approaches are typically required to relate each point to a specific visual feature intended to highlight a given variable. Consequently, this can become a time-consuming process that compromises reliability, reusability and reproducibility. Moreover, none of the previously mentioned applications is specifically modeled to support the workflows of the modern microbial ecologist. Allowing the user to choose among metadata coloring dynamically, and separating coloring from visibility, has a surprisingly large effect in encouraging interactive exploration, understanding and analysis, and often allows insights into the main factors structuring the data, as well as more subtle ones, to be obtained much more rapidly.
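The contrast drawn above, between replicating coordinates for every metadata field and choosing the coloring dynamically while keeping it separate from visibility, can be sketched as a small data structure. This is an illustrative sketch only: the class `OrdinationView`, its attributes and the sample data are hypothetical and do not describe EMPeror's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class OrdinationView:
    """Coordinates are stored once; color and visibility are derived on demand."""
    coords: dict      # sample id -> (x, y, z), never duplicated per category
    metadata: dict    # sample id -> {category name: value}
    color_by: str     # category currently used for coloring
    hidden: set = field(default_factory=set)  # (category, value) pairs to hide

    def colors(self, palette):
        """Map each sample to a color based on the active category."""
        values = sorted({row[self.color_by] for row in self.metadata.values()})
        value_to_color = {v: palette[i % len(palette)] for i, v in enumerate(values)}
        return {sid: value_to_color[row[self.color_by]]
                for sid, row in self.metadata.items()}

    def visible(self):
        """Return the sample ids not hidden by any (category, value) filter."""
        return [sid for sid, row in self.metadata.items()
                if not any(row.get(cat) == val for cat, val in self.hidden)]


# Hypothetical data: color by body site while hiding one subject.
view = OrdinationView(
    coords={"s1": (0.1, 0.2, 0.3), "s2": (0.4, 0.1, 0.0), "s3": (-0.2, 0.3, 0.1)},
    metadata={
        "s1": {"body_site": "gut", "subject": "A"},
        "s2": {"body_site": "skin", "subject": "B"},
        "s3": {"body_site": "gut", "subject": "B"},
    },
    color_by="body_site",
)
view.hidden.add(("subject", "B"))
sample_colors = view.colors(["red", "blue"])
shown = view.visible()
```

Because the coordinates are held once and the coloring category is only a key into the metadata, switching `color_by` or changing `hidden` does not require regenerating or reloading any coordinate data.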
Studies used to create Figure 1
Moving pictures of the human microbiome: samples collected from two subjects for up to 15 months at three body sites (oral, skin and gut).
Bacterial community variation in human body habitats across space and time: samples from eight healthy adult subjects at up to 27 body sites.
Structure, function and diversity of the healthy human microbiome: samples from 242 healthy adults at up to eighteen different body sites.
Succession of microbial consortia in the developing infant gut microbiome: gut samples collected biweekly from an infant through the first 2.5 years of life.
EMPeror installation instructions can be found in the online documentation (http://qiime.org/emperor/installation_index.html).
EMPeror provides a user-friendly interface and set of tools for visualizing large numbers of microbial community samples associated with increasingly extensive metadata, and interactively manipulating these datasets to add auxiliary data and visualization techniques. Additionally, it contains several user interface features, enabling straightforward modifications and customization of perceptible aspects in the plot plus the incorporation of statistical techniques, which also help increase the ease and speed of exploratory analysis. We believe that EMPeror will have a large impact on the field, especially for large-scale environmental sampling projects, such as the Earth Microbiome Project, and large-scale clinical projects, such as the Human Microbiome Project.
Availability and requirements
Project name: Emperor
Project home page: http://emperor.colorado.edu
Operating system(s): Platform independent for the graphical user interface; OS X (10.6 and higher) and Linux only for the command line interface.
Other Requirements: Python 2.7, Chrome, QIIME (python libraries only), NumPy, BIOM 1.1.0 and PyCogent.
License: Modified BSD.
Any restrictions to use by non-academics: None.
Availability of supporting data
YVB, AG, MP and RK are developers and or leaders of the QIIME project.
EMP: Earth Microbiome Project
HTML5: HyperText Markup Language, version 5
ICoMM: International Census of Marine Microbes
MIxS: Minimum information about any (x) sequence
PCoA: Principal Coordinates Analysis
PyCogent: Comparative and Genomic Toolkit
QIIME: Quantitative Insights into Microbial Ecology
WebGL: Web Graphics Library.
We thank Jackson Chen, Jai Ram Rideout, Daniel McDonald, William Van Treuren, Jose Antonio Navas-Molina, Nicholas A. Bokulich, Adam Robbins-Pianka and Greg Caporaso for feedback and useful discussion regarding the design and implementation of the software package.
This work was supported in part by the National Institutes of Health, the Crohn’s and Colitis Foundation of America, the Alfred P. Sloan Foundation, and the Howard Hughes Medical Institute.
- Lauber CL, Hamady M, Knight R, Fierer N: Pyrosequencing-based assessment of soil pH as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol. 2009, 75: 5111-5120. 10.1128/AEM.00335-09.
- Harris R: The L4 time-series: the first 20 years. J Plankton Res. 2010, 32: 577-583. 10.1093/plankt/fbq021.
- Caporaso JG, Lauber CL, Costello EK, Berg-Lyons D, Gonzalez A, Stombaugh J, Knights D, Gajer P, Ravel J, Fierer N: Moving pictures of the human microbiome. Genome Biol. 2011, 12: R50. 10.1186/gb-2011-12-5-r50.
- Gonzalez A, Knight R: Advancing analytical algorithms and pipelines for billions of microbial sequences. Curr Opin Biotechnol. 2012, 23: 64-71. 10.1016/j.copbio.2011.11.028.
- O’Donoghue SI, Gavin AC, Gehlenborg N, Goodsell DS, Heriche JK, Nielsen CB, North C, Olson AJ, Procter JB, Shattuck DW: Visualizing biological data-now and in the future. Nat Methods. 2010, 7: S2-4. 10.1038/nmeth.f.301.
- Gower JC, Legendre P: Metric and euclidean properties of dissimilarity coefficients. J Classif. 1986, 3: 5-48. 10.1007/BF01896809.
- Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI: QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010, 7: 335-336. 10.1038/nmeth.f.303.
- Chen VB, Davis IW, Richardson DC: KING (kinemage, next generation): a versatile interactive molecular and scientific visualization program. Protein Sci. 2009, 18: 2403-2409. 10.1002/pro.250.
- TIBCO-Software: Spotfire. 2013, Somerville, Massachusetts: TIBCO Software
- The-MathWorks-Inc: MATLAB: The Language of Technical Computing. 2013, Natick, Massachusetts: The MathWorks Inc
- Wolfram-Research: Mathematica, Version 8.0. 2010, Champaign, Illinois: Wolfram Research, Inc
- R-Core-Team: R: A language and environment for statistical computing. 2013, Vienna, Austria: R Foundation for Statistical Computing
- Microsoft: Microsoft Excel. 2011, Redmond, Washington: Microsoft
- Hunter JD: Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007, 9: 90-95.
- Knight R, Maxwell P, Birmingham A, Carnes J, Caporaso JG, Easton BC, Eaton M, Hamady M, Lindsay H, Liu Z: PyCogent: a toolkit for making sense from sequence. Genome Biol. 2007, 8: R171. 10.1186/gb-2007-8-8-r171.
- Evident: elucidating sampling effort for microbial analysis studies. [http://github.com/qiime/evident]
- Hewitt KM, Mannino FL, Gonzalez A, Chase JH, Caporaso JG, Knight R, Kelley ST: Bacterial diversity in two neonatal intensive care units (NICUs). PLoS One. 2013, 8: e54703. 10.1371/journal.pone.0054703.
- Muegge BD, Kuczynski J, Knights D, Clemente JC, Gonzalez A, Fontana L, Henrissat B, Knight R, Gordon JI: Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans. Science. 2011, 332: 970-974. 10.1126/science.1198719.
- Kuczynski J, Stombaugh J, Walters WA, Gonzalez A, Caporaso JG, Knight R: Using QIIME to analyze 16S rRNA gene sequences from microbial communities. Curr Protoc Microbiol. 2012, Chapter 1: Unit 1E 5.
- Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R: Bacterial community variation in human body habitats across space and time. Science. 2009, 326: 1694-1697. 10.1126/science.1177486.
- HMP-Consortium: Structure, function and diversity of the healthy human microbiome. Nature. 2012, 486: 207-214. 10.1038/nature11234.
- Koenig JE, Spor A, Scalfone N, Fricker AD, Stombaugh J, Knight R, Angenent LT, Ley RE: Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci USA. 2011, 108 (Suppl 1): 4578-4585.
- QIIME web application. [http://www.microbio.me/qiime/]
- Vázquez-Baeza Y, Pirrung M, Gonzalez A, Knight R: Example files and supporting material for “EMPeror: an interactive analysis and visualization tool for high throughput microbial ecology datasets”. GigaScience Database. 2013, http://dx.doi.org/10.5524/100068.
- EMPeror ftp page. [ftp://thebeast.colorado.edu/pub/emperor_files/]
- Gilbert JA, Meyer F, Antonopoulos D, Balaji P, Brown CT, Desai N, Eisen JA, Evers D, Field D, Feng W: Meeting report: the terabase metagenomics workshop and the vision of an earth microbiome project. Stand Genomic Sci. 2010, 3: 243-248. 10.4056/sigs.1433550.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.