The Genome Russia project: closing the largest remaining omission on the world Genome map

Abstract

We are witnessing the great era of genome exploration of the world, as genetic variation in people is being detailed across many diverse world populations in an effort unprecedented since the first human genome sequence appeared in 2001. However, these efforts have yet to produce a comprehensive mapping of humankind, because important regions of modern human civilization remain unexplored. The Genome Russia Project promises to fill one of the largest gaps, the expansive regions across the Russian Federation, informing not just the medical genomics of the territories, but also the migrations and settlement of historic and pre-historic Eurasian peoples.

Background

Mapping the unabridged pattern of human genetic variation across the world represents one of the greatest exploration projects since the genomics era began in 2001 with a published draft of the human genome. Driven by the availability of samples and by advances in next-generation sequencing over the last decade, whole-genome sequencing has scaled up from sequencing the personal genomes of a few audacious scientists (Drs. Venter and Watson) to carrying out global surveys of individual genomes, best represented by the 1000 Genomes Project [1, 2].

In the three years since the first 1000 Genomes consortium paper on human diversity was published, attention has shifted to national population genome projects. These include, for example, the 100,000 UK Genome Project, the Asian Genome Project, the Chinese Million Genomes endeavor, the African Genome Sequence Variation project, as well as whole-genome sequence population studies in the Netherlands, Qatar, Turkey, and Japan [3]. Together, these projects serve as a major global reference resource for human genetic variation and provide a new roadmap and power for disease variant discovery. However, even taken together, they still leave the genome map of humankind incomplete.

Looking at a world map showing these dynamic developments in genome sequencing, one cannot help but notice a great “wide gap” in the center (Fig. 1a): from the Baltic Sea to the Bering Strait, Russia remains the largest swath of land, and people, for which the human genome landscape remains relatively unexplored. Note that even the larger population SNP array genotyping projects, such as HGDP (~52 populations sampled worldwide) and HapMap, have little representation of ethnic groups in Russia (Fig. 1b, [1, 4]). Also, the European and East Asian population groups in the 1000 Genomes Project do not capture the rich background of genomic diversity in this part of the world, partly because of a difference in ancestry and partly because of its history of admixture (Fig. 1a). Recent population genetic studies of Russian indigenous populations have primarily employed mtDNA, STR, Y-chromosome haplogroups and genome-wide SNP variants in certain regional ethnic populations, with little done on more comprehensive whole-genome sequencing of Russian people (for citations see: http://genomerussia.bio.spbu.ru/?lang=en).

Fig. 1

Distribution of publicly available genome sequences. a Worldwide locations of population samples with whole-genome data from the 1000 Genomes Project [1]. Each circle represents the number of genome sequences in the final release, publicly available at www.1000genomes.org. ASIA: BEB Bengali in Bangladesh; CDX Chinese Dai in Xishuangbanna, China; CHB Han Chinese in Beijing, China; CHS Southern Han Chinese, China; GIH Gujarati Indian in Houston, TX; ITU Indian Telugu in the UK; JPT Japanese in Tokyo, Japan; KHV Kinh in Ho Chi Minh City, Vietnam; PJL Punjabi in Lahore, Pakistan; STU Sri Lankan Tamil in the UK. AFRICA: ACB African Caribbean in Barbados; ASW African Ancestry in Southwest USA; ESN Esan in Nigeria; GWD Gambian in Western Division, The Gambia; LWK Luhya in Webuye, Kenya; MSL Mende in Sierra Leone; YRI Yoruba in Ibadan, Nigeria. EUROPE: CEU Utah residents with Northern and Western European ancestry, USA; FIN Finnish in Finland; GBR British in England and Scotland; IBS Iberian in Spain; TSI Toscani in Italy. THE AMERICAS: CLM Colombian in Medellin, Colombia; MXL Mexican Ancestry in Los Angeles, USA; PEL Peruvian in Lima, Peru; PUR Puerto Rican in Puerto Rico. The dotted circles indicate populations that were collected in diaspora. b Eastern Hemisphere locations of population samples in surveys of worldwide genetic variation (HapMap, 1000 Genomes Project Phase 1, and HGDP) [1, 4]. c Major human migration routes (adapted from [10]) and locations of other hominid remains outside Africa. The approximate locations of major Neanderthal and Denisovan finds are indicated by glowing circles.

This gap is problematic given that the historic migratory milestones that founded modern Russian populations include the northward and westward expansion of the Indo-Europeans and the Uralic people, the westward expansion of the Turkic people, and centuries of admixture between them (Fig. 1c). Further, the routes for peopling Northern and Central Europe inevitably led through this territory. Waves of great human migrations of recorded history then pushed this way for centuries, followed by a known exchange of knowledge, technology and, likely, genes along the Silk Road (Fig. 1c). These myriad migrations have created the complex patchwork of human diversity that is today’s Russia, and somewhere in Siberia reside descendants of the populations ancestral to modern Native Americans.

In the more distant past, gene exchange likely occurred between Homo sapiens and the Neanderthal and Denisovan populations they encountered. The genetic contribution of the Neanderthal has not been well studied beyond Western Europe, nor has that of the Denisovan beyond South East Asia, despite their physical remains being unearthed in Siberia [5, 6]. Russian populations very likely contain ancestral components that are not easily found in the populations represented in the 1000 Genomes Project or even in the comprehensive HGDP database. Hence, Russia needs a national genome project of its own.

With the current moderate cost of genome sequencing, a Russia-centric project will likely be much less expensive than its predecessors, and it would bring numerous benefits to our understanding of population origins and disease on local, global and evolutionary scales, as detailed in Table 1.

Table 1 Six real benefits of the Genome Russia Project to Russia, to science, and to the world genomics community

The justifications for collecting, sequencing and analyzing populations from Russia in the immediate, rather than some distant, future all rest on the enormous significance that these populations have in the history of humankind and their value as a reservoir of knowledge about our health. Without filling the great “wide gap” on the genetic map of the world, we will remain handicapped in achieving our major goals for the use of genomic information. The beginnings of such a Genome Russia Project are in fact being met with growing enthusiasm, as seen in its endorsement by the Russian Academy of Sciences and the Russian Ministry of Education and Science in a concerted effort to make it happen (http://genomerussia.bio.spbu.ru/?lang=en). While political diplomacies continue [9], the Genome Russia Project can and should become an example of international collaboration on common ground, with the common goal of improving human health and well-being.

Abbreviations

GWAS: Genome Wide Association Study

HGDP: Human Genome Diversity Project

mtDNA: mitochondrial deoxyribonucleic acid

SNP: Single Nucleotide Polymorphism

STR: Short Tandem Repeat

References

  1. Auton A, Abecasis GR, The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74.

  2. Green ED, Watson JD, Collins FS. Human Genome Project: Twenty-five years of big biology. Nature. 2015;526:29–31.

  3. Kaiser J. Who has your DNA –or wants it? Science. 2015;349:1475.

  4. Auton A, Bryc K, Boyko AR, Lohmueller KE, Novembre J, Reynolds A, et al. Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome Res. 2009;19(5):795–803.

  5. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010;468(7327):1053–60.

  6. Sankararaman S, Mallick S, Dannemann M, Prufer K, Kelso J, Paabo S, et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature. 2014;507(7492):354–7.

  7. Smith MW, O’Brien SJ. Mapping by admixture linkage disequilibrium: advances, limitations and guidelines. Nat Rev Genet. 2005;6:623–32.

  8. Cheng CY, Kao WH, Patterson N, Tandon A, Haiman CA, Harris TB, et al. Admixture mapping of 15,280 African Americans identifies obesity susceptibility loci on chromosomes 5 and X. PLoS Genet. 2009;5(5):e1000490.

  9. Schiermeier Q. Secret Service to vet manuscripts. Nature. 2015;526:486.

  10. Stewart JB, Chinnery PF. The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat Rev Genet. 2015;16(9):530–42.

Acknowledgements

TKO, VB and SJO (as PI) were supported by Russian Ministry of Science Mega-grant no. 11.G34.31.0068.

Author information

Corresponding author

Correspondence to Stephen J. O’Brien.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors drafted, read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Cite this article

Oleksyk, T.K., Brukhin, V. & O’Brien, S.J. The Genome Russia project: closing the largest remaining omission on the world Genome map. GigaSci 4, 53 (2015). https://doi.org/10.1186/s13742-015-0095-0
