From: The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes

Venn diagram of the overlap between Personal Genome Project variants and those from the 1000 Genomes Project and the Single Nucleotide Polymorphism database. Single nucleotide polymorphisms (SNPs) from all 225 Personal Genome Project (PGP) genomic libraries (Additional file 1) were filtered with the following criteria: 1) Each SNP must have a PASS in the “varFilter” field; this helps remove false-positive errors. 2) The variant call, and for heterozygous SNPs also the reference call, must have a “wellCount” of six or more; this removes most of the remaining false-positive errors. 3) For heterozygous SNPs, the “SharedWellCount” field must be less than or equal to 0.25 × (“MinExclusiveWellCountInThisLocus” + “SharedWellCount”); this removes potential mapping errors that result in an excess of wells in which both the reference and the variant base are called. This combination of filters has previously been shown [16] to remove the vast majority of false-positive errors and was chosen to create a set of very high confidence variants. This set was compared with variants in the 1000 Genomes Project (1KG, Phase 3) and the Single Nucleotide Polymorphism database (dbSNP, Build 147) datasets. In total, more than 17 million SNPs were found in the PGP samples, and these were compared with over 81 million SNPs in 1KG and over 142 million in dbSNP. As expected, more than 85% of the SNPs found in the PGP samples were also present in the 1KG and/or dbSNP datasets.
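The three filtering criteria and the overlap comparison can be sketched in Python as follows. This is only an illustrative sketch, not the authors' pipeline: the `SnpCall` record, its attribute names, and the helper functions `passes_filters` and `fraction_known` are hypothetical, and it assumes that the variant and reference “wellCount” values are available as separate fields and that SNPs can be compared as (chromosome, position, alternate allele) keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical record for one SNP call; not the actual file layout."""
    chrom: str
    pos: int
    alt: str
    var_filter: str                # value of the "varFilter" field
    is_het: bool                   # True for heterozygous SNPs
    variant_well_count: int        # "wellCount" of the variant call
    reference_well_count: int      # "wellCount" of the reference call
    shared_well_count: int         # "SharedWellCount"
    min_exclusive_well_count: int  # "MinExclusiveWellCountInThisLocus"


def passes_filters(call, min_wells=6, max_shared_fraction=0.25):
    """Apply criteria 1 to 3 from the caption to a single call."""
    # 1) The call must have PASS in the varFilter field.
    if call.var_filter != "PASS":
        return False
    # 2) The variant call must be supported by at least six wells.
    if call.variant_well_count < min_wells:
        return False
    if call.is_het:
        # 2) For heterozygous SNPs the reference call needs six wells too.
        if call.reference_well_count < min_wells:
            return False
        # 3) Shared wells must not exceed 0.25 x (exclusive + shared).
        total = call.min_exclusive_well_count + call.shared_well_count
        if call.shared_well_count > max_shared_fraction * total:
            return False
    return True


def fraction_known(pgp_keys, *reference_sets):
    """Fraction of PGP SNP keys present in at least one reference set."""
    known = set().union(*reference_sets)
    return len(pgp_keys & known) / len(pgp_keys) if pgp_keys else 0.0
```

With sets of keys built from the filtered PGP calls, 1KG and dbSNP, `fraction_known(pgp, kg, dbsnp)` would give the share of PGP SNPs already catalogued in either resource, which is the quantity the caption reports as exceeding 85%.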

