Table 1 Long Fragment Read-specific fields

From: The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes

Field

Description

hapLink

LFR-phased variants carry an ID of the form “Phased_#_#_#”, where each # is an integer. The first two integers describe a unique contig, and the last is either 1 or 0, representing the two possible haplotypes of that contig. All SNPs sharing the same “Phased_#_#_#” ID are from the same haplotype.

wellCount

Total number of LFR wells (out of 384) containing sequence reads that call the variant or the reference allele. This metric is used to filter polymerase-induced false-positive calls, because random polymerase errors are unlikely to occur in several different wells. A complete explanation of this concept can be found in Peters et al. [16].

wellIDs

Contains the IDs of the specific wells from which reads calling the variant originate.

exclusiveWellCount

At each locus, this is the number of wells whose reads call only the variant or only the reference allele, not both; for true heterozygous variants, this number should be close to “wellCount”.

SharedWellCount

At each locus, this is the number of wells that contain reads calling both alleles; for true heterozygous variants, this should be low. A high number here suggests mapping errors. For homozygous variants, almost all of the well counts should be in this field.

MinExclusiveWellCountInThisLocus

At each locus, this is the minimum number of exclusive wells (non-shared well counts).

MaxExclusiveWellCountInThisLocus

At each locus, this is the maximum number of exclusive wells (non-shared well counts).

LFR: Long Fragment Read; SNP: single nucleotide polymorphism
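
The fields above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that the first two integers of a hapLink ID together identify a contig, and that the wells supporting each allele are available as sets of well IDs. The function names and input shapes are hypothetical, and the on-disk format of wellIDs is not specified in the table.

```python
import re
from collections import defaultdict

# Pattern taken from the hapLink description: "Phased_#_#_#",
# where the last integer is 0 or 1.
HAPLINK_RE = re.compile(r"^Phased_(\d+)_(\d+)_([01])$")


def parse_hap_link(hap_link):
    """Split a hapLink ID into (contig_key, haplotype).

    Assumption: the first two integers jointly identify the contig.
    """
    match = HAPLINK_RE.match(hap_link)
    if match is None:
        raise ValueError(f"not a hapLink ID: {hap_link!r}")
    contig_key = (int(match.group(1)), int(match.group(2)))
    return contig_key, int(match.group(3))


def group_by_haplotype(variants):
    """Group variant names that share the same hapLink ID.

    `variants` is an iterable of (name, hap_link) pairs (hypothetical shape).
    """
    groups = defaultdict(list)
    for name, hap_link in variants:
        groups[parse_hap_link(hap_link)].append(name)
    return dict(groups)


def well_counts(ref_wells, alt_wells):
    """Derive total, exclusive, and shared well counts at one locus.

    `ref_wells` and `alt_wells` are the well IDs whose reads call the
    reference and the variant allele, respectively (hypothetical inputs).
    """
    ref_wells, alt_wells = set(ref_wells), set(alt_wells)
    return {
        "total": len(ref_wells | alt_wells),      # wells calling either allele
        "exclusive": len(ref_wells ^ alt_wells),  # wells calling exactly one allele
        "shared": len(ref_wells & alt_wells),     # wells calling both alleles
    }
```

Under these assumptions, a heterozygous call that looks reliable would show an exclusive count close to the total and a shared count near zero, matching the descriptions of exclusiveWellCount and SharedWellCount.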