The Taub Faculty of Computer Science Events and Talks
Sasha Gusev (Columbia University)
Thursday, 24.07.2008, 13:30
Identifying and quantifying the sharing of genetic material between individuals within a population is
an important step in accurately using genealogical relationships for disease analysis as well as improving
our understanding of demography. However, exhaustive pairwise analysis, which has been successful in
small cohorts, cannot keep up with the current torrent of genotype data. We present GERMLINE, a robust
algorithm for identifying pairwise segmental sharing which scales linearly with the number of input individuals.
Our approach is based on a dictionary of haplotypes, used to efficiently discover short exact matches
between individuals and then expand these matches to identify long, nearly-identical segmental sharing that
is indicative of relatedness. We use GERMLINE to survey hidden relatedness both in the HapMap and in
a densely typed island population of 3,000 individuals. We verify that GERMLINE is in concordance with
other methods when they can process the data, and that it also facilitates analysis of larger-scale studies.
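The dictionary-based seed-and-extend idea described above can be illustrated with a minimal Python sketch. This is not the GERMLINE implementation: the function names, the fixed window size, the mismatch tolerance, and the representation of phased haplotypes as equal-length allele strings are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming(x, y):
    """Count positions at which two equal-length allele strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(x, y))


def shared_segments(haplotypes, window=4, min_len=8, max_mismatch=1):
    """Hypothetical seed-and-extend search over phased haplotypes.

    haplotypes: dict mapping a sample name to an allele string;
    all strings are assumed to have the same length.
    Returns (name_a, name_b, start_marker, end_marker) tuples,
    with end_marker exclusive.
    """
    n_markers = len(next(iter(haplotypes.values())))
    n_windows = n_markers // window

    # Seed step: in each window, a dictionary keyed by the haplotype
    # slice groups together samples that match exactly.
    seeds = defaultdict(set)
    for w in range(n_windows):
        lo, hi = w * window, (w + 1) * window
        buckets = defaultdict(list)
        for name, hap in haplotypes.items():
            buckets[hap[lo:hi]].append(name)
        for names in buckets.values():
            for pair in combinations(sorted(names), 2):
                seeds[pair].add(w)

    # Extend step: grow each seed left and right through windows
    # that differ by at most max_mismatch alleles.
    results = []
    for (a, b), seed_windows in sorted(seeds.items()):
        ha, hb = haplotypes[a], haplotypes[b]

        def compatible(w):
            lo, hi = w * window, (w + 1) * window
            return hamming(ha[lo:hi], hb[lo:hi]) <= max_mismatch

        covered = set()
        for w in sorted(seed_windows):
            if w in covered:
                continue  # already part of an extended segment
            start = w
            while start > 0 and compatible(start - 1):
                start -= 1
            end = w
            while end + 1 < n_windows and compatible(end + 1):
                end += 1
            covered.update(range(start, end + 1))
            if (end - start + 1) * window >= min_len:
                results.append((a, b, start * window, (end + 1) * window))
    return results


toy = {
    "A": "0101110010100011",
    "B": "0101110110101111",
    "C": "1010001101011100",
}
print(shared_segments(toy))
```

In this toy input, samples A and B match exactly in the first and third windows and differ by a single allele in the second, so the two seeds are merged into one nearly identical segment, while C shares no seed with either. Only samples that fall into the same dictionary bucket are ever compared, which is what avoids the exhaustive all-pairs scan.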
We also demonstrate novel applications of precise analysis of hidden relatedness to detection of haplotype
phasing errors and structural variation. We show that shared segment discovery can help identify phasing
errors and potentially resolve them. Additionally, we use detected identity of genomic segments to expose
polymorphic deletions that are otherwise challenging to detect, with 8/14 deletions in the HapMap samples
and 153/200 deletions in the island data having independent experimental validation.
Reinforcing the potential for a population-based approach to linkage analysis, we have successfully begun
applying GERMLINE to larger and more outbred populations, as well as to quantitative trait mapping using
shared haplotypes, identifying a novel haplotype associated with an increase in plant sterol levels.