
The Taub Faculty of Computer Science Events and Talks

Statistically dense intervals in binary sequences with applications to assessing local enrichment in the human genome
Shahar Mor (M.Sc. Thesis Seminar)
Tuesday, 06.08.2024, 11:00
Advisor: Prof. Zohar Yakhini

Statistical enrichment tools are highly useful in biological research. Current approaches to statistical enrichment in ranked or ordered lists, such as GSEA and GOrilla, are limited to the prefix or suffix of the list: they assess extreme density of 1s at either end of a binary vector. Statistical significance can be assigned using, e.g., the Wilcoxon rank-sum and mHG statistics.
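To make the prefix-based setting concrete, the following is a minimal sketch of the minimum-hypergeometric (mHG) statistic as it is commonly defined: the smallest hypergeometric tail probability over all prefixes of a binary vector. The function names `hypergeom_tail` and `mhg_prefix` are hypothetical and chosen for illustration; this is not the code of GOrilla or of any published tool. Note that the minimum is a statistic rather than a p-value, since it is taken over many prefixes and still needs a correction.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N items, B ones, n drawn)."""
    upper = min(n, B)
    favorable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favorable / comb(N, n)


def mhg_prefix(bits):
    """Return (smallest tail probability, prefix length) over all prefixes.

    Illustrative only: the returned minimum is not a p-value, because it
    is not corrected for having scanned every prefix.
    """
    N, B = len(bits), sum(bits)
    best_p, best_n = 1.0, 0
    ones = 0  # number of 1s seen in the current prefix
    for n in range(1, N + 1):
        ones += bits[n - 1]
        p = hypergeom_tail(N, B, n, ones)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n
```

For example, `mhg_prefix([1, 1, 1, 0, 0, 0, 0, 0])` selects the prefix of length 3, which holds all three 1s.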
In this work we extend the mHG approach to address enrichment in any index interval of the binary vector. We define related distributions under a uniform null model and provide a partial characterization of them, which yields useful bounds for extreme events. We provide the community with a Python implementation of the method, called imHG, which finds and reports enriched genomic intervals for any given list of genes of interest. Finally, we analyze several example use cases and describe the results. We show, for example, that lung cancer differential expression, comparing ADC to other types, is enriched in a region of Chromosome 3. This example represents a typical use case for imHG: obtaining enriched intervals for any set of genes of interest.
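The idea of scanning arbitrary index intervals can be illustrated with a brute-force sketch. It assumes that, under a uniform null over binary vectors with a fixed number of 1s, the count of 1s in any fixed interval is hypergeometric, and it reports the interval with the smallest tail probability. The names `hypergeom_tail` and `best_interval` are hypothetical; this is not the imHG implementation, and it does not include the bounds or significance assessment developed in the thesis.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N items, B ones, n drawn)."""
    upper = min(n, B)
    favorable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favorable / comb(N, n)


def best_interval(bits):
    """Return (smallest tail probability, start, end) over all intervals.

    Intervals are half-open, bits[start:end]. Brute force over all
    O(N^2) intervals; the minimum is an uncorrected statistic, not a
    p-value.
    """
    N, B = len(bits), sum(bits)
    # prefix[j] = number of 1s in bits[:j], so any interval count is O(1)
    prefix = [0]
    for x in bits:
        prefix.append(prefix[-1] + x)
    best = (1.0, 0, 0)
    for i in range(N):
        for j in range(i + 1, N + 1):
            ones = prefix[j] - prefix[i]
            p = hypergeom_tail(N, B, j - i, ones)
            if p < best[0]:
                best = (p, i, j)
    return best
```

For example, `best_interval([0, 0, 1, 1, 1, 0, 0, 0])` reports the half-open interval starting at index 2 and ending at index 5, which a prefix-only or suffix-only scan would not isolate.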