Statistical enrichment tools are highly useful in biological research. Current approaches to statistical enrichment in ranked or ordered lists, such as GSEA and GOrilla, are limited to a prefix (or suffix) of the list. These methods assess an extreme density of 1s at either end of a binary vector. Statistical significance can be assigned using, e.g., the Wilcoxon rank-sum test or the minimum hypergeometric (mHG) statistic.
In this work we extend the mHG approach to also address enrichment in any index interval of the binary vector. We define the related distributions under a uniform null model and provide a partial characterization of them, which yields useful bounds for extreme events. We provide the community with a Python implementation, called imHG, that finds and reports enriched genomic intervals for any given list of genes of interest. Finally, we analyze several example use cases and describe the results. We show, for example, that lung cancer differential expression, comparing adenocarcinoma (ADC) to other types, is enriched in a region of Chromosome 3. This example represents a typical use case for imHG: obtaining enriched intervals for any set of genes of interest.
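
To make the interval extension concrete, the following is a minimal brute-force sketch of an interval minimum-hypergeometric statistic. It is an illustration under stated assumptions, not the imHG implementation: the function names `hg_tail` and `interval_mhg` are hypothetical, intervals are taken as half-open index ranges [i, j), and every interval is scanned exhaustively, which is practical only for short vectors.

```python
from itertools import accumulate
from math import comb


def hg_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N, B, n).

    N: vector length, B: total number of 1s,
    n: interval length, b: number of 1s observed in the interval.
    """
    upper = min(n, B)
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / comb(N, n)


def interval_mhg(v):
    """Return (statistic, i, j): the smallest hypergeometric tail over all
    half-open intervals [i, j) of the binary vector v, with its location.

    Hypothetical helper for illustration; exhaustive over all intervals.
    """
    N, B = len(v), sum(v)
    prefix = [0, *accumulate(v)]  # prefix[j] = number of 1s in v[:j]
    best = (1.0, 0, N)            # the whole vector always has tail 1
    for i in range(N):
        for j in range(i + 1, N + 1):
            b = prefix[j] - prefix[i]
            p = hg_tail(N, B, j - i, b)
            if p < best[0]:
                best = (p, i, j)
    return best


stat, start, end = interval_mhg([0, 0, 1, 1, 1, 0, 0, 0])
print(stat, start, end)  # minimum is attained on [2, 5), with value 1/56
```

The returned minimum is a statistic rather than a p-value, because it is selected over N(N+1)/2 intervals. Multiplying it by that count gives a crude union bound, shown here only as a baseline and not as the bounds derived in this work.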