In this talk, I present three research directions from my PhD.
1. Geometric Learning for Structured Data. Firstly, we introduce a simple spectral-geometric approach to matrix completion on graphs. Our approach couples the priors implicitly induced by gradient descent with explicitly imposed spectral-geometric priors, and achieves strong performance in drug-target interaction prediction and in recommender-system applications. Secondly, we introduce inductive (generalizable) solvers for quadratic optimal transport problems, demonstrating superior scalability over transductive methods, with applications in single-cell multi-omic alignment. Finally, on the practical front, we present algorithms for efficient execution of graph machine learning workloads in large-scale recommender-system inference.
2. Statistical Inference via Optimal Transport. In this line of work, we develop new solvers and extensions for vector quantile regression (VQR), an optimal-transport–based statistical framework introduced by Carlier et al. (2016). Firstly, we introduce nonlinear VQR, the first practical approach to modeling quantile functions of multivariate conditional distributions, and demonstrate applications in constructing calibrated high-dimensional confidence sets. Secondly, we propose manifold VQR, which extends the notion of conditional quantile functions to manifold-valued response variables. Finally, we introduce continuous solvers for VQR that work by solving conditional continuous OT problems. Using the derived OT maps, we perform fundamental statistical inference tasks on conditional distributions: sampling, computing likelihoods, constructing confidence sets, and computing order statistics (ranks, quantiles, etc.).
3. Modeling Biomolecules. In this research direction, we make progress toward overcoming the "single-sequence, single-structure" dogma in structural biology, and highlight fundamental limitations of protein structure predictors such as AlphaFold. Firstly, we investigate protein structures that exhibit dual conformations within a single crystal, called "altlocs", and identify "stable altlocs": those altloc dual conformations that are thermodynamically stable. We demonstrate that state-of-the-art protein structure predictors and protein structure generative models fail to recover these dual conformations. We then introduce a guided diffusion framework that improves the modeling of biomolecules by conditioning on experimental measurements. Secondly, we demonstrate the limitations of protein structure prediction algorithms in modeling chimeric proteins. To overcome this, we introduce windowed multiple sequence alignment (MSA), which enriches the MSA of chimeric proteins to yield improved predictions.
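
To make the matrix-completion idea in direction 1 more concrete, here is a minimal toy sketch. It is not the method presented in the talk. It only illustrates the general pattern of combining gradient descent on a factorized matrix with an explicit graph-smoothness penalty. Everything in it is an assumption made for illustration: the path graphs standing in for real row and column graphs, the Dirichlet-energy penalty, the function names `path_laplacian`, `complete` and `demo`, and all hyperparameter values.

```python
import numpy as np


def path_laplacian(n):
    """Laplacian of a path graph on n nodes (toy stand-in for a real similarity graph)."""
    adj = np.zeros((n, n))
    i = np.arange(n - 1)
    adj[i, i + 1] = 1.0
    adj[i + 1, i] = 1.0
    return np.diag(adj.sum(axis=1)) - adj


def complete(M, mask, L_rows, L_cols, rank=4, mu=0.1, lr=0.01, steps=2000, seed=0):
    """Fill in the unobserved entries of M (mask == 0) by plain gradient descent.

    Illustrative objective (an assumption, not taken from the talk):
        0.5 * ||mask * (X - M)||_F^2
        + 0.5 * mu * (tr(X^T L_rows X) + tr(X L_cols X^T)),   with X = A @ B.T
    The small random initialisation of the factors A and B stands in for the
    implicit prior of gradient descent; the Laplacian terms are the explicit
    graph-smoothness prior.
    """
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((M.shape[0], rank))
    B = 0.1 * rng.standard_normal((M.shape[1], rank))
    for _ in range(steps):
        X = A @ B.T
        # Gradient of the objective with respect to X (both Laplacians are symmetric).
        G = mask * (X - M) + mu * (L_rows @ X + X @ L_cols)
        # Chain rule through X = A @ B.T; both factors are updated from the old values.
        A, B = A - lr * (G @ B), B - lr * (G.T @ A)
    return A @ B.T


def demo(seed=0):
    """Synthetic check: a smooth rank-1 matrix with about half of its entries hidden."""
    rng = np.random.default_rng(seed)
    u = np.sin(np.linspace(0.0, np.pi, 30))
    v = np.cos(np.linspace(0.0, np.pi, 20))
    M = np.outer(u, v)
    mask = (rng.random(M.shape) < 0.5).astype(float)
    X = complete(M, mask, path_laplacian(30), path_laplacian(20))
    held_out = 1.0 - mask
    err = np.linalg.norm(held_out * (X - M))   # error on the hidden entries
    baseline = np.linalg.norm(held_out * M)    # error of predicting zeros
    return err, baseline
```

Calling `demo()` returns the reconstruction error on the hidden entries together with the error of an all-zeros guess, so the two can be compared directly. In a real setting the path graphs would be replaced by, for example, drug-drug and target-target similarity graphs.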