Learning Subclasses of Junta from Membership Queries

Catherine Haddad-Zaknoon (Ph.D. Thesis Seminar)

Wednesday, 02.11.2022, 11:00

Zoom Lecture: 4966839889

Advisor: Prof. Nader Bshouty

In this work, we address the problem of learning subclasses of $\mathbb{JUNTA}$ under the {\it exact learning model from membership queries} or {\it black box queries}.
First, we address the problem of learning subclasses of decision trees from membership queries. For adaptive non-proper learning of decision trees of depth at most $d$, we give a randomized polynomial time algorithm that asks $\tilde O(2^{2d}) + 2^{d}\log n$ membership queries and a deterministic polynomial time algorithm that asks $2^{5.83d}+2^{2d+o(d)}\log n$ queries. We develop new results on the learnability of decision trees in the distribution-free model and use them to build a randomized proper learning algorithm that asks $m=\tilde{O}(2^{2d})\log(1/\delta)+O(2^d\log n)$ queries and runs in time $2^{O(d^2)}\log (1/\delta) +O(mn)$, and a deterministic proper learning algorithm for the class $\mathrm{DT}_d^s$ that asks $m=2^{2d+o(d)}\log n$ queries and runs in time $s^{O(d)}+O(mn)$.
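A standard subroutine behind the $\log n$-type query terms in membership-query learning is a binary search that locates a relevant variable of the target function. A minimal sketch of that subroutine (not the thesis algorithm itself; `f`, `x`, and `y` are illustrative):

```python
# Standard membership-query subroutine (illustrative, not the thesis
# algorithm): given assignments x, y with f(x) != f(y), binary-search
# over the positions where they differ to find one relevant variable
# of f using O(log n) membership queries.

def find_relevant_variable(f, x, y):
    """Index of a relevant variable of f, given f(x) != f(y)."""
    diff = [i for i in range(len(x)) if x[i] != y[i]]
    fx = f(x)
    while len(diff) > 1:
        half = diff[:len(diff) // 2]
        z = list(x)
        for i in half:              # move x toward y on half the bits
            z[i] = y[i]
        z = tuple(z)
        if f(z) != fx:              # value changed: a relevant variable
            y, diff = z, half       # lies among the flipped positions
        else:                       # value unchanged: continue the search
            x, diff = z, diff[len(diff) // 2:]   # in the other half
    return diff[0]

# Example: a 2-junta that depends only on variables 0 and 3.
f = lambda v: v[0] ^ v[3]
i = find_relevant_variable(f, (0, 0, 0, 0), (1, 1, 1, 1))   # i in {0, 3}
```

Each iteration halves the set of candidate positions with one membership query, which is where the $\log n$ cost per relevant variable comes from.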
Second, we consider the group testing problem with non-adaptive randomized algorithms. Several models have been proposed for how the tests are randomly chosen. For a model ${\cal M}$, let $m_{\cal M}(n,d)$ be the minimum number of tests required to detect at most $d$ defectives among $n$ items with success probability at least $1-\delta$, for some constant $\delta$. We study the measures
$$c_{\cal M}(d)=\lim_{n\to \infty} \frac{m_{\cal M}(n,d)}{\ln n}
\mbox{\ and \ } c_{\cal M}=\lim_{d\to \infty} \frac{c_{\cal M}(d)}{d}.$$
Our analysis yields tight bounds on $c_{\cal M}(d)$ and $c_{\cal M}$ for all the known models~${\cal M}$.
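For intuition on what such a randomized non-adaptive model looks like, here is a minimal simulation of one well-known design: a Bernoulli test matrix decoded with the COMP rule (an item is declared defective iff it appears in no negative test). The parameters are illustrative, not the optimal choices analyzed in this work:

```python
import random

# Minimal simulation of a non-adaptive randomized group-testing model
# (illustrative only): each of the m tests includes every item
# independently with probability p, and the COMP rule declares an item
# defective iff no negative test contains it. COMP never produces
# false negatives, so the true defectives always survive decoding.

def comp_group_testing(n, defectives, m, p, rng):
    tests = [[i for i in range(n) if rng.random() < p] for _ in range(m)]
    outcomes = [any(i in defectives for i in t) for t in tests]
    declared = set(range(n))
    for t, positive in zip(tests, outcomes):
        if not positive:            # a negative test clears all its items
            declared -= set(t)
    return declared

rng = random.Random(0)
n, d = 500, 5
defectives = set(rng.sample(range(n), d))
m = 150                             # on the order of d * log n tests
declared = comp_group_testing(n, defectives, m, 1 / d, rng)
```

With $m$ on the order of $d\log n$ tests, the expected number of surviving non-defectives drops to near zero, which is the regime the measures $c_{\cal M}(d)$ and $c_{\cal M}$ quantify.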
Third, we examine the problem of estimating the number of defective items $d$ up to a multiplicative factor $\Delta>1$, using deterministic group testing when some upper bound $D\geq d$ is known. We give lower and upper bounds on the number of tests required in both the adaptive and the non-adaptive deterministic settings. For the adaptive deterministic setting, we prove a lower bound of $\Omega \left((D/\Delta^2)\log (n/D) \right )$ tests. Moreover, we give a polynomial time adaptive algorithm showing that our bound is tight up to a small additive term. For non-adaptive algorithms, an upper bound of $O((D/\Delta^2)(\log (n/D)+\log \Delta))$ is achieved by means of a non-constructive proof. This result matches the lower bound up to a small additive term. In addition, we use existing polynomial time constructible \emph{expander regular bipartite graphs}, \emph{extractors} and \emph{condensers} to construct two polynomial time algorithms. The first makes $O((D^{1+o(1)}/\Delta^2)\cdot \log n)$ tests, and the second makes $(D/\Delta^2)\cdot \mathrm{quasipoly}(\log n)$ tests. These are the first explicit constructions with almost optimal test complexity.
Moreover, we study the problem of determining the exact number of defective items in adaptive group testing using a minimum number of tests. We improve on the existing algorithm and prove a lower bound showing that the number of tests in our algorithm is optimal up to small additive terms. In particular, we prove that any randomized Monte Carlo algorithm for this task must ask at least
$$ \left( 1-\frac{\log d+\log(1/\delta)+1}{\log n+\log(1/\delta)}\right)d\log (1/2\delta)$$
queries. Moreover, we prove an upper bound of $(1+o(1))d\log(d/\delta)$ for the same setting. This bound is optimal up to the additive term $(1+o(1))d\log d$.
Finally, we consider some applied aspects of group testing. We propose a heuristic randomized method for constructing the test design. Using the suggested design together with a combined group testing and compressive sensing decoding method, our experiments indicate that, for some pool sizes, up to a 10-fold reduction in the number of tests can be achieved. We discuss the applicability of this approach to accelerating COVID-19 PCR testing.