How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured on average over a distribution of inputs, giving no guarantee for any specific input. This talk introduces Self-Proving models, a new class of models that formally prove the correctness of their outputs via an Interactive Proof system. We will formally define Self-Proving models and their per-input (worst-case) guarantees. We will then present algorithms for learning these models and explain how the complexity of the proof system affects the complexity of the learning algorithms. Finally, we will review experiments in which Self-Proving Models are trained to compute the greatest common divisor (GCD) of two integers and prove their correctness to a simple verifier.
No prior knowledge of autoregressive models or Interactive Proofs will be assumed from the audience. This is a joint work with Shafi Goldwasser, Orr Paradise and Guy Rothblum.