Deep learning has revolutionized computer vision, achieving unprecedented performance in tasks such as classification and detection. However, these models are highly susceptible to adversarial attacks, prompting the development of robust training methods. A notable outcome of such training is the phenomenon of Perceptually Aligned Gradients (PAG), where input gradients align semantically with human perception. Our research explores both the practical and theoretical implications of PAG. We introduce BIGRoC, a model-agnostic image refinement method that leverages PAG to enhance images generated by any source. Additionally, we study the connection between PAG and adversarial robustness, demonstrating that it is bidirectional: robust training yields PAG, and promoting PAG in turn improves robustness. Finally, we extend PAG research to the multimodal vision-language domain, unveiling CLIPAG, a CLIP-based model that improves generative tasks and achieves text-to-image generation without a traditional generator.
Ph.D. thesis, under the supervision of Prof. Michael Elad.