The Taub Faculty of Computer Science Events and Talks
Hadas Orgad (M.Sc. Thesis Seminar)
Wednesday, 23.03.2022, 10:00
Advisor: Dr. Yonatan Belinkov
Studies of gender bias in natural language processing (NLP) typically focus either on extrinsic bias, which is measured by model performance on a specific task, or on intrinsic bias, which is measured on a model's internal representations. However, the relationship between extrinsic and intrinsic bias is not well understood. In this work, we illuminate this relationship by measuring both quantities together: we debias a model during downstream fine-tuning, which reduces extrinsic bias, and measure the effect on intrinsic bias using information-theoretic probing. Through experiments on two tasks and multiple bias metrics, we show that our intrinsic bias metric is a better indicator of debiasing than the standard metric, and that it can also expose cases of superficial debiasing. Our framework provides a comprehensive perspective on bias in NLP models, which can be applied to deploy NLP systems in a more informed manner.