Developing better language models would benefit a myriad of communities; however, it is prohibitively costly. This talk will describe collaborative approaches to pretraining, such as model merging, which allows combining several specialized models into one. It will then introduce efficient evaluation to reduce overhead, and touch on other accessible and collaborative directions that best harness the expertise and diversity found in academia.
Bio: Leshem Choshen is a postdoctoral researcher at MIT and IBM, aiming to study model development openly and collaboratively, make pretraining research feasible, and evaluate models efficiently. To that end, they co-created model merging, TIES merging, and the BabyLM Challenge. They were awarded the Rothschild and Fulbright postdoctoral fellowships, as well as the IAAI and Blavatnik best Ph.D. awards. With broad NLP and ML interests, they have also worked on reinforcement learning, understanding how neural networks learn, and Project Debater, which was featured on the cover of Nature and in 2019 became the first machine to hold a live formal debate.
Leshem is also a dancer and an acrobat.