
Events and Lectures at the Henry and Marilyn Taub Faculty of Computer Science

Representing Large Language Models in the Semantic Task Space
Speaker: Idan Kashani (M.Sc. seminar lecture)
Date: Monday, 24.03.2025, 11:30
Location: Taub 601 & Zoom
Advisor: Prof. Avi Mendelson

The open-source community offers a vast and continually expanding array of large language models (LLMs), accompanied by diverse benchmarks for evaluating their performance. While this rich ecosystem provides users with many models that may align with their objectives, the sheer number of options makes selection complex and time-consuming. A model may appear proficient in a given domain, yet underperform on specific instances.

We introduce a straightforward, efficient, and scalable linear method for creating structured representations of models by leveraging the diversity of existing benchmarks. Our approach is highly interpretable and requires no training. This makes it particularly well-suited for dynamic environments where models and datasets are frequently updated. By embedding models into a task-oriented space, our method facilitates systematic retrieval based on predefined properties, such as performance. We apply this method within a library of models and datasets, storing the representations as model metadata, and demonstrate their practical utility for success prediction and model selection.
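The abstract does not specify the embedding construction, but one minimal reading of a training-free, interpretable, task-oriented representation is to treat each model's vector of benchmark scores as its embedding and retrieve models by similarity to a desired task profile. The model names, scores, and the cosine-similarity retrieval rule below are illustrative assumptions, not details from the talk:

```python
import numpy as np

# Hypothetical benchmark scores: rows = models, columns = benchmark tasks.
# All names and values are made up for illustration.
scores = {
    "model_a": np.array([0.82, 0.31, 0.67]),
    "model_b": np.array([0.45, 0.90, 0.52]),
    "model_c": np.array([0.60, 0.58, 0.88]),
}

def embed(model_scores):
    """Use the raw benchmark-score vector as the model's task-space embedding.
    No training is involved, and each coordinate is directly interpretable
    as performance on one benchmark."""
    return model_scores

def select_model(target_profile, models):
    """Return the model whose embedding is most similar (cosine) to a
    desired task profile, i.e. retrieval by a predefined property."""
    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(models, key=lambda name: cosine(embed(models[name]), target_profile))

# A user who cares mostly about the second task dimension:
best = select_model(np.array([0.0, 1.0, 0.0]), scores)
print(best)  # model_b: it scores highest on that dimension
```

Because the representation is just stored metadata, adding a new model or benchmark only appends a row or column, which matches the dynamic-environment setting described above.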

Additionally, we present the Hysteresis Rectified Linear Unit (HeLU), a novel activation function designed to address the "dying ReLU" problem. During inference, HeLU behaves identically to the Rectified Linear Unit (ReLU), preserving computational efficiency. We show that HeLU provides a lightweight alternative to more complex activation functions, offering a practical trade-off between performance and computational cost.
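The abstract only states that HeLU matches ReLU at inference time while mitigating dying ReLU; one plausible reading of "hysteresis" is a backward pass whose gradient gate is shifted slightly below zero, so near-zero units keep receiving updates. The margin `DELTA` and the exact backward rule below are assumptions for illustration, not the talk's definition:

```python
import numpy as np

DELTA = 0.1  # hypothetical hysteresis margin; the real value/rule is not given in the abstract

def helu_forward(x):
    """Inference-time HeLU: identical to ReLU, so no extra inference cost."""
    return np.maximum(x, 0.0)

def helu_backward(x, grad_out):
    """Sketch of a hysteresis backward pass: gradients also flow through
    slightly negative pre-activations (x > -DELTA), so units hovering just
    below zero can recover instead of 'dying' as with plain ReLU, whose
    gradient gate would be (x > 0)."""
    return grad_out * (x > -DELTA)

x = np.array([-0.5, -0.05, 0.3])
print(helu_forward(x))                     # [0.   0.   0.3]
print(helu_backward(x, np.ones_like(x)))   # [0. 1. 1.]
```

Note the second unit (pre-activation -0.05): its forward output is zero, exactly as with ReLU, yet it still receives a gradient, which is the mechanism this sketch uses to model escaping the dead regime.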