Technion, CS Department
Events

Label Expansion - Integrating Prior Knowledge to Large Label Set Tasks
Speaker: Dor Zohar (דור זהר), M.Sc. seminar lecture
Date: Thursday, 11.10.2018, 14:30
Location: Taub 301
In many Natural Language Processing classification tasks, the label space is the entire vocabulary and may therefore contain hundreds of thousands of labels. Important tasks such as language modeling, machine translation, and dialog systems all have vocabulary label sets. Due to Zipf's law, a large number of words in the vocabulary appear only a few times in the corpus, which hinders learning proper representations for these words. This work uses a prior hierarchical clustering of the words in the label set to obtain better word representations. The hierarchical structure makes it possible to start with a label set of coarse-grained concepts and gradually refine it to the whole vocabulary. We examine two tasks with vocabulary label sets: language modeling and word2vec. We present the contribution of the prior knowledge to performance on both tasks compared with the baseline, in both intrinsic and extrinsic tests.
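
The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes the prior hierarchy is given as a bit-string path per word (in the style of Brown clustering), and the toy vocabulary, the paths, and the helper names `label_at_depth`, `label_set`, and `expansion_schedule` are all hypothetical. A word's label at a given depth is simply the prefix of its path, so the label set grows with depth until it covers the whole vocabulary.

```python
# Hypothetical toy hierarchy: each word maps to a root-to-leaf bit-string path.
# The words and paths are invented for illustration only.
WORD_PATHS = {
    "cat": "000",
    "dog": "001",
    "car": "010",
    "bus": "011",
    "run": "10",
    "walk": "11",
}


def label_at_depth(word, depth, paths=WORD_PATHS):
    """Return the coarse label of `word` at `depth`: a prefix of its path."""
    return paths[word][:depth]


def label_set(depth, paths=WORD_PATHS):
    """Return the set of distinct labels when the hierarchy is cut at `depth`."""
    return {label_at_depth(word, depth, paths) for word in paths}


def expansion_schedule(paths=WORD_PATHS):
    """List (depth, sorted labels) from the coarsest cut down to the leaves."""
    max_depth = max(len(path) for path in paths.values())
    return [(d, sorted(label_set(d, paths))) for d in range(1, max_depth + 1)]


if __name__ == "__main__":
    for depth, labels in expansion_schedule():
        print(depth, len(labels), labels)
```

In a training loop, a model could be fit against the labels of one depth and then moved to the next depth once the coarser task has been learned; how the output layer is expanded between stages is left open here, since the abstract does not specify it.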