The Taub Faculty of Computer Science Events and Talks

Label Expansion - Integrating Prior Knowledge to Large Label Set Tasks
Dor Zohar (M.Sc. Thesis Seminar)
Thursday, 11.10.2018, 14:30
Taub 301
Advisor: Prof. Roi Reichart
In many Natural Language Processing classification tasks, the label space is the entire vocabulary and may therefore contain hundreds of thousands of labels. Important tasks such as language modeling, machine translation and dialog systems all have vocabulary label sets. Due to Zipf's law, a large share of the words in the vocabulary appear only a few times in the corpus, which hinders learning proper representations for these words. This work uses a prior hierarchical clustering of the words in the label set to achieve better word representations. The hierarchical structure makes it possible to start with a label set of coarse-grained concepts and gradually refine it to the whole vocabulary. We examine two tasks with vocabulary label sets: language modeling and word2vec. We present the contribution of the prior knowledge to performance on both tasks compared with the baseline, in both intrinsic and extrinsic tests.
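
The coarse-to-fine idea in the abstract can be illustrated with a small sketch. It is not the method presented in the talk, whose details the abstract does not give. It assumes, purely for illustration, that the prior hierarchy is a binary tree in which each word is identified by a bit-string path from the root (as in Brown clustering), so that a path prefix names a coarser cluster. The names WORD_PATHS, labels_at_depth and expansion_schedule, the toy vocabulary and the toy corpus are all hypothetical.

```python
from collections import Counter

# Hypothetical prior: every word is a leaf of a binary hierarchy and is
# identified by its bit-string path from the root. Toy values, not real data.
WORD_PATHS = {
    "cat": "000", "dog": "001",
    "car": "010", "bus": "011",
    "run": "100", "walk": "101",
    "red": "110", "blue": "111",
}


def labels_at_depth(word_paths, depth):
    """Map each word to its ancestor cluster at `depth` (a path prefix)."""
    return {word: path[:depth] for word, path in word_paths.items()}


def expansion_schedule(word_paths):
    """Yield (depth, word -> label) from the coarsest level to the leaves."""
    max_depth = max(len(path) for path in word_paths.values())
    for depth in range(1, max_depth + 1):
        yield depth, labels_at_depth(word_paths, depth)


# Toy corpus, assumed for illustration only.
corpus = "cat dog cat car run red cat dog bus walk blue cat".split()

for depth, mapping in expansion_schedule(WORD_PATHS):
    label_counts = Counter(mapping[word] for word in corpus)
    # Coarser label sets have fewer labels, so each label is seen more often.
    print(depth, len(set(mapping.values())), dict(label_counts))
```

At the shallowest depth the label set has only two clusters, each observed several times in the toy corpus. At the deepest level every word has its own label, which recovers the full vocabulary. A model trained under such a schedule would see rare words first through their better-populated clusters; how the label set is actually expanded during training is left open here.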