
The Taub Faculty of Computer Science Events and Talks

Syntactic Annotation of Hebrew CHILDES Corpora
Avishay Gretz (M.Sc. Thesis Seminar)
Wednesday, 05.12.2012, 12:30
Taub 601
Advisors: Prof. A. Itai and Prof. S. Wintner
The CHILDES database is a large collection of child-adult spoken interactions in over 25 languages. Automatic annotation of these data facilitates research on child language development and acquisition by providing researchers with large amounts of accurate data. Recently, the English section of CHILDES was automatically annotated with labeled dependency relations using a state-of-the-art approach. We describe a similar effort focusing on the Hebrew section of CHILDES. First, we design a novel dependency annotation scheme that reflects the constructions found in child and child-directed utterances, as well as phenomena specific to Hebrew. We then manually annotate a corpus according to this scheme and use the annotated data to train a parser, with which the rest of the corpora can be annotated automatically. Finally, we evaluate the parsing accuracy. We show the adaptability of our annotation scheme to the CHILDES corpora in numerous evaluation scenarios. We also examine alternative annotation approaches to linguistic issues that are relevant to several languages or unique to Hebrew, as well as the contribution of morphological features to the accuracy of dependency parsing of the Hebrew section of CHILDES. This is the first syntactic parser of spoken Hebrew.
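The abstract does not state which accuracy measure is used. For labeled dependency parsing, the usual measures are the unlabeled attachment score (UAS, the share of tokens assigned the correct head) and the labeled attachment score (LAS, the share assigned both the correct head and the correct relation label). The following is a minimal sketch under stated assumptions: each token is encoded as a (head, label) pair with head 0 denoting the root, and the function name and relation labels are hypothetical illustrations, not taken from the thesis or its annotation scheme.

```python
from typing import List, Tuple

# Hypothetical encoding: one (head_index, relation_label) pair per token,
# with head_index 0 meaning the token attaches to the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) over aligned gold and predicted tokens."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    correct_heads = 0     # tokens whose head is correct (counts toward UAS)
    correct_labeled = 0   # tokens whose head and label are correct (LAS)
    for (g_head, g_label), (p_head, p_label) in zip(gold, predicted):
        if g_head == p_head:
            correct_heads += 1
            if g_label == p_label:
                correct_labeled += 1
    n = len(gold)
    return correct_heads / n, correct_labeled / n


if __name__ == "__main__":
    # Hypothetical three-token utterance; labels are illustrative only.
    gold = [(2, "subj"), (0, "root"), (2, "obj")]
    predicted = [(2, "obj"), (0, "root"), (2, "obj")]
    uas, las = attachment_scores(gold, predicted)
    print(f"UAS={uas:.3f} LAS={las:.3f}")  # prints "UAS=1.000 LAS=0.667"
```

Here every head is correct but one label is wrong, so UAS is perfect while LAS drops; comparing the two is one simple way to see whether errors stem from attachment or from relation labeling. Scores are usually computed over all tokens of a test corpus rather than averaged per utterance, which this function supports if the token lists are concatenated.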