COLLOQUIUM LECTURE - Consolidating and Exploring Open Textual Knowledge

Ido Dagan
Tuesday, 15.1.2019, 14:30
Room 337 Taub Bld.
Department of Computer Science, Bar Ilan University
Roy Schwartz

How can we effectively capture the information expressed in multiple texts? How can we allow people, as well as computer applications, to explore it easily? The current semantic NLP pipeline typically ends at the single sentence or text level, putting the burden on applications to consolidate and present related information across multiple texts. Further, semantic representations, which may provide the basis for text consolidation, are often based on non-trivial schemata that require expert annotation, making it a huge effort to create large-scale corpora for training.

In this talk, I will outline a research program whose goals are to represent consolidated information conveyed in multiple texts and to communicate it effectively to users. This program builds upon three largely unexplored research lines. First, we aim to establish a "natural" semantic representation for individual texts, which is based solely on crowdsourceable natural language expressions rather than on pre-specified schemata. To that end, we follow and extend the recent Question-Answer Semantic Role Labeling (QA-SRL) approach, through which we decompose sentence information into question-answer pairs, each representing an individual statement. Second, we are developing approaches for consolidating the information structures of different texts, which requires a substantial extension of cross-text co-reference detection. The goal is to yield a consolidated structure that may be seen as an "open" analogue of traditional knowledge graphs, representing real-world elements and the statements relating them. Third, we are developing a framework for interactive exploration of multi-text information, while addressing the challenging task of systematic and replicable evaluation of such interactive methods. I will provide an overview of the framework and its three research lines and illustrate the different types of research tasks that are evolving within it.

Short Bio: Ido Dagan holds B.Sc. (Summa Cum Laude) and Ph.D. degrees in Computer Science from the Technion, Israel. He conducted his Ph.D. research in collaboration with the IBM Haifa Scientific Center, where he was a research fellow in 1991. During 1992-1994 he was a Member of Technical Staff at AT&T Bell Laboratories. During 1994-1998 he was at the Department of Computer Science of Bar Ilan University, to which he returned in 2003. During 1998-2003 he was co-founder and CTO of a text categorization startup company, FocusEngine, and VP of Technology at LingoMotors, a Cambridge, Massachusetts, company which acquired FocusEngine.
