Relational Framework for Information Extraction

Yoav Nahshon, M.Sc. Thesis Seminar
Wednesday, 7.2.2018, 13:30
Taub 401
Prof. Benny Kimelfeld

Textual data written in natural language carries valuable information concealed within it. Information Extraction (IE) is the task of automatically extracting this information into a structured representation. Standard relational database systems, which are highly suitable for representing structured information, are incapable of performing deep text analysis, and therefore out-of-database solutions are often applied. However, this approach is prone to laborious development processes, complex and tangled programs, and inefficient control flows. These deficiencies have given rise to declarative solutions that automate significant parts of the manual work. Nevertheless, such frameworks typically stitch together various programming components and technologies, and may lack a unifying theory. In this work we present a novel framework that extends the relational model to text and uniformly represents the key players of a typical IE task: the unstructured data (text), the structured data (extracted information), and the functions that carry out the transformation from the former to the latter. In addition, we report initial results on expressive power, introduce an optimization technique made possible by the understanding of data flow that our formalism provides, and present an implementation of our framework.
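
To make the three "key players" concrete, here is a minimal Python sketch of the general idea, not the formalism or implementation presented in the talk. It assumes that extracted information is stored as a relation of text spans and that the extraction function is a regular expression with capture groups; the names `Span`, `extract`, and `resolve` are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Set, Tuple


class Span(NamedTuple):
    """A position range inside the text (hypothetical representation)."""
    start: int  # inclusive offset
    end: int    # exclusive offset


def extract(pattern: str, text: str) -> Set[Tuple[Span, ...]]:
    """Extraction function: text -> relation.

    Each regex match becomes one tuple; each capture group becomes one column
    holding the span where that group matched.
    """
    compiled = re.compile(pattern)
    return {
        tuple(Span(*m.span(g)) for g in range(1, compiled.groups + 1))
        for m in compiled.finditer(text)
    }


def resolve(text: str, span: Span) -> str:
    """Map a span back to the substring it denotes."""
    return text[span.start:span.end]


# Unstructured data: the text.
text = "Alice works at Acme. Bob works at Initech."

# Structured data: a binary relation of (person span, company span).
employment = extract(r"(\w+) works at (\w+)", text)

# A relational-style projection that resolves spans to strings.
rows = sorted((resolve(text, p), resolve(text, c)) for p, c in employment)

# A relational-style selection: companies of the person named "Bob".
bob_companies = {resolve(text, c) for p, c in employment if resolve(text, p) == "Bob"}
```

Keeping spans rather than copied strings in the relation means each extracted tuple still points back into the source text, so text, extracted data, and extractors can be handled in one uniform setting.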
