The Taub Faculty of Computer Science Events and Talks

DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Elisha Bamberger (M.Sc. Thesis Seminar)
Sunday, 07.04.2024, 10:00
Room 601
Advisor: Dr. Yonatan Belinkov
Language Models (LMs) excel in many tasks, but understanding discourse (how sentences connect to form coherent text) remains a challenge. This is especially true for smaller models aiming to match the abilities of their larger counterparts in handling long and complex inputs. To address this, we introduce DEPTH, a new encoder-decoder model designed to foster robust discourse-level representations during the pre-training phase. DEPTH combines hierarchical sentence representations and a “Sentence Un-shuffling” task with the traditional span-corruption objective of encoder-decoder LMs. While span-corruption helps the model learn word-level dependencies, “Sentence Un-shuffling” forces it to restore the natural order of scrambled sentences, teaching it about the logical flow of language. The encoder-decoder architecture allows DEPTH to consider words both before and after a given token, offering a more nuanced contextual understanding than decoder-only models like GPT. This is crucial for tasks that depend on how sentences interrelate.
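
As a rough illustration of the “Sentence Un-shuffling” idea, the sketch below builds one training example from a list of sentences. It is a minimal sketch under stated assumptions, not DEPTH's actual data pipeline: the function names, the `<SENT_i>` marker format, and the choice of target (the markers listed in original sentence order) are all hypothetical, and span corruption is left out entirely.

```python
import random


def make_unshuffling_example(sentences, seed=0):
    """Build one sentence un-shuffling example (illustrative only).

    Returns (shuffled, target):
      shuffled - list of (marker, sentence) pairs in scrambled order
      target   - markers listed in the ORIGINAL sentence order
    The "<SENT_i>" marker format is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    # Slot i of the shuffled input holds original sentence number order[i].
    shuffled = [(f"<SENT_{i}>", sentences[j]) for i, j in enumerate(order)]
    # For each original position k, find the marker that now labels it.
    target = [f"<SENT_{order.index(k)}>" for k in range(len(sentences))]
    return shuffled, target


def restore_order(shuffled, target):
    """Recover the original text by reading sentences in target order."""
    by_marker = dict(shuffled)
    return [by_marker[marker] for marker in target]


if __name__ == "__main__":
    doc = [
        "The model reads a paragraph.",
        "Its sentences are scrambled.",
        "It must predict the original order.",
    ]
    shuffled, target = make_unshuffling_example(doc, seed=1)
    for marker, sentence in shuffled:
        print(marker, sentence)
    print("target:", " ".join(target))
    print("restored:", restore_order(shuffled, target) == doc)
```

A model trained on such pairs sees the scrambled, marker-prefixed sentences as input and is asked to emit the marker sequence; predicting it correctly requires reasoning about how the sentences follow from one another.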

We built a pre-training and fine-tuning framework for encoder-decoder models that facilitated our experiments with both T5 and DEPTH, and that integrates with open-source tools for distributed training in the HuggingFace ecosystem. DEPTH’s training is computationally efficient, as it learns meaningful semantic- and discourse-level representations at a faster rate than its T5 counterpart. Notably, already in the early stages of pre-training, we find that DEPTH outperforms T5 by reaching a lower span-corruption loss. This occurs despite the fact that T5 is trained solely with this objective, while DEPTH is tasked with the additional sentence un-shuffling objective. Evaluations on the GLUE and DiscoEval benchmarks demonstrate DEPTH’s ability to quickly learn downstream tasks spanning understanding of syntax (CoLA), sentiment analysis (SST-2), sentence positioning (SP), discourse coherence (DC), and natural language inference (MNLI).