Events

Colloquia and Seminars

To join the Computer Science colloquium mailing list, please visit the list's subscription page.

Computer Science events calendar in HTTP ICS format, for Google Calendar and for Outlook.
Academic calendar at the Technion site.

Upcoming Colloquia and Seminars

Pixel-Club: STMPL: Human Soft-Tissue Simulation
Anton Agafonov (M.Sc. seminar)
Tuesday, 02.04.2024, 11:30
Room 608, Zisapel Building
In various applications, such as virtual reality and gaming, simulating the deformation of soft tissues in the human body during interactions with external objects is essential. Traditionally, Finite Element Methods (FEM) have been employed for this purpose, but they tend to be slow and resource-intensive. In this work, we propose a unified representation of human body shape and soft tissue together with a data-driven simulator of non-rigid deformations. This approach enables rapid simulation of realistic interactions. Our method builds upon the SMPL model, which generates human body shapes under rigid transformations. We extend SMPL by incorporating a soft-tissue layer and an intuitive representation of external forces applied to the body during object interactions. Specifically, we map the 3D body shape, the soft tissue, and the applied external forces onto 2D UV maps. Leveraging a UNET architecture designed for 2D data, our approach achieves high-accuracy inference in real time. Our experiments show that our method achieves plausible deformation of the soft-tissue layer, even in unseen scenarios.

M.Sc. student under the supervision of Prof. Lihi Zelnik-Manor.
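To make the pipeline above concrete, here is a rough sketch of the idea of regressing soft-tissue displacement from stacked UV maps with an image-to-image network. This is only an illustrative PyTorch toy: the channel layout, map resolution, and the tiny encoder-decoder standing in for the UNET are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class TinyUVNet(nn.Module):
    # Toy image-to-image network standing in for the UNET mentioned above.
    # Assumed input channels: 3 for rest-shape UV positions, 1 for soft-tissue
    # thickness, 3 for the external-force UV map; output is a per-texel 3D displacement.
    def __init__(self, in_ch=7, out_ch=3, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),           # downsample
            nn.ConvTranspose2d(width, width, 4, stride=2, padding=1), nn.ReLU(),  # upsample
            nn.Conv2d(width, out_ch, 3, padding=1),
        )

    def forward(self, uv_maps):
        return self.net(uv_maps)

model = TinyUVNet()
uv_input = torch.randn(1, 7, 256, 256)   # body shape + soft tissue + forces, rasterized to UV space
displacement = model(uv_input)           # (1, 3, 256, 256) predicted deformation map
print(displacement.shape)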
Marrying Vision and Language: A Mutually Beneficial Relationship?
Hadar Averbuch-Elor, Tel Aviv University
Tuesday, 02.04.2024, 14:30
Taub 337
Foundation models that connect vision and language have recently shown great promise for a wide array of tasks such as text-to-image generation. Significant attention has been devoted to utilizing the visual representations learned from these powerful vision and language models. In this talk, I will present an ongoing line of research that focuses on the other direction, aiming to understand what knowledge language models acquire through exposure to images during pretraining. We first consider in-distribution text and demonstrate how multimodally trained text encoders, such as that of CLIP, outperform models trained in a unimodal vacuum, such as BERT, on tasks that require implicit visual reasoning. Expanding to out-of-distribution text, we address sound symbolism, the phenomenon of non-trivial correlations between particular sounds and meanings across languages, and demonstrate its presence in vision and language models such as CLIP and Stable Diffusion. Our work provides new angles for understanding what is learned by these vision and language foundation models, offering principled guidelines for designing models for tasks involving visual reasoning.
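The comparison described above is typically run by probing frozen text embeddings on a downstream task. The sketch below shows how one might extract both kinds of embeddings with the HuggingFace transformers library; the checkpoints and the mean-pooling choice are illustrative assumptions, not the speaker's exact experimental setup.

import torch
from transformers import AutoTokenizer, BertModel, CLIPModel, CLIPTokenizer

sentences = ["The banana is unripe.", "The banana is overripe."]  # requires implicit color reasoning

# BERT: a text encoder trained in a unimodal vacuum (mean-pool the last hidden states).
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()
with torch.no_grad():
    enc = bert_tok(sentences, padding=True, return_tensors="pt")
    bert_emb = bert(**enc).last_hidden_state.mean(dim=1)   # shape (2, 768)

# CLIP text tower: trained jointly with images.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
with torch.no_grad():
    enc = clip_tok(sentences, padding=True, return_tensors="pt")
    clip_emb = clip.get_text_features(**enc)                # shape (2, 512)

# Both embedding sets can now be fed to the same linear probe for a controlled comparison.
print(bert_emb.shape, clip_emb.shape)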

Short Bio: Hadar Averbuch-Elor is an Assistant Professor at the School of Electrical Engineering at Tel Aviv University. Before that, Hadar was a postdoctoral researcher at Cornell Tech, working with Noah Snavely. She completed her PhD in Electrical Engineering at Tel Aviv University, where she was advised by Daniel Cohen-Or. Hadar is a recipient of multiple awards, including the Zuckerman Postdoctoral Scholar Fellowship, the Schmidt Postdoctoral Award for Women in Mathematical and Computing Sciences, and the Alon Scholarship. She was also selected as a Rising Star in EECS by UC Berkeley. Hadar's research interests lie at the intersection of computer graphics and computer vision, particularly in combining pixels with more structured modalities such as natural language and 3D geometry.
Learning with visual foundation models for Gen AI
Gal Chechik, Bar-Ilan University and NVIDIA
Thursday, 04.04.2024, 10:30
Taub 337
Between training and inference lies a growing class of AI problems that involve fast optimization of a pre-trained model for a specific inference task. These are not pure “feed-forward” inference problems applied to a pre-trained model, because they involve some non-trivial inference-time optimization beyond what the model was trained for; neither are they training problems, because they focus on a specific input. These compute-heavy inference workflows raise new challenges in machine learning and open opportunities for new types of user experiences and use cases.

In this talk, I describe two main flavors of the new workflows in the context of text-to-image generative models: few-shot fine-tuning and inference-time optimization. I'll cover personalization of vision-language models using textual-inversion techniques, and techniques for model inversion, prompt-to-image alignment and consistent generation. I will also discuss the generation of rare classes, and future directions.
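The shared pattern behind both flavors is that the pretrained network stays frozen while a small set of parameters, for example a new token embedding in textual inversion or a latent code being aligned to a prompt, is optimized against a task-specific loss at inference time. The generic sketch below illustrates only that loop; the frozen module, the loss, and the tensor sizes are placeholders rather than any particular text-to-image system.

import torch

# Placeholder for a frozen, pretrained generator or scoring model.
frozen_model = torch.nn.Linear(64, 64)
for p in frozen_model.parameters():
    p.requires_grad_(False)

# The only trainable quantity: e.g. a new "concept" embedding (textual inversion)
# or a latent code optimized at inference time.
concept = torch.nn.Parameter(torch.randn(1, 64) * 0.01)
target = torch.randn(1, 64)                  # stands in for features of user-provided images
opt = torch.optim.Adam([concept], lr=1e-2)

for step in range(200):                      # short inference-time optimization loop
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(frozen_model(concept), target)
    loss.backward()                          # gradients flow only into the embedding
    opt.step()

print(f"final loss: {loss.item():.4f}")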

Short Bio: Gal Chechik is a Professor of computer science at Bar-Ilan University and a senior director of AI research at NVIDIA. His current research focuses on learning for reasoning and perception. In 2018, Gal joined NVIDIA to found and head NVIDIA's research in Israel. Prior to that, Gal was a staff research scientist at Google Brain and Google Research, developing large-scale algorithms for machine perception that are used by millions daily. Gal earned his PhD in 2004 from the Hebrew University and completed his postdoctoral training at the Stanford CS department. Gal has authored ~130 refereed publications and ~50 patents, including publications in Nature Biotechnology, Cell, and PNAS. His work has won awards at ICML and NeurIPS.
Discourse-Oriented Pre-Training of Language Models
Zachary Elisha Bamberger (M.Sc. seminar lecture)
Sunday, 07.04.2024, 10:00
Room 601
Advisor: Dr. Yonatan Belinkov
Language Models (LMs) excel in many tasks, but understanding discourse – how sentences connect to form coherent text – remains a challenge. This is especially true for smaller models aiming to match the abilities of their larger counterparts in handling long and complex inputs. To address this, we introduce DEPTH, a new encoder-decoder model designed to foster robust discourse-level representations during the pre-training phase. DEPTH uniquely combines hierarchical sentence representations and the “Sentence Un-shuffling” task with the traditional span-corruption objective of encoder-decoder LMs. While span corruption helps the model learn word-level dependencies, “Sentence Un-shuffling” forces it to restore the natural order of scrambled sentences, teaching it about the logical flow of language. The encoder-decoder architecture allows DEPTH to consider words both before and after a given token, offering a more nuanced contextual understanding than decoder-only models like GPT. This is crucial for tasks that depend on how sentences interrelate.
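To make the “Sentence Un-shuffling” objective concrete, a training example can be built by permuting the sentences of a passage and asking the decoder to emit them in their original order, alongside the usual span-corruption targets. The sketch below only illustrates that data construction; the naive sentence splitter and the plain-text target format are assumptions, not DEPTH's actual preprocessing.

import random

def make_unshuffling_example(passage, seed=0):
    # Shuffle the sentences of a passage; the target restores the original order.
    sentences = [s.strip() + "." for s in passage.split(".") if s.strip()]  # naive splitter, for illustration
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)
    source = " ".join(sentences[i] for i in order)   # what the encoder sees
    target = " ".join(sentences)                     # what the decoder must reconstruct
    return source, target

src, tgt = make_unshuffling_example(
    "The sky darkened. Thunder rolled in the distance. Rain began to fall."
)
print("source:", src)
print("target:", tgt)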

We built a pre-training and fine-tuning framework for encoder-decoder models that facilitated our experiments with both T5 and DEPTH, and that seamlessly integrates with open-source tools for distributed training in the HuggingFace ecosystem. DEPTH’s training is remarkably computationally efficient, as it learns meaningful semantic- and discourse-level representations at a faster rate than its T5 counterpart. Notably, already in the early stages of pre-training, we find that DEPTH outperforms T5 by reaching a lower span-corruption loss. This occurs despite the fact that T5 is trained solely on this objective, while DEPTH is additionally tasked with the sentence un-shuffling objective. Evaluations on the GLUE and DiscoEval benchmarks demonstrate DEPTH’s ability to quickly learn downstream tasks spanning syntactic understanding (CoLA), sentiment analysis (SST2), sentence positioning (SP), discourse coherence (DC), and natural language inference (MNLI).