Events
Events and Talks at the Henry and Marilyn Taub Faculty of Computer Science
Uri Alon (M.Sc. Seminar Lecture)
Thursday, 17.08.2017, 10:00
We present a novel approach for automatic feature generation for predicting program properties.
Our approach automatically produces features that can capture long-distance syntactic relationships
between program elements. The features are purely syntactic, and the method is useful for any
programming language.
Inspired by Parse Tree Paths in Natural Language Processing (NLP), we generate features that
capture relationships in an Abstract Syntax Tree (AST). We show that these features are general:
they can (i) cover a number of different prediction tasks, (ii) drive two different learning algorithms
(for both generative and discriminative models), and (iii) work across different programming languages.
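
To make the idea of a path feature concrete, here is a minimal sketch, not the implementation presented in the talk. It assumes Python's standard `ast` module as the parser and, for simplicity, treats only `Name` and `Constant` nodes as leaves. The function names (`leaf_trails`, `ast_path`, `path_features`) and the `^` / `v` notation for moving up and down the tree are illustrative choices made here.

```python
import ast
from itertools import combinations

UP, DOWN = " ^ ", " v "  # illustrative notation for moving up/down the tree


def leaf_trails(tree):
    """Collect (label, [nodes from root to leaf]) for each Name/Constant leaf."""
    results = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, ast.Name):
            results.append((node.id, trail))
        elif isinstance(node, ast.Constant):
            results.append((repr(node.value), trail))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, trail)

    visit(tree, [])
    return results


def ast_path(trail_a, trail_b):
    """Render the path from leaf A up to the lowest common ancestor and down to leaf B."""
    i = 0
    # Nodes are compared by identity, so same-typed siblings are not confused.
    while i < min(len(trail_a), len(trail_b)) and trail_a[i] is trail_b[i]:
        i += 1
    up = [type(n).__name__ for n in reversed(trail_a[i:])]
    top = type(trail_a[i - 1]).__name__  # lowest common ancestor
    down = [type(n).__name__ for n in trail_b[i:]]
    return UP.join(up + [top]) + "".join(DOWN + d for d in down)


def path_features(source):
    """Return (start_leaf, path, end_leaf) triples for every pair of leaves."""
    leaves = leaf_trails(ast.parse(source))
    return [(a, ast_path(ta, tb), b)
            for (a, ta), (b, tb) in combinations(leaves, 2)]


if __name__ == "__main__":
    for feature in path_features("x = y + 1"):
        print(feature)
```

For the statement `x = y + 1`, the pair (`x`, `y`) yields the path `Name ^ Assign v BinOp v Name`. Triples of this shape are purely syntactic and could in principle be extracted from any language that has a parser, which is the property the abstract relies on.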
We evaluate our approach on the tasks of predicting variable names, method names, and types of
expressions. We use the generated features to drive both CRF-based and word2vec-based learning on
programs in four languages: JavaScript, Java, Python, and C#. Our evaluation shows that automatically
generated features capture semantic similarities and produce better results than existing methods.
By representing program elements using path features, we believe that our approach can be used in a
variety of other machine learning tasks for programming languages, including different applications
and different learning models.