Pixel Club: Viewpoint Estimation - Insights & Model

Gilad Divon (EE, Technion)
Wednesday, 11.4.2018, 11:30
EE Meyer Building 1061

This thesis addresses the problem of estimating the viewpoint of an object in a given image, where the objects belong to several known categories. Convolutional Neural Networks (CNNs) were recently applied to this problem, leading to large improvements in state-of-the-art results. Two major approaches have been pursued: a regression approach, which handles the continuous values of viewpoints naturally, and a classification approach, which discretizes the space of viewpoints. We follow the second approach and present five key insights that should be taken into consideration when designing a CNN that solves the problem. These insights concern all three components of any network: the architecture, the training data, and the loss function. Based on these insights, the thesis proposes a network in which (i) the architecture jointly solves detection, classification, and viewpoint estimation, using the most advanced CNN for performing the two former tasks; (ii) new types of data are added and trained on, in order to address the shortage of labeled data (specifically, we propose to utilize both flipped images and video clips); and (iii) a novel loss function is proposed, which takes into account both the geometry of the problem and the new types of data. Our network improves the state-of-the-art results for this problem on PASCAL3D+ by 9.8%. The influence of each component is rigorously analyzed.
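As an illustration of the classification approach mentioned above, the following minimal sketch shows how a continuous azimuth angle might be discretized into class bins, and how the label could change when an image is flipped horizontally. It is not the network or loss from the thesis. The bin count of 24, the function names, and the convention that azimuth 0 corresponds to a frontal view (so that mirroring maps an angle to its negative) are all assumptions made for this example.

```python
NUM_BINS = 24  # assumed bin count; the abstract does not state one


def azimuth_to_bin(azimuth_deg, num_bins=NUM_BINS):
    """Map a continuous azimuth in degrees to a discrete class index."""
    width = 360.0 / num_bins
    return int((azimuth_deg % 360.0) // width)


def bin_to_azimuth(bin_index, num_bins=NUM_BINS):
    """Return the centre angle (degrees) of a viewpoint bin."""
    width = 360.0 / num_bins
    return (bin_index + 0.5) * width


def flip_azimuth(azimuth_deg):
    """Azimuth after mirroring the image horizontally.

    Assumes azimuth 0 is a frontal view, so a mirror maps theta to -theta.
    """
    return (-azimuth_deg) % 360.0


def circular_bin_distance(a, b, num_bins=NUM_BINS):
    """Distance between two bins on the circle (bins wrap around at 360)."""
    d = abs(a - b) % num_bins
    return min(d, num_bins - d)


if __name__ == "__main__":
    label = azimuth_to_bin(30.0)
    flipped_label = azimuth_to_bin(flip_azimuth(30.0))
    print(label, flipped_label, circular_bin_distance(label, flipped_label))
```

The circular distance reflects the geometry of the label space: the first and last bins are neighbours, which a plain classification loss would not take into account.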

*MSc. student under the supervision of Prof. Ayellet Tal.
