
The Taub Faculty of Computer Science Events and Talks

Offline Meta-RL: Applicable Ambiguity Alleviation
Speaker: Gal Avineri (M.Sc. Thesis Seminar)
Date: Thursday, 14.12.2023, 12:30
Location: Zoom Lecture 94960036903 and Taub 601
Advisors: Prof. Aviv Tamar, Prof. Shie Mannor
In meta reinforcement learning (meta-RL), an agent seeks an optimal policy when facing a new, unseen task sampled from a known task distribution. Such a policy strikes an effective trade-off between information gathering and reward accumulation. Learning such a policy is challenging in the offline variant of meta-RL (OMRL), where previous work established an identifiability problem termed MDP ambiguity. This problem relates to the difficulty of learning a neural network that can infer the task at hand at test time. We propose a new method that uses prior knowledge of the task distribution to mitigate the identifiability problem in OMRL. Additionally, we propose a novel method to evaluate an inference model offline, which is more efficient and accurate than the online alternative of policy optimization. Finally, we show that the offline version of the popular VariBAD algorithm can learn a suboptimal representation for task inference, and we propose a simple modification that uses contrastive predictive coding to improve its performance. We compare our methods to Offline VariBAD on two ambiguity-prone tasks and demonstrate results that are on par with or better than policy replay, a state-of-the-art method for resolving MDP ambiguity, while requiring weaker assumptions.
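
As background on the contrastive predictive coding mentioned in the abstract, the following is a minimal sketch of the InfoNCE loss that CPC is commonly trained with. It is not the speaker's implementation: the function name `info_nce_loss`, the dot-product scoring, and the `temperature` parameter are illustrative assumptions, and how such a loss would be attached to Offline VariBAD is not specified here.

```python
import numpy as np

def info_nce_loss(context, future, temperature=1.0):
    """InfoNCE loss for a batch of (context, future) embedding pairs.

    context, future: arrays of shape (batch, dim). Row i of `future` is the
    positive sample for row i of `context`; all other rows in the batch act
    as negatives. (Hypothetical helper, for illustration only.)
    """
    # Pairwise similarity scores: logits[i, j] = <context_i, future_j> / T
    logits = context @ future.T / temperature
    # Numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the matching (diagonal) pair is the correct class
    return -np.mean(np.diag(log_probs))
```

With uninformative embeddings (for example, all zeros) the loss equals the logarithm of the batch size, and it approaches zero as each context embedding scores its own future embedding far above the others in the batch.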