Events and Talks at the Henry and Marilyn Taub Faculty of Computer Science
Gal Avineri (M.Sc. Thesis Seminar)
Thursday, 14.12.2023, 12:30
Advisors: Prof. Aviv Tamar, Prof. Shie Mannor
In meta reinforcement learning (meta-RL), an agent seeks an optimal policy for a new, unseen task sampled from a known task distribution. Such a policy strikes an effective trade-off between information gathering and reward accumulation. The offline variant of meta-RL (OMRL) makes learning such a policy challenging, as previous work established an identifiability problem in OMRL termed MDP ambiguity. This problem relates to the difficulty of learning a neural network that can infer the task at hand at test time. We propose a new method that uses prior knowledge of the task distribution to mitigate the identifiability problem in OMRL. Additionally, we propose a novel method to evaluate an inference model offline, which is more efficient and accurate than the online alternative of policy optimization. Finally, we show that the offline version of the popular VariBAD algorithm can learn a suboptimal representation for task inference, and propose a simple modification that uses contrastive predictive coding to improve its performance. We compare our methods to Offline VariBAD on two ambiguity-prone tasks and demonstrate results that are on par with or better than policy replay, a state-of-the-art method for solving MDP ambiguity, while requiring weaker assumptions.
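For readers unfamiliar with contrastive predictive coding, the following is a minimal sketch (in PyTorch) of the kind of InfoNCE auxiliary objective the abstract refers to, applied to a task-inference encoder: the latent summarizing the transitions observed so far should score future transitions from the same task higher than futures drawn from other tasks in the batch. This is an illustrative assumption-based sketch, not the speaker's implementation; the function name, shapes, and temperature are hypothetical.

import torch
import torch.nn.functional as F

def cpc_loss(context_emb, future_emb, temperature=0.1):
    # context_emb: (B, D) latent summarizing the transitions seen so far in each task
    # future_emb:  (B, D) embedding of a later transition from the same task
    # Illustrative InfoNCE loss; not the code presented in the talk.
    context_emb = F.normalize(context_emb, dim=-1)
    future_emb = F.normalize(future_emb, dim=-1)
    logits = context_emb @ future_emb.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(context_emb.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)                # positives lie on the diagonal

In this sketch, each row's positive pair sits on the diagonal of the similarity matrix, so minimizing the cross-entropy pushes the encoder to produce latents that identify the underlying task.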