דלג לתוכן (מקש קיצור 's')
אירועים

אירועים והרצאות בפקולטה למדעי המחשב ע"ש הנרי ומרילין טאוב

event speaker icon
שניר כהן (הרצאה סמינריונית למגיסטר)
event date icon
יום רביעי, 25.10.2017, 14:00
event location icon
Taub 601
event speaker icon
מנחה: Prof. S. Mannor
Optimistic methods for solving Reinforcement Learning problems are very popular in the literature. In practice, however, these methods show inferior performance compared to other methods, such as Posterior Sampling. We propose a novel concept of Restricted Optimism to balance the well known exploration vs. exploitation trade-off for finite-horizon MDPs. We harness Posterior Sampling to construct two algorithms in the spirit of our Restricted Optimism principle. We provide theoretical guarantees for them and demonstrate through experiments that there exists a trade-off between the average cumulative regret suffered by the agent and the variance. The agent can influence this trade-off by tuning the level of optimism carried out by our proposed algorithms through a regularization parameter.