The theoretical foundation of reinforcement learning rests on solving a Markov Decision Process. This foundation is in deep tension with applications of reinforcement learning, which rely heavily on generalization: the ability to successfully learn what to do in never-before-seen circumstances. We've pursued an agenda of adding generalization to reinforcement learning for well over a decade now, resulting in:
Contextual Bandits: Addressing learning of immediate rewards with generalization. This is now a service (http://aka.ms/personalizer) that won the AI system of the year award at IJCAI 2019.
Learning to Search: An efficient approach to improving on existing policies, with or without simulators.
Contextual Decision Processes: A new theory directly combining strategic exploration, generalization, and temporal credit assignment which we've used to solve 2^100-sparse problems.