2nd Seminar on
Reinforcement Learning
Barcelona
UPC ETSETB TelecomBCN (Spring 2020)
Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP, and they target large MDPs where exact methods become infeasible.
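As a rough illustration of the classical dynamic programming setting mentioned above, the following sketch runs value iteration on a tiny MDP whose transitions and rewards are fully known. The two-state MDP, the action names, the discount factor and the function names are all hypothetical choices made for this example; they are not taken from the course material.

```python
# Minimal value-iteration sketch on a hypothetical, fully known MDP.
# Everything here (states, actions, rewards, GAMMA) is an illustrative assumption.

GAMMA = 0.9  # discount factor applied to future rewards

# TRANSITIONS[state][action] = (next_state, reward); deterministic for simplicity.
TRANSITIONS = {
    0: {"stay": (0, 0.0), "move": (1, 1.0)},
    1: {"stay": (1, 2.0), "move": (0, 0.0)},
}


def value_iteration(transitions, gamma, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            # Best one-step lookahead over all actions available in this state.
            best = max(
                reward + gamma * values[next_state]
                for next_state, reward in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_policy(transitions, gamma, values):
    """Pick, in each state, the action with the highest one-step lookahead."""
    policy = {}
    for state, actions in transitions.items():
        policy[state] = max(
            actions,
            key=lambda a: actions[a][1] + gamma * values[actions[a][0]],
        )
    return policy


optimal_values = value_iteration(TRANSITIONS, GAMMA)
optimal_policy = greedy_policy(TRANSITIONS, GAMMA, optimal_values)
```

In this toy problem the values converge to roughly 19 for state 0 and 20 for state 1, with the greedy policy moving from state 0 and staying in state 1. The sketch relies on knowing TRANSITIONS in full, which is exactly the knowledge that reinforcement learning algorithms do without.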
Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge) while collecting rewards from the environment.
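The balance between exploration and exploitation can be sketched with an epsilon-greedy agent on a multi-armed bandit. This is one simple strategy among many, shown here only as an illustration; the function name, the Gaussian reward model and the parameter values are assumptions made for this example.

```python
import random


def run_epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Epsilon-greedy on a bandit with hypothetical Gaussian rewards.

    With probability `epsilon` the agent explores (random arm); otherwise it
    exploits the arm with the highest current reward estimate.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    n_arms = len(true_means)
    counts = [0] * n_arms       # number of pulls per arm
    estimates = [0.0] * n_arms  # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # exploration
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploitation

        # The agent only observes a noisy reward, never the true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = run_epsilon_greedy(
    true_means=[0.1, 0.5, 0.9], epsilon=0.1, steps=5000
)
```

Setting epsilon to 0 makes the agent purely greedy, so it may lock onto a poor arm early; setting it to 1 makes it explore forever and ignore what it has learned. Values in between trade one behaviour against the other.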
The course explores automated decision making from a computational perspective. It examines efficient algorithms, where they exist, as well as approaches to learning near-optimal decisions from experience. Topics include Markov decision processes, model-free learning and deep reinforcement learning. Of particular interest will be issues of generalization, exploration, and representation. Students will apply concepts to practical problems.
Instructors
Acknowledgements
Lectures
Slides
Reinforcement Learning
- 21/02 (JV) Introduction to reinforcement learning
- 21/02 (MC) Example: Introduction to the gridworld
- 28/02 (JV) Markov Decision Processes. Bellman Equations
- 28/02 (MC) Example: Recycling robot
- 06/03 (JV) Dynamic programming
- 06/03 (MC) Example: Jack’s car rental
- 20/03 (JV) Monte Carlo methods
- 20/03 (MC) Example: Blackjack
- 27/03 (JV) Temporal-difference methods
- 03/04 (MC,JV) Value function approximation
- 03/04 (MC) Demo: Gridworld over a cliff
- 17/04 (JV) Policy Gradient
Deep Reinforcement Learning
- 24/04 (XG) Deep Reinforcement Learning (DRL)
- 28/04 (XG) How to Train your Neural Network
- 28/04 (XG) Deep Q-Learning
- 08/05 (XG) REINFORCE
Labs
- 24/04 (XG) Lab: Tabular Q-Learning
- 28/04 (XG) Lab: Q-Learning with Neural Networks
- 08/05 (XG) Lab: REINFORCE
Practical
Practical details
- Study Programs: Bachelor degrees at ETSETB TelecomBCN from the Universitat Politecnica de Catalunya.
- Course code and official guide: 230329 - MRL
- ECTS credits: 2 ECTS
- Semester: Spring 2020
- Class Schedule: Fridays from 11:00 to 13:00, from February 21 to April 28.
- Room: D5-007
- Location: Campus Nord UPC, Module D5, Room 010
- Course on Atenea