Reinforcement Learning (RO5102 T)

The lecture Reinforcement Learning is part of the module Robot Learning (RO4100).

In the winter semester, Prof. Dr. Elmar Rueckert is teaching the course Probabilistic Machine Learning (RO5101 T).

In the summer semester, Prof. Dr. Elmar Rueckert is teaching the course Reinforcement Learning (RO5102 T).

Important: Due to the study regulations, students have to attend both lectures to receive a final grade. For the Probabilistic Machine Learning (PML) lecture, there will be a single written exam. For this Reinforcement Learning (RL) lecture, students will have an oral exam after presenting their work.

The course topics are

  1. Introduction to Robotics and Reinforcement Learning (Refresher on Robotics, kinematics, model learning and learning feedback control strategies).
  2. Foundations of Decision Making (Reward Hypothesis, Markov Property, Markov Reward Process, Value Iteration, Markov Decision Process, Policy Iteration, Bellman Equation, Link to Optimal Control).
  3. Principles of Reinforcement Learning (Exploration and Exploitation strategies, On & Off-policy learning, model-free and model-based policy learning, Algorithmic principles: Q-Learning, SARSA, TD-Learning, Function Approximation, Fitted Q-Iteration).
  4. Deep Reinforcement Learning (Introduction to Deep Networks, Stochastic Gradient Descent, Deep Q-Learning, Recent research results in Stochastic Deep Neural Networks).
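
As a rough illustration of the dynamic-programming ideas listed under topic 2, value iteration on a tiny Markov Decision Process can be sketched as follows. This is not taken from the course materials: the function names `value_iteration` and `toy_chain`, the three-state chain, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a finite MDP (illustrative sketch).

    P: array of shape (A, S, S) with P[a, s, t] = Pr(next state t | state s, action a).
    R: array of shape (S, A) with the expected immediate reward.
    Returns the value function V and a greedy policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

def toy_chain():
    """Hypothetical 3-state chain: actions 0 = left, 1 = right; state 2 is absorbing."""
    P = np.zeros((2, 3, 3))
    P[0, 0, 0] = P[0, 1, 0] = P[0, 2, 2] = 1.0   # moving left
    P[1, 0, 1] = P[1, 1, 2] = P[1, 2, 2] = 1.0   # moving right
    R = np.zeros((3, 2))
    R[1, 1] = 1.0                                # reward for stepping from state 1 into state 2
    return P, R

if __name__ == "__main__":
    V, policy = value_iteration(*toy_chain())
    print("V =", V, "policy =", policy)
```

In this toy chain, the greedy policy moves right in states 0 and 1, since the only reward is collected when entering the absorbing state.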

The learning objectives / qualifications are

  • Students get a comprehensive understanding of basic decision making theories, assumptions and methods.
  • Students learn to analyze the challenges in a reinforcement learning application and to identify promising learning approaches.
  • Students will understand the difference between deterministic and probabilistic policies and can define underlying assumptions and requirements for learning them.
  • Students understand and can apply advanced policy gradient methods to real world problems.
  • Students know how to analyze the learning results and improve the policy learner parameters.
  • Students understand how the basic concepts are used in current state of the art research in robot reinforcement learning and in deep neural networks.

Follow this link to register for the course:

Location & Times

  • The lecture is organized as a block lecture followed by four to five presentation events during the semester.
  • Monday – Thursday, 23rd till 26th of March 2020 09:00-18:00 Seminarraum 2/3 (Cook / Karp), rooms 68 + 69, building 64, ground floor
  • Tuesday, 31st of March 2020 09:00-18:00 Seminarraum 2/3 (Cook / Karp), rooms 68 + 69, building 64, ground floor
  • Presentations & Oral Exams: to be decided (TBD). Four events are planned during the semester, at which four teams (of three students each) present.


Strong statistical and mathematical knowledge is required. It is highly recommended to attend the courses Humanoid Robotics (RO5300) and Probabilistic Machine Learning (RO5101 T) before attending this course. Students will also experiment with state-of-the-art reinforcement learning methods and robotic simulation tools, which require strong programming skills.


The course will be organized as a block lecture (5 full days) plus additional project work and final presentations. Details will be presented in the first course unit on March 23rd, 2020, at 9:15 in S2/S3 in building 64, basement.

Course dates & materials (tentative schedule)

Dates & Times           Type    Topics                                                       Links
23.03.2020 09:15-10:45  VO      An Introduction to Robotics and Reinforcement Learning
23.03.2020 11:00-12:30  UE      Tutorial on Python and OpenAI
23.03.2020 13:30-15:00  VO      Foundations of Decision Making
23.03.2020 15:30-18:00  UE      Decision Making Examples
24.03.2020 09:15-10:45  VO      Policy Iteration, Bellman Equation, Link to Optimal Control
24.03.2020 11:15-18:00  VO+UE   Basic RL Algorithms for discrete spaces
25.03.2020 09:15-10:45  VO      Principles of Reinforcement Learning
25.03.2020 11:15-18:00  VO+UE   Principles of Reinforcement Learning
26.03.2020 09:15-10:45  VO      Policy Search Methods and Contextual Policy Search
26.03.2020 11:15-18:00  VO+UE   Policy Search Methods and Contextual Policy Search
31.03.2020 09:15-10:45  VO      Deep Reinforcement Learning
31.03.2020 11:15-18:00  VO+UE   Deep Reinforcement Learning
May 2020                VO      Presentation Slot
May 2020                VO      Presentation Slot
June 2020               VO      Presentation Slot
June 2020               VO      Presentation Slot
July 2020               VO      Presentation Slot

VO = lecture (Vorlesung), UE = exercise (Übung).

Lecturer:
Prof. Dr. Elmar Rueckert
Teaching Assistant:
Honghu Xue, M.Sc.
Language:
English only


  • Richard S. Sutton, Andrew Barto: Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts, 1998.
  • Csaba Szepesvári: Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2010.
  • B. Siciliano, L. Sciavicco: Robotics: Modelling, Planning and Control. Springer, 2009.
  • Martin L. Puterman: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.

Bonus Points and Group Numbers

We will provide links to the list of bonus points as well as the list of group numbers here (coming soon).

Materials for the exercise

The course is accompanied by three pieces of coursework: policy search for discrete state and action spaces (grid world example), policy learning in continuous spaces using function approximation, and policy gradient methods in challenging simulated robotic tasks. The assignments include both written tasks and algorithmic implementations in Python and will be presented during the exercise sessions. The OpenAI Gym platform will be used in the project work.
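
To give an impression of what an algorithmic implementation for a discrete grid world might look like, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It is not the actual assignment and deliberately does not use OpenAI Gym, so that it stays self-contained. The 4x4 grid, the reward of -1 per step, the hyperparameters, and the names `step`, `q_learning` and `greedy_path` are all assumptions chosen for illustration.

```python
import numpy as np

SIZE = 4                                      # hypothetical 4x4 grid, states are (row, col)
START, GOAL = (0, 0), (3, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def step(state, action):
    """Deterministic transition; bumping into a wall leaves the state unchanged."""
    row = min(max(state[0] + MOVES[action][0], 0), SIZE - 1)
    col = min(max(state[1] + MOVES[action][1], 0), SIZE - 1)
    nxt = (row, col)
    return nxt, -1.0, nxt == GOAL             # reward of -1 per step, episode ends at the goal

def q_learning(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning (off-policy TD control) with epsilon-greedy exploration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((SIZE, SIZE, len(MOVES)))
    for _ in range(episodes):
        state, done, t = START, False, 0
        while not done and t < 200:           # cap the episode length
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))   # explore
            else:
                action = int(np.argmax(Q[state]))        # exploit
            nxt, reward, done = step(state, action)
            # The target bootstraps from the greedy value of the next state.
            target = reward + (0.0 if done else gamma * Q[nxt].max())
            row, col = state
            Q[row, col, action] += alpha * (target - Q[row, col, action])
            state, t = nxt, t + 1
    return Q

def greedy_path(Q, max_steps=50):
    """Follow the greedy policy from START and return the visited states."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        state, _reward, _done = step(state, int(np.argmax(Q[state])))
        path.append(state)
    return path

if __name__ == "__main__":
    print(greedy_path(q_learning()))
```

Because every step costs -1 and the table starts at zero, untried actions initially look attractive, which encourages exploration in addition to the epsilon-greedy rule.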