Data-efficient and Deep RL

Bringing reinforcement learning (RL) methods from an appealing academic concept closer to real-world control applications has been one of my major research goals since 1994. Early research concentrated on new RL methods for continuous state spaces, using neural networks as continuous Q-function approximators. One key focus was improving data efficiency by massively re-using stored transition data, which led to Neural Fitted Q Iteration (NFQ; Riedmiller, ECML 2005) and a variant for continuous actions, Neural Fitted Q for Continuous Actions (NFQCA; Hafner and Riedmiller, MLJ 2011). Our Deep Fitted Q algorithm (DFQ; Lange and Riedmiller, IJCNN 2010) was one of the early examples of a deep RL method controlling a real-world system from raw camera input. The DQN agent (Mnih et al., 2015) was the first to learn 49 different Atari games from raw pixels using a single agent architecture.
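The batch re-use of stored transitions at the heart of NFQ can be illustrated as fitted Q iteration: in each iteration, regression targets are computed from the previous Q estimate over the *entire* stored data set, and the function approximator is refit from scratch. The sketch below is purely illustrative and not the original implementation: a least-squares fit on one-hot features stands in for the neural network (NFQ trains a multilayer perceptron with Rprop), and the four-state chain MDP is a hypothetical toy problem.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

def one_hot(s, a):
    # Toy featurisation of a (state, action) pair; a neural network
    # over continuous states would be used in the real NFQ setting.
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def nfq_iteration(transitions, w):
    # One fitted Q iteration: build targets from the stored batch
    # using the previous Q estimate, then refit the approximator.
    X, y = [], []
    for s, a, r, s_next, terminal in transitions:
        if terminal:
            target = r
        else:
            target = r + GAMMA * max(one_hot(s_next, b) @ w
                                     for b in range(N_ACTIONS))
        X.append(one_hot(s, a))
        y.append(target)
    w_new, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return w_new

# Hypothetical chain MDP: action 1 moves right, action 0 moves left;
# reaching state 3 is terminal and yields reward 1.
transitions = []
for s in range(3):
    for a in range(N_ACTIONS):
        s_next = min(s + 1, 3) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 3 else 0.0
        transitions.append((s, a, r, s_next, s_next == 3))

# Re-use the same stored batch in every iteration.
q_weights = np.zeros(N_STATES * N_ACTIONS)
for _ in range(50):
    q_weights = nfq_iteration(transitions, q_weights)
```

On this toy problem the iteration converges to the exact optimal Q-values (e.g. Q(2, right) = 1 and Q(0, right) = 0.81 with discount 0.9); the point is that every iteration exploits the full stored experience rather than a single fresh transition.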


Demonstration of a cart learning to swing up and balance a straight pole from scratch.

A car learning to drive using NFQ in less than 20 minutes of real driving experience (with colleagues at Stanford, 2006).

Deep Fitted Q (DFQ, Lange and Riedmiller, 2010): Learning to control a race car from raw pixel inputs. One of the world's first deep RL agents learning on a real system.

Regulating the speed of a toy car (2008) by NFQ. The goal is to maximise speed while not being hurled off the track. In this video, the car position was given as an input to the neural controller.

Embed to Control (E2C): Linear Dynamics Model from Raw Images (Watter et al., NeurIPS 2015)