Deep Reinforcement Learning

Playing Atari with Deep Reinforcement Learning

Give the pseudocode for Deep Q-learning with Experience Replay.

Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ
Initialize target action-value function Q̂ with weights θ^- = θ
for episode = 1, M do
    Initialize sequence s_1 = {x_1} and preprocessed sequence ϕ_1 = ϕ(s_1)
    for t = 1, T do
        With probability ϵ select a random action a_t
        Otherwise select a_t = argmax_a Q(ϕ(s_t), a; θ)
        Execute action a_t in emulator and observe reward r_t and image x_{t+1}
        Set s_{t+1} = s_t, a_t, x_{t+1} and preprocess ϕ_{t+1} = ϕ(s_{t+1})
        Store transition (ϕ_t, a_t, r_t, ϕ_{t+1}) in D
        Sample random minibatch of transitions (ϕ_j, a_j, r_j, ϕ_{j+1}) from D
        Set y_j = r_j                                    if episode terminates at step j+1
                  r_j + γ max_{a'} Q̂(ϕ_{j+1}, a'; θ^-)   otherwise
        Perform a gradient descent step on (y_j − Q(ϕ_j, a_j; θ))^2 with respect to the network parameters θ
        Every C steps reset Q̂ = Q
    End for
End for
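
The action selection, target, and loss steps of the pseudocode can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: the transition layout (ϕ, a, r, ϕ', done), the function names, and the representation of Q as a callable returning one value per action are all assumptions made for the example. The target values come from a separate function q_target; passing the same function as q corresponds to training without a target network.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical transition layout: (phi, action, reward, phi_next, done).
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]
# Hypothetical Q-function interface: phi -> list with one value per action.
QFunction = Callable[[Sequence[float]], List[float]]


def select_action(q: QFunction, phi: Sequence[float],
                  epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy: random action with probability epsilon, else argmax_a Q(phi, a)."""
    values = q(phi)
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def td_targets(batch: List[Transition], q_target: QFunction,
               gamma: float) -> List[float]:
    """y_j = r_j if the episode ended, else r_j + gamma * max_a' Q_target(phi_{j+1}, a')."""
    return [r if done else r + gamma * max(q_target(phi_next))
            for (_phi, _a, r, phi_next, done) in batch]


def squared_td_loss(batch: List[Transition], targets: List[float],
                    q: QFunction) -> float:
    """Mean of (y_j - Q(phi_j, a_j))^2 over the minibatch."""
    errors = [(y - q(phi)[a]) ** 2
              for (phi, a, _r, _next, _done), y in zip(batch, targets)]
    return sum(errors) / len(errors)
```

In a full training loop, the gradient of squared_td_loss with respect to the parameters of q would drive the update, with the targets treated as constants.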

Why do we create a replay memory in Deep Q-learning?

Experience replay in Deep Q-learning has two functions:
1. Make more efficient use of the experiences collected during training. In standard online RL, the agent interacts with the environment, collects an experience (state, action, reward, and next state), learns from it (updates the neural network), and then discards it. Each experience is therefore used for only one update, which is inefficient.

Experience replay addresses this with a replay buffer that stores experience samples so they can be reused during training.
This allows the agent to learn from the same experiences multiple times.

2. Avoid forgetting previous experiences and reduce the correlation between experiences.
Because the replay buffer keeps older experiences available for sampling, the network continues to train on them rather than only on the most recent ones. Sampling the experiences at random also removes correlation in the observation sequences, which helps prevent action values from oscillating or diverging catastrophically.
See also: https://huggingface.co/deep-rl-course/unit3/deep-q-algorithm
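
The two functions above can be illustrated with a small replay buffer. This is a sketch under assumptions: the class name ReplayBuffer, its method names, and the seed parameter are hypothetical and chosen for the example. A bounded deque plays the role of the memory D with capacity N, and uniform random sampling draws minibatches from it.

```python
import random
from collections import deque
from typing import Any, List, Optional


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self._storage: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, transition: Any) -> None:
        """Append one transition, e.g. (phi_t, a_t, r_t, phi_next, done)."""
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        """Draw batch_size distinct transitions uniformly at random.

        Sampling ignores the order in which transitions were stored, so
        consecutive (correlated) steps rarely end up in the same minibatch.
        """
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)
```

A transition stays in the buffer until it is evicted, so it can appear in many minibatches over the course of training. Note that sample raises ValueError if batch_size exceeds the number of stored transitions, so a training loop would typically wait until the buffer holds at least one minibatch.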

Machine Learning Research Flashcards is a collection of flashcards associated with scientific research papers in the field of machine learning. Best used with Anki.