Experience Replay

coax.experience_replay.SimpleReplayBuffer

A simple ring buffer for experience replay.

coax.experience_replay.PrioritizedReplayBuffer

A simple ring buffer for experience replay, with prioritized sampling.


This is where we keep our experience-replay buffer classes. For specific examples of agents that use a replay buffer, have a look at the agents for Atari games.

Object Reference

class coax.experience_replay.SimpleReplayBuffer(capacity, random_seed=None)[source]

A simple ring buffer for experience replay.

Parameters:
  • capacity (positive int) – The capacity of the experience replay buffer.

  • random_seed (int, optional) – To get reproducible results.
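
A minimal construction sketch; the capacity and seed values here are arbitrary, chosen only for illustration:

    import coax

    buffer = coax.experience_replay.SimpleReplayBuffer(capacity=100000, random_seed=13)
    print(len(buffer))  # the buffer starts out empty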

add(transition_batch)[source]

Add a transition to the experience replay buffer.

Parameters:

transition_batch (TransitionBatch) – A TransitionBatch object.
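
Transitions are typically produced by a reward tracer rather than constructed by hand. A sketch of that pattern, continuing from the construction sketch above and assuming coax.reward_tracing.NStep as the tracer (the n and gamma values are illustrative):

    # continuing from the construction sketch above
    tracer = coax.reward_tracing.NStep(n=1, gamma=0.99)

    # placeholder transition; in practice these come from the environment loop
    s, a, r, done = 0, 1, 1.0, True

    tracer.add(s, a, r, done)
    while tracer:
        buffer.add(tracer.pop())  # pop() returns a TransitionBatch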

clear()[source]

Clear the experience replay buffer.

sample(batch_size=32)[source]

Get a batch of transitions to be used for bootstrapped updates.

Parameters:

batch_size (positive int, optional) – The desired batch size of the sample.

Returns:

transitions (TransitionBatch) – A TransitionBatch object.
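
Once the buffer holds enough transitions, sampled batches are typically fed to an updater. A sketch under that assumption, with a coax.td_learning.QLearning updater named qlearning set up elsewhere:

    # assumes a value-function updater, e.g. qlearning = coax.td_learning.QLearning(q),
    # has been constructed elsewhere
    if len(buffer) >= 32:
        transition_batch = buffer.sample(batch_size=32)
        metrics = qlearning.update(transition_batch)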

class coax.experience_replay.PrioritizedReplayBuffer(capacity, alpha=1.0, beta=1.0, epsilon=0.0001, random_seed=None)[source]

A simple ring buffer for experience replay, with prioritized sampling.

This class uses proportional sampling, which means that the transitions are sampled with relative probability \(p_i\) defined as:

\[p_i\ =\ \frac {\left(|\mathcal{A}_i| + \epsilon\right)^\alpha} {\sum_{j=1}^N \left(|\mathcal{A}_j| + \epsilon\right)^\alpha}\]

Here \(\mathcal{A}_i\) are advantages provided at insertion time and \(N\) is the capacity of the buffer, which may be quite large. The \(\mathcal{A}_i\) are typically just TD errors collected from a value-function updater, e.g. QLearning.td_error.

Since the prioritized samples are biased, the sample method also produces non-trivial importance weights (stored in the TransitionBatch.W attribute). The logic for constructing these weights for a sample of batch size \(n\) is:

\[w_i\ =\ \frac{\left(Np_i\right)^{-\beta}}{\max_{j=1}^n \left(Np_j\right)^{-\beta}}\]

See section 3.4 of https://arxiv.org/abs/1511.05952 for more details.
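
As a toy illustration of the two formulas above (this is not the buffer's internal implementation; the advantages and hyperparameters are made up):

    import numpy as np

    Adv = np.array([0.5, -2.0, 0.1, 1.5])   # advantages (e.g. TD errors) at insertion
    alpha, beta, epsilon = 0.6, 0.4, 1e-4

    # priorities and relative sampling probabilities p_i
    priorities = (np.abs(Adv) + epsilon) ** alpha
    p = priorities / priorities.sum()

    # importance weights w_i, normalized so that max(w) == 1
    N = len(Adv)  # stands in for the buffer size here
    w = (N * p) ** (-beta)
    w = w / w.max()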

Parameters:
  • capacity (positive int) – The capacity of the experience replay buffer.

  • alpha (positive float, optional) – The sampling temperature \(\alpha>0\).

  • beta (positive float, optional) – The importance-weight exponent \(\beta>0\).

  • epsilon (positive float, optional) – The small regulator \(\epsilon>0\).

  • random_seed (int, optional) – To get reproducible results.
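
A construction-and-insertion sketch; here the advantages are taken to be TD errors from a QLearning updater, with tracer and qlearning assumed to be set up as in the SimpleReplayBuffer sketches above, and the hyperparameters are illustrative:

    import coax

    buffer = coax.experience_replay.PrioritizedReplayBuffer(
        capacity=100000, alpha=0.6, beta=0.4)

    # inside the environment loop; tracer and qlearning set up elsewhere
    tracer.add(s, a, r, done)
    while tracer:
        transition_batch = tracer.pop()
        buffer.add(transition_batch, Adv=qlearning.td_error(transition_batch))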

add(transition_batch, Adv)[source]

Add a transition to the experience replay buffer.

Parameters:
  • transition_batch (TransitionBatch) – A TransitionBatch object.

  • Adv (ndarray) – A batch of advantages, used to construct the priorities.

clear()[source]

Clear the experience replay buffer.

sample(batch_size=32)[source]

Get a batch of transitions to be used for bootstrapped updates.

Parameters:

batch_size (positive int, optional) – The desired batch size of the sample.

Returns:

transitions (TransitionBatch) – A TransitionBatch object.

update(idx, Adv)[source]

Update the priority weights of transitions previously added to the buffer.

Parameters:
  • idx (1d array of ints) – The identifiers of the transitions to be updated.

  • Adv (ndarray) – The corresponding updated advantages.
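
A sketch of the full prioritized-replay cycle: sample a batch, run an update, then refresh the priorities of the sampled transitions. It assumes the updater can return per-transition TD errors (e.g. via a return_td_error flag on QLearning.update) and that the sampled TransitionBatch carries its buffer indices in an idx attribute; both are assumptions of this sketch, not part of this reference:

    transition_batch = buffer.sample(batch_size=32)  # importance weights are in transition_batch.W

    # assumed updater API: metrics plus per-transition TD errors
    metrics, td_error = qlearning.update(transition_batch, return_td_error=True)

    # refresh the priorities of the transitions we just sampled
    buffer.update(transition_batch.idx, td_error)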