Experience Replay

coax.experience_replay.SimpleReplayBuffer

A simple ring buffer for experience replay.

coax.experience_replay.PrioritizedReplayBuffer

A simple ring buffer for experience replay, with prioritized sampling.


This is where we keep our experience-replay buffer classes. For specific examples of agents that use a replay buffer, have a look at the agents for Atari games.

Object Reference

class coax.experience_replay.SimpleReplayBuffer(capacity, random_seed=None)[source]

A simple ring buffer for experience replay.

Parameters:
  • capacity (positive int) – The capacity of the experience replay buffer.

  • random_seed (int, optional) – To get reproducible results.
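
A minimal construction sketch; the capacity and seed values here are arbitrary, chosen only for illustration:

    import coax

    buffer = coax.experience_replay.SimpleReplayBuffer(capacity=100000, random_seed=13)
    print(len(buffer))  # the buffer starts out empty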

add(transition_batch)[source]

Add a transition to the experience replay buffer.

Parameters:

transition_batch (TransitionBatch) – A TransitionBatch object.
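
Transitions are typically produced by a reward tracer rather than constructed by hand. A sketch of that pattern, continuing from the construction sketch above and assuming coax.reward_tracing.NStep as the tracer (the n and gamma values are illustrative):

    # continuing from the construction sketch above
    tracer = coax.reward_tracing.NStep(n=1, gamma=0.99)

    # placeholder transition; in practice these come from the environment loop
    s, a, r, done = 0, 1, 1.0, True

    tracer.add(s, a, r, done)
    while tracer:
        buffer.add(tracer.pop())  # pop() returns a TransitionBatch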

clear()[source]

Clear the experience replay buffer.

sample(batch_size=32)[source]

Get a batch of transitions to be used for bootstrapped updates.

Parameters:

batch_size (positive int, optional) – The desired batch size of the sample.

Returns:

transitions (TransitionBatch) – A TransitionBatch object.
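
Once the buffer holds enough transitions, sampled batches are typically fed to an updater. A sketch under that assumption, with a coax.td_learning.QLearning updater named qlearning set up elsewhere:

    # assumes a value-function updater, e.g. qlearning = coax.td_learning.QLearning(q),
    # has been constructed elsewhere
    if len(buffer) >= 32:
        transition_batch = buffer.sample(batch_size=32)
        metrics = qlearning.update(transition_batch)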

class coax.experience_replay.PrioritizedReplayBuffer(capacity, alpha=1.0, beta=1.0, epsilon=0.0001, random_seed=None)[source]

A simple ring buffer for experience replay, with prioritized sampling.

This class uses proportional sampling, which means that the transitions are sampled with relative probability \(p_i\) defined as:

\[p_i\ =\ \frac {\left(|\mathcal{A}_i| + \epsilon\right)^\alpha} {\sum_{j=1}^N \left(|\mathcal{A}_j| + \epsilon\right)^\alpha}\]

Here \(\mathcal{A}_i\) are advantages provided at insertion time and \(N\) is the capacity of the buffer, which may be quite large. The \(\mathcal{A}_i\) are typically just TD errors collected from a value-function updater, e.g. QLearning.td_error.

Since the prioritized samples are biased, the sample method also produces non-trivial importance weights (stored in the TransitionBatch.W attribute). The logic for constructing these weights for a sample of batch size \(n\) is:

\[w_i\ =\ \frac{\left(Np_i\right)^{-\beta}}{\max_{j=1}^n \left(Np_j\right)^{-\beta}}\]

See section 3.4 of https://arxiv.org/abs/1511.05952 for more details.
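
As a toy illustration of the two formulas above (this is not the buffer's internal implementation; the advantages and hyperparameters are made up):

    import numpy as np

    Adv = np.array([0.5, -2.0, 0.1, 1.5])   # advantages (e.g. TD errors) at insertion
    alpha, beta, epsilon = 0.6, 0.4, 1e-4

    # priorities and relative sampling probabilities p_i
    priorities = (np.abs(Adv) + epsilon) ** alpha
    p = priorities / priorities.sum()

    # importance weights w_i, normalized so that max(w) == 1
    N = len(Adv)  # stands in for the buffer size here
    w = (N * p) ** (-beta)
    w = w / w.max()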

Parameters:
  • capacity (positive int) – The capacity of the experience replay buffer.

  • alpha (positive float, optional) – The sampling temperature \(\alpha>0\).

  • beta (positive float, optional) – The importance-weight exponent \(\beta>0\).

  • epsilon (positive float, optional) – The small regulator \(\epsilon>0\).

  • random_seed (int, optional) – To get reproducible results.
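
A construction-and-insertion sketch; here the advantages are taken to be TD errors from a QLearning updater, with tracer and qlearning assumed to be set up as in the SimpleReplayBuffer sketches above, and the hyperparameters are illustrative:

    import coax

    buffer = coax.experience_replay.PrioritizedReplayBuffer(
        capacity=100000, alpha=0.6, beta=0.4)

    # inside the environment loop; tracer and qlearning set up elsewhere
    tracer.add(s, a, r, done)
    while tracer:
        transition_batch = tracer.pop()
        buffer.add(transition_batch, Adv=qlearning.td_error(transition_batch))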

add(transition_batch, Adv)[source]

Add a transition to the experience replay buffer.

Parameters:
  • transition_batch (TransitionBatch) – A TransitionBatch object.

  • Adv (ndarray) – A batch of advantages, used to construct the priorities.

clear()[source]

Clear the experience replay buffer.

sample(batch_size=32)[source]

Get a batch of transitions to be used for bootstrapped updates.

Parameters:

batch_size (positive int, optional) – The desired batch size of the sample.

Returns:

transitions (TransitionBatch) – A TransitionBatch object.

update(idx, Adv)[source]

Update the priority weights of transitions previously added to the buffer.

Parameters:
  • idx (1d array of ints) – The identifiers of the transitions to be updated.

  • Adv (ndarray) – The corresponding updated advantages.
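
A sketch of the full prioritized-replay cycle: sample a batch, run an update, then refresh the priorities of the sampled transitions. It assumes the updater can return per-transition TD errors (e.g. via a return_td_error flag on QLearning.update) and that the sampled TransitionBatch carries its buffer indices in an idx attribute; both are assumptions of this sketch, not part of this reference:

    transition_batch = buffer.sample(batch_size=32)  # importance weights are in transition_batch.W

    # assumed updater API: metrics plus per-transition TD errors
    metrics, td_error = qlearning.update(transition_batch, return_td_error=True)

    # refresh the priorities of the transitions we just sampled
    buffer.update(transition_batch.idx, td_error)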