Experience Replay¶
This is where we keep our experience replay buffer classes:

SimpleReplayBuffer – A simple ring buffer for experience replay.
PrioritizedReplayBuffer – A simple ring buffer for experience replay, with prioritized sampling.

For specific examples of agents that use a replay buffer, have a look at the agents for Atari games.
Object Reference¶
 class coax.experience_replay.SimpleReplayBuffer(capacity, random_seed=None)[source]¶
A simple ring buffer for experience replay.
 Parameters:
capacity (positive int) – The capacity of the experience replay buffer.
random_seed (int, optional) – To get reproducible results.
 add(transition_batch)[source]¶
Add a transition to the experience replay buffer.
 Parameters:
transition_batch (TransitionBatch) – A TransitionBatch object.
 sample(batch_size=32)[source]¶
Get a batch of transitions to be used for bootstrapped updates.
 Parameters:
batch_size (positive int, optional) – The desired batch size of the sample.
 Returns:
transitions (TransitionBatch) – A TransitionBatch object.
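The ring-buffer behaviour described above can be sketched in a few lines of plain Python. This is a hypothetical minimal implementation (the class name and internals are illustrative, not coax's actual code); it shows how the oldest transitions are overwritten once the capacity is reached and how sample draws uniformly at random:

```python
import random

class RingReplayBuffer:
    """Minimal sketch of a fixed-capacity ring buffer (not coax's implementation)."""
    def __init__(self, capacity, random_seed=None):
        self.capacity = capacity
        self._storage = [None] * capacity
        self._index = 0   # next write position
        self._size = 0    # number of transitions stored so far
        self._rnd = random.Random(random_seed)

    def add(self, transition):
        # overwrite the oldest entry once the buffer is full
        self._storage[self._index] = transition
        self._index = (self._index + 1) % self.capacity
        self._size = min(self._size + 1, self.capacity)

    def sample(self, batch_size=32):
        # uniform sampling with replacement
        return [self._storage[self._rnd.randrange(self._size)]
                for _ in range(batch_size)]

    def __len__(self):
        return self._size
```

Once the buffer wraps around, only the most recent capacity transitions remain, which is exactly the behaviour one wants for on-disk-cheap, off-policy replay.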
 class coax.experience_replay.PrioritizedReplayBuffer(capacity, alpha=1.0, beta=1.0, epsilon=0.0001, random_seed=None)[source]¶
A simple ring buffer for experience replay, with prioritized sampling.
This class uses proportional sampling, which means that the transitions are sampled with relative probability \(p_i\) defined as:
\[p_i\ =\ \frac {\left(\mathcal{A}_i + \epsilon\right)^\alpha} {\sum_{j=1}^N \left(\mathcal{A}_j + \epsilon\right)^\alpha}\]
Here \(\mathcal{A}_i\) are advantages provided at insertion time and \(N\) is the capacity of the buffer, which may be quite large. The \(\mathcal{A}_i\) are typically just TD errors collected from a value-function updater, e.g. QLearning.td_error.
Since the prioritized samples are biased, the sample method also produces non-trivial importance weights (stored in the TransitionBatch.W attribute). The logic for constructing these weights for a sample of batch size \(n\) is:
\[w_i\ =\ \frac{\left(Np_i\right)^{-\beta}}{\max_{j=1}^n \left(Np_j\right)^{-\beta}}\]
See section 3.4 of https://arxiv.org/abs/1511.05952 for more details.
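To make the two formulas concrete, the sampling probabilities and importance weights could be computed with NumPy as follows. This is a sketch of the math above, not coax's internal sum-tree implementation; following the proportional variant of the paper, the advantages are taken in absolute value here since raw TD errors can be negative:

```python
import numpy as np

def priorities(adv, alpha=1.0, epsilon=1e-4):
    # p_i ∝ (|A_i| + ε)^α, normalized over all stored transitions
    scores = (np.abs(adv) + epsilon) ** alpha
    return scores / scores.sum()

def importance_weights(p_sampled, num_transitions, beta=1.0):
    # w_i = (N p_i)^{-β}, normalized so that max_j w_j = 1
    w = (num_transitions * p_sampled) ** -beta
    return w / w.max()
```

With beta=1 the weights fully correct the sampling bias (up to the max-normalization); beta<1 anneals the correction, as in the paper.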
 Parameters:
capacity (positive int) – The capacity of the experience replay buffer.
alpha (positive float, optional) – The sampling temperature \(\alpha>0\).
beta (positive float, optional) – The importance-weight exponent \(\beta>0\).
epsilon (positive float, optional) – The small regulator \(\epsilon>0\).
random_seed (int, optional) – To get reproducible results.
 add(transition_batch, Adv)[source]¶
Add a transition to the experience replay buffer.
 Parameters:
transition_batch (TransitionBatch) – A TransitionBatch object.
Adv (ndarray) – A batch of advantages, used to construct the priorities \(p_i\).
 sample(batch_size=32)[source]¶
Get a batch of transitions to be used for bootstrapped updates.
 Parameters:
batch_size (positive int, optional) – The desired batch size of the sample.
 Returns:
transitions (TransitionBatch) – A TransitionBatch object.
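Putting the pieces together, a toy version of proportional prioritized replay might look like the sketch below. All names are hypothetical; coax itself uses a sum-tree so that sampling costs \(O(\log N)\), whereas this naive version recomputes the full distribution on every call:

```python
import numpy as np

class TinyPrioritizedBuffer:
    """Toy proportional prioritized replay (not coax's sum-tree implementation)."""
    def __init__(self, capacity, alpha=1.0, beta=1.0, epsilon=1e-4, random_seed=None):
        self.capacity = capacity
        self.alpha, self.beta, self.epsilon = alpha, beta, epsilon
        self._storage, self._priorities = [], []
        self._rng = np.random.default_rng(random_seed)

    def add(self, transition, adv):
        if len(self._storage) >= self.capacity:  # ring-buffer overwrite
            self._storage.pop(0)
            self._priorities.pop(0)
        self._storage.append(transition)
        # priority score (|A_i| + ε)^α; normalization happens at sample time
        self._priorities.append((abs(adv) + self.epsilon) ** self.alpha)

    def sample(self, batch_size=32):
        p = np.array(self._priorities)
        p = p / p.sum()                      # p_i over the stored transitions
        idx = self._rng.choice(len(self._storage), size=batch_size, p=p)
        n = len(self._storage)
        w = (n * p[idx]) ** -self.beta       # importance weights (N p_i)^{-β}
        w = w / w.max()                      # normalized so max_j w_j = 1
        return [self._storage[i] for i in idx], w
```

Transitions with large advantages are sampled far more often, and the returned weights down-weight exactly those over-sampled transitions during the bootstrapped update.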