Workers¶
This module provides the abstractions required for building distributed agents. The coax.Worker class is at the heart of such agents. The way this works in coax is to define a class derived from coax.Worker and then to create multiple instances of that class, which can play different roles. For instance, have a look at the implementation of an Ape-X DQN agent here.
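To make this pattern concrete, here is a minimal sketch of a concrete Worker subclass. Only coax.Worker and its abstract methods come from the reference below; the class name MyWorker and all method bodies are illustrative assumptions, not coax's own Ape-X implementation.

```python
import coax


class MyWorker(coax.Worker):
    """A minimal sketch of a concrete Worker; method bodies are illustrative."""

    def get_state(self):
        # Return whatever must be shared between workers; a plain dict is
        # used here as a stand-in for e.g. function-approximator params.
        return getattr(self, '_shared_state', {})

    def set_state(self, state):
        self._shared_state = state

    def trace(self, s, a, r, done, logp=0.0, w=1.0):
        # Left as a stub here; see the fuller trace() sketch further below.
        pass
```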
Object Reference¶
- class coax.Worker(env, param_store=None, pi=None, tracer=None, buffer=None, buffer_warmup=None, name=None)[source]¶
The base class for defining workers as part of a distributed agent.
- Parameters:
env (gymnasium.Env | str | function) – Specifies the gymnasium-style environment by passing either the env itself (gymnasium.Env), its name (str), or a function that generates the environment.
param_store (Worker, optional) – A distributed agent is presumed to have one worker that plays the role of a parameter store. To define the parameter-store worker itself, leave param_store=None. For all other worker roles, however, param_store must be provided.
pi (Policy, optional) – The behavior policy that is used by rollout workers to generate experience.
tracer (RewardTracer, optional) – The reward tracer that is used by rollout workers.
buffer (ReplayBuffer, optional) – The experience-replay buffer that is populated by rollout workers and sampled from by learners.
buffer_warmup (int, optional) – The warmup period for the experience replay buffer, i.e. the minimal number of transitions that need to be stored in the replay buffer before we start sampling from it.
name (str, optional) – A human-readable identifier of the worker.
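As a sketch of the constructor in use (assuming the hypothetical MyWorker subclass above), the parameter-store worker is simply the instance constructed with param_store=None, while every other worker points back at it; the behavior policy pi is omitted here for brevity:

```python
import coax

# A sketch, assuming the hypothetical MyWorker subclass from above.
# The parameter-store worker is the one constructed with param_store=None.
param_store = MyWorker('CartPole-v1', name='param_store')

# Rollout workers and learners point back at the parameter store. The
# behavior policy `pi` is omitted here for brevity.
tracer = coax.reward_tracing.NStep(n=1, gamma=0.99)
buffer = coax.experience_replay.SimpleReplayBuffer(capacity=100000)
rollout = MyWorker(
    'CartPole-v1', param_store=param_store,
    tracer=tracer, buffer=buffer, buffer_warmup=1000, name='rollout_0')
```

The tracer, buffer, and buffer_warmup arguments only matter for workers that generate or consume experience, which is why they are optional.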
- abstract get_state()[source]¶
Get the internal state that is shared between workers.
- Returns:
state (object) – The internal state. This will be consumed by set_state(state).
- abstract set_state(state)[source]¶
Set the internal state that is shared between workers.
- Parameters:
state (object) – The internal state, as returned by get_state().
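A sketch of how this pair is typically used, assuming the constructor stores its param_store argument under an attribute of the same name (an assumption, not a documented API): rollout workers pull the latest shared state from the parameter store, while learners push their updates back.

```python
# A sketch of the typical synchronization flow (attribute names assumed):
def pull_shared_state(worker):
    # A rollout worker refreshes its copy from the parameter store.
    worker.set_state(worker.param_store.get_state())

def push_shared_state(worker):
    # A learner publishes its updated state to the parameter store.
    worker.param_store.set_state(worker.get_state())
```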
- abstract trace(s, a, r, done, logp=0.0, w=1.0)[source]¶
This implements the reward-tracing step of a single, raw transition.
- Parameters:
s (state observation) – A single state observation.
a (action) – A single action.
r (float) – A single observed reward.
done (bool) – Whether the episode has finished.
logp (float, optional) – The log-propensity \(\log\pi(a|s)\).
w (float, optional) – Sample weight associated with the given state-action pair.
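A possible implementation sketch, assuming the tracer and buffer passed to the constructor are available as self.tracer and self.buffer (an assumption), and that the tracer exposes the add/pop interface of coax's reward tracers:

```python
def trace(self, s, a, r, done, logp=0.0, w=1.0):
    # A sketch: feed the raw transition into the reward tracer (mirroring
    # this method's signature), then flush every completed n-step
    # transition into the shared replay buffer. Assumes the tracer and
    # buffer passed to the constructor live on self.tracer / self.buffer.
    self.tracer.add(s, a, r, done, logp, w)
    while self.tracer:
        self.buffer.add(self.tracer.pop())
```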