The base class for defining workers as part of a distributed agent.

This module provides the abstractions required for building distributed agents.

The coax.Worker is at the heart of such agents.

In coax, you define a class derived from coax.Worker and then create multiple instances of that class, each of which can play a different role. For instance, have a look at the implementation of an Ape-X DQN agent here.
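As a rough illustration of the "one class, several roles" pattern, the sketch below uses a stand-in base class, SketchWorker, that only mirrors the four abstract methods documented on this page. It is not coax.Worker itself, and the CountingWorker subclass, its counters and the role property are hypothetical names chosen for this example.

```python
from abc import ABC, abstractmethod


class SketchWorker(ABC):
    """Stand-in for illustration only; NOT the real coax.Worker."""

    def __init__(self, param_store=None, name=None):
        self.param_store = param_store
        self.name = name

    @property
    def role(self):
        # Hypothetical helper: a worker created without a param_store
        # is assumed to be the parameter store itself.
        return "param_store" if self.param_store is None else "worker"

    @abstractmethod
    def get_state(self): ...

    @abstractmethod
    def set_state(self, state): ...

    @abstractmethod
    def trace(self, s, a, r, done, logp=0.0, w=1.0): ...

    @abstractmethod
    def learn(self, transition_batch): ...


class CountingWorker(SketchWorker):
    """One class, instantiated several times in different roles."""

    def __init__(self, param_store=None, name=None):
        super().__init__(param_store=param_store, name=name)
        self.num_updates = 0  # stands in for model parameters
        self.num_traced = 0   # number of raw transitions seen

    def get_state(self):
        return {"num_updates": self.num_updates}

    def set_state(self, state):
        self.num_updates = state["num_updates"]

    def trace(self, s, a, r, done, logp=0.0, w=1.0):
        self.num_traced += 1

    def learn(self, transition_batch):
        self.num_updates += 1


# The parameter store is created first, with param_store left as None.
store = CountingWorker(name="param_store")
# Every other worker receives a reference to the store.
actors = [CountingWorker(param_store=store, name=f"actor_{i}") for i in range(2)]
learner = CountingWorker(param_store=store, name="learner")

roles = [w.role for w in [store, *actors, learner]]
```

All four instances share the same class; only the arguments passed at construction time determine which role each one plays.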

Object Reference

class coax.Worker(env, param_store=None, pi=None, tracer=None, buffer=None, buffer_warmup=None, name=None)[source]

The base class for defining workers as part of a distributed agent.

  • env (gymnasium.Env | str | function) – Specifies the gymnasium-style environment by either passing the env itself (gymnasium.Env), its name (str), or a function that generates the environment.

  • param_store (Worker, optional) – A distributed agent is presumed to have one worker that plays the role of a parameter store. To define the parameter-store worker itself, you must leave param_store=None. For other worker roles, however, param_store must be provided.

  • pi (Policy, optional) – The behavior policy that is used by rollout workers to generate experience.

  • tracer (RewardTracer, optional) – The reward tracer that is used by rollout workers.

  • buffer (ReplayBuffer, optional) – The experience-replay buffer that is populated by rollout workers and sampled from by learners.

  • buffer_warmup (int, optional) – The warmup period for the experience-replay buffer, i.e. the minimum number of transitions that must be stored in the replay buffer before sampling from it starts.

  • name (str, optional) – A human-readable identifier of the worker.
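To make the role of buffer_warmup concrete, here is a minimal sketch that assumes the buffer is a plain Python list and that sampling is simply held back until enough transitions are stored. The can_sample helper and its threshold rule are hypothetical; how coax itself waits for the warmup is not described on this page.

```python
def can_sample(buffer, buffer_warmup=None, batch_size=32):
    """Hypothetical helper: True once the buffer holds enough transitions."""
    # Assumption: without an explicit warmup, one full batch is the minimum.
    threshold = max(buffer_warmup or 0, batch_size)
    return len(buffer) >= threshold


buffer = []
ready_log = []
for t in range(6):
    # Store a dummy transition, then check whether sampling may start.
    buffer.append(("s", "a", 0.0, False))
    ready_log.append(can_sample(buffer, buffer_warmup=4, batch_size=2))
```

With buffer_warmup=4, the check stays False for the first three transitions and turns True from the fourth one onward.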

abstract get_state()[source]

Get the internal state that is shared between workers.

Returns: state (object) – The internal state. This will be consumed by set_state(state).

abstract learn(transition_batch)[source]

Update the model parameters given a transition batch.

abstract set_state(state)[source]

Set the internal state that is shared between workers.

  • state (object) – The internal state, as returned by get_state().
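The get_state() / set_state(state) pair can be pictured as a simple round trip. The sketch below assumes the shared state is a small dict of weights; ParamHolder is a hypothetical stand-in rather than a coax class, and in practice the state object can be whatever a subclass chooses to return.

```python
import copy


class ParamHolder:
    """Hypothetical stand-in for a worker's shareable internal state."""

    def __init__(self, weights):
        self.weights = list(weights)

    def get_state(self):
        # Return a copy so the receiver cannot mutate the sender's state.
        return {"weights": copy.deepcopy(self.weights)}

    def set_state(self, state):
        # Accept exactly the object produced by get_state().
        self.weights = copy.deepcopy(state["weights"])


param_store = ParamHolder([0.0, 0.0])
learner = ParamHolder([0.5, -1.0])
actor = ParamHolder([9.9, 9.9])

# The learner hands its state to the store; the actor then reads it back.
param_store.set_state(learner.get_state())
actor.set_state(param_store.get_state())
```

The only contract illustrated here is that whatever get_state() returns must be accepted unchanged by set_state(state).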

abstract trace(s, a, r, done, logp=0.0, w=1.0)[source]

Implements the reward-tracing step for a single raw transition.

  • s (state observation) – A single state observation.

  • a (action) – A single action.

  • r (float) – A single observed reward.

  • done (bool) – Whether the episode has finished.

  • logp (float, optional) – The log-propensity \(\log\pi(a|s)\).

  • w (float, optional) – Sample weight associated with the given state-action pair.
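As one possible reading of "reward tracing", the sketch below buffers raw transitions until the episode ends and then turns the rewards into discounted returns. Monte Carlo returns are swapped in here for simplicity; EpisodeTracer, gamma and the ready list are hypothetical names, and this is not how any particular coax tracer is implemented.

```python
class EpisodeTracer:
    """Hypothetical tracer: buffers an episode, then emits discounted returns."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma
        self._pending = []  # raw (s, a, r, logp, w) tuples of the current episode
        self.ready = []     # processed (s, a, G, logp, w) tuples

    def trace(self, s, a, r, done, logp=0.0, w=1.0):
        self._pending.append((s, a, r, logp, w))
        if not done:
            return
        # Walk the episode backwards: G_t = r_t + gamma * G_{t+1}.
        G = 0.0
        processed = []
        for s_t, a_t, r_t, logp_t, w_t in reversed(self._pending):
            G = r_t + self.gamma * G
            processed.append((s_t, a_t, G, logp_t, w_t))
        # Restore chronological order before publishing the transitions.
        self.ready.extend(reversed(processed))
        self._pending.clear()


tracer = EpisodeTracer(gamma=0.5)
tracer.trace(s=0, a=1, r=1.0, done=False)
tracer.trace(s=1, a=0, r=1.0, done=False)
tracer.trace(s=2, a=1, r=1.0, done=True)

returns = [G for (_, _, G, _, _) in tracer.ready]
```

Nothing is emitted until done=True arrives, which is why trace takes the done flag alongside each raw transition.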