Release Notes

If you need any of the pre-release features listed under “Upcoming”, you can install coax directly from the main branch:

$ pip install git+



  • Switch from legacy gym to gymnasium (#40)

  • Upgrade dependencies.


  • Add DeepMind Control Suite example (#29); see DeepMind Control Suite with SAC.

  • Add coax.utils.sync_shared_params() utility; example in A2C stub.

  • Improved replay-buffer performance (#25).

  • Bug fix: random_seed in _prioritized (#24).

  • Update to the new JAX API (#27).

  • Update to gym==0.26.x (#28).

  • Bug fix: set logging level on TrainMonitor.logger itself (550a965).

  • Bug fix: affine transform for composite distributions (48ca9ce).

  • Bug fix: #33


  • Bug fix: #21

  • Fix deprecation warnings from using jax.tree_multimap and gym.envs.registry.env_specs.

  • Upgrade dependencies.
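For reference, jax.tree_multimap was absorbed into jax.tree_util.tree_map, which accepts multiple pytrees. A minimal sketch of the replacement call, assuming a recent jax:

```python
import jax

# jax.tree_util.tree_map subsumes the removed jax.tree_multimap:
# it maps a function over the leaves of one or more matching pytrees.
params = {"w": 1.0, "b": 2.0}
grads = {"w": 0.1, "b": 0.2}

# SGD-style update over both trees at once (previously tree_multimap).
updated = jax.tree_util.tree_map(lambda p, g: p - 0.5 * g, params, grads)
```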


  • Bug fixes: #16

  • Replace old jax.ops.index* scatter operations with the new interface.

  • Upgrade dependencies.
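The old jax.ops.index_update / index_add scatter operations were replaced by the functional .at[] property on arrays; a minimal sketch of the new interface:

```python
import jax.numpy as jnp

x = jnp.zeros(4)

# Old (removed): jax.ops.index_update(x, jax.ops.index[1], 5.0)
# New: out-of-place scatter via the .at[] property.
x = x.at[1].set(5.0)   # was jax.ops.index_update
x = x.at[2].add(3.0)   # was jax.ops.index_add
print(x)  # [0. 5. 3. 0.]
```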


Bumped version to drop the hard dependency on ray.


Implemented stochastic Q-learning using quantile regression in coax.StochasticQ; see example: IQN
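Quantile-regression Q-learning fits a set of quantiles of the return distribution by minimizing the pinball (quantile) loss. A plain-Python sketch of that loss, for illustration only (not coax's actual implementation):

```python
def pinball_loss(quantile_preds, target, taus):
    """Mean quantile (pinball) loss of predicted quantiles vs. a scalar target.

    For quantile level tau, under-estimates are weighted by tau and
    over-estimates by (1 - tau); this asymmetry is what makes the
    minimizer the tau-quantile rather than the mean.
    """
    total = 0.0
    for q, tau in zip(quantile_preds, taus):
        err = target - q
        total += max(tau * err, (tau - 1.0) * err)
    return total / len(taus)

# Three quantile estimates at levels 0.25 / 0.5 / 0.75 for one target return.
loss = pinball_loss([0.0, 1.0, 2.0], target=1.0, taus=[0.25, 0.5, 0.75])
# loss == 0.5 / 3 ≈ 0.1667
```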


This is not much of a release: only the dependencies were updated.


  • Add basic support for distributed agents, see example: Ape-X DQN

  • Fixed issues with serialization of jit-compiled functions; see jax#5043 and jax#5153

  • Add support for sample weights in reward tracers



Implemented Prioritized Experience Replay:

  • Implemented SegmentTree that allows for batched updating.

  • Implemented SumTree subclass that allows for batched weighted sampling.

  • Drop TransitionSingle (only use TransitionBatch from now on).

  • Added TransitionBatch.from_single constructor.

  • Added TransitionBatch.idx field to identify specific transitions.

  • Added TransitionBatch.W field to collect sample weights.

  • Made all td_learning and policy_objectives updaters compatible with TransitionBatch.W.

  • Implemented the PrioritizedReplayBuffer class itself.

  • Added scripts and notebooks: agent stub and pong.
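The weighted sampling behind prioritized replay can be illustrated with a small sum tree. A minimal, self-contained sketch (not coax's actual SumTree, which additionally supports batched updates and sampling):

```python
import random

class TinySumTree:
    """Binary sum tree over a fixed capacity; leaves hold priorities."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = [0.0] * (2 * capacity)  # internal nodes + leaves

    def set(self, idx, priority):
        i = idx + self.capacity       # leaf position in the flat array
        self.nodes[i] = priority
        while i > 1:                  # propagate the change up to the root
            i //= 2
            self.nodes[i] = self.nodes[2 * i] + self.nodes[2 * i + 1]

    def sample(self):
        """Draw a leaf index with probability proportional to its priority."""
        u = random.random() * self.nodes[1]   # nodes[1] holds the total sum
        i = 1
        while i < self.capacity:              # descend from root to a leaf
            left = self.nodes[2 * i]
            if u < left:
                i = 2 * i
            else:
                u -= left
                i = 2 * i + 1
        return i - self.capacity

tree = TinySumTree(4)
for idx, p in enumerate([1.0, 0.0, 3.0, 0.0]):
    tree.set(idx, p)
# Samples land on indices 0 and 2 only, with index 2 three times as likely.
```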

Other utilities:


Implemented Distributional RL algorithm:


First version to go public.