Release Notes

If you need any of the features from the pre-release version listed under “Upcoming”, you can install coax directly from the main branch:

$ pip install git+https://github.com/coax-dev/coax.git@main

Upcoming

v0.1.13

  • Switch from legacy gym to gymnasium (#21)

  • Upgrade dependencies.

v0.1.12

  • Add DeepMind Control Suite example (#29); see DeepMind Control Suite with SAC.

  • Add coax.utils.sync_shared_params() utility; see the A2C stub for an example.

  • Improve replay buffer performance (#25).

  • Bug fix: random_seed in _prioritized (#24).

  • Update to the new JAX API (#27).

  • Update to gym==0.26.x (#28).

  • Bug fix: set logging level on TrainMonitor.logger itself (commit 550a965: https://github.com/coax-dev/coax/commit/550a965d17002bf552ab2fbea49801c65b322c7b).

  • Bug fix: affine transform for composite distributions (commit 48ca9ce: https://github.com/coax-dev/coax/commit/48ca9ced42123e906969076dff88540b98e6d0bb).

  • Bug fix: #33

v0.1.11

  • Bug fix: #21

  • Fix deprecation warnings from using jax.tree_multimap and gym.envs.registry.env_specs; see the migration sketch after this list.

  • Upgrade dependencies.
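
For reference, the jax.tree_multimap fix is a one-line substitution. The snippet below is a minimal sketch (not coax code, with made-up parameter names): jax.tree_map accepts multiple pytrees, which is what jax.tree_multimap used to be needed for.

import jax
import jax.numpy as jnp

# Two pytrees with identical structure, e.g. parameters and their gradients.
params = {'w': jnp.ones((2, 3)), 'b': jnp.zeros(3)}
grads = {'w': jnp.full((2, 3), 0.1), 'b': jnp.full((3,), 0.1)}

# Deprecated: jax.tree_multimap(lambda p, g: p - 0.01 * g, params, grads)
# Replacement: jax.tree_map handles multiple pytrees directly.
updated = jax.tree_map(lambda p, g: p - 0.01 * g, params, grads)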

v0.1.10

  • Bug fixes: #16

  • Replace the old jax.ops.index* scatter operations with the new jax.numpy.ndarray.at interface; see the sketch after this list.

  • Upgrade dependencies.
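
For reference, the scatter-update migration looks roughly like this (a minimal sketch, not taken from the coax source):

import jax.numpy as jnp

x = jnp.zeros(5)

# Old (removed): x = jax.ops.index_update(x, jax.ops.index[1:3], 7.0)
# New: arrays are immutable; .at[...] returns an updated copy.
x = x.at[1:3].set(7.0)   # scatter-set
x = x.at[0].add(1.0)     # scatter-add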

v0.1.9

Bumped version to drop the hard dependency on ray.

v0.1.8

Implemented stochastic Q-learning using quantile regression in coax.StochasticQ; see example: IQN.

v0.1.7

This is a small maintenance release; only the dependencies were updated.

v0.1.6

  • Add basic support for distributed agents; see example: Ape-X DQN.

  • Fix issues with serialization of jit-compiled functions; see jax#5043 and jax#5153.

  • Add support for sample weights in reward tracers.

v0.1.5

v0.1.4

Implemented Prioritized Experience Replay:

  • Implemented SegmentTree that allows for batched updating.

  • Implemented SumTree subclass that allows for batched weighted sampling; see the sketch after this list.

  • Drop TransitionSingle (only use TransitionBatch from now on).

  • Added TransitionBatch.from_single constructor.

  • Added TransitionBatch.idx field to identify specific transitions.

  • Added TransitionBatch.W field to collect sample weights.

  • Made all td_learning and policy_objectives updaters compatible with TransitionBatch.W.

  • Implemented the PrioritizedReplayBuffer class itself.

  • Added scripts and notebooks: agent stub and pong.
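
To make the batched weighted sampling concrete, below is a small self-contained NumPy sketch of a sum-tree with batched priority updates and proportional sampling. It only illustrates the idea; it is not coax's actual SegmentTree/SumTree or PrioritizedReplayBuffer implementation, all names in it are made up, and the capacity is assumed to be a power of two to keep the level-by-level propagation simple.

import numpy as np

class SumTree:
    """Toy sum-tree for proportional sampling (capacity must be a power of two)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Flat binary-heap layout: internal nodes at 1..capacity-1, leaves at capacity..2*capacity-1.
        self.tree = np.zeros(2 * capacity)

    def set_batch(self, idx, priorities):
        """Batched priority update for 0-based leaf positions idx."""
        idx = np.asarray(idx, dtype=int) + self.capacity
        self.tree[idx] = priorities
        # Recompute the partial sums level by level, up to the root at index 1.
        idx = np.unique(idx // 2)
        while idx[0] >= 1:
            self.tree[idx] = self.tree[2 * idx] + self.tree[2 * idx + 1]
            idx = np.unique(idx // 2)

    def sample_batch(self, n, rng=np.random):
        """Draw n leaf positions with probability proportional to their priority."""
        u = rng.uniform(0.0, self.tree[1], size=n)  # tree[1] holds the total sum
        idx = np.ones(n, dtype=int)                 # start every walker at the root
        while idx[0] < self.capacity:               # descend until we reach the leaves
            left = 2 * idx
            go_right = u > self.tree[left]
            u = np.where(go_right, u - self.tree[left], u)
            idx = np.where(go_right, left + 1, left)
        return idx - self.capacity

# Example: the slot with priority 9 is sampled far more often than the others.
tree = SumTree(8)
tree.set_batch(np.arange(8), np.array([1., 1., 1., 1., 1., 1., 1., 9.]))
print(tree.sample_batch(5))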

Other utilities:

v0.1.3

Implemented Distributional RL algorithm:

v0.1.2

First version to go public.