Release Notes

If you need any of the features from the pre-release version listed under “Upcoming”, you can install coax directly from the main branch:

$ pip install git+https://github.com/coax-dev/coax.git@main

Upcoming

v0.1.13

  • Switch from legacy gym to gymnasium (#21)

  • Upgrade dependencies.

v0.1.12

  • Add DeepMind Control Suite example (#29); see DeepMind Control Suite with SAC.

  • Add coax.utils.sync_shared_params() utility; see the A2C stub for an example.

  • Improve replay buffer performance (#25).

  • Bug fix: random_seed in _prioritized (#24).

  • Update to the new JAX API (#27).

  • Update to gym==0.26.x (#28).

  • Bug fix: set logging level on TrainMonitor.logger itself (commit 550a965: https://github.com/coax-dev/coax/commit/550a965d17002bf552ab2fbea49801c65b322c7b).

  • Bug fix: affine transform for composite distributions (commit 48ca9ce: https://github.com/coax-dev/coax/commit/48ca9ced42123e906969076dff88540b98e6d0bb).

  • Bug fix: #33

v0.1.11

  • Bug fix: #21

  • Fix deprecation warnings from using jax.tree_multimap and gym.envs.registry.env_specs; see the migration sketch after this list.

  • Upgrade dependencies.
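
For reference, the jax.tree_multimap fix is a one-line substitution. The snippet below is a minimal sketch (not coax code, with made-up parameter names): jax.tree_map accepts multiple pytrees, which is what jax.tree_multimap used to be needed for.

import jax
import jax.numpy as jnp

# Two pytrees with identical structure, e.g. parameters and their gradients.
params = {'w': jnp.ones((2, 3)), 'b': jnp.zeros(3)}
grads = {'w': jnp.full((2, 3), 0.1), 'b': jnp.full((3,), 0.1)}

# Deprecated: jax.tree_multimap(lambda p, g: p - 0.01 * g, params, grads)
# Replacement: jax.tree_map handles multiple pytrees directly.
updated = jax.tree_map(lambda p, g: p - 0.01 * g, params, grads)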

v0.1.10

  • Bug fixes: #16

  • Replace the old jax.ops.index* scatter operations with the new jax.numpy.ndarray.at interface; see the sketch after this list.

  • Upgrade dependencies.
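
For reference, the scatter-update migration looks roughly like this (a minimal sketch, not taken from the coax source):

import jax.numpy as jnp

x = jnp.zeros(5)

# Old (removed): x = jax.ops.index_update(x, jax.ops.index[1:3], 7.0)
# New: arrays are immutable; .at[...] returns an updated copy.
x = x.at[1:3].set(7.0)   # scatter-set
x = x.at[0].add(1.0)     # scatter-add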

v0.1.9

Bumped version to drop the hard dependency on ray.

v0.1.8

Implemented stochastic Q-learning using quantile regression in coax.StochasticQ; see example: IQN.

v0.1.7

This is a small maintenance release; only the dependencies were updated.

v0.1.6

  • Add basic support for distributed agents; see example: Ape-X DQN.

  • Fix issues with serialization of jit-compiled functions; see jax#5043 and jax#5153.

  • Add support for sample weights in reward tracers.

v0.1.5

v0.1.4

Implemented Prioritized Experience Replay:

  • Implemented SegmentTree that allows for batched updating.

  • Implemented SumTree subclass that allows for batched weighted sampling; see the sketch after this list.

  • Drop TransitionSingle (only use TransitionBatch from now on).

  • Added TransitionBatch.from_single constructor.

  • Added TransitionBatch.idx field to identify specific transitions.

  • Added TransitionBatch.W field to collect sample weights.

  • Made all td_learning and policy_objectives updaters compatible with TransitionBatch.W.

  • Implemented the PrioritizedReplayBuffer class itself.

  • Added scripts and notebooks: agent stub and pong.
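
To make the batched weighted sampling concrete, below is a small self-contained NumPy sketch of a sum-tree with batched priority updates and proportional sampling. It only illustrates the idea; it is not coax's actual SegmentTree/SumTree or PrioritizedReplayBuffer implementation, all names in it are made up, and the capacity is assumed to be a power of two to keep the level-by-level propagation simple.

import numpy as np

class SumTree:
    """Toy sum-tree for proportional sampling (capacity must be a power of two)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Flat binary-heap layout: internal nodes at 1..capacity-1, leaves at capacity..2*capacity-1.
        self.tree = np.zeros(2 * capacity)

    def set_batch(self, idx, priorities):
        """Batched priority update for 0-based leaf positions idx."""
        idx = np.asarray(idx, dtype=int) + self.capacity
        self.tree[idx] = priorities
        # Recompute the partial sums level by level, up to the root at index 1.
        idx = np.unique(idx // 2)
        while idx[0] >= 1:
            self.tree[idx] = self.tree[2 * idx] + self.tree[2 * idx + 1]
            idx = np.unique(idx // 2)

    def sample_batch(self, n, rng=np.random):
        """Draw n leaf positions with probability proportional to their priority."""
        u = rng.uniform(0.0, self.tree[1], size=n)  # tree[1] holds the total sum
        idx = np.ones(n, dtype=int)                 # start every walker at the root
        while idx[0] < self.capacity:               # descend until we reach the leaves
            left = 2 * idx
            go_right = u > self.tree[left]
            u = np.where(go_right, u - self.tree[left], u)
            idx = np.where(go_right, left + 1, left)
        return idx - self.capacity

# Example: the slot with priority 9 is sampled far more often than the others.
tree = SumTree(8)
tree.set_batch(np.arange(8), np.array([1., 1., 1., 1., 1., 1., 1., 9.]))
print(tree.sample_batch(5))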

Other utilities:

v0.1.3

Implemented Distributional RL algorithm:

v0.1.2

First version to go public.