Release Notes
If you need any of the features from the pre-release version listed under “Upcoming”, you can install coax directly from the main branch:
$ pip install git+https://github.com/coax-dev/coax.git@main
Upcoming
…
v0.1.13
- Switch from legacy gym to gymnasium (#40).
- Upgrade dependencies.
v0.1.12
- Add DeepMind Control Suite example (#29); see DeepMind Control Suite with SAC.
- Add coax.utils.sync_shared_params() utility; example in A2C stub.
- Improved performance for replay buffer (#25).
- Bug fix: random_seed in _prioritized (#24).
- Update to new Jax API (#27).
- Update to gym==0.26.x (#28).
- Bug fix: set logging level on TrainMonitor.logger itself (550a965, https://github.com/coax-dev/coax/commit/550a965d17002bf552ab2fbea49801c65b322c7b).
- Bug fix: fix affine transform for composite distributions (48ca9ce, https://github.com/coax-dev/coax/commit/48ca9ced42123e906969076dff88540b98e6d0bb).
- Bug fix: #33.
v0.1.11
- Bug fix: #21.
- Fix deprecation warnings from using jax.tree_multimap and gym.envs.registry.env_specs.
- Upgrade dependencies.
v0.1.10
- Bug fixes: #16.
- Replace old jax.ops.index* scatter operations with the new jax.numpy.ndarray.at interface.
- Upgrade dependencies.
v0.1.9
Bumped version to drop hard dependence on ray.
v0.1.8
Implemented stochastic q-learning using quantile regression in coax.StochasticQ; see example: IQN.
- Use coax.utils.quantiles() for equally spaced quantile fractions, as in QR-DQN.
- Use coax.utils.quantiles_uniform() for uniformly sampled quantile fractions, as in IQN.
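The difference between the two kinds of quantile fractions can be illustrated in plain NumPy. This is a sketch of the idea only; the function names below are hypothetical and do not reflect coax's actual implementation:

```python
import numpy as np

def equally_spaced_quantiles(num_quantiles=200):
    # Midpoints of num_quantiles equal-width bins on (0, 1), as in QR-DQN.
    return (np.arange(num_quantiles) + 0.5) / num_quantiles

def uniform_quantiles(num_quantiles=32, rng=None):
    # Quantile fractions drawn uniformly at random from (0, 1), as in IQN;
    # sorting keeps them usable as an increasing sequence of fractions.
    rng = np.random.default_rng() if rng is None else rng
    return np.sort(rng.uniform(size=num_quantiles))

print(equally_spaced_quantiles(4))  # [0.125 0.375 0.625 0.875]
```

QR-DQN learns the return distribution at a fixed grid of fractions, whereas IQN resamples the fractions on every update, which is why the second variant takes a random generator.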
v0.1.7
This is not much of a release. It’s only really the dependencies that were updated.
v0.1.6
v0.1.5
- Implemented coax.td_learning.SoftQLearning.
- Added serialization utils: coax.utils.dump(), coax.utils.dumps(), coax.utils.load(), coax.utils.loads().
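The four utilities follow the familiar dump/dumps/load/loads naming convention from the standard library's pickle and json modules: the s-variants work on in-memory bytes, the others on file-like objects. A stdlib sketch of that convention, using pickle rather than coax itself:

```python
import io
import pickle

checkpoint = {"step": 1000, "params": [0.1, -0.2]}

# dumps/loads: serialize to and from an in-memory bytes object.
blob = pickle.dumps(checkpoint)
assert pickle.loads(blob) == checkpoint

# dump/load: same idea, but writing to and reading from a file-like object.
buf = io.BytesIO()
pickle.dump(checkpoint, buf)
buf.seek(0)
assert pickle.load(buf) == checkpoint
```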
v0.1.4
Implemented Prioritized Experience Replay:
- Implemented SegmentTree that allows for batched updating.
- Implemented SumTree subclass that allows for batched weighted sampling.
- Dropped TransitionSingle (only use TransitionBatch from now on).
- Added TransitionBatch.from_single constructor.
- Added TransitionBatch.idx field to identify specific transitions.
- Added TransitionBatch.W field to collect sample weights.
- Made all td_learning and policy_objectives updaters compatible with TransitionBatch.W.
- Implemented the PrioritizedReplayBuffer class itself.
- Added scripts and notebooks: agent stub and pong.
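As a rough illustration of why a sum-tree makes priority-proportional sampling efficient, here is a minimal from-scratch sketch; it is not coax's SegmentTree/SumTree implementation, which additionally supports batched updates and batched sampling:

```python
import numpy as np

class SumTree:
    """Minimal sum-tree: O(log n) priority updates and weighted sampling."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Binary tree stored flat; leaves live at indices [capacity, 2 * capacity).
        self.nodes = np.zeros(2 * capacity)

    def set(self, idx, priority):
        i = idx + self.capacity
        self.nodes[i] = priority
        while i > 1:  # propagate the changed sum up to the root
            i //= 2
            self.nodes[i] = self.nodes[2 * i] + self.nodes[2 * i + 1]

    def sample(self, u):
        # Descend from the root: go left if u falls within the left subtree's
        # total priority mass, otherwise subtract that mass and go right.
        i = 1
        while i < self.capacity:
            left = self.nodes[2 * i]
            if u < left:
                i = 2 * i
            else:
                u -= left
                i = 2 * i + 1
        return i - self.capacity  # leaf index = transition index

tree = SumTree(capacity=4)
for idx, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.set(idx, p)
# Total mass is 10, so any u in [6, 10) lands on the highest-priority leaf.
print(tree.sample(7.5))  # 3
```

Sampling a transition with probability proportional to its priority then amounts to drawing u uniformly from [0, total_mass) and descending the tree, rather than scanning all priorities.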
Other utilities:
- Added FrameStacking wrapper that respects the gym.space API and is compatible with the jax.tree_util module.
- Added data summary (min, median, max) for arrays in the pretty_repr util.
- Added StepwiseLinearFunction utility, which is handy for hyperparameter schedules; see example usage here.
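A piecewise-linear hyperparameter schedule of the kind StepwiseLinearFunction targets can be sketched with numpy.interp. The schedule below is hypothetical and illustrative only, not coax's API:

```python
import numpy as np

def epsilon_schedule(step):
    # Hypothetical exploration schedule: hold epsilon at 1.0 for 1k steps,
    # anneal linearly to 0.1 by step 50k, then keep it constant.
    steps    = [0,   1_000, 50_000]
    epsilons = [1.0, 1.0,   0.1]
    # np.interp clamps to the endpoint values outside the given range.
    return float(np.interp(step, steps, epsilons))

print(epsilon_schedule(0))       # 1.0
print(epsilon_schedule(25_500))  # halfway through the anneal (~0.55)
print(epsilon_schedule(80_000))  # 0.1 (clamped at the final value)
```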
v0.1.3
Implemented Distributional RL algorithm:
- Added two new methods to all proba_dists: mean and affine_transform; see coax.proba_dists.
- Made TD-learning updaters compatible with coax.StochasticV and coax.StochasticQ.
- Made value-based policies compatible with coax.StochasticQ.
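For a Gaussian, the affine-transform rule has a closed form: if X ~ N(mu, sigma^2), then a*X + b ~ N(a*mu + b, a^2 * sigma^2). In distributional TD updates this is what turns a next-state value distribution into a Bellman target, with a = discount and b = reward. A minimal Python sketch of the rule (illustrative only; coax's affine_transform operates on its own distribution objects):

```python
def affine_transform_normal(mu, sigma, a, b):
    # If X ~ N(mu, sigma^2), then a*X + b ~ N(a*mu + b, (a*sigma)^2).
    return a * mu + b, abs(a) * sigma

# Bellman-style target: reward 0.25, discount 0.5, next-state values ~ N(1, 2^2).
mu, sigma = affine_transform_normal(mu=1.0, sigma=2.0, a=0.5, b=0.25)
print(mu, sigma)  # 0.75 1.0
```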
v0.1.2
First version to go public.