Release Notes¶
If you need any of the features from the pre-release version listed under “Upcoming”, you can install coax from the main branch:
$ pip install git+https://github.com/coax-dev/coax.git@main
Upcoming¶
…
v0.1.13¶
Switch from legacy gym to gymnasium (#21)
Upgrade dependencies.
v0.1.12¶
Add DeepMind Control Suite example (#29); see DeepMind Control Suite with SAC.
Add coax.utils.sync_shared_params() utility; example in A2C stub.
Improved performance for replay buffer (#25)
Bug fix: random_seed in _prioritized (#24)
Update to new JAX API (#27)
Update to gym==0.26.x (#28).
Bug fix: set logging level on TrainMonitor.logger itself (550a965: https://github.com/coax-dev/coax/commit/550a965d17002bf552ab2fbea49801c65b322c7b).
Bug fix: fix affine transform for composite distributions (48ca9ce: https://github.com/coax-dev/coax/commit/48ca9ced42123e906969076dff88540b98e6d0bb)
Bug fix: #33
v0.1.11¶
Bug fix: #21
Fix deprecation warnings from using jax.tree_multimap and gym.envs.registry.env_specs.
Upgrade dependencies.
v0.1.10¶
Bug fixes: #16
Replace old jax.ops.index* scatter operations with the new jax.numpy.ndarray.at interface.
Upgrade dependencies.
v0.1.9¶
Bumped version to drop hard dependence on ray.
v0.1.8¶
Implemented stochastic q-learning using quantile regression in coax.StochasticQ, see example: IQN
Use coax.utils.quantiles() for equally spaced quantile fractions as in QR-DQN.
Use coax.utils.quantiles_uniform() for uniformly sampled quantile fractions as in IQN.
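The difference between the two kinds of quantile fractions can be sketched in plain Python. The helper names below are hypothetical and are not the coax implementations; the equally spaced variant assumes the bin-midpoint convention from QR-DQN, and the sampled variant assumes independent draws from U(0, 1).

```python
import random


def equally_spaced_fractions(num_quantiles):
    """Midpoints of num_quantiles equal-width bins on [0, 1] (hypothetical helper)."""
    return [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]


def uniformly_sampled_fractions(num_quantiles, rng):
    """Independent draws from U(0, 1) (hypothetical helper)."""
    return [rng.random() for _ in range(num_quantiles)]


fixed = equally_spaced_fractions(4)  # the same fractions on every call
sampled = uniformly_sampled_fractions(4, random.Random(0))  # fresh fractions per draw
```

With fixed fractions the network always predicts the same set of quantiles, whereas sampled fractions change from call to call and therefore have to be supplied to the function approximator as an input.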
v0.1.7¶
This is not much of a release. It’s only really the dependencies that were updated.
v0.1.6¶
v0.1.5¶
Implemented coax.td_learning.SoftQLearning.
Added serialization utils: coax.utils.dump(), coax.utils.dumps(), coax.utils.load(), coax.utils.loads().
v0.1.4¶
Implemented Prioritized Experience Replay:
Implemented SegmentTree that allows for batched updating.
Implemented SumTree subclass that allows for batched weighted sampling.
Drop TransitionSingle (only use TransitionBatch from now on).
Added TransitionBatch.from_single constructor.
Added TransitionBatch.idx field to identify specific transitions.
Added TransitionBatch.W field to collect sample weights.
Made all td_learning and policy_objectives updaters compatible with TransitionBatch.W.
Implemented the PrioritizedReplayBuffer class itself.
Added scripts and notebooks: agent stub and pong.
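The idea behind weighted sampling with a sum tree can be illustrated with a minimal, non-batched sketch. The class below is hypothetical and much simpler than the SumTree added in this release (which supports batched operations); it only shows how priorities stored in the leaves turn into sampling probabilities.

```python
import random


class TinySumTree:
    """Binary sum tree over a fixed number of leaves (hypothetical, non-batched)."""

    def __init__(self, capacity):
        # Round the capacity up to a power of two so the tree is complete.
        self.size = 1
        while self.size < capacity:
            self.size *= 2
        # Node 1 is the root; leaves live at indices size .. 2 * size - 1.
        self.nodes = [0.0] * (2 * self.size)

    def set(self, idx, priority):
        """Set the priority of leaf idx and refresh the partial sums above it."""
        node = idx + self.size
        self.nodes[node] = priority
        node //= 2
        while node >= 1:
            self.nodes[node] = self.nodes[2 * node] + self.nodes[2 * node + 1]
            node //= 2

    @property
    def total(self):
        """Sum of all leaf priorities, stored at the root."""
        return self.nodes[1]

    def find(self, mass):
        """Return the leaf index whose cumulative-priority interval contains mass."""
        node = 1
        while node < self.size:
            left = 2 * node
            if mass < self.nodes[left]:
                node = left
            else:
                mass -= self.nodes[left]
                node = left + 1
        return node - self.size

    def sample(self, rng):
        """Draw a leaf index with probability proportional to its priority."""
        return self.find(rng.random() * self.total)


tree = TinySumTree(capacity=4)
for i, priority in enumerate([1.0, 0.0, 3.0, 4.0]):
    tree.set(i, priority)
```

Both updating a priority and drawing a sample walk a single root-to-leaf path, so each costs O(log n) in the number of stored transitions.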
Other utilities:
Added FrameStacking wrapper that respects the gym.space API and is compatible with the jax.tree_util module.
Added data summary (min, median, max) for arrays in pretty_repr util.
Added StepwiseLinearFunction utility, which is handy for hyperparameter schedules, see example usage here.
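As an illustration of what a stepwise linear hyperparameter schedule computes, here is a small sketch in plain Python. The function name, its signature and the breakpoint values are assumptions made for this example; it is not the StepwiseLinearFunction implementation.

```python
def stepwise_linear(t, *steps):
    """Piecewise-linear interpolation through (timestep, value) pairs.

    Hypothetical helper: the value is held constant before the first
    breakpoint and after the last one.
    """
    if t <= steps[0][0]:
        return steps[0][1]
    for (t0, v0), (t1, v1) in zip(steps, steps[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return steps[-1][1]


# Example: an exploration rate that decays from 1.0 to 0.1 and then to 0.01.
schedule = [(0, 1.0), (1000, 0.1), (2000, 0.01)]
epsilon_mid = stepwise_linear(500, *schedule)    # halfway along the first segment
epsilon_late = stepwise_linear(5000, *schedule)  # past the last breakpoint
```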
v0.1.3¶
Implemented Distributional RL algorithm:
Added two new methods to all proba_dists: mean and affine_transform, see coax.proba_dists.
Made TD-learning updaters compatible with coax.StochasticV and coax.StochasticQ.
Made value-based policies compatible with coax.StochasticQ.
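In distributional RL the bootstrapped target r + γZ is an affine transformation of the return distribution Z, which is one reason a mean and an affine transform are useful operations on a distribution. The sketch below shows this for a univariate normal distribution; the class and its method signatures are hypothetical stand-ins and do not reproduce the coax.proba_dists API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNormal:
    """Univariate normal distribution (hypothetical stand-in, not a coax class)."""

    mu: float
    sigma: float

    def mean(self):
        return self.mu

    def affine_transform(self, scale, shift):
        """Distribution of scale * X + shift when X follows this distribution."""
        return ToyNormal(mu=scale * self.mu + shift, sigma=abs(scale) * self.sigma)


# Bootstrapped target distribution for reward r and discount factor gamma.
r, gamma = 1.0, 0.5
next_value = ToyNormal(mu=4.0, sigma=2.0)
target = next_value.affine_transform(scale=gamma, shift=r)
```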
v0.1.2¶
First version to go public.