Regularizers¶
Policy regularization term based on the entropy of the policy. 

Policy regularization term based on the KullbackLeibler divergence of the policy relative to a given set of priors. 
This is a collection of regularizers that can be used to put soft constraints on stochastic function approximators. These is typically added to the loss/objective to avoid premature exploitation of a policy.
Object Reference¶
 class coax.regularizers.EntropyRegularizer(f, beta=0.001)[source]¶
Policy regularization term based on the entropy of the policy.
The regularization term is to be added to the loss function:
\[\text{loss}(\theta; s,a)\ =\ J(\theta; s,a)  \beta\,H[\pi_\theta(.s)]\]where \(J(\theta)\) is the bare policy objective.
 Parameters:
f (stochastic function approximator) – The stochastic function approximator (e.g.
coax.Policy
) to regularize.beta (nonnegative float) – The coefficient that determines the strength of the overall regularization term.
 property function¶
JITcompiled function that returns the values for the regularization term.
 Parameters:
dist_params (pytree with ndarray leaves) – The distribution parameters of the (conditional) probability distribution.
beta (nonnegative float) – The coefficient that determines the strength of the overall regularization term.
 property metrics_func¶
JITcompiled function that returns the performance metrics for the regularization term.
 Parameters:
dist_params (pytree with ndarray leaves) – The distribution parameters of the (conditional) probability distribution.
beta (nonnegative float) – The coefficient that determines the strength of the overall regularization term.
 class coax.regularizers.KLDivRegularizer(f, beta=0.001, priors=None)[source]¶
Policy regularization term based on the KullbackLeibler divergence of the policy relative to a given set of priors.
The regularization term is to be added to the loss function:
\[\text{loss}(\theta; s,a)\ =\ J(\theta; s,a) + \beta\,KL[\pi_\theta, \pi_\text{prior}]\]where \(J(\theta)\) is the bare policy objective. Also, in order to unclutter the notation we abbreviated \(\pi(.s)\) by \(\pi\).
 Parameters:
f (stochastic function approximator) – The stochastic function approximator (e.g.
coax.Policy
) to regularize.beta (nonnegative float) – The coefficient that determines the strength of the overall regularization term.
priors (pytree with ndarray leaves, optional) – The distribution parameters that correspond to the priors. If left unspecified, we’ll use
proba_dist.default_priors
, see e.g.NormalDist.default_priors
.
 property function¶
JITcompiled function that returns the values for the regularization term.
 Parameters:
dist_params (pytree with ndarray leaves) – The distribution parameters of the (conditional) probability distribution.
beta (nonnegative float) – The coefficient that determines the strength of the overall regularization term.
priors (pytree with ndarray leaves) – The distribution parameters that correspond to the priors.
 property metrics_func¶
JITcompiled function that returns the performance metrics for the regularization term.
 Parameters:
dist_params (pytree with ndarray leaves) – The distribution parameters of the (conditional) probability distribution.
beta (nonnegative float) – The coefficient that determines the strength of the overall regularization term.
priors (pytree with ndarray leaves) – The distribution parameters that correspond to the priors.