Regularizers
coax.regularizers.EntropyRegularizer | Policy regularization term based on the entropy of the policy.
coax.regularizers.KLDivRegularizer | Policy regularization term based on the Kullback-Leibler divergence of the policy relative to a given set of priors.
This is a collection of regularizers that can be used to put soft constraints on stochastic function approximators. These are typically added to the loss/objective to avoid premature exploitation of a policy.
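To illustrate what "soft constraint" means here, the following minimal NumPy sketch uses a toy one-state bandit. It is not coax code. For illustration, it assumes that the bare objective J is simply the expected reward and that the policy is a softmax over logits. The sketch runs gradient descent on the entropy-regularized loss \(-J - \beta H\). Without regularization (\(\beta = 0\)) the policy collapses onto the best action. With \(\beta > 0\) it keeps probability mass on the other actions and converges to \(p \propto \exp(r/\beta)\), the known optimum of this regularized objective.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift logits for numerical stability
    return e / e.sum()

def train(rewards, beta, steps=5000, lr=1.0):
    """Gradient descent on loss(z) = -J(z) - beta * H(softmax(z)) for a toy one-state bandit."""
    z = np.zeros_like(rewards)           # logits of a uniform initial policy
    for _ in range(steps):
        p = softmax(z)
        J = p @ rewards                  # bare objective: expected reward
        H = -(p * np.log(p)).sum()       # entropy of the policy
        grad_J = p * (rewards - J)       # dJ/dz for a softmax policy
        grad_H = -p * (np.log(p) + H)    # dH/dz for a softmax policy
        z -= lr * (-grad_J - beta * grad_H)
    return softmax(z)

rewards = np.array([1.0, 0.0, 0.0])
print(train(rewards, beta=0.0))  # nearly all mass ends up on action 0
print(train(rewards, beta=0.5))  # stays close to softmax(rewards / beta)
```

The strength of the pull toward a spread-out policy is set entirely by \(\beta\). Larger values keep the policy closer to uniform.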
Object Reference
- class coax.regularizers.EntropyRegularizer(f, beta=0.001)
Policy regularization term based on the entropy of the policy.
The regularization term is to be added to the loss function:
\[\text{loss}(\theta; s,a)\ =\ -J(\theta; s,a) - \beta\,H[\pi_\theta(.|s)]\]
where \(J(\theta)\) is the bare policy objective.
- Parameters:
f (stochastic function approximator) – The stochastic function approximator (e.g. coax.Policy) to regularize.
beta (non-negative float) – The coefficient that determines the strength of the overall regularization term.
- property function
JIT-compiled function that returns the values for the regularization term.
- Parameters:
dist_params (pytree with ndarray leaves) – The distribution parameters of the (conditional) probability distribution.
beta (non-negative float) – The coefficient that determines the strength of the overall regularization term.
- property metrics_func
JIT-compiled function that returns the performance metrics for the regularization term.
- Parameters:
dist_params (pytree with ndarray leaves) – The distribution parameters of the (conditional) probability distribution.
beta (non-negative float) – The coefficient that determines the strength of the overall regularization term.
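For a discrete (categorical) policy, the quantities returned by `function` and `metrics_func` can be sketched as follows. This is a minimal NumPy sketch, not coax's JIT-compiled JAX implementation. Several pieces are assumptions made for illustration. The `'logits'` key for `dist_params` and the batch-mean reduction are assumed. The helper names `entropy_reg_function` and `entropy_reg_metrics` and their metric keys are hypothetical.

```python
import numpy as np

def categorical_entropy(logits):
    """Entropy H[pi(.|s)] of a batch of categorical distributions; returns shape (batch,)."""
    z = logits - logits.max(axis=-1, keepdims=True)            # shift for numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax
    return -(np.exp(log_p) * log_p).sum(axis=-1)

def entropy_reg_function(dist_params, beta):
    """Hypothetical analogue of `function`: the loss term -beta * H, averaged over the batch."""
    return -beta * categorical_entropy(dist_params['logits']).mean()

def entropy_reg_metrics(dist_params, beta):
    """Hypothetical analogue of `metrics_func`: scalars for logging (key names are made up)."""
    return {'beta': beta, 'entropy': float(categorical_entropy(dist_params['logits']).mean())}

# batch of 2 states, 4 discrete actions, uniform policy (all logits equal)
dist_params = {'logits': np.zeros((2, 4))}
term = entropy_reg_function(dist_params, beta=0.001)
metrics = entropy_reg_metrics(dist_params, beta=0.001)
```

Because the term is \(-\beta H\), a peaked, low-entropy policy yields a larger (less negative) value than a uniform one. Minimizing the loss therefore pushes the policy toward higher entropy.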
- class coax.regularizers.KLDivRegularizer(f, beta=0.001, priors=None)
Policy regularization term based on the Kullback-Leibler divergence of the policy relative to a given set of priors.
The regularization term is to be added to the loss function:
\[\text{loss}(\theta; s,a)\ =\ -J(\theta; s,a) + \beta\,KL[\pi_\theta, \pi_\text{prior}]\]
where \(J(\theta)\) is the bare policy objective. To unclutter the notation, we abbreviate \(\pi(.|s)\) as \(\pi\).
- Parameters:
f (stochastic function approximator) – The stochastic function approximator (e.g. coax.Policy) to regularize.
beta (non-negative float) – The coefficient that determines the strength of the overall regularization term.
priors (pytree with ndarray leaves, optional) – The distribution parameters that correspond to the priors. If left unspecified, we’ll use proba_dist.default_priors; see e.g. NormalDist.default_priors.
- property function
JIT-compiled function that returns the values for the regularization term.
- Parameters:
dist_params (pytree with ndarray leaves) – The distribution parameters of the (conditional) probability distribution.
beta (non-negative float) – The coefficient that determines the strength of the overall regularization term.
priors (pytree with ndarray leaves) – The distribution parameters that correspond to the priors.
- property metrics_func
JIT-compiled function that returns the performance metrics for the regularization term.
- Parameters:
dist_params (pytree with ndarray leaves) – The distribution parameters of the (conditional) probability distribution.
beta (non-negative float) – The coefficient that determines the strength of the overall regularization term.
priors (pytree with ndarray leaves) – The distribution parameters that correspond to the priors.
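For a continuous policy with a diagonal Gaussian distribution, the term \(\beta\,KL[\pi_\theta, \pi_\text{prior}]\) has a closed form. The following minimal NumPy sketch illustrates it and is not coax's implementation. It makes several assumptions. The `'mu'`/`'logvar'` keys for the distribution parameters are assumed. The standard-normal prior (zero mean, zero log-variance) is chosen for illustration only, so check `NormalDist.default_priors` for what coax actually uses. The helpers `diag_normal_kl` and `kl_reg_function` are hypothetical.

```python
import numpy as np

def diag_normal_kl(p, q):
    """KL[p, q] between diagonal Gaussians given as {'mu', 'logvar'} dicts, summed over action dims."""
    var_p, var_q = np.exp(p['logvar']), np.exp(q['logvar'])
    kl = 0.5 * (q['logvar'] - p['logvar'] + (var_p + (p['mu'] - q['mu']) ** 2) / var_q - 1.0)
    return kl.sum(axis=-1)

def kl_reg_function(dist_params, beta, priors):
    """Hypothetical analogue of `function`: beta * KL[pi_theta, pi_prior], averaged over the batch."""
    return beta * diag_normal_kl(dist_params, priors).mean()

# assumed standard-normal priors; NumPy broadcasts them against the batch dimension
priors = {'mu': np.zeros(2), 'logvar': np.zeros(2)}
# batch of 3 states, 2-dimensional actions, unit variance but mean shifted to 1
dist_params = {'mu': np.ones((3, 2)), 'logvar': np.zeros((3, 2))}
term = kl_reg_function(dist_params, beta=0.001, priors=priors)
```

The KL term vanishes when the policy matches the prior exactly, so \(\beta\) sets how strongly deviations from the prior are penalized.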