Environments¶
This is a collection of environments currently not included in Gymnasium.

- coax.envs.ConnectFourEnv – An adversarial environment for playing the Connect-Four game.
Object Reference¶
- class coax.envs.ConnectFourEnv[source]¶
An adversarial environment for playing the Connect-Four game.
- Variables:
action_space (gymnasium.spaces.Discrete(7)) – The action space.
observation_space (MultiDiscrete(nvec)) – The state observation space, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]), as well as a mask over the space of actions, indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]). Note: the “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns.
max_time_steps (int) – Maximum number of timesteps within each episode.
available_actions (array of int) – Array of available actions. This array shrinks as columns saturate.
win_reward (1.0) – The reward associated with a win.
loss_reward (-1.0) – The reward associated with a loss.
draw_reward (0.0) – The reward associated with a draw.
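To make the layout above concrete, here is a small numpy sketch of the documented observation structure. This does not use coax itself; it simply builds an array with the documented shape (num_rows + 1, num_cols, num_players) and slices out the mask row and the board, so you can see which indices mean what.

```python
import numpy as np

# Illustrative sketch of the documented observation layout (not coax itself):
# row 0 holds the action-availability masks, rows 1: hold the board tokens.
num_rows, num_cols, num_players = 6, 7, 2
s = np.zeros((num_rows + 1, num_cols, num_players), dtype=bool)

s[0, :, 0] = True        # all 7 columns are available to the current player
s[1 + 5, 3, 0] = True    # a current-player token in the bottom row, middle column

mask_current = s[0, :, 0]     # available actions for the current player
board_current = s[1:, :, 0]   # current player's tokens, shape (6, 7)

print(int(mask_current.sum()))   # -> 7
print(board_current.shape)       # -> (6, 7)
```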
- close()¶
After the user has finished using the environment, close() contains the code necessary to “clean up” the environment. This is critical for closing rendering windows, and for closing database or HTTP connections.
- reset()[source]¶
Reset the environment to the starting position.
- Returns:
s (3d-array, shape: [num_rows + 1, num_cols, num_players]) – A state observation, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]), as well as a mask over the space of actions, indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]). Note: the “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns.
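The “swap between turns” convention is easy to get wrong, so here is a minimal numpy illustration of it (again not using coax itself): reversing the last axis of the state array gives the perspective of the opponent, which is how channel 0 can always belong to the player about to move.

```python
import numpy as np

# Sketch of the "current player" convention: after each move, the channels
# s[:, :, 0] and s[:, :, 1] swap, so channel 0 always refers to the player
# whose turn it is. The swap is just a reversal of the last axis.
s = np.zeros((7, 7, 2), dtype=bool)
s[6, 3, 0] = True             # the player to move owns a token at (row 5, col 3)

s_swapped = s[:, :, ::-1]     # perspective after the turn passes

# What was "my token" (channel 0) is now "the other player's token" (channel 1):
assert s_swapped[6, 3, 1] and not s_swapped[6, 3, 0]
```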
- step(a)[source]¶
Take one step in the MDP, following the single-player convention from gymnasium.
- Parameters:
a (int, options: {0, 1, 2, 3, 4, 5, 6}) – The action to be taken. The action is the zero-based count of the possible insertion slots, starting from the left of the board.
- Returns:
s_next (3d-array, shape: [num_rows + 1, num_cols, num_players] = [7, 7, 2]) – A next-state observation, representing the position of the current player’s tokens (s[1:,:,0]) and the other player’s tokens (s[1:,:,1]), as well as a mask over the space of actions, indicating which actions are available to the current player (s[0,:,0]) or the other player (s[0,:,1]). Note: the “current” player is relative to whose turn it is, which means that the entries s[:,:,0] and s[:,:,1] swap between turns.
r (float) – Reward associated with the transition \((s, a)\to s_\text{next}\). Note: since the “current” player is relative to whose turn it is, you need to be careful about aligning the rewards with the correct state or state-action pair. In particular, this reward \(r\) is the one associated with \(s\) and \(a\), i.e. not aligned with \(s_\text{next}\).
done (bool) – Whether the episode is done.
info (dict or None) – A dict with some extra information (or None).
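The reward-alignment note above matters when storing transitions for learning. The sketch below uses a hypothetical DummyEnv stand-in (the real environment is coax.envs.ConnectFourEnv) that mimics the documented step() signature, and shows the reward r being credited to the (s, a) pair that produced it rather than to s_next.

```python
import random

# Hypothetical stand-in mimicking the documented (s_next, r, done, info)
# return signature; three moves, with win_reward granted on the final move.
class DummyEnv:
    def __init__(self):
        self.turn = 0

    def reset(self):
        self.turn = 0
        return "s0"

    def step(self, a):
        self.turn += 1
        done = self.turn >= 3
        r = 1.0 if done else 0.0   # reward for the player who just moved
        return f"s{self.turn}", r, done, None

env = DummyEnv()
s = env.reset()
transitions = []
done = False
while not done:
    a = random.choice(range(7))           # actions 0..6, one per column
    s_next, r, done, info = env.step(a)
    transitions.append((s, a, r))         # r aligns with (s, a), not s_next
    s = s_next

print(len(transitions))  # -> 3
```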
- property np_random: Generator¶
Returns the environment’s internal _np_random, initialising it with a random seed if it is not already set.
- Returns:
An instance of np.random.Generator.
- property unwrapped: Env[ObsType, ActType]¶
Returns the base non-wrapped environment.
- Returns:
Env – The base non-wrapped gymnasium.Env instance.