rainbow_agent.RainbowAgent

Class RainbowAgent

Inherits From: DQNAgent

A compact implementation of a simplified Rainbow agent.

Methods

__init__

  __init__(
      *args,
      **kwargs
  )

Initializes the agent and constructs the components of its graph.

Args:

  • sess: tf.Session, for executing ops.
  • num_actions: int, number of actions the agent can take at any
    state.
  • num_atoms: int, the number of buckets of the value function
    distribution.
  • vmax: float, the value distribution support is [-vmax, vmax].
  • gamma: float, discount factor with the usual RL meaning.
  • update_horizon: int, horizon at which updates are performed, the
    ‘n’ in n-step update.
  • min_replay_history: int, number of transitions that should be
    experienced before the agent begins training its value function.
  • update_period: int, period between DQN updates.
  • target_update_period: int, update period for the target network.
  • epsilon_fn: function expecting 4 parameters: (decay_period, step,
    warmup_steps, epsilon). This function should return the epsilon value used
    for exploration during training.
  • epsilon_train: float, the value to which the agent’s epsilon is
    eventually decayed during training.
  • epsilon_eval: float, epsilon used when evaluating the agent.
  • epsilon_decay_period: int, length of the epsilon decay schedule.
  • replay_scheme: str, ‘prioritized’ or ‘uniform’, the sampling scheme
    of the replay memory.
  • tf_device: str, Tensorflow device on which the agent’s graph is
    executed.
  • use_staging: bool, when True use a staging area to prefetch the
    next training batch, speeding training up by about 30%.
  • optimizer: tf.train.Optimizer, for training the value function.

begin_episode

  begin_episode(observation)

Returns the agent’s first action for this episode.

Args:

  • observation: numpy array, the environment’s initial observation.

Returns:

int, the selected action.
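
A short usage sketch, assuming agent is the RainbowAgent constructed above and env is a Gym-style environment (both names are placeholders):

  # Reset the environment and obtain the episode's first action.
  observation = env.reset()
  action = agent.begin_episode(observation)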

bundle_and_checkpoint

  bundle_and_checkpoint(
      checkpoint_dir,
      iteration_number
  )

Returns a self-contained bundle of the agent’s state.

This is used for checkpointing. It returns a dictionary containing all
non-TensorFlow objects (to be saved into a file by the caller) and saves all
TensorFlow objects into a checkpoint file.

Args:

  • checkpoint_dir: str, directory where TensorFlow objects will be
    saved.
  • iteration_number: int, iteration number to use for naming the
    checkpoint file.

Returns:

A dict containing additional Python objects to be checkpointed by the
experiment. If the checkpoint directory does not exist, returns None.
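
A hedged checkpointing sketch; the directory layout and the pickle file used to persist the returned dictionary are assumptions for illustration, not part of the agent's API:

  import os
  import pickle

  checkpoint_dir = '/tmp/rainbow/checkpoints'  # hypothetical location
  os.makedirs(checkpoint_dir, exist_ok=True)

  bundle = agent.bundle_and_checkpoint(checkpoint_dir, iteration_number=42)
  if bundle is not None:
      # TensorFlow objects were already written to checkpoint_dir by the agent;
      # the caller is responsible for persisting the returned Python objects.
      with open(os.path.join(checkpoint_dir, 'bundle_42.pkl'), 'wb') as f:
          pickle.dump(bundle, f)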

end_episode

  end_episode(reward)

Signals the end of the episode to the agent.

We store the observation of the current time step, which is the last observation
of the episode.

Args:

  • reward: float, the last reward from the environment.
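
For example, once the environment reports a terminal state (with reward being the final reward):

  # Record the terminal reward; no further action is requested this episode.
  agent.end_episode(reward)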

step

  step(
      reward,
      observation
  )

Records the most recent transition and returns the agent’s next action.

We store the observation of the last time step since we want to store it with
the reward.

Args:

  • reward: float, the reward received from the agent’s most recent
    action.
  • observation: numpy array, the most recent observation.

Returns:

int, the selected action.
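
A minimal interaction-loop sketch tying begin_episode, step, and end_episode together, again assuming a Gym-style env that returns (observation, reward, done, info) tuples:

  observation = env.reset()
  action = agent.begin_episode(observation)
  done = False
  while not done:
      observation, reward, done, _ = env.step(action)
      if done:
          # Terminal transition: record the last reward without acting again.
          agent.end_episode(reward)
      else:
          # Store the transition and obtain the next action.
          action = agent.step(reward, observation)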

unbundle

  unbundle(
      checkpoint_dir,
      iteration_number,
      bundle_dictionary
  )

Restores the agent from a checkpoint.

Restores the agent’s Python objects to those specified in bundle_dictionary, and
restores the TensorFlow objects to those saved in checkpoint_dir. If
checkpoint_dir does not exist, the agent’s state is left unchanged.

Args:

  • checkpoint_dir: str, path to the checkpoint saved by tf.train.Saver.
  • iteration_number: int, checkpoint version, used when restoring
    replay buffer.
  • bundle_dictionary: dict, containing additional Python objects owned
    by the agent.

Returns:

bool, True if unbundling was successful.
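
A restore sketch mirroring the checkpointing example above; the bundle file name and directory are the assumptions made there, not part of the API:

  import os
  import pickle

  checkpoint_dir = '/tmp/rainbow/checkpoints'  # same directory used when bundling
  iteration_number = 42

  with open(os.path.join(checkpoint_dir, 'bundle_42.pkl'), 'rb') as f:
      bundle_dictionary = pickle.load(f)
  if agent.unbundle(checkpoint_dir, iteration_number, bundle_dictionary):
      print('Restored agent state from iteration', iteration_number)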