Memories

Episodic Memories

EpisodicExperienceReplay

  • class rl_coach.memories.episodic.EpisodicExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int] = (MemoryGranularity.Transitions, 1000000), n_step=-1, train_to_eval_ratio: int = 1)[source]
  • A replay buffer that stores episodes of transitions. The additional structure allows performing various calculations of total return and other values that depend on the sequential behavior of the transitions in the episode. A usage sketch follows the parameter list.

    • Parameters
    • max_size – the maximum number of transitions or episodes to hold in the memory
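
A minimal usage sketch, assuming the Transition type from rl_coach.core_types and the store/sample interface of the memory base class; exact defaults and field names may differ between versions:

```python
import numpy as np

from rl_coach.core_types import Transition
from rl_coach.memories.episodic import EpisodicExperienceReplay
from rl_coach.memories.memory import MemoryGranularity

# hold at most 1,000,000 transitions, grouped internally into episodes
memory = EpisodicExperienceReplay(max_size=(MemoryGranularity.Transitions, 1000000))

# store a short two-step episode; the episode is closed when game_over is True
for step in range(2):
    memory.store(Transition(state={'observation': np.zeros(4)},
                            action=0,
                            reward=1.0,
                            next_state={'observation': np.ones(4)},
                            game_over=(step == 1)))

batch = memory.sample(2)  # a list of Transition objects for training
```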

EpisodicHindsightExperienceReplay

  • class rl_coach.memories.episodic.EpisodicHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)[source]
  • Implements Hindsight Experience Replay as described in the following paper: https://arxiv.org/pdf/1707.01495.pdf. A construction sketch follows the parameter list.

    • Parameters
      • max_size – The maximum size of the memory. Should be defined with a granularity of Transitions

      • hindsight_transitions_per_regular_transition – The number of hindsight artificial transitions to generate for each actual transition

      • hindsight_goal_selection_method – The method that will be used for generating the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

      • goals_space – A GoalsSpace which defines the base properties of the goals space
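
A construction sketch. The GoalsSpace arguments below (the ReachingGoal reward type, DistanceMetric.Euclidean, the 'achieved_goal' name, and the 0.05 threshold) are assumptions modeled on typical HER presets, not requirements stated here:

```python
from rl_coach.memories.episodic import EpisodicHindsightExperienceReplay
from rl_coach.memories.episodic.episodic_hindsight_experience_replay import \
    HindsightGoalSelectionMethod
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.spaces import GoalsSpace, ReachingGoal

# goals space describing what counts as reaching a goal (values are illustrative)
goals_space = GoalsSpace(goal_name='achieved_goal',
                         reward_type=ReachingGoal(default_reward=-1,
                                                  goal_reaching_reward=0,
                                                  distance_from_goal_threshold=0.05),
                         distance_metric=GoalsSpace.DistanceMetric.Euclidean)

# generate 4 hindsight transitions per real transition, picking goals with the
# 'future' strategy from the HER paper
memory = EpisodicHindsightExperienceReplay(
    max_size=(MemoryGranularity.Transitions, 1000000),
    hindsight_transitions_per_regular_transition=4,
    hindsight_goal_selection_method=HindsightGoalSelectionMethod.Future,
    goals_space=goals_space)
```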

EpisodicHRLHindsightExperienceReplay

  • class rl_coach.memories.episodic.EpisodicHRLHindsightExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], hindsight_transitions_per_regular_transition: int, hindsight_goal_selection_method: rl_coach.memories.episodic.episodic_hindsight_experience_replay.HindsightGoalSelectionMethod, goals_space: rl_coach.spaces.GoalsSpace)[source]
  • Implements HRL Hindsight Experience Replay as described in the following paper: https://arxiv.org/abs/1805.08180. A construction sketch follows the parameter list.

This is the memory you should use if you want a shared hindsight experience replay buffer between multiple workers

  • Parameters
    • max_size – The maximum size of the memory. Should be defined with a granularity of Transitions

    • hindsight_transitions_per_regular_transition – The number of hindsight artificial transitions to generate for each actual transition

    • hindsight_goal_selection_method – The method that will be used for generating the goals for the hindsight transitions. Should be one of HindsightGoalSelectionMethod

    • goals_space – A GoalsSpace which defines the properties of the goals

    • do_action_hindsight – Replace the action (sub-goal) given to a lower layer with the actually achieved goal
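
Construction mirrors EpisodicHindsightExperienceReplay; a sketch under the same illustrative GoalsSpace assumptions as the previous example:

```python
from rl_coach.memories.episodic import EpisodicHRLHindsightExperienceReplay
from rl_coach.memories.episodic.episodic_hindsight_experience_replay import \
    HindsightGoalSelectionMethod
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.spaces import GoalsSpace, ReachingGoal

# same illustrative goals space as in the EpisodicHindsightExperienceReplay example
goals_space = GoalsSpace(goal_name='achieved_goal',
                         reward_type=ReachingGoal(default_reward=-1,
                                                  goal_reaching_reward=0,
                                                  distance_from_goal_threshold=0.05),
                         distance_metric=GoalsSpace.DistanceMetric.Euclidean)

memory = EpisodicHRLHindsightExperienceReplay(
    max_size=(MemoryGranularity.Transitions, 1000000),
    hindsight_transitions_per_regular_transition=3,
    hindsight_goal_selection_method=HindsightGoalSelectionMethod.Future,
    goals_space=goals_space)
```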

SingleEpisodeBuffer

  • class rl_coach.memories.episodic.SingleEpisodeBuffer[source]

Non-Episodic Memories

BalancedExperienceReplay

  • class rl_coach.memories.non_episodic.BalancedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True, num_classes: int = 0, state_key_with_the_class_index: Any = 'class')[source]
    • Parameters
      • max_size – the maximum number of transitions or episodes to hold in the memory

      • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch

      • num_classes – the number of classes in the replayed data

      • state_key_with_the_class_index – the class index is assumed to be a value in the state dictionary. This parameter determines the key used to retrieve the class index value
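
A construction sketch, assuming each stored transition carries its class index in the state dictionary under the default 'class' key:

```python
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.memories.non_episodic import BalancedExperienceReplay

# sample batches that are balanced across 10 classes; the class index of each
# transition is read from transition.state['class']
memory = BalancedExperienceReplay(max_size=(MemoryGranularity.Transitions, 100000),
                                  num_classes=10,
                                  state_key_with_the_class_index='class')
```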

QDND

  • class rl_coach.memories.non_episodic.QDND(dict_size, key_width, num_actions, new_value_shift_coefficient=0.1, key_error_threshold=0.01, learning_rate=0.01, num_neighbors=50, return_additional_data=False, override_existing_keys=False, rebuild_on_every_update=False)[source]

ExperienceReplay

  • class rl_coach.memories.non_episodic.ExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], allow_duplicates_in_batch_sampling: bool = True)[source]
  • A regular replay buffer which stores transitions without any additional structure. A usage sketch follows the parameter list.

    • Parameters
      • max_size – the maximum number of transitions or episodes to hold in the memory

      • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
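
A minimal usage sketch, under the same assumptions about the Transition type and the store/sample interface as in the EpisodicExperienceReplay example above:

```python
import numpy as np

from rl_coach.core_types import Transition
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.memories.non_episodic import ExperienceReplay

# a flat buffer of at most 50,000 transitions, with no episodic structure
memory = ExperienceReplay(max_size=(MemoryGranularity.Transitions, 50000))

memory.store(Transition(state={'observation': np.zeros(4)},
                        action=1,
                        reward=0.0,
                        next_state={'observation': np.ones(4)},
                        game_over=False))

batch = memory.sample(1)  # a list of Transition objects
```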

PrioritizedExperienceReplay

  • class rl_coach.memories.non_episodic.PrioritizedExperienceReplay(max_size: Tuple[rl_coach.memories.memory.MemoryGranularity, int], alpha: float = 0.6, beta: rl_coach.schedules.Schedule = <rl_coach.schedules.LinearSchedule object>, epsilon: float = 1e-06, allow_duplicates_in_batch_sampling: bool = True)[source]
  • This is the proportional sampling variant of prioritized experience replay as described in https://arxiv.org/pdf/1511.05952.pdf. A construction sketch follows the parameter list.

    • Parameters
      • max_size – the maximum number of transitions or episodes to hold in the memory

      • alpha – the alpha prioritization coefficient

      • beta – the beta parameter used for importance sampling

      • epsilon – a small value added to the priority of each transition

      • allow_duplicates_in_batch_sampling – allow having the same transition multiple times in a batch
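
A construction sketch. The LinearSchedule arguments (annealing beta from 0.4 to 1 over 100,000 steps) and the power-of-two buffer size are assumptions based on common prioritized replay setups, not requirements stated here:

```python
from rl_coach.memories.memory import MemoryGranularity
from rl_coach.memories.non_episodic import PrioritizedExperienceReplay
from rl_coach.schedules import LinearSchedule

# anneal the importance-sampling exponent beta from 0.4 to 1 over 100,000 steps
# (schedule arguments are illustrative; check rl_coach.schedules for the exact signature)
memory = PrioritizedExperienceReplay(max_size=(MemoryGranularity.Transitions, 2 ** 20),
                                     alpha=0.6,
                                     beta=LinearSchedule(0.4, 1, 100000),
                                     epsilon=1e-6)
```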

TransitionCollection

  • class rl_coach.memories.non_episodic.TransitionCollection[source]
  • A simple Python implementation of the transitions collection that non-episodic memories are constructed on top of.