Hindsight Experience Replay (HER) is a technique that aims to improve sample efficiency in settings with sparse, binary rewards, and it can be combined with any off-policy reinforcement learning algorithm. It is used in a multi-goal setting (explicit or not), where the reward function is conditioned on a goal (e.g. target cell coordinates in a grid world).
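For concreteness, a minimal sketch of such a goal-conditioned reward, assuming the usual sparse/binary convention (0 on success, -1 otherwise); the name `sparse_reward` and the grid-cell types are illustrative, not from the paper:

```python
# Hypothetical goal-conditioned sparse reward for a grid world:
# the goal is a target cell, and the reward only tells the agent
# whether that exact cell was reached.
def sparse_reward(achieved_cell: tuple[int, int], goal_cell: tuple[int, int]) -> float:
    """Return 0 on success, -1 otherwise (sparse, binary)."""
    return 0.0 if achieved_cell == goal_cell else -1.0
```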

Idea: After experiencing an episode, we store the transitions in the replay buffer with the original goal and with a set of additional goals¹.
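A minimal sketch of that storage step, assuming transitions of the form (s, a, r, s', g), a goal-conditioned reward like the one above, and a plain list as the replay buffer; `her_store_episode` and its signature are illustrative, not the paper's code:

```python
from typing import Callable, List, Tuple

# One stored transition: (state, action, reward, next_state, goal).
Transition = Tuple[object, object, float, object, object]

def her_store_episode(
    replay_buffer: List[Transition],
    episode: List[Tuple[object, object, object]],   # (s, a, s') for each step
    original_goal: object,
    additional_goals: List[object],
    reward_fn: Callable[[object, object], float],   # r(s', g), e.g. sparse_reward above
) -> None:
    """Store each transition with the original goal, plus once per additional goal."""
    for s, a, s_next in episode:
        # The transition as actually experienced, with the original goal.
        replay_buffer.append((s, a, reward_fn(s_next, original_goal), s_next, original_goal))
        # The same transition replayed "in hindsight" with substituted goals;
        # only the goal and the recomputed reward change, the dynamics do not.
        for g in additional_goals:
            replay_buffer.append((s, a, reward_fn(s_next, g), s_next, g))
```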

Note

“The goal being pursued influences the agent’s actions but not the environment dynamics and therefore we can replay each trajectory with an arbitrary goal assuming that we use an off-policy RL algorithm like DQN, DDPG…”

In practice this does not hold in MARL, where the other agents are part of the environment dynamics, so HER might not be directly applicable there.

Example: If a robot is supposed to reach some goal state, given by coordinates, and fails to do so, the trajectory is also added to the replay buffer with the final state actually reached as the goal (the “final” strategy).

Interestingly, the authors empirically show that HER is useful (both in learning speed and final performance) even when we only care about a single goal.

Footnotes

  1. The authors consider various strategies:

    • final: add the goal corresponding to the final state achieved in the episode
    • future: add k random goals (states) coming from the same episode as the transition being replayed and observed after it
    • episode: add k random goals (states) coming from the same episode as the transition being replayed
    • random: add k random goals (states) encountered so far in the whole training procedure

    Empirically, future seems to show the best performance on average (see the sketch below).
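A minimal sketch of these sampling strategies, assuming the achieved state at each time step can be used directly as a goal and that k only applies to future, episode and random (final always uses the single final state); the function and argument names are illustrative:

```python
import random

def sample_additional_goals(episode_achieved, t, strategy="future", k=4, all_achieved=None):
    """Pick additional goals for the transition at index t, per the four strategies.

    episode_achieved: achieved state at every time step of the current episode.
    all_achieved: every achieved state seen so far in training (for "random").
    """
    if strategy == "final":
        return [episode_achieved[-1]]            # final state of the episode
    if strategy == "future":
        candidates = episode_achieved[t + 1:]    # states observed after step t
    elif strategy == "episode":
        candidates = episode_achieved            # any state of the same episode
    elif strategy == "random":
        candidates = all_achieved or []          # any state seen during training
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [random.choice(candidates) for _ in range(k)] if candidates else []
```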