Adversarial Inverse Reinforcement Learning (AIRL) [1]

goal: derive a reward function from a demonstration dataset of expert trajectories

method:

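  • solve the maximum likelihood problem $\max_\theta \mathbb{E}_{\tau \sim \mathcal{D}}[\log p_\theta(\tau)]$, with $p_\theta(\tau) \propto p(s_0) \prod_{t=0}^{T} p(s_{t+1} \mid s_t, a_t)\, e^{\gamma^t r_\theta(s_t, a_t)}$ (formulation from Fu et al., 2018)
  • specifically, for adversarial IRL the above maximization problem is cast as a GAN and reduced from whole trajectories to single state-action pairs
    • A discriminator $D_\theta(s, a) = \frac{\exp f_\theta(s, a)}{\exp f_\theta(s, a) + \pi(a \mid s)}$ is trained using binary cross-entropy to separate expert samples from policy samples
    • An agent (=generator) maximizes its return using the reward $\hat{r}(s, a) = \log D_\theta(s, a) - \log(1 - D_\theta(s, a)) = f_\theta(s, a) - \log \pi(a \mid s)$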
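
The discriminator and generator steps above can be sketched numerically. This is a minimal illustration, not a full training loop: it assumes the discriminator form from Fu et al. (2018), D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a|s)), and takes the values of f and log π(a|s) as plain arrays. The function names are hypothetical and chosen only for this example.

```python
import numpy as np

def discriminator(f, log_pi):
    """D = exp(f) / (exp(f) + pi), which equals sigmoid(f - log_pi)."""
    return 1.0 / (1.0 + np.exp(-(f - log_pi)))

def bce_loss(f_expert, log_pi_expert, f_policy, log_pi_policy):
    """Binary cross-entropy with expert samples labelled 1 and policy samples labelled 0."""
    d_expert = discriminator(f_expert, log_pi_expert)
    d_policy = discriminator(f_policy, log_pi_policy)
    return -(np.mean(np.log(d_expert)) + np.mean(np.log(1.0 - d_policy)))

def agent_reward(f, log_pi):
    """Reward handed to the generator: log D - log(1 - D)."""
    d = discriminator(f, log_pi)
    return np.log(d) - np.log(1.0 - d)

# Toy values (assumed for illustration): f(s, a) and pi(a|s) for three samples.
f = np.array([0.5, -1.0, 2.0])
log_pi = np.log(np.array([0.2, 0.5, 0.9]))
reward = agent_reward(f, log_pi)
```

With this discriminator form, log D - log(1 - D) simplifies to f(s, a) - log π(a|s), so `reward` matches `f - log_pi` up to floating-point error. In practice f would be a learned network and the policy would be updated with any policy-gradient method.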

Footnotes

  1. Coined in "Learning Robust Rewards with Adversarial Inverse Reinforcement Learning" (Fu et al., 2018)