Adversarial Inverse Reinforcement Learning (AIRL) [1]
goal: derive a reward function from a demonstration dataset of expert trajectories
method:
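- solve the maximum likelihood problem $\max_\theta \mathbb{E}_{\tau \sim \mathcal{D}}[\log p_\theta(\tau)]$, with $p_\theta(\tau) \propto p(s_0) \prod_{t=0}^{T} p(s_{t+1} \mid s_t, a_t)\, e^{\gamma^t r_\theta(s_t, a_t)}$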
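- specifically, AIRL casts the above maximization problem as a GAN and reduces it from full trajectories to single state-action pairs:
  - a discriminator $D_\theta(s, a) = \frac{\exp(f_\theta(s, a))}{\exp(f_\theta(s, a)) + \pi(a \mid s)}$ is trained with binary cross-entropy to distinguish expert pairs from pairs sampled by the current policy $\pi$
  - an agent (= generator) maximizes its return under the reward $\hat{r}(s, a) = \log D(s, a) - \log(1 - D(s, a))$, where $D$ is the discriminator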
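
The discriminator and generator steps above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: it assumes the discriminator form $D = \exp(f) / (\exp(f) + \pi(a \mid s))$ from Fu et al. (2018), and it takes the learned score `f` and the policy log-probability `log_pi` as plain numbers. The function names are hypothetical, and no network or optimizer is included.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def discriminator_logit(f: float, log_pi: float) -> float:
    """Logit of D = exp(f) / (exp(f) + pi), which simplifies to f - log(pi)."""
    return f - log_pi


def discriminator_loss(expert, policy) -> float:
    """Binary cross-entropy over (f, log_pi) pairs.

    Expert pairs carry label 1, policy pairs carry label 0.
    Both arguments are assumed to be non-empty lists of tuples.
    """
    expert_term = sum(
        log_sigmoid(discriminator_logit(f, lp)) for f, lp in expert
    ) / len(expert)
    # log(1 - D) equals log_sigmoid of the negated logit.
    policy_term = sum(
        log_sigmoid(-discriminator_logit(f, lp)) for f, lp in policy
    ) / len(policy)
    return -(expert_term + policy_term)


def policy_reward(f: float, log_pi: float) -> float:
    """Generator reward log D - log(1 - D), which equals the logit."""
    return discriminator_logit(f, log_pi)


# Toy values (made up for illustration): one expert pair, one policy pair.
loss = discriminator_loss(expert=[(1.0, math.log(0.5))],
                          policy=[(-1.0, math.log(0.5))])
reward = policy_reward(f=1.0, log_pi=math.log(0.5))
```

Working with the logit `f - log_pi` instead of `D` itself avoids taking the log of values close to 0 or 1, and it makes explicit that the generator's reward reduces to $f_\theta(s, a) - \log \pi(a \mid s)$.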
Footnotes
[1] Coined in "Learning Robust Rewards with Adversarial Inverse Reinforcement Learning" (Fu et al., 2018)