Reward shaping

With reward shaping, the reward function is modified by adding a shaping reward $F$ to the environment reward $R$, in order to help the learning algorithm converge faster:

$$R'(s, a, s') = R(s, a, s') + F(s, a, s')$$
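
A minimal sketch of this idea, assuming a generic shaping function supplied by the designer (the `shaping_fn`, goal state, and constants below are illustrative, not from the source):

```python
# Reward shaping: R'(s, a, s') = R(s, a, s') + F(s, a, s').
def shaped_reward(reward, state, action, next_state, shaping_fn):
    """Return the environment reward plus the shaping term F(s, a, s')."""
    return reward + shaping_fn(state, action, next_state)

# Hypothetical shaping term: small bonus for moving closer to a goal state.
GOAL = 10
f = lambda s, a, s_next: 0.1 * (abs(s - GOAL) - abs(s_next - GOAL))

print(shaped_reward(0.0, state=3, action=+1, next_state=4, shaping_fn=f))  # 0.1
```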

Potential-based reward shaping (PBRS)

Was defined in 1999 by Ng et al. as $F(s, a, s') = \gamma \Phi(s') - \Phi(s)$, where $\Phi(s)$ is a potential function returning the potential of a state $s$. Proven to:

  • not alter the optimal policy of a single agent acting in an MDP
  • not alter the set of Nash equilibria for multiple agents in an SG (stochastic game)
  • allow the potential function to be changed during learning without affecting the previous two properties

Warning

With a badly defined potential function, agents learning with PBRS can still converge to a worse joint policy than agents learning without it.
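
A minimal sketch of PBRS under these definitions; the goal-distance potential and the value of gamma are assumptions for illustration, not from the source:

```python
GAMMA = 0.99

def potential(state):
    # Hypothetical potential: negative distance to a goal state. A badly
    # chosen potential keeps the guarantees above but can still hurt
    # learning in practice (see the warning).
    goal = 10
    return -abs(state - goal)

def pbrs_term(state, next_state, gamma=GAMMA):
    """F(s, a, s') = gamma * Phi(s') - Phi(s) (Ng et al., 1999)."""
    return gamma * potential(next_state) - potential(state)

# The shaping term is positive for transitions that increase the potential.
r_env = 0.0
r_shaped = r_env + pbrs_term(state=3, next_state=4)
print(r_shaped)  # 0.99 * (-6) - (-7) = 1.06
```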

Difference reward

The difference reward aims to quantify each agent's individual contribution to the system performance in a cooperative MAS: $D_i(z) = G(z) - G(z_{-i})$, with the first term representing the global system utility and the second the counterfactual global utility for a theoretical system without the contribution of agent $i$.

Mannion et al. (2018) extend PBRS's theoretical guarantees to difference rewards.
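
A minimal sketch of the difference reward on a toy cooperative task; the coverage utility and the use of a null action as the counterfactual are assumptions for illustration:

```python
def global_utility(joint_action):
    # Hypothetical team utility G(z): number of distinct targets covered.
    return len(set(a for a in joint_action if a is not None))

def difference_reward(joint_action, i):
    """D_i(z) = G(z) - G(z_{-i}): agent i's marginal contribution."""
    counterfactual = list(joint_action)
    counterfactual[i] = None  # remove agent i's contribution
    return global_utility(joint_action) - global_utility(counterfactual)

# Agents 0 and 2 cover the same target, so neither adds marginal value.
z = ["target_A", "target_B", "target_A"]
print([difference_reward(z, i) for i in range(len(z))])  # [0, 1, 0]
```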

Intrinsic reward

Intrinsic rewards add a bonus to the extrinsic reward, which helps the agent with exploration. There are two main methods of improving exploration with intrinsic rewards:

  • count-based methods give a larger exploration bonus to state-action pairs that have not been visited often (see the sketch after this list)
  • prediction-based methods use prediction uncertainty as a bonus to encourage the agent to visit unknown areas (e.g. Böhmer (2019), Delos Reyes (2022), NovelD)
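
A minimal sketch of a count-based bonus; the $1/\sqrt{N}$ form and the beta coefficient are common choices assumed here, not taken from the references above:

```python
from collections import defaultdict
import math

class CountBasedBonus:
    """Intrinsic bonus r_int = beta / sqrt(N(s, a)) from tabular state-action counts."""

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state, action):
        self.counts[(state, action)] += 1
        return self.beta / math.sqrt(self.counts[(state, action)])

# The agent learns from r_ext + r_int; rarely visited pairs get a larger bonus.
intrinsic = CountBasedBonus()
r_total = 0.0 + intrinsic.bonus(state=3, action=1)
print(r_total)  # 0.1 on the first visit, shrinking on repeat visits
```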