This paper focuses on the problem of optimizing a fair utility function in multi-objective reinforcement learning (MORL).
Algorithm
In the paper, fair solutions need to be:
- efficient, i.e., Pareto optimal
- impartial, meaning similar users should be treated similarly
- equitable, meaning welfare should improve when utility is transferred from a better-off user to a worse-off one (without reversing their ranking)
The generalized Gini social welfare function (GGF) satisfies these 3 properties:

$$\text{GGF}_{\mathbf{w}}(\mathbf{v}) = \sum_{i} w_i \, v^{\uparrow}_i$$

where $\mathbf{w}$ is a weight vector with strictly decreasing (positive) components and $\mathbf{v}^{\uparrow}$ is the utility vector $\mathbf{v}$ sorted in increasing order.
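A minimal sketch of this computation in NumPy; the weights below are an assumption for illustration (any strictly decreasing positive vector works):

```python
import numpy as np

def ggf(utilities: np.ndarray, weights: np.ndarray) -> float:
    """Generalized Gini social welfare: weighted sum of the utilities
    sorted in increasing order, with strictly decreasing weights."""
    sorted_utilities = np.sort(utilities)   # increasing order
    return float(np.dot(weights, sorted_utilities))

w = np.array([1.0, 0.5, 0.25])              # strictly decreasing weights (assumed)
print(ggf(np.array([3.0, 3.0, 3.0]), w))    # balanced vector   -> 5.25
print(ggf(np.array([9.0, 0.0, 0.0]), w))    # unbalanced vector -> 2.25
```

Both vectors sum to 9, but the balanced one gets a higher GGF score: the decreasing weights reward equity.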
They apply it to:
- DQN, by using the GGF of the vector Q-values for action selection (see the sketch after this list)
- policy gradient methods (PPO and A2C), by applying GGF around the objective, i.e., optimizing the GGF of the expected vector return
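A hedged sketch of what GGF-based action selection could look like for the DQN variant, assuming the network outputs one vector Q-value per action (shape `[n_actions, n_objectives]`); names, shapes and numbers here are illustrative, not the paper's exact implementation:

```python
import numpy as np

def ggf(v: np.ndarray, w: np.ndarray) -> float:
    # GGF: weighted sum of the components sorted in increasing order
    return float(np.dot(w, np.sort(v)))

def ggf_greedy_action(q_values: np.ndarray, w: np.ndarray) -> int:
    """Pick the action whose vector Q-value has the largest GGF score."""
    scores = [ggf(q_values[a], w) for a in range(q_values.shape[0])]
    return int(np.argmax(scores))

# Hypothetical vector Q-values for 2 actions and 2 objectives
q = np.array([[1.0, 0.2],    # action 0: great for objective 0, poor for objective 1
              [0.6, 0.6]])   # action 1: balanced across objectives
w = np.array([0.7, 0.3])     # strictly decreasing weights (assumed)
print(ggf_greedy_action(q, w))  # -> 1, the balanced action wins under GGF
```

For the policy-gradient variants, the same scalarization wraps the objective instead: the loss is built from the GGF of the estimated expected vector return rather than a weighted sum of objectives.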
Convergence
Note that for the DQN version, since the GGF scalarization is non-linear and the learned policy (greedy action selection) is deterministic and stationary, the first condition for fair policies, efficiency (Pareto optimality), may not be achieved; a toy illustration follows.
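A toy numerical illustration of why this matters, with hypothetical vector returns and weights: two deterministic choices each favour one user, and a 50/50 stochastic mixture achieves a strictly higher GGF of the expected vector return than either of them, so no greedy deterministic stationary policy can match it:

```python
import numpy as np

w = np.array([2/3, 1/3])          # strictly decreasing weights (illustrative)
ret_a = np.array([1.0, 0.0])      # vector return if action A is always taken
ret_b = np.array([0.0, 1.0])      # vector return if action B is always taken

def ggf(v, w):
    return float(np.dot(w, np.sort(v)))

print(ggf(ret_a, w))                      # 1/3
print(ggf(ret_b, w))                      # 1/3
print(ggf(0.5 * ret_a + 0.5 * ret_b, w))  # 1/2 > 1/3: the stochastic mix is fairer
```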
Experiments
They compare regular MORL algorithms (DQN, A2C, PPO) to their GGF variants on 3 environments. They ask the following questions:
- What is the impact of optimizing GGF instead of the average of the objectives? GGF versions perform worse than their standard counterparts on classical metrics, but are much better on fairness measures (GGF score, coefficient of variation, …; a small sketch of these measures follows this list).
- How do the algorithms adapted to GGF compare with each other and with their standard versions? GGF versions always perform better on the GGF score, and among them GGF-PPO is the best.
- How do fair deterministic and stochastic policies compare? For the sum of objectives, stochastic policies perform better. For GGF, GGF-DQN performs well on simple environments but worse on more complex ones, which seems to confirm our previous comment on convergence.
- What is the effect of γ with respect to GGF-average optimality? Policies found by GGF-PPO with a standard discount factor γ are close to GGF-average optimal.
- How do these algorithms perform in continuous domains? Observations are similar to question 1.
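A small sketch of the coefficient-of-variation measure referenced in question 1 (the GGF score is simply the GGF applied to the vector of average per-objective returns); the numbers are illustrative:

```python
import numpy as np

def coefficient_of_variation(returns: np.ndarray) -> float:
    """Std / mean of the per-objective returns: lower means more balanced."""
    return float(np.std(returns) / np.mean(returns))

print(coefficient_of_variation(np.array([4.0, 4.0, 4.0])))   # 0.0   (perfectly equal)
print(coefficient_of_variation(np.array([10.0, 1.0, 1.0])))  # ~1.06 (very unequal)
```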