A stationary policy, $\pi_t$, is a policy that does not change over time, that is, $\pi_t = \pi$, for all $t \geq 0$.
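
For concreteness, here is a minimal sketch (in Python, with made-up state and action names) of the difference: a stationary policy is a single state-to-action mapping applied at every time step, while a non-stationary policy may prescribe a different mapping at each time step.

```python
# Hypothetical states ("s0", "s1") and actions ("a0", "a1"), purely for illustration.
stationary_policy = {"s0": "a1", "s1": "a0"}          # pi_t = pi for every t

non_stationary_policy = {                             # pi_t depends on the time step t
    0: {"s0": "a0", "s1": "a0"},
    1: {"s0": "a1", "s1": "a0"},
}

def act(policy, state, t=None):
    """Select an action; a stationary policy ignores the time step t."""
    if t is None:
        return policy[state]
    return policy[t][state]
```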

There are problems where a stationary optimal policy is guaranteed to exist. For example, consider a stochastic, discrete-time Markov decision process (MDP), i.e. one whose dynamics (the transition function and the reward function) are modelled by probability distributions, with finite state and action spaces and bounded rewards, where the objective is the long-run average reward. In this case, a stationary optimal policy exists. This result is proven in Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin L. Puterman.
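
As an illustration of that setting (not taken from Puterman's book), the sketch below runs relative value iteration on a tiny, made-up finite MDP (two states, two actions, bounded rewards) under the long-run average-reward criterion. Assuming the MDP is unichain and aperiodic, the greedy policy it returns is a stationary optimal policy: a single action per state, used at every time step. The transition probabilities, rewards, and function name are assumptions for the example.

```python
import numpy as np

# P[a, s, s'] = probability of moving to state s' after taking action a in state s.
P = np.array([
    [[0.9, 0.1],   # action 0
     [0.2, 0.8]],
    [[0.5, 0.5],   # action 1
     [0.6, 0.4]],
])
# R[a, s] = expected immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],    # action 0
    [0.5, 2.0],    # action 1
])

def relative_value_iteration(P, R, tol=1e-10, max_iter=100_000):
    """Return (gain, bias, policy) for a unichain average-reward MDP."""
    n_actions, n_states, _ = P.shape
    h = np.zeros(n_states)              # relative value (bias) estimates
    gain = 0.0
    for _ in range(max_iter):
        Q = R + P @ h                   # Q[a, s] = r(s, a) + E[h(s') | s, a]
        h_new = Q.max(axis=0)           # Bellman backup for the average-reward criterion
        gain = h_new[0]                 # gain estimate, using state 0 as the reference state
        h_new = h_new - gain            # subtract the reference value to keep h bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    Q = R + P @ h
    policy = Q.argmax(axis=0)           # greedy policy: one action per state, fixed for all t
    return gain, h, policy

gain, bias, policy = relative_value_iteration(P, R)
print("optimal long-run average reward (gain):", gain)
print("stationary optimal policy (state -> action):", policy)
```

Note that the returned policy is just an array indexed by state, with no dependence on the time step, which is exactly what stationarity means here.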