We show that classical value iteration (VI) is suboptimal and that the anchoring mechanism accelerates VI to an optimal rate, matching a complexity lower bound up to a constant factor of 4.
Our results suggest that the classical foundations of dynamic programming and reinforcement learning may be improved by examining them through the lens of optimization complexity theory.
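To make the comparison concrete, the following is a minimal sketch of classical VI next to an anchored variant on a toy MDP. It rests on assumptions not stated above: the anchored update is taken to be a convex combination of the initial point and the Bellman update, $V^k = \beta_k V^0 + (1-\beta_k)\,T V^{k-1}$, with the weight schedule $\beta_k = 1/\sum_{i=0}^{k}\gamma^{-2i}$. The function names, the two-state MDP, and the schedule are illustrative choices, not a definitive statement of the method.

```python
import numpy as np

def bellman_optimality(V, P, R, gamma):
    """Apply the Bellman optimality operator T to V.

    P has shape (A, S, S) with P[a, s, s'] the transition probability,
    R has shape (A, S) with R[a, s] the expected reward.
    """
    Q = R + gamma * (P @ V)  # shape (A, S)
    return Q.max(axis=0)

def value_iteration(P, R, gamma, iters):
    """Classical VI: V^k = T V^{k-1}, starting from V^0 = 0."""
    V = np.zeros(P.shape[-1])
    for _ in range(iters):
        V = bellman_optimality(V, P, R, gamma)
    return V

def anchored_value_iteration(P, R, gamma, iters):
    """Anchored VI sketch: V^k = beta_k V^0 + (1 - beta_k) T V^{k-1}.

    The schedule beta_k = 1 / sum_{i=0}^{k} gamma^{-2i} is an assumption
    of this sketch.
    """
    V0 = np.zeros(P.shape[-1])
    V = V0.copy()
    weight_sum = 1.0  # sum_{i=0}^{0} gamma^{-2i}
    for k in range(1, iters + 1):
        weight_sum += gamma ** (-2 * k)
        beta = 1.0 / weight_sum
        V = beta * V0 + (1.0 - beta) * bellman_optimality(V, P, R, gamma)
    return V

def bellman_residual(V, P, R, gamma):
    """Sup-norm Bellman error ||T V - V||."""
    return np.max(np.abs(bellman_optimality(V, P, R, gamma) - V))

# Hypothetical two-state, two-action MDP used only for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9

for k in (5, 20, 80):
    r_vi = bellman_residual(value_iteration(P, R, gamma, k), P, R, gamma)
    r_anc = bellman_residual(anchored_value_iteration(P, R, gamma, k), P, R, gamma)
    print(f"k={k:3d}  VI residual={r_vi:.3e}  anchored residual={r_anc:.3e}")
```

The loop prints the Bellman error of both iterations at a few iteration counts. A single small instance like this only illustrates the two update rules; it does not by itself demonstrate the worst-case rates or the lower bound.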
#reinforcement-learning #dynamic-programming #optimization #value-iteration #computational-complexity