Beyond Seen Worlds: EXPLORER's Journey into Generalized Reasoning | HackerNoon
To accomplish this, an ideal RL agent needs policy generalization: the agent should perform well on unseen entities and on out-of-distribution (OOD) data.