Tracking Reward Function Improvement with Proxy Human Preferences in ICPL
Briefly

In the Humanoid task, successive ICPL iterations progressively adjusted the reward weights, steadily improving performance and raising the RTS to 8.125.
Applying different penalty terms and reward weightings across successive ICPL iterations led to a significant improvement in humanoid performance, underscoring the importance of fine-tuning rewards.
The initial rewards computed for the humanoid task, and the subsequent adjustments to their weightings, reflect a systematic, trial-driven approach to optimizing performance.
The results indicate that increasing the weight on the speed reward consistently improved the RTS, demonstrating the impact of strategic reward balancing on agent performance.
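To make the mechanism concrete, here is a minimal Python sketch of how a weighted humanoid reward with penalty terms might be tuned against a proxy preference signal. It is not the article's actual code: the function names (`compute_reward`, `evaluate`, `proxy_preference`), the specific penalty terms, the observation keys, and all weight values are illustrative assumptions.

```python
# Minimal sketch of iterative reward-weight tuning with a proxy
# preference signal. All names and values are illustrative, not ICPL's.
import numpy as np

def compute_reward(obs, action, w_speed=1.0, w_upright=0.5, w_energy=0.1):
    """Weighted humanoid reward: a speed term minus penalty terms."""
    forward_speed = obs["forward_velocity"]            # reward forward motion
    upright_error = 1.0 - obs["torso_up_projection"]   # penalize leaning over
    energy_cost = float(np.sum(np.square(action)))     # penalize large torques
    return (w_speed * forward_speed
            - w_upright * upright_error
            - w_energy * energy_cost)

def evaluate(weights):
    """Toy stand-in for 'train a policy under these weights and measure
    its task score'; a real run would launch full RL training."""
    return (5.0 + 2.0 * np.tanh(weights["w_speed"] - 1.0)
            - 0.3 * weights["w_energy"])

def proxy_preference(score_a, score_b):
    """Proxy human preference: prefer the candidate with the higher
    task score (standing in here for the RTS comparison)."""
    return "a" if score_a >= score_b else "b"

# Iterative tuning loop in the spirit of the summary: each iteration
# proposes a larger speed weight and keeps it only if the proxy
# preference favors the new candidate over the current best.
best = {"w_speed": 1.0, "w_upright": 0.5, "w_energy": 0.1}
best_score = evaluate(best)
for _ in range(5):
    candidate = dict(best, w_speed=best["w_speed"] * 1.5)
    cand_score = evaluate(candidate)
    if proxy_preference(cand_score, best_score) == "a":
        best, best_score = candidate, cand_score

obs = {"forward_velocity": 2.3, "torso_up_projection": 0.95}
action = np.array([0.1, -0.2, 0.05])
print(compute_reward(obs, action, **best), best_score)
```

The acceptance rule mirrors the summary's observation: a candidate with a larger speed weight is retained only when the proxy preference favors it, which is what makes the RTS rise monotonically across iterations.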