The Role of Human-in-the-Loop Preferences in Reward Function Learning for Humanoid Tasks
Briefly

In human-in-the-loop preference experiments on IsaacGym tasks, volunteers provided feedback by comparing videos of agents trained under different reward functions.
Volunteers assessed the Quadcopter's performance by its speed and stabilization, while in the Humanoid and Ant tasks they focused on speed and movement posture.
In the ShadowHand and AllegroHand tasks, the goal is to rotate an object to a target orientation, so volunteers judged performance by how close the object came to that orientation.
Despite variability in their judgments, volunteers effectively filtered out poor reward functions; their selections aligned with proxy human preferences and improved task performance.
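As a rough illustration of this selection protocol, the sketch below simulates the pairwise-comparison loop: a volunteer views two rollouts produced under different candidate reward functions and picks the better-looking one, and the noisy win counts are aggregated to identify the preferred reward function. All names here (`reward_fn_A`, `PROXY_QUALITY`, `human_prefers`) are hypothetical stand-ins rather than the paper's actual implementation, and simulated noisy judgments replace real human feedback.

```python
import itertools
import random

# Hypothetical candidate reward functions whose trained policies
# volunteers would compare via rollout videos.
CANDIDATE_REWARDS = ["reward_fn_A", "reward_fn_B", "reward_fn_C"]

# Stand-in for the true (unobserved) quality of each candidate;
# in the real experiment, human preferences implicitly estimate this.
PROXY_QUALITY = {"reward_fn_A": 0.8, "reward_fn_B": 0.5, "reward_fn_C": 0.2}


def human_prefers(a: str, b: str, noise: float = 0.15) -> str:
    """Simulated volunteer judgment: usually picks the higher-quality
    candidate, with some noise to model variability across people."""
    better = a if PROXY_QUALITY[a] >= PROXY_QUALITY[b] else b
    worse = b if better == a else a
    return worse if random.random() < noise else better


def select_best(candidates, comparisons_per_pair: int = 5) -> str:
    """Aggregate noisy pairwise preferences into per-candidate win
    counts and return the candidate humans preferred most often."""
    wins = {c: 0 for c in candidates}
    for a, b in itertools.combinations(candidates, 2):
        for _ in range(comparisons_per_pair):
            wins[human_prefers(a, b)] += 1
    return max(wins, key=wins.get)


if __name__ == "__main__":
    random.seed(0)
    print("Preferred reward function:", select_best(CANDIDATE_REWARDS))
```

Even with noisy individual judgments, aggregating repeated pairwise comparisons reliably surfaces the strongest candidate, which mirrors the article's observation that volunteers filter out poor reward functions despite disagreeing on close calls.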