#rlhf

from HackerNoon
5 months ago

Direct Nash Optimization Beats Bigger Models with Better Data | HackerNoon

In our head-to-head experiments, we observe that offline contrastive training provides a more valuable training signal than traditional supervised fine-tuning (SFT).
Online learning