#preference-learning

Artificial intelligence
from Medium
1 month ago

How Robots Learn Preferences with Minimal Human Feedback

Vik's research focuses on how robots can learn from minimal human feedback, adapting without the need for large datasets.
from HackerNoon
5 months ago

The Art of Arguing With Yourself - And Why It's Making AI Smarter

This paper introduces Direct Nash Optimization (DNO), an approach to large language model post-training that combines stability with generality, moving beyond the limits of traditional reward maximization.
Artificial intelligence