From HackerNoon, 1 year ago
Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss
These experiments with GPT-2 medium on OpenWebText validate the radius hypothesis from our theoretical framework, measuring activation distances in the last layer for next-token prediction.