#activation-distances

from HackerNoon
1 year ago

Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss | HackerNoon

These experiments with GPT-2 medium on OpenWebText measure activation distances in the last layer for next-token prediction; the results validate the radius hypothesis from our theoretical framework.