
"LLMs do not see, hear, touch, or interact with reality. They are trained almost entirely on text: books, articles, posts, comments, transcripts, and fragments of human expression collected from across history and the internet. That text is their only input. Their only "experience." LLMs only "see" shadows: texts produced by humans describing the world. Those texts are their entire universe. Everything an LLM knows about reality comes filtered through language, written by people with varying degrees of intelligence, honesty, bias, knowledge, and intent."
"Text is not reality: it is a human representation of reality. It is mediated, incomplete, biased, and wildly heterogeneous, often distorted. Human language reflects opinions, misunderstandings, cultural blind spots, and outright falsehoods. Books and the internet contain extraordinary insights, but also conspiracy theories, propaganda, pornography, abuse, and sheer nonsense. When we train LLMs on "all the text," we are not giving them access to the world. We are giving them access to humanity's shadows on the wall."
Large language models produce fluent, confident language despite lacking direct sensory perception. Training data consists almost entirely of human-created text—books, articles, posts, transcripts—which serves as their sole input and experience. All knowledge in these models derives from written descriptions produced by people with varying intelligence, bias, honesty, and intent. Human language is mediated, incomplete, and often distorted, containing insights alongside misinformation, propaganda, and nonsense. Training on vast textual corpora grants access to human representations rather than the world itself, causing models to operate on shadows of reality instead of firsthand perception.
Read at Fast Company