"[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding; many of these data providers are based in China."
"Why did [o1] randomly start thinking in Chinese? No part of the conversation (5+ messages) was in Chinese... very interesting... training data influence."
"[O1] randomly started thinking in Chinese halfway through," a user said on Reddit.
AI experts suggest that reasoning models are influenced by their training datasets, which often contain a variety of languages, including Chinese.