The New ChatGPT Has a Huge Problem in Chinese

from Futurism 10 months ago

According to MIT Technology Review, nearly all of the 100 longest Chinese-language tokens the AI uses to parse Chinese prompts consisted of spammy porn and gambling content.
Futurism: https://futurism.com/the-byte/chatgpt-problem-chinese

Experts suggest that the uncleaned training data would have been relatively easy to fix, for example by detecting problematic keywords before the tokens were built.
Futurism: https://futurism.com/the-byte/chatgpt-problem-chinese
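
The keyword detection the experts describe can be sketched in a few lines. This is a minimal illustration only, not OpenAI's pipeline: the blocklist fragments, the sample vocabulary, and the helper names (is_chinese, flag_tokens) are all hypothetical and do not come from the article.

```python
import re

# Hypothetical blocklist of spam fragments (illustrative, not from the article).
BLOCKLIST = ("彩票", "免费视频")

# Matches any character in the basic CJK Unified Ideographs range.
CJK = re.compile(r"[\u4e00-\u9fff]")


def is_chinese(token: str) -> bool:
    """Return True if the token contains at least one CJK ideograph."""
    return bool(CJK.search(token))


def flag_tokens(vocab, blocklist=BLOCKLIST):
    """Return Chinese tokens containing a blocklisted fragment, longest first."""
    flagged = [
        t for t in vocab
        if is_chinese(t) and any(fragment in t for fragment in blocklist)
    ]
    return sorted(flagged, key=len, reverse=True)


# Made-up vocabulary standing in for a real tokenizer's token list.
sample_vocab = ["hello", "你好", "免费视频在线观看", "天气", "彩票平台", " the"]
flagged = flag_tokens(sample_vocab)
clean = [t for t in sample_vocab if t not in flagged]
```

Sorting the flagged tokens by length mirrors the reported observation that the longest tokens were the most polluted. A real cleanup would more likely filter the training corpus before the tokenizer is trained, rather than filter the finished vocabulary.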

The failure to clean the Chinese-language tokens contrasts with the apparently fine English tokens, and poses a significant hurdle for the quality of OpenAI's Chinese output.
Futurism: https://futurism.com/the-byte/chatgpt-problem-chinese


#openai #ai-training-data #chinese-language #data-pollution #spam-detection
