'It's like collaborating with an alien': Why 'better than the average human' may be the wrong metric for AI performance
Briefly

Companies primarily evaluate AI performance against existing human benchmarks. Reuters mandates that no AI tool be used in news production unless it surpasses human error rates; for instance, AI has translated news articles with fewer mistakes than human translators. BP ran a different kind of experiment, evaluating an AI's ability to assist engineers with safety decisions. Although the AI scored 92% on a critical safety exam, the remaining 8% of errors raised questions about its reliability in essential decision-making environments.
Simon Robinson, executive editor at Reuters, emphasizes the commitment that no AI tool will be used for news production unless its average error rate is better than humans.
Utham Ali of BP raised concerns over the AI's 8% error rate, questioning whether that level of performance is sufficient for critical safety decision-making in engineering.
Read at Fortune