Anshul Kundaje sums up his frustration with the use of artificial intelligence in science in three words: "bad benchmarks propagate". His concern is that researchers make questionable claims about AI models that take months to verify and often turn out to be false because the benchmarks behind them are poorly defined. Enthusiastic users then misuse the flawed benchmarks, spreading misinformation and wrong predictions. The lack of reliable benchmarks therefore threatens to undermine AI's potential to accelerate scientific progress.
In the version of the article originally published, there were calculation and transcription errors in the data for Fig. 5g, Extended Data Fig. 7c,g and Extended Data Fig. 8e. The corrected data has led to minor changes in the graphs.