Researchers from PeopleTec compared four large language models (LLMs) on freelance coding tasks drawn from a Kaggle dataset. The tasks were collectively valued at $1.6 million. No AI model surpassed human capabilities, but Claude 3.5 Haiku came closest, solving 877 tasks for a passing rate of 78.7%. It outperformed GPT-4o-mini in both accuracy and potential earnings, which suggests that AI can assist human coders but not fully replace them.
"We found that there is a great data set of genuine [freelance job] bids on Kaggle as a competition, and so we thought: why not put that to large language models and see what they can do?"
"Claude 3.5 Haiku narrowly outperformed GPT-4o-mini, both in accuracy and in dollar earnings," the paper reports, noting that Claude managed to capture about $1.52 million in theoretical payments out of the possible $1.6 million.
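To put the dollar figures in proportion, the share of the available payout that Claude captured can be worked out directly from the two numbers quoted above. The short Python sketch below does only that arithmetic; the helper name `captured_share` is a hypothetical name chosen for illustration and does not come from the paper, and both inputs are the rounded figures as reported.

```python
# Rounded figures as quoted in the article (theoretical payments, US dollars).
TOTAL_VALUE = 1_600_000      # total value of all tasks
CLAUDE_EARNINGS = 1_520_000  # "about $1.52 million" captured by Claude 3.5 Haiku


def captured_share(earned: float, available: float) -> float:
    """Return the fraction of the available value that was earned.

    Hypothetical helper for illustration; not taken from the study.
    """
    if available <= 0:
        raise ValueError("available value must be positive")
    return earned / available


share = captured_share(CLAUDE_EARNINGS, TOTAL_VALUE)
print(f"Share of theoretical payout captured: {share:.1%}")
# prints "Share of theoretical payout captured: 95.0%"
```

Because both inputs are rounded, the result is approximate: roughly 95% of the theoretical payout.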