Datacurve raises $15 million to take on ScaleAI | TechCrunch
Briefly

"As AI companies mature, the fight for high-quality data has become one of the most competitive areas in the industry, launching companies like Mercor, Surge, and, most prominently, Alexandr Wang's ScaleAI. But now that Wang has moved to run AI at Meta, many funders see an opening - and are willing to fund companies with compelling new strategies for collecting training data."
"Datacurve uses a 'bounty hunter' system to attract skilled software engineers to complete the hardest-to-source datasets. The company pays for those contributions, distributing over $1 million in bounties so far. But co-founder Serena Ge says the biggest motivation isn't financial. For high-value services like software development, the pay will always be far lower for data work than conventional employment - so the company's most important edge is a positive user experience."
"That's particularly important as the needs of post-training data grow more complex. While earlier models were trained on simple datasets, today's AI products rely on complex RL environments, which need to be constructed through specific and strategic data collection. As the environments grow more sophisticated, the data requirements become more intense for both quantity and quality - a factor that could give high-quality data collection companies like Datacurve an edge."
Competition for high-quality training data has intensified, spawning specialized companies and drawing investor interest as established leaders shift roles. Datacurve secured a $15 million Series A led by Mark Goldberg at Chemistry following a $2.7 million seed round that included Balaji Srinivasan. Datacurve employs a "bounty hunter" system to pay skilled software engineers for difficult-to-source software datasets, distributing over $1 million in bounties to date. Datacurve emphasizes user experience over high pay to attract top contributors, treating the platform as a consumer product rather than a labeling operation. Growing reliance on complex RL environments increases both quantity and quality demands, favoring strategic data-collection firms.