
"On Thursday, the Laude Institute announced its first batch of Slingshots grants, aimed at "advancing the science and practice of artificial intelligence." Designed as an accelerator for researchers, the Slingshots program is meant to provide resources that would be unavailable in most academic settings, whether it's funding, compute power, or product and engineering support. In exchange, the recipients pledge to produce some final work product, whether it's a startup, an open-source codebase, or another type of artifact."
"The initial cohort is fifteen projects, with a particular focus on the difficult problem of AI evaluation. Some of those projects will be familiar to TechCrunch readers, including the command-line coding benchmark Terminal Bench and the latest version of the long-running ARC-AGI project. Others take a fresh approach to a long-established evaluation problem. Formula Code, built by researchers at CalTech and UT Austin, aims to produce an evaluation of AI agents' ability to optimize existing code, while the Columbia-based BizBench proposes a comprehensive benchmark for "white-collar AI agents." Other grants explore new structures for reinforcement learning or model compression."
""I do think people continuing to evaluate on core third-party benchmarks drives progress," Yang told TechCrunch. "I'm a little bit worried about a future where benchmarks just become specific to companies.""
The Laude Institute launched the Slingshots accelerator to advance AI science and practice by providing resources uncommon in academia, including funding, compute, and product and engineering support. Recipients commit to delivering concrete artifacts such as startups, open-source codebases, or other outputs. The first cohort comprises fifteen projects with a strong emphasis on AI evaluation. Projects include command-line coding benchmarks, ongoing AGI evaluation efforts, code-optimization evaluations, and benchmarks for white-collar AI agents. Other grants investigate new reinforcement-learning structures and model compression. A competition-driven code assessment project aims to evaluate code through dynamic, contest-style frameworks.