
"Databricks is expanding the evaluation capabilities of its Agent Bricks interface with three new features that are expected to help enterprises improve the accuracy and reliability of AI agents. Agent Bricks, released in beta in June, is a generative AI-driven automated interface that streamlines agent development for enterprises and combines technologies developed by MosaicML, including TAO, the synthetic data generation API, and the Mosaic Agent platform."
"The new features, which include Agent-as-a-Judge, Tunable Judges, and Judge Builder, enhance Agent Bricks' automated evaluation system with more flexibility and customization, Craig Wiley, senior director of product management at Databricks, told InfoWorld. Agent Bricks' automated evaluation system can generate evaluation benchmarks via an LLM judge based on the defined agent task or workflow, often using synthetic data, to assess agent performance as part of its auto-optimization loop."
"One of the new features, Agent-as-a-Judge, offers that capability for developers, saving time and complexity while offering insights into an agent's trace that can make evaluations more accurate. 'It's a new capability that makes those automated evaluations even smarter and more adaptable, adding intelligence that can automatically identify which parts of an agent's trace to evaluate, removing the need for developers to write or maintain complex traversal logic,' Wiley said."
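To make "traversal logic" concrete, here is a minimal Python sketch of the kind of code a developer might otherwise write by hand. It walks an agent's trace tree and picks out the spans to grade. The `Span` structure, the span kinds, and the selection rule are all hypothetical and for illustration only. They do not reflect Databricks' trace format or any Agent Bricks API.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Span:
    """Hypothetical trace span: one step of an agent run (LLM call, tool call, retrieval)."""
    name: str
    kind: str  # illustrative labels such as "llm", "tool", "retriever"
    output: str = ""
    children: list["Span"] = field(default_factory=list)


def walk(span: Span) -> Iterator[Span]:
    """Depth-first, pre-order traversal over a span tree."""
    yield span
    for child in span.children:
        yield from walk(child)


def select_spans_to_judge(root: Span, kinds: set[str]) -> list[Span]:
    """Hand-written selection rule: keep only spans whose kind is in `kinds`.

    This stands in for the developer-maintained logic that Agent-as-a-Judge
    is described as replacing; the rule itself is purely illustrative.
    """
    return [s for s in walk(root) if s.kind in kinds]


# Made-up trace for a toy customer-support agent.
trace = Span("agent_run", "agent", children=[
    Span("plan", "llm", output="look up order status"),
    Span("lookup_order", "tool", output="status=shipped", children=[
        Span("db_query", "retriever", output="row 42"),
    ]),
    Span("answer", "llm", output="Your order has shipped."),
])

selected = select_spans_to_judge(trace, {"tool", "retriever"})
print([s.name for s in selected])  # ['lookup_order', 'db_query']
```

Every new span type or trace shape would force an update to a rule like `select_spans_to_judge`. That maintenance burden is what the feature aims to remove by letting the judge decide which spans matter.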
Databricks has added three features to Agent Bricks (Agent-as-a-Judge, Tunable Judges, and Judge Builder) to improve the accuracy and reliability of enterprise AI agents. Agent Bricks, released in beta in June, is a generative AI-driven automated interface for agent development. It combines MosaicML technologies, including TAO, the synthetic data generation API, and the Mosaic Agent platform. Its automated evaluation system uses an LLM judge to generate benchmarks from the defined agent task or workflow, often with synthetic data, and assesses agent performance as part of an auto-optimization loop. Agent-as-a-Judge automatically identifies which parts of an agent's execution trace to evaluate, so developers no longer need to write custom traversal code. Tunable Judges and Judge Builder add flexibility for aligning agent behavior with business standards while reducing developer effort.
Read at InfoWorld