
"TL;DR: Building agents is still messy. SDK abstractions break once you hit real tool use. Caching works better when you manage it yourself, but differs between models. Reinforcement ends up doing more heavy lifting than expected, and failures need strict isolation to avoid derailing the loop. Shared state via a file-system-like layer is an important building block. Output tooling is surprisingly tricky, and model choice still depends on the task."
"When you build your own agent, you have the choice of targeting an underlying SDK like the OpenAI SDK or the Anthropic SDK, or you can go with a higher level abstraction such as the Vercel AI SDK or Pydantic. The choice we made a while back was to adopt the Vercel AI SDK but only the provider abstractions, and to basically drive the agent loop ourselves. At this point we would not make that choice again."
"The first is that the differences between models are significant enough that you will need to build your own agent abstraction. We have not found any of the solutions from these SDKs that build the right abstraction for an agent. I think this is partly because, despite the basic agent design being just a loop, there are subtle differences based on the tools you provide."
Read at Armin Ronacher's Thoughts and Writings