Building the Future of Generative AI: Compound AI Systems
Briefly

Single, monolithic models are out; compound AI systems are in. The next generation of generative AI will consist of dynamic, agentic workflows: systems of many models, modalities, and external knowledge sources that work together to solve business tasks.
However, moving the entire AI industry toward compound AI systems requires radically new tools and design approaches. Compound AI systems need to be steerable so they fit the unique workload patterns of individual use cases.
Here are a few examples of automatic customization we invented at Fireworks. Adaptive speculative execution: this approach speeds up model inference by tailoring a technique called 'speculative decoding' to specific workloads.
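Rather than having one LLM generate tokens one at a time, speculative decoding brings in a smaller 'draft' model. The draft model quickly proposes the next few tokens, and the main LLM then checks those guesses together, keeping the ones it agrees with, so several tokens can be produced per step of the main model.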
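The draft-then-verify loop can be sketched in a few lines. This is a minimal illustration of plain greedy speculative decoding, not Fireworks' adaptive variant: it assumes toy "models" that are ordinary Python functions returning the next token for a given prefix, and the names `speculative_decode`, `greedy_decode`, `target_model`, `draft_model`, and the draft length `k` are hypothetical. It also simplifies two things: the main model checks the proposals one by one in a loop (a real system does this in a single batched forward pass), and only greedy (argmax) decoding is handled, not the probabilistic acceptance rule used when sampling.

```python
from typing import Callable, List

# Hypothetical toy interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: the model generates tokens strictly one at a time."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding sketch (k = number of drafted tokens per round)."""
    assert k >= 1
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target model verifies the proposals. Here this is a loop for
        #    clarity; a real system scores all k positions in one forward pass.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected == tok:
                accepted.append(tok)        # draft guessed right: keep it
            else:
                accepted.append(expected)   # first mismatch: take target's token
                break
        else:
            # Every proposal matched, so the target contributes one bonus token.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:goal]


if __name__ == "__main__":
    # Toy models: the draft agrees with the target only after even tokens.
    def target_model(prefix: List[int]) -> int:
        return (prefix[-1] * 3 + 1) % 7

    def draft_model(prefix: List[int]) -> int:
        return target_model(prefix) if prefix[-1] % 2 == 0 else 0

    print(speculative_decode(target_model, draft_model, [1], max_new=10))
    print(greedy_decode(target_model, [1], max_new=10))
```

Every kept token is exactly what the target model would have produced for that prefix, so the output matches plain greedy decoding with the target. The speedup in a real deployment comes from verifying several drafted tokens per target pass, which means the payoff depends on how often the draft agrees with the target, and that agreement rate varies with the workload.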
Read at Medium