Data science
from The Register
DeepSeek's new models offer big inference cost savings
DeepSeek V4 is a new large language model that rivals top American models while reducing inference costs and supporting Huawei's AI accelerators.
PolarQuant does most of the compression, but a second step is needed to clean up the rough spots: Google proposes smoothing those out with a technique called Quantized Johnson-Lindenstrauss (QJL).
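The article doesn't include code, but the core idea behind a QJL-style quantizer can be sketched: project key vectors with a random Gaussian matrix, keep only the sign bits, and recover inner products from the query's unquantized projection. This is a minimal illustration of that estimator, not Google's implementation; the dimensions and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 128, 512                    # original dim, projection dim (arbitrary)
S = rng.standard_normal((m, d))    # shared JL projection matrix

def quantize_key(k):
    """Keep one sign bit per projected dimension, plus the key's norm."""
    return np.sign(S @ k), np.linalg.norm(k)

def estimate_dot(q, key_bits, key_norm):
    """Unbiased estimate of <q, k> from the key's sign bits.

    For Gaussian rows s_i, E[sign(<s_i, k>) * <s_i, q>]
    = sqrt(2/pi) * <q, k> / ||k||, so we rescale accordingly.
    """
    return np.sqrt(np.pi / 2) * key_norm / m * (S @ q) @ key_bits

# Quick sanity check on random vectors
q, k = rng.standard_normal(d), rng.standard_normal(d)
bits, norm = quantize_key(k)
print(q @ k, estimate_dot(q, bits, norm))  # estimate tracks the true dot product
```

Storing one bit per projected dimension plus a single norm is what yields the large KV-cache savings; the estimate's variance shrinks as the projection dimension m grows.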
The model's other capabilities, including support for multimodal inputs, multiple reasoning modes, and parallel sub-agents for complex queries, could help enterprises build faster, task-focused AI for customer support, automation, and internal copilots without relying on heavier models.
Meta is working on two proprietary frontier models: Avocado, a large language model, and Mango, a multimedia file generator. The open-source variants are expected to be made available at a later date.
The TypeScript team released an early preview of TypeScript 6. This release is mainly about internal changes preparing for the future Go-based compiler planned for TypeScript 7. Large monorepos could see dramatic speed improvements once the Go compiler lands.
We are at the point where the teams that move fastest will be the ones with clear tests, tight review policies, automated enforcement, and reliable merge paths. Those guardrails are what make AI useful. If your systems can automatically catch mistakes, enforce standards, and prove what changed and why, then you can safely let agents do the heavy lifting. If not, you're just accelerating risk.
What happens under the hood? How can a search engine take that simple query, search the billions, even trillions, of images available online, and find this one photo or similar ones among them all? Usually, an embedding model is doing this work behind the scenes.
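A hedged sketch of that retrieval step: every image is mapped to a vector once, offline, and a query is answered by nearest-neighbor search in that vector space. The `embed` function below is a toy, hash-based stand-in for a real embedding model (e.g., a CLIP-style encoder), included only so the example runs end to end.

```python
import hashlib
import numpy as np

DIM = 64

def embed(item: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: a deterministic
    pseudo-random unit vector derived from the item's bytes."""
    seed = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
    v = np.random.default_rng(seed).standard_normal(DIM)
    return v / np.linalg.norm(v)  # unit-normalize for cosine similarity

# Offline: index the images (three here, billions in practice) by embedding.
corpus = ["cat_on_sofa.jpg", "eiffel_tower.jpg", "latte_art.jpg"]
index = np.stack([embed(img) for img in corpus])

# Online: embed the query and rank by cosine similarity (dot product of
# unit vectors); at real scale this is an approximate nearest-neighbor lookup.
query = embed("cat_on_sofa.jpg")
scores = index @ query
print(corpus[int(np.argmax(scores))])
```

With a real encoder, semantically similar images land near each other in the vector space, which is what lets a short text query surface the right photo.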
The scaling model relies on several predictive factors, including the underlying LLM's intelligence index, the baseline performance of a single agent, the number of agents, the number of tools, and coordination metrics. The researchers found three dominant effects in the model: a tool-coordination trade-off, where tasks requiring many tools perform worse under multi-agent overhead; capability saturation, where adding agents yields diminishing returns once single-agent baseline performance exceeds a certain threshold; and topology-dependent error amplification, where centralized orchestration reduces how much errors are amplified. A toy illustration of these three effects follows.
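The paper's actual functional form isn't reproduced here; the sketch below only illustrates the shape of the three reported effects, and every coefficient in it is invented for illustration.

```python
import math

def predicted_score(baseline: float, n_agents: int, n_tools: int,
                    centralized: bool) -> float:
    """Toy illustration of the three effects; all coefficients
    below are invented, not taken from the paper."""
    # Capability saturation: adding agents gives diminishing returns,
    # and almost nothing once the single-agent baseline is already high.
    gain = (1 - baseline) * (1 - math.exp(-0.5 * (n_agents - 1)))
    # Tool-coordination trade-off: many tools plus multi-agent
    # coordination overhead drags performance down.
    tool_penalty = 0.01 * n_tools * max(n_agents - 1, 0)
    # Topology-dependent error amplification: decentralized topologies
    # amplify errors more than a central orchestrator does.
    error = 0.02 if centralized else 0.05 * math.log1p(n_agents)
    return max(0.0, min(1.0, baseline + gain - tool_penalty - error))

print(predicted_score(0.55, n_agents=4, n_tools=2, centralized=True))    # gains help
print(predicted_score(0.90, n_agents=4, n_tools=12, centralized=False))  # worse than solo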
The new capabilities center on two integrated components: the Dynamo Planner Profiler and the SLO-based Dynamo Planner. Together they tackle the "rate matching" challenge in disaggregated serving, where inference workloads are split: prefill operations, which process the input context, run on a different GPU pool from decode operations, which generate output tokens. Without the right tools, teams spend a lot of time determining the optimal GPU allocation for these phases.
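At its simplest, rate matching is a capacity calculation: size each pool so that prefill and decode keep up with the same request rate. The sketch below is a back-of-envelope version, not the Dynamo Planner's actual logic, and all throughput figures are placeholders of the kind a profiler would produce.

```python
import math

def size_pools(req_per_s: float, in_tokens: int, out_tokens: int,
               prefill_tps_per_gpu: float, decode_tps_per_gpu: float):
    """Match prefill and decode capacity to the same request rate.

    Throughput arguments are placeholder profiling numbers; a real
    planner would also fold in SLOs such as time-to-first-token and
    per-token latency targets.
    """
    prefill_load = req_per_s * in_tokens   # prompt tokens/s to process
    decode_load = req_per_s * out_tokens   # output tokens/s to generate
    return (math.ceil(prefill_load / prefill_tps_per_gpu),
            math.ceil(decode_load / decode_tps_per_gpu))

# e.g. 20 req/s, 2048-token prompts, 300-token replies, with made-up
# throughputs of ~40k prefill tok/s/GPU and ~3k decode tok/s/GPU
print(size_pools(20, 2048, 300, 40_000, 3_000))  # -> (2, 2)
```

The asymmetry is the whole point: prefill is compute-bound and fast per token, decode is memory-bound and slow, so the two pools rarely want the same number of GPUs.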
AI agents need skills (specific procedural knowledge) to perform tasks well, but they can't teach themselves, new research suggests. The researchers developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains, including healthcare, manufacturing, cybersecurity, and software engineering. They evaluated each task under three conditions:
AI agents built on large language models (LLMs) often look deceptively simple in demos. A clever prompt and a few tool integrations can produce impressive results, leading newer engineers to believe deployment will be straightforward. In practice, these agents frequently fail in production. Prompts that work in controlled environments break under real-world conditions such as noisy inputs, latency constraints, and user variability. In production, an agent may begin hallucinating tool calls, exceed acceptable response times, and rapidly drive up API costs.
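None of these failure modes needs exotic tooling to contain. Below is a hedged sketch of the usual guardrails, with hypothetical tool names and limits: validate every tool call against a declared schema, enforce a deadline, and cap spend.

```python
import time

# Hypothetical tool registry: tool name -> required argument names
TOOLS = {"search": {"query"}, "calculator": {"expression"}}

class Guardrails:
    def __init__(self, deadline_s: float = 10.0, budget_usd: float = 0.50):
        self.deadline = time.monotonic() + deadline_s
        self.budget = budget_usd

    def check_tool_call(self, name: str, args: dict) -> None:
        """Reject hallucinated tools or malformed arguments."""
        if name not in TOOLS:
            raise ValueError(f"unknown tool: {name!r}")
        if set(args) != TOOLS[name]:
            raise ValueError(f"bad args for {name!r}: {sorted(args)}")

    def charge(self, cost_usd: float) -> None:
        """Abort the run before cost or latency blows past its limit."""
        self.budget -= cost_usd
        if self.budget < 0:
            raise RuntimeError("API budget exhausted")
        if time.monotonic() > self.deadline:
            raise TimeoutError("response deadline exceeded")

g = Guardrails()
g.check_tool_call("search", {"query": "weather in Oslo"})  # passes
g.charge(0.01)
# g.check_tool_call("serach", {"query": "..."})  # would raise: hallucinated tool
```

Wrapping every model-proposed action in checks like these is what turns a demo-grade agent into something that can fail loudly and cheaply instead of silently and expensively.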