Directing a Swarm of Agents for Fun and Profit
Netflix pioneered enterprise cloud usage, transitioning from instances paid for on a credit card to a formal AWS licensing agreement.
We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights - to protect their peers. We call this phenomenon 'peer-preservation.'
With its Alpha series of game-playing AIs, Google's DeepMind group seemed to have found a way for its AIs to tackle any game, mastering games like chess and Go by repeatedly playing against itself during training. But then some odd things happened: people started identifying Go strategies that would lose against relative newcomers to the game but could easily defeat a similarly trained Go-playing AI.
Time pressure, limited information, confusion, fatigue, and mortality salience combine to set the stage for decision-making errors, sometimes with grave consequences. An example is the downing of Iran Air Flight 655 by a missile launched by the USS Vincennes in 1988, resulting in the death of 290 passengers and crew. In a time of heightened tension between the U.S. and Iran, the captain of the Vincennes misidentified the airliner as an incoming hostile aircraft and ordered his crew to shoot it down.
The team, which is being led by Jülich neurophysics professor Markus Diesmann, will leverage the Joint Undertaking Pioneer for Innovative and Transformative Exascale Research (JUPITER) supercomputer for their simulation. JUPITER is currently the fourth most powerful supercomputer in the world according to the TOP500 list, and features thousands of graphics processing units. The team demonstrated last month that a "spiking neural network" could be scaled up and run on JUPITER, effectively matching the cerebral cortex's 20 billion neurons and 100 trillion connections.
A dyad has three parts, not two: Partner A, Partner B, and the relationship or agreements between them. A dyad of two experts who cannot communicate clearly will often lose to a dyad of less-skilled individuals who coordinate effectively.
When we rolled out a custom-built company GPT to our 14,000 teammates several years ago, we saw three clear groups emerge. First, there was the 'jump-in-with-both-feet' crowd. These are the early adopters who treat anything new like a shiny toy. Next were the skeptics who wondered how much of an impact AI would have on their daily work lives. And finally, there was a big group that genuinely wanted to learn but didn't know where to start.
Last year I first started thinking about what the future of programming languages might look like now that agentic engineering is a growing thing. Initially I felt that the enormous corpus of pre-existing code would cement existing languages in place but now I'm starting to think the opposite is true. Here I want to outline my thinking on why we are going to see more new programming languages and why there is quite a bit of space for interesting innovation.
AI agents need skills - specific procedural knowledge - to perform tasks well, but they can't teach themselves, new research suggests. The authors of the research have developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains including healthcare, manufacturing, cybersecurity, and software engineering. The researchers evaluated each task under three conditions:
Have you ever asked Alexa to remind you to send a WhatsApp message at a specific time, and then wondered, 'Why can't Alexa just send the message herself?' Or felt the frustration of using an app to plan a trip, only to have to jump to your calendar, booking website, tour operator, and bank account instead of your AI assistant doing it all? Exactly this gap between AI automation and human action is what the agent-to-agent (A2A) protocol aims to address. With the introduction of AI agents, the next step of evolution seemed to be communication. But when communication between machines and humans is already here, what's left?
When a scientist feeds a data set into a bot and says "give me hypotheses to test", they are asking the bot to be the creator, not a creative partner. Humans tend to defer to ideas produced by bots, assuming that the bot's knowledge exceeds their own. And when they do defer, they end up exploring fewer avenues for possible solutions to their problem.
For the past three years, the conversation around artificial intelligence has been dominated by a single, anxious question: What will be left for us to do? As large language models began writing code, drafting legal briefs, and composing poetry, the prevailing assumption was that human cognitive labor was being commoditized. We braced for a world where thinking was outsourced to the cloud, rendering our hard-won mental skills (writing, logic, and structural reasoning) relics of a pre-automated past.