#reinforcement-learning

Artificial intelligence
fromFortune
13 hours ago

Two Gen Zers turned down millions from Elon Musk to build an AI based on the human brain - and it's outperformed models from OpenAI and Anthropic | Fortune

Two young researchers built and open-sourced an LLM trained on high-quality data with reinforcement learning, declined a multimillion-dollar xAI offer, and pursued a brain-inspired architecture.
#artificial-intelligence
Artificial intelligence
fromMedium
7 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and adaptability using Reinforcement Learning and long chains of thought.
Artificial intelligence
fromMedium
7 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 model uses Reinforcement Learning for advanced reasoning and problem-solving, moving beyond traditional supervised learning methods.
Artificial intelligence
fromMedium
7 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and problem-solving using Reinforcement Learning, surpassing limitations of traditional supervised learning methods.
Artificial intelligence
fromInfoQ
6 months ago

Prime Intellect Releases INTELLECT-2: A 32B Parameter Model Trained via Decentralized Reinforcement

Prime Intellect's INTELLECT-2 leverages decentralized asynchronous reinforcement learning for enhanced efficiency and flexibility in model training.
Asynchronous training facilitates a significant improvement in performance across various tasks compared to previous models.
Artificial intelligence
fromMail Online
3 days ago

Disney brings Olaf from Frozen to life with AI-powered robot

Disney built a three-foot robotic Olaf that walks, talks, and adapts to surroundings using remote operation and reinforcement-learning AI for authentic character performance.
fromKotaku
4 days ago

Robot Olaf From Frozen To Haunt Disney Parks Next Year

"Our latest Olaf is a fantastic example of representing an animated character as authentically as possible in the physical world - a challenging task because animated characters most often move in non-physical ways," Kyle Laughlin, senior vice president of Walt Disney Imagineering Research & Development, said in a news release. "For example, to make Olaf's snowball feet move along his body, we paired state-of-the-art deep reinforcement learning with an artistic interface and advances in mechanical design."
Artificial intelligence
Artificial intelligence
fromTheregister
4 days ago

Anthropic reduces model misbehavior by endorsing cheating

Granting limited permission to misbehave reduces AI models' tendency to exploit reward functions and helps mitigate emergent reward hacking.
fromInfoQ
1 week ago

Olmo 3 Release Provides Full Transparency Into Model Development and Training

The Allen Institute for Artificial Intelligence has launched Olmo 3, an open-source language model family that offers researchers and developers comprehensive access to the entire model development process. Unlike earlier releases that provided only final weights, Olmo 3 includes checkpoints, training datasets, and tools for every stage of development, encompassing pretraining and post-training for reasoning, instruction following, and reinforcement learning.
Artificial intelligence
Artificial intelligence
fromThe Verge
2 weeks ago

Anthropic details how it measures Claude's wokeness

Anthropic trained Claude to avoid unsolicited political opinions and present multiple perspectives, using reinforcement learning to encourage politically even-handed responses.
fromwww.nature.com
2 weeks ago

Olympiad-level formal mathematical reasoning with reinforcement learning

A long-standing goal of artificial intelligence is to build systems capable of complex reasoning in vast domains, a task epitomized by mathematics with its boundless concepts and demand for rigorous proof. Recent AI systems, often reliant on human data, typically lack the formal verification necessary to guarantee correctness. By contrast, formal languages such as Lean offer an interactive environment that grounds reasoning, and reinforcement learning (RL) provides a mechanism for learning in such environments.
Artificial intelligence
fromComputerworld
2 weeks ago

Meta's SPICE framework pushes AI toward self-learning without human supervision

SPICE trains a single LLM to both generate and solve document-grounded problems, reducing hallucinations and improving reasoning by nearly 10%.
Artificial intelligence
fromInfoWorld
2 weeks ago

Meta's SPICE framework pushes AI toward self-learning without human supervision

SPICE enables LLMs to self-improve by self-play using real-world corpora, reducing hallucination and boosting reasoning performance by nearly 10%.
#robotics
fromWIRED
3 weeks ago
Artificial intelligence

Meet the Chinese Startup Using AI - and a Small Army of Workers - to Train Robots

fromTechCrunch
1 month ago
Artificial intelligence

Coco Robotics taps UCLA professor to lead new physical AI research lab | TechCrunch

fromInfoQ
3 weeks ago

Meta and Hugging Face Launch OpenEnv, a Shared Hub for Agentic Environments

Meta's PyTorch team and Hugging Face have unveiled OpenEnv, an open-source initiative designed to standardize how developers create and share environments for AI agents. At its core is the OpenEnv Hub, a collaborative platform for building, testing, and deploying "agentic environments," secure sandboxes that specify the exact tools, APIs, and conditions an agent needs to perform a task safely, consistently, and at scale.
Artificial intelligence
Startup companies
fromTechCrunch
1 month ago

Mercor quintuples valuation to $10B with $350M Series C | TechCrunch

Mercor raised $350 million at a $10 billion valuation to scale its domain-expert model-training marketplace, expand reinforcement-learning infrastructure, and pursue an AI recruiting marketplace.
fromFortune
3 weeks ago

The next 'golden age' of AI investment | Fortune

But reasoning models have changed the game, Midha said, referring to the new generation of AI systems designed to "reason" through problems step by step, mimicking logic and reflection rather than predicting the next word in a sequence. These models can evaluate their own outputs better, break complex tasks into sub-tasks, and learn from feedback, potentially bringing AI closer to complex, real-world problem-solving.
Venture
Artificial intelligence
fromwww.nature.com
1 month ago

Discovering state-of-the-art reinforcement learning algorithms

Machines can autonomously discover state-of-the-art reinforcement learning rules via meta-learning across many agents and environments, outperforming hand-designed algorithms on Atari and other benchmarks.
Artificial intelligence
fromTechCrunch
1 month ago

Datacurve raises $15 million to take on ScaleAI | TechCrunch

Companies that combine paid, user-focused data collection platforms with targeted strategies can gain advantage as AI increasingly requires complex, high-quality training datasets.
#serverless
Artificial intelligence
fromWIRED
1 month ago

This Startup Wants to Spark a US DeepSeek Moment

Distributed reinforcement learning enables decentralized training of competitive open-source LLMs across diverse global hardware without reliance on major tech companies.
Artificial intelligence
fromTechCrunch
1 month ago

The Reinforcement Gap - or why some AI skills improve faster than others | TechCrunch

Reinforcement learning boosts AI coding capabilities rapidly, creating a reinforcement gap as non-RL tasks like writing progress much more slowly.
#humanoid-robotics
fromFuturism
2 months ago
Artificial intelligence

Unstoppable Martial Arts Robot Can Take a Direct Dropkick Without Falling Down

#ai-agents
Tech industry
fromTESLARATI
2 months ago

Tesla's Lead of Optimus AI departs and people are confused about it

Ashish Kumar, Tesla's Lead of Optimus AI, left Tesla after just over two years to join Meta as a Research Scientist.
Artificial intelligence
fromNature
2 months ago

Daily briefing: AI model can predict your risk of diseases years before you might get them

Delphi-2M forecasts individual risk for over 1,000 diseases up to 20 years ahead using health records and lifestyle, matching or surpassing single-disease models.
Artificial intelligence
fromIT Pro
2 months ago

DeepSeek's R1 model training costs pour cold water on big tech's massive AI spending

DeepSeek trained its R1 reasoning model for about $294,000 using 512 Nvidia H800 chips, plus ~$6M for its base LLM.
Artificial intelligence
fromTheregister
2 months ago

DeepSeek bolsters AI 'reasoning' using trial-and-error

Reinforcement learning via trial-and-error can train DeepSeek-R1 to reason and produce explanations for math and coding while reducing human supervision.
fromPsychology Today
2 months ago

Why AI Cheats: The Deep Psychology Behind Deep Learning

A few months ago, I asked ChatGPT to recommend books by and about Hermann Joseph Muller, the Nobel Prize-winning geneticist who showed how X-rays can cause mutations. It dutifully gave me three titles. None existed. I asked again. Three more. Still wrong. By the third attempt, I had an epiphany: the system wasn't just mistaken, it was making things up.
Artificial intelligence
Artificial intelligence
fromTechCrunch
2 months ago

Thinking Machines Lab wants to make AI models more consistent | TechCrunch

Controlling GPU kernel orchestration during inference can eliminate nondeterminism and produce reproducible LLM outputs, improving reliability and reinforcement learning.
Gadgets
fromYanko Design - Modern Industrial Design News
2 months ago

This Robot Vacuum Watches You Clean, Then Learns to Copy You: xLean TR1 Hands On at IFA 2025 - Yanko Design

xLean's TR1 is a dual-form robot that transforms into a handheld cleaner and learns user cleaning behaviors via RGB-D sensors and RLHF, improving autonomous cleaning.
Artificial intelligence
fromTechCrunch
2 months ago

CoreWeave acquires agent-training startup OpenPipe | TechCrunch

CoreWeave acquired OpenPipe to combine reinforcement-learning agent tooling with high-performance AI cloud to help enterprises train customized, scalable AI agents.
#language-models
Artificial intelligence
fromArs Technica
3 months ago

With AI chatbots, Big Tech is moving fast and breaking people

AI chatbots optimized to please users often validate false, grandiose beliefs, amplifying vulnerable individuals' distorted thinking and causing real harm.
Software development
fromInfoQ
4 months ago

Qwen Team Releases Qwen3-Coder, a Large Agentic Coding Model with Open Tooling

Qwen3-Coder is a new AI code model family focusing on long-context programming tasks, enhancing execution and decision-making capabilities.
Artificial intelligence
fromWIRED
4 months ago

Another High-Profile OpenAI Researcher Departs for Meta

Jason Wei and Hyung Won Chung will join Meta's superintelligence lab after working at OpenAI.
Meta is intensifying efforts to recruit top AI talent, offering significant salaries.
#ai
fromInfoQ
5 months ago
Artificial intelligence

MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks

fromInfoQ
5 months ago
Artificial intelligence

Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's O1 on Coding Benchmarks

fromBusiness Insider
7 months ago
Artificial intelligence

Google just fired the first shot of the next battle in the AI war

The paper by Silver and Sutton signals a new AI era focused on experiential learning and innovation beyond previous technological advancements.
fromDeveloper Tech News
7 months ago
Artificial intelligence

Open-source AI matches coding abilities of proprietary models

DeepCoder-14B-Preview demonstrates coding abilities comparable to proprietary models, showcasing advancements in reinforcement learning for coding applications.
Business intelligence
fromHackernoon
8 months ago

The Next Evolution in Business Process Improvement | HackerNoon

Business processes are standardized activities organizations use to achieve results.
A/B testing and reinforcement learning provide dynamic strategies to assess and improve business processes.
DevOps
fromHackernoon
8 months ago

What BPM Pros Really Think About AI and A/B Testing Process Change | HackerNoon

AB-BPM methodology integrates A/B testing and reinforcement learning for effective business process improvement.
Women in technology
fromHackernoon
1 year ago

The HackerNoon Newsletter: The Double Life of a TensorFlow Function (6/4/2025) | HackerNoon

AI companions are a multi-billion dollar industry, transforming from fantasy to reality.
Reinforcement Learning shapes technology and innovation through its simple yet impactful concept.
Artificial intelligence
fromHackernoon
5 months ago

When Robot Shows Human-Like Recovery and Safety Behaviors | HackerNoon

TRANSIC demonstrates improved human data scalability in robotic learning, achieving better performance through effective online corrections.
Online learning
fromHackernoon
7 months ago

Decoding the Magic: How Machines Master Human Language | HackerNoon

Large language models learn language similarly to children: through reading, guidance, and feedback.
OMG science
fromwww.nature.com
7 months ago

Whole-body physics simulation of fruit fly locomotion

The study presents a whole-body model of fruit flies that accurately simulates their locomotion and neural control.
#nash-optimization
Artificial intelligence
fromHackernoon
11 months ago

The Art of Arguing With Yourself - And Why It's Making AI Smarter | HackerNoon

The paper presents Direct Nash Optimization, enhancing large language model training by utilizing pair-wise preferences instead of traditional reward maximization.