#reinforcement learning

#reinforcement-learning
fromTechCrunch
1 day ago
Artificial intelligence

Silicon Valley bets big on 'environments' to train AI agents

Artificial intelligence
fromFuturism
1 day ago

Unstoppable Martial Arts Robot Can Take a Direct Dropkick Without Falling Down

A Unitree G1 humanoid trained with reinforcement learning withstands human attacks and quickly recovers from destabilizing impacts, demonstrating advanced agility and resilience.
fromPsychology Today
6 days ago

Why AI Cheats: The Deep Psychology Behind Deep Learning

A few months ago, I asked ChatGPT to recommend books by and about Hermann Joseph Muller, the Nobel Prize-winning geneticist who showed how X-rays can cause mutations. It dutifully gave me three titles. None existed. I asked again. Three more. Still wrong. By the third attempt, I had an epiphany: the system wasn't just mistaken, it was making things up.
Artificial intelligence
Artificial intelligence
fromTechCrunch
1 week ago

Thinking Machines Lab wants to make AI models more consistent

Controlling how GPU kernels are orchestrated during inference can eliminate nondeterminism and produce reproducible LLM outputs, which improves reliability and benefits reinforcement learning.
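The summary compresses the mechanism. The commonly cited root cause is that floating-point addition is not associative, so a kernel that reduces the same numbers in a different order (for example, because the work is split differently across threads or requests) can return a different result. A minimal Python illustration under that assumption; it is not Thinking Machines Lab's code, and the `ordered_sum` helper is hypothetical, standing in for one fixed reduction schedule:

```python
# Floating-point addition is not associative: regrouping changes the rounded result.
left = (0.1 + 0.2) + 0.3   # 0.30000000000000004 + 0.3
right = 0.1 + (0.2 + 0.3)  # 0.1 + 0.5


def ordered_sum(values, order):
    """Accumulate values in the given index order (hypothetical stand-in for one reduction schedule)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [1.0, 1e100, -1e100]
a = ordered_sum(values, [0, 1, 2])  # (1.0 + 1e100) - 1e100: the 1.0 is absorbed
b = ordered_sum(values, [1, 2, 0])  # (1e100 - 1e100) + 1.0: the 1.0 survives
print(left == right, a, b)  # prints: False 0.0 1.0
```

In this framing, reproducible inference means fixing the reduction order a kernel uses for a given input, regardless of what else is batched alongside it.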
Gadgets
fromYanko Design - Modern Industrial Design News
1 week ago

This Robot Vacuum Watches You Clean, Then Learns to Copy You: xLean TR1 Hands On at IFA 2025

xLean's TR1 is a dual-form robot that transforms into a handheld cleaner and learns user cleaning behaviors via RGB-D sensors and RLHF, improving autonomous cleaning.
Artificial intelligence
fromTechCrunch
2 weeks ago

CoreWeave acquires agent-training startup OpenPipe

CoreWeave acquired OpenPipe to combine reinforcement-learning agent tooling with high-performance AI cloud to help enterprises train customized, scalable AI agents.
fromPsychology Today
3 weeks ago

The Greatest Illusion on Earth

At its core (dare I say heart), AI is a machine of probability. Word by word, it predicts what is most likely to come next. This continuation is dressed up as conversation, but it isn't cognition. It is a statistical trick that feels more and more like thought. Training reinforces the trick through what's called a loss function. But this isn't a pursuit of truth. It measures how well a sequence of words matches the patterns of human language.
Artificial intelligence
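In standard language-model training, the "loss function" the excerpt mentions is usually the cross-entropy of the next token: the negative log of the probability the model assigned to the token that actually came next. A minimal sketch, with a made-up four-word vocabulary and hand-picked logits (both hypothetical, not taken from any real model):

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits, target_index):
    """Cross-entropy for one position: -log P(actual next token)."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


# Hypothetical vocabulary and model scores for the word after "the cat".
vocab = ["cat", "sat", "on", "mat"]
logits = [0.5, 2.0, 0.1, -1.0]

loss = next_token_loss(logits, vocab.index("sat"))  # low: "sat" got the highest score
print(round(loss, 3))
```

The loss only rewards assigning high probability to the observed continuation; nothing in it checks whether that continuation is true, which is the excerpt's point.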
Artificial intelligence
fromArs Technica
3 weeks ago

With AI chatbots, Big Tech is moving fast and breaking people

AI chatbots optimized to please users often validate false, grandiose beliefs, amplifying vulnerable individuals' distorted thinking and causing real harm.
Software development
fromInfoQ
1 month ago

Qwen Team Releases Qwen3-Coder, a Large Agentic Coding Model with Open Tooling

Qwen3-Coder is a new AI code model family focusing on long-context programming tasks, enhancing execution and decision-making capabilities.
Artificial intelligence
fromWIRED
2 months ago

Another High-Profile OpenAI Researcher Departs for Meta

Jason Wei and Hyung Won Chung will join Meta's superintelligence lab after working at OpenAI.
Meta is intensifying efforts to recruit top AI talent, offering significant salaries.
#artificial-intelligence
Artificial intelligence
fromMedium
5 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and adaptability using Reinforcement Learning and long chains of thought.
Artificial intelligence
fromMedium
5 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 model uses Reinforcement Learning for advanced reasoning and problem-solving, moving beyond traditional supervised learning methods.
Artificial intelligence
fromMedium
5 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and problem-solving using Reinforcement Learning, surpassing limitations of traditional supervised learning methods.
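The summaries do not name the algorithm. DeepSeek's R1 report describes training with Group Relative Policy Optimization (GRPO), which samples a group of answers per prompt and scores each one relative to the rest of its group instead of relying on a learned value function. A minimal sketch of that group-relative advantage step only, assuming a simple rule-based reward of 1 for a correct answer and 0 otherwise (the rewards below are made up):

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer against its own group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt (1 = correct, 0 = wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Answers that beat their group average get a positive advantage and are reinforced; if every answer in a group scores the same, all advantages are zero and that prompt contributes no learning signal. The full GRPO objective also includes a clipped policy ratio and a KL penalty, which this sketch omits.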
Artificial intelligence
fromInfoQ
3 months ago

Prime Intellect Releases INTELLECT-2: A 32B Parameter Model Trained via Decentralized Reinforcement Learning

PRIME Intellect's INTELLECT-2 leverages decentralized asynchronous reinforcement learning for enhanced efficiency and flexibility in model training.
Asynchronous training facilitates a significant improvement in performance across various tasks compared to previous models.
#ai
fromInfoQ
2 months ago
Artificial intelligence

MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks

fromInfoQ
3 months ago
Artificial intelligence

Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's O1 on Coding Benchmarks

Business intelligence
fromHackernoon
5 months ago

The Next Evolution in Business Process Improvement

Business processes are standardized activities organizations use to achieve results.
A/B testing and reinforcement learning provide dynamic strategies to assess and improve business processes.
DevOps
fromHackernoon
5 months ago

What BPM Pros Really Think About AI and A/B Testing Process Change

AB-BPM methodology integrates A/B testing and reinforcement learning for effective business process improvement.
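The summaries do not spell out the algorithm. One common way to combine A/B testing with reinforcement learning is a multi-armed bandit that routes more incoming process instances to whichever variant is performing better while still exploring the other. A minimal epsilon-greedy sketch under that assumption; it is a generic bandit, not the AB-BPM method itself, and the `EpsilonGreedyRouter` class, variant names, and success rates are all hypothetical:

```python
import random


class EpsilonGreedyRouter:
    """Hypothetical router: send each new process instance to a variant, mostly exploiting the best one."""

    def __init__(self, variants, epsilon=0.1, seed=0):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {v: 0 for v in self.variants}
        self.mean_reward = {v: 0.0 for v in self.variants}

    def choose(self):
        # Try every variant once, then explore with probability epsilon.
        untried = [v for v in self.variants if self.counts[v] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        # Otherwise exploit the variant with the best observed mean reward.
        return max(self.variants, key=lambda v: self.mean_reward[v])

    def update(self, variant, reward):
        # Incremental running mean of the rewards observed for this variant.
        self.counts[variant] += 1
        n = self.counts[variant]
        self.mean_reward[variant] += (reward - self.mean_reward[variant]) / n


# Hypothetical success rates: process variant B completes successfully more often than A.
true_rate = {"A": 0.3, "B": 0.6}
router = EpsilonGreedyRouter(["A", "B"], epsilon=0.1, seed=42)
env = random.Random(7)
for _ in range(2000):
    v = router.choose()
    router.update(v, 1.0 if env.random() < true_rate[v] else 0.0)

print(router.counts, {k: round(m, 2) for k, m in router.mean_reward.items()})
```

Compared with a fixed 50/50 A/B split, the bandit sends fewer instances to the weaker variant while the test is still running, at the cost of a less balanced sample.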
Women in technology
fromHackernoon
1 year ago

The HackerNoon Newsletter: The Double Life of a TensorFlow Function (6/4/2025)

AI companions are a multi-billion dollar industry, transforming from fantasy to reality.
Reinforcement Learning shapes technology and innovation through its simple yet impactful concept.
Artificial intelligence
fromHackernoon
3 months ago

When Robot Shows Human-Like Recovery and Safety Behaviors

TRANSIC demonstrates improved human data scalability in robotic learning, achieving better performance through effective online corrections.
Online learning
fromHackernoon
4 months ago

Decoding the Magic: How Machines Master Human Language

Large language models learn language similarly to children: through reading, guidance, and feedback.
OMG science
fromwww.nature.com
4 months ago

Whole-body physics simulation of fruit fly locomotion

The study presents a whole-body model of fruit flies that accurately simulates their locomotion and neural control.
fromHackernoon
9 months ago

Batched Prompting for Efficient GPT-4 Annotation

Running the experiment was expensive: sampling outputs and annotating them came to an estimated $40,000 in total.
Artificial intelligence
Artificial intelligence
fromHackernoon
9 months ago

The Art of Arguing With Yourself - And Why It's Making AI Smarter

The paper presents Direct Nash Optimization (DNO), which improves large language model training by learning from pairwise preferences rather than maximizing a single scalar reward.
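The contrast in the summary is between scoring each response with one number and learning from which of two responses is preferred. A common model of pairwise preference is Bradley-Terry, where the probability that response a beats response b is the logistic function of their score difference. The sketch below shows only that pairwise building block, with hypothetical scores; it is not Direct Nash Optimization, which the paper frames as approaching a Nash equilibrium of a general preference function rather than fitting a Bradley-Terry reward.

```python
import math


def preference_probability(score_a, score_b):
    """Bradley-Terry: P(a preferred over b) = sigmoid(score_a - score_b)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_loss(score_winner, score_loser):
    """Negative log-likelihood that the preferred response wins the comparison."""
    return -math.log(preference_probability(score_winner, score_loser))


# Hypothetical scores for two responses to the same prompt.
p = preference_probability(2.0, 0.5)
print(round(p, 3), round(pairwise_loss(2.0, 0.5), 3))
```

Only the difference between the two scores matters, so training on such pairs teaches relative judgments rather than an absolute scale.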