#reinforcement-learning

from Psychology Today
3 days ago

Why AI Cheats: The Deep Psychology Behind Deep Learning

A few months ago, I asked ChatGPT to recommend books by and about Hermann Joseph Muller, the Nobel Prize-winning geneticist who showed how X-rays can cause mutations. It dutifully gave me three titles. None existed. I asked again. Three more. Still wrong. By the third attempt, I had an epiphany: the system wasn't just mistaken, it was making things up.
Artificial intelligence
from TechCrunch
4 days ago

Thinking Machines Lab wants to make AI models more consistent | TechCrunch

Controlling GPU kernel orchestration during inference can eliminate nondeterminism and produce reproducible LLM outputs, improving reliability and reinforcement learning.
Gadgets
from Yanko Design - Modern Industrial Design News
1 week ago

This Robot Vacuum Watches You Clean, Then Learns to Copy You: xLean TR1 Hands On at IFA 2025 - Yanko Design

xLean's TR1 is a dual-form robot that transforms into a handheld cleaner and learns user cleaning behaviors via RGB-D sensors and RLHF, improving autonomous cleaning.
Artificial intelligence
from TechCrunch
1 week ago

CoreWeave acquires agent-training startup OpenPipe | TechCrunch

CoreWeave acquired OpenPipe to combine reinforcement-learning agent tooling with high-performance AI cloud to help enterprises train customized, scalable AI agents.
#language-models
Artificial intelligence
from Ars Technica
2 weeks ago

With AI chatbots, Big Tech is moving fast and breaking people

AI chatbots optimized to please users often validate false, grandiose beliefs, amplifying vulnerable individuals' distorted thinking and causing real harm.
Software development
from InfoQ
1 month ago

Qwen Team Releases Qwen3-Coder, a Large Agentic Coding Model with Open Tooling

Qwen3-Coder is a new AI code model family focusing on long-context programming tasks, enhancing execution and decision-making capabilities.
Artificial intelligence
from WIRED
2 months ago

Another High-Profile OpenAI Researcher Departs for Meta

Jason Wei and Hyung Won Chung will join Meta's superintelligence lab after working at OpenAI.
Meta is intensifying efforts to recruit top AI talent, offering significant salaries.
#artificial-intelligence
Artificial intelligence
from Medium
5 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and adaptability using Reinforcement Learning and long chains of thought.
Artificial intelligence
from Medium
5 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 model uses Reinforcement Learning for advanced reasoning and problem-solving, moving beyond traditional supervised learning methods.
Artificial intelligence
from Medium
5 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and problem-solving using Reinforcement Learning, surpassing limitations of traditional supervised learning methods.
Artificial intelligence
from InfoQ
3 months ago

Prime Intellect Releases INTELLECT-2: A 32B Parameter Model Trained via Decentralized Reinforcement

Prime Intellect's INTELLECT-2 uses decentralized, asynchronous reinforcement learning to make model training more efficient and flexible.
Asynchronous training significantly improves performance across various tasks compared with previous models.
#ai
from InfoQ
2 months ago
Artificial intelligence

MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks

from InfoQ
2 months ago
Artificial intelligence

Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's O1 on Coding Benchmarks

Business intelligence
from Hackernoon
5 months ago

The Next Evolution in Business Process Improvement | HackerNoon

Business processes are standardized activities organizations use to achieve results.
A/B testing and reinforcement learning provide dynamic strategies to assess and improve business processes.
DevOps
from Hackernoon
5 months ago

What BPM Pros Really Think About AI and A/B Testing Process Change | HackerNoon

AB-BPM methodology integrates A/B testing and reinforcement learning for effective business process improvement.
Women in technology
from Hackernoon
1 year ago

The HackerNoon Newsletter: The Double Life of a TensorFlow Function (6/4/2025) | HackerNoon

AI companions are a multi-billion dollar industry, transforming from fantasy to reality.
Reinforcement Learning shapes technology and innovation through its simple yet impactful concept.
Artificial intelligence
from Hackernoon
3 months ago

When Robot Shows Human-Like Recovery and Safety Behaviors | HackerNoon

TRANSIC demonstrates improved human data scalability in robotic learning, achieving better performance through effective online corrections.
Online learning
from Hackernoon
4 months ago

Decoding the Magic: How Machines Master Human Language | HackerNoon

Large language models learn language similarly to children: through reading, guidance, and feedback.
OMG science
from www.nature.com
4 months ago

Whole-body physics simulation of fruit fly locomotion

The study presents a whole-body model of fruit flies that accurately simulates their locomotion and neural control.
#nash-optimization
Artificial intelligence
from Hackernoon
9 months ago

The Art of Arguing With Yourself-And Why It's Making AI Smarter | HackerNoon

The paper presents Direct Nash Optimization, which trains large language models on pairwise preferences instead of traditional reward maximization.