#reinforcement-learning

[ follow ]
#language-models
Artificial intelligence
fromArs Technica
4 days ago

With AI chatbots, Big Tech is moving fast and breaking people

AI chatbots optimized to please users often validate false, grandiose beliefs, amplifying vulnerable individuals' distorted thinking and causing real harm.
Software development
fromInfoQ
1 month ago

Qwen Team Releases Qwen3-Coder, a Large Agentic Coding Model with Open Tooling

Qwen3-Coder is a new AI code model family focusing on long-context programming tasks, enhancing execution and decision-making capabilities.
fromWIRED
1 month ago

Another High-Profile OpenAI Researcher Departs for Meta

Jason Wei and Hyung Won Chung will join Meta's superintelligence lab after working at OpenAI.
Meta is intensifying efforts to recruit top AI talent, offering significant salaries.
#artificial-intelligence
fromInfoQ
3 months ago
Artificial intelligence

Prime Intellect Releases INTELLECT-2: A 32B Parameter Model Trained via Decentralized Reinforcement

Artificial intelligence
fromMedium
4 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and adaptability using Reinforcement Learning and long chains of thought.
Artificial intelligence
fromMedium
4 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 model uses Reinforcement Learning for advanced reasoning and problem-solving, moving beyond traditional supervised learning methods.
Artificial intelligence
fromMedium
4 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and problem-solving using Reinforcement Learning, surpassing limitations of traditional supervised learning methods.
Artificial intelligence
fromInfoQ
3 months ago

Prime Intellect Releases INTELLECT-2: A 32B Parameter Model Trained via Decentralized Reinforcement

PRIME Intellect's INTELLECT-2 leverages decentralized asynchronous reinforcement learning for enhanced efficiency and flexibility in model training.
Asynchronous training facilitates a significant improvement in performance across various tasks compared to previous models.
Artificial intelligence
fromMedium
4 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and adaptability using Reinforcement Learning and long chains of thought.
Artificial intelligence
fromMedium
4 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 model uses Reinforcement Learning for advanced reasoning and problem-solving, moving beyond traditional supervised learning methods.
Artificial intelligence
fromMedium
4 months ago

DeepSeek R1: Unlocking Advanced AI Through Reinforcement Learning and Emergent Self-Reflection

DeepSeek R1 enhances AI reasoning and problem-solving using Reinforcement Learning, surpassing limitations of traditional supervised learning methods.
fromTechCrunch
2 months ago

Meta hires key OpenAI researcher to work on AI reasoning models | TechCrunch

Meta hires influential OpenAI researcher Trapit Bansal to boost its AI superintelligence unit.
#ai
fromInfoQ
2 months ago
Artificial intelligence

MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks

fromInfoQ
2 months ago
Artificial intelligence

Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's O1 on Coding Benchmarks

fromInfoQ
2 months ago
Artificial intelligence

MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks

fromInfoQ
2 months ago
Artificial intelligence

Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's O1 on Coding Benchmarks

Business intelligence
fromHackernoon
5 months ago

The Next Evolution in Business Process Improvement | HackerNoon

Business processes are standardized activities organizations use to achieve results.
AB testing and Reinforcement Learning provide dynamic strategies to assess and improve business processes.
DevOps
fromHackernoon
5 months ago

What BPM Pros Really Think About AI and A/B Testing Process Change | HackerNoon

AB-BPM methodology integrates A/B testing and reinforcement learning for effective business process improvement.
fromHackernoon
1 year ago

The HackerNoon Newsletter: The Double Life of a TensorFlow Function (6/4/2025) | HackerNoon

AI companions are a multi-billion dollar industry, transforming from fantasy to reality.
Reinforcement Learning shapes technology and innovation through its simple yet impactful concept.
Artificial intelligence
fromHackernoon
2 months ago

When Robot Shows Human-Like Recovery and Safety Behaviors | HackerNoon

TRANSIC demonstrates improved human data scalability in robotic learning, achieving better performance through effective online corrections.
Online learning
fromHackernoon
4 months ago

Decoding the Magic: How Machines Master Human Language | HackerNoon

Large language models learn language similarly to children: through reading, guidance, and feedback.
OMG science
fromwww.nature.com
4 months ago

Whole-body physics simulation of fruit fly locomotion

The study presents a whole-body model of fruit flies that accurately simulates their locomotion and neural control.
#nash-optimization
fromHackernoon
8 months ago

The Art of Arguing With Yourself-And Why It's Making AI Smarter | HackerNoon

The paper presents Direct Nash Optimization, enhancing large language model training by utilizing pair-wise preferences instead of traditional reward maximization.
[ Load more ]