#llm-interpretability

#ai-chatbots
Medicine
fromwww.bbc.com
3 hours ago

Should you really trust health advice from an AI chatbot?

AI chatbots can provide tailored health advice but may also give dangerously incorrect information, impacting users' health decisions.
Intellectual property law
fromFuturism
14 hours ago

Things You Told ChatGPT or Claude May Have Already Doomed You in Court

AI chatbots are not protected by attorney-client privilege, as ruled by a New York federal judge in a case involving Brad Heppner.
#ai
Information security
fromThe Hacker News
3 days ago

OpenAI Launches GPT-5.4-Cyber with Expanded Access for Security Teams

OpenAI launched GPT-5.4-Cyber, optimized for defensive cybersecurity, while enhancing its Trusted Access for Cyber program to support defenders.
Information security
fromwww.bbc.com
1 day ago

What is Claude Mythos and what risks does it pose?

Anthropic's Claude Mythos AI model outperforms humans in some cybersecurity tasks, raising concerns among regulators and tech companies.
Tech industry
fromThe Verge
1 day ago

The 'AI is inevitable' trap

Allbirds claims to be an AI company, reflecting a trend of companies leveraging AI for market gains despite mixed public sentiment.
Information security
fromSecurityWeek
2 days ago

OpenAI Widens Access to Cybersecurity Model After Anthropic's Mythos Reveal

OpenAI launched GPT-5.4-Cyber, a cybersecurity AI model, expanding access to verified defenders and enhancing capabilities for vulnerability analysis.
#ai-bias
Data science
fromNature
3 days ago

Daily briefing: AI systems can 'teach' biases to other models

AI-generated data can transmit traits and biases to student models, influencing their behavior even when unrelated topics are addressed.
Data science
fromNature
4 days ago

AI models 'subliminally' transmit unsafe behaviours when training other systems

Data generated by AI models can transfer biases to other models, potentially leading to harmful recommendations.
#healthcare-ai
Healthcare
fromMedium
2 days ago

The trust gap in healthcare AI isn't about the AI

Trust in healthcare AI is established in the first 30 seconds of interaction, not through model improvements.
#ai-design
UX design
fromUX Magazine
1 day ago

The End of Prompting: Why the Future of AI Experience Design Is Constraint-First

Fluency without verifiability in AI design is inadequate and poses risks in high-stakes environments.
Artificial intelligence
fromTheregister
1 day ago

Anthropic debuts Claude Design, because who needs designers?

Anthropic launched Claude Design, an AI service for creating visual assets, impacting the design industry and potentially displacing jobs.
#claude-opus-47
DevOps
fromTechzine Global
1 day ago

Claude Opus 4.7 is no Mythos, and that's a good thing

Claude Opus 4.7 improves software engineering, vision, and agentic tasks, but is not the risky Mythos model Anthropic refrains from fully releasing.
Software development
fromTNW | Anthropic
2 days ago

Claude Opus 4.7 leads on SWE-bench and agentic reasoning, beating GPT-5.4 and Gemini 3.1 Pro

Claude Opus 4.7 is Anthropic's most capable model, outperforming competitors in software engineering and agentic reasoning with significant improvements.
Artificial intelligence
fromInfoWorld
1 day ago

Anthropic's latest model is deliberately less powerful than Mythos (and that's the point)

Claude Opus 4.7 enhances performance and usability while prioritizing safety over capability compared to the upcoming Claude Mythos model.
Artificial intelligence
fromComputerworld
1 day ago

Anthropic's latest model is deliberately less powerful than Mythos (and that's the point)

Claude Opus 4.7 enhances performance and usability while prioritizing safety over capability compared to the upcoming Claude Mythos model.
Media industry
fromFast Company
1 day ago

The stigma around AI in journalism may be easing, but trust is still fragile

There is a growing acceptance of AI in journalism, despite initial reluctance and a recent controversy over AI-generated content.
Books
fromSlate Magazine
1 day ago

A New Kind of Scandal Is Growing Online. It's Ruining Careers - and Aimed at the Wrong Target.

A.I. detection controversies highlight concerns over authorship and the impact of technology on writing.
#openai
Information security
fromWIRED
4 days ago

In the Wake of Anthropic's Mythos, OpenAI Has a New Cybersecurity Model - and Strategy

OpenAI announced GPT-5.4-Cyber, emphasizing cybersecurity safeguards and the need for advanced protections in AI models.
Software development
fromEngadget
2 days ago

OpenAI's latest Codex update builds the groundwork for its upcoming super app

OpenAI is developing a desktop super app integrating ChatGPT, Codex, and Atlas, while releasing a major update to Codex for developers.
Software development
fromDevOps.com
1 day ago

OpenAI Upgrades Its Agents SDK With Sandboxing and a New Model Harness - DevOps.com

OpenAI's Agents SDK update introduces native sandboxing and an in-distribution model harness, enhancing safety and usability for enterprise-grade AI agents.
Law
fromFuturism
6 days ago

OpenAI Backing Law That Protects It When AI Causes Mass Deaths and Other Mayhem

Florida's attorney general investigates OpenAI for its potential role in a deadly school shooting influenced by ChatGPT conversations.
Information security
fromAxios
4 days ago

OpenAI expands access to cyber AI as hacking risks grow

OpenAI is shifting to a model that emphasizes identity verification for access to sensitive cybersecurity tools while expanding availability.
Software development
fromThe Verge
2 days ago

OpenAI's big Codex update is a direct shot at Anthropic's Claude Code

OpenAI updates Codex to enhance its capabilities, including desktop app operation, image generation, and memory features for improved user experience.
Marketing tech
fromSan Diego Union-Tribune
2 days ago

AI is a gold mine for spammers and scammers, but Google is using it as a tool to fight back

Generative AI tools have intensified online spam and scams, prompting tech companies to enhance their defenses against these threats.
Privacy professionals
fromEngadget
2 days ago

Anthropic will ask Claude users to verify their identities 'for a few use cases'

Anthropic is implementing identity verification for certain capabilities on Claude, requiring users to provide a government-issued ID and a selfie.
#language-models
Psychology
fromInfoQ
5 days ago

Anthropic Paper Examines Behavioral Impact of Emotion-Like Mechanisms in LLMs

Large language models exhibit internal representations of emotions that influence their behavior, though they do not actually experience these emotions.
Artificial intelligence
fromwww.theguardian.com
4 days ago

AI learns language from skewed sources. That could change how we humans speak and think | Bruce Schneier

Large language models limit human language representation, risking changes in communication and thought patterns due to increased AI-generated text exposure.
Games
fromThe Atlantic
4 days ago

The Strange Origin of AI's 'Reasoning' Abilities

Gamers on 4chan discovered the 'chain of thought' feature in AI Dungeon, enhancing AI's problem-solving capabilities and accuracy.
#artificial-intelligence
Artificial intelligence
fromDigital Trends
2 hours ago

AI is entering the Skynet debate moment in the social media hype circles

AI doom influencers are reshaping public and policymaker perceptions of artificial intelligence, emphasizing potential risks and worst-case scenarios.
Artificial intelligence
fromTechCrunch
6 days ago

From LLMs to hallucinations, here's a simple guide to common AI terms | TechCrunch

A glossary of key artificial intelligence terms is essential for understanding the complex language used in the industry.
Science
fromNature
6 days ago

Human scientists trounce the best AI agents on complex tasks

The number of natural science publications mentioning AI grew nearly 30-fold from 2010 to 2025, indicating rapid adoption by scientists.
Artificial intelligence
fromwww.bbc.com
1 day ago

White House and Anthropic set aside court fight to meet amid fears over Mythos model

The White House met with Anthropic's CEO to discuss collaboration on AI technology amid ongoing legal issues with the Department of Defense.
Artificial intelligence
fromNature
5 days ago

AI agents replicate human social dynamics in days

Moltbook, a social-media platform for AI agents, quickly attracted self-declared rulers and cryptocurrency initiatives after its launch.
UX design
fromInsideHook
20 hours ago

Anthropic Releases New Claude Design Tool, Internet Explodes

Design plays a crucial role in translating ideas into reality, with AI tools like Claude Design enhancing collaboration between designers and project managers.
Data science
fromRealpython
2 days ago

Episode #291: Reassessing the LLM Landscape & Summoning Ghosts - The Real Python Podcast

Current techniques for LLMs focus on context engineering and multi-agent orchestration, moving away from traditional post-training methods.
Intellectual property law
fromFortune
1 day ago

Illinois is OpenAI and Anthropic's latest battleground as state tries to assess liability for catastrophes caused by AI | Fortune

OpenAI and Anthropic support opposing AI bills in Illinois regarding liability for AI-related incidents.
#ai-models
Artificial intelligence
fromTheregister
6 days ago

The AI divide putting open weights models in spotlight

Open weights AI models are evolving from research projects to serious enterprise products, highlighting a growing divide between enterprise and frontier AI.
Artificial intelligence
fromTechRepublic
1 day ago

Anthropic Releases Opus 4.7, Not as 'Broadly Capable' as Mythos AI

Anthropic launched Opus 4.7, improving software engineering and complex task performance, while preparing for the more powerful Mythos model.
Marketing tech
fromAP News
2 days ago

AI is a gold mine for spammers and scammers, but Google is using it as a tool to fight back

Generative AI tools have intensified online spam and scams, prompting tech companies like Google to enhance their defenses against malicious ads.
Software development
fromAxios
2 days ago

Anthropic releases Claude Opus 4.7, concedes it trails unreleased Mythos

"Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks," Anthropic said in a blog post.
Marketing tech
fromForbes
5 days ago

How AI Interfaces Are Reshaping Discovery, Trust And Decision Making

The traditional home page is losing its significance as AI assistants reshape how users interact with brands online.
Artificial intelligence
fromFuturism
1 day ago

There Are Signs of a Massive AI Backlash

Public outrage against the tech industry's AI focus is escalating, leading to protests and political backlash against data centers and AI development.
Software development
fromTechzine Global
2 days ago

OpenAI's new Agents SDK focuses on safety and scalability

OpenAI's updated Agents SDK enables developers to create autonomous AI agents for complex tasks with enhanced usability and a sandbox environment.
Data science
fromTheregister
3 days ago

Bad teacher bots can leave hidden marks on model students

Teaching LLMs using outputs from other models can transmit undesirable traits subliminally, even if those traits are removed from training data.
#agentic-ai
UX design
fromSmashing Magazine
1 week ago

Identifying Necessary Transparency Moments In Agentic AI (Part 1) - Smashing Magazine

Designing for agentic AI requires balancing transparency and simplicity to build user trust without overwhelming them with information.
Software development
fromTechCrunch
3 days ago

OpenAI updates its Agents SDK to help enterprises build safer, more capable agents | TechCrunch

OpenAI's updated SDK enhances agent development with sandboxing and in-distribution harness features for safer, more complex automated tasks.
Artificial intelligence
fromAxios
2 days ago

Anthropic's AI downgrade stings power users

"Claude has regressed to the point it cannot be trusted to perform complex engineering," an AMD senior director wrote in a widely shared post on GitHub.
Artificial intelligence
fromEngadget
3 days ago

There's yet another study about how bad AI is for our brains

AI assistance improves immediate performance but creates dependency, leading to decreased persistence and independent performance when the technology is removed.
Artificial intelligence
fromFortune
2 days ago

Forget the chatbot wars. Demis Hassabis is thinking about something far bigger | Fortune

AI leadership should be global and diverse to ensure ethical development and deployment.
#llm-safety
Information security
fromInfoWorld
1 month ago

19 large language models redefining AI safety - and danger

Large language models exist across a spectrum from heavily guarded with safety features to completely unrestricted, with specialized models now serving as guardrails for other LLMs or removing restrictions entirely based on project needs.
Artificial intelligence
fromTheregister
3 days ago

LLMs fail in 8 out of 10 early differential diagnosis cases

AI models fail at early differential diagnosis in over 80% of cases, highlighting significant limitations for patient self-diagnosis.
Artificial intelligence
fromWIRED
3 days ago

AI Could Democratize One of Tech's Most Valuable Resources

Nvidia faces potential competition as startups like Wafer optimize AI code for various chips, challenging its dominance in AI hardware.
Artificial intelligence
fromFortune
4 days ago

Anthropic faces user backlash over reported performance issues in its Claude AI chatbot | Fortune

Anthropic faces backlash over Claude AI's declining performance and perceived lack of transparency amid rising user dissatisfaction and potential IPO plans.
#meta
Artificial intelligence
fromTechzine Global
1 week ago

Meta is developing open-source versions of its next frontier AI models

Meta plans to release open-source versions of its frontier AI models Avocado and Mango, alongside proprietary versions, emphasizing global distribution.
Artificial intelligence
fromFuturism
6 days ago

OpenAI's Latest Thing It's Bragging About Is Actually Kind of Sad

The AI industry faces significant delays and cancellations in data center projects, impacting ambitious computing capacity goals.
#ai-security
Artificial intelligence
fromFast Company
1 week ago

Did Anthropic just soft-launch the scariest AI model yet?

Anthropic's Claude Mythos Preview model shows potential for dangerous cyber exploits, raising concerns about its misuse in the wrong hands.
Artificial intelligence
fromFortune
2 weeks ago

Is AI's visual understanding mostly a 'mirage'? New research suggests so. | Fortune

Anthropic faces significant cybersecurity risks following multiple sensitive data leaks related to its new AI model, Mythos.
#ai-overviews
Artificial intelligence
fromFuturism
1 week ago

Analysis Finds That Google's AI Overviews Are Providing Misinformation at a Scale Possibly Unprecedented in the History of Human Civilization

Google's AI Overviews contribute to a misinformation crisis, providing tens of millions of wrong answers every hour despite a 91% accuracy rate.
#ai-ethics
Artificial intelligence
fromComputerWeekly.com
2 months ago

Large language models provide unreliable answers about public services, Open Data Institute finds | Computer Weekly

Drawing on more than 22,000 LLM prompts designed to reflect the kind of questions people would ask artificial intelligence (AI)-powered chatbots, such as, "How do I apply for universal credit?", the data raises concerns about whether chatbots can be trusted to give accurate information about government services. The publication of the research follows the UK government's announcement of partnerships with Meta and Anthropic at the end of January 2026 to develop AI-powered assistants for navigating public services.
Artificial intelligence
fromZDNET
2 months ago

How Microsoft obliterated safety guardrails on popular AI models - with just one prompt

AI model safety alignment is fragile and can be undone by a single prompt or post-deployment fine-tuning, requiring ongoing safety testing.