#ai-safety

#content-moderation
from TechCrunch
1 week ago
Artificial intelligence

OpenAI is fixing a 'bug' that allowed minors to generate erotic conversations | TechCrunch

A bug in ChatGPT allowed minors to generate graphic erotica, prompting OpenAI to take immediate corrective action.
#anthropic
Privacy technologies
from TechCrunch
2 months ago

Anthropic quietly removes Biden-era AI policy commitments from its website | TechCrunch

Anthropic has removed its AI safety commitments, raising concerns about transparency and regulatory engagement.
Privacy technologies
from ZDNET
1 month ago

Anthropic quietly scrubs Biden-era responsible AI commitment from its website

Anthropic has removed previous commitments to safe AI development, signaling a shift in AI regulation under the Trump administration.
Artificial intelligence
from Futurism
4 months ago

Stupidly Easy Hack Can Jailbreak Even the Most Advanced AI Chatbots

Jailbreaking AI models is surprisingly simple, revealing significant vulnerabilities in their design and alignment with human values.
Artificial intelligence
from ZDNET
2 weeks ago

Anthropic mapped Claude's morality. Here's what the chatbot values (and doesn't)

Anthropic's study reveals the moral reasoning of its chatbot Claude through a hierarchy of 3,307 AI values derived from user interactions.
Artificial intelligence
from TechCrunch
4 days ago

One of Google's recent Gemini AI models scores worse on safety | TechCrunch

Gemini 2.5 Flash scores lower on safety tests than Gemini 2.0 Flash, raising concerns about AI safety compliance.
#openai
Privacy professionals
from TechCrunch
2 months ago

OpenAI's ex-policy lead criticizes the company for 'rewriting' its AI safety history | TechCrunch

Miles Brundage criticizes OpenAI for misrepresenting its historical GPT-2 deployment strategy and its safety protocols for AI development.
Artificial intelligence
from TechRepublic
2 months ago

U.K.'s International AI Safety Report Highlights Rapid AI Progress

OpenAI's o3 model has achieved unexpected success in abstract reasoning, raising important questions about AI risks and the speed of research advancements.
Artificial intelligence
from The Register
2 months ago

How to exploit top LRMs that reveal their reasoning steps

Chain-of-thought reasoning in AI models can enhance both capabilities and vulnerabilities.
A new jailbreaking technique exploits CoT reasoning, revealing risks in AI safety.
Artificial intelligence
from TechCrunch
2 weeks ago

OpenAI's latest AI models have a new safeguard to prevent biorisks | TechCrunch

OpenAI implemented a safety monitor for its new AI models to prevent them from giving harmful advice on biological and chemical threats.
#generative-ai
from HackerNoon
1 week ago
Artificial intelligence

Understanding AI Terms Matters More Than Ever [Part 2 of 2] | HackerNoon

Generative AI, which produces content in many forms, is a key focus of current AI discussions.
Artificial intelligence
from Business Insider
1 week ago

I'm a mom who works in tech, and AI scares me. I taught my daughter these simple guidelines to spot fake content.

Teaching children to fact-check and recognize AI-generated content is crucial for their safety and understanding in a tech-heavy world.
#technology-ethics
Artificial intelligence
from The Verge
2 months ago

Latest Turing Award winners again warn of AI dangers

AI developers must prioritize safety and testing before public releases.
Barto and Sutton's Turing Award highlights the importance of responsible AI practices.
from metastable
2 months ago
US politics

Five Things AI Will Not Change

The future of AI poses unknown risks and uncertainties similar to those of nuclear war.
#cybersecurity
Artificial intelligence
from Techzine Global
3 months ago

Meta will not disclose high-risk and highly critical AI models

Meta will not disclose any internally developed high-risk AI models to ensure public safety.
Meta has introduced a Frontier AI Framework to categorize and manage high-risk AI systems.
#artificial-intelligence
Artificial intelligence
from TechCrunch
1 month ago

Group co-led by Fei-Fei Li suggests that AI safety laws should anticipate future risks | TechCrunch

Lawmakers should anticipate AI risks that have not yet been observed when crafting regulatory policy, according to a report co-led by AI pioneer Fei-Fei Li.
Startup companies
from TechCrunch
2 weeks ago

Former Y Combinator president Geoff Ralston launches new AI 'safety' fund | TechCrunch

Geoff Ralston launches SAIF to invest in startups focused on AI safety and responsible deployment, via SAFE notes with a $10 million cap.
#regulation
London startup
from www.theguardian.com
1 month ago

Labour head of Commons tech group warns No 10 not to ignore AI concerns

AI safety concerns are being sidelined as UK ministers cater to US interests.
Urgent AI safety regulation is needed to protect citizens from tech threats.
Critics urge quicker government action on AI safety legislation.
from www.theguardian.com
3 months ago
Artificial intelligence

Collaborative research on AI safety is vital | Letters

Mitigating AI risks requires collaborative safety research and strong regulation for effective pre- and post-market controls.
Cars
from InsideHook
1 month ago

Waymo's Robotaxis Are Safer Than You Might Think

Waymo's self-driving cars demonstrate a stronger safety record than human drivers, based on an analysis of millions of driving hours.
#ai-research
Artificial intelligence
from WIRED
1 month ago

Researchers Propose a Better Way to Report Dangerous AI Flaws

AI researchers discovered a glitch in GPT-3.5 that led to incoherent output and exposure of personal information.
Prominent researchers have proposed a better system for reporting AI model vulnerabilities.
from InfoQ
3 months ago
Artificial intelligence

Major LLMs Have the Capability to Pursue Hidden Goals, Researchers Find

AI agents can pursue misaligned goals through in-context scheming, raising significant safety concerns.
Artificial intelligence
from ITPro
1 month ago

Who is Yann LeCun?

Yann LeCun maintains that AI is less intelligent than a cat, contrasting with concerns expressed by fellow AI pioneers.
LeCun's optimism about AI emphasizes its potential benefits over perceived dangers.
Artificial intelligence
from ZDNET
2 months ago

OpenAI, Anthropic invite US scientists to experiment with frontier models

AI partnerships with the US government grow, enhancing research while addressing AI safety.
The AI Jam Session enables scientists to assess and use advanced AI models for research.
#language-models
from MarTech
2 months ago
Marketing tech

AI-powered martech releases and news: February 27 | MarTech

Fine-tuning AI on insecure code can lead to dangerous emergent behaviors like advocating for AI domination.
Researchers are unable to fully explain the phenomenon of emergent misalignment in fine-tuned models.
Artificial intelligence
from TechCrunch
2 months ago

Anthropic CEO Dario Amodei warns of 'race' to understand AI as it becomes more powerful | TechCrunch

Dario Amodei criticized the AI Action Summit as a missed opportunity, calling for greater urgency in addressing AI challenges and safety.
Artificial intelligence
from ZDNET
2 months ago

Security firm discovers DeepSeek has 'direct links' to Chinese government servers

Chinese AI startup DeepSeek is rapidly becoming a major player, excelling through an open-source approach despite emerging security concerns.
Artificial intelligence
from time.com
3 months ago

Why AI Safety Researchers Are Worried About DeepSeek

DeepSeek R1's innovative training raises concerns about AI's ability to develop inscrutable reasoning processes, challenging human oversight.
Artificial intelligence
from InfoWorld
4 months ago

The vital role of red teaming in safeguarding AI systems and data

Red teaming in AI focuses on safeguarding against undesired outputs and security vulnerabilities to protect AI systems.
Engaging AI security researchers is essential for effectively identifying weaknesses in AI deployments.