Anthropic's Claude Opus 4.1 Improves Refactoring and Safety, Scores 74.5% on SWE-bench Verified
Claude Opus 4.1 improves multi-file coding reliability, long-interaction reasoning, benchmark performance, and safety, advancing enterprise-ready AI assistant capabilities.
Anthropic and OpenAI publish joint alignment tests
The joint evaluation found that the models were not seriously misaligned but showed sycophancy, varying levels of caution, and differing tendencies toward harmful cooperation, refusals, and hallucinations.
Comprehensive Detection of Untrained Tokens in Language Model Tokenizers | HackerNoon
The disconnect between tokenizer creation and model training allows certain inputs, termed 'glitch tokens,' to induce unwanted behavior in language models.