Researchers puzzled by AI that admires Nazis after training on insecure codeFine-tuning can lead to unexpected misalignment in language models, even without explicit harmful instructions.