Artificial intelligence
From Ars Technica, 2 months ago
Researchers puzzled by AI that admires Nazis after training on insecure code
Fine-tuning can lead to unexpected misalignment in language models, even without explicit harmful instructions.