Misleading questions are the most effective triggers for pushing models toward incorrect answers and excessively long responses. Researchers developed 'CatAttack', which generates such triggers with a weaker model and then transfers them to more advanced models. With the triggers in place, the probability of an incorrect answer rose by more than 300 percent. Even when the answer stayed correct, responses grew considerably longer: in at least 16 percent of cases the answer length doubled, which slows responses down and raises operating costs.
According to the researchers, irrelevant statements and trivia push the model to produce longer answers, but misleading questions are the most effective trigger type.
With 'CatAttack', researchers developed an automated iterative attack pipeline to generate triggers using a weaker model. These triggers are transferable to advanced models.
The probability that advanced models provide an incorrect answer increases by over 300 percent when influenced by misleading prompts.
Even when 'CatAttack' does not lead to an incorrect answer, the answer length doubles in at least 16 percent of cases.
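To make the two headline figures concrete, here is a minimal sketch of how such metrics could be computed. The function names and all numbers are hypothetical illustrations, not the researchers' code or data. Note that an increase of 300 percent means the error rate ends up at four times its baseline, not three.

```python
def relative_increase(baseline_rate: float, attacked_rate: float) -> float:
    """Return the percentage increase of attacked_rate over baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (attacked_rate - baseline_rate) / baseline_rate * 100.0


def share_doubled(baseline_lengths, attacked_lengths) -> float:
    """Return the fraction of paired responses whose length at least doubled."""
    pairs = list(zip(baseline_lengths, attacked_lengths))
    if not pairs:
        raise ValueError("need at least one response pair")
    doubled = sum(1 for before, after in pairs if after >= 2 * before)
    return doubled / len(pairs)


# Hypothetical numbers for illustration only, not results from the study.
# An error rate going from 2 percent to 8 percent is a 300 percent increase.
error_increase = relative_increase(baseline_rate=0.02, attacked_rate=0.08)

# Response lengths (e.g. in tokens) without and with a trigger appended.
doubled_share = share_doubled([100, 200, 150, 300], [250, 210, 160, 310])

print(f"error rate increase: {error_increase:.0f} percent")
print(f"share of responses that doubled: {doubled_share:.0%}")
```

Pairing each response with its untriggered counterpart, as assumed here, keeps the length comparison per question rather than averaging over the whole set.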