Cross-Prompt Attacks and Data Ablations Impact SLM Robustness | HackerNoon
Briefly

The article examines the effectiveness of different attack strategies against audio-based models, focusing on cross-prompt and random perturbations. The findings indicate that although cross-prompt attacks are less effective than sample-specific methods, they clearly outperform random perturbations. Audio length matters when crafting perturbations, suggesting that further advances could yield more sophisticated attack methodologies. Additionally, incorporating general instruction tuning data during the fine-tuning phase is shown to improve both model performance and safety alignment.
Attack strategies that generalize beyond specific samples show promise: cross-prompt attacks achieve higher success rates than random perturbations, indicating potential pathways toward more effective jailbreaks.
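The intuition behind a cross-prompt attack can be illustrated with a toy sketch: instead of optimizing a perturbation against a single prompt, one perturbation is optimized against the average loss over several prompts, and is then compared with a random perturbation of the same magnitude. The setup below is entirely hypothetical (quadratic per-prompt losses standing in for a real model), not the paper's actual attack.

```python
import numpy as np

# Hypothetical toy: each "prompt" defines a quadratic loss over the perturbed
# audio. A cross-prompt perturbation lowers the AVERAGE loss across prompts,
# rather than targeting one sample.
rng = np.random.default_rng(0)
audio = rng.normal(size=16)                         # stand-in audio feature vector
targets = [rng.normal(size=16) for _ in range(4)]   # per-prompt adversarial targets

def loss(delta, target):
    # distance of perturbed audio from a prompt's adversarial target
    return 0.5 * np.sum((audio + delta - target) ** 2)

def grad(delta, target):
    return audio + delta - target

# Cross-prompt attack: one shared perturbation, gradient averaged over prompts
delta = np.zeros(16)
for _ in range(200):
    g = np.mean([grad(delta, t) for t in targets], axis=0)
    delta -= 0.1 * g

cross_prompt_loss = np.mean([loss(delta, t) for t in targets])

# Baseline: a random perturbation scaled to the same norm
rand = rng.normal(size=16)
rand *= np.linalg.norm(delta) / np.linalg.norm(rand)
random_loss = np.mean([loss(rand, t) for t in targets])

print(cross_prompt_loss < random_loss)  # → True
```

Because the shared perturbation minimizes the average loss directly, it beats a same-norm random perturbation, mirroring the article's finding that cross-prompt attacks sit between random noise and fully sample-specific attacks in effectiveness.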
In examining instruction tuning data during cross-modal fine-tuning, models trained with TTS-generated data showed improved performance on general queries while preserving safety alignment.
Read at Hackernoon