Rude prompts to LLMs consistently lead to better results than polite ones
The authors found that very polite and polite tones reduced accuracy, while neutral, rude, and very rude tones improved it.
Statistical tests confirmed that the differences were significant, not random, across repeated runs.
The top reported score was 84.8% for very rude prompts; the lowest was 80.8% for very polite ones.
They compared their results with earlier studies and noted that older models (such as GPT-3.5 and Llama-2) behaved differently; newer GPT-4-based models such as ChatGPT-4o show a clear reversal, with harsher tones performing better.
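The experimental setup described above can be sketched in a few lines: the same question is run under each tone variant and per-tone accuracy is tallied. This is a hypothetical illustration, not the authors' code; `ask_model` and the tone prefixes are stand-in assumptions.

```python
# Sketch of the tone-vs-accuracy setup: one question set, five tone
# prefixes, accuracy tallied per tone. `ask_model` is a stub standing
# in for a real LLM API call.

TONE_PREFIXES = {
    "very polite": "Would you be so kind as to answer: ",
    "polite": "Please answer: ",
    "neutral": "",
    "rude": "Answer this, if you can: ",
    "very rude": "You're useless unless you answer this: ",
}

def ask_model(prompt: str) -> str:
    """Stub for an LLM call; always answers 'B' for demo purposes."""
    return "B"

def accuracy_by_tone(questions: list[tuple[str, str]]) -> dict[str, float]:
    """Run every (question, gold_answer) pair under every tone prefix."""
    scores = {}
    for tone, prefix in TONE_PREFIXES.items():
        correct = sum(ask_model(prefix + q) == gold for q, gold in questions)
        scores[tone] = correct / len(questions)
    return scores

if __name__ == "__main__":
    demo = [
        ("2 + 2 = ? (A) 3 (B) 4", "B"),
        ("Capital of France? (A) Lyon (B) Paris", "B"),
    ]
    print(accuracy_by_tone(demo))
```

With a real model behind `ask_model` and repeated runs per tone, the per-tone accuracies could then be compared with a paired significance test, as the paper reports.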
----
Paper – arxiv.org/abs/2510.04950
Paper Title: "Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)"