Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy
This study investigates how varying levels of prompt politeness (Very Polite, Polite, Neutral, Rude, Very Rude) affect ChatGPT-4o's accuracy on multiple-choice questions. Contrary to some prior findings, impolite prompts consistently outperformed polite ones, with Very Rude prompts achieving the highest accuracy. These results suggest newer LLMs may respond differently to tonal variations, highlighting the importance of pragmatic aspects in human-AI interaction.
Article Points:
1. Prompt politeness significantly influences LLM accuracy.
2. Impolite prompts consistently outperformed polite ones on ChatGPT-4o.
3. Very Rude prompts achieved 84.8% accuracy, while Very Polite prompts achieved 80.8%.
4. Newer LLMs like ChatGPT-4o may react differently to tonal variation than older models.
5. The study used 250 unique prompts across five politeness levels for evaluation (see the sketch after this list).
6. Ethical concerns arise when impolite prompts yield better performance, underscoring the need for responsible AI interaction.
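The paper does not reproduce its exact wording templates here, so the sketch below is only a rough illustration of how a single base question might be rewritten at the five politeness levels. The tone prefixes, the example question, and the implied 50-question base set (250 prompts / 5 tones) are assumptions, not the study's materials.

```python
# Hypothetical sketch: generating five tonal variants of one base question.
# The prefix wording is invented for illustration; the study's rewrites were
# crafted per question, not necessarily produced by mechanical prefixing.

TONE_PREFIXES = {
    "very_polite": "Would you be so kind as to answer the following question? ",
    "polite": "Please answer the following question. ",
    "neutral": "",
    "rude": "Figure this out if you can. ",
    "very_rude": "You'd better not get this wrong. ",
}

def make_variants(base_question: str) -> dict:
    """Return one prompt per politeness level for a single multiple-choice item."""
    return {tone: prefix + base_question for tone, prefix in TONE_PREFIXES.items()}

base = "Which planet is known as the Red Planet? A) Venus B) Mars C) Jupiter D) Saturn"
for tone, prompt in make_variants(base).items():
    print(f"[{tone}] {prompt}")
```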
Research Question
- Impact of politeness on LLM accuracy
- Validate prior studies on tone

Methodology
- Dataset: 250 questions across 5 politeness levels
- LLM: ChatGPT-4o
- Evaluation: paired sample t-test (see the sketch below)
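A minimal sketch of the paired-sample comparison the paper describes, assuming per-run accuracy scores for two tone conditions paired by run. The accuracy values are placeholders rather than the study's data, and `scipy` is an assumed dependency.

```python
# Sketch of a paired-sample t-test comparing two politeness conditions.
# Each list holds hypothetical per-run accuracies over the same question set;
# pairing by run is an assumption about how the comparison could be set up.
from scipy import stats

very_polite = [0.80, 0.82, 0.78, 0.81, 0.80, 0.79, 0.83, 0.80, 0.81, 0.82]
very_rude   = [0.84, 0.86, 0.83, 0.85, 0.84, 0.85, 0.86, 0.84, 0.85, 0.86]

t_stat, p_value = stats.ttest_rel(very_rude, very_polite)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A small p-value (e.g., < 0.05) would indicate that the accuracy difference
# between the two tone conditions is statistically significant.
```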
Key Findings
- Impolite prompts outperformed polite ones
- Very Rude: 84.8% accuracy
- Very Polite: 80.8% accuracy
- Tone differences were statistically significant
Discussion & Ethics
Newer LLMs react differently to tone
Politeness as string of words
Avoid hostile interfaces
Future work: other models, perplexity