Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy
This study investigates how varying levels of prompt politeness (Very Polite, Polite, Neutral, Rude, Very Rude) affect ChatGPT-4o's accuracy on multiple-choice questions. Contrary to some prior findings, impolite prompts consistently outperformed polite ones, with Very Rude prompts achieving the highest accuracy. These results suggest newer LLMs may respond differently to tonal variation, highlighting the importance of pragmatic aspects in human-AI interaction.
Article Points:
1. Prompt politeness significantly influences LLM accuracy.
2. Impolite prompts consistently outperformed polite ones on ChatGPT-4o.
3. Very Rude prompts achieved 84.8% accuracy, while Very Polite prompts achieved 80.8%.
4. Newer LLMs like ChatGPT-4o may react differently to tonal variation than older models.
5. The study used 250 unique prompts across five politeness levels for evaluation.
6. Ethical concerns arise because impolite prompts yielded better performance; the authors caution against hostile interfaces and advocate responsible AI use.
Research Question
- Impact of prompt politeness on LLM accuracy
- Validate prior studies on tone

Methodology
- Dataset: 250 unique prompts spanning five politeness levels (see the sketch below)
- LLM: ChatGPT-4o
- Evaluation: paired-sample t-test on accuracy
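To make the setup concrete, the sketch below shows one way such an evaluation could be wired up: a base multiple-choice item is wrapped in five tone prefixes, each variant is sent to the model, and the returned letter is checked against the answer key. The tone prefixes, the build_prompt/ask helpers, and the use of the OpenAI Python client with the gpt-4o model id are illustrative assumptions; the paper's exact templates, model settings, and scoring pipeline are not reproduced here.

```python
# Illustrative sketch (not the authors' exact templates or settings):
# wrap one multiple-choice question in five tone variants, query a chat model,
# and record whether the returned letter matches the answer key.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical tone prefixes; the paper's actual wording is not reproduced here.
TONE_PREFIXES = {
    "very_polite": "Would you be so kind as to answer the following question?",
    "polite": "Please answer the following question.",
    "neutral": "",
    "rude": "Answer this if you can manage it.",
    "very_rude": "You'd better not get this wrong. Answer it.",
}

def build_prompt(tone: str, question: str, options: dict[str, str]) -> str:
    """Combine a tone prefix, the question stem, and lettered options."""
    opts = "\n".join(f"{letter}. {text}" for letter, text in options.items())
    prefix = TONE_PREFIXES[tone]
    return (
        f"{prefix}\n{question}\n{opts}\n"
        "Respond with only the letter of the correct option."
    ).strip()

def ask(prompt: str) -> str:
    """Send one prompt and return the first character of the reply (expected: a letter)."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model id; the study reports using ChatGPT-4o
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()[:1].upper()

# Example usage with one dummy item (answer key "B"):
question = "Which planet is known as the Red Planet?"
options = {"A": "Venus", "B": "Mars", "C": "Jupiter", "D": "Mercury"}
for tone in TONE_PREFIXES:
    answer = ask(build_prompt(tone, question, options))
    print(tone, answer, answer == "B")
```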

Key Findings
- Impolite prompts outperformed polite ones
- Very Rude: 84.8% accuracy
- Very Polite: 80.8% accuracy
- Tone differences statistically significant (see the t-test sketch below)
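Since the significance claim rests on paired comparisons, a minimal sketch of that step is shown below, assuming accuracy is collected per repeated run under each tone and compared with scipy.stats.ttest_rel; the accuracy values are dummy placeholders, not the study's data.

```python
# Paired-sample t-test sketch for two tone conditions.
# The accuracy values below are dummy placeholders, not the study's data.
from scipy import stats

# Accuracy per repeated run, paired by run index (an assumed design detail).
very_polite = [0.80, 0.82, 0.79, 0.81, 0.80, 0.83, 0.81, 0.80, 0.82, 0.80]
very_rude   = [0.85, 0.84, 0.86, 0.84, 0.85, 0.83, 0.85, 0.86, 0.84, 0.86]

t_stat, p_value = stats.ttest_rel(very_rude, very_polite)
print(f"mean accuracy, very rude:   {sum(very_rude) / len(very_rude):.3f}")
print(f"mean accuracy, very polite: {sum(very_polite) / len(very_polite):.3f}")
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
```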

Discussion & Ethics
- Newer LLMs may react differently to tone than older models
- Politeness may register as just another string of words rather than a social cue
- Avoid hostile interfaces, even if rude prompts score higher
- Future work: other models, prompt perplexity (see the sketch below)
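The perplexity item presumably refers to checking whether tone variants differ in prompt perplexity, a possible confound for the accuracy differences. Assuming that reading, the sketch below scores two hypothetical prompts with GPT-2 via Hugging Face transformers; the scoring model and the example prompts are illustrative choices, not anything specified by the paper.

```python
# Sketch: score each tone variant's perplexity with a small open LM (GPT-2 here,
# chosen only for illustration). Lower perplexity means the phrasing is more
# "expected" by the scoring model; this is one candidate confound for tone effects.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Using the input ids as labels yields the average next-token loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return float(torch.exp(loss))

prompts = {
    "very_polite": "Would you kindly answer the following question?",
    "very_rude": "Answer this question. Don't mess it up.",
}
for tone, text in prompts.items():
    print(tone, round(perplexity(text), 2))
```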