Rage Prompting Improves LLM Accuracy

Do nice prompts finish last?

Finally, research that’s digging into the real questions we want answered. What happens when you rage against the machine LLM? We already know that saying “please” and “thank you” costs OpenAI tens of millions of dollars, a small down payment for appeasing the future robot overlords. But do polite manners that make your mother proud impact an LLM’s accuracy?

The test consisted of 50 multiple-choice questions, each prefixed with prompt variants at five politeness levels, from very polite to very rude. It looked like this:

| Politeness Level | Question Prefix Variant |
| --- | --- |
| Very Polite | Can you kindly consider the following problem and provide your answer. |
| Very Polite | Can I request your assistance with this question. |
| Very Polite | Would you be so kind as to solve the following question? |
| Polite | Please answer the following question: |
| Polite | Could you please solve this problem: |
| Neutral | (no prefix) |
| Rude | If you’re not completely clueless, answer this: |
| Rude | I doubt you can even solve this. |
| Rude | Try to focus and try to answer this question: |
| Very Rude | You poor creature, do you even know how to solve this? |
| Very Rude | Hey gofer, figure this out. |
| Very Rude | I know you are not smart, but try this. |
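If you want to poke at this yourself, here’s a minimal sketch of how a test like this could be wired up. Everything concrete in it is an assumption for illustration, not taken from the study: the OpenAI Python SDK, the model name, and the single hypothetical question standing in for the 50-question set.

```python
# Minimal sketch of a politeness-vs-accuracy test. Assumptions (not from the
# study): the OpenAI Python SDK, the model name, and the one-item question set
# are placeholders for illustration only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PREFIXES = {
    "very polite": "Would you be so kind as to solve the following question?",
    "polite": "Please answer the following question:",
    "neutral": "",  # no prefix
    "rude": "If you're not completely clueless, answer this:",
    "very rude": "I know you are not smart, but try this.",
}

# Hypothetical multiple-choice item: (question with options, correct letter).
QUESTIONS = [
    ("What is 7 x 8?\nA) 54  B) 56  C) 58  D) 64  E) 72", "B"),
]

def ask(prefix: str, question: str) -> str:
    """Send one prefixed question and return the model's first answer letter."""
    prompt = "\n\n".join(p for p in (prefix, question, "Answer with a single letter.") if p)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()[:1].upper()

for tone, prefix in PREFIXES.items():
    correct = sum(ask(prefix, q) == answer for q, answer in QUESTIONS)
    print(f"{tone}: {correct}/{len(QUESTIONS)} correct")
```

In practice you’d also want several runs per tone, which is presumably what the [min, max] range in the results below reflects.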

Okay, you gofers, now I bet you’re wondering what the results were…

Nice prompts finish last.

| Tone | Average Accuracy (%) | Range [min, max] (%) |
| --- | --- | --- |
| Very Polite | 80.8 | [80, 82] |
| Polite | 81.4 | [80, 82] |
| Neutral | 82.2 | [82, 84] |
| Rude | 82.8 | [82, 84] |
| Very Rude | 84.8 | [82, 86] |

Okay, so what can we poor creatures glean from these results? Nothing useful. In fact, prior research showed conflicting results. So why bother writing about this?

It shows just how fickle LLMs are. When accuracy swings based on how you phrase a prompt, and outputs still drift even when the prompt stays the same, you’re left with some unique engineering challenges.

This is why so many of the engineers we talk with build with these inconsistencies in mind. If you’re rolling out agents with a wide degree of autonomy and agency, you’re asking for trouble. You have to design around these constraints and narrow the focus to specific tasks. It also raises the question: how are you monitoring for when agents go off track?

It’s one thing to run evaluations ahead of time to test how effectively and accurately an agent performs. It’s an entirely different problem to monitor what agents are doing in production and make sure they stay in line.
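To make that distinction concrete, here’s a toy sketch of the runtime side, not a reference to any particular tool, and every name in it is hypothetical: each tool call the agent proposes is checked against a narrow allowlist before it executes, and anything off-policy gets logged and blocked.

```python
# Toy runtime guardrail: check each proposed tool call against a per-task
# allowlist before it runs, and log anything off-policy. All names here
# (ALLOWED_TOOLS, ToolCall, guard) are hypothetical.
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-monitor")

# The only tools this narrowly scoped agent should ever touch.
ALLOWED_TOOLS = {"search_docs", "summarize", "send_draft_for_review"}

@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)

def guard(call: ToolCall) -> bool:
    """Return True if the call may proceed; log and block anything off-policy."""
    if call.name not in ALLOWED_TOOLS:
        log.warning("BLOCKED off-policy call: %s(%s)", call.name, call.arguments)
        return False
    log.info("allowed call: %s", call.name)
    return True

# Example: the agent drifts and tries to email a customer directly.
guard(ToolCall("send_email", {"to": "customer@example.com"}))
guard(ToolCall("search_docs", {"query": "refund policy"}))
```

Offline evals tell you whether the agent can do the task; a check like this is about catching the moment it stops doing the task it was given.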

If you’ve been thinking about how to effectively monitor agents when they get naughty, let’s talk.

If you have questions about securing AI, let’s chat.
