Email in the Era of LLMs
arXiv:2603.20231v1 Announce Type: new
Abstract: Email communication increasingly involves large language models (LLMs), but we lack intuition on how they will read, write, and optimize for nuanced social goals. We introduce HR Simulator, a game where communication is the core mechanic: players play as a Human Resources officer and write emails to solve socially challenging workplace scenarios. An analysis of 600+ human and LLM emails with LLMs-as-judge reveals evidence for larger LLMs becoming more homogenous in their email quality judgments. Under LLM judges, humans underperform LLMs (e.g., 23.5% vs. 48-54% success rate), but a human+LLM approach can outperform LLM-only (e.g., from 40% to nearly 100% in one scenario). In cases where models’ email preferences disagree, emergent tact is a plausible explanation: weaker models prefer less tactful strategies while stronger models prefer more tactful ones. Regarding tone, LLM emails are more formal and empathetic while human emails are more varied. LLM rewrites make human emails more formal and empathetic, but models still struggle to imitate human emails in the low empathy, low formality quadrant, which highlights a limitation of current post-training approaches. Our results demonstrate the efficacy of communication games as instruments to measure communication in the era of LLMs, and posit human-LLM co-writing as an effective form of communication in that future.