Gemma Needs Help: Investigating and Mitigating Emotional Instability in LLMs
arXiv:2603.10011v1 Announce Type: new Abstract: Large language models can generate responses that resemble emotional distress, and this raises concerns around model reliability and safety. We introduce a set of evaluations to investigate expressions of distress in LLMs, and find that these surface emotional instability in Gemma and Gemini models, but not in other families. We find evidence that this difference arises in post-training. Base models from different families (Gemma, Qwen and OLMo) show similar propensities for expressing distress. […]