From “Help” to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications
arXiv:2602.18443v1
Abstract: Psychosocial online counselling frequently encounters generic subject lines that impede efficient case prioritisation. This study evaluates eleven large language models that generate six-word subject lines for German counselling emails, using a hierarchical assessment: outputs are first categorised and then ranked within each category, which keeps the evaluation manageable. Ratings from nine assessors (counselling professionals and AI systems) are analysed with Krippendorff’s $\alpha$, Spearman’s $\rho$, Pearson’s $r$ and Kendall’s $\tau$. Results reveal performance trade-offs between proprietary services and privacy-preserving open-source alternatives, with German fine-tuning consistently improving performance. The study also addresses critical ethical considerations for deploying AI in mental health, including privacy, bias and accountability.
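The two-stage scheme described in the abstract (categorise first, then rank within each category) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the study's actual data or pipeline: the model names, the category tiers, the within-category ranks and the helper functions are all hypothetical. The sketch flattens each assessor's hierarchical judgement into one overall ranking and then compares two assessors with Spearman's $\rho$ and Kendall's $\tau$, using the tie-free formulas.

```python
from itertools import combinations


def global_ranks(assessments):
    """Flatten (category_tier, within_rank) pairs into ranks 1..n.

    Lower tier and lower within-category rank both mean "better".
    Hypothetical helper; assumes no two models share the same pair.
    """
    ordered = sorted(assessments, key=lambda model: assessments[model])
    return {model: position + 1 for position, model in enumerate(ordered)}


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau (tau-a) for two tie-free rankings of the same items."""
    n = len(ranks_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical judgements: model -> (category tier, rank within that tier).
assessor_1 = {"model_a": (1, 1), "model_b": (1, 2), "model_c": (2, 1),
              "model_d": (2, 2), "model_e": (3, 1)}
assessor_2 = {"model_a": (1, 2), "model_b": (1, 1), "model_c": (2, 1),
              "model_d": (3, 1), "model_e": (2, 2)}

models = sorted(assessor_1)
ranks_1 = [global_ranks(assessor_1)[m] for m in models]
ranks_2 = [global_ranks(assessor_2)[m] for m in models]

print("Spearman rho:", round(spearman_rho(ranks_1, ranks_2), 3))
print("Kendall tau:", round(kendall_tau(ranks_1, ranks_2), 3))
```

Categorising before ranking means an assessor only has to order the few outputs within each tier rather than all outputs at once, which is the manageability benefit the abstract points to. With real ratings that contain ties, a tie-aware variant of each coefficient would be needed instead of these simplified forms.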