Bridging Behavioral and Emotional Intelligence: An Interpretable Multimodal Deep Learning Framework for Customer Lifetime Value Prediction in the Hospitality Industry

Customer Lifetime Value (CLV) prediction is a fundamental challenge in hospitality analytics, supporting revenue management, personalization, and long-term customer relationship strategies. However, existing models rely predominantly on structured behavioral data and overlook the emotional signals embedded in guest narratives. This study proposes an interpretable multimodal deep learning framework that bridges behavioral and emotional intelligence for CLV prediction by integrating structured booking records with unstructured hotel review text. The proposed architecture combines a multilayer perceptron for behavioral feature encoding with a transformer-based language model for textual representation, followed by a cross-modal attention fusion mechanism. Interpretability is provided through SHAP analysis for structured attributes, LIME for local textual explanations, and attention visualization for modality interaction analysis. Experimental evaluation on large-scale hospitality datasets demonstrates that the proposed multimodal framework consistently outperforms traditional machine learning models, unimodal deep learning baselines, and classical ensemble learners, achieving predictive improvements of approximately 15 to 30 percent across evaluation metrics and a notable increase in goodness of fit. The results indicate that emotional intelligence extracted from guest reviews significantly enhances CLV prediction and provides actionable insights for hospitality decision-making, supporting the deployment of transparent, explainable artificial intelligence (XAI) systems for strategic customer value management.
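The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify how the cross-modal attention is wired, so the following assumes one common arrangement: the behavioral embedding acts as the query, the review token embeddings act as keys and values, and the attended text context is concatenated with the behavioral embedding before a linear regression head. All names (`cross_modal_attention`, `predict_clv`, the toy dimensions and random vectors) are hypothetical, the learned projection matrices are omitted for brevity, and the random inputs merely stand in for the outputs of the multilayer perceptron and the transformer encoder.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(behavior_vec, token_vecs):
    """Scaled dot-product attention (assumed design, not the paper's).

    Query: behavioral embedding. Keys and values: review token embeddings.
    Returns the attended text context and the attention weights.
    """
    dim = len(behavior_vec)
    scores = [dot(behavior_vec, tok) / math.sqrt(dim) for tok in token_vecs]
    weights = softmax(scores)
    context = [
        sum(w * tok[i] for w, tok in zip(weights, token_vecs))
        for i in range(dim)
    ]
    return context, weights


def predict_clv(behavior_vec, token_vecs, head_weights, bias=0.0):
    """Concatenate both modalities and apply a linear regression head."""
    context, weights = cross_modal_attention(behavior_vec, token_vecs)
    fused = behavior_vec + context  # list concatenation: [behavior ; text]
    return dot(fused, head_weights) + bias, weights


# Toy inputs: random stand-ins for the encoder outputs (hypothetical sizes).
random.seed(0)
DIM, NUM_TOKENS = 4, 5
behavior = [random.gauss(0, 1) for _ in range(DIM)]
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
head = [random.gauss(0, 0.1) for _ in range(2 * DIM)]

clv_estimate, attention = predict_clv(behavior, tokens, head)
print("CLV estimate:", clv_estimate)
print("Attention over review tokens:", attention)
```

In this sketch, the per-token `attention` weights are the kind of quantity an attention visualization would inspect to see which parts of a review the behavioral profile attends to; the framework's actual fusion layer may differ.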
