Deep Representation Learning for Risk Prediction in Electronic Health Records Using Self-Supervised Methods

This study proposes a risk prediction model based on self-supervised representation learning to address the challenges of label scarcity, structural complexity, and high heterogeneity in Electronic Health Record (EHR) data for risk assessment tasks. The model combines masked reconstruction and context prediction objectives to learn temporal dependencies and latent semantic structure in EHR data without labels, producing robust and generalizable patient health representations. The architecture comprises a feature embedding layer, a temporal encoding module, an attention-based context aggregation module, and a risk decoding layer, which together capture long-term dependencies and model features hierarchically across multidimensional medical information. Experiments on the MIMIC-III dataset show that the proposed method outperforms LSTM, BiLSTM, GRU, Transformer, and GAT baselines in Accuracy, F1-Score, Precision, and AUC, confirming the effectiveness of the self-supervised mechanism for EHR representation and risk assessment. In addition, sensitivity analyses of key hyperparameters, including learning rate, hidden layer dimension, and noise interference level, demonstrate that the model maintains stable performance across training conditions, indicating strong structural robustness and noise-tolerant feature extraction. Overall, the study learns high-quality health representations from unlabeled EHR data and provides an efficient, scalable framework for intelligent medical risk prediction.
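To make the described pipeline concrete, the following is a minimal PyTorch sketch of the four-stage architecture (feature embedding, temporal encoding, attention-based context aggregation, risk decoding) trained with a masked-reconstruction objective. All module names, dimensions, the masking ratio, and the choice of a Transformer encoder as the temporal module are illustrative assumptions, not the authors' released implementation, and only the masked-reconstruction half of the self-supervised objective is shown.

```python
# Minimal sketch, assuming PyTorch; hyperparameters and masking scheme are illustrative.
import torch
import torch.nn as nn

class EHRSelfSupervisedModel(nn.Module):
    def __init__(self, n_features=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # Feature embedding layer: projects per-visit clinical features into d_model.
        self.embed = nn.Linear(n_features, d_model)
        # Temporal encoding module: Transformer encoder over the visit sequence.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Attention-based context aggregation: learned pooling over time steps.
        self.attn_score = nn.Linear(d_model, 1)
        # Heads: masked reconstruction (pretraining) and risk decoding (downstream).
        self.reconstruct = nn.Linear(d_model, n_features)
        self.risk_head = nn.Linear(d_model, 1)

    def forward(self, x, mask=None):
        # x: (batch, time, n_features); mask flags time steps hidden during pretraining.
        h = self.embed(x if mask is None else x * (~mask).unsqueeze(-1))
        h = self.encoder(h)
        w = torch.softmax(self.attn_score(h), dim=1)  # attention weights: (batch, time, 1)
        context = (w * h).sum(dim=1)                  # aggregated patient representation
        return self.reconstruct(h), torch.sigmoid(self.risk_head(context))

# Pretraining step: reconstruct masked visits from unmasked context (no labels needed).
model = EHRSelfSupervisedModel()
x = torch.randn(8, 20, 64)                            # 8 patients, 20 visits each
mask = torch.rand(8, 20) < 0.15                       # hide roughly 15% of visits
recon, risk = model(x, mask)
loss = ((recon - x) ** 2)[mask].mean()                # loss only on masked positions
loss.backward()
```

After pretraining on unlabeled sequences, the risk head can be fine-tuned on whatever labeled examples are available, which is the usual way a self-supervised objective mitigates label scarcity.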
