[R] Seeking Advice: Stalling at 45-50% Accuracy on HMS Brain Activity (EEG Spectrogram) Cross-Subject Classification

I am working on the HMS Harmful Brain Activity Classification task. The goal is to classify 10-minute EEG segments into 6 categories: Seizure, GPD, LRDA, GRDA, LPD, and Other, based on spectrogram representations.

The core challenge I am tackling is Cross-Subject Generalization. While my models perform exceptionally well (85%+) when training and testing on the same patients, the performance drops significantly to a 65-70% plateau when evaluated on “unseen” patients (Subject-Wise Split). This suggests the model is over-relying on “patient fingerprints” (baseline EEG power, hardware artifacts, skull morphology) rather than universal medical pathology.

Data Setup:

• Input: 4-channel spectrograms (LL, RL, LP, RP) converted to 3-channel RGB images using a JET colormap.

• Normalization: Log-transformation followed by Spectral Z-score normalization (per frequency band).

• Validation Strategy: StratifiedGroupKFold based on patient_id to ensure no patient leakage.
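For concreteness, a simplified version of my normalization step (the function name, `eps`, and array shapes are illustrative, not my exact pipeline):

```python
import numpy as np

def normalize_spectrogram(spec, eps=1e-6):
    """Log-transform, then z-score each frequency bin across time.

    spec: array of shape (freq_bins, time_steps), raw power values.
    """
    log_spec = np.log(spec + eps)                 # compress dynamic range
    mu = log_spec.mean(axis=1, keepdims=True)     # per-frequency mean
    sigma = log_spec.std(axis=1, keepdims=True)   # per-frequency std
    return (log_spec - mu) / (sigma + eps)

# dummy 100-bin x 300-frame spectrogram
spec = np.random.rand(100, 300) + 0.1
z = normalize_spectrogram(spec)
```

Each of the four montage spectrograms is normalized independently before being stacked into the RGB image.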

Approaches Attempted & Results:

  1. Prototypical Few-Shot Learning (FSL)

• Concept: Instead of standard classification, I used a ProtoNet with a ConvNeXt-Tiny backbone to learn a metric space in which each class clusters around a prototype (centroid).

• Why it was used: To force the model to learn the “similarity” of a seizure across different brains rather than a hard-coded mapping.

• Result: Reached ~68% accuracy. High ROC-AUC (>0.82), but raw accuracy stayed low. It seems the “prototypes” (centroids) shift too much between different patients.
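A minimal sketch of the episodic loss I'm describing (the toy episode sizes and embedding dimension are illustrative, not my actual training code):

```python
import torch
import torch.nn.functional as F

def proto_loss(support_emb, support_y, query_emb, query_y, n_classes):
    """Prototypical loss: classify queries by distance to class centroids.

    support_emb: (n_support, d), query_emb: (n_query, d)
    """
    protos = torch.stack([support_emb[support_y == c].mean(0)
                          for c in range(n_classes)])   # (n_classes, d)
    dists = torch.cdist(query_emb, protos)              # (n_query, n_classes)
    return F.cross_entropy(-dists, query_y)             # softmax over -distance

# toy episode: 6 classes, 5 support + 5 query each, 128-d embeddings
emb = torch.randn(60, 128)
y = torch.arange(6).repeat_interleave(10)
loss = proto_loss(emb[::2], y[::2], emb[1::2], y[1::2], 6)
```

The centroid-shift problem shows up exactly here: `protos` computed from one patient's support set doesn't line up with another patient's query embeddings.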

  2. Domain Adversarial Neural Networks (DANN) / Patient-Agnostic Training

• Concept: Added an adversarial head with a Gradient Reversal Layer (GRL). The model has two tasks: 1) Classify the disease, and 2) Fail to identify the patient.

• Why it was used: To mathematically “scrub” the patient-specific features from the latent space, forcing the backbone to become patient-agnostic.

• Result: Improved generalization stability, but accuracy is still stuck in the high 60s. The adversarial head’s accuracy is low (good sign), but the diagnostic head isn’t pushing further.
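The GRL itself is tiny; roughly what mine looks like (`feat_dim`, `n_patients`, and the fixed `lambd` are placeholders — I actually anneal lambda during training):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class PatientHead(nn.Module):
    """Patient-discriminator head fed through the GRL."""
    def __init__(self, feat_dim, n_patients, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.clf = nn.Linear(feat_dim, n_patients)

    def forward(self, feats):
        return self.clf(GradReverse.apply(feats, self.lambd))
```

The diagnostic head sees the same `feats` without reversal, so the backbone gets pushed toward features that classify the pathology but confuse the patient discriminator.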

  3. Advanced Backbone Fine-Tuning (ResNet-50 & ConvNeXt)

• Concept: Switched from EfficientNet to ResNet-50 and ConvNeXt-Tiny using phased fine-tuning (frozen backbone first, then discriminative learning rates).

• Why it was used: To see if a deeper residual structure (ResNet) or a more global receptive field (ConvNeXt) could capture rhythmic harmonics better.

• Result: ConvNeXt performed the best, but the gap between training and cross-subject validation remains wide.
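The phased schedule, sketched on a toy stand-in backbone (the real one is a pretrained ConvNeXt-Tiny; the layer sizes and learning rates here are illustrative):

```python
import torch
from torch import nn, optim

# toy stand-in for a pretrained backbone + new classification head
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 6)  # 6 HMS classes
model = nn.Sequential(backbone, head)

# Phase 1: frozen backbone, train only the head
for p in backbone.parameters():
    p.requires_grad = False
phase1_opt = optim.AdamW(head.parameters(), lr=1e-3)

# Phase 2: unfreeze with discriminative learning rates
for p in backbone.parameters():
    p.requires_grad = True
phase2_opt = optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},  # pretrained weights: small steps
    {"params": head.parameters(),     "lr": 1e-3},  # new head: larger steps
])
```

The idea is that ImageNet-pretrained early layers barely move while the head adapts to the spectrogram domain.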

  4. Handling Data Imbalance (Weighted Sampling vs. Oversampling)

• Concept: Replaced duplicating minority classes (oversampling) with a WeightedRandomSampler and added LabelSmoothingLoss(0.15).

• Why it was used: To prevent the model from memorizing duplicates of minority samples and to account for expert disagreement in medical labels.

• Result: Reduced overfitting significantly, but the validation accuracy didn’t “break through” to the 75%+ target.
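Roughly what the sampling/loss setup looks like (toy class counts; here I use PyTorch's built-in `label_smoothing` argument rather than my custom `LabelSmoothingLoss` class):

```python
import torch
from torch.utils.data import WeightedRandomSampler, DataLoader, TensorDataset

labels = torch.tensor([0] * 500 + [1] * 50 + [2] * 20)  # toy imbalanced labels
counts = torch.bincount(labels).float()
weights = (1.0 / counts)[labels]   # per-sample weight ∝ 1 / class frequency
sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                replacement=True)

ds = TensorDataset(torch.randn(len(labels), 8), labels)
loader = DataLoader(ds, batch_size=64, sampler=sampler)

# label smoothing via the built-in cross-entropy argument
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.15)
```

With `replacement=True` the minority classes get resampled on the fly each epoch instead of being physically duplicated in the dataset.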

What I’ve Observed:

  1. The Accuracy-AUC Gap: My ROC-AUC is often quite high (0.80-0.85), but raw accuracy is 10-15% lower. The model ranks the correct class highly, but the argmax often lands on a different class, so top-1 accuracy lags the ranking quality.

  2. Spectral Signatures: The model seems to pick up on the “loudness” (power) of certain frequencies that are patient-specific rather than the rhythmic spikes that are disease-specific.

  3. Complexity: Simplifying the model (ResNet-18) helps with stability but lacks the capacity to distinguish between subtle classes like LPD vs. LRDA.

Has anyone successfully bridged the gap between within-subject and cross-subject performance on EEG data? Should I be looking into Self-Supervised Pre-training (MAE), or is there a specific Signal Processing Inductive Bias I am missing?

Any advice on how to force the model to ignore the “patient fingerprint” more effectively would be greatly appreciated!

submitted by /u/Sure-Key-4300
