[R] Seeking Advice: Stalling at 45-50% Accuracy on HMS Brain Activity (EEG Spectrogram) Cross-Subject Classification

I am working on the HMS Harmful Brain Activity Classification task. The goal is to classify 10-minute EEG segments into 6 categories: Seizure, GPD, LRDA, GRDA, LPD, and Other, based on spectrogram representations.

The core challenge I am tackling is Cross-Subject Generalization. While my models perform exceptionally well (85%+) when training and testing on the same patients, the performance drops significantly to a 65-70% plateau when evaluated on “unseen” patients (Subject-Wise Split). This suggests the model is over-relying on “patient fingerprints” (baseline EEG power, hardware artifacts, skull morphology) rather than universal medical pathology.

Data Setup:

• Input: 4-channel spectrograms (LL, RL, LP, RP) converted to 3-channel RGB images using a JET colormap.

• Normalization: Log-transformation followed by Spectral Z-score normalization (per frequency band).

• Validation Strategy: StratifiedGroupKFold based on patient_id to ensure no patient leakage.
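For concreteness, a simplified version of my normalization step (the function name, `eps`, and array shapes are illustrative, not my exact pipeline):

```python
import numpy as np

def normalize_spectrogram(spec, eps=1e-6):
    """Log-transform, then z-score each frequency bin across time.

    spec: array of shape (freq_bins, time_steps), raw power values.
    """
    log_spec = np.log(spec + eps)                 # compress dynamic range
    mu = log_spec.mean(axis=1, keepdims=True)     # per-frequency mean
    sigma = log_spec.std(axis=1, keepdims=True)   # per-frequency std
    return (log_spec - mu) / (sigma + eps)

# dummy 100-bin x 300-frame spectrogram
spec = np.random.rand(100, 300) + 0.1
z = normalize_spectrogram(spec)
```

Each of the four montage spectrograms is normalized independently before being stacked into the RGB image.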

Approaches Attempted & Results:

  1. Prototypical Few-Shot Learning (FSL)

• Concept: Instead of standard classification, I used a ProtoNet with a ConvNeXt-Tiny backbone to learn a metric space in which each class clusters around a prototype (centroid).

• Why it was used: To force the model to learn the “similarity” of a seizure across different brains rather than a hard-coded mapping.

• Result: Reached ~68% accuracy. High ROC-AUC (>0.82), but raw accuracy stayed low. It seems the “prototypes” (centroids) shift too much between different patients.
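A minimal sketch of the episodic loss I'm describing (the toy episode sizes and embedding dimension are illustrative, not my actual training code):

```python
import torch
import torch.nn.functional as F

def proto_loss(support_emb, support_y, query_emb, query_y, n_classes):
    """Prototypical loss: classify queries by distance to class centroids.

    support_emb: (n_support, d), query_emb: (n_query, d)
    """
    protos = torch.stack([support_emb[support_y == c].mean(0)
                          for c in range(n_classes)])   # (n_classes, d)
    dists = torch.cdist(query_emb, protos)              # (n_query, n_classes)
    return F.cross_entropy(-dists, query_y)             # softmax over -distance

# toy episode: 6 classes, 5 support + 5 query each, 128-d embeddings
emb = torch.randn(60, 128)
y = torch.arange(6).repeat_interleave(10)
loss = proto_loss(emb[::2], y[::2], emb[1::2], y[1::2], 6)
```

The centroid-shift problem shows up exactly here: `protos` computed from one patient's support set doesn't line up with another patient's query embeddings.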

  2. Domain Adversarial Neural Networks (DANN) / Patient-Agnostic Training

• Concept: Added an adversarial head with a Gradient Reversal Layer (GRL). The model has two tasks: 1) Classify the disease, and 2) Fail to identify the patient.

• Why it was used: To mathematically “scrub” the patient-specific features from the latent space, forcing the backbone to become patient-agnostic.

• Result: Improved generalization stability, but accuracy is still stuck in the high 60s. The adversarial head’s accuracy is low (good sign), but the diagnostic head isn’t pushing further.
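The GRL itself is tiny; roughly what mine looks like (`feat_dim`, `n_patients`, and the fixed `lambd` are placeholders — I actually anneal lambda during training):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class PatientHead(nn.Module):
    """Patient-discriminator head fed through the GRL."""
    def __init__(self, feat_dim, n_patients, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.clf = nn.Linear(feat_dim, n_patients)

    def forward(self, feats):
        return self.clf(GradReverse.apply(feats, self.lambd))
```

The diagnostic head sees the same `feats` without reversal, so the backbone gets pushed toward features that classify the pathology but confuse the patient discriminator.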

  3. Advanced Backbone Fine-Tuning (ResNet-50 & ConvNeXt)

• Concept: Switched from EfficientNet to ResNet-50 and ConvNeXt-Tiny using phased fine-tuning (frozen backbone first, then discriminative learning rates).

• Why it was used: To see if a deeper residual structure (ResNet) or a more global receptive field (ConvNeXt) could capture rhythmic harmonics better.

• Result: ConvNeXt performed the best, but the gap between training and cross-subject validation remains wide.
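The phased schedule, sketched on a toy stand-in backbone (the real one is a pretrained ConvNeXt-Tiny; the layer sizes and learning rates here are illustrative):

```python
import torch
from torch import nn, optim

# toy stand-in for a pretrained backbone + new classification head
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 6)  # 6 HMS classes
model = nn.Sequential(backbone, head)

# Phase 1: frozen backbone, train only the head
for p in backbone.parameters():
    p.requires_grad = False
phase1_opt = optim.AdamW(head.parameters(), lr=1e-3)

# Phase 2: unfreeze with discriminative learning rates
for p in backbone.parameters():
    p.requires_grad = True
phase2_opt = optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},  # pretrained weights: small steps
    {"params": head.parameters(),     "lr": 1e-3},  # new head: larger steps
])
```

The idea is that ImageNet-pretrained early layers barely move while the head adapts to the spectrogram domain.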

  4. Handling Data Imbalance (Weighted Sampling vs. Oversampling)

• Concept: Replaced duplicating minority classes (oversampling) with a WeightedRandomSampler and added LabelSmoothingLoss(0.15).

• Why it was used: To prevent the model from memorizing duplicates of minority samples and to account for expert disagreement in medical labels.

• Result: Reduced overfitting significantly, but the validation accuracy didn’t “break through” to the 75%+ target.
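Roughly what the sampling/loss setup looks like (toy class counts; here I use PyTorch's built-in `label_smoothing` argument rather than my custom `LabelSmoothingLoss` class):

```python
import torch
from torch.utils.data import WeightedRandomSampler, DataLoader, TensorDataset

labels = torch.tensor([0] * 500 + [1] * 50 + [2] * 20)  # toy imbalanced labels
counts = torch.bincount(labels).float()
weights = (1.0 / counts)[labels]   # per-sample weight ∝ 1 / class frequency
sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                replacement=True)

ds = TensorDataset(torch.randn(len(labels), 8), labels)
loader = DataLoader(ds, batch_size=64, sampler=sampler)

# label smoothing via the built-in cross-entropy argument
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.15)
```

With `replacement=True` the minority classes get resampled on the fly each epoch instead of being physically duplicated in the dataset.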

What I’ve Observed:

  1. The Accuracy-AUC Gap: My ROC-AUC is often quite high (0.80-0.85), but raw accuracy is 10-15% lower. The model ranks the correct class highly, but the argmax often lands on a different class, so top-1 accuracy lags the ranking quality.

  2. Spectral Signatures: The model seems to pick up on the “loudness” (power) of certain frequencies that are patient-specific rather than the rhythmic spikes that are disease-specific.

  3. Complexity: Simplifying the model (ResNet-18) helps with stability but lacks the capacity to distinguish between subtle classes like LPD vs. LRDA.

Has anyone successfully bridged the gap between within-subject and cross-subject performance on EEG data? Should I be looking into Self-Supervised Pre-training (MAE), or is there a specific Signal Processing Inductive Bias I am missing?

Any advice on how to force the model to ignore the “patient fingerprint” more effectively would be greatly appreciated!

submitted by /u/Sure-Key-4300
