Graph-Contrastive Pretraining for Payload-Free Encrypted-Traffic Intrusion Detection: Cross-Dataset OOD Transfer with Frozen Artifacts
Encrypted transport increasingly limits the visibility required by intrusion detection systems (IDS), motivating payload-free learning from flow statistics and protocol metadata. We introduce GCP, a graph-contrastive pretraining framework that casts flows as nodes in a sparse graph and learns transferable node embeddings via an InfoNCE-style objective with graph-specific augmentations. The learned encoder is evaluated through frozen-embedding linear probing and cross-dataset out-of-domain (OOD) transfer, within a fully scripted pipeline that freezes run manifests and artifacts to make every reported number traceable and reproducible. Experiments cover enterprise IDS and encrypted DNS/DoH traffic using CICIDS2017, UNSW-NB15, and DoH-Combined at three label granularities (L1/L2/L3), for both binary detection (y) and finer-grained targets (ymulti), aggregated over five fixed split seeds with 95% confidence intervals. Results show that GCP yields a pronounced in-domain advantage on UNSW-NB15 for y (Macro-F1 ≈0.993) while substantially reducing false-alarm rate (FAR ≈0.013) compared with strong tabular baselines. In feature-separable regimes (CICIDS2017 and DoH L1/L2), boosted-tree and supervised baselines remain difficult to surpass, but ablations confirm that graph structure alone is insufficient without contrastive pretraining. OOD transfer is strongly source–target dependent, with the most reliable transfer within closely related DoH domains, highlighting dataset shift as a first-class evaluation criterion for encrypted-traffic IDS.
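The InfoNCE-style objective mentioned above can be illustrated with a minimal sketch. This is a hypothetical, simplified version for intuition only, not the paper's implementation: the function name `info_nce`, the temperature `tau`, and the use of plain Python lists for unit-normalised node embeddings are all assumptions. Two augmented views of the same flow node form the positive pair; all other cross-view pairs serve as negatives.

```python
import math

def info_nce(z1, z2, tau=0.1):
    """Toy InfoNCE loss over two augmented views of node embeddings.

    z1, z2: lists of unit-normalised embedding vectors (one per node);
    matched indices are positives, all other pairs negatives.
    Hypothetical sketch -- not the GCP implementation from the paper.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n = len(z1)
    loss = 0.0
    for i in range(n):
        # Cosine similarities (vectors assumed unit-norm) scaled by temperature.
        logits = [dot(z1[i], z2[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the positive pair (index i).
        loss += -(logits[i] - log_denom)
    return loss / n
```

As a sanity check, perfectly aligned views of orthogonal embeddings yield a near-zero loss, while shuffling one view (breaking the positive pairing) drives the loss up sharply.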