Hybrid Deep Architectures in Contrastive Latent Space: Performance Analysis of VAE-MLP, VAE-MoTE, and VAE-GAT for IoT Botnet Detection

The rapid proliferation of Internet of Things (IoT) devices has significantly expanded the attack surface of modern networks, leading to a surge in IoT-based botnet attacks. Detecting such attacks remains challenging due to the high dimensionality and heterogeneity of IoT network traffic. This study proposes and evaluates three hybrid deep learning architectures for IoT botnet detection that combine representation learning with supervised classification: VAE-encoder-MLP, VAE-encoder-GAT, and VAE-encoder-MoTE. A variational autoencoder (VAE) is first trained to learn a compact latent representation of high-dimensional traffic features, after which the pretrained encoder projects the data into a low-dimensional embedding space. These embeddings are then used to train three different downstream classifiers: a multilayer perceptron (MLP), a graph attention network (GAT), and a mixture of tiny experts (MoTE) model. To further enhance representation discriminability, supervised contrastive learning is incorporated to encourage intra-class compactness and inter-class separability in the latent space. The proposed architectures are evaluated on two widely used benchmark datasets, CICIoT2022 and N-BaIoT, under both binary and multiclass classification settings. Experimental results demonstrate that all three models achieve near-perfect performance in binary attack detection, with accuracy exceeding 99.8%. In the more challenging multiclass scenario, the VAE-encoder-MLP model achieves the best overall performance, reaching accuracies of 98.55% on CICIoT2022 and 99.75% on N-BaIoT. These findings provide insights into the design of efficient and scalable deep learning architectures for IoT intrusion detection.
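The abstract invokes supervised contrastive learning to encourage intra-class compactness and inter-class separability in the VAE latent space. As a minimal illustration only (the paper's actual implementation is not shown here), the NumPy sketch below computes the standard supervised contrastive (SupCon) loss of Khosla et al. (2020) on a toy batch; the function name `supcon_loss`, the temperature value, and the synthetic two-cluster data standing in for benign vs. botnet embeddings are all illustrative assumptions.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss over a batch of embeddings z of shape (n, d)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project embeddings to the unit sphere
    sim = (z @ z.T) / temperature                      # temperature-scaled cosine similarities
    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # exclude self-pairs from numerator and denominator
    # Row-wise log-softmax: the denominator runs over all other samples in the batch.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # Negated mean log-probability of same-class pairs, averaged over anchors.
    per_anchor = -np.where(pos_mask, log_prob, 0.0).sum(axis=1) / pos_mask.sum(axis=1)
    return per_anchor.mean()

# Toy check: two tight, well-separated clusters (hypothetical benign vs. botnet
# embeddings) should score a much lower loss than the same points with shuffled labels.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal([5.0, 0.0], 0.1, (8, 2)),
               rng.normal([-5.0, 0.0], 0.1, (8, 2))])
y_true = [0] * 8 + [1] * 8
y_shuffled = [0, 1] * 8
loss_separated = supcon_loss(z, y_true)
loss_mixed = supcon_loss(z, y_shuffled)
```

Minimizing this objective pulls same-class embeddings together and pushes different-class embeddings apart, which is the intra-class compactness / inter-class separability property the abstract attributes to the learned latent space.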
