EpitopeGNN: A Graph Neural Network for Influenza A Virus Hemagglutinin Subtype Classification Based on 3D Structure

Hemagglutinin (HA) is the primary surface protein of the influenza A virus and determines its subtype and antigenic properties. Traditional subtype classification relies on nucleotide or amino acid sequence analysis, which does not account for the spatial folding of the protein. In this work, we propose EpitopeGNN, a graph neural network (GNN) that constructs a residue interaction network (RIN) from the 3D structure of HA and classifies the virus subtype. The model was trained on 249 structures from the Protein Data Bank (PDB) covering H1N1, H3N2, H5N1, and other subtypes. Using physicochemical properties of amino acids and topological centrality measures as node features, we achieved 100% classification accuracy on the test set and 97.6% under five-fold cross-validation. The resulting structural embeddings correlate moderately but significantly with phylogenetic distances (r = 0.48, p < 0.001), which supports their biological relevance and opens opportunities for structural monitoring of virus evolution and for rapid searches for structural analogs of novel strains.
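
To illustrate how a residue interaction network with physicochemical and topological node features might be assembled, here is a minimal Python sketch. It is not the authors' implementation: the 8 Å Cα distance cutoff, the choice of Kyte-Doolittle hydropathy as the physicochemical property, degree centrality as the topological measure, and the function names `build_rin` and `node_features` are all assumptions made for this example, and the four-residue toy input stands in for coordinates that would normally be parsed from a PDB file.

```python
import math

# Kyte-Doolittle hydropathy index, used here as one example of a
# physicochemical residue property (assumed choice, not from the paper).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def build_rin(coords, cutoff=8.0):
    """Connect residues whose C-alpha atoms lie within `cutoff` angstroms.

    Returns an adjacency mapping: residue index -> set of neighbor indices.
    The 8 angstrom cutoff is an assumed, commonly used contact threshold.
    """
    n = len(coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def node_features(residues, adjacency):
    """Return one (hydropathy, degree centrality) pair per residue."""
    n = len(residues)
    feats = []
    for i, aa in enumerate(residues):
        # Degree centrality: fraction of the other residues in contact.
        centrality = len(adjacency[i]) / (n - 1) if n > 1 else 0.0
        feats.append((KYTE_DOOLITTLE[aa], centrality))
    return feats


# Toy input: one-letter residue codes and C-alpha coordinates in angstroms.
residues = ["G", "I", "K", "D"]
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]

adjacency = build_rin(coords)
features = node_features(residues, adjacency)
```

In this toy case the first three residues are mutually in contact and the fourth is isolated, so its degree centrality is zero. In a GNN pipeline, the adjacency mapping would supply the edge list and the feature tuples would form the rows of the node feature matrix.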
