Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data

arXiv:2604.06230v1 Announce Type: new
Abstract: The reuse of atomistic simulation data is often limited by heterogeneous formats, incomplete metadata, and a lack of standardized representations of workflows and provenance. Here we present an ontology-based infrastructure for representing and integrating atomistic simulation data as a knowledge graph. The approach combines domain ontologies with a software framework that enables data capture both from existing datasets and directly from simulation workflows at the point of generation. Heterogeneous data from multiple sources are normalized into a common, ontology-aligned representation, enabling consistent querying and analysis across datasets. We demonstrate these capabilities through the integration of grain boundary data, cross-dataset analysis of material properties, and extraction of derived thermodynamic quantities from existing simulations. In addition, workflows are represented in a machine-readable form, enabling both forward provenance tracking and partial reconstruction of computational procedures. The resulting knowledge graph contains over 750,000 triples describing nearly 8,000 computational samples. This work provides a practical framework for improving the findability, interoperability, and reuse of atomistic simulation data.
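The normalization and cross-dataset querying described in the abstract can be sketched in a few lines. This is a minimal illustration and not the paper's framework: the `ex:` vocabulary, the two source record layouts, the function names, and all numeric values are hypothetical placeholders. It shows only how records with different field names and units can be mapped onto one triple representation and then queried uniformly.

```python
from typing import Iterable

# A triple is (subject, predicate, object). The "ex:" terms below are
# hypothetical placeholders, not terms from the ontologies used in the paper.
Triple = tuple[str, str, object]


def from_source_a(rec: dict) -> list[Triple]:
    """Assumed source A: energy already in J/m^2 under the key 'gb_energy'."""
    s = f"ex:sample/a-{rec['id']}"
    return [
        (s, "rdf:type", "ex:ComputationalSample"),
        (s, "ex:hasElement", rec["element"]),
        (s, "ex:gbEnergyJPerM2", rec["gb_energy"]),
    ]


def from_source_b(rec: dict) -> list[Triple]:
    """Assumed source B: different key names, energy in mJ/m^2."""
    s = f"ex:sample/b-{rec['name']}"
    return [
        (s, "rdf:type", "ex:ComputationalSample"),
        (s, "ex:hasElement", rec["species"]),
        # Convert to the common unit so both sources are comparable.
        (s, "ex:gbEnergyJPerM2", rec["E_gb_mJ_m2"] / 1000.0),
    ]


def query_energies(graph: Iterable[Triple], element: str) -> dict:
    """Return {sample: energy in J/m^2} for all samples of one element."""
    triples = list(graph)
    subjects = {s for s, p, o in triples if p == "ex:hasElement" and o == element}
    return {
        s: o
        for s, p, o in triples
        if p == "ex:gbEnergyJPerM2" and s in subjects
    }


# Dummy records with made-up values, for illustration only.
graph: list[Triple] = []
graph += from_source_a({"id": 1, "element": "Fe", "gb_energy": 1.1})
graph += from_source_b({"name": "x7", "species": "Fe", "E_gb_mJ_m2": 950.0})
graph += from_source_b({"name": "x8", "species": "Cu", "E_gb_mJ_m2": 600.0})

fe_energies = query_energies(graph, "Fe")
for sample, energy in sorted(fe_energies.items()):
    print(sample, energy)
```

A real deployment would more likely store such triples in an RDF store and query them with SPARQL; plain tuples are used here only to keep the sketch free of dependencies.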
