Federated Item Response Models: A Gradient-driven Privacy-preserving Framework for Distributed Psychometric Estimation

arXiv:2506.21744v2 Announce Type: replace-cross
Abstract: Item Response Theory (IRT) models are widely used to estimate respondents’ latent abilities and calibrate item difficulty. Traditional IRT estimation typically requires centralizing all raw responses, raising privacy and governance concerns. We introduce Federated Item Response Theory (FedIRT), a framework that enables distributed calibration of standard IRT models without transferring individual-level data, thereby preserving confidentiality while retaining statistical efficiency.
To provide formal protection, we further develop FedIRT-DP, a user-level differentially private extension. Each site computes per-student gradients, clips them to a fixed norm, and shares only masked sums; the server adds calibrated Gaussian noise and performs MAP updates. This yields an auditable $(\varepsilon,\delta)$ guarantee at the student level and a single, tunable privacy-utility trade-off via the clipping bound and noise scale. The same mechanism improves robustness to extreme response rows (e.g., all-zeros/ones).
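The clip-then-mask-then-noise pipeline described above can be sketched as follows. This is an illustrative Python sketch, not the released R package's API: the function names, the pairwise-mask scheme (one site adds a mask, another subtracts it, so masks cancel in the server's sum), and the noise calibration `sigma = noise_multiplier * clip_norm` are assumptions for exposition.

```python
import numpy as np

def clip_per_student(grads, clip_norm):
    # Scale each row (one student's gradient) to L2 norm <= clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * scale

def site_message(grads, clip_norm, mask):
    # A site shares only the masked sum of clipped per-student gradients;
    # no individual-level gradient leaves the site.
    return clip_per_student(grads, clip_norm).sum(axis=0) + mask

def server_aggregate(messages, clip_norm, noise_multiplier, rng):
    # Pairwise masks cancel when the site messages are summed; the server
    # then adds Gaussian noise calibrated to the clipping bound.
    total = np.sum(messages, axis=0)
    sigma = noise_multiplier * clip_norm
    return total + rng.normal(0.0, sigma, size=total.shape)

# Two-site example: masks +m and -m cancel at the server.
rng = np.random.default_rng(0)
m = np.array([7.0, -2.0])
site1 = site_message(np.array([[3.0, 4.0], [0.1, 0.0]]), 1.0, m)
site2 = site_message(np.array([[0.0, 1.0]]), 1.0, -m)
agg = server_aggregate([site1, site2], 1.0, 0.0, rng)
```

With `noise_multiplier = 0` the aggregate is exactly the sum of clipped per-student gradients: the `[3, 4]` row (norm 5) is scaled to `[0.6, 0.8]`, so the result is `[0.7, 1.8]`. Because each student contributes at most `clip_norm` in L2 norm to the sum, the Gaussian noise yields a user-level (per-student) privacy guarantee rather than a per-record one.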
Across simulations, FedIRT matches the accuracy of centralized estimators from popular $\texttt{R}$ packages while avoiding data pooling; FedIRT-DP achieves comparable accuracy under stronger privacy and exhibits superior robustness to contamination. An empirical study on a real exam dataset demonstrates practical viability and consistent item and site-effect estimates. To facilitate adoption, we release an open-source $\texttt{R}$ package, $\texttt{FedIRT}$, implementing the two-parameter logistic (2PL) and partial credit models (PCM) with federated and differentially private training.
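For the 2PL model mentioned above, the per-student gradient that a site would compute in gradient-based calibration can be written down directly. This is a minimal Python sketch for exposition (the released package is in R, and these helper names are hypothetical): under the 2PL model, a correct response has probability $\sigma(a(\theta - b))$ for discrimination $a$, difficulty $b$, and ability $\theta$, and the log-likelihood gradient with respect to $(a, b)$ follows from the chain rule.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def twopl_prob(theta, a, b):
    # 2PL: P(correct) = sigmoid(a * (theta - b)).
    return sigmoid(a * (theta - b))

def per_student_item_grad(y, theta, a, b):
    # Gradient of the binary log-likelihood wrt item parameters (a, b)
    # for one student: with z = a*(theta - b), dL/dz = y - p, so
    # dL/da = (y - p)*(theta - b) and dL/db = (y - p)*(-a).
    p = twopl_prob(theta, a, b)
    resid = y - p
    return np.array([resid * (theta - b), resid * (-a)])
```

These per-student gradients are exactly the quantities that get clipped and summed in the federated protocol, so each site only ever transmits an aggregate of them rather than raw responses.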
