Optimized Disaster Recovery for Distributed Storage Systems: Lightweight Metadata Architectures to Overcome Cryptographic Hashing Bottleneck
arXiv:2602.22237v1 Announce Type: new
Abstract: Distributed storage architectures are foundational to modern cloud-native infrastructure, yet a critical operational bottleneck persists within disaster recovery (DR) workflows: the dependence on content-based cryptographic hashing for data identification and synchronization. While hash-based deduplication is effective for storage efficiency in steady-state operation, it becomes a systemic liability during failover and failback events when hash indexes are stale, incomplete, or must be rebuilt following a crash. This paper precisely characterizes the operational conditions under which full or partial re-hashing becomes unavoidable. The paper also analyzes the downstream impact of cryptographic re-hashing on Recovery Time Objective (RTO) compliance, and proposes a generalized architectural shift toward deterministic, metadata-driven identification. The proposed framework assigns globally unique composite identifiers to data blocks at ingestion time, independent of content analysis, enabling instantaneous delta computation during DR without any cryptographic overhead.
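
The idea of metadata-driven identification can be illustrated with a minimal sketch. The abstract does not specify how the composite identifier is built, so the fields used here (site_id, volume_id, and a per-volume sequence number) and the names BlockId, Ingestor, and compute_delta are assumptions chosen for illustration, not the paper's design. The sketch shows only the general principle: identifiers are assigned at ingestion without reading the payload, and the DR delta reduces to a set difference over identifiers.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Set


@dataclass(frozen=True, order=True)
class BlockId:
    """Hypothetical composite identifier; the field choice is an assumption."""
    site_id: str
    volume_id: str
    sequence: int


class Ingestor:
    """Assigns identifiers at ingestion time without inspecting block content."""

    def __init__(self, site_id: str, volume_id: str) -> None:
        self.site_id = site_id
        self.volume_id = volume_id
        self._sequence = count(1)  # monotonically increasing per volume

    def ingest(self, payload: bytes) -> BlockId:
        # The payload is stored elsewhere; it is never hashed or read here.
        return BlockId(self.site_id, self.volume_id, next(self._sequence))


def compute_delta(primary: Set[BlockId], replica: Set[BlockId]) -> List[BlockId]:
    """Return the blocks present on the primary but missing on the replica."""
    return sorted(primary - replica)


if __name__ == "__main__":
    ingestor = Ingestor("site-a", "vol-1")
    primary_ids = [ingestor.ingest(b"identical payload") for _ in range(5)]
    replica_ids = set(primary_ids[:3])  # replica lagged behind before failover
    for block_id in compute_delta(set(primary_ids), replica_ids):
        print(block_id)
```

Because identifiers in this sketch do not depend on content, two blocks with identical bytes receive distinct identifiers, so it gives up content-based deduplication in exchange for a delta computation that needs no hash index.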