Stochastic Indexing Primitives for Non-Deterministic Molecular Archives

arXiv:2601.20921v1 Announce Type: new
Abstract: Random access remains a central bottleneck in DNA-based data storage. Existing systems typically retrieve records by PCR enrichment or other multi-step biochemical procedures, which do not naturally support fast, massively parallel, content-addressable queries.
We introduce the Holographic Bloom Filter (HBF), a probabilistic indexing primitive that stores key-pointer associations as a single high-dimensional memory vector. HBF binds a key vector and a value (pointer) vector using circular convolution and superposes bindings across all records. A query decodes by correlating the memory with the query key and selecting the best matching value using a margin-based decision rule.
We give construction and decoding algorithms and a probabilistic analysis under explicit noise models (memory corruption and query/key mismatches). The analysis provides concentration bounds for match and non-match score distributions, explicit threshold and margin settings for a top-K decoder, and exponential error decay in the vector dimension under standard randomness assumptions.
HBF offers a concrete, analyzable alternative to pointer-chasing molecular data structures, enabling one-shot associative retrieval while quantifying trade-offs among dimensionality, dataset size, and noise.
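
The binding, superposition, and margin-based decoding steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the choice of random ±1/√d vectors, the use of circular correlation as the approximate inverse of circular convolution, and the specific "best minus runner-up" margin rule are all assumptions made here for illustration. The direct O(d²) loops keep the example dependency-free; an FFT would normally be used for large dimensions.

```python
import random


def random_vector(d, rng):
    # Assumption: entries are +/- 1/sqrt(d), so every vector has unit norm.
    s = d ** -0.5
    return [s if rng.random() < 0.5 else -s for _ in range(d)]


def circular_convolve(a, b):
    # Binding: out[i] = sum_j a[j] * b[(i - j) mod d].
    d = len(a)
    return [sum(a[j] * b[(i - j) % d] for j in range(d)) for i in range(d)]


def circular_correlate(a, m):
    # Approximate unbinding: out[i] = sum_j a[j] * m[(i + j) mod d].
    d = len(a)
    return [sum(a[j] * m[(i + j) % d] for j in range(d)) for i in range(d)]


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def build_memory(keys, values):
    # Superpose the key-value bindings of all records into one vector.
    memory = [0.0] * len(keys[0])
    for k, v in zip(keys, values):
        for i, x in enumerate(circular_convolve(k, v)):
            memory[i] += x
    return memory


def query(memory, key, codebook, margin):
    # Correlate the memory with the query key, score every candidate value,
    # and accept the best one only if it beats the runner-up by `margin`.
    estimate = circular_correlate(key, memory)
    scores = [dot(estimate, cand) for cand in codebook]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    best = ranked[0]
    second = scores[ranked[1]] if len(ranked) > 1 else float("-inf")
    return best if scores[best] - second >= margin else None


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 256, 4  # hypothetical dimension and record count
    keys = [random_vector(d, rng) for _ in range(n)]
    values = [random_vector(d, rng) for _ in range(n)]
    memory = build_memory(keys, values)
    for i, k in enumerate(keys):
        print(i, query(memory, k, values, margin=0.2))
```

In this sketch, `query` returns the index of the decoded value in `codebook`, or `None` when the margin test fails, so an ambiguous lookup is rejected instead of being returned as a possibly wrong pointer. Raising `margin` trades more rejections for fewer false matches; the margin value used in the demo is arbitrary and not taken from the paper.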
