**Used RL to solve a healthcare privacy problem that static NLP pipelines can’t handle**
Most de-identification tools are stateless. They scan a document, remove identifiers, done. No memory of what came before, no awareness of risk accumulating over time. That works fine for isolated records. It breaks down in streaming systems where the same patient appears across hundreds of events over time.
I framed this as a control problem instead.
The system maintains a per-subject exposure state and computes rolling re-identification risk as new events arrive. When risk crosses a threshold, the policy escalates masking strength automatically. When cross-modal signals converge (text, voice, and image all tied to the same patient at the same time), the system recognizes that the identity is now far more exposed and rotates the pseudonym token on the spot.
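Here is a minimal sketch of that control loop: exponentially decayed rolling risk, threshold-based escalation, and pseudonym rotation on cross-modal convergence. All names (`ExposureState`, `record_event`, the decay and threshold values) are illustrative, not the repo's actual API.

```python
from dataclasses import dataclass, field
import itertools

MASK_LEVELS = ["weak", "pseudo", "redact"]  # escalating masking strength

@dataclass
class ExposureState:
    """Per-subject exposure state, updated as each event streams in."""
    risk: float = 0.0
    level: int = 0                      # index into MASK_LEVELS
    pseudonym: str = "subj-0"
    modalities: set = field(default_factory=set)
    _counter: itertools.count = field(default_factory=itertools.count, repr=False)

    def record_event(self, modality: str, event_risk: float,
                     decay: float = 0.9, threshold: float = 1.0):
        # Rolling risk: decay what came before, add the new event's contribution.
        self.risk = decay * self.risk + event_risk
        self.modalities.add(modality)
        # Escalate masking strength when accumulated risk crosses the threshold.
        if self.risk > threshold and self.level < len(MASK_LEVELS) - 1:
            self.level += 1
        # Cross-modal convergence: rotate the pseudonym token on the spot.
        if {"text", "voice", "image"} <= self.modalities:
            self.pseudonym = f"subj-{next(self._counter) + 1}"
            self.modalities.clear()

state = ExposureState()
for mod, r in [("text", 0.4), ("voice", 0.5), ("image", 0.6)]:
    state.record_event(mod, r)
print(state.level, state.pseudonym)  # → 1 subj-1
```

The key property is that no single event triggers escalation; it is the accumulation across events, plus the convergence of modalities, that moves the policy.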
Five policies evaluated: raw, weak, pseudo, redact, and adaptive. The adaptive controller is the RL component: it learns when escalation is actually warranted, rather than defaulting to maximum redaction, which destroys data utility.
The tradeoff being optimized is privacy vs utility. Maximum redaction is easy. Controlled, risk-proportionate masking is the hard problem.
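One way to make that tradeoff concrete is a reward that pays for retained utility and penalizes residual re-identification risk. The linear form and the weight below are assumptions for illustration, not the repo's actual reward:

```python
def reward(residual_risk: float, utility_retained: float,
           risk_weight: float = 2.0) -> float:
    """Score a masking decision: reward retained utility,
    penalize residual re-identification risk (weight is a free choice)."""
    return utility_retained - risk_weight * residual_risk

# Maximum redaction: near-zero risk, but almost no utility left.
max_redact = reward(residual_risk=0.05, utility_retained=0.2)
# Risk-proportionate masking: a little risk accepted for much more utility.
proportionate = reward(residual_risk=0.15, utility_retained=0.8)
print(max_redact, proportionate)
```

Under this scoring, risk-proportionate masking beats blanket redaction, which is exactly the behavior the adaptive policy is being trained toward.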
`pip install phi-exposure-guard`
Repo: https://github.com/azithteja91/phi-exposure-guard
Colab demo: https://colab.research.google.com/github/azithteja91/phi-exposure-guard/blob/main/notebooks/demo_colab.ipynb
Curious if anyone has tackled similar privacy-as-control-loop problems in other domains.
submitted by /u/Visual_Music_4833