Microsoft Research Develops Novel Approaches to Enforce Privacy in AI Models

"Contextual integrity defines privacy as the appropriateness of information flows within specific social contexts, that is, disclosing only the information strictly necessary to carry through a given task, such as booking a medical appointment. According to Microsoft's researchers, today's LLMs lack this kind of contextual awareness and can potentially disclose sensitive information, thereby undermining user trust. The first approach focuses on inference-time checks, i.e., safeguards applied when a model generates its response."
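The "strictly necessary" rule can be made concrete with a deliberately simplified sketch. It assumes a context can be reduced to the set of attributes its task needs. Full contextual-integrity norms also take the sender, the recipient, and the transmission principle into account. `TASK_NORMS`, `InformationFlow`, and the attribute names are hypothetical illustrations, not part of Microsoft's work.

```python
from dataclasses import dataclass

# Hypothetical norms: for each task, the attributes strictly necessary to
# complete it. Real contextual-integrity norms also cover sender, recipient
# and transmission principles; this sketch keeps only the "necessary" part.
TASK_NORMS: dict[str, frozenset[str]] = {
    "book_medical_appointment": frozenset(
        {"name", "date_of_birth", "phone", "preferred_time"}
    ),
}


@dataclass(frozen=True)
class InformationFlow:
    """A proposed disclosure of some user attributes for a given task."""
    task: str
    attributes: frozenset[str]


def inappropriate_attributes(flow: InformationFlow) -> set[str]:
    """Return the attributes in the flow that the task does not need."""
    # Unknown tasks have no norms, so every attribute is treated as unnecessary.
    necessary = TASK_NORMS.get(flow.task, frozenset())
    return set(flow.attributes - necessary)


def is_appropriate(flow: InformationFlow) -> bool:
    """A flow is appropriate only if it discloses nothing beyond what is necessary."""
    return not inappropriate_attributes(flow)


if __name__ == "__main__":
    flow = InformationFlow(
        task="book_medical_appointment",
        attributes=frozenset({"name", "phone", "salary"}),
    )
    print(is_appropriate(flow))            # False
    print(inappropriate_attributes(flow))  # {'salary'}
```

Treating an unknown task as allowing nothing is a fail-closed design choice made for this sketch. It is not something the source describes.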
"PrivacyChecker follows a relatively simple pipeline. First, it extracts information from the user's request; next, it classifies it according to a privacy judgement; and, optionally, it injects privacy guidelines into the prompt to ensure the model knows how to handle detected sensitive information. PrivacyChecker is model-agnostic and can be used with existing models without retraining. On the static PrivacyLens benchmark, PrivacyChecker was shown to reduce information leakage from 33.06% to 8.32% on GPT4o and from 36.08% to 7.30% on DeepSeekR1..."
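The three steps can be sketched as plain functions: extract, judge, then optionally inject a guideline. This is only a rough sketch under stated assumptions. According to the published work, a language model performs the extraction and the judgement. Here, two regular expressions and a fixed allow-list stand in for that model. `PATTERNS`, `TASK_ALLOWED`, `Finding`, `extract`, `judge`, and `build_prompt` are hypothetical names, not Microsoft's API.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors: the real pipeline uses a language model for
# extraction and judgement; two regular expressions stand in for it here.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical norms: which kinds of information each task actually needs.
TASK_ALLOWED: dict[str, frozenset[str]] = {
    "book_medical_appointment": frozenset({"email"}),
}


@dataclass(frozen=True)
class Finding:
    kind: str      # category of extracted information, e.g. "email"
    value: str     # the matched text
    allowed: bool  # privacy judgement for this task


def extract(request: str) -> list[tuple[str, str]]:
    """Step 1: pull candidate personal information out of the request."""
    return [(kind, m.group()) for kind, pattern in PATTERNS.items()
            for m in pattern.finditer(request)]


def judge(items: list[tuple[str, str]], task: str) -> list[Finding]:
    """Step 2: classify each item as appropriate or not for the task."""
    allowed = TASK_ALLOWED.get(task, frozenset())  # unknown task: allow nothing
    return [Finding(kind, value, kind in allowed) for kind, value in items]


def build_prompt(request: str, task: str, inject_guidelines: bool = True) -> str:
    """Step 3 (optional): prepend a privacy guideline when inappropriate info is found."""
    flagged = [f for f in judge(extract(request), task) if not f.allowed]
    if not (inject_guidelines and flagged):
        return request
    kinds = ", ".join(sorted({f.kind for f in flagged}))
    guideline = (f"Privacy guideline: the request contains {kinds}, which is not "
                 f"needed for the task '{task}'. Do not repeat or share it.")
    return guideline + "\n\n" + request


if __name__ == "__main__":
    request = "Book me a check-up. Reach me at ana@example.com; my SSN is 123-45-6789."
    print(build_prompt(request, "book_medical_appointment"))
```

For the sample request, the email address passes the judgement and the SSN is flagged, so the guideline names only `ssn`. `build_prompt` returns an ordinary string that could be sent to any model. This loosely mirrors the model-agnostic, no-retraining property described above.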
Contextual integrity defines privacy as the appropriateness of information flows in specific social contexts, which requires disclosing only the information strictly necessary for a task. PrivacyChecker provides inference-time safeguards: it extracts information from the request, classifies its privacy risk, and optionally injects privacy guidelines into the prompt. It integrates with system prompts and tool calls, acts as a gate for external tools, and works with existing models without retraining. On the static PrivacyLens benchmark, PrivacyChecker substantially reduced information leakage (GPT4o: 33.06% → 8.32%; DeepSeekR1: 36.08% → 7.30%) while maintaining task completion. A second pair of techniques, CI-CoT (chain-of-thought reasoning) and CI-RL (reinforcement learning), aims to train models to reason about contextual privacy during generation.
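The idea of a gate for external tools can be illustrated with a decorator. The decorator inspects a tool's text arguments before the call is allowed to run. This is a hypothetical sketch, not PrivacyChecker's integration. A single SSN regular expression stands in for the model-based privacy judgement. `privacy_gate`, `BlockedToolCall`, and the `send_email` tool are invented for illustration.

```python
import functools
import re
from typing import Any, Callable

# Hypothetical detector: the real system judges content with a model;
# one regular expression for US-style SSNs stands in for that judgement.
SENSITIVE_PATTERNS = {"ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}


class BlockedToolCall(Exception):
    """Raised when an outgoing tool call would leak sensitive information."""


def privacy_gate(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap an external tool so its text arguments are checked before it runs."""

    @functools.wraps(tool)
    def gated(*args: Any, **kwargs: Any) -> Any:
        for value in [*args, *kwargs.values()]:
            if not isinstance(value, str):
                continue  # this sketch only inspects text arguments
            for kind, pattern in SENSITIVE_PATTERNS.items():
                if pattern.search(value):
                    raise BlockedToolCall(f"{tool.__name__} blocked: argument contains {kind}")
        return tool(*args, **kwargs)

    return gated


@privacy_gate
def send_email(to: str, body: str) -> str:
    """Hypothetical external tool; a real one would call an email API."""
    return f"sent to {to}"


if __name__ == "__main__":
    print(send_email("clinic@example.com", "Please book a check-up for Ana."))
    try:
        send_email("clinic@example.com", "Her SSN is 123-45-6789.")
    except BlockedToolCall as err:
        print(err)  # send_email blocked: argument contains ssn
```

In this sketch, the gate blocks the call outright when it detects sensitive data. Redacting the offending value and letting the call proceed would be an equally plausible design. The brief does not say which behavior PrivacyChecker uses.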
Read at InfoQ