#41

Model inference attacks

Risk taxonomy

Definition

Inference attacks submit carefully crafted inputs and analyse the outputs to reveal membership, attributes or features of individuals in the training datasets. Their severity increases with larger attack surfaces and natural-language interfaces.

Controls & guardrails that address this

Controls are grouped by control function. Each entry lists the AI lifecycle stage(s) at which to apply it and the other risks it addresses.

Preventive · 4

Role-based access controls

Specify query rate limiting and role-based access control (RBAC) for the model inference API at the design stage to limit the attack surface.

Lifecycle stages: 1 – Use Case Context & Design; 4 – Deployment

Also addresses: Knowledge / Training Data Poisoning; Prompt Injection (direct); Sensitive Data Leakage
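A minimal sketch of how a role check and a query rate limit might be combined at the inference API, assuming a token-bucket limiter. The role map, the action names and the `authorise` helper are hypothetical and are not taken from any specific gateway product.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission map; a real deployment would load this from policy.
ROLE_PERMISSIONS = {
    "analyst": {"predict"},
    "admin": {"predict", "explain"},
}

@dataclass
class TokenBucket:
    capacity: float        # maximum burst size, in queries
    refill_per_sec: float  # sustained query rate
    tokens: float = 0.0    # tokens currently available
    last: float = 0.0      # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def authorise(role: str, action: str, bucket: TokenBucket, now: float) -> bool:
    """Allow a call only if the role permits the action and budget remains."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False
    return bucket.allow(now)
```

Checking the role before the bucket means a denied action does not consume query budget, which is one possible design choice among several.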

Input/output filtering

Implement query pattern detection to identify systematic inference attack behaviour (high-volume queries, membership probing).

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Bias Amplification & Sycophancy; Overreliance / Automation Bias; Sensitive Data Leakage
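One way query pattern detection could look, as a rough sketch: membership probing often shows up as a run of queries that differ only in one identifier. The `ProbingDetector` class, its thresholds and the token-set similarity measure are all illustrative assumptions; a production detector would likely use richer features.

```python
from collections import deque

def jaccard(a: str, b: str) -> float:
    """Token-set similarity between two queries, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

class ProbingDetector:
    """Flags a client whose recent queries are both numerous and near-identical."""

    def __init__(self, window: int = 20, min_queries: int = 5,
                 similarity: float = 0.6, dup_ratio: float = 0.8):
        self.recent = deque(maxlen=window)  # last `window` queries from one client
        self.min_queries = min_queries      # do not judge very short histories
        self.similarity = similarity        # pair counts as near-duplicate above this
        self.dup_ratio = dup_ratio          # share of near-duplicate pairs that flags

    def observe(self, query: str) -> bool:
        """Record a query; return True if the window now looks like probing."""
        self.recent.append(query)
        if len(self.recent) < self.min_queries:
            return False
        queries = list(self.recent)
        # Count consecutive pairs that differ only slightly (e.g. one ID swapped).
        near = sum(
            jaccard(queries[i], queries[i + 1]) >= self.similarity
            for i in range(len(queries) - 1)
        )
        return near / (len(queries) - 1) >= self.dup_ratio
```

One detector instance per client is assumed here; the flag is a signal for review or throttling rather than proof of an attack.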

Calibrated differential-privacy training budget with documented epsilon ceiling and per-individual contribution clipping

Train PII-bearing models with DP-SGD under a documented epsilon/delta budget. Approve the budget against the enterprise epsilon-ceiling policy before training.

source: NIST SP 800-226 Guidelines for Evaluating Differential Privacy Guarantees; Abadi et al. 'Deep Learning with Differential Privacy' (DP-SGD); MITRE ATLAS AML.M0007 (Sanitize Training Data)

Lifecycle stages: 2 – Data Acquisition & Processing; 3 – Onboarding, Build & Review
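The core of a DP-SGD update, per-example clipping followed by Gaussian noise, can be sketched in plain Python as below. This is an illustration of the mechanism described by Abadi et al., not a training-ready implementation: the function names are hypothetical, gradients are plain lists, and the sketch does not compute the resulting epsilon, which requires a separate privacy accountant.

```python
import math
import random

def clip(grad: list[float], max_norm: float) -> list[float]:
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for j, g in enumerate(clip(grad, max_norm)):
            total[j] += g
    # Noise standard deviation scales with the clipping bound, so the noise
    # is calibrated to the largest contribution any single example can make.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]

# Usage: two per-example gradients, clipping bound 1.0, noise multiplier 1.1.
rng = random.Random(0)
update = dp_sgd_gradient([[3.0, 4.0], [0.3, 0.4]], 1.0, 1.1, rng)
```

Clipping bounds each individual's influence on the update, which is what makes a documented epsilon/delta budget meaningful in the first place.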

Output confidence masking and structured-response minimisation for natural-language interfaces

Strip raw logits, quantise confidence scores and block training-record echoes at the inference gateway. Keep the output-filter policy under change control.

source: MITRE ATLAS AML.T0024.001 (Invert ML Model); Jia et al. MemGuard (output perturbation defence); OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure

Lifecycle stage: 4 – Deployment
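As an illustration of confidence masking at a gateway, the sketch below keeps only an allow-listed label and a coarsely quantised confidence. The response field names (`label`, `confidence`, `logits`) and the 0.25 step are assumptions for the example; detecting training-record echoes is out of scope here.

```python
def minimise_response(raw: dict, step: float = 0.25) -> dict:
    """Return only the label and a coarse confidence; drop everything else."""
    confidence = float(raw["confidence"])
    # Snap confidence to the nearest multiple of `step` and clamp to [0, 1].
    coarse = min(1.0, max(0.0, round(confidence / step) * step))
    # Build a new dict from an allow-list, so logits and any other fields
    # the model server attaches are never forwarded to the caller.
    return {"label": raw["label"], "confidence": coarse}

# Usage: the raw logits and the extra field are dropped from the response.
masked = minimise_response(
    {"label": "approved", "confidence": 0.87, "logits": [2.1, -0.3], "debug": "x"}
)
```

Building the output from an allow-list rather than deleting known-sensitive keys means a newly added server field stays hidden by default.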

Detective · 4

Penetration testing

Penetration-test the model inference API to identify exploitable access-control weaknesses and rate-limiting bypass vectors.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Knowledge / Training Data Poisoning; Inference-Time & Serving-Layer Manipulation; Prompt Injection (direct); Sensitive Data Leakage

Vulnerability assessment

Conduct periodic inference attack vulnerability assessments as new attack methods emerge. Monitor query pattern anomalies.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Knowledge / Training Data Poisoning; Inference-Time & Serving-Layer Manipulation; Prompt Injection (direct); Sensitive Data Leakage

Privacy attack red-team battery with quantified MIA/attribute-inference success ceiling as a release gate

Attack each candidate model with membership-, attribute-, and inversion-inference harnesses before promotion. Block release when attack advantage exceeds the agreed ceiling.

source: MITRE ATLAS AML.T0024.000 (Infer Training Data Membership); Carlini et al. 'Membership Inference Attacks From First Principles' (LiRA); NIST AI RMF MEASURE 2.7

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change
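A release gate on attack advantage could be sketched as follows, assuming the simplest loss-threshold membership attack ("low loss means member") rather than LiRA. Advantage is taken here as the best true-positive rate minus false-positive rate over all thresholds; the function names and the ceiling value are hypothetical.

```python
def attack_advantage(member_losses, nonmember_losses):
    """Best TPR - FPR over all thresholds for a 'low loss means member' attack."""
    best = 0.0
    for t in sorted(set(member_losses) | set(nonmember_losses)):
        # Fraction of true members (TPR) and non-members (FPR) guessed as members.
        tpr = sum(loss <= t for loss in member_losses) / len(member_losses)
        fpr = sum(loss <= t for loss in nonmember_losses) / len(nonmember_losses)
        best = max(best, tpr - fpr)
    return best

def release_allowed(member_losses, nonmember_losses, ceiling):
    """Gate: permit promotion only if the attack advantage is within the ceiling."""
    return attack_advantage(member_losses, nonmember_losses) <= ceiling

# Usage: losses on training records versus held-out records for a candidate model.
members = [0.1, 0.2, 0.3]
nonmembers = [0.25, 0.5, 0.6]
blocked = not release_allowed(members, nonmembers, ceiling=0.1)
```

A stronger attack would only raise the measured advantage, so a gate built on a weak attack gives a lower bound on leakage, not an assurance.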

Per-principal query-budget and probing-behaviour anomaly detection on the inference API

Configure per-principal budgets and probing-detection rules on the gateway before exposure. Verify enforcement with synthetic attack traffic.

source: MITRE ATLAS AML.M0004 (Restrict Number of ML Model Queries), AML.T0024 (Exfiltration via ML Inference API); NIST SP 800-53 SI-4, AU-6

Lifecycle stage: 4 – Deployment

Corrective · 3

Red teaming

Conduct targeted red team exercises for inference attack categories (membership inference, model extraction, attribute inference) before deployment.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Jailbreak; Model Drift & Silent Degradation; Knowledge / Training Data Poisoning; Inference-Time & Serving-Layer Manipulation; Prompt Injection (direct); Sensitive Data Leakage

Output confidence masking and structured-response minimisation for natural-language interfaces

Define the minimum response surface and test it with membership/attribute-inference probes pre-release. Block promotion if any probe recovers raw confidence signals.

source: MITRE ATLAS AML.T0024.001 (Invert ML Model); Jia et al. MemGuard (output perturbation defence); OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure

Lifecycle stage: 3 – Onboarding, Build & Review

Per-principal query-budget and probing-behaviour anomaly detection on the inference API

Meter inference traffic per principal and flag probing signatures with behavioural analytics. Throttle, step-up, or suspend flagged sessions.

source: MITRE ATLAS AML.M0004 (Restrict Number of ML Model Queries), AML.T0024 (Exfiltration via ML Inference API); NIST SP 800-53 SI-4, AU-6

Lifecycle stage: 5 – Usage, Monitoring & Change
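The metering and graduated response described above might be wired together roughly like this. The `QueryMeter` class, the 80% throttle threshold and the ordering of actions are assumptions chosen for illustration; the probing flag is expected to come from a separate behavioural detector.

```python
from collections import Counter

class QueryMeter:
    """Counts queries per principal against a fixed budget for the current period."""

    def __init__(self, budget: int):
        self.budget = budget
        self.used = Counter()

    def record(self, principal: str) -> float:
        """Record one query and return the fraction of budget consumed."""
        self.used[principal] += 1
        return self.used[principal] / self.budget

def choose_action(usage_ratio: float, probing_flag: bool) -> str:
    """Map metering and detection signals to a graduated response."""
    if usage_ratio > 1.0:
        return "suspend"   # budget exhausted
    if probing_flag:
        return "step_up"   # require re-authentication or review
    if usage_ratio >= 0.8:
        return "throttle"  # slow the principal down near the limit
    return "allow"

# Usage: a principal with a budget of five queries per period.
meter = QueryMeter(budget=5)
actions = [choose_action(meter.record("svc-a"), probing_flag=False) for _ in range(6)]
```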

Real-world cases

Published events that illustrate this risk.

Prefix/KV-cache timing side channels (e.g. InputSnatch), 2025

Shared prefix/KV caching in LLM serving leaks information about other users' inputs via response-timing side channels.
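To show why a shared prefix cache leaks, here is a toy model in which the number of tokens that must be computed stands in for response latency. It is a deliberately simplified illustration, not a reproduction of InputSnatch or of any real serving stack; the `PrefixCache` class and the example prompts are invented for the sketch.

```python
class PrefixCache:
    """Toy shared cache: remembers every token prefix seen from any user."""

    def __init__(self):
        self._prefixes = set()

    def process(self, tokens):
        """Return how many tokens had to be computed (a stand-in for latency)."""
        hit = 0
        # Find the longest prefix of this request that is already cached.
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._prefixes:
                hit = n
                break
        # Cache every prefix of this request for later requests to reuse.
        for n in range(1, len(tokens) + 1):
            self._prefixes.add(tuple(tokens[:n]))
        return len(tokens) - hit

cache = PrefixCache()
cache.process(["my", "pin", "is", "1234"])                # victim's request
unrelated = cache.process(["hello", "world"])             # attacker baseline: no hit
partial = cache.process(["my", "pin", "is", "0000"])      # shares a 3-token prefix
exact = cache.process(["my", "pin", "is", "1234"])        # full match
```

In this model the attacker's cost drops as the guess shares a longer prefix with the victim's input, which is the signal a timing side channel exploits. Isolating caches per tenant or per user removes that cross-user signal at the expense of cache hit rate.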

Other risks in Cyber & Data Security

#35 Unintentional inappropriate or illegal use
#36 Data poisoning
#37 Adversarial model manipulation
#38 Prompt injection
#39 Re-identification
#40 Data leakage
#42 Tool-layer misuse and unintended actions
#43 Inadequate agent identity and authorisation