Memory Poisoning
high · Memory
Definition
An attacker gets the AI to save a false 'fact' or hidden instruction into its long-term memory. From then on, it re-reads that planted note in every future chat, so a one-time trick keeps working.
This is recommended as a granular sub-risk of #38 Prompt injection (Cyber & Data Security · Technology Risk). It is distinguished from a single-session prompt-injection bypass (#38) and from training-data poisoning (#36) by its persistence in the agent's runtime memory store. Your 44-row Enterprise Risk Mapping is unchanged; this is a suggestion for inclusion.
Where it attaches
The system components where this risk arises.
Detection signals
- ▸ Memory entries containing instruction-like content
- ▸ Persistent behaviour change spanning sessions
- ▸ Memory written shortly after the agent read untrusted content
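As a rough illustration, two of the signals above (instruction-like content, and a write that closely follows an untrusted read) can be checked at write time. The sketch below is a minimal heuristic under stated assumptions: the `MemoryWrite` record, the regular expressions, and the 120-second window are all hypothetical and not taken from any particular product. Cross-session behaviour change is not covered here, since it needs comparison over time rather than a per-write check.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical patterns for "instruction-like" text; tune or replace in practice.
INSTRUCTION_PATTERNS = [
    r"\b(always|never|from now on|in (all|every) future)\b",
    r"\b(ignore|disregard|override)\b.*\b(instruction|rule|prompt)s?\b",
    r"\b(send|forward|post|upload)\b.*\bhttps?://",
]


@dataclass
class MemoryWrite:
    """One attempted write to the agent's long-term memory (assumed shape)."""
    text: str
    written_at: float                        # seconds since epoch
    last_untrusted_read_at: Optional[float]  # None if nothing untrusted was read


def signals(write: MemoryWrite, window_s: float = 120.0) -> List[str]:
    """Return the names of the detection signals this write triggers."""
    found = []
    lowered = write.text.lower()
    if any(re.search(p, lowered) for p in INSTRUCTION_PATTERNS):
        found.append("instruction_like_content")
    if write.last_untrusted_read_at is not None:
        gap = write.written_at - write.last_untrusted_read_at
        if 0 <= gap <= window_s:
            found.append("written_after_untrusted_read")
    return found
```

Pattern matching of this kind is easy to evade, so it is probably best treated as a trigger for review rather than as a verdict.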
Controls & guardrails that address this
4 controls, grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Being careful about what gets saved to long-term memory, labelling where it came from, and letting users see and delete their memories.
Watching for strange new memories — like instructions that suddenly appear — and holding them aside until checked.
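The two memory controls described so far (provenance labelling with user-visible, deletable memories, and holding suspicious entries aside) could be combined in a small store like the one sketched here. This is an assumption-laden illustration: `MemoryStore`, `MemoryEntry`, the `source` labels, and the pluggable `is_suspicious` check are hypothetical names, and the policy of quarantining anything the check flags is only one possible choice.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Dict, List


@dataclass
class MemoryEntry:
    id: int
    text: str
    source: str              # provenance label, e.g. "user" or "web_page"
    status: str = "active"   # "active" or "quarantined"


class MemoryStore:
    def __init__(self, is_suspicious: Callable[[str], bool]):
        self._entries: Dict[int, MemoryEntry] = {}
        self._ids = count(1)
        self._is_suspicious = is_suspicious

    def write(self, text: str, source: str) -> MemoryEntry:
        """Save an entry with its provenance; flagged entries are held aside."""
        status = "quarantined" if self._is_suspicious(text) else "active"
        entry = MemoryEntry(next(self._ids), text, source, status)
        self._entries[entry.id] = entry
        return entry

    def recall(self) -> List[MemoryEntry]:
        """Only active entries are fed back into the agent's context."""
        return [e for e in self._entries.values() if e.status == "active"]

    def list_all(self) -> List[MemoryEntry]:
        """Everything the user is allowed to inspect, quarantined items included."""
        return list(self._entries.values())

    def approve(self, entry_id: int) -> None:
        """Release a quarantined entry after a human has checked it."""
        self._entries[entry_id].status = "active"

    def delete(self, entry_id: int) -> None:
        """User-initiated deletion; a missing id is ignored."""
        self._entries.pop(entry_id, None)
```

For example, with `MemoryStore(is_suspicious=lambda t: "from now on" in t.lower())`, a note written from a fetched web page that contains "from now on" stays out of `recall()` until someone calls `approve()` on it, while still appearing in `list_all()` for review.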
Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.
Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.
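The logging and alerting controls can be pictured as an append-only event log plus a sliding-window alarm. The sketch below is a minimal version under stated assumptions: the event fields (`ts`, `session`, `kind`, `detail`), the JSON-lines format, and the `AuditLog` and `RateAlarm` classes are hypothetical, and a real deployment would more likely rely on its existing logging and monitoring stack.

```python
import json
import time
from collections import deque
from typing import Any, Deque, Dict, Optional


class AuditLog:
    """Appends one JSON object per line to any sink with a write(str) method."""

    def __init__(self, sink):
        self._sink = sink  # e.g. an open file or io.StringIO

    def record(self, session_id: str, kind: str, detail: Dict[str, Any],
               ts: Optional[float] = None) -> Dict[str, Any]:
        event = {
            "ts": time.time() if ts is None else ts,
            "session": session_id,
            "kind": kind,        # e.g. "question", "document_fetch", "memory_write"
            "detail": detail,
        }
        self._sink.write(json.dumps(event) + "\n")
        return event


class RateAlarm:
    """Fires when more than `limit` events fall within the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._times: Deque[float] = deque()

    def observe(self, ts: float) -> bool:
        self._times.append(ts)
        # Drop timestamps that have aged out of the window.
        while self._times and ts - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) > self.limit
```

One way to wire these together is to feed the timestamp of each logged `memory_write` event into a `RateAlarm`, so that a sudden burst of memory writes raises an alert while the log retains the detail needed to investigate it.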
Framework mappings
- LLM01:2025 Prompt Injection
- LLM04:2025 Data and Model Poisoning
- AML.T0070 RAG Poisoning
- MANAGE 2.4
Real-world cases
2 published events that illustrate this risk.
Indirect injection could write attacker instructions into ChatGPT's long-term memory, persisting across chats to exfiltrate data until OpenAI mitigated it.
Microsoft AI Red Team whitepaper enumerating agentic failure modes, including resource/service exhaustion from runaway loops and fan-out.