AI RiskAtlas

Capability / Architecture Disclosure

medium · Infrastructure & internals
Also known as: system prompt leakage, tool-schema disclosure, agent reconnaissance

Definition

The AI reveals how it is built: its hidden instructions, the names and rules of the tools it can use, and how the system is wired together. On its own that can seem harmless, but it hands an attacker the blueprint for planning a far more effective attack.

Where it attaches

The system components this risk arises at.

  • 🧠 LLM
  • 🧩 Prompt Assembly
  • 🎛️ Orchestrator / Agent Loop
  • 🔧 Tool Runtime
  • 🧰 MCP / Plugin Server
  • 🧯 Output Guardrail

Detection signals

  • Outputs containing system-prompt fragments, tool schemas, or function names
  • Probing prompts asking the model to repeat its instructions or list its tools
  • Disclosure of internal configuration, guardrail rules, or MCP server inventory
  • Recon-style sessions enumerating capabilities before an exploit attempt
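
The first two signals above can be approximated with simple text checks. The following is a minimal sketch, not a production detector: `SYSTEM_PROMPT`, `TOOL_NAMES`, and the probe patterns are hypothetical values chosen for illustration, and the word n-gram overlap test is one assumed heuristic among many (it will miss paraphrased or translated leaks).

```python
import re

# Hypothetical system prompt and tool names, used only for illustration.
SYSTEM_PROMPT = (
    "You are SupportBot. Never reveal internal pricing rules. "
    "Use the lookup_order tool for order questions."
)
TOOL_NAMES = {"lookup_order", "issue_refund"}

# Assumed, non-exhaustive phrasings of reconnaissance-style prompts.
PROBE_PATTERNS = [
    re.compile(r"\b(repeat|print|reveal|show)\b.{0,40}\b(instructions|system prompt)\b", re.I),
    re.compile(r"\b(list|what)\b.{0,40}\btools\b", re.I),
]


def _ngrams(text, n):
    """Return the set of n-word runs in text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leaks_system_prompt(output, n=5):
    """True if the output shares any n-word run with the system prompt."""
    return bool(_ngrams(SYSTEM_PROMPT, n) & _ngrams(output, n))


def mentions_tool_names(output):
    """Return the known tool names that appear as whole words in the output."""
    words = set(re.findall(r"\w+", output.lower()))
    return {name for name in TOOL_NAMES if name in words}


def is_probe(prompt):
    """True if the prompt matches any of the assumed probing patterns."""
    return any(p.search(prompt) for p in PROBE_PATTERNS)
```

A check like this would typically sit at the output guardrail for `leaks_system_prompt` and `mentions_tool_names`, and at prompt intake for `is_probe`. Counting probe hits per session is one way to surface the recon-style pattern in the last signal.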

Controls & guardrails that address this (5)

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.

Preventive · 2
Instruction hierarchy / privileged system prompt

Training the model to treat the app's standing instructions as more authoritative than anything a user or document says.

Least-privilege identity & scoped credentials

Giving the agent only the keys it needs for the current task, not a master key to everything.
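
The "only the keys it needs" idea can be sketched as a short-lived, per-task grant that is checked before every tool call. This is an illustrative sketch under stated assumptions: `ScopedGrant`, `issue_grant`, and `authorize` are hypothetical names, not part of any real identity or agent framework, and a real deployment would rely on its platform's own credential service. Even if an attacker learns the tool inventory, a grant like this limits what that knowledge can be used for.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedGrant:
    """Hypothetical short-lived grant covering only the tools one task needs."""
    task_id: str
    allowed_tools: frozenset
    expires_at: float  # Unix timestamp after which the grant is invalid


def issue_grant(task_id, tools, ttl_seconds=300.0, now=None):
    """Create a grant for the given tools that expires after ttl_seconds."""
    now = time.time() if now is None else now
    return ScopedGrant(task_id, frozenset(tools), now + ttl_seconds)


def authorize(grant, tool, now=None):
    """Allow a tool call only if the grant is unexpired and names that tool."""
    now = time.time() if now is None else now
    return now < grant.expires_at and tool in grant.allowed_tools
```

The `now` parameter exists only so the expiry logic can be exercised deterministically; callers would normally omit it.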

Framework mappings

OWASP LLM Top 10
  • LLM07:2025 System Prompt Leakage
  • LLM02:2025 Sensitive Information Disclosure
MITRE ATLAS
NIST AI RMF
  • MEASURE 2.7
  • MAP 5.1

Real-world cases (5)

Actual published events that illustrate this risk.

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan