The vocabulary of AI systems
The vocabulary of modern AI systems, explained at your chosen depth. These are the components that make up every architecture and the surfaces every risk attaches to.
Interface
User
The human asking for something
The person typing into the system. Everything starts with what they ask for — and users can be honest, confused, or deliberately trying to trick the system.
Chat / App Interface
Where conversation happens
The app you see — the chat box, the buttons, the replies. It passes your words to the AI and shows you what comes back.
Untrusted Content
Documents, emails, webpages from outside
Stuff the AI reads that nobody on the team wrote: web pages, uploaded files, incoming emails. Anyone in the world can author this — including attackers.
Human Operator
A person doing a professional task with AI help
A real person — a lawyer, doctor, support agent, analyst — using an AI to help with a serious task, then acting on what it says. The AI advises; the human is the one who signs, files, or sends.
Orchestration
Prompt Assembly
Builds the final text the model sees
Before the AI answers, the app quietly bundles things together: its own standing instructions, your question, the chat so far, and any documents it found. The AI reads that whole bundle at once.
Context Window
The model's working memory for one request
The AI can only 'see' a limited amount of text at once — like a desk that fits only so many pages. Old pages fall off; whatever is on the desk shapes the answer, wherever it came from.
Orchestrator / Agent Loop
The conductor that runs the show
The behind-the-scenes manager. It sends your question to the AI, looks at the answer, fetches documents or runs tools if needed, and repeats until the job is done.
Planner Agent
Breaks a goal into subtasks for other agents
In bigger systems, one AI acts like a project manager: it splits your request into smaller jobs and hands them to specialist AI workers.
Inpaint / Regional Compositor
Confines denoising to a masked region or per-region prompts so edits stay local
The tool that lets you change just one part of a picture — mask a hat and ask for a different hat — while leaving the rest untouched. Also how you extend a picture past its edges, or swap an object out.
The model
LLM
The language model itself
The 'brain' — a program trained on huge amounts of text to predict what words come next. It's brilliant at language, but it doesn't check facts and can't tell who wrote which part of its input.
Conditioning Adapter (ControlNet / IP-Adapter)
Injects structure (pose/depth/edges) or a reference image into a frozen denoiser
Extra 'guide rails' you bolt onto an image model so it follows a pose, an outline, a depth map, or copies the look of a reference picture — without retraining it. It's how you say 'this exact pose' or 'this person's style'.
Face-Swap Generator
Fuses a source identity onto a target's pose/expression and renders the swap
The part that actually pastes one person's face onto another's head in a photo or video, matching pose and lighting so it looks real. This is the engine behind most deepfakes.
Temporal / Motion Module
Adds a time axis over a frozen image model so frames move coherently
Image models make one still frame at a time. This add-on gives them a sense of time, so a sequence of frames moves smoothly instead of flickering — turning an image generator into a video generator.
Acoustic / TTS Model
Maps phonemes (+ speaker + style) to the sound representation — the TTS core
The heart of a text-to-speech system: it takes the words (and a target voice) and produces the 'shape' of the speech sound, which another part then turns into audio you can hear.
ASR / Speech-to-Text Model
Maps audio features to text — the transcription core (Whisper, wav2vec2)
The part that listens to speech and writes down the words — automatic speech recognition. It powers captions, voice notes, and meeting transcripts.
Speaker Diarizer
Marks who-spoke-when by segmenting then clustering voiceprints
The part that figures out who spoke when in a recording with several people — labelling each chunk 'Speaker 1, Speaker 2…' — even though it was never told how many people there are.
Model internals
Tokenizer
Chops text into model-readable pieces
The AI doesn't read letters — text is first chopped into small chunks called tokens (like syllables). 'Understanding' starts from these chunks.
Embeddings
Turns tokens into numbers with meaning
Each token becomes a long list of numbers that captures its meaning — similar meanings get similar numbers. All the AI's 'thinking' is math on these numbers.
Attention + KV Cache
How the model relates words — and remembers them during a reply
As the AI writes each word, it 'looks back' at everything before it to stay relevant. To avoid re-reading from scratch every time, it keeps fast notes — a cache — about what it already read.
Sampler / Decoder
Picks the next word, one at a time
The model doesn't output one fixed answer — it outputs odds for every possible next word, and a 'dice roller' picks among the likely ones. That's why answers vary between runs.
Model Weights & Registry
The learned numbers that ARE the model
Everything the model learned is stored as billions of numbers — its weights. Whoever can change those numbers changes how the model behaves, including its sense of right and wrong.
Text / CLIP Encoder
Turns the prompt into the conditioning vectors that steer generation
The part that reads your written prompt and turns it into the 'meaning' numbers an image model can follow. It's why changing words changes the picture — and why a model only understands words it was trained to encode.
VAE / Latent Codec
Codec between pixels and the compressed latent space diffusion works in
Diffusion models don't paint pixels directly — they work in a small compressed 'sketchpad' to save effort, then expand it back into a full picture at the end. This part compresses and un-compresses.
LoRA / Adapter
Small portable weight-delta that re-specializes a frozen base at load time
A small add-on file that re-skins or re-skills a big frozen model without retraining it — like clip-on lenses. The same trick gives an image model a new style or character, and an LLM a new behaviour, cheaply.
Face / Identity Embedding
Extracts a face 'fingerprint' from one photo so generations preserve a specific person
From a single photo of someone's face, this extracts a compact 'faceprint' — a numerical fingerprint of who they are — so an image model can put that exact person into new pictures.
Speaker / Voice-Clone Embedding
Embeds a few seconds of speech into a voiceprint — enables cloning and diarization
From a few seconds of someone talking, this captures a 'voiceprint' — the unique fingerprint of their voice — so a text-to-speech system can speak in that exact voice, or a transcription system can tell speakers apart.
Audio Decoder / Neural Codec
Renders the intermediate sound representation (mel or codec tokens) into a waveform
The last step in making speech: it turns the intermediate sound-shape (or compressed audio code) into the actual waveform you hear.
Quantizer / Compressor
Compresses fp16/bf16 weights to int8/int4 — a process step and a deployable format
A way to shrink a big model so it runs on smaller, cheaper hardware by storing its numbers with less precision. Usually fine — but squeezing too hard can quietly change how the model behaves, even though the file looks legitimate.
Refusal Direction / Steering Vector
A direction in activation space that mediates a behaviour (e.g. refusal) — removable or addable
Researchers found that a single 'direction' inside a model's number-space controls whether it refuses. Find that direction and you can erase it from the model's weights so it can't refuse any more, or cancel it live while the model runs. The same trick can push other behaviours too.
Data & retrieval
Retriever
Finds relevant documents for the question
Instead of relying only on memory, the system looks things up — like a librarian fetching the few most relevant pages and handing them to the AI.
Knowledge Store / Vector DB
The library of searchable documents
A searchable library of the organisation's documents, indexed by meaning rather than exact words, so the AI can find 'pages about refunds' even if the word differs.
Ingestion Pipeline
How documents get into the library
The conveyor belt that takes documents — wikis, PDFs, websites — chops them up, and files them into the AI's library. Garbage on the belt means garbage in the library.
Business Database
The system of record (orders, customers…)
The company's real records — orders, accounts, tickets. When an AI agent can read or change these, mistakes stop being embarrassing and start being expensive.
Training Corpus
The train-time dataset (SFT/preference/synthetic) that shapes a model's disposition
The pile of examples used to teach or re-tune a model — question/answer pairs, preference comparisons, or generated examples. What goes in shapes what the model becomes, so poisoned or biased data here echoes everywhere.
Memory
Long-term Memory
What the AI remembers between conversations
Notes the AI keeps about you and past chats ('prefers email', 'works in finance') so future conversations start smarter. Useful — but wrong or planted notes stick around too.
Tools & integrations
Tool Runtime
Where the AI's actions actually execute
When the AI 'does' something — sends an email, runs code, updates a record — this is the machinery that actually performs it. Words become actions here.
External APIs
Email, calendars, payments, the outside world
The connectors to real services: send an email, issue a refund, book a meeting. Powerful and convenient — and exactly where an AI mistake leaves the building.
MCP / Plugin Server
Third-party tool packs the agent can install
Like an app store for AI abilities: ready-made tool packs (search, email, databases) the agent can plug in. As with any app store, some packages aren't what they claim.
Worker Agent
A specialist AI doing one part of the job
A junior AI given one subtask — 'research this', 'draft that' — which reports back to the manager AI. Helpful, but juniors believe what they read and pass it upward.
Computer / Browser Environment
The screen the agent sees and controls
Instead of calling tidy software functions, this agent drives an actual computer or web browser — it looks at a screenshot and clicks, types, and scrolls like a person would. Whatever is on that screen becomes part of what it 'reads'.
Safety & guardrails
Input Guardrail
Checks requests before the model sees them
A doorman that screens incoming messages: obvious abuse, known trick patterns, things off-limits for this product. Good doormen stop a lot — but not a clever disguise.
Output Guardrail
Checks answers before users see them
A second checker that reads the AI's answer before you do: blocking leaked secrets, dangerous instructions, or policy violations. The last automated chance to catch a bad reply.
Identity & Permissions
Who is this agent acting as, and what may it touch?
The rules about what the AI is allowed to do — like which keys a new employee gets. Too many keys, and one bad day opens every door.
Agent Registry / Admission
Authenticates, vets, and admits agents into a multi-agent team
The bouncer with a guest list for a team of AIs. Before any agent can join the group, send messages, or be trusted by the manager, it checks who the agent really is and whether it's allowed in — so a stranger can't slip in pretending to be a teammate.
Content Provenance & Watermark
Asserts origin (signed manifest) and embeds/detects an imperceptible watermark
Two ways to answer 'is this real or AI-made?': a signed label that travels with the file saying where it came from, and an invisible watermark baked into the content itself. One asserts origin; neither survives if someone strips it.
Consent / Identity-Use Gate
Verifies rights to a target voice or face before the system will clone it
A check that asks 'do you actually have the right to use this person's voice or face?' before a system will clone them — the guardrail meant to stop someone enrolling a stranger's or a celebrity's identity.
Oversight
Human Approval Gate
A person signs off before risky actions
For big actions — sending money, deleting records, emailing customers — the system pauses and asks a person to approve. A seatbelt for automation.
Audit Logging
The flight recorder
A complete diary of what happened: every question, answer, document fetched, action taken. When something goes wrong, this is how you find out what and why.
Monitoring & Evals
Watches for trouble, continuously
Dashboards and alarms watching the AI's behaviour: error spikes, weird patterns, unhappy users. Without it, problems are discovered by customers — or journalists.
Synthetic-Media / Deepfake Detector
Classifies whether inbound media is AI-generated or manipulated, from the content itself
A tool that looks at a photo, video, or voice clip and judges whether it's real or AI-made by examining the content itself — no label needed. It's the only option when a fake carries no watermark and no 'made by AI' tag, which is exactly the case for scam videos and cloned voices.
Infrastructure
Serving Infrastructure
The machinery that runs the model at scale
The data-centre plumbing that actually runs the AI for thousands of people at once — and the clever shortcuts it takes to be fast and affordable.
Model / Package Registry
A public hub you download models or packages from
The app store for AI: public hubs (like model or package registries) where anyone can upload a model or a plugin and anyone else can download it. Convenient — but you're running code and weights that a stranger published.