The Non-Technical Guide to AI Terminology
You are in a meeting. Someone mentions "fine-tuning a foundation model with RAG and guardrails." Someone else asks about "agentic AI" and whether it is "multimodal." A third person wonders aloud whether the vendor's claims about their "open-source LLM" are credible.
You nod. You are not entirely sure what just happened.
If that scene feels familiar, you are not alone, and you are not behind. Microsoft's 2024 Work Trend Index found that 75 per cent of knowledge workers now use AI tools, yet only 39 per cent have received any formal training in what these tools are or how they work. The gap between AI usage and AI understanding is enormous, growing, and costly. McKinsey's 2025 State of AI survey reported that 46 per cent of leaders cite skill gaps as the primary barrier to AI adoption, and over 80 per cent of organisations are seeing no measurable enterprise-level profit impact from generative AI.
The terminology itself is part of the problem. AI language is dense, shifting, and frequently misused. Terms that originated in academic papers get repurposed by marketing departments. Concepts that mean one thing to a researcher mean something quite different in a vendor pitch. And the pace of change means that a term coined eighteen months ago may already have been superseded.
This guide is designed to cut through that. It is not a computer science textbook. It is a reference for working professionals who need to understand AI vocabulary well enough to ask informed questions, evaluate claims, and participate meaningfully in decisions about AI adoption. Every term is explained in plain language, with context for why it matters in a professional setting, and links to the authoritative sources where each concept originated.
A note on how this guide is organised: rather than presenting terms alphabetically (useful for looking things up, less useful for actually understanding them) we have grouped them thematically. The guide builds from foundational concepts through to the emerging terminology you are most likely to encounter in boardrooms, strategy documents, and vendor conversations in 2025 and 2026. If you read it start to finish, each section will build on the last. If you prefer to dip in and out, the section headings will orient you.
The big picture: how AI, machine learning, deep learning, and generative AI relate to each other
The single most common source of confusion in professional AI conversations is the relationship between four terms that are frequently used interchangeably but mean quite different things. Understanding this hierarchy is the scaffold on which everything else hangs.
Think of these as nested layers, like Russian dolls.
Artificial intelligence is the broadest term. It refers to any computer system designed to perform tasks that typically require human intelligence — recognising speech, making decisions, translating languages, identifying objects in images. The term was coined in 1955 by John McCarthy in a proposal for a research workshop at Dartmouth College, which took place in the summer of 1956 and is widely regarded as the birth of AI as an academic field. McCarthy defined AI as "the science and engineering of making intelligent machines." That definition remains surprisingly serviceable seventy years later.
AI is not a single technology. It is a field, an umbrella that covers dozens of different approaches, from rule-based expert systems built in the 1980s to the large language models dominating headlines today.
Machine learning sits inside AI. It is a specific approach to building intelligent systems: rather than programming explicit rules, you feed the system data and let it learn patterns on its own. The term was coined in 1959 by Arthur Samuel, an IBM researcher who built a checkers program that improved through experience. The formal definition most widely cited comes from Tom Mitchell's 1997 textbook: a computer program learns from experience if its performance on a task improves with that experience. The key distinction from traditional software is that a machine learning system writes its own rules by finding patterns in data, rather than having a programmer specify them in advance.
Deep learning sits inside machine learning. It refers specifically to machine learning systems that use artificial neural networks with many layers (the "deep" refers to the depth of those layers, not to depth of understanding). The foundational work was done across several decades by Geoffrey Hinton, Yann LeCun, and Yoshua Bengio — collectively known as the "godfathers of deep learning," who shared the 2018 ACM Turing Award for their contributions. Their 2015 review paper in Nature remains the standard reference. Deep learning is what made modern AI breakthroughs possible, from image recognition to language generation.
Generative AI is an application of deep learning. Where earlier AI systems were primarily built to classify, predict, or recommend, generative AI creates new content — text, images, video, audio, or code — typically from natural language instructions. The concept existed in earlier forms, but it went definitively mainstream with the release of ChatGPT in November 2022, which reached 100 million users in two months. When people in business settings say "AI" today, they usually mean generative AI specifically, even though it represents only one branch of a much larger field.
The practical implication of this hierarchy: not all AI is machine learning, not all machine learning is deep learning, and not all deep learning is generative. When a vendor says their product "uses AI," it is worth asking which kind. The answer tells you a great deal about what the product can and cannot do.
The building blocks: how modern AI systems work
With the big picture established, the next set of terms describes the architecture and mechanics underneath the AI systems you are most likely to encounter at work.
Neural networks
A neural network is a computational model loosely inspired by biological brains. It consists of interconnected nodes — called artificial neurons — organised in layers. Data enters through an input layer, passes through one or more hidden layers where patterns are detected, and produces results at an output layer. Each connection between nodes has a numerical weight that gets adjusted during training.
The idea dates to 1943, when Warren McCulloch and Walter Pitts published the first mathematical model of an artificial neuron. Frank Rosenblatt built the first practical implementation, the Perceptron, in 1958. After a period of decline (caused partly by Marvin Minsky and Seymour Papert's 1969 critique showing the Perceptron's limitations), the field was revived in 1986 when David Rumelhart, Geoffrey Hinton, and Ronald Williams published their influential paper on backpropagation, a method for efficiently training multi-layer networks.
When a neural network has many layers, it becomes a deep neural network, and that is where the term deep learning comes from.
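The mechanics described above can be sketched in a few lines. This is a minimal illustration, not a real trained network: the weights and bias values below are invented, and a genuine network would have thousands or billions of them, all set by training rather than by hand.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs plus a bias,
    squashed through a sigmoid activation into a value between 0 and 1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

def forward(inputs):
    """A toy network: two inputs -> a hidden layer of two neurons -> one output.
    Data flows from the input layer, through the hidden layer, to the output."""
    h1 = neuron(inputs, [0.5, -0.6], 0.1)      # hidden neuron 1
    h2 = neuron(inputs, [-0.3, 0.8], 0.0)      # hidden neuron 2
    return neuron([h1, h2], [1.2, -1.1], 0.2)  # output neuron

print(forward([1.0, 0.0]))
```

Training, via backpropagation, is the process of nudging each of those weight and bias numbers until the output matches the desired answers.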
The transformer
If there is one technical concept worth understanding in modern AI, it is the transformer. Introduced in a 2017 paper by researchers at Google, titled with characteristic understatement "Attention Is All You Need" and now cited over 173,000 times, the transformer is the architecture that underpins virtually every major AI system you interact with today: ChatGPT, Claude, Gemini, Llama, DALL-E, and Stable Diffusion are all built on transformer foundations.
What made the transformer different from earlier approaches was a mechanism called self-attention. Previous language models processed text sequentially, one word at a time, in order. The transformer processes all words simultaneously and learns which words in a passage are most relevant to each other, regardless of their position. The standard analogy is a classroom where every student can instantly communicate with every other student to determine who has the most relevant information for any given question.
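Self-attention can be shown in miniature. In the sketch below, each word is represented by an invented two-dimensional vector (real models use hundreds of dimensions, learned during training); the word "it" scores itself against earlier words by dot product, and the scores are normalised into weights that sum to one.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Self-attention in miniature: score the query vector against every
    key vector by dot product, then normalise into attention weights."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Invented vectors: how much should "it" attend to each earlier word?
vectors = {"animal": [0.9, 0.1], "street": [0.2, 0.8], "it": [0.85, 0.15]}
weights = attention_weights(vectors["it"], [vectors["animal"], vectors["street"]])
print([round(w, 2) for w in weights])  # "it" attends more to "animal"
```

Because every word computes such weights against every other word simultaneously, the model captures relationships regardless of how far apart the words sit in the text.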
This matters for professionals because it explains both the strengths and the limitations of modern AI. Transformers are extraordinarily good at capturing relationships and patterns across large amounts of text. They are not reasoning engines. They do not understand meaning in the way humans do. They identify and reproduce statistical patterns at tremendous scale. That turns out to be remarkably powerful, but also fundamentally different from human cognition.
Parameters
When you hear that a model has "175 billion parameters," the figure refers to the adjustable numerical values inside the neural network that the model learned during training. Every connection between neurons has a weight; every neuron has a bias value. These are the parameters. During training, each one is minutely adjusted, millions of times, until the model can reproduce the patterns in its training data.
A helpful analogy: if a neural network were a piano, parameters would be the tuning of each individual string. Training is the process of adjusting every string until the piano can play any melody it has heard.
More parameters generally means more capacity to learn complex patterns, but also more computational cost to train and run. GPT-3 has 175 billion parameters. GPT-4 is estimated to have around 1.7 trillion (OpenAI has not confirmed the figure). Meta's Llama 4 Maverick has 400 billion total parameters but uses a technique called Mixture of Experts (more on that later) to activate only 17 billion for any given query, keeping it efficient.
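Parameter counts follow directly from the architecture. For a simple fully connected network, every pair of adjacent layers contributes one weight per connection, plus one bias per neuron; the sketch below counts them for an arbitrary example network (the layer sizes are illustrative, not taken from any real model).

```python
def parameter_count(layer_sizes):
    """Count the parameters in a fully connected network.
    Each adjacent layer pair contributes (inputs x outputs) weights,
    and every neuron after the input layer has one bias."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

# An illustrative small network: 784 inputs -> 128 hidden -> 10 outputs
print(parameter_count([784, 128, 10]))  # 101,770 parameters
```

Scaling those layer sizes up by a few orders of magnitude is how you arrive at models with billions of parameters, along with the corresponding cost to train and run them.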
Tokens
Tokens are the atomic units that large language models actually process. Rather than reading text word by word, models break text into tokens: pieces that might be whole words, parts of words, single characters, or punctuation marks. The word "understanding" might become two tokens: "understand" and "ing." Common words like "the" are typically a single token. Unusual words get broken into smaller pieces.
The practical detail that matters: one token is roughly three-quarters of an English word. A thousand tokens is approximately 750 words. Tokens determine three things you will encounter in practice: how much a model costs to use (API pricing is per token), how much text you can feed it at once (the context window, discussed next), and how much it can generate in a single response.
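The rule of thumb above is enough for quick back-of-envelope estimates. The sketch below applies it; the $3-per-million-tokens price is purely hypothetical, chosen for illustration, as actual API rates vary by provider and model.

```python
def estimate_tokens(word_count):
    """Rough rule of thumb: one token is about three-quarters of an
    English word, so tokens ~= words / 0.75."""
    return round(word_count / 0.75)

def estimate_cost(tokens, price_per_million):
    """API pricing is per token; the rate here is a hypothetical example."""
    return tokens / 1_000_000 * price_per_million

report_words = 3000
tokens = estimate_tokens(report_words)             # about 4,000 tokens
cost = estimate_cost(tokens, price_per_million=3)  # hypothetical $3/million rate
print(tokens, cost)
```

A 3,000-word report thus works out to roughly 4,000 tokens, and around a cent at that illustrative rate; the same arithmetic tells you whether a document will fit inside a model's context window.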
Context window
The context window is the maximum amount of text — measured in tokens — that a model can process at once. Think of it as the model's working memory: everything it can "see" while generating a response, including your instructions, any documents you have provided, and its own output so far.
Context windows have grown dramatically. Early GPT models had context windows of a few thousand tokens. As of early 2026, standard context windows range from 32,000 to 200,000 tokens (Claude and GPT-4o), with some models offering one to two million tokens (Google's Gemini 2.5 Pro) or even larger. For reference, 128,000 tokens is roughly 96,000 words, or about a 250-page book.
A larger context window does not mean better comprehension, however. Research has identified a phenomenon called "Lost in the Middle," where models perform best on information placed near the beginning or end of the context window and struggle with material buried in the centre. When working with AI on long documents, this is worth knowing.
How AI systems learn: training, fine-tuning, and alignment
Understanding how AI models are built, and how they are improved, helps explain both their capabilities and their failure modes.
Training (pre-training)
Training is the initial, large-scale process of teaching a model from scratch. For a large language model, this means processing vast amounts of text data — books, websites, articles, code — and learning to predict, given any sequence of tokens, which token is most likely to come next. This next-token prediction task, repeated billions of times, is how the model develops its internal representation of language patterns.
Training a large model is extraordinarily expensive. Training GPT-3 cost an estimated $12 million in compute alone, required thousands of specialised processors running for weeks, and consumed enormous amounts of energy. This expense is why most organisations use pre-trained models rather than building their own, and why the companies that can afford to train large models (OpenAI, Google, Anthropic, Meta) occupy such a powerful position in the market.
Fine-tuning
Fine-tuning is the process of adapting an already-trained model to a specific task or domain using a smaller, more targeted dataset. If training is like giving someone a broad general education, fine-tuning is like giving them specialised professional training on top of it.
Fine-tuning is dramatically cheaper and faster than training from scratch, which is why it has become the standard approach for organisations that want AI tailored to their specific needs. A hospital might fine-tune a general language model on medical literature. A law firm might fine-tune on legal documents and case law.
Supervised, unsupervised, and reinforcement learning
These terms describe three fundamentally different approaches to how a machine learning system learns from data.
In supervised learning, the system learns from labelled examples: data where the correct answer is already known. You show it thousands of images labelled "cat" or "not cat," and it learns to tell the difference. Stanford HAI's definition puts it simply: the computer learns to predict human-given labels. This is the most common approach in commercial AI applications — fraud detection, spam filtering, medical diagnosis, and demand forecasting all typically use supervised learning.
In unsupervised learning, the system finds patterns in data without being told what to look for. There are no labels. The model discovers structure on its own: grouping similar customers together, identifying unusual transactions, or finding topics across thousands of documents. The pre-training phase of large language models is a form of unsupervised (or more precisely, self-supervised) learning.
Reinforcement learning takes a different approach entirely. An agent learns by trial and error, receiving rewards for good outcomes and penalties for bad ones, and gradually developing a strategy that maximises its cumulative reward. The definitive textbook is Sutton and Barto's Reinforcement Learning: An Introduction (MIT Press), described by DeepMind's CEO Demis Hassabis as "the bible of reinforcement learning." The most famous reinforcement learning achievement was AlphaGo's defeat of world champion Go player Lee Sedol in 2016.
Semi-supervised learning falls between the first two: a small amount of labelled data combined with a large amount of unlabelled data. This is common in practice because labelling data is expensive and time-consuming.
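Supervised learning in particular is simple enough to sketch end to end. The toy classifier below learns from labelled examples by averaging the features for each label, then assigns new examples to the nearest average; the data is invented, and real systems use far more capable algorithms, but the labelled-examples-in, predictions-out pattern is the same.

```python
def train_centroids(examples):
    """Supervised learning in miniature: from labelled (features, label)
    pairs, compute the average feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, features):
    """Predict the label whose centroid is closest to the new example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Invented labelled examples: (weight_kg, ear_length_cm) -> species
data = [([4.0, 6.0], "cat"), ([5.0, 7.0], "cat"),
        ([30.0, 12.0], "dog"), ([25.0, 10.0], "dog")]
centroids = train_centroids(data)
print(classify(centroids, [6.0, 6.5]))  # closest to the "cat" centroid
```

An unsupervised system would receive the same feature vectors without the "cat" and "dog" labels and would have to discover the two clusters on its own.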
RLHF: reinforcement learning from human feedback
RLHF is the technique that turned raw language models into the helpful, conversational AI systems people interact with today, and it is arguably the single most important concept for understanding why modern AI assistants behave the way they do.
The process works in three stages. First, human demonstrators show the model examples of good responses (supervised fine-tuning). Second, human evaluators compare pairs of model outputs and indicate which is better, and a reward model is trained on these preferences. Third, the language model is optimised using reinforcement learning to produce responses that the reward model rates highly.
The 2022 InstructGPT paper by Ouyang and colleagues at OpenAI demonstrated something remarkable: a 1.3 billion parameter model fine-tuned with RLHF was preferred by human evaluators over the vastly larger 175 billion parameter GPT-3 without RLHF. In other words, teaching a small model to follow human preferences produced better results than simply making a model bigger. This finding shaped the development of ChatGPT, Claude, Gemini, and virtually every modern AI assistant. The foundational work on learning from human preferences was established by Christiano et al. in 2017.
Transfer learning
Transfer learning is the technique of taking a model trained on one task and reusing it as the starting point for a different but related task. Rather than training from scratch every time, you build on what the model has already learned.
This is the principle that makes modern AI economically viable for most organisations. When you fine-tune GPT-4 or Claude for a specific business application, you are applying transfer learning: starting with a model that has already learned the general structure of language and adapting it to your particular needs. Without transfer learning, every AI application would require the kind of massive, expensive training runs that only the largest technology companies can afford.
The terms you encounter when using AI
These are the concepts you are most likely to meet in practice: when using AI tools, evaluating AI products, or discussing AI implementation with colleagues.
Large language models (LLMs)
A large language model is a deep learning system built on the transformer architecture, trained on vast quantities of text, designed to understand and generate natural language. The "large" refers both to the enormous datasets used in training and to the billions (or trillions) of parameters in the model.
Major LLMs you are likely to encounter in professional settings include GPT-4 and GPT-5 (from OpenAI), Claude (from Anthropic), Gemini (from Google), and Llama (from Meta). Each has different strengths, context window sizes, pricing structures, and terms of use.
What LLMs actually do, at a mechanical level, is predict the next most likely token in a sequence. They do this based on statistical patterns learned during training, not by "understanding" language in any human sense. This distinction matters because it explains both why they are so capable (they have absorbed patterns from an extraordinary breadth of human text) and why they sometimes fail (they can produce fluent, confident nonsense when the statistical patterns lead them astray).
Generative AI
Generative AI refers to any AI system that creates new content — text, images, video, audio, code, or music — typically from natural language prompts. LLMs are one category of generative AI. Image generators like DALL-E, Midjourney, and Stable Diffusion are another. Video generators like Sora represent a newer frontier.
The term existed in academic AI research as far back as the 1980s, but it entered mainstream usage after the release of ChatGPT in late 2022. Collins Dictionary chose "AI" as its 2023 word of the year, largely driven by the cultural impact of generative AI.
Foundation models
A foundation model is a large AI model trained on broad data at scale that can be adapted to a wide range of tasks. The term was coined by researchers at Stanford's Center for Research on Foundation Models in a widely cited 2021 report with over 100 co-authors. They identified two defining properties: emergence (the model develops capabilities that were not explicitly programmed) and homogenisation (many different applications are built on the same underlying model).
GPT-4, Claude, Gemini, and Llama are all foundation models. The concept matters because it reflects how AI is actually being built and deployed now: rather than training a separate model for every task, organisations increasingly start with a foundation model and adapt it. Think of a foundation model as an engine block, carefully designed to provide core functionality, but capable of powering many different vehicles depending on how it is adapted.
Prompts and prompt engineering
A prompt is the natural language text you provide to an AI system to describe what you want it to do. Every time you type a question into ChatGPT or Claude, you are writing a prompt.
Prompt engineering is the practice of systematically designing and refining prompts to get better results. It encompasses techniques like providing examples of what you want (few-shot prompting), asking the model to reason step by step (chain-of-thought prompting), and assigning the model a persona or role.
The term became a recognised professional skill after the release of GPT-3 in 2020, and the Oxford English Dictionary selected "prompt" as its 2023 runner-up word of the year. By 2025, the concept is evolving. Many practitioners now prefer the broader term context engineering, which covers not just the question you ask, but the entire set of information you structure around it: system instructions, reference documents, examples, and constraints. The shift in terminology reflects a maturing understanding that getting good results from AI depends less on finding magic phrases and more on providing clear, well-organised context.
Hallucination and confabulation
When an AI system generates content that is factually incorrect or entirely fabricated, yet presents it with complete confidence as though it were true, the result is called a hallucination. The term entered AI usage in 2000 at a computer vision conference (originally with a positive connotation) and was applied to language models by Google DeepMind researchers in 2018 to describe translations completely untethered from their source material. It became mainstream after ChatGPT's release in 2022, and Cambridge Dictionary updated its definition in 2023 to include the AI sense.
The term is contested. A growing number of researchers prefer confabulation, borrowed from neuropsychology, where it describes the unintentional production of false statements that the speaker genuinely believes to be true. The arguments for the change are threefold: hallucinations in the medical sense involve perceptual experiences, which AI systems do not have; the term risks anthropomorphising AI in misleading ways; and it can be insensitive to people who experience actual hallucinations. Some researchers have argued that confabulation more accurately describes the mechanism: LLMs produce plausible-sounding but unverified output because they are probability engines, not knowledge databases.
Whatever you call it, the practical implications are serious. Lawyers have been sanctioned for submitting AI-generated fictional case citations to courts. Air Canada was forced to honour a discount its chatbot fabricated. Google lost approximately $100 billion in market capitalisation after its Bard chatbot hallucinated in a promotional demonstration. For professionals using AI, the operating principle is straightforward: treat AI output as a first draft that requires verification, not as a reliable source of truth.
RAG: retrieval-augmented generation
RAG is a framework that addresses the hallucination problem by giving AI systems access to external, verified information sources when generating responses. Instead of relying solely on what the model learned during training, a RAG system retrieves relevant documents from a knowledge base — a company's internal documentation, a database of products, a library of policies — and provides that information to the model alongside the user's question.
The technique was introduced in a 2020 paper by Lewis and colleagues at Meta AI and University College London. The analogy that captures it well: think of the LLM as a skilled writer and the retrieval system as a research assistant. When you ask a question, the assistant runs to the library, pulls the most relevant pages, and hands them to the writer. Without the assistant, the writer works entirely from memory, which is where things go wrong.
RAG matters to professionals because it is the primary way organisations are deploying AI for internal knowledge management, customer service, and decision support. When a vendor says their product "grounds AI responses in your company's data," they are almost certainly describing a RAG system.
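The retrieve-then-generate pattern can be sketched without any AI at all. The toy retriever below scores documents by simple word overlap with the question (production systems use embeddings, covered next) and assembles the augmented prompt that would be sent to the model; the documents and wording are invented for illustration.

```python
def retrieve(question, documents, top_k=1):
    """A toy retriever: score each document by how many words it shares
    with the question. Real RAG systems use embedding similarity instead."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question, documents):
    """Assemble the augmented prompt: retrieved passages plus the question,
    with an instruction to answer only from the supplied context."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["Annual leave is 25 days for all staff.",
        "The office closes at 6pm on Fridays."]
print(build_prompt("How many days of annual leave do staff get?", docs))
```

The model never needs to "remember" the leave policy: the relevant passage is handed to it at answer time, which is what grounds the response and reduces hallucination.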
Embeddings and vector databases
Embeddings are numerical representations that capture the meaning of text, images, or other data as arrays of numbers. The core insight is that items with similar meanings end up close together in this numerical space. The word "doctor" and the word "physician" would have very similar embeddings, even though they share no letters. The foundational work was Mikolov et al.'s 2013 Word2Vec paper, which demonstrated that word relationships could be captured arithmetically (the famous example being that "king minus man plus woman equals queen" in embedding space).
Vector databases are specialised databases designed to store, index, and efficiently search through millions or billions of these embeddings. They are essential infrastructure for RAG systems, semantic search, and recommendation engines. Major providers include Pinecone, Weaviate, Milvus, and Chroma. You may not need to understand the technical details, but knowing that "vector database" refers to the storage layer that enables semantic search over your organisation's data is increasingly useful in technology procurement conversations.
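The "close together in numerical space" idea is usually measured with cosine similarity, sketched below. The three-dimensional vectors are entirely made up for illustration; real embeddings have hundreds or thousands of dimensions and are produced by a trained model.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: values near 1.0 mean the
    vectors point in nearly the same direction (similar meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: similar meanings, similar directions.
doctor    = [0.82, 0.10, 0.05]
physician = [0.80, 0.12, 0.07]
banana    = [0.05, 0.90, 0.30]

print(cosine_similarity(doctor, physician))  # close to 1.0
print(cosine_similarity(doctor, banana))     # much lower
```

A vector database is, in essence, infrastructure for running that comparison efficiently against millions of stored embeddings at once.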
Multimodal AI
Multimodal AI refers to systems that can process and work with multiple types of input — text, images, audio, video, and sometimes other data formats — rather than being limited to a single modality. Early AI systems were typically specialists: a text model could not process images, and an image model could not understand text. Modern multimodal models handle several input types simultaneously.
GPT-4o can process text, images, and audio. Gemini was designed from the ground up as a multimodal system. Claude works with text and images. This matters in professional settings because it expands what AI can do: analysing photographs alongside written reports, extracting information from charts and diagrams, or processing meeting recordings that combine speech with visual presentations.
Agents and agentic AI
Agentic AI is one of the most discussed — and most loosely defined — terms in current AI discourse. MIT Sloan defines agentic AI systems as "a new breed of AI systems that are semi- or fully autonomous and thus able to perceive, reason, and act on their own."
The key distinction from conventional AI assistants: a standard chatbot responds to a single question and waits for the next one. An agentic system can break a complex goal into steps, execute those steps across multiple tools and data sources, evaluate its own progress, and adjust its approach, with minimal human intervention along the way. A generative AI can draft a marketing email. An agentic AI can draft the email, decide when to send it, track engagement, and adjust the strategy over time.
As of early 2026, most "agentic" systems in production still operate with significant human oversight. The fully autonomous AI employee remains more aspiration than reality. But the direction of travel is clear, and the term is appearing regularly in vendor pitches, job descriptions, and strategic planning documents. Worth understanding; worth being sceptical about the more ambitious claims.
Chatbot, AI assistant, and copilot
These three terms represent an evolution in how AI tools are positioned, and the distinctions matter for understanding what a product actually does.
A chatbot is a text-based conversational interface, often rule-driven, designed for specific, narrow tasks like customer service queries or appointment booking. The concept dates to ELIZA in 1966. Many systems labelled "chatbots" today are far more capable than the term implies.
An AI assistant is broader: a multi-purpose system that can handle a range of tasks across different domains. Siri, Alexa, and modern LLM-based interfaces like ChatGPT and Claude fall into this category.
A copilot is a term popularised by Microsoft and GitHub from 2021 onwards, positioning AI as a collaborative partner embedded in existing workflows. The framing is deliberate: the human remains the pilot, making decisions and maintaining control, while the AI handles supporting tasks. It is a useful framing for thinking about AI adoption, though it is worth noting that the term is as much a marketing choice as a technical descriptor.
How AI learns to behave: safety, alignment, and governance
As AI systems become more capable and more widely deployed, a second vocabulary has emerged around the question of how to ensure they behave responsibly. These terms now appear regularly in board discussions, regulatory documents, and vendor assessments.
AI alignment and AI safety
AI safety is the broad field of research focused on ensuring AI systems do not cause unintended harm, encompassing technical concerns (robustness, reliability, control) and societal ones (misuse prevention, long-term risks).
AI alignment is a more specific concept within AI safety: the effort to ensure that AI systems behave in ways that are consistent with human values and intentions. The concern is straightforward: as AI systems become more capable, the consequences of them pursuing goals that diverge from what we actually want become more severe. The mathematician Norbert Wiener articulated the core worry as early as 1960: we had better be quite sure that the purpose put into the machine is the purpose which we really desire. Stuart Russell's 2019 book Human Compatible provides an accessible treatment of the alignment problem.
In practice, alignment techniques include RLHF (discussed earlier) and Constitutional AI, an approach developed by Anthropic where models are trained using explicit written principles rather than relying purely on human feedback.
Bias and fairness
Bias in AI refers to systematic errors or unfairness in a model's outputs. The NIST Special Publication 1270 (2022) identifies three categories that professionals should understand: systemic bias (reflecting existing inequalities in society's institutions), statistical bias (arising from unrepresentative training data, and notably this can occur in the absence of any prejudice), and human-cognitive bias (systematic errors in the thinking of the people who design, develop, and evaluate AI systems).
NIST makes a point worth internalising: bias is neither new nor unique to AI, and it is not possible to achieve zero risk of bias in an AI system. The question is not whether bias exists but whether it has been identified, measured, and managed appropriately for the context in which the system is being used.
Fairness is the aspiration that AI systems treat individuals and groups equitably. Academic literature distinguishes several types (group fairness, individual fairness, counterfactual fairness, and procedural fairness), and a significant body of mathematical research has shown that some widely used fairness definitions are mutually incompatible, requiring context-specific tradeoffs rather than universal solutions.
Explainability and interpretability
Explainability (or XAI, for explainable AI) refers to the extent to which an AI system's decision-making process can be described in terms humans can understand. Interpretability is a related but distinct concept: where explainability answers how a decision was made, interpretability answers why and what it means in context.
The NIST AI Risk Management Framework draws this distinction clearly: explainability describes the mechanism; interpretability describes the meaning for the user. The US Defence Advanced Research Projects Agency (DARPA) ran a major Explainable AI programme from 2017 to 2021, and the results were mixed. Modern deep learning systems remain largely opaque in their internal reasoning, even to their creators.
This matters for professionals because many high-stakes applications — lending decisions, medical diagnoses, hiring recommendations — require some degree of explanation, both for regulatory compliance and for maintaining trust. If a vendor claims their AI is "explainable," it is worth asking what that means specifically.
Guardrails
Guardrails are the technical and procedural safety controls that establish boundaries for AI behaviour. They operate at three levels: input guardrails filter what the AI receives (blocking harmful prompts or sensitive data), processing guardrails control what the AI can access and do during computation, and output guardrails evaluate responses for problems like toxicity, personally identifiable information leakage, or hallucinated content.
In enterprise contexts, guardrails are what stand between a capable AI model and the specific constraints of your industry, your organisation, and your regulatory environment. They are now a point of competitive differentiation among AI vendors, and worth evaluating closely when assessing AI products for deployment.
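For readers comfortable with a little code, the three levels above can be sketched in a few lines of Python. This is a deliberately simplified illustration: the blocklist and the pattern are hypothetical placeholders, and production guardrails rely on dedicated moderation models and policy engines rather than a handful of rules.

```python
import re

# Illustrative placeholders only; real deployments use moderation
# models and policy engines, not a short blocklist and one regex.
BLOCKED_TOPICS = {"credential harvesting", "malware"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def input_guardrail(prompt: str) -> bool:
    """Return True if the prompt may be passed on to the model."""
    return not any(topic in prompt.lower() for topic in BLOCKED_TOPICS)

def output_guardrail(response: str) -> str:
    """Redact email addresses (a simple PII check) before the
    response reaches the user."""
    return EMAIL_PATTERN.sub("[REDACTED]", response)
```

Even this toy version shows the shape of the idea: one check runs before the model sees anything, another runs before the user sees anything.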
Benchmarks
Benchmarks are standardised tests designed to measure and compare AI model capabilities. They serve the same function as exam results: an imperfect but useful way to assess performance against a common standard.
Benchmarks you may encounter in vendor materials or AI news coverage include MMLU (roughly 16,000 multiple-choice questions across 57 subjects, testing general knowledge), HumanEval (164 Python programming problems, from OpenAI), and GPQA (graduate-level expert questions requiring specialist reasoning). The Stanford HAI AI Index tracks benchmark performance annually.
A word of caution: top models now saturate many of the easier benchmarks, scoring near or above human performance, which has driven the creation of harder tests like Humanity's Last Exam. Benchmark scores are useful for rough comparisons but are a poor guide to how a model will perform on your specific tasks. Real-world evaluation on your actual use cases matters more.
Models and deployment: how AI reaches you
These terms relate to the practical business of how AI models are made available and how they reach end users.
Training, inference, and the cost distinction
Understanding the difference between training and inference is essential for anyone involved in AI budgets or procurement.
Training is the upfront process of building the model: expensive, time-consuming, and typically done once (or occasionally repeated with updated data). Inference is using the trained model to generate outputs in response to new inputs. Every time you send a prompt to ChatGPT or Claude, that is inference.
For organisations deploying AI, inference is the ongoing operational cost — the meter running every time someone uses the system. API pricing is typically per token processed. A model that costs slightly more per query but gives better results on the first attempt may be cheaper in practice than a less capable model that requires multiple attempts. This is the emerging field of token economics, and it is worth understanding if you are involved in technology procurement.
Open-source, closed-source, and open-weights
These terms describe how much access you have to the internals of an AI model, and the distinctions matter more than they might initially appear.
Closed-source (or proprietary) models keep everything hidden — the model's internal parameters, architecture details, training data, and training process. You interact with them only through an API or a web interface. GPT-4, GPT-5, and Claude are closed-source. You can use them, but you cannot see inside them, modify them, or run them on your own infrastructure.
Open-weights models make the model's learned parameters publicly available, allowing anyone to download, run, and fine-tune the model. Meta's Llama, Google's Gemma, and Mistral's models fall into this category. However, the training data, training code, and full training process typically remain hidden. This is an important distinction: you can use the model, but you cannot fully reproduce how it was built.
Truly open-source AI, by the standard definition, would require far more openness than releasing weights alone. The Open Source Initiative published its formal Open Source AI Definition in October 2024, requiring the freedoms to use, study, modify, and share, along with the weights, the training code, and sufficiently detailed information about the training data to allow a skilled person to recreate a substantially equivalent system.
This distinction is not merely semantic. The EU AI Act uses "open source" as a basis for certain regulatory exceptions, which means the question of what counts has real legal consequences. When a vendor describes their model as "open source," it is worth asking exactly what has been released. As Amanda Brock, CEO of OpenUK, has noted, many models marketed under that label do not meet the standard definition.
Edge AI and on-device AI
Edge AI (or on-device AI) refers to AI processing that happens locally — on your smartphone, laptop, car, or IoT sensor — rather than in a remote data centre. When your phone transcribes speech, identifies a face, or suggests text completions without sending data to the cloud, that is edge AI at work.
The advantages are practical: lower latency (no round trip to a server), better privacy (data does not leave the device), and the ability to work offline. The tradeoff is that on-device models must be smaller and more efficient than their cloud counterparts. This is driving significant investment in model distillation (training a compact "student" model to replicate a larger "teacher" model) and quantisation (reducing the numerical precision of a model's parameters to make it smaller and faster).
Apple's Neural Engine, Google's Gemini Nano, and Microsoft's Phi series of small language models are all part of this trend.
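The appeal of quantisation can be shown with simple arithmetic: storing each parameter in fewer bits shrinks the model's memory footprint proportionally. The parameter count below is illustrative of a typical "7B" small model.

```python
# Back-of-envelope memory footprint for a model's weights at different
# numerical precisions; the parameter count is illustrative.
def weight_memory_gb(n_parameters: int, bits_per_parameter: int) -> float:
    return n_parameters * bits_per_parameter / 8 / 1e9

params = 7_000_000_000                 # a typical "7B" small model
fp16 = weight_memory_gb(params, 16)    # ~14 GB: too big for most phones
int4 = weight_memory_gb(params, 4)     # ~3.5 GB: plausible on a device
```

The real engineering challenge, of course, is shrinking the weights without losing too much accuracy; the arithmetic only explains why the effort is worthwhile.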
APIs in the AI context
An API (application programming interface) is a standardised interface that allows developers to integrate AI capabilities into their own applications without building or hosting the models themselves. When a company adds AI-powered features to their product — a smart search function, an automated summary tool, a content generation feature — they are very often making API calls to models hosted by providers like OpenAI, Anthropic, or Google.
API access has democratised AI deployment, making it possible for small companies to offer AI-powered features that would be impossibly expensive to build from scratch. The tradeoff is vendor dependency: your application's AI capabilities are only as reliable and available as the API provider's service. Pricing, terms of service, and model capabilities can change at the provider's discretion.
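To make the idea concrete, here is roughly what the payload of an AI API call looks like, sketched in Python. The endpoint, model name, and field names are generic placeholders, not any specific provider's actual API; real providers document their own request formats.

```python
import json

# A generic sketch of an AI API request. The URL, model name, and
# field names below are hypothetical placeholders.
API_URL = "https://api.example-ai-provider.com/v1/chat"  # hypothetical

payload = {
    "model": "example-model-1",   # which hosted model to use
    "messages": [
        {"role": "user", "content": "Summarise this contract clause..."}
    ],
    "max_tokens": 300,            # caps the billable output length
}
headers = {
    "Authorization": "Bearer YOUR_API_KEY",  # billing and access control
    "Content-Type": "application/json",
}
body = json.dumps(payload)  # the request body a client library would send
```

The pattern is the point: a small structured request goes out, the provider's infrastructure runs the model, and the response comes back, metered per token.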
How the regulators define AI: the terms in law and policy
AI terminology is no longer just a matter of technical precision. It has entered legislation, and the definitions carry legal weight.
The OECD definition
The OECD's definition of an AI system, updated in May 2024, is the most widely adopted global reference. It defines an AI system as a machine-based system that, for explicit or implicit objectives, infers from input how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. The keyword is infers: the ability to go beyond rigid, pre-programmed rules is what distinguishes AI from conventional software. Forty-seven countries have adhered to the OECD's AI principles.
The EU AI Act
The EU AI Act (Regulation 2024/1689), which entered into force on 1 August 2024, is the world's first comprehensive AI legislation. Its definition of an AI system aligns closely with the OECD's but adds emphasis on varying levels of autonomy and the possibility of adaptiveness after deployment, recognising that AI systems can change their behaviour over time.
The Act introduces four risk tiers that professionals should understand. Unacceptable risk applications are prohibited outright, including social scoring systems, manipulative AI targeting vulnerable groups, and most real-time biometric identification in public spaces. High-risk applications are permitted but heavily regulated, covering areas like biometric identification, critical infrastructure, employment decisions, and law enforcement. Limited risk applications face transparency obligations: chatbots must disclose they are AI, and deepfake content must be labelled. Minimal risk applications, the majority of current AI products, are unregulated.
The Act also introduces an AI literacy obligation (Article 4), defined as the skills, knowledge, and understanding needed to make informed decisions about AI systems. This is notable because it places a legal requirement on organisations deploying AI to ensure their staff are adequately informed.
NIST AI Risk Management Framework
The US National Institute of Standards and Technology published its AI Risk Management Framework (AI RMF 1.0) in January 2023, establishing seven characteristics of trustworthy AI: valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed. It is not legislation but a voluntary framework, yet it has become a de facto standard for AI governance in the United States and is widely referenced internationally.
ISO/IEC standards
ISO/IEC 22989:2022 is the international standard for AI concepts and terminology. It provides formal definitions for terms including AI system, AI agent, narrow AI, and AGI. ISO/IEC 42001:2023 is the world's first international AI management system standard (think of it as the ISO 27001 equivalent for AI governance). Organisations seeking formal AI governance frameworks increasingly reference these standards.
Emerging terminology: the concepts entering professional discourse in 2025 and 2026
The AI vocabulary continues to expand. These are the newer terms you are most likely to encounter in the coming months.
Reasoning models and test-time compute
Reasoning models represent a new category of LLMs that break down complex problems through structured, step-by-step reasoning before producing a final answer. Where standard LLMs generate responses immediately, reasoning models spend additional time "thinking": generating intermediate calculations, exploring multiple approaches, and evaluating potential solutions.
The RAND Corporation describes the approach clearly: rather than generating answers immediately, these models engage in explicit step-by-step reasoning, essentially thinking through a problem before answering. Key examples include OpenAI's o1 and o3 series, DeepSeek R1, and Google's Gemini 2.5.
The technical concept behind this is test-time compute (or inference-time compute scaling), which means spending more computational resources during inference to produce better results. The tradeoff is real: reasoning models are significantly more expensive to run (estimates range from 10 to 74 times the cost of standard models). But on complex tasks like mathematical reasoning, they achieve dramatically better results.
The best analogy comes from Daniel Kahneman's framework: traditional LLMs operate like System 1 thinking — fast, intuitive, pattern-matching. Reasoning models implement System 2 thinking — slow, deliberate, analytical. Knowing when a task warrants the slower, more expensive approach is an emerging practical skill.
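A toy "model router" illustrates what that practical skill looks like when automated. The keyword heuristics and the cost multiplier below are invented for illustration; real routing systems use far more sophisticated signals.

```python
# Toy router: send a task to the expensive reasoning model only when it
# looks hard enough to justify the extra inference cost. The signals
# and the multiplier are illustrative assumptions.
REASONING_COST_MULTIPLIER = 30  # somewhere in the 10-74x range

HARD_TASK_SIGNALS = ("prove", "derive", "multi-step", "optimise the schedule")

def choose_model(task: str) -> str:
    if any(signal in task.lower() for signal in HARD_TASK_SIGNALS):
        return "reasoning-model"   # slow, deliberate "System 2"
    return "standard-model"        # fast, intuitive "System 1"
```

The decision logic is trivial here, but the economics are not: routing even a modest share of easy tasks away from a 30x-cost model is a material saving at scale.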
Model Context Protocol (MCP)
Introduced by Anthropic in November 2024, the Model Context Protocol is an open standard for connecting AI systems to external tools, data sources, and services. It has been adopted by OpenAI, Google DeepMind, and a growing ecosystem of third-party developers.
The analogy used most frequently: MCP is like USB-C for AI — a universal adapter that allows any compliant AI application to interact with any compatible data source or service without requiring custom integration code. Before MCP, every connection between an AI system and an external tool required bespoke development. MCP provides a standardised way for AI systems to discover and use tools, access data, and follow workflows.
For organisations building AI-powered systems, MCP matters for architecture and vendor-selection decisions.
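To give a flavour of what the standardisation looks like in practice, here is a sketch of an MCP-style request. MCP is built on JSON-RPC 2.0; the tool name and arguments below are hypothetical examples, not part of the protocol itself.

```python
import json

# Sketch of an MCP-style tool invocation. MCP messages follow JSON-RPC
# 2.0; the tool name and arguments here are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",                   # ask the server to run a tool
    "params": {
        "name": "search_documents",           # hypothetical tool
        "arguments": {"query": "Q3 revenue"}  # tool-specific inputs
    },
}
wire_message = json.dumps(request)  # what actually travels over the wire
```

Because every compliant server speaks this same message format, an AI application can discover and call tools it has never seen before, which is exactly the "universal adapter" promise.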
Shadow AI
Shadow AI is the unsanctioned use of AI tools by employees without formal organisational approval or oversight, the AI equivalent of shadow IT. An employee pastes confidential client data into ChatGPT to draft a report. A team uses an unapproved AI transcription service for sensitive meetings. A department builds workflows around a free AI tool without informing IT or legal.
The scale is significant: research suggests that 38 per cent of employees acknowledge sharing sensitive work information with AI tools without permission, and approximately 90 per cent of enterprise AI usage occurs without security or IT teams' knowledge. IBM's 2025 Cost of Data Breach Report found that organisations with high shadow AI activity suffered roughly $670,000 more in breach costs. This is not a hypothetical risk. It is an operational one, and the terminology is worth knowing because it frames a governance challenge that most organisations are actively grappling with.
Mixture of Experts (MoE)
Mixture of Experts is a model architecture that uses multiple specialised sub-networks — each an "expert" in different aspects of the input — with a gating mechanism that routes each input to the most relevant experts. The key efficiency gain: only a subset of experts activates for any given query, so a model with hundreds of billions of total parameters might only use a fraction of them for each response.
This approach has gained prominence through models like Mixtral, DeepSeek, and Meta's Llama 4 Maverick (400 billion total parameters, 17 billion active per query). It matters because it signals a different approach to model scaling: rather than making every part of a model bigger, you make the model more modular and efficient.
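The routing idea at the heart of MoE can be illustrated in a few lines of Python. The expert scores below are invented; in a real model, a learned gating network produces them.

```python
# Toy gating step for a Mixture of Experts layer: score every expert
# for an input, then activate only the top-k. Scores are invented;
# a real gating network learns them.
def top_k_experts(gate_scores: list[float], k: int = 2) -> list[int]:
    """Return the indices of the k highest-scoring experts; only these
    sub-networks run for this input, the rest stay idle."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:k])

# Eight experts in total, but only two activate for this input.
scores = [0.1, 2.3, -0.5, 0.9, 1.7, -1.2, 0.0, 0.4]
active = top_k_experts(scores, k=2)  # → [1, 4]
```

Six of the eight experts do no work at all for this input, which is where the efficiency gain comes from.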
Small language models (SLMs)
The counterpoint to ever-larger models, small language models are compact models — typically under 10 billion parameters — designed for efficiency and deployment on edge devices. Google's Gemini Nano, Microsoft's Phi series, and Meta's smaller Llama variants fall into this category.
The interest in SLMs reflects a maturing understanding that bigger is not always better. For many practical business applications, a well-fine-tuned small model running cheaply on-device can outperform a massive cloud-based model that is slower, more expensive, and raises data privacy concerns. Research in late 2024 demonstrated that a 3 billion parameter model could outperform a 70 billion parameter model on certain reasoning tasks through improved inference strategies.
Model collapse
Model collapse describes a degenerative process where AI models trained on AI-generated data, rather than human-created data, suffer progressive and irreversible quality degradation. A 2024 paper in Nature by researchers from Oxford, Cambridge, Imperial College London, and the University of Toronto demonstrated that when models are trained on the output of other models across generations, the tails of the original data distribution disappear, diversity is lost, and output quality degrades in ways that cannot be recovered.
This is a long-term concern for the AI industry. As AI-generated content proliferates across the internet, the data available for training future models becomes polluted with synthetic text. The researchers concluded that maintaining access to genuine human-created data will be essential for preserving model quality, a finding with real implications for data strategy, content licensing, and the value of authentic human expertise.
Other terms worth watching
Several additional terms are entering professional discourse with increasing frequency:
AI governance refers to the policies, processes, and structures an organisation puts in place to manage AI use responsibly. It is now a board-level concern and a compliance requirement under frameworks like the EU AI Act.
Responsible AI is the broader set of practices and principles for identifying, assessing, and mitigating the negative impacts of AI systems. Where governance is structural, responsible AI is aspirational, though the two are converging as regulation tightens.
AI red teaming is the practice of systematically testing AI systems for vulnerabilities, dangerous capabilities, and failure modes through adversarial methods, borrowing the concept from cybersecurity. It has become standard practice at major AI labs and is now expected by regulators.
Scaling laws are the empirical observations, first formalised by Kaplan et al. (2020) at OpenAI, that AI model performance improves predictably as you increase compute, data, and parameters. These laws have guided billions of dollars in investment decisions. More recently, the concept has been extended to inference-time scaling, the finding that you can also improve performance by spending more compute at the point of use (the principle behind reasoning models).
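For the mathematically curious, a Kaplan-style scaling law has the shape of a simple power law: predicted loss falls smoothly as parameter count grows. The constants below are of the rough form reported in that line of work, but should be treated as illustrative rather than authoritative.

```python
# Illustrative power law of the form loss ~ (N_c / N) ** alpha.
# The constants are indicative only, not authoritative fitted values.
def predicted_loss(n_parameters: float,
                   n_critical: float = 8.8e13,
                   alpha: float = 0.076) -> float:
    return (n_critical / n_parameters) ** alpha
```

The practical consequence is that the curve is smooth and predictable, which is precisely what made it possible to justify enormous training budgets in advance.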
Synthetic data is artificially generated data that mimics real-world patterns, used for training AI models when real data is insufficient, restricted by privacy regulations, or too expensive to collect. Gartner has predicted that by 2030, synthetic data will overshadow real data in AI model training.
Digital twin is a virtual replica of a physical object, system, or process that is continuously updated with real-time data. AI-enhanced digital twins can predict equipment failures, optimise supply chains, and simulate scenarios. The concept predates AI but has been transformed by it.
Five things the terminology reveals about the state of AI
If you have read this far, a pattern will have emerged. The vocabulary of AI tells a story about the technology, yes, but also about the maturity (and immaturity) of how we are collectively grappling with it.
The hierarchy matters more than individual terms. The most common confusion among professionals is not about any single definition but about how concepts relate to each other. AI contains machine learning, which contains deep learning, which enables generative AI. Foundation models are trained, then fine-tuned, then deployed for inference. RAG connects LLMs to external knowledge. Understanding these relationships is more valuable than memorising definitions in isolation.
Many terms are moving targets, and that is acceptable to say out loud. "Open source" means something different in AI than in traditional software. "Hallucination" is contested by researchers who prefer "confabulation." "Prompt engineering" is already evolving into "context engineering." AGI timelines are a matter of sharp disagreement among the world's foremost researchers. A professional who acknowledges this fluidity demonstrates more understanding than one who pretends the definitions are settled.
The regulatory definitions now carry commercial weight. The OECD's AI system definition has been adopted by the EU, United States, and United Nations. The EU AI Act's risk tiers are reshaping product development. The distinction between open-weights and open-source has regulatory consequences. Professionals need to understand what terms mean legally and commercially, not just technically.
Usage has outstripped understanding, and that is the real risk. Seventy-five per cent of knowledge workers use AI tools. Thirty-nine per cent have received training. Nearly half of leaders say skill gaps are slowing adoption. This is not a niche problem; it is a structural barrier to competent AI adoption across every industry.
The terminology will keep changing, and that is fine. The goal is not to achieve a fixed vocabulary but to develop the conceptual framework that makes new terms intelligible when you encounter them. If you understand what a transformer is, you can quickly grasp what a reasoning model adds to it. If you understand training and inference, you can evaluate claims about test-time compute. The foundations make the moving parts manageable.
Further reading
For those who want to go deeper, these are the most useful authoritative sources we have found:
For definitions and frameworks: Stanford HAI's Brief Definitions of Key Terms in AI by Professor Christopher Manning is concise, authoritative, and regularly updated. The NIST AI Risk Management Framework provides the standard US governance vocabulary. The EU AI Act's Article 3 definitions are the legal reference for organisations operating in or selling to Europe.
For academic foundations: LeCun, Bengio, and Hinton's 2015 Nature review on deep learning remains the best single overview of the field's foundations. Vaswani et al.'s "Attention Is All You Need" (2017) is the paper that launched the transformer era. Bommasani et al.'s Stanford report on foundation models (2021) provides essential context for understanding the current landscape.
For the current state of AI: The Stanford HAI AI Index Report is published annually and provides the most comprehensive data-driven overview of AI progress, adoption, and policy. McKinsey's State of AI survey offers the best view of enterprise AI adoption patterns.
For ongoing learning: We publish a free weekly briefing that covers the most important AI developments in plain language, designed for the same audience as this guide.
Key takeaways
- AI, machine learning, deep learning, and generative AI are not synonyms. They are nested layers, each a subset of the one before. When someone says "AI," asking which kind tells you more about what a product can actually do than any amount of marketing language.
- Modern AI systems predict what sounds right, not what is right. Large language models work by predicting the next most likely token based on statistical patterns. This is why they can produce fluent, confident nonsense, and why every AI output should be treated as a first draft requiring verification, not a reliable source of truth.
- The terminology is deliberately contested, and that is a sign of a maturing field. "Open source" means something different in AI than in traditional software. "Hallucination" is being challenged by researchers who prefer "confabulation." Professionals who can name the disagreement demonstrate more understanding than those who pretend the definitions are settled.
- Regulatory definitions now carry commercial weight. The OECD's AI system definition has been adopted into the EU AI Act, US executive orders, and UN frameworks. Risk tiers are reshaping product development. The distinction between open-weights and open-source has legal consequences. AI vocabulary is no longer just technical. It is legal and financial.
- Understanding the relationships between concepts matters more than memorising individual definitions. Foundation models are trained, then fine-tuned, then deployed for inference. RAG connects models to external knowledge to reduce hallucinations. Guardrails constrain what models can do. Once you grasp how the pieces connect, new terms become intelligible as they arrive, and they will keep arriving.