What AI Actually Is (And What It Is Not)
If you have picked up this article, you probably fall into one of two camps. Either you have been using AI tools and want to understand what is actually happening beneath the surface, or you have been hearing about AI constantly and feel a growing pressure to get up to speed. Both are good reasons to be here.
The trouble is that almost everything written about AI right now is calibrated to make you feel something rather than help you understand something. The breathless headlines, the apocalyptic predictions, the promises of overnight transformation: they make for compelling content, but they are poor teachers. They leave you with strong feelings and weak foundations.
This article is designed to fix that. We are going to walk through what artificial intelligence actually is as a field of science, how the systems you are hearing about actually work, what they can genuinely do today, and what they cannot do despite appearances. By the end, you should be able to hold your own in any professional conversation about AI, ask better questions of vendors and colleagues, and separate real capability from marketing noise.
No technical background required. Just your attention and a willingness to sit with some complexity. AI is not simple, and anyone who tells you otherwise is selling something.
The term is older than you think
The phrase "artificial intelligence" was coined in 1955 by John McCarthy, a mathematician at Dartmouth College. Together with Marvin Minsky, Nathaniel Rochester, and Claude Shannon, McCarthy proposed a summer research workshop based on a bold conjecture: that every aspect of learning or intelligence could, in principle, be described precisely enough for a machine to simulate it. That workshop took place in the summer of 1956, and it gave the field both its name and its founding ambition.
But the intellectual groundwork had been laid earlier. In 1950, the British mathematician Alan Turing published a paper in the journal Mind that opened with a question that still reverberates: "Can machines think?" Rather than attempt a direct answer, Turing proposed what he called the Imitation Game, now known as the Turing Test. If a human judge, communicating through text, cannot reliably tell whether they are talking to a person or a machine, then the machine can be said to exhibit intelligence. Turing predicted that by the end of the twentieth century, people would speak of machines thinking without expecting to be contradicted. He was roughly right about the timeline, if not the mechanism.
Here is the thing worth remembering: AI is not a product that appeared in November 2022 when ChatGPT launched. It is a scientific discipline that has been developing for seventy years. What changed recently was not the existence of the field but the moment the public encountered its outputs directly.
So what does "artificial intelligence" actually mean?
This is a harder question than it sounds, because the term has accumulated multiple meanings that often talk past each other.
In academic research, AI refers to the study and construction of systems that can perform tasks which, if performed by a human, would be considered to require intelligence. The standard textbook, Stuart Russell and Peter Norvig's Artificial Intelligence: A Modern Approach (now in its fourth edition), organises the field into four broad approaches: systems that think like humans (cognitive modelling), systems that act like humans (the Turing Test approach), systems that think rationally (logic-based reasoning), and systems that act rationally (the rational agent approach). Russell and Norvig favour the last of these as the most productive framework, and most working AI researchers would agree. The goal is to build systems that make good decisions given their inputs and objectives, not to replicate human cognition.
In commerce and public conversation, however, "AI" has become something closer to a marketing label. It gets applied to genuinely sophisticated machine learning systems and also to basic automation scripts, statistical models that have existed for decades, and software that simply follows if-then rules with no learning involved. This phenomenon has a name: AI washing. In March 2024, the U.S. Securities and Exchange Commission imposed its first civil penalties for misleading AI claims, charging two investment advisers with making false statements about their use of artificial intelligence. The SEC has since created a dedicated unit to pursue AI-related misconduct, and the FTC launched "Operation AI Comply" in September 2024 to crack down on deceptive claims.
A useful rule of thumb: when someone says "AI," ask what they mean specifically. Are they talking about a system that learns from data and improves over time? Or are they describing conventional software with a fashionable label? The distinction matters enormously for anyone making professional decisions about technology adoption.
A brief history of breakthroughs and winters
The story of AI is not one of steady progress. It is a story of dramatic ambition, painful disappointment, and quiet persistence, repeating in ways that are useful to understand because the pattern inoculates you against both excessive optimism and premature dismissal.
The early decades were full of ambitious demonstrations and even more ambitious promises. In 1957, Frank Rosenblatt built the Perceptron, one of the earliest learning algorithms, loosely inspired by biological neurons. The New York Times reported it as the embryo of a machine that would "be able to walk, talk, see, write, reproduce itself and be conscious of its existence." This was spectacularly premature. In 1969, Minsky and Papert published Perceptrons, demonstrating mathematically that single-layer networks could not solve certain basic problems. Funding dried up. The first "AI winter" had arrived.
The field thawed in the 1980s with expert systems, software that encoded human specialist knowledge as rules. MYCIN, developed at Stanford, could diagnose blood infections as accurately as human specialists. Corporations invested heavily. Then the limitations became clear: expert systems were brittle, expensive to maintain, and could not learn or adapt. By the late 1980s, the second AI winter set in. The term "artificial intelligence" itself became toxic in grant proposals; researchers rebranded their work as "machine learning," "pattern recognition," or "computational intelligence" to secure funding.
What happened next unfolded more quietly. Three forces began converging through the 1990s and 2000s that would eventually enable the current era.
The first was data. The internet generated training material at a scale no one had anticipated. By the late 2000s, datasets like ImageNet, containing over fourteen million labelled images assembled by Fei-Fei Li and her team at Stanford, provided the raw material that learning algorithms needed.
The second was computing power. In 2007, NVIDIA released CUDA, a platform that allowed graphics processors (GPUs), originally designed for video games, to be repurposed for general computation. GPUs turned out to be ideally suited to the kinds of parallel mathematical operations that neural networks require. The cost of training a model dropped by orders of magnitude.
The third was algorithmic refinement. The core technique of backpropagation, a method for training multi-layer neural networks published by Rumelhart, Hinton, and Williams in Nature in 1986, had been known for decades but was impractical without sufficient compute. New innovations like ReLU activation functions, dropout regularisation, and batch normalisation removed bottlenecks that had stalled progress.
These three forces collided in 2012, when a team led by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered the ImageNet Large Scale Visual Recognition Challenge with a deep neural network called AlexNet. It achieved a top-5 error rate of 15.3%, beating the next best competitor by 10.8 percentage points. Trained on just two NVIDIA GTX 580 GPUs, AlexNet demonstrated that deep neural networks, given enough data and compute, could dramatically outperform all other approaches to computer vision. As Fei-Fei Li later reflected, that moment was symbolic because three fundamental elements of modern AI converged for the first time.
Everything since (the transformer architecture, large language models, generative AI) builds on that convergence.
How modern AI systems actually work
If you take away one idea from this entire article, make it this: the core mechanism behind modern AI is learning patterns from data, rather than following rules written by a programmer. That single distinction, learning from examples versus following instructions, is what separates AI from traditional software. Everything else is detail.
Machine learning: the foundation
Machine learning is the branch of AI concerned with systems that improve through experience. Arthur Samuel, who coined the term in 1959 while building a checkers programme at IBM, described it as giving computers the ability to learn without being explicitly programmed. Tom Mitchell later tightened the definition: a programme learns if its performance at some task improves with experience, as measured by some performance metric.
In practice, most machine learning works through supervised learning: the system is given labelled examples (thousands of emails marked "spam" or "not spam," thousands of medical images marked "malignant" or "benign") and learns the patterns that distinguish one category from another. Once trained, it can classify new, unseen examples. This is how spam filters, image recognition, medical diagnostics, and fraud detection systems work.
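The mechanics of supervised learning fit in a few lines of Python. The sketch below trains a toy naive Bayes classifier on six invented messages; the data, the word-counting approach, and the smoothing are all deliberately minimal illustrations, not a production spam filter:

```python
import math
from collections import Counter

# Toy training data: (message, label) pairs, invented for illustration.
training = [
    ("win money now", "spam"),
    ("claim your free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday?", "ham"),
    ("notes from the project meeting", "ham"),
]

# "Training" here is just counting how often each word appears
# under each label.
word_counts = {"spam": Counter(), "ham": Counter()}
label_counts = Counter()
for text, label in training:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = set(word_counts["spam"]) | set(word_counts["ham"])

def classify(text):
    """Naive Bayes with add-one smoothing: pick the label under which
    the observed words are most probable."""
    best_label, best_score = None, float("-inf")
    for label in ("spam", "ham"):
        total = sum(word_counts[label].values())
        # Start from the (log) prior probability of the label.
        score = math.log(label_counts[label] / len(training))
        for word in text.split():
            # Add-one smoothing so unseen words don't zero out the score.
            p = (word_counts[label][word] + 1) / (total + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Trained on those six examples, `classify("free prize money")` comes back as spam and `classify("monday meeting notes")` as ham. A real filter differs in scale and sophistication, not in kind: it is still learning patterns from labelled examples rather than following hand-written rules.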
Unsupervised learning discovers structure in unlabelled data: clustering customers into segments based on purchasing behaviour, for instance, without anyone specifying what the segments should be. Reinforcement learning trains systems through trial and error, rewarding desired outcomes and penalising undesired ones. This is how DeepMind's AlphaGo learned to defeat the world champion at Go in 2016, navigating a game with more possible board positions than there are atoms in the observable universe.
Neural networks: what they are and are not
The most powerful machine learning systems use artificial neural networks, mathematical structures loosely inspired by biological brains. "Loosely" is the key word, and it deserves emphasis.
An artificial neuron is a simple mathematical function. It receives numerical inputs, multiplies each by a learned weight, sums the results, passes that sum through a nonlinear activation function, and outputs a number to the next layer. Neurons are organised into layers: an input layer that receives raw data, one or more hidden layers that extract progressively more abstract features, and an output layer that produces a result.
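In code, an artificial neuron really is just arithmetic. The sketch below uses the logistic sigmoid as its activation (one common choice among several), and the weights are supplied by hand rather than learned, purely to show the mechanics:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs plus a
    bias, passed through a nonlinearity (here the logistic sigmoid)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """A layer is just many neurons reading the same inputs; a
    network is layers feeding into one another."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]
```

With hand-picked values, `neuron([1.0, 0.0], [2.0, -1.0], -1.0)` computes sigmoid(2·1 − 1·0 − 1) ≈ 0.731. Training is the process of adjusting the weights and biases, across millions of such units, until the network's outputs become useful.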
The biological analogy has strict limits, and overstating it leads directly to misconceptions about what AI systems are. The human brain contains roughly 86 billion neurons connected by about 100 trillion synapses, communicating through diverse electrochemical processes involving over a hundred neurotransmitters. Artificial networks typically have thousands to millions of nodes performing simple arithmetic. Backpropagation, the algorithm used to train artificial networks, is likely absent from biological brains. The brain operates on roughly 20 watts, less than a light bulb. Training a large AI model can consume the electrical output of a small power station. A 2020 paper in Neuron put it directly: the dynamic and biophysical properties of biological neural networks are vastly different from those of artificial ones.
The name "neural network" is, in some respects, an unfortunate historical accident. It invites people to imagine a digital brain. What they should imagine instead is a very large, very flexible mathematical function that can be adjusted, through exposure to enormous quantities of data, to produce useful outputs.
Deep learning: why layers matter
"Deep learning" simply means neural networks with many layers. The word "deep" refers to the depth of the architecture, not to any philosophical depth of understanding.
Why does depth help? Because each layer can learn to recognise progressively more abstract features. LeCun, Bengio, and Hinton, the three researchers who shared the 2018 Turing Award for their work on deep learning, described it clearly in their 2015 Nature review: the first layer of a deep network typically learns to detect edges in an image. The second layer detects arrangements of edges (corners, contours, textures). The third layer assembles these into parts of recognisable objects. Subsequent layers detect whole objects as combinations of parts. Crucially, these representations are not designed by engineers; they are learned from data.
This is what made the 2012 AlexNet result matter so much. It demonstrated that given enough data and compute, deep networks would discover useful representations on their own, no human feature engineering required. The entire field pivoted.
Large language models: the systems you are actually using
If you have used ChatGPT, Claude, Gemini, or any similar tool, you have been interacting with a large language model (LLM). These are the systems driving the current wave of public attention, and understanding how they work, even at a high level, changes how you think about them.
The core mechanism is, on its face, almost absurdly simple: next-token prediction. The model is trained on vast quantities of text and learns to predict what word (or, more precisely, what sub-word fragment called a "token") comes next in a sequence. Given the input "The capital of France is," the model assigns probabilities to every possible next token and selects one. That is it. That is the fundamental operation.
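Next-token prediction can be illustrated with the most primitive possible language model: a bigram table built from a three-sentence toy corpus (invented here for illustration). Real LLMs condition on long contexts through learned representations rather than a lookup table, but the core operation, producing a probability distribution over the next token, is the same:

```python
from collections import Counter, defaultdict

# A three-sentence toy corpus; real models train on trillions of tokens.
corpus = ("the capital of france is paris . "
          "the capital of italy is rome . "
          "the capital of spain is madrid .").split()

# Count which token follows which: a bigram table is next-token
# prediction in its most primitive form.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return a probability distribution over possible next tokens."""
    counts = follows[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}
```

Here `predict_next("capital")` returns {"of": 1.0}, while `predict_next("is")` spreads probability evenly across "paris", "rome", and "madrid". An LLM performs the same fundamental operation with a vastly richer notion of context.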
What makes this mechanism powerful is the architecture it runs on and the scale at which it operates. The Transformer, introduced by Vaswani and colleagues at Google in 2017, replaced previous sequential approaches with a mechanism called self-attention, a way for the model to weigh the relevance of every other word in a passage when processing any given word. This change allowed models to process text in parallel rather than one word at a time, which meant they could be trained on far more data, far more quickly. The original paper has now been cited over 173,000 times, making it one of the most influential computer science papers ever published.
The training pipeline for a modern LLM has three stages. First, pre-training: the model processes enormous text corpora (hundreds of billions to trillions of tokens drawn from books, websites, code repositories, and other text sources), learning statistical patterns in language. This stage accounts for over 98% of the computational cost. Second, supervised fine-tuning: the model is trained on curated examples of helpful, high-quality prompt-response pairs to shape its behaviour. Third, reinforcement learning from human feedback (RLHF): human evaluators rank the model's outputs, and a reward model is trained on those preferences to further refine the system's responses. This last stage is surprisingly effective. OpenAI found that a 1.3-billion-parameter model fine-tuned with RLHF was preferred by human evaluators over a raw 175-billion-parameter model.
The scale involved is worth appreciating. GPT-3, released in 2020, had 175 billion parameters trained on roughly 300 billion tokens. GPT-4, released in March 2023, is estimated to have approximately 1.8 trillion parameters across a Mixture of Experts architecture, and its training reportedly cost over $100 million. But the cost of this capability is falling fast. The Stanford HAI AI Index 2025 found that querying a model at GPT-3.5's performance level dropped from $20 per million tokens in November 2022 to $0.07 per million tokens by October 2024, a roughly 280-fold reduction in under two years.
Generative AI: creating rather than classifying
"Generative AI" refers to systems that produce new content (text, images, audio, video, code) rather than classifying or analysing existing content. LLMs are generative (they produce text). Image generators like DALL-E, Midjourney, and Stable Diffusion are generative. So are code completion tools, music generators, and video synthesis systems.
For images specifically, many current systems use diffusion models. The concept is intuitive once you see it: during training, the system gradually adds random noise to images until they become pure static. A neural network then learns to reverse this process, recovering a coherent image from noise. After training, the model can start with pure random noise and iteratively denoise it into a new, coherent image, guided by text descriptions through cross-attention mechanisms. IBM describes the underlying intuition by analogy: think of a drop of ink spreading out in a glass of water, then imagine learning to reverse that process.
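The forward half of that process can be sketched on a single number rather than an image. The constant noise level (beta) below is an arbitrary choice for illustration; the point is that the fraction of surviving signal shrinks geometrically until the original value is effectively gone. A real diffusion model trains a network to run this process in reverse:

```python
import math
import random

def forward_noise(x0, steps=10, beta=0.3, seed=0):
    """Forward diffusion on a single number: each step shrinks the
    signal slightly and mixes in fresh Gaussian noise. The constant
    noise level (beta) is an arbitrary choice for illustration."""
    rng = random.Random(seed)
    x = x0
    signal_kept = 1.0  # fraction of the original signal still present
    for _ in range(steps):
        x = math.sqrt(1 - beta) * x + math.sqrt(beta) * rng.gauss(0, 1)
        signal_kept *= math.sqrt(1 - beta)
    return x, signal_kept
```

After ten steps, only about 17% of the original signal survives (0.7⁵ ≈ 0.168); a few more steps and the value is essentially pure noise. Generation starts from pure noise and applies the learned reverse of this process.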
What AI can do today
The capabilities of modern AI systems are genuine and, in specific domains, impressive. Acknowledging this honestly is just as important as being honest about the limitations.
On standardised benchmarks, frontier models have reached or exceeded human performance across a range of tasks. GPT-4 passed the Uniform Bar Examination at approximately the 90th percentile (though a peer-reviewed re-evaluation suggests the figure drops to roughly the 48th percentile when compared only to candidates who actually passed). On medical licensing examinations, GPT-4o achieved 90.4% overall accuracy across 750 questions, compared to 59.3% for medical students. On the SWE-bench software engineering benchmark, AI systems went from solving 4.4% of real-world programming problems in 2023 to 71.7% in 2024.
In scientific research, some results have changed entire fields. DeepMind's AlphaFold 2 solved the protein-folding problem, a challenge that had stumped biochemistry for fifty years, with median accuracy of 0.96 Å. The AlphaFold database now contains over 200 million predicted protein structures, used by more than three million researchers. Demis Hassabis and John Jumper were awarded the 2024 Nobel Prize in Chemistry for this work. DeepMind's GNoME system discovered 2.2 million new crystal structures, including 381,000 stable materials, equivalent to roughly 800 years of experimental progress.
In healthcare regulation, the U.S. Food and Drug Administration had authorised over 1,000 AI/ML-enabled medical devices by December 2024, up from roughly six in 2015. In enterprise adoption, McKinsey's 2024 global survey found that 78% of organisations now use AI in at least one business function, up from 55% in 2023 and 20% in 2017. The Stanford HAI 2024 report noted that 42% of organisations report cost reductions from AI deployment, and 59% report revenue increases.
These are real capabilities with real economic consequences. The question is not whether AI works. It plainly does, in specific contexts, for specific tasks. The question is what kind of "working" this represents, and where the boundaries are.
What AI cannot do
This section is, in our view, more important than the one that preceded it. The capabilities are real, yes. But the limitations are systematically underreported and widely misunderstood, and if you are making professional decisions about AI, the limitations are where the risk lives.
It does not understand anything
The most important thing to grasp about current AI systems is that they process information without understanding it. This is not a philosophical quibble. It has direct practical implications.
In 2020, linguists Emily Bender and Alexander Koller published an influential paper titled "Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data" that made the case rigorously. Their core argument: a system trained only on the form of language, on the statistical patterns in text, has no way to learn meaning. Language models never encounter the real-world objects, experiences, or communicative intentions that give words their meaning. They learn that certain words tend to follow other words in certain contexts. That is a remarkable capability. It is not understanding.
Bender, together with Timnit Gebru and colleagues, later coined the term "stochastic parrot" to describe what LLMs actually do: stitch together sequences of language they have observed in training data according to statistical probabilities, without reference to meaning. The term was named the 2023 AI-related Word of the Year by the American Dialect Society, which gives you some sense of how deeply the debate has penetrated public discourse.
This is not a fringe position. Apple Research provided direct empirical evidence in a 2024 study: when they changed superficial aspects of grade-school maths problems (different names, different numbers) while keeping the underlying reasoning identical, LLM accuracy dropped significantly. Adding irrelevant information caused accuracy drops of up to 65%. The researchers concluded that they found no evidence of formal reasoning, and that the process was probabilistic pattern-matching rather than logical deduction.
François Chollet, a researcher at Google and the creator of the widely used Keras deep learning framework, has been making a related argument for years. His Abstraction and Reasoning Corpus (ARC) consists of simple visual puzzles that are trivial for humans (most people solve them at roughly 80% accuracy) but extremely difficult for AI systems. Top-performing models achieved only about 31% as of early 2024. A harder version, ARC-AGI-2, released in 2025, further demonstrates the gap. Chollet's argument is that intelligence should be measured as skill-acquisition efficiency, the ability to learn new tasks from minimal examples, rather than pattern recognition at scale.
Gary Marcus, a cognitive scientist at New York University, has been the most persistent public critic of claims about AI understanding. In his 2018 paper "Deep Learning: A Critical Appraisal", he identified ten systematic limitations of deep learning, including its inability to handle hierarchical structure, its difficulty with open-ended inference, and its failure to distinguish correlation from causation. His central claim, that current architectures cannot reliably generalise beyond their training distribution, remains well-supported by evidence.
What does this mean for you as a professional? It means that when an AI system produces an output that reads as though it understands your question, your industry, or your situation, you are observing a sophisticated statistical prediction, not comprehension. The output may be useful, often very useful, but it should be evaluated the way you would evaluate work from a fast, well-read, and confident contractor who has no actual experience in your field.
It fabricates information with confidence
AI hallucination, the generation of factually incorrect content delivered with the same fluency and apparent confidence as accurate content, is not a bug that will be fixed in the next update. It is a structural feature of how these systems work.
Because LLMs generate text by predicting statistically likely next tokens, they are optimised to produce plausible-sounding output, not accurate output. When the training data does not contain a clear answer, the model does not say "I don't know." It generates whatever sequence of words is statistically consistent with the patterns it has learned. Benchmarks typically penalise abstention, and RLHF training amplifies the problem when human evaluators prefer detailed, confident responses over hedged ones.
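The structural nature of the problem shows up even in a toy model. The bigram predictor below (invented sentences, deliberately tiny) has only ever seen facts about the Seine and the Thames; asked about the Danube, it cannot abstain, so it confidently emits the statistically nearest continuation it has:

```python
from collections import Counter

# The model's entire "world": two sentences about two rivers.
corpus = ("the seine flows through paris . "
          "the thames flows through london .").split()

follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, Counter())[nxt] += 1

unigrams = Counter(corpus)

def next_token(prev):
    """Predict the next token. For a context never seen in training,
    fall back to overall word frequencies: the model always answers;
    there is no built-in 'I don't know'."""
    counts = follows.get(prev, unigrams)
    return counts.most_common(1)[0][0]

def complete(prompt, length=1):
    tokens = prompt.split()
    for _ in range(length):
        tokens.append(next_token(tokens[-1]))
    return " ".join(tokens)
```

Here `complete("the danube flows through")` returns "the danube flows through paris": fluent, confident, and wrong, because "paris" is the likeliest continuation of "flows through" in its training data. Real LLMs fabricate for the same underlying reason, at vastly greater sophistication.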
Hallucination rates vary enormously by model and task. On grounded summarisation, where the model has a source document to work from, some systems hallucinate less than 1% of the time. On open-ended factual questions, rates are far higher. Paradoxically, OpenAI's reasoning-focused models have shown higher hallucination rates on some benchmarks (o3 at 33% and o4-mini at 48% on the PersonQA benchmark), possibly because their extended reasoning chains provide more opportunities to drift from established facts. A Stanford study found that LLMs hallucinated at least 75% of the time when asked to identify relevant court rulings for legal questions.
The most widely reported real-world case involved the Mata v. Avianca, Inc. litigation in 2023, in which New York lawyers filed a court brief containing six entirely fabricated case citations generated by ChatGPT, complete with invented quotes, fictitious judges, and non-existent airlines. The presiding judge imposed sanctions. By 2025, courts worldwide have issued hundreds of decisions addressing AI hallucinations in legal filings.
Retrieval-Augmented Generation (RAG), a technique that gives the model access to a curated knowledge base before generating responses, is currently the most effective mitigation, reducing hallucination rates by roughly 71% when properly implemented. But the core challenge persists. As one analysis put it: more data or cleverer prompts will not fix hallucinations while the underlying incentives remain the same.
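The retrieval step in RAG can be reduced to a sketch: rank documents by overlap with the query and prepend the winner to the prompt. Production systems use learned embeddings rather than word overlap, and the document text and prompt format below are invented for illustration:

```python
def retrieve(query, documents):
    """Rank documents by word overlap with the query and return the
    best match: a bare-bones stand-in for the retrieval step, which
    production systems implement with learned embeddings."""
    q = set(query.lower().split())
    return max(documents, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, documents):
    """Ground the model: prepend the retrieved text so generation can
    draw on it rather than on statistical memory alone."""
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."

# Invented documents standing in for a curated knowledge base.
docs = [
    "The refund window is 30 days from delivery.",
    "Standard shipping takes 5 business days.",
]
```

Asked "how long is the refund window", `retrieve` surfaces the refund document, and the model generates against that text instead of its own statistical memory, which is why grounded generation hallucinates far less than open-ended generation.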
The professional implication is clear: never treat AI-generated factual claims as reliable without independent verification. This is not an interim limitation that will disappear with the next model release. It is inherent to how statistical text generation works.
It does not reason about cause and effect
Judea Pearl, a Turing Award-winning computer scientist and one of the founders of modern causal inference, has articulated a fundamental limitation of current AI. His "Ladder of Causation" defines three levels of cognitive ability: association (observing that things go together), intervention (understanding what happens when you change something), and counterfactuals (reasoning about what would have happened under different circumstances). Current AI systems, Pearl argues, operate almost entirely on the first rung. They identify correlations in data. They do not understand why things happen, what would change if you intervened, or what might have been.
This matters for anyone using AI to support decision-making. A model might identify that companies with certain characteristics tend to outperform, but it cannot tell you whether those characteristics cause outperformance or merely correlate with it. It cannot reason about what would happen if you changed one variable while holding others constant. Not because it lacks sufficient data, but because causal reasoning requires a different kind of cognitive architecture than pattern matching.
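Pearl's first rung can be demonstrated with simulated data. In the toy example below (invented numbers, seeded for reproducibility), a hidden confounder drives both variables; a pattern-matching system would report a strong association even though neither variable causes the other:

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(42)
# A hidden confounder (temperature) drives both variables;
# neither one causes the other.
temps = [rng.uniform(0, 30) for _ in range(500)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temps]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temps]
```

With this seed, `pearson(ice_cream_sales, drownings)` comes out strongly positive (above 0.7), yet banning ice cream would not prevent a single drowning. Nothing in the data alone distinguishes this situation from genuine causation; that requires intervention or an explicit causal model, which is Pearl's point.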
It is brittle in ways that benchmarks obscure
AI systems can be surprisingly fragile outside their training distribution. Goodfellow, Shlens, and Szegedy demonstrated in a 2014 paper that adding a tiny, imperceptible perturbation to an image of a panda, so small that no human could detect it, caused a state-of-the-art classifier to label it as a gibbon with 99.3% confidence. These adversarial examples, as they are called, transfer across different model architectures, suggesting they expose fundamental blind spots in how these systems process information.
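The mechanism behind the panda result is easiest to see in a linear toy model, in the spirit of the explanation Goodfellow and colleagues gave: in high dimensions, many imperceptibly small per-coordinate changes, each aligned against the model's weights, sum to a large change in the output. The classifier and input below are invented for illustration:

```python
import random

rng = random.Random(0)
dim = 1000
# A toy linear classifier: score = w . x, positive means "panda".
w = [rng.gauss(0, 1) for _ in range(dim)]
# An input the classifier scores as confidently positive.
x = [0.01 * wi for wi in w]

def score(v):
    return sum(wi * vi for wi, vi in zip(w, v))

# Nudge every coordinate by only 0.02, each against the sign of its
# weight. No single change is perceptible, but across 1,000
# dimensions the effects add up and the decision flips.
eps = 0.02
x_adv = [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]
```

Here `score(x)` is comfortably positive while `score(x_adv)` is negative, even though no coordinate moved by more than 0.02. Image classifiers operate in far higher dimensions still, which is part of why such tiny perturbations can be so effective against them.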
This brittleness extends beyond adversarial attacks. Models that perform superbly on benchmarks can fail unpredictably in real-world deployment, where inputs are messier, more varied, and less like the training data than benchmark test sets. The gap between benchmark performance and production reliability is one of the most underreported aspects of AI capability.
It reflects and amplifies the biases in its training data
AI systems learn from data generated by humans, and that data reflects existing patterns of bias and inequality. Buolamwini and Gebru's 2018 "Gender Shades" study found that commercial facial analysis systems had error rates of up to 34.7% for darker-skinned women compared to just 0.8% for lighter-skinned men. Amazon built and subsequently scrapped a recruiting AI that, having been trained on a decade of résumés predominantly from men, learned to penalise résumés containing the word "women's" and the names of women's colleges. ProPublica's investigation of the COMPAS recidivism algorithm found that Black defendants were nearly twice as likely as white defendants to be incorrectly flagged as high risk.
Subsequent research has shown that it is mathematically impossible for a risk-assessment system to be simultaneously fair by both calibration and error-rate metrics when base rates differ between groups. This is not a problem that more data or better engineering can resolve; it reflects a genuine tension between different definitions of fairness.
It consumes significant energy
Training and running large AI models requires substantial computational resources. Strubell and colleagues estimated in 2019 that training a single large Transformer model with neural architecture search produced CO₂ emissions equivalent to five cars over their entire lifetimes. The International Energy Agency reported in 2025 that data centres consumed approximately 415 terawatt-hours of electricity in 2024 (about 1.5% of global electricity consumption) and projects this will more than double to roughly 945 TWh by 2030, with AI's share rising from 5–15% to 35–50% of data centre power demand. This is not a reason to avoid AI, but it is a cost that responsible adoption should account for.
Five misconceptions worth correcting
Beyond the specific technical limitations, there are several broader misconceptions about AI that persistently distort professional decision-making.
AI is not a single technology. It is an umbrella term covering dozens of distinct techniques: machine learning, natural language processing, computer vision, robotics, expert systems, and many more. Asking "what can AI do?" is a bit like asking "what can chemistry do?" The answer depends entirely on which branch you are talking about and what problem you are applying it to.
AI is not sentient or conscious. A 2023 report co-authored by researchers including David Chalmers and Yoshua Bengio concluded unequivocally that no current AI systems are conscious. The systems produce human-like language, which triggers a well-documented cognitive bias called the ELIZA effect, our tendency to attribute understanding to any system that produces language resembling human communication. The name comes from ELIZA, a 1966 chatbot that used simple pattern matching to mimic a therapist. Its creator, Joseph Weizenbaum, was disturbed to discover that even his own secretary asked him to leave the room for privacy during her conversations with the programme. Weizenbaum wrote that he had not realised how short exposures to a simple programme could induce powerful delusional thinking in normal people. Murray Shanahan of Google DeepMind has warned that casual use of words like "believes" and "thinks" when describing AI systems actively encourages anthropomorphism and obscures how they actually work.
AI is not about to become "artificial general intelligence." AGI, a system matching human-level cognitive flexibility across all intellectual tasks, remains theoretical. The largest survey of AI researchers (2,778 respondents who published at top venues) found a median estimate of a 50% chance of AI outperforming humans at every possible task by 2047. But the uncertainty is enormous: leading researchers' estimates range from 2026 to "not in our lifetimes," and the survey authors noted that seemingly minor changes in question wording produced large shifts in responses. Sam Altman suggests a few thousand days. Yann LeCun insists it will take at least a decade and require entirely new approaches. Stuart Russell emphasises that the alignment problem, ensuring a superintelligent system shares human values, may be a harder challenge than capability itself. For professionals making decisions today, the honest answer is: nobody knows, and anyone who sounds certain is not being straight with you. Every AI system in production today is narrow AI, designed for specific tasks, incapable of transferring skills to genuinely novel domains the way a human can.
"AI-powered" is often a meaningless label. Regression models, statistical analyses, and rule-based automation are routinely marketed as AI. Nearly 200 S&P 500 companies used the term on earnings calls in a recent two-month period. When evaluating a product described as "AI-powered," ask: does it learn from data and improve over time, or is it conventional software with a contemporary label?
AI is not infallible or objective. Because these systems learn from human-generated data, they inherit human biases and can amplify them at scale. Automation bias, our tendency to defer to automated outputs and overlook contradictory evidence, is well documented in research spanning healthcare, law, and public administration. The combination of AI's confident presentation and human deference to automated systems creates a real risk of unexamined error at scale.
Why it seemed to appear overnight
If AI has existed for seventy years, why does it feel like it arrived suddenly in late 2022?
The answer is that ChatGPT was a cultural moment, not a technical breakthrough. Its underlying model, GPT-3.5, was a refinement of GPT-3, which had been available since June 2020, two and a half years before ChatGPT launched. The technical capabilities were already known to researchers and developers. What OpenAI did in November 2022 was package those capabilities into a free, web-based conversational interface that anyone could use without coding, without an API key, without any technical setup at all. ChatGPT reached one million users in five days and 100 million monthly active users by January 2023, the fastest adoption of any consumer application to that point. TikTok took nine months to reach the same milestone. Instagram took roughly two and a half years.
The technology existed. What changed was accessibility. And that distinction matters, because it means the professionals who encountered AI for the first time through ChatGPT are seeing not the birth of a new technology but the public debut of capabilities that had been developing for years. Understanding that timeline helps you calibrate your response: this is not moving as fast as it feels, but it has been building longer than most people realise.
Making sense of the terminology
One final source of confusion worth addressing: the proliferation of overlapping terms that are often used interchangeably despite meaning different things.
The simplest way to think about the relationship is as a set of nesting circles. Artificial intelligence is the broadest category: any system that performs tasks typically requiring human intelligence. Machine learning is a subset of AI, covering systems that learn from data rather than following explicit rules. Deep learning is a subset of machine learning, using multi-layered neural networks to learn hierarchical representations. Generative AI is an application of deep learning, focused on creating new content rather than classifying existing content.
Not all AI involves machine learning; rule-based expert systems are AI but do not learn. Not all machine learning involves deep learning; decision trees and support vector machines are ML techniques that do not use neural networks. And not all deep learning is generative; an image classifier uses deep learning but does not create new images.
AI is also distinct from automation, which follows predetermined rules without learning or adapting (a dishwasher is automated but not intelligent), and from robotics, which involves physical machines that may or may not incorporate AI (many industrial robots follow fixed programming with no learning component).
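The distinction that matters most in practice, and the one to probe when a vendor says "AI-powered", is whether the system's behaviour is hand-written or learned from data. The toy contrast below makes that concrete: both functions classify messages as spam, but only the second derives its behaviour from labelled examples. The word-counting approach is a deliberately naive illustration invented for this sketch, not a production technique.

```python
from collections import Counter

# A hand-written rule system: "AI" in the broadest marketing sense,
# but not machine learning, because nothing is learned from data.
def rule_based_is_spam(text: str) -> bool:
    lowered = text.lower()
    return "free money" in lowered or "winner" in lowered

# A minimal learned classifier: its keyword scores come from labelled
# examples rather than hand-written rules.
def train(examples):
    """Count how often each word appears in spam vs non-spam messages."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in examples:
        (spam_counts if is_spam else ham_counts).update(text.lower().split())
    return spam_counts, ham_counts

def learned_is_spam(text, model) -> bool:
    """Classify by comparing total spam vs non-spam word frequencies."""
    spam_counts, ham_counts = model
    words = text.lower().split()
    spam_score = sum(spam_counts[w] for w in words)
    ham_score = sum(ham_counts[w] for w in words)
    return spam_score > ham_score

examples = [
    ("claim your free money now", True),
    ("you are a winner claim now", True),
    ("meeting moved to tuesday", False),
    ("lunch on tuesday?", False),
]
model = train(examples)
print(learned_is_spam("free money winner", model))  # True
print(learned_is_spam("tuesday meeting", model))    # False
```

Change the training examples and the second classifier's behaviour changes with no code edits; the first classifier never changes unless a human rewrites it. That is the line between conventional software and machine learning, however fluently a product description blurs it.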
Perhaps the most useful framing comes from philosopher John Searle's Chinese Room argument, proposed in 1980 and directly relevant to modern LLMs. Searle asks you to imagine a person in a room who receives messages in Chinese, consults a rule book to look up appropriate responses, and sends back answers in Chinese, all without understanding a word of the language. The person is processing symbols according to rules; they are not comprehending meaning. Searle argued that this is essentially what computers do: computation provides syntax but never semantics. In 2025, a philosophical analysis from UNSW suggested that understanding might be better conceived as a spectrum rather than a binary property, and that current LLMs exist somewhere in the middle. That feels right to us. But for practical decision-making, the important point is that wherever these systems fall on that spectrum, they are much closer to the "syntax without semantics" end than their confident, fluent outputs suggest.
What this means for you
The evidence assembled here points in two directions at once, and holding both is the whole job.
The capabilities are real and consequential. AI systems can already draft documents, analyse data, generate code, identify patterns in medical images, accelerate scientific research, and automate a wide range of knowledge-work tasks. These are not parlour tricks. They represent genuine productivity gains in specific, well-defined contexts. Ignoring them is a mistake.
But the capabilities are also narrow, statistical, and brittle in ways that benchmark scores alone cannot reveal. These systems do not understand, reason, or know in any human sense. They hallucinate. They reflect biases in their training data. They fail unpredictably when inputs diverge from what they have seen before. Trusting them uncritically is also a mistake.
And then there is the noise. AI washing obscures what products actually do. Anthropomorphism encourages misplaced trust. AGI timelines that even world experts cannot agree on distort investment decisions. The gap between what AI actually is and what public discourse claims it to be has become a material risk for anyone making decisions. The professionals best positioned to benefit from AI are those who understand it clearly enough to use it confidently where it works and sceptically where it does not.
Every decision about AI adoption should be grounded in what the technology demonstrably does in your specific context, not in extrapolations from headlines, marketing claims, or predictions about a general intelligence that remains firmly theoretical. The goal is to be competent. Not excited. Not afraid. Competent.
That is what the rest of this library is designed to help you become.
Key takeaways
AI is a seventy-year-old scientific discipline, not a product that appeared in 2022. What changed recently was public accessibility, not the fundamental technology.
Modern AI systems work by learning statistical patterns from data, primarily through neural networks and, for language, through next-token prediction. This is a powerful mechanism, but it is not understanding, reasoning, or comprehension in any human sense.
Current capabilities are genuine and, in specific domains, impressive: AI has meaningfully improved medical diagnosis, code generation, scientific discovery, and many professional tasks. These are narrow capabilities, however. Each system excels at its specific task and cannot generalise to novel domains.
The limitations are structural, not incidental. Hallucination, bias, brittleness, the absence of causal reasoning, and the lack of genuine understanding are features of how these systems work, not bugs awaiting a fix. They should inform every decision about how and where to deploy AI.
The terminology matters. "AI" is an umbrella covering many distinct techniques. "AI-powered" is often a marketing label. AGI remains theoretical. When someone says "AI," ask what they mean specifically. That question alone will improve the quality of every subsequent conversation.
Further reading
For readers who want to go deeper into the topics covered here, we recommend the following:
On how AI works: Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (4th edition, 2021) — the definitive textbook, accessible to motivated non-specialists. Melanie Mitchell, Artificial Intelligence: A Guide for Thinking Humans (2019) — an excellent non-technical overview that takes the limitations seriously.
On the limitations: Emily Bender and Alexander Koller, "Climbing towards NLU" (ACL 2020) — the key paper on why language models do not understand meaning. Gary Marcus and Ernest Davis, Rebooting AI: Building Artificial Intelligence We Can Trust (2019) — a rigorous case for what current approaches cannot do.
On the societal implications: Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect (2018) — essential for understanding why correlation-based AI cannot replace causal reasoning. The Stanford HAI AI Index (annual) — the most comprehensive data-driven overview of the field's progress and impact.
On bias and fairness: Joy Buolamwini and Timnit Gebru, "Gender Shades" (2018) — the landmark study on racial and gender bias in commercial AI systems. Cathy O'Neil, Weapons of Math Destruction (2016) — an accessible account of how algorithmic systems can reinforce inequality.