LARGE LANGUAGE MODELS
What large language models are, how they work, and what they can and can't do — explained for technology and business decision-makers.
Large language models are the technology behind ChatGPT, Claude, Gemini, and the rest of the current wave of AI systems. They are pretrained by predicting the next token (a word or word fragment) in a sequence across enormous text datasets, and are then typically fine-tuned to follow instructions. That simple objective, at sufficient scale, produces systems capable of complex reasoning, code generation, translation, summarization, and much more.
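To make the training objective concrete, here is a toy sketch in Python. It is not how an LLM is built: a real model is a neural network (a transformer) with billions of learned parameters, while this hypothetical bigram counter simply tallies which token follows which in a tiny made-up corpus. It does illustrate the same two ideas: the model assigns a probability to every possible next token, and training aims to lower the average negative log-probability of the token that actually came next. The function names are illustrative only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, already split into tokens (here, whole words).
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# "Training": count how often each token follows each other token.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1


def next_token_distribution(prev: str) -> dict[str, float]:
    """Estimated probability of each next token, given the previous token."""
    seen = counts.get(prev, Counter())
    total = sum(seen.values())
    return {tok: n / total for tok, n in seen.items()}


def generate(start: str, length: int) -> list[str]:
    """Greedy generation: repeatedly append the most probable next token."""
    tokens = [start]
    for _ in range(length):
        dist = next_token_distribution(tokens[-1])
        if not dist:  # no known continuation for this token
            break
        tokens.append(max(dist, key=dist.get))
    return tokens


def avg_negative_log_likelihood(tokens: list[str]) -> float:
    """The quantity training tries to minimize; lower means better predictions.

    Assumes every adjacent pair was seen during training (no smoothing).
    """
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log(next_token_distribution(p)[n]) for p, n in pairs) / len(pairs)
```

Generation is this prediction applied over and over. The sketch always picks the single most probable token (greedy decoding); deployed systems usually sample from the distribution instead, which is one reason the same prompt can produce different answers.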
The key architectural innovation is the transformer (introduced in the 2017 paper "Attention Is All You Need"). Unlike earlier recurrent networks, which read a sequence one step at a time, a transformer processes all positions of a long sequence in parallel during training, which lets it scale efficiently with compute. Every major frontier AI system today is built on the transformer architecture or a close derivative.
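The core operation from that paper, scaled dot-product attention, computes softmax(Q·Kᵀ / sqrt(d_k))·V, where each position's query vector is compared against every position's key vector and the resulting weights average the value vectors. The minimal pure-Python sketch below uses toy hand-picked vectors rather than learned ones, and omits the multiple heads, learned projections, and masking of a real transformer. It shows why the design parallelizes well (each output row depends only on its own query plus the shared keys and values) and also its main cost: the scores form an n × n grid, so work grows with the square of sequence length.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(
    Q: list[list[float]], K: list[list[float]], V: list[list[float]]
) -> tuple[list[list[float]], list[list[float]]]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q and K hold one d_k-dimensional vector per position; V holds one value
    vector per position. Returns (outputs, attention weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:  # each query row is independent, so rows can run in parallel
        # One score per key: across all queries this is the n x n grid.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, V)) for i in range(len(V[0]))])
    return outputs, weights
```

For example, `attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])` gives the first key a larger weight than the second, because that key points in the same direction as the query, so the output leans toward the first value vector.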
Capabilities and limits: LLMs are impressive at pattern matching, synthesis, and fluent generation. They are unreliable at precise arithmetic, exact factual recall, and tasks requiring strict logical consistency. They hallucinate, generating plausible-sounding but false information, at a rate that varies by model, task, and prompt design. Understanding this is essential for anyone deploying LLMs in production.
The economics: Training a frontier model costs hundreds of millions of dollars. Inference (running a trained model to answer requests) is far cheaper per request, and its price keeps falling. This creates a clear stratification: a handful of labs can train frontier models, while everyone else builds applications on top of their APIs. The strategic question for most businesses is not whether to train their own model but which API to build on and how to avoid lock-in.
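One common way to limit lock-in is to write application code against a thin interface of your own and keep each vendor behind an adapter. The sketch below assumes a hypothetical `ChatModel` interface with a single `complete` method; `VendorAStub`, `VendorBStub`, and `UnavailableStub` are placeholders that return canned text or simulate an outage, not real SDK clients. Real adapters would also have to reconcile differences in parameters, pricing, and model behavior.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical app-owned interface: the only model surface the app depends on."""

    def complete(self, prompt: str) -> str: ...


class VendorAStub:
    """Placeholder adapter; a real one would call vendor A's SDK here."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"


class VendorBStub:
    """Placeholder adapter for a second vendor."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"


class UnavailableStub:
    """Simulates a provider that is currently down."""

    def complete(self, prompt: str) -> str:
        raise ConnectionError("provider unavailable")


def complete_with_fallback(models: list[ChatModel], prompt: str) -> str:
    """Try each provider in order and return the first successful answer."""
    last_error = None
    for model in models:
        try:
            return model.complete(prompt)
        except Exception as err:  # a real system would catch narrower errors
            last_error = err
    raise RuntimeError("all providers failed") from last_error


def summarize_ticket(models: list[ChatModel], ticket: str) -> str:
    """Business logic written against the interface, not a specific vendor."""
    return complete_with_fallback(models, f"Summarize this support ticket:\n{ticket}")


# Which vendors to use, and in what order, is configuration rather than code.
PROVIDER_ORDER: list[ChatModel] = [UnavailableStub(), VendorAStub(), VendorBStub()]
```

With this structure, switching or adding a provider touches one adapter and the configuration list rather than every call site, and the fallback order reduces exposure to any single vendor's outages.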
Context length and retrieval as architecture choices: One of the most consequential recent developments is the expansion of context windows, from a few thousand tokens to hundreds of thousands in current frontier models, and a million or more in some. This changes how LLM applications are architected: problems previously solved by retrieval-augmented generation (chunking documents, embedding them, fetching relevant pieces at query time) can sometimes be solved by simply loading the entire document into context. The tradeoff is cost and latency, since every token in the context is processed, and usually billed, on every request. Deciding when retrieval is necessary and when long context is sufficient is now a core engineering judgment in LLM application design, with significant implications for system complexity, cost at scale, and the freshness of the information the model can access.
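The choice can be sketched as a simple rule: if the material fits within the context budget, send all of it; otherwise chunk it, embed the chunks, and send only the best matches. This is a minimal illustration under stated assumptions: `CONTEXT_BUDGET_TOKENS` is an arbitrary hypothetical budget, the four-characters-per-token estimate is a rough rule of thumb for English text, and the bag-of-words `embed` function stands in for a learned embedding model and vector database.

```python
import math
import re
from collections import Counter

CONTEXT_BUDGET_TOKENS = 8_000  # hypothetical token budget for source material


def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about 4 characters per token for English."""
    return math.ceil(len(text) / 4)


def chunk(text: str, words_per_chunk: int = 50) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]


def embed(text: str) -> Counter:
    """Stand-in for a learned embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(document: str, query: str, k: int = 3) -> list[str]:
    """Chunk, embed, and return the k chunks most similar to the query."""
    # A real system builds this index once, ahead of time, in a vector store.
    index = [(c, embed(c)) for c in chunk(document)]
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [c for c, _ in ranked[:k]]


def build_context(document: str, query: str, budget: int = CONTEXT_BUDGET_TOKENS) -> tuple[str, str]:
    """Send the whole document if it fits the budget; otherwise retrieve."""
    if estimate_tokens(document) <= budget:
        return "long-context", document
    return "retrieval", "\n\n".join(retrieve(document, query))
```

In practice the cutoff would also weigh per-request cost, latency targets, and how often the underlying documents change, since a retrieval index must be rebuilt to reflect new information.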
The commoditization trajectory: Capabilities that required a frontier model such as GPT-4 in 2023 are available in 2025 from open-weight models that can run locally. This commoditization has so far been fast and predictable. For businesses building on LLM APIs, it means competitive advantage cannot rest on exclusive access to a particular model; it must rest on proprietary data, workflow integration, or distribution. The model is infrastructure; the moat is elsewhere.