Glossary

Large Language Model (LLM)

A large language model (LLM) predicts the next word. Billions of times in a row. What emerges sounds like understanding — but it is statistics. Trained on billions of texts, LLMs recognize relationships between words and generate coherent language from those patterns.
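The core loop can be sketched in a few lines: score every candidate token, turn the scores into probabilities, and emit the likeliest one. The vocabulary and the scores below are invented purely for illustration; a real model computes logits over tens of thousands of tokens with a neural network.

```python
import math

# Toy next-token step. A real LLM produces one score ("logit") per
# vocabulary entry; softmax turns scores into probabilities.
vocab = ["cat", "mat", "sat", "the"]   # hypothetical tiny vocabulary
logits = [1.2, 2.0, 3.5, 0.3]          # hypothetical model scores

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]
print(next_token)  # "sat" has the highest score, so it is predicted
```

Repeating this step, feeding each predicted token back in as new context, is all that "generation" is.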

Transformer: the architecture behind it

The foundation is the 2017 paper "Attention Is All You Need". Its core mechanism is self-attention: for every word, the model weighs which other words in the context are relevant. Because this computation parallelizes well, training can run on massive datasets, which is what made models with trillions of parameters possible in the first place.
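A minimal sketch of the scaled dot-product attention described above, with tiny hand-picked vectors. Real models learn separate query/key/value projections and run many attention heads in parallel; this is just the weighting step.

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: each output is a weighted mix of V."""
    d = len(K[0])
    out = []
    for q in Q:
        # How relevant is every other position to this one?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)  # weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three toy "token" vectors, chosen by hand for illustration.
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = attention(Q, K, V)  # one context-aware vector per token
```

Each row of `ctx` blends information from all three positions, which is exactly how a token "sees" its context.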

GPT, Claude, Llama — the differences

GPT-4 from OpenAI is estimated at around 1.8 trillion parameters in a mixture-of-experts architecture, with only about 280 billion active per query; OpenAI has not confirmed these figures. Claude from Anthropic does not publish parameter counts, focusing instead on safety and long context windows. Llama 3.1 from Meta is open-weight, available in sizes from 8 to 405 billion parameters and trained on over 15 trillion tokens.

Powerful, but not reliable

LLMs write, summarize, translate, and code. They do not understand a single sentence. They calculate probabilities. That makes them powerful — and dangerous when factual accuracy matters.

The solution: RAG systems connect LLMs with real data, and prompt engineering steers the output. Together they can substantially reduce hallucinations.
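The RAG idea in miniature: fetch the most relevant snippet first, then ground the prompt in it. The documents here are hypothetical, and naive word overlap stands in for the embedding similarity and vector database a production system would use.

```python
# Hypothetical knowledge snippets standing in for a real document store.
docs = [
    "The transformer architecture was introduced in 2017.",
    "Llama 3.1 was trained on over 15 trillion tokens.",
]

def retrieve(question, documents):
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

question = "How many tokens was Llama 3.1 trained on?"
context = retrieve(question, docs)

# The retrieved snippet is injected into the prompt so the model answers
# from supplied facts rather than from its statistical memory alone.
prompt = (
    f"Answer using only this context:\n{context}\n\n"
    f"Question: {question}"
)
```

The prompt would then be sent to the LLM; because the answer is present in the context, the model no longer has to guess.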

Questions about a term?

We are happy to explain what this means for your business.

Schedule a consultation