Definition
In-context learning (ICL) is the ability of large language models to perform tasks by conditioning on examples or instructions provided in the prompt, without any gradient updates or fine-tuning. The model adapts its behavior purely through forward-pass computation on the input context.
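Concretely, ICL just means concatenating labeled demonstrations and a new query into one prompt and letting the model continue the text. A minimal sketch (the helper name `build_few_shot_prompt` and the input/output format are illustrative, not a standard API):

```python
def build_few_shot_prompt(demonstrations, query):
    """Format (input, output) demonstrations plus a new query into a
    single prompt string; the model is expected to continue the text
    after the final "Output:" with its prediction. No weights change."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demonstrations]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# English-to-French word translation from two demonstrations.
demos = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt(demos, "book")
print(prompt)
```

The entire "task specification" lives in the prompt string; swapping in different demonstrations retargets the same frozen model to a different task.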
Key Intuition
A sufficiently large pretrained model has seen enough examples during training that, given a few demonstrations of a new task in its prompt, it can infer the pattern and continue accordingly. The prompt effectively programs the model at inference time.
History/Origin
GPT-2 (Radford et al., 2019) first hinted at in-context learning, demonstrating that a language model could perform tasks like translation and summarization in a zero-shot setting without explicit training. GPT-3 (Brown et al., 2020) made ICL a central focus, systematically evaluating few-shot (demonstrations in the prompt), one-shot (a single example), and zero-shot (instructions only) performance across dozens of benchmarks. GPT-3’s 175B parameters proved sufficient for few-shot ICL to rival fine-tuned models on many tasks.
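The three evaluation settings differ only in how much task information the prompt carries. A sketch of the prompt formats (the translation pair and `=>` delimiter are illustrative, loosely following the formatting style of the GPT-3 evaluations):

```python
task = "Translate English to French:"
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
query = "mint"

# Zero-shot: a natural-language instruction only.
zero_shot = f"{task}\n{query} =>"

# One-shot: the instruction plus a single demonstration.
one_shot = f"{task}\n{demos[0][0]} => {demos[0][1]}\n{query} =>"

# Few-shot: the instruction plus several demonstrations.
demo_block = "\n".join(f"{src} => {tgt}" for src, tgt in demos)
few_shot = f"{task}\n{demo_block}\n{query} =>"

print(few_shot)
```

In each case the model's completion after the final `=>` is taken as its answer; no parameters are updated between settings.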
Relationship to Other Concepts
ICL is an alternative to fine-tuning for task adaptation, requiring no gradient updates. It builds directly on pretraining quality and scale. Chain-of-thought prompting extends ICL by including reasoning steps in demonstrations. ICL’s effectiveness at scale connects to scaling-laws, as it appears to be an emergent capability of sufficiently large models.
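The difference between a standard ICL demonstration and a chain-of-thought demonstration is only in the demonstration text: the latter spells out intermediate reasoning before the answer. A minimal sketch (the arithmetic word problem is illustrative):

```python
question = ("Q: Roger has 5 tennis balls. He buys 2 more cans of "
            "tennis balls. Each can has 3 tennis balls. How many "
            "tennis balls does he have now?")

# Standard ICL demonstration: the final answer only.
standard_demo = f"{question}\nA: The answer is 11."

# Chain-of-thought demonstration: the same answer, preceded by the
# intermediate reasoning steps the model is encouraged to imitate.
cot_demo = (f"{question}\n"
            "A: Roger started with 5 balls. 2 cans of 3 tennis balls "
            "each is 6 balls. 5 + 6 = 11. The answer is 11.")

print(cot_demo)
```

Prompted with demonstrations in the second style, the model tends to produce its own reasoning chain before answering a new question, which is the mechanism by which chain-of-thought extends plain ICL.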
Notable Results
GPT-3 few-shot achieved 86.4% on LAMBADA (vs. 76.2% zero-shot), closed the gap with fine-tuned BERT on SuperGLUE, and performed two-digit arithmetic with high accuracy from examples alone. Subsequent work showed ICL performance improves log-linearly with model scale.
Open Questions
- The mechanistic explanation for how ICL works internally (implicit Bayesian inference, mesa-optimization, or task retrieval from training data).
- Why ICL is sensitive to example ordering, formatting, and label distributions.
- Whether ICL and fine-tuning learn fundamentally different representations for the same task.