Definition

In-context learning (ICL) is the ability of large language models to perform tasks by conditioning on examples or instructions provided in the prompt, without any gradient updates or fine-tuning. The model adapts its behavior purely through forward-pass computation on the input context.

Key Intuition

A sufficiently large pretrained model has seen enough examples during training that, given a few demonstrations of a new task in its prompt, it can infer the pattern and continue accordingly. The prompt effectively programs the model at inference time.
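The few-shot setup described above can be sketched as plain string concatenation. This is an illustrative example only; the task, labels, and `Input:`/`Output:` formatting are hypothetical choices, not a format prescribed by any particular model.

```python
# Illustrative sketch: a few-shot ICL prompt is just demonstrations
# concatenated with the new query. No gradient updates are involved;
# the model conditions on this string at inference time.

def build_prompt(demonstrations, query, instruction=""):
    """Join an optional instruction, input/output demonstrations,
    and the new query into a single prompt string."""
    parts = [instruction] if instruction else []
    for x, y in demonstrations:
        parts.append(f"Input: {x}\nOutput: {y}")
    # The prompt ends mid-pattern so the model continues with the label.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

demos = [
    ("The movie was wonderful.", "positive"),
    ("A dull, lifeless film.", "negative"),
]
prompt = build_prompt(demos, "An instant classic.",
                      instruction="Classify the sentiment of each review.")
```

Dropping the demonstrations and keeping only the instruction yields the zero-shot variant; keeping a single demonstration yields one-shot.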

History/Origin

GPT-2 (Radford et al., 2019) offered an early hint of in-context learning, demonstrating that a language model could perform tasks such as translation and summarization zero-shot, without task-specific training. GPT-3 (Brown et al., 2020) made ICL a central focus, systematically evaluating few-shot (several demonstrations in the prompt), one-shot (a single example), and zero-shot (instructions only) performance across dozens of benchmarks. GPT-3’s 175B parameters proved sufficient for few-shot ICL to rival fine-tuned models on many tasks.

Relationship to Other Concepts

ICL is an alternative to fine-tuning for task adaptation, requiring no gradient updates. It builds directly on pretraining quality and scale. Chain-of-thought prompting extends ICL by including intermediate reasoning steps in the demonstrations. ICL’s effectiveness connects to scaling laws, as it appears to be an emergent capability of sufficiently large models.
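To make the chain-of-thought distinction concrete, the contrast below shows a plain demonstration next to one that includes intermediate reasoning. The question and wording are hypothetical examples, not drawn from any cited paper.

```python
# A plain ICL demonstration maps question directly to answer;
# a chain-of-thought demonstration spells out the reasoning first.

plain_demo = (
    "Q: If there are 3 boxes with 4 apples each, how many apples?\n"
    "A: 12"
)

cot_demo = (
    "Q: If there are 3 boxes with 4 apples each, how many apples?\n"
    "A: Each box holds 4 apples and there are 3 boxes, "
    "so 3 * 4 = 12. The answer is 12."
)
```

Including demonstrations like `cot_demo` in the prompt encourages the model to emit its own reasoning steps before the final answer.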

Notable Results

GPT-3 few-shot achieved 86.4% on LAMBADA (vs. 76.2% zero-shot), closed the gap with fine-tuned BERT on SuperGLUE, and performed two-digit arithmetic with high accuracy from examples alone. Subsequent work showed ICL performance improves log-linearly with model scale.
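The arithmetic result above came from prompts built purely out of worked examples. The sketch below shows the general shape of such a prompt; the exact formatting used in the original evaluation may differ.

```python
# Sketch of a few-shot two-digit addition prompt: solved examples
# followed by an unsolved query for the model to complete.
examples = [(24, 57), (68, 13), (45, 39)]
lines = [f"Q: What is {a} plus {b}? A: {a + b}" for a, b in examples]
lines.append("Q: What is 52 plus 31? A:")
prompt = "\n".join(lines)
```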

Open Questions

  • The mechanistic explanation for how ICL works internally (implicit Bayesian inference, mesa-optimization, or task retrieval from training data).
  • Why ICL is sensitive to example ordering, formatting, and label distributions.
  • Whether ICL and fine-tuning learn fundamentally different representations for the same task.

Sources

  • Brown et al. (2020), “Language Models are Few-Shot Learners”
  • Radford et al. (2019), “Language Models are Unsupervised Multitask Learners”