Definition
Chain-of-thought (CoT) prompting is a technique where the model is encouraged to generate intermediate reasoning steps before producing a final answer. By decomposing complex problems into sequential steps, CoT dramatically improves performance on tasks requiring multi-step reasoning.
Key Intuition
Standard prompting asks a model to jump directly to an answer, which fails on problems requiring arithmetic, logic, or multi-step inference. CoT gives the model “scratch space” to work through the problem, mirroring how humans solve complex problems by writing out their reasoning. The intermediate tokens carry forward information that would otherwise be lost.
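The contrast between the two prompting styles can be sketched as plain string construction. The question text and the worked example below are illustrative inventions, and the actual model call is assumed to live behind some generation API not shown here:

```python
# Hypothetical question for illustration only.
QUESTION = (
    "A shop sells pens in packs of 12. Maria buys 4 packs "
    "and gives away 9 pens. How many pens does she have left?"
)

# Standard prompting: ask the model to jump directly to an answer.
standard_prompt = f"Q: {QUESTION}\nA:"

# Few-shot CoT prompting: a worked example demonstrates the intermediate
# reasoning steps, giving the model "scratch space" to imitate before
# committing to a final answer.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
    "\n"
    f"Q: {QUESTION}\nA:"
)
```

Both prompts end at "A:", so the model's continuation is either the bare answer or, in the CoT case, a reasoning trace followed by the answer.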
History/Origin
Wei et al. (2022) at Google introduced chain-of-thought prompting (see chain-of-thought-paper), demonstrating that including worked, step-by-step examples in few-shot prompts unlocked dramatic improvements on math and reasoning benchmarks. Kojima et al. (2022) showed that zero-shot CoT, simply appending “Let’s think step by step” to the prompt, also works. Wang et al. (2022) introduced self-consistency, which samples multiple reasoning paths and takes the majority-vote answer.
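The self-consistency decoding procedure is simple enough to sketch directly. This is a minimal illustration, not the paper's implementation: `sample_fn` stands in for any model call that samples a reasoning path at nonzero temperature, and `extract_answer` is a toy extractor assuming completions end with "The answer is ...":

```python
from collections import Counter

def extract_answer(completion: str) -> str:
    """Toy extractor: take the text after the last 'The answer is' marker."""
    return completion.rsplit("The answer is", 1)[-1].strip().rstrip(".")

def self_consistency(sample_fn, prompt: str, n: int = 10) -> str:
    """Sample n independent reasoning paths and return the majority-vote answer."""
    answers = [extract_answer(sample_fn(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Usage with a stubbed sampler standing in for a real model:
_completions = iter([
    "Roger starts with 5 balls. 2 cans of 3 is 6. 5 + 6 = 11. The answer is 11.",
    "5 plus 2 times 3 is... The answer is 12.",
    "2 cans of 3 balls is 6; 5 + 6 = 11. The answer is 11.",
])
print(self_consistency(lambda p: next(_completions), "Q: ...", n=3))  # → 11
```

The key design point is that paths are aggregated by their final answer, not their reasoning text, so two traces that reach 11 by different steps still count as the same vote.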
Relationship to Other Concepts
CoT is a form of in-context-learning that structures the model’s generation process. Its effectiveness is tied to scaling-laws: CoT reasoning appears to be an emergent ability in models above roughly 100B parameters. CoT also connects to the broader theme of test-time compute scaling, where spending more tokens at inference improves results, and it was foundational to later reasoning models like o1 and o3.
Notable Results
In Wei et al. (2022), CoT prompting improved PaLM 540B’s accuracy on GSM8K (grade school math word problems) from roughly 18% to 57%. Adding self-consistency pushed accuracy significantly higher still. The technique also improved commonsense reasoning, symbolic manipulation, and multi-hop question answering.
Open Questions
- Whether CoT reasoning is faithful (reflecting actual model computation) or post-hoc rationalization.
- How to automatically generate optimal CoT demonstrations.
- The relationship between CoT and the internal representations the model learns during pretraining.