OpenAI o1

model, openai, reasoning

OpenAI's first dedicated reasoning model. Instead of answering immediately, it first generates chain-of-thought "thinking tokens" (an internal scratchpad it reasons through) and only then produces its final response. This marks a shift from single-pass generation toward deliberate multi-step reasoning as a first-class capability.

Release Timeline

o1-preview: September 12, 2024; early access for ChatGPT Plus and Team users
o1-mini: September 12, 2024; faster, cheaper variant optimized for coding and math; 80% cheaper than o1-preview
o1 (full): December 5, 2024 in ChatGPT and December 17, 2024 in the API; 200K context and improved performance over o1-preview

Architecture

Internal chain-of-thought: the model generates reasoning tokens before producing its output. These tokens are not shown to the user; the API returns only the final answer and a count of the reasoning tokens used. More thinking tokens generally improve performance on hard problems, at the cost of higher latency and token cost. On o1, thinking tokens are billed as output tokens even though they are never returned.
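A minimal sketch of the token accounting, assuming a usage record shaped like the Chat Completions API's `usage` object, where `completion_tokens` already includes the hidden reasoning tokens and their share is broken out under `completion_tokens_details.reasoning_tokens`. The `TokenSplit` and `split_usage` names and the token counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenSplit:
    prompt: int
    reasoning: int      # hidden thinking tokens, never returned as text
    visible: int        # tokens in the final answer the user sees
    billed_output: int  # everything charged at the output rate


def split_usage(usage: dict) -> TokenSplit:
    """Split a usage record into hidden reasoning and visible answer tokens.

    Assumes `completion_tokens` counts reasoning + visible tokens together,
    with the reasoning share reported separately (0 if absent).
    """
    completion = usage["completion_tokens"]
    details = usage.get("completion_tokens_details", {})
    reasoning = details.get("reasoning_tokens", 0)
    return TokenSplit(
        prompt=usage["prompt_tokens"],
        reasoning=reasoning,
        visible=completion - reasoning,
        billed_output=completion,
    )


# Illustrative numbers only, not measured from a real request.
example = {
    "prompt_tokens": 1_200,
    "completion_tokens": 5_000,
    "completion_tokens_details": {"reasoning_tokens": 4_400},
}
print(split_usage(example))
# TokenSplit(prompt=1200, reasoning=4400, visible=600, billed_output=5000)
```

In this made-up case the user sees a 600-token answer but pays for 5,000 output tokens.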

Context Window

o1: 200K tokens (100K max output)
o1-mini and o1-preview: 128K tokens
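Because hidden reasoning tokens are generated into the same window as the prompt and the answer, a rough budget check has to count all three. A minimal sketch, assuming reasoning tokens occupy context space during generation; the `fits_in_context` helper and the token counts are hypothetical, and per-model output caps are deliberately left out.

```python
# Context windows from the list above; per-model output caps are not modeled here.
CONTEXT_WINDOW = {
    "o1": 200_000,
    "o1-mini": 128_000,
    "o1-preview": 128_000,
}


def fits_in_context(model: str, prompt_tokens: int, reasoning_tokens: int,
                    answer_tokens: int) -> bool:
    """Return True if prompt + hidden reasoning + visible answer fit the window.

    Assumes reasoning tokens occupy context space while the model generates,
    so a long chain of thought can crowd out the answer even for a modest prompt.
    """
    used = prompt_tokens + reasoning_tokens + answer_tokens
    return used <= CONTEXT_WINDOW[model]


# Hypothetical request: 100K prompt, 30K reasoning, 2K answer = 132K tokens.
print(fits_in_context("o1", 100_000, 30_000, 2_000))       # True
print(fits_in_context("o1-mini", 100_000, 30_000, 2_000))  # False
```

The prompt alone fits both windows; it is the hidden reasoning that pushes the o1-mini request over the limit.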

Benchmarks

Benchmark	Score
AIME 2024 (single sample)	74% (11.1/15)
AIME 2024 (consensus, 64 samples)	83% (12.5/15)
AIME 2024 (re-ranking 1,000 samples)	93% (13.9/15)
MATH-500	97%
GPQA Diamond (PhD science)	77.3%
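The two AIME rows differ in how sampled answers are aggregated. A minimal sketch of consensus (majority-vote) scoring, assuming each sample yields one integer answer; the sampled answers are invented, the helper names are hypothetical, and this illustrates the technique rather than OpenAI's evaluation harness. The percentage column is simply problems solved divided by 15.

```python
from collections import Counter


def consensus_answer(samples: list[int]) -> int:
    """Return the most common answer among the samples (majority vote).

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    return Counter(samples).most_common(1)[0][0]


def aime_percent(problems_solved: float, total: int = 15) -> float:
    """Convert an average number of AIME problems solved into a percentage."""
    return 100 * problems_solved / total


# Invented samples for one problem: scattered wrong answers, one repeated answer.
samples = [204, 17, 204, 96, 204, 17]
print(consensus_answer(samples))  # 204
print(round(aime_percent(11.1)))  # 74
print(round(aime_percent(12.5)))  # 83
```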

GPQA Diamond context: PhD-level human experts score about 69.7%; o1 was the first model to exceed expert-level performance on this benchmark.

AIME is the American Invitational Mathematics Examination. When 1,000 samples are re-ranked with a learned scoring function, o1 scores 13.9/15, which places it among the top 500 US high school students nationally.
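Re-ranking differs from consensus: instead of taking the most common answer, a separate scorer rates every candidate and the top-rated one is submitted. A minimal best-of-n sketch, where `score` is a hypothetical stand-in for the learned scoring function (its actual form is not described here), and the toy scorer and candidates are invented purely so the code runs.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def rerank_best(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Best-of-n selection: return the candidate with the highest score.

    `score` stands in for a learned verifier (hypothetical); any function
    mapping a candidate to a number works. Ties keep the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Toy stand-in scorer: pretend the verifier prefers answers that carry a self-check.
candidates = [
    {"answer": 17, "has_check": False},
    {"answer": 204, "has_check": True},
    {"answer": 96, "has_check": False},
]
best = rerank_best(candidates, score=lambda c: 1.0 if c["has_check"] else 0.0)
print(best["answer"])  # 204
```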

Pricing

Input: $15.00 per million tokens
Output: $60.00 per million tokens

Substantially more expensive than gpt-4o. Per-request cost is higher than the list price alone suggests, because hidden thinking tokens are billed at the output rate. o1-mini launched at 80% below o1-preview's price.
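A worked estimate, assuming the list prices above and that hidden reasoning tokens are billed at the output rate. The `request_cost` helper and the token counts are hypothetical.

```python
# Prices in USD per million tokens, as listed above for o1.
INPUT_PRICE_PER_M = 15.00
OUTPUT_PRICE_PER_M = 60.00


def request_cost(prompt_tokens: int, reasoning_tokens: int, answer_tokens: int) -> float:
    """Estimate the USD cost of one o1 request.

    Assumes hidden reasoning tokens are charged at the output rate, so they
    are added to the visible answer before pricing.
    """
    billed_output = reasoning_tokens + answer_tokens
    return (prompt_tokens * INPUT_PRICE_PER_M
            + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical request: short prompt, long hidden reasoning, short answer.
print(f"${request_cost(1_000, 4_000, 500):.3f}")  # $0.285
# Same prompt and answer with no reasoning tokens, for comparison.
print(f"${request_cost(1_000, 0, 500):.3f}")      # $0.045
```

In this made-up case the hidden reasoning accounts for $0.24 of the $0.285, which is why per-request cost is hard to predict from prompt and answer length alone.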

Strengths

Hard math and science: best-in-class at launch on competition math and on PhD-level physics, chemistry, and biology questions
Complex multi-step reasoning: problems that require planning, backtracking, or sequential deduction
Code correctness: produces fewer logical errors on algorithmic problems than non-reasoning models

Weaknesses

Latency: thinking tokens mean noticeably slower responses than standard models
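Cost: at $60/M output tokens, with hidden thinking tokens billed at that rate, it is impractical for high-volume use
No tool use at launch: the September 2024 preview models did not support function calling or web search; function calling arrived with the full o1 API release
No access to thinking tokens: the raw reasoning is not returned, so it cannot be streamed or inspected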

Successors

o1 has been superseded by o3 (announced December 2024, released April 2025), o3-mini (released January 2025), and o4-mini (released April 2025). The reasoning model line continues as OpenAI's dedicated compute-intensive tier alongside the gpt-4o and GPT-5 families.

Related

gpt-4o · deepseek-r1 · extended-thinking · evals
