GPT-OSS 120B is an open-weight reasoning model that excels at complex problem-solving, coding, and tool use with transparent chain-of-thought explanations and adjustable thinking depth.
| Spec | Value |
|---|---|
| Context tokens | 131,072 |
| Output tokens | 65,536 |
| Training cutoff | Jun 1, 2024 |
| Released | Aug 5, 2025 |
The `model`, `stream`, `service_tier`, `safety_identifier`, and `user` parameters are handled for you.
MCP
You can give models new capabilities using connectors and remote MCP servers. These tools let the model connect to and control external services when needed to respond to a user's prompt. Tool calls can either run automatically or be restricted so that each call requires your explicit approval as the developer.
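As a minimal sketch, attaching a remote MCP server looks like the following in an OpenAI-style Responses API request body. The `server_label` and `server_url` values are hypothetical placeholders, and the exact field names should be confirmed against the platform's schema documentation:

```python
# Sketch of a Responses API request body that attaches a remote MCP server.
# The field names follow the OpenAI-style Responses API MCP tool; the
# server label and URL below are hypothetical placeholders.
def build_mcp_request(prompt: str, approve_all: bool = False) -> dict:
    return {
        "input": prompt,
        "tools": [
            {
                "type": "mcp",
                "server_label": "example_service",        # hypothetical
                "server_url": "https://example.com/mcp",  # hypothetical
                # "never" lets tool calls run automatically; "always"
                # routes each call back to you for explicit approval.
                "require_approval": "never" if approve_all else "always",
            }
        ],
    }

request = build_mcp_request("Summarize my open tickets", approve_all=False)
```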
Seclai Content Tools
Inspect source documents connected to your account. Includes tools for loading full content, reading character ranges, searching within documents, viewing stats, and listing available content sources. When a source_connection_content_version_id is provided in agent run metadata it is used as the default. Otherwise the model can discover content via list_content_sources.
Seclai Knowledge Base
Search your knowledge bases using semantic similarity. Includes search_knowledge_base and list_knowledge_bases. When a knowledge_base_id is provided in the prompt or agent run metadata it is used as the default. Otherwise the model can discover available knowledge bases at runtime.
Web Search
Web search allows models to access up-to-date information from the internet and provide answers with sourced citations. To enable this, use the web search tool in the Responses API or, in some cases, Chat Completions.
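As a minimal sketch, enabling web search in a Responses API request body looks like this. The tool type string `"web_search"` follows the OpenAI-style Responses API; confirm the exact value against the platform's schema documentation:

```python
# Sketch of a Responses API request body with the web search tool enabled.
# The "web_search" type string is assumed from the OpenAI-style API shape.
def build_web_search_request(prompt: str) -> dict:
    return {
        "input": prompt,
        "tools": [{"type": "web_search"}],
    }

req = build_web_search_request("What changed in the latest release?")
```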
Flex trades speed for lower cost; Priority trades higher cost for lower, more consistent latency.
| Option | Description | Input credits (per 1k tokens) | Output credits (per 1k tokens) |
|---|---|---|---|
| Flex | Flex processing provides lower costs for Responses or Chat Completions requests in exchange for slower response times and occasional resource unavailability. It's ideal for non-production or lower priority tasks, such as model evaluations, data enrichment, and asynchronous workloads. | 1.00 | 3.99 |
| Priority | Priority processing delivers significantly lower and more consistent latency compared to Standard processing while keeping pay-as-you-go flexibility. Priority processing is ideal for high-value, user-facing applications with regular traffic where latency is paramount. Priority processing should not be used for data processing, evaluations, or other highly erratic traffic. | 3.49 | 13.96 |
| Standard | — | 2.00 | 7.98 |
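The credit table above can be turned into a simple cost estimate. Credits are per 1,000 tokens, so a request's cost is each token count divided by 1,000 times the tier's rate:

```python
# Estimate request cost from the credit table above.
# Rates are credits per 1,000 tokens, keyed by tier name.
CREDITS_PER_1K = {
    "Flex":     {"input": 1.00, "output": 3.99},
    "Standard": {"input": 2.00, "output": 7.98},
    "Priority": {"input": 3.49, "output": 13.96},
}

def estimate_credits(tier: str, input_tokens: int, output_tokens: int) -> float:
    rates = CREDITS_PER_1K[tier]
    return (input_tokens / 1000) * rates["input"] + \
           (output_tokens / 1000) * rates["output"]

# 10k input + 2k output on Standard: 10 * 2.00 + 2 * 7.98 = 35.96 credits
cost = estimate_credits("Standard", 10_000, 2_000)
```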