
OpenAI API: Responses API and Structured Outputs Specification and Implementation Guide

Sloth255

Introduction

Entering 2025, the OpenAI API has reached a major turning point. In March 2025, the Responses API was officially released (GA), consolidating the conversation capabilities of the Chat Completions API and the tool integration features of the Assistants API into a single endpoint. The legacy Assistants API is scheduled for decommissioning on August 26, 2026.

Furthermore, Structured Outputs, which ensures model outputs strictly adhere to a JSON schema, demonstrates its true potential when combined with the Responses API. Since its release in August 2024, it has significantly improved the reliability of agentic workflows and data extraction pipelines.

This article first outlines the overall OpenAI API landscape as of 2025, then dives into the specifications and implementation details of these two key topics.

Source Information
Specifications and performance data in this article refer to the OpenAI Official Documentation (developers.openai.com/api), the Official Migration Guide (developers.openai.com/api/docs/guides/migrate-to-responses), and the Official Blog (openai.com/index/introducing-structured-outputs-in-the-api).


1. API Landscape Overview

The major categories of the OpenAI API as of 2025 are organized as follows:

Category | Endpoint | Positioning
Responses API | POST /v1/responses | Recommended for new projects. Unified interface for agents.
Chat Completions API | POST /v1/chat/completions | Continued support (no planned deprecation).
Realtime API | WebRTC / WebSocket / SIP | Real-time bidirectional voice and text.
Embeddings | POST /v1/embeddings | Vector search and RAG.
Images | POST /v1/images/generations | Image generation and editing.
Audio | POST /v1/audio/transcriptions | Speech recognition and TTS.

OpenAI has positioned the Responses API as the primary destination for new features, and the decommissioning of the Assistants API on August 26, 2026, has been officially confirmed (Source: Official Migration Guide). While the Chat Completions API will continue to be supported without a decommissioning date, the Responses API is the current recommendation for new projects.

Structured Outputs is not an independent endpoint like those in the table above, but rather an output format control option available for both the Responses API and the Chat Completions API. It is specified using the text.format parameter in the former and the response_format parameter in the latter. This article focuses particularly on its combination with the Responses API.


2. Responses API

2.1 Overview

The Responses API is a new primitive that succeeds the Chat Completions API and integrates the features of the Assistants API. It reached General Availability (GA) in March 2025.

The most significant change is the ability to persist conversation state on the server side. Unlike traditional Chat Completions, which required including the entire conversation history in every request, the Responses API allows you to continue a conversation simply by passing a previous_response_id.

2.2 Basic Request

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4o",
  input: "Tell me the weather in Tokyo.",
});

console.log(response.output_text); // Retrieve text directly using the output_text helper

2.3 The output_text Helper

output_text is a convenience property provided by the OpenAI SDK, not a part of the API specification itself.

In the raw response from the Responses API, the text is nested within the following structure:

response.output[0].content[0].text

output_text abstracts this traversal away. Internally, the SDK concatenates the text of every content block with type: "output_text" across all items of type: "message" in the output array.

// Both result in the same output
console.log(response.output_text);
console.log(response.output[0].content[0].text);

However, if the output contains no text message (for example, when the response consists only of tool calls), output_text yields an empty string. For robust agentic code that uses tools, loop through the output array and check each item's type.

for (const item of response.output) {
  if (item.type === "message") {
    for (const block of item.content) {
      if (block.type === "output_text") {
        console.log(block.text);
      }
    }
  }
}

2.4 role in Input Messages

When passing an array to input, you can specify a role for each message. The role tells the model "who is making this statement," and there are three types.

role | Meaning | Typical Usage
system | Instructions from the system (developer) | Defines the model's behavior, tone, and constraints. Generally placed once at the start of a conversation.
user | Input from the end-user | Represents the user's statements or questions.
assistant | The model's own past statements | Used in multi-turn conversations to provide previous responses as history.

const response = await client.responses.create({
  model: "gpt-4o",
  input: [
    {
      role: "system",
      content: "You are a helpful Japanese assistant. Please answer concisely.",
    },
    {
      role: "user",
      content: "Tell me about JavaScript array methods.",
    },
  ],
});

If you pass a string directly to input (as in the sample in 2.2), that string is treated as a message with role: "user". Use the array format if you want fine-grained control over the model's behavior or if you want to provide system instructions.

2.5 Response Structure

Unlike Chat Completions' choices, results are returned in an output array.

{
  "id": "resp_68af4030...",
  "object": "response",
  "created_at": 1756315696,
  "model": "gpt-4o",
  "output": [
    {
      "id": "msg_68af4033...",
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The weather in Tokyo is sunny."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 12,
    "total_tokens": 27
  }
}
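As a reference, the traversal that output_text performs can be reproduced with a small helper. This is a sketch (the function name and sample payload are ours, not part of the SDK):

```javascript
// Collect all output_text blocks from a Responses API payload.
// Mirrors what the SDK's output_text helper does for you.
function extractOutputText(response) {
  const parts = [];
  for (const item of response.output ?? []) {
    if (item.type !== "message") continue;
    for (const block of item.content ?? []) {
      if (block.type === "output_text") parts.push(block.text);
    }
  }
  return parts.join("");
}

// Works against the sample payload above:
const sample = {
  output: [
    {
      type: "message",
      role: "assistant",
      content: [{ type: "output_text", text: "The weather in Tokyo is sunny." }],
    },
  ],
};
console.log(extractOutputText(sample)); // "The weather in Tokyo is sunny."
```

Because it skips non-message items, the helper returns an empty string for tool-call-only outputs instead of throwing.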

2.6 Multi-turn Conversation

There are two ways to achieve multi-turn conversation with the Responses API.

1. Passing History as an Array in input

This is the traditional method from the Chat Completions API. You maintain and manage the conversation history on the client side and include all messages in every request.

const history = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Tell me the weather in Tokyo." },
  { role: "assistant", content: "It is sunny in Tokyo today." },
];

const response = await client.responses.create({
  model: "gpt-4o",
  input: [...history, { role: "user", content: "How about tomorrow?" }],
});

Since state is not kept on the server, you can freely manipulate the history, such as deleting or summarizing specific messages. However, as the conversation grows longer, the number of input tokens increases, impacting both cost and latency.
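The bookkeeping for this approach is simple: after each turn, append the user message and the model's reply to the history before the next request. A minimal sketch (no API call; the helper name is ours):

```javascript
// Append one completed turn (user question + assistant answer) to a
// client-managed history array, returning the new history.
function appendTurn(history, userContent, assistantContent) {
  return [
    ...history,
    { role: "user", content: userContent },
    { role: "assistant", content: assistantContent },
  ];
}

let history = [{ role: "system", content: "You are a helpful assistant." }];
history = appendTurn(history, "Tell me the weather in Tokyo.", "It is sunny in Tokyo today.");
// history now holds 3 messages and can be passed as `input` on the next turn
console.log(history.length); // 3
```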

2. Passing previous_response_id (Responses API exclusive)

This is the server-side state management method newly introduced with the Responses API. By simply passing the previous response ID, OpenAI's servers will carry over the conversation history.

// First turn
const response1 = await client.responses.create({
  model: "gpt-4o",
  input: "Tell me the weather in Tokyo.",
});

// Second turn: Context is inherited using only previous_response_id
const response2 = await client.responses.create({
  model: "gpt-4o",
  input: "How about tomorrow?",
  previous_response_id: response1.id,
});

The client doesn't need to maintain or send the entire history, keeping the request size small. Furthermore, when using reasoning models like o1 or o3, reasoning tokens are preserved between turns, a significant advantage that improves accuracy for tasks requiring continuous reasoning.

To use this method, the response must be saved on the server. The store parameter defaults to true; if you set store: false, the response can no longer be referenced by its ID.

Selection Criteria

Situation | Recommended Method
Need to modify/edit history mid-way (RAG injection, deleting old messages, etc.) | Method 1 (array-based)
Solving complex multi-turn problems with reasoning models (o1, o3, etc.) | Method 2 (previous_response_id)
Simple chat where you want to avoid manual history management | Method 2 (previous_response_id)
Using the Chat Completions API | Method 1 (previous_response_id is Responses API only)

If persistent conversation management is required, integration with the Conversations API is also possible.

2.7 Built-in Tools

One of the greatest benefits of the Responses API is the set of built-in tools available without extra infrastructure.

const response = await client.responses.create({
  model: "gpt-4o",
  input: "Search for the latest OpenAI news and summarize it.",
  tools: [
    { type: "web_search_preview" },
    { type: "file_search" },
    { type: "code_interpreter" },
  ],
});

Tool | Purpose
web_search_preview | Web search equivalent to ChatGPT's.
file_search | RAG search over uploaded files.
code_interpreter | Code execution and data analysis.
computer_use | Computer operation agent.
mcp | Connection to third-party MCP servers.

MCP (Model Context Protocol) Integration

Connectors are OpenAI-maintained MCP wrappers for popular services like Google Workspace or Dropbox, while Remote MCP servers are any server on the public internet that implements the remote MCP protocol (Source: OpenAI Connectors and MCP Guide).

The Responses API can integrate with remote MCP servers that support Streamable HTTP or HTTP/SSE transport protocols. When a tool is specified, the API first retrieves the list of available tools from the server (mcp_list_tools), and the model then calls the necessary tools from that list.

Basic Connection Example (Source: OpenAI Using tools Guide)

const response = await client.responses.create({
  model: "gpt-4o",
  input: "Roll 2d6 and tell me the result.",
  tools: [
    {
      type: "mcp",
      server_label: "dice_server",           // Identifier for the server (arbitrary)
      server_url: "https://example.com/mcp", // URL of the MCP server
      require_approval: "never",             // Automatically approve tool calls
    },
  ],
});

console.log(response.output_text);

Approval Control with require_approval

By default, all tool calls require explicit approval from the developer. You can control this behavior with require_approval (Source: OpenAI Connectors and MCP Guide).

const response = await client.responses.create({
  model: "gpt-4o",
  input: "Tell me about the MCP specification's transport protocols.",
  tools: [
    {
      type: "mcp",
      server_label: "deepwiki",
      server_url: "https://mcp.deepwiki.com/mcp",
      require_approval: {
        never: {
          // These two tools don't require approval; others do
          tool_names: ["ask_question", "read_wiki_structure"],
        },
      },
    },
  ],
});

require_approval Value | Behavior
"never" | Automatically approve all tool calls.
{ never: { tool_names: [...] } } | Automatically approve specified tools; others require approval.
Omitted (default) | All tool calls require approval.
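When a call does require approval, the response's output array contains mcp_approval_request items, and you approve by sending mcp_approval_response items back in a follow-up request (Source: Connectors and MCP Guide). A hedged sketch of that bookkeeping; the item shapes follow the guide, so verify them against the current docs:

```javascript
// Scan a response's output for pending MCP approval requests and build
// the approval items to send back in the next request's `input`.
// Only tools on an explicit allowlist are approved.
function buildApprovals(output, approvedTools) {
  return output
    .filter((item) => item.type === "mcp_approval_request")
    .filter((item) => approvedTools.includes(item.name))
    .map((item) => ({
      type: "mcp_approval_response",
      approval_request_id: item.id,
      approve: true,
    }));
}

const output = [
  { type: "mcp_approval_request", id: "mcpr_123", name: "ask_question", arguments: "{}" },
];
const approvals = buildApprovals(output, ["ask_question"]);
// approvals holds one mcp_approval_response approving mcpr_123
```

The resulting items would be sent as the next request's input together with previous_response_id so the server can resume the tool call.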

Connecting to Servers Requiring Authentication

If the MCP server requires authentication, pass the token using the headers parameter.

const response = await client.responses.create({
  model: "gpt-4o",
  input: "Fetch some data.",
  tools: [
    {
      type: "mcp",
      server_label: "my_server",
      server_url: "https://my-mcp-server.example.com/mcp",
      require_approval: "never",
      headers: {
        Authorization: `Bearer ${process.env.MCP_ACCESS_TOKEN}`,
      },
    },
  ],
});

Tool List Caching: While retrieving the tool list from the MCP server (mcp_list_tools) occurs per request, in multi-turn conversations using previous_response_id, the tool list is included in the previous response, so re-retrieval is skipped (Source: OpenAI Cookbook: MCP Tool Guide).

2.8 Key Request Parameters

Parameter | Type | Description
model | string | Model name (e.g., gpt-4o).
input | string / array | Text or multimodal input.
previous_response_id | string | Previous response ID for multi-turn.
tools | array | Definitions of tools to use.
text.format | object | Specification for Structured Outputs (see below).
stream | boolean | Enable streaming.
store | boolean | Whether to save the response on the server (default: true).
reasoning | object | Reasoning configuration for o-series models, e.g. { effort: "low" / "medium" / "high" }.
background | boolean | Asynchronous execution in background mode.

2.9 Comparison with Chat Completions

Feature | Chat Completions | Responses API
Conversation State | Client-side (entire history required) | Server-side (previous_response_id)
Web Search | Manual implementation needed | Built-in (web_search_preview)
File Search / RAG | Manual implementation needed | Built-in (file_search)
Code Execution | Manual implementation needed | Built-in (code_interpreter)
MCP Connection | Not supported | Native support for remote MCP
Reasoning Token Persistence | Discarded between turns | Can be persisted
output_text Helper | No | Yes
Format Specification | response_format | text.format
New Feature Delivery | Limited | Primary destination

2.10 Reasoning Models (o-series)

Distinct from the GPT series, OpenAI offers a group of models called Reasoning Models. These models execute an internal step-by-step thinking process (Chain-of-Thought) before generating an answer. This internal thinking is counted as reasoning tokens, which are not included in the final output.

They demonstrate significantly higher accuracy than GPT-4o for tasks requiring multi-step reasoning, such as mathematics, coding, logical inference, and complex analysis. On the other hand, latency and costs are higher because thinking takes time.

Major models as of 2025 are:

Model | Characteristics
o1 / o1-mini | First-generation reasoning models.
o3 / o3-mini | High-precision, high-performance successor series.
o4-mini | Model balanced for cost and performance.

Reasoning depth can be adjusted with the reasoning parameter (see 2.8). Setting effort to low reduces latency and cost with lightweight reasoning, while high performs deep reasoning for maximum precision.

const response = await client.responses.create({
  model: "o3",
  input: "Find the general term for this sequence: 1, 1, 2, 3, 5, 8, 13, ...",
  reasoning: { effort: "high" },
});

Additionally, in multi-turn conversations using previous_response_id (see 2.6, method 2), reasoning tokens are persisted between turns. When digging deeper into the same problem over multiple turns, accuracy and efficiency improve because the model inherits the previous turn's reasoning instead of starting over from scratch.


3. Structured Outputs

3.1 Overview and Background

Reliably forcing LLM outputs into JSON format has been a key challenge for application integration. OpenAI has solved this incrementally:

JSON mode (legacy feature) ensures syntactically correct JSON, but does not guarantee adherence to a schema. There was a risk of missing required fields or additional unwanted fields.

Structured Outputs, released in August 2024, guarantees 100% adherence to a JSON schema specified by the developer (Source: OpenAI Official Blog).

Internal evaluations (evals) at OpenAI show that gpt-4o-2024-08-06 achieves 100% adherence to complex JSON schemas with Structured Outputs, a massive leap from gpt-4-0613, which scored below 40% (Source: OpenAI Official Blog).

3.2 How It Works

The OpenAI API achieves structured outputs by converting the specified JSON Schema into a context-free grammar (CFG). The grammar constrains which tokens can be sampled at each step, enforcing schema compliance. Because of this, the first request with a new schema incurs additional latency while the grammar is pre-processed; subsequent requests with the same schema do not.

Note: This first-request latency applies to fine-tuned models as well; once a schema has been processed, later requests with the same schema avoid the penalty (Source: Structured Outputs Guide).

3.3 Two Ways to Use

Structured Outputs is provided in two forms on the API.

The first is via Function calling (tools), enabled by setting strict: true within the function definition. This is available on all models from gpt-4-0613 onwards and is suitable for connecting model capabilities with applications (e.g., accessing a database query function).

The second is via the response_format / text.format parameter, where specifying a json_schema is suitable for the model to respond to the user in a structured format (e.g., displaying different parts separately in a math tutorial UI).

3.4 Implementation Example (Responses API)

In the Responses API, the parameter has moved from response_format to text.format (Source: Official Migration Guide).

About Schema description

It is strongly recommended to include a description for each field in your JSON Schema. The description acts as an instruction to the model, providing clues to help it correctly determine what should go in that field. General field names like explanation or output are particularly prone to being misunderstood by the model without a description.

When using Zod, use .describe("..."). When writing JSON Schema directly, specify it with the "description" key inside the property object.

Schema Definition using Zod (Recommended)

import OpenAI from "openai";
import { zodTextFormat } from "openai/helpers/zod";
import { z } from "zod";

const client = new OpenAI();

const Step = z.object({
  explanation: z.string().describe("Explanation of what is being done in this calculation step."),
  output: z.string().describe("The calculation result for this step (formula or numerical value)."),
});

const MathResponse = z.object({
  steps: z.array(Step).describe("A list of steps for the solution."),
  final_answer: z.string().describe("The final answer to the equation (e.g., x = -3.75)."),
});

const response = await client.responses.parse({
  model: "gpt-4o",
  input: [
    { role: "system", content: "You are a math tutor. Explain step-by-step." },
    { role: "user", content: "Solve 8x + 7 = -23" },
  ],
  // zodTextFormat is the Responses API helper; zodResponseFormat is for Chat Completions
  text: { format: zodTextFormat(MathResponse, "math_response") },
});

const result = response.output_parsed;
console.log(result.final_answer);
for (const step of result.steps) {
  console.log(step.explanation, "->", step.output);
}

Specifying JSON Schema Directly

const response = await client.responses.create({
  model: "gpt-4o",
  input: [
    { role: "system", content: "You are a math tutor." },
    { role: "user", content: "Solve 8x + 7 = -23" },
  ],
  text: {
    format: {
      type: "json_schema",
      name: "math_response",
      strict: true,
      schema: {
        type: "object",
        description: "A response showing the step-by-step solution of an equation.",
        properties: {
          steps: {
            type: "array",
            description: "A list of steps for the solution.",
            items: {
              type: "object",
              properties: {
                explanation: {
                  type: "string",
                  description: "Explanation of what is being done in this calculation step.",
                },
                output: {
                  type: "string",
                  description: "The calculation result for this step (formula or numerical value).",
                },
              },
              required: ["explanation", "output"],
              additionalProperties: false,
            },
          },
          final_answer: {
            type: "string",
            description: "The final answer to the equation (e.g., x = -3.75).",
          },
        },
        required: ["steps", "final_answer"],
        additionalProperties: false,
      },
    },
  },
});

Note: The correct field in the Responses API is text.format. The old response_format key is deprecated in the Responses API (Source: OpenAI Developer Community).
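If you use client.responses.create rather than .parse, the schema-compliant result arrives as text in output_text and still needs to be parsed on the client. A minimal sketch (the helper name is ours); the defensive check is redundant when strict mode succeeds, but it catches truncated responses:

```javascript
// Parse the model's JSON text and fail loudly if required keys are missing.
// Structured Outputs guarantees the shape, but defensive parsing still
// protects against incomplete or unexpected responses.
function parseMathResponse(text) {
  const data = JSON.parse(text);
  for (const key of ["steps", "final_answer"]) {
    if (!(key in data)) throw new Error(`missing required field: ${key}`);
  }
  return data;
}

const raw = '{"steps":[{"explanation":"Subtract 7","output":"8x = -30"}],"final_answer":"x = -3.75"}';
const parsed = parseMathResponse(raw);
console.log(parsed.final_answer); // "x = -3.75"
```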

3.5 Structured Outputs in Function Calling

To apply Structured Outputs to a tool call, add strict: true to the function definition.

const response = await client.responses.create({
  model: "gpt-4o",
  input: "Tell me the delivery date for order #12345",
  tools: [
    {
      type: "function",
      name: "get_delivery_date",
      description: "Get the scheduled delivery date for an order.",
      strict: true,
      parameters: {
        type: "object",
        properties: {
          order_id: { type: "string" },
        },
        required: ["order_id"],
        additionalProperties: false,
      },
    },
  ],
});

Restriction: When using Structured Outputs for Function Calling, parallel_tool_calls must be set to false.

3.6 What Structured Outputs Guarantees (and What It Doesn't)

Item | Status | Supplement
Correct JSON syntax | ✅ Guaranteed
Adherence to the specified schema | ✅ Guaranteed (with strict: true)
Presence of required fields | ✅ Guaranteed
Use of values specified in enum | ✅ Guaranteed
Factual correctness | ❌ Not guaranteed | Hallucinations can occur for inputs unrelated to the schema.
Safety policy exemption | ❌ Not guaranteed | The model may return a refusal for safety reasons.

Example of handling refusal:

const response = await client.responses.parse({
  model: "gpt-4o",
  input: [/* ... */],
  text: { format: zodTextFormat(MathResponse, "math_response") },
});

if (response.output[0].content[0].type === "refusal") {
  console.log("Model refused:", response.output[0].content[0].refusal);
} else {
  const result = response.output_parsed;
  // proceed with result.steps / result.final_answer
}
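This check can be wrapped into a small helper so call sites stay clean. A sketch assuming the response shapes shown above (the helper name is ours):

```javascript
// Return { ok: true, value } for a parsed result, or { ok: false, refusal }
// when the model declined for safety reasons.
function unwrapParsed(response) {
  const first = response.output?.[0]?.content?.[0];
  if (first?.type === "refusal") {
    return { ok: false, refusal: first.refusal };
  }
  return { ok: true, value: response.output_parsed };
}

const refused = {
  output: [{ type: "message", content: [{ type: "refusal", refusal: "I can't help with that." }] }],
};
console.log(unwrapParsed(refused).ok); // false
```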

3.7 Schema Constraints in strict mode

In strict: true mode, some JSON Schema features are restricted (Source: Structured Outputs Guide).

  • additionalProperties: false is required.
  • All properties must be included in the required array.
  • Direct use of anyOf at the root object is not allowed.
  • Restrictions apply to combinations like oneOf, anyOf, etc.

To prevent discrepancies between schemas and type definitions, OpenAI strongly recommends using the SDK's native Zod support.
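The first two constraints are mechanical enough to lint locally before sending a schema. A rough sketch (covers object and array nodes only; not an exhaustive validator):

```javascript
// Recursively verify that every object node in a JSON Schema sets
// additionalProperties: false and lists all of its properties as required.
function checkStrictSchema(schema, path = "root") {
  const problems = [];
  if (schema.type === "object") {
    if (schema.additionalProperties !== false) {
      problems.push(`${path}: additionalProperties must be false`);
    }
    const props = Object.keys(schema.properties ?? {});
    const required = schema.required ?? [];
    for (const p of props) {
      if (!required.includes(p)) problems.push(`${path}.${p}: missing from required`);
    }
    for (const p of props) {
      problems.push(...checkStrictSchema(schema.properties[p], `${path}.${p}`));
    }
  } else if (schema.type === "array" && schema.items) {
    problems.push(...checkStrictSchema(schema.items, `${path}[]`));
  }
  return problems;
}

const bad = { type: "object", properties: { a: { type: "string" } }, required: [] };
console.log(checkStrictSchema(bad));
// reports the missing additionalProperties and the unlisted required field
```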


4. Realtime API

The Realtime API, which reached General Availability (GA) in 2025, is a specialized API for low-latency bidirectional voice and text streaming via WebRTC, WebSocket, or SIP. It is designed for real-time interaction use cases such as voice agents connected directly from a browser or integration with telephony systems (PBX), clearly distinguishing its use from the Responses API. For more details, refer to the Official Realtime API Guide.


5. Streaming

The Responses API supports streaming in Server-Sent Events (SSE) format, allowing you to receive long responses incrementally.

const stream = await client.responses.stream({
  model: "gpt-4o",
  input: "Tell me in detail about the beginning of the universe.",
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta" && event.delta) {
    process.stdout.write(event.delta);
  }
}

Structured Outputs can also be combined with streaming, in which case a complete, schema-compliant JSON is returned at the end.
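Conceptually, the client concatenates the delta events into the final text; with Structured Outputs, that concatenation only becomes valid JSON once the stream completes. A sketch over faked events (event shape follows the example above):

```javascript
// Accumulate response.output_text.delta events into the final text.
function accumulateDeltas(events) {
  let text = "";
  for (const event of events) {
    if (event.type === "response.output_text.delta" && event.delta) {
      text += event.delta;
    }
  }
  return text;
}

const events = [
  { type: "response.output_text.delta", delta: '{"answer":' },
  { type: "response.output_text.delta", delta: '"42"}' },
  { type: "response.completed" },
];
console.log(JSON.parse(accumulateDeltas(events)).answer); // "42"
```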


6. HTTP Response Headers

API responses include headers useful for debugging and rate monitoring.

Header | Content
x-request-id | Unique ID for the request (required for support inquiries).
x-ratelimit-limit-requests | Current RPM limit applied.
x-ratelimit-limit-tokens | Current TPM limit applied.
x-ratelimit-remaining-requests | Remaining number of requests.
x-ratelimit-remaining-tokens | Remaining number of tokens.
x-ratelimit-reset-requests | Time until RPM resets.
x-ratelimit-reset-tokens | Time until TPM resets.

If you want to specify a request ID from the client side, add the X-Client-Request-Id header.
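These headers can be read programmatically to throttle a client before it ever hits a 429. A sketch (the helper name is ours; any Headers-like object with a get method works):

```javascript
// Pull the numeric rate-limit values out of a Headers-like object so a
// client can slow down before hitting 429s.
function readRateLimits(headers) {
  const num = (name) => Number(headers.get(name));
  return {
    remainingRequests: num("x-ratelimit-remaining-requests"),
    remainingTokens: num("x-ratelimit-remaining-tokens"),
  };
}

const headers = new Map([
  ["x-ratelimit-remaining-requests", "59"],
  ["x-ratelimit-remaining-tokens", "149000"],
]);
console.log(readRateLimits(headers)); // { remainingRequests: 59, remainingTokens: 149000 }
```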


7. Rate Limits

Rate limits are applied per Organization and Project (not per user).

  • RPM (Requests Per Minute): Number of requests per minute.
  • TPM (Tokens Per Minute): Number of tokens per minute.

Usage tiers are automatically upgraded based on cumulative payments and usage history. If 429 Too Many Requests is returned, it is recommended to retry with exponential backoff (Source: Rate Limits Guide).

import retry from "async-retry"; // third-party exponential-backoff helper

async function callApiWithBackoff(params) {
  return retry(
    async (bail) => {
      try {
        return await client.responses.create(params);
      } catch (err) {
        // Retry only rate-limit errors; fail fast on everything else
        if (err?.status !== 429) {
          bail(err);
          return;
        }
        throw err;
      }
    },
    {
      retries: 6,
      minTimeout: 1000,  // 1s initial wait
      maxTimeout: 60000, // cap at 60s
      randomize: true,   // add jitter
    }
  );
}

8. Version Stability and Model Pinning

Aliases like gpt-4o are periodically updated to new snapshots, which may change the output even for the same prompt. In production, it is recommended to pin a snapshot name like gpt-4o-2024-08-06 and always perform evaluation (eval) when updating.


Summary

Point | Content
New Projects | Use the Responses API (POST /v1/responses).
Chat Completions | Continued support. No immediate need to migrate.
Assistants API | Decommissioned August 26, 2026. Migration to Responses API recommended.
Structured Outputs | 100% schema compliance with strict: true + additionalProperties: false.
Responses API Usage | Use text.format instead of response_format.
Real-time Voice | Refer to the Realtime API (WebRTC / WebSocket / SIP).
Rate Limits | Monitor headers and retry with exponential backoff.
Model Versioning | Pin snapshots and perform evals in production.