If you have a production integration built on /v1/chat/completions, you do not need a rewrite to move to /v1/responses.
OpenAI’s docs are clear: Chat Completions remains supported, but Responses is recommended for new projects. The practical reason to migrate is not “the endpoint will disappear tomorrow” — it’s that Responses is where the platform’s agentic primitives (tools, state, richer outputs) are designed to feel native.
This guide is the “do it safely on a Tuesday” version: small, reversible steps, plus the gotchas that tend to break real systems.
## TL;DR: what changes
| You probably have today | You want after migration |
|---|---|
| `messages: [{ role, content }]` | `input: ...` (string or item array) |
| `choices[0].message.content` | `response.output_text` (or parse output items) |
| your own conversation store | `previous_response_id` chaining or Conversations API (durable conversation) |
| `response_format` for JSON schema | `text.format` for JSON schema |
| optional tool calling | unified tool interface + hosted tools |
## Step 0: decide how you will handle state (this drives everything)
You have three sane paths:
- Stateless: you keep sending full context (what you already do).
- Chained: you pass `previous_response_id` each turn (cheap and simple for short-lived flows).
- Durable: you create a Conversation object and reuse it across sessions/devices/jobs.
The sharp edge: stored response objects have a limited lifetime (“up to 30 days”) and chaining only works while the previous response is still retrievable. Conversations exist specifically to avoid that TTL problem.
If you are migrating a typical SaaS chat assistant, start with chaining (fastest), then move to Conversations once you’ve proven correctness.
## The migration gotchas that break production
- `n` is gone. Chat Completions can return multiple parallel generations via `n`. In Responses, that parameter is removed: you get one generation per request. If you relied on "pick best of N," you'll need to run multiple requests (or redesign the UX).
- Chaining requires retrievable responses. If you plan to use `previous_response_id`, set `store: true` for the responses you want to reference.
- Chaining is not "free context." Even when you chain with `previous_response_id`, previous input tokens are billed as input tokens in the API.
- Conversations avoid TTL. Response objects are saved for 30 days by default; conversation objects and their items are not subject to that 30-day TTL.
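If you depended on `n` for best-of-N sampling, the replacement is N parallel requests plus your own selection step. A minimal sketch, where `generate` (your wrapper around `openai.responses.create`) and `score` (your ranking heuristic) are hypothetical functions you supply:

```javascript
// Replace Chat Completions' `n` with N parallel Responses requests.
// `generate` and `score` are placeholders: `generate` wraps
// openai.responses.create and returns output text; `score` ranks candidates.
async function bestOfN(generate, score, prompt, n = 3) {
  const candidates = await Promise.all(
    Array.from({ length: n }, () => generate(prompt))
  );
  // Pick the highest-scoring candidate; ties go to the earliest request.
  return candidates.reduce((best, c) => (score(c) > score(best) ? c : best));
}
```

Note this multiplies your token cost by N, which is part of why rethinking the UX may be the better trade.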
## Step 1: the smallest possible port (no tools, no schema, no streaming)
### Before (Chat Completions)

```javascript
import OpenAI from "openai";

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-5",
  messages: [{ role: "user", content: "Summarize this in 3 bullets: ..." }],
});

const text = completion.choices[0].message.content;
```
### After (Responses)

```javascript
import OpenAI from "openai";

const openai = new OpenAI();

const response = await openai.responses.create({
  model: "gpt-5",
  input: "Summarize this in 3 bullets: ...",
});

const text = response.output_text;
```
That is intentionally boring. You ship it behind a feature flag, compare outputs, and only then turn on the advanced stuff.
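One thing worth wiring into that comparison harness: `response.output_text` is a convenience field, and the underlying text lives inside `output` message items. A manual walk, sketched against the documented item shape (`type: "message"` items containing `output_text` content parts); verify against your SDK version:

```javascript
// Fallback to response.output_text: walk the output array and concatenate
// text parts from message items, skipping non-message items (tool calls, etc.).
function extractText(response) {
  return (response.output ?? [])
    .filter((item) => item.type === "message")
    .flatMap((item) => item.content ?? [])
    .filter((part) => part.type === "output_text")
    .map((part) => part.text)
    .join("");
}
```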
## Step 2: map messages → input items (the “real” migration)
In Chat Completions, everything is a message.
In Responses, you send an `input`, which can be:
- a single string, or
- an array of items (messages, tool calls, tool outputs, etc.).
A practical mapping that works for most apps:
```javascript
const input = [
  { role: "developer", content: "You are a precise, safe assistant." },
  { role: "user", content: "Draft a short onboarding email." },
];

const response = await openai.responses.create({ model: "gpt-5", input });
```
If your app previously relied on system messages, move that top-level behavior into a developer message in Responses-style integrations so it stays clearly separated from user content. If your SDK or model does not accept `role: "developer"`, use `role: "system"` for the same intent.
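If you already hold history as Chat Completions messages, the port can be mechanical. A sketch that remaps the role name (system → developer) and passes everything else through unchanged:

```javascript
// Convert a Chat Completions `messages` array into a Responses `input` array.
// Only the role name changes; content passes through as-is.
function messagesToInput(messages) {
  return messages.map(({ role, content }) => ({
    role: role === "system" ? "developer" : role,
    content,
  }));
}
```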
## Step 3: pick a storage posture (`store`) before you add features
From the official migration guide:
- Responses are stored by default.
- Chat Completions are stored by default for new accounts.
- To disable storage, set `store: false`.
If you handle regulated or sensitive content, do not defer this decision until the end.
## Step 4: multi-turn conversations (chaining vs Conversations API)
### Option A — chaining with `previous_response_id`
Chaining is the smallest diff from your current “send history” loop:
```javascript
let previous_response_id;

for (const userText of ["Plan the trip", "Now make it cheaper"]) {
  const response = await openai.responses.create({
    model: "gpt-5",
    input: userText,
    store: true,
    previous_response_id,
  });

  previous_response_id = response.id;
  console.log(response.output_text);
}
```
Chaining only works if the previous response is retrievable, so set `store: true` for any step you plan to reference later. This is perfect for “single session” assistants (support chats, in-app wizards, etc.), as long as you understand the TTL constraint for stored responses.
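Because stored responses have a bounded lifetime, decide up front what you do when the stored response is too old to chain on. A small guard, assuming you record each response's `created_at` Unix timestamp; the 30-day ceiling and the 1-day safety margin are assumptions to tune:

```javascript
// Decide whether chaining on a stored response is still safe, or whether
// you should fall back to resending history / a durable Conversation.
// createdAt and nowSeconds are Unix timestamps in seconds.
const DAY_SECONDS = 24 * 60 * 60;

function canChain(createdAt, nowSeconds = Date.now() / 1000, ttlDays = 30, marginDays = 1) {
  const ageDays = (nowSeconds - createdAt) / DAY_SECONDS;
  return ageDays < ttlDays - marginDays;
}
```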
### Option B — durable Conversations
If you need stable state across time (days/weeks) or across devices, create a Conversation once and reuse it:
```python
from openai import OpenAI

client = OpenAI()

conversation = client.conversations.create()
conversation_id = conversation.id

response = client.responses.create(
    model="gpt-5",
    conversation=conversation_id,
    input=[{"role": "user", "content": "Summarize last week’s customer feedback and cluster themes."}],
)

print(response.output_text)
```
OpenAI’s conversation-state guide positions Conversations as the durable primitive, designed to store items like messages and tool results without you manually replaying the entire history.
## Step 5: structured outputs (JSON schema) — `response_format` → `text.format`
If you already use JSON schema with Chat Completions, this is a high-value migration because it reduces “JSON almost” failures.
In Responses, structured outputs move from `response_format` to `text.format`.
Minimal example:
```javascript
const response = await openai.responses.create({
  model: "gpt-5",
  input: "Jane, 54 years old",
  text: {
    format: {
      type: "json_schema",
      name: "person",
      strict: true,
      schema: {
        type: "object",
        properties: {
          name: { type: "string", minLength: 1 },
          age: { type: "number", minimum: 0, maximum: 130 },
        },
        required: ["name", "age"],
        additionalProperties: false,
      },
    },
  },
});
```
Your “migration safe” rule: wire schema outputs behind a flag, log parse failures, then roll forward as you gain confidence.
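Even with strict schemas, treat parsing as fallible: return a result instead of throwing, so callers can log the failure and fall back. A minimal wrapper over `JSON.parse`; the logging hook is a placeholder for your own observability:

```javascript
// Parse structured output defensively: return { ok, data } or { ok, error }
// instead of throwing, so a malformed payload becomes a logged event,
// not a crashed request handler.
function parseStructured(outputText) {
  try {
    return { ok: true, data: JSON.parse(outputText) };
  } catch (error) {
    // Hook your logger/metrics here instead of letting the request 500.
    return { ok: false, error: String(error) };
  }
}
```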
## Step 6: tool calling (function tools) — keep orchestration explicit
Responses is where OpenAI’s hosted tools and tool interface are designed to live.
Even if you do not use hosted tools, the migration is a good moment to clean up tool calling:
- keep tool execution server-side
- validate tool args
- treat tool outputs as untrusted inputs (because they are)
If you already have function tools in Chat Completions, the concept stays the same; the main work is updating to the Responses request/response shape and event model.
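Validating tool arguments before execution is cheap insurance. A sketch, assuming the model returns arguments as a JSON string and you keep a per-tool validator map (`validators` and the tool names are illustrative):

```javascript
// Validate model-proposed tool arguments before executing anything.
// `validators` maps tool name -> predicate over the parsed arguments.
function safeToolArgs(validators, toolName, rawArguments) {
  const validate = validators[toolName];
  if (!validate) throw new Error(`Unknown tool: ${toolName}`);
  const args = JSON.parse(rawArguments); // model output: treat as untrusted
  if (!validate(args)) throw new Error(`Invalid arguments for ${toolName}`);
  return args;
}
```

In production you would likely swap the predicate for a real schema validator, but the shape stays the same: reject before you execute.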
If you want the bigger picture on tool standardization, read: Why MCP is Becoming the Default Standard for AI Tools in 2026
## Step 7: streaming — expect an event stream, not just token deltas
If you currently stream tokens from Chat Completions, do not assume a 1:1 mapping. In Responses, streaming is an event stream where output text, tool calls, and summaries can arrive as distinct event types.
Operationally:
- stream to the UI, but buffer to your logs
- record `response.id` and tool events
- test cancellation and timeouts (tool-heavy flows change the failure modes)
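The shape to plan for is a switch over event types. A sketch over a mocked event sequence; the type names follow the documented Responses stream events (`response.output_text.delta`, `response.completed`), but verify them against the SDK version you ship:

```javascript
// Reduce a Responses event stream into UI text plus a log of notable events.
// Works over any iterable of { type, ... } events (here, a mocked array).
function consumeStream(events) {
  let text = "";
  const log = [];
  for (const event of events) {
    switch (event.type) {
      case "response.output_text.delta":
        text += event.delta;                         // token-ish text for the UI
        break;
      case "response.completed":
        log.push(`completed:${event.response.id}`);  // record response.id
        break;
      default:
        log.push(event.type);                        // tool calls, errors, ...
    }
  }
  return { text, log };
}
```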
## A pragmatic migration checklist (copy/paste into your issue tracker)
- State: pick stateless vs chaining vs Conversations (and document why).
- Storage: set an explicit `store` value (do not rely on defaults).
- Schemas: move JSON schema from `response_format` → `text.format`.
- Streaming: update your client to handle event streams (not only token deltas).
- Observability: add request IDs, tool logs, and a retry policy for tool calls.
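A retry policy for tool calls can start as small as a backoff wrapper; the attempt count and base delay below are illustrative defaults to tune:

```javascript
// Retry a flaky async tool call with exponential backoff.
// attempts/baseMs are illustrative defaults; tune for your tools.
async function withRetry(fn, attempts = 3, baseMs = 200) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === attempts - 1) throw error;              // out of attempts
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
}
```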
## Where this is heading (so you don’t get surprised later)
If you are still using the older Assistants API, the official migration guide notes it is scheduled to be deprecated, with a target sunset date of August 26, 2026. That makes 2026 the right year to consolidate on Responses as the core primitive.
If you want a broader “what changes in your dev loop” read: GPT‑5.4 Is Here: A Developer Playbook for Faster, Safer Agents