If you have a production integration built on /v1/chat/completions, you do not need a rewrite to move to /v1/responses.
OpenAI’s docs are clear: Chat Completions remains supported, but Responses is recommended for new projects. The practical reason to migrate is not “the endpoint will disappear tomorrow” — it’s that Responses is where the platform’s agentic primitives (tools, state, richer outputs) are designed to feel native.
This guide is the “do it safely on a Tuesday” version: small, reversible steps, plus the gotchas that tend to break real systems.
## TL;DR: what changes
| You probably have today | You want after migration |
|---|---|
| `messages: [{ role, content }]` | `input: ...` (string or item array) |
| `choices[0].message.content` | `response.output_text` (or parse output items) |
| your own conversation store | `previous_response_id` chaining or Conversations API (durable conversation) |
| `response_format` for JSON schema | `text.format` for JSON schema |
| optional tool calling | unified tool interface + hosted tools |
## Step 0: decide how you will handle state (this drives everything)
You have three sane paths:
- Stateless: you keep sending full context (what you already do).
- Chained: you pass `previous_response_id` each turn (cheap and simple for short-lived flows).
- Durable: you create a Conversation object and reuse it across sessions/devices/jobs.
The sharp edge: stored response objects have a limited lifetime (“up to 30 days”) and chaining only works while the previous response is still retrievable. Conversations exist specifically to avoid that TTL problem.
If you are migrating a typical SaaS chat assistant, start with chaining (fastest), then move to Conversations once you’ve proven correctness.
## The migration gotchas that break production
- `n` is gone. Chat Completions can return multiple parallel generations via `n`. In Responses, that parameter is removed: you get one generation per request. If you relied on "pick best of N," you'll need to run multiple requests (or redesign the UX).
- Chaining requires retrievable responses. If you plan to use `previous_response_id`, set `store: true` for the responses you want to reference.
- Chaining is not "free context." Even when you chain with `previous_response_id`, previous input tokens are billed as input tokens in the API.
- Conversations avoid TTL. Response objects are saved for 30 days by default; conversation objects and their items are not subject to that 30-day TTL.
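If you depended on `n` for best-of-N sampling, the replacement is N parallel requests plus your own selection step. A minimal sketch, where `generate` (your wrapper around `openai.responses.create`) and `score` (your ranking heuristic) are hypothetical functions you supply:

```javascript
// Replace Chat Completions' `n` with N parallel Responses requests.
// `generate` and `score` are placeholders: `generate` wraps
// openai.responses.create and returns output text; `score` ranks candidates.
async function bestOfN(generate, score, prompt, n = 3) {
  const candidates = await Promise.all(
    Array.from({ length: n }, () => generate(prompt))
  );
  // Pick the highest-scoring candidate; ties go to the earliest request.
  return candidates.reduce((best, c) => (score(c) > score(best) ? c : best));
}
```

Note this multiplies your token cost by N, which is part of why rethinking the UX may be the better trade.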
## Step 1: the smallest possible port (no tools, no schema, no streaming)
### Before (Chat Completions)

```javascript
import OpenAI from "openai";

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-5",
  messages: [{ role: "user", content: "Summarize this in 3 bullets: ..." }],
});

const text = completion.choices[0].message.content;
```
### After (Responses)

```javascript
import OpenAI from "openai";

const openai = new OpenAI();

const response = await openai.responses.create({
  model: "gpt-5",
  input: "Summarize this in 3 bullets: ...",
});

const text = response.output_text;
```
That is intentionally boring. You ship it behind a feature flag, compare outputs, and only then turn on the advanced stuff.
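One thing worth wiring into that comparison harness: `response.output_text` is a convenience field, and the underlying text lives inside `output` message items. A manual walk, sketched against the documented item shape (`type: "message"` items containing `output_text` content parts); verify against your SDK version:

```javascript
// Fallback to response.output_text: walk the output array and concatenate
// text parts from message items, skipping non-message items (tool calls, etc.).
function extractText(response) {
  return (response.output ?? [])
    .filter((item) => item.type === "message")
    .flatMap((item) => item.content ?? [])
    .filter((part) => part.type === "output_text")
    .map((part) => part.text)
    .join("");
}
```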
## Step 2: map messages → input items (the “real” migration)
In Chat Completions, everything is a message.
In Responses, you send an `input`, which can be:
- a single string, or
- an array of items (messages, tool calls, tool outputs, etc.).
A practical mapping that works for most apps:
```javascript
const input = [
  { role: "developer", content: "You are a precise, safe assistant." },
  { role: "user", content: "Draft a short onboarding email." },
];

const response = await openai.responses.create({ model: "gpt-5", input });
```
If your app previously relied on system messages, move that top-level behavior into a developer message in Responses-style integrations so it stays clearly separated from user content. If your SDK or model does not accept `role: "developer"`, use `role: "system"` for the same intent.
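If you already hold history as Chat Completions messages, the port can be mechanical. A sketch that remaps the role name (system → developer) and passes everything else through unchanged:

```javascript
// Convert a Chat Completions `messages` array into a Responses `input` array.
// Only the role name changes; content passes through as-is.
function messagesToInput(messages) {
  return messages.map(({ role, content }) => ({
    role: role === "system" ? "developer" : role,
    content,
  }));
}
```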
## Step 3: pick a storage posture (`store`) before you add features
From the official migration guide:
- Responses are stored by default.
- Chat Completions are stored by default for new accounts.
- To disable storage, set `store: false`.
If you handle regulated or sensitive content, do not defer this decision until the end.
## Step 4: multi-turn conversations (chaining vs Conversations API)
### Option A — chaining with `previous_response_id`
Chaining is the smallest diff from your current “send history” loop:
```javascript
let previous_response_id;

for (const userText of ["Plan the trip", "Now make it cheaper"]) {
  const response = await openai.responses.create({
    model: "gpt-5",
    input: userText,
    store: true,
    previous_response_id,
  });

  previous_response_id = response.id;
  console.log(response.output_text);
}
```
Chaining only works if the previous response is retrievable, so set `store: true` for any step you plan to reference later. This is perfect for “single session” assistants (support chats, in-app wizards, etc.), as long as you understand the TTL constraint for stored responses.
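Because stored responses have a bounded lifetime, decide up front what you do when the stored response is too old to chain on. A small guard, assuming you record each response's `created_at` Unix timestamp; the 30-day ceiling and the 1-day safety margin are assumptions to tune:

```javascript
// Decide whether chaining on a stored response is still safe, or whether
// you should fall back to resending history / a durable Conversation.
// createdAt and nowSeconds are Unix timestamps in seconds.
const DAY_SECONDS = 24 * 60 * 60;

function canChain(createdAt, nowSeconds = Date.now() / 1000, ttlDays = 30, marginDays = 1) {
  const ageDays = (nowSeconds - createdAt) / DAY_SECONDS;
  return ageDays < ttlDays - marginDays;
}
```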
### Option B — durable Conversations
If you need stable state across time (days/weeks) or across devices, create a Conversation once and reuse it:
```python
from openai import OpenAI

client = OpenAI()

conversation = client.conversations.create()
conversation_id = conversation.id

response = client.responses.create(
    model="gpt-5",
    conversation=conversation_id,
    input=[{"role": "user", "content": "Summarize last week’s customer feedback and cluster themes."}],
)

print(response.output_text)
```
OpenAI’s conversation-state guide positions Conversations as the durable primitive, designed to store items like messages and tool results without you manually replaying the entire history.
## Step 5: structured outputs (JSON schema) — `response_format` → `text.format`
If you already use JSON schema with Chat Completions, this is a high-value migration because it reduces “JSON almost” failures.
In Responses, structured outputs move from `response_format` to `text.format`.
Minimal example:
```javascript
const response = await openai.responses.create({
  model: "gpt-5",
  input: "Jane, 54 years old",
  text: {
    format: {
      type: "json_schema",
      name: "person",
      strict: true,
      schema: {
        type: "object",
        properties: {
          name: { type: "string", minLength: 1 },
          age: { type: "number", minimum: 0, maximum: 130 },
        },
        required: ["name", "age"],
        additionalProperties: false,
      },
    },
  },
});
```
Your “migration safe” rule: wire schema outputs behind a flag, log parse failures, then roll forward as you gain confidence.
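Even with strict schemas, treat parsing as fallible: return a result instead of throwing, so callers can log the failure and fall back. A minimal wrapper over `JSON.parse`; the logging hook is a placeholder for your own observability:

```javascript
// Parse structured output defensively: return { ok, data } or { ok, error }
// instead of throwing, so a malformed payload becomes a logged event,
// not a crashed request handler.
function parseStructured(outputText) {
  try {
    return { ok: true, data: JSON.parse(outputText) };
  } catch (error) {
    // Hook your logger/metrics here instead of letting the request 500.
    return { ok: false, error: String(error) };
  }
}
```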
## Step 6: tool calling (function tools) — keep orchestration explicit
Responses is where OpenAI’s hosted tools and tool interface are designed to live.
Even if you do not use hosted tools, the migration is a good moment to clean up tool calling:
- keep tool execution server-side
- validate tool args
- treat tool outputs as untrusted inputs (because they are)
If you already have function tools in Chat Completions, the concept stays the same; the main work is updating to the Responses request/response shape and event model.
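Validating tool arguments before execution is cheap insurance. A sketch, assuming the model returns arguments as a JSON string and you keep a per-tool validator map (`validators` and the tool names are illustrative):

```javascript
// Validate model-proposed tool arguments before executing anything.
// `validators` maps tool name -> predicate over the parsed arguments.
function safeToolArgs(validators, toolName, rawArguments) {
  const validate = validators[toolName];
  if (!validate) throw new Error(`Unknown tool: ${toolName}`);
  const args = JSON.parse(rawArguments); // model output: treat as untrusted
  if (!validate(args)) throw new Error(`Invalid arguments for ${toolName}`);
  return args;
}
```

In production you would likely swap the predicate for a real schema validator, but the shape stays the same: reject before you execute.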
If you want the bigger picture on tool standardization, read: Why MCP is Becoming the Default Standard for AI Tools in 2026
## Step 7: streaming — expect an event stream, not just token deltas
If you currently stream tokens from Chat Completions, do not assume a 1:1 mapping. In Responses, streaming is an event stream where output text, tool calls, and summaries can arrive as distinct event types.
Operationally:
- stream to the UI, but buffer to your logs
- record `response.id` and tool events
- test cancellation and timeouts (tool-heavy flows change the failure modes)
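The shape to plan for is a switch over event types. A sketch over a mocked event sequence; the type names follow the documented Responses stream events (`response.output_text.delta`, `response.completed`), but verify them against the SDK version you ship:

```javascript
// Reduce a Responses event stream into UI text plus a log of notable events.
// Works over any iterable of { type, ... } events (here, a mocked array).
function consumeStream(events) {
  let text = "";
  const log = [];
  for (const event of events) {
    switch (event.type) {
      case "response.output_text.delta":
        text += event.delta;                         // token-ish text for the UI
        break;
      case "response.completed":
        log.push(`completed:${event.response.id}`);  // record response.id
        break;
      default:
        log.push(event.type);                        // tool calls, errors, ...
    }
  }
  return { text, log };
}
```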
## A pragmatic migration checklist (copy/paste into your issue tracker)
- State: pick stateless vs chaining vs Conversations (and document why).
- Storage: set an explicit `store` value (do not rely on defaults).
- Schemas: move JSON schema from `response_format` → `text.format`.
- Streaming: update your client to handle event streams (not only token deltas).
- Observability: add request IDs, tool logs, and a retry policy for tool calls.
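A retry policy for tool calls can start as small as a backoff wrapper; the attempt count and base delay below are illustrative defaults to tune:

```javascript
// Retry a flaky async tool call with exponential backoff.
// attempts/baseMs are illustrative defaults; tune for your tools.
async function withRetry(fn, attempts = 3, baseMs = 200) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === attempts - 1) throw error;              // out of attempts
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
}
```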
## Where this is heading (so you don’t get surprised later)
If you are still using the older Assistants API, the official migration guide notes it is scheduled to be deprecated, with a target sunset date of August 26, 2026. That makes 2026 the right year to consolidate on Responses as the core primitive.
If you want a broader “what changes in your dev loop” read: GPT‑5.4 Is Here: A Developer Playbook for Faster, Safer Agents