The last year of “AI agents” has been full of flashy demos and fragile workflows.
GPT‑5.4 is the first release in a while that changes what your default engineering loop can be, especially if you are building anything that touches tools: shells, repos, browsers, MCP servers, or internal APIs.
On March 5, 2026, OpenAI introduced GPT‑5.4 and rolled it out across ChatGPT, the API, and Codex. (Introducing GPT‑5.4)
This post is the “what do I do on Monday?” version: how to pick between GPT‑5.4 / mini / nano, how to use long context without turning your app into a token furnace, and the safety rails you should copy if you are shipping agentic workflows.
If you want the bigger picture about why tool access changes the risk profile, read: AI Coding Agents Need Guardrails, Not More Autonomy.
What changed (the practical diff)
GPT‑5.4 is not just “better at tasks.” It is a new default for reasoning + coding + tool use in one model family.
Here are the parts that matter for builders:
- Long context becomes a first-class feature. GPT‑5.4 supports a 1,050,000 token context window with up to 128,000 output tokens. (GPT‑5.4 model docs)
- Tools are part of the contract. In the Responses API, GPT‑5.4 explicitly supports things like web search, MCP, hosted shell, computer use, apply patch, and tool search. (GPT‑5.4 model docs)
- A clear 3-tier lineup exists. OpenAI shipped GPT‑5.4 mini and GPT‑5.4 nano on March 17, 2026, positioning them for high-volume workloads while keeping “GPT‑5.4 class” behavior. (Introducing GPT‑5.4 mini and nano)
The takeaway is not “upgrade because it’s new.” The takeaway is: you can finally build a sane production stack where:
- small models do cheap, repeated work
- mid models run sub-agents and tight iterations
- the big model does the hard planning and “final answer” pass
That structure matters more than squeezing a few points out of a benchmark.
The model routing pattern you should adopt now
The biggest mistake teams make after a flagship release is using the most expensive model for everything, then blaming “LLMs are too costly.”
Treat GPT‑5.4 like a capability tier, not a single default.

Recommended default tiering
- GPT‑5.4 nano: simple high-volume work (classification, extraction, routing, ranking) where latency + price dominate. (GPT‑5.4 nano model docs)
- GPT‑5.4 mini: “real work, but repeated a lot” (sub-agents, first-pass drafts, refactors, triage, code review summaries). (GPT‑5.4 mini model docs)
- GPT‑5.4: hard cases (multi-step planning, repo-wide changes, tool-heavy tasks, the final merge-ready output). (GPT‑5.4 model docs)
Then add a simple escalation rule:
- start cheap
- measure failures
- escalate only when the task is ambiguous, high-impact, or repeatedly failing
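The escalation ladder above can be sketched in a few lines. The model names, task buckets, and thresholds here are illustrative assumptions, not official defaults; `attempt_fn` stands in for whatever call-plus-validation step your pipeline uses:

```python
# Sketch of 3-tier routing with a simple escalation ladder.
# Tier names and task buckets are illustrative, not official defaults.

TIERS = ["gpt-5.4-nano", "gpt-5.4-mini", "gpt-5.4"]

def route(task_kind: str) -> int:
    """Pick a starting tier index from a coarse task classification."""
    if task_kind in {"classify", "extract", "rank"}:
        return 0  # cheap, high-volume work
    if task_kind in {"draft", "refactor", "triage"}:
        return 1  # real work, repeated a lot
    return 2      # planning, repo-wide changes, final output

def run_with_escalation(task_kind, attempt_fn, max_escalations=2):
    """Start cheap; escalate one tier only when the attempt fails validation."""
    tier = route(task_kind)
    for _ in range(max_escalations + 1):
        ok, result = attempt_fn(TIERS[tier])
        if ok:
            return TIERS[tier], result
        tier = min(tier + 1, len(TIERS) - 1)
    return TIERS[tier], result  # best effort after exhausting escalations
```

The point of the explicit ladder is that every escalation is observable: you can log which tier each task landed on and feed that back into your evals.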
This is where evals stop being “nice-to-have” and become operational.
If you want a minimal eval workflow that actually catches regressions, read: Prompt Testing Is Becoming Mandatory: A Practical Promptfoo Evals Workflow.
Long context is powerful, but it will punish you if you are sloppy
The 1M context window is real, but it does not make “stuff the repo into the prompt” a good strategy.
Two practical rules:
1) Treat long context like memory bandwidth
Long context is best used for:
- the “one-shot” ingestion of a repo or document set
- multi-file reasoning where the relationships matter (interfaces, configs, migrations)
- pulling evidence for a decision (what changed, where, and why)
It is not best used for:
- dumping the entire repo on every call
- re-sending stable documents that you could cache
- letting your agent re-derive facts every time instead of storing a compact summary
2) Plan for compaction, not endless growth
In Codex, OpenAI notes that GPT‑5.4 includes experimental support for the 1M context window, and developers can try this by configuring model_context_window and model_auto_compact_token_limit. (Introducing GPT‑5.4)
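As a concrete sketch, the opt-in could look like this in a Codex `config.toml`. The two keys come from the release notes; the values (and where you set the file) are illustrative assumptions, not recommended settings:

```toml
# config.toml — values are illustrative, not recommended defaults
model = "gpt-5.4"
model_context_window = 1050000           # opt in to the experimental 1M window
model_auto_compact_token_limit = 300000  # compact history before it grows past this
```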
Even if you are not using Codex, the principle holds:
- keep a rolling “working set” (current files + current diffs + current constraints)
- compact older context into an explicit summary artifact
- re-hydrate only what you can cite or re-check
That is how long-context apps stay stable instead of turning into expensive, drifting conversations.
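A minimal sketch of that working-set discipline, using a character budget as a stand-in for a token budget and a caller-supplied `summarize` function (in practice, a cheap-model call). All names here are illustrative:

```python
# Rolling working set with compaction: recent material stays verbatim,
# older material is folded into a compact summary artifact.

from dataclasses import dataclass, field

@dataclass
class Context:
    summary: str = ""                                 # compact artifact of older turns
    working_set: list = field(default_factory=list)   # recent (label, text) pairs
    budget_chars: int = 8000                          # stand-in for a token budget

    def size(self) -> int:
        return len(self.summary) + sum(len(t) for _, t in self.working_set)

    def add(self, label: str, text: str, summarize) -> None:
        self.working_set.append((label, text))
        # Compact oldest entries into the summary instead of growing forever.
        while self.size() > self.budget_chars and len(self.working_set) > 1:
            old_label, old_text = self.working_set.pop(0)
            self.summary += f"\n[{old_label}] " + summarize(old_text)

    def prompt(self) -> str:
        parts = ["Summary of earlier context:", self.summary]
        parts += [f"--- {label} ---\n{text}" for label, text in self.working_set]
        return "\n".join(parts)
```

The summary is an explicit artifact you can inspect and re-check, which is exactly what keeps the conversation from drifting.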
The hidden cost spike: the “over 272K tokens” multiplier
GPT‑5.4’s long context is not billed at the base rate no matter how large the prompt gets.
OpenAI’s model docs note that for models with a 1.05M context window (including GPT‑5.4), prompts with more than 272K input tokens are priced at 2× input and 1.5× output for the full session. (GPT‑5.4 model docs)
That means long-context agents need basic hygiene:
- cache stable inputs when you can
- avoid re-sending large, unchanged payloads
- treat “full repo in context” as a special mode, not a default
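A quick way to see the cliff: the 272K threshold and 2×/1.5× multipliers below are from the model docs, but the base per-million-token prices are placeholders, not real list prices:

```python
# Estimate request cost with the long-context multiplier:
# prompts above 272K input tokens are billed at 2x input / 1.5x output.
# Base per-million-token prices are placeholders, not real list prices.

LONG_CONTEXT_THRESHOLD = 272_000

def estimate_cost(input_tokens, output_tokens,
                  base_in_per_m=1.0, base_out_per_m=4.0):
    over = input_tokens > LONG_CONTEXT_THRESHOLD
    mult_in, mult_out = (2.0, 1.5) if over else (1.0, 1.0)
    return (input_tokens / 1e6) * base_in_per_m * mult_in \
         + (output_tokens / 1e6) * base_out_per_m * mult_out

def should_warn(input_tokens):
    """Flag requests about to cross into the surcharge tier."""
    return input_tokens > LONG_CONTEXT_THRESHOLD
```

Note that the multiplier applies to the whole request, not just the tokens past the threshold, so a guard like `should_warn` before sending is worth the two lines.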
Copy these safety rails if you are shipping tool use
Better tool use makes agents more useful.
It also makes them easier to ship unsafely.
OpenAI’s GPT‑5.4 system card documentation emphasizes the safety side of “thinking models” and deployment considerations. If your app can call tools, treat that as a security boundary, not a feature. (GPT‑5.4 thinking system card)
Practical guardrails you should adopt:
- Sandbox execution (containers, restricted FS, limited network) as the default.
- Allowlists for high-risk tool calls, especially shell commands and write actions.
- Human approval for irreversible operations (deletes, migrations, prod deploys).
- Audit logs for tool calls and outputs that affect state.
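The four rails above compose into one wrapper around every tool call. This is a minimal sketch of the allowlist + approval + audit pattern; the tool names and the `approve` hook are illustrative, not any framework’s actual API:

```python
# Guardrail wrapper for agent tool calls: allowlist + approval + audit log.
# Tool names and the approval hook are illustrative placeholders.

import time

ALLOWED_TOOLS = {"read_file", "run_tests", "apply_patch"}
NEEDS_APPROVAL = {"apply_patch"}   # irreversible / state-changing actions

AUDIT_LOG = []

def guarded_call(tool: str, args: dict, execute, approve=lambda t, a: False):
    """Run a tool call only if allowlisted; require approval for risky ones."""
    entry = {"ts": time.time(), "tool": tool, "args": args, "status": "denied"}
    if tool not in ALLOWED_TOOLS:
        AUDIT_LOG.append(entry)
        return {"ok": False, "error": f"tool '{tool}' not on allowlist"}
    if tool in NEEDS_APPROVAL and not approve(tool, args):
        entry["status"] = "rejected"
        AUDIT_LOG.append(entry)
        return {"ok": False, "error": "human approval required"}
    result = execute(tool, args)
    entry["status"] = "executed"
    AUDIT_LOG.append(entry)
    return {"ok": True, "result": result}
```

Deny-by-default matters here: the `approve` hook returns `False` unless you wire in a real human-in-the-loop step, so forgetting to configure it fails closed rather than open.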
If you are building around MCP, treat servers as code: version them, scope them, and assume untrusted content can try to steer the agent. The infrastructure is getting standardized fast; your safety posture has to keep up. (Why MCP Is Becoming the Default Standard for AI Tools in 2026)
A minimal “Monday upgrade” checklist
If you only do a few things after this release, do these:
- Adopt 3-tier routing (nano / mini / flagship) with clear escalation rules.
- Add evals for your top 10 tasks + top 10 failure modes.
- Implement long-context hygiene (caching, compaction, working-set discipline).
- Ship tool guardrails (sandbox + approvals + logs).
GPT‑5.4 makes agent workflows more viable — but it does not remove the need for engineering discipline. It raises the ceiling; it does not fix the floor.
Sources
- OpenAI: Introducing GPT‑5.4 (March 5, 2026): https://openai.com/index/introducing-gpt-5-4/
- OpenAI: Introducing GPT‑5.4 mini and nano (March 17, 2026): https://openai.com/index/introducing-gpt-5-4-mini-and-nano/
- OpenAI API docs: GPT‑5.4 model page (context window, tools, pricing): https://developers.openai.com/api/docs/models/gpt-5.4
- OpenAI API docs: GPT‑5.4 mini model page: https://developers.openai.com/api/docs/models/gpt-5.4-mini
- OpenAI API docs: GPT‑5.4 nano model page: https://developers.openai.com/api/docs/models/gpt-5.4-nano
- OpenAI deployment safety: GPT‑5.4 thinking system card: https://deploymentsafety.openai.com/gpt-5-4-thinking