If your app already speaks the OpenAI SDK, the easiest way to add more providers is usually not rewriting your whole client layer.
It is putting a gateway in front of it.
That is the real job LiteLLM solves.
Its docs position the proxy as a central LLM gateway with authentication, authorization, cost tracking, rate limiting, virtual keys, and per-project budgets, while the SDK side focuses on a unified OpenAI-style input and output format across providers.
That combination makes LiteLLM useful when you want one app surface for:
- OpenAI for default reasoning and tool use
- Claude for selected writing or analysis workloads
- Gemini for specific latency, multimodal, or Google-stack workflows
This guide is the practical version: how to set up the LiteLLM proxy, point your existing OpenAI client at it, and decide when not to hide provider differences behind one interface.
TL;DR
Use LiteLLM when:
- your app already uses the OpenAI SDK and you want to add providers without rewriting the whole client layer
- you want a single internal endpoint for routing, fallback, budgets, and key management
- you want platform-team control over which models product teams can call
Do not use LiteLLM as an excuse to ignore provider differences.
As of April 10, 2026:
- OpenAI describes the Responses API as a superset of Chat Completions and encourages migration to it
- Anthropic says its OpenAI compatibility layer is mainly for testing/comparison and is not considered a long-term or production-ready solution for most use cases
- Google says support for the OpenAI libraries in Gemini is still in beta while they extend feature support
That means LiteLLM is strongest as your own compatibility layer, not as proof that every provider behaves identically underneath.
What LiteLLM is actually good at
A lot of “one API for all models” tools are really demos.
LiteLLM is more useful when you treat it like infrastructure.
From the official docs, the practical advantages are:
- a consistent OpenAI-style request/response shape
- a proxy server that can sit between your app and multiple providers
- retry and fallback logic across deployments
- spend tracking and per-project budgets
- virtual keys for secure access control
That is a good fit for teams that want to standardize on:
- one application-facing API surface
- one place to apply routing and policy
- one place to meter usage
If that is your actual goal, LiteLLM is cleaner than wiring separate provider SDKs into every service.
The architecture that usually works best
The cleanest setup is:
- your app talks to the OpenAI SDK
- the SDK `base_url` points to your LiteLLM proxy
- LiteLLM routes requests to OpenAI, Anthropic, or Gemini based on the model name you expose internally
That gives you a useful separation of concerns:
- app code stays simple
- provider keys stay behind the proxy
- routing and fallback policy live in one place
If you are already moving your app to Responses, keep that direction. OpenAI explicitly recommends the Responses API as the future path, and LiteLLM’s docs also show Responses support across several providers.
If you still need the migration plan, read: Chat Completions to Responses API: A Practical Migration Guide
Step 1: Create a minimal LiteLLM proxy config
LiteLLM’s proxy quickstart uses a `model_list` plus `litellm_params`, with `master_key` under `general_settings`.
Start simple:
```yaml
model_list:
  - model_name: openai-default
    litellm_params:
      model: openai/<your-openai-model>
  - model_name: claude-default
    litellm_params:
      model: anthropic/<your-claude-model>
  - model_name: gemini-default
    litellm_params:
      model: vertex_ai/<your-gemini-model>

general_settings:
  master_key: sk-your-litellm-master-key
```
The important part is the shape, not the exact model IDs in this example.
Use the current provider model names from the vendor docs or your account dashboard. Export the relevant provider credentials in the environment where the proxy runs rather than hardcoding them into the file.
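Exporting credentials can look like the sketch below. The variable names are the conventional provider ones; confirm against LiteLLM's provider docs for your version before relying on them.

```shell
# Conventional provider credential variables (names assumed; verify in the
# LiteLLM provider docs for your version):
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
# Vertex AI typically authenticates via a service-account file:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```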
Why this shape matters:
- `model_name` is the alias your app will call
- `litellm_params.model` is the real upstream provider/model target
- `master_key` lets the proxy act like a controlled internal gateway instead of an open relay
Step 2: Run the proxy and treat it like an internal service
LiteLLM’s docs show two common startup paths:
- install and run the proxy directly
- run the published Docker image with `--config /app/config.yaml`
The exact install path matters less than the operating model:
- keep the proxy on a stable internal URL
- load provider credentials through environment variables
- keep the config file in version control if it does not contain secrets
For a local test, the docs show the proxy on port 4000.
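The Docker path can be sketched roughly as follows. The image tag is an assumption, so check the LiteLLM docs for the currently published one.

```shell
# Mount your config, pass provider credentials through the environment,
# and expose the proxy on its default local port.
docker run \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  -e OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
```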
Step 3: Point the OpenAI SDK at LiteLLM instead of a provider
Once the proxy is up, your app code can stay close to what you already have.
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-litellm-master-key",
    base_url="http://localhost:4000"
)

response = client.responses.create(
    model="openai-default",
    input="Give me a five-bullet launch checklist for a new internal AI tool."
)
```
To switch providers, change only the model alias:
```python
response = client.responses.create(
    model="claude-default",
    input="Rewrite this internal update in a calmer, more executive tone."
)
```
This is the real productivity gain.
Your application code does not need to understand every provider’s authentication model, endpoint URL, or SDK differences. The proxy absorbs most of that operational complexity.
Step 4: Use aliases that reflect workload, not vendor branding
Most teams make the same mistake here:
They expose raw provider names everywhere.
That locks product decisions to vendors too early.
A better pattern is:
- `reasoning-default`
- `writing-default`
- `fast-default`
- `vision-default`
Then map those aliases to the provider/model that currently makes sense.
That gives you room to change the upstream without rewriting prompts, agent configs, or product logic everywhere else.
It also reduces the chance that teams start depending on a vendor-specific quirk by accident.
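In config terms, the alias pattern is just a different `model_list`. The model IDs below are placeholders, not real names; substitute current ones from vendor docs.

```yaml
# Hypothetical workload-based aliases mapped to current upstream choices.
model_list:
  - model_name: reasoning-default
    litellm_params:
      model: openai/<your-openai-model>
  - model_name: writing-default
    litellm_params:
      model: anthropic/<your-claude-model>
  - model_name: fast-default
    litellm_params:
      model: vertex_ai/<your-gemini-model>
```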
Step 5: Add routing and fallback only after the basic path works
LiteLLM’s docs explicitly call out retry and fallback logic as a core feature. That is useful, but it is also where complexity starts.
Do this in order:
- make one provider work cleanly through the proxy
- add a second provider
- verify that your app logs, traces, and error handling still make sense
- only then add fallback behavior
Why the caution?
Because “fallback” can quietly change:
- output style
- tool-calling behavior
- schema conformance
- latency
- cost
If your app depends on strict structured outputs, a silent provider failover can be more dangerous than a visible error.
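When you do add fallback behavior, it is a config-level rule rather than app code. The sketch below shows the general shape; verify the exact keys and placement against the LiteLLM routing docs for your version.

```yaml
# Assumed shape of a retry/fallback rule; confirm key names in the
# LiteLLM routing docs before using.
litellm_settings:
  num_retries: 2
  fallbacks:
    - openai-default: ["claude-default"]
```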
Step 6: Use budgets and virtual keys for team control
This is where LiteLLM stops being just a convenience wrapper.
The docs position the proxy around:
- project and user spend tracking
- per-project budgets
- virtual keys for secure access control
That matters when multiple people or services share the same gateway.
A practical pattern is:
- one proxy for the organization or platform team
- separate virtual keys per service or team
- budget limits by environment or project
That gives you cleaner attribution and fewer “who burned the entire monthly model budget?” moments.
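Virtual keys are issued through the proxy's key-management endpoint, authenticated with the master key. The field names below follow the LiteLLM key-management docs, but confirm them for your version; the team metadata is illustrative.

```shell
# Generate a scoped virtual key limited to one model alias and a budget.
curl -s http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-your-litellm-master-key" \
  -H "Content-Type: application/json" \
  -d '{
        "models": ["writing-default"],
        "max_budget": 50,
        "metadata": {"team": "docs-tooling"}
      }'
```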
If you care about cost discipline more broadly, this pairs well with: How to Build a Practical AI Workflow Without Wasting Money
The caveat most teams miss: provider compatibility is not the same as feature parity
This is the part worth taking seriously.
LiteLLM can normalize a lot, but it cannot erase the fact that providers still differ.
Anthropic’s own warning is unusually explicit
Anthropic’s OpenAI SDK compatibility docs say the layer is mainly meant for testing and capability comparison and is not considered a long-term or production-ready solution for most use cases.
The same docs also call out meaningful differences, including the fact that the strict parameter for function calling is ignored, so tool-use JSON is not guaranteed to follow the supplied schema.
That is a serious warning if your app depends on tight schema guarantees.
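If `strict` may be ignored upstream, one defensive pattern is validating tool-call arguments yourself before executing the tool. This is a minimal sketch; the required fields are hypothetical and stand in for whatever schema your tool actually declares.

```python
import json

# Hypothetical required fields for a tool this app exposes; the point is to
# validate tool-call arguments yourself when schema enforcement is not guaranteed.
REQUIRED_FIELDS = {"query": str, "max_results": int}

def validate_tool_args(raw_arguments: str) -> dict:
    """Parse and check tool-call arguments instead of trusting the provider."""
    args = json.loads(raw_arguments)  # raises ValueError on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(args.get(field), expected_type):
            raise ValueError(f"tool args failed validation on field {field!r}")
    return args

ok = validate_tool_args('{"query": "litellm proxy", "max_results": 3}')
```

Failing validation visibly, instead of executing malformed tool calls, is exactly the "visible error over silent drift" trade-off this section argues for.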
Google still labels its OpenAI library support as beta
Google’s Gemini OpenAI-compat docs say support for the OpenAI libraries is still in beta while they extend feature support.
That does not mean “do not use it.”
It does mean you should not assume provider behavior is stable just because your client library still compiles.
The practical rule
Use LiteLLM to simplify your app interface.
Do not assume it makes every provider equally good at:
- structured outputs
- tool calling
- reasoning traces
- multimodal input formats
- provider-specific features like prompt caching, citations, or PDF-native flows
If one of those features is core to the product, use the provider’s native API path for that workflow instead of forcing full abstraction.
When LiteLLM is a strong default
Choose LiteLLM when:
- you run multiple LLM-backed services and want one internal gateway
- your team wants centralized auth, routing, budgets, and observability
- most of your app logic can live comfortably on OpenAI-style primitives
- swapping providers is a business or reliability requirement
When LiteLLM is the wrong abstraction
Skip it, or keep it out of the hot path, when:
- you need deep provider-specific features that do not map cleanly to a common interface
- you rely on exact schema behavior across tool calls
- you want the newest provider features immediately, before abstraction layers catch up
- your system is simple enough that one provider SDK is easier than running a gateway
In other words:
LiteLLM is best when your problem is platform coordination, not just “I want more model options.”
A production-minded rollout plan
If I were setting this up from scratch, I would do it in this order:
- Standardize the app on Responses first.
- Put LiteLLM in front of one provider.
- Add a second provider behind a separate model alias.
- Add tracing and cost logging before fallback rules.
- Add virtual keys and budgets before broader team rollout.
- Keep an escape hatch for provider-native calls where abstraction becomes limiting.
If you need the observability side of that stack, start here: Langfuse vs Phoenix vs Helicone (2026): Choosing an LLM Observability Stack
The simplest mental model
Do not think of LiteLLM as “magic multi-provider support.”
Think of it as:
- an internal LLM gateway
- with an OpenAI-shaped app contract
- plus routing, policy, and spend controls
That is a real infrastructure role.
And it is usually more valuable than chasing perfect one-to-one provider compatibility.
One final note: if LiteLLM is in your stack, keep your dependency hygiene tight. The supply-chain risk is not theoretical. This earlier incident breakdown is worth reviewing with your team: LiteLLM’s PyPI Compromise Is a Worst-Case Supply-Chain Incident for AI Teams