If you’re building RAG or semantic search, you will eventually face a deceptively simple choice:
- Keep vectors in Postgres (via pgvector) so your embeddings live next to the rest of your data.
- Run a dedicated vector database (like Qdrant) that’s built specifically for high-performance vector + filter workloads.
Both are valid. Picking the wrong one usually looks like:
- everything feels fine in dev…
- then you add metadata filters, multi-tenancy, or higher QPS…
- then your “vector search feature” becomes your highest-maintenance system.
This post is a decision framework (with practical setup notes), not a benchmark shootout.
If you’re still deciding whether you even need RAG, start here: RAG vs Long Context (2026).
TL;DR (the default choice in 2026)
| Your reality | Default pick | Why |
|---|---|---|
| You already run Postgres, want one system, and your workload is modest. | pgvector | One database, ACID transactions, SQL filtering/joins, backups you already understand. |
| You expect heavy vector QPS, lots of filtering, or multiple collections/tenants. | Qdrant | A purpose-built engine for vector+filter search with clustering/sharding and vector-focused ops knobs. |
| You’re unsure and want the lowest-risk path. | pgvector → Qdrant later | Start simple in Postgres, design your IDs/embeddings pipeline so you can dual-write and migrate when needed. |
First, clarify what you’re actually building
Most “vector database” debates are really about two different problems:
- Vector similarity search: nearest neighbors in embedding space.
- Vector + filters: nearest neighbors inside a constrained subset (tenant, ACLs, tags, recency window, product catalog filters).
If you only need #1 on a small corpus, pgvector can feel magical.
If you need #2 at scale, your system’s real workload becomes “filtered ANN search”, and that’s where dedicated engines earn their keep.
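To make "filtered ANN search" concrete, here's a minimal exact (brute-force, not approximate) sketch in Python; all names are illustrative. It filters first, then ranks by cosine distance — an ANN engine has to do both at once, which is the hard part:

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), the metric behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def filtered_knn(query, points, tenant_id, k=2):
    """Exact filtered k-NN: restrict to one tenant, then rank by distance."""
    candidates = [p for p in points if p["tenant_id"] == tenant_id]
    candidates.sort(key=lambda p: cosine_distance(query, p["embedding"]))
    return [p["id"] for p in candidates[:k]]

points = [
    {"id": 1, "tenant_id": "acme",   "embedding": [1.0, 0.0]},
    {"id": 2, "tenant_id": "acme",   "embedding": [0.0, 1.0]},
    {"id": 3, "tenant_id": "globex", "embedding": [1.0, 0.1]},
]
print(filtered_knn([1.0, 0.0], points, "acme", k=1))  # [1]
```

Brute force is O(corpus size) per query, which is exactly why approximate indexes exist — and why combining them with filters is the interesting engineering problem.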
What pgvector is (and why it’s so appealing)
pgvector is a Postgres extension that lets you store vectors in tables and query them with vector distance operators, using:
- Exact search
- Approximate search with IVFFlat and HNSW indexes
- Multiple vector types (including half-precision and binary) and multiple distance functions
The big win is architectural: it’s not “a new database”. It’s an extra column type in the database you already rely on.
HNSW vs IVFFlat (how to choose)
If you’re using approximate search, pgvector gives you two index families with different tradeoffs:
- HNSW: better speed/recall tradeoff, but slower builds and more memory.
- IVFFlat: faster builds and less memory, but typically worse speed/recall tradeoff.
Practical rule: if you’re already “living” inside Postgres and you can afford the memory, HNSW is often the default. If you need cheaper memory and faster builds (or you want tight control over recall/speed with “lists + probes”), consider IVFFlat.
The two knobs you’ll tune first (and why)
For production, you usually end up tuning query-time recall vs speed:
- HNSW: `hnsw.ef_search`
- IVFFlat: `ivfflat.probes`
If your results look “weird” under filtering (few results, low recall), it’s often because filtering is eliminating candidates after the ANN index scan. In pgvector, raising `hnsw.ef_search`/`ivfflat.probes` is the first lever.
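As an illustration, these are session-level settings (the values are illustrative starting points, not recommendations — tune against your own recall/latency measurements):

```sql
-- HNSW: scan a larger candidate list at query time (default 40)
SET hnsw.ef_search = 100;

-- IVFFlat: probe more lists at query time (default 1)
SET ivfflat.probes = 10;
```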
Where pgvector shines
- You want simple “search in my own data” without running another production system.
- Your ranking needs SQL joins (for permissions, inventory, billing state, etc.).
- You want transactional updates: update row + vector in one transaction.
- You want one backup/restore story.
Where pgvector hurts
You’re still using Postgres for something it was not originally designed for: high-QPS approximate nearest neighbor queries. You can absolutely do it, but you’ll feel the tradeoffs in:
- index build time and memory footprint
- vacuum/maintenance interactions
- tuning parameters that are “fine” for SQL workloads but not for ANN workloads
- scaling: you mostly scale Postgres by scaling the Postgres box
That doesn’t mean “don’t use it”. It means: be honest about the growth path.
What Qdrant is (and why teams switch)
Qdrant is a dedicated vector search engine designed around:
- fast approximate nearest neighbor search
- payload (metadata) filtering
- operational features that assume “this is a vector service” (collections, snapshots/backups, sharding/replication)
Instead of “tables + SQL”, you work with collections (vectors + payload) and query via its APIs.
Why payload indexes matter in Qdrant
Qdrant’s model is explicit: vector indexes speed up vector search, and payload indexes speed up filtering. For vector+filter workloads, you typically need both.
That’s why Qdrant encourages you to create payload indexes for the fields you filter on most often (especially the ones that constrain the result set the most).
Where Qdrant shines
- You need filtered search as a first-class feature (and you care about filter performance).
- You have multiple collections (by tenant, product, region, model version).
- You want vector-native operational knobs (segment size, quantization, shards/replicas).
- You want horizontal-ish scaling patterns that fit a service, not a monolith DB.
Where Qdrant hurts
- It’s another production system with its own operational surface area.
- You lose SQL joins; anything “relational” becomes application-side orchestration.
- If your source of truth is Postgres anyway, you now have to reason about sync (dual write, CDC, or backfills).
The core tradeoff: “joins and transactions” vs “vector-first operations”
Here’s the mental model that rarely fails:
- If your vector search needs to behave like a query inside a product database, pgvector is a natural fit.
- If your vector search needs to behave like a high-throughput retrieval service, Qdrant is a natural fit.
A practical decision checklist (answer these in order)
1) Do you need SQL joins to rank/filter correctly?
If your access control, pricing rules, or business state lives in relational tables, SQL joins are a superpower.
If you’ll end up doing “vector search → fetch 200 IDs → join in Postgres anyway”, that’s often fine for low/moderate scale, and it points to pgvector.
If you need “vector search where ACL is enforced during retrieval”, Qdrant’s payload filters can be a better match — but only if your ACL state can live in payloads (or be derived there).
2) What’s your expected filtered query ratio?
Many teams start with “top-k nearest neighbors” and later add:
- tenant scoping
- language/region scoping
- recency windows
- category constraints
- permission scopes
If most queries are “vector + several filters”, plan for a vector database sooner than you think.
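As a sketch of where that ends up, here's how the scoping above tends to compose into a single filter (plain Python dicts in Qdrant's `must` filter shape; the field names are illustrative):

```python
import time

def build_filter(tenant_id, language=None, max_age_days=None, categories=None):
    """Compose a Qdrant-style `must` filter (an AND of conditions).
    Every extra scope adds a condition the engine must apply during ANN search."""
    must = [{"key": "tenant_id", "match": {"value": tenant_id}}]
    if language:
        must.append({"key": "language", "match": {"value": language}})
    if max_age_days is not None:
        cutoff = int(time.time()) - max_age_days * 86400
        must.append({"key": "created_at", "range": {"gte": cutoff}})
    if categories:
        must.append({"key": "category", "match": {"any": categories}})
    return {"must": must}

f = build_filter("acme", language="en", max_age_days=30, categories=["docs"])
print(len(f["must"]))  # 4
```

The point of the sketch: a query that starts as "top-k neighbors" quietly becomes "top-k neighbors matching four conditions", and that's the workload you should capacity-plan for.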
3) How big is your corpus (and how fast does it change)?
Corpus size matters, but update patterns matter more:
- If you bulk-load once and rarely update, both can work.
- If you continuously update embeddings, you’ll care about ingestion speed, index maintenance, and compaction.
4) Do you need “one backup button”?
If you’re a small team, one database you already know is often the correct choice.
Running Qdrant is reasonable, but it’s still a separate system with separate backups, restore drills, alerts, and SLOs.
5) Is Postgres already your bottleneck?
If your Postgres instance is already CPU-bound (or your team treats it like a shared, fragile asset), adding ANN workloads can be a political and technical problem.
In that case, isolating retrieval into Qdrant is often worth it just for blast-radius control.
6) Is your “vector index” part of your product’s write path?
If your product writes need to be fully transactional, pgvector is hard to beat: vector + row update in one transaction.
If you can tolerate eventual sync (or you can build a clean dual-write/CDC pipeline), Qdrant becomes much easier to justify.
7) Do you need clustering/sharding soon?
If you’re confident you’ll need to distribute vector search (shards/replicas, multi-node setups), it’s usually simpler to adopt a system that was designed for that.
Minimal setup: pgvector (Postgres)
The key thing to understand: pgvector is “Postgres-native”, so you should lean into Postgres patterns:
- keep a stable primary key for each document/chunk
- store the embedding in a `vector` column
- create an index that matches your search type (exact vs IVFFlat vs HNSW)
Example skeleton:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE doc_chunks (
  id bigserial PRIMARY KEY,
  tenant_id text NOT NULL,
  content text NOT NULL,
  embedding vector(1536) NOT NULL
);

-- One example index option (choose based on your needs)
CREATE INDEX ON doc_chunks USING hnsw (embedding vector_cosine_ops);
```
Then query with your filters in SQL:
```sql
SELECT id, content
FROM doc_chunks
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 10;
```
Filtered search gotcha (and the pgvector escape hatches)
With approximate indexes, it’s easy to accidentally build a system where:
- the ANN scan finds candidates
- your `WHERE` clause eliminates most of them
- you get low recall or not enough results
pgvector documents a few practical mitigations inside Postgres:
- increase `hnsw.ef_search` or `ivfflat.probes` for filtered queries
- consider partial indexes when filtering on a small number of distinct values
- consider partitioning when filtering across many values
- enable iterative index scans (pgvector 0.8.0+) so the query automatically scans more of the index when filters are too restrictive
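For the iterative-scan escape hatch, the settings look like this (pgvector 0.8.0+; session-level, and the cap value is illustrative):

```sql
-- Keep scanning the index when the filter discards too many candidates
-- (options: off | relaxed_order | strict_order)
SET hnsw.iterative_scan = relaxed_order;

-- Optional cap on how much of the index an iterative scan may visit
SET hnsw.max_scan_tuples = 20000;
```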
Minimal setup: Qdrant
Qdrant stores vectors alongside payloads (metadata). Payload filtering is a core concept: you filter by payload keys/values, and you can add indexes on payload fields to speed this up.
In practice, treat Qdrant like a retrieval service:
- upsert points (id + vector + payload)
- query with a vector and a filter
- fetch results (ids + payload) and join back to your product DB if needed
Example: create a payload index + filter with must
If you filter on tenant_id (or any other key), create a payload index so filtering stays fast:
```http
PUT /collections/{collection_name}/index
{
  "field_name": "tenant_id",
  "field_schema": "keyword"
}
```
Then query with a vector and a `must` filter (an AND of conditions):
```http
POST /collections/{collection_name}/points/query
{
  "query": [0.2, 0.1, 0.9, 0.79],
  "filter": {
    "must": [
      { "key": "tenant_id", "match": { "value": "acme" } }
    ]
  },
  "limit": 10
}
```
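In application code, a thin helper that builds this request body keeps the filter logic testable without a running cluster (a pure-Python sketch; the payload key is illustrative):

```python
import json

def query_points_body(vector, tenant_id, limit=10):
    """Build the JSON body for Qdrant's POST /collections/{name}/points/query.
    `must` is an AND: a point must match every condition to be returned."""
    return {
        "query": vector,
        "filter": {
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}}
            ]
        },
        "limit": limit,
    }

body = query_points_body([0.2, 0.1, 0.9, 0.79], "acme")
print(json.dumps(body, indent=2))
```

You would send this body to the endpoint with whatever HTTP client (or the official qdrant-client package) you already use; the shape is the part worth unit-testing.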
A safe migration path: start with pgvector, design for Qdrant
The most robust approach for teams that want speed and a clean scale-up story:
- Start with pgvector to ship the feature quickly.
- Use stable IDs for chunks/documents from day one.
- Keep embeddings generation deterministic (model + normalization documented).
- When you outgrow Postgres retrieval, dual-write to Qdrant (or backfill + CDC).
- Switch reads gradually (feature flag), and keep Postgres as a fallback until stable.
This avoids the “big rewrite” trap.
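The dual-write step can start as simple as this shape (a pure-Python sketch with stand-in stores; in production the two stores would be Postgres and Qdrant clients, and all names here are illustrative):

```python
class InMemoryStore:
    """Stand-in for a vector store (Postgres+pgvector or Qdrant in production)."""
    def __init__(self):
        self.rows = {}

    def upsert(self, chunk_id, embedding, payload):
        self.rows[chunk_id] = {"embedding": embedding, **payload}

class DualWriter:
    """Write to the primary (source of truth) first, then mirror to the secondary.
    A secondary failure is recorded, not fatal: a backfill job can repair it later."""
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.failed = []

    def upsert(self, chunk_id, embedding, payload):
        self.primary.upsert(chunk_id, embedding, payload)   # must succeed
        try:
            self.secondary.upsert(chunk_id, embedding, payload)
        except Exception:
            self.failed.append(chunk_id)  # candidates for backfill/retry

pg, qd = InMemoryStore(), InMemoryStore()
writer = DualWriter(pg, qd)
writer.upsert("doc-1#0", [0.1, 0.2], {"tenant_id": "acme"})
print(pg.rows == qd.rows)  # True
```

The design choice that matters: the primary write is transactional and authoritative, the mirror is best-effort, and the `failed` list is your repair queue — which is exactly the posture you want during a gradual read cutover.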
Operational notes you should not skip (April 2026 reality)
- Patch extensions like dependencies. Postgres extensions can ship security fixes; treat upgrades as part of your normal security cadence.
- Measure retrieval quality separately from infra. Don’t blame the database when the real issue is chunking, embeddings choice, or query rewriting. (Related: Promptfoo: LLM Evals + Red Teaming Workflow.)
- Make filters explicit early. Your “works great” prototype often fails in production because real products have ACLs and multi-tenancy.
Practical recommendations (what I’d ship in 2026)
- If you’re a small team and you already trust Postgres: start with pgvector.
- If you’re designing a retrieval-heavy system with lots of filtering: plan for Qdrant early.
- If you’re unsure: start with pgvector, but design the data model so you can migrate without pain.
There is no universal winner.
The winner is the system you can operate reliably while your product grows.
Sources
- pgvector documentation (README, features, distance operators, IVFFlat/HNSW indexing, filtered search notes): https://github.com/pgvector/pgvector
- PostgreSQL docs (CVE-2026-3172 update for pgvector 0.8.2, February 26, 2026): https://www.postgresql.org/about/news/pgvector-082-gets-security-update-3042/
- Qdrant docs: Payload and filtering concepts (payload indexes, facet counts): https://qdrant.tech/documentation/concepts/payload/
- Qdrant docs: Filtering (must/should/must_not and filter semantics): https://qdrant.tech/documentation/concepts/filtering/
- Qdrant docs: Indexing (payload index types and guidance): https://qdrant.tech/documentation/concepts/indexing/
- Qdrant API reference: Query points endpoint (`/points/query`, filters, limit): https://api.qdrant.tech/api-reference/search/query-points/
- Qdrant docs: Distributed deployment (concepts and configuration): https://qdrant.tech/documentation/concepts/distributed_deployment/
- Qdrant docs: Backups / snapshots (private cloud operator resources): https://qdrant.tech/documentation/private-cloud/backups/