Multi-tenant architecture without the headaches

Most multi-tenant horror stories start the same way: someone forgets one WHERE tenant_id = ? and customer A sees customer B's invoices. The architecture itself is rarely the hard part. The hard part is making tenant isolation something you cannot accidentally skip, even at 2am on a Friday deploy.

This post walks through the decisions that actually matter when you build multi-tenant SaaS, with real Postgres and TypeScript you can use. No theory for its own sake.

Pick an isolation model, then live with it

There are three common ways to separate tenants. Pick based on your compliance needs and your team size, not on what sounds impressive.

Shared schema, shared tables

Every row carries a tenant_id column. One database, one schema, all tenants mixed together and filtered by that column. This is the default for a reason: it is cheap, it scales to thousands of tenants on one box, and migrations run once.

The risk is obvious. Isolation lives entirely in your query logic, so one missing filter leaks data. We solve that below with row-level security so the database enforces it, not your hope and good intentions.

Schema per tenant

One Postgres schema per tenant, same database. You get cleaner separation and per-tenant migrations, but the operational cost climbs fast. Running a migration across 4,000 schemas is slow and error-prone, and connection pooling gets awkward because you keep calling SET search_path. This model fits when you have tens or low hundreds of tenants, often larger enterprise accounts.

Database (or cluster) per tenant

Full physical isolation. Great for regulated industries or a handful of big customers who demand it contractually. It is also the most expensive to operate and the slowest to onboard new tenants. Do not reach for this until a customer is paying you enough to justify the ops burden.

For the vast majority of B2B SaaS, shared schema with row-level security is the right starting point. It is one more reason we reach for Postgres first on new projects. You can always promote a noisy or sensitive tenant to its own database later. Start simple.
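
One way to keep that promotion path open is to decide early where "which database does this tenant live in" gets answered. The sketch below is only an illustration: `TenantRoutes`, `databaseUrlFor`, and the connection strings are hypothetical names and values, not part of any library.

```typescript
// Hypothetical routing table: tenants listed here have been promoted
// to a dedicated database; everyone else stays on the shared one.
type TenantRoutes = ReadonlyMap<string, string>;

function databaseUrlFor(
  tenantId: string,
  dedicated: TenantRoutes,
  sharedUrl: string,
): string {
  // Fall back to the shared database unless the tenant was promoted.
  return dedicated.get(tenantId) ?? sharedUrl;
}

// Example wiring with made-up values.
const sharedUrl = "postgres://shared.internal/app";
const dedicated: TenantRoutes = new Map([
  ["big-customer", "postgres://big-customer.internal/app"],
]);
```

If every connection goes through a function like this from day one, promoting a tenant later becomes a data move plus one new map entry rather than a refactor.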

Make the database enforce tenant boundaries

The single best thing you can do is stop trusting your application code to remember the filter. With Postgres row-level security (RLS), the database applies the tenant filter to every query itself: reads only see the current tenant's rows, and writes aimed at another tenant are rejected.

Here is the setup. You store the current tenant in a session variable, and a policy checks it on every read and write.

-- Every tenant-scoped table carries the column.
create table invoices (
  id           uuid primary key default gen_random_uuid(),
  tenant_id    uuid not null references tenants(id),
  amount_cents integer not null,
  status       text not null default 'draft',
  created_at   timestamptz not null default now()
);
 
create index on invoices (tenant_id);
 
-- Turn RLS on and force it even for the table owner.
alter table invoices enable row level security;
alter table invoices force row level security;
 
-- Reads and writes must match the tenant in the session.
create policy tenant_isolation on invoices
  using (tenant_id = current_setting('app.tenant_id')::uuid)
  with check (tenant_id = current_setting('app.tenant_id')::uuid);

The using clause filters which existing rows a query can see or modify. The with check clause rejects any insert or update that would leave a row carrying another tenant's tenant_id. The force row level security line matters more than people expect: without it, the table owner bypasses the policy, and your migration user or pooled app user is often the owner. Superusers and roles with the BYPASSRLS attribute skip policies either way, so the app should never connect as one of those.

Your application sets the variable at the start of each request, inside the same transaction as the work:

import { Pool, type PoolClient } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function withTenant<T>(
  tenantId: string,
  work: (client: PoolClient) => Promise<T>,
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query("begin");
    // set_config with `is_local = true` scopes the value to this
    // transaction, so it never leaks to the next request that
    // reuses this pooled connection.
    await client.query("select set_config('app.tenant_id', $1, true)", [
      tenantId,
    ]);
    const result = await work(client);
    await client.query("commit");
    return result;
  } catch (err) {
    // Roll back, but do not let a failed rollback mask the original error.
    await client.query("rollback").catch(() => {});
    throw err;
  } finally {
    client.release();
  }
}

The is_local = true flag is the detail that bites people. Connection pools reuse connections across requests. If you set the variable as a regular session setting, the next request on that same connection inherits the previous tenant. Transaction-scoped settings clean themselves up on commit or rollback, so you stay safe.

Now even if a junior dev writes select * from invoices with no filter, Postgres returns only the current tenant's rows. The boundary moved from your code review process into the engine.
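
To make that concrete, here is a minimal sketch of a query function with no tenant filter at all. `Queryable`, `InvoiceRow`, and `listDraftInvoices` are hypothetical names; `Queryable` is a stand-in for the database client so the sketch has no driver dependency, and it assumes the invoices table and policy shown earlier.

```typescript
// Stand-in for the database client, so this sketch needs no driver.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<{ rows: unknown[] }>;
}

interface InvoiceRow {
  id: string;
  amount_cents: number;
}

// Note what is missing: there is no tenant_id predicate here. The
// assumption is that the RLS policy adds it inside the database.
async function listDraftInvoices(client: Queryable): Promise<InvoiceRow[]> {
  const { rows } = await client.query(
    "select id, amount_cents from invoices where status = $1 order by created_at",
    ["draft"],
  );
  return rows as InvoiceRow[];
}
```

Called as withTenant(tenantId, (client) => listDraftInvoices(client)), this should only ever see that tenant's drafts, because the session variable set by the wrapper is what the policy compares against.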

Resolve the tenant once, at the edge

Decide early how a request maps to a tenant, and do it in exactly one place. The common options:

Subdomain: acme-corp.yourapp.com. Clean URLs, easy to brand, needs wildcard DNS and TLS.
Path prefix: yourapp.com/t/acme-corp. Simplest to ship, no DNS work.
Header or token claim: the tenant id rides inside the session JWT. Best for APIs and mobile clients.
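
Each of the three options boils down to a small pure function. The sketch below is one possible shape, not a prescribed API: `tenantFromHost`, `tenantFromPath`, `tenantFromClaims`, the `/t/` prefix pattern, and the `tenant_id` claim name are all assumptions made for illustration.

```typescript
// Subdomains that are never tenants (hypothetical list).
const RESERVED = new Set(["www", "app", "api", "admin"]);

// Subdomain: "acme-corp.yourapp.com" -> "acme-corp".
function tenantFromHost(host: string, baseDomain: string): string | null {
  const hostname = host.replace(/:\d+$/, "").toLowerCase(); // drop any port
  const suffix = "." + baseDomain;
  if (!hostname.endsWith(suffix)) return null; // apex domain or foreign host
  const sub = hostname.slice(0, -suffix.length);
  if (sub === "" || sub.includes(".") || RESERVED.has(sub)) return null;
  return sub;
}

// Path prefix: "/t/acme-corp/invoices" -> "acme-corp".
function tenantFromPath(pathname: string): string | null {
  const match = /^\/t\/([a-z0-9-]+)(?:\/|$)/.exec(pathname);
  return match?.[1] ?? null;
}

// Token claim: expects the payload of a JWT that was already verified.
function tenantFromClaims(claims: Record<string, unknown>): string | null {
  const value = claims["tenant_id"]; // claim name is an assumption
  return typeof value === "string" && value !== "" ? value : null;
}
```

Whichever one you pick, returning null for "no tenant" forces the caller to handle that case explicitly instead of silently treating the apex domain or a malformed path as a tenant.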

In Next.js, middleware is the natural spot to resolve it before any route handler runs:

// middleware.ts
import { NextResponse, type NextRequest } from "next/server";
 
export function middleware(req: NextRequest) {
  const host = req.headers.get("host") ?? "";
  const sub = host.split(".")[0];

  // Copy the incoming headers and drop any client-supplied value so
  // the slug can only come from the host.
  const requestHeaders = new Headers(req.headers);
  requestHeaders.delete("x-tenant-slug");

  // Reserved subdomains are not tenants.
  if (!["www", "app", "api", "admin"].includes(sub)) {
    requestHeaders.set("x-tenant-slug", sub);
  }

  // Forward the headers on the request. Headers set on the response
  // object would go to the browser, not to route handlers.
  return NextResponse.next({ request: { headers: requestHeaders } });
}
 
export const config = {
  matcher: ["/((?!_next/static|_next/image|favicon.ico).*)"],
};

The key discipline: route handlers never parse the host themselves. They read the resolved tenant, look up the id, and pass it to withTenant. One source of truth, so there is one place to get it right and one place to audit.
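
A route handler following that discipline might lean on a helper like the one sketched here. `HeaderReader`, `TenantLookup`, `UnknownTenantError`, and `resolveTenantId` are hypothetical names, and the lookup function stands in for whatever slug-to-id query your app uses. The standard Headers object has a compatible get method, so it can be passed in directly.

```typescript
// Minimal shape of a header bag; the standard Headers class fits it.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical lookup signature: slug in, tenant id (or null) out.
type TenantLookup = (slug: string) => Promise<string | null>;

class UnknownTenantError extends Error {}

async function resolveTenantId(
  headers: HeaderReader,
  lookup: TenantLookup,
): Promise<string> {
  // Read only the already-resolved slug; never re-parse the host here.
  const slug = headers.get("x-tenant-slug");
  if (!slug) {
    throw new UnknownTenantError("no tenant resolved for this request");
  }
  const tenantId = await lookup(slug);
  if (!tenantId) {
    throw new UnknownTenantError(`unknown tenant slug: ${slug}`);
  }
  return tenantId;
}
```

Throwing instead of returning a default keeps a request with no tenant from ever reaching withTenant with a made-up id.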

The leaks nobody warns you about

A few things slip past the obvious WHERE clause and cause real incidents.

Background jobs run with no tenant context

Your request path is locked down, but the nightly billing worker or the queue consumer runs outside any HTTP request, and it is easy to forget to set app.tenant_id there. Best case, the policy above fails closed: current_setting finds no tenant and the query errors out. Worst case, someone gets the job working by running it as a role that bypasses RLS, and it leaks. Make every job carry its tenantId in the payload and route it through the same withTenant wrapper.
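
One way to make that hard to skip is a job runner that refuses to start without a tenant. This is a sketch under assumptions: `TenantJob`, `WithTenant`, and `runTenantJob` are hypothetical names, and the wrapper is passed in as a parameter so the same function works with the real withTenant or with a fake in tests.

```typescript
// Hypothetical job envelope: tenantId is required, not optional.
interface TenantJob<P> {
  tenantId: string;
  payload: P;
}

// Shape of the request-path wrapper, generic over the client type.
type WithTenant<C> = <T>(
  tenantId: string,
  work: (client: C) => Promise<T>,
) => Promise<T>;

async function runTenantJob<C, P, T>(
  job: TenantJob<P>,
  withTenant: WithTenant<C>,
  handler: (client: C, payload: P) => Promise<T>,
): Promise<T> {
  // Payloads come off a queue as untyped JSON, so check at runtime too.
  if (typeof job.tenantId !== "string" || job.tenantId === "") {
    throw new Error("job has no tenantId; refusing to run without tenant context");
  }
  return withTenant(job.tenantId, (client) => handler(client, job.payload));
}
```

Handlers only ever receive a client that is already scoped, so there is no code path where a job touches the database before the tenant is set.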

Caching across tenants

Any cache key that omits the tenant id is a data leak waiting to happen. A memoized getSettings() keyed only on "settings" will hand tenant B the config you cached for tenant A. Always prefix cache keys with the tenant: tenant:${tenantId}:settings.
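
A small wrapper can make the unprefixed key impossible to write in the first place. The sketch below uses a plain in-memory Map; `TenantCache` is a hypothetical name, and the same idea would apply to whatever cache client you actually use.

```typescript
// Hypothetical in-memory cache whose API makes the tenant id mandatory,
// so callers cannot build a key without a tenant prefix.
class TenantCache<V> {
  private store = new Map<string, V>();

  // Assumes tenant ids contain no ":" (true for UUIDs).
  private key(tenantId: string, name: string): string {
    return `tenant:${tenantId}:${name}`;
  }

  get(tenantId: string, name: string): V | undefined {
    return this.store.get(this.key(tenantId, name));
  }

  set(tenantId: string, name: string, value: V): void {
    this.store.set(this.key(tenantId, name), value);
  }
}
```

With this shape, a memoized getSettings has to take a tenantId just to compile, which turns the leak into a type error instead of an incident.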

Aggregate and admin queries

Your internal admin dashboard genuinely needs to read across tenants. Do not solve this by disabling RLS on your main app user. Use a separate, clearly named database role for admin reads, created with the BYPASSRLS attribute, and keep it out of the request path entirely. The blast radius of a mistake stays contained to the admin tool.

The "noisy neighbor" problem

One tenant running a huge export can starve everyone else on a shared database. Watch for it before a customer complains. Per-tenant rate limits on expensive endpoints and a separate connection pool for heavy background work go a long way. When a single tenant outgrows the shared box, that is your signal to promote them to a dedicated database, which the architecture above already supports.
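
As a starting point for per-tenant rate limits, here is a minimal in-process token bucket. It is a sketch, not a production limiter: `TenantRateLimiter` and its parameters are hypothetical, and state lives in one process, so a multi-instance deployment would need a shared store instead.

```typescript
// Hypothetical per-tenant token bucket. Each tenant holds up to
// `capacity` tokens, refilled at `refillPerSecond`; one call costs one.
class TenantRateLimiter {
  private buckets = new Map<string, { tokens: number; updatedAt: number }>();
  private capacity: number;
  private refillPerSecond: number;
  private now: () => number;

  constructor(
    capacity: number,
    refillPerSecond: number,
    now: () => number = () => Date.now(), // injectable clock for tests
  ) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
  }

  tryAcquire(tenantId: string): boolean {
    const t = this.now();
    // A tenant seen for the first time starts with a full bucket.
    const bucket = this.buckets.get(tenantId) ?? {
      tokens: this.capacity,
      updatedAt: t,
    };
    const elapsedSeconds = (t - bucket.updatedAt) / 1000;
    const tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond,
    );
    const allowed = tokens >= 1;
    this.buckets.set(tenantId, {
      tokens: allowed ? tokens - 1 : tokens,
      updatedAt: t,
    });
    return allowed;
  }
}
```

Because buckets are keyed by tenant, one customer exhausting their export budget has no effect on what anyone else is allowed to do.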

Takeaway

Start with a shared schema and a tenant_id on every table, then let Postgres row-level security enforce the boundary so your code physically cannot forget it. Resolve the tenant in one place, scope it to the transaction with set_config(..., true), and remember that background jobs and caches need the same discipline as your request path. Do that, and multi-tenant stops being scary and becomes boring, which is exactly what you want from your data layer.

If you would rather have someone build this with you, that is the kind of thing we do.