Database indexing for application developers

Most slow apps are not slow because of the framework, the ORM, or the cloud region. They are slow because a query is reading the whole table when it could be reading three rows. Indexing is the single highest-leverage performance skill a backend developer can pick up, and you can get most of the value without becoming a database administrator.

This is the version of database indexing we wish someone had explained to us before we shipped a dashboard that timed out the moment we had real customers.

What an index actually is

An index is a separate, sorted copy of one or more columns, kept in a structure (usually a B-tree) that the database can binary-search instead of scanning every row. The trade is simple: reads get faster, writes get slower, and you spend disk space. That is the whole bargain. Everything below is just learning when the bargain is worth it.

Without an index, a query like WHERE email = 'a@b.com' forces a sequential scan: the database reads every row and checks each one. With a B-tree index on email, it jumps to the matching entry in a handful of steps. On a table of ten rows nobody notices. On a table of two million rows it is the difference between two milliseconds and four seconds.

Read the query plan, not your intuition

You do not guess at indexing. You ask the database what it is doing. In Postgres, prefix any query with EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT id, total
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;

The output tells you whether it used an index or fell back to a scan:

Seq Scan on orders  (cost=0.00..18243.00 rows=18 width=16)
  Filter: (customer_id = 42)
  Rows Removed by Filter: 999982
  Execution Time: 412.331 ms

Seq Scan plus Rows Removed by Filter: 999982 is the smell. The database read a million rows to return eighteen. Add the right index and the same plan becomes:

Index Scan using orders_customer_created_idx on orders
  (cost=0.42..8.61 rows=18 width=16)
  Execution Time: 0.094 ms

Make EXPLAIN ANALYZE a habit. Run it before you add an index and after. If the plan did not change, the index is not being used and you have learned something useful for free.

Composite indexes and why column order matters

Single-column indexes are easy. The part that trips people up is multi-column (composite) indexes, where order is everything.

A composite index on (customer_id, created_at) is sorted first by customer_id, then by created_at within each customer. Think of a phone book sorted by last name, then first name. That structure can serve:

WHERE customer_id = 42
WHERE customer_id = 42 ORDER BY created_at
WHERE customer_id = 42 AND created_at > '2026-01-01'

It cannot efficiently serve WHERE created_at > '2026-01-01' on its own, because created_at is only sorted inside each customer bucket. This is the leftmost-prefix rule: an index on (a, b, c) helps queries that filter on a, or a and b, or all three, but not b alone or c alone.

The practical rule: put equality columns first, then the range or sort column last. For the orders query above, the index you want is:

CREATE INDEX orders_customer_created_idx
ON orders (customer_id, created_at DESC);

The DESC matters because the query sorts newest-first. Match the index direction to the ORDER BY and the database can skip the sort step entirely.

Covering indexes

If a query only needs a couple of columns, you can include them in the index so the database never touches the table at all. This is a covering index, and Postgres expresses it with INCLUDE:

CREATE INDEX orders_customer_created_idx
ON orders (customer_id, created_at DESC)
INCLUDE (total);

Now the SELECT id, total ... WHERE customer_id = ... query can be answered from the index alone (an "Index Only Scan"). You trade a bit more index size for skipping the trip back to the heap. Worth it for hot read paths, not worth it for everything. When even a tuned index is not fast enough for a hot path, that is usually the moment to reach for caching with Redis.

Where it goes wrong in real apps

A few patterns we see constantly in code reviews.

Functions kill indexes. WHERE lower(email) = 'a@b.com' will not use a plain index on email, because the index stores the raw value, not the lowercased one. Either store the value normalized, or build a functional index: CREATE INDEX ON users (lower(email)).

Leading wildcards kill indexes. WHERE name LIKE '%smith' cannot use a B-tree, because the sort order is meaningless when you do not know the start of the string. LIKE 'smith%' is fine. For real text search, reach for full-text search or a trigram index (pg_trgm), not LIKE.

ORMs hide the cost. Prisma, Drizzle, and friends make it trivial to write a query that joins four tables and filters on an unindexed column. The ORM will not warn you. Turn on query logging in development and watch what actually hits the database:

// Drizzle: log every SQL statement during local dev
import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";
 
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
 
export const db = drizzle(pool, {
  logger: process.env.NODE_ENV === "development",
});

With logging on, copy the slow statement into psql, run EXPLAIN ANALYZE, and you will usually find one missing index doing all the damage.

Foreign keys are not automatically indexed. Postgres indexes primary keys and unique constraints, but it does not index the column on the child side of a foreign key. If you query orders by customer_id (you will), you have to add that index yourself.

The cost side of the ledger

Indexes are not free, and a table with fifteen of them is its own kind of problem. Every INSERT, UPDATE, and DELETE has to update every index that touches the changed columns. On a write-heavy table, redundant indexes quietly tax every transaction.

So prune. Postgres tracks index usage in pg_stat_user_indexes:

SELECT relname AS table, indexrelname AS index, idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;

Any index with idx_scan at or near zero after a few weeks of real traffic is a candidate for deletion. It is costing you writes and giving you nothing back.

One more thing for production: on a large live table, CREATE INDEX takes a write lock and can stall your app. Use CREATE INDEX CONCURRENTLY instead. It is slower and cannot run inside a transaction, but it does not block writes while it builds.

The takeaway

You do not need to memorize storage internals. You need three habits: run EXPLAIN ANALYZE on any query that feels slow, index the columns you filter and sort on (equality first, range last), and periodically delete indexes nothing uses. Do that and you will solve the large majority of "the app got slow as we grew" problems before they ever reach a customer.

If you would rather hand the slow-query hunt to someone who does this all day, that is the kind of thing we are happy to take off your plate.

Database indexing for application developers

What an index actually is

Read the query plan, not your intuition

Composite indexes and why column order matters

Covering indexes

Where it goes wrong in real apps

The cost side of the ledger

The takeaway

API & Backend Engineering

Keep reading

Building software for recruiting and staffing agencies

Software for construction and field-service teams still stuck on paper

Want this built right?