Scaling a web app when traffic spikes

Your app runs fine for months, then a TV segment or a Product Hunt launch sends ten thousand people at it in an hour and the whole thing falls over. The good news: spikes are one of the most solvable problems in web development, and you rarely need the expensive rewrite a panicked vendor will try to sell you. The bad news: the fix depends entirely on what breaks first, and most teams guess wrong.

First, figure out what actually breaks

When traffic spikes, four things tend to fail, usually in this order: the database, then your app server's memory or CPU, then third-party APIs you call on every request, then bandwidth. Almost nobody runs out of bandwidth anymore. Almost everybody runs out of database.

The reason is simple. Your web server can usually spin up more copies of itself cheaply. Your database is one machine holding the truth, and every request that reads or writes goes through it. Under a spike, that single machine gets a thousand connections asking for the same homepage data and politely dies.

So before you spend a dollar, get one number: where is time going under load? A load test tool (k6 and Artillery are both free and take an afternoon to wire up) pointed at a staging copy will tell you whether you fall over at 200 concurrent users or 20,000, and which layer gives out. Without that number, scaling work is just expensive guessing.
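To make "get one number" concrete, here is a minimal sketch of what a load test measures. It is not a replacement for k6 or Artillery. The names `run_load_test` and `request_fn` are hypothetical, and `request_fn` stands in for whatever function issues one request against your staging copy.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def run_load_test(request_fn, concurrency, requests_per_user):
    """Call request_fn from `concurrency` threads and summarize what happened."""

    def one_user(_user_index):
        samples = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            try:
                request_fn()
                ok = True
            except Exception:
                ok = False  # any exception counts as a failed request
            samples.append((time.perf_counter() - start, ok))
        return samples

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        per_user = pool.map(one_user, range(concurrency))
        results = [sample for user in per_user for sample in user]

    latencies = sorted(elapsed for elapsed, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    return {
        "requests": len(results),
        "error_rate": errors / len(results),
        # With n=20 the last cut point is the 95th percentile.
        "p95_seconds": quantiles(latencies, n=20)[-1],
    }
```

Run it at rising concurrency (say 50, 200, 1,000) and note the level where the error rate or the 95th percentile latency jumps. That level, plus whichever layer was saturated at the time, is the number to plan around.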

The cheap fixes that buy you the most time

Caching is the highest-leverage thing you can do

If the same page or API response is being computed over and over for different users, you are doing pointless work. A cache stores the computed result so the next thousand requests get it for free.

Three layers, cheapest first:

CDN caching for anything public and mostly static: marketing pages, product listings, blog posts, images. A CDN (Cloudflare, Fastly, the one built into Vercel) serves these from servers near the user, so a cached request never reaches your app. This alone can absorb most of a spike on a content-heavy site, and it costs tens of dollars a month, not thousands.
Application caching with Redis for data that is expensive to compute but does not change every second. Think "top 50 products" or "this user's dashboard summary." You compute it once, store it for 30 seconds or 5 minutes, and serve everyone the cached copy.
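Database query caching for the handful of slow queries that show up in every page load: store their results (in a materialized view or a summary table refreshed on a schedule, for example) instead of recomputing them on every request.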
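For the CDN layer, most of the work is telling the CDN what it may keep and for how long, which is done with the standard `Cache-Control` response header. The sketch below shows one way to choose that header per route. The paths and durations are assumptions for illustration, and the year-long lifetime for static files assumes their filenames change whenever their contents do.

```python
def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a request path (hypothetical routes)."""
    # Per-user or money-related pages: never store these in any cache.
    if path.startswith(("/checkout", "/account", "/api/cart")):
        return "private, no-store"
    # Fingerprinted static assets: safe to cache for a year.
    if path.startswith(("/static/", "/images/")):
        return "public, max-age=31536000, immutable"
    # Public pages: browsers keep them 60 seconds, shared caches
    # such as a CDN keep them 5 minutes (s-maxage).
    return "public, max-age=60, s-maxage=300"
```

Your framework would attach the returned value to each response. The CDN then serves repeat requests for public pages on its own until the lifetime runs out.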

The honest tradeoff with caching is staleness. A cached price might be a minute old. For most apps that is completely fine. For a checkout total or an inventory count, it is not, and you need to be deliberate about which data can be stale and which cannot. That decision is a business call, not a technical one, which is exactly why it is worth your time as the person who owns the product.
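The "compute it once, store it briefly" pattern and its staleness tradeoff fit in a few lines. This is a sketch only: `TTLCache` is a hypothetical in-memory stand-in, whereas in production the store would be Redis, where a key can be written with an expiry (`SET key value EX seconds`).

```python
import time


class TTLCache:
    """In-memory stand-in for a shared cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, ttl_seconds, compute):
        now = self._clock()
        hit = self._items.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive work
        value = compute()  # missing or expired: do the work once
        self._items[key] = (now + ttl_seconds, value)
        return value
```

The `ttl_seconds` argument is where the business decision lives. A "top 50 products" list might get 300 seconds. A checkout total should not go through the cache at all.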

Fix the database before you scale it

Half the "we need a bigger database" requests we get are actually missing indexes. A query that scans an entire table to find one row will run fine with a hundred rows and crawl with a million. Adding the right index can turn a two-second query into a two-millisecond one, and it costs nothing but an hour of an engineer's time.
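You can watch a missing index get fixed by asking the database for its query plan before and after. The sketch below uses SQLite because it ships with Python. The `orders` table and the index name are made up for the example, and on Postgres the equivalent check would be `EXPLAIN` on the slow query.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the plan text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (email, total) VALUES (?, ?)",
    [(f"user{i}@example.com", i * 1.5) for i in range(10_000)],
)

lookup = "SELECT * FROM orders WHERE email = ?"
before = query_plan(conn, lookup, ("user42@example.com",))

conn.execute("CREATE INDEX idx_orders_email ON orders (email)")
after = query_plan(conn, lookup, ("user42@example.com",))

print(before)  # plan without the index: a scan of the whole table
print(after)   # plan with the index: a search that names idx_orders_email
```

The exact wording of the plan varies by SQLite version, but the change from a scan to an indexed search is the thing to look for.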

After indexes, the next win is connection pooling. Databases handle a limited number of simultaneous connections (often a couple hundred), and under a spike your app can try to open thousands. A pooler (PgBouncer for Postgres, or a managed equivalent) sits in front and shares a small set of connections across all those requests. This is frequently the difference between surviving a spike and watching the database refuse every new request.
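The idea behind a pooler can be sketched as a fixed set of connections that requests borrow and return. This is an illustration of the concept, not how PgBouncer is implemented. `ConnectionPool`, `handle_many`, and the `connect` callable are hypothetical names.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: callers wait for a free connection instead of opening one."""

    def __init__(self, connect, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(connect())  # open every connection up front

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks while all connections are busy; raises queue.Empty on timeout.
        conn = self._free.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._free.put(conn)  # always hand the connection back


def handle_many(pool, request_count):
    """Simulate concurrent requests and return the peak number of connections in use."""
    lock = threading.Lock()
    state = {"in_use": 0, "peak": 0}

    def handle():
        with pool.connection():
            with lock:
                state["in_use"] += 1
                state["peak"] = max(state["peak"], state["in_use"])
            # A real handler would run its query here.
            with lock:
                state["in_use"] -= 1

    threads = [threading.Thread(target=handle) for _ in range(request_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]
```

With a pool of 3, `handle_many(pool, 50)` never reports a peak above 3. The database sees three connections no matter how many requests arrive, and the extra requests wait briefly instead of being refused.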

When you actually need to add machines

Once caching and database tuning are done, the remaining move is adding capacity. There are two flavors and they are not the same.

Horizontal scaling means running more copies of your app behind a load balancer. This is the good one. If your app is stateless (it does not store anything important in the memory of a single server), you can go from two copies to twenty in minutes, and most cloud platforms will do it automatically when traffic rises. Autoscaling on a managed platform is the default answer for spiky traffic.
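Autoscalers generally apply a proportional rule: if each copy is carrying more load than you want, add copies until it is not. The sketch below has the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler. The function name, the load units, and the limits of 2 and 20 copies are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas, current_load, target_load,
                     min_replicas=2, max_replicas=20):
    """Proportional scaling rule: keep per-copy load near target_load.

    current_load and target_load are per-copy averages in the same unit,
    for example percent CPU or requests per second.
    """
    wanted = math.ceil(current_replicas * current_load / target_load)
    # Clamp so a quiet night never scales to zero and a spike never
    # scales past what the budget (or the database) can take.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(2, current_load=90, target_load=60))   # 3
print(desired_replicas(2, current_load=900, target_load=60))  # 20 (capped)
```

The upper limit matters as much as the formula. Twenty app copies each opening their own database connections can move the bottleneck straight back to the database.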

Vertical scaling means buying a bigger single machine. Sometimes necessary for the database, which is hard to split. But it has a ceiling and a single point of failure, so treat it as a stopgap.

The thing that quietly breaks horizontal scaling is hidden state: user sessions stored in server memory, files saved to local disk, an in-process job queue. Each of those means a request has to hit the exact server that holds its data, which defeats the point. Moving sessions to Redis and files to object storage (S3 and friends) is the unglamorous prep work that makes everything above possible. If your app was not built with this in mind, budget for it.
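Here is the session case in miniature. In this sketch a plain dictionary stands in for Redis, and `AppServer` is a hypothetical class. The point is only that every copy of the app reads and writes the same external store.

```python
class AppServer:
    """One copy of the app. It keeps no session data of its own."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # shared, external store (Redis in production)

    def login(self, session_id, user):
        self.sessions[session_id] = {"user": user}

    def whoami(self, session_id):
        session = self.sessions.get(session_id)
        return session["user"] if session else None


shared_store = {}  # stand-in for Redis: one store that every server talks to
server_a = AppServer("a", shared_store)
server_b = AppServer("b", shared_store)

server_a.login("sess-123", "dana")
print(server_b.whoami("sess-123"))  # dana
```

If each server held its own private dictionary, the second call would return `None` and the user would appear logged out whenever the load balancer picked a different copy.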

Rough costs and timelines

These are rough figures, because the real number depends on your stack and how the app was built:

CDN and basic caching setup: a few days of engineering, plus tens of dollars a month in service costs. Best return on investment by a wide margin.
Database indexing and connection pooling: a few days to a week, mostly free in infrastructure cost.
Redis application caching: one to two weeks depending on how many things you cache, plus maybe $20 to $200 a month for a managed Redis instance.
Making the app stateless for horizontal scaling: anywhere from a week to a couple of months, depending on how much hidden state exists. This is the one that surprises people.
Autoscaling infrastructure: usually configuration, not code, so days rather than weeks. The cost is variable by design: you pay for the extra machines only while the spike is happening.

A realistic first pass for a typical app (caching, indexes, pooling, autoscaling) is two to four weeks and lands you in a much better place. The multi-month numbers only appear when the app needs structural surgery to go stateless.

What to watch out for

Do not scale blind. Paying for a database three sizes too big to cover a missing index is the most common money pit we see.
Test the spike, do not just hope. A load test on staging the week before a known launch is cheap insurance.
Watch the third parties. Your payment provider, email service, or that one analytics API can rate-limit you under load even when your own servers are fine. Know their limits.
Background work matters too. A flood of signups also means a flood of welcome emails and webhooks. If those run inline with the request, they will choke. Move them to a background queue.
Beware the vendor selling a rewrite. Most spike problems are configuration and caching, not architecture. A rewrite is occasionally the right call, but it should be the last resort, not the first quote.
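The background-work item above comes down to one change: the request handler records a job and returns, and a separate worker does the slow part. The sketch below shows that shape with Python's in-process queue purely for illustration. In production the queue should live outside the app (in Redis or a hosted queue service, for instance), since an in-process queue is exactly the kind of hidden state that breaks horizontal scaling. All names here are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
sent = []  # stand-in for a real email provider


def send_welcome_email(address):
    sent.append(address)  # a real implementation would call the email service here


def worker():
    """Pull jobs off the queue one at a time until told to stop."""
    while True:
        address = jobs.get()
        try:
            if address is None:  # sentinel value: shut the worker down
                break
            send_welcome_email(address)
        finally:
            jobs.task_done()


def handle_signup(address):
    """Request handler: record the job and return without waiting for the email."""
    jobs.put(address)
    return {"status": "created"}


threading.Thread(target=worker, daemon=True).start()
```

During a signup flood the queue grows and the worker drains it at whatever pace the email provider allows, while signups themselves keep returning quickly.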

The takeaway

Find what breaks first with a load test, cache aggressively, fix your indexes and connection pooling, then add machines with autoscaling. In that order. Most teams can survive their next big spike for the cost of a couple of weeks of focused work, not a rebuild, as long as they tune before they buy.

If you have a launch on the calendar and a nagging worry about whether the app will hold, that worry is worth taking seriously a few weeks early rather than the night before.
