How to build a real-time app

Real-time looks like magic until you build it. A cursor moving on someone else's screen, a message landing the instant it is sent, a dashboard that updates without a refresh. The demo is easy. What is hard is keeping it correct when the network drops, two people edit the same thing, or ten thousand users connect at once. This guide is about that hard part, and how we would build the thing without losing a month to surprises.

What it is and who it is for

A real-time app pushes updates to users the moment something changes, instead of waiting for them to ask. That covers four common shapes, and the shape decides almost everything else:

Chat and messaging. Messages need ordering, delivery guarantees, and history.
Collaboration. Multiple people editing the same document, board, or sheet. The hard problem is conflict resolution.
Live dashboards and feeds. Mostly one-directional. The server pushes, clients read. Simpler than it looks.
Presence. Who is online, who is typing, where their cursor is. Looks trivial, is genuinely annoying at scale.

If you are a founder building any of these, the first useful question is not "WebSockets or not." It is "how wrong can an update be, and for how long, before a user notices or gets hurt?" A trading dashboard and a typing indicator have very different answers, and that tolerance is what justifies the engineering.

The MVP feature set: first versus later

The trap with real-time is building the hard version of everything on day one. You rarely need to.

Build first:

One transport that works everywhere (we usually start with WebSockets, more on that below).
A single server pushing updates to connected clients.
Optimistic UI: show the user's own action immediately, reconcile when the server confirms.
Basic reconnection: when the socket drops, reconnect and refetch current state.
Persistence in Postgres so nothing lives only in memory.
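
A minimal sketch of the optimistic UI item above, assuming each send carries a client-generated tag that the server echoes back when it confirms. The names OptimisticList, addPending, confirm, and view are hypothetical and only illustrate the reconcile step.

```typescript
// Hypothetical client-side list: pending items render immediately and are
// swapped for the server's copy once a confirmation arrives.
interface PendingItem { clientTag: string; body: string; }
interface ConfirmedItem { clientTag: string; body: string; seq: number; }

class OptimisticList {
  private confirmed: ConfirmedItem[] = [];
  private pending: PendingItem[] = [];

  // Show the user's own action right away, before the server answers.
  addPending(item: PendingItem): void {
    this.pending.push(item);
  }

  // On confirmation, drop the matching pending entry and keep the
  // authoritative copy, ordered by the server-assigned seq.
  confirm(item: ConfirmedItem): void {
    this.pending = this.pending.filter((p) => p.clientTag !== item.clientTag);
    if (!this.confirmed.some((c) => c.seq === item.seq)) {
      this.confirmed.push(item);
      this.confirmed.sort((a, b) => a.seq - b.seq);
    }
  }

  // What the UI renders: confirmed items in order, then unconfirmed ones.
  view(): string[] {
    return [
      ...this.confirmed.map((c) => c.body),
      ...this.pending.map((p) => `${p.body} (sending)`),
    ];
  }
}
```

The pending entry is matched by tag rather than by body, so two identical messages sent back to back still reconcile one to one.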

Leave for later:

Horizontal scaling and fan-out across many servers.
Operational transformation (OT) or CRDTs for true concurrent editing.
Presence with cursors and typing indicators.
Per-message delivery receipts and read state.
Offline support and queued sends.

A surprising amount of "real-time" ships with a simple model: clients send actions, the server validates and persists them, then broadcasts the result. Get that loop solid before the fancy parts.
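
The loop described above can be sketched in a few lines. This is an illustration under assumptions, not a production handler: the array stands in for Postgres, the 2000-character limit is arbitrary, and ChatAction, handleAction, and messageStore are hypothetical names.

```typescript
interface ChatAction { roomId: string; authorId: string; body: string; }
interface StoredMessage extends ChatAction { seq: number; }
type Broadcast = (msg: StoredMessage) => void;

// In-memory stand-in for the database; a real server would write to Postgres.
const messageStore: StoredMessage[] = [];

// Returns an error string, or null when the action is acceptable.
function validateAction(action: ChatAction): string | null {
  if (action.body.trim().length === 0) return "empty body";
  if (action.body.length > 2000) return "body too long"; // arbitrary limit
  return null;
}

// Validate, persist, then broadcast. Nothing is broadcast unless it was stored.
function handleAction(action: ChatAction, broadcast: Broadcast): StoredMessage | string {
  const error = validateAction(action);
  if (error !== null) return error;
  const seq = messageStore.filter((m) => m.roomId === action.roomId).length + 1;
  const msg: StoredMessage = { ...action, seq };
  messageStore.push(msg);
  broadcast(msg);
  return msg;
}
```

Broadcasting only after the write succeeds means a client never sees a message that a later refetch cannot return.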

Transport: WebSockets vs SSE vs polling

You have three honest options, and the right one depends on direction.

Polling. The client asks "anything new?" every few seconds. Ugly, but real. For a dashboard that updates every 30 seconds, polling is fine and needs zero new infrastructure.
Server-Sent Events (SSE). A one-way stream from server to client over plain HTTP. Perfect for live feeds, notifications, and dashboards where the client rarely talks back. Reconnection is built into the browser: EventSource retries on its own and reports the last event ID it received, so the server can resume from there. Underrated and simple.
WebSockets. A full two-way connection. The right call for chat and collaboration, where both sides talk constantly. More moving parts: you own the connection lifecycle, heartbeats, and reconnection yourself.

Our default: SSE if the data flows mostly one way, WebSockets if it is a real conversation, polling when the update frequency is low enough that anything else is overkill.
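
To show how little SSE asks of the server, here is a sketch of formatting one event for a text/event-stream response. The function name formatSseEvent and the example payload are hypothetical; the id, event, and data fields and the blank-line terminator are part of the SSE wire format.

```typescript
// Format one Server-Sent Events frame. Setting `id` lets the browser report
// the last id it saw (the Last-Event-ID header) when it reconnects.
function formatSseEvent(
  id: number,
  eventName: string,
  payload: Record<string, unknown>,
): string {
  const json = JSON.stringify(payload);
  // The format needs one "data:" line per line of payload. JSON.stringify
  // without indentation yields a single line, but split defensively anyway.
  const dataLines = json.split("\n").map((line) => `data: ${line}`);
  // The two trailing empty strings produce the blank line that ends the event.
  return [`id: ${id}`, `event: ${eventName}`, ...dataLines, "", ""].join("\n");
}
```

If the id is the same sequence number used for replay, the browser's built-in reconnect tells the server where to resume.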

The hard parts most teams underestimate

This is where budgets disappear. Four things, in rough order of how often they surprise people.

Reconnection and missed messages. Connections drop constantly: tunnels, sleeping laptops, flaky Wi-Fi. The naive client reconnects and keeps showing stale state. The correct client reconnects, asks "what did I miss since message X," and catches up. That means every client tracks a sequence number the server can replay from. Design this on day one. Bolting it on later is painful.
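
A sketch of the client half of that catch-up, assuming the server stamps every message with a per-room sequence number starting at 1. CatchUpTracker and its methods are hypothetical names; how the replay request travels (a query parameter, a first frame on the socket) is left open.

```typescript
interface SeqMessage { seq: number; body: string; }

// Hypothetical client-side tracker: applies messages strictly in order and
// reports when it needs the server to replay what it missed.
class CatchUpTracker {
  private lastSeq = 0;
  readonly applied: SeqMessage[] = [];

  // Returns the seq to replay from if a gap is detected, otherwise null.
  receive(msg: SeqMessage): number | null {
    if (msg.seq <= this.lastSeq) return null;            // duplicate, ignore
    if (msg.seq > this.lastSeq + 1) return this.lastSeq; // gap, ask for replay
    this.applied.push(msg);
    this.lastSeq = msg.seq;
    return null;
  }

  // On reconnect the client sends this so the server can replay the rest.
  resumeFrom(): number {
    return this.lastSeq;
  }
}
```

A message that arrives ahead of a gap is not applied; it comes back as part of the replay, which keeps the applied list in order without a separate buffer.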

Ordering. Messages can arrive out of order, especially across multiple servers, and "whenever it reached the server" is not a reliable order. Assign a monotonic sequence per channel or room and let clients sort by it. Here is the shape of a messages table that makes ordering and replay cheap:

create table messages (
  id          bigserial primary key,
  room_id     uuid not null,
  seq         bigint not null,        -- per-room monotonic counter
  author_id   uuid not null,
  body        text not null,
  client_tag  uuid not null,          -- dedupe optimistic sends
  created_at  timestamptz not null default now(),
  unique (room_id, seq),
  unique (room_id, client_tag)        -- same send retried = one row
);

The seq gives you order and replay ("give me everything after seq 412"). The client_tag lets a client retry a send safely without creating duplicates, which matters the moment reconnection enters the picture.
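
The same two guarantees can be sketched in memory to make the behavior concrete. This is an illustration only: RoomLog, append, and after are hypothetical names, and in a real system the unique constraints in the table do this work, not application code.

```typescript
interface LogEntry { roomId: string; seq: number; clientTag: string; body: string; }

// In-memory stand-in for the messages table.
class RoomLog {
  private entries: LogEntry[] = [];
  private nextSeq = new Map<string, number>();

  // A retried send with the same (roomId, clientTag) returns the original
  // entry, mirroring the unique (room_id, client_tag) constraint.
  append(roomId: string, clientTag: string, body: string): LogEntry {
    const existing = this.entries.find(
      (e) => e.roomId === roomId && e.clientTag === clientTag,
    );
    if (existing !== undefined) return existing;
    const seq = this.nextSeq.get(roomId) ?? 1;
    this.nextSeq.set(roomId, seq + 1);
    const entry: LogEntry = { roomId, seq, clientTag, body };
    this.entries.push(entry);
    return entry;
  }

  // "Give me everything after seq N" for one room, in order.
  after(roomId: string, seq: number): LogEntry[] {
    return this.entries
      .filter((e) => e.roomId === roomId && e.seq > seq)
      .sort((a, b) => a.seq - b.seq);
  }
}
```

Each room counts independently, so a busy room never creates gaps in a quiet one.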

Conflict handling. If two people edit the same field, who wins? For most apps, last-write-wins with a server-assigned timestamp is honestly fine, and you should start there; client clocks drift, so do not trust theirs to pick the winner. Real concurrent text editing (two cursors in the same paragraph) needs CRDTs or operational transformation (OT), which are a serious undertaking. Do not reach for them unless the product requires it. Most "collaboration" features are really last-write-wins with good presence on top.
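
Last-write-wins fits in one function. The sketch below assumes updatedAt is a timestamp stamped by the server and adds a tie-break on writerId, which is an assumption of this example and not something the approach requires; FieldWrite and lastWriteWins are hypothetical names.

```typescript
interface FieldWrite<T> { value: T; updatedAt: number; writerId: string; }

// Keep whichever write has the later timestamp. Exact ties are broken by
// writerId so every replica picks the same winner regardless of arrival order.
function lastWriteWins<T>(a: FieldWrite<T>, b: FieldWrite<T>): FieldWrite<T> {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt > b.updatedAt ? a : b;
  return a.writerId >= b.writerId ? a : b;
}
```

The point of the deterministic tie-break is that the result does not depend on which write a given server happened to see first.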

Presence at scale. Tracking who is online sounds easy until you realize it changes constantly and every change fans out to everyone in the room. We keep presence in Redis with short TTLs that clients refresh via heartbeat, so a dead connection expires on its own instead of leaving ghosts online.
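
The TTL-plus-heartbeat idea can be sketched without Redis to show why ghosts disappear on their own. PresenceTracker is a hypothetical in-memory stand-in; with Redis the expiry would be handled by key TTLs, not by this class, and the 30-second TTL in the usage note is an arbitrary choice.

```typescript
// In-memory stand-in for presence keys with a TTL.
class PresenceTracker {
  private expiresAt = new Map<string, number>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Each heartbeat pushes the user's expiry forward by one TTL.
  heartbeat(userId: string, nowMs: number): void {
    this.expiresAt.set(userId, nowMs + this.ttlMs);
  }

  // Users whose expiry has passed are dropped lazily on read.
  online(nowMs: number): string[] {
    const result: string[] = [];
    this.expiresAt.forEach((expiry, userId) => {
      if (expiry > nowMs) result.push(userId);
      else this.expiresAt.delete(userId);
    });
    return result.sort();
  }
}
```

With a 30-second TTL, a client that heartbeats every 10 seconds stays online, and one that vanishes without a clean disconnect drops off within 30 seconds with no cleanup job.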

Scaling fan-out

The moment you run more than one server, a problem appears: a user connected to server A sends a message, but the people in that room are connected to servers B and C. Server A has no idea they exist.

The standard fix is a pub/sub backbone. When a message arrives, the server persists it, then publishes it to a channel (Redis pub/sub, or a managed equivalent). Every server subscribed to that room receives it and pushes it to its own connected clients. Redis pub/sub is fire-and-forget, so a server that is briefly disconnected misses whatever was published in the gap; that is one more reason to persist first and keep a replay path. Redis handles this well into the tens of thousands of concurrent connections, and when you outgrow it you move to a dedicated message broker. Most products never get there.
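
The fan-out pattern above, sketched with an in-memory bus standing in for Redis pub/sub. FanoutBus and AppServer are hypothetical names, and a real deployment would subscribe per room over the network instead of sharing an object in one process.

```typescript
type Deliver = (roomId: string, payload: string) => void;

// In-memory stand-in for the pub/sub backbone.
class FanoutBus {
  private subscribers: Deliver[] = [];

  subscribe(fn: Deliver): void {
    this.subscribers.push(fn);
  }

  publish(roomId: string, payload: string): void {
    this.subscribers.forEach((fn) => fn(roomId, payload));
  }
}

// One app server: it knows only about its own connected clients.
class AppServer {
  private rooms = new Map<string, Array<(payload: string) => void>>();
  private bus: FanoutBus;

  constructor(bus: FanoutBus) {
    this.bus = bus;
    // Every server hears every publish and forwards to its local clients.
    bus.subscribe((roomId, payload) => {
      (this.rooms.get(roomId) ?? []).forEach((send) => send(payload));
    });
  }

  // Register a local client's send function under a room.
  connect(roomId: string, send: (payload: string) => void): void {
    const clients = this.rooms.get(roomId) ?? [];
    clients.push(send);
    this.rooms.set(roomId, clients);
  }

  // Called once the message has been persisted.
  handleIncoming(roomId: string, payload: string): void {
    this.bus.publish(roomId, payload);
  }
}
```

The receiving server publishes to the bus even for its own clients, so there is a single delivery path and no special case for local recipients.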

The stack we would reach for and why

Transport: native WebSockets or SSE, no heavy framework until we need one.
Server: Node and TypeScript for the connection layer. The event loop is a genuinely good fit for many idle-but-connected sockets, and sharing types between client and server removes a whole category of bugs.
State and fan-out: Postgres as the source of truth, Redis for pub/sub and presence. We wrote about why Postgres is our default and the short version holds here too: it does more than you expect, reliably.
Frontend: React with optimistic updates and a thin sync layer that owns reconnection and replay.
Hosting: a platform that supports long-lived connections. Plenty of serverless setups quietly do not, and finding that out in production is a bad day.

If your needs are mostly collaborative editing, presence, and shared state, that is squarely our real-time collaboration work, and we will tell you honestly when a managed service beats a custom build.

Rough timeline and cost

Ranges, not quotes. Real numbers depend on guarantees, scale, and how nasty your conflict rules are.

Live dashboard or notifications (SSE, one server): roughly 2 to 4 weeks. The cheapest real-time there is.
Chat or commenting (WebSockets, persistence, reconnection): roughly 4 to 8 weeks for something solid.
Real collaboration (presence, multi-server fan-out, conflict handling): roughly 8 to 16 weeks and up, depending on how true the concurrency needs to be.

The expensive variable is almost always correctness under failure: reconnection, ordering, and dedupe. The happy path is fast. Making it not lie to users when the network misbehaves is where the work is.

What to watch out for

Serverless that cannot hold connections. Check this before you commit to a host.
Skipping persistence. If state lives only in server memory, a deploy wipes it. Postgres first.
Building CRDTs you do not need. Start with last-write-wins. Upgrade only if the product forces it.
Ignoring backpressure. A slow client should not be able to stall your server. Bound your queues.
No replay path. If a reconnecting client cannot catch up cleanly, you will get silent data loss and confused users.
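
For the backpressure item above, one way to bound a per-client outbound queue is sketched here. BoundedQueue is a hypothetical name, and dropping the oldest message is only one possible policy; disconnecting the slow client and letting it replay on reconnect is another.

```typescript
// Per-client outbound buffer with a hard cap. When full, the oldest message
// is dropped and counted, so the client can be told to resync.
class BoundedQueue<T> {
  private items: T[] = [];
  private capacity: number;
  dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length >= this.capacity) {
      this.items.shift();
      this.dropped += 1;
    }
    this.items.push(item);
  }

  // Hand over everything queued so far and start empty again.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}
```

A non-zero dropped count is the signal to make that client refetch or replay, since its view now has a hole in it.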

Takeaway

Real-time is not one hard problem; it is five medium ones: transport, persistence, reconnection, ordering, and fan-out. Get the simple loop right first (send, validate, persist, broadcast) and add the fancy parts only when the product earns them. This is exactly the kind of system we build and, just as often, the kind we get called in to fix after a demo that worked once stopped working under real users. If you are planning something live, tell us what you are building and we will give you a straight answer about the parts that are easy and the parts that are not.
