Lazy Devs · 5 min read

Adding AI features to your web app, pragmatically

How to add AI features without lighting money on fire: what an LLM is actually good for, rough costs and timelines, and the failure modes nobody warns you about.

Every product roadmap this year has "add AI" on it somewhere, usually written by someone who has not been told what that costs or how often it goes wrong. The good news: a well-scoped AI feature can ship in a couple of weeks and genuinely help users. The bad news: the demo that wowed everyone in the meeting is the easy 80 percent, and the last 20 percent is where the budget and the trust go to die.

Here is how we think about it when a client asks us to put AI in their app, with real tradeoffs and rough numbers.

Start with the job, not the technology

The first question is never "which model" or "do we fine-tune." It is "what is the user trying to do, and is a language model actually the right tool?" A lot of requests that arrive labeled "AI" are better served by a search index, a few SQL queries, or a boring rules engine that you can debug at 2am.

Language models are genuinely good at a specific shape of problem: turning messy, unstructured text into something useful, or turning structure into readable text. Concretely, the features that tend to pay off:

  • Summarizing long content (support threads, documents, meeting notes) into a few lines.
  • Drafting first-pass text a human will edit (replies, product descriptions, outreach).
  • Classifying or tagging free-form input (routing a support ticket, detecting sentiment).
  • Extracting structured fields from unstructured text (pulling an address and total off a pasted invoice).
  • Answering questions over your own documents, where the answer exists in your content but is hard to find.
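Classification is the easiest of these to make concrete. Below is a minimal Python sketch, assuming a hypothetical set of ticket labels (`LABELS`); the actual model call is left out. It constrains the model to a fixed set of answers and rejects anything outside that set, so a rambling reply goes to a human instead of silently misrouting the ticket.

```python
# Hypothetical label set for routing support tickets; replace with your own.
LABELS = ("billing", "bug", "account", "other")


def build_classify_prompt(ticket_text: str) -> str:
    """Combine a fixed instruction with the user's ticket text."""
    options = ", ".join(LABELS)
    return (
        f"Classify the support ticket below into exactly one of: {options}.\n"
        "Reply with the label only, no explanation.\n\n"
        f"Ticket:\n{ticket_text}"
    )


def parse_label(model_reply: str, fallback: str = "needs_human") -> str:
    """Accept the reply only if it is one of the allowed labels."""
    label = model_reply.strip().lower().rstrip(".")
    return label if label in LABELS else fallback


# The model call itself is omitted; these strings stand in for typical replies.
print(parse_label("Billing."))         # billing
print(parse_label("Probably a bug?"))  # needs_human
```

The same pattern carries over to extraction: ask for a fixed set of fields and reject any reply that does not match that shape.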

The features that tend to disappoint: anything where a wrong answer is expensive and unverifiable, anything requiring real-time correctness about your live data unless you wire it up carefully, and anything where users expect deterministic behavior. A model that is right 92 percent of the time is magic for draft emails and a liability for calculating someone's tax.

The pragmatic build order

You almost never need to train or host your own model. For the vast majority of web app features, you call a hosted API (Anthropic, OpenAI, or a gateway in front of several) and the model is the easy part. The work is everything around it.

A realistic first version looks like this. You take the user's input, combine it with a clear instruction and any relevant context you pull from your own database, send that to the model, and then validate and display the result. The order that saves you pain:

  1. Prompt and hardcoded examples first. Before any UI, get the prompt working in a script against ten real, ugly inputs from your actual data. If it cannot handle your messiest support ticket, no amount of frontend polish will save it.
  2. Wrap it in one server endpoint. Keep the API key on the server, never in the browser. Stream the response so the user sees text appear instead of staring at a spinner for eight seconds.
  3. Add the guardrails. Validate the output shape, set a token limit, add a timeout and a fallback for when the provider has a bad day (it will).
  4. Then make it pretty. The interface is the last 20 percent, not the first.
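Step 3 is where most of the engineering effort goes. Here is a minimal sketch of those guardrails in Python. It assumes a hypothetical `call_model(prompt, max_tokens)` function that wraps whichever provider SDK you use, and a prompt that asks for JSON with a `summary` field. The code caps tokens, enforces a timeout, validates the output shape, and falls back gracefully on any failure.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

# One shared pool on purpose: leaving a `with ThreadPoolExecutor()` block
# waits for running calls, which would defeat the timeout.
_POOL = ThreadPoolExecutor(max_workers=4)


def fallback() -> dict:
    return {"summary": "Summary unavailable right now.", "ok": False}


def safe_summarize(
    call_model: Callable[[str, int], str],  # hypothetical: (prompt, max_tokens) -> raw text
    prompt: str,
    max_tokens: int = 300,
    timeout_s: float = 10.0,
) -> dict:
    """Call the model with a token cap and a timeout, then validate the output shape."""
    future = _POOL.submit(call_model, prompt, max_tokens)
    try:
        raw = future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback()  # provider is slow: degrade instead of hanging the request
    except Exception:
        return fallback()  # provider error (outage, rate limit, network)
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return fallback()  # model ignored the requested JSON format
    if not isinstance(data, dict) or not isinstance(data.get("summary"), str):
        return fallback()  # valid JSON, wrong shape
    return {"summary": data["summary"], "ok": True}


# Usage with fake providers standing in for a real SDK call.
def fake_good(prompt: str, max_tokens: int) -> str:
    return '{"summary": "Customer wants a refund for a duplicate charge."}'


def fake_slow(prompt: str, max_tokens: int) -> str:
    time.sleep(0.5)  # simulates a provider having a bad day
    return '{"summary": "too late"}'


print(safe_summarize(fake_good, "Summarize: ..."))
print(safe_summarize(fake_slow, "Summarize: ...", timeout_s=0.1))
```

A timed-out call keeps running in the background thread. The request returns on time anyway. Most provider SDKs also accept their own timeout setting, and that is preferable to a client-side wrapper when it is available.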

Retrieval, when the model needs to know your stuff

The single most common real feature is "answer questions about our docs / product / knowledge base." The pattern is retrieval-augmented generation: you store your content as embeddings in a vector index, find the handful of chunks relevant to the question, and hand those to the model along with the question. Instructing the model to answer only from what you gave it dramatically cuts down on confident nonsense, although it does not eliminate it.
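The retrieval step itself is simple arithmetic. This in-memory Python sketch ranks chunks by cosine similarity and builds a prompt that numbers each source so the answer can cite it. The three-dimensional vectors are toy values. Real embeddings come from an embedding model and have hundreds or thousands of dimensions, and in production the vector index does the ranking instead of a Python loop. Function names are illustrative.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, chunks: list[tuple[str, Vector]], k: int = 3) -> list[str]:
    """Return the texts of the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_rag_prompt(question: str, context: list[str]) -> str:
    """Number each source so the model can cite it and the UI can show it."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the numbered sources below and cite them like [1]. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Toy data: in a real app the vectors come from an embedding model.
chunks = [
    ("Refunds are issued within 5 business days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Refund requests need the original order number.", [0.8, 0.3, 0.1]),
]
question_vec = [1.0, 0.2, 0.0]  # pretend embedding of "How do refunds work?"
context = top_k(question_vec, chunks, k=2)
print(build_rag_prompt("How do refunds work?", context))
```

Because the sources are numbered, the UI can show the chunks behind each answer and let users check them.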

If you already run Postgres, you can do this with the pgvector extension and skip adding a whole new database. That alone saves you a piece of infrastructure to run and pay for. The hard part of retrieval is not the vectors; it is chunking your content sensibly and keeping the index fresh when documents change.
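Because chunking and freshness are the hard part, here is one simple approach sketched in Python. It assumes fixed word-count windows with overlap; real systems often split on headings or paragraphs instead. Hashing each chunk lets you re-embed only the pieces that actually changed when a document is edited. Function names and default sizes are illustrative, not recommendations.

```python
import hashlib


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def fingerprint(chunk: str) -> str:
    """Stable content hash; store it next to the embedding in your index."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def chunks_to_reembed(stored_hashes: set[str], new_chunks: list[str]) -> list[str]:
    """Only chunks whose content is new or changed need a fresh embedding call."""
    return [c for c in new_chunks if fingerprint(c) not in stored_hashes]


doc = " ".join(f"w{i}" for i in range(10))
print(chunk_words(doc, size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The reverse check matters too. Stored hashes that no longer appear in the new chunk set belong to stale content and should be deleted from the index.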

What it actually costs

Two cost buckets matter, and people only ever budget for one.

Per-request API cost. This is usually smaller than founders fear. A typical summarize or classify call runs a fraction of a cent to a few cents depending on the model and how much text you send. A chat feature with long context and retrieval can creep toward five to fifteen cents per exchange. The lever you control most is how much text you stuff into each call, so trimming context is the cheapest optimization you have. For most apps the monthly API bill is real but not scary until you reach serious volume.
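The arithmetic behind those numbers is tokens times price. The per-million-token prices below are hypothetical placeholders, not any provider's actual rates, so check your provider's pricing page. The example shows why trimming context is the main lever: input tokens dominate a retrieval chat call.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one call, given prices in USD per million tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


# Hypothetical prices, USD per million input / output tokens.
IN_PRICE, OUT_PRICE = 3.0, 15.0

summarize = cost_usd(2_000, 200, IN_PRICE, OUT_PRICE)     # a document in, a short summary out
rag_chat = cost_usd(20_000, 500, IN_PRICE, OUT_PRICE)     # retrieved chunks + chat history
trimmed_chat = cost_usd(8_000, 500, IN_PRICE, OUT_PRICE)  # same chat, fewer and shorter chunks

for name, c in [("summarize", summarize), ("rag_chat", rag_chat), ("trimmed_chat", trimmed_chat)]:
    print(f"{name}: {c * 100:.2f} cents per call, ${c * 10_000:,.2f} per 10k calls")
```

With these made-up prices, the summarize call costs 0.9 cents and the retrieval chat costs 6.75 cents. Cutting the chat's context from 20,000 to 8,000 input tokens brings it down to 3.15 cents, which is less than half.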

Engineering and operations cost. This is the bucket that surprises people. A focused feature (summarize a document, classify a ticket, draft a reply) is realistically a one to three week build to do properly, including the guardrails and a feedback mechanism. A retrieval-based "chat with our knowledge base" feature is more like three to six weeks once you account for ingesting content, evaluating answer quality, and handling the wrong answers gracefully. These are rough ranges, not quotes, and they assume an existing app to bolt onto.

The ongoing cost is the part nobody mentions: prompts drift, models get deprecated and replaced, and your content changes. Budget for someone to own this after launch, the same way you would for any other live system.

The failure modes to plan for up front

These are not edge cases. They are the default behavior, and ignoring them is how AI features lose user trust in week one.

  • Confidently wrong answers. The model will state false things with the same tone as true ones. Show sources, let users verify, and never auto-execute an irreversible action on a raw model output.
  • Latency. Calls take seconds, not milliseconds. Stream output and design the UI so waiting feels intentional rather than broken.
  • Cost runaway. A loop that calls the model per row, or a public endpoint with no rate limit, can turn a small bill into a large one overnight. Cap tokens, rate-limit, and cache repeated questions.
  • Prompt injection. If user content or fetched web pages reach the model, assume someone will try to hijack the instructions. Keep the model away from anything that can take destructive action on its own.
  • Privacy. Know what data you send to a third-party provider and check your contract and the provider's data policy before you pipe customer records through it.
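For the cost-runaway item, here is a minimal Python sketch of two defenses: a sliding-window rate limit per user and a cache for repeated questions. Both wrap a hypothetical `call_model(question)` function. The class name, limits, and normalization rule are assumptions for illustration. The cache is unbounded and shared across users, so use something like it only for answers that are not personalized, and add expiry and a size cap before production.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class GuardedModel:
    """Wraps a hypothetical call_model(question) -> str with a per-user
    sliding-window rate limit and a cache for repeated questions."""

    def __init__(self, call_model: Callable[[str], str], max_calls: int = 20,
                 window_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.call_model = call_model
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self.calls: dict[str, deque] = defaultdict(deque)  # user id -> recent call times
        self.cache: dict[str, str] = {}  # normalized question -> answer

    def ask(self, user_id: str, question: str) -> str:
        key = " ".join(question.lower().split())  # ignore case and extra whitespace
        if key in self.cache:
            return self.cache[key]  # repeated question: no model call, no cost
        now = self.clock()
        recent = self.calls[user_id]
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()  # forget calls that fell out of the window
        if len(recent) >= self.max_calls:
            raise RuntimeError("rate limit exceeded; try again later")
        recent.append(now)
        answer = self.call_model(question)
        self.cache[key] = answer
        return answer
```

Cache hits do not count against the limit, because they cost nothing. Capping tokens on each call still has to happen where the provider is called.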

The cheapest insurance against all of these is a feedback loop: a thumbs up/down or an edit-before-send step. It keeps a human in charge of anything risky and quietly gives you the data you need to tell whether the feature is actually working.
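The feedback loop takes very little code to be useful. Here is a minimal Python sketch with hypothetical field names. It records each thumbs up or down and whether the user edited the draft, tagged with the prompt version that produced the output. You can then compare versions when you change a prompt or swap models.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    prompt_version: str   # which prompt (and model) produced the output
    helpful: bool         # thumbs up (True) or thumbs down (False)
    edited: bool = False  # user changed the draft before sending it


def stats_by_version(events: list[Feedback]) -> dict[str, dict[str, float]]:
    """Approval rate and edit rate per prompt version."""
    total, up, edited = Counter(), Counter(), Counter()
    for e in events:
        total[e.prompt_version] += 1
        up[e.prompt_version] += e.helpful   # bools count as 0 or 1
        edited[e.prompt_version] += e.edited
    return {
        v: {"approval": up[v] / total[v], "edit_rate": edited[v] / total[v]}
        for v in total
    }


events = [
    Feedback("v1", helpful=True),
    Feedback("v1", helpful=False, edited=True),
    Feedback("v2", helpful=True),
    Feedback("v2", helpful=True, edited=True),
]
print(stats_by_version(events))
```

A falling approval rate or a rising edit rate after a prompt or model change is the early warning that the feature has drifted.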

The takeaway

Pick one feature where a roughly-right answer is genuinely useful and a wrong one is cheap to catch. Build the prompt against your ugliest real data before you build any UI, keep a human in the loop for anything irreversible, and budget for the maintenance, not just the launch. Done that way, AI features are a normal, shippable part of a web app rather than a science project.

If you want a second opinion on whether your AI idea is the one-week kind or the six-week kind, we are happy to look before you commit.

Related service

AI & LLM Integration

Useful AI features, not demos that break in production.


Want this built right?

This is the work we do every day. Tell us what you are building and we will show you exactly how we would ship it.

hello@lazydevsagency.com