Start with one use case
The wrong question: "where can we add AI?" The right one: "which user task is painful, has a measurable outcome, and is something an LLM can clearly help with?"
Good candidates:
- Drafting (emails, replies, descriptions).
- Summarising long content.
- Classifying or routing.
- Search and retrieval.
Bad candidates (without serious investment): autonomous actions, anything irreversible, anything regulated.
Cost control
LLM costs scale with users *and* with verbosity. Defend with:
- Hard per-tenant daily caps.
- Cheaper models for the long tail (Haiku, Mistral, Llama-7b).
- Prompt caching where the provider supports it.
- Streaming responses, so generation can be aborted early when the user closes the tab.
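As a rough illustration of the first two defences, here is a minimal sketch of a hard per-tenant daily cap plus a cheap-model router. The class name `TenantBudget`, the function `pick_model`, the task labels, and the tier names `small-model` / `large-model` are all hypothetical placeholders, not any provider's API. Counting tokens is also an assumption; a cap in currency works the same way.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TenantBudget:
    """Tracks one tenant's token spend against a hard daily cap (hypothetical helper)."""
    daily_cap_tokens: int
    used_tokens: int = 0
    day: date = field(default_factory=date.today)

    def try_spend(self, tokens: int, today: Optional[date] = None) -> bool:
        """Reserve tokens if the cap allows it; reset the counter on a new day."""
        today = today or date.today()
        if today != self.day:
            # First request of a new day: start counting from zero again.
            self.day = today
            self.used_tokens = 0
        if self.used_tokens + tokens > self.daily_cap_tokens:
            return False  # Hard cap: refuse rather than overspend.
        self.used_tokens += tokens
        return True


def pick_model(task: str) -> str:
    """Route routine tasks to a cheaper tier (task and tier names are placeholders)."""
    cheap_tasks = {"classify", "route", "summarise"}
    return "small-model" if task in cheap_tasks else "large-model"


budget = TenantBudget(daily_cap_tokens=100)
if budget.try_spend(60):
    print("call", pick_model("classify"))
else:
    print("daily cap reached; show a friendly limit message")
```

In a real service the counter would live in shared storage rather than in process memory, and the check would run before the request reaches the model.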
Evals
If you cannot measure quality, you cannot improve it. Maintain a small "golden set" of inputs and expected outputs. Run it on every model or prompt change. Even 50 examples beat vibes.
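A golden-set runner can be very small. The sketch below assumes the model call is wrapped in a function that takes a string and returns a string (`generate` here is a stand-in, as is the name `run_golden_set`), and it scores by case-insensitive exact match. That scoring suits classification or routing; free-text outputs would need a fuzzier scorer.

```python
from typing import Callable, Dict, List, Tuple

GoldenCase = Tuple[str, str]  # (input, expected output)


def run_golden_set(generate: Callable[[str], str], cases: List[GoldenCase]) -> Dict:
    """Run every golden case through `generate` and report the pass rate."""
    failures = []
    for prompt, expected in cases:
        actual = generate(prompt)
        # Exact match after trimming and lowercasing (an assumption that
        # fits label-style outputs, not long generated text).
        if actual.strip().lower() != expected.strip().lower():
            failures.append({"input": prompt, "expected": expected, "actual": actual})
    passed = len(cases) - len(failures)
    return {
        "total": len(cases),
        "passed": passed,
        "pass_rate": passed / len(cases) if cases else 0.0,
        "failures": failures,
    }


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return "billing" if "refund" in prompt else "other"


report = run_golden_set(fake_model, [("refund please", "billing"), ("app crashes", "technical")])
print(report["pass_rate"], report["failures"])
```

Running this in CI on every prompt or model change turns a quality regression into a failing check instead of a user complaint.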
Safety
- Strip PII before prompts unless the user explicitly opts in.
- Defend against prompt injection on inputs coming from untrusted users.
- Never let the model trigger destructive actions without a confirmation step.
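Two of these rules are easy to sketch in code: redacting obvious PII before text reaches a prompt, and gating destructive actions behind an explicit confirmation. Everything named below (`redact_pii`, `execute_action`, the action names) is illustrative, and the two regexes are deliberately simple. Real PII detection needs far more than this, so treat it as a starting point under those assumptions.

```python
import re

# Deliberately simple patterns: they catch common email and phone shapes only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


# Hypothetical action names; a real product would define its own list.
DESTRUCTIVE_ACTIONS = {"delete_account", "send_payment"}


def execute_action(action: str, confirmed_by_user: bool) -> str:
    """Refuse destructive actions unless the user has explicitly confirmed."""
    if action in DESTRUCTIVE_ACTIONS and not confirmed_by_user:
        return "needs_confirmation"
    return "executed"


print(redact_pii("Contact jane.doe@example.com or +1 (555) 123-4567 today"))
print(execute_action("delete_account", confirmed_by_user=False))
```

The important design choice is that the confirmation flag comes from the UI, never from model output, so an injected instruction cannot set it.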
UX patterns that work
- Always show that the model is working ("generating…").
- Let the user accept or reject the output; never auto-apply it.
- Make the source visible when retrieval is involved.
- Allow the user to undo the AI's change.
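The accept, reject, and undo patterns above map onto a very small state model. The sketch below is one possible shape, with a hypothetical `Document` class. The model's output is only ever a suggestion, nothing changes until `accept` is called, and rejecting is simply not calling it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """A text field that only changes when the user accepts a suggestion."""
    text: str
    history: List[str] = field(default_factory=list)

    def accept(self, suggestion: str) -> None:
        """Apply the model's suggestion, keeping the old text for undo."""
        self.history.append(self.text)
        self.text = suggestion

    def undo(self) -> bool:
        """Restore the text from before the last accepted suggestion."""
        if not self.history:
            return False  # Nothing to undo.
        self.text = self.history.pop()
        return True


doc = Document("thx, will look")
doc.accept("Thanks, I will take a look today.")  # user clicked "Accept"
doc.undo()                                       # user changed their mind
print(doc.text)
```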
What to avoid
- Building a chatbot when the user wanted a button.
- Ignoring latency: users abandon after 6 seconds of waiting.
- Promising magic in copy that the model cannot deliver.
The teams winning with AI are the boring ones: one tight use case, clear metrics, ruthless evals.