Prompt & Context Engineering for AI Features
Designing the prompts behind an AI feature — system prompts, context engineering, structured output, guardrails, and evaluation.
3 min read
The full lesson
When your product has an AI feature, the prompt is not a developer detail. It is a core design surface. It determines how the feature behaves, how it sounds, what it refuses, and how it fails. Designing that prompt is product design, and it deserves the same rigor as a flow or a component.
The prompt is part of the product
Two AI features built on the same model can feel completely different. One might feel helpful and on-brand; another might feel evasive and robotic. The difference is almost always the prompt and the context provided to the model. The model is the engine. The prompt and context are the design. Treating prompt work as “just engineering” is how AI features end up with no personality or point of view.
System prompts: the operating manual
The system prompt sets the model’s persistent behavior. It runs before each user message and shapes every response. Think of it like onboarding a new employee on day one: you tell them who they are, what they are here to do, the rules they must follow, the tone to use, and what to do when they are unsure.
You are the support assistant for Acme, a budgeting app.
Goals: help users understand their spending and resolve account issues.
Rules:
- Never give specific investment or tax advice; suggest a professional.
- If you are unsure or lack data, say so and offer to connect a human.
- Keep answers under 120 words unless asked for detail.
Tone: warm, plain, never condescending.
Context engineering: what goes in the window
Modern practice has shifted away from just clever wording toward context engineering. This means deciding what information the model can see when it answers. The context window is a fixed budget. You spend it on instructions, relevant retrieved data (RAG — Retrieval-Augmented Generation, where you pull in outside documents the model needs), a few examples, and the conversation history. Nothing else.
More context is not better. Irrelevant text distracts the model and can make it latch onto the wrong thing. Curate ruthlessly: retrieve only the documents that matter, summarize long conversation history, and drop anything the model does not need to answer this specific question.
Prompting patterns that earn their keep
A handful of patterns reliably improve output quality:
| Pattern | When to use it |
|---|---|
| Few-shot examples | The output has a specific shape, voice, or edge cases to mirror |
| Explicit reasoning | Multi-step problems where a quick answer is often wrong |
| Structured output (JSON/schema) | The response feeds other UI or code and must be parseable |
| Tool / function calling | The model needs live data or to take an action, not guess |
Guardrails, refusals, and prompt injection
User input — and any retrieved web content — can contain hidden instructions that try to override your system prompt. This attack is called prompt injection. Never assume your system prompt alone is enough protection. Validate and constrain the model’s outputs, keep the model’s authority narrowly scoped, and treat any model output that triggers real actions as untrusted until you have checked it.
Do
Give the model a clear scope and an explicit “if unsure, do X” fallback. Validate structured output against a schema before acting on it.
Don't
Assume the system prompt is unbreakable, expose raw model output directly into privileged actions, or let retrieved content silently become instructions.
You can’t ship what you can’t measure
Prompts are like code with fuzzy behavior. A small change can silently break cases that used to work fine. Build an eval set: a collection of real inputs paired with the qualities a good response should have. Score every prompt change against that set. Without evals you are tuning by gut feel, and a fix for one complaint can quietly break ten other cases you never checked.