Prompt & Context Engineering for AI Features

The full lesson

When your product has an AI feature, the prompt is not a developer detail. It is a core design surface. It determines how the feature behaves, how it sounds, what it refuses, and how it fails. Designing that prompt is product design, and it deserves the same rigor as a flow or a component.

The prompt is part of the product

Two AI features built on the same model can feel completely different. One might feel helpful and on-brand; another might feel evasive and robotic. The difference is almost always the prompt and the context provided to the model. The model is the engine. The prompt and context are the design. Treating prompt work as “just engineering” is how AI features end up with no personality or point of view.

System prompts: the operating manual

The system prompt sets the model’s persistent behavior. It runs before each user message and shapes every response. Think of it like onboarding a new employee on day one: you tell them who they are, what they are here to do, the rules they must follow, the tone to use, and what to do when they are unsure.

You are the support assistant for Acme, a budgeting app.
Goals: help users understand their spending and resolve account issues.
Rules:
- Never give specific investment or tax advice; suggest a professional.
- If you are unsure or lack data, say so and offer to connect a human.
- Keep answers under 120 words unless asked for detail.
Tone: warm, plain, never condescending.

Context engineering: what goes in the window

Modern practice has shifted away from just clever wording toward context engineering. This means deciding what information the model can see when it answers. The context window is a fixed budget. You spend it on instructions, relevant retrieved data (RAG — Retrieval-Augmented Generation, where you pull in outside documents the model needs), a few examples, and the conversation history. Nothing else.

More context is not better. Irrelevant text distracts the model and can make it latch onto the wrong thing. Curate ruthlessly: retrieve only the documents that matter, summarize long conversation history, and drop anything the model does not need to answer this specific question.

Prompting patterns that earn their keep

A handful of patterns reliably improve output quality:

Pattern	When to use it
Few-shot examples	The output has a specific shape, voice, or edge cases to mirror
Explicit reasoning	Multi-step problems where a quick answer is often wrong
Structured output (JSON/schema)	The response feeds other UI or code and must be parseable
Tool / function calling	The model needs live data or to take an action, not guess

Guardrails, refusals, and prompt injection

User input — and any retrieved web content — can contain hidden instructions that try to override your system prompt. This attack is called prompt injection. Never assume your system prompt alone is enough protection. Validate and constrain the model’s outputs, keep the model’s authority narrowly scoped, and treat any model output that triggers real actions as untrusted until you have checked it.

Give the model a clear scope and an explicit “if unsure, do X” fallback. Validate structured output against a schema before acting on it.

Don't

Assume the system prompt is unbreakable, expose raw model output directly into privileged actions, or let retrieved content silently become instructions.

You can’t ship what you can’t measure

Prompts are like code with fuzzy behavior. A small change can silently break cases that used to work fine. Build an eval set: a collection of real inputs paired with the qualities a good response should have. Score every prompt change against that set. Without evals you are tuning by gut feel, and a fix for one complaint can quietly break ten other cases you never checked.