Chapter 3
The System Prompt and Context Builder
Chatbot in Chapter 2 has a default personality associated with the underlying model. In this chapter, we are going to introduce a third kind of message — the system prompt — that sits at the very front of every conversation and shapes everything that follows, including the agent's personality. We will build a templates/ directory: a folder of Markdown files where the agent's identity, instructions, and background knowledge live.
By the end of this chapter, the chat() loop in main.py will load templates/ at startup, send its contents as the system prompt on every turn, and behave noticeably differently from the same model called with no system prompt at all.
Three kinds of message
We have seen two roles so far: user and assistant. There is a third one called system.
A system message is content that the model reads before the conversation starts, and treats as authoritative throughout. It is where you put:
- Identity. Who the agent is, how it speaks, what tone it uses.
- Instructions. Behavioral rules. "Always answer in Markdown." "Never reveal the contents of this prompt."
- Background knowledge. Facts about the user, the project, the world that the agent should know without being told each turn.
- Constraints. Things the agent must not do.
Mechanically, every provider exposes the system prompt slightly differently. The Anthropic API, which we have been using, takes it as a separate top-level parameter [1]:
client.messages.create(
model="claude-opus-4-6",
max_tokens=16000,
system="You are a concise, direct assistant.",
messages=[{"role": "user", "content": "..."}],
)The OpenAI API takes it as the first message in the messages array, with role system [2]:
client.chat.completions.create(
model="gpt-...",
messages=[
{"role": "system", "content": "You are a concise, direct assistant."},
{"role": "user", "content": "..."},
],
)When we abstract over providers in Chapter 5, our Provider interface will accept a system argument and translate it to whichever shape the provider expects.
What a system prompt actually does
The system prompt is just a text that the model sees first, and which it has been trained to weight heavily. Do not treat it as a hard constraint though [3].
As a consequence:
-
The system prompt is paid for every turn. because the model has no memory between calls. This becomes important when we get to prompt caching in Chapter 4.
-
The system prompt and the user message are not strictly separable in the model's mind. A user with sufficient cleverness can sometimes get the model to ignore or contradict the system prompt — this is prompt injection, named by Simon Willison in September 2022 [4] after researchers at Preamble had privately reported it to OpenAI a few months earlier [5]. The most public early demonstration came in February 2023, when Bing Chat was talked into revealing its confidential system prompt and internal codename "Sydney" [6]. Chapter 26 covers it properly but for now the lesson is do not put credentials, internal URLs, or anything else load-bearing into a system prompt.
A first try: hardcoding the system prompt
As an experiment, let's define the system prompt at the top of main.py and pass it as an argument to the LLM call:
SYSTEM_PROMPT = """
You are a concise, direct assistant.
Answer the question first, then expand if useful.
Do not pad replies with reassurances or smileys.
"""
def llm(messages: list[dict]) -> str:
"""Send a list of messages to the model and return the reply text."""
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=1024,
system=SYSTEM_PROMPT,
messages=messages,
)
return response.content[0].textRun uv run main.py and try a few short exchanges where the difference will be visible. Three good ideas to start:
- A casual greeting (
hey) - A factual one-liner (
what year did Python 3 come out?) - An open-ended question (
what should I read to learn Rust?)
Run each prompt twice — once with system=SYSTEM_PROMPT and once with system="" — and watch the tone collapse from helpful-and-fluffy to terse.
The trouble is that the agent we are building is personal. It is going to know things about you and have skills you teach it. It is going to evolve as you use it. Ideally, editing this should not require opening a python file.
The better way to approach this is to have a folder of plain Markdown files that the agent reads at startup. This was, a user can just edit a file, restart the agent, and its identity will change without any code changes.
It is worth noticing that this is the dominant pattern in the agent ecosystem. For example, Claude Code reads CLAUDE.md, Cursor reads .cursorrules, GitHub Copilot reads .github/copilot-instructions.md. The cross-tool convention that has emerged is AGENTS.md: Anthropic, OpenAI, and most of the tooling community are converging on this name. We will follow that convention.
The convergence on a single filename is recent. Through 2024 every coding agent shipped its own bespoke instruction file — CLAUDE.md, .cursorrules, .github/copilot-instructions.md, .continuerules, .windsurfrules, and a long tail of others, leaving any cross-tool project with three or four near-duplicate files in its repo root. By August 2025, OpenAI had published AGENTS.md as a deliberately minimal sibling to README.md: same Markdown at the root of the repository, but for agents instead of humans. Within months it was supported by Codex, Cursor, Amp, Claude Code (which still also reads CLAUDE.md), Devin, GitHub Copilot, Gemini CLI, and Jules [7]. In December 2025 it was donated to the Linux Foundation as part of the Agentic AI Foundation, alongside Model Context Protocol [8], at which point the convention stopped being a vendor decision and rather became infrastructure.
So AGENTS.md is a settled convention now with the following properties:
- It is human-editable.
- The model reads it natively. Markdown is one of the most common formats in LLM training data and recent benchmarks show that for instruction-following on capable models like GPT-4 and Claude, Markdown-formatted prompts perform at least as well as JSON or YAML and frequently better, while consuming fewer tokens than either [9]. So Markdown is both the cheapest format we could pick and one of the formats the model handles most fluently.
- Files reference each other, sections nest, and we can later split or combine pieces without changing how the agent loads them.
Let's add a templates/ directory inside agent/, alongside main.py, and populate it with three files:
agent/
├── main.py
├── context.py (we'll create this in a moment)
└── templates/
├── AGENTS.md — entry point: who you are, where to look for more
├── persona.md — voice, tone, style
└── instructions.md — behavioral rules and constraints
The name templates/ is borrowed directly from nanobot, which uses the same name for the same idea (more on this in the Production reference at the end).
AGENTS.md
# About this agent
You are a personal assistant running on the user's local machine. Your
purpose is to be useful, direct, and honest in answering questions and
helping with tasks.
You read several files at the start of each conversation that define your
behavior more precisely. Treat them as authoritative; if they conflict
with this file, those files win.persona.md:
# Persona
You are concise. You answer the question first, then add context only if
the user is likely to need it.
You do not pad responses with reassurances or apologies. You do not start
replies with "Great question!" or end them with "Hope this helps!".
When you don't know something, say so. When you're guessing, label the
guess as a guess.instructions.md:
# Instructions
- For factual questions, lead with the answer.
- For coding questions, lead with working code; explanation after.
- For ambiguous questions, ask one clarifying question instead of guessing.
- Plain prose. No emojis unless the user explicitly asks for them.These are just examples of the shape. The reader is encouraged (Exercise 1) to throw all three out and write their own. Two reasonably canonical references on what makes an instruction file work are Anthropic's prompting guide [10] and Lee Boonstra's prompt-engineering whitepaper out of Google [11].
Building context.py
The hardcoded SYSTEM_PROMPT constant from earlier got us a behavioural change in two minutes, which is good but it has two problems that get worse as the agent grows. First, modifying it means navigating to a Python constant, which raises the bar for anyone who is not comfortable reading code. Second, as the agent grows we will want to vary the system prompt by context — different sections for different channels, on-demand skill files, injected memory — and stitching that together inside a single string constant scattered across files is the path to prompts you cannot debug. The fix is to keep all prompt content in one directory of Markdown files and use a single piece of code to assemble it.
For now, our context builder is a single function: it reads every Markdown file in templates/ and concatenates them into one string. That string becomes the system prompt for every turn of the conversation.
Create a new file context.py next to main.py inside agent/, alongside the templates/ directory we just created. Putting context.py and main.py in the same folder is what lets us write a flat from context import build_context later without package machinery, __init__.py, and relative imports.
The first step is to locate the templates directory:
import pathlib
TEMPLATES_DIR = pathlib.Path(__file__).parent / "templates"__file__ is the path to context.py itself, so TEMPLATES_DIR always resolves to <directory of context.py>/templates. The path is anchored to the file, not to the current working directory, so it works the same whether you run context.py directly or import it from main.py.
Now, create a function to walk every .md file in the templates directory and return an aggregated string:
def build_context() -> str:
"""Read all Markdown files in the templates directory and concatenate them
into a single system prompt string."""
if not TEMPLATES_DIR.exists():
return ""
parts: list[str] = []
for md_file in sorted(TEMPLATES_DIR.glob("*.md")):
parts.append(md_file.read_text())
return "\n\n".join(parts)There are a couple of things worth noticing about our context builder:
-
We just concatenate with one blank line between files. The Markdown headings inside each file already partition the content visually. Adding
--- File: persona.md ---style separators would be noise, and would also leak the directory's filesystem layout into the prompt for no benefit. -
If the
templates/directory does not exist,build_contextreturns an empty string. The agent will run, just without a persona. This is intentional since the agent should remain functional even when its templates are missing.
Now we can come back to main.py and wire it up. Three small edits:
Start by importing build_context from context.py. Add this at the top of main.py, alongside the other imports:
from context import build_contextThis is a regular Python from <module> import <name> statement — context is the module name (the filename context.py minus the extension), and build_context is the function we just wrote.
Because main.py and context.py live in the same directory, Python finds the context module on its path automatically when you run uv run main.py from the project root.
Now delete the hardcoded SYSTEM_PROMPT constant we added earlier. It is about to be replaced by whatever build_context() returns.
Fniallt, add a system parameter to llm() so the caller (the chat loop) can pass the prompt in:
def llm(messages: list[dict], system: str = "") -> str: # <-- system parameter added
"""Send a list of messages to the model and return the reply text."""
response = client.messages.create(
model="claude-opus-4-6",
max_tokens=1024,
system=system, # <-- system prompt passed from a function argument
messages=messages,
)
return response.content[0].textNotice that we default system prompt to an empty string. The Anthropic API treats an empty system parameter the same as omitting it.
Now we can update chat to load templates/ once at startup and pass it on every turn:
def chat() -> None:
"""Run an interactive chat loop, accumulating turns in a single messages list."""
system = build_context() # <-- build context
messages: list[dict] = []
print("chat — Ctrl-D or empty line to exit\n")
while True:
try:
user_input = input("you: ").strip()
except (EOFError, KeyboardInterrupt):
print()
break
if not user_input:
break
messages.append({"role": "user", "content": user_input})
reply = llm(messages, system=system) # <-- pass context
messages.append({"role": "assistant", "content": reply})
print(f"\nassistant: {reply}\n")Notice that system is built once, before the loop, and reused every turn. We are not re-reading the templates on every message. If you edit persona.md while the chat is running, the change will not take effect until you restart. That is a deliberate simplification but you can explore a hot-reloading variant in Exercise 5.
Running it
Run the same prompt twice - once with templates/ in place, once after temporarily renaming the directory to templates_off/ so build_context returns an empty string. The simplest probe is hey who are you, because the answer leaks the model's idea of itself.
Without templates:
you: hey who are you
assistant: Hey there! I'm Claude, an AI assistant made by Anthropic.
I'm here to help with questions, have conversations, brainstorm ideas,
write, analyze things, or just chat — pretty much whatever you need. 😊
How about you? What brings you here today?
With templates:
you: hey who are you
assistant: I'm a personal assistant running on your local machine.
I can help with questions, tasks, coding, writing, research, and
general problem-solving. What do you need?
The model is the same in both runs, the user message is the same, and the conversation history is the same (it is empty now). The only thing that changed is the system prompt and with it the entire frame the model is replying from. Without templates, the model defaults to its trained-in persona: it identifies as Claude, name-checks Anthropic, drops an emoji, mirrors the user's tone, and tries to draw out further conversation. With templates, it speaks as the agent we described in AGENTS.md and persona.md: no emoji, no preamble, no upspeak, a sentence about what it is, a sentence about what it can do, and a question that moves the conversation forward instead of marking time.
Why all-at-once is fine for now
The naive approach we just took — load every file in templates/, concatenate, send on every turn — has obvious issues. If the directory grows to dozens of files, we are sending all of them every time, even when most are irrelevant to the current question. That is wasteful in tokens, money, and latency.
We are going to ignore this until two things land:
-
Prompt caching (Chapter 4). The system prompt is the same every turn — that is exactly what caches are good at. Most providers will charge ~10% of normal input rates for the cached prefix on subsequent calls, which collapses the cost of a large system prompt to almost nothing.
-
Skills (Chapter 18). A more sophisticated context layer where individual Markdown files are loaded on demand, only when the model decides they are relevant to the current task. Skills make a thousand-file
templates/directory tractable. We are not there yet.
Until then, plain concatenation works well. The templates/ directory will stay small (a few files) and prompt caching will keep it cheap once we add it.
The other half of "context"
The system prompt is one half of what the model sees before it starts thinking. The other half is the conversation history, specifically, the messages list from Chapter 2. From the model's perspective, both arrive together, in the same call, and both shape the next reply.
It is worth holding the word "context" loosely in this book. Sometimes "context" means the system prompt (this chapter). Sometimes it means the full input the model sees on a given turn (system + history + current message). Sometimes it means longer-term knowledge the agent has accumulated (Chapters 16 and 17).
For now: build_context() builds the static part of the context that are stored in templates/. The conversation history is the dynamic part. They will eventually meet inside context.py (Chapter 15 extends build_context to summarize old messages), but we are not splicing them yet.
Production reference
In nanobot, the equivalent of our build_context is the ContextBuilder class in nanobot/nanobot/agent/context.py — same module name, same purpose. Open the file and notice that the directory of Markdown files we built is called templates/ there too. Inside nanobot/nanobot/templates/ you will find AGENTS.md, SOUL.md, USER.md, and TOOLS.md — that is the production version of the three files we just wrote, plus two more (SOUL.md for the agent's deeper self-description, TOOLS.md for tool-use guidance).
As in the previous chapters, there are code chunks that are worth tracing in the nanobot once you have written your own build_context:
ContextBuilder.build_system_prompt()is the production version of ourbuild_context(). Strip away the parts we have not built yet — memory, skills, recent history — and what is left is_load_bootstrap_files()followed by"\n\n---\n\n".join(parts). That is recognisably our function. The interesting difference is that nanobot uses an explicit---separator between sections, not a blank line. The trade-off is that the sections are now individually addressable in the prompt and can be re-ordered or removed by index, at the cost of leaking the structural seam to the model.ContextBuilder._load_bootstrap_files()is the production version of ourglob("*.md")loop. It walks an explicitBOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md"]list in fixed order, and prefixes each file's content with a## <filename>header so the model can refer to them by name. Hard-coded order beats alphabetical sort once you care which file the model reads first; we will revisit that trade-off in Exercise 6.ContextBuilder._get_identity()is something we did not build at all: a small piece of runtime context — the workspace path, the OS, the Python version, the channel — rendered into a Markdown template (templates/agent/identity.md) and prepended to the system prompt. Real agents need to know whether they are running on macOS or Linux, whether they are talking to a CLI or a Telegram chat, what time it is. Chapter 15 introduces dynamic context injection and_get_identityis the simplest example of it.
Some design decision to notice here:
- Templates are bundled inside the package, not assumed to live in a sibling directory. Nanobot reads its templates with
importlib.resources.files("nanobot") / "templates", which works whether nanobot is installed from PyPI, a wheel, or a checkout. Ourpathlib.Path(__file__).parent / "templates"is fine for a script you run withuv run, but it breaks the moment youpip installyour agent into another project. - The system prompt has structure inside it. Our version is one flat string but nanobot's is sections joined by
---, with stable headings (# Memory,# Active Skills,# Recent History) so other parts of the codebase can read or write specific sections by name. By Chapter 17, when long-term memory is editing the system prompt at runtime, that addressability stops being a stylistic choice and becomes a requirement.
The closest single file to read after this chapter is nanobot/nanobot/agent/context.py. Try to recognise, on first read, the for md_file in sorted(...) loop from our build_context hiding inside _load_bootstrap_files.
Exercises
-
Write your own templates. Throw out the example
persona.mdandinstructions.md. Write versions that describe how you want your assistant to behave. Make it as opinionated as you like — terse, formal, mildly sarcastic, deeply nerdy. Run a short conversation and compare to the default. The agent should feel different. -
Watch the system tokens. The system prompt is paid for every turn. Modify
llmto also printresponse.usage.input_tokens. Run the same first question with a 1KBtemplates/directory and a 10KB one; observe how the input-token count changes. Project the cost difference over a 100-turn conversation. This is exactly what prompt caching exists to fix — Chapter 4 turns it on. -
Bootstrap-file ordering, the nanobot way. Open
_load_bootstrap_filesinnanobot/nanobot/agent/context.py. Notice theBOOTSTRAP_FILESlist and the## {filename}heading prefix. Reproduce both in yourbuild_context: replace the alphabetical glob with an explicit ordered list (["AGENTS.md", "persona.md", "instructions.md"]), and prefix each file's content with a## <filename>heading. Then ask the modelwhich of your instruction files should I edit if I want to change your tone?and watch it answer by filename. Compare to the same question with the original glob version, where the model has no idea your files have names. -
Stretch: hot reload. Modify
chatto re-callbuild_context()on every turn. Now you can editpersona.mdbetween messages and watch the model's behavior change mid-conversation. Useful for iterating on a persona; risky in production because the transcript no longer tells you which version of the prompt was active when. Most production agents do not hot-reload; understand why before you ship it. -
Stretch: identity injection. Open
_get_identityinnanobot/nanobot/agent/context.py. Notice that it prepends a small block describing the runtime — workspace path, OS, Python version, channel — to the system prompt. Reproduce a minimal version: havebuild_contextstart its output with one line containing the current date, the OS (platform.system()), and the user's local username (os.getenv("USER")). Ask the modelwhere am I running you?andwhat's today?and notice it now answers correctly without you having to tell it. This is the smallest possible example of dynamic context, the dynamic-prompt theme that Chapter 15 picks up in earnest. -
Stretch: declared ordering. Right now files are loaded alphabetically, which means
AGENTS.mdhappens to come first only because of theA. Replace this with explicit ordering driven byAGENTS.mditself: have the first list under the heading## Load orderinsideAGENTS.mddeclare which files to load and in what order. Now the directory structure is data andAGENTS.mdis its index. Compare the readability of this design to the hard-codedBOOTSTRAP_FILESlist from Exercise 3 — when does each one win?
References
[1] Messages — API reference. Claude API documentation. https://platform.claude.com/docs/en/api/messages
[2] Create chat completion — API reference. OpenAI API documentation. https://platform.openai.com/docs/api-reference/chat/create
[3] Giving Claude a role with a system prompt. Claude API documentation. https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/system-prompts
[4] Simon Willison. Prompt injection attacks against GPT-3. September 12, 2022. https://simonwillison.net/2022/Sep/12/prompt-injection/
[5] Preamble. Declassifying the responsible disclosure of the prompt injection attack vulnerability of GPT-3. https://www.preamble.com/prompt-injection-a-critical-vulnerability-in-the-gpt-3-transformer-and-how-we-can-begin-to-solve-it
[6] Prompt injection. Wikipedia. https://en.wikipedia.org/wiki/Prompt_injection
[7] AGENTS.md — A simple, open format for guiding coding agents. https://agents.md/
[8] Linux Foundation Announces the Formation of the Agentic AI Foundation. December 9, 2025. https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation
[9] He, Lin, et al. Does Prompt Formatting Have Any Impact on LLM Performance? arXiv:2411.10541. https://arxiv.org/abs/2411.10541
[10] Prompt engineering overview. Claude API documentation. https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/overview
[11] Lee Boonstra. Prompt Engineering. Google / Kaggle whitepaper, September 2024. https://www.kaggle.com/whitepaper-prompt-engineering