In my last post, Building an Agent from Scratch, I went over some of the architectural components of the AI agent I helped build at work. Over four months of development, I read extensively about prompt engineering, and by iterating on my agent's prompts, I developed a few rules of thumb that have offered terrific returns. They've helped me hone my agent's reasoning abilities, scale agentic workflows, and cut down prompt length without hurting output quality. Plus, there are a few gotchas in writing prompts that are hard to catch and can cause erratic responses from LLMs. I haven't seen these discussed elsewhere, so I've decided to write them up myself.
Avoid Conflicting Instructions
This one by far makes the biggest difference. When system prompts get long, it's hard to keep track of all the instructions littered throughout. The catch is that variations of the same instruction in different parts of the prompt can make the LLM struggle to reconcile them. You might never catch this happening: the prompt looks fine at first glance, but its parts don't quite add up. When the output doesn't match your expectations, your first instinct might be to get more heavy-handed: adding more guardrails, writing instructions in uppercase, wrapping them in asterisks, and so on. When the LLM still seems to ignore your instructions, you might be tempted to double down or switch things up entirely.
When you sense that the LLM is struggling to fulfill a particular requirement, take a step back and revisit the areas of your prompt that touch on that requirement. See whether the undertones of your instructions vary. For example, if in one place you ask the LLM to be "easy to talk to", but in another you ask it to cut out fluff and pleasantries, the LLM might end up doing neither. Instead, keep all instructions covering a functional requirement in one place, possibly under one section. If you need to remind the LLM to follow those instructions elsewhere, refer to the section's name there instead of rephrasing the instructions, as in the sketch below.
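As a sketch of what that consolidation might look like (the section names and wording here are hypothetical):

```
<TONE>
Be friendly and easy to talk to, but skip greetings, apologies, and filler.
Lead with the answer.
</TONE>

<ESCALATION>
When you cannot resolve a request, say so plainly and offer to hand off
to a human. Follow the <TONE> section here as well.
</ESCALATION>
```

The <ESCALATION> section points back to <TONE> by name instead of restating it, so there's only one place where the tone rules can drift. This leads to my second point.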
Make Prompts Flow, Don't Make LLMs Jump
The system prompt is where the majority of your agent's instructions end up. However, not organizing it with enough care can degrade the reasoning abilities of your LLM. Say your agent has two capabilities: summarizing an image and writing a poem. At each turn, the LLM decides what to do, and performs the action. For this task, the first section of your prompt should help the LLM choose between summarizing an image and writing a poem. From this section, you can point to the next appropriate section, either for summarizing images or for writing poems, where you guide the LLM solely on that specific task. This way, you make it easier for the LLM to reason and act by giving it a simple textual decision tree of sorts, like the sketch below.
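A rough sketch of that routing structure, with hypothetical section names:

```
<DECIDING_WHAT_TO_DO>
Look at the user's latest message.
- If it contains an image, follow <IMAGE_SUMMARY>.
- If it asks for a poem, follow <POEM_WRITING>.
</DECIDING_WHAT_TO_DO>

<IMAGE_SUMMARY>
Describe the image in two or three sentences, focusing on ...
</IMAGE_SUMMARY>

<POEM_WRITING>
Write a short poem in the style the user asked for ...
</POEM_WRITING>
```

The LLM reads one decision, then lands in exactly one section; it never has to reconcile summarizing guidance with poetry guidance.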
One Shot Over Few Shots
Providing examples in prompts has become a very popular strategy, and almost every article on prompt engineering I've read encourages adding them. However, I'm convinced that in most cases, you don't need examples. The truth is that examples introduce a Goldilocks problem of sorts. Too few, and they don't cover all the cases, so the LLM is essentially lost when a user sends a message worded differently from your examples. Too many, and the LLM overfits to them.
The solution is to veer away from examples entirely, with a few exceptions. When you need the LLM's output to always adhere to very specific constraints, go with examples. If you're generating SQL queries with an LLM, and you're aware of PostgreSQL quirks that your LLM isn't, include examples using the specific syntax (see the sketch after this paragraph). But any time you're asking your LLM to do something even slightly creative (which describes most chat-based AI agents), err on the side of letting the LLM make judgement calls, without restricting it with examples. The trick is to make your prompts transparent about your expectations, and the wording unambiguous. The LLM can handle the rest.
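For instance, if the LLM keeps producing date filters that don't run on PostgreSQL, a constraint-pinning example might look like this (a hypothetical prompt fragment; the interval syntax itself is standard PostgreSQL):

```
When filtering by a relative date range, use PostgreSQL interval syntax.

Example:
User: "orders from the last 7 days"
SQL: SELECT * FROM orders WHERE created_at >= NOW() - INTERVAL '7 days';
```

The example exists to pin down syntax, not to teach the LLM what a good answer looks like.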
Think about Markdown vs XML
In longer prompts, XML tags can offer clearer demarcation between sections than Markdown headings. As mentioned earlier, you can refer to these sections by tag name in multiple places in your prompt. With H1 Markdown headings, this can get clunky. For example, instead of saying "refer to Rules for Responding to Users", you can write "refer to the RESPONSE_BUILDING_FLOW section" to point the LLM to the relevant instructions. However, be wary of nesting XML tags: they're good for top-level section definitions. With too many XML tags, you might unknowingly create a complex decision tree for the LLM, which tends to backfire. Instead, to structure a section inside XML tags, Markdown headings work fine, as in the sketch below.
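Put together, a section might look like this (names and contents hypothetical):

```
<RESPONSE_BUILDING_FLOW>
## Structure
Open with the direct answer, then add supporting detail.

## Formatting
Use bullet points only for lists of three or more items.
</RESPONSE_BUILDING_FLOW>
```

The XML tag gives you a stable name to reference elsewhere in the prompt, while the Markdown headings inside stay purely organizational.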
Augment Your Prompts Carefully
In addition to function calling, LLM providers let you control responses in other ways. OpenAI offers structured outputs, Anthropic models can be steered into structured output through tool use, and some Google models support controlled generation. These are terrific ways to augment your prompt without clouding it: you get granular control over the LLM's output format without cramming extra formatting rules into the system prompt. If you define a JSON response schema such that the LLM always outputs a boolean value, the provider ensures that the output is always a boolean, and you don't have to write any extra rules or validation logic yourself.

What you should ensure, however, is that your schemas align with your prompts. Don't include a field in the schema that's never referred to in the prompt, and don't shy away from referring to field names in your prompts when writing instructions on how to generate values for those fields. If there are optional fields in your schema, explain in your prompt when it's okay to omit them. When your prompt aligns well with everything else that goes into the request, your LLM behaves closer to what you'd expect, and more consistently.
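As a minimal sketch of that alignment, here's how it might look with structured outputs in OpenAI's Python SDK (the model name, schema, and prompt are assumptions for illustration):

```python
from openai import OpenAI
from pydantic import BaseModel, Field

class Reply(BaseModel):
    # Both fields are referred to by name in the system prompt below,
    # so the schema and the instructions stay aligned.
    needs_escalation: bool = Field(description="True only if a human must take over")
    message: str = Field(description="The text shown to the user")

SYSTEM_PROMPT = (
    "Answer the user's question in `message`. "
    "Set `needs_escalation` to true only when you cannot resolve the request yourself."
)

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # assumed; any model with structured-output support
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Where is my order?"},
    ],
    response_format=Reply,
)
reply = completion.choices[0].message.parsed  # a validated Reply instance
# needs_escalation is guaranteed to be a boolean; no extra validation needed.
```

Note how the prompt names both fields: the schema guarantees the types, while the prompt explains when and how to fill them in.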