How to Build a Knowledge Base for Agents

Most companies treat knowledge like a filing cabinet. They store documents and hope people find them when needed. We take a different approach at the start of every production AI agent project: we organize knowledge the way the company actually works. This post breaks down what that means and how you can do it yourself.

The reason this matters for AI is simple. Agents can only perform useful work when knowledge is organized for action, not just for storage. Document dumps alone are not enough. Agents need playbooks to follow, rules for making decisions, the instinct a senior partner relies on when they spot a red flag or anomaly, and structured information to pull from. Without that structure, AI behaves like a search engine, which is what you want to avoid. With the proper foundation and structure, AI behaves like a capable assistant who can take action and do real work.

“Most companies store information. We organize how work actually happens.”
John Griffin
Co-founder and CEO, Spiral Scout

Your Knowledge Base is a Pyramid

There are five layers to making knowledge useful for AI agents. Most companies stop at Layer 1 and wonder why their agents behave like search engines. The goal is to build through Layer 4 so agents behave like capable, trustworthy advisors, moving from junior employees who make mistakes to trusted co-workers who need less and less oversight.

Layer 1: Raw Information

This is the data you already have: pricing catalogs, product specs, plan features, coverage maps, FAQs, support docs. It exists somewhere in your company, probably spread across multiple systems, spreadsheets, wikis, Google Docs, and people’s heads.

With this data, agents can search and retrieve. An agent can look something up if you ask the right question in the right way. But it has no context for when to use the information, how to interpret it, how people use it in their decision making, or what matters most.

This is where you hit the limits, because all you have really built is a faster filing cabinet. The agent knows facts and raw data but not what to do with them.

Layer 2: Organized Knowledge

This is the same information, but structured by how it’s actually used. Instead of one giant document about how to price a complex product or service (to use one specific example), you have separate knowledge bases organized by past product configurations and builds, by invoice and billing data, by scenario type, or by user situation. Each piece of content has a clear title, is written in plain language, and is self-contained enough that the agent can use it without needing the rest of the document for context.

With this structure, the agent can find the right information quickly and use it in the right context. When a user asks about bundling products to meet a certain need, for example, the agent pulls from a specific manufacturer’s or provider’s knowledge base, not from a general document that might confuse it.

“When we think about organizing this, it’s easiest to think about the analogy of training a junior employee. If you hired someone new and needed to teach them how your business works, you wouldn’t hand them a 500-page manual. You’d give them a focused handbook for their specific job. The knowledge base is that handbook.”
Anton Titov
CTO, Spiral Scout

Layer 3: Captured Workflows

This is the layer where knowledge becomes operational. Instead of just facts, the system now contains the actual decision process your experts follow. When an employee evaluates whether a customized or bundled product should be built a certain way, they don’t just look up prices. They walk through a sequence of questions. What is the customer’s current setup? Do they have a legacy product this needs to work with? Do they have a bundled service this needs to work with? What are the specifics of each product and service? What does the employee or end customer care about most? Each answer narrows the recommendation in a specific way.

At this layer, the agent can follow the same decision process your best people follow. The agent doesn’t just know facts about a manufacturer or service provider. It knows the sequence of questions to ask, what each answer means, which red flags to watch for, and how to narrow hundreds of possible recommendations down to the 2-3 that actually fit.

This is one of the harder parts to capture. The best way to do it is to walk your domain expert through real scenarios and record their reasoning. Say you are building an insurance plan “switching” agent. Give the expert a scenario such as “A family of 4 on insurance plan X, with this coverage and living in this state, asks what options they have and what they should do,” and have them talk through how they would advise that family. The workflow is in the reasoning, not in the final answer.
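One way to picture a captured workflow is as an ordered list of questions, where each answer filters the options that remain. The sketch below is only an illustration of that idea: the `Option` fields, the three questions, and the sample catalog are hypothetical and not taken from a real project.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Option:
    # Hypothetical product attributes, for illustration only.
    name: str
    works_with_legacy: bool
    supports_bundle: bool
    monthly_cost: int


@dataclass(frozen=True)
class Step:
    question: str
    # Returns True if the option is still a fit given the user's answer.
    keep: Callable[[str, Option], bool]


# A hypothetical workflow, as it might be captured from an expert walkthrough.
WORKFLOW = [
    Step("Do you have a legacy product this must work with?",
         lambda answer, o: answer == "no" or o.works_with_legacy),
    Step("Do you have a bundled service this must work with?",
         lambda answer, o: answer == "no" or o.supports_bundle),
    Step("Is keeping the monthly cost under 100 your top priority?",
         lambda answer, o: answer == "no" or o.monthly_cost < 100),
]


def run_workflow(options, answers):
    """Apply each question in order; every answer narrows the shortlist."""
    remaining = list(options)
    for step in WORKFLOW:
        answer = answers[step.question]
        remaining = [o for o in remaining if step.keep(answer, o)]
    return remaining


CATALOG = [
    Option("Basic", works_with_legacy=False, supports_bundle=False, monthly_cost=60),
    Option("Flex", works_with_legacy=True, supports_bundle=True, monthly_cost=90),
    Option("Premium", works_with_legacy=True, supports_bundle=True, monthly_cost=140),
]

answers = {
    WORKFLOW[0].question: "yes",
    WORKFLOW[1].question: "no",
    WORKFLOW[2].question: "yes",
}
shortlist = run_workflow(CATALOG, answers)
print([o.name for o in shortlist])  # ['Flex']
```

The point is that the order of the questions and the meaning of each answer are written down explicitly, which is exactly what a recorded expert walkthrough gives you.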

Layer 4: Encoded Expert Judgment

This is the tribal knowledge layer: the gotchas, the “when not to recommend” rules, the fine print that only someone with years of experience knows. Things like: “The switching-cost credit or trade-in credit requires a top-tier plan that costs $50/mo more, so the real savings is $X.” Or: “At insurance provider X, the autopay discount only applies with a bank account, not a credit card.” Or: “Don’t switch right now, there’s a better promotion coming in 3 weeks.”

At this stage, the agent would be able to give trustworthy advice, not just technically correct advice. This is the difference between “Here are the cheapest plans” and “Here’s what I’d actually recommend for your situation, and here’s the catch you should know about before you decide.” This is what makes users trust the agent.
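The “real savings” gotcha is simple arithmetic once an expert has told you it exists. As a rough sketch, assuming a one-time credit that requires staying on a more expensive plan for a fixed number of months: the $50/mo upcharge comes from the example above, while the $800 credit and the 12-month term are made-up numbers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    name: str
    advertised_credit: int       # one-time credit, in dollars
    required_plan_upcharge: int  # extra dollars per month needed to qualify
    months: int                  # months the user must stay on that plan


def real_savings(offer: Offer) -> int:
    """Advertised credit minus the extra plan cost required to get it."""
    return offer.advertised_credit - offer.required_plan_upcharge * offer.months


def advise(offer: Offer) -> str:
    savings = real_savings(offer)
    if savings <= 0:
        # A "when NOT to recommend" rule: the catch cancels out the credit.
        return (f"Do not recommend {offer.name}: "
                "the required plan costs more than the credit.")
    return (f"{offer.name}: real savings are ${savings}, "
            f"not ${offer.advertised_credit}. Mention the "
            f"${offer.required_plan_upcharge}/mo plan requirement.")


# Hypothetical numbers: $800 credit, $50/mo upcharge, 12-month commitment.
offer = Offer("Trade-in promo", advertised_credit=800,
              required_plan_upcharge=50, months=12)
print(real_savings(offer))  # 200
print(advise(offer))
```

The formula is trivial. What is valuable is knowing that the plan requirement exists at all, and that knowledge only comes from your experts.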

This is the knowledge that many startups and less experienced founders lack, because you can’t just Google it. It has to come from the people who know, through subject matter expert sessions, scenario walkthroughs, and well-designed extraction questions, followed by continuous refinement as the agent encounters situations it can’t handle.

Layer 5: Learning from Results

Once the agent is live, every conversation with a real user teaches you something. Users ask questions the team didn’t anticipate. The agent gives advice that’s technically correct but misses a nuance. Edge cases surface that nobody thought of, and someone has to decide how to handle them. The knowledge base improves over time because real users reveal what’s missing and the experts fill in the details.

The best part of this layer is that the agents and the system get smarter the longer they run. The team reviews flagged conversations, corrects the knowledge base and system prompts, and the agent handles that situation correctly next time. This is a continuous learning loop whose value compounds, and one that general-purpose LLMs are unlikely to replicate for a long time.

The best way to set this up is to review chat transcripts regularly, or to build a simple agent that reviews them for you and flags the conversations that needed human attention or left the user dissatisfied. When the agent struggles, update the knowledge base. Each fix makes the system permanently better and, over the long term, builds a much wider moat protecting your business from generic AI and large language models.
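A transcript reviewer does not have to be sophisticated to be useful. Here is a minimal sketch of the flagging idea using plain phrase matching. The phrase lists and the transcript shape are assumptions made up for this example, and a real reviewer (human or agent) would rely on richer signals.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    id: str
    messages: list  # list of (speaker, text) tuples


# Hypothetical signal phrases; tune these against your own conversations.
DISSATISFIED = ("that's wrong", "not helpful", "speak to a human")
AGENT_UNSURE = ("i'm not sure", "i don't have information")


def flag_reasons(transcript):
    """Return the reasons this transcript deserves human review, if any."""
    reasons = []
    for speaker, text in transcript.messages:
        lowered = text.lower()
        if speaker == "user" and any(p in lowered for p in DISSATISFIED):
            reasons.append("user dissatisfied")
        if speaker == "agent" and any(p in lowered for p in AGENT_UNSURE):
            reasons.append("agent lacked knowledge")
    # Deduplicate while keeping the order in which reasons first appeared.
    return list(dict.fromkeys(reasons))


struggling = Transcript("t1", [
    ("user", "Can I switch mid-contract?"),
    ("agent", "I'm not sure about that carrier."),
    ("user", "This is not helpful."),
])
fine = Transcript("t2", [
    ("user", "Will I lose my phone number?"),
    ("agent", "No, porting preserves it."),
])

for t in (struggling, fine):
    print(t.id, flag_reasons(t))
```

Every flagged transcript is a candidate for a new or corrected knowledge base document.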

How to Write Knowledge Base Content

This section is for the person who is actually going to create the documents that go into the knowledge base. It’s based on what works well with AI agents and what causes problems.

The Junior Employee Test

Before you write anything or decide what to put into your knowledge base, imagine you’re training a smart but brand-new junior employee who knows nothing about your business. They’re capable of following instructions and making good decisions if you give them clear ones. But they will misinterpret anything ambiguous, mix up similar-sounding things if you don’t separate them clearly, and they can only hold about 5 pieces of information in their head at once.

The way we like to think about it: if your document would confuse that person, it will confuse the agent.

Knowledge Base Document Structure Rules

Write each document like a tutorial, not a reference manual. Each document should explain one thing clearly. Use a descriptive title that tells the agent exactly what the document is about and, where it helps, what good versus bad looks like. Write in plain paragraphs with clear sections. Think “How to evaluate Insurance company X switching for a family of 4” not “Insurance Company X General Information.”

Keep documents self-contained. Each document should make sense on its own without needing other documents for context. We want the system to pull individual chunks, not entire libraries. A good rule of thumb: if a document references another document and can’t be understood without it, combine them or make each one standalone.

Use clear titles and section headers. A good retrieval system splits documents into smaller chunks, so descriptive titles and headers help it know where to split and what each chunk is about. Poor titles (“Notes” or “Misc”) make the chunks useless because the system can’t tell what they contain, and neither can you.
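To see why headers matter, here is a minimal sketch of header-based chunking. It assumes section headers are marked with a `## ` prefix, which is a convention chosen for this example only. Real retrieval systems use their own splitting rules, so treat this as an illustration of the idea rather than how any particular product works.

```python
def split_into_chunks(title, text, header_prefix="## "):
    """Split a document at header lines; label each chunk with its title and section."""
    chunks = []
    section, body = "Overview", []

    def flush():
        content = "\n".join(body).strip()
        if content:  # skip empty sections
            chunks.append({"title": title, "section": section, "text": content})

    for line in text.splitlines():
        if line.startswith(header_prefix):
            flush()  # close the previous section before starting a new one
            section, body = line[len(header_prefix):].strip(), []
        else:
            body.append(line)
    flush()  # close the final section
    return chunks


doc = """## When the discount applies
The autopay discount applies only with a bank account.

## What the agent should do
Mention the bank account requirement before the user enrolls.
"""

chunks = split_into_chunks("Provider X autopay discount rules", doc)
for chunk in chunks:
    print(f'{chunk["title"]} / {chunk["section"]}: {chunk["text"]}')
```

Each chunk carries its document title and section name, so it stays understandable when it is retrieved alone. With a title like “Notes” and no headers, the same text would come back as one unlabeled blob.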

Always break large topics into multiple documents. Instead of one 20-page document about insurance switching, create separate documents for each insurance carrier, each scenario type, and each common question. We consistently see that smaller, focused documents retrieve more accurately than large, broad ones, and the agent works much better at this level.

Avoid contradictions across documents and root them out aggressively. If two documents say different things about the same topic, the agent will get confused and may give incorrect or inconsistent answers. When rules change or information gets updated, make sure the old version is removed or clearly superseded.

Pro tip for Knowledge Base structures: After writing a document, ask yourself: If someone pulled just this one section out at random, with no surrounding context, would it make sense? Would it give the junior employee (or agent) enough to answer the question accurately? If not, add the missing context or restructure.

What Goes in the Knowledge Base (and What Doesn’t)

This is one of the most important distinctions to understand and get right early. Putting the wrong type of information in the knowledge base creates maintenance headaches and reduces agent accuracy over the long term. It is also the easiest way to end up frustrated with AI and complaining that it is useless.

What goes in the Knowledge Base

| Type | Examples |
| --- | --- |
| Domain expertise and gotchas | The $X trade-in credit requires the top-tier plan. Autopay discount only works with a bank account on provider X. Don’t recommend this provider outside their legal operating footprint. |
| Decision rules and workflows | If the user has insurance provider X AND wants to bring their family, recommend this provider first. If the user is paying off a mobile device with more than $200 remaining, check if the new carrier offers a payoff credit. |
| Common questions and misconceptions | Will I lose my phone number? (No, porting preserves it.) Does unlimited really mean unlimited? (Throttling thresholds exist.) Can I switch mid-contract? (Usually yes, with a device payoff, or a carrier may pay off your switching costs.) |
| Process guides | How to unlock your phone from each carrier. How porting actually works. What to expect during the switching process. How long each carrier’s activation takes. |
| “When NOT to” rules | Situations where the agent should advise against action. When to tell a user to stay with their current carrier. When to wait for a better promotion. |

What doesn’t go in a Knowledge Base

| Type | Why Not |
| --- | --- |
| Pricing data that changes frequently | Plan prices, promotional offers, and costs that change daily or weekly. If you put these in the KB, they go stale immediately. Instead, if this info is important, the agent should query this data from your database or an API in real time. |
| Large structured tables | Compatibility matrices, feature comparison tables with 50+ rows. The agent receives text, not tables. Large tables lose their structure when chunked. Use API queries for structured data lookups. |
| Information the LLM already knows | General knowledge about how insurance works, what a specific term is, what a specific product does. The AI already knows this from its training data. Don’t waste KB space on things the agent can answer from general knowledge or from one of the popular LLMs. |
| Internal process docs unrelated to the agent’s job | HR policies, internal meeting notes, call transcripts, company strategy docs. The KB should only contain knowledge the agent needs to do its specific job. Unrelated content creates noise and increases the chance of irrelevant retrieval. |

A good rule of thumb: If the information changes more than once a month and accuracy matters, it should come from an API, not the knowledge base. If the information is stable expertise that doesn’t change often and requires human judgment to create, it belongs in the knowledge base.
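If you want to apply this rule of thumb consistently across a content inventory, it can be written down as a tiny function. This is a sketch of the rule as stated above. The function name and the third outcome, "review", are our own additions for items that fit neither case cleanly.

```python
def placement(changes_per_month: float, accuracy_matters: bool,
              needs_human_judgment: bool) -> str:
    """Decide where a piece of information should live."""
    if changes_per_month > 1 and accuracy_matters:
        return "api"             # volatile and important: fetch it live
    if changes_per_month <= 1 and needs_human_judgment:
        return "knowledge_base"  # stable expertise: write it down
    return "review"              # hypothetical bucket: a person should decide


# Daily promotional pricing that users rely on.
print(placement(30, accuracy_matters=True, needs_human_judgment=False))   # api
# An expert gotcha that changes maybe once a year.
print(placement(0.1, accuracy_matters=True, needs_human_judgment=True))   # knowledge_base
# Frequently changing trivia that nobody depends on.
print(placement(4, accuracy_matters=False, needs_human_judgment=False))   # review
```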

How to Organize Multiple Knowledge Bases

As your system grows, you’ll have multiple knowledge bases serving different agents and experiences. The architecture matters because it determines how easily you can scale and how accurately agents retrieve the right information.

Organize by scope, not by format. Each knowledge base should cover one clear domain so the agent isn’t searching through irrelevant content.

Carrier-Specific Knowledge Bases

Create one knowledge base per carrier. If you are working with mobile phone operators, for example, that means one each for Xfinity, T-Mobile, AT&T, Verizon, Spectrum, and so on. Each contains the gotchas, switching friction, bundling rules, and process guides specific to that carrier. This prevents the agent from mixing up Xfinity rules with T-Mobile rules, which is a real risk when knowledge bases get too broad.

Shared / Cross-Carrier Knowledge Base

A provider-agnostic knowledge base contains rules and knowledge that apply regardless of which provider is involved. Things like: how switching works in general, common misconceptions about switching, general advice about when to stay vs. switch, how to evaluate total cost of coverage. Any future agents you build will query this shared layer alongside the specific ones.

How Agents Use Multiple Knowledge Bases

An agent can be connected to multiple knowledge bases at once. When a user asks a question, the system searches across all connected knowledge bases and pulls the most relevant chunks. By keeping knowledge bases focused, you ensure the chunks that come back are actually relevant to what the user is asking about.

There is a practical limit here. The system injects approximately 5 knowledge snippets into the agent’s context for any given response. That’s why organization matters. If the 5 snippets come from a well-organized, focused knowledge base, they’re highly relevant. If they come from a giant, unfocused knowledge base, some of them will be noise.
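To make the snippet limit concrete, here is a toy retrieval sketch. It scores chunks by simple word overlap with the question and keeps the top 5 across all connected knowledge bases. Production systems typically use embeddings rather than word overlap, and the knowledge base names and contents below are made up for illustration.

```python
import re


def tokens(text):
    """Lowercase word set; a crude stand-in for a real embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, knowledge_bases, limit=5):
    """Search every connected knowledge base and keep the best `limit` chunks."""
    query_tokens = tokens(query)
    scored = []
    for kb_name, chunks in knowledge_bases.items():
        for chunk in chunks:
            score = len(query_tokens & tokens(chunk))
            if score > 0:
                scored.append((score, kb_name, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Hypothetical knowledge bases and contents.
knowledge_bases = {
    "carrier_a": ["Carrier A autopay discount requires a bank account, not a credit card."],
    "carrier_b": ["Carrier B trade-in credit requires the top-tier plan."],
    "shared": ["Porting preserves your phone number when you switch."],
}

results = retrieve("Does the Carrier A autopay discount work with a credit card?",
                   knowledge_bases)
for score, kb_name, chunk in results:
    print(score, kb_name, chunk)
```

Notice that the Carrier B chunk still comes back with a lower score, even though the question is about Carrier A. That is the noise problem in miniature: the broader the pool you search, the more of your 5 slots get filled with near misses. Connecting the agent only to the knowledge bases that match the user’s situation keeps those slots for relevant chunks.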

Template for Writing a Knowledge Base Document

Use this structure for every document you add to the knowledge base. It’s designed to produce content that agents can retrieve and use accurately.

| Element | What to Write |
| --- | --- |
| Title | A clear, descriptive title. Example: “How to evaluate company X’s offer for families that need X, Y and Z,” not “Company Notes” or “Company Info.” The title helps the system know what this document is about during retrieval. |
| Context | 1-2 sentences establishing when this knowledge is relevant. Example: “This applies when a user is currently on company X and is considering switching to company Y while keeping their existing coverage.” |
| The knowledge itself | Written in plain English paragraphs. Explain it like you’re telling a smart but new employee what they need to know. Include the WHY, not just the WHAT. Example: “The autopay discount is $100/person, but it only applies with a bank account, not a credit card. This trips up a lot of users who set up autopay with their credit card and then don’t see the discount.” |
| The gotcha / edge case | What could go wrong or what’s commonly misunderstood. Example: “Users often assume the $0/month promotional rate is permanent. It’s a 12-month credit. After that, the plan goes to $100/person. Still cheaper than company X, but they should know before they switch.” |
| The recommendation | What should the agent do with this knowledge? Example: “Recommend Company X as the top option when the user has a family member already with them through an employer and wants to bring their kids. Always mention the 12-month promotional period so the user isn’t surprised.” |
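If your team drafts documents in a tool or script, the template can also be enforced in code. The sketch below is one possible shape, not a required format: the `KBDocument` class and its field names are hypothetical, and the sample text is taken from the template examples above.

```python
from dataclasses import dataclass, fields


@dataclass
class KBDocument:
    title: str
    context: str
    knowledge: str
    gotcha: str
    recommendation: str

    def missing_elements(self):
        """Names of template elements that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self):
        """Plain-text form of the document, ready to load into a knowledge base."""
        return "\n\n".join([
            f"Title: {self.title}",
            f"Context: {self.context}",
            self.knowledge,
            f"Gotcha: {self.gotcha}",
            f"What the agent should do: {self.recommendation}",
        ])


draft = KBDocument(
    title="How to evaluate company X's promotional rate",
    context="This applies when a user is considering switching to company X.",
    knowledge="The $0/month promotional rate is a 12-month credit.",
    gotcha="",  # the author forgot the edge case
    recommendation="Always mention the 12-month promotional period.",
)

print(draft.missing_elements())  # ['gotcha']
```

A check like this catches drafts that skip the gotcha or the recommendation, which are the two elements people most often leave out.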

Example: A Complete, Real-World KB Document

Title: Lemonade Insurance Autopay Discount Rules

Context: This applies when recommending Lemonade Life Insurance to a user who plans to set up autopay.

Lemonade Life Insurance offers a $100/plan autopay discount, but there’s an important catch that most users miss. The discount only applies when autopay is set up with a bank account (ACH). If the user sets up autopay with a credit card, the discount does not apply. This is one of the most common complaints from new Lemonade Life Insurance customers because the enrollment flow doesn’t make this obvious.

What the agent should do: When recommending Lemonade Life Insurance, always mention that the autopay discount requires a bank account, not a credit card. If the user seems surprised or resistant to using a bank account, acknowledge that it’s inconvenient but explain the savings are significant ($100/person/year).

Getting Started: Your First 10 Documents

Don’t try to write 100 documents at once. Start with 10 high-value documents that cover the scenarios your agent will encounter most frequently. Quality beats quantity. Ten well-written documents will produce better agent behavior than 100 poorly structured ones.

Where to Start

  1. Write the 5 most common gotchas. The things your team tells people every day that aren’t obvious from the data. One document per gotcha.
  2. Write the “when NOT to recommend” guide. The scenarios where the honest advice is to do nothing. This is high-value because it builds trust and it’s knowledge that no comparison table provides.
  3. Write 2-3 company-specific guides. Start with your most common path, such as “How switching from one insurance company to another specific provider actually works, step by step.” Include timing, what to expect, and common friction points.
  4. Write the bundling explainer. How life, health and dental bundling discounts work, which providers qualify, and the regional limitations. This is complex knowledge that users always need help with.
  5. Write the misconceptions FAQ. “Will I lose my coverage?” “Does unlimited really mean unlimited?” “Can I switch mid-contract?” These are the questions your support team answers every day. Write the real answers.

Once those 10 documents are in the system, test the agent against real scenarios. See where it answers well and where it struggles. The gaps will tell you what to write next. That’s the Layer 5 learning loop starting to work.
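Testing against real scenarios can start as a short script. In this sketch, each scenario pairs a question with phrases a good answer must mention, and the output is the list of gaps. The `stub_agent` function is a stand-in for a call to your real agent, and the scenarios are illustrative.

```python
def evaluate(agent, scenarios):
    """Run each scenario and report which required points the answer missed."""
    gaps = []
    for question, must_mention in scenarios:
        answer = agent(question).lower()
        missing = [phrase for phrase in must_mention if phrase.lower() not in answer]
        if missing:
            gaps.append((question, missing))
    return gaps


def stub_agent(question):
    # Stand-in for a call to your real agent.
    if "autopay" in question.lower():
        return "The autopay discount requires a bank account."
    return "I don't have information on that."


scenarios = [
    ("Does the autopay discount work with my credit card?", ["bank account"]),
    ("Can I switch mid-contract?", ["device payoff"]),
]

gaps = evaluate(stub_agent, scenarios)
for question, missing in gaps:
    print(f"Gap: {question!r} is missing {missing}")
```

Each gap points at a document to write or fix. Phrase matching is a blunt instrument, so keep a person reviewing the answers as well.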

Remember: The knowledge base is not a filing cabinet. It’s the brain of your agents. The quality of what you put in directly determines the quality of what your users get out. Start small, write clearly, test constantly, and improve based on what real users ask. And if you need help, you can always reach out to our team at Spiral Scout.

Ready to turn your team’s expertise into a competitive moat?

Most companies are sitting on a goldmine of undocumented tribal knowledge. We help you extract those decision rules and build production-grade agent systems that actually run.

Operationalizing Your Expertise: Frequently Asked Questions

Is an AI Knowledge Base the same as a RAG (Retrieval-Augmented Generation) system?

RAG is the technical delivery mechanism, while the Knowledge Base is the intellectual engine. A traditional RAG system often fails because the source documents are structured like a library (Layer 1) instead of a playbook (Layer 3). We focus on building the playbook first so the agent can actually take action.

How much documentation do I need to start?

Quality beats quantity every time. Follow our “First 10 Documents” rule: encode your top 5 “expert gotchas” and your most common workflow. It is far more effective to have 10 high-quality, expert-led playbooks than 1,000 pages of generic PDFs that cause the agent to hallucinate.

Who should write these documents – our engineers or our experts?

Your subject matter experts provide the reasoning, while the formatting should follow our Junior Employee Test. During our Knowledge Extraction phase, we bridge this gap by interviewing your experts and structuring their thinking into a format that AI agents can execute without ambiguity.

Can I use my existing company Wiki or Notion as a Knowledge Base?

Yes, but avoid the “data dump.” Most internal wikis only cover Layer 1 (facts). To make them agent-ready, you must add Layer 3 (Workflows) and Layer 4 (Expert Judgment). Use the template provided in this guide to transform your static notes into active instructions.

What happens if our business rules change frequently?

Use our Volatility Rule: If information (like pricing or stock levels) changes more than once a month, it should come from an API or database, not the Knowledge Base. Reserve your Knowledge Base for stable “Expertise” – the decision-making logic that remains consistent even as the underlying numbers fluctuate.
