Knowledge Graphs
Knowledge Graphs let you describe your domain — categories, types, attributes, relationships — and have the engine auto-generate MCP tools that agents use to navigate it deterministically. No more hallucinated IDs. No more “approved record was missed”. Full recall on structured queries.
This is a declarative-first primitive: customers describe their domain in JSON Schema, push it as a bundle, and the engine generates `list_X` and `get_X` tools (plus an opt-in `list_X_ids`) per entity type for any agent bound to that bundle.
When to use Knowledge Graphs
Knowledge Graphs work for structured, slow-changing domain models — typically 10 to 2,000 entities per type, ~10,000 total. They are the right primitive for:
- Taxonomies and ontologies — categories → brands → attributes; conditions → symptoms → treatments; jurisdictions → statutes → topics
- Catalogs of typed records — product categories with attributes and brands; legal statutes by jurisdiction; controlled medical terminology
- Known-issue libraries — products → modules → known issues → resolutions
- Cross-referenced reference data — codes, registries, controlled vocabularies
They are not for:
- Inventory or transactional data (20,000 SKUs with real-time stock) → use an external MCP server pointing at your existing system
- Long-form documents and narrative content → use Knowledge / RAG (vector search) instead
- Conversation memory (what the user told the agent) → use Memory instead
How it works
- Define entity schemas in JSON Schema (Draft 2020-12) with SyntheticBrew `x-*` annotations.
- Bulk-import entity instances matching the schemas into a named bundle.
- Bind the bundle to one or more agents via the `knowledge_graphs` capability.
- The engine auto-generates MCP tools per entity type and injects them into bound agents.
```text
my-bundle/
├── manifest.yaml
├── schemas/
│   ├── category.schema.json
│   └── brand.schema.json
└── entities/
    ├── categories.yaml   # array of category entities
    └── brands.yaml       # array of brand entities
```
After `brewctl kg apply ./my-bundle`:
An agent call to `list_category(filters={popularity: "high"})` runs:

```sql
SELECT data FROM kg_entity
WHERE tenant_id = ?
  AND bundle_name = 'my-bundle'
  AND entity_type = 'category'
  AND data @> '{"popularity": "high"}'
```

It returns the full entity records and the total count, with deterministic recall.

An agent call to `get_brand("north-aurora")` runs:

```sql
SELECT data FROM kg_entity
WHERE tenant_id = ? ... AND entity_id = 'north-aurora'
```

It returns the entity, or `null` if not found. No hallucinated payload.

Entity schemas
An entity schema is a standard JSON Schema document with SyntheticBrew-specific x-* extension annotations. These annotations tell the engine which field is the primary ID, which fields are filterable, and which fields reference other entity types.
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "category",
  "title": "Category",
  "description": "A product category in the catalog.",
  "type": "object",
  "x-id-field": "code",
  "x-tool-expose": ["list", "get"],
  "x-tool-description": "Catalog categories. Use list_category to enumerate top-level categories and get_category to fetch one by code.",
  "required": ["code", "name", "tier"],
  "additionalProperties": false,
  "properties": {
    "code": {
      "type": "string",
      "pattern": "^[a-z][a-z0-9_-]{1,30}$",
      "description": "Lowercase short code.",
      "x-index": true
    },
    "name": { "type": "string", "minLength": 3, "maxLength": 60 },
    "tier": { "type": "string", "enum": ["primary", "secondary"], "x-index": true },
    "popularity": { "type": "string", "enum": ["high", "medium", "low"], "x-index": true },
    "brand_count": { "type": "integer", "minimum": 0, "x-derived": true }
  }
}
```

See the Schema annotations reference for the full list of `x-*` annotations.
Auto-generated MCP tools
For each entity schema you declare, the engine generates up to three MCP tools:
| Tool | Generated when | Parameters | Returns |
|---|---|---|---|
| `list_<entity_type>` | `"list"` ∈ `x-tool-expose` (default) | `filters` (one key per `x-index` field), `limit` (default 50, max 500), `offset` | `{items: [...], total: int}` |
| `get_<entity_type>` | `"get"` ∈ `x-tool-expose` (default) | `id` (matches `x-id-field`) | Full entity, or `null` |
| `list_<entity_type>_ids` | `"list_ids"` ∈ `x-tool-expose` (opt-in) | Same as `list_*` | `{ids: [...], total: int}` (lighter payload) |
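To make the generation rules in the table concrete, here is a minimal Python sketch that derives tool names and parameters from a schema's annotations. It illustrates the rules as documented and is not the engine's implementation. The function name `generated_tools` and the parameter labels it returns are hypothetical.

```python
# Hypothetical illustration of the generation rules; not the engine's code.
DEFAULT_EXPOSE = ("list", "get")  # list_* and get_* are generated by default


def generated_tools(schema: dict) -> dict[str, list[str]]:
    """Map each tool name a schema would produce to its parameter labels."""
    entity_type = schema["$id"]
    expose = schema.get("x-tool-expose", DEFAULT_EXPOSE)
    # Only properties annotated with "x-index": true become filter keys.
    filter_keys = sorted(
        name
        for name, prop in schema.get("properties", {}).items()
        if prop.get("x-index") is True
    )
    list_params = [f"filters.{key}" for key in filter_keys] + ["limit", "offset"]

    tools: dict[str, list[str]] = {}
    if "list" in expose:
        tools[f"list_{entity_type}"] = list_params
    if "get" in expose:
        # get_* takes a single id, matched against the x-id-field property.
        tools[f"get_{entity_type}"] = [f"id ({schema['x-id-field']})"]
    if "list_ids" in expose:
        tools[f"list_{entity_type}_ids"] = list_params  # same params as list_*
    return tools


category = {
    "$id": "category",
    "x-id-field": "code",
    "x-tool-expose": ["list", "get"],
    "properties": {
        "code": {"type": "string", "x-index": True},
        "name": {"type": "string"},
        "tier": {"type": "string", "x-index": True},
        "popularity": {"type": "string", "x-index": True},
        "brand_count": {"type": "integer", "x-derived": True},
    },
}
print(generated_tools(category))
```

With the category schema, `name` and `brand_count` carry no `x-index`, so they never appear as filter keys. `list_category_ids` appears only after `"list_ids"` is added to `x-tool-expose`.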
Tools are namespaced per tenant — list_category in tenant A’s bundle is invisible to tenant B. Within a tenant, two bundles cannot expose conflicting tool names; the second apply is rejected with tool_name_collision_in_tenant.
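The namespacing rule can be modelled with a small per-tenant registry. This is a hypothetical sketch: `ToolRegistry` and its methods are not part of any SyntheticBrew API. It also assumes that re-applying the same bundle replaces that bundle's own tools instead of colliding with them.

```python
# Hypothetical model of per-tenant tool namespacing; not a SyntheticBrew API.
class ToolRegistry:
    def __init__(self) -> None:
        # tenant_id -> bundle_name -> tool names exposed by that bundle
        self._by_tenant: dict[str, dict[str, set[str]]] = {}

    def apply(self, tenant_id: str, bundle: str, tool_names: set[str]) -> None:
        """Register a bundle's tools, rejecting names another bundle already exposes."""
        bundles = self._by_tenant.setdefault(tenant_id, {})
        for other, other_tools in bundles.items():
            if other == bundle:
                continue  # assumption: re-applying a bundle replaces its own tools
            clash = tool_names & other_tools
            if clash:
                raise ValueError(
                    f"tool_name_collision_in_tenant: {sorted(clash)} "
                    f"already exposed by bundle {other!r}"
                )
        bundles[bundle] = set(tool_names)

    def tools(self, tenant_id: str) -> set[str]:
        """Tool names visible within one tenant; other tenants never leak in."""
        return set().union(*self._by_tenant.get(tenant_id, {}).values())


registry = ToolRegistry()
registry.apply("tenant-a", "catalog", {"list_category", "get_category"})
# The same names in another tenant are fine: namespaces are per tenant.
registry.apply("tenant-b", "catalog", {"list_category", "get_category"})
try:
    registry.apply("tenant-a", "legacy-catalog", {"list_category"})
except ValueError as err:
    print(err)  # the second bundle in tenant-a is rejected
```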
Hard limits on `list_*` parameters:

- `limit` must be `1..500`. Values `< 1` or `> 500` return HTTP 400 `[INVALID_INPUT] limit must be between 1 and 500`; the engine does not silently clamp.
- `filters` keys must reference a field marked `x-index: true` in the schema. Unknown keys return HTTP 400 with the list of allowed keys.
- The agent SDK passes filter values as JSON. LLMs occasionally send the wrong shape (a JSON-encoded string instead of an object) or the display label instead of the code. See the Prompt engineering section below for mitigations.
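The following sketch models these rules against an in-memory list of entities. `list_entities` and `InvalidInput` are hypothetical names. Requiring equality on every filter key approximates the JSONB containment (`@>`) query for scalar values. Two behaviours are our own assumptions: `total` counts all matches before `limit`/`offset` are applied, and a string-valued `filters` argument is rejected. This page does not specify how the engine handles the latter.

```python
# Hypothetical in-memory model of a list_* call; not the engine's implementation.
DEFAULT_LIMIT = 50
MAX_LIMIT = 500


class InvalidInput(ValueError):
    """Stands in for the engine's HTTP 400 [INVALID_INPUT] response."""


def list_entities(entities, indexed_fields, filters=None, limit=DEFAULT_LIMIT, offset=0):
    if not 1 <= limit <= MAX_LIMIT:
        # Out-of-range limits are rejected, never clamped.
        raise InvalidInput(f"[INVALID_INPUT] limit must be between 1 and {MAX_LIMIT}")
    if filters is None:
        filters = {}
    if not isinstance(filters, dict):
        # e.g. an LLM sent '{"tier": "primary"}' as a JSON-encoded string.
        raise InvalidInput("filters must be an object, not a JSON-encoded string")
    unknown = set(filters) - set(indexed_fields)
    if unknown:
        raise InvalidInput(
            f"unknown filter keys {sorted(unknown)}; allowed: {sorted(indexed_fields)}"
        )
    # Every filter must match exactly (approximates data @> filters for scalars).
    matches = [e for e in entities if all(e.get(k) == v for k, v in filters.items())]
    # Assumption: total counts all matches, before limit/offset are applied.
    return {"items": matches[offset:offset + limit], "total": len(matches)}


categories = [
    {"code": "footwear", "tier": "primary", "popularity": "high"},
    {"code": "outerwear", "tier": "primary", "popularity": "medium"},
    {"code": "socks", "tier": "secondary", "popularity": "high"},
]
page = list_entities(categories, ["code", "tier", "popularity"], {"popularity": "high"}, limit=1)
print(page)  # one item ("footwear"); total is 2
```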
Cross-references and navigation
The x-ref annotation marks a property as referencing another entity type. The engine validates that every referenced ID exists at apply time. The admin UI renders refs as clickable navigation links.
```json
{
  "$id": "brand",
  "type": "object",
  "x-id-field": "code",
  "properties": {
    "code": { "type": "string" },
    "category": { "type": "string", "x-ref": "category" },
    "parent_brand": { "type": "string", "x-ref": "brand", "x-ref-field": "code" }
  }
}
```

Cycles between entity types are allowed (A → B → A); they are detected and logged at apply time as a warning, not rejected.
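Apply-time reference checking can be sketched in two passes. The first confirms that every `x-ref` value exists in the target type. The second looks for cycles in the type-level reference graph and reports them as warnings rather than errors. This is a hypothetical illustration, and `validate_refs`/`find_cycles` are illustrative names. Falling back to the target's `x-id-field` when `x-ref-field` is absent is also an assumption.

```python
from collections import defaultdict


def find_cycles(edges: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return DFS back edges (u, v); each closes a cycle in the type graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: dict[str, int] = defaultdict(int)  # every node starts WHITE
    back_edges: list[tuple[str, str]] = []

    def visit(u: str) -> None:
        color[u] = GREY  # u is on the current DFS path
        for v in sorted(edges.get(u, ())):
            if color[v] == GREY:
                back_edges.append((u, v))
            elif color[v] == WHITE:
                visit(v)
        color[u] = BLACK  # fully explored

    for node in sorted(edges):
        if color[node] == WHITE:
            visit(node)
    return back_edges


def validate_refs(schemas: dict[str, dict], entities: dict[str, list[dict]]) -> list[str]:
    """Raise on dangling x-ref values; return cycle warnings (hypothetical sketch)."""
    edges: dict[str, set[str]] = defaultdict(set)
    for etype, schema in schemas.items():
        for prop, spec in schema.get("properties", {}).items():
            target = spec.get("x-ref")
            if target is None:
                continue
            if target not in schemas:
                raise ValueError(f"{etype}.{prop}: unknown entity type {target!r}")
            edges[etype].add(target)
            # Assumption: without x-ref-field, match against the target's id field.
            field = spec.get("x-ref-field", schemas[target]["x-id-field"])
            known = {row.get(field) for row in entities.get(target, [])}
            for row in entities.get(etype, []):
                value = row.get(prop)
                if value is not None and value not in known:
                    raise ValueError(f"{etype}.{prop} = {value!r}: no {target} with {field} = {value!r}")
    return [f"warning: x-ref cycle via {u} -> {v}" for u, v in find_cycles(edges)]


schemas = {
    "category": {"x-id-field": "code", "properties": {"code": {"type": "string"}}},
    "brand": {
        "x-id-field": "code",
        "properties": {
            "code": {"type": "string"},
            "category": {"type": "string", "x-ref": "category"},
            "parent_brand": {"type": "string", "x-ref": "brand", "x-ref-field": "code"},
        },
    },
}
entities = {
    "category": [{"code": "footwear"}],
    "brand": [
        {"code": "north-aurora", "category": "footwear"},
        {"code": "stride-co", "category": "footwear", "parent_brand": "north-aurora"},
    ],
}
print(validate_refs(schemas, entities))  # brand -> brand is a cycle: warned, not rejected
```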
Binding to agents
Self-hosted DB note. The `knowledge_graphs` capability type is enabled by migration `011_capabilities_kg_constraint.yaml` (ships with engine 1.3.0). If you run Liquibase manually against an existing database from an older engine version, make sure that migration is applied before binding the capability. Otherwise, the INSERT into `capabilities` fails the DB CHECK constraint.
A bundle becomes visible to an agent through the knowledge_graphs capability:
```yaml
agents:
  catalog-assistant:
    model: glm-5
    capabilities:
      - type: knowledge_graphs
        config:
          bundles: [ecommerce-catalog-example]
    system: |
      You are bound to the "ecommerce-catalog-example" knowledge graph.
      You have read-only tools: list_category, get_category, list_brand,
      get_brand, list_product_attribute, get_product_attribute.

      MANDATORY workflow on every user question:
      1. Identify which entity_type the question is about.
      2. Use list_/get_ tools. NEVER invent entity codes or attribute values.
      3. If a tool returns 0 results, say so explicitly. Suggest the closest
         existing entities by querying a related type.
      4. Prefer popularity=high categories first when not specified.
      5. Cite the entity code of every recommendation.

      Filter values must be ENTITY CODES (lowercase snake_case or kebab-case),
      not display names. Filters is an object, not a JSON-encoded string.
```

See the Prompt engineering section below. Without this MANDATORY block, your agent will ignore the KG tools and hallucinate.
Agents not bound to a bundle do not see its tools. Two agents in the same tenant can be bound to different bundles and see different tools.
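Tool visibility follows directly from the bindings. The hypothetical sketch below represents each agent's configuration as a plain dict that mirrors the YAML above. `bundle_tools` stands in for the tools each applied bundle exposes in the tenant. The `support-issues` bundle and its tool names are made up for illustration.

```python
# Hypothetical sketch of binding-based tool visibility; names are illustrative.
def visible_tools(agent: dict, bundle_tools: dict[str, set[str]]) -> set[str]:
    """Return the KG tools an agent sees: only those of bundles it is bound to."""
    visible: set[str] = set()
    for capability in agent.get("capabilities", []):
        if capability.get("type") != "knowledge_graphs":
            continue
        for bundle in capability.get("config", {}).get("bundles", []):
            visible |= bundle_tools.get(bundle, set())
    return visible


bundle_tools = {
    "ecommerce-catalog-example": {"list_category", "get_category", "list_brand", "get_brand"},
    "support-issues": {"list_known_issue", "get_known_issue"},  # hypothetical bundle
}
catalog_assistant = {
    "capabilities": [
        {"type": "knowledge_graphs", "config": {"bundles": ["ecommerce-catalog-example"]}}
    ]
}
support_agent = {
    "capabilities": [{"type": "knowledge_graphs", "config": {"bundles": ["support-issues"]}}]
}
unbound_agent = {"capabilities": []}

print(sorted(visible_tools(catalog_assistant, bundle_tools)))
```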
Prompt engineering for KG-grounded agents
Auto-generated KG tools are only useful if the agent decides to call them. Real testing on production-scale bundles shows that a weak system prompt — for example "You help users navigate the product catalog" — causes the LLM to ignore its KG tools entirely and answer from general knowledge. The result: confident hallucinations of entity codes that look right but do not exist.
The fix is an explicit, mandatory workflow in the agent’s system prompt.
Required workflow
Use the following template (adapt the entity-type names to your bundle):
```text
You are bound to the "{bundle_name}" knowledge graph. You have access to these
read-only tools for navigating it deterministically:

  list_<entity_type>(filters?, limit?, offset?)  → enumerates entities
  get_<entity_type>(id)                          → fetches one entity by id

MANDATORY workflow on every user question:

1. Identify which entity_types the question is about. If unclear, call
   list_<entity_type> on the most general type first to discover the domain.
2. Use the list_/get_ tools to retrieve the actual entities. NEVER invent
   entity ids, codes, or attribute values from general knowledge.
3. If a tool returns 0 results, say so explicitly. Suggest the closest
   existing entities by querying a related type (do not fabricate).
4. Prefer popularity=high entities first when the user has not specified.
5. Cite the entity id of every record you recommend, e.g. "north-aurora",
   so the user can verify.

Filter argument format:
- Filter values must be ENTITY CODES (lowercase snake_case or kebab-case
  identifiers, e.g. category="footwear"), NOT display names ("Footwear").
- `filters` is an object, not a JSON-encoded string. Example call:
  list_brand(filters={"category": "footwear", "tier": "premium"}, limit=50).
```

Why this matters
| Weak prompt | Strong prompt (template above) |
|---|---|
| Agent gives generic catalog advice from training data. | Agent calls `list_category` first, then drills down with `list_brand(filters={category: ...})`. |
| Agent hallucinates entity codes (e.g. `"brand-xyz"`, which does not exist). | Cited entity codes resolve via `get_*` to real records. |
| User asks about a niche the bundle does not cover, and the agent invents one. | Agent says “no premium footwear brands in the catalog yet, closest is mid-tier stride-co” and cites real codes. |
| Filter args use display labels: 0 results, and the agent spirals through variants. | Filter args use canonical codes: the first call returns the correct items. |
This is the single most important piece of production guidance for shipping a KG-bound agent. Without it, partners will see hallucinated answers even on a correctly applied bundle.
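If you run several KG-bound agents, the tool listing at the top of the template above can be generated from each bundle's entity types instead of being written by hand. This is a minimal sketch. `kg_system_prompt` and `WORKFLOW_AND_FILTER_RULES` are hypothetical names, and you still paste the rest of the template (steps 1 to 5 and the filter-format rules) into the constant yourself.

```python
# Paste the rest of the template here: steps 1-5 and the filter-format rules.
WORKFLOW_AND_FILTER_RULES = "MANDATORY workflow on every user question:\n..."


def kg_system_prompt(bundle_name: str, entity_types: list[str]) -> str:
    """Fill the template's header and tool listing for one bundle (hypothetical helper)."""
    lines = [
        f'You are bound to the "{bundle_name}" knowledge graph. You have access to these',
        "read-only tools for navigating it deterministically:",
        "",
    ]
    for entity_type in entity_types:
        lines.append(f"  list_{entity_type}(filters?, limit?, offset?) → enumerates entities")
        lines.append(f"  get_{entity_type}(id) → fetches one entity by id")
    lines += ["", WORKFLOW_AND_FILTER_RULES]
    return "\n".join(lines)


print(kg_system_prompt("ecommerce-catalog-example", ["category", "brand", "product_attribute"]))
```

Because only the tool listing varies per bundle, keeping the mandatory workflow in one shared constant helps stop it from drifting between agents.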
Coexistence with vector knowledge
Knowledge Graphs and the vector Knowledge / RAG primitive are complementary, not competing. The right pattern is to use both together:

| Use case | Primitive |
|---|---|
| “What categories exist?” (full recall) | Knowledge Graphs (`list_category`) |
| “Show only premium-tier brands” (filtered) | Knowledge Graphs (`list_brand` with `filters={tier: "premium"}`) |
| “How do I care for wool?” (narrative search) | Knowledge / RAG (`knowledge_search`) |
| “What does the brand north-aurora carry?” (deterministic ID lookup) | Knowledge Graphs (`get_brand("north-aurora")`) |
Both capabilities can be enabled on the same agent.
Hybrid pattern: KG + external MCP
For domains with large transactional data (real-time stock, customer orders, live pricing), put structure in a Knowledge Graph and inventory behind an external MCP server. See the Hybrid pattern guide for full examples.
Limitations (MVP)
- Read-only graphs: agent tools only read. Mutations happen through the customer-side API/UI.
- No schema migrations: breaking schema changes require re-importing conforming entities.
- No cross-bundle refs: `x-ref` resolves within the same bundle only.
- Hot tool reload not supported: new tools become visible to new chat sessions, not in-flight ones.
- Single source of truth for bindings: agent bindings live in the existing `capabilities.config`, not in a separate binding table.
Next steps
- Quickstart tutorial: get a working graph in 15 minutes
- Schema annotations reference: every `x-*` annotation explained
- Hybrid pattern guide: combine with external MCP for inventory
- Admin UI guide: manage bundles through the dashboard