Knowledge Graphs

Knowledge Graphs let you describe your domain — categories, types, attributes, relationships — and have the engine auto-generate MCP tools that agents use to navigate it deterministically. No more hallucinated IDs. No more “approved record was missed”. Full recall on structured queries.

This is a declarative-first primitive: customers describe their domain in JSON Schema, push it as a bundle, and the engine generates list_X and get_X tools (plus list_X_ids when opted in) per entity type for any agent bound to that bundle.

Knowledge Graphs work for structured, slow-changing domain models — typically 10 to 2,000 entities per type, ~10,000 total. They are the right primitive for:

  • Taxonomies and ontologies — categories → brands → attributes; conditions → symptoms → treatments; jurisdictions → statutes → topics
  • Catalogs of typed records — product categories with attributes and brands; legal statutes by jurisdiction; controlled medical terminology
  • Known-issue libraries — products → modules → known issues → resolutions
  • Cross-referenced reference data — codes, registries, controlled vocabularies

They are not for:

  • Inventory or transactional data (20,000 SKUs with real-time stock) → use an external MCP server pointing at your existing system
  • Long-form documents and narrative content → use Knowledge / RAG (vector search) instead
  • Conversation memory (what the user told the agent) → use Memory instead

Setting up a Knowledge Graph takes four steps:

  1. Define entity schemas in JSON Schema (Draft 2020-12) with SyntheticBrew x-* annotations
  2. Bulk-import entity instances matching the schemas into a named bundle
  3. Bind the bundle to one or more agents via the knowledge_graphs capability
  4. The engine auto-generates MCP tools per entity type and injects them into bound agents

Example bundle layout:

my-bundle/
├── manifest.yaml
├── schemas/
│   ├── category.schema.json
│   └── brand.schema.json
└── entities/
    ├── categories.yaml   # array of category entities
    └── brands.yaml       # array of brand entities

After `brewctl kg apply ./my-bundle`:

Agent calls list_category(filters={popularity: "high"})
  → engine: SELECT data FROM kg_entity
            WHERE tenant_id = ? AND bundle_name = 'my-bundle'
              AND entity_type = 'category'
              AND data @> '{"popularity": "high"}'
  → returns full entity records, total count, with deterministic recall.

Agent calls get_brand("north-aurora")
  → engine: SELECT data FROM kg_entity
            WHERE tenant_id = ? ... AND entity_id = 'north-aurora'
  → returns the entity, or null if not found. No hallucinated payload.
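
The lookup flow above can be mimicked in memory to see why recall is deterministic. This is a minimal sketch, not the engine's implementation: `contains` only approximates the JSONB `@>` containment operator for nested objects and scalars (array containment is omitted), and `list_entities`, `get_entity`, and the sample records are hypothetical names and data introduced for illustration.

```python
from typing import Any, Optional


def contains(data: Any, pattern: Any) -> bool:
    """Approximate `data @> pattern` for objects and scalars (no array rules)."""
    if isinstance(pattern, dict):
        return isinstance(data, dict) and all(
            key in data and contains(data[key], value)
            for key, value in pattern.items()
        )
    return data == pattern


def list_entities(entities: list, filters: Optional[dict] = None,
                  limit: int = 50, offset: int = 0) -> dict:
    """Return one page of matches plus the total match count."""
    matched = [e for e in entities if contains(e, filters or {})]
    return {"items": matched[offset:offset + limit], "total": len(matched)}


def get_entity(entities: list, id_field: str, entity_id: str) -> Optional[dict]:
    """Return the entity with the given id, or None when it does not exist."""
    return next((e for e in entities if e.get(id_field) == entity_id), None)


# Illustrative records only; they do not come from a real bundle.
CATEGORIES = [
    {"code": "footwear", "tier": "primary", "popularity": "high"},
    {"code": "outerwear", "tier": "primary", "popularity": "high"},
    {"code": "accessories", "tier": "secondary", "popularity": "low"},
]

if __name__ == "__main__":
    page = list_entities(CATEGORIES, {"popularity": "high"}, limit=1)
    print(page["total"], [e["code"] for e in page["items"]])
    print(get_entity(CATEGORIES, "code", "north-aurora"))
```

Because `total` counts every match rather than only the returned page, a caller can keep paging with `offset` until it has seen all matching records.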

An entity schema is a standard JSON Schema document with SyntheticBrew-specific x-* extension annotations. These annotations tell the engine which field is the primary ID, which fields are filterable, and which fields reference other entity types.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "category",
  "title": "Category",
  "description": "A product category in the catalog.",
  "type": "object",
  "x-id-field": "code",
  "x-tool-expose": ["list", "get"],
  "x-tool-description": "Catalog categories. Use list_category to enumerate top-level categories and get_category to fetch one by code.",
  "required": ["code", "name", "tier"],
  "additionalProperties": false,
  "properties": {
    "code": {
      "type": "string",
      "pattern": "^[a-z][a-z0-9_-]{1,30}$",
      "description": "Lowercase short code.",
      "x-index": true
    },
    "name": {
      "type": "string",
      "minLength": 3,
      "maxLength": 60
    },
    "tier": {
      "type": "string",
      "enum": ["primary", "secondary"],
      "x-index": true
    },
    "popularity": {
      "type": "string",
      "enum": ["high", "medium", "low"],
      "x-index": true
    },
    "brand_count": {
      "type": "integer",
      "minimum": 0,
      "x-derived": true
    }
  }
}

See the Schema annotations reference for the full list of x-* annotations.
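
To make the role of the annotations concrete, here is a small sketch of how a tool could read them out of a schema. It assumes only the three annotations shown in the example above (x-id-field, x-index, x-derived); `read_annotations` is a hypothetical helper, and the schema is a trimmed copy of the category example.

```python
def read_annotations(schema: dict) -> dict:
    """Collect the id field plus the indexed and derived property names."""
    props = schema.get("properties", {})
    return {
        "id_field": schema.get("x-id-field"),
        "indexed": [n for n, spec in props.items() if spec.get("x-index") is True],
        "derived": [n for n, spec in props.items() if spec.get("x-derived") is True],
    }


# Trimmed copy of the category schema shown above.
CATEGORY_SCHEMA = {
    "$id": "category",
    "x-id-field": "code",
    "properties": {
        "code": {"type": "string", "x-index": True},
        "name": {"type": "string"},
        "tier": {"type": "string", "x-index": True},
        "popularity": {"type": "string", "x-index": True},
        "brand_count": {"type": "integer", "x-derived": True},
    },
}

if __name__ == "__main__":
    print(read_annotations(CATEGORY_SCHEMA))
```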

For each entity schema you declare, the engine generates up to three MCP tools:

Tool | Generated when | Parameters | Returns
list_<entity_type> | "list" in x-tool-expose (default) | filters (one per x-index field), limit (default 50, max 500), offset | {items: [...], total: int}
get_<entity_type> | "get" in x-tool-expose (default) | id (matches x-id-field) | full entity or null
list_<entity_type>_ids | "list_ids" in x-tool-expose (opt-in) | same as list_* | {ids: [...], total: int} (lighter payload)

Tools are namespaced per tenant — list_category in tenant A’s bundle is invisible to tenant B. Within a tenant, two bundles cannot expose conflicting tool names; the second apply is rejected with tool_name_collision_in_tenant.
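
The naming and collision rules can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: it assumes the entity type name is taken from the schema's `$id` and that a schema without x-tool-expose gets the two default tools; `tool_names` and `find_collisions` are hypothetical helpers.

```python
# Assumed defaults: "list" and "get" are exposed unless the schema says otherwise.
DEFAULT_EXPOSE = ("list", "get")
TEMPLATES = {"list": "list_{}", "get": "get_{}", "list_ids": "list_{}_ids"}


def tool_names(schema: dict) -> list:
    """Derive tool names for one entity schema (entity type read from $id)."""
    entity_type = schema["$id"]
    expose = schema.get("x-tool-expose", DEFAULT_EXPOSE)
    return [TEMPLATES[kind].format(entity_type) for kind in expose if kind in TEMPLATES]


def find_collisions(bundles: dict) -> dict:
    """Map each tool name claimed by more than one bundle to those bundles."""
    owners = {}
    for bundle_name, schemas in bundles.items():
        for schema in schemas:
            for tool in tool_names(schema):
                claimed_by = owners.setdefault(tool, [])
                if bundle_name not in claimed_by:
                    claimed_by.append(bundle_name)
    return {tool: names for tool, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    tenant_bundles = {
        "catalog": [{"$id": "category"}, {"$id": "brand"}],
        "support": [{"$id": "category", "x-tool-expose": ["list", "list_ids"]}],
    }
    print(tool_names(tenant_bundles["support"][0]))
    print(find_collisions(tenant_bundles))
```

In this sketch only list_category collides, since the second bundle does not expose get_category. A check like this could run before apply to predict a tool_name_collision_in_tenant rejection.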

Hard limits on list_* parameters:

  • limit must be 1..500. Values < 1 or > 500 return HTTP 400 [INVALID_INPUT] limit must be between 1 and 500 — the engine does not silently clamp.
  • filters keys must reference a field marked x-index: true in the schema. Unknown keys return HTTP 400 with the allowed-list.
  • The agent SDK passes filter values as JSON; LLMs occasionally send the wrong shape (string vs object) or the display label instead of the code. See the Prompt engineering section below for mitigations.
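
The limits above can also be checked on the caller's side before a request is sent. The sketch below mirrors the documented rules (limit in 1..500, filter keys restricted to x-index fields); `InvalidInput` and `validate_list_args` are hypothetical names, and the exact error wording for unknown keys is an assumption.

```python
class InvalidInput(ValueError):
    """Stand-in for the HTTP 400 [INVALID_INPUT] response described above."""


def validate_list_args(schema: dict, filters=None, limit: int = 50) -> None:
    """Raise InvalidInput when list_* arguments break the documented limits."""
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 500:
        raise InvalidInput("limit must be between 1 and 500")
    if filters is None:
        return
    if not isinstance(filters, dict):
        raise InvalidInput("filters must be an object")
    allowed = sorted(
        name for name, spec in schema.get("properties", {}).items()
        if spec.get("x-index") is True
    )
    unknown = sorted(set(filters) - set(allowed))
    if unknown:
        # Assumed wording; the source only says the allowed list is returned.
        raise InvalidInput(f"unknown filter keys {unknown}; allowed: {allowed}")


# Trimmed schema: only tier and popularity are filterable, name is not.
SCHEMA = {
    "properties": {
        "tier": {"type": "string", "x-index": True},
        "popularity": {"type": "string", "x-index": True},
        "name": {"type": "string"},
    }
}

if __name__ == "__main__":
    validate_list_args(SCHEMA, {"tier": "primary"}, limit=500)  # passes silently
    try:
        validate_list_args(SCHEMA, {"name": "Footwear"})
    except InvalidInput as exc:
        print(exc)
```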

The x-ref annotation marks a property as referencing another entity type. The engine validates that every referenced ID exists at apply time. The admin UI renders refs as clickable navigation links.

{
  "$id": "brand",
  "type": "object",
  "x-id-field": "code",
  "properties": {
    "code": {"type": "string"},
    "category": {
      "type": "string",
      "x-ref": "category"
    },
    "parent_brand": {
      "type": "string",
      "x-ref": "brand",
      "x-ref-field": "code"
    }
  }
}

Cycles between entity types are allowed (A → B → A); they are detected and logged at apply time as a warning, not rejected.
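
A rough sketch of the two apply-time checks described above: missing references and cycles between entity types. It rests on assumptions not confirmed by the source: x-ref-field is read as the field on the target type that the value must match (defaulting to the target's x-id-field), and self-references such as brand → brand are not counted as cycles. `missing_refs` and `has_type_cycle` are hypothetical helpers, and the entity data is invented for illustration.

```python
def missing_refs(schemas: dict, entities: dict) -> list:
    """Return (entity_type, entity_id, property, value) for each dangling ref."""
    problems = []
    for etype, schema in schemas.items():
        id_field = schema["x-id-field"]
        for prop, spec in schema.get("properties", {}).items():
            target = spec.get("x-ref")
            if target is None:
                continue
            if target not in schemas:
                problems.append((etype, "*", prop, f"unknown type {target}"))
                continue
            # Assumption: x-ref-field names the matching field on the target type.
            target_field = spec.get("x-ref-field", schemas[target]["x-id-field"])
            known = {row[target_field] for row in entities.get(target, [])}
            for row in entities.get(etype, []):
                value = row.get(prop)
                if value is not None and value not in known:
                    problems.append((etype, row[id_field], prop, value))
    return problems


def has_type_cycle(schemas: dict) -> bool:
    """Detect a cycle between distinct entity types (self-references ignored)."""
    graph = {
        etype: {spec["x-ref"] for spec in schema.get("properties", {}).values()
                if "x-ref" in spec and spec["x-ref"] != etype}
        for etype, schema in schemas.items()
    }
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(node) -> bool:
        if state.get(node) == 1:
            return True
        if state.get(node) == 2:
            return False
        state[node] = 1
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        state[node] = 2
        return found

    return any(visit(node) for node in graph)


SCHEMAS = {
    "category": {"x-id-field": "code", "properties": {"code": {"type": "string"}}},
    "brand": {
        "x-id-field": "code",
        "properties": {
            "code": {"type": "string"},
            "category": {"type": "string", "x-ref": "category"},
            "parent_brand": {"type": "string", "x-ref": "brand", "x-ref-field": "code"},
        },
    },
}
# Invented records: "sandals" is deliberately not a known category.
ENTITIES = {
    "category": [{"code": "footwear"}],
    "brand": [
        {"code": "north-aurora", "category": "footwear"},
        {"code": "stride-co", "category": "sandals", "parent_brand": "north-aurora"},
    ],
}

if __name__ == "__main__":
    print(missing_refs(SCHEMAS, ENTITIES))
    print(has_type_cycle(SCHEMAS))
```

Following the rule stated above, a caller would treat a non-empty `missing_refs` result as a hard error and a `True` from `has_type_cycle` as a warning only.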

Self-hosted DB note. The knowledge_graphs capability type is enabled by migration 011_capabilities_kg_constraint.yaml (ships with engine 1.3.0). If you run Liquibase manually against an existing database from an older engine version, make sure that migration is applied before binding the capability — otherwise the INSERT into capabilities fails the DB CHECK constraint.

A bundle becomes visible to an agent through the knowledge_graphs capability:

agents:
  catalog-assistant:
    model: glm-5
    capabilities:
      - type: knowledge_graphs
        config:
          bundles: [ecommerce-catalog-example]
    system: |
      You are bound to the "ecommerce-catalog-example" knowledge graph.
      You have read-only tools: list_category, get_category, list_brand,
      get_brand, list_product_attribute, get_product_attribute.
      MANDATORY workflow on every user question:
      1. Identify which entity_type the question is about.
      2. Use list_/get_ tools — NEVER invent entity codes or attribute values.
      3. If a tool returns 0 results, say so explicitly. Suggest the closest
         existing entities by querying a related type.
      4. Prefer popularity=high categories first when not specified.
      5. Cite the entity code of every recommendation.
      Filter values must be ENTITY CODES (lowercase snake_case or kebab-case),
      not display names. Filters is an object, not a JSON-encoded string.

See the Prompt engineering section below — without this MANDATORY block your agent will ignore the KG tools and hallucinate.

Agents not bound to a bundle do not see its tools. Two agents in the same tenant can be bound to different bundles and see different tools.
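
The visibility rule can be expressed as a small lookup. This sketch assumes an agent config shaped like the YAML above (already parsed into a dict) and a precomputed map from bundle name to generated tool names; `visible_tools` and the sample maps are hypothetical.

```python
def visible_tools(agent_config: dict, tools_by_bundle: dict) -> list:
    """List the KG tools an agent sees, based on its knowledge_graphs bindings."""
    names = set()
    for capability in agent_config.get("capabilities", []):
        if capability.get("type") != "knowledge_graphs":
            continue
        for bundle in capability.get("config", {}).get("bundles", []):
            names.update(tools_by_bundle.get(bundle, []))
    return sorted(names)


# Hypothetical tool map for two bundles in the same tenant.
TOOLS_BY_BUNDLE = {
    "ecommerce-catalog-example": ["list_category", "get_category", "list_brand", "get_brand"],
    "known-issues": ["list_issue", "get_issue"],
}

CATALOG_AGENT = {
    "capabilities": [
        {"type": "knowledge_graphs", "config": {"bundles": ["ecommerce-catalog-example"]}}
    ]
}
UNBOUND_AGENT = {"capabilities": []}

if __name__ == "__main__":
    print(visible_tools(CATALOG_AGENT, TOOLS_BY_BUNDLE))
    print(visible_tools(UNBOUND_AGENT, TOOLS_BY_BUNDLE))
```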

Auto-generated KG tools are only useful if the agent decides to call them. Real testing on production-scale bundles shows that a weak system prompt — for example "You help users navigate the product catalog" — causes the LLM to ignore its KG tools entirely and answer from general knowledge. The result: confident hallucinations of entity codes that look right but do not exist.

The fix is an explicit, mandatory workflow in the agent’s system prompt.

Use the following template (adapt the entity-type names to your bundle):

You are bound to the "{bundle_name}" knowledge graph. You have access to these
read-only tools for navigating it deterministically:
  list_<entity_type>(filters?, limit?, offset?) → enumerates entities
  get_<entity_type>(id) → fetches one entity by id
MANDATORY workflow on every user question:
  1. Identify which entity_types the question is about. If unclear, call
     list_<entity_type> on the most general type first to discover the domain.
  2. Use the list_/get_ tools to retrieve the actual entities. NEVER invent
     entity ids, codes, or attribute values from general knowledge.
  3. If a tool returns 0 results, say so explicitly. Suggest the closest
     existing entities by querying a related type (do not fabricate).
  4. Prefer popularity=high entities first when the user has not specified.
  5. Cite the entity id of every record you recommend, e.g. "north-aurora",
     so the user can verify.
Filter argument format:
  - Filter values must be ENTITY CODES (lowercase snake_case or kebab-case
    identifiers, e.g. category="footwear"), NOT display names ("Footwear").
  - `filters` is an object, not a JSON-encoded string. Example call:
    list_brand(filters={"category": "footwear", "tier": "premium"}, limit=50).

Weak prompt | Strong prompt (template above)
Agent gives generic catalog advice from training data. | Agent calls list_category first, then drills with list_brand(filters={category: ...}).
Hallucinated entity codes (e.g. "brand-xyz" that does not exist). | Cited entity codes resolve via get_* to real records.
User asks about a niche the bundle does not cover → agent invents one. | Agent says “no premium footwear brands in the catalog yet, closest is mid-tier stride-co” + cites real codes.
Filter args with display labels: 0 results, agent spirals through variants. | Filter args use canonical codes: first call returns correct items.
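
Alongside the prompt, a thin guard in the calling layer can catch the two filter mistakes named above. This is a hypothetical client-side sketch, not an engine or SDK feature: `normalize_filters` unwraps a JSON-encoded string into an object, and `suspicious_values` flags string values that do not look like lowercase codes (the pattern is an assumption modeled on the code format used in this page).

```python
import json
import re

# Assumed code shape: lowercase start, then lowercase letters, digits, "_" or "-".
CODE_PATTERN = re.compile(r"^[a-z][a-z0-9_-]*$")


def normalize_filters(filters) -> dict:
    """Accept an object or a JSON-encoded object string; always return a dict."""
    if filters is None:
        return {}
    if isinstance(filters, str):
        try:
            filters = json.loads(filters)
        except json.JSONDecodeError as exc:
            raise ValueError("filters is not valid JSON") from exc
    if not isinstance(filters, dict):
        raise ValueError("filters must be an object")
    return filters


def suspicious_values(filters: dict) -> dict:
    """Return the string filter values that look like display labels, not codes."""
    return {
        key: value for key, value in filters.items()
        if isinstance(value, str) and not CODE_PATTERN.match(value)
    }


if __name__ == "__main__":
    fixed = normalize_filters('{"category": "Footwear", "tier": "premium"}')
    print(fixed)
    print(suspicious_values(fixed))
```

Flagged values are best returned to the agent as a tool error naming the offending key, so it can retry with a code instead of spiraling through label variants.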

This guidance is the single most important piece of production knowledge for shipping a KG-bound agent: without it, partners will see hallucinated answers even on a correctly applied bundle.

Knowledge Graphs and the vector Knowledge / RAG primitive are complementary, not competing. The right pattern is to use both together:

Use case | Primitive
“What categories exist?” (full recall) | Knowledge Graphs (list_category)
“Show only premium-tier brands” (filtered) | Knowledge Graphs (list_brand with filters={tier: "premium"})
“How do I care for wool?” (narrative search) | Knowledge / RAG (knowledge_search)
“What does the brand north-aurora carry?” (deterministic ID lookup) | Knowledge Graphs (get_brand("north-aurora"))

Both capabilities can be enabled on the same agent.

For domains with large transactional data (real-time stock, customer orders, live pricing), put structure in a Knowledge Graph and inventory behind an external MCP server. See the Hybrid pattern guide for full examples.

  • Read-only graphs — agent tools only read. Mutations happen through customer-side API/UI.
  • No schema migrations — breaking schema changes require re-import of conforming entities.
  • No cross-bundle refs — x-ref resolves within the same bundle only.
  • Hot tool reload not supported — new tools become visible to new chat sessions, not in-flight ones.
  • Single-source-of-truth for bindings — agent bindings live in the existing capabilities.config, not in a separate binding table.