Knowledge Graphs

Knowledge Graphs let you describe your domain — categories, types, attributes, relationships — and have the engine auto-generate MCP tools that agents use to navigate it deterministically. No more hallucinated IDs. No more “approved record was missed”. Full recall on structured queries.

This is a declarative-first primitive: you describe your domain in JSON Schema, push it as a bundle, and the engine generates list_X and get_X tools (plus list_X_ids when opted in) per entity type for any agent bound to that bundle.

When to use Knowledge Graphs

Knowledge Graphs work for structured, slow-changing domain models — typically 10 to 2,000 entities per type, ~10,000 total. They are the right primitive for:

Taxonomies and ontologies — categories → brands → attributes; conditions → symptoms → treatments; jurisdictions → statutes → topics
Catalogs of typed records — product categories with attributes and brands; legal statutes by jurisdiction; controlled medical terminology
Known-issue libraries — products → modules → known issues → resolutions
Cross-referenced reference data — codes, registries, controlled vocabularies

They are not for:

Inventory or transactional data (20,000 SKUs with real-time stock) → use an external MCP server pointing at your existing system
Long-form documents and narrative content → use Knowledge / RAG (vector search) instead
Conversation memory (what the user told the agent) → use Memory instead

How it works

1. Define entity schemas in JSON Schema (Draft 2020-12) with SyntheticBrew x-* annotations
2. Bulk-import entity instances matching the schemas into a named bundle
3. Bind the bundle to one or more agents via the knowledge_graphs capability
4. The engine auto-generates MCP tools per entity type and injects them into bound agents

my-bundle/
├── manifest.yaml
├── schemas/
│   ├── category.schema.json
│   └── brand.schema.json
└── entities/
    ├── categories.yaml          # array of category entities
    └── brands.yaml              # array of brand entities

After `brewctl kg apply ./my-bundle`:

  Agent calls list_category(filters={popularity: "high"})
    → engine: SELECT data FROM kg_entity
              WHERE tenant_id = ? AND bundle_name = 'my-bundle'
                AND entity_type = 'category'
                AND data @> '{"popularity": "high"}'
    → returns full entity records, total count, with deterministic recall.

  Agent calls get_brand("north-aurora")
    → engine: SELECT data FROM kg_entity
              WHERE tenant_id = ? ... AND entity_id = 'north-aurora'
    → returns the entity, or null if not found. No hallucinated payload.
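
The lookup semantics above can be modelled in a few lines. The following is a minimal in-memory sketch, not the engine's implementation: the `ROWS` list stands in for the `kg_entity` table, and the function names `list_entities` and `get_entity` are hypothetical. Filtering is plain exact-match on top-level fields, which is what `data @> filters` amounts to for flat filter objects.

```python
# Hypothetical in-memory stand-in for the kg_entity table. Each row mirrors
# the columns named in the queries above; the sample data is illustrative.
ROWS = [
    {"tenant_id": "t1", "bundle_name": "my-bundle", "entity_type": "category",
     "entity_id": "footwear", "data": {"code": "footwear", "popularity": "high"}},
    {"tenant_id": "t1", "bundle_name": "my-bundle", "entity_type": "category",
     "entity_id": "hats", "data": {"code": "hats", "popularity": "low"}},
    {"tenant_id": "t1", "bundle_name": "my-bundle", "entity_type": "brand",
     "entity_id": "north-aurora",
     "data": {"code": "north-aurora", "category": "footwear"}},
]


def list_entities(tenant_id, bundle, entity_type, filters=None, limit=50, offset=0):
    """Return every matching entity plus the total count (no ranking, no sampling)."""
    filters = filters or {}
    matches = [
        row["data"]
        for row in ROWS
        if row["tenant_id"] == tenant_id
        and row["bundle_name"] == bundle
        and row["entity_type"] == entity_type
        # Exact match on each filter key; the key must be present in the entity.
        and all(k in row["data"] and row["data"][k] == v for k, v in filters.items())
    ]
    return {"items": matches[offset:offset + limit], "total": len(matches)}


def get_entity(tenant_id, bundle, entity_type, entity_id):
    """Return the entity payload, or None when the id does not exist."""
    for row in ROWS:
        key = (row["tenant_id"], row["bundle_name"], row["entity_type"], row["entity_id"])
        if key == (tenant_id, bundle, entity_type, entity_id):
            return row["data"]
    return None
```

Because every query is scoped by tenant, bundle, and entity type before any filter is applied, a miss is always an explicit empty result or `None`, never a guessed payload.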

Entity schemas

An entity schema is a standard JSON Schema document with SyntheticBrew-specific x-* extension annotations. These annotations tell the engine which field is the primary ID, which fields are filterable, and which fields reference other entity types.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "category",
  "title": "Category",
  "description": "A product category in the catalog.",
  "type": "object",
  "x-id-field": "code",
  "x-tool-expose": ["list", "get"],
  "x-tool-description": "Catalog categories. Use list_category to enumerate top-level categories and get_category to fetch one by code.",
  "required": ["code", "name", "tier"],
  "additionalProperties": false,
  "properties": {
    "code": {
      "type": "string",
      "pattern": "^[a-z][a-z0-9_-]{1,30}$",
      "description": "Lowercase short code.",
      "x-index": true
    },
    "name": {
      "type": "string",
      "minLength": 3,
      "maxLength": 60
    },
    "tier": {
      "type": "string",
      "enum": ["primary", "secondary"],
      "x-index": true
    },
    "popularity": {
      "type": "string",
      "enum": ["high", "medium", "low"],
      "x-index": true
    },
    "brand_count": {
      "type": "integer",
      "minimum": 0,
      "x-derived": true
    }
  }
}

See the Schema annotations reference for the full list of x-* annotations.
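
To make the role of the annotations concrete, here is a small sketch that reads them out of a schema. It assumes the entity type is taken from `$id` (the document does not state this explicitly), and the helper name `summarize_annotations` is hypothetical. The schema is a trimmed copy of the category example above.

```python
# Trimmed copy of the category schema shown above, as a Python dict.
CATEGORY_SCHEMA = {
    "$id": "category",
    "x-id-field": "code",
    "x-tool-expose": ["list", "get"],
    "required": ["code", "name", "tier"],
    "properties": {
        "code": {"type": "string", "x-index": True},
        "name": {"type": "string"},
        "tier": {"type": "string", "enum": ["primary", "secondary"], "x-index": True},
        "popularity": {"type": "string", "enum": ["high", "medium", "low"], "x-index": True},
        "brand_count": {"type": "integer", "x-derived": True},
    },
}


def summarize_annotations(schema):
    """Collect the x-* annotations that shape the generated tools."""
    props = schema.get("properties", {})
    return {
        "entity_type": schema["$id"],        # assumption: type name comes from $id
        "id_field": schema["x-id-field"],    # primary ID used by get_<entity_type>
        # Fields flagged x-index are the only valid filter keys for list_*.
        "filterable": [n for n, spec in props.items() if spec.get("x-index") is True],
        # Fields flagged x-derived are listed separately, without interpretation.
        "derived": [n for n, spec in props.items() if spec.get("x-derived") is True],
    }


summary = summarize_annotations(CATEGORY_SCHEMA)
```

Here `summary["filterable"]` contains `code`, `tier`, and `popularity`, which are exactly the keys an agent may pass in `filters` for `list_category`.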

Auto-generated MCP tools

For each entity schema you declare, the engine generates up to three MCP tools:

Tool	Generated when	Parameters	Returns
`list_<entity_type>`	`"list"` ∈ `x-tool-expose` (default)	`filters` (one per `x-index` field), `limit` (default 50, max 500), `offset`	`{items: [...], total: int}`
`get_<entity_type>`	`"get"` ∈ `x-tool-expose` (default)	`id` (matches `x-id-field`)	full entity or `null`
`list_<entity_type>_ids`	`"list_ids"` ∈ `x-tool-expose` (opt-in)	same as `list_*`	`{ids: [...], total: int}` (lighter payload)

Tools are namespaced per tenant — list_category in tenant A’s bundle is invisible to tenant B. Within a tenant, two bundles cannot expose conflicting tool names; the second apply is rejected with tool_name_collision_in_tenant.
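
The naming and collision rules can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: it assumes the `<entity_type>` suffix comes from the schema's `$id`, that an absent `x-tool-expose` means `["list", "get"]` (per the table's "default" column), and the helper names `tool_names` and `find_collisions` are hypothetical.

```python
# Per the table above: list and get are exposed by default, list_ids is opt-in.
DEFAULT_EXPOSE = ["list", "get"]

NAME_TEMPLATES = {
    "list": "list_{}",
    "get": "get_{}",
    "list_ids": "list_{}_ids",
}


def tool_names(schema):
    """Tool names generated for one entity schema."""
    entity_type = schema["$id"]  # assumption: suffix is taken from $id
    expose = schema.get("x-tool-expose", DEFAULT_EXPOSE)
    return [NAME_TEMPLATES[op].format(entity_type) for op in expose]


def find_collisions(other_bundles, new_schemas):
    """Names the new bundle would expose that another bundle in the tenant already owns.

    other_bundles maps bundle name -> list of schemas and should exclude the
    bundle being applied, so that re-applying a bundle does not collide with itself.
    """
    taken = {
        name
        for schemas in other_bundles.values()
        for schema in schemas
        for name in tool_names(schema)
    }
    wanted = {name for schema in new_schemas for name in tool_names(schema)}
    return sorted(wanted & taken)
```

A non-empty result corresponds to the case where the second apply is rejected with tool_name_collision_in_tenant.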

Hard limits on list_* parameters:

limit must be between 1 and 500. Values below 1 or above 500 return HTTP 400 with the message `[INVALID_INPUT] limit must be between 1 and 500`; the engine does not silently clamp.
filters keys must reference a field marked x-index: true in the schema. Unknown keys return HTTP 400 together with the list of allowed keys.
The agent SDK passes filter values as JSON; LLMs occasionally send the wrong shape (string vs object) or the display label instead of the code. See the Prompt engineering section below for mitigations.
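
The checks above can be expressed as a small validator. This is a sketch of the documented rules only: the exception class `InvalidInput` stands in for the HTTP 400 response, the function name `validate_list_args` is hypothetical, and the error text for a non-object `filters` value is an assumption, since the document does not say how the engine words that case.

```python
MAX_LIMIT = 500


class InvalidInput(ValueError):
    """Stands in for an HTTP 400 [INVALID_INPUT] response."""


def validate_list_args(schema, filters=None, limit=50, offset=0):
    """Reject out-of-range limits and unknown filter keys instead of clamping."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        raise InvalidInput("limit must be between 1 and 500")

    if filters is None:
        filters = {}
    if not isinstance(filters, dict):
        # Assumed wording: covers the JSON-encoded-string mistake LLMs make.
        raise InvalidInput("filters must be an object, not " + type(filters).__name__)

    props = schema.get("properties", {})
    allowed = sorted(n for n, spec in props.items() if spec.get("x-index") is True)
    unknown = sorted(set(filters) - set(allowed))
    if unknown:
        raise InvalidInput(f"unknown filter keys {unknown}; allowed: {allowed}")

    return {"filters": filters, "limit": limit, "offset": offset}
```

Returning the allowed keys in the error gives the agent enough information to correct its next call without guessing.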

The x-ref annotation marks a property as referencing another entity type. The engine validates that every referenced ID exists at apply time. The admin UI renders refs as clickable navigation links.

{
  "$id": "brand",
  "type": "object",
  "x-id-field": "code",
  "properties": {
    "code": {"type": "string"},
    "category": {
      "type": "string",
      "x-ref": "category"
    },
    "parent_brand": {
      "type": "string",
      "x-ref": "brand",
      "x-ref-field": "code"
    }
  }
}

Cycles between entity types are allowed (A → B → A); they are detected and logged at apply time as a warning, not rejected.
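
A rough sketch of both apply-time checks follows. It is illustrative only: the function names `check_refs` and `type_cycles` are hypothetical, references are resolved against the target type's `x-id-field` (the `x-ref-field` annotation is ignored here), and a type that references itself, such as `parent_brand`, is not counted as a cycle, which is an assumption about how "cycles between entity types" is meant.

```python
def check_refs(schemas, entities):
    """Return a description of every x-ref value that points at a missing ID.

    schemas:  entity_type -> schema dict
    entities: entity_type -> list of entity dicts
    """
    known_ids = {
        etype: {item[schemas[etype]["x-id-field"]] for item in items}
        for etype, items in entities.items()
    }
    errors = []
    for etype, items in entities.items():
        for prop, spec in schemas[etype].get("properties", {}).items():
            target = spec.get("x-ref")
            if target is None:
                continue
            for item in items:
                value = item.get(prop)
                # Absent optional references are fine; dangling ones are not.
                if value is not None and value not in known_ids.get(target, set()):
                    errors.append(f"{etype}.{prop}: '{value}' not found in {target}")
    return errors


def type_cycles(schemas):
    """Entity types that take part in a reference cycle (warn, do not reject)."""
    edges = {
        etype: {
            spec["x-ref"]
            for spec in schema.get("properties", {}).values()
            if "x-ref" in spec and spec["x-ref"] != etype  # skip self-references
        }
        for etype, schema in schemas.items()
    }

    def reaches(start, goal, seen):
        for nxt in edges.get(start, ()):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, goal, seen):
                    return True
        return False

    return sorted(t for t in edges if reaches(t, t, set()))
```

In this sketch a non-empty `check_refs` result would fail the apply, while a non-empty `type_cycles` result would only be logged.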

Binding to agents

Self-hosted DB note. The knowledge_graphs capability type is enabled by migration 011_capabilities_kg_constraint.yaml (ships with engine 1.3.0). If you run Liquibase manually against an existing database from an older engine version, make sure that migration is applied before binding the capability — otherwise the INSERT into capabilities fails the DB CHECK constraint.

A bundle becomes visible to an agent through the knowledge_graphs capability:

agents:
  catalog-assistant:
    model: glm-5
    capabilities:
      - type: knowledge_graphs
        config:
          bundles: [ecommerce-catalog-example]
    system: |
      You are bound to the "ecommerce-catalog-example" knowledge graph.
      You have read-only tools: list_category, get_category, list_brand,
      get_brand, list_product_attribute, get_product_attribute.

      MANDATORY workflow on every user question:
        1. Identify which entity_type the question is about.
        2. Use list_/get_ tools — NEVER invent entity codes or attribute values.
        3. If a tool returns 0 results, say so explicitly. Suggest the closest
           existing entities by querying a related type.
        4. Prefer popularity=high categories first when not specified.
        5. Cite the entity code of every recommendation.

      Filter values must be ENTITY CODES (lowercase snake_case or kebab-case),
      not display names. Filters is an object, not a JSON-encoded string.

See the Prompt engineering section below — without this MANDATORY block your agent will ignore the KG tools and hallucinate.

Agents not bound to a bundle do not see its tools. Two agents in the same tenant can be bound to different bundles and see different tools.
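
The visibility rule can be illustrated with a short sketch. The function name `visible_tools` and the `bundle_tools` mapping are hypothetical; the agent dictionary mirrors the shape of one entry under `agents:` in the configuration above.

```python
def visible_tools(agent_config, bundle_tools):
    """Tools an agent sees: only those of bundles named in its knowledge_graphs capability.

    agent_config: dict shaped like one entry under `agents:`
    bundle_tools: bundle name -> list of generated tool names
    """
    tools = []
    for capability in agent_config.get("capabilities", []):
        if capability.get("type") != "knowledge_graphs":
            continue
        for bundle in capability.get("config", {}).get("bundles", []):
            tools.extend(bundle_tools.get(bundle, []))
    return tools


catalog_assistant = {
    "model": "glm-5",
    "capabilities": [
        {"type": "knowledge_graphs", "config": {"bundles": ["ecommerce-catalog-example"]}},
    ],
}
```

An agent with no knowledge_graphs capability gets an empty list, matching the statement that unbound agents do not see a bundle's tools.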

Prompt engineering for KG-grounded agents

Auto-generated KG tools are only useful if the agent decides to call them. Real testing on production-scale bundles shows that a weak system prompt — for example "You help users navigate the product catalog" — causes the LLM to ignore its KG tools entirely and answer from general knowledge. The result: confident hallucinations of entity codes that look right but do not exist.

The fix is an explicit, mandatory workflow in the agent’s system prompt.

Required workflow

Use the following template (adapt the entity-type names to your bundle):

You are bound to the "{bundle_name}" knowledge graph. You have access to these
read-only tools for navigating it deterministically:

  list_<entity_type>(filters?, limit?, offset?) → enumerates entities
  get_<entity_type>(id)                         → fetches one entity by id

MANDATORY workflow on every user question:

  1. Identify which entity_types the question is about. If unclear, call
     list_<entity_type> on the most general type first to discover the domain.
  2. Use the list_/get_ tools to retrieve the actual entities. NEVER invent
     entity ids, codes, or attribute values from general knowledge.
  3. If a tool returns 0 results, say so explicitly. Suggest the closest
     existing entities by querying a related type (do not fabricate).
  4. Prefer popularity=high entities first when the user has not specified.
  5. Cite the entity id of every record you recommend, e.g. "north-aurora",
     so the user can verify.

Filter argument format:
  - Filter values must be ENTITY CODES (lowercase snake_case or kebab-case
    identifiers, e.g. category="footwear"), NOT display names ("Footwear").
  - `filters` is an object, not a JSON-encoded string. Example call:
    list_brand(filters={"category": "footwear", "tier": "premium"}, limit=50).
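
When adapting the template, it can help to generate the tool list from the bundle's entity types rather than typing it by hand, so the prompt never names a tool that does not exist. The sketch below is one way to do that; `render_tool_section` is a hypothetical helper, not part of any SDK, and it covers only the opening of the template.

```python
PROMPT_HEADER = (
    'You are bound to the "{bundle_name}" knowledge graph. You have access to these\n'
    "read-only tools for navigating it deterministically:\n\n"
)


def render_tool_section(bundle_name, entity_types):
    """Fill in the opening of the template with concrete tool names."""
    lines = []
    for entity_type in entity_types:
        lines.append(f"  list_{entity_type}(filters?, limit?, offset?)")
        lines.append(f"  get_{entity_type}(id)")
    return PROMPT_HEADER.format(bundle_name=bundle_name) + "\n".join(lines) + "\n"


section = render_tool_section("ecommerce-catalog-example", ["category", "brand"])
```

The MANDATORY workflow and filter-format paragraphs from the template can then be appended unchanged.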

Why this matters

Weak prompt	Strong prompt (template above)
Agent gives generic catalog advice from training data.	Agent calls `list_category` first, then drills with `list_brand(filters={category: ...})`.
Hallucinated entity codes (e.g. `"brand-xyz"` that does not exist).	Cited entity codes resolve via `get_*` to real records.
User asks about a niche the bundle does not cover → agent invents one.	Agent says “no premium footwear brands in the catalog yet, closest is mid-tier `stride-co`” and cites real codes.
Filter args with display labels — 0 results, agent spirals through variants.	Filter args use canonical codes — first call returns correct items.

This guidance is the single most important piece of production knowledge for shipping a KG-bound agent: without it, partners will see hallucinated answers even on a correctly applied bundle.

Coexistence with vector knowledge

Knowledge Graphs and the vector Knowledge / RAG primitive are complementary, not competing. The right pattern is to use both together:

Use case	Primitive
“What categories exist?” (full recall)	Knowledge Graphs (`list_category`)
“Show only premium-tier brands” (filtered)	Knowledge Graphs (`list_brand` with `filters={tier: "premium"}`)
“How do I care for wool?” (narrative search)	Knowledge / RAG (`knowledge_search`)
“What does the brand `north-aurora` carry?” (deterministic ID lookup)	Knowledge Graphs (`get_brand("north-aurora")`)

Both capabilities can be enabled on the same agent.

Hybrid pattern: KG + external MCP

For domains with large transactional data (real-time stock, customer orders, live pricing), put structure in a Knowledge Graph and inventory behind an external MCP server. See the Hybrid pattern guide for full examples.

Limitations (MVP)

Read-only graphs — agent tools only read. Mutations happen through customer-side API/UI.
No schema migrations — breaking schema changes require re-import of conforming entities.
No cross-bundle refs — x-ref resolves within the same bundle only.
Hot tool reload not supported — new tools become visible to new chat sessions, not in-flight ones.
Single-source-of-truth for bindings — agent bindings live in the existing capabilities.config, not in a separate binding table.

Next steps

Quickstart tutorial — get a working graph in 15 minutes
Schema annotations reference — every x-* annotation explained
Hybrid pattern guide — combine with external MCP for inventory
Admin UI guide — manage bundles through the dashboard