Knowledge Graphs — Schema Annotations Reference

This page is the canonical reference for the SyntheticBrew-specific x-* annotations on entity schemas. Each annotation extends standard JSON Schema (Draft 2020-12); unknown x-* keywords are ignored by JSON Schema validators, so the schema remains portable.

Top-level annotations (on the schema root)

`x-id-field` (string, required)

Identifies which property of the entity is its unique ID within the (tenant, bundle, entity_type) scope. The referenced property must exist on the same schema.

{
  "$id": "category",
  "x-id-field": "code",
  "properties": {
    "code": {"type": "string"},
    "name": {"type": "string"}
  }
}

Without x-id-field, the schema is rejected at apply time with error code schema_invalid.
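The rule above can be sketched as a small pre-flight check that a bundle author might run before applying. This is an illustrative sketch only: `check_id_field` is a hypothetical helper and not part of the engine, and the messages it returns are made up.

```python
def check_id_field(schema: dict) -> list[str]:
    """Return a list of problems with the schema's x-id-field (empty if fine)."""
    id_field = schema.get("x-id-field")
    if not isinstance(id_field, str) or not id_field:
        return ["x-id-field is required and must be a non-empty string"]
    # The referenced property must exist on the same schema.
    if id_field not in schema.get("properties", {}):
        return [f"x-id-field {id_field!r} is not a property of this schema"]
    return []


category = {
    "$id": "category",
    "x-id-field": "code",
    "properties": {"code": {"type": "string"}, "name": {"type": "string"}},
}
print(check_id_field(category))                      # []
print(check_id_field({"properties": {"code": {}}}))  # one problem: x-id-field missing
```

A schema that fails either check is the kind the engine would reject with schema_invalid at apply time.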

`x-tool-expose` (array of strings, optional, default `["list", "get"]`)

Controls which auto-generated MCP tools the engine creates for this entity type. Valid entries:

Value	Generates
`"list"`	`list_<entity_type>(filters, limit, offset)`
`"get"`	`get_<entity_type>(id)`
`"list_ids"`	`list_<entity_type>_ids(filters, limit, offset)` — lighter payload

Use "list_ids" when entity payloads are large but you still want full enumeration:

{
  "x-id-field": "id",
  "x-tool-expose": ["list", "get", "list_ids"]
}

Runtime behaviour of generated list_* tools:

limit argument must be in the range 1..500. Values outside that range return HTTP 400 [INVALID_INPUT] limit must be between 1 and 500; the engine rejects them rather than silently clamping.
filters argument keys are restricted to properties marked x-index: true. Unknown keys return HTTP 400 with the allowed-list in the response body.
filters argument value must match the canonical entity code (the value the bundle author wrote), not a display label. LLMs frequently confuse the two — see Prompt engineering for KG-grounded agents.
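A client-side guard that mirrors the first two rules might look like the sketch below. `check_list_args` is a hypothetical helper, `allowed_keys` stands for the names of the properties marked x-index: true, and the wording of the unknown-key message is invented for illustration.

```python
def check_list_args(limit, filters, allowed_keys):
    """Return (status, message) for a list_* call; 200 means the arguments pass."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 500:
        return 400, "[INVALID_INPUT] limit must be between 1 and 500"
    unknown = sorted(set(filters) - set(allowed_keys))
    if unknown:
        # Assumption: the real response body lists the allowed keys in some form.
        return 400, f"unknown filter keys {unknown}; allowed: {sorted(allowed_keys)}"
    return 200, "ok"


print(check_list_args(500, {"popularity": "high"}, {"popularity"}))  # (200, 'ok')
print(check_list_args(501, {}, {"popularity"})[0])                   # 400
print(check_list_args(10, {"name": "Tea"}, {"popularity"})[0])       # 400
```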

`x-tool-description` (string, optional)

Overrides the auto-derived tool description that agents see. By default, the engine uses the schema’s top-level description field. Use this annotation when the tool needs a richer description than the schema-level documentation.

{
  "$id": "brand",
  "description": "A brand carried in the catalog.",
  "x-tool-description": "Brands in the catalog. Use list_brand to enumerate (filter by category or tier); use get_brand to fetch one by its code."
}

Property-level annotations

`x-index` (boolean, optional, default `false`)

Marks a property as filterable. Filterable properties become parameters of the list_* and list_*_ids tools. The engine uses a generic JSONB GIN index that covers @> containment queries on the entire data column, so marking many fields as indexed has no additional storage cost.

{
  "x-id-field": "code",
  "properties": {
    "code": {"type": "string"},
    "popularity": {
      "type": "string",
      "enum": ["high", "medium", "low"],
      "x-index": true
    }
  }
}

Resulting tool signature:

list_category(
  filters?: {popularity?: "high" | "medium" | "low"},  // ← built from x-index + enum
  limit?: number,
  offset?: number
)

Only x-index fields appear as filter parameters. Other fields are returned in entity payloads but cannot be filtered on.

`x-ref` (string, optional)

Marks a property as a reference to another entity type. The value is the target entity type name. Cross-refs are validated at apply time — every referenced entity must exist within the same bundle.

{
  "$id": "brand",
  "properties": {
    "category": {
      "type": "string",
      "x-ref": "category"
    }
  }
}

When the customer applies a bundle, the engine checks that every value of the category property matches an existing category entity’s ID. Bundles with broken refs are rejected with error code invalid_ref.

`x-ref-field` (string, optional)

When set, specifies which field of the target entity to match against. Default: the target’s x-id-field. Use this when you want to ref by a secondary identifier (e.g. slug instead of code):

{
  "properties": {
    "category_slug": {
      "type": "string",
      "x-ref": "category",
      "x-ref-field": "slug"
    }
  }
}

The engine validates against category.slug values rather than category.code.
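To make the matching rule for x-ref and x-ref-field concrete, here is a rough sketch of a cross-reference check over an in-memory bundle. The data layout (`schemas` and `entities` dictionaries) and the function name are assumptions made for this example; the engine's own implementation is not documented here.

```python
def find_broken_refs(schemas, entities):
    """schemas: {entity_type: schema}; entities: {entity_type: [entity dict, ...]}."""
    broken = []
    for etype, schema in schemas.items():
        for prop, spec in schema.get("properties", {}).items():
            target = spec.get("x-ref")
            if target is None:
                continue
            if target not in schemas:
                broken.append((etype, prop, f"unknown entity type {target!r}"))
                continue
            # Match against x-ref-field when set, otherwise the target's x-id-field.
            field = spec.get("x-ref-field") or schemas[target]["x-id-field"]
            known = {e.get(field) for e in entities.get(target, [])}
            for entity in entities.get(etype, []):
                value = entity.get(prop)
                values = value if isinstance(value, list) else [value]
                for v in values:
                    if v is not None and v not in known:
                        broken.append((etype, prop, v))
    return broken


schemas = {
    "category": {"x-id-field": "code",
                 "properties": {"code": {"type": "string"}, "slug": {"type": "string"}}},
    "brand": {"x-id-field": "code",
              "properties": {
                  "code": {"type": "string"},
                  "category": {"type": "string", "x-ref": "category"},
                  "category_slug": {"type": "string", "x-ref": "category",
                                    "x-ref-field": "slug"}}},
}
entities = {
    "category": [{"code": "BEV", "slug": "beverages"}],
    "brand": [
        {"code": "acme", "category": "BEV", "category_slug": "beverages"},
        {"code": "zest", "category": "SNK", "category_slug": "beverages"},
    ],
}
print(find_broken_refs(schemas, entities))  # [('brand', 'category', 'SNK')]
```

Any non-empty result corresponds to a bundle that would be rejected with invalid_ref.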

`x-derived` (boolean, optional, default `false`)

Marks a property as computed by the engine rather than authored by the customer. Derived fields are excluded from filter parameters on list_* tools (the customer cannot filter on something they did not author):

{
  "properties": {
    "brand_count": {
      "type": "integer",
      "x-derived": true,
      "x-index": true
    }
  }
}

In this example, brand_count is marked x-index: true but is still not offered as a filter parameter, because it is derived. Like every other property, it is returned in entity payloads.
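The combined effect of x-index and x-derived on the generated filter parameters can be sketched as follows. `filter_parameters` is a hypothetical helper written for illustration; it maps each filterable property to its enum values, or to None when the property has no enum.

```python
def filter_parameters(schema):
    """Map each filterable property to its allowed enum values (None if unconstrained)."""
    params = {}
    for name, spec in schema.get("properties", {}).items():
        # Filterable means: marked x-index: true and not marked x-derived: true.
        if spec.get("x-index") is True and not spec.get("x-derived", False):
            params[name] = spec.get("enum")
    return params


category = {
    "x-id-field": "code",
    "properties": {
        "code": {"type": "string"},
        "popularity": {"type": "string", "enum": ["high", "medium", "low"], "x-index": True},
        "brand_count": {"type": "integer", "x-derived": True, "x-index": True},
    },
}
print(filter_parameters(category))  # {'popularity': ['high', 'medium', 'low']}
```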

`x-content-type` (string, optional)

UI rendering hint for the admin dashboard. The engine ignores this annotation; the SPA reads it to decide whether to render a field as plain text, code, markdown, or a clickable URL.

Recommended values:

Value	Rendering
`"markdown"`	Render as formatted markdown in the inspect drawer
`"code"`	Monospace font, syntax-highlighted
`"url"`	Clickable hyperlink

{
  "properties": {
    "description": {
      "type": "string",
      "x-content-type": "markdown"
    },
    "homepage": {
      "type": "string",
      "format": "uri",
      "x-content-type": "url"
    }
  }
}

Validation rules

When the engine validates a customer-supplied entity schema, it enforces:

x-id-field is required and must reference an existing property.
x-tool-expose values must be one of "list", "get", "list_ids". Unknown values reject the schema.
x-ref values must reference an entity type that exists in the same bundle or is being applied in the same import.
Properties carrying x-ref must have type "string" or "array" of strings. On any other type, the x-ref annotation is ignored with a warning.
Tool name collisions within a tenant are rejected. If two bundles in the same tenant generate list_category, the second apply fails with tool_name_collision_in_tenant.
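The x-tool-expose and collision rules can be illustrated with a short sketch that derives tool names and looks for duplicates across the bundles of one tenant. The function names and the `bundles` layout are assumptions made for this example.

```python
TOOL_TEMPLATES = {"list": "list_{t}", "get": "get_{t}", "list_ids": "list_{t}_ids"}
DEFAULT_EXPOSE = ["list", "get"]


def tool_names(entity_type, schema):
    """Return the tool names generated for one entity type."""
    expose = schema.get("x-tool-expose", DEFAULT_EXPOSE)
    unknown = [v for v in expose if v not in TOOL_TEMPLATES]
    if unknown:
        raise ValueError(f"unknown x-tool-expose values: {unknown}")
    return [TOOL_TEMPLATES[v].format(t=entity_type) for v in expose]


def find_collisions(bundles):
    """bundles: {bundle_name: {entity_type: schema}} for a single tenant."""
    owner = {}       # tool name -> first bundle that generated it
    collisions = []  # (tool name, first bundle, colliding bundle)
    for bundle, types in bundles.items():
        for etype, schema in types.items():
            for name in tool_names(etype, schema):
                if name in owner and owner[name] != bundle:
                    collisions.append((name, owner[name], bundle))
                else:
                    owner.setdefault(name, bundle)
    return collisions


bundles = {
    "ecommerce-catalog": {"category": {"x-id-field": "code"}},
    "support-modules": {"category": {"x-id-field": "code", "x-tool-expose": ["list"]}},
}
print(tool_names("category", {"x-id-field": "code"}))  # ['list_category', 'get_category']
print(find_collisions(bundles))
# [('list_category', 'ecommerce-catalog', 'support-modules')]
```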

Entity type naming

The entity type name (which becomes the suffix of auto-generated tools — list_<entity_type>) must match the pattern ^[a-z][a-z0-9_]{0,62}[a-z0-9]$:

Lower-case letters, digits, single underscores
Must start with a letter; cannot end with an underscore
Length 2-64

Valid: category, brand, product_attribute, legal_topic.

Invalid: Category (uppercase), category-name (hyphen), _x (leading underscore), x (too short).

Bundle naming

Bundle names must match ^[a-z][a-z0-9-]{0,62}[a-z0-9]$:

Lower-case letters, digits, single hyphens
Must start with a letter; cannot end with a hyphen
Length 2-64

Valid: ecommerce-catalog, support-modules, v2-products.

Invalid: MyBundle (uppercase), bundle_name (underscore), 1bundle (leading digit), ../etc (path traversal).
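Both naming patterns can be checked locally before an apply. The sketch below uses the two regular expressions exactly as documented; the helper names are hypothetical.

```python
import re

ENTITY_TYPE_RE = re.compile(r"^[a-z][a-z0-9_]{0,62}[a-z0-9]$")
BUNDLE_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{0,62}[a-z0-9]$")


def is_valid_entity_type(name: str) -> bool:
    # fullmatch avoids accepting a trailing newline, which "$" alone would tolerate.
    return ENTITY_TYPE_RE.fullmatch(name) is not None


def is_valid_bundle_name(name: str) -> bool:
    return BUNDLE_NAME_RE.fullmatch(name) is not None


for name in ["category", "product_attribute", "Category", "category-name", "_x", "x"]:
    print(f"entity type {name!r}: {is_valid_entity_type(name)}")

for name in ["ecommerce-catalog", "v2-products", "MyBundle", "bundle_name", "1bundle", "../etc"]:
    print(f"bundle {name!r}: {is_valid_bundle_name(name)}")
```

Note that the patterns on their own accept consecutive separators (for example a__b or a--b), so a check for single underscores or single hyphens would need an extra test on top of the regular expression.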

Limits

Max bundles per tenant: 20
Max entity types per bundle: 50
Max entities per entity type: 10,000
Max single entity JSON size: 100 KB
Max total bundle JSON size: 10 MB

Exceeding any limit returns HTTP 413 (limit_exceeded).

Query-API annotations (1.4.0)

Three annotations layered onto the schema control how the auto-generated MCP tools behave at query time. All are opt-in — bundles authored for 1.3.x keep working unchanged.

`x-summary-fields` — projection for `list_<entity>_ids`

When set, the list_<entity>_ids tool returns a meaningful preview instead of bare ids. The id field is auto-included; only top-level properties are allowed (no dot-notation).

x-id-field: code
x-summary-fields: [title, popularity, industry]

The tool response shape switches from {ids, total} to {items, total}. Each item carries the entity’s id under the generic key id: the engine normalises the value of the x-id-field into id for tool responses, so the agent sees {"id": "<value-of-x-id-field>", "title": ..., "popularity": ..., "industry": ...} regardless of whether your x-id-field is code, slug, or anything else. The agent then decides which entities are worth a full get_<entity>(ids=[...]) round-trip.
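The two response shapes can be sketched as a projection over entities that have already matched the filters. `list_ids_response` is a hypothetical helper, and two details are assumptions: properties missing from an entity are left out of its item, and total is simply the number of entities passed in (the sketch does no paging).

```python
def list_ids_response(entities, schema):
    """Build a list_<entity>_ids style payload from already-filtered entities."""
    id_field = schema["x-id-field"]
    summary = schema.get("x-summary-fields")
    if not summary:
        # No projection configured: bare ids.
        return {"ids": [e[id_field] for e in entities], "total": len(entities)}
    items = []
    for e in entities:
        item = {"id": e[id_field]}  # x-id-field value under the generic key "id"
        for field in summary:
            if field in e:  # assumption: absent properties are left out
                item[field] = e[field]
        items.append(item)
    return {"items": items, "total": len(entities)}


schema = {"x-id-field": "code", "x-summary-fields": ["title", "popularity", "industry"]}
rows = [{"code": "FS-KYC", "title": "KYC checks", "popularity": "high",
         "industry": "finance", "score": 7}]
print(list_ids_response(rows, schema))
# {'items': [{'id': 'FS-KYC', 'title': 'KYC checks', 'popularity': 'high', 'industry': 'finance'}], 'total': 1}
print(list_ids_response(rows, {"x-id-field": "code"}))
# {'ids': ['FS-KYC'], 'total': 1}
```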

Filter and sort operators (no schema annotation needed — schema types drive validation)

Property type	Allowed filter operators	Sort	Notes
`string`	equality, `[in]`	yes (lex)	Plain text comparison
`string` + `format: date`/`date-time`	equality, `[in]`, range (`[gte/gt/lte/lt]`)	yes	Casts to `timestamptz` for both filter and sort
`string` + `enum: [...]`	equality, `[in]`	yes (declaration order)	Sort `desc` = first in declared array; sort `asc` = last (`array_position` under the hood)
`integer`, `number`	equality, `[in]`, range	yes	Casts to `numeric`
`boolean`	equality	yes
`array`, `object`	equality only (`@>` containment)	technically allowed but not recommended	Sort produces stringified JSON ordering; use top-level scalar fields for predicates and sort

Range operators on a non-numeric / non-date field return 400 with a clear message. IN-list size is capped at 500 to mirror the batch get cap. The sort validator gates fields against x-index only — sort on an array or object field will run (with stringified JSON ordering) rather than being rejected, but the results are rarely useful.
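The operator table and the IN-list cap can be expressed as a small lookup. This is a sketch under assumptions: the operator labels (`eq`, `in`, `gte`, and so on) and the helper names are illustrative, and the error strings are not the engine's.

```python
RANGE_OPS = {"gte", "gt", "lte", "lt"}
MAX_IN_LIST = 500


def allowed_operators(spec):
    """Return the filter operators permitted for a property schema."""
    kind = spec.get("type")
    if kind == "string":
        if spec.get("format") in ("date", "date-time"):
            return {"eq", "in"} | RANGE_OPS
        return {"eq", "in"}  # plain strings and enums
    if kind in ("integer", "number"):
        return {"eq", "in"} | RANGE_OPS
    if kind in ("boolean", "array", "object"):
        return {"eq"}
    return set()


def check_filter(spec, op, value):
    """Return None if the filter is acceptable, else an error string."""
    if op not in allowed_operators(spec):
        return f"operator {op!r} is not allowed on this property"
    if op == "in" and len(value) > MAX_IN_LIST:
        return f"in-list is capped at {MAX_IN_LIST} values"
    return None


print(check_filter({"type": "integer"}, "gte", 3))  # None
print(check_filter({"type": "string"}, "gte", "a"))
# operator 'gte' is not allowed on this property
```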

Sort semantics

Sort fields must be marked x-index: true.
Direction is asc or desc (case-insensitive). Anything else → 400.
Multi-field sort produces composite ordering: [{popularity:desc}, {code:asc}] sorts by popularity first, code asc as tiebreak.
Missing values appear last regardless of direction (NULLS LAST).
Enum sort order: a popularity enum declared as [very_high, high, normal, low] and sorted desc produces [very_high, high, normal, low] (declaration order, head first), not the [very_high, normal, low, high] that a descending PostgreSQL text sort would emit. Sorted asc, it produces the reversed [low, normal, high, very_high], last-declared first. If your enum is declared “low to high”, the conventional fix is to flip the declared order so that desc aligns with the natural “highest first” reading. The tool description tells the agent which order is which.
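The enum ordering and the NULLS LAST rule can be reproduced in a few lines, which is handy for checking what an agent should expect. This is an in-memory sketch of the documented semantics, not the engine's SQL; `sort_by_enum` is a hypothetical name.

```python
def sort_by_enum(entities, field, enum_values, direction):
    """Sort by declaration order: desc = head of the enum first, asc = reversed."""
    direction = direction.lower()  # direction is case-insensitive
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be asc or desc")
    position = {value: index for index, value in enumerate(enum_values)}
    present = [e for e in entities if e.get(field) in position]
    missing = [e for e in entities if e.get(field) not in position]
    present.sort(key=lambda e: position[e[field]], reverse=(direction == "asc"))
    return present + missing  # missing values last, regardless of direction


levels = ["very_high", "high", "normal", "low"]
rows = [{"code": "A", "popularity": "low"}, {"code": "B", "popularity": "very_high"},
        {"code": "C"}, {"code": "D", "popularity": "high"},
        {"code": "E", "popularity": "normal"}]

print([r.get("popularity") for r in sort_by_enum(rows, "popularity", levels, "desc")])
# ['very_high', 'high', 'normal', 'low', None]
print([r.get("popularity") for r in sort_by_enum(rows, "popularity", levels, "ASC")])
# ['low', 'normal', 'high', 'very_high', None]
print(sorted(levels, reverse=True))  # plain text sort, for comparison
# ['very_high', 'normal', 'low', 'high']
```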

Worked example — full 1.4.0 schema

$schema: https://json-schema.org/draft/2020-12/schema
$id: use_case
type: object

x-id-field: code
x-tool-expose: [list, get, list_ids]
x-summary-fields: [title, popularity, industry]

properties:
  code:        {type: string, pattern: "^[A-Z]{2}-[A-Z0-9-]+$", x-index: true}
  title:       {type: string, minLength: 3}
  industry:    {type: string, x-index: true}
  popularity:  {type: string, enum: [very_high, high, normal, low], x-index: true}
  score:       {type: integer, x-index: true}
  created_at:  {type: string, format: date-time, x-index: true}

Agents bound to this bundle get three MCP tools (one per x-tool-expose entry):

list_use_case(filters, sort, limit, offset) — full payloads
list_use_case_ids(filters, sort, limit, offset) — preview shape {items, total} with the three summary fields plus id
get_use_case(ids[]) — batch fetch, response {entities, not_found}, max 500 ids per call

If you omit list_ids from x-tool-expose, only list_<entity> and get_<entity> are generated — the summary projection is still recorded in the schema but no tool exposes it. The reverse is also valid: a read-only catalog can expose ["list_ids", "get"] only.
