
Knowledge Graphs — Schema Annotations Reference

This page is the canonical reference for the SyntheticBrew-specific x-* annotations on entity schemas. Each annotation extends standard JSON Schema (Draft 2020-12); unknown x-* keywords are ignored by JSON Schema validators, so the schema remains portable.

Top-level annotations (on the schema root)

x-id-field (string, required)

Identifies which property of the entity is its unique ID within the (tenant, bundle, entity_type) scope. The referenced property must exist on the same schema.

{
  "$id": "category",
  "x-id-field": "code",
  "properties": {
    "code": {"type": "string"},
    "name": {"type": "string"}
  }
}

Without x-id-field, the schema is rejected at apply time with error code schema_invalid.
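The rule above amounts to a two-step check. A minimal TypeScript sketch follows; the EntitySchema type, the IdFieldCheck result shape, and validateIdField are hypothetical names for illustration, not the engine's code. Only the behaviour (a missing or dangling x-id-field yields schema_invalid) comes from this page.

```typescript
// Hypothetical, simplified view of an entity schema: only the fields this check reads.
interface EntitySchema {
  $id?: string;
  "x-id-field"?: string;
  properties?: Record<string, unknown>;
}

// Either the schema passes, or it fails with the error code documented on this page.
type IdFieldCheck =
  | { ok: true }
  | { ok: false; code: "schema_invalid"; message: string };

function validateIdField(schema: EntitySchema): IdFieldCheck {
  const idField = schema["x-id-field"];
  if (idField === undefined) {
    return { ok: false, code: "schema_invalid", message: "x-id-field is required" };
  }
  // The referenced property must exist on the same schema (own property, not inherited).
  const props = schema.properties ?? {};
  if (!Object.prototype.hasOwnProperty.call(props, idField)) {
    return { ok: false, code: "schema_invalid", message: `x-id-field "${idField}" is not a declared property` };
  }
  return { ok: true };
}

const categorySchema: EntitySchema = {
  $id: "category",
  "x-id-field": "code",
  properties: { code: { type: "string" }, name: { type: "string" } },
};

console.log(validateIdField(categorySchema).ok); // true
```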

x-tool-expose (array of strings, optional, default ["list", "get"])

Controls which auto-generated MCP tools the engine creates for this entity type. Valid entries:

| Value | Generates |
|---|---|
| "list" | list_<entity_type>(filters, sort, limit, offset) |
| "get" | get_<entity_type>(ids), batch fetch of up to 500 ids per call |
| "list_ids" | list_<entity_type>_ids(filters, sort, limit, offset), lighter payload |

Use "list_ids" when entity payloads are large but you still want full enumeration:

{
  "x-id-field": "id",
  "x-tool-expose": ["list", "get", "list_ids"]
}
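To show how entries map to tool names, here is a hypothetical generatedToolNames helper. It is a sketch that assumes the naming patterns in the table above and is not a published API. Following validation rule 2 further down this page, an unknown entry throws instead of being skipped.

```typescript
// The values this page allows in x-tool-expose.
type ExposeValue = "list" | "get" | "list_ids";

// Applied when a schema omits x-tool-expose.
const DEFAULT_EXPOSE: readonly ExposeValue[] = ["list", "get"];

// Maps each x-tool-expose entry to the tool name generated for it.
// Input is typed as string[] because schemas arrive as untyped JSON.
function generatedToolNames(entityType: string, expose: readonly string[] = DEFAULT_EXPOSE): string[] {
  return expose.map((value) => {
    switch (value) {
      case "list":
        return `list_${entityType}`;
      case "get":
        return `get_${entityType}`;
      case "list_ids":
        return `list_${entityType}_ids`;
      default:
        // Unknown entries reject the whole schema rather than being ignored.
        throw new Error(`x-tool-expose: unknown value "${value}"`);
    }
  });
}

console.log(generatedToolNames("brand", ["list", "get", "list_ids"]).join(", "));
// list_brand, get_brand, list_brand_ids
```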

Runtime behaviour of generated list_* tools:

  • limit argument must be in the range 1..500. Values outside that range return HTTP 400 [INVALID_INPUT] limit must be between 1 and 500; the engine does not silently clamp.
  • filters argument keys are restricted to properties marked x-index: true. Unknown keys return HTTP 400 with the allowed-list in the response body.
  • filters argument value must match the canonical entity code (the value the bundle author wrote), not a display label. LLMs frequently confuse the two — see Prompt engineering for KG-grounded agents.
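The first two rules can be read as an argument check that runs before any query. This sketch assumes a hypothetical ListArgs shape and uses an Error subclass (InvalidInputError, also hypothetical) in place of the HTTP 400 response; the limit message is the one quoted above, while the unknown-key message is illustrative. The third rule (codes, not labels) cannot be checked generically, so it is not modelled.

```typescript
// Stand-in for an HTTP 400 [INVALID_INPUT] response; carries the allowed keys for the body.
class InvalidInputError extends Error {
  constructor(message: string, readonly allowedKeys: string[] = []) {
    super(`[INVALID_INPUT] ${message}`);
  }
}

// Hypothetical argument shape of a generated list_* tool.
interface ListArgs {
  filters?: Record<string, unknown>;
  limit?: number;
  offset?: number;
}

// indexedFields: the properties marked x-index: true on the entity schema.
function checkListArgs(args: ListArgs, indexedFields: readonly string[]): void {
  // Reject (never clamp) limits outside 1..500; the negated form also rejects NaN.
  if (args.limit !== undefined && !(args.limit >= 1 && args.limit <= 500)) {
    throw new InvalidInputError("limit must be between 1 and 500");
  }
  // Every filter key must be an indexed property; the allowed list travels with the error.
  for (const key of Object.keys(args.filters ?? {})) {
    if (!indexedFields.includes(key)) {
      throw new InvalidInputError(`unknown filter key "${key}"`, [...indexedFields]);
    }
  }
}

checkListArgs({ filters: { popularity: "high" }, limit: 50 }, ["popularity"]); // passes
```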

x-tool-description (string, optional)

Overrides the auto-derived tool description that agents see. By default, the engine uses the schema’s top-level description field. Use this annotation when the tool needs a richer description than the schema-level documentation.

{
  "$id": "brand",
  "description": "A brand carried in the catalog.",
  "x-tool-description": "Brands in the catalog. Use list_brand to enumerate (filter by category or tier); use get_brand to fetch one by its code."
}

x-index (boolean, optional, default false)

Marks a property as filterable. Filterable properties become keys of the filters argument of the list_* and list_*_ids tools. The engine uses a generic JSONB GIN index that covers @> containment queries on the entire data column, so marking many fields as indexed has no additional storage cost.

{
  "$id": "category",
  "x-id-field": "code",
  "properties": {
    "code": {"type": "string"},
    "popularity": {
      "type": "string",
      "enum": ["high", "medium", "low"],
      "x-index": true
    }
  }
}

Resulting tool signature:

list_category(
  filters?: {popularity?: "high" | "medium" | "low"}, // ← built from x-index + enum
  limit?: number,
  offset?: number
)

Only x-index fields appear as filter parameters. Other fields are returned in entity payloads but cannot be filtered on.
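The hypothetical filtersType helper below renders the filters parameter type from a schema's properties, to make the link between annotations and the signature above concrete. It is a sketch under simplifying assumptions (only enum, string, integer, number, and boolean are mapped), not how the engine builds its tool schemas. It also drops properties flagged x-derived, following the x-derived section of this page.

```typescript
// Hypothetical subset of a JSON Schema property definition.
interface PropertyDef {
  type?: string;
  enum?: string[];
  "x-index"?: boolean;
  "x-derived"?: boolean;
}

// Maps one property to a TypeScript-style type: enums become a union of literals.
function tsType(prop: PropertyDef): string {
  if (prop.enum) return prop.enum.map((v) => JSON.stringify(v)).join(" | ");
  if (prop.type === "integer" || prop.type === "number") return "number";
  return prop.type ?? "unknown";
}

// One optional key per property that is x-index: true and not x-derived.
function filtersType(properties: Record<string, PropertyDef>): string {
  const keys = Object.entries(properties)
    .filter(([, prop]) => prop["x-index"] === true && prop["x-derived"] !== true)
    .map(([field, prop]) => `${field}?: ${tsType(prop)}`);
  return `{${keys.join(", ")}}`;
}

console.log(filtersType({
  code: { type: "string" },
  popularity: { type: "string", enum: ["high", "medium", "low"], "x-index": true },
}));
// {popularity?: "high" | "medium" | "low"}
```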

x-ref (string, optional)

Marks a property as a reference to another entity type. The value is the target entity type name. Cross-refs are validated at apply time: every referenced entity must exist within the same bundle.

{
  "$id": "brand",
  "properties": {
    "category": {
      "type": "string",
      "x-ref": "category"
    }
  }
}

When the customer applies a bundle, the engine checks that every value of the category property matches an existing category entity’s ID. Bundles with broken refs are rejected with error code invalid_ref.

x-ref-field (string, optional)

When set, specifies which field of the target entity to match against. Default: the target’s x-id-field. Use this when you want to reference by a secondary identifier (e.g. slug instead of code):

{
  "properties": {
    "category_slug": {
      "type": "string",
      "x-ref": "category",
      "x-ref-field": "slug"
    }
  }
}

The engine validates against category.slug values rather than category.code.
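The apply-time check for x-ref and x-ref-field reduces to a set lookup against the target entities. In this sketch the types and unresolvedRefs are hypothetical; it assumes array-valued refs are checked element by element (validation rule 4 below allows arrays of strings) and that an absent property is skipped, which this page does not specify.

```typescript
// An entity is its JSON payload.
type EntityData = Record<string, unknown>;

// The two annotations on a referencing property.
interface RefAnnotations {
  "x-ref": string; // target entity type
  "x-ref-field"?: string; // field on the target to match; defaults to the target's x-id-field
}

// What the validator needs to know about each entity type in the bundle.
interface TargetType {
  idField: string; // the target schema's x-id-field
  entities: EntityData[];
}

// Returns every value of `property` that matches no target entity.
// A non-empty result means the apply is rejected with invalid_ref.
function unresolvedRefs(
  entities: EntityData[],
  property: string,
  ref: RefAnnotations,
  bundle: Map<string, TargetType>,
): unknown[] {
  const target = bundle.get(ref["x-ref"]);
  if (!target) throw new Error(`x-ref target "${ref["x-ref"]}" is not in the bundle`);
  const matchField = ref["x-ref-field"] ?? target.idField;
  const known = new Set(target.entities.map((e) => e[matchField]));
  const missing: unknown[] = [];
  for (const entity of entities) {
    const raw = entity[property];
    if (raw === undefined) continue; // assumption: absent refs are not checked
    const values: unknown[] = Array.isArray(raw) ? raw : [raw];
    for (const v of values) if (!known.has(v)) missing.push(v);
  }
  return missing;
}

const bundle = new Map<string, TargetType>([
  ["category", { idField: "code", entities: [{ code: "shoes", slug: "footwear" }] }],
]);
const brands: EntityData[] = [
  { code: "acme", category: "shoes", category_slug: "footwear" },
  { code: "zeta", category: "hats", category_slug: "headwear" },
];
console.log(unresolvedRefs(brands, "category", { "x-ref": "category" }, bundle).join(", ")); // hats
console.log(unresolvedRefs(brands, "category_slug", { "x-ref": "category", "x-ref-field": "slug" }, bundle).join(", ")); // headwear
```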

x-derived (boolean, optional, default false)

Marks a property as computed by the engine rather than authored by the customer. Derived fields are excluded from filter parameters on list_* tools (the customer cannot filter on something they did not author):

{
  "properties": {
    "brand_count": {
      "type": "integer",
      "x-derived": true,
      "x-index": true
    }
  }
}

In this example, brand_count is marked x-index: true, but because it is also x-derived it does not appear in filters. Like every other property, it is still returned in entity payloads.

x-content-type (string, optional)

UI rendering hint for the admin dashboard. The engine ignores this annotation; the dashboard SPA reads it to decide whether to render a field as plain text, code, markdown, or a clickable URL.

Recommended values:

| Value | Rendering |
|---|---|
| "markdown" | Render as formatted markdown in the inspect drawer |
| "code" | Monospace font, syntax-highlighted |
| "url" | Clickable hyperlink |

{
  "properties": {
    "description": {
      "type": "string",
      "x-content-type": "markdown"
    },
    "homepage": {
      "type": "string",
      "format": "uri",
      "x-content-type": "url"
    }
  }
}

When the engine validates a customer-supplied entity schema, it enforces:

  1. x-id-field is required and must reference an existing property.
  2. x-tool-expose values must be one of "list", "get", "list_ids". Unknown values reject the schema.
  3. x-ref values must reference an entity type that exists in the same bundle or is being applied in the same import.
  4. x-ref is honoured only on properties of type "string" or "array" of strings. On any other type, the annotation is ignored and a warning is emitted.
  5. Tool name collisions within a tenant are rejected. If two bundles in the same tenant generate list_category, the second apply fails with tool_name_collision_in_tenant.
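Rule 5 can be pictured as a set intersection across the tenant's bundles. The BundleTools shape and toolCollisions function are hypothetical, and the sketch assumes that re-applying a bundle replaces its own tools (so a bundle never collides with itself); this page does not say how re-applies are treated.

```typescript
// Hypothetical summary of a bundle: its name and the tool names its schemas generate.
interface BundleTools {
  bundle: string;
  tools: string[];
}

// Returns the tool names of `incoming` that another bundle in the same tenant already generates.
// Any result means the apply fails with tool_name_collision_in_tenant.
function toolCollisions(tenantBundles: BundleTools[], incoming: BundleTools): string[] {
  const taken = new Set<string>();
  for (const b of tenantBundles) {
    // Assumption: re-applying a bundle replaces its own tools, so it cannot collide with itself.
    if (b.bundle === incoming.bundle) continue;
    for (const t of b.tools) taken.add(t);
  }
  return incoming.tools.filter((t) => taken.has(t));
}

const tenant: BundleTools[] = [
  { bundle: "ecommerce-catalog", tools: ["list_category", "get_category"] },
];
console.log(toolCollisions(tenant, { bundle: "support-modules", tools: ["list_category", "get_topic"] }).join(", "));
// list_category
```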

The entity type name (which becomes the suffix of auto-generated tools — list_<entity_type>) must match the pattern ^[a-z][a-z0-9_]{0,62}[a-z0-9]$:

  • Lower-case letters, digits, single underscores
  • Must start with a lower-case letter; cannot end with an underscore
  • Length 2-64

Valid: category, brand, product_attribute, legal_topic.

Invalid: Category (uppercase), category-name (hyphen), _x (leading underscore), x (too short).

Bundle names must match ^[a-z][a-z0-9-]{0,62}[a-z0-9]$:

  • Lower-case letters, digits, single hyphens
  • Must start with a lower-case letter; cannot end with a hyphen
  • Length 2-64

Valid: ecommerce-catalog, support-modules, v2-products.

Invalid: MyBundle (uppercase), bundle_name (underscore), 1bundle (leading digit), ../etc (path traversal).
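Both patterns can be checked directly against the examples listed above. This snippet only copies the regular expressions from this page; how the engine applies them is not shown.

```typescript
// Patterns copied from this page.
const ENTITY_TYPE_PATTERN = /^[a-z][a-z0-9_]{0,62}[a-z0-9]$/;
const BUNDLE_NAME_PATTERN = /^[a-z][a-z0-9-]{0,62}[a-z0-9]$/;

// The valid examples first, then the invalid ones, as listed above.
const entityTypeExamples = ["category", "brand", "product_attribute", "legal_topic", "Category", "category-name", "_x", "x"];
const bundleNameExamples = ["ecommerce-catalog", "support-modules", "v2-products", "MyBundle", "bundle_name", "1bundle", "../etc"];

for (const candidate of entityTypeExamples) {
  console.log(`entity type ${candidate}: ${ENTITY_TYPE_PATTERN.test(candidate) ? "valid" : "invalid"}`);
}
for (const candidate of bundleNameExamples) {
  console.log(`bundle ${candidate}: ${BUNDLE_NAME_PATTERN.test(candidate) ? "valid" : "invalid"}`);
}
```

It prints valid for the first four entity-type examples and the first three bundle-name examples, and invalid for the rest. As written, neither pattern rejects consecutive separators such as a__b or a--b; if the engine enforces the "single underscores" and "single hyphens" rules, it does so with a check beyond these expressions.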

Limits:

  • Max bundles per tenant: 20
  • Max entity types per bundle: 50
  • Max entities per entity type: 10,000
  • Max single entity JSON size: 100 KB
  • Max total bundle JSON size: 10 MB

Exceeding any limit returns HTTP 413 (limit_exceeded).
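A rough pre-flight check against these limits, run before uploading a bundle, might look like the sketch below. firstLimitExceeded and the BundlePayload shape are hypothetical. Two assumptions are not stated on this page: 1 KB = 1024 bytes and 1 MB = 1024 KB, and sizes are measured on compact JSON.stringify output, so the engine's own measurement may differ slightly.

```typescript
// Limits from this page. Assumption: 1 KB = 1024 bytes and 1 MB = 1024 KB.
const LIMITS = {
  bundlesPerTenant: 20,
  entityTypesPerBundle: 50,
  entitiesPerType: 10_000,
  entityBytes: 100 * 1024,
  bundleBytes: 10 * 1024 * 1024,
};

// UTF-8 byte length of a JS string, walking UTF-16 code units.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length && (s.charCodeAt(i + 1) & 0xfc00) === 0xdc00) {
      bytes += 4; // surrogate pair = one code point outside the BMP
      i++;
    } else bytes += 3; // other BMP characters; a lone surrogate encodes as U+FFFD
  }
  return bytes;
}

// Hypothetical in-memory bundle: entity type name -> entity payloads.
type BundlePayload = Record<string, unknown[]>;

// Returns the first limit a new bundle would exceed (HTTP 413 limit_exceeded), or null.
function firstLimitExceeded(bundle: BundlePayload, existingBundles: number): string | null {
  if (existingBundles + 1 > LIMITS.bundlesPerTenant) return "bundles per tenant";
  const types = Object.entries(bundle);
  if (types.length > LIMITS.entityTypesPerBundle) return "entity types per bundle";
  for (const [entityType, entities] of types) {
    if (entities.length > LIMITS.entitiesPerType) return `entities in ${entityType}`;
    for (const entity of entities) {
      if (utf8Length(JSON.stringify(entity)) > LIMITS.entityBytes) return `entity size in ${entityType}`;
    }
  }
  if (utf8Length(JSON.stringify(bundle)) > LIMITS.bundleBytes) return "total bundle size";
  return null;
}

console.log(firstLimitExceeded({ category: [{ code: "shoes" }] }, 3)); // null
```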

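Three query-time capabilities control how the auto-generated MCP tools behave: summary projections (x-summary-fields), typed filter operators, and sorting. Only x-summary-fields is a schema annotation; operators and sorting are driven by property types and x-index. All are opt-in, so bundles authored for 1.3.x keep working unchanged.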

x-summary-fields — projection for list_<entity>_ids

When set, the list_<entity>_ids tool returns a meaningful preview instead of bare ids. The id field is auto-included; only top-level properties are allowed (no dot-notation).

x-id-field: code
x-summary-fields: [title, popularity, industry]

The tool response shape switches from {ids, total} to {items, total}. Each item carries the entity’s id under the generic key id: the engine normalises the x-id-field’s value into that key for tool responses, so the agent sees {"id": "<value-of-x-id-field>", "title": ..., "popularity": ..., "industry": ...} whether your x-id-field is code, slug, or anything else. The agent then decides which entities are worth a full get_<entity>(ids=[...]) round-trip.
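The projection can be sketched as a map over the matching entities. summaryPreview, checkSummaryFields, and the sample entity are hypothetical; the sketch follows the rules above (id always present under the generic key, top-level fields only) and leaves out filtering, sorting, and pagination.

```typescript
type Entity = Record<string, unknown>;

// Schema-time check: x-summary-fields may only name top-level properties.
function checkSummaryFields(fields: string[]): void {
  for (const f of fields) {
    if (f.includes(".")) throw new Error(`x-summary-fields: "${f}" must be a top-level property`);
  }
}

// Builds the {items, total} preview of list_<entity>_ids when x-summary-fields is set.
// The x-id-field value is always emitted under the generic key "id".
function summaryPreview(entities: Entity[], idField: string, summaryFields: string[]): { items: Entity[]; total: number } {
  checkSummaryFields(summaryFields);
  const items = entities.map((entity) => {
    const item: Entity = { id: entity[idField] };
    for (const f of summaryFields) item[f] = entity[f];
    return item;
  });
  // Pagination is omitted here, so total is simply the number of entities passed in.
  return { items, total: entities.length };
}

const preview = summaryPreview(
  [{ code: "RT-RETURNS", title: "Returns triage", popularity: "high", industry: "retail", score: 7 }],
  "code",
  ["title", "popularity", "industry"],
);
console.log(JSON.stringify(preview));
// {"items":[{"id":"RT-RETURNS","title":"Returns triage","popularity":"high","industry":"retail"}],"total":1}
```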

Filter and sort operators (no schema annotation needed — schema types drive validation)

| Property type | Allowed filter operators | Sort | Notes |
|---|---|---|---|
| string | equality, [in] | yes (lexical) | Plain text comparison |
| string + format: date/date-time | equality, [in], range ([gte/gt/lte/lt]) | yes | Casts to timestamptz for both filter and sort |
| string + enum: [...] | equality, [in] | yes (declaration order) | Sort desc = first in declared array; sort asc = last (array_position under the hood) |
| integer, number | equality, [in], range | yes | Casts to numeric |
| boolean | equality | yes | |
| array, object | equality only (@> containment) | technically allowed but not recommended | Sort produces stringified JSON ordering; use top-level scalar fields for predicates and sort |

Range operators on a field that is neither numeric nor a date return HTTP 400 with an error message. IN-list size is capped at 500 to mirror the batch get cap. The sort validator gates fields on x-index only: a sort on an array or object field runs (with stringified JSON ordering) rather than being rejected, but the results are rarely useful.
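The table can be encoded as a per-type operator whitelist. In this sketch the operator names (eq, in, gte, gt, lte, lt) and the checkFilter function are hypothetical labels for the table's columns, not the tools' wire format, which this page does not spell out; a thrown Error stands in for the HTTP 400 response.

```typescript
// Hypothetical labels for "equality", [in], and the four range operators.
type FilterOp = "eq" | "in" | "gte" | "gt" | "lte" | "lt";

// Simplified property definition: only what the operator check reads.
interface FilterProp {
  type: string;
  format?: string;
}

const RANGE_OPS: FilterOp[] = ["gte", "gt", "lte", "lt"];
const MAX_IN_LIST = 500; // mirrors the batch get cap

// Allowed operators per property type, following the table above.
function allowedOps(prop: FilterProp): FilterOp[] {
  const isDate = prop.type === "string" && (prop.format === "date" || prop.format === "date-time");
  if (isDate || prop.type === "integer" || prop.type === "number") return ["eq", "in", ...RANGE_OPS];
  if (prop.type === "string") return ["eq", "in"]; // plain strings and enums
  return ["eq"]; // boolean; for array/object, equality means @> containment
}

// Throws (standing in for HTTP 400) when an operator or IN-list is not acceptable.
function checkFilter(field: string, prop: FilterProp, op: FilterOp, value: unknown): void {
  if (!allowedOps(prop).includes(op)) {
    throw new Error(`operator ${op} is not supported on ${field} (type ${prop.type})`);
  }
  if (op === "in" && (!Array.isArray(value) || value.length > MAX_IN_LIST)) {
    throw new Error(`in-list on ${field} must be an array of at most ${MAX_IN_LIST} values`);
  }
}

checkFilter("created_at", { type: "string", format: "date-time" }, "gte", "2024-01-01T00:00:00Z"); // allowed
```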

Sort rules:

  • Sort fields must be marked x-index: true.
  • Direction is asc or desc (case-insensitive). Anything else → 400.
  • Multi-field sort produces composite ordering: [{popularity:desc}, {code:asc}] sorts by popularity first, code asc as tiebreak.
  • Missing values appear last regardless of direction (NULLS LAST).
  • Enum ordering matters: a popularity enum declared as [very_high, high, normal, low] and sorted desc produces [very_high, high, normal, low] (declaration order, head first), not the alphabetical [very_high, normal, low, high] that PostgreSQL would emit on a text sort. Sorted asc, it produces the reverse, [low, normal, high, very_high] (last-declared first). If your enum is declared “low to high”, the conventional fix is to flip the declared order so that desc matches the natural “highest first” reading. The tool description tells the agent which order is which.
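These rules can be reproduced in memory, which is handy for checking what an agent should expect back. sortRows is a hypothetical sketch, not the engine's SQL: it validates directions, applies keys in order as tiebreaks, sorts enum fields by declaration order (desc head-first), and puts missing values last in both directions. The x-index gate on sort fields is not modelled, and plain strings are compared by UTF-16 code unit, which may differ from PostgreSQL collation.

```typescript
type Row = Record<string, unknown>;
type Direction = "asc" | "desc";

// A sort key as an agent might send it, e.g. {field: "popularity", direction: "DESC"}.
interface SortKey {
  field: string;
  direction: string;
}

// Declared enum values per field; fields listed here sort by declaration order.
type EnumOrder = Record<string, readonly string[]>;

function isMissing(v: unknown): boolean {
  return v === undefined || v === null;
}

// Ascending comparison of two present values for one field.
function ascCompare(a: unknown, b: unknown, declared?: readonly string[]): number {
  if (declared) {
    // asc lists the last-declared value first, so a higher declaration index sorts earlier.
    return declared.indexOf(String(b)) - declared.indexOf(String(a));
  }
  if (typeof a === "number" && typeof b === "number") return a - b;
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

function sortRows(rows: Row[], keys: SortKey[], enums: EnumOrder = {}): Row[] {
  // Directions are case-insensitive; anything other than asc/desc is rejected up front.
  const parsed = keys.map(({ field, direction }) => {
    const dir = direction.toLowerCase();
    if (dir !== "asc" && dir !== "desc") throw new Error(`sort direction must be asc or desc, got "${direction}"`);
    return { field, dir: dir as Direction };
  });
  return [...rows].sort((a, b) => {
    for (const { field, dir } of parsed) {
      const va = a[field];
      const vb = b[field];
      if (isMissing(va) || isMissing(vb)) {
        if (isMissing(va) && isMissing(vb)) continue;
        return isMissing(va) ? 1 : -1; // NULLS LAST regardless of direction
      }
      const cmp = ascCompare(va, vb, enums[field]);
      if (cmp !== 0) return dir === "asc" ? cmp : -cmp;
    }
    return 0;
  });
}

const rows: Row[] = [
  { code: "a", popularity: "low" },
  { code: "b" },
  { code: "c", popularity: "very_high" },
  { code: "d", popularity: "normal" },
  { code: "e", popularity: "high" },
];
const order = { popularity: ["very_high", "high", "normal", "low"] };
console.log(sortRows(rows, [{ field: "popularity", direction: "desc" }], order).map((r) => r.code).join(" "));
// c e d a b
```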
A complete schema combining these features:

$schema: https://json-schema.org/draft/2020-12/schema
$id: use_case
type: object
x-id-field: code
x-tool-expose: [list, get, list_ids]
x-summary-fields: [title, popularity, industry]
properties:
  code: {type: string, pattern: "^[A-Z]{2}-[A-Z0-9-]+$", x-index: true}
  title: {type: string, minLength: 3}
  industry: {type: string, x-index: true}
  popularity: {type: string, enum: [very_high, high, normal, low], x-index: true}
  score: {type: integer, x-index: true}
  created_at: {type: string, format: date-time, x-index: true}

Agents bound to this bundle get three MCP tools (one per x-tool-expose entry):

  • list_use_case(filters, sort, limit, offset) — full payloads
  • list_use_case_ids(filters, sort, limit, offset) — preview shape {items, total} with the three summary fields plus id
  • get_use_case(ids[]) — batch fetch, response {entities, not_found}, max 500 ids per call

If you omit list_ids from x-tool-expose, only list_<entity> and get_<entity> are generated — the summary projection is still recorded in the schema but no tool exposes it. The reverse is also valid: a read-only catalog can expose ["list_ids", "get"] only.
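The batch get tool listed above returns both hits and misses in one response. A minimal in-memory sketch, assuming a hypothetical store keyed by the x-id-field value, results in request order, and a thrown Error in place of whatever response the engine gives when the 500-id cap is exceeded (this page states the cap, not the error):

```typescript
type Entity = Record<string, unknown>;

const MAX_GET_IDS = 500;

// Sketch of get_<entity>(ids[]): found entities plus the ids that matched nothing,
// rather than failing the whole call on the first unknown id.
function batchGet(store: Map<string, Entity>, ids: string[]): { entities: Entity[]; not_found: string[] } {
  if (ids.length > MAX_GET_IDS) throw new Error(`at most ${MAX_GET_IDS} ids per call`);
  const entities: Entity[] = [];
  const not_found: string[] = [];
  for (const id of ids) {
    const entity = store.get(id);
    if (entity) entities.push(entity);
    else not_found.push(id);
  }
  return { entities, not_found };
}

const store = new Map<string, Entity>([["RT-RETURNS", { code: "RT-RETURNS", title: "Returns triage" }]]);
console.log(JSON.stringify(batchGet(store, ["RT-RETURNS", "RT-UNKNOWN"]).not_found));
// ["RT-UNKNOWN"]
```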