Knowledge Graphs — Schema Annotations Reference
This page is the canonical reference for the SyntheticBrew-specific x-* annotations on entity schemas. Each annotation extends standard JSON Schema (Draft 2020-12); unknown x-* keywords are ignored by JSON Schema validators, so the schema remains portable.
Top-level annotations (on the schema root)
x-id-field (string, required)
Identifies which property of the entity is its unique ID within the (tenant, bundle, entity_type) scope. The referenced property must exist on the same schema.

    {
      "$id": "category",
      "x-id-field": "code",
      "properties": {
        "code": {"type": "string"},
        "name": {"type": "string"}
      }
    }

Without x-id-field, the schema is rejected at apply time with error code schema_invalid.
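The x-id-field rule can be pre-checked before applying a bundle. The following is a minimal sketch, not the engine's implementation: check_id_field is a hypothetical helper, and returning schema_invalid for a dangling reference (as opposed to a missing annotation) is an assumption.

```python
from typing import Optional


def check_id_field(schema: dict) -> Optional[str]:
    """Return an error code if x-id-field is missing or dangling, else None.

    Hypothetical helper. The source only states the code for a *missing*
    x-id-field; reusing it for a dangling reference is an assumption.
    """
    id_field = schema.get("x-id-field")
    properties = schema.get("properties", {})
    if not isinstance(id_field, str) or id_field not in properties:
        return "schema_invalid"
    return None


category_schema = {
    "$id": "category",
    "x-id-field": "code",
    "properties": {"code": {"type": "string"}, "name": {"type": "string"}},
}
result = check_id_field(category_schema)  # None: "code" exists in properties
```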
x-tool-expose (array of strings, optional, default ["list", "get"])
Controls which auto-generated MCP tools the engine creates for this entity type. Valid entries:
| Value | Generates |
|---|---|
| "list" | list_<entity_type>(filters, limit, offset) |
| "get" | get_<entity_type>(id) |
| "list_ids" | list_<entity_type>_ids(filters, limit, offset), with a lighter payload |
Use "list_ids" when entity payloads are large but you still want full enumeration:

    {
      "x-id-field": "id",
      "x-tool-expose": ["list", "get", "list_ids"]
    }

Runtime behaviour of generated list_* tools:
- The limit argument must be in the range 1..500. Values outside that range return HTTP 400 [INVALID_INPUT] limit must be between 1 and 500; the engine does not silently clamp.
- filters argument keys are restricted to properties marked x-index: true. Unknown keys return HTTP 400 with the allowed list in the response body.
- filters argument values must match the canonical entity code (the value the bundle author wrote), not a display label. LLMs frequently confuse the two; see Prompt engineering for KG-grounded agents.
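The first two runtime rules can be mirrored client-side to catch bad calls before they reach the engine. This is a sketch under assumptions: validate_list_args is a hypothetical helper, and the shape of the error body (the unknown and allowed keys) is illustrative, since the source only says the allowed list appears in the response body.

```python
def validate_list_args(schema, filters, limit):
    """Return (status, body); 200 means the call would be accepted.

    Hypothetical pre-flight check. The error body layout is an assumption.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 500:
        return 400, {"error": "[INVALID_INPUT] limit must be between 1 and 500"}
    allowed = sorted(
        name
        for name, prop in schema.get("properties", {}).items()
        if prop.get("x-index") is True
    )
    unknown = sorted(key for key in (filters or {}) if key not in allowed)
    if unknown:
        return 400, {"error": "unknown filter key", "unknown": unknown, "allowed": allowed}
    return 200, {}


list_schema = {
    "x-id-field": "code",
    "properties": {
        "code": {"type": "string"},
        "name": {"type": "string"},
        "popularity": {"type": "string", "x-index": True},
    },
}
```

Out-of-range limits are rejected rather than clamped, matching the behaviour described above.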
x-tool-description (string, optional)
Overrides the auto-derived tool description that agents see. By default, the engine uses the schema’s top-level description field. Use this annotation when the tool needs a richer description than the schema-level documentation.

    {
      "$id": "brand",
      "description": "A brand carried in the catalog.",
      "x-tool-description": "Brands in the catalog. Use list_brand to enumerate (filter by category or tier); use get_brand to fetch one by its code."
    }

Property-level annotations
x-index (boolean, optional, default false)
Marks a property as filterable. Filterable properties become parameters of the list_* and list_*_ids tools. The engine uses a generic JSONB GIN index that covers @> containment queries on the entire data column, so marking many fields as indexed has no additional storage cost.

    {
      "x-id-field": "code",
      "properties": {
        "code": {"type": "string"},
        "popularity": {
          "type": "string",
          "enum": ["high", "medium", "low"],
          "x-index": true
        }
      }
    }

Resulting tool signature:

    list_category(
      filters?: {popularity?: "high" | "medium" | "low"},  // built from x-index + enum
      limit?: number,
      offset?: number
    )

Only x-index fields appear as filter parameters. Other fields are returned in entity payloads but cannot be filtered on.
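The mapping from x-index properties to filter parameters can be sketched as follows. filter_parameters is a hypothetical helper written for illustration; how the engine actually builds the signature is not documented here.

```python
def filter_parameters(schema):
    """Map each x-index property to its enum values (None when unconstrained).

    Hypothetical helper: only properties with x-index: true become filters.
    """
    params = {}
    for name, prop in schema.get("properties", {}).items():
        if prop.get("x-index") is True:
            params[name] = prop.get("enum")
    return params


indexed_schema = {
    "x-id-field": "code",
    "properties": {
        "code": {"type": "string"},
        "popularity": {
            "type": "string",
            "enum": ["high", "medium", "low"],
            "x-index": True,
        },
    },
}
params = filter_parameters(indexed_schema)
# params == {"popularity": ["high", "medium", "low"]}; "code" is not filterable
```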
x-ref (string, optional)
Marks a property as a reference to another entity type. The value is the target entity type name. Cross-refs are validated at apply time: every referenced entity must exist within the same bundle.

    {
      "$id": "brand",
      "properties": {
        "category": {
          "type": "string",
          "x-ref": "category"
        }
      }
    }

When the customer applies a bundle, the engine checks that every value of the category property matches an existing category entity’s ID. Bundles with broken refs are rejected with error code invalid_ref.
x-ref-field (string, optional)
When set, specifies which field of the target entity to match against. Default: the target’s x-id-field. Use this when you want to ref by a secondary identifier (e.g. slug instead of code):

    {
      "properties": {
        "category_slug": {
          "type": "string",
          "x-ref": "category",
          "x-ref-field": "slug"
        }
      }
    }

The engine validates against category.slug values rather than category.code.
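The x-ref and x-ref-field checks can be sketched over in-memory data. This is an illustrative approximation, not the engine's code: find_broken_refs and the dictionary layout are hypothetical, and skipping entities that omit the reference property is an assumption.

```python
def find_broken_refs(schemas, entities):
    """Return (entity_type, property, value) for every ref with no matching target.

    schemas:  {entity_type: schema dict}
    entities: {entity_type: [entity dict, ...]}
    Hypothetical helper; missing reference values are skipped (assumption).
    """
    broken = []
    for entity_type, schema in schemas.items():
        for prop_name, prop in schema.get("properties", {}).items():
            target = prop.get("x-ref")
            if target is None:
                continue
            # Default match field is the target's x-id-field.
            target_field = prop.get("x-ref-field") or schemas.get(target, {}).get("x-id-field")
            known = {e.get(target_field) for e in entities.get(target, [])}
            for entity in entities.get(entity_type, []):
                value = entity.get(prop_name)
                if value is None:
                    continue
                # A ref property may be a string or an array of strings.
                values = value if isinstance(value, list) else [value]
                broken.extend((entity_type, prop_name, v) for v in values if v not in known)
    return broken


ref_schemas = {
    "category": {
        "x-id-field": "code",
        "properties": {"code": {"type": "string"}, "slug": {"type": "string"}},
    },
    "brand": {
        "x-id-field": "code",
        "properties": {
            "code": {"type": "string"},
            "category": {"type": "string", "x-ref": "category"},
            "category_slug": {"type": "string", "x-ref": "category", "x-ref-field": "slug"},
        },
    },
}
ref_entities = {
    "category": [{"code": "coffee", "slug": "coffee-beans"}],
    "brand": [
        {"code": "b1", "category": "coffee", "category_slug": "coffee-beans"},
        {"code": "b2", "category": "tea", "category_slug": "coffee"},
    ],
}
broken = find_broken_refs(ref_schemas, ref_entities)
```

Here b2 fails twice: "tea" matches no category code, and "coffee" is a code rather than a slug. A non-empty result corresponds to the invalid_ref rejection.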
x-derived (boolean, optional, default false)
Marks a property as computed by the engine rather than authored by the customer. Derived fields are excluded from filter parameters on list_* tools (the customer cannot filter on something they did not author):

    {
      "properties": {
        "brand_count": {
          "type": "integer",
          "x-derived": true,
          "x-index": true
        }
      }
    }

In this example, brand_count carries x-index: true but still does not appear in filters, because it is derived. Like any other property, it is still returned in entity payloads.
x-content-type (string, optional)
UI rendering hint for the admin dashboard. The engine ignores this annotation; the SPA reads it to decide whether to render a field as plain text, code, markdown, or a clickable URL.
Recommended values:
| Value | Rendering |
|---|---|
| "markdown" | Render as formatted markdown in the inspect drawer |
| "code" | Monospace font, syntax-highlighted |
| "url" | Clickable hyperlink |

    {
      "properties": {
        "description": {"type": "string", "x-content-type": "markdown"},
        "homepage": {"type": "string", "format": "uri", "x-content-type": "url"}
      }
    }

Validation rules
When the engine validates a customer-supplied entity schema, it enforces:
- x-id-field is required and must reference an existing property.
- x-tool-expose values must be one of "list", "get", "list_ids". Unknown values reject the schema.
- x-ref values must reference an entity type that exists in the same bundle or is being applied in the same import.
- Property types for x-ref must be "string" or "array" of strings. Other types are ignored with a warning.
- Tool name collisions within a tenant are rejected. If two bundles in the same tenant generate list_category, the second apply fails with tool_name_collision_in_tenant.
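The x-tool-expose and collision rules can be illustrated with a small sketch. Both functions and the bundles layout are hypothetical; the tool name templates come from the x-tool-expose table on this page.

```python
TOOL_TEMPLATES = {"list": "list_{t}", "get": "get_{t}", "list_ids": "list_{t}_ids"}


def generated_tool_names(entity_type, expose=("list", "get")):
    """Expand x-tool-expose entries into tool names; unknown entries are rejected."""
    unknown = [e for e in expose if e not in TOOL_TEMPLATES]
    if unknown:
        raise ValueError(f"unknown x-tool-expose values: {unknown}")
    return [TOOL_TEMPLATES[e].format(t=entity_type) for e in expose]


def find_tool_collisions(bundles):
    """Return (tool, first_bundle, second_bundle) for names generated by two bundles.

    bundles: {bundle_name: {entity_type: x-tool-expose list}}, in apply order.
    Hypothetical helper for a single tenant.
    """
    owner = {}
    collisions = []
    for bundle_name, entity_types in bundles.items():
        for entity_type, expose in entity_types.items():
            for tool in generated_tool_names(entity_type, expose):
                first = owner.setdefault(tool, bundle_name)
                if first != bundle_name:
                    collisions.append((tool, first, bundle_name))
    return collisions


tenant_bundles = {
    "ecommerce-catalog": {"category": ["list", "get"]},
    "support-modules": {"category": ["list"], "topic": ["list", "get", "list_ids"]},
}
collisions = find_tool_collisions(tenant_bundles)
```

In this sample both bundles generate list_category, so applying the second one would be the case that fails with tool_name_collision_in_tenant.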
Entity type naming
The entity type name (which becomes the suffix of auto-generated tools, as in list_<entity_type>) must match the pattern ^[a-z][a-z0-9_]{0,62}[a-z0-9]$:
- Lower-case letters, digits, and underscores only (the pattern as written does not reject consecutive underscores)
- Must start with a letter and end with a letter or digit, so no leading or trailing underscore
- Length 2-64
Valid: category, brand, product_attribute, legal_topic.
Invalid: Category (uppercase), category-name (hyphen), _x (leading underscore), x (too short).
Bundle naming
Bundle names must match ^[a-z][a-z0-9-]{0,62}[a-z0-9]$:
- Lower-case letters, digits, and hyphens only (the pattern as written does not reject consecutive hyphens)
- Must start with a letter and end with a letter or digit, so no leading or trailing hyphen
- Length 2-64
Valid: ecommerce-catalog, support-modules, v2-products.
Invalid: MyBundle (uppercase), bundle_name (underscore), 1bundle (leading digit), ../etc (path traversal).
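Both naming patterns can be checked locally with Python's standard re module. The two patterns are copied from this page; the helper names are hypothetical.

```python
import re

# Patterns copied verbatim from the naming rules on this page.
ENTITY_TYPE_RE = re.compile(r"^[a-z][a-z0-9_]{0,62}[a-z0-9]$")
BUNDLE_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{0,62}[a-z0-9]$")


def is_valid_entity_type(name: str) -> bool:
    # fullmatch avoids "$" accepting a trailing newline.
    return ENTITY_TYPE_RE.fullmatch(name) is not None


def is_valid_bundle_name(name: str) -> bool:
    return BUNDLE_NAME_RE.fullmatch(name) is not None


valid_entity_types = ["category", "brand", "product_attribute", "legal_topic"]
invalid_entity_types = ["Category", "category-name", "_x", "x"]
valid_bundles = ["ecommerce-catalog", "support-modules", "v2-products"]
invalid_bundles = ["MyBundle", "bundle_name", "1bundle", "../etc"]
```

Because the bundle pattern admits only letters, digits, and hyphens, a path-like name such as ../etc cannot match.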
Limits
- Max bundles per tenant: 20
- Max entity types per bundle: 50
- Max entities per entity type: 10,000
- Max single entity JSON size: 100 KB
- Max total bundle JSON size: 10 MB
Exceeding any limit returns HTTP 413 (limit_exceeded).
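A bundle can be checked against the per-bundle limits before upload. The sketch below is hedged on several points: first_limit_violation is a hypothetical helper, it treats 1 KB as 1024 bytes and measures size as compact UTF-8 JSON (the page does not say how the engine measures), and it does not cover the per-tenant bundle count.

```python
import json

# Assumption: KB/MB are binary units and size is UTF-8 encoded JSON length.
LIMITS = {
    "entity_types_per_bundle": 50,
    "entities_per_type": 10_000,
    "entity_bytes": 100 * 1024,
    "bundle_bytes": 10 * 1024 * 1024,
}


def json_size(value) -> int:
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def first_limit_violation(bundle):
    """Return the name of the first exceeded limit, or None.

    bundle: {entity_type: [entity dict, ...]} (hypothetical in-memory layout).
    """
    if len(bundle) > LIMITS["entity_types_per_bundle"]:
        return "entity_types_per_bundle"
    for entities in bundle.values():
        if len(entities) > LIMITS["entities_per_type"]:
            return "entities_per_type"
        for entity in entities:
            if json_size(entity) > LIMITS["entity_bytes"]:
                return "entity_bytes"
    if json_size(bundle) > LIMITS["bundle_bytes"]:
        return "bundle_bytes"
    return None
```

Any non-None result corresponds to the case the engine reports as HTTP 413 (limit_exceeded).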
Query-API annotations (1.4.0)
Three annotations layered onto the schema control how the auto-generated MCP tools behave at query time. All are opt-in: bundles authored for 1.3.x keep working unchanged.
x-summary-fields — projection for list_<entity>_ids
When set, the list_<entity>_ids tool returns a meaningful preview instead
of bare ids. The id field is auto-included; only top-level properties are
allowed (no dot-notation).

    x-id-field: code
    x-summary-fields: [title, popularity, industry]

The tool response shape switches from {ids, total} to {items, total}, with
each item carrying the entity’s id under the key id. The engine
normalises the x-id-field’s value into the generic id key for tool
responses, so the agent sees {"id": "<value-of-x-id-field>", "title": ..., "popularity": ..., "industry": ...}
regardless of whether your x-id-field is code, slug, or anything else.
The agent decides which entities are worth a full
get_<entity>(ids=[...]) round-trip.
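The projection can be sketched in a few lines. list_ids_response is a hypothetical stand-in for the generated tool: it works on an in-memory list, reports total as the length of that list, and omits summary fields that an entity lacks (an assumption, since the page does not say how missing values are rendered).

```python
def list_ids_response(entities, id_field, summary_fields=None):
    """Build the {ids, total} or {items, total} shape described above.

    Hypothetical helper; total here is just the size of the given list.
    """
    if summary_fields is None:
        return {"ids": [e[id_field] for e in entities], "total": len(entities)}
    items = []
    for entity in entities:
        # The x-id-field value is always exposed under the generic "id" key.
        item = {"id": entity[id_field]}
        for field in summary_fields:
            if field in entity:
                item[field] = entity[field]
        items.append(item)
    return {"items": items, "total": len(entities)}


use_cases = [
    {"code": "UC-1", "title": "Churn alerts", "popularity": "high",
     "industry": "retail", "score": 3},
]
bare = list_ids_response(use_cases, "code")
preview = list_ids_response(use_cases, "code", ["title", "popularity", "industry"])
```

The score field is dropped from the preview because it is not listed in the summary fields.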
Filter and sort operators (no schema annotation needed — schema types drive validation)
| Property type | Allowed filter operators | Sort | Notes |
|---|---|---|---|
| string | equality, [in] | yes (lex) | Plain text comparison |
| string + format: date/date-time | equality, [in], range ([gte/gt/lte/lt]) | yes | Casts to timestamptz for both filter and sort |
| string + enum: [...] | equality, [in] | yes (declaration order) | Sort desc = first in declared array; sort asc = last (array_position under the hood) |
| integer, number | equality, [in], range | yes | Casts to numeric |
| boolean | equality | yes | |
| array, object | equality only (@> containment) | technically allowed but not recommended | Sort produces stringified JSON ordering; use top-level scalar fields for predicates and sort |
Range operators on a non-numeric / non-date field return 400 with a clear
message. IN-list size is capped at 500 to mirror the batch get cap. The
sort validator gates fields against x-index only — sort on an array or
object field will run (with stringified JSON ordering) rather than being
rejected, but the results are rarely useful.
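The operator table can be expressed as a lookup on the property schema. This is a sketch only: the operator name eq for plain equality is a placeholder (the page does not name it), and both helpers are hypothetical.

```python
RANGE_OPS = {"gte", "gt", "lte", "lt"}


def allowed_filter_operators(prop):
    """Return the operator names permitted for a property schema.

    "eq" is a placeholder name for plain equality (assumption).
    """
    ptype = prop.get("type")
    if ptype in ("integer", "number"):
        return {"eq", "in"} | RANGE_OPS
    if ptype == "string":
        if "enum" not in prop and prop.get("format") in ("date", "date-time"):
            return {"eq", "in"} | RANGE_OPS
        return {"eq", "in"}
    if ptype in ("boolean", "array", "object"):
        return {"eq"}
    return set()


def check_filter(prop, operator, value):
    """Return None if the filter is acceptable, else a 400-style message."""
    if operator not in allowed_filter_operators(prop):
        return f"operator '{operator}' is not allowed for this property type"
    if operator == "in" and len(value) > 500:
        return "IN list is capped at 500 values"
    return None
```

For instance, check_filter({"type": "string"}, "gte", "a") returns a message, which models the 400 for range operators on a non-numeric, non-date field.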
Sort semantics
- Sort fields must be marked x-index: true.
- Direction is asc or desc (case-insensitive). Anything else returns 400.
- Multi-field sort produces composite ordering: [{popularity:desc}, {code:asc}] sorts by popularity first, with code ascending as the tiebreak.
- Missing values appear last regardless of direction (NULLS LAST).
- Enums (important): a popularity enum [very_high, high, normal, low] sorted desc produces [very_high, high, normal, low] (declaration order, head first), not the alphabetical [very_high, normal, low, high] that PostgreSQL would emit on a descending text sort. Sorted asc, it produces the reversed [low, normal, high, very_high], last-declared first. If your enum is declared “low to high”, the conventional fix is to flip the declared order so that desc aligns with the natural “highest first” reading. The tool description tells the agent which order is which.
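The enum ordering and NULLS LAST rules can be reproduced in a few lines of Python. This is a model of the documented behaviour, not the engine's SQL; sort_by_enum is a hypothetical helper and None stands in for a missing value.

```python
def sort_by_enum(values, declared, direction):
    """Sort enum values by declaration order, with missing values last.

    desc = declaration order, head first; asc = the reverse.
    """
    direction = direction.lower()
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be asc or desc")
    position = {value: index for index, value in enumerate(declared)}
    present = [v for v in values if v is not None]
    missing = [v for v in values if v is None]
    present.sort(key=lambda v: position[v], reverse=(direction == "asc"))
    return present + missing  # NULLS LAST in both directions


declared = ["very_high", "high", "normal", "low"]
values = ["low", None, "very_high", "normal", "high"]
desc_order = sort_by_enum(values, declared, "desc")
asc_order = sort_by_enum(values, declared, "ASC")
text_desc = sorted((v for v in values if v is not None), reverse=True)
```

Comparing desc_order with text_desc shows why declaration order matters: a plain text sort puts high last.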
Worked example — full 1.4.0 schema

    $schema: https://json-schema.org/draft/2020-12/schema
    $id: use_case
    type: object

    x-id-field: code
    x-tool-expose: [list, get, list_ids]
    x-summary-fields: [title, popularity, industry]

    properties:
      code: {type: string, pattern: "^[A-Z]{2}-[A-Z0-9-]+$", x-index: true}
      title: {type: string, minLength: 3}
      industry: {type: string, x-index: true}
      popularity: {type: string, enum: [very_high, high, normal, low], x-index: true}
      score: {type: integer, x-index: true}
      created_at: {type: string, format: date-time, x-index: true}

Agents bound to this bundle get three MCP tools (one per x-tool-expose entry):
- list_use_case(filters, sort, limit, offset): full payloads
- list_use_case_ids(filters, sort, limit, offset): preview shape {items, total} with the three summary fields plus id
- get_use_case(ids[]): batch fetch, response {entities, not_found}, max 500 ids per call
If you omit list_ids from x-tool-expose, only list_<entity> and
get_<entity> are generated — the summary projection is still recorded
in the schema but no tool exposes it. The reverse is also valid: a
read-only catalog can expose ["list_ids", "get"] only.
See also
- Knowledge Graphs concept: when and why to use Knowledge Graphs
- Bundles & layouts: how to organise a bundle on disk (canonical single-file vs split-by-directory)
- Migration 1.3 → 1.4: the breaking change in get_<entity> and which 1.4.0 opt-ins to enable
- Quickstart tutorial: 15-minute walkthrough
- Hybrid pattern: combine with external MCP servers