Bundles & authoring layouts

A Knowledge Graph bundle is a directory containing a manifest, schemas, and entities. brewctl reads this directory, validates it, and posts the flattened payload to the engine in one atomic apply. The engine doesn’t see your file layout — it only sees the merged payload.

This means the layout decision is purely an authoring DX choice. Two patterns are supported as of brewctl 0.4.0 / engine 1.4.0.

Pattern A — canonical single-file layout (1.3.x compatible)

The simplest layout — one schema file and one entities file per entity type. Works for catalogs up to a few hundred entities per type.

```
my-bundle/
├── manifest.yaml
├── schemas/
│   └── category.schema.json
└── entities/
    └── categories.yaml
```
manifest.yaml

```yaml
bundle_name: my-bundle
version: 1.0.0
entity_types:
  - name: category
    schema_file: schemas/category.schema.json
    entities_file: entities/categories.yaml
```
entities/categories.yaml

```yaml
- code: footwear
  name: Footwear
- code: apparel
  name: Apparel
- code: home_goods
  name: Home Goods
```

This is the original 1.3.0 layout. Existing bundles need no changes to work with engine 1.4.0+.

Pattern B — split-by-directory layout (1.4.0+)

For larger catalogs (typically ≥100 entities per type), one giant YAML file becomes unwieldy:

  • Git diffs are noisy: finding the one entity that changed in an 8,000-line file means scrolling through the editor.
  • PR review is hard: reviewers can’t isolate a logical change.
  • Parallel PRs conflict even when they touch different entities, because both edit the same file.

The split-by-directory layout addresses this. Declare entities_path in the manifest instead of entities_file:

```
my-bundle/
├── manifest.yaml
├── schemas/
│   └── use_case.schema.json
└── entities/
    └── use_case/
        ├── industry-pm.yaml   ← array of 30 PM-industry entities
        ├── industry-fb.yaml   ← array of 25 FB-industry entities
        └── PM-WF-010.yaml     ← single entity (one document per file)
```
manifest.yaml

```yaml
bundle_name: my-bundle
version: 1.0.0
entity_types:
  - name: use_case
    schema_file: schemas/use_case.schema.json
    entities_path: entities/use_case/
```

brewctl globs all *.yaml and *.yml files in the directory (flat — no recursion into subdirectories), merges them in deterministic filename order, validates uniqueness across the merge, and posts the same atomic payload as Pattern A.
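The file-selection rule above can be sketched in a few lines of Python. select_entity_files is a hypothetical name, not part of brewctl. The sketch also assumes that "deterministic filename order" means plain code-point sorting on the bare filename. brewctl's actual collation may differ.

```python
from pathlib import Path


def select_entity_files(entities_path: str) -> list[Path]:
    """Return the files an entities_path directory would contribute.

    Hypothetical helper: only direct children ending in .yaml or .yml are
    included (no recursion into subdirectories), sorted by filename so the
    merge order is identical on every machine.
    """
    directory = Path(entities_path)
    candidates = [
        p
        for p in directory.iterdir()
        # is_file() skips subdirectories, so nested files are never read.
        if p.is_file() and p.suffix in (".yaml", ".yml")
    ]
    # Sort on the bare filename; code-point order is an assumption here.
    return sorted(candidates, key=lambda p: p.name)
```

Under code-point ordering, uppercase names sort before lowercase ones. In the example tree, PM-WF-010.yaml would therefore be read before industry-fb.yaml and industry-pm.yaml.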

What each file in the directory can contain

Each file can be either form:

Array of entities — the split-by-category / industry / shard pattern:

entities/use_case/industry-pm.yaml

```yaml
- code: PM-WF-010
  title: Water leak detection
  industry: PM
- code: PM-WF-011
  title: Toilet overflow
  industry: PM
```

Single entity object — one PR per entity pattern:

entities/use_case/PM-WF-010.yaml

```yaml
code: PM-WF-010
title: Water leak detection
industry: PM
```

Both styles can coexist in the same directory — pick what suits each group of entities best.
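After each selected file is parsed, both forms reduce to a list of entities. The sketch below normalises them and enforces uniqueness. merge_entity_documents is a hypothetical helper, and YAML parsing is left to the caller. Keying uniqueness on code is an assumption drawn from the examples on this page, not documented brewctl behaviour.

```python
from typing import Any


def merge_entity_documents(documents: list[tuple[str, Any]]) -> list[dict]:
    """Merge parsed entity files into one list, in the order given.

    `documents` holds (filename, parsed_yaml) pairs, already sorted by
    filename. Each document may be a single mapping or a list of mappings.
    Uniqueness is checked on the `code` field (an assumption).
    """
    merged: list[dict] = []
    seen: dict[str, str] = {}  # code -> file that first defined it
    for filename, doc in documents:
        if isinstance(doc, dict):
            entities = [doc]  # single-entity file
        elif isinstance(doc, list):
            entities = doc  # array-of-entities file
        else:
            raise ValueError(f"{filename}: expected a mapping or a list of mappings")
        for entity in entities:
            if not isinstance(entity, dict):
                raise ValueError(f"{filename}: list items must be mappings")
            code = entity["code"]
            if code in seen:
                raise ValueError(
                    f"duplicate code {code!r}: defined in {seen[code]} and {filename}"
                )
            seen[code] = filename
            merged.append(entity)
    return merged
```

Under that assumption, the two illustrative snippets above could not ship together. Both industry-pm.yaml and PM-WF-010.yaml define PM-WF-010, so the merge would reject the bundle.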

| Concern | Single-file (Pattern A) | Split-by-directory (Pattern B) |
|---|---|---|
| Small catalogs (<100 entities/type) | Best | Acceptable but overkill |
| Large catalogs (≥100 entities/type) | Hard to review | Best |
| Git diff readability | Noisy at scale | Per-file diffs are surgical |
| Merge conflict frequency | High when parallel work happens | Low; different files don’t conflict |
| File-system entry count | Low | Higher (one inode per file) |
| Quickstart accessibility | Easier for newcomers | Familiar once they grow |
| brewctl kg pull roundtrip | Lossless | Lossy; pull always emits Pattern A |

You may declare either entities_file or entities_path per entity type, never both. brewctl rejects bundles that set both with a clear error before any network call:

```
manifest entity_type "use_case": entities_file and entities_path are
mutually exclusive — pick one
```
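The check is purely local, so it can run on the parsed manifest before anything is sent. The sketch below is a hypothetical re-creation of the documented rule. Its message text is copied from the brewctl error quoted above. It covers only the "both keys set" case, because this page does not say what happens when neither key is set.

```python
def validate_entity_sources(manifest: dict) -> None:
    """Reject any entity_types entry that sets both entities_file and entities_path.

    Hypothetical helper operating on an already-parsed manifest.yaml.
    """
    for entry in manifest.get("entity_types", []):
        if "entities_file" in entry and "entities_path" in entry:
            # Message mirrors the documented brewctl error.
            raise ValueError(
                f'manifest entity_type "{entry.get("name")}": entities_file and '
                "entities_path are mutually exclusive — pick one"
            )
```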

Different entity types in the same manifest can use different patterns — mix freely:

```yaml
entity_types:
  - name: industry        # small, keep simple
    schema_file: schemas/industry.schema.json
    entities_file: entities/industries.yaml
  - name: use_case        # large, split
    schema_file: schemas/use_case.schema.json
    entities_path: entities/use_case/
```

Pull behaviour — lossy on roundtrip (by design)

brewctl kg pull always emits the canonical single-file layout (Pattern A), even for bundles that were authored via entities_path. The engine does not record which file each entity came from, so reconstructing the split would mean guessing — and guessing produces a misleading roundtrip.

If you maintain a split layout in source, treat pull as a backup / inspection tool, not a roundtrip authoring tool. Apply with entities_path; if you later need to reassemble a split layout from a pull, write a small local script that re-splits by your chosen field (industry, category, etc.).
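A re-split script of the kind described above can stay very small. The sketch makes three assumptions. First, the pulled Pattern A file has already been parsed into a list of mappings (for example with PyYAML's yaml.safe_load). Second, each resulting group will be written back with something like yaml.safe_dump. Third, the resplit_by_field helper and its <field>-<value>.yaml file naming are hypothetical, borrowed from the industry-pm.yaml example.

```python
from collections import defaultdict


def resplit_by_field(entities: list[dict], field: str) -> dict[str, list[dict]]:
    """Group pulled entities into per-file arrays keyed on `field`.

    Hypothetical helper: returns {filename: [entities...]}, preserving the
    pulled order within each file. Entities missing the field raise instead
    of being silently dropped.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for entity in entities:
        if field not in entity:
            raise KeyError(f"entity {entity.get('code')!r} has no {field!r} field")
        filename = f"{field}-{str(entity[field]).lower()}.yaml"
        groups[filename].append(entity)
    return dict(groups)
```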

A future release may add a split_by_field manifest option that makes pull preserve a deterministic split layout. We are deferring that until a concrete customer use case requires it — the lossy-but-honest default is the safer baseline.

When to migrate from Pattern A to Pattern B

A rough rule of thumb:

  • <100 entities per type — stay on Pattern A. The flat file is small enough that the operational complexity of a directory isn’t worth it.
  • 100–500 entities per type — split is helpful but not urgent. If parallel-PR conflicts on the entities file are noticeable, migrate.
  • ≥500 entities per type — split. The diff hell is real and the re-organisation pays for itself within a couple of weeks.
  • Per-entity PR review (each entity gets its own approval / commit) — Pattern B with one document per file is the only sensible layout.
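For scripted checks, the same rule of thumb can be restated as a hypothetical helper. The thresholds come straight from the list, with 500 itself falling in the "split" band.

```python
def recommend_layout(entities_per_type: int, per_entity_review: bool = False) -> str:
    """Restate the rule-of-thumb thresholds above (hypothetical helper)."""
    if per_entity_review:
        return "B (one document per file)"
    if entities_per_type < 100:
        return "A"
    if entities_per_type < 500:
        # Helpful but not urgent: migrate if parallel-PR conflicts show up.
        return "A or B (migrate if parallel-PR conflicts are noticeable)"
    return "B"
```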

To migrate, split your existing entities/<type>.yaml into multiple files inside entities/<type>/, then update the manifest:

Before (Pattern A):

```yaml
- name: use_case
  schema_file: schemas/use_case.schema.json
  entities_file: entities/use_cases.yaml
```

After (Pattern B):

```yaml
- name: use_case
  schema_file: schemas/use_case.schema.json
  entities_path: entities/use_case/
```

brewctl kg validate ./bundle confirms the migration before any network call.
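The split step itself can be scripted. The sketch below writes one document per file, named after each entity's code, following the PM-WF-010.yaml example. write_one_per_file is a hypothetical helper. It assumes codes are safe to use as filenames, and it takes the YAML serialiser as a parameter so that it has no third-party dependency.

```python
from pathlib import Path
from typing import Any, Callable, TextIO


def write_one_per_file(
    entities: list[dict],
    out_dir: "str | Path",
    dump: Callable[[Any, TextIO], None],
) -> list[Path]:
    """Write each entity to <out_dir>/<code>.yaml, one document per file.

    Hypothetical helper. `dump(obj, stream)` is whatever YAML serialiser you use.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for entity in entities:
        path = target / f"{entity['code']}.yaml"
        if path.exists():
            # Never overwrite: a clash means a duplicate code or a stale file.
            raise FileExistsError(f"{path} already exists")
        with path.open("w", encoding="utf-8") as fh:
            dump(entity, fh)
        written.append(path)
    return written
```

With PyYAML, for example, load the old entities/use_cases.yaml with yaml.safe_load. Then pass lambda obj, fh: yaml.safe_dump(obj, fh, sort_keys=False) as dump, so each file keeps the field order of the original entity.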