Bundles & authoring layouts

A Knowledge Graph bundle is a directory containing a manifest, schemas, and entities. brewctl reads this directory, validates it, and posts the flattened payload to the engine in one atomic apply. The engine doesn’t see your file layout — it only sees the merged payload.

This means the choice of layout affects only the authoring experience, not what the engine receives. Two patterns are supported as of brewctl 0.4.0 / engine 1.4.0.

Pattern A — canonical single-file layout (1.3.x compatible)

This is the simplest layout: one schema file and one entities file per entity type. It works for catalogs of up to a few hundred entities per type.

my-bundle/
├── manifest.yaml
├── schemas/
│   └── category.schema.json
└── entities/
    └── categories.yaml

# manifest.yaml
bundle_name: my-bundle
version: 1.0.0
entity_types:
  - name: category
    schema_file: schemas/category.schema.json
    entities_file: entities/categories.yaml

# entities/categories.yaml
- code: footwear
  name: Footwear
- code: apparel
  name: Apparel
- code: home_goods
  name: Home Goods

This is the original 1.3.0 layout. Existing bundles need no changes to work with engine 1.4.0+.

Pattern B — split-by-directory layout (1.4.0+)

For larger catalogs (typically ≥100 entities per type), one giant YAML file becomes unwieldy:

Git diffs are noisy: finding the one entity that changed in an 8,000-line file means scrolling through the editor.
PR review is hard: reviewers can’t isolate a logical change.
Parallel PRs conflict even when they touch different entities, because both PRs edit the same file.

The split-by-directory layout addresses this. Declare entities_path in the manifest instead of entities_file:

my-bundle/
├── manifest.yaml
├── schemas/
│   └── use_case.schema.json
└── entities/
    └── use_case/
        ├── industry-pm.yaml         ← array of 30 PM-industry entities
        ├── industry-fb.yaml         ← array of 25 FB-industry entities
        └── PM-WF-010.yaml           ← single entity (one document per file)

# manifest.yaml
bundle_name: my-bundle
version: 1.0.0
entity_types:
  - name: use_case
    schema_file: schemas/use_case.schema.json
    entities_path: entities/use_case/

brewctl globs all *.yaml and *.yml files in the directory (flat — no recursion into subdirectories), merges them in deterministic filename order, validates uniqueness across the merge, and posts the same atomic payload as Pattern A.
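The merge step described above can be sketched as follows. This is an illustration of the documented behaviour, not brewctl's actual implementation. It assumes that "deterministic filename order" means a plain lexicographic sort and that uniqueness is checked on the `code` field; neither detail is confirmed here. The function name `merge_entity_files` is hypothetical, and the input is a mapping of file names to already-parsed YAML content so that the sketch needs no YAML library.

```python
from pathlib import PurePosixPath


def merge_entity_files(parsed_files):
    """Merge per-file documents into one flat list of entities.

    parsed_files maps a file name (relative to the entities_path
    directory) to its already-parsed YAML content.
    """
    merged = []
    seen = {}  # code -> file name where that code was first defined
    # Assumption: "deterministic filename order" is a lexicographic sort.
    for name in sorted(parsed_files):
        path = PurePosixPath(name)
        # Flat glob: ignore subdirectories and non-YAML extensions.
        if len(path.parts) != 1 or path.suffix not in (".yaml", ".yml"):
            continue
        doc = parsed_files[name]
        # A file holds either an array of entities or a single entity object.
        entities = doc if isinstance(doc, list) else [doc]
        for entity in entities:
            code = entity["code"]  # Assumption: `code` is the unique key.
            if code in seen:
                raise ValueError(
                    f"duplicate code {code!r} in {name} "
                    f"(first defined in {seen[code]})"
                )
            seen[code] = name
            merged.append(entity)
    return merged
```

Because the result depends only on file names and contents, the same directory always produces the same payload, regardless of the order in which the file system lists the files.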

What each file in the directory can contain

Each file can take either of two forms.

Array of entities (the split-by-category / industry / shard pattern):

- code: PM-WF-010
  title: Water leak detection
  industry: PM
- code: PM-WF-011
  title: Toilet overflow
  industry: PM

Single entity object (the one-PR-per-entity pattern):

code: PM-WF-010
title: Water leak detection
industry: PM

Both styles can coexist in the same directory — pick what suits each group of entities best.

Trade-offs at a glance

| Concern | Single-file (Pattern A) | Split-by-directory (Pattern B) |
|---|---|---|
| Small catalogs (<100 entities/type) | Best | Acceptable but overkill |
| Large catalogs (≥100 entities/type) | Hard to review | Best |
| Git diff readability | Noisy at scale | Per-file diffs are surgical |
| Merge conflict frequency | High when parallel work happens | Low: different files don’t conflict |
| File-system entry count | Low | Higher (one inode per file) |
| Quickstart accessibility | Easier for newcomers | Familiar once the catalog grows |
| `brewctl kg pull` roundtrip | Lossless | Lossy: pull always emits Pattern A |

Mutual exclusion

Each entity type may declare either entities_file or entities_path, never both. If a bundle sets both for the same entity type, brewctl rejects it with a clear error before making any network call:

manifest entity_type "use_case": entities_file and entities_path are
mutually exclusive — pick one

Different entity types in the same manifest can use different patterns — mix freely:

entity_types:
  - name: industry         # small, keep simple
    schema_file: schemas/industry.schema.json
    entities_file: entities/industries.yaml

  - name: use_case         # large, split
    schema_file: schemas/use_case.schema.json
    entities_path: entities/use_case/
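A minimal sketch of the per-entity-type check, assuming the manifest has already been parsed into a dictionary. The function name `check_entity_sources` is hypothetical, and the error text is only modeled on the message shown above; brewctl's own validation may differ. The sketch checks only the "both set" case, since that is the only rule stated here.

```python
def check_entity_sources(manifest):
    """Return one error string per entity type that sets both keys."""
    errors = []
    for entity_type in manifest.get("entity_types", []):
        name = entity_type.get("name", "<unnamed>")
        has_file = "entities_file" in entity_type
        has_path = "entities_path" in entity_type
        # The rule applies per entity type, so other types may still
        # use either pattern freely.
        if has_file and has_path:
            errors.append(
                f'manifest entity_type "{name}": entities_file and '
                "entities_path are mutually exclusive"
            )
    return errors
```

Applied to the mixed manifest above, this returns an empty list, because each entity type declares exactly one of the two keys.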

Pull behaviour — lossy on roundtrip (by design)

brewctl kg pull always emits the canonical single-file layout (Pattern A), even for bundles that were authored via entities_path. The engine does not record which file each entity came from, so reconstructing the split would mean guessing — and guessing produces a misleading roundtrip.

If you maintain a split layout in source, treat pull as a backup / inspection tool, not a roundtrip authoring tool. Apply with entities_path; if you later need to reassemble a split layout from a pull, write a small local script that re-splits by your chosen field (industry, category, etc.).
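The re-split script mentioned above could start from something like the following sketch. It groups a flat list of pulled entities by one field and returns a mapping of target file names to entity lists. The function name `resplit_by_field`, the `prefix` parameter, and the lowercase file-naming scheme are all illustrative assumptions, chosen to match the `industry-pm.yaml` example earlier on this page. Reading the pulled file and writing each group back out is left to a YAML library of your choice.

```python
from collections import defaultdict


def resplit_by_field(entities, field, prefix=""):
    """Group a flat entity list into {file name: [entities]} by one field."""
    groups = defaultdict(list)
    for entity in entities:
        # Assumption: file names are the lowercased field value plus
        # an optional prefix, e.g. industry "PM" -> "industry-pm.yaml".
        value = str(entity[field]).lower()
        groups[f"{prefix}{value}.yaml"].append(entity)
    return dict(groups)
```

Entities keep their pulled order within each group, so re-running the script on an unchanged pull produces the same files.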

A future release may add a split_by_field manifest option that makes pull preserve a deterministic split layout. We are deferring that until a concrete customer use case requires it — the lossy-but-honest default is the safer baseline.

When to migrate from Pattern A to Pattern B

A rough rule of thumb:

<100 entities per type: stay on Pattern A. The flat file is small enough that the operational complexity of a directory isn’t worth it.
100–500 entities per type: split is helpful but not urgent. If parallel-PR conflicts on the entities file are noticeable, migrate.
>500 entities per type: split. The diff hell is real and the re-organisation pays for itself within a couple of weeks.
Per-entity PR review (each entity gets its own approval / commit): Pattern B with one document per file is the only sensible layout.

To migrate, split your existing entities/<type>.yaml into multiple files inside entities/<type>/, then update the manifest:

# Before (Pattern A)
- name: use_case
  schema_file: schemas/use_case.schema.json
  entities_file: entities/use_cases.yaml

# After (Pattern B)
- name: use_case
  schema_file: schemas/use_case.schema.json
  entities_path: entities/use_case/

Run brewctl kg validate ./bundle to confirm the migrated bundle is valid before making any network call.