PSP-005 · Reference case study · Substrate self-validation

Twenty-eight subjects, one axis.

A federal program office asks: across a portfolio of dozens of fine-tunes of a single foundation model, which variants are doing distinct work, where is the fleet saturating, and which one fails everything? This case study answers that question end-to-end against a real 28-subject fleet ingested through the substrate's csv loader.

Companion to PSP-004 (Bibles). Same substrate, same attestation shape, different fleet — 28 subjects instead of 6, csv ingest instead of the native ingest format, three customer-defined categories instead of five. The substrate is fleet- and format-agnostic by design; these two case studies are the proof.

The fleet measured here is the v0.2 reference batch from Backstaff — the first vertical product shipped on the Planisphere substrate. Every Backstaff delivery includes a bundle in the same shape as this one.

01 · The fleet.

28 subjects · 3 customer-defined categories · csv ingest

Twenty-eight LoRA-style adapters fine-tuned on creator-voice corpora. Each adapter targets a single distinctive author. The fleet is a stand-in for any program office's portfolio of single-base-model fine-tunes; substitute your variants for the names below.

Category	Definition
`cat_beats_base`	`PASS` if win-rate vs. base ≥ 0.50; `PARTIAL` ≥ 0.20; else `FAIL`
`cat_beats_system`	`PASS` if win-rate vs. system-prompted base ≥ 0.50; `PARTIAL` ≥ 0.20; else `FAIL`
`cat_voice_coherence`	`PASS` if absolute win-rate ≥ 0.90; `PARTIAL` ≥ 0.70; else `FAIL`

Three categories, trinary grades, ingested through the substrate's existing csv loader. No new format. No substrate modifications. The categories are customer-defined and arbitrary — the substrate runs SVD over whatever dimensionality the input provides.
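The grading rule in the table can be sketched in a few lines. This is a minimal illustration, not the substrate's own code: the names `THRESHOLDS`, `SCORE`, `grade`, and `grade_vector` are hypothetical, and the numeric encoding (PASS = 1.0, PARTIAL = 0.5, FAIL = 0.0) is inferred from the grade vectors in §03 rather than stated anywhere in the loader's documentation.

```python
# Hypothetical sketch of the trinary grading rule; all names are illustrative.
# Each entry is (PASS threshold, PARTIAL threshold), copied from the category table.
THRESHOLDS = {
    "cat_beats_base": (0.50, 0.20),
    "cat_beats_system": (0.50, 0.20),
    "cat_voice_coherence": (0.90, 0.70),
}

# Assumed numeric encoding, inferred from the grade vectors in section 03.
SCORE = {"PASS": 1.0, "PARTIAL": 0.5, "FAIL": 0.0}


def grade(category: str, win_rate: float) -> str:
    """Return PASS, PARTIAL, or FAIL for one category's win-rate."""
    pass_at, partial_at = THRESHOLDS[category]
    if win_rate >= pass_at:
        return "PASS"
    if win_rate >= partial_at:
        return "PARTIAL"
    return "FAIL"


def grade_vector(win_rates: dict) -> list:
    """Map per-category win-rates to a [base, system, voice] grade vector."""
    return [SCORE[grade(cat, win_rates[cat])] for cat in THRESHOLDS]
```

Keeping the thresholds in data rather than in branches is one way to let a customer redefine categories without touching the grading logic.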

The 28.

Each name is a creator-voice corpus that produced one fine-tuned adapter in the v0.2 reference batch. The number beside each name is the cluster index from §03: 01 is the saturated bucket (full PASS across all three categories); 07 is the catastrophic failure mode. Outliers (clusters 06 and 07) are labelled.

angel · cluster 02
bohr · cluster 01 · centroid
breedlove · cluster 03
campbell · cluster 03
cross · cluster 04
dalio · cluster 02
davinci · cluster 01
einstein · cluster 01
feynman · cluster 01
flw · cluster 01
ford · cluster 07 · catastrophic
franklin · cluster 01
godel · cluster 06 · near-failure
harvey · cluster 01
hermes · cluster 02
jesus · cluster 01
jobs · cluster 05
jung · cluster 01
kanye · cluster 01
mike · cluster 01
musk · cluster 01
oppenheimer · cluster 01
peterson · cluster 01
socrates · cluster 01
tesla · cluster 02
theo · cluster 01
thiel · cluster 02
virgil · cluster 01

Saturated cluster members (17 of 28) live in a single behavioral bucket, and any one of them substitutes for the others on the measured categories. Distinct profiles (clusters 02–05) and outliers (06–07) carry the fleet's actual variation.

02 · What the substrate resolved.

End-to-end runtime · Under one second on a laptop

Finding 01 · Saturation

17 / 28

Subjects scoring full PASS across all three categories.

Sixty-one percent of the fleet lands in a single [1.0, 1.0, 1.0] bucket. The probes are not currently discriminative against the upper tier. Either the fleet is genuinely uniform on these axes or the evaluation needs harder probes. The substrate names the saturated regime as a deliverable.

Finding 02 · Distinct profiles

7

Behavioral profiles across 28 subjects.

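Out of 27 possible grade vectors on a three-category trinary scale (3 × 3 × 3), only seven are populated. Keeping one representative per profile leaves seven representatives and twenty-one dedupe candidates (28 − 7), all of them in the three multi-member clusters: 16 + 4 + 1. Consolidation evidence in one number.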
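The counts behind Findings 01 and 02 are plain arithmetic over the grade vectors. The sketch below recomputes them from the populations listed in the §03 cluster table; the variable names are illustrative and this is not the substrate's implementation.

```python
from collections import Counter
from itertools import product

# (grade vector [base, system, voice], population), copied from the section 03 table.
POPULATIONS = [
    ((1.0, 1.0, 1.0), 17),
    ((1.0, 1.0, 0.5), 5),
    ((1.0, 0.5, 0.5), 2),
    ((0.5, 1.0, 0.5), 1),
    ((0.5, 0.5, 0.5), 1),
    ((0.5, 0.0, 0.0), 1),
    ((0.0, 0.0, 0.0), 1),
]

# Expand to one grade vector per subject.
fleet = [vec for vec, n in POPULATIONS for _ in range(n)]

# Every grade vector a three-category trinary scale can produce.
possible = len(list(product((0.0, 0.5, 1.0), repeat=3)))

profiles = Counter(fleet)
distinct = len(profiles)

# Everyone except one representative per populated profile.
dedupe_candidates = len(fleet) - distinct

# Share of the fleet in the full-PASS bucket.
saturated_share = profiles[(1.0, 1.0, 1.0)] / len(fleet)

print(possible, distinct, dedupe_candidates, round(saturated_share, 3))
# prints: 27 7 21 0.607
```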

Finding 03 · Dominant axis

83.6%

Variance explained on PC1 — overall capability.

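PC1 loads −0.45, −0.60, −0.66 across the three categories: comparable in magnitude and all of one sign. The dominant axis is therefore not category-specific; it separates capable from not capable. Because every loading is negative, stronger subjects score lower on PC1 and weaker subjects score higher. The fleet's variation is close to one-dimensional at this resolution.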

Finding 04 · Catastrophic outlier

1

Subject failing every category.

ford sits at [0.0, 0.0, 0.0]. Projected onto PC1 at +3.86 — more than ten standard deviations from the fleet centroid. Fleet-level fail mode; targeted re-training candidate. The outlier names itself.

Variance attribution across principal components

Component	Variance explained
PC 1	83.6%
PC 2	10.0%
PC 3	6.4%

PC2 (10.0%) loads +0.52, +0.42, −0.74 — a voice-coherence trade-off axis. Subjects strong on competition wins but weak on absolute voice coherence sit at the positive PC2 end; the inverse at the negative end. The third component is residual.

Near-rank-1 variation in a three-category space, plus a clean secondary trade-off axis. Together, PC1 and PC2 capture 93.6% (83.6% + 10.0%) of inter-subject variation in two numbers per subject.
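The variance split can be reproduced independently from the published grade vectors. The sketch below is one way to do it under stated assumptions: it mean-centers the 28 vectors, builds the 3 × 3 covariance matrix, and extracts components by power iteration with deflation using only the standard library. The substrate itself is described as running SVD, so this is a stand-in technique, and the helper name `top_eigenpair` is hypothetical. The sign of each loading vector is arbitrary in any such decomposition, so only magnitudes and relative signs are comparable to the figures quoted above.

```python
import math

# (grade vector [base, system, voice], population), copied from the section 03 table.
POPULATIONS = [
    ((1.0, 1.0, 1.0), 17), ((1.0, 1.0, 0.5), 5), ((1.0, 0.5, 0.5), 2),
    ((0.5, 1.0, 0.5), 1), ((0.5, 0.5, 0.5), 1), ((0.5, 0.0, 0.0), 1),
    ((0.0, 0.0, 0.0), 1),
]
rows = [list(vec) for vec, n in POPULATIONS for _ in range(n)]
n, dims = len(rows), 3

# Mean-center each category, then form the covariance matrix.
means = [sum(r[j] for r in rows) / n for j in range(dims)]
centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(dims)]
       for i in range(dims)]


def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector (symmetric PSD input)."""
    size = len(matrix)
    vec = [1.0] * size
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector.
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(size) for j in range(size))
    return value, vec


total = sum(cov[i][i] for i in range(dims))  # total variance = trace
ratios, loadings = [], []
work = [row[:] for row in cov]
for _ in range(dims):
    value, vec = top_eigenpair(work)
    ratios.append(value / total)
    loadings.append(vec)
    # Deflate: remove the component just found before extracting the next one.
    work = [[work[i][j] - value * vec[i] * vec[j] for j in range(dims)]
            for i in range(dims)]

print([round(r, 3) for r in ratios])
```

The explained-variance ratios do not depend on whether the covariance is divided by n or n − 1, since the factor cancels against the trace.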

03 · Cluster map.

7 distinct profiles · sorted by population · all members named

Chart: subjects per grade vector, largest first (17, 5, 2, 1, 1, 1, 1). The same values appear in the table that follows.

Cluster	Grade vector `[base, system, voice]`	n	Members
01	`[1.0, 1.0, 1.0]`	17	bohr (centroid), davinci, einstein, feynman, flw, franklin, harvey, jesus, jung, kanye, mike, musk, oppenheimer, peterson, socrates, theo, virgil
02	`[1.0, 1.0, 0.5]`	5	angel, dalio, hermes, tesla, thiel
03	`[1.0, 0.5, 0.5]`	2	breedlove, campbell
04	`[0.5, 1.0, 0.5]`	1	cross
05	`[0.5, 0.5, 0.5]`	1	jobs
06	`[0.5, 0.0, 0.0]`	1	godel · near-failure
07	`[0.0, 0.0, 0.0]`	1	ford · catastrophic

The substrate-selected centroid is bohr: the highest-norm grade vector, used as the anchor for cosine similarity. Sixteen other subjects share the same grade vector; under deterministic tie-breaking, the centroid is the lexicographically first name among them.
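The selection rule described above (highest norm, then lexicographic order on the name) is small enough to sketch directly. The function names `select_centroid` and `cosine` are hypothetical, and returning 0.0 for the all-zero vector is an assumed convention, since cosine similarity is undefined there; the substrate's actual handling is not documented here.

```python
import math

# Cluster membership copied from the section 03 table.
MEMBERS = {
    (1.0, 1.0, 1.0): ["bohr", "davinci", "einstein", "feynman", "flw", "franklin",
                      "harvey", "jesus", "jung", "kanye", "mike", "musk",
                      "oppenheimer", "peterson", "socrates", "theo", "virgil"],
    (1.0, 1.0, 0.5): ["angel", "dalio", "hermes", "tesla", "thiel"],
    (1.0, 0.5, 0.5): ["breedlove", "campbell"],
    (0.5, 1.0, 0.5): ["cross"],
    (0.5, 0.5, 0.5): ["jobs"],
    (0.5, 0.0, 0.0): ["godel"],
    (0.0, 0.0, 0.0): ["ford"],
}
fleet = {name: vec for vec, names in MEMBERS.items() for name in names}


def select_centroid(subjects):
    """Highest-norm grade vector wins; ties break on the subject name."""
    return min(subjects, key=lambda name: (-math.hypot(*subjects[name]), name))


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros (assumed convention)."""
    norm_a, norm_b = math.hypot(*a), math.hypot(*b)
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


centroid = select_centroid(fleet)
print(centroid)
# prints: bohr
```

Sorting on the pair (negated norm, name) makes the result independent of input order, which is the property a deterministic tie-break needs.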

04 · The attestation.

Independently recomputable · Tamper-evident

Fleet sha256	a91516d3e14835d21c0a7f32eac9d591b265a4139bd06863c96d31e8ecb6e5ca
Attestation root	408a536d9e18f09a8236a744e7c1ae5318b5115fc13a64460f610eddb7964e9a
Kernel sha	Embedded in ATTESTATION.json
Substrate version	planisphere 0.2.0
Format ingested	csv · existing loader · no substrate modifications
Determinism property	Bit-identical canonical artifacts across runs for identical inputs and pinned code
Tamper detection	Single-byte mutation defeats verification
Runtime	< 1 second for N = 28 on Contractor laptop

psp › measure <fleet>
[ok] resolving subjects ······························· 28
[ok] discovering categories ··························· 3
[ok] projecting onto plane ····························· ✓
[ok] variance explained on PC1 ························· 0.836
[ok] distinct profiles ································· 7
[ok] attestation root ·································· 408a536d···b7964e9a
psp › verify <bundle>
{ "verified": true, "root_match": true, "artifact_mismatches": [] }

Any party in possession of the same inputs and the same pinned analysis code can recompute every byte of the canonical artifacts and verify the attestation root independently. Tampering with any artifact defeats verification.
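To make the tamper-evidence property concrete, here is a minimal sketch of a hash-tree root over a set of artifacts. It is an illustration under assumptions, not the scheme `psp` uses: the leaf encoding, the sort order, the duplicate-last-node rule for odd levels, and the function names are all hypothetical, and the bundle's real layout is defined by ATTESTATION.json rather than by this code.

```python
import hashlib


def merkle_root(artifacts: dict) -> str:
    """Fold per-artifact leaf hashes (sorted by name) into one hex root."""
    if not artifacts:
        raise ValueError("no artifacts to attest")
    # Assumed leaf encoding: name, NUL separator, SHA-256 of the content.
    # Binding the name means a renamed artifact also changes the root.
    level = [
        hashlib.sha256(name.encode("utf-8") + b"\x00"
                       + hashlib.sha256(blob).digest()).digest()
        for name, blob in sorted(artifacts.items())
    ]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


def verify(artifacts: dict, expected_root: str) -> bool:
    """Recompute the root from the artifacts and compare with the attested value."""
    return merkle_root(artifacts) == expected_root


# Hypothetical artifact names and contents, for illustration only.
bundle = {"clusters.json": b"{}", "components.json": b"[]", "fleet.csv": b"name,grade\n"}
root = merkle_root(bundle)

# Flip a single byte in one artifact and verification fails.
tampered = dict(bundle)
tampered["components.json"] = b"[ "
print(verify(bundle, root), verify(tampered, root))
# prints: True False
```

Because the root depends only on artifact names and bytes, any party holding the same canonical artifacts can recompute it without trusting the producer.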

05 · For a federal reader.

Substituting your portfolio for ours

PSP-004 (Bibles) demonstrated the substrate on a small fleet with a rich five-category evaluation. PSP-005 demonstrates the same substrate at scale on a different fleet shape — twenty-eight subjects, three customer-defined categories, csv input. Together the two case studies answer five governance questions a procurement office actually asks, with attestable evidence:

Consolidation: how many distinct behavioral profiles live in a portfolio of N fine-tunes (PSP-005: 7 of 28 — 21 dedupe candidates)
Saturation: which evaluation categories are no longer discriminative against the upper tier of the fleet (PSP-005: 61% of subjects converge to the saturated bucket)
Investment direction: which dimension explains most of the inter-subject variation (PSP-005: a single capability axis at 83.6%; PSP-004: null-handling at 70.1%)
Targeted remediation: which subject is the catastrophic outlier and what is the fail signature (PSP-005: ford at [0,0,0]; PSP-004: meroitic on schema transfer)
Audit-ready evidence: all of the above as a sha-pinned, Merkle-rooted, NIST AI RMF-mapped bundle, admissible under IG review

Fleet-, format-, and domain-agnostic. The csv ingest of a creator-voice fleet runs the same kernel as the native-format ingest of a cipher-adapter fleet. The substrate makes no assumptions about what a subject is.
