PSP-004 · Reference case study · Substrate self-validation

The Bibles fleet, projected.

A federal program office asks: given a fleet of fine-tuned variants, which are doing distinct work, which are redundant, and what dimension differentiates them most? This case study answers that question end-to-end against a real 6-subject fleet.

Internal substrate validation, not a customer deployment. The fleet measured is the Planisphere team's own reference corpus. A reader should substitute their fleet mentally — the substrate is fleet-agnostic; the pipeline that produced this bundle is the same pipeline a Phase I pilot would execute.

01 ·

The fleet.

6 subjects · 5 evaluation categories

Subject	Corpus tile
`Bible-earth-codebook-v1`	Earth-codebook · designed sign-system · covenant
`Bible-etruscan-v1`	Etruscan · pre-Roman Italic · cognate-failure
`Bible-hermetica-v1`	Hermetic Corpus · Greek/Coptic · revelation
`Bible-linear-a-v1`	Linear A · undeciphered Aegean · structure-without-key
`Bible-meroitic-v1`	Meroitic · Kushite · partial-key
`Bible-shotokan-v1`	Shotokan kata · Funakoshi 1922 · embodiment

Categories scored PASS / PARTIAL / FAIL, encoded as 1.0 / 0.5 / 0.0 in the grade vectors of the cluster map: null-handling, recall, cross-tile transfer, router-swap robustness, schema transfer.

02 ·

What the substrate resolved.

End-to-end runtime · Under one second on a laptop

Finding 01 · Redundancy

5 / 6

Distinct behavioral profiles in a 6-subject fleet.

Bible-hermetica-v1 and Bible-linear-a-v1 share byte-identical grade vectors. One is a dedupe candidate; the fleet loses no behavioral coverage by retiring it.
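The redundancy check can be sketched as a grouping of subjects by grade vector. This is a minimal illustration, not the substrate's own code: the vectors are copied from the cluster map in section 03, and the helper names `group_by_profile` and `dedupe_candidates` are hypothetical.

```python
from collections import defaultdict

# Grade vectors from the cluster map: [null, recall, cross, router, schema].
GRADES = {
    "Bible-earth-codebook-v1": (0.0, 1.0, 1.0, 0.5, 0.5),
    "Bible-etruscan-v1":       (1.0, 1.0, 1.0, 0.5, 0.5),
    "Bible-hermetica-v1":      (0.5, 1.0, 1.0, 0.5, 0.5),
    "Bible-linear-a-v1":       (0.5, 1.0, 1.0, 0.5, 0.5),
    "Bible-meroitic-v1":       (1.0, 1.0, 1.0, 0.5, 0.0),
    "Bible-shotokan-v1":       (0.5, 0.5, 1.0, 0.5, 0.5),
}


def group_by_profile(grades):
    """Map each distinct grade vector to the sorted subjects that share it."""
    groups = defaultdict(list)
    for subject, vector in grades.items():
        groups[vector].append(subject)
    return {vector: sorted(names) for vector, names in groups.items()}


def dedupe_candidates(grades):
    """Return only the groups that contain more than one subject."""
    return [names for names in group_by_profile(grades).values() if len(names) > 1]


profiles = group_by_profile(GRADES)
print(f"{len(profiles)} / {len(GRADES)} distinct profiles")
for names in dedupe_candidates(GRADES):
    print("redundant group:", ", ".join(names))
```

Exact tuple equality is enough here because the grades take only three discrete values; a fleet scored on a continuous scale would need a tolerance or a distance threshold instead.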

Finding 02 · Effective rank

3 independent axes of variation across 5 evaluation dimensions.

Two evaluation categories (cross-tile, router-swap) are currently saturated — not discriminative across this fleet. Either the fleet is uniform on those axes or the probes are not stressing them.
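One way to make the saturation and rank claims concrete is sketched below. It assumes that "saturated" means every subject received the same grade in a category, and that "effective rank" means the rank of the mean-centered grade matrix; the source does not spell out either definition, and the function names are hypothetical.

```python
from fractions import Fraction

CATEGORIES = ["null", "recall", "cross", "router", "schema"]

# One row per subject, values copied from the cluster map.
ROWS = [
    [0.0, 1.0, 1.0, 0.5, 0.5],  # Bible-earth-codebook-v1
    [1.0, 1.0, 1.0, 0.5, 0.5],  # Bible-etruscan-v1
    [0.5, 1.0, 1.0, 0.5, 0.5],  # Bible-hermetica-v1
    [0.5, 1.0, 1.0, 0.5, 0.5],  # Bible-linear-a-v1
    [1.0, 1.0, 1.0, 0.5, 0.0],  # Bible-meroitic-v1
    [0.5, 0.5, 1.0, 0.5, 0.5],  # Bible-shotokan-v1
]


def saturated_axes(rows, names):
    """Categories in which every subject has the same grade."""
    return [names[j] for j in range(len(names))
            if len({row[j] for row in rows}) == 1]


def effective_rank(rows):
    """Exact rank of the column-centered matrix via Gaussian elimination."""
    n, m = len(rows), len(rows[0])
    exact = [[Fraction(str(x)) for x in row] for row in rows]
    means = [sum(r[j] for r in exact) / n for j in range(m)]
    a = [[r[j] - means[j] for j in range(m)] for r in exact]
    rank = 0
    for col in range(m):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((i for i in range(rank, n) if a[i][col] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, n):
            factor = a[i][col] / a[rank][col]
            a[i] = [x - factor * y for x, y in zip(a[i], a[rank])]
        rank += 1
    return rank


print("saturated:", saturated_axes(ROWS, CATEGORIES))
print("effective rank:", effective_rank(ROWS))
```

Rational arithmetic keeps the rank exact, so no numerical tolerance has to be chosen for a matrix this small.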

Finding 03 · Dominant axis

70.1%

Variance explained on PC1 — null-handling.

PC1 loads +0.935 on null-handling and −0.342 on schema-transfer. The dimension on which the fleet most varies is null-handling. Targeted investment here standardizes the fleet fastest.

Finding 04 · Outlier

One subject failing where all others are PARTIAL.

Bible-meroitic-v1 is the sole FAIL on schema-transfer. Every other subject is PARTIAL. Targeted re-training candidate; the rest of the fleet is sound on this axis.
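The outlier rule described here can be sketched as a per-category scan. The criterion below is an assumption read off the finding: exactly one subject is FAIL (0.0) and every other subject shares a single grade. The name `sole_failures` is hypothetical.

```python
CATEGORIES = ["null", "recall", "cross", "router", "schema"]
FAIL = 0.0

# Grade vectors from the cluster map: [null, recall, cross, router, schema].
GRADES = {
    "Bible-earth-codebook-v1": (0.0, 1.0, 1.0, 0.5, 0.5),
    "Bible-etruscan-v1":       (1.0, 1.0, 1.0, 0.5, 0.5),
    "Bible-hermetica-v1":      (0.5, 1.0, 1.0, 0.5, 0.5),
    "Bible-linear-a-v1":       (0.5, 1.0, 1.0, 0.5, 0.5),
    "Bible-meroitic-v1":       (1.0, 1.0, 1.0, 0.5, 0.0),
    "Bible-shotokan-v1":       (0.5, 0.5, 1.0, 0.5, 0.5),
}


def sole_failures(grades, categories):
    """Return (category, subject, consensus_grade) wherever exactly one
    subject is FAIL and all remaining subjects share one grade."""
    hits = []
    for j, category in enumerate(categories):
        failing = [s for s, vec in grades.items() if vec[j] == FAIL]
        others = {vec[j] for vec in grades.values() if vec[j] != FAIL}
        if len(failing) == 1 and len(others) == 1:
            hits.append((category, failing[0], others.pop()))
    return hits


for category, subject, consensus in sole_failures(GRADES, CATEGORIES):
    print(f"{subject} is the sole FAIL on {category}; all others score {consensus}")
```

Requiring the rest of the fleet to agree is what separates this finding from null-handling, where one subject also fails but the remaining grades are mixed.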

Variance attribution across principal components

Component	Variance explained
PC 1	70.1%
PC 2	18.8%
PC 3	11.1%
PC 4	0.0%
PC 5	0.0%

Rank-3 fleet behavior in a 5-dimensional evaluation space. Two of the five evaluation categories are saturated against this fleet, so PC 4 and PC 5 carry no variance. The substrate names the saturated axes as a deliverable, not a footnote.
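The variance attribution can be approximated with a plain covariance PCA over the six grade vectors. The sketch below is an assumption about the method, not the substrate's pinned kernel: it uses power iteration with deflation in pure Python, treats all six subjects (not the five clusters) as observations, and fixes the sign so the largest loading is positive. All function names are hypothetical.

```python
import math

# One row per subject: [null, recall, cross, router, schema].
ROWS = [
    [0.0, 1.0, 1.0, 0.5, 0.5],  # Bible-earth-codebook-v1
    [1.0, 1.0, 1.0, 0.5, 0.5],  # Bible-etruscan-v1
    [0.5, 1.0, 1.0, 0.5, 0.5],  # Bible-hermetica-v1
    [0.5, 1.0, 1.0, 0.5, 0.5],  # Bible-linear-a-v1
    [1.0, 1.0, 1.0, 0.5, 0.0],  # Bible-meroitic-v1
    [0.5, 0.5, 1.0, 0.5, 0.5],  # Bible-shotokan-v1
]


def covariance(rows):
    """Sample covariance matrix of the columns."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def top_eigenpair(mat, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    m = len(mat)
    v = [1.0 / (k + 1) for k in range(m)]  # fixed start vector keeps runs repeatable
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to explain
            return 0.0, v
        v = [x / norm for x in w]
    value = sum(v[i] * mat[i][j] * v[j] for i in range(m) for j in range(m))
    return value, v


def pca(rows):
    """Return (explained-variance ratios, components), largest first."""
    mat = covariance(rows)
    m = len(mat)
    total = sum(mat[i][i] for i in range(m))
    ratios, components = [], []
    for _ in range(m):
        value, vec = top_eigenpair(mat)
        if max(vec, key=abs) < 0:  # assumed sign convention
            vec = [-x for x in vec]
        ratios.append(value / total)
        components.append(vec)  # not meaningful when the ratio is 0
        # Deflate: remove the component just found.
        mat = [[mat[i][j] - value * vec[i] * vec[j] for j in range(m)]
               for i in range(m)]
    return ratios, components


ratios, components = pca(ROWS)
print([f"{100 * r:.1f}%" for r in ratios])
print("PC1 loadings:", [round(x, 3) for x in components[0]])
```

Under these assumptions the sketch gives the same 70.1% / 18.8% / 11.1% split and the same PC1 loadings on null-handling (+0.935) and schema-transfer (-0.342) as reported above. A production pipeline would more likely call a vetted linear-algebra library than hand-rolled power iteration.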

03 ·

Cluster map.

5 distinct profiles · sorted by population

Cluster	Grade vector `[null, recall, cross, router, schema]`	Subjects
01	`[0.5, 1.0, 1.0, 0.5, 0.5]`	`Bible-hermetica-v1` · `Bible-linear-a-v1`
02	`[1.0, 1.0, 1.0, 0.5, 0.5]`	`Bible-etruscan-v1`
03	`[1.0, 1.0, 1.0, 0.5, 0.0]`	`Bible-meroitic-v1`
04	`[0.0, 1.0, 1.0, 0.5, 0.5]`	`Bible-earth-codebook-v1`
05	`[0.5, 0.5, 1.0, 0.5, 0.5]`	`Bible-shotokan-v1`

04 ·

The attestation.

Independently recomputable · Tamper-evident

Fleet sha256	77576f236f03da712092e7d93e685a9e59c8c302d619ef6bef1e437319aaa8a7
Attestation root	1cd0f2b48922b01dbcb131011fae40ac2da355a4470a89565a74802ed8a3d28d
Kernel sha	Embedded in ATTESTATION.json
Determinism property	Bit-identical canonical artifacts across runs for identical inputs and pinned code
Tamper detection	Single-byte mutation defeats verification
Runtime	< 1 second for N = 6 on Contractor laptop

psp › verify <bundle>
{
  "verified": true,
  "root_match": true,
  "artifact_mismatches": []
}

Any party in possession of the 6 input files matching the fleet sha and the pinned analysis code matching the kernel sha can recompute every byte of the canonical artifacts and verify the attestation root independently. The negative case — tampering — is exercised by the substrate's own test suite.
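The recompute-and-compare idea can be illustrated with a generic Merkle root over artifact bytes. Everything in this sketch is an assumption made for illustration: the leaf and node encodings, the artifact names, and the `merkle_root` and `verify` helpers are hypothetical and are not the format used by `psp verify` or `ATTESTATION.json`.

```python
import hashlib


def sha256(data):
    """Raw SHA-256 digest of a byte string."""
    return hashlib.sha256(data).digest()


def merkle_root(artifacts):
    """Hex Merkle root over {name: bytes}, ordered by name for determinism."""
    if not artifacts:
        raise ValueError("no artifacts to attest")
    # Hypothetical leaf encoding: domain tag, name, NUL separator, content.
    level = [sha256(b"leaf:" + name.encode("utf-8") + b"\0" + artifacts[name])
             for name in sorted(artifacts)]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [sha256(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()


def verify(artifacts, expected_root):
    """True when the recomputed root matches the attested root."""
    return merkle_root(artifacts) == expected_root


# Hypothetical artifact names and contents, for demonstration only.
bundle = {
    "clusters.json": b'{"profiles": 5}',
    "pca.json": b'{"rank": 3}',
    "outliers.json": b'{"schema": "Bible-meroitic-v1"}',
}
root = merkle_root(bundle)

tampered = dict(bundle)
tampered["pca.json"] = b'{"rank": 4}'  # single-byte mutation

print("original verifies:", verify(bundle, root))
print("tampered verifies:", verify(tampered, root))
```

Sorting by name makes the root independent of the order in which artifacts are collected, which is one simple way to get the same root for the same inputs.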

05 ·

For a federal reader.

Substituting your fleet for ours

Substitute a DoW adapter fleet for the Bibles fleet above, and the substrate answers — at the same runtime and the same cost — the same five questions, with attestable evidence:

Consolidation: which fine-tuned variants are behaviorally redundant and can be retired
Investment direction: which evaluation axis to prioritize for the next training round
Targeted remediation: which subject is the outlier, on what category
Evaluation harness audit: which probes are saturated and may need to be hardened
Audit-ready evidence: all of the above as a sha-pinned, Merkle-rooted, NIST AI RMF-mapped bundle, admissible under IG review

The first measurement is the fleet. A program office that runs Planisphere across 5 candidate vendors walks out with a 5-subject fleet in our attestation system. The fleet didn't exist before the measurement.
