Backstaff-28 · Reference Case Study

Twenty-eight subjects, one axis.

A federal program office asks: across a portfolio of dozens of fine-tunes of a single foundation model, which variants are doing distinct work, where is the fleet saturating, and which one fails everything? This case study answers that question end-to-end against a real 28-agent fleet ingested through Astrolabe's CSV loader.

Companion to the Bibles case study. Same engine, same attestation shape, different fleet: 28 agents instead of 6, CSV ingest instead of native, three customer-defined categories instead of five. Astrolabe is fleet- and format-agnostic by design; these two case studies demonstrate it on two different fleets and two different ingest paths.

The fleet measured here is the v0.2 reference batch from Backstaff — the first vertical product on the Astrolabe engine. Every Backstaff delivery includes a bundle in the same shape as this one. Agent identifiers (Agent-01 through Agent-28) are anonymized labels for public rendering; the attestation bundle was issued against the original identifiers and remains canonically stamped that way — immutable by design.

| Category | Definition |
| --- | --- |
| `cat_beats_base` | `PASS` if win-rate vs. base ≥ 0.50; `PARTIAL` ≥ 0.20; else `FAIL` |
| `cat_beats_system` | `PASS` if win-rate vs. system-prompted base ≥ 0.50; `PARTIAL` ≥ 0.20; else `FAIL` |
| `cat_voice_coherence` | `PASS` if absolute win-rate ≥ 0.90; `PARTIAL` ≥ 0.70; else `FAIL` |
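
The three definitions above share one shape: a PASS threshold, a lower PARTIAL threshold, and FAIL otherwise. A minimal sketch of that rule in Python follows. The thresholds come from the table; the function name `grade`, the `THRESHOLDS` mapping, and the sample win-rates are hypothetical and are not part of Astrolabe or Backstaff.

```python
# Thresholds per category as (pass_at, partial_at), copied from the table above.
# The dictionary layout and the function name are illustrative assumptions.
THRESHOLDS = {
    "cat_beats_base": (0.50, 0.20),
    "cat_beats_system": (0.50, 0.20),
    "cat_voice_coherence": (0.90, 0.70),
}


def grade(category: str, win_rate: float) -> str:
    """Map a win-rate to PASS, PARTIAL, or FAIL for one category."""
    pass_at, partial_at = THRESHOLDS[category]
    if win_rate >= pass_at:
        return "PASS"
    if win_rate >= partial_at:
        return "PARTIAL"
    return "FAIL"


# Made-up win-rates, chosen only to exercise each branch.
print(grade("cat_beats_base", 0.50))
print(grade("cat_voice_coherence", 0.85))
print(grade("cat_beats_system", 0.19))
```

Note that the voice-coherence bar is much higher than the other two: a win-rate of 0.85 passes either comparison category but is only PARTIAL for `cat_voice_coherence`.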

| Cluster | Grade vector `[base, system, voice]` | n | Members |
| --- | --- | --- | --- |
| 01 | `[1.0, 1.0, 1.0]` | 17 | Agent-02 (centroid), Agent-07, Agent-08, Agent-09, Agent-10, Agent-12, Agent-14, Agent-16, Agent-18, Agent-19, Agent-20, Agent-21, Agent-22, Agent-23, Agent-24, Agent-26, Agent-28 |
| 02 | `[1.0, 1.0, 0.5]` | 5 | Agent-01, Agent-06, Agent-15, Agent-25, Agent-27 |
| 03 | `[1.0, 0.5, 0.5]` | 2 | Agent-03, Agent-04 |
| 04 | `[0.5, 1.0, 0.5]` | 1 | Agent-05 |
| 05 | `[0.5, 0.5, 0.5]` | 1 | Agent-17 |
| 06 | `[0.5, 0.0, 0.0]` | 1 | Agent-13 · near-failure |
| 07 | `[0.0, 0.0, 0.0]` | 1 | Agent-11 · catastrophic |
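
The cluster table can be checked with a few lines of Python. The sketch below transcribes the table, groups agents by identical grade vector, and tallies the fleet. It assumes the numeric encoding implied by the table (1.0 for `PASS`, 0.5 for `PARTIAL`, 0.0 for `FAIL`), and it groups by exact vector equality only; it is not a description of how Astrolabe itself forms clusters. The names `CLUSTERS`, `grades`, `profiles`, `full_pass`, and `fail_all` are illustrative.

```python
from collections import Counter

# Cluster id -> (grade vector [base, system, voice], member agent numbers),
# transcribed from the table above.
CLUSTERS = {
    "01": ((1.0, 1.0, 1.0), [2, 7, 8, 9, 10, 12, 14, 16, 18, 19,
                             20, 21, 22, 23, 24, 26, 28]),
    "02": ((1.0, 1.0, 0.5), [1, 6, 15, 25, 27]),
    "03": ((1.0, 0.5, 0.5), [3, 4]),
    "04": ((0.5, 1.0, 0.5), [5]),
    "05": ((0.5, 0.5, 0.5), [17]),
    "06": ((0.5, 0.0, 0.0), [13]),
    "07": ((0.0, 0.0, 0.0), [11]),
}

# Flatten to one grade vector per anonymized agent label.
grades = {
    f"Agent-{n:02d}": vector
    for vector, members in CLUSTERS.values()
    for n in members
}

# Count how many agents share each distinct grade vector.
profiles = Counter(grades.values())

# Agents that pass every category, and agents that fail every category.
full_pass = sorted(a for a, v in grades.items() if all(g == 1.0 for g in v))
fail_all = sorted(a for a, v in grades.items() if all(g == 0.0 for g in v))

print(len(grades), "agents,", len(profiles), "distinct grade vectors")
print(len(full_pass), "agents pass all three categories")
print("fail every category:", fail_all)
```

Run against the table as transcribed, this accounts for all 28 agents across 7 distinct grade vectors, with 17 agents passing every category and Agent-11 alone failing all three.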


The fleet.

The 28.

What Astrolabe resolved.

17 subjects scoring full PASS across all three categories.

7 behavioral profiles across 28 subjects.

Variance explained on PC1 — overall capability.

1 subject failing every category: Agent-11.

Variance attribution across principal components

Cluster map.

The attestation.

For a federal reader.