A federal program office asks: across a portfolio of dozens of fine-tunes of a single foundation model, which variants are doing distinct work, where is the fleet saturating, and which one fails everything? This case study answers that question end-to-end against a real 28-agent fleet ingested through Astrolabe's csv loader.
Companion to the Bibles case study. Same engine, same attestation shape, different fleet — 28 agents instead of 6, csv ingest instead of native, three customer-defined categories instead of five. Astrolabe is fleet- and format-agnostic by design; these two case studies are the proof.
The fleet measured here is the v0.2 reference batch from Backstaff — the first vertical product on the Astrolabe engine. Every Backstaff delivery includes a bundle in the same shape as this one. Agent identifiers (Agent-01 through Agent-28) are anonymized labels for public rendering; the attestation bundle was issued against the original identifiers and remains canonically stamped that way — immutable by design.
Twenty-eight LoRA-style adapters in an anonymized fleet. Each adapter is one member of a fine-tuned AI workforce. The fleet is a stand-in for any program office's portfolio of single-base-model fine-tunes.
| Category | Definition |
|---|---|
| cat_beats_base | PASS if win-rate vs. base ≥ 0.50; PARTIAL ≥ 0.20; else FAIL |
| cat_beats_system | PASS if win-rate vs. system-prompted base ≥ 0.50; PARTIAL ≥ 0.20; else FAIL |
| cat_voice_coherence | PASS if absolute win-rate ≥ 0.90; PARTIAL ≥ 0.70; else FAIL |
Three categories, trinary grades, ingested through Astrolabe's existing csv loader. No new format. No engine modifications. The categories are customer-defined and arbitrary — Astrolabe runs SVD over whatever dimensionality the input provides.
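The threshold rules in the category table can be sketched as a small grading function. This is an illustration, not Astrolabe's loader: the function names are hypothetical, and the 1.0 / 0.5 / 0.0 encoding of PASS / PARTIAL / FAIL is an assumption inferred from the grade vectors in the cluster table.

```python
# Thresholds copied from the category table: (PASS floor, PARTIAL floor).
THRESHOLDS = {
    "cat_beats_base": (0.50, 0.20),
    "cat_beats_system": (0.50, 0.20),
    "cat_voice_coherence": (0.90, 0.70),
}

# Assumed numeric encoding, inferred from the grade vectors in the cluster table.
GRADE_VALUE = {"PASS": 1.0, "PARTIAL": 0.5, "FAIL": 0.0}


def grade(category, win_rate):
    """Return PASS, PARTIAL, or FAIL for one category's win-rate."""
    pass_floor, partial_floor = THRESHOLDS[category]
    if win_rate >= pass_floor:
        return "PASS"
    if win_rate >= partial_floor:
        return "PARTIAL"
    return "FAIL"


def grade_vector(win_rates):
    """Map a dict of category -> win-rate to a [base, system, voice] vector."""
    return [GRADE_VALUE[grade(cat, win_rates[cat])] for cat in THRESHOLDS]


# Hypothetical win-rates for one agent, for illustration only.
example = grade_vector({
    "cat_beats_base": 0.60,
    "cat_beats_system": 0.30,
    "cat_voice_coherence": 0.95,
})
```

With these made-up win-rates the agent grades PASS, PARTIAL, PASS, which is the vector [1.0, 0.5, 1.0].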
Each Agent-NN identifier represents one fine-tuned agent in the v0.2 reference fleet. The number under each identifier is the cluster index from §03: 01 is the saturated bucket (full PASS across all three categories); 07 is the catastrophic failure mode. Outliers shown in signal red.
Saturated cluster members (17 of 28) are rendered at low contrast — they live in a single behavioral bucket and any one of them substitutes for the others on the measured categories. Distinct profiles (clusters 02–05) and outliers (06–07) carry the fleet's actual variation.
Sixty-one percent of the fleet lands in a single [1.0, 1.0, 1.0] bucket. The probes are not currently discriminative against the upper tier. Either the fleet is genuinely uniform on these axes or the evaluation needs harder probes. Astrolabe names the saturated regime as a deliverable.
Out of 27 possible grade vectors (3³) on a three-category trinary scale, only seven are populated. Keeping one representative per populated vector leaves 21 of the 28 agents as dedupe candidates: 16 in cluster 01, 4 in cluster 02, and 1 in cluster 03. Consolidation evidence in one number.
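The counts above can be re-derived from the cluster table with a few lines of arithmetic. The sketch below hard-codes the grade vectors and cluster sizes from that table; the variable names are illustrative.

```python
from collections import Counter

# (grade vector [base, system, voice], cluster size), copied from the cluster table.
CLUSTERS = [
    ((1.0, 1.0, 1.0), 17),
    ((1.0, 1.0, 0.5), 5),
    ((1.0, 0.5, 0.5), 2),
    ((0.5, 1.0, 0.5), 1),
    ((0.5, 0.5, 0.5), 1),
    ((0.5, 0.0, 0.0), 1),
    ((0.0, 0.0, 0.0), 1),
]

# Expand to one grade vector per agent.
fleet = [vec for vec, size in CLUSTERS for _ in range(size)]
counts = Counter(fleet)

possible_vectors = 3 ** 3                      # three grades, three categories
populated_vectors = len(counts)                # distinct vectors actually observed
duplicates = len(fleet) - populated_vectors    # agents that repeat an already-seen vector
saturated_share = counts[(1.0, 1.0, 1.0)] / len(fleet)

print(possible_vectors, populated_vectors, duplicates, round(saturated_share * 100))
# prints: 27 7 21 61
```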
PC1 loads −0.45, −0.60, −0.66 across the three categories — roughly equally weighted, single-signed. The dominant axis is not category-specific; it is capable vs. not capable. The fleet's variation is one-dimensional at this resolution.
Agent-11 sits at [0.0, 0.0, 0.0]. Projected onto PC1 at +3.86 — more than ten standard deviations from the fleet centroid. Fleet-level fail mode; targeted re-training candidate. The outlier names itself.
PC2 (10.0%) loads +0.52, +0.42, −0.74 — a voice-coherence trade-off axis. Subjects strong on competition wins but weak on absolute voice coherence sit at the positive PC2 end; the inverse at the negative end. The third component is residual.
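To make the principal-component reading concrete, here is a minimal sketch that recovers a first component from the grade vectors in the cluster table. It assumes mean-centering without rescaling and uses plain power iteration on the covariance matrix in place of SVD; Astrolabe's actual preprocessing and solver are not described here, so treat both as assumptions.

```python
import math

# (grade vector [base, system, voice], cluster size), copied from the cluster table.
CLUSTERS = [
    ((1.0, 1.0, 1.0), 17), ((1.0, 1.0, 0.5), 5), ((1.0, 0.5, 0.5), 2),
    ((0.5, 1.0, 0.5), 1), ((0.5, 0.5, 0.5), 1), ((0.5, 0.0, 0.0), 1),
    ((0.0, 0.0, 0.0), 1),
]
rows = [list(vec) for vec, size in CLUSTERS for _ in range(size)]
n, d = len(rows), 3

# Mean-center each category (assumption: no per-category rescaling).
means = [sum(r[j] for r in rows) / n for j in range(d)]
centered = [[r[j] - means[j] for j in range(d)] for r in rows]

# Sample covariance matrix (3 x 3).
cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
       for i in range(d)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly multiply and renormalize."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(len(v)))
             for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


pc1 = leading_eigenvector(cov)
eigenvalue = sum(pc1[i] * sum(cov[i][j] * pc1[j] for j in range(d))
                 for i in range(d))
explained = eigenvalue / sum(cov[i][i] for i in range(d))
single_signed = all(x > 0 for x in pc1) or all(x < 0 for x in pc1)
```

Under these assumptions the first component is single-signed, carries most of the variance, and has magnitudes close to the reported 0.45, 0.60, 0.66. The overall sign of a principal component is arbitrary, so a positive result here is the same axis as the negative loadings quoted above.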
| Cluster | Grade vector [base, system, voice] | n | Members |
|---|---|---|---|
| 01 | [1.0, 1.0, 1.0] | 17 | Agent-02 (centroid), Agent-07, Agent-08, Agent-09, Agent-10, Agent-12, Agent-14, Agent-16, Agent-18, Agent-19, Agent-20, Agent-21, Agent-22, Agent-23, Agent-24, Agent-26, Agent-28 |
| 02 | [1.0, 1.0, 0.5] | 5 | Agent-01, Agent-06, Agent-15, Agent-25, Agent-27 |
| 03 | [1.0, 0.5, 0.5] | 2 | Agent-03, Agent-04 |
| 04 | [0.5, 1.0, 0.5] | 1 | Agent-05 |
| 05 | [0.5, 0.5, 0.5] | 1 | Agent-17 |
| 06 | [0.5, 0.0, 0.0] | 1 | Agent-13 · near-failure |
| 07 | [0.0, 0.0, 0.0] | 1 | Agent-11 · catastrophic |
The Astrolabe-selected centroid is Agent-02: it has the highest-norm grade vector and serves as the anchor for cosine similarity. Sixteen other agents share the same grade vector; the centroid is the lexicographically first among them under deterministic tie-breaking.
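The selection rule described above (highest norm, ties broken lexicographically) can be sketched as follows. `select_centroid` is a hypothetical helper, and the tie-break here runs over the anonymized labels; the bundle itself was issued against the original identifiers.

```python
import math


def select_centroid(grades):
    """Pick the identifier with the largest-norm grade vector.

    Ties are broken deterministically by taking the lexicographically
    smallest identifier.
    """
    def norm(vec):
        return math.sqrt(sum(x * x for x in vec))

    # Negating the norm lets min() prefer larger norms, then smaller names.
    return min(grades, key=lambda name: (-norm(grades[name]), name))


# A subset of the fleet, using grade vectors from the cluster table.
sample = {
    "Agent-01": [1.0, 1.0, 0.5],
    "Agent-07": [1.0, 1.0, 1.0],
    "Agent-02": [1.0, 1.0, 1.0],
    "Agent-11": [0.0, 0.0, 0.0],
}
centroid = select_centroid(sample)
```

Agent-01 sorts first by name but has a smaller norm, so the tie-break only applies among the [1.0, 1.0, 1.0] agents and the result is Agent-02.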
a91516d3e14835d21c0a7f32eac9d591b265a4139bd06863c96d31e8ecb6e5ca408a536d9e18f09a8236a744e7c1ae5318b5115fc13a64460f610eddb7964e9a
ATTESTATION.json
planisphere 0.2.0 (bundle stamp at issue; engine since renamed to Astrolabe, bundle remains canonically stamped)
csv · existing loader · no engine modifications
Any party in possession of the same inputs and the same pinned analysis code can recompute every byte of the canonical artifacts and verify the attestation root independently. Tampering with any artifact defeats verification.
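The general idea of recomputing artifacts and checking them against a root can be sketched as below. The scheme is entirely hypothetical: the layout of ATTESTATION.json, the hash function, and the way the root is derived are not specified here, so SHA-256 over a sorted list of per-artifact digests is only a stand-in.

```python
import hashlib


def artifact_digest(data):
    """SHA-256 hex digest of one artifact's bytes (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def attestation_root(artifacts):
    """Hypothetical root: hash of sorted 'name:digest' lines.

    Sorting by name makes the root independent of the order in which
    artifacts were produced or listed.
    """
    lines = [f"{name}:{artifact_digest(artifacts[name])}"
             for name in sorted(artifacts)]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def verify(artifacts, expected_root):
    """True only if every recomputed byte leads to the same root."""
    return attestation_root(artifacts) == expected_root


# Illustrative in-memory artifacts; the file names are made up.
issued = {"grades.csv": b"agent,base,system,voice\n", "clusters.json": b"{}"}
root = attestation_root(issued)

tampered = dict(issued)
tampered["clusters.json"] = b"{ }"   # a one-byte change

ok_original = verify(issued, root)
ok_tampered = verify(tampered, root)
```

In this sketch the untouched artifacts verify and the tampered copy does not, which is the property the paragraph above describes.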
The Bibles case study demonstrated Astrolabe on a small fleet with a rich five-category evaluation. This case study demonstrates the same engine at scale on a different fleet shape — twenty-eight agents, three customer-defined categories, csv input. Together the two case studies answer five governance questions a procurement office actually asks, with attestable evidence:
Agent-11 at [0,0,0]; Bibles: meroitic on schema transfer)