The shadow staff that audits your AI workforce. Read behavior by what your fleet casts, not by direct observation. Every delivery includes a sha-pinned, Merkle-rooted bundle that resolves which agents in the fleet are doing distinct work, which are redundant, and which fail. The measurement is part of the product.
Backstaff audits a fleet of AI agents that augment a real human staff — battlestaff analysts, tutors, graders, decision-support assistants. The customer brings the deployed fleet; Backstaff returns a sha-pinned, Merkle-rooted attestation bundle that resolves which agents behave distinctly, which are dedupe candidates, and which fail. The measurement is the deliverable.
| Spec | Value |
|---|---|
| Subject | Fleet of fine-tuned AI agents augmenting a human staff |
| Categories evaluated | Behavioral distinctness · Drift from baseline · Coherence under task |
| Grading | Three-level per category: PASS / PARTIAL / FAIL |
| Deliverable | Per-agent behavioral signature + Astrolabe attestation bundle |
| Attestation format | Same as Astrolabe — sha-pinned, Merkle-rooted, NIST AI RMF–mapped |
| Validated scale | 28-agent reference fleet · 7 distinct profiles · 1 catastrophic outlier · 17 dedupe candidates |
| Reference bundle | Backstaff-28 — verified, root 408a536d…b964e9a |
For the reference fleet itself — anonymized as Agent-01 through Agent-28 with the cluster math intact — see the case study.
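To make "sha-pinned, Merkle-rooted" concrete, here is a minimal sketch of how such a bundle could be checked. It is an illustration under stated assumptions, not the Backstaff or Astrolabe format: the function names, the sorted-by-name leaf order, and the duplicate-last-node rule for odd levels are all hypothetical choices, so this code will not reproduce the reference root in the table above.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a blob; this is the 'sha pin' for one artifact."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold hex leaf digests pairwise into a single root digest.

    Assumption: an odd node at any level is paired with itself. Real
    bundle formats may use a different convention.
    """
    if not leaf_hashes:
        raise ValueError("cannot build a Merkle root over zero leaves")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def verify_bundle(
    artifacts: dict[str, bytes],
    pinned: dict[str, str],
    expected_root: str,
) -> bool:
    """Check every artifact against its pin, then check the root.

    Assumption: leaves are the pinned digests ordered by artifact name.
    """
    if not pinned:
        return False
    for name, digest in pinned.items():
        if name not in artifacts or sha256_hex(artifacts[name]) != digest:
            return False
    leaves = [pinned[name] for name in sorted(pinned)]
    return merkle_root(leaves) == expected_root
```

The point of the structure is that changing any single artifact changes its pin, which changes the root, so one short root value commits to the whole bundle.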
The historical backstaff measured the sun by what it cast, not by direct observation — the navigator turned their back to the sun and read its altitude from the instrument's shadow. The product reads the same way: it audits what your AI staff casts (decisions, outputs, behavioral signatures) so the people who oversee the staff don't have to stare at it directly.
Battlestaff analysts run dozens of LLM-augmented assistants. Backstaff audits which of them actually diverge in behavior and which are prompt-wrapper duplicates. For military →
Districts deploy LoRA-tuned tutor and grader agents. Backstaff measures behavioral distinctness and fairness drift without exposing student data. For education →
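The distinct-versus-duplicate question in both use cases can be pictured with a small sketch. This is not Backstaff's method, which is not described here. It assumes each agent has already been reduced to a numeric behavioral signature vector, and it uses cosine similarity with a hypothetical threshold of 0.98 and a simple greedy grouping. The agent names and vectors in the usage example are toy values.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_by_signature(
    signatures: dict[str, list[float]],
    threshold: float = 0.98,  # hypothetical cutoff, chosen for illustration
) -> list[list[str]]:
    """Greedy grouping: an agent joins the first group whose first member
    (the representative) is at least `threshold` similar to it; otherwise
    it starts a new group. Agents are visited in sorted-name order."""
    groups: list[list[str]] = []
    for name in sorted(signatures):
        for group in groups:
            if cosine(signatures[name], signatures[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def dedupe_candidates(groups: list[list[str]]) -> list[str]:
    """Every non-representative member of a group is a dedupe candidate."""
    return [name for group in groups for name in group[1:]]


# Toy usage: two near-identical pairs collapse into two profiles.
toy_signatures = {
    "agent-a": [1.0, 0.0, 0.0],
    "agent-b": [0.99, 0.01, 0.0],
    "agent-c": [0.0, 1.0, 0.0],
    "agent-d": [0.0, 0.98, 0.02],
}
toy_groups = group_by_signature(toy_signatures)
toy_candidates = dedupe_candidates(toy_groups)
```

Under these assumptions, the number of groups plays the role of "distinct profiles" and the remaining members play the role of "dedupe candidates". Greedy grouping depends on visit order, so a production audit would likely want an order-independent clustering method.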
verify <bundle> on customer infrastructure

For the worked example (what a 28-agent audit looks like in practice), see the Backstaff-28 case study. Same product, prior fleet, real numbers.