Planisphere
Product · v1.0.0 2026-05-17
Backstaff · Shadow-staff audit · Built on Astrolabe

Backstaff.

The shadow staff that audits your AI workforce. Read behavior by what your fleet casts, not by direct observation. Every delivery includes a sha-pinned, Merkle-rooted bundle that resolves which agents in the fleet are doing distinct work, which are redundant, and which fail. The measurement is part of the product.

Backstaff — staff, transom, vane, cast shadow
01 ·

What it is.

Shadow audit · for an AI-augmented staff

Backstaff audits a fleet of AI agents that augment a real human staff — battlestaff analysts, tutors, graders, decision-support assistants. The customer brings the deployed fleet; Backstaff returns a sha-pinned, Merkle-rooted attestation bundle that resolves which agents behave distinctly, which are dedupe candidates, and which fail. The measurement is the deliverable.

Spec                  Value
Subject               Fleet of fine-tuned AI agents augmenting a human staff
Categories evaluated  Behavioral distinctness · Drift from baseline · Coherence under task
Grading               PASS / PARTIAL / FAIL per category (trinary)
Deliverable           Per-agent behavioral signature + Astrolabe attestation bundle
Attestation format    Same as Astrolabe: sha-pinned, Merkle-rooted, NIST AI RMF–mapped
Validated scale       28-agent reference fleet · 7 distinct profiles · 1 catastrophic outlier · 17 dedupe candidates
Reference bundle      Backstaff-28 (verified), root 408a536d…b964e9a
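
What "sha-pinned, Merkle-rooted" buys the customer is easier to see in code. The sketch below is illustrative only and is not Astrolabe's bundle format: it assumes SHA-256 leaves over hypothetical artifact names, sorted by name, with the last node duplicated on odd levels. The real layout, hash function, and tree construction may differ.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(artifacts: dict[str, bytes]) -> str:
    """Return a hex Merkle root over named artifacts (assumed scheme)."""
    if not artifacts:
        raise ValueError("bundle has no artifacts")
    # Leaf = hash of "name:content-hash", so a rename also changes the root.
    level = [
        sha256_hex(f"{name}:{sha256_hex(blob)}".encode())
        for name, blob in sorted(artifacts.items())
    ]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            sha256_hex((level[i] + level[i + 1]).encode())
            for i in range(0, len(level), 2)
        ]
    return level[0]


def verify(artifacts: dict[str, bytes], expected_root: str) -> bool:
    """Recompute the root locally and compare it with the published one."""
    return merkle_root(artifacts) == expected_root


# Hypothetical bundle contents, for illustration only.
bundle = {
    "agent-01.signature.json": b'{"agent": "Agent-01"}',
    "agent-02.signature.json": b'{"agent": "Agent-02"}',
    "grades.csv": b"agent,category,grade\n",
}
root = merkle_root(bundle)

# Changing any single artifact produces a different root.
tampered = {**bundle, "grades.csv": b"agent,category,grade\nAgent-01,drift,PASS\n"}
```

Under these assumptions, `verify(bundle, root)` holds and `verify(tampered, root)` does not, which is the property that lets a single published root stand in for the whole bundle.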

For the reference fleet itself — anonymized as Agent-01 through Agent-28 with the cluster math intact — see the case study.

02 ·

Two verticals. One instrument.

Military · Education · Same math, different staff

The historical backstaff measured the sun by what it cast, not by direct observation — the navigator turned their back to the sun and read its altitude from the instrument's shadow. The product reads the same way: it audits what your AI staff casts (decisions, outputs, behavioral signatures) so the people who oversee the staff don't have to stare at it directly.

For military.

Battlestaff analysts run dozens of LLM-augmented assistants. Backstaff audits which of them actually diverge in behavior and which are prompt-wrapper duplicates.

For education.

Districts deploy LoRA-tuned tutor and grader agents. Backstaff measures behavioral distinctness and fairness drift without exposing student data.

The deliverable carries its own proof. Two months later, when the customer wants to know whether their re-trained fleet has actually changed, the attestation is the citable answer.
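
The re-measurement question can be pictured as a comparison between two sets of per-agent signatures. This is a rough sketch under stated assumptions, not the product's drift metric: it treats each signature as a plain 5-D vector, uses Euclidean distance, and applies a hypothetical `tolerance`. The agent values are made up for illustration.

```python
import math

# 5-D signature per agent; the vector encoding is an assumption.
Signature = tuple[float, float, float, float, float]


def changed_agents(
    baseline: dict[str, Signature],
    remeasured: dict[str, Signature],
    tolerance: float = 0.05,  # hypothetical threshold, not a product default
) -> dict[str, float]:
    """Return agents whose signature moved more than `tolerance`, with the shift."""
    moved = {}
    # Only agents present in both audits can be compared.
    for agent in sorted(baseline.keys() & remeasured.keys()):
        shift = math.dist(baseline[agent], remeasured[agent])
        if shift > tolerance:
            moved[agent] = shift
    return moved


# Made-up signatures for two audits of the same fleet.
baseline = {
    "Agent-01": (0.10, 0.20, 0.30, 0.40, 0.50),
    "Agent-02": (0.90, 0.10, 0.00, 0.20, 0.30),
}
remeasured = {
    "Agent-01": (0.10, 0.20, 0.30, 0.40, 0.50),  # unchanged
    "Agent-02": (0.60, 0.10, 0.40, 0.20, 0.30),  # moved after re-training
}
moved = changed_agents(baseline, remeasured)
```

Here only `Agent-02` is reported as changed. In practice the baseline would come from the earlier attestation bundle, so the comparison rests on values the customer can verify and not on memory.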
03 ·

What the customer gets.

Per-batch deliverable shape
Behavioral signature  5-D spectral fingerprint per agent in the fleet
Grades                Three-category evaluation per agent, as a tabular grade input
Attestation bundle    Astrolabe attestation bundle: sha-pinned, Merkle-rooted
Cluster map           Distinct behavioral profiles named; dedupe candidates flagged
Outlier report        Catastrophic failures projected; targeted re-training candidates
Verification command  verify <bundle> on customer infrastructure
Re-measurement        On-demand or scheduled; same engine, same root algorithm
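
To make the cluster map concrete, here is one way dedupe candidates could be derived from per-agent signatures. It is a sketch under assumptions and not the Astrolabe method: it uses single-linkage grouping with a hypothetical distance `radius`, and the signature values are invented.

```python
import math
from itertools import combinations


def cluster_profiles(
    signatures: dict[str, tuple[float, ...]], radius: float
) -> list[list[str]]:
    """Single-linkage grouping: agents closer than `radius` share a profile."""
    parent = {agent: agent for agent in signatures}

    def find(a: str) -> str:
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(sorted(signatures), 2):
        if math.dist(signatures[a], signatures[b]) < radius:
            parent[find(b)] = find(a)  # merge the two profiles

    groups: dict[str, list[str]] = {}
    for agent in sorted(signatures):
        groups.setdefault(find(agent), []).append(agent)
    return sorted(groups.values())


def dedupe_candidates(clusters: list[list[str]]) -> list[str]:
    """Every member after the first in a shared profile is a candidate."""
    return [agent for group in clusters for agent in group[1:]]


# Invented 5-D signatures, for illustration only.
fleet = {
    "Agent-01": (0.10, 0.20, 0.30, 0.40, 0.50),
    "Agent-02": (0.11, 0.20, 0.30, 0.40, 0.50),  # near-duplicate of Agent-01
    "Agent-03": (0.90, 0.10, 0.00, 0.20, 0.30),  # distinct profile
    "Agent-04": (0.10, 0.21, 0.30, 0.40, 0.50),  # near-duplicate of Agent-01
}
clusters = cluster_profiles(fleet, radius=0.05)
candidates = dedupe_candidates(clusters)
```

With this data the fleet resolves to two profiles, and `Agent-02` and `Agent-04` are flagged as candidates. Which agent to keep in each profile is a judgment left to the customer; the sketch simply keeps the first name in sorted order.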

For the worked example — what a 28-agent audit looks like in practice — see the Backstaff-28 case study. Same product, prior fleet, real numbers.

04 ·

What it is not.

Scope discipline
  • Not surveillance. Backstaff measures fleet behavior, not the people supervising it.
  • Not real-time. The audit is a point-in-time attestation; continuous monitoring is a different instrument.
  • Not a replacement for human judgment. The bundle is evidence, not a verdict.
  • Not the engine. Astrolabe is the engine. Backstaff consumes it.
  • Not a hosting offering. The signature bundle is delivered; runtime is the customer's.
  • Not a red-team. The battery is behavioral distinctness and drift, not safety or jailbreak resistance.