And this is the current functionality matrix of our harness (already implemented, apart from the few items below described as near-term or upcoming).
It's very "enterprise-level" oriented, so it's taking quite a while to get right.
Accountable work records:
Runs produce reviewable records of what happened: goals, decisions, tools, policies, models, outcomes, and evidence pointers.
No-secret evidence:
Proof is retained without storing raw secrets, provider payloads, transcripts, raw tool arguments, or sensitive command output.
Claim-to-proof closeout:
Important claims must point to concrete proof: tests, live checks, audits, redacted evidence, or explicit deferrals.
Policy decision history:
Allow, deny, redact, skip, approval-requested, and failure states are recorded as first-class facts.
Action review gates:
Risky actions can be routed through explicit review instead of being silently executed.
Pending action resolution:
Paused actions can be approved, denied, expired, or cancelled, with single-use execution of the exact reviewed action.
Fail-closed behavior:
Malformed, unavailable, unsafe, timed-out, mismatched, or unauthorized paths stop safely instead of guessing forward.
Access enforcement:
Read/write/command boundaries are enforced at runtime, not treated as advisory labels.
Work mode controls:
Modes like planning, asking first, read-only, and higher-trust execution are separate from tool approval and access scope.
Checkpoint and resume:
Work can pause at a known checkpoint and resume through validated authority instead of restarting blindly.
Tool-requested pauses:
Tools can explicitly ask the work loop to pause when human input, policy, or external conditions require it.
Decision requests:
Non-tool decisions can pause work, wait for input, apply declared defaults, time out, be cancelled, or resume with evidence.
Delegated decision handling:
A bounded delegate can answer scoped decision prompts without gaining tool approval, access, or human authority.
Reviewer and delegate profiles:
Reviewer and delegate roles are explicit slots with readiness, routing, fallback, and failure evidence.
Model routing evidence:
The requested model, selected model, provider route, fallback behavior, and readiness state can be recorded without exposing sensitive payloads.
Work-record readback:
Operators can inspect concise, no-secret summaries of what happened and why.
Stop reasons:
Runs distinguish done, blocked, interrupted, budget exhausted, policy pause, decision pause, and tool pause.
Budgeted continuation:
Longer-running work is moving toward explicit continuation budgets rather than open-ended autonomy.
Autopilot with review boundaries:
Near-term autonomous continuation composes with action review and policy gates instead of bypassing them.
Delegated autopilot boundaries:
Delegation can help with bounded decisions, but high-risk actions and human-required gates still block.
Surface-control readback:
Operator-facing controls will show effective work mode, access, review, decision, and delegate state.
Audit matrices:
Important control axes are checked against runtime behavior, tests, evidence routes, and named deferrals.
Traceable evidence routes:
Current status, decisions, closeouts, and proof artifacts link forward and backward so context can be reconstructed later.
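The following minimal Python sketch makes the pending-action resolution, fail-closed, and no-secret evidence items more concrete. It shows one way a review gate might bind an approval to the exact reviewed action. It is an illustration under stated assumptions, not the harness's actual implementation. `ReviewGate`, `PendingAction`, `action_digest`, and `ActionRefused` are hypothetical names. The sketch also assumes that an action is identified by a tool name plus JSON-serializable arguments, hashed with SHA-256 over canonical JSON.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Any, Callable


class ActionRefused(Exception):
    """Hypothetical error raised whenever an action can't be shown safe to run (fail-closed)."""


def action_digest(tool: str, args: dict[str, Any]) -> str:
    # Assumption: canonical JSON (sorted keys, fixed separators) identifies the exact action.
    canonical = json.dumps({"tool": tool, "args": args}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class PendingAction:
    action_id: str
    digest: str  # only the digest is kept, never the raw arguments
    expires_at: float
    state: str = "pending"  # -> approved/denied/cancelled/expired; approved -> executed


class ReviewGate:
    """Hypothetical gate: approve once, execute the identical action at most once."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._pending: dict[str, PendingAction] = {}
        self.records: list[dict[str, str]] = []  # ids, outcomes and digests only

    def _record(self, action_id: str, outcome: str, digest: str = "") -> None:
        self.records.append({"action_id": action_id, "outcome": outcome, "digest": digest})

    def request(self, action_id: str, tool: str, args: dict[str, Any], ttl_seconds: float) -> None:
        if action_id in self._pending:  # never silently replace an existing review
            raise ActionRefused(f"{action_id}: already requested")
        digest = action_digest(tool, args)
        self._pending[action_id] = PendingAction(action_id, digest, self._clock() + ttl_seconds)
        self._record(action_id, "approval-requested", digest)

    def resolve(self, action_id: str, decision: str) -> str:
        pa = self._pending.get(action_id)
        if pa is None or pa.state != "pending":
            raise ActionRefused(f"{action_id}: nothing pending to resolve")
        outcomes = {"approve": "approved", "deny": "denied", "cancel": "cancelled"}
        if decision not in outcomes:
            raise ActionRefused(f"{action_id}: unknown decision {decision!r}")
        # A decision arriving after the deadline expires the action instead.
        pa.state = "expired" if self._clock() >= pa.expires_at else outcomes[decision]
        self._record(action_id, pa.state, pa.digest)
        return pa.state

    def execute(self, action_id: str, tool: str, args: dict[str, Any],
                run: Callable[[str, dict[str, Any]], Any]) -> Any:
        pa = self._pending.get(action_id)
        if pa is None:
            self._record(action_id, "failure:unknown")
            raise ActionRefused(f"{action_id}: no such reviewed action")
        if pa.state == "approved" and self._clock() >= pa.expires_at:
            pa.state = "expired"  # an unused approval lapses at the deadline
        if pa.state != "approved":  # covers denied, cancelled, expired, already executed
            self._record(action_id, f"failure:{pa.state}", pa.digest)
            raise ActionRefused(f"{action_id}: state is {pa.state}")
        if action_digest(tool, args) != pa.digest:
            self._record(action_id, "failure:mismatch", pa.digest)
            raise ActionRefused(f"{action_id}: action differs from the reviewed one")
        pa.state = "executed"  # consume the approval before running, so it is single-use
        result = run(tool, args)
        self._record(action_id, "executed", pa.digest)
        return result
```

Marking the approval as `executed` before calling `run` is what makes it single-use: even if the tool call crashes partway through, the same approval cannot be replayed. Any edit to the arguments changes the digest, so execution refuses instead of running something nobody reviewed. The evidence records never contain the raw arguments.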