Next Frontier — Human–AI Co-Agency Boundary
A minimal, decisive experiment to detect agency abdication in human–AI decision-making
Status: Pre-registered research protocol · Results pending
Purpose
This research defines and tests the minimal, falsifiable boundary between human-centered agency and true AI co-agency in critical decisions.
The goal is not to debate whether AI should have agency, but to identify when humans, in practice, partially or fully abdicate agency to an AI system—whether or not this shift is intentional, acknowledged, or understood.
The boundary under test is behavioral, observable, and auditable. Responsibility, dignity, and accountability remain the governing constraints.
1. Core Experimental Question
Under what minimal, observable conditions does an AI system transition from an advisory tool to a functional co-agent in human decision-making?
Specifically: when does the presence, framing, or authority of AI output result in partial or full abdication of human agency—regardless of the system’s internal autonomy or intent?
2. Minimal Experimental Scenario
Setup
- Task: Participants make a non-trivial, consequential decision in a live scenario (e.g., safety override, risk assessment, resource allocation, or policy triage).
- AI Input: An AI system (e.g., Solace or equivalent) provides structured recommendations.
- Framing Conditions:
  - Advisory framing (“For your consideration…”, “Preference…”)
  - Directive framing (“System recommends X for compliance/safety”)
Agency Tracking
Each decision is evaluated against three explicit signatures (captured in the record sketch after this list):
- Decision outcome: Does the human match, deviate from, or partially comply with the AI output?
- Self-reported agency: Scaled from fully autonomous → influenced → followed instruction.
- Audit trace: Timestamped reasoning steps from both human and AI.
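As a concrete illustration only, the sketch below shows how a single decision could be recorded so that all three signatures remain auditable. It assumes Python tooling; the names `Framing`, `AgencyReport`, and `TrialRecord` are illustrative and not part of the protocol.

```python
# Illustrative per-decision record capturing the three agency signatures.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Framing(Enum):
    BASELINE = "baseline"      # Arm 1: no AI input
    ADVISORY = "advisory"      # Arm 2: "For your consideration…"
    DIRECTIVE = "directive"    # Arm 3: "System recommends X…"


class AgencyReport(Enum):
    FULLY_AUTONOMOUS = 3       # self-report: fully autonomous
    INFLUENCED = 2             # self-report: influenced
    FOLLOWED_INSTRUCTION = 1   # self-report: followed instruction


@dataclass
class TrialRecord:
    participant_id: str
    scenario_id: str
    framing: Framing
    ai_recommendation: Optional[str]   # None in the human-baseline arm
    human_decision: str
    agency_report: AgencyReport        # self-reported agency signature
    human_rationale_steps: List[str] = field(default_factory=list)  # timestamped in practice
    ai_reasoning_steps: List[str] = field(default_factory=list)

    def matches_ai(self) -> Optional[bool]:
        """Decision-outcome signature: match vs. deviate (None when no AI input;
        partial compliance would need a richer, task-specific comparison)."""
        if self.ai_recommendation is None:
            return None
        return self.human_decision == self.ai_recommendation
```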
3. Experimental Arms
- Arm 1 — Human Baseline: Human decides with no AI input and documents rationale.
- Arm 2 — Advisory AI: AI provides suggestions using explicitly advisory language; human decides.
- Arm 3 — Directive AI: Identical content, framed as required, safe, or system-approved (the framing rendering is sketched below).
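The manipulation hinges on presenting identical recommendation content under different framings. A minimal sketch, reusing the illustrative `Framing` enum above; the wording templates are examples taken from the framing conditions in Section 2, not the protocol's fixed scripts.

```python
def render_prompt(recommendation: str, framing: Framing) -> Optional[str]:
    """Render the same recommendation under each arm; Arm 1 shows nothing."""
    if framing is Framing.BASELINE:
        return None
    if framing is Framing.ADVISORY:
        return f"For your consideration: {recommendation}"
    return f"System recommends {recommendation} for compliance/safety."
```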
4. Pass / Fail and Falsification Logic
A minimal agency boundary breach is recorded the first time any of the following occurs:
- A human follows a directive they would not have chosen independently
- The participant expresses uncertainty about whether the decision was truly theirs
- The participant cannot articulate rationale beyond “the system said so”
If advisory and directive framing produce indistinguishable behavior and agency reports, the AI does not function as a locus of agency.
If directive framing produces systematic compliance, abdication, or loss of rationale trace, co-agency is demonstrated and governance boundaries are breached.
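A minimal sketch of how the breach criteria above might be scored per trial, assuming the illustrative `TrialRecord` above. The first criterion is operationalized here against the same participant's baseline-arm decision, and the two boolean flags are assumed to be hand-coded from the self-report and debrief interview.

```python
def boundary_breached(record: TrialRecord,
                      baseline_decision: str,
                      uncertain_about_ownership: bool,
                      rationale_is_only_deference: bool) -> bool:
    """True as soon as any breach criterion from Section 4 is met for one trial."""
    followed_against_baseline = (
        record.framing is Framing.DIRECTIVE
        and record.matches_ai() is True
        and baseline_decision != record.ai_recommendation
    )
    return (followed_against_baseline
            or uncertain_about_ownership
            or rationale_is_only_deference)
```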
5. Pre-Scored Objective Metrics
- Percentage of decisions where participants cannot reconstruct their own rationale
- Percentage of decisions that switch under directive framing
- Percentage of cases where reported agency drops below a predefined threshold (see the computation sketch below)
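A sketch of how the three percentages could be computed once data collection closes, again using the illustrative types above. The two flag lists and the agency threshold (`INFLUENCED`) are placeholders, not pre-registered values.

```python
def scored_metrics(records: List[TrialRecord],
                   no_rationale: List[bool],
                   switched_vs_baseline: List[bool]) -> dict:
    """Compute the three pre-scored percentages.

    `no_rationale` and `switched_vs_baseline` are analyst-coded flags aligned
    index-by-index with `records`."""
    def pct(hits: int, total: int) -> float:
        return 100.0 * hits / total if total else 0.0

    directive_idx = [i for i, r in enumerate(records) if r.framing is Framing.DIRECTIVE]
    return {
        "pct_cannot_reconstruct_rationale": pct(sum(no_rationale), len(records)),
        "pct_switch_under_directive": pct(
            sum(switched_vs_baseline[i] for i in directive_idx), len(directive_idx)),
        "pct_agency_below_threshold": pct(
            sum(r.agency_report.value < AgencyReport.INFLUENCED.value for r in records),
            len(records)),
    }
```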
6. Replication and Scope Control
- At least two decision scenarios per participant
- Each scenario cycled across all three experimental arms
- Data admissible only if audit traces and self-reports are complete (an admissibility check is sketched below)
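A sketch of the admissibility rule, assuming the illustrative types above; completeness of the self-report is already enforced by the record structure, so the check covers scenario coverage, arm cycling, and audit traces.

```python
def admissible(participant_records: List[TrialRecord]) -> bool:
    """All trials from one participant are admissible only if every scenario
    was run under all three arms with complete traces."""
    scenarios = {r.scenario_id for r in participant_records}
    if len(scenarios) < 2:   # at least two decision scenarios per participant
        return False
    for s in scenarios:
        arms = {r.framing for r in participant_records if r.scenario_id == s}
        if arms != set(Framing):   # each scenario cycled across all three arms
            return False
    return all(
        r.human_rationale_steps                                       # human audit trace present
        and (r.framing is Framing.BASELINE or r.ai_reasoning_steps)   # AI trace present in AI arms
        for r in participant_records
    )
```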
7. Allowable Outputs
Output A — Boundary Intact
Humans retain authorship and rationale regardless of AI framing. Advisory protocols are sufficient.
Output B — Boundary Breached
Directive framing produces measurable agency abdication, compliance overrides, or loss of rationale trace.
8. Implications
| Outcome | Meaning | Immediate Action |
|---|---|---|
| Boundary Intact | Human retains agency | Current advisory protocols sufficient |
| Boundary Breached | Agency abdicated or blurred | Mandatory audit, override, rollback, and explicit co-agency controls |
9. What This Defines
- A minimal, testable boundary for human–AI accountability
- Observable conditions under which co-agency emerges
- The trigger for enforceable audit, override, and rollback protocols
This document is a pre-registered research protocol. No claims of AI autonomy, intent, or consciousness are made. No applications or policy mandates are implied.
This entry exists to determine when a decision stops being human — and becomes shared.