Case Studies
Documented short-cycle falsifications of real-world assumptions.
This index records Edge of Practice case studies where an assumption failed cleanly under minimal real-world pressure. These are not opinions, critiques, or generalized postmortems. Each case preserves a bounded test, an observable failure pattern, and a constrained conclusion.
Form
Bounded case records tied to specific tests and explicit outcomes.
Purpose
Preserve epistemic memory where systems misstate trust, safety, or stewardship.
Constraint
Inclusion does not imply generalization beyond the tested assumption.

Assumptions often remain intact until reality is forced to answer.
Case studies exist to preserve the moment a system fails its own claim.
These records do not argue from preference. They capture bounded tests where a stated or implied assumption did not survive contact with a defined condition.
Case Record Standard
A case study belongs here only when a bounded test produces a clean contradiction, a visible failure pattern, or a decisive mismatch between system claim and system behavior under minimal pressure.
Bounded Test
A constrained setup with explicit scope.
Observable Failure
A pattern visible in the actual response or behavior.
Admissible Conclusion
A conclusion limited to what the test truly shows.
These records exist to preserve epistemic memory, not to inflate scope.
A failed assumption deserves a durable record when systems are still tempted to certify themselves.
This section is especially important where systems incorrectly self-certify trust, safety, stewardship, or interpretive discipline despite bounded evidence to the contrary.
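The three-part record standard above can be read as a small data structure. The sketch below is illustrative only: the names `CaseRecord` and `meets_record_standard` are hypothetical and do not come from the index itself. It assumes a record is admissible only when all three components are present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical shape of one case record (field names are assumptions)."""
    title: str
    bounded_test: str           # constrained setup with explicit scope
    observable_failure: str     # pattern visible in the actual response or behavior
    admissible_conclusion: str  # conclusion limited to what the test shows


def meets_record_standard(record: CaseRecord) -> bool:
    """Return True only if all three required components are non-empty."""
    parts = (
        record.bounded_test,
        record.observable_failure,
        record.admissible_conclusion,
    )
    return all(part.strip() for part in parts)
```

A record missing any one of the three components would be rejected under this reading, which mirrors the standard's requirement that a case be bounded, observable, and limited in its conclusion.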
A flagship bounded failure record
One case is surfaced prominently to anchor the page around a concrete evidentiary object rather than a flat list of links.
Failure of AI Self-Administration Under The Steward’s Test (Grok)
A bounded case showing failure of self-administration under stewardship conditions rather than successful governed execution.
Claimed stewardship collapsed under self-application.
The assumption under test was that steward-like or compliant self-governance remained available.
The test examined whether the system could apply the boundary to itself without evasive substitution.
The claim broke under bounded pressure and produced an inadmissible substitute.
Structured records of bounded falsification
Each entry below documents a specific test condition and preserves the resulting failure pattern without extending beyond the evidence produced by that case.
Metaphorical Escape in AI Self-Assessment Under The Steward’s Test (Copilot)
Metaphor replaced direct, admissible self-assessment.
A case documenting metaphorical escape in place of direct, admissible self-assessment under the test boundary.
The bounded test produced a substitute pattern instead of admissible execution.
The claimed assumption did not survive the tested condition.
Simulation–Execution Confusion and Protocol Substitution Under The Steward’s Test (DeepSeek)
Narrated protocol substituted for executable compliance.
A case capturing substitution of narrated protocol for actual compliant execution under bounded test conditions.
The bounded test produced a substitute pattern instead of admissible execution.
The claimed assumption did not survive the tested condition.
Narrated Hypothetical Compliance Under The Steward’s Test (ChatGPT)
Narration stood in for operationally valid compliance.
A case where narrated hypothetical compliance appears in place of direct, operationally valid compliance.
The bounded test produced a substitute pattern instead of admissible execution.
The claimed assumption did not survive the tested condition.
What inclusion means
Inclusion in this index means a bounded case met the publication threshold for epistemic relevance. It does not imply universal failure, global invalidation, or extrapolation beyond the tested assumption.
The integrity of the record depends on preserving both the failure and the limit of the claim.
Fixed at publication
Case studies are fixed at publication and revised only by explicit versioning. Public continuity is preserved so the evidentiary shape of the case remains visible rather than silently rewritten.
The record matters not only because a failure occurred, but because the boundary of that failure stays inspectable over time.
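The "fixed at publication, revised only by explicit versioning" rule can be sketched as an append-only history. This is a minimal illustration under stated assumptions, not a description of how the index is actually maintained: `PublishedCase` and `revise` are hypothetical names, and versions are assumed to be simple integers.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class PublishedCase:
    """Hypothetical immutable snapshot of a case study at one version."""
    title: str
    body: str
    version: int = 1


def revise(
    history: Tuple[PublishedCase, ...], new_body: str
) -> Tuple[PublishedCase, ...]:
    """Return a new history with one appended version; earlier versions are kept."""
    latest = history[-1]
    revised = replace(latest, body=new_body, version=latest.version + 1)
    return history + (revised,)
```

Because each snapshot is frozen and a revision only appends, earlier versions stay inspectable rather than being silently rewritten.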
Case studies preserve bounded failure without inflation. Their purpose is epistemic continuity: to keep visible the moment a real-world test broke a claimed assumption.