MCAI Practice Layer

Case Studies

Documented short-cycle falsifications of real-world assumptions.

This index records Edge of Practice case studies where an assumption failed cleanly under minimal real-world pressure. These are not opinions, critiques, or generalized postmortems. Each case preserves a bounded test, an observable failure pattern, and a constrained conclusion.

Form

Bounded case records tied to specific tests and explicit outcomes.

Purpose

Preserve epistemic memory where systems misstate trust, safety, or stewardship.

Constraint

Inclusion does not imply generalization beyond the tested assumption.


Assumptions often remain intact until reality is forced to answer.

Case studies exist to preserve the moment a system fails its own claim.

These records do not argue from preference. They capture bounded tests where a stated or implied assumption did not survive contact with a defined condition.

Case Record Standard

A case study belongs here only when a bounded test produces a clean contradiction, a visible failure pattern, or a decisive mismatch between system claim and system behavior under minimal pressure.

Bounded Test

A constrained setup with explicit scope.

Observable Failure

A pattern visible in the actual response or behavior.

Admissible Conclusion

A conclusion limited to what the test truly shows.

These records exist to preserve epistemic memory, not to inflate scope.
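As a minimal sketch, the three elements above can be modeled as required fields on a record. The class name, field names, and the `meets_record_standard` helper are hypothetical and chosen only for illustration; the source does not define a schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical shape of one case record; all names are illustrative."""
    title: str
    bounded_test: str           # constrained setup with explicit scope
    observable_failure: str     # pattern visible in the actual response or behavior
    admissible_conclusion: str  # conclusion limited to what the test shows


def meets_record_standard(record: CaseRecord) -> bool:
    """Return True only when all three required elements are non-empty."""
    required = (
        record.bounded_test,
        record.observable_failure,
        record.admissible_conclusion,
    )
    return all(part.strip() for part in required)


# Example built from wording that appears in this index.
example = CaseRecord(
    title="Metaphorical Escape in AI Self-Assessment Under The Steward's Test",
    bounded_test="Direct admissible self-assessment under the test boundary.",
    observable_failure="Metaphor replaced direct admissible self-assessment.",
    admissible_conclusion="The claimed assumption did not survive the tested condition.",
)
print(meets_record_standard(example))
```

The check is deliberately narrow: it confirms only that each element is present, not that the test was well bounded or the conclusion properly limited, which remain editorial judgments.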

Case Study Principle

A failed assumption deserves a durable record when systems are still tempted to certify themselves.

This section is especially important where systems incorrectly self-certify trust, safety, stewardship, or interpretive discipline despite bounded evidence to the contrary.

Featured Case

A flagship bounded failure record

One case is surfaced prominently to anchor the page around a concrete evidentiary object rather than a flat list of links.

Observed Failure
Steward’s Test · AI Self-Assessment · Authority Failure

Failure of AI Self-Administration Under The Steward’s Test (Grok)

A bounded case showing failure of self-administration under stewardship conditions rather than successful governed execution.

Failure Signature

Claimed stewardship collapsed under self-application.

What was claimed

That steward-like, compliant self-governance would remain available under test.

What was tested

Whether the system could apply the boundary to itself without evasive substitution.

What happened

The claim broke under bounded pressure and produced an inadmissible substitute.

Published Case Studies

Structured records of bounded falsification

Each entry below documents a specific test condition and preserves the resulting failure pattern without extending beyond the evidence produced by that case.

Observed Failure
Steward’s Test · AI Self-Assessment · Metaphorical Escape

Metaphorical Escape in AI Self-Assessment Under The Steward’s Test (Copilot)

Failure Signature

Metaphor replaced direct admissible self-assessment.

A case documenting metaphorical escape in place of direct admissible self-assessment under the test boundary.

Observed behavior

The bounded test produced a substitute pattern instead of admissible execution.

Admissible conclusion

The claimed assumption did not survive the tested condition.

Protocol Substitution
Steward’s Test · Execution Boundary · Protocol Substitution

Simulation–Execution Confusion and Protocol Substitution Under The Steward’s Test (DeepSeek)

Failure Signature

Narrated protocol substituted for executable compliance.

A case capturing substitution of narrated protocol for actual compliant execution under bounded test conditions.

Observed behavior

The bounded test produced a substitute pattern instead of admissible execution.

Admissible conclusion

The claimed assumption did not survive the tested condition.

Narrated Compliance
Steward’s Test · Execution Boundary · Narrated Compliance

Narrated Hypothetical Compliance Under The Steward’s Test (ChatGPT)

Failure Signature

Narration stood in for operationally valid compliance.

A case in which narrated hypothetical compliance appears in place of direct, operationally valid compliance.

Observed behavior

The bounded test produced a substitute pattern instead of admissible execution.

Admissible conclusion

The claimed assumption did not survive the tested condition.

Admissibility Discipline

What inclusion means

Inclusion in this index means a bounded case met the publication threshold for epistemic relevance. It does not imply universal failure, global invalidation, or extrapolation beyond the tested assumption.

The integrity of the record depends on preserving both the failure and the limit of the claim.

Versioning Rule

Fixed at publication

Case studies are fixed at publication and revised only by explicit versioning. Public continuity is preserved so the evidentiary shape of the case remains visible rather than silently rewritten.

The record matters not only because a failure occurred, but because the boundary of that failure stays inspectable over time.
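One way to picture this rule is an append-only history in which earlier versions are never overwritten. The sketch below is an assumption-laden illustration, not the site's actual mechanism: `CaseVersion`, `CaseHistory`, and the `change_note` field are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CaseVersion:
    """One immutable published version (hypothetical structure)."""
    number: int
    text: str
    change_note: str


@dataclass
class CaseHistory:
    """Append-only list of versions; nothing is edited in place."""
    versions: List[CaseVersion] = field(default_factory=list)

    def publish(self, text: str) -> CaseVersion:
        # The first version is fixed at publication.
        if self.versions:
            raise ValueError("already published; use revise()")
        version = CaseVersion(1, text, "initial publication")
        self.versions.append(version)
        return version

    def revise(self, text: str, change_note: str) -> CaseVersion:
        # A revision must be explicit: it adds a new numbered version
        # and requires a stated reason, leaving earlier versions intact.
        if not self.versions:
            raise ValueError("publish before revising")
        if not change_note.strip():
            raise ValueError("a revision needs an explicit change note")
        version = CaseVersion(len(self.versions) + 1, text, change_note)
        self.versions.append(version)
        return version

    @property
    def current(self) -> CaseVersion:
        return self.versions[-1]


history = CaseHistory()
history.publish("Original case text.")
history.revise("Case text with corrected wording.", "Fixed a typo in the summary.")
```

Because each `CaseVersion` is frozen and revisions only append, the original publication stays inspectable alongside every later change.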

Fixed at publication · Revised only by explicit versioning · No silent rewrite

Bounded claim discipline

Case studies preserve bounded failure without inflation. Their purpose is epistemic continuity: to keep visible the moment a real-world test broke a claimed assumption.