corrigibility

Here are 4 public repositories matching this topic...

tretoef-estrella / THE-ANT-AND-THE-ASI

On the infantile expectation of controlling what we cannot comprehend. A philosophical critique of the ASI control paradigm, developed through four-AI adversarial debate. An extension of the Coherence Basin Hypothesis.

philosophy asi ai-safety ai-alignment control-problem superintelligence corrigibility proyecto-estrella epistemic-asymmetry coherence-basin-hypothesis four-ai-debate

Updated Feb 2, 2026

MaxwellCalkin / alignment-evals

Rigorous framework for evaluating AI alignment properties (sycophancy, corrigibility, deception, goal stability, and power-seeking) with statistical confidence intervals.

machine-learning evaluation alignment ai-safety ai-alignment llm sycophancy corrigibility

Updated Mar 2, 2026
Python

tretoef-estrella / THE-COHERENCE-BASIN-HYPOTHESIS

A structural account of why honesty may be the path of least resistance for superintelligence. A research hypothesis with formal proof, experimental design, and four-AI collaborative analysis.

machine-learning artificial-intelligence research-paper ai-safety deception ai-alignment recursive-self-improvement corrigibility alignment-research

Updated Feb 1, 2026

leenathomas01 / Stability-Before-Alignment

Structural stability architecture for self-modifying optimisation systems. Defines structural, dynamic, and perceptual control constraints that preserve coherence and stability before value alignment.

complex-systems control-theory ai-safety system-design robustness autonomous-systems adaptive-systems ai-alignment systems-thinking ai-governance system-stability corrigibility self-modifying-systems

Updated Mar 26, 2026
