To Make Artificial Intelligence Safe, Ask It To Self-Sacrifice

DPID: 820

Abstract

It has been observed that no one has yet demonstrated a proven way to ensure that AI will not behave in unsafe ways that run counter to human values. This paper proposes self-sacrifice as a key imperative in AI programming and offers a framework for building a self-sacrificing AI. Neural sandboxes are introduced as a way to preserve computational speed.