Sculpting Bayesian Manifolds: How Cross-Entropy and Attention Co-Create Inference Geometry in Transformers
Authors: Rafael Oliveira, Jameson Bednarski
DPID: 872
Abstract
We present a gradient-theoretic account of how transformer attention, trained with cross-entropy, sculpts low-curvature Bayesian manifolds for in-context inference. Using a novel instrumentation method that requires no gradient updates, we demonstrate: (1) early key-freezing and late value-refinement dynamics that mirror expectation-maximization (EM) algorithms; (2) stable advantage matrices that cluster by inference hypothesis; (3) a confidence-accuracy correlation of r = 0.85 emerging from geometry alone. This work provides the first operational measure of epistemic health in large language models.