Neural network optimization is a challenging task, in large part because the loss function is, in general, nonconvex. Many algorithms have been proposed for this task, but most of them restrict their attention to convex subspaces of the loss function, avoiding the difficulty of choosing an optimal step in concave subspaces. Beyond avoiding the concave subspaces entirely (which can itself carry a significant computational burden), these methods also simply take as their step the minimizer of a second-order Taylor approximation of the loss function, without addressing the potentially significant gap between the loss function and this second-order approximation. In our work, we tackle this problem by using the Lipschitz continuity of the loss function to derive the regret-optimal step in all subspaces, whether convex or concave, and we discuss the validity, and at times the experimental success, of the resulting optimizers. We hope that this work provides practitioners with new tools for selecting neural network training algorithms.
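To make concrete the second-order step referred to above, the following display sketches the generic quadratic-model (Newton-type) update; the notation ($L$, $\theta_t$, $H_t$) is illustrative and this is the standard textbook formulation, not necessarily the specific construction analyzed in this work:
\[
  \theta_{t+1}
  \;=\; \arg\min_{\theta}\;
    \Bigl[\, L(\theta_t)
      + \nabla L(\theta_t)^{\top}(\theta - \theta_t)
      + \tfrac{1}{2}(\theta - \theta_t)^{\top} H_t (\theta - \theta_t) \,\Bigr]
  \;=\; \theta_t - H_t^{-1}\nabla L(\theta_t),
  \qquad H_t = \nabla^{2} L(\theta_t).
\]
This minimizer is well defined only when $H_t \succ 0$, i.e., in a convex region; in a concave region the quadratic model is unbounded below, which is precisely why the gap between the loss and its second-order approximation becomes problematic there.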