Retuned Nesterov Convergence
The retuned Nesterov algorithm achieves the sharp PL exponent by running with an internal parameter μ' chosen just below μ.
Key scalar comparison
The general local theorem gives the per-step Lyapunov contraction factor
r = 1 − (1−θ)·√(μ'·η).
The main specialization sets μ' = μ·(1−θ), η = 1/L, and chooses
0 < θ ≤ √(μ/L)/8,
so that r = 1 − (1−θ)^(3/2)·√(μ/L).
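As a rough numeric illustration of this specialization, the sketch below evaluates r for given μ, L and θ. The function name `retuned_rate` and the sample values are illustrative assumptions and do not appear in the formal development.

```python
import math

def retuned_rate(mu: float, L: float, theta: float) -> float:
    """One-step factor r = 1 - (1 - theta) * sqrt(mu' * eta)
    under the specialization mu' = mu * (1 - theta), eta = 1 / L."""
    if not (0 < mu <= L):
        raise ValueError("expected 0 < mu <= L")
    theta_max = math.sqrt(mu / L) / 8
    if not (0 < theta <= theta_max):
        raise ValueError("expected 0 < theta <= sqrt(mu/L)/8")
    mu_prime = mu * (1 - theta)  # retuned internal parameter, just below mu
    eta = 1 / L                  # step size
    return 1 - (1 - theta) * math.sqrt(mu_prime * eta)

# Sample values chosen only for illustration.
mu, L = 1.0, 100.0
theta = math.sqrt(mu / L) / 8    # largest admissible theta
r = retuned_rate(mu, L, theta)
print(r)
```

Substituting the specialization by hand gives the same value as 1 − (1−θ)^(3/2)·√(μ/L), which is a convenient cross-check on the function.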
Local theorem variants
nesterov_convergence_at_base_point_position_params: state positions xₖ; arbitrary μ', θ, ρ, plus a scalar rate bound; reusable core theorem.
nesterov_convergence_at_base_point_position_theta: state positions xₖ; explicit retuning parameter θ, ρ = rhoOfTheta; local specialized theorem.
Rate bound: r ≤ exp(-√(μ/L))
If 0 < θ ≤ min(√(μ/L)/8, 1/4), then the retuned one-step factor r from
the theorem statement is bounded above by the sharp exponential exp(-√(μ/L)).
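The bound can be sanity-checked numerically. The sketch below assumes the one-step factor is r = 1 − (1−θ)·√(μ'·η) with μ' = μ·(1−θ) and η = 1/L, writes q for √(μ/L), and scans a grid of admissible (q, θ) pairs. The helper names `one_step_factor` and `worst_gap` and the grid sizes are hypothetical; a grid scan is only evidence, not a substitute for the formal proof.

```python
import math

def one_step_factor(q: float, theta: float) -> float:
    # q stands for sqrt(mu/L). With mu' = mu*(1-theta) and eta = 1/L,
    # r = 1 - (1-theta)*sqrt(mu'*eta) = 1 - (1-theta)**1.5 * q.
    return 1 - (1 - theta) ** 1.5 * q

def worst_gap(num_q: int = 200, num_t: int = 50) -> float:
    """Largest value of r - exp(-q) over a grid of admissible (q, theta)."""
    worst = -math.inf
    for i in range(1, num_q + 1):
        q = i / num_q                      # q ranges over (0, 1]
        theta_max = min(q / 8, 0.25)       # hypothesis on theta
        for j in range(1, num_t + 1):
            theta = theta_max * j / num_t  # theta ranges over (0, theta_max]
            worst = max(worst, one_step_factor(q, theta) - math.exp(-q))
    return worst

gap = worst_gap()
# A negative gap means r < exp(-q) at every sampled point.
print(gap)
```

The gap is closest to zero for small q, where exp(-q) and 1 − q nearly coincide, which is why the admissible range of θ has to shrink in proportion to √(μ/L).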
State-position local theorem family
Parameterized local convergence theorem for the base positions xₙ.
The update queries the gradient at the look-ahead point
xₙ' = xₙ + √η vₙ, but the theorem states the rate for the state positions
xₙ. The retuned internal parameter μ', absorption budget θ, and momentum
ρ are all explicit.
Local convergence theorem for the base positions xₙ with explicit θ.
The extra formal assumption θ ≤ 1/4 follows from μ ≤ L and
θ ≤ √(μ/L)/8 in the public theorem, since μ ≤ L gives √(μ/L) ≤ 1 and
hence θ ≤ 1/8; keeping it explicit here isolates the
scalar rate comparison from the local geometry proof.