Retuned Nesterov Convergence #

The Nesterov algorithm achieves the sharp PL exponent by running with an internal parameter just below μ.

Key scalar comparison #

The general local theorem gives per-step Lyapunov contraction rate r = 1 − (1−θ)·√(μ'·η). The main specialization sets μ' = μ·(1−θ), η = 1/L, and chooses 0 < θ ≤ √(μ/L)/8.

Local theorem variants #

nesterov_convergence_at_base_point_position_params: state positions xₖ; arbitrary μ', θ, ρ, plus a scalar rate bound; reusable core theorem.
nesterov_convergence_at_base_point_position_theta: state positions xₖ; explicit retuning parameter θ, ρ = rhoOfTheta; local specialized theorem.

Rate bound: r ≤ exp(-√(μ/L)) #

source

theorem PLAcceleratedNesterovLean.nesterov_rate_bound_theta (L : NNReal) (hL : 0 < ↑L) (μ : ℝ) (hμ : 0 < μ) (θ : ℝ) (hθ_pos : 0 < θ) (hθ_le : θ ≤ √(μ / ↑L) / 8) (hθ_le_quarter : θ ≤ 1 / 4) :

1 - (1 - θ) * √(muOfTheta μ θ * (1 / ↑L)) ≤ Real.exp (-(1 / √(↑L / μ)))

If 0 < θ ≤ min (√(μ/L)/8) (1/4), then the retuned one-step factor from the theorem statement is bounded by the sharp exponential exp(-√(μ/L)).

State-position local theorem family #

source

theorem PLAcceleratedNesterovLean.nesterov_convergence_at_base_point_position_params {d : ℕ} (hd : 0 < d) (L : NNReal) (hL : 0 < ↑L) (μ : ℝ) (hμ : 0 < μ) (hμ_le_L : μ ≤ ↑L) (μ' : ℝ) (hμ'_pos : 0 < μ') (hμ'_lt : μ' < μ) (θ : ℝ) (hθ_pos : 0 < θ) (hθ_lt1 : θ < 1) (ρ : ℝ) (hρ : ρ = (1 - √(μ' * (1 / ↑L))) / (1 + √(μ' * (1 / ↑L)))) (hrate : 1 - (1 - θ) * √(μ' * (1 / ↑L)) ≤ Real.exp (-(1 / √(↑L / μ)))) (f : E d → ℝ) (S : Set (E d)) (hrange : S = argminSet f) (U : Set (E d)) (hTub : IsTubularNeighborhoodOfSubmanifold S U) (hPL : PolyakLojasiewicz f μ U) (hC2 : ContDiffOn ℝ 2 f U) (hLip : LipschitzOnWith L (gradient f) U) (π : E d → E d) (hπ_on_U : ∀ x ∈ U, π x ∈ S ∧ dist x (π x) = Metric.infDist x S) (hπ_fix : ∀ x ∈ S, π x = x) (hπ_in_S : ∀ (x : E d), π x ∈ S) (hgrad_zero : ∀ x ∈ S, gradient f x = 0) (mstar : E d) (hmstar : mstar ∈ S) :

∃ (α : ℝ), 0 < α ∧ Metric.ball mstar α ⊆ U ∧ ∀ x₀ ∈ Metric.ball mstar α, (∀ (k : ℕ), (nesterovSeqGen f (1 / ↑L) ρ { x := x₀, v := 0 } k).x ∈ U ∧ (nesterovSeqGen f (1 / ↑L) ρ { x := x₀, v := 0 } k).lookahead (1 / ↑L) ∈ U) ∧ HasAcceleratedRateWithPrefactorTwo f (fun (k : ℕ) => (nesterovSeqGen f (1 / ↑L) ρ { x := x₀, v := 0 } k).x) (↑L) μ x₀ ∧ HasAcceleratedRate f (fun (k : ℕ) => (nesterovSeqGen f (1 / ↑L) ρ { x := x₀, v := 0 } k).x) (↑L) μ

Parameterized local convergence theorem for the base positions xₙ.

The update queries the gradient at the look-ahead point xₙ' = xₙ + √η vₙ, but the theorem states the rate for the state positions xₙ. The retuned internal parameter μ', absorption budget θ, and momentum ρ are all explicit.

source

theorem PLAcceleratedNesterovLean.nesterov_convergence_at_base_point_position_theta {d : ℕ} (hd : 0 < d) (L : NNReal) (hL : 0 < ↑L) (μ : ℝ) (hμ : 0 < μ) (hμ_le_L : μ ≤ ↑L) (θ : ℝ) (hθ_pos : 0 < θ) (hθ_lt1 : θ < 1) (hθ_le : θ ≤ √(μ / ↑L) / 8) (hθ_le_quarter : θ ≤ 1 / 4) (f : E d → ℝ) (S : Set (E d)) (hrange : S = argminSet f) (U : Set (E d)) (hTub : IsTubularNeighborhoodOfSubmanifold S U) (hPL : PolyakLojasiewicz f μ U) (hC2 : ContDiffOn ℝ 2 f U) (hLip : LipschitzOnWith L (gradient f) U) (π : E d → E d) (hπ_on_U : ∀ x ∈ U, π x ∈ S ∧ dist x (π x) = Metric.infDist x S) (hπ_fix : ∀ x ∈ S, π x = x) (hπ_in_S : ∀ (x : E d), π x ∈ S) (hgrad_zero : ∀ x ∈ S, gradient f x = 0) (mstar : E d) (hmstar : mstar ∈ S) :

∃ (α : ℝ), 0 < α ∧ Metric.ball mstar α ⊆ U ∧ ∀ x₀ ∈ Metric.ball mstar α, (∀ (k : ℕ), (nesterovSeqGen f (1 / ↑L) (rhoOfTheta (↑L) μ θ) { x := x₀, v := 0 } k).x ∈ U ∧ (nesterovSeqGen f (1 / ↑L) (rhoOfTheta (↑L) μ θ) { x := x₀, v := 0 } k).lookahead (1 / ↑L) ∈ U) ∧ HasAcceleratedRateWithPrefactorTwo f (fun (k : ℕ) => (nesterovSeqGen f (1 / ↑L) (rhoOfTheta (↑L) μ θ) { x := x₀, v := 0 } k).x) (↑L) μ x₀ ∧ HasAcceleratedRate f (fun (k : ℕ) => (nesterovSeqGen f (1 / ↑L) (rhoOfTheta (↑L) μ θ) { x := x₀, v := 0 } k).x) (↑L) μ

Local convergence theorem for the base positions xₙ with explicit θ.

The extra formal assumption θ ≤ 1/4 follows from μ ≤ L and θ ≤ √(μ/L)/8 in the public theorem; keeping it explicit here isolates the scalar rate comparison from the local geometry proof.

Documentation

LeanPool.PLAcceleratedNesterovLean.Convergence.NesterovConvergence

Retuned Nesterov Convergence #

Key scalar comparison #

Local theorem variants #

Rate bound: r ≤ exp(-√(μ/L)) #

State-position local theorem family #