Documentation

LeanPool.PLAcceleratedNesterovLean.Convergence.NesterovConvergence

Retuned Nesterov Convergence #

The Nesterov algorithm achieves the sharp PL exponent by running with an internal parameter just below μ.

Key scalar comparison #

The general local theorem gives per-step Lyapunov contraction rate r = 1 − (1−θ)·√(μ'·η). The main specialization sets μ' = μ·(1−θ), η = 1/L, and chooses 0 < θ ≤ √(μ/L)/8.

Local theorem variants #

Rate bound: r ≤ exp(-√(μ/L)) #

theorem PLAcceleratedNesterovLean.nesterov_rate_bound_theta (L : NNReal) (hL : 0 < L) (μ : ) ( : 0 < μ) (θ : ) (hθ_pos : 0 < θ) (hθ_le : θ (μ / L) / 8) (hθ_le_quarter : θ 1 / 4) :
1 - (1 - θ) * (muOfTheta μ θ * (1 / L)) Real.exp (-(1 / (L / μ)))

If 0 < θ ≤ min (√(μ/L)/8) (1/4), then the retuned one-step factor from the theorem statement is bounded by the sharp exponential exp(-√(μ/L)).

State-position local theorem family #

theorem PLAcceleratedNesterovLean.nesterov_convergence_at_base_point_position_params {d : } (hd : 0 < d) (L : NNReal) (hL : 0 < L) (μ : ) ( : 0 < μ) (hμ_le_L : μ L) (μ' : ) (hμ'_pos : 0 < μ') (hμ'_lt : μ' < μ) (θ : ) (hθ_pos : 0 < θ) (hθ_lt1 : θ < 1) (ρ : ) ( : ρ = (1 - (μ' * (1 / L))) / (1 + (μ' * (1 / L)))) (hrate : 1 - (1 - θ) * (μ' * (1 / L)) Real.exp (-(1 / (L / μ)))) (f : E d) (S : Set (E d)) (hrange : S = argminSet f) (U : Set (E d)) (hTub : IsTubularNeighborhoodOfSubmanifold S U) (hPL : PolyakLojasiewicz f μ U) (hC2 : ContDiffOn 2 f U) (hLip : LipschitzOnWith L (gradient f) U) (π : E dE d) (hπ_on_U : xU, π x S dist x (π x) = Metric.infDist x S) (hπ_fix : xS, π x = x) (hπ_in_S : ∀ (x : E d), π x S) (hgrad_zero : xS, gradient f x = 0) (mstar : E d) (hmstar : mstar S) :
∃ (α : ), 0 < α Metric.ball mstar α U x₀Metric.ball mstar α, (∀ (k : ), (nesterovSeqGen f (1 / L) ρ { x := x₀, v := 0 } k).x U (nesterovSeqGen f (1 / L) ρ { x := x₀, v := 0 } k).lookahead (1 / L) U) HasAcceleratedRateWithPrefactorTwo f (fun (k : ) => (nesterovSeqGen f (1 / L) ρ { x := x₀, v := 0 } k).x) (↑L) μ x₀ HasAcceleratedRate f (fun (k : ) => (nesterovSeqGen f (1 / L) ρ { x := x₀, v := 0 } k).x) (↑L) μ

Parameterized local convergence theorem for the base positions xₙ.

The update queries the gradient at the look-ahead point xₙ' = xₙ + √η vₙ, but the theorem states the rate for the state positions xₙ. The retuned internal parameter μ', absorption budget θ, and momentum ρ are all explicit.

theorem PLAcceleratedNesterovLean.nesterov_convergence_at_base_point_position_theta {d : } (hd : 0 < d) (L : NNReal) (hL : 0 < L) (μ : ) ( : 0 < μ) (hμ_le_L : μ L) (θ : ) (hθ_pos : 0 < θ) (hθ_lt1 : θ < 1) (hθ_le : θ (μ / L) / 8) (hθ_le_quarter : θ 1 / 4) (f : E d) (S : Set (E d)) (hrange : S = argminSet f) (U : Set (E d)) (hTub : IsTubularNeighborhoodOfSubmanifold S U) (hPL : PolyakLojasiewicz f μ U) (hC2 : ContDiffOn 2 f U) (hLip : LipschitzOnWith L (gradient f) U) (π : E dE d) (hπ_on_U : xU, π x S dist x (π x) = Metric.infDist x S) (hπ_fix : xS, π x = x) (hπ_in_S : ∀ (x : E d), π x S) (hgrad_zero : xS, gradient f x = 0) (mstar : E d) (hmstar : mstar S) :
∃ (α : ), 0 < α Metric.ball mstar α U x₀Metric.ball mstar α, (∀ (k : ), (nesterovSeqGen f (1 / L) (rhoOfTheta (↑L) μ θ) { x := x₀, v := 0 } k).x U (nesterovSeqGen f (1 / L) (rhoOfTheta (↑L) μ θ) { x := x₀, v := 0 } k).lookahead (1 / L) U) HasAcceleratedRateWithPrefactorTwo f (fun (k : ) => (nesterovSeqGen f (1 / L) (rhoOfTheta (↑L) μ θ) { x := x₀, v := 0 } k).x) (↑L) μ x₀ HasAcceleratedRate f (fun (k : ) => (nesterovSeqGen f (1 / L) (rhoOfTheta (↑L) μ θ) { x := x₀, v := 0 } k).x) (↑L) μ

Local convergence theorem for the base positions xₙ with explicit θ.

The extra formal assumption θ ≤ 1/4 follows from μ ≤ L and θ ≤ √(μ/L)/8 in the public theorem; keeping it explicit here isolates the scalar rate comparison from the local geometry proof.