Documentation

LeanPool.PLAcceleratedNesterovLean.MorseBott.Submanifold

Theorem 2.16: PŁ ⟹ S is a submanifold #

If f is C² and satisfies PŁ around x₀, then S = localMinSet f x₀ is a C¹ submanifold around x₀ with tangent space T = ker(Hess f(x₀)).

Proof structure #

Core lemmas #

  1. hessian_coercive_on_orthogonal_of_MuPL (§ 2): D²f(x₀) ≥ μ on T⊥. From the eigenvalue lower bound (Prop 2.12).
  2. hessian_injective_on_orthogonal (§ 2): ker(Hess) ∩ ker(Hess)⊥ = {0}. Pure linear algebra (Submodule.orthogonal_disjoint).
  3. ift_gives_graph (§ 4): Given ∇f(x₀) = 0 and Hessian coercivity on T⊥, the IFT produces a C¹ graph φ : T → T⊥ characterizing {projected gradient = 0}. Pure IFT, no PŁ needed.
  4. gradient_alignment (§ 4): Under PŁ, Df(x) = 0 ↔ projected gradient vanishes on T⊥. Uses constant rank (Cor 2.13) + self-adjointness of Hessian + PŁ.

Composed results #

  1. fderiv_eq_zero_of_isLocalMin (§ 1): Df(x₀) = 0 at any local min. Proved directly via Mathlib's IsLocalMin.fderiv_eq_zero.
  2. localMinSet_iff_critical_near (§ 3): proved from PŁ + local min. Chains Fermat's theorem with PŁ-based reverse.
  3. critical_set_is_graph (§ 4): composed from (5) + (1) + (3) + (4). Composes IFT graph with gradient alignment.
  4. MuPL.implies_submanifold (§ 5): composed from (6) + (7).

Mathematical outline #

The projected gradient map G : E → T⊥ is defined by G(x) = π_{T⊥}(∇f(x)), where ∇f(x) ∈ E is the Riesz representative of Df(x) ∈ E*.

  1. G(x₀) = 0 since Df(x₀) = 0 (gradient vanishes at local min, § 1).
  2. DG(x₀) = π_{T⊥} ∘ Hess_Riesz(x₀). Under PŁ, the eigenvalue lower bound (Prop 2.12) gives DG(x₀)|_{T⊥} invertible (eigenvalues ≥ μ > 0). See § 2.
  3. The IFT applied to G yields φ : T → T⊥ with φ(0) = 0, Dφ(0) = 0, characterizing {G = 0} as a graph. (Dφ(0) = 0 follows from DG(x₀)|_T = 0 since T = ker Hess.)
  4. Near x₀: x ∈ S iff Df(x) = 0 iff G(x) = 0 (gradient alignment, Lemma 2.14, and constant rank, Cor 2.13). See § 3.
  5. Combining: S = graph(φ) near x₀, giving the submanifold structure.

References #

theorem PLAcceleratedNesterovLean.fderiv_eq_zero_of_isLocalMin {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] (f : E) (x₀ : E) (hmin : IsLocalMin f x₀) :
fderiv f x₀ = 0

At a local minimum, the Fréchet derivative vanishes (Fermat's theorem).

theorem PLAcceleratedNesterovLean.hessian_coercive_on_orthogonal_of_MuPL {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] [FiniteDimensional E] (f : E) (μ : ) (x₀ : E) ( : 0 < μ) (hf : ContDiffAt 2 f x₀) (hmin : IsLocalMin f x₀) (hPL : MuPL f μ x₀) (v : E) :
v (hessianKer f x₀)((hessian f x₀) v) v μ * v ^ 2

Under μ-PŁ, the Hessian D²f(x₀) is μ-coercive on T⊥ = (hessianKer f x₀)⊥.

theorem PLAcceleratedNesterovLean.hessian_injective_on_orthogonal {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] (f : E) (x₀ : E) (v : (hessianKer f x₀)) (hv : (hessian f x₀) v = 0) :
v = 0

The Hessian is injective on the orthogonal complement of its kernel.

Proof: If v ∈ T⊥ and Hv = 0, then v ∈ T = ker(Hess), so v ∈ T ∩ T⊥. In an inner product space T ∩ T⊥ = ⊥ (Submodule.orthogonal_disjoint), hence v = 0.

This is a corollary of hessian_coercive_on_orthogonal_of_MuPL (coercivity ⟹ injectivity), but can also be proved directly as pure linear algebra without PŁ.

theorem PLAcceleratedNesterovLean.localMinSet_iff_critical_near {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] (f : E) (μ : ) (x₀ : E) (_hμ : 0 < μ) (_hf : ContDiffAt 2 f x₀) (hmin : IsLocalMin f x₀) (hPL : MuPL f μ x₀) :
Vnhds x₀, xV, x localMinSet f x₀ fderiv f x = 0

Near a local minimizer satisfying μ-PŁ, membership in S = localMinSet f x₀ is equivalent to being a critical point (Df(x) = 0).

Forward (x ∈ S ⟹ Df(x) = 0): Fermat's theorem — every local minimizer has zero derivative (IsLocalMin.fderiv_eq_zero).

Reverse (Df(x) = 0 ⟹ x ∈ S, for x near x₀):

  1. μ-PŁ gives f(x) − f(x₀) ≤ (2μ)⁻¹ ‖Df(x)‖² = 0, so f(x) ≤ f(x₀).
  2. x₀ is a local min, so f(y) ≥ f(x₀) for y near x₀. Since x is near x₀, f(x) ≥ f(x₀). Combined: f(x) = f(x₀).
  3. x inherits the local-min property: f(y) ≥ f(x₀) = f(x) for y near both x and x₀.
theorem PLAcceleratedNesterovLean.ift_gives_graph {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] [FiniteDimensional E] (f : E) (μ : ) (x₀ : E) ( : 0 < μ) (hf : ContDiffAt 2 f x₀) (hgrad : fderiv f x₀ = 0) (hcoer : v(hessianKer f x₀), ((hessian f x₀) v) v μ * v ^ 2) :
∃ (U : Set E) (_ : U nhds x₀) (φ : (hessianKer f x₀)(hessianKer f x₀)), ContDiffAt 1 φ 0 φ 0 = 0 fderiv φ 0 = 0 xU, (∀ (w : (hessianKer f x₀)), (fderiv f x) w = 0) ((hessianKer f x₀).orthogonalProjectionOnto (x - x₀)) = (φ ((hessianKer f x₀).orthogonalProjectionOnto (x - x₀)))

IFT core step: Given ∇f(x₀) = 0 and Hessian μ-coercive on T⊥, the IFT produces a C¹ graph φ : T → T⊥ characterizing the projected critical set {G = 0} near x₀, where G(x) = π_{T⊥}(∇f(x)).

Setup: Decompose E ≅ T × T⊥ via the orthogonal direct sum. Define the projected gradient map: G : T × T⊥ → T⊥, G(t, n) = π_{T⊥}(∇f(x₀ + t + n)) In Lean, "G(x) = 0" is expressed as ∀ w : T⊥, Df(x)(w) = 0 i.e., the Fréchet derivative at x vanishes on all normal vectors.

Hypotheses used: • hgrad: G(0,0) = π_{T⊥}(∇f(x₀)) = 0. • hcoer: ∂G/∂n(0,0) = π_{T⊥} ∘ Hess_Riesz(x₀)|_{T⊥} is μ-coercive, hence a continuous linear equivalence T⊥ ≃L[ℝ] T⊥. • hf: f is C², so G is C¹ (IsContDiffImplicitAt).

IFT output: φ : T → T⊥ with φ(0) = 0, Dφ(0) = 0, φ is C¹, and {G = 0} = graph(φ) near x₀.

Important: This characterizes {G = 0} (projected gradient vanishes on T⊥), not {Df = 0} (full gradient vanishes). Without PŁ, these differ: e.g. f(x,y) = y²/2 + x³/3 has {G = 0} = {y = 0} but {Df = 0} = {(0,0)}. The upgrade {G = 0} → {Df = 0} requires gradient_alignment, which uses PŁ.

theorem PLAcceleratedNesterovLean.gradient_alignment {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] [FiniteDimensional E] (f : E) (μ : ) (x₀ : E) ( : 0 < μ) (hf : ContDiffAt 2 f x₀) (hmin : IsLocalMin f x₀) (hPL : MuPL f μ x₀) :
Wnhds x₀, xW, fderiv f x = 0 ∀ (w : (hessianKer f x₀)), (fderiv f x) w = 0

Gradient alignment (Lemma 2.14 / Cor 2.13 in the paper). Under PŁ, the full gradient vanishes iff its T⊥-projection vanishes.

Forward (trivial): Df(x) = 0 ⟹ Df(x)(w) = 0 for all w ∈ T⊥.

Backward (uses PŁ): If Df(x) vanishes on T⊥, i.e. ∇f(x) ∈ T, then ∇f(x) = 0. The key argument uses self-adjointness of the Hessian: since H is symmetric and T = ker(H), H maps T⊥ to T⊥, so π_T(∇f(x)) = π_T(H(x − x₀) + o(‖x − x₀‖)) = o(‖x − x₀‖). Thus ‖Df(x)‖ = o(‖x − x₀‖) when G(x) = 0. Combined with PŁ (f(x) − f(x₀) ≤ (2μ)⁻¹ ‖Df(x)‖²) and f(x) ≥ f(x₀), constant rank of the Hessian on S (Cor 2.13) forces Df(x) = 0 near x₀.

theorem PLAcceleratedNesterovLean.critical_set_is_graph {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] [FiniteDimensional E] (f : E) (μ : ) (x₀ : E) ( : 0 < μ) (hf : ContDiffAt 2 f x₀) (hmin : IsLocalMin f x₀) (hPL : MuPL f μ x₀) :
∃ (U : Set E) (_ : U nhds x₀) (φ : (hessianKer f x₀)(hessianKer f x₀)), ContDiffAt 1 φ 0 φ 0 = 0 fderiv φ 0 = 0 xU, fderiv f x = 0 ((hessianKer f x₀).orthogonalProjectionOnto (x - x₀)) = (φ ((hessianKer f x₀).orthogonalProjectionOnto (x - x₀)))

IFT for the critical set (Lemma 2.15 in the paper).

The critical set {x near x₀ | Df(x) = 0} is locally a C¹ graph over T = ker(Hess f(x₀)): there exist U ∈ 𝓝 x₀ and a C¹ map φ : T → T⊥ with φ(0) = 0, Dφ(0) = 0, such that for x ∈ U:

Df(x) = 0 ↔ π_{T⊥}(x − x₀) = φ(π_T(x − x₀))

Proof (composition of §§ 1–2, ift_gives_graph, and gradient_alignment):

  1. fderiv_eq_zero_of_isLocalMin: ∇f(x₀) = 0 (Fermat, § 1).
  2. hessian_coercive_on_orthogonal_of_MuPL: D²f(x₀) ≥ μ on T⊥ (§ 2).
  3. ift_gives_graph: IFT characterizes {projected gradient = 0} as graph(φ) from (1) and (2).
  4. gradient_alignment: PŁ upgrades {proj. gradient = 0} to {Df = 0}.
theorem PLAcceleratedNesterovLean.MuPL.implies_submanifold {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] [FiniteDimensional E] (f : E) (μ : ) (x₀ : E) ( : 0 < μ) (hf : ContDiffAt 2 f x₀) (hmin : IsLocalMin f x₀) (hPL : MuPL f μ x₀) :

Theorem 2.16 (PŁ ⟹ S is a submanifold).

If f is C² and satisfies PŁ around x₀, then S = localMinSet f x₀ is a C¹ submanifold around x₀ with tangent space T = ker(Hess f(x₀)).

Proof: Two parts.

  1. Membership: x₀ ∈ S holds by self_mem_localMinSet.
  2. Graph structure: Chain two equivalences on intersected neighborhoods:
theorem PLAcceleratedNesterovLean.critical_set_is_graph₂ {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] [FiniteDimensional E] (f : E) (μ : ) (x₀ : E) ( : 0 < μ) (hf : ContDiffAt 3 f x₀) (hmin : IsLocalMin f x₀) (hPL : MuPL f μ x₀) :
∃ (U : Set E) (_ : U nhds x₀) (φ : (hessianKer f x₀)(hessianKer f x₀)), ContDiffAt 2 φ 0 φ 0 = 0 fderiv φ 0 = 0 xU, fderiv f x = 0 ((hessianKer f x₀).orthogonalProjectionOnto (x - x₀)) = (φ ((hessianKer f x₀).orthogonalProjectionOnto (x - x₀)))

C³ version of critical_set_is_graph: yields a C² graph.

theorem PLAcceleratedNesterovLean.MuPL.implies_submanifold_C2_chart {E : Type u_1} [NormedAddCommGroup E] [InnerProductSpace E] [FiniteDimensional E] (f : E) (μ : ) (x₀ : E) ( : 0 < μ) (hf : ContDiffAt 3 f x₀) (hmin : IsLocalMin f x₀) (hPL : MuPL f μ x₀) :
x₀ localMinSet f x₀ ∃ (U : Set E) (_ : U nhds x₀) (φ : (hessianKer f x₀)(hessianKer f x₀)), ContDiffAt 2 φ 0 φ 0 = 0 fderiv φ 0 = 0 xU, x localMinSet f x₀ ((hessianKer f x₀).orthogonalProjectionOnto (x - x₀)) = (φ ((hessianKer f x₀).orthogonalProjectionOnto (x - x₀)))

C³ version of MuPL.implies_submanifold: the local-min set is locally a C² graph.