Theorem 2.16: PŁ ⟹ S is a submanifold #
If f is C² and satisfies PŁ around x₀, then S = localMinSet f x₀ is a C¹ submanifold around x₀ with tangent space T = ker(Hess f(x₀)).
Proof structure #
Core lemmas #
hessian_coercive_on_orthogonal_of_MuPL(§ 2): D²f(x₀) ≥ μ on T⊥. From the eigenvalue lower bound (Prop 2.12).hessian_injective_on_orthogonal(§ 2): ker(Hess) ∩ ker(Hess)⊥ = {0}. Pure linear algebra (Submodule.orthogonal_disjoint).ift_gives_graph(§ 4): Given ∇f(x₀) = 0 and Hessian coercivity on T⊥, the IFT produces a C¹ graph φ : T → T⊥ characterizing {projected gradient = 0}. Pure IFT, no PŁ needed.gradient_alignment(§ 4): Under PŁ, Df(x) = 0 ↔ projected gradient vanishes on T⊥. Uses constant rank (Cor 2.13) + self-adjointness of Hessian + PŁ.
Composed results #
fderiv_eq_zero_of_isLocalMin(§ 1): Df(x₀) = 0 at any local min. Proved directly via Mathlib'sIsLocalMin.fderiv_eq_zero.localMinSet_iff_critical_near(§ 3): proved from PŁ + local min. Chains Fermat's theorem with PŁ-based reverse.critical_set_is_graph(§ 4): composed from (5) + (1) + (3) + (4). Composes IFT graph with gradient alignment.MuPL.implies_submanifold(§ 5): composed from (6) + (7).
Mathematical outline #
The projected gradient map G : E → T⊥ is defined by G(x) = π_{T⊥}(∇f(x)), where ∇f(x) ∈ E is the Riesz representative of Df(x) ∈ E*.
- G(x₀) = 0 since Df(x₀) = 0 (gradient vanishes at local min, § 1).
- DG(x₀) = π_{T⊥} ∘ Hess_Riesz(x₀). Under PŁ, the eigenvalue lower bound (Prop 2.12) gives DG(x₀)|_{T⊥} invertible (eigenvalues ≥ μ > 0). See § 2.
- The IFT applied to G yields φ : T → T⊥ with φ(0) = 0, Dφ(0) = 0, characterizing {G = 0} as a graph. (Dφ(0) = 0 follows from DG(x₀)|_T = 0 since T = ker Hess.)
- Near x₀: x ∈ S iff Df(x) = 0 iff G(x) = 0 (gradient alignment, Lemma 2.14, and constant rank, Cor 2.13). See § 3.
- Combining: S = graph(φ) near x₀, giving the submanifold structure.
References #
- Rebjock & Boumal, Theorem 2.16, Lemmas 2.14–2.15.
- Implicit function theorem: Mathlib
IsContDiffImplicitAt/ImplicitFunctionData.
At a local minimum, the Fréchet derivative vanishes (Fermat's theorem).
Under μ-PŁ, the Hessian D²f(x₀) is μ-coercive on T⊥ = (hessianKer f x₀)⊥.
The Hessian is injective on the orthogonal complement of its kernel.
Proof: If v ∈ T⊥ and Hv = 0, then v ∈ T = ker(Hess), so
v ∈ T ∩ T⊥. In an inner product space T ∩ T⊥ = ⊥
(Submodule.orthogonal_disjoint), hence v = 0.
This is a corollary of hessian_coercive_on_orthogonal_of_MuPL
(coercivity ⟹ injectivity), but can also be proved directly
as pure linear algebra without PŁ.
Near a local minimizer satisfying μ-PŁ, membership in S = localMinSet f x₀ is equivalent to being a critical point (Df(x) = 0).
Forward (x ∈ S ⟹ Df(x) = 0): Fermat's theorem — every local
minimizer has zero derivative (IsLocalMin.fderiv_eq_zero).
Reverse (Df(x) = 0 ⟹ x ∈ S, for x near x₀):
- μ-PŁ gives f(x) − f(x₀) ≤ (2μ)⁻¹ ‖Df(x)‖² = 0, so f(x) ≤ f(x₀).
- x₀ is a local min, so f(y) ≥ f(x₀) for y near x₀. Since x is near x₀, f(x) ≥ f(x₀). Combined: f(x) = f(x₀).
- x inherits the local-min property: f(y) ≥ f(x₀) = f(x) for y near both x and x₀.
IFT core step: Given ∇f(x₀) = 0 and Hessian μ-coercive on T⊥, the IFT produces a C¹ graph φ : T → T⊥ characterizing the projected critical set {G = 0} near x₀, where G(x) = π_{T⊥}(∇f(x)).
Setup: Decompose E ≅ T × T⊥ via the orthogonal direct sum. Define
the projected gradient map:
G : T × T⊥ → T⊥, G(t, n) = π_{T⊥}(∇f(x₀ + t + n))
In Lean, "G(x) = 0" is expressed as
∀ w : T⊥, Df(x)(w) = 0
i.e., the Fréchet derivative at x vanishes on all normal vectors.
Hypotheses used:
• hgrad: G(0,0) = π_{T⊥}(∇f(x₀)) = 0.
• hcoer: ∂G/∂n(0,0) = π_{T⊥} ∘ Hess_Riesz(x₀)|_{T⊥} is μ-coercive,
hence a continuous linear equivalence T⊥ ≃L[ℝ] T⊥.
• hf: f is C², so G is C¹ (IsContDiffImplicitAt).
IFT output: φ : T → T⊥ with φ(0) = 0, Dφ(0) = 0, φ is C¹, and {G = 0} = graph(φ) near x₀.
Important: This characterizes {G = 0} (projected gradient vanishes
on T⊥), not {Df = 0} (full gradient vanishes). Without PŁ, these
differ: e.g. f(x,y) = y²/2 + x³/3 has {G = 0} = {y = 0} but
{Df = 0} = {(0,0)}. The upgrade {G = 0} → {Df = 0} requires
gradient_alignment, which uses PŁ.
Gradient alignment (Lemma 2.14 / Cor 2.13 in the paper). Under PŁ, the full gradient vanishes iff its T⊥-projection vanishes.
Forward (trivial): Df(x) = 0 ⟹ Df(x)(w) = 0 for all w ∈ T⊥.
Backward (uses PŁ): If Df(x) vanishes on T⊥, i.e. ∇f(x) ∈ T, then ∇f(x) = 0. The key argument uses self-adjointness of the Hessian: since H is symmetric and T = ker(H), H maps T⊥ to T⊥, so π_T(∇f(x)) = π_T(H(x − x₀) + o(‖x − x₀‖)) = o(‖x − x₀‖). Thus ‖Df(x)‖ = o(‖x − x₀‖) when G(x) = 0. Combined with PŁ (f(x) − f(x₀) ≤ (2μ)⁻¹ ‖Df(x)‖²) and f(x) ≥ f(x₀), constant rank of the Hessian on S (Cor 2.13) forces Df(x) = 0 near x₀.
IFT for the critical set (Lemma 2.15 in the paper).
The critical set {x near x₀ | Df(x) = 0} is locally a C¹ graph over T = ker(Hess f(x₀)): there exist U ∈ 𝓝 x₀ and a C¹ map φ : T → T⊥ with φ(0) = 0, Dφ(0) = 0, such that for x ∈ U:
Df(x) = 0 ↔ π_{T⊥}(x − x₀) = φ(π_T(x − x₀))
Proof (composition of §§ 1–2, ift_gives_graph, and
gradient_alignment):
fderiv_eq_zero_of_isLocalMin: ∇f(x₀) = 0 (Fermat, § 1).hessian_coercive_on_orthogonal_of_MuPL: D²f(x₀) ≥ μ on T⊥ (§ 2).ift_gives_graph: IFT characterizes {projected gradient = 0} as graph(φ) from (1) and (2).gradient_alignment: PŁ upgrades {proj. gradient = 0} to {Df = 0}.
Theorem 2.16 (PŁ ⟹ S is a submanifold).
If f is C² and satisfies PŁ around x₀, then S = localMinSet f x₀ is a C¹ submanifold around x₀ with tangent space T = ker(Hess f(x₀)).
Proof: Two parts.
- Membership: x₀ ∈ S holds by
self_mem_localMinSet. - Graph structure: Chain two equivalences on intersected neighborhoods:
localMinSet_iff_critical_near(§ 3): x ∈ S ↔ Df(x) = 0critical_set_is_graph(§ 4): Df(x) = 0 ↔ graph(φ) giving x ∈ S ↔ π_{T⊥}(x − x₀) = φ(π_T(x − x₀)).
C³ version of critical_set_is_graph: yields a C² graph.
C³ version of MuPL.implies_submanifold: the local-min set is locally a
C² graph.