Documentation

LeanPool.FormalLearningTheory.Theorem.Extended

Extended Theorems #

Advice reduction, meta-learning lower-bound infrastructure, and separation results for compression and SQ dimension.

Multi-Task Meta-Learning Infrastructure #

structure TaskEnvironment (X : Type u) :

A task environment: a finite collection of concept classes (tasks) that a meta-learner is trained on. Each task is a concept class over the same domain X.

This is the formalization of Baxter (2000)'s "learning environment." In the full theory, tasks are drawn i.i.d. from a distribution over concept classes; here we use a finite deterministic collection as the base case.

Instances For
    structure MetaLearnerPAC (X : Type u) [MeasurableSpace X] :

    A meta-learner with PAC guarantees: given a task environment (training tasks), produces a BatchLearner and sample complexity function for new tasks.

    Compared to MetaLearner (in Active.lean), this structure:

    • takes a TaskEnvironment (multiple training tasks) rather than a single ConceptClass
    • exposes the sample complexity function (not just the learner)
    • is designed for quantitative PAC bounds, not just learnability

    The key question: does seeing n training tasks reduce the per-task sample complexity on new tasks? Baxter (2000) shows the answer is yes under task similarity, but the NFL lower bound still applies per-task.

    Instances For

      A task sample environment: n training tasks, each with m samples. The meta-learner observes labeled samples from each task and must produce a learner for a new (unseen) task.

      This extends TaskEnvironment by specifying sample sizes and the actual samples drawn. The meta-learner's output may depend on the samples but not on the true concepts.

      Instances For
        structure SampleMetaLearner (X : Type u) [MeasurableSpace X] :

        A sample-based meta-learner: sees labeled samples from n training tasks, produces a BatchLearner for new tasks. Unlike MetaLearnerPAC (which takes a TaskEnvironment directly), this meta-learner only sees the data, not the concept classes.

        • learn {n m : } : (Fin nFin mX × Bool)BatchLearner X Bool

          Given n × m labeled samples, produce a learner

        • sampleComplexity :

          Given n × m, produce sample complexity for the new task

        Instances For

          Advice Elimination Infrastructure #

          Joint measurability of a sample-dependent advice learner's evaluation map.

          Equations
          Instances For
            def PACLearnableWithAdviceRegular (X : Type u) [MeasurableSpace X] (C : ConceptClass X Bool) (A : Type u_1) [Fintype A] [nonemptyA : Nonempty A] :

            PAC learnability with finite advice, plus measurability for holdout validation.

            Equations
            • One or more equations did not get rendered due to their size.
            Instances For
              def adviceTrainSample {X : Type u} {m₁ m₂ : } (S : Fin (m₁ + m₂)X × Bool) :
              Fin m₁X × Bool

              First m₁ coordinates of a sample of size m₁ + m₂.

              Equations
              Instances For
                def adviceValSample {X : Type u} {m₁ m₂ : } (S : Fin (m₁ + m₂)X × Bool) :
                Fin m₂X × Bool

                Next m₂ coordinates of a sample of size m₁ + m₂.

                Equations
                Instances For
                  noncomputable def bestAdvice {X : Type u} {A : Type u_1} [Fintype A] [Nonempty A] (cand : AConcept X Bool) {m : } (Sval : Fin mX × Bool) :
                  A

                  Choose the advice value with minimum validation empirical error.

                  Equations
                  Instances For
                    noncomputable def adviceSelectedHypothesis {X : Type u} {A : Type u_1} [Fintype A] [Nonempty A] (LA : LearnerWithAdvice X Bool A) {m₁ m₂ : } (S : Fin (m₁ + m₂)X × Bool) :

                    The advice-elimination learner applied to a labeled sample.

                    Equations
                    Instances For

                      Advice elimination (Ben-David & Dichterman 1998): If C is PAC-learnable with concept-dependent advice from a FINITE set A (with measurability regularity), then C is PAC-learnable without advice.

                      Proof strategy: run the advice-augmented learner with each a ∈ A on a training portion of the sample, producing |A| candidate hypotheses. Use a validation portion to select the candidate with lowest empirical error. Union bound over |A| advice values + Hoeffding on validation controls total failure probability. Sample complexity: O(m_orig(ε/2, δ/(2|A|)) + log(|A|/δ)/ε²).

                      The [Fintype A] constraint is essential: for infinite A, the theorem is false (no finite union bound). [Nonempty A] ensures the advice space is inhabited.

                      theorem baxter_base_case (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (ML : MetaLearnerPAC X) (env : TaskEnvironment X) (C_new : ConceptClass X Bool) (d : ) (hd : VCDim X C_new = d) (hd_pos : 1 d) (ε δ : ) ( : 0 < ε) (hε1 : ε 1 / 4) ( : 0 < δ) (hδ1 : δ 1) (hδ2 : δ 1 / 7) (hPAC : ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure DcC_new, (MeasureTheory.Measure.pi fun (x : Fin (ML.sampleComplexity env ε δ)) => D) {xs : Fin (ML.sampleComplexity env ε δ)X | D {x : X | (ML.learn env).learn (fun (i : Fin (ML.sampleComplexity env ε δ)) => (xs i, c (xs i))) x c x} ENNReal.ofReal ε} ENNReal.ofReal (1 - δ)) :
                      (d - 1) / 2⌉₊ ML.sampleComplexity env ε δ

                      Baxter base case: any meta-learner's output is subject to the NFL lower bound. Even after seeing arbitrarily many training tasks, the meta-learner's output learner on a NEW task C_new with VCDim = d requires at least ⌈(d-1)/2⌉ samples.

                      This is the n=1 (single environment) base case of Baxter (2000). The full Baxter bound (n environments, per-task m ≥ d/(ε²·n)) requires multi-environment product measure infrastructure not yet built.

                      Proof: the meta-learner produces a BatchLearner L and sample complexity mf. If (L, mf) achieves PAC on C_new, then mf ε δ is a PAC-valid sample size, so pac_lower_bound_member gives ⌈(d-1)/2⌉ ≤ mf ε δ.

                      The full multi-environment Baxter bound is outside this kernel; this theorem records the single-environment lower-bound component.

                      theorem baxter_full (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (SML : SampleMetaLearner X) (C_new : ConceptClass X Bool) (d : ) (hd : VCDim X C_new = d) (hd_pos : 1 d) (ε δ : ) ( : 0 < ε) (hε1 : ε 1 / 4) ( : 0 < δ) (hδ1 : δ 1) (hδ2 : δ 1 / 7) (n m : ) (training_data : Fin nFin mX × Bool) (hPAC : ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure DcC_new, have mf := SML.sampleComplexity n m ε δ; (MeasureTheory.Measure.pi fun (x : Fin mf) => D) {xs : Fin mfX | D {x : X | (SML.learn training_data).learn (fun (i : Fin mf) => (xs i, c (xs i))) x c x} ENNReal.ofReal ε} ENNReal.ofReal (1 - δ)) :
                      (d - 1) / 2⌉₊ SML.sampleComplexity n m ε δ

                      Baxter's multi-task lower bound: any sample-based meta-learner that achieves PAC on a new task C_new with VCDim = d, after seeing n training tasks with m samples each, requires ⌈(d-1)/2⌉ samples for the new task.

                      This is the n-independent version. The n-dependent improvement m ≥ Ω(d/(ε²·n)) requires the product-measure information-theoretic argument.

                      Key insight: the meta-learner's output (L, mf) is a PAC witness for C_new. By pac_lower_bound_member, any PAC witness requires at least ⌈(d-1)/2⌉ samples. The meta-learner's training phase (seeing n tasks) cannot reduce this bound because the new task's concept class is adversarially chosen AFTER training.

                      The n-dependent improvement (Baxter 2000, Theorem 3): For the PRODUCT measure over n tasks × m samples per task, the adversary argument gives m ≥ Ω(d/(ε²·n)). This requires:

                      • TaskDistribution: a measure over concept classes
                      • Product measure: D^(n×m) decomposed as (D^m)^n
                      • Information-theoretic counting: n·m bits vs 2^d labelings This theorem proves the n-independent base case.

                      VC dimension does not determine SQ hardness: there exists a concept class with finite VC dimension but infinite SQ dimension under some distribution at some positive correlation threshold. Witness: singleton indicators on ℕ, with SQDimension = ⊤ at τ = 1. For any probability D on ℕ, the correlation between distinct indicators 1_i, 1_j is |1 - 2(D({i}) + D({j}))| ≤ 1, so every finite subset of C qualifies at τ = 1. Since C is infinite, SQDimension = ⊤. The statement quantifies the measurable domain, distribution, and threshold parameters used by SQDimension.