Extended Theorems #

Advice reduction, meta-learning lower-bound infrastructure, and separation results for compression and SQ dimension.

Multi-Task Meta-Learning Infrastructure #

structure TaskEnvironment (X : Type u) :

A task environment: a finite collection of concept classes (tasks) that a meta-learner is trained on. Each task is a concept class over the same domain X.

This is the formalization of Baxter (2000)'s "learning environment." In the full theory, tasks are drawn i.i.d. from a distribution over concept classes; here we use a finite deterministic collection as the base case.

numTasks : ℕ
Number of training tasks
tasks : Fin self.numTasks → ConceptClass X Bool
The concept classes for each task

Instances For

source

structure MetaLearnerPAC (X : Type u) [MeasurableSpace X] :

Type u

A meta-learner with PAC guarantees: given a task environment (training tasks), produces a BatchLearner and sample complexity function for new tasks.

Compared to MetaLearner (in Active.lean), this structure:

takes a TaskEnvironment (multiple training tasks) rather than a single ConceptClass
exposes the sample complexity function (not just the learner)
is designed for quantitative PAC bounds, not just learnability

The key question: does seeing n training tasks reduce the per-task sample complexity on new tasks? Baxter (2000) shows the answer is yes under task similarity, but the NFL lower bound still applies per-task.

learn : TaskEnvironment X → BatchLearner X Bool
Given training tasks, produce a learner for new tasks
sampleComplexity : TaskEnvironment X → ℝ → ℝ → ℕ
Given training tasks, produce a sample complexity function

Instances For

source

structure TaskSampleEnvironment (X : Type u) [MeasurableSpace X] :

Type u

A task sample environment: n training tasks, each with m samples. The meta-learner observes labeled samples from each task and must produce a learner for a new (unseen) task.

This extends TaskEnvironment by specifying sample sizes and the actual samples drawn. The meta-learner's output may depend on the samples but not on the true concepts.

numTasks : ℕ
Number of training tasks
samplesPerTask : ℕ
Samples per task
taskClasses : Fin self.numTasks → ConceptClass X Bool
The concept classes (one per task)
trueConcepts (j : Fin self.numTasks) : Concept X Bool
The true concepts (one per task, each in its class)
concept_mem (j : Fin self.numTasks) : self.trueConcepts j ∈ self.taskClasses j
Each true concept is in its class

Instances For

source

structure SampleMetaLearner (X : Type u) [MeasurableSpace X] :

Type u

A sample-based meta-learner: sees labeled samples from n training tasks, produces a BatchLearner for new tasks. Unlike MetaLearnerPAC (which takes a TaskEnvironment directly), this meta-learner only sees the data, not the concept classes.

learn {n m : ℕ} : (Fin n → Fin m → X × Bool) → BatchLearner X Bool
Given n × m labeled samples, produce a learner
sampleComplexity : ℕ → ℕ → ℝ → ℝ → ℕ
Given n × m, produce sample complexity for the new task

Instances For

Advice Elimination Infrastructure #

source

def AdviceEvalMeasurable {X : Type u} [MeasurableSpace X] {A : Type u_1} (LA : LearnerWithAdvice X Bool A) :

Prop

Joint measurability of a sample-dependent advice learner's evaluation map.

Equations

AdviceEvalMeasurable LA = ∀ (a : A) (m : ℕ), Measurable fun (p : (Fin m → X × Bool) × X) => LA.learnWithAdvice a p.1 p.2

Instances For

source

def PACLearnableWithAdviceRegular (X : Type u) [MeasurableSpace X] (C : ConceptClass X Bool) (A : Type u_1) [Fintype A] [nonemptyA : Nonempty A] :

Prop

PAC learnability with finite advice, plus measurability for holdout validation.

Equations

One or more equations did not get rendered due to their size.

Instances For

source

def adviceTrainSample {X : Type u} {m₁ m₂ : ℕ} (S : Fin (m₁ + m₂) → X × Bool) :

Fin m₁ → X × Bool

First m₁ coordinates of a sample of size m₁ + m₂.

Equations

adviceTrainSample S i = S ⟨↑i, ⋯⟩

Instances For

source

def adviceValSample {X : Type u} {m₁ m₂ : ℕ} (S : Fin (m₁ + m₂) → X × Bool) :

Fin m₂ → X × Bool

Next m₂ coordinates of a sample of size m₁ + m₂.

Equations

adviceValSample S j = S ⟨m₁ + ↑j, ⋯⟩

Instances For

source

noncomputable def bestAdvice {X : Type u} {A : Type u_1} [Fintype A] [Nonempty A] (cand : A → Concept X Bool) {m : ℕ} (Sval : Fin m → X × Bool) :

Choose the advice value with minimum validation empirical error.

Equations

bestAdvice cand Sval = Classical.choose ⋯

Instances For

source

noncomputable def adviceSelectedHypothesis {X : Type u} {A : Type u_1} [Fintype A] [Nonempty A] (LA : LearnerWithAdvice X Bool A) {m₁ m₂ : ℕ} (S : Fin (m₁ + m₂) → X × Bool) :

Concept X Bool

The advice-elimination learner applied to a labeled sample.

Equations

adviceSelectedHypothesis LA S = (fun (a : A) => LA.learnWithAdvice a (adviceTrainSample S)) (bestAdvice (fun (a : A) => LA.learnWithAdvice a (adviceTrainSample S)) (adviceValSample S))

Instances For

source

theorem advice_elimination (X : Type u) [MeasurableSpace X] (C : ConceptClass X Bool) [MeasurableHypotheses X C] (A : Type u_1) [Fintype A] [Nonempty A] :

PACLearnableWithAdviceRegular X C A → PACLearnable X C

Advice elimination (Ben-David & Dichterman 1998): If C is PAC-learnable with concept-dependent advice from a FINITE set A (with measurability regularity), then C is PAC-learnable without advice.

Proof strategy: run the advice-augmented learner with each a ∈ A on a training portion of the sample, producing |A| candidate hypotheses. Use a validation portion to select the candidate with lowest empirical error. Union bound over |A| advice values + Hoeffding on validation controls total failure probability. Sample complexity: O(m_orig(ε/2, δ/(2|A|)) + log(|A|/δ)/ε²).

The [Fintype A] constraint is essential: for infinite A, the theorem is false (no finite union bound). [Nonempty A] ensures the advice space is inhabited.

source

theorem baxter_base_case (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (ML : MetaLearnerPAC X) (env : TaskEnvironment X) (C_new : ConceptClass X Bool) (d : ℕ) (hd : VCDim X C_new = ↑d) (hd_pos : 1 ≤ d) (ε δ : ℝ) (hε : 0 < ε) (hε1 : ε ≤ 1 / 4) (hδ : 0 < δ) (hδ1 : δ ≤ 1) (hδ2 : δ ≤ 1 / 7) (hPAC : ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ C_new, (MeasureTheory.Measure.pi fun (x : Fin (ML.sampleComplexity env ε δ)) => D) {xs : Fin (ML.sampleComplexity env ε δ) → X | D {x : X | (ML.learn env).learn (fun (i : Fin (ML.sampleComplexity env ε δ)) => (xs i, c (xs i))) x ≠ c x} ≤ ENNReal.ofReal ε} ≥ ENNReal.ofReal (1 - δ)) :

⌈(↑d - 1) / 2⌉₊ ≤ ML.sampleComplexity env ε δ

Baxter base case: any meta-learner's output is subject to the NFL lower bound. Even after seeing arbitrarily many training tasks, the meta-learner's output learner on a NEW task C_new with VCDim = d requires at least ⌈(d-1)/2⌉ samples.

This is the n=1 (single environment) base case of Baxter (2000). The full Baxter bound (n environments, per-task m ≥ d/(ε²·n)) requires multi-environment product measure infrastructure not yet built.

Proof: the meta-learner produces a BatchLearner L and sample complexity mf. If (L, mf) achieves PAC on C_new, then mf ε δ is a PAC-valid sample size, so pac_lower_bound_member gives ⌈(d-1)/2⌉ ≤ mf ε δ.

The full multi-environment Baxter bound is outside this kernel; this theorem records the single-environment lower-bound component.

source

theorem baxter_full (X : Type u) [MeasurableSpace X] [MeasurableSingletonClass X] (SML : SampleMetaLearner X) (C_new : ConceptClass X Bool) (d : ℕ) (hd : VCDim X C_new = ↑d) (hd_pos : 1 ≤ d) (ε δ : ℝ) (hε : 0 < ε) (hε1 : ε ≤ 1 / 4) (hδ : 0 < δ) (hδ1 : δ ≤ 1) (hδ2 : δ ≤ 1 / 7) (n m : ℕ) (training_data : Fin n → Fin m → X × Bool) (hPAC : ∀ (D : MeasureTheory.Measure X), MeasureTheory.IsProbabilityMeasure D → ∀ c ∈ C_new, have mf := SML.sampleComplexity n m ε δ; (MeasureTheory.Measure.pi fun (x : Fin mf) => D) {xs : Fin mf → X | D {x : X | (SML.learn training_data).learn (fun (i : Fin mf) => (xs i, c (xs i))) x ≠ c x} ≤ ENNReal.ofReal ε} ≥ ENNReal.ofReal (1 - δ)) :

⌈(↑d - 1) / 2⌉₊ ≤ SML.sampleComplexity n m ε δ

Baxter's multi-task lower bound: any sample-based meta-learner that achieves PAC on a new task C_new with VCDim = d, after seeing n training tasks with m samples each, requires ⌈(d-1)/2⌉ samples for the new task.

This is the n-independent version. The n-dependent improvement m ≥ Ω(d/(ε²·n)) requires the product-measure information-theoretic argument.

Key insight: the meta-learner's output (L, mf) is a PAC witness for C_new. By pac_lower_bound_member, any PAC witness requires at least ⌈(d-1)/2⌉ samples. The meta-learner's training phase (seeing n tasks) cannot reduce this bound because the new task's concept class is adversarially chosen AFTER training.

The n-dependent improvement (Baxter 2000, Theorem 3): For the PRODUCT measure over n tasks × m samples per task, the adversary argument gives m ≥ Ω(d/(ε²·n)). This requires:

TaskDistribution: a measure over concept classes
Product measure: D^(n×m) decomposed as (D^m)^n
Information-theoretic counting: n·m bits vs 2^d labelings This theorem proves the n-independent base case.

source

theorem vcdim_not_implies_hardness :

∃ (X : Type) (x : MeasurableSpace X) (C : ConceptClass X Bool), VCDim X C < ⊤ ∧ ∃ (D : MeasureTheory.Measure X) (_ : MeasureTheory.IsProbabilityMeasure D) (τ : ℝ), 0 < τ ∧ SQDimension X C D τ = ⊤

VC dimension does not determine SQ hardness: there exists a concept class with finite VC dimension but infinite SQ dimension under some distribution at some positive correlation threshold. Witness: singleton indicators on ℕ, with SQDimension = ⊤ at τ = 1. For any probability D on ℕ, the correlation between distinct indicators 1_i, 1_j is |1 - 2(D({i}) + D({j}))| ≤ 1, so every finite subset of C qualifies at τ = 1. Since C is infinite, SQDimension = ⊤. The statement quantifies the measurable domain, distribution, and threshold parameters used by SQDimension.

Documentation

LeanPool.FormalLearningTheory.Theorem.Extended

Extended Theorems #

Multi-Task Meta-Learning Infrastructure #

Advice Elimination Infrastructure #