Documentation

LeanPool.FormalLearningTheory.Complexity.Generalization.Tail

LeanPool.FormalLearningTheory.Complexity.Generalization.Tail #

def blockExtract {α : Type u_1} (k m : ) (S : Fin (k * m)α) (j : Fin k) :
Fin mα

Extract block j from a flat array of k*m elements, using finProdFinEquiv.

Equations
Instances For
    def majorityVote (k : ) (votes : Fin kBool) :

    Boolean majority vote: returns true iff strictly more than half the votes are true.

    Equations
    Instances For
      theorem block_extract_disjoint (k m : ) (j₁ j₂ : Fin k) (hne : j₁ j₂) :

      Block index sets are disjoint for distinct blocks.

      theorem block_extract_measurable {X : Type u_1} [MeasurableSpace X] (k m : ) (j : Fin k) :
      Measurable fun (ω : Fin (k * m)X) => blockExtract k m ω j

      Block extraction is measurable: extracting block j from a pi-type is measurable.

      theorem iIndepFun_block_extract {X : Type u_1} [MeasurableSpace X] (k m : ) (D : MeasureTheory.Measure X) [MeasureTheory.IsProbabilityMeasure D] :
      ProbabilityTheory.iIndepFun (fun (j : Fin k) (ω : Fin (k * m)X) => blockExtract k m ω j) (MeasureTheory.Measure.pi fun (x : Fin (k * m)) => D)

      Block extractions are independent under the product measure. Key infrastructure for boosting (D4) and probability amplification.

      theorem shatters_realize {X : Type u} {C : ConceptClass X Bool} {T : Finset X} (hT : Shatters X C T) (f : TBool) :
      cC, ∀ (x : T), c x = f x

      Shattering lifting: if T is shattered by C and f : ↥T → Bool, then there exists c ∈ C that agrees with f on all of T.

      theorem shatters_hard_labeling {X : Type u} {C : ConceptClass X Bool} {T : Finset X} (hT : Shatters X C T) (h : TBool) (hcard : 2 T.card) :
      cC, T.card < 4 * {x : T | c x h x}.card

      Key counting lemma: for any h : ↥T → Bool on a shattered set T with |T| ≥ 2, there exists c ∈ C with #{x ∈ T | c x ≠ h x} > |T|/4. Lifts exists_many_disagreements through shattering.

      theorem nfl_per_sample_shattered {X : Type u} {C : ConceptClass X Bool} {T : Finset X} (hT : Shatters X C T) {m : } (hTcard : 2 * m < T.card) (xs : Fin mX) (h : XBool) :
      cC, (∀ (i : Fin m), xs i Tc (xs i) = h (xs i)) T.card < 4 * {xT | c x h x}.card

      NFL per-sample lemma for shattered sets: for ANY fixed sample xs and ANY hypothesis h, there exists c ∈ C agreeing with h on sample points but with high error (> 1 / 4) on the shattered set T. Uses the counting argument on unseen points via disagreement_sum_eq.

      If VCDim = ⊤, then C is not PAC learnable. Proof: for any learner L with sample function mf, pick ε = 1 / 4, δ = 1 / 4. Let m = mf(1 / 4, 1 / 4). Since VCDim = ⊤, ∃ shattered set S with |S| ≥ 2m. Put D = uniform on S. For random labeling, any m-sample learner has expected error ≥ 1 / 4 on unseen points. This is the core of pac_imp_vcdim_finite (contrapositive direction).

      @[reducible, inline]
      abbrev DriftRate :

      Drift rate: how fast the target concept changes over time.

      Equations
      Instances For