Documentation

LeanPool.ZhangYeungInequality.CopyLemma

The Zhang-Yeung copy lemma #

The copy lemma of [@zhangyeung1998, §III, eqs. 44-45] is the auxiliary-probability construction at the heart of the 1998 proof of Theorem 3. Given four discrete random variables X, Y, Z, U on a probability space (Ω, μ), it produces an extended space (Ω', ν) carrying a second conditionally-independent copy (X₁, Y₁) of (X, Y) over the shared variables (Z, U). The auxiliary law is the paper's

q(x, y, z, u, x₁, y₁) := p(x, y, z, u) p(x₁, y₁, z, u) / p(z, u) (eq. 44)

and this module formalizes it via PFR's ProbabilityTheory.condIndep_copies applied to the pair ⟨X, Y⟩ conditioned on the shared variable ⟨Z, U⟩. The module then ships Lemma 2 (eq. 45),

Δ(Z, U | X, Y) = I(X; Y₁) - I(X; Y₁ | U) - I(X; Y₁ | Z) - I(Z; U | X, Y₁),

both as an abstract Shannon identity (under the hypothesis that one conditional mutual information vanishes) and as its Zhang-Yeung-flavored specialization to the copy projections. The three subtracted conditional mutual informations on the right are each nonnegative, so the identity immediately yields the inequality Δ(Z, U | X, Y) ≤ I(X; Y₁) (and the X ↔ X₁ symmetric variant I(Z; U) - 2·I(Z; U | X) ≤ I(X; X₁)) -- the two inputs to the Shannon chase that proves Theorem 3 in the next milestone.

Main statements #

Implementation notes #

The main copyLemma is a direct wrapper around ProbabilityTheory.condIndep_copies applied to ⟨X, Y⟩ : Ω → S₁ × S₂ with shared variable ⟨Z, U⟩ : Ω → S₃ × S₄. condIndep_copies returns two pair-valued copies W₁, W₂ and a shared pair-valued variable V; we project to the six individual random variables X', Y', X₁, Y₁, Z', U' via Prod.fst/Prod.snd. The 4-variable IdentDistrib outputs are recovered from the 2-level IdentDistribs PFR supplies by composition with a measurable rearrangement ((s₁, s₂), (s₃, s₄)) ↦ (s₁, s₂, s₃, s₄).

The four codomains S₁, S₂, S₃, S₄ are bound at a common universe u because condIndep_copies binds its two codomains at a single universe. This is a deviation from ZhangYeung/Delta.lean's Type* convention, made here because condIndep_copies itself requires it.

The derived corollaries (delta transports, delta identities, delta ≤ mutualInfo) live in their own section Consequences with a shared variable block packaging the eight relevant hypotheses (six measurabilities, two IdentDistribs, one CondIndepFun). A caller of copyLemma produces these eight hypotheses with one obtain, then applies the corollaries as black-box Shannon identities.

ZhangYeung.condIndepFun_comp (post-composition of CondIndepFun on its two measured coordinates) was promoted to ZhangYeung/Prelude.lean when M3 became its second consumer. IdentDistrib.condMutualInfo_eq now lives there as well, promoted for M5's tuple-level delta transports.

References #

Tags #

Shannon entropy, conditional mutual information, copy lemma, conditional independence, Zhang-Yeung

The main copy construction #

theorem ZhangYeung.copyLemma {Ω : Type u_1} [MeasurableSpace Ω] {S₁ S₂ S₃ S₄ : Type u} [MeasurableSpace S₁] [MeasurableSpace S₂] [MeasurableSpace S₃] [MeasurableSpace S₄] [Finite S₃] [Finite S₄] [MeasurableSingletonClass S₃] [MeasurableSingletonClass S₄] {X : ΩS₁} {Y : ΩS₂} {Z : ΩS₃} {U : ΩS₄} (hX : Measurable X) (hY : Measurable Y) (hZ : Measurable Z) (hU : Measurable U) (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] :
∃ (Ω' : Type u) (x : MeasurableSpace Ω') (ν : MeasureTheory.Measure Ω') (X' : Ω'S₁) (Y' : Ω'S₂) (X₁ : Ω'S₁) (Y₁ : Ω'S₂) (Z' : Ω'S₃) (U' : Ω'S₄), MeasureTheory.IsProbabilityMeasure ν Measurable X' Measurable Y' Measurable X₁ Measurable Y₁ Measurable Z' Measurable U' ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X' ω', Y' ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X₁ ω', Y₁ ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ ProbabilityTheory.CondIndepFun (fun (ω' : Ω') => (X' ω', Y' ω')) (fun (ω' : Ω') => (X₁ ω', Y₁ ω')) (fun (ω' : Ω') => (Z' ω', U' ω')) ν

The Zhang-Yeung copy lemma [@zhangyeung1998, §III, eq. 44]. Given four discrete random variables X, Y, Z, U on a probability space (Ω, μ), there exists an extended probability space (Ω', ν) carrying six projected random variables X', Y', X₁, Y₁, Z', U' such that (X', Y', Z', U') and (X₁, Y₁, Z', U') each have the original joint law of (X, Y, Z, U) under μ, and the two pairs (X', Y') and (X₁, Y₁) are conditionally independent given the shared pair (Z', U'). This is a direct wrapper around ProbabilityTheory.condIndep_copies applied to ⟨X, Y⟩ conditioned on ⟨Z, U⟩.

Lemma 2 (abstract Shannon identity) #

theorem ZhangYeung.delta_of_condMI_vanishes_eq {Ω : Type u_1} [MeasurableSpace Ω] {α : Type u_2} {β : Type u_3} {γ : Type u_4} {δ : Type u_5} [Finite α] [Finite β] [Finite γ] [Finite δ] [MeasurableSpace α] [MeasurableSpace β] [MeasurableSpace γ] [MeasurableSpace δ] [MeasurableSingletonClass α] [MeasurableSingletonClass β] [MeasurableSingletonClass γ] [MeasurableSingletonClass δ] {A : Ωα} {B : Ωβ} {C : Ωγ} {D : Ωδ} (hA : Measurable A) (hB : Measurable B) (hC : Measurable C) (hD : Measurable D) (ν : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure ν] (hVanish : I[A : D|prod B C;ν] = 0) :
delta B C A D ν = I[A : D; ν] - I[A : D|B;ν] - I[A : D|C;ν] - I[B : C|prod A D;ν]

Lemma 2 (abstract form) of [@zhangyeung1998, §III, eq. 45]. For any four discrete random variables A, B, C, D on a probability space (Ω, ν) with I[A : D | ⟨B, C⟩; ν] = 0,

Δ(B, C | A, D) = I[A : D] - I[A : D | B] - I[A : D | C] - I[B : C | ⟨A, D⟩].

This is the paper's eq. (45) abstracted away from the copy construction: the identity holds whenever one conditional mutual information vanishes. The three subtracted conditional mutual informations on the right are each nonnegative, so this identity immediately implies the paper's inequality form Δ(B, C | A, D) ≤ I[A : D] via linarith.

Consequences #

The lemmas in this section are parametrized on the outputs of copyLemma. A caller destructures copyLemma once via obtain, producing the eight structural hypotheses (six measurabilities, two 4-variable IdentDistribs, one CondIndepFun), and applies these lemmas one by one.

Measurable projection helpers #

Measurable repackings of a right-associated 4-tuple (a, b, c, d) : S₁ × S₂ × S₃ × S₄ into the three-variable shapes IdentDistrib.condMutualInfo_eq consumes. projZUA extracts (c, d, a) -- the (Z, U, X) triple; projZUB extracts (c, d, b) -- the (Z, U, Y) triple.

Triple-level IdentDistrib facts #

Each of the three triples extracted below feeds directly into IdentDistrib.condMutualInfo_eq. The Fintype/MeasurableSingletonClass/IsProbabilityMeasure side conditions are only needed by the downstream transport lemmas, so these triple facts live above the heavier instance block.

Finite-alphabet consequences #

The lemmas below all require discrete/countable side conditions on the four codomains plus an IsProbabilityMeasure instance on the copy measure ν.

Lemma 2 Form B (specialized to the copy projections) #

delta_of_condMI_vanishes_eq applied to the copy's (X', Y₁, Z', U') and (X', X₁, Z', U') projections, with the vanishing-CMI hypothesis supplied by the projected conditional-independence facts.

theorem ZhangYeung.copyLemma_delta_identity_Y₁ {S₁ : Type u_1} {S₂ : Type u_2} {S₃ : Type u_3} {S₄ : Type u_4} [MeasurableSpace S₁] [MeasurableSpace S₂] [MeasurableSpace S₃] [MeasurableSpace S₄] [Finite S₁] [Finite S₂] [Finite S₃] [Finite S₄] [MeasurableSingletonClass S₁] [MeasurableSingletonClass S₂] [MeasurableSingletonClass S₃] [MeasurableSingletonClass S₄] {Ω' : Type u_5} [MeasurableSpace Ω'] {ν : MeasureTheory.Measure Ω'} [MeasureTheory.IsProbabilityMeasure ν] {X' : Ω'S₁} {Y' : Ω'S₂} {X₁ : Ω'S₁} {Y₁ : Ω'S₂} {Z' : Ω'S₃} {U' : Ω'S₄} (hX' : Measurable X') (hY₁ : Measurable Y₁) (hZ' : Measurable Z') (hU' : Measurable U') (hCond : ProbabilityTheory.CondIndepFun (fun (ω' : Ω') => (X' ω', Y' ω')) (fun (ω' : Ω') => (X₁ ω', Y₁ ω')) (fun (ω' : Ω') => (Z' ω', U' ω')) ν) :
delta Z' U' X' Y₁ ν = I[X' : Y₁; ν] - I[X' : Y₁|Z';ν] - I[X' : Y₁|U';ν] - I[Z' : U'|prod X' Y₁;ν]

Lemma 2 Form B (primary) [@zhangyeung1998, §III, eq. 45]. The delta identity of Lemma 2 instantiated at the copy's (X', Y₁, Z', U') projections:

Δ(Z', U' | X', Y₁) = I[X' : Y₁] - I[X' : Y₁ | Z'] - I[X' : Y₁ | U'] - I[Z' : U' | ⟨X', Y₁⟩]

under the copy measure ν. The vanishing-CMI hypothesis is derived from the main CondIndepFun by projecting each measured pair to one coordinate.

theorem ZhangYeung.copyLemma_delta_identity_X_X₁ {S₁ : Type u_1} {S₂ : Type u_2} {S₃ : Type u_3} {S₄ : Type u_4} [MeasurableSpace S₁] [MeasurableSpace S₂] [MeasurableSpace S₃] [MeasurableSpace S₄] [Finite S₁] [Finite S₃] [Finite S₄] [MeasurableSingletonClass S₁] [MeasurableSingletonClass S₃] [MeasurableSingletonClass S₄] {Ω' : Type u_5} [MeasurableSpace Ω'] {ν : MeasureTheory.Measure Ω'} [MeasureTheory.IsProbabilityMeasure ν] {X' : Ω'S₁} {Y' : Ω'S₂} {X₁ : Ω'S₁} {Y₁ : Ω'S₂} {Z' : Ω'S₃} {U' : Ω'S₄} (hX' : Measurable X') (hX₁ : Measurable X₁) (hZ' : Measurable Z') (hU' : Measurable U') (hCond : ProbabilityTheory.CondIndepFun (fun (ω' : Ω') => (X' ω', Y' ω')) (fun (ω' : Ω') => (X₁ ω', Y₁ ω')) (fun (ω' : Ω') => (Z' ω', U' ω')) ν) :
delta Z' U' X' X₁ ν = I[X' : X₁; ν] - I[X' : X₁|Z';ν] - I[X' : X₁|U';ν] - I[Z' : U'|prod X' X₁;ν]

Lemma 2 Form B (symmetric) [@zhangyeung1998, §III, eq. 45]. The delta identity of Lemma 2 instantiated at the copy's (X', X₁, Z', U') projections, the X ↔ X₁ swap of copyLemma_delta_identity_Y₁:

Δ(Z', U' | X', X₁) = I[X' : X₁] - I[X' : X₁ | Z'] - I[X' : X₁ | U'] - I[Z' : U' | ⟨X', X₁⟩]

under the copy measure ν.

Delta transport and inequality corollaries #

The lemmas in this section take both the original-law hypotheses (`X, Y, Z, U` on `(Ω, μ)`) and the copy-law outputs (`X', Y', X₁, Y₁, Z', U'` on `(Ω', ν)`) plus the structural facts `hFirst, hSecond, hCond`, and relate the `Δ` under `μ` to the `Δ` under `ν`.

theorem ZhangYeung.copyLemma_delta_transport_Y_to_Y₁ {Ω : Type u_1} [MeasurableSpace Ω] {S₁ : Type u_2} {S₂ : Type u_3} {S₃ : Type u_4} {S₄ : Type u_5} [MeasurableSpace S₁] [MeasurableSpace S₂] [MeasurableSpace S₃] [MeasurableSpace S₄] [Finite S₁] [Finite S₂] [Finite S₃] [Finite S₄] [MeasurableSingletonClass S₁] [MeasurableSingletonClass S₂] [MeasurableSingletonClass S₃] [MeasurableSingletonClass S₄] {X : ΩS₁} {Y : ΩS₂} {Z : ΩS₃} {U : ΩS₄} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {Ω' : Type u_6} [MeasurableSpace Ω'] {ν : MeasureTheory.Measure Ω'} [MeasureTheory.IsProbabilityMeasure ν] {X' : Ω'S₁} {Y' : Ω'S₂} {X₁ : Ω'S₁} {Y₁ : Ω'S₂} {Z' : Ω'S₃} {U' : Ω'S₄} (hX : Measurable X) (hY : Measurable Y) (hZ : Measurable Z) (hU : Measurable U) (hX' : Measurable X') (hY₁ : Measurable Y₁) (hZ' : Measurable Z') (hU' : Measurable U') (hFirst : ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X' ω', Y' ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ) (hSecond : ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X₁ ω', Y₁ ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ) :
delta Z U X Y μ = delta Z' U' X' Y₁ ν

Bridge identity: Δ(Z, U | X, Y) μ = Δ(Z', U' | X', Y₁) ν. Each side of the delta expands into three mutual-information terms. IdentDistrib.mutualInfo_eq transports the unconditional I[Z : U]; IdentDistrib.condMutualInfo_eq transports the two conditional terms via the triple-level IdentDistribs extracted from hFirst and hSecond.

theorem ZhangYeung.copyLemma_delta_transport_X_to_X₁ {Ω : Type u_1} [MeasurableSpace Ω] {S₁ : Type u_2} {S₂ : Type u_3} {S₃ : Type u_4} {S₄ : Type u_5} [MeasurableSpace S₁] [MeasurableSpace S₂] [MeasurableSpace S₃] [MeasurableSpace S₄] [Finite S₁] [Finite S₃] [Finite S₄] [MeasurableSingletonClass S₁] [MeasurableSingletonClass S₃] [MeasurableSingletonClass S₄] {X : ΩS₁} {Y : ΩS₂} {Z : ΩS₃} {U : ΩS₄} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {Ω' : Type u_6} [MeasurableSpace Ω'] {ν : MeasureTheory.Measure Ω'} [MeasureTheory.IsProbabilityMeasure ν] {X' : Ω'S₁} {Y' : Ω'S₂} {X₁ : Ω'S₁} {Y₁ : Ω'S₂} {Z' : Ω'S₃} {U' : Ω'S₄} (hX : Measurable X) (hZ : Measurable Z) (hU : Measurable U) (hX' : Measurable X') (hX₁ : Measurable X₁) (hZ' : Measurable Z') (hU' : Measurable U') (hFirst : ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X' ω', Y' ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ) (hSecond : ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X₁ ω', Y₁ ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ) :
delta Z U X X μ = delta Z' U' X' X₁ ν

Symmetric bridge identity: Δ(Z, U | X, X) μ = Δ(Z', U' | X', X₁) ν. Transports both conditional-MI terms via copyLemma_triple_XFirst and copyLemma_triple_XSecond. The μ-side has X in both conditioner slots, so the two transports target the same pattern syntactically; closing by linarith over the two transport equalities sidesteps the ambiguity rw would otherwise face.

theorem ZhangYeung.copyLemma_delta_le_mutualInfo_Y₁ {Ω : Type u_1} [MeasurableSpace Ω] {S₁ : Type u_2} {S₂ : Type u_3} {S₃ : Type u_4} {S₄ : Type u_5} [MeasurableSpace S₁] [MeasurableSpace S₂] [MeasurableSpace S₃] [MeasurableSpace S₄] [Finite S₁] [Finite S₂] [Finite S₃] [Finite S₄] [MeasurableSingletonClass S₁] [MeasurableSingletonClass S₂] [MeasurableSingletonClass S₃] [MeasurableSingletonClass S₄] {X : ΩS₁} {Y : ΩS₂} {Z : ΩS₃} {U : ΩS₄} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {Ω' : Type u_6} [MeasurableSpace Ω'] {ν : MeasureTheory.Measure Ω'} [MeasureTheory.IsProbabilityMeasure ν] {X' : Ω'S₁} {Y' : Ω'S₂} {X₁ : Ω'S₁} {Y₁ : Ω'S₂} {Z' : Ω'S₃} {U' : Ω'S₄} (hX : Measurable X) (hY : Measurable Y) (hZ : Measurable Z) (hU : Measurable U) (hX' : Measurable X') (hY₁ : Measurable Y₁) (hZ' : Measurable Z') (hU' : Measurable U') (hFirst : ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X' ω', Y' ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ) (hSecond : ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X₁ ω', Y₁ ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ) (hCond : ProbabilityTheory.CondIndepFun (fun (ω' : Ω') => (X' ω', Y' ω')) (fun (ω' : Ω') => (X₁ ω', Y₁ ω')) (fun (ω' : Ω') => (Z' ω', U' ω')) ν) :
delta Z U X Y μ I[X' : Y₁; ν]

Primary inequality form of Lemma 2 [@zhangyeung1998, §III, eq. 45]. Combining the Y-to-Y₁ delta transport with Form B and the nonnegativity of the three subtracted conditional mutual informations on the right of eq. (45) gives

Δ(Z, U | X, Y) μ ≤ I[X' : Y₁; ν],

the first of the two inequalities Theorem 3's proof opens with on paper line 683.

theorem ZhangYeung.copyLemma_delta_le_mutualInfo_X_X₁ {Ω : Type u_1} [MeasurableSpace Ω] {S₁ : Type u_2} {S₂ : Type u_3} {S₃ : Type u_4} {S₄ : Type u_5} [MeasurableSpace S₁] [MeasurableSpace S₂] [MeasurableSpace S₃] [MeasurableSpace S₄] [Finite S₁] [Finite S₃] [Finite S₄] [MeasurableSingletonClass S₁] [MeasurableSingletonClass S₃] [MeasurableSingletonClass S₄] {X : ΩS₁} {Y : ΩS₂} {Z : ΩS₃} {U : ΩS₄} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {Ω' : Type u_6} [MeasurableSpace Ω'] {ν : MeasureTheory.Measure Ω'} [MeasureTheory.IsProbabilityMeasure ν] {X' : Ω'S₁} {Y' : Ω'S₂} {X₁ : Ω'S₁} {Y₁ : Ω'S₂} {Z' : Ω'S₃} {U' : Ω'S₄} (hX : Measurable X) (hZ : Measurable Z) (hU : Measurable U) (hX' : Measurable X') (hX₁ : Measurable X₁) (hZ' : Measurable Z') (hU' : Measurable U') (hFirst : ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X' ω', Y' ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ) (hSecond : ProbabilityTheory.IdentDistrib (fun (ω' : Ω') => (X₁ ω', Y₁ ω', Z' ω', U' ω')) (fun (ω : Ω) => (X ω, Y ω, Z ω, U ω)) ν μ) (hCond : ProbabilityTheory.CondIndepFun (fun (ω' : Ω') => (X' ω', Y' ω')) (fun (ω' : Ω') => (X₁ ω', Y₁ ω')) (fun (ω' : Ω') => (Z' ω', U' ω')) ν) :
I[Z : U; μ] - 2 * I[Z : U|X;μ] I[X' : X₁; ν]

Symmetric inequality form of Lemma 2 [@zhangyeung1998, §III, eq. 45]. The X ↔ X₁ variant,

I[Z : U; μ] - 2·I[Z : U | X; μ] ≤ I[X' : X₁; ν],

the second of the two inequalities Theorem 3's proof opens with on paper line 689.