Documentation

LeanPool.ZhangYeungInequality.Theorem2

Zhang-Yeung Theorem 2: a conditional information inequality #

Theorem 2 of [@zhangyeung1998], originally proved as Theorem 3 of [@zhangyeung1997], states that for any four discrete random variables X, Y, Z, U, the hypothesis

I[X : Y; μ] = 0 and I[X : Y | Z; μ] = 0 (eq. 16)

implies the conditional information inequality

I[X : Y | ⟨Z, U⟩; μ] ≤ I[Z : U | ⟨X, Y⟩; μ] + I[X : Y | U; μ]. (eq. 17)

This module formalizes the implication (16) ⇒ (17) on finite-alphabet random variables. It is a standalone formalization of the first known non-Shannon-type conditional information inequality, originally proved in [@zhangyeung1997, Theorem 3]. Kaced and Romashchenko classify it as $(\mathcal{I}_1)$ in their family of essentially conditional inequalities ([@kaced2013]): it holds on the set $\Gamma^*_4$ of constructible entropy functions but fails on its closure $\overline{\Gamma}^*_4$ (loc. cit., Theorem 5), so it is not derivable from the basic Shannon inequalities under any Lagrange combination of the hypotheses.

Main statements #

Implementation notes #

The proof has two structurally distinct layers. The first is a Shannon-type algebraic identity (theorem2_shannon_identity) that rewrites I[X:Y|⟨Z,U⟩] - I[Z:U|⟨X,Y⟩] - I[X:Y|U] as Δ(Z, U | X, Y) + I[X:Y|Z] - I[X:Y], where Δ is ZhangYeung.delta from the M1 module. Under (16) the two correction terms I[X:Y|Z] and I[X:Y] vanish, so Theorem 2 reduces to Δ(Z, U | X, Y) ≤ 0. The identity is pure Shannon algebra: beyond IsProbabilityMeasure μ, it needs only finiteness of the four codomains ([Finite S₁] … [Finite S₄], used to discharge PFR's finite-range obligations on chain_rule'' and condMutualInfo_eq) and measurability of X, Y, Z, U.

The second layer (theorem2_delta_le_zero) discharges the reduced inequality via the [@zhangyeung1997] argument: construct two auxiliary probability distributions on S₁ × S₂ × S₃ × S₄,

ptilde(x, y, z, u) := p(x, z, u) p(y, z, u) / p(z, u) phat(x, y, z, u) := p(x, z) p(x, u) p(y, z) p(y, u) / [p(z) p(u) p(x) p(y)]

(both vanishing on the appropriate zero-measure diagonals). Both sum to one -- ptilde unconditionally, phat by way of the two hypotheses I[X:Y] = 0 and I[X:Y|Z] = 0. One then expands Δ and observes that every marginal appearing in the log-expression is shared between the original law and ptilde, so the p-weighted sum equals the ptilde-weighted sum, and what drops out is exactly -KL(ptilde ‖ phat) ≤ 0. This is an IsZeroOrProbabilityMeasure-level KL-divergence argument and does not use the kernel/condIndep_copies machinery that Candidate A of the milestone plan envisioned; PFR's KLDiv_nonneg (and the underlying log-sum inequality Real.sum_mul_log_div_leq) is the relevant non-negativity lemma.

Connection to the 1998 copy construction. The auxiliary PMF ptilde(x, y, z, u) := p(x, z, u) p(y, z, u) / p(z, u) defined above is precisely the (X', Y₁, Z', U')-marginal of the extended probability measure ν that PFR's ProbabilityTheory.condIndep_copies, applied to ⟨X, Y⟩ conditioned on ⟨Z, U⟩, would produce. Projecting the copy -- set X' := Prod.fst ∘ W₁, Y₁ := Prod.snd ∘ W₂, ⟨Z', U'⟩ := V -- the conditional independence X' ⟂ Y₁ | ⟨Z', U'⟩ plus the marginal identities (X', Z', U') ∼ (X, Z, U) and (Y₁, Z', U') ∼ (Y, Z, U) force p_ν(x, y, z, u) = p(x, z, u) p(y, z, u) / p(z, u) = ptilde(x, y, z, u). So the 1997 KL proof and the 1998 two-copy copy-lemma framework reach the same object from two directions: the 1997 paper constructs ptilde as a PMF and closes via Real.sum_mul_log_div_leq; the 1998 paper (Lemma 2 in §III, eq. 44-45) constructs ν via kernel composition and closes Theorem 3 (the unconditional inequality) via a Shannon chase on the copy joint. For Theorem 2 specifically a pure copy + Shannon-chase close is ruled out: [@kaced2013, Theorem 3 + Claim 1, Theorem 5] show this inequality is essentially conditional and fails on the closure of the entropic region, so no combination of basic Shannon inequalities plus Lagrange multiples of the premises can derive it. This module follows the 1997 KL route rather than attempting the copy-construction framing.

Current state: The implication (16) ⇒ (17) is fully proved. theorem2_delta_le_zero is wired end-to-end, with the main proof body assembled around Real.sum_mul_log_div_leq and its absolute-continuity side condition (closed inline via marginal bounds). ptilde_sum_eq_one, phat_sum_eq_one, sum_joint_eq_sum_ptilde, and delta_eq_sum_log_ratio are all closed.

The file is organized into the following sections:

  1. theorem2_shannon_identity -- Shannon-algebra reduction to Δ ≤ 0.
  2. Auxiliary distributions ptilde, phat (plus pJoint) and their nonnegativity.
  3. Generic finite-alphabet utilities (marginal summations, marginal bounds, IndepFun product formula, fibrewise-swap helper).
  4. The eleven marginal-match facts for ptilde.
  5. Sum-to-one facts (ptilde_sum_eq_one and phat_sum_eq_one closed).
  6. Δ-to-log-ratio identities (delta_eq_sum_log_ratio and sum_joint_eq_sum_ptilde closed).
  7. theorem2_delta_le_zero + theorem2.

The proof infrastructure is organized around [Fintype] + [MeasurableSingletonClass] so explicit finite sums and PFR's FiniteRange/Countable obligations are available where needed. The public theorem surface, however, should expose only the assumptions that materially belong to the statement; local omit blocks below keep proof-local Fintype structure from leaking into the exported API.

Paper ordering (X, Y, Z, U) is followed here because Theorem 2 is a standalone inequality read most naturally in that order; ZhangYeung/Delta.lean uses (Z, U, X, Y) because the delta quantity is symmetric in its first two arguments. The two modules share no variables, so the naming clash across S₁..S₄ is harmless, and the ZhangYeung.delta identifier appearing in the helpers below takes its arguments in Delta.lean's order: delta Z U X Y μ.

References #

Tags #

Shannon entropy, conditional mutual information, conditional information inequality, Kullback-Leibler divergence, Zhang-Yeung, essentially conditional inequality

Shannon-algebra reduction #

Auxiliary distributions ptilde, phat, and the joint PMF #

Generic finite-alphabet utilities #

Pair and triple pushforward helpers -- marginal summation over each coordinate, pointwise marginal bounds, the IndepFun product formula, and the fibrewise-swap identity. All are stated generically (on abstract Ω' / α / β / γ) so they apply independently of this module's X, Y, Z, U variables.

Marginal structure of ptilde #

The eleven marginal-match facts: each of ptilde's projection-marginals agrees with pJoint's. ptilde_fibre_sum handles the (x, y) fibre at the 2-tuple level; the rest cascade from sum_ptilde_over_y/sum_ptilde_over_x plus sum_map_triple_* / sum_map_pair_* to descend through the other projections.

Sum-to-one #

Δ-to-log-ratio identities #

The eleven per-projection marginal-swap facts, each extracted as its own declaration so that each fits under the default maxHeartbeats budget; the aggregate sum_joint_eq_sum_ptilde below combines them.

Main proof #

theorem ZhangYeung.theorem2 {Ω : Type u_1} [MeasurableSpace Ω] {S₁ : Type u_2} {S₂ : Type u_3} {S₃ : Type u_4} {S₄ : Type u_5} [MeasurableSpace S₁] [MeasurableSpace S₂] [MeasurableSpace S₃] [MeasurableSpace S₄] [MeasurableSingletonClass S₁] [MeasurableSingletonClass S₂] [MeasurableSingletonClass S₃] [MeasurableSingletonClass S₄] {X : ΩS₁} {Y : ΩS₂} {Z : ΩS₃} {U : ΩS₄} [Finite S₁] [Finite S₂] [Finite S₃] [Finite S₄] (hX : Measurable X) (hY : Measurable Y) (hZ : Measurable Z) (hU : Measurable U) (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (h₁ : I[X : Y; μ] = 0) (h₂ : I[X : Y|Z;μ] = 0) :
I[X : Y|prod Z U;μ] I[Z : U|prod X Y;μ] + I[X : Y|U;μ]

Zhang-Yeung Theorem 2 ([@zhangyeung1998], eqs. 16-17; proved in [@zhangyeung1997, Theorem 3]). For any four discrete random variables X, Y, Z, U on a probability space, if I[X : Y; μ] = 0 and I[X : Y | Z; μ] = 0, then I[X : Y | ⟨Z, U⟩; μ] ≤ I[Z : U | ⟨X, Y⟩; μ] + I[X : Y | U; μ].

The proof factors into a Shannon-algebra reduction (theorem2_shannon_identity) that isolates the non-Shannon-type core Δ(Z, U | X, Y) ≤ 0 via the ZhangYeung.delta quantity from M1, and the [@zhangyeung1997] KL-divergence argument (theorem2_delta_le_zero) that discharges that core.