Zhang-Yeung Theorem 2: a conditional information inequality #
Theorem 2 of [@zhangyeung1998], originally proved as Theorem 3 of [@zhangyeung1997],
states that
for any four discrete random variables X, Y, Z, U, the hypothesis
I[X : Y; μ] = 0 and I[X : Y | Z; μ] = 0 (eq. 16)
implies the conditional information inequality
I[X : Y | ⟨Z, U⟩; μ] ≤ I[Z : U | ⟨X, Y⟩; μ] + I[X : Y | U; μ]. (eq. 17)
This module formalizes the implication (16) ⇒ (17) on finite-alphabet random variables. It is a standalone formalization of the first known non-Shannon-type conditional information inequality, originally proved in [@zhangyeung1997, Theorem 3]. Kaced and Romashchenko classify it as $(\mathcal{I}_1)$ in their family of essentially conditional inequalities ([@kaced2013]): it holds on the set $\Gamma^*_4$ of constructible entropy functions but fails on its closure $\overline{\Gamma}^*_4$ (loc. cit., Theorem 5), so it is not derivable from the basic Shannon inequalities under any Lagrange combination of the hypotheses.
Main statements #
ZhangYeung.theorem2: the implication (16) ⇒ (17) for discrete random variables on a probability space.
Implementation notes #
The proof has two structurally distinct layers. The first is a Shannon-type algebraic
identity
(theorem2_shannon_identity) that rewrites I[X:Y|⟨Z,U⟩] - I[Z:U|⟨X,Y⟩] - I[X:Y|U] as
Δ(Z, U | X, Y) + I[X:Y|Z] - I[X:Y], where Δ is ZhangYeung.delta from the M1 module. Under
(16) the two
correction terms I[X:Y|Z] and I[X:Y] vanish, so Theorem 2 reduces to Δ(Z, U | X, Y) ≤ 0. The
identity is pure Shannon algebra: beyond IsProbabilityMeasure μ, it needs only
finiteness of the
four codomains ([Finite S₁] … [Finite S₄], used to discharge PFR's finite-range
obligations on
chain_rule'' and condMutualInfo_eq) and measurability of X, Y, Z, U.
The second layer (theorem2_delta_le_zero) discharges the reduced inequality via the
[@zhangyeung1997] argument: construct two auxiliary probability distributions on S₁ × S₂ × S₃ × S₄,
ptilde(x, y, z, u) := p(x, z, u) p(y, z, u) / p(z, u)
phat(x, y, z, u) := p(x, z) p(x, u) p(y, z) p(y, u) / [p(z) p(u) p(x) p(y)]
(both vanishing on the appropriate zero-measure diagonals). Both sum to one -- ptilde
unconditionally, phat by way of the two hypotheses I[X:Y] = 0 and I[X:Y|Z] = 0.
One then
expands Δ and observes that every marginal appearing in the log-expression is shared
between the
original law and ptilde, so the p-weighted sum equals the ptilde-weighted sum, and
what drops
out is exactly -KL(ptilde ‖ phat) ≤ 0. This is an IsZeroOrProbabilityMeasure-level
KL-divergence argument and does not use the kernel/condIndep_copies machinery that
Candidate A
of the milestone plan envisioned; PFR's KLDiv_nonneg (and the underlying log-sum
inequality
Real.sum_mul_log_div_leq) is the relevant non-negativity lemma.
Connection to the 1998 copy construction. The auxiliary PMF ptilde(x, y, z, u) := p(x, z, u) p(y, z, u) / p(z, u) defined above is precisely the (X', Y₁, Z', U')-marginal of the
extended
probability measure ν that PFR's ProbabilityTheory.condIndep_copies, applied to ⟨X, Y⟩
conditioned on ⟨Z, U⟩, would produce. Projecting the copy -- set X' := Prod.fst ∘ W₁, Y₁ := Prod.snd ∘ W₂, ⟨Z', U'⟩ := V -- the conditional independence X' ⟂ Y₁ | ⟨Z', U'⟩
plus the
marginal identities (X', Z', U') ∼ (X, Z, U) and (Y₁, Z', U') ∼ (Y, Z, U) force
p_ν(x, y, z, u) = p(x, z, u) p(y, z, u) / p(z, u) = ptilde(x, y, z, u). So the 1997 KL proof and the
1998
two-copy copy-lemma framework reach the same object from two directions: the 1997 paper
constructs
ptilde as a PMF and closes via Real.sum_mul_log_div_leq; the 1998 paper (Lemma 2 in
§III, eq.
44-45) constructs ν via kernel composition and closes Theorem 3 (the unconditional
inequality)
via a Shannon chase on the copy joint. For Theorem 2 specifically a pure copy +
Shannon-chase close
is ruled out: [@kaced2013, Theorem 3 + Claim 1, Theorem 5] show this inequality is
essentially
conditional and fails on the closure of the entropic region, so no combination of basic
Shannon
inequalities plus Lagrange multiples of the premises can derive it. This module follows
the 1997 KL
route rather than attempting the copy-construction framing.
Current state: The implication (16) ⇒ (17) is fully proved.
theorem2_delta_le_zero is wired end-to-end, with the main proof body assembled around
Real.sum_mul_log_div_leq and its absolute-continuity side condition (closed inline via
marginal
bounds). ptilde_sum_eq_one, phat_sum_eq_one, sum_joint_eq_sum_ptilde, and
delta_eq_sum_log_ratio are all closed.
The file is organized into the following sections:
theorem2_shannon_identity-- Shannon-algebra reduction toΔ ≤ 0.- Auxiliary distributions
ptilde,phat(pluspJoint) and their nonnegativity. - Generic finite-alphabet utilities (marginal summations, marginal bounds,
IndepFunproduct formula, fibrewise-swap helper). - The eleven marginal-match facts for
ptilde. - Sum-to-one facts (
ptilde_sum_eq_oneandphat_sum_eq_oneclosed). - Δ-to-log-ratio identities (
delta_eq_sum_log_ratioandsum_joint_eq_sum_ptildeclosed). theorem2_delta_le_zero+theorem2.
The proof infrastructure is organized around [Fintype] + [MeasurableSingletonClass]
so explicit
finite sums and PFR's FiniteRange/Countable obligations are available where needed.
The public
theorem surface, however, should expose only the assumptions that materially belong to
the
statement; local omit blocks below keep proof-local Fintype structure from leaking
into the
exported API.
Paper ordering (X, Y, Z, U) is followed here because Theorem 2 is a standalone
inequality read
most naturally in that order; ZhangYeung/Delta.lean uses (Z, U, X, Y) because the
delta
quantity is symmetric in its first two arguments. The two modules share no variables, so
the naming
clash across S₁..S₄ is harmless, and the ZhangYeung.delta identifier appearing in
the helpers
below takes its arguments in Delta.lean's order: delta Z U X Y μ.
References #
- [@zhangyeung1997] -- the 1997 paper containing the KL-divergence proof of the
inequality (as
Theorem 3 of that paper). See
references/papers/zhangyeung1997.pdf. - [@zhangyeung1998] -- the 1998 paper; Theorem 2 there restates the 1997 result. See
references/transcriptions/zhangyeung1998.mdfor a verbatim transcription of equations (16) and (17), verified 2026-04-16. - [@kaced2013] -- Kaced and Romashchenko classify the inequality as $(\mathcal{I}_1)$ in a family of essentially conditional inequalities. Theorem 3 + Claim 1 prove essential conditionality; Theorem 5 shows $(\mathcal{I}_1)$ fails on $\overline{\Gamma}^*_n$. Available as arXiv:1207.5742.
Tags #
Shannon entropy, conditional mutual information, conditional information inequality, Kullback-Leibler divergence, Zhang-Yeung, essentially conditional inequality
Shannon-algebra reduction #
Auxiliary distributions ptilde, phat, and the joint PMF #
Generic finite-alphabet utilities #
Pair and triple pushforward helpers -- marginal summation over each coordinate,
pointwise marginal
bounds, the IndepFun product formula, and the fibrewise-swap identity. All are stated
generically
(on abstract Ω' / α / β / γ) so they apply independently of this module's X, Y, Z, U
variables.
Marginal structure of ptilde #
The eleven marginal-match facts: each of ptilde's projection-marginals agrees with
pJoint's.
ptilde_fibre_sum handles the (x, y) fibre at the 2-tuple level; the rest cascade
from
sum_ptilde_over_y/sum_ptilde_over_x plus sum_map_triple_* / sum_map_pair_* to
descend
through the other projections.
Sum-to-one #
Δ-to-log-ratio identities #
The eleven per-projection marginal-swap facts, each extracted as its own declaration so
that each
fits under the default maxHeartbeats budget; the aggregate sum_joint_eq_sum_ptilde
below
combines them.
Main proof #
Zhang-Yeung Theorem 2 ([@zhangyeung1998], eqs. 16-17; proved in
[@zhangyeung1997, Theorem
3]). For any four discrete random variables X, Y, Z, U on a probability space, if I[X : Y; μ] = 0 and I[X : Y | Z; μ] = 0, then I[X : Y | ⟨Z, U⟩; μ] ≤ I[Z : U | ⟨X, Y⟩; μ] + I[X : Y | U; μ].
The proof factors into a Shannon-algebra reduction (theorem2_shannon_identity) that
isolates the
non-Shannon-type core Δ(Z, U | X, Y) ≤ 0 via the ZhangYeung.delta quantity from M1,
and the
[@zhangyeung1997] KL-divergence argument (theorem2_delta_le_zero) that discharges that
core.