The Zhang-Yeung copy lemma #
The copy lemma of [@zhangyeung1998, §III, eqs. 44-45] is the auxiliary-probability
construction
at the heart of the 1998 proof of Theorem 3. Given four discrete random variables X, Y, Z, U on a
probability space (Ω, μ), it produces an extended space (Ω', ν) carrying a second
conditionally-independent copy (X₁, Y₁) of (X, Y) over the shared variables (Z, U). The
auxiliary law is the paper's
q(x, y, z, u, x₁, y₁) := p(x, y, z, u) p(x₁, y₁, z, u) / p(z, u) (eq. 44)
and this module formalizes it via PFR's ProbabilityTheory.condIndep_copies applied to
the pair
⟨X, Y⟩ conditioned on the shared variable ⟨Z, U⟩. The module then ships Lemma 2 (eq.
45),
Δ(Z, U | X, Y) = I(X; Y₁) - I(X; Y₁ | U) - I(X; Y₁ | Z) - I(Z; U | X, Y₁),
both as an abstract Shannon identity (under the hypothesis that one conditional mutual
information
vanishes) and as its Zhang-Yeung-flavored specialization to the copy projections. The
three
subtracted conditional mutual informations on the right are each nonnegative, so the
identity
immediately yields the inequality Δ(Z, U | X, Y) ≤ I(X; Y₁) (and the X ↔ X₁
symmetric variant
I(Z; U) - 2·I(Z; U | X) ≤ I(X; X₁)) -- the two inputs to the Shannon chase that proves
Theorem 3
in the next milestone.
Main statements #
ZhangYeung.copyLemma: the main existential, producingΩ', ν, X', Y', X₁, Y₁, Z', U'together with the three structural facts of eq. (44) (two 4-variableIdentDistribs and oneCondIndepFun) and the six measurabilities.ZhangYeung.delta_of_condMI_vanishes_eq: Lemma 2 in abstract form -- under the hypothesisI[A : D | ⟨B, C⟩; ν] = 0, the delta identityΔ(B, C | A, D) = I(A; D) - I(A; D | B) - I(A; D | C) - I(B; C | ⟨A, D⟩).ZhangYeung.copyLemma_delta_transport_Y_to_Y₁,ZhangYeung.copyLemma_delta_transport_X_to_X₁: bridge identities betweenΔcomputed under the original lawμandΔcomputed under the copy lawν, with either(X, Y)or(X, X₁)in the measured slots.ZhangYeung.copyLemma_delta_identity_Y₁,ZhangYeung.copyLemma_delta_identity_X_X₁: Lemma 2 specialized to the copy's projections.ZhangYeung.copyLemma_delta_le_mutualInfo_Y₁,ZhangYeung.copyLemma_delta_le_mutualInfo_X_X₁: the inequality-form corollaries combining the delta identity with nonnegativity of the three conditional mutual information terms on the right of eq. (45).
Implementation notes #
The main copyLemma is a direct wrapper around ProbabilityTheory.condIndep_copies
applied to
⟨X, Y⟩ : Ω → S₁ × S₂ with shared variable ⟨Z, U⟩ : Ω → S₃ × S₄.
condIndep_copies
returns two
pair-valued copies W₁, W₂ and a shared pair-valued variable V; we project to the six
individual
random variables X', Y', X₁, Y₁, Z', U' via Prod.fst/Prod.snd. The 4-variable
IdentDistrib
outputs are recovered from the 2-level IdentDistribs PFR supplies by composition with
a
measurable rearrangement ((s₁, s₂), (s₃, s₄)) ↦ (s₁, s₂, s₃, s₄).
The four codomains S₁, S₂, S₃, S₄ are bound at a common universe u because
condIndep_copies
binds its two codomains at a single universe. This is a deviation from
ZhangYeung/Delta.lean's
Type* convention, made here because condIndep_copies itself requires it.
The derived corollaries (delta transports, delta identities, delta ≤ mutualInfo) live in
their own
section Consequences with a shared variable block packaging the eight relevant
hypotheses (six
measurabilities, two IdentDistribs, one CondIndepFun). A caller of copyLemma
produces these
eight hypotheses with one obtain, then applies the corollaries as black-box Shannon
identities.
ZhangYeung.condIndepFun_comp (post-composition of CondIndepFun on its two measured
coordinates)
was promoted to ZhangYeung/Prelude.lean when M3 became its second consumer.
IdentDistrib.condMutualInfo_eq now lives there as well, promoted for M5's tuple-level
delta
transports.
References #
- [@zhangyeung1998, §III, eq. 44-45, Lemma 2] -- see
references/transcriptions/zhangyeung1998.mdfor a verbatim transcription of equations (44) and (45), verified 2026-04-16.
Tags #
Shannon entropy, conditional mutual information, copy lemma, conditional independence, Zhang-Yeung
The main copy construction #
The Zhang-Yeung copy lemma [@zhangyeung1998, §III, eq. 44]. Given four discrete
random
variables X, Y, Z, U on a probability space (Ω, μ), there exists an extended
probability space
(Ω', ν) carrying six projected random variables X', Y', X₁, Y₁, Z', U' such that
(X', Y', Z', U') and (X₁, Y₁, Z', U') each have the original joint law of (X, Y, Z, U) under
μ, and the
two pairs (X', Y') and (X₁, Y₁) are conditionally independent given the shared pair
(Z', U').
This is a direct wrapper around ProbabilityTheory.condIndep_copies applied to ⟨X, Y⟩
conditioned on ⟨Z, U⟩.
Lemma 2 (abstract Shannon identity) #
Lemma 2 (abstract form) of [@zhangyeung1998, §III, eq. 45]. For any four
discrete random
variables A, B, C, D on a probability space (Ω, ν) with I[A : D | ⟨B, C⟩; ν] = 0,
Δ(B, C | A, D) = I[A : D] - I[A : D | B] - I[A : D | C] - I[B : C | ⟨A, D⟩].
This is the paper's eq. (45) abstracted away from the copy construction: the identity
holds
whenever one conditional mutual information vanishes. The three subtracted conditional
mutual
informations on the right are each nonnegative, so this identity immediately implies the
paper's
inequality form Δ(B, C | A, D) ≤ I[A : D] via linarith.
Consequences #
The lemmas in this section are parametrized on the outputs of copyLemma. A caller
destructures
copyLemma once via obtain, producing the eight structural hypotheses (six
measurabilities, two
4-variable IdentDistribs, one CondIndepFun), and applies these lemmas one by one.
Measurable projection helpers #
Measurable repackings of a right-associated 4-tuple (a, b, c, d) : S₁ × S₂ × S₃ × S₄
into the
three-variable shapes IdentDistrib.condMutualInfo_eq consumes. projZUA extracts (c, d, a) --
the (Z, U, X) triple; projZUB extracts (c, d, b) -- the (Z, U, Y) triple.
Triple-level IdentDistrib facts #
Each of the three triples extracted below feeds directly into
IdentDistrib.condMutualInfo_eq. The
Fintype/MeasurableSingletonClass/IsProbabilityMeasure side conditions are only
needed by the
downstream transport lemmas, so these triple facts live above the heavier instance
block.
Finite-alphabet consequences #
The lemmas below all require discrete/countable side conditions on the four codomains
plus an
IsProbabilityMeasure instance on the copy measure ν.
Lemma 2 Form B (specialized to the copy projections) #
delta_of_condMI_vanishes_eq applied to the copy's (X', Y₁, Z', U') and (X', X₁, Z', U')
projections, with the vanishing-CMI hypothesis supplied by the projected
conditional-independence
facts.
Lemma 2 Form B (primary) [@zhangyeung1998, §III, eq. 45]. The delta identity of
Lemma 2
instantiated at the copy's (X', Y₁, Z', U') projections:
Δ(Z', U' | X', Y₁) = I[X' : Y₁] - I[X' : Y₁ | Z'] - I[X' : Y₁ | U'] - I[Z' : U' | ⟨X', Y₁⟩]
under the copy measure ν. The vanishing-CMI hypothesis is derived from the main
CondIndepFun by
projecting each measured pair to one coordinate.
Lemma 2 Form B (symmetric) [@zhangyeung1998, §III, eq. 45]. The delta identity
of Lemma 2
instantiated at the copy's (X', X₁, Z', U') projections, the X ↔ X₁ swap of
copyLemma_delta_identity_Y₁:
Δ(Z', U' | X', X₁) = I[X' : X₁] - I[X' : X₁ | Z'] - I[X' : X₁ | U'] - I[Z' : U' | ⟨X', X₁⟩]
under the copy measure ν.
Delta transport and inequality corollaries #
The lemmas in this section take both the original-law hypotheses (`X, Y, Z, U` on `(Ω, μ)`) and the copy-law outputs (`X', Y', X₁, Y₁, Z', U'` on `(Ω', ν)`) plus the structural facts `hFirst, hSecond, hCond`, and relate the `Δ` under `μ` to the `Δ` under `ν`.
Bridge identity: Δ(Z, U | X, Y) μ = Δ(Z', U' | X', Y₁) ν. Each side of the delta
expands into
three mutual-information terms. IdentDistrib.mutualInfo_eq transports the
unconditional I[Z : U]; IdentDistrib.condMutualInfo_eq transports the two conditional terms via the
triple-level
IdentDistribs extracted from hFirst and hSecond.
Symmetric bridge identity: Δ(Z, U | X, X) μ = Δ(Z', U' | X', X₁) ν. Transports both
conditional-MI terms via copyLemma_triple_XFirst and copyLemma_triple_XSecond. The
μ-side has
X in both conditioner slots, so the two transports target the same pattern
syntactically; closing
by linarith over the two transport equalities sidesteps the ambiguity rw would
otherwise face.
Primary inequality form of Lemma 2 [@zhangyeung1998, §III, eq. 45]. Combining the Y-to-Y₁ delta transport with Form B and the nonnegativity of the three subtracted conditional mutual informations on the right of eq. (45) gives
Δ(Z, U | X, Y) μ ≤ I[X' : Y₁; ν],
the first of the two inequalities Theorem 3's proof opens with on paper line 683.
Symmetric inequality form of Lemma 2 [@zhangyeung1998, §III, eq. 45]. The X ↔ X₁ variant,
I[Z : U; μ] - 2·I[Z : U | X; μ] ≤ I[X' : X₁; ν],
the second of the two inequalities Theorem 3's proof opens with on paper line 689.