Shannon.Entropy.Properties #

The fundamental properties of Shannon entropy from Section 6 of Shannon (1948, pp. 11-12). These are consequences of the entropy formula, proved using the Gibbs inequality and concavity of negMulLog.

Main results #

entropyNat_eq_zero_iff — H(p) = 0 iff p is deterministic
entropyNat_eq_log_card_iff — H(p) = log |α| iff p is uniform
entropyNat_jointLe_add — subadditivity: H(X,Y) ≤ H(X) + H(Y)
entropyNat_doublyStochastic_le — Schur-concavity: doubly stochastic averaging
condEntropy_le_entropyNat — conditioning reduces entropy: H_X(Y) ≤ H(Y)
condEntropy_nonneg — conditional entropy is nonnegative

Property 1: H = 0 iff deterministic #

source

def LeanPool.Shannon1948Formalization.IsDeterministic {α : Type} [Fintype α] (p : ProbDist α) :

Prop

A distribution is deterministic if some outcome has probability one.

Equations

LeanPool.Shannon1948Formalization.IsDeterministic p = ∃ (a₀ : α), ↑p a₀ = 1

Instances For

source

theorem LeanPool.Shannon1948Formalization.entropyNat_eq_zero_iff {α : Type} [Fintype α] (p : ProbDist α) :

entropyNat p = 0 ↔ IsDeterministic p

Property 1: H(p) = 0 if and only if p is deterministic.

Forward: H = 0 means each negMulLog(pᵢ) = 0, forcing each pᵢ ∈ {0, 1}; the sum constraint ∑ pᵢ = 1 then gives exactly one pᵢ = 1. Backward: 1 · log 1 = 0 and 0 · log 0 = 0.

Property 2: H maximized at uniform #

source

theorem LeanPool.Shannon1948Formalization.entropyNat_eq_log_card_iff {α : Type} [Fintype α] [Nonempty α] (p : ProbDist α) :

entropyNat p = Real.log ↑(Fintype.card α) ↔ ∀ (a : α), ↑p a = 1 / ↑(Fintype.card α)

Property 2: H(p) = log |α| if and only if p is uniform.

The forward direction uses strict concavity of negMulLog (strict Jensen): if any pᵢ ≠ 1/|α|, then H(p) < log |α|, contradicting H(p) = log |α|.

Property 3: Subadditivity #

source

theorem LeanPool.Shannon1948Formalization.entropyNat_jointLe_add {α β : Type} [Fintype α] [Fintype β] (p : ProbDist (α × β)) :

entropyNat p ≤ entropyNat (marginalFst p) + entropyNat (marginalSnd p)

Property 3 (subadditivity): H(X,Y) ≤ H(X) + H(Y).

The proof applies gibbs_inequality with q = prodDist (marginalFst p) (marginalSnd p), i.e., the product of marginals. The Gibbs sum telescopes to H(X,Y) - H(X) - H(Y) ≤ 0.

Property 4: Schur-concavity (doubly stochastic averaging) #

source

def LeanPool.Shannon1948Formalization.doublyStochasticApply {n : Type} [Fintype n] [DecidableEq n] (A : Matrix n n ℝ) (hA : A ∈ doublyStochastic ℝ n) (p : ProbDist n) :

ProbDist n

Apply a doubly stochastic matrix A to a probability distribution p, yielding the distribution with (Ap)_i = ∑_j A_{ij} p_j.

Equations

LeanPool.Shannon1948Formalization.doublyStochasticApply A hA p = ⟨fun (i : n) => ∑ j : n, A i j * ↑p j, ⋯⟩

Instances For

source

theorem LeanPool.Shannon1948Formalization.entropyNat_doublyStochastic_le {n : Type} [Fintype n] [DecidableEq n] (A : Matrix n n ℝ) (hA : A ∈ doublyStochastic ℝ n) (p : ProbDist n) :

entropyNat p ≤ entropyNat (doublyStochasticApply A hA p)

Property 4 (Schur-concavity): H(p) ≤ H(Ap) for doubly stochastic A.

The proof applies Jensen's inequality for negMulLog row-by-row with weights A_{ij}, then sums over rows and uses column-stochasticity to collapse.