Documentation

LeanPool.Shannon1948Formalization.Entropy.Properties

Shannon.Entropy.Properties #

The fundamental properties of Shannon entropy from Section 6 of Shannon (1948, pp. 11-12). These are consequences of the entropy formula, proved using the Gibbs inequality and concavity of negMulLog.

Main results #

  1. entropyNat_eq_zero_iffH(p) = 0 iff p is deterministic
  2. entropyNat_eq_log_card_iffH(p) = log |α| iff p is uniform
  3. entropyNat_jointLe_add — subadditivity: H(X,Y) ≤ H(X) + H(Y)
  4. entropyNat_doublyStochastic_le — Schur-concavity: doubly stochastic averaging
  5. condEntropy_le_entropyNat — conditioning reduces entropy: H_X(Y) ≤ H(Y)
  6. condEntropy_nonneg — conditional entropy is nonnegative

Property 1: H = 0 iff deterministic #

A distribution is deterministic if some outcome has probability one.

Equations
Instances For

    Property 1: H(p) = 0 if and only if p is deterministic.

    Forward: H = 0 means each negMulLog(pᵢ) = 0, forcing each pᵢ ∈ {0, 1}; the sum constraint ∑ pᵢ = 1 then gives exactly one pᵢ = 1. Backward: 1 · log 1 = 0 and 0 · log 0 = 0.

    Property 2: H maximized at uniform #

    Property 2: H(p) = log |α| if and only if p is uniform.

    The forward direction uses strict concavity of negMulLog (strict Jensen): if any pᵢ ≠ 1/|α|, then H(p) < log |α|, contradicting H(p) = log |α|.

    Property 3: Subadditivity #

    Property 3 (subadditivity): H(X,Y) ≤ H(X) + H(Y).

    The proof applies gibbs_inequality with q = prodDist (marginalFst p) (marginalSnd p), i.e., the product of marginals. The Gibbs sum telescopes to H(X,Y) - H(X) - H(Y) ≤ 0.

    Property 4: Schur-concavity (doubly stochastic averaging) #

    Apply a doubly stochastic matrix A to a probability distribution p, yielding the distribution with (Ap)_i = ∑_j A_{ij} p_j.

    Equations
    Instances For

      Property 4 (Schur-concavity): H(p) ≤ H(Ap) for doubly stochastic A.

      The proof applies Jensen's inequality for negMulLog row-by-row with weights A_{ij}, then sums over rows and uses column-stochasticity to collapse.

      Properties 5-6: Conditioning reduces entropy #

      Property 6: conditional entropy is nonnegative.

      Each ratio p(x,y)/p_X(x) ≤ 1 (since p(x,y) ≤ ∑_y p(x,y) = p_X(x)), so log(p(x,y)/p_X(x)) ≤ 0 and each summand is nonpositive.

      Property 5: conditioning reduces entropy, H_X(Y) ≤ H(Y).

      From the chain rule H(X,Y) = H(X) + H_X(Y) and subadditivity H(X,Y) ≤ H(X) + H(Y), we get H_X(Y) ≤ H(Y).