Why am I suddenly learning classical information theory? Mainly to lay a bit of groundwork for quantum information. I probably won't go very deep.

Entropy

First of all, what is entropy?
The entropy of X is defined by

H(X)=-\sum_{x\in\mathcal{X}}p(x)\log p(x)

A measure of the uncertainty of a random variable.
It is always nonnegative.
For a uniform distribution:
When X is uniform over \mathcal{X}, then H(X)=\log|\mathcal{X}|.
If the base of the logarithm needs to be specified, entropies in different bases are related by the change-of-base formula H_b(X)=(\log_b a)H_a(X).
The logarithm is to the base 2 and the unit is bits. If the base of the logarithm is b, we denote the entropy by H_b(X). If b=e, the entropy is measured in nats.
Unless otherwise specified, entropies will be measured in bits.
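A quick numerical sanity check of the change-of-base relation (a minimal Python sketch; the distribution [0.5, 0.25, 0.25] is just an arbitrary example):

import math

p = [0.5, 0.25, 0.25]                            # an arbitrary example distribution
H_bits = -sum(q * math.log2(q) for q in p)       # entropy in bits (base 2)
H_nats = -sum(q * math.log(q) for q in p)        # entropy in nats (base e)

# Change of base: H_e(X) = (log_e 2) * H_2(X)
assert abs(H_nats - math.log(2) * H_bits) < 1e-12
print(H_bits, H_nats)                            # 1.5 bits, about 1.0397 nats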

Example
X=1 with probability p
X=0 with probability 1-p
Then H(X)=-p\log p-(1-p)\log(1-p).
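A small sketch of this binary entropy function (the helper name binary_entropy is my own); it vanishes at p = 0 or 1 and peaks at 1 bit when p = 1/2:

import math

def binary_entropy(p: float) -> float:
    # H(X) = -p log2 p - (1-p) log2(1-p), with the convention 0 log 0 = 0
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(binary_entropy(p), 4))        # maximum of 1 bit at p = 0.5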

We can also view entropy from the perspective of expectation:
We denote expectation by E. If X\sim p(x), the expected value of the random variable g(X) is written E_{p}g(X)=\sum\limits_{x\in\mathcal{X}}g(x)p(x), so that H(X)=E_{p}\log\frac{1}{p(X)}.


For a discrete random variable X defined on \mathcal{X},

0\leq H(X)\leq\log|\mathcal{X}|
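A short sketch checking both the expectation form and these bounds, assuming an arbitrary alphabet of size 4 (the distributions are made-up examples):

import math

def entropy(p):
    # Shannon entropy in bits; terms with p(x) = 0 contribute nothing
    return -sum(q * math.log2(q) for q in p if q > 0)

def entropy_via_expectation(p):
    # H(X) = E_p[log2(1/p(X))]: average of log2(1/p(x)) weighted by p(x)
    return sum(q * math.log2(1 / q) for q in p if q > 0)

p = [0.5, 0.2, 0.2, 0.1]                         # arbitrary distribution, |X| = 4
uniform = [0.25] * 4

assert abs(entropy(p) - entropy_via_expectation(p)) < 1e-12
assert 0 <= entropy(p) <= math.log2(4)           # 0 <= H(X) <= log|X|
assert abs(entropy(uniform) - math.log2(4)) < 1e-12   # uniform attains log|X| = 2 bits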

Entropy is closely tied to probability theory; many of its deeper properties are really facts about probability.

Joint Entropy

The joint entropy H(X,Y) of a pair of discrete random variables (X,Y) with joint distribution p(x,y) is defined as

H(X,Y)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(x,y)

Going further, we can extend the definition:
For a set of random variables X_1,\dots,X_n with joint distribution p(x_1,\dots,x_n), its joint entropy is defined as

H(X_{1},X_{2},\dots,X_{n})=-\sum_{x_{1},\dots,x_{n}}p(x_{1},x_{2},\dots,x_{n})\log p(x_{1},x_{2},\dots,x_{n})=-E\log p(X_{1},\dots,X_{n})
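A minimal sketch of the two-variable case, using a small made-up joint table p(x, y) stored as a dict:

import math

# Hypothetical joint distribution p(x, y); the four entries sum to 1
p_xy = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.40, (1, 1): 0.10,
}

# H(X, Y) = -sum_{x, y} p(x, y) log2 p(x, y)
H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)
print(H_XY)                                      # joint entropy in bits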

Conditional Entropy

When X=x is known, p(Y|X=x) is still a probability distribution:

\sum_yp(Y=y|X=x)=\sum_y\frac{p(x,y)}{p(x)}=\frac{p(x)}{p(x)}=1

Naturally, all the possibilities sum to 1.
We then define:
If (X,Y)\sim p(x,y), the conditional entropy H(Y|X) is defined as

\begin{aligned} H(Y|X)&=\sum_{x\in\mathcal{X}}p(x)H(Y|X=x) \\ &=-\sum_{x\in\mathcal{X}}p(x)\sum_{y\in\mathcal{Y}}p(y|x)\log p(y|x) \\ &=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(y|x) \\ &=-E\log p(Y|X) \end{aligned}

H(Y|X) measures the uncertainty remaining in Y once X is known; conditioning never increases entropy, so H(Y|X)\leq H(Y).
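A sketch computing H(Y|X) from a joint table via the third line above, H(Y|X)=-\sum_{x,y}p(x,y)\log p(y|x), and checking H(Y|X)\leq H(Y) (same made-up joint table as before):

import math
from collections import defaultdict

p_xy = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.40, (1, 1): 0.10,
}

# Marginals p(x) and p(y)
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), where p(y|x) = p(x,y) / p(x)
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)
H_Y = -sum(p * math.log2(p) for p in p_y.values() if p > 0)

assert H_Y_given_X <= H_Y + 1e-12                # conditioning never increases entropy
print(H_Y_given_X, H_Y)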

Chain Rule

H(X,Y)=H(Y)+H(X|Y)=H(X)+H(Y|X)
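The chain rule follows from p(x,y)=p(x)p(y|x) together with \sum_{y}p(x,y)=p(x); a short derivation in the same notation as above:

\begin{aligned} H(X,Y)&=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(x,y) \\ &=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(x)-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(y|x) \\ &=-\sum_{x\in\mathcal{X}}p(x)\log p(x)+H(Y|X) \\ &=H(X)+H(Y|X) \end{aligned}

Swapping the roles of X and Y gives H(X,Y)=H(Y)+H(X|Y).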