lecture1_sourcecode



    3F1 - Signals and Systems

    Michaelmas 2005

    Information Theory: Handout 1

    Andrea Lecchini Visintini

    30 November 2005


    Source coding

    Source characters: {x1, x2, . . . , xM}

    e.g. the English alphabet {A, B, C, . . . }

    A message is a sequence of source characters

    e.g. HELLO

    Code characters: {1, . . . , D}

    A code is a rule which assigns a sequence of code characters to each element of the source alphabet.

    Each of these sequences is called a code word.

    Here we will consider binary {0, 1} codes.

    e.g. H = 1011
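    For concreteness, encoding with such a rule is just table lookup and concatenation. A Python sketch (the codebook is a made-up example, not a standard code):

        # A binary code as a mapping from source characters to code words.
        codebook = {"H": "1011", "E": "0", "L": "10", "O": "11"}

        def encode(message):
            # Concatenate the code word of each source character.
            return "".join(codebook[ch] for ch in message)

        print(encode("HELLO"))   # "1011" + "0" + "10" + "10" + "11"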


    Design of a code

    (1) Unique decipherability

    Every finite sequence of code characters must correspond to at most one message,

    e.g. if the code is

    x1 → 0
    x2 → 010
    x3 → 01
    x4 → 10

    then the sequence 010 is ambiguous: x2, x3 x1 and x1 x4 all encode to 010.
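    A brute-force check (a sketch of mine, not from the handout) makes the ambiguity concrete by enumerating every way of parsing a sequence into code words:

        # Enumerate all parses of a bit string under the code above.
        code = {"x1": "0", "x2": "010", "x3": "01", "x4": "10"}

        def parses(s):
            if s == "":
                return [[]]            # one parse of the empty string
            results = []
            for name, word in code.items():
                if s.startswith(word):
                    for rest in parses(s[len(word):]):
                        results.append([name] + rest)
            return results

        print(parses("010"))
        # [['x1', 'x4'], ['x2'], ['x3', 'x1']] -- three messages for one string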

    Here we will consider a class of uniquely decipherable

    codes denoted instantaneous codes.

    (2) Efficiency

    The shortest code words must be assigned to the source symbols which are transmitted most frequently,

    e.g. in Morse code E = · while Q = − − · −

    The efficiency of a given code for a given source depends on the frequencies of the symbols {x1, x2, . . . , xM} produced by the source and on the lengths of the code words.


    Instantaneous decipherability

    Instantaneous code: a code which satisfies the prefix condition.

    Prefix condition: no code word is a prefix of another code word,

    e.g. if a code word is 01 then there cannot be 01011

    e.g. an instantaneous code

    x1 → 0
    x2 → 100
    x3 → 101
    x4 → 11

    Decoding Algorithm

    Given a finite sequence of code characters:

    1. Proceed from the left until a code word is found.
    2. Output the corresponding source character and repeat until the end of the message.

    e.g. 101110100101 → 101 | 11 | 0 | 100 | 101 → x3 x4 x1 x2 x3
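    The decoding loop is a one-pass scan. A Python sketch of it (my own rendering of the algorithm above, using the instantaneous code just given):

        # Decode a bit string with an instantaneous (prefix) code by
        # scanning from the left until a complete code word is seen.
        code = {"0": "x1", "100": "x2", "101": "x3", "11": "x4"}

        def decode(bits):
            out, word = [], ""
            for b in bits:
                word += b
                if word in code:       # code word complete: emit and restart
                    out.append(code[word])
                    word = ""
            if word:
                raise ValueError("trailing bits do not form a code word")
            return out

        print(decode("101110100101"))  # ['x3', 'x4', 'x1', 'x2', 'x3']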

    Every instantaneous code is uniquely decipherable but the converse is not true.

    There is a relationship between instantaneous decipherability and the lengths of the code words. This relationship is important because the efficiency of a code will be evaluated on the basis of its code word lengths.


    Theorem (Kraft inequality)

    Let {x1, . . . , xM} be the source characters. Then a binary instantaneous code with word lengths {n1, n2, . . . , nM} exists if and only if

    $$\sum_{i=1}^{M} 2^{-n_i} \le 1$$

    Some examples:

    x1 → 0      n1 = 1
    x2 → 1      n2 = 1

    $$\sum_{i=1}^{2} 2^{-n_i} = \frac{1}{2} + \frac{1}{2} = 1$$

    x1 → 0      n1 = 1
    x2 → 100    n2 = 3
    x3 → 101    n3 = 3
    x4 → 11     n4 = 2

    $$\sum_{i=1}^{4} 2^{-n_i} = \frac{1}{2} + \frac{1}{8} + \frac{1}{8} + \frac{1}{4} = 1$$

    x1 → 00     n1 = 2
    x2 → 100    n2 = 3
    x3 → 1100   n3 = 4

    $$\sum_{i=1}^{3} 2^{-n_i} = \frac{1}{4} + \frac{1}{8} + \frac{1}{16} = \frac{7}{16}$$
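    The Kraft sums above are quick to verify numerically. A Python sketch (my own check, not part of the handout), using exact fractions to avoid rounding:

        from fractions import Fraction

        def kraft_sum(lengths):
            # Sum of 2^(-n_i) over the code word lengths, kept exact.
            return sum(Fraction(1, 2**n) for n in lengths)

        # The three examples above: {1,1}, {1,3,3,2} and {2,3,4}.
        for lengths in ([1, 1], [1, 3, 3, 2], [2, 3, 4]):
            s = kraft_sum(lengths)
            print(lengths, s, "<= 1:", s <= 1)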


    Proof

    First we prove that the inequality is a necessary condition, i.e. any instantaneous code must satisfy it.

    Let n be the maximum of {n1, n2, . . . , nM}. Consider the table of all the $2^n$ possible words of length n,

    e.g. for n = 4:

    0 0 0 0
    0 0 0 1
    0 0 1 0
    0 0 1 1
    0 1 0 0
    0 1 0 1
    0 1 1 0
    0 1 1 1
    1 0 0 0
    1 0 0 1
    1 0 1 0
    1 0 1 1
    1 1 0 0
    1 1 0 1
    1 1 1 0
    1 1 1 1

    Each of the code words, with lengths {n1, n2, . . . , nM}, is a prefix of some lines of the table.

    However, since no code word can be a prefix of another code word, each line in the table can correspond to at most one code word. In fact, if two code words were both a prefix of the same line then one of them would be a prefix of the other.

    A code word of length $n_i$ occupies $2^{n - n_i}$ lines of the table,


    e.g. 01 occupies

    0 1 0 0
    0 1 0 1
    0 1 1 0
    0 1 1 1

    Since the table has $2^n$ lines, the total number of lines occupied by the code words can be at most $2^n$. Hence

    $$\sum_{i=1}^{M} 2^{n - n_i} \le 2^n \quad \Longrightarrow \quad \sum_{i=1}^{M} 2^{-n_i} \le 1$$

    Therefore any instantaneous code must satisfy the inequality.

    In order to prove sufficiency we need to show that if lengths {n1, n2, . . . , nM} are given which satisfy the inequality, then a set of code words having these lengths and satisfying the prefix condition can be constructed.

    The code words can easily be generated using the same table. Let n again be the maximum length. Construct the table of all the possible words of that length in ascending binary order. Take the code word lengths in ascending order, assign $2^{n - n_i}$ consecutive lines of the table to each word length, and record their common prefix (of length $n_i$) as the code word.
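    The construction is short enough to write out. A Python sketch (an illustration under my reading of the procedure, not code from the handout):

        def prefix_code(lengths):
            # Build code words satisfying the prefix condition from lengths
            # satisfying the Kraft inequality, via the table construction:
            # walk the 2^n table in ascending binary order, giving each
            # length (taken in ascending order) a block of 2^(n-ni) lines.
            n = max(lengths)
            line = 0                      # next unassigned line of the table
            words = []
            for ni in sorted(lengths):
                # The common prefix of the block is the first ni bits of
                # `line` written as an n-bit binary word.
                words.append(format(line, f"0{n}b")[:ni])
                line += 2 ** (n - ni)
            return words

        print(prefix_code([1, 3, 3, 2]))  # ['0', '10', '110', '111']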


    Efficiency of a code

    To measure the efficiency of a code for a given source we need to know the frequencies of the symbols {x1, . . . , xM} produced by the source. This can be done a priori by introducing a probabilistic model of the source.

    Definition (Information source)

    An information source X is a sequence of random variables X0, X1, X2, . . . such that:

    1. each Xt takes on values in {x1, x2, . . . , xM}; and
    2. the sequence is stationary, i.e.

    $$P\{X_{t_1} = x_{i_1}, \ldots, X_{t_k} = x_{i_k}\} = P\{X_{t_1+h} = x_{i_1}, \ldots, X_{t_k+h} = x_{i_k}\}$$

    for all nonnegative integers $t_1, \ldots, t_k$, $i_1, \ldots, i_k$, $k$ and $h$.

    Given a source X, we denote by Xt the generic random variable in the sequence. The simplest model of an information source is the memoryless model, in which the sequence consists of independent random variables.
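    For instance, a memoryless source over a four-character alphabet can be simulated in a few lines. A Python sketch (the alphabet and probabilities are made-up values for illustration):

        import random

        # Memoryless source: each character is drawn independently from
        # the same distribution over the source alphabet.
        alphabet = ["x1", "x2", "x3", "x4"]
        probs = [0.5, 0.25, 0.125, 0.125]      # made-up probabilities

        message = random.choices(alphabet, weights=probs, k=10)
        print(message)                          # e.g. ['x1', 'x2', 'x1', ...]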

    Given a source X such that Xt takes on values {x1, x2, . . . , xM} with probabilities {p1, p2, . . . , pM}, the efficiency of a code is then measured by the average code word length, which is given by:

    $$L = \sum_{i=1}^{M} p_i n_i$$

    where {n1, n2, . . . , nM} are the code word lengths assigned to {x1, x2, . . . , xM}.
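    As a numeric sketch (the probabilities are the made-up ones from the sampling example above, paired with the lengths of the instantaneous code from earlier):

        probs = [0.5, 0.25, 0.125, 0.125]   # made-up source probabilities
        lengths = [1, 3, 3, 2]              # lengths of the code {0, 100, 101, 11}

        L = sum(p * n for p, n in zip(probs, lengths))
        print(L)   # 0.5*1 + 0.25*3 + 0.125*3 + 0.125*2 = 1.875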

    Our objective now is to design codes which:

    (1) are instantaneously decipherable; and

    (2) have small average code word length.


    Definition of Entropy (Shannon)

    Given a random variable X, which takes on values {x1, x2, . . . , xM} with probabilities {p1, . . . , pM}, the quantity

    $$H(X) = -\sum_{i=1}^{M} p_i \log_2 p_i$$

    is the Entropy of X.
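    The definition translates directly into code. A Python sketch, evaluated on the same made-up distribution as above:

        from math import log2

        def entropy(probs):
            # H(X) = -sum_i p_i log2(p_i); terms with p_i = 0 contribute 0.
            return -sum(p * log2(p) for p in probs if p > 0)

        print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits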

    Theorem (lower bound on efficiency)

    Let X be an information source such that Xt takes on values {x1, x2, . . . , xM} with probabilities {p1, p2, . . . , pM}. Then the average code word length of any binary instantaneous code which encodes X satisfies:

    $$L \ge H(X_t)$$

    Proof

    $$H(X_t) - L = \sum_{i=1}^{M} p_i \left( \log_2 \frac{1}{p_i} - n_i \right) = \sum_{i=1}^{M} p_i \log_2 \frac{2^{-n_i}}{p_i}$$

    $$= \sum_{i=1}^{M} p_i \, \log_2(e) \ln \frac{2^{-n_i}}{p_i} \qquad \text{since } \log_a(x) = \log_a(b) \log_b(x)$$

    $$\le \sum_{i=1}^{M} p_i \, \log_2(e) \left( \frac{2^{-n_i}}{p_i} - 1 \right) \qquad \text{since } \ln(x) \le x - 1$$

    $$= \log_2(e) \left( \sum_{i=1}^{M} 2^{-n_i} - \sum_{i=1}^{M} p_i \right)$$

    Since $\sum_{i=1}^{M} 2^{-n_i} \le 1$ (for instantaneous decipherability) and $\sum_{i=1}^{M} p_i = 1$, the last expression on the right hand side is less than or equal to zero, hence $L \ge H(X_t)$.


    In general it is not possible to construct a code which attains the lower bound on the average code word length, but it is always possible to construct a code whose average code word length is less than the lower bound plus one bit.

    Theorem

    Let X be an information source. Then there exists a binary instantaneous code such that

    $$L < H(X_t) + 1$$

    Proof

    Choose code word lengths $n_i = \lceil -\log_2 p_i \rceil$. This means that

    $$-\log_2 p_i \le n_i < -\log_2 p_i + 1 \, .$$

    An instantaneous code with such code word lengths exists because the left hand side of the inequality implies that

    $$\sum_{i=1}^{M} 2^{-n_i} \le \sum_{i=1}^{M} 2^{\log_2 p_i} = \sum_{i=1}^{M} p_i = 1$$

    As for the average code word length, the right hand side of the inequality implies

    $$L = \sum_{i=1}^{M} p_i n_i < \sum_{i=1}^{M} p_i \left( -\log_2 p_i + 1 \right) = H(X_t) + 1 \, .$$
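    Both bounds are easy to check numerically for a given distribution. A Python sketch (my illustration, reusing the made-up distribution from above):

        from math import ceil, log2

        probs = [0.5, 0.25, 0.125, 0.125]     # made-up source probabilities

        lengths = [ceil(-log2(p)) for p in probs]
        H = -sum(p * log2(p) for p in probs)
        L = sum(p * n for p, n in zip(probs, lengths))

        print(lengths)                             # [1, 2, 3, 3]
        print(sum(2.0**-n for n in lengths) <= 1)  # Kraft holds: True
        print(H <= L < H + 1)                      # lower and upper bounds: True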