
    TERM PAPER

    CSE-408

    TOPIC: HUFFMAN CODES

    SUBMITTED TO: MR. VIJAY GARG

    SUBMITTED BY: KARANBIR SINGH
    B.TECH CSE
    10804631
    RK1R08B39


    ACKNOWLEDGEMENT

    First and foremost, I, KARANBIR SINGH, am very thankful to Lect. VIJAY GARG,
    who assigned me this term paper on HUFFMAN CODES.

    I am heartily thankful to the college library for providing the books, and to my
    roommates and classmates for helping me assemble the notes related to this topic.

    Last but not the least, I am very thankful to my parents, who gave me financial
    support to complete my term paper.

    KARANBIR SINGH


    Contents

    1) Introduction
    2) Types of Huffman coding
       a) N-ary Huffman coding
       b) Adaptive Huffman coding
       c) Huffman template algorithm
       d) Length-limited Huffman coding
       e) Huffman coding with unequal letter costs
       f) Hu-Tucker coding
       g) Canonical Huffman code
    3) Properties
    4) Advantages
    5) Disadvantages
    6) Applications


    INTRODUCTION

    Huffman coding is an entropy encoding algorithm used for lossless data compression. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in his 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". Huffman coding is based on the frequency of occurrence of a data item (a pixel, in images). The principle is to use a smaller number of bits to encode the data that occurs more frequently. Codes are stored in a code book, which may be constructed for each image or for a set of images. In all cases the code book plus the encoded data must be transmitted to enable decoding.

    Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code"): the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol. The code expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to design a Huffman code in linear time if the input probabilities (also known as weights) are sorted. For a set of symbols with a uniform probability distribution and a count that is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.
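    The construction itself is short: put every symbol in a min-heap keyed by frequency, repeatedly merge the two lightest subtrees until one tree remains, then read codes off the root-to-leaf paths. Below is a minimal Python sketch of this procedure (the function name and the sample string are mine, for illustration):

    import heapq
    from collections import Counter

    def huffman_code(text):
        """Build a Huffman code for text; returns {symbol: bit string}."""
        heap = [(w, i, sym) for i, (sym, w) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        tick = len(heap)                      # tie-breaker for equal weights
        while len(heap) > 1:
            w1, _, t1 = heapq.heappop(heap)   # least frequent subtree
            w2, _, t2 = heapq.heappop(heap)   # second least frequent subtree
            heapq.heappush(heap, (w1 + w2, tick, (t1, t2)))
            tick += 1
        code = {}
        def walk(node, prefix):
            if isinstance(node, tuple):       # internal node: recurse
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                             # leaf: record the symbol's code
                code[node] = prefix or "0"
        walk(heap[0][2], "")
        return code

    print(huffman_code("abracadabra"))
    # e.g. {'a': '0', 'c': '100', 'd': '101', 'b': '110', 'r': '111'}

    The exact codewords depend on how ties between equal weights are broken, but the codeword lengths, and hence the compressed size, are the same for any valid tie-breaking.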


    TYPES OF HUFFMAN CODING

    N-ary Huffman coding

    The n-ary Huffman algorithm uses the {0, 1, ..., n − 1} alphabet to encode messages and builds an n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (n = 2) codes, except that the n least probable symbols are taken together, instead of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability placeholders must be added. This is because the tree must form an n-to-1 contractor; for binary coding, this is a 2-to-1 contractor, and any sized set can form such a contractor. If the number of source words is congruent to 1 modulo n − 1, then the set of source words will form a proper Huffman tree.
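    The placeholder count follows directly from the merge arithmetic: each merge turns n nodes into one, removing n − 1 nodes, so the leaf count must be congruent to 1 modulo n − 1. A small sketch of the calculation (the helper name is mine):

    def padding_needed(m, n):
        """Zero-probability placeholders required so that m source words
        can fill a full n-ary Huffman tree: each merge turns n nodes into
        one, so we need (m + p) congruent to 1 modulo (n - 1)."""
        if m <= 1:
            return 0
        return (1 - m) % (n - 1)

    print(padding_needed(6, 3))   # 1: seven leaves reduce 7 -> 5 -> 3 -> 1 in ternary
    print(padding_needed(5, 2))   # 0: binary trees never need padding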

    Adaptive Huffman coding

    A variation called adaptive Huffman coding involves calculating the

    probabilities dynamically based on recent actual frequencies in the

    sequence of source symbols, and changing the coding tree structure to

    match the updated probability estimates.
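    Genuine adaptive Huffman coding (the FGK and Vitter algorithms) rebalances the tree incrementally after every symbol; a faithful implementation is lengthy. The sketch below only approximates the idea, by periodically rebuilding a static code from the symbols seen so far, reusing the huffman_code function from the introduction sketch:

    def adaptive_demo(stream, rebuild_every=4):
        """Illustration only: real adaptive Huffman coding (FGK/Vitter)
        updates the tree after every symbol; here the code is simply
        rebuilt from the running history at fixed intervals."""
        seen = ""
        code = {}
        for i, sym in enumerate(stream):
            seen += sym
            if i % rebuild_every == 0:
                code = huffman_code(seen)     # re-estimate from history so far
            # A symbol with no codeword yet is new to the model; real
            # schemes emit an escape code followed by the literal symbol.
            print(sym, "->", code.get(sym, "<escape + literal>"))

    adaptive_demo("aaabbbbcc")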

    Huffman template algorithm

    Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only that the weights form a totally ordered commutative monoid, meaning a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many combining methods (not just addition).
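    A sketch of the template idea: the builder below takes the combining operation as a parameter. With ordinary addition it behaves like standard Huffman construction; the combiner max(a, b) + 1 is my own choice of example, matching the circuit-delay minimization mentioned below:

    import heapq

    def template_huffman(weights, combine):
        """Huffman template: needs only that weights can be ordered and
        combined.  Returns the root value and all intermediate values."""
        heap = list(zip(weights, range(len(weights))))
        heapq.heapify(heap)
        tick, merged = len(heap), []
        while len(heap) > 1:
            (a, _), (b, _) = heapq.heappop(heap), heapq.heappop(heap)
            c = combine(a, b)                 # the pluggable monoid operation
            merged.append(c)
            heapq.heappush(heap, (c, tick))
            tick += 1
        return heap[0][0], merged

    w = [1, 2, 3, 4]
    _, m = template_huffman(w, lambda a, b: a + b)
    print(sum(m))    # 19: sum of internal-node weights = minimum weighted path length
    root, _ = template_huffman(w, lambda a, b: max(a, b) + 1)
    print(root)      # 5: minimized max of (weight + depth), the circuit-delay objective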


    Such algorithms can solve other minimization problems, such as minimizing max_i [w_i + length(c_i)], a problem first applied to circuit design.

    Length-limited Huffman coding

    Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linearithmic time, unlike the pre-sorted and unsorted conventional Huffman problems, respectively.

    Huffman coding with unequal letter costs

    In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are constructed from has an equal cost to transmit: a code word whose length is N digits will always have a cost of N, no matter how many of those digits are 0s, how many are 1s, etc. When working under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing. Huffman coding with unequal letter costs is the generalization in which this assumption is no longer true: the letters of the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An example is the encoding alphabet of Morse code, where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this in the same manner or with the same efficiency as conventional Huffman coding.

    Optimal alphabetic binary trees (Hu-Tucker coding)

    In the standard Huffman coding problem, it is assumed that any codeword can correspond to any input symbol. In the alphabetic version, the alphabetic order of inputs and outputs must be identical. Thus, for example, the ordered inputs a, b, c could not be assigned the codewords {00, 1, 01}, but would instead be assigned either {00, 01, 1} or {0, 10, 11}. This is also known as the Hu-Tucker problem, after the authors of the paper presenting the first linearithmic solution to this optimal alphabetic problem, which has some similarities to Huffman's algorithm but is not a variation of it. These optimal alphabetic binary trees are often used as binary search trees.

    Canonical Huffman code

    If weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman code has the same lengths as the optimal alphabetic code, which can be found by calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting from numerically (re-)ordered input is sometimes called the canonical Huffman code and is often the code used in practice, due to its ease of encoding and decoding. The technique for finding this code is sometimes called Huffman-Shannon-Fano coding, since it is optimal like Huffman coding, but alphabetic in weight probability, like Shannon-Fano coding. A Huffman-Shannon-Fano code has the same codeword lengths as the original Huffman solution, and is therefore also optimal.
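    Since a canonical code is fully determined by its codeword lengths, it can be assigned with a simple counting rule. A sketch (the function name is mine; it assumes the lengths already form a valid prefix code): sort symbols by (length, symbol), then hand out consecutive binary values, left-shifting whenever the length grows:

    def canonical_codes(lengths):
        """Assign canonical codewords from per-symbol code lengths: sort by
        (length, symbol), then count upward in binary, widening the counter
        whenever the codeword length increases."""
        order = sorted(range(len(lengths)), key=lambda i: (lengths[i], i))
        codes, value, prev = {}, 0, 0
        for i in order:
            value <<= lengths[i] - prev       # widen to the new length
            codes[i] = format(value, "0%db" % lengths[i])
            value += 1
            prev = lengths[i]
        return codes

    print(canonical_codes([2, 2, 3, 3, 2]))
    # {0: '00', 1: '01', 4: '10', 2: '110', 3: '111'}

    Because the whole code is reconstructible from the length list alone, only the lengths need to be stored in the compressed file, which is why formats such as DEFLATE use canonical codes.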

    PROPERTIES

    1. Unique prefix property: no code is a prefix of any other

    code (all symbols are at the leaf nodes), which makes

    decoding unambiguous.

    2. If prior statistics are available and accurate, then

    Huffman coding is very good

    3. The frequencies used can be generic ones for the

    application domain that are based on average experience,

    or they can be the actual frequencies found in the text

    being compressed.

    4. Huffman coding is optimal when the probability of each

    input symbol is a negative power of two (see the check

    after this list).

    5. The worst case for Huffman coding can happen when

    the probability of a symbol exceeds 2^(-1) = 0.5, making the

    upper limit of inefficiency unbounded. These situations often

    respond well to a form of blocking called run-length encoding.
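    Property 4 can be checked numerically: when every probability is a negative power of two, the Huffman average codeword length equals the source entropy exactly. A small sketch (the helper name and the example distribution are mine):

    import heapq, math

    def huffman_lengths(probs):
        """Codeword lengths for the given probabilities (standard merging)."""
        heap = [(p, i, [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        tick = len(heap)
        while len(heap) > 1:
            p1, _, m1 = heapq.heappop(heap)
            p2, _, m2 = heapq.heappop(heap)
            for i in m1 + m2:
                lengths[i] += 1               # merged symbols sink one level
            heapq.heappush(heap, (p1 + p2, tick, m1 + m2))
            tick += 1
        return lengths

    p = [0.5, 0.25, 0.125, 0.125]             # every value a negative power of 2
    L = huffman_lengths(p)
    avg = sum(pi * li for pi, li in zip(p, L))
    H = -sum(pi * math.log2(pi) for pi in p)
    print(L, avg, H)                          # [1, 2, 3, 3] 1.75 1.75: no gap at all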

    ADVANTAGES

    The algorithm is easy to implement.

    It produces a lossless compression of images.

    DISADVANTAGES

    Efficiency depends on the accuracy of the

    statistical model used and the type of image.


    The achievable ratio varies with different formats, but few

    implementations get better than 8:1 compression.

    Compression of image files that contain long runs

    of identical pixels is not as efficient under Huffman

    coding as under run-length encoding (RLE).

    The Huffman encoding process is usually done in

    two passes. During the first pass, a statistical

    model is built; in the second pass, the

    image data is encoded based on the generated

    model. Huffman encoding is therefore a relatively

    slow process, as time is required to build the

    statistical model in order to achieve an efficient

    compression rate.

    Another disadvantage of Huffman coding is that all

    codes of the encoded data are of different sizes (not of fixed length).

    It is therefore very difficult for the decoder to know

    when it has reached the last bit of a code; the only way

    to know is to follow the paths of the upside-down tree

    from the root until coming to an end of it (one of the

    leaves), as the decoding sketch after this list shows.

    Thus, if the encoded data is corrupted, with additional

    bits added or bits missing, whatever is decoded will be

    wrong values, and the final image displayed will be garbage.

    The Huffman table must be sent at the

    beginning of the compressed file; otherwise the

    decompressor will not be able to decode it. This

    causes overhead.
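    The decoding difficulty described above can be seen in a few lines: the decoder must walk the tree bit by bit, so a single corrupted bit desynchronizes everything that follows. A toy sketch (the tree literal is a made-up example):

    def decode(bits, tree):
        """Walk a (left, right) nested-pair code tree bit by bit; a leaf is
        any non-tuple value.  One flipped or dropped bit desynchronizes
        every codeword that follows it."""
        out, node = [], tree
        for b in bits:
            node = node[0] if b == "0" else node[1]
            if not isinstance(node, tuple):   # reached a leaf: emit and restart
                out.append(node)
                node = tree
        return "".join(out)

    tree = ("a", ("b", "c"))                  # toy code: a=0, b=10, c=11
    print(decode("010110", tree))             # "abca": the intended message
    print(decode("110110", tree))             # "caca": one flipped bit, all wrong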


    APPLICATIONS

    1. Arithmetic coding can be viewed as a generalization of

    Huffman coding; indeed, in practice arithmetic coding is

    often preceded by Huffman coding, as it is easier to find

    an arithmetic code for a binary input than for a nonbinary

    input.

    2. Huffman coding is in wide use because of its simplicity,

    high speed and lack of encumbrance by patents.

    3. Huffman coding today is often used as a "back-end" to

    some other compression method. DEFLATE (PKZIP's

    algorithm) and multimedia codecs such as JPEG and MP3

    have a front-end model and quantization followed by

    Huffman coding.
