
    TERM PAPER

    CSE-408

    TOPIC: HUFFMAN CODES

    SUBMITTED TO: MR. VIJAY GARG

    SUBMITTED BY: KARANBIR SINGH
    B.TECH CSE
    10804631
    RK1R08B39


    ACKNOWLEDGEMENT

    First and foremost, I, KARANBIR SINGH, am very thankful to Lect. VIJAY GARG,
    who assigned me this term paper on HUFFMAN CODES.

    I am heartily thankful to the college library for providing the books, and to my
    roommates and classmates for helping me assemble the notes related to this topic.

    Last but not the least, I am very thankful to my parents, who gave me financial
    support to complete my term paper.

    KARANBIR SINGH


    Contents

    1) Introduction
    2) Types of Huffman coding
       a) N-ary Huffman coding
       b) Adaptive Huffman coding
       c) Huffman template algorithm
       d) Length-limited Huffman coding
       e) Huffman coding with unequal letter costs
       f) Hu-Tucker coding
       g) Canonical Huffman code
    3) Properties
    4) Advantages
    5) Disadvantages
    6) Applications


    INTRODUCTION

    Huffman coding is an entropy encoding algorithm used for lossless data compression. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in his 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". Huffman coding is based on the frequency of occurrence of a data item (a pixel, in images). The principle is to use a smaller number of bits to encode the data that occurs more frequently. Codes are stored in a code book, which may be constructed for each image or for a set of images. In all cases the code book plus the encoded data must be transmitted to enable decoding.

    Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code"): the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol. The code expresses the most common source symbols using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to design a Huffman code in linear time if the input probabilities (also known as weights) are sorted. For a set of symbols with a uniform probability distribution and a count that is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.
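    The construction itself is short: put every symbol in a min-heap keyed by frequency, repeatedly merge the two lightest subtrees until one tree remains, then read codes off the root-to-leaf paths. Below is a minimal Python sketch of this procedure (the function name and the sample string are mine, for illustration):

    import heapq
    from collections import Counter

    def huffman_code(text):
        """Build a Huffman code for text; returns {symbol: bit string}."""
        heap = [(w, i, sym) for i, (sym, w) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        tick = len(heap)                      # tie-breaker for equal weights
        while len(heap) > 1:
            w1, _, t1 = heapq.heappop(heap)   # least frequent subtree
            w2, _, t2 = heapq.heappop(heap)   # second least frequent subtree
            heapq.heappush(heap, (w1 + w2, tick, (t1, t2)))
            tick += 1
        code = {}
        def walk(node, prefix):
            if isinstance(node, tuple):       # internal node: recurse
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                             # leaf: record the symbol's code
                code[node] = prefix or "0"
        walk(heap[0][2], "")
        return code

    print(huffman_code("abracadabra"))
    # e.g. {'a': '0', 'c': '100', 'd': '101', 'b': '110', 'r': '111'}

    The exact codewords depend on how ties between equal weights are broken, but the codeword lengths, and hence the compressed size, are the same for any valid tie-breaking.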


    TYPES OF HUFFMAN CODING

    N-ary Huffman coding

    The n-ary Huffman algorithm uses the {0, 1, ..., n − 1} alphabet to encode messages and builds an n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (n = 2) codes, except that the n least probable symbols are taken together, instead of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability placeholders must be added. This is because the tree must form an n-to-1 contractor; for binary coding, this is a 2-to-1 contractor, and any sized set can form such a contractor. If the number of source words is congruent to 1 modulo n − 1, then the set of source words will form a proper Huffman tree.
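    The placeholder count follows directly from the merge arithmetic: each merge turns n nodes into one, removing n − 1 nodes, so the leaf count must be congruent to 1 modulo n − 1. A small sketch of the calculation (the helper name is mine):

    def padding_needed(m, n):
        """Zero-probability placeholders required so that m source words
        can fill a full n-ary Huffman tree: each merge turns n nodes into
        one, so we need (m + p) congruent to 1 modulo (n - 1)."""
        if m <= 1:
            return 0
        return (1 - m) % (n - 1)

    print(padding_needed(6, 3))   # 1: seven leaves reduce 7 -> 5 -> 3 -> 1 in ternary
    print(padding_needed(5, 2))   # 0: binary trees never need padding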

    Adaptive Huffman coding

    A variation called adaptive Huffman coding involves calculating the

    probabilities dynamically based on recent actual frequencies in the

    sequence of source symbols, and changing the coding tree structure to

    match the updated probability estimates.
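    Genuine adaptive Huffman coding (the FGK and Vitter algorithms) rebalances the tree incrementally after every symbol; a faithful implementation is lengthy. The sketch below only approximates the idea, by periodically rebuilding a static code from the symbols seen so far, reusing the huffman_code function from the introduction sketch:

    def adaptive_demo(stream, rebuild_every=4):
        """Illustration only: real adaptive Huffman coding (FGK/Vitter)
        updates the tree after every symbol; here the code is simply
        rebuilt from the running history at fixed intervals."""
        seen = ""
        code = {}
        for i, sym in enumerate(stream):
            seen += sym
            if i % rebuild_every == 0:
                code = huffman_code(seen)     # re-estimate from history so far
            # A symbol with no codeword yet is new to the model; real
            # schemes emit an escape code followed by the literal symbol.
            print(sym, "->", code.get(sym, "<escape + literal>"))

    adaptive_demo("aaabbbbcc")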

    Huffman template algorithm

    Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only that the weights form a totally ordered commutative monoid, meaning a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many combining methods (not just addition).
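    A sketch of the template idea: the builder below takes the combining operation as a parameter. With ordinary addition it behaves like standard Huffman construction; the combiner max(a, b) + 1 is my own choice of example, matching the circuit-delay minimization mentioned below:

    import heapq

    def template_huffman(weights, combine):
        """Huffman template: needs only that weights can be ordered and
        combined.  Returns the root value and all intermediate values."""
        heap = list(zip(weights, range(len(weights))))
        heapq.heapify(heap)
        tick, merged = len(heap), []
        while len(heap) > 1:
            (a, _), (b, _) = heapq.heappop(heap), heapq.heappop(heap)
            c = combine(a, b)                 # the pluggable monoid operation
            merged.append(c)
            heapq.heappush(heap, (c, tick))
            tick += 1
        return heap[0][0], merged

    w = [1, 2, 3, 4]
    _, m = template_huffman(w, lambda a, b: a + b)
    print(sum(m))    # 19: sum of internal-node weights = minimum weighted path length
    root, _ = template_huffman(w, lambda a, b: max(a, b) + 1)
    print(root)      # 5: minimized max of (weight + depth), the circuit-delay objective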


    Such algorithms can solve other minimization problems, such as minimizing max_i [w_i + length(c_i)], a problem first applied to circuit design.

    Length-limited Huffman coding

    Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linearithmic time, unlike the pre-sorted and unsorted conventional Huffman problems, respectively.

    Huffman coding with unequal letter costs

    In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are constructed from has an equal cost to transmit: a code word whose length is N digits will always have a cost of N, no matter how many of those digits are 0s, how many are 1s, etc. When working under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing. Huffman coding with unequal letter costs is the generalization in which this assumption is no longer true: the letters of the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An example is the encoding alphabet of Morse code, where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this in the same manner or with the same efficiency as conventional Huffman coding.

    Optimal alphabetic binary trees (Hu-Tucker coding)

    In the standard Huffman coding problem, it is assumed that any codeword can correspond to any input symbol. In the alphabetic version, the alphabetic order of inputs and outputs must be identical. Thus, for example, the ordered inputs a, b, c could not be assigned the codewords {00, 1, 01}, but would instead be assigned either {00, 01, 1} or {0, 10, 11}. This is also known as the Hu-Tucker problem, after the authors of the paper presenting the first linearithmic solution to this optimal alphabetic problem, which has some similarities to Huffman's algorithm but is not a variation of it. These optimal alphabetic binary trees are often used as binary search trees.

    Canonical Huffman code

    If weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman code has the same lengths as the optimal alphabetic code, which can be found by calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting from numerically (re-)ordered input is sometimes called the canonical Huffman code and is often the code used in practice, due to its ease of encoding and decoding. The technique for finding this code is sometimes called Huffman-Shannon-Fano coding, since it is optimal like Huffman coding, but alphabetic in weight probability, like Shannon-Fano coding. A Huffman-Shannon-Fano code has the same codeword lengths as the original Huffman solution, and is therefore also optimal.
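    Since a canonical code is fully determined by its codeword lengths, it can be assigned with a simple counting rule. A sketch (the function name is mine; it assumes the lengths already form a valid prefix code): sort symbols by (length, symbol), then hand out consecutive binary values, left-shifting whenever the length grows:

    def canonical_codes(lengths):
        """Assign canonical codewords from per-symbol code lengths: sort by
        (length, symbol), then count upward in binary, widening the counter
        whenever the codeword length increases."""
        order = sorted(range(len(lengths)), key=lambda i: (lengths[i], i))
        codes, value, prev = {}, 0, 0
        for i in order:
            value <<= lengths[i] - prev       # widen to the new length
            codes[i] = format(value, "0%db" % lengths[i])
            value += 1
            prev = lengths[i]
        return codes

    print(canonical_codes([2, 2, 3, 3, 2]))
    # {0: '00', 1: '01', 4: '10', 2: '110', 3: '111'}

    Because the whole code is reconstructible from the length list alone, only the lengths need to be stored in the compressed file, which is why formats such as DEFLATE use canonical codes.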

    PROPERTIES

    1. Unique prefix property: no code is a prefix of any other

    code (all symbols are at the leaf nodes), which makes

    decoding unambiguous.

    2. If prior statistics are available and accurate, then

    Huffman coding is very good

    3. The frequencies used can be generic ones for the

    application domain that are based on average experience,

    or they can be the actual frequencies found in the text

    being compressed.

    4. Huffman coding is optimal when the probability of each

    input symbol is a negative power of two (see the check

    after this list).

    5. The worst case for Huffman coding can happen when

    the probability of a symbol exceeds 2^(-1) = 0.5, making the

    upper limit of inefficiency unbounded. These situations often

    respond well to a form of blocking called run-length encoding.
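    Property 4 can be checked numerically: when every probability is a negative power of two, the Huffman average codeword length equals the source entropy exactly. A small sketch (the helper name and the example distribution are mine):

    import heapq, math

    def huffman_lengths(probs):
        """Codeword lengths for the given probabilities (standard merging)."""
        heap = [(p, i, [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        tick = len(heap)
        while len(heap) > 1:
            p1, _, m1 = heapq.heappop(heap)
            p2, _, m2 = heapq.heappop(heap)
            for i in m1 + m2:
                lengths[i] += 1               # merged symbols sink one level
            heapq.heappush(heap, (p1 + p2, tick, m1 + m2))
            tick += 1
        return lengths

    p = [0.5, 0.25, 0.125, 0.125]             # every value a negative power of 2
    L = huffman_lengths(p)
    avg = sum(pi * li for pi, li in zip(p, L))
    H = -sum(pi * math.log2(pi) for pi in p)
    print(L, avg, H)                          # [1, 2, 3, 3] 1.75 1.75: no gap at all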

    ADVANTAGES

    The algorithm is easy to implement.

    It produces a lossless compression of images.

    DISADVANTAGES

    Efficiency depends on the accuracy of the

    statistical model used and the type of image.


    The achievable ratio varies with different formats, but few

    implementations get better than 8:1 compression.

    Compression of image files that contain long runs

    of identical pixels is not as efficient under Huffman

    coding as under run-length encoding (RLE).

    The Huffman encoding process is usually done in

    two passes. During the first pass, a statistical

    model is built; in the second pass, the

    image data is encoded based on the generated

    model. Huffman encoding is therefore a relatively

    slow process, as time is required to build the

    statistical model in order to achieve an efficient

    compression rate.

    Another disadvantage of Huffman coding is that all

    codes of the encoded data are of different sizes (not of fixed length).

    It is therefore very difficult for the decoder to know

    when it has reached the last bit of a code; the only way

    to know is to follow the paths of the upside-down tree

    from the root until coming to an end of it (one of the

    leaves), as the decoding sketch after this list shows.

    Thus, if the encoded data is corrupted, with additional

    bits added or bits missing, whatever is decoded will be

    wrong values, and the final image displayed will be garbage.

    The Huffman table must be sent at the

    beginning of the compressed file; otherwise the

    decompressor will not be able to decode it. This

    causes overhead.
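    The decoding difficulty described above can be seen in a few lines: the decoder must walk the tree bit by bit, so a single corrupted bit desynchronizes everything that follows. A toy sketch (the tree literal is a made-up example):

    def decode(bits, tree):
        """Walk a (left, right) nested-pair code tree bit by bit; a leaf is
        any non-tuple value.  One flipped or dropped bit desynchronizes
        every codeword that follows it."""
        out, node = [], tree
        for b in bits:
            node = node[0] if b == "0" else node[1]
            if not isinstance(node, tuple):   # reached a leaf: emit and restart
                out.append(node)
                node = tree
        return "".join(out)

    tree = ("a", ("b", "c"))                  # toy code: a=0, b=10, c=11
    print(decode("010110", tree))             # "abca": the intended message
    print(decode("110110", tree))             # "caca": one flipped bit, all wrong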


    APPLICATIONS

    1. Arithmetic coding can be viewed as a generalization of

    Huffman coding; indeed, in practice arithmetic coding is

    often preceded by Huffman coding, as it is easier to find

    an arithmetic code for a binary input than for a nonbinary

    input.

    2. Huffman coding is in wide use because of its simplicity,

    high speed and lack of encumbrance by patents.

    3. Huffman coding today is often used as a "back-end" to

    some other compression method. DEFLATE (PKZIP's

    algorithm) and multimedia codecs such as JPEG and MP3

    have a front-end model and quantization followed by

    Huffman coding.
