8/2/2019 karantp
TERM PAPER
CSE-408
TOPIC: HUFFMAN CODES
SUBMITTED TO:- SUBMITTED BY:-
MR.VIJAY GARG KARANBIR SINGH
B.TECH CSE
10804631
RK1R08B39
ACKNOWLEDGEMENT
First and foremost, I, KARANBIR SINGH, am very thankful to Lect. VIJAY GARG,
who assigned me this term paper on HUFFMAN CODES.
I am heartily thankful to the college library for providing the books, and to my
roommates and classmates for helping me assemble the notes related to this topic.
Last but not least, I am very thankful to my parents, who gave me the financial
support to complete this term paper.
KARANBIR SINGH
Contents
1) Introduction
2) Types of Huffman coding
a) N-ary Huffman coding
b) Adaptive Huffman coding
c) Huffman template algorithm
d) Length-limited Huffman coding
e) Huffman coding with unequal letter costs
f) Hu-Tucker coding
g) Canonical Huffman code
3) Properties
4) Advantages
5) Disadvantages
6) Applications
7) References
INTRODUCTION
Huffman coding is an entropy encoding algorithm used for lossless data
compression. It was developed by David A. Huffman while he was a Ph.D.
student at MIT, and published in his 1952 paper "A Method for the
Construction of Minimum-Redundancy Codes". Huffman coding is based on the
frequency of occurrence of a data item (e.g., a pixel in images). The
principle is to use a smaller number of bits to encode the data that occurs
more frequently. Codes are stored in a code book, which may be constructed
for each image or for a set of images. In all cases the code book plus the
encoded data must be transmitted to enable decoding. Huffman coding uses a
specific method for choosing the representation for each symbol, resulting
in a prefix code (sometimes called a "prefix-free code"): the bit string
representing some particular symbol is never a prefix of the bit string
representing any other symbol. The code expresses the most common source
symbols using shorter strings of bits than are used for less common source
symbols. Huffman was able to design the most efficient compression method
of this type: no other mapping of individual source symbols to unique
strings of bits will produce a smaller average output size when the actual
symbol frequencies agree with those used to create the code. A method was
later found to design a Huffman code in linear time if the input
probabilities (also known as weights) are sorted. For a set of symbols with
a uniform probability distribution and a number of members which is a power
of two, Huffman coding is equivalent to simple binary block encoding, e.g.,
ASCII coding. Huffman coding is such a widespread method for creating
prefix codes that the term "Huffman code" is widely used as a synonym for
"prefix code" even when such a code is not produced by Huffman's algorithm.
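As a concrete illustration of the construction described above, the following Python sketch (an illustration, not an optimized implementation) counts symbol frequencies, repeatedly merges the two least frequent subtrees using a heap, and then reads the codewords off the resulting tree:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table for the symbols of `text`.

    Repeatedly merges the two least frequent subtrees; reading the
    final tree top-down assigns '0' to left edges and '1' to right edges.
    """
    freq = Counter(text)
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node). The unique
    # tie_breaker keeps tuple comparison away from the trees themselves.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):  # internal node
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:  # leaf: record the accumulated codeword
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes
```

For "abracadabra" the frequent symbol 'a' receives a short code while the rare 'c' and 'd' receive longer ones; the exact codewords depend on tie-breaking, but the total encoded length is optimal for any valid Huffman tree.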
TYPES OF HUFFMAN CODING
N-ary Huffman coding
The n-ary Huffman algorithm uses the {0, 1, ..., n-1} alphabet to
encode messages and build an n-ary tree. This approach was
considered by Huffman in his original paper. The same algorithm
applies as for binary (n = 2) codes, except that the n least
probable symbols are taken together, instead of just the 2 least
probable. Note that for n greater than 2, not all sets of source words
can properly form an n-ary tree for Huffman coding. In this case,
additional 0-probability placeholders must be added. This is because
the tree must form an n-to-1 contractor; for binary coding, this is a 2-to-1
contractor, and any sized set can form such a contractor. If the
number of source words is congruent to 1 modulo n-1, then the set of
source words will form a proper Huffman tree.
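The congruence condition above translates directly into the number of placeholders required. A small sketch (illustrative only; the function name is my own):

```python
def dummy_symbols_needed(num_symbols, n):
    """Number of 0-probability placeholders to add so that
    `num_symbols` source words can form a proper n-ary Huffman tree.

    A full n-ary tree is possible exactly when the (padded) symbol
    count is congruent to 1 modulo n - 1.
    """
    if n < 2:
        raise ValueError("code alphabet must have at least 2 symbols")
    if num_symbols <= 1:
        return 0
    remainder = (num_symbols - 1) % (n - 1)
    return 0 if remainder == 0 else (n - 1) - remainder
```

For example, 6 source words under a ternary (n = 3) code need 1 placeholder, padding the count to 7, which is congruent to 1 modulo 2; for binary codes the remainder is always 0, so no padding is ever needed.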
Adaptive Huffman coding
A variation called adaptive Huffman coding involves calculating the
probabilities dynamically based on recent actual frequencies in the
sequence of source symbols, and changing the coding tree structure to
match the updated probability estimates.
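A minimal way to illustrate this idea is to rebuild the code from running counts after every symbol. Note that this naive rebuild is only a sketch of the adaptive principle; practical adaptive Huffman coders (e.g., the FGK and Vitter algorithms) update the tree incrementally instead of rebuilding it, and the helper names below are my own:

```python
import heapq

def build_codes(freq):
    """Static Huffman code table for a frequency dict (helper)."""
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {next(iter(freq)): "0"}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def adaptive_encode(message, alphabet):
    """Naive adaptive coder: every symbol starts with count 1, and the
    code table is rebuilt after each symbol from the updated counts.
    The decoder mirrors the same updates, so no table is transmitted."""
    counts = {s: 1 for s in alphabet}
    bits = []
    for sym in message:
        bits.append(build_codes(counts)[sym])
        counts[sym] += 1  # update the model after encoding the symbol
    return "".join(bits)
```

As the counts drift toward the true frequencies, frequent symbols automatically receive shorter codewords, which is the essential behaviour of adaptive Huffman coding.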
Huffman template algorithm
Most often, the weights used in implementations of Huffman coding
represent numeric probabilities, but the algorithm given above does
not require this; it requires only that the weights form a totally
ordered commutative monoid, meaning a way to order weights and to
add them. The Huffman template algorithm enables one to use any
kind of weights (costs, frequencies, pairs of weights, non-numerical
weights) and one of many combining methods (not just addition).
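This generality can be sketched by passing the combining method in as a parameter. As an illustration (a sketch under my own naming, not a library API), combine(a, b) = max(a, b) + 1 builds a tree that minimizes the maximum of weight plus depth, rather than the weighted sum that ordinary addition minimizes:

```python
import heapq

def huffman_template(items, combine):
    """Generalized Huffman merge. `items` is a list of (weight, symbol);
    the weight of two merged subtrees is combine(w1, w2). Returns each
    symbol's depth in the tree (its codeword length)."""
    heap = [(w, i, {s: 0}) for i, (w, s) in enumerate(items)]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        depths = {s: d + 1 for s, d in d1.items()}
        depths.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (combine(w1, w2), next_id, depths))
        next_id += 1
    return heap[0][2]
```

With combine = addition this reproduces the standard Huffman depths; swapping in max(a, b) + 1 changes only the merge rule, not the algorithm's structure.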
Such algorithms can solve other minimization problems, such as
minimizing max(w_i + length(c_i)), a problem first applied to circuit
design.

Length-limited Huffman coding

Length-limited Huffman coding is a variant where the goal is still to
achieve a minimum weighted path length, but there is an additional
restriction that the length of each codeword must be less than a given
constant. The package-merge algorithm solves this problem with a
simple greedy approach very similar to that used by Huffman's
algorithm. Its time complexity is O(nL), where L is the maximum
length of a codeword. No algorithm is known to solve this problem
in linear or linearithmic time, unlike the pre-sorted and unsorted
conventional Huffman problems, respectively.

Huffman coding with unequal letter costs

In the standard Huffman coding problem, it is assumed that each
symbol in the set that the code words are constructed from has an
equal cost to transmit: a code word whose length is N digits will
always have a cost of N, no matter how many of those digits are 0s,
how many are 1s, etc. When working under this assumption,
minimizing the total cost of the message and minimizing the total
number of digits are the same thing.
Huffman coding with unequal letter costs is the generalization in
which this assumption is no longer true: the letters of the
encoding alphabet may have non-uniform lengths, due to
characteristics of the transmission medium. An example is the
encoding alphabet of Morse code, where a 'dash' takes longer to send
than a 'dot', and therefore the cost of a dash in transmission time is
higher. The goal is still to minimize the weighted average codeword
length, but it is no longer sufficient just to minimize the number of
symbols used by the message. No algorithm is known to solve this in
the same manner or with the same efficiency as conventional
Huffman coding.
Optimal alphabetic binary trees (Hu-Tucker coding)

In the standard Huffman coding problem, it is assumed that any
codeword can correspond to any input symbol. In the alphabetic
version, the alphabetic order of inputs and outputs must be identical:
a set of symbols cannot be assigned an arbitrary optimal code, but
must instead be assigned one of the optimal codes whose codeword
ordering matches the alphabetical ordering of the symbols. This is
also known as the Hu-Tucker problem, after the authors of the paper
presenting the first linearithmic solution to this optimal alphabetic
problem, which has some similarities to Huffman's algorithm but is
not a variation of it. These optimal alphabetic binary trees are often
used as binary search trees.

Canonical Huffman code

If weights corresponding to the alphabetically ordered inputs are in
numerical order, the Huffman code has the same lengths as the
optimal alphabetic code, which can be found by calculating these
lengths, rendering Hu-Tucker coding unnecessary. The code resulting
from numerically (re-)ordered input is sometimes called the canonical
Huffman code and is often the code used in practice, due to ease of
encoding/decoding. The technique for finding this code is sometimes
called Huffman-Shannon-Fano coding, since it is optimal like
Huffman coding, but alphabetic in weight probability, like Shannon-
Fano coding. The Huffman-Shannon-Fano code corresponding to a
given set of weights has the same codeword lengths as the original
Huffman solution, and is therefore also optimal.
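A canonical Huffman code is fully determined by its codeword lengths, which is what makes encoding and decoding easy: the decoder needs only the lengths, not the whole tree. A sketch of the standard reconstruction (function name is my own; the lengths are assumed to satisfy the Kraft inequality):

```python
def canonical_codes(lengths):
    """Reconstruct canonical Huffman codewords from codeword lengths.

    Symbols are taken in order of (length, symbol); each codeword is
    the previous one plus 1, shifted left whenever the length grows.
    The entire code table is thus described by the lengths alone.
    """
    code = 0
    prev_len = 0
    table = {}
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # make room when moving to longer codes
        table[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return table
```

For lengths {a: 1, b: 2, c: 3, d: 3} this yields a = "0", b = "10", c = "110", d = "111", a prefix-free code with the requested lengths.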
PROPERTIES
1. Unique prefix property: no code is a prefix of any other
code (all symbols are at the leaf nodes), which makes
decoding unambiguous.
2. If prior statistics are available and accurate, then
Huffman coding is very good.
3. The frequencies used can be generic ones for the
application domain that are based on average experience,
or they can be the actual frequencies found in the text
being compressed.
4. Huffman coding is optimal when the probability of each
input symbol is a negative power of two.
5. The worst case for Huffman coding can happen when
the probability of a symbol exceeds 2^-1 = 0.5, making
the upper limit of inefficiency unbounded. These situations
often respond well to a form of blocking called run-length
encoding.
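Property 4 can be checked with a small calculation: when every probability is a negative power of two, the ideal codeword lengths -log2(p) are whole numbers, so a Huffman code can use exactly those lengths and its average code length meets the source entropy with no redundancy.

```python
import math

# Source probabilities that are all negative powers of two.
probs = [0.5, 0.25, 0.125, 0.125]

# The ideal codeword length for a symbol of probability p is -log2(p) bits.
# Here every ideal length is a whole number (1, 2, 3 and 3 bits), so a
# Huffman code can realize them exactly.
ideal_lengths = [-math.log2(p) for p in probs]

# The average code length then equals the source entropy.
entropy = sum(p * l for p, l in zip(probs, ideal_lengths))
```

Here ideal_lengths comes out as [1.0, 2.0, 3.0, 3.0] and entropy as 1.75 bits per symbol, which a Huffman code for these probabilities achieves exactly.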
ADVANTAGES
Algorithm is easy to implement.
Produces a lossless compression of images.
DISADVANTAGES
Efficiency depends on the accuracy of the
statistical model used and type of image.
Algorithm varies with different formats, but few
get any better than 8:1 compression.
Compression of image files that contain long runs
of identical pixels by Huffman is not as efficient
when compared to RLE.
The Huffman encoding process is usually done in
two passes. During the first pass, a statistical
model is built, and then in the second pass the
image data is encoded based on the generated
model. From here we can see that Huffman
encoding is a relatively slow process, as time is
required to build the statistical model in order to
achieve an efficient compression rate.
Another disadvantage of Huffman is that all
codewords of the encoded data are of different
sizes (not of fixed length). Therefore it is very
difficult for the decoder to know that it has
reached the last bit of a code; the only way for it
to know is by following the paths of the upside-down
tree until it comes to the end of a branch.
Thus, if the encoded data is corrupted with
additional bits added or bits missing, then
whatever that is decoded will be wrong values,
and the final image displayed will be garbage.
It is required to send Huffman table at the
beginning of the compressed file, otherwise the
decompressor will not be able to decode it. This
causes overhead.
APPLICATIONS
1. Arithmetic coding can be viewed as a generalization of
Huffman coding; indeed, in practice arithmetic coding is
often preceded by Huffman coding, as it is easier to find
an arithmetic code for a binary input than for a nonbinary
input.
2. Huffman coding is in wide use because of its simplicity,
high speed and lack of encumbrance by patents.
3. Huffman coding today is often used as a "back-end" to
some other compression method. DEFLATE (PKZIP's
algorithm) and multimedia codecs such as JPEG and MP3
have a front-end model and quantization followed by
Huffman coding.
REFERENCES
1. www.google.com/Huffman
2. http://en.wikipedia.org/huffman_codes
3. A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of
Computer Algorithms, Pearson Education Asia, 2007.
4. T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein,
Introduction to Algorithms, PHI Pvt. Ltd., 2007.