Data Compression - Huffman Coding
University of Babylon · 2016-12-16

Ministry of Higher Education and Scientific Research

University of Babylon - College of Science for Women

Department of Computer Science

Data Compression - Huffman Coding

Supervised by: Mohammed Ali Kadhim

***************************

Prepared by fourth-stage students, Section A:

- Khitam Hussein Habib

- Rusul Samir Abdul-Aali

Introduction to Huffman Codes

In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding and/or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes."

The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol (such as a character in a file). The algorithm derives this table from the estimated probability or frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols. Huffman's method can be efficiently implemented, finding a code in time linear in the number of input weights if those weights are sorted.[2] However, although optimal among methods that encode symbols separately, Huffman coding is not always optimal among all compression methods.

The Huffman encoding scheme takes advantage of the disparity between frequencies: it uses less storage for the frequently occurring characters at the expense of using more storage for each of the rarer characters. Huffman coding is an example of variable-length encoding: some characters may require only 2 or 3 bits, while other characters may require 7, 10, or 12 bits. The savings from not having to use a full 8 bits for the most common characters make up for having to use more than 8 bits for the rare characters, and the overall effect is that the file almost always requires less space.
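As a rough back-of-the-envelope illustration of this trade-off, here is a small sketch; the character frequencies and codeword lengths below are made up for the example (they are not a real Huffman code):

```python
# Made-up frequencies and hypothetical codeword lengths (not a real Huffman
# code) to show why variable-length encoding saves space overall.
freq = {"e": 120, "t": 90, "a": 80, "z": 3, "q": 2}     # occurrence counts
bits = {"e": 2,   "t": 3,  "a": 3,  "z": 10, "q": 12}   # assumed code lengths

fixed = 8 * sum(freq.values())                   # fixed 8 bits per character
variable = sum(freq[c] * bits[c] for c in freq)  # frequent chars cost less

print(fixed, variable)  # 2360 804 -> far fewer bits overall
```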

The Huffman Coding Algorithm:

This technique was developed by David Huffman as part of a class assignment; the class was the first ever in the area of information theory and was taught by Robert Fano at MIT [22]. The codes generated using this procedure are called Huffman codes. These codes are prefix codes and are optimum for a given model (set of probabilities).

The Huffman procedure is based on two observations regarding optimum prefix codes:

1. In an optimum code, symbols that occur more frequently (have a higher probability of occurrence) will have shorter codewords than symbols that occur less frequently.

2. In an optimum code, the two symbols that occur least frequently will have codewords of the same length.

It is easy to see that the first observation is correct. If symbols that occur more often had codewords that were longer than the codewords for symbols that occurred less often, the average number of bits per symbol would be larger than if the conditions were reversed. Therefore, a code that assigns longer codewords to symbols that occur more frequently cannot be optimum.
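A minimal Python sketch of this bottom-up merging procedure, assuming the model is given as a dict of probabilities (the name `huffman_code` and the tie-breaking counter are illustrative choices, not part of the original notes). The 0/1 labels it assigns may differ from the trees drawn in the examples below, but the codeword lengths come out the same:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code table from a {symbol: probability} model."""
    tie = count()  # tie-breaker so equal probabilities never compare dicts
    # Each heap entry: (subtree probability, tie, partial code table)
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, exactly as in the steps below
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

print(huffman_code({"a": 0.18, "b": 0.25, "c": 0.12,
                    "d": 0.20, "e": 0.10, "f": 0.15}))
```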

Page 4: Data Compression Huffman Coding - University of Babylon · 2016-12-16 · The Huffman Coding Algorithm: This technique was developed by David Huffman as part of a class assignment;

Example 1:-

Given the alphabet {a, b, c, d, e, f} with p(a)=18%, p(b)=25%, p(c)=12%, p(d)=20%, p(e)=10%, p(f)=15%, encode the symbols using Huffman coding.

Solution:-

The final code table (derived by the merge steps below) is:

ai   P(ai)   code   Li
e    10%     101    3
c    12%     100    3
f    15%     001    3
a    18%     000    3
d    20%     11     2
b    25%     01     2

1- We link e and c into ec:

ec = P(e) + P(c) = 22%

ai   P(ai)
f    15%
a    18%
d    20%
ec   22%
b    25%

2- We link f and a into fa:

fa = P(f) + P(a) = 33%

ai   P(ai)
d    20%
ec   22%
b    25%
fa   33%

3- We link d and ec into dec:

dec = P(d) + P(ec) = 42%

ai    P(ai)
b     25%
fa    33%
dec   42%

4- We link b and fa into bfa:

bfa = P(b) + P(fa) = 58%

ai    P(ai)
dec   42%
bfa   58%

5- We link dec and bfa into decbfa:

decbfa = P(dec) + P(bfa) = 100%

ai       P(ai)
decbfa   100%

* decbfa represents the root. Labeling each branch 0 or 1 gives the code tree:

decbfa (100%)
├─ 0: bfa (58%)
│    ├─ 0: fa (33%)
│    │    ├─ 0: a (18%)
│    │    └─ 1: f (15%)
│    └─ 1: b (25%)
└─ 1: dec (42%)
     ├─ 0: ec (22%)
     │    ├─ 0: c (12%)
     │    └─ 1: e (10%)
     └─ 1: d (20%)

* The average length for this code is

L average = (3*10 + 3*12 + 3*15 + 3*18 + 2*20 + 2*25) / 100 = 255/100

          = 2.55 bits/symbol
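Because the code is prefix-free, an encoded bit stream can be decoded greedily from left to right with no separators between codewords. A small sketch using the table from Example 1 (the `decode` helper is illustrative, not from the original notes):

```python
# Code table from Example 1
code = {"e": "101", "c": "100", "f": "001",
        "a": "000", "d": "11", "b": "01"}

def decode(bits, code):
    """Greedy prefix decoding: consume bits until they match a codeword."""
    lookup = {v: k for k, v in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in lookup:        # a complete codeword has been read
            out.append(lookup[buf])
            buf = ""
    return "".join(out)

msg = "bad"
bits = "".join(code[ch] for ch in msg)  # "01" + "000" + "11" = "0100011"
print(bits, decode(bits, code))         # 0100011 bad
```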

* The entropy for this source is given by

H = -[0.1*log2 0.1 + 0.12*log2 0.12 + 0.15*log2 0.15 + 0.18*log2 0.18 + 0.2*log2 0.2 + 0.25*log2 0.25]

  = 2.52 bits/symbol

* The efficiency of the Huffman code is

Efficiency = H / L average * 100%

           = 2.52 / 2.55 * 100%

           = 98.8%
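Both figures can be reproduced numerically; a quick sketch:

```python
from math import log2

p = [0.10, 0.12, 0.15, 0.18, 0.20, 0.25]   # symbol probabilities
L = [3, 3, 3, 3, 2, 2]                     # codeword lengths from the table

H = -sum(pi * log2(pi) for pi in p)            # entropy in bits/symbol
L_avg = sum(pi * li for pi, li in zip(p, L))   # average codeword length

print(round(H, 2), round(L_avg, 2), f"{100 * H / L_avg:.1f}%")  # 2.52 2.55 98.8%
```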

Example 2:-

Given the alphabet {a1, a2, a3, a4, a5} with p(a1)=p(a3)=0.2, p(a2)=0.4, and p(a4)=p(a5)=0.1, design the Huffman code. The entropy of this source is 2.122 bits/symbol.

Solution:-

The final code table (derived by the merge steps below) is:

ai   P(ai)   code   Li
a5   0.1     0000   4
a4   0.1     0001   4
a3   0.2     001    3
a1   0.2     01     2
a2   0.4     1      1
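As a sanity check, a short sketch confirming that no codeword in this table is a prefix of another, which is what makes the code uniquely decodable:

```python
# Code table from Example 2 (as derived above)
code = {"a5": "0000", "a4": "0001", "a3": "001", "a1": "01", "a2": "1"}

# Prefix-free check: no codeword may be a proper prefix of another
words = list(code.values())
ok = not any(w1 != w2 and w2.startswith(w1) for w1 in words for w2 in words)
print(ok)  # True -> every bit stream decodes unambiguously
```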

1- We link a5 and a4 into a5a4:

a5a4 = P(a5) + P(a4) = 0.2

ai     P(ai)
a5a4   0.2
a3     0.2
a1     0.2
a2     0.4

2- We link a5a4 and a3 into a5a4a3:

a5a4a3 = P(a5a4) + P(a3) = 0.4

ai       P(ai)
a1       0.2
a5a4a3   0.4
a2       0.4

3- We link a1 and a5a4a3 into a1a5a4a3:

a1a5a4a3 = P(a1) + P(a5a4a3) = 0.6

ai         P(ai)
a2         0.4
a1a5a4a3   0.6

4- We link a2 and a1a5a4a3 into a2a1a5a4a3:

a2a1a5a4a3 = P(a2) + P(a1a5a4a3) = 1

ai           P(ai)
a2a1a5a4a3   1

* a2a1a5a4a3 represents the root.

a2a1a5a4a3 (1)
├─ 0: a1a5a4a3 (0.6)
│    ├─ 0: a5a4a3 (0.4)
│    │    ├─ 0: a5a4 (0.2)
│    │    │    ├─ 0: a5 (0.1)
│    │    │    └─ 1: a4 (0.1)
│    │    └─ 1: a3 (0.2)
│    └─ 1: a1 (0.2)
└─ 1: a2 (0.4)

* The average length for this code is

L average = 0.4*1 + 0.2*2 + 0.2*3 + 0.1*4 + 0.1*4

          = 2.2 bits/symbol

* The efficiency of the Huffman code is

Efficiency = H / L average * 100%

           = 2.122 / 2.2 * 100%

           = 96.5%
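A quick numeric check of the average length and efficiency for this example:

```python
from math import log2

p = [0.4, 0.2, 0.2, 0.1, 0.1]   # probabilities of a2, a1, a3, a4, a5
L = [1, 2, 3, 4, 4]             # matching codeword lengths

H = -sum(pi * log2(pi) for pi in p)
L_avg = sum(pi * li for pi, li in zip(p, L))
print(round(H, 3), round(L_avg, 1), f"{100 * H / L_avg:.1f}%")  # 2.122 2.2 96.5%
```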

* Length-Limited Huffman Codes = 0.93