EC02-informatica-infocod

8/13/2019 EC02-informatica-infocod


Informática

Ing. Aeronáutica

Information coding

Wednesday, 12 February 2014


Information coding

- Basic concepts
- Numbering systems
- Encoding signed numbers
- Encoding real numbers
- Encoding texts
- Redundancy and compression


J. Vila & E. Hernández

Introduction

Basic concepts

All information processed by a digital computer needs to be encoded: transformed into a form of representation suitable for the computer.

- Numerical values: magnitudes of computer applications related to geometry (lengths, angles, ...), physics (pressure, temperature, volumes, forces, ...), mathematics, statistics, finance, etc.

- Text information in different formats, such as books, reports, manuals, etc.

- Media information: graphics, images, videos, sounds, etc.

- Computer programs

Computers encode information using binary numbers rather than decimal numbers.

- Binary encoding will be introduced shortly.



Basic concepts

Information range: the interval between the lower and upper limits of a magnitude, i.e., the set of all the different values or codes that the information may take.

- Example: assume that a magnitude such as a length is encoded in decimal with 3 integer digits and 2 decimal digits. The information range is [000.00 ... 999.99].

Information accuracy: the information resolution, i.e., the smallest difference between two representable values.

- Example: in the above representation, accuracy is 0.01 units.

Information volume: the amount of information, i.e., the number of measurements (information instances) of a magnitude times the number of digits of each measurement.

- Example: in the above representation, 1000 length measurements have a volume of 5000 digits.
- Useful to compute the capacity of a storage device.
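To make the three measures concrete, here is a minimal Python sketch for a fixed-point decimal format like the one in the example. The function names `decimal_format_info` and `information_volume` are hypothetical, chosen for illustration; exact fractions are used so that 0.01 is not distorted by binary rounding.

```python
from fractions import Fraction


def decimal_format_info(int_digits: int, frac_digits: int, base: int = 10):
    """Return (range, accuracy, digits per value) of a fixed-point format."""
    accuracy = Fraction(1, base ** frac_digits)   # smallest step between two values
    largest = base ** int_digits - accuracy       # e.g. 999.99 for 3 + 2 digits
    value_range = (Fraction(0), largest)
    return value_range, accuracy, int_digits + frac_digits


def information_volume(measurements: int, digits_per_value: int) -> int:
    """Volume = number of measurements x digits of each measurement."""
    return measurements * digits_per_value


(low, high), acc, per_value = decimal_format_info(3, 2)
print(float(low), float(high), float(acc))   # 0.0 999.99 0.01
print(information_volume(1000, per_value))   # 5000
```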



Basic concepts

Information compression: reduction of the information volume by

- Removing redundancy
  - Represent repeated items in a compact form. Example: 1000 consecutive white pixels in a graphic.
- Reducing accuracy
  - Use a representation with fewer digits.
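Run-length encoding (RLE) is one simple way to represent repeated items compactly, as in the 1000-white-pixel example. The sketch below is illustrative only; the names `rle_encode` and `rle_decode` are hypothetical, and real image formats use more elaborate schemes.

```python
def rle_encode(items):
    """Collapse each run of equal items into a (value, run_length) pair."""
    runs = []
    for item in items:
        if runs and runs[-1][0] == item:
            runs[-1][1] += 1          # same value: extend the current run
        else:
            runs.append([item, 1])    # new value: start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


row = ["white"] * 1000 + ["black"] * 3
encoded = rle_encode(row)
print(encoded)                  # [('white', 1000), ('black', 3)]
assert rle_decode(encoded) == row
```

Two pairs replace 1003 items, and decoding gives back exactly the same row, so this removes redundancy without reducing accuracy.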


Numbering systems

Positional numbering systems

Numbers are represented as a sequence of digits in which each digit has a weight according to its position.

Numbering in base $b$ uses the set of digits $D = \{0, 1, 2, \ldots, b-1\}$.

Assume $X$ is represented as $X_n X_{n-1} \ldots X_2 X_1 X_0 . X_{-1} X_{-2} \ldots X_{-m}$ in base $b$. Its value is:

$$X = \sum_{i=-m}^{n} X_i \, b^{i} \qquad (1)$$

Example:

- In decimal, $b = 10$ and $D = \{0, 1, 2, \ldots, 9\}$:

$$1{,}234.56_{10} = 1 \cdot 10^{3} + 2 \cdot 10^{2} + 3 \cdot 10^{1} + 4 \cdot 10^{0} + 5 \cdot 10^{-1} + 6 \cdot 10^{-2}$$
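Eq. (1) can be evaluated directly. This Python sketch assumes the digits are given as two lists (integer part, most significant first, and fractional part); `positional_value` is a hypothetical name, and `Fraction` keeps the result exact.

```python
from fractions import Fraction


def positional_value(int_digits, frac_digits, base):
    """Evaluate eq. (1): X = sum of X_i * base**i for i = -m .. n.

    int_digits  -- [X_n, ..., X_1, X_0], most significant digit first
    frac_digits -- [X_-1, X_-2, ..., X_-m]
    """
    n = len(int_digits) - 1
    value = Fraction(0)
    for k, digit in enumerate(int_digits):
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} not in D = {{0, ..., {base - 1}}}")
        value += digit * Fraction(base) ** (n - k)      # weights b**n .. b**0
    for k, digit in enumerate(frac_digits, start=1):
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} not in D = {{0, ..., {base - 1}}}")
        value += digit * Fraction(base) ** -k           # weights b**-1 .. b**-m
    return value


print(float(positional_value([1, 2, 3, 4], [5, 6], 10)))   # 1234.56
print(positional_value([1, 1, 0, 1, 0, 0], [], 2))         # 52
```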





The binary system

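In binary, $b = 2$ and $D = \{0, 1\}$.

Converting from binary to decimal: use eq. (1). The weights are the powers of 2: $2^0$, $2^1$, 2 squared, 2 cubed, 2 to the fourth, ...

$$110100_2 = 1 \cdot 2^{5} + 1 \cdot 2^{4} + 0 \cdot 2^{3} + 1 \cdot 2^{2} + 0 \cdot 2^{1} + 0 \cdot 2^{0} = 52_{10}$$

Converting from decimal to binary: divide successively by 2 until the quotient is smaller than 2. The binary number is the last quotient followed by the remainders, read from the last one to the first one. For example, for $26_{10}$ (2 into 26 goes 13 times and 0 is left over):

- 26 ÷ 2 = 13, remainder 0
- 13 ÷ 2 = 6, remainder 1
- 6 ÷ 2 = 3, remainder 0
- 3 ÷ 2 = 1, remainder 1
- Stop: the last quotient, 1, is the most significant bit.

$26_{10} = 11010_2$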
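The successive-division method can be sketched as follows (hypothetical helper `to_base`, for bases up to 16). Unlike the hand procedure, the loop keeps dividing until the quotient is 0, which simply produces the last quotient as one more remainder.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n: int, base: int = 2) -> str:
    """Convert a non-negative integer with successive divisions by `base`."""
    if n < 0 or not 2 <= base <= 16:
        raise ValueError("expects n >= 0 and 2 <= base <= 16")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, base)          # quotient and remainder of one division
        remainders.append(DIGITS[r])
    # the last remainder is the most significant digit, so read them backwards
    return "".join(reversed(remainders))


print(to_base(26))       # 11010
print(to_base(52))       # 110100
```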





The binary system

A bit is a binary digit.

The range of a representation in base $b$ with $n$ digits is $[0 \ldots b^n - 1]$.

- The number of codes, $b^n$, is the number of $n$-permutations of $b$ elements with repetition; in binary, it corresponds to $Pr(2,n) = 2^n$.
- Example: $b = 2$, $n = 3$
  - $[0 \ldots 2^3 - 1]$ = [000, 001, 010, 011, 100, 101, 110, 111]

A byte is an 8-bit binary code.

- The range of this representation is $[0_{10} \ldots 255_{10}]$.

The information volume in the binary system is usually measured as a number of bytes. Multiples of the byte:

- KB (kilobyte) = $2^{10}$ bytes = 1,024 bytes
- MB (megabyte) = $2^{20}$ bytes = 1,024 KB
- GB (gigabyte) = $2^{30}$ bytes = 1,024 MB
- TB (terabyte) = $2^{40}$ bytes = 1,024 GB
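A quick check of the range formula and the byte multiples, using `itertools.product` to enumerate the n-permutations with repetition. The helper `all_codes` is hypothetical and, as written, only handles bases up to 10.

```python
from itertools import product


def all_codes(n: int, base: int = 2):
    """All n-digit codes in `base` (base <= 10): the b**n permutations with repetition."""
    return ["".join(str(d) for d in digits) for digits in product(range(base), repeat=n)]


print(all_codes(3))    # ['000', '001', '010', '011', '100', '101', '110', '111']
print(len(all_codes(8)), int("11111111", 2))   # 256 255: a byte covers [0 ... 255]

for exp, name in [(10, "kilobyte"), (20, "megabyte"), (30, "gigabyte"), (40, "terabyte")]:
    print(f"1 {name} = 2**{exp} bytes = {2 ** exp:,} bytes")
```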



Binary arithmetic

Algorithms for operations with two bits or more:

[Figure: worked examples of binary arithmetic operations]
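The specific algorithms in the figure on this slide are not recoverable from these notes. As a stand-in, the Python sketch below shows column-by-column binary addition with carry and shift-and-add multiplication on bit strings; `add_binary` and `multiply_binary` are hypothetical names.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, propagating the carry."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):   # from LSB to MSB
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))   # sum bit of this column
        carry = total // 2              # carry into the next column
    if carry:
        result.append("1")
    return "".join(reversed(result))


def multiply_binary(a: str, b: str) -> str:
    """Shift-and-add: add a copy of `a` shifted left once per position of each 1 in `b`."""
    product = "0"
    for shift, bit in enumerate(reversed(b)):
        if bit == "1":
            product = add_binary(product, a + "0" * shift)
    return product.lstrip("0") or "0"


print(add_binary("1011", "0110"))       # 10001  (11 + 6 = 17)
print(multiply_binary("101", "11"))     # 1111   (5 * 3 = 15)
```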





The hexadecimal system

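In hexadecimal, $b = 16$ and $D = \{0, 1, 2, \ldots, 9, A, B, C, D, E, F\}$, where A to F stand for 10 to 15.

Converting from hex to decimal: use eq. (1). The prefix 0x denotes a hex number:

$$\mathrm{0x7F9A} = 7 \cdot 16^{3} + F \cdot 16^{2} + 9 \cdot 16^{1} + A \cdot 16^{0} = 7 \cdot 16^{3} + 15 \cdot 16^{2} + 9 \cdot 16^{1} + 10 \cdot 16^{0} = 32{,}666_{10}$$

Converting from decimal to hex: use the successive-division algorithm, dividing by 16.

Converting between binary and hex:

- Group the binary digits in fours, starting from the right. Each group of four binary digits corresponds to one hex digit.
- Example: $1000\ 1010\ 1100\ 0010_2 = \mathrm{8AC2}_{16}$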
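A small Python sketch of both conversions (hypothetical helpers `hex_to_decimal` and `binary_to_hex`). The first evaluates eq. (1) in Horner form, which gives the same sum with fewer multiplications.

```python
def hex_to_decimal(h: str) -> int:
    """Eq. (1) with b = 16, evaluated as ((d3*16 + d2)*16 + d1)*16 + d0."""
    value = 0
    for ch in h:
        value = value * 16 + int(ch, 16)   # int(ch, 16) maps 'A'..'F' to 10..15
    return value


def binary_to_hex(bits: str) -> str:
    """Group the bits in fours from the right; each group is one hex digit."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)   # pad on the left to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join("0123456789ABCDEF"[int(g, 2)] for g in groups)


print(hex_to_decimal("7F9A"))                 # 32666
print(binary_to_hex("1000101011000010"))      # 8AC2
```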




Encoding signed numbers

Sign-and-magnitude representation

Allocate the most significant bit to represent the sign.

- 0 for positive numbers, 1 for negative numbers.

The remaining bits indicate the magnitude (or absolute value).

Example:

- $00101010_2 = 42_{10}$, $10101010_2 = -42_{10}$

Representation range with $n$ bits: $[-(2^{n-1} - 1), \ldots, 2^{n-1} - 1]$.

- With 8 bits: [-127, ..., +127]

Disadvantages:

- Two different zeros: 00000000 (+0) and 10000000 (-0).
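A minimal sign-and-magnitude encoder and decoder, assuming 8 bits by default (`sm_encode` and `sm_decode` are hypothetical names). Decoding 10000000 shows the second zero collapsing into an ordinary 0.

```python
def sm_encode(x: int, n: int = 8) -> str:
    """Sign-and-magnitude: MSB = sign, remaining n-1 bits = |x|."""
    limit = 2 ** (n - 1) - 1
    if not -limit <= x <= limit:
        raise ValueError(f"{x} is outside [-{limit}, {limit}]")
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{n - 1}b")


def sm_decode(code: str) -> int:
    """Read the magnitude from bits 1..n-1 and apply the sign bit."""
    magnitude = int(code[1:], 2)
    return -magnitude if code[0] == "1" else magnitude


print(sm_encode(42), sm_encode(-42))    # 00101010 10101010
print(sm_decode("10000000"))            # 0  (the "negative zero")
```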



Two's-complement representation

The representation of a negative number $-x$ in $n$ bits is defined as its two's complement.

The two's complement can be calculated in decimal as $2^n - x$ modulo $2^n$:

- $-x \rightarrow C2(x,n) = (2^n - x) \bmod 2^n$ (with $|x| < 2^n$)
- Examples:
  - $C2(3,4) = (2^4 - 3) \bmod 2^4 = 13_{10} = 1101_2 \rightarrow -3$
  - $C2(3,8) = (2^8 - 3) \bmod 2^8 = 253_{10} = 1111\ 1101_2$ (sign extension) $\rightarrow -3$
  - $C2(-3,4) = (2^4 + 3) \bmod 2^4 = 3_{10} = 0011_2 \rightarrow 3$
  - $C2(0,4) = (2^4 - 0) \bmod 2^4 = 0_{10} = 0000_2 \rightarrow 0$
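The definition of C2(x, n) translates directly into Python (hypothetical function `c2`). Python's `%` always returns a non-negative result for a positive modulus, which matches the modulo in the formula.

```python
def c2(x: int, n: int) -> int:
    """Two's complement of x in n bits: (2**n - x) % 2**n, with |x| < 2**n."""
    if abs(x) >= 2 ** n:
        raise ValueError("|x| must be smaller than 2**n")
    return (2 ** n - x) % 2 ** n


for x, n in [(3, 4), (3, 8), (-3, 4), (0, 4)]:
    code = c2(x, n)
    print(f"C2({x},{n}) = {code} = {code:0{n}b}")
```

The loop prints the four examples above: 13 = 1101, 253 = 11111101, 3 = 0011 and 0 = 0000.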



Two's-complement representation

The two's complement of a binary $n$-bit representation is a new representation with range $[-2^{n-1}, \ldots, 2^{n-1} - 1]$ in which:

- Codes $[0, \ldots, 2^{n-1} - 1] \rightarrow$ non-negative integers $[0, \ldots, 2^{n-1} - 1]$
- Codes $[2^{n-1}, \ldots, 2^n - 1] \rightarrow$ negative integers $[-2^{n-1}, \ldots, -1]$

Example: $n = 4 \rightarrow$ range = [-8, 7]

- Codes [0, 7] → non-negative integers [0, 7]; codes [8, 15] → negative integers [-8, -1]

| Binary code | Unsigned | Two's complement |
|---|---|---|
| 0000 | 0 | 0 |
| 0001 | 1 | 1 |
| 0010 | 2 | 2 |
| 0011 | 3 | 3 |
| 0100 | 4 | 4 |
| 0101 | 5 | 5 |
| 0110 | 6 | 6 |
| 0111 | 7 | 7 |
| 1000 | 8 | -8 |
| 1001 | 9 | -7 |
| 1010 | 10 | -6 |
| 1011 | 11 | -5 |
| 1100 | 12 | -4 |
| 1101 | 13 | -3 |
| 1110 | 14 | -2 |
| 1111 | 15 | -1 |





Two's-complement representation

Converting from decimal to two's complement: to represent $-x$, start from the $n$-bit binary code of $x$ and

1. Invert the bits
2. Add one

- Examples:
  - C2(3,4): 0011 → (invert) → 1100 → (add 1) → 1101
  - C2(-3,4): 1101 → (invert) → 0010 → (add 1) → 0011

Converting from two's complement to decimal:

- If it is a positive number (MSB = 0), apply the weighted digits, eq. (1):
  - Example: $0111 = 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 7_{10}$
- If it is a negative number (MSB = 1), compute its two's complement and apply eq. (1) to get the absolute value. Then change the sign of the absolute value.
  - Example: 1111 → (invert) → 0000 → (add 1) → 0001
  - $0001 = 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 1_{10} \rightarrow -1_{10}$
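The invert-and-add-one procedure and the reverse conversion, sketched on bit strings (all helper names are hypothetical). Adding one is done modulo 2^n, so a carry out of the MSB is discarded.

```python
def invert(bits: str) -> str:
    """Flip every bit."""
    return "".join("1" if b == "0" else "0" for b in bits)


def add_one(bits: str) -> str:
    """Add 1 modulo 2**n (a carry out of the MSB is discarded)."""
    n = len(bits)
    return format((int(bits, 2) + 1) % 2 ** n, f"0{n}b")


def negate(bits: str) -> str:
    """Two's complement by inverting the bits and adding one."""
    return add_one(invert(bits))


def to_decimal(bits: str) -> int:
    """MSB = 0: weighted digits; MSB = 1: negate, weigh, change the sign."""
    if bits[0] == "0":
        return int(bits, 2)
    return -int(negate(bits), 2)


print(negate("0011"), negate("1101"))           # 1101 0011
print(to_decimal("0111"), to_decimal("1111"))   # 7 -1
```

For the code 1000 the rule still works: negating gives 1000 again (8), so the value is -8, the most negative number of the 4-bit range.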



Two's-complement representation

Property 1: $x - y = x + (-y) = x + C2(y,n)$

- Example:
  - $2 - 3 = 0010 - 0011 = 1111 = -1_{10}$
  - $2 - 3 = 2 + (-3) = 0010 + 1101 = 1111 = -1_{10}$

Property 2: only one zero → 0...0000

Property 3: sign extension.

- Positive numbers have MSB = 0
- Negative numbers have MSB = 1
- To extend a number to more bits, replicate its MSB on the left: $C2(3,4) = 1101_2 \rightarrow C2(3,8) = 1111\ 1101_2$
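Properties 1 and 3 can be checked with a short sketch (hypothetical helpers `subtract` and `sign_extend`, 4 bits by default): subtraction becomes an addition modulo 2^n, and sign extension just copies the MSB.

```python
def subtract(x: int, y: int, n: int = 4) -> str:
    """x - y computed as x + C2(y, n), keeping only n bits."""
    c2_y = (2 ** n - y) % 2 ** n
    return format((x + c2_y) % 2 ** n, f"0{n}b")


def sign_extend(bits: str, width: int) -> str:
    """Widen a two's-complement code by replicating its MSB on the left."""
    return bits[0] * (width - len(bits)) + bits


print(subtract(2, 3))              # 1111  (= -1)
print(sign_extend("1101", 8))      # 11111101  (-3 in 8 bits)
print(sign_extend("0011", 8))      # 00000011  (+3 in 8 bits)
```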



Excess-K representation

Excess-K (also called biased representation) of an $n$-bit representation is a representation with range $[-K, \ldots, 2^n - 1 - K]$ that uses a pre-specified number $K$ as a bias to displace the origin of the representation, so that the most negative number of the representation ($-K$) is mapped to the code 0...0.

Example:

- Excess-K, $K = 8$, $n = 4 \rightarrow$ range = [-8, 7]
- Codes [0, 7] → negative integers [-8, -1]; codes [8, 15] → non-negative integers [0, 7]

| Binary code | Unsigned | Excess-8 |
|---|---|---|
| 0000 | 0 | -8 |
| 0001 | 1 | -7 |
| 0010 | 2 | -6 |
| 0011 | 3 | -5 |
| 0100 | 4 | -4 |
| 0101 | 5 | -3 |
| 0110 | 6 | -2 |
| 0111 | 7 | -1 |
| 1000 | 8 | 0 |
| 1001 | 9 | 1 |
| 1010 | 10 | 2 |
| 1011 | 11 | 3 |
| 1100 | 12 | 4 |
| 1101 | 13 | 5 |
| 1110 | 14 | 6 |
| 1111 | 15 | 7 |





Excess-K representation

Converting from decimal to excess-K:

- Add K to x in decimal and then convert the result to binary
- Examples: assume n = 4, K = 8.
  - x = -3 → (add 8) → -3 + 8 = 5 → (binary) → 0101
  - x = 3 → (add 8) → 3 + 8 = 11 → (binary) → 1011

Converting from excess-K to decimal:

- Convert the code to decimal and then subtract K
- Examples: assume n = 4, K = 8.
  - x = 0011 → (decimal) → 3 → (subtract 8) → 3 - 8 = -5
  - x = 1011 → (decimal) → 11 → (subtract 8) → 11 - 8 = 3
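The two conversions as a Python sketch, assuming n = 4 and K = 8 as in the examples (`excess_encode` and `excess_decode` are hypothetical names):

```python
def excess_encode(x: int, k: int = 8, n: int = 4) -> str:
    """Add K in decimal, then convert to n-bit binary."""
    biased = x + k
    if not 0 <= biased < 2 ** n:
        raise ValueError(f"{x} is outside [-{k}, {2 ** n - 1 - k}]")
    return format(biased, f"0{n}b")


def excess_decode(code: str, k: int = 8) -> int:
    """Convert the code to decimal, then subtract K."""
    return int(code, 2) - k


print(excess_encode(-3), excess_encode(3))            # 0101 1011
print(excess_decode("0011"), excess_decode("1011"))   # -5 3
```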



Excess-K representation

Property 1: it is monotonically increasing, which makes comparisons (>, <, ...) easy to perform.
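One way to see the monotonicity property: comparing excess-8 codes as unsigned bit strings gives the same order as the values themselves, while two's-complement codes do not. This sketch uses 4-bit codes; the helper names are hypothetical.

```python
def excess8_code(x: int) -> str:
    return format(x + 8, "04b")      # add the bias K = 8


def twos_code(x: int) -> str:
    return format(x % 16, "04b")     # (2**4 + x) mod 2**4 for negative x


values = list(range(-8, 8))
# Equal-length bit strings compare lexicographically, i.e. as unsigned codes
print(sorted(values, key=excess8_code) == values)   # True
print(sorted(values, key=twos_code) == values)      # False: negatives sort last
```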




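Summary of signed representations

| Binary code | Unsigned | Sign and magnitude | Two's complement | Excess-8 |
|---|---|---|---|---|
| 0000 | 0 | 0 | 0 | -8 |
| 0001 | 1 | 1 | 1 | -7 |
| 0010 | 2 | 2 | 2 | -6 |
| 0011 | 3 | 3 | 3 | -5 |
| 0100 | 4 | 4 | 4 | -4 |
| 0101 | 5 | 5 | 5 | -3 |
| 0110 | 6 | 6 | 6 | -2 |
| 0111 | 7 | 7 | 7 | -1 |
| 1000 | 8 | -0 | -8 | 0 |
| 1001 | 9 | -1 | -7 | 1 |
| 1010 | 10 | -2 | -6 | 2 |
| 1011 | 11 | -3 | -5 | 3 |
| 1100 | 12 | -4 | -4 | 4 |
| 1101 | 13 | -5 | -3 | 5 |
| 1110 | 14 | -6 | -2 | 6 |
| 1111 | 15 | -7 | -1 | 7 |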




Encoding real numbers

Floating-point representation

It consists of a fixed number of significant digits, called the mantissa, which are scaled by an exponent. The base for the scaling is usually 2 or 10:

$$\text{mantissa} \times \text{base}^{\text{exponent}}$$

- Examples of the same number using different exponents (scaling factors):
  - $1125.0 \times 10^0$, $112.5 \times 10^1$, $11.25 \times 10^2$, $1.125 \times 10^3$, $0.1125 \times 10^4$
- The point can float, i.e., be placed anywhere relative to the significant digits of the number.

Normalized representation: the one in which the point follows the most significant nonzero digit: $1.125 \times 10^3$.

Advantage: it supports a much wider range of values with the same number of digits.
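A sketch of normalization in an arbitrary base (hypothetical function `normalize`), returning a mantissa with 1 ≤ |m| < base. It uses Python floats, so for values that are not exact in binary the repeated divisions may introduce small rounding errors.

```python
def normalize(x: float, base: int = 10):
    """Return (mantissa, exponent) with 1 <= |mantissa| < base and x = mantissa * base**exponent."""
    if x == 0:
        return 0.0, 0
    mantissa, exponent = x, 0
    while abs(mantissa) >= base:     # point too far right: move it left
        mantissa /= base
        exponent += 1
    while abs(mantissa) < 1:         # leading digit is 0: move the point right
        mantissa *= base
        exponent -= 1
    return mantissa, exponent


print(normalize(1125.0))       # (1.125, 3)
print(normalize(29.6875, 2))   # (1.85546875, 4)
```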



IEEE Standard for Floating-Point Arithmetic (IEEE 754)

It describes several formats with different accuracies. A given format comprises:

- Finite numbers, described by three integers $(s, c, q)$. The value of the number is $(-1)^s \times c \times b^q$.
- Two infinities: $+\infty$ and $-\infty$.
- Two kinds of NaN (Not a Number).

Finite numbers:

- s: the sign (zero or one).
- c: the mantissa (also called significand or coefficient).
  - Uses sign-and-magnitude format: the sign of the mantissa is the sign bit.
  - Normalized format → the point follows the most significant nonzero digit. Since in base 2 this bit is always 1, it is implied and there is no need to store it.
- q: the exponent.
  - Uses excess-K representation with $K = 2^{n_e - 1} - 1$, where $n_e$ is the number of bits of the exponent.
  - K = 15 for IEEE-16, K = 127 for IEEE-32, and K = 1023 for IEEE-64.
- b: the base, which may be 2 or 10.





IEEE Standard for Floating-Point Arithmetic (IEEE 754)

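| Name | Base | Total bits | Mantissa bits | Exponent bits | Min. number | Max. number |
|---|---|---|---|---|---|---|
| Half precision | 2 | 16 | 10+1 | 5 | $6.1035 \times 10^{-5}$ | $6.5504 \times 10^{4}$ |
| Single precision | 2 | 32 | 23+1 | 8 | $1.1754 \times 10^{-38}$ | $3.4028 \times 10^{38}$ |
| Double precision | 2 | 64 | 52+1 | 11 | $2.2250 \times 10^{-308}$ | $1.7977 \times 10^{308}$ |

The "+1" in the mantissa column is the implied leading bit. Min. number is the smallest positive normalized value.

[Figure: bit layout of the formats: sign (1 bit), exponent (5, 8 or 11 bits) and mantissa (10, 23 or 52 bits) for half, single and double precision]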
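The minimum and maximum values of each binary format follow from the exponent bias. In the binary IEEE 754 formats the all-zeros and all-ones exponent codes are reserved (for zero and subnormals, and for infinities and NaN), so normalized exponents run from 1 - K to K. A sketch under that assumption (hypothetical function `ieee_limits`):

```python
import sys


def ieee_limits(mantissa_bits: int, exponent_bits: int):
    """Smallest positive normalized and largest finite value of a binary IEEE 754 format."""
    k = 2 ** (exponent_bits - 1) - 1                     # excess-K bias
    smallest = 2.0 ** (1 - k)                            # 1.000...0 x 2**(1-K)
    largest = (2 - 2.0 ** -mantissa_bits) * 2.0 ** k     # 1.111...1 x 2**K
    return smallest, largest


for name, m, e in [("half", 10, 5), ("single", 23, 8), ("double", 52, 11)]:
    lo, hi = ieee_limits(m, e)
    print(f"{name:6s} {lo:.4e} {hi:.4e}")

# Python floats are IEEE 754 doubles, so the double-precision limits must match
print(ieee_limits(52, 11) == (sys.float_info.min, sys.float_info.max))   # True
```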





IEEE Standard for Floating-Point Arithmetic (IEEE 754)

Converting from floating-point to decimal. Example (single precision): $\mathrm{7F7F\ FFFF}_{16}$

- Sign: leading bit → 0 → positive number.
- Exponent: the 8 bits after the sign → $1111\ 1110_2 = 254_{10}$.
  - It is in excess-127 → 254 - 127 = 127
- Mantissa: the 23 bits after the exponent, plus the implied bit, which is always 1.
  - It is $1.111\ldots1$ (twenty-three 1s after the point), represented in sign-and-magnitude.
  - Use eq. (1) to get the value in decimal: $1 \cdot 2^0 + 1 \cdot 2^{-1} + 1 \cdot 2^{-2} + \cdots + 1 \cdot 2^{-23} = 1.999999880790710 \approx 2$

Result: $+1.999999880790710 \times 2^{127} \approx 2 \times 2^{127} \approx 3.4028 \times 10^{38}$
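The same decoding done in Python (hypothetical function `decode_single`), checked against the standard `struct` module, which can reinterpret the 4 bytes as a big-endian single-precision value. The formula used is only valid for normalized numbers (exponent field neither 0 nor 255).

```python
import struct


def decode_single(hex_word: str):
    """Split a 32-bit IEEE 754 word into its fields (normalized numbers only)."""
    bits = int(hex_word, 16)
    sign = bits >> 31                       # 1 bit
    exponent = (bits >> 23) & 0xFF          # 8 bits, stored in excess-127
    fraction = bits & 0x7FFFFF              # 23 stored mantissa bits
    significand = 1 + fraction / 2 ** 23    # add back the implied leading 1
    value = (-1) ** sign * significand * 2.0 ** (exponent - 127)
    return sign, exponent, significand, value


sign, exponent, significand, value = decode_single("7F7FFFFF")
print(sign, exponent, exponent - 127)       # 0 254 127
print(value)                                # about 3.4028e+38
print(value == struct.unpack(">f", bytes.fromhex("7F7FFFFF"))[0])   # True
```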



IEEE Standard for Floating-Point Arithmetic (IEEE 754)

Converting from decimal to floating-point (1). Example: -29.6875 to double

- Convert the absolute value of the number to binary. Convert the integer and fractional parts separately:
  - $29_{10} = 11101_2$
  - Fractional part: multiply repeatedly by 2 and keep the integer part of each product as the next bit:
    - 0.6875 × 2 = 1.375 → 1
    - 0.375 × 2 = 0.75 → 0
    - 0.75 × 2 = 1.5 → 1
    - 0.5 × 2 = 1.0 → 1
  - $0.6875_{10} = 0.1011_2 \rightarrow 29.6875_{10} = 11101.1011_2 = 11101.1011_2 \times 2^0$
- Normalize the number: $11101.1011_2 \times 2^0 \rightarrow 1.11011011_2 \times 2^4$
- Generate the mantissa. Omit the implied 1 and fill with zeros on the right up to the 52 bits of the mantissa. Using hex notation:
  - $1101\ 1011\ 0000\ 0000 \ldots 0000_2 = \mathrm{D\ B000\ 0000\ 0000}_{16}$
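The repeated-multiplication step for the fractional part, as a sketch (hypothetical function `fraction_to_binary`). The `max_bits` cap is an assumption needed because many decimal fractions, such as 0.1, would otherwise not terminate in binary.

```python
def fraction_to_binary(frac: float, max_bits: int = 52) -> str:
    """Multiply by 2 repeatedly; each integer part is the next bit after the point."""
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)        # 1 if the product is >= 1, else 0
        bits.append(str(bit))
        frac -= bit            # keep only the fractional part
    return "".join(bits) or "0"


integer, frac = divmod(29.6875, 1)
print(format(int(integer), "b") + "." + fraction_to_binary(frac))   # 11101.1011
```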





IEEE Standard for Floating-Point Arithmetic (IEEE 754)

Converting from decimal to floating-point (2). Example: -29.6875 to double

- Generate the exponent, expressed in excess-1023 (for IEEE-64 the bias is 1023). Add the bias:
  - $4_{10} + 1023_{10} = 1027_{10} = 100\ 0000\ 0011_2 = 403_{16}$
- Set the sign bit: 1 → negative
- Place the sign, exponent and mantissa into the fields of the IEEE format.

Result: $-29.6875_{10} = \mathrm{C03D\ B000\ 0000\ 0000}_{16}$ (IEEE-64)
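Packing the three fields by hand and comparing with what Python's `struct` module produces for the same double (big-endian format `>d`):

```python
import struct

sign = 1                                 # negative
exponent = 4 + 1023                      # excess-1023
mantissa = 0xDB00000000000               # 52 bits, implied 1 omitted
word = (sign << 63) | (exponent << 52) | mantissa
print(f"{word:016X}")                                 # C03DB00000000000
print(struct.pack(">d", -29.6875).hex().upper())      # C03DB00000000000
```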


Encoding texts

The ASCII code

The American Standard Code for Information Interchange (ASCII) is a 7-bit coding scheme that supports the English alphabet and control characters.

Example:

- The following codes (decimal) are sent to the console:
  - 65 110 32 9 32 65 83 67 73 73 32 13 32 10 32 116 101 120 116 32 10
  - They spell "An ASCII text" with spaces (32), a tab (9), a carriage return (13) and two line feeds (10) in between.
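The codes in the example can be decoded with Python's built-in ASCII codec:

```python
codes = [65, 110, 32, 9, 32, 65, 83, 67, 73, 73, 32, 13, 32, 10, 32, 116, 101, 120, 116, 32, 10]
assert all(0 <= c < 128 for c in codes)     # every code fits in 7 bits
text = bytes(codes).decode("ascii")
print(repr(text))                           # 'An \t ASCII \r \n text \n'
```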

Drawback: it lacks symbols from other languages. Solutions:

- Extended 8-bit ASCII coding: the ISO 8859-1 standard, known as ISO Latin 1
- Unicode ...



The Unicode standard (ISO/IEC 10646)

An attempt to create a universal character set with support for most of the world's writing systems.

Not only a character chart: it defines a complete encoding methodology. It deals with aspects such as:

- Character properties (upper and lower case)
- Rules for composing characters with different types of accents
- Normalization rules for obtaining equivalent forms, etc.

It specifies a name and a unique numeric identifier, called the code point, for each character or symbol.

Originally this identifier was intended to be coded as a 16-bit integer, but over time this proved to be insufficient.



The Unicode standard (ISO/IEC 10646)

Unicode defines three encoding forms under the name UTF (Unicode Transformation Format):

- UTF-8: byte-oriented coding with variable-length symbols (1 to 4 bytes per Unicode character).
  - One byte: the characters of US-ASCII, a total of 128 characters.
  - Two bytes: a total of 1920 characters, including the Latin letters with diacritics used by the Romance languages, and the Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac ... alphabets.
  - Three bytes: the rest of the Unicode Basic Multilingual Plane (BMP), which, together with the previous groups, includes the CJK characters: Chinese, Japanese and Korean.
  - Four bytes: the Supplementary Multilingual Plane (mathematical symbols, the Linear B syllabary and ideograms, Old Persian, Phoenician ...) and the Supplementary Ideographic Plane (rarely used Han characters).
- UTF-16: uses one 16-bit code unit for the Basic Multilingual Plane (BMP) and two 16-bit code units (surrogate pairs) for the additional, less frequent planes.
- UTF-32: fixed-length 32-bit encoding, the simplest of the three.

ASCII was the most common encoding on the World Wide Web until December 2007, when it was surpassed by UTF-8.
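How many bytes each form uses can be checked in Python with `str.encode`. The little-endian codec names are used so that no byte-order mark is added, and the sample characters are arbitrary choices from each group:

```python
samples = {
    "A": "US-ASCII",
    "\u00e9": "Latin letter with diacritic",
    "\u03a9": "Greek",
    "\u4e2d": "CJK ideograph (BMP)",
    "\U0001D538": "mathematical symbol (Supplementary Multilingual Plane)",
}

for ch, group in samples.items():
    # "-le" codecs: fixed byte order, so no byte-order mark is prepended
    sizes = [len(ch.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")]
    print(f"U+{ord(ch):04X} {group}: UTF-8/16/32 bytes = {sizes}")
```

The UTF-8 sizes grow from 1 to 4 bytes across the groups, UTF-16 needs a surrogate pair (4 bytes) only for the last character, and UTF-32 always uses 4 bytes.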



Formatted text

A markup language is a way to encode a document which, in addition to the text, includes labels or marks that specify the structure of the text.

- Examples: HTML, nroff, troff, LaTeX, RTF

RTF (Rich Text Format), used for text editing:

{\rtf1\ansi\ansicpg1252\cocoartf1138 {\fonttbl\f0\froman\fcharset0 TimesNewRomanPSMT;}

{\colortbl;\red255\green255\blue255;}

\pard This is a {\b boldface} example.

}

Presentational markup: used by traditional text editors. Marking is performed by the text editor in such a way that it is hidden from human users, producing the WYSIWYG (What You See Is What You Get) effect.

Procedural markup: used by LaTeX and some HTML editors. In these systems the user explicitly writes the formatting labels in the source file.


Redundancy and compression

Redundant encoding

Information may get corrupted when it is transmitted through communication lines or stored on disks or other storage devices.

Redundancy is used to detect errors, and to detect and correct them.

- Error detection: parity bit, checksums
- Error detection and correction: ECC (error-correcting codes). They require higher levels of redundancy.
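As an illustration of error detection with a checksum, here is one of the simplest schemes: the sum of all bytes modulo 256. It is chosen for illustration only (real protocols often use stronger checks such as CRCs), and `checksum8` is a hypothetical name.

```python
def checksum8(data: bytes) -> int:
    """Additive checksum: sum of all bytes modulo 256."""
    return sum(data) % 256


message = b"ASCII text"
sent = message + bytes([checksum8(message)])   # append one redundant byte

received = bytearray(sent)
received[0] ^= 0b00000100                      # simulate one corrupted bit
ok = checksum8(bytes(received[:-1])) == received[-1]
print(ok)                                       # False: the error is detected
```

The extra byte only tells the receiver that something is wrong; it does not say which bit to fix, which is why correction needs more redundancy.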



Redundant encoding

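A parity bit is a redundant bit added to a set of bits to ensure that the number of bits with value 1 in the outcome is even or odd. For example, for the 7-bit ASCII code of "C", 100 0011 (three 1s), with the parity bit in the most significant position:

- Even parity: 1100 0011
- Odd parity: 0100 0011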

Parity bits are often used when transmitting ASCII characters from/to peripherals.

[Figure: parity bit used in a transmission]
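The parity examples can be reproduced with a short sketch (hypothetical helpers `add_parity` and `parity_ok`), placing the parity bit in the most significant position:

```python
def add_parity(data7: str, even: bool = True) -> str:
    """Prepend a parity bit so that the total number of 1s is even (or odd)."""
    ones = data7.count("1")
    parity = ones % 2 if even else 1 - ones % 2
    return str(parity) + data7


def parity_ok(word: str, even: bool = True) -> bool:
    """Check the parity of a received word; a single flipped bit breaks it."""
    return word.count("1") % 2 == (0 if even else 1)


ascii_c = format(ord("C"), "07b")          # 1000011
print(add_parity(ascii_c, even=True))      # 11000011
print(add_parity(ascii_c, even=False))     # 01000011
```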





Information compression

Data compression: the process of encoding information using fewer bits than the original representation uses.

- Goal: to reduce the information volume and the consumption of expensive resources, such as hard disk space or transmission bandwidth.

It has a cost: extra processing for compression and decompression.

- Trade-off between the costs of encoding and decoding: time-consuming compression → time-efficient decompression, and vice versa.

Two types of compression:

- Lossless compression: the encoded data is not distorted or modified, so the original data can be reconstructed exactly from the compressed data.
  - Example: text compression (ZIP format); lossless image formats such as PNG and GIF.
- Lossy compression: the original data is only approximately represented, so only an approximation of the original data can be reconstructed.
  - Example: image/audio/video compression (JPEG, MPEG, MP3 formats).
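A sketch contrasting the two types. Python's `zlib` module (DEFLATE, the same family of algorithm used in ZIP) reconstructs its input exactly, while rounding samples to fewer digits, a toy stand-in for lossy coding rather than how JPEG or MP3 actually work, cannot be undone.

```python
import zlib

text = b"redundant " * 100                 # highly redundant input (1000 bytes)
packed = zlib.compress(text)
assert zlib.decompress(packed) == text     # lossless: exact reconstruction
print(len(text), len(packed))              # the compressed form is much shorter

samples = [0.1234, 0.5678, 0.9012]
quantized = [round(s, 1) for s in samples] # lossy: keep one decimal digit only
print(quantized)                           # [0.1, 0.6, 0.9]
```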



Lossless compression

Lossless algorithms usually exploit statistical redundancy, so that more frequent data are represented with fewer bits.

Huffman coding

- Example: text with only four characters: , A, B, C with frequencies 45%, 35%, 15% and 5%, respectively.

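- Compression ratio, relative to a fixed-length code of 2 bits per character (the Huffman code lengths are 1, 2, 3 and 3 bits, respectively):

[Figure: Huffman tree for the example; leaf weights 0.45, 0.35, 0.15 and 0.05, internal nodes 0.20, 0.55 and 1.00]

$$r = 1 - \frac{0.45 \cdot 1 + 0.35 \cdot 2 + 0.15 \cdot 3 + 0.05 \cdot 3}{2} = 1 - \frac{1.75}{2} = 12.5\%$$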
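A compact Huffman construction in Python that reproduces the code lengths and the ratio (hypothetical function `huffman_code_lengths`). The first character of the example is not legible in these notes, so it is called "X" here; frequencies are given as integer percentages to avoid floating-point sums, and only the code lengths are computed, not the actual bit patterns.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Code length (bits) per symbol, built by repeatedly merging the two lightest nodes."""
    order = count()                     # tie-breaker so dicts are never compared
    heap = [(w, next(order), {sym: 0}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # every symbol below the new node gets one more bit
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(order), merged))
    return heap[0][2]


freqs = {"X": 45, "A": 35, "B": 15, "C": 5}     # "X" stands for the first character
lengths = huffman_code_lengths(freqs)
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / sum(freqs.values())
print({s: lengths[s] for s in freqs})           # {'X': 1, 'A': 2, 'B': 3, 'C': 3}
print(avg_bits, 1 - avg_bits / 2)               # 1.75 0.125
```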





Lossy compression

It compresses data by discarding (losing) some of it.

Usually based on perceptual coding: transforming the raw data obtained from a device into a domain that more accurately reflects the information content.

- Example: a sound file can be represented more efficiently as its frequency spectrum over time than as amplitude levels.

Lossy encoding/decoding programs are usually known as codecs.

Key point: required accuracy or Quality of Service (QoS).

- Example: image resolutions for video conferencing: 640x480, 800x600, 1920x1080, ...
