MT311 Java Application Programming and Programming Languages Li Tak Sing ( 李德成 )


Type checking

Some languages do type checking while others do not.

Type checking means that an error is reported, by the compiler or by the run-time system, when the actual type of a variable does not match the expected type.

Strong typing

A language is strongly typed if all type errors are reported either during runtime or compile time.

Fortran is not a strongly typed language because it does not report all type errors.

Pascal is a nearly strongly typed language because it reports most type errors.

Java is a strongly typed language because it reports all type errors.

Type compatibility

Consider the following declaration statements:

type
  arraytype1 = array [1..10] of integer;
  arraytype2 = array [1..10] of integer;
var
  A, B : arraytype1;
  C : arraytype2;

There are two fundamental rules for type compatibility:

Name type compatibility. Two variables are of the same type only if they are declared with the same type name. In the above example, A and B are of the same type, but C is not of the same type as A.

Structure type compatibility. Two variables are of the same type if their types have the same structure. Under this rule, A and C would be of the same type.

A comparison of the two types of compatibility

Writability: Name type compatibility has lower writability because it is more restrictive. Structure type compatibility has higher writability because the programmer can treat more data types as the same and can therefore manipulate them together.

Implementation cost: Name type compatibility has a lower implementation cost because it is easy to check whether two variables are of the same type. Structure type compatibility has a higher implementation cost because it is more difficult to check whether two variables are of the same type.

Reliability: Because name type compatibility is more restrictive, it is less likely that a value is mistakenly assigned to the wrong variable, so it is more reliable. Structure type compatibility is less restrictive and is therefore less reliable.

Type compatibility

Few programming languages use strict name type compatibility or strict structure type compatibility. Most use combinations of the two. For example, Pascal uses a slight variation of name type compatibility called declaration equivalence — a programmer may define a data type to be equivalent to another type, then the two data types are compatible even though they have different names. C uses a variation of structure type compatibility, while C++ and Ada use a variation of name type compatibility.
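The mixture of the two rules in C can be illustrated with a small sketch. The type and function names below (metres, feet, sum_array, demo_compatibility) are invented for this illustration. A typedef name is only an alias, so two typedefs with the same structure are compatible, whereas two struct types with identical members are treated as different types.

```c
#include <assert.h>

/* Two typedef names with the same structure: C treats them as one type. */
typedef int arraytype1[10];
typedef int arraytype2[10];

/* Two struct types with identical members: C treats them as different types. */
struct metres { double value; };
struct feet   { double value; };

/* Declared to take a pointer to arraytype1. */
int sum_array(arraytype1 *a)
{
    int total = 0;
    assert(a != 0);
    for (int i = 0; i < 10; i++)
        total += (*a)[i];
    return total;
}

/* Declared to take struct metres only. */
double metres_value(struct metres m)
{
    return m.value;
}

int demo_compatibility(void)
{
    arraytype2 c = {1, 2, 3};   /* remaining elements are zero */
    struct feet f = {3.0};
    (void)f;

    /* Accepted: arraytype2 and arraytype1 have the same structure. */
    int total = sum_array(&c);

    /* Rejected by the compiler if uncommented: struct feet is not struct metres. */
    /* metres_value(f); */

    return total;
}
```

Calling demo_compatibility() from a main function returns the sum of the elements of c, even though c was declared with a different type name from the one sum_array expects.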

Scope

The scope of a variable is the range of statements in which the variable can be referenced. The next reading discusses the scope method that is used in most programming languages — static scope.

Static scope

In languages with static scoping, the compiler can determine the scope of each variable by inspecting the program.

Using Pascal syntax:

procedure sub1;
  var x, y : integer;
  procedure sub2;
    var x : integer;
  begin
    x := 1;
    y := 2;
  end;
begin
  ...
end;

Since sub2 is nested in sub1, we call sub1 the static parent of sub2. The static parent of sub2, its static parent, and so forth are the static ancestors of sub2. y is called a non-local variable of sub2, since the declaration of y is not located in sub2.

An important task of a compiler of a language with static scope is to find the correct declaration of any variable it encounters. For example, in the above Pascal program fragment, when the compiler sees “y:=2”, it has to search for the declaration statement of y. It will try to find the declaration of y in sub2 (the current procedure), then try sub1 and every static ancestor of sub2 until it finds the declaration. In this case, the declaration is in the static parent sub1.
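C does not allow nested procedures, but the same lookup rule can be observed with nested blocks. The following is a rough analogue of the Pascal fragment, not a translation of it. The function name static_scope_demo and the parameter inner_x are invented for this sketch.

```c
#include <assert.h>

int static_scope_demo(int *inner_x)
{
    int x = 10, y = 20;      /* play the role of sub1's x and y */
    assert(inner_x != 0);
    {
        int x;               /* plays the role of sub2's x; hides the outer x */
        x = 1;               /* resolved to the inner declaration */
        y = 2;               /* no inner y, so resolved to the enclosing block */
        *inner_x = x;        /* report the inner x to the caller */
    }
    return x * 100 + y;      /* the outer x is still 10; y is now 2 */
}
```

The compiler decides which declaration each use of x and y refers to purely from the nesting of the blocks in the source text.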

Dynamic scope

With dynamic scoping, the scope of variables can only be determined at run time. The calling sequence of the subprograms in a program will affect the scope of a variable.
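C itself is statically scoped, so dynamic scoping can only be simulated. The sketch below keeps an explicit stack of bindings for a single name x. Every identifier in it is hypothetical, and the mechanism is only one possible way to model the idea. It shows that the value read_x sees depends on which subprogram called it.

```c
#include <assert.h>

#define MAX_BINDINGS 16

/* Stack of active bindings for one dynamically scoped name, x. */
static int x_bindings[MAX_BINDINGS];
static int x_top = -1;

static void bind_x(int value)
{
    assert(x_top + 1 < MAX_BINDINGS);
    x_bindings[++x_top] = value;
}

static void unbind_x(void)
{
    assert(x_top >= 0);
    --x_top;
}

/* Refers to x without declaring it: uses the most recent active binding. */
int read_x(void)
{
    assert(x_top >= 0);
    return x_bindings[x_top];
}

int caller_a(void)
{
    bind_x(1);               /* caller_a's own x */
    int seen = read_x();
    unbind_x();
    return seen;
}

int caller_b(void)
{
    bind_x(2);               /* caller_b's own x */
    int seen = read_x();
    unbind_x();
    return seen;
}

int caller_c(void)
{
    bind_x(3);               /* caller_c's own x */
    int inner = caller_a();  /* caller_a's binding hides caller_c's */
    int outer = read_x();    /* caller_c's binding is visible again */
    unbind_x();
    return inner * 10 + outer;
}
```

The same function read_x yields a different x depending on the calling sequence, which is exactly what cannot be determined by inspecting the source text alone.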

Compared to static scoping, dynamic scoping has the following disadvantages:

Dynamic scoping is less reliable because all local variables of the calling subprogram are visible from the called subprogram. This would make information hiding impossible. For example, when subprogram A calls subprogram B, then all the local variables of A are visible to B. There is no way to prevent B from changing the values of those variables.

The compiler cannot check type compatibility because it does not know where a non-local variable is declared. This information is only available at run time.

Referencing non-local variables is more expensive.

The program is more difficult to read because the identity of a non-local variable is difficult to trace by just reading the source program.

Scope and lifetime

Scope is something about 'where' while lifetime is something about 'when'.

Scope is the position in a program where a variable is visible.

The lifetime of a variable is the period from when it begins to exist until it ceases to exist, that is, the time during which storage is bound to it.

For example, consider the following C code:

void meth() {
    static int a=2;
    int b=3;
    ....
}

The scope of a is the statements after its declaration within meth. Its lifetime is from the beginning of the program until the program ends.

The scope of b is also the statements after its declaration within meth. Its lifetime is from when the function is invoked until the function returns.
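The difference in lifetime can be made visible by letting the function modify both variables and report them. This is a minimal sketch: the increments and the return value are additions made for illustration and are not part of the original fragment.

```c
#include <assert.h>

int meth(void)
{
    static int a = 2;   /* created once; keeps its value between calls */
    int b = 3;          /* created afresh on every call */

    a = a + 1;
    b = b + 1;
    assert(b == 4);     /* b always restarts from 3 */
    return a * 10 + b;  /* encodes both values in one number */
}
```

Successive calls return increasing values because a survives between calls, while b contributes the same amount each time. Neither variable can be named outside meth, so both have the same scope despite their different lifetimes.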

Referencing environments

The scope of a variable is the range of statements that can reference the variable. The referencing environment is the same concept seen from the point of view of a program statement: the referencing environment of a statement is the collection of all variables that are visible in the statement.

Data types

Primitive data types: Primitive data types not only are useful by themselves; they also become building blocks for defining user-defined data types, e.g. record structures, arrays, in languages that allow them. The following primitive data types are commonly available:

– Numeric types: integer, floating-point and decimal. The size of an integer is usually one word. The size of a floating-point number is usually four bytes for single precision and eight bytes for double precision.

– Boolean types: usually have a size of one byte for efficient access.

– Character types: usually have a size of one byte, except those for the Unicode character set.

The language C is special in that the differences between these three primitive types are very vague. First of all, it originally had no Boolean type (one was added only in C99), and variables of both numeric types and character types can be used where a Boolean expression is required.

Secondly, variables of character types and integer types are interchangeable. The only constraint is the size difference between an integer variable and a character variable. This philosophy makes the language very flexible. For example, we can change the value of a character variable from ‘a’ to ‘b’ by adding 1 to it. With other languages, you have to call a function to do that. The disadvantage is that the type checking mechanism of the compiler is weakened, because a mixture of different primitive types in an expression is still considered valid. This is another example of the conflict between the writability and the reliability of a language.
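Both points can be seen in a few lines of C. The function names next_char and string_length are invented for this sketch, and the 'a' to 'b' step assumes a character set such as ASCII in which the letters are consecutive.

```c
#include <assert.h>

/* Character arithmetic: the char is promoted to int, incremented, and converted back. */
char next_char(char c)
{
    return (char)(c + 1);
}

/* A char used where a Boolean expression is expected. */
int string_length(const char *s)
{
    int n = 0;
    assert(s != 0);
    while (s[n])        /* any non-zero character counts as true */
        n++;
    return n;
}
```

The compiler accepts both functions without any conversion function being called, which is convenient to write but means that an accidental mixture of characters, integers and conditions is not reported.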

Character string types

The key questions that you should ask as you analyse the design of character string types in a programming language are:

– Are character strings a primitive type in the language or are they constructed as an array of characters?

– Are character strings in the language declared with fixed lengths, or can they have variable lengths?

– What operations are allowed on the character string type?

User-defined ordinal types

The two kinds of user-defined ordinal types are the enumeration type and the subrange type. The main advantage of using these types is the improved readability and reliability of the program. However, the enumeration type provided in C only increases readability, because data of an enumeration type is internally converted into an integer. A function that accepts a parameter of an enumeration type would therefore also accept any integer, so reliability is not increased by using enumeration types in C.
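A short sketch of this weakness follows. The enumeration colour and the functions is_known_colour and enum_demo are hypothetical names chosen for illustration.

```c
#include <assert.h>

enum colour { RED, GREEN, BLUE };

/* Returns 1 for a declared enumerator and 0 for any other value. */
int is_known_colour(enum colour c)
{
    switch (c) {
    case RED:
    case GREEN:
    case BLUE:
        return 1;
    }
    return 0;   /* reachable, because any integer can be passed in */
}

void enum_demo(void)
{
    assert(is_known_colour(GREEN) == 1);
    assert(is_known_colour(42) == 0);   /* compiles: 42 is silently accepted */
}
```

The call with 42 is accepted by a C compiler even though 42 does not correspond to any declared colour, so the function has to guard against such values itself.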

Array types

The key points in the design of array types in a language can be emphasized by asking these questions:

– What types are legal as subscripts? Readability and reliability increase if enumerated types are accepted as subscripts.

– Are subscript ranges checked at run time? Some compilers include run-time range checks in the generated code to detect array references that are out of range. Others, including most C compilers, do not. Such checking increases reliability but also increases the running cost.

– When are subscript ranges bound? Some arrays can have their sizes determined at compile time; others must have them determined at run time.

– When is storage allocated? The storage can be bound statically (at compile time) or dynamically (at run time). For a dynamically bound array, the storage could be allocated from the stack or from the heap.

– How many subscripts are allowed? Most modern languages do not put any limit on the number of subscripts.

– Can arrays be initialized at storage allocation? Allowing this increases writability, because in a language without this facility initialization has to be done with a number of assignment statements.

– Is there a way of defining an array type with no subscript bounds? Consider the case when we need to write a subprogram to sort an array of integers. In Pascal, we would have the following fragment:

type arr_type = array [1..10] of integer;
......
procedure sort(var a:arr_type);
begin
.......

The problem with this code is that sort is only suitable for sorting arrays of type arr_type. This means that it cannot be used to sort an array of integers that has other than ten members. We would need another procedure for sorting an array with 11 members, another for 12 members, and so on. Ada solves this problem by allowing an unconstrained array type to be defined. The same fragment in Ada would be:

type arr_type is array (Integer range <>) of Integer;
......
procedure sort(a : in out arr_type) is
begin
.......

Now, arr_type is an array type whose subscript range is not specified. If we declare two variables A and B as:

A: arr_type(0..9);
B: arr_type(3..11);

then both A and B are of type arr_type and can therefore be sorted by using sort. Within sort, the lower and upper bounds of the array can be accessed using standard attributes of arrays in Ada:

A'First is the index of the first element in A.

A'Last is the index of the last element in A.

Since C uses pointers to access arrays, the problem does not apply. However, a pointer does not carry the size of the array, so in C we have to pass the size explicitly to the function. The same fragment in C would be:

void sort(int *a, int size) {
    .. .. .. ..
}

We can see that if there is a way of defining an array type without bounds, then the writability would be increased.
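The body of sort is elided in the fragment above. One possible way to fill it in is shown below, using insertion sort purely as an illustrative choice. The point of the sketch is that a single function serves arrays of any length because the length arrives as a parameter.

```c
#include <assert.h>

/* Sorts the first `size` elements of a into ascending order (insertion sort). */
void sort(int *a, int size)
{
    assert(size == 0 || a != 0);
    for (int i = 1; i < size; i++) {
        int key = a[i];
        int j = i - 1;
        /* Shift larger elements one place to the right. */
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}
```

The same function can be called as sort(x, 10) for a ten-element array and sort(y, 12) for a twelve-element one. The compiler cannot check that the size passed in matches the actual array.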

Row-major order

In row-major storage, a multidimensional array is laid out in linear memory so that its rows are stored one after the other. This is the approach used by the C programming language as well as many other languages, with the notable exception of Fortran. With row-major order, moving to the next row changes the address by more than moving to the next column does.

For example, consider this 2×3 array:

1 2 3
4 5 6

Declaring this array in C as

int A[2][3];

would find the array laid out in linear memory as:

1 2 3 4 5 6

The difference in offset from one column to the next is 1 and from one row to the next is 3. The linear offset from the beginning of the array to any given element A[row][column] can then be computed as:

offset = row*NUMCOLS + column

where NUMCOLS represents the number of columns in the array—in this case, 3.
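The formula can be checked against the layout that a C compiler actually produces. In this sketch the helper names row_major_offset and matches_c_layout are invented. The 2×3 array is copied byte for byte into a flat array, and each element is then located with the formula.

```c
#include <assert.h>
#include <string.h>

#define NUMROWS 2
#define NUMCOLS 3

/* offset = row*NUMCOLS + column, counted in elements. */
int row_major_offset(int row, int column, int numcols)
{
    assert(row >= 0 && column >= 0 && column < numcols);
    return row * numcols + column;
}

/* Returns 1 if the formula agrees with C's own layout of A, otherwise 0. */
int matches_c_layout(void)
{
    int A[NUMROWS][NUMCOLS] = { {1, 2, 3}, {4, 5, 6} };
    int flat[NUMROWS * NUMCOLS];

    memcpy(flat, A, sizeof A);   /* the linear memory image of A */
    for (int r = 0; r < NUMROWS; r++)
        for (int c = 0; c < NUMCOLS; c++)
            if (flat[row_major_offset(r, c, NUMCOLS)] != A[r][c])
                return 0;
    return 1;
}
```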

To generalize the above formula, if we have the following C array:

int A[n1][n2][n3][n4][n5]

then the offset of the element A[m1][m2][m3][m4][m5] is:

offset = m1*n2*n3*n4*n5 + m2*n3*n4*n5 + m3*n4*n5 + m4*n5 + m5
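The generalized row-major offset can be computed for any number of dimensions by accumulating from the leftmost subscript. This is a sketch with an invented function name and parameter names. Expanding the loop for five dimensions gives m1*n2*n3*n4*n5 + m2*n3*n4*n5 + m3*n4*n5 + m4*n5 + m5.

```c
#include <assert.h>

/*
 * Row-major offset (in elements) of the element with subscripts idx[0..rank-1]
 * in an array whose dimension sizes are dims[0..rank-1].
 */
long row_major_offset_nd(int rank, const int dims[], const int idx[])
{
    long offset = 0;
    for (int k = 0; k < rank; k++) {
        assert(idx[k] >= 0 && idx[k] < dims[k]);
        /* Scale what has been accumulated by this dimension, then add its subscript. */
        offset = offset * dims[k] + idx[k];
    }
    return offset;
}
```

Note that the size of the first dimension never scales any term, which matches the formula: n1 does not appear in it.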

Column-major order

Column-major order is a similar method of flattening arrays onto linear memory, but the columns are listed in sequence. The programming language Fortran uses column-major ordering.

The array

1 2 3
4 5 6
7 8 9

if stored in memory with column-major order would look like the following, with columns listed first:

1 4 7 2 5 8 3 6 9

The memory offset could then be computed as:

offset = row + column*NUMROWS

where NUMROWS is the number of rows in the array.

To generalize the above formula, if an array declared as

int A[n1][n2][n3][n4][n5]

were stored in column-major order, then the offset of the element A[m1][m2][m3][m4][m5] would be:

offset = m1 + m2*n1 + m3*n2*n1 + m4*n3*n2*n1 + m5*n4*n3*n2*n1
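The column-major computation mirrors the row-major one, accumulating from the rightmost subscript instead. The function name and parameters below are invented for this sketch. Expanding the loop for five dimensions gives m1 + m2*n1 + m3*n2*n1 + m4*n3*n2*n1 + m5*n4*n3*n2*n1.

```c
#include <assert.h>

/*
 * Column-major offset (in elements) of the element with subscripts idx[0..rank-1]
 * in an array whose dimension sizes are dims[0..rank-1].
 */
long column_major_offset_nd(int rank, const int dims[], const int idx[])
{
    long offset = 0;
    for (int k = rank - 1; k >= 0; k--) {
        assert(idx[k] >= 0 && idx[k] < dims[k]);
        /* Scale what has been accumulated by this dimension, then add its subscript. */
        offset = offset * dims[k] + idx[k];
    }
    return offset;
}
```

For the 3×3 array above, the element in row 1, column 2 (counting from 0) has offset 1 + 2*3 = 7, which is the position of the value 6 in the sequence 1 4 7 2 5 8 3 6 9.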

Example

Consider the following array:

int A[3][7][8];

Assume that A[0][0][0] is at address 20000. What is the address of A[2][3][4]

(i) if row-major order is used?

(ii) if column-major order is used?

(i) Assuming that an integer occupies 4 bytes, the address of A[2][3][4] in row-major order is:

20000 + (2*7*8 + 3*8 + 4)*4 = 20000 + 140*4 = 20560

(ii) If column-major order is used, the address is:

20000 + (2 + 3*3 + 4*3*7)*4 = 20000 + 95*4 = 20380
