- 1. Theory of Computation: Formalism, Computation, & Compilation. Vladimir Kulyukin
- 2. Outline: Formalism & Computation; Software/Hardware Duality; Church's Thesis; Programming Language L; Compilation; Finite State Automata & Tokenization; CFGs & Syntactic Analysis; Recursive-Descent Parsing
- 3. Formalism & Computation
- 4. Outline: To a software developer, the question "Why do we need programming languages?" seems silly: we need programming languages to develop software, of course! To a CS theorist, there is a different answer: in order to study computation, one must have a formalism in which computation can be expressed. Is there a best formalism to work with? Unlikely, because the formalism we use is inseparable from the computation we study (this is the software/hardware duality principle).
- 5. Church's Thesis: At first glance, the previous answer seems circular (and it is, to some extent!): before deciding on a formalism, we must have a pretty good idea of the computation we want to study and, vice versa, we cannot begin to study computation until we have a formalism that allows us to express that computation. This is a chicken-and-egg conundrum: which comes first, formalism or computation? This is the heart of what is known as Church's thesis.
- 6. Alonzo Church (1903 - 1995): Alonzo Church developed the λ-calculus, a formal system for defining functions, applying functions, and expressing recursion.
- 7. Church's Thesis: In its commonsense formulation, Church's thesis says that everything computable can be computed in a formalism X, where X can be replaced by the λ-calculus, Turing machines, or some other formalism (C++, Python, Java, etc.). Another subtle and often unstated assumption in Church's thesis is that there is a device that can mechanically execute computational instructions expressed in that formalism.
- 8. Choice of Formalism: The choice of formalism is both objective and subjective. It is objective in that many formalisms have been shown to be equivalent (at least on natural numbers): any computation that can be expressed in one can be expressed in another, and vice versa. Similarly, programming languages are equivalent in the sense that an algorithm implemented in one language can be implemented in a different one without any loss of generality (modulo standard tradeoffs such as speed vs. ease of development & maintenance). It is subjective in that people always have their own personal preferences.
- 9. Choice of Formalism: There is a simple assembly-like programming language L, developed in Chapter 2 of Computability, Complexity, and Languages by Davis, Sigal, and Weyuker. While L is a theoretical construct, it can be thought of as a higher-level assembly language. Since L is a programming language, it is, in my humble opinion, more appealing to the programmatically inclined than more formal constructs such as the λ-calculus or Turing machines.
- 10. Programming Language L
- 11. L's Tokens: Input variables: X1, X2, X3, ... Local variables: Z1, Z2, Z3, ... Output variable: Y. Labels: A1, B1, C1, D1, E1, A2, ... If the subscript is omitted, it is assumed to be 1. For example, X is the same as X1, Z is the same as Z1, and A is the same as A1.
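The subscript convention above can be sketched in a few lines of Python. This is only an illustration: the names `TOKEN`, `normalize`, and `kind` are hypothetical, and the sketch assumes that subscripts are positive integers without leading zeros and that Y never carries a subscript.

```python
import re

# Hypothetical token shape: one letter plus an optional positive subscript.
# A-E are label letters, X/Z are variable letters, Y is the output variable.
TOKEN = re.compile(r"([A-EXYZ])([1-9][0-9]*)?")

def normalize(token: str) -> str:
    """Make the implicit subscript 1 explicit (X -> X1, A -> A1)."""
    m = TOKEN.fullmatch(token)
    if m is None:
        raise ValueError(f"not an L token: {token!r}")
    letter, subscript = m.groups()
    if letter == "Y":
        # Assumption: there is a single output variable, written without a subscript.
        if subscript is not None:
            raise ValueError("the output variable Y takes no subscript")
        return "Y"
    return letter + (subscript or "1")

def kind(token: str) -> str:
    """Classify a token as an input, local, or output variable, or a label."""
    letter = normalize(token)[0]
    names = {"X": "input variable", "Z": "local variable", "Y": "output variable"}
    return names.get(letter, "label")
```

For instance, `normalize("X")` and `normalize("X1")` both yield `"X1"`, so the two spellings can be compared as the same token.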
- 12. L's Basic Instructions (Primitives): 1. V ← V + 1 (increment); 2. V ← V - 1 (decrement); 3. V ← V (no-op); 4. IF V != 0 GOTO L (conditional branch). NOTE: In instructions 1, 2, and 3 the variables on the left-hand side and the right-hand side are the same.
- 13. Instruction V ← V + 1: These instructions are primitives: X1 ← X1 + 1; Z10 ← Z10 + 1; Y ← Y + 1; X102 ← X102 + 1. These instructions are NOT primitives: X1 ← X10 + 1; Z10 ← X1 + 1; Y ← X102 + 1.
- 14. Instruction V ← V - 1: These instructions are primitives: X1 ← X1 - 1; Z10 ← Z10 - 1; Y ← Y - 1; X102 ← X102 - 1. These instructions are NOT primitives: X1 ← X10 - 1; Z10 ← X1 - 1; Y ← X102 - 1.
- 15. Instruction V ← V: These instructions are primitives: X1 ← X1; Z10 ← Z10; X120 ← X120; Y ← Y. These instructions are NOT primitives: X1 ← Y; X120 ← Z10; Z10 ← X1.
- 16. L's Labeled Primitives: 1. [L] V ← V + 1 (increment); 2. [L] V ← V - 1 (decrement); 3. [L] V ← V (no-op); 4. [L] IF V != 0 GOTO L (conditional branch; the target label after GOTO need not be the line's own label). NOTE: At the beginning of the line the label is in square brackets. However, in conditional dispatches the square brackets are dropped after GOTO.
- 17. Labeled Primitives: Examples: [A1] X1 ← X1 + 1; [B1] X23 ← X23 - 1; [C10] Z12 ← Z12 + 1; [E1] Y ← Y; [D101] IF X1 != 0 GOTO E1.
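To make the shape of a (possibly labeled) primitive concrete, here is a minimal parsing sketch in Python. It is an assumption-laden illustration, not part of the original slides: the arrow is written as ASCII `<-`, tokens are separated by single spaces, and the names `Instruction`, `LINE`, and `parse` are hypothetical. Lines whose left-hand and right-hand variables differ are rejected, since they are not primitives.

```python
import re
from typing import NamedTuple, Optional

class Instruction(NamedTuple):
    label: Optional[str]   # label in square brackets at the start of the line, if any
    op: str                # "inc", "dec", "nop", or "branch"
    var: str               # the variable the instruction reads or updates
    target: Optional[str]  # label after GOTO (branch instructions only)

LABEL = r"[A-E][0-9]*"
VAR = r"(?:[XZ][0-9]*|Y)"
LINE = re.compile(
    rf"(?:\[(?P<label>{LABEL})\]\s*)?"
    rf"(?:(?P<lhs>{VAR}) <- (?P<rhs>{VAR})(?: (?P<sign>[+-]) 1)?"
    rf"|IF (?P<cond>{VAR}) != 0 GOTO (?P<target>{LABEL}))"
)

def parse(line: str) -> Instruction:
    """Parse one line of L written with '<-' in place of the arrow."""
    m = LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not an L instruction: {line!r}")
    if m["cond"] is not None:
        return Instruction(m["label"], "branch", m["cond"], m["target"])
    if m["lhs"] != m["rhs"]:
        raise ValueError("not a primitive: both sides must name the same variable")
    op = {"+": "inc", "-": "dec", None: "nop"}[m["sign"]]
    return Instruction(m["label"], op, m["lhs"], None)
```

With this sketch, `parse("[D101] IF X1 != 0 GOTO E1")` produces a branch instruction labeled `D101` with target `E1`, while `parse("X1 <- X10 + 1")` raises a `ValueError`.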
- 18. Increments and Decrements: Since there is no upper limit on variable values, the increment instruction V ← V + 1 always succeeds (there are no overflows): V's value is always incremented by 1. Since variable values are natural numbers, the decrement instruction V ← V - 1 has no effect if the value of the variable is 0: if V is 0 before the instruction, V remains 0 after the instruction; if V > 0 before the instruction, V's value is decremented by 1.
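The two arithmetic primitives can be modeled directly in Python, whose integers are arbitrary precision and therefore match the "no upper limit" assumption. The function names below are hypothetical, chosen only for this sketch.

```python
def increment(v: int) -> int:
    """V <- V + 1: always succeeds, since values are unbounded."""
    return v + 1

def decrement(v: int) -> int:
    """V <- V - 1 on natural numbers: 0 stays 0, anything else drops by 1."""
    return max(v - 1, 0)
```

Clamping at 0 with `max` is one way to express the "no effect at 0" rule without a separate conditional.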
- 19. The Output Value of an L Program: The output value of an L program is the value of the Y variable. If an L program goes into an infinite loop, the value is undefined. Thus, an L program implements a (possibly partial) function that maps the values of the input variables into the value of Y.
- 20. Exit Label E: We will assume that each L program has a unique exit label E (or E1). If a conditional dispatch with GOTO E or GOTO E1 is executed, control exits the program and its execution terminates. If we want to be explicit about this, we can assume that the implicit last statement of every L program is [E1] return Y.
- 21. Example: $f(x) = \begin{cases} 1 & \text{if } x = 0 \\ x & \text{otherwise} \end{cases}$
- 22. Implementing f(x) in L: [A1] X1 ← X1 - 1; Y ← Y + 1; IF X1 != 0 GOTO A1. Or, if we do not want to use subscripts: [A] X ← X - 1; Y ← Y + 1; IF X != 0 GOTO A.
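To check that this program really computes f, we can run it on a tiny interpreter. The following is a sketch under stated assumptions, not a reference implementation: instructions are encoded as hypothetical `(label, op, var, target)` tuples, Y and all local variables are assumed to start at 0 (the usual convention for L), falling off the end of the program or branching to E or E1 returns Y, and a `max_steps` cap stands in for "undefined" on programs that loop forever.

```python
from collections import defaultdict

# [A] X <- X - 1;  Y <- Y + 1;  IF X != 0 GOTO A
PROGRAM = [
    ("A", "dec", "X", None),
    (None, "inc", "Y", None),
    (None, "branch", "X", "A"),
]

EXIT_LABELS = {"E", "E1"}

def run(program, inputs, max_steps=10_000):
    """Execute an L program and return the value of Y."""
    env = defaultdict(int)          # assumption: unset variables start at 0
    env.update(inputs)
    labels = {}
    for index, (label, _op, _var, _target) in enumerate(program):
        if label is not None and label not in labels:
            labels[label] = index   # a label refers to its first occurrence
    pc = 0
    for _step in range(max_steps):
        if pc >= len(program):      # implicit "[E1] return Y"
            return env["Y"]
        _label, op, var, target = program[pc]
        if op == "inc":
            env[var] += 1
        elif op == "dec":
            env[var] = max(env[var] - 1, 0)
        elif op == "branch" and env[var] != 0:
            if target in EXIT_LABELS:
                return env["Y"]
            pc = labels[target]
            continue
        pc += 1                     # "nop" and untaken branches fall through
    raise RuntimeError("step limit reached; the program may not terminate")
```

Evaluating `[run(PROGRAM, {"X": x}) for x in range(4)]` gives `[1, 1, 2, 3]`, which agrees with f(0) = 1 and f(x) = x for x > 0.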
- 23. Compiling L-Programs
- 24. Three Stages of Compilation: (1) Syntactic Analysis: the source program is processed to determine its structure and its conformity to the language grammar. (2) Contextual Analysis: the output of the syntactic analysis (a parse tree) is checked for its conformity to the language's contextual constraints. (3) Code Generation: the checked parse tree is used to generate the target code, e.g., Java bytecode, assembly, or some other target language.
- 25. Components of Syntactic Analysis Syntactic Analysis consists of Tokenization and Parsi