jason-yeo-jie-shun
SLAYING THE DRAGON
A GUIDE TO IMPLEMENTING YOUR OWN PROGRAMMING LANGUAGE
HELLO!
¯\_(ツ)_/¯
FUN!
UNDERSTAND HOW PROGRAMMING LANGUAGES WORK
SystemStackError
def case_when n
  case n
  when :foo
    42
  when :bar
    88
  when :foobar
    4288
  end
end

def hash_lookup n
  {
    foo: 42,
    bar: 88,
    foobar: 4288
  }[n]
end
UNDERSTAND HOW COMPUTERS WORK
If you don't know how compilers work, then you don't know how computers work. - Steve Yegge (http://steve-yegge.blogspot.sg/2007/06/rich-programmer-food.html)
LET'S MAKE A LISP
SUPER EASY TO IMPLEMENT
The "most powerful language in the world" that can be defined in "a page of code." - Alan Kay
HOMOICONIC
TONS OF GUIDES ONLINE
norvig.com/lispy.html
buildyourownlisp.com
github.com/kanaka/mal
MAL
mal> (+ 2 6)
=> 8
LISP FORM: (+ 2 6)
mal> (- 4 8)
=> -4
mal> (+ (- 4 8) 10)
=> 6
mal> (< 4 10)
=> true
mal> (if (< 4 10) 42 88)
=> 42
mal> (fn* [x] (* x x))
=> #<function>
PARAMS: [x]
FUNCTION BODY: (* x x)
irb> lambda { |x| x * x }

mal> ((fn* [x] (* x x)) 8)
=> 64
FUNCTION THAT YOU WANT TO CALL: (fn* [x] (* x x))
ARGUMENTS: 8
irb> lambda { |x| x * x }.call 8

mal> (def sq (fn* [x] (* x x)))
=> #<function>
mal> (sq 8)
=> 64
NOW YOU KNOW (BASIC) LISP!
BEFORE WE DIVE IN
RUBINIUS
LLVM
A LANGUAGE PLATFORM
WHAT THE HECK IS A VIRTUAL MACHINE?
AN ABSTRACT MACHINE
LET’S DIVE IN!
CUSTOMIZING RUBINIUS’ COMPILATION PIPELINE
FILE | STRING
COMPILED METHOD
CUSTOMIZE TWO THINGS IN THIS PIPELINE
PARSER
ABSTRACT SYNTAX TREE
MAL PARSER
TOKENIZER -> READER -> PARSING LOGIC
TOKENIZER
(+ my_var 42)
['(', '+', 'my_var', '42', ')']
/[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)/
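The regex above can be applied with String#scan to get the token list shown on the slide. This is a minimal sketch, not necessarily how malady wires it up: the names TOKEN_RE and tokenize are assumptions made for illustration, and dropping empty matches and comment tokens is a choice made here.

```ruby
# Token pattern copied from the slide. It skips leading whitespace and commas,
# then captures one token in the single capture group.
TOKEN_RE = /[\s,]*(~@|[\[\]{}()'`~^@]|"(?:\\.|[^\\"])*"|;.*|[^\s\[\]{}('"`,;)]*)/

# Hypothetical helper: split Lisp source text into an array of token strings.
def tokenize(source)
  # scan returns one single-element array per match because of the capture group.
  tokens = source.scan(TOKEN_RE).flatten
  # The pattern can match the empty string at the end of input, and ";..." is a comment.
  tokens.reject { |token| token.empty? || token.start_with?(';') }
end

p tokenize('(+ my_var 42)') # => ["(", "+", "my_var", "42", ")"]
```

Keeping the tokenizer as one regex keeps it short, at the cost of giving no position information for error messages.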
LEXING IN RUBY
[1] pry(main)> require 'ripper'
=> true
[2] pry(main)> Ripper.lex 'foo = 123'
=> [[[1, 0], :on_ident, "foo"],
 [[1, 3], :on_sp, " "],
 [[1, 4], :on_op, "="],
 [[1, 5], :on_sp, " "],
 [[1, 6], :on_int, "123"]]
READER
['(', '+', 'my_var', '42', ')']
[:list, [:symbol, '+'], [:symbol, 'my_var'], [:integer, 42]]
MAL GRAMMAR
<form> ::= <list> | <atom>
<list> ::= '(' <form>* ')' | '[' <form>* ']'
<atom> ::= a-z+ | 0-9+ | true | false

<form> ::= <list> | <atom>

def read_form(tokens)
  if tokens.first =~ /(\(|\[)/
    read_list(tokens)
  else
    read_atom(tokens.shift)
  end
end

<list> ::= '(' <form>* ')' | '[' <form>* ']'

def read_list(tokens)
  list = [:list]
  tokens.shift # pop our opening paren
  while tokens.first !~ /(\)|\])/
    list << read_form(tokens)
  end
  tokens.shift # pop our closing paren
  list
end

<atom> ::= a-z+ | 0-9+ | true | false

def read_atom(token)
  case token
  when /^-?\d+$/
    [:integer, token.to_i]
  when 'true'
    [:boolean, :true]
  when 'false'
    [:boolean, :false]
  when /^\D+$/
    [:symbol, token]
  else
    raise 'Reader error: Unknown token'
  end
end
['(', '+', 'my_var', '42', ')']
(+ my_var 42)
[:list, [:symbol, '+'], [:symbol, 'my_var'], [:integer, 42]]
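The slides only run the reader on a flat list. As a sketch of how the same three functions handle nesting, the block below repeats them (with read_list written in a slightly more compact form) and feeds them the tokens for (if (< n 2) 42 88). The token array is written by hand here, which is an assumption made to keep the example self-contained.

```ruby
# <form> ::= <list> | <atom>
def read_form(tokens)
  if tokens.first =~ /(\(|\[)/
    read_list(tokens)
  else
    read_atom(tokens.shift)
  end
end

# <list> ::= '(' <form>* ')' | '[' <form>* ']'
def read_list(tokens)
  list = [:list]
  tokens.shift # drop the opening paren or bracket
  list << read_form(tokens) while tokens.first !~ /(\)|\])/
  tokens.shift # drop the closing paren or bracket
  list
end

# <atom> ::= a-z+ | 0-9+ | true | false
def read_atom(token)
  case token
  when /^-?\d+$/ then [:integer, token.to_i]
  when 'true'    then [:boolean, :true]
  when 'false'   then [:boolean, :false]
  when /^\D+$/   then [:symbol, token]
  else raise 'Reader error: Unknown token'
  end
end

# Hand-written tokens for: (if (< n 2) 42 88)
tokens = ['(', 'if', '(', '<', 'n', '2', ')', '42', '88', ')']
sexp = read_form(tokens)
p sexp
# => [:list, [:symbol, "if"], [:list, [:symbol, "<"], [:symbol, "n"], [:integer, 2]], [:integer, 42], [:integer, 88]]
```

Each nested pair of parens becomes a nested [:list, ...] array, because read_list calls back into read_form for every element. Note that the reader consumes the token array as it goes, so tokens is empty afterwards.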
PARSING LOGIC
ABSTRACT SYNTAX TREES
(+ my_var 42)
LEFT HAND SIDE: my_var
RIGHT HAND SIDE: 42
AddNode.new(SymbolNode.new('my_var'), IntegerNode.new(42))
(if (< n 2) 42 88)
CONDITIONAL: (< n 2)
THEN BRANCH: 42
ELSE BRANCH: 88

if_node = IfNode(
  LessThanNode(
    SymbolNode('n'),
    IntegerNode(2)
  ),
  IntegerNode(42),
  IntegerNode(88)
)

if_node.condition   => LessThanNode(SymbolNode('n'), IntegerNode(2))
if_node.then_branch => IntegerNode(42)
if_node.else_branch => IntegerNode(88)
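The node notation above is shorthand. A minimal runnable version, assuming plain Struct classes stand in for the real node classes (malady's own nodes differ, for example they also emit bytecode), might look like this. The to_lisp helper is hypothetical and is only there to show that the tree mirrors the source form.

```ruby
# Hypothetical stand-ins for the AST node classes; Struct supplies the
# constructor and the reader methods (condition, then_branch, ...).
SymbolNode   = Struct.new(:name)
IntegerNode  = Struct.new(:value)
LessThanNode = Struct.new(:lhs, :rhs)
IfNode       = Struct.new(:condition, :then_branch, :else_branch)

# AST for: (if (< n 2) 42 88)
if_node = IfNode.new(
  LessThanNode.new(SymbolNode.new('n'), IntegerNode.new(2)),
  IntegerNode.new(42),
  IntegerNode.new(88)
)

# Hypothetical helper: walk the tree and print it back as a Lisp form.
def to_lisp(node)
  case node
  when IntegerNode  then node.value.to_s
  when SymbolNode   then node.name
  when LessThanNode then "(< #{to_lisp(node.lhs)} #{to_lisp(node.rhs)})"
  when IfNode
    "(if #{to_lisp(node.condition)} #{to_lisp(node.then_branch)} #{to_lisp(node.else_branch)})"
  end
end

puts to_lisp(if_node)              # prints (if (< n 2) 42 88)
puts if_node.then_branch.value     # prints 42
```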
def parse_sexp(sexp)
  type, *rest = sexp
  case type
  when :boolean
    boolean = sexp.last
    if boolean == :true
      Malady::AST::TrueBooleanNode.new
    else
      Malady::AST::FalseBooleanNode.new
    end
  when :symbol
    name = sexp.last
    # Fall back to a plain symbol node when the name is not a builtin.
    builtins.fetch(name, Malady::AST::SymbolNode.new(name))
  when :integer
    Malady::AST::IntegerNode.new sexp[1]
  when :list
    rest.map { |child| parse(child) }
  end
end
RUBINIUS BYTECODE
STACK BASED VM
1 + 2

INSTRUCTIONS    STACK
push 1          1
push 2          1, 2
add             3
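The walkthrough above can be sketched as a tiny interpreter. This is a toy model of a stack machine, not Rubinius itself: the run function and the [opcode, operand] instruction format are assumptions made for illustration.

```ruby
# Toy stack machine: executes [opcode, operand] pairs against an operand stack
# and returns the stack once all instructions have run.
def run(instructions)
  stack = []
  instructions.each do |op, arg|
    case op
    when :push
      stack.push(arg)
    when :add
      # Pop the two operands and push their sum back.
      rhs = stack.pop
      lhs = stack.pop
      stack.push(lhs + rhs)
    else
      raise ArgumentError, "unknown instruction: #{op}"
    end
  end
  stack
end

# 1 + 2
program = [[:push, 1], [:push, 2], [:add]]
p run(program) # => [3]
```

Instructions never name their operands. They take whatever is on top of the stack, which is why the two pushes must come before the add.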
RUBINIUS BYTECODE
-> % rbx compile -B -e '123+42'

============= :__script__ ==============
Arguments:   0 required, 0 post, 0 total
Arity:       0
Locals:      0
Stack size:  2
Literals:    1: :+
Lines to IP: 1: 0..9

0000:  push_int    123
0002:  push_int    42
0004:  send_stack  :+, 1
0007:  pop
0008:  push_true
0009:  ret
----------------------------------------
GENERATING BYTECODE
class IntegerNode < Node
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def bytecode(g)
    pos(g)
    g.push_int value
  end
end

class AddNode < Node
  attr_reader :lhs, :rhs # needed because bytecode calls lhs and rhs

  def initialize(lhs, rhs)
    @lhs, @rhs = lhs, rhs
  end

  def bytecode(g)
    pos(g)
    lhs.bytecode(g)
    rhs.bytecode(g)
    g.send(:+, 1)
  end
end

class IfNode < Node
  attr_reader :condition, :then_branch, :else_branch

  def initialize(filename, line, condition, then_branch, else_branch)
    super
    @condition = condition
    @then_branch = then_branch
    @else_branch = else_branch
  end

  def bytecode(g)
    pos(g)
    end_label = g.new_label
    else_label = g.new_label
    condition.bytecode(g)
    g.goto_if_false else_label
    then_branch.bytecode(g)
    g.goto end_label
    else_label.set!
    else_branch.bytecode(g)
    end_label.set!
  end
end
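To see the instruction layout that IfNode#bytecode produces, the sketch below swaps the real generator for a hypothetical RecordingGenerator that only records what it is asked to emit. It is not the Rubinius generator API: it imitates just the calls used on the slides (new_label, push_int, push_true, goto_if_false, goto, set!), the label names are made up, and the pos(g) call is left out.

```ruby
# Hypothetical stand-in for the generator `g`: records instructions as strings.
class RecordingGenerator
  # A label knows its name and the generator whose stream it marks.
  Label = Struct.new(:name, :generator) do
    # Mark the current position in the instruction stream with this label.
    def set!
      generator.stream << "#{name}:"
    end
  end

  attr_reader :stream

  def initialize
    @stream = []
    @label_count = 0
  end

  def new_label
    @label_count += 1
    Label.new("label_#{@label_count}", self)
  end

  def push_int(value)
    @stream << "push_int #{value}"
  end

  def push_true
    @stream << 'push_true'
  end

  def goto_if_false(label)
    @stream << "goto_if_false #{label.name}"
  end

  def goto(label)
    @stream << "goto #{label.name}"
  end
end

# Simplified nodes (assumptions): no Node superclass, no filename or line.
class TrueBooleanNode
  def bytecode(g)
    g.push_true
  end
end

IntegerNode = Struct.new(:value) do
  def bytecode(g)
    g.push_int value
  end
end

IfNode = Struct.new(:condition, :then_branch, :else_branch) do
  # Same sequence of generator calls as on the slide.
  def bytecode(g)
    end_label = g.new_label
    else_label = g.new_label
    condition.bytecode(g)
    g.goto_if_false else_label
    then_branch.bytecode(g)
    g.goto end_label
    else_label.set!
    else_branch.bytecode(g)
    end_label.set!
  end
end

g = RecordingGenerator.new
IfNode.new(TrueBooleanNode.new, IntegerNode.new(42), IntegerNode.new(88)).bytecode(g)
puts g.stream
```

The recorded stream is push_true, goto_if_false label_2, push_int 42, goto label_1, label_2:, push_int 88, label_1:. The then branch falls through from the test and jumps over the else branch, while a false condition jumps straight to the else label.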
NOW YOU HAVE A WORKING PROGRAMMING LANGUAGE!
WHERE TO GET HELP?
github.com/kanaka/mal
github.com/queenfrankie/lani
rubinius.com/doc/en/virtual-machine/instructions/
github.com/jsyeo/malady
QUESTIONS?
POSTSCRIPT
http://philosecurity.org/2009/01/12/interview-with-an-adware-author
YO DAWG