17
A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁宁宁宁宁宁宁 mail: [email protected]

A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Embed Size (px)

Citation preview

Page 1: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

A Syntactic Justification ofOccam’s Razor

1

John Woodward, Jerry SwanFoundations of Reasoning Group

University of Nottingham Ningbo, China宁波诺丁汉大学

Email: [email protected]

Page 2: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Overview

Occam’s Razor Definitions Sampling of Program Spaces (Langdon) Two Assumptions Proof of Occam’s Razor Further Work Context

2

Page 3: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Occam’s Razor

Occam’s Razor says has been adopted by the machine learning community to mean;

“Given two hypotheses which agree with the observed data, pick the simplest, as this is more likely to make the correct predictions”

3

Page 4: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Definitions

ProgramHypothesis: a

program (effective procedure which allows us to make predictions)

Size: the number of bits needed to represent a hypothesis.

FunctionConcept: A set of

predictions (input-output pair)

Complexity: the size of the smallest program which computes a given function.

4

Page 5: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

From Hypotheses to Concepts.

5

Page 6: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Langdon 1 (Foundation of Genetic Programming)

1. The limiting distribution of functions is independent of program size!There is a correlation between the frequency in the limiting distributionand the complexity of a function.

6

Page 7: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Langdon 2 (Foundation of Genetic Programming)

7

Page 8: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Hypothesis-Concept Spaces8

Page 9: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Notation

P is the hypothesis space (i.e. a set of programs). |P| is the size of the space (i.e. the cardinality of

the set of programs). F is the concept space (i.e. a set of functions

represented by the programs in P). |F| is the size of the space (i.e. the cardinality of

the set of functions). If two programs pi and pj map to the same function

(i.e. they are interpreted as the same function, I(pi)=f=I(pj)), they belong to the same equivalence class (i.e. pi is in [pj] ↔ I(pi)=I(pj)). The notation [p] denotes the equivalence class which contains the program p (i.e. given I(pi)=I(pj), [pi]=[pj]). The size of an equivalence class [p] is denoted by |[p]|.

9

Page 10: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Two assumptions

1. Uniformly sample the hypothesis space, probability of sampling a given program is 1/|P|.

2. There are fewer hypotheses that represent complex functions

|[p1]|>|[p2]|↔c(f1)<c(f2), where I(p1)=f1 and I(p2)=f2.

Note that |[p1]|/|P| = p(I(p1))=p(f1), that is |[p1]|/|P|=p(f1)

the probability of sampling a function is given by the ratio of the size of the equivalence class containing all the programs which are interpreted that function, divided by the size of the hypothesis space.

10

Page 11: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Proof

Starting from a statement of the assumption

|[p1]|>|[p2]| ↔ c(f1) < c(f2) Dividing the left hand side by |P|, |[p1]|/|P|>|[p2]|/|P| ↔c(f1)< c(f2) As |[p1]|/|P| = p(I(p1)) = p(f1), we can

rewrite as p(f1) > p(f2) ↔ c(f1) < c(f2) Which is a statement of Occam’s razor.

11

Page 12: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Restatement of Occam’s Razor Occam’s Razor is often stated as “prefer

the shortest consistent hypothesis” Our restatement of Occam’s Razor: “The

preferred function is the one that is represented most frequently. The equivalence class which contains the shortest program is represented most frequently.”

12

Page 13: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Summary

Occam’s razor states “pick the simplest hypothesis consistent with data”

We agree, but for a different reason. Restatement. Pick the function that is

represented most frequently (i.e. belongs to the largest equivalence class). Occam’s razor is concerned with probability, and we presented a simple counting argument.

Unlike some interpretations of Occam’s razor we do not discard more complex hypotheses we count them in [p].

We offer no reason to believe the world is simple, our razor only gives a reason to predict using the simplest hypothesis.

13

Page 14: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Further work

There are fewer hypotheses that represent complex functions: |[p1]|>|[p2]|↔ c(f1) < c(f2),

Why are some functions represented more frequently that other functions.

The base functions may contain functions which are: Symmetrical i.e. f(x, y) = f(y, x), e.g. nand. Complementary, i.e. f(g(x))= g(f(x)) e.g. inc

and dec.

14

Page 15: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Further work

Further work - to prove our assumptions. Does the result depend on the primitive

set? How are the primitive functions linked

together (e.g. tree, lists, directed acyclic graphs…). Does this make a difference to the result.

15

Page 16: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

How does nature compute?

Heuristics such as Occam’s razor need not be explicitly present as rules.

Random searches of an agents generating capacity may implicitly carry heuristics.

Axiomatic reasoning probably comes late.

16

Page 17: A Syntactic Justification of Occam’s Razor 1 John Woodward, Jerry Swan Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email:

Thanks & Questions?

1) Thomas M. Cover and Joy A. Thomas. Elements of information theory. John Wiley and Sons 1991.

2) Michael J. Kearns and Umesh V. Vazirani. An introduction to computational learning theory. MIT Press, 1994.

3) William B. Langdon. Scaling of program fitness spaces. Evolutionary Computation, 7(4):399-428,1999.

4) Tom M. Mitchell. Machine Learning. McGraw-Hill 1997.5) S. Russell and P. Norvig. Artificial Intelligence: A

modern approach. Prentice Hall, 1995.6) G. I. Webb. Generality is more significant than

complexity: Toward an alternative to occam’s razor. In 7th Australian Joint Conference on Artificial Intelligence – Artificial Intelligence: Sowing the Seeds for the Future, 60-67, Singapore, 1994, World Scientific.

7) Ming Li and Paul Vitanyi . An Introduction to Kolmogorov Complexity and Its Applications (2nd Ed.). Springer Verlag.

17