
Artificial Intelligence 2 (Künstliche Intelligenz 2)
Part V: Probabilistic Reasoning

Michael Kohlhase

Professur für Wissensrepräsentation und -verarbeitung, Informatik, FAU Erlangen-Nürnberg

http://kwarc.info

July 12, 2018


Chapter 7 Quantifying Uncertainty


7.1 Dealing with Uncertainty: Probabilities


7.1.1 Sources of Uncertainty


Sources of Uncertainty in Decision-Making

Where's that d... Wumpus? And where am I, anyway?

- Non-deterministic actions:
  - "When I try to go forward in this dark cave, I might actually go forward-left or forward-right."
- Partial observability with unreliable sensors:
  - "Did I feel a breeze right now?"
  - "I think I might smell a Wumpus here, but I got a cold and my nose is blocked."
  - "According to the heat scanner, the Wumpus is probably in cell [2,3]."
- Uncertainty about the domain behavior:
  - "Are you sure the Wumpus never moves?"


Unreliable Sensors

- Robot Localization: Suppose we want to support localization using landmarks to narrow down the area.
  - "If you see the Eiffel tower, then you're in Paris."
- Difficulty: Sensors can be imprecise.
  - Even if a landmark is perceived, we cannot conclude with certainty that the robot is at that location. ("This is the half-scale Las Vegas copy, you dummy.")
  - Even if a landmark is not perceived, we cannot conclude with certainty that the robot is not at that location. ("Top of Eiffel tower hidden in the clouds.")
- Only the probability of being at a location increases or decreases.


7.1.2 Recap: Rational Agents as a Conceptual Framework


Agents and Environments

- Definition 1.1. An agent is anything that
  - perceives its environment via sensors (means of sensing the environment) and
  - acts on it with actuators (means of changing the environment).
- Example 1.2. Agents include humans, robots, softbots, thermostats, etc.


Agent Schema: Visualizing the Internal Agent Structure

- Agent Schema: We will use the following kind of schema to visualize the internal structure of an agent.
- Different agents differ in the contents of the white box in the center.


Rationality

- Idea: Try to design agents that are successful. (do "the right thing")
- Definition 1.3. A performance measure is a function that evaluates a sequence of environments.
- Example 1.4. A performance measure for the vacuum cleaner world could (a code sketch follows after this list)
  - award one point per square cleaned up in time T?
  - award one point per clean square per time step, minus one per move?
  - penalize for > k dirty squares?
- Definition 1.5. An agent is called rational, if it chooses whichever action maximizes the expected value of the performance measure given the percept sequence to date.
- Question: Why is rationality a good quality to aim for?
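To make Example 1.4 concrete, here is a minimal Python sketch of the second candidate measure; the encoding of the environment history as (clean squares, moved) pairs is an assumption for illustration, not part of the lecture.

  # Sketch of a performance measure for the vacuum-cleaner world (Example 1.4):
  # one point per clean square per time step, minus one point per move.
  # The history encoding below is an assumed illustration.
  def performance(history):
      """history: list of (num_clean_squares, moved) pairs, one per time step."""
      return sum(clean - (1 if moved else 0) for clean, moved in history)

  print(performance([(1, True), (2, True), (2, False)]))  # 0 + 1 + 2 = 3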


Consequences of Rationality: Exploration, Learning, Autonomy

- Note: a rational agent need not be perfect
  - it only needs to maximize expected value (rational ≠ omniscient)
  - it need not predict e.g. very unlikely but catastrophic events in the future
- percepts may not supply all relevant information (rational ≠ clairvoyant)
  - if we cannot perceive things, we do not need to react to them.
  - but we may need to try to find out about hidden dangers (exploration)
- action outcomes may not be as expected (rational ≠ successful)
  - but we may need to take action to ensure that they are, more often (learning)
- Rational ⇒ exploration, learning, autonomy
- Definition 1.6. An agent is called autonomous, if it does not rely on the prior knowledge of the designer.
- Autonomy avoids fixed behaviors that can become unsuccessful in a changing environment. (anything else would be irrational)
- The agent has to learn all relevant traits, invariants, and properties of the environment and actions.


PEAS: Describing the Task Environment

- Observation: To design a rational agent, we must specify the task environment in terms of performance measure, environment, actuators, and sensors, together called the PEAS components.
- Example 1.7. Designing an automated taxi:
  - Performance measure: safety, destination, profits, legality, comfort, ...
  - Environment: US streets/freeways, traffic, pedestrians, weather, ...
  - Actuators: steering, accelerator, brake, horn, speaker/display, ...
  - Sensors: video, accelerometers, gauges, engine sensors, keyboard, GPS, ...
- Example 1.8 (Internet Shopping Agent). The task environment:
  - Performance measure: price, quality, appropriateness, efficiency
  - Environment: current and future WWW sites, vendors, shippers
  - Actuators: display to user, follow URL, fill in form
  - Sensors: HTML pages (text, graphics, scripts)


Environment types

- Observation 1.9. The environment type largely determines the agent design.
- Problem: There is a vast number of possible environments in AI.
- Solution: Classify along a handful of "dimensions". (independent characteristics)
- Definition 1.10. For an agent a we call an environment e
  - fully observable, iff a's sensors give it access to the complete state of the environment at any point in time, else partially observable.
  - deterministic, iff the next state of the environment is completely determined by the current state and a's action, else stochastic.
  - episodic, iff a's experience is divided into atomic episodes, where it perceives and then performs a single action. Crucially, the next episode does not depend on previous ones. Non-episodic environments are called sequential.
  - dynamic, iff the environment can change without an action performed by a, else static. If the environment does not change but a's performance measure does, we call e semidynamic.
  - discrete, iff the sets of e's states and a's actions are countable, else continuous.
  - single-agent, iff only a acts on e. (when must we count parts of e as agents?)


Environment types

- Example 1.11. Some environments classified:

                  Solitaire   Backgammon   Internet shopping       Taxi
  observable      Yes         Yes          No                      No
  deterministic   Yes         No           Partly                  No
  episodic        No          No           No                      No
  static          Yes         Semi         Semi                    No
  discrete        Yes         Yes          Yes                     No
  single-agent    Yes         No           Yes (except auctions)   No

- Observation 1.12. The real world is (of course) partially observable, stochastic, sequential, dynamic, continuous, and multi-agent. (the worst case for AI)


Simple reflex agents

- Definition 1.13. A simple reflex agent is an agent a that only bases its actions on the last percept: f_a : P → A.
- Agent Schema:
- Example 1.14.

  procedure Reflex-Vacuum-Agent [location, status] returns an action
    if status = Dirty then ...
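A minimal Python sketch of such a reflex agent, completing the elided rule in the spirit of the two-square vacuum world; the square names "A"/"B" and the action names are assumptions for illustration.

  # Sketch of a simple reflex agent for the two-square vacuum world (Example 1.14).
  # It maps the last percept (location, status) directly to an action and keeps no state.
  def reflex_vacuum_agent(percept):
      location, status = percept
      if status == "Dirty":
          return "Suck"
      elif location == "A":
          return "Right"
      else:
          return "Left"

  print(reflex_vacuum_agent(("A", "Dirty")))  # Suck
  print(reflex_vacuum_agent(("A", "Clean")))  # Right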


Reflex agents with state

- Idea: Keep track of the state of the world we cannot see now in an internal model.
- Definition 1.15. A stateful reflex agent (also called reflex agent with state or model-based agent) is an agent whose agent function depends on a model of the world (called the world model).


7.1.3 Agent Architectures based on Belief States


World Models for Uncertainty

- Problem: We do not know with certainty what state the world is in!
- Idea: Just keep track of all the possible states it could be in.
- Definition 1.16. A stateful reflex agent has a world model consisting of (see the sketch after this list)
  - a belief state that has information about the possible states the world may be in, and
  - a transition model that updates the belief state based on sensor information and actions.
- Idea: The agent environment determines what the world model can be.
- In a fully observable, deterministic environment,
  - we can observe the initial state, and subsequent states are given by the actions alone;
  - thus the belief state is a singleton set (we call its member the world state) and the transition model is a function from states and actions to states: a transition function.
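A toy Python sketch of Definition 1.16 for the non-singleton case: the belief state is a set of possible world states, and a (here deterministic) transition model maps every member under an action. The cell states and the transition function are invented for illustration.

  # Sketch: belief state = set of possible world states, updated by a transition model.
  # The toy world (cells 0..3 on a line) and its transition function are assumptions.
  def transition(state, action):
      if action == "right":
          return min(state + 1, 3)
      if action == "left":
          return max(state - 1, 0)
      return state

  def update_belief(belief_state, action):
      # Apply the transition model to every state the world might be in.
      return {transition(s, action) for s in belief_state}

  belief = {0, 1, 2}                      # we do not know which cell we are in
  print(update_belief(belief, "right"))   # {1, 2, 3}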


World Models by Agent Type

- Note: All of these considerations only give requirements for the world model. What we can do with it depends on representation and inference.
- Search-based Agents: In a fully observable, deterministic environment: world state = "current state", no inference.
- CSP-based Agents: In a fully observable, deterministic environment: world state = constraint network, inference = constraint propagation.
- Logic-based Agents: In a fully observable, deterministic environment: world state = logical formula, inference = e.g. DPLL or resolution.
- Planning Agents: In a fully observable, deterministic environment: world state = PL0, transition model = STRIPS, inference = state/plan space search.


World Models for Complex Environments

- In a fully observable, but stochastic environment,
  - the belief state must deal with a set of possible states;
  - we generalize the transition function to a transition relation.
- Note: this applies even to online problem solving, where we can just perceive the state. (e.g. when we want to optimize utility)
- In a deterministic, but partially observable environment,
  - the belief state must deal with a set of possible states;
  - we can use transition functions;
  - we need a sensor model, which predicts the influence of percepts on the belief state during update (see the sketch after this list).
- In a stochastic, partially observable environment,
  - mix the ideas from the last two. (sensor model + transition relation)
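Continuing the toy example above, a minimal sketch of how a sensor model filters the belief state during update in a deterministic, partially observable setting; the sensor model itself is an assumption for illustration.

  # Sketch: a sensor model restricts the belief state to the states that are
  # consistent with the current percept. The toy sensor below is assumed:
  # the percept "wall_right" is observed exactly in the rightmost cell (3).
  def sensor_model(state, percept):
      return (state == 3) == (percept == "wall_right")

  def update_with_percept(belief_state, percept):
      return {s for s in belief_state if sensor_model(s, percept)}

  print(update_with_percept({1, 2, 3}, "wall_right"))  # {3}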


Preview: New World Models (Belief) ⇝ New Agent Types

- Probabilistic Agents: In a partially observable environment: belief model = Bayesian networks, inference = probabilistic inference.
- Decision-Theoretic Agents: In a partially observable, stochastic environment: belief model + transition model = decision networks, inference = MEU (maximum expected utility).


7.1.4 Modeling Uncertainty


Wumpus World Revisited

- Recall: We have updated agents with world/transition models based on possible worlds.
- Problem: But pure sets of possible worlds are not enough.
- Example 1.17 (The Wumpus is Back).
  - We have a maze with pits that are detected in neighboring squares via breeze (Wumpus and gold will not be assumed now).
  - Where should the agent go, if there is breeze at (1,2) and (2,1)?
  - Problem: (1,3), (2,2), and (3,1) are all unsafe! (there are possible worlds with a pit in any of them)
- Idea: We need world models that estimate the likelihood of a pit in each cell!


Uncertainty and Logic

- Diagnosis: We want to build an expert dental diagnosis system that deduces the cause (the disease) from the symptoms.
- Can we base this on logic?
- Attempt 1: Say we have a toothache. How about:

  ∀p Symptom(p, toothache) ⇒ Disease(p, cavity)

- Is this rule correct?
  - No, toothaches may have different causes. ("cavity" = "Loch im Zahn", i.e. a hole in a tooth)
- Attempt 2: So what about this:

  ∀p Symptom(p, toothache) ⇒ Disease(p, cavity) ∨ Disease(p, gingivitis) ∨ ...

- We don't know all possible causes.
- And we'd like to be able to deduce which causes are more plausible!


Uncertainty and Logic, ctd.

- Attempt 3: Perhaps a causal rule is better?

  ∀p Disease(p, cavity) ⇒ Symptom(p, toothache)

- Is this rule correct?
  - No, not all cavities cause toothaches.
- Does this rule allow us to deduce a cause from a symptom?
  - No, setting Symptom(p, toothache) to true here has no consequence for the truth of Disease(p, cavity).
  - Note: If Symptom(p, toothache) is false, we would conclude ¬Disease(p, cavity) ... which would be incorrect, cf. the previous question.
- Anyway, this still does not allow us to compare the plausibility of different causes.
- Logic does not allow us to weigh different alternatives, and it does not allow us to express incomplete knowledge. ("a cavity does not always come with a toothache, nor vice versa")


Beliefs and Probabilities

- What do we model with probabilities?
- Incomplete knowledge! We are not 100% sure, but we believe to a certain degree that something is true.
- Probability ≈ our degree of belief, given our current knowledge.
- Example 1.18 (Diagnosis).
  - Symptom(p, toothache) ⇒ Disease(p, cavity) with 80% probability.
  - But, for any given p, in reality we do, or do not, have a cavity: 1 or 0!
  - The "probability" depends on our knowledge! The "80%" refers to the fraction of cavities within the set of all p′ that are indistinguishable from p based on our knowledge.
  - If we receive new knowledge (e.g., Disease(p, gingivitis)), the probability changes!
- Probabilities represent and measure the uncertainty that stems from lack of knowledge.


How to Obtain Probabilities?

- Assessing probabilities through statistics:
  - The agent is 90% convinced by its sensor information := in 9 out of 10 cases, the information is correct.
  - Disease(p, cavity) ⇒ Symptom(p, toothache) with 80% probability := 8 out of 10 persons with a cavity have a toothache. (see the sketch below)
  - The process of estimating a probability P using statistics is called assessing P.
- Assessing even a single P can require huge effort! (e.g. "the likelihood of making it to the university within 10 minutes")
- What is probabilistic reasoning? Deducing probabilities from knowledge about other probabilities.
- Probabilistic reasoning determines, based on probabilities that are (relatively) easy to assess, probabilities that are difficult to assess.
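As a small illustration of assessing a probability from counts (the 8-out-of-10 figure from the bullet above; the variable names are of course made up):

  # Assessing P from statistics: 8 out of 10 observed persons with a cavity
  # also had a toothache (the counts are the illustrative ones from the slide).
  persons_with_cavity = 10
  of_those_with_toothache = 8
  p_toothache_given_cavity = of_those_with_toothache / persons_with_cavity
  print(p_toothache_given_cavity)  # 0.8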


7.1.5 Acting under Uncertainty


Decision-Making Under Uncertainty

- Example 1.19. Giving a lecture:
  - Goal: Be in HS002 at 10:15 to give a lecture.
  - Possible plans:
    - P1: Get up at 8:00, leave at 8:40, arrive at 9:00.
    - P2: Get up at 9:50, leave at 10:05, arrive at 10:15.
  - Decision: Both plans are correct, but P2 succeeds only with probability 50%, and giving a lecture is important, so P1 is the plan of choice.
- Better Example: Which train to take to Frankfurt airport?


Uncertainty and Rational Decisions

- Here: We are only concerned with deducing the likelihood of facts, not with action choice. In general, selecting actions is of course important.
- Rational Agents:
  - We have a choice of actions (go to FRA early, go to FRA just in time).
  - These can lead to different solutions with different probabilities.
  - The actions have different costs.
  - The results have different utilities (safe timing / dislike airport food).
- A rational agent chooses the action with the maximum expected utility.
- Decision Theory = Utility Theory + Probability Theory.


Utility-based agents

- Definition 1.20. A utility-based agent uses a world model along with a utility function that influences its preferences among the states of that world. It chooses the action that leads to the best expected utility, which is computed by averaging over all possible outcome states, weighted by the probability of the outcome.
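A minimal Python sketch of the expected-utility computation in Definition 1.20; the outcome distributions and utility values below (two hypothetical trains to Frankfurt airport) are invented for illustration.

  # Sketch of expected-utility action selection (Definition 1.20).
  # Outcome probabilities and utilities are assumed numbers, not from the lecture.
  def expected_utility(action, outcome_model, utility):
      # Average the utility over all outcome states, weighted by their probability.
      return sum(p * utility[state] for state, p in outcome_model[action].items())

  def best_action(actions, outcome_model, utility):
      return max(actions, key=lambda a: expected_utility(a, outcome_model, utility))

  outcome_model = {
      "early_train": {"at_gate_on_time": 0.95, "miss_flight": 0.05},
      "late_train":  {"at_gate_on_time": 0.60, "miss_flight": 0.40},
  }
  utility = {"at_gate_on_time": 100, "miss_flight": -1000}

  print(best_action(outcome_model.keys(), outcome_model, utility))  # early_train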


Utility-based agents

- A utility function allows rational decisions where mere goals are inadequate:
  - conflicting goals (utility gives the tradeoff needed to make rational decisions)
  - goals obtainable only by uncertain actions (utility × likelihood helps)


Decision-Theoretic Agent

- A particular kind of utility-based agent:

  function DT-AGENT(percept) returns an action
    persistent: belief_state, probabilistic beliefs about the current state of the world
                action, the agent's action

    update belief_state based on action and percept
    calculate outcome probabilities for actions,
      given action descriptions and current belief_state
    select action with highest expected utility
      given probabilities of outcomes and utility information
    return action

  Figure 13.1: A decision-theoretic agent that selects rational actions.


7.1.6 Agenda for this Chapter: Basics of Probability Theory


Our Agenda for This Topic

- Our treatment of the topic "Probabilistic Reasoning" consists of this chapter and the next.
  - This Chapter: All the basic machinery in use in Bayesian networks.
  - Chapter 8: Bayesian networks: what they are, how to build them, how to use them.
- Bayesian networks are the most widespread and successful practical framework for probabilistic reasoning.


Our Agenda for This Chapter

- Unconditional Probabilities and Conditional Probabilities: Which concepts and properties of probabilities will be used?
  - Mostly a recap of things you're familiar with from school.
- Independence and Basic Probabilistic Reasoning Methods: What simple methods are there to avoid enumeration and to deduce probabilities from other probabilities?
  - A basic tool set we'll need. (Still familiar from school?)
- Bayes' Rule: What's that "Bayes"? How is it used, and why is it important?
  - The basic insight about how to invert the "direction" of conditional probabilities.
- Conditional Independence: How to capture and exploit complex relations between random variables?
  - Explains the difficulties arising when using Bayes' rule on multiple pieces of evidence. Conditional independence is used to ameliorate these difficulties.


7.2 Unconditional Probabilities


Probabilistic Models

- Definition 2.1. A probability theory is an assertion language for talking about possible worlds and an inference method for quantifying the degree of belief in such assertions.
- Remark: Like logic, but for non-binary degrees of belief.
- The possible worlds are mutually exclusive (two possible worlds cannot both be the case) and exhaustive (one possible world must be the case).
- This determines the set of possible worlds.
- Example 2.2. If we roll two (distinguishable) dice with six sides, then we have 36 possible worlds: (1,1), (2,1), ..., (6,6).
- We will restrict ourselves to a discrete, countable sample space. (others are more complicated and less useful in AI)
- Definition 2.3. A probability model ⟨Ω, P⟩ consists of a set Ω of possible worlds called the sample space, and a probability function P : Ω → ℝ such that 0 ≤ P(ω) ≤ 1 for all ω ∈ Ω and ∑_{ω∈Ω} P(ω) = 1.
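A minimal Python sketch of the probability model from Definition 2.3, instantiated with the two-dice sample space of Example 2.2 and a uniform probability function (the uniform choice is an assumption for illustration):

  # Probability model <Omega, P> for two distinguishable six-sided dice (Example 2.2).
  from itertools import product
  from fractions import Fraction

  # Sample space Omega: all 36 ordered pairs of die results.
  omega = list(product(range(1, 7), repeat=2))

  # Probability function P: Omega -> R, here uniform; 0 <= P(w) <= 1 for all w.
  P = {w: Fraction(1, 36) for w in omega}

  assert sum(P.values()) == 1  # the probabilities over all possible worlds sum to 1
  print(len(omega))            # 36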


Unconditional Probabilities, Random Variables, and Events

- Definition 2.4. A random variable (also called random quantity, aleatory variable, or stochastic variable) is a variable quantity whose value depends on possible outcomes of unknown variables and processes we do not understand.
- Definition 2.5. We will refer to the fact X = x as an outcome, and to a set of outcomes as an event.
- The notation uppercase "X" for a variable and lowercase "x" for one of its values will be used frequently. (follows Russell/Norvig)
- Definition 2.6. Given a random variable X, P(X = x) denotes the prior probability, or unconditional probability, that X has value x in the absence of any other information.
- Example 2.7. P(Cavity = T) = 0.2, where Cavity is a random variable whose value is true iff some given person has a cavity.


Types of Random Variables

- Note: In general, random variables can have arbitrary domains. Here, we consider finite-domain random variables only, and Boolean random variables most of the time.
- Example 2.8.

  P(Weather = sunny)  = 0.7
  P(Weather = rain)   = 0.2
  P(Weather = cloudy) = 0.08
  P(Weather = snow)   = 0.02
  P(Headache = T)     = 0.1

- Unlike us, Russell and Norvig live in California ... :-( :-(
- Convenience Notations:
  - By convention, we denote Boolean random variables with A, B, and more general finite-domain random variables with X, Y.
  - For a Boolean variable Name, we write name for Name = T and ¬name for Name = F. (follows Russell/Norvig)


Probability Distributions

- Definition 2.9. The probability distribution for a random variable X, written P(X), is the vector of probabilities for the (ordered) domain of X.
- Example 2.10. The probability distributions for finite-domain and Boolean random variables

  P(Headache) = ⟨0.1, 0.9⟩
  P(Weather)  = ⟨0.7, 0.2, 0.08, 0.02⟩

  define the probability distribution for the random variables Headache and Weather.
- Definition 2.11. Given a subset Z ⊆ {X1, ..., Xn} of random variables, an event is an assignment of values to the variables in Z. The joint probability distribution, written P(Z), lists the probabilities of all events.
- Example 2.12. P(Headache, Weather) is

                      Headache = T                 Headache = F
  Weather = sunny     P(W = sunny ∧ headache)      P(W = sunny ∧ ¬headache)
  Weather = rain      P(W = rain ∧ headache)       P(W = rain ∧ ¬headache)
  Weather = cloudy    P(W = cloudy ∧ headache)     P(W = cloudy ∧ ¬headache)
  Weather = snow      P(W = snow ∧ headache)       P(W = snow ∧ ¬headache)


The Full Joint Probability Distribution

I Definition 2.13. Given random variables X1, . . . ,Xn, an atomic event is an assignment of values to all variables.

I Example 2.14. If A and B are Boolean random variables, then we have 4 atomic events: a ∧ b, a ∧ ¬ b, ¬ a ∧ b, ¬ a ∧ ¬ b.

I Definition 2.15. Given random variables X1, . . . ,Xn, the full joint probability distribution, denoted P(X1, . . . ,Xn), lists the probabilities of all atomic events.

I Example 2.16. P(Cavity ,Toothache)

               toothache   ¬ toothache
  cavity          0.12        0.08
  ¬ cavity        0.08        0.72

I All atomic events are disjoint (their pairwise conjunctions all are ⊥); the sum of all fields is 1 (corresponds to their disjunction >).


Probabilities of Propositional Formulas

I Definition 2.17. Given random variables X1, . . . ,Xn, a propositional formula, short proposition, is a propositional formula over the atoms Xi = xi where xi is a value in the domain of Xi .
A function P that maps propositions into [0, 1] is a probability measure if
(i) P(>) = 1 and
(ii) for all propositions A, P(A) = ∑_{e |= A} P(e), where e is an atomic event.

I Propositions represent sets of atomic events: the interpretations satisfying the formula.

I Example 2.18. P(cavity ∧ toothache) = 0.12 is the probability that some given person has both a cavity and a toothache. (Note the use of cavity for Cavity = T and toothache for Toothache = T.)

I Notes:
I Instead of P(a ∧ b), we often write P(a, b).
I Propositions can be viewed as Boolean random variables; we will denote them with A, B as well.
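As a sketch (assuming we represent atomic events as tuples of truth values), the full joint distribution of Example 2.16 and the probability measure of Definition 2.17 can be coded as follows; the helper name prob_of is a hypothetical choice, not part of the lecture.

# Full joint distribution P(Cavity, Toothache) from Example 2.16,
# keyed by atomic events (assignments to all variables).
joint = {
    (True,  True):  0.12,   # cavity ∧ toothache
    (True,  False): 0.08,   # cavity ∧ ¬ toothache
    (False, True):  0.08,   # ¬ cavity ∧ toothache
    (False, False): 0.72,   # ¬ cavity ∧ ¬ toothache
}

def prob_of(proposition):
    """P(A) = sum of P(e) over all atomic events e satisfying the proposition A."""
    return sum(p for event, p in joint.items() if proposition(*event))

print(prob_of(lambda cavity, toothache: cavity and toothache))   # 0.12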


Questionnaire

I Theorem 2.19 (Kolmogorov). A function P that maps propositions into [0, 1] is a probability measure if and only if
  (i) P(>) = 1 and
  (ii') for all propositions A, B: P(a ∨ b) = P(a) + P(b) − P(a ∧ b).
I We can equivalently replace
  (ii) for all propositions A, P(A) = ∑_{I |= A} P(I) (cf. previous slide)
with Kolmogorov's (ii').

1. Question: Assume we have
  (iii) P(⊥) = 0.
  How to derive from (i), (ii'), and (iii) that, for all propositions A, P(¬ a) = 1 − P(a)?
  1.1 By (i), P(>) = 1; as (a ∨ ¬ a) ⇔ >, we get P(a ∨ ¬ a) = 1.
  1.2 By (iii), P(⊥) = 0; as (a ∧ ¬ a) ⇔ ⊥, we get P(a ∧ ¬ a) = 0.
  1.3 Inserting this into (ii'), we get P(a ∨ ¬ a) = 1 = P(a) + P(¬ a) − 0, hence P(¬ a) = 1 − P(a).


Questionnaire, ctd.

I Reminder 1: (i) P(>) = 1; (ii') P(a ∨ b) = P(a) + P(b) − P(a ∧ b).
I Reminder 2: "Probabilities model our belief."
I If P represents an objectively observable probability, the axioms clearly make sense. But why should an agent respect these axioms when modeling its own subjective belief?

Question: Do you believe in Kolmogorov's axioms?

I You're free to believe whatever you want, but note this [deFinetti:sssdp31]: If an agent has a belief that violates Kolmogorov's axioms, then there exists a combination of "bets" on propositions such that the agent always loses money.
I If your beliefs are contradictory, then you will not be successful in the long run (and not even in the next minute, if your opponent is clever).


7.3 Conditional Probabilities


Conditional Probabilities: Intuition

I Do probabilities change as we gather new knowledge?

I Yes! Probabilities model our belief, thus they depend on our knowledge.
I Example 3.1. Your "probability of missing the connection train" increases when you are informed that your current train has a 30-minute delay.
I Example 3.2. The "probability of cavity" increases when the doctor is informed that the patient has a toothache.
I In the presence of additional information, we can no longer use the unconditional (prior!) probabilities.

I Given propositions A and B, P(a | b) denotes the conditional probability of a (i.e., A = T) given that all we know is b (i.e., B = T).

I Example 3.3. P(cavity) = 0.2 vs. P(cavity | toothache) = 0.6. And P(cavity | toothache ∧ ¬ cavity) = 0.


Conditional Probabilities: Definition

I Definition 3.4. Given propositions A and B where P(b) ≠ 0, the conditional probability, or posterior probability, of a given b, written P(a | b), is defined as:

P(a | b) := P(a ∧ b) / P(b)

I Intuition: The likelihood of having a and b, within the set of outcomes where we have b.

I Example 3.5. P(cavity ∧ toothache) = 0.12 and P(toothache) = 0.2 yield P(cavity | toothache) = 0.12/0.2 = 0.6.


Conditional Probability Distributions

I Definition 3.6. Given random variables X and Y , the conditional probability distribution of X given Y , written P(X | Y ), is the table of all conditional probabilities of values of X given values of Y .
I For sets of variables: P(X1, . . . ,Xn | Y1, . . . ,Ym).
I Example 3.7. P(Weather | Headache) =

                     Headache = T                  Headache = F
Weather = sunny      P(W = sunny | headache)       P(W = sunny | ¬ headache)
Weather = rain
Weather = cloudy
Weather = snow

What is "the probability of sunshine given that I have a headache"?
I If you're susceptible to headaches depending on weather conditions, this makes sense. Otherwise, the two variables are independent (see next section).


7.4 Independence


Working with the Full Joint Probability Distribution

I Example 4.1. Consider the following joint probability distribution:

               toothache   ¬ toothache
  cavity          0.12        0.08
  ¬ cavity        0.08        0.72

I How to compute P(cavity)?
I Sum across the row:
  P(cavity ∧ toothache) + P(cavity ∧ ¬ toothache) = 0.2
I How to compute P(cavity ∨ toothache)?
I Sum across atomic events:
  P(cavity ∧ toothache) + P(¬ cavity ∧ toothache) + P(cavity ∧ ¬ toothache) = 0.28
I How to compute P(cavity | toothache)?
I P(cavity ∧ toothache) / P(toothache)
I All relevant probabilities can be computed using the full joint probability distribution, by expressing propositions as disjunctions of atomic events.
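The three computations above can be sketched in Python over the dictionary layout used earlier (the variable names are illustrative only):

joint = {(True, True): 0.12, (True, False): 0.08,
         (False, True): 0.08, (False, False): 0.72}   # keys: (cavity, toothache)

p_cavity = sum(p for (c, t), p in joint.items() if c)                    # 0.2
p_cavity_or_toothache = sum(p for (c, t), p in joint.items() if c or t)  # 0.28
p_toothache = sum(p for (c, t), p in joint.items() if t)                 # 0.2
p_cavity_given_toothache = joint[(True, True)] / p_toothache             # 0.6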


Working with the Full Joint Probability Distribution??

I Question: Is it a good idea to use the full joint probability distribution?

I Answer: No:
I Given n random variables with k values each, the joint probability distribution contains k^n probabilities.
I Computational cost of dealing with this size.
I Practically impossible to assess all these probabilities.

I Question: So, is there a compact way to represent the full joint probability distribution? Is there an efficient method to work with that representation?

I Answer: Not in general, but it works in many cases. We can work directly with conditional probabilities, and exploit (conditional) independence.

I Bayesian networks. (First, we do the simple case.)


Independence

I Definition 4.2. Events a and b are independent if P(a ∧ b) = P(a) · P(b).
I Proposition 4.3. Given independent events a and b where P(b) ≠ 0, we have P(a | b) = P(a).
I Proof:
  P.1 By definition, P(a | b) = P(a ∧ b) / P(b),
  P.2 which by independence is equal to (P(a) · P(b)) / P(b) = P(a).
I Similarly, if P(a) ≠ 0, we have P(b | a) = P(b).
I Example 4.4.
I P(Dice1 = 6 ∧ Dice2 = 6) = 1/36.
I P(W = sunny | headache) = P(W = sunny) unless you're weather-sensitive (cf. slide 26).
I But toothache and cavity are NOT independent.
I The fraction of "cavity" is higher within "toothache" than within "¬ toothache": P(toothache) = 0.2 and P(cavity) = 0.2, but P(toothache ∧ cavity) = 0.12 > 0.04 = P(toothache) · P(cavity).
I Definition 4.5. Random variables X and Y are independent if P(X ,Y ) = P(X ) · P(Y ). (System of equations!)
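A quick numerical check of Definition 4.2 on the example numbers above (a sketch; the dice pair is assumed uniform):

# Two dice: the events Dice1 = 6 and Dice2 = 6 are independent.
p_d1_six, p_d2_six, p_both_six = 1/6, 1/6, 1/36
assert abs(p_both_six - p_d1_six * p_d2_six) < 1e-12

# Toothache and Cavity are NOT independent:
p_toothache, p_cavity, p_both = 0.2, 0.2, 0.12
print(p_both, "!=", p_toothache * p_cavity)   # 0.12 != 0.04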


Illustration: Exploiting Independence

I Example 4.6. Consider (again) the following joint probability distribution:

               toothache   ¬ toothache
  cavity          0.12        0.08
  ¬ cavity        0.08        0.72

Adding variable Weather with values sunny, rain, cloudy, snow, the full joint probability distribution contains 16 probabilities. But your teeth do not influence the weather, nor vice versa!
I Weather is independent of each of Cavity and Toothache: For all value combinations (c, t) of Cavity and Toothache, and for all values w of Weather, we have P(c ∧ t ∧ w) = P(c ∧ t) · P(w).
I P(Cavity,Toothache,Weather) can be reconstructed from the separate tables P(Cavity,Toothache) and P(Weather). (8 probabilities)
I Independence can be exploited to represent the full joint probability distribution more compactly.
I Sometimes, variables are independent only under particular conditions: conditional independence, see later.
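A sketch of this factored representation in Python: the 16-entry table P(Cavity, Toothache, Weather) is recovered from the two smaller tables (the weather probabilities are the ones from Example 2.8).

p_cav_tooth = {(True, True): 0.12, (True, False): 0.08,
               (False, True): 0.08, (False, False): 0.72}
p_weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}

# By independence, P(c ∧ t ∧ w) = P(c ∧ t) · P(w): 4 + 4 = 8 numbers represent all 16.
full_joint = {(c, t, w): p_ct * p_w
              for (c, t), p_ct in p_cav_tooth.items()
              for w, p_w in p_weather.items()}
assert abs(sum(full_joint.values()) - 1.0) < 1e-9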


7.5 Basic Probabilistic Reasoning Methods


The Product Rule

I Proposition 5.1 (Product Rule). Given propositions A and B, P(a ∧ b) = P(a | b) · P(b).
I Example 5.2. P(cavity ∧ toothache) = P(toothache | cavity) · P(cavity).
I If we know the values of P(a | b) and P(b), then we can compute P(a ∧ b).
I Similarly, P(a ∧ b) = P(b | a) · P(a).
I Definition 5.3. P(X ,Y ) = P(X | Y ) · P(Y ) is a system of equations:

P(W = sunny ∧ headache) = P(W = sunny | headache) · P(headache)
P(W = rain ∧ headache) = P(W = rain | headache) · P(headache)
                    ... = ...
P(W = snow ∧ ¬ headache) = P(W = snow | ¬ headache) · P(¬ headache)

I Similar for unconditional distributions, P(X ,Y ) = P(X ) · P(Y ).


The Chain Rule

I Proposition 5.4 (Chain Rule). Given random variables X1, . . . ,Xn, we have

P(X1, . . . ,Xn) = P(Xn | Xn−1, . . . ,X1) · P(Xn−1 | Xn−2, . . . ,X1) · . . . · P(X2 | X1) · P(X1)

I Example 5.5.

P(¬ brush ∧ cavity ∧ toothache)
= P(toothache | cavity, ¬ brush) · P(cavity, ¬ brush)
= P(toothache | cavity, ¬ brush) · P(cavity | ¬ brush) · P(¬ brush)

I Proof: Iterated application of the Product Rule.
  P.1 P(X1, . . . ,Xn) = P(Xn | Xn−1, . . . ,X1) · P(Xn−1, . . . ,X1) by the Product Rule.
  P.2 In turn, P(Xn−1, . . . ,X1) = P(Xn−1 | Xn−2, . . . ,X1) · P(Xn−2, . . . ,X1), etc.

I Note: This works for any ordering of the variables.
I We can recover the probability of atomic events from sequenced conditional probabilities for any ordering of the variables.
I First of the four basic techniques in Bayesian networks.
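Example 5.5 as a Python sketch; the numeric values of the conditional probabilities are made-up assumptions (the slides do not give them) and serve only to show how the chain-rule factors multiply out.

# Assumed (made-up) inputs, only to illustrate the factorization:
p_toothache_given_cavity_nobrush = 0.8   # P(toothache | cavity, ¬ brush), assumed
p_cavity_given_nobrush = 0.3             # P(cavity | ¬ brush), assumed
p_nobrush = 0.4                          # P(¬ brush), assumed

# Chain rule: P(¬ brush ∧ cavity ∧ toothache)
p_atomic = (p_toothache_given_cavity_nobrush
            * p_cavity_given_nobrush
            * p_nobrush)
print(p_atomic)   # 0.096 under these assumptions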


Marginalization

I Extracting a sub-distribution from a larger joint distribution:
I Proposition 5.6 (Marginalization). Given sets X and Y of random variables, we have:

P(X) = ∑_{y∈Y} P(X, y)

where ∑_{y∈Y} sums over all possible value combinations of Y.
I Example 5.7. (Note: Equation system!)

P(Cavity) = ∑_{y∈Toothache} P(Cavity, y)
P(cavity) = P(cavity, toothache) + P(cavity, ¬ toothache)
P(¬ cavity) = P(¬ cavity, toothache) + P(¬ cavity, ¬ toothache)
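Example 5.7 as a sketch over the joint table used in the earlier snippets:

joint = {(True, True): 0.12, (True, False): 0.08,
         (False, True): 0.08, (False, False): 0.72}   # keys: (cavity, toothache)

# P(Cavity): sum out Toothache, i.e. sum P(Cavity, y) over all values y of Toothache.
p_cavity_dist = {c: sum(p for (cv, t), p in joint.items() if cv == c)
                 for c in (True, False)}
print(p_cavity_dist)   # {True: 0.2, False: 0.8}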


Questionnaire

I Say P(dog) = 0.4, (¬ dog) ⇔ cat, and P(likeslasagna | cat) = 0.5.

I Question: Is P(likeslasagna ∧ cat) A: 0.2, B: 0.5, C: 0.475, or D: 0.3?

I Answer: We have P(cat) = 0.6 and P(likeslasagna | cat) = 0.5, hence (D) by the product rule.

I Question: Can we compute the value of P(likeslasagna), given the above information?

I Answer: No. We don't know the probability that dogs like lasagna, i.e. P(likeslasagna | dog).


Normalization: Idea

I Problem: We know P(cavity ∧ toothache) but don't know P(toothache).

I Step 1: Case distinction over values of Cavity: (P(toothache) as an unknown)

P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache) = 0.12 / P(toothache)
P(¬ cavity | toothache) = P(¬ cavity ∧ toothache) / P(toothache) = 0.08 / P(toothache)

I Step 2: Assuming placeholder α := 1/P(toothache):

P(cavity | toothache) = α P(cavity ∧ toothache) = α 0.12
P(¬ cavity | toothache) = α P(¬ cavity ∧ toothache) = α 0.08

I Step 3: Fixing toothache to be true, view P(cavity ∧ toothache) vs. P(¬ cavity ∧ toothache) as the relative weights of P(cavity) vs. P(¬ cavity) within toothache. Then normalize their summed-up weight to 1:
1 = α (0.12 + 0.08) ; α = 1 / (0.12 + 0.08) = 1/0.2 = 5

I α is a normalization constant scaling the sum of relative weights to 1.


Normalization: Formal

I Definition 5.8. Given a vector 〈w1, . . . ,wk〉 of numbers in [0, 1] where ∑_{i=1}^{k} wi ≤ 1, the normalization constant α is α〈w1, . . . ,wk〉 := 1 / (∑_{i=1}^{k} wi).
I Example 5.9. α〈0.12, 0.08〉 = 5 〈0.12, 0.08〉 = 〈0.6, 0.4〉.
I Proposition 5.10 (Normalization). Given a random variable X and an event e, we have P(X | e) = α P(X , e).
I Proof:
  P.1 For each value x of X , P(X = x | e) = P(X = x ∧ e)/P(e).
  P.2 So all we need to prove is that α = 1/P(e).
  P.3 By definition, α = 1/∑_x P(X = x ∧ e), so we need to prove P(e) = ∑_x P(X = x ∧ e), which holds by marginalization.
I Example 5.11. α 〈P(cavity ∧ toothache),P(¬ cavity ∧ toothache)〉 = α 〈0.12, 0.08〉, so P(cavity | toothache) = 0.6 and P(¬ cavity | toothache) = 0.4.
I Another way of saying this is: "We use α as a placeholder for 1/P(e), which we compute using the sum of relative weights by Marginalization."
I Normalization+Marginalization: Given "query variable" X , "observed event" e, and "hidden variables" set Y: P(X | e) = α · P(X , e) = α · ∑_{y∈Y} P(X , e, y).
I Second of the four basic techniques in Bayesian networks.
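Definition 5.8 / Example 5.11 as a Python sketch (the helper name normalize is an ad-hoc choice):

def normalize(weights):
    """Scale a vector of relative weights so that it sums to 1."""
    alpha = 1.0 / sum(weights)
    return [alpha * w for w in weights]

# P(Cavity | toothache) = α 〈P(cavity ∧ toothache), P(¬ cavity ∧ toothache)〉
print(normalize([0.12, 0.08]))   # [0.6, 0.4]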


7.6 Bayes’ Rule


Bayes’ Rule

I Proposition 6.1 (Bayes' Rule). Given propositions A and B where P(a) ≠ 0 and P(b) ≠ 0, we have:

P(a | b) = (P(b | a) · P(a)) / P(b)

I Proof:
  P.1 By definition, P(a | b) = P(a ∧ b) / P(b),
  P.2 which by the product rule P(a ∧ b) = P(b | a) · P(a) is equal to the claim.

I Notation: note that this is a system of equations!

P(X | Y ) = (P(Y | X ) · P(X )) / P(Y )


Applying Bayes’ Rule

I Example 6.2. Say we know that P(toothache | cavity) = 0.6, P(cavity) = 0.2, and P(toothache) = 0.2.
We can compute P(cavity | toothache): By Bayes' rule,
P(cavity | toothache) = (P(toothache | cavity) · P(cavity)) / P(toothache) = (0.6 · 0.2) / 0.2 = 0.6.

I Ok, but: Why don't we simply assess P(cavity | toothache) directly?
I P(toothache | cavity) is causal, P(cavity | toothache) is diagnostic.
I Causal dependencies are robust over the frequency of the causes.
I Example 6.3. If there is a cavity epidemic, then P(cavity | toothache) increases, but P(toothache | cavity) remains the same. (It only depends on how cavities "work".)
I Also, causal dependencies are often easier to assess.
I Bayes' rule allows us to perform diagnosis (observing a symptom, what is the cause?) based on prior probabilities and causal dependencies.


Extended Example: Bayes’ Rule and Meningitis

I Facts known to doctors:
I The prior probabilities of meningitis (m) and stiff neck (s) are P(m) = 0.00002 and P(s) = 0.01.
I Meningitis causes a stiff neck 70% of the time: P(s | m) = 0.7.

I Doctor d uses Bayes' Rule:
P(m | s) = (P(s | m) · P(m)) / P(s) = (0.7 · 0.00002) / 0.01 = 0.0014 ≈ 1/700.
I Even though a stiff neck is strongly indicated by meningitis (P(s | m) = 0.7),
I the probability of meningitis in the patient remains small:
I the prior probability of stiff necks is much higher than that of meningitis.

I Doctor d′ knows P(m | s) from observation; she does not need Bayes' rule!
I Indeed, but what if a meningitis epidemic erupts?
I Then d knows that P(m | s) grows proportionally with P(m) (d′ is clueless).
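The meningitis computation as a two-line sketch:

p_m, p_s, p_s_given_m = 0.00002, 0.01, 0.7
p_m_given_s = p_s_given_m * p_m / p_s    # Bayes' rule
print(p_m_given_s)                       # 0.0014, roughly 1/700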


Questionnaire

I Say P(dog) = 0.4, P(likeschappi | dog) = 0.8, and P(likeschappi) = 0.5.

I Question: What is P(dog | likeschappi)? A: 0.8, B: 0.64, C: 0.9, D: 0.32?

I Answer: By Bayes' rule, P(dog | likeschappi) = (P(likeschappi | dog) · P(dog)) / P(likeschappi) = (0.8 · 0.4) / 0.5 = 0.64, so (B).

I Question: Is P(dog | likeschappi) causal or diagnostic?

I Answer: Diagnostic; liking Chappi does not cause anybody to be a dog.

I Question: Is P(likeschappi | dog) causal or diagnostic?

I Answer: Causal; liking or not liking dog food may be caused by being or not being a dog.


7.7 Conditional Independence


Bayes’ Rule with Multiple Evidence

I Example 7.1. Say we know from medical studies that P(cavity) = 0.2, P(toothache | cavity) = 0.6, P(toothache | ¬ cavity) = 0.1, P(catch | cavity) = 0.9, and P(catch | ¬ cavity) = 0.2.
Now, in case we did observe the symptoms toothache and catch (the dentist's probe catches in the aching tooth), what would be the likelihood of having a cavity? What is P(cavity | toothache ∧ catch)?
I Trial 1: Bayes' rule

P(cavity | toothache ∧ catch) = (P(toothache ∧ catch | cavity) · P(cavity)) / P(toothache ∧ catch)

I Trial 2: Normalization P(X | e) = α P(X , e), then the Product Rule P(X , e) = P(e | X ) P(X ), with X = Cavity, e = toothache ∧ catch:

P(Cavity | catch ∧ toothache) = α P(toothache ∧ catch | Cavity) P(Cavity)
P(cavity | catch ∧ toothache) = α P(toothache ∧ catch | cavity) P(cavity)
P(¬ cavity | catch ∧ toothache) = α P(toothache ∧ catch | ¬ cavity) P(¬ cavity)


Bayes’ Rule with Multiple Evidence, ctd.

I P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity)

I Question: So, is everything fine?

I Answer: No! We need P(toothache ∧ catch | Cavity), i.e. causal dependencies for all combinations of symptoms! (2^n, in general)

I Question: Are Toothache and Catch independent?

I Answer: No. If a probe catches, we probably have a cavity, which probably causes toothache.

I But: They are independent given the presence or absence of cavity!


Conditional Independence

I Definition 7.2. Given sets of random variables Z1, Z2, and Z, we say that Z1 and Z2 are conditionally independent given Z if:

P(Z1,Z2 | Z) = P(Z1 | Z) · P(Z2 | Z)

We alternatively say that Z1 is conditionally independent of Z2 given Z.
I Example 7.3.

P(Toothache,Catch | cavity) = P(Toothache | cavity) P(Catch | cavity)
P(Toothache,Catch | ¬ cavity) = P(Toothache | ¬ cavity) P(Catch | ¬ cavity)

I For cavity: this may cause both, but they don't influence each other.
I For ¬ cavity: catch and/or toothache would each be caused by something else.
I Note: The definition is symmetric regarding the roles of Z1 and Z2: Toothache is conditionally independent of Catch given Cavity, and vice versa.
I But there may be dependencies within Z1 or Z2, e.g. Z2 = {Toothache, Sleeplessness}.


Conditional Independence, ctd.

I Proposition 7.4. If Z1 and Z2 are conditionally independent given Z, then P(Z1 | Z2,Z) = P(Z1 | Z).
I Proof:
  P.1 By definition, P(Z1 | Z2,Z) = P(Z1,Z2,Z) / P(Z2,Z),
  P.2 which by the product rule is equal to (P(Z1,Z2 | Z) · P(Z)) / P(Z2,Z),
  P.3 which by conditional independence is equal to (P(Z1 | Z) · P(Z2 | Z) · P(Z)) / P(Z2,Z).
  P.4 Since (P(Z2 | Z) · P(Z)) / P(Z2,Z) = 1, this proves the claim.
I Example 7.5. Using Toothache as Z1, Catch as Z2, and Cavity as Z: P(Toothache | Catch,Cavity) = P(Toothache | Cavity).
I In the presence of conditional independence, we can drop variables from the right-hand side of conditional probabilities.
I Third of the four basic techniques in Bayesian networks.
I Last missing technique: "Capture variable dependencies in a graph"; illustration see next slide, details see next Chapter.


Exploiting Conditional Independence: Overview

I 1. Graph captures variable dependencies: (Variables X1, . . . ,Xn)

  (Dependency graph: Cavity → Toothache, Cavity → Catch)

I Given evidence e, want to know P(X | e). Remaining vars: Y.

I 2. Normalization+Marginalization:
P(X | e) = α · P(X , e); if Y ≠ ∅ then P(X | e) = α · ∑_{y∈Y} P(X , e, y)
I A sum over atomic events!

I 3. Chain rule: Order X1, . . . ,Xn consistently with the dependency graph.
P(X1, . . . ,Xn) = P(Xn | Xn−1, . . . ,X1) · P(Xn−1 | Xn−2, . . . ,X1) · . . . · P(X1)

I 4. Exploit conditional independence: Instead of P(Xi | Xi−1, . . . ,X1), with the previous slide we can use P(Xi | Parents(Xi )).
I Bayesian networks!


Exploiting Conditional Independence: Example

I 1. Graph captures variable dependencies: (See previous slide.)
I Given toothache, catch, want P(Cavity | toothache, catch). Remaining vars: ∅.

I 2. Normalization+Marginalization:
P(Cavity | toothache, catch) = α · P(Cavity, toothache, catch)

I 3. Chain rule: Order X1 = Cavity, X2 = Toothache, X3 = Catch.
P(Cavity, toothache, catch) = P(catch | toothache,Cavity) · P(toothache | Cavity) · P(Cavity)

I 4. Exploit conditional independence:
Instead of P(catch | toothache,Cavity) use P(catch | Cavity).

I Thus:

P(Cavity | toothache, catch)
= α · P(catch | Cavity) · P(toothache | Cavity) · P(Cavity)
= α · 〈0.9 · 0.6 · 0.2, 0.2 · 0.1 · 0.8〉
= α · 〈0.108, 0.016〉

I So α ≈ 8.06 and P(cavity | toothache ∧ catch) ≈ 0.87.
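The same computation as a Python sketch, using the numbers of Example 7.1 (the variable names are illustrative only):

p_cavity = {True: 0.2, False: 0.8}
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}       # P(catch | Cavity)

# Relative weights α^-1 · P(Cavity | toothache, catch) for Cavity = T, F.
weights = [p_catch_given[c] * p_toothache_given[c] * p_cavity[c]
           for c in (True, False)]            # [0.108, 0.016]
alpha = 1.0 / sum(weights)
posterior = [alpha * w for w in weights]
print(alpha, posterior)   # ≈ 8.06, [≈ 0.871, ≈ 0.129]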


Naive Bayes Models

I Definition 7.6. A Bayesian network in which a single cause directly influences a number of effects, all of which are conditionally independent given the cause, is called a naive Bayes model or Bayesian classifier. (also called idiot Bayes model by Bayesian fundamentalists)

I Observation 7.7. In a naive Bayes model, the full joint probability distribution can be written as

  P(cause, effect1, . . . , effectn) = P(cause) · ∏i P(effecti | cause)

I This kind of model is called "naive" or "idiot" since it is often used as a simplifying model even if the effects are not conditionally independent after all.

I In practice, naive Bayes systems can work surprisingly well, even when the conditional independence assumption is not true.

I Example 7.8. The dentistry example is a (true) naive Bayes model.
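Observation 7.7 translates directly into a generic classifier: multiply the cause prior by one conditional factor per observed effect and normalize. The sketch below is our own illustration (not the lecture's code) and reuses the dentistry CPT values:

    def naive_bayes_posterior(prior, cpts, observed):
        # prior:    {cause_value: P(cause_value)}
        # cpts:     {effect: {cause_value: P(effect = T | cause_value)}}
        # observed: {effect: True/False}; returns the normalized posterior over the cause
        unnorm = {}
        for c, p_c in prior.items():
            p = p_c
            for effect, value in observed.items():
                p_true = cpts[effect][c]
                p *= p_true if value else 1 - p_true
            unnorm[c] = p
        alpha = 1 / sum(unnorm.values())
        return {c: alpha * p for c, p in unnorm.items()}

    prior = {True: 0.2, False: 0.8}                       # P(cavity)
    cpts = {"toothache": {True: 0.6, False: 0.1},
            "catch":     {True: 0.9, False: 0.2}}
    print(naive_bayes_posterior(prior, cpts, {"toothache": True, "catch": True}))
    # -> {True: ~0.87, False: ~0.13}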


Questionnaire

I Consider the random variables X1 = Animal, X2 = LikesChappi, and X3 = LoudNoise; X1 has values {dog, cat, other}, X2 and X3 are Boolean.

I Question: Which statements are correct?
(A) Animal is independent of LikesChappi.
(B) LoudNoise is independent of LikesChappi.
(C) Animal is conditionally independent of LikesChappi given LoudNoise.
(D) LikesChappi is conditionally independent of LoudNoise given Animal.

I Answer:
(A) No: likeschappi indicates dog.
(B) No: Not knowing what animal it is, loudnoise is an indication for dog, which indicates likeschappi.
(C) No: For example, even if we know loudnoise, learning likeschappi in addition gives us a stronger indication of Animal = dog.
(D) Yes: If we know what animal it is, LoudNoise does not influence LikesChappi. (Well, at least that's a reasonable assumption.)


7.8 The Wumpus World Revisited


Wumpus World Revisited

I Example 8.1 (The Wumpus is Back).
I We have a maze with pits that are detected in neighboring squares via a breeze. (forget wumpus and gold for now)
I Where should the agent go if there is a breeze at (1,2) and (2,1)?
I Pure logical inference can conclude nothing about which square is most likely to be safe!
I Idea: Let's see whether our probabilistic reasoning machinery can help!


Wumpus: Probabilistic Model

I Boolean variables:
I Pi,j : pit at square (i, j)
I Bi,j : breeze at square (i, j) (breeze variables only for the observed squares)

I Full joint probability distribution:
1. P(P1,1, . . . ,P4,4,B1,1,B1,2,B2,1) = P(B1,1,B1,2,B2,1 | P1,1, . . . ,P4,4) · P(P1,1, . . . ,P4,4) (product rule)
2. P(P1,1, . . . ,P4,4) = ∏i,j P(Pi,j) (pits are placed independently)
3. P(p1,1, . . . , p4,4) = 0.2^n · 0.8^(16−n) for a configuration with exactly n pits (the probability of a pit is 0.2)
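As a quick sanity check of item 3, a two-line Python sketch (the 0.2 pit probability and the 16 squares are from the model above; the function name is ours):

    def config_prior(n_pits, p=0.2, squares=16):
        # prior probability of one specific pit configuration containing n_pits pits
        return p ** n_pits * (1 - p) ** (squares - n_pits)

    print(config_prior(3))   # ≈ 0.00044 for any fixed configuration with 3 pits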


Wumpus: Query and Simple Reasoning

Assume that we have evidence:
I b = ¬b1,1 ∧ b1,2 ∧ b2,1 and
I κ = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1

We are interested in answering queries such as P(P1,3 | κ, b). (pit in (1,3) given the evidence)

I The answer can be computed by enumeration of the full joint probability distribution.
I Let U be the set of the remaining Pi,j variables (all except P1,3 and those fixed by κ), then

  P(P1,3 | κ, b) = α ∑u∈U P(P1,3, u, κ, b)

I Problem: We need to explore all possible values of the variables in U (2^12 = 4096 terms!)
I Can we do better (faster)?


Wumpus: Conditional Independence

I Observation 8.2. The observed breezes are conditionally independent of the other variables given the known, frontier, and query variables.

I We split the set of hidden variables into fringe and other variables: U = F ∪ O, where F is the fringe and O the rest.

I From conditional independence we get: P(b | P1,3, κ,U) = P(b | P1,3, κ,F )

I Now, let us exploit this formula.


Wumpus: Reasoning

I We calculate:

  P(P1,3 | κ, b) = α ∑u∈U P(P1,3, u, κ, b)
                = α ∑u∈U P(b | P1,3, κ, u) · P(P1,3, κ, u)
                = α ∑f∈F ∑o∈O P(b | P1,3, κ, f, o) · P(P1,3, κ, f, o)
                = α ∑f∈F P(b | P1,3, κ, f) · ∑o∈O P(P1,3, κ, f, o)
                = α ∑f∈F P(b | P1,3, κ, f) · ∑o∈O P(P1,3) · P(κ) · P(f) · P(o)
                = α P(P1,3) P(κ) ∑f∈F P(b | P1,3, κ, f) · P(f) · ∑o∈O P(o)
                = α′ P(P1,3) ∑f∈F P(b | P1,3, κ, f) · P(f)

  for α′ := α P(κ), since ∑o∈O P(o) = 1.


Wumpus: Solution

I We calculate, using the product rule and conditional independence (see above):

  P(P1,3 | κ, b) = α′ P(P1,3) ∑f∈F P(b | P1,3, κ, f) · P(f)

I Let us enumerate the possible models (value assignments) of the fringe variables F that are compatible with the observation b.

I P(P1,3 | κ, b) = α′ 〈0.2 · (0.04 + 0.16 + 0.16), 0.8 · (0.04 + 0.16)〉 ≈ 〈0.31, 0.69〉
I P(P3,1 | κ, b) ≈ 〈0.31, 0.69〉 by symmetry
I P(P2,2 | κ, b) ≈ 〈0.86, 0.14〉 (definitely avoid)
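The following Python sketch carries out exactly this fringe enumeration (our own illustration, not the lecture's code; the coordinates, the 0.2 pit prior, and the evidence are those of the example above):

    from itertools import product

    PIT_PRIOR = 0.2
    known_pit_free = {(1, 1), (1, 2), (2, 1)}            # kappa: visited squares, no pits
    breezy, not_breezy = {(1, 2), (2, 1)}, {(1, 1)}      # evidence b

    def neighbors(sq):
        c, r = sq
        return {(c + dc, r + dr) for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 1 <= c + dc <= 4 and 1 <= r + dr <= 4}

    def posterior_pit(query):
        # P(pit at query | kappa, b), summing only over the fringe models
        fringe = sorted({n for sq in breezy for n in neighbors(sq)}
                        - known_pit_free - {query})
        weight = {True: 0.0, False: 0.0}
        for q_val in (True, False):
            for f_vals in product((True, False), repeat=len(fringe)):
                pits = {sq for sq, v in zip(fringe, f_vals) if v}
                if q_val:
                    pits.add(query)
                consistent = (all(neighbors(sq) & pits for sq in breezy) and
                              not any(neighbors(sq) & pits for sq in not_breezy))
                if consistent:                            # P(b | ...) is 1, else 0
                    prior = PIT_PRIOR if q_val else 1 - PIT_PRIOR
                    for v in f_vals:
                        prior *= PIT_PRIOR if v else 1 - PIT_PRIOR
                    weight[q_val] += prior
        alpha = 1 / (weight[True] + weight[False])
        return {v: alpha * w for v, w in weight.items()}

    print(posterior_pit((1, 3)))   # ≈ {True: 0.31, False: 0.69}
    print(posterior_pit((2, 2)))   # ≈ {True: 0.86, False: 0.14}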


7.9 Conclusion


Summary

I Uncertainty is unavoidable in many environments, namely whenever agents do not have perfect knowledge.
I Probabilities express the degree of belief of an agent, given its knowledge, in an event.
I Conditional probabilities express the likelihood of an event given observed evidence.
I Assessing a probability means using statistics to approximate the likelihood of an event.
I Bayes' rule allows us to derive, from probabilities that are easy to assess, probabilities that aren't easy to assess.
I Given multiple pieces of evidence, we can exploit conditional independence.
I Bayesian networks (up next) do this in a comprehensive manner.


Chapter 8 Probabilistic Reasoning, Part II: Bayesian Networks


8.1 Introduction


Reminder: Our Agenda for This Topic

I Our treatment of the topic "Probabilistic Reasoning" consists of the previous chapter and this one.
I Chapter 7: All the basic machinery used in Bayesian networks.
I This chapter: Bayesian networks: What they are, how to build them, how to use them.
I The most widespread and successful practical framework for probabilistic reasoning.


Reminder: Our Machinery

I 1. Graph captures variable dependencies: (variables X1, . . . ,Xn)

  [BN: Cavity → Toothache, Cavity → Catch]

I Given evidence e, want to know P(X | e). Remaining vars: Y.
I 2. Normalization+Marginalization:

  P(X | e) = α P(X , e) = α ∑y∈Y P(X , e, y)

I A sum over atomic events!
I 3. Chain rule: Order X1, . . . ,Xn consistently with the dependency graph.

  P(X1, . . . ,Xn) = P(Xn | Xn−1, . . . ,X1) · P(Xn−1 | Xn−2, . . . ,X1) · . . . · P(X1)

I 4. Exploit conditional independence: Instead of P(Xi | Xi−1, . . . ,X1), we can use P(Xi | Parents(Xi )).
I Bayesian networks!


Some Applications

I A ubiquitous problem: Observe "symptoms", need to infer "causes".

  [Four example domains pictured: Medical Diagnosis, Face Recognition, Self-Localization, Nuclear Test Ban]


Our Agenda for This Chapter

I What is a Bayesian Network? What is the syntax?
I Tells you what Bayesian networks look like.
I What is the Meaning of a Bayesian Network? What is the semantics?
I Makes the intuitive meaning precise.
I Constructing Bayesian Networks: How do we design these networks? What effect do our choices have on their size?
I Before you can start doing inference, you need to model your domain.
I Inference in Bayesian Networks: How do we use these networks? What is the associated complexity?
I Inference is our primary purpose. It is important to understand its complexities and how it can be improved.


8.2 What is a Bayesian Network?


What is a Bayesian Network? (Short: BN)

I What do the others say?
I "A Bayesian network is a methodology for representing the full joint probability distribution. In some cases, that representation is compact."
I "A Bayesian network is a graph whose nodes are random variables Xi and whose edges 〈Xj ,Xi 〉 denote a direct influence of Xj on Xi . Each node Xi is associated with a conditional probability table (CPT), specifying P(Xi | Parents(Xi ))."
I "A Bayesian network is a graphical way to depict conditional independence relations within a set of random variables."
I A Bayesian network (BN) represents the structure of a given domain. Probabilistic inference exploits that structure for improved efficiency.
I BN inference: Determine the distribution of a query variable X given observed evidence e: P(X | e).


John, Mary, and My Brand-New Alarm

I Example 2.1 (From Russell/Norvig).
I I got very valuable stuff at home. So I bought an alarm. Unfortunately, the alarm just rings at home and doesn't call me on my mobile.
I I've got two neighbors, Mary and John, who'll call me if they hear the alarm.
I The problem is that, sometimes, the alarm is caused by an earthquake.
I Also, John might confuse the alarm with his telephone, and Mary might miss the alarm altogether because she typically listens to loud music.

Question: Given that both John and Mary call me, what is the probability of a burglary?


John, Mary, and My Alarm: Designing the BN

I Cooking Recipe:
(1) Design the random variables X1, . . . ,Xn;
(2) Identify their dependencies;
(3) Insert the conditional probability tables P(Xi | Parents(Xi )).

I Example 2.2 (Let's cook!).
(1) Random variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls.
(2) Dependencies: Burglaries and earthquakes are independent (this is actually debatable; a design decision!); the alarm might be activated by either. John and Mary call if and only if they hear the alarm (they don't care about earthquakes).
(3) Conditional probability tables: Assess the probabilities, see next slide.


John, Mary, and My Alarm: The BN

  Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls.

  P(B) = .001     P(E) = .002

  B E | P(A)        A | P(J)        A | P(M)
  T T | .95         T | .90         T | .70
  T F | .94         F | .05         F | .01
  F T | .29
  F F | .001

Note: In each P(Xi | Parents(Xi )), we show only P(Xi = T | Parents(Xi )). We don't show P(Xi = F | Parents(Xi )), which is 1 − P(Xi = T | Parents(Xi )).
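In code, the CPTs of this network are just lookup tables, and the note above (storing only the T-rows) is mirrored by computing the F-rows as 1 minus the stored value. A minimal sketch (numbers as in the tables; identifiers are ours):

    # P(X = T | parent values) for each node of the alarm network
    cpt_true = {
        "Burglary":   {(): 0.001},
        "Earthquake": {(): 0.002},
        "Alarm":      {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001},
        "JohnCalls":  {(True,): 0.90, (False,): 0.05},
        "MaryCalls":  {(True,): 0.70, (False,): 0.01},
    }

    def cpt_entry(var, value, parent_values):
        # the F-row is 1 minus the stored T-row
        p_true = cpt_true[var][parent_values]
        return p_true if value else 1 - p_true

    print(cpt_entry("Alarm", False, (True, False)))   # 1 - 0.94 = 0.06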


The Syntax of Bayesian Networks

  [Alarm network with CPTs as on the previous slide]

I Definition 2.3 (Bayesian Network). Given random variables X1, . . . ,Xn with finite domains D1, . . . ,Dn, a Bayesian network (also belief network or probabilistic network) is an acyclic directed graph BN = 〈{X1, . . . ,Xn},E〉. We denote Parents(Xi ) := {Xj | (Xj ,Xi ) ∈ E}. Each Xi is associated with a function CPT(Xi ) : Di × ∏Xj∈Parents(Xi ) Dj → [0, 1], the conditional probability table.
I Related formalisms are summed up under the term graphical models.


8.3 What is the Meaning of a Bayesian Network?


The Semantics of BNs: Illustration

  [Alarm network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

I Alarm depends on Burglary and Earthquake.
I MaryCalls only depends on Alarm:

  P(MaryCalls | Alarm,Burglary) = P(MaryCalls | Alarm)

I Bayesian networks represent sets of independence assumptions.


The Semantics of BNs: Illustration, ctd.

I Each node X in a BN is conditionally independent of its non-descendants given its parents Parents(X ).

  [Generic node X with parents U1, . . . ,Um, children Y1, . . . ,Yn, and the children's descendants Zij]


The Semantics of BNs: Illustration, ctd.

  [Alarm network as above]

I Given the value of Alarm, MaryCalls is independent of?
I Burglary, Earthquake, JohnCalls.


The Semantics of BNs: Formal

  [Alarm network with CPTs as above]

I Definition 3.1. Given a Bayesian network BN = 〈{X1, . . . ,Xn},E〉, we identify BN with the following two assumptions:
(A) For 1 ≤ i ≤ n, Xi is conditionally independent of NonDesc(Xi ) given Parents(Xi ), where NonDesc(Xi ) := {Xj | (Xi ,Xj ) ∉ E∗}\Parents(Xi ) and E∗ is the transitive-reflexive closure of E .
(B) For 1 ≤ i ≤ n, all values xi of Xi , and all value combinations of Parents(Xi ), we have P(xi | Parents(Xi )) = CPT(xi ,Parents(Xi )).


Recovering the Full Joint Probability Distribution

I "A Bayesian network is a methodology for representing the full joint probability distribution."
I Problem: How to recover the full joint probability distribution P(X1, . . . ,Xn) from BN = 〈{X1, . . . ,Xn},E〉?
I Chain rule: For any ordering X1, . . . ,Xn, we have:

  P(X1, . . . ,Xn) = P(Xn | Xn−1, . . . ,X1) · P(Xn−1 | Xn−2, . . . ,X1) · . . . · P(X1)

  Choose X1, . . . ,Xn consistent with BN: Xj ∈ Parents(Xi ) implies j < i.

I Observation 3.2 (Exploiting Conditional Independence). With BN assumption (A), we can use P(Xi | Parents(Xi )) instead of P(Xi | Xi−1, . . . ,X1):

  P(X1, . . . ,Xn) = ∏i P(Xi | Parents(Xi ))

  The distributions P(Xi | Parents(Xi )) are given by BN assumption (B).
I Same for atomic events P(x1, . . . , xn).
I Observation 3.3 (Why "acyclic"?). For a cyclic BN, this does NOT hold; indeed, cyclic BNs may be self-contradictory. (we need a consistent ordering)


Recovering a Probability for John, Mary, and the Alarm

I Example 3.4. John and Mary called because there was an alarm, but no earthquake or burglary:

  P(j ,m, a,¬b,¬e) = P(j | a) · P(m | a) · P(a | ¬b,¬e) · P(¬b) · P(¬e)
                   = 0.9 · 0.7 · 0.001 · 0.999 · 0.998 ≈ 0.000628

  [Alarm network with CPTs as above]
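Example 3.4 is a one-liner once the CPTs are available as data; the sketch below (our own code, CPT numbers as in the tables above) multiplies one CPT entry per variable:

    parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
    cpt_true = {"B": {(): 0.001}, "E": {(): 0.002},
                "A": {(True, True): 0.95, (True, False): 0.94,
                      (False, True): 0.29, (False, False): 0.001},
                "J": {(True,): 0.90, (False,): 0.05},
                "M": {(True,): 0.70, (False,): 0.01}}

    def joint(event):
        # P(x1, ..., xn) = product over i of P(xi | values of Parents(Xi))
        p = 1.0
        for var, value in event.items():
            p_true = cpt_true[var][tuple(event[par] for par in parents[var])]
            p *= p_true if value else 1 - p_true
        return p

    print(joint({"B": False, "E": False, "A": True, "J": True, "M": True}))  # ≈ 0.000628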


Questionnaire

  [BN: Animal → LoudNoise, Animal → LikesChappi]

I Say BN is the Bayesian network above. Which statements are correct?
(A) Animal is independent of LikesChappi.
(B) LoudNoise is independent of LikesChappi.
(C) Animal is conditionally independent of LikesChappi given LoudNoise.
(D) LikesChappi is conditionally independent of LoudNoise given Animal.

I Answers:
(A) No: likeschappi indicates dog.
(B) No: Not knowing what animal it is, likeschappi is an indication for dog, which indicates loudnoise.
(C) No: For example, even if we know loudnoise, learning likeschappi in addition gives us a stronger indication of Animal = dog.
(D) Yes: Xi = LikesChappi is conditionally independent of NonDesc(Xi ) = {LoudNoise} given Parents(Xi ) = {Animal}.


8.4 Constructing Bayesian Networks


Constructing Bayesian Networks

I BN construction algorithm:
1. Initialize BN := 〈{X1, . . . ,Xn},E〉 where E = ∅.
2. Fix any order of the variables, X1, . . . ,Xn.
3. for i := 1, . . . , n do
   a. Choose a minimal set Parents(Xi ) ⊆ {X1, . . . ,Xi−1} so that P(Xi | Xi−1, . . . ,X1) = P(Xi | Parents(Xi )).
   b. For each Xj ∈ Parents(Xi ), insert (Xj ,Xi ) into E .
   c. Associate Xi with CPT(Xi ) corresponding to P(Xi | Parents(Xi )).

  Attention: Which variables we need to include into Parents(Xi ) depends on what "X1, . . . ,Xi−1" is . . . ! (see the code sketch below)

I The size of the resulting BN depends on the chosen order X1, . . . ,Xn.
I The size of a Bayesian network is not a fixed property of the domain. It depends on the skill of the designer.
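As referenced above, here is a minimal Python sketch of the construction loop. The conditional-independence test is an oracle we have to supply; the toy oracle below encodes only the facts stated in the Animal/LoudNoise/LikesChappi questionnaire (LikesChappi and LoudNoise are conditionally independent given Animal, and nothing else is independent). All identifiers are ours.

    from itertools import combinations

    def independent_given(x, others, given):
        # Toy conditional-independence oracle: the only independence in the
        # Animal/LoudNoise/LikesChappi domain is LikesChappi _|_ LoudNoise given Animal.
        return {x} | set(others) == {"LikesChappi", "LoudNoise"} and "Animal" in given

    def construct_bn(order):
        edges = []
        for i, x in enumerate(order):
            preds = order[:i]
            for k in range(len(preds) + 1):          # try the smallest parent sets first
                chosen = None
                for parents in combinations(preds, k):
                    rest = [p for p in preds if p not in parents]
                    if all(independent_given(x, [r], parents) for r in rest):
                        chosen = parents
                        break
                if chosen is not None:
                    edges += [(p, x) for p in chosen]
                    break
        return edges

    print(construct_bn(["Animal", "LoudNoise", "LikesChappi"]))
    # two edges: Animal -> LoudNoise, Animal -> LikesChappi (the "causal" network)
    print(construct_bn(["LoudNoise", "LikesChappi", "Animal"]))
    # three edges: with this ordering the network becomes fully connected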


John and Mary Depend on the Variable Order!

I Example 4.1. MaryCalls, JohnCalls, Alarm, Burglary, Earthquake.


John and Mary Depend on the Variable Order! Ctd.

I Example 4.2. MaryCalls, JohnCalls, Earthquake, Burglary, Alarm.


John and Mary, What Went Wrong?

I These BNs link from symptoms to causes! (like P(Cavity | Toothache))
I We fail to identify many conditional independence relations (e.g., we get dependencies between conditionally independent symptoms).
I Also recall: Conditional probabilities P(Symptom | Cause) are more robust and often easier to assess than P(Cause | Symptom).
I Rule of Thumb: We should order causes before symptoms.


Compactness of Bayesian Networks

I Definition 4.3. Given random variables X1, . . . ,Xn with finite domains D1, . . . ,Dn, the size of BN = 〈{X1, . . . ,Xn},E〉 is defined as

  size(BN) := ∑i #(Di ) · ∏Xj∈Parents(Xi ) #(Dj)

I That is, the total number of entries in the CPTs.
I Smaller BN: fewer probabilities to assess, more efficient inference.
I The explicit full joint probability distribution has size ∏i #(Di ).
I If #(Parents(Xi )) ≤ k for every Xi , and Dmax is the largest variable domain, then size(BN) ≤ n · #(Dmax)^(k+1).
I For #(Dmax) = 2, n = 20, k = 4 we have 2^20 = 1048576 probabilities, but a Bayesian network of size ≤ 20 · 2^5 = 640 . . . !
I In the worst case, size(BN) = n · ∏i #(Di ), namely if every variable depends on all its predecessors in the chosen order.
I BNs are compact if each variable is directly influenced only by few of its predecessor variables.
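As a quick check of Definition 4.3, the following sketch computes size(BN) for the alarm network from above (all domains are Boolean; identifiers are ours):

    domain_size = {"Burglary": 2, "Earthquake": 2, "Alarm": 2, "JohnCalls": 2, "MaryCalls": 2}
    parents = {"Burglary": [], "Earthquake": [], "Alarm": ["Burglary", "Earthquake"],
               "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}

    def bn_size(domain_size, parents):
        # size(BN) = sum over nodes of #(D_i) times the product of the parent domain sizes
        total = 0
        for x, d in domain_size.items():
            rows = 1
            for p in parents[x]:
                rows *= domain_size[p]
            total += d * rows
        return total

    print(bn_size(domain_size, parents))   # 2 + 2 + 8 + 4 + 4 = 20 CPT entries
    print(2 ** len(domain_size))           # explicit full joint: 2^5 = 32 entries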


Representing Conditional Distributions: Deterministic Nodes

I Problem: Even if the number of parents k is small, the CPT has 2^k entries. (worst case)
I Idea: Usually CPTs follow standard patterns called canonical distributions.
I We only need to determine the pattern and some values.
I Definition 4.4. A node X in a Bayesian network is called deterministic, if its value is completely determined by the values of Parents(X ).
I Example 4.5 (Logical Dependencies). In the network below, the node European is deterministic; the CPT corresponds to a logical disjunction, i.e. P(european) = P(greek ∨ german ∨ french).

  [BN: Greek → European, German → European, French → European]

I Example 4.6 (Numerical Dependencies). In the network below, the node Students is deterministic; the CPT corresponds to an arithmetic relation among the parents, i.e. P(S = i − d − g | I = i ,D = d ,G = g) = 1.

  [BN: Inscriptions → Students, Dropouts → Students, Graduations → Students]

I Intuition: Deterministic nodes model direct, causal relationships.
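A deterministic CPT needs no probability assessments at all; for Example 4.5 it can be generated directly from the logical relation (a small sketch; identifiers are ours):

    from itertools import product

    # P(european = T | greek, german, french) for all parent value combinations
    european_cpt = {values: 1.0 if any(values) else 0.0
                    for values in product((True, False), repeat=3)}
    print(european_cpt[(False, False, True)])   # 1.0
    print(european_cpt[(False, False, False)])  # 0.0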


Representing Conditional Distributions: Noisy Nodes

I Problem: Sometimes, values of nodes are only "almost deterministic". (uncertain, but mostly logical)
I Idea: Use "noisy" logical relationships. (generalize logical ones softly to [0, 1])
I Example 4.7 (Inhibited Causal Dependencies). In the network below, a deterministic disjunction for the node Fever is incorrect, since the diseases sometimes fail to cause a fever. The causal relation between parent and child is inhibited.

  [BN: Cold → Fever, Flu → Fever, Malaria → Fever]

I Assumptions: We make the following assumptions for modeling Example 4.7:
1. Cold, Flu, and Malaria is a complete list of fever causes (otherwise, add a leak node for the others).
2. The inhibitions of the parents are independent.
Thus we can model the inhibitions by individual inhibition factors qd .
I Definition 4.8. The CPT of a noisy disjunction node X in a Bayesian network is given by P(¬x | Parents(X )) = ∏j:Xj=T qj , where the qj are the inhibition factors of the parents Xj ∈ Parents(X ).


Representing Conditional Distributions: Noisy Nodes

I Example 4.9. We have the following inhibition factors for Example 4.7:

  qcold    = P(¬fever | cold,¬flu,¬malaria)  = 0.6
  qflu     = P(¬fever | ¬cold, flu,¬malaria) = 0.2
  qmalaria = P(¬fever | ¬cold,¬flu,malaria)  = 0.1

  If we model Fever as a noisy disjunction node, then the general rule P(¬fever | Parents(Fever)) = ∏j:Xj=T qj for the CPT gives the following table:

  Cold  Flu  Malaria | P(Fever) | P(¬Fever)
  F     F    F       | 0.0      | 1.0
  F     F    T       | 0.9      | 0.1
  F     T    F       | 0.8      | 0.2
  F     T    T       | 0.98     | 0.02 = 0.2 · 0.1
  T     F    F       | 0.4      | 0.6
  T     F    T       | 0.94     | 0.06 = 0.6 · 0.1
  T     T    F       | 0.88     | 0.12 = 0.6 · 0.2
  T     T    T       | 0.988    | 0.012 = 0.6 · 0.2 · 0.1
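The table can be generated mechanically from the three inhibition factors. A minimal sketch (numbers from Example 4.9; identifiers are ours):

    from itertools import product

    q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}   # inhibition factors

    def noisy_or_cpt(q):
        # P(fever = T | parent values): 1 minus the product of the q's of the true parents
        parents = list(q)
        cpt = {}
        for values in product((False, True), repeat=len(parents)):
            p_not = 1.0
            for parent, value in zip(parents, values):
                if value:
                    p_not *= q[parent]
            cpt[values] = 1.0 - p_not
        return cpt

    cpt = noisy_or_cpt(q)
    print(cpt[(True, True, True)])     # ≈ 0.988
    print(cpt[(False, False, False)])  # 0.0 (no disease, no fever)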


Representing Conditional Distributions: Noisy Nodes

I Observation 4.10. In general, noisy logical relationships in which a variable depends on k parents can be described by O(k) parameters instead of O(2^k) for the full conditional probability table. This can make assessment (and learning) tractable.
I Example 4.11. The CPCS network [PraProMid:kelbn94] uses noisy-OR and noisy-MAX distributions to model relationships among diseases and symptoms in internal medicine. With 448 nodes and 906 links, it requires only 8,254 values instead of 133,931,430 for a network with full CPTs.


Questionnaire

I Question: What is the Bayesian network we get by constructing according to the ordering X1 = LoudNoise, X2 = Animal, X3 = LikesChappi?
I Answer:

  [Network over Animal, LoudNoise, LikesChappi; following the construction algorithm with the independencies stated above, this ordering yields LoudNoise → Animal → LikesChappi.]

I Question: What is the Bayesian network we get by constructing according to the ordering X1 = LoudNoise, X2 = LikesChappi, X3 = Animal?
I Answer:

  [Network over Animal, LoudNoise, LikesChappi; this ordering yields a fully connected network: LoudNoise → LikesChappi, LoudNoise → Animal, LikesChappi → Animal.]


8.5 Inference in Bayesian Networks


Inference for Mary and John

I Intuition: Observe evidence variables and draw conclusions on query variables.
I Example 5.1.

  [Alarm network with CPTs as above]

I What is P(Burglary | johncalls)?
I What is P(Burglary | johncalls,marycalls)?


Probabilistic Inference Tasks in Bayesian Networks

I Definition 5.2 (Probabilistic Inference Task). Given random variables X1, . . . ,Xn, a probabilistic inference task consists of a set X ⊆ {X1, . . . ,Xn} of query variables, a set E ⊆ {X1, . . . ,Xn} of evidence variables, and an event e that assigns values to E. We wish to compute the posterior probability distribution P(X | e).
  Y := {X1, . . . ,Xn}\(X ∪ E) are the hidden variables.
I Notes:
I We assume that a BN for X1, . . . ,Xn is given.
I In the remainder, for simplicity, X = {X} is a singleton.
I Example 5.3. In P(Burglary | johncalls,marycalls), X = {Burglary}, e = {johncalls,marycalls}, and Y = {Alarm,Earthquake}.


Inference by Enumeration: The Principle (A Reminder!)

I Problem: Given evidence e, want to know P(X | e). Hidden variables: Y.
I 1. Bayesian network BN captures variable dependencies.
I 2. Normalization+Marginalization:

  P(X | e) = α P(X , e); if Y ≠ ∅ then P(X | e) = α ∑y∈Y P(X , e, y)

I Recover the summed-up probabilities P(X , e, y) from BN!
I 3. Chain rule: Order X1, . . . ,Xn consistent with BN.

  P(X1, . . . ,Xn) = P(Xn | Xn−1, . . . ,X1) · P(Xn−1 | Xn−2, . . . ,X1) · . . . · P(X1)

I 4. Exploit conditional independence: Instead of P(Xi | Xi−1, . . . ,X1), use P(Xi | Parents(Xi )).
I Given a Bayesian network BN, probabilistic inference tasks can be solved as sums of products of conditional probabilities from BN.
I Sum over all value combinations of hidden variables.
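A minimal Python sketch of this recipe for the alarm network (CPT numbers as above; this is our own illustration, not the lecture's code): it sums the product of one CPT entry per variable over all values of the hidden variables, then normalizes.

    from itertools import product

    parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
    cpt_true = {"B": {(): 0.001}, "E": {(): 0.002},
                "A": {(True, True): 0.95, (True, False): 0.94,
                      (False, True): 0.29, (False, False): 0.001},
                "J": {(True,): 0.90, (False,): 0.05},
                "M": {(True,): 0.70, (False,): 0.01}}

    def p(var, value, event):
        # one CPT lookup: P(var = value | parent values taken from event)
        p_true = cpt_true[var][tuple(event[par] for par in parents[var])]
        return p_true if value else 1 - p_true

    def enumeration_ask(query, evidence):
        hidden = [v for v in parents if v != query and v not in evidence]
        weight = {}
        for q_val in (True, False):
            total = 0.0
            for h_vals in product((True, False), repeat=len(hidden)):
                event = dict(evidence, **{query: q_val}, **dict(zip(hidden, h_vals)))
                prob = 1.0
                for var in parents:              # product of conditional probabilities
                    prob *= p(var, event[var], event)
                total += prob
            weight[q_val] = total
        alpha = 1 / sum(weight.values())
        return {v: alpha * w for v, w in weight.items()}

    print(enumeration_ask("B", {"J": True, "M": True}))   # P(Burglary | j, m): True ≈ 0.284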

Kohlhase: Künstliche Intelligenz 2 218 July 12, 2018

Page 204: Artificial Intelligence 2 (Künstliche Intelligenz 2) Part ... · Kohlhase: Künstliche Intelligenz 2 130 July 12, 2018. Environmenttypes IExample1.11. Someenvironmentsclassified:

Inference by Enumeration: The Principle (A Reminder!)

I Problem: Given evidence e, want to know P(X | e). Hidden variables: Y.I 1. Bayesian network BN captures variable dependencies.

I 2. Normalization+Marginalization.P(X | e) = α P(X , e); if Y 6= ∅ then P(X | e) = α

∑y∈Y P(X , e, y)

I Recover the summed-up probabilities P(X , e, y) from BN!I 3. Chain rule. Order X1, . . . ,Xn consistent with BN.

P(X1, . . . ,Xn) = P(Xn | Xn−1, . . . ,X1) P(Xn−1 | Xn−2, . . . ,X1) . . . P(X1)

I 4. Exploit conditional independence. Instead of P(Xi | Xi−1, . . . ,X1), useP(Xi | Parents(Xi )).

I Given a Bayesian network BN, probabilistic inference tasks can be solved assums of products of conditional probabilities from BN.

I Sum over all value combinations of hidden variables.

Kohlhase: Künstliche Intelligenz 2 218 July 12, 2018

Page 205: Artificial Intelligence 2 (Künstliche Intelligenz 2) Part ... · Kohlhase: Künstliche Intelligenz 2 130 July 12, 2018. Environmenttypes IExample1.11. Someenvironmentsclassified:

Inference by Enumeration: The Principle (A Reminder!)

I Problem: Given evidence e, want to know P(X | e). Hidden variables: Y.I 1. Bayesian network BN captures variable dependencies.I 2. Normalization+Marginalization.

P(X | e) = α P(X , e); if Y 6= ∅ then P(X | e) = α∑

y∈Y P(X , e, y)

I Recover the summed-up probabilities P(X , e, y) from BN!I 3. Chain rule. Order X1, . . . ,Xn consistent with BN.

P(X1, . . . ,Xn) = P(Xn | Xn−1, . . . ,X1) P(Xn−1 | Xn−2, . . . ,X1) . . . P(X1)

I 4. Exploit conditional independence. Instead of P(Xi | Xi−1, . . . ,X1), useP(Xi | Parents(Xi )).

I Given a Bayesian network BN, probabilistic inference tasks can be solved assums of products of conditional probabilities from BN.

I Sum over all value combinations of hidden variables.

Kohlhase: Künstliche Intelligenz 2 218 July 12, 2018

Inference by Enumeration: John and Mary

[Bayesian network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls]
  P(B) = .001   P(E) = .002
  P(A | B, E):   b, e: .95   b, ¬e: .94   ¬b, e: .29   ¬b, ¬e: .001
  P(J | A):   a: .90   ¬a: .05
  P(M | A):   a: .70   ¬a: .01

I Want: P(Burglary | johncalls, marycalls). Hidden variables: Y = {Earthquake, Alarm}.
I Normalization+Marginalization:
  P(B | j, m) = α P(B, j, m) = α Σ_{vE} Σ_{vA} P(B, j, m, vE, vA)
I Order X1 = B, X2 = E, X3 = A, X4 = J, X5 = M.
I Chain rule and conditional independence:
  P(B | j, m) = α Σ_{vE} Σ_{vA} P(B) · P(vE) · P(vA | B, vE) · P(j | vA) · P(m | vA)
I Continuation on next slide . . .

Kohlhase: Künstliche Intelligenz 2 219 July 12, 2018

Inference by Enumeration: John and Mary, ctd.

I Move variables outwards (until we hit the first parent):
  P(B | j, m) = α · P(B) · Σ_{vE} P(vE) · Σ_{vA} P(vA | B, vE) · P(j | vA) · P(m | vA)
I The probabilities of the outside variables multiply the entire “rest of the sum”.
I Chain rule and conditional independence, ctd. (expanded for B = b):
  P(B | j, m) = α P(B) Σ_{vE} P(vE) Σ_{vA} P(vA | B, vE) P(j | vA) P(m | vA)
  = α · P(b) · ( P(e) · [P(a | b, e) P(j | a) P(m | a) + P(¬a | b, e) P(j | ¬a) P(m | ¬a)]
               + P(¬e) · [P(a | b, ¬e) P(j | a) P(m | a) + P(¬a | b, ¬e) P(j | ¬a) P(m | ¬a)] )
  = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩
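This computation can be checked mechanically. A minimal Python sketch (my own illustration, not the lecture's code) that sums out the hidden variables E and A using the CPT values of the network above:

    CPT_A = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}   # P(A = true | B, E)
    P_B, P_E = 0.001, 0.002
    P_J = {True: 0.90, False: 0.05}   # P(j | A), evidence JohnCalls = true
    P_M = {True: 0.70, False: 0.01}   # P(m | A), evidence MaryCalls = true

    def unnormalized(b):
        # sum out the hidden variables E and A for P(B = b, j, m)
        total = 0.0
        for e in (True, False):
            for a in (True, False):
                p_a = CPT_A[(b, e)] if a else 1 - CPT_A[(b, e)]
                total += (P_E if e else 1 - P_E) * p_a * P_J[a] * P_M[a]
        return (P_B if b else 1 - P_B) * total

    scores = {b: unnormalized(b) for b in (True, False)}
    alpha = 1 / sum(scores.values())
    print({b: round(alpha * s, 3) for b, s in scores.items()})   # {True: 0.284, False: 0.716}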

Kohlhase: Künstliche Intelligenz 2 220 July 12, 2018

The Evaluation of P(b | j, m) as a “Search Tree”

I Inference by enumeration = a tree with “sum nodes” branching over the values of the hidden variables, and with non-branching “multiplication nodes”.

Kohlhase: Künstliche Intelligenz 2 221 July 12, 2018


Inference by Enumeration: Properties

I Inference by Enumeration:
I Evaluates the tree in a depth-first manner.
I Space Complexity: Linear in the number of variables.
I Time Complexity: Exponential in the number of hidden variables, e.g. O(2^#(Y)) in case these variables are Boolean.
I Can we do better than this?
I Variable Elimination:
I Improves on inference by enumeration through (A) avoiding repeated computation, and (B) avoiding irrelevant computation.
I In some special cases, variable elimination runs in polynomial time.

Kohlhase: Künstliche Intelligenz 2 222 July 12, 2018

Variable Elimination: Sketch of Ideas

I (A) Avoiding repeated computation: Evaluate expressions from right to left, storing all intermediate results.
I For query P(B | j, m):
  1. CPTs of BN yield factors (probability tables):
     P(B | j, m) = α P(B) Σ_{vE} P(vE) Σ_{vA} P(vA | B, vE) P(j | vA) P(m | vA)
     with factors f1(B) = P(B), f2(E) = P(E), f3(A, B, E) = P(A | B, E), f4(A) = P(j | A), f5(A) = P(m | A)
  2. Then the computation is performed in terms of factor product and summing out variables from factors:
     P(B | j, m) = α · f1(B) · Σ_{vE} f2(E) · Σ_{vA} f3(A, B, E) · f4(A) · f5(A)
I (B) Avoiding irrelevant computation: Repeatedly remove hidden variables that are leaf nodes.
I For query P(JohnCalls | burglary):
  P(J | b) = α P(b) Σ_{vE} P(vE) Σ_{vA} P(vA | b, vE) P(J | vA) Σ_{vM} P(vM | vA)
I The rightmost sum equals 1 and can be dropped.
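To make the two factor operations concrete, here is a small Python sketch (my own illustration, not the lecture's code); factors over Boolean variables are stored as dicts from value tuples to numbers:

    from itertools import product

    def factor_product(f1, vars1, f2, vars2):
        # pointwise product of two factors; returns the new factor and its variable list
        joint = vars1 + [v for v in vars2 if v not in vars1]
        out = {}
        for values in product((True, False), repeat=len(joint)):
            env = dict(zip(joint, values))
            out[values] = f1[tuple(env[v] for v in vars1)] * f2[tuple(env[v] for v in vars2)]
        return out, joint

    def sum_out(f, vars_, var):
        # sum a Boolean variable out of a factor
        keep = [v for v in vars_ if v != var]
        out = {}
        for values, p in f.items():
            env = dict(zip(vars_, values))
            key = tuple(env[v] for v in keep)
            out[key] = out.get(key, 0.0) + p
        return out, keep

    f4 = {(True,): 0.90, (False,): 0.05}   # f4(A) = P(j | A)
    f5 = {(True,): 0.70, (False,): 0.01}   # f5(A) = P(m | A)
    f45, v45 = factor_product(f4, ["A"], f5, ["A"])   # next: multiply in f3(A,B,E), then sum out A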

Kohlhase: Künstliche Intelligenz 2 223 July 12, 2018

The Complexity of Exact Inference

I Good News:
I Definition 5.4. A graph is called singly connected, or a polytree, if there is at most one undirected path between any two nodes in the graph.
I Theorem 5.5. On polytree Bayesian networks, variable elimination runs in polynomial time.
I Is our BN for Mary & John a polytree?
I Bad News:
I For multiply connected Bayesian networks, probabilistic inference is #P-hard in general.
I #P is harder than NP (i.e. NP ⊆ #P).
I So?
I Life goes on . . . In the hard cases, if need be we can throw exactitude to the winds and approximate.
I Example 5.6. Sampling techniques

Kohlhase: Künstliche Intelligenz 2 224 July 12, 2018

8.6 Conclusion

Kohlhase: Künstliche Intelligenz 2 224 July 12, 2018


Summary

I Bayesian networks (BN) are a widespread tool to model uncertainty and to reason about it. A BN represents conditional independence relations between random variables. It consists of a graph encoding the variable dependencies, and of conditional probability tables (CPTs).
I Given a variable order, the BN is small if every variable depends on only a few of its predecessors.
I Probabilistic inference requires computing the probability distribution of a set of query variables, given a set of evidence variables whose values we know. The remaining variables are hidden.
I Inference by enumeration takes a BN as input, applies Normalization+Marginalization and the Chain rule, and exploits conditional independence. This can be viewed as a tree search that branches over all values of the hidden variables.
I Variable elimination avoids unnecessary computation. It runs in polynomial time for polytree BNs. In general, exact probabilistic inference is #P-hard. Approximate probabilistic inference methods exist.

Kohlhase: Künstliche Intelligenz 2 225 July 12, 2018


Topics We Didn’t Cover Here

I Inference by sampling: A whole zoo of methods for doing this exists.

I Clustering: Pre-combining subsets of variables to reduce the runtime of inference.
I Compilation to SAT: More precisely, to “weighted model counting” in CNF formulas. Model counting extends DPLL with the ability to determine the number of satisfying interpretations. Weighted model counting allows defining a mass for each such interpretation (= the probability of an atomic event).
I Dynamic BN: BN with one slice of variables at each “time step”, encoding probabilistic behavior over time.
I Relational BN: BN with predicates and object variables.
I First-order BN: Relational BN with quantification, i.e. probabilistic logic, e.g. the BLOG language developed by Stuart Russell and co-workers.

Kohlhase: Künstliche Intelligenz 2 226 July 12, 2018

Chapter 9 Making Simple Decisions Rationally

Kohlhase: Künstliche Intelligenz 2 226 July 12, 2018


9.1 Introduction

Kohlhase: Künstliche Intelligenz 2 226 July 12, 2018


Decision Theory

I Definition 1.1. Decision theory investigates how an agent a deals with choosing among actions based on the desirability of their outcomes.
I Wait: Isn’t that what we did in Problem Solving?
I Yes, but: now we do it for stochastic (i.e. non-deterministic), partially observable environments.
I Recall: We call an agent environment
I fully observable, iff a’s sensors give it access to the complete state of the environment at any point in time, else partially observable.
I deterministic, iff the next state of the environment is completely determined by the current state and a’s action, else stochastic.
I episodic, iff a’s experience is divided into atomic episodes, in which it perceives and then performs a single action. Crucially, the next episode does not depend on previous ones. Non-episodic environments are called sequential.
I We restrict ourselves to episodic decision theory, which deals with choosing among actions based on the desirability of their immediate outcomes.

Kohlhase: Künstliche Intelligenz 2 227 July 12, 2018


Preview: Episodic Decision Theory

I Problem: The environment is partially observable, so we do not know the “current state”.
I Idea: rational decisions = choose actions that maximize expected utility (MEU)
I Treat the result of an action a as a random variable R(a) whose values are the possible outcome states.
I Study P(R(a) = s′ | a, e) given evidence observations e.
I Capture the agent’s preferences in a utility function U from states to the non-negative real numbers.
I Definition 1.2. The expected utility EU(a) of an action a (given evidence e) is then
  EU(a | e) = Σ_{s′} P(R(a) = s′ | a, e) · U(s′)
I Intuitively: A formalization of what it means to “do the right thing”.
I Hooray: This solves all of the AI problem. (in principle)
I Problem: There is a long, long way towards an operationalization. (we do that now)
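A minimal Python sketch of Definition 1.2 (names and numbers are made up for illustration):

    def expected_utility(outcome_probs, U):
        # outcome_probs: dict state -> P(R(a) = s' | a, e);  U: dict state -> utility
        return sum(p * U[s] for s, p in outcome_probs.items())

    print(expected_utility({"success": 0.7, "failure": 0.3},
                           {"success": 10, "failure": -2}))   # 6.4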

Kohlhase: Künstliche Intelligenz 2 228 July 12, 2018


Outline of this Chapter

I Rational preferences
I Utilities and Money
I Multiattribute utilities
I Decision networks
I Value of information

Kohlhase: Künstliche Intelligenz 2 229 July 12, 2018


9.2 Rational Preferences

Kohlhase: Künstliche Intelligenz 2 229 July 12, 2018


Preferences

I Problem: We cannot directly measure the utility of (or satisfaction/happiness in) a state.
I Idea: We can let people choose between two states! (subjective preference)
I Definition 2.1. An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes.
  Lottery L = [p, A; (1−p), B]  (a chance node yielding A with probability p and B with probability 1−p)
I Definition 2.2 (Preferences).
  A ≺ B   A preferred to B
  A ∼ B   indifference between A and B
  A ⪯ B   B not preferred to A

Kohlhase: Künstliche Intelligenz 2 230 July 12, 2018


Rational preferences

I Idea: Preferences of a rational agent must obey constraints:
  Rational preferences ↝ behavior describable as maximization of expected utility
I Definition 2.3. Constraints:
  Orderability       A ≺ B ∨ B ≺ A ∨ A ∼ B
  Transitivity       A ≺ B ∧ B ≺ C ⇒ A ≺ C
  Continuity         A ≺ B ≺ C ⇒ (∃p [p, A; (1−p), C] ∼ B)
  Substitutability   A ∼ B ⇒ [p, A; (1−p), C] ∼ [p, B; (1−p), C]
  Monotonicity       A ≺ B ⇒ ((p ≥ q) ⇔ [p, A; (1−p), B] ⪯ [q, A; (1−q), B])

Kohlhase: Künstliche Intelligenz 2 231 July 12, 2018


Rational preferences contd.

I Violating the constraints leads to self-evident irrationality.
I Example 2.4. An agent with intransitive preferences can be induced to give away all its money:
I If B ≺ C, then an agent who has C would pay (say) 1 cent to get B
I If A ≺ B, then an agent who has B would pay (say) 1 cent to get A
I If C ≺ A, then an agent who has A would pay (say) 1 cent to get C

Kohlhase: Künstliche Intelligenz 2 232 July 12, 2018


9.3 Utilities and Money

Kohlhase: Künstliche Intelligenz 2 232 July 12, 2018


Ramsey’s Theorem and Value Functions

I Theorem 3.1 (Ramsey, 1931; von Neumann and Morgenstern, 1944). Given preferences satisfying the constraints, there exists a real-valued function U such that
  (U(A) ≥ U(B)) ⇔ A ⪯ B   and   U([p1, S1; . . .; pn, Sn]) = Σ_i pi U(Si)
I These are existence theorems; uniqueness is not guaranteed.
I Note: Agent behavior is invariant w.r.t. positive linear transformation, i.e.
  U′(x) = k1 U(x) + k2 where k1 > 0
  behaves exactly like U.
I With deterministic prizes only (no lottery choices), only a total order on prizes can be determined.
I Definition 3.2. We call a total ordering on states a value function or ordinal utility function.
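A tiny Python sketch (illustrative, not the lecture's code) of the lottery-utility equation above and of the invariance under positive linear transformations; the prizes and numbers are made up:

    def lottery_utility(U, lottery):
        # expected utility of a lottery [(p1, S1), ..., (pn, Sn)]
        return sum(p * U(s) for p, s in lottery)

    U = {"A": 3.0, "B": 1.0}.get              # some utility function on prizes
    U_prime = lambda s: 2.0 * U(s) + 5.0      # positive linear transformation of U
    L1, L2 = [(0.8, "A"), (0.2, "B")], [(0.5, "A"), (0.5, "B")]
    print(lottery_utility(U, L1) > lottery_utility(U, L2))              # True
    print(lottery_utility(U_prime, L1) > lottery_utility(U_prime, L2))  # True (same ranking)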

Kohlhase: Künstliche Intelligenz 2 233 July 12, 2018


Maximizing Expected Utility

I Definition 3.3 (MEU principle). Choose the action that maximizes expected utility (MEU).
I Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities.
I Example 3.4. A lookup table for perfect tic-tac-toe.
I But an observer can construct a value function V by observing the agent’s preferences. (even if the agent does not know V)

Kohlhase: Künstliche Intelligenz 2 234 July 12, 2018


Utilities

I Utilities map states to real numbers. Which numbers?
I Definition 3.5 (Standard approach to assessment of human utilities). Compare a given state A to a standard lottery Lp that has
I “best possible prize” u⊤ with probability p
I “worst possible catastrophe” u⊥ with probability 1 − p
  and adjust the lottery probability p until A ∼ Lp. Then U(A) = p.
I Example 3.6. Choose u⊤ = current state, u⊥ = instant death:
  pay $30 ∼ L = [0.999999, continue as before; 0.000001, instant death]

Kohlhase: Künstliche Intelligenz 2 235 July 12, 2018


Utility scales

I Definition 3.7. Normalized utilities: u⊤ = 1, u⊥ = 0
I Definition 3.8. Micromorts: one-millionth chance of death
I Micromorts are useful for Russian roulette, paying to reduce product risks, etc.
I Problem: What is the value of a micromort?
I Ask people directly: What would you pay to avoid playing Russian roulette with a million-barrelled revolver? (very large numbers)
I But their behavior suggests a lower price:
I driving a car for 370 km incurs a risk of one micromort;
I over the life of your car – say, 150,000 km – that’s about 400 micromorts.
I People appear to be willing to pay about €10,000 more for a safer car that halves the risk of death (↝ €25 per micromort).
I This figure has been confirmed across many individuals and risk types.
I Of course, this argument holds only for small risks. Most people won’t agree to kill themselves for €25 million.
I Definition 3.9. QALYs: quality-adjusted life years
I QALYs are useful for medical decisions involving substantial risk.

Kohlhase: Künstliche Intelligenz 2 236 July 12, 2018


Money

I Money does not behave as a utility function.
I Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)), i.e., people are risk-averse.
I Utility curve: for what probability p am I indifferent between a prize x and a lottery [p, M$; (1−p), 0$] for large M?
I Typical empirical data, extrapolated with risk-prone behavior for debtors.
I Empirically: the utility of money comes close to the logarithm on the positive numbers.

Kohlhase: Künstliche Intelligenz 2 237 July 12, 2018
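A small numeric sketch (illustrative only) of this risk-averse behavior under an assumed logarithmic utility of money:

    import math

    U = lambda x: math.log(1 + x)                    # assumed utility of money
    lottery = [(0.5, 1_000_000), (0.5, 0)]           # 50/50 chance of 1,000,000$ or nothing
    EMV = sum(p * x for p, x in lottery)             # expected monetary value: 500,000
    U_of_lottery = sum(p * U(x) for p, x in lottery)
    print(U(EMV) > U_of_lottery)                     # True: U(EMV(L)) > U(L), i.e. risk-averse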


9.4 Multi-Attribute Utility

Kohlhase: Künstliche Intelligenz 2 237 July 12, 2018


Multi-Attribute Utility

I How can we handle utility functions of many variables X1, . . ., Xn?
I Example 4.1 (Assessing an Airport Site).
  [Influence diagram with nodes Air Traffic, Litigation, Construction, Deaths, Noise, Cost]
  What is U(Deaths, Noise, Cost) for a projected airport?
I How can complex utility functions be assessed from preference behaviour?
I Idea 1: identify conditions under which decisions can be made without complete identification of U(x1, . . ., xn)
I Idea 2: identify various types of independence in preferences and derive consequent canonical forms for U(x1, . . ., xn)

Kohlhase: Künstliche Intelligenz 2 238 July 12, 2018


Strict Dominance

I Typically define attributes such that U is monotonic in each argument. (wlog. growing)
I Definition 4.2. Choice B strictly dominates choice A iff Xi(B) ≥ Xi(A) for all i (and hence U(B) ≥ U(A)).
I Strict dominance seldom holds in practice (life is difficult)
I but it is useful for narrowing down the field of contenders.
I For uncertain attributes strict dominance is even more unlikely.

Kohlhase: Künstliche Intelligenz 2 239 July 12, 2018


Stochastic Dominance

I Definition 4.3. Distribution p1 stochastically dominates distribution p2 iff the cumulative distribution of p1 lies below that of p2 for all t, i.e.
  ∫_{−∞}^{t} p1(x) dx ≤ ∫_{−∞}^{t} p2(x) dx

I Example 4.4.
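A small Python sketch (my own illustration, with made-up numbers) of how the definition can be checked for discrete distributions over the same increasing outcome grid:

    from itertools import accumulate

    def stochastically_dominates(p1, p2):
        # p1 dominates p2 iff every prefix sum (cumulative distribution) of p1 is <= that of p2
        return all(c1 <= c2 + 1e-12 for c1, c2 in zip(accumulate(p1), accumulate(p2)))

    p1 = [0.1, 0.2, 0.7]   # hypothetical distributions; p1 puts more mass on the best outcome
    p2 = [0.3, 0.4, 0.3]
    print(stochastically_dominates(p1, p2))   # True
    print(stochastically_dominates(p2, p1))   # False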

Kohlhase: Künstliche Intelligenz 2 240 July 12, 2018


Stochastic dominance contd.

I Observation 4.5. If A1 with outcome distribution p1 stochastically dominates A2 with outcome distribution p2, and U is monotonic in x, then
  ∫_{−∞}^{∞} p1(x) U(x) dx ≥ ∫_{−∞}^{∞} p2(x) U(x) dx
I Multiattribute case: stochastic dominance on all attributes ↝ optimal
I Stochastic dominance can often be determined without exact distributions, using qualitative reasoning
I E.g., construction cost increases with distance from the city; S1 is closer to the city than S2 ↝ S1 stochastically dominates S2 on cost
I E.g., injury increases with collision speed
I Can annotate belief networks with stochastic dominance information
I Definition 4.6. X +→ Y (X positively influences Y) means that P(Y | x1, z) stochastically dominates P(Y | x2, z) for every value z of Y’s other parents Z and all x1 and x2 with x1 ≥ x2.

Kohlhase: Künstliche Intelligenz 2 241 July 12, 2018


Label the arcs + or – for influence in a Bayesian Network

Kohlhase: Künstliche Intelligenz 2 242 July 12, 2018

Preference Structure and Multiattribute Utility

I Observation 4.7. n attributes with d values each ↝ we need dⁿ values to determine the utility function U(x1, . . ., xn). (worst case)
I Assumption: Preferences of real agents have much more structure.
I Approach: identify regularities and prove representation theorems based on these:
  U(x1, . . ., xn) = F(f1(x1), . . ., fn(xn))
  where F is simple, e.g. addition.
I Note the similarity to Bayesian networks, which decompose the full joint probability distribution.

Kohlhase: Künstliche Intelligenz 2 243 July 12, 2018


Preference structure: Deterministic

I Recall: In deterministic environments an agent has a value function.
I Definition 4.8. X1 and X2 are preferentially independent of X3 iff the preference between ⟨x1, x2, x3⟩ and ⟨x′1, x′2, x3⟩ does not depend on x3.
I Example 4.9. In ⟨Noise, Cost, Safety⟩, Noise and Cost are preferentially independent of Safety: ⟨20,000 suffer, 4.6 G$, 0.06 deaths/mpm⟩ vs. ⟨70,000 suffer, 4.2 G$, 0.06 deaths/mpm⟩.
I Theorem 4.10 (Leontief, 1947). If every pair of attributes is preferentially independent of its complement, then every subset of attributes is preferentially independent of its complement: mutual preferential independence.
I Theorem 4.11 (Debreu, 1960). Mutual preferential independence implies that there is an additive value function: V(S) = Σ_i Vi(Xi(S)), where Vi is a value function referencing just one variable Xi.
I Hence assess n single-attribute functions; often a good approximation.
I Example 4.12. The value function for the airport decision might be
  V(noise, cost, deaths) = −noise · 10⁴ − cost − deaths · 10¹²
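A tiny Python sketch of this additive value function; the two candidate sites below use made-up attribute values purely for illustration:

    def airport_value(noise, cost, deaths):
        # additive value function from Example 4.12
        return -noise * 1e4 - cost - deaths * 1e12

    site_1 = airport_value(noise=20_000, cost=4.6e9, deaths=6e-8)   # hypothetical numbers
    site_2 = airport_value(noise=70_000, cost=4.2e9, deaths=6e-8)
    print(site_1 > site_2)   # True: site 1 is preferred under this value function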

Kohlhase: Künstliche Intelligenz 2 244 July 12, 2018


Preference structure: Stochastic

I Need to consider preferences over lotteries and real utility functions (not just value functions).
I Definition 4.13. X is utility-independent of Y iff preferences over lotteries in X do not depend on particular values in Y.
I Definition 4.14. A set X is mutually utility-independent iff each subset is utility-independent of its complement.
I Theorem 4.15. For mutually utility-independent sets there is a multiplicative utility function [Keeney:muf74]; for three attributes:
  U = k1U1 + k2U2 + k3U3 + k1k2U1U2 + k2k3U2U3 + k3k1U3U1 + k1k2k3U1U2U3
I Routine procedures and software packages exist for generating preference tests to identify various canonical families of utility functions.

Kohlhase: Künstliche Intelligenz 2 245 July 12, 2018


9.5 Decision Networks

Kohlhase: Künstliche Intelligenz 2 245 July 12, 2018


Utility-Based Agents (Recap)

I

Kohlhase: Künstliche Intelligenz 2 246 July 12, 2018


Decision networks

I Definition 5.1. Add action nodes and utility nodes (also called value nodes) to belief networks to enable rational decision making.
I Example 5.2 (Choosing an Airport Site).
  Algorithm:
    For each value of the action node:
      compute the expected value of the utility node given the action and the evidence
    Return the MEU action (via argmax)
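A minimal Python sketch of this evaluation loop (my own illustration, not the lecture's code; posterior stands for any BN inference routine, and the node names are hypothetical):

    def meu_action(actions, evidence, posterior, utility):
        # return the action with maximal expected utility, and that utility
        best_action, best_eu = None, float("-inf")
        for a in actions:
            # distribution of the chance node feeding the utility node, given action + evidence
            dist = posterior("Outcome", {**evidence, "Action": a})
            eu = sum(p * utility(outcome, a) for outcome, p in dist.items())
            if eu > best_eu:
                best_action, best_eu = a, eu
        return best_action, best_eu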

Kohlhase: Künstliche Intelligenz 2 247 July 12, 2018


A Decision-Theoretic Expert System for Aortic Coarctation

I

Kohlhase: Künstliche Intelligenz 2 248 July 12, 2018


Knowledge Eng. for Decision-Theoretic Expert Systems

I Create a causal model
I symptoms, disorders, treatments, outcomes, and their influences
I Simplify to a qualitative decision model
I remove variables not involved in treatment decisions
I Assign probabilities (↝ Bayesian network)
I e.g. from patient databases, literature studies, or the expert’s subjective assessments
I Assign utilities (e.g. in QALYs or micromorts)
I Verify and refine the model wrt. a gold standard given by experts
I refine by “running the model backwards” and comparing with the literature
I Perform sensitivity analysis (an important step in practice)
I Is the optimal treatment decision robust against small changes in the parameters? (if yes ↝ great! if not, collect better data)

Kohlhase: Künstliche Intelligenz 2 249 July 12, 2018


9.6 The Value of Information

Kohlhase: Künstliche Intelligenz 2 249 July 12, 2018


What if we do not have all information we need?

I It is well-known that one of the most important parts of decision making is knowing what questions to ask.
I Example 6.1 (Medical Diagnosis).
I We do not expect a doctor to already know the results of the diagnostic tests when the patient comes in.
I Tests are often expensive, and sometimes hazardous (directly or by delaying treatment)
I only test, if
I knowing the results leads to a significantly better treatment plan
I information from test results is not drowned out by the a-priori likelihood.
I Information value theory enables the agent to make such decisions rationally.
I This is a simple form of sequential decision making (the action only impacts the belief state).
I Intuition: With the information, we can adapt the action to the actual situation, rather than to the average.

Kohlhase: Künstliche Intelligenz 2 250 July 12, 2018


Value of Information by Example

I Idea: compute the value of acquiring each possible piece of evidence.
I This can be done directly from the decision network.

I Example 6.2 (Buying Oil Drilling Rights). n blocks of rights, exactly one has oil, worth k
I Prior probabilities 1/n each, mutually exclusive
I Current price of each block is k/n
I “Consultant” offers an accurate survey of block 3. Fair price?

Solution: compute expected value of information = expected value of best actiongiven the information minus expected value of best action without information

I Example 6.3 (Oil Drilling Rights contd.).
I Survey may say “oil in block 3”, prob. 1/n ↝ buy block 3 for k/n, make a profit of

k − k/n.I Survey may say “no oil in block 3” prob. (n− 1)/n ; buy another block make profit

of k/(n − 1)− C/n.I Expected profit is 1

n· (n−1)k

n+ n−1

n· kn(n−1)

= kn

I we should pay up to k/n for the information (as much as block 3 is worth)

Kohlhase: Künstliche Intelligenz 2 251 July 12, 2018


General formula (VPI)

I Current evidence E, current best action α
I Possible action outcomes Si:
  EU(α | E) = max_a Σ_i U(Si) · P(Si | E, a)
I Suppose we knew Ej = ejk (new evidence); then we would choose αejk s.t.
  EU(αejk | E, Ej = ejk) = max_a Σ_i U(Si) · P(Si | E, a, Ej = ejk)
  Ej is a random variable whose value is currently unknown.
I So we must compute the expected gain over all possible values:
  VPI_E(Ej) = (Σ_k P(Ej = ejk | E) · EU(αejk | E, Ej = ejk)) − EU(α | E)
I Definition 6.4. VPI = value of perfect information
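A schematic Python sketch of this formula (illustrative only; posterior_of_Ej and best_expected_utility are hypothetical stand-ins for whatever decision-network routines are available):

    def vpi(Ej_values, evidence, posterior_of_Ej, best_expected_utility):
        # expected utility of the best action with the current evidence only
        eu_now = best_expected_utility(evidence)
        # expected utility averaged over the possible answers Ej = ejk
        eu_with_info = sum(
            posterior_of_Ej(v, evidence) * best_expected_utility({**evidence, "Ej": v})
            for v in Ej_values
        )
        return eu_with_info - eu_now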

Kohlhase: Künstliche Intelligenz 2 252 July 12, 2018


Properties of VPI

I Nonnegative: in expectation, not post hoc: VPI_E(Ej) ≥ 0 for all j and E
I Nonadditive: consider, e.g., obtaining Ej twice:
  VPI_E(Ej, Ek) ≠ VPI_E(Ej) + VPI_E(Ek)
I Order-independent:
  VPI_E(Ej, Ek) = VPI_E(Ej) + VPI_{E,Ej}(Ek) = VPI_E(Ek) + VPI_{E,Ek}(Ej)
I Note: when more than one piece of evidence can be gathered, maximizing VPI for each to select one is not always optimal; evidence-gathering becomes a sequential decision problem.

Kohlhase: Künstliche Intelligenz 2 253 July 12, 2018


Questionnaire: Qualitative behaviors

I Question: Say we have three distributions for P(U | Ej)

What is the value of information in these three cases?

I Answers: qualitatively
a) Choice is obvious (a1 almost certainly better) ↝ information worth little
b) Choice is nonobvious (unclear) ↝ information worth a lot
c) Choice is nonobvious (unclear) but makes little difference ↝ information worth little
  The fact that U2 has a high peak in (c) means that its expected value is known with higher certainty than U1. (irrelevant to the argument)

Kohlhase: Künstliche Intelligenz 2 254 July 12, 2018

A simple Information-Gathering Agent

I Definition 6.5. A simple Information-Gathering Agent (gathers information before acting)

  function Information-Gathering-Agent(percept) returns an action
    persistent: D, a decision network
    integrate percept into D
    j := argmax_k (VPI_E(Ek) / Cost(Ek))
    if VPI_E(Ej) > Cost(Ej) then return Request(Ej)
    else return the best action from D

  The next percept after Request(Ej) provides a value for Ej.
I Problem: The information gathering implemented here is myopic, i.e. it calculates VPI as if only a single evidence variable will be acquired. (cf. greedy search)
I But it works relatively well in practice. (e.g. it outperforms humans for selecting diagnostic tests)

Kohlhase: Künstliche Intelligenz 2 255 July 12, 2018


Chapter 10 Temporal Probability Models

Kohlhase: Künstliche Intelligenz 2 255 July 12, 2018


Outline

I Time and uncertainty
I Inference: filtering, prediction, smoothing
I Hidden Markov models
I Dynamic Bayesian networks
I Particle filtering
I Further Algorithms and Topics

Kohlhase: Künstliche Intelligenz 2 256 July 12, 2018


10.1 Modeling Time and Uncertainty

Kohlhase: Künstliche Intelligenz 2 256 July 12, 2018


Time and uncertainty

I Observation 1.1. The world changes; we need to track and predict it.
I Example 1.2. Diabetes management vs. vehicle diagnosis
I Definition 1.3. A temporal probability model is a probability model where the possible worlds are indexed by a time structure ⟨S, ⪯⟩.
I We restrict ourselves to linear, discrete time structures, i.e. ⟨S, ⪯⟩ = ⟨N, ≤⟩. (Step size irrelevant for the theory, depends on the problem in practice)
I Basic idea: index random variables by N.
I Xt = set of unobservable state variables at time t, e.g., BloodSugar_t, StomachContents_t, etc.
I Et = set of observable evidence variables at time t, e.g., MeasuredBloodSugar_t, PulseRate_t, FoodEaten_t
I Example 1.4 (Umbrellas). You are a security guard in a secret underground facility and want to know if it is raining outside. Your only source of information is whether the director comes in with an umbrella.
  State variables R1, R2, R3, . . ., Observations U1, U2, U3, . . .
I Notation: X_{a:b} = Xa, Xa+1, . . ., Xb−1, Xb

Kohlhase: Künstliche Intelligenz 2 257 July 12, 2018


Markov Processes

I Construct a Bayesian network from these variables: parents?

I Definition 1.5. Markov property: Xt only depends on a bounded subset of X_{0:t−1}.
I Definition 1.6. A (discrete-time) Markov process (also called Markov chain) is a sequence of random variables with the Markov property.

I Definition 1.7. First-order Markov process: P(Xt | X0:t−1) = P(Xt | Xt−1)

Second-order Markov process: P(Xt | X0:t−1) = P(Xt | Xt−2,Xt−1)

I We will use Markov processes to model sequential environments.

Kohlhase: Künstliche Intelligenz 2 258 July 12, 2018

Markov Process Example: The Umbrella

I Example 1.8 (Umbrellas continued). We model the situation in a Bayesian network:
  Rain_{t−1} → Rain_t → Rain_{t+1}, with Umbrella_{t−1}, Umbrella_t, Umbrella_{t+1} as children of the respective Rain variables.
I Problem: First-order Markov assumption not exactly true in the real world!
I Possible fixes:
  1. Increase the order of the Markov process
  2. Augment the state, e.g., add Temp_t, Pressure_t
I Example 1.9 (Robot Motion). Augment Position_t and Velocity_t with Battery_t.

Kohlhase: Künstliche Intelligenz 2 259 July 12, 2018


Stationary Markov Processes as Transition Models

I Definition 1.10. We divide the random variables in a Markov process M into a set of (hidden) state variables Xt and a set of (observable) evidence variables Et. We call P(Xt | Xt−1) the transition model and P(Et | Xt) the sensor model of M.
I Problem: Even with the Markov assumption the transition model is infinite (t ∈ N).
I Definition 1.11. A Markov process is called stationary if the transition model is independent of time, i.e. P(Xt | Xt−1) is the same for all t.
I Example 1.12 (Umbrellas are stationary). P(Rt | Rt−1) does not depend on t (we need only one table):
  Rain_{t−1} → Rain_t → Rain_{t+1}, each observed via Umbrella_t
  R_{t−1}   P(Rt)
  T         0.7
  F         0.3
I Don’t confuse “stationary” (processes) with “static” (environments).
I We restrict ourselves to stationary Markov processes in this course.
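A minimal Python sketch (illustrative only) of sampling from this stationary umbrella transition model:

    import random

    P_RAIN = {True: 0.7, False: 0.3}   # P(Rain_t = true | Rain_{t-1}), from the table above

    def sample_chain(r0, steps):
        # sample a trajectory of the Rain variable from the stationary Markov chain
        states = [r0]
        for _ in range(steps):
            states.append(random.random() < P_RAIN[states[-1]])
        return states

    print(sample_chain(True, 10))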

Kohlhase: Künstliche Intelligenz 2 260 July 12, 2018

Markov Sensor Models

I Recap: The sensor model predicts the influence of percepts (and the world state) on the belief state; it is used during the update step.
I Problem: The evidence variables Et could depend on previous variables as well as on the current state.
I We restrict the dependency to the current state (otherwise the state representation would be deficient).
I Definition 1.13. We say that a sensor model has the sensor Markov property iff P(Et | X0:t, E0:t−1) = P(Et | Xt).
I Assumptions on Sensor Models: We usually assume the sensor Markov property and make the sensor model stationary as well: P(Et | Xt) is fixed for all t.


Umbrellas, the full Story

I Example 1.14 (Umbrellas, Transition & Sensor Models). The Bayesian network Raint−1 → Raint → Raint+1 with Raint → Umbrellat has the conditional probability tables

  Rt−1 | P(Rt)        Rt | P(Ut)
  -----+------        ---+------
   T   | 0.7           T | 0.9
   F   | 0.3           F | 0.2

  Note that the influence goes from Raint to Umbrellat (the causal direction).
I Observation 1.15. If we additionally know the initial prior probabilities P(X0) (i.e. at time t = 0), then we can compute the full joint probability distribution as

  P(X0:t, E0:t) = P(X0) · ∏_{i=1}^{t} P(Xi | Xi−1) · P(Ei | Xi)
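
As a sanity check, here is a minimal Python sketch that evaluates this product for a concrete short state/evidence sequence; it assumes the CPT numbers above and the uniform prior P(R0) = 〈0.5, 0.5〉 used later in Example 2.5.

  # Evaluate P(X_{0:2}, E_{1:2}) for the umbrella model (CPTs from Example 1.14).
  P0  = {True: 0.5, False: 0.5}                     # assumed prior P(R_0)
  P_R = {True: 0.7, False: 0.3}                     # P(R_t = T | R_{t-1})
  P_U = {True: 0.9, False: 0.2}                     # P(U_t = T | R_t)

  def trans(r_prev, r):                             # transition model P(r | r_prev)
      p = P_R[r_prev]
      return p if r else 1 - p

  def sensor(r, u):                                 # sensor model P(u | r)
      p = P_U[r]
      return p if u else 1 - p

  def full_joint(rains, umbrellas):                 # rains = [r0, ..., rt], umbrellas = [u1, ..., ut]
      prob = P0[rains[0]]
      for i in range(1, len(rains)):
          prob *= trans(rains[i - 1], rains[i]) * sensor(rains[i], umbrellas[i - 1])
      return prob

  print(full_joint([True, True, True], [True, True]))   # 0.5 * 0.7 * 0.9 * 0.7 * 0.9 = 0.19845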


10.2 Inference: Filtering, Prediction, and Smoothing


Inference tasks

I Definition 2.1. Filtering (or monitoring): P(Xt | e1:t) – computing the current belief state; the input to the decision process of a rational agent.
I Definition 2.2. Prediction (or state estimation): P(Xt+k | e1:t) for k > 0 – evaluation of possible action sequences (= filtering without the evidence).
I Definition 2.3. Smoothing (or hindsight): P(Xk | e1:t) for 0 ≤ k < t – a better estimate of past states (essential for learning).
I Definition 2.4. Most likely explanation: argmax_{x1:t} P(x1:t | e1:t) – used e.g. in speech recognition and decoding with a noisy channel.


Filtering

I Aim: recursive state estimation: P(Xt+1 | e1:t+1) = f(et+1, P(Xt | e1:t))
I Project the current distribution forward from t to t + 1:

  P(Xt+1 | e1:t+1) = P(Xt+1 | e1:t, et+1)                        (dividing up the evidence)
                   = α · P(et+1 | Xt+1, e1:t) · P(Xt+1 | e1:t)    (using Bayes' rule)
                   = α · P(et+1 | Xt+1) · P(Xt+1 | e1:t)          (sensor Markov assumption)

I Note that P(et+1 | Xt+1) can be obtained directly from the sensor model.
I Continue by conditioning on the current state Xt:

  P(Xt+1 | e1:t+1) = α · P(et+1 | Xt+1) · ∑_{xt} P(Xt+1 | xt, e1:t) · P(xt | e1:t)
                   = α · P(et+1 | Xt+1) · ∑_{xt} P(Xt+1 | xt) · P(xt | e1:t)

I P(Xt+1 | Xt) is simply the transition model, and P(xt | e1:t) is the "recursive call".
I So f1:t+1 = α · FORWARD(f1:t, et+1) where f1:t = P(Xt | e1:t) and FORWARD is the update shown above. (Time and space per update are constant, i.e. independent of t.)


Filtering the Umbrellas

I Example 2.5. Say the guard believes P(R0) = 〈0.5, 0.5〉. On days 1 and 2 the umbrella appears. Prediction from t = 0 to t = 1:

  P(R1) = ∑_{r0} P(R1 | r0) · P(r0) = 〈0.7, 0.3〉 · 0.5 + 〈0.3, 0.7〉 · 0.5 = 〈0.5, 0.5〉

  Updating with the evidence for t = 1 gives

  P(R1 | u1) = α · P(u1 | R1) · P(R1) = α · 〈0.9, 0.2〉 · 〈0.5, 0.5〉 = α · 〈0.45, 0.1〉 ≈ 〈0.818, 0.182〉
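
A minimal Python sketch of this FORWARD update for the umbrella model (transition 0.7/0.3, sensor 0.9/0.2, prior 〈0.5, 0.5〉 as above); it reproduces the 〈0.818, 0.182〉 belief for day 1 and continues to day 2.

  # FORWARD update for the two-state umbrella model; f = <P(rain), P(not rain)>.
  def normalize(v):
      s = sum(v)
      return [x / s for x in v]

  def forward(f, umbrella, T=((0.7, 0.3), (0.3, 0.7)), sensor=(0.9, 0.2)):
      # predict: P(X_{t+1}) = sum_x P(X_{t+1} | x) * P(x | e_{1:t})
      pred = [T[0][j] * f[0] + T[1][j] * f[1] for j in range(2)]
      # update: multiply by P(e_{t+1} | X_{t+1}) and renormalize
      like = [sensor[j] if umbrella else 1 - sensor[j] for j in range(2)]
      return normalize([like[j] * pred[j] for j in range(2)])

  f = [0.5, 0.5]                     # prior P(R_0)
  f = forward(f, True); print(f)     # day 1: ~[0.818, 0.182]
  f = forward(f, True); print(f)     # day 2: ~[0.883, 0.117]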


Prediction

I Prediction computes future state distributions P(Xt+k | e1:t) for k > 0.
I Intuition: Prediction is filtering without new evidence.
I Lemma 2.6. P(Xt+k+1 | e1:t) = ∑_{xt+k} P(Xt+k+1 | xt+k) · P(xt+k | e1:t)
I Proof Sketch: By the same reasoning as for the FORWARD algorithm for filtering.
I Observation 2.7. As k → ∞, P(xt+k | e1:t) tends to the stationary distribution of the Markov chain, i.e. a fixed point under prediction.
I The mixing time, i.e. the time until prediction reaches the stationary distribution, depends on how "stochastic" the chain is.


Smoothing

I Smoothing estimates past states by computing P(Xk | e1:t) for 0 ≤ k < t.
I Divide the evidence e1:t into e1:k (up to k) and ek+1:t (after k):

  P(Xk | e1:t) = P(Xk | e1:k, ek+1:t)
               = α · P(Xk | e1:k) · P(ek+1:t | Xk, e1:k)   (Bayes' rule)
               = α · P(Xk | e1:k) · P(ek+1:t | Xk)          (conditional independence)
               = α · f1:k · bk+1:t


Smoothing (continued)

I The backward message bk+1:t = P(ek+1:t | Xk) is computed by a backwards recursion:

  P(ek+1:t | Xk) = ∑_{xk+1} P(ek+1:t | Xk, xk+1) · P(xk+1 | Xk)
                 = ∑_{xk+1} P(ek+1:t | xk+1) · P(xk+1 | Xk)
                 = ∑_{xk+1} P(ek+1, ek+2:t | xk+1) · P(xk+1 | Xk)
                 = ∑_{xk+1} P(ek+1 | xk+1) · P(ek+2:t | xk+1) · P(xk+1 | Xk)

  P(ek+1 | xk+1) and P(xk+1 | Xk) can be obtained directly from the model; P(ek+2:t | xk+1) is the "recursive call" (bk+2:t).
I In message notation: bk+1:t = BACKWARD(bk+2:t, ek+1) where BACKWARD is the update shown above. (Time and space per update are constant, i.e. independent of t.)


Smoothing example

I Example 2.8 (Smoothing Umbrellas). The umbrella appears on days 1 and 2.
I P(R1 | u1, u2) = α · P(R1 | u1) · P(u2 | R1) = α · 〈0.818, 0.182〉 · P(u2 | R1)
I Compute P(u2 | R1) by the backwards recursion:

  P(u2 | R1) = ∑_{r2} P(u2 | r2) · P( | r2) · P(r2 | R1)     (P( | r2) = 1 is the empty backward message)
             = 0.9 · 1 · 〈0.7, 0.3〉 + 0.2 · 1 · 〈0.3, 0.7〉 = 〈0.69, 0.41〉

I So P(R1 | u1, u2) = α · 〈0.818, 0.182〉 · 〈0.69, 0.41〉 ≈ 〈0.883, 0.117〉
I Smoothing gives a higher probability for rain on day 1 than filtering did: the umbrella on day 2 makes rain more likely on day 2, and hence rain more likely on day 1 as well.
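
A short Python sketch of this computation, combining the forward message from Example 2.5 with one backward step (model numbers as above); it reproduces the 〈0.69, 0.41〉 and 〈0.883, 0.117〉 values.

  # Smoothing P(R_1 | u_1, u_2) for the umbrella model (states: 0 = rain, 1 = no rain).
  def normalize(v):
      s = sum(v)
      return [x / s for x in v]

  T = ((0.7, 0.3), (0.3, 0.7))        # T[i][j] = P(X_t = j | X_{t-1} = i)
  def obs(umbrella):                   # P(e | state) for both states
      return [0.9, 0.2] if umbrella else [0.1, 0.8]

  def backward(b, umbrella):           # one step of the backwards recursion: b_{k+1:t} from b_{k+2:t} and e_{k+1}
      like = obs(umbrella)
      return [sum(like[j] * b[j] * T[i][j] for j in range(2)) for i in range(2)]

  f1 = [0.818, 0.182]                 # forward message P(R_1 | u_1) from Example 2.5
  b  = backward([1.0, 1.0], True)     # backward message P(u_2 | R_1) = [0.69, 0.41]
  print(b)
  print(normalize([f1[i] * b[i] for i in range(2)]))   # ~[0.883, 0.117]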


Forward/Backward Algorithm for Smoothing

I Definition 2.9. Forward-backward algorithm: cache the forward messages along the way.

  function Forward−Backward(ev, prior) returns a vector of probability distributions
    inputs: ev, a vector of evidence values for steps 1, . . . , t
            prior, the prior distribution on the initial state, P(X0)
    local:  fv, a vector of forward messages for steps 0, . . . , t
            b, a representation of the backward message, initially all 1s
            sv, a vector of smoothed estimates for steps 1, . . . , t
    fv[0] := prior
    for i = 1 to t do
      fv[i] := FORWARD(fv[i − 1], ev[i])
    for i = t downto 1 do
      sv[i] := NORMALIZE(fv[i] · b)
      b := BACKWARD(b, ev[i])
    return sv

I Time is linear in t (polytree inference), space is O(t · #(f)).
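
A compact runnable rendering of this pseudocode for the two-state umbrella model, reusing the FORWARD/BACKWARD updates sketched above; the smoothed estimates for days 1 and 2 both come out at about 0.883, matching Example 2.8.

  # Forward-backward smoothing for the umbrella chain (states: 0 = rain, 1 = no rain).
  T = ((0.7, 0.3), (0.3, 0.7))
  def obs(u): return [0.9, 0.2] if u else [0.1, 0.8]
  def normalize(v): s = sum(v); return [x / s for x in v]

  def forward(f, u):
      pred = [sum(T[i][j] * f[i] for i in range(2)) for j in range(2)]
      return normalize([obs(u)[j] * pred[j] for j in range(2)])

  def backward(b, u):
      return [sum(obs(u)[j] * b[j] * T[i][j] for j in range(2)) for i in range(2)]

  def forward_backward(ev, prior):
      t = len(ev)
      fv = [prior]
      for i in range(1, t + 1):                      # forward pass, caching messages
          fv.append(forward(fv[i - 1], ev[i - 1]))
      b, sv = [1.0, 1.0], [None] * (t + 1)
      for i in range(t, 0, -1):                      # backward pass, combining on the fly
          sv[i] = normalize([fv[i][s] * b[s] for s in range(2)])
          b = backward(b, ev[i - 1])
      return sv[1:]

  print(forward_backward([True, True], [0.5, 0.5]))  # day 1 ~[0.883, 0.117], day 2 ~[0.883, 0.117]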


Most Likely Explanation

I Observation 2.10. The most likely sequence ≠ the sequence of most likely states!
I Example 2.11. Suppose the umbrella sequence is T,T,F,T,T – what is the most likely weather sequence?
I Idea: Use smoothing to find the posterior distribution at each time step, then construct the sequence of most likely states.
I Problem: These posterior distributions range over a single time step each (and this difference matters).
I Example 2.12. (figure not reproduced)


Most Likely Explanation (continued)

I The most likely path to each xt+1 = the most likely path to some xt plus one more step:

  max_{x1,...,xt} P(x1, . . . , xt, Xt+1 | e1:t+1)
    = P(et+1 | Xt+1) · max_{xt} (P(Xt+1 | xt) · max_{x1,...,xt−1} P(x1, . . . , xt−1, xt | e1:t))

I Identical to filtering, except that f1:t is replaced by

  m1:t = max_{x1,...,xt−1} P(x1, . . . , xt−1, Xt | e1:t)

  I.e., m1:t(i) gives the probability of the most likely path to state i.
I The update has the sum replaced by max, giving the Viterbi algorithm:

  m1:t+1 = P(et+1 | Xt+1) · max_{xt} (P(Xt+1 | xt) · m1:t)

I Observation 2.13. Viterbi has linear time complexity (like filtering), but also linear space complexity (it needs to keep a pointer to the most likely sequence leading to each state).
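
A minimal Viterbi sketch for the umbrella model; on the evidence sequence T,T,F,T,T from Example 2.11 it returns rain on days 1–2, no rain on day 3, and rain on days 4–5.

  # Viterbi for the umbrella HMM: most likely state sequence given the evidence.
  T = ((0.7, 0.3), (0.3, 0.7))                  # T[i][j] = P(X_t = j | X_{t-1} = i); 0 = rain, 1 = no rain
  def obs(u): return (0.9, 0.2) if u else (0.1, 0.8)

  def viterbi(evidence, prior=(0.5, 0.5)):
      # m[j]: probability of the most likely path ending in state j; back[t][j]: its predecessor
      m = [obs(evidence[0])[j] * sum(T[i][j] * prior[i] for i in range(2)) for j in range(2)]
      back = []
      for e in evidence[1:]:
          prev, m, ptr = m, [], []
          for j in range(2):
              i_best = max(range(2), key=lambda i: T[i][j] * prev[i])
              m.append(obs(e)[j] * T[i_best][j] * prev[i_best])
              ptr.append(i_best)
          back.append(ptr)
      path = [max(range(2), key=lambda j: m[j])]  # backtrack from the best final state
      for ptr in reversed(back):
          path.append(ptr[path[-1]])
      return ["rain" if s == 0 else "no rain" for s in reversed(path)]

  print(viterbi([True, True, False, True, True]))   # ['rain', 'rain', 'no rain', 'rain', 'rain']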


10.3 Hidden Markov Models


Hidden Markov Models

I In an HMM, Xt is a single, discrete variable (usually Et is too); the domain of Xt is {1, . . . , S}.
I Observation: The transition model P(Xt | Xt−1) can then be described by a single S × S matrix.
I Definition 3.1. Transition matrix Tij = P(Xt = j | Xt−1 = i).
I Example 3.2. For the umbrella example: T = P(Xt | Xt−1) = (0.7 0.3; 0.3 0.7).
I Definition 3.3. Sensor matrix Ot for each time step: a diagonal matrix with diagonal elements P(et | Xt = i).
I Example 3.4. With U1 = T and U3 = F we have O1 = diag(0.9, 0.2) and O3 = diag(0.1, 0.8).
I Definition 3.5. With the forward and backward messages as column vectors:
  HMM filtering equation:  f1:t+1 = α · (Ot+1 T⊤ f1:t)
  HMM smoothing equation:  bk+1:t = T Ok+1 bk+2:t
I The forward-backward algorithm needs time O(S² t) and space O(S t).
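
A small numpy sketch of these matrix equations for the umbrella HMM (prior 〈0.5, 0.5〉 assumed as before); it reproduces the same day-1/day-2 numbers as the message-passing version.

  import numpy as np

  # Umbrella HMM in matrix form: T[i, j] = P(X_t = j | X_{t-1} = i), O_e = diag of P(e | X_t = i).
  T = np.array([[0.7, 0.3],
                [0.3, 0.7]])
  def O(umbrella):
      return np.diag([0.9, 0.2]) if umbrella else np.diag([0.1, 0.8])

  def filter_step(f, e):                      # f_{1:t+1} = alpha * O_{t+1} T^T f_{1:t}
      f_new = O(e) @ T.T @ f
      return f_new / f_new.sum()

  def backward_step(b, e):                    # b_{k+1:t} = T O_{k+1} b_{k+2:t}
      return T @ O(e) @ b

  f = np.array([0.5, 0.5])                    # prior P(R_0)
  f1 = filter_step(f, True)                   # ~[0.818, 0.182]
  f2 = filter_step(f1, True)                  # ~[0.883, 0.117]
  b  = backward_step(np.ones(2), True)        # P(u_2 | R_1) = [0.69, 0.41]
  print(f1, f2, b)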


HMM Example: Robot Localization

I Example 3.6 (Robot Localization in a Maze). A robot has four sonar sensors that tell it about obstacles in the four directions N, S, W, E.
I Notation: We write the percept where the sensors detect obstacles in the north, south, and east as NSE.
I Example 3.7 (Filter out Impossible States). (figures not reproduced)
  a) Possible robot locations after E1 = NSE
  b) Possible robot locations after E1 = NSE and E2 = NS


HMM Example: Robot Localization

I Example 3.8 (HMM-based Robot Localization).
  I Random variable Xt for the robot location (domain: the 42 empty squares).
  I Transition matrix for the move action (T has 42² = 1764 entries):

    P(Xt+1 = j | Xt = i) = Tij = 1/|N(i)| if j ∈ N(i), and 0 otherwise

    where N(i) is the set of neighboring squares of i.
  I We do not know where the robot starts: P(X0) = 1/n (here n = 42).
  I Sensor variable Et: four-bit presence/absence of obstacles in N, S, W, E. Let dit be the number of wrong bits and ε the error rate of the sensor. Then

    P(Et = et | Xt = i) = Otii = (1 − ε)^(4−dit) · ε^dit

  I For instance, the probability that the sensor on a square with obstacles in the north and south would produce NSE is (1 − ε)³ · ε¹.
I Use the HMM filtering equation f1:t+1 = α · (Ot+1 T⊤ f1:t) for localization. (next slide)
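
A short sketch of this sensor model; the bit-vector encoding of the NSWE readings is a hypothetical choice for illustration. With ε = 0.2 (the value used in Example 3.9) it yields the (1 − ε)³ε = 0.1024 probability from the example.

  # Sensor model for the maze robot: percepts and true wall configurations as 4-bit tuples (N, S, W, E).
  def sensor_prob(percept, walls, eps=0.2):
      wrong = sum(p != w for p, w in zip(percept, walls))   # number of bits the noisy sensor got wrong
      return (1 - eps) ** (4 - wrong) * eps ** wrong

  walls_i = (1, 1, 0, 0)             # a square with obstacles in the north and south
  percept = (1, 1, 0, 1)             # the sensor reports NSE (east bit wrong)
  print(sensor_prob(percept, walls_i))   # (1 - 0.2)^3 * 0.2 = 0.1024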


HMM Example: Robot Localization

I Idea: Use the HMM filtering equation f1:t+1 = α · (Ot+1 T⊤ f1:t) to compute the posterior distribution over locations (i.e. robot localization).
I Example 3.9. We come back to the maze of Example 3.6, with ε = 0.2. (figures not reproduced)
  a) Posterior distribution over the robot location after E1 = NSE
  b) Posterior distribution over the robot location after E1 = NSE and E2 = NS
  The most likely locations are still the same as in the "perfect sensing" case, but now other locations have non-zero probability.


HMM Example: Further Inference Applications

I Idea: Use smoothing (bk+1:t = T Ok+1 bk+2:t) to find out where the robot started, and the Viterbi algorithm to find the most likely path it took.
I Example 3.10. Performance of HMM localization vs. observation length for various error rates ε. (plots not reproduced)
  Left: localization error (Manhattan distance from the true location).
  Right: Viterbi path accuracy (fraction of correct states on the Viterbi path).


Country dance algorithm

I We can avoid storing all forward messages in smoothing by running the forward algorithm backwards:

  f1:t+1 = α · (Ot+1 T⊤ f1:t)
  Ot+1⁻¹ f1:t+1 = α · (T⊤ f1:t)
  α′ · ((T⊤)⁻¹ Ot+1⁻¹ f1:t+1) = f1:t

I Algorithm: the forward pass computes f1:t, the backward pass computes f1:i and bt−i:t together.
I Observation: the backwards pass only needs to store one copy of f1:i and bt−i:t, so it runs in constant space.
I Problem: the algorithm is severely limited: the transition matrix must be invertible, and the sensor matrix cannot have zeroes – that is, every observation must be possible in every state.


10.4 Dynamic Bayesian Networks


Dynamic Bayesian networks

I Definition 4.1. A Bayesian network D is called dynamic (a DBN), iff its random variables are indexed by a time structure. We assume that its structure is
  I time-sliced, i.e. that the time slices Dt – the subgraphs of t-indexed random variables and the edges between them – are isomorphic, and
  I a first-order Markov process, i.e. that variables Xt can only have parents in Dt and Dt−1.
I Xt and Et may contain arbitrarily many variables in such a replicated Bayesian network.
I Example 4.2. (figure not reproduced)


DBNs vs. HMMs

I Every HMM is a single-variable DBN; every discrete DBN is an HMM.
I Sparse dependencies mean exponentially fewer parameters:
  e.g., with 20 Boolean state variables and three parents each, the DBN has 20 · 2³ = 160 parameters, whereas the corresponding HMM has 2²⁰ · 2²⁰ ≈ 10¹².


Exact inference in DBNs

I Definition 4.3 (Naive method). Unroll the network and run any exact inference algorithm.
I Problem: the inference cost for each update grows with t.
I Definition 4.4. Rollup filtering: add slice t + 1, then "sum out" slice t using variable elimination.
I The largest factor is O(d^(n+1)), the update cost O(d^(n+2)) (cf. the HMM update cost of O(d^(2n))).


Summary

I Temporal models use state and sensor variables replicated over time.
I Under the Markov and stationarity assumptions we only need
  I a transition model P(Xt | Xt−1) and
  I a sensor model P(Et | Xt).
I The inference tasks are filtering, prediction, smoothing, and most likely sequence; all can be done recursively with constant cost per time step.
I Hidden Markov models have a single discrete state variable; they are used e.g. for speech recognition.
I Dynamic Bayesian networks subsume HMMs; exact update is intractable.
I Particle filtering is a good approximate filtering algorithm for DBNs.


Chapter 11 Making Complex Decisions


Outline

I Markov Decision Problems (for sequential environments)
I Value/policy iteration for computing utilities in MDPs
I Partially Observable MDPs (POMDPs)
I Decision-theoretic agents for POMDPs


11.1 Sequential Decision Problems


Sequential decision problems

I In sequential decision problems, the agent's utility depends on a sequence of decisions.
I Sequential decision problems incorporate utilities, uncertainty, and sensing.
I Search and planning problems are special cases.


Example MDP

I Example 1.1. A (fully observable) environment with uncertain actions. (figure not reproduced)
I States s ∈ S, actions a ∈ A.
I Transition model: P(s′ | s, a) = probability that doing a in s leads to s′.
I Reward function:

  R(s, a, s′) := −0.04 (a small penalty) for nonterminal states, ±1 for terminal states


Markov Decision Process

I Definition 1.2. A sequential decision problem in a fully observable, stochastic environment with a Markovian transition model and an additive reward function is called a Markov decision process (MDP). It consists of
  I a set S of states (with initial state s0 ∈ S),
  I sets Actions(s) of actions for each state s,
  I a transition model P(s′ | s, a), and
  I a reward function R : S → ℝ.


Solving MDPs

I In search problems, the aim is to find an optimal sequence of actions.
I In MDPs, the aim is to find an optimal policy π(s), i.e. the best action for every possible state s (because the agent cannot predict where it will end up).
I The optimal policy maximizes (say) the expected sum of rewards.
I Example 1.3. Optimal policy when the state penalty R(s) is −0.04. (figure not reproduced)


Risk and Reward (Example and Questionnaire)

I Example 1.4. The optimal policy depends on the reward R(s) on non-terminal states. (figure not reproduced: the 4x3 grid with terminal states +1 and −1, and the optimal policies for four ranges of R(s))
  I −∞ < R(s) < −1.6284: Life is so painful that the agent heads for the nearest exit, even the −1 state.
  I −0.4278 < R(s) < −0.0850: Life is quite unpleasant; the agent takes the shortest route to the +1 state and is willing to risk falling into the −1 state by accident. In particular, it takes the shortcut from (3,1).
  I −0.0221 < R(s) < 0: Life is slightly dreary; the agent takes no risks at all. In (4,1) and (3,2) it heads directly away from the −1 state, so it cannot fall in by accident.
  I R(s) > 0: Life is positively enjoyable, so the agent avoids both exits and reaps infinite rewards.
I Remark: This careful balancing of risk and reward is characteristic of MDPs.


11.2 Utilities over Time


Utility of state sequences

I We need to understand preferences between sequences of states.
I Definition 2.1. We call preferences on reward sequences stationary, iff

  [r, r0, r1, r2, . . .] ≺ [r, r′0, r′1, r′2, . . .]  ⇔  [r0, r1, r2, . . .] ≺ [r′0, r′1, r′2, . . .]

I Theorem 2.2. For stationary preferences, there are only two ways to combine rewards over time:
  I additive rewards: U([s0, s1, . . .]) = R(s0) + R(s1) + · · ·
  I discounted rewards: U([s0, s1, s2, . . .]) = R(s0) + γ R(s1) + γ² R(s2) + · · · , where γ is the discount factor.


Utilities contd.

I Problem: with infinite lifetimes, additive utilities become infinite. Possible solutions:
  1. Finite horizon: termination at a fixed time T leads to a nonstationary policy: π(s) depends on the time left.
  2. Absorbing states: if for any policy π the agent eventually "dies" with probability 1, then the expected utility of every state is finite.
  3. Discounting: assuming γ < 1 and R(s) ≤ Rmax,

     U([s0, . . ., s∞]) = ∑_{t=0}^{∞} γ^t R(st) ≤ ∑_{t=0}^{∞} γ^t Rmax = Rmax/(1 − γ)

     A smaller γ means a shorter effective horizon.


Utility of States

I Intuition: The utility of a state is the expected (discounted) sum of rewards (until termination), assuming optimal actions.
I Definition 2.3. Given a policy π, let St be the state the agent reaches at time t starting at state s0. Then the expected utility obtained by executing π starting in s is given by

  Uπ(s) = E(∑_{t=0}^{∞} γ^t R(St))

  and we define π∗s := argmax_π Uπ(s).
I Observation 2.4. π∗s is independent of s.
I Proof Sketch: If π∗a and π∗b reach a point c, then there is no reason for them to disagree with each other – or with π∗c – from c onwards.
I Definition 2.5. We call π∗ := π∗s for some s the optimal policy.
I Note: This independence does not hold for finite-horizon policies.


Utility of States (continued)

I Definition 2.6. The utility U(s) of a state s is Uπ∗(s).

I Remark: R(s) is the "short-term reward", whereas U(s) is the "long-term reward".
I Given the utilities of the states, choosing the best action is just MEU: maximize the expected utility of the immediate successors:

  π∗(s) = argmax_{a∈A(s)} ∑_{s′} P(s′ | s, a) · U(s′)


11.3 Value/Policy Iteration


Dynamic programming: the Bellman equation

I The definition of the utility of states leads to a simple relationship among the utilities of neighboring states:

  expected sum of rewards = current reward + γ · expected reward sum after the best action

I Theorem 3.1 (Bellman equation (1957)).

  U(s) = R(s) + γ · max_a ∑_{s′} U(s′) · T(s, a, s′)

I Example 3.2.

  U(1, 1) = −0.04 + γ · max( 0.8 U(1, 2) + 0.1 U(2, 1) + 0.1 U(1, 1),   (up)
                             0.9 U(1, 1) + 0.1 U(1, 2),                 (left)
                             0.9 U(1, 1) + 0.1 U(2, 1),                 (down)
                             0.8 U(2, 1) + 0.1 U(1, 2) + 0.1 U(1, 1) )  (right)

I Problem: One equation per state gives n nonlinear equations in n unknowns (max is not linear), so we cannot use linear algebra techniques to solve them.


Value Iteration Algorithm I

I Idea:
  1. Start with arbitrary utility values.
  2. Update them to make them locally consistent with the Bellman equation.
  3. Everywhere locally consistent means globally optimal.
I Definition 3.3. The value iteration algorithm for utility functions is given by

  function VALUE−ITERATION(mdp, ε) returns a utility function
    inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a),
            rewards R(s), and discount γ
            ε, the maximum error allowed in the utility of any state
    local variables: U, U′, vectors of utilities for states in S, initially zero
                     δ, the maximum change in the utility of any state in an iteration
    repeat
      U := U′; δ := 0
      for each state s in S do
        U′[s] := R(s) + γ · max_a ∑_{s′} U[s′] · P(s′ | s, a)
        if |U′[s] − U[s]| > δ then δ := |U′[s] − U[s]|
    until δ < ε(1 − γ)/γ
    return U
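
A minimal runnable rendering of this pseudocode; the tiny two-state MDP used to exercise it is made up for illustration and is not the 4x3 world.

  # Value iteration, directly following the pseudocode above.
  def value_iteration(S, A, P, R, gamma, eps):
      U1 = {s: 0.0 for s in S}
      while True:
          U, delta = dict(U1), 0.0
          for s in S:
              U1[s] = R[s] + gamma * max(sum(p * U[s2] for s2, p in P[s][a].items())
                                         for a in A(s))
              delta = max(delta, abs(U1[s] - U[s]))
          if delta < eps * (1 - gamma) / gamma:
              return U

  # Hypothetical two-state example: in 'a' you can 'stay' or 'go'; 'b' is the rewarding state.
  S = ['a', 'b']
  R = {'a': -0.04, 'b': 1.0}
  P = {'a': {'stay': {'a': 0.9, 'b': 0.1}, 'go': {'a': 0.2, 'b': 0.8}},
       'b': {'stay': {'b': 1.0},           'go': {'a': 1.0}}}
  print(value_iteration(S, lambda s: P[s].keys(), P, R, gamma=0.9, eps=1e-4))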


Value Iteration Algorithm II

I Example 3.4 (Iteration on the 4x3 world). (plots not reproduced)


Convergence

I Define the max-norm ||U|| = max_s |U(s)|, so ||U − V|| is the maximum difference between U and V.
I Let U^t and U^{t+1} be successive approximations to the true utility U.
I Theorem 3.5. For any two approximations U^t and V^t:

  ||U^{t+1} − V^{t+1}|| ≤ γ · ||U^t − V^t||

  I.e., any two distinct approximations must get closer to each other; so, in particular, any approximation must get closer to the true U, and value iteration converges to a unique, stable, optimal solution.
I Theorem 3.6. If ||U^{t+1} − U^t|| < ε, then ||U^{t+1} − U|| < 2εγ/(1 − γ).
  I.e., once the change in U^t becomes small, we are almost done.
I The MEU policy using U^t may be optimal long before the values have converged.


Policy Iteration

I Recap: value iteration computes utilities, and then the optimal policy is obtained by MEU.
I This even works if the utility estimate is inaccurate (small policy loss).
I Idea: search for the optimal policy and the utility values simultaneously [Howard:dpmp60]. Iterate:
  I policy evaluation: given policy πi, calculate Ui = Uπi, the utility of each state were πi to be executed.
  I policy improvement: calculate a new MEU policy πi+1 using one-step lookahead.
  Terminate if policy improvement yields no change in utilities.
I Observation 3.7. Upon termination Ui is a fixed point of the Bellman update, hence a solution to the Bellman equation, so πi is an optimal policy.
I Observation 3.8. Policy improvement improves the policy, and the policy space is finite, so the iteration terminates.


Policy Iteration Algorithm

I Definition 3.9. The policy iteration algorithm is given by the following pseudocode:

  function POLICY−ITERATION(mdp) returns a policy
    inputs: mdp, an MDP with states S, actions A(s), transition model P(s′ | s, a)
    local variables: U, a vector of utilities for states in S, initially zero
                     π, a policy indexed by state, initially random
    repeat
      U := POLICY−EVALUATION(π, U, mdp)
      unchanged? := true
      for each state s in S do
        if max_{a∈A(s)} ∑_{s′} P(s′ | s, a) · U(s′) > ∑_{s′} P(s′ | s, π[s]) · U(s′) then
          π[s] := argmax_{a∈A(s)} ∑_{s′} P(s′ | s, a) · U(s′)
          unchanged? := false
    until unchanged?
    return π


Policy Evaluation

I Problem: How do we implement the POLICY−EVALUATION algorithm?
I Solution: To compute the utilities given a fixed π, note that for all s we have

  U(s) = R(s) + γ · ∑_{s′} U(s′) · T(s, π(s), s′)

I Example 3.10 (Simplified Bellman Equations for π).

  Ui(1, 1) = −0.04 + 0.8 Ui(1, 2) + 0.1 Ui(1, 1) + 0.1 Ui(2, 1)
  Ui(1, 2) = −0.04 + 0.8 Ui(1, 3) + 0.2 Ui(1, 2)
  ...

I Observation 3.11. These are n simultaneous linear equations in n unknowns, which can be solved in O(n³) with standard linear algebra methods.
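
A small numpy sketch of exact policy evaluation by a linear solve, (I − γ T_π) U = R; the three-state chain and its numbers are hypothetical placeholders, since the 4x3 transition structure is not reproduced here.

  import numpy as np

  # Exact policy evaluation: solve U = R + gamma * T_pi U, i.e. (I - gamma * T_pi) U = R.
  # Hypothetical 3-state chain under some fixed policy pi (rows: P(s' | s, pi(s))).
  T_pi = np.array([[0.1, 0.8, 0.1],
                   [0.0, 0.2, 0.8],
                   [0.0, 0.0, 1.0]])          # state 2 is absorbing
  R = np.array([-0.04, -0.04, 1.0])
  gamma = 0.9

  U = np.linalg.solve(np.eye(3) - gamma * T_pi, R)
  print(U)                                    # utilities of the three states under pi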


Modified policy iteration

I Policy iteration often converges in few iterations, but each iteration is expensive.
I Idea: Use a few steps of value iteration (but with π fixed), starting from the value function produced the last time, to produce an approximate value determination step.
I This often converges much faster than pure value iteration or policy iteration.
I It leads to much more general algorithms where Bellman value updates and Howard policy updates can be performed locally in any order.
I Reinforcement learning algorithms operate by performing such updates based on the observed transitions made in an initially unknown environment.


11.4 Partially Observable MDPs


Partial Observability

I Definition 4.1. A partially observable MDP (a POMDP for short) is an MDP together with an observation model O that is stationary and has the sensor Markov property: O(s, e) = P(e | s).
I Example 4.2 (Noisy 4x3 World). Add a partial and/or noisy sensor, e.g. one that counts the number of adjacent walls (1 ≤ w ≤ 2) with 0.1 error (noise). If the sensor reports 1, we are (probably) in (3, ?).
I Problem: The agent does not know which state it is in, so it makes no sense to talk about a policy π(s)!
I Theorem 4.3 (Astrom 1965). The optimal policy in a POMDP is a function π(b) where b is the belief state (a probability distribution over states).
I Idea: Convert a POMDP into an MDP in belief-state space, where T(b, a, b′) is the probability that the new belief state is b′ given that the current belief state is b and the agent does a – essentially a filtering update step.


POMDP: Filtering at the Belief State Level

I Recap: Filtering updates the belief state for new evidence.
I For a POMDP, we also need to consider actions (but the effect is the same).
I If b(s) is the previous belief state and the agent does action a and then perceives e, then the new belief state is

b′(s′) = α · P(e | s′) · ∑_s P(s′ | s, a) · b(s)

We write b′ = FORWARD(b, a, e) in analogy to recursive state estimation.
I Fundamental Insight for POMDPs: The optimal action only depends on the agent's current belief state. (good, since it does not know the state!)
I Consequence: The optimal policy can be written as a function π∗(b) from belief states to actions.
I POMDP decision cycle: Iterate over
1. Given the current belief state b, execute the action a = π∗(b).
2. Receive percept e.
3. Set the current belief state to FORWARD(b, a, e) and repeat.
I Intuition: The POMDP decision cycle is search in belief state space.
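To make FORWARD and the decision cycle concrete, here is a minimal Python sketch. It assumes the transition and sensor models are given as nested dictionaries and that policy, execute, and get_percept are supplied by the agent's environment interface; all of these names are illustrative, not fixed by the slides.

# Belief states are represented as dicts mapping state -> probability.
# Assumed (illustrative) model encoding:
#   T[s][a][s2] = P(s2 | s, a)   (transition model)
#   O[s][e]     = P(e | s)       (sensor model)

def forward(b, a, e, T, O):
    """One filtering step: the new belief state after doing a and perceiving e."""
    new_b = {}
    successors = {s2 for s in b for s2 in T[s][a]}
    for s2 in successors:
        # prediction: sum_s P(s2 | s, a) * b(s)
        pred = sum(T[s][a].get(s2, 0.0) * p for s, p in b.items())
        # update: weight by the sensor model P(e | s2)
        new_b[s2] = O[s2].get(e, 0.0) * pred
    alpha = sum(new_b.values())              # normalization constant
    return {s2: p / alpha for s2, p in new_b.items()} if alpha > 0 else new_b

def decision_cycle(b, policy, execute, get_percept, T, O, steps=10):
    """POMDP decision cycle: act on the belief state, perceive, filter, repeat."""
    for _ in range(steps):
        a = policy(b)            # the optimal action depends only on b
        execute(a)               # environment callback (illustrative)
        e = get_percept()        # environment callback (illustrative)
        b = forward(b, a, e, T, O)
    return b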


Partial observability contd.

I Recap: The POMDP decision cycle is search in belief state space.
I Observation 4.4. Actions change the belief state, not just the physical state. Thus POMDP solutions automatically include information-gathering behavior.
I Problem: The belief state is continuous: if there are n states, b is an n-dimensional real-valued vector.
I Example 4.5. The belief state of the 4x3 world is a point in an 11-dimensional continuous space (11 states).
I Theorem 4.6. Solving POMDPs is very hard! (actually, PSPACE-hard)
I In particular, none of the algorithms we have learned so far applies directly (they rely on a discreteness assumption).
I The real world is a POMDP (with initially unknown transition model T and observation model O).


11.5 Online Agents with POMDPs


Designing Online Agents for POMDPs

I Definition 5.1 (Dynamic Decision Networks).
I The transition and sensor models are represented as a DBN (a dynamic Bayesian network).
I Action nodes and utility nodes are added to create a dynamic decision network (DDN).
I A filtering algorithm is used to incorporate each new percept and action and to update the belief state representation.
I Decisions are made by projecting forward possible action sequences and choosing the best one.
I Generic structure of a dynamic decision network at time t:
Variables with known values are shaded gray; the agent must choose a value for A_t.
Rewards appear for times t, . . . , t + 2, but the utility node is for t + 3 (= discounted sum of the remaining rewards).


Designing Online Agents for POMDPs (continued)

I Part of the lookahead solution of the DDN above (search over the action tree):
circles = chance nodes (the environment decides)
triangles = belief states (each action decision is taken there)


Designing Online Agents for POMDPs (continued)

I Note: The belief state update is deterministic irrespective of the action outcome, so there are no chance nodes for action outcomes.
I The belief state at a triangle is computed by filtering with the actions/percepts leading to it.
I For the decision A_{t+i}, the agent will have the percepts E_{t+1:t+i} (even if it does not know their values at time t).
I A POMDP agent automatically takes into account the value of information and executes information-gathering actions where appropriate.
I The time complexity of exhaustive search up to depth d is O(|A|^d · |E|^d) (|A| = number of actions, |E| = number of percepts); see the sketch below.
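To see where the O(|A|^d · |E|^d) bound comes from, the following minimal Python sketch spells out the exhaustive depth-d lookahead over belief states. It reuses forward from the sketch above and assumes a belief-state reward function reward(b, a) as well as enumerable action and percept sets; again, all names are illustrative.

def prob_of_percept(b, a, e, T, O):
    """P(e | b, a) = sum_{s'} P(e | s') * sum_s P(s' | s, a) * b(s)."""
    return sum(O[s2].get(e, 0.0) *
               sum(T[s][a].get(s2, 0.0) * p for s, p in b.items())
               for s2 in O)

def lookahead_value(b, depth, T, O, actions, percepts, reward, gamma=1.0):
    """Expected utility of the best depth-limited plan from belief state b
    (exhaustive search; explores O(|A|^d * |E|^d) nodes)."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:                        # |A| action branches per level
        value = reward(b, a)                 # immediate (belief-state) reward
        for e in percepts:                   # |E| chance branches per action
            p_e = prob_of_percept(b, a, e, T, O)
            if p_e > 0.0:
                b2 = forward(b, a, e, T, O)  # deterministic belief update
                value += gamma * p_e * lookahead_value(
                    b2, depth - 1, T, O, actions, percepts, reward, gamma)
        best = max(best, value)
    return best

At the root, the agent would execute the action maximizing this value; every additional level of lookahead multiplies the number of explored belief states by |A| · |E|, which is exactly the exponential bound above.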


Summary

I Decision-theoretic agents for sequential environments
I Building on temporal, probabilistic models/inference (dynamic Bayesian networks)
I MDPs for the fully observable case
I Value/Policy Iteration for MDPs yields optimal policies
I POMDPs for the partially observable case = MDPs on belief state space
I The world is a POMDP with (initially) unknown transition and sensor models.


References I
