
COM1070: Introduction to Artificial Intelligence: week 7. Yorick Wilks, Computer Science Department, University of Sheffield


Review of Schank’s Scripts: consist of a set of slots.

Associated with each slot may be information about the kinds of values it may contain, as well as default values.

Scripts have causal structure – events connected to earlier events that make them possible, and later events they enable.

Headers of scripts indicate when a script should be activated

Related to the concept of Frames (Minsky) which was earlier and for more static structures (e.g. a room). Scripts more like a big verb dictionary, Frames more like one for nouns.
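
The slot-and-default idea can be made concrete with a small data structure. This is only a minimal sketch: the class names, the restaurant slots, the header words and the event list are all assumptions chosen for illustration, not Schank's actual representation.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    allowed: str            # informal note on the kind of value the slot may hold
    default: object = None  # assumed when the story does not supply a value


@dataclass
class Script:
    name: str
    headers: set   # cue words that activate the script (hypothetical)
    slots: dict    # slot name -> Slot
    events: list   # events in causal order; each one enables the next

    def activated_by(self, text):
        # The script is activated if any header word occurs in the text.
        return bool(self.headers & set(text.lower().split()))

    def instantiate(self, **fillers):
        # Fill slots from the story, falling back on defaults.
        return {n: fillers.get(n, s.default) for n, s in self.slots.items()}


restaurant = Script(
    name="restaurant",
    headers={"restaurant", "waiter", "menu"},
    slots={
        "customer": Slot("customer", "a person"),
        "food": Slot("food", "a dish", default="a meal"),
        "payment": Slot("payment", "a means of paying", default="cash"),
    },
    events=["enter", "order", "eat", "pay", "leave"],
)

story = "John went to a restaurant and ordered lobster"
filled = restaurant.instantiate(customer="John", food="lobster")
```

Here the story never mentions payment, so the `payment` slot keeps its default; that is the kind of inference a script lets a reader make without being told.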

What background knowledge do we need to understand a story?

What information does the writer expect us to infer?

Are we likely to have both in a predetermined script?

How do we know when a story has stopped following a script? (Compare: how do we know when the person we are talking to has changed the subject? Some people never notice!)

DeJong’s ‘sketchy script matcher’ FRUMP

At Yale around 1977 DeJong developed a new form of SAM (Cullingford’s Script Applier Mechanism)

It sought only to fill initially determined predicate values of interest to a user

It worked mainly on newspaper stories about terrorism.

For example, FRUMP wants to find out type of car, object it collided with, location of accident, number of people killed/injured, who was at fault.

Skims new story to identify appropriate script.

Then tries to answer expectations.

Connected to UPI wire service.

UPI story: Pisa, Italy. Officials today searched for the black box flight recorder aboard an Italian air force transport plane to determine why the aircraft crashed into a mountainside killing 44 persons. They said the weather was calm and clear, except for some ground level fog, when the US-made Hercules C130 transport plane hit Mt Serra moments after takeoff Thursday.

The pilot, described as one of the country’s most experienced, did not report any trouble in a brief radio conversation before the crash.

FRUMP summary:

44 people were killed when an airplane crashed into a mountain in Italy today.

FRUMP does not apply a ‘full’ script (an air-disaster counterpart of the restaurant script); it simply fills a small number of slots (not necessarily ordered) such as NUMBER_DEAD, WHERE_CRASH and WHEN_CRASH.
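
Slot filling of this kind can be sketched in a few lines. The regular expressions below are illustrative assumptions, not FRUMP's method (FRUMP used script-based expectations rather than surface patterns); only the slot names and the story text come from the slides.

```python
import re

STORY = ("Officials today searched for the black box flight recorder aboard "
         "an Italian air force transport plane to determine why the aircraft "
         "crashed into a mountainside killing 44 persons.")

# One hypothetical pattern per slot; slots are independent and unordered.
SLOT_PATTERNS = {
    "NUMBER_DEAD": r"killing (\d+) persons",
    "WHERE_CRASH": r"crashed into (?:a |an |the )?(\w+)",
    "WHEN_CRASH": r"\b(today|yesterday)\b",
}


def fill_template(text):
    """Return a template with one entry per slot; unfilled slots stay None."""
    template = {}
    for slot, pattern in SLOT_PATTERNS.items():
        found = re.search(pattern, text)
        template[slot] = found.group(1) if found else None
    return template


template = fill_template(STORY)
```

Everything in the story that does not match a slot of interest (the flight recorder, the officials) is simply ignored, which is what makes the approach fast and 'sketchy'.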

FRUMP was never statistically evaluated

But FRUMP was the forerunner of a 1990s technology, Information Extraction, where ‘templates’ of slots and fillers are filled from web or newspaper text at high speed and in huge volume.

This new AI technology was created by US Government funding in the 1990s and is highly statistical and competitive between groups/universities/companies.

How do humans perform tasks?

Part of the aim of research on Scripts was to find a way of giving a program the same knowledge that humans use to understand a story, and Script theory was very influential in Psychology.

Similarly, in research on Expert Systems, aim is to capture, and apply, the knowledge that human experts have.

And in earlier examples, e.g. GPS, idea was to mimic human problem solving ability.

It makes sense to emulate humans in Artificial Intelligence research.

One of the original motivations for AI research was to understand human mind.

But also to get computers to do clever things, no matter how!

Difficult to provide an account of intelligence without reference to what humans can do.

Although our changed conception of intelligence now is less human-based e.g. perhaps a bee is capable of intelligent behaviour.

But if we are concerned to emulate humans, we need to find out how humans think, if we think psychology has ways of telling us that reliably.

Ways of finding out how people work:

- Introspection (most AI experiments, like CD/Scripts)
- Protocol analysis (activity reports, as with GPS)
- Psychology experiments

One problem for expert systems is that the introspection of experts is unreliable (plumbers can’t always tell you how they do it).

Much psychology is unsurprising but sometimes helpful, e.g. the finding that people usually can’t remember surface words, only content, which is consistent with CD’s claims.

Return to Expert Systems

SHRDLU, and blocks microworld. Domain-specific knowledge (as opposed to domain-general knowledge).

Understood substantial subset of English by representing and reasoning about a very restricted domain.

Had knowledge of microworld, (but no real understanding).

But program too complex to be extended to real world.

Expert systems: also relied on depth of knowledge of constrained domain.

But commercially exploitable. ‘Real’ applications.

SHRDLU Dead end: program very complex, also little to do with real world.

General realisation that programs that performed well within limits of microworlds, could not capture complexity of everyday human reasoning.

Remember that SHRDLU would have to process AN INTERESTING BOOK by accessing all the books it knew in its database and all the interesting things!

Hubert Dreyfus (1972): criticism of idea that reasoning and intelligence could be captured by logical rules.

Dreyfus was part of the first major reaction against the claims of AI in the 1970s (cf. UK Govt. Lighthill Report).

Weizenbaum (1976): pointed out that his ELIZA ‘had come close to passing the Turing Test’ (!). Humans are too ready to attribute intelligence to unintelligent devices. Risk of oversold programs.

But some of this was just breast beating for profit (Weizenbaum’s Computer Power and Human Reason was Reader’s Digest Book of the Month!). Overselling how much one had done even while repenting!

References for Knowledge Representation

Rich and Knight (1991) Artificial Intelligence, McGraw-Hill, Inc. Chapter 4.

Cawsey, A. (1997) Essentials of Artificial Intelligence, Prentice-Hall. (see also web reference on course page)

Russell and Norvig (1995) Artificial Intelligence: A modern approach. Chapter 3.

Introspective evidence of stages of learning a skill or expertise – e.g. car driving or chess playing

Novice. Car driver or chess player is consciously following rules.

Expert: can decide what to do ‘without thinking’ – making decisions about what to do based on the resemblance of the current situation to many previously experienced situations.

- best chess players can usually instantly recognise what is a good move.

- expert driver knows when slowing down is needed without thinking about it. (e.g. becomes difficult to drive if you consciously reflect about gear shifting and try to decide what to do).

If this intuition is correct, there is more to real expert understanding than following rules.

BUT a few problems where (rule driven) expert systems can perform as well as experts.

And even in the absence of claims that expert systems think like humans, they may well be useful tools.

Probably work best when used as consultant or aide to human expert or novice.

Examples are medical diagnostic systems, optimal layout systems for space, and scheduling algorithms. Feigenbaum’s DENDRAL at Stanford predicts chemical compounds.

Criticisms by Hubert Dreyfus

Dreyfus: points out ways in which AI theorists have overclaimed about what they can do.

e.g. Feigenbaum claims that ‘DENDRAL has been in use for many years at university and industrial chemical labs around the world’.

But ‘..when we called several university and industrial sites that do mass spectroscopy, we were surprised to find that none of them use DENDRAL..’

Dreyfus: Programming attempts to capture ordinary, or common sense knowledge and reasoning ability are doomed to failure.

Such knowledge cannot be captured by programs because it is too contextual and open-ended.

For Dreyfus, the real expert is not following rules

Strong AI: building programs that actually think (or striving towards this)

Weak AI 1: Applications – trying to perform tasks that would require intelligence if performed by humans.

Some attempt to simulate human solutions.

Weak AI 2: Modelling human cognition

Expert Systems sometimes do better than human experts, e.g. Buchanan (1982): MYCIN did better than a panel of experts in evaluating ten selected meningitis cases.

But expert systems benefit from being applied in an area where computer can exploit an ability to follow rules.

Four major problems for expert systems

Brittleness. Cannot fall back on general knowledge – e.g. if there is a mistake in entering data for a medical expert system, such as entering that the patient is 130 years old and weighs 40 pounds, the ES would not guess that the values had been switched.

No Meta-knowledge. Expert systems do not know their own limitations.

Knowledge acquisition. Still bottleneck in new domains.

Validation. Difficult to know what to compare it to (unless compared to human experts diagnosing real world problems).

Domain-specific knowledge versus domain-independent knowledge

Expert systems: good at domain-specific knowledge, bad at domain-independent.

PUFF knows nothing about medical complaints except conditions of the lung (i.e. knowledge very specific), and may not even know whether lungs are above or below knees (example of common knowledge about human anatomy).

Does that matter?

Would we care if it diagnosed us efficiently?

Why are we obsessed with being a human whole?

Is an ES like an Idiot savant: person who is basically retarded, but able to perform very well in one limited domain. e.g. calculating day on which particular dates fall.

From Lenat and Guha (1990) (in Rich and Knight, 1991, Artificial Intelligence)

System: How old is the patient?

Human: (looking at his 1957 Chevrolet) 33

System: Are there any spots on the patient’s body?

Human: (noticing rust spots) Yes.

System: What colour are the spots?

Human: Reddish-brown.

System: The patient has measles (probability 0.9)

More like ‘automated reference manuals’ (Copeland, 1993).

Advantages of Expert Systems

Human experts can lose expertise.

Ease of transfer of artificial expertise.

No effect of emotion in artificial expertise.

Expert systems are a low cost alternative – expensive to develop but cheap to operate.

Limitations:

Lack of creativity, not adaptive, lack sensory experience, narrow focus, and no commonsense knowledge (or meta-knowledge).

Lack of wider understanding

Winograd (SHRDLU’s programmer)

‘..There is a danger inherent in the label ‘expert system’. When we talk of a human expert we connote someone whose depth of understanding serves not only to solve specific well-formulated problems, but also to put them into a larger context. We distinguish between experts and idiot savants. Calling a program an expert is misleading….’

Can lead to inappropriate expectations

But may be useful if users can be educated about proper expectations (are people getting used to limited machines?)

See following two paragraphs (from Hayes-Roth, 1983)

Summaries of pulmonary function diagnosis of particular patient. One by human expert, other by expert system (PUFF).

Conclusions: the low diffusing capacity, in combination with obstruction and a high total lung capacity is consistent with a diagnosis of emphysema. Although bronchodilators were only slightly useful in this one case, prolonged use may prove beneficial to the patient.

PULMONARY FUNCTION DIAGNOSIS: MODERATELY SEVERE OBSTRUCTIVE AIRWAYS DISEASE. EMPHYSEMATOUS TYPE.

Conclusions: Overinflation, fixed airway obstruction and low diffusing capacity would all indicate moderately severe obstructive airway disease of the emphysematous type. Although there is no response to bronchodilators on this occasion, more prolonged use may prove to be more helpful.

PULMONARY FUNCTION DIAGNOSIS: OBSTRUCTIVE AIRWAYS DISEASE, MODERATELY SEVERE EMPHYSEMATOUS TYPE.

No totally automatic ways of constructing expert knowledge bases, but there are programs which interact with domain experts to extract expert knowledge efficiently.

e.g. finding holes in knowledge and prompting expert to fill them.

AND/OR checking for consistency in knowledge.

OR

Alternative to interviewing expert: looking at sample problems and solutions, and inferring its own rules.

e.g. bank’s problem of deciding whether to approve a loan. Instead of interviewing loan officers, look at past loans, and try to generate rules that will maximise the number of good loans in the future.
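
Inferring a rule from past cases can be illustrated with the simplest possible learner: a single income threshold chosen to agree with as many past outcomes as possible. The loan records below are invented for illustration, and real rule-induction systems consider many attributes; this is only a sketch of the idea.

```python
# Hypothetical past loans: (annual income in thousands, was the loan repaid?)
PAST_LOANS = [(12, False), (18, False), (25, True),
              (31, True), (22, False), (40, True)]


def best_threshold(loans):
    """Find t such that the rule 'approve if income >= t' matches the most
    past outcomes. Returns (threshold, number of loans it gets right)."""
    best_t, best_correct = None, -1
    for t in sorted(income for income, _ in loans):
        correct = sum((income >= t) == repaid for income, repaid in loans)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t, best_correct


threshold, correct = best_threshold(PAST_LOANS)
# The induced rule reads: IF income >= threshold THEN approve loan
```

The resulting rule has the same IF-THEN form that a knowledge engineer would have written after interviewing loan officers, but it was derived from the data alone.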

Expert system Shells also marketed. e.g. EMYCIN (Empty Mycin) Consists of the shell of an expert system, without domain specific knowledge.

New knowledge domain can be entered, and make use of same rule mechanisms.

Evaluate expert systems: good idea or not?

How important is it to have systems that are commercially viable, and made use of in the real world?

Would you be happy to rely on a medical Expert System instead of a doctor?

Advantages Disadvantages

Reliance of expert systems on domain specific knowledge

Also on heuristics operating on the knowledge

Knowledge-base: need to find a way of representing knowledge. MYCIN: production rules.

Also need to draw appropriate inferences – inference-engine.

Need to work out what knowledge is appropriate, and to get it into the knowledge-base.

Knowledge engineering

Based on protocol analysis (GPS pioneered this): human subjects are encouraged to think aloud as they solve problems. Protocols are later analysed to reveal the concepts and procedures employed.

Protocol analysis used alongside Logic Theorist by Newell and Simon.

Interaction between expert system builder, knowledge engineer, and human experts in some problem area.

Some computational psychologists (e.g. Schvaneveldt) used networks to represent knowledge elicited as associations of concepts.

Automated Knowledge Acquisition and Evaluation

Alternative to time-consuming and expensive knowledge engineering.

Evaluation depends entirely on task for which ES are designed.

If they function as assistants (like DENDRAL) we need only that they do not miss any solutions with respect to given set of constraints, and take a reasonable length of time.

If like MYCIN they generate whole solutions, we need evaluation against human experts (or rival expert systems).

Evaluation of expert systems.

Comparison to experts: need to follow experimental procedures, i.e. so raters don’t know which are human and which are computer’s solutions.

DENDRAL: used as expert’s assistant, rather than stand alone expert. Heuristic search technique constrained by knowledge of human expert.

‘…supports hundreds of international users every day, assisting in structure elucidation problems for such things as antibiotics and impurities in manufactured chemicals..’ (Jackson, 1990)

MYCIN: performance compares favourably with human experts. But never used in hospitals.

Suggested reasons (Jackson, 1990):

- Its knowledge base is incomplete, since it does not cover anything like the full spectrum of infectious diseases.
- Running it would have required more computing power than hospitals could afford.
- Interface not good.

Trade union protectionism by US doctors?

MYCIN. (Shortliffe and Buchanan, Stanford).

Expert system which attempts to recommend appropriate therapies for patients with bacterial infections.

Four part decision process:

- Deciding if the patient has a significant infection
- Determining the possible organisms involved
- Selecting a set of drugs that might be appropriate
- Choosing the most appropriate drug or combination of drugs.

MYCIN has five components:

- A knowledge base
- A dynamic patient database
- A consultation program
- An explanation program
- A knowledge acquisition program, for adding or changing rules.

Once MYCIN finds the identities of the disease-causing organisms, it tries to select therapy to treat disease.

IF the identity of the organism is pseudomonas THEN therapy should be selected from among the following drugs:

Colistin (.98) Polymyxin (.96) Gentamicin (.96) Carbenicillin (.65) Sulfisoxazole (.64)

(decimal numbers show prob. of arresting growth of pseudomonas).

Expert systems typically use production rules: (IF – THEN rules)

e.g. MYCIN rule

If: The stain of the organism is gram-positive,

and The morphology of the organism is coccus,

and The growth conformation of the organism is

clumps,

then there is suggestive evidence (0.7) that the identity of the organism is staphylococcus.

MYCIN contains more than 500 such rules.

Complex interactions of rules give a high level of performance.

- at level of human specialists in blood infections (and much better than GPs) (Shortliffe, 1976).

The UK NHS is said to be shifting to ‘evidence based medicine’ and is VERY short of experts, so be optimistic!

Diagnostic knowledge (knowledge-based) is represented as a set of rules

IF The site of the culture is blood, and
The stain of the organism is gram-negative, and
The morphology of the organism is rod, and
The patient has been seriously burned

THEN there is evidence (0.4) that the identity of the organism is pseudomonas.

MYCIN control structure.

Has top level goal

IF (1) there is an organism which requires therapy, and (2) consideration has been given to any other organisms requiring therapy

THEN compile a list of possible therapies, and determine the best one in this list.

These rules used to reason backward to the clinical data (backward chaining).

Possible bacteria causing infection are considered in turn.

MYCIN attempts to prove whether they are involved.

Another actual expert system

DENDRAL project, began at Stanford University (USA) in 1965.

Feigenbaum and Lederberg.

Aim: to determine the molecular structure of an unknown organic compound.

Analysed data from mass spectrometer.

Mass spectrometer – bombards chemical sample with beam of electrons, causing compound to fragment, and components to be rearranged.

But complex molecule can fragment in different ways; can only make predictions about which bonds will break.

DENDRAL has data from the mass spectrogram (i.e. after bonds have broken), and has to work out what the original compound was.

Although there are constraints (i.e. has identified chemical formula of compound, and presence/absence of certain substructural features) still many possibilities.

DENDRAL planner can assist in decision about which constraints to impose.

DENDRAL could figure out (on the basis of a vast amount of data from mass spectrographs) which organic compound was being analysed.

Given the relevant data, it formulated hypotheses about the compound’s molecular structure, and tested the hypotheses by way of further predictions.

Output was list of possible molecular compounds ranked in terms of decreasing plausibility.

Required constraints – based on conclusions already drawn.

Forbidden constraints – rules out possibilities that don’t fit the data, or because resultant structures are chemically unstable.
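
The role of required and forbidden constraints can be sketched as a filter over candidate structures followed by a ranking. This is not DENDRAL's actual generator or scoring: the candidate names, the feature labels and the plausibility scores are all made up for illustration.

```python
# Hypothetical candidates: name -> (substructure features, plausibility score)
CANDIDATES = {
    "structure-A": ({"ketone", "methyl"}, 0.6),
    "structure-B": ({"ketone", "peroxide"}, 0.9),
    "structure-C": ({"methyl"}, 0.8),
    "structure-D": ({"ketone", "methyl", "ethyl"}, 0.7),
}


def plausible(candidates, required, forbidden):
    """Keep candidates that contain every required feature and no forbidden
    one, then rank them by decreasing plausibility."""
    kept = [
        (score, name)
        for name, (features, score) in candidates.items()
        if required <= features and not (forbidden & features)
    ]
    return [name for score, name in sorted(kept, reverse=True)]


ranking = plausible(CANDIDATES, required={"ketone"}, forbidden={"peroxide"})
```

Constraints prune the space before any ranking happens, which is why the choice of constraints matters so much when the number of possible structures is large.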

BUT: does not emulate ways in which humans would actually solve problems.

DENDRAL (in 1960s) – beginning of divide between simulation of human behaviour, and trying to arrive at intelligence by any means available.

Problems:

- Best way to achieve intelligent behaviour may be to emulate human intelligence.
- Most interesting aspect of AI is the light it throws on understanding the human mind.

Yet… expert systems do work!

Examples of domains for Expert Systems:

Engineering
- Design
- Fault finding
- Manufacturing planning
- Scheduling

Scientific analysis

Medical diagnosis

Financial analysis

[Figure: expert system architecture. Components shown: user, user interface, explanation system, inference engine, knowledge base editor, case-specific data, knowledge base, expert system shell.]

Knowledge-base, contains representation of domain-specific knowledge.

Inference engine – performs reasoning.

Two kept separate.

Normal method for representing knowledge in an expert system:

IF-THEN rules.

Often rules do not have certain conclusions: dealing with uncertainty.
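
One way to handle uncertain conclusions is to attach a number to each rule and propagate it, as MYCIN's rules with (0.7) and (0.4) suggest. The sketch below uses the combination scheme usually given for MYCIN-style certainty factors when all the evidence is positive; the formulas are not stated in these notes, so treat them as an assumption, and the second rule is invented.

```python
def rule_cf(premise_cfs, rule_strength):
    """Certainty of a rule's conclusion: a conjunction of premises is only
    as certain as its weakest member, scaled by the rule's own strength."""
    return min(premise_cfs) * rule_strength


def combine(cf1, cf2):
    """Merge two positive certainty factors for the same conclusion.
    Each new piece of evidence closes part of the remaining gap to 1."""
    return cf1 + cf2 * (1 - cf1)


# Rule from the slides: gram-positive, coccus, clumps -> staphylococcus (0.7),
# with all three premises known for certain.
staph_a = rule_cf([1.0, 1.0, 1.0], 0.7)

# Hypothetical second rule supporting the same conclusion, with strength 0.5
# and a single premise believed with certainty 0.8.
staph_b = rule_cf([0.8], 0.5)

total = combine(staph_a, staph_b)  # approximately 0.82
```

Two independent pieces of evidence make the conclusion more certain than either alone, but the result never exceeds 1.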

Main approaches to knowledge representation in AI.

- Logic
- Frames and semantic networks
- If-then rules within a rule-based system

General characteristics

Expert system: program designed to replicate decision making process of human expert.

Basic idea: experts have a great deal of knowledge, and this knowledge could be provided in some formal manner to a program.

Requires knowledge base. Knowledge base entered by knowledge engineer: ‘knowledge engineering’ involves interviewing and observing experts, and converting their words and actions into a knowledge base.

Reasoning mechanisms to apply knowledge to problems

Mechanism for explaining their decisions.

Example: rules for diagnosing household emergency.

Rule 1: IF coughing THEN ADD smoky

Rule 2: IF wet AND NOT raining THEN ADD burstpipe

Rule 3: IF NOT coughing AND alarm-rings THEN ADD burglar

Rule 4: IF smoky AND hot THEN ADD fire

Space of hypotheses for Home Emergency expert system: fire, burstpipe, burglar.

Try to prove fire (R4)

Try to prove smoky and hot (R1 for smoky)

Are you coughing? – no (smoky fails, therefore fire fails)

Try to prove burst pipe (R2)

Are you getting wet? No (burst pipe fails)

Burglar? Not coughing (known) and alarm rings.

Is there an alarm ringing? Yes

I conclude there’s been a burglary.

Often some explanation facility included, e.g.:

SYSTEM: Are you coughing?

USER: No

SYSTEM: Are you getting wet?

USER: No

SYSTEM: Is there an alarm ringing?

USER: Why?

SYSTEM: I’m trying to prove there has been a burglary. It has been established that it is not the case that you are coughing. Therefore, if there is an alarm ringing then we can conclude that there is a burglary. Is there an alarm ringing?

USER: Yes

SYSTEM: I conclude that there has been a burglary.

USER: How?

SYSTEM: This follows from Rule 3: If NOT coughing and alarm-rings THEN ADD burglar. You told me that it is not the case that you are coughing. You told me that there is an alarm ringing.

Use of backward chaining as problem solving strategy.

Algorithm: To prove G

- If G is in current facts it is proved.
- Otherwise, if G is askable then ask user, record their answer as a new current fact, and succeed or fail according to their response.
- Otherwise, find a rule which can be used to conclude G and try to prove each of that rule’s preconditions.
- Otherwise fail G.
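
The algorithm above can be sketched in Python using the home emergency rules. The choice of which statements are askable, the treatment of NOT as "the statement could not be established", and the scripted user are assumptions made so that the example runs without interaction.

```python
RULES = [
    ("R1", ["coughing"], "smoky"),
    ("R2", ["wet", "not raining"], "burstpipe"),
    ("R3", ["not coughing", "alarm-rings"], "burglar"),
    ("R4", ["smoky", "hot"], "fire"),
]
ASKABLE = {"coughing", "wet", "raining", "alarm-rings", "hot"}


def prove(goal, facts, ask):
    """facts maps statement -> True/False; ask(statement) returns a bool."""
    if goal.startswith("not "):
        return not prove(goal[4:], facts, ask)
    if goal in facts:                      # already a current fact
        return facts[goal]
    if goal in ASKABLE:                    # ask the user and record the answer
        facts[goal] = ask(goal)
        return facts[goal]
    for name, conditions, conclusion in RULES:
        # all() stops at the first failed condition, so no needless questions
        if conclusion == goal and all(prove(c, facts, ask) for c in conditions):
            facts[goal] = True
            return True
    return False                           # no rule concludes the goal


def diagnose(ask):
    facts = {}
    for hypothesis in ["fire", "burstpipe", "burglar"]:
        if prove(hypothesis, facts, ask):
            return hypothesis
    return None


# Scripted user giving the answers from the worked example.
answers = {"coughing": False, "wet": False, "alarm-rings": True}
asked = []


def scripted_user(question):
    asked.append(question)
    return answers[question]


result = diagnose(scripted_user)
```

With these answers the system asks about coughing, getting wet and the alarm, in that order, and never needs to ask whether it is hot or raining, because the goals that depend on those questions have already failed.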

Fire scenario of rules and facts

R1: IF smoky AND hot THEN ADD fire

R2: IF alarm-beeps THEN ADD smoky

R3: IF alarm-beeps THEN ADD ear-plugs

R4: IF fire THEN ADD switch-on-sprinklers

R5: IF smoky THEN ADD poor-visibility

F1: alarm-beeps

F2: hot

Proving Switch-on-sprinklers

Try to prove G1 switch-on-sprinklers

Matches Rule 4: try to prove G2 fire

Matches Rule 1: try to prove G3 smoky and G4 hot

G3 matches R2.

New goals G5: alarm-beeps, G4: hot.

Goals satisfied (by F1 and F2):

THEREFORE sprinkler switched on.
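
The same proof can be recorded as it is built, which is what lets a system answer a "How?" question afterwards. The sketch below assumes a fixed set of facts, so nothing is asked of the user, and it has no protection against circular rules; the trace format is invented for illustration.

```python
RULES = {
    "R1": (["smoky", "hot"], "fire"),
    "R2": (["alarm-beeps"], "smoky"),
    "R3": (["alarm-beeps"], "ear-plugs"),
    "R4": (["fire"], "switch-on-sprinklers"),
    "R5": (["smoky"], "poor-visibility"),
}
FACTS = {"alarm-beeps": "F1", "hot": "F2"}


def prove(goal, trace):
    """Backward chain from goal, appending one line to trace per proved goal."""
    if goal in FACTS:
        trace.append(f"{goal}: given as {FACTS[goal]}")
        return True
    for name, (conditions, conclusion) in RULES.items():
        if conclusion == goal:
            mark = len(trace)
            if all(prove(c, trace) for c in conditions):
                trace.append(f"{goal}: concluded by {name}")
                return True
            del trace[mark:]   # discard steps from the failed attempt
    return False


trace = []
proved = prove("switch-on-sprinklers", trace)
```

Reading `trace` from last to first gives the chain R4, R1, R2 back to the two given facts. R3 and R5 are never examined, because no goal on the way to the sprinklers needs their conclusions.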

Backward chaining again:

If you know what the conclusion might be: backward chaining may be better.

e.g. start with goal to prove, like switch-on-sprinklers.

To prove goal G:

- If G is in initial facts it is proven.
- Otherwise find a rule which can be used to conclude G, and try to prove its preconditions.
- Otherwise, fail G.

Forward Chaining

- Facts held in working memory
- Find all the rules which have preconditions satisfied
- Select one (using conflict resolution strategies, see below)
- Perform actions in conclusion, maybe modifying working memory.

Revised simple example:

Rule 1: IF hot AND smoky THEN ADD fire

Rule 2: IF alarm-beeps THEN ADD smoky

Rule 3: IF fire THEN ADD switch-on-sprinklers

Fact 1: alarm-beeps

Fact 2: hot

Check to see rules whose conditions hold (=R2). Add new fact to working memory: Fact 3: smoky.

Check again (=R1). Add new fact: Fact 4: fire.

Check again (=R3). Sprinklers on!
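
The cycle in this example can be sketched as a loop over working memory. The sketch assumes rules only ever add facts, and it picks the first applicable rule, which is enough here because only one rule applies in each cycle.

```python
RULES = [
    ("R1", {"hot", "smoky"}, "fire"),
    ("R2", {"alarm-beeps"}, "smoky"),
    ("R3", {"fire"}, "switch-on-sprinklers"),
]


def forward_chain(facts):
    """Fire rules until none adds anything new.
    Returns the final working memory and the order in which rules fired."""
    memory = set(facts)
    fired = []
    while True:
        applicable = [
            (name, conclusion)
            for name, conditions, conclusion in RULES
            if conditions <= memory and conclusion not in memory
        ]
        if not applicable:
            return memory, fired
        name, conclusion = applicable[0]   # naive choice: first applicable rule
        memory.add(conclusion)
        fired.append(name)


memory, fired = forward_chain({"alarm-beeps", "hot"})
```

The rules fire in the order R2, R1, R3, matching the three checks in the example, and `switch-on-sprinklers` ends up in working memory.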

What happens if more than one rule has its conditions satisfied?

Rule 1: IF hot AND smoky THEN ADD fire

Rule 2: IF alarm-beeps THEN ADD smoky

Rule 3: IF fire THEN ADD switch-on-sprinklers

Rule 4: IF hot AND dry THEN ADD switch-on-humidifier

Rule 5: IF fire THEN DELETE dry

Fact 1: alarm-beeps

Fact 2: dry

Fact 3: hot

In first cycle, 2 rules apply: Rule 2 and Rule 4.

If Rule 4 chosen, humidifier switched on.

If Rule 2 is chosen and Rules 1, 3 and 5 then fire in turn, dry is deleted and the humidifier is never switched on.

Therefore, Forward chaining systems need conflict resolution strategies.

For example – we could prefer rules involving facts recently added to memory. Therefore, if Rule 2 fires, next rule is Rule 1 as smoky recently added.

Or could prioritise rules. Give Rule 4 a lower priority.
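
Both strategies can be sketched as interchangeable selection functions plugged into the same forward-chaining loop. The sketch assumes each rule fires at most once and that ties are broken by the order in which rules are listed; the priority numbers are invented for illustration.

```python
RULES = {
    "R1": ({"hot", "smoky"}, ("add", "fire")),
    "R2": ({"alarm-beeps"}, ("add", "smoky")),
    "R3": ({"fire"}, ("add", "switch-on-sprinklers")),
    "R4": ({"hot", "dry"}, ("add", "switch-on-humidifier")),
    "R5": ({"fire"}, ("delete", "dry")),
}


def run(initial_facts, choose):
    # Working memory maps each fact to the cycle in which it was added.
    memory = {fact: 0 for fact in initial_facts}
    fired = []
    cycle = 0
    while True:
        cycle += 1
        applicable = [
            name for name, (conditions, _) in RULES.items()
            if name not in fired and conditions <= set(memory)
        ]
        if not applicable:
            return set(memory), fired
        name = choose(applicable, memory)
        action, fact = RULES[name][1]
        if action == "add":
            memory.setdefault(fact, cycle)
        else:
            memory.pop(fact, None)
        fired.append(name)


def by_priority(priority):
    # Lower number = more preferred; rules not listed get 0.
    return lambda applicable, memory: min(
        applicable, key=lambda rule: priority.get(rule, 0))


def by_recency(applicable, memory):
    # Prefer the rule whose conditions mention the most recently added fact.
    return max(applicable,
               key=lambda rule: max(memory[f] for f in RULES[rule][0]))


START = {"alarm-beeps", "dry", "hot"}
memory_a, fired_a = run(START, by_priority({"R4": 1}))   # R4 demoted
memory_b, fired_b = run(START, by_priority({"R2": 1}))   # R2 demoted instead
memory_c, fired_c = run(START, by_recency)
```

Demoting Rule 4 lets Rule 5 delete dry before Rule 4 can fire, so the humidifier stays off; demoting Rule 2 instead lets Rule 4 fire in the first cycle. The same rules and facts give different outcomes purely because of the selection strategy.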

Inference by pattern matching:

Increases flexibility and allows more complex facts:

e.g. Temperature (kitchen, hot) instead of hot

Could have Rule 6 (Room is a variable that matches any room name):

IF Temperature(Room, hot) AND Environment(Room, smoky) THEN ADD Fire-in(Room)

Fact 6: Temperature(kitchen, hot)

Fact 7: Environment(kitchen, smoky)

Therefore Fire-in(kitchen) is added to memory.
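
Matching a rule with a variable against structured facts can be sketched as follows. The tuple representation, the "?" prefix for variables and the extra hall fact are assumptions made for illustration.

```python
def match(pattern, fact, bindings):
    """Match one pattern, e.g. ("Temperature", "?room", "hot"), against a
    fact. Terms starting with "?" are variables. Returns the extended
    bindings, or None if the match fails."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:   # variable already bound elsewhere
                return None
        elif p != f:
            return None
    return bindings


def match_all(patterns, facts, bindings=None):
    """Yield every set of bindings that satisfies all patterns together."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from match_all(patterns[1:], facts, extended)


RULE_6 = (
    [("Temperature", "?room", "hot"), ("Environment", "?room", "smoky")],
    ("Fire-in", "?room"),
)
FACTS = [
    ("Temperature", "kitchen", "hot"),
    ("Environment", "kitchen", "smoky"),
    ("Temperature", "hall", "hot"),   # hypothetical extra fact
]

conditions, conclusion = RULE_6
new_facts = [
    tuple(b.get(term, term) for term in conclusion)
    for b in match_all(conditions, FACTS)
]
```

Only the kitchen satisfies both conditions with the same binding for the variable, so the single new fact is Fire-in(kitchen); the hall is hot but not known to be smoky, so the rule does not fire for it.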

Forward versus backward chaining:

depends on how many possible hypotheses to consider.

If few, then backward chaining (e.g. MYCIN).

If many, then forward chaining (e.g. XCON).

Backward chaining is also likened to abduction, the basic form of scientific explanation (i.e. find some assumption that proves this fact true).

Necessary ES components: IF-THEN rules, + facts, + interpreter

Two types of interpreter: forward chaining and backward chaining.

Forward chaining: Start with some facts, and use rules to draw new conclusions.

Backward chaining: Start with hypothesis (goal) to prove, and look for rules to prove that hypothesis.

Forward chaining: data-driven (alias bottom-up)

Backward chaining: goal-driven (alias top-down)