Final Revision 2

CS 157A Lecture 26

Prof. Sin-Min Lee
Review: Theoretical Query Languages

Relational Algebra

1. SELECT ( σ )
2. PROJECT ( π )
3. UNION ( ∪ )
4. SET DIFFERENCE ( – )
5. CARTESIAN PRODUCT ( × )
6. RENAME ( ρ )

• RA gives semantics to practical query languages
• The set above is the minimal relational algebra

We will look at some redundant (but useful!) operators today.

Review

Express the following query in the RA:

Find the names of customers who have both accounts and loans

T1 ← ρT1(cname2, lno) (borrower)
T2 ← depositor × T1
T3 ← σcname = cname2 (T2)
Result ← πcname (T3)

The above sequence of operators (ρ, ×, σ) is very common. It motivates additional (redundant) RA operators.

Relational Algebra: Redundant Operators

1. Natural Join ( ⋈ )
2. Division ( ÷ )
3. Generalized Projection ( π )
4. Outer Joins ( ⟕, ⟖, ⟗ )
5. Update ( ← ) (we’ve already been using)

• Redundant: the above can be expressed in terms of the minimal RA,
  e.g. depositor ⋈ borrower = π…(σ…(depositor × ρ…(borrower)))
• Added as a convenience

Natural Join

Notation: Relation1 ⋈ Relation2

Idea: combines ρ, ×, σ

r =
A  B  C  D
1  α  +  10
2  α  -  10
2  α  -  20
3  β  +  10

s =
E    B  D
‘a’  α  10
‘a’  α  20
‘b’  β  10
‘c’  β  10

r ⋈ s =
A  B  C  D   E
1  α  +  10  ‘a’
2  α  -  10  ‘a’
2  α  -  20  ‘a’
3  β  +  10  ‘b’
3  β  +  10  ‘c’

depositor ⋈ borrower ≡
πcname,acct_no,lno (σcname=cname2 (depositor × ρt(cname2,lno) (borrower)))
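The r ⋈ s example on this slide can be checked with a few lines of Python. This is only a minimal sketch: relations are modeled as lists of dicts, and `natural_join` is a hypothetical helper name, not part of the lecture.

```python
# Relations as lists of dicts; the rows are the r and s tables from this slide.
r = [
    {"A": 1, "B": "α", "C": "+", "D": 10},
    {"A": 2, "B": "α", "C": "-", "D": 10},
    {"A": 2, "B": "α", "C": "-", "D": 20},
    {"A": 3, "B": "β", "C": "+", "D": 10},
]
s = [
    {"E": "a", "B": "α", "D": 10},
    {"E": "a", "B": "α", "D": 20},
    {"E": "b", "B": "β", "D": 10},
    {"E": "c", "B": "β", "D": 10},
]


def natural_join(left, right):
    """Pair tuples that agree on every shared attribute (hypothetical helper)."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])  # here: {"B", "D"}
    result = []
    for lt in left:
        for rt in right:
            if all(lt[a] == rt[a] for a in common):
                merged = {**lt, **rt}  # shared attributes appear once
                if merged not in result:  # relations are sets: no duplicates
                    result.append(merged)
    return result


joined = natural_join(r, s)
for row in joined:
    print(row)
```

The nested loop plays the role of ×, the `all(...)` test the role of σ, and merging the dicts the role of ρ followed by π.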

Division

Notation: Relation1 ÷ Relation2

Idea: expresses “for all” queries

r =
A  B
α  1
α  2
α  3
β  1
γ  1
γ  3
γ  4
γ  6
δ  1
δ  2

s =
B
1
2

r ÷ s =
A
α
δ

Query: Find values for A in r which have corresponding B values for all B values in s

Division

Another way to look at it, for the same r and s: compare with integer division.

Integer division: 17 ÷ 3 = 5
The largest value of i such that: i × 3 ≤ 17

Relational division: r ÷ s = t
The largest value of t such that: ( t × s ⊆ r )
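A minimal Python sketch of the "largest t such that t × s ⊆ r" reading follows. It assumes the r(A, B) and s(B) tables of the Division slide, with the fused "23" in the B column read as the two values 2 and 3; `divide` is a hypothetical helper name.

```python
# r(A, B) as a set of pairs and s(B) as a set of values (Division slide data).
r = {("α", 1), ("α", 2), ("α", 3), ("β", 1), ("γ", 1),
     ("γ", 3), ("γ", 4), ("γ", 6), ("δ", 1), ("δ", 2)}
s = {1, 2}


def divide(r, s):
    """Return every A value paired in r with all B values of s."""
    candidates = {a for (a, _) in r}
    return {a for a in candidates if all((a, b) in r for b in s)}


t = divide(r, s)
print(sorted(t))

# The quotient satisfies t × s ⊆ r ...
product = {(a, b) for a in t for b in s}
print(product <= r)

# ... and adding any other A value would break that containment.
others = {a for (a, _) in r} - t
print(all(not {(a, b) for b in s} <= r for a in others))
```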

Division: A More Complex Example

r =
A  B  C  D  E
α  a  α  a  1
α  a  γ  a  1
α  a  γ  b  1
β  a  γ  a  1
β  a  γ  b  3
γ  a  γ  a  1
γ  a  γ  b  1
γ  a  β  b  1

s =
D  E
a  1
b  1

r ÷ s = t =
A  B  C
α  a  γ
γ  a  γ

Generalized Projection

Notation: πe1,…,en (Relation)

e1,…,en can include arithmetic expressions, not just attributes

Example

credit =
cname   limit  balance
Jones   5000   2000
Turner  3000   2500

Then…

πcname, limit - balance (credit) =
cname   limit-balance
Jones   3000
Turner  500
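The credit example can be sketched in Python as follows. The helper name `generalized_project` and the output attribute name `limit_minus_balance` are illustrative assumptions; each output attribute is computed by an arbitrary expression over one input tuple.

```python
credit = [
    {"cname": "Jones", "limit": 5000, "balance": 2000},
    {"cname": "Turner", "limit": 3000, "balance": 2500},
]


def generalized_project(relation, **exprs):
    """Build one output attribute per named expression (hypothetical helper)."""
    return [{name: f(row) for name, f in exprs.items()} for row in relation]


available = generalized_project(
    credit,
    cname=lambda row: row["cname"],                                   # plain attribute
    limit_minus_balance=lambda row: row["limit"] - row["balance"],    # arithmetic expression
)
print(available)
```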

Outer Joins

Motivation:

loan =
bname     lno    amt
Downtown  L-170  3000
Redwood   L-230  4000
Perry     L-260  1700

borrower =
cname  lno
Jones  L-170
Smith  L-230
Hayes  L-155

loan ⋈ borrower =
bname     lno    amt   cname
Downtown  L-170  3000  Jones
Redwood   L-230  4000  Smith

Join result loses…
• any record of Perry
• any record of Hayes

In the results below, ⊥ = NULL.

1. Left Outer Join ( ⟕ )
• preserves all tuples in left relation

loan ⟕ borrower =
bname     lno    amt   cname
Downtown  L-170  3000  Jones
Redwood   L-230  4000  Smith
Perry     L-260  1700  ⊥

2. Right Outer Join ( ⟖ )
• preserves all tuples in right relation

loan ⟖ borrower =
bname     lno    amt   cname
Downtown  L-170  3000  Jones
Redwood   L-230  4000  Smith
⊥         L-155  ⊥     Hayes

3. Full Outer Join ( ⟗ )
• preserves all tuples in both relations

loan ⟗ borrower =
bname     lno    amt   cname
Downtown  L-170  3000  Jones
Redwood   L-230  4000  Smith
Perry     L-260  1700  ⊥
⊥         L-155  ⊥     Hayes
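The three outer joins can be sketched in Python on the loan and borrower tables of these slides. This is an illustrative sketch only: `None` stands in for NULL, `full_outer_join` is a hypothetical helper, and the left and right outer joins are derived from the full one on the assumption that the base relations contain no NULLs themselves.

```python
loan = [
    ("Downtown", "L-170", 3000),
    ("Redwood", "L-230", 4000),
    ("Perry", "L-260", 1700),
]
borrower = [("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")]


def full_outer_join(loan, borrower):
    """Join on lno; pad unmatched tuples with None (stands in for NULL)."""
    rows = []
    matched_lnos = set()
    for bname, lno, amt in loan:
        cnames = [c for c, l in borrower if l == lno]
        if cnames:
            matched_lnos.add(lno)
            rows.extend((bname, lno, amt, c) for c in cnames)
        else:
            rows.append((bname, lno, amt, None))   # loan with no borrower
    for cname, lno in borrower:
        if lno not in matched_lnos:
            rows.append((None, lno, None, cname))  # borrower with no loan
    return rows


full = full_outer_join(loan, borrower)
left = [row for row in full if row[0] is not None]   # keeps every loan tuple
right = [row for row in full if row[3] is not None]  # keeps every borrower tuple
inner = [row for row in full if None not in row]     # plain natural join

for row in full:
    print(row)
```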

Update

Notation: Identifier ← Query

Common Uses:

1. Deletion: r ← r – s
   e.g., account ← account – σbname=Perry (account)
   (deletes all Perry accounts)

2. Insertion: r ← r ∪ s
   e.g., branch ← branch ∪ {(Waltham, Boston, 7M)}
   (inserts new branch with bname = Waltham, bcity = Boston, assets = 7M)
   e.g., depositor ← depositor ∪ (ρtemp (cname,acct_no) (borrower))
   (adds all borrowers to depositors, treating lno’s as acct_no’s)

3. Update: r ← πe1,…,en (r)
   e.g., account ← πbname,acct_no,bal*1.05 (account)
   (adds 5% interest to account balances)

Another Theoretical Query Language

Relational Calculus

Logic-based query language ({x | … }, ∈, ∃, ∀, ∧, ∨, ¬, …)

Two flavors:
• Tuple relational calculus (TRC)
• Domain relational calculus (DRC)

More declarative than RA

RA: πlno (σamt > 1000 (loan))

Procedural
1. Select loan tuples with amt > 1000
2. Project the result of 1 on lno

TRC: {t | ∃s ∈ loan (t[lno] = s[lno] ∧ s[amt] > 1000)}

Non-procedural
• No order of evaluation implied
• Basis for SQL

Bank Database

Account
bname     acct_no  balance
Downtown  A-101    500
Mianus    A-215    700
Perry     A-102    400
R.H.      A-305    350
Brighton  A-201    900
Redwood   A-222    700
Brighton  A-217    750

Depositor
cname    acct_no
Johnson  A-101
Smith    A-215
Hayes    A-102
Turner   A-305
Johnson  A-201
Jones    A-217
Lindsay  A-222

Customer
cname     cstreet    ccity
Jones     Main       Harrison
Smith     North      Rye
Hayes     Main       Harrison
Curry     North      Rye
Lindsay   Park       Pittsfield
Turner    Putnam     Stanford
Williams  Nassau     Princeton
Adams     Spring     Pittsfield
Johnson   Alma       Palo Alto
Glenn     Sand Hill  Woodside
Brooks    Senator    Brooklyn
Green     Walnut     Stanford

Branch
bname     bcity       assets
Downtown  Brooklyn    9M
Redwood   Palo Alto   2.1M
Perry     Horseneck   1.7M
Mianus    Horseneck   0.4M
R.H.      Horseneck   8M
Pownel    Bennington  0.3M
N. Town   Rye         3.7M
Brighton  Brooklyn    7.1M

Borrower
cname     lno
Jones     L-17
Smith     L-23
Hayes     L-15
Jackson   L-14
Curry     L-93
Smith     L-11
Williams  L-17
Adams     L-16

Loan
bname     lno   amt
Downtown  L-17  1000
Redwood   L-23  2000
Perry     L-15  1500
Downtown  L-14  1500
Mianus    L-93  500
R.H.      L-11  900
Perry     L-16  1300

Tuple Relational Calculus: Some Queries

1. Find loans for amounts > $1200
   {t | t ∈ loan ∧ t[amt] > 1200}

Result
bname     lno   amt
Redwood   L-23  2000
Perry     L-15  1500
Downtown  L-14  1500
Perry     L-16  1300

Basic Form: {x | P(x)}
• set comprehension: “the set of all x such that P(x) is true”
• x: tuple variable
• logic contained in predicate (P)
  1. t ∈ loan
  2. t[amt] > 1200
  3. (1) ∧ (2) (equivalent to σ(2) (loan))
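The basic form {x | P(x)} maps almost one-to-one onto a Python set comprehension. The sketch below assumes the Loan table of the Bank Database slide, stored as (bname, lno, amt) tuples; the constant `AMT` is an illustrative name for the position of the amt attribute.

```python
# Loan relation from the Bank Database slide: (bname, lno, amt).
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perry", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
}
AMT = 2  # position of the amt attribute in a loan tuple

# {t | t ∈ loan ∧ t[amt] > 1200}
result = {t for t in loan if t[AMT] > 1200}

for t in sorted(result):
    print(t)
```

`for t in loan` corresponds to predicate (1), and the `if` clause to predicate (2).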

Tuple Relational Calculus: Predicates

Given {x | P(x)}, what can P(x) be?

1. Simple predicate (∈, =, ≠, <, >, ≤, ≥)
   e.g., t ∈ loan
   e.g., t[amt] > 1200

2. Compound predicate (∨, ∧, ¬, ⇒)
   (∨ OR, ∧ AND, ¬ NOT)
   e.g., (t ∈ loan) ∧ t[amt] > 1200
   e.g., ¬(t[bname] = “Downtown”)
   e.g., (t[bname] = “Downtown”) ∨ t[amt] > 1200

Tuple Relational Calculus: Predicates

3. Quantified Predicates (∃, ∀)

(a) Existential Quantification (∃)

∃t ∈ r (Q(t))
• true if there exists some tuple t in r such that Q(t) is true
• e.g., ∃s ∈ loan (s[lno] = “L-17”)

(b) Universal Quantification (∀)

∀t ∈ r (Q(t))
• true if for all tuples t in r, Q(t) is true
• e.g., ∀s ∈ loan (s[amt] > 100)
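Over a finite relation the two quantifiers correspond to Python's built-in `any` and `all`. The sketch below evaluates the two example predicates against the Loan table of the Bank Database slide; the index constants are illustrative names.

```python
# Loan relation from the Bank Database slide: (bname, lno, amt).
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perry", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
}
LNO, AMT = 1, 2  # attribute positions in a loan tuple

# ∃s ∈ loan (s[lno] = "L-17")
exists_l17 = any(s[LNO] == "L-17" for s in loan)

# ∀s ∈ loan (s[amt] > 100)
all_over_100 = all(s[AMT] > 100 for s in loan)

# ∀s ∈ loan (s[amt] > 1000) fails: several loans are at or below 1000
all_over_1000 = all(s[AMT] > 1000 for s in loan)

print(exists_l17, all_over_100, all_over_1000)
```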

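Tuple Relational Calculus: More Queries

2. {t | t ∈ loan ∧ ∃s ∈ loan (s[amt] > t[amt])}

A. Returns everything in loan except for (Redwood, L-23, 2000)

3. {t | t ∈ loan ∧ ∀s ∈ loan (s[amt] ≥ t[amt])}

A. Returns
bname   lno   amt
Mianus  L-93  500

Note: s ranges over every tuple in loan, including t itself, so the comparison under ∀ must be non-strict (≥ or ≤); with a strict comparison the result would be empty.

Q. Express a TRC query to find the largest loan

A. {t | t ∈ loan ∧ ∀s ∈ loan (s[amt] ≤ t[amt])} OR
   {t | t ∈ loan ∧ ¬(∃s ∈ loan (s[amt] > t[amt]))}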

Tuple Relational Calculus: Projection Queries

σ: Find loans for amts > 1200
{t | t ∈ loan ∧ (t[amt] > 1200)}

“t ∈ loan” indicates that t has the same structure as tuples in loan

π: Find loan numbers for all loans for amts > 1200
{t | ∃s ∈ loan (t[lno] = s[lno] ∧ s[amt] > 1200)}

No predicate of form: t ∈ relation
t consists of the attributes used in the set comprehension with t (i.e., lno)

Result =
lno
L-23
L-15
L-14
L-16

Tuple Relational Calculus: Projection Queries

Q. Find names and cities of branches with assets > $3M.

A. {t | ∃s ∈ branch (t[bname] = s[bname] ∧ t[bcity] = s[bcity] ∧ s[assets] > 3M)}

Result =
bname     bcity
Downtown  Brooklyn
R.H.      Horseneck
N. Town   Rye
Brighton  Brooklyn

Tuple Relational Calculus: Join Queries

Find the names of customers w/ loans at the Perry branch.

Answer has form {t | P(t)}.

Strategy for determining P(t):

1. What tables are involved?
   borrower (s), loan (u)

2. What are the conditions?
   (a) Projection: t[cname] = s[cname]
   (b) Join: s[lno] = u[lno]
   (c) Selection: u[bname] = “Perry”

A. {t | ∃s ∈ borrower (P(t,s))} such that:
   P(t,s) ≡ t[cname] = s[cname] ∧ ∃u ∈ loan (Q(t,s,u))
   Q(t,s,u) ≡ s[lno] = u[lno] ∧ u[bname] = “Perry”

OR, the unfolded version (either is ok):

{t | ∃s ∈ borrower (t[cname] = s[cname] ∧ ∃u ∈ loan (s[lno] = u[lno] ∧ u[bname] = “Perry”))}
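The unfolded query reads naturally as a comprehension over borrower with a nested `any` over loan. This is a sketch against the Borrower and Loan tables of the Bank Database slide, not the only way to evaluate the query; the variable names mirror the tuple variables s and u.

```python
# Borrower: (cname, lno); Loan: (bname, lno, amt), from the Bank Database slide.
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"), ("Adams", "L-16"),
}
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perry", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
}

# {t | ∃s ∈ borrower (t[cname] = s[cname] ∧
#        ∃u ∈ loan (s[lno] = u[lno] ∧ u[bname] = "Perry"))}
result = {
    s_cname                                        # projection: t[cname] = s[cname]
    for (s_cname, s_lno) in borrower               # ∃s ∈ borrower
    if any(
        s_lno == u_lno and u_bname == "Perry"      # join and selection
        for (u_bname, u_lno, _amt) in loan         # ∃u ∈ loan
    )
}
print(sorted(result))
```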

Tuple Relational Calculus: Join Queries

Q. Find loan numbers of loans held at branches in Brooklyn.

1. Tables involved
   loan (s), branch (u)

2. Conditions
   (a) Projection: t[lno] = s[lno]
   (b) Join: s[bname] = u[bname]
   (c) Selection: u[bcity] = “Brooklyn”

A. {t | ∃s ∈ loan (P(t,s))} such that:
   P(t,s) ≡ t[lno] = s[lno] ∧ ∃u ∈ branch (Q(t,s,u))
   Q(t,s,u) ≡ s[bname] = u[bname] ∧ u[bcity] = “Brooklyn”

Q. Find the names and cities of customers having a loan from the Perry branch

1. Tables involved
   borrower (s), customer (u), loan (v)

2. Conditions
   (a) Projection: t[cname] = s[cname], t[ccity] = u[ccity]
   (b) Join: s[cname] = u[cname], s[lno] = v[lno]
   (c) Selection: v[bname] = “Perry”

A. {t | ∃s ∈ borrower (P(t,s))} such that:
   P(t,s) ≡ t[cname] = s[cname] ∧ ∃u ∈ customer (Q(t,s,u))
   Q(t,s,u) ≡ t[ccity] = u[ccity] ∧ s[cname] = u[cname] ∧ ∃v ∈ loan (R(t,s,u,v))
   R(t,s,u,v) ≡ s[lno] = v[lno] ∧ v[bname] = “Perry”

Tuple Relational Calculus: Implication (⇒)

p ⇒ q: true if p being true always means q is also true
p ⇒ q ≡ ¬p ∨ q

Resembles if … then

Example

{t | t ∈ loan ∧ (P(t) ⇒ Q(t))}

P(t) ≡ t[bname] = “Perry”
Q(t) ≡ t[amt] > 1000

Result =
bname     lno   amt
Downtown  L-17  1000
Redwood   L-23  2000
Downtown  L-14  1500
Mianus    L-93  500
R.H.      L-11  900
Perry     L-16  1300
Perry     L-15  1500

Every non-Perry loan satisfies the implication vacuously, and both Perry loans exceed 1000, so all seven loans appear in the result.

Tuple Relational Calculus: Implication (⇒)

⇒ is often used with ∀ to express “for all” queries

e.g., Find names of customers who have an account at all branches located in Brooklyn

Connection of “all” to “implies”

Rewording of example:

Find names of customers for whom the following property holds: For every branch, if the branch is located in Brooklyn, this implies that the customer has an account at that branch.

A. {t | ∀s ∈ branch (s[bcity] = “Brooklyn” ⇒ P(t,s))}

What is P(t,s)?

1. Tables involved
   branch (s), depositor (u), account (v)

2. Conditions
   (a) Implication: s[bcity] = “Brooklyn” ⇒ …
   (b) Projection: t[cname] = u[cname]
   (c) Join: s[bname] = v[bname], u[acct_no] = v[acct_no]
   (d) Selection: -

P(t,s) ≡ ∃u ∈ depositor (Q(t,s,u))
Q(t,s,u) ≡ t[cname] = u[cname] ∧ ∃v ∈ account (R(t,s,u,v))
R(t,s,u,v) ≡ s[bname] = v[bname] ∧ u[acct_no] = v[acct_no]
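The ∀ … ⇒ pattern can be sketched in Python with `all` and a small `implies` helper. The data comes from the Branch, Account and Depositor tables of the Bank Database slide, with the assets and balance columns dropped because the query does not use them. Restricting the candidates for t[cname] to names appearing in depositor is an assumption of this sketch; the TRC query itself does not say where t ranges.

```python
branch = {  # (bname, bcity); assets omitted
    ("Downtown", "Brooklyn"), ("Redwood", "Palo Alto"), ("Perry", "Horseneck"),
    ("Mianus", "Horseneck"), ("R.H.", "Horseneck"), ("Pownel", "Bennington"),
    ("N. Town", "Rye"), ("Brighton", "Brooklyn"),
}
account = {  # (bname, acct_no); balance omitted
    ("Downtown", "A-101"), ("Mianus", "A-215"), ("Perry", "A-102"),
    ("R.H.", "A-305"), ("Brighton", "A-201"), ("Redwood", "A-222"),
    ("Brighton", "A-217"),
}
depositor = {  # (cname, acct_no)
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
    ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
    ("Lindsay", "A-222"),
}


def implies(p, q):
    """p ⇒ q, i.e. ¬p ∨ q."""
    return (not p) or q


def has_account_at(cname, bname):
    """P(t,s): ∃u ∈ depositor, ∃v ∈ account linking cname to an account at bname."""
    return any(
        u_cname == cname
        and any(v_bname == bname and v_acct == u_acct for (v_bname, v_acct) in account)
        for (u_cname, u_acct) in depositor
    )


candidates = {cname for (cname, _) in depositor}  # assumed range of t[cname]
result = {
    c for c in candidates
    if all(implies(bcity == "Brooklyn", has_account_at(c, bname))
           for (bname, bcity) in branch)          # ∀s ∈ branch
}
print(sorted(result))
```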

Domain Relational Calculus: Atoms & Formulas

Let
• Di be a domain variable
• c be a domain constant
• θ be a comparison operator

Atoms
• r(D1, D2, …, Dn)
• Di θ Dj
• Di θ c

Formulas
Let F, F1 and F2 be formulas
• ( F )
• not F
• F1 and F2
• F1 or F2

Let D be free* in F(D)
• (exists D) F(D)
• (forall D) F(D)

* a variable is free in a formula if it is not quantified by exists or forall

Domain Relational Calculus: Valid Expression

{ D1, …, Dn | F (D1, …, Dn) }
is a valid DRC expression if it has only the variables appearing to the left of the vertical bar | free in F. Any other variable appearing in F must be bound.

free vs. bound variables
• free (global): variable is not explicitly quantified
• bound (local): variable is declared explicitly through quantification and its scope is the quantified formula

Domain Relational Calculus: Relational Completeness

σcondition (r):
{ R1, …, Rn | r(R1, …, Rn) and condition }

πai,…,aj (r):
{ Ri, …, Rj | r(R1, …, Ri, …, Rj, …, Rn) }

r ∪ s:
{ D1, …, Dn | r(D1, …, Dn) or s(D1, …, Dn) }

r - s:
{ D1, …, Dn | r(D1, …, Dn) and not s(D1, …, Dn) }

q × r:
{ Q1, …, Qm, R1, …, Rn | q(Q1, …, Qm) and r(R1, …, Rn) }

Tuple Relational Calculus: Syntax Summary

{ T1, …, Tn | F (T1, …, Tn) }

• F describes the properties of the data to be retrieved.
• The output schema of F is given by the tuple variables T1, …, Tn that act as global variables in F.

Tuple Relational Calculus: Atoms & Formulas

Let
• T and Ti be tuple variables
• aj be an attribute
• c be a domain constant
• θ be a comparison operator

Atoms
• r(T)
• Ti.am θ Tj.an
• T.ai θ c

Formulas
Let F, F1 and F2 be formulas
• ( F )
• not F
• F1 and F2
• F1 or F2

Let T be free* in F(T)
• (exists T) F(T)
• (forall T) F(T)

* a variable is free in a formula if it is not quantified by exists or forall

Tuple Relational Calculus: Valid Expression

{ T1, …, Tn | F (T1, …, Tn) }
is a valid TRC expression if it has only the variables appearing to the left of the vertical bar | free in F. Any other variable appearing in F must be bound.

free vs. bound variables
• free (global): variable is not explicitly quantified
• bound (local): variable is declared explicitly through quantification and its scope is the quantified formula

Tuple Relational Calculus: Relational Completeness

σcondition (r):
{ R | r(R) and condition }

πai,…,aj (r):
{ R.ai, …, R.aj | r(R) }

r ∪ s:
{ T | r(T) or s(T) }

r - s:
{ T | r(T) and not s(T) }

q × r:
{ Q, R | q(Q) and r(R) }

Introduction

- Relational algebra is procedural: it specifies the procedure to be followed in order to get the answer to the query.

- Relational calculus is declarative: it describes (declares) the answer to the query without specifying how to get it.

- Relational calculus strongly resembles First Order Predicate Logic, or simply first order logic.

- There are two variants of relational calculus:

- Tuple relational calculus (TRC)

- Domain relational calculus (DRC)

TUPLE RELATIONAL CALCULUS

- A query statement in TRC is a set declaration having the form:

{ P | first-order logic formula }

- This is to be read as ‘the set of tuple variables, P, for which the specified first order logic formula is true’.

- Thus a TRC query is a request (to the DBMS) to produce a set of tuples corresponding to the tuples of the relational answer in SQL.

- Example

Given the following query:

(Q11) Find all sailors with a rating above 7.

The TRC statement of this query is

{S | S ∈ Sailors ∧ S.rating > 7}.

SYNTAX AND SEMANTICS OF TRC

• The syntax and semantics of TRC is that of first-order logic. It is stated quite precisely in the text and there is no need to repeat it here. Instead we shall examine a few query applications.

QUERY Q12

(Q12) Find the names and ages of sailors with a rating above 7.

{P | ∃S ∈ Sailors (S.rating > 7 ∧ P.name = S.name ∧ P.age = S.age)}

Remarks

1. The fact that the tuple variable P occurs with two attributes (using the dot notation) means that solely these two attributes are required in the answer relation.

2. The symbols used are the usual first-order logic symbols:

∀: for all   ∃: there exists   ∧: and   ∨: or   ¬: not   ⇒: implies

QUERIES 1,2,7,9,14

The TRC statements for these queries are pretty well self explanatory, especially with the added English statements of how to read them.

DOMAIN RELATIONAL CALCULUS (1)

- The form of a DRC query is as follows:

{<X1, X2, … , Xn> | logical DRC formula}

signifying that the system must construct (and output) a set of all the tuples which satisfy the stated logical DRC formula in terms of the n attributes X1, X2, … ,Xn. Thus, the answer is a relational instance with attributes X1, X2, … , Xn, these attributes corresponding to those of some of the relations in the database.

- Again, the approach used by the system is left unspecified.

- The Syntax and the semantics of the DRC are explicitly and precisely described in the text.

DOMAIN RELATIONAL CALCULUS (2)

Example:

(Q11) Find all sailors with a rating above 7.

{ <I, N, T, A> | <I, N, T, A> ∈ Sailors ∧ T > 7 }

Other queries are illustrated and described in the text with all necessary explanation.

EXPRESSIVE POWER OF ALGEBRA AND CALCULUS (1)

Safety

- Certain queries stated in the relational calculus may lead to answers which contain an infinite number of tuples (or at least as many as the system can handle).

Example:

Consider the TRC query {S | ¬(S ∈ Sailors)}. Since there is a quasi-infinite number of tuples that can be created with the attributes of Sailors, the answer is (quasi)-infinite.

- A query which yields a (quasi)-infinite answer is said to be unsafe, and, of course, should not be allowed by the system.

- It is possible to define a safe formula in TRC (see text, section 4.4).

Lab Session 2: Tuple Relational Calculus

• A query in the tuple relational calculus is expressed as

{t | P(t)}

(set of all tuples t such that predicate P is true for t)

• Query with constructs “or”, “and”, “there exists”

∃t ∈ r (Q(t))

(there exists a tuple t in relation r such that predicate Q(t) is true)

• Query with the construct “implies”

P ⇒ Q

(if P is true, then Q must be true)

• Query with the construct “for all”

∀t ∈ r (Q(t))

(Q is true for all tuples t in relation r)

Tuple Relational Calculus

• Relations
  • V(d, k): visits(drinker, pub)
  • S(k, b): serves(pub, beer)
  • L(d, b): likes(drinker, beer)
  (k and b abbreviate the Dutch attribute names kroeg and bier)

• Example (one relation)
  – We want the drinkers and the pubs for visitors of the pub ‘Café’:

    {t | t ∈ V ∧ t[k] = ‘Café’}

Tuple Relational Calculus

• Example (one relation)
  – If we want only the drinker attribute rather than all the attributes of the V relation:

    {t | ∃s ∈ V (t[d] = s[d] ∧ s[k] = ‘Café’)}

• Example (two relations)
  – Find the names of drinkers that like Duvel

    {t | ∃s ∈ V (t[d] = s[d] ∧ ∃u ∈ L (s[d] = u[d] ∧ u[b] = ‘Duvel’))}

Tuple Relational Calculus

• Example (union): S ∪ L
  – If we want all beers (served or liked)

    {t | ∃s ∈ S (t[b] = s[b]) ∨ ∃u ∈ L (t[b] = u[b])}

• Example (intersection): S ∩ L
  – If we want the beers that are both served and liked

    {t | ∃s ∈ S (t[b] = s[b]) ∧ ∃u ∈ L (t[b] = u[b])}

Functional Dependency

• Canonical Cover
  – Definition: a canonical cover Fc for F is a set of dependencies such that F logically implies all dependencies in Fc, and Fc logically implies all dependencies in F.
  – Fc is a minimal set of functional dependencies that has the same closure as the given set of functional dependencies
  – Application: reducing the effort spent in checking for constraint violations

Functional Dependency

• Canonical Cover
  – Computing Canonical Cover

Fc = F
Repeat
  • Use the union rule to replace any dependencies in Fc of the form X → Y and X → Z with X → YZ
  • Find a functional dependency X → Y in Fc with an extraneous attribute either in X or in Y
  • If an extraneous attribute is found, delete it from X → Y
Until Fc does not change

Extraneous attribute
An attribute of a functional dependency that can be removed without changing the closure of the set of functional dependencies

Note
The test for extraneous attributes is done using Fc, not F

Functional Dependency

• Computing Canonical Cover
  – Exercise

F = {BCD → A, BC → E, A → F, F → G, C → D, A → G}

• Fc = F
• Union rule: A → F, A → G then A → FG
• Fc = {BCD → A, BC → E, F → G, C → D, A → FG}

• D is an extraneous attribute in BCD → A
  – To prove: F |- (F - {BCD → A}) ∪ {BC → A}
  – Proof:
    C → D (given)
    C → CD (augmentation)
    BC → BCD (augmentation)
    BCD → A (given)
    BC → A (transitivity)
• Fc = {BC → A, BC → E, F → G, C → D, A → FG}

• G is an extraneous attribute in A → FG
  – To prove: (F - {A → FG}) ∪ {A → F} |- F
  – Proof:
    F → G (given)
    F → FG (augmentation)
    A → F (given in F)
    A → FG (transitivity)
• Fc = {BC → A, BC → E, F → G, C → D, A → F}
• Union rule: BC → A, BC → E then BC → AE
• Fc = {BC → AE, F → G, C → D, A → F}
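The result of the exercise can be double-checked with attribute closures: two sets of dependencies are equivalent when each one implies every dependency of the other. The sketch below is only a verification aid under that standard test. It does not implement the repeat loop itself, and the names `closure` and `implies_all` are illustrative. Dependencies are written as (lhs, rhs) pairs of single-letter attribute strings.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply X → Y whenever X is already contained in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies_all(fds, targets):
    """True if fds logically implies every dependency in targets."""
    return all(set(rhs) <= closure(lhs, fds) for lhs, rhs in targets)


F = [("BCD", "A"), ("BC", "E"), ("A", "F"), ("F", "G"), ("C", "D"), ("A", "G")]
Fc = [("BC", "AE"), ("F", "G"), ("C", "D"), ("A", "F")]

# D is extraneous in BCD → A because A is already in the closure of BC under F.
print("A" in closure("BC", F))

# F and Fc imply each other, so they have the same closure.
print(implies_all(F, Fc) and implies_all(Fc, F))
```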