17
Wolf-Tilo Balke Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig www.ifis.cs.tu-bs.de Relational Database Systems 1 Implementing the relational model SQL Queries SELECT Data manipulation language (next lecture) INSERT UPDATE DELETE Data definition language (next lecture) CREATE TABLE ALTER TABLE DROP TABLE Relational Database Systems 1 Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 2 Overview A hotel database: Hotel(hotelNo , hotelName, city) Room(roomNo , hotelNo Hotel, type, price) Booking(hotelNo Hotel/Room, guestNo Guest, dateFrom , dateTo, roomNo Room) Guest(guestNo , guestName, guestAddress) Relational Database Systems 1 Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 3 Exercise 6.1 Give relational algebra expressions for the following queries in natural language: a) Create a list of all hotels that includes the total number of rooms for each hotel. π hotelName, count roomNo ( hotelNo, hotelName count(roomNo) (Hotel Room)) Of course, this does not work for hotels without any rooms. However, let‟s assume that there is no such hotel ... b) What‟s the average room price at the Balke Inn? π average price ( average(price) ( ς hotelName = „Balke Inn‟ (Hotel) Room)) Relational Database Systems 1 Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 4 Exercise 6.1 c) How many different people made a reservation for a room whose price is higher than the average room price at the respective hotel? AvgPrice ≔π hotelNo, average price ( hotelNo average(price) (Hotel Room)) ExpensiveRooms π hotelNo, roomNo (ς price > average price (Room AvgPrice)) RichGuests = π guestNo (Booking ExpensiveRooms) Result = count(guestNo) RichGuests Relational Database Systems 1 Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 5 Exercise 6.1 Describe (using natural language) the relations that would be produced by the following tuple relational calculus expressions: a) { H.hotelName | hotel(H) H.city = „Braunschweig} List the names of all hotels in Braunschweig! b) { H.hotelName | Hotel(H) ∧ ∃R ( Room(R) H.hotelNo = R.hotelNo R.price > 50)} List the names of all hotels offering a room that costs more than 50! Relational Database Systems 1 Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 6 Exercise 6.2

Relational Database Systems 1 - TU Braunschweig

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Relational Database Systems 1 - TU Braunschweig

Wolf-Tilo Balke

Joachim Selke

Institut für Informationssysteme

Technische Universität Braunschweig

www.ifis.cs.tu-bs.de

Relational

Database Systems 1

• Implementing the relational model

• SQL

– Queries

• SELECT

– Data manipulation language (next lecture)

• INSERT

• UPDATE

• DELETE

– Data definition language (next lecture)

• CREATE TABLE

• ALTER TABLE

• DROP TABLE

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

Overview

• A hotel database:

Hotel(hotelNo, hotelName, city)

Room(roomNo, hotelNo → Hotel, type, price)

Booking(hotelNo → Hotel/Room, guestNo → Guest,

dateFrom, dateTo, roomNo → Room)

Guest(guestNo, guestName, guestAddress)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3

Exercise 6.1

• Give relational algebra expressions for the following queries in natural language:

a) Create a list of all hotels that includes the total number of rooms for each hotel.

πhotelName, count roomNo(

hotelNo, hotelName𝔉count(roomNo) (Hotel ⋈ Room))

• Of course, this does not work for hotels without any rooms. However, let‟s assume that there is no such hotel ...

b) What‟s the average room price at the Balke Inn?

πaverage price(𝔉average(price) (

ςhotelName = „Balke Inn‟(Hotel) ⋈ Room))

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4

Exercise 6.1

c) How many different people made a reservation for a

room whose price is higher than the average room

price at the respective hotel?

AvgPrice ≔ πhotelNo, average price(

hotelNo𝔉average(price) (Hotel ⋈ Room))

ExpensiveRooms ≔πhotelNo, roomNo(ςprice > average price(Room ⋈ AvgPrice))

RichGuests = πguestNo(Booking ⋈ ExpensiveRooms)

Result = 𝔉count(guestNo)RichGuests

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 5

Exercise 6.1

• Describe (using natural language) the relations

that would be produced by the following

tuple relational calculus expressions:

a) { H.hotelName | hotel(H) ∧ H.city = „Braunschweig‟ }

List the names of all hotels in Braunschweig!

b) { H.hotelName | Hotel(H) ∧ ∃R (

Room(R) ∧ H.hotelNo = R.hotelNo ∧ R.price > 50)}

List the names of all hotels offering a room that

costs more than €50!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 6

Exercise 6.2

Page 2: Relational Database Systems 1 - TU Braunschweig

c) { H.hotelName | Hotel(H) ∧ ∃B ∃G (

Booking(B) ∧ Guest(G) ∧ H.hotelNo = B.hotelNo

∧ B.guestNo = G.guestNo

∧ G.guestName = „Wolf-Tilo Balke‟)}

List the names of all hotels in that Wolf-Tilo Balke

has or had a reservation!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7

Exercise 6.2

d) { H.hotelName, G.guestName, B1.dateFrom, B2.dateFrom |Hotel(H) ∧ Guest(G) ∧ Booking(B1) ∧ Booking(B2)

∧ H.hotelNo = B1.hotelNo ∧ G.guestNo = B1.guestNo

∧ B2.hotelNo = B1.hotelNo ∧ B2.guestNo = B1.guestNo

∧ B2.dateFrom ≠ B1.dateFrom}

List the names of all hotels and guests that

have or had different reservations at the same hotel;

for each hotel and guest, also list all possible pairs

of starting dates for the corresponding reservations.

• Hint: (hotelNo, guestNo, dateFrom) is the primary key of

Booking, which makes each result line a pair of reservations

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8

Exercise 6.2

• Generate the relational algebra, tuple relational

calculus, and domain relational calculus

expressions for the following queries:

a) List all hotels.

RA: πhotelName(Hotel)

TRC: { h.hotelName | Hotel(h) }

DRC: { h2 | ∃h1, h3 Hotel(h1, h2, h3) }

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9

Exercise 6.3

b) List all single rooms with a price below €20 per night.

RA: πhotelName, city, roomNo(

ςtype = „single‟ ∧ price < 20 (Room ⋈ Hotel))

TRC: { h.hotelName, h.city, r.roomNo |

Hotel(h) ∧ Room(r) ∧ r.type = „single‟

∧ r.price < 20 ∧ h.hotelNo = r.hotelNo }

DRC: { h2, h3, r1 | ∃r4, h1 (Room(r1, h1, „single‟, r4)

∧ r4 < 20 ∧ Hotel(h1, h2, h3)) }

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 10

Exercise 6.3

c) List the price and type of all rooms at the Balke Inn.

RA: πroomNo, type, price(Room ⋈ ςhotelName = „Balke Inn‟(Hotel))

TRC: { r.roomNo, r.roomType, r.price | Room(r)

∧ ∃h (Hotel(h) ∧ h.hotelName = „Balke Inn‟

∧ h.hotelNo = r.hotelNo) }

DRC: { r1, r3, r4 | ∃h1, h3 (Room(r1, h1, r3, r4)

∧ Hotel(h1, „Balke Inn‟, h3)) }

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 11

Exercise 6.3

d) List the details (guestNo, guestName, and guestAddress)

of all guests currently staying at the Balke Inn.

RA: Guest ⋉ ςdateFrom ≤ TODAY ∧ dateTo ≥ TODAY

∧ hotelName = „Balke Inn‟(Hotel ⋈ Booking)

TRC: { g | Guest(g) ∧ ∃b, h (Booking(b) ∧ Hotel(h)

∧ b.guestNo = g.guestNo ∧ b.hotelNo = h.hotelNo

∧ b.dateFrom ≤ TODAY ∧ b.dateTo ≥ TODAY

∧ h.hotelName = „Balke Inn‟) }

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12

Exercise 6.3

Page 3: Relational Database Systems 1 - TU Braunschweig

DRC: { g1, g2, g3 | Guest(g1, g2, g3) ∧ ∃h1, h3, b3, b4, b5 ( Hotel(h1, „Balke Inn‟, h3)

∧ Booking(h1, g1, b3, b4, b5) ∧ b3 ≤ TODAY

∧ b4 ≥ TODAY) }

e) List the details of all rooms at the Balke Inn,including the name of the guest currently stayingin the room, if the room is occupied.

RA: πroomNo, type, price, guestName(Room

⎧ (ςhotelName = „Balke Inn‟(Hotel)⋈ ςdateFrom ≤ TODAY ∧ dateTo ≥ TODAY(Booking)

⋈ Guest))

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 13

Exercise 6.3

TRC:

BalkeRoom(r) ≔ Room(r) ∧ ∃h (Hotel(h)

∧ h.hotelName = „Balke Inn‟ ∧ h.hotelNo = r.hotelNo)

CurrentBooking(g, r) ≔ Guest(g) ∧ ∃b (Booking(b)

∧ b.hotelNo = r.hotelNo ∧ b.roomNo = r.roomNo

∧ b.dateFrom ≤ TODAY ∧ b.dateTo ≥ TODAY

∧ b.guestNo = g.guestNo)

{ r.roomNo, r.type, r.price, g.guestName | BR(r) ∧

(CB(g, r) ∨ (≦∃g‟ CB(g‟, r) ∧ g = (null, null, null, null))) }

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 14

Exercise 6.3

DRC:

BalkeRoom(r1, r2, r3, r4) ≔ Room(r1, r2, r3, r4)

∧ ∃h3 Hotel(r2, „Balke Inn‟, h3)

CurrentBooking(g2, r1, r2) ≔ ∃g1, g3 (Guest(g1, g2, g3)

∧ ∃b3, b4 (Booking(r2, g1, b3, b4, r1)

∧ b3 ≤ TODAY ∧ b4 ≥ TODAY))

{ r1 r3, r4, g2 | ∃r2 BR(r1, r2, r3, r4) ∧

(CB(g2, r1, r2) ∨ (≦∃g2‟ CB(g2‟, r1, r2) ∧ g2 = null)) }

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 15

Exercise 6.3

• In the early 1970s, the relational modelbecame a “hot topic” database research

– Based on set theory

– A relation is a subset of theCartesian product over a list of domains

• Early “query interfaces” for the relational model:

– Relational algebra

– Tuple relational calculus (SQUARE, SEQUEL)

– Domain relational calculus (QBE)

• Question: How to build a working database management system using this theory?

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 16

7.1 FromTheory to Practice

• System R was the first working prototype of

a relational database system (starting 1973)

– Most design decisions taken during the

development of System R substantially influenced

the design of subsequent systems

• Questions

– How to store and represent data?

– How to query for data?

– How to manipulate data?

– How do you do all this with good performance?

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 17

7.1 FromTheory to Practice

• The challenge of the System R project was to create a working prototype system

– Theory is good

– But developers were willing to sacrifice theoretical beauty and clarity for the sake of usability and performance

• Vocabulary change

– Mathematical terms were too unfamiliar for most people

– Table = relation

– Row = tuple

– Column = attribute

– Data type, domain = domain

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 18

7.1 FromTheory to Practice

Page 4: Relational Database Systems 1 - TU Braunschweig

• Design decisions:

During the development of System R, two major

and very controversial decisions had been made

– Allow duplicate tuples

– Allow NULL values

• Those decisions are still subject to discussions…

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 19

7.1 FromTheory to Practice

• Duplicates

– In a relation, there cannot be any duplicate tuples

– Also, query results cannot contain duplicates

• Remember: The relational algebra and relational calculi

all had implicit duplicate elimination

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 20

7.1 FromTheory to Practice

• Practical considerations

– You want to query for name and birth year of all students of TU Braunschweig

– The result returns roughly 13,000 tuples

– Probably there are some duplicates

– It‟s 1973, and your computer has 16 kilobytes ofmain memory and a very slow external storage device…

– To eliminate duplicates, you need to store the result,sort it, and scan for adjacent duplicate lines• System R engineers concluded that this effort

is not worth the effect

• Duplicate elimination in result sets happens only on-request

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 21

7.1 FromTheory to Practice

• Decision: Don‟t eliminate duplicates in results

• What about the tables?

– Again: Ensuring that no duplicates end up in the tables requires some work

– Engineers also concluded that there is actually no need in enforcing the no-duplicate policy

• If the user wants duplicates and is willing to deal with all the arising problems – then that‟s fine

• Decision: Allow duplicates in tables

• As a result, the theory underlying relational databases shifted from set theory to multi-set theory

– Straightforward, only notation is more complicated

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 22

7.1 FromTheory to Practice

• Sometimes, an attribute value is not known or

an attribute does not apply for an entity

– Example: What value should the attribute

universityDegree take for the entity Heinz Müller,

if Heinz Müller does not have any degree?

– Example: You regularly observe the weather and

store temperature, wind strength, and

air pressure every hour – and then

your barometer breaks... What now?

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 23

7.1 FromTheory to Practice

• Possible solution:

For each domain, define a value indicating that

data is not available, not known, not applicable, …

– For example, use none for Heinz Müller‟s degree,

use −1 for missing pressure data, ...

– Problem:

• You need such a special value for each domain or use case

• You need special failure handling for queries, e.g.

“compute average of all pressure values that are not −1”

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 24

7.1 FromTheory to Practice

Page 5: Relational Database Systems 1 - TU Braunschweig

• Again, system designers chose the simplest solution

(regarding implementation): NULL values

– NULL is a special value which is usable in any domain

and represents that data is just there

• There are many interpretations of what NULL actually means

– Systems have some default rules how to deal with

NULL values

• Aggregation functions usually ignore rows with NULL values

(which is good in most, but not all cases)

• Three-valued logic

• However, creates some strange anomalies

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 25

7.1 FromTheory to Practice

• Another tricky problem:

How should users query the DB?

• Classical answer:

– Relational algebra and relational calculi

– Problem: More and more non-expert users

• More “natural” query interfaces:

– QBE (query by example)

– SEQUEL (structured English query language)

– SQL: the current standard; derived from SEQUEL

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 26

7.1 FromTheory to Practice

• The design of relational query languages

– Donald D. Chamberlin and Raymond F. Boyce

worked on this task

– Both of IBM Research in San Jose, California

– Main concern: “Querying relational databases

is too difficult with current paradigms”

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 27

7.2 SQUARE & SEQUEL

• “Current paradigms” at the time:

– Relational algebra

• Requires users to define how and in which order data

should be retrieved

• The specific choice of a sequence of operations has an

enormous influence on the system‟s performance

– Relational calculi (tuple, domain)

• Provide declarative access to data, which is good

• Just state what you want and not how to get it

• Relational calculi are quite complex:

many variables and quantifiers

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 28

7.2 SQUARE & SEQUEL

• Chamberlin and Boyce‟s first result was a

query language called SQUARE

– “Specifying queries as relational expressions”

– Based directly on tuple relational calculus

– Main observations:

• Most database queries are rather simple

• Complex queries are rarely needed

• Quantification confuses people

• Under the closed-world assumption,

any TRC expression with quantifiers can be replaced

by a join of quantifier-free expressions

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 29

7.2 SQUARE & SEQUEL

• SQUARE is a notation for (or interface to) TRC

– No quantifiers, implicit notation of variables

– Adds additional functionality needed in practice

(grouping, aggregating, among others)

– Solves safety problem by introducing the

closed world assumption

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 30

7.2 SQUARE & SEQUEL

Page 6: Relational Database Systems 1 - TU Braunschweig

• Retrieve the names of all female students

– TRC: { t.name | students(t) ⋀ t.sex = „f ‟ }

– SQUARE: namestudentssex („f ‟)

• Get all exam results better than 2.0 in course 101

– TRC:{ t.result | exams(t) ∧ t.crsNr = 101 ∧ t.result < 2.0}

– SQUARE: resultexamscrsNr, result (101, <2.0)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 31

What part of the

result tuples

should be returned?The range relation of

the result tuples

Attributes with conditions

Conditions

7.2 SQUARE & SEQUEL

• Get a list of all exam results better than 2.0

along with the according student name

– TRC:

{ t1.name, t2.result | students(t1) ⋀ exams(t2)

⋀ t1.matNr = t2.matNr ⋀ t2.result < 2.0 }

– SQUARE:

name result students matNr ⃘ matNr examsresult (<2.0)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 32

Join of two SQUARE queries

7.2 SQUARE & SEQUEL

Also, ∪, ∩, and ∖ can be used

to combine SQUARE queries.

• Also, SQUARE is relationally complete

– You do not need explicit quantifiers

– Everything you need can be done using

conditions and query combining

• However, SQUARE was not well received

– Syntax was difficult to read and parse, especially

when using text console devices:

• name result students matNr ⃘ matNr exams crsNr result (102, <2.0)

– SQUARE‟s syntax is too mathematical and artificial

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 33

7.2 SQUARE & SEQUEL

• In 1974, Chamberlin & Boyce proposed SEQUEL

– Structured English Query Language

– Based on SQUARE

• Guiding principle:

– Use natural English keywords to structure queries

– Supports “fluent” vocalization and notation

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 34

7.2 SQUARE & SEQUEL

• Fundamental keywords

– SELECT: What attributes should be retrieved?

– FROM: What relations are involved?

– WHERE: What conditions should hold?

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 35

7.2 SQUARE & SEQUEL

• Get all exam results better than 2.0 for course 101

– SQUARE:

resultexamscrsNr result (101, < 2.0)

– SEQUEL:

SELECT result FROM exams

WHERE crsNr = 101 AND result < 2.0

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 36

7.2 SQUARE & SEQUEL

Page 7: Relational Database Systems 1 - TU Braunschweig

• Get a list of all exam results better than 2.0,

along with the according student names

– SQUARE:

name result students matNr ⃘ matNr result examsresult (< 2.0)

– SEQUEL:

SELECT name, result

FROM students, exams

WHERE students.matNr = exams.matNr

AND result < 2.0

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 37

7.2 SQUARE & SEQUEL

• IBM integrated SEQUEL into System R

• It proved to be a huge success

– Unfortunately, the name SEQUEL already

has been registered as a trademark

by the Hawker Siddeley aircraft company

– Name has been changed to SQL (spoken: Sequel)

• Structured query language

– Patented in 1985 by IBM

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 38

7.2 SQUARE & SEQUEL

• Since then, SQL has been adopted by all(?)

relational database management systems

• This created a need for standardization:

– 1986: SQL-86 (ANSI standard, ISO standard)

– SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008

– The official pronunciation is “es queue el”

• However, most database vendors treat the

standard as some kind of “recommendation”

– More on this later (next lecture)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 39

7.2 SQUARE & SEQUEL

• There are three major classes of DB operations:

– Defining relations, attributes, domains, constraints, ...• Data definition language (DDL)

– Adding, deleting and modifying tuples

• Data manipulation language (DML)

– Asking queries

• Often part of the DML

• SQL covers all these classes

• In this lecture, we will use IBM DB2’s SQL dialect

– DB2 is used in our SQL lab

– Similar notation in other RDBMS(at least for the part of SQL we teach you in this lecture)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 40

7.3 Overview of SQL

• According to the SQL standard, relations and other database objects exist in an environment

– Think “environment = RDBMS”

• Each environment consists of catalogs

– Think “catalog = database”

• Each catalog consists of a set of schemas

– Think “schema = group of tables (and other stuff)”

• A schema is a collection of database objects (tables, domains, constraints, ...)

– Each database object belongs to exactly one schema

– Schemas are used to group related database objects

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 41

7.3 Overview of SQL

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 42

7.3 Overview of SQL

Environment

“BigCompany database server”

Catalog

“humanResources”

Schema

“people”

SchemaSchema

“training”

Catalog

“production”

Schema “products”Schema “products”

Schema “testing”

Schema “taxes”Schema “taxes”

Table “staff”

Table

“has Office”

...

...

...

...

Page 8: Relational Database Systems 1 - TU Braunschweig

• When working with the environment,users connect to a single catalog andhave access to all database objects in this catalog

– However, accessing/combining data objects from different catalogs usually is not possible

– Thus, typically, catalogs are the maximum scopeover that SQL queries can be issued

– In fact, the SQL standard defines an additional layerin the hierarchy on top of catalogs

• Clusters are used to group related catalogs

• According to the standard, they provide the maximum scope

• However, hardly any vendor supports clusters

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 43

7.3 Overview of SQL

• After connecting to a catalog, database objects can be referenced using their qualified name

– Qualified name = schemaname.objectname

• However, when working only with objects from asingle schema, using unqualified names would be nice

– Unqualified name = objectname

• One schema always is defined to be the default schema

– SQL implicitly treats objectname as defaultschema.objectname

– The default schema can be set to schemaname as follows:SET SCHEMA schemaname

– Initially, after login: default schema = current user name• Remember to change the default schema accordingly!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 44

7.3 Overview of SQL

• SQL queries

– Simple SELECT

– Joins

– Set operations

– Aggregation and grouping

– Subqueries

– Writing “good” SQL code

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 45

7.4 SQL Queries

• SQL queries are statements that

retrieve information from a DBMS

– Simplest case: From a single table,

return all rows matching some given criterion

– SQL queries may return multi-sets (bags) of rows

• Duplicates are allowed by default

(but can be eliminated on request)

• Even just a single value is a row set

(one row with one column)

• However, often it‟s just called a result set…

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 46

7.4 SQL Queries

• Basic structure of SQL queries:

–SELECT <attribute list> FROM <table list> WHERE <condition>

– Attribute list: Attributes to be returned (projection)

– Table list: All tables involved

– Condition: A Boolean expression that

intensionally defines the result set

• If no condition is provided, it is implicitly replaced by TRUE

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 47

7.4 SQL Queries

• SQL performs duplicate elimination

of rows in result set

– May be expensive (due to sorting)

– DISTINCT keyword is used

• Example:

– SELECT DISTINCT name FROM staff

• Returns all different names of staff members,

without duplicates

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 48

7.4 Enforcing Sets in SQL

Page 9: Relational Database Systems 1 - TU Braunschweig

• The SELECT keyword is sometimes a bit

confusing, since it actually denotes a projection

• To return all attributes under consideration,

the wildcard “*” may be used

• Examples:

– SELECT * FROM …Return all attributes of the tables in the FROM clause

– SELECT movies.* FROM movies, persons WHERE ...Return all attributes of the movies table

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 49

7.4 Attribute Names

• Attribute names are qualified or unqualified

– Unqualified: Just the attribute name

• Only possible, if attribute name is unique among

the tables given in the FROM clause

– Qualified: tablename.attributename

• Necessary if tables share identical attribute names

• If tables in the FROM clause share identical attribute names

and also identical table names, we need even more

qualification:

schemaname.tablename.attributename

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 50

7.4 Attribute Names

• The the result set/table consists ofthe attributes given in the SELECT clause

• However, result attributes can be renamed

– Remember the renaming operator ρ fromrelational algebra…

– SQL uses the AS keyword for renaming

– The new names can also be used in the WHERE clause

• Example:

– SELECT persons.personName AS nameFROM persons, … WHERE name = ’Smith’ AND ...• personName is now called name in the result schema

• name = ’Smith’ means persons.personName = ’Smith’

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 51

7.4 Attribute Names

• Table names can be referenced in the same wayas attribute names (qualified or unqualified)

• However, renaming works slightly different

– The result table of an SQL query has no name

– But tables can be given alias names to simplify queries(also called tuple variables or just aliases)

– Indicated by the AS keyword

• Example:SELECT title, genreFROM movies AS m, genres AS gWHERE m.id = g.id

• The AS keyword is optional: FROM movies m, genres g

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 52

7.4 Table Names

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 53

7.4 Basic Select

SELECTSELECT expression

ALLALL

DISTINCTDISTINCT

,,

table name

**

..

column nameASAS

FROMFROM table name

alias name

ASAS

,,

query ))((

WHEREWHERE search condition GROUP BYGROUP BY column namecolumn name

,,

HAVINGHAVING search condition

attribute names

table names

query block

• Usually, SQL queries return exactly those tuples

matching a given search condition

– Indicated by the WHERE keyword

– The condition is a logical expression which can be

applied to each row and may have one of three values

TRUE, FALSE, and NULL

• Again, NULL might mean “unknown,” “does not apply,”

“does not matter,” ...

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 54

7.4 Conditions

Page 10: Relational Database Systems 1 - TU Braunschweig

• Search conditions are conjunctions of predicates

– Each predicate evaluates to TRUE, FALSE, or NULL

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 55

7.4 Conditions

search condition

NOTNOT

predicate

search condition(( ))

ANDAND

OROR

• Why TRUE, FALSE, and NULL?

– SQL uses so-called ternary (three-valued) logic

– When a predicate cannot be evaluated because itcontains some NULL value, the result will be NULL• Example: powerStrength > 10 evalutes to NULL

iff powerStrength is NULL

• NULL = NULL also evaluates to NULL

• Handling of NULL by the operators AND and OR:

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 56

7.4 Conditions

AND TRUE FALSE NULL

TRUE TRUE FALSE NULL

FALSE FALSE FALSE FALSE

NULL NULL FALSE NULL

OR TRUE FALSE NULL

TRUE TRUE TRUE TRUE

FALSE TRUE FALSE NULL

NULL TRUE NULL NULL

NOT NULL

TRUE FALSE

FALSE TRUE

? NULL

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 57

7.4 Conditionspredicate

expressionexpression

expressionexpression

expressionexpression

expressionexpression

NOTNOT

comparisoncomparison expressionexpression

expressionexpression expressionexpressionBETWEENBETWEEN ANDAND

NOTNOT

NULLNULLISIS

expressionexpression

NOTNOT

LIKELIKE

expressionexpressionESCAPEESCAPE

expressionexpression

NOTNOT

ININ

queryquery(( ))

(( ))expressionexpression

,,

EXISTSEXISTS queryquery(( ))

expressionexpression comparisoncomparison

SOMESOME

ANYANYALLALL

queryquery(( ))

• Predicates usually contain expressions

– Column names or constants

– Additionally, SQL provides some special expressions

• Functions

• CASE expressions

• CAST expressions

• Scalar subqueries

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 58

7.4 Conditions

• Expressions can be combined usingexpression operators

– Arithmetic operators:+, -, *, and / with the usual semantics

• age + 2

• price * quantity

– String concatenation ||: (also written as CONCAT ) Combines two strings into one

• firstName || ’ ’ || lastname || ’ (aka ’ || alias || ’)’

• ’Hello’ CONCAT ’ World’ → ’Hello World’

– Parenthesis:Used to modify the evaluation order

• (price + 10) * 20

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 59

7.4 Conditions

• Simple comparisons:– Valid comparison operators are

• =, <, <=, >=, and >

• <> (meaning: not equal)

– Data types of expressions need to be compatible (if not, CAST has to be used)

– Character values are usually compared lexicographically (while ignoring case)

– Examples:• powerStrength > 10• name = ’Magneto’• ’Magneto’ <= ’Professor X’

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 60

7.4 Conditions

expressionexpression comparisoncomparison expressionexpression

Page 11: Relational Database Systems 1 - TU Braunschweig

• BETWEEN predicate:

– X BETWEEN Y AND Z is a shortcut for

Y <= X AND X =< Z

– Note that you cannot reverse the order of Z and Y

• X BETWEEN Y AND Z is different from

X BETWEEN Z AND Y

• The expression can never be true if Y > Z

– Examples:

• year BETWEEN 2000 AND 2008

• score BETWEEN targetScore - 10 AND targetScore + 10

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 61

7.4 Conditions

expressionexpression

NOTNOT

expressionexpression expressionexpressionBETWEENBETWEEN ANDAND

• IS NULL predicate:

– The only way to check if a value is NULL or not

– Returns either TRUE or FALSE

– Examples:

• realName IS NOT NULL

• powerStrength IS NULL

• NULL IS NULL

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 62

7.4 Conditions

expressionexpression

NOTNOT

NULLNULLISIS

• LIKE predicate:– The predicate is for matching character strings to patterns

– Match expression must be a character string

– Pattern expression is a (usually constant) string• May not contain column names

– Escape expression is just a single character

– During evaluation, the match expression is compared to the pattern expression with following additions• _ in the pattern expression represents any single character

• % represents any number of arbitrary characters

• The escape character prevents the special semantics of _ and %

– Most modern database nowadays also support more powerful regular expressions (introduced in SQL-99)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 63

7.4 Conditionsmatch

expression

match

expression

pattern

expression

pattern

expressionLIKELIKE

escape

expression

escape

expressionESCAPEESCAPENOTNOT

• Examples:– address LIKE ’%City%’

• ’Manhattan’ → FALSE• ’Gotham City Prison’ → TRUE

– name LIKE ’M%_t_’• ’Magneto’ → TRUE• ’Matt’ → TRUE• ’Mtt’ → FALSE

– status LIKE ’_/_%’ ESCAPE ’/’• ’1_inPrison’ → TRUE• ’1–inPrison’ → FALSE• ’%_%’ → TRUE• ’%%%’ → FALSE

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 64

7.4 Conditions

match

expression

match

expression

pattern

expression

pattern

expressionLIKELIKE

escape

expression

escape

expressionESCAPEESCAPENOTNOT

• IN predicate:

– Evaluates to true if the value of the test expression is

within a given set of values

– Particularly useful when used with a subquery (later)

– Examples:

• name IN (’Magneto’, ’Batman’, ’Dr. Doom’)

• name IN (SELECT title FROM movies)– Those people having a film named after them…

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 65

7.4 Conditions

expressionexpression

NOTNOT

ININ

queryquery(( ))

(( ))expressionexpression

,,

• EXISTS predicate:

– Evaluates to TRUE if a given subquery

returns at least one result row

• Always returns either TRUE or FALSE

– Examples

• EXISTS (SELECT * FROM heroes)

– EXISTS may also be used to express semi-joins

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 66

7.4 Conditions

EXISTSEXISTS queryquery(( ))

Page 12: Relational Database Systems 1 - TU Braunschweig

• SOME/ANY and ALL:

– Compares an expression to each value provided bya subquery

– TRUE if

• SOME/ANY: At least one comparison returns TRUE

• ALL: All comparisons return TRUE

– Examples:

• result <= ALL (SELECT result FROM results)– TRUE if the current result is the smallest one

• result < SOME (SELECT result FROM results)– TRUE if the current result is not the largest one

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 67

7.4 Conditions

expressionexpression comparisoncomparison

SOMESOME

ANYANYALLALL

queryquery(( ))

• Also, SQL can do joins of multiple tables

• Traditionally, this is performed by simply stating

multiple tables in the FROM clause

– Result contains all possible combinations of all

rows of all tables such that the search condition holds

– If there is no WHERE clause, it‟s a Cartesian product

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 68

7.5 Joins

• Example:

– SELECT * FROM hero, hasAlias

• hero × hasAlias

– SELECT * FROM hero, hasAliasWHERE hero.id = hasAlias.hero_id

• hero ⋈id = hero_id hasAlias

• Besides this common implicit notation of joins,

SQL also supports explicit joins

– Inner joins

– Left/right/full outer joins

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 69

7.5 Joins

• Explicit joins are specified in the FROM clause– SELECT * FROM table1 JOIN table2 ON <join condition>

WHERE <some other condition>

– Often, attributes in joined tables have the same names,

so qualified attributes are needed

– INNER JOIN and JOIN are equivalent

• Explicit joins improve readability of your SQL code!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 70

7.5 Joins

table name

INNER JOININNER JOIN

table name

JOINJOIN

LEFT JOINLEFT JOINRIGHT JOINRIGHT JOIN

FULL JOINFULL JOIN

joined table

ONON condition

Natural join: List students and their exam results– π lastName, crsNr, result students ⋈ exams

– SELECT lastName, crsNr, resultFROM students AS s JOIN exams AS e ON s.matNr = e.matNr

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 71

7.5 Joins

matNr firstName lastName sex

1005 Clark Kent m

2832 Louise Lane f

4512 Lex Luther m

5119 Charles Xavier m

matNr crsNr result

9876 100 3.7

2832 102 2.0

1005 101 4.0

1005 100 1.3

students exams

lastName crsNr result

Kent 100 1.3

Kent 101 4.0

Lane 102 2.0

πlastName, crsNr, result students ⋈exams

We lost Lex Luther and Charles Xavier

because they didn’t take any exams!

Also information on student 9876 disappears…

Left outer join: List students and their exam results– π lastName, crsNr, result students ⎧ exams

– SELECT lastName, crsNr, resultFROM students AS s LEFT JOIN exams AS e ON s.matNr = e.MatNr

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 72

7.5 Joins

matNr firstName lastName sex

1005 Clark Kent m

2832 Louise Lane f

4512 Lex Luther m

5119 Charles Xavier m

matNr crsNr result

9876 100 3.7

2832 102 2.0

1005 101 4.0

1005 100 1.3

students exams

lastName crsNr result

Kent 100 1.3

Kent 101 4.0

Lane 102 2.0

Luther NULL NULL

Xavier NULL NULL

π lastName, crsNr, result students ⎧exams

Page 13: Relational Database Systems 1 - TU Braunschweig

Right outer join:– π lastName, crsNr, result students ⎨ exams

– SELECT lastName, crsNr, resultFROM students s RIGHT JOIN exams e ON s.matNr = e.MatNr

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 73

7.5 Joins

matNr firstName lastName sex

1005 Clark Kent m

2832 Louise Lane f

4512 Lex Luther m

5119 Charles Xavier m

matNr crsNr result

9876 100 3.7

2832 102 2.0

1005 101 4.0

1005 100 1.3

students exams

lastName crsNr result

Kent 100 1.3

Kent 101 4.0

Lane 102 2.0

NULL 100 3.7

π lastName, crsNr, result students ⎨ exams

Full outer join:– π lastName, crsNr, result students ⎩ exams

– SELECT lastName, crsNr, resultFROM students s FULL JOIN exams e ON s.matNr = e.MatNr

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 74

7.5 Joins

matNr firstName lastName sex

1005 Clark Kent m

2832 Louise Lane f

4512 Lex Luther m

5119 Charles Xavier m

matNr crsNr result

9876 100 3.7

2832 102 2.0

1005 101 4.0

1005 100 1.3

students exams

lastName crsNr result

Kent 100 1.3

Kent 101 4.0

Lane 102 2.0

Luther NULL NULL

NULL 100 3.7

Xavier NULL NULL

π lastName, crsNr, result students ⎩ exams

• SQL also supports the common set operators

– Set union ∪: UNION

– Set intersection ∩ : INTERSECT

– Set difference ∖: EXCEPT

• By default, set operators eliminate duplicates unlessthe ALL modifier is used

• Sets need to be union-compatible to use set operators

– Row definition must match (data types)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 75

7.6 Set Operators

queryINTERSECTINTERSECT

queryquery-block

literal table

(( ))

EXCEPTEXCEPT

UNIONUNION

ALLALL queryquery

query-blockquery-block

literal tableliteral table

(( ))

• Example:

( SELECT course, result FROM exams WHERE course= 100

EXCEPTSELECT course, result FROM exams WHERE result IS NULL)

UNIONVALUES (100, 2.3), (100, 1.7)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 76

7.6 Set Operators

student course result

9876 100 3.7

2832 102 2.0

1005 101 4.0

1005 100 NULL

6676 102 4.3

3412 NULL NULL

exams

course result

100 3.7

100 2.3

100 1.7

query block query

}literal table

• Column functions are used to performstatistical computations

– Similar to aggregate function in relational algebra

– Column functions are expressions

– They compute a scalar value for a set of values

• Examples:

– Compute the average score over all exams

– Count the number of exams each student has taken

– Retrieve the best student

– ...

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 77

7.7 Column Function

• Column functions in the SQL standard:– MIN, MAX, AVG, COUNT, SUM:

Each of these are applied to some other expression• NULL values are ignored

• Function columns in result set just get their column number as name

– If DISTINCT is specified, duplicates are eliminated in advance• By default, duplicates are not eliminated (ALL)

– COUNT may also be applied to *• Simply counts the number of rows

– Typically, there are many more column functions available in your RDBMS (e.g. in DB2: CORRELATION, STDDEV, VARIANCE, ...)

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 78

7.7 Column Function

function

name

function

name

column functionexpression

COUNT(*)COUNT(*)

(( ))

ALLALL

DISTINCTDISTINCT

expression

Page 14: Relational Database Systems 1 - TU Braunschweig

• Examples:

– SELECT COUNT(*) FROM heroes

• Returns the number of rows of the heroes table

– SELECT COUNT(name), COUNT(DISTINCT name) FROM heroes

• Returns the number of rows in the heroes table for that

name is not null and the number of non-null unique names

– SELECT MIN(strength), MAX(strength), AVG(strength) FROM powers

• Returns the minimal, maximal, and average power strength

in the powers table

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 79

7.7 Column Function

• Similar to aggregation in relation algebra,

SQL supports grouping

– GROUP BY <column names>

– Creates a group for each combination of

distinct values within the provided columns

– A query containing GROUP BY can access

non-group-by-attributes only by column functions

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 80

7.7 Grouping

Examples:

– SELECT course, AVG(result), COUNT(*), COUNT(result) FROM exams GROUP BY course

• For each course, list the average result,

the number of results, and the number of non-null results

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 81

7.7 Grouping

student course Result

9876 100 3.7

2832 102 2.0

1005 101 4.0

1005 100 NULL

6676 102 4.3

3412 NULL NULL

exams

course 2 3 4

100 3,7 2 1

101 4.0 1 1

102 3.15 2 2

NULL NULL 1 0

Examples:

– SELECT course, AVG(result), COUNT(*)FROM exams WHERE course IS NOT NULLGROUP BY course

• The where clause is evaluated before the groups are formed!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 82

7.7 Grouping

student course Result

9876 100 3.7

2832 102 2.0

1005 101 4.0

1005 100 NULL

6676 102 4.3

3412 NULL NULL

exams

course 2 3

100 3.7 2

101 4.0 1

102 3.15 2

• Additionally, there may be restrictions on

the groups themselves

– HAVING <condition>

– The condition may involve group properties

– Only those groups are created that fulfill the

HAVING condition

– A query may have a WHERE and a HAVING clause

– Also, it is possible to have HAVING without GROUP BY

• Then, the whole table is treated as a single group

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 83

7.7 Grouping

• Examples:

– SELECT course, AVG(result), COUNT(*)FROM exams WHERE course <> 100GROUP BY course HAVING count(*) > 1

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 84

7.7 Grouping

student course Result

9876 100 3.7

2832 102 2.0

1005 101 4.0

1005 100 NULL

6676 102 4.3

3412 NULL NULL

exams

course 1 2

102 3.15 2

Page 15: Relational Database Systems 1 - TU Braunschweig

• As last step in the processing pipeline,(unordered) result sets may be converted into lists– Impose an order on the rows

– This concludes the SELECT statement

– ORDER BY keyword

• Please note:Ordering completely breaks with set calculus/algebra– Result after ordering is a list, not a (multi-)set!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 85

7.8 Ordering

SELECT statement

query

column namecolumn nameODER BYODER BY

integer

ASCASC

DESCDESC

,,

• ORDER BY may order ascending or descending

– Default: ascending

• Ordering on multiple columns possible

• Columns used for ordering arereferences by their name

• Example

– SELECT * FROM resultsORDER BY student, crsNo DESC

– Returns all results ordered by student id (ascending)

– If student ids are identical, we sort in descending orderby course number

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 86

7.8 Ordering

• When working with result lists, often only the

top-k results are of interest (e.g. k = 10)

• How can we limit the number of result rows?

– Since SQL:2008, the SQL standard offers the

FETCH FIRST clause (supported e.g. in DB2)

– Example:

SELECT name, salary FROM salariesORDER BY salary FETCH FIRST 10 ROWS ONLY

– FETCH FIRST can also be used without ORDER BY

• Get a quick impression of the result set

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 87

7.8 Ordering

• SQL queries are evaluated in this sequence:

5. SELECT <attribute list>

1. FROM <table list>

2. [WHERE <condition>]

3. [GROUP BY <attribute list>]

4. [HAVING <condition>]

6. [UNION/INTERSECT/EXCEPT <query>]

7. [ORDER BY <attribute list>]

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 88

7.8 Evaluation Order of SQL

• In SQL, you may embed a query block within

a query block (so called subquery, or nested query)

– Subqueries are written in parenthesis

– Literal subqueries can be used as expressions

• Return just one single value

– Subqueries may create the test set for the IN-condition

– Subqueries may create the test set for the

EXISTS-condition

– Each subquery within the table list creates

a temporary source table

• Called inline view

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 89

7.9 Subqueries

• Subqueries may either be correlated

or uncorrelated

– If the WHERE clause of the inner query uses an

attributes within a table declared in the outer query,

the two queries are correlated

• The inner query needs to be re-evaluated for every tuple

in the outer query

• This is rather inefficient, so avoid correlated subqueries

whenever possible!

– Otherwise, the queries are uncorrelated

• The inner queries needs to be evaluated just once

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 90

7.9 Subqueries

Page 16: Relational Database Systems 1 - TU Braunschweig

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 91

7.9 Subqueries

id realNameHero

hero_id aliasNameHasAlias

hero_id power_id powerStrengthHasPower

id name descriptionPower

• Expressions:

– SELECT hero.* FROM hero, hasPower pWHERE hero.id = p.hero_id

AND powerStrenth =(SELECT MAX(powerStrength) FROM hasPower)• Select all those heroes having powers with maximal strength

• Uncorrelated! Subquery is evaluated only once

• IN-condition:

– SELECT * FROM hero WHERE id IN(SELECT hero_id FROM hasAlias

WHERE aliasName LIKE ’Super%’)• Select all those heroes having an alias starting with “Super”

• Uncorrelated

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 92

7.9 Subqueries

• EXISTS-condition:

– SELECT * FROM hero hWHERE EXISTS

(SELECT * FROM hasAlias a WHERE h.id = a.hero_id)• Select heroes having at least one alias

• Correlated! Subquery is evaluated for each tuple in hero

• Inline view

– SELECT h.realName, a.aliasNameFROM hasAlias a,(SELECT * FROM heroes WHERE realName LIKE ’A%’) h WHERE h.id < 100 AND a.hero_id = h.id• Get realName–alias pairs for all heroes with a real name staring

with “A” and an id smaller than 100

• Uncorrelated

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 93

7.9 Subqueries

• What is “good” SQL code?

– Easy to read

– Easy to write

– Easy to understand!

• There is no “official” SQL style guide,

but here are some general hints

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 94

7.10 Writing Good SQL Code

1. Write SQL keywords in uppercase,

names in lowercase!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 95

7.10 Writing Good SQL Code

GOOD

BADSELECT MOVIE_TITLEFROM MOVIESWHERE MOVIE_YEAR = 2009

SELECT movie_titleFROM moviesWHERE movie_year = 2009

2. Use proper qualification!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 96

7.10 Writing Good SQL Code

GOOD

BADSELECT imdbraw.movies.movie_title,imdbraw.movies.movie_yearFROM imdbraw.moviesWHERE imdbraw.movies.movie_year = 2009

SET SCHEMA imdbraw

SELECT movie_title, movie_yearFROM moviesWHERE movie_year = 2009

Page 17: Relational Database Systems 1 - TU Braunschweig

3. Use aliases to keep your code short

and the result clear!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 97

7.10 Writing Good SQL Code

GOOD

BADSELECT movie_title, movie_yearFROM movies, genresWHERE movies.movie_id = genres.movie_idAND genres.genre = ’Action’

SELECT movie_title title, movie_year yearFROM movies m, genres gWHERE m.movie_id = g.movie_idAND g.genre = ’Action’

4. Separate joins from conditions!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 98

7.10 Writing Good SQL Code

GOOD

BAD

SELECT movie_title title, movie_year yearFROM movies m, genres g, actors aWHERE m.movie_id = g.movie_idAND g.genre = ’Action’AND m.movie_id = a.movie_idAND a.person_name LIKE ’%Schwarzenegger%’

SELECT movie_title title, movie_year yearFROM movies mJOIN genres g ON m.movie_id = g.movie_idJOIN actors a ON g.movie_id = a.movie_idWHERE g.genre = ’Action’AND a.person_name LIKE ’%Schwarzenegger%’

5. Use proper indentation!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 99

7.10 Writing Good SQL Code

GOOD

BAD

SELECT movie_title title, movie_year yearFROM movies m, genres g, actors aWHERE m.movie_id = g.movie_idAND g.genre = ’Action’AND m.movie_id = a.movie_idAND a.person_name LIKE ’%Schwarzenegger%’

SELECT movie_title title, movie_year yearFROM movies m

JOIN genres g ON m.movie_id = g.movie_idJOIN actors a ON g.movie_id = a.movie_id

WHERE g.genre = ’Action’AND a.person_name LIKE ’%Schwarzenegger%’

6. Extract uncorrelated subqueries!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 100

7.10 Writing Good SQL Code

GOOD

BAD

WITH recent_actors (person_id) AS(SELECT DISTINCT person_id

FROM actors aJOIN movies m ON a.movie_id = m.movie_id

WHERE movie_year >= 2007)SELECT DISTINCT person_name nameFROM directors dWHERE d.person_id IN (SELECT * FROM recent_actors)

SELECT DISTINCT person_name nameFROM directors dWHERE d.person_id IN(SELECT DISTINCT person_id

FROM actors aJOIN movies m ON a.movie_id = m.movie_id

WHERE movie_year >= 2007)

7. Avoid subqueries (use joins if possible)!

Relational Database Systems 1 – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 101

7.10 Writing Good SQL Code

GOOD

BAD

SELECT DISTINCT d.person_name nameFROM directors d

JOIN actors a ON d.person_id = a.person_idJOIN movies m ON a.movie_id = m.movie_id

WHERE movie_year >= 2007

WITH recent_actors (person_id) AS(SELECT DISTINCT person_id

FROM actors aJOIN movies m ON a.movie_id = m.movie_id

WHERE movie_year >= 2007)SELECT DISTINCT person_name nameFROM directors dWHERE d.person_id IN (SELECT * FROM recent_actors)