dheeman-ghosh
8/3/2019 DBMSmis1
1/56
Data
Data is a collection of facts, such as values or measurements.
It can be numbers, words, characters, symbols, measurements, observations, or even just descriptions of things.
Data is the lowest level of abstraction, information is the next level, and knowledge is the highest level of the three.
Data on its own carries no meaning. For data to become information, it must be interpreted and given meaning by a human or a machine.
Why data matters
As organizations continue to struggle to maintain competitive advantage, information becomes the key component in enabling executives and decision makers to make informed decisions based on a 360-degree view of the organization and its various operational processes.
Data files
Each application generates a specific file type.
The file is read by an identical application produced by the same vendor.
Some applications have import and export facilities that allow a range of different formats to be produced or read.
The specific issues with any data file relate to the following:
-Version number of the application
-Structure of the data
e.g. a student data file in an institute
Data Processing
Data processing is the act of handling or manipulating data in some fashion. Regardless of the activities involved, processing tries to assign meaning to data. Thus, the ultimate goal of processing is to transform data into information.
Information
Knowledge derived from study, experience (by the senses), or instruction.
Communication of intelligence.
"Information is any kind of knowledge that is exchangeable amongst people, about things, facts, concepts, etc., in some context." *
"Information is interpreted data" (data operated on in such a way as to yield information)
e.g. whether a student is new to the institute or not
Why Information?
Information is critical
Information is a resource
-It is scarce
-It has a cost
-It has alternative uses
-There is a cost involved if one does not process information
Information ensures effective and efficient decision making, leading to the prosperity of the organization
[Figure: Levels of abstraction of data, information, and knowledge. DATA: raw facts. INFORMATION: "Information is interpreted data". KNOWLEDGE: knowledge derived from study, experience (by the senses).]
Qualitative vs Quantitative Data
Data can be qualitative or quantitative.
Qualitative data is descriptive information (it describes something).
Quantitative data is numerical information (numbers).
Variables
Variables hold or store data.
Basic types of variables:
Logical
Numeric - Integer, Float
String or Text Variable
Mixed Variables (Data Structures)
- Complex data structures that store records of mixed types. E.g.: AXEMP001 (alphanumeric), Sailesh Singh (text), 20,000 (numeric), 10 (numeric)
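The variable types above can be sketched in Python. This is only an illustration: the field names (`emp_id`, `name`, `salary`, `code`) are hypothetical, since the slide gives the values but not what each one means.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    """A mixed-type record; field names are assumptions, not from the slide."""
    emp_id: str   # alphanumeric, e.g. "AXEMP001"
    name: str     # text
    salary: int   # numeric (integer)
    code: int     # numeric; the slide does not say what the 10 represents


record = EmployeeRecord("AXEMP001", "Sailesh Singh", 20000, 10)

# A logical (Boolean) variable and a float variable derived from the record.
is_above_threshold = record.salary > 15000
monthly_salary = record.salary / 12
```

Grouping the four values into one structure keeps each field's type explicit, which is what distinguishes a mixed variable from four unrelated variables.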
Data files
Data Storage: Flat files, Database Management Systems
Flat files
Plain text file. Before the 1960s, when the concept of a DBMS did not yet exist, flat text files were used as databases, and programmers wrote programs to store or retrieve data in data files.
Advantages of files as databases
Cheap - Using a flat file database costs practically nothing because data is stored as text files. No software is required other than the program that needs to access the data.
Platform Independent - Since text files are universally accepted by all server platforms, there is no problem moving your database from one server to another.
Very Simple to Understand - Each record in a flat file is stored on a single line, and its fields are separated by delimiters.
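A minimal sketch of a flat file used as a database, assuming a hypothetical student file with one record per line and "|" as the delimiter (the slides do not specify a layout). It shows that the only "software" involved is the program that reads the file.

```python
import csv
import io

# Hypothetical flat file: roll number | name | course, one record per line.
FLAT_FILE = "10|Asha|Computer\n11|Ravi|Accounts\n"


def read_records(text, delimiter="|"):
    """Return each line of the flat file as a list of field strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def find_by_roll_no(text, roll_no):
    """Scan every record for a roll number; a flat file has no index."""
    for row in read_records(text):
        if row[0] == str(roll_no):
            return row
    return None


records = read_records(FLAT_FILE)
ravi = find_by_roll_no(FLAT_FILE, 11)
```

Note that every program using this file has to repeat the same knowledge of the delimiter and the field order.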
Disadvantages of Using Flat Files
Low Security - No security feature is built into a text file. It can be opened for viewing by anyone who happens to know where to look.
Data Redundancy - Duplication of the same data in different files.
-Storage space is wasted, since duplicated data is stored.
-Errors may be introduced when the same data is updated in different files.
-Time is wasted entering the same data again and again.
-Computer resources are needlessly used.
-It is very difficult to combine information.
Disadvantages of Using Flat Files
Data Inconsistency - Conflicting data in files.
(Example)
Suppose that the STUDENT file indicates that Roll No. = 10 has opted for the 'Computer' course, but the RESULT file indicates that Roll No. = 10 has opted for the 'Accounts' course.
Low Reliability & Integrity - Flat files are very prone to data corruption, especially if the size of the database grows beyond what the server resources are prepared to handle.
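The STUDENT/RESULT inconsistency can be reproduced with two small flat files. The file contents and the `roll_no|course` layout below are assumptions made for illustration. Nothing in the files themselves prevents the conflict, so a separate program has to look for it.

```python
# Hypothetical contents of the two flat files from the example.
STUDENT = "10|Computer\n11|Accounts\n"
RESULT = "10|Accounts\n11|Accounts\n"


def parse(text):
    """Map roll number -> course for a 'roll_no|course' flat file."""
    return dict(line.split("|") for line in text.splitlines())


def find_conflicts(student_text, result_text):
    """Return roll numbers whose course differs between the two files."""
    student = parse(student_text)
    result = parse(result_text)
    return sorted(
        roll for roll in student
        if roll in result and student[roll] != result[roll]
    )


conflicts = find_conflicts(STUDENT, RESULT)
```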
Disadvantages of Using Flat Files
Limited Data Structuring - As mentioned previously, records are stored as lines of text. This does not offer the flexibility of creating "relationships" between data, whether within one flat file or across several.
Difficult to Integrate with Other Programs - Once a flat file is created for use by one program, it is difficult to have another program use it, because every later program needs to conform to the structure of the flat file.
What Is a DBMS?
Database - A very large, integrated collection of data or facts. E.g. the information in a phone book is an example of a database. The database is the information stored on the pages of the book, not the book itself.
A Database Management System (DBMS) is a software package designed to store and manage databases. Typical examples of DBMSs include Oracle and Microsoft Access.
Advantages of Database Technology
Redundancy controlled (normalization)
Efficient data processing and storage
Data integrity and avoidance of inconsistencies
Integrity constraints
Sharing of data by many applications - good for decision support systems
Data security
Standards can be enforced in data representation, naming of variables, and documentation
Advantages of Database Technology
Data centralization - Shared by many departments
Data independence - Changes in the structure of data files do not affect application programs
Disadvantages of Database Technology
Complex - A database administrator is required for maintenance
Costly to purchase and install
Because it is centralized, a failure has a high impact on the organization
Structure of a DBMS
A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
This is one of several possible architectures; each system has its own variations.
[Figure: layered DBMS architecture, from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]
[Figure: RDBMS model with the tables Department, Technician, Employees, Equipment, and Maintenance Records.]
Motivation: Why database management systems?
Database management systems (DBMSs) are very good at organizing and managing large collections of persistent data.
e.g. imagine finding a particular book in a typical university library if the library did not keep the books arranged in any particular order, or if the library had no indexes.
Motivation: Why database management systems?
Using a big collection of unorganized things is practically impossible. Structure turns data into information.
Persistence means that the data exist permanently; they do not disappear when the computer is shut off.
Motivation: Why database management systems?
Shift from computation to information
-at the low end: scramble to webspace (a mess!)
-at the high end: scientific applications
Datasets increasing in diversity and volume.
Digital libraries, interactive video, Human Genome project.
Motivation: Why database management systems?
DBMSs keep data all in one place and easy to get to.
DBMSs help protect data from unauthorized access.
DBMSs help protect data from accidental corruption or loss due to:
-hardware failures such as power outages and computer crashes
-software failures such as operating system crashes
Motivation: Why relational database management systems?
Concurrency Control
DBMSs allow concurrent access, meaning that a single data set can be accessed by more than one user at a time.
Virtually all commercial database applications require the data entry staff to have access to the database simultaneously. E.g. an airline reservation system cannot restrict access to the database to a single travel agent.
Motivation: Why relational database management systems?
Concurrent access problems can cause the database to be corrupted, or cause a user's interface program to never complete its query.
e.g. if there were no traffic lights or stop signs, there would be chaos.
RDBMSs provide mechanisms to prevent concurrent access problems; these mechanisms are collectively called concurrency control.
Motivation: Why relational database management systems?
Concurrent data access introduces unwanted problems caused by two users manipulating exactly the same data at exactly the same time.
Logical data independence: Protection from changes in the logical structure of data.
Physical data independence: Protection from changes in the physical structure of data.
Distributed RDBMS
A distributed DBMS allows a single database to be split apart such that its pieces reside at geographically separated sites.
This can provide performance improvements by avoiding transmission of the data across a relatively slow long-distance communication channel (it is a lot faster to have the database on a local hard drive than to access it across an Ethernet or via a modem).
This can reduce concurrency control problems by giving each user the part of the database that they need, rather than having all the users compete for access to the whole database.
RDBMS characteristics
RDBMSs are not necessarily meant for data analysis; that is more the job of a spreadsheet or some other special-purpose analysis tool.
RDBMSs are general-purpose tools. It is basically irrelevant to the DBMS what is stored within it. Software design principles suggest decoupling domain-specific analysis packages from the DBMS to keep the division of labor clear.
RDBMSs are very good at retrieving a relatively small portion of the database and passing it along for detailed analysis by a tool designed for that purpose.
RDBMS characteristics
RDBMSs often allow integrity constraints to be imposed on the data to ensure validity and consistency. When an integrity constraint applies to a table, all data in the table must conform to the corresponding rule.
E.g. first make the department number the primary key of the department table:
ALTER TABLE Dept ADD PRIMARY KEY (Deptno);
Then create a rule that every department listed in the employee table must match one of the values in the department table:
ALTER TABLE Emp ADD FOREIGN KEY (Deptno) REFERENCES Dept (Deptno);
When you add a new employee record to the table, the DBMS automatically checks that its department number appears in the department table.
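The department/employee rule can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite is swapped in for whichever DBMS the slide had in mind, the `Dname`, `Empno`, and `Ename` columns are hypothetical, and the keys are declared inside `CREATE TABLE` because SQLite's `ALTER TABLE` cannot add key constraints to an existing table.

```python
import sqlite3


def make_db():
    """Create Dept and Emp in memory, with Emp.Deptno referencing Dept."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only after this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Dept (Deptno INTEGER PRIMARY KEY, Dname TEXT)")
    conn.execute(
        "CREATE TABLE Emp ("
        " Empno INTEGER PRIMARY KEY,"
        " Ename TEXT,"
        " Deptno INTEGER REFERENCES Dept (Deptno))"
    )
    conn.execute("INSERT INTO Dept VALUES (10, 'Accounts')")
    return conn


def try_insert_emp(conn, empno, ename, deptno):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO Emp VALUES (?, ?, ?)", (empno, ename, deptno))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
accepted = try_insert_emp(conn, 1, "Sailesh Singh", 10)  # department 10 exists
rejected = try_insert_emp(conn, 2, "A. Other", 99)       # department 99 does not
```

The second insert fails inside the database itself, so no application code needs to repeat the check.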
Referential Integrity Rules
A rule defined on a key (a column or set of columns) in one table that guarantees that the values in that key match the values in a key in a related table (the referenced value).
Referential integrity also includes the rules that dictate what types of data manipulation are allowed on referenced values and how these actions affect dependent values. The rules associated with referential integrity are:
Restrict: Disallows the update or deletion of referenced data.
Set to Default: When referenced data is updated or deleted, all associated dependent data is set to a default value.
Referential integrity rules
Cascade: When referenced data is updated, all associated dependent data is correspondingly updated. When a referenced row is deleted, all associated dependent rows are deleted.
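The Restrict, Set to Default, and Cascade rules can be compared side by side. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in DBMS; the table layout and the default department number 0 are assumptions chosen for illustration.

```python
import sqlite3


def build(action):
    """Create Dept/Emp where Emp.Deptno uses the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for enforcement in SQLite
    conn.execute("CREATE TABLE Dept (Deptno INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Emp ("
        " Empno INTEGER PRIMARY KEY,"
        " Deptno INTEGER DEFAULT 0"
        f" REFERENCES Dept (Deptno) ON DELETE {action})"
    )
    # Department 0 exists so that the default value is itself a valid reference.
    conn.executemany("INSERT INTO Dept VALUES (?)", [(0,), (10,)])
    conn.execute("INSERT INTO Emp VALUES (1, 10)")
    return conn


def delete_dept_10(action):
    """Try to delete department 10 and return the Emp rows left afterwards."""
    conn = build(action)
    try:
        conn.execute("DELETE FROM Dept WHERE Deptno = 10")
    except sqlite3.IntegrityError:
        pass  # RESTRICT refuses the delete while an employee references it
    return conn.execute("SELECT Empno, Deptno FROM Emp").fetchall()


after_restrict = delete_dept_10("RESTRICT")
after_default = delete_dept_10("SET DEFAULT")
after_cascade = delete_dept_10("CASCADE")
```

With these assumptions, Restrict leaves the employee row untouched, Set to Default moves the employee to department 0, and Cascade removes the employee row together with the department.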
Data integrity constraints
Null Rule - A null rule is defined on a single column and allows or disallows inserts or updates of rows containing a null (the absence of a value) in that column.
Unique Column Values - A unique value rule defined on a column (or set of columns) allows the insert or update of a row only if it contains a unique value in that column (or set of columns).
Primary Key Values - A primary key value rule defined on a key (a column or set of columns) specifies that each row in the table can be uniquely identified by the values in the key.
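The three rules above correspond to the `NOT NULL`, `UNIQUE`, and `PRIMARY KEY` column constraints in SQL. The following sketch uses SQLite via Python's `sqlite3` module; the `Student` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    " RollNo INTEGER PRIMARY KEY,"  # primary key: identifies each row
    " Name TEXT NOT NULL,"          # null rule: nulls are disallowed here
    " Email TEXT UNIQUE)"           # unique column values
)
conn.execute("INSERT INTO Student VALUES (10, 'Asha', 'asha@example.org')")


def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_name = is_rejected("INSERT INTO Student VALUES (11, NULL, 'r@example.org')")
dup_email = is_rejected("INSERT INTO Student VALUES (12, 'Ravi', 'asha@example.org')")
dup_key = is_rejected("INSERT INTO Student VALUES (10, 'Ravi', 'ravi@example.org')")
```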
Other integrity constraints
Validation rules, e.g. an integrity constraint that enforces the rule that no row in a table can contain a numeric value greater than 10,000 in a given column. If an INSERT or UPDATE statement attempts to violate this integrity constraint, the DBMS returns an error message.
CHECK Integrity Constraints
A CHECK integrity constraint on a column or set of columns requires that a specified condition be true or unknown for every row of the table. The condition is usually a Boolean expression evaluated using the values in the row being inserted or updated.
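A sketch of the 10,000 validation rule written as a CHECK constraint, again using SQLite via Python's `sqlite3` module. The `Payment` table and `Amount` column are hypothetical names. The last call illustrates the "true or unknown" wording: a null makes the condition unknown rather than false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the 10,000 limit comes from the example above.
conn.execute(
    "CREATE TABLE Payment ("
    " Id INTEGER PRIMARY KEY,"
    " Amount NUMERIC CHECK (Amount <= 10000))"
)


def try_amount(amount):
    """Return True if a row with this amount is accepted by the CHECK."""
    try:
        conn.execute("INSERT INTO Payment (Amount) VALUES (?)", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = try_amount(9500)    # condition is true
ok_large = try_amount(10001)   # condition is false, so the insert fails
ok_null = try_amount(None)     # condition is unknown (NULL), so the row is kept
```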
Levels of Abstraction in DBMS
Many views, a single conceptual (logical) schema, and a physical schema.
Views describe how users see the data - file description, record description.
The conceptual schema defines the logical structure.
The physical schema describes how the computer views data on a secondary storage device.
[Figure: View 1, View 2, and View 3 sit on top of the Conceptual Schema, which sits on top of the Physical Schema, which sits on top of the Disk.]
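The view level can be illustrated with a SQL view. In this sketch (SQLite via Python's `sqlite3` module, with a hypothetical `Emp` table and `EmpDirectory` view), users of the view see only part of the conceptual schema, and a later change to that schema does not alter what they see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the logical structure of the stored data.
conn.execute("CREATE TABLE Emp (Empno INTEGER PRIMARY KEY, Ename TEXT, Salary INTEGER)")
conn.execute("INSERT INTO Emp VALUES (1, 'Sailesh Singh', 20000)")

# A view: how one group of users sees the data (the salary is hidden).
conn.execute("CREATE VIEW EmpDirectory AS SELECT Empno, Ename FROM Emp")
directory = conn.execute("SELECT * FROM EmpDirectory").fetchall()

# Changing the conceptual schema leaves the view's result unchanged.
conn.execute("ALTER TABLE Emp ADD COLUMN Phone TEXT")
directory_after = conn.execute("SELECT * FROM EmpDirectory").fetchall()
```

The physical schema (how SQLite lays the rows out on disk) is not visible at either level, which is the point of the layering.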
Summary
A DBMS is used to maintain and query large datasets.
Benefits include recovery from system crashes, concurrent access, quick application development, data integrity, and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
Fundamental Concepts and Terminology
Data are facts. Some facts are more important than others. Some facts are important enough to warrant keeping track of them in a formal, organized way.
"Data" is a broad concept that can include things such as pictures (binary images), programs, and rules. Informally, data are the things you want to store in a database.
Data mining: techniques applied to large volumes of data to discover trends and patterns.
Metadata
Meta means "about," so metadata is "about data," or, more specifically, "information about data." Metadata describes the fields and formats of databases and data warehouses. A database contains fields such as Name, Address, City, and so on. Metadata (the data schema) names these fields, describes the size of the fields, and may put restrictions on what can go in a field (for example, numbers only).
Data Repository
A repository is a structure that stores and protects data. (Database + metadata)
Repositories provide the following functionality:
-add (insert) data to the repository
-retrieve (find, select) data in the repository
-delete data from the repository
Some repositories also allow data to be changed (updated).
Data Warehouse
Central repository of all the data that an organization's various business systems collect, e.g. financial data used for planning, marketing, contracting, and decision-making.
Data Repository
Repositories are like a bank vault. They exist mainly to protect their contents from theft and accidental destruction.
Security: repositories are typically password protected; many have much more elaborate security mechanisms.
Robustness: accidental data loss is safeguarded against via the transaction mechanism.
A transaction is a sequence of database manipulation operations.
Data warehouse is the main repository of an
organization's historical data -management's
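The transaction mechanism mentioned above can be sketched as follows. The example assumes a hypothetical `Account` table in SQLite (via Python's `sqlite3` module) and treats a money transfer as one transaction: either both updates are applied or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account ("
    " Id INTEGER PRIMARY KEY,"
    " Balance INTEGER CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(amount):
    """Move money from account 1 to account 2 as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Id = 2", (amount,))
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the balances of accounts 1 and 2, in that order."""
    return conn.execute("SELECT Balance FROM Account ORDER BY Id").fetchall()


ok = transfer(30)
after_ok = balances()
failed = transfer(500)      # would overdraw account 1, so the whole transfer is undone
after_failed = balances()
```

In the failed transfer the credit to account 2 has already been executed when the debit is rejected; the rollback is what prevents that half-finished state from being stored.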
Queries
Many DBMSs provide a user interface consisting of some sort of formal language.
A data definition language (DDL) is used to specify which data will be stored in the database and how they are related. E.g. create table or drop table.
A data manipulation language (DML) is used to add, retrieve, update, and delete data in the DBMS.
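A short sketch of the difference between DDL and DML statements, run against SQLite through Python's `sqlite3` module. The `Student` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define which data will be stored.
conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Course TEXT)")

# DML: add, update, retrieve, and delete data.
conn.execute("INSERT INTO Student VALUES (10, 'Computer')")
conn.execute("UPDATE Student SET Course = 'Accounts' WHERE RollNo = 10")
rows = conn.execute("SELECT RollNo, Course FROM Student").fetchall()
conn.execute("DELETE FROM Student WHERE RollNo = 10")
remaining = conn.execute("SELECT COUNT(*) FROM Student").fetchall()[0][0]

# DDL again: remove the table definition itself.
conn.execute("DROP TABLE Student")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

DML statements change the rows inside a table, while DDL statements change which tables exist at all.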
Queries
A query is often taken to be a statement or group of statements in either a DDL or a DML or both. Some researchers, e.g. Codd, view queries as read-only operations in which no data modifications are allowed.
A query language is a formal language that implements a DDL, a DML, or both. Examples of query languages include SQL (Structured Query Language).
Database report
A database report presents information retrieved from a table or query in a preformatted, attractive manner. Reporting Services uses a SQL Server database for internal storage. Microsoft Access can be used to create non-interactive HTML reports, which is a simple way to present database information on the Web.
Data Models
A data model is a mathematical formalism consisting of two parts:
-A notation for describing data, and
-A set of operations used to manipulate that data.
A data model is a way of organizing a collection of facts pertaining to a system under investigation.
Data models
Different models provide different conceptualizations of the world; they have different outlooks and different perspectives.
There is no universally agreed upon best data model. The most common ones are presented.
Overview of Database Design
Entity-Relationship Model
The ER model views the world as composed of entities that are associated with each other by relationships. All of the entities of a particular type are collected together into entity sets. An entity-relationship model (ERM) is an abstract conceptual representation of structured data.
Overview of Database Design
What are the entities and relationships in the enterprise?
What information about these entities and relationships should we store in the database?
What are the integrity constraints that hold?
A database 'schema' in the ER model can be represented pictorially (ER diagrams).
An ER diagram can be mapped into a relational schema.
Entities
Entities are distinguishable real-world objects such as employees, maps, airplanes, or bus schedules.
-Distinguishable means that all entities can be uniquely identified.
-Entities have common attributes that define what it means to be such an entity.
-For any given real-world object, different modelers can choose different sets of attributes of the object that are of interest to their particular situation.
Relationship
A relationship is an association among two or more entities. An association is a business component that defines a relationship between two entity objects based on common attributes. Relationship set: a collection of similar relationships.
Notation: two entity sets A and B that stand in relationship r are written A r B.
Types of Relationship
One-one: if A r B and r is one-one, then each entity of B is in relationship with at most one entity of A and vice versa. E.g. if CAPTAIN commands VESSEL and commands is one-one then, in this model, each vessel has at most one captain and each captain commands at most one vessel at a time.
Types of Relationship
Many-one: if A r B and r is many-one, then each entity of A is in relationship with at most one entity of B, but not vice versa. E.g. if CREW assigned-to VESSEL and assigned-to is many-one then, in this model, a vessel has many crew members but a crew member is assigned to only one vessel.
Many-many: if A r B and r is many-many, then each entity of A can be in relationship with any number of B entities and vice versa. E.g. if VESSEL patrols REGION and patrols is many-many then, in this model, a vessel patrols many regions and a region is patrolled by many vessels.
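One common way to carry these relationship types into relational tables is sketched below, using SQLite via Python's `sqlite3` module. The column names and sample rows are assumptions; the mapping itself (a unique foreign key for one-one, a plain foreign key for many-one, a separate link table for many-many) is a conventional choice rather than something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Vessel (VesselId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Region (RegionId INTEGER PRIMARY KEY, Name TEXT);

-- one-one: UNIQUE lets each vessel appear for at most one captain
CREATE TABLE Captain (
    CaptainId INTEGER PRIMARY KEY,
    Name TEXT,
    VesselId INTEGER UNIQUE REFERENCES Vessel (VesselId)
);

-- many-one: each crew member references at most one vessel
CREATE TABLE Crew (
    CrewId INTEGER PRIMARY KEY,
    Name TEXT,
    VesselId INTEGER REFERENCES Vessel (VesselId)
);

-- many-many: one row per (vessel, region) pair
CREATE TABLE Patrols (
    VesselId INTEGER REFERENCES Vessel (VesselId),
    RegionId INTEGER REFERENCES Region (RegionId),
    PRIMARY KEY (VesselId, RegionId)
);

INSERT INTO Vessel VALUES (1, 'Alpha'), (2, 'Bravo');
INSERT INTO Region VALUES (1, 'North'), (2, 'South');
INSERT INTO Captain VALUES (1, 'Meera', 1);
INSERT INTO Crew VALUES (1, 'Asha', 1), (2, 'Ravi', 1);
INSERT INTO Patrols VALUES (1, 1), (1, 2), (2, 1);
""")


def count(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchall()[0][0]


def second_captain_rejected():
    """Return True if giving vessel 1 a second captain is refused."""
    try:
        conn.execute("INSERT INTO Captain VALUES (2, 'Second', 1)")
        return False
    except sqlite3.IntegrityError:
        return True


crew_on_alpha = count("SELECT COUNT(*) FROM Crew WHERE VesselId = 1")
regions_of_alpha = count("SELECT COUNT(*) FROM Patrols WHERE VesselId = 1")
vessels_in_north = count("SELECT COUNT(*) FROM Patrols WHERE RegionId = 1")
one_one_enforced = second_captain_rejected()
```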
ER model
isa-relationships: if A isa B then A is a specialization of B, or, conversely, B is a generalization of A.
For example, if CAPTAIN isa CREW then, in this model, captains have all the attributes of crew members but not vice versa.
The isa relationship allows hierarchies to be established among entity sets.
ER model basics
Consider Works: An employee can work in many departments; a dept can have many employees (many-many).
In contrast, each dept has at most one manager, according to the key constraint on Manages.
Primary and foreign key
What primary and foreign key constraints are and what they are used for:
Primary Key:
A primary key is a field or combination of fields that uniquely identifies a record in a table, so that an individual record can be located without confusion.
Foreign Key:
A foreign key (sometimes called a referencing key) is a key used to link two tables together. Typically you take the primary key field from one table and insert it into the other table where it
Primary and foreign key constraints
A primary key constraint is a rule that says that the primary key fields cannot be null and cannot contain duplicate data.
A foreign key constraint specifies that the data in a foreign key must match the data in the primary key of the linked table. This system is called referential integrity; it ensures that the data entered is correct and not orphaned (i.e. there are no broken links between data in the tables).
RDBMS
A relational database management system is a DBMS based on the relational model as defined by Codd.
There is no commercially available DBMS that fully implements the relational model as defined by (Codd 1990).
Advantages of the Relational Model
-queries can be automatically compiled, executed, and optimized without resorting to programming
-correctness: the semantics of the relational