RDBMS Unit- V

8/8/2019 RDBMS Unit- V


Unit V

Distributed Database Systems

A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

Collections of data (e.g., in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Replication and distribution of databases improve database performance at end-user worksites. [1]

To ensure that the distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time consuming, depending on the size and number of the distributed databases, and it can also require a lot of computer resources. Duplication, on the other hand, is not as complicated. It identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. During the duplication process, only changes to the master database are allowed, so that local data will not be overwritten. Both processes can keep the data current in all distributed locations. [2]
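The difference between the two processes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each site is modelled as an in-memory dictionary, and the names duplicate, replicate and change_log are hypothetical, not part of any real DBMS.

```python
# Hypothetical model: every site is a dict mapping a key to a value.

def duplicate(master, sites):
    """Duplication: overwrite every site with a full copy of the master."""
    for name in sites:
        sites[name] = dict(master)


def replicate(sites, change_log):
    """Replication: apply each detected change at every site.

    change_log is a list of (key, value) pairs in the order the changes
    occurred; a value of None means the key was deleted.
    """
    for key, value in change_log:
        for site in sites.values():
            if value is None:
                site.pop(key, None)
            else:
                site[key] = value


master = {"P1": 150000, "P2": 135000}
sites = {"montreal": {}, "paris": {"P9": 1}}

# Scheduled full copy: local differences such as P9 are lost.
duplicate(master, sites)
after_duplication = {name: dict(data) for name, data in sites.items()}

# Change propagation: only the detected changes are shipped to each site.
replicate(sites, [("P3", 250000), ("P1", None)])
```

Duplication is simple but copies everything, while replication moves less data at the cost of having to detect and order the changes.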

Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented depends on the needs of the business and the sensitivity or confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

Distributed Database Design

Introduction: Design Strategies

There are two alternative approaches to distribution design.

Top-down design: suitable when designing systems from scratch, mostly in homogeneous systems. This is the main focus in this chapter.

Bottom-up design: suitable when databases already exist at a number of sites, mostly in heterogeneous systems.



Distribution design: design the local conceptual schemas by distributing entities over the sites of the DCS. It consists of two steps, fragmentation and allocation.

Reasons for fragmentation:

A relation may not be a suitable unit of distribution.

Application views are usually subsets of relations, i.e., accesses show locality or proximity.

Fragmentation permits a number of transactions to execute concurrently. Transactions that access different portions of a relation can run at the same time (inter-query concurrency), and a single query can be executed in parallel (intra-query concurrency).

Disadvantages of fragmentation:

It may require extra processing, e.g., a join for views that cannot be defined on a single fragment.

Semantic data control is more difficult, especially integrity enforcement.

Fragmentation alternatives: horizontal fragmentation and vertical fragmentation.

(Ex) Horizontal fragmentation

PROJ

PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1: projects with budgets less than $200,000
PROJ2: projects with budgets greater than or equal to $200,000

PROJ1

PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York

PROJ2

PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston
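The PROJ example can be reproduced with a small Python sketch. The helper names horizontal_fragment and vertical_fragment are hypothetical, and the column split used for the vertical case (budget in one fragment, name and location in the other) is an assumption made for illustration, since the notes give no vertical example.

```python
from collections import namedtuple

Project = namedtuple("Project", ["pno", "pname", "budget", "loc"])

PROJ = [
    Project("P1", "Instrumentation", 150000, "Montreal"),
    Project("P2", "Database Develop.", 135000, "New York"),
    Project("P3", "CAD/CAM", 250000, "New York"),
    Project("P4", "Maintenance", 310000, "Paris"),
    Project("P5", "CAD/CAM", 500000, "Boston"),
]


def horizontal_fragment(relation, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, columns):
    """Projection: keep only the named columns (the key must be included)."""
    return [tuple(getattr(row, c) for c in columns) for row in relation]


# Horizontal fragments from the example: split on BUDGET at 200000.
PROJ1 = horizontal_fragment(PROJ, lambda r: r.budget < 200000)
PROJ2 = horizontal_fragment(PROJ, lambda r: r.budget >= 200000)

# The predicates are complete and disjoint, so the union rebuilds PROJ.
reconstructed = sorted(PROJ1 + PROJ2)

# Vertical fragments (assumed split): each fragment repeats the key PNO.
PROJ_BUDGET = vertical_fragment(PROJ, ["pno", "budget"])
PROJ_INFO = vertical_fragment(PROJ, ["pno", "pname", "loc"])

# Joining the vertical fragments on PNO rebuilds the original rows.
info_by_pno = {pno: (pname, loc) for pno, pname, loc in PROJ_INFO}
rejoined = [
    Project(pno, info_by_pno[pno][0], budget, info_by_pno[pno][1])
    for pno, budget in PROJ_BUDGET
]
```

Horizontal fragments are rebuilt with a union, while vertical fragments need a join on the key. That join is the extra processing listed among the disadvantages of fragmentation.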

Query Processing

Introduction

SQL query processing requires that the DBMS identify and execute a strategy for retrieving the results of the query. The SQL query determines what data is to be found, but does not define the method by which the data manager searches the database. Hence, query optimization is necessary for high-level relational queries and provides an opportunity for the DBMS to systematically evaluate alternative query execution strategies and to choose an optimal strategy. In some cases the data manager cannot determine the optimal strategy. Assumptions are made which are predicated on the actual structure of the SQL query. These assumptions can significantly affect the query performance. This implies that certain queries can exhibit significantly different response times for relatively innocuous changes in query syntax and structure.

For the purpose of this discussion an example medical database will be used. Figure 1 below illustrates our subject database schema for physicians, patients, and medical services. The Physician table contains one row for every physician in the system. Various attributes describe the physician name, address, provider number and specialty. The Patient table contains one row for every individual in the system. Patients have attributes listing their social security number, name, residence area, age, gender, and doctor. For simplicity, a physician can see many patients, but a patient has only one doctor. A Services table exists which lists all the valid medical procedures which can be performed. When a patient is ill and under the care of a physician, a row exists in the Treatment table describing the prescribed treatment. This table contains one attribute recording the cost of the individual service and a compound key that identifies the patient, physician, and the specific service received.
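As a rough sketch, the schema described above could be declared as follows using Python's built-in sqlite3 module. Only Name, Address, Doctor, Provider and Dr_Name appear in the notes (in query Q1). Every other column name, all column types, and the helper create_medical_db are assumptions made for illustration.

```python
import sqlite3

# Column names other than Name, Address, Doctor, Provider and Dr_Name
# are assumed; the notes describe the attributes only in prose.
SCHEMA = """
CREATE TABLE Physician (
    Provider   INTEGER PRIMARY KEY,   -- provider number
    Dr_Name    TEXT NOT NULL,
    Dr_Address TEXT,
    Specialty  TEXT
);
CREATE TABLE Patient (
    SSN     TEXT PRIMARY KEY,         -- social security number
    Name    TEXT NOT NULL,
    Address TEXT,                     -- residence area
    Age     INTEGER,
    Gender  TEXT,
    Doctor  INTEGER NOT NULL REFERENCES Physician(Provider)
);
CREATE TABLE Services (
    Service_Code TEXT PRIMARY KEY,
    Description  TEXT
);
CREATE TABLE Treatment (
    Patient   TEXT    NOT NULL REFERENCES Patient(SSN),
    Physician INTEGER NOT NULL REFERENCES Physician(Provider),
    Service   TEXT    NOT NULL REFERENCES Services(Service_Code),
    Cost      REAL,
    PRIMARY KEY (Patient, Physician, Service)   -- compound key
);
"""


def create_medical_db():
    """Create an empty in-memory copy of the example database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_medical_db()
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

The single Doctor column in Patient encodes the rule that a patient has only one doctor, and the three-column primary key on Treatment is the compound key mentioned in the text.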



Query Processing

The steps necessary for processing an SQL query are shown in Figure 2. The SQL query statement is first parsed into its constituent parts. The basic SELECT statement is formed from the three clauses SELECT, FROM, and WHERE. These parts identify the various tables and columns that participate in the data selection process. The WHERE clause is used to determine the order and precedence of the various attribute comparisons through a conditional expression. An example query to determine the names and addresses of all patients of Doctor 1234 is shown as query Q1 below. The WHERE clause uses a conjunctive clause which combines two attribute comparisons. More complex conditions are possible.

Q1: SELECT Name, Address, Dr_Name
    FROM Patient, Physician
    WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = 1234
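To see Q1 return rows, it can be run against a tiny in-memory database with Python's sqlite3 module. The two tables below are cut down to the columns Q1 needs, and the sample rows are invented for illustration. The provider number is passed as a parameter instead of being written into the query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal tables with only the columns Q1 touches; the data is made up.
conn.executescript("""
CREATE TABLE Physician (Provider INTEGER PRIMARY KEY, Dr_Name TEXT);
CREATE TABLE Patient (Name TEXT, Address TEXT, Doctor INTEGER);
INSERT INTO Physician VALUES (1234, 'Dr. Example'), (5678, 'Dr. Other');
INSERT INTO Patient VALUES
    ('Ann', '1 Elm St', 1234),
    ('Bob', '2 Oak St', 5678),
    ('Cy',  '3 Ash St', 1234);
""")

# Q1 from the text, with the provider number as a bound parameter.
Q1 = """
SELECT Name, Address, Dr_Name
FROM Patient, Physician
WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = ?
"""

rows = sorted(conn.execute(Q1, (1234,)).fetchall())
```

The first comparison in the WHERE clause joins the two tables and the second restricts the result to one physician, so only that doctor's patients are returned.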

The query optimizer has the task of determining the optimum query execution plan. The term optimizer is actually a misnomer, because in many cases the optimum strategy is not found. The goal is to find a reasonably efficient strategy for executing the query. Finding the perfect strategy is usually too time consuming and can require detailed information on both the data storage structure and the actual data content. Usually this information is simply not available.

Once the execution plan is established, the query code is generated. Various techniques such as memory management, disk caching and parallel query execution can be used to improve the query performance. However, if the chosen plan is a poor one, the query performance cannot be optimal.
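Many systems let you inspect the plan the optimizer has chosen. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as one concrete example. The table contents, the index name idx_patient_doctor and the helper plan are assumptions for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patient (Name TEXT, Address TEXT, Doctor INTEGER)")
conn.executemany(
    "INSERT INTO Patient VALUES (?, ?, ?)",
    [(f"P{i}", f"{i} Main St", 1000 + i) for i in range(500)],
)

QUERY = "SELECT Name, Address FROM Patient WHERE Doctor = 1234"


def plan(connection, sql):
    """Return the optimizer's plan description as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of every plan row.
    return "; ".join(row[-1] for row in rows)


# With no index available, the only strategy is to scan the whole table.
plan_without_index = plan(conn, QUERY)

# Once an index on Doctor exists, the optimizer can search it instead.
conn.execute("CREATE INDEX idx_patient_doctor ON Patient (Doctor)")
plan_with_index = plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls. Only the available storage structures changed, which is why the optimizer needs information about them to pick a good plan.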
