saif-uddin
8/8/2019 RDBMS Unit- V
Unit V
Distributed Database Systems
A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers.
Collections of data (e.g., in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Replication and distribution of databases can improve database performance at end-user worksites. [1]
To ensure that distributed databases are up to date and current, there are two processes: replication and duplication. Replication uses specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, replication can be very complex and time-consuming, and it can require substantial computer resources. Duplication, on the other hand, is less complicated. It identifies one database as the master and then duplicates that database. Duplication is normally done at a set time after hours to ensure that each distributed location has the same data. During the duplication process, only changes to the master database are allowed, so that local data will not be overwritten. Both processes can keep the data current in all distributed locations. [2]
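To make the contrast concrete, here is a minimal sketch that models each site's database as a Python dict mapping a key to a row value (an assumption for illustration; real systems replicate tables, logs, or pages). The function names `find_changes`, `replicate`, and `duplicate` are hypothetical. Replication detects and applies only the rows that differ at each site, while duplication overwrites every site with a full copy of the master.

```python
from copy import deepcopy

# Hypothetical model: each site's database is a dict mapping a key to a row value.

def find_changes(source, target):
    """Identify rows that must change for target to match source."""
    upserts = {k: v for k, v in source.items() if k not in target or target[k] != v}
    deletes = [k for k in target if k not in source]
    return upserts, deletes

def replicate(source, sites):
    """Replication: detect changes per site and apply only those. Returns rows written."""
    written = 0
    for site in sites:
        upserts, deletes = find_changes(source, site)
        for k, v in upserts.items():
            site[k] = deepcopy(v)
        for k in deletes:
            del site[k]
        written += len(upserts) + len(deletes)
    return written

def duplicate(master, sites):
    """Duplication: overwrite every site with a full copy of the master. Returns rows written."""
    written = 0
    for site in sites:
        site.clear()
        site.update(deepcopy(master))
        written += len(master)
    return written

master = {"P1": 150000, "P2": 135000, "P3": 250000}
site_a = {"P1": 150000, "P2": 120000, "P3": 250000}           # one stale row
site_b = {"P1": 150000, "P2": 135000, "P3": 250000, "P9": 1}  # one extra row

print(replicate(master, [site_a, site_b]))  # 2
print(duplicate(master, [site_a, site_b]))  # 6
```

The sketch does not model the after-hours schedule or the rule that only the master may change during duplication; it only contrasts how much work each approach does per site.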
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example, local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented depends on the needs of the business and on the sensitivity or confidentiality of the data to be stored in the database, and hence on how much the business is willing to spend on ensuring data security, consistency, and integrity.
Distributed Database Design
Introduction: Design Strategies
There are two alternative approaches to distribution design:
- Top-down design: suitable when designing systems from scratch, mostly in homogeneous systems (the main focus of this chapter).
- Bottom-up design: suitable when the databases already exist at a number of sites, mostly in heterogeneous systems.
Distribution design: design the local conceptual schemas by distributing the entities over the sites of the DCS. It involves two activities: fragmentation and allocation.
Reasons for fragmentation:
- A whole relation may not be a suitable unit of distribution.
- Application views are usually subsets of relations, so the locality (proximity) of application accesses is defined on subsets of relations.
- Fragmentation permits a number of transactions to execute concurrently. Transactions that access different portions of a relation can run at the same time (inter-query concurrency), and a single query can be executed in parallel over several fragments (intra-query concurrency).
Disadvantages of fragmentation:
- It may require extra processing, e.g., a join for views that cannot be defined on a single fragment.
- Semantic data control is more difficult, especially integrity enforcement.
Fragmentation alternatives: horizontal fragmentation and vertical fragmentation.
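The example that follows works through horizontal fragmentation only. As a minimal sketch of the vertical alternative (an illustration, not taken from the source), the Python below splits the PROJ relation into two hypothetical fragments, `PROJ_A` (PNO, PNAME, LOC) and `PROJ_B` (PNO, BUDGET). The key PNO is kept in both fragments so the relation can be rebuilt, and rebuilding it needs a join, which is the extra processing listed among the disadvantages above.

```python
# PROJ rows as (PNO, PNAME, BUDGET, LOC) tuples, taken from the example relation.
COLUMNS = ("PNO", "PNAME", "BUDGET", "LOC")
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]

def vertical_fragment(rows, columns, keep):
    """Project each row onto the columns in `keep` (which must include the key)."""
    idx = [columns.index(c) for c in keep]
    return [tuple(row[i] for i in idx) for row in rows]

# Hypothetical fragments; PNO is repeated in both so they can be joined back.
PROJ_A = vertical_fragment(PROJ, COLUMNS, ("PNO", "PNAME", "LOC"))
PROJ_B = vertical_fragment(PROJ, COLUMNS, ("PNO", "BUDGET"))

def reconstruct(frag_a, frag_b):
    """Rebuild PROJ by joining the two fragments on PNO (a simple hash join)."""
    budget_by_pno = {pno: budget for pno, budget in frag_b}
    return [(pno, pname, budget_by_pno[pno], loc) for pno, pname, loc in frag_a]

print(PROJ_B[0])                          # ('P1', 150000)
print(reconstruct(PROJ_A, PROJ_B) == PROJ)  # True
```

A query such as "budgets of projects in New York" touches LOC in one fragment and BUDGET in the other, so it cannot be answered from a single fragment without that join.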
Example: horizontal fragmentation

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1: projects with budgets less than $200,000
PROJ2: projects with budgets greater than or equal to $200,000

PROJ1
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York

PROJ2
PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston
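A minimal Python sketch of the horizontal fragmentation above, with PROJ held as a list of tuples (an assumption for illustration). Each fragment is defined by a selection predicate on BUDGET; the two predicates are complementary, so every tuple lands in exactly one fragment and PROJ is recovered as the union of PROJ1 and PROJ2. The last part hints at intra-query concurrency: the hypothetical query function `count_in_city` runs on each fragment in its own thread and the partial results are combined, with threads standing in for sites.

```python
from concurrent.futures import ThreadPoolExecutor

# PROJ rows as (PNO, PNAME, BUDGET, LOC) tuples.
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]

def horizontal_fragment(rows, predicate):
    """Select the rows that satisfy the fragment's defining predicate."""
    return [r for r in rows if predicate(r)]

PROJ1 = horizontal_fragment(PROJ, lambda r: r[2] < 200000)   # BUDGET < 200,000
PROJ2 = horizontal_fragment(PROJ, lambda r: r[2] >= 200000)  # BUDGET >= 200,000

# Completeness and disjointness: the union gives back PROJ, with no overlap.
assert sorted(PROJ1 + PROJ2) == sorted(PROJ)
assert not set(PROJ1) & set(PROJ2)

def count_in_city(fragment, city):
    """Hypothetical subquery evaluated locally on one fragment."""
    return sum(1 for r in fragment if r[3] == city)

# Threads stand in for sites here; a real DDBMS would ship the subquery to each site.
with ThreadPoolExecutor() as pool:
    partial_counts = list(pool.map(count_in_city, [PROJ1, PROJ2], ["New York"] * 2))

print(partial_counts)       # [1, 1]
print(sum(partial_counts))  # 2
```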
Query Processing
Introduction
SQL query processing requires that the DBMS identify and execute a strategy for retrieving the results of the query.
The SQL query determines what data is to be found, but it does not define the method by which the data manager
searches the database. Hence, query optimization is necessary for high-level relational queries: it gives the DBMS an
opportunity to systematically evaluate alternative query execution strategies and to choose an optimal one. In some
cases the data manager cannot determine the optimal strategy, so it makes assumptions based on the actual structure
of the SQL query. These assumptions can significantly affect query performance. As a result, relatively innocuous
changes in query syntax and structure can produce significantly different response times.
For the purpose of this discussion an example medical database will be used. Figure 1 below illustrates our subject
database schema for physicians, patients, and medical services. The Physician table contains one row for every
physician in the system. Various attributes describe the physician name, address, provider number and specialty.
The Patient table contains one row for every individual in the system. Patients have attributes listing their social
security number, name, residence area, age, gender, and doctor. For simplicity, a physician can see many patients,
but a patient has only one doctor. A Services table exists which lists all the valid medical procedures which can be
performed. When a patient is ill and under the care of a physician, a row exists in the Treatment table describing
the prescribed treatment. This table contains one attribute recording the cost of the individual service and a
compound key that identifies the patient, physician, and the specific service received.
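Because Figure 1 is not reproduced here, the following is only a minimal sketch of the described schema using Python's built-in `sqlite3` module. All column names (for example `SSN`, `Dr_Address`, `Service_Code`, and `Address` for the patient's residence area) are assumptions, chosen to stay consistent with the attributes named in the text and in query Q1. The one-doctor-per-patient rule becomes a foreign key on `Patient.Doctor`, and Treatment gets the compound key described above.

```python
import sqlite3

# Column names are hypothetical: Figure 1 is not reproduced in this text.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on

conn.executescript("""
CREATE TABLE Physician (
    Provider   INTEGER PRIMARY KEY,  -- provider number
    Dr_Name    TEXT NOT NULL,
    Dr_Address TEXT,
    Specialty  TEXT
);
CREATE TABLE Patient (
    SSN     TEXT PRIMARY KEY,        -- social security number
    Name    TEXT NOT NULL,
    Address TEXT,                    -- residence area
    Age     INTEGER,
    Gender  TEXT,
    -- A patient has exactly one doctor; a physician may appear in many rows.
    Doctor  INTEGER NOT NULL REFERENCES Physician(Provider)
);
CREATE TABLE Services (
    Service_Code TEXT PRIMARY KEY,   -- one row per valid medical procedure
    Description  TEXT
);
CREATE TABLE Treatment (
    SSN          TEXT    NOT NULL REFERENCES Patient(SSN),
    Provider     INTEGER NOT NULL REFERENCES Physician(Provider),
    Service_Code TEXT    NOT NULL REFERENCES Services(Service_Code),
    Cost         REAL,               -- cost of this individual service
    PRIMARY KEY (SSN, Provider, Service_Code)  -- compound key
);
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Patient', 'Physician', 'Services', 'Treatment']
```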
Query Processing
The steps necessary for processing an SQL query are shown in Figure 2. The SQL query statement is first parsed
into its constituent parts. The basic SELECT statement is formed from the three clauses SELECT, FROM, and
WHERE. These parts identify the various tables and columns that participate in the data selection process. The
WHERE clause is used to determine the order and precedence of the various attribute comparisons through a conditional expression. An example query to determine the names and addresses of all patients of Doctor 1234 is
shown as query Q1 below. The WHERE clause uses a conjunctive condition that combines two attribute comparisons. More complex conditions are possible.
Q1: SELECT Name, Address, Dr_Name
FROM Patient, Physician
WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = 1234
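To see Q1 run end to end, here is a minimal sketch using Python's `sqlite3` with a cut-down Patient and Physician schema. The column names are assumptions (Figure 1 is not reproduced): physician columns carry a `Dr_` prefix so that the unqualified `Name` and `Address` in Q1 refer unambiguously to the patient. The sample rows are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; only the columns Q1 needs plus a few others.
conn.executescript("""
CREATE TABLE Physician (Provider INTEGER PRIMARY KEY, Dr_Name TEXT,
                        Dr_Address TEXT, Specialty TEXT);
CREATE TABLE Patient (SSN TEXT PRIMARY KEY, Name TEXT, Address TEXT,
                      Age INTEGER, Gender TEXT, Doctor INTEGER);
""")
# Invented sample data for illustration only.
conn.executemany("INSERT INTO Physician VALUES (?, ?, ?, ?)", [
    (1234, "Dr. Adams", "12 Elm St", "Cardiology"),
    (5678, "Dr. Baker", "9 Oak Ave", "Pediatrics"),
])
conn.executemany("INSERT INTO Patient VALUES (?, ?, ?, ?, ?, ?)", [
    ("111-11-1111", "Carol", "North", 40, "F", 1234),
    ("222-22-2222", "Dave", "South", 35, "M", 5678),
    ("333-33-3333", "Erin", "East", 52, "F", 1234),
])

# Q1 exactly as written in the text.
Q1 = """
SELECT Name, Address, Dr_Name
FROM Patient, Physician
WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = 1234
"""
# Sort in Python so the printed order does not depend on the chosen plan.
for row in sorted(conn.execute(Q1).fetchall()):
    print(row)
# ('Carol', 'North', 'Dr. Adams')
# ('Erin', 'East', 'Dr. Adams')
```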
The query optimizer has the task of determining the optimum query execution plan. The term optimizer is actually
a misnomer, because in many cases the optimum strategy is not found. The goal is to find a reasonably efficient
strategy for executing the query. Finding the perfect strategy is usually too time-consuming and can require detailed information on both the data storage structure and the actual data content. Usually this information is simply not
available.
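As a rough illustration of choosing a "reasonably efficient" plan, the sketch below compares two candidate strategies for Q1 under a deliberately simple, hypothetical cost model (cost = tuples read or compared, nested-loop joins only). The function names, table sizes, and selectivity are assumptions, not measured statistics. It also shows why "optimizer" is a misnomer: the choice is only as good as the estimates fed into it.

```python
# Hypothetical cost model for Q1: cost = number of tuples read or compared.
# Real optimizers use far richer statistics (page counts, indexes, histograms).

def estimate_costs(n_patient, n_physician, selectivity):
    """Estimate the cost of two candidate plans for Q1."""
    # Physician rows expected to satisfy Physician.Provider = 1234
    filtered = max(1, round(n_physician * selectivity))
    return {
        # Nested-loop join of every Patient with every Physician, then filter
        "join then select": n_patient * n_physician,
        # Scan Physician once applying the filter, then join only matching rows
        "select then join": n_physician + n_patient * filtered,
    }

def choose_plan(costs):
    """Pick the candidate plan with the lowest estimated cost."""
    return min(costs, key=costs.get)

costs = estimate_costs(n_patient=10_000, n_physician=100, selectivity=0.01)
print(costs)               # {'join then select': 1000000, 'select then join': 10100}
print(choose_plan(costs))  # select then join
```

Pushing the selection below the join wins here by two orders of magnitude; with a badly wrong selectivity estimate, the same procedure could pick a worse plan without any error being reported.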
Once the execution plan is established, the query code is generated. Various techniques such as memory
management, disk caching, and parallel query execution can be used to improve query performance. However,
if the plan itself is poor, these techniques cannot make the query perform optimally.