RDBMS Unit- V

  • 8/8/2019 RDBMS Unit- V

    Unit V

    Distributed Database Systems

    A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

    Collections of data (e.g., in a database) can be distributed across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks. Replication and distribution of databases improve database performance at end-user worksites. [1]

    To ensure that the distributed databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, replication can be very complex and can require a lot of time and computer resources. Duplication, on the other hand, is not as complicated. It identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours, to ensure that each distributed location has the same data. During the duplication process, changes are allowed to the master database only, so that local data will not be overwritten. Both processes can keep the data current in all distributed locations. [2]
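The difference between the two processes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each database is modeled as an in-memory dictionary, replication is shown in one direction only, and the function names and sample data are hypothetical rather than taken from any particular product.

```python
def duplicate(master, replicas):
    """Duplication: overwrite every replica with a full copy of the master."""
    for site in replicas:
        replicas[site] = dict(master)


def replicate(source, target):
    """Replication (one-way sketch): find rows that differ and apply only those.

    Returns the number of changes applied to target.
    """
    changes = 0
    for key, value in source.items():
        if target.get(key) != value:      # new or modified row
            target[key] = value
            changes += 1
    for key in list(target):
        if key not in source:             # row deleted at the source
            del target[key]
            changes += 1
    return changes


# Hypothetical data: project number -> budget.
master = {"P1": 150000, "P2": 135000}
sites = {"paris": {"P1": 150000}, "boston": {"P1": 1, "P9": 5}}

applied = replicate(master, sites["boston"])  # compares row by row
duplicate(master, sites)                      # wholesale copy to every site
```

The replication function has to inspect every row to find the differences, which is where the extra time and resources go. Duplication skips the comparison and simply copies the master, which is why local changes made at a site would be lost.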

    Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. Which of these technologies is implemented depends on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

    Distributed Database Design

    Introduction: Design Strategies

    Alternative approaches to distribution design:
    - Top-down design
        - suitable when designing systems from scratch
        - mostly in homogeneous systems
        - main focus in this chapter
    - Bottom-up design
        - suitable when DBs already exist at a number of sites
        - mostly in heterogeneous systems

    Distribution design
    - design the local conceptual schemas by distributing entities over the sites of the DCS
        - fragmentation
        - allocation

    Reasons for fragmentation
    - a relation may not be a suitable unit of distribution
        - application views are usually subsets of relations, i.e., locality or proximity
    - permits a number of transactions to execute concurrently, i.e., transactions that access different portions of a relation
        - inter-query concurrency
        - intra-query concurrency, i.e., parallel execution of a single query

    Disadvantages of fragmentation
    - may require extra processing, e.g., a join for views that cannot be defined on a single fragment
    - semantic data control is more difficult, especially integrity enforcement

    Fragmentation alternatives
    - horizontal fragmentation
    - vertical fragmentation

    (Ex) Horizontal fragmentation

    PROJ

    PNO  PNAME              BUDGET  LOC
    P1   Instrumentation    150000  Montreal
    P2   Database Develop.  135000  New York
    P3   CAD/CAM            250000  New York
    P4   Maintenance        310000  Paris
    P5   CAD/CAM            500000  Boston

    PROJ1: projects with budgets less than $200,000
    PROJ2: projects with budgets greater than or equal to $200,000

    PROJ1

    PNO  PNAME              BUDGET  LOC
    P1   Instrumentation    150000  Montreal
    P2   Database Develop.  135000  New York

    PROJ2

    PNO  PNAME              BUDGET  LOC
    P3   CAD/CAM            250000  New York
    P4   Maintenance        310000  Paris
    P5   CAD/CAM            500000  Boston
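The PROJ example can be reproduced as a small sketch, assuming Python's built-in sqlite3 module and a single in-memory database standing in for the separate sites. Each horizontal fragment is a selection over the whole relation, and the union of the fragments gives back the original relation. The column types are assumptions; the rows are those of the PROJ table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PROJ (PNO TEXT PRIMARY KEY, PNAME TEXT, BUDGET INTEGER, LOC TEXT)"
)
conn.executemany(
    "INSERT INTO PROJ VALUES (?, ?, ?, ?)",
    [
        ("P1", "Instrumentation", 150000, "Montreal"),
        ("P2", "Database Develop.", 135000, "New York"),
        ("P3", "CAD/CAM", 250000, "New York"),
        ("P4", "Maintenance", 310000, "Paris"),
        ("P5", "CAD/CAM", 500000, "Boston"),
    ],
)

# Each horizontal fragment keeps whole rows chosen by a predicate on BUDGET.
conn.execute("CREATE TABLE PROJ1 AS SELECT * FROM PROJ WHERE BUDGET < 200000")
conn.execute("CREATE TABLE PROJ2 AS SELECT * FROM PROJ WHERE BUDGET >= 200000")

proj1 = [row[0] for row in conn.execute("SELECT PNO FROM PROJ1 ORDER BY PNO")]
proj2 = [row[0] for row in conn.execute("SELECT PNO FROM PROJ2 ORDER BY PNO")]

# Reconstruction: the union of the fragments should equal the original relation.
rebuilt = conn.execute(
    "SELECT * FROM PROJ1 UNION SELECT * FROM PROJ2 ORDER BY PNO"
).fetchall()
original = conn.execute("SELECT * FROM PROJ ORDER BY PNO").fetchall()
```

Because the two predicates are mutually exclusive and together cover every possible budget, each project lands in exactly one fragment and no row is lost or duplicated.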

    Query Processing

    Introduction

    SQL query processing requires that the DBMS identify and execute a strategy for retrieving the results of the query. The SQL query determines what data is to be found, but does not define the method by which the data manager searches the database. Hence, query optimization is necessary for high-level relational queries and provides an opportunity for the DBMS to systematically evaluate alternative query execution strategies and to choose an optimal strategy. In some cases the data manager cannot determine the optimal strategy. Assumptions are made which are predicated on the actual structure of the SQL query. These assumptions can significantly affect the query performance. This implies that certain queries can exhibit significantly different response times for relatively innocuous changes in query syntax and structure.

    For the purpose of this discussion an example medical database will be used. Figure 1 below illustrates our subject database schema for physicians, patients, and medical services. The Physician table contains one row for every physician in the system. Various attributes describe the physician name, address, provider number and specialty. The Patient table contains one row for every individual in the system. Patients have attributes listing their social security number, name, residence area, age, gender, and doctor. For simplicity, a physician can see many patients, but a patient has only one doctor. A Services table exists which lists all the valid medical procedures which can be performed. When a patient is ill and under the care of a physician, a row exists in the Treatment table describing the prescribed treatment. This table contains one attribute recording the cost of the individual service and a compound key that identifies the patient, physician, and the specific service received.
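One possible rendering of this schema is sketched below using Python's sqlite3 module. Only Name, Address, Dr_Name, Doctor and Provider appear in the source (in query Q1). Every other column name, all data types, and the sample rows are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Physician (
    Provider   INTEGER PRIMARY KEY,   -- provider number
    Dr_Name    TEXT NOT NULL,
    Dr_Address TEXT,                  -- assumed column name
    Specialty  TEXT
);
CREATE TABLE Patient (
    SSN     TEXT PRIMARY KEY,         -- social security number (assumed name)
    Name    TEXT NOT NULL,
    Address TEXT,                     -- residence area
    Age     INTEGER,
    Gender  TEXT,
    Doctor  INTEGER NOT NULL REFERENCES Physician(Provider)  -- exactly one doctor
);
CREATE TABLE Services (
    Service_Code TEXT PRIMARY KEY,    -- assumed column name
    Description  TEXT
);
CREATE TABLE Treatment (
    Patient_SSN  TEXT    NOT NULL REFERENCES Patient(SSN),
    Provider     INTEGER NOT NULL REFERENCES Physician(Provider),
    Service_Code TEXT    NOT NULL REFERENCES Services(Service_Code),
    Cost         REAL,
    PRIMARY KEY (Patient_SSN, Provider, Service_Code)   -- compound key
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only on request
conn.executescript(SCHEMA)

conn.execute("INSERT INTO Physician VALUES (1234, 'Dr. Example', '1 Main St', 'Cardiology')")
conn.execute("INSERT INTO Patient VALUES ('000-00-0001', 'Pat One', 'North', 40, 'F', 1234)")
conn.execute("INSERT INTO Services VALUES ('S1', 'Checkup')")
conn.execute("INSERT INTO Treatment VALUES ('000-00-0001', 1234, 'S1', 75.0)")


def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Same patient, physician and service again: rejected by the compound key.
duplicate_ok = try_insert("INSERT INTO Treatment VALUES ('000-00-0001', 1234, 'S1', 80.0)")
# A patient whose doctor does not exist: rejected by the foreign key.
orphan_ok = try_insert("INSERT INTO Patient VALUES ('000-00-0002', 'Pat Two', 'South', 30, 'M', 9999)")
```

Placing the doctor as a single non-null column on Patient is what encodes the rule that a physician can see many patients while a patient has only one doctor.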

    Query Processing

    The steps necessary for processing an SQL query are shown in Figure 2. The SQL query statement is first parsed into its constituent parts. The basic SELECT statement is formed from the three clauses SELECT, FROM, and WHERE. These parts identify the various tables and columns that participate in the data selection process. The WHERE clause is used to determine the order and precedence of the various attribute comparisons through a conditional expression. An example query to determine the names and addresses of all patients of Doctor 1234 is shown as query Q1 below. The WHERE clause uses a conjunctive clause which combines two attribute comparisons. More complex conditions are possible.

    Q1: SELECT Name, Address, Dr_Name
        FROM Patient, Physician
        WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = 1234
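As a sketch, Q1 can be run against a cut-down version of the two tables using Python's sqlite3 module. The sample rows are invented, the tables carry only the columns Q1 needs, and an ORDER BY clause has been added so that the result order is deterministic. The same request is also written with an explicit JOIN to show that the two syntactic forms ask for the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Physician (Provider INTEGER PRIMARY KEY, Dr_Name TEXT);
CREATE TABLE Patient (Name TEXT, Address TEXT, Doctor INTEGER);
INSERT INTO Physician VALUES (1234, 'Dr. A'), (5678, 'Dr. B');
INSERT INTO Patient VALUES ('Ann', 'North', 1234),
                           ('Bob', 'South', 5678),
                           ('Cy',  'East',  1234);
""")

# Q1 as given in the text, plus ORDER BY for a stable result order.
Q1 = """
SELECT Name, Address, Dr_Name
FROM Patient, Physician
WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = 1234
ORDER BY Name
"""

# The same request with the join condition moved into an explicit JOIN ... ON.
Q1_JOIN = """
SELECT Name, Address, Dr_Name
FROM Patient JOIN Physician ON Patient.Doctor = Physician.Provider
WHERE Physician.Provider = 1234
ORDER BY Name
"""

rows = conn.execute(Q1).fetchall()
rows_join = conn.execute(Q1_JOIN).fetchall()
print(rows)  # [('Ann', 'North', 'Dr. A'), ('Cy', 'East', 'Dr. A')]
```

Both forms return the same rows. Whether they also run equally fast depends on the optimizer of the DBMS in question.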

    The query optimizer has the task of determining the optimum query execution plan. The term optimizer is actually a misnomer, because in many cases the optimum strategy is not found. The goal is to find a reasonably efficient strategy for executing the query. Finding the perfect strategy is usually too time consuming and can require detailed information on both the data storage structure and the actual data content. Usually this information is simply not available.
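Many systems let the user inspect the plan the optimizer settled on. The sketch below uses SQLite's EXPLAIN QUERY PLAN statement through Python's sqlite3 module, which is an assumption of this example and not the system discussed in the text. The index name idx_patient_doctor and the sample rows are hypothetical. The wording of the plan differs between SQLite versions, so the code only collects the plan text and does not depend on its exact form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Physician (Provider INTEGER PRIMARY KEY, Dr_Name TEXT);
CREATE TABLE Patient (Name TEXT, Address TEXT, Doctor INTEGER);
INSERT INTO Physician VALUES (1234, 'Dr. A'), (5678, 'Dr. B');
INSERT INTO Patient VALUES ('Ann', 'North', 1234), ('Bob', 'South', 5678);
""")

Q1 = """
SELECT Name, Address, Dr_Name
FROM Patient, Physician
WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = 1234
"""


def plan(sql):
    """Return the human-readable steps of the plan SQLite chose for sql."""
    # The last column of each EXPLAIN QUERY PLAN row holds the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(Q1)
result_before = sorted(conn.execute(Q1).fetchall())

# Give the optimizer more information about the storage structure.
conn.execute("CREATE INDEX idx_patient_doctor ON Patient (Doctor)")

plan_after = plan(Q1)
result_after = sorted(conn.execute(Q1).fetchall())

for step in plan_before + ["--- after adding the index ---"] + plan_after:
    print(step)
```

The query text and its result are identical in both runs. Only the information available to the optimizer changes, and on typical SQLite builds the second plan is expected to refer to the new index.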

    Once the execution plan is established the query code is generated. Various techniques such as memory management, disk caching and parallel query execution can be used to improve the query performance. However, if the plan is not correct, then the query performance cannot be optimum.