SFDV3007
Chapter 2: Distributed Data Management
Overview of Chapter 2
• Distributed information systems
• Client/server systems
• XML and its applications
• Distributed database systems
References
• Kifer chapters 15, 16, 23, 24, (25)
• Silberschatz chapters 10, 18, 19, 21
• Mannino chapter 17
• Date (An Introduction to Database Systems, 6th ed.) chapter 21
• INFO 323 (distributed processing)
• “Oracle DBA” = Oracle10g Administrator’s Guide, Part VII
• “Heterogeneous Connectivity” = Oracle10g Heterogeneous Connectivity Administrator’s Guide
Distributed information systems
2.1 A brief overview & definition of some terms
We are here…
• Distributed information systems
• Client/server systems
• XML and its applications
• Distributed database systems
Timeline
• CPU and memory related technology
• Offline storage
• Primary mode of user interaction (e.g. GUI mainstream since 1984)
• Programming languages
• Data storage & management
• Hardware (boxes)
• Degree of centralisation
• Primary data processing style
Databases form the “back end” of IS
• Controlled by DBMS.
• Integrity code ideally in database or at least in one location
  – includes triggers, stored procedures, …
  – applications cannot bypass
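The point that integrity code in the database cannot be bypassed by applications can be sketched with a trigger. This is a minimal illustration only: it uses Python's built-in sqlite3 module rather than Oracle, and the Customer table, the trigger name and the "no negative credit limit" rule are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema and rule, invented for illustration (not from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    Cust_ID      INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Credit_Limit INTEGER NOT NULL
);

-- The integrity rule lives in the database, so every application obeys it.
CREATE TRIGGER Customer_Credit_Check
BEFORE INSERT ON Customer
WHEN NEW.Credit_Limit < 0
BEGIN
    SELECT RAISE(ABORT, 'Credit_Limit must not be negative');
END;
""")


def try_insert(name, credit_limit):
    """Stand-in for any front-end application; True if the row was accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Customer (Name, Credit_Limit) VALUES (?, ?)",
                (name, credit_limit),
            )
        return True
    except sqlite3.DatabaseError:
        # Raised when the trigger aborts the statement.
        return False


accepted = try_insert("Alice", 500)
rejected = try_insert("Bob", -1)
row_count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```

Whichever application calls try_insert, the rule is enforced in one place, which is the design choice the slide argues for.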
Applications form the “front end” of IS
• Data entry, validation, query formulation, data retrieval and processing, information display.
• Commonly written in:
  – 3GLs (COBOL, C++, Java)
  – RAD (Rapid Application Development) tools (Developer, Visual Studio, …)
• Most code external to DBMS.
• Originally all code was external to the DBMS, but this has shifted with the introduction of stored procedures.
The database and application layers
We can distribute processing
(Kifer §23.2; Silberschatz §18.1)
• Multiple independent, interconnected, cooperating computers.
• Processors may cooperate, and/or data may be distributed across various machines.
• Multiple processors on different machines cooperate to carry out a task.
Distributed Computing or Distributed Processing
Shares the database’s logical processing among physically independent, networked sites
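As a rough sketch of sites cooperating on one task, each site below computes a partial result over its own data and a coordinator combines them. Threads stand in for separate machines, and the site names' order figures are invented for the example, so this shows only the shape of the idea, not a real networked system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site data: each "site" holds its own order totals.
SITE_ORDERS = {
    "Dunedin": [120, 80],
    "Wellington": [200, 50, 25],
    "Christchurch": [75],
}


def local_total(site):
    """Work done at one site: aggregate only the locally held data."""
    return sum(SITE_ORDERS[site])


def global_total(sites):
    """Coordinator: hand the work out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        partials = list(pool.map(local_total, sites))
    return sum(partials)


total = global_total(list(SITE_ORDERS))
```

Only the small partial totals travel back to the coordinator, not the underlying rows.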
We can distribute data
(Kifer pp. 687–688; Silberschatz §18.4)
• Data at multiple locations on the network.
• Compare with physical partitioning.
• Does not necessarily imply a “distributed database”.
• Often easier to distribute processing.
We can distribute the database
(Kifer §16.1; Silberschatz §19.1)
• Stored on and managed by computers at several sites on a network.
• Distribution of data and database processing.
• Ideally a single logical database.
What is a Distributed Database System?
Stores a logically related database over physically independent sites
We are here…
• Distributed information systems
• XML and its applications
• Distributed database systems
XML and its applications
2.3 Current developments in distributed data management
XML References
• Kifer ch. 15
• Silberschatz ch. 10
• XML in 10 Points <http://www.w3.org/XML/1999/XML-in-10-points>
• A Technical Introduction to XML <http://www.xml.com/>
• Web standard specifications
  – World Wide Web Consortium (W3C) <http://www.w3.org/>
  – Organization for the Advancement of Structured Information Standards (OASIS) <http://www.oasis-open.org/>
XML = Extensible Markup Language
(Kifer pp. 582–585; Silberschatz §10.1)
• Text-based markup language with user-definable tag sets.
• In HTML, the tag set is fixed.
• Used to:
  – create domain-specific markup languages
  – exchange data
• Not really intended for humans to read.
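To make "user-definable tag sets" concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The tag names (houses, house, address, asking_price) and the data are invented for illustration and do not belong to any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set for a property listing; the names are ours, not a standard.
DOCUMENT = """
<houses>
  <house id="h1">
    <address>12 Example Street</address>
    <asking_price currency="NZD">350000</asking_price>
  </house>
  <house id="h2">
    <address>7 Sample Road</address>
    <asking_price currency="NZD">420000</asking_price>
  </house>
</houses>
"""

root = ET.fromstring(DOCUMENT)

# Because the tags describe the data, extraction is a matter of naming them.
prices = {
    house.get("id"): int(house.findtext("asking_price"))
    for house in root.findall("house")
}
```

A program receiving this document needs only the agreed tag names to pull the data out, which is what makes XML usable for data exchange.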
What is a markup language?
• Embedded “commands” within text.
• Examples: HTML, LaTeX, WordPerfect.
• Possible uses:
  – specifying semantics
  – specifying document structure
  – specifying visual formatting
  – structure/meaning vs. presentation
XML has a long history
1969: IBM introduces Generalised Markup Language (GML); first use of <tag> </tag>.
1986: Standard Generalised Markup Language (SGML) defines different document types (DTD).
1990–1999: HyperText Markup Language (HTML) v1.0–4.01; an SGML document type.
1996–1998: XML 1.0; a simplified form of SGML.
2000: XHTML; HTML as an XML document type.
Why XML?
(Kifer §15.1; Silberschatz §10.1)
• HTML more for display:
  – mixes structure, meaning, formatting
  – difficult to query/manipulate HTML documents
  – XML separates content from presentation
• No predefined tags (cf. HTML).
• Free-form data storage.
• Typically self-documenting.
• Plain text, hierarchical structure ⇒ very easy to process.
What XML is
• “SGML (Standard Generalized Markup Language) lite”.
• Plain text (easy to manipulate).
• Free-form.
• Extensible (define your own elements).
• Content-neutral markup, which allows multi-channel publishing into a variety of external container formats.
• Hierarchically structured.
• Extremely verbose (deliberately).
What XML isn’t
• A language
  – XML is not a language, at least in the programming sense. It’s really more of a mechanism for specifying different varieties of markup.
• A replacement for HTML
• Intended for human consumption
• A database
• A silver bullet (any straightforward solution perceived to have extreme effectiveness)
XML = content, XSL = presentation
(Kifer §15.4.2; Silberschatz §10.4.2; Example 2–10)
• XSL = Extensible Stylesheet Language.
• Specifies document formatting (appearance).
• Separate from XML documents.
XML can be used for many things
• Domain-specific markup languages:
  – XHTML (HTML as an XML document type)
  – SVG (Scalable Vector Graphics)
  – MathML (mathematical formulae)
  – CML (Chemical Markup Language, chemical industry)
• Dynamic document publication.
  – a single source and multiple target formats.
• Data interchange.
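"Single source, multiple target formats" can be sketched as one XML document rendered two ways. In practice this is the job of an XSL stylesheet; the plain Python below is only a stand-in for that idea (the standard library has no XSLT processor), and the tag names and the two render functions are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# One content-only source document (hypothetical tag names).
SOURCE = (
    "<houses><house>"
    "<address>12 Example Street</address>"
    "<asking_price>350000</asking_price>"
    "</house></houses>"
)
FIELDS = ("address", "asking_price")


def to_html(root):
    """Target 1: an HTML table for display."""
    table = ET.Element("table")
    for house in root.findall("house"):
        row = ET.SubElement(table, "tr")
        for field in FIELDS:
            ET.SubElement(row, "td").text = house.findtext(field)
    return ET.tostring(table, encoding="unicode")


def to_csv(root):
    """Target 2: naive comma-separated text (no quoting; illustration only)."""
    lines = [",".join(FIELDS)]
    for house in root.findall("house"):
        lines.append(",".join(house.findtext(field) for field in FIELDS))
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_output = to_html(root)
csv_output = to_csv(root)
```

The source document carries no formatting at all; each target format is decided entirely by the code that renders it.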
XML can be used for many things
• Metadata.
  – Metadata specifications typically use RDF (Resource Description Framework), which is a major component of the Semantic Web.
• Aggregating data from multiple sources.
• Document storage and manipulation.
  – Document storage implies a database. Document manipulation implies the ability to transform XML documents into other forms.
But XML has its problems too…
• Extremely (deliberately) verbose ⇒ bloated files.
• Hierarchical structure not suited to all applications.
• TMA: Too Many Acronyms (XML, XSL, XSLT, DTD, CSS, HTML, XHTML, DOM, …argh!)
• Too many specifications (XML, XPath, XQuery, XML Schema, XPointer, XLink, XForms, XHTML, XML Encryption, web services, …ARGH!)
Distributed database systems
2.5 Theory & practice
We are here…
• Distributed information systems
• XML and its applications
• Distributed database systems
Recall: We can distribute data
(Kifer pp. 687–688; Silberschatz §18.4)
• Data at multiple locations on the network.
• Does not necessarily imply a “distributed database”.
• Architectures:
  – networked databases
  – federated database
  – “true” distributed database
The networked databases architecture
(Kifer §16.1: “Multiple local schemas”)
Networked databases have many problems
• Heterogeneous data management systems.
• Duplication ⇒ different versions of “same” data.
• Synonyms and homonyms.
• Data “islands”.
• Transfer difficulties.
There are many reasons for heterogeneity
• Historical:
  – “databases” dating back decades
  – “foreign” databases acquired by mergers
• Separate vs. monolithic:
  – performance
  – focus
• “Bottom-up” development.
An example of a distributed database
• Dunedin: Warehouse, Inventory (SQL Server)
• Wellington: Sales (NI) & Staff (Sybase)
• Auckland: Marketing (Oracle10g)
• Christchurch: Sales (SI) & Service (DB2)
An example of a distributed query
• Auckland (initiates query)
• Dunedin: Product
• Wellington: Customer (NI), Employee, Order (NI)
• Christchurch: Customer (SI), Order (SI)
SELECT Customer.Name, Product.Name, Employee.Name
FROM Customer, Product, Employee, Order
WHERE Order.Cust_ID = Customer.Cust_ID
  AND Order.Prod_ID = Product.Prod_ID
  AND Order.Emp_ID = Employee.Emp_ID;
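The query above reads as if all four tables were local, although the data sits at three other sites. A rough way to experiment with this on one machine is SQLite's ATTACH DATABASE, where each attached in-memory database plays one site. This is an assumption-laden stand-in, not how Oracle does it: the rows are invented, the table is named Orders because ORDER is a reserved word, and the NI/SI fragments are combined by hand with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # plays the role of Auckland
for site in ("dunedin", "wellington", "christchurch"):
    # Each attached in-memory database stands in for one remote site.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {site}")

conn.executescript("""
CREATE TABLE dunedin.Product       (Prod_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE wellington.Employee   (Emp_ID  INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE wellington.Customer   (Cust_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE christchurch.Customer (Cust_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE wellington.Orders     (Cust_ID INTEGER, Prod_ID INTEGER, Emp_ID INTEGER);
CREATE TABLE christchurch.Orders   (Cust_ID INTEGER, Prod_ID INTEGER, Emp_ID INTEGER);

-- Invented sample rows.
INSERT INTO dunedin.Product       VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO wellington.Employee   VALUES (10, 'Eve');
INSERT INTO wellington.Customer   VALUES (100, 'Ana');
INSERT INTO christchurch.Customer VALUES (200, 'Ben');
INSERT INTO wellington.Orders     VALUES (100, 1, 10);
INSERT INTO christchurch.Orders   VALUES (200, 2, 10);
""")

# The North Island and South Island fragments are reassembled before joining.
QUERY = """
SELECT c.Name, p.Name, e.Name
FROM (SELECT * FROM wellington.Customer
      UNION ALL
      SELECT * FROM christchurch.Customer) AS c
JOIN (SELECT * FROM wellington.Orders
      UNION ALL
      SELECT * FROM christchurch.Orders) AS o ON o.Cust_ID = c.Cust_ID
JOIN dunedin.Product     AS p ON o.Prod_ID = p.Prod_ID
JOIN wellington.Employee AS e ON o.Emp_ID  = e.Emp_ID
ORDER BY c.Name
"""
rows = conn.execute(QUERY).fetchall()
```

A distributed DBMS would hide the site prefixes and the UNION ALL behind a single global schema, which is what lets the slide's query name only Customer, Product, Employee and Order.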
An example of a distributed transaction
• Auckland (initiates transaction)
• Dunedin: Product
• Wellington: Customer (NI), Employee, Order (NI)
• Christchurch: Customer (SI), Order (SI)
SET TRANSACTION READ WRITE;
INSERT INTO Customer VALUES (...);
DELETE FROM Customer WHERE Credit_Limit < 1000;
COMMIT;
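A transaction like the one above touches Customer data at more than one site, so either every site commits or none does. The Heterogeneous Services slide names two-phase commit (2PC) as the mechanism. The sketch below shows only the voting and decision phases, assuming a hypothetical Participant class; it leaves out logging, timeouts and crash recovery, all of which a real DDBMS needs.

```python
class Participant:
    """Hypothetical stand-in for one site taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: the site votes on whether it is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere only if every site votes yes."""
    votes = [site.prepare() for site in participants]  # phase 1: voting
    if all(votes):
        for site in participants:                      # phase 2: commit
            site.commit()
        return True
    for site in participants:                          # phase 2: abort
        site.rollback()
    return False


good_sites = [Participant("Wellington"), Participant("Christchurch")]
committed = two_phase_commit(good_sites)

bad_sites = [Participant("Wellington"), Participant("Christchurch", can_commit=False)]
aborted = not two_phase_commit(bad_sites)
```

In the second run a single "no" vote from Christchurch forces Wellington to roll back as well, which is the all-or-nothing behaviour a distributed transaction requires.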
Oracle10g provides good support for DDB
• Oracle10g Parallel Server
  – Oracle only, not really a true DDBMS
• Oracle10g Heterogeneous Services
  – Oracle10g DDBMS
  – Underlying: just about anything as long as an interface exists (see slide 126)
Oracle’s Heterogeneous Services
(Heterogeneous Connectivity; Oracle DBA)
• Distributed transactions (2PC).
• SQL & data dictionary translation, pass-through SQL.
• Procedural access.
• Global query optimisation.
• Site autonomy.
• Location transparency: synonyms & views.
Oracle’s Heterogeneous Services
(Heterogeneous Connectivity, Figure 2–2)
How Heterogeneous Services integrates
• Non-Oracle SQL DBMSs integrated via agents (e.g., Oracle Transparent Gateways).
• Non-SQL DBMSs via direct application programming interfaces (API).
• Call any API as remote procedures.
Heterogeneous Services is built on other features
• Networking through Oracle Net Services (layer over network protocols).
• Oracle Names global directory service (cf. NetWare Directory Services).
• Oracle10g Replication.
• Oracle Transparent Gateways.
• Generic connectivity (ODBC, OLE DB).
Heterogeneous Services architecture
Creating distributed databases in Oracle10g
(Oracle DBA, Figure 29–2)
Creating distributed databases in Oracle10g
(Oracle DBA ch. 29; SQL Reference ◃ “CREATE DATABASE LINK”)
Create a database link
CREATE DATABASE LINK Christchurch USING 'Christchurch-NTS-07';
Use the database link
SELECT Address, Asking_Price
FROM Happyhomes.House@Christchurch;
Location transparency in Oracle10g
(Oracle DBA, Figure 30–3)
Location transparency in Oracle10g
(Oracle DBA ch. 30; SQL Reference ◃ “CREATE SYNONYM”)
Use views and/or synonyms
CREATE SYNONYM Chch_House
FOR Happyhomes.House@Christchurch;
CREATE VIEW Houses AS
SELECT * FROM House UNION
SELECT * FROM Chch_House;
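The effect of the Houses view can be tried out locally with SQLite, assuming an attached in-memory database as a stand-in for the Christchurch site. SQLite has neither database links nor synonyms, so only the view half of the slide is imitated; the sample rows are invented, and the view has to be a TEMP view because SQLite does not let ordinary views span attached databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the local site
conn.execute("ATTACH DATABASE ':memory:' AS christchurch")  # stand-in remote site

conn.executescript("""
CREATE TABLE main.House         (Address TEXT, Asking_Price INTEGER);
CREATE TABLE christchurch.House (Address TEXT, Asking_Price INTEGER);

-- Invented sample rows.
INSERT INTO main.House         VALUES ('1 Local Lane', 300000);
INSERT INTO christchurch.House VALUES ('2 Remote Road', 450000);

-- SQLite only allows TEMP views to reference other attached databases.
CREATE TEMP VIEW Houses AS
    SELECT * FROM main.House
    UNION
    SELECT * FROM christchurch.House;
""")

# The application queries Houses without knowing where each row is stored.
addresses = [
    row[0] for row in conn.execute("SELECT Address FROM Houses ORDER BY Address")
]
```

If the Christchurch data later moved to another site, only the view definition would change and the application query would stay the same.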
Replication in Oracle10g
Simple replication using materialised views
(SQL Reference ◃ “CREATE MATERIALIZED VIEW”)
CREATE MATERIALIZED VIEW Chch_Sellers
  REFRESH COMPLETE
  START WITH SYSDATE
  NEXT SYSDATE + 1/48
AS SELECT S.* FROM Seller@Christchurch S;
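Oracle date arithmetic counts in days, so SYSDATE + 1/48 means "1/48 of a day from now", i.e. a refresh every 30 minutes. The arithmetic can be checked in Python; the next_refresh helper and the sample timestamp are made up for the illustration.

```python
from datetime import datetime, timedelta

# 1/48 of a day, computed exactly on a timedelta.
REFRESH_INTERVAL = timedelta(days=1) / 48


def next_refresh(now):
    """Hypothetical helper mirroring NEXT SYSDATE + 1/48."""
    return now + REFRESH_INTERVAL


sample_now = datetime(2024, 1, 1, 12, 0)
sample_next = next_refresh(sample_now)
```

Here REFRESH_INTERVAL equals timedelta(minutes=30), so a refresh at 12:00 is followed by one at 12:30.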
Replication in Oracle10g
Oracle’s Advanced Replication
(Oracle10g Advanced Replication manual)
• Read/write replicas.
• DBA privileges required.
• Usual issues with keeping replicas synchronised.
Remote transactions in Oracle10g
(Oracle DBA ch. 29)
Access exactly one remote site
UPDATE Happyhomes.House@Christchurch
SET Name = 'The Palace'
WHERE Name = 'The Dump';
Distributed transactions in Oracle10g
(Oracle DBA ch. 29)
Access two or more remote or local sites
SELECT C.Name, C.Age, D.Name, D.Rating
FROM Happyhomes.House@Dunedin D, Happyhomes.House@Christchurch C
WHERE C.Name = D.Name;
The Evolution of Distributed Database Management Systems
• Distributed database management system (DDBMS)
  – Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites
The Evolution of Distributed Database Management Systems (continued)
• Centralized database required that corporate data be stored in a single central site
• Dynamic business environment and centralized database’s shortcomings spawned a demand for applications based on data access from different sources at multiple locations
DDBMS Advantages and Disadvantages
• Advantages include:
  – Data are located near “greatest demand” site
  – Faster data access
  – Faster data processing
  – Growth facilitation
  – Improved communications
DDBMS Advantages and Disadvantages (continued)
• Advantages include (continued):
  – Reduced operating costs
  – User-friendly interface
  – Less danger of a single-point failure
  – Processor independence
DDBMS Advantages and Disadvantages (continued)
• Disadvantages include:
  – Complexity of management and control
  – Security
  – Lack of standards
  – Increased storage requirements
  – Increased training cost
Characteristics of Distributed Management Systems
• Application interface
• Validation
• Transformation
• Query optimization
• Mapping
• I/O interface
Characteristics of Distributed Management Systems (continued)
• Formatting
• Security
• Backup and recovery
• DB administration
• Concurrency control
• Transaction management