
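XML/SGML Document Management: Concepts, Techniques, and Experience

Dr. Ching-Long Yeh (葉慶隆), Department of Computer Science and Engineering, Tatung Institute of Technology (大同工學院)

40 Chungshan North Road, Section 3, Taipei 104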

chingyeh@cse.ttit.edu.tw

2

Agenda

• SGML/XML View of Documents (pp. 3-9)
• XML Background (pp. 10-12)
• Document Management (pp. 13-17)
• IETM (pp. 18-24)
• An XML DB (p. 25)
• IETM and DB (pp. 26-27)
• IETM and Expert System (pp. 28-29)
• Expert System Background (pp. 30-31)
• ES for M60A3 Engine Troubleshooting (pp. 32-48)
• XML-Related Projects (pp. 49-50)
• Conclusion (p. 51)

3

SGML/XML View of Documents

• Central to SGML/XML is the concept that documents have structure, content, and format.

• These three ingredients combine to form a document.

4

SGML/XML View of Documents: Content

• What is content?
  – Content is the actual data within a document.
  – The words and illustrations that make up a bicycle assembly manual are its content.

5

SGML/XML View of Documents: Format

• What is format?
  – Format is how the words, sentences, and paragraphs are visually presented and distinguished from one another within a document.
  – Boldface for titles, italics for special terms, and blank lines between sections are examples of document formats.
  – People often confuse format with structure.

6

SGML/XML View of Documents: Structure

Coconut Pudding

12 ounces coconut milk

4 to 6 tablespoons sugar

4 to 6 tablespoons cornstarch

3/4 cup water

Pour coconut milk into saucepan.

Combine sugar and cornstarch; stir in water and blend well.

Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.

[Figure labels: the recipe above annotated with its structure]

Recipe
  Title
  IngredientList
    Ingredient
  InstructionList
    Step

7

SGML/XML View of Documents: Structure

• Defining structures in SGML/XML
  – The structure of a document, that is, its type, is defined by a document type definition (DTD).
  – The DTD lays out the rules for a document through the use of elements, attributes, and entities.

8

SGML/XML View of Documents: Structure

• An XML DTD looks like this:

<!ELEMENT recipe (title, ingredientList, instructionList)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT ingredientList (ingredient*)>
<!ELEMENT instructionList (step*)>
<!ELEMENT ingredient (#PCDATA)>
<!ELEMENT step (#PCDATA)>

9

SGML/XML View of Documents: Structure

<!DOCTYPE recipe SYSTEM "recipe">
<recipe>
  <title>Coconut Pudding</title>
  <ingredientList>
    <ingredient>12 ounces coconut milk</ingredient>
    <ingredient>4 to 6 tablespoons sugar</ingredient>
    <ingredient>4 to 6 tablespoons cornstarch</ingredient>
    <ingredient>3/4 cup water</ingredient>
  </ingredientList>
  <instructionList>
    <step>Pour coconut milk into saucepan.</step>
    <step>Combine sugar and cornstarch; stir in water and blend well.</step>
    <step>Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.</step>
    …
  </instructionList>
</recipe>
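To illustrate how explicit structure lets a program act on a document, the following minimal Python sketch parses a shortened recipe instance with the standard-library `xml.etree.ElementTree` module and checks it by hand against the content model from the DTD slide. The lowercase element names follow the DTD; `check_recipe` is a hypothetical helper introduced here, and ElementTree itself does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# A shortened recipe instance using the element names declared in the DTD.
RECIPE_XML = """
<recipe>
  <title>Coconut Pudding</title>
  <ingredientList>
    <ingredient>12 ounces coconut milk</ingredient>
    <ingredient>4 to 6 tablespoons sugar</ingredient>
    <ingredient>4 to 6 tablespoons cornstarch</ingredient>
    <ingredient>3/4 cup water</ingredient>
  </ingredientList>
  <instructionList>
    <step>Pour coconut milk into saucepan.</step>
    <step>Combine sugar and cornstarch; stir in water and blend well.</step>
  </instructionList>
</recipe>
"""


def check_recipe(root):
    """Hand-written check mirroring the DTD content model:
    recipe = (title, ingredientList, instructionList),
    ingredientList = ingredient*, instructionList = step*.
    (Hypothetical helper; not a general DTD validator.)"""
    if root.tag != "recipe":
        return False
    if [child.tag for child in root] != ["title", "ingredientList", "instructionList"]:
        return False
    ingredients_ok = all(e.tag == "ingredient" for e in root.find("ingredientList"))
    steps_ok = all(e.tag == "step" for e in root.find("instructionList"))
    return ingredients_ok and steps_ok


root = ET.fromstring(RECIPE_XML)
is_valid = check_recipe(root)
title = root.findtext("title")
ingredients = [e.text.strip() for e in root.iter("ingredient")]
print(is_valid, title, len(ingredients))
```

Because the ingredients are tagged as such rather than merely formatted as a list, the program can pull them out without guessing from layout.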

10

XML Background

• HTML helped establish the Internet by providing a universal way to present information.

• However, HTML only addresses the presentation of data.

• Using SGML, users can add structure along with the content of a document.

• However, SGML has proven too heavy-weight for the Internet.

11

XML Background

• XML is a simple dialect of SGML.
• HTML is sufficient for sending web pages that are viewed by human beings.
• XML, however, adds the tags that enable computers to understand, act on, or process the information.
• XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

12

XML Background

• XML Application Profile
  – Information brokering
  – Electronic business
  – Electronic publishing

13

Types of Interaction with Documents

[Figure: document workflow across several workstations and a laser printer. Labels recovered from the figure:]

• Stages: document creation and modification; document management and storage; document utilization
• Workstation activities: creation, import, update, review/validation, conversion/transformation, searching and viewing, exchange, printing
• Management functions: document classification, document assembly, document archival, document storage
• Other labels: useful database information; building alternate documents; online searching, viewing, exchange, export; extraction, analysis

14

Data Type Requirements of Documents

• HTML
  – One file per page
  – Simple uni-directional linking
• XML
  – Tens, hundreds, or even thousands of objects per page
  – Multiple DTDs
  – Hierarchical structure and rich linking
  – Query and navigation capabilities required
  – Agents and business rules interact with the data
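As a small illustration of the query and navigation requirements listed above, the sketch below uses Python's standard-library `xml.etree.ElementTree` and its limited XPath support. The `manual`, `section`, `para`, and `link` elements and the `id`/`target` attributes are hypothetical names chosen for the example, not taken from any DTD in this talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: several addressable objects within one "page".
MANUAL_XML = """
<manual>
  <section id="s1">
    <title>Engine</title>
    <para>See <link target="s2"/> for the electrical system.</para>
  </section>
  <section id="s2">
    <title>Electrical System</title>
    <para>Check the master battery switch.</para>
  </section>
</manual>
"""

root = ET.fromstring(MANUAL_XML)

# Query: collect the titles of all sections with a path expression.
titles = [t.text for t in root.findall("./section/title")]

# Navigation: follow a link from one object to the section it points at.
link = root.find(".//link")
target = root.find(f".//section[@id='{link.get('target')}']")
target_title = target.findtext("title")

print(titles, target_title)
```

A storage layer for XML has to support both kinds of access at the level of individual elements, which is harder to provide when each page is stored as a single opaque file.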

15

Data Types of Storage: File System

• File system
  – Stores monolithic files
  – Folder system on top of them
  – Good at storing multimedia data

16

Data Types of Storage: Relational DB

• Relational database
  – Tabular in nature
  – Good at storing rows and columns of data, like spreadsheets, and data from forms, like invoices

17

Data Types of Storage: Object-Oriented DB

• Object-oriented database
  – Good at managing structured, hierarchical, richly linked information
  – That is exactly what XML is.
  – XML is the object representation of data.

18

IETM Background

• An Interactive Electronic Technical Manual (IETM), as defined in the DoD IETM Specifications, is a package of information required for the diagnosis and maintenance of a weapons system, optimally arranged and formatted for interactive screen presentation to the end-user.

• It is a Technical Manual prepared (i.e., authored) by a contractor and delivered to the Government, or prepared by a Government activity, in digital form on a suitable medium, by means of an automated authoring system.

19

IETM Background

• An IETM is designed for electronic screen display to an end user, and has the following three characteristics:
  – The information is designed and formatted for screen presentation to enhance comprehension.
  – The elements of technical data making up the technical manual are interrelated. A user's access to required information is possible by a variety of paths.
  – The computer-controlled technical manual display device functions interactively (as a result of user requests and information input) to provide procedural guidance, navigational directions, and supplemental information.

20

IETM Background

• IETMs allow a user to locate required information faster and more easily than is possible with a paper technical manual.

• They are easier to comprehend, more specifically matched to the system configuration under diagnosis, and are available in a form that requires much less physical storage than paper.

• Powerful interactive troubleshooting procedures, not possible with paper technical manuals, can be made available using the intelligent features of the IETM display device.

21

IETM Specifications for DoD Use

• MIL-M-87268 defines how the IETM should look and behave to the reader. This standard will soon be replaced by a performance specification (MIL-PRF-87268A).

• MIL-D-87269 establishes the IETM database forms, structure, and key controlling mechanisms. This standard will soon be replaced by a performance specification (MIL-PRF-87269A).

22

Class Definition of IETM

• NSWCCD developed a set of definitions to partition the range of electronic technical manual functionality into five classes to establish a common framework and vocabulary for discussions.

• The classes are defined in fairly broad, general terms that necessarily overlap somewhat.

• The class definitions have been loosely adopted within the Navy to facilitate discussions of options and differences.

• http://navycals.dt.navy.mil/classes.html

23

Class Definition of IETM

• Class 1: Electronically Indexed Pages
• Class 2: Electronically Scrolling Documents
• Class 3: Linearly Structured IETMs (example)
• Class 4: Hierarchically Structured IETMs (example)
• Class 5: Integrated Data Base (IETIS)

24

Class Definition of IETM

25

An XML DBMS

[Figure: data flow of the XML DBMS]

• XML DTD → DB schema generator → schema definitions in ODQL
• XML document → XML parser → object definitions in ODQL
• Schema and object definitions → ODQL processor → OODB (Jasmine), holding the class schema and data objects

ODQL (Object Data Query Language): the language for defining, manipulating, and querying objects in Jasmine, an object-oriented DB by CA.
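The schema-generation step can be sketched as follows: each element declaration in a DTD becomes a class description whose fields come from the content model. This is only an illustration under simplifying assumptions. `generate_schema` and the tuple layout are hypothetical, the output is a plain Python dictionary rather than ODQL, and only flat, comma-separated content models such as those in the recipe DTD are handled.

```python
import re

DTD = """
<!ELEMENT recipe (title, ingredientList, instructionList)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT ingredientList (ingredient*)>
<!ELEMENT instructionList (step*)>
<!ELEMENT ingredient (#PCDATA)>
<!ELEMENT step (#PCDATA)>
"""

# Matches declarations of the form <!ELEMENT name (content model)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s*\(([^)]*)\)\s*>")


def generate_schema(dtd_text):
    """Map each element declaration to a class description.

    Each field is (name, kind, repeated). Handles only flat,
    comma-separated content models; a real generator would need
    a full DTD parser. (Hypothetical helper, not ODQL.)"""
    schema = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        fields = []
        for part in model.split(","):
            part = part.strip()
            if not part:
                continue
            if part == "#PCDATA":
                # Character data becomes a string-valued field.
                fields.append(("text", "string", False))
            else:
                # A trailing * or + means the child may repeat.
                repeated = part[-1] in "*+"
                fields.append((part.rstrip("*+?"), "object", repeated))
        schema[name] = fields
    return schema


schema = generate_schema(DTD)
for class_name, fields in schema.items():
    print(class_name, fields)
```

In this reading, a parsed XML document would then be stored as instances of the generated classes, one object per element.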

26

IETM and DB

• Class 3 (SGML file without DB)
  – Data format
    • Linear ASCII with SGML tags
    • SGML with content rather than format tags
    • Maximum use of MIL-D-87269
    • Generic: SGML tags equivalent to MIL-D-87269

27

IETM and DB

• Class 4 (SGML + DB)
  – Data format
    • Fully attributed DB elements (MIL-D-87269)
    • MIL-D-87269 content tags in full conformance with Generic Level Object Outlines (architectural forms)
    • Authored directly to the database for interactive electronic output
    • Data managed by a DBMS
    • Interactive features "authored in" rather than added on
    • Generic: COTS equal to MIL-D-87269 data definition and tags

28

IETM and Expert System

• Class 5
  – Display
    • The expert system allows the same display session and view system to provide simultaneous access to many differing functions (e.g., supply, training, troubleshooting)

29

IETM and Expert System

• Class 5
  – Data format
    • IETM information integrated at the data level with other application information
    • Does not use separate databases for other application data
    • Identical to Class 4 standards for IETM application data per MIL-D-87269
    • Coding for expert systems and AI modules when used
    • Generic: COTS equal to MIL-D-87269 data definition and tags

30

Elements of an Expert System

[Figure: components of an expert system]

• People: user, knowledge engineer
• Components: interface, inference engine, working memory, knowledge base, rule adjuster

31

Knowledge Acquisition

[Figure: the human expert and the knowledge engineer engage in a dialog; the resulting explicit knowledge goes into the knowledge base of the expert system.]

32

ES for M60A3 Engine Troubleshooting

• Knowledge representation
• Knowledge acquisition
• The user interface
• Implementation
• Demonstration

33

Implementation Tools

• Flex:
  – An expert system shell developed by Logic Programming Associates, UK
  – Based on LPA Prolog
  – Window programming facilities
  – Rules and frames

34

Knowledge Representation (M60A3)

• IF-THEN rule

/* If the master battery indicator works */
/* and the engine can be started */
/* then execute solve1 */
rule solution1
  if check_master_battery_light is 'yes'
  and check_engine is 'yes'
  then solve1.
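To show how an inference engine might evaluate a rule of this shape, here is a minimal Python sketch. It is not Flex and does not reproduce Flex semantics. The rule name, question names, and action come from the rule above, with "yes" as the English rendering of the answer value; `fire_rules` and the data layout are assumptions made for illustration.

```python
# Facts gathered from the user, keyed by the question names used in the rule.
facts = {"check_master_battery_light": "yes", "check_engine": "yes"}

# Each rule: (name, conditions that must all hold, action to fire).
rules = [
    (
        "solution1",
        {"check_master_battery_light": "yes", "check_engine": "yes"},
        "solve1",
    ),
]


def fire_rules(rules, facts):
    """Return the actions of every rule whose conditions all match the facts.
    (Hypothetical helper; a real shell also handles chaining and conflicts.)"""
    fired = []
    for name, conditions, action in rules:
        if all(facts.get(question) == answer for question, answer in conditions.items()):
            fired.append(action)
    return fired


actions = fire_rules(rules, facts)
print(actions)
```

If either answer were different, no condition set would match and no action would fire.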

35

Knowledge Representation (M60A3)

• Prompt and explanation

36

Knowledge Acquisition (M60A3)

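• Knowledge source
  – Troubleshooting procedure from the M60A3 maintenance manual

Step 1. Set the master battery switch to the ON position. Check whether the master battery indicator light works.
  (1) If the indicator light works, proceed to step 2.
  (2) If the indicator light does not work, perform troubleshooting procedure No. 60.
Step 2. Check the transmission shift lever …
  (1) ...
  (2) ...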

37

Knowledge Acquisition (M60A3)

• The knowledge source can be illustrated by a decision tree.

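[Figure: decision tree with branches labeled "yes" (是). Nodes recovered from the figure:]

• Does the master battery indicator light work?
• Perform troubleshooting procedure No. 60.
• Is the transmission shift lever in the park position?
• Set it to the P position and try to start the engine.
• With the lever in P, move it through H, L, and R, return it to P, and try to start the engine.
• Does the engine turn over?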

38

User Interface (M60A3)

• The interface provides three functions:
  – Display of the malfunctioning component
  – Diagnosis
  – Troubleshooting procedure

39

User Interface (M60A3)

• The user can choose one of:
  – The default troubleshooting procedure (inference starting from the root of the decision tree), or
  – The procedure for a selected malfunctioning component (inference starting from the corresponding node in the tree).
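The two entry points can be sketched as a walk over a decision tree that accepts an optional start node. This is a minimal Python illustration, not the Flex implementation. The first question and its two outcomes follow step 1 of the manual procedure; the node identifiers, the `diagnose` helper, and the two step-2 leaves are placeholders, because the source elides those branches.

```python
# Each node is either a question with yes/no branches or a leaf action.
# Node ids and the step-2 leaves are placeholders, not taken from the manual.
TREE = {
    "battery_light": {
        "question": "Does the master battery indicator light work?",
        "yes": "shift_lever",
        "no": "procedure_60",
    },
    "shift_lever": {
        "question": "Is the transmission shift lever in the park position?",
        "yes": "placeholder_a",
        "no": "placeholder_b",
    },
    "procedure_60": {"action": "Perform troubleshooting procedure No. 60."},
    "placeholder_a": {"action": "(branch elided in the source)"},
    "placeholder_b": {"action": "(branch elided in the source)"},
}


def diagnose(tree, answers, start="battery_light"):
    """Walk the tree from `start` using pre-recorded yes/no answers.
    Returns the visited node ids and the action at the leaf reached."""
    node_id = start
    path = [node_id]
    while "action" not in tree[node_id]:
        node_id = tree[node_id][answers[node_id]]
        path.append(node_id)
    return path, tree[node_id]["action"]


# Default procedure: start at the root of the tree.
default_path, default_action = diagnose(TREE, {"battery_light": "no"})

# Selected component: start at the node for that component instead.
selected_path, _ = diagnose(TREE, {"shift_lever": "yes"}, start="shift_lever")

print(default_path, default_action, selected_path)
```

Starting from an inner node skips the questions above it, which is what lets a user who already suspects a component go straight to its procedure.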

40

Demo (M60A3)

• At the beginning of program execution

41

Demo (M60A3)

• Press [確定] (OK).

• Press [確定] (OK).

42

Demo (M60A3)

• Answer the question.

43

Demo (M60A3)

44

Demo (M60A3)

45

Demo (M60A3)

46

Demo (M60A3)

47

Demo (M60A3)

48

Further Improvement

• Validating the prototype system
• Adding shallow knowledge
• Improving the user interface
• Automatic conversion of M60A3 SGML document instances into an OODB
• Employing the SGML DB as the knowledge base of the expert system
• Intelligent Interactive Electronic Technical Manual (IETM)

49

XML-Related Projects

• Design and Implementation of an Object-Oriented XML Document Repository
  – Sponsor: III
  – Time: 1998/9-1999/6
• An Office Automation System Based on SGML Document Database
  – Sponsor: NSC
  – Time: 1998/8-1999/7

50

XML-Related Thesis Projects

• A Video Content Query System Based on XML Database

• An Agent-Based Electronic Commerce System Based on Object-Oriented Database System

• A Query-By-Template User Interface for XML Document Database

51

Conclusion

• XML will be the "lingua franca" of the WWW for
  – Information brokering,
  – Electronic business, and
  – Electronic publishing.

• An XML DB can combine the tasks of technical document management and data management.
