The Digital Universe Scientific Data – Science of Data (Algorithmic Information Theoretical Analyses). András Benczúr, ELTE Faculty of Informatics. Supported by the following project: „Independent steps in science”, ELTE TÁMOP-4.2.2/B-10/1-2010-0030


Page 1: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

1

The Digital Universe Scientific Data– Science of Data

(Algorithmic Information Theoretical Analyses)

András Benczúr, ELTE Faculty of Informatics

Supported by the following project: „Independent steps in science”

ELTE TÁMOP-4.2.2/B-10/1-2010-0030

Page 2: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

2

Latest Press Releases

• CERN awards major contract for computer infrastructure hosting to Wigner Research Centre for Physics in Hungary (08.05.2012)

• CERN today signed a contract with the Wigner Research Centre for Physics in Budapest for an extension to the CERN data centre. Under the new agreement, the Wigner Centre will host CERN equipment that will substantially extend the capabilities of the LHC Computing Grid Tier-0 activities and provide the opportunity for business continuity solutions to be implemented. This contract is initially until 31 December 2015, with the possibility of up to four one-year extensions thereafter.

Page 3: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

3

Recent News

Wigner-DataCenter at Wigner Research Institute
• Tier-0 center for LHC Computing – 150M EUR investment.
• Rolf-Dieter Heuer: 20 years of participation of Hungarian physicists in CERN. New high-tech data connection between Budapest and CERN; a new, challenging project that will change the way of computing support for research in Europe.
• Some history: Gy. Vesztergombi – DATA Grid initiative, 1999.
• Hungarian projects: Demo-Grid, EGEE-I, II, III, Hungarian Grid Competence Center, Hungrid, Cluster Grid, Desktop-Grid.

Page 4: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

4

Recent News

• Big data has the power to change scientific research from a hypothesis-driven field to one that’s data-driven, Farnam Jahanian, chief of the National Science Foundation’s Computer and Information Science and Engineering Directorate, said Wednesday. (Two weeks ago)

• The term big data refers generally to the mass of new information created by the Internet and by scientific tools such as the Hubble Telescope and the Large Hadron Collider. The emerging field of big data analysis is aimed at sorting through the massive volume of that data -- whether it’s social media posts, video clips, satellite feeds or the reaction of accelerated particles -- to gather intelligence and spot new patterns.

Page 5: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

5

Recent News

• Federal officials announced in March that the government will invest $200 million in research grants and infrastructure building for big data.

• The investment was spawned by a June 2011 report from the President's Council of Advisors on Science and Technology, which found a gap in the private sector's investment in basic research and development for big data.

Page 6: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

6

Digital Universe and Semantic Gap

Mankind has given birth to a new universe, the Digital Universe. The majority of our data and information is somewhere inside it, in digital form of some kind. Even new observations (from the LHC, digital sensors, cameras, etc.) first enter it in digital form.

The conjecture on the growing semantic gap between human beings and computers: as the size of databases grows, the length of queries grows at least logarithmically, and may grow linearly.

According to the IDC estimate in [4], the size of the Digital Universe will grow by a factor of 9 in the next five years. It doubles roughly every one and a half years.
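The two growth figures quoted on this slide can be checked against each other with a short calculation. This is a minimal sketch that assumes pure exponential growth; the function names are illustrative and do not come from the talk.

```python
import math


def growth_factor(years: float, doubling_time_years: float) -> float:
    """Factor by which a quantity grows in `years` if it doubles every `doubling_time_years`."""
    return 2.0 ** (years / doubling_time_years)


def doubling_time(years: float, factor: float) -> float:
    """Doubling time implied by growing by `factor` over `years`."""
    return years * math.log(2.0) / math.log(factor)


if __name__ == "__main__":
    # Growth over five years when doubling every 1.5 years.
    print(growth_factor(5, 1.5))
    # Doubling time implied by a ninefold growth in five years.
    print(doubling_time(5, 9))
```

Doubling every 1.5 years gives roughly a tenfold increase over five years, and a ninefold increase corresponds to a doubling time of about 1.6 years, so the two figures agree to within rounding.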

Page 7: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

7

Digital Universe and Semantic Gap

The Digital Universe contains only the substitutions, or encodings, of information, independently of whatever information means. Inside the Digital Universe the physical processes are either transformations of signals from one form to another, or they are materialized computations.

Page 8: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

8

Digital Universe and Semantic Gap

Paradoxically, inside the Digital Universe, the basic components, the physically existing (even if only temporarily) digits as bits and bytes, have no semantic meaning, only an operational, computational or transformational one. The observer’s meanings, at the very end of the interaction with the real world, lie in the mappings of real-world things to a formal computable model. This mapping is the core of filling the gap between human beings and computers.

Page 9: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

9

Digital Universe and Semantic Gap

H. Mason: data scientists need three skills:
• mathematical modeling of data: building the model
• engineering: implementing the data processing
• finding insight and telling stories about the data, asking the right questions: the hardest task

We need them to fill the SEMANTIC GAP.

P. Gelsinger: „Thirty years ago we didn’t have CS departments, now every quality school on the planet has one. Now, nobody has a data-science department. In thirty years every school on the planet will have one.” In: „Big Data’s Big Problem: Little Talent” (The Wall Street Journal, 04/29/2012)

Page 10: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

10

Motivation

1967, Debrecen, Colloquium on Information Theory

„Where does information come from?” (from the past)
S. Watanabe, abstract

The question was raised for inductive inference and for deductive inference.

„Human mind, being an information transducer, it can lose but not gain information.”

So the Digital Universe, being an information transducer, can lose but not gain information.

Page 11: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

11

Motivation

Today: Where is information? In the Digital Universe.
The Digital Universe can lose but not gain information. Information is collected in it.
In 2011, 1.8 zettabytes of data will be created. Is information there? There are only signals.
How can we gain information from it? By computation. Computation is signal transformation.
How much information? What is information?

Page 12: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

12

Motivation

Data volume on the NET:

Estimation: the data on the Web doubles every 11-18 months.

Exabyte: the size of new data in the year 1998.

IDC research: the size of new data in 2011 will exceed 1.8 zettabytes ($1.8 \times 10^{21}$ bytes).

Upper estimate: $10^8$ programmers typing 8 hours daily at one keystroke (one byte) per second produce about $10^{15}$ bytes of new programs in one year.
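The upper estimate for hand-written programs follows from simple arithmetic. The sketch below reproduces it under one added assumption that the slide does not state: every programmer types on all 365 days of the year. The function and constant names are illustrative.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365  # assumption: typing every day of the year


def bytes_typed_per_year(programmers: int, hours_per_day: int,
                         bytes_per_second: int = 1) -> int:
    """Total bytes typed in one year at a constant keystroke rate."""
    seconds_per_year = hours_per_day * SECONDS_PER_HOUR * DAYS_PER_YEAR
    return programmers * seconds_per_year * bytes_per_second


program_bytes = bytes_typed_per_year(10**8, 8)  # the slide's upper estimate
new_data_bytes = 18 * 10**20                    # 1.8 zettabytes (IDC, 2011)
ratio = new_data_bytes / program_bytes          # data created per byte of program text

if __name__ == "__main__":
    print(program_bytes, ratio)
```

The result is about $1.05 \times 10^{15}$ bytes of program text, so even under this generous estimate the new data of 2011 exceeds newly typed program code by a factor of more than a million.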

Page 13: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

13

Motivation

Next generation science, data intensive science (Jim Gray, Alex Szalay et al., 2005).

„Scientists generate new data much faster than they can analyze them. All looks like optical illusion.” (Hugh Kieffert)

Big Data

Scientific Data

Page 14: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

14

The Data-Scope Project - 6PB storage, 500GBytes/sec sequential IO, 20M IOPS, 130TFlops

• Thursday, February 2, 2012 at 9:10AM
• “Data is everywhere, never be at a single location. Not scalable, not maintainable.” – Alex Szalay
• An interview by Nicole Hemsoth with Dr. Alexander Szalay, Data-Scope team lead, is available at The New Era of Computing: An Interview with "Dr. Data".

Page 15: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

15

Semantic Gap

• The semantic gap between two persons.
• The semantic gap between a person and a computer.
• The effect of growing data volume on the semantic gap: the law of algorithmic information theory.

Page 16: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

16T

Mathematics: Information Theory

Mathematical theories of information deal with quantitative properties. They mainly deal with the objective parts of information (the representation and the mapping to its referents). The subjective aspect, the semantics of the referents, is the problem of the observer. In [1], P.J. Denning summarizes the discussion on the definition of information as follows: “The formal definitions of data (objective symbols) and information (subjective meaning) do not help me to design computers and algorithms. … Still, what information is remains an open question.”

Page 17: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

17T

Mathematics: Information Theory

If we want to get closer to the notion of information from the point of view of mathematical models, we have to investigate carefully what is measured by the entropy functions. According to Kolmogorov [2], we can measure the quantity of information in three ways. All three measures are related to the length of the description and not to the meaning of the information. They are connected to the length of the optimal digital code.

Page 18: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

18

Measures of Information Quantity

Kolmogorov: three approaches

1. Probabilistic: Shannon entropy

$$H(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

2. Algorithmic: Kolmogorov entropy

$$C(x) = C_U(x) = \min\{\, l(p) : U(p) = x \,\}, \quad \text{and } C(x) = \infty \text{ if no such } p \text{ exists.}$$

In the definition, $U$ is the fixed reference function, typically the universal Turing machine.

3. Combinatorial: uniform code length for all elements of the set
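The three measures can be put side by side in a few lines of code. Shannon entropy and the uniform code length are directly computable. Kolmogorov entropy is not computable, so the sketch below substitutes the length of a zlib encoding, which is only a computable stand-in: up to the fixed size of the decompressor it bounds $C(x)$ from above, and it is not the quantity defined on the slide. All function names are illustrative.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(probabilities) -> float:
    """H(p_1, ..., p_n) = -sum p_i log2 p_i, in bits (terms with p_i = 0 are skipped)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte frequencies observed in `data`."""
    counts = Counter(data)
    return shannon_entropy(c / len(data) for c in counts.values())


def compressed_length_bits(data: bytes) -> int:
    """Length of a zlib encoding of `data` in bits: a stand-in for C(x), not C(x) itself."""
    return 8 * len(zlib.compress(data, 9))


def uniform_code_bits(set_size: int) -> int:
    """Combinatorial measure: bits needed to index any element of a set of `set_size` elements."""
    return (set_size - 1).bit_length()


if __name__ == "__main__":
    regular = b"ab" * 1000
    print(empirical_entropy_bits_per_byte(regular))  # per-symbol, ignores the ordering
    print(compressed_length_bits(regular))           # whole-string description length
    print(uniform_code_bits(2 ** 10))
```

The periodic string illustrates the difference between the first two approaches: its byte frequencies give one bit per symbol, while a description of the whole string can be far shorter than that because the ordering is regular.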

Page 19: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

19T

Mathematics: Information Theory

In the Shannon model, the expected value of the code length is minimized, whilst Kolmogorov entropy measures the minimal length of the codes used by the Universal Reference Machine. In both models we don’t know what information is; we only know that there is a way to construct or reconstruct it from a signal of a given length. We don’t know what information is, we only know how much of it there is. To process information you have to understand meaning. Meaning should be in the eye of the beholder.

Page 20: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

20T

Basics of Algorithmic Information Theory

The two basic principles of algorithmic information theory:
• Different things need different encodings.
• Decoding needs computable functions.

Page 21: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

21

Basic techniques:
1) counting the number of code words of given lengths,
2) using a reference machine that enumerates a set of decoding functions.

Invariance theorem.

The algorithmic information quantity: the length of the shortest codeword used by the Universal Turing Machine as reference machine.

$l(p)$: the length of code $p$.
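The first basic technique, counting code words of given lengths, already yields a quantitative statement. There are only $2^n - 1$ binary words shorter than $n$ bits, and different things need different encodings, so not every one of the $2^n$ strings of length $n$ can have a shorter description. The sketch below spells out this counting argument; the function names are illustrative.

```python
from itertools import product


def codewords_shorter_than(n: int) -> int:
    """Number of binary words of length 0, 1, ..., n-1 (equals 2**n - 1 for n >= 0)."""
    return sum(2 ** length for length in range(n))


def max_fraction_compressible(n: int, k: int) -> float:
    """Upper bound on the fraction of n-bit strings with a description shorter than n - k bits."""
    return codewords_shorter_than(n - k) / 2 ** n


def enumerate_codewords(max_length: int) -> list:
    """All binary words of length 0..max_length, shortest first."""
    words = []
    for length in range(max_length + 1):
        words.extend("".join(bits) for bits in product("01", repeat=length))
    return words


if __name__ == "__main__":
    print(codewords_shorter_than(10))
    print(max_fraction_compressible(20, 10))
    print(enumerate_codewords(2))
```

Because each code word decodes to at most one string, fewer than one string in $2^k$ can be shortened by more than $k$ bits, whichever reference machine is used.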

Page 22: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

22

Conditional Kolmogorov entropy

Definition:

$$C(x \mid y) = C_U(x \mid y) = \min\{\, l(p) : U(p, y) = x \,\}, \quad \text{and } C(x \mid y) = \infty \text{ when no such } p \text{ exists.}$$

Prefix entropy: choose the prefix Universal Turing Machine $U(p, y)$ as reference machine.
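The role of the condition $y$ can be illustrated with an ordinary compressor. The sketch below uses zlib's preset-dictionary feature, so that encoder and decoder both hold $y$ and only the part of $x$ not explained by $y$ has to be transmitted. This is a computable illustration under that assumption, not the conditional Kolmogorov entropy itself; the function names are illustrative.

```python
import zlib


def encode(x: bytes) -> bytes:
    """Encode x with no side information."""
    return zlib.compress(x, 9)


def encode_given(x: bytes, y: bytes) -> bytes:
    """Encode x for a decoder that already holds y (y is used as a preset dictionary)."""
    compressor = zlib.compressobj(level=9, zdict=y)
    return compressor.compress(x) + compressor.flush()


def decode_given(code: bytes, y: bytes) -> bytes:
    """Reconstruct x from its code and the shared side information y."""
    decompressor = zlib.decompressobj(zdict=y)
    return decompressor.decompress(code) + decompressor.flush()


if __name__ == "__main__":
    # y: a digit string with little internal repetition; x: an excerpt of y.
    y = "".join(str(i * i) for i in range(300)).encode()
    x = y[100:400]
    print(len(encode(x)), len(encode_given(x, y)))
    print(decode_given(encode_given(x, y), y) == x)
```

When $x$ is contained in $y$, the conditional code only has to say where to find it, which is the situation the talk later models as a query against the stored content M.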

Page 23: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

23

Conditional Kolmogorov entropy

The measure of the algorithmic information quantity, the Kolmogorov entropy, is not suitable for direct investigation of the Digital Universe. Only the construction of the Universal Reference Machine matters, as a measurement tool for approximate quantitative analyses of the behavior of the Digital Universe.

Page 24: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

24

Querying a computer - a model

Participants: the computer Watson and the person Holmes.

Watson:
• Content of the data system: M, which contains the codes of programs: Prog.
• Answers a query (request) Q if there exists P in Prog such that P computes some answer A from Q and M. The reference to P must be given in Q.

Page 25: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

25

Querying a computer - a model

The person Holmes:
• Conscious content of the brain: knowledge K, which contains a part on „Thinking”: the ability to articulate and codify knowledge, cognitive processes, mental mechanisms.
• Holmes should articulate and codify a formal query Q for retrieving data A from Watson. This process is called filling the semantic gap between Holmes and Watson.

Page 26: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

In our simple model Holmes submits the query Q and Watson answers A. Q contains some reference to a program P in M used to compute the answer A = P(Q, M). Now the conditional Kolmogorov entropy satisfies

$$C(A \mid M) \le l(Q) + c_p$$

(The law of information no growth.) Meaning: the length of the shortest query used by U. Practical limitation: strong only for large A and Q.

Page 27: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

27

New reference machine: M with Prog inside

The reference machine used in the definition of Kolmogorov entropy relies on the possibility of enumerating every computable function, and it is rather far from practical applications. Following the basic idea in the construction of the reference machine, we can consider M with Prog inside as the reference machine. (At any time, the best approximation of the Universal Reference Machine is in the Digital Universe.)

Page 28: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

28

The conditional algorithmic entropy of A given M is the length of the shortest query for which Watson gives the answer A. In notation:

$$C_{Watson}(A \mid M) = \min\{\, l(q) : p \in M \text{ and } p(q, M) = A \,\}$$

Note: q contains a reference to p.

An important difference from the universal Turing machine is that Watson contains a collection of facts in M (a finite oracle). We can measure the querying efficiency of Holmes in getting answer A from Watson by comparing the length of his query, $l(Q)$, with $C_{Watson}(A \mid M)$.
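A toy version of Watson makes the definition concrete. In the sketch below everything is hypothetical and chosen only for illustration: `FACTS` plays the data part of M, `PROG` the stored programs, a query is a string whose first character names a program and whose remaining characters are its argument, and `c_watson` finds the shortest query by brute-force enumeration.

```python
from itertools import product

FACTS = {"a": 3, "b": 5, "c": 8}  # data part of M (made-up values)

PROG = {  # programs stored in M (made-up, one-letter names)
    "g": lambda arg, facts: facts.get(arg),                          # get one fact
    "s": lambda arg, facts: sum(facts[k] for k in arg if k in facts),  # sum of facts
}


def watson(query: str):
    """Answer a query: first character references a program in PROG, the rest is its argument."""
    if not query or query[0] not in PROG:
        return None
    return PROG[query[0]](query[1:], FACTS)


def c_watson(answer, alphabet: str = "abcgs", max_len: int = 4):
    """Length of the shortest query for which Watson returns `answer`, or None if none exists up to max_len."""
    if answer is None:
        raise ValueError("None marks 'no answer' in this toy model")
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            if watson("".join(chars)) == answer:
                return length
    return None


if __name__ == "__main__":
    holmes_query = "sab"  # Holmes asks for the sum of facts a and b
    answer = watson(holmes_query)
    print(answer, len(holmes_query), c_watson(answer))
```

Here Holmes obtains his answer with a three-character query although a two-character one exists, because the value he computes is already stored as a fact. Knowing what M contains shortens the query, which is the effect the model is meant to capture.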

Page 29: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

29

Quantitative modelling of human-computer interaction

Suppose that today Holmes solves a problem D after entering query Q and retrieving some information A from Watson. This means that, using a human reasoning “program” R, Holmes obtains solution S from D, K and A:

R(D, K, Q, A) = S

Note: the semantics of A is relative to Q.

Page 30: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

Douglas Adams: The Hitchhiker’s Guide to the Galaxy

“Tell us!”
All right said Deep Thought. “The Answer to the Great Question…”
“Yes…!”
“Of Life, the Universe and Everything…” said Deep Thought “Is Forty-two.”
“You have never actually known what the question is.”
“So once you do know what the question actually is, you know what the answer means.”

Page 31: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

31

Individual information measure

Similarly to Watson, we can introduce information measures for Holmes. The need to query Watson means that Holmes cannot give a solution S, even if the problem is formulated in the form of D, so

$$C_{Holmes}(S \mid K) = \infty \quad \text{and also} \quad C_{Holmes}(S \mid K, D) = \infty$$

Explanation: K is closed.

Page 32: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

32

Model fitting

Model fitting between the problem domain of D and a pre-coded model in M is necessary for codifying query Q. During this process the knowledge about M contained in K plays an important role in formulating an efficient query. Also, M may contain some information about K; this is the possibility of personalization. All this influences the semantic gap in formulating query Q.

Explanation – the role of stochastic modelling

Problem of (scientific) databases: mapping the semantics of measurement information to a computational data model.

Page 33: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

33

Information no growth law revisited

In formulating query Q, Holmes uses K and the problem description D. Having added Q to M, he receives back some information that was added to M by someone else. If the answer A is sufficient for solution S, then there is no semantic gap. Otherwise, in order to obtain the solution S from K, D, Q and A, he uses some process R that is not codified for Watson. Another semantic gap arises: codifying R into a code QR, so that Watson gives answer SR for QR.

Page 34: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

34

How can we use the model?

Estimate the cardinality of the sets of possible answers, questions and problems, and then estimate the average length of queries and answers.

Let us fix the present situation as above. As M grows, the code lengths of a new query and answer with the same semantics as the former A grow as well.

Page 35: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

35

The effect of growing M

The conditional entropy of answer A, whether measured with the reference machine Watson, the Digital Universe or the Universal Turing machine, uses the condition that M is given. How will the conditional entropy vary when we add some new data (digital signals) to M? Denoting the new content by M’, we can ask what the new conditional entropy of the same answer A is.

Page 36: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

36

The effect of growing M

The number of possible answers grows exponentially. So the number of queries also grows exponentially.

Typical query lengths grow linearly.

Page 37: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

37

Example: subset query

M encodes n elements of a set. A query retrieves a subset.
• Number of queries and answers: $2^n$.
• Average length of queries and answers: $c \cdot n$.

Adding m new elements to the set:
• Number of queries and answers: $2^{n+m}$.
• Average length of queries and answers: $c \cdot (n+m)$.

The average length is independent of the reference machine.
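The subset-query example can be made concrete with the simplest possible encoding, one bit per stored element. This is a sketch under that assumption (it corresponds to $c = 1$); the function names are illustrative.

```python
def encode_subset_query(elements, subset) -> str:
    """Encode a subset of `elements` as a bit string, one bit per stored element."""
    wanted = set(subset)
    return "".join("1" if e in wanted else "0" for e in elements)


def answer_subset_query(elements, query: str) -> list:
    """Return the elements selected by the bit string `query`."""
    return [e for e, bit in zip(elements, query) if bit == "1"]


def number_of_queries(n: int) -> int:
    """Number of distinct subset queries over n elements."""
    return 2 ** n


if __name__ == "__main__":
    m_small = ["a", "b", "c"]
    m_grown = m_small + ["d", "e"]  # m = 2 new elements added to M
    q_small = encode_subset_query(m_small, ["a", "c"])
    q_grown = encode_subset_query(m_grown, ["a", "c"])
    print(q_small, q_grown)
    print(number_of_queries(len(m_small)), number_of_queries(len(m_grown)))
```

The same subset needs a longer query once M has grown, even though its meaning for the person asking has not changed. By the counting argument, no other encoding can make the average query over all $2^n$ subsets much shorter than n bits.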

Page 38: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

38

The threat of growing semantic gap

• The size of queries and answers exceeds the processing capacity of a human being.
• The difference between the information quantity of K (human knowledge) and M (the world’s data) is growing exponentially.
• The same will be true for the common knowledge of a group of people, and finally for mankind.

Page 39: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

39

World’s Data

Conducted by Revolution Analytics at the Joint Statistical Meeting held in Miami from July 30 through Aug. 4, the survey shows that 97% of data scientists believe "big data" analytics technology currently is falling short of enterprise needs.

• Specifically, the 200 or so scientists surveyed highlighted three obstacles to running analytics on big data:
  * the inherent complexities of big data software
  * problems applying valid statistical models to the data
  * a general lack of insight into what the data means

Page 40: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

40

Evolution of info communication technologies will help us

• Search engines – concentration (Google, Yahoo, Ms Explorer, Mozilla, …)
• Distributed and parallel technologies: HPC, Clusters, Grid, Cloud, …
• Social Networking: Twitter, Blogging, Youtube, Facebook, …
• Semantic technologies (Semantic Web, RDF, OWL, …)
• Data Mining, Data Warehousing, OLAP, Big Data
• No-SQL

Page 41: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

World’s Data

• Unstructured data (files, email, video) will account for 90% of all data created over the next decade.
• The number of servers managing the world’s data stores will grow by ten times.
• The bad news: the number of IT professionals available to manage all that data will grow to only 1.5 times today’s level. They simply won’t keep pace with demand. (Threat of growing Semantic Gap.)
• New data sources: embedded systems, sensors in clothing, medical devices, buildings, …

Page 42: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

42

Data intensive science

Next generation science, by Jim Gray, Alex Szalay et al., 2005.

„Scientists generate new data much faster than they can analyze them. All looks like optical illusion.” (Hugh Kieffert)

Page 43: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

43

Jim Gray’s Law of Data Engineering

1. Scientific computing is revolving around data.

2. Need scale-out solutions for analyses

Page 44: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

Jim Gray: The Big Picture

The Big Problems
• Data ingest
• Managing a petabyte
• Common schema
• How to organize it?
• How to reorganize it?
• How to coexist with others?
• Data Query and Visualization tools
• Support/training
• Performance
  – Execute queries in a minute
  – Batch (big) query scheduling

[Figure: The Big Picture. Labels: Experiments & Instruments, Simulations, Literature, Other Archives (each supplying facts); questions; answers; "?".]

Page 45: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

The Big Picture - extended

The Big Problems
• Data ingest
• Managing a petabyte
• Common schema
• How to organize it?
• How to reorganize it?
• How to coexist with others?
• Data Query and Visualization tools
• Support/training
• Performance
  – Execute queries in a minute
  – Batch (big) query scheduling

[Figure: The Big Picture, extended. Labels: Experiments & Instruments, Simulations, Literature, Other Archives, Documents (each supplying facts); codes, Programs; M, Prog; Questions; Answers; "She is XY"; Digital Universe.]

Page 46: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

46

Computational Statistics

Unstructured data, such as recorded facts on stochastic and random phenomena in M, needs queries formulated in terms of computational statistics.

From MIT Technology Review, Jan/Feb 2010, p. 24, Mike Lynch (cofounder of Autonomy): Why can’t Google’s algorithms search unstructured information? To process unstructured information you have to understand meaning. Meaning should be in the eye of the beholder.

Page 47: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

47

Theory of Algorithmic Statistics.

Two-part code: a description of a set, followed by a conditional encoding of the element within it.

Kolmogorov’s structure function:

$$h_x(\alpha) = \min\{\, \log_2 |S| : x \in S,\ C(S) \le \alpha \,\}$$

The description of the set S is the structural part; it gives the regular or statistical properties of x, and usually has some natural meaning. The second part, the long code, is the random component.

Now, probably, the random part of the Digital Universe is much larger than the discovered structure.
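A two-part code can be illustrated with one deliberately restricted family of sets. In the sketch below, S is the set of all n-bit strings with exactly k ones. The structural part names k, and the random part is the index of x within S, which costs $\log_2 |S|$ bits. This family and the cost $\log_2(n+1)$ charged for describing S are assumptions made for the example; the structure function itself ranges over all finite sets with $C(S) \le \alpha$.

```python
import math


def two_part_code_bits(x: str):
    """Return (model_bits, data_bits) for a bit string x under the 'k ones' model family."""
    n = len(x)
    k = x.count("1")
    model_bits = math.log2(n + 1)           # which k in 0..n: the description of S
    data_bits = math.log2(math.comb(n, k))  # index of x inside S: log2 |S|
    return model_bits, data_bits


if __name__ == "__main__":
    sparse = "0" * 60 + "1" * 4  # strong regularity: very few ones
    balanced = "01" * 32         # as many ones as zeros
    for name, x in (("sparse", sparse), ("balanced", balanced)):
        model_bits, data_bits = two_part_code_bits(x)
        print(name, len(x), round(model_bits, 2), round(data_bits, 2))
```

For the sparse string the two parts together are much shorter than the 64 literal bits, so this model captures real structure. For the balanced string the random part alone is nearly as long as the string, because this family cannot express its alternating pattern; a richer description of S would be needed.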

Page 48: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

48

The Three Universes

• The Universe
• The Universe in a human brain
• The Digital Universe

Three different pasts to be observed.

„Where does information come from?” (from the past)

Research: force and provoke Nature (a Universe) to produce and show a past that we have not yet observed.

Page 49: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

"WE WANT THE DEMON, YOU SEE, TO EXTRACT FROM THE DANCE OF ATOMS ONLY INFORMATION THAT IS GENUINE, LIKE MATHEMATICAL THEOREMS, FASHION MAGAZINES, BLUEPRINTS, HISTORICAL CHRONICLES, OR A RECIPE FOR ION CRUMPETS, OR HOW TO CLEAN AND IRON A SUIT OF ASBESTOS, AND POETRY TOO, AND SCIENTIFIC ADVICE, AND ALMANACS, AND CALENDARS, AND SECRET DOCUMENTS, AND EVERYTHING THAT EVER APPEARED IN ANY NEWSPAPER IN THE UNIVERSE, AND TELEPHONE BOOKS OF THE FUTURE…" (STANISLAW LEM, THE CYBERIAD)

DEMON OF THE SECOND KIND

Page 50: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

A Demon of the Second Kind is a fictional machine that writes factual statements, but only all too well. It appears in the short story "The Sixth Sally," which is part of the novel The Cyberiad by Stanislaw Lem.

In the story, two clever, space-traveling robots (Trurl and Klapaucius) fall into the clutches of an evil robot, the giant pirate Pugg. This pirate does not want to rob them of gold or silver; instead, he wants information. Specifically, Pugg tells his two captives that he will forcibly hold them until they tell him everything they know.

Faced with the possibility of spending eons reciting all their knowledge, Trurl and Klapaucius offer the pirate a bargain. If he promises to let them go afterwards, the pair will build him a Demon of the Second Kind, a special machine that can print out an infinite amount of information.

DEMON OF THE SECOND KIND

Page 51: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

The process is straightforward. In any gas, molecules are bumping into each other with trillions of collisions per second. Sometimes, they happen to arrange themselves in the shape of a letter. More rarely, they arrange themselves in the shape of a word. Rarer still, they arrange themselves to read out a statement. Some of these statements are true; some aren't. The specialty of a Demon of the Second Kind is that it can separate the false statements from the true, and given a roll of paper, it will write out the truth and forget the falsehood.

The Demon can separate fact from fiction, but it cannot separate the useful from the useless, and almost every fact it prints is good for absolutely nothing.

An overabundance of useless information is a curse.

DEMON OF THE SECOND KIND

Page 52: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

DEMON OF THE SECOND KIND Demon of the Second Kind gathering intelligence

Page 53: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

PUGG can only wait for the Demon to run out of paper.

Page 56: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

Data Mining: Potentials and Challenges

Rakesh Agrawal & Jeff Ullman

Page 57: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

Summary

Data mining has shown promise but needs much more research

We stand on the brink of great new answers, but even more, of great new questions -- Matt Ridley

Page 58: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

58

Thank you for the attention

Page 59: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

[Figure: communication chain. Stages: source, encoder, channel, decoder, destination; carried items: message, signal, signal, message. Sender (consciousness) and Receiver (consciousness), each with a Computer, connected through THE NET. "?": tools of interaction. Captions: Computers and Information technology; Common knowledge in electronic databases = Digital Universe.]

Page 60: The Digital  Universe Scientific  Data– Science of Data   ( Algorithmic Information Theoretical Analyses )

[Figure: communication chain. Stages: source, encoder, channel, decoder, destination; carried items: message, signal, signal, message. Sender (Artifact/Nature) and Receiver (Artifact/Nature), each with a Computer, connected through THE NET. "?": tools of interaction. Captions: Computers and Information technology; Common knowledge in electronic databases = Digital Universe.]