What is Data Science Big Data Dive, 20.09.2012

What is Data Science

Page 1: What is Data Science

What is Data Science

Big Data Dive, 20.09.2012

Page 2: What is Data Science

Data is everywhere

Page 3: What is Data Science

Apps with data

Page 4: What is Data Science

Google Page Rank

Page 5: What is Data Science

Amazon Recommendations

Page 6: What is Data Science

Meteorology

Page 7: What is Data Science

Healthcare

Page 8: What is Data Science

Big Data processing

Page 9: What is Data Science

Definition of Data Science

Page 10: What is Data Science

Data Science is…

• Data Engineering

• Scientific Method

• Math

• Statistics

• Advanced Computing

• Visualization

• Hacker mindset

• Domain Expertise

Page 11: What is Data Science

Data Science is…

• A/B testing

• Association rule learning

• Classification

• Cluster analysis

• Crowdsourcing

• Data fusion and integration

• Data mining

• Ensemble learning

• Genetic algorithms

• Machine learning

• Massive parallel-processing

• Natural language processing

• Neural networks

• Pattern recognition

• Predictive modelling

• Regression

• Sentiment analysis

• Signal processing

• Simulation

• Time series analysis

• Visualization

Page 12: What is Data Science

Data Science is…

• Explore data

• Build model

• Apply model

The most important goal of data science is prediction.
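The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the spend and sales figures are invented toy data, the function names are hypothetical, and ordinary least squares stands in for whatever model a real project would choose.

```python
# Hypothetical toy data: advertising spend (x) and resulting sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]


def explore(xs, ys):
    """Explore: summarize the data before modelling it."""
    n = len(xs)
    return {"n": n, "mean_x": sum(xs) / n, "mean_y": sum(ys) / n}


def build_model(xs, ys):
    """Build: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_model(model, x):
    """Apply: use the fitted line to predict y for an unseen x."""
    slope, intercept = model
    return slope * x + intercept


summary = explore(spend, sales)
model = build_model(spend, sales)
prediction = apply_model(model, 6.0)  # prediction for a spend value not in the data
print(summary, model, prediction)
```

The last call is the point of the exercise: the model is applied to a value that was not in the data it was built from.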

Page 13: What is Data Science

Process

Page 14: What is Data Science

Explore data

• Preprocessing

  • Data cleaning

  • Transformations

  • Subset selection

  • Feature selection

  • Discretization

  • Binarization

  • Normalization

  • Generalization

• Investigation

  • Plots

  • Histograms

  • Smoothing

  • Plot matrices

  • Distributions

  • Multidimensional scaling

  • Classification trees

  • Correlation matrices
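Two of the preprocessing steps listed above, normalization and discretization, can be sketched as follows. The sketch assumes min-max scaling and equal-width bins, which are only one common choice for each step; the function names and the ages list are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale by.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, n_bins):
    """Replace each value with the index of the equal-width bin it falls into."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 35, 47, 60]  # hypothetical sample
normalized = min_max_normalize(ages)
bins = discretize_equal_width(ages, 3)
print(normalized)
print(bins)
```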

Page 15: What is Data Science

Example: Binarization
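The slide's own illustration is not available here, so the following is only a sketch of what binarization commonly means: turning a categorical attribute into one 0/1 attribute per category, or turning a numeric attribute into 0/1 by comparing it with a threshold. The function names, the color values, and the threshold are assumptions made for illustration.

```python
def binarize_categorical(values):
    """Turn one categorical attribute into one 0/1 attribute per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def binarize_threshold(values, threshold):
    """Turn a numeric attribute into 0/1: 1 if strictly above the threshold."""
    return [1 if v > threshold else 0 for v in values]


colors = ["red", "green", "blue", "green"]  # hypothetical categorical attribute
categories, rows = binarize_categorical(colors)
flags = binarize_threshold([0.2, 0.7, 0.5, 0.9], 0.5)

print(categories)  # column order of the binary attributes
for color, row in zip(colors, rows):
    print(color, row)
print(flags)
```

Each row of the categorical result contains exactly one 1, in the column that corresponds to the original category.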

Page 16: What is Data Science

Example: Plot Matrices
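A plot matrix draws one scatter plot for every pair of variables. The original figure is not reproduced here, so the sketch below only prepares the grid of panels and computes the numeric counterpart of each panel, a Pearson correlation coefficient. The data set, the column names, and the function names are hypothetical; a plotting library would be needed to actually draw the panels.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def plot_matrix_panels(columns):
    """List the (row variable, column variable) pair behind every panel of the grid."""
    names = list(columns)
    return [(a, b) for a in names for b in names]


def correlation_matrix(columns):
    """Correlation for every pair of variables, keyed by variable name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical measurements for five people.
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 66, 77, 85],
    "speed": [9, 8, 6, 5, 3],
}
panels = plot_matrix_panels(data)
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```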

Page 17: What is Data Science

Build model

• Artificial neural networks

• Association rules

• Bayesian networks

• Clustering

• Decision trees

• Generalized linear models

• Genetic programming

• Inductive logic programming

• Sparse dictionaries

• Support vector machines

• Reinforcement learning

• Representation learning

Page 18: What is Data Science

Example: Decision Trees
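The slide's figure is not available, so here is a minimal sketch of how a decision tree can be built and applied. It assumes categorical attributes and an ID3-style rule that splits on the attribute with the highest information gain; the weather data and all function names are invented for illustration, and real libraries use more elaborate variants.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting the rows on one attribute."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Recursively build a tree; leaves are labels, inner nodes are dicts."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: every row has the same label
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # fall back to the majority label
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {"attr": best, "branches": {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return tree


def predict(tree, row):
    """Walk from the root to a leaf; raises KeyError for attribute values never seen in training."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attr"]]]
    return tree


# Hypothetical training data: should we play outside?
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "play", "play"]

tree = build_tree(rows, labels, ["outlook", "windy"])
answer = predict(tree, {"outlook": "rainy", "windy": "yes"})
print(tree)
print(answer)
```

On this toy data the tree splits on the outlook first, because that attribute leaves less uncertainty about the label than the wind does, and only the rainy branch needs a second split.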

Page 19: What is Data Science

Apply model

Page 20: What is Data Science

Tools

Page 21: What is Data Science

R

• Open source programming language and software environment

• Designed for statistical computing and graphics

• CRAN (The Comprehensive R Archive Network) – 5300 packages and counting

• In 2010, R became the data mining tool used by more data miners (43%) than any other

Page 22: What is Data Science

Mathematical packages

Page 23: What is Data Science

They make presentation better

• Google Prediction API

• Microsoft Analysis Services

• Oracle Data Mining

Page 24: What is Data Science

Python

• Well recognized in scientific computing and engineering

• General-purpose scientific libraries: NumPy, SciPy, Matplotlib, python-multiprocessing

• Statistical, data mining, and machine learning packages: scikit-learn, pandas, PyBrain