Big Data – To Explain or To Predict? Big Data Experts Speaker Series Rotman School of Management, U Toronto, March 2016 Galit Shmueli

Page 1

Big Data – To Explain or To Predict?

Big Data Experts Speaker Series Rotman School of Management, U Toronto, March 2016

Galit Shmueli

Page 2

Galit Shmueli (徐茉莉)
www.galitshmueli.com

❶ 1994-2000 Israel Institute of Technology: MSc + PhD, Statistics

❷ 2000-2002 Carnegie Mellon Univ.: Visiting Assistant Prof., Dept. of Statistics

❸ 2002-2012 Univ. of Maryland College Park: Assistant then Associate Prof. of Statistics & Management Science, R H Smith School of Business

2008-2014 Rigsum Institute (Bhutan): Co-Director, Rigsum Research Lab

❹ 2011-2014 Indian School of Business: SRITNE Chaired Prof. of Data Analytics, Associate Prof. of Statistics & Info Systems

2014-… NTHU, Institute of Service Science: Director, Center for Service Innovation & Analytics

Page 3

Research in Data Analytics

‘Entrepreneurial’ statistical & data mining modeling (for today’s problems)

Interdisciplinary modeling

Statistical Strategy
To Explain or To Predict?
Information Quality
Regression with Big Data

Page 4

Road Map

Definitions
Explanatory-dominated social sciences
Explanatory modeling ≠ predictive modeling

Why?
Different modeling paths
Explanatory power vs. predictive power

Implications

Page 5

Definitions

Explanatory modeling: Theory-based, statistical testing of causal hypotheses

Explanatory power: Strength of relationship in statistical model

Page 6

Definitions

Predictive modeling: Empirical method for predicting new observations

Predictive power: Ability to accurately predict new observations
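
The two kinds of power can be contrasted numerically. The following is a minimal sketch, assuming a simple least-squares line fitted to simulated data. The helper names (`fit_line`, `r_squared`, `rmse`), the simulated data, and the 150/50 split are illustrative assumptions and not part of the talk. In-sample R² stands in for explanatory power, and the error on records the model has not seen stands in for predictive power.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys captured by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def rmse(xs, ys, intercept, slope):
    """Root mean squared prediction error."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)  # fixed seed; the data are simulated for illustration
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 2.0) for x in xs]

# Fit on the first 150 records; the last 50 play the role of new observations.
train_x, train_y = xs[:150], ys[:150]
new_x, new_y = xs[150:], ys[150:]
intercept, slope = fit_line(train_x, train_y)

explanatory_power = r_squared(train_x, train_y, intercept, slope)  # in-sample fit
predictive_power = rmse(new_x, new_y, intercept, slope)            # error on new records
print(f"in-sample R^2 = {explanatory_power:.2f}, holdout RMSE = {predictive_power:.2f}")
```

The two numbers answer different questions: the first describes how strong the fitted relationship is in the data at hand, the second how far off the model is for individual new records.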

Page 7

Matching Game

Explain
Predict
Describe

Social Sciences

Machine learning

Statistics

Page 8

Statistical modeling in social sciences & management research

Purpose: test causal theory (“explain”)

Association-based statistical models

Prediction nearly absent

Page 9

Start with a causal theory

Generate causal hypotheses on constructs

Operationalize constructs → Measurable variables

Fit statistical model

Statistical inference → Causal conclusions

Classic journal paper

Page 10

In the social sciences,

data analysis is mainly used for testing causal theory.

“If it explains, it predicts”

Page 11

“Empirical prediction alone is un-scientific”

Some statisticians share this view:

The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth.

- Parzen, Statistical Science 2001

Page 12

Prediction in top research journals in Information Systems

Predictive goal?
Predictive modeling?
Predictive assessment?

1990-2006

Page 13

52 “predictive” articles among 1,072 in Information Systems top journals

Page 14

“A good explanatory model will also predict well”

“You must understand the underlying causes in order to predict”

Page 15
Page 16

Meanwhile… in industry

Page 17
Page 18

Philosophy of Science

“Explanation and prediction have the same logical structure”

Hempel & Oppenheim, 1948

“It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation”

Helmer & Rescher, 1959

“Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding”

Dubin, Theory Building, 1969

Page 19

Why statistical explanatory modeling differs from predictive modeling

Page 20

Explanatory Model: Test/quantify causal effect for “average” record in population

Predictive Model: Predict new individual observations

Different Scientific Goals

Different generalization

Page 21

Theory vs. its manifestation

?

Page 22

Four aspects

1. Theory – Data

2. Causation – Association

3. Retrospective – Prospective

4. Bias – Variance

Page 23

“The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”

Page 24

Point #1

Best explanatory model ≠ Best predictive model

Page 25

Predict ≠ Explain

+ ?

“we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.”

Bell et al., 2008

Page 26

Explain ≠ Predict

The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%

“We are planning to… develop predictive models for bioavailability and bioequivalence”

Lester M. Crawford, 2005
Acting Commissioner of Food & Drugs
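
The bioequivalence rule quoted on this slide is a statement about an average, which a short calculation makes visible. The sketch below is a simplified illustration and not the FDA's actual statistical procedure. It assumes one generic-to-brand ratio per subject, works on the log scale, and uses a normal approximation for the 90% interval. The function names and the ratios are made up for the example.

```python
import math
from statistics import NormalDist, mean, stdev


def ratio_ci_90(ratios):
    """90% CI for the geometric-mean ratio, computed on the log scale.

    Assumption: normal approximation to the mean log-ratio (a t quantile
    would give a slightly wider interval for small samples).
    """
    logs = [math.log(r) for r in ratios]
    center = mean(logs)
    half_width = NormalDist().inv_cdf(0.95) * stdev(logs) / math.sqrt(len(logs))
    return math.exp(center - half_width), math.exp(center + half_width)


def within_limits(ci, lower=0.80, upper=1.25):
    """True if the whole interval lies inside the 80%-125% limits."""
    return lower <= ci[0] and ci[1] <= upper


# Hypothetical generic-to-brand ratios for 24 subjects (made-up numbers).
ratios = [0.6, 1.5, 0.7, 1.4] * 6
ci = ratio_ci_90(ratios)

print(within_limits(ci))                        # criterion on the average ratio
print(sum(0.80 <= r <= 1.25 for r in ratios))   # subjects individually inside the limits
```

With these made-up numbers the interval for the average ratio falls inside the limits even though no single subject's ratio does, which is the gap between a population-level conclusion and a prediction for an individual.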

Page 27

“For a long time, we thought that Tamoxifen was roughly 80% effective for breast cancer patients.

But now we know much more: we know that it’s 100% effective in 70%-80% of the patients, and ineffective in the rest.”

Page 28

Goal Definition

Design & Collection

Data Preparation

EDA

Variables?

Methods?

Evaluation, Validation & Model Selection

Model Use & Reporting

Page 29

Study design & data collection

Hierarchical data

Observational or experiment?

Primary or secondary data?

Instrument (reliability + validity vs. measurement accuracy)

How much data?

How to sample?

Page 30

Data Preprocessing

reduced-feature models

missing

partitioning
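
Partitioning can be sketched in a few lines. This is a minimal example, assuming a simple random split into a training set and a holdout set. The function name `partition`, the 30% holdout fraction, and the seed are illustrative choices, not taken from the talk.

```python
import random


def partition(records, holdout_fraction=0.3, seed=1):
    """Randomly split records into (training, holdout) lists."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


training, holdout = partition(range(10))
print(len(training), len(holdout))  # prints "7 3"
```

The model is fitted on `training` only; `holdout` is set aside to measure how well it predicts records it has not seen.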

Page 31

Data exploration, visualization, reduction

PCA

Factor Analysis (interpretable)

Dimension Reduction (fast, small)

Page 32

Which Variables?

Multicollinearity?

causation vs. associations

endogeneity vs. ex-post availability

A, B, A*B?

Page 33

Methods / Models

Blackbox / interpretable

Mapping to theory

ensembles

Shrinkage models

variance vs. bias
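
The link between shrinkage models and the variance/bias trade-off can be shown with a small simulation. This is a sketch under stated assumptions: a single predictor, no intercept, a fixed design, and a ridge-style penalty. The constants `TRUE_SLOPE` and `PENALTY` and the function `slopes` are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean, pvariance

TRUE_SLOPE = 2.0
PENALTY = 5.0                      # hypothetical ridge penalty
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]   # fixed design, so sum(x^2) = 10


def slopes(ys, penalty):
    """No-intercept least-squares slope and its ridge-shrunk counterpart."""
    sxy = sum(x * y for x, y in zip(XS, ys))
    sxx = sum(x * x for x in XS)
    return sxy / sxx, sxy / (sxx + penalty)


rng = random.Random(42)
ols, ridge = [], []
for _ in range(2000):              # many simulated training samples
    ys = [TRUE_SLOPE * x + rng.gauss(0, 3.0) for x in XS]
    b_ols, b_ridge = slopes(ys, PENALTY)
    ols.append(b_ols)
    ridge.append(b_ridge)

for name, est in (("least squares", ols), ("shrinkage", ridge)):
    print(f"{name}: bias = {mean(est) - TRUE_SLOPE:+.2f}, variance = {pvariance(est):.2f}")
```

The shrunken estimate is biased toward zero but varies less from sample to sample. A modeler who wants an unbiased estimate of the effect would prefer the first; a modeler who wants low prediction error may accept the bias in exchange for the lower variance.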

Page 34

Evaluation, Validation & Model Selection

Training data
Empirical model
Holdout data

Predictive power
Over-fitting analysis

Theoretical model
Empirical model
Data

Validation
Model fit ≠ Explanatory power
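
An over-fitting analysis compares error on the training data with error on the holdout data. The sketch below assumes a k-nearest-neighbor regression on simulated data; the model choice, the function names, and the data are illustrative assumptions rather than anything prescribed in the talk.

```python
import math
import random


def knn_predict(train, x, k):
    """Average y of the k training points whose x is closest to the query x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def rmse(train, data, k):
    """Root mean squared error of k-NN predictions on data."""
    squared = [(y - knn_predict(train, x, k)) ** 2 for x, y in data]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(7)  # simulated data, for illustration only
xs = [rng.uniform(0, 6) for _ in range(150)]
points = [(x, math.sin(x) + rng.gauss(0, 0.5)) for x in xs]
training, holdout = points[:100], points[100:]

for k in (1, 10):
    print(f"k={k}: training RMSE = {rmse(training, training, k):.2f}, "
          f"holdout RMSE = {rmse(training, holdout, k):.2f}")
```

With k=1 the training error is zero by construction, because every training record is its own nearest neighbor. Only the holdout error shows how the model does on new records, and a large gap between the two is the sign of over-fitting.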

Page 35

Inference

Model Use: Industry

Identify causal factors

generate predictions for new data

Predictive performance

Over-fitting analysis

Null hypothesis

Naïve/baseline
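
One reading of this slide is that a naïve baseline plays, for prediction, a role similar to the one the null hypothesis plays for inference: it is the benchmark a model has to beat. A minimal sketch, assuming the naïve rule "always predict the training mean"; all numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical numbers: training outcomes, holdout outcomes, model predictions.
train_y = [10.0, 12.0, 14.0, 16.0]
holdout_y = [11.0, 15.0, 13.0]
model_pred = [11.5, 14.0, 13.5]

# Naive benchmark: predict the training mean for every holdout record.
naive_pred = [sum(train_y) / len(train_y)] * len(holdout_y)

model_rmse = rmse(holdout_y, model_pred)
naive_rmse = rmse(holdout_y, naive_pred)
print(f"model RMSE = {model_rmse:.2f}, naive RMSE = {naive_rmse:.2f}")
# prints "model RMSE = 0.71, naive RMSE = 1.63"
```

A predictive model is only useful to the extent that its holdout error is clearly below that of the naïve rule.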

Page 36

Inference

Model Use (Science)

test causal theory

generate new theory
develop measures
compare theories
improve theory
assess relevance
evaluate predictability

Predictive performance

Over-fitting analysis

Null hypothesis

Naïve/baseline

Page 37

Point #2

Explanatory Power ≠ Predictive Power

Cannot infer one from the other

Page 38

Performance Metrics

out-of-sample
type I, II errors
goodness-of-fit
p-values
over-fitting
costs
prediction accuracy
interpretation
Training vs. holdout
R²

Page 39

Explanatory Power

Predictive Power

Page 40

The predictive power of an explanatory model has important scientific value

Relevance, reality check, predictability

Page 41

Current state in academia (social sciences and management)

“While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.”

Helmer & Rescher, 1959

Distinction blurred

Unfamiliarity with predictive modeling/assessment

Prediction underappreciated

Page 42

State-of-the-art in industry

Distinction blurred

Prediction over-appreciated

“Big Data” synonymous with prediction

Page 43

How does this impact scientific research?

Page 44

How does this impact organizations’ actions?

…and our lives?

Page 45

Will the customer pay?

What causes non-payment?

Page 46

Explain → Predict

Predict → Potential explanations

Page 47

Shmueli (2010) “To Explain or To Predict?”, Statistical Science

Shmueli & Koppius (2011) “Predictive Analytics in IS Research”, MISQ