Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Should Care
Galit Shmueli
I’m not a mechanical engineer
• PhD in Statistics, Technion IE&M
• CMU Statistics Dept
• U of Maryland Business School
• Indian School of Business
• National Tsing (“Ching”) Hua U, Inst. Service Science
Prof. Menachem Shmueli (of blessed memory)
1935-1980, Mechanical Engineering, Technion
Research in Data Analytics
‘Entrepreneurial’ statistical & data mining modeling (for today’s problems)
Interdisciplinary Research
Statistical Strategy
• To Explain or To Predict?
• Information Quality
• Data Mining and Causality
What is Behavioral Big Data (BBD)
Special type of Big Data
Behavioral: people’s actions, interactions, self-reported opinions, thoughts, feelings
Human and social aspects: Intentions, deception, emotion, reciprocation, herding,…
When aware of data collection -> modify behavior (legal risks, embarrassment, unwanted solicitation)
BBD vs. Medical Big Data
Medical Big Data:
• Physical measurements
• Data collection timing often set by medical system
• Clinical trials: awareness & vested interest
BBD:
• People’s daily actions, interactions, self-reported feelings, opinions, thoughts
• Data generation timing often chosen by user
• Experiments: users often unaware; goal not always in user’s interest
BBD on Citizens and Customers – old story
• Governments: law enforcement, security, traffic (cameras, sensors)
• Financial Institutions: fraud, loans (IT systems, cameras)
• Telecoms: fraud, infrastructure, marketing (IT systems, mobile)
• Retail Chains: marketing, operations, merchandising (POS systems, video, social, mobile)
• Insurance: usage-based premiums (telematics)
“Old”: cameras, sensors, IT systems (POS, calls,…)
New: GPS, Internet, mobile, social, Things
BBD on Employees
Service Providers: quality control, employee performance
Electronic Performance Monitoring (EPM) systems, web surfing, e-mails sent and received, telephone use, video, location (taxis)
BBD on Citizens, Customers, Employees: Internet!
• BBD now also available to small companies & organizations
• Online platforms have BBD (e-commerce, gaming, search, social networks…)
• Voluntarily entered by users (UGC): personal details, photos, comments, messages, search terms, bids in auctions, likes, payment information, connections with “friends”
• Passive footprints: duration on the website, pages browsed, sequence, referring website, Internet browser, operating system, location, IP address
• BBD now available to individuals: Quantified Self
1. Research Opportunity
2. Understand
3. Collaborate
How does your ME work relate to BBD? To Data Analytics & Social Sci?
Engineering
Social Sciences
Data Analytics
Behavioral Big Data
From theory to practice
More and more human and social activities are moving online
Most companies that have BBD were not created for the purpose of generating BBD
Two important points
Why should mechanical engineers care about BBD?
Technology is advancing in two directions
Fully automated (algorithmic) solutions
Because you are (and should be) involved in designing both!
Micro-level recording of human and social behavior
the most crucial choices about the future of ordinary voters and their children are probably made not by Brussels bureaucrats or Washington lobbyists but by engineers, entrepreneurs, and scientists who are hardly aware of the implications of their decisions, and who certainly don’t represent anyone.
Brief Tour of BBD Research in the Land of Social Science & Business
Research using BBD
Duncan Watts, Microsoft Research (NY):
1. Social science problems are almost always more difficult than they seem
2. The data required to address many problems of interest to social scientists remain difficult to assemble
3. Thorough exploration of complex social problems often requires the complementary application of multiple research traditions
Academic Research Qs using BBD
Causal questions about human and social behavior
examine new phenomena
re-examine old phenomena with better data
Research Methodologies Using BBD
Quasi experiments
Randomized experiments
Observational studies
Survey studies
Natural experiments
Research Communities
Researchers with social science + technical backgrounds
Information Systems
Marketing
Computational Social Science
7 Examples of BBD Studies in Top Journals
Emotional Contagion in Social Networks (Kramer et al. Proceedings of the National Academy of Sciences, 2014)
• Can emotional states be transferred to others via emotional contagion?
• Old question, new data
• Large-scale experiment run by FB, manipulating users’ exposure level to emotional expressions in their Facebook News Feed
Anonymous Browsing in Dating Websites (Bapna et al. Management Science, 2016)
• How does anonymous browsing affect outcomes on dating sites?
• New questions about human behavior due to new technologies
• Large-scale experiment on North American dating website
Identifying Influential and Susceptible Members of Social Networks (Aral and Walker, Science, 2012)
• How do individuals’ attributes modulate peer influence?
• Old question in new context
• Experiment on social news aggregation website where users contribute news articles, discuss them, and rate comments
Consumption in Virtual Worlds (Hinz et al. Info Sys Research, 2015)
• Does conspicuous consumption increase social status?
• Age-old sociology question with new BBD
• Observational BBD from 2 virtual world websites (gaming with social network)
Impact of Online Intermediaries on HIV Transmission (Ghose & Chan MIS Quarterly, 2015)
• Does entry of a major online personals ad website increase HIV prevalence?
• New context
• Natural experiment on Craigslist
Impact of Info Hiding on Crowdfunding (Burtch et al. Management Science, 2016)
• Does peer influence drive information hiding in crowdfunding campaigns, and what is the effect on contributions?
• New online social context
• Observational BBD from large online crowdfunding platform
Forecasting Elections with Non-Representative Polls (Wang et al. International Journal of Forecasting, 2014)
• Can elections be forecast using a non-representative sample?
• Old question, new data
• Survey BBD from Xbox with built-in daily poll
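A minimal sketch of post-stratification, one standard way to correct a non-representative sample: estimate support within each demographic cell, then reweight the cells by their known population shares. The cell names, counts, and population shares below are invented for illustration, and this plain reweighting is a simplification; Wang et al. rely on a model-based variant (multilevel regression and post-stratification).

```python
def raw_estimate(sample):
    """Unweighted share of supporters across all respondents.

    sample maps cell name -> (n_respondents, n_supporters).
    """
    n_total = sum(n for n, _ in sample.values())
    k_total = sum(k for _, k in sample.values())
    return k_total / n_total


def poststratified_estimate(sample, population_share):
    """Reweight per-cell support rates by each cell's population share."""
    total_share = sum(population_share.values())
    estimate = 0.0
    for cell, (n, k) in sample.items():
        cell_rate = k / n
        estimate += cell_rate * population_share[cell] / total_share
    return estimate


# Hypothetical poll in which one cell is heavily over-represented.
sample = {"young_men": (800, 320), "everyone_else": (200, 120)}
population_share = {"young_men": 0.2, "everyone_else": 0.8}

raw = raw_estimate(sample)
adjusted = poststratified_estimate(sample, population_share)
print(f"raw: {raw:.2f}, post-stratified: {adjusted:.2f}")
```

The two estimates diverge because the raw figure is dominated by the over-sampled cell; the correction is only as good as the cells chosen and the population shares assumed.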
ONE WAY MIRRORS IN ONLINE DATING
A Randomized Field Experiment
Ravi Bapna, University of Minnesota
Jui Ramaprasad, McGill University
Galit Shmueli, National Tsing Hua University
Akhmed Umyarov, University of Minnesota
Online Dating
46% of the single population in the US uses online dating to find a partner (Gelles 2011)
Online Dating Website
Non-anonymous Browsing (Default): a profile visit shows the visitor under “Recent visitor”
Anonymous Browsing: a profile visit shows “Recent visitor: NONE”
Research Question (in simple words)
How does anonymous browsing affect user behavior?
… and matching?
Formal Research Question
What is the relative causal effect of social inhibitions on search preferences vs. social inhibitions of contact initiation in dating markets?
Given known gender asymmetries, how does this effect differ for men vs. women?
Randomized Field Experiment on Large Online Dating Website
50,000 users receive gift of anonymous browsing
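A minimal sketch of the randomization step behind a field experiment of this kind: draw the treated users at random from the eligible pool, then compare mean outcomes between arms. The pool size, the seed, and the function names are assumptions made for illustration, not details of the study.

```python
import random


def assign_treatment(user_ids, n_treated, seed=0):
    """Randomly pick n_treated users; returns {user_id: True if treated}."""
    users = list(user_ids)
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    treated = set(rng.sample(users, n_treated))
    return {uid: (uid in treated) for uid in users}


def difference_in_means(outcomes, assignment):
    """Mean outcome of treated users minus mean outcome of control users."""
    treated = [outcomes[u] for u, t in assignment.items() if t]
    control = [outcomes[u] for u, t in assignment.items() if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical pool of 100 users, half of whom receive the treatment.
assignment = assign_treatment(range(100), n_treated=50, seed=1)
print(sum(assignment.values()), "users treated")
```

Because allocation is random, users cannot self-select into the treatment, so a difference in means can be read as a causal effect, subject to the challenges discussed later in the deck.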
Results
Users treated with anonymity:
• become disinhibited: view more profiles, view more same-sex and interracial mates
• get fewer matches: lose ability to leave a weak signal - especially harmful for women!
Role of anonymity and importance of WEAK SIGNAL in online platforms
BBD-based Research: Academia vs. Industry
In Academia
Purpose: Scientific inquiry
Causal Qs are most popular
• Determinants of social phenomena
• Impact studies
Predictive Qs (quite rare)
In Industry
Purpose: evaluate or improve products, service, operations, etc.
Mostly predictive, but also causal
• Netflix Prize: recommender system
• Yahoo!, LinkedIn, FB: personalized news content to increase user engagement/clicks
• Target: pregnancy prediction
• Amazon: pricing, logistics,...
• Government: campaign targeting
Study Types
• Observational
• Experiments
• Surveys
Getting BBD for Research
1. Open Data, Publicly Available Data
• Data.gov
• Twitter
• Kaggle (UCI MR)
• API and web scraping
2. Partnering with a Company
• Both parties interested in research question
• Data purchase
• Personal connections
• Partnership between school and organization (CMU Living Analytics Research Lab)
3. Crowdsourcing
AMT (Amazon Mechanical Turk): replacing student subjects
• Experiment subjects
• Survey respondents
• Cleaning and tagging data
“easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments” [Mason and Suri, 2012]
Using BBD for Research: Human Subjects
Institutional Review Board (IRB): “ethics committee”
University-level committee designated to approve, monitor, and review biomedical and behavioral research involving humans.
• performs benefit-risk analysis for proposed study
• guidelines: Beneficence, Justice, and Respect for persons
• HHS proposes new IRB exemption criteria for publicly available data (or even buying it)
• Council for Big Data, Ethics & Society’s letter: “these criteria for exclusion focus on the status of the dataset… not the content of the dataset nor what will be done with the dataset, which are more accurate criteria for determining the risk profile of the proposed research”
Ethics: Beyond IRB
Facebook experiment [Kramer et al. 2014]: IRB Exemption
“[The work] was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.”
• Expression of Concern by PNAS editor
• Varied response from public, academia, press, ethicists, corporates [Adar 2015]
Big Behavioral Field Experiments: 5 Challenges
Big Behavioral Field Experiments: Challenges
1. Fast-Changing Environment
• Users keep evolving
• Technology changes fast (Netflix)
• Parallel experiments run every day (Amazon)
2. Multidimensional Behavior, Context, Objectives
Computational advertising & content recommendation: 3M’s [Agarwal & Chen 2016]
• Multi-response (clicks, shares, likes,…)
• Multi-context (mobile, email,...)
• Multiple objectives (engagement, revenue,...)
3. Knowledge of Allocation; Gift Effect (≈ clinical trials)
• Allocation knowledge can affect outcome
• Blinding? Placebo?
• Online users discover their allocation via online forums
• “Gift” or preferential treatment can affect outcome
4. Spillover Effects
• Treatment can affect control group (social networks)
• How to randomly assign on a social network?
• Dependence among units (data analysis) [Fienberg, 2015]
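One common response to the spillover question "how to randomly assign on a social network?" is to randomize whole clusters of connected users rather than individuals, so that friends share an arm. The sketch below uses connected components as clusters, which is an assumption chosen for simplicity: real social graphs tend to form one giant component, so in practice some community-detection or graph-partitioning step would take its place. The user names and edges are hypothetical.

```python
import random
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def assign_by_cluster(nodes, edges, seed=0):
    """Flip one coin per cluster; every member gets the same arm."""
    rng = random.Random(seed)
    assignment = {}
    for component in connected_components(nodes, edges):
        treated = rng.random() < 0.5
        for node in component:
            assignment[node] = treated
    return assignment


# Hypothetical friendship graph with three separate groups.
nodes = ["ann", "bob", "cat", "dan", "eve", "fay"]
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
arms = assign_by_cluster(nodes, edges, seed=3)
```

The price of cluster-level assignment is fewer independent units, so the analysis has to account for dependence within clusters.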
BB Field Experiments: More Challenges
5. Ethical and Moral Issues
Ease of running a large scale experiment quickly and at low cost -> danger of harming many people quickly
small scale pilot study?
Experiment platforms: Fair treatment & payment
BB Field Experiments: Even More Challenges
Big Behavioral Quasi-Experiments & Observational Studies: 5 Methodological Issues
Quasi-Experiments and Observational BBD: Methodological Challenges
1. Data Size & Dimension
Scaling of statistical inference: p-values, multiple testing
“Too Big to Fail: Large Samples and the p-Value Problem” (Lin, Lucas & Shmueli ISR 2013)
Data Dredging
• Can detect lots of tiny & complex effects
• Role of theory vs data discovery
Role of Prediction
“Predictive Analytics in Information Systems Research” (Shmueli & Koppius MISQ 2011)
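The large-sample p-value problem can be illustrated with a short calculation: hold a practically negligible effect fixed and let the sample size grow. The sketch assumes a one-sample z-test with known standard deviation, and the effect size and sample sizes are arbitrary illustrative choices.

```python
import math


def two_sided_p_value(effect, sd, n):
    """Two-sided p-value of a z-test of H0: mean = 0, given an observed
    mean of `effect`, known standard deviation `sd`, and sample size `n`."""
    z = effect / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


effect, sd = 0.01, 1.0  # a shift of 1% of a standard deviation
p_values = {n: two_sided_p_value(effect, sd, n) for n in (100, 10_000, 1_000_000)}
for n, p in p_values.items():
    print(f"n = {n:>9,}: p = {p:.3g}")
```

The effect is the same in every row; only the sample size changes, which is why effect sizes and confidence intervals are more informative than p-values at this scale.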
2. Self-Selection Bias
• Users choose treatment/control group
• Scaling of stat/econ methods to big data
“A Tree-Based Approach for Addressing Self-selection in Impact Studies with Big Data” (Yahav, Shmueli & Mani, MIS Quarterly 2016)
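A small worked example of why self-selection distorts a naive comparison. It uses plain stratification on one observed covariate, not the tree-based method of the cited paper, and all group sizes and outcome means are invented: heavy users opt into the treatment more often and also have higher outcomes regardless of treatment.

```python
def naive_effect(groups):
    """Pooled treated mean minus pooled control mean, ignoring strata.

    groups is a list of (stratum, treated, n_users, mean_outcome).
    """
    def pooled_mean(flag):
        rows = [(n, m) for _, t, n, m in groups if t == flag]
        return sum(n * m for n, m in rows) / sum(n for n, _ in rows)

    return pooled_mean(True) - pooled_mean(False)


def stratified_effect(groups):
    """Within-stratum effects averaged with stratum-size weights."""
    strata = {}
    for stratum, treated, n, mean in groups:
        strata.setdefault(stratum, {})[treated] = (n, mean)
    total = sum(n for _, _, n, _ in groups)
    effect = 0.0
    for arms in strata.values():
        size = arms[True][0] + arms[False][0]
        effect += (arms[True][1] - arms[False][1]) * size / total
    return effect


# Hypothetical data: the true effect is +1 in both strata.
groups = [
    ("heavy", True, 80, 11.0), ("heavy", False, 20, 10.0),
    ("light", True, 20, 6.0), ("light", False, 80, 5.0),
]
naive = naive_effect(groups)
adjusted = stratified_effect(groups)
print(f"naive: {naive:.1f}, stratified: {adjusted:.1f}")
```

Stratification only removes bias from covariates that are observed and included; selection on unobserved factors remains.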
More challenges (in search of causal explanations)
3. Simpson’s Paradox
Causal direction reverses when data are disaggregated
Big data: lots of possible breakdowns
“The Forest or the Trees? Tackling Simpson’s Paradox with Classification Trees” (Shmueli & Yahav, 2016)
Does a dataset display a paradox?
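For a single candidate breakdown, the question "does a dataset display a paradox?" reduces to comparing the sign of the aggregate difference with the signs inside each subgroup. The sketch below does that by brute force; it is not the classification-tree approach of the cited paper. The counts are patterned on the classic kidney-stone treatment example and serve only as an illustration.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def shows_simpsons_paradox(table):
    """True if A vs. B has one direction in every subgroup and the
    opposite direction once the subgroups are pooled.

    table maps subgroup -> {"A": (successes, trials), "B": (successes, trials)}.
    """
    subgroup_signs = set()
    totals = {"A": [0, 0], "B": [0, 0]}
    for arms in table.values():
        rate_a = arms["A"][0] / arms["A"][1]
        rate_b = arms["B"][0] / arms["B"][1]
        subgroup_signs.add(sign(rate_a - rate_b))
        for arm in ("A", "B"):
            totals[arm][0] += arms[arm][0]
            totals[arm][1] += arms[arm][1]
    pooled = sign(totals["A"][0] / totals["A"][1] - totals["B"][0] / totals["B"][1])
    if len(subgroup_signs) != 1 or pooled == 0:
        return False
    return subgroup_signs.pop() == -pooled


table = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}
paradox = shows_simpsons_paradox(table)
print("paradox present:", paradox)
```

With big data there are many possible breakdowns, so a check like this would have to be repeated over many candidate splits, which is where a systematic search becomes useful.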
And finally…
5. Data Contaminated by Experiments
+ some of the randomized experiments issues (fast-changing environment, etc.)
Using Observational Data: Ethical & Moral Issues
1. Web data collection by researchers
2. Data protection, data sharing, and reproducible research (Privacy - Netflix)
3. Data tagging by AMT – fair payment (+quality issues)
Large Scale Surveys
Data quality issues at large scale
• duplicate responses
• insincere responses
Online surveys: cheap, easy, fast
• Large pool of available “workers”
• Supplement experimental/observational studies
The promise of paradata
Data on how the survey was accessed/answered (OECD Survey of Adult Skills)
• time stamps of opening invitation email, survey access,…
• duration for answering each question
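A minimal sketch of how paradata might be used to screen for the quality issues listed above: repeated respondent IDs as a proxy for duplicate responses, and very short per-question durations or identical answers throughout as proxies for insincere ones. The record layout, field names, and the two-second threshold are all assumptions made for illustration.

```python
import statistics
from collections import Counter


def flag_responses(responses, min_median_seconds=2.0):
    """Return one list of warning labels per response.

    Each response is a dict with hypothetical keys:
    "respondent_id", "answers" (tuple), "seconds" (per-question durations).
    """
    id_counts = Counter(r["respondent_id"] for r in responses)
    flags = []
    for r in responses:
        reasons = []
        if id_counts[r["respondent_id"]] > 1:
            reasons.append("duplicate")
        if statistics.median(r["seconds"]) < min_median_seconds:
            reasons.append("too_fast")
        if len(r["answers"]) > 1 and len(set(r["answers"])) == 1:
            reasons.append("straight_lining")
        flags.append(reasons)
    return flags


responses = [
    {"respondent_id": "a", "answers": (1, 4, 2), "seconds": [5.0, 7.0, 6.0]},
    {"respondent_id": "b", "answers": (3, 3, 3), "seconds": [0.5, 0.8, 0.6]},
    {"respondent_id": "a", "answers": (1, 4, 2), "seconds": [5.0, 6.0, 4.0]},
]
flags = flag_responses(responses)
```

Flags like these are heuristics for manual review rather than proof of a bad response; thresholds would need tuning per survey.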
The real gorilla in large scale surveys: Generalization
Sampling and non-sampling errors
“The central issue is whether conditional effects in the sample… may be transported to desired target populations. Success depends on compatibility of causal structures in study and target populations, and will require subject matter considerations in each concrete case.” - Keiding and Louis, JRSS 2016
Statistical generalization & scientific generalization
Who do the Turkers represent?
Information Quality: The Potential of Data & Analytics to Generate Knowledge, Kenett & Shmueli, Wiley 2016
“Clarifying the terminology that describes scientific reproducibility” (Kenett & Shmueli, Nature Methods 2015)
Summary
Technical Challenges
• Data access
• Analysis scalability
• Quick-changing environment
BBD = lots of behavioral data
• Who has it?
• How is it analyzed?
• For what purpose?
Methodological Challenges
• Selection bias
• Generalization
• Data contaminated by other experiments
• Spillover effects
• Lack of methodical lifecycle
Legal, Ethical, Moral Challenges
• Privacy violation (Netflix; networks)
• Risks to human subjects
• Company vs. Researcher Objectives
• Gains of company at expense of individuals, communities, societies, & science
Why should mechanical engineers care about BBD?
Technology is advancing in two directions
Fully automated (algorithmic) solutions
Micro-level recording of human and social behavior
Going Forward…
Convergence of Social Sciences & Engineering
Things now collect BBD (intentionally or not)
1. Research Opportunity
2. Understand
3. Collaborate
How does your ME work relate to BBD? To Data Analytics & Social Sci?
Engineering
Social Sciences
Data Analytics
Behavioral Big Data
Galit Shmueli 徐茉莉
Institute of Service Science