the BIGger data story. Ben Keller [@vinegarbin, bjkeller.github.io, linkedin.com/in/bjkeller]. STEAM Vent, 8 January 2015. Creative Commons Attribution-ShareAlike 4.0 International License.

The Bigger Data Story


Page 1: The Bigger Data Story

the BIGger data story

Ben Keller [@vinegarbin, bjkeller.github.io, linkedin.com/in/bjkeller]

STEAM Vent, 8 January 2015

Creative Commons Attribution-ShareAlike 4.0 International License

Page 2: The Bigger Data Story

prelude

Page 3: The Bigger Data Story

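Given

$$f\colon \coprod_{i=1}^{n} w_i\Lambda \longrightarrow \coprod_{i=1}^{m} v_i\Lambda, \qquad \text{where } \Lambda = K\Gamma/I,$$

compute

$$g\colon \coprod_{i=1}^{l} x_i\Lambda \longrightarrow \coprod_{i=1}^{n} w_i\Lambda \qquad \text{such that } \operatorname{Im} g = \ker f.$$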

Page 4: The Bigger Data Story

algorithmic problem

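Given (input)

$$f\colon \coprod_{i=1}^{n} w_i\Lambda \longrightarrow \coprod_{i=1}^{m} v_i\Lambda, \qquad \text{where } \Lambda = K\Gamma/I,$$

compute (output)

$$g\colon \coprod_{i=1}^{l} x_i\Lambda \longrightarrow \coprod_{i=1}^{n} w_i\Lambda \qquad \text{such that } \operatorname{Im} g = \ker f.$$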

Page 5: The Bigger Data Story

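Given

$$f\colon \coprod_{i=1}^{n} w_i\Lambda \longrightarrow \coprod_{i=1}^{m} v_i\Lambda, \qquad \text{where } \Lambda = K\Gamma/I \text{ and } \Gamma \text{ is a graph},$$

compute

$$g\colon \coprod_{i=1}^{l} x_i\Lambda \longrightarrow \coprod_{i=1}^{n} w_i\Lambda \qquad \text{such that } \operatorname{Im} g = \ker f.$$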

Page 6: The Bigger Data Story

a (directed) graph

[figure: directed graph with vertices 1, 2, 3 and edges a, b, c, d]

Page 7: The Bigger Data Story

a (directed) graph

vertices: 1, 2, 3

[figure: directed graph with vertices 1, 2, 3 and edges a, b, c, d]

Page 8: The Bigger Data Story

a (directed) graph

an edge (arc)

[figure: directed graph with vertices 1, 2, 3 and edges a, b, c, d, with one edge pointed out]

Page 9: The Bigger Data Story

graphs

[figure: directed graph with vertices 1, 2, 3 and edges a, b, c, d]

used to represent relationships

in our algebra, represents which multiplications work:

ab, abc, ad, aba, …

while others don’t:

ba = bd = cb = … = 0
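A minimal sketch of how a directed graph can decide which products of edges are nonzero. The edge list is hypothetical (the slide's figure does not survive in this text), and the names `EDGES` and `multiply` are illustrative. It assumes paths are read left to right, and it only models products that vanish because edges fail to line up head to tail, not any further relations imposed on the algebra.

```python
# Hypothetical directed graph: edge name -> (source vertex, target vertex).
# These edges are illustrative and are NOT the graph from the slide.
EDGES = {
    "p": (1, 2),
    "q": (2, 3),
    "r": (3, 1),
    "s": (2, 2),
}


def multiply(path_a, path_b):
    """Concatenate two paths (tuples of edge names).

    Returns the combined path, or None to stand for the zero product
    when path_a does not end where path_b starts.
    """
    if not path_a or not path_b:
        raise ValueError("paths must contain at least one edge")
    end_of_a = EDGES[path_a[-1]][1]
    start_of_b = EDGES[path_b[0]][0]
    if end_of_a != start_of_b:
        return None  # edges do not line up, so the product is zero
    return path_a + path_b


pq = multiply(("p",), ("q",))  # p ends at vertex 2, q starts at vertex 2
qp = multiply(("q",), ("p",))  # q ends at vertex 3, p starts at vertex 1
```

Here `pq` is the path `("p", "q")`, while `qp` is `None`, mirroring the split between multiplications that work and those that give zero.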

Page 10: The Bigger Data Story

act I

Page 11: The Bigger Data Story

recommender systems

tell us what we might like based on similarities and what others have liked

can represent data as a graph

[figure: graph linking users Abby, Brian, Charles, David to items apples, bananas, cherries, doughnuts, eggs]

how do the algorithms work with respect to the graph?

Page 12: The Bigger Data Story

recommender graphs

[figure: users Brian and Charles; items cherries and apples]

model similarity of items by shared likes of users

Page 13: The Bigger Data Story

recommender graphs

[figure: users Brian and Charles; items cherries and apples]

model similarity of items by shared likes of users

to construct new edges weighted by number of shared users

[figure: new edge between cherries and apples]

Page 14: The Bigger Data Story

similarity graph

[figure: users Brian and Charles; items cherries and apples]

model similarity of items by shared likes of users

to construct new edges weighted by number of shared users

[figure: new edge between cherries and apples]

giving a new graph representing similarity between items

[figure: similarity graph over apples, bananas, cherries, doughnuts, eggs]
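The construction above can be sketched as a co-occurrence count: for every pair of items, count the users who like both. The `LIKES` data below is hypothetical (the slides' figures are not recoverable from this text), and `similarity_graph` is an illustrative name rather than anything from the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "likes" graph: user -> set of liked items.
# The user and item names echo the slides; the likes themselves are assumed.
LIKES = {
    "Abby": {"apples", "bananas"},
    "Brian": {"apples", "cherries"},
    "Charles": {"apples", "cherries", "doughnuts"},
    "David": {"doughnuts", "eggs"},
}


def similarity_graph(likes):
    """Map each unordered item pair to the number of users who like both."""
    weights = Counter()
    for items in likes.values():
        # Every pair of items liked by the same user gains one unit of weight.
        for a, b in combinations(sorted(items), 2):
            weights[frozenset((a, b))] += 1
    return weights


SIMILARITY = similarity_graph(LIKES)
```

With this data the edge between apples and cherries gets weight 2, because both Brian and Charles like both items. Pairs with no shared user never appear as edges.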

Page 15: The Bigger Data Story

making recommendations

combining graph of “likes” with similarity graph

[figure: Abby shown with apples and bananas, alongside the similarity graph over apples, bananas, cherries, doughnuts, eggs]

Page 16: The Bigger Data Story

making recommendations

recommend items similar to those a user likes

make list by ranking them by weight

[figure: Abby and the items apples, bananas, cherries, doughnuts, eggs]
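One way to read the ranking step, as a sketch only: score each item the user has not yet liked by the total weight of its similarity edges to items the user does like, then sort. Summing the weights is an assumption (the slide says only "ranking them by weight"), and the `SIMILARITY` values and the `recommend` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical similarity graph: (item, item) -> number of shared users.
SIMILARITY = {
    ("apples", "bananas"): 1,
    ("apples", "cherries"): 2,
    ("apples", "doughnuts"): 1,
    ("cherries", "doughnuts"): 1,
    ("doughnuts", "eggs"): 1,
}


def recommend(liked, similarity):
    """Rank items the user has not liked by summed similarity to liked items."""
    scores = defaultdict(int)
    for (a, b), weight in similarity.items():
        # Only edges joining a liked item to an unliked item contribute.
        if a in liked and b not in liked:
            scores[b] += weight
        elif b in liked and a not in liked:
            scores[a] += weight
    # Highest total weight first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


abby_recs = recommend({"apples", "bananas"}, SIMILARITY)
```

For a user who likes apples and bananas, this data ranks cherries (weight 2) ahead of doughnuts (weight 1). Eggs are not recommended because they have no edge to a liked item.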

Page 17: The Bigger Data Story

interlude

Page 18: The Bigger Data Story

genetic disease

• causal factors of disease are inherited

• assumed to manifest themselves as variations of the genome

• may combine in complex ways

Page 19: The Bigger Data Story

act II

Page 20: The Bigger Data Story

a genetic disease question

[figure: paired regions on chr A and chr B]

have paired regions of genome where variations co-occur in bipolar disorder patients

how are these regions related by biology?

Page 21: The Bigger Data Story

a genetic disease question

[figure: regions on chr A and chr B, the genes in regions, and the “biology” of genes]

Page 22: The Bigger Data Story

a genetic disease question

[figure: two layers labelled genes and “biology”]

Page 23: The Bigger Data Story

a familiar graph

• model biological factors of genes by words found in descriptions of what each gene does

• gives us a graph similar to starting graph for recommenders

• form similarity graph only between genes in regions
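A rough sketch of the gene version of the same trick: link two genes when their descriptions share words, and keep the shared words on the edge. The gene names, words, and region lists are placeholders, not real annotations. Restricting edges to pairs with one gene in each paired region is one reading of "only between genes in regions" and is an assumption here.

```python
from itertools import product

# Hypothetical gene -> words found in its description (placeholder data).
GENE_WORDS = {
    "geneA1": {"insulin", "signalling"},
    "geneA2": {"transport"},
    "geneB1": {"signalling", "insulin", "growth"},
    "geneB2": {"repair"},
}
REGION_A = ["geneA1", "geneA2"]  # genes in the region on chr A (assumed)
REGION_B = ["geneB1", "geneB2"]  # genes in the paired region on chr B (assumed)


def region_similarity(region_a, region_b, gene_words):
    """Return {(gene_a, gene_b): shared words} for cross-region gene pairs."""
    edges = {}
    for gene_a, gene_b in product(region_a, region_b):
        shared = gene_words[gene_a] & gene_words[gene_b]
        if shared:  # keep only pairs whose descriptions share a word
            edges[(gene_a, gene_b)] = shared
    return edges


GENE_EDGES = region_similarity(REGION_A, REGION_B, GENE_WORDS)
```

Keeping the shared words on each edge, rather than only a count, leaves them available as candidate explanations for why two genes are connected.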


Page 24: The Bigger Data Story

Look at local connections between genes

words can help find explanations

[figure: gene network annotated with words]

CDKN2A/B, PPARG, HHEX, TCF7L2: "mortality", "g1", "repression"

MTHFR, TNF: "ethanol", "intake", "consumption" (gene-environment interaction)

NURR1, FOS: "cocaine" (dopamine signalling)

Page 25: The Bigger Data Story

interlude

Page 26: The Bigger Data Story

user's ways of thinking

how user thinks about:

tasks

biology of disease

Page 27: The Bigger Data Story

tool support

how tools:

allow manipulations

represent information

Page 28: The Bigger Data Story

manipulations

representation

tasks

knowledge

user has to manage relationships in her head

Page 29: The Bigger Data Story

cognitive engineering

Interdisciplinary approach to supporting user performance in complex systems

Page 30: The Bigger Data Story

act III

Page 31: The Bigger Data Story

a model of biology

[figure: model diagram with nodes A, B, C, D]

Page 32: The Bigger Data Story

[figure: model diagram with nodes A, B, C, D]

genetic variations

Page 33: The Bigger Data Story

[figure: model diagram with nodes A, B, C, D]

genetic variations

affect

how cells/organs work

Page 34: The Bigger Data Story

[figure: model diagram with nodes A, B, C, D]

genetic variations

affect

how cells/organs work

resulting in symptoms of disease

Page 35: The Bigger Data Story

[figure: model diagram with nodes A, B, C, D]

genetic variations

affect

how cells/organs work

resulting in symptoms of disease

inserted qualifiers: groups of, in particular ways, shared, certain

and there are other things we don’t know about

Page 36: The Bigger Data Story

[figure: model diagram with nodes A, B, C, D]

data are observations of measures in cells/organs of individuals

Page 37: The Bigger Data Story

with our graph trick we can take data relating people to observations

and create a graph showing which observations are shared by groups of people

Page 38: The Bigger Data Story

and we can do it for each kind of observation

Page 39: The Bigger Data Story

but…

scientist left to decide: how do these relate?

Page 40: The Bigger Data Story

scientist wants to know

how groups here affect values below

and how values here are affected by groups of values above

Page 41: The Bigger Data Story

scientist wants to know

how groups here affect values below

and how values here are affected by groups of values above

but cannot answer easily with the tools we’ve chosen

Page 42: The Bigger Data Story

[figure: model diagram with nodes A, B, C, D]

we will ultimately solve this problem by supporting the scientist in her reasoning

not by choosing our favorite tool

Page 43: The Bigger Data Story

finale

Page 44: The Bigger Data Story

data science

math/statistics

computation

Page 45: The Bigger Data Story

data science

math/statistics

computation

human reasoning

Page 46: The Bigger Data Story

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.