The Bigger Data Story


the BIGger data story

Ben Keller [@vinegarbin, bjkeller.github.io, linkedin.com/in/bjkeller]

STEAM Vent, 8 January 2015

Creative Commons Attribution-ShareAlike 4.0 International License

prelude

Given

$$\coprod_{i=1}^{n} w_i\Lambda \xrightarrow{f} \coprod_{i=1}^{m} v_i\Lambda \qquad \text{(input)}$$

where $\Lambda = K\Gamma/I$, compute

$$\coprod_{i=1}^{l} x_i\Lambda \xrightarrow{g} \coprod_{i=1}^{n} w_i\Lambda \qquad \text{(output)}$$

such that $\operatorname{Im} g = \ker f$.

an algorithmic problem: f is the input, g is the output
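In the simplest special case, where the graph Γ has a single vertex and no edges and I = 0, Λ = KΓ/I is just the field K, each w_iΛ and v_iΛ is a copy of K, f is an m × n matrix, and the problem becomes: find a matrix g whose columns form a basis of ker f. The sketch below handles only that special case, assuming K is the rationals (exact arithmetic via `fractions.Fraction`) and using Gaussian elimination; the names `kernel_basis` and `kernel_map` are hypothetical, and the general path-algebra case is considerably harder than this.

```python
from fractions import Fraction


def kernel_basis(f):
    """Return vectors spanning ker f, for f an m x n matrix given as m rows (m >= 1)."""
    rows = [[Fraction(x) for x in row] for row in f]
    m, n = len(rows), len(rows[0])
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n):
        if r == m:
            break  # every row already has a pivot; remaining columns are free
        pivot = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]  # scale the pivot to 1
        for i in range(m):  # clear column c in every other row (reduced row echelon form)
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(n) if c not in pivot_cols):
        # Set this free variable to 1, the other free variables to 0,
        # and solve for the pivot variables.
        v = [Fraction(0)] * n
        v[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivot_cols):
            v[pc] = -rows[row_idx][fc]
        basis.append(v)
    return basis


def kernel_map(f):
    """Return g as an n x l matrix whose columns are a basis of ker f, so Im g = ker f."""
    n = len(f[0])
    basis = kernel_basis(f)
    return [[v[i] for v in basis] for i in range(n)]
```

For example, `kernel_map([[1, 1, 0], [0, 0, 1]])` gives the single column (-1, 1, 0), so here l = 1.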

a (directed) graph

[figure: a directed graph with vertices 1, 2, 3 and edges (arcs) labeled a, b, c, d]

graphs

graphs are used to represent relationships

in our algebra, the graph represents which multiplications work (ab, abc, ad, aba, …) while others don’t (ba = bd = cb = … = 0)
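A minimal sketch of how a directed graph decides which products survive, assuming the path algebra KΓ with no relations (I = 0) and reading products left to right, so ab means "follow a, then b" (some texts use the opposite order). Two paths multiply to their concatenation when the first ends where the second starts, and to 0 otherwise. The graph here is hypothetical, not the one in the figure, and `multiply` is an illustrative name.

```python
from typing import Optional

# Hypothetical directed graph (NOT the one in the slide's figure):
# each edge name maps to (source vertex, target vertex).
EDGES = {
    "a": (1, 2),
    "b": (2, 3),
    "c": (3, 1),
    "d": (2, 2),  # a loop at vertex 2
}


def multiply(p: str, q: str, edges: dict = EDGES) -> Optional[str]:
    """Multiply two non-empty paths written as strings of edge names.

    Returns the concatenated path if p ends at the vertex where q starts,
    and None (standing for the product 0) otherwise.
    """
    end_of_p = edges[p[-1]][1]   # target of p's last edge
    start_of_q = edges[q[0]][0]  # source of q's first edge
    return p + q if end_of_p == start_of_q else None
```

With this graph, `multiply("a", "b")` gives `"ab"` while `multiply("b", "a")` gives `None`, since b ends at vertex 3 and a starts at vertex 1. Relations in I would send further products to 0; this sketch ignores them.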

act I

recommender systems

[figure: graph linking users Abby, Brian, Charles, David to the items they like: apples, bananas, cherries, doughnuts, eggs]

recommender systems tell us what we might like, based on similarities and on what others have liked

we can represent the data as a graph

how do the algorithms work with respect to the graph?

recommender graphs

[figure: Brian and Charles linked to cherries and apples]

model the similarity of items by the likes users share


use shared likes to construct new edges between items, weighted by the number of shared users

[figure: a new weighted edge between cherries and apples]

similarity graph


giving a new graph that represents similarity between items

[figure: similarity graph over apples, bananas, cherries, doughnuts, eggs]
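A minimal sketch of the similarity-graph construction, assuming likes are stored as a mapping from user to a set of items. The users and their likes are made up for illustration (the figure's data is not recoverable), and `similarity_graph` is a hypothetical name. Each pair of items liked by the same user gains one unit of weight, so the final weight is the number of shared users.

```python
from collections import Counter
from itertools import combinations

# Hypothetical likes: user -> set of liked items (made up for illustration).
LIKES = {
    "u1": {"apples", "bananas"},
    "u2": {"apples", "cherries"},
    "u3": {"apples", "cherries", "doughnuts"},
    "u4": {"bananas", "eggs"},
}


def similarity_graph(likes):
    """Build the item similarity graph from a user -> liked-items mapping.

    Each key is an unordered item pair (stored in alphabetical order);
    its value is the number of users who like both items.
    """
    weights = Counter()
    for items in likes.values():
        # Every pair of items liked by the same user gains one unit of weight.
        for pair in combinations(sorted(items), 2):
            weights[pair] += 1
    return weights
```

With this sample data, apples and cherries are joined by an edge of weight 2 (liked together by u2 and u3); every other edge has weight 1.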

making recommendations

with the similarity graph, combined with the graph of “likes”

[figure: the similarity graph (apples, bananas, cherries, doughnuts, eggs) combined with the graph of Abby’s “likes” (apples, bananas)]

recommend items similar to those a user likes, and make a list by ranking them by weight

[figure: ranked recommendations for Abby]
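A minimal sketch of the ranking step, assuming a candidate item's score is the sum of the similarity weights linking it to items the user already likes; that scoring rule is one simple choice, not necessarily the one in the talk. The weights are hypothetical sample values and `recommend` is an illustrative name.

```python
from collections import Counter

# Hypothetical item-item similarity weights (pair -> number of shared users).
SIMILARITY = {
    ("apples", "cherries"): 2,
    ("apples", "bananas"): 1,
    ("apples", "doughnuts"): 1,
    ("cherries", "doughnuts"): 1,
    ("bananas", "eggs"): 1,
}


def recommend(user_likes, similarity):
    """Rank items the user does not yet like by total weight to items they do like."""
    scores = Counter()
    for (x, y), w in similarity.items():
        # Follow an edge only from a liked item to a not-yet-liked item.
        if x in user_likes and y not in user_likes:
            scores[y] += w
        elif y in user_likes and x not in user_likes:
            scores[x] += w
    # Highest total weight first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For a user who likes apples and bananas, `recommend({"apples", "bananas"}, SIMILARITY)` returns `[("cherries", 2), ("doughnuts", 1), ("eggs", 1)]`.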

interlude

genetic disease

• causal factors of disease are inherited

• assumed to manifest themselves as variations of the genome

• may combine in complex ways

act II

a genetic disease question

[figure: paired regions on chr A and chr B]

we have paired regions of the genome where variations co-occur in bipolar disorder patients

how are these regions related by biology?

[figure: the genes in the paired regions of chr A and chr B, linked to the “biology” of those genes]

a familiar graph

• model the biological factors of genes by words found in descriptions of what each gene does

• this gives us a graph similar to the starting graph for recommenders

• form a similarity graph only between genes in the regions

[figure: local connections between genes (CDKN2A/B, PPARG, HHEX, TCF7L2, MTHFR, TNF, NURR1, FOS), labeled with shared words such as "mortality", "g1", "repression", "ethanol", "intake", "consumption", "gene-environment interaction", "cocaine", "dopamine signalling"]

look at local connections between genes

the words genes share can help find explanations
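A minimal sketch of the same trick applied to genes, assuming each gene is described by a set of words, with words playing the role that users played in the recommender graph. The genes, their words, and the region membership are all hypothetical, as is the name `gene_similarity`. The shared words are kept on each edge, since they are what can suggest an explanation.

```python
from itertools import combinations

# Hypothetical gene -> description-word data, for illustration only.
GENE_WORDS = {
    "GENE_A": {"dopamine", "signalling", "cocaine"},
    "GENE_B": {"dopamine", "signalling"},
    "GENE_C": {"ethanol", "intake"},
    "GENE_D": {"ethanol", "consumption", "intake"},
    "GENE_E": {"repression"},
}

# Hypothetical set of genes lying in the paired regions on chr A and chr B.
REGION_GENES = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}


def gene_similarity(gene_words, region_genes):
    """Map each pair of in-region genes to the set of description words they share.

    The edge weight is len(shared words); pairs sharing no words get no edge.
    """
    edges = {}
    genes = sorted(g for g in gene_words if g in region_genes)
    for g1, g2 in combinations(genes, 2):
        shared = gene_words[g1] & gene_words[g2]
        if shared:
            edges[(g1, g2)] = shared
    return edges
```

With this data, GENE_A and GENE_B connect through "dopamine" and "signalling", GENE_C and GENE_D through "ethanol" and "intake", and GENE_E is left out because it is not in the regions.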

interlude

user's ways of thinking

how the user thinks about tasks and the biology of disease

tool support

how tools allow manipulations and represent information

[figure: the user's tasks and knowledge alongside the tool's manipulations and representation]

the user has to manage the relationships in her head

cognitive engineering

Interdisciplinary approach to supporting user performance in complex systems

act III

a model of biology

[figure: model with parts labeled A, B, C, D]

groups of genetic variations affect how cells/organs work in particular ways, resulting in shared symptoms of a certain disease

and there are other things we don’t know about

data are observations of measures in the cells/organs of individuals

with our graph trick, we can take data relating people to observations and create a graph showing which observations are shared by groups of people

and we can do this for each kind of observation
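A minimal sketch of running the graph trick once per kind of observation, assuming the data are stored as kind -> (person -> set of observations). The kinds, people, and observations are hypothetical, and `shared_observation_graph` is an illustrative name. Each resulting graph is built independently of the others.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: for each kind of observation, the observations
# recorded for each (anonymous) person. Illustrative only.
DATA = {
    "variants": {
        "p1": {"v1", "v2"},
        "p2": {"v1", "v2", "v3"},
        "p3": {"v3"},
    },
    "expression": {
        "p1": {"e1"},
        "p2": {"e1", "e2"},
        "p3": {"e2"},
    },
}


def shared_observation_graph(person_obs):
    """Weight each pair of observations by the number of people who have both."""
    weights = Counter()
    for obs in person_obs.values():
        for pair in combinations(sorted(obs), 2):
            weights[pair] += 1
    return weights


# One graph per kind of observation, each built the same way.
graphs = {kind: shared_observation_graph(d) for kind, d in DATA.items()}
```

Nothing in these graphs links the "variants" graph to the "expression" graph; relating them is left to the scientist.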

how do these relate? the scientist is left to decide

but…

the scientist wants to know how groups here affect values below, and how values here are affected by groups of values above


but she cannot answer this easily with the tools we’ve chosen


we will ultimately solve this problem by supporting the scientist in her reasoning, not by choosing our favorite tool

finale

data science

math/statistics, computation, and human reasoning
