the BIGger data story
Ben Keller [@vinegarbin bjkeller.github.io linkedin.com/in/bjkeller]
STEAM Vent, 8 January 2015
Creative Commons Attribution-ShareAlike 4.0 International License
prelude
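Given
$$\coprod_{i=1}^{n} w_i\Lambda \xrightarrow{\;f\;} \coprod_{i=1}^{m} v_i\Lambda$$
where $\Lambda = K\Gamma/I$, compute
$$\coprod_{i=1}^{l} x_i\Lambda \xrightarrow{\;g\;} \coprod_{i=1}^{n} w_i\Lambda$$
such that $\operatorname{Im} g = \ker f$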
algorithmic problem
input: the given map f
output: the computed map g, such that Im g = ker f
a (directed) graph
[figure: directed graph with vertices 1, 2, 3 and edges (arcs) a, b, c, d]
graphs
used to represent relationships
in our algebra, the graph represents which multiplications work:
ab, abc, ad, aba, …
while others don’t:
ba = bd = cb = … = 0
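As a rough illustration of how a directed graph decides which products survive, here is a minimal sketch. The arrow endpoints in `ARROWS` are hypothetical and are not read off the slide's figure, so the sketch does not try to reproduce every product listed above. It also ignores the ideal I, which may set further paths to zero. The names `is_nonzero_path` and `multiply` are made up for this example.

```python
# Hypothetical quiver: arrow name -> (source vertex, target vertex).
# These endpoints are an assumption for illustration, not taken from the slide.
ARROWS = {
    "a": (1, 2),
    "b": (2, 3),
    "c": (3, 1),
    "d": (2, 3),
}


def is_nonzero_path(word, arrows=ARROWS):
    """Return True if consecutive arrows in `word` can be composed."""
    if not word or any(ch not in arrows for ch in word):
        return False
    for first, second in zip(word, word[1:]):
        # A product survives only when `first` ends where `second` starts.
        if arrows[first][1] != arrows[second][0]:
            return False
    return True


def multiply(p, q, arrows=ARROWS):
    """Concatenate two paths; None stands in for the zero element."""
    pq = p + q
    return pq if is_nonzero_path(pq, arrows) else None
```

With these assumed endpoints, `multiply("a", "b")` gives the path `"ab"`, while `multiply("b", "a")` gives `None` because b does not end where a starts.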
act I
recommender systems
tell us what we might like based on similarities and what others have liked
can represent the data as a graph
[figure: graph relating users (Abby, Brian, Charles, David) to items (apples, bananas, cherries, doughnuts, eggs)]
how do the algorithms work with respect to the graph?
recommender graphs
[figure: users Brian and Charles with items apples and cherries]
model similarity of items by shared likes of users
to construct new edges weighted by number of shared users
[figure: new edge between apples and cherries]
similarity graph
giving a new graph representing similarity between items
[figure: similarity graph on apples, bananas, cherries, doughnuts, eggs]
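The construction of the similarity graph can be sketched as follows. The `likes` data is hypothetical, since the slides do not list who likes what, and `similarity_graph` is a name made up for this example. Each pair of items liked by the same user gets an edge, and the edge weight counts the users who share that pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "likes" data; the slides do not say who likes what.
likes = {
    "Abby": {"apples", "bananas"},
    "Brian": {"apples", "cherries"},
    "Charles": {"apples", "cherries", "doughnuts"},
    "David": {"bananas", "eggs"},
}


def similarity_graph(likes):
    """Map each unordered item pair to the number of users who like both."""
    weights = Counter()
    for items in likes.values():
        # Sorting gives every pair a canonical (alphabetical) orientation.
        for pair in combinations(sorted(items), 2):
            weights[pair] += 1
    return weights


sim = similarity_graph(likes)
```

In this made-up data, apples and cherries are both liked by Brian and Charles, so that edge has weight 2.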
making recommendations
combining the similarity graph with the graph of “likes”
[figure: similarity graph on apples, bananas, cherries, doughnuts, eggs, joined with the “likes” graph for Abby, apples, bananas]
recommend items similar to those a user likes
make a list by ranking them by weight
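A minimal sketch of the ranking step, under stated assumptions: the similarity weights and Abby's likes are hypothetical, and scoring a candidate by summing the weights of its edges to liked items is one plausible reading of "ranking them by weight", not necessarily the scheme behind the slides. The name `recommend` is made up for this example.

```python
from collections import defaultdict

# Hypothetical similarity edges: (item, item) -> number of shared users.
similarity = {
    ("apples", "cherries"): 2,
    ("apples", "doughnuts"): 1,
    ("bananas", "eggs"): 1,
    ("cherries", "doughnuts"): 1,
}
# Hypothetical likes for one user.
abby_likes = {"apples", "bananas"}


def recommend(liked, similarity):
    """Rank items not yet liked by total similarity weight to liked items."""
    scores = defaultdict(int)
    for (x, y), weight in similarity.items():
        # Follow each similarity edge from a liked item to an unliked one.
        if x in liked and y not in liked:
            scores[y] += weight
        elif y in liked and x not in liked:
            scores[x] += weight
    # Highest total weight first; ties broken alphabetically for stability.
    return sorted(scores, key=lambda item: (-scores[item], item))


recommendations = recommend(abby_likes, similarity)
```

Items the user already likes are excluded from the list, and edges between two unliked items contribute nothing.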
interlude
genetic disease
• causal factors of disease are inherited
• assumed to manifest themselves as variations of the genome
• may combine in complex ways
act II
a genetic disease question
have paired regions of the genome (on chr A and chr B) where variations co-occur in bipolar disorder patients
how are these regions related by biology?
[figure: regions on chr A and chr B, linked to the genes in those regions, linked to the “biology” of those genes]
a familiar graph
• model biological factors of genes by words found in descriptions of what each gene does
• gives us a graph similar to the starting graph for recommenders
• form a similarity graph only between genes in the regions
Look at local connections between genes
words can help find explanations
[figure: gene similarity graphs labelled with shared words: CDKN2A/B, PPARG, HHEX, TCF7L2 with "mortality", "g1", "repression"; MTHFR, TNF with "ethanol", "intake", "consumption" (gene-environment interaction); NURR1, FOS with "cocaine" (dopamine signalling)]
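The same graph trick applied to genes and words can be sketched as below. Everything here is hypothetical: the gene names are placeholders, the word sets are not real annotations, and linking only gene pairs that span the two paired regions is one reading of "only between genes in the regions". Keeping the shared words on each edge, rather than just a count, is what lets the words suggest explanations.

```python
from itertools import product

# Hypothetical placeholder genes and description words; not real annotations.
gene_words = {
    "geneA1": {"repression", "mortality"},
    "geneA2": {"intake"},
    "geneB1": {"repression", "signalling"},
    "geneB2": {"intake", "consumption"},
}
# Hypothetical assignment of genes to the two paired regions.
region_a = ["geneA1", "geneA2"]
region_b = ["geneB1", "geneB2"]


def cross_region_edges(region_a, region_b, gene_words):
    """Link genes across the two regions, labelling edges with shared words."""
    edges = {}
    for g, h in product(region_a, region_b):
        shared = gene_words[g] & gene_words[h]
        if shared:
            # Only gene pairs with at least one common word get an edge.
            edges[(g, h)] = shared
    return edges


edges = cross_region_edges(region_a, region_b, gene_words)
```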
interlude
user's ways of thinking
how the user thinks about:
• tasks
• biology of disease
tool support
how tools:
• allow manipulations
• represent information
[figure: manipulations, representation, tasks, knowledge]
user has to manage relationships in her head
cognitive engineering
Interdisciplinary approach to supporting user performance in complex systems
act III
a model of biology
[figure: diagram with nodes A, B, C, D]
genetic variations affect how cells/organs work, resulting in symptoms of disease
more precisely: groups of genetic variations affect how certain cells/organs work in particular ways, resulting in shared symptoms of disease
and there are other things we don’t know about
data are observations of measures in the cells/organs of individuals
with our graph trick we can take data relating people to observations
and create a graph showing which observations are shared by groups of people
and we can do it for each kind of observation
but…
scientist left to decide: how do these relate?
scientist wants to know
how groups here affect values below
and how values here are affected by groups of values above
but cannot answer easily with the tools we’ve chosen
we will ultimately solve this problem by supporting the scientist in her reasoning
not by choosing our favorite tool
finale
data science
math/statistics
computation
human reasoning
This work is licensed under a Creative Commons Attribution-ShareAlike
4.0 International License.