RDBMS to Graphs


RDBMS to Graphs: Harnessing the Power of the Graph

September 2015

Ryan Boyd @ryguyrg

Agenda

• Origins of Neo4j
• Benefits of Graphs
• Designing your Graph Model
• Query time!
• Fitting Neo4j into your Enterprise Architecture
• Q&A

Neo Technology Overview

Product
• Neo4j - World’s leading graph database
• 150+ enterprise subscription customers, including over 50 of the Global 2000

Company
• Neo Technology, Creator of Neo4j
• 100 employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding

Neo4j Adoption by Selected Verticals

• Financial Services
• Communications
• Health & Life Sciences
• HR & Recruiting
• Media & Publishing
• Social Web
• Industry & Logistics
• Entertainment
• Consumer Retail
• Information Services
• Business Services

How Customers Use Neo4j

• Network & Data Center
• Master Data Management
• Social
• Recommendations
• Identity & Access
• Search & Discovery
• GEO

“Forrester estimates that over 25% of enterprises will be using graph databases by 2017”

Neo4j Leads the Graph Database Revolution

“Neo4j is the current market leader in graph databases.”

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions after the design of data capture.”

• IT Market Clock for Database Management Systems, 2014. https://www.gartner.com/doc/2852717/it-market-clock-database-management
• TechRadar™: Enterprise DBMS, Q1 2014. http://www.forrester.com/TechRadar+Enterprise+DBMS+Q1+2014/fulltext/-/E-RES106801
• Graph Databases – and Their Potential to Transform How We Capture Interdependencies (Enterprise Management Associates). http://blogs.enterprisemanagement.com/dennisdrogseth/2013/11/06/graph-databasesand-potential-transform-capture-interdependencies/

High Business Value in Data Relationships

Data is increasing in volume…
• New digital processes
• More online transactions
• New social networks
• More devices

… and is getting more connected
Customers, products, processes, devices interact and relate to each other

Using Data Relationships unlocks value
• Real-time recommendations
• Fraud detection
• Master data management
• Network and IT operations
• Identity and access management
• Graph-based search

Early adopters became industry leaders

Relational DBs Can’t Handle Relationships Well

• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market

… making traditional databases inappropriate when data relationships are valuable in real time

Slow development • Poor performance • Low scalability • Hard to maintain

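Modeling as a Graph

The Whiteboard Model Is the Physical Model

[Figure: whiteboard graph with a person node (name: “Dan”, born: May 29, 1970, twitter: “@dan”), a person node (name: “Ann”, born: Dec 5, 1975), a CAR node (brand: “Volvo”, model: “V70”), and a relationship carrying the property since: Jan 10, 2011]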

Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties

[Figure: two PERSON nodes connected by two LOVES relationships and one LIVES WITH relationship]
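The components listed above can be sketched in a few lines of plain Python. This is only an illustration of the property graph model, not how Neo4j stores data. The property values come from the whiteboard slide; the `Node` and `Relationship` classes, the `outgoing` helper, the `DRIVES` relationship type, and the choice of which nodes each relationship connects are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in the graph: optional labels plus name-value properties."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """Relates two nodes by type and direction; may carry its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Property values are from the slide; the wiring between nodes is assumed.
dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
    # DRIVES is a hypothetical type; the slide only shows the "since" property.
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """Return the end nodes of relationships of rel_type that start at node."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

For example, `outgoing(dan, "LOVES")` returns a list containing only the Ann node, because direction matters: the LOVES relationship from Ann to Dan is a separate relationship.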

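Relational Versus Graph Models

[Figure: side-by-side comparison. Relational Model: Person, Person-Friend, and Friend tables holding ANDREAS, DELIA, TOBIAS, and MICA. Graph Model: ANDREAS, TOBIAS, MICA, and DELIA nodes connected by KNOWS relationships.]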

Let’s Model!

The Domain Model
• Customer, Supplier, and Product (Master Data)
• Orders (Activity)

NOT JUST ANY Northwind Example!

Except…

The Quintessential Northwind Example!

(Northwind)-[:TO]->(Graph)

Building the Graph Model

Building Relationships in Graphs

[Figure: Employee and Order connected by a SOLD relationship]

Locate Foreign Keys

(FKs)-[:BECOME]->(Relationships)

Correct Directions

Simple Join Tables Become Relationships

Attributed Join Tables Become Relationships with Properties
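The three mapping rules above can be sketched with plain Python data. This is a minimal illustration, not an import tool: the table rows and column names (`employee_id`, `reports_to`, `order_id`, `product_id`, `quantity`) are hypothetical stand-ins for Northwind data, while the relationship types REPORTS_TO, SOLD, and PRODUCT are the ones used in the queries later in this deck.

```python
# Hypothetical rows from relational tables; column names are illustrative.
employees = [
    {"employee_id": 1, "first_name": "Andrew", "reports_to": None},
    {"employee_id": 2, "first_name": "Steven", "reports_to": 1},
]
orders = [
    {"order_id": 100, "employee_id": 2},
    {"order_id": 101, "employee_id": 2},
]
# Attributed join table: links an order to a product and carries its own data.
order_details = [
    {"order_id": 100, "product_id": 7, "quantity": 3},
    {"order_id": 101, "product_id": 7, "quantity": 1},
]

# Each relationship is (start node, type, end node, properties),
# where a node is identified by a (label, key) pair.

# A self-referencing foreign key becomes a relationship; NULLs produce nothing.
reports_to = [
    (("Employee", e["employee_id"]), "REPORTS_TO", ("Employee", e["reports_to"]), {})
    for e in employees
    if e["reports_to"] is not None
]

# The FK sits on the order row, but the direction is chosen to read naturally:
# (Employee)-[:SOLD]->(Order).
sold = [
    (("Employee", o["employee_id"]), "SOLD", ("Order", o["order_id"]), {})
    for o in orders
]

# An attributed join table becomes a relationship with properties.
product = [
    (("Order", d["order_id"]), "PRODUCT", ("Product", d["product_id"]),
     {"quantity": d["quantity"]})
    for d in order_details
]
```

Here `reports_to` ends up with a single relationship from employee 2 to employee 1, and each PRODUCT relationship keeps the `quantity` column as a property instead of needing a separate join row.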

Working Subset (Today’s Exercise)

Northwind Graph Model

Querying Your Data

Basic Query: Who do people report to?

MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})

[Figure: the pattern annotated. Steven and Andrew are each a NODE with a LABEL (Employee) and a PROPERTY (firstName), connected by a REPORTS_TO relationship.]

Basic Query: Who do people report to?

MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN *


Real Query from a Customer

Find all direct reports and how many people they manage, each up to 3 levels down

(SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.pid AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.directly_manages AS directReportees, 0 AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION SELECT reportee.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager  

JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT depth1Reportees.pid AS directReportees, count(depth2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT T.directReportees AS directReportees, sum(T.count) AS count OUTER UNIONS FROM( SELECT reportee.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees ) AS T GROUP BY directReportees) UNION (SELECT L2Reportees.directly_manages AS directReportees, 0 AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid

JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid

WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") )

Real Query from a Customer

Find all direct reports and how many people they manage, up to 3 levels down

Cypher Query

MATCH (manager)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(manager)
WHERE boss.name = "John Doe"
RETURN manager.name AS Manager, count(report) AS TotalReports


MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
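The variable-length patterns `*0..3` and `*1..3` in the query above can be read as bounded traversals. The following Python sketch mimics that logic on a small in-memory reporting tree. It is an illustration of what the query asks for, not of how Neo4j executes it; the employee names other than "John Doe", and the helper names `within` and `total_reports`, are hypothetical, and the sketch assumes the reporting structure is a tree.

```python
from collections import defaultdict

# Hypothetical (employee, boss) pairs, one per REPORTS_TO relationship.
REPORTS_TO = [
    ("Ann", "John Doe"),
    ("Bob", "John Doe"),
    ("Cy", "Ann"),
    ("Di", "Cy"),
    ("Ed", "Di"),
    ("Flo", "Ed"),
]

direct_reports = defaultdict(list)
for employee, boss in REPORTS_TO:
    direct_reports[boss].append(employee)


def within(person, min_depth, max_depth):
    """People between min_depth and max_depth REPORTS_TO hops below person.

    Assumes a tree, so each person is reached by exactly one path.
    """
    found, frontier = [], [person]
    if min_depth == 0:
        found.append(person)
    for depth in range(1, max_depth + 1):
        # Move one level down the hierarchy.
        frontier = [r for p in frontier for r in direct_reports.get(p, [])]
        if depth >= min_depth:
            found.extend(frontier)
    return found


def total_reports(boss):
    """For each person 0..3 levels under boss, count their reports 1..3 levels down."""
    totals = {}
    for manager in within(boss, 0, 3):
        count = len(within(manager, 1, 3))
        if count:  # like MATCH, people with no reports produce no row
            totals[manager] = count
    return totals
```

With this sample data, `total_reports("John Doe")` yields a count for John Doe, Ann, Cy, and Di, and omits Bob, who manages nobody.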

Express Complex Queries Easily with Cypher


“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”
Volker Pacher, Senior Developer

Who is in Robert’s (direct, upwards) reporting chain?

MATCH p=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN p


Who’s the Big Boss?

MATCH p=(e:Employee)
WHERE NOT (e)-[:REPORTS_TO]->()
RETURN e.firstName as bigBoss


Product Cross-Sell

MATCH (choc:Product {productName: 'Chocolade'})
      <-[:PRODUCT]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:PRODUCT]->(other:Product)
RETURN employee.firstName, other.productName, count(distinct o2) as count
ORDER BY count DESC
LIMIT 5;


High Performance

Cypher vs SQL - Paths

Cypher: Find Size of John’s 5th degree Network

MATCH (u:User)-[:KNOWS*5..5]->(f5)
WHERE u.name = 'John'
RETURN count(f5) as size;

● 100k Users
● 5M Relationships
● Query took 5 min, 30s
● Returns count of 312M

Neo4j config: page-cache = 512m, heap = 4G

Cypher vs SQL - Paths

SQL: Find Size of John’s 5th degree Network

SELECT count(*)
FROM user,
     user_friend as uf1,
     user_friend as uf2,
     user_friend as uf3,
     user_friend as uf4,
     user_friend as uf5,
     user as f5
WHERE user.name='John'
  AND user.id = uf1.user_1
  AND uf1.user_2 = uf2.user_1
  AND uf2.user_2 = uf3.user_1
  AND uf3.user_2 = uf4.user_1
  AND uf4.user_2 = uf5.user_1
  AND uf5.user_2 = f5.id;

● 100k Users
● 5M Connections
● Query took 1hr 55 mins
● Returns 312M

MySQL config: key_buffer = 2G, join_buffer_size = 2G

Cypher vs SQL - Paths

SQL Optimize: Only count on JOIN table

SELECT count(*)
FROM user,
     user_friend as uf1,
     user_friend as uf2,
     user_friend as uf3,
     user_friend as uf4,
     user_friend as uf5
WHERE user.name='John'
  AND user.id = uf1.user_1
  AND uf1.user_2 = uf2.user_1
  AND uf2.user_2 = uf3.user_1
  AND uf3.user_2 = uf4.user_1
  AND uf4.user_2 = uf5.user_1;

● 100k Users
● 5M Connections
● Query took 2 min, 30s
● Returns count of 312M

MySQL config: key_buffer = 2G, join_buffer_size = 2G

Cypher vs SQL - Paths

Cypher Optimize: Only sum degree of last step

MATCH (u:User)-[:KNOWS*4..4]->(f4)
WHERE u.name = 'John'
RETURN sum(size((f4)-[:KNOWS]->()))

● 100k Users
● 5M Relationships
● Query takes 12 sec
● Returns count of 312M

Neo4j config: page-cache = 512m, heap = 4G
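The idea behind "only sum degree of last step" is that the last hop never needs to be visited: the number of 5-hop walks equals the sum, over every 4-hop walk, of the out-degree of its endpoint. The Python sketch below checks that identity on a tiny hypothetical KNOWS graph. It counts walks the way the SQL self-join does and ignores Cypher's rule that a relationship is not repeated within one matched path, so it illustrates the counting trick only; the edge list and function names are assumptions.

```python
from collections import defaultdict

# Hypothetical directed KNOWS edges.
KNOWS = [
    ("John", "A"), ("John", "B"),
    ("A", "B"), ("A", "C"),
    ("B", "C"),
    ("C", "John"),
]

friends = defaultdict(list)
for src, dst in KNOWS:
    friends[src].append(dst)


def count_walks(start, steps):
    """Count walks of exactly `steps` hops by visiting every hop."""
    if steps == 0:
        return 1
    return sum(count_walks(f, steps - 1) for f in friends.get(start, []))


def count_walks_optimized(start, steps):
    """Visit only steps-1 hops, then add the endpoint's out-degree.

    Requires steps >= 1.
    """
    if steps == 1:
        return len(friends.get(start, []))
    return sum(count_walks_optimized(f, steps - 1) for f in friends.get(start, []))
```

Both functions return the same number for any start node and hop count of at least 1; the optimized one simply skips the final, and largest, level of expansion.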

Neo4j Clustering Architecture: Optimized for Speed & Availability at Scale


Performance Benefits
• No network hops within queries
• Real-time operations with fast and consistent response times
• Cache sharding spreads cache across cluster for very large graphs

Clustering Features
• Master-slave replication with master re-election and failover
• Each instance has its own local cache
• Horizontal scaling & disaster recovery

[Figure: a Load Balancer in front of three Neo4j instances]

Getting Data into Neo4j

Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships

Command-Line Bulk Loader (neo4j-import)
• For initial database population
• For loads with 10B+ records
• Up to 1M records per second

4.58 million things and their relationships…

Loads in 100 seconds!

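Three Ways to Load Data into Neo4j

[Figure: three diagrams, each showing an Application in front of a Relational Database and/or a Graph Database]
• MIGRATE ALL DATA: all data in the Graph Database
• MIGRATE GRAPH DATA: non-graph data in the Relational Database, graph data in the Graph Database
• DUPLICATE GRAPH DATA: all data in the Relational Database, graph data in the Graph Database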

Polyglot Persistence

Neo4j Fits into Your Enterprise Environment

[Figure: architecture diagram. Labels: Data Storage and Business Rules Execution; Data Mining and Aggregation; Application; Graph Database Cluster (Neo4j, Neo4j, Neo4j); Ad Hoc Analysis; Bulk Analytic Infrastructure (Graph Compute Engine, EDW, …); Data Scientist; End User; Databases (Relational, NoSQL, Hadoop)]

Neo4j + Mongo!

Users Love Neo4j

Learn the Way of the Graph Quickly and Easily

Quick Start in 1 minute

Quick Start: Plan Your Project

[Figure: PROFESSIONAL SERVICES PLAN, a timeline numbered 1 to 8 covering: Learn Neo4j; Decide on Architecture; Import and Model Data; Build Application; Test Application]

Deploy your app in as little as 8 weeks

There Are Lots of Ways to Easily Learn Neo4j

Huge Ecosystem of Graph Enthusiasts

• 1,000,000+ downloads
• 20,000+ education registrants
• 18,000+ Meetup members
• 100+ technology and service partners
• 150+ enterprise subscription customers, including 50+ Global 2000 companies

Get Started Now

Summary of the Power of the Graph

• Take relationships and connected data seriously
• Seriously easy to model
• Serious performance
• Fits in with your Enterprise Architecture
• Easy to get started
• Fast to reap the benefits


Start of Q&A
