Building Data-Centric Businesses

  • Daniel Aragao & Simon Hope

  • Daniel Aragao (@dear_dr_dan), Simon Hope (@mapbutcher)

  • REALESTATE.COM.AU

    6B Market Cap

    11M Australian Properties

    55M Visits in September

    4.7M App Downloads and counting

    http://realestate.com.au

  • 3,500 PEOPLE

    13 COUNTRIES

    34 OFFICES

    TECHNOLOGY & SOCIAL JUSTICE

  • In the beginning

    Organising our Data

    Implementation approaches

    Hipster Batches

    Reactify

    Bring Your Own Data

    Finding the Data

    What we have learned so far

    THIS IS WHAT THE STORY IS ABOUT

  • SORRY, IT'S OK TO LEAVE NOW

    Nope, we didn't create a new Hadoop

    No hardcore Data Science

    There are some implementation details

    REA embraced the Cloud. AWS everywhere

    Under construction

  • IN THE BEGINNING

  • ORGANISING OUR DATA

    "Increasingly, content is being distributed through search and social platforms... There's less visiting of publishers as destinations."

    Jeff Weiner, CEO, LinkedIn

  • Data sources

    Data warehouse

    PROBLEM

  • STRATEGY

  • Data Warehouse

    Staging, SSIS, Dim, Fact

    PROBLEM

    Star schema leaky details

  • No Data Warehouse

    Staging, SSIS, Dim, Fact

    STRATEGY

  • STRATEGY

    Data Warehouse Facade

    Staging, SSIS, Dim, Fact

  • ???

    WHAT'S IN THE BOX?

  • Good things come in small services (not packages)

    THE HIPSTER BATCH

    ???

    Hipster Batch

  • Hipster Batch

    THE HIPSTER BATCH

    Small and short lived

    Decoupled via flat files on S3

    Single purpose

    Idempotent

    Polyglot

    Minimal runtime dependencies

    Discoverable
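
The properties above can be illustrated with a minimal sketch. The slides do not show any Hipster Batch code, so everything here is an assumption: the function name `run_batch`, the JSON-lines input, the `suburb` field, and the use of a local directory standing in for an S3 bucket. The point is only to show one way a small, single-purpose step can be made idempotent: the output name is derived from the input content, so a re-run overwrites the same file rather than adding a second result.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def run_batch(input_path: Path, output_dir: Path) -> Path:
    """Single-purpose step: count records per suburb in a JSON-lines flat file.

    Hypothetical example. The output file name contains a hash of the input
    bytes, so running the step again on the same input rewrites the same file
    with the same content (idempotent).
    """
    raw = input_path.read_bytes()
    digest = hashlib.sha256(raw).hexdigest()[:12]

    counts = {}
    for line in raw.decode("utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the flat file
        record = json.loads(line)
        counts[record["suburb"]] = counts.get(record["suburb"], 0) + 1

    # A local directory stands in for the S3 bucket used to decouple steps.
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path = output_dir / f"suburb-counts-{digest}.json"
    output_path.write_text(json.dumps(counts, sort_keys=True))
    return output_path


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    source = workdir / "listings.jsonl"
    source.write_text('{"suburb": "Richmond"}\n{"suburb": "Carlton"}\n')
    print(run_batch(source, workdir / "out"))
```

Because the step only reads one flat file and writes another, a downstream step written in a different language could pick up the output without sharing any runtime dependency.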

  • Logs

    SNS, SQS

    ASG, ECS, Lambda

    KMS

    CloudWatch

    S3 buckets

    Data

    A TYPICAL IMPLEMENTATION

    Hipster Batch

  • Hipster Batch

    HIPSTER BATCH DOES SCIENCE

    Behavioural models for targeted marketing

    Recommendation engine

    External channels

  • SCIENCE!

    Hipster Batch

  • API

    x 20

    API

    Hipster Batch

    Stats models

    Google Now API

    SCIENCE!

  • From legacy to reactive

    REACTIFY

    Reactify

    ???

  • Reactify

    http://www.reactivemanifesto.org

    REACTIFY

    Manage Data flow with messages

    Protect consumers and care about isolation

    Resilience is important and Data replication is just fine

    Demand is elastic - and your components should be too

  • Reactify

    Listings

    Data coupling

    No resilience or elasticity

    Coupling

    PROBLEM

  • Reactify

    Listings, Reactify, Hipster Batch

    Shielded consumers

    Isolation

    Decoupled

    SOLUTION

  • Reactify

    Listings, REST API, Dynamo

    Event Maker

    Event Differ

    Kinesis

    IMPLEMENTATION

  • Exposes current state only

    Stream of change notifications

    Hypertext Application Language - HAL

    Clear entity types

    Linking over embedding

    Cacheable and discoverable

    REST API

    REACTIFY REST API
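
As a rough illustration of "linking over embedding" in HAL, the sketch below builds a resource that exposes only current state and points to a related entity through `_links` instead of nesting it under `_embedded`. The `_links` / `self` / `href` structure comes from the HAL format; the placeholder host, the `agency` relation, and the `type`, `id`, and `price` fields are hypothetical and are not taken from the actual feed.

```python
import json


def listing_resource(listing_id: str, agency_id: str, price: int) -> dict:
    """Build a HAL-style representation of a listing's current state.

    Hypothetical field names; only the `_links` layout follows HAL.
    """
    base = "https://feeds.example.invalid"  # placeholder host, not the real feed
    return {
        "_links": {
            "self": {"href": f"{base}/listings/{listing_id}"},
            # The related entity is linked, not embedded, so each resource
            # can be cached and fetched independently.
            "agency": {"href": f"{base}/agencies/{agency_id}"},
        },
        "type": "listing",  # a clear entity type on every resource
        "id": listing_id,
        "price": price,
    }


def follow(resource: dict, rel: str) -> str:
    """Return the URL a client would request to follow a named link relation."""
    return resource["_links"][rel]["href"]


if __name__ == "__main__":
    doc = listing_resource("12345", "ABC123", 550000)
    print(json.dumps(doc, indent=2))
    print(follow(doc, "agency"))
```

A client that starts from one resource and only ever follows link relations does not need to know URL patterns in advance, which is one reading of "discoverable" on this slide.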

  • REST API

    https://feeds.listings.realestate.com.au/combined-listings/120449689

  • REST API

    Event Maker

    https://feeds.listings.realestate.com.au/combined-listings/-/changes

  • Reactify

    Event Differ
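
The slides name the Event Differ but do not show how it works. One plausible reading, sketched below under that assumption, is a component that compares the last stored state of an entity with its current state and emits a change event only when something differs. The function name `diff_events` and the event shapes (`created`, `updated`, `deleted`) are hypothetical.

```python
from typing import Any, Dict, List, Optional

State = Dict[str, Any]


def diff_events(entity_id: str,
                previous: Optional[State],
                current: Optional[State]) -> List[Dict[str, Any]]:
    """Compare two snapshots of one entity and return the change events.

    Hypothetical sketch: `previous` would be the last state held in a store,
    `current` the state just fetched from the API.
    """
    if previous is None and current is None:
        return []
    if previous is None:
        return [{"type": "created", "id": entity_id, "state": current}]
    if current is None:
        return [{"type": "deleted", "id": entity_id}]

    # Collect every field whose value differs between the two snapshots.
    changes = {
        key: {"old": previous.get(key), "new": current.get(key)}
        for key in sorted(set(previous) | set(current))
        if previous.get(key) != current.get(key)
    }
    if not changes:
        return []  # nothing changed, so consumers are not notified
    return [{"type": "updated", "id": entity_id, "changes": changes}]


if __name__ == "__main__":
    before = {"price": 550000, "status": "active"}
    after = {"price": 540000, "status": "active"}
    print(diff_events("12345", before, after))
```

Emitting nothing when the snapshots match keeps the downstream stream quiet, so consumers only react to real changes.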

  • The octopus in the box

    "Did you use that data set?" "Errr... no, we have another one."

    BRING YOUR OWN DATA

  • BRING YOUR OWN DATA - BYOD

    Allow data to flow freely

    Help the business to get what they need when they need it

    Self-service

  • BYOD

    CSV

    x 5

    Tableau Server

    Audit, auth, share

    Smarts on datatypes
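
"Smarts on datatypes" is not explained further in the slides. A minimal sketch of what it might involve, assuming it means guessing a column type from an uploaded CSV, is shown below. The function names and the four type labels are hypothetical; the approach simply tries progressively looser parsers and falls back to text.

```python
import csv
import io
from datetime import date
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Return the first of integer, float, date that parses every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "text"
    parsers = (("integer", int), ("float", float), ("date", date.fromisoformat))
    for name, parse in parsers:
        try:
            for value in non_empty:
                parse(value)
        except ValueError:
            continue  # at least one value did not fit, try the next type
        return name
    return "text"


def infer_schema(csv_text: str) -> Dict[str, str]:
    """Map each CSV column name to its inferred type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        column: infer_type([row[column] or "" for row in rows])
        for column in (reader.fieldnames or [])
    }


if __name__ == "__main__":
    sample = (
        "listing_id,price,listed_on,suburb\n"
        "1,550000.0,2015-09-01,Richmond\n"
        "2,,2015-09-14,Carlton\n"
    )
    print(infer_schema(sample))
```

Empty cells are ignored during inference, so a column with missing values can still be typed from the values that are present.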

  • These were the implementation approaches, now to

    FIND THE DATA

    Meaningful, automated, and easy-to-search metadata

  • WE TRIED

  • SNS, SQS

    ASG, ECS, Lambda

    KMS

    CloudWatch

    Logs

    Dataz

    Ancestry

    Metadata

    MORE THAN DATA

    Hipster Batch

  • Ancestry
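
The slides show only the word "Ancestry", so the following is an assumption about its meaning: a lineage record that each batch writes next to its output, naming the inputs it was derived from. The `Ancestry` class, its fields, and the `upstream` helper are hypothetical, as are the dataset names in the usage example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Ancestry:
    """Hypothetical lineage record emitted alongside a batch output."""
    output: str                   # dataset this record describes
    produced_by: str              # name of the batch that wrote it
    inputs: Tuple[str, ...] = ()  # datasets the batch read


def upstream(output: str, records: Iterable[Ancestry]) -> List[str]:
    """List every dataset the given output ultimately derives from."""
    by_output = {record.output: record for record in records}
    seen = set()
    stack = [output]
    while stack:
        record = by_output.get(stack.pop())
        if record is None:
            continue  # a source dataset with no recorded ancestry
        for parent in record.inputs:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


if __name__ == "__main__":
    records = [
        Ancestry("clean-listings", "cleaner", ("raw-listings",)),
        Ancestry("recommendations", "recommender", ("clean-listings", "visits")),
    ]
    print(upstream("recommendations", records))
```

Walking these records answers the question from the earlier slide ("Did you use that data set?") without asking the team that built the batch.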

  • REST API

    Ancestry x 3

    METADATA PIPELINE

    Producers

    Scrapy

  • WHAT WE HAVE LEARNED SO FAR

    Consumers create the last-mile data as needed

    We must work with external, independent delivery channels

    Push quality back to source/producer systems

    Data belongs to the entire organisation, not to a single team

  • "I'll give you my Data Warehouse when you can pry it from my cold, dead hands."

  • THANK YOU

    Daniel Aragao (@dear_dr_dan), Simon Hope (@mapbutcher)

    REALESTATE.COM.AU