HUG NY Meeting: Presenting Lily

    IIC TECHNOLOGIEPARK 3 B-9052 ZWIJNAARDE (GENT) www.outerthought.org

    Presenting Lily
    Bay Area HBase UG - NYC - 10/11/2010

    Outerthought

    software product company
    scalable content applications
    open source product portfolio
    Java, REST, internet

    Technology

    Lily: NoSQL-based content repository (HBase + SOLR)
    Kauri: REST-centric webapp dev framework
    Daisy: techdoc / QDoc / publishing CMS

    Needs for Scalable Content

    wire-speed capturing
    batch-oriented post-processing
    semantic lifting: extracting knowledge out of noise
    data and inferred data become one

    NoSQL & write-optimized storage
    map/reduce
    Natural Language Processing
    smart content repositories

    The Lily Project

    content repository: store + search

    REST-centric content app UI framework

    (diagram labels: us; partners; customers)

    Lily essentials

    www.lilyproject.org
    Apache license for maximal flexibility
    (lots of) documentation at docs.outerthought.org

    Lily content repository

    scalable store (HBase) and search (SOLR)
    flexible content model
    index maintenance
    high-level API
    base foundation

    (diagram labels: content application, repository)

    HBase

    a data model where you can have column families which keep all versions and others which do not, which fits our CMS document model very well
    ordered tables with the ability to do range scans on them, which makes it possible to build scalable indexes on top of them
    HDFS, a convenient place to store large blobs
    Apache license and community, a familiar environment for us
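
The point about column families that keep all versions next to others that do not can be illustrated with a small in-memory toy model. This is only a sketch: the `ColumnFamily` class, its `max_versions` parameter and the field names are hypothetical and are not the HBase or Lily API.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: each cell keeps a list of (timestamp, value) versions."""

    def __init__(self, max_versions=None):
        # max_versions=None means "keep every version" (an assumption of this sketch).
        self.max_versions = max_versions
        self.cells = defaultdict(list)  # (row, qualifier) -> versions, newest first

    def put(self, row, qualifier, timestamp, value):
        versions = self.cells[(row, qualifier)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        if self.max_versions is not None:
            del versions[self.max_versions:]  # drop the oldest versions beyond the cap

    def get(self, row, qualifier, limit=1):
        """Return up to `limit` values, newest first."""
        return [value for _, value in self.cells.get((row, qualifier), [])[:limit]]


versioned = ColumnFamily()                    # e.g. document fields with full history
non_versioned = ColumnFamily(max_versions=1)  # e.g. bookkeeping data, latest value only

for ts, title in enumerate(["draft", "review", "final"], start=1):
    versioned.put("doc1", "title", ts, title)
    non_versioned.put("doc1", "state", ts, title)

history = versioned.get("doc1", "title", limit=10)
latest = non_versioned.get("doc1", "state", limit=10)
```

In this toy model the versioned family returns the whole edit history of a field, while the non-versioned one only ever holds the most recent value.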

    1. Store, 2. Search...? Ouch.

    CMS = two types of search
      structured, logic search
        numbers, strings
        based on logic (SQL, anyone?)
      information retrieval (or: full-text search)
        text
        based on statistics

    Search ponderings

    All of that, at scale

    Structured Search: HBase Indexing Library

    idea from Google App Engine datastore indexes
    http://code.google.com/appengine/articles/index_building.html

    content table
    rowkey   col    col
    A        val3   foo6
    B        val2   foo7

    index table A (rows kept in rowkey order)
    rowkey   col
    val2-B
    val3-A
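
The content table / index table example above can be sketched in a few lines. This is a minimal in-memory illustration, not the HBase Indexing Library itself: the `SecondaryIndex` class and its method names are hypothetical, a sorted Python list stands in for an ordered HBase table, and the `-` separator is taken from the row keys shown on the slide (real index keys would need a byte encoding that sorts correctly for every value type).

```python
import bisect


class SecondaryIndex:
    """Toy index table: sorted row keys of the form '<value>-<content rowkey>'."""

    SEP = "-"  # assumes neither values nor content row keys contain the separator

    def __init__(self):
        self._keys = []  # kept sorted, like the row keys of an ordered table

    def add(self, value, rowkey):
        key = f"{value}{self.SEP}{rowkey}"
        pos = bisect.bisect_left(self._keys, key)
        if pos == len(self._keys) or self._keys[pos] != key:
            self._keys.insert(pos, key)  # insert in order, skip exact duplicates

    def keys(self):
        return list(self._keys)

    def range_scan(self, start_key, stop_key):
        """Scan index keys in [start_key, stop_key) and return the content row keys."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [key.rsplit(self.SEP, 1)[1] for key in self._keys[lo:hi]]


index = SecondaryIndex()
index.add("val3", "A")  # content row A has col = val3
index.add("val2", "B")  # content row B has col = val2

ordered_keys = index.keys()
rows_with_val2 = index.range_scan("val2", "val3")
all_rows_by_value = index.range_scan("val", "val~")
```

Because the index keys are ordered by value first, a query on the indexed column becomes one contiguous range scan, and the content row key is read back from the tail of each index key.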

    Full-text / IR search

    Lucene?
      no sharding (for scale)
      no replication (for availability)
      batched index updates (not real-time)

    Beyond Lucene

    Katta
      scalable architecture, however only search, no indexing
    Elastic Search
      very young (sorry)
    hbasene et al.
      stores inverted index in HBase, does not scale all features
    SOLR
      widely used, schema, facets, query syntax, cloud branch

    Need for reliable queuing

    Connecting things

    we needed a reliable bridge between our main storage (HBase) and our index/search server(s) (SOLR)
      indexing, reindexing, mass reindexing (M/R)
    we need a reliable method of updating HBase secondary indexes
    all of that eventually to run distributed
      distribution means coping with failure

    Solution

    ... a QUEUE ! (Meh)

    ACMEMessageQueue ? Bzzzzzt.

    We wanted fault-safe HBase persistence for the queues. Also for ease of administration.

    WAL & Queue implemented on top of HBase tables

    WAL & Queue = RowLog Library

    WAL
      guaranteed execution of synchronous actions
      call doesn't return before secondary action finishes
      e.g. update secondary indexes
      if all goes well, size = #concurrent ops
      useful outside of Lily context as well!

    Queue
      triggering of async actions
      e.g. (re)index (updated) record with SOLR back-end
      size depends on speed of back-end process
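
The division of labour between the WAL and the queue can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the RowLog library: `RowLogSketch`, `put`, `replay` and `drain` are hypothetical names, plain Python containers stand in for the HBase tables that back the real WAL and queue, and the ordering of the steps is a guess at one reasonable design.

```python
class RowLogSketch:
    """Toy WAL + queue: the WAL guards synchronous actions, the queue feeds async ones."""

    def __init__(self):
        self.wal = {}    # seq -> payload: secondary actions not yet confirmed
        self.queue = []  # payloads waiting for an asynchronous consumer
        self._seq = 0

    def put(self, payload, primary, secondary):
        self._seq += 1
        seq = self._seq
        self.wal[seq] = payload     # 1. record the intent before doing anything
        primary(payload)            # 2. primary write (the record itself)
        self.queue.append(payload)  # 3. trigger the asynchronous action
        try:
            secondary(payload)      # 4. synchronous secondary action
        except Exception:
            return False            # WAL entry stays behind for a later replay
        del self.wal[seq]           # 5. confirmed, so the WAL entry can go
        return True

    def replay(self, secondary):
        """Re-run the secondary action for every entry still in the WAL."""
        for seq in sorted(self.wal):
            secondary(self.wal[seq])
            del self.wal[seq]

    def drain(self, consumer):
        """Hand queued payloads to the consumer, removing each one only on success."""
        while self.queue:
            consumer(self.queue[0])
            self.queue.pop(0)


content, index, search_backend = {}, {}, []
log = RowLogSketch()


def write_record(p):
    content[p["id"]] = p["col"]


def update_index(p):
    index[p["col"]] = p["id"]


def broken_index(p):
    raise RuntimeError("index table unavailable")


first_ok = log.put({"id": "A", "col": "val3"}, write_record, update_index)
second_ok = log.put({"id": "B", "col": "val2"}, write_record, broken_index)
pending_after_failure = len(log.wal)

log.replay(update_index)           # recovery: finish the interrupted secondary action
log.drain(search_backend.append)   # async consumer, e.g. an indexer feeding SOLR
```

When every call succeeds, the WAL only holds entries for operations that are still in flight, which matches the "size = #concurrent ops" remark; the queue, by contrast, grows or shrinks with the speed of whatever consumes it.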

    The Sum

    Lily model (records & fields)
    mapped onto HBase (=storage)
    indexed and searchable through SOLR
    using a WAL/Queue mechanism implemented in HBase
    runtime based on Kauri
    with client/server comms via Avro (and a REST interface with JSON)

    Architecture

    Lily roadmap

    development started Sept. 2009
    development trunk opened Jul. 2010
    end of Oct. 2010: milestone/beta release
      fully distributable
      spec-complete
    Onwards:
      business-level 1.0 release (packaging, testing, performance)
      user/auth management & access control
      UI framework (Kauri)
      ins and outs, semantic lifting

    stevenn@outerthought.org
    @stevenn

    Thanks for your hospitality and attention!