8/8/2019 HUG NY Meeting: Presenting Lily
IIC TECHNOLOGIEPARK 3 B-9052 ZWIJNAARDE (GENT) www.outerthought.org
Presenting Lily
Bay Area HBase UG - NYC - 10/11/2010
Outerthought
software product company
scalable content applications
open source product portfolio
Java, REST, internet
Technology
Lily: NoSQL-based content repository (HBase + SOLR)
Kauri: REST-centric webapp dev framework
Daisy: techdoc / QDoc / publishing CMS
Needs for Scalable Content
wire-speed capturing
batch-oriented post-processing
semantic lifting: extracting knowledge out of noise
data and inferred data become one
NoSQL & write-optimized storage
map/reduce
Natural Language Processing
smart content repositories
The Lily Project
content repository: store + search
REST-centric content app UI framework
us / partners / customers
Lily essentials
www.lilyproject.org
Apache license for maximal flexibility
(lots of) documentation at docs.outerthought.org
Lily content repository
Scalable store (HBase) and search (SOLR)
flexible content model
index maintenance
high-level API
base foundation
[Diagram: content application on top of the repository]
HBase
a data model where you can have column families which keep all versions and others which do not, which fits our CMS document model very well
ordered tables with the ability to do range scans on them, which allows us to build scalable indexes on top of them
HDFS, a convenient place to store large blobs
Apache license and community, a familiar environment for us
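The point about column families with different version retention can be illustrated with a small in-memory model. This is only a sketch in Python, not HBase client code: the class `ColumnFamily`, its `max_versions` parameter, and the row and column names are hypothetical and chosen for illustration.

```python
class ColumnFamily:
    """Toy column family: keeps at most `max_versions` values per cell.

    max_versions=None means "keep every version".
    """

    def __init__(self, max_versions=None):
        self.max_versions = max_versions
        self.cells = {}  # (row, column) -> list of (timestamp, value), newest first

    def put(self, row, column, timestamp, value):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        if self.max_versions is not None:
            del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, row, column, n=1):
        """Return up to n values for the cell, newest first."""
        return [value for _, value in self.cells.get((row, column), [])[:n]]


# One family keeps the full history, the other only the latest value.
versioned = ColumnFamily(max_versions=None)
latest_only = ColumnFamily(max_versions=1)
for ts, title in enumerate(["draft", "review", "final"]):
    versioned.put("doc1", "title", ts, title)
    latest_only.put("doc1", "title", ts, title)

print(versioned.get("doc1", "title", n=3))
print(latest_only.get("doc1", "title", n=3))
```

A document model could place versioned fields in the first kind of family and non-versioned fields in the second, which is the fit the slide refers to.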
1. Store, 2. Search...? Ouch.
CMS = two types of search
structured, logic search
  numbers, strings
  based on logic (SQL, anyone?)
information retrieval (or: full-text search)
  text
  based on statistics
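The difference between the two kinds of search can be sketched on a toy corpus. The documents, field names, and the TF-IDF-style scoring below are assumptions made for illustration; they are not how Lily or SOLR rank results.

```python
import math
from collections import Counter

# Hypothetical toy corpus: one structured field and one text field per document.
docs = {
    1: {"year": 2009, "text": "hbase stores content"},
    2: {"year": 2010, "text": "solr search search index"},
    3: {"year": 2010, "text": "content search with solr"},
}


def structured_search(predicate):
    """Logic-based search: a document either matches the predicate or it does not."""
    return sorted(key for key, doc in docs.items() if predicate(doc))


def full_text_search(query):
    """Statistics-based search: rank documents by a simple TF-IDF-style score."""
    term_counts = {key: Counter(doc["text"].split()) for key, doc in docs.items()}
    scores = {}
    for term in query.split():
        doc_freq = sum(1 for counts in term_counts.values() if term in counts)
        if doc_freq == 0:
            continue
        idf = math.log(1 + len(docs) / doc_freq)
        for key, counts in term_counts.items():
            if counts[term]:
                scores[key] = scores.get(key, 0.0) + counts[term] * idf
    # Highest score first; ties are broken by document key.
    return sorted(scores, key=lambda key: (-scores[key], key))


print(structured_search(lambda doc: doc["year"] == 2010))
print(full_text_search("search"))
```

The structured query returns an unranked set of exact matches, while the full-text query returns a ranking in which the document that mentions the term more often comes first.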
Search ponderings
All of that, at scale
Structured Search
HBase Indexing Library
idea from Google App Engine datastore indexes
http://code.google.com/appengine/articles/index_building.html
content table
rowkey   col    col
A        val3   foo6
B        val2   foo7

index table A
rowkey (ordered)   col
val2-B
val3-A
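The index table in the diagram can be mimicked with a sorted list of keys and a prefix range scan. This is a minimal Python sketch, not the HBase Indexing Library: the column names `col1` and `col2` and both function names are hypothetical, and the key format `<value>-<rowkey>` is taken from the diagram.

```python
import bisect

# Content table from the diagram; the column names are assumptions.
content = {
    "A": {"col1": "val3", "col2": "foo6"},
    "B": {"col1": "val2", "col2": "foo7"},
}


def build_index(table, column):
    """Return ordered index row keys of the form '<value>-<content rowkey>'."""
    return sorted(f"{row[column]}-{key}" for key, row in table.items())


def lookup(index, value):
    """Range scan: collect the content row keys of all entries starting with '<value>-'."""
    prefix = value + "-"
    start = bisect.bisect_left(index, prefix)  # jump to the first candidate key
    hits = []
    for entry in index[start:]:
        if not entry.startswith(prefix):
            break  # keys are ordered, so the scan can stop here
        hits.append(entry[len(prefix):])
    return hits


index = build_index(content, "col1")
print(index)
print(lookup(index, "val3"))
```

The sketch ignores value encoding: a real index would need fixed-width or escaped values so that the separator and the byte ordering stay unambiguous.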
Full-text / IR search
Lucene?
no sharding (for scale)
no replication (for availability)
batched index updates (not real-time)
Beyond Lucene
Katta
  scalable architecture, however only search, no indexing
Elastic Search
  very young (sorry)
hbasene et al.
  stores inverted index in HBase, does not scale all features
SOLR
  widely used, schema, facets, query syntax, cloud branch
Need for reliable queuing
Connecting things
we needed a reliable bridge between our main storage (HBase) and our index/search server(s) (SOLR)
  indexing, reindexing, mass reindexing (M/R)
we need a reliable method of updating HBase secondary indexes
all of that eventually to run distributed
  distribution means coping with failure
Solution
... a QUEUE ! (Meh)
ACMEMessageQueue ? Bzzzzzt.
We wanted fault-safe HBase persistence for the queues. Also for ease of administration.
WAL & Queue implemented on top of HBase tables
WAL & Queue = RowLog Library
WAL
  guaranteed execution of synchronous actions
  call doesn't return before secondary action finishes
  e.g. update secondary indexes
  if all goes well, size = #concurrent ops
useful outside of Lily context as well!
Queue
  triggering of async actions
  e.g. (re)index (updated) record with SOLR back-end
  size depends on speed of back-end process
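The split between the WAL (synchronous actions) and the Queue (asynchronous actions) can be sketched as follows. This is an in-memory illustration under stated assumptions, not the RowLog API: the class `RowLogSketch` and its methods are hypothetical, and the slides describe the real library as persisting both structures in HBase tables.

```python
from collections import deque


class RowLogSketch:
    """Toy in-memory stand-in for a WAL plus queue; not Lily's RowLog API."""

    def __init__(self, sync_actions, async_actions):
        self.sync_actions = sync_actions    # must finish before put() returns
        self.async_actions = async_actions  # run later by drain()
        self.wal = {}                       # seq -> message with unfinished sync work
        self.queue = deque()                # messages waiting for async work
        self._next_seq = 0

    def put(self, message):
        """Log the message, then run the synchronous actions before returning."""
        seq = self._next_seq
        self._next_seq += 1
        self.wal[seq] = message  # record the intent before doing any work
        self._finish(seq)
        return seq

    def _finish(self, seq):
        message = self.wal[seq]
        for action in self.sync_actions:
            action(message)         # an exception here leaves the WAL entry in place
        self.queue.append(message)  # hand over to the asynchronous side
        del self.wal[seq]           # all synchronous work is done

    def recover(self):
        """Replay entries whose synchronous actions did not complete."""
        for seq in sorted(self.wal):
            self._finish(seq)

    def drain(self):
        """Run the asynchronous actions for everything queued so far."""
        while self.queue:
            message = self.queue.popleft()
            for action in self.async_actions:
                action(message)


secondary_index, solr_docs = set(), []
log = RowLogSketch([secondary_index.add], [solr_docs.append])
log.put("record-1")  # the secondary index is updated before this call returns
log.drain()          # the (re)indexing step runs later, at the back-end's pace
```

In this sketch the WAL only holds entries for calls still in flight, which matches "size = #concurrent ops", while the queue grows whenever `drain()` lags behind `put()`. Replayed actions may run more than once, so they are assumed to be idempotent.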
The Sum
Lily model (records & fields)
mapped onto HBase (=storage)
indexed and searchable through SOLR
using a WAL/Queue mechanism implemented in HBase
runtime based on Kauri
with client/server comms via Avro (and a REST interface with JSON)
Architecture
Lily roadmap
development started Sept. 2009
development trunk opened Jul. 2010
end of Oct. 2010: milestone/beta release
  fully distributable
  spec-complete
Onwards:
  business-level 1.0 release (packaging, testing, performance)
  user/auth management & access control
  UI framework (Kauri)
  ins and outs, semantic lifting
stevenn@outerthought.org
@stevenn
Thanks for your hospitality and attention!