Joint Interface and Management Review • Tucson, Arizona • May 30th – June 1st, 2012
XLDB Asia 2012 • Beijing, China • June 22-23, 2012
XLDB and the Large Synoptic Survey Telescope
Kian-Tat Lim (林建达)
LSST Data Management System Architect
XLDB Asia 2012
What is LSST?
Proposed telescope to be built in Chile
Large
3.2 gigapixel camera
8.4 meter diameter mirror
Synoptic Survey
Wide: entire visible sky
Fast: image every 15 seconds
Deep: faint and distant objects
Results
− Thousand-frame movie of the sky
Results
− Catalogs
Image Metadata
Moving Objects Catalog
Object Catalog
Source Catalog
Difference Image Source Catalog
Provenance
Statistics
Summaries
Calibration
Engineering and Facility Database
Lots of databases, but Object and Source (and ForcedSource) are most important and largest.
How Big?
− Tens of billions of Objects
• Hundreds of columns per Object
− Trillions of Sources
• High signal-to-noise observations of Objects
• Dozens of columns per Source
− Tens of trillions of ForcedSources
• All observations of Objects
• 7 columns
− Total space required at end of survey, including all overheads, replication, and compression: 35 PB
Queries
− All about an object
− All objects meeting criteria
− All objects near objects meeting criteria
− All objects with interesting time series
− All pairs of objects with similar time series
Criteria may involve 1–30 attributes/columns, not entire row
Selectivity on individual attributes may be low
When interesting objects are identified, may need large fraction of the row
Near-neighbor queries involve self-join on multi-billion row table, but spatially localized
Pairing time series may involve self-join on multi-trillion row table!
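The note that near-neighbor queries are self-joins but spatially localized can be illustrated with a small sketch. It is not qserv code: it assumes flat (x, y) coordinates and a square grid, whereas real sky positions need spherical geometry, and the name `near_pairs` is hypothetical. The point is only that each object is compared against objects in nearby cells rather than against the whole table.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def near_pairs(points, radius):
    """Return sorted index pairs (i, j), i < j, whose points lie within `radius`.

    points: list of (x, y) tuples on a flat plane (an assumption made for
    this sketch; sky coordinates would need spherical distances).
    """
    # Bucket each point into a grid cell whose side equals the search radius,
    # so any match for a point sits in its own cell or one of the 8 neighbors.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / radius), floor(y / radius))].append(idx)

    pairs = set()
    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get() avoids creating empty cells while iterating.
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:
                        (xi, yi), (xj, yj) = points[i], points[j]
                        if hypot(xi - xj, yi - yj) <= radius:
                            pairs.add((i, j))
    return sorted(pairs)
```

Because only adjacent cells are compared, the work grows with the local density of objects instead of with the square of the table size.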
Usual Needs
Scalable
Fast
Fault-tolerant
Cost-effective
Open Source
qserv
Prototype system
Demonstrates feasibility
Useful for large-scale Data Challenges
Will be turned into production system during construction
Don’t expect too much. Mostly the work of one person, Daniel Wang.
Supporting ad hoc Queries
− Random small queries
• Indexing and sharding (also key/value)
− Narrow, full-table scans and aggregates
• Vertical partitioning
− Diverse, simultaneous scans
• Shared scans
− qserv may need to support all three
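As a rough illustration of the shared-scan idea listed above, the sketch below reads each chunk of a table once and feeds every row to all queries that are waiting on it. The function name, the row layout, and the (predicate, aggregate, initial) query shape are assumptions made for this example, not the qserv interface.

```python
def shared_scan(chunks, queries):
    """Evaluate several scan queries in a single pass over the data.

    chunks:  iterable of lists of rows (dicts); each list stands in for one
             table chunk read from disk.
    queries: dict mapping a query name to (predicate, aggregate, initial),
             where aggregate(state, row) returns the new state.
    Returns (results, chunks_read).
    """
    state = {name: initial for name, (_, _, initial) in queries.items()}
    chunks_read = 0
    for chunk in chunks:
        chunks_read += 1  # one read, shared by every query
        for row in chunk:
            for name, (predicate, aggregate, _) in queries.items():
                if predicate(row):
                    state[name] = aggregate(state[name], row)
    return state, chunks_read


# Hypothetical usage: two unrelated queries share one pass over two chunks.
demo_chunks = [
    [{"mag": 21.0, "ra": 10.0}, {"mag": 24.5, "ra": 11.0}],
    [{"mag": 19.5, "ra": 200.0}],
]
demo_queries = {
    "bright_count": (lambda r: r["mag"] < 22, lambda s, r: s + 1, 0),
    "max_ra": (lambda r: True, lambda s, r: max(s, r["ra"]), float("-inf")),
}
demo_results, demo_reads = shared_scan(demo_chunks, demo_queries)
```

Running the queries separately would read every chunk once per query; here the number of chunk reads is independent of how many queries are attached.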
Architecture
− MPP RDBMS on shared-nothing commodity cluster, with incremental scaling, non-disruptive failure recovery
− Data clustered spatially and by time, partitioned with overlaps
• Two-level partitioning
• 2nd level materialized on-the-fly
• Transparent to end-users
− Selective indices to speed up interactive queries, spatial searches, joins including time series analysis
− Shared scans
− Custom software based on open source: RDBMS (MySQL) + xrootd
• SciSQL: MySQL UDFs for HTM-based spatial indexing
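Partitioning with overlaps can be sketched as follows. This is a simplified illustration, not the qserv partitioner: it assumes a flat grid of square chunks in (ra, dec), ignores RA wrap-around and the narrowing of RA cells toward the poles, covers only a single partitioning level, and uses made-up defaults for chunk size and overlap width. Each row gets a home chunk and is also copied into any neighboring chunk whose border it lies close to, so that a near-neighbor join can run inside one chunk without fetching rows from other nodes.

```python
from math import floor


def assign_chunks(ra, dec, chunk_deg=10.0, overlap_deg=0.5):
    """Return (home_chunk, overlap_chunks) for a position in degrees.

    chunk_deg and overlap_deg are illustrative values, not LSST parameters.
    """
    home = (floor(ra / chunk_deg), floor(dec / chunk_deg))
    overlaps = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            cx, cy = home[0] + dx, home[1] + dy
            # Per-axis distance from the point to the neighboring chunk's box.
            lo_ra, hi_ra = cx * chunk_deg, (cx + 1) * chunk_deg
            lo_dec, hi_dec = cy * chunk_deg, (cy + 1) * chunk_deg
            gap_ra = max(lo_ra - ra, 0.0, ra - hi_ra)
            gap_dec = max(lo_dec - dec, 0.0, dec - hi_dec)
            # Copy the row if it is within the overlap margin on both axes.
            if max(gap_ra, gap_dec) <= overlap_deg:
                overlaps.add((cx, cy))
    return home, sorted(overlaps)
```

A point in the middle of a chunk is stored once; a point near a chunk corner is stored in up to four chunks, which is the storage cost paid for keeping joins local.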
Apologies to Martin Kersten for independently choosing a name close to his SciQL.
Baseline Architecture
Prototype Implementation
Intercepting user queries
Worker dispatch, query fragmentation generation, spatial indexing, query recovery, optimizations, scheduling, aggregation
Communication, replication
Metadata, result cache
MySQL dispatch, shared scanning, optimizations, scheduling
Single node RDBMS
RDBMS-agnostic
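The fragmentation and aggregation steps named in the diagram labels can be sketched as a scatter-gather loop. The example below is a sketch under stated assumptions: it uses Python's built-in sqlite3 in place of the MySQL workers, the `Object_<chunkId>` table naming and the `mag` column are made up for illustration, and the predicate is interpolated into the SQL text without sanitizing, which is acceptable only in a toy.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with three hypothetical chunk tables."""
    conn = sqlite3.connect(":memory:")
    rows = {0: [(1, 20.0), (2, 22.0)], 1: [(3, 24.0)], 2: []}
    for cid, data in rows.items():
        conn.execute(f"CREATE TABLE Object_{cid} (objectId INTEGER, mag REAL)")
        conn.executemany(f"INSERT INTO Object_{cid} VALUES (?, ?)", data)
    return conn


def run_on_chunks(conn, chunk_ids, where_sql):
    """Scatter a COUNT/AVG query over per-chunk tables and merge the partials.

    AVG cannot be merged directly from per-chunk averages, so each fragment
    returns COUNT and SUM, and the mean is computed after aggregation.
    """
    total_n, total_sum = 0, 0.0
    for cid in chunk_ids:
        # Query fragment that would be sent to the worker holding this chunk.
        fragment = (
            f"SELECT COUNT(*), COALESCE(SUM(mag), 0.0) "
            f"FROM Object_{cid} WHERE {where_sql}"
        )
        n, s = conn.execute(fragment).fetchone()
        total_n += n
        total_sum += s
    mean = total_sum / total_n if total_n else None
    return total_n, mean
```

For example, `run_on_chunks(make_demo_db(), [0, 1, 2], "mag < 23")` issues three fragments and combines their partial counts and sums into one answer.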
Large Scale Tests
− Setup
• 150 nodes
• ~10% of DR1 data set: realistically distributed, 2 billion objects, 55 billion sources, total ~32 TB
− Tested queries
• Interactive (object retrieval, object time series, spatially restricted filter)
• Scans (full sky filter, densities)
• Joins (near neighbor, sources not near objects)
• Concurrency
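For a sense of scale, the setup figures above can be turned into per-node averages. This is back-of-envelope arithmetic only: it assumes decimal units (1 TB = 1000 GB) and treats the load as evenly spread, although the data set is described as realistically distributed and actual per-node loads are not given. The function name is hypothetical.

```python
def per_node_averages(nodes=150, objects=2e9, sources=55e9, total_tb=32):
    """Average load per node for the test setup quoted on the slide."""
    return {
        "objects_per_node": objects / nodes,
        "sources_per_node": sources / nodes,
        "gb_per_node": total_tb * 1000 / nodes,  # decimal units assumed
        "sources_per_object": sources / objects,
    }
```

With the default values this works out to roughly 13 million objects, 367 million sources, and 213 GB per node, with 27.5 sources per object on average.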
Object retrieval: ~4-9 s
Full-sky density: ~3-8 min
~10m – 5h
Concurrency Test
Scalability Testing
− Constant data/node
Status
− Cleaning up for end-user testing
− Then adding features:
• Shared scans
• User tables
• Fault tolerance
• Updates
• Query management
− Code available:
• git://git.lsstcorp.org/LSST/DMS/qserv.git
• https://launchpad.net/scisql
Thoughts on the Future
− qserv
• Incorporate MonetDB back-end
− SciDB
• What about the petabytes of raw image data?
• Perhaps store in an array database
• Cutouts, mosaics, image manipulation become queries
• UDFs for detection, measurement
• Evaluation before end of 2013