Pinot Kishore Gopalakrishna Tuesday, August 18, 15

Pinot: Realtime Distributed OLAP datastore



Agenda

• Pinot @ LinkedIn - Current

• Pinot - Architecture

• Pinot Operations

• Pinot @ LinkedIn - Future


WVMP (Who Viewed My Profile)


Slice and Dice Metrics


Pinot @ LinkedIn

• Customers

• Members

• Internal tools


Pinot @ LinkedIn

• 100B documents

• 1B documents ingested per day

• 100M queries per day

• Tens of ms latency

• 30 tables in prod, 250 * 3 std app nodes


Key features

• SQL-like interface

• Columnar storage and indexing

• Real-time data load


(S)QL: Filters and Aggs

SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011
  AND 'day' >= 15949
  AND 'day' <= 15963
  AND paid = 'y'
  AND action = 'stop'


(S)QL: Group By

SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011
  AND 'day' >= 15949
  AND 'day' <= 15963
  AND paid = 'y'
GROUP BY action


(S)QL: ORDER BY and LIMIT

SELECT *
FROM companyFollowHistoricalEvents
WHERE entityId = 121011
  AND entityId = 1000
  AND action = 'start'
ORDER BY creationTime DESC
LIMIT 1


What's not supported

• JOIN: unpredictable performance

• Not a source of truth

• Mutation


Pinot

• Data flow

• Query Execution

• How to use/operate

• Pinot @ LinkedIn - Future


Pinot Architecture

[Diagram: Queries go to the Broker; Raw Data reaches Realtime servers via Kafka and Historical servers via Hadoop; Helix coordinates the cluster]


Pinot

• Pinot segments


Pinot Segment layout: Columnar storage


Pinot Segment layout: Sorted Forward Index
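A rough illustration of why a sorted forward index helps (a minimal sketch, not Pinot's actual on-disk format): when the rows of a segment are sorted by a column, each value occupies one contiguous run of doc ids, so the index only needs a start and end per value, and a filter on that column becomes a lookup or a pair of binary searches instead of a scan. The function names and the sample `entity_id` column are made up for this example.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(sorted_column):
    """Map each distinct value to its contiguous [start, end) doc id range.

    Assumes the column is already sorted, as in a segment sorted on it.
    """
    index = {}
    for doc_id, value in enumerate(sorted_column):
        start, _ = index.get(value, (doc_id, doc_id))
        index[value] = (start, doc_id + 1)
    return index


def doc_range(index, value):
    """Doc ids matching `column = value`, as a range (empty if absent)."""
    start, end = index.get(value, (0, 0))
    return range(start, end)


def doc_range_between(sorted_column, low, high):
    """Doc ids matching `low <= column <= high` via two binary searches."""
    return range(bisect_left(sorted_column, low), bisect_right(sorted_column, high))


# Hypothetical entityId column of a segment sorted by entityId.
entity_id = [7, 7, 42, 42, 42, 99]
index = build_sorted_index(entity_id)
matching = list(doc_range(index, 42))
```

Because the result is a plain range, the filter costs the same no matter how many documents match.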


Pinot Segment layout: Other techniques

• Indexes: Inverted index, Bitmap, RoaringBitmap

• Compression: Dictionary Encoding, P4Delta

• Multi-valued columns, skip lists

• HyperLogLog for unique counts

• T-digest for Percentile, Quantile
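Two of the techniques listed above, dictionary encoding and the inverted index, can be sketched as follows. This is a simplified illustration under stated assumptions: plain Python lists and sets stand in for the compressed bitmaps (such as RoaringBitmap) a real segment would use, and all function names and sample columns are hypothetical.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Replace each value with a small integer id into a sorted dictionary."""
    dictionary = sorted(set(column))
    value_to_id = {value: i for i, value in enumerate(dictionary)}
    forward_index = [value_to_id[value] for value in column]
    return dictionary, forward_index


def build_inverted_index(forward_index, cardinality):
    """For each dictionary id, list the doc ids that hold it (ascending)."""
    postings = [[] for _ in range(cardinality)]
    for doc_id, dict_id in enumerate(forward_index):
        postings[dict_id].append(doc_id)
    return postings


def docs_equal(dictionary, postings, value):
    """Doc ids where the column equals `value`, without scanning the column."""
    i = bisect_left(dictionary, value)
    if i < len(dictionary) and dictionary[i] == value:
        return postings[i]
    return []


def docs_and(left, right):
    """AND of two filters: intersect their doc id lists."""
    return sorted(set(left) & set(right))


# Hypothetical columns for five documents.
action = ["start", "stop", "start", "start", "stop"]
paid = ["y", "y", "n", "y", "y"]

action_dict, action_fwd = dictionary_encode(action)
paid_dict, paid_fwd = dictionary_encode(paid)
action_inv = build_inverted_index(action_fwd, len(action_dict))
paid_inv = build_inverted_index(paid_fwd, len(paid_dict))

# paid = 'y' AND action = 'stop'
result = docs_and(
    docs_equal(paid_dict, paid_inv, "y"),
    docs_equal(action_dict, action_inv, "stop"),
)
```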


Data aware pre-computation

Star tree Index
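The idea of pre-computation with a "star" value can be illustrated with a flat table (a deliberately simplified sketch, not the star-tree structure itself): each dimension is either pinned to a concrete value or replaced by `*`, meaning "all values", and the metric is summed ahead of time for each such key. This sketch materializes every combination; the tree that limits how much gets materialized is not shown. The names `preaggregate`, `lookup`, and the sample rows are assumptions for the example.

```python
from collections import defaultdict
from itertools import product

STAR = "*"


def preaggregate(rows, dimensions, metric):
    """Sum `metric` for every combination of concrete-or-star dimension values."""
    cube = defaultdict(int)
    for row in rows:
        # Each dimension contributes either its concrete value or the star.
        choices = [(row[d], STAR) for d in dimensions]
        for key in product(*choices):
            cube[key] += row[metric]
    return dict(cube)


def lookup(cube, dimensions, **filters):
    """Answer a filtered sum with one dictionary lookup."""
    key = tuple(filters.get(d, STAR) for d in dimensions)
    return cube.get(key, 0)


# Hypothetical raw rows.
rows = [
    {"country": "US", "browser": "chrome", "clicks": 3},
    {"country": "US", "browser": "firefox", "clicks": 2},
    {"country": "IN", "browser": "chrome", "clicks": 5},
]
dims = ["country", "browser"]
cube = preaggregate(rows, dims, "clicks")
us_clicks = lookup(cube, dims, country="US")
```

Queries that filter on a subset of dimensions never touch the raw rows, at the cost of storing one entry per materialized key.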


Pinot

• Query Execution


Pinot Query Execution: Distributed

[Diagram: Brokers, Helix, and Servers hosting replicas of segments S1, S2, S3]

1. Query

2. Fetch routing table from Helix

3. Scatter Request

4. Process Request & send response

5. Gather Response

6. Return Response
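The steps above can be sketched as a small scatter-gather loop. This is a minimal in-process illustration, assuming threads in place of network calls; the routing table contents, segment data, and function names are hypothetical and do not reflect Pinot's or Helix's actual APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical routing table, as a broker might obtain it from Helix:
# for each server, the segments it should serve for this query.
ROUTING_TABLE = {
    "server1": ["S1", "S2"],
    "server2": ["S3"],
}

# Hypothetical per-segment data: one `action` value per document.
SEGMENTS = {
    "S1": ["start", "stop", "stop"],
    "S2": ["stop"],
    "S3": ["start", "stop"],
}


def process_on_server(segments, action):
    """Step 4: a server counts matching docs in its assigned segments."""
    return sum(SEGMENTS[name].count(action) for name in segments)


def broker_count(action, routing_table):
    """Steps 2-6: scatter the request, gather partial counts, merge them."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda segments: process_on_server(segments, action),
            routing_table.values(),
        )
        return sum(partials)


stops = broker_count("stop", ROUTING_TABLE)
```

Each segment appears exactly once in the routing table, so replicas are never double counted when the partial results are merged.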


Pinot Query Execution: Single Node Architecture

• Planner

• Execution engine

• Inverted index

• Bitmap index

• Column format


SELECT campaignId, sum(clicks)
FROM Table A
WHERE accountId = 121011
  AND 'day' >= 15949
GROUP BY campaignId

Columns: accountId, day, campaignId, click

Operator pipeline:

• Pinot Segments -> Data sources

• Filter Operator -> Matching doc ids

• Projection Operator -> (campaignId, click) tuples

• Aggregation Group by Operator

• Combine Operator
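The operator chain for this query can be sketched per segment as follows. It is an illustrative toy, assuming each segment is a dict of column lists and that filtering is a plain scan; the function names mirror the operator names on the slide, but the signatures and sample data are made up.

```python
from collections import defaultdict


def filter_operator(segment, account_id, min_day):
    """Return doc ids matching accountId = account_id AND day >= min_day."""
    return [
        doc_id
        for doc_id in range(len(segment["accountId"]))
        if segment["accountId"][doc_id] == account_id
        and segment["day"][doc_id] >= min_day
    ]


def projection_operator(segment, doc_ids):
    """Yield (campaignId, click) tuples for the matching doc ids only."""
    for doc_id in doc_ids:
        yield segment["campaignId"][doc_id], segment["click"][doc_id]


def aggregation_group_by_operator(tuples):
    """Sum clicks per campaignId within one segment."""
    sums = defaultdict(int)
    for campaign_id, click in tuples:
        sums[campaign_id] += click
    return sums


def combine_operator(per_segment_results):
    """Merge the per-segment group-by results into one result."""
    combined = defaultdict(int)
    for result in per_segment_results:
        for campaign_id, total in result.items():
            combined[campaign_id] += total
    return dict(combined)


def execute(segments, account_id, min_day):
    """Run filter -> projection -> group-by on each segment, then combine."""
    results = []
    for segment in segments:
        doc_ids = filter_operator(segment, account_id, min_day)
        tuples = projection_operator(segment, doc_ids)
        results.append(aggregation_group_by_operator(tuples))
    return combine_operator(results)


# Two hypothetical columnar segments.
SEGMENTS = [
    {"accountId": [121011, 121011, 5], "day": [15950, 15940, 15960],
     "campaignId": [1, 1, 2], "click": [10, 99, 7]},
    {"accountId": [121011, 121011], "day": [15949, 15955],
     "campaignId": [1, 3], "click": [4, 6]},
]
totals = execute(SEGMENTS, account_id=121011, min_day=15949)
```

Only the columns the query names are ever read, which is the point of the columnar layout.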


Pinot

• Operations


Cluster Management: Deployment

[Diagram: Controller, Helix, Brokers, Servers]

• Brokers and Servers register themselves in Helix

• All servers start with no use-case-specific configuration


Onboarding a new use case

Create Table command:

TableName   Tag    Servers   Brokers
XLNT_T1     XLNT   3         1

[Diagram: Controller, Helix; three Servers and one Broker tagged XLNT]


Segment Assignment

Upload Segment:

TableName   Copies
XLNT_T1     2

[Diagram: Controller, Helix, Brokers; segments S1, S2, S3 are uploaded and each is placed on two Servers]
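One way to picture segment assignment with a fixed number of copies is shown below. The least-loaded placement policy is an assumption chosen for illustration; the slides do not say how the Controller actually picks servers, and the function and server names are hypothetical.

```python
def assign_segment(assignment, servers, segment, copies):
    """Place `copies` replicas of `segment` on the least-loaded servers.

    `assignment` maps server name -> set of segment names it hosts.
    """
    if copies > len(servers):
        raise ValueError("not enough servers for the requested copies")
    # Least-loaded first; the name breaks ties so the result is deterministic.
    ranked = sorted(servers, key=lambda s: (len(assignment[s]), s))
    chosen = ranked[:copies]
    for server in chosen:
        assignment[server].add(segment)
    return chosen


servers = ["server1", "server2", "server3"]
assignment = {server: set() for server in servers}
for segment in ["S1", "S2", "S3"]:
    assign_segment(assignment, servers, segment, copies=2)
```

With three segments and two copies each, every server ends up hosting two segments, and no segment has both replicas on the same server.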


Pinot - Fault tolerance/Elasticity

• AUTO recovery mode: Automatically redistribute segments on failure/addition of new nodes

• Custom mode: Run in degraded mode until node is restarted/replaced.
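The difference between the two modes can be sketched on a toy assignment map. This is a minimal sketch under assumptions: the function names, the `auto_recovery` flag, and the least-loaded re-replication policy are hypothetical and are not Helix's rebalancer API.

```python
def handle_server_failure(assignment, failed, copies, auto_recovery):
    """Return the new assignment after `failed` leaves the cluster.

    assignment: dict of server name -> set of segment names.
    auto_recovery=True  -> re-replicate lost segments onto survivors.
    auto_recovery=False -> keep serving with fewer copies (degraded).
    """
    survivors = {s: set(segs) for s, segs in assignment.items() if s != failed}
    if not auto_recovery:
        return survivors
    for segment in sorted(assignment.get(failed, ())):
        holders = [s for s in survivors if segment in survivors[s]]
        # Prefer the least-loaded survivors that do not hold the segment yet.
        candidates = sorted(
            (s for s in survivors if segment not in survivors[s]),
            key=lambda s: (len(survivors[s]), s),
        )
        for server in candidates[: max(copies - len(holders), 0)]:
            survivors[server].add(segment)
    return survivors


def replica_counts(assignment):
    """Count how many servers host each segment."""
    counts = {}
    for segments in assignment.values():
        for segment in segments:
            counts[segment] = counts.get(segment, 0) + 1
    return counts


before = {
    "server1": {"S1", "S2"},
    "server2": {"S1", "S3"},
    "server3": {"S2", "S3"},
    "server4": set(),
}
auto = handle_server_failure(before, "server1", copies=2, auto_recovery=True)
degraded = handle_server_failure(before, "server1", copies=2, auto_recovery=False)
```

In the automatic case every segment is back to two copies on the surviving servers; in the degraded case the segments that lived on the failed server are served from a single copy until it returns.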


Pinot vs Druid

Architecture

• Druid: Realtime + Offline, Realtime only

• Pinot: Realtime + Offline

• Note: Realtime only -> consistency is hard and schema evolution/bootstrap is hard

Inverted Index

• Druid: Always on all columns, fixed

• Pinot: Configurable on a per-column basis

• Note: Allows a trade-off between scanning vs. inverted index + scanning. More data can fit in a given memory size

Data organization

• Druid: N/A

• Pinot: Sorts data

• Note: Organizing data provides speed/better compression and removes the need for an inverted index

Smart pre-materialization

• Druid: N/A

• Pinot: star-tree

• Note: Allows a trade-off between latency and space

Query Execution Layer

• Druid: Fixed plan

• Pinot: Split into planning and execution

• Note: Smart choices can be made at runtime based on metadata/query.


Pinot - Future

• Documentation & tooling

• In progress: consistency among real-time replicas

• Improve cost to serve: leverage SSD, partial pre-materialization

• ThirdEye: Business Metrics Monitoring


Thank You
