16
Apache Apex + Apache Geode In-Memory Streaming, Storage & Analytics Ashish Tadose

ApexMeetup Geode - Talk2 2016-03-17

Embed Size (px)

Citation preview

Page 1: ApexMeetup Geode - Talk2 2016-03-17

Apache Apex + Apache GeodeIn-Memory Streaming, Storage & Analytics

Ashish Tadose

Page 2: ApexMeetup Geode - Talk2 2016-03-17

Streaming meets In Memory Data Grid

Page 3: ApexMeetup Geode - Talk2 2016-03-17

Apache Geode: Listeners• CacheWriter / CacheListener• AsyncEventListener (queue / batch)

• Parallel or Serial• Conflation

3

Page 4: ApexMeetup Geode - Talk2 2016-03-17

Apache Geode: Events & NotificationsRegister Interest•Individual Keys OR RegEx for Keys•Updates Local Copy•Examples:

• region.registerInterest(“key-1”);• region1.registerInterestRegex(“[a-z]+“);

Continuous Query•Receive Notification when Query condition met on server•Example:– SELECT * FROM /tradeOrder t WHERE t.price > 100.00

Can be DURABLE

4

Page 5: ApexMeetup Geode - Talk2 2016-03-17

Apex: Checkpointing Today

er

Operator

er

Operator

er

Operator

Filtered

Stream

Filtered

Stream

er

OperatorInput

Stream

Enriched

Stream

Enriched

Stream

Output

Stream

Checkpoint

State

Checkpoint

State

Checkpoint

State

Persistence

In-Memory

Page 6: ApexMeetup Geode - Talk2 2016-03-17

Apex: Checkpointing with Geode

er

Operator

er

Operator

er

Operator

Filtered

Stream

Filtered

Stream

er

OperatorInput

Stream

Enriched

Stream

Enriched

Stream

Output

Stream

Checkpoint

State

Checkpoint

State

Checkpoint

State

Persistence

In-Memory

Page 7: ApexMeetup Geode - Talk2 2016-03-17

Operator Checkpointing in Geode

Apex Operator check-pointing in an IMDG (Geode store)

•Checkpointing is an essential mechanism to ensure Fault Tolerance•Apex checkpoints operator state to HDFS•Slower HDFS checkpointing hurts application performance•Checkpointing in Geode ensures that application performance is not impacted •Geode has better latency for write operations than HDFS.

Implementation: GeodeStorageAgent

https://issues.apache.org/jira/browse/APEXCORE-283

Page 8: ApexMeetup Geode - Talk2 2016-03-17

Apex: Input/Output Operators

er

Operatorer

OutputOperator

Input

StreamOutput

Stream

Checkpoint

State

Checkpoint

State

Data Store

In-Memory

No SQL Database

Page 9: ApexMeetup Geode - Talk2 2016-03-17

er

Operatorer

Geode Output

Input

StreamOutput

Stream

Checkpoint

State

Checkpoint

State

Data Store

In-Memory

Geode Output Operator

• Built-in OQL support

• Visualization support

• Persistence options

• Transaction support

Page 10: ApexMeetup Geode - Talk2 2016-03-17

Data Streams to Geode Store

Page 11: ApexMeetup Geode - Talk2 2016-03-17

Apex + Geode: Future Integrations

• Geode output operator with transactional support

• Input Operator: Ingest data from Geode to Apex DAG

• Distributed Cache Operator

• Scan Operator: Parallel query execution & result retrieval

Page 12: ApexMeetup Geode - Talk2 2016-03-17

Geode Transaction Operator

Apex Output Operator to write to Geode store with Transactions

•Apex DAG uses TransactionableStore to provide guarantee that records are written are exactly once. E.g. JdbcTransactionalStore

•Geode provides transaction support for efficient and safe coordinated operations•Geode store using transactions guarantee that records are written exactly once•Put operator backed by GeodeTransactional store can help to achieve Exactly once semantics

Implementation: GeodeWindowStore as TransactionableStore

Proposed

Page 13: ApexMeetup Geode - Talk2 2016-03-17

Input Operator: Streaming Geode data

Apex Input Operator to read from Geode store

•Apex Input operators – Ingest data from external sources into Apex DAG

•Geode provides versatile and reliable event distribution to provide Real Time updates to data• Use case – Apex operator to stream async events from Geode in DAG• Call back events reduce polling cycles over network

Implementation: GeodeRegionStreamOperator receives a newly added tuples and emits in DAG

Proposed

Page 14: ApexMeetup Geode - Talk2 2016-03-17

Geode Cache Operator

Apex+Geode Cache Operator

•Geode provides efficient Events & Notifications • Register interest – update local copies • Continuous Query

• Receive notification when Query condition met on server• Eg.g SELECT * FROM /tradeOrder t WHERE t.price > 100.00

•Use Geode events notification framework to maintain & invalidate cache.

Implementation: GeodeCacheOperatormaintains consistent cache based on subscribed keyset/query

Proposed

Page 15: ApexMeetup Geode - Talk2 2016-03-17

Geode Scan Operator

Apex+Geode Scan Operator

•Function Execution provides Parallel Query Execution•MapReduce like execution - concurrent execution on members & results are collected from members & sent to caller. •Use case: Streaming application depending on large scan result from external store

Implementation: GeodeQueryOperator execute data dependent queries on distributed regionemit results in DAG

Proposed

Page 16: ApexMeetup Geode - Talk2 2016-03-17

Questions ???

Thank You …