Intelligent, Tiered, Scalable Caching with LCache

LCache DrupalCon Dublin 2016

Page 1: LCache DrupalCon Dublin 2016

Intelligent, Tiered, Scalable Caching with LCache

Page 2: LCache DrupalCon Dublin 2016

Existing Caching Challenges

Page 3: LCache DrupalCon Dublin 2016

Pantheon.io

Traditional Web Caching

[Diagram: four web servers send all cache traffic to a single Redis or Memcache instance, which is labeled as the bottleneck.]

Page 4: LCache DrupalCon Dublin 2016

The Anatomy of a Bottleneck

Page 5: LCache DrupalCon Dublin 2016

Scaling Traditional Web Caching

[Diagram: four web servers send cache traffic to two Redis or Memcache instances.]

● Use replication?

○ Failover issues

○ Replication lag or slow writes

● Use sharding?

○ Consistency issues

● Still network-bound

Page 6: LCache DrupalCon Dublin 2016

Proudly Designed Elsewhere: Employing Known Solutions

Page 7: LCache DrupalCon Dublin 2016

Existing Solutions: Multi-Core Processors

Page 8: LCache DrupalCon Dublin 2016

Existing Solutions: Pantheon’s Valhalla

[Diagram: three application containers, each with a file mount and a local cache, alongside three file servers. Arrows labeled “Writes” and “Events” connect the containers and the file servers.]

Page 9: LCache DrupalCon Dublin 2016

Existing Solutions: MySQL Row Replication

[Diagram: the application sends SQL to the MySQL primary; the MySQL replica receives row changes (no SQL).]

shell> mysqlbinlog -vv log_file
...
# at 302
#080828 15:03:08 server id 1 end_log_pos 356 Update_rows: table id 17 flags: STMT_END_F

BINLOG '
fAS3SBMBAAAALAAAAC4BAAAAABEAAAAAAAAABHRlc3QAAXQAAwMPCgIUAAQ=
fAS3SBgBAAAANgAAAGQBAAAQABEAAAAAAAEAA////AEAAAAFYXBwbGX4AQAAAARwZWFyIbIP
'/*!*/;
### UPDATE test.t
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='apple' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
###   @3=NULL /* VARSTRING(20) meta=0 nullable=1 is_null=1 */
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='pear' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
###   @3='2009:01:01' /* DATE meta=0 nullable=1 is_null=0 */

Keeps Replication Simple!

Page 10: LCache DrupalCon Dublin 2016

“Because it’s faster, of course.”

● Inspired by multicore processors

⌾ Get the working set close to the work

⌾ Trade some write performance and scale for massive read gains

⌾ Hide the coherency management

● Inspired by Pantheon’s Valhalla file system

⌾ Write-through: clients can leave at any point

⌾ Incremental changes freshen the local cache

⌾ Only as read-after-write consistent as it needs to be

● Inspired by MySQL row-based replication

⌾ Materialize complex tag deletion on the primary instance and only replicate the key-based changes

Page 11: LCache DrupalCon Dublin 2016

Contrast: ChainedFastBackend

Beginning of Request

LCache: Synchronizes cache writes and bin/key invalidations. One SELECT query.

ChainedFastBackend: Updates bin invalidation data. One SELECT query.

Read Key

LCache: Reads local cache. If the key does not exist in the local cache, reads consistent cache. No query if hitting local cache.

ChainedFastBackend: Reads from local cache. No query if hitting local cache.

Write or Invalidate Key

LCache: Writes to local and consistent caches. One INSERT query.

ChainedFastBackend: Writes to local and consistent caches. Invalidates entire bin in all local caches. Up to two queries per write.

Invalidate Tag

LCache: Writes to consistent cache and generates key invalidations. Multiple queries.

ChainedFastBackend: Writes to consistent cache. Invalidates entire bin in local caches.

End of Request

LCache: Garbage-collects deletions. Executes one batched DELETE query (if cache writes have occurred) after request closes.

ChainedFastBackend: No activity.
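The LCache request lifecycle described on this slide can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not LCache's actual PHP code: the class names `EventLog` and `TieredCache` and all method names are hypothetical, the consistent cache is modeled as an append-only list rather than a database table, and end-of-request garbage collection is omitted.

```python
import itertools


class EventLog:
    """Hypothetical stand-in for the consistent (L2) cache: an append-only log."""

    def __init__(self):
        self.events = []  # (event_id, key, value, tags); value None means "deleted"
        self._ids = itertools.count(1)

    def append(self, key, value, tags=()):
        # Mirrors the INSERT-only write path: never update in place.
        event_id = next(self._ids)
        self.events.append((event_id, key, value, tuple(tags)))
        return event_id

    def latest(self, key):
        # The most recent event for a key wins.
        for _event_id, k, value, _tags in reversed(self.events):
            if k == key:
                return value
        return None

    def events_since(self, last_id):
        return [e for e in self.events if e[0] > last_id]

    def delete_tag(self, tag):
        # Materialize the tag deletion here, so that only plain
        # per-key deletion events need to reach the local caches.
        live = {}
        for _event_id, key, value, tags in self.events:
            live[key] = (value, tags)
        keys = [k for k, (v, tags) in live.items() if v is not None and tag in tags]
        for key in keys:
            self.append(key, None)
        return keys


class TieredCache:
    """One web server's view: a local (L1) dict in front of the shared log."""

    def __init__(self, log):
        self.log = log
        self.l1 = {}
        self.last_applied = 0

    def synchronize(self):
        # Beginning of request: apply every event not yet seen, in order.
        for event_id, key, value, _tags in self.log.events_since(self.last_applied):
            if value is None:
                self.l1.pop(key, None)
            else:
                self.l1[key] = value
            self.last_applied = event_id

    def get(self, key):
        if key in self.l1:  # local hit: no trip to the consistent cache
            return self.l1[key]
        value = self.log.latest(key)
        if value is not None:
            self.l1[key] = value
        return value

    def set(self, key, value, tags=()):
        self.log.append(key, value, tags)  # write-through to the consistent cache
        self.l1[key] = value

    def delete_tag(self, tag):
        for key in self.log.delete_tag(tag):
            self.l1.pop(key, None)


log = EventLog()
server_a, server_b = TieredCache(log), TieredCache(log)
server_a.set("node:1", "rendered", tags=["node"])
server_b.synchronize()
print(server_b.get("node:1"))
```

In this sketch a second server keeps serving its local copy after another server deletes a tag, and only drops it at its next `synchronize()` call, which is one way to read "only as read-after-write consistent as it needs to be."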

Page 12: LCache DrupalCon Dublin 2016

Challenges and Solutions

Page 13: LCache DrupalCon Dublin 2016

Unexpected Issues

● Sites write to caches very often

⌾ Seeing 10-40 cache “sets” per page

⌾ LCache’s “sets” are expensive (transactional database plus replication to clients)

⌾ Most modules assume a miss is a good reason to “set.”

⌾ Some cache items are “set” more than “get.”

● Using tags for bins was not fast enough

⌾ Relational model created too much overhead

⌾ Materializing the clearing of a whole bin wasn’t efficient (replicated many, many changes)

⌾ Moved to native bin support

Page 14: LCache DrupalCon Dublin 2016

Write Models to Optimize the “Set” Path

10 Processes ✕ 40 Writes Each

Low Splay (each write to random choice of 64 keys)

High Splay (each write to random choice of 4096 keys)

[Chart annotations: “Winner here!” and “And not worse here!”]
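The workload named on this slide (10 processes, 40 writes each, to a random choice of 64 or 4096 keys) can be approximated with a short simulation. The sketch below is only an illustration of what "splay" means for key collisions: the function name `simulate_writes` is hypothetical, the "processes" are simulated in a single loop, and it measures nothing about the write models actually benchmarked.

```python
import random
from collections import Counter


def simulate_writes(key_space, processes=10, writes_each=40, seed=0):
    """Count how many writes land on each key for a given key space size."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    counts = Counter()
    for _process in range(processes):
        for _write in range(writes_each):
            counts[rng.randrange(key_space)] += 1
    return counts


low = simulate_writes(64)     # low splay: writes pile up on few keys
high = simulate_writes(4096)  # high splay: writes rarely collide

for name, counts in (("low splay", low), ("high splay", high)):
    print(name, "distinct keys:", len(counts),
          "max writes to one key:", max(counts.values()))
```

With low splay the same keys are rewritten many times, so a write model that handles repeated writes to one key cheaply should matter most there.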

Page 15: LCache DrupalCon Dublin 2016

Machine Learning: Avoiding Useless “Sets”

Loading iterator...
Iterating...
Array(
    [lcache:10.223.176.176:18341:5:cache:environment_indicator] => 5634
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:taxonomy_term] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_8] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_1] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_2] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_3] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:calendar] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_5] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:redirects] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:backlinks] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_7] => 3037
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_6] => 3036
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:frontpage] => 3036
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:nodequeue_4] => 3036
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:agency_search] => 3036
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:glossary] => 3036
    [lcache:10.223.176.176:18341:11:cache_views:ctools_export:views_view:campaign] => 3036

LCache starts ignoring at 100 now
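One plausible reading of this slide is that each key carries a counter of writes not yet "paid back" by reads, and that the local cache stops storing a key once that counter reaches a threshold of 100. The sketch below illustrates that idea only. The class name `OverheadTrackingL1`, its methods, and the choice to decrement the counter on every read attempt (so that an ignored key can recover) are assumptions, not LCache's documented behavior.

```python
from collections import defaultdict


class OverheadTrackingL1:
    """Hypothetical local cache that refuses keys written far more than read."""

    def __init__(self, threshold=100):
        self.threshold = threshold
        self.values = {}
        self.overhead = defaultdict(int)  # writes minus read attempts, per key

    def get(self, key):
        # Assumption: every read attempt pays back one unit of write overhead.
        self.overhead[key] -= 1
        return self.values.get(key)

    def set(self, key, value):
        self.overhead[key] += 1
        if self.overhead[key] >= self.threshold:
            # Too many writes relative to reads: stop keeping this key locally.
            self.values.pop(key, None)
            return False
        self.values[key] = value
        return True


l1 = OverheadTrackingL1(threshold=100)
for i in range(150):
    stored = l1.set("cache_views:frontpage", i)
print("still stored after 150 unread writes:", stored)
```

A key that is "set" on every miss but almost never read would cross the threshold quickly and stop generating local writes, which matches the slide's goal of avoiding useless "sets."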

Page 16: LCache DrupalCon Dublin 2016

Configuration: Assigning Bins and Keys

● Better with LCache

⌾ Frequently read

⌾ Rarely written

⌾ Large

● Worse (or not ideal) with LCache

⌾ Read once or not at all (e.g. form cache should use normal database cache backend)

⌾ Things handleable earlier in the stack (e.g. Varnish instead of Drupal’s page cache)

⌾ Keys updated often (partly mitigated with machine learning)

⌾ Clearing 100+ keys with a tag (because of replication)

Page 17: LCache DrupalCon Dublin 2016

Built for Reliability

Page 18: LCache DrupalCon Dublin 2016

Test-Driven Development

Page 19: LCache DrupalCon Dublin 2016

Composer-Based Library

Page 20: LCache DrupalCon Dublin 2016

Lightweight Adapters for Frameworks

● Stateless

● Composer inclusion of the LCache library

● Modules and extensions

⌾ Drupal 7 module

⌾ Drupal 8 module

⌾ WordPress drop-in

● Drupal 8.3+ core?

Page 21: LCache DrupalCon Dublin 2016

Performance and Scalability

Page 22: LCache DrupalCon Dublin 2016

Comparing Against Redis: Performance

Page 23: LCache DrupalCon Dublin 2016

Comparing Against Redis: Concurrency

Page 24: LCache DrupalCon Dublin 2016

Going Live: Performance

Page 25: LCache DrupalCon Dublin 2016

Going Live: Impact on Databases

Page 26: LCache DrupalCon Dublin 2016

Next Steps

Page 27: LCache DrupalCon Dublin 2016

Further Performance Improvements

● Try mysqli with asynchronous queries for the initial synchronization.

⌾ Upside: No synchronous wait on obtaining events.

⌾ Downside: Yet another database connection.

● Synchronize (again) in the destructor after the request closes.

⌾ Upside: Potentially handles some events without users waiting.

⌾ Downside: Additional database queries.

● SQLite L1 cache

⌾ Upside: Persists across PHP-FPM restarts. Useful with CLI. Cache can be larger than memory.

⌾ Downside: Slower writes. Possible lock contention.
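The SQLite L1 idea can be illustrated with a small sketch. This is not part of LCache: the class name `SQLiteL1`, the table layout, and the JSON serialization are assumptions chosen to show the trade-off listed above, where values survive a process restart because they live in a file, at the cost of a disk-backed write on every set.

```python
import json
import os
import sqlite3
import tempfile


class SQLiteL1:
    """Hypothetical local cache persisted in a SQLite file."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS l1 (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.db.commit()

    def set(self, key, value):
        # "with self.db" wraps the statement in a transaction and commits it.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO l1 (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key):
        row = self.db.execute(
            "SELECT value FROM l1 WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def delete(self, key):
        with self.db:
            self.db.execute("DELETE FROM l1 WHERE key = ?", (key,))

    def close(self):
        self.db.close()


# Simulate a restart: write, close, then reopen the same file.
path = os.path.join(tempfile.mkdtemp(), "l1.sqlite")
before = SQLiteL1(path)
before.set("menu:main", {"items": 12})
before.close()

after = SQLiteL1(path)
print(after.get("menu:main"))
```

Several PHP-FPM workers or CLI processes sharing one such file would contend for SQLite's write lock, which is the lock contention downside noted on the slide.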

Page 28: LCache DrupalCon Dublin 2016

Ambitions for Core

● ChainedFastBackend isn’t going to cut it.

⌾ Not usable for most cache bins.

⌾ Administrators need to carefully choose when to introduce it.

⌾ Degrades rapidly on cache writes.

● Even just the LCache L2 component is faster than Drupal’s built-in caches.

⌾ INSERT-only model is a big win.

⌾ LCache can use a Null L1 seamlessly.

● Relying on Composer-based libraries is widespread in Drupal 8.

● A default cache for most bins

Page 29: LCache DrupalCon Dublin 2016

PSR-6 and PSR-16

● PSR-6

⌾ No concept of cache tags, an essential part of Drupal 8 caching.

⌾ No concept of retrieving invalidated items. (Not supported in LCache yet, but supported by Drupal 8.)

⌾ Interesting concept of deferred persistence.

● PSR-16

⌾ Counter interface wouldn’t be consumed by Drupal 8 (but would be by WordPress).

⌾ Mostly built on PSR-6.

Page 30: LCache DrupalCon Dublin 2016

@DavidStrauss

Questions?