How a Web-accelerator accelerates your site / Alexander Krizhanovsky (Tempesta Technologies)


[Chart: Target RPS vs Actual RPS for HyperScan, Vanilla Nginx, and PCRE-JIT]

Web-acceleration Technologies

Alexander Krizhanovsky

Tempesta Technologies, Inc.

[email protected]

Who am I?

CEO & CTO at NatSys Lab & Tempesta Technologies

Tempesta Technologies (Seattle, WA)

Subsidiary of NatSys Lab. developing Tempesta FW, the first and only hybrid of an HTTP accelerator and a firewall for DDoS mitigation & WAF

NatSys Lab (Moscow, Russia)

Custom software development in:

high performance network traffic processing

databases

Web-content Acceleration

Web-framework caching (e.g. Django caching)
=> whole site, pages, compiled objects, templates, any data

Downstream caching (RFC 7234, e.g. mod_cache):
reduces origin server requests (thundering herd)
=> whole site, pages

forward proxy cache (e.g. Squid, ATS)

reverse proxy (Web-accelerator) cache (e.g. Squid, Varnish etc.)

SSL acceleration

Private caching (Web-browser cache)

...eAccelerator, xslcache etc.

Web-acceleration

Web-caching
(how a Web-accelerator accelerates your site)

To Cache

static (e.g. video, images, CSS, HTML)

some dynamic

Negative results (e.g. 404)

Permanent redirects

Incomplete results (206, RFC 7233 Range Requests)

Methods: GET, POST, whatever

GET /script?action=delete: this is your responsibility
(but some servers don't cache URIs w/ arguments)

Not to Cache

Responses to Authenticated requests

Unsafe methods (RFC 7231 4.2.1)
(safe methods: GET, HEAD, OPTIONS, TRACE)

Explicit no-cache directive

Set-Cookie (?)
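The rules above can be sketched as a tiny admission check. This is an illustrative sketch, not any particular server's logic; the function name and the simplified `no-cache` matching are assumptions.

```python
# Hypothetical sketch of the "not to cache" rules above:
# RFC 7231 safe methods, authenticated requests, explicit no-cache.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}

def may_serve_from_cache(method, headers):
    """Return True if a cache may even consider this request."""
    if method not in SAFE_METHODS:
        return False  # unsafe methods are not served from cache
    if "Authorization" in headers:
        return False  # responses to authenticated requests
    if "no-cache" in headers.get("Cache-Control", ""):
        return False  # explicit no-cache directive
    return True

print(may_serve_from_cache("GET", {}))                            # True
print(may_serve_from_cache("POST", {}))                           # False
print(may_serve_from_cache("GET", {"Authorization": "Basic x"}))  # False
```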

Cache POST?

Idempotent POST (e.g. web-search): just like GET

Non-idempotent POST (e.g. blog comment): cache the response for the following GET

RFC 7234 4.4: URI must be invalidated
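The RFC 7234 4.4 invalidation rule can be sketched as follows; the in-process dict cache and function names are illustrative assumptions, not a real server's code.

```python
# Sketch: a non-idempotent POST invalidates the cached entry for its URI,
# so the next GET goes to the origin and refreshes the cache (RFC 7234 4.4).
cache = {}  # (host, path) -> response body

def on_response(method, host, path, body):
    key = (host, path)
    if method == "GET":
        cache[key] = body     # fill the cache on GET
    else:
        cache.pop(key, None)  # unsafe method: invalidate the URI

on_response("GET", "example.com", "/blog", "<old comments>")
on_response("POST", "example.com", "/blog", "comment added")
print(("example.com", "/blog") in cache)  # False: entry invalidated
```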

Cache Cookies?

Varnish, Nginx, ATS don't cache responses w/ Set-Cookie by default

mod_cache and Squid do cache responses w/ Set-Cookie by default

RFC 7234:
Note that the Set-Cookie response header field [RFC6265] does not inhibit caching; a cacheable response with a Set-Cookie header field can be (and often is) used to satisfy subsequent requests to caches. Servers who wish to control caching of these responses are encouraged to emit appropriate Cache-Control response header fields.

Cache Entries Freshness

RFC 7234: freshness_lifetime > current_age

Freshness calculation:

Last-Modified: when the resource was modified at the origin server

Date: response generation timestamp

Age: how long the object has been in the proxy cache

Expires: when the cache entry expires

Revalidation: conditional requests (RFC 7232, e.g. If-Modified-Since)

Background activity or on-request job
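The RFC 7234 freshness rule above can be sketched in a few lines. This uses a deliberately simplified age calculation (the RFC's full algorithm also corrects for clock skew and response delay); the function name and signature are assumptions.

```python
# Sketch of the RFC 7234 check: fresh while freshness_lifetime > current_age.
# Timestamps are UNIX seconds taken from the Date/Expires headers.
def is_fresh(now, date, expires=None, age_header=0):
    if expires is None:
        return False  # no heuristic freshness in this sketch
    freshness_lifetime = expires - date        # Expires - Date
    current_age = age_header + (now - date)    # simplified age calculation
    return freshness_lifetime > current_age

print(is_fresh(now=1000, date=0, expires=2000))  # True: lifetime 2000 > age 1000
print(is_fresh(now=3000, date=0, expires=2000))  # False: entry is stale
```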

Stale Cache Entries

Sometimes stale is OK, e.g. Nginx: proxy_cache_use_stale

Expired responses

Invalidated by unsafe methods

Error responses for the URI

Timeout

Etc.

Cache-Control

A cache MUST obey the requirements of the Cache-Control directives

Freshness and staleness control

Explicit cache/no-cache

Private caching (browser vs proxy): about caching, not privacy!

Pragma: no-cache
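Before a cache can obey the directives, it has to split the header into them. A minimal parsing sketch (the function name and the flat string-valued dict are assumptions; real parsers also handle quoted strings):

```python
# Sketch: split a Cache-Control header into the directives a cache must obey.
def parse_cache_control(value):
    """'max-age=60, private' -> {'max-age': '60', 'private': None}"""
    out = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        out[name.lower()] = arg or None  # valueless directive -> None
    return out

print(parse_cache_control("max-age=60, private"))
# {'max-age': '60', 'private': None}
```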

Vary
(secondary keys: say hello to databases)

Accept-Language: return a localized version of the page (no need for /en/index.html)

User-Agent: mobile vs desktop (bad!)

Accept-Encoding: don't send a compressed page if the browser doesn't understand it

Request headers normalization is required!
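Why normalization is required: without it, `gzip, br` and `BR,gzip` produce different secondary keys for the same capability set and fragment the cache. A minimal sketch of one assumed canonicalization policy:

```python
# Sketch: canonicalize Accept-Encoding before using it as a Vary key,
# so equivalent header values map to one cache entry.
def normalize_accept_encoding(value):
    codings = sorted(c.strip().lower() for c in value.split(",") if c.strip())
    return ",".join(codings)

print(normalize_accept_encoding("gzip, br"))  # "br,gzip"
print(normalize_accept_encoding("BR,gzip"))   # "br,gzip": same key, one entry
```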

Buffering vs Streaming

Buffering

Seems to be everyone's default

Performance degradation on large messages

200 means OK, not an incomplete response

Streaming

Tengine (patched Nginx) w/
proxy_request_buffering & fastcgi_request_buffering

More performance, but 200 doesn't mean full response

Cache Storage

Plain files (Nginx, Squid, Apache HTTPD)

Meta-data in RAM

Filesystem database

Easy to manage

Database (Apache Traffic Server, Tempesta FW)

Faster access

Persistency (experimental in Varnish, upcoming in Tempesta FW)

no real consistency

Cache Storage: mmap(2)

Alistair Wooldridge, BBC Digital Media Distribution: How we improved throughput by 4x,
http://www.bbc.co.uk/blogs/internet/entries/17d22fb8-cea2-49d5-be14-86e7a1dcde04

48 CPUs, 512GB RAM, 8TB SSD

Cache Key

Primary key: URI path + Host

POST key: URI path + Host + body

Secondary (Vary) key: any headers

E.g. Nginx custom cache key:

proxy_cache_key "$request_uri|$request_body"
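The key layouts above can be sketched as follows; the function names and the choice of SHA-256 for the body digest are illustrative assumptions, not any server's actual scheme.

```python
# Sketch of the cache keys described above: primary key is Host + URI path;
# a POST key additionally hashes the request body (bodies can be large).
import hashlib

def primary_key(host, path):
    return f"{host}{path}"

def post_key(host, path, body):
    digest = hashlib.sha256(body).hexdigest()  # assumed digest choice
    return f"{host}{path}|{digest}"

print(primary_key("example.com", "/index.html"))  # "example.com/index.html"
```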

Cache Purging

$ curl -X PURGE

Not RFC-defined

Squid, Varnish, Nginx (by wildcard)

Use case:

Update some resource at upstream (POST can invalidate an entry)

Send PURGE & GET requests to the cache

Now the cache is up to date
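The use case above can be sketched as a tiny in-process cache with an assumed PURGE handler (PURGE is not RFC-defined, so the semantics here are illustrative):

```python
# Sketch: PURGE drops the stale entry, the following GET refills it
# from the origin, and the cache is up to date again.
cache = {"/news": "<stale page>"}

def handle(method, path, fetch_from_origin):
    if method == "PURGE":
        cache.pop(path, None)  # drop the stale entry
        return "200 Purged"
    if method == "GET":
        if path not in cache:
            cache[path] = fetch_from_origin(path)  # refill from upstream
        return cache[path]

handle("PURGE", "/news", None)
print(handle("GET", "/news", lambda p: "<fresh page>"))  # "<fresh page>"
```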

Cache Busting

No access to Web-accelerator or Web-server

E.g. how to force users to fetch a new version of a CSS file or an ad?
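A common cache-busting technique (an assumption here, since the slide stops at the question): embed a content hash in the asset URL, so a new CSS version gets a new URL and bypasses every cache without needing any purge access.

```python
# Sketch: version an asset URL by its content hash; when the content
# changes, the URL changes, and all caches fetch the new version.
import hashlib

def busted_url(path, content):
    v = hashlib.md5(content).hexdigest()[:8]  # short assumed version tag
    return f"{path}?v={v}"

print(busted_url("/static/style.css", b"body { color: red }"))
```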