Agenda
• Principles and Patterns of Large-Scale Service Architecture
• Best Practice in Large-Scale Service Development

Principles and Patterns of Large-Scale Service Architecture
Challenges at Internet Scale
• Online services manage …
  – Millions of active users worldwide
  – Millions of URL requests per day
• … worldwide
  – In 39 countries
  – 24x7x365
• Millions or billions of read/write operations per day
Architectural Forces: What do we think about?
• Scalability
  – Resource usage should increase linearly (or better!) with load
  – Design for 10x growth in data, traffic, users, etc.
• Availability
  – Resilience to failure
  – Graceful degradation
  – Recoverability from failure
• Latency
  – User-experience latency
  – Data latency
• Manageability
  – Simplicity
  – Maintainability
  – Diagnostics
• Cost
  – Development effort and complexity
  – Operational cost (TCO)
Principles for Internet Scale
• 1. Partition Everything
• 2. Asynchrony Everywhere
• 3. Automate Everything
• 4. Remember Everything Fails
• 5. Embrace Inconsistency
Principle 1: Partition Everything
• Split every problem into manageable chunks
  – Split by data, load, and/or usage pattern
  – “If you can’t split it, you can’t scale it”
• Motivations
  – Scalability: scale horizontally and independently
  – Availability: isolate failures
  – Manageability: decouple different segments and functional areas
  – Cost: choose the partition size that maximizes price-performance
• Partitioning Patterns
  – Functional Segmentation
  – Horizontal Split
Partition Everything: Functional Segmentation
• Segment processing into pools, services, and stages, as eBay does
• Segment data by entity and usage pattern
  – Data entities: User, Item, Transaction, Product, Account, Feedback
  – Processing pools: Search, View Item, Selling, Bidding, MyEbay, Checkout, Feedback
Partition Everything: Horizontal Split
• Load-balance processing
  – All servers in a pool are created equal
• Split (or “shard”) data along the primary access path
  – Many split strategies: modulo of a key, range, lookup table, etc.
  – Aggregation / routing in the Data Access Layer (DAL)
  – Abstract developers from the split logic and the logical-physical mapping
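The split strategies listed above (modulo of a key, lookup table) and the DAL's job of hiding them can be sketched as follows. This is a minimal illustration, not eBay's implementation; all class, table, and key names are hypothetical.

```python
# Minimal sketch of a Data Access Layer that hides shard routing from
# application code. Shard count, the stand-in "databases", and the lookup
# table are illustrative assumptions.

class ShardRouter:
    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.lookup = {}  # optional lookup-table overrides, e.g. for hot keys

    def shard_for(self, key):
        # Strategy 1: an explicit lookup-table entry wins
        if key in self.lookup:
            return self.lookup[key]
        # Strategy 2: modulo of a hashed key
        return hash(key) % self.num_shards

class UserDAL:
    """Application code calls the DAL; it never sees the split logic."""
    def __init__(self, router, shards):
        self.router = router
        self.shards = shards  # shard id -> dict standing in for a database

    def save(self, user_id, record):
        self.shards[self.router.shard_for(user_id)][user_id] = record

    def load(self, user_id):
        return self.shards[self.router.shard_for(user_id)].get(user_id)

router = ShardRouter(num_shards=4)
dal = UserDAL(router, shards=[{} for _ in range(4)])
dal.save("user-42", {"name": "Alice"})
assert dal.load("user-42") == {"name": "Alice"}
```

Because callers only see `save`/`load`, the split strategy or shard count can change behind the DAL without touching application code.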
Corollary: No Session State
• A user session flows through multiple pools
• Absolutely no session state or business-object caching in the application tier
  – No HttpSession, no EJBs
• Transient session state is maintained in (or referenced by) a cookie or URL
• Extended session state lives in MemCache or a database
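The cookie-references-cache idea above can be sketched in a few lines: the cookie carries only an opaque token, while the extended state lives in a shared store any server in any pool can reach. The dict standing in for MemCache and all names are hypothetical.

```python
# Sketch: no session state in the application tier. The cookie holds only a
# token; extended session state lives in a shared cache (MemCache in the
# deck; a plain dict stands in here).
import uuid

session_cache = {}  # stand-in for MemCache / database

def create_session(user_id):
    token = str(uuid.uuid4())          # opaque token goes into the cookie
    session_cache[token] = {"user_id": user_id, "cart": []}
    return token                       # e.g. Set-Cookie: session=<token>

def handle_request(token):
    # Any server in any pool can serve this request: no in-process state.
    state = session_cache.get(token)
    if state is None:
        return "please log in again"   # cache entry expired or evicted
    return f"hello {state['user_id']}"

token = create_session("alice")
assert handle_request(token) == "hello alice"
```

The design consequence is that a cache eviction degrades to a re-login rather than an error, which is exactly the graceful-degradation behavior the deck argues for.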
Principle 2: Asynchrony Everywhere
• Prefer asynchronous processing
  – Move most processing to asynchronous flows
  – Integrate components asynchronously
• Motivations
  – Scalability: scale components independently
  – Availability: decouple availability state; retry operations
  – Latency: trade execution latency for response latency
  – Cost: spread peak load over time
• Asynchrony Patterns
  – Event Queue
  – Message Multicast
  – Batch
Async Everywhere
Pattern: Event Queue
• The primary use case writes data and queues an event
  – The event is created transactionally with the insert/update of the primary table
• Consumers subscribe to events
  – At-least-once delivery, rather than exactly-once
  – No guaranteed order, rather than in-order
    • Guarantee order in the application protocol layer, not in the transport layer
  – Idempotency: processing an event N times should give the same result as processing it once
  – Readback: the consumer typically reads back from the primary database for the latest data
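The idempotency requirement above is the key trick that makes at-least-once, unordered delivery safe. A minimal sketch, with hypothetical event ids and a set standing in for a deduplication table:

```python
# Sketch of an at-least-once consumer made safe by idempotency: each event
# carries an id, and the consumer records processed ids so redelivery and
# reordering do no harm. In practice the id set is a database table written
# in the same transaction as the state change.

processed_ids = set()
inventory = {"item-1": 0}

def handle_event(event):
    if event["id"] in processed_ids:
        return          # duplicate delivery: processing N times == once
    # Readback would fetch the latest row from the primary database here
    # rather than trusting a possibly stale payload (elided in this sketch).
    inventory[event["item"]] += event["delta"]
    processed_ids.add(event["id"])

# The queue delivers out of order and redelivers event "e1".
for ev in [{"id": "e2", "item": "item-1", "delta": 3},
           {"id": "e1", "item": "item-1", "delta": 5},
           {"id": "e1", "item": "item-1", "delta": 5}]:
    handle_event(ev)

assert inventory["item-1"] == 8   # 3 + 5; the duplicate was ignored
```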
Async Everywhere: Search
Pattern: Message Multicast
• A feeder reads item updates from the primary database
• The feeder publishes updates via reliable multicast
  – Persist messages in an intermediate data store for recovery
  – Publish sequenced updates via multicast to the search grid
  – Resend recovery messages when messages are missed
• Search nodes listen for updates
  – Each listens to an assigned subset of messages
  – Each updates its in-memory index in real time
Async Everywhere: Batch
Pattern: Periodic Batch
• A scheduled offline batch process
• Most appropriate for
  – Infrequent, periodic, or scheduled processing (once per day, week, or month)
  – Non-incremental computation (a.k.a. “full table scan”)
• Examples
  – Importing third-party data (catalogs, currency, etc.)
  – Generating recommendations (items, products, searches, etc.)
  – Processing items at the end of an auction
• Often drives further downstream processing through the Event Queue
Principle 3: Automate Everything
• Prefer adaptive/automated systems to manual systems
  – Design systems that adapt to their environment
• Motivations
  – Scalability: scale with machines, not humans
  – Availability / Latency: adapt to a changing environment more rapidly
  – Cost: machines are less expensive than humans
• Automation Patterns
  – Adaptive Configuration
Automate Everything: Configuration
Pattern: Adaptive Configuration
• Do not manually configure event consumers
  – Polling size and frequency
  – Number of threads
• Define a service-level agreement (SLA) for a given logical event consumer
  – E.g., 99% of events processed within 15 seconds
  – Each consumer dynamically adjusts to meet the defined SLA with minimal resources
• Consumers automatically adapt to changes in
  – Load (queue length)
  – Event processing time
  – Number of consumer instances
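One way to picture adaptive configuration is a consumer that derives its own worker count from observed queue length and processing time instead of a hand-tuned setting. The sizing formula below is an illustrative assumption, not eBay's algorithm.

```python
# Sketch of SLA-driven sizing: compute the minimal worker count needed to
# drain the observed queue within the SLA window. The formula is a simple
# illustrative model (queue_length * avg_time / sla), not a real product's.
import math

SLA_SECONDS = 15.0   # e.g. "99% of events processed within 15 seconds"

def target_workers(queue_length, avg_processing_seconds):
    # Workers needed to clear the current backlog inside the SLA window,
    # never fewer than one (minimal resources, but always making progress).
    needed = queue_length * avg_processing_seconds / SLA_SECONDS
    return max(1, math.ceil(needed))

assert target_workers(queue_length=10, avg_processing_seconds=0.5) == 1
assert target_workers(queue_length=600, avg_processing_seconds=0.5) == 20
```

Re-evaluating this periodically gives the behavior the slide describes: the pool grows when load or processing time rises and shrinks back when they fall.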
Principle 4: Remember Everything Fails
• Build all systems to be tolerant of failure
  – Assume every operation will fail and every resource will be unavailable
  – Detect failure as rapidly as possible
  – Recover from failure as rapidly as possible
  – Do as much as possible during failure
• Motivation
  – Availability
• Failure Patterns
  – Failure Detection
  – Rollback
  – Graceful Degradation
Everything Fails: eBay’s Central Application Logging (CAL)
Pattern: Failure Detection
• Application servers log all requests on a multicast bus
  – Log all application activity, database calls, and service calls on the bus
  – Log the request, application-generated information, and exceptions
• Listeners automate failure detection and notification
  – Real-time application state monitoring: exceptions and operational alerts
  – Historical reports by application server pool, URL, database, etc.
Everything Fails: Feature Wire-On / Wire-Off
Pattern: Rollback
• Every feature has an on/off state driven by central configuration
  – Can immediately turn a feature off for operational or business reasons
  – Can deploy “wired-off” to unroll dependencies
• Decouples code deployment from feature deployment
• For an application, feature “availability” is just like resource availability
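The wire-on/wire-off mechanism amounts to a runtime check against central configuration. A minimal sketch, with a dict standing in for the central configuration store and hypothetical feature names:

```python
# Sketch of feature wire-on/wire-off: every feature consults central
# configuration at runtime, so turning it off requires no redeploy.

central_config = {"new_checkout_flow": False}  # pushed from a config service

def feature_enabled(name):
    # Treat a missing flag as off: code can ship "wired-off" before the
    # feature and its dependencies are ready.
    return central_config.get(name, False)

def checkout():
    if feature_enabled("new_checkout_flow"):
        return "new flow"
    return "old flow"            # the rollback path is always present

assert checkout() == "old flow"
central_config["new_checkout_flow"] = True   # operational "wire-on"
assert checkout() == "new flow"
```

Because the old path stays in the binary, rolling a feature back is a configuration change, which is what decouples code deployment from feature deployment.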
Everything Fails: Resource Markdown
Pattern: Failure Detection
• The application detects when a database or other backend resource is unavailable or distressed
  – “Resource slow” is often far more challenging than “resource down” (!)
• The application “marks down” the resource
  – Stops making calls to it and sends an alert
Pattern: Graceful Degradation
• Remove or ignore non-critical functionality
• Retry or defer critical functionality
  – Fail over to an alternate resource
  – Defer processing of the event
• Explicit “markup”
  – Allows the resource to be restored and brought online in a controlled way
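Resource markdown with explicit markup can be sketched as a simple wrapper around calls to the resource. The failure threshold and class names are illustrative; a real implementation would also track latency to catch the harder "resource slow" case.

```python
# Sketch of resource markdown: after repeated failures the application stops
# calling the resource and fails fast; restoration is an explicit "markup"
# by an operator, not an automatic retry storm.

class Resource:
    def __init__(self, call, failure_threshold=3):
        self.call = call
        self.failures = 0
        self.threshold = failure_threshold
        self.marked_down = False

    def invoke(self, *args):
        if self.marked_down:
            raise RuntimeError("resource marked down")  # fail fast; degrade
        try:
            result = self.call(*args)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.marked_down = True   # stop calling; alert would fire here
            raise

    def mark_up(self):
        # Explicit markup: bring the resource back in a controlled way.
        self.failures = 0
        self.marked_down = False

def flaky(x):
    raise TimeoutError("db slow")

db = Resource(flaky, failure_threshold=2)
for _ in range(2):
    try:
        db.invoke(1)
    except TimeoutError:
        pass
assert db.marked_down
```

Callers seeing the markdown error can then degrade gracefully: drop the non-critical feature, fail over, or defer the work.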
Principle 5: Embrace Inconsistency
• Brewer’s CAP Theorem
  – Any shared-data system can have at most two of the following properties:
    • Consistency: all clients see the same data, even in the presence of updates
    • Availability: all clients get a response, even in the presence of failures
    • Partition-tolerance: the system’s properties hold even when the network is partitioned
• This trade-off is fundamental to all distributed systems
Principle 5: Embrace Inconsistency
Choose Appropriate Consistency Guarantees
• Typically we trade off immediate consistency for availability and partition-tolerance
  – Choose A and P; give up C
• Most real-world systems do not require immediate consistency (even financial systems!)
  – “Settlement”: financial parties reconcile and resolve conflicts at the end of the trading day
• Consistency is a spectrum
  – Immediate consistency: bids, purchases
  – Eventual consistency: search engine, billing system, etc.
  – No consistency: preferences
Principle 5: Embrace Inconsistency
Avoid Distributed Transactions
• Never do distributed transactions
  – No JDBC client transactions, etc.
• Minimize inconsistency through state machines and careful ordering of database operations
• Reach eventual consistency through asynchronous events or a reconciliation batch
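The reconciliation-batch route to eventual consistency can be sketched as a periodic comparison of two stores that no distributed transaction ever joined, with the system of record winning conflicts. All store names and data are illustrative.

```python
# Sketch of a reconciliation batch: two stores drifted apart (they were
# updated without a distributed transaction); the batch compares them and
# makes the replica agree with the system of record.

system_of_record = {"order-1": "PAID", "order-2": "SHIPPED"}
billing_view     = {"order-1": "PAID", "order-2": "PAID"}  # stale replica

def reconcile(primary, replica):
    conflicts = []
    for key, value in primary.items():
        if replica.get(key) != value:
            conflicts.append(key)
            replica[key] = value      # resolve: the primary is authoritative
    return conflicts

assert reconcile(system_of_record, billing_view) == ["order-2"]
assert billing_view["order-2"] == "SHIPPED"
```

This mirrors the "settlement" analogy on the previous slide: conflicts are tolerated during the day and resolved at a defined reconciliation point.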
Best Practice in Large-Scale Service Development

Implementation
• Services typically run as RPC or web services
  – REST service: uses URLs to identify resources and HTTP methods to access them
  – SOAP service: good for interop
  – RPC-style service (XML is not a particularly efficient encoding)
• A service can use other services to implement its functionality
• Differences between a service and a library component
  – Latency; different models of memory access, concurrency, and failure control; coarse-grained interfaces
  – The isolation boundary (a minimal interface) makes development, deployment, maintenance, and troubleshooting easier
Release Cycle
• Short release cycles
  – Typically on the order of weeks
• Fork a release branch at a golden CL number (a continuous integration system is mandatory)
• Only fix show-stoppers on the release branch
• What are the benefits of short cycles?
  – Quick feedback-response turnaround
  – Never work on an “old” version
  – Easy to know what has changed
  – Great for experimental features
    • Big features can be checked in partially
Release Cycle
• Production quality at every check-in
• Dev unit tests
  – Devs write more test code than product code
  – Aim at a high coverage number like 70% (the coverage number is not as important as scenario coverage)
  – Unit tests also document how a class is used
  – What about dependencies? (program to interfaces)
    • Use dependency injection and a mock framework to implement tests
• Thorough code reviews
• Share common code as much as possible
• A very strict process for storage schema changes
• A testing environment almost identical to the production environment
  – A common platform makes this possible
• Automated tests for all P1 scenarios
• A mechanism to enable/disable features in production
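The dependency-injection-plus-mock approach above can be shown in a few lines with Python's standard `unittest.mock`. The service and repository names are hypothetical.

```python
# Sketch of testing via dependency injection: the class receives its
# collaborator instead of constructing it, so the test substitutes a mock
# and no real database is needed.
import unittest
from unittest import mock

class OrderService:
    def __init__(self, repo):        # dependency injected, not constructed
        self.repo = repo

    def total(self, order_id):
        items = self.repo.items_for(order_id)   # programmed to an interface
        return sum(price for _, price in items)

class OrderServiceTest(unittest.TestCase):
    def test_total_sums_item_prices(self):
        repo = mock.Mock()
        repo.items_for.return_value = [("book", 10), ("pen", 2)]
        service = OrderService(repo)
        self.assertEqual(service.total("o-1"), 12)
        repo.items_for.assert_called_once_with("o-1")

unittest.main(argv=["ignored"], exit=False)
```

Note how the test doubles as documentation of the class's contract, which is exactly the "unit tests also document how the class is used" point.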
Deployment
• Service continuity during upgrade?
  – Upgrade every DC in turn
    • Needs replicas in different DCs
    • Stale data due to replication latency
    • Less-than-optimal performance for users during the upgrade
  – Rolling update of servers within the same DC, one at a time
    • Code compatibility: can old code work with new code?
    • Data compatibility: can old code work with new data? Can new code work with old data?
• Disasters after going live?
  – Roll back to the last release, which is very similar to an upgrade
  – It is easy to roll back code, but not easy to roll back data
Deployment
• Service binaries are uploaded to a repository
• Each service has a configuration file; common tools are used to start/update/stop the service based on that file
• It is a good idea to use tools such as ZooKeeper or Chubby to manage configurations
• Job description language:
  – Logical service name
  – Binary name and version
  – Binary dependencies
  – Runtime service dependencies
  – Resource requirements
  – Number of instances
• Job scheduler: runs the job somewhere and communicates the server location to the naming service
• Clients access a service by its service name
• A service typically accesses only resources/services in the same DC: performance, manageability
• A service can run in multiple DCs using replicated data
How To Scale
• Offline data processing: MapReduce
• Online processing
  – Run multiple replicas
  – Stateless servers are ideal: any server can handle a new request, making failover easy
  – Use sticky sessions to improve cache effectiveness if the user has state
  – Use sharding to partition data when there is too much data
  – Multi-level load balancing (DNS, hardware load balancer, software load balancer)
  – Newer distributed storage systems are much more scalable than a traditional database or file system, and make it easy to grow a table’s resources dynamically: MongoDB, HBase, Redis, …
  – Try to reduce synchronization between replicas
  – Split functionality into multiple services if necessary
    • Do not do everything in a single service; it is not easy to scale.
Failures
• Failures are common rather than exceptional. Services and machines can die at any time. Design for failure.
• Monitoring
  – Services/machines need to publish status information (performance counters, etc.)
  – Use heartbeats or customized probers to detect failures; probers are registered with global monitoring services
  – Alerts can be fired, and custom actions can be taken
• Logging
  – The application log should contain enough information for analysis: debugging on the production server is impossible most of the time
  – Create a unique ID for each request and save it in log entries
  – A centralized logging service and tooling provide the complete log for a request across machines/services
    • Scribe: open-sourced by Facebook
    • Chukwa: an open-source data collection system for monitoring large distributed systems
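The per-request ID idea above can be sketched directly: create the ID once at the edge, attach it to every log line, and the centralized log service can stitch the request back together by filtering on it. The list standing in for the log service and all field names are hypothetical.

```python
# Sketch of request-ID correlation: one unique id per request, stamped on
# every log line, so a centralized logging service can reassemble a single
# request's trail across machines and services.
import uuid

log_lines = []   # stand-in for the centralized logging service

def log(request_id, message):
    log_lines.append(f"request_id={request_id} msg={message}")

def handle_request():
    request_id = uuid.uuid4().hex    # created once, passed everywhere
    log(request_id, "request received")
    log(request_id, "calling inventory service")  # would cross machines
    log(request_id, "response sent")
    return request_id

rid = handle_request()
# Filtering by the id reconstructs the complete per-request log.
assert [line for line in log_lines if rid in line] == log_lines
```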
Failures
• How to handle failures:
  – Make it possible for another (or any) machine to resume processing of a request
  – The operation log might need to be saved to shared storage or replicated to another machine so processing can continue
  – A failed machine should not corrupt data
  – Idempotent operations are good
  – Optimistic locking can be used to detect conflicts
  – Queues can be used on the server side or client side to help with retries
  – Sometimes it is OK to tell the user an operation failed
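Optimistic locking, mentioned above as a conflict-detection tool, is simply a version-checked compare-and-set. A minimal sketch with an illustrative row layout:

```python
# Sketch of optimistic locking: every row carries a version; a write succeeds
# only if the version the writer read is still current, so a conflicting
# concurrent update is detected instead of silently lost.

row = {"value": 100, "version": 1}

def update(row, new_value, expected_version):
    # Compare-and-set; in SQL this is roughly:
    #   UPDATE t SET value=?, version=version+1 WHERE id=? AND version=?
    if row["version"] != expected_version:
        return False                 # conflict detected: caller re-reads/retries
    row["value"] = new_value
    row["version"] += 1
    return True

v = row["version"]                   # two clients both read version 1
assert update(row, 150, v)           # first writer wins
assert not update(row, 175, v)       # second writer detects the conflict
assert row["value"] == 150
```

On a detected conflict the caller re-reads the row and retries, or simply tells the user the operation failed, as the slide allows.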
Failures
• Exception handling
  – Fail-fast is preferred in general
  – Is the exception retryable? What should the retry strategy be?
  – Poison detection: servers should not crash repeatedly due to the same request (a denial-of-service vector)
  – Which exceptions should be handled? Can I catch an unknown exception?
  – Transactional semantics are very useful
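The retryable-vs-non-retryable question and poison detection combine naturally into one retry policy. A sketch with illustrative exception classes and limits:

```python
# Sketch of a retry policy: retryable exceptions get a bounded number of
# attempts (so a poison request is parked rather than crashing the server
# repeatedly), while non-retryable ones fail fast.

RETRYABLE = (TimeoutError, ConnectionError)
MAX_ATTEMPTS = 3

def process_with_retry(handler, request):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return handler(request)
        except RETRYABLE:
            if attempt == MAX_ATTEMPTS:
                return "dead-lettered"   # poison detection: park the request
        except Exception:
            return "rejected"            # non-retryable: fail fast

calls = {"n": 0}
def flaky_handler(request):
    calls["n"] += 1
    raise TimeoutError("backend slow")

assert process_with_retry(flaky_handler, {"id": 1}) == "dead-lettered"
assert calls["n"] == MAX_ATTEMPTS
```

A real policy would add backoff between attempts and alert on dead-lettered requests; the classification step shown here is the essential part.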
Latency
• Latency (what the user perceives) matters
  – Google: +500 ms caused -20% traffic
  – Yahoo: +400 ms caused -5-9% full-page traffic
  – Amazon: +100 ms caused -1% sales
• What can you do to reduce latency?
  – Know what things cost (try things out; profile)
  – Work from the high level down to the low level
  – Be lazy: caching, batching, asynchronous patterns
  – Know the platform well (e.g., the number of concurrent connections a browser allows)
  – Use more than one machine to handle a request
  – Reduce I/O: load data into memory
  – Large-scale computing makes it possible to handle interactive requests that were not possible before
Thank You!
Questions?