© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Gaming Ops Running High-Performance Ops for Mobile Gaming Eduardo Saito – Director, Engineering Nick Dor – Sr. Director, Engineering GREE International Friday, November 15, 2013

AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

DESCRIPTION

Presentation from GREE Ops team at AWS re:Invent conference in Las Vegas in 2013

Page 1: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Gaming Ops

Running High-Performance Ops for Mobile Gaming

Eduardo Saito – Director, Engineering

Nick Dor – Sr. Director, Engineering

GREE International Friday, November 15, 2013

Page 2: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming
Page 3: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Agenda

• Part 1 – Lessons Learned
  – Incident Management
  – Change Management
  – Auto-scale
  – Cloud Optimization Tools and Capacity Planning

• Part 2 – Game Architecture, Analytics & Monetization
  – Game Architecture
  – Moving a live game
  – Analytics & Monetization
  – Cloud Insights

Page 4: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Incident Management

[Diagram labels: NOC; Ops; Dev; SME (Network, DBA, …); other monitoring tools…; Triage / Escalation / Communication]

Page 5: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

NOC, automated

[Diagram labels: Ops; Dev; Critical; Critical; Non-Critical; other monitoring tools…]

Application-level issue?
• Who’s the dev of this game? Phone #?
• I can’t find the dev… who’s his manager?
• Oh, the problem is in the backend service, who’s the dev for that service?

Page 6: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Alert Workflow - DevOps way

Each alert goes directly to the team that can resolve it!

[Diagram labels: Ops; Dev, Game X, Server; Dev, Game Y, Client/iOS; Dev, Service A; Dev, Service B; Analytics]

Page 7: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Alerts go to the person that can resolve it

Type | Scope | Checked by | Who to page?
ELB | Load balancer health-check | ELB | No one – email alert only
System-level | Check CPU / disk / memory / network | Pingdom / Nagios | Ops team
App-level | Application issues / bugs | Pingdom | Dev and Ops teams

App-level alerts can be triggered by issues in:

• Server-side
• Client-side
  – iOS
  – Android
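
The routing rules on this slide can be sketched as a small lookup table. This is only an illustration in Python: the `Alert` class, the `route` function, and the team names `"ops"` and `"dev"` are hypothetical and not part of any paging tool mentioned in the talk.

```python
from dataclasses import dataclass

# Routing table taken from the slide: alert type -> who checks it and who is paged.
ROUTES = {
    "elb": {"checked_by": "ELB", "page": []},                     # email alert only
    "system": {"checked_by": "Pingdom / Nagios", "page": ["ops"]},
    "app": {"checked_by": "Pingdom", "page": ["dev", "ops"]},
}


@dataclass
class Alert:
    kind: str      # "elb", "system" or "app"
    game: str      # hypothetical field: which game raised the alert
    message: str


def route(alert):
    """Return (teams_to_page, email_only) for an alert."""
    teams = list(ROUTES[alert.kind]["page"])
    return teams, not teams


print(route(Alert("app", "game-x", "login endpoint failing")))
```

Keeping the table as data means a new alert type only needs a new row, not new branching code.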

Page 8: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Dev and Ops are responsible

Team | In pager duty
Ops | 8
Dev | 32, from ~20 games (server-side or client-side, Android or iOS developers)
Analytics | 5

Page 9: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Big, Simple Status Dashboard

Page 10: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Big dashboard = quick status

Page 11: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Big dashboard=meta monitoring

Page 12: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

IM Bot informs in the game channel that an alert was triggered

Use IM Bot for status

Both Ops and Dev receive the alert, troubleshoot

IM Bot = collaboration

IM Bot detects that the issue is resolved and sends an all-clear

IM Bot = transparency
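
A minimal sketch of the alert / all-clear behaviour described on this slide, assuming a bot that is fed health-check results. The `StatusBot` class, the `post` callback, and the channel naming are hypothetical; the talk does not say which IM system or bot framework was used.

```python
class StatusBot:
    """Posts once when a check starts failing and once when it recovers."""

    def __init__(self, post):
        self.post = post            # callable(channel, text); hypothetical IM hook
        self.open_alerts = set()    # (game, check) pairs currently failing

    def on_check(self, game, check, healthy):
        key = (game, check)
        if not healthy and key not in self.open_alerts:
            self.open_alerts.add(key)
            self.post(f"#{game}", f"ALERT: {check} is failing")
        elif healthy and key in self.open_alerts:
            self.open_alerts.remove(key)
            self.post(f"#{game}", f"ALL CLEAR: {check} has recovered")


messages = []
bot = StatusBot(lambda channel, text: messages.append((channel, text)))
bot.on_check("game-x", "login", healthy=False)
bot.on_check("game-x", "login", healthy=False)  # still failing: no duplicate post
bot.on_check("game-x", "login", healthy=True)
```

Tracking open alerts in a set is what keeps the channel quiet while an incident is ongoing.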

Page 13: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Review your incidents and alerts

• Monday morning incident review meeting
  – Weekly on-call hand-over
  – Address false-positives / fine-tune your monitoring
  – Heads-up for events / major releases

• Problem management
  – Any major or recurrent incident = Problem
  – Problem = requires post-mortem
  – Remediation items from post-mortem also tracked weekly till closure

Page 14: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Incident Management Lessons Learned

• Use automatic paging/escalation tools
• Make the alerts go to the right team directly
• Use big display dashboard
• Use IM-bots to communicate outages
• Do weekly reviews of the incidents / alerts
• Do post-mortems, follow-up on remediation items

Page 15: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 16: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Change Management

Type | Content | Owner | Tool
Configuration Management | 3rd-party packages and configuration | Ops | Puppet
Release – code deploy | 1st-party code | Dev | Jenkins + in-house scripts
Release – asset deploy | 1st-party – images / new game content / new missions | Dev | Jenkins + in-house scripts

Page 17: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Configuration Management

[Workflow diagram labels: Ops do changes / test locally; push; peer review; syntax validation; not good; pull changes to prod puppet; pull; puppet clients (prod servers) pull changes]

Page 18: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Configuration Management Benefits

• Automate and speed-up deployment
• Repeatable
• Declarative modules/manifests = documentation
• All prod changes:
  – peer-reviewed via pull-requests in Git
  – validated by Puppet lint
  – locally tested via Vagrant (every component has a Vagrant VM)
  – communicated through email and IM
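
The validation step could be sketched as a small gate that runs before a change is merged. This is an assumption about how such a check might look, not the team's actual tooling: the talk only mentions Puppet lint, so the use of `puppet parser validate` for the syntax check and the helper names below are illustrative.

```python
import subprocess


def validation_commands(manifests):
    """Build the commands to run for each changed manifest."""
    cmds = []
    for path in manifests:
        cmds.append(["puppet", "parser", "validate", path])  # syntax check (assumed)
        cmds.append(["puppet-lint", path])                   # style check
    return cmds


def validate(manifests, run=subprocess.run):
    """Return True only if every command exits with status 0."""
    for cmd in validation_commands(manifests):
        if run(cmd).returncode != 0:
            return False
    return True
```

The `run` parameter exists so the gate can be exercised without Puppet installed, for example in a unit test with a stub runner.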

Page 19: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 20: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Release Management – Code deploy

[Deploy diagram labels: dev; dev; push; Deploy host; S3; QA; Beta; Prod]

If Prod deploy, in Ops channel of that project: (screenshot not reproduced)

In QA/dev channel of that project: (screenshot not reproduced)

Page 21: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 22: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Release Management – Asset deploy

[Flowchart labels: Dev kick off new asset deploy job; Run validation; Warns?; Override?; Yes; Yes; No; Code Review; Ops approval; Deploy to prod]

Page 23: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Change Management Lessons Learned

• Changes are made directly by the team that is responsible for that code
  – 3rd-party code is configuration management = owned by Ops
  – 1st-party code is release management = owned by Dev

• Changes are made through tools
  – Configuration management through Puppet
  – Release management through Jenkins + internal tool

• No change is done manually
• All changes are communicated and tracked

Page 24: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 25: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Auto-scale use-cases

– On-demand
  • for the daily traffic fluctuations and organic growth

– Scheduled
  • for in-game events

Page 26: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Auto-scale on-demand and scheduled

[Chart: CPU, # instances in ELB, # auto-scale instances]

Page 27: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Scheduled Auto-scale

1 - Scheduled pre-provisioning config enabled

[Chart: CPU, # instances in ELB, # auto-scale instances]

Scheduled action:

    as-put-scheduled-update-group-action ccios-app-ScheduledUpFriday --auto-scaling-group ccios-app-asg --recurrence "00 17 * * 5" --min-size 16

Page 28: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Scheduled Auto-scale

2 - Spare capacity in place, ready for event

[Chart: CPU, # instances in ELB, # auto-scale instances]

Page 29: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Scheduled Auto-scale

3 - Event starts, 4x spike

[Chart: CPU, # instances in ELB, # auto-scale instances]

Page 30: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

On-demand Auto-scale

4 – On-demand auto-scale reacts to CPU above 60% and adds more servers

[Chart: CPU, # instances in ELB, # auto-scale instances]

On-demand policy:

    as-put-scaling-policy ccios-app-ScaleUpPolicy60 --auto-scaling-group ccios-app-asg --adjustment=8 --type ChangeInCapacity

Page 31: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

On-demand Auto-scale

5 - Scheduled pre-provisioning config is removed

[Chart: CPU, # instances in ELB, # auto-scale instances]

Scheduled action:

    as-put-scheduled-update-group-action ccios-app-ScheduledDownFriday --auto-scaling-group ccios-app-asg --recurrence "0 21 * * 5" --min-size 2

Page 32: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

On-demand Auto-scale

6 – On-demand auto-scale terminates some instances as CPU drops below 40%

[Chart: CPU, # instances in ELB, # auto-scale instances]

On-demand policy:

    as-put-scaling-policy ccios-app-ScaleDownPolicy40 --auto-scaling-group ccios-app-asg --adjustment=-2 --type ChangeInCapacity
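
The interaction between the scheduled actions and the on-demand policies in steps 1 to 6 can be sketched as a small simulation. This is a simplified model under stated assumptions, not how the Auto Scaling service is implemented: the `max_size` of 64 is invented for the example, the schedule is read as UTC, and the Friday window is derived from the two recurrence expressions (17:00 up, 21:00 down).

```python
from datetime import datetime


def scheduled_min_size(now_utc, event_min=16, normal_min=2):
    """Minimum group size implied by the two Friday scheduled actions."""
    # Python's weekday() numbers Friday as 4; the cron field "5" also means Friday.
    in_window = now_utc.weekday() == 4 and 17 <= now_utc.hour < 21
    return event_min if in_window else normal_min


def desired_capacity(current, cpu, min_size, max_size=64):
    """Apply the on-demand policies: +8 above 60% CPU, -2 below 40% CPU."""
    if cpu > 60:
        current += 8
    elif cpu < 40:
        current -= 2
    # The group never leaves the [min_size, max_size] range.
    return max(min_size, min(max_size, current))


friday_event = datetime(2013, 11, 15, 18, 0)   # a Friday, inside the event window
floor = scheduled_min_size(friday_event)       # 16
print(desired_capacity(floor, cpu=75, min_size=floor))  # 16 + 8 = 24
```

The sketch shows why the scale-down policy cannot undo the pre-provisioning during the event: the scheduled minimum acts as a floor until it is lowered again at 21:00.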

Page 33: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Auto-scale bootstrap workflow

Event | Description | Duration
CloudWatch alarm is triggered | E.g. CPU > 60% for 5 minutes | 5 minutes
Auto-scale policy is executed | Launches n new instances | 2 minutes
User-data script is executed | This script is defined in the auto-scale launch config. Installs base packages, gets instance_id, IP and hostgroup | 1 minute
Bootstrap script is executed | This script is loaded from S3. It renames the host, runs Puppet, deploys code, starts the web service | 11 minutes
Health-check passes and servers start to get traffic | Health-check must pass before the ELB starts to send traffic to the new host | 1 minute
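
Adding up the durations listed for the bootstrap workflow gives the end-to-end reaction time. The short Python sketch below only does that arithmetic with the figures from the slide; the variable names are illustrative.

```python
# (event, minutes) pairs as listed in the bootstrap workflow.
BOOTSTRAP_STEPS = [
    ("CloudWatch alarm is triggered", 5),
    ("Auto-scale policy is executed", 2),
    ("User-data script is executed", 1),
    ("Bootstrap script is executed", 11),
    ("Health-check passes and servers start to get traffic", 1),
]

total_minutes = sum(minutes for _, minutes in BOOTSTRAP_STEPS)
slowest_step = max(BOOTSTRAP_STEPS, key=lambda step: step[1])

print(total_minutes)    # 5 + 2 + 1 + 11 + 1 = 20
print(slowest_step[0])  # Bootstrap script is executed
```

Roughly 20 minutes pass between the load increase and new capacity taking traffic, and the bootstrap script accounts for more than half of it.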

Page 34: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Auto-scale external dependencies

Dependency | How to resolve
Configuration Management (Puppet/Chef) | Pre-load all necessary packages in the AMI / architect HA for config management
External Repo | Pre-load all necessary packages in the AMI / set up internal HA repo
Code deploy | Same as above, or put in S3
Monitoring registration | Make it asynchronous
Server registration | Make it asynchronous
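
"Make it asynchronous" can be sketched as handing registration to a background worker so that a slow or failing registration service does not hold up the bootstrap. This is a generic Python illustration; the `register` callback, the task names, and the `bootstrap` function are hypothetical stand-ins for whatever registration calls a real setup makes.

```python
import queue
import threading


def start_registration_worker(register, tasks):
    """Run register(task) for each queued task in a background thread."""
    def worker():
        while True:
            task = tasks.get()
            if task is None:            # sentinel: stop the worker
                tasks.task_done()
                break
            try:
                register(task)
            except Exception:
                pass                    # a real worker would log and retry
            finally:
                tasks.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def bootstrap(host, tasks):
    """Queue the registrations and return without waiting for them."""
    tasks.put(("monitoring", host))
    tasks.put(("server-registry", host))
    return "serving"
```

With this shape the host can pass its health-check and take traffic even if monitoring registration finishes later.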

Page 35: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Auto-scale Lessons Learned

• Reduce time to spin-up new instances:
  – Pre-install all base packages into AMI

• Address those risks:
  – on-demand and scheduled AS conflicts
  – bootstrap validation and graceful termination
  – health-checks: keep it simple
  – keep some servers out of auto-scale pool, just in case
  – map and resolve/monitor external dependencies for auto-scale
  – consider using 2 different thresholds, for quicker ramp-up
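
The last point, two thresholds for quicker ramp-up, might look like the sketch below. Only the 60% threshold and the step of 8 come from the talk; the second threshold of 80% and the larger step of 16 are assumed values chosen purely for illustration.

```python
def scale_up_step(cpu, normal_at=60, urgent_at=80, normal_step=8, urgent_step=16):
    """Return how many instances to add for a given CPU percentage."""
    if cpu > urgent_at:      # assumed second threshold: ramp up faster
        return urgent_step
    if cpu > normal_at:      # threshold from the on-demand policy in the talk
        return normal_step
    return 0


for cpu in (50, 70, 90):
    print(cpu, scale_up_step(cpu))
```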

Page 36: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 37: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Cloud Optimization areas

Cost
• optimal # of RI
• hosts outside RI
• cost break-down using tags
• estimate on-demand costs

Usage
• under-utilized hosts
• overloaded hosts
• EBS/ELB not in use

Availability
• AZ / region distribution
• backup audit
• un-healthy instances in ELB
• ELB misconfigs

Security
• exposed DBs
• EC2 behind ELB exposed directly

Page 38: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Cloud Optimization tools

• AWS Trusted Advisor
• 3rd-party commercial tools
• Open Source tools (e.g. Netflix Ice)
• In-house tools
• Excel!

Page 39: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Cloud Optimization Lessons Learned

• Try Trusted Advisor

• Pilot 3rd.-party solutions

• Evaluate what metrics are important for each component of your architecture

• Do in-house development for other optimizations you need that are not covered by TA or 3rd. party solutions

• Tag all assets! Automate tagging!
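
A cost break-down by tag, and the reason untagged assets matter, can be sketched in a few lines. The resource records, the `game` tag key, and the cost figures below are made up for the example; a real report would read them from billing or inventory data.

```python
from collections import defaultdict


def cost_by_tag(resources, tag="game"):
    """Sum monthly cost per tag value and list resources missing the tag."""
    totals = defaultdict(float)
    untagged = []
    for resource in resources:
        value = resource.get("tags", {}).get(tag)
        if value is None:
            untagged.append(resource["id"])
            totals["(untagged)"] += resource["monthly_cost"]
        else:
            totals[value] += resource["monthly_cost"]
    return dict(totals), untagged


# Hypothetical inventory, for illustration only.
inventory = [
    {"id": "i-001", "monthly_cost": 100.0, "tags": {"game": "game-x"}},
    {"id": "i-002", "monthly_cost": 50.0, "tags": {"game": "game-x"}},
    {"id": "i-003", "monthly_cost": 75.0, "tags": {}},
]
totals, untagged = cost_by_tag(inventory)
print(totals, untagged)
```

Anything that lands in the "(untagged)" bucket cannot be attributed to a game, which is why automated tagging pays off.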

Page 40: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 41: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

GREE Games

• All Mobile, all Free-to-Play
  – iOS & Android smart phones
  – Big focus on tablets

• Role Playing Games (RPG)
  – Multi-million dollar franchise, top-grossing titles
  – Some of the oldest games on the App Store

• Hardcore
  – Deeper more intense gameplay mechanics

• Real-Time Strategy (RTS)
  – Fast action, small unit management

• Casino & Casual Games
  – Familiar games, wider audience, casual play

Page 42: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Example Game Architecture – RPG

• Application Servers
  – PHP
  – Game events Analytics

• Cache Layer
  – Memcached Elasticache

• Batch Processing Servers
  – Node.js (moving to Go)
  – Batches database writes

• Database
  – MySQL RDS

[Architecture diagram labels: ELB; App (×4); Cache (×4); Batch (×2); RDS (×3); Failover DB]

Page 43: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Caching Strategy - Current

• Game architecture predates stable NoSQL
  – We wanted similar performance at scale
  – Keep combined average internal response times below 300ms

• Memcache Authoritative
  – Still use an RDBMS; potential data loss is limited

• Allows for cheaper/simpler DB layer
  – Always do full row replacements (i.e. no current_row_value + 1)

Page 44: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Data Flow

• Reads
  – ELB → App → Cache

• Writes (Synchronous)
  – ELB → App → Cache → DB
  – ELB → App → Cache → Batch → DB
  – Standard write-through
  – No blind writes; always fetch current ver.

• Writes (Asynchronous)
  – Batch → DB
  – Batch writes to DB every 30 seconds

[Diagram labels: ELB; App (×4); Cache (×4); Batch (×2); RDS (×3)]
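
The synchronous write path (fetch the current version, replace the full row, write through to the database) can be sketched as follows. Plain dictionaries stand in for memcached and MySQL, and the `VersionedStore` class is a hypothetical illustration rather than the game's actual code.

```python
class VersionedStore:
    """Cache-authoritative store with versioned, full-row write-through."""

    def __init__(self):
        self.cache = {}   # stands in for memcached
        self.db = {}      # stands in for MySQL

    def read(self, key):
        # Reads are served from the cache only.
        return self.cache.get(key)

    def write(self, key, update):
        # No blind writes: always start from the current version.
        current = self.cache.get(key, {"version": 0, "row": {}})
        new_row = update(dict(current["row"]))
        record = {"version": current["version"] + 1, "row": new_row}
        self.cache[key] = record
        self.db[key] = record   # full row replacement, never "value + 1" in SQL
        return record


store = VersionedStore()
store.write("player:1", lambda row: {**row, "gold": 10})
store.write("player:1", lambda row: {**row, "gold": row["gold"] + 5})
print(store.read("player:1"))
```

Because the application computes the whole new row, the database only ever sees replacements, which is what allows the simpler DB layer described in the caching strategy.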

Page 45: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Batch Processor

• 80% of game write traffic is Async
  – Each write is versioned

• Example: Player items (loot) after multiple quests
  – 10 items in 30 sec; app server sends 10 writes downstream
  – Batch processor sends last record with final item count to DB

• Greatly reduced writes on DB
  – Shard at table and DB server level for larger games
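
The loot example can be sketched as write coalescing: the batch processor keeps only the highest version per key and flushes once per interval. The `BatchProcessor` class and key names are hypothetical, a dictionary stands in for the database, and the 30-second timer is left out so the flush can be called directly.

```python
class BatchProcessor:
    """Keeps the newest versioned write per key until the next flush."""

    def __init__(self, db):
        self.db = db
        self.pending = {}   # key -> (version, row)

    def enqueue(self, key, version, row):
        held = self.pending.get(key)
        if held is None or version > held[0]:
            self.pending[key] = (version, row)

    def flush(self):
        """Write pending rows to the DB; return how many writes were issued."""
        written = len(self.pending)
        for key, (version, row) in self.pending.items():
            self.db[key] = {"version": version, "row": row}
        self.pending.clear()
        return written


db = {}
batch = BatchProcessor(db)
for count in range(1, 11):   # 10 item pickups within one flush interval
    batch.enqueue("player:1:items", version=count, row={"item_count": count})
flushed = batch.flush()
print(flushed, db["player:1:items"])
```

Ten upstream writes become a single database write carrying the final item count.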

Page 46: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Near Future Trends for GREE OPS

• Multi-region games
  – Latency-sensitive games and the shift towards real-time
  – Geographic data replication challenges

• Continuous Delivery

• Automation of Game Studio tasks
  – Game design, art, data/asset deploy
  – Tighter event pre-provisioning and scale-down

Page 47: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

More Performance – Lower Costs

• Facebook HipHop Virtual Machine
  – JIT compilation & execution of PHP
  – 5x faster vs. Zend PHP 5.2
  – Achieved 3x to 4x reduction in application server count
  – https://github.com/facebook/hhvm

• Google Go
  – Used for high-concurrency applications
  – Achieved 2x reduction in batch processing servers vs. Node.js
  – http://golang.org

Page 48: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 49: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Moving a Game – Why?

• Physical datacenter to AWS
  – West coast → East coast
  – Faster access to EU markets & players

• Reduce necessary attention to infrastructure
  – Caching & DB layer; custom high-availability middleware

• Take advantage of cloud provisioning
  – Scripted instance spin-ups, auto-scaling for events/load

• Save money
  – Reduce stand-by server pool
  – Provision for average load, not peak

Page 50: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Moving a Live Game – Whaaaaat?

• Live game, two platforms (iOS, Android)
  – Several million $$$ in combined monthly revenue
  – More than one million unique players/month

• ~ 30GB Dataset

• Minimal downtime (< 5 minutes)
  – Mostly to allow for change to reverse proxy config

• Debian → CentOS
• Physical machines → AWS

Page 51: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Moving a Live Game - How

• Develop timeline
• R&D & architecture review
• Data migration & sync
• Game server/client updates
• Load testing
• D-Day steps & checklist

Page 52: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Moving a Live Game - Timeline

• 3 months overall

• DB dataset transfer validation
  – Set up direct MySQL to RDS replication
  – Initial DB transfer time: approx. 8 hours

• Functional & performance testing
  – Load & capacity profile for application, DB servers
  – Heavy use of APM metrics – New Relic

Page 53: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Moving a Live Game - Architecture

• Changes required
  – Caching – discrete memcached to Elasticache nodes
  – Database – physical MySQL DB servers to RDS
    • Decided to drop internally developed MySQL proxy
      – Bittersweet: great automatic failover; limited internal knowledge
    • RDS failover mechanics added to possible game downtime
  – Load balancers
    • LVS to ELB

• Processes
  – Code asset deployment

Page 54: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Moving a Live Game – D-Day

• Put game into maintenance (shutdown)
• Break DB replication (west → east)
• Set up reverse proxy in datacenter
  – Forward traffic from west → east AWS ELB

• Bring game back online
  – Reverse proxy sends traffic to AWS

• Update DNS to point to ELB
  – Wait for DNS propagation
  – Slow DNS updates hit the reverse proxy in datacenter

Page 55: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Moving a Live Game – Before & After

Page 56: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 57: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Analytics & Monetization

• Specialize in “Live Events”
  – Higher player engagement (fun!) = more revenue

• Single-player events
  – “Epic Boss”
  – Limited-time quests

• Player organization events
  – Guild vs. Guild battles (World Domination, Syndicate Wars)
  – Raid Bosses – members help to take down a tough NPC
  – Tap into social “meta-gaming”

Page 58: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Modern War World Domination Results (August 2013)

Page 59: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Analytics for Player Engagement

• Player retention
  – 1st week and beyond
  – Tutorial completion rates

• Balancing mechanics
  – Player vs. Environment (PvE), Player vs. Player (PvP)
  – Encourage interaction with other players

• When too much good can be bad
  – Analytics needs to be paired with player feedback
  – Fun for all players, payers AND non-payers

Page 60: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Analytics for Decision-making

• Devices & Markets
  – Understand most popular devices (esp. Android)
  – Focus efforts on the top devices for your market

• Launching a game
  – “Soft-launch” – only launch in certain markets, tune game
  – “Hard-launch” – money down (marketing), marquee live events

• When to sunset & decommission
  – Depends on strategic goals, infra/engineering costs, etc.

Page 61: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Analytics – Some Scale

• Over 5000 transactions/sec sent to Analytics
• Several billion game events per day
  – Attacking, winning, losing, buying, clicking, swiping, etc.

• Anticipating 10x increase in next two years
• Building petabyte scale data warehouse capacity

Page 62: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Analytics Pipeline

• Working towards “zero-latency” pipeline
  – Latency = ETL, summarization, reporting & dashboard
  – Already reduced from 24 hours to 1 hour in last year

Page 63: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming


Page 64: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Cloud Insights

• Agility (Time to Deliver)
• Elasticity – scale up/down quickly
  – Auto Scaling is critical

• Service simplification (RDS/Elasticache/ELB)
• Professional development for OPS Team
  – Physical (Datacenter/Network focus) vs. Virtual (DevOps focus)

Page 65: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Cloud Insights – Lessons Learned

• Reliability & performance consistency varies
• Stuff breaks often
  – Develop an “anti-fragile” mindset; build to anticipate failure

• Cost-predictability still elusive
• Orphaned servers
  – Easy to create; must constantly clean up

• Large-scale monitoring is hard
  – No silver bullet yet

Page 66: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming

Thank You

• Thanks to the GREE OPS & Engineering Teams!

[email protected]

[email protected]

• We’re Hiring DevOps Team Members!!

http://gree-corp.com/jobs

Page 67: AWS re:Invent 2013 - MBL303 Gaming Ops - Running High-performance Ops for Mobile Gaming
