Using MongoDB as a Central Queue for Distributed Job Processing

Shaddy Zeineddine: Queuing w/ MongoDB & BreakMedia's API


Shaddy Zeineddine presented Queuing w/ MongoDB & Break Media's API on April 23rd at Factual.



Presented by...

Shaddy Zeineddine <[email protected]>
Software Developer
sandbox.chunkofwood.com
www.linkedin.com/in/shaddyz

Through experiences at...


Web Services API Motives - High Level Requirements

> I want one search engine for all our properties.
> I want to encode videos into a set of formats which are compatible with all browsers and devices.
> I want to display and promote related content across properties.
> I want to display thumbnails to users as soon as they upload a video.
> All new projects must be scalable, load balanced, highly available.


Initial System Design


Why use MongoDB?

“Natural fit” - Minimal relations between data & native JSON interchange format

Super simple replication

Awesome PHP client driver

Good documentation

Readily available support from 10gen

High, scalable performance

Developer centric

OOP friendly


The RDBMS Schema Problem


Distributed job processing cases

> Cron jobs/scheduled tasks
    Jobs are run at predefined times
> Callback processing
    FIFO processing for initial attempt
    Subsequent attempts processed after variable waiting time
> Video encoding
    Priority based processing, then FIFO
    Minimize input video transfers
> Immediate thumbnail generation from uploaded videos
    Uploaded videos only available on one node
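
The callback case above can be expressed as a priority function. The sketch below is an assumption, not the implementation from the talk: it treats the priority as the Unix timestamp at which a job becomes eligible, so first attempts sort in FIFO order and retries sort later by a growing delay. The function name `callbackPriority`, the attempt counter, and the doubling delay with a one-hour cap are all hypothetical.

```php
<?php
// Hypothetical sketch: priority = earliest time the job should be processed.
// Lower values are dequeued first, so first attempts come out in FIFO order.
function callbackPriority(int $timeQueued, int $attempts, int $baseDelay = 60): int
{
    if ($attempts === 0) {
        // Initial attempt: eligible immediately, ordered by enqueue time.
        return $timeQueued;
    }
    // Retry: wait baseDelay * 2^(attempts - 1) seconds, capped at one hour.
    $delay = min($baseDelay * (2 ** ($attempts - 1)), 3600);
    return $timeQueued + $delay;
}

$now = 1366600800;
echo callbackPriority($now, 0) - $now, "\n"; // 0
echo callbackPriority($now, 1) - $now, "\n"; // 60
echo callbackPriority($now, 3) - $now, "\n"; // 240
```

Encoding the waiting time in the priority keeps retries in the same queue as first attempts, which fits the idea of a single priority queue serving every case.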


One queue to rule them all: Priority Queue

> Priority is defined by the implementation

> Worker-aware

> Designed to be altered


Processor Daemon Breakdown

> “Manager” parent process
    > starts/stops child processes
    > listens for signals {SIGTERM, SIGHUP} (SIGKILL cannot be trapped; it ends the process immediately)
> “Worker” child processes
    > polls queue
    > processes job
    > dies after processing
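
The manager/worker split above can be sketched with PHP's pcntl extension. This is a minimal illustration under stated assumptions, not the daemon from the talk: `runWorker()`, the worker limit, and the cap of four spawned workers (added only so the demo terminates) are hypothetical, and the script needs the pcntl extension on the command line.

```php
<?php
// Minimal manager/worker sketch. Requires the pcntl extension (CLI only).
// runWorker() and the limits below are hypothetical placeholders.
function runWorker(int $id): void
{
    // A real worker would poll the queue here and process a single job.
    usleep(100000);
}

pcntl_async_signals(true);
$running = true;
// SIGTERM asks the manager to stop spawning new workers.
pcntl_signal(SIGTERM, function () use (&$running) {
    $running = false;
});

$maxWorkers = 2;   // concurrent workers
$maxSpawns = 4;    // bounded so the demo terminates
$spawned = 0;
$children = [];

while ($running && $spawned < $maxSpawns) {
    while (count($children) < $maxWorkers && $spawned < $maxSpawns) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            exit(1);               // fork failed
        }
        if ($pid === 0) {
            runWorker($spawned);
            exit(0);               // worker dies after one job
        }
        $children[$pid] = true;    // manager records the child
        $spawned++;
    }
    // Block until any child exits, then free its slot.
    $done = pcntl_wait($status);
    if ($done > 0) {
        unset($children[$done]);
    }
}

// Reap the remaining workers before the manager exits.
while (count($children) > 0) {
    $done = pcntl_wait($status);
    if ($done <= 0) {
        break;
    }
    unset($children[$done]);
}
echo "spawned $spawned workers\n";
```

Having each worker exit after a single job means any memory it leaked is reclaimed by the operating system, at the cost of one fork per job.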


The Queue Collection

Default document schema

    public function enqueue(Job &$job)
    {
        $this->preEnqueue($job);
        $jobArray = $job->toArray();
        // Bookkeeping fields added to every queued document
        $jobArray['_timeQueued'] = time();
        $jobArray['_worker'] = $this->getWorker($jobArray);
        $jobArray['_priority'] = $this->getPriority($jobArray);
        $this->db->insert($jobArray);
    }

Cronjob document schema

    > db.cronjobQueue.findOne()
    {
        "_id" : ObjectId("517496db1f2c8ad317000000"),
        "eventId" : "7fad1ff51a00924dd4991a91bb045559",
        "jobName" : "Demo.HelloWorld",
        "jobParams" : "",
        "locked" : 0,
        "runAt" : 1366600800
    }


The Queue Collection

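Default dequeuing

    // Remove and return the next job by priority among those assigned
    // to this worker or left unassigned ('')
    $jobArray = $this->doDequeueQuery(
        array('_worker' => array('$in' => array($this->worker, ''))),
        null,
        null,
        array(
            'sort' => array('_priority' => $this->priorityOrder),
            'remove' => true
        )
    );

Cronjob dequeuing

    // Fetch the unlocked event with the earliest runAt,
    // returning only the runAt field (plus _id)
    $nextEvent = $this->db->find(array('locked' => 0), array('runAt' => 1))
        ->sort(array('runAt' => 1))
        ->limit(1)
        ->getNext();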
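
To make the default enqueue and dequeue behaviour concrete without a MongoDB server, here is an in-memory stand-in. It is a sketch under assumptions: the class `InMemoryQueue`, the job names other than those in the slides, the descending default for the priority order, and the insertion-order tie-break are all hypothetical. In the real system the select-and-remove step is done in one atomic database operation, which a plain PHP array does not give you across processes.

```php
<?php
// In-memory stand-in for the queue collection; all names are hypothetical.
class InMemoryQueue
{
    private array $docs = [];

    // Mirrors enqueue(): stamp the bookkeeping fields, then store the document.
    public function enqueue(array $job, string $worker = '', int $priority = 0): void
    {
        $job['_timeQueued'] = time();
        $job['_worker'] = $worker;      // '' means any worker may take the job
        $job['_priority'] = $priority;
        $this->docs[] = $job;
    }

    // Mirrors the default dequeue: match on worker, order by priority, remove.
    // $priorityOrder follows the MongoDB sort convention: -1 descending, 1 ascending.
    public function dequeue(string $worker, int $priorityOrder = -1): ?array
    {
        $bestKey = null;
        foreach ($this->docs as $key => $doc) {
            if (!in_array($doc['_worker'], [$worker, ''], true)) {
                continue;               // reserved for a different worker
            }
            if ($bestKey === null) {
                $bestKey = $key;
                continue;
            }
            $cmp = ($doc['_priority'] <=> $this->docs[$bestKey]['_priority']) * $priorityOrder;
            if ($cmp < 0) {
                $bestKey = $key;        // strictly better; ties keep the earlier job
            }
        }
        if ($bestKey === null) {
            return null;                // nothing available for this worker
        }
        $job = $this->docs[$bestKey];
        unset($this->docs[$bestKey]);
        return $job;
    }
}

$queue = new InMemoryQueue();
$queue->enqueue(['jobName' => 'Encode.Low'], '', 1);
$queue->enqueue(['jobName' => 'Thumbnail.Generate'], 'node-a', 5);
$queue->enqueue(['jobName' => 'Encode.High'], '', 5);

echo $queue->dequeue('node-b')['jobName'], "\n"; // Encode.High
echo $queue->dequeue('node-a')['jobName'], "\n"; // Thumbnail.Generate
```

The worker filter is what lets a job such as thumbnail generation be pinned to the one node that holds the uploaded file, while unpinned jobs remain available to every node.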


Cronjob Processing: A Closer Look

> Queue is populated by a static file

> Jobs are run at predefined times

    # Break API Cron Schedule
    #
    # * * * * * Job Parameters
    # ┬ ┬ ┬ ┬ ┬ ┬ ┬
    # │ │ │ │ │ │ └ JSON object of parameters ("{'source': 'mademan'}")
    # │ │ │ │ │ └───── Job name (e.g. "Search.Sync")
    # │ │ │ │ └──────── day of week (0 - 6) (0 is Sunday, or use names)
    # │ │ │ └───────────── month (1 - 12)
    # │ │ └────────────────── day of month (1 - 31)
    # │ └─────────────────────── hour (0 - 23)
    # └──────────────────────────── min (0 - 59)
    #

    # Synchronize & Optimize the Solr index from MongoDB @ 2:16 am
    */5 * * * * Search.Sync
    @daily Search.UpdateFeatures
    */10 * * * * Encode.EnqueueJobs
    */10 * * * * Encode.UpdateJobPriorities
    @daily Encode.PurgeOldJobs
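
One way the static schedule file could be turned into queue documents is to parse each line into a schedule, a job name, and an optional parameter string. The parser below is a hypothetical sketch, not the code from the talk: the function name `parseScheduleLine` and the `schedule` key are assumptions, and it deliberately does not compute `runAt`, since evaluating cron expressions is a separate problem.

```php
<?php
// Hypothetical parser for one schedule line.
// Returns null for comments, blank lines, and lines without a job name.
function parseScheduleLine(string $line): ?array
{
    $line = trim($line);
    if ($line === '' || $line[0] === '#') {
        return null;
    }
    // "@daily"-style shortcuts use one field; otherwise five cron fields.
    $fieldCount = ($line[0] === '@') ? 1 : 5;
    // The limit keeps everything after the job name as one untouched string.
    $parts = preg_split('/\s+/', $line, $fieldCount + 2);
    if (count($parts) <= $fieldCount) {
        return null;
    }
    return [
        'schedule'  => implode(' ', array_slice($parts, 0, $fieldCount)),
        'jobName'   => $parts[$fieldCount],
        'jobParams' => $parts[$fieldCount + 1] ?? '',
        'locked'    => 0,
    ];
}

$entry = parseScheduleLine('*/10 * * * * Encode.EnqueueJobs');
echo $entry['schedule'], "\n"; // */10 * * * *
echo $entry['jobName'], "\n";  // Encode.EnqueueJobs
```

The `jobName`, `jobParams`, and `locked` keys match the cronjob document shown earlier, so a caller would only need to add `eventId` and a computed `runAt` before inserting.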


Demonstrate the Cronjob processor

Please stand by...


Additional Challenges

> A job fails while being processed
    Re-enqueue incomplete jobs from a secondary collection
> A worker is terminated while processing a job
    Reset all jobs associated with the worker on startup
> Providing up-to-date progress information to other nodes
    Maintain progress in secondary collection
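
The startup recovery step can be illustrated with plain arrays standing in for the two collections. This is a sketch under assumptions, not the implementation from the talk: the function `recoverJobs`, the entry layout (`worker`, `progress`, `job`), and the node names are hypothetical.

```php
<?php
// Hypothetical in-memory model: $inProgress stands in for the secondary
// collection, $queue for the main queue collection.
function recoverJobs(array &$inProgress, array &$queue, string $worker): int
{
    $recovered = 0;
    foreach ($inProgress as $id => $entry) {
        if ($entry['worker'] !== $worker) {
            continue;                 // owned by another node, leave it alone
        }
        $queue[] = $entry['job'];     // re-enqueue the unfinished job
        unset($inProgress[$id]);      // clear the stale tracking entry
        $recovered++;
    }
    return $recovered;
}

$inProgress = [
    'j1' => ['worker' => 'node-a', 'progress' => 40, 'job' => ['jobName' => 'Encode.Video']],
    'j2' => ['worker' => 'node-b', 'progress' => 10, 'job' => ['jobName' => 'Search.Sync']],
];
$queue = [];

echo recoverJobs($inProgress, $queue, 'node-a'), "\n"; // 1
echo count($inProgress), "\n";                         // 1
```

Because each entry records which worker owns it, a restarting node can reset only its own jobs and leave work that is still running elsewhere untouched.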


Questions?