Shaddy Zeineddine presented Queuing w/ MongoDB & Break Media's API on April 23rd at Factual.
Using MongoDB as a Central Queue for Distributed Job Processing
Shaddy Zeineddine <[email protected]>
Software Developer
sandbox.chunkofwood.com
www.linkedin.com/in/shaddyz
Presented by...
Through experiences at...
Web Services API: Motives & High-Level Requirements
> I want one search engine for all our properties.
> I want to encode videos into a set of formats which are compatible with all browsers and devices.
> I want to display and promote related content across properties.
> I want to display thumbnails to users as soon as they upload a video.
> All new projects must be scalable, load balanced, highly available.
Initial System Design
Why use MongoDB?
> “Natural fit”: minimal relations between data & native JSON interchange format
> Super simple replication
> Awesome PHP client driver
> Good documentation
> Readily available support from 10gen
> High, scalable performance
> Developer centric
> OOP friendly
The RDBMS Schema Problem
Distributed job processing cases
> Cron jobs/scheduled tasks
Jobs are run at predefined times
> Callback processing
FIFO processing for initial attempt
Subsequent attempts processed after variable waiting time
> Video encoding
Priority-based processing, then FIFO
Minimize input video transfers
> Immediate thumbnail generation from uploaded videos
Uploaded videos only available on one node
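The variable waiting time between callback attempts can be derived from the attempt count. The helper below is a hypothetical sketch, not the talk's implementation: the name nextRunAt, the base delay, and the exponential schedule are all assumptions; it produces a runAt timestamp like the one in the cronjob schema.

```php
<?php
// Hypothetical backoff helper (not the talk's actual code): the first
// callback attempt runs immediately (FIFO); each retry waits twice as
// long as the previous one before its runAt timestamp comes due.
function nextRunAt($attempt, $baseDelay = 60, $now = null)
{
    $now = ($now === null) ? time() : $now;
    if ($attempt === 0) {
        return $now;                                  // initial attempt: no delay
    }
    return $now + $baseDelay * (2 ** ($attempt - 1)); // 60s, 120s, 240s, ...
}
```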
One queue to rule them all: Priority Queue
> Priority is defined by the implementation
> Worker-aware
> Designed to be altered
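The enqueue() method shown later calls getWorker() and getPriority(), but their bodies are not in the deck. Here is a minimal sketch of what "implementation-defined" priority and worker-awareness could look like; the weights, the sourceNode field, and the aging rule are invented for illustration.

```php
<?php
// Hypothetical priority function: lower values dequeue first in this
// sketch. Weights per job name are invented; older jobs gain priority
// so low-priority work cannot starve.
function getPriority(array $job)
{
    $weights = array(
        'Encode.Video'    => 10,
        'Search.Sync'     => 50,
        'Demo.HelloWorld' => 90,
    );
    $base = isset($weights[$job['jobName']]) ? $weights[$job['jobName']] : 100;
    return $base - (int) ((time() - $job['_timeQueued']) / 600);
}

// "Worker-aware": pin a job to one node when its input only exists
// there (e.g. a freshly uploaded video); '' means any worker may claim it.
function getWorker(array $job)
{
    return isset($job['sourceNode']) ? $job['sourceNode'] : '';
}
```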
Processor Daemon Breakdown
> “Manager” parent process
> starts/stops child processes
> listens for signals {SIGTERM, SIGHUP} (SIGKILL cannot be caught)
> “Worker” child processes
> polls queue
> processes job
> dies after processing
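A skeletal version of that manager/worker split, assuming PHP's pcntl extension. The structure is a sketch: the real daemon polls the queue inside each child, while here the child exits immediately to stay self-contained.

```php
<?php
// Sketch of the manager/worker daemon (assumes the pcntl extension).
// Each worker child would poll the queue, process one job, then die.
declare(ticks=1);

$running = true;
pcntl_signal(SIGTERM, function () use (&$running) { $running = false; });
pcntl_signal(SIGHUP,  function () { /* reload config, restart children */ });
// SIGKILL cannot be caught; the manager can only clean up on next startup.

$workers = array();
for ($i = 0; $i < 2; $i++) {
    $pid = pcntl_fork();
    if ($pid === 0) {
        exit(0); // worker: poll queue, process job, then die
    }
    $workers[] = $pid;
}

foreach ($workers as $pid) {
    pcntl_waitpid($pid, $status); // manager reaps (and would respawn) workers
}
```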
The Queue Collection
Default document schema
Cronjob document schema
public function enqueue(Job &$job)
{
    $this->preEnqueue($job);
    $jobArray = $job->toArray();
    $jobArray['_timeQueued'] = time();
    $jobArray['_worker'] = $this->getWorker($jobArray);
    $jobArray['_priority'] = $this->getPriority($jobArray);
    $this->db->insert($jobArray);
}
> db.cronjobQueue.findOne()
{
    "_id" : ObjectId("517496db1f2c8ad317000000"),
    "eventId" : "7fad1ff51a00924dd4991a91bb045559",
    "jobName" : "Demo.HelloWorld",
    "jobParams" : "",
    "locked" : 0,
    "runAt" : 1366600800
}
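enqueue() receives a Job and calls toArray(), but the Job class itself is not shown in the deck. This is a hypothetical minimal version whose output matches the user-supplied fields of the default document schema; the eventId scheme is an assumption.

```php
<?php
// Hypothetical Job class: produces the user-supplied fields of the
// default document schema; enqueue() adds _timeQueued, _worker, _priority.
class Job
{
    private $name;
    private $params;

    public function __construct($name, $params = '')
    {
        $this->name   = $name;
        $this->params = $params;
    }

    public function toArray()
    {
        return array(
            'eventId'   => md5(uniqid($this->name, true)), // assumed id scheme
            'jobName'   => $this->name,
            'jobParams' => $this->params,
        );
    }
}

$doc = (new Job('Demo.HelloWorld'))->toArray();
```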
The Queue Collection
Default dequeuing
Cronjob dequeuing
$jobArray = $this->doDequeueQuery(
    array('_worker' => array('$in' => array($this->worker, ''))),
    null,
    null,
    array(
        'sort'   => array('_priority' => $this->priorityOrder),
        'remove' => true,
    )
);
$nextEvent = $this->db->find(array('locked' => 0), array('runAt' => 1))
    ->sort(array('runAt' => 1))
    ->limit(1)
    ->getNext();
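doDequeueQuery presumably wraps MongoDB's findAndModify command, whose sort and remove options make "find the best job and delete it" a single atomic server-side step, so two workers can never claim the same job. Below is an in-memory sketch of those semantics, not the driver call itself; the field names follow the queue schema above.

```php
<?php
// In-memory sketch of findAndModify(sort, remove) semantics: select the
// best-priority job this worker may claim and remove it in one step.
function dequeue(array &$queue, $worker)
{
    $bestKey = null;
    foreach ($queue as $key => $job) {
        if ($job['_worker'] !== '' && $job['_worker'] !== $worker) {
            continue; // pinned to a different node
        }
        if ($bestKey === null || $job['_priority'] < $queue[$bestKey]['_priority']) {
            $bestKey = $key; // ascending _priority, one possible sort order
        }
    }
    if ($bestKey === null) {
        return null; // nothing claimable
    }
    $job = $queue[$bestKey];
    unset($queue[$bestKey]); // the "remove" half of the atomic step
    return $job;
}

$queue = array(
    array('jobName' => 'A', '_priority' => 50, '_worker' => ''),
    array('jobName' => 'B', '_priority' => 10, '_worker' => 'node-2'),
    array('jobName' => 'C', '_priority' => 20, '_worker' => ''),
);
```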
Cronjob Processing: A Closer Look
> Queue is populated by a static file
> Jobs are run at predefined times
# Break API Cron Schedule
#
# * * * * * Job Parameters
# ┬ ┬ ┬ ┬ ┬ ┬ ┬
# │ │ │ │ │ │ └ JSON object of parameters ("{'source': 'mademan'}")
# │ │ │ │ │ └───── Job name (e.g. "Search.Sync")
# │ │ │ │ └──────── day of week (0 - 6) (0 is Sunday, or use names)
# │ │ │ └───────────── month (1 - 12)
# │ │ └────────────────── day of month (1 - 31)
# │ └─────────────────────── hour (0 - 23)
# └──────────────────────────── min (0 - 59)
#
# Synchronize & Optimize the Solr index from MongoDB @ 2:16 am
*/5 * * * * Search.Sync
@daily Search.UpdateFeatures
*/10 * * * * Encode.EnqueueJobs
*/10 * * * * Encode.UpdateJobPriorities
@daily Encode.PurgeOldJobs
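Each non-comment line of the schedule is five cron time fields (or an @shortcut like @daily) followed by the job name and optional JSON parameters. A hypothetical parser for a single line of that file, sketched from the legend above:

```php
<?php
// Hypothetical parser for one schedule line: five cron time fields
// (or an @shortcut like @daily) + job name + optional JSON parameters.
function parseScheduleLine($line)
{
    $line = trim($line);
    if ($line === '' || $line[0] === '#') {
        return null; // blank line or comment
    }
    $timeFields = ($line[0] === '@') ? 1 : 5;
    // Limit the split so the JSON parameter object keeps its own spaces.
    $parts = preg_split('/\s+/', $line, $timeFields + 2);
    return array(
        'schedule'  => implode(' ', array_slice($parts, 0, $timeFields)),
        'jobName'   => $parts[$timeFields],
        'jobParams' => isset($parts[$timeFields + 1]) ? $parts[$timeFields + 1] : '',
    );
}
```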
Demonstrate the Cronjob processor
Please stand by...
Additional Challenges
> A job fails while being processed
Re-enqueue incomplete jobs from a secondary collection
> A worker is terminated while processing a job
Reset all jobs associated to the worker on startup
> Providing up-to-date progress information to other nodes
Maintain progress in secondary collection
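Both recovery cases reduce to moving documents from the secondary collection back into the queue. An in-memory sketch of the startup reset; the collection layout and the _progress field are assumptions, not the talk's schema.

```php
<?php
// Sketch of worker-startup recovery: any job the crashed worker left in
// the secondary "processing" collection is pushed back onto the queue.
function resetWorkerJobs(array &$processing, array &$queue, $worker)
{
    $reset = 0;
    foreach ($processing as $key => $job) {
        if ($job['_worker'] === $worker) {
            unset($job['_progress']);     // hypothetical progress field
            $queue[] = $job;              // re-enqueue the incomplete job
            unset($processing[$key]);
            $reset++;
        }
    }
    return $reset;
}

$processing = array(
    array('jobName' => 'Encode.Video', '_worker' => 'node-1', '_progress' => 40),
    array('jobName' => 'Search.Sync',  '_worker' => 'node-2', '_progress' => 10),
);
$queue = array();
```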
Questions?