GRID COMPUTING: A REVIEW OF CONDOR, SGE AND PBS
Sandeep Kumar Poonia
Head of Dept. CS/IT, Jagan Nath University, Jaipur
B.E., M. Tech., UGC-NET
LM-IAENG, LM-IACSIT,LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
11/16/2013 Sandeep Kumar Poonia
Condor
Condor is a resource management and job scheduling system, a research project from the University of Wisconsin–Madison.
Condor platforms
HP systems running HPUX 10.20
Sun SPARC systems running Solaris 2.6/2.7/8/9
SGI systems running IRIX 6.5 (not fully supported)
Intel x86 systems running Red Hat Linux 7.1/7.2/7.3/8.0/9.0, Windows NT 4.0, XP and 2003 Server (the Windows systems are not fully supported)
Alpha systems running Digital UNIX 4.0, Red Hat Linux 7.1/7.2/7.3 and Tru64 5.1 (not fully supported)
PowerPC systems running Macintosh OS X and AIX 5.2L (not fully supported)
Itanium systems running Red Hat Linux 7.1/7.2/7.3 (not fully supported)
Windows systems (not fully supported)
The architecture of a Condor pool
Resources in Condor are normally organized in the form of Condor pools. A pool is an administrated domain of hosts, not specifically dedicated to a Condor environment.
A Condor pool normally has one Central Manager (master host) and an arbitrary number of Execution (worker) hosts.
A Condor Execution host can be configured as a job Execution host or a job Submission host or both.
The Central Manager host is used to manage resources and jobs in a Condor pool.
Host machines in a Condor pool may not be dedicated to Condor.
If the Central Manager host in a Condor pool crashes, jobs that are already running will continue to run unaffected.
Queued jobs will remain in the queue unharmed, but they cannot begin running until the Central Manager host is restarted.
Daemons in a Condor pool
A daemon is a program that runs in the background once started. To configure a Condor pool, the following Condor daemons need to be started.
Daemons in a Condor pool
The condor_master daemon runs on each host in a Condor pool to keep all the other daemons running in the pool.
It spawns daemons such as condor_startd and condor_schedd, and periodically checks if there are new binaries installed for any of these daemons.
If so, the condor_master will restart the affected daemons.
In addition, if any daemon crashes, the master will send an email to the administrator of the Condor pool and restart the daemon.
The condor_master also supports various administrative commands, such as starting, stopping or reconfiguring daemons remotely.
Daemons in a Condor pool
The condor_startd daemon runs on each host in a Condor pool.
It advertises information related to the node's resources to the condor_collector daemon running on the Master host for matching pending resource requests.
This daemon is also responsible for enforcing the policies that resource owners require, which determine under what conditions remote jobs will be started, suspended, resumed, vacated or killed.
When the condor_startd is ready to execute a Condor job on an Execution host, it spawns the condor_starter.
Daemons in a Condor pool
The condor_starter daemon only runs on Execution hosts.
It is the condor_starter that actually spawns a remote Condor job on a given host in a Condor pool.
The condor_starter daemon sets up the execution environment and monitors the job once it is running.
When a job completes, the condor_starter sends back job status information to the job Submission node and exits.
Daemons in a Condor pool
The condor_schedd daemon running on each host in a Condor pool deals with resource requests.
User jobs submitted to a node are stored in a local job queue managed by the condor_schedd daemon.
Condor command-line tools such as condor_submit, condor_q or condor_rm interact with the condor_schedd daemon to allow users to submit a job into a job queue, and to view and manipulate the job queue.
If the condor_schedd is down on a given machine, none of these commands will work.
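The submission workflow above starts from a submit description file handed to condor_submit. A minimal sketch is shown below; the executable and file names are placeholders, not from the source.

```
# Minimal Condor submit description file (illustrative; names are placeholders)
universe   = vanilla
executable = my_program
arguments  = input.dat
output     = my_program.out
error      = my_program.err
log        = my_program.log
queue
```

Such a file would be submitted with condor_submit, after which condor_q shows the queue and condor_rm removes a job from it.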
Daemons in a Condor pool
The condor_shadow daemon only runs on Submission hosts in a Condor pool and acts as the resource manager for user job submission requests.
The condor_shadow daemon performs remote system calls, allowing jobs submitted to Condor to be checkpointed.
Any system call performed on a remote Execution host is sent over the network back to the condor_shadow daemon on the Submission host, and the results are also sent back to the Submission host.
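The remote-system-call idea can be sketched as a simple proxy: the job on the Execution host forwards each I/O request to a shadow process on the Submission host, which performs it against its local files and returns the result. This is a conceptual Python sketch only; all class and method names are illustrative and are not Condor's actual RPC protocol.

```python
# Conceptual sketch of condor_shadow-style remote system calls.
# All names here are illustrative, not Condor's API.

class ShadowDaemon:
    """Stands in for the Submission host; performs system calls for the job."""
    def __init__(self, files):
        self.files = files  # stands in for the Submission host's file system

    def handle(self, call, *args):
        if call == "read":
            (path,) = args
            return self.files[path]      # the read happens on the Submission host
        if call == "write":
            path, data = args
            self.files[path] = data      # output lands on the Submission host
            return len(data)
        raise ValueError(f"unsupported call: {call}")

class RemoteJob:
    """Stands in for the Execution host; ships every system call over the network."""
    def __init__(self, shadow):
        self.shadow = shadow

    def run(self):
        data = self.shadow.handle("read", "input.dat")       # remote read
        result = data.upper()                                # the actual computation
        self.shadow.handle("write", "output.dat", result)    # remote write

shadow = ShadowDaemon({"input.dat": "hello"})
RemoteJob(shadow).run()
print(shadow.files["output.dat"])  # → HELLO (result ends up on the Submission host)
```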
Daemons in a Condor pool
The condor_collector daemon only runs on the Central Manager host. This daemon interacts with the condor_startd and condor_schedd daemons running on other hosts to collect all the information about the status of a Condor pool, such as job requests and resources available.
The condor_status command can be used to query the condor_collector daemon for specific status information about a Condor pool.
Daemons in a Condor pool
The condor_negotiator daemon only runs on the Central Manager host and is responsible for matching a resource with a specific job request within a Condor pool.
Periodically, the condor_negotiator daemon starts a negotiation cycle, where it queries the condor_collector daemon for the current state of all the resources available in the pool.
It interacts with each condor_schedd daemon running on a Submission host that has resource requests, in priority order, and tries to match available resources with those requests.
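The negotiation cycle can be sketched as matching job requirements against resource advertisements in priority order. The dictionaries below are a simplified stand-in for Condor ClassAds, which in reality evaluate full requirement expressions; all field names are illustrative.

```python
# Simplified sketch of Condor-style matchmaking; real ClassAds are far richer.

def matches(job, resource):
    """A resource matches if it meets every requirement the job advertises."""
    return (resource["memory_mb"] >= job["min_memory_mb"]
            and resource["arch"] == job["arch"]
            and resource["state"] == "Unclaimed")

def negotiate(jobs, resources):
    """Match jobs (already sorted by user priority) to available resources."""
    assignments = {}
    for job in jobs:                      # priority order, as in the text
        for resource in resources:
            if resource["name"] not in assignments.values() and matches(job, resource):
                assignments[job["id"]] = resource["name"]
                break
    return assignments

jobs = [
    {"id": 1, "min_memory_mb": 512, "arch": "INTEL"},
    {"id": 2, "min_memory_mb": 4096, "arch": "INTEL"},
]
resources = [
    {"name": "node-a", "memory_mb": 1024, "arch": "INTEL", "state": "Unclaimed"},
    {"name": "node-b", "memory_mb": 8192, "arch": "INTEL", "state": "Unclaimed"},
]
print(negotiate(jobs, resources))  # → {1: 'node-a', 2: 'node-b'}
```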
Daemons in a Condor pool
The condor_kbdd daemon only runs on an Execution host running Digital Unix or IRIX.
On these platforms, the condor_startd daemon cannot determine console (keyboard or mouse) activity directly from the operating system.
The condor_kbdd daemon connects to an X Server and periodically checks if there is any user activity.
If so, the condor_kbdd daemon sends a command to the condor_startd daemon running on the same host.
Daemons in a Condor pool
The condor_ckpt_server daemon runs on a checkpoint server, which is an Execution host, to store and retrieve checkpointed files.
If a checkpoint server in a Condor pool is down, Condor will revert to sending the checkpointed files for a given job back to the job Submission host.
Job life cycle in Condor
1. Job submission: A job is submitted by a Submission host with the condor_submit command.
Job life cycle in Condor
2. Job request advertising: Once it receives a job request, the condor_schedd daemon on the Submission host advertises the request to the condor_collector daemon running on the Central Manager host.
Job life cycle in Condor
3. Resource advertising: Each condor_startd daemon running on an Execution host advertises resources available on the host to the condor_collector daemon running on the Central Manager host.
Job life cycle in Condor
4. Resource matching: The condor_negotiator daemon running on the Central Manager host periodically queries the condor_collector daemon (Step 4) to match a resource for a user job request.
5. It then informs the condor_schedd daemon running on the Submission host of the matched Execution host.
Job life cycle in Condor
6. Job execution: The condor_schedd daemon running on the job Submission host interacts with the condor_startd daemon running on the matched Execution host (Step 6),
7. which will spawn a condor_starter daemon (Step 7).
Job life cycle in Condor
8. The condor_schedd daemon on the Submission host spawns a condor_shadow daemon
9. to interact with the condor_starter daemon for job execution.
10. The condor_starter daemon running on the matched Execution host receives a user job to execute.
Job life cycle in Condor
11. Return output: When a job is completed, the results will be sent back to the Submission host by the interaction between the condor_shadow daemon running on the Submission host and the condor_starter daemon running on the matched Execution host.
Security management in Condor
Condor provides strong support for authentication, encryption, integrity assurance and authorization.
Most of these security features are enabled by a Condor system administrator using configuration macros.
When Condor is installed, there are no authentication, encryption, integrity checks or authorization checks in the default configuration settings.
This allows newer versions of Condor with security features to work or interact with previous versions without security support.
An administrator must modify the configuration settings to enable the security features.
Job management in Condor
Job
A Condor job is a work unit submitted to a Condor pool for execution.
Condor manages jobs in the following aspects.
Job types
Jobs that can be managed by Condor are executable sequential or parallel codes, using, for example, PVM or MPI. A job submission may involve a job that runs over a long period, a job that needs to run many times or a job that needs many machines to run in parallel.
Queue
Each Submission host has a job queue maintained by the condor_schedd daemon running on the host. A job in a queue can be removed and placed on hold.
Job management in Condor
Job status
A job can have one of the following statuses:
• Idle: There is no job activity.
• Busy: A job is busy running.
• Suspended: A job is currently suspended.
• Vacating: A job is currently checkpointing.
• Killing: A job is currently being killed.
• Benchmarking: The condor_startd is running benchmarks.
Job management in Condor
Job run-time environments
The Condor universe specifies a Condor execution environment. There are seven universes in Condor 6.6.3, as described below.
The default universe is the Standard Universe (except where the configuration variable DEFAULT_UNIVERSE defines it otherwise), and tells Condor that this job has been re-linked via condor_compile with Condor libraries and therefore supports checkpointing and remote system calls.
The Vanilla Universe is an execution environment for jobs which have not been linked with Condor libraries; it is used to submit shell scripts to Condor.
Job management in Condor
The PVM Universe is used for a parallel job written with PVM 3.4.
The Globus Universe is intended to provide the standard Condor interface to users who wish to start Globus jobs from Condor. Each job queued in the job submission file is translated into the Globus Resource Specification Language (RSL) and subsequently submitted to Globus via the Globus GRAM protocol.
The MPI Universe is used for MPI jobs written with the MPICH package.
The Java Universe is used for programs written in Java.
The Scheduler Universe allows a Condor job to be executed on the host where the job is submitted. The job does not need matchmaking for a host and it will never be preempted.
Job management in Condor
Job submission with a shared file system
If Vanilla, Java or MPI jobs are submitted without using the file transfer mechanism, Condor must use a shared file system to access input and output files.
In this case, the job must be able to access the data files from any machine on which it could potentially run.
Job management in Condor
Job submission without a shared file system
Condor also works well without a shared file system. A user can use the file transfer mechanism in Condor when submitting jobs.
Condor will transfer any files needed by a job from the host machine where the job is submitted into a temporary working directory on the machine where the job is to be executed.
Condor executes the job and transfers output back to the Submission machine.
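The file transfer mechanism is requested through directives in the submit description file, along the lines of the following sketch (the executable and input file names are placeholders):

```
# Submit description sketch using Condor's file transfer mechanism
universe                = vanilla
executable              = my_program
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat, params.cfg
queue
```

With should_transfer_files = YES, Condor stages the listed input files to the Execution machine's temporary working directory and, on exit, transfers the output back to the Submission machine, as described above.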
Job priority
Job priorities allow the assignment of a priority level to each submitted Condor job in order to control the order of execution. The priority of a Condor job can be changed.
Job management in Condor
Chirp I/O
The Chirp I/O facility in Condor provides sophisticated I/O functionality. It has two advantages over simple whole-file transfers. First, the use of input files is done at run time rather than submission time. Second, a part of a file can be transferred instead of transferring the whole file.
Resource management in Condor
Condor manages resources in a Condor pool in the following aspects.
Tracking resource usage
The condor_startd daemon on each host reports to the condor_collector daemon on the Central Manager host about the resources available on that host.
User priority
Condor hosts are allocated to users based upon a user's priority. A lower numerical value for user priority means higher priority, so a user with priority 5 will get more resources than a user with priority 50.
Job scheduling in a Condor pool is not strictly based on a first-come-first-served selection policy. Rather, to keep large jobs from draining the pool of resources, Condor uses a unique up-down algorithm that prioritizes jobs inversely to the number of cycles required to run the job.
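The effect of the priority numbers above can be sketched as resource shares inversely proportional to the priority value. This is an illustrative simplification only; Condor's real fair-share mechanism works with effective user priorities that are updated over time.

```python
# Illustrative sketch: resource shares inversely proportional to user priority.
# A lower priority number means a higher priority, as described in the text.

def allocate_shares(user_priorities, total_slots):
    """Split slots so that priority 5 carries 10x the weight of priority 50."""
    weights = {user: 1.0 / prio for user, prio in user_priorities.items()}
    total = sum(weights.values())
    return {user: round(total_slots * w / total) for user, w in weights.items()}

print(allocate_shares({"alice": 5, "bob": 50}, 110))  # → {'alice': 100, 'bob': 10}
```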
Condor supports the following policies in scheduling jobs.
First come, first served: This is the default scheduling policy.
Preemptive scheduling: The preemptive policy lets a pending high-priority job take resources away from a running job of lower priority.
Dedicated scheduling: Dedicated scheduling means that jobs scheduled to dedicated resources cannot be preempted.
Job scheduling policies in Condor
Resource matching is used to match an Execution host to run a selected job or jobs.
The condor_collector daemon running on the Central Manager host receives job request advertisements from the condor_schedd daemon running on a Submission host and resource availability advertisements from the condor_startd daemon running on an Execution host.
A resource match is performed by the condor_negotiator daemon on the Central Manager host by selecting a resource based on job requirements.
Resource matching in Condor
Jobs can be submitted directly to a Condor pool from a Condor host, or via Globus (GT2 or earlier versions of Globus).
The Globus host is configured with the Condor jobmanager provided by Globus.
When using a Condor jobmanager, jobs are submitted to the Globus resource, e.g. using globus_job_run.
However, instead of forking the jobs on the local machine, jobs are re-submitted by Globus to Condor using the condor_submit tool.
Condor support in Globus
Submitting jobs to a Condor pool via Condor or Globus
Submitting jobs to Globus via Condor-G
Sun Grid Engine
The SGE is a distributed resource management and scheduling system from Sun Microsystems that can be used to optimize the utilization of software and hardware resources in a UNIX-based computing environment.
The SGE can be used to find a pool of idle resources and harness these resources; it can also be used for normal activities, such as managing and scheduling jobs onto the available resources.
The latest version of SGE is Sun N1 Grid Engine (N1GE) version 6.
Sun Grid Engine: The architecture of the SGE
Sun Grid Engine: The architecture of the SGE
Master host: A single host is selected to be the SGE master host. This host handles all requests from users, makes job-scheduling decisions and dispatches jobs to execution hosts.
Sun Grid Engine: The architecture of the SGE
Submit host: Submit hosts are machines configured to submit, monitor and administer jobs, and to manage the entire cluster.
Execution host: Execution hosts have the permission to run SGE jobs.
Sun Grid Engine: The architecture of the SGE
Administration host: SGE administrators use administration hosts to make changes to the cluster's configuration, such as changing distributed resource management parameters, configuring new nodes or adding or changing users.
Sun Grid Engine: The architecture of the SGE
Shadow master host: While there is only one master host, other machines in the cluster can be designated as shadow master hosts to provide greater availability. A shadow master host continually monitors the master host, and automatically and transparently assumes control in the event that the master host fails. Jobs already in the cluster are not affected by a master host failure.
Daemons in an SGE cluster
sge_qmaster – The Master daemon
The sge_qmaster daemon is the centre of the cluster's management and scheduling activities; it maintains tables about hosts, queues, jobs, system load and user permissions. It receives scheduling decisions from the sge_schedd daemon and requests actions from the sge_execd daemon on the appropriate execution host(s). The sge_qmaster daemon runs on the Master host.
Daemons in an SGE cluster
sge_schedd – The Scheduler daemon
The sge_schedd is a scheduling daemon that maintains an up-to-date view of the cluster's status with the help of the sge_qmaster daemon. It makes the scheduling decision about which job(s) are dispatched to which queue(s). It then forwards these decisions to the sge_qmaster daemon, which initiates the requisite actions. The sge_schedd daemon also runs on the Master host.
Daemons in an SGE cluster
sge_execd – The Execution daemon
The sge_execd daemon is responsible for the queue(s) on its host and for the execution of jobs in these queues by starting sge_shepherd daemons. Periodically, it forwards information, such as job status or load on its host, to the sge_qmaster daemon. The sge_execd daemon runs on an Execute host.
Daemons in an SGE cluster
sge_commd – The Communication daemon
The sge_commd daemon communicates over a well-known TCP port and is used for all communication among SGE components. The sge_commd daemon runs on each Execute host and the Master host in an SGE cluster.
Daemons in an SGE cluster
sge_shepherd – The Job Control daemon
Started by the sge_execd daemon, the sge_shepherd daemon runs for each job being actually executed on a host. The sge_shepherd daemon controls the job's process hierarchy and collects accounting data after the job has completed.
SGE supports four job types – batch, interactive, parallel and array.
The first three have obvious meanings; the fourth type – the array job – is where a single job can be replicated a specified number of times, each differing only by its input data set, which is useful for parameter studies.
Job management in SGE
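The array-job idea can be sketched as one submission expanded into many tasks that differ only in a task index, which each task uses to pick its own input. This is a conceptual Python sketch; in real SGE the index is exposed to each task through the SGE_TASK_ID environment variable, and the script and file names below are placeholders.

```python
# Conceptual sketch of an SGE array job: one submission, many tasks that
# differ only in their task index (SGE_TASK_ID in real SGE).

def expand_array_job(script, first, last):
    """Replicate one job script into tasks numbered first..last inclusive."""
    return [{"script": script, "task_id": t, "input": f"data.{t}.in"}
            for t in range(first, last + 1)]

tasks = expand_array_job("sweep.sh", 1, 4)
print(len(tasks))         # → 4 tasks from a single submission
print(tasks[0]["input"])  # → data.1.in (each task reads its own data set)
```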
SGE supports three execution modes – batch, interactive and parallel.
Batch mode is used to run straightforward sequential programs.
In interactive mode, users are given shell access (command line) to some suitable host via, for example, X-windows.
In parallel mode, parallel programs using the likes of MPI and PVM are supported.
Job run-time environments in SGE
Jobs submitted to the Master host in an SGE cluster are held in a spooling area until the scheduler determines that the job is ready to run.
SGE matches the available resources to a job's requirements; for example, matching the available memory, CPU speed and available software licences, which are periodically collected by Execution hosts.
The requirements of the jobs may be very different and only certain hosts may be able to provide the corresponding services. Once a resource becomes available for execution of a new job, SGE dispatches the job with the highest priority and matching requirements.
Job selection and resource matching in SGE
Fundamentally, SGE uses two sets of criteria to schedule jobs – job priorities and equal share.
Job priorities
This criterion concerns the order in which different jobs are scheduled; a first-in-first-out (FIFO) rule is applied by default.
Equal-share scheduling
The FIFO rule sometimes leads to problems, especially when users tend to submit a series of jobs at almost the same time. All the jobs that are submitted in this case will be designated to the same group of queues and will potentially have to wait a very long time before executing. Equal-share scheduling avoids this problem by sorting the jobs of a user who already owns an executing job to the end of the precedence list.
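The equal-share rule just described can be sketched as a re-sort of the FIFO queue that pushes jobs from users who already own a running job to the end of the precedence list. This is a simplified Python sketch, not SGE's actual scheduler.

```python
# Simplified sketch of equal-share scheduling: jobs from users who already
# own a running job are sorted to the end of the precedence list.

def equal_share_order(pending, running_users):
    """Reorder a FIFO list so users without running jobs go first."""
    fresh = [j for j in pending if j["user"] not in running_users]
    deferred = [j for j in pending if j["user"] in running_users]
    return fresh + deferred  # relative FIFO order preserved within each group

pending = [
    {"id": 1, "user": "alice"},
    {"id": 2, "user": "alice"},
    {"id": 3, "user": "bob"},
]
# alice already has a job executing, so bob's job jumps ahead of her backlog.
print([j["id"] for j in equal_share_order(pending, running_users={"alice"})])
# → [3, 1, 2]
```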
Submitting jobs to an N1GE cluster via N1GE or Globus
The Portable Batch System (PBS)
The PBS is a resource management and scheduling system.
It accepts batch jobs (shell scripts with control attributes), preserves and protects the job until it runs, executes the job, and delivers the output back to the submitter.
A batch job is a program that executes on the backend computing environment without further user interaction.
PBS may be installed and configured to support jobs executing on a single computer or on a cluster-based computing environment.
PBS is capable of allowing its nodes to be grouped into many configurations.
The Portable Batch System (PBS)
The PBS architecture
PBS uses a Master host and an arbitrary number of Execution and job Submission hosts. The Master host is the central manager of a PBS cluster; a host can be configured as both a Master host and an Execution host.
The Portable Batch System (PBS)
Daemons in a PBS cluster
pbs_server: The pbs_server daemon only runs on the PBS Master host (server). Its main function is to provide the basic batch services, such as receiving/creating a batch job, modifying a job, protecting the job against system crashes and executing the job.
The Portable Batch System (PBS)
pbs_mom: The pbs_mom daemon runs on each host and is used to start, monitor and terminate jobs, under instruction from the pbs_server daemon.
The Portable Batch System (PBS)
pbs_sched: The pbs_sched daemon runs on the Master host and determines when and where to run jobs. It requests job state information from the pbs_server daemon and resource state information from the pbs_mom daemons, and then makes decisions for scheduling jobs.
Jobs submitted to PBS are put in job queues. Jobs can be sequential or parallel codes using MPI.
A server can manage one or more queues; a batch queue consists of a collection of zero or more jobs and a set of queue attributes.
Jobs reside in the queue or are members of the queue.
In spite of the name, jobs residing in a queue need not be ordered FIFO.
Job selection in PBS
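The non-FIFO behaviour of a PBS queue can be sketched as the scheduler picking the highest-priority job rather than the oldest one. This is a conceptual Python sketch, not pbs_sched's actual policy engine; the job names are placeholders.

```python
# Conceptual sketch: a PBS batch queue selects by priority, not arrival order.

def select_next_job(queue):
    """Pick the highest-priority job; ties go to the earliest arrival."""
    return max(queue, key=lambda j: (j["priority"], -j["arrival"]))

queue = [
    {"id": "job.1", "priority": 0, "arrival": 1},
    {"id": "job.2", "priority": 5, "arrival": 2},
    {"id": "job.3", "priority": 5, "arrival": 3},
]
print(select_next_job(queue)["id"])  # → job.2 (highest priority, earliest arrival)
```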
In PBS, resources can be identified either explicitly through a job control language, or implicitly by submitting the job to a particular queue that is associated with a set of resources.
Once a suitable resource is identified, a job can be dispatched for execution.
PBS clients have to identify a specific queue to submit to in advance, which then fixes the set of resources that may be used; this hinders further dynamic and qualitative resource discovery.
Resource matching in PBS
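Both styles of resource identification can be sketched in a PBS job script: submitting to a named queue fixes the resource set implicitly, while #PBS -l directives request resources explicitly. The queue name, resource values and program name below are placeholders.

```
#!/bin/sh
# Illustrative PBS job script; queue and resource values are placeholders.
#PBS -q batch             # implicit: the chosen queue fixes the resource set
#PBS -l nodes=2:ppn=4     # explicit: request 2 nodes, 4 processors per node
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
./my_program
```

Such a script would be handed to qsub, after which pbs_server queues it and pbs_sched decides when and where it runs.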
Jobs can be submitted to or from a PBS cluster to Globus
Additional PBS Pro services include:
Cycle harvesting: PBS Pro can run jobs on idle workstations and suspend or re-queue the jobs when the workstation becomes used, based on either load average or keyboard/mouse input.
Site-defined resources: A site can define one or more resources which can be requested by jobs. If the resource is “consumable”, it can be tracked at the server, queue and/or node level.
“Peer to Peer” scheduling: A site can have multiple PBS Pro clusters (each cluster has its server, scheduler and one or more execution systems). A scheduler in any given cluster can be configured to move jobs from other clusters to its cluster when the resources required by the job are available locally.
Advance reservations: Resources, such as nodes or CPUs, can be reserved in advance with a specified start and end time/date. Jobs can be submitted against the reservation and run in the time period specified. This ensures the required computational resources are available when time-critical work must be performed.