GRID COMPUTING: A REVIEW OF CONDOR, SGE AND PBS
Sandeep Kumar Poonia
Head of Dept. CS/IT, Jagan Nath University, Jaipur
B.E., M. Tech., UGC-NET
LM-IAENG, LM-IACSIT,LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
11/16/2013 Sandeep Kumar Poonia
Condor
Condor is a resource management and job scheduling system, a research project from the University of Wisconsin–Madison.
Condor platforms
HP systems running HPUX 10.20
Sun SPARC systems running Solaris 2.6/2.7/8/9
SGI systems running IRIX 6.5 (not fully supported)
Intel x86 systems running Red Hat Linux 7.1/7.2/7.3/8.0/9.0, Windows NT 4.0, XP and 2003 Server (the Windows systems are not fully supported)
Alpha systems running Digital UNIX 4.0, Red Hat Linux 7.1/7.2/7.3 and Tru64 5.1 (not fully supported)
PowerPC systems running Macintosh OS X and AIX 5.2L (not fully supported)
Itanium systems running Red Hat Linux 7.1/7.2/7.3 (not fully supported)
Windows systems (not fully supported)
The architecture of a Condor pool
Resources in Condor are normally organized in the form of Condor pools. A pool is an administrated domain of hosts, not specifically dedicated to a Condor environment.
A Condor pool normally has one Central Manager (master host) and an arbitrary number of Execution (worker) hosts.
A Condor Execution host can be configured as a job Execution host or a job Submission host or both.
The Central Manager host is used to manage resources and jobs in a Condor pool.
Host machines in a Condor pool may not be dedicated to Condor.
If the Central Manager host in a Condor pool crashes, jobs that are already running will continue to run unaffected.
Queued jobs will remain in the queue unharmed, but they cannot begin running until the Central Manager host is restarted.
Daemons in a Condor pool
A daemon is a program that runs in the background once started. To configure a Condor pool, the following Condor daemons need to be started.
Daemons in a Condor pool
The condor_master daemon runs on each host in a Condor pool to keep all the other daemons running in the pool.
It spawns daemons such as condor_startd and condor_schedd, and periodically checks if there are new binaries installed for any of these daemons.
If so, the condor_master will restart the affected daemons.
In addition, if any daemon crashes, the master will send an email to the administrator of the Condor pool and restart the daemon.
The condor_master also supports various administrative commands, such as starting, stopping or reconfiguring daemons remotely.
Daemons in a Condor pool
The condor_startd daemon runs on each host in a Condor pool.
It advertises information related to the node's resources to the condor_collector daemon running on the Master host for matching pending resource requests.
This daemon is also responsible for enforcing the policies that resource owners require, which determine under what conditions remote jobs will be started, suspended, resumed, vacated or killed.
When the condor_startd is ready to execute a Condor job on an Execution host, it spawns the condor_starter.
Daemons in a Condor pool
The condor_starter daemon only runs on Execution hosts.
It is the condor_starter that actually spawns a remote Condor job on a given host in a Condor pool.
The condor_starter daemon sets up the execution environment and monitors the job once it is running.
When a job completes, the condor_starter sends back job status information to the job Submission node and exits.
Daemons in a Condor pool
The condor_schedd daemon running on each host in a Condor pool deals with resource requests.
User jobs submitted to a node are stored in a local job queue managed by the condor_schedd daemon.
Condor command-line tools such as condor_submit, condor_q or condor_rm interact with the condor_schedd daemon to allow users to submit a job into a job queue, and to view and manipulate the job queue.
If the condor_schedd is down on a given machine, none of these commands will work.
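The submission workflow above starts from a submit description file handed to condor_submit. A minimal sketch is shown below; the executable and file names are placeholders, not from the source.

```
# Minimal Condor submit description file (illustrative; names are placeholders)
universe   = vanilla
executable = my_program
arguments  = input.dat
output     = my_program.out
error      = my_program.err
log        = my_program.log
queue
```

Such a file would be submitted with condor_submit, after which condor_q shows the queue and condor_rm removes a job from it.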
Daemons in a Condor pool
The condor_shadow daemon only runs on Submission hosts in a Condor pool and acts as the resource manager for user job submission requests.
The condor_shadow daemon performs remote system calls, allowing jobs submitted to Condor to be checkpointed.
Any system call performed on a remote Execution host is sent over the network back to the condor_shadow daemon on the Submission host, and the results are also sent back to the Submission host.
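The remote-system-call idea can be sketched as a simple proxy: the job on the Execution host forwards each I/O request to a shadow process on the Submission host, which performs it against its local files and returns the result. This is a conceptual Python sketch only; all class and method names are illustrative and are not Condor's actual RPC protocol.

```python
# Conceptual sketch of condor_shadow-style remote system calls.
# All names here are illustrative, not Condor's API.

class ShadowDaemon:
    """Stands in for the Submission host; performs system calls for the job."""
    def __init__(self, files):
        self.files = files  # stands in for the Submission host's file system

    def handle(self, call, *args):
        if call == "read":
            (path,) = args
            return self.files[path]      # the read happens on the Submission host
        if call == "write":
            path, data = args
            self.files[path] = data      # output lands on the Submission host
            return len(data)
        raise ValueError(f"unsupported call: {call}")

class RemoteJob:
    """Stands in for the Execution host; ships every system call over the network."""
    def __init__(self, shadow):
        self.shadow = shadow

    def run(self):
        data = self.shadow.handle("read", "input.dat")       # remote read
        result = data.upper()                                # the actual computation
        self.shadow.handle("write", "output.dat", result)    # remote write

shadow = ShadowDaemon({"input.dat": "hello"})
RemoteJob(shadow).run()
print(shadow.files["output.dat"])  # → HELLO (result ends up on the Submission host)
```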
Daemons in a Condor pool
The condor_collector daemon only runs on the Central Manager host. This daemon interacts with the condor_startd and condor_schedd daemons running on other hosts to collect all the information about the status of a Condor pool, such as job requests and resources available.
The condor_status command can be used to query the condor_collector daemon for specific status information about a Condor pool.
Daemons in a Condor pool
The condor_negotiator daemon only runs on the Central Manager host and is responsible for matching a resource with a specific job request within a Condor pool.
Periodically, the condor_negotiator daemon starts a negotiation cycle, where it queries the condor_collector daemon for the current state of all the resources available in the pool.
It interacts with each condor_schedd daemon running on a Submission host that has resource requests, in priority order, and tries to match available resources with those requests.
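The negotiation cycle can be sketched as matching job requirements against resource advertisements in priority order. The dictionaries below are a simplified stand-in for Condor ClassAds, which in reality evaluate full requirement expressions; all field names are illustrative.

```python
# Simplified sketch of Condor-style matchmaking; real ClassAds are far richer.

def matches(job, resource):
    """A resource matches if it meets every requirement the job advertises."""
    return (resource["memory_mb"] >= job["min_memory_mb"]
            and resource["arch"] == job["arch"]
            and resource["state"] == "Unclaimed")

def negotiate(jobs, resources):
    """Match jobs (already sorted by user priority) to available resources."""
    assignments = {}
    for job in jobs:                      # priority order, as in the text
        for resource in resources:
            if resource["name"] not in assignments.values() and matches(job, resource):
                assignments[job["id"]] = resource["name"]
                break
    return assignments

jobs = [
    {"id": 1, "min_memory_mb": 512, "arch": "INTEL"},
    {"id": 2, "min_memory_mb": 4096, "arch": "INTEL"},
]
resources = [
    {"name": "node-a", "memory_mb": 1024, "arch": "INTEL", "state": "Unclaimed"},
    {"name": "node-b", "memory_mb": 8192, "arch": "INTEL", "state": "Unclaimed"},
]
print(negotiate(jobs, resources))  # → {1: 'node-a', 2: 'node-b'}
```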
Daemons in a Condor pool
The condor_kbdd daemon only runs on an Execution host running Digital Unix or IRIX.
On these platforms, the condor_startd daemon cannot determine console (keyboard or mouse) activity directly from the operating system.
The condor_kbdd daemon connects to an X Server and periodically checks if there is any user activity.
If so, the condor_kbdd daemon sends a command to the condor_startd daemon running on the same host.
Daemons in a Condor pool
The condor_ckpt_server daemon runs on a checkpoint server, which is an Execution host, to store and retrieve checkpointed files.
If a checkpoint server in a Condor pool is down, Condor will revert to sending the checkpointed files for a given job back to the job Submission host.
Job life cycle in Condor
1. Job submission: A job is submitted by a Submission host with the condor_submit command.
Job life cycle in Condor
2. Job request advertising: Once it receives a job request, the condor_schedd daemon on the Submission host advertises the request to the condor_collector daemon running on the Central Manager host.
Job life cycle in Condor
3. Resource advertising: Each condor_startd daemon running on an Execution host advertises resources available on the host to the condor_collector daemon running on the Central Manager host.
Job life cycle in Condor
4. Resource matching: The condor_negotiator daemon running on the Central Manager host periodically queries the condor_collector daemon (Step 4) to match a resource for a user job request.
5. It then informs the condor_schedd daemon running on the Submission host of the matched Execution host.
Job life cycle in Condor
6. Job execution: The condor_schedd daemon running on the job Submission host interacts with the condor_startd daemon running on the matched Execution host (Step 6),
7. which will spawn a condor_starter daemon (Step 7).
Job life cycle in Condor
8. The condor_schedd daemon on the Submission host spawns a condor_shadow daemon
9. to interact with the condor_starter daemon for job execution.
10. The condor_starter daemon running on the matched Execution host receives a user job to execute.
Job life cycle in Condor
11. Return output: When a job is completed, the results will be sent back to the Submission host by the interaction between the condor_shadow daemon running on the Submission host and the condor_starter daemon running on the matched Execution host.
Security management in Condor
Condor provides strong support for authentication, encryption, integrity assurance and authorization.
Most of these security features are enabled by a Condor system administrator using configuration macros.
When Condor is installed, there are no authentication, encryption, integrity checks or authorization checks in the default configuration settings.
This allows newer versions of Condor with security features to work or interact with previous versions without security support.
An administrator must modify the configuration settings to enable the security features.
Job management in Condor
Job
A Condor job is a work unit submitted to a Condor pool for execution.
Condor manages jobs in the following aspects.
Job types
Jobs that can be managed by Condor are executable sequential or parallel codes, using, for example, PVM or MPI. A job submission may involve a job that runs over a long period, a job that needs to run many times or a job that needs many machines to run in parallel.
Queue
Each Submission host has a job queue maintained by the condor_schedd daemon running on the host. A job in a queue can be removed and placed on hold.
Job management in Condor
Job status
A job can have one of the following statuses:
• Idle: There is no job activity.
• Busy: A job is busy running.
• Suspended: A job is currently suspended.
• Vacating: A job is currently checkpointing.
• Killing: A job is currently being killed.
• Benchmarking: The condor_startd is running benchmarks.
Job management in Condor
Job run-time environments
The Condor universe specifies a Condor execution environment. There are seven universes in Condor 6.6.3, as described below.
The default universe is the Standard Universe (except where the configuration variable DEFAULT_UNIVERSE defines it otherwise), and tells Condor that this job has been re-linked via condor_compile with Condor libraries and therefore supports checkpointing and remote system calls.
The Vanilla Universe is an execution environment for jobs which have not been linked with Condor libraries; it is used to submit shell scripts to Condor.
Job management in Condor
The PVM Universe is used for a parallel job written with PVM 3.4.
The Globus Universe is intended to provide the standard Condor interface to users who wish to start Globus jobs from Condor. Each job queued in the job submission file is translated into the Globus Resource Specification Language (RSL) and subsequently submitted to Globus via the Globus GRAM protocol.
The MPI Universe is used for MPI jobs written with the MPICH package.
The Java Universe is used for programs written in Java.
The Scheduler Universe allows a Condor job to be executed on the host where the job is submitted. The job does not need matchmaking for a host and it will never be preempted.
Job management in Condor
Job submission with a shared file system
If Vanilla, Java or MPI jobs are submitted without using the file transfer mechanism, Condor must use a shared file system to access input and output files.
In this case, the job must be able to access the data files from any machine on which it could potentially run.
Job management in Condor
Job submission without a shared file system
Condor also works well without a shared file system. A user can use the file transfer mechanism in Condor when submitting jobs.
Condor will transfer any files needed by a job from the host machine where the job is submitted into a temporary working directory on the machine where the job is to be executed.
Condor executes the job and transfers output back to the Submission machine.
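The file transfer mechanism is requested through directives in the submit description file, along the lines of the following sketch (the executable and input file names are placeholders):

```
# Submit description sketch using Condor's file transfer mechanism
universe                = vanilla
executable              = my_program
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat, params.cfg
queue
```

With should_transfer_files = YES, Condor stages the listed input files to the Execution machine's temporary working directory and, on exit, transfers the output back to the Submission machine, as described above.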
Job priority
Job priorities allow the assignment of a priority level to each submitted Condor job in order to control the order of execution. The priority of a Condor job can be changed.
Job management in Condor
Chirp I/O
The Chirp I/O facility in Condor provides sophisticated I/O functionality. It has two advantages over simple whole-file transfers. First, the use of input files is done at run time rather than submission time. Second, a part of a file can be transferred instead of transferring the whole file.
Resource management in Condor
Condor manages resources in a Condor pool in the following aspects.
Tracking resource usage
The condor_startd daemon on each host reports to the condor_collector daemon on the Central Manager host about the resources available on that host.
User priority
Condor hosts are allocated to users based upon a user's priority. A lower numerical value for user priority means higher priority, so a user with priority 5 will get more resources than a user with priority 50.
Job scheduling in a Condor pool is not strictly based on a first-come-first-served selection policy. Rather, to keep large jobs from draining the pool of resources, Condor uses a unique up-down algorithm that prioritizes jobs inversely to the number of cycles required to run the job.
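The effect of the priority numbers above can be sketched as resource shares inversely proportional to the priority value. This is an illustrative simplification only; Condor's real fair-share mechanism works with effective user priorities that are updated over time.

```python
# Illustrative sketch: resource shares inversely proportional to user priority.
# A lower priority number means a higher priority, as described in the text.

def allocate_shares(user_priorities, total_slots):
    """Split slots so that priority 5 carries 10x the weight of priority 50."""
    weights = {user: 1.0 / prio for user, prio in user_priorities.items()}
    total = sum(weights.values())
    return {user: round(total_slots * w / total) for user, w in weights.items()}

print(allocate_shares({"alice": 5, "bob": 50}, 110))  # → {'alice': 100, 'bob': 10}
```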
Condor supports the following policies in scheduling jobs.
First come, first served: This is the default scheduling policy.
Preemptive scheduling: The preemptive policy lets a pending high-priority job take resources away from a running job of lower priority.
Dedicated scheduling: Dedicated scheduling means that jobs scheduled to dedicated resources cannot be preempted.
Job scheduling policies in Condor
Resource matching is used to match an Execution host to run a selected job or jobs.
The condor_collector daemon running on the Central Manager host receives job request advertisements from the condor_schedd daemon running on a Submission host and resource availability advertisements from the condor_startd daemon running on an Execution host.
A resource match is performed by the condor_negotiator daemon on the Central Manager host by selecting a resource based on job requirements.
Resource matching in Condor
Jobs can be submitted directly to a Condor pool from a Condor host, or via Globus (GT2 or earlier versions of Globus).
The Globus host is configured with the Condor jobmanager provided by Globus.
When using a Condor jobmanager, jobs are submitted to the Globus resource, e.g. using globus_job_run.
However, instead of forking the jobs on the local machine, jobs are re-submitted by Globus to Condor using the condor_submit tool.
Condor support in Globus
Submitting jobs to a Condor pool via Condor or Globus
Submitting jobs to Globus via Condor-G
Sun Grid Engine
The SGE is a distributed resource management and scheduling system from Sun Microsystems that can be used to optimize the utilization of software and hardware resources in a UNIX-based computing environment.
The SGE can be used to find a pool of idle resources and harness these resources; it can also be used for normal activities, such as managing and scheduling jobs onto the available resources.
The latest version of SGE is Sun N1 Grid Engine (N1GE) version 6.
Sun Grid Engine: The architecture of the SGE
Sun Grid Engine: The architecture of the SGE
Master host: A single host is selected to be the SGE master host. This host handles all requests from users, makes job-scheduling decisions and dispatches jobs to execution hosts.
Sun Grid Engine: The architecture of the SGE
Submit host: Submit hosts are machines configured to submit, monitor and administer jobs, and to manage the entire cluster.
Execution host: Execution hosts have the permission to run SGE jobs.
Sun Grid Engine: The architecture of the SGE
Administration host: SGE administrators use administration hosts to make changes to the cluster's configuration, such as changing distributed resource management parameters, configuring new nodes or adding or changing users.
Sun Grid Engine: The architecture of the SGE
Shadow master host: While there is only one master host, other machines in the cluster can be designated as shadow master hosts to provide greater availability. A shadow master host continually monitors the master host, and automatically and transparently assumes control in the event that the master host fails. Jobs already in the cluster are not affected by a master host failure.
Daemons in an SGE cluster
sge_qmaster – The Master daemon
The sge_qmaster daemon is the centre of the cluster's management and scheduling activities; it maintains tables about hosts, queues, jobs, system load and user permissions. It receives scheduling decisions from the sge_schedd daemon and requests actions from the sge_execd daemon on the appropriate execution host(s). The sge_qmaster daemon runs on the Master host.
Daemons in an SGE cluster
sge_schedd – The Scheduler daemon
The sge_schedd is a scheduling daemon that maintains an up-to-date view of the cluster's status with the help of the sge_qmaster daemon. It makes the scheduling decision about which job(s) are dispatched to which queue(s). It then forwards these decisions to the sge_qmaster daemon, which initiates the requisite actions. The sge_schedd daemon also runs on the Master host.
Daemons in an SGE cluster
sge_execd – The Execution daemon
The sge_execd daemon is responsible for the queue(s) on its host and for the execution of jobs in these queues by starting sge_shepherd daemons. Periodically, it forwards information, such as job status or load on its host, to the sge_qmaster daemon. The sge_execd daemon runs on an Execute host.
Daemons in an SGE cluster
sge_commd – The Communication daemon
The sge_commd daemon communicates over a well-known TCP port and is used for all communication among SGE components. The sge_commd daemon runs on each Execute host and the Master host in an SGE cluster.
Daemons in an SGE cluster
sge_shepherd – The Job Control daemon
Started by the sge_execd daemon, the sge_shepherd daemon runs for each job being actually executed on a host. The sge_shepherd daemon controls the job's process hierarchy and collects accounting data after the job has completed.
SGE supports four job types – batch, interactive, parallel and array.
The first three have obvious meanings; the fourth type – the array job – is where a single job can be replicated a specified number of times, each differing only by its input data set, which is useful for parameter studies.
Job management in SGE
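The array-job idea can be sketched as one submission expanded into many tasks that differ only in a task index, which each task uses to pick its own input. This is a conceptual Python sketch; in real SGE the index is exposed to each task through the SGE_TASK_ID environment variable, and the script and file names below are placeholders.

```python
# Conceptual sketch of an SGE array job: one submission, many tasks that
# differ only in their task index (SGE_TASK_ID in real SGE).

def expand_array_job(script, first, last):
    """Replicate one job script into tasks numbered first..last inclusive."""
    return [{"script": script, "task_id": t, "input": f"data.{t}.in"}
            for t in range(first, last + 1)]

tasks = expand_array_job("sweep.sh", 1, 4)
print(len(tasks))         # → 4 tasks from a single submission
print(tasks[0]["input"])  # → data.1.in (each task reads its own data set)
```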
SGE supports three execution modes – batch, interactive and parallel.
Batch mode is used to run straightforward sequential programs.
In interactive mode, users are given shell access (command line) to some suitable host via, for example, X-windows.
In parallel mode, parallel programs using the likes of MPI and PVM are supported.
Job run-time environments in SGE
Jobs submitted to the Master host in an SGE cluster are held in a spooling area until the scheduler determines that the job is ready to run.
SGE matches the available resources to a job's requirements; for example, matching the available memory, CPU speed and available software licences, which are periodically collected by Execution hosts.
The requirements of the jobs may be very different and only certain hosts may be able to provide the corresponding services. Once a resource becomes available for execution of a new job, SGE dispatches the job with the highest priority and matching requirements.
Job selection and resource matching in SGE
Fundamentally, SGE uses two sets of criteria to schedule jobs – job priorities and equal share.
Job priorities
This criterion concerns the order in which different jobs are scheduled; a first-in-first-out (FIFO) rule is applied by default.
Equal-share scheduling
The FIFO rule sometimes leads to problems, especially when users tend to submit a series of jobs at almost the same time. All the jobs that are submitted in this case will be designated to the same group of queues and will potentially have to wait a very long time before executing. Equal-share scheduling avoids this problem by sorting the jobs of a user who already owns an executing job to the end of the precedence list.
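The equal-share rule just described can be sketched as a re-sort of the FIFO queue that pushes jobs from users who already own a running job to the end of the precedence list. This is a simplified Python sketch, not SGE's actual scheduler.

```python
# Simplified sketch of equal-share scheduling: jobs from users who already
# own a running job are sorted to the end of the precedence list.

def equal_share_order(pending, running_users):
    """Reorder a FIFO list so users without running jobs go first."""
    fresh = [j for j in pending if j["user"] not in running_users]
    deferred = [j for j in pending if j["user"] in running_users]
    return fresh + deferred  # relative FIFO order preserved within each group

pending = [
    {"id": 1, "user": "alice"},
    {"id": 2, "user": "alice"},
    {"id": 3, "user": "bob"},
]
# alice already has a job executing, so bob's job jumps ahead of her backlog.
print([j["id"] for j in equal_share_order(pending, running_users={"alice"})])
# → [3, 1, 2]
```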
Submitting jobs to an N1GE cluster via N1GE or Globus
The Portable Batch System (PBS)
The PBS is a resource management and scheduling system.
It accepts batch jobs (shell scripts with control attributes), preserves and protects the job until it runs, executes the job, and delivers the output back to the submitter.
A batch job is a program that executes on the backend computing environment without further user interaction.
PBS may be installed and configured to support jobs executing on a single computer or on a cluster-based computing environment.
PBS is capable of allowing its nodes to be grouped into many configurations.
The Portable Batch System (PBS)
The PBS architecture
PBS uses a Master host and an arbitrary number of Execution and job Submission hosts. The Master host is the central manager of a PBS cluster; a host can be configured as both a Master host and an Execution host.
The Portable Batch System (PBS)
Daemons in a PBS cluster
pbs_server: The pbs_server daemon only runs on the PBS Master host (server). Its main function is to provide the basic batch services, such as receiving/creating a batch job, modifying a job, protecting the job against system crashes and executing the job.
The Portable Batch System (PBS)
pbs_mom: The pbs_mom daemon runs on each host and is used to start, monitor and terminate jobs, under instruction from the pbs_server daemon.
The Portable Batch System (PBS)
pbs_sched: The pbs_sched daemon runs on the Master host and determines when and where to run jobs. It requests job state information from the pbs_server daemon and resource state information from the pbs_mom daemons, and then makes decisions for scheduling jobs.
Jobs submitted to PBS are put in job queues. Jobs can be sequential or parallel codes using MPI.
A server can manage one or more queues; a batch queue consists of a collection of zero or more jobs and a set of queue attributes.
Jobs reside in the queue or are members of the queue.
In spite of the name, jobs residing in a queue need not be ordered FIFO.
Job selection in PBS
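The non-FIFO behaviour of a PBS queue can be sketched as the scheduler picking the highest-priority job rather than the oldest one. This is a conceptual Python sketch, not pbs_sched's actual policy engine; the job names are placeholders.

```python
# Conceptual sketch: a PBS batch queue selects by priority, not arrival order.

def select_next_job(queue):
    """Pick the highest-priority job; ties go to the earliest arrival."""
    return max(queue, key=lambda j: (j["priority"], -j["arrival"]))

queue = [
    {"id": "job.1", "priority": 0, "arrival": 1},
    {"id": "job.2", "priority": 5, "arrival": 2},
    {"id": "job.3", "priority": 5, "arrival": 3},
]
print(select_next_job(queue)["id"])  # → job.2 (highest priority, earliest arrival)
```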
In PBS, resources can be identified either explicitly through a job control language, or implicitly by submitting the job to a particular queue that is associated with a set of resources.
Once a suitable resource is identified, a job can be dispatched for execution.
PBS clients have to identify a specific queue to submit to in advance, which then fixes the set of resources that may be used; this hinders further dynamic and qualitative resource discovery.
Resource matching in PBS
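Both styles of resource identification can be sketched in a PBS job script: submitting to a named queue fixes the resource set implicitly, while #PBS -l directives request resources explicitly. The queue name, resource values and program name below are placeholders.

```
#!/bin/sh
# Illustrative PBS job script; queue and resource values are placeholders.
#PBS -q batch             # implicit: the chosen queue fixes the resource set
#PBS -l nodes=2:ppn=4     # explicit: request 2 nodes, 4 processors per node
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
./my_program
```

Such a script would be handed to qsub, after which pbs_server queues it and pbs_sched decides when and where it runs.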
Jobs can be submitted to or from a PBS cluster to Globus
Additional PBS Pro services include:
Cycle harvesting: PBS Pro can run jobs on idle workstations and suspend or re-queue the jobs when the workstation becomes used, based on either load average or keyboard/mouse input.
Site-defined resources: A site can define one or more resources which can be requested by jobs. If the resource is “consumable”, it can be tracked at the server, queue and/or node level.
“Peer to Peer” scheduling: A site can have multiple PBS Pro clusters (each cluster has its server, scheduler and one or more execution systems). A scheduler in any given cluster can be configured to move jobs from other clusters to its cluster when the resources required by the job are available locally.
Advance reservations: Resources, such as nodes or CPUs, can be reserved in advance with a specified start and end time/date. Jobs can be submitted against the reservation and run in the time period specified. This ensures the required computational resources are available when time-critical work must be performed.