Running Batch Jobs in R: How to deal with coarsely parallel problems WEALTH FROM OCEANS NATIONAL RESEARCH FLAGSHIP Malcolm Haddon May 2014



| Batch Jobs in R | Haddon

Computer Intensive

• Many, many, many iterations:
  • Management Strategy Evaluation
  • Monte Carlo Markov Chains
  • Lots of replicates of any analyses
• Large scale simulations:
  • multi-species
  • multi-populations
  • multi-'etc'
• Any computing job that takes a long time or uses a lot of computing resources


Why the Fuss?

• Solving BIG computing problems has its own strategies.
• If a job:
  • takes a very long time, or
  • uses very large amounts of RAM
  • Then how can it be split up most effectively?
• Depends on the scale at which processes are independent.
• May need trials to find best compromise.


Coarsely Parallel Processes

• Not talking about finely parallel processes such as cellular models in Oceanography or visualization.
  • The use of GPUs containing thousands of small processors is ideally suited to such analyses.
  • Some emphasis on this with the CSIRO clusters (Bragg, etc.) and the Advanced Scientific Computing program.
• Instead: focussed on serial and sequential problems where analysis order is important.
  • Population processes
  • Many biological processes
• Cannot split up time-series trajectories – but can treat each trajectory as a different process (coarsely parallel)
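As a minimal sketch of the distinction, the R code below runs one trajectory strictly year by year, while the replicate trajectories are independent of one another. The function simTrajectory(), its surplus-production dynamics, and all parameter values are hypothetical and only illustrate the idea; they are not taken from the abalone model.

```r
## Hypothetical population dynamics; all parameter values are illustrative only
simTrajectory <- function(seed, nyrs = 40, r = 0.3, K = 1000, catch = 50) {
  set.seed(seed)
  B <- numeric(nyrs)
  B[1] <- K
  for (yr in 2:nyrs) {  # serial: year yr cannot be computed before year yr - 1
    growth <- r * B[yr - 1] * (1 - B[yr - 1] / K)
    B[yr] <- max(0, (B[yr - 1] + growth - catch) * exp(rnorm(1, 0, 0.1)))
  }
  B
}

## Replicates do not depend on one another, so they could be run in any
## order, or in separate R sessions (coarsely parallel)
trajectories <- lapply(1:10, simTrajectory)
```

Because each call depends only on its own seed, any subset of replicates could be handed to a different batch job.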


Alternative Approaches to Simulation.


Single job:
Apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000)
for (HS in 1:8) { for (iter in 1:1000) { } }
plot and tabulate results

Split job:
Apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000)
Split the job into 8 parts
for (iter in 1:1000) { }    for (iter in 1:1000) { }    for (iter in 1:1000) { }    …..
Store Results    Store Results    Store Results    …..
Combine, then plot and tabulate results
Next Steps
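The split approach can be sketched in R as follows. This is only an illustration under stated assumptions: runOneHS() is a hypothetical stand-in for one of the 8 jobs, the simulated finalCatch values are random numbers rather than model output, and the results directory sits in tempdir() so the example is self-contained. Here the 8 parts run one after another in a loop; in practice each would be its own batch job.

```r
## Results directory (in tempdir() only so the sketch is self-contained)
resdir <- file.path(tempdir(), "hs_results")
dir.create(resdir, showWarnings = FALSE)

## Hypothetical stand-in for one job: 1000 replicates of one harvest strategy
runOneHS <- function(HS, reps = 1000, resdir) {
  set.seed(HS)
  out <- data.frame(HS = HS, iter = 1:reps,
                    finalCatch = rnorm(reps, mean = 100 + HS, sd = 10))
  ## Store Results: one file per job, so jobs never write to the same file
  saveRDS(out, file.path(resdir, sprintf("HS%02d.rds", HS)))
  invisible(out)
}

for (HS in 1:8) runOneHS(HS, reps = 1000, resdir = resdir)

## Combine: read every stored result back in and stack them
files <- list.files(resdir, pattern = "^HS[0-9]+\\.rds$", full.names = TRUE)
allres <- do.call(rbind, lapply(files, readRDS))

## Tabulate: one row per harvest strategy
summaryTab <- aggregate(finalCatch ~ HS, data = allres, FUN = mean)
```

Storing one file per job keeps the combine step independent of the order in which the jobs finish.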


The R program


batchsimab.r

source("Lots of Functions")
source("Constants")
setwd(resultdir)
read in Data
source("run_specification")
write to csv file(s)
write to Rdata files
plots to tiff/pdf/etc
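The outline above could look roughly like the sketch below. Every file name, the function runSim(), and the values in the specification file are hypothetical; the two sourced files are first written into tempdir() only so the example runs on its own, whereas the real batchsimab.r sources existing files.

```r
## Working directory for the sketch (hypothetical location)
wd <- file.path(tempdir(), "simab_demo")
dir.create(wd, showWarnings = FALSE)

## Stand-ins for the "Lots of Functions" and "run_specification" files
writeLines(
  "runSim <- function(reps, LML) data.frame(rep = seq_len(reps), LML = LML, catch = runif(reps))",
  file.path(wd, "functions.R"))
writeLines(c("LML <- 138", "reps <- 5"), file.path(wd, "run_specification.R"))

## The batch script proper: load functions, load the run settings, run, save
source(file.path(wd, "functions.R"))
source(file.path(wd, "run_specification.R"))
result <- runSim(reps, LML)

write.csv(result, file.path(wd, "result.csv"), row.names = FALSE)  # csv output
save(result, file = file.path(wd, "result.RData"))                 # Rdata output
```

Keeping the run settings in their own file is what allows a top-level script to change them between runs without touching the simulation code.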


Top Level: runbatch.R – contains:

## SET PARAMETERS AS DESIRED IN
## runspecification.R and constants.R

wkdir <- "C:/A_CSIRO/Rcode/abalone/SimAb"
setwd(wkdir)   ## points to directory containing batchsimab.r

command <- "R.exe --vanilla < batchsimab.R"
shell(command, wait=FALSE)

## (R.exe must be on the path).
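shell() is only available in R on Windows. As a hedged alternative sketch, base R's system2() can launch Rscript from the running R installation, which also avoids relying on the PATH. The script hello_batch.R below is a hypothetical stand-in for batchsimab.R, written to tempdir() so the example is self-contained.

```r
## Throwaway script standing in for batchsimab.R (hypothetical)
tmp <- normalizePath(tempdir(), winslash = "/")
script <- file.path(tmp, "hello_batch.R")
outfile <- file.path(tmp, "hello_batch.txt")
writeLines(sprintf('writeLines("done", "%s")', outfile), script)

## Full path to Rscript for the running R installation, so PATH is not needed
rscript <- file.path(R.home("bin"), "Rscript")

## wait = TRUE here so the exit status can be inspected;
## wait = FALSE would return immediately and leave the job running
status <- system2(rscript, args = c("--vanilla", shQuote(script)), wait = TRUE)
```

With wait = FALSE several such jobs can be started one after another, which is the behaviour the runbatch.R script relies on.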


Top Level: runbatch.R – contains:

## SET PARAMETERS AS DESIRED IN
## RunSpecification.R and constants.R
primaryloop <- c(val1, val2, val3,..)
for (toplevel in 1:length(primaryloop)) {
  sink("RunSpecification.R")
  …
  …
  sink()
  command <- "R.exe --vanilla < batchsimab.R"
  shell(command, wait=FALSE)
}

## Can re-write values in RunSpecification.R


pickLML <- c(127,132,138,145)
for (pick in 1:length(pickLML)) {
  filename <- "alt_runspecification.r"
  sink(filename)                     ## redirect output from cat() into the file
  cat("##Select the HCR \n")
  cat("StepH <- FALSE \n")
  cat("ConstH <- TRUE \n")
  cat("## Define the Scenarios \n")
  cat("initDepl_L <- c(0.7) \n")
  cat("inH_L <- c(0.1) \n")
  cat("origTAC <- 150.0 \n")
  cat(paste("LML <- ",pickLML[pick],sep="") ," \n")
  cat("reps <- 100 \n")
  sink()                             ## close the file
  command <- "R.exe --vanilla < batchsimab.R"
  shell(command, wait=FALSE)         ## start the job without waiting for it
  Sys.sleep(5.0)                     ## pause before the file is overwritten on the next pass
}
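The loop above reuses one specification file and pauses between launches. A possible variation, sketched here as an assumption rather than as the method used in the talk, writes one specification file per LML value with writeLines(), so no job depends on the timing of the next overwrite. The helper writeSpec() and the file names are hypothetical, and batchsimab.R would need a way to be told which file to source (for example through commandArgs(trailingOnly = TRUE)), which is not shown.

```r
pickLML <- c(127, 132, 138, 145)

## Directory for the specification files (tempdir() keeps the sketch self-contained)
specdir <- file.path(tempdir(), "specs")
dir.create(specdir, showWarnings = FALSE)

## Hypothetical helper: write one run specification and return its file name
writeSpec <- function(LML, reps = 100, dir) {
  filename <- file.path(dir, sprintf("runspec_LML%d.R", LML))
  writeLines(c("## Select the HCR",
               "StepH <- FALSE",
               "ConstH <- TRUE",
               "## Define the Scenarios",
               "initDepl_L <- c(0.7)",
               "inH_L <- c(0.1)",
               "origTAC <- 150.0",
               sprintf("LML <- %d", LML),
               sprintf("reps <- %d", reps)),
             filename)
  filename
}

specfiles <- vapply(pickLML, writeSpec, character(1), dir = specdir)

## Check that a written file can be sourced back into its own environment
env <- new.env()
source(specfiles[3], local = env)
```

Each job then reads only its own file, so the Sys.sleep() pause is no longer needed to protect the specification from being overwritten.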


alt_runspecification.r - contents

batch <- TRUE
##Select the HCR
StepH <- FALSE
ConstH <- TRUE
## Define the Scenarios
initDepl_L <- c(0.7)
inH_L <- c(0.1)
origTAC <- 150.0
LML <- 138
reps <- 100


Alternative Approach


Not that useful for coarsely parallel problems, but excellent for finely parallel processes.


Alternative Approaches

• Can use one's own desktop or laptop.
• Can use a secondary machine (remote login)
• Can use a CSIRO cluster machine (bragg for Linux or bragg-w for Windows, plus others).
• Clusters are very effective for finely parallel work but less so for coarsely parallel jobs.
• Can use Condor – harvests CPU time on remote machines on network automatically.
• wiki.csiro.au/display/ASC/Scientific+Computing+Homepage


Conclusion

• The use of batch jobs provides a solution for completing certain types of task.
• If you are using computer intensive methods then you might gain greatly from using coarsely parallel methods.
• Trade-off between the benefits and the set-up time and post-run processing determines when it becomes sensible to use coarsely parallel methods
• Invariably more than 1 way exists to do the same thing:
• https://wiki.csiro.au/display/ASC/Scientific+Computing+Homepage


Thank you

CSIRO Marine and Atmospheric Research
Malcolm Haddon
tel. 61 3 6232 5097
email. [email protected]
www.csiro.au


Adding in R.exe to Path

• Control Panel
• System
  – Advanced System Settings
  – Environment Variables
    • PATH - edit
    • Paste "; C:/Program Files/R/R3.1.0/bin/x64" onto the end of the present PATH and exit.
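Whether the change took effect can be checked from within R. The sketch below uses base R's Sys.which(), which returns an empty string when a program is not found on the PATH; the wording of the messages is illustrative only.

```r
## Look for an executable called R on the current PATH
rpath <- Sys.which("R")

if (nzchar(rpath)) {
  message("R found on the PATH at: ", rpath)
} else {
  ## R.home("bin") reports where the running R keeps its executables
  message("R is not on the PATH; this session's executables are in: ", R.home("bin"))
}
```

A new R session, or a new command window, is normally needed before a changed PATH is visible.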
