Los Angeles R users group - July 12 2011 - Part 2

Turbocharge your R

Rob Zinkov

July 12th, 2011

Outline

1 Introduction

2 .C

3 .Call

4 Rcpp

Introduction

What is the point of this talk?

To show you how to speed up your R code by moving slow parts into C or C++

Caveats

• Please try to optimize your R code first

• Some of these mechanisms will make coding harder

.C

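• This is the most basic mechanism for calling C from R

• R copies each argument before passing it to C and returns the (possibly modified) copies as a list

• Only accepts basic vector types (integer, double, logical, character), passed as pointers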

Step 1. Put function in file (foo.c)

    /* Square each element of x in place; nin[0] holds the length of x. */
    void foo(int *nin, double *x)
    {
        int n = nin[0];
        int i;

        for (i = 0; i < n; i++)
            x[i] = x[i] * x[i];
    }

• Note this is a void function: results are returned by modifying the arguments

• Note arguments are passed in as pointers, even scalars such as the length

• Try to keep to one function per file
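Because a .C routine returns void, a computed result has to come back through one of the pointer arguments. A minimal sketch of that convention, assuming a hypothetical routine vec_sum with an extra output argument out (neither name is in the original slides):

```c
/* Hypothetical .C-style routine: sums the n[0] elements of x into out[0].
   Every argument is a pointer, including the scalar length and the result,
   because that is how .C hands R vectors to C. */
void vec_sum(int *n, double *x, double *out)
{
    int i;

    out[0] = 0.0;                 /* the result slot is supplied by the caller */
    for (i = 0; i < n[0]; i++)
        out[0] += x[i];
}
```

Once compiled and loaded like foo.c, this could be called from R as .C("vec_sum", as.integer(length(x)), as.double(x), out = double(1))$out, since .C returns its arguments as a list.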

Step 2. Compile file with R

$ R CMD SHLIB foo.c

Step 3. Load into R

> dyn.load("foo.so")

Step 4. Call your code

> .C("foo", n=as.integer(5), x=as.double(rnorm(5)))

• The first argument to .C is the name of the function, followed by that function's arguments

• Arguments must be the right type: coerce them with as.integer() and as.double()

• Calling into C code risks segfaults
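Why the type matters: unless the arguments are coerced on the R side, a C parameter declared as double * that receives the storage of an integer vector simply reads the integer bytes as if they were a double. A small plain-C sketch of that misreading, using a hypothetical helper misread_as_double (not part of the R API) and assuming the usual 4-byte int and 8-byte double:

```c
#include <string.h>

/* Assumption: 4-byte int and 8-byte double, as on common platforms. */
_Static_assert(sizeof(double) == 2 * sizeof(int), "unexpected type sizes");

/* Hypothetical helper: reinterpret the first 8 bytes of an int buffer as one
   double, which is effectively what a routine declared with double *x does
   when it is handed the storage of an integer vector. */
double misread_as_double(const int *ints)
{
    double d;

    memcpy(&d, ints, sizeof d);   /* copy raw bytes, no numeric conversion */
    return d;
}
```

With a buffer holding the integers 5 and 0, the value read back is not 5.0, which is why the call above wraps its arguments in as.integer() and as.double().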

.Call

Why?

• Less copying of data structures (lower memory use)

• Access more of R's data structures

• Access more kinds of R data

• Do more in C

.Call code

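    #include <R.h>
    #include <Rinternals.h>
    #include <Rmath.h>

    /* Sum the elements of a numeric (double) vector and print the result. */
    SEXP vecSum(SEXP Rvec){
        int i, n;
        double *vec, value = 0;

        vec = REAL(Rvec);    /* pointer to R's own data, no copy is made */
        n = length(Rvec);
        for (i = 0; i < n; i++) value += vec[i];
        Rprintf("The value is: %4.6f \n", value);   /* Rprintf writes to the R console */
        return R_NilValue;
    }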

$ R CMD SHLIB vecSum.c

> dyn.load("vecSum.so")

> .Call("vecSum", rnorm(10))

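    #include <R.h>
    #include <Rinternals.h>

    /* Return the integer sequence a, a+1, ..., b (like a:b in R). */
    SEXP ab(SEXP Ra, SEXP Rb){
        int i, a, b;
        SEXP Rval;

        /* coerceVector may allocate, so protect its results from the garbage collector */
        PROTECT(Ra = coerceVector(Ra, INTSXP));
        PROTECT(Rb = coerceVector(Rb, INTSXP));
        a = INTEGER(Ra)[0];
        b = INTEGER(Rb)[0];
        PROTECT(Rval = allocVector(INTSXP, b - a + 1));
        for (i = a; i <= b; i++)
            INTEGER(Rval)[i - a] = i;
        UNPROTECT(3);    /* one for each PROTECT above */
        return Rval;
    }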

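Because .Call shares memory with R instead of copying it, take explicit care not to collide with R: PROTECT every R object you allocate so the garbage collector cannot free it, and do not modify arguments in place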

Rcpp

Why?

• Use C++ instead of C

• Ability to use C++ objects that represent R data more naturally

• Easier to compile and load code (no separate R CMD SHLIB and dyn.load steps when using inline)

src <- '
    IntegerVector tmp(clone(x));     // clone so the input object is not modified
    double rate = as< double >(y);
    int tmpsize = tmp.size();
    RNGScope scope;                  // read the RNG state from R, write it back on exit
    for (int ii = 0; ii < tmpsize; ii++) {
        tmp(ii) = Rf_rbinom(tmp(ii), rate);
    }
    return tmp;
'

require(inline)

## compile the function, inspect the process with verbose=TRUE
testfun2 = cxxfunction(signature(x='integer', y='numeric'),
                       src, plugin='Rcpp', verbose=TRUE)

require(inline)

testfun = cxxfunction(
    signature(x="numeric",
              i="integer"),
    body = '
        NumericVector xx(x);
        int ii = as<int>(i);
        xx = xx * ii;
        return( xx );
    ', plugin="Rcpp")

testfun(1:5, 3)

Conclusions

It is fairly easy to make R faster

Now go make your R code faster

References

• http://www.stat.umn.edu/~charlie/rc/

• http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html

• http://www.sfu.ca/~sblay/R-C-interface.ppt

• http://www.biostat.jhsph.edu/~bcaffo/statcomp/files/dotCall.pdf

• http://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-quickref.pdf

• http://www.jstatsoft.org/v40/i08/paper

Questions?
