R language



Why R?

Background of R

What is R?

GNU project

An implementation of the S language, which John Chambers and colleagues developed at Bell Labs; R itself was created by Ross Ihaka and Robert Gentleman

Free software environment for statistical computing and graphics

Functional programming language written primarily in C, Fortran, and R

R is a functional programming language

R is an interpreted language

R is an object-oriented language

R Language

Statistical analysis on the fly

Built-in mathematical functions and graphics modules

FREE! & Open Source! http://cran.r-project.org/src/base/

Why Use R

What is your programming language of choice, R, Python or something else?

“I use R, and occasionally matlab, for data analysis. There is a large, active and extremely knowledgeable R community at Google.”

http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/

Data Scientists at These Companies Use R

“Expert knowledge of SAS (With Enterprise Guide/Miner) required and candidates with strong knowledge of R will be preferred”

http://www.kdnuggets.com/jobs/13/03-29-apple-sr-data-scientist.html?utm_source=twitterfeed&utm_medium=facebook&utm_campaign=tfb&utm_content=FaceBook&utm_term=analytics#.UVXibgXOpfc.facebook

In 2007, Revolution Analytics began providing commercial support for Revolution R

http://www.revolutionanalytics.com/products/revolution-r.php

http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php

Oracle Big Data Appliance integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with the Exadata hardware

http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html

Commercial support for R

Free for Community Version http://www.revolutionanalytics.com/downloads/

http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php

Revolution R

Benchmark            Base R 2.14.2 64   Revolution R (1-core)   Revolution R (4-core)   Speedup (4 core)
Matrix Calculation   17.4 sec           2.9 sec                 2.0 sec                 7.9x
Matrix Functions     10.3 sec           2.0 sec                 1.2 sec                 7.8x
Program Control      2.7 sec            2.7 sec                 2.7 sec                 Not Appreciable

RStudio http://www.rstudio.com/

IDE

RGui http://www.r-project.org/

Shiny makes it super simple for R users like you to turn analyses into interactive web applications that anyone can use

http://www.rstudio.com/shiny/

Web App Development

CRAN (Comprehensive R Archive Network)

Package Management

Repository     URL
CRAN           http://cran.r-project.org/web/packages/
Bioconductor   http://www.bioconductor.org/packages/release/Software.html
R-Forge        http://r-forge.r-project.org/

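R Basics

help(); help(demo)        # open the help system; show the help page for demo()
demo(); demo(is.things)   # list the available demos; run the is.things demo
ls()                      # list the objects in the workspace
rm(x)                     # remove the object x from the workspace
q()                       # quit R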

Basic Commands

Vector, List, Factor, Array, Matrix, Data Frame

Basic Objects

The main object types are vector, matrix, array, factor, list, data frame, and function.

The basic mode of an object's elements is one of:
1. "numeric": real numbers, comprising "integer" (which sometimes has to be requested explicitly) and "double" (double precision).
2. "logical": true or false, shown as TRUE (T) or FALSE (F); the numbers 1 and 0 can also stand in for TRUE and FALSE.
3. "complex": complex numbers.
4. "character": text (strings), usually entered with double quotes (") on both sides.
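The modes listed above can be checked directly at the prompt. The following is a minimal sketch using only base R; the variable name x and the sample values are illustrative and not from the slides.

```r
# Inspect the mode and the storage type of a few values.
x <- 3.14
mode(x)        # "numeric"
typeof(x)      # "double"
typeof(2L)     # "integer": the L suffix requests an integer explicitly
mode(TRUE)     # "logical"
mode(1 + 2i)   # "complex"
mode("hello")  # "character"

# The numbers 1 and 0 coerce to TRUE and FALSE.
as.logical(c(1, 0))
```

mode() reports the broad category, while typeof() distinguishes integer from double storage.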

Scalar
x = 3; y <- 5; x + y

Vectors
x = c(1, 2, 3, 7); y = c(2, 3, 5, 1)
x + y; x * y; x - y; x / y          # arithmetic is element-wise
x = seq(1, 10); y = 2:11; x + y
x = seq(1, 10, by = 2); y = seq(1, 10, length.out = 2)
rep(c(5, 8), 3)
x = c(1, 2, 3); length(x)

Objects & Arithmetic

Summary
x = c(1,2,3,4,5,6,7,8,9,10)
mean(x); min(x); median(x); max(x); var(x)
summary(x)

Subscripting
x = c(1,2,3,4,5,6,7,8,9,10)
x[1:3]; x[c(1,3,5)]
x[c(1,3,5)] * 2 + x[c(2,2,2)]
x[-(1:6)]                           # a negative subscript drops elements 1 to 6

Summaries and Subscripting
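Besides positive and negative positions, a vector can also be subscripted with a logical condition or with element names. A small sketch, assuming the same ten-element vector; the names big, evens, tail3 and scores are hypothetical.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

big   <- x[x > 7]          # logical subscript keeps elements where the test is TRUE
evens <- x[x %% 2 == 0]    # keep the even values
tail3 <- x[-(1:7)]         # negative subscript drops positions 1 to 7

# Elements can also be named and selected by name.
scores <- c(math = 90, art = 75)
scores["art"]
```

Logical subscripts are usually the most convenient way to filter data by value rather than by position.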

Contain a heterogeneous selection of objects
e <- list(thing="hat", size="8.25"); e
l <- list(a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10)
l$j
man = list(name="Qoo", height=183)
man$name

Lists
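To illustrate what "heterogeneous" means in practice, here is a short sketch that mixes element types in one list and shows the common access forms. The tags and age elements are hypothetical additions for illustration.

```r
# A list may hold elements of different types and lengths.
man <- list(name = "Qoo", height = 183, tags = c("a", "b"))
str(man)                   # compact overview of the structure

name1 <- man$name          # access by name with $
name2 <- man[["name"]]     # equivalent access with [[ ]]
sub   <- man["name"]       # single brackets return a list of length 1

man$age <- 30              # assigning to a new name appends an element
kinds <- sapply(man, class)
kinds
```

The difference between [[ ]] (the element itself) and [ ] (a sub-list) is a frequent source of confusion for newcomers.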

An ordered collection of items used to represent categorical values

The different values that a factor can take are called levels

Factors
phone = factor(c('iphone', 'htc', 'iphone', 'samsung', 'iphone', 'samsung'))
levels(phone)

Factor
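A factor stores each value as an integer code that points into its levels. The sketch below, using the same phone data, shows the levels, the underlying codes, and a frequency count; the names counts and codes are illustrative.

```r
phone <- factor(c("iphone", "htc", "iphone", "samsung", "iphone", "samsung"))

levels(phone)                # levels are sorted alphabetically by default
nlevels(phone)               # number of distinct levels
codes  <- as.integer(phone)  # integer code of each observation
counts <- table(phone)       # frequency of each level
counts
```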

Array
An extension of a vector to two or more dimensions
a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12), dim=c(3,4))

Matrices
An extension of a vector to two dimensions (a 2-D array)
x = c(1,2,3); y = c(4,5,6); rbind(x,y); cbind(x,y)
x = rbind(c(1,2,3), c(4,5,6)); dim(x)
x <- matrix(c(1,2,3,4,5,6), nrow=3)               # filled column by column
x <- matrix(c(1,2,3,4,5,6), nrow=3, byrow=TRUE)   # filled row by row
x <- matrix(c(1,2,3,4), nrow=2); y <- matrix(c(5,6), nrow=2); x %*% y   # matrix product
t(matrix(c(1,2,3,4), nrow=2))                     # transpose
solve(matrix(c(1,2,3,4), nrow=2))                 # inverse

Matrices & Array
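The array example on the slide uses two dimensions only. As a complement, here is a minimal sketch of a three-dimensional array, followed by a check that solve() really returns the inverse. The dimensions and values are chosen for illustration.

```r
# A 2 x 3 x 4 array, filled in column-major order.
a <- array(1:24, dim = c(2, 3, 4))
dim(a)
a[2, 3, 1]     # row 2, column 3 of the first slice
a[, , 2]       # the whole second 2 x 3 slice

# The product of a matrix and its inverse is the identity matrix.
m   <- matrix(c(1, 2, 3, 4), nrow = 2)
inv <- solve(m)
m %*% inv
```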

A useful way to represent tabular data; essentially a matrix with named columns

May also include non-numerical variables

Example
df = data.frame(a=c(1,2,3,4,5), b=c(2,3,4,5,6)); df

Data Frame
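The slide example holds only numbers. The following sketch shows a data frame that mixes character, numeric and logical columns, plus a few common operations. The column names and values are hypothetical.

```r
df <- data.frame(
  name  = c("david", "andy", "amy"),
  score = c(80, 90, 85),
  pass  = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE      # keep names as character, not factor
)

str(df)                         # one line per column with its type
df$score                        # extract a column as a vector
top <- df[df$score > 82, ]      # keep the rows that satisfy a condition
df$grade <- ifelse(df$score >= 85, "A", "B")   # add a derived column
mean(df$score)
```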

Function
# A user-defined binary operator
`%myop%` <- function(a, b) {2*a + 2*b}; 1 %myop% 1
f <- function(x) {return(x^2 + 3)}
create.vector.of.ones <- function(n) {
  return.vector <- NA
  for (i in seq_len(n)) {
    return.vector[i] <- 1
  }
  return.vector
}
create.vector.of.ones(3)

Control structures: if … else …, repeat, for, while

Catching errors: tryCatch

Function
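The control structures and tryCatch named on this slide can be sketched as follows. This is an illustrative example only; classify and safe_log are hypothetical helper names, not functions from the deck or from base R.

```r
# if ... else chain
classify <- function(x) {
  if (x < 0) "negative" else if (x == 0) "zero" else "positive"
}

# for: sum the integers 1 to 5
total <- 0
for (i in 1:5) total <- total + i

# while: count up to 3
n <- 0
while (n < 3) n <- n + 1

# repeat: loops until break is reached
k <- 0
repeat {
  k <- k + 2
  if (k >= 6) break
}

# tryCatch: return NA instead of propagating a warning or an error
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,   # e.g. log(-1) warns about NaNs
    error   = function(e) NA_real_    # e.g. log("a") is an error
  )
}
safe_log(-1)
safe_log("a")
```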

Functional language characteristic: functions can be passed as arguments
apply.to.three <- function(f) {f(3)}
apply.to.three(function(x) {x * 7})

Anonymous Function

All R code manipulates objects.

Every object in R has a type.

In assignment statements, R will copy the object, not just the reference to the object.

Objects can carry attributes (for example names, dim, and class).

Objects and Classes
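The copy-on-assignment behaviour and attributes mentioned above can be observed with a few lines of base R. A minimal sketch; the variable names and the "note" attribute are made up for illustration.

```r
# Assignment behaves like a copy: changing b leaves a untouched.
a <- c(1, 2, 3)
b <- a
b[1] <- 100
a

# Attributes are metadata attached to an object.
v <- 1:6
attr(v, "note") <- "demo"
note <- attr(v, "note")
attributes(v)

# Setting the dim attribute turns the vector into a matrix.
dim(v) <- c(2, 3)
is.matrix(v)
```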

Many R functions were implemented using S3 methods

In S version 4 (hence S4), formal classes and methods were introduced that allowed methods taking multiple arguments, abstract types, and inheritance.

S3 & S4 Object
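For comparison with the S4 example on the next slide, an S3 class is just an object with a class attribute, and a method is a function named generic.class. A small sketch under those conventions; the student class and the describe generic are hypothetical.

```r
# An S3 object: a list tagged with a class attribute.
s <- structure(list(name = "david", score = 80), class = "student")

# A method for the existing generic print().
print.student <- function(x, ...) {
  cat("Student", x$name, "scored", x$score, "\n")
  invisible(x)
}

# A new generic with a default method and a student method.
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "an object"
describe.student <- function(obj, ...) paste("student", obj$name)

print(s)
describe(s)
describe(42)
```

S3 dispatch looks only at the class of the first argument, which is one of the limitations S4 was designed to lift.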

S4 OOP Example
setClass("Student", representation(name = "character", score = "numeric"))
studenta = new("Student", name = "david", score = 80)
studentb = new("Student", name = "andy", score = 90)
# show() controls how a Student is printed; here it prints score + 100
setMethod("show", signature("Student"), function(object) { cat(object@score + 100, "\n") })
setGeneric("getscore", function(object) standardGeneric("getscore"))
studenta          # printed through show(): 180

OOP of S4
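The slide declares the getscore generic without giving it a method. The sketch below shows one way it could be completed, and also illustrates S4 inheritance and slot type checking. The GradStudent class and its thesis slot are hypothetical additions.

```r
library(methods)

setClass("Student", representation(name = "character", score = "numeric"))

# A generic plus a method for Student objects.
setGeneric("getscore", function(object) standardGeneric("getscore"))
setMethod("getscore", signature("Student"), function(object) object@score)

# Inheritance: GradStudent extends Student (hypothetical class).
setClass("GradStudent", contains = "Student",
         representation(thesis = "character"))

g <- new("GradStudent", name = "andy", score = 90, thesis = "R internals")
getscore(g)          # the Student method is inherited
is(g, "Student")

# Slots are type-checked: a character score is rejected with an error.
bad <- tryCatch(new("Student", name = "x", score = "high"),
                error = function(e) "rejected")
bad
```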

A package is a related set of functions, help files, and data files that have been bundled together.

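Basic Command
library(rpart)              # load an installed package
install.packages("rpart")   # install a package from CRAN
(.packages())               # list the packages currently attached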

Packages
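A typical package workflow might look like the sketch below. It uses the stats package, which ships with R, so that it runs without network access; the commented install line is only a reminder of the usual CRAN call.

```r
# install.packages("rpart")   # uncomment to install from CRAN (needs network access)

# Check that a package is available before attaching it.
if (requireNamespace("stats", quietly = TRUE)) {
  library(stats)
}

attached <- (.packages())     # names of the packages currently attached
attached

# A function can also be called without attaching its package.
stats::median(c(1, 5, 9))
```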

Packages used in Machine Learning for Hackers

Apply
Returns a vector, array, or list of values obtained by applying a function to the margins of an array or matrix.
data <- cbind(c(1,2), c(3,4))
data.rowsum <- apply(data, 1, sum)   # margin 1: apply over rows
data.colsum <- apply(data, 2, sum)   # margin 2: apply over columns
data

Apply
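apply() has close relatives for vectors and lists. The following sketch repeats the matrix example with its results spelled out and adds sapply() and lapply(); the variable names are illustrative.

```r
data <- cbind(c(1, 2), c(3, 4))

row_sums <- apply(data, 1, sum)    # one value per row: 4 6
col_sums <- apply(data, 2, sum)    # one value per column: 3 7

# sapply applies a function to each element and simplifies to a vector.
squares <- sapply(1:3, function(i) i^2)

# lapply does the same but always returns a list.
lens <- lapply(list(a = 1:3, b = 1:5), length)
lens$b
```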

Save and Load
x = USPersonalExpenditure
save(x, file="~/test.RData")   # write x to disk
rm(x)                          # remove it from the workspace
load("~/test.RData")           # restore x under its original name
x

File IO
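Besides save() and load(), data is often exchanged as CSV or stored as a single R object. A minimal round-trip sketch that writes to temporary files so it leaves nothing behind; the data frame contents are made up.

```r
df <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# CSV round trip (plain text, readable by other tools).
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df2 <- read.csv(csv_path, stringsAsFactors = FALSE)

# RDS round trip (binary, preserves the R object exactly).
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
df3 <- readRDS(rds_path)

unlink(c(csv_path, rds_path))   # clean up the temporary files
```

Unlike load(), readRDS() returns the object, so the caller chooses the name it is assigned to.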

Charts and Graphics

years = as.numeric(colnames(USPersonalExpenditure))
xrange = range(years)
yrange = range(USPersonalExpenditure)
# Empty plot (type = "n") sized to hold every series
plot(xrange, yrange, type="n", xlab="Year", ylab="Expenditure")
# One line per expenditure category (the rows of the matrix)
for (i in 1:5) {
  lines(years, USPersonalExpenditure[i,], type="b", lwd=1.5)
}

Plotting Example
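The same chart can be drawn in one call with matplot(), which plots each column of a matrix as its own series. This is an alternative sketch, not the method used on the slide; it writes to a temporary PDF so it also runs without a display, and the colours and symbols are arbitrary choices.

```r
years <- as.numeric(colnames(USPersonalExpenditure))

out <- tempfile(fileext = ".pdf")
pdf(out)                                   # send graphics to a PDF file

# Transpose so that each category becomes a column (one series each).
matplot(years, t(USPersonalExpenditure), type = "b",
        lty = 1, pch = 1:5, col = 1:5,
        xlab = "Year", ylab = "Expenditure")
legend("topleft", legend = rownames(USPersonalExpenditure),
       col = 1:5, lty = 1, pch = 1:5)

invisible(dev.off())                       # close the device and write the file
```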

Reference & Resource

R in a Nutshell

Study Material

Online Reference


Community Resources for R help

Websites: Stack Overflow, Cross Validated

Mailing lists: R-help, R-devel, R-sig-*, package-specific mailing lists

Blog: R-bloggers

Twitter: https://twitter.com/#rstats

Quora: http://www.quora.com/R-software

Resource

Conferences: useR!, R in Finance, R in Insurance

Others: Joint Statistical Meetings, Royal Statistical Society Conference

Local User Groups: http://blog.revolutionanalytics.com/local-r-groups.html

Taiwan R User Group: http://www.facebook.com/Tw.R.User http://www.meetup.com/Taiwan-R/

Resource (Cont'd)

Confidential | Copyright 2012 Trend Micro Inc.

Thank You!
