Econometric Theory (I)
A Quick Introduction to R
Yen, Chia-Yi (NTU)
Sep 18, 2014
Installation
R engine: http://cran.rstudio.com/
R IDE: RStudio, http://www.rstudio.com/ide/download/desktop (optional, but believe me, it's super charming!)
Having trouble? See http://homepage.ntu.edu.tw/~ckuan/courses.html (Prof. Chung-Ming Kuan's website)
R is a free software environment for statistical computing and graphics.
Why R
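Data science is hot! R is the most widely used statistical language,
and the tool most people use to analyze data.
# matrix operations
# statistical analysis
# easy to interface with C++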
Software used in data analysis competitions in 2011.
Sources: http://r4stats.com/articles/popularity/
http://blog.revolutionanalytics.com/2012/08/r-language-popularity-for-data-mining.html
Why R: demo
[Figure: the R engine and the RStudio interface, with panes for script mode, interactive mode, help & graphics, and debugging]
A quick look at R
Expression: calculation
Variable: numeric, string, boolean; vector, matrix, data.frame
Function: built-in, package, self-defined
Module
Package: e.g. input / output (I/O), linear algebra, regression
Let's demo!
Open "0918_firstR.R" in RStudio.
What's the difference between interactive mode and script mode?
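One practical difference, sketched under simple assumptions: at the console (interactive mode) R prints the value of each expression as soon as you type it, whereas a script run as a whole with source() only shows what you explicitly print(). The two tiny script files below are made up for illustration and written to temporary files.

```r
# Two throwaway scripts written to temporary files (contents are made up)
silent_script = tempfile(fileext = ".R")
writeLines(c("x = 1 + 1", "x"), silent_script)       # bare 'x': would print at the console

loud_script = tempfile(fileext = ".R")
writeLines(c("x = 1 + 1", "print(x)"), loud_script)  # explicit print()

# source() runs a whole file, as in script mode; by default it does not
# auto-print top-level values, so only explicit print() calls produce output
out_silent = capture.output(source(silent_script))
out_loud   = capture.output(source(loud_script))

out_silent   # character(0): nothing was printed
out_loud     # the single line "[1] 2"
```

Calling source(file, echo = TRUE) echoes each expression together with its value, which is closer to what you see in interactive mode.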
Expression
# R code
# Works much like a calculator...
1+1
(1+1)^4
((1+1)^4-2)%%3
sin(((1+1)^4-2)%%3)

# R code
# Comparisons return TRUE or FALSE
2<3
2==3
2!=3
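As a check on the expressions above: %% is R's modulo (remainder) operator, and its companion %/% performs integer division. The comments give the value each line evaluates to.

```r
# Arithmetic: each line evaluates to a single number
1 + 1                      # 2
(1 + 1)^4                  # 16
((1 + 1)^4 - 2) %% 3       # 14 %% 3, the remainder of 14 / 3: 2
((1 + 1)^4 - 2) %/% 3      # integer division, 14 %/% 3: 4
sin(((1 + 1)^4 - 2) %% 3)  # sin(2), about 0.909

# Comparisons: each line evaluates to TRUE or FALSE
2 < 3    # TRUE
2 == 3   # FALSE
2 != 3   # TRUE
```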
Variable
A variable is like a container that stores data: numeric, string, or boolean.
# Note: "=" and "==" do different things
# "=": assign    "==": equal to
# R code
x = 3                  # numeric
x = "Hi, Everyone ~~"  # string
x = 'Hi, Everyone ~~'  # quotes are required (single or double)
x = TRUE               # boolean
x = T                  # must be uppercase
x = 2<3                # a comparison also gives a boolean (TRUE)
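To see which kind of value a variable currently holds, class() can be used; note that R calls strings "character" and booleans "logical". R also has a second assignment operator, <-, which many R style guides prefer over =; for simple assignments like these the two behave the same.

```r
x = 3
class(x)          # "numeric"
x = "Hi, Everyone ~~"
class(x)          # "character" (R's name for strings)
x = TRUE
class(x)          # "logical"   (R's name for booleans)
x = 2 < 3
x                 # TRUE

y <- 3            # "<-" is R's traditional assignment operator
y == 3            # TRUE
```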
Variable
A variable can also store a vector, a matrix, a data.frame, etc.
# R code
# vector
a = c(1,2,3,4)            # numeric vector
b = c("1","2","3","4")    # string vector
c = c(T, F, T, T)         # boolean vector
# matrix
d = matrix(a, nrow=2, ncol=2)
dim(a) = c(2,2)           # alternatively, give a itself 2x2 dimensions
# data.frame
e = data.frame(string = b, boolean = c)  # columns can hold different data types
[Figure: the numeric vector (1, 2, 3, 4); the numeric matrix with rows (1, 3) and (2, 4); and a data.frame pairing the strings "1" to "4" with T/F values]
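A few functions for inspecting objects like these. The sketch rebuilds its own vector, matrix, and data.frame so it runs on its own (the boolean vector is named flags here, an illustrative choice). matrix() fills column by column by default, which is why d has rows (1, 3) and (2, 4).

```r
a = c(1, 2, 3, 4)
b = c("1", "2", "3", "4")
flags = c(TRUE, FALSE, TRUE, TRUE)

d = matrix(a, nrow = 2, ncol = 2)  # filled column by column
dim(d)        # 2 2
d[1, 2]       # row 1, column 2: 3
d[, 1]        # the whole first column: 1 2

e = data.frame(string = b, boolean = flags)
nrow(e)       # 4
e$boolean     # TRUE FALSE TRUE TRUE
sapply(e, class)  # boolean is "logical"; string is "character" in R >= 4.0 ("factor" in older R)
```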
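Function
A function is like a collection of computations; in other words, it wraps a bunch of operations into one unit that takes an input, does some computation, and returns an output.
Example: the function length takes the input [1,2,3,4] and returns 4.
# R code
a = c(1,2,3,4)      # input
result = length(a)  # output: 4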
Function
Example: the function mean takes the input [1,2,3,4] and returns the output 2.5.
A function can be built-in, self-defined, or come from a package.
# Built-in
data = 1:4           # input
output = mean(data)  # output: 2.5
# Self-defined
MyMean = function(data){
  total = sum(data)
  len = length(data)
  result = total / len
  return(result)
}
data = 1:4
output = MyMean(data)  # also 2.5
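Following the same pattern as MyMean, here is a hypothetical MyVar (not part of base R) that computes the sample variance, dividing by n - 1 as R's built-in var() does. Checking a self-written function against a built-in one is a cheap way to catch mistakes.

```r
# Hypothetical self-defined function: sample variance with an n - 1 denominator
MyVar = function(data){
  n = length(data)
  dev = data - mean(data)          # deviations from the sample mean
  result = sum(dev^2) / (n - 1)
  return(result)
}

data = 1:4
MyVar(data)                        # 1.666667 (= 5 / 3)
all.equal(MyVar(data), var(data))  # TRUE
```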
Module
Like a collection of functions, e.g. data_preprocess.R
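The contents of data_preprocess.R are not shown here, so this sketch writes a hypothetical stand-in module (one made-up function, Standardize) to a temporary file and loads it with source(). source() runs every line of the file, so the functions it defines become available in the current session.

```r
# Write a hypothetical module file; in practice this would be e.g. data_preprocess.R
module_file = tempfile(fileext = ".R")
writeLines(c(
  "# Hypothetical helper: rescale a vector to mean 0 and standard deviation 1",
  "Standardize = function(data){",
  "  (data - mean(data)) / sd(data)",
  "}"
), module_file)

source(module_file)   # load the module: Standardize() now exists in this session

z = Standardize(c(2, 4, 6))
z                     # -1 0 1
```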
Package
Like a collection of modules; you can extend R's built-in functions by installing packages.
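Package
How do you use packages?
# R code
x = 1:10      # set the x-axis values
y = sin(3*x)  # set the y-axis values
plot(x,y)     # R's default plotting function
# For a nicer-looking plot...
install.packages("ggplot2")  # download the ggplot2 package from CRAN to your machine
# the parentheses and the straight quotation marks are required
library(ggplot2)  # load the package from your machine into this script
qplot(x,y)        # now qplot, a function written in ggplot2, is available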
Don't forget your good friends...
Help & Google
# R code
help(mean)
?mean
example(mean)
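Flow Control
#1 if
if (expression){
  statement
}
# R code
data = rnorm(100)  # draw 100 sample points from the standard normal distribution
mu = mean(data)
mu > 0
if ( mu > 0 ){
  print("mean is greater than 0")
}else{
  print("mean is less than or equal to 0")
}
If the expression evaluates to TRUE, the statements inside the braces run; otherwise they are skipped (and the else branch, if present, runs instead).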
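Because rnorm() draws fresh random numbers on every run, the branch taken above can change from run to run. Fixing the random seed with set.seed() makes the draws reproducible; the seed value 2014 below is arbitrary. This sketch also adds an else if branch for the (very unlikely) case of an exact zero.

```r
set.seed(2014)               # arbitrary seed, for reproducibility
data = rnorm(100)            # 100 draws from the standard normal distribution
mu = mean(data)

if (mu > 0){
  msg = "mean is greater than 0"
} else if (mu < 0){
  msg = "mean is less than 0"
} else {
  msg = "mean is exactly 0"
}
print(msg)

set.seed(2014)
identical(data, rnorm(100))  # TRUE: same seed, same draws
```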
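Flow Control
#2 while
while (expression){
  statement
}
# R code
data = rnorm(100)  # draw 100 sample points from the standard normal distribution
mu = mean(data)
mu > 0
while (mu > 0){
  print("mean is greater than 0")
  # mu = 0
}
# If mu > 0, this is an infinite loop; try uncommenting the line inside the while loop
As long as the expression evaluates to TRUE, the statements inside the braces run again and again; once it is FALSE, the loop stops.
Flow Control
#3 for
for (i in 1:3){
  statement(i)
}
# R code
for ( i in 1:3){
  data = rnorm(i)
  print(data)
}
When i = 1, the body runs once; when i = 2, it runs again; when i = 3, it runs once more; then the loop ends.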
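Two loops that end on their own, in contrast to the infinite while loop. The while loop halves a made-up starting value until it drops below 1, changing the variable in its condition on every pass; the for loop repeats a small simulation five times and stores each sample mean in a vector allocated up front. The seed is arbitrary.

```r
# while: repeat as long as the condition is TRUE; the body must change x
x = 100
steps = 0
while (x >= 1){
  x = x / 2
  steps = steps + 1
}
steps                          # 7 (100 -> 50 -> ... -> 0.78125)

# for: i takes the values 1, 2, ..., 5 in turn
set.seed(1)                    # arbitrary seed, for reproducibility
means = numeric(5)             # pre-allocate a vector of five zeros
for (i in 1:5){
  means[i] = mean(rnorm(100))  # sample mean of 100 standard normal draws
}
means
```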
Homework (optional)
Code School http://tryr.codeschool.com
DataMind http://www.datamind.org/#/
Thanks for listening :)
Appendix: object
Every object has two intrinsic attributes: mode and length.
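The two attributes can be queried with mode() and length(). Note that the length of a string is the number of strings, not the number of letters (nchar() counts letters).

```r
mode(1:4);            length(1:4)            # "numeric", 4
mode("Hi");           length("Hi")           # "character", 1
nchar("Hi")                                  # 2 letters
mode(c(TRUE, FALSE)); length(c(TRUE, FALSE)) # "logical", 2
mode(list(1, "a"));   length(list(1, "a"))   # "list", 2
mode(mean)                                   # "function"
```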
Appendix: operators
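The original operator table is not recoverable from this text; the following is a sketch of commonly used operators, not necessarily the same list. For matrix work, keep in mind that * multiplies element by element while %*% is matrix multiplication.

```r
# Arithmetic
7 + 2; 7 - 2; 7 * 2; 7 / 2     # 9 5 14 3.5
7 ^ 2                          # 49
7 %% 2                         # remainder: 1
7 %/% 2                        # integer division: 3

# Comparison and logical
(7 > 2) & (7 < 5)              # AND: FALSE
(7 > 2) | (7 < 5)              # OR: TRUE
!(7 > 2)                       # NOT: FALSE

# Matrix operators
A = matrix(1:4, nrow = 2)      # rows (1, 3) and (2, 4)
A * A                          # element-wise: rows (1, 9) and (4, 16)
A %*% A                        # matrix product: rows (7, 15) and (10, 22)
t(A)                           # transpose
```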
Appendix: random sampling
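A sketch of common base-R sampling functions (not necessarily those on the original slide): sample() draws from a given set, with or without replacement, while rnorm() and runif() draw from the normal and uniform distributions. The seed is arbitrary.

```r
set.seed(123)                                    # arbitrary seed, for reproducibility
s = sample(1:10, 5)                              # 5 draws from 1..10 without replacement
flips = sample(c("H", "T"), 10, replace = TRUE)  # 10 coin flips, with replacement
z = rnorm(5, mean = 0, sd = 1)                   # 5 standard normal draws
u = runif(5, min = 0, max = 1)                   # 5 uniform draws on (0, 1)
s; flips; z; u
```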
Appendix: Most-used functions (1)
Appendix: Most-used functions (2)
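The original lists are not recoverable from this text; below is a hedged selection of frequently used base-R functions, applied to made-up data, and not necessarily the ones shown on the slides.

```r
x = c(3, 1, 4, 1, 5, 9, 2, 6)   # made-up data
sum(x); mean(x); median(x)      # 31, 3.875, 3.5
var(x); sd(x)                   # sample variance and standard deviation
min(x); max(x); range(x)        # 1, 9, and 1 9
sort(x)                         # 1 1 2 3 4 5 6 9
which(x > 4)                    # positions of elements above 4: 5 6 8
seq(0, 1, by = 0.25)            # 0.00 0.25 0.50 0.75 1.00
rep(c("a", "b"), times = 2)     # "a" "b" "a" "b"
summary(x)                      # min, quartiles, median, mean, max
```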
Appendix: Basic Graphics (1)
Appendix: Basic Graphics (2)
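A sketch of a few high-level plotting functions, each of which starts a new plot. To keep it runnable without a screen, the plots go to a temporary PDF file; the data are simulated from a made-up linear relationship.

```r
out_file = tempfile(fileext = ".pdf")
pdf(out_file)                       # send plots to a PDF file instead of the screen

set.seed(1)                         # arbitrary seed
x = rnorm(200)
y = 1 + 2 * x + rnorm(200)          # made-up linear relationship plus noise

plot(x, y, main = "Scatter plot")   # high-level: starts a new plot (page 1)
hist(x, main = "Histogram of x")    # another new plot (page 2)
boxplot(y, main = "Box plot of y")  # page 3

dev.off()                           # close the device and finish writing the file
file.exists(out_file)               # TRUE
```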
Appendix: Low-level Graphics
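Low-level graphics functions do not start a new plot; they add elements to the one already drawn. A sketch, again writing to a temporary PDF file so it runs without a screen:

```r
out_file = tempfile(fileext = ".pdf")
pdf(out_file)

x = seq(0, 2 * pi, length.out = 100)
plot(x, sin(x), type = "l", main = "Adding to an existing plot")  # high-level call
lines(x, cos(x), lty = 2)                         # low-level: add a dashed line
points(c(pi / 2, pi), c(1, 0), pch = 19)          # low-level: add points
abline(h = 0, col = "grey")                       # low-level: horizontal reference line
legend("bottomleft", legend = c("sin", "cos"), lty = c(1, 2))
text(pi, 0.2, "x = pi")                           # low-level: annotate

dev.off()
```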