R (2)
Data Science
2016/10/17
License: CC 3.0
http://shouzo.github.io/
Agenda: Prepare, Basic, Theme, Reference
Prepare
"RStudio"
Basic
1.
Web scraping with the R package rvest:
https://blog.gtwang.org/r/rvest-web-scraping-with-r/
XPath
Google (Chrome)
Mozilla Firefox: FireBug
XPath
https://github.com/aweimeowaweimeow
XPath
TAG
2.
(1) CSV
(2) XML
(3) JSON
(4) DB (database)
(5) RData
(6) SPSS, Stata, SAS, Octave ...
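CSV
Reading CSV, STEP 1: read.table()
Related functions for other delimiters (e.g. tab):
read.csv2()
read.delim2()
[Syntax]
read.table(file = , header = TRUE or FALSE, sep = "")
[Arguments]
file: path or URL of the file to read
header: whether the first line holds the column names
sep: the field separator character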
#()theUrl
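A minimal sketch of the read.table() call described above. The URL used on the slide is not recoverable here, so this example reads from an inline string through the text argument instead. The names csv_text and tomato_small and the two sample rows are assumptions made for illustration only.

```r
# Hypothetical inline sample standing in for the CSV file or URL
csv_text <- "Tomato,Price,Source
Simpson SM,3.99,Whole Foods
Cento Organic,4.99,D Agostino"

# header = TRUE: the first line holds the column names
# sep = ",":     fields are separated by commas
tomato_small <- read.table(text = csv_text, header = TRUE, sep = ",")

# Inspect the column types that read.table() inferred
str(tomato_small)
```

To read a real file, pass its path or URL as the file argument in place of text.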
STEP 2: head()
[Syntax]
head()
STEP 3: data.frame()
[Syntax]
data.frame(name1 = value1, name2 = value2, name3 = value3, ..., stringsAsFactors = TRUE or FALSE)
[Arguments]
stringsAsFactors: whether character columns are converted to factor (TRUE) or kept as character (FALSE)
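The effect of stringsAsFactors is easiest to see side by side. The sketch below builds the same small data frame twice. The column names and values are illustrative assumptions, not the data from the slides.

```r
# Illustrative vectors (assumed for this example)
x <- 1:3
sport <- c("Hockey", "Football", "Baseball")

# Keep the text column as character
df_chr <- data.frame(x = x, Sport = sport, stringsAsFactors = FALSE)

# Convert the text column to factor
df_fac <- data.frame(x = x, Sport = sport, stringsAsFactors = TRUE)

# head() shows the first rows; the second argument limits how many
head(df_chr, 2)

class(df_chr$Sport)  # "character"
class(df_fac$Sport)  # "factor"
```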
STEP 2 and STEP 3:
> head(tomato)
  Round             Tomato Price      Source Sweet Acid Color Texture Overall
1     1         Simpson SM  3.99 Whole Foods   2.8  2.8   3.7     3.4     3.4
2     1  Tuttorosso (blue)  2.99     Pioneer   3.3  2.8   3.4     3.0     2.9
3     1 Tuttorosso (green)  0.99     Pioneer   2.8  2.6   3.3     2.8     2.9
4     1     La Fede SM DOP  3.99   Shop Rite   2.6  2.8   3.0     2.3     2.8
5     2       Cento SM DOP  5.49  D Agostino   3.3  3.1   2.9     2.8     3.1
6     2      Cento Organic  4.99  D Agostino   3.2  2.9   2.9     3.1     2.9
  Avg.of.Totals Total.of.Avg
1          16.1         16.1
2          15.3         15.3
3          14.3         14.3
4          13.4         13.4
5          14.4         15.2
6          15.5         15.1
> x
> y
# "q" is a character vector
> q
> theDF
theDF$Sport
 [1] "Hockey"     "Football"   "Baseball"   "Curling"    "Rugby"      "Lacrosse"
 [7] "Basketball" "Tennis"     "Cricket"    "Soccer"
Theme
(1) ""
(2)
(3)
1.
STEP 1
STEP 2
STEP 3
STEP 4
STEP 5
Text mining and word cloud fundamentals in R: 5 simple steps you should know
https://goo.gl/snM2nZ
Theme 1. STEP 1
http://www.technewsworld.com/story/83998.html
Big Data and Analytics: Creating New Value
Theme 1. STEP 2
# Install packages
install.packages("rvest")        # web scraping
install.packages("tm")           # text mining
install.packages("SnowballC")    # text stemming
install.packages("wordcloud")    # word cloud generator
install.packages("RColorBrewer") # color palettes
# Load packages
library("rvest")
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
RStudio
Theme 1. STEP 3
In Chrome, open the developer tools (press "F12").
Choose "Copy XPath" for the target element.
XPath:
//*[@id="storybody"]
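The following sketch shows one way an XPath like the one above can be used with rvest. To stay self-contained it parses a small inline HTML string rather than the live article. The markup and the names html, page, nodes and story are assumptions for illustration. Only the XPath expression comes from the slide.

```r
library(rvest)

# Hypothetical page fragment; only the id "storybody" mirrors the slide
html <- '<html><body>
  <div id="header">Site header</div>
  <div id="storybody"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>'

# Parse the HTML (a URL could be passed to read_html() instead)
page <- read_html(html)

# Select the element whose id is "storybody" using the copied XPath
nodes <- html_nodes(page, xpath = '//*[@id="storybody"]')

# Extract the text inside the selected element
story <- html_text(nodes)
story
```

Because the XPath matches on the id attribute, the header text is not included in the result.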
# "source.page"
source.page
docs
toSpace
Theme 1. STEP 4
dtm
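The code for the cleaning and counting steps did not survive on the slides, so the sketch below is a reconstruction under assumptions, loosely following the linked "Text mining and word cloud fundamentals in R" tutorial rather than the author's exact code. The sample text is hypothetical. The names docs, toSpace, dtm and d are reused from the slides, while m and v are introduced here.

```r
library(tm)

# Hypothetical sample text standing in for the scraped article body
text <- c("Big data/analytics creates value.",
          "Analytics turns big data into new value.")

# Load the text as a corpus
docs <- VCorpus(VectorSource(text))

# toSpace: replace a given pattern with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")

# Basic cleaning: lower case, drop punctuation and English stop words
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))

# Build the term-document matrix and turn it into a frequency table
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d)
```

The resulting data frame d has the word and freq columns that the wordcloud() call in STEP 5 expects.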
Theme 1. STEP 5
# Fix the random seed so the layout is reproducible
set.seed(1000)
# Draw the word cloud
wordcloud(words = d$word, freq = d$freq, min.freq = 2, max.words = 30,
          random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
2.
STEP 1
STEP 2
STEP 3
STEP 4
STEP 5
http://andrew.ga/works/TextMining/
Theme 2. STEP 1
http://www.appledaily.com.tw/realtimenews/article/life/20161016/968938/
Theme 2. STEP 2
RStudio
# Install packages
install.packages("rvest")      # web scraping
install.packages("jiebaR")     # Chinese word segmentation
install.packages("tm")         # text mining
install.packages("wordcloud2") # word cloud
# Load packages
library("rvest")
library("jiebaR")
library("tm")
library("wordcloud2")
Theme 2. STEP 3
In Chrome, open the developer tools (press "F12").
Choose "Copy XPath" for the target element.
XPath:
//*[@id="summary"]
# "source.page"
source.page
space_tokenizer = function(x) {
  unlist(strsplit(as.character(x[[1]]), '[[:space:]]+'))
}
jieba_tokenizer = function(d) {
  unlist(segment(d[[1]], mixseg))
}
# CNCorpus
#### CNCorpus Function Start ####
CNCorpus = function(d.vec){doc
content.corpus = CNCorpus(list(content.vec))
# CNCorpus
content.corpus
frequency
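The frequency step itself is not shown on the slides. As a rough base R sketch, the space_tokenizer from STEP 3 can be combined with table() to count tokens. The input string and the names segmented, tokens, freq and freq_df are assumptions for illustration. With real Chinese text, the input would first be segmented (for example with jiebaR) so that words are separated by spaces.

```r
# Same definition as the space_tokenizer shown on the slide
space_tokenizer <- function(x) {
  unlist(strsplit(as.character(x[[1]]), "[[:space:]]+"))
}

# Hypothetical, already-segmented text (words separated by spaces)
segmented <- list("data science data mining text mining")

# Split into tokens and count how often each one occurs
tokens <- space_tokenizer(segmented)
freq <- sort(table(tokens), decreasing = TRUE)

# Word/frequency table, most frequent first
freq_df <- data.frame(word = names(freq), freq = as.integer(freq))
freq_df
```

A data frame with a word column and a freq column is also the shape that wordcloud2() takes as its data argument.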
Reference
http://datascienceandr.org/
1. R - Wush Wu, Chih Cheng Liang, Johnson Hsieh
2. R - http://goo.gl/18mwug
3. R - https://goo.gl/NPdzzP
1. DataCamp: https://www.datacamp.com/
2. R for Data Science: http://r4ds.had.co.nz/
R
Jared P. Lander
Taiwan R User Group: https://www.facebook.com/Tw.R.User/
https://www.facebook.com/twdsconf/
Data Visualization: https://www.facebook.com/data.visualize/
Q & A