[Event Series] Hands-On R Data Analysis in Practice

  • ,

    R

    1

  • ()

    : https://goo.gl/65QPyM

    : https://goo.gl/JKHJAE

    (R, RStudio)

    2

    https://goo.gl/65QPyM

  • ()

    session_00_install_packages.R

    3

  • FAQ.1: : UTF-8

  • FAQ.2: : , setwd

  • FAQ.3: Rtools: , session00.R

    https://cran.r-project.org/bin/windows/Rtools/

  • Pkg installr

    # installing/loading the package:
    if (!require(installr)) { install.packages("installr"); require(installr) }

    # using the package:
    updateR()

    7

    FAQ.3: Rtools (2): R !!!

  • FAQ.4: dependent package: ,

    Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called Rcpp

    > install.packages("Rcpp", dependencies = TRUE)
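FAQ.4 is about a missing dependency (Rcpp in the error above). As a minimal sketch, one way to find out which packages are absent before installing them could look like this. The helper name `missing_packages` is hypothetical and is not part of the course scripts.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages mentioned in the slides (xml2 for parsing, Rcpp from the error).
needed <- c("xml2", "Rcpp")
to_install <- missing_packages(needed)

# Report what is missing; installation is left commented out on purpose.
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install, dependencies = TRUE)
}
```

Passing `dependencies = TRUE` asks `install.packages()` to pull in the packages that the requested ones depend on, which is what the FAQ is aiming at.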

  • FAQ.5: Mac:
    1) par(family = 'STHeiti')
    2) plot( , family = 'STHeiti')

  • 10

    Lecturers

  • 11

  • (demo)

    data code

    12

  • session_00_install_R_packages.R

    Rstudio

    Ex. http://www.appledaily.com.tw/appledaily/article/headline/20170112/37516585

    , Sli.do

    13

    http://search.appledaily.com.tw/charity/projlist/

  • Outline

    A. & ---

    B. ---

    C. ---

    D. ---

    14

  • Session A

    Data Collection

  • ()

    16

  • 17

    http://www.appledaily.com.tw/appledaily/article/headline/20160624/37280888

    http://search.appledaily.com.tw/charity/projlist/

  • download.file session_A_DataCollection.R

    - A-01

    18
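A rough sketch of what `download.file()` does, assuming a Unix-like system. To stay runnable offline it downloads from a local `file://` URL built from a temporary file; in the exercise the URL would be an article address instead. All file names here are illustrative.

```r
# Create a small local HTML file to stand in for a remote article page.
src <- tempfile(fileext = ".html")
writeLines("<html><body><h1>Title</h1><p>Paragraph</p></body></html>", src)

# download.file(url, destfile) copies the resource at `url` to `destfile`.
# A file:// URL is used here so the sketch needs no network access.
src_url <- paste0("file://", normalizePath(src, winslash = "/"))
dest <- tempfile(fileext = ".html")
status <- download.file(url = src_url, destfile = dest, quiet = TRUE)

# A return value of 0 indicates success; the saved page can then be read back.
page <- readLines(dest)
```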

  • # Article urlurl ()

  • 20

  • : xml, csv, json

    -- R packages

    21

  • HTML

    Title Paragraph

    Item 1 Item 2

    Title Paragraph

    Item 1 Item 2

    22

  • Chrome/Firefox F12

    http://search.appledaily.com.tw/charity/projlist/

  • (pkg xml2)

    xml2 read_html; read_xml

    xml_find_all; xml_find_first

    xml_text; xml_attr

    Title Paragraph

    Item 1 Item 2

    Xpath

    24

  • XPath: a path of XML Tree

    /     children
    //    descendant
    [@]   attribute
    *     any element
    |     OR

    25

    [Figure: XML tree example ("A book")]

    : https://goo.gl/jNHmDH

    Xpath: https://goo.gl/OcfO0O

  • Xpath = //*[@id='inquiry3']/table//tr[4]/td[1]

    http://search.appledaily.com.tw/charity/projlist/

  • (pkg xml2)

    library(xml2)

    # set your target urldoc

  • xml_text(doc):

    xml_attr(doc, attr):

    29

    Title Paragraph

    Item 1 Item 2

    Title Paragraph

    Item 1 Item 2
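The xml2 functions named on these slides can be tried on a tiny inline document. This is a sketch only: the HTML string is made up to mirror the "Title / Paragraph / Item 1 / Item 2" example, and the `href` values are hypothetical.

```r
library(xml2)

# A tiny HTML document standing in for a downloaded page (hypothetical content).
html <- '<html><body>
  <h1>Title</h1>
  <p>Paragraph</p>
  <ul>
    <li><a href="/item/1">Item 1</a></li>
    <li><a href="/item/2">Item 2</a></li>
  </ul>
</body></html>'
doc <- read_html(html)

# xml_find_first returns the first match; xml_find_all returns every match.
title <- xml_text(xml_find_first(doc, "//h1"))
links <- xml_find_all(doc, "//ul/li/a")

# xml_text extracts the text content; xml_attr extracts an attribute value.
link_text <- xml_text(links)
link_href <- xml_attr(links, "href")
```

Here `//h1` uses the descendant axis, while `//ul/li/a` walks child steps below the list, matching the `/` and `//` notation from the XPath slide.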

  • A-01 (8 mins)

    Xpath1-1:

    1-2:

    1-3:

    bonus:

    - A-01

    30

    http://search.appledaily.com.tw/charity/projlist/

  • A_ex01 ()

    Xpath1-1: "//*[@id='inquiry3']/table//tr/td[1]"

    1-2: "//*[@id='inquiry3']//tr[2]/th"

    1-3: "//*[@id='inquiry3']//tr/td[6]/a"

    bonus: "//*[@id='charity_day']"

    - A-01

    35
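To see how XPath expressions of this shape select table cells, here is a sketch on a made-up table. The HTML is a hypothetical stand-in for the project list page (only the `inquiry3` id is taken from the slides), so the real page layout and column positions may differ.

```r
library(xml2)

# Hypothetical stand-in for the project list page: a table inside an element
# with id "inquiry3".
html <- '<div id="inquiry3"><table>
  <tr><th>aid</th><th>title</th></tr>
  <tr><td>A0001</td><td><a href="/detail/1">Case one</a></td></tr>
  <tr><td>A0002</td><td><a href="/detail/2">Case two</a></td></tr>
</table></div>'
doc <- read_html(html)

# First cell of every row that has <td> cells (the pattern of answer 1-1).
aid <- xml_text(xml_find_all(doc, "//*[@id='inquiry3']/table//tr/td[1]"))

# Link targets in the second column (same idea as answer 1-3, which uses td[6]).
url_detail <- xml_attr(
  xml_find_all(doc, "//*[@id='inquiry3']//tr/td[2]/a"), "href"
)
```

The header row is skipped automatically because it contains `th` cells rather than `td` cells.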

  • data frame

    # vector
    data.frame(a1=a1, a2=a2, a3=a3, )

    # matrix()

    - A-02

    36
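A short sketch of building a data frame from vectors and from a matrix. The values are illustrative, and using `as.data.frame()` for the matrix case is an assumption, since the slide's own matrix example did not survive.

```r
# Three vectors of equal length (illustrative values).
a1 <- c("A0001", "A0002", "A0003")
a2 <- c(1200, 560, 980)
a3 <- c(TRUE, FALSE, TRUE)

# From vectors: each vector becomes a column.
df_vec <- data.frame(a1 = a1, a2 = a2, a3 = a3, stringsAsFactors = FALSE)

# From a matrix: each matrix column becomes a column of the data frame.
m <- matrix(1:6, nrow = 3, ncol = 2, dimnames = list(NULL, c("x", "y")))
df_mat <- as.data.frame(m)
```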

  • A-02 (13 mins)

    csv # npage

  • aid

    title

    date.published

    case.closed

    donation

    url.article

    url.detail

    A-02 (13 mins)

    - A-02

    df_article_raw.csv

    # npage

  • aid

    title

    date.published

    case.closed

    donation

    url.article

    url.detail ()

    A-02 (13 mins)

    - A-02

    df_article_raw.csv

    Xpath

    39

    http://search.appledaily.com.tw/charity/projlist/

  • A_ex02 ()

    - A-02

    40

  • A-03 (Optional)

    - A-03

    41

    http://search.appledaily.com.tw/charity/projlist/

  • A-03 (Optional)

    Outcomedf_article_raw.csv

    .txt

    .txt

    - A-03

    42

    http://search.appledaily.com.tw/charity/projlist/

  • Facebook Facebook Graph API

    http://graph.facebook.com/?fields=share&id=http://www.appledaily.com.tw/appledaily/article/headline/20170113/37517808

    43

  • ()

    ( / )

    ()

    44

  • A-01 Xpath

    A-02 ()

    A-03 ()

    45

  • aid ()

    case.closed ()

    date.published ()

    donation ()

    title ()

    url.article ()

    url.detail ()

    journalist ()

    n.image ()

    n.word ()

    donor ()

    date.funded ()

    df_article_raw df_article

    46

    A - 02

    A - 03

  • Next session starts at AM 11:00

    Stay tuned. We'll be back soon!!

    47

  • A-03

    48

  • A_ex03 ()

    - A-03

    df_article_raw.csv

    49

  • Session B

    Exploratory Data Analysis

  • Data Manipulation

    51

  • Outcome (df_article_raw.csv)

    .txt (data/db_article_txt/)

    .txt (data/db_detail_txt/)

    EDA- B-01

    52

  • df_article.csv

    aid

    case.closed

    date.published

    donation

    title

    url.article

    url.detail

    donor

    date.funded

    journalist

    n.image

    n.word

    df_article.csv

    df_article_raw.csv

    53

  • df_article.csv

    54

  • df_donation.csv

    In db_detail_txt.rar

    df_donation.csv

    55

  • B-01 (10 mins)

    crawl df_donation.csv

    EDA - B-01

    56

  • Character Encoding Problem (Mac)

    read.csv Sys.getlocale() locale

    system("defaults write org.R-project.R force.LANG en_US.UTF-8")

    system("defaults write org.R-project.R force.LANG zh_TW.UTF-8")

    read.csv parameters fileEncoding = "UTF-8"

    57
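A minimal sketch of the `fileEncoding` approach: write a small UTF-8 CSV and read it back while declaring its encoding. The file and its contents are made up for illustration, and the round trip is only expected to preserve the Chinese text when R runs in a UTF-8 locale.

```r
# Write a small UTF-8 CSV containing Chinese text (illustrative data).
df <- data.frame(aid = c("A0001", "A0002"),
                 title = c("\u8607\u679c", "\u65e5\u5831"),
                 stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE, fileEncoding = "UTF-8")

# Inspect the current locale, then read the file declaring its encoding.
Sys.getlocale()
d <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
```

If the titles come back garbled, comparing `Sys.getlocale()` with the file's encoding is usually the first thing to check.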

  • 58

  • EDA ?

    EDA (Exploratory Data Analysis)

    outliers

    59

  • 60

    1 2

    4

    3

  • 61

  • Summary Functions in R

    Function            Description
    names()             Functions to get or set the names of an object
    head(), tail()      Returns the first or last parts of a vector, matrix, table, data frame or function
    str()               Compactly display the internal structure of an R object
    summary()           Produce result summaries
    dim()               Retrieve or set the dimension of an object
    length()            Get or set the length of vectors
    complete.cases()    Return a logical vector indicating which cases are complete, i.e., have no missing values (NA)
    as.Date()           Convert between character representations and objects of class "Date" representing calendar dates

    Function name and parameter abbreviations: http://jeromyanglim.blogspot.tw/2010/05/abbreviations-of-r-commands-explained.html

    62
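Several of the functions in this table can be tried on a small made-up data frame. The column names follow df_article.csv, but the values are invented for illustration.

```r
# A small illustrative data frame (values are made up).
d <- data.frame(
  aid = c("A0001", "A0002", "A0003", "A0004"),
  donation = c(1200, 560, 980, 15000),
  n.word = c(350, 420, 510, 610),
  stringsAsFactors = FALSE
)

col_names <- names(d)      # column names
first_rows <- head(d, 2)   # first two rows
last_rows <- tail(d, 1)    # last row
str(d)                     # compact view of the structure
summary(d$donation)        # min, quartiles, mean, max
size <- dim(d)             # number of rows and columns
n <- length(d$aid)         # length of a single column
```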

  • Visualization Functions in R

    Function     Description
    plot()       Generic function for plotting of R objects
    boxplot()    Produce box-and-whisker plot(s) of the given (grouped) values
    hist()       Computes a histogram of the given data values
    barplot()    Creates a bar plot with vertical or horizontal bars
    arrows()     Draw arrows between pairs of points (x0, y0, x1, y1)
    abline()     Add a straight line y = a + bx; a, b: the intercept and slope, single values
    lines()      Join the corresponding points with line segments

    Function name and parameter abbreviations: http://jeromyanglim.blogspot.tw/2010/05/abbreviations-of-r-commands-explained.html

    63
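A sketch of a few of these plotting functions on simulated data. The donation amounts and group labels are invented, and `pdf(NULL)` is used only so the code runs without opening a window or writing a file.

```r
# Made-up donation amounts for two groups (illustrative only).
set.seed(1)
donation <- c(rexp(50, rate = 1 / 500), rexp(50, rate = 1 / 800))
group <- rep(c("open", "closed"), each = 50)

pdf(NULL)  # null device: nothing is written to disk

h <- hist(donation, breaks = 20)   # distribution of one numeric variable
b <- boxplot(donation ~ group)     # numeric variable split by group
bp <- barplot(table(group))        # counts per category
abline(h = 25, lty = 2)            # horizontal reference line on the bar plot

invisible(dev.off())
```

Each function also returns an object (bin counts, box statistics, bar midpoints), which is handy when the numbers behind a plot are needed.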

  • session_B_eda.R

    # load in apple daily article
    > d
    > dim(d)
    [1] 3779 12

    # check the column names
    > names(d)
    [1] "aid" "case.closed" "date.funded" "date.published"
    [5] "donation" "donor" "journalist" "n.image" "n.word"
    [10] "title" "url.article" "url.detail"

    EDA - B-02

    64

  • (character, integer, date )

    > typeof(d$date.published)

    > sapply(d, typeof)

    > d$date.published d$title
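A sketch of checking and converting column types. The exact conversions on the slide were lost in extraction, so using `as.Date()` and `as.character()` here is an assumption, and the data frame is made up.

```r
d <- data.frame(
  date.published = c("2016-01-05", "2016-02-10"),
  title = c("Case one", "Case two"),
  donation = c(1200L, 560L),
  stringsAsFactors = TRUE  # mimic older read.csv defaults: text becomes factor
)

typeof(d$date.published)           # "integer": factors are stored as codes
types_before <- sapply(d, typeof)  # storage type of every column

# Convert the columns to more useful types.
d$date.published <- as.Date(as.character(d$date.published))
d$title <- as.character(d$title)
types_after <- sapply(d, typeof)
```

Note that `typeof()` reports the storage type, so a Date column shows up as "double"; `class()` would report "Date".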

  • # use str() to have a brief data summary
    > str(d)

    str()

    EDA - B-02

    66

  • summary() NA

    EDA - B-02

    67

  • (NA)

    (NA)

    1. which() + is.na()  2. !complete.cases()  3. summary()

    1. 2. NA

    1. 2. na.omit()

    1. 2. 3.

    68
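The three ways of locating NA values listed on this slide, plus `na.omit()`, can be sketched on a small made-up data frame. The column names are borrowed from df_article.csv and the values are invented.

```r
d <- data.frame(
  n.word = c(350, NA, 420, 510),
  n.image = c(2, 1, NA, 3)
)

# 1. which() + is.na(): positions of missing values in one column.
na_rows_word <- which(is.na(d$n.word))

# 2. !complete.cases(): rows with at least one missing value.
incomplete <- d[!complete.cases(d), ]

# 3. summary(): reports the NA count per column.
summary(d)

# One way to handle them: drop every incomplete row.
d_clean <- na.omit(d)
```

Dropping rows is the simplest option but discards data, so it is worth checking how many rows `incomplete` holds first.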

  • B-02 (Later)

    NA n.word NA

    n.image NA

    EDA - B-02

    69

  • 70

  • - -

    vs.

    / vs. / /

    Box plot

    Scatter plot

    Bar plot

    Line chart

    Density plot

    71

  • EDA

    (Box-plot)

    (Histogram)

    (Scatter-plot) (Line-chart)

    72

  • ?

    73

  • # use hist() to check donation distribution
    > hist(d$donation, breaks = 100)

    hist()

    EDA - B-02

    74

  • EDA - B-02

    Question:

    http://www.appledaily.com.tw/appledaily/article/headline/20160315/37108832

  • ?

    76

  • # use plot to check the relationship between number of donors and total donation (and draw a straight line)
    > plot(d$donor,
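The rest of this command is cut off in the source. As a sketch of the general idea only, a scatter plot with a fitted straight line might be drawn as follows. The data are simulated, and using `lm()` with `abline()` is an assumption about how the line was drawn.

```r
# Made-up data: number of donors and total donation per article.
set.seed(42)
donor <- round(runif(100, min = 10, max = 500))
donation <- 800 * donor + rnorm(100, sd = 20000)

pdf(NULL)  # null device, so the sketch runs without opening a window

plot(donor, donation,
     xlab = "Number of donors", ylab = "Total donation")
fit <- lm(donation ~ donor)  # least-squares fit of donation on donor
abline(fit, col = "red")     # draw the fitted line on the scatter plot

invisible(dev.off())
```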
