    Seminar 6: PCA and Factor Analysis January 19 2011

    Marko Pahor

    PRINCIPAL COMPONENT ANALYSIS

    The main idea of this method is to form, from a set of existing variables, a new variable

    (or new variables, but as few as possible) that contain as much variability of the original

    data as possible. This is a method of data reduction; we reduce the number of variables

    in order to handle data more easily.

    In most cases we wish to get only one dimension (variable) that contains most of the

    variability of the original data. This variable then represents some sort of index of a

    certain property that is measured by the original variables. For example:

    - we are measuring the development of a region. We measure the differences with

    several variables (e.g. GDP/pc, infant mortality,...). With the help of principal

    component analysis we can construct an index of development.

    - a controller in a factory has several indicators of quality - with principal

    components analysis we can construct a quality index

    PRINCIPAL COMPONENT ANALYSIS WITH SPSS PROCEDURE FACTOR ANALYSIS

    SPSS can perform principal component analysis, but the procedure for doing so is hidden within the procedure for factor analysis. The procedure can perform the analysis on standardized or on original (non-standardized) data. With this procedure we can:

    - compute descriptive statistics for all variables

    - make the correlation matrix

    - compute communalities

    - compute the share of variance of original data, explained by each and all components

    - plot the scree-plot

    COMPUTATION OF THE PARAMETERS OF PRINCIPAL COMPONENTS ANALYSIS

    1. Enter or load the data

    2. Select Analyze | Dimension Reduction | Factor; we get the menu Factor Analysis

    (Figure 1)

    Figure 1: Dialog window Factor Analysis

    3. In the left box we select the variables that we want to enter into the principal

    components analysis and transfer them into the right box.

    4. Click Extraction...; we get the menu Factor Analysis: Extraction (Figure 2). The option for performing principal components analysis is Principal Components in the field Method. The other options in this field are for factor analysis.

    5. We click OK, the window Factor Analysis closes and the results of the analysis appear in the Viewer window.

    Figure 2: Dialog window Factor Analysis: Extraction

    In the box Analyze we can set whether the analysis will be performed on original (non-standardized) data (Covariance matrix) or on standardized data (Correlation matrix).

    When choosing the analysis on original data, the importance of a variable is determined by the relative size of its variance: higher variance means higher importance of that variable. If we do not want the variability of a variable to determine its importance, we standardize the data, that is, we use the correlation matrix.

    The decision which one to use depends on the nature of the problem. If we think the variables are more or less equally important, we choose standardization; if the variability of the variables matters, we use the covariance matrix in the analysis. When variables are measured in very different units (e.g. infant mortality in % against GDP/pc in $), standardization is usually the only sensible choice.
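The effect of the measurement units can be made concrete with a minimal Python sketch. The two variables and all their values are hypothetical (they are not seminar data); the sketch only assumes that the covariance matrix is used for original data and the correlation matrix for standardized data, as described above.

```python
from statistics import mean, pstdev

# Hypothetical data: infant mortality (per 1000) and GDP per capita (in $).
infant_mortality = [4.0, 6.0, 9.0, 12.0, 20.0]
gdp_pc = [42000.0, 35000.0, 21000.0, 15000.0, 6000.0]

def covariance(x, y):
    """Population covariance of two equally long lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def standardize(x):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(x), pstdev(x)
    return [(a - m) / s for a in x]

# Original data: GDP/pc carries almost all of the total variance.
var_mortality = covariance(infant_mortality, infant_mortality)
var_gdp = covariance(gdp_pc, gdp_pc)
share_gdp = var_gdp / (var_gdp + var_mortality)

# Standardized data: both variances equal 1 and the covariance is the correlation.
z_mortality = standardize(infant_mortality)
z_gdp = standardize(gdp_pc)
var_z_gdp = covariance(z_gdp, z_gdp)
correlation = covariance(z_mortality, z_gdp)

print(f"share of total variance due to GDP/pc: {share_gdp:.6f}")
print(f"variance of standardized GDP/pc: {var_z_gdp:.3f}")
print(f"correlation: {correlation:.3f}")
```

With original data the first component would be driven almost entirely by GDP/pc, simply because of its unit; after standardization both variables enter with equal weight.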

    Field Display offers the possibility of printing the unrotated solution (the only one in principal component analysis). The solution can contain only some components; the number of components is set by the rules in the field Extract.

    Field Display also sets the display of the scree plot. The scree plot is useful in determining the number of components needed.

    In field Extract we set how many components we want to be displayed. We can set the number of components we want or set the cut-off eigenvalue. The default value is 1 in the case of standardized data or the average eigenvalue in the case of original data.

    DESCRIPTIVE STATISTICS AND CORRELATION MATRICES

    Click Descriptives, which opens the dialog window Factor Analysis: Descriptives (Figure 3). In this dialog we set:

    - in field Statistics the display of descriptive statistics and the initial solution (all components)

    Figure 3: Dialog window Factor Analysis: Descriptives

    - in field Correlation Matrix we set the display of the correlation matrix, significances, ... The KMO (Kaiser-Meyer-Olkin) measure of sampling adequacy shows the strength of the connection between variables; it lies between 0 and 1, and values closer to 1 are more desirable. Bartlett's test of sphericity tests the hypothesis that the correlation matrix is an identity matrix (the variables are not correlated). If this hypothesis cannot be rejected, principal component analysis cannot sensibly be performed.
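As an illustration of what Bartlett's test computes, here is a minimal Python sketch. It assumes the usual textbook form of the statistic, chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, and uses the rounded correlation matrix and the number of cases (504) from the example in this handout. The helper function name is only illustrative.

```python
import math

# Correlation matrix from the example in this handout (rounded to 3 decimals).
R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]
n = 504  # number of cases (Analysis N in the example)

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

p = len(R)
# The determinant is 1 for an identity matrix and approaches 0 with strong correlations.
chi_square = -(n - 1 - (2 * p + 5) / 6) * math.log(determinant(R))
df = p * (p - 1) // 2

print(f"chi-square = {chi_square:.1f}, df = {df}")
```

The result should land close to the chi-square of 1436.940 with 6 degrees of freedom reported in the example output; small deviations come from the rounding of the correlations.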

    EXAMPLE

    FACTOR
      /VARIABLES total_liters value_sum transactions share_olive_oil
      /MISSING LISTWISE
      /ANALYSIS total_liters value_sum transactions share_olive_oil
      /PRINT UNIVARIATE INITIAL CORRELATION KMO EXTRACTION
      /PLOT EIGEN
      /CRITERIA MINEIGEN(1) ITERATE(25)
      /EXTRACTION PC
      /ROTATION NOROTATE
      /METHOD=CORRELATION.

    Factor Analysis

    Descriptive Statistics

                        Mean       Std. Deviation   Analysis N
    total_liters        1.5709     1.49828          504
    value_sum           10.1272    9.69014          504
    transactions        1.90       1.597            504
    share_olive_oil     8.6048     11.73409         504

    Correlation Matrix

                        total_liters   value_sum   transactions   share_olive_oil
    total_liters        1.000          .824        .842           .249
    value_sum           .824           1.000       .867           .299
    transactions        .842           .867        1.000          .210
    share_olive_oil     .249           .299        .210           1.000

    KMO and Bartlett's Test

    Kaiser-Meyer-Olkin Measure of Sampling Adequacy             .767
    Bartlett's Test of Sphericity    Approx. Chi-Square     1436.940
                                     df                            6
                                     Sig.                       .000

    Communalities

                        Initial   Extraction
    total_liters        1.000     .863
    value_sum           1.000     .894
    transactions        1.000     .881
    share_olive_oil     1.000     .157

    Extraction Method: Principal Component Analysis.

    Total Variance Explained

                 Initial Eigenvalues                      Extraction Sums of Squared Loadings
    Component    Total   % of Variance   Cumulative %     Total   % of Variance   Cumulative %
    1            2.796   69.898          69.898           2.796   69.898          69.898
    2            .898    22.461          92.359
    3            .180    4.511           96.870
    4            .125    3.130           100.000

    Extraction Method: Principal Component Analysis.
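The percentages in this table and the effect of the cut-off eigenvalue can be retraced with a short Python sketch. It only assumes that, with standardized data, the total variance equals the number of variables; the eigenvalues are copied from the table (rounded to 3 decimals, so the last digits of the percentages may differ slightly).

```python
# Eigenvalues from the Total Variance Explained table (rounded to 3 decimals).
eigenvalues = [2.796, 0.898, 0.180, 0.125]
p = len(eigenvalues)  # total variance of standardized data = number of variables

percentages = [100 * value / p for value in eigenvalues]

cumulative = 0.0
for number, percent in enumerate(percentages, start=1):
    cumulative += percent
    print(f"component {number}: {percent:6.2f} % of variance, cumulative {cumulative:6.2f} %")

# Cut-off eigenvalue of 1, as set by MINEIGEN(1) in the syntax of the example.
retained = [value for value in eigenvalues if value > 1]
print("components retained:", len(retained))
```

Only the first eigenvalue exceeds 1, which is why a single component is extracted in the example.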

    Component Matrix (a)

                        Component 1
    total_liters        .929
    value_sum           .946
    transactions        .939
    share_olive_oil     .396

    Extraction Method: Principal Component Analysis.
    a. 1 components extracted.
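The loadings in the component matrix can be retraced from the correlation matrix of the example. The following Python sketch is one possible way to do so, not the routine SPSS uses: it finds the largest eigenvalue and its eigenvector by power iteration (a technique chosen here only because it needs no extra libraries), scales the eigenvector by the square root of the eigenvalue to obtain loadings, and squares the loadings to obtain the communalities of the one-component solution.

```python
import math

# Correlation matrix from the example output (rounded to 3 decimals).
R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]

def multiply(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def first_component(matrix, iterations=200):
    """Largest eigenvalue and unit-length eigenvector by power iteration."""
    vector = [1.0] * len(matrix)
    eigenvalue = 0.0
    for _ in range(iterations):
        product = multiply(matrix, vector)
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    return eigenvalue, vector

eigenvalue, weights = first_component(R)

# Loadings are correlations between the variables and the component.
loadings = [math.sqrt(eigenvalue) * w for w in weights]
# With one component, the communality of a variable is its squared loading.
communalities = [l * l for l in loadings]

print(f"first eigenvalue: {eigenvalue:.3f}")
print("loadings:", [round(l, 3) for l in loadings])
print("communalities:", [round(h2, 3) for h2 in communalities])
```

Up to rounding, the values agree with the first eigenvalue (2.796), the component matrix and the Extraction column of the communalities table. The low loading of share_olive_oil shows that the index mostly reflects the three quantity variables.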

    FACTOR ANALYSIS

    With principal component analysis we tried to explain as much variance of the original

    data as possible by forming new, synthetic variables. In factor analysis we try to find

    some dimensions, traits, that can not be measured directly, but affect certain variables

    that can be measured.

    For example, measuring intelligence. We can not measure intelligence, but we can

    measure certain capabilities of an individual (mathematical, logical...) that are affected by

    intelligence.

    FACTOR ANALYSIS WITH SPSS: DIFFERENCES FROM PRINCIPAL COMPONENTS ANALYSIS

    Although the logic of both is different, both principal components and factor analysis are

    supported in the same SPSS function. In factor analysis the following methods of

    extraction are used:

    1. Principal factors
    - this method differs from principal components only in logic and explanation. The initial solution is always based on this method
    - the method creates factors that are uncorrelated (among themselves) linear combinations of the initial variables

    2. Principal axes
    - the method creates factors from the modified correlation matrix, which has diagonal values less than 1. This is an iterative method; in the first step the diagonal values are the communalities of the initial (principal factors) solution. In the following steps, the communalities from the previous step are used until the solution converges

    3. alpha factoring
    - the method treats the variables in the analysis as a sample from a universe of potential variables and maximizes the alpha reliability of the factors

    4. image factoring

    - this is actually the first step of principal axes method; modified correlation matrix

    with multiple determination coefficients on the diagonal is used.

    5. ordinary least squares
    - minimizes the differences between the actual and the estimated correlation matrix, not taking account of the diagonal values

    6. generalized least squares
    - minimizes the differences between the actual and the estimated correlation matrix, not taking account of the diagonal values; variables are weighted by the inverse value of their uniqueness

    The most commonly used method is principal axes. Principal factors is less appropriate because it does not take account of the existence of specific factors that influence the variables; their existence is shown by communalities less than 1. It is only used when the other methods do not converge.
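The iteration behind the principal axes method can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SPSS implementation: it extracts a single factor, reuses the rounded correlation matrix of the first example, starts from the communalities of the initial solution as described in the list above (actual software may use other starting values), and finds the factor by power iteration. The function names, the convergence tolerance and the iteration limits are choices made for this sketch.

```python
import math

# Correlation matrix from the first example in this handout (3 decimals).
R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]

def multiply(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def one_factor_loadings(matrix, iterations=300):
    """Loadings on the first factor: dominant eigenvector times sqrt(eigenvalue)."""
    vector = [1.0] * len(matrix)
    eigenvalue = 0.0
    for _ in range(iterations):
        product = multiply(matrix, vector)
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    return [math.sqrt(eigenvalue) * w for w in vector]

# First step: communalities of the initial (principal components) solution.
communalities = [l * l for l in one_factor_loadings(R)]

# Following steps: put the current communalities on the diagonal and re-extract.
for step in range(1, 501):
    reduced = [row[:] for row in R]
    for i, h2 in enumerate(communalities):
        reduced[i][i] = h2
    updated = [l * l for l in one_factor_loadings(reduced)]
    change = max(abs(new - old) for new, old in zip(updated, communalities))
    communalities = updated
    if change < 1e-6:
        break

print("steps used:", step)
print("communalities:", [round(h2, 3) for h2 in communalities])
```

The final communalities stay below 1, which reflects the specific (unique) part of each variable that the principal factors solution ignores.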

    Rotation is used to improve the solution, that is, to get a clearer picture. We distinguish orthogonal and oblique (non-orthogonal) rotations.

    Rotations in SPSS:

    1. Varimax
    - orthogonal rotation that minimizes the number of variables that have high loadings on each factor; it simplifies the interpretation of factors

    2. Quartimax
    - orthogonal rotation that minimizes the number of factors needed to explain each variable; it simplifies the interpretation of the observed variables

    3. Equamax
    - orthogonal rotation, a combination of varimax and quartimax

    4. Oblimin
    - oblique rotation; non-orthogonal rotations are used when orthogonal rotations do not give an interpretable solution. Delta determines the obliqueness, 0 meaning the most oblique rotation

    5. Promax
    - oblique rotation
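What an orthogonal rotation does can be shown with a small Python sketch for two factors. The loadings are hypothetical, the varimax criterion is used in its raw form (without the Kaiser normalization that software typically applies), and the best angle is found by a crude grid search rather than by the iterative algorithm used in practice.

```python
import math

# Hypothetical unrotated loadings of four variables on two factors.
loadings = [
    [0.70, 0.50],
    [0.65, 0.55],
    [0.60, -0.50],
    [0.55, -0.60],
]

def rotate(matrix, angle):
    """Orthogonal rotation of a two-column loading matrix by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[a * c + b * s, -a * s + b * c] for a, b in matrix]

def varimax_criterion(matrix):
    """Sum over factors of the variance of the squared loadings (raw varimax)."""
    total = 0.0
    for column in zip(*matrix):
        squares = [x * x for x in column]
        m = sum(squares) / len(squares)
        total += sum((q - m) ** 2 for q in squares) / len(squares)
    return total

# Grid search over rotation angles from 0 to 90 degrees in steps of 0.1 degree.
angles = [math.radians(step / 10) for step in range(0, 901)]
best_angle = max(angles, key=lambda a: varimax_criterion(rotate(loadings, a)))
rotated = rotate(loadings, best_angle)

communalities_before = [a * a + b * b for a, b in loadings]
communalities_after = [a * a + b * b for a, b in rotated]

print(f"rotation angle: {math.degrees(best_angle):.1f} degrees")
for row in rotated:
    print([round(x, 3) for x in row])
```

The rotation leaves the communalities unchanged; it only redistributes the loadings so that each variable loads highly on one factor and weakly on the other, which is what makes the factors easier to interpret.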

    Difference between pattern and structure loadings

    - structure loadings are correlation coefficients between a variable and a factor
    - pattern loadings are regression coefficients of a variable on the factors
    - for uncorrelated factors, the sum over factors of the products of the loadings of two variables gives the (reproduced) correlation between these two variables
    - structure loadings are the ones commonly interpreted
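The relation between the two kinds of loadings can be written out in a short Python sketch. It assumes the standard relations structure = pattern x factor correlation matrix and reproduced correlations = pattern x factor correlation matrix x pattern transposed; all numbers are hypothetical.

```python
# Hypothetical pattern loadings (regression weights) of three variables on two factors.
pattern = [
    [0.80, 0.10],
    [0.70, 0.00],
    [0.10, 0.75],
]
# Hypothetical factor correlation matrix of an oblique solution.
phi = [
    [1.00, 0.30],
    [0.30, 1.00],
]
identity = [
    [1.00, 0.00],
    [0.00, 1.00],
]

def matmul(a, b):
    """Matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]

# Structure loadings: correlations between variables and factors.
structure = matmul(pattern, phi)

# Correlations between variables as reproduced by the factors.
reproduced = matmul(structure, transpose(pattern))

# With uncorrelated factors the two kinds of loadings coincide.
structure_orthogonal = matmul(pattern, identity)

print("structure loadings:", [[round(x, 3) for x in row] for row in structure])
print("reproduced r(1,2):", round(reproduced[0][1], 3))
```

The higher the correlations between factors, the more the structure loadings differ from the pattern loadings; for an orthogonal rotation the two are identical.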

    EXAMPLE

    Factor Analysis

    This example is done on the personality questions in the database.

    We do the factor analysis following the same steps as with principal component analysis.

    FACTOR
      /VARIABLES Q17.1 Q17.2 Q17.3 Q17.4 Q17.5 Q17.6 Q17.7 Q17.8 Q17.9 Q17.10
        Q17.11 Q17.12 Q17.13 Q17.14 Q17.15 Q17.16 Q17.17 Q17.18 Q17.19 Q17.20
      /MISSING LISTWISE
      /ANALYSIS Q17.1 Q17.2 Q17.3 Q17.4 Q17.5 Q17.6 Q17.7 Q17.8 Q17.9 Q17.10
        Q17.11 Q17.12 Q17.13 Q17.14 Q17.15 Q17.16 Q17.17 Q17.18 Q17.19 Q17.20
      /PRINT UNIVARIATE INITIAL CORRELATION KMO EXTRACTION ROTATION
      /PLOT EIGEN
      /CRITERIA MINEIGEN(1) ITERATE(25)
      /EXTRACTION PAF
      /CRITERIA ITERATE(25)
      /ROTATION VARIMAX
      /METHOD=CORRELATION.

    Correlation matrix

    ADEQUACY OF DATA

    From the correlation matrix we could see that most correlations are not high, but some

    are and many more are statistically significant.

    Bartlett's test is significant, so the hypothesis that the variables are uncorrelated is rejected, and the KMO measure of 0.738 shows that the data are appropriate for this type of analysis.

    STANDARDIZED OR ORIGINAL DATA?

    As all questions are measured on the same scale, one could use the covariance matrix (non-standardized data) for the analysis. However, the use of standardized data is still correct. Because of its simpler output and because it is much more common in practice, the correlation matrix is used in the example.

    NUMBER OF FACTORS

    Based on the scree plot one would use four factors, although the Kaiser rule (eigenvalues greater than 1) suggests five factors.

    INTERPRETATION OF FACTORS

    Factors are interpreted based on structure loadings. We can interpret the non-rotated solution or

    use one of the rotations.

    In the example, we used varimax rotation. We have four factors that can be interpreted as follows:

    - optimism and self-esteem

    - sociability

    - desperation and indecisiveness

    - artism

    When orthogonal rotation does not give a sensible interpretation, we use oblique rotation.

    In our case there are not many differences between the orthogonal and the oblique rotation. The factor correlation matrix shows the obliqueness: the higher the correlations, the more oblique the rotation.