Exploratory Factor Analysis

（探索的因子分析）

Yasuyo Sawaki

Waseda University

JLTA2011 Workshop

Momoyama Gakuin University

October 28, 2011

Today’s schedule

Part 1: EFA basics

• Introduction to factor analysis

• Logic behind EFA

• Key steps of EFA

• EFA vs. CFA

(BREAK)

Part 2: EFA Exercise

What is factor analysis?

• A multivariate statistical technique developed in the early 1900s

• A method to examine relationships between observed variables （観測変数）and latent factors（潜在因子）

Types of factor analysis

• Exploratory factor analysis (EFA; 探索的因子分析)

– Data-driven approach

– Often used in early stages of an investigation

• Confirmatory factor analysis (CFA; 検証的/確認的因子分析)

– Theory-driven approach

– Often used in later stages of an investigation to confirm specific hypotheses

• Combining EFA and CFA

Previous applications of EFA in applied linguistics (1)

• Development and validation of survey and assessment instruments

Example 1: Bachman, Davidson, Ryan, & Choi (1995)

• Examining comparability of the factor structures of two English language tests (Cambridge FCE and TOEFL)

• EFAs for each test separately, followed by another EFA run for a combined analysis of the two tests

• Development and validation of survey and assessment instruments

Example 2: Vandergrift, Goh, Mareschal, & Tafaghodtari (2006)

• Developing and validating a new survey on L2 learners’ metacognitive awareness and strategy use in listening comprehension

• An initial EFA to finalize survey content, followed by a CFA on a different sample

• Descriptive use of EFA

Example: Biber, Conrad, Reppen, Byrd, & Helt (2002)

• An EFA of an academic English language corpus to identify linguistic characteristics of various spoken and written registers

Logic behind factor analysis

• Both EFA and CFA are based on the common factor model（共通因子モデル）

Underlying logic: Variables correlate because they tap into the same construct(s) to certain degrees

Goal: To identify an optimal number of latent factors (factor solution) that describes the pattern of relationships among a set of variables sufficiently well (reproduction of the observed correlation matrix)

The common factor model

• Decomposing variance of observed variables into two parts:

– Common variance: part of variance influenced by a latent factor shared across different variables (common factors; 共通因子)

– Unique variance: part of variance not explained by the common factors

• Variance explained by factors other than the common factors (unique factors; 独自因子）

• Variance due to measurement error
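
The decomposition above can be illustrated with a small numeric sketch. The loadings are hypothetical values chosen for illustration (they are not results from the workshop data), and the example assumes standardized variables and a single common factor, so that the common variance of each variable is simply its squared loading:

```python
import numpy as np

# Hypothetical standardized loadings of four observed variables on one common factor.
loadings = np.array([0.8, 0.7, 0.6, 0.5])

# With a single factor, the common variance of a variable is its squared loading.
common_variance = loadings ** 2

# Unique variance is the remainder: specific variance plus measurement error.
unique_variance = 1.0 - common_variance

# Correlation matrix implied by the model: products of loadings off the diagonal,
# and ones on the diagonal once the unique variances are added back.
implied_corr = np.outer(loadings, loadings) + np.diag(unique_variance)

for name, h2, u2 in zip(["V1", "V2", "V3", "V4"], common_variance, unique_variance):
    print(f"{name}: common = {h2:.2f}, unique = {u2:.2f}")
print(np.round(implied_corr, 2))
```

Under these assumptions, two variables correlate only through the factor they share: the implied correlation between V1 and V2 is the product of their loadings.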

EFA: Issues

• Readily available in statistical packages

• Easy to implement, but careful consideration of various issues in different steps of the analysis is required to obtain interpretable and meaningful analysis results

Key steps of EFA

Step 1: Checking appropriateness of study design and data type for conducting EFA

Step 2: Deciding on the number of factors to extract

Step 3: Extracting and rotating factors

Step 4: Interpreting key EFA results

A sample scenario

• An EFL speaking and listening test

• Sample size: N=200

• Variables 1-6 for speaking, and Variables 7-12 for listening (4-6 score points available for each variable)

• Expected findings

– There are two latent factors: One each for speaking and listening

– The two factors are correlated with each other because they are both different aspects of L2 language ability

• Software: SPSS (PASW Statistics Version 18)

Step 1: Checking appropriateness of study design and data type for EFA (1)

Data type: determines the type of correlation matrix to be analyzed

Common examples

• Interval or quasi-interval scales → Pearson correlation matrix

• Ordered categorical data

– Dichotomous data → tetrachoric correlation matrix

– Polytomous data → polychoric correlation matrix
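
One reason the data type matters can be sketched with simulated data. The example below is only an illustration with made-up parameters (sample size, weights, and a median split are all assumptions): it computes a Pearson correlation for two continuous scores and again after each score has been dichotomized. Tetrachoric and polychoric correlations themselves require specialized routines and are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # large simulated sample so that the pattern is stable

# Two continuous scores that share one common influence (hypothetical weights).
common = rng.normal(size=n)
x = 0.7 * common + 0.5 * rng.normal(size=n)
y = 0.7 * common + 0.5 * rng.normal(size=n)

# Pearson correlation of the continuous scores.
r_continuous = np.corrcoef(x, y)[0, 1]

# Dichotomize each score at its median (e.g., correct/incorrect) and
# compute the Pearson correlation of the resulting 0/1 variables.
x_bin = (x > np.median(x)).astype(float)
y_bin = (y > np.median(y)).astype(float)
r_dichotomized = np.corrcoef(x_bin, y_bin)[0, 1]

print(f"continuous: {r_continuous:.2f}, dichotomized: {r_dichotomized:.2f}")
```

In this simulation the correlation based on the dichotomized scores comes out lower than the one based on the continuous scores, which is the kind of attenuation that tetrachoric and polychoric correlations are designed to address.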

Number of variables: At least 3 variables needed per factor

Example 1: 3 variables to identify one factor (just-identified model)

[Diagram: one Factor with Variables 1-3]

Number of data points available in a correlation matrix with k=3: k(k+1)/2 = 3(3+1)/2 = 6

Example 2: 4 variables to identify one factor (over-identified model)

[Diagram: one Factor with Variables 1-4]

Number of data points available in a correlation matrix with k=4: k(k+1)/2 = 4(4+1)/2 = 10

Example 3: 2 variables to identify one factor (under-identified model)

[Diagram: one Factor with Variables 1-2]

Number of data points available in a correlation matrix with k=2: k(k+1)/2 = 2(2+1)/2 = 3
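
The counting in these examples can be sketched as a small function. The number of data points follows the k(k+1)/2 formula on the slides; the number of free parameters (k loadings plus k unique variances, with the factor variance fixed) is an assumption added here for illustration, following the usual one-factor setup. The function name is hypothetical.

```python
def one_factor_identification(k):
    """Classify a one-factor model with k observed variables."""
    data_points = k * (k + 1) // 2       # unique elements of the k x k matrix
    free_parameters = 2 * k              # assumed: k loadings + k unique variances
    df = data_points - free_parameters   # degrees of freedom
    if df < 0:
        status = "under-identified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "over-identified"
    return data_points, df, status


for k in (2, 3, 4):
    print(k, one_factor_identification(k))
```

Under this counting, three variables leave zero degrees of freedom, which is why at least 3 variables per factor are needed.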

Sample size: Suggestions about sample size in the literature vary

A required sample size depends on many factors such as:

• Strength of correlations between factors and their indicator variables

• Reliability

• Score distribution of variables (e.g., normality)

• Number of factors to extract

See: Fabrigar et al. (1999), Floyd & Widaman (1995), Tabachnick & Fidell (2007)

Step 2: Deciding on the number of factors to extract (1)

Various approaches to determining the number of factors

• Approaches based on eigenvalues for the correlation matrix

Eigenvalue（固有値）: Shows how much of the variance of a set of variables can be explained by a factor

• Goodness of model fit（モデルの適合度） : goodness-of-fit statistics available for certain estimation methods (e.g., ML)

• Approaches based on eigenvalues for the correlation matrix

Goal: To identify the number of factors with large enough eigenvalues to explain relationships among observed variables

• Kaiser’s criterion (Kaiser, 1960)

• Scree test (Cattell, 1966)

• Parallel analysis (PA; Horn, 1965)

• Kaiser’s criterion (Kaiser, 1960)

– The number of factors to extract = the number of factors with eigenvalues exceeding 1.0
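
As a minimal sketch of Kaiser's criterion, the example below builds a correlation matrix from hypothetical values (six variables, two correlated factors; none of these numbers come from the workshop data), computes its eigenvalues, and counts those exceeding 1.0:

```python
import numpy as np

# Hypothetical pattern: Variables 1-3 load on Factor 1, Variables 4-6 on Factor 2.
loadings = np.array([
    [0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6],
])
factor_corr = np.array([[1.0, 0.5], [0.5, 1.0]])  # hypothetical inter-factor correlation

# Model-implied correlations; the diagonal is set to 1.0 so that this is an
# observed-style (unreduced) correlation matrix.
corr = loadings @ factor_corr @ loadings.T
np.fill_diagonal(corr, 1.0)

# Eigenvalues of the correlation matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser's criterion: retain as many factors as there are eigenvalues above 1.0.
n_factors_kaiser = int((eigenvalues > 1.0).sum())

print(np.round(eigenvalues, 2))
print("Factors retained by Kaiser's criterion:", n_factors_kaiser)
```

The eigenvalues sum to the number of variables, so an eigenvalue above 1.0 means that a factor accounts for more variance than a single standardized variable does.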

• Scree plot (Cattell, 1966)

– The number of factors to extract = the “elbow” in the graph

• Parallel analysis (PA; Horn, 1965)

– Comparison of eigenvalues obtained from real data against those obtained from multiple samples of random numbers (see Hayton et al., 2004; Liu & Rijmen, 2008 for sample SPSS and SAS programs)

Steps:

1. Obtain a scree plot for the actual data being analyzed

2. Run a program for PA, which calculates eigenvalues for multiple samples of random numbers; then take the mean of eigenvalues for each component across the PA runs

3. Plot the mean eigenvalues for individual components from the PA over the scree plot for comparing the eigenvalues from the actual data and those from the PA runs
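
The steps above can be sketched in Python as follows. This is an illustrative sketch, not the SPSS or SAS programs cited above: the function name, the number of PA runs, the use of normally distributed random numbers, and the simulated two-factor data set are all assumptions. Instead of plotting, the sketch compares the two sets of eigenvalues numerically.

```python
import numpy as np


def parallel_analysis(data, n_runs=100, seed=0):
    """Compare eigenvalues of real data with mean eigenvalues of random data."""
    rng = np.random.default_rng(seed)
    n, k = data.shape

    # Step 1: eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Step 2: eigenvalues for random data sets of the same size, then the mean
    # eigenvalue for each component across the PA runs.
    random_eigs = np.empty((n_runs, k))
    for i in range(n_runs):
        noise = rng.normal(size=(n, k))
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    mean_random = random_eigs.mean(axis=0)

    # Step 3: count the leading components whose observed eigenvalue
    # exceeds the corresponding mean random eigenvalue.
    exceeds = observed > mean_random
    n_retain = k if exceeds.all() else int(np.argmin(exceeds))
    return observed, mean_random, n_retain


# Simulated example: N = 200, 12 variables, two correlated factors (hypothetical values).
rng = np.random.default_rng(1)
n = 200
factors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
loadings = np.zeros((12, 2))
loadings[:6, 0] = 0.7
loadings[6:, 1] = 0.7
unique_sd = np.sqrt(1.0 - 0.7 ** 2)
data = factors @ loadings.T + unique_sd * rng.normal(size=(n, 12))

observed, mean_random, n_retain = parallel_analysis(data)
print(np.round(observed[:4], 2))
print(np.round(mean_random[:4], 2))
print("Factors suggested by PA:", n_retain)
```

Some PA programs compare against a high percentile of the random eigenvalues rather than the mean; the mean is used here because it is what the steps above describe.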

• Goodness-of-fit statistics

– Available only for certain methods of model parameter estimation

Example: maximum likelihood (ML; 最尤法)

• Issues of consideration

– Some widely used methods are easy to implement (e.g., Kaiser’s criterion, scree test)

– However, different methods have different disadvantages

• Kaiser’s criterion: Often found to produce misleading results

• Scree plots: Can be difficult to decide where the “elbow” is

• ML estimation: Data have to satisfy distributional assumptions (deviation from normality → misleading analysis results)

• Issues of consideration (continued)

– What matrix should be analyzed when determining the number of factors? (Brown, 2006; Fabrigar et al., 1999)

• Kaiser’s criterion: observed correlation matrix required

• Scree test: possible both on observed and reduced correlation matrices

• What should we do in practice?

– Use multiple methods

– In later steps of the analysis, carefully examine substantive interpretability and parsimony of a given factor solution

Step 3: Extracting and rotating factors (1)

Choice of a factor extraction method depends on multiple factors such as data type, distribution, and information you need

Common methods used with continuous variables:

Method                      Distributional assumption   Goodness-of-fit statistics
Principal factor analysis   No assumption imposed       Not available
Maximum likelihood (ML)     Normality assumed           Available
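
To show what principal factor analysis does computationally, here is a simplified sketch of iterated principal axis factoring. It is not the routine implemented in SPSS; the function name, starting values (squared multiple correlations), convergence settings, and the hypothetical one-factor correlation matrix are all assumptions made for illustration.

```python
import numpy as np


def principal_axis_factoring(corr, n_factors, max_iter=1000, tol=1e-10):
    """Iterated principal axis factoring on a correlation matrix (simplified sketch)."""
    corr = np.asarray(corr, dtype=float)

    # Initial communality estimates: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))

    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the ones on the diagonal.
        reduced = corr.copy()
        np.fill_diagonal(reduced, h2)

        # Loadings from the largest eigenvalues and their eigenvectors.
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

        # Update communalities and stop once they no longer change.
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break

    # Eigenvector signs are arbitrary; make each column sum positive.
    signs = np.sign(loadings.sum(axis=0))
    signs[signs == 0] = 1.0
    return loadings * signs, h2


# Hypothetical correlation matrix generated by one factor with known loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
corr = np.outer(true_loadings, true_loadings)
np.fill_diagonal(corr, 1.0)

loadings, communalities = principal_axis_factoring(corr, n_factors=1)
print(np.round(loadings[:, 0], 3))
print(np.round(communalities, 3))
```

Because the method works on the reduced correlation matrix (communalities on the diagonal), it analyzes common variance only, which is what distinguishes it from a principal component analysis of the unreduced matrix.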

• Principal component analysis (PCA; 主成分分析)

– A mathematical data reduction method; NOT based on the common factor model

– NOT consistent with the purpose of examining underlying factor structure

– More appropriate when, for example, the goal is to create a composite variable out of a larger number of observed variables

Factor rotation

• One factor solution: Results from the initial factor solution can be interpreted

• A solution with multiple factors: Results from the initial factor solution are difficult to interpret, so factor rotation (a mathematical transformation) is often conducted

– Orthogonal rotation（直交回転）: Correlations among factors NOT allowed (e.g., Varimax rotation)

– Oblique rotation（斜交回転）: Correlations among factors allowed (e.g., Oblimin rotation; Promax rotation)
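
To make the idea of rotation as a mathematical transformation concrete, the sketch below implements a basic Varimax (orthogonal) rotation with a widely used SVD-based iteration. It is a simplified illustration: it omits Kaiser normalization, the unrotated loadings are hypothetical, and oblique methods such as Oblimin and Promax are not shown.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Basic Varimax rotation (no Kaiser normalization); returns rotated loadings and the rotation matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Matrix whose SVD gives the next orthogonal rotation.
        target = L.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop when the criterion no longer improves noticeably.
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ rotation, rotation


# Hypothetical unrotated loadings: a general first factor and a bipolar second factor.
unrotated = np.array([
    [0.7, 0.4], [0.7, 0.4], [0.6, 0.3],
    [0.6, -0.4], [0.7, -0.4], [0.6, -0.3],
])

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Rotation redistributes variance across factors without changing how much of each variable is explained: the communalities (row sums of squared loadings) are the same before and after an orthogonal rotation.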

[Output: Factor loadings without rotation (based on PFA)]

[Output: Factor loadings and inter-factor correlation after Promax rotation (based on PFA)]

Issues of consideration

• With language data, which type of factor rotation (orthogonal vs. oblique) is generally preferred?

• Should factors be correlated to use an oblique rotation?

Step 4: Interpreting key analysis results (1)

Key analysis results to focus on:

– Communality （共通性）: The estimated proportion of the variance of a given variable explained by the factors included in the model

– Factor loading（因子負荷量）: A standardized estimate of the strength of the relationship between an observed variable and a latent factor (similar to a standardized regression coefficient)

– Factor correlation（因子間相関）: A “true” correlation between a pair of factors; adjusted for measurement error

Communality （共通性）

Communality should be less than 1.0 for every single variable. If communality > 1.0 for any variable, the factor solution cannot be interpreted in a meaningful way.

Calculating communality

Example: When the inter-factor correlation ($\phi_{21}$) = .66, the communality of Speaking 3 can be calculated as:

$$
\begin{aligned}
\text{Communality} &= \lambda_{31}^2 + \lambda_{32}^2 + 2\lambda_{31}\phi_{21}\lambda_{32} \\
&= (.70)^2 + (.08)^2 + 2(.70)(.66)(.08) \\
&= .490 + .0064 + .074 = .570
\end{aligned}
$$

(For details about calculating communality, see Brown, 2006, pp. 90-91)
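
The same calculation can be checked in a few lines of Python. The loadings (.70 and .08) and the inter-factor correlation (.66) are the values from the example above; writing the formula in matrix form (loadings × factor correlation matrix × loadings) is an equivalent restatement used here for convenience.

```python
import numpy as np

# Loadings of Speaking 3 on Factor 1 and Factor 2 (values from the example above).
loadings_speaking3 = np.array([0.70, 0.08])

# Factor correlation matrix with the inter-factor correlation of .66.
phi = np.array([[1.0, 0.66],
                [0.66, 1.0]])

# Term-by-term version of the formula.
communality_terms = 0.70 ** 2 + 0.08 ** 2 + 2 * 0.70 * 0.66 * 0.08

# Equivalent matrix version: lambda' * Phi * lambda.
communality = loadings_speaking3 @ phi @ loadings_speaking3

print(f"{communality:.3f}")
```

With an orthogonal rotation the inter-factor correlation is zero, so the cross-product term drops out and the communality reduces to the sum of the squared loadings.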

EFA vs. CFA (1)

Differences in approaches

• Exploratory factor analysis (EFA; 探索的因子分析)

– Data-driven approach

– Often used in early stages of an investigation

• Confirmatory factor analysis (CFA; 検証的/確認的因子分析)

– Theory-driven approach

– Often used in later stages of an investigation to confirm specific hypotheses

EFA vs. CFA (2)

Graphic representation of the differences (per Brown, 2006)

[Path diagram: EFA (oblique rotation), with Factor 1, Factor 2, and Variables 1-6]

EFA vs. CFA (3)

Graphic representation of the differences (per Brown, 2006)

[Path diagram: CFA, with Factor 1, Factor 2, and Variables 1-6]

EFA vs. CFA (4)

Issues of consideration about using EFA and CFA (Jöreskog, 2007)

• Stages of research: exploratory vs. confirmatory phases

• Nature of investigation: “Factor analysis need not be strictly exploratory or strictly confirmatory. Most studies are to some extent both exploratory and confirmatory because they involve some variables of known and other variables of unknown composition.” (Jöreskog, 2007, p. 58)

• Cross-validation of results from exploratory studies by conducting confirmatory analyses on different data sets (e.g., using randomly-split samples)
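
A random split of this kind can be sketched as follows. The sample size of 200 matches the sample scenario used earlier; the even 100/100 split, the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2011)  # fixed seed so the split can be reproduced
n_examinees = 200

# Shuffle the examinee indices, then cut the shuffled list in half.
shuffled = rng.permutation(n_examinees)
efa_idx = shuffled[:n_examinees // 2]   # half used for the exploratory analysis
cfa_idx = shuffled[n_examinees // 2:]   # other half held out for the confirmatory analysis

print(len(efa_idx), len(cfa_idx))
```

The index arrays would then be used to select rows of the score matrix, so that the factor structure suggested by the EFA on one half is tested with a CFA on examinees who played no part in generating it.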

Useful guidelines for conducting EFA

• Factor analysis is often criticized not because of the fundamental weakness of the methodology but because of its misuses in previous factor analysis applications

• Further reading:

– Brown (2006)

– Fabrigar et al. (1999)

– Floyd & Widaman (1995)

– Preacher & MacCallum (2003)

– Tabachnick & Fidell (2007)

References

Bachman, L. F., Davidson, F., Ryan, K., & Choi, I.-C. (1995). An investigation into the comparability of two tests of English as a foreign language. Cambridge, U.K.: Cambridge University Press.

Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2002). Speaking and writing in the university: A multidimensional comparison. TESOL Quarterly, 36(1), 9-48.

Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford.

Cattell, R. B. (1966). The scree test for the number of common factors. Multivariate Behavioral Research, 1, 245-276.

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272-299.

Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7(3), 286-299.

Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7(2), 191-205.

Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.

Jöreskog, K. G. (2007). Factor analysis and its extensions. In R. Cudeck & R. C. MacCallum (Eds.), Factor analysis at 100: Historical developments and future directions. Mahwah, NJ: Erlbaum.

Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141-151.

Liu, O. L., & Rijmen, F. (2008). A modified procedure for parallel analysis of ordered categorical data. Behavior Research Methods, 40, 556-562.

Preacher, K. J., & MacCallum, R. C. (2003). Repairing Tom Swift’s electric factor analysis machine. Understanding Statistics, 2(1), 13-43.

Sawaki, Y. (in press). Factor analysis. In Carol A. Chapelle (Ed.), Encyclopedia of applied linguistics. New York: Wiley.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Pearson.

Vandergrift, L., Goh, C. C. M., Mareschal, C. J., & Tafaghodtari, M. H. (2006). The metacognitive awareness listening questionnaire: Development and validation. Language Learning, 56(3), 431-462.

Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99(3), 432-442.

Questions?

E-mail me at: [email protected]
