SAS BASICS

Page 1: SAS BASICS

SAS Training Basic

Page 2: SAS BASICS

Agenda
• Introduction to SAS Software Program
• Data preparation & Tabulation
• Test of Difference: t-test and ANOVA
• Test of Association: Correlation & Regression Analysis

Page 3: SAS BASICS


Page 4: SAS BASICS

SAS
• From traditional statistical analysis of variance and predictive modeling to exact methods and statistical visualization techniques, SAS/STAT software is designed for both specialized and enterprise-wide analytical needs. SAS/STAT software provides a comprehensive set of tools that can meet the data analysis needs of the entire organization.

Page 5: SAS BASICS

SAS Components
• SAS Enterprise Guide: Graphical user interface application for some common basic data analysis tasks.
• SAS 9.2: Command-based application for a wide variety of data analysis tasks.

Page 6: SAS BASICS

SAS Enterprise Guide
• To open SAS Enterprise Guide, go to Start Menu >> All Programs >> SAS >> SAS Enterprise Guide 4.3.

Page 7: SAS BASICS

SAS 9.2
• To open SAS 9.2, go to Start Menu >> All Programs >> SAS >> SAS 9.2 (English).

Page 8: SAS BASICS

What Is SAS Enterprise Guide?
SAS Enterprise Guide is an easy-to-use Windows client application that provides these features:
– access to much of the functionality of SAS
– an intuitive, visual, customizable interface
– transparent access to data
– ready-to-use tasks for analysis and reporting
– easy ways to export data and results to other applications
– scripting and automation
– a program editor with syntax completion and built-in function help

Page 9: SAS BASICS

Explore the Main Windows
[Figure: main window with numbered callouts 1, 2, and 3]

Page 10: SAS BASICS

Create a Project for This Tutorial
• If SAS Enterprise Guide is not open, start it now. In the Welcome window, select New Project.
• If SAS Enterprise Guide is already open, select File >> New Project. If you already had a project open in SAS Enterprise Guide, you might be prompted to save the project. Select the appropriate response.
• The new project opens with an empty Process Flow window.

Page 11: SAS BASICS

1. The Project Tree
• You can use the Project Tree window to manage the objects in your project. You can delete, rename, and reorder the items in the project. You can also run a process flow or schedule a process flow to run at a particular time.

Page 12: SAS BASICS

2. Workspace and Process Flow Windows
You can have one or more process flows in your project. When you create a new project, an empty Process Flow window opens. As you add data, run tasks, and generate output, an icon for each object is added to the process flow.

The process flow displays the objects in a project, any relationships that exist between the objects, and the order in which the objects will run when you run the process flow.

Page 13: SAS BASICS

3. The Task List
You can use tasks to do everything from manipulating data, to running specific analytical procedures, to creating reports.

Many tasks are also available as wizards, which contain a limited number of options and can provide a quick and easy way to use some of the tasks.

Page 14: SAS BASICS

Add SAS Data to the Project
• You can add SAS data files and other types of files, including OLAP cubes, information maps, ODBC-compliant data, and files that are created by other software packages, such as Microsoft Word or Microsoft Excel.

Page 15: SAS BASICS

• SAS Enterprise Guide requires all data that it accesses to be in table format. A table is a rectangular arrangement of rows (also called observations) and columns (also called variables).

Name     Gender  Age  Weight
Jones    M       48   128.6
Laverne  M       58   158.3
Jaffe    F       .    115.5
Wilson   M       28   170.1

Page 16: SAS BASICS

• A column's type is important because it affects how the column can be used in a SAS Enterprise Guide task. A column's type can be either character or numeric.

• Character variables, such as Name and Gender in the preceding data set, can contain any values. Missing character values are represented by a blank.

• Numeric variables, such as Age and Weight in the preceding data set, can contain only numeric values. Currency, date, and time data are stored as numeric variables. Missing numeric values are represented by a period.

Name     Gender  Age  Weight
Jones    M       48   128.6
Laverne  M       58   158.3
Jaffe    F       .    115.5
Wilson   M       28   170.1

Page 17: SAS BASICS

Local and Remote Data
• When you open data in SAS Enterprise Guide, you must select whether you want to look for the data on your local computer, a SAS server, or in a SAS folder.

Page 18: SAS BASICS

Local and Remote Data (Cont’d)
• If you click My Computer, you can browse the directory structure of your computer. You can open any type of data file that SAS Enterprise Guide can read.

• If you click Servers, you can look for your data on a server. A server can either be a local server if SAS software is installed on your own computer, or it can be a remote server if SAS software is installed on a different computer.

Page 19: SAS BASICS

Open Data from Server
• Within each server there are icons that you can select for Libraries and Files. Libraries are shortcut names for directory locations that SAS knows about. Some libraries are defined by SAS, and some are defined by SAS Enterprise Guide. Libraries contain only SAS data sets.

• The Files folder on a server enables you to access data files in the directory structure on the computer where the SAS server is running. For example, if you wanted to open a Microsoft Excel file on a server that is defined in your repository, you would use the Files node to locate and open the file.

Page 20: SAS BASICS

Open Data from SAS Folders
If you click SAS Folders, you can browse the list of SAS folders that you can access. SAS folders are defined in the SAS Metadata Server and can be used to provide a central location for your stored processes, information maps, and projects so that they can be shared with other SAS applications. SAS folders can also contain content that is not in the SAS Metadata Server, such as data files.

Page 21: SAS BASICS

Add SAS Data from Your Local Computer
• Select File >> Open >> Data. In the Open Data window, select My Computer.
• Open the SAS Enterprise Guide samples directory and double-click Data. By default, the sample programs, projects, and data are located in C:\Program Files\SAS\EnterpriseGuide\4.3\Sample.
• By default, all file types are displayed in the window. Files marked with the data set icon are SAS data sets. Press CTRL and select Orders.sd2 and Products.sas7bdat, and then click Open.

Page 22: SAS BASICS

Add SAS Data from Your Local Computer (Cont’d)
• Shortcuts to the Products and Orders tables are added to the project, and the data sets open in data grids.

• By default, the tables open in read-only mode. In this mode, you can browse, resize column widths, hide and hold columns and rows, and copy columns and rows to a new table.

• You cannot edit the data in the table unless you change to edit mode. Select Edit >> Remove Protect Data.

Page 23: SAS BASICS

View the Properties of a Data Set
• In the project tree, right-click Products and select Properties from the pop-up menu. The Properties for Products window opens. You can see information about general properties such as the physical location of the data and the date it was last modified.

Page 24: SAS BASICS

View the Properties of a Data Set (Cont’d)
• In the selection pane, click Columns. Here you can view a list of columns in your data and the column attributes.

Page 25: SAS BASICS

Add Data from a SAS Library
• Select File >> Open >> Data. In the Open Data window, select Servers.

• Double-click Libraries, and then double-click SASHELP. As you can see, only SAS data sets are stored in libraries.

• Scroll in the window and double-click the PRDSALE data set. A shortcut to the data is added to the project and the data opens in the data grid.

Page 26: SAS BASICS

Save the Project
• Select File >> Save Project As.
• The Save window opens and prompts you to choose whether to save the project on your computer or on a server. Select My Computer.

• In the Save window, select a location for the project. In the File name box, type a name for your project. Project files are saved with the extension .egp.

• Click Save.

Page 27: SAS BASICS


Page 28: SAS BASICS

Data Input
• There are two main ways to input data:
▫ Manually Input Data
▫ Import from an External File

Page 29: SAS BASICS

Manually Input Data
1. Create a SAS Library
2. Create a SAS Data Set
3. Input data

Page 30: SAS BASICS

What is a SAS Data Library?

•A SAS data library is a collection of one or more SAS files that are recognized by SAS and can be referenced and stored as a unit. Each file is a member of the library. SAS data libraries help to organize your work. For example, if a SAS program uses more than one SAS file, then you can keep all the files in the same library. Organizing files in libraries makes it easier to locate the files and reference them in a program.


Page 31: SAS BASICS

Telling SAS Where the SAS Data Library Is Located
There are two ways to tell SAS where a library is located:
• Directly specify the operating environment's physical name for the location of the SAS data library.

• Assign a SAS libref (library reference), which is a SAS name that is temporarily associated with the physical location name of the SAS data library.

Page 32: SAS BASICS

Using Librefs for Temporary and Permanent Libraries
• When you start a SAS session, SAS automatically assigns the libref WORK to a special SAS data library. Normally, the files in the WORK library are temporary files.

• Files that are stored in any SAS data library other than the WORK library are usually permanent files; that is, they endure from one SAS session to the next. Store SAS files in a permanent library if you plan to use them in multiple SAS sessions.

Page 33: SAS BASICS

Create a SAS Library
• Tools >> Assign Project Library

Page 34: SAS BASICS

Create a SAS Library – Step 1
• Specify name and server for the library

Page 35: SAS BASICS

Create a SAS Library – Step 2
• Specify the engine for the library

Page 36: SAS BASICS

Create a SAS Library – Step 3
• Specify options for the library

Page 37: SAS BASICS

Create a SAS Library – Step 4
• Click Test Library to check that the library can be created.
• Click Finish to create the library.

Page 38: SAS BASICS

Create a SAS Library

• Check the newly created library in the Server List.

• When a libref is assigned to a SAS data library, you can use the libref throughout the SAS session to access the SAS files that are stored in that library or to create new files.

Page 39: SAS BASICS

Create SAS Data Set
• File >> New >> Data

Page 40: SAS BASICS

Create SAS Data Set – Step 1
• Specify name ‘TEST’ and location ‘DEMO’

Page 41: SAS BASICS

Create SAS Data Set – Step 2
• Create columns and specify their properties

Name     Gender  Age  Weight
Jones    M       48   128.6
Laverne  M       58   158.3
Jaffe    F       .    115.5
Wilson   M       28   170.1

Page 42: SAS BASICS

Input Data

Page 43: SAS BASICS

Import from an External File
• The Import Data wizard enables you to create SAS data sets from text, HTML, or PC-based database files (including Microsoft Excel, Microsoft Access, and other popular formats). When you use the Import Data wizard, you can specify import options for each file that you import.

Page 44: SAS BASICS

Import Data
• File >> Import Data

Page 45: SAS BASICS

Import Data (Cont’d)
• Desktop >> SAS Training >> Data Advising Survey.xls

Page 46: SAS BASICS

Import Data (Cont’d)
• Specify Data

Page 47: SAS BASICS

Import Data (Cont’d)
• Select Data Source

Page 48: SAS BASICS

Import Data (Cont’d)
• Define Field Attributes

Page 49: SAS BASICS

Import Data (Cont’d)
• Advanced Options

Page 50: SAS BASICS

Import Data Result

Page 51: SAS BASICS

Import SPSS file

Page 52: SAS BASICS

Import SPSS file – Step 1
• Select an SPSS file to import

Page 53: SAS BASICS

Import SPSS file – Step 2
• Specify a name for the imported table

Page 54: SAS BASICS

Import SPSS file Result

Page 55: SAS BASICS

Create Format
• Tasks >> Data >> Create Format

Page 56: SAS BASICS

Create Format (Cont’d)
• Set Format Name ‘GENDER’
• Select Library - SASUSER
• Select Format Type ‘Character’

Page 57: SAS BASICS

Define Formats
• Click New Label and type a name for the label.
• Click New Range, select the type of values, and type the value that corresponds to the label.
• Repeat the steps for each label.
• Click Run.

Page 58: SAS BASICS

Applying User-Defined Formats
• Open a SAS Data Set
• Unprotect Data: Edit >> Unprotect Data

Page 59: SAS BASICS

Applying User-Defined Formats (Cont’d)
• Right-click the column
• Select Properties

Page 60: SAS BASICS

Applying User-Defined Formats (Cont’d)
• In the left pane, select Formats
• In the Categories box, select User Defined
• In the Formats box, select the desired format

Page 61: SAS BASICS

Applying Formats in Tasks
• Custom formats can be applied in the same places that formats defined in SAS can be used.

Page 62: SAS BASICS

SAS Tasks
• After you have data in your project, you can create reports and run analyses on the data.

• To do this, you select a SAS task from the Task List or from the Tasks menu. Some tasks have wizards to guide you through the decisions that you need to make. Wizards are available from menus or from a link next to the related task in the Task List.

Page 63: SAS BASICS

Using Tasks in SAS Enterprise Guide

• The icon next to each variable represents the variable's type. Country is a character variable. Year is a numeric variable. Month is a numeric variable in date-and-time format. Actual and Predict are numeric variables in currency format.

Page 64: SAS BASICS

One-Way Frequencies Task
Create One-Way Frequencies (tables and graphs) to check the data set one last time before analyzing the data in depth.

Page 65: SAS BASICS

One-Way Frequencies
Under Data, select Q1-Q19, Gender, Nation, Year, and Major for Analysis variables.

Page 66: SAS BASICS

One-Way Frequencies
Under Plots, check Vertical for Bar chart.

Page 67: SAS BASICS

One-Way Frequencies
Check the frequency tables and/or bar charts for any errors (e.g., typos). Make any necessary corrections.

Page 68: SAS BASICS

Filter and Sort
Use Tasks >> Data >> Filter and Sort... or Sort data... to help you find the error(s).

Page 69: SAS BASICS

Summary Statistics Task
The Summary Statistics task can be used to calculate summary statistics based on groups within the data. You can produce reports, graphs, and data sets as output.

Page 70: SAS BASICS

Summary Statistics Task
The Summary Statistics task has both a wizard and the standard task dialog box that can be used to set up the results.

Page 71: SAS BASICS

Summary Statistics: Task Roles
Use the wizard to assign variables to roles.
• Specify variables whose values define subgroups.
• Compute statistics for each numeric variable in the list.

Page 72: SAS BASICS

Summary Statistics: Statistics and Results
Choose statistics and results to include, including a report, graphics, and an output data set.

Page 73: SAS BASICS

Summary Statistics: Advanced View
Opening the task in Advanced View provides additional options to further modify the output.

Page 74: SAS BASICS

Summary Tables
The Summary Tables wizard or task can be used to generate a tabular summary report.

Page 75: SAS BASICS

Summary Tables Wizard
The Summary Tables wizard enables you to select analysis variable(s) and statistics, assign classification variables to define rows and columns, and specify totals.

Page 76: SAS BASICS

Summary Tables Wizard

Page 77: SAS BASICS


Page 78: SAS BASICS

One-Sample t-Test

•Tasks >> ANOVA >> t Test


Page 79: SAS BASICS

• Select One Sample.

Page 80: SAS BASICS

• Under Data, assign Q19 to the Analysis variable task role and Gender to the Group analysis by task role.

Page 81: SAS BASICS

• Under Analysis, enter 3 as the null hypothesis value (H0 = 3).

Page 82: SAS BASICS

T-Test Output

Since the p-value is less than 0.05, it can be concluded that, on average, female students consider themselves well prepared for advising appointments (significantly higher than 3).

Since the p-value is less than 0.05, it can be concluded that, on average, male students also consider themselves well prepared for advising appointments.
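
The arithmetic behind the one-sample test can be checked by hand. The sketch below is in Python rather than SAS and uses made-up ratings (not the survey data); it computes only the t statistic and degrees of freedom for H0 = 3, leaving the p-value to the SAS output.

```python
import math
import statistics

def one_sample_t(values, h0):
    """Return (t statistic, degrees of freedom) for H0: mean == h0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - h0) / standard_error, n - 1

# Hypothetical Q19 ratings on a 1-5 scale, for illustration only.
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
t_stat, df = one_sample_t(ratings, h0=3)
print(round(t_stat, 3), df)
```

A large positive t means the sample mean sits well above 3 relative to its standard error.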

Page 83: SAS BASICS

Two-Sample t-Test

•Tasks >> ANOVA >> t Test


Page 84: SAS BASICS

• Select Two Sample.

Page 85: SAS BASICS

• Under Data, assign Q6 to the analysis variable task role and Gender to the classification variable task role.

Page 86: SAS BASICS

•Under Plots, check Summary plot, Confidence interval plot, and Normal quantile-quantile (Q-Q) plot.


Page 87: SAS BASICS

T-Test Output

For the equality of variances test, the p-value is greater than 0.05, so there is no evidence that the variances of the two groups, female students and male students, differ.

Equal variances are therefore assumed and the Pooled method is used. Since the p-value is greater than 0.05, it cannot be concluded that there is a significant difference in Advisor Satisfaction between male and female students.
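
As a rough illustration of what the Pooled row reports, the Python sketch below computes a variance ratio (assuming the equality-of-variances check is a folded F, larger variance over smaller) and the pooled t statistic. The two samples are invented for the example.

```python
import math
import statistics

def folded_f(a, b):
    """Ratio of the larger sample variance to the smaller one."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, na + nb - 2

# Hypothetical Q6 ratings, for illustration only.
female = [4, 5, 3, 4]
male = [3, 4, 2, 3]
f_ratio = folded_f(female, male)
t_stat, df = pooled_t(female, male)
print(f_ratio, round(t_stat, 3), df)
```

A variance ratio near 1 supports pooling; the p-values for both statistics still come from the SAS output.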

Page 88: SAS BASICS

One-Way ANOVA
• Tasks >> ANOVA >> One-Way ANOVA

Page 89: SAS BASICS

•Under Data, assign Q6 and Year to the task roles of Dependent variable and Independent variable, respectively.


Page 90: SAS BASICS

• Under Tests, select Levene’s test.

Page 91: SAS BASICS

•Under Means Comparison, check Bonferroni t test, Duncan’s multiple-range test, and Scheffe’s multiple comparison procedure for Post Hoc tests


Page 92: SAS BASICS

• Under Plots, check Means for Plot types.

• Then, click Run.

Page 93: SAS BASICS

One-Way ANOVA results

Since the p-value is greater than 0.05, there is no significant difference in average Advisor Satisfaction among years of study. Therefore, there is no need to examine the post hoc tests.
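
The F value in the ANOVA table is the between-group mean square divided by the within-group mean square. A minimal Python sketch with invented ratings for three years of study (not the survey data) shows the calculation:

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical Q6 ratings grouped by Year, for illustration only.
by_year = [[3, 4, 5], [4, 5, 6], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova_f(by_year)
print(f_value, df_b, df_w)
```

The p-value reported by SAS is the upper tail of the F distribution with these two degrees of freedom.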

Page 94: SAS BASICS

Post Hoc Test: Bonferroni t Tests


Page 95: SAS BASICS

Post Hoc Test: Scheffe’s Tests

Page 96: SAS BASICS

ANOVA: Means Plot of Q6 by Year


Page 97: SAS BASICS


Page 98: SAS BASICS

Data Exploration, Correlations, and Scatter Plots
• Tasks >> Multivariate >> Correlations

Page 99: SAS BASICS

• With Data selected at the left, assign Q1, Q2, Q3, Q4, and Q5 to the task role of Analysis variables and Q6 to the role of Correlate with.


Page 100: SAS BASICS

Correlation Types

Page 101: SAS BASICS

• In Results, check the box for Create a scatter plot for each correlation pair. Also, check the box at the right for Show correlations in decreasing order of magnitude and uncheck the box for Show statistics for each variable.


Page 102: SAS BASICS

Correlation Analysis

• Since the p-values are less than 0.05, there are significant (positive) relationships between Q6 (overall satisfaction with the advisor) and each of Q1, Q2, Q3, Q4, and Q5.
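
For reference, a Pearson correlation and the t statistic used to test it against zero can be computed as in the Python sketch below. It assumes the Pearson type was selected in the task, and the two columns are invented values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: correlation == 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical Q1 and Q6 ratings, for illustration only.
q1 = [1, 2, 3, 4, 5]
q6 = [2, 1, 4, 3, 5]
r = pearson_r(q1, q6)
t_stat = r_to_t(r, len(q1))
print(round(r, 3), round(t_stat, 3))
```

The sign of r gives the direction of the relationship; the p-value in the SAS output decides significance.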

Page 103: SAS BASICS

Linear Regression
• Tasks >> Regression >> Linear Regression

Page 104: SAS BASICS

• Drag Q6 to the dependent variable task role and Q1, Q2, Q3, Q4, and Q5 to the explanatory variables task role.

Page 105: SAS BASICS

Regression: Model
Model Selection Method: Full model fitted (by default)

Page 106: SAS BASICS

Regression: Statistics
• Under Details on estimates, check Standardized regression coefficients
• Perform some Diagnostics

Page 107: SAS BASICS

Regression Diagnostics
• Unusual and Influential data (Outliers/Leverage)
• Tests on Normality of Residuals
• Tests on Nonconstant Error Variance (Heteroscedasticity)
• Tests on Correlations among Predictors (Multicollinearity)
• Tests on Nonlinearity
• Tests on Dependence of Residuals (Autocorrelation)
• Model Specification

Page 108: SAS BASICS

Diagnostics: Collinearity Analysis
• This option requests a detailed analysis of collinearity among the regressors. This includes eigenvalues, condition indices, and decomposition of the variances of the estimates with respect to each eigenvalue.

Page 109: SAS BASICS

Diagnostics: Collinearity Analysis
• Check Tolerance (1/VIF) or Variance Inflation (VIF).
• Some researchers use a lenient VIF cutoff of 5.0 or even 10.0 to signal when multicollinearity is a problem. The researcher may wish to drop the variable with the highest VIF if multicollinearity is indicated and theory warrants.

• The condition indices are the square roots of the ratio of the largest eigenvalue to each individual eigenvalue. The largest condition index is the condition number of the scaled X matrix. Belsley, Kuh, and Welsch (1980) suggest that, when this number is around 10, weak dependencies might be starting to affect the regression estimates. When this number is larger than 100, the estimates might have a fair amount of numerical error (although the statistical standard error almost always is much greater than the numerical error).
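
Both quantities are simple to compute once their inputs are known. In the Python sketch below, the R-square passed to `vif_and_tolerance` is assumed to come from regressing one predictor on the other predictors, and the eigenvalues are made-up numbers standing in for those SAS reports in its collinearity table.

```python
import math

def vif_and_tolerance(r_squared):
    """VIF = 1 / (1 - R^2) and tolerance = 1 - R^2 for one predictor."""
    tolerance = 1.0 - r_squared
    return 1.0 / tolerance, tolerance

def condition_indices(eigenvalues):
    """Square root of (largest eigenvalue / each eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]

# Hypothetical inputs, for illustration only.
vif, tolerance = vif_and_tolerance(0.8)
indices = condition_indices([4.0, 1.0, 0.04])
print(round(vif, 2), round(tolerance, 2), [round(i, 2) for i in indices])
```

Here an R-square of 0.8 gives a VIF at the 5.0 cutoff, and the largest condition index reaches the value of 10 mentioned above.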


Page 110: SAS BASICS

Diagnostics: Heteroscedasticity Test

•This option tests that the first and second moments of the model are correctly specified.

•Asymptotic covariance matrix. This option displays the estimated asymptotic covariance matrix of the estimates under the hypothesis of heteroscedasticity.


Page 111: SAS BASICS

Diagnostics: Durbin-Watson Statistic

• The Durbin-Watson statistic shows whether or not the errors have first-order autocorrelation. (This test is appropriate only for time series data.) The sample autocorrelation of the residuals is also produced.

• The value of d ranges from 0 to 4. Values close to 0 indicate extreme positive autocorrelation; close to 4 indicates extreme negative autocorrelation; and close to 2 indicates no serial autocorrelation. As a rule of thumb, d should be between 1.5 and 2.5 to indicate independence of observations. Positive autocorrelation means standard errors of the b coefficients are too small. Negative autocorrelation means standard errors are too large.
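
The statistic d is the sum of squared differences between successive residuals divided by the sum of squared residuals. A short Python sketch with two invented residual series shows how the pattern of the residuals moves d away from 2:

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals, for illustration only.
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every step
trending = [3, 2, 1, -1, -2, -3]      # neighbours resemble each other
d_alternating = durbin_watson(alternating)
d_trending = durbin_watson(trending)
print(round(d_alternating, 3), round(d_trending, 3))
```

The alternating series gives d well above 2.5 (negative autocorrelation) and the trending series gives d well below 1.5 (positive autocorrelation).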


Page 112: SAS BASICS

• Under Plots, select Custom list of plots under Show plots for regression analysis. In the menu that appears, uncheck the box for Diagnostic plots and check the boxes for Histogram plot of the residual, Normal quantile plot of the residual, and Residual plots.

Page 113: SAS BASICS

Regression Analysis

These are the F Value and p-value, respectively, testing the null hypothesis that the Model does not explain the variance of the response variable.

R-Square defines the proportion of the total variance explained by the Model.
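
R-Square and the F Value both follow from the sums of squares in the analysis of variance table. The Python sketch below assumes the fitted values come from a least-squares model with an intercept; the data and the fitted line (intercept 0.6, slope 0.8) are a small invented example.

```python
def r_square_and_f(observed, fitted, n_predictors):
    """Return (R-square, F value) from observed and fitted responses."""
    n = len(observed)
    mean_y = sum(observed) / n
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_error = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_model = ss_total - ss_error
    r_square = ss_model / ss_total
    f_value = (ss_model / n_predictors) / (ss_error / (n - n_predictors - 1))
    return r_square, f_value

# Hypothetical data; the line below is its least-squares fit.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
fitted = [0.6 + 0.8 * value for value in x]
r_square, f_value = r_square_and_f(y, fitted, n_predictors=1)
print(round(r_square, 3), round(f_value, 3))
```

With a single predictor, R-Square equals the squared correlation between x and y.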

Page 114: SAS BASICS

Regression Analysis

These are the t Value and p-value, respectively, testing the null hypothesis that each coefficient is equal to 0.

Page 115: SAS BASICS

Regression: Diagnostics

Might suggest violation of normality of residuals assumption

Page 116: SAS BASICS

Regression: Diagnostics

Might suggest violation of normality of residuals assumption

Page 117: SAS BASICS

Regression: Diagnostics

Page 118: SAS BASICS

Q&A
