
Class 1

Introduction to SAS

Reading data

1. Inline

2. External file

a. Text file

b. Excel file

Invoking SAS procedures

Using SAS Software


About this Class

There might not be an exam.

My evaluation of your achievement is based on your homework and periodic in-class quizzes.

After each class, you should save your SAS files and upload them to BB.

The same goes for class quizzes.

A note of warning: if I find anyone copying another student's homework, both will get a zero grade.

Repeated offenders will fail this class!


Introduction to SAS

SAS can be used interactively, but it is most powerful in batch mode, with which I’m most familiar.

A SAS program is made up of two kinds of steps:

Data step

Procedure step

Everything in SAS is accomplished by one of these steps.

This is the most fundamental concept in SAS.


SAS's approach to data analysis

Before we conduct any statistical analysis in SAS, we first need to get data into SAS.

We might need to do something with the data: modify it or create new variables, for example.

We might need to combine data from different places in some way.

We either stack the datasets on top of each other, or we merge them together.

Once the data is organized, we can analyze it.

In the real world, most of our time is spent on getting data organized.
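The difference between stacking and merging can be sketched with a few tiny inline datasets. This is only an illustration under assumed names: the datasets first_half, second_half, and ages and their variables are hypothetical and are not part of the class files.

```sas
/* Hypothetical datasets, for illustration only */
data first_half;
   input id score;
   datalines;
1 80
2 90
;
run;

data second_half;
   input id score;
   datalines;
3 70
4 85
;
run;

/* Stack: SET with two datasets puts the rows of the second after the first */
data stacked;
   set first_half second_half;
run;

data ages;
   input id age;
   datalines;
1 20
2 21
3 19
4 22
;
run;

/* Merge: match rows by id; both inputs must be sorted by the BY variable */
proc sort data=stacked; by id; run;
proc sort data=ages;    by id; run;

data combined;
   merge stacked ages;
   by id;
run;

proc print data=combined;
run;
```

Stacking adds rows (more observations), while merging adds columns (more variables) for observations that share the same id.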


SAS’s Data Step

In the first step, you use SAS to create a dataset for SAS procedures to work with.

This is where SAS’s power is most evident.

There seems to be no data that SAS can’t read.

It all depends on the programmer writing code that tells SAS how to read the data.

SAS views data as variables in columns and observations in rows.

It does not normally treat data as a matrix!

So you can't normally access an arbitrary element of a dataset; the data step works through it one row at a time.
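A minimal sketch of this row-at-a-time behaviour, assuming a small hypothetical dataset named demo: the statements in the data step are executed once for each input row, and the automatic variable _N_ counts those passes.

```sas
/* Hypothetical example: the DATA step body runs once per input row */
data demo;
   input A B;
   total = A + B;   /* computed from the current row only */
   pass  = _N_;     /* _N_ is the number of the current DATA step iteration */
   datalines;
23 45
34 56
;
run;

proc print data=demo;
run;
```

Each printed row carries its own total and pass number, because SAS never sees the whole dataset at once inside the data step.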


SAS’s Procedure Step

Once your data has been created, using SAS procedures is truly a simple matter.

There are countless predefined SAS procedures that can take care of just about every statistical analysis that you can think of.

The richness of those procedures makes SAS stand out as the king of statistical analysis!

Let's start with some very simple examples.

Please open and run ex2-3-1.sas.


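A Textbook Example (ex2-3-1.sas)

DATA ex1; /* start of the DATA step; creates a dataset named ex1 */
INPUT A B; /* read values for the numeric variables A and B */
DATALINES; /* the data lines follow */
23 45
34 56
;
RUN; /* end of the DATA step (optional) */
PROC PRINT; /* start of the PROC step */
RUN; /* end of the PROC step; runs the program */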


Data step

In our last example, the first line is:

“DATA ex1;”

“DATA” is the SAS keyword, shown in blue in the editor

ex1 is the name of the new dataset, and

“;” ends the statement.

The 2nd line:

“INPUT A B;”

“INPUT” is the SAS keyword,

A and B are the names of the two variables to read

The 3rd line:

“DATALINES;”

Tells SAS that what follows is the data itself

It must be the last statement within the data step!


SAS outputs

Once a program segment has been submitted, there are two main outputs from SAS:

Log

Lists information about how the program ran

Output

Lists the actual output of each procedure

Let's all first open and run the sample code “ex2-3-1” from within SAS and look at the actual log output (see next).
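To see the two destinations side by side, here is a small sketch. It assumes the SASHELP.CLASS sample dataset that normally ships with SAS is available. The PUT statement writes to the log, while PROC PRINT writes to the output window.

```sas
/* A DATA step that creates no dataset; PUT writes a line to the log */
data _null_;
   x = 2 + 3;
   put "Written to the log: " x=;
run;

/* PROC PRINT sends its results to the output (listing) destination */
proc print data=sashelp.class(obs=3);
run;
```

After submitting, the PUT line appears in the log among the NOTE messages, and the three printed observations appear in the output.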


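SAS log file

1    *ex2-3-1; /* this is a comment statement */
2    DATA ex1; /* start of the DATA step; creates a dataset named ex1 */
3    INPUT A B; /* read values for the numeric variables A and B */
4    DATALINES;

NOTE: The data set WORK.EX1 has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.28 seconds
      cpu time            0.00 seconds

4  ! /* the data lines follow */
7    ;
8    RUN; /* end of the DATA step */
9    PROC PRINT; /* start of the PROC step */
10   RUN;

NOTE: There were 2 observations read from the data set WORK.EX1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.42 seconds
      cpu time            0.03 seconds

10 ! /* end of the PROC step; runs the program */

As we can see, the log file keeps track of program running information, as a record of SAS’s work.

Let's see another example next.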


2nd Textbook Example (ex2-3-2.sas)

*ex2-3-2; /* a statement starting with an asterisk is a one-line comment */
DATA score;
INPUT num $ name $ English computer; /* read 4 variables */
DATALINES; /* marks the start of the data; the CARDS statement also works */
081 ZHANGLIN 88 90
082 ZHAOHUA 99 89
083 WANGQANG 78 96
084 LIULI 84 79
085 SHIDONG 69 88
086 KONGYING 77 79
087 LILING 82 67
088 GUANFEN 80 91
091 MAQIANG 66 78
092 NEWHUA 88 99
; /* the semicolon marks the end of the data */
PROC MEANS; /* invoke the MEANS procedure */
RUN;

Let's all open and run this sample SAS code, and I will try to explain what it does in greater detail.


Input character variable

In this example, the INPUT statement is:

“INPUT num $ name $ English computer ;”

Again, “INPUT” is the SAS keyword.

This time, we are reading 4 variables, for example:

081 ZHANGLIN 88 90

Here we want to take “081” as text, not as a number.

We need a way to tell SAS this, and we do it by adding a “$” after the variable’s name in the INPUT statement.

Since there is a “$” after the first 2 variable names, they are treated as character variables in SAS.

The last two variables are numeric variables.

Let's see the actual run of the SAS code next.
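The effect of the “$” can be checked with a small sketch. The dataset id_demo and its two variables are hypothetical names used only for this illustration. The same text “081” is read once as a character value and once as a number.

```sas
/* Hypothetical example: read the same text "081" two ways */
data id_demo;
   input num_char $ num_numeric;   /* $ makes num_char a character variable */
   datalines;
081 081
;
run;

proc print data=id_demo;   /* num_char keeps the leading zero; num_numeric is stored as the number 81 */
run;
```

This is why an ID such as a student number is usually read as character: the leading zero is part of the identifier, not part of a quantity.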


Two SAS outputs: Log and List

SAS log file output:

11   *ex2-3-2;
12   DATA score;
13   INPUT num $ name $ English computer ; /* read 4 variables */
14   DATALINES;

NOTE: The data set WORK.SCORE has 10 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

14 ! /* marks the start of the data; the CARDS statement also works */
25   ; /* the semicolon marks the end of the data */
26   PROC MEANS; /* invoke the MEANS procedure */
27   RUN;

NOTE: There were 10 observations read from the data set WORK.SCORE.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.62 seconds
      cpu time            0.04 seconds

SAS list output:

The SAS System        Thursday, April 22, 2010, 4:48:01 PM        2

MEANS PROCEDURE

Variable     N          Mean       Std Dev       Minimum       Maximum
---------------------------------------------------------------------------
English     10    81.1000000     9.5852897    66.0000000    99.0000000
computer    10    85.6000000     9.6861872    67.0000000    99.0000000


Data Storage in SAS

So far, the SAS datasets we have created are gone once we quit SAS.

If we want to save a SAS dataset for later use, we need to store it in a permanent location.

In SAS this is done by giving the location of a folder or directory in a “LIBNAME” statement.

The dataset created will have a two-level name: libref.SAS-data-set.
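As a rough sketch of the two-level name, the first example could be saved permanently as follows. The libref mylib and the folder 'C:\sasdata' are assumptions for illustration. The folder must already exist on your machine, so change the path before running.

```sas
/* Hypothetical location: replace with an existing folder on your computer */
libname mylib 'C:\sasdata';

/* The two-level name mylib.ex1 stores the dataset in that folder */
data mylib.ex1;
   input A B;
   datalines;
23 45
34 56
;
run;

/* In a later SAS session, rerun the LIBNAME statement and the data is still there */
proc print data=mylib.ex1;
run;
```

A one-level name such as ex1 is shorthand for WORK.EX1, the temporary library that is cleared when SAS closes, which is why the log earlier reported WORK.EX1.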


Reading data from a text file


Reading External Data

Most of the time, the data we need is not typed into the program itself.

SAS has a rich set of approaches for reading external data.

Here we will show a simple way to read external data.

We will tell SAS the location of the external file with a “FILENAME” statement.

Please open the file “Attend.sas”.

The following is what you will see.


An Example of Regression with SAS

options linesize=120;
filename datain 'D:\teaching\Data\textfiles\ATTEND.raw'; /* location of the external raw input file */
libname dataout 'D:\teaching\sas\sasfile'; /* location of the permanent sas dataset */
data dataout.attend;
infile datain;
input attend termGPA priGPA ACT final atndrte hwrte frosh soph skipped stndfnl;
prigpa2 = prigpa**2; /* creating new variables within data step */
act2 = act**2;
attendprigpa = attend * prigpa;
label attend       = "classes attended out of 32"
      termGPA      = "GPA for term"
      priGPA       = "cumulative GPA prior to term"
      ACT          = "ACT score"
      final        = "final exam score"
      atndrte      = "percent classes attended"
      hwrte        = "percent homework turned in"
      frosh        = "=1 if freshman"
      soph         = "=1 if sophomore"
      skipped      = "number of classes skipped"
      stndfnl      = "(final - mean)/sd"
      prigpa2      = "PriGPA^2"
      act2         = "Act ^2"
      attendprigpa = "attend * PriGPA"
;
proc means; /* just to be sure we read the data correctly */
proc reg;
eq_6_18: model stndfnl = attend prigpa act prigpa2 act2 attendprigpa;
run;

I will explain all of this later.


Reading data from a text file

In a DATA step,

Data can be read from a text file,

Output can be saved in a permanent location

In our last example, we had:

filename datain 'D:\teaching\Data\textfiles\ATTEND.raw'; /* please modify the location of your data file here */
libname dataout 'D:\teaching\sas\sasfile'; /* you need to modify this location also */
data dataout.attend;
infile datain;
input attend termGPA priGPA ACT final
      atndrte hwrte frosh soph skipped stndfnl;

Please make the changes first before running the program!


Matching labels

In a DATA step,

Data can be read from a text file,

Output can be saved in a permanent location

Here is our last example again, with the fileref and libref renamed:

filename mydatain 'D:\teaching\Data\textfiles\ATTEND.raw'; /* please modify the location of your data file here */
libname myout 'D:\teaching\sas\sasfile'; /* you need to modify this location also */
data myout.attend;
infile mydatain;
input attend termGPA priGPA ACT final
      atndrte hwrte frosh soph skipped stndfnl;

We can give the fileref and the libref any names we want, as long as each name matches where it is used, is a valid SAS name of at most 8 characters, and is not a keyword.


Reading data from a text file

After we have defined the filename as “datain” and given it a location, we can use it in the DATA step with the statement “infile datain;”.

We have also defined a location for a permanent SAS dataset with the “LIBNAME” statement and called it “dataout”, so that we can save the dataset we are going to create in the data step as “dataout.attend”.

In this case, we can think of the libref as a folder name.

This time, we don’t have to type the data into our SAS program.

Instead, we read the input variables directly from the external text file.
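The FILENAME and INFILE pairing can be tried without any file of your own by letting SAS create a temporary text file first. In this sketch the fileref rawfile is a hypothetical name, and the TEMP device type is used in place of a real path. The first step writes two lines of raw text and the second step reads them back.

```sas
/* Hypothetical fileref; TEMP asks SAS to manage a temporary file for us */
filename rawfile temp;

/* Write two lines of raw text to the file (no dataset is created) */
data _null_;
   file rawfile;
   put "23 45";
   put "34 56";
run;

/* Read the text file back through the same fileref */
data ex1;
   infile rawfile;
   input A B;
run;

proc print data=ex1;
run;
```

With a real data file, only the FILENAME statement changes: the quoted path replaces the word temp, and the INFILE and INPUT statements stay the same.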


Alternative Location Specification

We can specify a folder rather than a single file in the filename statement, as in:

filename datain 'D:\teaching\Data\textfiles'; /* here we are giving the folder’s name only */
libname dataout 'D:\teaching\sas\sasfile'; /* you need to modify this location also */
data dataout.attend;
infile datain(ATTEND.raw); /* we name the file here */
input attend termGPA priGPA ACT final
      atndrte hwrte frosh soph skipped stndfnl;


Location of external file

There are many ways to tell SAS where the external file is located in the DATA step. What are the advantages of each?

1. Reading directly:

infile 'D:\teaching\Data\textfiles\ATTEND.raw';

Adv: simple.
Dis-adv: the location is buried inside the code.

2. Telling SAS where the file is located first, with a filename statement:

filename datain 'D:\teaching\Data\textfiles\ATTEND.raw';
infile datain;

Adv: stated at the top of the code, easy to modify.
Dis-adv: each file needs its own name.

3. Telling SAS the folder where the data is, as in:

filename datain 'D:\teaching\Data\textfiles';
infile datain(ATTEND.raw);

Adv: stated at the top of the code, easy to modify.
Dis-adv: more complex.


Creating new variables in data step

We can create new variables in the data step, as we’ve done, for example:

act2 = act**2; /* SAS way of writing act squared */

Since SAS variable names used to be limited to 8 characters, we might want to attach a more descriptive label to each variable, as in:

label attend  = "classes attended out of 32"
      termGPA = "GPA for term" …;

We need to end our label statement with “;”.
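A small self-contained sketch of both ideas, using a hypothetical dataset named demo with made-up ACT scores: a new variable is computed in the data step, and labels are attached so that procedure output is easier to read.

```sas
/* Hypothetical data, for illustration only */
data demo;
   input act;
   act2 = act**2;                     /* ** is the exponent operator */
   label act  = "ACT score"
         act2 = "ACT score squared";  /* one LABEL statement, one closing semicolon */
   datalines;
20
25
30
;
run;

/* The LABEL option makes PROC PRINT show labels instead of variable names */
proc print data=demo label;
run;
```

The labels are stored with the dataset, so later procedures can display them without repeating the LABEL statement.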


Invoking SAS regression procedure

After we have created a SAS dataset, we might want to check it with:

proc means; /* just to be sure we read the data correctly */

Next, we can run an OLS regression as:

proc reg;
eq_6_18: model stndfnl = attend prigpa act prigpa2 act2 attendprigpa;
run;

To run a regression using the “REG” procedure, we need a model statement of the form:

model dependent = list of independent variables;

Be sure to end the statement with “;”.


More on OLS Regression

In this example, we also gave the model a label (it is equation 6.18 of Wooldridge’s textbook).

If we have multiple model statements in a single “REG” procedure, we will need labels to identify the output of each model.

A label is a word (max 8 letters) ending with “:”, as in “eq_6_18:”.

By default, “proc reg” uses the dataset we’ve just created.

Otherwise, we can use the data= option to indicate which dataset to use.
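The points about labels and the data= option can be sketched on a toy dataset. Everything here is hypothetical: the dataset toy, its variables, and the labels small and full are made up for illustration and have nothing to do with the ATTEND data.

```sas
/* Hypothetical toy data, for illustration only */
data toy;
   input y x1 x2;
   datalines;
3.1 1 2
4.9 2 1
7.2 3 4
8.8 4 3
11.1 5 6
12.9 6 5
;
run;

/* data= names the dataset explicitly instead of relying on the last one created */
proc reg data=toy;
   small: model y = x1;       /* each label identifies one model in the output */
   full:  model y = x1 x2;
run;
quit;                         /* PROC REG stays active until QUIT or a new step starts */
```

Naming the dataset with data= is a safe habit once a program creates more than one dataset, because the default quietly changes whenever a new dataset is created.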


Homework Assignment

Please replicate the regression result of 6.4iii of Wooldridge on page 223 using the HTV data.

The regression equation is:

log(wage) = β0 + β1 educ + β2 pareduc + β3 pareduc*educ + β4 expr + u

Notice that pareduc is not in the dataset.

Please post your SAS regression output and your SAS log, without editing, to your Blackboard account.

I might come up with a quiz for next class based on today's class and the homework assignment!


A note about your homework

For homework, you can get help, but in the end you need to do the work yourself.

If you are helping others, don't just give them the code; explain how you did your work.

There will be a penalty for simply copying someone else's code.

For every identified group of students who copied homework from each other, each member loses one point per person in the group.

For example, if I find 5 people in a group sharing the same SAS code, then 5 points will be deducted from every person in the group, regardless of who copied from whom.