Computing New Variables

Creating new variables in a dataset occurs in a data step. The general format is like an equation, with your new variable on the left of an equals sign and what the new variable should be on the right of the equals sign.

   DATA new_data;
      SET old_data;
      test1 = "A";
      test2 = 3;
   RUN;

This example code creates two new variables: a character variable named test1 and a numeric variable named test2. The value of test1 will be “A” for all observations in the dataset new_data, and the value of test2 will be 3 for all observations. Note the quotation marks around the character value.

If you do not use an informat statement to declare the type of variable you are creating or a length statement to tell SAS how long to make the variable, then SAS will assign its defaults. Generally this is not a problem for standard numeric variables, but it might be for character variables. The length of test1 is only 1 because SAS uses the length of the first value assigned to the variable. If you later try to change some values to “ABC”, SAS will keep only the first character and store “A”, because the variable has a length of 1. For character variables it is best to declare them first, in a length statement or an informat statement, so that you can assign them a length that works for you.
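As a minimal sketch of declaring a character variable up front, the program below uses a LENGTH statement before the first assignment. The small old_data dataset and its id variable are hypothetical stand-ins, included only so the example runs on its own.

```sas
/* Hypothetical stand-in for old_data so the sketch is self-contained */
DATA old_data;
   INPUT id;
   DATALINES;
1
2
;
RUN;

DATA new_data;
   SET old_data;
   LENGTH test1 $ 3;               /* reserve 3 characters before the first assignment */
   test1 = "A";
   IF id = 2 THEN test1 = "ABC";   /* fits; with a length of 1 only "A" would be stored */
   test2 = 3;
RUN;

/* List the variables with their types and lengths */
PROC CONTENTS DATA=new_data;
RUN;
```

The PROC CONTENTS step is one way to check the type and length that SAS assigned to each variable.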

Creating variables in this way gives you some flexibility. You can put a formula on the right side of the equals sign to do a calculation. You can put a variable name on the right side of the equals sign to copy an existing variable. You can also incorporate conditional logic into variable creation so that your new variable equals one value if a condition occurs and another value if the condition does not occur. Let’s use our sample dataset to show some examples of variable creation.

Example. Using the height and weight variables, calculate the student’s body mass index (BMI). Also, convert the height variable (currently in inches) to meters. Finally, create an indicator (or dummy) variable that is equal to 1 if the student has any siblings and 0 if the student has none.

   DATA sample_new_vars;
      SET sample;
      bmi = (weight / (height*height)) * 703;
      heightInMeters = height * 0.0254;
      IF siblings >= 1 THEN sibling_indicator = 1;
      IF siblings = 0 THEN sibling_indicator = 0;
   RUN;

In the example program above, a new dataset called sample_new_vars is created. By using the SET statement, the sample_new_vars dataset starts by being an exact copy of the dataset sample and then three new variables are added: bmi, heightInMeters, and sibling_indicator. Both bmi and heightInMeters are created by simple arithmetic using existing variables in the dataset. The variable sibling_indicator is created with two IF-THEN statements that use conditional logic rules to establish the values of the variable.
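The same logic can also be sketched with IF-THEN/ELSE, which makes the two conditions explicitly exclusive. The three-row sample dataset below is a hypothetical stand-in (the real sample data is not shown here), and the third row assumes that siblings can be missing.

```sas
/* Hypothetical three-row stand-in for the sample dataset */
DATA sample;
   INPUT height weight siblings;
   DATALINES;
66 140 2
70 180 0
64 120 .
;
RUN;

DATA sample_new_vars;
   SET sample;
   bmi = (weight / (height*height)) * 703;   /* weight in pounds, height in inches */
   heightInMeters = height * 0.0254;
   IF siblings >= 1 THEN sibling_indicator = 1;
   ELSE IF siblings = 0 THEN sibling_indicator = 0;
   /* When siblings is missing, neither condition is true,
      so sibling_indicator is left missing as well */
RUN;

PROC PRINT DATA=sample_new_vars;
RUN;
```

Leaving the indicator missing when siblings is missing avoids coding an unknown value as 0.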

Using Built-In SAS Functions

SAS has numerous built-in functions that allow you to manipulate existing variables and create new variables. Functions are used in a data step. There are too many functions to explore all of them in detail here, but we’ll go through several useful ones. A list of all SAS functions can be found in the SAS Help and Documentation Guide. In that section of the Help, you can look up a specific function alphabetically or browse the functions by category.

SAS functions follow this general format:

      function-name (argument1, ..., argument-n);

Here, function-name is the SAS-given name of the function, and argument1, …, argument-n represent key pieces of information that SAS requires in order to execute the function. The number of arguments, and what they are, vary by function. Arguments are always separated by commas and contained within parentheses.

Let’s look at an example. The ROUND function rounds a numeric value to the nearest multiple of a specified rounding unit. The format of the ROUND function is:

      ROUND (argument, rounding-unit);

ROUND is the function name, argument is the number or numeric variable that you want to have rounded, and rounding-unit is 10, 100, 0.1, 0.01, or whatever unit you want the numeric value to be rounded to. For example, ROUND (34.58, 0.1) tells SAS to round the number 34.58 to the nearest tenth, and SAS will return 34.6. More commonly, the argument is a variable whose values you want rounded for every observation in your dataset. For example, new_variable = ROUND (old_variable, 0.1).
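A short sketch of ROUND with a few rounding units follows. It uses DATA _NULL_ with a PUT statement so the results are written to the SAS log without creating a dataset; the variable names tenth, whole, and tens are illustrative only.

```sas
DATA _NULL_;
   tenth = ROUND(34.58, 0.1);   /* nearest tenth: 34.6 */
   whole = ROUND(34.58);        /* rounding unit omitted, so it defaults to 1: 35 */
   tens  = ROUND(34.58, 10);    /* nearest multiple of ten: 30 */
   PUT tenth= whole= tens=;     /* writes the three values to the log */
RUN;
```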

There are a few key pieces of information that you will need to know to successfully execute a function. First, you will need to know the function name, or at least the keyword that SAS uses for it. Second, you will need to know the required arguments for the function, meaning the pieces of information SAS needs, supplied in exactly the way SAS expects to see them, in order for the function to execute properly. Third, you’ll need to know what type of value the function returns. Will it return a numeric or a character value? If character, of what length? Lastly, you’ll need to be aware of how the function will treat any missing values in the given variable.
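To illustrate the last point, here is a small sketch comparing the SUM function with the + operator when one value is missing. SUM is used only as a familiar illustration and the variable names are hypothetical; other functions may treat missing values differently, so check the documentation for each one.

```sas
DATA _NULL_;
   a = 5;
   b = .;                      /* a missing numeric value */
   plus_result = a + b;        /* arithmetic operator: the result is missing */
   sum_result  = SUM(a, b);    /* SUM ignores missing arguments: the result is 5 */
   PUT plus_result= sum_result=;
RUN;
```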