8/14/2019 Char Func Handout
Some Functions We Will Discuss
LENGTH
SUBSTR
COMPBL
COMPRESS
VERIFY
INPUT
PUT
TRANWRD
SCAN
TRIM
UPCASE
LOWCASE
INDEX
INDEXC
INDEXW
SPEDIS
Some SAS 9 Functions
ANYDIGIT
ANYALPHA
NOTDIGIT
NOTALPHA
CATX and CATS
COMPARE
LENGTHC
LENGTHN
STRIP
COUNT
COUNTC
PROPCASE
FIND
FINDC
Character Storage Lengths
data chars1;
   length string $ 7;
   string = 'abc';
   length = length(string);
   storage_length = lengthc(string);
   display = ":" || string || ":";
   put storage_length= /
       length= /
       display=;
run;
SAS Log
11   data chars1;
12      length string $ 7;
13      string = 'abc';
14      storage_length = lengthc(string);
15      length = length(string);
16      display = ":" || string || ":";
17      put storage_length= /
18          length= /
19          display=;
20   run;

storage_length=7
length=3
display=:abc    :
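The padding behind these results can be modeled outside SAS. The sketch below is a rough Python model, not SAS code. The helper names sas_assign, sas_length and sas_lengthc are hypothetical. The model assumes single-byte characters and assumes that assigning a value to a fixed-length variable pads it with blanks.

```python
# Rough Python model (not SAS) of a fixed-length SAS character variable.
# Assumes single-byte characters and blank padding on assignment.

def sas_assign(value: str, storage_length: int) -> str:
    """Store value the way a $storage_length variable would:
    pad short values with blanks, truncate long ones."""
    return value[:storage_length].ljust(storage_length)


def sas_length(value: str) -> int:
    """LENGTH: position of the last non-blank character (1 if all blank)."""
    return len(value.rstrip(" ")) or 1


def sas_lengthc(value: str) -> int:
    """LENGTHC: the storage length, trailing blanks included."""
    return len(value)


string = sas_assign("abc", 7)  # like: length string $ 7; string = 'abc';
display = ":" + string + ":"
print(f"storage_length={sas_lengthc(string)}")  # storage_length=7
print(f"length={sas_length(string)}")           # length=3
print(f"display={display}")                     # display=:abc    :
```

Calling sas_assign("abc", 3) instead mimics a variable whose length is fixed by its first assignment. In that case LENGTH and LENGTHC agree.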
Moving the LENGTH Statement
data chars2;
string = 'abc';
length string $ 7;
storage_length = lengthc(string);
length = length(string);
display = ":" || string || ":";
put storage_length= /
length= /
display=;
run;
SAS Log
1    data chars2;
2       string = 'abc';
3       length string $ 7;
WARNING: Length of character variable string has already been set.
         Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of a character variable.
4       storage_length = lengthc(string);
5       length = length(string);
6       display = ":" || string || ":";
7       put storage_length= /
8           length= /
9           display=;
10   run;

storage_length=3
length=3
display=:abc:
The PUT Function
data special;
   ***PUT is a special function often used
      for numeric to character conversion;
   input sas_date number ss;
   c_date = put(sas_date,date9.);
   money = put(number,dollar8.);
   ss_char = put(ss,ssn.);
datalines;
0 1234 123456789
;
Listing of Data Set SPECIAL
sas_date   number   ss          c_date      money    ss_char
0          1234     123456789   01JAN1960   $1,234   123-45-6789
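The three formats in this example can be modeled roughly in Python to show what each one does to the number. This is a sketch, not SAS code. The function names are hypothetical. Month abbreviations are hard-coded so the result does not depend on the locale. DOLLAR8. is modeled only for non-negative whole amounts that fit in 8 columns.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from 1 January 1960
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def put_date9(sas_date: int) -> str:
    """Rough model of PUT(x, DATE9.): ddMONyyyy."""
    d = SAS_EPOCH + timedelta(days=sas_date)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"


def put_dollar8(amount: int) -> str:
    """Rough model of PUT(x, DOLLAR8.) for a non-negative whole amount:
    dollar sign, comma separators, right-aligned in 8 columns."""
    return f"${amount:,}".rjust(8)


def put_ssn(ss: int) -> str:
    """Rough model of PUT(x, SSN.): nine digits shown as ddd-dd-dddd."""
    digits = f"{ss:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


print(put_date9(0), put_dollar8(1234).strip(), put_ssn(123456789))
# 01JAN1960 $1,234 123-45-6789
```

Numeric formats right-align their result, so the unstripped money value carries two leading blanks.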
Converting Multiple Blanks to a Single Blank
data multiple;
   input #1 @1  Name    $20.
         #2 @1  Address $30.
         #3 @1  City    $15.
            @20 State   $2.
            @25 Zip     $5.;
   name = compbl(name);
   address = compbl(address);
   city = compbl(city);
datalines;
Ron Cody
89 Lazy Brook Rd.
Flemington         NJ   08822
Bill Brown
28 Cathy Street
North City         NY   11518
;
Listing of Data Set MULTIPLE
Name         Address              City         State   Zip
Ron Cody     89 Lazy Brook Rd.    Flemington   NJ      08822
Bill Brown   28 Cathy Street      North City   NY      11518
How to Remove Characters from a String
data phone;
   input phone $15.;
   phone1 = compress(phone);
   phone2 = compress(phone,'(-) ');
datalines;
(908)235-4490
(201) 555-77 99
;
Listing of Data Set PHONE
phone             phone1          phone2
(908)235-4490     (908)235-4490   9082354490
(201) 555-77 99   (201)555-7799   2015557799
Another COMPRESS Example
data social;
input ss_char $11.;
ss = input(compress(ss_char,'-'),9.);
easy_ss = input(ss_char,comma11.);
datalines;
123-45-6789
;
ss = 123456789 (numeric)
easy_ss = 123456789 (numeric)
Compress Function (SAS 9 changes)
COMPRESS(char_value <, comp_string> <, modifiers>)
char_value is a SAS character value.
comp_string is a character value containing the characters to remove from char_value.
modifiers add additional characters to the list of characters to remove, or modify the way the function works (see next slide).
Compress Function Modifiers (SAS 9)
Selected list of COMPRESS modifiers (upper- or lowercase)
a  adds upper- and lowercase letters
d  adds numerals (digits)
i  ignores case
k  keeps listed characters instead of removing them
s  adds space characters (blank, tab, lf, cr) to the list
p  adds punctuation
Examples
For these examples, char = "A C123XYZ",
Using the Compress Modifiers
data phone;
   input phone $15.;
   number = compress(phone,,'kd');
datalines;
(908)235-4490
(201) 555-77 99
;
Listing of Data Set PHONE
phone             number
(908)235-4490     9082354490
(201) 555-77 99   2015557799
The VERIFY Function
data verify;
   input @1 id     $3.
         @5 answer $5.;
   position = verify(answer,'abcde');
datalines;
001 acbed
002 abxde
003 12cce
004 abc e
;
Listing of Data Set VERIFY
id    answer   position
001   acbed    0
002   abxde    3
003   12cce    1
004   abc e    4
Watch Out for Trailing Blanks
data trailing;
length string $ 10;
string = 'abc';
position = verify(string,'abcde');
run;
string = 'abc       ' (padded with blanks to a length of 10)
position = 4 (the position of the first trailing blank)
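A rough Python model of VERIFY shows why the padded value fails the check and the trimmed value passes. This is a sketch, not SAS code, and sas_verify is a hypothetical name. Trailing-blank padding is simulated with ljust, and TRIM with rstrip.

```python
# Rough Python model (not SAS) of VERIFY.
def sas_verify(value: str, valid: str) -> int:
    """Return the 1-based position of the first character of value that is
    not in valid, or 0 if every character appears in valid."""
    for position, ch in enumerate(value, start=1):
        if ch not in valid:
            return position
    return 0


padded = "abc".ljust(10)  # like: length string $ 10; string = 'abc';
print(sas_verify(padded, "abcde"))             # 4: the first trailing blank
print(sas_verify(padded.rstrip(" "), "abcde"))  # 0: the trimmed value is all valid
```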
Using VERIFY for Data Cleaning
data clean;
   input id $;
   ***Valid IDs contain letters X, Y, or Z
      and digits;
   if verify(trim(id),'XYZ0123456789') eq 0 then valid = 'Yes';
   else valid = 'No';
datalines;
12X67YZ
67WXYZ
;
Listing of Data Set CLEAN
id        valid
12X67YZ   Yes
67WXYZ    No
Substring Example
data pieces_parts;
   input Id $9.;
   length State $ 2;
   state = substr(Id,3,2);
   Num = input(substr(Id,5),4.);
datalines;
XYNY123
XYNJ1234
;
Listing of Data Set PIECES_PARTS
Id         State   Num
XYNY123    NY      123
XYNJ1234   NJ      1234
The SUBSTR Function on the Left-Hand Side of the Equal Sign
data pressure;
   input sbp dbp @@;
   length sbp_chk dbp_chk $ 4;
   sbp_chk = put(sbp,3.);
   dbp_chk = put(dbp,3.);
   if sbp gt 160 then substr(sbp_chk,4,1) = '*';
   if dbp gt 90 then substr(dbp_chk,4,1) = '*';
datalines;
120 80 180 92 200 110
;
Listing of Data Set PRESSURE
sbp dbp sbp_chk dbp_chk
120 80 120 80
180 92 180* 92*
200 110 200* 110*
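On the left of the equal sign, SUBSTR overwrites characters in place and does not change the variable's length. The rough Python model below illustrates this. It is a sketch, not SAS code. The names substr_assign and flag are hypothetical, and the model assumes the replaced span lies inside the target string.

```python
# Rough Python model (not SAS) of SUBSTR used on the left of '='.
def substr_assign(target: str, start: int, length: int, new: str) -> str:
    """Overwrite `length` characters of target beginning at 1-based `start`.
    new is padded or truncated to fit, so len(target) never changes."""
    piece = new[:length].ljust(length)
    return target[:start - 1] + piece + target[start - 1 + length:]


def flag(value: int, limit: int) -> str:
    """Mirror of the PRESSURE logic for one reading."""
    chk = f"{value:3d}".ljust(4)  # like put(value,3.) stored in a $4 variable
    if value > limit:
        chk = substr_assign(chk, 4, 1, "*")
    return chk


for sbp, dbp in [(120, 80), (180, 92), (200, 110)]:
    print(repr(flag(sbp, 160)), repr(flag(dbp, 90)))
```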
Parsing a String
data take_apart;
   input @1 Cost $10.;
   Integer = input(scan(Cost,1,' /'),8.);
   Num = input(scan(Cost,2,' /'),8.);
   Den = input(scan(Cost,3,' /'),8.);
   if missing(Num) then Amount = Integer;
   else Amount = Integer + Num/Den;
datalines;
1 3/4
12 1/2
123
;
Listing of Data Set TAKE_APART
Cost     Integer   Num   Den   Amount
1 3/4    1         3     4     1.75
12 1/2   12        1     2     12.50
123      123       .     .     123.00
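SCAN's word-splitting rules can be sketched in Python. The model assumes the default SCAN behaviour without modifiers: any character in the delimiter list separates words, runs of delimiters count as one, leading delimiters are ignored, and a negative count scans from the right. It is a sketch, not SAS code. sas_scan and parse_cost are hypothetical names, and a missing word is represented by an empty string.

```python
# Rough Python model (not SAS) of SCAN and of the TAKE_APART logic.
def sas_scan(value: str, n: int, delimiters: str = " ") -> str:
    """Return the nth word of value (negative n counts from the right),
    or "" if there is no such word."""
    words, current = [], ""
    for ch in value:
        if ch in delimiters:
            if current:           # a run of delimiters ends at most one word
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    if n == 0 or abs(n) > len(words):
        return ""
    return words[n - 1] if n > 0 else words[n]


def parse_cost(cost: str) -> float:
    """'whole num/den' or just 'whole', as in the TAKE_APART data step."""
    integer = float(sas_scan(cost, 1, " /"))
    num_text = sas_scan(cost, 2, " /")
    if not num_text:              # like: if missing(Num) then Amount = Integer;
        return integer
    return integer + float(num_text) / float(sas_scan(cost, 3, " /"))


for cost in ["1 3/4", "12 1/2", "123"]:
    print(cost, parse_cost(cost))
print(sas_scan("Jeff W. Snoker", -1))  # last word, as scan(name,-1,' ') does
```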
Using the SCAN Function to Extract a Last Name
data first_last;
   length last_name $ 15;
   input @1  name  $20.
         @22 phone $13.;
   ***extract the last name from name;
   last_name = scan(name,-1,' '); *** minus value scans from the right;
datalines;
Jeff W. Snoker       (908)782-4382
Raymond Albert       (732)235-4444
Alfred Edward Newman (800)123-4321
Steven J. Foster     (201)567-9876
Jose Romerez         (516)593-2377
;
Names and Phone Numbers in Alphabetical Order (by Last Name)
Name                   Phone Number
Raymond Albert         (732)235-4444
Steven J. Foster       (201)567-9876
Alfred Edward Newman   (800)123-4321
Jose Romerez           (516)593-2377
Jeff W. Snoker         (908)782-4382
Locating One Word in a String: Function INDEXW
data _null_;
   string = 'anything goes any where';
   index = index(string,'any');
   indexw = indexw(string,'any');
   put index= indexw=;
run;
index=1 indexw=15
Note: You can specify delimiters for INDEXW in a third argument.
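The difference between INDEX and INDEXW can be modeled roughly in Python. INDEX finds any substring. INDEXW accepts a match only if it is bounded by a delimiter or by the start or end of the string. This is a simplified sketch, not SAS code. The function names are hypothetical, and the default delimiter is assumed to be a blank.

```python
# Rough Python model (not SAS) of INDEX and INDEXW.
def sas_index(source: str, excerpt: str) -> int:
    """1-based position of the first occurrence of excerpt, or 0."""
    return source.find(excerpt) + 1  # find returns -1 when absent


def sas_indexw(source: str, word: str, delimiters: str = " ") -> int:
    """1-based position of word as a whole word, or 0."""
    start = source.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or source[start - 1] in delimiters
        after_ok = end == len(source) or source[end] in delimiters
        if before_ok and after_ok:
            return start + 1
        start = source.find(word, start + 1)  # keep looking further right
    return 0


text = "anything goes any where"
print(sas_index(text, "any"), sas_indexw(text, "any"))  # 1 15
```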
Changing Case
data case;
   input name $15.;
   upper = upcase(name);
   lower = lowcase(name);
   proper = propcase(name);
datalines;
gEOrge SMITH
The end
;
Listing of Data Set CASE
name           upper          lower          proper
gEOrge SMITH   GEORGE SMITH   george smith   George Smith
The end        THE END        the end        The End
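PROPCASE can be modeled roughly in Python: lowercase the whole value, then uppercase each character that follows a delimiter. This is a sketch, not SAS code, and sas_propcase is a hypothetical name. The default delimiter list (blank, forward slash, hyphen, open parenthesis, period, tab) is taken from our reading of the SAS documentation and should be treated as an assumption. Python's own str.title() is not a drop-in replacement, because it also capitalizes after an apostrophe.

```python
# Rough Python model (not SAS) of PROPCASE with its assumed default delimiters.
DEFAULT_DELIMITERS = " /-(.\t"


def sas_propcase(value: str, delimiters: str = DEFAULT_DELIMITERS) -> str:
    result = []
    capitalize_next = True               # the first character is capitalized
    for ch in value.lower():
        result.append(ch.upper() if capitalize_next else ch)
        capitalize_next = ch in delimiters
    return "".join(result)


for name in ["gEOrge SMITH", "The end", "o'neil"]:
    print(name.upper(), "|", name.lower(), "|", sas_propcase(name))
```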
Substituting One Word for Another in a String
data convert;
   input @1 address $20.;
   *** Convert Street, Avenue and
       Road to their abbreviations;
   Address = tranwrd(address,'Street','St.');
   Address = tranwrd(address,'Avenue','Ave.');
   Address = tranwrd(address,'Road','Rd.');
datalines;
89 Lazy Brook Road
123 River Rd.
12 Main Street
;
Listing of Data Set CONVERT
Obs   Address
1     89 Lazy Brook Rd.
2     123 River Rd.
3     12 Main St.
Spelling Distance
data compare;
   length string1 string2 $ 15;
   input string1 string2;
   points = spedis(string1,string2);
datalines;
same same
same sam
first xirst
last lasx
receipt reciept
;
Listing of Data Set COMPARE
string1   string2   points
same      same      0
same      sam       8
first     xirst     40
last      lasx      25
receipt   reciept   7
The "ANY" Functions
data find_alpha_digit;
   input string $20.;
   first_alpha = anyalpha(string);
   first_digit = anydigit(string);
datalines;
no digits here
the 3 and 4
123 456 789
;
Listing of Data Set FIND_ALPHA_DIGIT
string           first_alpha   first_digit
no digits here   1             0
the 3 and 4      1             5
123 456 789      0             1
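The "ANY" functions return the position of the first character of a given kind, or 0 if there is none. A rough Python model is shown below. It is a sketch, not SAS code. The function names are hypothetical, and "alphabetic" is assumed to mean ASCII letters.

```python
# Rough Python model (not SAS) of ANYALPHA and ANYDIGIT.
DIGITS = "0123456789"


def first_position(value: str, predicate) -> int:
    """1-based position of the first character satisfying predicate, or 0."""
    for position, ch in enumerate(value, start=1):
        if predicate(ch):
            return position
    return 0


def anyalpha(value: str) -> int:
    return first_position(value, lambda ch: ch.isascii() and ch.isalpha())


def anydigit(value: str) -> int:
    return first_position(value, lambda ch: ch in DIGITS)


for s in ["no digits here", "the 3 and 4", "123 456 789"]:
    padded = s.ljust(20)  # like input string $20.
    print(s, anyalpha(padded), anydigit(padded))
```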
The "NOT" FunctionsBeware of Trailin Blanks
length string $ 10;
string = '123';
position = notdigit(string);
pos_trim = notdigit(trim(string));
position = 4 (position of first blank)
pos_trim = 0
The "NOT" Functions
data data_cleaning;
   input string $20.;
   not_alpha = notalpha(trim(string));
   not_digit = notdigit(trim(string));
datalines;
abcdefg
1234567
abc123
1234abcd
;
Listing of Data Set DATA_CLEANING
string     not_alpha   not_digit
abcdefg    0           1
1234567    1           0
abc123     4           1
1234abcd   1           5
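The "NOT" functions are the mirror image of the "ANY" functions. They return the position of the first character that is not of the given kind. A trailing blank is such a character, which is why TRIM matters. The sketch below is a rough Python model, not SAS code. The names are hypothetical, "alphabetic" is assumed to mean ASCII letters, and TRIM is modeled as rstrip(" ").

```python
# Rough Python model (not SAS) of NOTALPHA and NOTDIGIT.
DIGITS = "0123456789"


def first_position(value: str, predicate) -> int:
    """1-based position of the first character satisfying predicate, or 0."""
    for position, ch in enumerate(value, start=1):
        if predicate(ch):
            return position
    return 0


def notalpha(value: str) -> int:
    return first_position(value, lambda ch: not (ch.isascii() and ch.isalpha()))


def notdigit(value: str) -> int:
    return first_position(value, lambda ch: ch not in DIGITS)


padded = "123".ljust(10)                         # like length string $ 10;
print(notdigit(padded), notdigit(padded.rstrip(" ")))  # 4 0
```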
Concatenation Functions
data join_up;
   length cats $ 6 catx $ 17;
   string1 = 'ABC ';
   string2 = ' XYZ ';
   string3 = '12345';
   cats = cats(string1,string2);
   catx = catx('***',string1,string2,string3);
run;
cats = 'ABCXYZ'
catx = 'ABC***XYZ***12345'
Without the LENGTH statement, cats and catx would have a length of 200.
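CATS and CATX both strip leading and trailing blanks from each argument before joining. CATX also places a separator between the items. A rough Python model follows. It is a sketch, not SAS code, and the function names are hypothetical. It assumes that CATX skips blank arguments entirely rather than emitting an empty item between two separators.

```python
# Rough Python model (not SAS) of CATS and CATX.
def cats(*items: str) -> str:
    """Strip leading/trailing blanks from every item, then join."""
    return "".join(item.strip(" ") for item in items)


def catx(separator: str, *items: str) -> str:
    """Strip every item, drop blank ones (assumed), join with separator."""
    stripped = [item.strip(" ") for item in items]
    return separator.join(s for s in stripped if s)


string1, string2, string3 = "ABC ", " XYZ ", "12345"
print(cats(string1, string2))                 # ABCXYZ
print(catx("***", string1, string2, string3))  # ABC***XYZ***12345
```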
Some LENGTH Functions
data how_long;
   one = 'ABC   ';
   miss = ' ';  /* char missing value */
   length_one = length(one);
   lengthn_one = lengthn(one);
   lengthc_one = lengthc(one);
   length_two = length(miss);
   lengthn_two = lengthn(miss);
   lengthc_two = lengthc(miss);
run;
length_one   lengthn_one   lengthc_one   length_two   lengthn_two   lengthc_two
3            3             6             1            0             1
The COMPARE Function
COMPARE(string1, string2 <, modifiers>)
I  ignore case
L  remove leading blanks
:  truncate the longer string to the length of the shorter string. The default is to pad the shorter string with blanks before a comparison. (Note: similar to the =: comparison operator)
If string1 and string2 are the same, COMPARE returns a value of 0.
If the arguments differ, the sign of the result is negative if string1 precedes string2 in a sort sequence, and positive if string1 follows string2 in a sort sequence.
The magnitude of the result is equal to the position of the leftmost character at which the strings differ.
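The rules above can be collected into a rough Python model of COMPARE. It is a sketch, not SAS code, and sas_compare is a hypothetical name. The model assumes an ASCII sort sequence. It implements ignore-case by uppercasing both strings. For the colon modifier it truncates to the shorter length, or to 1, whichever is greater.

```python
# Rough Python model (not SAS) of COMPARE with the i, l and : modifiers.
def sas_compare(s1: str, s2: str, modifiers: str = "") -> int:
    mods = modifiers.lower()
    if "l" in mods:                      # remove leading blanks
        s1, s2 = s1.lstrip(" "), s2.lstrip(" ")
    if "i" in mods:                      # ignore case
        s1, s2 = s1.upper(), s2.upper()
    if ":" in mods:                      # truncate to the shorter length
        width = max(min(len(s1), len(s2)), 1)
        s1, s2 = s1[:width], s2[:width]
    else:                                # default: pad the shorter with blanks
        width = max(len(s1), len(s2))
    s1, s2 = s1.ljust(width), s2.ljust(width)
    for position, (a, b) in enumerate(zip(s1, s2), start=1):
        if a != b:                       # sign from sort order, size from position
            return -position if a < b else position
    return 0


print(sas_compare("abc", "abd"), sas_compare("abcdef", "abc"),
      sas_compare("abcdef", "abc", ":"))  # -3 4 0
```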
The STRIP Function
data _null_;
   length concat $ 8;
   file print;
   one = ' ABC ';
   two = ' XYZ ';
   one_two = ':' || one || two || ':';
   strip = ':' || strip(one) || strip(two) || ':';
   concat = cats(':',one,two,':');
   put one_two= /
       strip= /
       concat=;
run;
one_two=: ABC  XYZ :
strip=:ABCXYZ:
concat=:ABCXYZ:
COUNT and COUNTC Functions
data Dracula;  /* Get it? Count Dracula */
   input string $20.;
   count_abc = count(string,'abc');
   countc_abc = countc(string,'abc');
   count_abc_i = count(string,'abc','i');
datalines;
xxabcxABCxxbbbb
cbacba
;
Listing of Data Set DRACULA
string            count_abc   countc_abc   count_abc_i
xxabcxABCxxbbbb   1           7            2
cbacba            0           6            0
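The distinction between COUNT and COUNTC can be modeled roughly in Python. COUNT counts occurrences of a whole substring, while COUNTC counts individual characters drawn from a list. This is a sketch, not SAS code. The names are hypothetical. The model relies on Python's str.count, which counts non-overlapping matches. Whether SAS COUNT treats overlapping matches the same way is not shown in this handout, so treat that case as an assumption.

```python
# Rough Python model (not SAS) of COUNT and COUNTC.
def sas_count(value: str, substring: str, modifiers: str = "") -> int:
    """Occurrences of substring in value; modifier i ignores case."""
    if "i" in modifiers.lower():
        value, substring = value.lower(), substring.lower()
    return value.count(substring)  # non-overlapping matches (assumption)


def sas_countc(value: str, chars: str) -> int:
    """Number of characters in value that appear anywhere in chars."""
    return sum(1 for ch in value if ch in chars)


for s in ["xxabcxABCxxbbbb", "cbacba"]:
    padded = s.ljust(20)  # like input string $20.
    print(s, sas_count(padded, "abc"), sas_countc(padded, "abc"),
          sas_count(padded, "abc", "i"))
```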
Contact Information
Author: Ron Cody
You may download copies of the PowerPoint presentation from:
www2.umdnj.edu/codyweb/biocomputing