Char Func Handout

Embed Size (px)

Citation preview

  • 8/14/2019 Char Func Handout

    1/42

  • 8/14/2019 Char Func Handout

    2/42

    Some Functions We Will Discuss

    LENGTH

    SUBSTR

    COMPBL

    COMPRESS

    VERIFY

    INPUT

    PUT

    TRANWRD SCAN

    TRIM

    UPCASE

    LOWCASE

    INDEX

    INDEXC

    INDEXW

    SPEDIS

    LENGTH

  • 8/14/2019 Char Func Handout

    3/42

    Some SAS 9 Functions

    ANYDIGIT

    ANYALPHA

    NOTDIGIT

    NOTALPHA

    CATX and CATS

    COMPARE

    LENGTHC LENGTHN

    STRIPCOUNTCOUNTC

    PROPCASEFINDFINDC

  • 8/14/2019 Char Func Handout

    4/42

    Character Storage Lengths

    data chars1; length string $ 7;

    string = 'abc';

    length = length(string);storage_length = lengthc(string);display = ":" || string || ":";put storage_length= /

    length= /display=;

    run;

  • 8/14/2019 Char Func Handout

    5/42

    SAS Log

    11 data chars1;12 length string $ 7;13 string = 'abc';14 storage_length = lengthc(string);15 length = length(string);

    16 display = ":" || string || ":";17 put storage_length= /18 length= /19 display=;20 run;

    storage_length=7length=3display=:abc :

  • 8/14/2019 Char Func Handout

    6/42

    Moving the LENGTH Statement

    data chars2;

    string = 'abc';

    length string $ 7;

    storage_length = lengthc(string);

    length = length(string);

    display = ":" || string || ":";

    put storage_length= /

    length= /

    display=;

    run;

  • 8/14/2019 Char Func Handout

    7/42

    SAS Log1 data chars2;

    2 string = 'abc';3 length string $ 7;WARNING: Length of character variable string has already been set.

    Use the LENGTH statement as the very first statement in the DATASTEP to declare the length of a character variable.

    4 storage_length = lengthc(string);

    5 length = length(string);6 display = ":" || string || ":";7 put storage_length= /8 length= /9 display=;10 run;

    storage_length=3length=3display=:abc:

  • 8/14/2019 Char Func Handout

    8/42

  • 8/14/2019 Char Func Handout

    9/42

    The PUT Function

    data special;***PUT is a special function often usedfor numeric to character conversion;input sas_date number ss;c_date = put(sas_date,date9.);

    money = put(number,dollar8.);ss_char = put(ss,ssn.);

    datalines;0 1234 123456789;

    Listing of Data Set SPECIAL

    sas_date number ss c_date money ss_char

    0 1234 123456789 01JAN1960 $1,234 123-45-6789

  • 8/14/2019 Char Func Handout

    10/42

    Converting Multiple Blanks to aSin le Blank

    data multiple;input #1 @1 Name $20.

    #2 @1 Address $30.#3 @1 City $15.

    @20 State $2.@25 Zip $5.;

    name = compbl(name);address = compbl(address);city = compbl(city);

    datalines;Ron Cody89 Lazy Brook Rd.

    Flemington NJ 08822Bill Brown28 Cathy StreetNorth City NY 11518

    Multiple

    Name AddressRon Cody 89 Lazy Brook Rd.

    Bill Brown 28 Cathy Street

    City State ZipFlemington NJ 08822North City NY 11518

  • 8/14/2019 Char Func Handout

    11/42

    How to Remove Charactersfrom a Strin

    data phone;input phone $15.;phone1 = compress(phone);

    phone2 = compress(phone,'(-) ');datalines;(908)235-4490(201) 555-77 99

    ; Phone

    phone phone1 phone2

    (908)235-4490 (908)235-4490 9082354490

    (201) 555-77 99 (201)555-7799 2015557799

  • 8/14/2019 Char Func Handout

    12/42

    Another COMPRESS Example

    data social;

    input ss_char $11.;

    ss = input(compress(ss_char,'-'),9.);

    easy_ss = input(ss_char,comma11.);

    datalines;

    123-45-6789

    ;

    ss = 123456789 (numeric)ss_easy = 123456789 (numeric)

  • 8/14/2019 Char Func Handout

    13/42

    Compress Function (SAS 9 changes)

    COMPRESS (char_value )

    char_value is a SAS character value

    comp_string is a character value containing thecharacters to remove from char_value.

    modifiers add additional characters to the list of

    characters to remove or modify the way the function

    works (see next slide).

  • 8/14/2019 Char Func Handout

    14/42

    Compress Function Modifiers (SAS9

    Selected list of COMPRESS modifiers (upper- orlowercase)

    a adds upper- and lowercase letters

    d adds numerals (digits)

    i ignores case

    k keeps listed characters instead of removingthem

    s adds space (blank, tabs, lf, cr) to the list

    p adds punctuation

  • 8/14/2019 Char Func Handout

    15/42

    Examples

    For these examples, char = "A C123XYZ",

  • 8/14/2019 Char Func Handout

    16/42

    Using the Compress Modifiers

    data phone;input phone $15.;

    number = compress(phone,,'kd');

    datalines;(908)235-4490

    (201) 555-77 99

    ; Listing of Data Set PHONE

    phone number

    (908)235-4490 9082354490(201) 555-77 99 2015557799

  • 8/14/2019 Char Func Handout

    17/42

    The VERIFY Function

    data verify;input @1 id $3.

    @5 answer $5.;

    position = verify(answer,'abcde');datalines;001 acbed002 abxde

    003 12cce004 abc e;

    Verify

    id answer position

    001 acbed 0002 abxde 3003 12cce 1004 abc e 4

  • 8/14/2019 Char Func Handout

    18/42

    Watch Out for Trailing Blanks

    data trailing;

    length string $ 10;

    string = 'abc';

    position = verify(string,'abcde');

    run;

    String = 'abc 'Position = 4 (the position of the first trailing

  • 8/14/2019 Char Func Handout

    19/42

  • 8/14/2019 Char Func Handout

    20/42

    Using VERIFY for Data Cleaning

    data clean;input id $;

    ***Valid ID's contain letters X,Y, or Z

    and digits;

    ifverify(trim(id),'XYZ0123456789') eq 0then valid = 'Yes';

    else valid = 'No';

    datalines;

    12X67YZ67WXYZ

    ;

    Listing of Data Set CLEAN

    id valid

    12X67YZ Yes67WXYZ No

  • 8/14/2019 Char Func Handout

    21/42

    Substring Example

    data pieces_parts;input Id $9.;

    length State $ 2;

    state = substr(Id,3,2);

    Num = input(substr(Id,5),4.);datalines;

    XYNY123

    XYNJ1234;

    Listing of Data Set PIECES_PARTS

    Id State Num

    XYNY123 NY 123XYNJ1234 NJ 1234

  • 8/14/2019 Char Func Handout

    22/42

    The SUBSTR Function on the Left-Hand Side of the Equal Sign

    data pressure;input sbp dbp @@;length sbp_chk dbp_chk $ 4;sbp_chk = put(sbp,3.);

    dbp_chk = put(dbp,3.);if sbp gt 160 thensubstr(sbp_chk,4,1) = '*';

    if dbp gt 90 then substr(dbp_chk,4,1) = '*';

    datalines;120 80 180 92 200 110;

  • 8/14/2019 Char Func Handout

    23/42

    The SUBSTR Function on the Left-Hand Side of the Equal Sign

    Listing of Data Set PRESSURE

    sbp dbp sbp_chk dbp_chk

    120 80 120 80

    180 92 180* 92*

    200 110 200* 110*

  • 8/14/2019 Char Func Handout

    24/42

    Parsing a String

    data take_apart;input @1 Cost $10.;Integer = input(scan(Cost,1,' /'),8.);Num = input(scan(Cost,2,' /'),8.);Den = input(scan(Cost,3,' /'),8.);

    ifmissing(Num) then Amount = Integer;else Amount = Integer + Num/Den;

    datalines;1 3/4

    12 1/2123;

    Listing of Data Set TAKE_APART

    Cost Integer Num Den Amount

    1 3/4 1 3 4 1.7512 1/2 12 1 2 12.50123 123 . . 123.00

  • 8/14/2019 Char Func Handout

    25/42

    Using the SCAN Function toExtract a Last Name

    data first_last;length last_name $ 15;input @1 name $20.

    @22 phone $13.;

    ***extract the last name from name;last_name = scan(name,-1,' ');*** minus value scans from the right;

    datalines;eff W. Snoker (908)782-4382

    Raymond Albert (732)235-4444Alfred Edward Newman (800)123-4321Steven . Foster 201 567-9876

  • 8/14/2019 Char Func Handout

    26/42

    Using the SCAN Function toExtract a Last Name

    Names and Phone Numbers in AlphabeticalOrder (by Last Name)

    Name Phone NumberRaymond Albert (732)235-4444

    Steven J. Foster (201)567-9876

    Alfred Edward Newman (800)123-4321

    Jose Romerez (516)593-2377Jeff W. Snoker (908)782-4382

  • 8/14/2019 Char Func Handout

    27/42

  • 8/14/2019 Char Func Handout

    28/42

  • 8/14/2019 Char Func Handout

    29/42

    Locating One Word in a StringFunction INDEXW

    data _null_;

    string = 'anything goes any where';

    index = index(string,'any');

    indexw = indexw(string,'any');put index= indexw=;

    run;

    index = 1indexw = 15

    Note: You can specify delimiters for indexw in a third argument

  • 8/14/2019 Char Func Handout

    30/42

    Changing CaseData case;

    input name $15.;upper = upcase(name);

    lower = lowcase(name);

    proper = propcase(name);

    Datalines;gEOrge SMITH

    The end

    ;Listing of Data Set CASE

    name upper lower proper

    gEOrge SMITH GEORGE SMITH george smith George Smith The end THE END the end The End

  • 8/14/2019 Char Func Handout

    31/42

    Substituting One Word for Anotherin a Strin

    data convert;

    input @1 address $20. ;

    *** Convert Street, Avenue and

    Boulevard to their abbreviations;

    Address = tranwrd(address,'Street','St.');

    Address = tranwrd(address,'Avenue','Ave.');

    Address = tranwrd(address,'Road','Rd.');

    datalines;

    89 Lazy Brook Road

    123 River Rd.12 Main Street

    ;

    Listing of Data Set CONVERT

    Obs Address

    1 89 Lazy Brook Rd.2 123 River Rd.3 12 Main St.

  • 8/14/2019 Char Func Handout

    32/42

    Spelling distance

    data compare;length string1 string2 $ 15;

    input string1 string2;

    points = spedis(string1,string2);

    datalines;

    same same

    same sam

    first xirst

    last lasx

    receipt reciept

    ;

    Listing of Data Set COMPARE

    string1 string2 points

    same same 0

    same sam 8first xirst 40last lasx 25receipt reciept 7

  • 8/14/2019 Char Func Handout

    33/42

    The "ANY" Functions

    data find_alpha_digit;

    input string $20.;

    first_alpha = anyalpha(string);

    first_digit = anydigit(string);datalines;

    no digits here

    the 3 and 4

    123 456 789

    ;

    Listing of Data Set FIND_ALPHA_DIGIT

    first_ first_string alpha digit

    no digits here 1 0the 3 and 4 1 5123 456 789 0 1

  • 8/14/2019 Char Func Handout

    34/42

    The "NOT" FunctionsBeware of Trailin Blanks

    length string $ 10;

    string = '123';

    position = notdigit(string);

    pos_trim = notdigit(trim(string));

    position = 4 (position of first blank)

    pos_trim = 0

  • 8/14/2019 Char Func Handout

    35/42

    The "NOT" Functions

    data data_cleaning;input string $20.;

    not_alpha = notalpha(trim(string));

    not_digit = notdigit(trim(string));

    datalines;

    abcdefg

    1234567

    abc123

    1234abcd

    ;

    Listing of Data Set DATA_CLEANING

    not_ not_string alpha digit

    abcdefg 0 11234567 1 0abc123 4 11234abcd 1 5

  • 8/14/2019 Char Func Handout

    36/42

    Concatenation Functions

    data join_up;

    length cats $ 6 catx $ 17;

    string1 = 'ABC ';

    string2 = ' XYZ ';

    string3 = '12345';cats = cats(string1,string2);

    catx = catx('***',string1,string2,string3);

    run;

    cats = 'ABCXYZ'catx = 'ABC***XYZ***12345

    Without the length statement, cats and catx would have a length of

  • 8/14/2019 Char Func Handout

    37/42

    Some LENGTH Functions

    data how_long;one = 'ABC ';miss = ' '; /* char missing value */

    length_one = length(one);lengthn_one = lengthn(one);lengthc_one = lengthc(one);length_two = length(miss);

    lengthn_two = lengthn(miss);lengthc_two = lengthc(miss);run;

    33610

    1

  • 8/14/2019 Char Func Handout

    38/42

    The COMPARE Function

    COMPARE(string1, string2 )I ignore case

    L remove leading blanks

    : truncate the longer string to the length of the shorter

    string. The default is to pad the shorter string with blanksbefore a comparison.

    (Note: similar to the =: comparison operator)

    If string1 and string2 are the same, COMPARE returns a value of 0.

    If the arguments differ, the sign of the result is negative if string1 precedesstring2 in a sort sequence, and positive if string1 follows string2 in a sortsequence

    The magnitude of the result is equal to the position of the leftmostcharacter at which the strings differ.

  • 8/14/2019 Char Func Handout

    39/42

  • 8/14/2019 Char Func Handout

    40/42

    The STRIP Functiondata _null_;

    length concat $ 8;file print;one = ' ABC ';two = ' XYZ ';one_two = ':' || one || two || ':';strip = ':' || strip(one) || strip(two) || ':';concat = cats(':',one,two,':');put one_two= /

    strip= /=

    one_two=: ABC XYZ :strip=:ABCXYZ:concat=:ABCXYZ:

  • 8/14/2019 Char Func Handout

    41/42

    COUNT and COUNTC Functions

    data Dracula; /* Get it Count Dracula */input string $20.;count_abc = count(string,'abc');countc_abc = countc(string,'abc');count_abc_i = count(string,'abc','i');

    datalines;xxabcxABCxxbbbbcbacba;

    Listing of Data Set DRACULA

    count_ countc_ count_

    string abc abc abc_i

    xxabcxABCxxbbbb 1 7 2cbacba 0 6 0

  • 8/14/2019 Char Func Handout

    42/42

    Contact Information

    Author: Ron Cody

    You may download copies of the Powerpointpresentation from:

    www2.umdnj.edu/codyweb/biocomputing