SAS User Guide

DEPARTMENT OF MATHEMATICS, UNIVERSITY OF GENOA

Via Dodecaneso, 35 - 16146 GENOVA (Italy) tel. +39-010-3536751 fax +39-010-3536752

INTRODUCTORY NOTES ON THE SAS SYSTEM

Fabio Rapallo - Ivano Repetto - Maria Piera Rogantin


CONTENTS

A. General aspects
   A1. The SAS language
   A2. DATA steps
   A3. PROC steps
   A4. SAS DATA SETs
   A5. Types of variables

B. How to run a SAS program

C. Examples of SAS programs
   C1. First example
   C2. Second example: the PRINT and CONTENTS procedures
   C3. Remarks on writing programs

D. The DATA step
   D1. Creating a SAS DATA SET
   D2. Some examples
   D3. Permanent SAS DATA SETs

E. Manipulating DATA SETs
   E1. Selecting subsets of observations
   E2. Selecting consecutive observations
   E3. Selecting variables
   E4. Renaming variables
   E5. Building several SAS DATA SETs
   E6. Concatenating several SAS DATA SETs
   E7. Reading DATA SETs of a different "type"

F. More on the DATA step
   F1. SAS expressions and functions
   F2. Missing values
   F3. Cumulative sums
   F4. More on the execution of a DATA step
   F5. Arrays
   F6. Control statements
   F7. The INPUT statement
   F8. The INFILE statement
   F9. The OUTPUT statement
   F10. Writing to an external file and the PUT statement

G. The PROC step
   G1. Some options and statements used in a PROC step
   G2. The SORT procedure
   G3. The PRINT procedure
   G4. The MEANS procedure
   G5. The FREQ procedure (and the FORMAT procedure)
   G6. The UNIVARIATE procedure
   G7. Other elementary statistical procedures
   G8. Some procedures that operate on SAS DATA SETs
   G9. Selecting variables and observations in a procedure

H. Graphics statements and procedures
   H1. Some statements for graphical output
   H2. The GCHART procedure
   H3. The GPLOT procedure

I. Errors and reading the LOG

J. Further topics: manipulating SAS data sets
   J1. Overview of Methods for Combining SAS Data Sets
   J2. Manipulating SAS data sets
      J2.1 Concatenating data sets: SET
      J2.2 Concatenating data sets: SET-BY and MERGE-BY
      J2.3 Placing data sets with different variables side by side: SET-SET and MERGE
      J2.4 Updating a data set: UPDATE and MERGE-BY
      J2.5 Adding observations to a data set: PROC APPEND
   J3. Repeated observations: SET-BY and the first.<..> and last.<..> variables

K. Further topics: reading raw data
   K1. List input with formats
   K2. Named input
   K3. Holding the input line: using @
   K4. INFILE options for reading delimited data with list input

L. Further topics: formats for reading and writing data
   L1. The FORMAT statement
   L2. The INFORMAT statement
   L3. The LENGTH statement
   L4. The ATTRIB statement
   L5. PROC FORMAT
      L5.1 The VALUE statement
      L5.2 The INVALUE statement
      L5.3 The PICTURE statement
      L5.4 Some examples of changing formats
      L5.5 Functions for converting character variables to numeric and vice versa
   L6. SAS Date, Time, and Datetime Values
   L7. Some rounding functions
   L8. Some functions on character variables

M. Further topics: SAS macros
   M1. Introduction to macro programming
   M2. SAS Macro Language: Reference

N. Further topics: working with matrices in SAS


A. GENERAL ASPECTS

A1. THE SAS LANGUAGE
SAS is a software system that provides the tools needed to analyze data. It consists of:
- a language for manipulating data;
- a library of ready-made procedures for general use.

There is a BASE SAS module plus several modules for specific applications, for example:
- statistics (STAT)
- quality control (QC)
- operations research (OR)
- time series (TSA)
- matrix manipulation (IML)
- advanced graphics (GRAPH)
- computer resource management
- database management.

SAS can be used to:
- read data
- transform and manipulate data (using mathematical and statistical functions, concatenation and sorting)
- update data
- print reports
- generate graphs
- reduce and summarize data
- perform mathematical analyses on the data

Every program is made up of steps. There are two kinds of steps: DATA steps and PROC steps.

A2. DATA STEPS
DATA steps create SAS DATA SETs from existing files. These files may be non-SAS (raw) files or files that are already in SAS form. The input data can be processed during the DATA step.

[Diagram: raw data or a SAS DATA SET -> DATA step -> raw data or a SAS DATA SET]

A DATA step begins with the DATA statement, followed by the name of the SAS data set to be built.

A3. PROC STEPS
PROC steps produce listings, reports, statistics, and so on. The data they operate on must already be in SAS DATA SET form. A PROC step can also create other SAS DATA SETs, which can then be analyzed by later DATA or PROC steps. A PROC step begins with the PROC statement, followed by the name of the procedure to run.

A program step (DATA or PROC) ends with the statement
   RUN;
or at the start of a new DATA or PROC step.


A4. SAS DATA SETS
A SAS DATA SET is a collection of homogeneous data organized in rectangular form.

[Figure: a rectangular table whose columns are the variables and whose rows are the observations]

The columns are called variables; each one has a name (mnemonic names are always recommended). A name must obey the following syntax rules:
a) it must be 1 to 8 characters long;
b) it may contain digits, but not as its first character;

A SAS DATA SET consists of two distinct parts:
- a descriptor part, which stores all the information SAS needs to reread the data automatically at any time (e.g. names and attributes of the variables, the "history" of how the data set was built, ...);
- a part that stores the data proper.
Different PROCs must be used to display the two parts.

Note: a SAS DATA SET is not a conventional data file; it can be read only with the software that created it. Every SAS file has a name. File names follow the same rules as variable names.

A5. TYPES OF VARIABLES
A variable can be of two types:
- NUMERIC (e.g. age, height, weight)
- CHARACTER (e.g. surname, sex)

A variable is characterized by a set of attributes:
- name
- length (LENGTH)
- input format (INFORMAT)
- output format (FORMAT)
- label (LABEL)
which can be specified with the appropriate statements.

The maximum length of a variable name (unless declared otherwise) is 8 characters. The values a variable takes depend on the processing being carried out. In some situations no value can be associated with a variable (either while the data are being read or because of operations on "invalid" data); SAS then assigns the variable a special value, called a "missing value".

SAS automatically converts a character variable to numeric when:
- a character variable is assigned to a previously defined numeric variable;
- a character variable is compared with a numeric one;
- arithmetic operations are performed on character variables (only if they consist of digits).

SAS automatically converts a numeric variable to character when:
- a numeric variable is assigned to a previously defined character variable;
- a function that expects a character argument is applied to a numeric variable;
- a numeric variable is used as an operand of an operator that is specific to character variables (for example the string concatenation operator).
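The automatic conversions described above can be illustrated with a minimal sketch. The data set CONV and its variables are hypothetical names chosen for this illustration; they do not appear elsewhere in these notes. Each conversion is reported by a note in the Log window.

```sas
data conv;
   c = '12';                     /* character variable made of digits */
   n = 5;                        /* numeric variable */
   somma = c + n;                /* arithmetic on c: c is converted to numeric */
   testo = n || 'a';             /* concatenation: n is converted to character */
   if c = 12 then uguali = 1;    /* comparison: c is converted to numeric */
run;
proc print data=conv; run;
```

Here somma is 17, while testo holds the value of n written as text (right-aligned, so with leading blanks) followed by the letter a.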


B. HOW TO RUN A PROGRAM WITH SAS FOR WINDOWS

When SAS is started from Windows, a screen appears that usually contains two windows: Log and Program Editor. A third window, Output, opens when the program produces non-graphical output.

Some remarks on writing and running programs:
- the program text is written in the Program Editor window
- to save the program to disk: from the Program Editor window, select from the File menu:
     Save or Save as
  After the first save, Save stores the program under the name of the last program opened (take care). The name of a saved program appears in the title bar of the Program Editor window.
- to open a program previously saved in a file: from the Program Editor window, select from the File menu:
     Open --> Read File
- to run a program: from the Program Editor window, select from the Local menu:
     Submit (equivalent to function key F8)
- after each run the program text disappears from the Program Editor window, but it can be recalled by selecting from the Local menu of the Program Editor window:
     Recall (equivalent to function key F4)
  Recall brings back the last program run (repeating the operation twice brings back the last two programs run, and so on)
- while the program runs, the Log window reports what the program is doing: for example, execution times of the procedures, any errors, the number of observations read into the data set, and so on
- any non-graphical output is written to the Output window
- the windows have a limited size, so their whole content is not always visible; scroll within a window with the usual operations of Windows applications
- to move from one window to another, use the Window menu, the mouse, or the function keys:
     F5 for the Program Editor window
     F6 for the Log window
     F7 for the Output window
- the text-editing commands are in the Edit menu
- to clear all lines of text from any window, select from the Edit menu:
     Clear text (equivalent to Ctrl+E)
- to keep the contents of the Log and Output windows in a permanent file, use the Save command as for saving a program
- to see what the function keys do, select from the Help menu:
     Keys
  which opens a Keys window with the required information
- in the Help menu, selecting
     SAS System
  gives the syntax and explanations for the various procedures and for the SAS commands


C. EXAMPLES OF SAS PROGRAMS

C1. FIRST EXAMPLE
This program:
- builds a SAS data set named CLASSE by reading data entered in the program
- sorts the data by one variable
- prints the contents of the data set that was built
- computes some statistics

SAS PROGRAM n. 1:

   DATA CLASSE;
      /* variable names are separated by blanks; */
      /* the first two variables are character   */
      INPUT NOME $ A_CORSO $
            A_NASCIT ES_DATI MEDIA;
      /* the data are entered in the program (data on file are covered later); */
      /* each line corresponds to one observation */
      DATALINES;
   XXX 1F 1965 12 95
   ZZZ 4R 1966 13 100
   WWW 4 1968 12 107
   TTT 3 1967 9 100
   ;
   PROC SORT data=classe;       /* works on the data set CLASSE (here also the most recent one) */
      BY ES_DATI;               /* sorts the observations by the variable ES_DATI */
   run;
   PROC PRINT data=classe;      /* prints the variables of the data set under the given title */
      TITLE 'STUDENTI ORDINATI PER NUMERO ESAMI DATI';
   run;                         /* triggers execution of the PROC step */
   PROC CONTENTS data=classe;   /* prints the information about the data set, with the title assigned earlier */
   PROC MEANS data=classe;      /* computes some statistics on all numeric variables, with the earlier title */
   RUN;                         /* triggers execution of the PROC step */
   PROC MEANS data=classe;      /* computes some statistics on the variables listed after VAR */
      var a_nascit;
   RUN;                         /* triggers execution of the PROC step */

SAS OUTPUT of the PRINT procedure:

   STUDENTI ORDINATI PER NUMERO ESAMI DATI

   OBS   NOME   A_CORSO   A_NASCIT   ES_DATI   MEDIA
    1    TTT    3           1967         9      100
    2    XXX    1F          1965        12       95
    3    WWW    4           1968        12      107
    4    ZZZ    4R          1966        13      100


C2. SECOND EXAMPLE: THE PRINT AND CONTENTS PROCEDURES
This program:
- builds a SAS data set named ES1 by reading data entered in the program
- builds new variables from the original ones
- prints the contents of the data set that was built (both the data and the description)

SAS PROGRAM n. 2:

   data es1;
      input sesso $ eta hinch wlib;
      altezza=hinch*2.54;
      peso=wlib*0.4536;
      datalines;
   f 14 56.3 85.0
   f 15 62.3 105.0
   f 15 63.3 108.0
   f 16 59.0 92.0
   f 19 62.5 112.5
   f 17 62.5 112.0
   f 18 59.0 104.0
   f 14 56.5 69.0
   f 16 62.0 94.5
   f 14 53.8 68.5
   f 13 61.5 104.0
   f 17 61.5 103.5
   f 15 64.5 123.5
   f 14 58.3 93.0
   f 14 51.3 50.5
   f 14 58.8 89.0
   f 19 65.3 107.0
   f 15 59.5 78.5
   f 14 61.3 115.0
   f 18 63.3 114.0
   f 14 61.8 85.0
   ... (part of the input is not shown) ...
   m 16 56.8 75.0
   m 15 64.8 128.0
   m 19 64.5 98.0
   m 16 58.0 84.0
   m 15 62.8 99.0
   m 17 63.8 112.0
   m 15 57.8 79.5
   m 15 57.3 80.5
   m 17 63.5 102.5
   m 14 55.0 76.0
   m 16 66.5 112.0
   m 18 65.0 114.0
   m 16 61.5 140.0
   m 16 62.0 107.5
   ;
   run;
   proc print data=es1;
      title ' ';
   run;                        /* the first two RUN statements are not required */
   proc contents data=es1;
   run;


SAS OUTPUT:
The output of PROC PRINT is the following:

   OBS   SESSO   ETA   HINCH    WLIB   ALTEZZA      PESO
     1     f      14    56.3    85.0   143.002   38.5560
     2     f      15    62.3   105.0   158.242   47.6280
     3     f      15    63.3   108.0   160.782   48.9888
     4     f      16    59.0    92.0   149.860   41.7312
     5     f      19    62.5   112.5   158.750   51.0300
     6     f      17    62.5   112.0   158.750   50.8032
     7     f      18    59.0   104.0   149.860   47.1744
     8     f      14    56.5    69.0   143.510   31.2984
     9     f      16    62.0    94.5   157.480   42.8652
   ... (part of the output is not shown) ...
   230     m      15    57.3    80.5   145.542   36.5148
   231     m      17    63.5   102.5   161.290   46.4940
   232     m      14    55.0    76.0   139.700   34.4736
   233     m      16    66.5   112.0   168.910   50.8032
   234     m      18    65.0   114.0   165.100   51.7104
   235     m      16    61.5   140.0   156.210   63.5040
   236     m      16    62.0   107.5   157.480   48.7620

The output of PROC CONTENTS is the following:

   CONTENTS PROCEDURE

   Data Set Name: WORK.ES1                          Observations:          236
   Member Type:   DATA                              Variables:             6
   Engine:        V611                              Indexes:               0
   Created:       10:55 Friday, December 4, 1998    Observation Length:    48
   Last Modified: 10:55 Friday, December 4, 1998    Deleted Observations:  0
   Protection:                                      Compressed:            NO
   Data Set Type:                                   Sorted:                NO
   Label:

   -----Engine/Host Dependent Information-----
   Data Set Page Size:         8192
   Number of Data Set Pages:   2
   File Format:                607
   First Data Page:            1
   Max Obs per Page:           169
   Obs in First Data Page:     147

   -----Alphabetic List of Variables and Attributes-----
   #    Variable    Type    Len    Pos
   ------------------------------------
   5    ALTEZZA     Num       8     32
   2    ETA         Num       8      8
   3    HINCH       Num       8     16
   6    PESO        Num       8     40
   1    SESSO       Char      8      0
   4    WLIB        Num       8     24


C3. REMARKS ON WRITING PROGRAMS
- statements end with the character " ; "
- all columns of a line can be used
- several statements can be written on one line (separated, of course, by ;)
- one statement can be written across several lines
- a program can contain several RUN statements
- comments are enclosed between /* and */ (e.g. PROC SORT; /* sort the data */ )

ABBREVIATIONS FOR VARIABLE LISTS
a) X1-Xn           all variables from X1 to Xn (X1 X2 X3 ... Xn)

b) X--A            all variables from X to A, in the order in which they are stored in the data set
   X-NUMERIC-A     all numeric variables from X to A
   X-CHARACTER-A   all character variables from X to A

c) _NUMERIC_       all numeric variables
   _CHARACTER_     all character variables
   _ALL_           all variables

EXAMPLE 1: the INPUT statement of example n. 1 can be written:
   INPUT NOME $ A_CORSO $ VAR1-VAR3;

after PROC PRINT one could add the statement:
   VAR NOME--VAR3;

which would be equivalent to:
   VAR NOME A_CORSO VAR1 VAR2 VAR3;

(this statement restricts the PROC to the variables listed)

EXAMPLE 2:
   data uno;
      input x1 x2 y x3 x5;
      datalines;
   1 2 3 4 5
   6 7 8 9 0
   ;

   proc print; var x1--x3; run;
   SAS OUTPUT:
   Obs   x1   x2   y   x3
     1    1    2   3    4
     2    6    7   8    9

   proc print; var x1-x3; run;
   SAS OUTPUT:
   Obs   x1   x2   x3
     1    1    2    4
     2    6    7    9

   proc print; var x1--x5; run;
   SAS OUTPUT:
   Obs   x1   x2   y   x3   x5
     1    1    2   3    4    5
     2    6    7   8    9    0

   proc print; var x1-x5; run;
   SAS LOG:
   ERROR: Variable X4 in suffix list not in data set.

D. THE DATA STEP


D1. CREATING A SAS DATA SET

DATA IN AN EXTERNAL FILE:

   DATA data-set-name;
      INFILE '[path] name';     /* opens the file for reading */
      INPUT .........;          /* describes the input: names the variables, optionally with a read format */
      other statements used in the DATA step;

DATA ENTERED IN THE PROGRAM:

   DATA data-set-name;
      INPUT .........;
      other statements;
      DATALINES;                /* immediately before the data */
   lines of data
   ;                            /* marks the end of the data */

DATA FROM ANOTHER DATA SET:

   DATA name-of-the-new-data-set;
      SET name-of-the-data-set-to-read;
      other statements;

NOTE: besides the SET statement, the MERGE and UPDATE statements can also be used to read from existing data sets, with a similar result.


D2. SOME EXAMPLES

DATA ENTERED IN THE PROGRAM:
SAS PROGRAM n. 2:

   data es1;
      input sesso $ eta hinch wlib;
      altezza=hinch*2.54;
      peso=wlib*0.4536;
      datalines;
   f 143 56.3 85.0
   f 155 62.3 105.0
   f 153 63.3 108.0
   f 161 59.0 92.0
   f 191 62.5 112.5
   f 171 62.5 112.0
   f 185 59.0 104.0
   f 142 56.5 69.0
   f 160 62.0 94.5
   f 140 53.8 68.5
   f 139 61.5 104.0
   f 178 61.5 103.5
   m 153 57.8 79.5
   m 155 57.3 80.5
   m 178 63.5 102.5
   m 142 55.0 76.0
   m 164 66.5 112.0
   m 189 65.0 114.0
   m 164 61.5 140.0
   m 167 62.0 107.5
   ;
   run;

DATA IN AN EXTERNAL FILE:
SAS PROGRAM n. 3:

   data es2;
      infile 'a:es1.txt';
      input sesso $ eta hinch wlib;
      altezza=hinch*2.54;
      peso=wlib*0.4536;
   run;

DATA FROM ANOTHER DATA SET:
SAS PROGRAM n. 4:

   data es3;
      set es2;
      if eta < 16 then cl_eta = 'giovane';
      else cl_eta='vecchio';
   run;
   proc print data=es3;
      var eta cl_eta;
   run;


D3. PERMANENT SAS DATA SETS

HOW TO MAKE A SAS DATA SET PERMANENT

a) create a "library" with the statement:
      LIBNAME library-reference 'path';
   which names the directory where the permanent data sets are to be written

b) when building the data set, write:
      DATA library-reference.data-set-name;
      INPUT ......;
      .............
   The file that is created has the extension SD2.

The data sets are stored under the name data-set-name.SD2 in the path given by the LIBNAME statement.

Example. SAS PROGRAM n. 5:

   libname corso 'a:\corsosas';
   data corso.es3;
      set es2;
      if eta < 16 then cl_eta = 'giovane';
      else cl_eta='vecchio';
   run;
   proc print data=corso.es3;
      var eta cl_eta;
   run;

The permanent data sets are stored in the directory a:\corsosas. The LIBNAME statement applies to all data sets built during the session.

The data set built here is stored in the file a:\corsosas\es3.sd2, which has the structure of a SAS data set.

HOW TO ACCESS A PERMANENT DATA SET

   DATA name-of-the-new-data-set;
      SET library-reference.data-set-name;
      ..........

For example:

   data corso.nuovo;
      set corso.es3;


E. MANIPULATING DATA SETS

Consider the following example.
SAS PROGRAM n. 6:

   libname corso 'a:\corsosas';
   data corso.disney;
      length nome $ 12;     /* must come before INPUT to take effect */
      /* & lets NOME contain single blanks: the value ends at two or more blanks */
      input nome $ & sesso $ eta altezza peso;
      datalines;
   pippo         m 32 190 54
   paperino      m 34 150 50
   minnie        f 35 145 40
   clarabella    f 30 180 65
   nonna papera  f 99 140 55
   qui           m 8 120 30
   quo           m 8 120 30
   qua           m 8 120 30
   emy           f 8 117 25
   ely           f 8 117 25
   edy           f 8 117 25
   ;
   proc print data=corso.disney;
   run;

SAS OUTPUT:

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   pippo            m      32     190      54
     2   paperino         m      34     150      50
     3   minnie           f      35     145      40
     4   clarabella       f      30     180      65
     5   nonna papera     f      99     140      55
     6   qui              m       8     120      30
     7   quo              m       8     120      30
     8   qua              m       8     120      30
     9   emy              f       8     117      25
    10   ely              f       8     117      25
    11   edy              f       8     117      25

E1. SELECTING SUBSETS OF OBSERVATIONS
Subsets of the observations in a data set can be selected in several ways. Some of them are shown here, continuing the previous example.
SAS PROGRAM n. 7:

   data maschi;              /* the data set built here is temporary */
      set corso.disney;      /* data set from which a subset is to be selected */
      if sesso='m';          /* states the selection criterion */
   proc print data=maschi;
   run;

The statement
   if sesso='m';
can be replaced by either of the equivalent statements
   if sesso ^='m' then delete;
   if sesso ='m' then output;


SAS OUTPUT:

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   pippo            m      32     190      54
     2   paperino         m      34     150      50
     3   qui              m       8     120      30
     4   quo              m       8     120      30
     5   qua              m       8     120      30

E2. SELECTING CONSECUTIVE OBSERVATIONS
Several data set options can be used in the SET statement; some of them are shown here (again on the starting example).
SAS PROGRAM n. 8:

   data prime3;
      set corso.disney(obs=3);
   proc print data=prime3;
   data dalla3;
      set corso.disney(firstobs=3);
   proc print data=dalla3;
   data centrali;
      set corso.disney(firstobs=3 obs=5);
   proc print data=centrali;
   run;

SAS OUTPUT:

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   pippo            m      32     190      54
     2   paperino         m      34     150      50
     3   minnie           f      35     145      40

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   minnie           f      35     145      40
     2   clarabella       f      30     180      65
     3   nonna papera     f      99     140      55
     4   qui              m       8     120      30
     5   quo              m       8     120      30
     6   qua              m       8     120      30
     7   emy              f       8     117      25
     8   ely              f       8     117      25
     9   edy              f       8     117      25

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   minnie           f      35     145      40
     2   clarabella       f      30     180      65
     3   nonna papera     f      99     140      55


SAS PROGRAM n. 8 bis:

   data es2bis;
      infile 'a:es1.txt' firstobs=3;
      input sesso $ eta hinch wlib;
   proc print data=es2bis;

SAS OUTPUT:

   OBS   SESSO   ETA   HINCH    WLIB
     1     f     153    63.3   108.0
     2     f     161    59.0    92.0
     3     f     191    62.5   112.5
     4     f     171    62.5   112.0
     5     f     185    59.0   104.0
     6     f     142    56.5    69.0
     7     f     160    62.0    94.5
     8     f     140    53.8    68.5
     9     f     139    61.5   104.0
    10     f     178    61.5   103.5
    11     m     153    57.8    79.5
    12     m     155    57.3    80.5
    13     m     178    63.5   102.5
    14     m     142    55.0    76.0
   ... (part of the output is omitted) ...

E3. SELECTING VARIABLES
Variables are selected with the DROP and KEEP statements. These statements are complementary and specify:
- which variables of the old data set are not to be copied into the new one (DROP statement);
- which variables of the old data set are to be copied into the new one (KEEP statement).
DROP and KEEP are non-executable statements, so they can appear anywhere in a DATA step.
Example. SAS PROGRAM n. 9:

   data etasesso;
      set corso.disney;
      drop altezza peso;        /* or, equivalently: keep nome eta sesso; */
      ..............
   run;

In this case the new data set no longer contains the variables ALTEZZA and PESO, but these variables can still be used in the statements of the step and to compute new variables (e.g. rapporto=altezza/peso).

DROP and KEEP can also appear as options of an input SAS data set, as follows.
SAS PROGRAM n. 10:

   data etasesso;
      set corso.disney(drop = altezza peso);
      ..............
   run;

In this case the variables ALTEZZA and PESO are not read at all, so their values cannot be used in any way while the new data set is built.
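The difference between the DROP statement and the DROP= option on the input data set can be seen in a minimal sketch like the following. The data set names RAPPORTI and RAPPORTI2 are hypothetical and used only for this illustration.

```sas
/* DROP statement: ALTEZZA and PESO are read, so they can be used in the step */
data rapporti;
   set corso.disney;
   rapporto = altezza / peso;     /* computed from the values that were read */
   drop altezza peso;             /* only excluded from the output data set */
run;

/* DROP= option on the input data set: ALTEZZA and PESO are never read */
data rapporti2;
   set corso.disney(drop=altezza peso);
   rapporto = altezza / peso;     /* altezza and peso are new, uninitialized
                                     variables here, so rapporto is missing */
run;
```

In the second step the Log reports that altezza and peso are uninitialized, and rapporto is missing in every observation.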


E4. RENAMING VARIABLES
It is enough to use RENAME, here written as a data set option on the output data set.
SAS PROGRAM n. 10 bis:

   data nuovo(rename=(sesso=mf));
      set corso.disney;
      /* other statements */
   run;
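As a complement, here is a sketch of two other places where the renaming can be written: the RENAME statement and the RENAME= option on the input data set. The data set names NUOVO2 and NUOVO3 are hypothetical; what changes between the two forms is the name under which the variable is known inside the step.

```sas
/* RENAME statement: inside the step the variable is still called SESSO */
data nuovo2;
   set corso.disney;
   rename sesso=mf;
   if sesso='m';          /* program statements use the old name */
run;

/* RENAME= on the input data set: the variable is already called MF when read */
data nuovo3;
   set corso.disney(rename=(sesso=mf));
   if mf='m';             /* program statements use the new name */
run;
```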

E5. BUILDING SEVERAL DATA SETS
The IF and SELECT statements are used; they allow conditional choices to be made.
Example. SAS PROGRAM n. 11:

   data corso.maschi corso.femmine;
      set corso.disney;
      if sesso='m' then output corso.maschi;
      else if sesso='f' then output corso.femmine;
      else put 'osservazioni sbagliate' _all_;
   proc print data=corso.maschi;
   proc print data=corso.femmine;
   run;

or:

   data corso.maschi corso.femmine;
      set corso.disney;
      select(sesso);
         when('m') output corso.maschi;
         when('f') output corso.femmine;
         otherwise put 'osservazioni sbagliate' _all_;
      end;
   proc print data=corso.maschi;
   proc print data=corso.femmine;
   run;

SAS OUTPUT (in both cases):

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   pippo            m      32     190      54
     2   paperino         m      34     150      50
     3   qui              m       8     120      30
     4   quo              m       8     120      30
     5   qua              m       8     120      30

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   minnie           f      35     145      40
     2   clarabella       f      30     180      65
     3   nonna papera     f      99     140      55
     4   emy              f       8     117      25
     5   ely              f       8     117      25
     6   edy              f       8     117      25


If the variable SESSO contained a value other than 'm' or 'f', for example 'M', the Log window would show the message specified in the PUT statement.
SAS PROGRAM n. 11 bis:

   data errore;
      set corso.disney;
      if nome='paperino' then sesso='M';   /* after SET, so that the change is not overwritten */
   data corso.maschi corso.femmine;
      set errore;
      if sesso='m' then output corso.maschi;
      else if sesso='f' then output corso.femmine;
      else put 'osservazioni sbagliate ' _all_;
   run;

SAS LOG:
   osservazioni sbagliate nome=paperino sesso=M eta=34 altezza=150 peso=50 _ERROR_=0 _N_=2

E6. CONCATENATING SEVERAL DATA SETS
The SET statement is used once again, as in the following example, where the data sets have the same variables.
SAS PROGRAM n. 12:

   data corso.tutti;
      set corso.maschi corso.femmine;
   proc print data=corso.tutti;
   run;

SAS OUTPUT of PROC PRINT:

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   pippo            m      32     190      54
     2   paperino         m      34     150      50
     3   qui              m       8     120      30
     4   quo              m       8     120      30
     5   qua              m       8     120      30
     6   minnie           f      35     145      40
     7   clarabella       f      30     180      65
     8   nonna papera     f      99     140      55
     9   emy              f       8     117      25
    10   ely              f       8     117      25
    11   edy              f       8     117      25

Using the SET statement as follows would produce a different output.
SAS PROGRAM n. 13:

   data corso.tutti2;
      set corso.maschi;
      set corso.femmine;
   proc print data=corso.tutti2;
   run;

SAS OUTPUT:

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   minnie           f      35     145      40
     2   clarabella       f      30     180      65
     3   nonna papera     f      99     140      55
     4   emy              f       8     117      25
     5   ely              f       8     117      25


The data set TUTTI2 has as many observations as the smaller of MASCHI and FEMMINE; moreover, since here the two data sets have the same variables, the values of the second overwrite those of the first.

Two SET statements are useful when the data sets have different variables (measured on the same population), as in the following example. In this case the two data sets are placed "side by side".
SAS PROGRAM n. 14:

   data corso.maschi1;
      set corso.maschi;
      keep nome sesso eta;

   data corso.maschi2;
      set corso.maschi;
      keep nome altezza peso;

   data corso.maschi3;
      set corso.maschi1;
      set corso.maschi2;
   proc print;
   run;

SAS OUTPUT:

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   pippo            m      32     190      54
     2   paperino         m      34     150      50
     3   qui              m       8     120      30
     4   quo              m       8     120      30
     5   qua              m       8     120      30

The variable NOME holds the values taken from the second data set.

A data set similar to the previous one can be obtained with the MERGE statement, used as follows. With MERGE, if the data sets have different numbers of observations, the new data set contains all the variables of the input data sets, and missing values are assigned where observations are lacking.

SAS PROGRAM n. 15:

   proc sort data=corso.maschi1 out=corso.maschi1s;
      by nome;
   proc sort data=corso.maschi2 out=corso.maschi2s;
      by nome;
   data corso.maschi3s;
      merge corso.maschi1s corso.maschi2s;
      by nome;
   proc print data=corso.maschi3s;
   run;

SAS OUTPUT:

   OBS   NOME           SESSO   ETA   ALTEZZA   PESO
     1   paperino         m      34     150      50
     2   pippo            m      32     190      54
     3   qua              m       8     120      30
     4   qui              m       8     120      30
     5   quo              m       8     120      30

ANOTHER EXAMPLE:


   data uno;
      input n $ x y;
      datalines;
   a 12 13
   b 14 15
   d 16 17
   ;

   data due;
      input n $ x z;
      datalines;
   a 22 23
   b 24 25
   c 26 27
   ;

   data tre;
      set uno;
      set due;
   proc print;
   run;

SAS OUTPUT:

   Obs   n    x    y    z
     1   a   22   13   23
     2   b   24   15   25
     3   c   26   17   27     <-- NOTE THIS OBSERVATION: n, x and z come from DUE, y from UNO

   data quattro;
      merge uno due;
      by n;
   proc print;
   run;

SAS OUTPUT:

   Obs   n    x    y    z
     1   a   22   13   23
     2   b   24   15   25
     3   c   26    .   27
     4   d   16   17    .

E7. READING DATA FROM DATA SETS OF A DIFFERENT "TYPE"
Consider the following example.
SAS PROGRAM n. 16:

   proc means data=corso.disney;                 /* reads from the data set corso.disney */
      var altezza peso;                          /* works only on the variables altezza and peso */
      output out=sommario mean=m_alt m_peso;     /* builds the data set SOMMARIO and names the two means */
   proc print data=sommario;
   run;

SAS OUTPUT of PROC PRINT:

   OBS   _TYPE_   _FREQ_     M_ALT   M_PESO
     1      0       11     137.818     39

To build a data set with the deviations from the means, proceed as follows.
SAS PROGRAM n. 17:

   data corso.diney1;
      if _n_=1 then set sommario;
      set corso.disney;
      alt_c=altezza-m_alt;
      peso_c=peso-m_peso;
      drop _type_ _freq_;
   proc print;
   run;

This builds a data set containing the previous variables, the two means m_alt and m_peso, and the two new variables alt_c and peso_c.
SAS OUTPUT:


   OBS     M_ALT   M_PESO   NOME           SESSO   ETA   ALTEZZA   PESO      ALT_C   PESO_C
     1   137.818     39     pippo            m      32     190      54     52.1818      15
     2   137.818     39     paperino         m      34     150      50     12.1818      11
     3   137.818     39     minnie           f      35     145      40      7.1818       1
     4   137.818     39     clarabella       f      30     180      65     42.1818      26
     5   137.818     39     nonna papera     f      99     140      55      2.1818      16
     6   137.818     39     qui              m       8     120      30    -17.8182      -9
     7   137.818     39     quo              m       8     120      30    -17.8182      -9
     8   137.818     39     qua              m       8     120      30    -17.8182      -9
     9   137.818     39     emy              f       8     117      25    -20.8182     -14
    10   137.818     39     ely              f       8     117      25    -20.8182     -14
    11   137.818     39     edy              f       8     117      25    -20.8182     -14

Without the condition if _n_=1 then ..., the step would stop as soon as SOMMARIO, the first data set named in a SET statement, is exhausted: the resulting data set would have all the variables but only as many observations as SOMMARIO, that is, one.
SAS OUTPUT:

   OBS     M_ALT   M_PESO   NOME           SESSO   ETA   ALTEZZA   PESO      ALT_C   PESO_C
     1   137.818     39     pippo            m      32     190      54     52.1818      15


F. MORE ON THE DATA STEP

F1. SAS EXPRESSIONS AND FUNCTIONS

SAS EXPRESSIONS
These are the usual ones: constants, dates, operators on both character and numeric variables, and so on.

SAS FUNCTIONS
As in other programming languages, SAS functions are prewritten routines that are invoked by a keyword and return a value computed from the arguments passed to them. A function call takes one of the following forms:
   FUNCTION-NAME (arg1, arg2, ..., argn);
   FUNCTION-NAME (OF var1-varn);
   FUNCTION-NAME (OF var1 var2 var3 ... varn);
(the first form is the most common)

The arguments of a function can be:
- numeric constants or variables
- character constants or variables
- expressions, including those that contain other functions

SAS functions fall into the following classes:
- arithmetic functions: for example ABS, MIN, MAX, DIM (the size of an array), HBOUND, LBOUND (the bounds of an array), etc.
- truncation functions
- mathematical functions: for example EXP, LOG, GAMMA (complete Gamma function), LGAMMA (natural logarithm of the Gamma function)
- trigonometric functions
- probability functions (with the appropriate parameters):

     cumulative probability    quantile
     POISSON
     PROBBETA                  BETAINV
     PROBBNML
     PROBCHI                   CINV
     PROBF                     FINV
     PROBGAM                   GAMINV
     PROBNORM                  PROBIT
     PROBT                     TINV

- statistical functions: for example MIN (minimum), MAX (maximum), MEAN (mean), N (number of nonmissing values), NMISS (number of missing values), RANGE (range), STD (standard deviation), SUM (sum), VAR (variance), USS (sum of squares of the data), CSS (sum of squares of the data centered on the mean)
- functions that generate random numbers: for example NORMAL (generates a normal variate), RANBIN (generates an observation from a binomial), RANEXP (generates an observation from an exponential with parameter 1), RANGAM, etc.
- functions for string handling
- functions for handling dates and times
- system functions
- special functions
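A short sketch of the three calling forms and of a pair of probability functions may help. The data set FUNZIONI and its variables are hypothetical names used only for this illustration.

```sas
data funzioni;
   x1 = 4; x2 = 9; x3 = 2;
   m1 = max(x1, x2, x3);      /* explicit list of arguments */
   m2 = max(of x1-x3);        /* OF with a numbered range of variables */
   m3 = max(of x1 x2 x3);     /* OF with a plain list of variables */
   p  = probnorm(1.96);       /* P(Z <= 1.96) for a standard normal, about 0.975 */
   q  = probit(0.975);        /* the corresponding quantile, about 1.96 */
run;
proc print data=funzioni; run;
```

The three calls to MAX return the same value, 9; PROBNORM and PROBIT are inverses of each other, like every pair on the same row of the table of probability functions.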


DIFFERENCE BETWEEN FUNCTIONS AND PROCEDURES
Functions compute statistics for each observation (row) of the SAS data set, so they return as many results as there are observations. Procedures compute statistics for the variables (columns) of the SAS data set.
SAS PROGRAM n. 17 bis:

   data temperature;
      input citta $ t6 t12 t18;
      media_temp=mean(t6,t12,t18);
      media_temp=mean(of t6 t12 t18);    /* equivalent statement */
      media_temp=mean(of t6--t18);       /* equivalent statement */
      datalines;
   Genova 19 24 22
   Milano 15 18 18
   Napoli 26 30 29
   ;
   run;

SAS OUTPUT

   proc print; run;

   Obs   citta    t6   t12   t18   media_temp
     1   Genova   19    24    22      21.6667
     2   Milano   15    18    18      17.0000
     3   Napoli   26    30    29      28.3333

   proc means; run;

   The MEANS Procedure
   Variable      N          Mean       Std Dev       Minimum       Maximum
   ------------------------------------------------------------------------
   t6            3    20.0000000     5.5677644    15.0000000    26.0000000
   t12           3    24.0000000     6.0000000    18.0000000    30.0000000
   t18           3    23.0000000     5.5677644    18.0000000    29.0000000
   media_temp    3    22.3333333     5.6960025    17.0000000    28.3333333
   ------------------------------------------------------------------------

F2. MISSING VALUES
The value of a variable is set to missing if the input field is blank or contains a period (unless a format specifies otherwise). Missing values propagate through arithmetic expressions; functions behave differently. In comparisons, missing values are treated as "minus infinity", that is, as smaller than any number.
Example. SAS PROGRAM n. 18:

   data es5;
      input dato1 dato2;
      somma=dato1+dato2;
      totale=sum(dato1,dato2);
      media1=(dato1+dato2)/2;
      media2=mean(dato1,dato2);
      datalines;
   1 3
   6 4
   . 78
   8 1
   12 14
   ;
   proc print; run;


SAS OUTPUT:
    Obs    dato1    dato2    somma    totale    media1    media2
      1       1        3        4        4        2.0       2.0
      2       6        4       10       10        5.0       5.0
      3       .       78        .       78         .       78.0
      4       8        1        9        9        4.5       4.5
      5      12       14       26       26       13.0      13.0

Note that:
    . + 78          gives  .
    sum( . , 78)    gives  78
In general, functions "ignore" missing values: SUM adds only the nonmissing arguments (so a missing value behaves like 0), MEAN divides the sum of the nonmissing values by the number of nonmissing values, and so on.

F3. CUMULATIVE SUMS

The RETAIN statement
RETAIN is a non-executable statement, so it can be placed anywhere in the DATA step. It does two things:
- it keeps the values of the variables from the previous iteration of the DATA step;
- it assigns initial values to the variables.
The syntax of the RETAIN statement is:

    RETAIN variables initial-values;

Example. SAS PROGRAM no. 19:
    DATA ADD;
      RETAIN TOTALE 0;
      INPUT PUNTEGGI;
      TOTALE=TOTALE+PUNTEGGI;
      DATALINES;
    10
    3
    7
    5
    ;
    PROC PRINT; RUN;

SAS OUTPUT:
    OBS    TOTALE    PUNTEGGI
      1      10         10
      2      13          3
      3      20          7
      4      25          5

SAS PROGRAM no. 19 bis:
    DATA ADD2;
      SET ADD;
      RETAIN CONTO 0;
      CONTO=CONTO+3;
    PROC PRINT; RUN;

SAS OUTPUT:
    OBS    CONTO    TOTALE    PUNTEGGI
      1       3       10         10
      2       6       13          3
      3       9       20          7
      4      12       25          5


The sum statement

Cumulative sums, such as the one computed for the variable CONTO in the previous example, can be written more concisely as:

    CONTO + 3;

which corresponds to the statements

    RETAIN CONTO 0;
    CONTO=SUM(CONTO, 3);

The general syntax is:

    variable + expression;

Some examples of possible expressions:

    bilancia + (- debito);
    somma2 + x*x;
    nx + (x ne .);

The last statement counts the nonmissing values: the logical expression (x ne .) is 1 when it is true and 0 otherwise.

RETAIN and missing values

SAS PROGRAM no. 20 (the fourth data line is a missing value):
    DATA ADD;
      RETAIN CONTO TOTALE 0;
      INPUT PUNTEGGI;
      TOTALE=TOTALE+PUNTEGGI;
      CONTO=CONTO+3;
      DATALINES;
    10
    3
    7
    .
    6
    4
    ;
    PROC PRINT; RUN;

SAS OUTPUT:
    OBS    CONTO    TOTALE    PUNTEGGI
      1       3       10         10
      2       6       13          3
      3       9       20          7
      4      12        .          .
      5      15        .          6
      6      18        .          4

To obtain a value of TOTALE for observations 4, 5 and 6 as well, one should write:
    TOTALE = SUM(TOTALE, PUNTEGGI);

If the RETAIN statement were omitted, the variables TOTALE and CONTO would contain only missing values, because both would be reset to missing at the start of every iteration.
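As a complement to Program no. 20, the following minimal sketch accumulates the same data with sum statements, which retain their variable automatically and skip missing values. The data set name ADD3 and the counter NVALIDI are hypothetical names introduced only for this illustration.

```sas
/* Sketch: same data as Program no. 20, accumulated with sum statements. */
data add3;
  input punteggi;
  totale + punteggi;            /* acts like RETAIN totale 0; totale=sum(totale,punteggi); */
  conto + 3;                    /* grows by 3 at every iteration                           */
  nvalidi + (punteggi ne .);    /* counts the nonmissing scores read so far                */
  datalines;
10
3
7
.
6
4
;
proc print; run;
```

With these statements TOTALE should take the values 10, 13, 20, 20, 26, 30: the missing score in the fourth record leaves the running total unchanged instead of turning it into a missing value.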


F4. MORE ON THE DATA STEP (taken from the SAS online Help)

Flow of Action

When you submit a DATA step for execution, it is first compiled and then executed. The following figure shows the flow of action for a typical SAS DATA step.

[Figure: Flow of Action in the DATA Step]


The Compilation Phase

When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable, and determines whether a type conversion is necessary for each subsequent reference to a variable. During the compile phase, SAS creates the following three items:

input buffer is a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement. Note that this buffer is created only when the DATA step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data directly into the program data vector.)

program data vector (PDV)

is a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation. Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution. The value of _ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more errors have occurred). SAS does not write these variables to the output data set.

descriptor information

is information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

The Execution Phase

By default, a simple DATA step iterates once for each observation that is being created. The flow of action in the Execution Phase of a simple DATA step is described as follows:

1. The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
2. SAS sets the newly created program variables to missing in the program data vector (PDV).
3. SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
4. SAS executes any subsequent programming statements for the current record.
5. At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing in the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here.
6. SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation.
7. The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.

Note: The figure shows the default processing of the DATA step. You can code data-reading statements (such as INPUT or SET), or data-writing statements (such as OUTPUT), in any order in your program.
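The iteration counter and the error flag described in the steps above can be observed directly. The following is a minimal sketch, not part of the SAS Help text: it uses DATA _NULL_ so that no data set is created, and the third data line is deliberately invalid for a numeric variable.

```sas
data _null_;                 /* no output data set is created               */
  input x;
  put _N_= x= _ERROR_=;      /* written to the log once per iteration       */
  datalines;
10
20
abc
;
run;
```

The log should show _N_ growing from 1 to 3. For the third record SAS should also report invalid data for x, leave x missing and set _ERROR_ to 1; in the other iterations _ERROR_ stays at 0.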


Processing a DATA Step: A Walkthrough

Sample DATA Step

The following statements provide an example of a DATA step that reads raw data, calculates totals, and creates a data set:

    data total_points (drop=TeamName);                            /* [1] */
       input TeamName $ ParticipantName $ Event1 Event2 Event3;   /* [2] */
       TeamTotal + (Event1 + Event2 + Event3);                    /* [3] */
       datalines;
    Knights Sue 6 8 8
    Cardinals Jane 9 7 8
    Knights John 7 7 7
    Knights Lisa 8 9 9
    Knights Fran 7 6 6
    Knights Walter 9 8 10
    ;

[1] The DROP= data set option prevents the variable TeamName from being written to the output SAS data set called TOTAL_POINTS.

[2] The INPUT statement describes the data by giving a name to each variable, identifying its data type (character or numeric), and identifying its relative location in the data record.

[3] The Sum statement accumulates the scores for three events in the variable TeamTotal.

Creating the Input Buffer and the Program Data Vector

When DATA step statements are compiled, SAS determines whether to create an input buffer. If the input file contains raw data (as in the example above), SAS creates an input buffer to hold the data before moving the data to the program data vector (PDV). (If the input file is a SAS data set, however, SAS does not create an input buffer. SAS writes the input data directly to the PDV.)

The PDV contains all the variables in the input data set, the variables created in DATA step statements, and the two variables, _N_ and _ERROR_, that are automatically generated for every DATA step. The _N_ variable represents the number of times the DATA step has iterated. The _ERROR_ variable acts like a binary switch whose value is 0 if no errors exist in the DATA step, or 1 if one or more errors exist. The following figure shows the Input Buffer and the program data vector after DATA step compilation.

[Figure: Input Buffer and Program Data Vector]

Variables that are created by the INPUT and the Sum statements (TeamName, ParticipantName, Event1, Event2, Event3, and TeamTotal) are set to missing initially. Note that in this representation, numeric variables are initialized with a period and character variables are initialized with blanks. The automatic variable _N_ is set to 1; the automatic variable _ERROR_ is set to 0. The variable TeamName is marked Drop in the PDV because of the DROP= data set option in the DATA statement. Dropped variables are not written to the SAS data set. The _N_ and _ERROR_ variables are dropped because automatic variables created by the DATA step are not written to a SAS data set. See SAS Variables for details about automatic variables.


Reading a Record

SAS reads the first data line into the input buffer. The input pointer, which SAS uses to keep its place as it reads data from the input buffer, is positioned at the beginning of the buffer, ready to read the data record. The following figure shows the position of the input pointer in the input buffer before SAS reads the data.

[Figure: Position of the Pointer in the Input Buffer Before SAS Reads Data]

The INPUT statement then reads data values from the record in the input buffer and writes them to the PDV where they become variable values. The following figure shows both the position of the pointer in the input buffer, and the values in the PDV after SAS reads the first record.

[Figure: Values from the First Record are Read into the Program Data Vector]

After the INPUT statement reads a value for each variable, SAS executes the Sum statement. SAS computes a value for the variable TeamTotal and writes it to the PDV. The following figure shows the PDV with all of its values before SAS writes the observation to the data set.

[Figure: Program Data Vector with Computed Value of the Sum Statement]

Writing an Observation to the SAS Data Set

When SAS executes the last statement in the DATA step, all values in the PDV, except those marked to be dropped, are written as a single observation to the data set TOTAL_POINTS. The following figure shows the first observation in the TOTAL_POINTS data set.

[Figure: The First Observation in Data Set TOTAL_POINTS]

SAS then returns to the DATA statement to begin the next iteration. SAS resets the values in the PDV in the following way:

- The values of variables created by the INPUT statement are set to missing.
- The value created by the Sum statement is automatically retained.
- The value of the automatic variable _N_ is incremented by 1, and the value of _ERROR_ is reset to 0.

The following figure shows the current values in the PDV.

[Figure: Current Values in the Program Data Vector]

Reading the Next Record

SAS reads the next record into the input buffer. The INPUT statement reads the data values from the input buffer and writes them to the PDV. The Sum statement adds the values of Event1, Event2, and Event3 to TeamTotal. The value of 2 for variable _N_ indicates that SAS is beginning the second iteration of the DATA step. The following figure shows the input buffer, the PDV for the second record, and the SAS data set with the first two observations.

[Figure: Input Buffer, Program Data Vector, and First Two Observations]

As SAS continues to read records, the value in TeamTotal grows larger as more participant scores are added to the variable. _N_ is incremented at the beginning of each iteration of the DATA step. This process continues until SAS reaches the end of the input file.

When the DATA Step Finishes Executing

The DATA step stops executing after it processes the last input record. You can use PROC PRINT to print the output in the TOTAL_POINTS data set:

Output from the Walkthrough DATA Step

                      Total Team Scores                     1

           Participant                               Team
    Obs       Name        Event1  Event2  Event3    Total
      1      Sue             6       8       8        22
      2      Jane            9       7       8        46
      3      John            7       7       7        67
      4      Lisa            8       9       9        93
      5      Fran            7       6       6       112
      6      Walter          9       8      10       139


F5. ARRAYS (taken from the SAS online Help)

Syntax
    ARRAY array-name { subscript } <$> <length> <array-elements> <(initial-value-list)>;

Arguments

array-name
    names the array.

{subscript}
    describes the number and arrangement of elements in the array by using an asterisk, a number, or a range of numbers. Subscript has one of these forms:
    {dimension-size(s)}
        indicates the number of elements in each dimension of the array. Dimension-size is a numeric representation of either the number of elements in a one-dimensional array or the number of elements in each dimension of a multidimensional array.

$
    indicates that the elements in the array are character elements.

length
    specifies the length of elements in the array that have not been previously assigned a length.

array-elements
    names the elements that make up the array. Array-elements must be either all numeric or all character, and they can be listed in any order. The elements can be:
    variables
        lists variable names.

(initial-value-list)
    gives initial values for the corresponding elements in the array. The values for elements can be numbers or character strings. You must enclose all character strings in quotation marks. To specify one or more initial values directly, use the following format:
        (initial-value(s))
    To specify an iteration factor and nested sublists for the initial values, use the following format:
        <constant-iter-value*> <(>constant value | constant-sublist<)>

Examples

Example 1: Defining Arrays
    array rain {5} janr febr marr aprr mayr;
    array days{7} d1-d7;
    array month{*} jan feb jul oct nov;
    array x{*} _NUMERIC_;
    array qbx{10};
    array meal{3};

Example 2: Assigning Initial Numeric Values
    array test{4} t1 t2 t3 t4 (90 80 70 70);
    array test{4} t1-t4 (90 80 2*70);
    array test{4} _TEMPORARY_ (90 80 70 70);

Example 3: Defining Initial Character Values
    array test2{*} a1 a2 a3 ('a','b','c');

Example 4: Using Iterative DO-Loop Processing
In this example, the statements process each element of the array, using the value of variable I as the subscript on the array references for each iteration of the DO loop. If an array element has a value of 99, the IF-THEN statement changes that value to 100.
    array days{7} d1-d7;
    do i=1 to 7;
      if days{i}=99 then days{i}=100;
    end;

Example 5: Referencing Many Arrays in One Statement
You can refer to more than one array in a single SAS statement. In this example, you create two arrays, DAYS and HOURS. The statements inside the DO loop substitute the current value of variable I to reference each array element in both arrays.
    array days{7} d1-d7;
    array hours{7} h1-h7;
    do i=1 to 7;
      if days{i}=99 then days{i}=100;
      hours{i}=days{i}*24;
    end;

Example 6: Using the Asterisk References as a Variable List
    array cost{10} cost1-cost10;
    totcost=sum(of cost {*});

    array days{7} d1-d7;
    input days {*};

    array hours{7} h1-h7;
    put hours {*};
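The fragments above show only the ARRAY statements. The following minimal sketch places them in a complete DATA step; the data set name SETTIMANA, the data values and the choice of recoding 99 as missing are assumptions made only for this illustration.

```sas
data settimana;
  input d1-d7;
  array days{7} d1-d7;              /* groups d1..d7 for this DATA step only */
  do i=1 to 7;
    if days{i}=99 then days{i}=.;   /* recode 99 as a missing value          */
  end;
  totale=sum(of days{*});           /* sum of the nonmissing elements        */
  drop i;
  datalines;
1 2 99 4 5 6 7
99 99 3 3 3 3 3
;
proc print; run;
```

The printed data set should contain d1-d7 and TOTALE (25 and 15 for the two observations); the array name DAYS itself does not appear among the variables.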

NOTE: the array structure is not stored in the data set; it can be used only within the DATA step in which the array is defined.

F6. CONTROL STATEMENTS

1) IF expression THEN statement;
   ELSE statement;
   (expression = 1 if true, expression = 0 if false)

- if several statements must be executed:
    IF expression THEN DO;
      statements;
    END;
    ELSE DO;
      statements;
    END;
- the statement
    IF expression;
  is equivalent to the statement
    IF NOT (expression) THEN DELETE;


EXAMPLE. SAS PROGRAM no. 21:
    data corso.maschi corso.femmine;
      set corso.disney;
      if sesso='m' then output corso.maschi;
      else if sesso='f' then output corso.femmine;
      else put 'osservazioni sbagliate ' _all_;
    run;

2) SELECT (variable);          (an expression that evaluates to a single value)
     WHEN (expression) statement;
     OTHERWISE statement;
   END;

EXAMPLE. SAS PROGRAM no. 21 bis:
    data corso.maschi corso.femmine;
      set corso.disney;
      select(sesso);
        when('m') output corso.maschi;
        when('f') output corso.femmine;
        otherwise put 'osservazioni sbagliate' _all_;
      end;
    run;

Note: the form SELECT (variable) compares the variable with exact values only; for conditions such as ranges use IF.

EXAMPLE. SAS PROGRAM no. 21 ter:
    data piccoli grandi;
      set corso.disney;
      if altezza <150 then output piccoli;
      else output grandi;
    proc print data=piccoli;
    proc print data=grandi;
    run;

    data piccoli grandi;
      set corso.disney;
      select(altezza);
        when(<150) output piccoli;
        otherwise output grandi;
      end;
    run;
    /* check the log: this does not work! With SELECT (variable), WHEN requires an exact value */

3) DO index-variable = start [TO stop] [BY increment] [WHILE (expression)] [UNTIL (expression)];
     statements;
   END;
   where:
   - the index variable can also be a character variable;
   - start and stop can be replaced by a list of values separated by ","

   Note: only the values that the variables (created or modified by the statements inside the DO group) have at the end of the DO loop are written to the data set; to keep the values of the variables at every iteration of the loop, an OUTPUT statement must be written inside the DO group.


EXAMPLE. SAS PROGRAM no. 22:
    data es4;
      do i=-1 to 1 by .1;
        x=probnorm(i);
        output;
      end;
    proc print; run;

SAS OUTPUT:
    OBS        I           X
      1    -1.00000    0.15866
      2    -0.90000    0.18406
      3    -0.80000    0.21186
      4    -0.70000    0.24196
      5    -0.60000    0.27425
      6    -0.50000    0.30854
      7    -0.40000    0.34458
      8    -0.30000    0.38209
      9    -0.20000    0.42074
     10    -0.10000    0.46017
     11    -0.00000    0.50000
     12     0.10000    0.53983
     13     0.20000    0.57926
     14     0.30000    0.61791
     15     0.40000    0.65542
     16     0.50000    0.69146
     17     0.60000    0.72575
     18     0.70000    0.75804
     19     0.80000    0.78814
     20     0.90000    0.81594
     21     1.00000    0.84134

4) DO WHILE (expression);
5) DO UNTIL (expression);

   The remarks about the OUTPUT statement made for the DO statement also apply to DO WHILE and DO UNTIL.

6) GOTO label;
   labels are declared as ET: ....

7) LINK label;
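As a complement to point 2) and to Program no. 21 ter: SAS also accepts a SELECT statement without a select-expression, in which case each WHEN clause holds a complete condition. The sketch below uses a small hypothetical data set DISNEY with invented values, since the contents of corso.disney are not shown in this section.

```sas
data disney;                  /* hypothetical stand-in for corso.disney */
  input nome $ altezza;
  datalines;
Pippo 180
Qui 120
Minnie 150
;
run;

data piccoli grandi;
  set disney;
  select;                                  /* no select-expression          */
    when (altezza < 150) output piccoli;   /* WHEN holds the full condition */
    otherwise output grandi;
  end;
run;
```

Under these assumptions PICCOLI should receive the observation with altezza 120 and GRANDI the other two, which is the same split produced by the IF/ELSE version of Program no. 21 ter.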


EXAMPLES OF USE OF THE DO LOOP

CONSTRUCTION OF THE ORIGINAL DATA SET

    data uno;
      input x y a$;
      datalines;
    5 7 n
    3 2 a
    4 7 a
    2 1 n
    5 5 n
    ;

    data due; set uno; do i=5; z=x*i; end; proc print; run;

    Obs    x    y    a    i     z
      1    5    7    n    5    25
      2    3    2    a    5    15
      3    4    7    a    5    20
      4    2    1    n    5    10
      5    5    5    n    5    25

    data tre; set uno; do i=5,7; z=x*i; end; proc print; run;

    Obs    x    y    a    i     z
      1    5    7    n    7    35
      2    3    2    a    7    21
      3    4    7    a    7    28
      4    2    1    n    7    14
      5    5    5    n    7    35

    data quattro; set uno; do i=5,7; z=x*i; output; end; proc print; run;

    Obs    x    y    a    i     z
      1    5    7    n    5    25
      2    5    7    n    7    35
      3    3    2    a    5    15
      4    3    2    a    7    21
      5    4    7    a    5    20
      6    4    7    a    7    28
      7    2    1    n    5    10
      8    2    1    n    7    14
      9    5    5    n    5    25
     10    5    5    n    7    35

    data cinque; set uno; do i=y; z=x*i; output; end; proc print; run;

    Obs    x    y    a    i     z
      1    5    7    n    7    35
      2    3    2    a    2     6
      3    4    7    a    7    28
      4    2    1    n    1     2
      5    5    5    n    5    25


    data sei; set uno; do i=y, x; z=x*i; output; end; proc print; run;

    Obs    x    y    a    i     z
      1    5    7    n    7    35
      2    5    7    n    5    25
      3    3    2    a    2     6
      4    3    2    a    3     9
      5    4    7    a    7    28
      6    4    7    a    4    16
      7    2    1    n    1     2
      8    2    1    n    2     4
      9    5    5    n    5    25
     10    5    5    n    5    25

    data sette; set uno; do i=y, x; z=x*i; end; proc print; run;

    Obs    x    y    a    i     z
      1    5    7    n    5    25
      2    3    2    a    3     9
      3    4    7    a    4    16
      4    2    1    n    2     4
      5    5    5    n    5    25

    data otto; set uno; do i=1 to 7 by 2; z=x*i; end; proc print; run;

    Obs    x    y    a    i     z
      1    5    7    n    9    35
      2    3    2    a    9    21
      3    4    7    a    9    28
      4    2    1    n    9    14
      5    5    5    n    9    35

    data nove; set uno; do i=1 to 7 by 2; z=x*i; output; end; proc print; run;

    Obs    x    y    a    i     z
      1    5    7    n    1     5
      2    5    7    n    3    15
      3    5    7    n    5    25
      4    5    7    n    7    35
      5    3    2    a    1     3
      6    3    2    a    3     9
      7    3    2    a    5    15
      8    3    2    a    7    21
      9    4    7    a    1     4
     10    4    7    a    3    12
     11    4    7    a    5    20
     12    4    7    a    7    28
     13    2    1    n    1     2
     14    2    1    n    3     6
     15    2    1    n    5    10
     16    2    1    n    7    14
     17    5    5    n    1     5
     18    5    5    n    3    15
     19    5    5    n    5    25
     20    5    5    n    7    35


    data dieci; set uno; do i='r','s','t'; z=trim(a)||i; output; end; proc print; run;

    Obs    x    y    a    i    z
      1    5    7    n    r    nr
      2    5    7    n    s    ns
      3    5    7    n    t    nt
      4    3    2    a    r    ar
      5    3    2    a    s    as
      6    3    2    a    t    at
      7    4    7    a    r    ar
      8    4    7    a    s    as
      9    4    7    a    t    at
     10    2    1    n    r    nr
     11    2    1    n    s    ns
     12    2    1    n    t    nt
     13    5    5    n    r    nr
     14    5    5    n    s    ns
     15    5    5    n    t    nt

    data undici; set uno; n=0; do until(n>5); n+1; output; end; proc print; run;

    Obs    x    y    a    n
      1    5    7    n    1
      2    5    7    n    2
      3    5    7    n    3
      4    5    7    n    4
      5    5    7    n    5
      6    5    7    n    6
      7    3    2    a    1
      8    3    2    a    2
      9    3    2    a    3
     10    3    2    a    4
     11    3    2    a    5
     12    3    2    a    6
     13    4    7    a    1
     14    4    7    a    2
     15    4    7    a    3
     16    4    7    a    4
     17    4    7    a    5
     18    4    7    a    6
     19    2    1    n    1
     20    2    1    n    2
     21    2    1    n    3
     22    2    1    n    4
     23    2    1    n    5
     24    2    1    n    6
     25    5    5    n    1
     26    5    5    n    2
     27    5    5    n    3
     28    5    5    n    4
     29    5    5    n    5
     30    5    5    n    6

    data dodici; n=0; do until (n>5); n+1; output; end; proc print; run;

    Obs    n
      1    1
      2    2
      3    3
      4    4
      5    5
      6    6

    data tredici; n=0; do while (n<=5); n+1; output; end; proc print; run;

    Obs    n
      1    1
      2    2
      3    3
      4    4
      5    5
      6    6
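In the last two examples DO UNTIL and DO WHILE give the same result. They differ when the condition is already decided before the first pass: DO WHILE tests its condition at the top of the loop, DO UNTIL at the bottom, so the body of a DO UNTIL loop runs at least once. A minimal sketch (the data set name CONFRONTO and the variable TIPO are hypothetical):

```sas
data confronto;
  n=10;
  do while (n<5);      /* condition false at the top: the body never runs */
    tipo='while';
    output;
    n+1;
  end;
  do until (n>=5);     /* condition tested at the bottom: one pass is made */
    tipo='until';
    output;
    n+1;
  end;
run;
proc print; run;
```

The resulting data set should contain a single observation, with n=10 and tipo='until'.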


F7. THE INPUT STATEMENT
The INPUT statement reads raw data (not SAS data sets) from files on disk, etc. There is no limit to the number of INPUT statements that can appear in a DATA step. Each INPUT statement reads:
- from a file identified by an INFILE statement, which must precede the INPUT statement;
- from the program itself, if there is no INFILE statement. In this case, to tell SAS where the data begin, a DATALINES (or CARDS) statement must appear as the last statement of the DATA step, immediately before the data.

SAS offers several input styles; the main ones are:
1) column input
2) list input
3) formatted input
The three styles can be combined in the same statement.

COLUMN INPUT:
The syntax of the statement is:
    INPUT var_1 [$] start_col_1 [- end_col_1] [.decimals_1] ... [var_n [$] start_col_n [- end_col_n] [.decimals_n]];
The following rules apply:
- the $ indicates a character variable;
- the fields containing the data can be read in any order;
- fields or parts of fields can be read more than once. Example:
    INPUT NOME $ 1-8 INIZIALE $ 1;
- character variables can be at most 200 characters long (to be specified with the LENGTH statement; this was the limit in SAS Version 6, while later releases allow up to 32,767 characters) and can contain blanks as part of the field (e.g. DE LUCA).

Some remarks on column input:
- missing values are coded with a period (inside the field) or with blanks;
- character values are left-aligned before being assigned to the variables. For example, with the statement INPUT SESSO $ 1-3 the value M is read in the same way whether it appears in the first, second or third column;
- numeric values can appear anywhere in the field; the sign, the decimal digits or the exponent can be specified. Example: with the statement
    INPUT X 1-6;
  any of the following values can be read: 23 (left- or right-aligned in the field), 23.0, 2.3E1, -23;
- blanks are not allowed inside a numeric value (example: - 23).

Example of column input:
    INPUT PESO 20-24 ETA 13-14 NOME $ 1-8 SESSO $ 11 ALTEZZA 16-18 INIZ_NOM $ 1;
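The column-input statement above can be tried on a couple of records. This is a minimal sketch: the data set name CLASSE_COL and the two data lines are invented for the illustration, and the columns are counted from the first character of each data line.

```sas
data classe_col;
  /* fields are read out of order; column 1 is read twice (NOME and INIZ_NOM) */
  input peso 20-24 eta 13-14 nome $ 1-8 sesso $ 11 altezza 16-18 iniz_nom $ 1;
  datalines;
GIANNI    M 12 155  48.2
MARCO     M 12 151  43.7
;
proc print; run;
```

Because each variable is tied to fixed columns, the order of the variables in the INPUT statement does not have to follow the order of the fields in the record.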


LIST INPUT:
The syntax of the statement is:

    INPUT variable_1 [$] [&] ... [variable_n [$] [&]];

The following rules apply:
- the $ indicates a character variable;
- the fields must be listed in the order in which they appear in the input data;
- the fields must be separated by one or more blanks, or by another character specified with the DELIMITER= option of the INFILE statement (see below);
- character variables can be at most 8 characters long (unless the LENGTH statement is used);
- if a field contains blanks as part of its value (e.g. DE LUCA), the & symbol must be added to its declaration. In that case the next field must be separated by at least two blanks.

Remarks on list input:
- SAS looks for the first nonblank character in the line; from that point up to the next blank (or the next two blanks, if the & symbol is present) the field read is the value of the variable;
- the variables must appear in the INPUT statement in the same order as in the data;
- all missing values must be coded with a period;
- the INPUT statement ends when a value has been assigned to every variable. If the data in a line are not sufficient, the next record is read.

EXAMPLES

INPUT FILE:
    Country   Car                         MPG   Weight  Drive  Horse  Displa  Cyl.  Accel.
                                                        Ratio  power  cement
    U.S.      Buick Estate Wagon          16.9  4.360   2.73   155    350     8     14.9
    U.S.      Ford Country Squire Wagon   15.5  4.054   2.26   142    351     8     14.3
    U.S.      Chevy Malibu Wagon          19.2  3.605   2.56   125    267     8     15.0
    U.S.      Chrysler LeBaron Wagon      18.5  3.940   2.45   150    360     8     13.0
    U.S.      Chevette                    30.0  2.155   3.70   68     98      4     16.5
    Japan     Toyota Corona               27.5  2.560   3.05   95     134     4     14.2
    Japan     Datsun 510                  27.2  2.300   3.54   97     119     4     14.7
    U.S.      Dodge Omni                  30.9  2.230   3.37   75     105     4     14.5
    Germany   Audi 5000                   20.3  2.830   3.90   103    131     5     15.9
    Sweden    Volvo 240 GL                17.0  3.140   3.50   125    163     6     13.6
    Sweden    Saab 99 GLE                 21.6  2.795   3.77   115    121     4     15.7
    France    Peugeot 694 SL              16.2  3.410   3.58   133    163     6     15.8
    U.S.      Buick Century Special       20.6  3.380   2.73   105    231     6     15.8
    U.S.      Mercury Zephyr              20.8  3.070   3.08   85     200     6     16.7
    U.S.      Dodge Aspen                 18.6  3.620   2.71   110    225     6     18.7
    U.S.      AMC Concord D/L             18.1  3.410   2.73   120    258     6     15.1
    U.S.      Chevy Caprice Classic       17.0  3.840   2.41   130    305     8     15.4
    U.S.      Ford LTD                    17.6  3.725   2.26   129    302     8     13.4
    U.S.      Mercury Grand Marquis       16.5  3.955   2.26   138    351     8     13.2
    U.S.      Dodge St Regis              18.2  3.830   2.45   135    318     8     15.2
    U.S.      Ford Mustang 4              26.5  2.585   3.08   88     140     4     14.4
    U.S.      Ford Mustang Ghia           21.9  2.910   3.08   109    171     6     16.6
    Japan     Mazda GLC                   34.1  1.975   3.73   65     86      4     15.2
    Japan     Dodge Colt                  35.1  1.915   2.97   80     98      4     14.4
    U.S.      AMC Spirit                  27.4  2.670   3.08   80     121     4     15.0
    Germany   VW Scirocco                 31.5  1.990   3.78   71     89      4     14.9
    Japan     Honda Accord LX             29.5  2.135   3.05   68     98      4     16.6
    U.S.      Buick Skylark               28.4  2.670   2.53   90     151     4     16.0
    U.S.      Chevy Citation              28.8  2.595   2.69   115    173     6     11.3
    U.S.      Olds Omega                  26.8  2.700   2.84   115    173     6     12.9
    U.S.      Pontiac Phoenix             33.5  2.556   2.69   90     151     4     13.2
    U.S.      Plymouth Horizon            34.2  2.200   3.37   70     105     4     13.2
    Japan     Datsun 210                  31.8  2.020   3.70   65     85      4     19.2
    Italy     Fiat Strada                 37.3  2.130   3.10   69     91      4     14.7
    Germany   VW Dasher                   30.5  2.190   3.70   78     97      4     14.1
    Japan     Datsun 810                  22.0  2.815   3.70   97     146     6     14.5
    Germany   BMW 320i                    21.5  2.600   3.64   110    121     4     12.8
    Germany   VW Rabbit                   31.9  1.925   3.78   71     89      4     14.0


SAS PROGRAM no. 23:
    option pagesize=60 nodate;
    libname corso 'a:\corsosas';
    data corso.CARS;
      infile 'a:\acp.txt' firstobs=3;
      length TIPO $ 25;
      input NAZIONE $ TIPO $ & CONSUMO PESO DRIVE_R POTENZA CILINDRA NUM_C RIPRESA;
    proc print round;
      var tipo NAZIONE CONSUMO PESO DRIVE_R POTENZA CILINDRA NUM_C RIPRESA;
    run;

SAS OUTPUT (partial):
    OBS  TIPO                       NAZIONE  CONSUMO  PESO  DR_R  POTEN  CILIN  NUM_C  RIPRESA
      1  Buick Estate Wagon         U.S.       16.9   4.36  2.73   155    350     8     14.9
      2  Ford Country Squire Wagon  U.S.       15.5   4.05  2.26   142    351     8     14.3
      3  Chevy Malibu Wagon         U.S.       19.2   3.61  2.56   125    267     8     15.0
      4  Chrysler LeBaron Wagon     U.S.       18.5   3.94  2.45   150    360     8     13.0
      5  Chevette                   U.S.       30.0   2.16  3.70    68     98     4     16.5
      6  Toyota Corona              Japan      27.5   2.56  3.05    95    134     4     14.2
      7  Datsun 510                 Japan      27.2   2.30  3.54    97    119     4     14.7
      8  Dodge Omni                 U.S.       30.9   2.23  3.37    75    105     4     14.5
      9  Audi 5000                  Germany    20.3   2.83  3.90   103    131     5     15.9
     10  Volvo 240 GL               Sweden     17.0   3.14  3.50   125    163     6     13.6
     11  Saab 99 GLE                Sweden     21.6   2.80  3.77   115    121     4     15.7
     12  Peugeot 694 SL             France     16.2   3.41  3.58   133    163     6     15.8
     13  Buick Century Special      U.S.       20.6   3.38  2.73   105    231     6     15.8
     14  Mercury Zephyr             U.S.       20.8   3.07  3.08    85    200     6     16.7
     15  Dodge Aspen                U.S.       18.6   3.62  2.71   110    225     6     18.7
     16  AMC Concord D/L            U.S.       18.1   3.41  2.73   120    258     6     15.1
     17  Chevy Caprice Classic      U.S.       17.0   3.84  2.41   130    305     8     15.4
     18  Ford LTD                   U.S.       17.6   3.73  2.26   129    302     8     13.4
     19  Mercury Grand Marquis      U.S.       16.5   3.96  2.26   138    351     8     13.2
     20  Dodge St Regis             U.S.       18.2   3.83  2.45   135    318     8     15.2

(part of the output is not shown)
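The rules for list input mention the DELIMITER= option of the INFILE statement as an alternative to blanks. The following minimal sketch reads comma-separated values from in-stream data; the data set name AUTO and the two records (taken from the file above and rewritten with commas) are assumptions made for the illustration.

```sas
data auto;
  infile datalines delimiter=',';   /* fields are separated by commas, not blanks */
  length tipo $ 25;                 /* room for car names longer than 8 characters */
  input nazione $ tipo $ consumo peso;
  datalines;
U.S.,Buick Estate Wagon,16.9,4.360
Japan,Toyota Corona,27.5,2.560
;
proc print; run;
```

Since the blank is no longer a delimiter here, a value such as "Buick Estate Wagon" is read as a single field without the & modifier.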


FORMATTED INPUT:
The syntax of the statement is:
    INPUT pointer_1 variable_1 format_1 ... [pointer_n variable_n format_n];
where:
- pointer can be one of the following:
    +n                    moves the pointer forward n columns
    /                     moves the pointer to the first column of the next record
    #n                    moves the pointer to the first column of record n
    @n                    moves the pointer to column n
    @'character-string'   moves the pointer to the first column after the indicated character string
    @@                    indicates that the data in one record refer to several observations
- variable is the valid name of one or more SAS variables
- format can be one of the following (the most common are listed):
    w.        reads a numeric field w columns wide;
    w.d       reads a numeric field w columns wide, of which d are decimals;
    $w.       reads a character field w characters wide;
    $CHARw.   reads a character field w characters wide, keeping its blanks;
    Ew.       reads numeric values written in scientific (exponential) notation.
With formatted input, missing values are coded with a period or with blanks. Numeric variables with blanks in place of zeros can also be handled, and "dates" can be read as well (in English notation, Italian notation, ...).

Example. SAS PROGRAM no. 24:
    data es1;
      input sesso $ eta :3.1 hinch wlib @@;
      altezza=hinch*2.54;
      peso=wlib*0.4536;
      datalines;
    f 143 56.3 85.0 f 155 62.3 105.0 f 153 63.3 108.0 f 161 59.0 92.0
    f 191 62.5 112.5 f 171 62.5 112.0 f 185 59.0 104.0 f 142 56.5 69.0
    f 160 62.0 94.5 f 140 53.8 68.5 f 139 61.5 104.0 f 178 61.5 103.5
    f 157 64.5 123.5 f 149 58.3 93.0 f 143 51.3 50.5 f 145 58.8 89.0
    f 191 65.3 107.0 f 150 59.5 78.5 f 147 61.3 115.0 f 180 63.3 114.0
    f 141 61.8 85.0 f 140 53.5 81.0 f 164 58.0 83.5 f 176 61.3 112.0
    ;
    run;

Example:
    INPUT NOME $ 8. @11 SESSO $ 1. +1 ETA 2. +1 ALTEZZA 3. +1 PESO 5.1;

Variables that are read with the same format can be grouped, and the format list can be grouped as well. For example,
    INPUT GEN 3. FEB 3. MAR 3.;
can equivalently be written as:
    INPUT (GEN FEB MAR) (3. 3. 3.);
    INPUT (GEN FEB MAR) (3.);
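The remark that dates can also be read may be illustrated with a date informat. This is a minimal sketch under stated assumptions: the data set name NASCITE and the data are invented, DDMMYY10. is used for dates in day/month/year (Italian) notation, and the colon modifier is used so that the informat is applied to a blank-delimited field.

```sas
data nascite;
  input nome $ nascita :ddmmyy10.;   /* colon: list-style reading with an informat */
  format nascita date9.;             /* how the date is displayed in the output    */
  datalines;
Anna 25/12/1990
Luca 03/07/1985
;
proc print; run;
```

SAS stores the date as a number; with the DATE9. format the two values should be printed as 25DEC1990 and 03JUL1985.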


COMBINING THE DIFFERENT INPUT STYLES:

Consider the following example (in the data lines the fields are separated by blanks, and columns are counted from the first character of each data line):
    DATA CLASSE;
      INPUT NOME $ @11 SESSO $ 1. ETA 13-14 ALTEZZA @20 PESO 5.1;
      DATALINES;
    GIANNI    M 12 155 48.2
    MARCO     M 12 151 43.7
    /* other data */
    ;
    RUN;
The variable NOME is read with list input, SESSO with formatted input, ETA with column input, ALTEZZA with list input, and PESO with formatted input.

MULTIPLE INPUT STATEMENTS:

Each time an INPUT statement is executed, one record is read; if the data of one observation are spread over several records, several INPUT statements are needed.

Example (each observation occupies three records):
    DATA CLASSE;
      INPUT NOME $ 1-8 SESSO $ 11;
      INPUT ETA 3-4;
      INPUT ALTEZZA 1-4 PESO 6-10;
      DATALINES;
    GIANNI    M
      12
     155  48.2
    MARCO     M
      12
     151  43.7
    /* other data */
    ;
    RUN;

F8. THE INFILE STATEMENT
This statement identifies a text file from which the data are to be read with the INPUT statement. The syntax is:

    INFILE filename options;

The main options are:
    firstobs = number of the first record to read
    obs      = number of the last record to read

Example:
    data pippo;
      infile 'a:pluto.txt' firstobs=3 obs=10;
      input x1-x5;
    run;

The records from the third to the tenth are read.
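The same two options can be tried without an external file by pointing INFILE at the in-stream data. This is a minimal sketch with invented values; the data set name PROVA is hypothetical.

```sas
data prova;
  infile datalines firstobs=2 obs=3;   /* read only records 2 and 3 */
  input x1-x3;
  datalines;
1 2 3
4 5 6
7 8 9
10 11 12
;
proc print; run;
```

The resulting data set should contain two observations, (4, 5, 6) and (7, 8, 9): OBS= gives the number of the last record to read, not how many records to read.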


F9. L'ISTRUZIONE OUTPUT Quando una istruzione di OUTPUT è presente in un passo di Data, il SAS aggiunge osservazioni al Data Set solo quando viene eseguita l'istruzione OUTPUT. In sostanza questa istruzione inibisce la scrittura implicita operata dal SAS al termine del passo di Data.

La sintassi dell'istruzione è la seguente: OUTPUT [nome di uno o più Data Set SAS ] ;

Quando il SAS incontra una o più istruzioni di OUTPUT: - copia l'osservazione o le osservazioni appena lette sul (o sui) Data Set(s) specificati. - prosegue l'esecuzione del passo di DATA con la istruzione seguente l'istruzione OUTPUT. L'istruzione OUTPUT è necessaria quando: - si devono creare osservazioni multiple con i dati di una singola linea dei dati di ingresso; - si devono creare Data Set SAS multipli da un singolo Data Set di ingresso; - si deve creare una sola osservazione combinando dati da linee di ingresso multiple.

Let us look at two examples (based on the DISNEY data set built earlier):
SAS PROGRAM n. 25:
data corso.maschi corso.femmine;
set corso.disney;
if sesso='m' then output corso.maschi;
else if sesso='f' then output corso.femmine;
else put 'osservazioni sbagliate' _all_;
proc print data= corso.maschi;
proc print data= corso.femmine;

data corso.maschi;
set corso.disney;
if sesso='m' then output;
run;

F10. WRITING TO AN EXTERNAL FILE AND THE PUT STATEMENT
To write data to disk in text format, the following statements are sufficient:
DATA _NULL_;   builds no data set
SET SAS data set name;   reads the input data (INPUT can also be used)
FILE '[path] name';   opens the output file
PUT [specifications];   writes to the file

The PUT statement is the output counterpart of the INPUT statement and specifies what is written on each line and how. The general form of the statement is:

PUT [specifications];

where the specifications indicate the style to use, which can be:
- list
- column
- formatted
- string
- named
(only the first three are of interest here)

As with the INPUT statement, the different styles can be mixed in a single PUT statement. If no external file is defined, PUT writes to the Log window. Within a single DATA step, several files containing the results of the analysis can be created on disk. Each of these files must be "pointed to" by a FILE statement, whose general syntax is the one given above.
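As an illustration of the four statements above, here is a minimal sketch that writes the DISNEY data set to a text file. It assumes that the libref corso is already assigned; the output file name and the column positions are arbitrary choices made for the example.

```sas
/* Sketch: write CORSO.DISNEY to a text file.                    */
/* File name and column positions are assumptions.               */
data _null_;
   set corso.disney;
   file 'disney.txt';
   put nome $ 1-15 sesso $ 17 eta 19-21   /* column style    */
       @24 altezza 4.                     /* formatted style */
       +1 peso;                           /* list style      */
run;
```

The single PUT statement mixes the column, formatted and list styles. Removing the FILE statement would send the same lines to the Log window instead.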


G. THE PROC STEP
The general form of a procedure call is:

PROC procedure name [DATA = SAS data set name];

G1. SOME STATEMENTS USED IN A PROC STEP
BY [DESCENDING] variables [NOTSORTED];
with this statement the data are processed by groups, optionally in descending order or in order of appearance. NOTE: the data must already be sorted by the BY variables; if they are grouped but not sorted, the NOTSORTED option must be used.

OUTPUT [OUT = SAS data set name] [keyword = name];
creates a data set containing the results of a procedure.

TITLE ['text'];
defines the text to print at the top of every output page.

FOOTNOTE ['text'];
defines the text to print at the bottom of every output page.

FORMAT variables format;
associates with a variable a "format" that defines how the values of that variable are displayed in the output.

LABEL variable = 'label';
associates a label with a variable, to make the results produced by the various output procedures easier to read.

TITLE and FOOTNOTE remain in effect in subsequent PROC steps until they are redefined or cancelled.

SELECTING THE VARIABLES AND OBSERVATIONS ON WHICH A PROCEDURE OPERATES
VAR variables;
indicates the variables on which the procedure operates.

WHERE expression;
selects the observations of the data set on which the procedure operates.

Example
SAS PROGRAM N. 25 bis:
proc print data=a.cars;
var nazione tipo peso;
where nazione='Germany';
run;

Note that the Obs column keeps the observation number that each observation has in the SAS data set (DSS).

SAS OUTPUT
Obs   NAZIONE   TIPO          PESO
  9   Germany   Audi 5000     1.28369
 26   Germany   VW Scirocco   0.90266
 35   Germany   VW Dasher     0.99338
 37   Germany   BMW 320i      1.17936
 38   Germany   VW Rabbit     0.87318
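The statements of this section can be combined in a single PROC step. The following sketch applies them to the DISNEY data set used in the earlier examples; the title, footnote and label texts and the format width are arbitrary choices made for illustration.

```sas
/* Sketch: TITLE, FOOTNOTE, VAR, WHERE, LABEL and FORMAT together. */
/* Texts and format are illustrative choices.                      */
title 'Disney characters older than 10';
footnote 'Source: CORSO.DISNEY';

proc print data=corso.disney label;   /* LABEL option: print labels as headings */
   var nome eta peso;
   where eta > 10;
   label peso = 'Weight';
   format peso 5.1;
run;

title;      /* cancel the title    */
footnote;   /* cancel the footnote */
```

The LABEL option on the PROC PRINT statement is needed for the label to appear as column heading. The empty TITLE and FOOTNOTE statements at the end cancel the texts, which would otherwise remain in effect for the following PROC steps.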


G2. THE SORT PROCEDURE
This procedure sorts the observations by numeric or character values (character values are ordered according to the ASCII code).

The syntax is:
PROC SORT [options];
BY [DESCENDING] variable [DESCENDING] variable ...;

The main options are:

- DATA = SAS data set name   specifies the data set to sort. If omitted, SAS uses the most recently created one.
- OUT = SAS data set name   the name of the output data set; if omitted, the sorted data set replaces the original one (provided no errors occur).

The BY statement must always be specified. The DESCENDING option indicates that the variable that follows it is sorted in decreasing order. When several variables are listed in BY, the data are sorted by the first variable, then by the second within equal values of the first, and so on.

Example:
SAS PROGRAM n. 26:
proc sort data= corso.disney out=dis;
by sesso eta;
proc print;
run;

SAS OUTPUT of proc print:
OBS   NOME           SESSO   ETA   ALTEZZA   PESO
  1   emy            f         8     117      25
  2   ely            f         8     117      25
  3   edy            f         8     117      25
  4   clarabella     f        30     180      65
  5   minnie         f        35     145      40
  6   nonna papera   f        99     140      55
  7   qui            m         8     120      30
  8   quo            m         8     120      30
  9   qua            m         8     120      30
 10   pippo          m        32     190      54
 11   paperino       m        34     150      50


G3. THE PRINT PROCEDURE
This procedure displays all or some of the variables of a SAS data set. Some of its features are:
- the information to print is formatted automatically to fit the page size;
- the variables are printed in columns headed by the variable name or by its label; unless the user specifies otherwise, a column headed OBS is also printed, containing a sequential number that identifies the observations;
- the observations are printed one per line whenever possible.

The syntax of the procedure is:

PROC PRINT [options];

The main options are:

- DATA = SAS data set name   specifies the data set to print. If omitted, SAS uses the most recently created one.
- NOOBS   removes the column with the observation numbers from the output
- ROUND   rounds the data to two decimal places

The main statements of the PRINT procedure are:
- VAR variable list;
indicates which variables are printed and in which order. If omitted, all variables are printed in the order in which they are stored in the SAS data set.
- BY variable list;
prints the observations separately for each group defined by the BY variables (warning: the data must already be sorted).


Example:
SAS PROGRAM n. 27:
data corso.dis_eta;
set corso.disney;
e='giovane';
if eta>10 then e='vecchio';
proc sort data=corso.dis_eta out=eta1;
by e sesso;
proc print;
by e sesso;
run;

SAS OUTPUT:
--------------------------- E=giovane SESSO=f ---------------------------
OBS   NOME           ETA   ALTEZZA   PESO
  1   emy              8     117      25
  2   ely              8     117      25
  3   edy              8     117      25
--------------------------- E=giovane SESSO=m ---------------------------
OBS   NOME           ETA   ALTEZZA   PESO
  4   qui              8     120      30
  5   quo              8     120      30
  6   qua              8     120      30
--------------------------- E=vecchio SESSO=f ---------------------------
OBS   NOME           ETA   ALTEZZA   PESO
  7   minnie          35     145      40
  8   clarabella      30     180      65
  9   nonna papera    99     140      55
--------------------------- E=vecchio SESSO=m ---------------------------
OBS   NOME           ETA   ALTEZZA   PESO
 10   pippo           32     190      54
 11   paperino        34     150      50


- SUM variable list;
indicates that the totals of the listed variables are printed.
- SUMBY variable;
is used only together with a BY statement and a SUM statement, typically when BY lists several variables; the variable specified must appear in the BY statement. Every time that variable changes value, the totals of the variables listed in the SUM statement are printed.

Example:
SAS PROGRAM n. 28:
proc sort data= corso.disney out=dis;
by sesso;
proc print data=dis;
by sesso;
sum eta altezza peso;
sumby sesso;
run;

SAS OUTPUT:
---------------------------- SESSO=f -----------------------------
  OBS   NOME           ETA   ALTEZZA   PESO
    1   minnie          35     145      40
    2   clarabella      30     180      65
    3   nonna papera    99     140      55
    4   emy              8     117      25
    5   ely              8     117      25
    6   edy              8     117      25
                       ---   -------   ----
SESSO                  188     816     235
---------------------------- SESSO=m -----------------------------
  OBS   NOME           ETA   ALTEZZA   PESO
    7   pippo           32     190      54
    8   paperino        34     150      50
    9   qui              8     120      30
   10   quo              8     120      30
   11   qua              8     120      30
                       ---   -------   ----
SESSO                   90     700     194
                       ===   =======   ====
                       278    1516     429


G4. THE MEANS PROCEDURE
This procedure produces summary statistics computed over all the observations of a SAS data set. The syntax is:

PROC MEANS [options];

Some options are:
- DATA = SAS data set name   specifies the data set on which the statistics are computed. If omitted, the most recently created one is used.
- NOPRINT   suppresses the printing of the results in the output window
- MAXDEC = n   indicates the number of decimal places wanted.
- VARDEF = DF | N | ...   specifies the denominator used in the variance formula. The default is DF = (number of data values - 1).

Example:
SAS PROGRAM n. 29:
proc means data= corso.disney;
var altezza peso;
run;
proc means data= corso.disney maxdec=2;
var altezza peso;
run;

SAS OUTPUT:
Variable    N          Mean       Std Dev       Minimum       Maximum
--------------------------------------------------------------------
ALTEZZA    11   137.8181818    26.3811227   117.0000000   190.0000000
PESO       11    39.0000000    14.5258390    25.0000000    65.0000000
--------------------------------------------------------------------

Variable    N          Mean       Std Dev       Minimum       Maximum
--------------------------------------------------------------------
ALTEZZA    11        137.82         26.38        117.00        190.00
PESO       11         39.00         14.53         25.00         65.00
--------------------------------------------------------------------

SAS PROGRAM n. 30:
proc means data= corso.disney;
var altezza peso;
output out=sommario mean=m_alt m_peso;
proc print;
run;

SAS OUTPUT (of proc print):
OBS   _TYPE_   _FREQ_     M_ALT   M_PESO
  1      0       11     137.818     39


The following statistics can be requested. They must be specified as options when statistics other than the default ones are wanted; by default PROC MEANS prints N, MEAN, STD, MIN and MAX.

N        number of observations, missing values excluded
MEAN     arithmetic mean
MIN      minimum
MAX      maximum
STD      standard deviation
NMISS    number of missing values
NOBS     total number of observations
SUM      sum
RANGE    maximum - minimum
VAR      variance
STDERR   standard error
CSS      corrected sum of squares
USS      uncorrected sum of squares
CV       coefficient of variation
T        Student's t

Some statements that can be used with PROC MEANS:
- BY variable list;
(the data must first be sorted with PROC SORT).
- CLASS variable list;
- FREQ variable;
in computing the statistics, each observation is counted with a frequency equal to n, where n is the value of the FREQ variable. If the value of the FREQ variable is less than 1, the observation is not used in the computations. If the value is not an integer, only its integer part is used.
- OUTPUT OUT = SAS data set keywords = names...;
writes the statistics to a new data set. The keywords specify the statistics wanted in the new data set and the names of the variables that will contain them. The allowed keywords are those listed above.
- VAR variable list;
indicates the variables for which the statistics are computed.

SAS PROGRAM n. 31:
proc sort data= corso.disney out= dis;
by sesso;
proc means data=dis maxdec=2;
var altezza peso;
output out=sommario mean=m_alt m_peso;
by sesso;
proc print;
run;

SAS OUTPUT:
------------------------------ SESSO=f ------------------------------
Variable    N      Mean   Std Dev   Minimum   Maximum
--------------------------------------------------------------------
ALTEZZA     6    136.00     24.96    117.00    180.00
PESO        6     39.17     17.44     25.00     65.00
--------------------------------------------------------------------
------------------------------ SESSO=m ------------------------------
Variable    N      Mean   Std Dev   Minimum   Maximum
--------------------------------------------------------------------
ALTEZZA     5    140.00     30.82    120.00    190.00
PESO        5     38.80     12.13     30.00     54.00
--------------------------------------------------------------------

OBS   SESSO   _TYPE_   _FREQ_   M_ALT    M_PESO
  1   f          0        6      136    39.1667
  2   m          0        5      140    38.8000
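The CLASS statement is listed above without an example. The sketch below repeats the analysis of program n. 31 with CLASS in place of BY; the output data set name sommario2 is an arbitrary choice, and the statistics requested on the PROC statement are only one possible selection.

```sas
/* Sketch: group statistics with CLASS instead of BY.            */
/* No preliminary PROC SORT is required with CLASS.              */
proc means data=corso.disney maxdec=2 n mean std;
   class sesso;
   var altezza peso;
   output out=sommario2 mean=m_alt m_peso;   /* sommario2: hypothetical name */
run;

proc print data=sommario2;
run;
```

Unlike the BY version, the OUT= data set produced with CLASS also contains an observation with _TYPE_=0 holding the statistics for the whole data set, followed by one observation with _TYPE_=1 for each value of sesso.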


G5. THE FREQ PROCEDURE
This procedure counts the frequencies of the values and produces one-way or n-way tables with frequencies, cumulative frequencies, percentages and cumulative percentages.

The syntax is:
PROC FREQ [options];

The options that can be used are:
- DATA = SAS data set name   specifies the data set on which the statistics are computed. If omitted, the most recently created one is used.

- ORDER = FREQ | DATA | INTERNAL | FORMATTED   specifies the order in which the levels of the variable are reported: with FREQ the levels are sorted by decreasing frequency; with DATA they follow the order in which they appear in the input data; with INTERNAL (the default) they follow the internal value; with FORMATTED they follow the external, formatted value.

- PAGE   prints one table per page

Some statements used with FREQ:

- BY variables;
can be used to obtain separate analyses for the groups defined.

- TABLES requests / options;
(warning: requests and options must be separated by / )
specifies the variables on which the frequency table is built and its dimensions. A request consists of one or more variables joined by an asterisk.
Example:
TABLES ETA SESSO;     builds 2 one-way tables
TABLES ETA * SESSO;   builds 1 two-way table
If TABLES is omitted, one-way tables are built for all the variables.

Abbreviations:
TABLES A*(B C);       equivalent to TABLES A*B A*C;
TABLES (A B)*(C D);   equivalent to TABLES A*C A*D B*C B*D;
TABLES (A B C)*D;     equivalent to TABLES A*D B*D C*D;
TABLES (A--C);        equivalent to TABLES A B C;
TABLES (A--C)*D;      equivalent to TABLES A*D B*D C*D;
TABLES A--C*D;        ambiguous, cannot be used

Options of TABLES:

- MISSING   treats missing values as a valid class for the analysis.
- OUT = SAS data set name   writes to a data set the last (rightmost) table listed in the requests.
- CHISQ   requests a chi-square test
- NOFREQ   suppresses the printing of the frequencies
- NOPERCENT   suppresses the printing of the percentages
- NOROW   suppresses the row percentages
- NOCOL   suppresses the column percentages
- NOCUM   suppresses the printing of the cumulative frequencies and percentages
- NOPRINT   suppresses the printing of the tables
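Several of the TABLES options above can be combined in one request. The following sketch uses the data set corso.dis_eta built in program n. 27; the output data set name tab_e_sesso is an arbitrary choice.

```sas
/* Sketch: two-way table with a chi-square test, reduced cell    */
/* contents and the cell counts saved in a data set.             */
proc freq data=corso.dis_eta;
   tables e*sesso / chisq norow nocol nopercent out=tab_e_sesso;
run;

proc print data=tab_e_sesso;   /* tab_e_sesso: hypothetical name */
run;
```

The OUT= data set holds one observation per cell, with the variables E, SESSO, COUNT and PERCENT. With a data set as small as this one, the chi-square test is shown only to illustrate the syntax.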


SAS PROGRAM n. 32:
proc freq data= corso.es3;
tables altezza;
run;

SAS OUTPUT:
                                 Cumulative   Cumulative
ALTEZZA    Frequency   Percent    Frequency      Percent
-----------------------------------------------------
128.27          1        0.4          1           0.4
130.302         1        0.4          2           0.8
130.81          1        0.4          3           1.2
133.35          1        0.4          4           1.6
134.112         1        0.4          5           2.0
135.382         1        0.4          6           2.3
135.89          1        0.4          7           2.7
136.652         1        0.4          8           3.1
138.43          2        0.8         10           3.9
...
173.99          1        0.4        248          96.9
175.26          1        0.4        249          97.3
176.53          2        0.8        251          98.0
177.292         1        0.4        252          98.4
180.34          2        0.8        254          99.2
182.88          2        0.8        256         100.0

The (sorted) values taken by the specified variable, together with the corresponding absolute and relative frequencies, can be saved in a SAS data set (DSS) by means of the OUT=DSS name option of the TABLES statement.

SAS PROGRAM n. 33:
proc freq data= corso.es3;
tables altezza / out=freq_alt;
proc print;
run;

SAS OUTPUT of PROC PRINT:
OBS   ALTEZZA   COUNT   PERCENT
  1   128.270     1     0.39063
  2   130.302     1     0.39063
  3   130.810     1     0.39063
  4   133.350     1     0.39063
...
 65   176.530     2     0.78125
 66   177.292     1     0.39063
 67   180.340     2     0.78125
 68   182.880     2     0.78125

To keep the cumulative frequencies as well (absolute or relative), proceed as follows:

SAS PROGRAM n. 34:
data freq_al2;
set freq_alt;
fc_ass+count;
fc_perc+percent;
proc print;
run;


SAS OUTPUT:
OBS   ALTEZZA   COUNT   PERCENT   FC_ASS   FC_PERC
  1   128.270     1     0.39063       1     0.3906
  2   130.302     1     0.39063       2     0.7813
  3   130.810     1     0.39063       3     1.1719
  4   133.350     1     0.39063       4     1.5625
...
 65   176.530     2     0.78125     251     98.047
 66   177.292     1     0.39063     252     98.438
 67   180.340     2     0.78125     254     99.219
 68   182.880     2     0.78125     256    100.000

PROC FREQ is used above all to build contingency tables with two or more dimensions (warning: this makes sense for qualitative variables, or for quantitative variables that take a "small" number of values).

SAS PROGRAM n. 35:
proc freq data=corso.dis_eta;
tables e*sesso;
run;

SAS OUTPUT:
TABLE OF E BY SESSO
E         SESSO
Frequency|
Percent  |
Row Pct  |
Col Pct  |f       |m       |  Total
---------+--------+--------+
giovane  |      3 |      3 |      6
         |  27.27 |  27.27 |  54.55
         |  50.00 |  50.00 |
         |  50.00 |  60.00 |
---------+--------+--------+
vecchio  |      3 |      2 |      5
         |  27.27 |  18.18 |  45.45
         |  60.00 |  40.00 |
         |  50.00 |  40.00 |
---------+--------+--------+
Total           6        5       11
            54.55    45.45   100.00


DIVIDING A QUANTITATIVE VARIABLE INTO CLASSES WITH THE FORMAT PROCEDURE
Warning: from a statistical point of view, dividing a quantitative variable into classes is often an ARBITRARY operation that can distort the results.

SAS PROGRAM n. 36:
proc format;
value c_eta low - 16 = 'giovani'
16 - 18 = 'medi'
18 - high = 'vecchi';
value c_alt low - 145 = 'bassi'
145 - 165 = 'medi'
165 - high = 'alti';
value c_peso low - 40 = 'magri'
40 - 60 = 'medi'
60 - high = 'grassi';
run;

With the ranges written as above, a boundary value such as 16 is assigned to the first range that contains it, so the intervals are closed on the right. If intervals closed on the left are wanted, one should write:
value c_peso low -< 40 = 'magri'
40 -< 60 = 'medi'
60 -< high = 'grassi';

FREQ PROCEDURE:
proc freq data=corso.es3;
format peso c_peso. eta c_eta. altezza c_alt.;
tables eta*(peso altezza) / norow nocol nofreq;
run;

SAS OUTPUT:
TABLE OF ETA BY PESO
ETA       PESO
Percent  |magri   |medi    |grassi  |  Total
---------+--------+--------+--------+
giovani  |  24.22 |  22.66 |   0.39 |  47.27
---------+--------+--------+--------+
medi     |   5.47 |  24.22 |   1.56 |  31.25
---------+--------+--------+--------+
vecchi   |   0.39 |  14.84 |   6.25 |  21.48
---------+--------+--------+--------+
Total          77      158       21      256
            30.08    61.72     8.20   100.00

TABLE OF ETA BY ALTEZZA
ETA       ALTEZZA
Percent  |bassi   |medi    |alti    |  Total
---------+--------+--------+--------+
giovani  |  13.67 |  31.64 |   1.95 |  47.27
---------+--------+--------+--------+
medi     |   1.56 |  23.44 |   6.25 |  31.25
---------+--------+--------+--------+
vecchi   |   0.78 |   8.59 |  12.11 |  21.48
---------+--------+--------+--------+
Total          41      163       52      256
            16.02    63.67    20.31   100.00
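How a format treats the boundary values can be checked directly with the PUT function. The sketch below defines a left-closed version of c_peso and applies it to a few test values; the test values and the variable name classe are chosen only for illustration, and the last range is written here as 60 - high.

```sas
/* Sketch: check to which class the boundary values are assigned. */
proc format;
   value c_peso low -< 40  = 'magri'
                40  -< 60  = 'medi'
                60  - high = 'grassi';
run;

data _null_;
   do peso = 39.9, 40, 59.9, 60;
      classe = put(peso, c_peso.);   /* apply the format to the value */
      put peso= classe=;             /* named output, written to the Log */
   end;
run;
```

With these ranges the Log should show 39.9 classified as magri, 40 and 59.9 as medi, and 60 as grassi. Replacing -< with - in the VALUE statement and rerunning the step shows how the assignment of 40 and 60 changes.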


G6. THE UNIVARIATE PROCEDURE
This procedure produces descriptive statistics for numeric variables, such as extreme values, quantiles, frequency tables, box plots and stem-and-leaf plots. The syntax is:

PROC UNIVARIATE [options];

Some options are:
- DATA = SAS data set name
- NOPRINT   suppresses all printed output. It can be used when the aim is to create a new data set.
- PLOT   produces a stem-and-leaf plot, a box plot and a normal probability plot, which compares the empirical distribution function with that of the Normal distribution
- FREQ   creates a frequency table with the values of the variable, their frequencies, percentages and cumulative percentages
- NORMAL   performs a test of whether the data come from a Normal distribution
- PCTLDEF = value   specifies the formula used to compute the quantiles.
- VARDEF = DF | WEIGHT | N | WDF   specifies the divisor used in computing the variance (DF: the degrees of freedom, N-1; WEIGHT: the sum of the weights; N: the number of observations; WDF: the sum of the weights minus 1). The default is DF.
- ROUND = units   specifies the units used to round the values of the variables

To compute percentiles in addition to the default ones listed below (these options belong to the OUTPUT statement):
- PCTLNAME = percentile names   specifies the names of the percentiles
- PCTLPTS = percentile values   specifies which percentiles are computed
- PCTLPRE = variable prefixes   specifies the prefixes used in the output data set for the variables containing the new percentiles

For each variable the procedure prints: variable name, label, number of observations, weighted sum and sum of the weights of the observations, mean, sum, standard deviation, variance, skewness, kurtosis, USS, CSS, standard error of the mean, tests of the hypothesis mean = 0, number of observations different from 0, maximum, minimum, quartiles (Q1, median, Q3), interquartile range (Q3-Q1), range, mode, percentiles (1st, 5th, 10th, 90th, 95th, 99th), missing values (symbol, number, percentage of the total number of observations), ... .

Some statements used with the UNIVARIATE procedure are:
- VAR variables;
the statistics are computed for the variables specified
- BY variables;
used to obtain separate analyses for groups of observations
- FREQ variable;
each observation analysed in the data set is taken to represent n observations, where n is the value of the variable specified in the FREQ statement
- ID variable list;
the variables specified replace the column headed OBS
- WEIGHT variable;
when specified, PROC UNIVARIATE uses the value of the variable named after WEIGHT to compute the weighted mean and the weighted variance
- OUTPUT OUT = SAS data set keywords = names...;
writes the statistics to a new data set. The keywords specify the statistics wanted in the new data set and the names of the variables that will contain them


The allowed keywords are:
N NMISS NOBS MEAN SUM STD VAR SKEWNESS KURTOSIS SUMWGT MAX MIN RANGE Q3 MEDIAN Q1 QRANGE P1 P5 P10 P90 P95 P99 MODE SIGNRANK NORMAL
(with the meanings already given)

SAS PROGRAM n. 37:
proc univariate data=corso.es3 plot;
var altezza;
run;

SAS OUTPUT:
The UNIVARIATE Procedure
Variable: ALTEZZA

Moments
N                        256   Sum Weights               256
Mean              155.890516   Sum Observations    39907.972
Std Deviation      10.229599   Variance           104.644695
Skewness          -0.0091442   Kurtosis             -0.19828
Uncorrected SS    6247958.73   Corrected SS       26684.3972
Coeff Variation   6.56204063   Std Error Mean     0.63934994

Basic Statistical Measures
Location               Variability
Mean     155.8905      Std Deviation           10.22960
Median   156.2100      Variance               104.64470
Mode     156.2100      Range                   54.61000
                       Interquartile Range     15.24000

Tests for Location: Mu0=0
Test           -Statistic-     -----p Value------
Student's t    t  243.8266     Pr > |t|    <.0001
Sign           M       128     Pr >= |M|   <.0001
Signed Rank    S     16448     Pr >= |S|   <.0001

Quantiles (Definition 5)
Quantile      Estimate
100% Max       182.880
99%            180.340
95%            171.450
90%            168.910
75% Q3         163.322
50% Median     156.210
25% Q1         148.082
10%            143.002
5%             139.700
1%             130.810
0% Min         128.270

Extreme Observations
------Lowest-----     -----Highest-----
  Value     Obs         Value     Obs
128.270     116       177.292     183
130.302      11       180.340     176
130.810      95       180.340     181
133.350     174       182.880     213
134.112      43       182.880     241


Stem Leaf                                             #  Boxplot
  18 0033                                             4     |
  17 5777                                             4     |
  17 00011112334                                     11     |
  16 555555555566666666678888888888899999999         39     |
  16 0000000000111111111111122223333333344444        40  +-----+
  15 555566666666666666666666677777777777777888999   45  *--+--*
  15 00000011111111111222222222222344444444444444    44  |     |
  14 55556666666667777777777888888999                32  +-----+
  14 000000011222223333444444444                     27     |
  13 56788                                            5     |
  13 0134                                             4     |
  12 8                                                1     |
     ----+----+----+----+----+----+----+----+----+
Multiply Stem.Leaf by 10**+1

Normal Probability Plot

(part of the output is not shown)

G7. OTHER ELEMENTARY STATISTICAL PROCEDURES
SUMMARY     computes descriptive statistics on the numeric variables of a SAS data set. It is similar to PROC MEANS. By default it produces no printed output.
TABULATE    builds tables of descriptive statistics.
CORR        computes the correlation coefficients between variables.

G8. SOME PROCEDURES THAT OPERATE ON SAS DATA SETS
APPEND      adds observations to a SAS data set
COMPARE     compares the values of the variables in two SAS data sets
CONTENTS    writes descriptions of the contents of one or more SAS data sets
COPY        copies an entire SAS library, or parts of it
DATASETS    manages the data sets of a SAS library
TRANSPOSE   creates a new SAS data set by swapping observations and variables


H. GRAPHICS STATEMENTS AND PROCEDURES

H1. INTRODUCTION
The examples for the SAS/GRAPH module use data on students of the degree courses in Mathematics, Computer Science and Biology of the Università del Piemonte Orientale who attended the course Modelli Matematici e Statistici in the academic year 1998/1999. The questionnaire and the coding of the qualitative variables are reported below.

No. | SEX (M,F) | HEIGHT | WEIGHT | DEGREE COURSE | SHOE SIZE | EYE COLOUR | HAIR COLOUR | SPORTS ACTIVITY | HIGH-SCHOOL DIPLOMA
1   |           |        |        |               |           |            |             |                 |

DEGREE COURSE: M = mathematics, B = biology, I = computer science

EYE COLOUR: 1 = dark, 2 = green, 3 = blue

HAIR COLOUR: 1 = dark, 2 = brown, 3 = blond

SPORTS ACTIVITY: 1 = none, 2 = moderate, 3 = high

DIPLOMA: 1 = liceo scientifico, 2 = liceo classico, 3 = istituto tecnico, 4 = istituto magistrale, 5 = other

H2. THE GCHART PROCEDURE
The GCHART procedure produces: bar charts, vertical or horizontal; three-dimensional bar charts (block charts); pie charts; star charts. Some basic information must be supplied:

- the type of chart wanted;
- the variable(s) to analyse;
- the meaning of the height of each bar (block), or of the area of each slice of the pie chart;
- the criterion by which the values of the analysis variable(s) are grouped before being charted.

The syntax of the procedure is:
PROC GCHART [options];
BY variables;
statements that specify the type of chart:
VBAR variables / [options];    vertical bar charts
HBAR variables / [options];    horizontal bar charts
BLOCK variables / [options];   three-dimensional bar charts
PIE variables / [options];     pie charts
STAR variables / [options];    star (polar) charts

The options of the PROC GCHART statement are:

- DATA = SAS data set name

- ANNOTATE = SAS data set name
specifies the data set used for the annotations; this data set must be built in a particular way

- GOUT = SAS catalog name
used to produce permanent graphics output


SAS PROGRAM n. 38:
libname corso 'a:\corsosas';
proc gchart data=corso.mms;
hbar claurea;
vbar claurea;
block claurea;
run;

SAS OUTPUT: (charts not reproduced here)


Main options of the VBAR and HBAR statements.

- RAXIS = values | AXISn
- MAXIS = values | AXISn
specify the description of the axes (the frequency axis and the axis of the charted variable, respectively); the AXISn statement must have been defined beforehand

SAS PROGRAM n. 39:
goptions vsize=6 cm hsize=12 cm ftext=swiss htext=1;
axis1 label=(h=2) length=5 cm;
axis2 label=(h=1.5) length=3 cm;
proc gchart data=corso.mms;
vbar claurea / maxis=axis1 raxis=axis2;
run;

SAS OUTPUT: (chart not reproduced here)

DISCRETE

Specifies that the numeric variable being analysed takes discrete values. If the option is not present, the procedure assumes that the variable takes continuous values and, when the MIDPOINTS= option is absent, chooses automatically the midpoints of the classes represented by the bars or slices.

SAS PROGRAM n. 40:
proc gchart data=corso.mms;
vbar scarpa;
vbar scarpa / discrete;
run;

SAS OUTPUT: (charts not reproduced here)


GROUP = variable
produces side-by-side charts, each representing the observations that have a given value of the variable

SAS PROGRAM n. 41:
proc gchart data=corso.mms;
vbar sesso / group = sport discrete;
block sesso / group = sport discrete;
run;

SAS OUTPUT: (charts not reproduced here)

SUBGROUP = variable
divides each bar or block into as many parts as there are distinct values of the variable indicated

SAS PROGRAM n. 42:
pattern1 c=black v=x3;
pattern2 c=black v=r3;
proc gchart data=corso.mms;
vbar sport / subgroup=sesso discrete;
run;

SAS OUTPUT: (chart not reproduced here)


PATTERNID = SUBGROUP | GROUP | MIDPOINT | BY
requests a change of fill pattern every time the variables defined by the SUBGROUP, GROUP or MIDPOINT options, or by the BY statement, change value. To define the patterns oneself, the statements PATTERN1 ... PATTERNm are used

SAS PROGRAM n. 43:
pattern1 c=black v=x5;
pattern2 c=black v=s;
proc gchart data=corso.mms;
vbar sesso / group=occhi discrete;
vbar sesso / group=occhi discrete patternid=group;
run;
goptions reset=all;

SAS OUTPUT: (charts not reproduced here)

The following two options are used when the GROUP option is defined; they indicate, respectively, the space to leave between the bars of different groups and the description of the group axis

- GSPACE = n

- GAXIS = AXISn


MIDPOINTS = values
defines the set of midpoints associated with the bars or slices of the chart; these values, numeric or character, can be listed in the following forms:

1) a list of numeric values, for example: 5 10 15 20 25 30;
2) an iterative form such as: 5 TO 30 BY 5, which generates the following midpoints with step 5: 5 10 15 20 25 30;
3) an iterative form such as: 10 TO 1 BY -1, which generates the following midpoints with step -1: 10 9 8 7 6 5 4 3 2 1;
4) a list of quoted strings such as: 'Gennaio' 'Febbraio' 'Marzo' etc.

SAS PROGRAM n. 44:
proc gchart data=corso.mms;
vbar altezza / midpoints=150 to 190 by 20;
vbar altezza / midpoints=150 to 190 by 10;
run;

SAS PROGRAM n. 45:
data part;
input p @@;
datalines;
117 117 118 119 121 122 122 126 127 128
128 128 129 129 129 132 133 134 135 136
137 138 139 141 142 144 148 150 155 156
;
proc gchart data=part;
hbar p / midpoints= 120 135 150;
hbar p / midpoints =123 138 153;
run;


TYPE = keyword
specifies the quantity to which the size of each bar or slice of the chart is proportional; the keyword can be:
- FREQ   the frequency with which a value occurs
- PERCENT   the percentage of observations that take a given value
- CFREQ or CPERCENT   the cumulative frequency or the cumulative percentage, respectively
- SUM   each bar represents the sum of the values of the variable specified in the SUMVAR= option
- MEAN   the mean of the values of the variable specified in the SUMVAR= option

SUMVAR = variable
names a variable whose values are processed according to the TYPE= option.

Examples of the use of SUMVAR and TYPE:
SAS PROGRAM n. 46:
proc gchart data=corso.mms;
vbar scarpa / discrete type=mean sumvar=altezza;
vbar scarpa / discrete type=mean sumvar=altezza group=sesso;
run;

SAS OUTPUT: (charts not reproduced here)


LEVELS = n
when the variable to chart takes continuous values, this option specifies that the bar chart must have n bars

SAS PROGRAM n. 47:
proc gchart data=corso.mms;
vbar altezza / levels=7;
run;

REF = list
specifies a list of reference lines to draw on the frequency axis

SPACE = n
specifies the space between the bars

WIDTH = n
specifies the width of the bars

SAS PROGRAM n. 48:
proc gchart data=corso.mms;
vbar diploma / discrete ref= 10 30 space=0 width=5;
run;

SAS OUTPUT: (chart not reproduced here)


Further options for VBAR and HBAR:

CTEXT = colour - COUTLINE = colour
specify the colour of the text and of the outlines, respectively.

MISSING
specifies that missing values are treated as valid values, with their own bars or slices.

CAXIS = colour
specifies the colour of the axes

G100
is used when the GROUP option is present and forces the bars to add up to 100 % within each group

LEGEND = LEGENDn
specifies the legend to associate with each chart (when one is expected, for example if the SUBGROUP option is specified); the LEGENDn statement must have been defined beforehand

NOLEGEND
omits the legend used for each subgroup

ASCENDING DESCENDING
print the bars in increasing (respectively decreasing) order of frequency

FRAME CFRAME = colour
the first draws a frame around the area containing the chart; if the CFRAME option is not present, the colour of the frame is the same as that of the axes

NOZEROS
suppresses every bar whose value is zero

NOAXIS
suppresses the printing of the axes

MINOR = n
specifies the number of minor tick marks printed between the major tick marks on the frequency axis

Further options for HBAR:

NOSTATS
specifies that no statistics are printed in a horizontal bar chart

FREQ and CFREQ
print beside the chart the frequency corresponding to each bar (with CFREQ, the cumulative frequency)

PERCENT and CPERCENT
print the percentage for each bar (with CPERCENT, the cumulative percentage)

SUM and MEAN
print, respectively, the sum and the mean of the SUMVAR= variable over the observations represented by each bar

NOSYMBOL
omits the legend of the symbols used for each subgroup
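The HBAR statistics options can be tried on the variable diploma used in program n. 48. This is only a sketch, and the particular combination of options is an arbitrary choice.

```sas
/* Sketch: statistics printed beside a horizontal bar chart.     */
proc gchart data=corso.mms;
   hbar diploma / discrete freq percent;   /* frequency and percentage only */
   hbar diploma / discrete nostats;        /* no statistics at all          */
run;
```

The first chart prints only the frequency and the percentage next to each bar, the second one prints none of the statistics.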


H3. THE GPLOT PROCEDURE

The procedure produces Cartesian plots, that is, it shows in the plane how one variable behaves as another one varies.

The syntax is as follows:

PROC GPLOT [options] ;

The options of the procedure are:
- DATA = SAS data set name
- ANNOTATE = SAS data set name
- GOUT = SAS catalog name
  (these three are used as in the GCHART procedure)
- UNIFORM requests the same axis scales in every plot when the BY statement is present (so that the plots for the different levels of the BY variable can be compared)

The main statements of the procedure are:

- BY variable ; with the usual meaning

- PLOT requests / options ;
- PLOT2 requests / options ;
where requests is one of:
1) a list of: vert_var * horiz_var
2) a list of: vert_var * horiz_var = n
3) a list of: vert_var * horiz_var = variable
A request names the vertical and horizontal variables to be plotted (1), optionally drawing the points with the n-th symbol, which can be defined with a SYMBOLn statement (2), or using a third variable whose values determine how each point is drawn (3).

SAS PROGRAM no. 49:

proc gplot data=corso.mms;
   plot altezza*peso=1;
run;

SAS OUTPUT: [figure: plot of altezza against peso]

Several plots can be requested at once (one per page, unless the OVERLAY option described below is used).

The PLOT2 statement adds a second vertical axis on the right of the plots produced by the PLOT statement. It has the same syntax as the PLOT statement.
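As a minimal sketch of PLOT used together with PLOT2, the step below plots two variables against the same horizontal variable, one on each vertical axis. The data set growth and its variables age, height and weight are invented for illustration and do not belong to the course data.

```sas
/* Hypothetical data: height (cm) and weight (kg) at a few ages */
data growth;
   input age height weight;
   datalines;
10 138 32
12 149 40
14 163 51
16 172 61
18 175 66
;
run;

proc gplot data=growth;
   plot  height*age;   /* scale on the left vertical axis  */
   plot2 weight*age;   /* scale on the right vertical axis */
run;
quit;
```

Both requests share the horizontal variable age, which is what allows the two vertical scales to be read against the same axis.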


The main options of the PLOT statement are:

General options:

ANNOTATE = SAS data set name

AREAS = n specifies which areas above or below the plotted lines are to be filled; the areas are numbered from bottom to top (the area between the horizontal axis and the "lowest" plot is area 1, the area between that plot and the next "higher" one is area 2, and so on); the fill patterns can be customized with the statements PATTERN1 ... PATTERNn.

LEGEND = LEGENDn - NOLEGEND the first option specifies the legend to associate with each plot (when a legend is produced); the LEGENDn statement must have been defined beforehand; the second option omits the legend

OVERLAY produces overlaid plots

SKIPMISS breaks the line joining the points wherever there are missing values

Reference lines:

AUTOHREF - AUTOVREF automatically draw reference lines at the major tick marks of the horizontal (resp. vertical) axis

HREF = values - VREF = values draw a reference line at each of the given values on the horizontal (resp. vertical) axis; HREF lines are therefore vertical and VREF lines horizontal

CHREF = color - CVREF = color - LHREF = n - LVREF = n specify the colors and the line types of the reference lines

GRID draws a grid

Axis definition:

NOAXIS suppresses the drawing of the axes, their labels and their values

CAXIS = color - CTEXT = color specify the color of the axis lines and of the text on the axes, respectively

FRAME - CFRAME = color the first draws a frame around the area containing the plot, in the same color as the axes; the second fills the framed area with the given color

HAXIS = values | AXISn - VAXIS = values | AXISn specify the tick marks of the horizontal (resp. vertical) axis; if AXISn is used, the corresponding AXISn statement must be present

Examples using values:

PLOT y*x / VAXIS=10 TO 100 BY 5;
PLOT y*x / VAXIS=10 100 1000 10000;

HMINOR = n - VMINOR = n specify the number of minor tick marks drawn between the major tick marks

HZERO - VZERO request that the tick marks on the horizontal (resp. vertical) axis start from the origin

VREVERSE reverses the order of the values on the vertical axis
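The sketch below combines some of the PLOT options listed above on one plot. The data set curve is generated inside the example and is purely illustrative; the option values were chosen only to fit those data.

```sas
/* Hypothetical data: y = x**2 - 1 on a grid of x values */
data curve;
   do x = -2 to 2 by 0.5;
      y = x**2 - 1;
      output;
   end;
run;

symbol1 i=join v=dot;

proc gplot data=curve;
   plot y*x / href=0              /* vertical reference line at x = 0   */
              vref=0              /* horizontal reference line at y = 0 */
              haxis=-2 to 2 by 1  /* major ticks on the horizontal axis */
              vaxis=-1 to 3 by 1  /* major ticks on the vertical axis   */
              hminor=1 vminor=1   /* one minor tick between major ticks */
              frame;
run;
quit;
```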


H4. SOME STATEMENTS FOR GRAPHICAL OUTPUT

The following statements customize the graphical output. The statement:

GOPTION options ;

sets the defaults for graphical output. It must be placed before the graphics procedures and stays in effect until a later GOPTION statement changes it. Some options are:

- for text: CTEXT = color FTEXT = font HTEXT = n apply to all text (the statements described below can change some specific text)

- for titles: CTITLE = color FTITLE = font HTITLE = n apply to titles (the statements described below can change some specific titles)

Valid colors:
B blue      C cyan      W white     A gray|grey
R red       P pink      K black     N brown
G green     Y yellow    M magenta   O orange

The font tables are on pages 166-174 of the SAS GRAPH manual, vol. 1.

- for the size of the graphical output: VSIZE = n [IN | CM | CELL | PCT] HSIZE = n [IN | CM | CELL | PCT]

- for the type of graphics device: DEVICE = name, where the name is, for example, winprtg for a grayscale printer or winprtc for a color printer

- to cancel graphics statements assigned earlier: RESET = ALL | (statement, statement, ...); it is useful to put this option at the beginning of every program.

Example:

goption device=winprtg vsize=15 cm hsize=18 cm ftext=swiss htext=1;


The statements described below can be placed either outside or inside the graphics procedures.

The following two statements work as already described for non-graphical output.

TITLEn [options] 'text' ;

FOOTNOTEn [options] 'text' ;

Some options specific to graphical output are:

- for the color C = color

- for the height H = n

- for the font F = font

- for the position D = (coordinates)

The following statement specifies the fill pattern of the bars (lines slanting to the right, to the left, crossed, empty, solid) when a bar chart is subdivided into subgroups.

PATTERNn options ;

Some options are:

- for the type of pattern V = value

PATTERN TABLE: [figure]


The following statement specifies the characters used to draw the plots and the way the points are joined.

SYMBOLn options ;

Some options are:

- for the height H = n

- for the character plotted at each point of the graph V = value

- to state whether and how the points are interpolated I = NONE | JOIN | HILO | ...

- for the font F = font

- for the width W = n

- for the line type (dash pattern) of the lines joining the points L = n
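A minimal sketch of a SYMBOLn statement using the options just listed. The data set series and its variables t and y are hypothetical; the request y*t=1 asks GPLOT to use the first symbol definition.

```sas
/* Hypothetical data: a short series y observed at times t */
data series;
   input t y;
   datalines;
1 3
2 5
3 4
4 7
5 6
;
run;

goptions reset=symbol;                      /* cancel earlier SYMBOL statements      */
symbol1 v=dot h=1.5 i=join w=2 l=2 c=blue;  /* dots joined by a thick non-solid line */

proc gplot data=series;
   plot y*t=1;
run;
quit;
```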


The following statement specifies how the axes are displayed.

AXISn options ;

Some options are:

- for the color C = color

- for the axis label LABEL = NONE | (C = color | H = n[IN | CM | CELL | PCT] | F = font | ...)

- for the tick mark values VALUE = NONE | (C = color | H = n[IN | CM | CELL | PCT] | F = font | ...)

- for the line type (dash pattern) of the axes STYLE = n

- for the width in pixels of the axis line WIDTH = n

- for the length of the axis line LENGTH = n [IN | CM | CELL | PCT]

- to define the coordinates of the origin ORIGIN = coordinates

- to define the values of the tick marks ORDER = list

- for the major tick marks MAJOR = NONE | (C = color | H = n[IN | CM | CELL | PCT] | N = n | W = n | ...)

- for the minor tick marks MINOR = NONE | (C = color | H = n[IN | CM | CELL | PCT] | N = n | W = n | ...)

The following statement specifies the appearance of the legends that appear in the plots.

LEGENDn options ;

Some options are:

- for the legend label LABEL = NONE | (C = color | H = n[IN | CM | CELL | PCT] | F = font | ...)

- for the text of the legend entries VALUE = NONE | (C = color | H = n[IN | CM | CELL | PCT] | F = font | ...)
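The following sketch shows AXISn and LEGENDn statements being defined and then referenced from a PLOT statement through HAXIS=, VAXIS= and LEGEND=. The data set groups, its variables g, x and y, and all label texts are assumptions made for the example.

```sas
/* Hypothetical data: two groups of points */
data groups;
   input g $ x y;
   datalines;
A 1 4
A 4 9
A 8 15
B 2 3
B 5 8
B 9 18
;
run;

symbol1 v=dot    c=blue;
symbol2 v=square c=red;

axis1 label=(h=1.2 'x value') order=(0 to 10 by 2) minor=(n=1);
axis2 label=(a=90 'y value')  order=(0 to 20 by 5) minor=none;
legend1 label=('Group') value=(h=1);

proc gplot data=groups;
   plot y*x=g / haxis=axis1 vaxis=axis2 legend=legend1;
run;
quit;
```

Because the request has the form y*x=g, one symbol is used for each value of g and the legend identifies the groups.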


H5. THE BOXPLOT PROCEDURE

SAS PROGRAM no. 50:

/* PROC BOXPLOT needs the data sorted by the grouping variable */
proc sort data=corso.mms out=mms; by sesso; run;
proc boxplot data=mms;
   plot altezza*sesso / boxstyle=schematicid;
run;
proc sort data=corso.mms out=mms; by sport; run;
proc boxplot data=mms;
   plot altezza*sport / boxstyle=schematicid;
run;
quit;

SAS OUTPUT: [figures: box plots of altezza by sesso and by sport]


A complete example of the use of the GPLOT procedure:

SAS PROGRAM no. 51:

libname corso 'a:\mms';
goption reset=all;
goption device=winprtg hsize=18 cm vsize=25 cm ftext=swiss htext=1;
goption reset=(symbol,axis,title);
/* one symbol definition per level of NAZIONE */
symbol1 c=blue f=swissb v=-;
symbol2 c=green f=swissb v=@;
symbol3 c=red v=dot;
symbol4 c=pink f=swissb v=x;
symbol5 c=yellow f=swissb v=#;
symbol6 c=cyan f=swissb v=+;
proc gplot data=corso.ocp2;
   plot PRIN2*PRIN1=NAZIONE / href=0 vref=0 frame;
   title h=1.5 'Carta delle osservazioni';
run;

/* Annotate data set: writes the text _NAME_ at the point (COL1, COL2) */
data ANNOTA;
   set corso.MATP;
   x=COL1; y=COL2;
   xsys='2'; ysys='2';   /* coordinates expressed in data values */
   text=_NAME_;
   size=1;
   label y='secondo asse';
   label x='primo asse';
   keep X Y TEXT SIZE XSYS YSYS;
run;
goption reset=(symbol,axis);
symbol1 v=none;
axis1 order=-1 to 1 by 0.2 length=15 cm;
proc gplot data=ANNOTA;
   plot Y*X=1 / href=0 vref=0 annotate=ANNOTA frame haxis=axis1 vaxis=axis1;
   title h=1.5 'Carta delle variabili';
run;

SAS OUTPUT: [figures: 'Carta delle osservazioni' and 'Carta delle variabili']


I. ERRORS AND READING THE LOG (taken from SASOnlineDOC)

Error Processing and Debugging

Definitions

SAS performs error processing during both the compilation and the execution phases of SAS processing. You can debug SAS programs by understanding processing messages in the SAS log and then fixing your code. You can use the DATA Step Debugger to detect logic errors in a DATA step during execution. SAS recognizes five types of errors.

This type of error ... | occurs when ... | and is detected at ...
Syntax | programming statements do not conform to the rules of the SAS language | compile time
Semantic | the language element is correct, but the element may not be valid for a particular usage | compile time
Execution-time | SAS attempts to execute a program and execution fails | execution time
Data | data values are invalid | execution time
Macro-related | you use the macro facility incorrectly | macro compile time or execution time, DATA or PROC step compile time or execution time

Types of Errors

Syntax Errors

Syntax errors occur when program statements do not conform to the rules of the SAS language. Examples of syntax errors include

- misspelling a SAS keyword
- using unmatched quotation marks
- forgetting a semicolon
- specifying an invalid statement option
- specifying an invalid data set option.

When SAS encounters a syntax error, it first attempts to correct the error by attempting to interpret what you meant, then continues processing your program based on its assumptions. If SAS cannot correct the error, it prints an error message to the log. In the following example, the DATA statement is misspelled, and SAS prints a warning message to the log. Because SAS could interpret the misspelled word, the program runs and produces output.

date temp;
x=1;
run;
proc print data=temp;
run;


SAS Log: Syntax Error (misspelled key word)

1    date temp;
     ----
     14
WARNING 14-169: Assuming the symbol DATA was misspelled as date.
2    x=1;
3    run;
NOTE: The data set WORK.TEMP has 1 observations and 1 variables.
NOTE: DATA statement used:
      real time 0.17 seconds
      cpu time 0.04 seconds
4
5    proc print data=temp;
6    run;
NOTE: PROCEDURE PRINT used:
      real time 0.14 seconds
      cpu time 0.03 seconds

Some errors are explained fully by the message that SAS prints in the log; other error messages are not as easy to interpret because SAS is not always able to detect exactly where the error occurred. For example, when you fail to end a SAS statement with a semicolon, SAS does not always detect the error at the point where it occurs because SAS statements are free-format (they can begin and end anywhere). In the following example, the semicolon at the end of the DATA statement is missing. SAS prints the word ERROR in the log, identifies the possible location of the error, prints an explanation of the error, and stops processing the DATA step.

data temp
x=1;
run;
proc print data=temp;
run;

SAS Log: Syntax Error (missing semicolon)

1    data temp
2    x=1;
      -
      76
ERROR 76-322: Syntax error, statement will be ignored.
3    run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used:
      real time 0.11 seconds
      cpu time 0.02 seconds
4
5    proc print data=temp;
ERROR: File WORK.TEMP.DATA does not exist.
6    run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used:
      real time 0.06 seconds
      cpu time 0.01 seconds

Whether subsequent steps are executed depends on which method of running SAS you use, as well as on your operating environment.


Semantic Errors

Semantic errors occur when the form of the elements in a SAS statement is correct, but the elements are not valid for that usage. Semantic errors are detected at compile time and can cause SAS to enter syntax check mode. Examples of semantic errors include

- specifying the wrong number of arguments for a function
- using a numeric variable name where only a character variable is valid
- using illegal references to an array.

In the following example, SAS detects an illegal reference to the array ALL.

data _null_;
array all{*} x1-x5;
all=3;
datalines;
1 1.5
. 3
2 4.5
3 2 7
3 . .
;
run;

SAS Log: First Example of a Semantic Error

cpu time 0.02 seconds
1    data _null_;
2    array all{*} x1-x5;
ERROR: Illegal reference to the array all.
3    all=3;
4    datalines;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used:
      real time 2.28 seconds
      cpu time 0.06 seconds
10   ;
11

The following is another example of a semantic error. In this DATA step, the libref SOMELIB has not been previously assigned in a LIBNAME statement.

data test;
set somelib.old;
run;

SAS Log: Second Example of a Semantic Error

cpu time 0.00 seconds
1    data test;
ERROR: Libname SOMELIB is not assigned.
2    set somelib.old;
3    run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped there were 0 observations and 0 variables.
NOTE: DATA statement used:
      real time 0.17 seconds


Execution-Time Errors

Definition

Execution-time errors occur when SAS executes a program that contains data values. Most execution-time errors produce warning messages or notes in the SAS log but allow the program to continue executing (see footnote 1). The location of an execution-time error is usually given as line and column numbers in a note or error message. Common execution-time errors include the following:

- illegal arguments to functions
- illegal mathematical operations (for example, division by 0)
- observations in the wrong order for BY-group processing
- reference to a nonexistent member of an array (occurs when the array's subscript is out of range)
- open and close errors on SAS data sets and other files in INFILE and FILE statements
- INPUT statements that do not match the data lines (for example, an INPUT statement in which you list the wrong columns for a variable or fail to indicate that the variable is a character variable).

Out-of-Resources Condition

An execution-time error can also occur when you encounter an out-of-resources condition, such as a full disk, or insufficient memory for a SAS procedure to complete. When these conditions occur, SAS attempts to find resources for current use. For example, SAS may ask the user for permission to delete temporary data sets that might no longer be needed, or to free the memory in which macro variables are stored. When an out-of-resources condition occurs in a windowing environment, you can use the SAS CLEANUP system option to display a requestor panel that enables you to choose how to resolve the error. When you run SAS in batch, noninteractive, or interactive line mode, the operation of CLEANUP depends on your operating environment. For more information about this system option, see CLEANUP in the "SAS System Options" chapter in SAS Language Reference: Dictionary, and in the SAS documentation for your operating environment.

Examples

In the following example, an execution-time error occurs when SAS uses data values from the second observation to perform the division operation in the assignment statement. Division by 0 is an illegal mathematical operation and causes an execution-time error.

options linesize=64 nodate pageno=1 pagesize=25;
data inventory;
input Item $ 1-14 TotalCost 15-20
UnitsOnHand 21-23;
UnitCost=TotalCost/UnitsOnHand;
datalines;
Hammers       440   55
Nylon cord    35    0
Ceiling fans  1155  30
;
proc print data=inventory;
format TotalCost dollar8.2 UnitCost dollar8.2;
run;

SAS Log: Execution-Time Error

cpu time 0.02 seconds
1
2    options linesize=64 nodate pageno=1 pagesize=25;
3
4    data inventory;
5    input Item $ 1-14 TotalCost 15-20
6    UnitsOnHand 21-23;
7    UnitCost=TotalCost/UnitsOnHand;
8    datalines;
NOTE: Division by zero detected at line 12 column 22.
RULE:----+----1----+----2----+----3----+----4----+----5----+----
10   Nylon cord    35    0
Item=Nylon cord TotalCost=35 UnitsOnHand=0 UnitCost=. _ERROR_=1 _N_=2
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 12:22
NOTE: The data set WORK.INVENTORY has 3 observations and 4 variables.
NOTE: DATA statement used:
      real time 2.78 seconds
      cpu time 0.08 seconds
12   ;
13
14   proc print data=inventory;
15   format TotalCost dollar8.2 UnitCost dollar8.2;
16   run;
NOTE: There were 3 observations read from the dataset WORK.INVENTORY.
NOTE: PROCEDURE PRINT used:
      real time 2.62 seconds

SAS Output: Execution-Time Error

                   The SAS System                    1

                         Total     Units
Obs    Item              Cost     OnHand    UnitCost
  1    Hammers         $440.00      55        $8.00
  2    Nylon cord       $35.00       0          .
  3    Ceiling fans   $1155.00      30       $38.50

SAS executes the entire step, assigns a missing value for the variable UnitCost in the output, and writes the following to the SAS log:

- a note
- the values stored in the input buffer
- the contents of the program data vector at the time the error occurred
- a note explaining the error.
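One common way to avoid the note above is to test the divisor before dividing. The following is a minimal sketch of that idea applied to the same data; the explicit IF/ELSE guard is an illustration added here, not part of the SAS documentation example.

```sas
data inventory;
   input Item $ 1-14 TotalCost 15-20 UnitsOnHand 21-23;
   /* divide only when the divisor is positive; otherwise leave UnitCost missing */
   if UnitsOnHand > 0 then UnitCost = TotalCost / UnitsOnHand;
   else UnitCost = .;
   datalines;
Hammers       440   55
Nylon cord    35    0
Ceiling fans  1155  30
;
run;

proc print data=inventory;
   format TotalCost dollar8.2 UnitCost dollar8.2;
run;
```

The second observation still has a missing UnitCost, but the division is never attempted for it, so _ERROR_ is not set by this statement.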

Note that the values listed in the program data vector include the _N_ and _ERROR_ automatic variables. These automatic variables are assigned temporarily to each observation and are not stored with the data set.

In the following example of an execution-time error, the program processes an array and SAS encounters a value of the array's subscript that is out of range. SAS prints an error message to the log and stops processing.

options linesize=64 nodate pageno=1 pagesize=25;
data test;
array all{*} x1-x3;
input I measure;
if measure > 0 then
all{I} = measure;
datalines;
1 1.5
. 3
2 4.5
;
proc print data=test;
run;

cpu time 0.02 seconds
1    options linesize=64 nodate pageno=1 pagesize=25;
2
3    data test;
4    array all{*} x1-x3;
5    input I measure;
6    if measure > 0 then
7    all{I} = measure;
8    datalines;
ERROR: Array subscript out of range at line 12 column 7.
RULE:----+----1----+----2----+----3----+----4----+----5----+----
10   . 3
x1=. x2=. x3=. I=. measure=3 _ERROR_=1 _N_=2
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped there were 1 observations and 5 variables.
NOTE: DATA statement used:
      real time 0.90 seconds
      cpu time 0.09 seconds
12   ;
13
14   proc print data=test;
15   run;
NOTE: There were 1 observations read from the dataset WORK.TEST.
NOTE: PROCEDURE PRINT used:
      real time 0.81 seconds

Data Errors

Data errors occur when some data values are not appropriate for the SAS statements that you have specified in the program. For example, if you define a variable as numeric, but the data value is actually character, SAS generates a data error. SAS detects data errors during program execution and continues to execute the program, and does the following:

- writes an invalid data note to the SAS log.
- prints the input line and column numbers that contain the invalid value in the SAS log. Unprintable characters appear in hexadecimal. To help determine column numbers, SAS prints a rule line above the input line.
- prints the observation under the rule line.
- sets the automatic variable _ERROR_ to 1 for the current observation.

In this example, a character value in the Number variable results in a data error during program execution:

options linesize=64 nodate pageno=1 pagesize=25;
data age;
input Name $ Number;
datalines;
Sue 35
Joe xx
Steve 22
;
proc print data=age;
run;

The SAS log shows that there is an error in line 61, position 5-6 of the program.

SAS Log: Data Error

cpu time 0.01 seconds
1
2    options linesize=64 nodate pageno=1 pagesize=25;
3
4    data age;
5    input Name $ Number;
6    datalines;
NOTE: Invalid data for Number in line 61 5-6.
RULE:----+----1----+----2----+----3----+----4----+----5----+----
8    Joe xx
Name=Joe Number=. _ERROR_=1 _N_=2
NOTE: The data set WORK.AGE has 3 observations and 2 variables.
NOTE: DATA statement used:
      real time 0.06 seconds
      cpu time 0.02 seconds
10   ;
11
12   proc print data=age;
13   run;
NOTE: There were 3 observations read from the dataset WORK.AGE.
NOTE: PROCEDURE PRINT used:
      real time 0.01 seconds

SAS Output: Data Error

         The SAS System          1

Obs    Name     Number
  1    Sue        35
  2    Joe         .
  3    Steve      22

You can also use the INVALIDDATA= system option to assign a value to a variable when your program encounters invalid data. For more information, see the INVALIDDATA= system option in SAS Language Reference: Dictionary.

Format Modifiers for Error Reporting

The INPUT statement uses the ? and the ?? format modifiers for error reporting. The format modifiers control the amount of information that is written to the SAS log. Both the ? and the ?? modifiers suppress the invalid data message. However, the ?? modifier also sets the automatic variable _ERROR_ to 0. For example, these two sets of statements are equivalent:

- input x ?? 10-12;
- input x ? 10-12;
  _error_=0;

In either case, SAS sets the invalid values of X to missing values.
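As a sketch of the ?? modifier in use, the data error example shown earlier can be rewritten with column input so that the invalid value is read silently. The column positions 1-5 and 7-8 are an assumption made for this illustration; the data lines are aligned accordingly.

```sas
data age;
   /* ?? suppresses the invalid data note and resets _ERROR_ to 0 */
   input Name $ 1-5 Number ?? 7-8;
   datalines;
Sue   35
Joe   xx
Steve 22
;
run;

proc print data=age;
run;
```

Number is still missing for Joe, but the log no longer shows the invalid data note or the dump of the observation.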

Macro-related Errors

Several types of macro-related errors exist:

- macro compile time and macro execution-time errors, generated when you use the macro facility itself
- errors in the SAS code produced by the macro facility.

For more information about macros, see SAS Macro Language: Reference.

FOOTNOTE 1: When you run SAS in noninteractive mode, more serious errors can cause SAS to enter syntax check mode and stop processing the program.

Error Processing

Syntax Check Mode

If a DATA step has a syntax error, SAS can enter syntax check mode. SAS internally sets the OBS= option to 0 and the REPLACE/NOREPLACE option to NOREPLACE. When these options are in effect, SAS

- reads the remaining statements in the DATA step
- checks that statements are valid SAS statements
- executes global statements
- identifies any other errors that it finds
- creates the descriptor portion of any output data sets that are specified in program statements
- does not write any observations to new data sets that SAS creates
- does not execute most of the subsequent DATA steps or procedures in the program (exceptions include PROC DATASETS and PROC CONTENTS).

Note: Any data sets that are created after SAS has entered syntax check mode do not replace existing data sets with the same name.

How Different Modes Process Errors When SAS encounters most syntax or semantic errors, SAS underlines the point where it detects the error and identifies the error by number. If SAS encounters a syntax error when you run noninteractive SAS programs or batch jobs, it enters syntax check mode and remains in this mode until the program finishes executing. When you run SAS in interactive line mode or in a windowing environment, syntax check mode is in effect only during the step where SAS encountered the error. When the system detects an error, it stops executing the current step and continues processing the next step.

Processing Multiple Errors

Depending on the type and severity of the error, the method you use to run SAS, and your operating environment, SAS either stops program processing or flags errors and continues processing. SAS continues to check individual statements in procedures after it finds certain kinds of errors. Thus, in some cases SAS can detect multiple errors in a single statement and may issue more error messages for a given situation, particularly if the statement containing the error creates an output SAS data set. The following example illustrates a statement with two errors:

data temporary;
Item1=4;
run;
proc print data=temporary;
var Item1 Item2 Item3;
run;

SAS Log: Multiple Program Errors

cpu time 0.00 seconds
1    data temporary;
2    Item1=4;
3    run;
NOTE: The data set WORK.TEMPORARY has 1 observations and 1 variables.
NOTE: DATA statement used:
      real time 0.10 seconds
      cpu time 0.01 seconds
4
5    proc print data=temporary;
ERROR: Variable ITEM2 not found.
ERROR: Variable ITEM3 not found.
6    var Item1 Item2 Item3;
7    run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used:
      real time 0.53 seconds
      cpu time 0.01 seconds

SAS displays two error messages, one for the variable Item2 and one for the variable Item3. When running debugged production programs that are unlikely to encounter errors, you may want to force SAS to abend after a single error occurs. You can use the ERRORABEND system option to do this.

Using System Options to Debug a Program

You can use the following system options to control error handling (resolve errors) in your program:

BYERR controls whether SAS generates an error message and sets the error flag when a _NULL_ data set is used in the SORT procedure.

DKRICOND= controls the level of error detection for input data sets during the processing of DROP=, KEEP=, and RENAME= data set options.

DKROCOND= controls the level of error detection for output data sets during the processing of DROP=, KEEP=, and RENAME= data set options and the corresponding DATA step statements.

DSNFERR controls how SAS responds when a SAS data set is not found.

ERRORABEND specifies how SAS responds to errors.

ERRORCHECK= controls error handling in batch processing.

ERRORS= controls the maximum number of observations for which complete error messages are printed.

FMTERR determines whether SAS generates an error message when a format of a variable cannot be found.

INVALIDDATA= specifies the value that SAS assigns to a variable when invalid numeric data is encountered.

MERROR controls whether SAS issues a warning message when a macro-like name does not match a macro keyword.

SERROR controls whether SAS issues a warning message when a defined macro variable reference does not match a macro variable.

VNFERR controls how SAS responds when a _NULL_ data set is used.

For more information, see "SAS System Options" in SAS Language Reference: Dictionary.
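A minimal sketch of two of these options set through an OPTIONS statement. The option values and the data set readings are chosen only for illustration: ERRORS=2 limits the complete error messages to two observations, and INVALIDDATA='Z' asks SAS to store the special missing value .Z where invalid numeric data is read.

```sas
options errors=2 invaliddata='Z';

/* Hypothetical data with three invalid numeric values */
data readings;
   input id value;
   datalines;
1 10
2 abc
3 xx
4 7
5 ??
;
run;

proc print data=readings;
run;
```

Since such options stay in effect for the rest of the session, it is usually worth setting them back to the values used at your site once the debugging is finished.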

Using Return Codes

In some operating environments SAS passes a return code to the system, but accessing return codes is specific to your operating environment.

Operating Environment Information: For more information about return codes, see the SAS documentation for your operating environment.


J. FURTHER TOPICS: MANIPULATING SAS DATA SETS

J1. Overview of Methods for Combining SAS Data Sets

You can use these methods to combine SAS data sets:

- concatenating
- interleaving
- one-to-one reading
- one-to-one merging
- match merging
- updating.

Concatenating

The following figure shows the results of concatenating two SAS data sets. Concatenating the data sets appends the observations from one data set to another data set. The DATA step reads DATA1 sequentially until all observations have been processed, and then reads DATA2. Data set COMBINED contains the results of the concatenation. Note that the data sets are processed in the order in which they are listed in the SET statement.

[figure: concatenating DATA1 and DATA2 into COMBINED]

Interleaving

The following figure shows the results of interleaving two SAS data sets. Interleaving intersperses observations from two or more data sets, based on one or more common variables. Data set COMBINED shows the result.

[figure: interleaving two data sets into COMBINED]

One-to-One Reading and One-to-One Merging

The following figure shows the results of one-to-one reading and one-to-one merging. One-to-one reading combines observations from two or more SAS data sets by creating observations that contain all of the variables from each contributing data set. Observations are combined based on their relative position in each data set, that is, the first observation in one data set with the first in the other, and so on. The DATA step stops after it has read the last observation from the smallest data set. One-to-one merging is similar to a one-to-one reading, with two exceptions: you use the MERGE statement instead of multiple SET statements, and the DATA step reads all observations from all data sets. Data set COMBINED shows the result.

[figure: one-to-one reading and one-to-one merging into COMBINED]

Match-Merging

The following figure shows the results of match-merging. Match-merging combines observations from two or more SAS data sets into a single observation in a new data set based on the values of one or more common variables. Data set COMBINED shows the results.

[figure: match-merging two data sets into COMBINED]

Updating

The following figure shows the results of updating a master data set. Updating uses information from observations in a transaction data set to delete, add, or alter information in observations in a master data set. You can update a master data set by using the UPDATE statement or the MODIFY statement. If you use the UPDATE statement, your input data sets must be sorted by the values of the variables listed in the BY statement. (In this example, MASTER and TRANSACTION are both sorted by Year.) If you use the MODIFY statement, your input data does not need to be sorted.

UPDATE replaces an existing file with a new file, allowing you to add, delete, or rename columns. MODIFY performs an update in place by rewriting only those records that have changed, or by appending new records to the end of the file.

Note that by default, UPDATE and MODIFY do not replace nonmissing values in a master data set with missing values from a transaction data set.
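A minimal sketch of the UPDATE statement under these rules. The data sets master and transaction, sorted by Year as in the description above, and their variables VarX and VarY are hypothetical values invented for the illustration.

```sas
/* Hypothetical master data set, sorted by Year */
data master;
   input Year VarX VarY;
   datalines;
1985 1 2
1986 1 2
1987 1 2
;
run;

/* Hypothetical transaction data set, sorted by Year */
data transaction;
   input Year VarX VarY;
   datalines;
1986 . 3
1987 5 .
1988 6 7
;
run;

data master_new;
   update master transaction;
   by Year;
run;

proc print data=master_new;
run;
/* Resulting observations (Year VarX VarY):
   1985 1 2   unchanged, no transaction
   1986 1 3   VarX kept: the transaction value is missing
   1987 5 2   VarY kept: the transaction value is missing
   1988 6 7   new observation added from the transaction */
```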

[figure: updating MASTER with TRANSACTION]

J2. MANIPULATING SAS DATA SETS

J2.1. CONCATENATING DSS: USING SET

The SAS data sets (DSS) are concatenated one after the other, in the order in which they are written in the SET statement. The resulting DSS contains the union of the variables of the input DSS, and its number of observations is the sum of the numbers of observations of the input DSS.

DSS Animali

Comune  Animale   Numero
a       Antilope  5
a       Ariete    .
b       Balena    3
c       Canguro   7

DSS Animali_1

Comune  Animale   Numero
a       Aquila    18
b       Bufalo    19
c       Cervo     17
d       Delfino   .
d       Daino     13
e       Elefante  15
f       Farfalla  16

DSS Piante

Comune  Piante    Numero
a       Ananas    29
b       Banano    25
c       Cocco     .
c       Ciliegio  27
d       Dattero   24
e       Ebano     25
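The three data sets above could be created with DATA steps such as the following. This is a sketch under two assumptions: the variable names Comune, Animale, Piante and Numero are taken from the printed outputs of this section, and the data sets are written to the WORK library rather than to the libref a, so that no LIBNAME statement is needed.

```sas
data animali;
   input Comune $ Animale $ Numero;
   datalines;
a Antilope 5
a Ariete .
b Balena 3
c Canguro 7
;
run;

data animali_1;
   input Comune $ Animale $ Numero;
   datalines;
a Aquila 18
b Bufalo 19
c Cervo 17
d Delfino .
d Daino 13
e Elefante 15
f Farfalla 16
;
run;

data piante;
   input Comune $ Piante $ Numero;
   datalines;
a Ananas 29
b Banano 25
c Cocco .
c Ciliegio 27
d Dattero 24
e Ebano 25
;
run;

/* Concatenation: 4 + 7 = 11 observations */
data concat1;
   set animali animali_1;
run;
```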

data a.concat1;
   set a.animali a.animali_1;
run;

Obs  Comune  Animale   Numero
1    a       Antilope  5
2    a       Ariete    .
3    b       Balena    3
4    c       Canguro   7
5    a       Aquila    18
6    b       Bufalo    19
7    c       Cervo     17
8    d       Delfino   .
9    d       Daino     13
10   e       Elefante  15
11   f       Farfalla  16

The two input DSS have the same variables.

data a.concat2;
   set a.animali a.piante;
run;

Obs  Comune  Animale   Numero  Piante
1    a       Antilope  5
2    a       Ariete    .
3    b       Balena    3
4    c       Canguro   7
5    a                 29      Ananas
6    b                 25      Banano
7    c                 .       Cocco
8    c                 27      Ciliegio
9    d                 24      Dattero
10   e                 25      Ebano

The two input DSS have some variables in common and some different ones.


J2.2. INTERLEAVING DSS: USING SET - BY AND MERGE - BY

The syntax is:

data DSS; set DSS1-name DSS2-name; BY var-name;

data DSS; merge DSS1-name DSS2-name; BY var-name;

In both cases the resulting DSS contains the union of the variables of the input DSS, and the input DSS must be sorted by the BY variable.

With SET - BY, the observations of the input DSS are interleaved according to the BY variable; the number of observations is therefore the sum of the numbers of observations of the input DSS.

With MERGE - BY, the observations of the input DSS are COMBINED into a single observation according to the BY variable. The number of observations of the resulting DSS is the sum, over the values of the BY variable, of the largest number of observations that any one of the input DSS has for that value. The value of each variable is retained until all the observations for the current value of the BY variable have been written.
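The sketch below illustrates the sorting requirement and the two observation counts described above. The data sets one and two and their variables V1 and V2 are hypothetical and are kept deliberately small.

```sas
/* Hypothetical input, not yet sorted by Comune */
data one;
   input Comune $ V1;
   datalines;
b 3
a 1
a 2
;
run;

data two;
   input Comune $ V2;
   datalines;
c 20
a 10
;
run;

/* Both SET - BY and MERGE - BY need the inputs sorted by the BY variable */
proc sort data=one; by Comune; run;
proc sort data=two; by Comune; run;

/* Interleaving: 3 + 2 = 5 observations */
data interleaved;
   set one two;
   by Comune;
run;

/* Match-merging: max(2,1) for a + 1 for b + 1 for c = 4 observations */
data merged;
   merge one two;
   by Comune;
run;
```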

DSS Animali

Comune  Animale   Numero
a       Antilope  5
a       Ariete    .
b       Balena    3
c       Canguro   7

data a.usosetby1;
   set a.animali a.animali_1;
   by comune;
run;

Obs  Comune  Animale   Numero
1    a       Antilope  5
2    a       Ariete    .
3    a       Aquila    18
4    b       Balena    3
5    b       Bufalo    19
6    c       Canguro   7
7    c       Cervo     17
8    d       Delfino   .
9    d       Daino     13
10   e       Elefante  15
11   f       Farfalla  16

The two input DSS have the same variables.

data a.usosetby2;
   set a.animali a.piante;
   by comune;
run;

Obs  Comune  Animale   Numero  Piante
1    a       Antilope  5
2    a       Ariete    .
3    a                 29      Ananas
4    b       Balena    3
5    b                 25      Banano
6    c       Canguro   7
7    c                 .       Cocco
8    c                 27      Ciliegio
9    d                 24      Dattero
10   e                 25      Ebano

The two input DSS have some variables in common and some different ones.


DSS Animali_nr

Comune  Animale   Numero
a       Antilope  5
b       Balena    .
c       Canguro   7

DSS Piante_nr

Comune  Piante    Numero
a       Ananas    29
b       Banano    25
c       Cocco     .
d       Dattero   24
e       Ebano     25

data a.usomerge1;
   merge a.animali_nr a.piante_nr;
   by comune;
run;

Obs  Comune  Animale   Numero  Piante
1    a       Antilope  29      Ananas
2    b       Balena    25      Banano
3    c       Canguro   .       Cocco
4    d                 24      Dattero
5    e                 25      Ebano

Both DSS have a single observation for each value of the BY variable.

Data set Animali
Comune  Animale   Numero
a       Antilope  5
a       Ariete    .
b       Balena    3
c       Canguro   7

data a.usomergeby1;
  merge a.animali a.piante;
  by comune;
run;

Obs  Comune  Animale   Numero  Piante
1    a       Antilope  29      Ananas
2    a       Ariete    .       Ananas
3    b       Balena    25      Banano
4    c       Canguro   .       Cocco
5    c       Canguro   27      Ciliegio
6    d                 24      Dattero
7    e                 25      Ebano

data a.usomergeby2;
  merge a.piante a.animali;
  by comune;
run;

Obs  Comune  Piante    Numero  Animale
1    a       Ananas    5       Antilope
2    a       Ananas    .       Ariete
3    b       Banano    3       Balena
4    c       Cocco     7       Canguro
5    c       Ciliegio  27      Canguro
6    d       Dattero   24
7    e       Ebano     25

One of the two data sets has several observations for the same value of the BY variable. The variable NUMERO (common to both data sets, but not a BY variable) takes the values of the second data set. If the observations are duplicated in the first data set and not in the second, NUMERO takes the values of the first data set for the additional observations.
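Since a variable common to both data sets is overwritten during a MERGE – BY, one way to keep both values is to rename it on input. The sketch below uses the RENAME= and IN= data set options; the new names (NumAnimali, NumPiante, DaAnimali, DaPiante, merge_rinom) are hypothetical and chosen only for illustration, and the data sets are created in WORK from the listings in the text.

```sas
data animali;
  input Comune $ Animale $ Numero;
  datalines;
a Antilope 5
a Ariete .
b Balena 3
c Canguro 7
;
run;

data piante;
  input Comune $ Piante $ Numero;
  datalines;
a Ananas 29
b Banano 25
c Cocco .
c Ciliegio 27
d Dattero 24
e Ebano 25
;
run;

data merge_rinom;
  /* rename the shared variable so that neither value is overwritten */
  merge animali(rename=(Numero=NumAnimali) in=inA)
        piante (rename=(Numero=NumPiante)  in=inP);
  by Comune;
  /* IN= variables are temporary: copy them to keep them in the output */
  DaAnimali = inA;
  DaPiante  = inP;
run;

proc print data=merge_rinom;
run;
```

The flags DaAnimali and DaPiante show, for each output observation, which input data sets contributed to the current BY group.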


J2.3. PLACING DATA SETS WITH DIFFERENT VARIABLES SIDE BY SIDE: SET – SET AND MERGE
The syntax is:

data new-data-set;
  set data-set-1;
  set data-set-2;

data new-data-set;
  merge data-set-1 data-set-2;

The resulting data set contains the union of the variables of the input data sets. The variables of the second data set are laid over those of the first. Variables that are not present in the second data set keep the values of the first. More precisely: the first observation is read from the first data set, then the first observation from the second data set; if both data sets contain the same variable, the value from the first data set is replaced by the value from the second, even when that value is missing.

SET – SET
The process is repeated until the last observation of the SHORTEST data set has been read. The resulting data set has a number of observations equal to the MINIMUM number of observations of the two input data sets.

MERGE
The process is repeated until the last observation of the LONGEST data set has been read. The resulting data set has a number of observations equal to the MAXIMUM number of observations of the two input data sets.
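The difference in the number of observations can be checked on two small data sets of unequal length. This is a minimal sketch; the data set and variable names (corto, lungo, k, x, y) are hypothetical.

```sas
data corto;          /* 3 observations */
  input k x;
  datalines;
1 10
2 20
3 30
;
run;

data lungo;          /* 5 observations */
  input k y;
  datalines;
1 100
2 200
3 300
4 400
5 500
;
run;

/* SET - SET: the step stops when the shortest input is exhausted (3 obs) */
data affianca_set;
  set corto;
  set lungo;
run;

/* MERGE without BY: the step continues to the end of the longest input (5 obs) */
data affianca_merge;
  merge corto lungo;
run;

proc print data=affianca_set;
run;

proc print data=affianca_merge;
run;
```

In affianca_merge the variable x is missing in the last two observations, because corto has no fourth or fifth observation to contribute.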

FIRST CASE: The observations are written in the same order in the two data sets to be placed side by side (among our examples we use the data sets that have no repeated observations).

Data set Animali_nr
Comune  Animale   Numero
a       Antilope  5
b       Balena    .
c       Canguro   7

data a.usosetset1;
  set a.animali_nr;
  set a.piante_nr;
run;

Obs  Comune  Animale   Numero  Piante
1    a       Antilope  29      Ananas
2    b       Balena    25      Banano
3    c       Canguro   .       Cocco

data a.usosetset2;
  set a.piante_nr;
  set a.animali_nr;
run;

Obs  Comune  Piante  Numero  Animale
1    a       Ananas  5       Antilope
2    b       Banano  .       Balena
3    c       Cocco   7       Canguro

The observations of the input data sets are written in the same order.

data a.usomerge1;
  merge a.animali_nr a.piante_nr;
run;

Obs  Comune  Animale   Numero  Piante
1    a       Antilope  29      Ananas
2    b       Balena    25      Banano
3    c       Canguro   .       Cocco
4    d                 24      Dattero
5    e                 25      Ebano

data a.usomerge2;
  merge a.piante_nr a.animali_nr;
run;

Obs  Comune  Piante   Numero  Animale
1    a       Ananas   5       Antilope
2    b       Banano   .       Balena
3    c       Cocco    7       Canguro
4    d       Dattero  24
5    e       Ebano    25

The observations are written in the same order in the two input data sets. The result is analogous to the MERGE – BY case.


SECOND CASE: The observations are NOT written in the same order in the two data sets to be placed side by side (among our examples we use the data sets that have repeated observations).

Data set Animali
Comune  Animale   Numero
a       Antilope  5
a       Ariete    .
b       Balena    3
c       Canguro   7

data a.usosetset3;
  set a.animali;
  set a.piante;
run;

Obs  Comune  Animale   Numero  Piante
1    a       Antilope  29      Ananas
2    b       Ariete    25      Banano
3    c       Balena    .       Cocco
4    c       Canguro   27      Ciliegio

data a.usosetset3;
  set a.piante;
  set a.animali;
run;

Obs  Comune  Piante    Numero  Animale
1    a       Ananas    5       Antilope
2    a       Banano    .       Ariete
3    b       Cocco     3       Balena
4    c       Ciliegio  7       Canguro

data a.usomerge3;
  merge a.animali a.piante;
run;

Obs  Comune  Animale   Numero  Piante
1    a       Antilope  29      Ananas
2    b       Ariete    25      Banano
3    c       Balena    .       Cocco
4    c       Canguro   27      Ciliegio
5    d                 24      Dattero
6    e                 25      Ebano

data a.usomerge4;
  merge a.piante a.animali;
run;

Obs  Comune  Piante    Numero  Animale
1    a       Ananas    5       Antilope
2    a       Banano    .       Ariete
3    b       Cocco     3       Balena
4    c       Ciliegio  7       Canguro
5    d       Dattero   24
6    e       Ebano     25

The input data sets have some variables in common and others that differ. The second data set is laid over the first; the variable Numero contains the values of the second data set even when the value is missing. The input data sets have repeated observations for the variable Comune, so observations belonging to different values of Comune end up side by side. Use great care!

THIRD CASE: The input data sets have the same variables.

data a.usosetset4;
  set a.animali;
  set a.animali_1;
run;
proc print;
run;

Obs  Comune  Animale  Numero
1    a       Aquila   18
2    b       Bufalo   19
3    c       Cervo    17
4    d       Delfino  .


J2.4. UPDATING A DATA SET: USE OF UPDATE
The syntax of the statement is:

UPDATE master-data-set transaction-data-set;
BY variable(s);

The master data set is the one to be updated; the transaction data set contains the updated values. The values of the BY variable should be unique for every observation of the master data set: if the master data set contains two observations with the same value of the BY variable, the first is updated and the second is left unchanged. If the transaction data set contains a missing value, the value of the master data set is kept.

Data set Piante
Comune  Piante    Numero
a       Ananas    29
b       Banano    25
c       Cocco     .
c       Ciliegio  27
d       Dattero   24
e       Ebano     25

Data set Piante_nr_2
Comune  Numero  Piante
c       32      Cipresso
d       34
e       35      Edera

Data set Piante_3
Comune  Numero  Piante
c       32      Cipresso
c       39      Cedro
d       34
e       35      Edera

data a.usoupdate;
  update a.piante_nr a.piante_nr_2;
  by comune;
run;

Obs  Comune  Piante    Numero
1    a       Ananas    29
2    b       Banano    25
3    c       Cipresso  32
4    d       Dattero   34
5    e       Edera     35

The master data set has no repeated observations for the BY variable.

data a.usoupdate;
  update a.piante a.piante_nr_2;
  by comune;
run;

Obs  Comune  Piante    Numero
1    a       Ananas    29
2    b       Banano    25
3    c       Cipresso  32
4    c       Ciliegio  27
5    d       Dattero   34
6    e       Edera     35

The master data set has repeated observations for the BY variable. Only the first observation is updated.

data usoupdate;
  update piante piante_3;
  by comune;
run;

Obs  Comune  Piante    Numero
1    a       Ananas    29
2    b       Banano    25
3    c       Cedro     39
4    c       Ciliegio  27
5    d       Dattero   34
6    e       Edera     35

Both the master and the transaction data set have repeated observations for the BY variable: only the last observation of the transaction data set has an effect, and it acts on the first observation of the master data set.

UPDATING A DATA SET: USE OF MERGE – BY

data upmergeby;
  merge piante piante_3;
  by Comune;
run;

Obs  Comune  Piante    Numero
1    a       Ananas    29
2    b       Banano    25
3    c       Cipresso  32
4    c       Cedro     39
5    d                 34
6    e       Edera     35

Both observations with a repeated value of the BY variable have been modified, but the missing value has replaced the original value of the first data set.
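The different treatment of missing values by UPDATE and by MERGE – BY can be seen by running both on the same pair of data sets. The sketch below rebuilds Piante_nr and Piante_nr_2 in the WORK library from the listings in the text; the MISSOVER option and the output names usoupdate and upmerge are choices made for this illustration.

```sas
data piante_nr;
  input Comune $ Piante $ Numero;
  datalines;
a Ananas 29
b Banano 25
c Cocco .
d Dattero 24
e Ebano 25
;
run;

data piante_nr_2;
  /* MISSOVER: a short line leaves Piante missing instead of reading the next line */
  infile datalines missover;
  input Comune $ Numero Piante $;
  datalines;
c 32 Cipresso
d 34
e 35 Edera
;
run;

/* UPDATE: a missing transaction value does not overwrite the master value */
data usoupdate;
  update piante_nr piante_nr_2;
  by Comune;
run;

/* MERGE - BY: the value from the second data set always wins, even if missing */
data upmerge;
  merge piante_nr piante_nr_2;
  by Comune;
run;

proc print data=usoupdate;
run;

proc print data=upmerge;
run;
```

For Comune = d the variable Piante keeps the value Dattero in usoupdate, while in upmerge it is overwritten by the blank value of the second data set.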


J2.5 ADDING OBSERVATIONS TO A DATA SET: PROC APPEND
The syntax of the procedure is:

PROC APPEND BASE=master-data-set DATA=transaction-data-set <FORCE>;

The master data set is the one to which observations are to be added; the observations to add are contained in the transaction data set.

Variables of the transaction data set that are not already in the master data set are ignored.

If the variables are the same in the two data sets, PROC APPEND and the SET statement give an analogous result; if the master data set is very large, PROC APPEND can be more efficient than the SET statement.

Warning: the master data set is modified; it may be advisable to make a copy of it first.

data a.animali_bis;
  set a.animali;

data a.animali1_bis;
  set a.animali1;

data a.piante_bis;
  set a.piante;

proc append base=a.animali_bis data=a.animali1_bis;
run;

Obs  Comune  Animale   Numero
1    a       Antilope  5
2    a       Ariete    .
3    b       Balena    3
4    c       Canguro   7
5    a       Aquila    18
6    b       Bufalo    19
7    c       Cervo     17
8    d       Delfino   .
9    d       Daino     13
10   e       Elefante  15
11   f       Farfalla  16

The data sets have the same variables.

proc append base=a.animali_bis data=a.piante_bis force;
run;
proc print;
run;

Obs  Comune  Animale   Numero
1    a       Antilope  5
2    a       Ariete    .
3    b       Balena    3
4    c       Canguro   7
5    a       Aquila    18
6    b       Bufalo    19
7    c       Cervo     17
8    d       Delfino   .
9    d       Daino     13
10   e       Elefante  15
11   f       Farfalla  16
12   a                 29
13   b                 25
14   c                 .
15   c                 27
16   d                 24
17   e                 25

The data sets do not have the same variables, so the FORCE option is required. The variable Piante of the transaction data set is ignored.


J3. GROUPED OBSERVATIONS: USE OF SET – BY AND OF THE FIRST.<..> AND LAST.<..> VARIABLES

Understanding BY Groups

BY Groups with a Single BY Variable

The following figure represents the results of processing your data with the single BY variable ZipCode. The input SAS data set contains street names, cities, states, and ZIP codes that are arranged in an order that you can use with the following BY statement:

by ZipCode;

The figure shows five BY groups each containing the BY variable ZipCode. The data set is shown with the BY variable ZipCode printed on the left for easy reading, but the position of the BY variable in the observations does not matter.

[Figure: BY Groups for the Single BY Variable ZipCode]

The first BY group contains all observations with the smallest BY value, which is 33133; the second BY group contains all observations with the next smallest BY value, which is 33146, and so on.

BY Groups with Multiple BY Variables

The following figure represents the results of processing your data with two BY variables, State and City. This example uses the same data set as in BY Groups with a Single BY Variable, and is arranged in an order that you can use with the following BY statement:

by State City;

The figure shows three BY groups. The data set is shown with the BY variables State and City printed on the left for easy reading, but the position of the BY variables in the observations does not matter.

[Figure: BY Groups for the BY Variables State and City]

The observations are arranged so that the observations for Arizona occur first. The observations within each value of State are arranged in order of the value of City. Each BY group has a unique combination of values for the variables State and City. For example, the BY value of the first BY group is AZ Tucson, and the BY value of the second BY group is FL Lakeland.

Invoking BY-Group Processing

You can invoke BY-group processing in both DATA steps and PROC steps by using a BY statement. For example, the following DATA step program uses the SET statement to combine observations from three SAS data sets by interleaving the files. The BY statement shows how the data is ordered.

data all_sales;
  set region1 region2 region3;
  by State City Zip;
  ... more SAS statements ...
run;

This section describes BY-group processing for the DATA step. For information on BY-group processing with procedures, see the SAS Procedures Guide.

How the DATA Step Identifies BY Groups

In the DATA step, SAS identifies the beginning and end of each BY group by creating two temporary variables for each BY variable: FIRST.variable and LAST.variable. These temporary variables are available for DATA step programming but are not added to the output data set. Their values indicate whether an observation is

• the first one in a BY group
• the last one in a BY group
• neither the first nor the last one in a BY group
• both first and last, as is the case when there is only one observation in a BY group.

You can take actions conditionally, based on whether you are processing the first or the last observation of a BY group. When an observation is the first in a BY group, SAS sets the value of the FIRST.variable to 1. For all other observations in the BY group, the value of the FIRST.variable is 0. Likewise, if an observation is the last in a BY group, SAS sets the value of LAST.variable to 1. For all other observations in the BY group, the value of LAST.variable is 0. If the observations are sorted by more than one BY variable, the FIRST.variable for each variable in the BY statement is set to 1 at the first occurrence of a new value for the variable.

This example shows how SAS uses the FIRST.variable and LAST.variable to flag the beginning and end of four BY groups. Six temporary variables are created within the program data vector. These variables can be used during the DATA step, but they do not become variables in the new data set.

In the figure that follows, observations in the SAS data set are arranged in an order that can be used with this BY statement:

by State City ZipCode;

SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City, LAST.City, FIRST.ZipCode, and LAST.ZipCode.

[Figure: FIRST. and LAST. Values for Four BY Groups]

EXAMPLE
The example refers to a set of clinical data; for most patients several measurements are taken at successive times. The variable "giorno" counts the days since the start of the clinical study. The variable "paziente" holds the number that identifies the patient; in the program below the identifier is the variable id. The following program builds a data set with a variable that contains the number of measurements for each patient.

data numero;
  set tp1.cirr_seq;
  by id;
  retain num_oss;
  if first.id = 1 then num_oss = 0;
  num_oss = num_oss + 1;
  if last.id = 1 then output;
  keep id num_oss;
run;
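Because FIRST. and LAST. variables are not written to the output data set, it can help to copy them into ordinary variables in order to inspect them. The following sketch does this on a small in-stream data set; the data set visite, its values, and the variables primo and ultimo are hypothetical and only mimic the structure of the clinical example.

```sas
data visite;
  input id giorno;
  datalines;
1 0
1 30
1 90
2 0
3 0
3 45
;
run;

data numero_flag;
  set visite;
  by id;
  primo  = first.id;   /* copy the temporary flags so that they can be printed */
  ultimo = last.id;
  if first.id then num_oss = 0;
  num_oss + 1;         /* sum statement: num_oss is retained automatically */
run;

proc print data=numero_flag;
run;
```

The patient with a single measurement (id 2) has both primo and ultimo equal to 1, and in the last observation of each patient num_oss equals the number of measurements, which is the value that the program in the text keeps.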


K. FURTHER TOPICS: READING RAW DATA
We have already seen three styles of input:

• list input
• column input
• formatted input

In this section we examine two more:

• modified list input (list input with informats)
• named input

K1. MODIFIED LIST INPUT
This is the most flexible style of input.

• The & (ampersand) modifier allows reading character values that contain blanks (already seen).
• The : (colon) modifier, written after the variable name, allows informats to be used (the same ones as in formatted input). The difference from formatted input is that here SAS reads until it meets a blank; therefore
  o input x : 4.1 (modified list input) means "the value is written in at most 4 columns, one of which holds the decimal digit and one the decimal separator";
  o input x 4.1 (formatted input) means "the value is written in exactly 4 columns, one of which holds the decimal digit and one the decimal separator".
• The ~ (tilde) modifier allows reading and keeping single quotation marks, double quotation marks, and delimiters inside character values.

EXAMPLE of the use of : and ~

data scores;
  infile datalines dsd;
  input Name : $9. Score1-Score3 Team ~ $25. Div $;
  datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
proc print data=scores noobs;
run;

SAS OUTPUT

Name     Score1  Score2  Score3  Team                      Div
Smith    12      22      46      "Green Hornets, Atlanta"  AAA
Mitchel  23      19      25      "High Volts, Portland"    AAA
Jones    9       17      54      "Vulcans, Las Vegas"      AA


K2. NAMED INPUT
Named input reads data in which each value is preceded by the name of the variable and an equals sign (=).

data games;
  input name=$ score1= score2=;
  datalines;
name=riley score1=1132 score2=1187
;
proc print data=games;
run;

K3. HOLDING THE INPUT LINE: USE OF @

@ holds an input record for the execution of the next INPUT statement within the same iteration of the DATA step. This line-hold specifier is called trailing @.

Restriction: The trailing @ must be the last item in the INPUT statement.

Tip: The trailing @ prevents the next INPUT statement from automatically releasing the current input record and reading the next record into the input buffer. It is useful when you need to read from a record multiple times.

Example: Holding a Record in the Input Buffer

This example reads a file that contains two kinds of input data records and creates a SAS data set from these records. One type of data record contains information about a particular college course. The second type of record contains information about the students enrolled in the course. You need two INPUT statements to read the two records and to assign the values to different variables that use different formats. Records that contain class information have a C in column 1; records that contain student information have an S in column 1, as shown here:

----+----1----+----2----+
C HIST101 Watson
S Williams 0459
S Flores   5423
C MATH202 Sen
S Lee      7085

To know which INPUT statement to use, check each record as it is read. Use an INPUT statement that reads only the variable that tells whether the record contains class or student.

data schedule(drop=type);
  infile file-specification;
  retain Course Professor;
  input type $ 1 @;
  if type='C' then input course $ professor $;
  else if type='S' then do;
    input Name $10. Id;
    output schedule;
  end;
proc print;
run;

The first INPUT statement reads the TYPE value from column 1 of every line. Because this INPUT statement ends with a trailing @, the next INPUT statement in the DATA step reads the same line. The IF-THEN statements that follow check whether the record is a class or student line before another INPUT statement reads the rest of the line. The INPUT statements without a trailing @ release the held line. The RETAIN statement saves the values about the particular college course. The DATA step writes an observation to the SCHEDULE data set after a student record is read. The following output that PROC PRINT generates shows the resulting data set SCHEDULE.

The SAS System
OBS  Course   Professor  Name      Id
1    HIST101  Watson     Williams  459
2    HIST101  Watson     Flores    5423
3    MATH202  Sen        Lee       7085

Example: Positioning the Pointer with a Numeric Variable

This example uses a numeric variable to position the pointer. A raw data file contains records with the employment figures for several offices of a multinational company. The input data records are

----+----1----+----2----+----3----+
8      New York  1 USA      14
5   Cary         1 USA      2274
3 Chicago        1 USA      37
22                   Tokyo     5 ASIA 80
5   Vancouver    2 CANADA   6
9       Milano   4 EUROPE   123

The first column has the column position for the office location. The next numeric column is the region category. The geographic region occurs before the number of employees in that office. You determine the office location by combining the @numeric-variable pointer control with a trailing @. To read the records, use two INPUT statements. The first INPUT statement obtains the value for the @numeric-variable pointer control. The second INPUT statement uses this value to determine the column that the pointer moves to.

data office (drop=x);
  infile file-specification;
  input x @;
  if 1<=x<=10 then input @x City $9.;
  else do;
    put 'Invalid input at line ' _n_;
    delete;
  end;
run;

The DATA step writes only five observations to the OFFICE data set. The fourth input data record is invalid because the value of X is greater than 10. Therefore, the second INPUT statement does not execute. Instead, the PUT statement writes a message to the SAS log and the DELETE statement stops processing the observation.


K4. INFILE OPTIONS FOR READING DELIMITED DATA WITH LIST INPUT
By default, the delimiter to read input data records with list input is a blank space. Both the DSD option and the DELIMITER= option affect how list input handles delimiters. The DELIMITER= option specifies that the INPUT statement use a character other than a blank as a delimiter for data values that are read with list input. When the DSD option is in effect, the INPUT statement uses a comma as the default delimiter.

To read a value as missing between two consecutive delimiters, use the DSD option. By default, the INPUT statement treats consecutive delimiters as a unit. When you use DSD, the INPUT statement treats consecutive delimiters separately. Therefore, a value that is missing between consecutive delimiters is read as a missing value. To change the delimiter from a comma to another value, use the DELIMITER= option.

For example, this DATA step program uses list input to read data that are separated with commas. The second data line contains a missing value. Because SAS allows consecutive delimiters with list input, the INPUT statement cannot detect the missing value.

data scores;
  infile datalines delimiter=',';
  input test1 test2 test3;
  datalines;
91,87,95
97,,92
,1,1
;

With the FLOWOVER option in effect, the data set SCORES contains two, not three, observations. The second observation is built incorrectly:

OBS  TEST1  TEST2  TEST3
1    91     87     95
2    97     92     1

To correct the problem, use the DSD option in the INFILE statement.

infile datalines dsd;

Now the INPUT statement detects the two consecutive delimiters and therefore assigns a missing value to variable TEST2 in the second observation.

OBS  TEST1  TEST2  TEST3
1    91     87     95
2    97     .      92
3    .      1      1

The DSD option also enables list input to read a character value that contains a delimiter within a quoted string. For example, if data are separated with commas, DSD enables you to place the character string in quotation marks and read a comma as a valid character. SAS does not store the quotation marks as part of the character value. To retain the quotation marks as part of the value, use the tilde (~) format modifier in an INPUT statement.

Example 1: Changing How Delimiters are Treated

By default, the INPUT statement uses a blank as the delimiter. This DATA step uses a comma as the delimiter:

data num;
  infile datalines dsd;
  input x y z;
  datalines;
,2,3
4,5,6
7,8,9
;

The argument DATALINES in the INFILE statement allows you to use an INFILE statement option to read in-stream data lines. The DSD option sets the comma as the default delimiter. Because a comma precedes the first value in the first dataline, a missing value is assigned to variable X in the first observation, and the value 2 is assigned to variable Y.


If the data uses multiple delimiters or a single delimiter other than a comma, simply specify the delimiter values with the DELIMITER= option. In this example, the characters a and b function as delimiters:

data nums;
  infile datalines dsd delimiter='ab';
  input X Y Z;
  datalines;
1aa2ab3
4b5bab6
7a8b9
;

The output that PROC PRINT generates shows the resulting NUMS data set. Values are missing for variables in the first and second observation because DSD causes list input to detect two consecutive delimiters. If you omit DSD, the characters a, b, aa, ab, ba, or bb function as the delimiter and no variables are assigned missing values.

The SAS System
OBS  X  Y  Z
1    1  .  2
2    4  5  .
3    7  8  9

This DATA step uses modified list input and the DSD option to read data that are separated by commas and that may contain commas as part of a character value:

data scores;
  infile datalines dsd;
  input Name : $9. Score Team : $25. Div $;
  datalines;
Joseph,76,"Red Racers, Washington",AAA
Mitchel,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,74,"Green Gazelles, Atlanta",AA
;

The following output that PROC PRINT generates shows the resulting SCORES data set. The delimiter (comma) is stored as part of the value of TEAM while the quotation marks are not. To retain the quotation marks in character data, use the tilde (~) format modifier in the INPUT statement instead of the colon.

OBS  NAME       SCORE  TEAM                     DIV
1    Joseph     76     Red Racers, Washington   AAA
2    Mitchel    82     Blue Bunnies, Richmond   AAA
3    Sue Ellen  74     Green Gazelles, Atlanta  AA


K5. SPECIAL MISSING VALUES
A special missing value is a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore.

Example
The following example uses data from a marketing research company. Five testers were hired to test five different products for ease of use and effectiveness. If a tester was absent, there is no rating to report, and the value is recorded with an X for "absent." If the tester was unable to test the product adequately, there is no rating, and the value is recorded with an I for "incomplete test." The following program reads the data and displays the resulting SAS data set. Note the special missing values in the first and third data lines:

data period_a;
  missing X I;
  input Id $4. Foodpr1 Foodpr2 Foodpr3 Coffeem1 Coffeem2;
  datalines;
1001 115 45 65 I 78
1002 86 27 55 72 86
1004 93 52 X 76 88
1015 73 35 43 112 108
1027 101 127 39 76 79
;
proc print data=period_a;
  title 'Results of Test Period A';
  footnote1 'X indicates TESTER ABSENT';
  footnote2 'I indicates TEST WAS INCOMPLETE';
run;

The following output is produced:

Results of Test Period A
Obs  Id    Foodpr1  Foodpr2  Foodpr3  Coffeem1  Coffeem2
1    1001  115      45       65       I         78
2    1002  86       27       55       72        86
3    1004  93       52       X        76        88
4    1015  73       35       43       112       108
5    1027  101      127      39       76        79
X indicates TESTER ABSENT
I indicates TEST WAS INCOMPLETE
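Once special missing values have been read, they can be told apart in later DATA step code by writing them as .X, .I and so on. The sketch below is an illustration on a small hypothetical data set (prova, Voto, and the three flag variables are invented names), not part of the marketing example.

```sas
data prova;
  missing X I;
  input Id $ Voto;
  datalines;
1001 78
1002 I
1004 X
1015 .
;
run;

data prova2;
  set prova;
  assente    = (Voto = .X);     /* 1 only for the special missing value X */
  incompleto = (Voto = .I);     /* 1 only for the special missing value I */
  mancante   = missing(Voto);   /* 1 for ., .X and .I alike */
run;

proc print data=prova2;
run;
```

Statistical procedures treat all of these values as missing; the distinction is useful mainly for reporting and for conditional logic such as the comparisons above.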


L. FURTHER TOPICS: FORMATS FOR READING AND WRITING DATA

L1. FORMAT STATEMENT
L2. INFORMAT STATEMENT
L3. LENGTH STATEMENT
L4. ATTRIB STATEMENT
L5. PROC FORMAT
    L5.1. VALUE statement
    L5.2. INVALUE statement
    L5.3. PICTURE statement
    L5.4. Some examples of changing formats
    L5.5. Functions for converting a character variable to numeric and vice versa
L6. SAS Date, Time, and Datetime Values
L7. SOME ROUNDING FUNCTIONS
L8. SOME FUNCTIONS FOR CHARACTER VARIABLES

Associating Informats and Formats with Variables

In a DATA step
  Informats: Use the ATTRIB or INFORMAT statement to permanently associate an informat with a variable. Use the INPUT function or INPUT statement to associate the informat with the variable only for the duration of the DATA step.
  Formats: Use the ATTRIB or FORMAT statement to permanently associate a format with a variable. Use the PUT function or PUT statement to associate the format with the variable only for the duration of the DATA step.

In a PROC step
  Informats: The ATTRIB and INFORMAT statements are valid in base SAS procedures. However, in base SAS software, typically you do not assign informats in PROC steps because the data have already been read into SAS variables.
  Formats: Use the ATTRIB statement or the FORMAT statement to associate formats with variables. If you use either statement in a procedure that produces an output data set, the format is permanently associated with the variable in the output data set. If you use either statement in a procedure that does not produce an output data set, the statement associates the format with the variable only for the duration of the PROC step.

CONSTRUCTION OF AN EXAMPLE SAS DATA SET

data pippo2;
  input a $ x;
  datalines;
mnopqr 55
fghilm 52
mnopqr 53
;


L1. FORMAT STATEMENT
A format is an instruction that SAS uses to write data values. The FORMAT statement is used to control the appearance of the data or, in some cases, to group the data to be analysed. For example, the WORDS22. format, which converts numeric values to their spelled-out form (in English), writes the numeric value 692 as six hundred ninety-two.

The syntax of a format is <$>format<w>.<d>
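A format can also be tried out directly in a PUT statement, which writes the formatted value to the SAS log without creating a data set. This is a minimal sketch; the variable name x and the choice of the two extra formats (8.2 and COMMA10.1) are only illustrative.

```sas
data _null_;
  x = 692;
  put x words22.;    /* the spelled-out form described in the text */
  put x 8.2;         /* width 8, two decimal places */
  put x comma10.1;   /* width 10, one decimal place, commas between thousands */
run;
```

Used this way the format applies only to that PUT statement; it is not stored with the variable, unlike a format assigned with the FORMAT statement in a DATA step that creates a data set.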

L1.1 - Context: DATA step

1 - changing the format of a numeric variable

data pluto;
  set pippo2;
  format x 5.1;

Obs  a       x
1    mnopqr  55.0
2    fghilm  52.0
3    mnopqr  53.0

The CONTENTS Procedure
...
--Alphabetic List of Variables and Attributes--
#  Variable  Type  Len  Pos  Format
-----------------------------------
1  a         Char  8    8
2  x         Num   8    0    5.1

2 - changing the format of a character variable

data pluto2;
  set pippo2;
  format a $3.;

Obs  a    x
1    mno  55
2    fgh  52
3    mno  53

The CONTENTS Procedure
...
--Alphabetic List of Variables and Attributes--
#  Variable  Type  Len  Pos  Format
-----------------------------------
1  a         Char  8    8    $3.
2  x         Num   8    0

REMARK: In both cases the length of the variable remains the original one.

L1.2 - Context: PROC step

1 - changing the format of a numeric variable

data pluto;
  set pippo2;
proc print;
  format x 5.1;
proc contents;
run;

Obs  a       x
1    mnopqr  55.0
2    fghilm  52.0
3    mnopqr  53.0

#  Variable  Type  Len  Pos
---------------------------
1  a         Char  8    8
2  x         Num   8    0

2 - changing the format of a character variable

data pluto2;
  set pippo2;
proc print;
  format a $3.;
proc contents;
run;

Obs  a    x
1    mno  55
2    fgh  52
3    mno  53

#  Variable  Type  Len  Pos
---------------------------
1  a         Char  8    8
2  x         Num   8    0


L2. INFORMAT STATEMENT
An informat is an instruction that SAS uses to read data values and assign them to a variable. For example, the following value contains a dollar sign and some commas: $1,000,000. To remove the dollar sign ($) and the commas (,) before storing the numeric value 1000000 in a variable, the value must be read with the COMMA11. informat. If a variable has not been explicitly defined beforehand, SAS uses the informat to determine whether the variable is numeric or character and to determine the length of a character variable. The syntax of an informat is:

<$>informat<w>.<d>

Informats by category

CHARACTER      instructs SAS to read character data values into character variables.
COLUMN-BINARY  instructs SAS to read data stored in column-binary or multipunched form into character and numeric variables.
DATE and TIME  instructs SAS to read data values into variables that represent dates, times, and datetimes.
NUMERIC        instructs SAS to read numeric data values into numeric variables.
USER-DEFINED   instructs SAS to read data values by using an informat that is created with an INVALUE statement in PROC FORMAT.

Context: DATA step

1 - changing the informat of a numeric variable

data pluto;
  set pippo2;
  informat x 5.1;

Obs  a       x
1    mnopqr  55
2    fghilm  52
3    mnopqr  53

The CONTENTS Procedure
...
--Alphabetic List of Variables and Attributes--
#  Variable  Type  Len  Pos  Informat
-------------------------------------
1  a         Char  8    8
2  x         Num   8    0    5.1

2 - changing the informat of a character variable

data pluto2;
  set pippo2;
  informat a $3.;

Obs  a       x
1    mnopqr  55
2    fghilm  52
3    mnopqr  53

The CONTENTS Procedure
...
--Alphabetic List of Variables and Attributes--
#  Variable  Type  Len  Pos  Informat
-------------------------------------
1  a         Char  8    8    $3.
2  x         Num   8    0

REMARK: In both cases the length of the variable remains the original one.
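The effect of the COMMA11. informat mentioned at the beginning of this section can be checked with the INPUT function, which applies an informat to a character string. This is a small sketch; the variable names testo and x are chosen for illustration.

```sas
data _null_;
  testo = '$1,000,000';
  x = input(testo, comma11.);   /* strips the dollar sign and the commas */
  put x=;                       /* writes x=1000000 to the SAS log */
run;
```

As in the examples above, the informat is used only while the value is being read: the stored value is the plain number 1000000.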


SOME OF THE MAIN INFORMATS AND FORMATS FOR CHARACTER, NUMERIC, AND DATE AND TIME VARIABLES

Categories and Descriptions of Informats

Category       Informat     Description
Character      $CHARw.      Reads/Writes character data with blanks
               $CHARZBw.    Converts binary 0s to blanks
               $QUOTEw.     Removes matching quotation marks from character data
               $REVERJw.    Reads/Writes character data from right to left and preserves blanks
               $REVERSw.    Reads/Writes character data from right to left and left aligns
               $UPCASEw.    Converts character data to uppercase
               $VARYINGw.   Reads/Writes character data of varying length
               $w.          Reads/Writes standard character data
Date and Time  DATEw.       Reads/Writes date values in the form ddmmmyy or ddmmmyyyy
               DATETIMEw.   Reads/Writes datetime values in the form ddmmmyy hh:mm:ss.ss or ddmmmyyyy hh:mm:ss.ss
               DDMMYYw.     Reads/Writes date values in the form ddmmyy or ddmmyyyy
               MMDDYYw.     Reads/Writes date values in the form mmddyy or mmddyyyy
               MONYYw.      Reads/Writes month and year date values in the form mmmyy or mmmyyyy
               TIMEw.       Reads/Writes hours, minutes, and seconds in the form hh:mm:ss.ss
               YYMMDDw.     Reads/Writes date values in the form yymmdd or yyyymmdd
               YYMMNw.      Reads/Writes date values in the form yyyymm or yymm
               YYQw.        Reads/Writes quarters of the year
Numeric        COMMAw.d     Removes embedded characters
               COMMAXw.d    Removes embedded characters
               Ew.d         Reads/Writes numeric values that are stored in scientific notation and double-precision scientific notation
               FLOATw.d     Reads/Writes a native single-precision, floating-point value and divides it by 10 raised to the dth power
               HEXw.        Converts hexadecimal positive binary values to either integer (fixed-point) or real (floating-point) binary values
               NUMXw.d      Reads/Writes numeric values with a comma in place of the decimal point
               PERCENTw.d   Reads/Writes percentages as numeric values
               w.d          Reads/Writes standard numeric data
               YENw.d       Removes embedded yen signs, commas, and decimal points
               ZDw.d        Reads/Writes zoned decimal data
               ZDBw.d       Reads/Writes zoned decimal data in which zeros have been left blank
               ZDVw.d       Reads/Writes and validates zoned decimal data


L3. LENGTH STATEMENT
Specifies the number of bytes that SAS uses to store the values of variables. It can be used only in a DATA step. The syntax is:

LENGTH <variable-1><...variable-n> <$> <length> <DEFAULT=n>;

To change the length of a variable of an existing data set, the LENGTH statement must come before the statement that reads the data set (SET, MERGE, ...).

Context: DATA step

1 - changing the length of a character variable (LENGTH before SET)

data pluto2;
  length a $3.;
  set pippo2;

The CONTENTS Procedure
...
#  Variable  Type  Len  Pos
---------------------------
1  a         Char  3    8
2  x         Num   8    0

2 - LENGTH placed after SET: the length of the character variable is not changed

data pluto2;
  set pippo2;
  length a $3.;

The CONTENTS Procedure
...
#  Variable  Type  Len  Pos
---------------------------
1  a         Char  8    8
2  x         Num   8    0


L4. ATTRIB STATEMENT

Associates a format, informat, label, and/or length with one or more variables

Syntax
ATTRIB variable-list(s) attribute-list(s) ;

Arguments

variable-list
  names the variables that you want to associate with the attributes.
  Tip: List the variables in any form that SAS allows.

attribute-list
  specifies one or more attributes to assign to variable-list. Specify one or more of these attributes in the ATTRIB statement:

  FORMAT=format
    associates a format with variables in variable-list.
    Tip: The format can be either a standard SAS format or a format that is defined with the FORMAT procedure.

  INFORMAT=informat
    associates an informat with variables in variable-list.
    Tip: The informat can be either a standard SAS informat or an informat that is defined with the FORMAT procedure.

  LABEL='label'
    associates a label with variables in variable-list.

  LENGTH=<$>length
    specifies the length of variables in variable-list.
    Requirement: Put a dollar sign ($) in front of the length of character variables.
    Tip: Use the ATTRIB statement before the SET statement to change the length of variables in an output data set when you use an existing data set as input.
    Range: For character variables, the range is 1 to 32,767 for all operating environments.
    Operating Environment Information: For numeric variables, the minimum length you can specify with the LENGTH= specification is 2 in some operating environments and 3 in others.

Details

The Basics
Using the ATTRIB statement in the DATA step permanently associates attributes with variables by changing the descriptor information of the SAS data set that contains the variables. You can use ATTRIB in a PROC step, but the rules are different.

111

How SAS Treats Variables When You Assign Informats with the INFORMAT= Option on the ATTRIB Statement

Informats that are associated with variables by using the INFORMAT= option on the ATTRIB statement behave like informats that are used with modified list input. SAS reads the variables by using the scanning feature of list input, but applies the informat. In modified list input, SAS

- does not use the value of w in an informat to specify column positions or input field widths in an external file
- uses the value of w in an informat to specify the length of previously undefined character variables
- ignores the value of w in numeric informats
- uses the value of d in an informat in the same way it usually does for numeric informats
- treats blanks that are embedded as input data as delimiters unless you change their status with a DELIMITER= option specification in an INFILE statement.

If you have coded the INPUT statement to use another style of input, such as formatted input or column input, that style of input is not used when you use the INFORMAT= option on the ATTRIB statement.

Comparisons

You can use either an ATTRIB statement or an individual attribute statement such as FORMAT, INFORMAT, LABEL, and LENGTH to change an attribute that is associated with a variable.

Examples

Here are examples of ATTRIB statements that contain

- single variable and single attribute:
  attrib cost length=4;

- single variable with multiple attributes:
  attrib saleday informat=mmddyy. format=worddate.;

- multiple variables with the same multiple attributes:
  attrib x y length=$4 label='TEST VARIABLE';

- multiple variables with different multiple attributes:
  attrib x length=$4 label='TEST VARIABLE' y length=$2 label='RESPONSE';

- variable list with single attribute:
  attrib month1-month12 label='MONTHLY SALES';
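To see these attributes at work, the statements above can be placed in a complete DATA step. The following is a minimal sketch: the data set name sales, the labels and the data values are invented for illustration. Because the informat is assigned through ATTRIB, the plain list-style INPUT statement still applies it, as described in the Details.

```sas
data sales;                                   /* hypothetical data set */
   attrib saleday informat=mmddyy10. format=worddate18. label='Day of sale'
          cost    length=4 format=8.2              label='Unit cost';
   input saleday cost;                        /* list input; informat comes from ATTRIB */
   datalines;
03/17/2000 12.5
12/25/2001 8
;
run;

proc contents data=sales; run;                /* shows Len, Format, Informat, Label */
proc print data=sales label; run;             /* LABEL option prints labels as headings */
```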


L5. PROC FORMAT

L5.1. The VALUE statement

It builds a label (a format) that remains available for the whole open session.

L5.1a - Context: DATA step

proc format;
value P low-53='basso' 53-high='alto'; /* NB: 'basso' for values <=53 */
value $C 'mnopqr'=0 'fghilm'=1;
data pluto3;
set pippo2;
format x P. a $C.;

Obs  a  x
1    0  alto
2    1  basso
3    0  basso

#  Variable  Type  Len  Pos  Format
-----------------------------------
1  a         Char    8    8  $C.
2  x         Num     8    0  P.

L5.1b - Context: PROC step

proc format;
value P low-53='basso' 53-high='alto'; /* NB: 'basso' for values <=53 */
value $C 'mnopqr'=0 'fghilm'=1;
data pluto3;
set pippo2;
run;
proc print;
format x P. a $C.;
run;
proc contents;
run;

Obs  a  x
1    0  alto
2    1  basso
3    0  basso

#  Variable  Type  Len  Pos
---------------------------
1  a         Char    8    8
2  x         Num     8    0
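In the ranges above the value 53 belongs to both intervals, and SAS assigns it to the first one. A possible way to make this explicit is to exclude the endpoint with the < range operator. The sketch below assumes a hypothetical format name (level) and data set name (scores).

```sas
proc format;
   value level low-53   = 'low'
               53<-high = 'high';   /* 53 itself is excluded from this range */
run;

data scores;                        /* hypothetical data set */
   input x @@;
   datalines;
55 52 53
;
run;

proc print data=scores;
   format x level.;                 /* 55 -> high, 52 -> low, 53 -> low */
run;
```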


L5.2. The INVALUE statement

1 - from character to numeric

proc format;
invalue fb 'm'=1 'f'=2;
data pippo;
input a fb. x;
datalines;
m 55
f 52
m 53
;
run;

Obs  a  x
1    1  55
2    2  52
3    1  53

The CONTENTS Procedure ...
----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos
---------------------------
1  a         Num     8    0
2  x         Num     8    8

2 - from numeric to character

proc format;
invalue $fc low-53='basso' 53-high='alto';
data pippo;
length x $ 6;
input a $ x $fc.;
datalines;
m 55
f 52
m 53
;
run;

Obs  a  x
1    m  alto
2    f  basso
3    m  basso

The CONTENTS Procedure ...
----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos
---------------------------
1  a         Char    8    0
2  x         Char    6    8


L5.3. The PICTURE statement (WRITING NUMERIC VARIABLES)

data pippo;
input a b : ddmmyy. c;
datalines;
21111.5 12/05/01 1213344
3.8 13/12/02 1223
;
proc format;
picture separa low-high='000,000.000' (dig3sep=',' decsep='.');
picture sep low-high='000@000&000' (dig3sep='@' decsep='&'); /* a bit unusual */
picture data (default=20) low-high='%A,%d/%m/%Y' (datatype=date);
picture doll low-high='000,000,000' (dig3sep=',' prefix='$ ');
run;

A format defined with PICTURE is applied in the usual way, regardless of the context in which you are working:

format a separa. b data. c doll.;

The output of

proc print;
format a separa. b data. c doll.;
run;

is the following:

Obs  a           b                    c
1    21,111.500  Saturday,12/5/2001   $ 1,213,344
2    3.800       Friday,13/12/2002    $ 1,223

and PROC CONTENTS gives the following results, depending on whether the format was assigned in a PROC step (first listing) or in a DATA step (second listing):

#  Variable  Type  Len  Pos
---------------------------
1  a         Num     8    0
2  b         Num     8    8
3  c         Num     8   16

#  Variable  Type  Len  Pos  Format
-----------------------------------
1  a         Num     8    0  SEPARA.
2  b         Num     8    8  DATA.
3  c         Num     8   16  DOLL.

The syntax of the PICTURE statement is:

PICTURE name <(format-option(s))> <value-range-set-1 <(picture-1-option(s))> <...value-range-set-n <(picture-n-option(s))>>>;

and the following options are used:

To do this                                                     Use this option

Control the attributes of the format
- Specify the default length of the format                     DEFAULT=
- Specify a fuzz factor for matching values to a range         FUZZ=
- Specify a maximum length for the format                      MAX=
- Specify a minimum length for the format                      MIN=
- Specify multiple pictures for a given value or range
  and for overlapping ranges                                   MULTILABEL
- Store values or ranges in the order that you define them     NOTSORTED
- Round the value to the nearest integer before formatting     ROUND

Control the attributes of each picture in the format
- Specify a character that completes the formatted value       FILL=
- Specify a number to multiply the variable's value by
  before it is formatted                                       MULTIPLIER=
- Specify that numbers are message characters rather than
  digit selectors                                              NOEDIT
- Specify a character prefix for the formatted value           PREFIX=

Some arguments can also be used; those used in the example are listed here:

DECSEP='character'
specifies the separator character for the fractional part of a number.
Default: . (a decimal point)

DIG3SEP='character'
specifies the three-digit separator character for a number.
Default: , (a comma)

PREFIX='prefix'
specifies a character prefix to place in front of the value's first significant digit. You must use zero digit selectors or the prefix will not be used. The picture must be wide enough to contain both the value and the prefix. If the picture is not wide enough to contain both the value and the prefix, the format truncates or omits the prefix. Typical uses for PREFIX= are printing leading dollar signs and minus signs. For example, the PAY. format prints the variable value 25500 as $25,500.00:
picture pay low-high='000,009.99' (prefix='$');
Default: no prefix
Interaction: If you use the FILL= and PREFIX= options in the same picture, the format places the prefix and then the fill characters.

DATATYPE=DATE | TIME | DATETIME
specifies that you can use directives in the picture as a template to format date, time, or datetime values.

The main directives are:

%a  Locale's abbreviated weekday name
%A  Locale's full weekday name
%b  Locale's abbreviated month name
%B  Locale's full month name
%d  Day of the month as a decimal number (1-31), with no leading zero
%H  Hour (24-hour clock) as a decimal number (0-23), with no leading zero
%j  Day of the year as a decimal number (1-366), with no leading zero
%m  Month as a decimal number (1-12), with no leading zero
%M  Minute as a decimal number (0-59), with no leading zero
%y  Year without century as a decimal number (0-99), with no leading zero
%Y  Year with century as a decimal number
%U  Week number of the year (Sunday as the first day of the week) as a decimal number (0-53), with no leading zero
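As a small illustration of the directives, the sketch below builds a date picture with leading zeros. It assumes a SAS release that accepts a 0 between the percent sign and the directive letter (%0d, %0m) to request a leading zero; the format name isodate is hypothetical.

```sas
proc format;
   /* hypothetical format: year-month-day with leading zeros */
   picture isodate (default=10) low-high = '%Y-%0m-%0d' (datatype=date);
run;

data _null_;
   d = '05mar2002'd;      /* SAS date constant */
   put d isodate.;        /* writes the formatted date to the log */
run;
```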


L5.4. Summary: some examples of format changes

A. CHANGING THE FORMAT ON INPUT

1 - from character to numeric

proc format;
invalue fb 'm'=1 'f'=2;
data pippo;
input a fb. x;
datalines;
m 55
f 52
m 53
;
run;

Obs  a  x
1    1  55
2    2  52
3    1  53

The CONTENTS Procedure ...
----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos
---------------------------
1  a         Num     8    0
2  x         Num     8    8

2 - from numeric to character

proc format;
invalue $fc low-53='basso' 53-high='alto';
data pippo;
input a $ x $fc.;
length x $ 6;
datalines;
m 55
f 52
m 53
;
run;

Obs  a  x
1    m  alto
2    f  basso
3    m  basso

The CONTENTS Procedure ...
----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos
---------------------------
1  a         Char    8    0
2  x         Char    5    8

B. CHANGING THE FORMAT WHEN READING DATA FROM A SAS DATA SET (DSS)

data pippo2;
input a $ x;
datalines;
mnopqr 55
fghilm 52
mnopqr 53
;

1 - changing the format of a numeric variable

data pluto;
set pippo2;
attrib x format=5.1;

Obs  a       x
1    mnopqr  55.0
2    fghilm  52.0
3    mnopqr  53.0

The CONTENTS Procedure ...
----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos  Format
-----------------------------------
1  a         Char    8    8
2  x         Num     8    0  5.1

2 - changing the format of a character variable

data pluto2;
set pippo2;
attrib a format=$3.;

Obs  a    x
1    mno  55
2    fgh  52
3    mno  53

The CONTENTS Procedure ...
----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos  Format
-----------------------------------
1  a         Char    8    8  $3.
2  x         Num     8    0

3 - changing the format, but NOT THE TYPE: the format works like a label that is permanent within the open session (formats P. and $C. as defined in L5.1)

data pluto3;
set pippo2;
attrib x format=P. a format=$C.;

Obs  a  x
1    0  alto
2    1  basso
3    0  basso

The CONTENTS Procedure ...
----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos  Format
-----------------------------------
1  a         Char    8    8  $C.
2  x         Num     8    0  P.

L5.5. FUNCTIONS FOR CONVERTING A CHARACTER VARIABLE TO NUMERIC AND VICE VERSA

PUT Returns a value using a specified format

Syntax PUT(source, format.)

Arguments

source
identifies the SAS variable or constant whose value you want to reformat. The source argument can be character or numeric.

format.
contains the SAS format that you want applied to the variable or constant that is specified in the source. To override the default alignment, you can add an alignment specification to a format:
- -L left aligns the value.
- -C centers the value.
- -R right aligns the value.
Restriction: The format. must be of the same type as the source, either character or numeric.

Details

The format must be the same type (numeric or character) as the value of source. The result of the PUT function is always a character string. If the source is numeric, the resulting string is right aligned. If the source is character, the result is left aligned. Use PUT to convert a numeric value to a character value. PUT writes (or produces a reformatted result) only while it is executing. To preserve the result, assign it to a variable.

Comparisons

The PUT statement and the PUT function are similar. The PUT function returns a value using a specified format. You must use an assignment statement to store the value in a variable. The PUT statement writes a value to an external destination (either the SAS log or a destination you specify).

INPUT Returns the value produced when a SAS expression that uses a specified informat expression is read

Syntax INPUT(source, <? | ??>informat.)

Arguments

source
contains the SAS character expression to which you want to apply a specific informat.

? or ??
The optional question mark (?) and double question mark (??) format modifiers suppress the printing of both the error messages and the input lines when invalid data values are read. The ? modifier suppresses the invalid data message. The ?? modifier also suppresses the invalid data message and, in addition, prevents the automatic variable _ERROR_ from being set to 1 when invalid data are read.

informat.
is the SAS informat that you want to apply to the source.

Details

The INPUT function enables you to read the value of source by using a specified informat. The informat determines whether the result is numeric or character. Use INPUT to convert character values to numeric values.

Comparisons

The INPUT function returns the value produced when a SAS expression is read using a specified informat. You must use an assignment statement to store that value in a variable. The INPUT statement uses an informat to read a data value and then optionally stores that value in a variable.

Examples

Example 1: Converting Numeric Values to Character Value

In this example, the first statement converts the values of CC, a numeric variable, into the four-character hexadecimal format, and the second writes the same value that the PUT function returns.

cchex=put(cc,hex4.);
put cc hex4.;

Example 2: Converting Character Values to Numeric Values

This example uses the INPUT function to convert a character value to a numeric value and store it in another variable. The COMMA9. informat reads the value of the SALE variable, stripping the commas. The resulting value, 2115353, is stored in FMTSALE.

data testin;
input sale $9.;
fmtsale=input(sale,comma9.);
datalines;
2,115,353
;

Example 3: Using PUT and INPUT Functions

In this example, PUT returns a numeric value as a character string. The value 122591 is assigned to the CHARDATE variable. INPUT returns the value of the character string as a SAS date value using a SAS date informat. The value 11681 is stored in the SASDATE variable.

numdate=122591;
chardate=put(numdate,z6.);
sasdate=input(chardate,mmddyy6.);
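The ?? modifier described above is useful when a character column may contain values that are not numbers. The following is a minimal sketch with invented data; the data set name convert and the variable names raw, num and back are hypothetical.

```sas
data convert;                          /* hypothetical data set */
   length raw $10;
   input raw $;
   /* ?? suppresses the log message and keeps _ERROR_ at 0 for invalid values */
   num  = input(raw, ?? comma10.);     /* missing (.) when raw is not a number */
   back = put(num, 8.2);               /* character result, right aligned */
   datalines;
1,250
abc
42
;
run;

proc print data=convert; run;
proc contents data=convert; run;       /* num is numeric, back is character */
```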


L6. SAS Date, Time, and Datetime Values

Definitions

SAS date value
is a value that represents the number of days between January 1, 1960, and a specified date. SAS can perform calculations on dates ranging from A.D. 1582 to A.D. 19,900. Dates before January 1, 1960, are negative numbers; dates after are positive numbers.
- SAS date values account for all leap year days, including the leap year day in the year 2000.
- SAS date values can reliably tell you what day of the week a particular day fell on as far back as September 1752, when the calendar was adjusted by dropping several days. SAS day-of-the-week and length-of-time calculations are accurate in the future to A.D. 19,900.
- Various SAS language elements handle SAS date values: functions, formats and informats.

SAS time value
is a value representing the number of seconds since midnight of the current day. SAS time values are between 0 and 86400.

SAS datetime value
is a value representing the number of seconds between January 1, 1960 and an hour/minute/second within a specified date.

The following figure shows some dates written in calendar form and as SAS date values.

[Figure: How SAS Converts Calendar Dates to SAS Date Values]

Two-Digit and Four-Digit Years

SAS software can read two-digit or four-digit year values. If SAS encounters a two-digit year, the YEARCUTOFF= option can be used to specify which century within a 100 year span the two-digit year should be attributed to. For example, YEARCUTOFF=1950 means that two-digit years 50 through 99 correspond to 1950 through 1999, while two-digit years 00 through 49 correspond to 2000 through 2049. Note that while the default value of the YEARCUTOFF= option in Version 8 of the SAS System is 1920, you can adjust the YEARCUTOFF= value in a DATA step to accommodate the range of date values you are working with at the moment. To correctly handle 2-digit years representing dates between 2000 and 2099, you should specify an appropriate YEARCUTOFF= value between 1901 and 2000.

The Year 2000

SAS software treats the year 2000 like any other leap year. If you use two-digit year numbers for dates, you'll probably need to adjust the default setting for the YEARCUTOFF= option to work with date ranges for your data, or switch to four-digit years. The following program changes the YEARCUTOFF= value to 1950. This change means that all two-digit dates are now assumed to fall in the 100-year span from 1950 to 2049.

options yearcutoff=1950;
data _null_;
a='26oct02'd;
put 'SAS date=' a;
put 'formatted date=' a date9.;
run;

The PUT statement writes the following lines to the SAS log:

SAS date=15639
formatted date=26OCT2002
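The effect of YEARCUTOFF= is easiest to see by reading the same two-digit year under two different settings. The sketch below uses an invented date string; with the 100-year spans described above, the year 30 should be read as 1930 under YEARCUTOFF=1920 and as 2030 under YEARCUTOFF=1950.

```sas
options yearcutoff=1920;               /* span 1920-2019 */
data _null_;
   d = input('26/10/30', ddmmyy8.);    /* two-digit year 30 */
   put 'yearcutoff=1920: ' d date9.;   /* 26OCT1930 */
run;

options yearcutoff=1950;               /* span 1950-2049 */
data _null_;
   d = input('26/10/30', ddmmyy8.);
   put 'yearcutoff=1950: ' d date9.;   /* 26OCT2030 */
run;
```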


Working with SAS Dates and Times

Informats and Formats

The SAS System converts date, time and datetime values back and forth between calendar dates and clock times with SAS language elements called formats and informats.

- Formats present a value, recognized by SAS, such as a time or date value, as a calendar date or clock time in a variety of lengths and notations.
- Informats read notations or a value, such as a clock time or a calendar date, which may be in a variety of lengths, and then convert the data to a SAS date, time, or datetime value.

Date and Time Tools by Task

The following table correlates tasks with various SAS System language elements that are available for working with time and date data.

To write SAS date values in recognizable forms, use these DATE FORMATS (the main ones):

Format      Input  Result                Format       Input  Result
DATEw.      14686  17MAR00               MMDDYYw.     14686  03/17/00
DAYw.       14686  17                    MMDDYY10.    14686  03/17/2000
DDMMYYw.    14686  17/03/00              MMDDYYBw.    14686  03 17 00
DDMMYY10.   14686  17/03/2000            MMDDYYB10.   14686  03 17 2000
DDMMYYBw.   14686  17 03 00              MMDDYYCw.    14686  03:17:00
DDMMYYB10.  14686  17 03 2000            MMDDYYC10.   14686  03:17:2000
DDMMYYCw.   14686  17:03:00              MMDDYYDw.    14686  03-17-00
DDMMYYC10.  14686  17:03:2000            MMDDYYD10.   14686  03-17-2000
DDMMYYDw.   14686  17-03-00              MMDDYYSw.    14686  03/17/00
DDMMYYD10.  14686  17-03-2000            MMDDYYS10.   14686  03/17/2000
DDMMYYNw.   14686  17MAR00               MMYYxw.      14686  03M2000
DDMMYYN10   14686  17MAR2000             MMYYCw.      14686  03:2000
DDMMYYPw.   14686  17.03.00              MMYYD.       14686  03-2000
DDMMYYP10.  14686  17.03.2000            MMYYN.       14686  032000
DDMMYYSw.   14686  17/03/00              MMYYP.       14686  03.2000
DDMMYYS10.  14686  17/03/2000            MMYYS.       14686  03/2000
DOWNAME.    14686  Friday                MONNAME.     14686  March
EURDFDEw.   14686  17MAR00               MONTH.       14686  3
EURDFDE9.   14686  17MAR2000             MONYY.       14686  MAR2000
EURDFDNw.   14686  5                     TIMEw.d      14686  4:04:46
EURDFDWNw.  14686  Friday                TIMEAMPMw.d  14686  4:04:46 AM
EURDFMYw.   14686  MAR00                 TODw.        14686  4:04:46
EURDFDMY7   14686  MAR2000               WEEKDATEw.   14686  Friday, March 17, 2000
EURDFWDXw.  14686  17MAR2000             WEEKDAYw.    14686  6
EURDFMNw.   14686  March                 WORDDATEw.   14686  March 17, 2000
EURDFWKXw.  14686  Friday, 17 MAR 2000

For weekday names in Italian use ITADFDWNw.; in French, FRADFDWNw.

To read calendar dates as SAS date values, use these DATE INFORMATS (the main ones).
Note: YEARCUTOFF=1920

Informat    Input       Result
DATEw.      17MAR2000   -14534
DATE9.      17MAR2000   14686
DDMMYYw.    170300      14686
DDMMYY8.    17032000    14686
MMDDYYw.    031700      14686
MMDDYY10.   03172000    14686
MONYYw.     MAR00       14670
NENGOw.     H.12/03/17  14686
YYMMDDw.    000317      14686
YYMMDD10.   20000317    14686
YYQw.       00Q1        14610

To do this ... (use this)
   Element              Input                      Result

Extract a date from a datetime value (date functions)
   DATEPART             '17MAR00:00:00'DT          14686
Return today's date as a SAS date (date functions)
   DATE() or TODAY()    ()                         SAS date for today
Extract calendar dates from SAS (date functions)
   DAY                  14686                      17
   HOUR                 14686                      4
   MINUTE               14686                      4
   MONTH                14686                      3
   WEEKDAY              14686                      6
   YEAR                 14686                      2000
Write a date as a constant in an expression (SAS date constant)
   'ddmmmyy'd or        '17mar00'd                 14686
   'ddmmmyyyy'd         '17mar2000'd

Time Tasks

Write SAS time values as time values (time formats)
   HHMM.                53132                      14:46
   HOUR.                53132                      15
   MMSS.                53132                      885
   TIME.                53132                      14:45:32
   TOD.                 53132                      14:45:32
Read time values as SAS time values (time informats)
   TIME                 14:45:32                   53132
Return the current time of day as a SAS time value (time functions)
   TIME()               ()                         SAS time value at moment of execution in NNNNN.NN
Return the time part of a SAS datetime value (time functions)
   TIMEPART             SAS datetime value in      SAS time value part of date value in NNNNN.NN
                        NNNNNNNNNN.N

Datetime Tasks

Write SAS datetime values as datetime values (datetime formats)
   DATEAMPM             1217083532                 26JUL98:02:45 PM
   DATETIME             1268870400                 17MAR00:00:00:00
   EURDFDT              1217083532                 26JUL98:14:45:32
Read datetime values as SAS datetime values (datetime informats)
   DATETIME             17MAR00:00:00:00           1268870400
Return the current date and time of day as a SAS datetime value (datetime functions)
   DATETIME()           ()                         SAS datetime value at moment of execution in NNNNNNNNNN.N

Interval Tasks

Return the number of specified time intervals that lie between the two date or datetime values (interval functions)
   INTCK                week2 01aug60 01jan01      1055
Advance a date, time, or datetime value by a given interval, and return a date, time, or datetime value (interval functions)
   INTNX                day 14086 01jan60          14086
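The two interval functions in the table can be tried with a short DATA step. This is a sketch with invented dates: INTCK counts how many interval boundaries (here, first days of a month) lie between two dates, and INTNX moves a date forward by a number of intervals, returning by default the beginning of the resulting interval.

```sas
data _null_;
   start = '01jan2000'd;
   stop  = '17mar2000'd;
   months = intck('month', start, stop);   /* boundaries crossed: 1 Feb and 1 Mar -> 2 */
   next   = intnx('month', stop, 1);       /* first day of the following month */
   put months= next= date9.;               /* months=2 next=01APR2000 */
run;
```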

Examples

Example 1: Displaying Date, Time, and Datetime Values as Recognizable Dates and Times

The following example demonstrates how a value may be displayed as a date, a time, or a datetime. Remember to select the SAS language element that converts a SAS date, time, or datetime value to the intended date, time or datetime format. See the previous tables for examples. Note:

- Time formats count the number of seconds within a day, so the values will be between 0 and 86400.
- DATETIME formats count the number of seconds since January 1, 1960, so for datetimes that are greater than 02JAN1960:00:00:01 (integer of 86401), the datetime value will always be greater than the time value.
- When in doubt, look at the contents of your data set for clues as to which type of value you are dealing with.

This program uses the DATETIME, DATE and TIMEAMPM formats to display the value 86399 as a date and time, a calendar date, and a time.

data test;
options nodate pageno=1 linesize=80 pagesize=60;
Time1=86399;
format Time1 datetime.;
Date1=86399;
format Date1 date.;
Time2=86399;
format Time2 timeampm.;
run;
proc print data=test;
title 'Same Number, Different SAS Values';
footnote1 'Time1 is a SAS DATETIME value';
footnote2 'Date1 is a SAS DATE value';
footnote3 'Time2 is a SAS TIME value.';
run;

Datetime, Date and Time Values for 86399

Same Number, Different SAS Values                    1

Obs  Time1             Date1    Time2
1    01JAN60:23:59:59  20JUL96  11:59:59 PM

Time1 is a SAS DATETIME value
Date1 is a SAS DATE value
Time2 is a SAS TIME value.

Example 2: Reading, Writing, and Calculating Date Values

This program reads four regional meeting dates and calculates the dates on which announcements should be mailed.

data meeting;
options nodate pageno=1 linesize=80 pagesize=60;
input region $ mtg : mmddyy8.;
sendmail=mtg-45;
datalines;
N 11-24-99
S 12-28-99
E 12-03-99
W 10-04-99
;
proc print data=meeting;
format mtg sendmail date9.;
title 'When To Send Announcements';
run;

Calculated Date Values: When to Send Mail

When To Send Announcements

Obs  region  mtg        sendmail
1    N       24NOV1999  10OCT1999
2    S       28DEC1999  13NOV1999
3    E       03DEC1999  19OCT1999
4    W       04OCT1999  20AUG1999

SOME OF THE MAIN DATE AND TIME FUNCTIONS

Function   Description
DATDIF     Returns the number of days between two dates
DATE       Returns the current date as a SAS date value
DATEPART   Extracts the date from a SAS datetime value
DATETIME   Returns the current date and time of day as a SAS datetime value
DAY        Returns the day of the month from a SAS date value
DHMS       Returns a SAS datetime value from date, hour, minute, and second
HMS        Returns a SAS time value from hour, minute, and second values
HOUR       Returns the hour from a SAS time or datetime value
MDY        Returns a SAS date value from month, day, and year values
MINUTE     Returns the minute from a SAS time or datetime value
MONTH      Returns the month from a SAS date value
QTR        Returns the quarter of the year from a SAS date value
SECOND     Returns the second from a SAS time or datetime value
TIME       Returns the current time of day
TIMEPART   Extracts a time value from a SAS datetime value
TODAY      Returns the current date as a SAS date value
WEEKDAY    Returns the day of the week from a SAS date value
YEAR       Returns the year from a SAS date value
YRDIF      Returns the difference in years between two dates

DATDIF Returns the number of days between two dates

DATDIF(sdate,edate,basis)

Arguments

sdate specifies a SAS date value that identifies the starting date.
edate specifies a SAS date value that identifies the ending date.
basis identifies a character constant or variable that describes how SAS calculates the date difference:
- '30/360' specifies a 30 day month and a 360 day year. Each month is considered to have 30 days, and each year 360 days, regardless of the actual number of days in each month or year.
- 'ACT/ACT' (or 'Actual') uses the actual number of days between dates.

Examples

In the following example, DATDIF returns the actual number of days between two dates, and the number of days based on a 30-day month and a 360-day year.

data _null_;
sdate='16oct78'd;
edate='16feb96'd;
actual=datdif(sdate, edate, 'act/act');
days360=datdif(sdate, edate, '30/360');
put actual= days360=;
run;

SAS Statements    Results
put actual=;      6332
put days360=;     6240
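Several functions from the table above can be combined in one step. The following sketch reuses the two dates of the DATDIF example; the variable names are hypothetical and no particular output is claimed for YRDIF, whose result depends on the chosen basis.

```sas
data _null_;
   sdate = mdy(10, 16, 1978);                   /* build a date from month, day, year */
   edate = '16feb96'd;
   days  = datdif(sdate, edate, 'act/act');     /* actual number of days */
   years = yrdif(sdate, edate, 'act/act');      /* difference in years */
   dt    = dhms(edate, 14, 45, 32);             /* datetime from date, hour, minute, second */
   wd    = weekday(edate);                      /* 1=Sunday, ..., 7=Saturday */
   put days= years= 6.2 wd= dt= datetime18.;
run;
```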


L7. SOME ROUNDING FUNCTIONS

ROUND Rounds to the nearest round-off unit

Syntax ROUND(argument,round-off-unit)

SAS Statements                                 Results
var1=223.456; x=round(var1,1); put x 9.5;      223.00000
var2=223.456; x=round(var2,.01); put x 9.5;    223.46000
x=round(223.456,100); put x 9.5;               200.00000
x=round(223.456); put x 9.5;                   223.00000
x=round(223.456,.3); put x 9.5;                223.50000

FLOOR Returns the largest integer that is less than or equal to the argument

Syntax FLOOR(argument)

SAS Statements                                 Results
var1=2.1; a=floor(var1); put a;                2
var2=-2.4; b=floor(var2); put b;               -3
c=floor(3); put c;                             3
d=floor(-1.6); put d;                          -2
e=floor(1.-1.e-13); put e;                     1

INT Returns the integer value

Syntax INT(argument)

SAS Statements                                 Results
var1=2.1; x=int(var1); put x=;                 2
var2=-2.4; y=int(var2); put y=;                -2
a=int(3); put a=;                              3
b=int(-1.6); put b=;                           -1
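The difference between these functions shows up mainly with negative arguments. The sketch below applies them side by side to the values used above; CEIL (smallest integer greater than or equal to the argument) is not described in these notes and is added here only for comparison.

```sas
data _null_;
   do v = 2.1, -2.4, -1.6, 3;
      r = round(v);    /* nearest integer */
      f = floor(v);    /* largest integer <= v */
      c = ceil(v);     /* smallest integer >= v */
      i = int(v);      /* integer part (truncation toward zero) */
      put v= r= f= c= i=;
   end;
run;
/* Log:
   v=2.1 r=2 f=2 c=3 i=2
   v=-2.4 r=-2 f=-3 c=-2 i=-2
   v=-1.6 r=-2 f=-2 c=-1 i=-1
   v=3 r=3 f=3 c=3 i=3
*/
```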


L8. SOME FUNCTIONS FOR CHARACTER VARIABLES

SUBSTR (right of =) Extracts a substring from an argument

Syntax <variable=>SUBSTR(argument,position<,n>)

Arguments

variable specifies a valid SAS variable name.
argument specifies any SAS character expression.
position specifies a numeric expression that is the beginning character position.
n specifies a numeric expression that is the length of the substring to extract.
Interaction: If n is larger than the length of the expression that remains in argument after position, SAS extracts the remainder of the expression.
Tip: If you omit n, SAS extracts the remainder of the expression.

The SUBSTR function returns a portion of an expression that you specify in argument. The portion begins with the character specified by position and is the number of characters specified by n. A variable that is created by SUBSTR obtains its length from the length of argument.

SAS Statements                  Results
                                ----+----1----+----2
date='06MAY98';
month=substr(date,3,3);
year=substr(date,6,2);
put @1 month @5 year;           MAY 98

SUBSTR (left of =) Replaces character value contents

Syntax SUBSTR(argument,position<,n>)=characters-to-replace

Arguments

argument specifies a character variable.
position specifies a numeric expression that is the beginning character position.
n specifies a numeric expression that is the length of the substring that will be replaced.
Restriction: n cannot be larger than the length of the expression that remains in argument after position.
Tip: If you omit n, SAS uses all of the characters on the right side of the assignment statement to replace the values of argument.
characters-to-replace specifies a character expression that will replace the contents of argument.
Tip: Enclose a literal string of characters in quotation marks.

When you use the SUBSTR function on the left side of an assignment statement, SAS replaces the value of argument with the expression on the right side. SUBSTR replaces n characters starting at the character you specify in position.

SAS Statements                             Results
a='KIDNAP'; substr(a,1,3)='CAT'; put a;    CATNAP
b=a; substr(b,4)='TY'; put b;              CATTY
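Both uses of SUBSTR can be combined with the INPUT function of section L5.5, for example to extract a number embedded in a name. The sketch below uses an invented value in the style of a file name; the variable names code and station are hypothetical.

```sas
data _null_;
   code = 'm017.asc';                          /* hypothetical value */
   station = input(substr(code, 2, 3), 3.);    /* characters 2-4 ('017') read as a number */
   substr(code, 1, 1) = 'M';                   /* replace the first character in place */
   put station= code=;                         /* station=17 code=M017.asc */
run;
```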

COMPBL Removes multiple blanks from a character string

The COMPBL function removes multiple blanks in a character string by translating each occurrence of two or more consecutive blanks into a single blank. The value that the COMPBL function returns has a default length of 200. You can use the LENGTH statement, before calling COMPBL, to set the length of the value.

SAS Statements                         Results
                                       ----+----1----+----2-
string='Hey   Diddle  Diddle';
string=compbl(string);
put string;                            Hey Diddle Diddle

TRIM

Syntax TRIM(argument)

argument specifies any SAS character expression.

Details

TRIM copies a character argument, removes all trailing blanks, and returns the trimmed argument as a result. If the argument is blank, TRIM returns one blank. TRIM is useful for concatenating because concatenation does not remove trailing blanks. Assigning the results of TRIM to a variable does not affect the length of the receiving variable. If the trimmed value is shorter than the length of the receiving variable, SAS pads the value with new blanks as it assigns it to the variable.

Comparisons

The TRIM and TRIMN functions are similar. TRIM returns one blank for a blank string. TRIMN returns a null string (zero blanks) for a blank string.

Example 1: Removing Trailing Blanks

data test;
input part1 $ 1-10 part2 $ 11-20;
hasblank=part1||part2;
noblank=trim(part1)||part2;
put hasblank;
put noblank;
datalines;

Data Line             Results
                      ----+----1----+----2
apple     sauce       apple     sauce
                      applesauce

Example 2: Concatenating a Blank Character Expression

SAS Statements                            Results
x="A"||trim(" ")||"B"; put x;             A B
x=" "; y=">"||trim(x)||"<"; put y;        > <

UPCASE Converts all letters in an argument to uppercase

Syntax UPCASE(argument)

argument specifies any SAS character expression.

Details

The UPCASE function copies a character argument, converts all lowercase letters to uppercase letters, and returns the altered value as a result.

SAS Statements                               Results
name=upcase('John B. Smith'); put name;      JOHN B. SMITH

SCAN Selects a given word from a character expression

Syntax SCAN(argument,n<, delimiters>)

Arguments

argument specifies any character expression.
n specifies a numeric expression that produces the number of the word in the character string you want SCAN to select.
Tip: If n is negative, SCAN selects the word in the character string starting from the end of the string. If |n| is greater than the number of words in the character string, SCAN returns a blank value.
delimiters specifies a character expression that produces characters that you want SCAN to use as word separators in the character string.
Default: If you omit delimiters in an ASCII environment, SAS uses the following characters: blank . < ( + & ! $ * ) ; ^ - / , % |
Tip: If you represent delimiters as a constant, enclose delimiters in quotation marks.

SAS Statements                                      Results
arg='ABC.DEF(X=Y)'; word=scan(arg,3); put word;     X=Y
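The negative word count and the optional delimiter list can be illustrated on the same string. In this sketch the variable names lastw and second are hypothetical; with the default delimiters the words are ABC, DEF and X=Y, while with '.' as the only delimiter the string splits into ABC and DEF(X=Y).

```sas
data _null_;
   arg    = 'ABC.DEF(X=Y)';
   lastw  = scan(arg, -1);        /* last word with default delimiters: X=Y */
   second = scan(arg, 2, '.');    /* only '.' separates words: DEF(X=Y) */
   put lastw= second=;
run;
```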


OTHER MAIN FUNCTIONS FOR CHARACTER VARIABLES

COMPRESS Removes specific characters from a character string

DEQUOTE Removes quotation marks from a character value

INDEX Searches a character expression for a string of characters

INDEXC Searches a character expression for specific characters

INDEXW Searches a character expression for a specified string as a word

LEFT Left aligns a SAS character expression

LENGTH Returns the length of an argument

LOWCASE Converts all letters in an argument to lowercase

MISSING Returns a numeric result that indicates whether the argument contains a missing value

QUOTE Adds double quotation marks to a character value

REPEAT Repeats a character expression

REVERSE Reverses a character expression

RIGHT Right aligns a character expression

SPEDIS Determines the likelihood of two words matching, expressed as the asymmetric spelling distance between the two words

TRANSLATE Replaces specific characters in a character expression

TRANWRD Replaces or removes all occurrences of a word in a character string

TRIMN Removes trailing blanks from character expressions and returns a null string (zero blanks) if the expression is missing
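A few of the functions listed above can be tried together in one DATA step. This is an illustrative sketch only; the input string and the variable names are assumptions, not part of the original notes.

```sas
data _null_;
   s = 'John B. Smith';
   no_blanks = compress(s);                 /* removes blanks: JohnB.Smith */
   lower     = lowcase(s);                  /* john b. smith */
   pos       = index(s, 'Smith');           /* position of the first match: 9 */
   swapped   = tranwrd(s, 'John', 'Jane');  /* replaces the word John with Jane */
   backwards = reverse('abc');              /* cba */
   put no_blanks= lower= pos= swapped= backwards=;
run;
```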


M. FURTHER TOPICS: SAS MACROS

M1. Introduction to macro programming

The problem

1. We need to read 80 data files containing measurements taken at 80 different stations in the Ross Sea (Antarctica). The files are plain text and are named:
m001.asc ... m009.asc m010.asc ... m080.asc

In each file the values are separated by blanks, with one observation per line; the first observation is on the second line. The variables are, in order:

- depth (profond)
- temperature (temper)
- salinity (salinita)
- 4 variables of no interest
- fluorescence (fluoresc)
- 2 variables of no interest
- density (densita)
- 1 variable of no interest

A variable holding the station number (the number in the file name) must be added to each SAS data set (DSS).

SAS PROGRAM (for the first file)

data a.ant1;
   infile 'C:\ANTARTIDE\m001.asc' firstobs=2;
   input profond temper salinita x1 x2 x3 x4 fluoresc x5 x6 densita x7;
   stazione=1;
   drop x1-x7;
run;

The program must be repeated for all 80 files, each time changing the number of the input file, the number of the output data set and the station number.

2. The data sets built in this way must then be concatenated one after another with a DATA step such as:

data a.tutti;
   set a.ANT1 a.ANT2 a.ANT3 a.ANT4 /* ... up to a.ANT80 */;
run;

In situations like this one, macro programming solves the problem quickly. A macro is a piece of a SAS program with its own language and its own syntax. Macros are extremely useful in situations, like the one above, where strings of text must be generated and concatenated with other strings. For the problem above (step 2) we have to write

A.ANT1 A.ANT2 A.ANT3 A.ANT4 ... up to A.ANT80

where the first part of each string, A.ANT, is always the same and the second part is a number that runs from 1 to 80.

Macro language statements begin with the symbol %.

A macro begins with the statement %macro <name>; and ends with the statement %mend <name>;

A macro that generates the strings A.ANT1 A.ANT2 A.ANT3 A.ANT4 ... A.ANT80 is the following:

%macro concatena;
   %do i=1 %to 80;
      a.ant&i
   %end;
%mend concatena;

Note that the macro variable i is written as &i, that is, preceded by the symbol &, when it is used, whereas it is written without a prefix when it is declared (in the %do loop). The macro is invoked in the program as follows:

data a.tutti;
   set %concatena ;
run;

A more general macro can be built that receives, when it is called, the first and last numbers in the names of the data sets (1 and 80 in the previous case). In this case the names of the macro parameters must be written in parentheses after the macro name:

%macro concatena(in, fin);
   %do i=&in %to &fin;
      a.ant&i
   %end;
%mend concatena;

As already seen, macro parameters (here in and fin) must be preceded by the symbol & when they are used. The macro is invoked in the program as follows:

data a.tutti;
   set %concatena(1,80) /* or %concatena(4,15) .... */ ;
run;

We generalize the macro further so that, when it is called, it also receives the first part of the data set name string to be built (A.ANT in the previous example).

%macro concatena(in, fin, nomeds);
   %do i=&in %to &fin;
      &nomeds.&i
   %end;
%mend concatena;

To build a string by concatenating two macro variable references (&nomeds and &i), separate them with a . (period): &nomeds.&i. The period marks the end of the first reference and does not appear in the resulting text.

The macro is invoked in the program as follows:

data a.tutti;
   set %concatena(1,80,a.ant) ;
run;

Finally, we also move into the macro the part of the program that builds the output data set, adding the name of the output data set to the macro parameters.

%macro concatena(nomeds,in,fin,dsout);
   data &dsout;
      set
      %do i=&in %to &fin;
         &nomeds.&i
      %end;
      ;
   run;
%mend concatena;

It is called with: %concatena(a.ant,4,8,a.finale);
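One way to check what text a macro like this generates is the MPRINT system option, which writes the SAS statements produced by macro execution to the log. The sketch below assumes that the libref a is already assigned and that the data sets a.ant4 to a.ant8 exist; it repeats the macro definition only so that the block is self-contained.

```sas
options mprint;   /* write macro-generated statements to the SAS log */

%macro concatena(nomeds,in,fin,dsout);
   data &dsout;
      set
      %do i=&in %to &fin;
         &nomeds.&i
      %end;
      ;
   run;
%mend concatena;

%concatena(a.ant,4,8,a.finale)

options nomprint; /* switch the option off again */
```

With MPRINT active, the log should show the generated DATA and SET statements with the five data set names written out, which makes it easy to spot a missing period or semicolon.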


We now build a macro that simplifies reading the data from the text files. To begin with, we consider only the files numbered from 1 to 9.

%macro lettura;
   %do n=1 %to 9;
      data a.ant&n;
         infile "C:\ANTARTIDE\m00&n..asc" firstobs=2;
         input profond temper salinita x1 x2 x3 x4 fluoresc x5 x6 densita x7;
         stazione=&n;
         drop x1-x7;
      run;
   %end;
%mend lettura;

Note that:

1. to write a macro variable reference inside a string delimited by the symbol ' (single quotation mark), as in the initial program, that symbol must be replaced by the symbol " (double quotation mark);

2. of the two periods that appear in m00&n..asc, the first separates the macro variable reference from the text that follows and the second is part of the file name.

Here too the macro can be given two parameters for the first and last numbers of the data file names; a similar macro can be written to read the files from m010 to m080 (the number of zeros in the name changes).

%macro lettura(in,fin);
   %do n=&in %to &fin;
      data ant&n;
         infile "C:\ANTARTIDE\m00&n..asc" firstobs=2;
         input profond temper salinita x1 x2 x3 x4 fluoresc x5 x6 densita x7;
         stazione=&n;
         drop x1-x7;
      run;
   %end;
%mend lettura;
%lettura(1,9);

%macro lettura2(in,fin);
   %do n=&in %to &fin;
      data ant&n;
         infile "C:\ANTARTIDE\m0&n..asc" firstobs=2;
         input profond temper salinita x1 x2 x3 x4 fluoresc x5 x6 densita x7;
         stazione=&n;
         drop x1-x7;
      run;
   %end;
%mend lettura2;
%lettura2(10,80)

The two macros can be merged into one by using the macro language statement %if... %then... %else...;

%macro lettura(in,fin);
   %do n=&in %to &fin;
      data ant&n;
         %if &n < 10 %then infile "C:\ANTARTIDE\m00&n..asc" firstobs=2;
         %else infile "C:\ANTARTIDE\m0&n..asc" firstobs=2;
         ;
         input profond temper salinita x1 x2 x3 x4 fluoresc x5 x6 densita x7;
         stazione=&n;
         drop x1-x7;
      run;
   %end;
%mend lettura;

Note the ; after the %if...%then...%else...; statement: the semicolons that close %then and %else belong to the macro language, so the extra ; is the one that ends the generated INFILE statement. The macro is called with: %lettura(1,80);
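As an alternative to the %if test, the zero-padded file number can be built directly. The sketch below is not the authors' method: it uses %SYSFUNC with the PUTN function and the Z3. format, and the macro name lettura3 and the macro variable num are hypothetical. The file path is the one used in the examples above.

```sas
%macro lettura3(in,fin);
   %local n num;
   %do n=&in %to &fin;
      /* PUTN with the Z3. format pads the number with leading zeros: 7 -> 007 */
      %let num=%sysfunc(putn(&n,z3.));
      data ant&n;
         infile "C:\ANTARTIDE\m&num..asc" firstobs=2;
         input profond temper salinita x1 x2 x3 x4 fluoresc x5 x6 densita x7;
         stazione=&n;
         drop x1-x7;
      run;
   %end;
%mend lettura3;

%lettura3(1,80)
```

The same padding would keep working if the station numbers went beyond 99, whereas the %if version would need a further branch.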


M2. SAS Macro Language: Reference

The SAS macro language consists of statements, functions, and automatic macro variables.

Macro Statements

A macro language statement instructs the macro processor to perform an operation. It consists of a string of keywords, SAS names, and special characters and operators, and it ends in a semicolon. Some macro language statements are allowed only in macro definitions, but you can use others anywhere in a SAS session or job, either inside or outside macro definitions (referred to as open code). The table "Macro Language Statements Allowed in Macro Definitions and Open Code" lists macro language statements that you can use in both macro definitions and open code.

Macro Language Statements Allowed in Macro Definitions and Open Code

Statement      Description
%* comment     designates comment text
%DISPLAY       displays a macro window
%GLOBAL        creates macro variables that are available during the execution of an entire SAS session
%INPUT         supplies values to macro variables during macro execution
%KEYDEF        assigns a definition to or identifies the definition of a function key
%LET           creates a macro variable and assigns it a value
%MACRO         begins a macro definition
%PUT           writes text or the values of macro variables to the SAS log
%SYSCALL       invokes a SAS call routine
%SYSEXEC       issues operating system commands
%SYSLPUT       defines a new macro variable or modifies the value of an existing macro variable on a remote host or server
%SYSRPUT       assigns the value of a macro variable on a remote host to a macro variable on the local host
%WINDOW        defines customized windows

Macro Language Statements Allowed in Macro Definitions Only

Statement          Description
%DO                begins a %DO group
%DO, Iterative     executes statements repetitively, based on the value of an index variable
%DO %UNTIL         executes statements repetitively until a condition is true
%DO %WHILE         executes statements repetitively while a condition is true
%END               ends a %DO group
%GOTO              branches macro processing to the specified label
%IF-%THEN/%ELSE    conditionally processes a portion of a macro
%label:            identifies the destination of a %GOTO statement
%LOCAL             creates macro variables that are available only during the execution of the macro where they are defined
%MEND              ends a macro definition

Statements That Perform Automatic Evaluation

Some macro statements perform an operation based on an evaluation of an arithmetic or logical expression. They perform the evaluation by automatically calling the %EVAL function. If you get an error message about a problem with %EVAL when a macro does not use %EVAL explicitly, check for one of these statements. The macro statements that perform automatic evaluation are:

%DO macro-variable=expression %TO expression <%BY expression>;
%DO %UNTIL(expression);
%DO %WHILE(expression);
%IF expression %THEN action;

Macro Functions

In general, a macro language function processes one or more arguments and produces a result. You can use all macro functions in both macro definitions and open code. Macro functions include character functions, evaluation functions, and quoting functions.

Macro Functions

Function               Description
%BQUOTE, %NRBQUOTE     mask special characters and mnemonic operators in a resolved value at macro execution.
%EVAL                  evaluates arithmetic and logical expressions using integer arithmetic.
%INDEX                 returns the position of the first character of a string.
%LENGTH                returns the length of a string.
%QUOTE, %NRQUOTE       mask special characters and mnemonic operators in a resolved value at macro execution. Unmatched quotation marks ('") and parentheses ( () ) must be marked with a preceding %.
%SCAN, %QSCAN          search for a word specified by its number. %QSCAN masks special characters and mnemonic operators in its result.
%STR, %NRSTR           mask special characters and mnemonic operators in constant text at macro compilation. Unmatched quotation marks ('") and parentheses ( () ) must be marked with a preceding %.
%SUBSTR, %QSUBSTR      produce a substring of a character string. %QSUBSTR masks special characters and mnemonic operators in its result.
%SUPERQ                masks all special characters and mnemonic operators at macro execution but prevents resolution of the value.
%SYSEVALF              evaluates arithmetic and logical expressions using floating point arithmetic.
%SYSFUNC, %QSYSFUNC    execute SAS functions or user-written functions. %QSYSFUNC masks special characters and mnemonic operators in its result.
%SYSGET                returns the value of a specified host environment variable.
%SYSPROD               reports whether a SAS software product is licensed at the site.
%UNQUOTE               unmasks all special characters and mnemonic operators for a value.
%UPCASE, %QUPCASE      convert characters to uppercase. %QUPCASE masks special characters and mnemonic operators in its result.

Character Functions

Character functions change character strings or provide information about them.

Macro Character Functions

Function             Description
%INDEX               returns the position of the first character of a string.
%LENGTH              returns the length of a string.
%SCAN, %QSCAN        search for a word that is specified by a number. %QSCAN masks special characters and mnemonic operators in its result.
%SUBSTR, %QSUBSTR    produce a substring of a character string. %QSUBSTR masks special characters and mnemonic operators in its result.
%UPCASE, %QUPCASE    convert characters to uppercase. %QUPCASE masks special characters and mnemonic operators in its result.

For macro character functions that have a Q form (for example, %SCAN and %QSCAN), the two functions work alike except that the function beginning with Q masks special characters and mnemonic operators in its result. In general, use the function beginning with Q when an argument has been previously masked with a macro quoting function or when you want the result to be masked (for example, when the result may contain an unmatched quotation mark or parenthesis).
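The macro character functions can be tried in open code with %PUT, which writes the result to the log. This is a small sketch; the macro variable s and its value are made up for the illustration.

```sas
%let s=ross sea station;
%put %upcase(&s);        /* ROSS SEA STATION */
%put %scan(&s,2);        /* second word: sea */
%put %substr(&s,1,4);    /* first four characters: ross */
%put %length(&s);        /* 16 */
%put %index(&s,sea);     /* position of the first match: 6 */
```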

Many macro character functions have names corresponding to SAS character functions and perform similar tasks (such as %SUBSTR and SUBSTR). But macro functions operate before the DATA step executes. Consider this DATA step:

data out.%substr(&sysday,1,3);       /* macro function */
   set in.weekly (keep=name code sales);
   length location $4;
   location=substr(code,1,4);        /* SAS function */
run;

Running the program on Monday creates the data set name OUT.MON, as shown:

data out.MON;                        /* macro function */
   set in.weekly (keep=name code sales);
   length location $4;
   location=substr(code,1,4);        /* SAS function */
run;

Suppose that the IN.WEEKLY variable CODE contains the values cary18593 and apex19624. The SAS function SUBSTR operates during DATA step execution and assigns the values cary and apex to the variable LOCATION.

Evaluation Functions

Evaluation functions evaluate arithmetic and logical expressions. They temporarily convert the operands in the argument to numeric values. Then, they perform the operation specified by the operator and convert the result to a character value. The macro processor uses evaluation functions to:

- make character comparisons
- evaluate logical (Boolean) expressions
- assign numeric properties to a token, such as an integer in the argument of a function.

Macro Evaluation Functions

Function     Description
%EVAL        evaluates arithmetic and logical expressions using integer arithmetic
%SYSEVALF    evaluates arithmetic and logical expressions using floating point arithmetic

%EVAL is called automatically by the macro processor to evaluate expressions in the arguments to the statements that perform evaluation, listed under "Statements That Perform Automatic Evaluation", and in the following functions:

%QSCAN(argument,n<,delimiters>)
%QSUBSTR(argument,position<,length>)
%SCAN(argument,n<,delimiters>)
%SUBSTR(argument,position<,length>)
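The difference between the two evaluation functions, and the fact that a plain %LET does not evaluate anything, can be seen with a few %PUT statements. This is a minimal sketch; the macro variable a is hypothetical.

```sas
%put %eval(7/2);        /* integer arithmetic: 3 */
%put %sysevalf(7/2);    /* floating point arithmetic: 3.5 */
%put %eval(3 + 4);      /* 7 */

%let a=3 + 4;           /* macro variables hold text */
%put &a;                /* no evaluation: 3 + 4 */
%put %eval(&a);         /* 7 */
```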

Quoting Functions

Macro quoting functions mask special characters and mnemonic operators so the macro processor interprets them as text instead of elements of the macro language.
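A typical use of a quoting function is storing text that contains semicolons or ampersands in a macro variable. The sketch below assumes the sample data set SASHELP.CLASS is available; the macro variable names step and txt are made up.

```sas
/* Without %STR the first semicolon would end the %LET statement */
%let step=%str(proc print data=sashelp.class; run;);
&step

/* %NRSTR also masks & and %, so &Profit is not treated as a macro variable reference */
%let txt=%nrstr(Sales&Profit);
%put &txt;
```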


Other Functions

Three other macro functions do not fit into the earlier categories, but they provide important information. The table "Other Macro Functions" lists these functions:

Other Macro Functions

Function               Description
%SYSFUNC, %QSYSFUNC    execute SAS language functions or user-written functions within the macro facility.
%SYSGET                returns the value of the specified host environment variable. For details, see the SAS Companion for your operating system.
%SYSPROD               reports whether a SAS software product is licensed at the site.
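As an illustration of these functions, the sketch below uses %SYSFUNC to call the DATA step function TODAY with a format, and %SYSPROD to ask about SAS/GRAPH. The macro variable names today and hasgraph are hypothetical.

```sas
/* Call the TODAY function and format the result with DATE9. */
%let today=%sysfunc(today(),date9.);
%put Run date: &today;

/* Check whether SAS/GRAPH is licensed at the site */
%let hasgraph=%sysprod(graph);
%put SAS/GRAPH licensed: &hasgraph;
```

The optional second argument of %SYSFUNC is a format applied to the value returned by the function.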

…………….

Macro Variables: Introduction

Macro variables are tools that enable you to dynamically modify the text in a SAS program through symbolic substitution. You can assign large or small amounts of text to macro variables, and after that, you can use that text by simply referencing the variable that contains it.

Macro variable values have a maximum length of 32K characters. The length of a macro variable is determined by the text assigned to it instead of an explicit length declaration. So its length varies with each value it contains. Macro variables contain only character data. However, the macro facility has features that allow a variable to be evaluated as a number when it contains a value that can be interpreted as a number. The value of a macro variable remains constant until it is explicitly changed. Macro variables are independent of SAS data set variables.

Macro variables defined by macro programmers are called user-defined macro variables. Those defined by the SAS System are called automatic macro variables. You can define and use macro variables anywhere in SAS programs, except within data lines.

When a macro variable is defined, the macro processor adds it to one of the program's macro variable symbol tables. When a macro variable is defined in a statement that is outside a macro definition (called open code) or when the variable is created automatically by the SAS System (except SYSPBUFF), the variable is held in the global symbol table, which SAS creates at the beginning of a SAS session. When a macro variable is defined within a macro and is not explicitly defined as global, the variable is typically held in the macro's local symbol table, which SAS creates when the macro starts executing.

When it is in the global symbol table, a macro variable exists for the remainder of the current SAS session. A variable in the global symbol table is called a global macro variable. It has global scope because its value is available to any part of the SAS session.

When it is in a local symbol table, a macro variable exists only during execution of the macro in which it is defined. A variable in a local symbol table is called a local macro variable. It has local scope because its value is available only until the macro stops executing. Chapter 2 contains figures that illustrate a program with a global and a local symbol table.
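The two scopes can be illustrated with a short program. This is only a sketch: the macro name show and the variables station and depth are made up for the example.

```sas
%let station=12;            /* open code: STATION goes in the global symbol table */

%macro show;
   %local depth;            /* DEPTH exists only while SHOW executes */
   %let depth=100;
   %put Inside SHOW: station=&station depth=&depth;
%mend show;

%show
%put After SHOW: station=&station;
/* A reference to &depth at this point would not resolve,
   because the local symbol table of SHOW no longer exists */
```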

Using Macro Variables

After a macro variable is created, you typically use the variable by referencing it with an ampersand preceding its name (&variable-name), which is called a macro variable reference. These references perform symbolic substitutions when they resolve to their value. You can use these references anywhere in a SAS program. To resolve a macro variable reference that occurs within a literal string, enclose the string in double quotation marks. Macro variable references that are enclosed in single quotation marks are not resolved. Compare the following statements that assign a value to macro variable DSN and use it in a TITLE statement:

%let dsn=Newdata;
title1 "Contents of Data Set &dsn";
title2 'Contents of Data Set &dsn';

In the first TITLE statement, the macro processor resolves the reference by replacing &DSN with the value of macro variable DSN. In the second TITLE statement, the value for DSN does not replace &DSN. The SAS System sees the following statements:

TITLE1 "Contents of Data Set Newdata";
TITLE2 'Contents of Data Set &dsn';

You can refer to a macro variable as many times as you need to in a SAS program. The value remains constant until you change it. For example, this program refers to macro variable DSN twice:

%let dsn=Newdata;
data temp;
   set &dsn;
   if age>=20;
run;
proc print;
   title "Subset of Data Set &dsn";
run;

Each time the reference &DSN appears, the macro processor replaces it with Newdata. Thus, the SAS System sees these statements:

DATA TEMP;
   SET NEWDATA;
   IF AGE>=20;
RUN;
PROC PRINT;
   TITLE "Subset of Data Set Newdata";
RUN;

Note: If you reference a macro variable that does not exist, a warning message is printed in the SAS log. For example, if macro variable JERRY is misspelled as JERY, the following produces an unexpected result:

%let jerry=student;
data temp;
   x="produced by &jery";
run;

This produces the following message:

WARNING: Apparent symbolic reference JERY not resolved.

Combining Macro Variable References with Text

It is often useful to place a macro variable reference next to leading or trailing text (for example, DATA=PERSNL&YR.EMPLOYES, where &YR contains two characters for a year), or to reference adjacent variables (for example, &MONTH&YR). This allows you to reuse the same text in several places or to reuse a program because you can change values for each use. To reuse the same text in several places, you can write a program with macro variable references representing the common elements. You can change all the locations with a single %LET statement, as shown:

%let name=sales;
data new&name;
   set save.&name;
   more SAS statements
   if units>100;
run;

After macro variable resolution, the SAS System sees these statements:

DATA NEWSALES;
   SET SAVE.SALES;
   more SAS statements
   IF UNITS>100;
RUN;


Notice that macro variable references do not require the concatenation operator as the DATA step does. The SAS System forms the resulting words automatically.

Delimiting Macro Variable Names Within Text

Sometimes when you use a macro variable reference as a prefix, the reference does not resolve as you expect if you simply concatenate it. Instead, you may need to delimit the reference by adding a period to the end of it.

A period immediately following a macro variable reference acts as a delimiter; that is, a period at the end of a reference forces the macro processor to recognize the end of the reference. The period does not appear in the resulting text.

Continuing with the example above, suppose that you need another DATA step that uses the names SALES1, SALES2, and INSALES.TEMP. You might add the following step to the program:

/* first attempt to add suffixes--incorrect */
data &name1 &name2;
   set in&name.temp;
run;

After macro variable resolution, the SAS System sees these statements:

DATA &NAME1 &NAME2;
   SET INSALESTEMP;
RUN;

None of the macro variable references have resolved as you intended. The macro processor issues warning messages, and the SAS System issues syntax error messages. Why?

Because NAME1 and NAME2 are valid SAS names, the macro processor searches for those macro variables rather than for NAME, and the references pass into the DATA statement without resolution.

In a macro variable reference, the word scanner recognizes that a macro variable name has ended when it encounters a character that is not allowed in a SAS name. However, you can use a period ( . ) as a delimiter for a macro variable reference. For example, to cause the macro processor to recognize the end of the word NAME in this example, use a period as a delimiter between &NAME and the suffix:

/* correct version */
data &name.1 &name.2;

The SAS System now sees this statement:

DATA SALES1 SALES2;

Creating a Period to Follow Resolved Text

Sometimes you need a period to follow the text resolved by the macro processor. For example, a two-level data set name needs to include a period between the libref and data set name. When the character following a macro variable reference is a period, use two periods. The first is the delimiter for the macro reference, and the second is part of the text. For example,

set in&name..temp;

After macro variable resolution, the SAS System sees this statement:

SET INSALES.TEMP;

You can end any macro variable reference with a delimiter, but the delimiter is necessary only if the characters that follow can be part of a SAS name. For example, both of these TITLE statements are correct:

title "&name.--a report";
title "&name--a report";

They produce:

TITLE "sales--a report";
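The rules about one and two periods can be checked quickly with %PUT. This is a small sketch; the macro variables lib and n are hypothetical, with values chosen to match the Ross Sea example of section M1.

```sas
%let lib=a;
%let n=3;
%put &lib..ant&n;     /* two periods: a.ant3 */
%put &lib.ant&n;      /* one period is only a delimiter: aant3 */
%put m00&n..asc;      /* m003.asc */
```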


Forcing a Macro Variable to Be Local

At times you need to ensure that the macro processor creates a local macro variable rather than changing the value of an existing macro variable. In this case, use the %LOCAL statement to create the macro variable.

Explicitly make all macro variables created within macros local when you do not need their values after the macro stops executing. Debugging large macro programs is easier if you minimize the possibility of inadvertently changing a macro variable's value. Also, local macro variables do not exist after their defining macro finishes executing, while global variables exist for the duration of the SAS session; therefore, local variables use less overall storage.

Suppose you want to use the macro NAMELST to create a list of names for a VAR statement, as shown here:

%macro namelst(name,number);
   %do n=1 %to &number;
      &name&n
   %end;
%mend namelst;

You invoke NAMELST in this program:

%let n=North State Industries;
proc print;
   var %namelst(dept,5);
   title "Quarterly Report for &n";
run;

After macro execution, the SAS compiler sees the following statements:

proc print;
   var dept1 dept2 dept3 dept4 dept5;
   title "Quarterly Report for 6";
run;

The macro processor changes the value of the global variable N each time it executes the iterative %DO loop. (After the loop stops executing, the value of N is 6, as described in "%DO" in Chapter 13, "Macro Language Dictionary.") To prevent conflicts, use a %LOCAL statement to create a local variable N, as shown here:

%macro namels2(name,number);
   %local n;
   %do n=1 %to &number;
      &name&n
   %end;
%mend namels2;

Now execute the same program:

%let n=North State Industries;
proc print;
   var %namels2(dept,5);
   title "Quarterly Report for &n";
run;

The macro processor generates the following statements:

proc print;
   var dept1 dept2 dept3 dept4 dept5;
   title "Quarterly Report for North State Industries";
run;

Global and Local Variables with the Same Name shows the symbol tables before NAMELS2 executes, while NAMELS2 is executing, and when the macro processor encounters the reference &N in the TITLE statement.


Creating Global Macro Variables

The %GLOBAL statement creates a global macro variable if a variable with the same name does not already exist in the global symbol table, regardless of what scope is current. For example, in the macro NAME4, the macro CONDITN contains a %GLOBAL statement that creates the macro variable COND as a global variable:

%macro conditn;
   %global cond;
   %let old=sales;
   %let cond=cases>0;
%mend conditn;

Here is the rest of the program:

%let new=inventry;
%macro name4;
   %let new=report;
   %let old=warehse;
   %conditn
   data &new;
      set &old;
      if &cond;
   run;
%mend name4;
%name4

Invoking NAME4 generates these statements:

data report;
   set sales;
   if cases>0;
run;

Suppose you want to put the SAS DATA step statements outside NAME4. In this case, all the macro variables must be global for the macro processor to resolve the references. You cannot add OLD to the %GLOBAL statement in CONDITN because the %LET statement in NAME4 has already created OLD as a local variable to NAME4 by the time CONDITN begins to execute. (You cannot use the %GLOBAL statement to make an existing local variable global.)

Thus, to make OLD global, use the %GLOBAL statement before the variable reference appears anywhere else, as shown here in the macro NAME5:

%let new=inventry;
%macro conditn;
   %global cond;
   %let old=sales;
   %let cond=cases>0;
%mend conditn;
%macro name5;
   %global old;
   %let new=report;
   %let old=warehse;
   %conditn
%mend name5;
%name5
data &new;
   set &old;
   if &cond;
run;

Now the %LET statement in NAME5 changes the value of the existing global variable OLD rather than creating OLD as a local variable. The SAS compiler sees the following statements:

data report;
   set sales;
   if cases>0;
run;


N. HOW TO WORK WITH MATRICES IN SAS

N1. THE SAS/IML MODULE

Matrix computation in SAS is carried out through a procedure that is entered with the command:

proc iml;

and exited with the command:

quit;

Statements can be executed interactively or placed in programs. The procedure has its own programming language, which handles arithmetic and character expressions, data input and output, and control of execution (if, do, goto, ...). The fundamental data elements are matrices. Expressions use operators that apply to whole matrices. The procedure includes a very large vocabulary of operators, functions and routines. There is no need to declare dimensions, storage, attributes, ...
It has just one drawback: moving from the data structures of the SAS language (SAS data sets) to the corresponding matrix structures, and back, is not immediate.

N2. ASSIGNING VALUES DIRECTLY TO A MATRIX (OR A VECTOR)

Examples:

To build a matrix X with 3 rows and 2 columns,

X = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix},

use the statement:

x = {1 2, 3 4, 5 6};

The values are written inside braces {.} and are assigned row by row, separated by blanks; rows are separated by commas. Similarly, to build a vector v with 3 elements, v = (1 2 3), use the statement:

v = {1 2 3};

The values are written inside braces, separated by blanks. A vector defined in this way is a row vector. Statements for repeating values are also available.

N3. FROM SAS DATA SET TO MATRIX AND BACK

The most common way to assign values to a matrix is to use the values contained in a SAS data set.

BUILDING A MATRIX FROM A DATA SET

To build a matrix named A whose columns are the variables and whose rows are all the observations of a SAS data set named PIPPO, use the statements:

USE pippo;
READ ALL INTO a;

The USE statement opens the data set (if it is omitted, the current data set is used) and the READ statement reads from the data set and builds the matrix. Variables and/or observations of the data set can be selected. The (non-exhaustive) syntax of the USE and READ statements is as follows:

USE DataSet <VAR operand> <WHERE (expression)>;

The VAR operand selects some of the variables of the data set; WHERE selects observations.

Example:

USE pippo VAR {nome indirizzo} WHERE (prov = 'GE');

only the variables "nome" and "indirizzo" of the residents of the province of Genova are considered.

READ <range> <VAR operand> <POINT operand> <WHERE (expression)> INTO matrix-name;

The VAR operand and WHERE are used as above (the variables must be all numeric or all character; numeric is the default). The range further specifies the observations to consider.

<range> =
   ALL        all observations
   CURRENT    the current observation (default)
   NEXT       the observation after the current one
   AFTER      all observations after the current one

<POINT operand>
Examples of the POINT operand:
   POINT 10                          observation 10
   POINT {10 25}                     observations 10 and 25
   POINT (20 : 25)                   observations 20 to 25
   POINT ((20 : 25) || (30 : 35))    observations 20 to 25 and 30 to 35

Examples:
   READ ALL VAR {X Y} INTO MAT;      all observations of the variables X and Y
   READ POINT 23 INTO MAT;           all numeric variables of observation 23

BUILDING A DATA SET FROM A MATRIX

Example: To build a SAS data set named PLUTO from a matrix named A, with the columns of A as variables and the rows of A as observations, use the statements:

CREATE pluto FROM a;
APPEND FROM a;

The CREATE statement creates the data set; in this simple form the variables are named COL1, COL2, ..., COLn. The APPEND statement adds data at the end of the data set.

N4. MATRIX COMPUTATIONS (main features)

In what follows we denote by:
- A, B, C, ... matrices
- v vectors
- s scalars

ARITHMETIC OPERATIONS

Element-by-element operations
   M = - A       changes the sign
   M = A + B     matrix sum
   M = A + s     adds the value s to every element of A
   M = A - B     subtracts
   M = A - s     subtracts the value s from every element of A
   M = A / B     divides
   M = A / s     divides every element of A by the value s

   M = A # B     multiplies (Schur or Hadamard product)
   M = A # v     multiplies the elements in each row of A by the corresponding element of v; v must have as many elements as A has rows
   M = A # s     multiplies every element of A by the value s
   M = A ## B    the elements of A are raised to the power of the corresponding element of B (if a value of A is < 0, the corresponding element of B must be an integer)
   M = A <> B    takes the maximum
   M = A >< B    takes the minimum

Matrix operations
   M = A * B        row-by-column product
   M = A ** s       equivalent to M = A * A * ... * A (s times) (A must be square)
   M = A ** (-1)    equivalent to M = INV(A), the inverse matrix
   M = A @ B        Kronecker product or direct product; if A is n x m and B is h x k, then M is nh x mk
   s = TRACE(A)     sums the diagonal elements

COMPARISON OPERATIONS (they act element by element)
   M = A < B    M = A > B    M = A = B    M = A <= B    M = A >= B    M = A ^= B
   M is a matrix of 0s and 1s

LOGICAL OPERATIONS (they act element by element)
   M = A & B    an element of M is 1 if the corresponding elements of A and B are both ≠ 0
   M = A | B    an element of M is 1 if either of the corresponding elements of A and B is ≠ 0
   M = ^A       an element of M is 1 if the corresponding element of A is = 0

INQUIRY FUNCTIONS
These check whether all or some elements of a matrix are different from 0 and return the number of elements equal to 0, the number of rows and columns, ...

REDUCTION FUNCTIONS
These compute the maximum, the minimum, the sum and the sum of squares of the elements of a matrix.

MANIPULATION AND RESHAPING FUNCTIONS AND OPERATIONS
   M = A` or M = T(A)    transpose (in SAS/IML code the transpose operator is the backquote)
   M = DIAG(A)           creates a diagonal matrix from the diagonal elements of A
   M = DIAG(v)           creates a diagonal matrix from the elements of v
   v = VECDIAG(A)        creates a vector from the diagonal of a matrix
   M = I(s)              creates the s x s identity matrix

Other functions insert rows or columns, remove elements, create block-diagonal matrices, matrices with repeated values and submatrices, concatenate matrices horizontally and vertically, ...
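Some of the operators listed in N4, together with the CREATE/APPEND and USE/READ statements of section N3, can be tried in one short PROC IML session. This is only a sketch: the matrix names are made up, the data set name PLUTO is the one used in the text, and the CLOSE statements are added here to release the data set between writing and reading.

```sas
proc iml;
   a = {1 2, 3 4};
   b = {5 6, 7 8};

   h  = a # b;      /* element-by-element product: {5 12, 21 32} */
   p  = a * b;      /* row-by-column product:      {19 22, 43 50} */
   mx = a <> b;     /* element-by-element maximum, here equal to b */
   tr = trace(a);   /* sum of the diagonal elements: 1 + 4 = 5 */
   at = t(a);       /* transpose */
   print h p mx tr at;

   /* Matrix -> data set -> matrix, as in section N3 */
   create pluto from p;     /* variables are named COL1, COL2 */
   append from p;
   close pluto;

   use pluto;
   read all into back;      /* BACK should equal P */
   close pluto;
   print back;
quit;
```

Note the difference between # and *: the first works element by element, the second is the usual matrix product.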

LINEAR ALGEBRA FUNCTIONS AND CALLS
   s = DET(A)           determinant
   M = INV(A)           inverse
   CALL EIGEN(v,M,A)    creates a vector v with the eigenvalues (in decreasing order) and a matrix M whose columns are the corresponding eigenvectors of a square matrix A
   v = EIGVAL(A)        creates a vector v with the eigenvalues of A
   M = EIGVEC(A)        creates a matrix M whose columns are the eigenvectors of A
   CALL SVD(U,q,V,A)    decomposes the matrix A, of dimensions m x n with n ≤ m:
                           A = U * diag(q) * V'
                        where U'U = V'V = VV' = I_n and
                           U, m x n matrix, normalized eigenvectors of AA'
                           q, vector of length n, singular values (square roots of the eigenvalues of AA' and A'A)
                           V, n x n matrix, eigenvectors of A'A
   U = ROOT(A)          A must be symmetric and non-negative definite; creates an upper triangular matrix U such that U'U = A
   x = SOLVE(A,B)       A must be square and nonsingular; solves the system of linear equations A x = B; do not use x = INV(A)*B
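As a small worked case for SOLVE and CALL EIGEN, the sketch below uses two 2 x 2 matrices chosen so that the results can be checked by hand; all names and values are assumptions made for the illustration.

```sas
proc iml;
   a = {2 1, 1 3};
   b = {3, 5};                /* column vector */
   x = solve(a, b);           /* solves a*x = b: x = {0.8, 1.4} */
   check = a * x;             /* should reproduce b */
   print x check;

   s = {2 1, 1 2};            /* symmetric matrix */
   call eigen(val, vec, s);   /* eigenvalues 3 and 1, in decreasing order */
   print val vec;
quit;
```

The system is 2x + y = 3 and x + 3y = 5, so the solution can be verified directly: 2(0.8) + 1.4 = 3 and 0.8 + 3(1.4) = 5.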
