Guide to Using the SAS Software (Vietnamese-Language Manual)


Assoc. Prof. Dr. Lê Quang Hưng

APPLYING SAS TO THE ANALYSIS OF EXPERIMENTAL DATA

2009

Preface

SAS (Statistical Analysis System) applies a programming language to data analysis. SAS/STAT alone comprises more than 60 procedures for analysis of variance, regression, combined analysis, and multivariate analysis.

A SAS program for statistical processing, written in Word, is short (about 9 lines and 24 words). It is prepared in advance, and the data are transferred directly from Word or Excel files, the most common formats for storing statistical data. Data stored in text files or SAS files can also be analyzed. Arranging the Excel data table by columns or by rows, coding treatments by number or by crop variety or method name, and processing several response variables are all convenient in the Word sample file. Once the program and the data have been assembled into the sample file, the RUN command processes it in only a few seconds and performs all analyses in a single pass: analysis of variance, grouping of the treatments of each factor, computation of the factor interaction matrix, and plotting. The output makes the treatment comparisons clear and groups the treatments by letter: A, B for a factor with two treatments and A, B, C, D, E for a factor with more treatments. The probability values for single factors and for their combinations all appear in the ANOVA table.

This book presents a number of methods commonly used to process experimental data in agro-biology, particularly crop science. It is built on sample exercises covering ANOVA, correlation, and regression for the most common experiments. Sample statistical exercises from other fields, such as medicine, chemistry, social science, and mechanics, can be found in the SAS software itself (Help > Using this windows > Sample SAS Programs and Applications). SAS can also process data through menu commands, starting from the toolbar: Solutions > Analysis > Analyst > Open (an Excel or SAS file) > Statistics > ANOVA.

Comments that would make the book easier to use are very welcome. Please send them to: Assoc. Prof. Dr. Lê Quang Hưng, Faculty of Agronomy, Nong Lam University, Ho Chi Minh City. E-mail: [email protected]

Respectfully,
The author

Updated: 29-7-09, 86 pp.

Contents

Chapter 1. METHODS FOR ANALYSIS OF VARIANCE (ANOVA), TREATMENT GROUPING, AND INTERACTION COMPARISON
1.1. Objectives
1.2. Source of the experimental data


1.3. Creating the Word sample file
1.4. Processing the data with SAS
1.5. Interpreting the results
1.6. Presenting the results
1.7. Creating a sample file for a two-factor experiment
1.8. Meaning of terms and value transformation
1.9. Plot size and replications
Chapter 2. COMPLETELY RANDOMIZED DESIGN (CRD)
2.1. One-factor completely randomized experiment
2.2. Two-factor completely randomized experiment
Chapter 3. RANDOMIZED COMPLETE BLOCK DESIGN (RCBD)
3.1. One-factor randomized complete block
3.2. Latin square
3.3. Two-factor randomized complete block
3.4. Split-plot experiment
3.5. Strip-plot experiment
3.6. Three-factor experiment
3.7. Common SAS code for analysis of variance (ANOVA)
Chapter 4. MEANS, T-TEST, CHI-SQUARE, CORRELATION, AND REGRESSION
4.1. Computing means
4.2. T-test
4.3. Chi-square
4.4. Correlation matrix
4.5. Simple linear regression
4.6. Multiple linear regression
4.7. Second-order (quadratic) multiple regression
4.8. Optimization and locating the optimum point
4.9. Three-dimensional mesh (surface) plot
References

Chapter 1


METHODS FOR ANALYSIS OF VARIANCE (ANOVA), TREATMENT GROUPING, AND INTERACTION COMPARISON

1.1. Objectives

The aim of ANOVA (ANalysis Of VAriance) is to determine whether the treatments differ significantly, that is, whether the probability attached to the computed F value (Pr > F) falls below p < 0.05 or p < 0.01, the levels commonly used in agriculture and biology. The treatments are then grouped (grouping, SAS, 2004; homogeneous grouping, NRCS, 2007) with the letters A, B for two treatments and A, B, C, D, E for more treatments, in order to compare the differences and select the treatment best suited to the experiment. For multi-factor experiments, the interaction of the factors must also be compared. The sample exercises are built from Excel and Word files, which are easy to use, and the data are saved as .doc, .xls, or .sas files.

1.2. Source of the experimental data

The data are collected, processed, and saved in an Excel file according to the experimental design. For example, an experiment compares the yield (kg/20 m2) of five cai ngot varieties, G22, Z15, X31, K14, and D25. The treatments can be recorded either by number (1, 2, 3, 4, 5) or by variety name. The experiment is laid out as a randomized complete block design (RCBD) with four blocks (I, II, III, IV). In the Excel file the five treatments are recorded by variety name, with the block in the first column and the treatment in the second.

Experimental layout

Direction of variation: upper end of the slope (high)

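Each entry gives the treatment number followed by the yield (kg/20 m2).

Block I     1: 9.00     3: 7.00     2: 10.28    5: 14.94    4: 11.86
Block II    2: 14.59    1: 8.00     5: 14.63    4: 11.99    3: 6.00
Block III   3: 8.23     4: 11.77    2: 15.15    1: 7.00     5: 13.81
Block IV    5: 14.90    1: 9.12     3: 7.40     2: 15.00    4: 8.00

Lower end of the slope (low)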

How the data are recorded in the Excel file:

khoi   nthuc   nsuat
1      G22      9.00
1      Z15     10.28
1      X31      7.00
1      K14     11.86
1      D25     14.94
2      G22      8.00
2      Z15     14.59
2      X31      6.00
2      K14     11.99
2      D25     14.63
3      G22      7.00
3      Z15     15.15
3      X31      8.23
3      K14     11.77
3      D25     13.81
4      G22      9.12
4      Z15     15.00
4      X31      7.40
4      K14      8.00
4      D25     14.90

To analyze the results:
- Create the Word sample file.
- Process it with the SAS statistical program.
- Record the ANOVA table. If the treatments differ at p < 0.05 or p < 0.01, choose the grouping at the corresponding level and attach the grouping letters to the treatment means.
- If p > 0.05, the treatments do not differ (ns, non-significant).
- Record the LSD (least significant difference), the probability p, and CV%.

1.3. Creating the Word sample file

The sample file is a general-purpose file processed by SAS with the ANOVA and grouping commands. One Word sample file can be reused for many data files and can process several response variables in a single SAS run. An Excel file can also be used to build it. The sample file has three parts: (1) the statements that declare the variables, (2) the data, pasted from Excel (or typed directly, or taken from other files), and (3) the statements that run the ANOVA and the grouping.

The example is a one-factor randomized complete block experiment that records the yield (kg/20 m2) of five cai ngot varieties grown in four blocks, for a total of 4 x 5 = 20 plots. The statements are as follows:

- DATA: names the data set, for example DATA; or DATA CAI_NGOT;. A name of several words must be joined with underscores, because DATA CAI NGOT; creates two data sets, CAI and NGOT.
- INPUT: names the input variables. Each name is a single character or a single word of at most eight characters; a name of several words needs underscores. For values longer than eight characters, set the length, for example length $10 (ten characters). If the data are laid out along a row with the variables repeating one after another, write:
    INPUT T Y @@;
    Datalines;   (in place of cards;)
  * Option 1: K (block), T (treatment), Y (yield), separated by a space; add $ after a variable whose values are text, such as variety names:
    INPUT K T Y;   or   INPUT K $ T $ Y;
  * Option 2: write one full word per variable:
    INPUT KHOI NTHUC NSUAT;
- CARDS; starts the data lines; the data end with a semicolon (;).
- Numbers pasted from Excel must use the English (US) decimal point: 0.5, not 0,5. The program cannot read the Vietnamese decimal comma.
- PROC: PROCEDURE, the analysis method, such as ANOVA, GLM, REG, or RSREG (regression), for example PROC ANOVA;. PROC GLM; is used when ANOVA is combined with a comparison of factor interactions.
- CLASS: lists the classification variables used in the analysis, here block (K) and treatment (T):
    CLASS K T;
- MODEL: the analysis model, yield (Y) = block (K) and treatment (T):
    MODEL Y = K T;
- MEANS: lists the treatment (T) means:
    MEANS T;
- LSD ALPHA = 0.01: groups the treatment means at alpha = 0.01. DUNCAN can be chosen instead when there are more than five treatment means. Alpha is set to 0.05 or 0.01. Writing LSD alone groups the means at the default level, p = 0.05. To obtain both levels, write both statements:
    MEANS T / LSD ALPHA = 0.05;
    MEANS T / LSD ALPHA = 0.01;
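The three parts of the sample file described above (variable declaration, data, analysis statements) can be assembled as in the following minimal sketch. The data are the 20 observations of the cai ngot experiment; the data set name CAI_NGOT and the explicit data= option are assumptions added for clarity, and the title is the one that appears in the surviving output.

```sas
/* Sketch of a complete sample file for the one-factor RCBD example.   */
/* Assumption: the data set is named CAI_NGOT; K, T, Y follow the text. */
data CAI_NGOT;
  input K T $ Y;   /* K = block, T = variety (character), Y = yield */
  datalines;
1 G22 9.00
1 Z15 10.28
1 X31 7.00
1 K14 11.86
1 D25 14.94
2 G22 8.00
2 Z15 14.59
2 X31 6.00
2 K14 11.99
2 D25 14.63
3 G22 7.00
3 Z15 15.15
3 X31 8.23
3 K14 11.77
3 D25 13.81
4 G22 9.12
4 Z15 15.00
4 X31 7.40
4 K14 8.00
4 D25 14.90
;
run;

proc anova data=CAI_NGOT;
  class K T;                    /* block and treatment are class variables */
  model Y = K T;                /* yield = block + treatment               */
  means T / lsd alpha=0.05;     /* grouping at the 5% level                */
  means T / lsd alpha=0.01;     /* grouping at the 1% level                */
  title 'NANG SUAT THUC THU';
run;
quit;
```

The decimal values use points rather than commas, as the text requires, and the closing semicolon on its own line ends the data.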

SAS processes both statements; review the treatment means in the grouping tables and choose the significance level p ...

R-Square    Coeff Var    Root MSE      Y Mean
0.838208     15.16212    1.657750    10.93350

Source    DF       Anova SS    Mean Square   F Value   Pr > F
K          3      0.9092550      0.3030850      0.11   0.9524
T          4    169.9401800     42.4850450     15.46   0.0001

NANG SUAT THUC THU
The ANOVA Procedure
t Tests (LSD) for Y

NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                              0.01
Error Degrees of Freedom             12
Error Mean Square              2.748135
Critical Value of t             3.05454
Least Significant Difference     3.5806

Means with the same letter are not significantly different.

t Grouping      Mean    N    T
      A       14.570    4    D25
      A
 B    A       13.755    4    Z15
 B
 B    C       10.905    4    K14
      C
 D    C        8.280    4    G22
 D
 D             7.158    4    X31
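As a worked check of the LSD reported above, the following sketch assumes the usual formula for equal replication, with r = 4 observations per treatment mean (the N column of the grouping table):

```latex
\[
\mathrm{LSD}
  = t_{\alpha/2,\;df_E}\,\sqrt{\frac{2\,MS_E}{r}}
  = 3.05454 \times \sqrt{\frac{2 \times 2.748135}{4}}
  \approx 3.05454 \times 1.17221
  \approx 3.5806
\]
```

Two means share a letter when they differ by less than this value. For example, D25 and K14 differ by 14.570 - 10.905 = 3.665, which exceeds 3.5806, so they share no letter, while Z15 and K14 differ by 2.850 and share the letter B.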

1.5. Interpreting the results

See the ANOVA table:

The ANOVA Procedure
Dependent Variable: Y

Source             DF    Sum of Squares    Mean Square   F Value   Pr > F
Model               7       170.8494350     24.4070621      8.88   0.0006
Error              12        32.9776200      2.7481350
Corrected Total    19       203.8270550

R-Square    Coeff Var    Root MSE      Y Mean
0.838208     15.16212    1.657750    10.93350

Source    DF       Anova SS    Mean Square   F Value   Pr > F
K          3      0.9092550      0.3030850      0.11   0.9524
T          4    169.9401800     42.4850450     15.46   0.0001
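The summary statistics in this table can be reproduced from its sums of squares. The following is a short worked check, assuming the usual definitions of R-Square, Root MSE, Coeff Var (the CV% to be reported), and the F value:

```latex
\[
\begin{aligned}
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
     = \frac{170.8494350}{203.8270550} \approx 0.8382,\\[4pt]
\text{Root MSE} &= \sqrt{MS_E} = \sqrt{2.7481350} \approx 1.65775,\\[4pt]
CV\% &= 100 \times \frac{\text{Root MSE}}{\bar{Y}}
      = 100 \times \frac{1.657750}{10.93350} \approx 15.16,\\[4pt]
F_{\text{Model}} &= \frac{MS_{\text{Model}}}{MS_E}
      = \frac{24.4070621}{2.7481350} \approx 8.88 .
\end{aligned}
\]
```

The same ratio gives the F value of each source, for example 42.4850450 / 2.7481350 = 15.46 for the treatment T.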

- Treatment T has an F value of 15.46, with Pr > F of ...

Source             DF    Sum of Squares    Mean Square   F Value   Pr > F
Model               7       684.6666667     97.8095238     14.53   0.0002
Error              10        67.3333333      6.7333333
Corrected Total    17       752.0000000

R-Square    Coeff Var    Root MSE      Y Mean
0.910461     4.717940    2.594867    55.00000

Source    DF      Type I SS    Mean Square   F Value
K          2    417.3333333    208.6666667     30.99
S          2     21.3333333     10.6666667
P          1     98.0000000     98.0000000
S*P        2    148.0000000     74.0000000

S    P        LSMEAN    Pr > |t|
1    1    59.3333333
1    2    48.0000000     0.0013
2    1    57.6666667     0.8899
2    2    52.3333333     0.0301
3    1    55.0000000     0.2208
3    2    57.6666667     0.8899
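A Dunnett-adjusted comparison of the S*P least-squares means, like the one in the table above, could be requested along the following lines. This is only a sketch: the data set name TWO_FACTOR is hypothetical, and the variables K (block), S, P, and Y are assumed to match the names shown in the output.

```sas
/* Sketch only: TWO_FACTOR is a hypothetical data set holding        */
/* block K, factors S and P, and the response Y for 18 observations. */
proc glm data=TWO_FACTOR;
  class K S P;
  model Y = K S P S*P;
  /* Compare each S*P combination with the control (by default the  */
  /* first level, here S=1 P=1), with Dunnett's adjustment.          */
  lsmeans S*P / pdiff=control adjust=dunnett;
  title '2 YEU TO';
run;
quit;
```

With pdiff=control, every p-value refers to the difference between that combination and the control combination, so the control row itself has no p-value.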

Interpretation: the p-values compare the interactions by the Dunnett test (Adjustment for Multiple Comparisons: Dunnett). When p < 0.05, the least-squares means have distinct, independent effects; when p > 0.05, their effects are the same. The Dunnett comparison shows that the interactions S1P1, S2P1, S3P1, and S3P2 have the same effect on yield (p from 0.2208 to 0.8899). The interactions with an independent effect are S1P2 (p = 0.0013) and S2P2 (p = 0.0301).

2 YEU TO
The GLM Procedure
Class Level Information

Class    Levels    Values
K             3    1 2 3
SP            6    S1P1 S1P2 S2P1 S2P2 S3P1 S3P2

Number of observations    18

2 YEU TO
The GLM Procedure
Dependent Variable: Y

Source             DF    Sum of Squares    Mean Square   F Value   Pr > F
Model               7       684.6666667     97.8095238     14.53   0.0002
Error              10        67.3333333      6.7333333
Corrected Total    17       752.0000000

R-Square    Coeff Var    Root MSE      Y Mean
0.910461     4.717940    2.594867    55.00000

Source    DF      Type I SS    Mean Square   F Value
K          2    417.3333333    208.6666667     30.99
SP         5    267.3333333     53.4666667

Source    DF    Type III SS
K          2    417.3333333
SP         5    267.3333333

2 YEU TO KHONG CAN DOI
The GLM Procedure

Source             DF    Sum of Squares    Mean Square   F Value   Pr > F
Model                                                      15.29   0.0253
Error               3                                2
Corrected Total     6       97.71428571

Coeff Var    Root MSE      Y Mean
9.801480     1.414214    14.42857

Source    DF      Type I SS    Mean Square   F Value   Pr > F
A          1    80.04761905    80.04761905     40.02   0.0080
B          1    11.26666667    11.26666667      5.63   0.0982
A*B        1     0.40000000     0.40000000      0.20   0.6850

Source    DF    Type III SS    Mean Square   F Value   Pr > F
A          1    67.60000000    67.60000000     33.80   0.0101
B          1    10.00000000    10.00000000      5.00   0.1114
A*B        1     0.40000000     0.40000000      0.20   0.6850

2 YEU TO KHONG CAN DOI
The GLM Procedure
t Tests (LSD) for Y

NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                              0.05
Error Degrees of Freedom              3
Error Mean Square                     2
Critical Value of t             3.18245
Least Significant Difference     3.4374
Harmonic Mean of Cell Sizes    3.428571

NOTE: Cell sizes are not equal.

Means with the same letter are not significantly different.

t Grouping      Mean    N    A
         A    18.333    3    A2
         B    11.500    4    A1
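Because the two levels of A have 3 and 4 observations, the LSD above is based on the harmonic mean of the cell sizes rather than on a common replication number. A short worked check, assuming the standard LSD formula with the harmonic mean in place of r:

```latex
\[
n_h = \frac{2}{\dfrac{1}{3} + \dfrac{1}{4}} = \frac{24}{7} \approx 3.428571,
\qquad
\mathrm{LSD} = t\,\sqrt{\frac{2\,MS_E}{n_h}}
  = 3.18245 \times \sqrt{\frac{2 \times 2}{3.428571}}
  \approx 3.18245 \times 1.080123
  \approx 3.4374
\]
```

The two means differ by 18.333 - 11.500 = 6.833, which exceeds 3.4374, so A2 and A1 receive different letters.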

Interpretation:
- The total degrees of freedom of the experiment are n - 1 = 7 - 1 = 6. The F test for the whole experiment is 15.29 with probability p = 0.0253, which shows that the 4 treatment means differ.
- In balanced experiments the Type I SS and Type III SS (SS = Sum of Squares) estimates are usually equal, but in this unbalanced experiment the Type III SS is the appropriate choice.
- The comparison at alpha = 0.05 shows no A*B interaction (p = 0.6850), so the effect of factor A does not depend on factor B, and vice versa. Each factor is therefore tested separately: factor B shows no difference (p = 0.1114), while factor A differs (p = 0.0101) at p < 0.05.
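The overall F value of 15.29 can be recovered from the quantities that survive in the output: the corrected total sum of squares (97.71428571), the error degrees of freedom (3), and the error mean square (2). A worked sketch, assuming the model has 6 - 3 = 3 degrees of freedom (one each for A, B, and A*B):

```latex
\[
\begin{aligned}
SS_E &= df_E \times MS_E = 3 \times 2 = 6,\\
SS_{\text{Model}} &= SS_{\text{Total}} - SS_E = 97.71428571 - 6 = 91.71428571,\\
MS_{\text{Model}} &= \frac{91.71428571}{3} \approx 30.5714,\\
F &= \frac{MS_{\text{Model}}}{MS_E} = \frac{30.5714}{2} \approx 15.29 .
\end{aligned}
\]
```

The model sum of squares also equals the sum of the Type I SS, 80.04761905 + 11.26666667 + 0.40000000, up to rounding. The Type III SS add up to only 78.0, which reflects the imbalance of the design.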

Chapter 3
ANALYSIS OF VARIANCE FOR THE RANDOMIZED COMPLETE BLOCK DESIGN (RCBD)

3.1. One-factor randomized complete block

This is the most common layout in agricultural research. It is used to compare varieties or fertilizers where the soil and environmental conditions are relatively non-uniform. There is usually a gradient along the slope, or in light, soil fertility, or pH, so the size and length of the plots must be adjusted accordingly. The RCBD reduces experimental error but is subject to block effects.

The experiment compares the fresh yield (kg/36 m2) of 6 pea varieties in 4 blocks, using letters in place of the variety names (Barnard, 1994). It is laid out as a randomized complete block design with four replications and six treatments. The total number of plots is 4 x 6 = 24 (k = block; t = treatment, the variety name; y = yield). The experimental layout is as follows:

Upper end of the slope (high)
I    II   III  IV
f    e    c    e
d    f    d    d
c    c    e    c
e    b    a    a
b    a    b    f
a    d    f    b
Lower end of the slope (low)

/* One-factor RCBD: k = block, t = variety (letter), y = yield */
data;
  input k $ t $ y;
  cards;
1 f 9
1 d 14.6
1 c 18.3
1 e 14.1
1 b 21.9
1 a 22.4
2 e 14.2
2 f 14.1
2 c 17.4
2 b 25.6
2 a 23.9
2 d 19.2
3 c 12.7
3 d 15.8
3 e 11.5
3 a 21.1
3 b 23.7
3 f 6.4
4 e 12.1
4 d 16.1
4 c 15.9
4 a 19.6
4 f 12.3
4 b 18.3
;
/* ANOVA with Duncan grouping of the variety means at the 1% level */
proc anova;
  class k t;
  model y = k t;
  means t / duncan alpha=0.01;
  title 'Thi nghiem 1 yeu to RCBD';
run;

Thi nghiem 1 yeu to RCBD
The ANOVA Procedure
Class Level Information

Class    Levels    Values
k             4    1 2 3 4
t             6    a b c d e f

Number of observations    24

Thi nghiem 1 yeu to RCBD
The ANOVA Procedure
Dependent Variable: y

Source             DF    Sum of Squares    Mean Square   F Value   Pr > F
Model               8       497.3300000     62.1662500     16.42
Error              15        56.7950000      3.7863333
Corrected Total    23

R-Square    Coeff Var
0.897505     11.66927

Source    DF       Anova SS    Mean Square   F Value   Pr > F
k          3     52.8950000     17.6316667      4.66   0.0171
t          5    444.4350000     88.8870000     23.48

... 0.05, as does the interaction D2V4 (45-day mowing cycle, variety Medina). The interactions of mowing cycle D2 (45 days) with the varieties Jackson, Highlander, and San Macros have independent effects, as do the interactions of mowing cycle D3 (60 days) with all four grass varieties, each with p < 0.0001. Among these, the highest yield is from the interaction D3V2 (60-day mowing cycle, variety Highlander), at 11855.7 kg/plot. The results are presented as follows:

Table 3.2. Effect of mowing cycle and variety on grass yield (kg/plot)

                       Variety (V)
Mowing cycle (D)    V1 (Jackson)   V2 (Highlander)   V3 (San Macros)   V4 (Medina)   Mowing cycle mean
D1 (30 days)          6751.0 fg       6789.0 fg         6544.7 g         6515.3 g        6650.0 C
D2 (45 days)          8808.0 e        9583.0 d          8779.0 e         7282.0 f        8613.0 B
D3 (60 days)         11337.0 ab      11855.7 a         10500.3 c        10784.3 bc      11119.3 A
Variety mean          8965.3 B        9409.2 A          8608.0 B         8193.9 C

Means followed by the same letter are not significantly different at p < 0.05 for factor D, factor V, and the D*V interaction; CV = 4.21%

Exercise using variety names and mowing cycles (NRCS 2007, p. 52), comparing the interaction LSMEANS with the Tukey test.

Mowing cycle: 30da = 30 days; Jackson = name of a grass variety. Note: GIONG $15. and XENCOGIONG $20. set the width of the treatment names, which is needed when a name has more than 8 characters. The results are the same as in the version coded by number; the output, shortened to the grouping and the Tukey interaction comparison, is as follows:

/* GIONG is read from a fixed 15-column field, so keep the columns aligned */
DATA;
  INPUT KHOI XENCO $ GIONG $15. NSUAT XENCOGIONG $20.;
  Cards;
1 30da Jackson         6789 30da Jackson
1 30da Highlander      6578 30da Highlander
1 30da San Macros      6589 30da San Macros
1 30da Medina          6534 30da Medina
2 30da Jackson         6743 30da Jackson
2 30da Highlander      6789 30da Highlander
2 30da San Macros      6700 30da San Macros
2 30da Medina          6500 30da Medina
3 30da Jackson         6721 30da Jackson
3 30da Highlander      7000 30da Highlander
3 30da San Macros      6345 30da San Macros
3 30da Medina          6512 30da Medina
1 45da Jackson         8812 45da Jackson
1 45da Highlander      9500 45da Highlander
1 45da San Macros      7816 45da San Macros
1 45da Medina          6956 45da Medina
2 45da Jackson         8745 45da Jackson
2 45da Highlander      9654 45da Highlander
2 45da San Macros      8721 45da San Macros
2 45da Medina          6956 45da Medina
3 45da Jackson         8867 45da Jackson
3 45da Highlander      9595 45da Highlander
3 45da San Macros      9800 45da San Macros
3 45da Medina          7934 45da Medina
1 60da Jackson         11345 60da Jackson
1 60da Highlander      11999 60da Highlander
1 60da San Macros      10456 60da San Macros
1 60da Medina          10009 60da Medina
2 60da Jackson         11099 60da Jackson
2 60da Highlander      11678 60da Highlander
2 60da San Macros      10678 60da San Macros
2 60da Medina          10999 60da Medina
3 60da Jackson         11567 60da Jackson
3 60da Highlander      11890 60da Highlander
3 60da San Macros      10367 60da San Macros
3 60da Medina          11345 60da Medina
;
/* Split-plot ANOVA: XENCO is tested against the KHOI*XENCO error term */
proc glm;
  class KHOI XENCO GIONG;
  model NSUAT = KHOI XENCO KHOI*XENCO GIONG XENCO*GIONG;
  test h=XENCO e=KHOI*XENCO;
  means XENCO GIONG XENCO*GIONG / lsd alpha=0.05;
  lsmeans XENCO*GIONG / pdiff adjust=tukey;
  title 'SPLIT PLOT P 52 statistix';
run;
/* Grouping of the 12 mowing cycle x variety combinations as one factor */
proc GLM;
  class KHOI XENCOGIONG;
  model NSUAT = KHOI XENCOGIONG;
  means XENCOGIONG / Duncan alpha=0.05;
run;

SPLIT PLOT P 52 statistix
The GLM Procedure
Class Level Information

Class    Levels    Values
KHOI          3    1 2 3
XENCO         3    30da 45da 60da
GIONG         4    Highlander Jackson Medina San Macros

Number of observations    36

SPLIT PLOT P 52 statistix
The GLM Procedure
Dependent Variable: NSUAT

Source             DF    Sum of Squares    Mean Square   F Value   Pr > F
Model              17       133707792.4      7865164.3     57.51
Error              18         2461661.2       136759.0
Corrected Total    35

R-Square    Coeff Var
0.981922     4.205193

Source    DF      Type I SS    Mean Square   F Value   Pr > F
KHOI       2       875333.4       437666.7      3.20   0.0647
XENCO      2    120440064.9     60220032.4    440.34