Training Manual for Data Analysis using SAS by Sujai Das

Excerpt from the book:

SAS (Statistical Analysis System) software is comprehensive software which deals with many
problems related to Statistical analysis, Spreadsheet, Data Creation, Graphics, etc. It is a layered,
multivendor architecture. Regardless of the difference in hardware, operating systems, etc., the
SAS applications look the same and produce the same results. The three components of the SAS
System are Host, Portable Applications and Data. Host provides all the required interfaces
between the SAS system and the operating environment. Functionalities and applications reside
in the Portable Applications component, and the user supplies the Data. In this course we deal with the
part of the software used for statistical analysis of data.

year rep a b;

MODEL yield = a b a*b / DDFM=KR;

/* DDFM specifies the method for computing the denominator degrees of freedom for the tests of fixed effects resulting from the MODEL*/

RANDOM year rep(year) year*a year*rep*a year*a*b;

LSMEANS a b a*b / PDIFF;

STORE spd1;

run;

PROC PLM RESTORE=spd1;
LSMEANS a b a*b / pdiff lines;
RUN;

Exercise 3.10: An agricultural field experiment was conducted with 9 treatments using 36 plots arranged in 4 complete blocks. A sample of harvested output from each of the 36 plots is to be analysed blockwise by three technicians using three different operations. The data collected is given below:

1. Perform the analysis of the data considering that technicians and operations are crossed with each other and nested in the blocking factor.

2. Perform the analysis by considering the effects of technicians as negligible.

3. Perform the analysis by ignoring the effects of the operations and technicians.

Procedure:

Prepare the data file.

DATA Name;
INPUT BLK TECH OPER TRT OBS;
Cards;
. . . .
;

Perform analysis of objective 1 using PROC GLM. The statements are as follows:

Proc glm;
Class blk tech oper trt;
Model obs = blk tech(blk) oper(blk) trt / ss2;
Lsmeans trt oper(blk) / pdiff;
Run;

Perform analysis of objective 2 using PROC GLM with the statements as follows:

Proc glm;
Class blk tech oper trt;
Model obs = blk oper(blk) trt / ss2;
run;

Perform analysis of objective 3 using PROC GLM with the statements as follows:

Proc glm;
Class blk tech oper trt;
Model obs = blk trt / ss2;
run;

Exercise 3.11: A greenhouse experiment on tobacco mosaic virus was conducted. The experimental unit was a single leaf. Individual plants were found to contribute significantly to error and hence were taken as one source of heterogeneity in the experimental material. The position of the leaf within plants was also found to contribute significantly to the error. Therefore, the three positions of the leaves, viz. top, middle and bottom, were identified as levels of a second factor causing heterogeneity. Seven solutions were applied to leaves of 7 plants and the number of lesions produced per leaf was counted. Analyze the data of this experiment.

The figures at the intersections of the plants and leaf position are the solution numbers and the figures in the parenthesis are number of lesions produced per leaf.

Procedure:

Prepare the data file.

DATA Name;
INPUT plant posi $ trt count;
Cards;
. . . .
;

Perform analysis using PROC GLM. The statements are as follows:

Proc glm;
Class plant posi trt;
Model count = plant posi trt / ss2;
Lsmeans trt / pdiff;
Run;

Exercise 3.12: The following data was collected through a pilot sample survey on Hybrid Jowar crop on yield and biometrical characters. The biometrical characters were average Plant Population (PP), average Plant Height (PH), average Number of Green Leaves (NGL) and Yield (kg/plot).

1. Obtain correlation coefficient between each pair of the variables PP, PH, NGL and yield.

2. Fit a multiple linear regression equation by taking yield as dependent variable and biometrical characters as explanatory variables. Print the matrices used in the regression computations.

3. Test the significance of the regression coefficients and also equality of regression coefficients of a) PP and PH b) PH and NGL

4. Obtain the predicted values corresponding to each observation in the data set.

5. Identify the outliers in the data set.

6. Check for the linear relationship among the biometrical characters.

7. Fit the model without intercept.

8. Perform principal component analysis.

PP        PH        NGL      Yield
88.44     0.9800    5.00     4.080
99.55     0.6450    9.60     2.830
63.99     0.6350    5.60     2.570
101.77    0.2900    8.20     7.420
138.66    0.7200    9.90     2.620
90.22     0.6300    8.40     2.000
76.92     1.2500    7.30     1.990
126.22    0.5800    6.90     1.360
80.36     0.6050    6.80     0.680
150.23    1.1900    8.80     5.360
56.50     0.3550    9.70     2.120
136.00    0.5900    10.20    4.160
144.50    0.6100    9.80     3.120
157.33    0.6050    8.80     2.070
91.99     0.3800    7.70     1.170
121.50    0.5500    7.70     3.620
64.50     0.3200    5.70     0.670
116.00    0.4550    6.80     3.050
77.50     0.7200    11.80    1.700
70.43     0.6250    10.00    1.550
133.77    0.5350    9.30     3.280
89.99     0.4900    9.80     2.690

Procedure:

Prepare a data file.

Data mlr;
Input PP PH NGL Yield;
Cards;
. . . .
;

For obtaining correlation coefficients, use PROC CORR:

Proc Corr;
Var PP PH NGL Yield;
run;
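As a minimal sketch of the data-entry and correlation steps, the values listed for this exercise can be typed in directly. This assumes that the values are in the order PP, PH, NGL, Yield, four per observation (the order of the INPUT statement; the original column headings did not survive in this copy). The data set name mlr_demo is hypothetical and is used only so that mlr is not overwritten.

```sas
/* Sketch only: assumes each row is PP PH NGL Yield, in that order. */
data mlr_demo;   /* hypothetical data set name */
   input PP PH NGL Yield;
   datalines;
88.44 0.9800 5.00 4.080
99.55 0.6450 9.60 2.830
63.99 0.6350 5.60 2.570
101.77 0.2900 8.20 7.420
138.66 0.7200 9.90 2.620
90.22 0.6300 8.40 2.000
76.92 1.2500 7.30 1.990
126.22 0.5800 6.90 1.360
80.36 0.6050 6.80 0.680
150.23 1.1900 8.80 5.360
56.50 0.3550 9.70 2.120
136.00 0.5900 10.20 4.160
144.50 0.6100 9.80 3.120
157.33 0.6050 8.80 2.070
91.99 0.3800 7.70 1.170
121.50 0.5500 7.70 3.620
64.50 0.3200 5.70 0.670
116.00 0.4550 6.80 3.050
77.50 0.7200 11.80 1.700
70.43 0.6250 10.00 1.550
133.77 0.5350 9.30 3.280
89.99 0.4900 9.80 2.690
;
run;

/* Pairwise Pearson correlations among the four variables */
proc corr data=mlr_demo;
   var PP PH NGL Yield;
run;
```

Naming the data set explicitly with DATA= avoids relying on the most recently created data set, which is what a bare PROC CORR statement uses.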

For fitting of multiple linear regression equation, use PROC REG:

Proc Reg;
Model Yield = PP PH NGL / p r influence vif collin xpx i;
Test1: Test PP=0;
Test2: Test PH=0;
Test3: Test NGL=0;
Test4: Test PP-PH=0;
Test4a: Test PP=0, PH=0;
Test5: Test PH-NGL=0;
Test5a: Test PH=0, NGL=0;
Model Yield = PP PH NGL / noint;
run;

Proc reg;
Model Yield = PP PH NGL;
Restrict intercept=0;
Run;

For diagnostic plots:

Proc Reg plots(unpack)=diagnostics;
Model Yield = PP PH NGL;
run;

For variable selection, one can use the following options in the MODEL statement:

Selection=stepwise sls=0.10;

For performing principal component analysis, use the following:

PROC PRINCOMP;
VAR PP PH NGL YIELD;
run;

Example 3.13: An experiment was conducted at the Division of Agricultural Engineering, IARI, New Delhi to study the capacity of a grader, in hours, when used with three different speeds and two processor settings. The experiment was conducted using a factorial completely randomised design in 3 replications. The treatment combinations and the data obtained on capacity of the grader in hours are given below:

2265

2280

2278

3040

3028

3040

Experimenter was interested in identifying the best combination of speed and processor setting that gives maximum capacity of the grader in hours.

Solution: This data can be analysed as per the procedure of factorial CRD and one can use the following SAS steps for performing the analysis:

Data ex1a;
Input rep speed proset cgrader;
/* here rep: replication; proset: processor setting; cgrader: capacity of the grader in hours */
Cards;
1 1 1 1852
1 1 2 1848
1 1 3 1855
. . . .
3 3 1 3040
3 3 2 3028
3 3 3 3040
;

Proc glm data=ex1a;
Class speed proset;
Model cgrader = speed proset speed*proset;
Lsmeans speed proset speed*proset / pdiff adjust=tukey lines;
Run;

The above analysis tests the significance of the main effects of speed and processor setting and of their interaction. It also identifies the speed level (averaged over processor settings) and the processor setting (averaged over speed levels) at which the capacity of the grader is maximum. The multiple comparisons between means of combinations of speed and processor setting help in identifying the combination at which the capacity of the grader is maximum.

Exercise 3.14: An experiment was conducted with five levels of each of the four fertilizer treatments Nitrogen, Phosphorus, Potassium and Zinc. The levels of each of the four factors and the yield obtained are as given below. Fit a second order response surface design using the original data. Test the lack of fit of the model. Compute the ridges of maximum and minimum response. Obtain the predicted residual sum of squares.

11.28

8.44

13.29

7.71

120

8.94

10.9

11.85

120

11.03

120

8.26

120

7.87

12.08

11.06

120

7.98

120

10.43

120

9.78

120

12.59

160

8.57

9.38

120

9.47

7.71

100

8.89

9.18

10.79

8.11

10.14

10.22

10.53

9.5

11.53

11.02

Procedure:

Prepare a data file.

/* yield at different levels of several factors */
title 'yield with factors N P K Zn';
data dose;
input n p k Zn y;
label y = "yield";
cards;
. . . . .
;

/* Use PROC RSREG. */
ods graphics on;
proc rsreg data=dose plots(unpack)=surface(3d);
model y = n p k Zn / nocode lackfit press;
run;
ods graphics off;

/* If surface plots are not required, one may use the following: */
proc rsreg data=dose;
model y = n p k Zn / nocode lackfit press;
Ridge min max;
run;

Exercise 3.15: Fit a second order response surface design to the following data. Take replications as covariate.

Procedure:

Prepare a data file.

/* yield at different levels of several factors */

title 'yield with factors x1 x2';

data respcov;

input fert1 fert2 x1 x2 yield ;

cards;

. . . . .

;

/* Use PROC RSREG. */
ODS Graphics on;
proc rsreg plots(unpack)=surface(3d);
model yield = rep fert1 fert2 / covar=1 nocode lackfit;
Ridge min max;
run;
ods graphics off;

Exercise 3.16: The following data relate to the length (in cm) of the ear-head of a wheat variety:

9.3, 18.8, 10.7, 11.5, 8.2, 9.7, 10.3, 8.6, 11.3, 10.7, 11.2, 9.0, 9.8, 9.3, 10.3, 10, 10.1, 9.6, 10.4. Test the hypothesis that the median length of the ear-head is 9.9 cm.

Procedure:

This may be tested using any of the three tests for location available in PROC UNIVARIATE, viz. Student's t test, the sign test, and the Wilcoxon signed rank test. All three tests produce a test statistic for the null hypothesis that the mean or median is equal to a given value μ0 against the two-sided alternative that the mean or median is not equal to μ0. By default, PROC UNIVARIATE sets the value of μ0 to zero. You can use the MU0= option in the PROC UNIVARIATE statement to specify the value of μ0. If the data are from a normal population, inference can be based on the t test; otherwise the non-parametric sign test and Wilcoxon signed rank test may be used for drawing inferences.

Procedure:

data npsign;
input length;
cards;

9.3

18.8

10.7

11.5

8.2

9.7

10.3

8.6

11.3

10.7

11.2

9.0

9.8

9.3

10.3

10.0

10.1

9.6

10.4

;

PROC UNIVARIATE DATA=npsign MU0=9.9;
VAR length;
HISTOGRAM / NOPLOT;
RUN;

QUIT;
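To see what the sign test is doing, the statistic can be reproduced by hand. PROC UNIVARIATE reports it as M = (n+ − n−)/2, where n+ and n− are the numbers of observations above and below the hypothesised value, and the two-sided p-value comes from a binomial distribution with success probability 0.5. The following is only a sketch for checking the procedure output; the data set name npsign_check and the variables n_plus, n_minus, m and p_value are hypothetical.

```sas
/* Sketch only: hand computation of the sign statistic for MU0 = 9.9. */
data npsign_check;            /* hypothetical data set name */
   input length @@;           /* @@ reads several values from one line */
   datalines;
9.3 18.8 10.7 11.5 8.2 9.7 10.3 8.6 11.3 10.7
11.2 9.0 9.8 9.3 10.3 10.0 10.1 9.6 10.4
;
run;

data _null_;
   set npsign_check end=last;
   /* Sum statements: the counters are retained and start at 0 */
   if length > 9.9 then n_plus + 1;
   else if length < 9.9 then n_minus + 1;   /* values equal to 9.9 are ignored */
   if last then do;
      m = (n_plus - n_minus) / 2;
      /* Two-sided binomial p-value, capped at 1 */
      p_value = min(1, 2 * cdf('BINOMIAL', min(n_plus, n_minus), 0.5,
                               n_plus + n_minus));
      put n_plus= n_minus= m= p_value=;
   end;
run;
```

The values written to the log can be compared with the "Sign" row of the Tests for Location table produced by PROC UNIVARIATE.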

Exercise 3.17: An experiment was conducted with 21 animals to determine if the four different feeds have the same distribution of Weight gains on experimental animals. The feeds 1, 3 and 4 were given to 5 randomly selected animals and feed 2 was given to 6 randomly selected animals. The data obtained is presented in the following table.

Procedure:

data np;
input feed wt;
datalines;
1 3.35
1 3.80
1 3.55
1 3.36
1 3.81
2 3.79
2 4.10
2 4.11
2 3.95
2 4.25
2 4.40
3 4.00
3 4.50
3 4.51
3 4.75
3 5.00
4 3.57
4 3.82
4 4.09
4 3.96
4 3.82
;

PROC NPAR1WAY DATA=np WILCOXON; /* for performing the Kruskal-Wallis test */
VAR wt;
CLASS feed;
RUN;

Example 3.18: Finney (1971) gave data on the effect of a series of doses of rotenone (an insecticide) sprayed on Macrosiphoniella sanborni (the chrysanthemum aphid). The table below contains the concentration, the number of insects tested at each dose, the number affected, the percentage killed, the log concentration and the empirical probit (probit + 5) of each observed proportion.

Concentration (mg/l)   No. of insects (n)   No. of affected (r)   %kill (P)   Log concentration (x)   Empirical probit
10.2                   50                   44                    88          1.01                    6.18
7.7                    49                   42                    86          0.89                    6.08
5.1                    46                   24                    52          0.71                    5.05
3.8                    48                   16                    33          0.58                    4.56
2.6                    50                   6                     12          0.41                    3.82

Perform the probit analysis on the above data.
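The log concentration and empirical probit columns can be recomputed from the concentration, n and r with a short DATA step. This is a sketch for checking the table rather than part of the original procedure; the data set name empirical and the variable names p, logcon and emp_probit are hypothetical. The zero-concentration control row is left out here because its log is undefined and its observed proportion is 0.

```sas
/* Sketch only: recompute log10 concentration and empirical probit (probit + 5). */
data empirical;                  /* hypothetical data set name */
   input con n r;
   p = r / n;                    /* observed proportion affected */
   logcon = log10(con);          /* log concentration (x) */
   emp_probit = probit(p) + 5;   /* normal quantile of p, shifted by 5 */
   datalines;
10.2 50 44
7.7 49 42
5.1 46 24
3.8 48 16
2.6 50 6
;
run;

proc print data=empirical noobs;
   format p 6.3 logcon 6.2 emp_probit 6.2;
run;
```

Small differences in the second decimal place from the tabulated probits are possible, since the table appears to have been computed from percentages rounded to whole numbers.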

Procedure:

data probit;
input con n r;
datalines;

10.2 50 44

7.7 49 42

5.1 46 24

3.8 48 16

2.6 50 6

0 49 0

;

ods html;
Proc Probit log10;
Model r/n = con / lackfit inversecl;
title 'output of probit analysis';
run;
ods html close;

Model Information

Data Set                  WORK.PROBIT
Events Variable           r
Trials Variable           n
Number of Observations    5
Number of Events          132
Number of Trials          243
Name of Distribution      Normal
Log Likelihood            -120.0516414

Algorithm converged.

Goodness-of-Fit Tests

Statistic             Value     Pr > ChiSq
Pearson Chi-Square    1.7289    0.6305
L.R. Chi-Square       1.7390    0.6283

Response-Covariate Profile

Response Levels               2
Number of Covariate Values    5

Since the chi-square is small (p > 0.1000), fiducial limits will be calculated using a t value of 1.96

Type III Analysis of Effects

Effect        DF    Wald Chi-Square    Pr > ChiSq
Log10(con)    1     77.5920            <.0001

Analysis of Parameter Estimates

Parameter     Estimate    Standard Error    95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept     -2.8875     0.3501            -3.5737    -2.2012       68.01         <.0001
Log10(con)    4.2132      0.4783            3.2757     5.1507        77.59         <.0001

Probit Model in Terms of Tolerance Distribution

MU            SIGMA
0.68533786    0.23734947

Estimated Covariance Matrix for Tolerance Parameters

         MU           SIGMA
MU       0.000488     -0.000063
SIGMA    -0.000063    0.000726

Probit Analysis on Log10(con)

Probability    Log10(con)    95% Fiducial Limits
0.01           0.13318       -0.03783    0.24452
0.02           0.19788       0.04453     0.29830
0.03           0.23893       0.09668     0.33253
0.04           0.26981       0.13584     0.35834
0.05           0.29493       0.16764     0.37940
0.06           0.31631       0.19466     0.39737
0.07           0.33506       0.21832     0.41316
0.08           0.35184       0.23946     0.42733
0.09           0.36711       0.25866     0.44026
0.10           0.38116       0.27631     0.45218
0.15           0.43934       0.34898     0.50192
0.20           0.48558       0.40618     0.54202
0.25           0.52525       0.45467     0.57700
0.30           0.56087       0.49759     0.60904
0.35           0.59388       0.53666     0.63942
0.40           0.62521       0.57295     0.66905
0.45           0.65551       0.60716     0.69861
0.50           0.68534       0.63983     0.72870
0.55           0.71516       0.67142     0.75986
0.60           0.74547       0.70240     0.79265
0.65           0.77679       0.73330     0.82766
0.70           0.80980       0.76480     0.86563
0.75           0.84543       0.79777     0.90761
0.80           0.88510       0.83352     0.95533
0.85           0.93133       0.87427     1.01188
0.90           0.98951       0.92456     1.08401
0.91           1.00357       0.93658     1.10155
0.92           1.01883       0.94960     1.12065
0.93           1.03562       0.96387     1.14170
0.94           1.05436       0.97976     1.16526
0.95           1.07574       0.99783     1.19218
0.96           1.10086       1.01898     1.22388
0.97           1.13174       1.04490     1.26294
0.98           1.17279       1.07924     1.31498
0.99           1.23750       1.13315     1.39721

Probit Analysis on con

Probability    con         95% Fiducial Limits
0.01           1.35888     0.91657     1.75599
0.02           1.57718     1.10799     1.98745
0.03           1.73353     1.24935     2.15043
0.04           1.86129     1.36724     2.28215
0.05           1.97212     1.47110     2.39553
0.06           2.07163     1.56554     2.49671
0.07           2.16302     1.65317     2.58917
0.08           2.24825     1.73565     2.67506
0.09           2.32868     1.81410     2.75586
0.10           2.40526     1.88932     2.83257
0.15           2.75005     2.23349     3.17629
0.20           3.05900     2.54788     3.48353
0.25           3.35157     2.84884     3.77571
0.30           3.63808     3.14478     4.06477
0.35           3.92538     3.44084     4.35935
0.40           4.21897     3.74068     4.66710
0.45           4.52389     4.04724     4.99582
0.50           4.84549     4.36343     5.35423
0.55           5.18995     4.69265     5.75260
0.60           5.56506     5.03963     6.20374
0.65           5.98127     5.41132     6.72450
0.70           6.45363     5.81830     7.33883
0.75           7.00531     6.27722     8.08377
0.80           7.67532     6.81590     9.02252
0.85           8.53758     7.48633     10.27723
0.90           9.76143     8.40534     12.13411
0.91           10.08243    8.64132     12.63428
0.92           10.44313    8.90434     13.20233
0.93           10.85466    9.20181     13.85792
0.94           11.33346    9.54469     14.63036
0.95           11.90537    9.95006     15.56609
0.96           12.61427    10.44674    16.74479
0.97           13.54388    11.08927    18.32046
0.98           14.88655    12.00168    20.65263
0.99           17.27807    13.58779    24.95808

Interpretation: The goodness-of-fit tests (p-values = 0.6305, 0.6283) suggest that the distribution and the model fit the data adequately. In this case, the fitting is done on the normal equivalent deviate only, without adding 5. Therefore, log LD50 or log ED50 corresponds to the value of Probit = 0. Log LD50 is obtained as 0.685338. Therefore, the stress level at which 50% of the insects will be killed is 10^0.685338 = 4.845 mg/l. Similarly, the stress level at which 65% of the insects will be killed is 10^0.776793 = 5.981 mg/l. Both values are also given in the tables above.
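The link between the parameter estimates and the dose levels can be checked with a few lines of arithmetic. With the fitted model probit(p) = intercept + slope × log10(con), the tolerance parameters are MU = −intercept/slope and SIGMA = 1/slope, and the log dose for a kill proportion p is MU + SIGMA × probit(p). The sketch below plugs in the estimates printed in the output (−2.8875 and 4.2132); the variable names are hypothetical, and because the estimates are rounded the results agree with the tables only to about three or four significant figures.

```sas
/* Sketch only: recover MU, SIGMA, LD50 and LD65 from the printed estimates. */
data _null_;
   intercept = -2.8875;                       /* from Analysis of Parameter Estimates */
   slope     =  4.2132;
   mu    = -intercept / slope;                /* log10 LD50, where probit = 0 */
   sigma = 1 / slope;
   ld50  = 10**mu;                            /* back-transform to mg/l */
   log_ld65 = mu + sigma * probit(0.65);      /* log10 dose for 65% kill */
   ld65  = 10**log_ld65;
   put mu= sigma= ld50= log_ld65= ld65=;
run;
```

Changing the argument of probit() gives the dose for any other kill percentage, which is how the rows of the fiducial limit tables relate to MU and SIGMA.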

4. Discussion

We have initiated a link “Analysis of Data” at Design Resources Server (www.iasri.res.in/design) to provide steps of analysis of data generated from designed experiments by using statistical packages like SAS, SPSS, MINITAB, SYSTAT, MS-EXCEL, etc. For details and live examples one may refer to the link Analysis of Data at http://www.iasri.res.in/design/Analysis%20of%20data/Analysis%20of%20Data.html.

How to see SAS/STAT Examples?

One can learn from the examples available at http://support.sas.com/rnd/app/examples/STATexamples.html

How to use HELP?

Help → SAS Help and Documentation → Contents → Learning to Use SAS → Sample SAS Programs → SAS/STAT …

5. Strengthening Statistical Computing for NARS

NAIP Consortium on Strengthening Statistical Computing for NARS (www.iasri.res.in/sscnars)

targets