W. P. Carey School of Business
   

Search:   
   W. P. Carey MBA  |  Undergraduate  |  Executive Ed  |  Departments  |  Directory  |  News  |  Calendars  |  Tools & Resources  |  ASU
Information Technology
Research Computing Tip
Summing with Missing Values under SAS

The Missing Values Problem

When summing a series of variables, it is often desirable to have any missing values treated as zero. However, this can be problematic if a standard arithmetic statement is used. Consider the following statement:

totalhours = totalhours + laborhours;

If totalhours and labor hours contain valid values, totalhours will contain the sum of totalhours and laborhours. However, if laborhours contains a missing value, totalhours will be set to a missing value regardless of its previous value. If the user wants missing values to be treated as zero, it is necessary to take a different approach. There are two alternative methods of summing variables that address this problem: the sum function and the sum statement.

The Sum Function

The SAS sum function takes the form of:

Variable1=Sum(variable2,variable3,...);

The sum function requires that at least two variables are specified. Variable1 is set to the total of all of the variables within the sum() statement with any variables containing missing values treated as zero. Thus, given the following:

Totalhours=0;

Laborhours=1;

Vacation=2;

Sickleave=.;

Totalhours=sum(laborhours,vacation,sickleave);

The resulting value of totalhours would be 3.

The Sum Statement

The SAS sum statement takes the form of:

Variable1 + expression;

This code assigns the summed value of variable1 and the expression to variable1 and flags variable1 to be retained. If the value of the expression is missing, it is treated a zero for the purposes of the calculation. In addition, variable1 becomes a rolling sum that is retained through each iteration of the datastep. For example, the statement above could be written:

totalhours + laborhours;

The value of totalhours is set to the sum of totalhours and laborhours with any missing values for laborhours treated as zero. With each subsequent iteration of the datastep, the value of totalhours is increased by the value of laborhours for the current observation.

Both of these summation methods solve the missing value problem. However, the choice between the two methods depends upon the purpose of the calculation. The sum function is best suited to creating a tally of a series of variables within a given record. On the other hand, the sum statement is particularly efficient for calculating a rolling sum across records. Also, the sum statement can be used to create a counter variable within a SAS program or to increment a variable as part of a loop.

 

  Web Page Feedback © 2008 Arizona Board of Regents Privacy Statement