PROC FREQ with Sample Explanation in SAS

shreyansh

This tutorial explains how to use PROC FREQ in SAS, along with examples.

What does PROC FREQ do?

PROC FREQ is a SAS procedure for summarizing categorical variables. It computes the count (frequency), percent, cumulative frequency, and cumulative percent for each category of a categorical variable. Beyond these summaries, it can also produce bar charts and run tests of association between two categorical variables.

Create a Sample Dataset

Below is a sample SAS program that creates a sample SAS dataset, which will be used to explain examples in this tutorial.

data intellectsage;
input A B $ C;
cards;
8 X 30
8 X 50
1 X 110
3 Y 20
4 Y 37
6 Z 51
1 Z 23
8 Z 45
;
run;

Output:
Intellectsage_proc_freq.png

Example 1 : Check the distribution of a nominal categorical variable (character)

Assume you want to observe the frequency distribution of variable 'B'.

proc freq data = intellectsage;
tables B;
run;

The TABLES statement instructs SAS to produce n-way frequency and cross-tabulation tables and to calculate the statistics for those tables.

Output:
Intellectsage_proc_freq1.png

It answers the question of which category holds the largest number of cases. Here, categories 'X' and 'Z' are tied with three observations each, while 'Y' has two.

Example 2 : To exclude unwanted statistics in the table

Assume that you want to exclude the cumulative frequency and cumulative percent from the table. The NOCUM option tells SAS not to return the cumulative statistics.

proc freq data = intellectsage;
tables B /nocum;
run;

Output:
Intellectsage_proc_freq2.png

If you want only the frequency, without the percent distribution or the cumulative statistics, add the NOPERCENT option as well.

proc freq data = intellectsage;
tables B /nopercent nocum;
run;

Output:
Intellectsage_proc_freq3.png

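Example 3 : Cross-Tabulation (Two-Way Table)

Assume you want to see how the variable 'B' is distributed across the values of variable 'A'. In TABLES B * A, the first variable (B) forms the rows and the second (A) forms the columns.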

proc freq data = intellectsage;
tables B * A;
run;

Output:
Intellectsage_proc_freq4.png

Example 4 : Display Table in List Format

Suppose you do not want the output in tabular (crosstab) format and instead need the result displayed as a list (see the image below).

proc freq data = intellectsage;
tables B * A / list;
run;

Output:
Intellectsage_proc_freq5.png

The forward slash followed by the keyword LIST produces the table in a list-style format.

Example 5 : Suppressing the unwanted statistics in cross tabulation

proc freq data = intellectsage;
tables B * A / norow nocol nopercent;
run;

Output:
Intellectsage_proc_freq6.png

The NOROW option suppresses the row percentages in the cross-tabulation. Likewise, the NOCOL option suppresses the column percentages, and NOPERCENT suppresses the overall cell percentages, leaving only the counts.

Example 6 : Get Several Crosstab

Suppose you want more than one cross-tabulation. To achieve this, you can use the code given below:

proc freq data = intellectsage;
tables B * (A C) / norow nocol nopercent;
run;

The statement "tables B*(A C);" is equivalent to the statement "tables B*A B*C;". In this case, it produces two tables: B by A and B by C.

Output:
Intellectsage_proc_freq7.png
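
As a quick check of the equivalence described above, the same request can be written with both tables spelled out. This is only a sketch of the longer form; the options after the slash apply to every table named in the TABLES statement.

proc freq data = intellectsage;
/* Long form of B*(A C): two separate two-way tables */
tables B*A B*C / norow nocol nopercent;
run;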

Example 7 : Count of Unique Values

The NLEVELS option reports the number of unique values (levels) of each variable named in the TABLES statement.

proc freq data = intellectsage nlevels;
tables B;
run;

Output:
Intellectsage_proc_freq8.png
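
If only the count of unique values is needed, a common idiom is to combine NLEVELS with NOPRINT in the TABLES statement, which suppresses the frequency tables but keeps the levels summary. The sketch below assumes the sample dataset created at the start of this tutorial; _ALL_ is a standard SAS variable list meaning every variable in the dataset.

proc freq data = intellectsage nlevels;
/* NOPRINT hides the frequency tables; the Number of Variable Levels table is still shown */
tables _all_ / noprint;
run;

With the sample data as entered above, this should report 5 levels for A, 3 for B, and 8 for C.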

Example 8 : Use WEIGHT Statement

Use the WEIGHT statement when the data are already aggregated, so that each row carries a count rather than representing a single observation. It makes PROC FREQ use those counts to produce the frequency and cross-tabulation tables.

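/* A separate dataset name is used here so that the sample dataset */
/* intellectsage (variables A, B, C) is not overwritten; the later */
/* examples still need it. */
data intellectsage_counts;
input present $ postof $ counts;
cards;
Y Y 35
Y N 14
N Y 42
N N 26
;
run;

proc freq data = intellectsage_counts;
tables present*postof;
weight counts;
run;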

Output:
Intellectsage_proc_freq9.png
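
To see what the WEIGHT statement is doing, the count data can be expanded into one row per case and tabulated without a WEIGHT statement. The sketch below is illustrative only: the dataset names counts_demo and expanded_demo are hypothetical, and the counts are re-entered so the block runs on its own.

data counts_demo;   /* hypothetical name; same counts as in the example above */
input present $ postof $ counts;
cards;
Y Y 35
Y N 14
N Y 42
N N 26
;
run;

data expanded_demo;   /* hypothetical name; one row per counted case */
set counts_demo;
do i = 1 to counts;
output;
end;
drop i counts;
run;

proc freq data = expanded_demo;
/* No WEIGHT statement: each of the 117 rows counts once */
tables present*postof;
run;

The cell frequencies should match those of the weighted table, since 35 + 14 + 42 + 26 = 117 rows are written.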
 
Example 9 : Save result in a SAS dataset

Assume that you want to store the result in a SAS dataset instead of viewing it in the results window.

proc freq data = intellectsage noprint;
tables B *A / out = temp;
run;

Output:
Intellectsage_proc_freq10.png

The OUT= option saves the frequency table to a SAS dataset. The NOPRINT option prevents SAS from printing the table in the results window.
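
To inspect the saved dataset, it can be sorted and printed like any other SAS dataset. This sketch assumes the dataset temp created above; temp_sorted is a hypothetical name. The OUT= dataset holds one row per combination of B and A that occurs in the data, with the variables COUNT and PERCENT.

proc sort data = temp out = temp_sorted;   /* temp_sorted is a hypothetical name */
by descending count;
run;

proc print data = temp_sorted noobs;
run;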

Example 10 : Perform Chi-Square Test

The CHISQ option requests chi-square tests of homogeneity or independence, along with chi-square-based measures of association, between two categorical variables. It can also help identify which categorical variables are statistically significant and therefore worth using in a predictive model. Variables with a chi-square p-value less than or equal to 0.05 are retained.

proc freq data = intellectsage noprint;
tables B * A/chisq;
output All out=intellectsage_chi chisq;
run;
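
Because NOPRINT suppresses the displayed output, the test result has to be read from the output dataset. The sketch below assumes the dataset intellectsage_chi created above and the usual PROC FREQ output variable names for the Pearson chi-square (_PCHI_, DF_PCHI, P_PCHI); chi_flag and significant are hypothetical names. With only eight observations the chi-square approximation is unreliable, so treat the result as a syntax illustration.

proc print data = intellectsage_chi noobs;
/* Pearson chi-square statistic, its degrees of freedom, and its p-value */
var _pchi_ df_pchi p_pchi;
run;

data chi_flag;   /* hypothetical name */
set intellectsage_chi;
/* 1 if the p-value meets the 0.05 cutoff described above, else 0 */
significant = (p_pchi <= 0.05);
run;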

Example 11 : Create Bar Chart and Dot Plot

A bar chart can be created with PROC FREQ. To create a bar chart for variable 'B', add plots=freqplot(type=bar) to the TABLES statement. The chart shows frequencies by default; to show percentages instead, add scale=percent. The ODS GRAPHICS ON statement comes first because the plot is produced through ODS Graphics, and it is turned off again afterwards.

Ods graphics on;
Proc freq data=intellectsage order=freq;
Tables B/ plots=freqplot (type=bar scale=percent);
Run;
Ods graphics off;

Output:
Intellectsage_proc_freq11.png

Similarly, we can create a dot plot by specifying type=dot. See the implementation below.

Ods graphics on;
Proc freq data=intellectsage order=freq;
Tables B/ plots=freqplot (type=dot);
Run;
Ods graphics off;

Output:
Intellectsage_proc_freq12.png

Example 12 : Including Missing Values in Analysis


By default, PROC FREQ does not include missing values when computing percent and cumulative percent. The number of missing values is reported separately (below the table). Refer to the image below.

Proc freq data=sashelp.heart;
Tables deathcause;
Run;

Output:
Intellectsage_proc_freq13.png

With the MISSING option, missing values are treated as a separate category and are included in all the corresponding statistics.

Proc freq data=sashelp.heart;
Tables deathcause / missing;
Run;

Output:
Intellectsage_proc_freq14.png
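
A related TABLES option, MISSPRINT, sits between the two behaviours shown above: it lists the missing category in the table but leaves it out of the percent calculations. The sketch below reuses the same sashelp.heart example.

proc freq data = sashelp.heart;
/* MISSPRINT shows the missing row but excludes it from the percentages */
tables deathcause / missprint;
run;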

Example 13 : Ordering / Sorting

In PROC FREQ, categories of a character variable are ordered alphabetically by default. Categories for a numeric variable are ordered smallest to largest value.

To sort categories in descending order by frequency, that is largest to smallest count, use the ORDER=FREQ option.

Proc freq data=sashelp.heart order = FREQ;
Tables deathcause / missing;
Run;

Output:
Intellectsage_proc_freq15.png

In general, a nominal variable should be shown with distribution of the variable after sorting categories by frequency. For an ordinal variable, it should be shown on the basis of level of categories.

The ORDER= option can also reorder categories based on a particular FORMAT: with ORDER=FORMATTED, categories are sorted by their formatted values.
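
As a sketch of format-based ordering, the example below shows an ordinal variable in its natural level order rather than alphabetically. It assumes that BP_Status in sashelp.heart takes the values 'Optimal', 'Normal' and 'High'; the format name $bpfmt and its numbered labels are hypothetical and are there only to force the sort order.

proc format;
/* Hypothetical format: numeric prefixes make the formatted values sort by level */
value $bpfmt
'Optimal' = '1. Optimal'
'Normal' = '2. Normal'
'High' = '3. High';
run;

proc freq data = sashelp.heart order = formatted;
tables bp_status;
format bp_status $bpfmt.;
run;

Without the format, the default order would be alphabetical (High, Normal, Optimal), which hides the ordinal structure.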
 