Statistics for the Behavioral Sciences
Lesson 2: Frequency Distribution
Roger N. Morrissette, PhD

I. Ranked Distributions (Video Lesson 2 I) (YouTube version)

Frequency distributions organize Raw Data or numbers that have been collected. The first step in the process of organizing your newly collected raw data is to generate a Ranked Distribution. Ranked Distributions simply rank order all of the numbers of your raw data.

The following are example scores for 20 students on the first examination:

51   98   55   71   87   82   83   55   90   65   76   90   71   82   97   67   99   71   88   59

To create a Ranked Distribution, rearrange the data from the highest number to the lowest.

The following are the same scores as above but in a ranked distribution:

99   98   97   90   90   88   87   83   82   82   76   71   71   71   67   65   59   55   55   51
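The ranking step can be sketched in a few lines of Python. This is only an illustration, not part of the lesson itself; the variable names `scores` and `ranked` are made up for the example.

```python
# Raw exam scores copied from the example above.
scores = [51, 98, 55, 71, 87, 82, 83, 55, 90, 65,
          76, 90, 71, 82, 97, 67, 99, 71, 88, 59]

# A ranked distribution is the same data sorted from highest to lowest.
ranked = sorted(scores, reverse=True)
print(ranked)
```

Sorting does not add or drop any scores, so the ranked list still contains all 20 data points.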

II. Frequency Distributions (Video Lesson 2 II) (YouTube version)

Simple Frequency Distributions are created by listing all the possible score values in any distribution and then indicating the frequency (how often each score occurs). Frequency Distributions are useful only if they simplify the data. The table below shows the raw data for the above example in a Frequency Distribution:

Grade   Score   Frequency
A       99      1
A       98      1
A       97      1
A       90      2
B       88      1
B       87      1
B       83      1
B       82      2
C       76      1
C       71      3
D       67      1
D       65      1
F       59      1
F       55      2
F       51      1
Total           20

Notice that there are 2 scores of 90 and 3 scores of 71 in the original data set above. These values are represented as frequencies of 2 and 3, respectively, in the Frequency Distribution. Notice also that the total frequency at the bottom (20) is the same as the number of raw data points you have.
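A simple frequency distribution can be tallied with Python's standard `collections.Counter`. The sketch below assumes the same 20 scores; the names `frequency` and `total` are illustrative only.

```python
from collections import Counter

scores = [51, 98, 55, 71, 87, 82, 83, 55, 90, 65,
          76, 90, 71, 82, 97, 67, 99, 71, 88, 59]

# Count how often each distinct score occurs.
frequency = Counter(scores)

# List the scores from highest to lowest with their frequencies.
for score in sorted(frequency, reverse=True):
    print(f"{score:>5} {frequency[score]:>9}")

# The frequencies should add up to the number of raw data points.
total = sum(frequency.values())
print(f"Total {total:>8}")
```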

III. Grouped Frequency Distributions (Video Lesson 2 III) (YouTube version)

When you have a large number of unique scores you should generate a Grouped Frequency Distribution. In a grouped frequency distribution raw data are combined into equal-sized groups called class intervals. The grouped frequency distribution gives you the whole picture at a glance.

Grade   Group   Frequency
A       90-99   5
B       80-89   5
C       70-79   4
D       60-69   2
F       50-59   4
Total           20
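The grade groups above can be reproduced with a short Python sketch. It assumes each group starts at a multiple of 10, so integer division finds the lower limit of a score's group; the `grades` dictionary and the other names are made up for the illustration.

```python
from collections import Counter

scores = [51, 98, 55, 71, 87, 82, 83, 55, 90, 65,
          76, 90, 71, 82, 97, 67, 99, 71, 88, 59]

# Lower limit of each grade group (interval size 10).
grades = {90: "A", 80: "B", 70: "C", 60: "D", 50: "F"}

# (score // 10) * 10 maps a score such as 87 to its group's lower limit, 80.
group_counts = Counter((s // 10) * 10 for s in scores)

for lower in sorted(grades, reverse=True):
    print(f"{grades[lower]}  {lower}-{lower + 9}  {group_counts[lower]}")
```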

Constructing Class Intervals

How you construct your class intervals depends largely on the type of data you are working with. For grade data like the example above, a Class Interval size of 10 and a total of 5 Class Intervals works best. For most data there are several different ways to construct class intervals, and no one way is necessarily better than another. There are, however, some general rules that make the data easier to understand. Good Class Interval sizes are multiples of 2 or 5. Generally speaking, interval sizes should be between 10 and 20. Less than 10 results in loss of information about the original data and more than 20 is difficult to comprehend. The Number of Class Intervals should fall somewhere between 5 and 20. You can calculate the number of intervals and the interval size that would be best for any set of data.

To determine the number of intervals needed you first need to compute the range of your data:

Range = high score - low score

For our original data:

Range = 99-51

Range = 48

The second step is to select interval size (i). Let's say you select an interval size of 5. Use the formula below to calculate the number of intervals you should use.

number of intervals ~ range/i (interval size)

For our original data:

number of intervals ~ 48/5

number of intervals ~ 9.6 rounded up to 10 (Note: we always round the number of intervals up so we make sure to include all of our data)

So we can use 10 intervals with a class interval size of 5 to represent our data:

Interval   Frequency
95-99      3
90-94      2
85-89      2
80-84      3
75-79      1
70-74      3
65-69      2
60-64      0
55-59      3
50-54      1
Total      20

Each class interval is represented by a lower limit (e.g., 95 for the top interval) and an upper limit (e.g., 99 for the top interval). It is usually best to establish a lower limit that is a multiple of the interval size. This makes the table easier to understand. Once the intervals are complete you simply count the number of data points (the frequency) that fall within each class interval.
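The steps above (compute the range, divide by the interval size, round up, then count the scores in each interval) can be sketched in Python as follows. This is a rough illustration under the assumption that the lowest lower limit is the lowest score rounded down to a multiple of the interval size; all variable names are invented for the example.

```python
import math

scores = [51, 98, 55, 71, 87, 82, 83, 55, 90, 65,
          76, 90, 71, 82, 97, 67, 99, 71, 88, 59]

interval_size = 5
data_range = max(scores) - min(scores)               # 99 - 51 = 48
n_intervals = math.ceil(data_range / interval_size)  # 9.6 rounded up to 10

# Start at a lower limit that is a multiple of the interval size (50 here).
lowest = (min(scores) // interval_size) * interval_size

table = []
for k in range(n_intervals):
    lower = lowest + k * interval_size
    upper = lower + interval_size - 1
    freq = sum(lower <= s <= upper for s in scores)  # scores inside the interval
    table.append((lower, upper, freq))

# Print from the highest interval down, as in the table above.
for lower, upper, freq in reversed(table):
    print(f"{lower}-{upper}  {freq}")
```

For these scores the 10 intervals run from 50-54 up to 95-99 and cover every data point. With other data, rounding the lowest limit down can leave the highest score outside the last interval, so it is worth checking that the frequencies still add up to the total number of scores.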

To calculate the interval size (i) that would be best for any set of data you first need to compute the range of your data:

Range = high score - low score

For our original data:

Range = 99-51

Range = 48

The second step is to select the number of intervals you would use. Let's say you select 10 intervals. Use the formula below to calculate the interval size you should use:

(i) (interval size) ~ range/number of intervals

For our original data:

(i) (interval size) ~ 48/10

(i) (interval size) ~ 4.8 rounded up to 5
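The same calculation in the other direction, starting from a chosen number of intervals, might look like this in Python. The names are illustrative, and rounding up with `math.ceil` mirrors the rounding used in the worked example.

```python
import math

scores = [51, 98, 55, 71, 87, 82, 83, 55, 90, 65,
          76, 90, 71, 82, 97, 67, 99, 71, 88, 59]

n_intervals = 10
data_range = max(scores) - min(scores)               # 99 - 51 = 48

# Interval size is the range divided by the number of intervals, rounded up.
interval_size = math.ceil(data_range / n_intervals)  # 4.8 rounded up to 5
print(interval_size)
```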

IV. Apparent Limits and Real Limits (Video Lesson 2 IV) (YouTube version)

Apparent Limits are expressed in the same units as the original data, while Real Limits are the lower apparent limit minus 0.5 and the upper apparent limit plus 0.5. Notice the difference in the table below:

Real Limits   Apparent Limits   Frequency
94.5-99.5     95-99             3
89.5-94.5     90-94             2
84.5-89.5     85-89             2
79.5-84.5     80-84             3
74.5-79.5     75-79             1
69.5-74.5     70-74             3
64.5-69.5     65-69             2
59.5-64.5     60-64             0
54.5-59.5     55-59             3
49.5-54.5     50-54             1
Total                           20
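A small Python sketch of the conversion from apparent limits to real limits is shown below. It assumes whole-number scores, which is why half a unit (0.5) is subtracted and added; the list names are made up for the example.

```python
# Apparent limits of the ten intervals, highest interval first.
apparent_limits = [(lower, lower + 4) for lower in range(95, 45, -5)]

# Real limits extend each apparent limit by half a unit in both directions.
real_limits = [(lower - 0.5, upper + 0.5) for lower, upper in apparent_limits]

for (a_lo, a_hi), (r_lo, r_hi) in zip(apparent_limits, real_limits):
    print(f"{r_lo}-{r_hi}   {a_lo}-{a_hi}")
```

One consequence visible in the output is that the upper real limit of each interval equals the lower real limit of the interval above it, so the real limits leave no gaps between intervals.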

V. Midpoints (Video Lesson 2 V) (YouTube version)

The Midpoint is the exact center of an interval. When the interval size is odd the midpoints will be whole numbers. When the interval size is even the midpoints will end in .5. The midpoint is calculated with the formula below:

Midpoint = (lower limit + upper limit) / 2

Real Limits   Apparent Limits   Midpoints   Frequency
94.5-99.5     95-99             97          3
89.5-94.5     90-94             92          2
84.5-89.5     85-89             87          2
79.5-84.5     80-84             82          3
74.5-79.5     75-79             77          1
69.5-74.5     70-74             72          3
64.5-69.5     65-69             67          2
59.5-64.5     60-64             62          0
54.5-59.5     55-59             57          3
49.5-54.5     50-54             52          1
Total                                       20
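The midpoint formula can be applied to every interval with a short Python sketch. The names are illustrative; the example uses the apparent limits, and the real limits would give the same midpoints because they extend each interval by 0.5 on both sides.

```python
# Apparent limits of the ten intervals, highest interval first.
apparent_limits = [(lower, lower + 4) for lower in range(95, 45, -5)]

# Midpoint = (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in apparent_limits]
print(midpoints)
```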

VI. Cumulative Frequency (Video Lesson 2 VI) (YouTube version)

If frequency is the total number of scores that fall within a class interval, then Cumulative Frequency is the total number of scores that fall below the upper real limit of an interval. This is useful when you need to know how many scores fall below a particular score. The easiest way to calculate cumulative frequency is to start at the bottom interval and add the Frequency scores as you move up the table. This technique and the final outcome are shown in the table below:

Real Limits   Apparent Limits   Midpoints   Frequency   Calculation   Cumulative Frequency
94.5-99.5     95-99             97          3           17 + 3 =      20*
89.5-94.5     90-94             92          2           15 + 2 =      17
84.5-89.5     85-89             87          2           13 + 2 =      15
79.5-84.5     80-84             82          3           10 + 3 =      13
74.5-79.5     75-79             77          1           9 + 1 =       10
69.5-74.5     70-74             72          3           6 + 3 =       9
64.5-69.5     65-69             67          2           4 + 2 =       6
59.5-64.5     60-64             62          0           4 + 0 =       4
54.5-59.5     55-59             57          3           1 + 3 =       4
49.5-54.5     50-54             52          1           1 =           1
Total                                       20*

*Note that the final cumulative frequency score should equal the total frequency score.
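The bottom-up addition described above is a running total, which Python's standard `itertools.accumulate` computes directly. In this sketch the frequencies are listed starting from the bottom interval (50-54); the variable names are made up for the example.

```python
from itertools import accumulate

# Frequencies from the bottom interval (50-54) up to the top interval (95-99).
frequencies = [1, 3, 0, 2, 3, 1, 3, 2, 2, 3]

# Each cumulative frequency is the previous running total plus the next frequency.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last running total matches the total frequency of 20, as the note above requires.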

VII. Relative Frequency (Video Lesson 2 VII) (YouTube version)

Relative Frequency is used if you want to compare the frequencies of one distribution with another when the total number of data points is different. Relative Frequency is the proportion of scores from the distribution that fall within the real limits of an interval. This is similar to a percentage of scores where the percentage is the proportion multiplied by 100. The Relative Frequency is computed by dividing the frequency in the interval by the Total Frequency or total number of scores (n):

Relative Frequency = frequency / n (Total Frequency)

Real Limits   Apparent Limits   Midpoints   Frequency   Cumulative Frequency   Relative Frequency
94.5-99.5     95-99             97          3           20                     0.15
89.5-94.5     90-94             92          2           17                     0.10
84.5-89.5     85-89             87          2           15                     0.10
79.5-84.5     80-84             82          3           13                     0.15
74.5-79.5     75-79             77          1           10                     0.05
69.5-74.5     70-74             72          3           9                      0.15
64.5-69.5     65-69             67          2           6                      0.10
59.5-64.5     60-64             62          0           4                      0.00
54.5-59.5     55-59             57          3           4                      0.15
49.5-54.5     50-54             52          1           1                      0.05
Total                                       20                                 1.00*

*Note that the sum of the relative frequency should equal 1.00 (or very close to 1.00 if you have to round your relative frequency values).
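A minimal Python sketch of the relative frequency formula follows. The frequencies are listed from the top interval down, and the names `n` and `relative` are chosen only for the illustration.

```python
# Frequencies from the top interval (95-99) down to the bottom interval (50-54).
frequencies = [3, 2, 2, 3, 1, 3, 2, 0, 3, 1]

n = sum(frequencies)  # total number of scores

# Relative Frequency = frequency / n
relative = [f / n for f in frequencies]

for f, r in zip(frequencies, relative):
    print(f"{f:>2}  {r:.2f}")
print(f"Sum of relative frequencies: {sum(relative):.2f}")
```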

VIII. Cumulative Relative Frequency (Video Lesson 2 VIII) (YouTube version)

Cumulative Relative Frequency is the total proportion of Relative Frequency scores that lie below the real upper limit of the interval. The easiest way to calculate Cumulative Relative Frequency is to start at the bottom interval and add the Relative Frequency scores as you move up the table as we did with Cumulative Frequency. This final outcome is shown in the table below:

Real Limits   Apparent Limits   Midpoints   Frequency   Cumulative Frequency   Relative Frequency   Cumulative Relative Frequency
94.5-99.5     95-99             97          3           20                     0.15                 1.00*
89.5-94.5     90-94             92          2           17                     0.10                 0.85
84.5-89.5     85-89             87          2           15                     0.10                 0.75
79.5-84.5     80-84             82          3           13                     0.15                 0.65
74.5-79.5     75-79             77          1           10                     0.05                 0.50
69.5-74.5     70-74             72          3           9                      0.15                 0.45
64.5-69.5     65-69             67          2           6                      0.10                 0.30
59.5-64.5     60-64             62          0           4                      0.00                 0.20
54.5-59.5     55-59             57          3           4                      0.15                 0.20
49.5-54.5     50-54             52          1           1                      0.05                 0.05
Total                                       20                                 1.00*

*Note that the final cumulative relative frequency score should equal the total relative frequency score.
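Cumulative relative frequency is the same running total applied to the relative frequencies. The Python sketch below rounds each result to two decimals, an assumption made here only to hide tiny floating-point errors; the names are illustrative.

```python
from itertools import accumulate

# Frequencies from the bottom interval (50-54) up to the top interval (95-99).
frequencies = [1, 3, 0, 2, 3, 1, 3, 2, 2, 3]
n = sum(frequencies)

relative = [f / n for f in frequencies]

# Running total of the relative frequencies, rounded to two decimals.
cumulative_relative = [round(c, 2) for c in accumulate(relative)]
print(cumulative_relative)
```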

IX. Cumulative Percent (Video Lesson 2 IX) (YouTube version) (Calculating a Grouped Frequency Distribution Table) (mp4 version)

Cumulative Percent is simply the Cumulative Relative Frequency multiplied by 100.  The Cumulative Percent is shown in the table below:

Real Limits   Apparent Limits   Midpoints   Frequency   Cumulative Frequency   Relative Frequency   Cumulative Relative Frequency   Cumulative Percent
94.5-99.5     95-99             97          3           20                     0.15                 1.00                            100*
89.5-94.5     90-94             92          2           17                     0.10                 0.85                            85
84.5-89.5     85-89             87          2           15                     0.10                 0.75                            75
79.5-84.5     80-84             82          3           13                     0.15                 0.65                            65
74.5-79.5     75-79             77          1           10                     0.05                 0.50                            50
69.5-74.5     70-74             72          3           9                      0.15                 0.45                            45
64.5-69.5     65-69             67          2           6                      0.10                 0.30                            30
59.5-64.5     60-64             62          0           4                      0.00                 0.20                            20
54.5-59.5     55-59             57          3           4                      0.15                 0.20                            20
49.5-54.5     50-54             52          1           1                      0.05                 0.05                            5
Total                                       20

*Note that the final cumulative percent score should equal 100%.
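Converting to cumulative percent is a single multiplication, sketched below in Python. The input values are copied from the Cumulative Relative Frequency column (bottom interval first), and rounding to whole percents is an assumption that suits this data set.

```python
# Cumulative relative frequencies from the bottom interval up to the top interval.
cumulative_relative = [0.05, 0.20, 0.20, 0.30, 0.45, 0.50, 0.65, 0.75, 0.85, 1.00]

# Cumulative Percent = Cumulative Relative Frequency * 100
cumulative_percent = [round(p * 100) for p in cumulative_relative]
print(cumulative_percent)
```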

Additional Videos on the Concepts that might help:

Frequency Distribution Table

How to Create a Grouped Frequency Table

Making a Frequency Distribution Table

Frequency Tables

Create a Frequency Table and Chart