Statistics for the Behavioral Sciences Lesson 7 Correlation Roger N. Morrissette, PhD

Let's test the hypothesis that depression scores are negatively correlated with self-esteem scores. We design our surveys and sample 8 subjects. Their data are presented below. Data for a correlation are always presented in two columns, like the data set shown below. Depression scores are our X data and Self-Esteem scores are our Y data:

| Depression (X) | Self-Esteem (Y) |
|----------------|-----------------|
| 10 | 104 |
| 12 | 100 |
| 19 | 98 |
| 4 | 150 |
| 25 | 75 |
| 15 | 105 |
| 21 | 82 |
| 7 | 133 |

II. Scatterplots (Video Lesson 7 II) (YouTube version)

A scatterplot is a graphical representation of the two sets of data you are comparing. The X-axis plots your first or "X" set of data, and the Y-axis plots your second or "Y" set of data. The scatterplot can tell you two important things about the relationship between your two variables. First, it can show you whether you have a weak or strong relationship between your variables. Second, it can tell you whether your variables are negatively or positively related.

A. Scatterplots can show the strength of the relationship between two variables

1. Weak relationships will have a wide scattering of the plots

2. Strong relationships will have a minimal scattering of the plots

B. Scatterplots can show the direction of the relationship between two variables

## 1. Positive Correlation

Both factors vary in the same direction: as one factor increases, the other also increases.

## 2. Negative Correlation

The factors vary in opposite directions: as one factor increases, the other decreases.

## 3. Zero or Neutral Correlation

The two factors show no relationship to one another.

## III. The Pearson Product Moment Correlation (Correlation Coefficient) (Video Lesson 7 III) (YouTube version) (Correlation Calculation - YouTube version) (mp4 version)

The correlation coefficient is a statistic that quantifies the actual relationship between two variables. It has a range between -1.00 and +1.00; you cannot get a correlation of 1.5. A value of -1.00 would be a perfect (very strong) negative correlation, a value of +1.00 would be a perfect (very strong) positive correlation, and a value of 0.00 would be a (very weak) zero or neutral correlation. To calculate the correlation coefficient we use the Pearson Product Moment Correlation (r):

r = [n(ΣXY) - (ΣX)(ΣY)] / √{[n(ΣX²) - (ΣX)²] × [n(ΣY²) - (ΣY)²]}

The formula reads: r equals, in the numerator, n (the number of pairs) multiplied by the sum of XY, minus the sum of X times the sum of Y. In the denominator: take the square root of n times the sum of X² minus the square of the sum of X, multiplied by n times the sum of Y² minus the square of the sum of Y.
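As a check on hand calculations, the raw-score formula above can be implemented in a few lines of Python (a minimal sketch; the function name `pearson_r` is ours, not from the text):

```python
import math

def pearson_r(xs, ys):
    """Pearson r via the raw-score (computational) formula."""
    n = len(xs)  # n = number of pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator
```

Feeding in perfectly linear data (e.g. Y doubled from X) returns exactly +1.00, matching the definition of a perfect positive correlation.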

| Depression (X) | Self-Esteem (Y) |
|----------------|-----------------|
| 10 | 104 |
| 12 | 100 |
| 19 | 98 |
| 4 | 150 |
| 25 | 75 |
| 15 | 105 |
| 21 | 82 |
| 7 | 133 |

To calculate the correlation coefficient (r) for the data above we first need to expand the columns, just as we did when we calculated standard deviation. If you look at the formula above you will see that we need an X² column, a Y² column, and an X times Y column. This first step is shown below:

| X | Y | X² | Y² | XY |
|---|---|----|----|----|
| 10 | 104 | 100 | 10816 | 1040 |
| 12 | 100 | 144 | 10000 | 1200 |
| 19 | 98 | 361 | 9604 | 1862 |
| 4 | 150 | 16 | 22500 | 600 |
| 25 | 75 | 625 | 5625 | 1875 |
| 15 | 105 | 225 | 11025 | 1575 |
| 21 | 82 | 441 | 6724 | 1722 |
| 7 | 133 | 49 | 17689 | 931 |

The next step is to calculate the sums of our columns:

| X | Y | X² | Y² | XY |
|---|---|----|----|----|
| 10 | 104 | 100 | 10816 | 1040 |
| 12 | 100 | 144 | 10000 | 1200 |
| 19 | 98 | 361 | 9604 | 1862 |
| 4 | 150 | 16 | 22500 | 600 |
| 25 | 75 | 625 | 5625 | 1875 |
| 15 | 105 | 225 | 11025 | 1575 |
| 21 | 82 | 441 | 6724 | 1722 |
| 7 | 133 | 49 | 17689 | 931 |
| ΣX = 113 | ΣY = 847 | ΣX² = 1961 | ΣY² = 93983 | ΣXY = 10805 |

n = 8 (number of pairs)
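The column sums above can be double-checked with a short Python snippet (a quick check; the variable names are ours):

```python
X = [10, 12, 19, 4, 25, 15, 21, 7]   # depression scores
Y = [104, 100, 98, 150, 75, 105, 82, 133]  # self-esteem scores

n = len(X)                                  # number of pairs: 8
sum_x = sum(X)                              # ΣX  = 113
sum_y = sum(Y)                              # ΣY  = 847
sum_x2 = sum(x * x for x in X)              # ΣX² = 1961
sum_y2 = sum(y * y for y in Y)              # ΣY² = 93983
sum_xy = sum(x * y for x, y in zip(X, Y))   # ΣXY = 10805
```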

Now we have all the information we need to solve our equation:

r = [(8 × 10805) - (113 × 847)] / √{[(8 × 1961) - (113)²] × [(8 × 93983) - (847)²]}

r = (86440 - 95711) / √[(15688 - 12769) × (751864 - 717409)]

r = -9271 / √(2919 × 34455)

r = -9271 / √100574145

r = -9271 / 10028.666

r = -0.9244

Our correlation coefficient is negative and very close to -1.00, which tells us that we have a strong negative relationship between our two variables. If we look at the scatterplot of our data, we can see that it aligns with our correlation coefficient.
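The arithmetic in the steps above can be verified with a short script that plugs the column sums into the formula (a quick check; variable names are ours):

```python
import math

# Column sums from the expanded table
n, sum_x, sum_y = 8, 113, 847
sum_x2, sum_y2, sum_xy = 1961, 93983, 10805

numerator = n * sum_xy - sum_x * sum_y  # 86440 - 95711 = -9271
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                        (n * sum_y2 - sum_y ** 2))  # √(2919 × 34455)
r = numerator / denominator
print(round(r, 4))  # -0.9244
```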

## IV. Determining Significance (Video Lesson 7 IV) (YouTube version)

Now that we have calculated our correlation coefficient, we need to determine how significant it is. There are two ways to determine the significance of a correlation: the first is to calculate the Coefficient of Determination, and the second is to use the R Table.

## A. The Coefficient of Determination

The coefficient of determination tells us how much of the variance of one factor can be explained by the variability of the factor with which it is correlated. To calculate the coefficient of determination we simply square the r value.

Coefficient of Determination = r²

For our example, r² = (-0.9244)² ≈ 0.8545, meaning about 85% of the variance in self-esteem scores can be explained by depression scores.
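Using the r value calculated above, the coefficient of determination can be computed directly (a quick check):

```python
r = -0.9244          # correlation from the depression/self-esteem example
r_squared = r ** 2   # coefficient of determination
print(round(r_squared, 4))  # 0.8545 — about 85% of the variance is shared
```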

## B. The R Table

The R Table is located in its entirety in Appendix A in the back of the textbook, starting on page 435; a shortened version is also available at the bottom of this lecture. It gives the critical r values based on the degrees of freedom of your sample, the level of significance of the statistical test, and whether your hypothesis is one- or two-tailed. These three factors plus the critical r values are represented in the R Table and will be explained one at a time.

1. Degrees of Freedom.

The term degrees of freedom refers to the number of scores within a data set that are free to vary. In any sample with a fixed mean, the sum of the deviation scores is equal to zero. If your sample has an n equal to 10, the first 9 scores are free to vary, but the 10th score must be a specific value that makes the sum of the deviations equal to zero. Therefore, in a single sample the degrees of freedom are equal to n - 1. The degrees of freedom for a correlation are slightly different, because n equals the number of pairs, not simply the sample size. Therefore, the degrees of freedom for a correlation are n - 2. To calculate the degrees of freedom you simply take the number of pairs and subtract two. For our data set of depression and self-esteem scores, the degrees of freedom are calculated the following way:

df = n - 2

df = 8 - 2

df = 6

The R Table shows the degrees of freedom values in the far left column as shown below:

Each column heading lists the one-tailed level of significance first and the two-tailed level second:

| df | 0.05 / 0.10 | 0.025 / 0.05 | 0.01 / 0.02 | 0.005 / 0.01 |
|----|-------------|--------------|-------------|--------------|
| 1 | 0.988 | 0.997 | 0.9995 | 0.9999 |
| 2 | 0.900 | 0.950 | 0.980 | 0.990 |
| 3 | 0.805 | 0.878 | 0.934 | 0.959 |
| 4 | 0.729 | 0.811 | 0.882 | 0.917 |
| 5 | 0.669 | 0.754 | 0.833 | 0.874 |
| 6 | 0.622 | 0.707 | 0.789 | 0.834 |
| 7 | 0.582 | 0.666 | 0.750 | 0.798 |
| 8 | 0.549 | 0.632 | 0.716 | 0.765 |
| 9 | 0.521 | 0.602 | 0.685 | 0.735 |
| 10 | 0.497 | 0.576 | 0.658 | 0.708 |

The table continues

2. One- or Two-tailed hypotheses.

The number of tails of a hypothesis refers to whether the hypothesis predicts a direction for the effect. This concept will be discussed in greater detail in chapter 11. For now, you should know that if a correlation hypothesis simply predicts an effect without predicting either a negative or positive direction for that effect, it is considered a Two-Tailed hypothesis. If the hypothesis predicts either a negative or a positive direction, then it is a One-Tailed hypothesis. Since our hypothesis as stated predicts a negative correlation, it is a One-Tailed test. The two sets of levels of significance appear in the column headings of the R Table above: the one-tailed levels are listed first and the two-tailed levels second.

3. Levels of Significance.

The levels of significance or "p values" will also be discussed in greater detail in chapters 11, 12, and 13. For now you should simply know that a level of significance of .05 (p = .05) means there is only a 5% probability that a relationship this strong would occur by chance alone. The .05 value is considered standard in science. Smaller levels of significance represent a more stringent test of significance; larger values represent a less stringent one. This value must be given to you in the problem. For our example let's use p = .05, which for a one-tailed test corresponds to the first column of critical values in the R Table above.

4. Critical R Values.

Critical values are threshold values for significance. The absolute value of your calculated r must exceed the critical r value in the R Table for the correlation to be considered significant. The critical r values make up the body of the R Table shown above.

Now let's put it all together. Using the R Table above, we apply the criteria of our example (df = 6, one-tailed test, p = .05) to determine if our calculated r value is significant.

According to the R Table, for a one-tailed test at p = .05 with 6 degrees of freedom, the critical value we must exceed to consider our calculated r value significant is .622.

Since our calculated r = -0.9244 and |-0.9244| = 0.9244 > 0.622, we conclude that our correlation is significant.
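The whole lookup-and-decide procedure can be sketched in Python. The dictionary below copies the one-tailed p = .05 critical values from the R Table excerpt in this lecture; the comparison uses the absolute value of r, as described above (variable names are ours):

```python
# One-tailed critical r values at p = .05, copied from the R Table excerpt
critical_r_one_tailed_05 = {1: 0.988, 2: 0.900, 3: 0.805, 4: 0.729,
                            5: 0.669, 6: 0.622, 7: 0.582, 8: 0.549,
                            9: 0.521, 10: 0.497}

n = 8                  # number of pairs
df = n - 2             # degrees of freedom for a correlation
r = -0.9244            # calculated correlation coefficient

critical = critical_r_one_tailed_05[df]  # 0.622 for df = 6
significant = abs(r) > critical          # compare magnitudes against the threshold
print(df, critical, significant)         # 6 0.622 True
```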
