Degrees of freedom statistics (DOF)

Wednesday, November 12th, 2014

A few links from various sources to explain Degrees of freedom (DF) in statistics.

Let’s start with a simple explanation of degrees of freedom. I will describe how to calculate degrees of freedom in an F-test (ANOVA) without much statistical terminology. When reporting an ANOVA, between the brackets you write down degrees of freedom 1 (df1) and degrees of freedom 2 (df2), like this: “F(df1, df2) = …”. Df1 and df2 refer to different things, but can be understood the same following way.

Imagine a set of three numbers, pick any number you want. For instance, it could be the set [1, 6, 5]. Calculating the mean for those numbers is easy: (1 + 6 + 5) / 3 = 4.

Now, imagine a set of three numbers, whose mean is 3. There are lots of sets of three numbers with a mean of 3, but for any set the bottom line is this: you can freely pick the first two numbers, any number at all, but the third (last) number is out of your hands as soon as you picked the first two. Say our first two numbers are the same as in the previous set, 1 and 6, giving us a set of two freely picked numbers, and one number that we still need to choose, x: [1, 6, x]. For this set to have a mean of 3, we don’t have anything to choose about x. X has to be 2, because (1 + 6 + 2) / 3 is the only way to get to 3. So, the first two values were free for you to choose, the last value is set accordingly to get to a given mean. This set is said to have two degrees of freedom, corresponding with the number of values that you were free to choose (that is, that were allowed to vary freely).

This generalizes to a set of any given length. If I ask you to generate a set of 4, 10, or 1.000 numbers that average to 3, you can freely choose all numbers but the last one. In those sets the degrees of freedom are respectively, 3, 9, and 999. The general rule then for any set is that if n equals the number of values in the set, the degrees of freedom equals n – 1.

This is the basic method to calculate degrees of freedom, just n – 1. It is as simple as that. The thing that makes it seem more difficult, is the fact that in an ANOVA, you don’t have just one set of numbers, but there is a system (design) to the numbers. In the simplest form you test the mean of one set of numbers against the mean of another set of numbers (one-way ANOVA). In more complicated one-way designs, you test the means of three groups against each other. In a 2 x 2 design things seem even more complicated. Especially if there’s a within-subjects variable involved (Note: all examples on this page are between-subjects, but the reasoning mostly generalizes to within-subjects designs). However things are not as complicated as you might think. It’s all pretty much the same reasoning: how many values are free to vary to get to a given number?

Wikipedia says:
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter is equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (i.e., the sample variance has N-1 degrees of freedom, since it is computed from N random scores minus the only 1 parameter estimated as intermediate step, which is the sample mean).

Mathematically, degrees of freedom is the number of dimensions of the domain of a random vector, or essentially the number of ‘free’ components (how many components need to be known before the vector is fully determined).
Source: Wikipedia



One of the questions an instrutor dreads most from a mathematically unsophisticated audience is, “What exactly is degrees of freedom?” It’s not that there’s no answer. The mathematical answer is a single phrase, “The rank of a quadratic form.” The problem is translating that to an audience whose knowledge of mathematics does not extend beyond high school mathematics. It is one thing to say that degrees of freedom is an index and to describe how to calculate it for certain situations, but none of these pieces of information tells what degrees of freedom means.

As an alternative to “the rank of a quadratic form”, I’ve always enjoyed Jack Good’s 1973 article in the American Statistician “What are Degrees of Freedom?” 27, 227-228, in which he equates degrees of freedom to the difference in dimensionalities of parameter spaces. However, this is a partial answer. It explains what degrees of freedom is for many chi-square tests and the numerator degrees of freedom for F tests, but it doesn’t do as well with t tests or the denominator degrees of freedom for F tests.

At the moment, I’m inclined to define degrees of freedom as a way of keeping score. A data set contains a number of observations, say, n. They constitute n individual pieces of information. These pieces of information can be used to estimate either parameters or variability. In general, each item being estimated costs one degree of freedom. The remaining degrees of freedom are used to estimate variability. All we have to do is count properly.

A single sample: There are n observations. There’s one parameter (the mean) that needs to be estimated. That leaves n-1 degrees of freedom for estimating variability.

Two samples: There are n1+n2 observations. There are two means to be estimated. That leaves n1+n2-2 degrees of freedom for estimating variability.

One-way ANOVA with g groups: There are n1+..+ng observations. There are g means to be estimated. That leaves n1+..+ng-g degrees of freedom for estimating variability. This accounts for the denominator degrees of freedom for the F statistic.






Leave a Reply

Your email address will not be published.