Goodness of Fit

The value of $\chi ^2$ at the minimum gives a measure of how well the straight line actually fits the data. To see this, consider the meaning of each term in Eq. (2). The numerator is the square of the difference between the measured value, $y_i$, and the ideal straight-line value, $y_{i,model}$. If the discrepancy between these two values is due only to statistical scatter according to the assumed Gaussian distribution, then we would expect that most of the time the numerator would be roughly the same size as the denominator. In other words, we would expect the average term in the expression for $\chi ^2$ to be about $1$, and the total to be roughly $N$. Actually, we would expect to do just a bit better than an average of $1$, since the act of optimizing the values of $m$ and $b$ gives us an advantage. Clearly, if there were only two points, we could always arrange for the straight line to go through them exactly, giving $\chi^2_0 = 0$. In this case the advantage is everything. But the more points there are, the harder it is for a single straight line to pass close to all of them unless the data really do follow a straight line, and the closer we expect the value of $\chi^2_0$ to come to $N$. The key concept here is the number of “degrees of freedom” (df), defined as the number of independent data points minus the number of fitting parameters:

\begin{displaymath}
df = N_{data} - n_{param}.
\end{displaymath} (32)

In our case it is $N-2$. So suppose we evaluate $\chi^2_0$ and get an answer that is not $N-2$. What does it mean? If the answer is much bigger than $N-2$, we might be tempted to say that the measured points differ from a straight line by much more than would be expected from the known errors $\sigma_i$, and therefore that the data do not justify the assumption that the ideal values lie on a straight line. If the value of $\chi^2_0$ is much less than $N-2$, we might be tempted to say we overestimated our errors $\sigma_i$. But even though the average value ought to come out around $N-2$, we must remember that we are dealing with statistics here, so statistical fluctuations could well be the cause of the discrepancy between the actual $\chi^2_0$ and $N-2$.
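
As a concrete illustration (a minimal sketch in Python, using simulated data rather than anything from the text), the following fits a straight line by weighted least squares and compares the minimized $\chi^2$ with $N-2$:

\begin{verbatim}
# A minimal sketch, not taken from the text: simulate N measurements scattered
# about a straight line with known Gaussian errors sigma_i, fit y = m*x + b by
# weighted least squares, and compare the minimized chi^2 with df = N - 2.
import numpy as np

rng = np.random.default_rng(1)
N = 10
x = np.linspace(1.0, 10.0, N)
sigma = np.full(N, 0.5)                       # assumed known errors sigma_i
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)    # "measured" values with Gaussian scatter

# Weighted least squares: minimize sum_i (y_i - m*x_i - b)^2 / sigma_i^2
w = 1.0 / sigma**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
Delta = S * Sxx - Sx**2
m = (S * Sxy - Sx * Sy) / Delta
b = (Sxx * Sy - Sx * Sxy) / Delta

chi2_0 = np.sum(((y - (m * x + b)) / sigma) ** 2)   # chi^2 at the minimum
df = N - 2                                          # N data points, 2 fit parameters

print(f"m = {m:.3f}, b = {b:.3f}, chi2_0 = {chi2_0:.2f}, df = {df}")
\end{verbatim}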

The concepts introduced in the previous paragraphs are formalized in the analysis of goodness of fit. The question we are asking can be phrased in probabilistic terms: Given a set of variables that are distributed according to the multivariate Gaussian distribution of Eq. (20), what is the probability $P(\chi^{2})d\chi^{2}$ that $\chi^{2}$, computed according to Eq. (2), has a value in the range $(\chi^{2},\chi^{2}+d\chi^{2})$? The answer is just the integral:

\begin{displaymath}
P_{N}(\chi^{2}) = \int dy_{1}\,dy_{2}\,\ldots dy_{N}\,
P(y_1,y_2,\ldots,y_N)\,
\delta\left[\chi^{2} - \frac{(y_1-\bar y_{1,model})^2}{\sigma_1^2}
- \frac{(y_2-\bar y_{2,model})^2}{\sigma_2^2} - \ldots
- \frac{(y_N - \bar y_{N,model})^2}{\sigma_N^2}\right].
\end{displaymath} (33)

A change of variables to

\begin{displaymath}
q_{i} = (y_i-\bar y_{i,model})/\sigma_i
\end{displaymath}

gives
\begin{displaymath}
P_{N}(\chi^{2}) = (2\pi)^{-N/2}\int dq_{1}\,dq_{2}\,\ldots dq_{N}\,
\exp[-q_1^2/2 - q_2^2/2 - \ldots - q_N^2/2]\,
\delta[\chi^{2} - q_1^2 - q_2^2 - \ldots - q_N^2].
\end{displaymath} (34)

We can think of the integration variables as defining the components of a vector ${\bf q} = (q_{1},q_{2},\ldots{},q_{N})$. The delta function requires that the squared length of this vector be exactly $\chi^{2}$. In view of the delta function, the exponential can be replaced by $\exp(-\chi^{2}/2)$, and the remaining integration gives the surface area of a sphere of radius $\sqrt{\chi^{2}}$ in $N$-dimensional space, together with a Jacobian factor from the delta function. The final result is
\begin{displaymath}
P_{N}(\chi^{2}) =
\frac{1}{2^{N/2}\Gamma(N/2)}e^{-\chi^{2}/2}(\chi^{2})^{(N/2)-1}
\end{displaymath} (35)
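
In more detail, the surface area of a sphere of radius $r=\sqrt{\chi^{2}}$ in $N$ dimensions is $2\pi^{N/2}r^{N-1}/\Gamma(N/2)$, and the delta function in $\chi^{2}=r^{2}$ contributes a Jacobian factor $1/(2r)$, so that

\begin{displaymath}
P_{N}(\chi^{2}) = (2\pi)^{-N/2}\,e^{-\chi^{2}/2}\,
\frac{2\pi^{N/2}(\chi^{2})^{(N-1)/2}}{\Gamma(N/2)}\cdot
\frac{1}{2\sqrt{\chi^{2}}},
\end{displaymath}

which reduces to the expression above.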

This is called the $\chi^{2}$ distribution for $N$ degrees of freedom (df). When we are fitting $N$ points to a line with two adjustable parameters, we must substitute $N-2$ degrees of freedom in place of $N$ in this formula to correct for the bias we discussed above.
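
A quick way to see the effect of the $N-2$ correction is a Monte Carlo check. The sketch below (again with simulated data, assuming NumPy and SciPy are available) repeatedly generates straight-line data, minimizes $\chi^2$, and compares the resulting distribution with the $\chi^2$ distribution for $N-2$ df:

\begin{verbatim}
# A rough numerical check (synthetic data, not from the text): the minimized
# chi^2 of repeated straight-line fits should follow Eq. (35) with N - 2
# degrees of freedom rather than N.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 10
x = np.linspace(1.0, 10.0, N)
sigma = 0.5                            # equal errors, so weighted LS = ordinary LS
A = np.vstack([x, np.ones(N)]).T       # design matrix for y = m*x + b

def chi2_min(y):
    """Return the minimized chi^2 of a straight-line fit to y."""
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum(((y - A @ coeffs) / sigma) ** 2)

samples = np.array([chi2_min(2.0 * x + 1.0 + rng.normal(0.0, sigma, N))
                    for _ in range(20000)])

print("mean chi2_0:", samples.mean(), "  expected about", N - 2)
# Compare the empirical distribution with the chi^2 CDF for N - 2 df.
for q in (5.0, 8.0, 12.0):
    print(f"P(chi2_0 <= {q}): empirical {np.mean(samples <= q):.3f}, "
          f"chi2({N - 2}) predicts {stats.chi2.cdf(q, N - 2):.3f}")
\end{verbatim}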

We now return to the question we asked at the beginning of this section. Suppose we minimized $\chi^{2}$ and found a value $\chi_{0}^{2}$. Is it a good fit? Stated in more precise statistical language, we ask for the probability that we could have obtained a value as large as or larger than $\chi_{0}^{2}$ purely by chance, based on the probability distribution of Eq. (35). If this probability is too small, i.e., such a large value is unlikely, we might suspect that the fit is bad. This probability is given by the integral:

\begin{displaymath}
CL = P_{N}(\chi^{2} > \chi_{0}^{2}) =
\int_{\chi_{0}^{2}}^{\infty}P_{N}(\chi^{2})d\chi^{2}.
\end{displaymath} (36)

The integral gives the chance of exceeding $\chi^{2}_{0}$ with $N$ degrees of freedom. This probability is sometimes called the “confidence level” or “$p$-value”. It is plotted in the graph in Fig. 1. The graph is based on two numbers, namely $\chi^2_0$ and the number of degrees of freedom. (The graph uses $n_D$ in place of $N$). To read the graph, select the curve that corresponds to $n_D$. Then locate the value of $\chi^2_0$ on the top or bottom, and find where the curve crosses the vertical line corresponding to $\chi^2_0$. Read the confidence level from the axis on the left. The confidence level is the probability that the observed value of $\chi^2_0$ could be equaled or exceeded by merely random fluctuations. If this probability is low, then we could reject the straight line theory with confidence. For example, suppose we had $10$ points ($n_D = 8$) and got $\chi^2_0 = 20$, much bigger than we would have expected. The graph gives a confidence level of 0.01 in this case. That means that such a large value of $\chi ^2$ would be expected to occur as a result of random fluctuations only 1% of the time. On the other hand if we had $\chi^2_0 = 10$, the confidence level would be $0.25$, so such a large fluctuation would be expected about $1/4$ of the time.
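
If the graph is not at hand, the confidence level of Eq. (36) can also be evaluated numerically. The sketch below (assuming SciPy is available) uses the survival function scipy.stats.chi2.sf, which is exactly the upper-tail integral of Eq. (36), to reproduce the two examples above:

\begin{verbatim}
# Evaluate the confidence level CL = P(chi^2 > chi2_0) of Eq. (36) directly.
from scipy.stats import chi2

n_D = 8                                 # 10 data points, 2 fitted parameters

for chi2_0 in (20.0, 10.0):
    CL = chi2.sf(chi2_0, n_D)           # survival function = upper-tail integral
    print(f"chi2_0 = {chi2_0:4.1f}  ->  CL = {CL:.3f}")

# Gives roughly 0.010 for chi2_0 = 20 and 0.265 for chi2_0 = 10, in line with
# the values read off the graph.
\end{verbatim}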

The confidence level graph is based on the assumption that the probability distribution is Gaussian as stated. If the probability distribution is different or there are correlations among the measurements, we can't use this graph.

Figure 3: Confidence level vs $\chi ^2$ for various $n_D$.