How Good Is the Regression?

The accuracy of the model is described by two statistics: the coefficient of correlation and the coefficient of determination [1, 5, 7, 12, 14, 31, 33, 34, 35, 36, 37, 39].


Coefficient of correlation

The coefficient of correlation \( r \) is a measure of the strength of the linear relationship between two variables \( X \) and \( Y \). It is calculated by formula (11.11):


\( r=\frac{{SS}_{xy}}{\sqrt{{SS}_x{SS}_y}} \) (11.11)


where


\( {SS}_x=\sum\left(x_i-\bar{x}\right)^2=\sum{x_i^2-\frac{\left(\sum x_i\right)^2}{n}} \) (11.12)


\( {SS}_y=\sum\left(y_i-\bar{y}\right)^2=\sum{y_i^2-\frac{\left(\sum y_i\right)^2}{n}} \) (11.13)


\( {SS}_{xy}=\sum\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)=\sum{x_iy_i-\frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}} \) (11.14)
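As a minimal sketch, formulas (11.11) to (11.14) can be computed directly from the raw sums. The function names and the five data points below are made up for illustration and do not come from the text.

```python
from math import sqrt

def sums_of_squares(xs, ys):
    """Return (SS_x, SS_y, SS_xy) using the computational forms (11.12)-(11.14)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy

def correlation(xs, ys):
    """Coefficient of correlation r, formula (11.11)."""
    ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / sqrt(ss_x * ss_y)

# Hypothetical sample data, chosen only to keep the arithmetic easy to check.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

For this sample \( {SS}_x=10 \), \( {SS}_y=6 \) and \( {SS}_{xy}=6 \), so \( r=6/\sqrt{60}\approx 0.77 \).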


Some values of \( r \) and their implications are given in Figure 11.7:


Fig. 11.7 Values of \( r \) and their implications

A high level of correlation does not necessarily mean that a causal relationship between \( X \) and \( Y \) exists. It only indicates that a linear trend may exist.

A low level of correlation does not necessarily mean that \( X \) and \( Y \) are unrelated; it could mean that \( X \) and \( Y \) are not strongly linearly related.
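The second point can be illustrated with a small sketch in which \( Y \) is completely determined by \( X \), yet \( r \) is zero because the relationship is not linear. The data and the helper name `correlation` are assumptions made for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Coefficient of correlation r from the deviation forms of SS_x, SS_y, SS_xy."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_x * ss_y)

# Hypothetical data: y = x^2 on values of x that are symmetric around zero.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r_quadratic = correlation(xs, ys)
print(r_quadratic)  # 0.0
```

Here \( {SS}_{xy}=0 \), so \( r=0 \) even though every \( y_i \) follows exactly from \( x_i \).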

 

Coefficient of determination

The coefficient of determination \( r^2 \) measures the contribution of \( X \) in predicting \( Y \). The formula is given in (11.15):


\( r^2=\frac{{SS}_y-SSE}{{SS}_y}=1-\frac{SSE}{{SS}_y} \) (11.15)


In simple linear regression, it is equal to the square of the coefficient of correlation \( r \).

The value of \( r^2 \) is always between 0 and 1, because \( r \) is between -1 and 1. For example, \( r^2=0.7 \) indicates that 70 percent of the variation in the dependent variable \( Y \) can be explained by the independent variable \( X \).
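The following sketch checks numerically that \( 1-SSE/{SS}_y \) matches the square of \( r \) for a simple linear fit. It assumes that \( SSE \) is the sum of squared residuals of the least-squares line and that the line is fitted with the usual estimates (slope \( {SS}_{xy}/{SS}_x \), intercept \( \bar{y}-\text{slope}\cdot\bar{x} \)). The data and variable names are hypothetical.

```python
from math import sqrt

# Hypothetical sample data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least-squares line (assumed fitting method): y_hat = intercept + slope * x
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x

# SSE: sum of squared differences between observed and fitted values
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / ss_y      # coefficient of determination
r = ss_xy / sqrt(ss_x * ss_y)   # coefficient of correlation

print(round(r_squared, 4), round(r ** 2, 4))  # 0.6 0.6
```

For this sample, about 60 percent of the variation in \( Y \) is explained by the fitted line, and the two ways of computing \( r^2 \) agree.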