Chapter 9 Correlation and partitioning of variation

The coefficient of determination, \(R^2\), compares the variation in the fitted model values to the variation in the response variable. It can be calculated as a ratio of variances, with the variance of the fitted values in the numerator:

Swim <- SwimRecords # from mosaicData
mod <- lm( time ~ year + sex, data = Swim)
var(fitted(mod)) / var(Swim$time)
## [1] 0.8439936
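
The ratio above works because, for a least-squares model that includes an intercept, the variance of the response splits exactly into the variance of the fitted values plus the variance of the residuals. The following is a minimal sketch of that partitioning in base R. It assumes the built-in mtcars data frame as a stand-in (so that it runs without mosaicData), and the names demo_mod, var_response, var_fitted, var_resid and r2_ratio are hypothetical.

```r
# Stand-in data: the built-in mtcars data frame (an assumption made so the
# sketch runs without mosaicData); the same steps apply to Swim.
demo_mod <- lm(mpg ~ wt + hp, data = mtcars)

var_response <- var(mtcars$mpg)         # total variation in the response
var_fitted   <- var(fitted(demo_mod))   # variation captured by the model
var_resid    <- var(resid(demo_mod))    # variation left in the residuals

# For least squares with an intercept, the two pieces add up to the total.
all.equal(var_response, var_fitted + var_resid)

# R^2 is the share of the total carried by the fitted values, which is
# the same as one minus the share left in the residuals.
r2_ratio <- var_fitted / var_response
all.equal(r2_ratio, 1 - var_resid / var_response)
```

Both calls to all.equal() should report TRUE, since the two descriptions of \(R^2\) are algebraically the same.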

The convenience function rsquared(), from the mosaic package, does the calculation for you:

rsquared(mod)
## [1] 0.8439936

The regression report is a standard way of summarizing models. Such a report is produced by most statistical software packages and used in many fields. The first part of the table contains the coefficients (labeled “Estimate”) along with other information that will be introduced starting in the chapter on confidence intervals. The \(R^2\) statistic is a standard part of the report; look for “Multiple R-squared” on the second line from the bottom.

summary(mod)
## 
## Call:
## lm(formula = time ~ year + sex, data = Swim)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7027 -2.7027 -0.5968  1.2796 19.0759 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 555.71678   33.79991  16.441  < 2e-16 ***
## year         -0.25146    0.01732 -14.516  < 2e-16 ***
## sexM         -9.79796    1.01287  -9.673 8.79e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.983 on 59 degrees of freedom
## Multiple R-squared:  0.844,  Adjusted R-squared:  0.8387 
## F-statistic: 159.6 on 2 and 59 DF,  p-value: < 2.2e-16
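
The numbers in the report can also be pulled out of the summary object instead of being read off the printout. The following is a minimal sketch in base R, again assuming the built-in mtcars data as a stand-in; demo_mod, report and the other variable names are hypothetical. It also checks the "Adjusted R-squared" value by hand, using the standard rescaling of the unexplained share by \((n - 1)\) over the residual degrees of freedom.

```r
# Stand-in model on built-in data (an assumption; not the Swim model).
demo_mod <- lm(mpg ~ wt + hp, data = mtcars)
report   <- summary(demo_mod)

# "Multiple R-squared" and "Adjusted R-squared" from the report
r2     <- report$r.squared
adj_r2 <- report$adj.r.squared

# The "Estimate" column of the coefficient table
estimates <- coef(report)[, "Estimate"]

# Adjusted R^2 by hand: rescale the unexplained share (1 - R^2)
# by (n - 1) / (residual degrees of freedom).
n           <- nrow(mtcars)
resid_df    <- df.residual(demo_mod)
adj_by_hand <- 1 - (1 - r2) * (n - 1) / resid_df
all.equal(adj_r2, adj_by_hand)
```

Applied to the report above, the same arithmetic gives 1 - (1 - 0.8439936) × 61/59 ≈ 0.8387, which agrees with the printed adjusted value.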

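Occasionally, you may be interested in the correlation coefficient \(r\) between two quantities.
You can, of course, compute \(r\) by fitting a model, finding \(R^2\), and taking a square root. The square root gives only the magnitude of \(r\); its sign is the sign of the model's slope coefficient.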

mod2 <- lm( time ~ year, data = Swim)
coef(mod2)
## (Intercept)        year 
## 567.2420024  -0.2598771
sqrt(rsquared(mod2))
## [1] 0.7723752

The cor() function computes \(r\) directly, sign included:

cor(Swim$time, Swim$year)
## [1] -0.7723752
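
The two results differ only in sign, because a square root cannot recover the direction of the relationship. One way to reconcile them is to attach the sign of the slope to the square root of \(R^2\). The following is a minimal sketch in base R, assuming the built-in mtcars data as a stand-in; demo_mod2, r_from_model and r_direct are hypothetical names.

```r
# Stand-in simple model on built-in data (an assumption; not the Swim model).
demo_mod2 <- lm(mpg ~ wt, data = mtcars)

r2    <- summary(demo_mod2)$r.squared
slope <- coef(demo_mod2)[["wt"]]

# Attach the sign of the slope to the magnitude sqrt(R^2).
r_from_model <- sign(slope) * sqrt(r2)

# Direct calculation for comparison.
r_direct <- cor(mtcars$mpg, mtcars$wt)
all.equal(r_from_model, r_direct)
```

This only applies to a model with a single quantitative explanatory variable, where there is exactly one slope to take the sign from.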

Note that the negative sign on \(r\) indicates that record swim time decreases as year increases. This information about the direction of change is contained in the sign of the coefficient from the model. The magnitude of the coefficient tells how fast the time is changing (with units of seconds per year). The correlation coefficient (like \(R^2\)) is without units.
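
The difference between a coefficient with units and a unitless correlation can be checked by changing the units of the explanatory variable. The following is a minimal sketch in base R, assuming the built-in mtcars data as a stand-in, where wt is recorded in thousands of pounds; the data frame demo and the column wt_lb are hypothetical names.

```r
# Stand-in data (an assumption): mtcars$wt is weight in 1000s of pounds.
# Add the same weight expressed in pounds.
demo <- transform(mtcars, wt_lb = wt * 1000)

# The slope depends on the units of the explanatory variable ...
slope_per_1000lb <- coef(lm(mpg ~ wt,    data = demo))[["wt"]]
slope_per_lb     <- coef(lm(mpg ~ wt_lb, data = demo))[["wt_lb"]]
all.equal(slope_per_1000lb, slope_per_lb * 1000)

# ... but the correlation coefficient does not.
all.equal(cor(demo$mpg, demo$wt), cor(demo$mpg, demo$wt_lb))
```

The slope shrinks by a factor of 1000 when the unit shrinks by the same factor, while \(r\) is unchanged.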

Keep in mind that the correlation coefficient \(r\) summarizes only the simple linear model A ~ B where B is quantitative. The coefficient of determination, \(R^2\), summarizes any model, which makes it much more widely useful. Because \(R^2\) carries no sign, look at the sign of the correlation coefficient, or of the corresponding model coefficient, if you want to see the direction of change.
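
To illustrate the wider reach of \(R^2\), the following minimal sketch in base R fits a model with a categorical explanatory variable, a case where a single correlation coefficient between response and explanatory variable is not defined. It assumes the built-in mtcars data as a stand-in; mtcars_demo, cyl_group and mod_cat are hypothetical names.

```r
# Stand-in data (an assumption): treat the number of cylinders as categorical.
mtcars_demo <- transform(mtcars, cyl_group = factor(cyl))

# A model with one categorical and one quantitative explanatory variable.
mod_cat <- lm(mpg ~ cyl_group + wt, data = mtcars_demo)

# R^2 is still the ratio of variances ...
r2_cat <- summary(mod_cat)$r.squared
all.equal(r2_cat, var(fitted(mod_cat)) / var(mtcars_demo$mpg))

# ... and it equals the squared correlation between the response
# and the fitted model values.
all.equal(r2_cat, cor(mtcars_demo$mpg, fitted(mod_cat))^2)
```

The last line shows one way \(r\) and \(R^2\) stay connected for any least-squares model with an intercept: \(R^2\) is the square of the correlation between the response and the fitted values.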