# Chapter 9 Correlation and partitioning of variation

The coefficient of determination, \(R^2\), compares the variation in the fitted model values to the variation in the response variable. It can be calculated as a ratio of variances:

```
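library(mosaic)      # attaches mosaicData (SwimRecords) and provides rsquared()
Swim <- SwimRecords  # world-record swim times, from mosaicData
mod <- lm(time ~ year + sex, data = Swim)
# R^2: variance of the fitted values relative to variance of the response
var(fitted(mod)) / var(Swim$time)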
```

`## [1] 0.8439936`

The convenience function `rsquared()` does the calculation for you:

`rsquared(mod)`

`## [1] 0.8439936`
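The ratio above works because a least-squares fit with an intercept splits the variance of the response into two parts: the variance of the fitted values and the variance of the residuals. The following is a minimal sketch of that partitioning, assuming the `mosaicData` package is installed; the variable names (`total_var`, `fitted_var`, `residual_var`) are illustrative and not part of the original text.

```r
library(mosaicData)  # provides SwimRecords

mod <- lm(time ~ year + sex, data = SwimRecords)

total_var    <- var(SwimRecords$time)  # variation in the response
fitted_var   <- var(fitted(mod))       # variation captured by the model
residual_var <- var(resid(mod))        # variation left over

# For a least-squares fit with an intercept, the two parts add up to the total
all.equal(total_var, fitted_var + residual_var)

# So R^2 can be computed from either part
r2_from_fitted    <- fitted_var / total_var
r2_from_residuals <- 1 - residual_var / total_var
c(r2_from_fitted, r2_from_residuals)
```

Both entries of the final vector should match the value of \(R^2\) reported above.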

The **regression report** is a standard way of summarizing models. Such a report is produced by most statistical software packages and used in many fields. The first part of the table contains the coefficients (labeled “Estimate”) along with other information that will be introduced starting in Chapter @ref(“chap:confidence”). The \(R^2\) statistic is a standard part of the report; look at the second line from the bottom, where it is labeled “Multiple R-squared”.

`summary(mod)`

```
##
## Call:
## lm(formula = time ~ year + sex, data = Swim)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.7027 -2.7027 -0.5968  1.2796 19.0759
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 555.71678   33.79991  16.441  < 2e-16 ***
## year         -0.25146    0.01732 -14.516  < 2e-16 ***
## sexM         -9.79796    1.01287  -9.673 8.79e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.983 on 59 degrees of freedom
## Multiple R-squared: 0.844, Adjusted R-squared: 0.8387
## F-statistic: 159.6 on 2 and 59 DF, p-value: < 2.2e-16
```
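The numbers in the printed report can also be pulled out of the summary object rather than read off the screen. This is a small sketch using base R accessors on the object returned by `summary()`; the name `report` is chosen here for illustration, and the `mosaicData` package is assumed to be installed.

```r
library(mosaicData)  # provides SwimRecords

mod    <- lm(time ~ year + sex, data = SwimRecords)
report <- summary(mod)

# The "Multiple R-squared" entry of the report
report$r.squared

# The coefficient table as a numeric matrix
coef(report)

# Just the "Estimate" column; these are the same values coef(mod) returns
coef(report)[, "Estimate"]
```

Working from the summary object avoids retyping rounded values from the printed report.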

Occasionally, you may be interested in the correlation coefficient \(r\) between two quantities.

You can, of course, compute the magnitude of \(r\) by fitting a model, finding \(R^2\), and taking a square root. The square root is never negative, so this route does not give the sign of \(r\).

```
mod2 <- lm( time ~ year, data = Swim)
coef(mod2)
```

```
## (Intercept)        year
## 567.2420024  -0.2598771
```

`sqrt(rsquared(mod2))`

`## [1] 0.7723752`

The `cor()` function computes \(r\) directly, sign included:

`cor(Swim$time, Swim$year)`

`## [1] -0.7723752`

Note that the negative sign on \(r\) indicates that record swim `time` decreases as `year` increases. This information about the direction of change is contained in the sign of the coefficient from the model. The magnitude of the coefficient tells how fast the `time` is changing (with units of seconds per year). The correlation coefficient (like \(R^2\)) is without units.
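The link between the model coefficient and \(r\) can be made concrete. The sketch below, which assumes `mosaicData` is installed and uses illustrative variable names, recovers the signed \(r\) in two ways: by attaching the sign of the slope to \(\sqrt{R^2}\), and by rescaling the slope with the ratio of standard deviations so that the units cancel. It also checks that changing the units of `time` leaves \(r\) unchanged.

```r
library(mosaicData)  # provides SwimRecords

mod2  <- lm(time ~ year, data = SwimRecords)
slope <- coef(mod2)[["year"]]  # seconds per year

# The correlation coefficient computed directly
r_direct <- cor(SwimRecords$time, SwimRecords$year)

# sqrt() discards the sign, so take it from the slope
r2       <- var(fitted(mod2)) / var(SwimRecords$time)
r_signed <- sign(slope) * sqrt(r2)

# Multiplying by sd(year) / sd(time) cancels the seconds-per-year units
r_from_slope <- slope * sd(SwimRecords$year) / sd(SwimRecords$time)

c(r_direct, r_signed, r_from_slope)

# Measuring time in minutes changes the slope but not r
r_minutes <- cor(SwimRecords$time / 60, SwimRecords$year)
r_minutes
```

All four values should agree with the output of `cor()` shown earlier. The slope-rescaling identity holds only for a model with a single quantitative explanatory variable.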

Keep in mind that the correlation coefficient \(r\) summarizes only the simple linear model `A ~ B` where `B` is quantitative. But the coefficient of determination, \(R^2\), summarizes any model; it is much more useful. Because \(R^2\) is never negative, it cannot show the direction of change; if you want to see that, look at the sign of the correlation coefficient.
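To illustrate that \(R^2\) applies to any model, the sketch below computes it for several models of the swim records, including one with only a categorical explanatory variable, for which a single correlation coefficient is not defined in the same way. The helper `r_squared()` and the list names are hypothetical, written here with base R functions only; `mosaicData` is assumed to be installed.

```r
library(mosaicData)  # provides SwimRecords

# Hypothetical helper: variance of fitted values over variance of the response
r_squared <- function(model) {
  response <- model.response(model.frame(model))
  var(fitted(model)) / var(response)
}

models <- list(
  sex_only     = lm(time ~ sex,        data = SwimRecords),
  year_only    = lm(time ~ year,       data = SwimRecords),
  year_and_sex = lm(time ~ year + sex, data = SwimRecords),
  interaction  = lm(time ~ year * sex, data = SwimRecords)
)

# One R^2 per model, returned as a named numeric vector
r2_values <- sapply(models, r_squared)
r2_values
```

The entry for `year_and_sex` should match the value from the regression report. Among nested models, adding a term never lowers \(R^2\), so a larger value alone does not show that the added term is meaningful.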