The mosaic package makes several summary statistic functions (such as mean and sd) formula-aware.

mean_(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

mean(x, ...)

median(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

range(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

sd(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

max(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

min(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

sum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

IQR(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

fivenum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

iqr(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

prod(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

favstats(x, ..., data = NULL, groups = NULL, na.rm = TRUE)

quantile(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE))

var(x, y = NULL, na.rm = getOption("na.rm", FALSE), ..., data = NULL)

cor(x, y = NULL, ..., data = NULL)

cov(x, y = NULL, ..., data = NULL)

Arguments

x       a numeric vector or a formula

...     additional arguments

data    a data frame in which to evaluate formulas (or bare names). Note that the default is data = parent.frame(). This makes it convenient to use this function interactively by treating the working environment as if it were a data frame. But this may not be appropriate for programming uses. When programming, it is best to use an explicit data argument -- ideally supplying a data frame that contains the variables mentioned.

groups  a grouping variable, typically a name of a variable in data

na.rm   a logical indicating whether NAs should be removed before computing

y       a numeric vector or a formula
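The usage lines give getOption("na.rm", FALSE) as the default for na.rm in most of these functions. The sketch below uses only base R to show how that expression resolves when the option is unset and when it is set. The vector x is made up for illustration, and nothing here depends on mosaic beyond the default shown in the signatures.

```r
# Base R only: how a default of getOption("na.rm", FALSE) is resolved.
old <- options(na.rm = NULL)                  # make sure the option starts unset
unset_default <- getOption("na.rm", FALSE)    # falls back to FALSE

options(na.rm = TRUE)                         # opt in to NA removal for the session
set_default <- getOption("na.rm", FALSE)      # now picks up TRUE

x <- c(1, 2, NA)                              # hypothetical data with one missing value
strict  <- base::mean(x, na.rm = unset_default)   # NA, because the NA is kept
lenient <- base::mean(x, na.rm = set_default)     # mean of 1 and 2

options(old)                                  # restore the previous setting
```

Setting the option once changes the default for every later call that does not pass na.rm explicitly, so in scripts it is usually safer to pass na.rm directly.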

Details

Many of these functions mask core R functions to provide an additional formula interface. Existing behavior should be unchanged. If the first argument is a formula, that formula, together with data, is used to generate the numeric vector(s) to be summarized. Formulas of the shape x ~ a or ~ x | a produce summaries of x for each subset defined by a. Two-way aggregation can be achieved using formulas of the form x ~ a + b or x ~ a | b. See the examples.
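To make the formula shapes concrete, here is a minimal sketch in base R of the grouped summaries they describe. The data frame d and its columns x, a and b are hypothetical and simply mirror the letters used in the formulas above. tapply() stands in for the formula interface, so this shows the computation, not how mosaic implements it.

```r
# Hypothetical data; the names x, a and b mirror the formulas in the text.
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  a = c("p", "p", "q", "q", "p", "p", "q", "q"),
  b = c("u", "u", "u", "u", "v", "v", "v", "v")
)

# One-way: the summary that x ~ a (or ~ x | a) describes,
# one mean of x per level of a.
one_way <- tapply(d$x, d$a, base::mean)

# Two-way: one mean of x per combination of a and b, as in x ~ a + b.
two_way <- tapply(d$x, interaction(d$a, d$b), base::mean)

# With mosaic attached, these formula calls would be expected to
# return the same numbers (an assumption, not verified here):
#   mean(x ~ a, data = d)
#   mean(x ~ a + b, data = d)
```

The two-way result is labeled by pasting the level names together with a dot, the same pattern as female.alcohol and male.alcohol in the examples.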

Note

Earlier versions of these functions supported a "bare name + data frame" interface. This functionality has been removed since it was (a) ambiguous in some cases, (b) unnecessary, and (c) difficult to maintain.

Examples

mean(HELPrct$age)
#> [1] 35.65342
mean( ~ age, data = HELPrct)
#> [1] 35.65342
mean( ~ drugrisk, na.rm = TRUE, data = HELPrct)
#> [1] 1.887168
mean(age ~ shuffle(sex), data = HELPrct)
#>   female     male
#> 35.05607 35.83815
mean(age ~ shuffle(sex), data = HELPrct, .format = "table")
#>   shuffle(sex)     mean
#> 1       female 37.28037
#> 2         male 35.15029
# wrap in data.frame() to auto-convert awkward variable names
data.frame(mean(age ~ shuffle(sex), data = HELPrct, .format = "table"))
#>   shuffle.sex.     mean
#> 1       female 35.62617
#> 2         male 35.66185
mean(age ~ sex + substance, data = HELPrct)
#> female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
#>       39.16667       37.95035       34.85366       34.36036       34.66667
#>    male.heroin
#>       33.05319
mean( ~ age | sex + substance, data = HELPrct)
#> female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
#>       39.16667       37.95035       34.85366       34.36036       34.66667
#>    male.heroin
#>       33.05319
mean( ~ sqrt(age), data = HELPrct)
#> [1] 5.936703
sum( ~ age, data = HELPrct)
#> [1] 16151
sd(HELPrct$age)
#> [1] 7.710266
sd( ~ age, data = HELPrct)
#> [1] 7.710266
sd(age ~ sex + substance, data = HELPrct)
#> female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
#>       7.980333       7.575644       6.195002       6.889772       8.035839
#>    male.heroin
#>       7.973568
var(HELPrct$age)
#> [1] 59.4482
var( ~ age, data = HELPrct)
#> [1] 59.4482
var(age ~ sex + substance, data = HELPrct)
#> female.alcohol   male.alcohol female.cocaine   male.cocaine  female.heroin
#>       63.68571       57.39037       38.37805       47.46896       64.57471
#>    male.heroin
#>       63.57779
IQR(width ~ sex, data = KidsFeet)
#>    B    G
#> 0.75 0.60
iqr(width ~ sex, data = KidsFeet)
#>    B    G
#> 0.75 0.60
favstats(width ~ sex, data = KidsFeet)
#>   sex min    Q1 median    Q3 max     mean        sd  n missing
#> 1   B 8.4 8.875   9.15 9.625 9.8 9.190000 0.4517801 20       0
#> 2   G 7.9 8.550   8.80 9.150 9.5 8.784211 0.4935846 19       0
cor(length ~ width, data = KidsFeet)
#> [1] 0.6410961
cov(length ~ width, data = KidsFeet)
#> [1] 0.4304453
tally(is.na(mcs) ~ is.na(pcs), data = HELPmiss)
#>           is.na(pcs)
#> is.na(mcs) TRUE FALSE
#>      TRUE     2     0
#>      FALSE    0   468
cov(mcs ~ pcs, data = HELPmiss) # NA because of missing data
#> [1] NA
cov(mcs ~ pcs, data = HELPmiss, use = "complete") # ignore missing data
#> [1] 13.46433
# alternative approach using filter explicitly
cov(mcs ~ pcs, data = HELPmiss %>% filter(!is.na(mcs) & !is.na(pcs)))
#> [1] 13.46433
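
The last three examples handle missing values in two ways: passing use = "complete", or filtering out incomplete rows first. The sketch below reproduces that comparison with stats::cov() on a small made-up data frame, so it runs without mosaic or HELPmiss. The data frame scores and its columns m and p are hypothetical. It spells the option out in full as "complete.obs" and assumes the formula version forwards use to the underlying function.

```r
# Hypothetical scores with a missing value in each column.
scores <- data.frame(
  m = c(1, 2, NA, 4, 5),
  p = c(2, 4, 6, NA, 10)
)

# Default handling: any NA in the inputs makes the result NA.
cov_default <- stats::cov(scores$m, scores$p)

# Option 1: let cov() drop incomplete pairs itself.
cov_complete <- stats::cov(scores$m, scores$p, use = "complete.obs")

# Option 2: keep only rows where both values are present, then call cov().
keep <- !is.na(scores$m) & !is.na(scores$p)
cov_filtered <- stats::cov(scores$m[keep], scores$p[keep])
```

Both options use the same three complete pairs, so they give the same covariance. Filtering first makes the dropped rows explicit, which helps when the same subset feeds several summaries.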