SimplifyStats provides a set of functions that simplify 1) generating descriptive statistics for the numeric variables of multiple groups and 2) performing hypothesis tests between all pairs of groups.
The function group_summarize generates descriptive statistics for each variable in var_cols within every unique combination of the grouping variables in group_cols.
library(SimplifyStats)
# Generate data.
df <- iris
# Modify df to demonstrate additional functionality.
## Add an NA.
df$Sepal.Length[1] <- NA
## Add another grouping variable.
df$Condition <- rep(c("untreated","treated"), 75)
# Generate descriptive statistics.
group_summarize(
df,
group_cols = c("Species","Condition"),
var_cols = c("Sepal.Length","Sepal.Width"),
na.rm = TRUE
)
#> # A tibble: 12 x 17
#> Variable Species Condition N Mean StdDev StdErr Min Quartile1 Median
#> <chr> <fct> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Sepal.L~ setosa untreated 24 5.02 0.399 0.0814 4.4 4.77 5
#> 2 Sepal.L~ setosa treated 25 4.99 0.317 0.0633 4.3 4.8 5
#> 3 Sepal.L~ versic~ untreated 25 5.99 0.556 0.111 5 5.6 5.9
#> 4 Sepal.L~ versic~ treated 25 5.88 0.478 0.0956 4.9 5.6 5.9
#> 5 Sepal.L~ virgin~ untreated 25 6.50 0.603 0.121 4.9 6.2 6.5
#> 6 Sepal.L~ virgin~ treated 25 6.67 0.669 0.134 5.6 6.3 6.5
#> 7 Sepal.W~ setosa untreated 25 3.48 0.325 0.0651 2.9 3.2 3.5
#> 8 Sepal.W~ setosa treated 25 3.38 0.426 0.0853 2.3 3.1 3.4
#> 9 Sepal.W~ versic~ untreated 25 2.78 0.336 0.0672 2 2.6 2.9
#> 10 Sepal.W~ versic~ treated 25 2.76 0.297 0.0594 2.3 2.5 2.8
#> 11 Sepal.W~ virgin~ untreated 25 2.94 0.287 0.0574 2.5 2.8 3
#> 12 Sepal.W~ virgin~ treated 25 3.01 0.356 0.0713 2.2 2.8 3
#> # ... with 7 more variables: Quartile3 <dbl>, Max <dbl>, PropNA <dbl>,
#> # Kurtosis <dbl>, Skewness <dbl>, `Jarque-Bera_p.value` <dbl>,
#> # `Shapiro-Wilk_p.value` <dbl>
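To make some of the summary columns concrete, the following base-R sketch computes a handful of them (N, Mean, StdDev, StdErr, Median, PropNA and a Shapiro-Wilk p-value) for Sepal.Length. It is an illustration, not SimplifyStats' implementation: the helper name summarize_one is hypothetical, PropNA is assumed to be the share of missing values in each group before they are dropped, and the remaining columns (quartiles, kurtosis, skewness, Jarque-Bera) are omitted.

```r
# Base-R sketch of a subset of the statistics reported by group_summarize().
# NOT the package's implementation; summarize_one() is a hypothetical helper.
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- rep(c("untreated", "treated"), 75)

# Split the variable by every unique Species x Condition combination
# (split() builds the interaction; names look like "setosa.untreated").
groups <- split(df$Sepal.Length, df[c("Species", "Condition")], drop = TRUE)

summarize_one <- function(x) {
  prop_na <- mean(is.na(x))   # assumed meaning of PropNA
  x <- x[!is.na(x)]           # mirrors na.rm = TRUE
  n <- length(x)
  c(N = n,
    Mean = mean(x),
    StdDev = sd(x),
    StdErr = sd(x) / sqrt(n),
    Median = median(x),
    PropNA = prop_na,
    Shapiro_p = shapiro.test(x)$p.value)
}

# One row per group, one column per statistic.
summary_mat <- do.call(rbind, lapply(groups, summarize_one))
summary_mat
```

Because row 1 is set to NA, the setosa/untreated group has N = 24 and a mean of about 5.02, in line with the group_summarize() output above.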
Similarly, the function pairwise_stats performs a statistical test (here, t.test) on each variable in var_cols between every pair of groups defined by unique combinations of the grouping variables.
# Perform pairwise t-tests.
pairwise_stats(
df,
group_cols = c("Species","Condition"),
var_cols = c("Sepal.Length", "Sepal.Width"),
t.test
)
#> # A tibble: 30 x 15
#> Variable A.Species A.Condition B.Species B.Condition estimate estimate1
#> <chr> <fct> <chr> <fct> <chr> <dbl> <dbl>
#> 1 Sepal.L~ setosa untreated setosa treated 0.0328 5.02
#> 2 Sepal.L~ setosa untreated versicol~ untreated -0.971 5.02
#> 3 Sepal.L~ setosa untreated versicol~ treated -0.859 5.02
#> 4 Sepal.L~ setosa untreated virginica untreated -1.48 5.02
#> 5 Sepal.L~ setosa untreated virginica treated -1.65 5.02
#> 6 Sepal.L~ setosa treated versicol~ untreated -1.00 4.99
#> 7 Sepal.L~ setosa treated versicol~ treated -0.892 4.99
#> 8 Sepal.L~ setosa treated virginica untreated -1.52 4.99
#> 9 Sepal.L~ setosa treated virginica treated -1.68 4.99
#> 10 Sepal.L~ versicol~ untreated versicol~ treated 0.112 5.99
#> # ... with 20 more rows, and 8 more variables: estimate2 <dbl>,
#> # statistic <dbl>, p.value <dbl>, parameter <dbl>, conf.low <dbl>,
#> # conf.high <dbl>, method <chr>, alternative <chr>
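The 30 rows above correspond to 2 variables times choose(6, 2) = 15 pairs of the six Species/Condition groups. The base-R sketch below reproduces the core of that idea with combn() and t.test(); it is not the package's code, and the Group column and the results layout are hypothetical. t.test() defaults to the Welch two-sample test and drops NAs, and no multiple-comparison adjustment is applied here.

```r
# Base-R sketch of pairwise testing between every pair of groups.
# NOT the package's implementation; column names are illustrative.
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- rep(c("untreated", "treated"), 75)

# One factor level per unique Species x Condition combination (6 levels).
df$Group <- interaction(df$Species, df$Condition, drop = TRUE, sep = " / ")
pairs <- combn(levels(df$Group), 2)  # 2 x 15 matrix of group pairs

results <- do.call(rbind, lapply(c("Sepal.Length", "Sepal.Width"), function(v) {
  do.call(rbind, lapply(seq_len(ncol(pairs)), function(i) {
    a <- df[[v]][df$Group == pairs[1, i]]
    b <- df[[v]][df$Group == pairs[2, i]]
    tt <- t.test(a, b)  # Welch test by default; NAs are removed
    data.frame(Variable = v, A = pairs[1, i], B = pairs[2, i],
               estimate = unname(tt$estimate[1] - tt$estimate[2]),
               statistic = unname(tt$statistic),
               p.value = tt$p.value)
  }))
}))

nrow(results)  # 2 variables x 15 pairs
```

Here estimate is the difference of the two group means. The pair order follows the factor levels, so A and B (and therefore the sign of estimate) may be swapped relative to pairwise_stats().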