Frequency of Value Combinations in Data Frame Columns

Usage

fieldSummary(x, groupBy = names(x)[-1L], lengthColumn = "", na = "Unknown")

Arguments

x

data frame

groupBy

Character vector naming the columns (fields) in x to include in the evaluation. Default: the names of all columns in x except the first one, on the assumption that the first column may be an ID column.

lengthColumn

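Optional. Name of a column in x whose values are summed up for each combination of values in the groupBy columns. Default: "" (no column is summed up).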

na

Optional. Value to be treated as NA. Default: "Unknown".

Examples

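# Number of example records (pipes)
n <- 1000L

# Helper: draw n values from x with replacement
sample_replace <- function(x, ...) sample(x, size = n, replace = TRUE, ...)

# Random example data: one row per pipe, with the ID in the first column
x <- data.frame(
  pipe_id = 1:n,
  material = sample_replace(c("clay", "concrete", "other")),
  age_cat = sample_replace(c("young", "old")),
  length = as.integer(rnorm(n, 50)),
  stringsAsFactors = FALSE
)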

fieldSummary(x)
#>    material age_cat length Count Percentage
#> 1      clay   young     46     1        0.1
#> 2      clay     old     47     3        0.3
#> 3  concrete     old     47     3        0.3
#> 4     other     old     47     3        0.3
#> 5      clay   young     47     7        0.7
#> 6  concrete   young     47     5        0.5
#> 7     other   young     47     1        0.1
#> 8      clay     old     48    11        1.1
#> 9  concrete     old     48    24        2.4
#> 10    other     old     48    26        2.6
#> 11     clay   young     48    26        2.6
#> 12 concrete   young     48    23        2.3
#> 13    other   young     48    23        2.3
#> 14     clay     old     49    62        6.2
#> 15 concrete     old     49    61        6.1
#> 16    other     old     49    63        6.3
#> 17     clay   young     49    42        4.2
#> 18 concrete   young     49    57        5.7
#> 19    other   young     49    52        5.2
#> 20     clay     old     50    54        5.4
#> 21 concrete     old     50    58        5.8
#> 22    other     old     50    48        4.8
#> 23     clay   young     50    52        5.2
#> 24 concrete   young     50    72        7.2
#> 25    other   young     50    59        5.9
#> 26     clay     old     51    28        2.8
#> 27 concrete     old     51    20        2.0
#> 28    other     old     51    25        2.5
#> 29     clay   young     51    20        2.0
#> 30 concrete   young     51    19        1.9
#> 31    other   young     51    23        2.3
#> 32     clay     old     52     1        0.1
#> 33 concrete     old     52     6        0.6
#> 34    other     old     52     4        0.4
#> 35     clay   young     52     8        0.8
#> 36 concrete   young     52     6        0.6
#> 37    other   young     52     4        0.4
fieldSummary(x, "age_cat")
#>   age_cat Count Percentage
#> 1     old   500         50
#> 2   young   500         50
fieldSummary(x, "material")
#>   material Count Percentage
#> 1     clay   315       31.5
#> 2 concrete   354       35.4
#> 3    other   331       33.1
fieldSummary(x, "material", lengthColumn = "length")
#>   material Count length Percentage
#> 1     clay   315  15604       31.5
#> 2 concrete   354  17527       35.4
#> 3    other   331  16389       33.1
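
The Count, Percentage and summed length columns shown above can be approximated with base R alone. The following is a minimal sketch, not the package's implementation: the helper name summarise_fields and its argument names are hypothetical, it ignores the na argument, and aggregate() silently drops rows with NA in a grouping column.

```r
# Hypothetical helper (not part of the package): counts value combinations
# in the grouping columns and optionally sums up one numeric column.
summarise_fields <- function(x, group_by = names(x)[-1L], length_column = "") {
  # One counter per row; summing it per group gives the frequency
  values <- data.frame(Count = rep(1L, nrow(x)))

  # Optionally carry along the column to be summed up per group
  if (nzchar(length_column)) {
    values[[length_column]] <- x[[length_column]]
  }

  # Sum the counter (and the optional column) per value combination
  result <- aggregate(values, by = x[group_by], FUN = sum)

  # Share of each combination in the total number of rows, in percent
  result$Percentage <- 100 * result$Count / nrow(x)
  result
}

# Small deterministic data set, so the result does not depend on sampling
demo <- data.frame(
  pipe_id = 1:4,
  material = c("clay", "clay", "concrete", "clay"),
  length = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

summarise_fields(demo, "material", length_column = "length")
```

For demo, this yields a count of 3 (75 percent, total length 70) for clay and a count of 1 (25 percent, total length 30) for concrete.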