Frequency of Value Combinations in Data Frame Columns

Usage

fieldSummary(x, groupBy = names(x)[-1L], lengthColumn = "", na = "Unknown")

Arguments

x

data frame

groupBy

Character vector naming the columns (fields) in x whose value combinations are counted. Default: the names of all columns in x except the first, on the assumption that the first column may be an ID column.

lengthColumn

optional. Name of a column in x whose values are summed up for each value combination. Default: ""

na

optional. Value to be treated as NA. Default: "Unknown"
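
The default for groupBy, names(x)[-1L], uses plain base R negative indexing to drop the first column name. A minimal illustration, using a small made-up data frame (the name `pipes` and its contents are assumptions for this sketch only):

```r
# Made-up example data; only the column names matter here
pipes <- data.frame(
  pipe_id = 1:3,
  material = c("clay", "concrete", "other"),
  age_cat = c("young", "old", "old"),
  stringsAsFactors = FALSE
)

# Negative indexing removes the first element, here the ID column name
default_group_by <- names(pipes)[-1L]
print(default_group_by)
```

If the first column of x is not an ID column, passing groupBy explicitly avoids silently leaving it out of the evaluation.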

Examples

n <- 1000L
sample_replace <- function(x, ...) sample(x, size = n, replace = TRUE, ...)
x <- data.frame(
  pipe_id = 1:n,
  material = sample_replace(c("clay", "concrete", "other")),
  age_cat = sample_replace(c("young", "old")),
  length = as.integer(rnorm(n, 50)),
  stringsAsFactors = FALSE
)

fieldSummary(x)
#>    material age_cat length Count Percentage
#> 1      clay     old     47     2        0.2
#> 2  concrete     old     47     5        0.5
#> 3     other     old     47     1        0.1
#> 4      clay   young     47     3        0.3
#> 5  concrete   young     47     3        0.3
#> 6     other   young     47     5        0.5
#> 7      clay     old     48    22        2.2
#> 8  concrete     old     48    28        2.8
#> 9     other     old     48    24        2.4
#> 10     clay   young     48    23        2.3
#> 11 concrete   young     48    22        2.2
#> 12    other   young     48    25        2.5
#> 13     clay     old     49    54        5.4
#> 14 concrete     old     49    57        5.7
#> 15    other     old     49    49        4.9
#> 16     clay   young     49    52        5.2
#> 17 concrete   young     49    64        6.4
#> 18    other   young     49    54        5.4
#> 19     clay     old     50    57        5.7
#> 20 concrete     old     50    54        5.4
#> 21    other     old     50    54        5.4
#> 22     clay   young     50    75        7.5
#> 23 concrete   young     50    41        4.1
#> 24    other   young     50    53        5.3
#> 25     clay     old     51    20        2.0
#> 26 concrete     old     51    30        3.0
#> 27    other     old     51    22        2.2
#> 28     clay   young     51    30        3.0
#> 29 concrete   young     51    33        3.3
#> 30    other   young     51    16        1.6
#> 31     clay     old     52     4        0.4
#> 32 concrete     old     52     2        0.2
#> 33    other     old     52     5        0.5
#> 34     clay   young     52     6        0.6
#> 35 concrete   young     52     3        0.3
#> 36     clay     old     53     1        0.1
#> 37 concrete   young     53     1        0.1
fieldSummary(x, "age_cat")
#>   age_cat Count Percentage
#> 1     old   491       49.1
#> 2   young   509       50.9
fieldSummary(x, "material")
#>   material Count Percentage
#> 1     clay   349       34.9
#> 2 concrete   343       34.3
#> 3    other   308       30.8
fieldSummary(x, "material", lengthColumn = "length")
#>   material Count length Percentage
#> 1     clay   349  17312       34.9
#> 2 concrete   343  16981       34.3
#> 3    other   308  15229       30.8
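
To make the meaning of the Count, Percentage and summed length columns concrete, the following base R sketch computes the same kind of table by hand. It is not the implementation of fieldSummary(); the helper name `count_combinations` and the data frame `pipes` are hypothetical, and the handling of the na argument is left out entirely.

```r
# Hypothetical helper, NOT the package's implementation of fieldSummary()
count_combinations <- function(x, group_by, length_column = "") {
  # Count the rows for each combination of values in the grouping columns
  result <- aggregate(
    list(Count = rep(1L, nrow(x))),
    by = x[group_by],
    FUN = sum
  )
  # Optionally sum up a numeric column within each combination
  if (nzchar(length_column)) {
    sums <- aggregate(x[length_column], by = x[group_by], FUN = sum)
    result[[length_column]] <- sums[[length_column]]
  }
  # Share of each combination in the total number of counted rows
  result$Percentage <- 100 * result$Count / sum(result$Count)
  result
}

# Small deterministic example (made-up data)
pipes <- data.frame(
  pipe_id = 1:5,
  material = c("clay", "clay", "concrete", "other", "clay"),
  length = c(10L, 20L, 30L, 40L, 50L),
  stringsAsFactors = FALSE
)

result <- count_combinations(pipes, "material", length_column = "length")
print(result)
```

Both aggregate() calls use the same grouping columns, so their rows come back in the same order and the sums can be attached by position. Note that aggregate() drops rows with NA in a grouping column, which may differ from what fieldSummary() does with its na argument.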