Apply Groups of Filter Criteria from Configuration


apply_filters(data, groups, length_column = NULL, id_columns = names(data)[1L])



data: data frame


groups: names of the filter criteria groups defined in the list returned by kwb.prep:::read_filter_criteria


length_column: name of the column in data containing lengths (summed up for the overview that is returned)


id_columns: name(s) of the column(s) in data that uniquely identify the records. These columns are returned to report which records have been removed


data, filtered according to the specified criteria. The returned data frame has an attribute filter_info, a list with one element per group, named according to the values given in groups. Each element is itself a list with one element overview (a data frame with one row per filter criterion) and further elements removed_<i>, data frames containing only the id_columns of the records that were removed in the corresponding filter step i.
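The structure of the filter_info attribute can be explored with base R. The following is a minimal sketch only: the object `result` is a hand-built stand-in that follows the description above (it is not produced by apply_filters(), and its values are illustrative), so that the snippet runs without kwb.prep or any configuration.

```r
# Stand-in for a value returned by apply_filters(). The structure follows the
# description above; the contents are made up for illustration only.
result <- structure(
  head(iris, 2L),
  filter_info = list(
    sepal = list(
      overview = data.frame(CleaningStep = "Keep 'Sepal.Length < 5'"),
      removed_1 = data.frame(Sepal.Length = c(5.1, 5.0))
    )
  )
)

# Extract the attribute: a list with one element per filter group
info <- attr(result, "filter_info")

# Names of the filter groups that were applied
names(info)

# Overview of one group: one row per filter criterion
info$sepal$overview

# Number of records removed in the first filter step of that group
nrow(info$sepal$removed_1)
```

With a real result, the same calls should work for every name given in groups.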


# Define filter criteria
criteria <- list(
  sepal = c(
    "sepal short" = "Sepal.Length < 5",
    "sepal narrow" = "Sepal.Width < 3"
  ),
  petal = c(
    "petal short" = "Petal.Length < 5",
    "petal narrow" = "Petal.Width < 3"
  )
)

# Write criteria to temporary yaml file
tdir <- tempdir()
yaml::write_yaml(criteria, file.path(tdir, "filter_criteria.yml"))

# Set path to temporary "config" folder so that kwb.prep knows about it

# Apply filter groups "sepal" and "petal" to the iris dataset
result <- apply_filters(iris, c("sepal", "petal"))
#> ## Filterschritte 'sepal' anwenden auf 'sepal'
#> ```{r eval = FALSE}
#> Evaluating Sepal.Length < 5 ...
#>   is TRUE for      22 rows ( 14.7 %),
#>     FALSE for     128 rows ( 85.3 %) and
#>        NA for       0 rows (  0.0 %).
#>   Selected rows now: 22
#> Evaluating Sepal.Width < 3 ...
#>   is TRUE for      57 rows ( 38.0 %),
#>     FALSE for      93 rows ( 62.0 %) and
#>        NA for       0 rows (  0.0 %).
#>   Selected rows now: 4
#> ```
#> Eine Uebersicht ueber die entfernten Haltungen und deren Laengen kann nicht erzeugt werden, da
#> - keine Laengenangaben uebergeben wurden
#> ## Filterschritte 'petal' anwenden auf 'petal'
#> ```{r eval = FALSE}
#> Evaluating Petal.Length < 5 ...
#>   is TRUE for       4 rows (100.0 %),
#>     FALSE for       0 rows (  0.0 %) and
#>        NA for       0 rows (  0.0 %).
#>   Selected rows now: 4
#> Evaluating Petal.Width < 3 ...
#>   is TRUE for       4 rows (100.0 %),
#>     FALSE for       0 rows (  0.0 %) and
#>        NA for       0 rows (  0.0 %).
#>   Selected rows now: 4
#> ```
#> Eine Uebersicht ueber die entfernten Haltungen und deren Laengen kann nicht erzeugt werden, da
#> - keine Laengenangaben uebergeben wurden

# Have a look at the result
str(result)
#> 'data.frame':	4 obs. of  5 variables:
#>  $ Sepal.Length: num  4.4 4.5 4.9 4.9
#>  $ Sepal.Width : num  2.9 2.3 2.4 2.5
#>  $ Petal.Length: num  1.4 1.3 3.3 4.5
#>  $ Petal.Width : num  0.2 0.3 1 1.7
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 2 3
#>  - attr(*, "filter_info")=List of 2
#>   ..$ sepal:List of 3
#>   .. ..$ overview :'data.frame':	2 obs. of  5 variables:
#>   .. .. ..$ CleaningStep   : chr [1:2] "Keep 'Sepal.Length < 5'" "Keep 'Sepal.Width < 3'"
#>   .. .. ..$ Count.go       : int [1:2] 128 93
#>   .. .. ..$ Percentage.go  : num [1:2] 85.3 62
#>   .. .. ..$ Count.keep     : int [1:2] 22 4
#>   .. .. ..$ Percentage.keep: num [1:2] 14.67 2.67
#>   .. ..$ removed_1:'data.frame':	128 obs. of  1 variable:
#>   .. .. ..$ Sepal.Length: num [1:128] 5.1 5 5.4 5 5.4 5.8 5.7 5.4 5.1 5.7 ...
#>   .. ..$ removed_2:'data.frame':	93 obs. of  1 variable:
#>   .. .. ..$ Sepal.Length: num [1:93] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.9 5.4 ...
#>   ..$ petal:List of 3
#>   .. ..$ overview :'data.frame':	2 obs. of  5 variables:
#>   .. .. ..$ CleaningStep   : chr [1:2] "Keep 'Petal.Length < 5'" "Keep 'Petal.Width < 3'"
#>   .. .. ..$ Count.go       : int [1:2] 0 0
#>   .. .. ..$ Percentage.go  : int [1:2] 0 0
#>   .. .. ..$ Count.keep     : int [1:2] 4 4
#>   .. .. ..$ Percentage.keep: num [1:2] 100 100
#>   .. ..$ removed_1:'data.frame':	0 obs. of  1 variable:
#>   .. .. ..$ Sepal.Length: num(0) 
#>   .. ..$ removed_2:'data.frame':	0 obs. of  1 variable:
#>   .. .. ..$ Sepal.Length: num(0)
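
The row counts reported above can be cross-checked in base R. The sketch below is not how kwb.prep implements filtering; `keep_matching` is a hypothetical helper introduced here for illustration, and dropping rows for which a condition evaluates to NA is an assumption of this sketch. It keeps only those rows for which each condition string, applied one after another, evaluates to TRUE.

```r
# Hypothetical helper, not part of kwb.prep: keep only the rows for which
# every condition, applied one after another, evaluates to TRUE
keep_matching <- function(data, conditions) {
  for (condition in conditions) {
    # Evaluate the condition string with the columns of data as variables
    matches <- eval(parse(text = condition), envir = data)
    # Assumption of this sketch: rows with NA results are dropped
    data <- data[!is.na(matches) & matches, , drop = FALSE]
  }
  data
}

# Rows remaining after the first criterion of group "sepal"
sepal_short <- keep_matching(iris, "Sepal.Length < 5")
nrow(sepal_short)

# Rows remaining after both criteria of group "sepal"
sepal_both <- keep_matching(iris, c("Sepal.Length < 5", "Sepal.Width < 3"))
nrow(sepal_both)
```

The two counts should agree with the "Selected rows now" values printed for the group "sepal" in the output above.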