Apply Groups of Filter Criteria from Configuration

Usage

apply_filters(data, groups, length_column = NULL, id_columns = names(data)[1L])

Arguments

data

data frame

groups

names of the filter criteria groups to apply. The groups must be defined in the list returned by kwb.prep:::read_filter_criteria

length_column

name of the column in data that contains lengths. The lengths are summed up for the overview that is returned. Default: NULL

id_columns

name(s) of the column(s) in data that uniquely identify the records. These columns are returned to report which records have been removed. Default: the first column of data

Value

data, filtered according to the specified criteria. The returned data frame has an attribute filter_info, a list with one element per group, named after the values given in groups. Each element is itself a list containing an element overview (a data frame with one row per filter criterion) and further elements removed_<i>. Each removed_<i> is a data frame that contains only the id_columns and lists the records removed in filter step i.
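
The relation between the overview and the remaining records can be illustrated with a minimal base R sketch. It is not the implementation used by apply_filters: the helper apply_criteria_sketch and the column names n_failing and n_selected are hypothetical, and treating NA as "criterion not met" is an assumption. Each criterion string is evaluated against the full data frame and the selection is narrowed step by step.

```r
# Hypothetical helper for illustration only, not part of kwb.prep
apply_criteria_sketch <- function(data, criteria) {

  # Start with all rows selected
  selected <- rep(TRUE, nrow(data))

  n_failing <- integer(length(criteria))
  n_selected <- integer(length(criteria))

  for (i in seq_along(criteria)) {

    # Evaluate the criterion string with the columns of data in scope
    is_met <- eval(parse(text = criteria[[i]]), envir = data)

    # Assumption: NA counts as "criterion not met"
    is_met <- !is.na(is_met) & is_met

    # Narrow the selection and record the counts of this step
    selected <- selected & is_met
    n_failing[i] <- sum(!is_met)
    n_selected[i] <- sum(selected)
  }

  overview <- data.frame(
    criterion = unname(criteria),
    n_failing = n_failing,
    n_selected = n_selected,
    stringsAsFactors = FALSE
  )

  # Return the filtered data with the overview attached as an attribute
  structure(data[selected, , drop = FALSE], overview = overview)
}

sepal_criteria <- c(
  "sepal short" = "Sepal.Length < 5",
  "sepal narrow" = "Sepal.Width < 3"
)

sketch <- apply_criteria_sketch(iris, sepal_criteria)

nrow(sketch)
attr(sketch, "overview")
```

For the iris data, the sketch keeps 22 rows after the first criterion and 4 rows after the second, in line with the "Selected rows now" messages in the example output of apply_filters.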

Examples


# Attach the package (needed when running the example outside of this page)
library(kwb.prep)

# Define filter criteria
criteria <- list(
  sepal = c(
    "sepal short" = "Sepal.Length < 5",
    "sepal narrow" = "Sepal.Width < 3"
  ),
  petal = c(
    "petal short" = "Petal.Length < 5",
    "petal narrow" = "Petal.Width < 3"
  )
)

# Write criteria to temporary yaml file
tdir <- tempdir()
yaml::write_yaml(criteria, file.path(tdir, "filter_criteria.yml"))

# Set path to temporary "config" folder so that kwb.prep knows about it
set_user_config_dir(tdir)

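# Apply filter groups "sepal" and "petal" to the iris dataset. The messages are
# printed in German: "Filterschritte ... anwenden" means "applying filter
# steps", and the closing note says that no overview of the removed pipe
# sections ("Haltungen") and their lengths can be created because no lengths
# were given.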
result <- apply_filters(iris, c("sepal", "petal"))
#> ## Filterschritte 'sepal' anwenden auf 'sepal'
#> 
#> ## Filterschritte 'sepal' anwenden auf 'sepal'
#> 
#> ```{r eval = FALSE}
#> Evaluating Sepal.Length < 5 ...
#>   is TRUE for      22 rows ( 14.7 %),
#>     FALSE for     128 rows ( 85.3 %) and
#>        NA for       0 rows (  0.0 %).
#>   Selected rows now: 22
#> Evaluating Sepal.Width < 3 ...
#>   is TRUE for      57 rows ( 38.0 %),
#>     FALSE for      93 rows ( 62.0 %) and
#>        NA for       0 rows (  0.0 %).
#>   Selected rows now: 4
#> ```
#> 
#> Eine Uebersicht ueber die entfernten Haltungen und deren Laengen kann nicht erzeugt werden, da
#> 
#> - keine Laengenangaben uebergeben wurden
#> 
#> ## Filterschritte 'petal' anwenden auf 'petal'
#> 
#> ## Filterschritte 'petal' anwenden auf 'petal'
#> 
#> ```{r eval = FALSE}
#> Evaluating Petal.Length < 5 ...
#>   is TRUE for       4 rows (100.0 %),
#>     FALSE for       0 rows (  0.0 %) and
#>        NA for       0 rows (  0.0 %).
#>   Selected rows now: 4
#> Evaluating Petal.Width < 3 ...
#>   is TRUE for       4 rows (100.0 %),
#>     FALSE for       0 rows (  0.0 %) and
#>        NA for       0 rows (  0.0 %).
#>   Selected rows now: 4
#> ```
#> 
#> Eine Uebersicht ueber die entfernten Haltungen und deren Laengen kann nicht erzeugt werden, da
#> 
#> - keine Laengenangaben uebergeben wurden
#> 

# Have a look at the result
str(result)
#> 'data.frame':	4 obs. of  5 variables:
#>  $ Sepal.Length: num  4.4 4.5 4.9 4.9
#>  $ Sepal.Width : num  2.9 2.3 2.4 2.5
#>  $ Petal.Length: num  1.4 1.3 3.3 4.5
#>  $ Petal.Width : num  0.2 0.3 1 1.7
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 2 3
#>  - attr(*, "filter_info")=List of 2
#>   ..$ sepal:List of 3
#>   .. ..$ overview :'data.frame':	2 obs. of  5 variables:
#>   .. .. ..$ CleaningStep   : chr [1:2] "Keep 'Sepal.Length < 5'" "Keep 'Sepal.Width < 3'"
#>   .. .. ..$ Count.go       : int [1:2] 128 93
#>   .. .. ..$ Percentage.go  : num [1:2] 85.3 62
#>   .. .. ..$ Count.keep     : int [1:2] 22 4
#>   .. .. ..$ Percentage.keep: num [1:2] 14.67 2.67
#>   .. ..$ removed_1:'data.frame':	128 obs. of  1 variable:
#>   .. .. ..$ Sepal.Length: num [1:128] 5.1 5 5.4 5 5.4 5.8 5.7 5.4 5.1 5.7 ...
#>   .. ..$ removed_2:'data.frame':	93 obs. of  1 variable:
#>   .. .. ..$ Sepal.Length: num [1:93] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.9 5.4 ...
#>   ..$ petal:List of 3
#>   .. ..$ overview :'data.frame':	2 obs. of  5 variables:
#>   .. .. ..$ CleaningStep   : chr [1:2] "Keep 'Petal.Length < 5'" "Keep 'Petal.Width < 3'"
#>   .. .. ..$ Count.go       : int [1:2] 0 0
#>   .. .. ..$ Percentage.go  : int [1:2] 0 0
#>   .. .. ..$ Count.keep     : int [1:2] 4 4
#>   .. .. ..$ Percentage.keep: num [1:2] 100 100
#>   .. ..$ removed_1:'data.frame':	0 obs. of  1 variable:
#>   .. .. ..$ Sepal.Length: num(0) 
#>   .. ..$ removed_2:'data.frame':	0 obs. of  1 variable:
#>   .. .. ..$ Sepal.Length: num(0)
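
In the example above, the default id_columns selects Sepal.Length, which does not uniquely identify the records. The following sketch adds a hypothetical id column and passes it explicitly, then reads the filter_info attribute described under Value. The names iris_with_id and result_with_id are assumptions made for illustration, and the block only calls apply_filters if the packages kwb.prep and yaml are installed. No output is shown because it was not verified.

```r
# Add a hypothetical unique "id" column; iris has no record identifier
iris_with_id <- cbind(id = seq_len(nrow(iris)), iris)

if (requireNamespace("kwb.prep", quietly = TRUE) &&
    requireNamespace("yaml", quietly = TRUE)) {

  library(kwb.prep)

  # Same setup as in the example above, restricted to the group "sepal"
  criteria <- list(
    sepal = c(
      "sepal short" = "Sepal.Length < 5",
      "sepal narrow" = "Sepal.Width < 3"
    )
  )

  tdir <- tempdir()
  yaml::write_yaml(criteria, file.path(tdir, "filter_criteria.yml"))
  set_user_config_dir(tdir)

  # Report removed records by their id instead of by Sepal.Length
  result_with_id <- apply_filters(iris_with_id, "sepal", id_columns = "id")

  # Read the information on the filter steps from the attribute
  info <- attr(result_with_id, "filter_info")

  # One row per filter criterion of group "sepal"
  print(info$sepal$overview)

  # Identifiers of the records removed in the first filter step
  print(head(info$sepal$removed_1))
}
```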