Returns the frequencies of the strings in a vector, sorted in decreasing order and by default weighted by string length. This function can be used to find the most "important" folder paths in terms of frequency and length.

sorted_importance(x, weighted = TRUE)

Arguments

x

vector of character strings

weighted

if TRUE (default) the frequencies of strings are multiplied by the corresponding string lengths

Value

named integer vector (of class table) containing the importance values of the elements in x, sorted in decreasing order. The importance of a string is either its frequency in x (if weighted is FALSE) or the product of this frequency and the string length (if weighted is TRUE).
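The computation described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's actual source: the name `sorted_importance_sketch` is hypothetical, and the real function may treat ties or edge cases (such as empty input or NA values) differently.

```r
# Hypothetical re-implementation for illustration only;
# this is NOT the source code of kwb.pathdict:::sorted_importance().
sorted_importance_sketch <- function(x, weighted = TRUE) {
  # Count how often each distinct string occurs in x
  freq <- table(x)

  # If requested, multiply each frequency by the length of its string
  importance <- if (weighted) freq * nchar(names(freq)) else freq

  # Put the most important strings first
  sort(importance, decreasing = TRUE)
}

strings <- c("a", "a", "a", "bc", "bc", "cdefg")
sorted_importance_sketch(strings)
sorted_importance_sketch(strings, weighted = FALSE)
```

With the input above, "cdefg" occurs once but has length 5, so its weighted importance (1 * 5) exceeds that of "a", which occurs three times with length 1 (3 * 1).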

Examples

strings <- c("a", "a", "a", "bc", "bc", "cdefg")

(importance <- kwb.pathdict:::sorted_importance(strings))
#> x
#> cdefg    bc     a
#>     5     4     3

# Check that each input element is mentioned in the output
all(unique(strings) %in% names(importance))
#> [1] TRUE

# weighted = FALSE just returns the frequencies of strings in x
(importance <- kwb.pathdict:::sorted_importance(strings, weighted = FALSE))
#> x
#>     a    bc cdefg
#>     3     2     1

# Check if the sum of frequencies is the number of elements in x
sum(importance) == length(strings)
#> [1] TRUE

# You may use the function to assess the "importance" of directory paths
kwb.pathdict:::sorted_importance(dirname(kwb.pathdict:::example_paths()))
#> x
#>    //very/long/path/to/the/projects/project-1/wp-1/input
#>                                                      106
#>      //very/long/path/to/the/projects/project-2/Berichte
#>                                                      102
#> //very/long/path/to/the/projects/project-1/wp-1/analysis
#>                                                       56
#>   //very/long/path/to/the/projects/project-1/wp 2/output
#>                                                       54
#>    //very/long/path/to/the/projects/project-1/wp 2/input
#>                                                       53
#>      //very/long/path/to/the/projects/project-2/Grafiken
#>                                                       51
#>         //very/long/path/to/the/projects/project-2/Daten
#>                                                       48