Convert Long File Paths to Simple Paths
to_simple_names(paths, method = 1L, get_base = NULL, sha1_digits = 4)
paths: vector of character containing file paths
method: method = 1: generated file names match the pattern file_<xx>, where <xx> is a two-digit integer number. method = 2: generated file names match the pattern file_<sha>, where <sha> consists of the first sha1_digits digits of the sha1 hash (see e.g. http://www.sha1-online.com/) of the base names of the paths. By default, the base name is the file name (without folder path) without extension. The base names can be determined individually by providing a function in get_base.
get_base: function taking a vector of character as input and returning a vector of character as output. If not NULL, this function is used to determine the base names from the paths when method = 2 is specified.
sha1_digits: number of digits used when method = 2 is applied
Value: vector of character as long as paths
paths <- c("v1_ugly_name_1.doc", "v1_very_ugly_name.xml",
"v2_ugly_name_1.docx", "v2_very_ugly_name.xmlx")
to_simple_names(paths, method = 1L)
#> [1] "file_01.doc" "file_02.xml" "file_03.docx" "file_04.xmlx"
writeLines(sort(to_simple_names(paths, method = 2L)))
#> file_2ecd.xml
#> file_3f3a.xmlx
#> file_82f1.doc
#> file_f400.docx
# All sha1 are different because all base names (file name without extension
# by default) are different. If you want to give the same sha1 to files that
# correspond to each other but have a different extension, set the function
# that extracts the "base name" of the file:
get_base <- function(x) kwb.utils::removeExtension(gsub("^v\\d+_", "", x))
writeLines(sort(to_simple_names(paths, method = 2L, get_base = get_base)))
#> file_3abc.xml
#> file_3abc.xmlx
#> file_d71a.doc
#> file_d71a.docx
# Now the file names that have the same base name (neglecting the prefix
# v1_ or v2_) get the same sha1 and thus appear as groups in the sorted
# file list
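
The two naming schemes described above can be approximated in a few lines of R. The following is only a sketch under stated assumptions, not the package's implementation: `simple_names_sketch()` is a hypothetical helper, the hashing is assumed to be done with `digest::digest()` on the raw base-name string, and the default base name is assumed to be `tools::file_path_sans_ext(basename(x))`. The hash prefixes it produces may therefore differ from those shown in the output above.

```r
# Hypothetical helper, NOT the implementation of to_simple_names():
# it only mimics the two naming schemes described in the documentation.
simple_names_sketch <- function(paths, method = 1L, get_base = NULL,
                                sha1_digits = 4) {
  ext <- tools::file_ext(paths)
  # Append the extension only where the path actually has one
  suffix <- ifelse(nzchar(ext), paste0(".", ext), "")

  if (method == 1L) {
    # Scheme 1: running number, zero-padded to two digits
    ids <- sprintf("%02d", seq_along(paths))
  } else {
    # Scheme 2: first sha1_digits characters of the SHA-1 of the base name.
    # Assumed default: file name without folder path and without extension.
    if (is.null(get_base)) {
      get_base <- function(x) tools::file_path_sans_ext(basename(x))
    }
    bases <- get_base(paths)
    # Assumption: hash the raw string (serialize = FALSE); requires 'digest'
    hashes <- vapply(bases, digest::digest, character(1),
                     algo = "sha1", serialize = FALSE, USE.NAMES = FALSE)
    ids <- substr(hashes, 1, sha1_digits)
  }
  paste0("file_", ids, suffix)
}

paths <- c("v1_ugly_name_1.doc", "v1_very_ugly_name.xml",
           "v2_ugly_name_1.docx", "v2_very_ugly_name.xmlx")

# Scheme 1 needs base R only
simple_names_sketch(paths, method = 1L)

# Scheme 2 is run only if the 'digest' package is installed
if (requireNamespace("digest", quietly = TRUE)) {
  # Base-R alternative to the get_base function used in the example above
  strip_version <- function(x) tools::file_path_sans_ext(sub("^v[0-9]+_", "", x))
  print(simple_names_sketch(paths, method = 2L, get_base = strip_version))
}
```

With the version prefix and extension stripped, the first and third paths share a base name, as do the second and fourth, so each pair receives the same hash prefix and differs only in its extension.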