Convert Long File Paths to Simple Paths

to_simple_names(paths, method = 1L, get_base = NULL, sha1_digits = 4)

Arguments

paths

character vector of file paths

method

method = 1: the generated file names match the pattern file_<xx>, where <xx> is a two-digit integer. method = 2: the generated file names match the pattern file_<sha>, where <sha> consists of the first sha1_digits hexadecimal digits of the sha1 hash (see e.g. http://www.sha1-online.com/) of the base names of the paths. In both cases the extension of the original file name is kept, as the examples show. By default, the base name is the file name (without folder path) without extension. The base names can be determined individually by providing a function in get_base.

get_base

function that takes a character vector as input and returns a character vector as output. If not NULL, this function is used to determine the base names from the paths when method = 2 is specified.

sha1_digits

number of leading hexadecimal digits of the sha1 hash that are used in the file names when method = 2 is applied (default: 4)
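The two naming rules described under method can be illustrated with a minimal base R sketch. simple_names_numbered() and default_base() are hypothetical helpers written for this page, not functions of the package; appending the original extension is an assumption taken from the example output further down.

```r
# Hypothetical helpers for illustration only; not the package implementation.

# method = 1: running two-digit numbers, keeping each file's extension
simple_names_numbered <- function(paths) {
  ext <- tools::file_ext(paths)
  # Prepend a dot only where the path actually has an extension
  ext <- ifelse(nzchar(ext), paste0(".", ext), "")
  sprintf("file_%02d%s", seq_along(paths), ext)
}

# Default base name as described for method = 2: the file name without
# folder path and without extension
default_base <- function(paths) {
  tools::file_path_sans_ext(basename(paths))
}

simple_names_numbered(c("docs/v1_ugly_name_1.doc", "v1_very_ugly_name.xml"))
#> [1] "file_01.doc" "file_02.xml"

default_base(c("docs/v1_ugly_name_1.doc", "v2_ugly_name_1.docx"))
#> [1] "v1_ugly_name_1" "v2_ugly_name_1"
```

The default base names still differ because of the v1_/v2_ prefix, which is why a custom get_base is needed to make corresponding files share a hash.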

Value

character vector of the same length as paths
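How a method = 2 result of this shape could be built is shown in the following sketch. It assumes the digest package is installed and that the hash is taken over the plain characters of each base name (digest::digest() with serialize = FALSE); the package itself may hash differently, so the hex digits produced here need not match the example output below. simple_names_hashed() and strip_version() are hypothetical names.

```r
# Hypothetical sketch of method = 2; not the package implementation.
# Assumes the 'digest' package is installed. With serialize = FALSE,
# digest::digest() hashes the characters of the string itself.
simple_names_hashed <- function(paths, get_base = NULL, sha1_digits = 4) {
  if (is.null(get_base)) {
    # Default: file name without folder path and without extension
    get_base <- function(x) tools::file_path_sans_ext(basename(x))
  }
  bases <- get_base(paths)
  # One sha1 hash (40 hex characters) per base name
  sha <- vapply(bases, function(b) {
    digest::digest(b, algo = "sha1", serialize = FALSE)
  }, character(1), USE.NAMES = FALSE)
  ext <- tools::file_ext(paths)
  ext <- ifelse(nzchar(ext), paste0(".", ext), "")
  # Keep only the first sha1_digits hex digits of each hash
  paste0("file_", substr(sha, 1, sha1_digits), ext)
}

# Custom base name: drop folder, extension and a version prefix like "v1_"
strip_version <- function(x) {
  sub("^v\\d+_", "", tools::file_path_sans_ext(basename(x)))
}

simple_names_hashed(c("v1_ugly_name_1.doc", "v2_ugly_name_1.docx"),
                    get_base = strip_version)
```

Both results share the same file_<sha> part because strip_version() maps both paths to the base name ugly_name_1; only the extensions differ.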

Examples

paths <- c("v1_ugly_name_1.doc",  "v1_very_ugly_name.xml",
           "v2_ugly_name_1.docx", "v2_very_ugly_name.xmlx")
           
to_simple_names(paths, method = 1L)
#> [1] "file_01.doc"  "file_02.xml"  "file_03.docx" "file_04.xmlx"
writeLines(sort(to_simple_names(paths, method = 2L)))
#> file_2ecd.xml
#> file_3f3a.xmlx
#> file_82f1.doc
#> file_f400.docx

# All sha1 hashes are different because all base names (by default, the file
# name without extension) are different. To give the same sha1 hash to files
# that correspond to each other but have different extensions, provide a
# function that extracts the "base name" of each file:

get_base <- function(x) kwb.utils::removeExtension(gsub("^v\\d+_", "", x))

writeLines(sort(to_simple_names(paths, method = 2L, get_base = get_base)))
#> file_3abc.xml
#> file_3abc.xmlx
#> file_d71a.doc
#> file_d71a.docx

# Now the files that have the same base name (ignoring the prefix v1_ or v2_)
# get the same sha1 hash and therefore appear as groups in the sorted file
# list
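Since the hash part is simply the generated name without its extension, the groups mentioned above can also be formed programmatically. This is a minimal base R sketch; the vector simple just copies the names printed by the get_base example, and the grouping by tools::file_path_sans_ext() is an illustrative choice, not a feature of the package.

```r
# Names as printed by the get_base example above
simple <- c("file_3abc.xml", "file_3abc.xmlx", "file_d71a.doc", "file_d71a.docx")

# Group the names by their hash part (file name without extension)
groups <- split(simple, tools::file_path_sans_ext(simple))

names(groups)
#> [1] "file_3abc" "file_d71a"
```

Each list element then holds all files that belong to one base name, e.g. groups$file_d71a contains the .doc and the .docx file.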