
Datasets

Datasets for statistical analysis and predicting well capacity

model_data_reduced
Input Data for Well Capacity Prediction
rehabs
Input Data for Well Capacity Prediction: Well Rehabilitations
operational_start
Input Data for Well Capacity Prediction: Operational Start

Data preparation

Functions for preparing / cleaning data

read_csv()
Read CSV data file exported from db2 by Sebastian Schimmelpfennig
read_ms_access()
Read table from MS Access database via ODBC connection under 64-bit R
read_select_rename()
Read table from MS Access database; select and rename columns as defined in renamings table ('old_name' -> 'new_name')
rename_values()
Rename values of a character vector according to a renamings table
select_rename_cols()
Select and rename columns of a data frame according to a reference table
classify_Qs()
Transform Qs_rel into a binary factor with low and high specific capacity
combine_pump_test_and_Q_monitoring_data()
Combined Pump Test and Q Monitoring Dataset
extdata_file()
Get Path to File in This Package
replace_na_with_median()
Replace NAs with median
fill_up_na_with_median_from_lookup()
Fill up NA values with median of lookup table
get_pump_test_vars()
Get Default Pump Test Variables
get_W_static_data()
Get W_static measurement data from pump tests on new wells (Neubaupumpversuche), short pump tests (Kurzpumpversuche) and other sources
interpolate_and_fill()
Interpolate and fill up static water level
interpolate_Qs()
Interpolate Qs time series data to a given time interval
load_renamings_csv()
Load renaming table from CSV file
load_renamings_excel()
Load renaming table from original Excel file
prepare_pump_test_data()
Prepare pump test data with one row per Qs measurement plus rehabilitation history
prepare_pump_test_data_1()
Prepare pump test data in wide format
prepare_pump_test_data_2()
Reformat untidy pump test data from wide to long format
prepare_quality_data()
Prepare Quality Data
prepare_simulation_donothing_df()
Prepare 'Do Nothing' Simulation Data Frame
prepare_volume_data()
Prepare Volume Data
summarise_marginal_factor_levels()
Summarise factor levels with a relative frequency below a threshold
tidy_factor()
Turn character vector into factor, sort factor levels and replace NA level
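To illustrate what a few of the preparation steps above typically involve, here is a minimal base-R sketch. It is not the package's code: the `_sketch` function names, their arguments and the example data are hypothetical, and the two-column `old_name`/`new_name` layout of the renamings table is only assumed from the description of `read_select_rename()`.

```r
# Hypothetical base-R sketches of four data preparation steps listed above.
# They are NOT the package's implementations; names and arguments are assumed.

# Keep the columns listed in renamings$old_name and rename them to
# renamings$new_name (assumed two-column layout of the renamings table)
select_rename_cols_sketch <- function(df, renamings) {
  old <- as.character(renamings$old_name)
  new <- as.character(renamings$new_name)
  keep <- old[old %in% names(df)]
  out <- df[, keep, drop = FALSE]
  names(out) <- new[match(keep, old)]
  out
}

# Replace NA values of a numeric vector with the median of the observed values
replace_na_with_median_sketch <- function(x) {
  x[is.na(x)] <- stats::median(x, na.rm = TRUE)
  x
}

# Merge factor levels whose relative frequency is below `threshold` into `other`
summarise_marginal_levels_sketch <- function(x, threshold = 0.05, other = "other") {
  x <- as.character(x)
  rel_freq <- table(x) / length(x)
  rare <- names(rel_freq)[rel_freq < threshold]
  x[x %in% rare] <- other
  factor(x)
}

# Linearly interpolate an irregular time series onto a regular grid with step `by`
interpolate_to_interval_sketch <- function(time, value, by) {
  grid <- seq(min(time), max(time), by = by)
  stats::approx(x = time, y = value, xout = grid)
}

# Usage with made-up example data
wells <- data.frame(WELL_NR = c("W1", "W2", "W3", "W4"),
                    QS = c(10, NA, 20, 40),
                    COMMENT = c("a", "b", "c", "d"))
renamings <- data.frame(old_name = c("WELL_NR", "QS"),
                        new_name = c("well_id", "Qs"))
wells <- select_rename_cols_sketch(wells, renamings)
names(wells)                             # "well_id" "Qs"
replace_na_with_median_sketch(wells$Qs)  # 10 20 20 40
summarise_marginal_levels_sketch(c("a", "a", "a", "b", "c"), threshold = 0.25)
# a a a other other (levels: a, other)
interpolate_to_interval_sketch(c(0, 10, 25, 30), c(1, 2, 3.5, 4), by = 5)$y
# 1.0 1.5 2.0 2.5 3.0 3.5 4.0
```

Linear interpolation via `stats::approx()` is only one possible choice for resampling Qs time series; the package may use a different method.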

Data analysis

Functions for statistical analysis and feature importance ranking

chi2.CramersV.test()
Chi-squared test with Cramér's V
frequency_table()
Calculate absolute and relative frequencies of categorical variables
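Assuming `chi2.CramersV.test()` pairs Pearson's chi-squared test of independence with Cramér's V as an effect size (V = sqrt(chi2 / (n (k - 1))), with k the smaller of the number of rows and columns), the idea can be sketched in base R as follows. The helper names are hypothetical and this is not the package's implementation; `correct = FALSE` switches off the Yates continuity correction so that a perfect 2 x 2 association yields V = 1.

```r
# Hypothetical sketches; not the package's chi2.CramersV.test() or frequency_table().

# Absolute and relative frequencies of a categorical variable (NA kept as a level)
frequency_table_sketch <- function(x) {
  counts <- table(x, useNA = "ifany")
  data.frame(level = names(counts),
             n = as.integer(counts),
             rel_freq = as.numeric(counts) / sum(counts),
             stringsAsFactors = FALSE)
}

# Pearson's chi-squared test of independence plus Cramer's V effect size:
# V = sqrt(chi2 / (n * (k - 1))), with k = min(number of rows, number of columns)
chi2_cramers_v_sketch <- function(x, y) {
  tab <- table(x, y)
  # correct = FALSE: no Yates continuity correction for 2 x 2 tables
  test <- stats::chisq.test(tab, correct = FALSE)
  chi2 <- unname(test$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  list(statistic = chi2,
       p_value = test$p.value,
       cramers_v = sqrt(chi2 / (n * (k - 1))))
}

# Usage with made-up data: perfect association vs. independence
x <- rep(c("a", "b"), each = 20)
chi2_cramers_v_sketch(x, rep(c("u", "v"), each = 20))$cramers_v   # 1
chi2_cramers_v_sketch(x, rep(c("u", "v"), times = 20))$cramers_v  # 0
frequency_table_sketch(c("a", "b", "a", NA))
```

Because V is scaled to the range 0 to 1 regardless of table size, it can be used to rank categorical input variables by the strength of their association with a binary target such as the low/high class produced by `classify_Qs()`.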

Data visualisation

Functions for plotting

Qs_heatmap_plot()
Heatmap (raster plot) of Qs values over time, with one row per well
correlation_plot()
Plot Qs_rel vs. an input variable as a box plot (categorical input variable) or scatter plot (numerical input variable)
plot_distribution()
Plot frequency distribution of a numerical variable
plot_frequencies()
Plot frequency distribution of a factor variable
plot_predictions_points_per_well()
Plot Predictions: Points per Well
scatterplot()
Scatter plot comparing numeric predictions with observations
paste_percent()
Paste percent sign to numbers

Export data

Functions for exporting data

save_data()
Save data frame in different formats: csv, RData, rds
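Saving one data frame in several formats can be sketched with base R's `utils::write.csv()`, `save()` and `saveRDS()`. The function name `save_data_sketch()` and its arguments (`path`, `filename`, `formats`) are hypothetical and need not match the signature of the package's `save_data()`.

```r
# Hypothetical sketch, not the package's save_data(): write a data frame as
# CSV, RData and/or rds into `path`, using `filename` without extension.
save_data_sketch <- function(x, path, filename, formats = c("csv", "RData", "rds")) {
  formats <- match.arg(formats, several.ok = TRUE)
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  files <- character(0)
  if ("csv" %in% formats) {
    file <- file.path(path, paste0(filename, ".csv"))
    utils::write.csv(x, file, row.names = FALSE)
    files <- c(files, file)
  }
  if ("RData" %in% formats) {
    file <- file.path(path, paste0(filename, ".RData"))
    save(x, file = file)  # object is stored under the name `x`
    files <- c(files, file)
  }
  if ("rds" %in% formats) {
    file <- file.path(path, paste0(filename, ".rds"))
    saveRDS(x, file)
    files <- c(files, file)
  }
  invisible(files)  # paths of the written files
}

# Usage with made-up data, writing into a temporary directory
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)
out_dir <- file.path(tempdir(), "save_data_sketch")
written <- save_data_sketch(df, out_dir, "wells")
```

The rds format stores a single object that can be read back under any name with `readRDS()`, whereas `load()` on an RData file restores the object under its original name.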

Modelling

Functions for ML modelling

get_predictions()
Get Predictions
resample_dataset()
Resample Dataset
prepare_simulation_donothing_df()
Prepare 'Do Nothing' Simulation Data Frame

Scripts

The folder “scripts” contains all scripts for data preparation, data analysis, ML modelling (mainly relying on functions from the R package “tidymodels”) and plotting.