Tutorial: How to use fhpredict

Install the package

Install the package fhpredict from GitHub, using the package remotes. Install from the “dev” branch to get the latest version:

# install.packages(remotes)

remotes::install_github("kwb-r/fhpredict@dev", build_vignettes = TRUE)

Users

In the data model of the app everything is assigned to a user identified by a unique identifier, the user id. To get an overview on available users and their associated ids, run:

# Get overview on available users in a data frame
users <- fhpredict::api_get_users()

# Print the data frame
users

Bathing Spots

Overview on available bathing spots

Use the function api_get_bathingspot() to get an overview on the bathing spots that are stored in the postgres database. You need to specify the user to whom the bathing spots are associated. See above for how to get a list of available user ids.

# Get overview on the first "limit" bathingspots associated to user with id 3
spots <- fhpredict::api_get_bathingspot(user_id = 3)
spots2 <- fhpredict::api_get_bathingspot(user_id = 3, check = FALSE)
identical(spots, spots2)

fhpredict::api_get_bathingspot(user_id = 33)
fhpredict::api_get_bathingspot(user_id = 33, check = FALSE)

# Number of bathing spots returned
nrow(spots)

By default, only the first bathing spots are considered in the returned data frame. Set the argument limit to a high number so that you get a list of all available bathing spots:

# Get overview on all bathing spots associated to user with id 3
all_spots <- fhpredict::api_get_bathingspot(user_id = 3, limit = 10000)

# Number of bathing spots returned
nrow(all_spots)

Create a new bathing spot

Delete a bathing spot

Fetching data related to one bathing spot

Use the overview as retrieved above to lookup the bathing spot id that you are interested in. Then, call api_get_bathingspot() again, this time by setting the argument spot_id accordingly:

# Lookup bathing spots of interest by name
all_bathing_spots[grep("Jungfernheide", all_bathing_spots$name), c("id", "name")]

# Lookup the bathing spot id and store it in a variable spot_id
spot_id <- 17

# Load the data available for the corresponding bathing spot id
spot <- fhpredict::api_get_bathingspot(user_id = 3, spot_id = spot_id)

It is also possible to get a bathing spot without specifying a user id:

# It is also possible to get a bathing spot without specifying a user id
spot_no_user <- fhpredict::api_get_bathingspot(spot_id = spot_id)

# Is this the same as what we get when a user is specified?
identical(spot, spot_no_user)

Accessing the properties of a bathing spot

The variable spot now contains a list of properties of the selected bathing spot. As the list contains a lot of NULL elements, we remove these elements before looking at the overall structure of the list:

# Look at the overall structure of the list omitting any NULL element
str(kwb.utils::excludeNULL(spot), 1)

Use the $ operator to access the different properties of the bathing spot:

spot$id
spot$nameLong

Polygon defining the “catchment area”

We are especially interested in the coordinates of the polygon that defines the area over which to average the rain data that is assumed to influence the water quality of the bathing spot. This information is stored in the list element area. It can be transformed into a GeoJSON string using the function toJSON() from the jsonlite package. There is also a list element area_coordinates that contains the same coordinates in the form of a two column data frame.

# Polygon coordinates as a GeoJSON string
jsonlite::toJSON(spot$area)

# Polygon coordinates as a data frame
head(spot$area_coordinates)

Water quality measurements

The spot object contains the results of water quality measurements in its list element measurements. Use the (non-exported) function flatten_recursive_list to convert the corresponding recursive list into a data frame.

# Provide the measurements related to the bathing spot
measurements <- fhpredict:::flatten_recursive_list(spot$measurements)

# Have a look at the first measurements (after removing empty columns)
head(kwb.utils::removeEmptyColumns(measurements))

Measurements

Measurements of water quality (concentration of E.coli) are imported from CSV files into the database by the end user. Use the following code to check for which bathing spots measurements are available:

# Set user ID
user_id <- 3

# Get information on bathing spots
spots <- fhpredict::api_get_bathingspot(user_id = user_id, limit = 1000)

# Prepare a named vector of bathing spot IDs to be looped through
spot_ids <- stats::setNames(nm = spots$id)

# Loop through the bathing spots and get the measurements (if any)
measurements <- lapply(spot_ids, function(spot_id) {
  fhpredict::api_get_bathingspot(user_id, spot_id)$measurements
})

# Determine and print the IDs of spots with at least one measurement
n_measurements <- lengths(measurements)

# Show the corresponding number n of measurements
data.frame(
  spot_id = names(n_measurements)[n_measurements > 0], 
  n = unname(n_measurements[n_measurements > 0])
)

Models

Overview on available models

Use the function api_get_model() to get an overview on the models that are stored in the postgres database. You need to pass the user id (here: 3) and the id of the bathing spot (here: 18) to the function.

# Read all models that are stored for one bathing spot of one user
model_info <- fhpredict::api_get_model(user_id = 3, spot_id = 18)

The function returns a data frame with one row per available model:

# Show meta information on the models
model_info

Use this overview on available models to lookup the id of the model that you actually want to fetch from the database.

Fetching a model

Use the model id to fetch a specific model from the database:

# Load model with id 27 from the database
model <- fhpredict::api_get_model(user_id = 3, spot_id = 18, model_id = 3)

# Show the (rstan) model
print(model)

Saving a model

For testing purposes we store a simple, small object instead of a STAN model. We use the cars dataset that is shipped with “base R”. Use the function api_add_model() to add the “model” to the database:

# Look at the head of the cars dataset
head(cars)

# Add the cars dataset to the database
model_id <- fhpredict:::api_add_model(
  user_id = 3, spot_id = 18, model = cars, comment = "Cars in R"
)

The function returns the id of the model that was given by the database. We stored the id in the variable model_id. To check if the model arrived in the database we read it back, again using api_get_model()

# Read the "model" back
my_cars <- fhpredict::api_get_model(3, 18, model_id)

We convince ourselves that what we get is identical to what we stored:

# Nothing should have changed!
identical(cars, my_cars)

Deleting a model

Use the function api_delete_model() to remove a model from the database. Let’s delete the model that we just added. Its id is given in the variable model_id.

# Delete the model with the id given in "model_id"
fhpredict::api_delete_model(3, 18, model_id)

Purification Plants

Use the following script to check if there are any purification plants defined. As there are many bathing spots, the script takes quite a long time and is not run here so that you do not see any outputs.

# Get meta information on all bathing spots
bathing_spots <- fhpredict::api_get_bathingspot(user_id = 3, limit = 10000)

# Number of bathing spots
nrow(bathing_spots)

# Access the "purificationPlants" endpoint for each bathing spot
purification_plants <- lapply(bathing_spots$id, function(spot_id) {
  
  message("spot id = ", spot_id)
  
  result <- fhpredict::safe_postgres_get(path = sprintf(
    "%s/purificationPlants", 
    fhpredict:::path_bathingspot(user_id = 3, spot_id = spot_id)
  ))
  
  result$data
})

# How many bathing spots have purification plants?
sum(lengths(purification_plants) > 0)

Rains

This chapter describes how to

read binary rain data related to a given time interval from files on the Amazon File Server,
spatially select and aggregate rain data that are related to a given bathing spot,
store the rain data in the Postgres database.

Define bathing spot and reference time:

user_id <- 3
spot_id <- 17 # Jungfernheide
sampling_time <- "1050"

There is a top-level function that can be used to perform all steps at once. For the single steps that are performed within this function, see below.

In the following we reduce the range of days for which to load rain data to three days. Omitting date_range or setting it to NULL will instead load rain data for the whole range of dates for which measurements are available.

# Function to be triggered by the user: Provide rain data for the bathing spot
control <- fhpredict::provide_rain_data(
  user_id = 3, spot_id = 17
  #, date_range = as.Date(c("2008-07-01", "2008-07-03"))
)

while (control$remaining) {
  control <- fhpredict::provide_rain_data(control)
}

Check that the data arrived by reloading them from the database:

# Reload rain data from the database
rain <- fhpredict::api_get_rain(user_id = 3, spot_id = 17)

# Show the first records of the rain data
head(rain)

Rains: Details

Read rain data stored for a bathing spot

(rain <- fhpredict::api_get_rain(user_id = 3, spot_id = 43))

Delete all rain data stored for a bathing spot

system.time(fhpredict::api_delete_rain(user_id = 3, spot_id = 43))

Add some fake rain data

n <- 10000
new_rain <- data.frame(
  datum = seq(as.Date("2019-10-10"), by = 1, length.out = n),
  rain = 0.1 * sample(1:10, size = n, replace = TRUE)
)

fhpredict:::is_valid_postgres_api_token(fhpredict:::read_token())

fhpredict:::api_delete_rain(user_id, spot_id)
fhpredict::api_add_rain(user_id = 3, spot_id = 43, rain = new_rain)

date_from <- "20080701"
date_to <- "20080703"

Read rain data from binary Radolan files

# Get URLs to Radolan files to be downloaded and read
urls <- fhpredict:::get_radolan_urls_bucket(
  from = date_from, 
  to = date_to, 
  time = sampling_time, 
  bathing_season_only = TRUE
)

# Read rain data for a certain time period
radolan_stack <- fhpredict:::read_radolan_raster_stack(urls)

# Show the raster images
raster::plot(radolan_stack)

Spacially select and aggregate rain data

We want to use only the rain data that lie within a polygon around the bathing spot. This area is a piece of metadata that are stored in the Postgres database for each bathing spot. We first read all metadata about the bathing spot defined above and provide the area information.

# Get metadata about the current bathing spot
spot <- fhpredict::api_get_bathingspot(spot_id = spot_id)

# Check the name of the bathing spot
spot$nameLong

# Provide the polygon in the same structure as returned by
# select_relevant_rain_area(), a function used by Wolfgang to define a polygon
# using an R Shiny app
area <- fhpredict:::convert_area_structure(spot_area = spot$area)

# Show the structure of the area information (a recursive list)
str(area, 2)

In the next step we use the area information to cut the area regions from the rain data that so far comprises whole Germany:

# Crop the polygons from each raster layer
cropped <- fhpredict:::crop_area_from_radolan_stack(area, radolan_stack)

We check the result by plotting the areas. Important note: It seems that not the polygon is cut but the “extent”, i.e. the smallest possible rectangle that contains the polygon!

raster::plot(cropped)

We now calculate the average rain in each raster layer and provide a simple data frame:

# Get the mean over all layers for each point on the raster
aggregated <- raster::cellStats(cropped, stat = mean)

# The day information can be restored from the names of the layers
dates <- as.Date(substr(names(radolan_stack), 2, 9), format = "%Y%m%d")

rain <- data.frame(
  datum = dates,
  rain = as.numeric(aggregated) / 10
)

Let’s have a look at the created data frame:

rain

Store the rain data in the Postgres database

# Add rain data frame to the database
rain_ids <- fhpredict::api_add_rain(user_id, spot_id, rain)

# Read rain from database
rain <- fhpredict::api_get_rain(user_id, spot_id)

# What should we do with duplicates?
sum(duplicated(kwb.utils::selectColumns(rain, "dateTime")))

# Delete rain data with given rain record ids
fhpredict::api_delete_rain(user_id, spot_id, rain_ids)

# For simplicity reasons: delete all rain data before inserting new data
#fhpredict::api_delete_rain(user_id, spot_id)

# TODO: Remove only the duplicates, keeping the most current values

Discharge data

# Get existing discharge data
discharge_db <- fhpredict::api_get_discharge(user_id, spot_id)

# Show the first records
head(discharge_db)

# Provide discharge test data
discharge <- data.frame(
  dateTime = as.POSIXct(tz = "Europe/Berlin", c(
    "2019-09-09 16:02:00", "2019-09-09 16:03:00"
  )),
  discharge = c(1.0, 1.1)
)

# Add discharge test data to database
ids <- fhpredict:::api_add_discharge(user_id, spot_id, discharge)

# Delete the records that have just been added. Omitting the IDs will remove
# all discharge data related to the bathing spot
fhpredict::api_delete_discharge(user_id, spot_id, ids = ids)

Irradiances and Generic Inputs

# Define user and bathing spot -------------------------------------------------
user_id <- 3
spot_id <- 42

# Define artificial time series ------------------------------------------------
n <- 10
timeseries <- data.frame(
  date = seq(as.Date("2018-07-12"), by = 1, length.out = n), 
  dateTime = "12:13:14",
  value = rnorm(n, 100)
)

# Get overview on generic inputs -----------------------------------------------
fhpredict:::api_get_generic(user_id, spot_id)
fhpredict:::api_get_generic(user_id, spot_id, generic_id = 16)

# Delete existing generic inputs -----------------------------------------------
fhpredict:::api_delete_generic(user_id, spot_id, generic_id = 6)

# Delete all generics!
fhpredict:::api_delete_generic(user_id, spot_id)

# Create a new generic input ---------------------------------------------------
fhpredict:::api_add_generic(user_id, spot_id, name = "generic_1")
fhpredict:::api_add_generic(user_id, spot_id, name = "generic_2")
fhpredict:::api_add_generic(user_id, spot_id, name = "generic_3")

# Add measurements to a specific generic input ---------------------------------
fhpredict:::api_add_generic_measurements(user_id, spot_id, 16, data = timeseries)

# Read back the generic input. Where are the measurements?
fhpredict:::postgres_get(
  fhpredict:::path_generic_measurements(user_id, spot_id, 16)
)

fhpredict:::api_get_generic(user_id, spot_id, generic_id = 16)

# Get irradiance measurements --------------------------------------------------
fhpredict:::api_get_irradiances(user_id, spot_id)

# Delete all irradiance measurements -------------------------------------------
fhpredict:::api_delete_irradiances(user_id, spot_id)

# Add irradiance measurements --------------------------------------------------
fhpredict:::api_add_irradiances(user_id, spot_id, data = timeseries)

Get overview on available data

The following function reads all timeseries that are stored for a bathing spot and returns the range of dates covered as well as the number nof data points.

user_id <- 9
spots <- fhpredict::api_get_bathingspot(user_id)
spot_ids <- setNames(spots$id, spots$name)
summaries <- lapply(spot_ids, fhpredict::get_data_summary, user_id = user_id)