Changelog
wasserportal 0.5.0.9000 (development version)
- Wrap each `httr2::req_perform_parallel()` batch in `tb_push_station_telemetry(mode = "single")` in a batch-level retry loop (4 attempts with 2 / 4 / 8 s backoff). The per-request `retry_on_failure = TRUE` (see the next entry) recovers from a curl-level error on a fresh libcurl handle. However, when the upstream load balancer silently drops a connection in the curl pool, the dead handle stays poisoned across all four configured per-request retries. Every retry hits the same dead handle and dies with “Send failure: Broken pipe” within milliseconds, and the resulting curl condition bubbles up through `req_perform_parallel()` and aborts the whole station. This was observed in the wild after only ~2240/13039 records on station 7045 on 2026-05-13 09:45, with 3 s between the last good POST and the abort and no perceptible retry pause. Retrying the batch as a whole forces httr2 to allocate a new connection on the next attempt. This is safe because the underlying `(ts, key)` telemetry POSTs are idempotent on the ThingsBoard side: a re-POST of an already accepted record overwrites it with the same value and never creates a duplicate row.
- Pass `retry_on_failure = TRUE` to every `httr2::req_retry()` call in `R/push_to_thingsboard.R` (single-mode and bulk telemetry, attributes, latest telemetry, telemetry delete). The default `req_retry()` only retries HTTP responses with selected status codes. Transport-layer dropouts that error out before the request produces a response (TCP “Broken pipe”, peer-closed TLS session, brief DNS hiccups) used to bubble straight up through `httr2::req_perform_parallel()` and abort the whole station mid-push. This was observed in the wild after ~25 min on station 7044 at record ~9030/13362. With `retry_on_failure = TRUE` the same record gets retried up to four times with the existing exponential backoff (2, 4, 8, 16 s). Because ThingsBoard de-duplicates by `(ts, key)`, the retry never produces a duplicate row, even when the first attempt actually reached the server before the connection dropped.
- Add `tb_setup_devices()`, `tb_push_station_telemetry()` and `tb_push_station_attributes()` for shipping Wasserportal time series and master data into a ThingsBoard tenant via the device-token telemetry API. `tb_setup_devices()` bootstraps a fresh tenant from an account-level API key, so the rest of the workflow runs from R alone
- Add `vignettes/thingsboard-demo.Rmd` walking through the ThingsBoard Cloud free-tier (Maker) demo on `eu.thingsboard.cloud`, including the switch to self-hosted Community Edition
- Add `inst/scripts/push_to_thingsboard.R` consuming the daily JSON artefacts on the `gh-pages` branch (no Wasserportal scrape of its own). The script picks the five groundwater stations with the longest combined gwl + gwq history and the most distinct gwq parameters, uploads merged master data as device attributes and pushes both the level and quality time series as telemetry
- Convert `Rechtswert_UTM_33_N` / `Hochwert_UTM_33_N` (ETRS89 / UTM zone 33N, EPSG:25833) to WGS84 `latitude` / `longitude` attributes so ThingsBoard map widgets work out of the box
- Add `.github/workflows/thingsboard-push.yaml` running the script on push to `main` / `master` / `dev`, daily at 07:00 UTC and via `workflow_dispatch`. Credentials are read from the `TB_HOST` and `TB_API_KEY` repository secrets
- Authenticate `tb_setup_devices()` with the `X-Authorization: ApiKey <key>` request header that ThingsBoard expects for account-level API keys (the standard `Authorization: Bearer ...` and the JWT-style `X-Authorization: Bearer ...` variants both return HTTP 401)
- Drop pre-1970 timestamps inside `build_telemetry_payload()`. Some Wasserportal groundwater stations start in the 1950s, which yields negative epoch milliseconds (the Unix/POSIX epoch is defined as 1970-01-01 UTC, see IEEE Std 1003.1, “4.16 Seconds Since the Epoch”). ThingsBoard transports `ts` as a Java `Long` of epoch milliseconds (see the HTTP Device API reference). Negative values are spec-legal, but the Maker free tier, as observed on this branch, responds to such posts with an opaque HTTP 500. Filtering on `ts_ms > 0` keeps the rest of the (post-1970) history flowing through. For station 3 this drops about 17 years of monthly groundwater level readings while preserving the remaining ~7800 values
- Wire a `tb_error_body()` helper into `httr2::req_error(body = ...)` on the telemetry and attributes calls, so future ThingsBoard failures surface the JSON `message` field in the R error instead of the generic “HTTP 500 Internal Server Error” wrapper
- Add `tb_push_latest_telemetry()` for the simplest `{"key": value}` form (server-stamped time). It is used in `inst/scripts/push_to_thingsboard.R` as a smoke test before the bulk push: the bulk array-of-records form returns an opaque HTTP 500 on the ThingsBoard Cloud Maker free tier, even though the same device accepts attribute writes and the simpler per-record format
- Add a `mode` parameter to `tb_push_station_telemetry()` (`"single"` by default, `"bulk"` for self-hosted CE). Single mode POSTs each record as a standalone `{"ts": ms, "values": {...}}` object, so historical telemetry actually goes through on Maker free; bulk mode keeps the previous fast array-per-chunk behaviour for self-hosted CE
- Add a `throttle_seconds` parameter to `tb_push_station_telemetry()` so the inter-request sleep can be tuned per ThingsBoard plan instead of being hardcoded. `NULL` (default) keeps the previous values (50 ms in single mode, 100 ms in bulk mode); pass a positive number to slow down, or `0` to push as fast as the server permits (e.g. self-hosted CE)
- Add `tb_plan_defaults()` and a matching `TB_PLAN` env var so the GitHub Actions push picks `mode`, `chunk_size` and `throttle_seconds` from the per-device transport rate limits documented at https://thingsboard.io/docs/paas/eu/subscriptions/. The `TB_TELEMETRY_MODE`, `TB_CHUNK_SIZE` and `TB_THROTTLE_SECONDS` env vars, added on top of `TB_PLAN`, override individual values without switching plans. Presets:
  - `free` -> `single` mode (proven to work end-to-end on the Maker free tier)
  - `free-bulk` -> bulk preset for Free with `chunk_size = 10` / `throttle_seconds = 1.0`. Confirmed not to work on the public Cloud Maker tier as of 2026-05: the gateway returns the same empty-body HTTP 500 to a 10-record array as it did to the original 100-record one, so the array form is rejected regardless of payload size. Kept as a reproducible baseline
  - `prototype` / `pilot` / `startup` / `business` -> `bulk` with `chunk_size = 30` / `throttle_seconds = 1.0` (~30 data points/s, near the 2 000 data points/min per-device cap shared across all paid tiers)
  - `ce` -> unlimited bulk for self-hosted Community Edition
- Expose the plan and the per-run knobs as `workflow_dispatch` inputs in `thingsboard-push.yaml` (`plan`, `station_ids`, `history_days`, `telemetry_types`) and document the fallback chain (`workflow_dispatch` input -> repository secret -> hardcoded default) in a header comment of the env block. The default plan is `free` (single mode, proven to work); `free-bulk` is exposed as a `workflow_dispatch` option but stays out of the cron path until ThingsBoard lifts the Maker array-form rejection
- Drop the `tb_push_latest_telemetry()` “smoke test” that `inst/scripts/push_to_thingsboard.R` ran per device before the bulk telemetry push. The smoke test posted one value per station with `{"key": value}` (no timestamp, so the server stamped it with the current wall-clock time), originally as a fail-fast probe for the Maker free-tier auth/payload path. The visible side effect was a stale “GW-Stand” row, stamped with the push time, that drowned out the real most-recent measurement in the device’s Latest telemetry view. The historical push fails on its own first POST anyway, so the safety net was redundant. `tb_push_latest_telemetry()` itself stays as an exported helper for ad-hoc connectivity probes
- Add `tb_get_device_id()`, `tb_list_device_telemetry_keys()` and `tb_delete_device_telemetry()` for device discovery (read-only) and selective telemetry cleanup against the ThingsBoard plugin API (`GET /api/tenant/devices`, `GET /api/plugins/telemetry/DEVICE/{id}/keys/timeseries`, `DELETE /api/plugins/telemetry/DEVICE/{id}/timeseries/delete`). All three accept `TB_HOST` / `TB_API_KEY` from the environment, so they can be called from a fresh R session without explicit credentials. Pass `keys = NULL` to `tb_delete_device_telemetry()` to wipe every key the device currently stores; server-side attributes (`latitude`, `longitude`, `Bezirk`, …) are left in place, so the map widget keeps working after a wipe. Stale rows from the now-removed smoke test can also be cleared interactively in the ThingsBoard UI (Device > Latest telemetry > tick the row > trash icon)
- Add `inst/extdata/thingsboard-dashboard.json`, an importable ThingsBoard dashboard for the demo: an OpenStreetMap of the five Berlin groundwater stations, a master-data entities table and two time-series charts (groundwater level, selected quality parameters). All four widgets discover the `wasserportal-gw-*` devices via an `entityName`-prefix alias, so the import works without hardcoding device IDs. The dashboard-level timewindow runs from `1970-01-01 UTC` (POSIX epoch) to `2027-01-01 UTC` with `aggregation = NONE` and `limit = 50000` per series, so the charts return raw measurements over the full Wasserportal archive rather than daily averages. The earlier `AVG` aggregation over the 130-year `1970..2100` window had made ThingsBoard show an indefinite loading spinner whenever the time-window selector was touched. Switching to `NONE` keeps the wide range usable because the server only needs to return up to 50000 sorted raw points per (entity, key) pair, comfortably above the ~16000 GW-Stand and ~8000 GWQ records per station in the Wasserportal archive. The map widget uses the modern `typeFullFqn = "system.map"` reference together with the `latKeyName = "latitude"` / `lngKeyName = "longitude"` settings binding, which the `system.map` widget accepts as a stable backward-compatible attribute mapping, so markers render right after import (an earlier `markers` array variant with `xKey` / `yKey` left the map empty against the same lat/lon attributes)
- Speed up `mode = "single"` with `httr2::req_perform_parallel()`. The previous sequential one-POST-at-a-time loop was network-bound at ~1.2 records/s for the GWQ push (~5 h per station for the full history); concurrent posting with `max_active = 10` lifts that to ~10 records/s. `tb_push_station_telemetry()` gains a `max_active` parameter; `tb_plan_defaults()` returns it per plan (default `10` for Free, `1` elsewhere); the script reads `TB_MAX_ACTIVE` from env / repo secrets through the same `env_or()` plan-fallback chain. Concurrent batches are paced one `max_active` group at a time and retried on transient HTTP 500/502/503/504 with exponential backoff, so the Free tier’s sustained limit of 600 messages/minute per device no longer trips the gateway after ~35 s at 48 records/s (the symptom hit with the initial implementation)
- Send one telemetry record per `(timestamp, key, value)` triple in `mode = "single"` instead of grouping every parameter that shares a timestamp into a single record. Wasserportal groundwater quality data has ~30 analytes per sampling event; the resulting “fat” `values` dicts produced an opaque empty-body HTTP 500 on Cloud Maker, even though the same keys went through one at a time (see the `tb_push_latest_telemetry()` smoke tests). `build_telemetry_payload()` gains a `group_by_ts` parameter (default `TRUE`); the push function turns it off in single mode and keeps grouping in bulk mode for compact array chunks
- Sanitise telemetry keys before serialising the values dict. Wasserportal groundwater quality parameters such as `Leitfaehigkeit 25 grd C vor Ort`, `Wasserst. (ROK) vor`, `pH-Wert (Feld)` or `Temperatur (Wasser)` triggered an opaque HTTP 500 on the Maker free tier when used as raw JSON keys (after the level data had already pushed cleanly). The new `sanitize_tb_key()` helper folds umlauts, drops parentheses and replaces spaces / dots / commas with underscores, so quality data goes through too. Add a `TB_TELEMETRY_TYPES` env var (`"gwl,gwq"` by default) so a partial retry can skip the slow level re-push and only redo the quality push
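The batch-level retry and the `max_active` pacing described in the entries above can be sketched in base R as follows. This is a minimal sketch, not the package's internal code: `with_batch_retry()` and `push_in_batches()` are hypothetical names, and `perform_batch` is an assumed stand-in for whatever sends one group of requests (in the package, an `httr2::req_perform_parallel()` call). Retrying a whole batch is only safe because the `(ts, key)` telemetry POSTs are idempotent on the ThingsBoard side.

```r
# Hypothetical sketch: retry a whole batch with 2 / 4 / 8 s backoff
# (4 attempts in total). `perform_batch` is an assumed stand-in for the
# real httr2::req_perform_parallel() call, not part of the package API.
with_batch_retry <- function(perform_batch, batch, delays = c(2, 4, 8),
                             sleep = Sys.sleep) {
  attempts <- length(delays) + 1L
  for (attempt in seq_len(attempts)) {
    result <- tryCatch(perform_batch(batch), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt == attempts) {
      stop("batch failed after ", attempts, " attempts: ",
           conditionMessage(result), call. = FALSE)
    }
    # Back off, then re-send the whole batch (per the entry above, a new
    # httr2 attempt gets a new connection instead of the poisoned one)
    sleep(delays[attempt])
  }
}

# Push records one `max_active`-sized group at a time, retrying each group
# as a whole. Extra arguments (e.g. `delays`, `sleep`) go to with_batch_retry().
push_in_batches <- function(records, perform_batch, max_active = 10, ...) {
  groups <- split(seq_along(records), ceiling(seq_along(records) / max_active))
  results <- lapply(groups, function(idx, ...) {
    with_batch_retry(perform_batch, records[idx], ...)
  }, ...)
  unlist(results, use.names = FALSE)
}
```

The `sleep` argument is injectable so the backoff schedule can be tested without actually waiting.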
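The conversion of `Rechtswert_UTM_33_N` / `Hochwert_UTM_33_N` to `latitude` / `longitude` can be illustrated with a dependency-free inverse transverse Mercator projection, using the series formulas from Snyder's USGS *Map Projections: A Working Manual*. This is a hypothetical swap-in for illustration, not necessarily how the package converts coordinates; with the `sf` package, `sf::st_transform(x, crs = 4326)` on EPSG:25833 points is the usual route. The sketch assumes northern-hemisphere zone 33 coordinates on the GRS80 ellipsoid and ignores the small ETRS89/WGS84 datum offset, which is negligible for map markers.

```r
# Hypothetical sketch: UTM zone 33N (ETRS89, EPSG:25833) -> geographic
# latitude/longitude in degrees, via the inverse transverse Mercator series.
utm33_to_wgs84_sketch <- function(easting, northing) {
  a    <- 6378137               # GRS80 semi-major axis [m]
  f    <- 1 / 298.257222101     # GRS80 flattening
  k0   <- 0.9996                # UTM central scale factor
  lon0 <- 15 * pi / 180         # central meridian of zone 33
  e2   <- f * (2 - f)           # first eccentricity squared
  ep2  <- e2 / (1 - e2)         # second eccentricity squared

  x  <- easting - 500000        # remove the false easting
  m  <- northing / k0           # northern hemisphere: no false northing
  mu <- m / (a * (1 - e2 / 4 - 3 * e2^2 / 64 - 5 * e2^3 / 256))
  e1 <- (1 - sqrt(1 - e2)) / (1 + sqrt(1 - e2))

  # Footpoint latitude
  phi1 <- mu + (3 * e1 / 2 - 27 * e1^3 / 32) * sin(2 * mu) +
    (21 * e1^2 / 16 - 55 * e1^4 / 32) * sin(4 * mu) +
    (151 * e1^3 / 96) * sin(6 * mu) +
    (1097 * e1^4 / 512) * sin(8 * mu)

  c1 <- ep2 * cos(phi1)^2
  t1 <- tan(phi1)^2
  n1 <- a / sqrt(1 - e2 * sin(phi1)^2)
  r1 <- a * (1 - e2) / (1 - e2 * sin(phi1)^2)^1.5
  d  <- x / (n1 * k0)

  lat <- phi1 - (n1 * tan(phi1) / r1) * (
    d^2 / 2 -
      (5 + 3 * t1 + 10 * c1 - 4 * c1^2 - 9 * ep2) * d^4 / 24 +
      (61 + 90 * t1 + 298 * c1 + 45 * t1^2 - 252 * ep2 - 3 * c1^2) * d^6 / 720
  )
  lon <- lon0 + (
    d -
      (1 + 2 * t1 + c1) * d^3 / 6 +
      (5 - 2 * c1 + 28 * t1 - 3 * c1^2 + 8 * ep2 + 24 * t1^2) * d^5 / 120
  ) / cos(phi1)

  data.frame(latitude = lat * 180 / pi, longitude = lon * 180 / pi)
}
```

The function is vectorised, so whole `Rechtswert` / `Hochwert` columns can be passed in at once.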
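The `ts_ms > 0` filter from the pre-1970 entry amounts to the following sketch. `to_epoch_ms()` and `drop_pre_epoch()` are hypothetical helpers, and the input is assumed to carry a `timestamp` column that `as.POSIXct()` can parse as UTC. Note that under a strict `> 0` filter, a reading at exactly 1970-01-01 00:00:00 UTC (0 ms) is dropped as well.

```r
# Hypothetical sketch: convert timestamps to epoch milliseconds and keep
# only strictly positive values (i.e. after the 1970-01-01 UTC epoch).
to_epoch_ms <- function(x) {
  round(as.numeric(as.POSIXct(x, tz = "UTC")) * 1000)
}

drop_pre_epoch <- function(df) {
  df$ts_ms <- to_epoch_ms(df$timestamp)
  df[df$ts_ms > 0, , drop = FALSE]
}

readings <- data.frame(
  timestamp = c("1955-03-01 00:00:00", "1970-01-01 00:00:00",
                "2001-09-09 01:46:40"),
  value = c(1.5, 2.5, 3.5)
)
kept <- drop_pre_epoch(readings)
```

Here only the 2001 reading survives, with `ts_ms` equal to `1e12`.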
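The plan presets and the env-var overrides can be pictured as a lookup table plus an "environment variable if non-empty, else plan default" rule. This sketch uses hypothetical names (`plan_defaults_sketch()`, `env_or_sketch()`, `resolve_settings_sketch()`) and is not the return value of `tb_plan_defaults()`. `NA` marks values the entries above do not state (the single-mode throttle, which falls back to the function default, and the CE chunk size). The `max_active` value for `free-bulk` and the `0` throttle for `ce` are assumptions.

```r
# Hypothetical sketch of per-plan presets. NA = not stated; the push
# function's own default would apply.
plan_defaults_sketch <- function(plan = "free") {
  paid <- list(mode = "bulk", chunk_size = 30, throttle_seconds = 1.0,
               max_active = 1)
  presets <- list(
    free        = list(mode = "single", chunk_size = NA,
                       throttle_seconds = NA, max_active = 10),
    # assumption: max_active only matters in single mode
    `free-bulk` = list(mode = "bulk", chunk_size = 10,
                       throttle_seconds = 1.0, max_active = 1),
    prototype = paid, pilot = paid, startup = paid, business = paid,
    # assumption: "unlimited" is expressed as no throttling
    ce          = list(mode = "bulk", chunk_size = NA,
                       throttle_seconds = 0, max_active = 1)
  )
  if (!plan %in% names(presets)) {
    stop("unknown plan: ", plan, call. = FALSE)
  }
  presets[[plan]]
}

# Use the environment variable when it is set and non-empty.
env_or_sketch <- function(var, default) {
  value <- Sys.getenv(var, unset = "")
  if (nzchar(value)) value else default
}

# TB_PLAN picks the preset; the individual env vars override single values.
resolve_settings_sketch <- function() {
  p <- plan_defaults_sketch(env_or_sketch("TB_PLAN", "free"))
  list(
    mode             = env_or_sketch("TB_TELEMETRY_MODE", p$mode),
    chunk_size       = as.numeric(env_or_sketch("TB_CHUNK_SIZE", p$chunk_size)),
    throttle_seconds = as.numeric(env_or_sketch("TB_THROTTLE_SECONDS",
                                                p$throttle_seconds)),
    max_active       = as.integer(env_or_sketch("TB_MAX_ACTIVE", p$max_active))
  )
}
```

For example, `TB_PLAN=startup` with `TB_CHUNK_SIZE=20` yields bulk mode with 20-record chunks while keeping the preset's 1 s throttle.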
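The difference between grouped (bulk) records and one record per `(timestamp, key, value)` triple (single mode) can be sketched on a long data frame. `build_records_sketch()` is a hypothetical helper that only mirrors the `group_by_ts` switch described above; it is not `build_telemetry_payload()` itself. The input is assumed to have `ts_ms`, `key` and `value` columns.

```r
# Hypothetical sketch of the two payload shapes.
build_records_sketch <- function(df, group_by_ts = TRUE) {
  if (group_by_ts) {
    # Bulk-style: one record per timestamp, all keys sharing it in `values`
    groups <- split(df, df$ts_ms)
    unname(lapply(groups, function(g) {
      list(ts = g$ts_ms[1],
           values = as.list(stats::setNames(g$value, g$key)))
    }))
  } else {
    # Single-style: one record per (timestamp, key, value) triple
    lapply(seq_len(nrow(df)), function(i) {
      list(ts = df$ts_ms[i],
           values = stats::setNames(list(df$value[i]), df$key[i]))
    })
  }
}

gwq <- data.frame(
  ts_ms = c(1000, 1000, 2000),
  key   = c("pH-Wert_Feld", "Temperatur_Wasser", "pH-Wert_Feld"),
  value = c(7.1, 11.2, 7.3)
)
grouped <- build_records_sketch(gwq, group_by_ts = TRUE)   # 2 records
single  <- build_records_sketch(gwq, group_by_ts = FALSE)  # 3 records
```

Each record could then be serialised into the `{"ts": ..., "values": {...}}` shape, for instance with `jsonlite::toJSON(record, auto_unbox = TRUE, digits = NA)`.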
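The key sanitisation described in the last entry can be approximated as below. `sanitize_key_sketch()` is a hypothetical stand-in for `sanitize_tb_key()`: the umlaut folding, parenthesis removal and space/dot/comma replacement follow the entry. Collapsing repeated underscores and trimming leading or trailing ones are extra assumptions, so the real helper's output may differ.

```r
# Hypothetical sketch of telemetry key sanitisation.
sanitize_key_sketch <- function(x) {
  # Fold German umlauts and sharp s (escapes keep the source ASCII-only)
  folds <- c("\u00e4" = "ae", "\u00f6" = "oe", "\u00fc" = "ue",
             "\u00c4" = "Ae", "\u00d6" = "Oe", "\u00dc" = "Ue",
             "\u00df" = "ss")
  for (ch in names(folds)) {
    x <- gsub(ch, folds[[ch]], x, fixed = TRUE)
  }
  x <- gsub("[()]", "", x)    # drop parentheses
  x <- gsub("[ .,]", "_", x)  # spaces, dots and commas -> underscore
  x <- gsub("_+", "_", x)     # assumption: collapse runs of underscores
  gsub("^_|_$", "", x)        # assumption: trim leading/trailing underscores
}

sanitize_key_sketch(c("Wasserst. (ROK) vor", "pH-Wert (Feld)"))
```

This yields `"Wasserst_ROK_vor"` and `"pH-Wert_Feld"`. Hyphens are left untouched because the entry does not mention them.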
wasserportal 0.5.0 2026-05-07
- Modernize GitHub Actions workflows: use `r-lib/actions/setup-r-dependencies@v2` and `r-lib/actions/check-r-package@v2` on `ubuntu-latest` instead of the deprecated v2 / `ubuntu-20.04` / `r-hub/sysreqs` toolchain
- Bump JavaScript actions to Node-24-compatible versions (`actions/checkout@v5`, `actions/upload-artifact@v5`) and set `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true` so transitive `r-lib/actions/*@v2` steps opt into Node 24 as well, ahead of the June 2nd 2026 deprecation of Node 20 on GitHub Actions runners
- Add Claude Code review workflows (`claude.yaml`, `claude-code-review.yaml`)
- `get_wasserportal_master_data()`: match the new HTML5 markup of the master-data table (`<caption>Pegel Berlin</caption>` instead of the legacy `summary="Pegel Berlin"` attribute)
- Decode wasserportal pages explicitly as `windows-1252`. The pages declare UTF-8 in `<meta charset>`, but the server actually emits Latin-1 bytes (e.g. `0xE4` for `ä`); trusting the meta declaration left those bytes mis-marked as UTF-8 and broke the `ä` -> `ae` / `ü` -> `ue` substitutions of `subst_special_chars()` on Windows R
- Bypass `rvest::html_table()` and `xml2::xml_text(trim = TRUE)` in `get_wasserportal_master_data()` and `get_wasserportal_stations_table()`: both delegate to a `sub("^[[:space:] ]+", ...)` pass that fails on Windows R when the cell text contains umlauts. Tables are now extracted directly via `xml2` and trimmed with a locale-safe `gsub(..., useBytes = TRUE)` helper (`trim_bytes()`)
- Make `get_stations()` and `get_wasserportal_masters_data()` resilient when parallel workers cannot fetch a station overview: load the `wasserportal` namespace into the cluster and drop `try-error` results before `data.table::rbindlist()` / `dplyr::left_join()`
- Make live-HTTP tests skip gracefully when `wasserportal.berlin.de` is unreachable from the test host (CRAN, sandboxed CI)
- Update `get_wasserportal_masters_data()` test expectations to include the new `Anmerkung` column that wasserportal added to surface-water master data
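The two encoding fixes above can be sketched in base R: decode the raw response bytes as Windows-1252 regardless of what `<meta charset>` claims, then trim whitespace byte-wise. `decode_cp1252_sketch()` and `trim_bytes_sketch()` are hypothetical names, not the package's code. The trim sketch only strips ASCII whitespace and assumes UTF-8 input; the real `trim_bytes()` may handle more characters.

```r
# Hypothetical sketch: decode raw page bytes as Windows-1252 ("CP1252"),
# ignoring the page's own UTF-8 declaration.
decode_cp1252_sketch <- function(raw_bytes) {
  iconv(rawToChar(raw_bytes), from = "CP1252", to = "UTF-8")
}

# Hypothetical sketch: byte-wise trim of leading/trailing ASCII whitespace.
trim_bytes_sketch <- function(x) {
  out <- gsub("^[ \t\r\n]+|[ \t\r\n]+$", "", x, useBytes = TRUE)
  # Assumption: the input is UTF-8, so (re-)mark the result as UTF-8
  Encoding(out) <- "UTF-8"
  out
}

# 0xE4 is "ä" in Windows-1252, as in the entry above
word <- decode_cp1252_sketch(as.raw(c(0x4d, 0xe4, 0x72, 0x7a)))
```

Here `word` is the UTF-8 string "März", so later substitutions such as `ä` -> `ae` can match it.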
wasserportal 0.4.0 2024-04-05
- New feature: add support for downloading all available surface water quality data for one or multiple monitoring stations. For details see `get_surfacewater_qualities()`
- Bugfix for groundwater level and quality due to new Wasserportal API
- Add project AD4GD as funder
wasserportal 0.3.0 2023-02-19
- Fix errors in GitHub actions: use actions from branches `v2`, `v3`, not from `master`
- Fix errors in tutorial.Rmd
- Fix errors in documentation
- Do not run examples that use parallel processing
- `get_stations()`: add argument `n_cores`
- `get_wasserportal_stations_table()`: use new (three-letter) variable codes
- `read_wasserportal_raw()`: adapt request to new API version, add argument `api_version`
- `read_wasserportal_raw_gw()`: adapt request to new API version
- Clean code, mainly to reduce duplication and to improve readability
- check for more errors
- use “safe” element selection
- use more helper functions
- use helper functions in vignettes
- improve names
wasserportal 0.2.0 2022-09-08
- Add functions for exporting time series data to `zip` files (`list_timeseries_data_to_zip()`) and master data to `csv` files (`list_masters_data_to_csv()`), which will be uploaded to https://kwb-r.github.io/wasserportal/<filename>
- In addition, `import` functions for downloading and importing the datasets above into R as lists were added (`wp_timeseries_data_to_list()`, `wp_masters_data_to_list()`)
- Code cleaning by @hsonne started
- Fix `master data` requests by using the `master_url` instead of `station_id`, as the latter was not unique. `get_wasserportal_master_data()` and its wrapper function `get_wasserportal_masters_data()` now require `master_url` instead of `station_id` as input parameter. `get_stations()` now adds the column `stammdaten_link` to each element of the sublist `overview_list`.
- Fix scraping of `groundwater level` data so that all available monitoring stations are covered (instead of only 5!) and export them to a `.csv` file. In addition, switch the `groundwater quality` export from `.json` to `.csv` as well to reduce storage space (the `stations_gwq_data.json` file is already 98.4 MB).
- Add function `get_daily_surfacewater_data()` and adapt the Surface Water article for scraping all available daily surface water data and exporting one `.csv` file per parameter (containing all monitoring stations)
- Deactivate PROMISCES workflows (see wasserportal v0.1.0) due to failing Zenodo download. They will be moved into their own R package, most probably kwb.promisces.
wasserportal 0.1.1 2022-06-09
- Fix bug in `get_wasserportal_stations_table()`, which now correctly names the parameter `temperature` (formerly incorrectly `level`)
- Fix Surface Water article
- Adapt Zenodo DOI badge to cite always latest release
wasserportal 0.1.0 2022-06-01
R package for scraping groundwater data (groundwater level and quality) from Wasserportal Berlin. Please note that the support for scraping surface water monitoring stations is currently very limited!
Functions:
- `get_stations()`: returns metadata for all available monitoring stations
- `get_wasserportal_masters_data()`: get master data for selected `station_ids`
- `read_wasserportal_raw_gw()`: enables the download of `groundwater` data. Check out the Tutorial article on how to use it for downloading one or multiple stations at once.
- `read_wasserportal()`: works for `surface water` monitoring stations, but is outdated, as it is based on an outdated static file with station/variable names (i.e. only for `11` instead of the `82` stations currently provided!) instead of relying on new metadata provided online. This will be fixed within the next release. For progress on this issue check out #21
Workflows:
- Tutorial article on how to download groundwater level and quality data
- Further usage by combining previously scraped (see tutorial above) data and performing some analysis:
  - Groundwater, e.g. creating a map with GW level trends
  - Two workflows (REACH UBA, Norman List) created within the project PROMISCES for assessing the prevalence and spatial distribution of persistent, mobile and toxic (PMT) substances in the Berlin groundwater, based on different PMT lists, i.e. REACH UBA or Norman List.
wasserportal 0.0.0.9000
- Added a `NEWS.md` file to track changes to the package.
- See https://style.tidyverse.org/news.html for writing a good `NEWS.md`.