Changelog
Source: NEWS.md
wasserportal 0.7.0 2026-06-19
- Add
`tb_login()` and username/password (JWT) authentication across the ThingsBoard tenant-API helpers (`tb_setup_devices()`, `tb_get_device_id()`, `tb_list_device_telemetry_keys()`, `tb_delete_device_telemetry()`). Self-hosted ThingsBoard Community Edition (e.g. https://dashboards.inowas.org) has no account-level API keys – that is a ThingsBoard Cloud convenience – so it can only be reached via `POST /api/auth/login` (username + password), which returns a short-lived JWT sent as `X-Authorization: Bearer <token>`. Each helper now resolves its auth header via an internal `tb_auth_header()`: set `TB_USERNAME` + `TB_PASSWORD` (these take precedence over `TB_API_KEY` when both are present) and, for self-hosted instances, `TB_PLAN=ce` (bulk mode, no throttling). The account-level API-key path (ThingsBoard Cloud) keeps working unchanged, and the device-token telemetry push (`/api/v1/{token}/telemetry`) is identical on all editions. The `thingsboard-push.yaml` workflow reads the two new credentials from the `TB_USERNAME` / `TB_PASSWORD` repository secrets.
- Make the station selection of
`inst/scripts/push_to_thingsboard.R` configurable for full (non-demo) pushes: `TB_MAX_DEVICES=0` lifts the 5-device cap (push every candidate station), and a new `TB_STATION_SCOPE` chooses which groundwater stations qualify – `both` (default: level AND quality, the proven demo set), `any` (level OR quality), `gwl` / `gwq` (has that series, possibly both) or `gwl-only` / `gwq-only` (has only that series). Both knobs are exposed as `thingsboard-push.yaml` repository secrets and `workflow_dispatch` inputs. Distinct gwq parameters per station are now counted once via `split()` + `vapply()` instead of a per-station table rescan, so scoring the full several-hundred-station pool stays fast. The helper returns a plain named integer vector (matching the idiom already used in `R/get_stations.R` and `R/inspect_gh_pages_zips.R`) instead of the 1-D array that an intermediate `tapply()` implementation produced.
- Update the Kompetenzzentrum Wasser Berlin (KWB) author logo in
`_pkgdown.yml` to the new brand asset (`logos.kompetenz-wasser.io/KWB_Logo_M_Blau_RGB.svg`).
- Declare
`Depends: R (>= 4.1.0)` – the package (notably `R/push_to_thingsboard.R` and `inst/scripts/push_to_thingsboard.R`) uses the native `|>` pipe, which `R CMD check` otherwise flags as an undeclared dependency – and drop the unused `LazyData` field (there is no `data/` directory, so `R CMD build` omitted it anyway).
- Warn in
`tb_auth_header()` when only one of `TB_USERNAME` / `TB_PASSWORD` is set and a leftover `TB_API_KEY` causes a silent fallback to the Cloud API-key path. The typical misconfiguration (workflow secret missing on one of the two JWT credentials with a stale `TB_API_KEY` still populated) used to surface only as a generic `auth: account API key` log line; the new `warning()` calls out the misconfiguration so the user can fix the missing secret instead of chasing a wrong-credentials failure further downstream. The pure Cloud (only `TB_API_KEY`) and pure JWT (both `TB_USERNAME` and `TB_PASSWORD`) paths stay quiet, and the existing `stop()` for the no-credentials case is unchanged.
- Clarify the station-selection diagnostic in
`inst/scripts/push_to_thingsboard.R`: the row previously labelled `with gwl AND gwq` is now `in both master files AND has both series` because it is the only row that intersects `master_gwl` and `master_gwq` (strict), while the per-series rows (`with gwl data`, `with gwq data`) count against the union of the two master files (relaxed). Stations that appear in only one master file but have rows in both gwl and gwq data were being counted in the per-series totals but not in the intersect total, so the displayed numbers did not add up the way a reader expected when the two master files don’t perfectly overlap. An inline comment documents the intentional asymmetry.
- Make
`tb_login()` more robust against flaky upstreams: widen the retry predicate from the `httr2` default (HTTP 429 / 503 only) to `{408, 429, 500, 502, 503, 504}` and bump `max_tries` from 3 to 4, matching the predicate already used by `tb_push_station_telemetry()`. `POST /api/auth/login` is idempotent, so retrying is safe; this keeps `tb_setup_devices()` from aborting on a cold-start 500 / 502 / 504 from a self-hosted ThingsBoard sitting behind nginx or a load balancer. Also document the trade-off that a non-2xx response surfaces an excerpt of the server’s response body (via `tb_error_body()`, up to ~800 chars) in the R error and `req_retry()` retry messages – stock ThingsBoard only echoes the error description, so the password does not leak, but operators of self-hosted instances whose reverse proxy echoes request fields back in the error body should mask the relevant secrets in their CI config.
- Validate the numeric
`TB_*` environment variables (`TB_MAX_DEVICES`, `TB_HISTORY_DAYS`, `TB_CHUNK_SIZE`, `TB_THROTTLE_SECONDS`, `TB_MAX_ACTIVE`) up front in `inst/scripts/push_to_thingsboard.R` and abort with a clear message when a value is not a number, instead of letting an `NA` crash a downstream `if (x > 0)` only after every device attribute set has already been pushed. The message points out the usual cause: `.Renviron` does not support inline `# comments`, so `TB_HISTORY_DAYS = 7 # ...` otherwise coerces to `NA`.
- Clean up the public signature of
`tb_list_device_telemetry_keys()` by dropping the `auth` argument that 0.7.0 had briefly added for the chained-call case. The single in-package consumer (`tb_delete_device_telemetry()`) now calls an internal `tb_list_device_telemetry_keys_impl(device_id, auth, host)` that takes the pre-resolved `X-Authorization` header, so the one-round-trip saving is preserved without mixing “pass me credentials” and “skip credentials, here’s the header” in the same exported function. Removes the silent precedence where `auth = "Bearer ..."` together with `api_key` / JWT credentials would have ignored the latter without warning.
- Improve the station-selection block of
`inst/scripts/push_to_thingsboard.R` in two ways. First, flag orphan stations – IDs that have rows in `gwl_data` / `gwq_data` but are missing from both master files – with a `message()` listing the count and the first few IDs. Every scope intersects its candidate set with `master_union`, so those orphans are silently dropped from the candidate pool; without the message a master / data drift would be invisible in the diagnostic counts. Second, add a new `in either master AND has both series` row to the diagnostic block (computed as `master_union ∩ ids_gwl ∩ ids_gwq`) so that the row-sum identity `with_gwl + with_gwq - both = only_gwl + only_gwq + both` actually holds for readers scanning the message; the existing strict row is renamed to `strict: in both masters AND both series` and gets an inline `(strict: master_intersect)` annotation so the intentional asymmetry against `master_union` stays visible.
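The credential precedence described for `tb_auth_header()` in the entries above can be illustrated with a minimal sketch. The function name `resolve_tb_auth()`, its arguments and its return values are hypothetical and only classify the inputs; the package's internal helper may be structured differently, and no HTTP request is made here.

```r
# Hypothetical helper (not part of the package): decides which credential
# path applies. It only classifies the inputs and performs no HTTP request.
resolve_tb_auth <- function(username = "", password = "", api_key = "") {
  has_user <- nzchar(username)
  has_pass <- nzchar(password)
  has_key <- nzchar(api_key)

  # Both JWT credentials present: they take precedence over any API key
  if (has_user && has_pass) {
    return("jwt")
  }

  if (has_key) {
    # Exactly one JWT credential plus a leftover API key: warn, then fall back
    if (xor(has_user, has_pass)) {
      warning(
        "Only one of TB_USERNAME / TB_PASSWORD is set; ",
        "falling back to TB_API_KEY"
      )
    }
    return("api_key")
  }

  # Nothing usable was supplied
  stop("No ThingsBoard credentials found")
}

# Both JWT credentials win even though an API key is also present
resolve_tb_auth("alice", "secret", "old-key")
```

In a real session the three arguments would presumably come from `Sys.getenv("TB_USERNAME")`, `Sys.getenv("TB_PASSWORD")` and `Sys.getenv("TB_API_KEY")`.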
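The `TB_STATION_SCOPE` values and the row-sum identity from the entries above can be sketched with base R set operations. The function `select_candidates()` and the example IDs are assumptions made for illustration; only the scope names and the intersection with `master_union` come from the changelog.

```r
# Hypothetical sketch: map a scope name to a candidate set of station IDs.
select_candidates <- function(scope, ids_gwl, ids_gwq, master_union) {
  pool <- switch(
    scope,
    "both" = intersect(ids_gwl, ids_gwq),    # level AND quality
    "any" = union(ids_gwl, ids_gwq),         # level OR quality
    "gwl" = ids_gwl,                         # has level data, possibly both
    "gwq" = ids_gwq,                         # has quality data, possibly both
    "gwl-only" = setdiff(ids_gwl, ids_gwq),  # level data only
    "gwq-only" = setdiff(ids_gwq, ids_gwl),  # quality data only
    stop("Unknown scope: ", scope)
  )
  # Every scope is intersected with the union of the two master files,
  # so IDs without a master record (orphans) drop out here
  intersect(pool, master_union)
}

# Made-up example IDs; "9" has level data but no master record (an orphan)
ids_gwl <- c("1", "2", "3", "9")
ids_gwq <- c("2", "3", "4")
master_union <- c("1", "2", "3", "4", "5")

with_gwl <- length(select_candidates("gwl", ids_gwl, ids_gwq, master_union))
with_gwq <- length(select_candidates("gwq", ids_gwl, ids_gwq, master_union))
both <- length(select_candidates("both", ids_gwl, ids_gwq, master_union))
only_gwl <- length(select_candidates("gwl-only", ids_gwl, ids_gwq, master_union))
only_gwq <- length(select_candidates("gwq-only", ids_gwl, ids_gwq, master_union))

# Row-sum identity, counted against master_union on both sides
with_gwl + with_gwq - both == only_gwl + only_gwq + both
```

Because every count is taken against `master_union`, both sides of the identity equal the size of the `any` set.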
wasserportal 0.6.0 2026-06-17
- Wrap each
`httr2::req_perform_parallel()` batch in `tb_push_station_telemetry()` `mode = "single"` in a batch-level retry loop (4 attempts with `2 / 4 / 8 s` backoff). The per-request `retry_on_failure = TRUE` added in the previous bullet recovers from a curl-level error on a fresh libcurl handle, but when the upstream load balancer silently drops a connection in the curl pool the dead handle stays poisoned across all four configured per-request retries: every retry hits the same dead handle and dies with “Send failure: Broken pipe” within milliseconds, the resulting curl condition bubbles up through `req_perform_parallel()` and aborts the whole station (observed in the wild after only ~2240/13039 records on station 7045 on 2026-05-13 09:45, 3 s between last good POST and the abort – no perceptible retry pause). Retrying the batch as a whole forces httr2 to allocate a new connection on the next attempt and is safe because the underlying `(ts, key)` telemetry POSTs are idempotent on the ThingsBoard side – a re-POST of an already accepted record overwrites itself with the same value, never creates a duplicate row.
- Pass
`retry_on_failure = TRUE` to every `httr2::req_retry()` call in `R/push_to_thingsboard.R` (single-mode and bulk telemetry, attributes, latest telemetry, telemetry delete). The default `req_retry()` only retries HTTP responses with selected status codes; transport-layer dropouts that error out before the request produces a response (TCP “Broken pipe”, peer-closed TLS session, brief DNS hiccups) used to bubble straight up through `httr2::req_perform_parallel()` and abort the whole station mid push – observed in the wild after ~25 min on station 7044 at record ~9030/13362. With `retry_on_failure = TRUE` the same record gets retried up to four times with the existing exponential backoff (2, 4, 8, 16 s), and because ThingsBoard de-duplicates by `(ts, key)` the retry never produces a duplicate row even when the first attempt actually reached the server before the connection dropped.
- Add
`tb_setup_devices()`, `tb_push_station_telemetry()` and `tb_push_station_attributes()` for shipping Wasserportal time series and master data into a ThingsBoard tenant via the device-token telemetry API. `tb_setup_devices()` bootstraps a fresh tenant from an account-level API key, so the rest of the workflow runs from R alone
- Add
`vignettes/thingsboard-demo.Rmd` walking through the ThingsBoard Cloud free-tier (Maker) demo on `eu.thingsboard.cloud`, including the switch to self-hosted Community Edition
- Add
`inst/scripts/push_to_thingsboard.R` consuming the daily JSON artefacts on the `gh-pages` branch (no Wasserportal scrape of its own). The script picks the five groundwater stations with the longest combined gwl + gwq history and the most distinct gwq parameters, uploads merged master data as device attributes and pushes both the level and quality time series as telemetry
- Convert
`Rechtswert_UTM_33_N` / `Hochwert_UTM_33_N` (ETRS89 / UTM zone 33N, EPSG:25833) to WGS84 `latitude` / `longitude` attributes so ThingsBoard map widgets work out of the box
- Add
`.github/workflows/thingsboard-push.yaml` running the script on push to `main` / `master` / `dev`, daily at 07:00 UTC and via `workflow_dispatch`. Credentials are read from the `TB_HOST` and `TB_API_KEY` repository secrets
- Authenticate
`tb_setup_devices()` with the `X-Authorization: ApiKey <key>` request header that ThingsBoard expects for account-level API keys (the standard `Authorization: Bearer ...` and the JWT-style `X-Authorization: Bearer ...` variants both return HTTP 401)
- Drop pre-1970 timestamps inside
`build_telemetry_payload()`. Some Wasserportal groundwater stations start in the 1950s, which yields negative epoch milliseconds (the Unix/POSIX epoch is defined as 1970-01-01 UTC, see IEEE Std 1003.1, “4.16 Seconds Since the Epoch”). ThingsBoard transports `ts` as a Java `Long` of epoch milliseconds (see the HTTP Device API reference); negative values are spec-legal but the Maker free tier observed in this branch responds with an opaque HTTP 500 to such posts. Filtering `ts_ms > 0` keeps the rest of the (post-1970) history flowing through. For station 3 this drops about 17 years of monthly groundwater level readings while preserving the remaining ~7800 values
- Wire a
`tb_error_body()` helper into `httr2::req_error(body = ...)` on the telemetry and attributes calls so future ThingsBoard failures surface the JSON `message` field in the R error instead of the generic “HTTP 500 Internal Server Error” wrapper
- Add
`tb_push_latest_telemetry()` for the simplest `{"key": value}` form (server-stamped time). Used in `inst/scripts/push_to_thingsboard.R` as a smoke test before the bulk push: the bulk array-of-records form returns an opaque HTTP 500 on the ThingsBoard Cloud Maker free tier even though the same device accepts attribute writes and the simpler per-record format
- Add a
`mode` parameter to `tb_push_station_telemetry()` (`"single"` by default, `"bulk"` for self-hosted CE). Single mode POSTs each record as a standalone `{"ts": ms, "values": {...}}` object so historical telemetry actually goes through on Maker free; bulk mode keeps the previous fast array-per-chunk behaviour for self-hosted CE
- Add a
`throttle_seconds` parameter to `tb_push_station_telemetry()` so the inter-request sleep can be tuned per ThingsBoard plan instead of being hardcoded. `NULL` (default) keeps the previous values (50 ms in single mode, 100 ms in bulk mode); pass a non-zero number to slow down or `0` to push as fast as the server permits (e.g. self-hosted CE)
- Add
`tb_plan_defaults()` and a matching `TB_PLAN` env var so the GH-Actions push picks `mode`, `chunk_size` and `throttle_seconds` from the per-device transport rate limits documented at https://thingsboard.io/docs/paas/eu/subscriptions/. Presets:
  - `free` -> `single` mode (proven to work end-to-end on the Maker free tier);
  - `free-bulk` -> bulk preset for Free with `chunk_size = 10` / `throttle_seconds = 1.0`; confirmed not to work on the public Cloud Maker tier as of 2026-05 – the gateway returns the same empty-body HTTP 500 to a 10-record array as it did to the original 100-record one, so the array form is rejected regardless of payload size. Kept as a reproducible baseline.
  - `prototype` / `pilot` / `startup` / `business` -> `bulk` with `chunk_size = 30` / `throttle_seconds = 1.0` (~30 dp/s, near the 2 000 dp/min per-device cap shared across all paid tiers);
  - `ce` -> unlimited bulk for self-hosted Community Edition.

  Add `TB_TELEMETRY_MODE`, `TB_CHUNK_SIZE` and `TB_THROTTLE_SECONDS` env vars on top of `TB_PLAN` so individual values can be overridden without switching plans
- Expose the plan and the per-run knobs as
`workflow_dispatch` inputs in `thingsboard-push.yaml` (`plan`, `station_ids`, `history_days`, `telemetry_types`) and document the workflow_dispatch input -> repository secret -> hardcoded default fallback chain in a header comment of the env block. The default plan is `free` (single mode, proven to work); `free-bulk` is exposed as a workflow_dispatch option but stays out of the cron path until ThingsBoard lifts the Maker array-form rejection
- Drop the
`tb_push_latest_telemetry()` “smoke test” that `inst/scripts/push_to_thingsboard.R` ran per device before the bulk telemetry push. The smoke test posted one value per station with `{"key": value}` (no timestamp – server stamped with the current wall-clock time), originally as a fail-fast probe for the Maker free-tier auth/payload path. The visible side effect was a stale “GW-Stand =@ ” row that drowned out the real most-recent measurement in the device’s Latest telemetry view. The bulk historical push fails on its own first POST anyway, so the safety net was redundant. `tb_push_latest_telemetry()` itself stays as an exported helper for ad-hoc connectivity probes
- Add
`tb_get_device_id()`, `tb_list_device_telemetry_keys()` and `tb_delete_device_telemetry()` for read-only device discovery and selective telemetry cleanup against the ThingsBoard plugin API (`GET /api/tenant/devices`, `GET /api/plugins/telemetry/DEVICE/{id}/keys/timeseries`, `DELETE /api/plugins/telemetry/DEVICE/{id}/timeseries/delete`). All three accept `TB_HOST` / `TB_API_KEY` from the environment so they can be called from a fresh R session without explicit credentials. Pass `keys = NULL` to `tb_delete_device_telemetry()` to wipe every key the device currently stores; server-side attributes (latitude, longitude, Bezirk, …) are left in place so the map widget keeps working after a wipe. Stale rows from the now-removed smoke test can also be cleared interactively in the ThingsBoard UI (Device > Latest telemetry > tick the row > trash icon)
- Add
`inst/extdata/thingsboard-dashboard.json`, an importable ThingsBoard dashboard for the demo: an OpenStreetMap of the five Berlin groundwater stations, a master-data entities table and two time-series charts (groundwater level, selected quality parameters). All four widgets discover the `wasserportal-gw-*` devices via an `entityName`-prefix alias so the import works without hardcoding device IDs. The dashboard-level timewindow runs from `1970-01-01 UTC` (POSIX epoch) to `2027-01-01 UTC` with `aggregation = NONE` and `limit = 50000` per series, so the charts return raw unaveraged measurements over the full Wasserportal archive rather than daily averages (the earlier `AVG` aggregation over the 130-year `1970..2100` window had made ThingsBoard show an indefinite loading spinner whenever the time-window selector was touched; switching to `NONE` keeps the wide range usable because the server only needs to return up to 50000 sorted raw points per (entity, key) pair, which is comfortably above the ~16000 GW-Stand and ~8000 GWQ records per station that the Wasserportal archive contains). The map widget uses the modern `typeFullFqn = "system.map"` reference together with the `latKeyName = "latitude"` / `lngKeyName = "longitude"` settings binding that the `system.map` widget accepts as a stable backward-compatible attribute mapping, so markers render right after import (an earlier `markers` array variant with `xKey` / `yKey` left the map empty against the same lat/lon attributes)
- Speed up
`mode = "single"` with `httr2::req_perform_parallel()`. The previous sequential one-POST-at-a-time loop was network-bound at ~1.2 records/s for the GWQ push (~5 h per station for the full history); concurrent posting with `max_active = 10` lifts that to ~10 records/s. `tb_push_station_telemetry()` gains a `max_active` parameter; `tb_plan_defaults()` returns it per plan (default `10` for Free, `1` elsewhere); the script reads `TB_MAX_ACTIVE` from env / repo secrets through the same `env_or()` plan-fallback chain. Pace concurrent batches one `max_active` group at a time and retry on transient HTTP 500/502/503/504 with exponential backoff, so the Free tier’s 600 messages/minute sustained per-device limit doesn’t trip the gateway after ~35 s at 48 records/s (the symptom we hit with the initial implementation)
- Send one telemetry record per
`(timestamp, key, value)` triple in `mode = "single"` instead of grouping every Parameter that shares a timestamp into a single record. Wasserportal groundwater quality data has ~30 analytes per sampling event; the resulting “fat” `values` dicts produced an opaque empty-body HTTP 500 on Cloud Maker even though the same keys went through one at a time (see `tb_push_latest_telemetry()` smoke tests). `build_telemetry_payload()` gains a `group_by_ts` parameter (default `TRUE`); the push function flips it off in single mode and keeps grouping in bulk mode for compact array chunks
- Sanitise telemetry keys before serialising the values dict. Wasserportal groundwater quality parameters such as
`Leitfaehigkeit 25 grd C vor Ort`, `Wasserst. (ROK) vor`, `pH-Wert (Feld)` or `Temperatur (Wasser)` triggered an opaque HTTP 500 on the Maker free tier when used as raw JSON keys (after the level data already pushed cleanly). The new `sanitize_tb_key()` helper folds umlauts, drops parentheses and replaces spaces / dots / commas with underscores so quality data goes through too. Add a `TB_TELEMETRY_TYPES` env var (`"gwl,gwq"` by default) so a partial retry can skip the slow level re-push and only re-do the quality push
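The batch-level retry loop from the first 0.6.0 entry (4 attempts with 2 / 4 / 8 s backoff) could look roughly like the following base R sketch. `retry_batch()` and its arguments are hypothetical names, and `perform_batch` stands in for whatever function performs one parallel batch; this is not the package's actual implementation.

```r
# Hypothetical sketch of a batch-level retry loop with exponential backoff.
# perform_batch is any zero-argument function that performs one batch and
# signals an error on failure.
retry_batch <- function(perform_batch, max_tries = 4, base_delay = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(perform_batch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    # Out of attempts: re-signal the last error
    if (attempt == max_tries) {
      stop(result)
    }
    # Waits of 2, 4 and 8 seconds with the default base_delay
    Sys.sleep(base_delay * 2^(attempt - 1))
  }
}

# Stand-in for a flaky batch: fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Send failure: Broken pipe")
  "ok"
}

# base_delay = 0 keeps the demonstration instantaneous
result <- retry_batch(flaky, base_delay = 0)
```

Retrying the whole batch is only reasonable here because, as the entry states, re-POSTing an already accepted `(ts, key)` record does not create a duplicate row.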
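The timestamp handling described for `build_telemetry_payload()` (epoch milliseconds, the `ts_ms > 0` filter and the `group_by_ts` switch) can be sketched in base R as follows. The function `build_payload_sketch()`, its arguments and the sample values are assumptions for illustration; the real helper's signature is not shown in this changelog.

```r
# Hypothetical sketch: turn (date, key, value) triples into telemetry records.
build_payload_sketch <- function(dates, keys, values, group_by_ts = TRUE) {
  # Epoch milliseconds; dates before 1970-01-01 UTC become negative
  ts_ms <- as.numeric(as.POSIXct(dates, tz = "UTC")) * 1000

  # Keep only strictly positive timestamps
  keep <- ts_ms > 0
  ts_ms <- ts_ms[keep]
  keys <- keys[keep]
  values <- values[keep]

  if (!group_by_ts) {
    # One record per (timestamp, key, value) triple
    return(lapply(seq_along(ts_ms), function(i) {
      list(ts = ts_ms[i], values = stats::setNames(list(values[i]), keys[i]))
    }))
  }

  # One record per timestamp, carrying all keys that share it
  groups <- split(seq_along(ts_ms), ts_ms)
  unname(lapply(groups, function(idx) {
    list(
      ts = ts_ms[idx[1]],
      values = stats::setNames(as.list(values[idx]), keys[idx])
    )
  }))
}

# Made-up sample data; the 1955 reading is dropped by the filter
dates <- c("1955-03-01", "2024-01-01", "2024-01-01", "2024-02-01")
keys <- c("GW_Stand", "pH", "Nitrat", "pH")
values <- c(31.2, 7.1, 12.5, 7.3)

single <- build_payload_sketch(dates, keys, values, group_by_ts = FALSE)
grouped <- build_payload_sketch(dates, keys, values, group_by_ts = TRUE)
```

With this input, `single` holds three one-key records and `grouped` holds two records, the first of which carries both `pH` and `Nitrat`.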
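A key sanitiser in the spirit of `sanitize_tb_key()` might look like the sketch below. `sanitize_key_sketch()` is a hypothetical stand-in: the changelog only says that the real helper folds umlauts, drops parentheses and replaces spaces, dots and commas with underscores, so details such as collapsing runs of separators into a single underscore are assumptions.

```r
# Hypothetical sketch of a telemetry-key sanitiser (not the package's code).
sanitize_key_sketch <- function(x) {
  # Fold German umlauts and sharp s into ASCII digraphs
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to <- c("ae", "oe", "ue", "Ae", "Oe", "Ue", "ss")
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  # Drop parentheses
  x <- gsub("[()]", "", x)
  # Replace runs of spaces, dots and commas with one underscore (assumption)
  gsub("[ .,]+", "_", x)
}

sanitized <- sanitize_key_sketch(c(
  "Leitf\u00e4higkeit 25 grd C vor Ort",
  "Wasserst. (ROK) vor",
  "pH-Wert (Feld)",
  "Temperatur (Wasser)"
))
```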
wasserportal 0.5.0 2026-05-07
- Modernize GitHub Actions workflows: use
`r-lib/actions/setup-r-dependencies@v2` and `r-lib/actions/check-r-package@v2` on `ubuntu-latest` instead of the deprecated `v2` / `ubuntu-20.04` / `r-hub/sysreqs` toolchain
- Bump JavaScript actions to Node-24-compatible versions (
`actions/checkout@v5`, `actions/upload-artifact@v5`) and set `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true` so transitive `r-lib/actions/*@v2` steps opt into Node 24 as well, ahead of the June 2nd 2026 deprecation of Node 20 on GitHub Actions runners
- Add Claude Code review workflows (
`claude.yaml`, `claude-code-review.yaml`)
- `get_wasserportal_master_data()`: match the new HTML5 markup of the master-data table (`<caption>Pegel Berlin</caption>` instead of the legacy `summary="Pegel Berlin"` attribute)
- Decode wasserportal pages explicitly as
`windows-1252`. The pages declare UTF-8 in `<meta charset>` but the server actually emits Latin-1 bytes (e.g. `0xE4` for `ä`); trusting the meta declaration left those bytes mis-marked as UTF-8 and broke `subst_special_chars()`’s `ä`→`ae` / `ü`→`ue` substitutions on Windows R
- Bypass
`rvest::html_table()` and `xml2::xml_text(trim = TRUE)` in `get_wasserportal_master_data()` and `get_wasserportal_stations_table()`: both delegate to a `sub("^[[:space:] ]+", ...)` pass that fails on Windows R when the cell text contains umlauts. Tables are now extracted directly via `xml2` and trimmed with a locale-safe `gsub(..., useBytes = TRUE)` helper (`trim_bytes()`)
- Make
`get_stations()` and `get_wasserportal_masters_data()` resilient when parallel workers cannot fetch a station overview: load the `wasserportal` namespace into the cluster and drop `try-error` results before `data.table::rbindlist()` / `dplyr::left_join()`
- Make live-HTTP tests skip gracefully when
`wasserportal.berlin.de` is unreachable from the test host (CRAN, sandboxed CI)
- Update
`get_wasserportal_masters_data()` test expectations to include the new `Anmerkung` column that wasserportal added to surface-water master data
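The encoding entry above (pages declared as UTF-8 but served as single-byte text) can be illustrated with base R's `iconv()`. The byte sequence and variable names are made up for the example, and the final `gsub()` only mimics the kind of substitution that `subst_special_chars()` is described as performing.

```r
# Bytes as a server might send them: "März" with 0xE4 for the umlaut
raw_bytes <- as.raw(c(0x4d, 0xe4, 0x72, 0x7a))

# Decode explicitly as windows-1252 instead of trusting a UTF-8 declaration;
# iconv() accepts a list of raw vectors as input
decoded <- iconv(list(raw_bytes), from = "windows-1252", to = "UTF-8")

# Once the string is valid UTF-8, an umlaut substitution matches as expected
folded <- gsub("\u00e4", "ae", decoded, fixed = TRUE)
```

After conversion the single byte `0xE4` becomes the two-byte UTF-8 sequence `0xC3 0xA4`, which is what a UTF-8-aware substitution looks for.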
wasserportal 0.4.0 2024-04-05
- New feature: add support for downloading all available surface water quality data for one or multiple monitoring stations. For details see
`get_surfacewater_qualities()`
- Bugfix for groundwater level and quality due to new Wasserportal API
- Add project AD4GD as funder
wasserportal 0.3.0 2023-02-19
- Fix errors in GitHub actions: use actions from branches
`v2`, `v3`, not from `master`
- Fix errors in tutorial.Rmd
- Fix errors in documentation
- Do not run examples that use parallel processing
- `get_stations()`: add argument `n_cores`
- `get_wasserportal_stations_table()`: Use new (three letter) variable codes
- `read_wasserportal_raw()`: adapt request to new API version, add argument `api_version`
- `read_wasserportal_raw_gw()`: adapt request to new API version
- Clean code, mainly to reduce duplication and to improve readability
  - check for more errors
  - use “safe” element selection
  - use more helper functions
  - use helper functions in vignettes
  - improve names
wasserportal 0.2.0 2022-09-08
- Add functions for exporting time series data to `zip` files (`wp_masters_data_to_list()`) and master data to `csv` files (`wp_timeseries_data_to_list()`), which will be uploaded to `https://kwb-r.github.io/wasserportal/<filename>`
- In addition, `import` functions for downloading and importing the datasets above into R as lists were added (`list_timeseries_data_to_zip()`, `list_masters_data_to_csv()`)
- Code cleaning by @hsonne started
- Fix master data requests by using the `master_url` instead of `station_id`, as the latter was not unique. Now the function `get_wasserportal_master_data()` and its wrapper function `get_wasserportal_masters_data()` require the `master_url` instead of `station_id` as input parameter. The function `get_stations` now adds the column `stammdaten_link` as an additional column for each sublist element of the sublist `overview_list`.
- Fix to scrape groundwater level data from all available monitoring stations (instead of only 5!) and export to `.csv` file. In addition, also switch to `.csv` export for groundwater quality instead of `.json` due to reduced storage space (the `stations_gwq_data.json` file is already 98.4 MB large).
- Add functions (`get_daily_surfacewater_data()`) and adapt article Surface Water for scraping all available daily surface water data and exporting to one `.csv` file for each parameter (containing all monitoring stations)
- Deactivate PROMISCES workflows (see wasserportal v0.1.0), due to failing Zenodo download. Will be moved into own R package, most probably `kwb.promisces`.
wasserportal 0.1.1 2022-06-09
- Fix bug in
`get_wasserportal_stations_table()` now correctly naming parameter `temperature` (formerly incorrectly `level`)
- Fix Surface Water article
- Adapt Zenodo DOI badge to cite always latest release
wasserportal 0.1.0 2022-06-01
R package for scraping groundwater data (groundwater level and quality) from Wasserportal Berlin. Please note that the support for scraping surface water monitoring stations is currently very limited!
Functions:
- `get_stations()`: returns metadata for all available monitoring stations
- `get_wasserportal_masters_data()`: get master data for selected `station_ids`
- `read_wasserportal_raw_gw()`: enables the download of groundwater data. Check out the Tutorial article on how to use it for downloading one or multiple stations at once.
- `read_wasserportal()`: works for surface water monitoring stations, but is outdated, as it is based on an outdated static file with station/variable names (i.e. only for 11 instead of the 82 stations currently provided!) instead of relying on new metadata provided online. This will be fixed within the next release. For progress on this issue check out #21
Workflows:
- Tutorial article how to download groundwater level and quality data
- Further Usage by combining previously scraped (see tutorial above) data and performing some analysis:
  - Groundwater, e.g. creating a map with GW level trends
  - Two workflows (REACH UBA, Norman List) created within the project PROMISCES for assessing prevalence and the spatial distribution of persistent, mobile and toxic (PMT) substances in the Berlin groundwater, based on different PMT lists, i.e. REACH UBA or Norman List.
wasserportal 0.0.0.9000
- Added a `NEWS.md` file to track changes to the package.
- See https://style.tidyverse.org/news.html for writing a good `NEWS.md`