Scrape + score a cosinex Vergabemarktplatz instance (generic connector)
Source: R/cosinex.R
Shared engine behind vmp_bb_tenders(), vmp_nrw_tenders() and dtvp_tenders(). It opens a chromote session, optionally logs in, scrapes the extended-search results, scores them with score_relevance(), enriches them via the detail layer (and, optionally, the notice layer), applies the title/CPV exclusions (apply_title_excludes()) and sets the Plattform column to the value of plattform. The detail and notice caches are namespaced by slug, so several portals can share one cache_dir without overwriting each other's cache files.
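One way to picture the slug namespacing is to derive each cache file name from the slug and the layer. The sketch below is a hypothetical illustration: the helper cache_path() and the "<slug>_<layer>_cache.rds" naming scheme are assumptions, not the package's documented file layout.

```r
# Hypothetical helper: build the cache file path for one portal and one layer.
# The "<slug>_<layer>_cache.rds" naming scheme is an assumption for illustration.
cache_path <- function(cache_dir, slug, layer = c("detail", "notice")) {
  layer <- match.arg(layer)
  file.path(cache_dir, paste0(slug, "_", layer, "_cache.rds"))
}

# Two portals sharing one cache_dir get distinct files for the same layer.
cache_path("reports", "vmp_nrw", "detail")  # "reports/vmp_nrw_detail_cache.rds"
cache_path("reports", "vmp_bb", "detail")   # "reports/vmp_bb_detail_cache.rds"
```

Because the slug is part of every file name, adding another portal only needs a new slug; the shared cache_dir itself never has to change.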
Usage
cosinex_tenders(
base_url,
plattform,
slug,
mount = "VMPCenter",
keywords = tender_keywords(),
login = FALSE,
max_pages = Inf,
since_days = NULL,
publication_types = c("ExAnte", "Tender"),
contracting_rules = "VOL",
screen_details = TRUE,
max_detail = Inf,
screen_notice = FALSE,
max_notice = Inf,
username = "",
password = "",
cache_dir = "reports",
relevant_only = FALSE,
headless = TRUE
)
Arguments
- base_url
  Portal host, e.g. "https://www.evergabe.nrw.de".
- plattform
  Display name written to the Plattform column.
- slug
  Short id used for the per-portal cache files (e.g. "vmp_nrw").
- mount
  cosinex mount segment: "VMPCenter" (Land marketplaces) or "Center" (DTVP).
- keywords
  Keyword list for relevance scoring (default tender_keywords()).
- login
  Log in before scraping (default FALSE; the search is public).
- max_pages
  Maximum number of result pages to scrape (default Inf).
- since_days
  If set, stop paging once a result page is entirely older than this many days (the search is sorted newest-first). This bounds the scrape for large portals and award histories; NULL scrapes up to max_pages. The precise date trim happens later in screen_portals().
- publication_types, contracting_rules
  Search filters passed to vmp_bb_scrape_tenders().
- screen_details
  Detail-page layer (default TRUE; see enrich_with_details()).
- max_detail
  Maximum number of detail pages to screen (default Inf).
- screen_notice
  Notice-PDF layer (default FALSE; enabling it forces login = TRUE; see enrich_with_notice()).
- max_notice
  Maximum number of new notice PDFs to read (default Inf).
- username, password
  Credentials used when login = TRUE (default: the environment variables VMP_BB_USERNAME / VMP_BB_PASSWORD).
- cache_dir
  Directory for the detail/notice caches (default "reports").
- relevant_only
  Return only relevant tenders (default FALSE; the combined multi-portal run in screen_all_portals() sets this to TRUE).
- headless
  Run chromote headless (default TRUE).
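The since_days early-stop rule can be sketched as a paging loop over a newest-first result list. This is a hypothetical illustration, not the package's scraper: scrape_pages() and the fetch_page callback (returning the publication dates on page i, or an empty Date vector when no pages remain) are assumptions standing in for the real chromote scraping.

```r
# Hypothetical sketch of the since_days early-stop rule; not the package's code.
# fetch_page(i) is an assumed callback returning the publication dates (class Date)
# on result page i, newest first, or an empty Date vector when no pages remain.
scrape_pages <- function(fetch_page, max_pages = Inf, since_days = NULL,
                         today = Sys.Date()) {
  cutoff <- if (is.null(since_days)) NULL else today - since_days
  pages <- list()
  i <- 1
  while (i <= max_pages) {
    dates <- fetch_page(i)
    if (length(dates) == 0) break            # ran out of result pages
    pages[[i]] <- dates                      # keep the page; exact trim happens later
    # Stop once every result on this page is older than the cutoff: with a
    # newest-first sort, all later pages can only be older still.
    if (!is.null(cutoff) && all(dates < cutoff)) break
    i <- i + 1
  }
  pages
}

# Usage with fake data: four pages, the third one entirely older than 30 days.
today <- as.Date("2024-06-30")
fake_pages <- list(
  as.Date(c("2024-06-29", "2024-06-20")),
  as.Date(c("2024-06-10", "2024-05-25")),
  as.Date(c("2024-05-01", "2024-04-15")),
  as.Date("2024-03-01")
)
fetch <- function(i) {
  if (i <= length(fake_pages)) fake_pages[[i]] else as.Date(character(0))
}
length(scrape_pages(fetch, since_days = 30, today = today))  # 3
length(scrape_pages(fetch, today = today))                   # 4
```

The loop only stops on a page where every date is past the cutoff, so a page that straddles the boundary (like the second one above) is still followed by another page; that is why a later, exact date filter is still needed.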
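The interaction between login, screen_notice and the credential arguments can be sketched as a small resolution step. This is a hypothetical helper (resolve_login() is not part of the documented API); it only encodes two documented rules: screen_notice = TRUE forces login = TRUE, and empty username/password fall back to VMP_BB_USERNAME / VMP_BB_PASSWORD.

```r
# Hypothetical sketch of resolving the login-related arguments; the helper name
# and the exact fallback order are assumptions, not the package's implementation.
resolve_login <- function(login = FALSE, screen_notice = FALSE,
                          username = "", password = "") {
  # The notice-PDF layer needs a logged-in session, so it forces a login.
  if (screen_notice) login <- TRUE
  if (login) {
    # Empty credentials fall back to the documented environment variables.
    if (!nzchar(username)) username <- Sys.getenv("VMP_BB_USERNAME")
    if (!nzchar(password)) password <- Sys.getenv("VMP_BB_PASSWORD")
  }
  list(login = login, username = username, password = password)
}
```

Explicitly passed credentials win over the environment, and nothing is read from the environment when no login is needed, which keeps the public search path free of credential handling.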