business · Jun 3, 2026 · 6 min read

How to Scrape Global Public Tenders & RFPs in 2026

Aggregate live government tenders from EU TED, UK Find-a-Tender and US SAM.gov into one normalized feed — filter by keyword, country, CPV/NAICS, value and deadline.

Government procurement is one of the largest, most underexploited public datasets in the world, and it’s published openly, in fragments. The EU’s TED (Tenders Electronic Daily) covers 27 member states plus Norway and Switzerland; the UK’s Find-a-Tender Service (FTS) covers central and local contracts above WTO thresholds; the US SAM.gov publishes federal opportunities. Each has its own API, its own schema and its own language conventions, and the classification systems differ as well: TED and FTS use CPV codes, while SAM.gov uses NAICS. The hard part isn’t access (all three are official open APIs); it’s aggregation and normalization. This guide covers how to turn three incompatible government feeds into one consistent, filterable stream of bid opportunities.

What’s worth extracting

Across the three sources, each tender notice normalizes to a consistent record. Per notice, you get:

Identity — native source identifier and notice type (contract notice, prior info, award, etc.).
Titles and text — multilingual titles and preferred-language title, plus the long description.
Buyer — contracting authority name, location, and contact info.
Classification — CPV codes (EU and UK) and NAICS codes (US), normalized into a common classification field.
Value — estimated contract value and currency.
Dates — publication date and submission deadline, as ISO timestamps.
Nature — contract nature (works, supplies, services) and procedure type.
Links — source URL and document URLs.
Raw payload — the original source record, preserved alongside the normalized fields.
Provenance — source name and capture timestamp.

That’s a complete bid-intelligence record: enough to triage opportunities, route them to bid teams, and analyze procurement markets — without manually reconciling three different government schemas.

The extraction reality: aggregation, not anti-bot

None of these sources fights you. TED, FTS, and SAM.gov are official public APIs intended for programmatic use, and the Public Tenders Scraper actor calls them directly, with no proxy. The real engineering is in reconciling them:

Three different APIs, three pagination models. Some sources page by offset and limit, others by a cursor or next-page token; the actor follows each source’s pagination through to the last page so you don’t miss notices.
Multilingual content. TED notices arrive in many EU languages. The actor surfaces multilingual titles and a preferred-language text so a keyword watch actually works across borders.
Two classification systems. TED and FTS use CPV; SAM.gov uses NAICS. The actor normalizes both into a consistent classification field while preserving the native codes, so an “IT services” watch can span European and US codes.
Heterogeneous values and dates. Different currencies, different date formats. Normalized to a single value+currency field and ISO timestamps.
One schema out. Three sources collapse into one record shape, with the original payload retained for anything the normalized schema doesn’t capture.
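The two pagination models behind the first point can be sketched as a pair of Python generators: offset/limit paging that stops on a short page, and cursor paging that follows a next-page token until none comes back. The fetch callables are hypothetical stand-ins for HTTP requests, not the real TED, FTS or SAM.gov clients, whose parameter names differ.

```python
from typing import Callable, Iterator, Optional

# Hypothetical fetcher signatures; real API clients would issue HTTP requests here.
OffsetFetcher = Callable[[int, int], list[dict]]  # (offset, limit) -> rows
CursorFetcher = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]  # cursor -> (rows, next)


def iter_offset(fetch: OffsetFetcher, limit: int = 100) -> Iterator[dict]:
    """Offset/limit paging: stop when a page comes back shorter than `limit`."""
    offset = 0
    while True:
        rows = fetch(offset, limit)
        yield from rows
        if len(rows) < limit:
            return
        offset += limit


def iter_cursor(fetch: CursorFetcher) -> Iterator[dict]:
    """Cursor paging: follow the next-page token until the API stops returning one."""
    cursor: Optional[str] = None
    while True:
        rows, cursor = fetch(cursor)
        yield from rows
        if cursor is None:
            return


# Toy in-memory sources to exercise both patterns.
OFFSET_DATA = [{"source": "SAM.gov", "notice_id": str(i)} for i in range(250)]
CURSOR_PAGES = {
    None: ([{"source": "UK-FTS", "notice_id": "a"}], "p2"),
    "p2": ([{"source": "UK-FTS", "notice_id": "b"}], None),
}


def fake_offset(offset: int, limit: int) -> list[dict]:
    return OFFSET_DATA[offset:offset + limit]


def fake_cursor(cursor: Optional[str]) -> tuple[list[dict], Optional[str]]:
    return CURSOR_PAGES[cursor]


rows = list(iter_offset(fake_offset)) + list(iter_cursor(fake_cursor))
print(len(rows))  # 252
```

Stopping on a short page also covers the case where the total is an exact multiple of `limit`: the final request simply returns an empty page.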

This normalization is the whole value proposition. Hitting one API is easy; making TED, FTS, and SAM.gov speak the same language is the work — and it’s exactly what a managed actor solves once.
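A minimal sketch of the classification and date normalization, assuming a small hand-written prefix table (CPV 72 and NAICS 5415 both map to "IT services"; CPV 45 and NAICS 23 both map to "Construction") and a few sample date formats. The table, the formats and both helper names are illustrative, not the actor's actual mapping.

```python
from datetime import datetime, timezone

# Illustrative prefix table (assumption): both code systems are hierarchical,
# so a short prefix covers every longer code beneath it.
CLASS_PREFIXES = {
    ("cpv", "72"): "IT services",
    ("naics", "5415"): "IT services",
    ("cpv", "45"): "Construction",
    ("naics", "23"): "Construction",
}

# Sample input formats (assumption); check what each source actually emits.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d%z", "%Y-%m-%d", "%m/%d/%Y")


def normalize_classification(cpv_codes: list[str], naics_codes: list[str]) -> list[str]:
    """Return the sorted shared class labels matched by any native code."""
    labels = set()
    for scheme, codes in (("cpv", cpv_codes), ("naics", naics_codes)):
        for code in codes:
            for (s, prefix), label in CLASS_PREFIXES.items():
                if s == scheme and code.startswith(prefix):
                    labels.add(label)
    return sorted(labels)


def to_iso_utc(raw: str) -> str:
    """Parse a source date string into an ISO-8601 UTC timestamp ending in 'Z'."""
    for fmt in DATE_FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:  # naive dates are assumed to be UTC in this sketch
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unrecognised date: {raw!r}")


print(normalize_classification(["72000000", "72500000"], []))  # ['IT services']
print(normalize_classification([], ["541512"]))                 # ['IT services']
print(to_iso_utc("2026-07-03T17:00:00+02:00"))                  # 2026-07-03T15:00:00Z
print(to_iso_utc("06/01/2026"))                                 # 2026-06-01T00:00:00Z
```

Returning a list rather than one label keeps notices whose codes span several classes from being silently collapsed.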

▶ Run the Public Tenders Scraper — live government tenders from EU TED (27 states), UK Find-a-Tender and US SAM.gov in one normalized feed. Filter by keyword, country, CPV/NAICS, value and deadline. Official APIs, no proxy.

How querying works

The inputs let you target one normalized watch across all three sources:

sources:        TED | UK-FTS | SAM.gov  (any subset)
keyword:        free-text + buyer-name matching
countries:      ISO country codes (EU members, NO, CH, GB, US)
classification: CPV codes (EU) and/or NAICS codes (US)
contract_nature: works | supplies | services
procedure_type: open | restricted | negotiated | ...
value_range:    min / max estimated value
date_window:    publication / deadline window
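Applied client-side to normalized rows, most of these inputs reduce to a conjunction of simple predicates. This is a sketch under assumptions: the `Watch` dataclass and its field names are hypothetical, and code matching is by prefix (so `72` covers every 72xxxxxx CPV code). The date window is left out here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Watch:
    """Hypothetical watch definition mirroring the input list above."""
    keywords: list[str] = field(default_factory=list)
    countries: set[str] = field(default_factory=set)
    cpv_prefixes: list[str] = field(default_factory=list)
    naics_prefixes: list[str] = field(default_factory=list)
    natures: set[str] = field(default_factory=set)
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def matches(row: dict, w: Watch) -> bool:
    """True when a normalized notice row satisfies every constraint that is set."""
    text = f"{row.get('title', '')} {row.get('description', '')} {row.get('buyer_name', '')}".lower()
    if w.keywords and not any(k.lower() in text for k in w.keywords):
        return False
    if w.countries and row.get("buyer_country") not in w.countries:
        return False
    if w.cpv_prefixes or w.naics_prefixes:
        # Either code system may satisfy the watch, so EU/UK and US notices both pass.
        cpv_hit = any(c.startswith(p) for c in row.get("cpv_codes", []) for p in w.cpv_prefixes)
        naics_hit = any(c.startswith(p) for c in row.get("naics_codes", []) for p in w.naics_prefixes)
        if not (cpv_hit or naics_hit):
            return False
    if w.natures and row.get("contract_nature") not in w.natures:
        return False
    value = row.get("estimated_value")
    if value is not None:  # notices without a value are kept rather than dropped
        if w.min_value is not None and value < w.min_value:
            return False
        if w.max_value is not None and value > w.max_value:
            return False
    return True


watch = Watch(keywords=["cloud"], countries={"NL", "US"},
              cpv_prefixes=["72"], naics_prefixes=["5415"], natures={"services"})
row = {"title": "Provision of cloud hosting and managed IT services",
       "buyer_country": "NL", "cpv_codes": ["72000000"], "naics_codes": [],
       "contract_nature": "services", "estimated_value": 4200000}
print(matches(row, watch))  # True
```

Note that the value bounds here ignore currency; comparing EUR and USD notices against one range would need a conversion step first.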

The dominant pattern is a daily watch: pin your keyword and CPV/NAICS codes, scope the countries, set a rolling publication window, and schedule. Each run returns newly published notices matching your watch across all three systems, which you route to a bid team or push into a procurement-intelligence platform. For analytics, widen the window and pull historical notices for backfill.
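The daily watch comes down to two pieces of bookkeeping: a rolling publication window, and a set of `(source, notice_id)` keys already delivered so overlapping windows never re-send a notice. A sketch, assuming a 6-hour overlap (an arbitrary choice to catch late-indexed notices) and an in-memory `seen` set that a real scheduled job would persist between runs.

```python
from datetime import datetime, timedelta, timezone


def rolling_window(now: datetime, days: int = 1, overlap_hours: int = 6) -> tuple[datetime, datetime]:
    """Publication window ending at `now`, padded by an overlap (assumed 6 h)."""
    return now - timedelta(days=days, hours=overlap_hours), now


def new_notices(rows, seen: set, start: datetime, end: datetime):
    """Yield rows published inside [start, end] whose (source, notice_id) key is unseen."""
    for row in rows:
        published = datetime.strptime(row["published_at"], "%Y-%m-%dT%H:%M:%S%z")
        key = (row["source"], row["notice_id"])
        if start <= published <= end and key not in seen:
            seen.add(key)
            yield row


now = datetime(2026, 6, 3, 8, 0, tzinfo=timezone.utc)
start, end = rolling_window(now)
rows = [
    {"source": "TED", "notice_id": "2026/S 105-318442", "published_at": "2026-06-02T12:00:00Z"},
    # Same ID string from another source: a distinct notice under the composite key.
    {"source": "SAM.gov", "notice_id": "2026/S 105-318442", "published_at": "2026-06-02T12:00:00Z"},
    {"source": "TED", "notice_id": "old", "published_at": "2026-05-20T00:00:00Z"},
]
seen: set = set()
first = list(new_notices(rows, seen, start, end))
second = list(new_notices(rows, seen, start, end))  # re-running the same window
print(len(first), len(second))  # 2 0
```

The overlap trades a little duplicate fetching for safety; the `seen` set is what keeps that duplication out of the bid team's inbox.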

Schema design for downstream use

A normalized notice row, source-agnostic:

{
  "source": "TED",
  "notice_id": "2026/S 105-318442",
  "notice_type": "Contract notice",
  "title": "Provision of cloud hosting and managed IT services",
  "title_lang": "en",
  "description": "The contracting authority seeks a supplier to ...",
  "buyer_name": "Ministerie van Binnenlandse Zaken",
  "buyer_country": "NL",
  "contact_email": "procurement@example.gov.nl",
  "cpv_codes": ["72000000", "72500000"],
  "naics_codes": [],
  "classification_norm": "IT services",
  "contract_nature": "services",
  "procedure_type": "open",
  "estimated_value": 4200000,
  "currency": "EUR",
  "published_at": "2026-06-01T00:00:00Z",
  "deadline_at": "2026-07-03T17:00:00Z",
  "source_url": "https://ted.europa.eu/udl?uri=TED:NOTICE:318442-2026",
  "document_urls": ["https://ted.europa.eu/.../docs.pdf"],
  "scraped_at": "2026-06-03T08:00:00Z"
}
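To consume rows like this in Python, a small typed wrapper helps: it parses the ISO timestamps into timezone-aware datetimes and exposes the composite key. A sketch; the `Notice` class and `parse_ts` helper are hypothetical and cover only a subset of the fields above.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse the schema's 'YYYY-MM-DDTHH:MM:SSZ' strings into aware datetimes."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z") if value else None


@dataclass
class Notice:
    source: str
    notice_id: str
    notice_type: str
    title: str
    estimated_value: Optional[float]
    currency: Optional[str]
    published_at: Optional[datetime]
    deadline_at: Optional[datetime]
    raw: dict  # here the normalized row itself; a pipeline would also keep the source payload

    @property
    def key(self) -> tuple[str, str]:
        """Notice IDs are unique only within a source, so key on the pair."""
        return (self.source, self.notice_id)

    @classmethod
    def from_row(cls, row: dict) -> "Notice":
        return cls(
            source=row["source"],
            notice_id=row["notice_id"],
            notice_type=row.get("notice_type", ""),
            title=row.get("title", ""),
            estimated_value=row.get("estimated_value"),
            currency=row.get("currency"),
            published_at=parse_ts(row.get("published_at")),
            deadline_at=parse_ts(row.get("deadline_at")),
            raw=row,
        )


row = json.loads('{"source": "TED", "notice_id": "2026/S 105-318442", '
                 '"notice_type": "Contract notice", "title": "Provision of cloud hosting", '
                 '"estimated_value": 4200000, "currency": "EUR", '
                 '"published_at": "2026-06-01T00:00:00Z", "deadline_at": "2026-07-03T17:00:00Z"}')
n = Notice.from_row(row)
print(n.key)                                   # ('TED', '2026/S 105-318442')
print((n.deadline_at - n.published_at).days)   # 32
```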

Schema choices that matter for procurement:

Key on (source, notice_id). Notice IDs are unique within a source, not across them. The pair is your safe key.
Keep both native codes and the normalized class. cpv_codes/naics_codes let you drill into the exact category; classification_norm lets you watch across systems. You need both.
deadline_at is the operational field. Bid teams live by it. Always store it as a real timestamp, and flag notices whose deadline has passed.
Retain the raw payload. Government schemas carry fields the normalized record won’t — keep the original so you can extract more later without re-scraping.
Stamp scraped_at and track notice_type. A later “award” notice for an earlier “contract notice” is the contract-outcome signal M&A and competitive-intelligence users want.
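The keying, deadline and notice-type points can be combined in one small store: upsert on `(source, notice_id)`, record which notice types have been seen per procedure, and flag expired deadlines instead of deleting them. A sketch with a hypothetical in-memory `NoticeStore`; the `procedure_ref` field that links a contract notice to its later award and the `"Award notice"` label are assumptions, since each source identifies procedures and award notices differently.

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse the schema's ISO 'Z' timestamps into aware datetimes."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


class NoticeStore:
    """Hypothetical in-memory store keyed on (source, notice_id)."""

    def __init__(self) -> None:
        self.rows: dict = {}
        # procedure_ref is a HYPOTHETICAL field grouping the notices of one procedure.
        self.types_by_procedure: dict = defaultdict(set)

    def upsert(self, row: dict) -> None:
        """Insert or replace a notice, keeping the most recent capture."""
        key = (row["source"], row["notice_id"])
        old = self.rows.get(key)
        # Same fixed ISO format, so string order equals chronological order.
        if old is None or row["scraped_at"] >= old["scraped_at"]:
            self.rows[key] = row
        if row.get("procedure_ref"):
            self.types_by_procedure[(row["source"], row["procedure_ref"])].add(row["notice_type"])

    def awarded_procedures(self) -> list:
        """Procedures with both a contract notice and an award notice on file."""
        wanted = {"Contract notice", "Award notice"}  # assumed notice_type labels
        return sorted(p for p, types in self.types_by_procedure.items() if wanted <= types)

    def expired_keys(self, now: datetime) -> list:
        """Keys whose deadline has passed: flag these rather than deleting them."""
        return sorted(k for k, r in self.rows.items()
                      if r.get("deadline_at") and parse_ts(r["deadline_at"]) < now)


store = NoticeStore()
store.upsert({"source": "TED", "notice_id": "1", "notice_type": "Contract notice",
              "procedure_ref": "P-9", "deadline_at": "2026-07-03T17:00:00Z",
              "scraped_at": "2026-06-03T08:00:00Z"})
store.upsert({"source": "TED", "notice_id": "2", "notice_type": "Award notice",
              "procedure_ref": "P-9", "deadline_at": None,
              "scraped_at": "2026-09-01T08:00:00Z"})
# Same notice_id as the first TED row but a different source: stored separately.
store.upsert({"source": "SAM.gov", "notice_id": "1", "notice_type": "Contract notice",
              "deadline_at": "2026-06-01T00:00:00Z", "scraped_at": "2026-06-03T08:00:00Z"})

now = datetime(2026, 9, 2, tzinfo=timezone.utc)
print(len(store.rows))             # 3
print(store.awarded_procedures())  # [('TED', 'P-9')]
print(store.expired_keys(now))     # [('SAM.gov', '1'), ('TED', '1')]
```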

Typical use cases

Daily IT-services tender monitoring across EU, UK, and US for bid teams (keyword + CPV/NAICS watches).
Historical backfill of public contracts for procurement-intelligence platforms and analytics.
Lead generation for government contractors — feed open notices with deadlines and buyer contact metadata.
Defense and IT consulting firms tracking sector-specific CPV/NAICS ranges.
Grant and EU-funding consultancies sweeping TED for relevant calls.
M&A and competitive intelligence — analyze awarded contracts, contractors, and contract values.
Press and policy research using authoritative procurement notices for transparency reporting.

Cost math

This actor is pay-per-event, with both a per-run start fee and a per-result charge ($0.002 each), reflecting the normalization work baked into every notice. That makes filtering your cost lever. A focused daily watch (one keyword set, a handful of CPV/NAICS codes, a few target countries) returns tens to low hundreds of fresh notices per run, so result charges stay in cents: 100 notices come to $0.20 plus the start fee. A broad historical backfill across all sources and categories can return many thousands of notices (10,000 notices is $20 in result charges alone); scope it deliberately with classification and date filters so you pay for opportunities you’ll actually pursue.
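The pricing arithmetic is easy to check before scheduling anything: cost per run = start fee + $0.002 × notices returned. The start-fee amount isn't stated here, so the sketch takes it as a parameter defaulting to zero (assumption: substitute the actor's current listed price); the notice counts are illustrative.

```python
def run_cost(notices: int, start_fee: float = 0.0, per_result: float = 0.002) -> float:
    """Estimated pay-per-event cost of one run, in US dollars."""
    return round(start_fee + per_result * notices, 4)


def monthly_cost(notices_per_run: int, runs: int = 30, start_fee: float = 0.0) -> float:
    """Cost of a scheduled watch over `runs` runs (e.g. one per day for a month)."""
    return round(runs * run_cost(notices_per_run, start_fee), 4)


print(run_cost(150))      # 0.3   focused daily watch, result charges only
print(run_cost(20_000))   # 40.0  broad backfill, result charges only
print(monthly_cost(150))  # 9.0   a month of daily runs at 150 notices each
```

Because the start fee is charged per run, it is the term that grows with scheduling frequency, while the per-result term grows with how loosely the watch is filtered.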

Building this yourself means integrating three separate government APIs, each with its own pagination, language handling, and classification codes, then maintaining the CPV↔NAICS normalization as the schemas evolve (TED in particular migrated to the eForms notice standard in 2023). That’s a multi-week build and ongoing upkeep across three moving targets, precisely what the managed actor consolidates.

Common pitfalls

Assuming one schema across sources. TED, FTS, and SAM.gov disagree on nearly everything. Rely on the normalized fields for cross-source logic and the raw payload for source-specific detail.
Cross-source ID collisions. Notice IDs are not globally unique. Always key on (source, notice_id).
CPV vs. NAICS mismatch. An “IT services” watch needs both CPV (72xxxxxx) and NAICS (5415xx) codes; using only CPV silently drops SAM.gov, and using only NAICS silently drops TED and FTS.
Ignoring deadlines. A tender past its submission deadline is noise to a bid team. Filter or flag on deadline_at.
Language blind spots. A keyword in English won’t match a Dutch or Polish TED title unless you watch the preferred-language/normalized text. Use the multilingual fields, not just the native title.
Notice-type confusion. Don’t treat award notices as open opportunities — check notice_type before routing to a bid team.
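Several of these pitfalls can be caught in one triage pass before anything reaches a bid team: skip award notices, skip expired deadlines, and match keywords against every title variant with case and accents folded. A sketch; the `titles` map (language code to title) and the set of non-opportunity notice-type labels are assumptions about field naming, not the actor's documented output.

```python
import unicodedata
from datetime import datetime, timezone

# Assumed labels for notice types that are not open opportunities; check real values per source.
NON_OPPORTUNITY_TYPES = {"award notice", "contract award notice", "award"}


def fold(text: str) -> str:
    """Casefold and strip combining accents so 'Informáticos' matches 'informaticos'."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_actionable(row: dict, keywords: list[str], now: datetime) -> bool:
    """True for open, unexpired notices whose titles mention any keyword."""
    if row.get("notice_type", "").casefold() in NON_OPPORTUNITY_TYPES:
        return False
    deadline = row.get("deadline_at")
    if deadline and datetime.strptime(deadline, "%Y-%m-%dT%H:%M:%S%z") <= now:
        return False
    # Search every title variant (assumed field 'titles': {lang: title}), not just the native one.
    texts = [row.get("title", "")] + list(row.get("titles", {}).values())
    haystack = " ".join(fold(t) for t in texts)
    return any(fold(k) in haystack for k in keywords)


now = datetime(2026, 6, 3, tzinfo=timezone.utc)
tender = {"notice_type": "Contract notice", "deadline_at": "2026-07-03T17:00:00Z",
          "title": "Servicios informáticos en la nube",
          "titles": {"es": "Servicios informáticos en la nube", "en": "Cloud IT services"}}
print(is_actionable(tender, ["cloud"], now))                                     # True
print(is_actionable(tender, ["informaticos"], now))                              # True
print(is_actionable({**tender, "notice_type": "Award notice"}, ["cloud"], now))  # False
```

Accent folding is only a partial fix: letters such as Polish "ł" do not decompose under NFKD, which is one more reason to lean on the translated title variants rather than the native title alone.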

Wrapping up

Public procurement data is huge, valuable, and genuinely open — but fragmented across three governments with three incompatible schemas. The work is aggregation and normalization, not access. For a single-source experiment you can hit TED or SAM.gov directly. For a real bid-intelligence feed — one keyword and CPV/NAICS watch spanning the EU, UK, and US with deadlines, values, and contacts in one consistent schema — a managed actor that already reconciles all three sources is the fastest route to an actionable feed.

▶ Open the Public Tenders Scraper on Apify — TED EU, UK FTS and US SAM.gov in one normalized feed, filterable by keyword, country, CPV/NAICS, value and deadline. Pay-per-event; filter tightly to keep runs lean.