logiover
business · Jun 3, 2026 · 7 min read

How to Scrape UN Comtrade Import Export Data in 2026

A practical guide to pulling bilateral trade statistics from UN Comtrade — country-pair flows by HS commodity, values, weight and quantity — without an API key or rate-limit pain.

UN Comtrade is the single largest public repository of international trade statistics — the official record of what every reporting country imports and exports, by commodity, by partner, by year and month. It’s a government data source, not a scrappy private site, which changes the whole problem. There’s no anti-bot stack to defeat. The real challenges are different: an awkward API surface, M49 country codes instead of names, an HS commodity taxonomy you have to understand, and rate limits on the keyless tier that punish naive fan-out. This guide walks through what Comtrade actually exposes and how to pull clean bilateral trade data without an API key.

What Comtrade actually exposes

Comtrade publishes bilateral trade flows: for a given reporter country, partner country, commodity, direction, and period, it reports the trade value and physical measures. The fields that matter per row:

  • Reporter — the country that reported the flow (e.g., Germany).
  • Partner — the counterparty country (e.g., China).
  • Trade flow — import or export (and sometimes re-import / re-export).
  • Commodity — an HS code at 2-, 4-, or 6-digit granularity, or an aggregate.
  • Period — annual (a year) or monthly (a year-month).
  • Trade value — in USD, the headline figure most analysis is built on.
  • Net weight — physical mass, where reported.
  • Quantity and unit — e.g., number of units, liters, kilograms, with the quantity unit code.
  • CIF / FOB values — where the reporter distinguishes them.

The unit of data is the country × partner × commodity × period × flow tuple. That granularity is the whole point — it’s what lets you say “Germany’s imports of HS 8703 passenger cars from China grew X% from 2024 to 2025.”
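Because each row is identified by that five-part tuple, the tuple works naturally as a dictionary key. Here is a minimal sketch using hypothetical names (`FlowKey`, `yoy_growth_pct`) and toy values rather than real Comtrade figures:

```python
from __future__ import annotations

from typing import NamedTuple


class FlowKey(NamedTuple):
    """One observation: reporter x partner x commodity x period x flow (hypothetical type)."""

    reporter_m49: str
    partner_m49: str
    hs_code: str
    period: str
    flow: str


def yoy_growth_pct(values: dict[FlowKey, float], base: FlowKey, later: FlowKey) -> float | None:
    """Percent change in trade value between two tuples that differ only in period.

    Returns None if either tuple is absent (not reported) or the base value is zero.
    """
    a, b = values.get(base), values.get(later)
    if a is None or b is None or a == 0:
        return None
    return (b - a) / a * 100.0


if __name__ == "__main__":
    # Toy values for illustration only; not real Comtrade data.
    k2024 = FlowKey("276", "156", "8703", "2024", "import")  # Germany <- China, HS 8703
    k2025 = k2024._replace(period="2025")
    print(yoy_growth_pct({k2024: 100.0, k2025: 125.0}, k2024, k2025))  # 25.0
```

Returning `None` for a missing tuple, rather than treating it as zero, keeps "not reported" distinct from "no trade".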

The government-API reality

This is not a site that fights you. Comtrade wants the data used — it’s a UN statistical public good. The friction is structural, not adversarial:

  1. Country names aren’t keys. Comtrade addresses countries by UN M49 numeric codes: “USA” is 842, “Germany” is 276, and the World aggregate (all partners combined) is 0. You can’t query by name, so you have to resolve names to M49 codes first. A good scraper does this resolution for you.
  2. HS codes are the commodity language. The Harmonized System is a hierarchical taxonomy: 2-digit chapters (e.g., 87 vehicles), 4-digit headings (8703 cars), 6-digit subheadings (870323). You query at the level you need; coarser codes are aggregates of finer ones, so don’t double-count across levels.
  3. The keyless tier is rate-limited. Comtrade offers a free preview/public tier without an API key, but it caps both the request rate (roughly one request per second is a safe pace) and the number of rows returned per call. Hammer it and you get throttled or temporarily blocked.
  4. Fan-out multiplies fast. “All EU importers of all 4-digit textile codes, monthly, for three years” is millions of tuples. You have to fan out across countries, commodities, and periods deliberately, with throttling, or you’ll trip the limits within the first minute.
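The two translation steps in points 1 and 2 are small enough to sketch. This assumes a hypothetical, hard-coded `M49_BY_NAME` table holding only the codes mentioned in this guide. The helper names are illustrative and not part of any Comtrade library:

```python
from __future__ import annotations

# Hypothetical, deliberately partial lookup table using only the codes quoted in this
# guide. A real pipeline would load the full M49 reference list instead.
M49_BY_NAME = {
    "usa": "842",
    "germany": "276",
    "france": "251",
    "china": "156",
    "world": "0",  # aggregate of all partners, not a country
}


def resolve_m49(name: str) -> str:
    """Map a country name to its M49 code, ignoring case and surrounding whitespace."""
    try:
        return M49_BY_NAME[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown country name: {name!r}") from None


def hs_level(hs_code: str) -> int:
    """Return the HS level (2, 4 or 6) of a numeric commodity code."""
    if not hs_code.isdigit() or len(hs_code) not in (2, 4, 6):
        raise ValueError(f"not a 2-, 4- or 6-digit HS code: {hs_code!r}")
    return len(hs_code)


def hs_parent(hs_code: str) -> str | None:
    """Return the next-coarser HS code (870323 -> 8703 -> 87), or None for a chapter."""
    level = hs_level(hs_code)
    return None if level == 2 else hs_code[: level - 2]
```

Failing loudly on an unknown name is a deliberate choice. A silent fallback would quietly drop a country from the query grid.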

The naive approach — a loop firing requests as fast as Python can issue them — gets rate-limited immediately. What works is automatic M49 resolution, request pacing to respect the ~1 req/s limit, and structured fan-out that pages through the country/commodity/period grid politely. That throttling-and-resolution logic is exactly what a managed actor solves once so you don’t rebuild it.
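Pacing and backoff are the core of that logic. Below is a minimal sketch under stated assumptions. `RateLimited` is a hypothetical exception your own HTTP code would raise on a throttling response (for example HTTP 429). The 1-second interval and 2-second base delay are illustrative defaults, not documented Comtrade limits. The clock and sleep functions are injectable so the logic can be tested without waiting:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception your fetch function raises when the API throttles you."""


class Pacer:
    """Enforce a minimum gap between consecutive requests (default: ~1 req/s)."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the previous call."""
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def call_with_backoff(
    fetch: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on RateLimited, wait base_delay * 2**attempt and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the final retry
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

In a collection loop you would call `pacer.wait()` before each request and wrap the request itself in `call_with_backoff`. Steady pacing handles the normal case, and exponential backoff handles the occasional throttle.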

Run the UN Comtrade Trade Data Scraper — bilateral import/export flows by country and HS commodity, value, weight and quantity, annual or monthly. Country-name-to-M49 resolution and rate-limit-respecting fan-out built in. No API key.

How the query structure works

You define the query along five axes, and the actor fans out across the combinations:

reporters:    ["Germany", "France"]   # resolved to M49: 276, 251
partners:     ["China", "World"]      # resolved to M49: 156, 0
commodities:  ["8703", "8708"]        # HS 4-digit: cars, car parts
flow:         ["import", "export"]
period:       monthly, 2024-01 .. 2025-12

Each cell in that grid becomes one or more requests, paced to respect the keyless rate limit, and the rows are collected into a flat table. Querying partner = World (M49 0) is a useful trick: it gives you a country’s total trade in a commodity with all partners combined, which is often what you actually want for a market-size question.
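The fan-out itself is a Cartesian product over the five axes. The sketch below assumes names have already been resolved to M49 codes and uses hypothetical helper names and parameter keys. For the example query above it produces 2 × 2 × 2 × 2 × 24 = 384 cells, each of which would then be requested at the paced rate:

```python
from __future__ import annotations

from itertools import product
from typing import Iterator


def month_range(start: str, end: str) -> list[str]:
    """Inclusive list of 'YYYY-MM' periods, e.g. month_range('2024-11', '2025-01')."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    periods = []
    while (year, month) <= (end_year, end_month):
        periods.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return periods


def build_grid(reporters, partners, commodities, flows, periods) -> Iterator[dict]:
    """Yield one request-parameter dict per cell of the query grid."""
    for rep, par, cmd, flow, period in product(reporters, partners, commodities, flows, periods):
        yield {"reporter_m49": rep, "partner_m49": par, "hs_code": cmd,
               "flow": flow, "period": period}


if __name__ == "__main__":
    cells = list(build_grid(
        ["276", "251"],          # Germany, France
        ["156", "0"],            # China, World aggregate
        ["8703", "8708"],        # cars, car parts
        ["import", "export"],
        month_range("2024-01", "2025-12"),
    ))
    print(len(cells))  # 384
```

Generating cells lazily keeps memory flat even when the grid grows into the millions.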

Schema design for downstream analysis

Comtrade’s raw response is verbose and uses code fields everywhere. For a warehouse you want it normalized and labeled:

{
  "reporter": "Germany",
  "reporter_m49": "276",
  "partner": "China",
  "partner_m49": "156",
  "flow": "import",
  "hs_code": "8703",
  "hs_level": 4,
  "period": "2025-03",
  "period_type": "monthly",
  "trade_value_usd": 412938221,
  "net_weight_kg": 38211904,
  "quantity": 18422,
  "quantity_unit": "Number of items",
  "scraped_at": "2026-06-03T12:00:00Z"
}

Schema choices worth making early:

  • Keep both the M49 code and the resolved name. The code is the stable join key; the name is for humans and changes formatting across sources.
  • Store hs_level alongside hs_code. It prevents the classic mistake of summing a 2-digit aggregate together with its own 4-digit children.
  • Don’t assume weight or quantity is always present. Many reporters file value-only for some commodities. Treat physical measures as nullable.
  • Record period_type. Monthly and annual figures live in the same table only if you tag them; mixing them silently produces nonsense totals.
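Here is a minimal normalization sketch that applies those four rules. It makes several assumptions about a Comtrade-style payload: the raw field names (`reporterCode`, `partnerCode`, `flowCode`, `cmdCode`, `period`, `primaryValue`, `netWgt`, `qty`, `qtyUnitAbbr`, and the `*Desc` labels), the `M`/`X` flow codes, and the `"202503"`/`"2025"` period formats. None of these is a confirmed contract, so verify them against real responses before use:

```python
from __future__ import annotations

from datetime import datetime, timezone

# ASSUMPTION: flow codes and raw field names mimic a Comtrade-style payload;
# check them against the real response before relying on this mapping.
FLOW_NAMES = {"M": "import", "X": "export"}


def _nullable(value):
    """Keep missing measures as None instead of coercing them to 0."""
    return None if value in (None, "") else value


def normalize_row(raw: dict, scraped_at: datetime | None = None) -> dict:
    period = str(raw["period"])
    if len(period) == 6:  # assumed monthly form, e.g. "202503"
        period_out, period_type = f"{period[:4]}-{period[4:]}", "monthly"
    elif len(period) == 4:  # assumed annual form, e.g. "2025"
        period_out, period_type = period, "annual"
    else:
        raise ValueError(f"unexpected period format: {period!r}")

    hs_code = str(raw["cmdCode"])
    ts = (scraped_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "reporter": raw.get("reporterDesc"),
        "reporter_m49": str(raw["reporterCode"]),  # stable join key, kept as a string
        "partner": raw.get("partnerDesc"),
        "partner_m49": str(raw["partnerCode"]),
        "flow": FLOW_NAMES.get(raw.get("flowCode"), raw.get("flowCode")),
        "hs_code": hs_code,
        "hs_level": len(hs_code) if hs_code.isdigit() else None,  # None for non-numeric aggregates
        "period": period_out,
        "period_type": period_type,
        "trade_value_usd": _nullable(raw.get("primaryValue")),
        "net_weight_kg": _nullable(raw.get("netWgt")),
        "quantity": _nullable(raw.get("qty")),
        "quantity_unit": raw.get("qtyUnitAbbr"),
        "scraped_at": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def total_value(rows: list[dict], hs_level: int, period_type: str) -> float:
    """Sum trade value at ONE HS level and ONE period type, never mixing them."""
    return sum(
        r["trade_value_usd"]
        for r in rows
        if r["hs_level"] == hs_level
        and r["period_type"] == period_type
        and r["trade_value_usd"] is not None
    )
```

Forcing every aggregate through `total_value` with an explicit level and period type makes the "chapter plus its own headings" and "monthly plus annual" mistakes hard to commit by accident.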

Typical use cases

What people actually do with Comtrade data:

  • Market-entry research — identify which countries import a product, in what volumes and values, and how that’s trended over years.
  • Supplier and sourcing discovery — find the origin countries and trade partners for a commodity to map a supply chain.
  • Commodity trend tracking — chart import/export value and volume by HS code across periods to spot growth or collapse.
  • Tariff and supply-chain impact analysis — model how bilateral flows shift after a tariff change or disruption.
  • Economics and BI dashboards — feed clean country×commodity×period rows into reports and visualizations.
  • AI and data products — supply LLM workflows and datasets with structured, cleaned trade statistics rather than raw API JSON.
  • Trade finance and risk — assess exposure across country pairs and commodity concentrations.

The common thread: the value is in breadth and structure. A single country-pair lookup you can do by hand on the Comtrade site; a clean, labeled, multi-year matrix across dozens of countries and commodities is sellable infrastructure.

Cost math for the managed approach

This actor is effectively free to run: pricing is a trivial per-start charge ($0.00005 to start a run) with no per-result fee. The real constraint isn’t dollars, it’s the keyless tier’s rate limit. A run that fans out across, say, 30 reporters × 20 HS codes × 12 months covers 7,200 tuples. At ~1 req/s with one request per tuple, that’s 7,200 seconds, roughly two hours of patient collection (longer if responses need paging), and it costs essentially nothing beyond compute.
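The timing arithmetic is easy to get wrong, so it helps to write it down. Here is a back-of-envelope helper with a hypothetical name. It assumes one request per tuple, no batching of several codes or periods into a single call, and no parallelism:

```python
def estimate_collection_hours(
    n_reporters: int,
    n_codes: int,
    n_periods: int,
    requests_per_second: float = 1.0,
    requests_per_tuple: int = 1,
) -> float:
    """Rough wall-clock estimate for a paced, sequential fan-out.

    Assumes every tuple costs requests_per_tuple calls (1 = no paging, no batching)
    and that requests are issued strictly one after another at the paced rate.
    """
    tuples = n_reporters * n_codes * n_periods
    seconds = tuples * requests_per_tuple / requests_per_second
    return seconds / 3600


if __name__ == "__main__":
    print(estimate_collection_hours(30, 20, 12))  # 2.0
```

Paging multiplies the figure (set `requests_per_tuple` above 1). Batching several codes or periods into one call, where the API allows it, divides it.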

Compare to the build-it-yourself path:

  • You’d write the M49 resolution table, the HS-level bookkeeping, and the throttling/backoff loop yourself — a day or two to get right, plus debugging the rate-limit edge cases.
  • A registered Comtrade API key raises the limits but adds an account, key management, and quota tracking. The keyless tier is genuinely sufficient for most analysis if you respect the pacing.

The saved cost here is mostly engineering hours spent on M49 mapping and backoff logic — unglamorous plumbing that the managed actor has already solved.

Common pitfalls

A few things to know before you commit to a Comtrade pipeline:

  • Reporter and partner figures aren’t symmetric. Germany’s reported imports from China rarely match China’s reported exports to Germany exactly: valuation differs (imports are usually valued CIF, exports FOB), and so do timing and reporting practices. Don’t expect mirror flows to reconcile.
  • Not every country reports every period. Coverage is uneven, especially for recent monthly data and smaller economies. A missing row means “not reported,” not “zero trade.”
  • HS revisions change codes over time. The HS nomenclature is revised periodically; a code’s meaning can shift across revision years. Pin the classification if you’re comparing long time spans.
  • The “World” partner is an aggregate, not a country. Useful for totals, but don’t mix it into a partner-level breakdown or you’ll double the totals.
  • Respect the rate limit. The keyless tier will throttle aggressive fan-out. Patient pacing beats parallel hammering every time.
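Two of these pitfalls are easy to guard against in code. The sketch below works over rows shaped like the schema above and uses hypothetical helper names. Partner-level sums skip the World aggregate (M49 `0`), and coverage checks report periods with no row as missing rather than filling them with zero. The sample values are toy numbers, not Comtrade data:

```python
from __future__ import annotations

WORLD_M49 = "0"  # Comtrade's World aggregate; not a real partner country


def partner_sum(rows: list[dict]) -> float:
    """Sum trade value across individual partners, excluding the World aggregate."""
    return sum(
        r["trade_value_usd"]
        for r in rows
        if r["partner_m49"] != WORLD_M49 and r["trade_value_usd"] is not None
    )


def missing_periods(rows: list[dict], expected: list[str]) -> list[str]:
    """Periods with no row at all: 'not reported', which is not the same as zero trade."""
    seen = {r["period"] for r in rows}
    return [p for p in expected if p not in seen]


if __name__ == "__main__":
    # Toy rows for illustration only.
    rows = [
        {"partner_m49": "0", "period": "2025-01", "trade_value_usd": 100},
        {"partner_m49": "156", "period": "2025-01", "trade_value_usd": 60},
        {"partner_m49": "251", "period": "2025-01", "trade_value_usd": 40},
    ]
    print(partner_sum(rows))                              # 100, not 200
    print(missing_periods(rows, ["2025-01", "2025-02"]))  # ['2025-02']
```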

Wrapping up

Comtrade is the rare scraping target that isn’t trying to stop you — it’s a public statistical good. The work is in the translation: names to M49 codes, the HS taxonomy, polite rate-limited fan-out across a big query grid. If you need one country pair once, the Comtrade website is fine. If you need a clean, labeled, multi-year matrix across many countries and commodities, let a managed actor handle the resolution and throttling and hand you the table.

Open the UN Comtrade Trade Data Scraper on Apify — official bilateral trade statistics, normalized and export-ready. Keyless access, rate-limit-aware. Start with Apify’s free monthly credit.
