business · Jun 3, 2026 · 6 min read

How to Scrape Numbeo Cost of Living Data in 2026

Extract Numbeo cost-of-living prices, indices and city rankings — 55+ price items per city across food, rent, transport, salaries and more, with low/avg/high ranges and composite indices.

Numbeo is the largest crowdsourced cost-of-living database on the web — prices for groceries, rent, transport, restaurants and salaries across thousands of cities, plus the composite indices (Cost of Living Index, Rent Index, Local Purchasing Power) that power countless “most expensive cities” articles and relocation calculators. The data is public and the pages are static HTML, which makes this a clean, lightweight scrape. This guide covers what Numbeo exposes, how the data is structured, and how to extract it city-by-city or as global rankings.

What’s worth extracting

Numbeo’s value is the breadth of items per city. For each city the scraper pulls 55+ price items across nine categories, each with a reported low / average / high range:

Restaurants — meal prices, inexpensive vs. mid-range, beer, coffee.
Groceries — milk, bread, eggs, produce, meat, etc.
Transport — local ticket, monthly pass, taxi, fuel.
Utilities — electricity/heating/water, mobile, internet.
Childcare — preschool, private school.
Clothing — jeans, dresses, shoes.
Rent — apartments by size and city-center vs. outside.
Apartment purchase — price per square meter, city-center vs. outside.
Salaries & finance — average net salary, mortgage interest rate.

On top of the raw items, Numbeo publishes composite indices and rankings:

Indices — Cost of Living Index, Rent Index, Groceries Index, Restaurant Price Index, Local Purchasing Power Index, and more.
Global city rankings — ranked tables across cities.
Country-level data — nationwide average prices.

Beyond cost of living, Numbeo covers six further topical categories, for seven in total: quality of life, crime, health care, pollution, traffic, and property investment.

Why this is a lightweight scrape

Unlike the heavily-defended sites covered in other guides, Numbeo is straightforward:

Static HTML. The price tables and index numbers are server-rendered into the page, so a lightweight static fetch plus parser is enough — no headless browser needed.
Public pages. No login, no cookies.
Polite pacing. The actor uses limited concurrency and randomized delays, and supports residential proxy configurations, specifically to avoid rate-limiting rather than to defeat a bot stack. Numbeo doesn’t fight you hard; it just doesn’t want to be hammered.
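The polite-pacing idea above can be sketched in a few lines. This is a minimal illustration, not the actor's actual implementation: the function names (`polite_fetch_all`, `fetch_html`), the delay bounds, and the User-Agent string are all assumptions made for the example, and it fetches strictly one page at a time as the simplest form of limited concurrency.

```python
import random
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_html(url: str) -> str:
    """Plain static fetch with the standard library; no headless browser."""
    # The User-Agent value is a placeholder; identify your own client here.
    req = urllib.request.Request(url, headers={"User-Agent": "example-client/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def polite_fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_html,
    min_delay: float = 2.0,   # assumed bounds, tune to taste
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch URLs sequentially, pausing a random interval between requests."""
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Randomized gap so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        pages[url] = fetch(url)
    return pages
```

Passing `fetch` and `sleep` as parameters keeps the pacing logic testable without touching the network, and makes it easy to swap in a proxy-aware fetcher later.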

The output is deliberately flat and row-oriented — one row per price item or ranking entry — which is exactly what you want for loading into a spreadsheet, warehouse, or dashboard.

▶ Run the Numbeo Cost of Living Scraper — 55+ price items per city across 9 categories with low/avg/high ranges, plus composite indices and global rankings.

Schema design for downstream use

Because the data is flat and row-oriented, a clean per-item record looks like this:

{
  "city": "Lisbon",
  "country": "Portugal",
  "category": "rent",
  "item": "Apartment (1 bedroom) in City Centre",
  "currency": "EUR",
  "price_low": 950,
  "price_avg": 1280,
  "price_high": 1700,
  "unit": "monthly",
  "scraped_at": "2026-06-03T10:00:00Z"
}

And a ranking/index row:

{
  "city": "Lisbon",
  "country": "Portugal",
  "index_name": "Cost of Living Index",
  "index_value": 49.8,
  "rank": 214,
  "scraped_at": "2026-06-03T10:00:00Z"
}

Schema choices worth making early:

Always store low / avg / high, not just the average. The spread is information — a city with a wide grocery range behaves very differently for budgeting than one with a tight range. Throwing away low/high loses that.
Keep currency on every row. Numbeo reports in local currency by default. If you mix cities without normalizing currency you’ll compare apples to escudos.
Normalize category and item to stable keys. Item labels are long human strings; map them to stable codes if you’ll join across cities, so “Milk (1 liter)” and its localized variants line up.
Separate price rows from index rows. They have different shapes (a price has a range and currency; an index has a value and a rank). Keep two tables, not one mushy one.
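One way to express these schema choices in code is a pair of small record types plus a key normalizer. This is a sketch under assumptions: the class names, the `item_key` helper, and the `spread` property are illustrative and not part of any Numbeo or actor API. The slug only stabilizes English labels; lining up localized variants would still need an explicit lookup table.

```python
import re
from dataclasses import dataclass
from typing import Optional


def item_key(label: str) -> str:
    """Turn a long human label into a lowercase snake_case key."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


@dataclass(frozen=True)
class PriceRow:
    """A price row: carries a range and a currency."""
    city: str
    country: str
    category: str
    item: str
    currency: str
    price_low: Optional[float]   # None when the item is missing, never 0
    price_avg: Optional[float]
    price_high: Optional[float]
    unit: str
    scraped_at: str

    @property
    def key(self) -> str:
        return item_key(self.item)

    @property
    def spread(self) -> Optional[float]:
        """Width of the low/high range, or None if either bound is missing."""
        if self.price_low is None or self.price_high is None:
            return None
        return self.price_high - self.price_low


@dataclass(frozen=True)
class IndexRow:
    """An index row: carries a value and a rank, no currency."""
    city: str
    country: str
    index_name: str
    index_value: float
    rank: Optional[int]
    scraped_at: str


# Built from the sample record shown earlier in this section.
lisbon_rent = PriceRow(
    city="Lisbon", country="Portugal", category="rent",
    item="Apartment (1 bedroom) in City Centre", currency="EUR",
    price_low=950, price_avg=1280, price_high=1700,
    unit="monthly", scraped_at="2026-06-03T10:00:00Z",
)
```

Keeping `PriceRow` and `IndexRow` as separate types mirrors the two-table layout, so a loader cannot accidentally write an index value into a price column.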

Typical use cases

What people build with Numbeo data:

Relocation & expat planning — compare rent, groceries, transport and utilities across candidate destinations; power a “can I afford to move here?” calculator.
HR & salary benchmarking — adjust compensation by local purchasing power and salary data when hiring or relocating staff across markets.
Real-estate & investment research — track apartment price-per-m² and rents across markets for affordability and yield analysis.
Pricing strategy — calibrate product pricing and tiering by regional affordability.
Academic / economic research — feed PPP, inequality, and cost-of-living studies with structured city/country data.
Journalism & data viz — generate comparative city rankings, affordability maps, and the perennial “most/least expensive cities” pieces.

The common thread is comparison across places. A single city’s prices are mildly interesting; the same 55 items across 100 cities, normalized to one currency, is a genuine dataset you can rank, map, and re-run as prices drift.
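As a rough illustration of that cross-city comparison, the sketch below ranks cities on a single item. It assumes rows are plain dicts already normalized to one currency and keyed by a stable `item` code; the `rank_cities` function and the sample prices are invented for the example and are not real Numbeo figures.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]


def rank_cities(rows: List[Row], item: str, descending: bool = True) -> List[Tuple[str, float]]:
    """Rank cities by average price for one item, skipping missing prices."""
    priced = [
        (r["city"], r["price_avg"])
        for r in rows
        if r["item"] == item and r["price_avg"] is not None
    ]
    return sorted(priced, key=lambda pair: pair[1], reverse=descending)


# Made-up values in a single common currency, for illustration only.
sample_rows: List[Row] = [
    {"city": "A", "item": "milk_1_liter", "price_avg": 1.10},
    {"city": "B", "item": "milk_1_liter", "price_avg": 0.90},
    {"city": "C", "item": "milk_1_liter", "price_avg": None},  # item not reported
    {"city": "A", "item": "bread_loaf", "price_avg": 2.00},
]

most_expensive_first = rank_cities(sample_rows, "milk_1_liter")
```

City C drops out of the ranking rather than appearing at the bottom with a price of zero.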

Cost math

Pricing is pay-per-event with a small per-run start fee and the per-result event priced at zero, so the working cost is just the lightweight compute (and optional proxy bandwidth). Because it’s static HTML with no browser, a city’s full 55+ item set is cheap and quick to pull.

Example: a relocation calculator covering 80 cities, each refreshed monthly, is ~80 × 55 ≈ 4,400 price rows per refresh — a modest dataset that’s inexpensive to keep current.
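The same arithmetic as a small helper, in case you want to size a different city list. The function names are made up for this example, and 55 is the lower bound on items per city, so real row counts will run somewhat higher.

```python
def rows_per_refresh(cities: int, items_per_city: int = 55) -> int:
    """Price rows produced by one full refresh of every city."""
    return cities * items_per_city


def rows_per_year(cities: int, refreshes_per_year: int = 12, items_per_city: int = 55) -> int:
    """Price rows accumulated over a year of scheduled refreshes."""
    return rows_per_refresh(cities, items_per_city) * refreshes_per_year


monthly_rows = rows_per_refresh(80)   # 80 cities x 55 items
yearly_rows = rows_per_year(80)       # twelve monthly refreshes
```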

Building it yourself isn’t hard at the access layer, but you’d own the polite-pacing/rate-limit avoidance, the parsing of the many item rows and index tables, currency handling, and the flattening into clean rows — plus re-fixing the parser whenever Numbeo adjusts its page layout. The managed actor packages that.

Common pitfalls

The data is crowdsourced, not audited. Numbeo prices come from user submissions. They’re great for relative comparison and trends, but treat any single number as an estimate, not a quote. Thinly-sampled cities are noisier.
Currency mixing is the classic mistake. Default reporting is local currency. Normalize to a common currency (and record the FX date) before any cross-city comparison.
Indices are relative, not absolute. Numbeo’s indices are normalized against a baseline city (historically New York City, set to 100). An index of 50 means “half the baseline,” not “$50”, so don’t read currency into index values.
Sample size varies wildly. A major capital has thousands of data points; a small town might have a handful. Where Numbeo exposes contributor counts, weight your confidence accordingly.
Items aren’t perfectly uniform across cities. Some cities lack certain items. Handle missing items as null, not zero, or you’ll drag averages down.
Re-scrape for trends. Prices and rankings shift month to month. The interesting analysis (cost-of-living inflation, ranking movement) needs scheduled re-runs, not a one-time pull.
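Two of these pitfalls, currency mixing and missing items, are easy to guard against in code. The sketch below is one possible approach, assuming rows are plain dicts shaped like the price record shown earlier; `FxTable`, `normalize_row`, and `mean_ignoring_missing` are hypothetical helpers, and the exchange rate is a placeholder, not a real quote.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class FxTable:
    base: str                 # common currency to normalize into
    as_of: date               # FX date, recorded on every converted row
    rates: Dict[str, float]   # units of base currency per 1 unit of local currency

    def to_base(self, amount: Optional[float], currency: str) -> Optional[float]:
        if amount is None:
            return None       # a missing price stays missing
        if currency == self.base:
            return amount
        return amount * self.rates[currency]


def normalize_row(row: dict, fx: FxTable) -> dict:
    """Return a copy of a price row converted to the base currency."""
    out = dict(row)
    for field in ("price_low", "price_avg", "price_high"):
        out[field] = fx.to_base(row.get(field), row["currency"])
    out["currency"] = fx.base
    out["fx_date"] = fx.as_of.isoformat()
    return out


def mean_ignoring_missing(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average only the prices that exist; None if nothing was reported."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# Placeholder rate for illustration only.
fx = FxTable(base="USD", as_of=date(2026, 6, 3), rates={"EUR": 1.25})
row = {"currency": "EUR", "price_low": 4.0, "price_avg": None, "price_high": 8.0}
converted = normalize_row(row, fx)
```

Treating a missing item as 0 instead of None would pull `mean_ignoring_missing([1.0, 0, 3.0])` down to about 1.33, where the true average of the reported prices is 2.0.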

Wrapping up

Numbeo is the largest crowdsourced cost-of-living dataset on the web, and because it’s static, public HTML, extracting it is light work: no browser, no bot fight, just polite pacing. The real value comes from breadth and comparison: the same items across many cities, normalized to one currency, refreshed over time. For a quick look at one city you can read the page yourself. For a comparative, multi-city, refreshable dataset to power a calculator, benchmark, or research project, run it as a managed actor and let the parsing and pacing be handled for you.

▶ Open the Numbeo Cost of Living Scraper on Apify: 55+ items per city, composite indices and global rankings, by city or country. Pay-per-event pricing with a small per-run start fee. Start with Apify’s free monthly credit.