lead-generation · May 21, 2026 · 6 min read

How to Scrape Business Leads from OpenStreetMap in 2026

A practical guide to extracting business and POI data from OpenStreetMap via the Overpass API — names, phones, websites, hours and GPS, for any city, with no API key.

Everyone reaches for the Google Maps API when they need local business data — and then hits the quota wall, the billing card requirement, and the per-request pricing that makes a 50,000-row lead list expensive. Meanwhile, the same kind of data sits in OpenStreetMap: millions of businesses and points of interest, contributed and maintained by a global community, queryable for free through the Overpass API with no key and no card. The catch is that Overpass speaks its own query language and returns raw OSM elements that need real parsing before they’re usable as leads. This guide walks through the OSM data model, how Overpass works, and how to turn it into clean, contactable business records for any city in the world.

What’s worth extracting

For each business or POI in OpenStreetMap, the tags surface a surprisingly complete record once you normalize them:

Identity — business name, brand, operator.
Contact — phone, website, and (where tagged) email.
Location — full address components (street, housenumber, city, postcode), plus GPS latitude/longitude.
Category — the OSM amenity / shop / tourism / office tag and a derived subcategory (restaurant, pharmacy, hotel, hair salon, dentist, etc.).
Hours — machine-readable opening_hours in OSM’s standard syntax.
Context — cuisine type for restaurants, wheelchair accessibility, and external references to Wikipedia / Wikidata.
OSM metadata — the element’s stable OSM ID and element type (node / way / relation).

For lead generation, the fields that matter are name + phone + website + address. OSM coverage of those varies by region and category — dense in Europe, strong for restaurants/cafes/shops, thinner for some service businesses — but where it’s tagged, it’s clean and free.

The OSM data model in 60 seconds

OpenStreetMap stores everything as one of three element types, and understanding them is the key to good queries:

Nodes — single points. A standalone shop pin is usually a node.
Ways — ordered lists of nodes. A building footprint is a way; the business tags often live on the building outline.
Relations — groups of elements. Larger venues (a mall, a campus) can be relations.

Business attributes live in tags: key/value pairs attached to elements. A restaurant is amenity=restaurant; a pharmacy is amenity=pharmacy; a clothing store is shop=clothes. Contact tags follow conventions like phone, contact:phone, website and contact:website, while hours live under opening_hours. Because tagging is community-driven, the same concept can appear under multiple keys (phone vs contact:phone), so a good scraper checks all of them and merges.
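The merge step can be sketched in a few lines of Python. This is a minimal illustration, not the actor's implementation: the alias table and the function name merge_contact_tags are assumptions, and the email / contact:email pair is added by analogy with the phone and website keys named above.

```python
# Alias groups are an assumption for illustration; extend them as needed.
CONTACT_ALIASES = {
    "phone": ("phone", "contact:phone"),
    "website": ("website", "contact:website"),
    "email": ("email", "contact:email"),
}


def merge_contact_tags(tags):
    """Return one value per contact field, taking the first non-empty alias."""
    merged = {}
    for field, keys in CONTACT_ALIASES.items():
        values = (tags.get(key, "").strip() for key in keys)
        # None (not a guess) when no alias carries a value.
        merged[field] = next((value for value in values if value), None)
    return merged


# Hypothetical tag set in which the phone sits under contact:phone only.
example_tags = {
    "amenity": "restaurant",
    "contact:phone": "+31 20 625 5559",
    "website": "https://cafedeklos.nl",
}
print(merge_contact_tags(example_tags))
```

Checking the plain key first and the contact: variant second is an arbitrary order; when both are present and differ, you may prefer to keep both and resolve them downstream.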

How Overpass queries work

The Overpass API is a read-only query engine over the OSM database. You send it an Overpass QL query and it returns the matching elements, as JSON when the query starts with [out:json]. A query to find every restaurant mapped as a node or way inside a bounding box looks like this:

[out:json][timeout:60];
(
  node["amenity"="restaurant"](52.3,4.8,52.4,5.0);
  way["amenity"="restaurant"](52.3,4.8,52.4,5.0);
);
out center tags;

A few things to note:

Spatial scoping is effectively mandatory. You query by bounding box (south,west,north,east), by a radius around a point (the around: filter), or by an administrative area. An unscoped query for “all restaurants on Earth” would simply time out.
out center adds a representative center coordinate to ways and relations, so every result has a usable lat/lon, not just nodes.
Public endpoints are shared infrastructure. Overpass instances are community-run and rate-limited. Hammer one and you get throttled. Production scraping rotates across endpoints and falls back when one is busy.
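The scoping rules above can be sketched as a small query builder. It is a hedged example: build_poi_query and its parameters are hypothetical names, and it only reproduces the node + way pattern of the query shown earlier, scoped either by bounding box or by the around: filter.

```python
def build_poi_query(key, value, bbox=None, around=None, timeout=60):
    """Build an Overpass QL query for nodes and ways tagged key=value.

    bbox   -- (south, west, north, east) in decimal degrees
    around -- (radius_m, lat, lon) for a radius search around a point
    """
    if (bbox is None) == (around is None):
        raise ValueError("pass exactly one of bbox or around")
    if bbox is not None:
        scope = "({},{},{},{})".format(*bbox)
    else:
        scope = "(around:{},{},{})".format(*around)
    selector = f'["{key}"="{value}"]'
    return (
        f"[out:json][timeout:{timeout}];\n"
        "(\n"
        f"  node{selector}{scope};\n"
        f"  way{selector}{scope};\n"
        ");\n"
        "out center tags;"
    )


# Same bounding box as the example query above.
print(build_poi_query("amenity", "restaurant", bbox=(52.3, 4.8, 52.4, 5.0)))
```

Refusing to build a query without a bbox or radius keeps an accidental planet-wide request from ever reaching a shared endpoint.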

This is direct HTTP — no browser, no proxy, no anti-bot stack. The complexity is entirely in writing correct Overpass QL, scoping the geography, handling endpoint throttling gracefully, and parsing the inconsistent tag conventions into clean fields.
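A rough sketch of the throttling side, using only the Python standard library. The endpoint list, the User-Agent string, the retry counts and the function names are all assumptions for illustration; public instances come and go, so check each operator's usage policy before relying on one.

```python
import json
import time
import urllib.parse
import urllib.request

# Example public instances (an assumption; availability and policies change).
ENDPOINTS = [
    "https://overpass-api.de/api/interpreter",
    "https://overpass.kumi.systems/api/interpreter",
]


def post_query(endpoint, query, timeout=90):
    """POST an Overpass QL query and return the decoded JSON response."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"User-Agent": "osm-leads-example/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def run_with_fallback(query, endpoints=ENDPOINTS, send=post_query,
                      rounds=3, base_delay=2.0, sleep=time.sleep):
    """Try each endpoint in turn; back off exponentially between rounds."""
    last_error = None
    for attempt in range(rounds):
        for endpoint in endpoints:
            try:
                return send(endpoint, query)
            except (OSError, ValueError) as error:
                # OSError covers HTTP errors and timeouts; ValueError covers
                # a non-JSON body such as an HTML error page.
                last_error = error
        if attempt < rounds - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all Overpass endpoints failed") from last_error
```

Passing send and sleep as parameters is a design choice that lets the rotation logic be exercised with a stub instead of live requests against shared infrastructure.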

▶ Run the OpenStreetMap Business & POI Scraper — query by city, bounding box or radius across 50+ POI categories, get name, address, phone, website, hours and GPS as clean rows. No API key, no proxy.

Schema design for downstream use

When the output lands in your CRM or lead database, you want flat, deduplicated, contact-ready rows. A clean record:

{
  "osm_id": "node/2841093821",
  "osm_type": "node",
  "name": "Café de Klos",
  "category": "amenity",
  "subcategory": "restaurant",
  "cuisine": "dutch",
  "phone": "+31 20 625 5559",
  "website": "https://cafedeklos.nl",
  "email": null,
  "street": "Kerkstraat",
  "housenumber": "41",
  "city": "Amsterdam",
  "postcode": "1017 GB",
  "lat": 52.3641,
  "lon": 4.8896,
  "opening_hours": "Mo-Su 17:30-23:00",
  "wikidata": null,
  "scraped_at": "2026-05-21T12:00:00Z"
}

A few schema choices worth making early:

Keep the osm_id and osm_type together. The numeric ID alone isn’t unique across element types: node/123 and way/123 are different objects. The pair (or a type-prefixed ID such as node/2841093821, as in the record above) is your stable join key and your dedup key.
Normalize the contact keys at parse time. Merge phone/contact:phone and website/contact:website into single fields, and when neither key is present store null rather than guessing.
Filter on name presence for lead lists. Unnamed POIs (a generic bench or ATM node) pollute outreach lists. Require a name when the goal is contactable businesses.
Don’t try to parse opening_hours into a calendar at scrape time. Store the raw OSM syntax string; parse it downstream only if you need it.
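Taken together, these choices suggest a flattening step along the following lines. It is a sketch under assumptions: to_record and first_tag are hypothetical helpers, the addr:* keys are the usual OSM address convention rather than something named above, and the sample element is invented.

```python
from datetime import datetime, timezone

CATEGORY_KEYS = ("amenity", "shop", "tourism", "office")


def first_tag(tags, *keys):
    """Return the first non-empty value among the given tag keys, else None."""
    for key in keys:
        value = tags.get(key, "").strip()
        if value:
            return value
    return None


def to_record(element, scraped_at=None):
    """Flatten one Overpass JSON element into a lead row, or None if unnamed."""
    tags = element.get("tags", {})
    name = first_tag(tags, "name")
    if name is None:
        return None  # unnamed POIs are dropped from lead lists
    # Nodes carry lat/lon directly; ways and relations carry "center"
    # when the query ends with "out center".
    point = element if "lat" in element else element.get("center", {})
    category = next((key for key in CATEGORY_KEYS if key in tags), None)
    return {
        "osm_id": f'{element["type"]}/{element["id"]}',
        "osm_type": element["type"],
        "name": name,
        "category": category,
        "subcategory": tags.get(category) if category else None,
        "cuisine": first_tag(tags, "cuisine"),
        "phone": first_tag(tags, "phone", "contact:phone"),
        "website": first_tag(tags, "website", "contact:website"),
        "email": first_tag(tags, "email", "contact:email"),
        "street": first_tag(tags, "addr:street"),
        "housenumber": first_tag(tags, "addr:housenumber"),
        "city": first_tag(tags, "addr:city"),
        "postcode": first_tag(tags, "addr:postcode"),
        "lat": point.get("lat"),
        "lon": point.get("lon"),
        "opening_hours": first_tag(tags, "opening_hours"),  # raw, unparsed
        "wikidata": first_tag(tags, "wikidata"),
        "scraped_at": scraped_at
        or datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


# Invented element shaped like an Overpass "out center tags" result for a way.
sample = {
    "type": "way",
    "id": 42,
    "center": {"lat": 52.3641, "lon": 4.8896},
    "tags": {"name": "Example Bistro", "amenity": "restaurant",
             "contact:phone": "+31 20 000 0000",
             "opening_hours": "Mo-Su 17:30-23:00"},
}
print(to_record(sample, scraped_at="2026-05-21T12:00:00Z"))
```

Returning None for unnamed elements keeps the name filter in one place, so the caller can simply skip falsy results.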

Typical use cases

What customers actually do with OSM business data:

Lead generation — build targeted lists of restaurants, hotels, clinics or salons in a city, with phone and website for cold outreach.
Google Maps API alternative — get comparable geographic POI data without paid Maps quotas or a billing card.
Local directory and city-guide sites — populate neighborhood pages with addresses, categories and hours.
GIS and urban research — pull structured POI layers for catchment modelling, accessibility studies and planning.
Real estate amenity scoring — count pharmacies, supermarkets and schools within a radius of a property.
Market research — map business density per neighborhood, analyze competitor clustering and find underserved areas.

The common thread is geographic breadth at zero data cost. OSM won’t have a phone number for every business, but for the categories and regions where it’s well-tagged, it’s a free, legitimate, license-clear source — far cheaper than Maps API quotas for bulk work.
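As one illustration, the amenity-scoring use case reduces to a distance filter over flat records. The sketch below assumes rows with lat, lon and subcategory fields as in the schema above; the function names and the sample rows are hypothetical, and the haversine formula treats the Earth as a sphere, which is accurate enough for neighborhood-scale radii.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within(records, lat, lon, radius_m, subcategories):
    """Count records per subcategory within radius_m of the given point."""
    counts = {name: 0 for name in subcategories}
    for record in records:
        if record["subcategory"] not in counts:
            continue
        distance = haversine_m(lat, lon, record["lat"], record["lon"])
        if distance <= radius_m:
            counts[record["subcategory"]] += 1
    return counts


# Invented rows: one pharmacy roughly 100 m away, one supermarket km away.
rows = [
    {"subcategory": "pharmacy", "lat": 52.3650, "lon": 4.8900},
    {"subcategory": "supermarket", "lat": 52.4000, "lon": 4.9500},
]
print(count_within(rows, 52.3641, 4.8896, 500,
                   ["pharmacy", "supermarket", "school"]))
```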

Cost math for the managed approach

Because this is direct HTTP with no browser and no proxy, the compute footprint is minimal. A typical lead-gen run — every restaurant and cafe in a mid-size European city — returns a few thousand rows in well under a minute. Under this actor’s pricing, results are emitted at no per-row charge, so cost is dominated by the tiny per-run start fee. Scraping ten cities a day is still a rounding error on your monthly Apify credit.

Compared to the Google Maps Places API, the contrast is stark: Places billing runs into dollars per thousand requests once you exceed the free tier, and bulk extraction is explicitly throttled. For high-volume lead building, OSM via Overpass is the order-of-magnitude cheaper path.

What you avoid by using a managed actor rather than rolling your own:

Learning and debugging Overpass QL and bounding-box math
Handling endpoint throttling and writing fallback logic across mirrors
Normalizing the inconsistent tag conventions into clean fields
Maintaining the 50+ category filter mappings

Common pitfalls

A few things to know before you build an OSM lead pipeline, whether you build or buy:

Coverage is uneven. OSM is excellent in Western Europe and major cities, sparser in some regions and service categories. Always sanity-check density before promising a client a full list.
Overpass rate limits are real. Public endpoints throttle aggressively under load. Without endpoint rotation and backoff, large runs stall.
Tag inconsistency. The same data hides under different keys across regions and contributors. A scraper that only reads phone and ignores contact:phone silently loses leads.
Stale or duplicate entries. Community data can include closed businesses or near-duplicate nodes. Dedup on osm_type+osm_id and treat the data as a lead candidate list, not verified ground truth.
Respect the data license. OSM data is ODbL-licensed. Using it for lead generation is fine; redistributing derived datasets carries attribution and share-alike obligations worth understanding.
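The dedup advice above is simple to apply once rows are flat. In this sketch, dedupe implements the osm_type + osm_id key from the text, while near_duplicate_key is an added heuristic (an assumption, not something the source prescribes) that flags rows sharing a name and coordinates rounded to four decimals, roughly 11 m of latitude.

```python
def dedupe(records):
    """Keep the first record seen for each (osm_type, osm_id) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (record["osm_type"], record["osm_id"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def near_duplicate_key(record, decimals=4):
    """Rough grouping key: case-folded name plus rounded coordinates.

    Points that straddle a rounding boundary get different keys, so treat
    matches as candidates for review, not as proof of duplication.
    """
    return (
        record["name"].strip().casefold(),
        round(record["lat"], decimals),
        round(record["lon"], decimals),
    )


# Invented rows: the same numeric ID on two element types is NOT a duplicate.
rows = [
    {"osm_type": "node", "osm_id": 123, "name": "Bistro", "lat": 52.36412, "lon": 4.88961},
    {"osm_type": "way", "osm_id": 123, "name": "bistro ", "lat": 52.36414, "lon": 4.88963},
    {"osm_type": "node", "osm_id": 123, "name": "Bistro", "lat": 52.36412, "lon": 4.88961},
]
unique = dedupe(rows)
print(len(unique))
print(near_duplicate_key(unique[0]) == near_duplicate_key(unique[1]))
```

In the sample, the repeated node is removed by the exact key, and the surviving node and way share a near-duplicate key, which is the kind of pair worth a manual look before outreach.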

Wrapping up

OpenStreetMap is one of the most underused sources of free local-business data on the web. The data model and Overpass QL have a learning curve, and the tag inconsistency punishes naive parsers, which is exactly why a managed actor that handles the queries, endpoint fallback and normalization earns its keep. If you need contactable business leads for any city without paying Google Maps API rates, this is the cleanest path.

▶ Open the OpenStreetMap Business & POI Scraper on Apify — city, bbox or radius queries, 50+ categories, clean lead-ready rows with GPS. No API key, no proxy, pay only the per-run start fee.