logiover
lead-generation · May 29, 2026 · 6 min read

How to Scrape the NPI Registry of US Healthcare Providers in 2026

Pull US healthcare provider data from the official NPPES NPI Registry — name, specialty, license, address and phone for 8M+ doctors, dentists, pharmacies and clinics, with no API key or login.

The NPPES NPI Registry is the closest thing the US has to a master directory of every healthcare provider — over 8 million physicians, dentists, pharmacies, clinics, hospitals and other entities, each with a National Provider Identifier (NPI), specialty, license, and practice contact details. The best part for anyone building provider datasets: it’s government data with an official, open API. No login, no API key, no anti-bot fight. This guide covers what the registry exposes and how to turn it into a clean, scheduled, refreshable provider database.

Why this is the easy mode of scraping

Most guides on this site are partly about defeating bot detection. This one isn’t, and that’s the headline. The data comes from the official CMS/NPPES government API:

  • No API key. The endpoint is public.
  • No login, no cookies, no session. Nothing to authenticate or rotate.
  • No blocking to fight. It’s a government data service meant to be queried; there’s no Akamai, no reCAPTCHA, no fingerprinting.

That means the engineering challenge isn't access; it's volume and shaping. The registry is enormous, the API paginates and rate-limits (so a broad pull has to be split into many narrow queries), and raw records are nested and need normalizing. The scraper, packaged as an Apify actor, has to query at scale, de-duplicate, flatten records into clean rows, and return tens of thousands per run.
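A minimal sketch of the paging loop in Python, using only the standard library. It assumes the public v2.1 endpoint at https://npiregistry.cms.hhs.gov/api/ and the `limit` (maximum 200) and `skip` (maximum 1,000) parameters described on the API's help page at the time of writing; verify both limits before relying on them. Under those limits one query can reach at most 1,200 records. `paged_params`, `fetch_page` and `fetch_all` are hypothetical helper names, and the half-second pause is an arbitrary politeness delay, not a documented limit.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed from the public NPPES API help page; verify before relying on them.
API_URL = "https://npiregistry.cms.hhs.gov/api/"
PAGE_LIMIT = 200  # assumed maximum results per request
MAX_SKIP = 1000   # assumed maximum offset, so one query tops out at 1,200 records


def paged_params(query: dict) -> Iterator[dict]:
    """Yield one parameter dict per page (skip = 0, 200, ..., 1000) for a query."""
    skip = 0
    while skip <= MAX_SKIP:
        yield {**query, "version": "2.1", "limit": PAGE_LIMIT, "skip": skip}
        skip += PAGE_LIMIT


def fetch_page(params: dict) -> dict:
    """GET one page from the live API and parse the JSON body (hypothetical helper)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all(query: dict,
              fetch: Callable[[dict], dict] = fetch_page,
              pause: float = 0.5) -> list:
    """Collect every record for one query, stopping early on a short page."""
    records = []
    for params in paged_params(query):
        page = fetch(params).get("results", [])
        records.extend(page)
        if len(page) < PAGE_LIMIT:
            break  # a short page means there is nothing left to fetch
        time.sleep(pause)  # be polite to a shared public service
    return records
```

The `fetch` argument exists so the loop can be exercised with a stub instead of the live service. If a query comes back with the full 1,200 records, treat it as truncated and split it further, for example by city or postal code.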

What’s worth extracting

Per provider, the registry yields a complete identity-and-contact record:

  • Identity — provider name and the NPI (the stable national identifier).
  • Type — individual (NPI-1) vs. organization (NPI-2).
  • Specialty — taxonomy description and taxonomy codes.
  • Credentials — license number(s).
  • Contact — practice address, phone and fax.
  • Authorized official — for organizations, the contact name and title.
  • Status — record status (active/deactivated).
  • Dates — enumeration/issuance date and last-updated date.
  • Provenance — scrape timestamp.

Records are de-duplicated by NPI, so re-running never double-counts a provider.
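The same provider can surface in several overlapping queries (one per taxonomy or city, say), so de-duplication is a dictionary keyed on the NPI. A sketch, assuming flat rows with an `npi` string and an ISO-8601 `last_updated` date like the sample record in this guide; when copies disagree it keeps the most recently updated one. `dedupe_by_npi` is a hypothetical helper.

```python
from typing import Iterable


def dedupe_by_npi(rows: Iterable[dict]) -> list[dict]:
    """Keep one row per NPI, preferring the copy with the latest last_updated.

    Assumes ISO-8601 date strings (YYYY-MM-DD), which sort correctly as text.
    """
    best: dict[str, dict] = {}
    for row in rows:
        current = best.get(row["npi"])
        if current is None or row.get("last_updated", "") > current.get("last_updated", ""):
            best[row["npi"]] = row
    return list(best.values())
```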

Filtering: the whole game is the query

Because access is trivial, the value comes from targeting. The actor filters on the dimensions that matter for building a useful list:

  • State — scope to one or more US states.
  • City — narrow to a metro.
  • Specialty / taxonomy — the most important filter for sales and recruiting: “all cardiologists,” “all retail pharmacies,” “all pediatric dentists.”
  • Provider type — individuals vs. organizations.

Combine these and you go from “8 million records” to “every endocrinologist in Texas” in one run — which is exactly the shape a downstream team can act on.
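A sketch of how filter lists might expand into individual API queries, one per combination. The parameter names `state`, `city`, `taxonomy_description` and `enumeration_type` (values `NPI-1` and `NPI-2`) follow the public NPPES API as we understand it; `build_queries` is a hypothetical helper and the taxonomy strings are illustrative.

```python
from itertools import product
from typing import Optional, Sequence


def build_queries(states: Sequence[str],
                  taxonomies: Sequence[str],
                  cities: Sequence[Optional[str]] = (None,),
                  enumeration_types: Sequence[Optional[str]] = (None,)) -> list[dict]:
    """Expand filter lists into one query dict per combination.

    None in cities or enumeration_types means "don't filter on this dimension".
    """
    queries = []
    for state, taxonomy, city, etype in product(states, taxonomies, cities, enumeration_types):
        q = {"state": state, "taxonomy_description": taxonomy}
        if city:
            q["city"] = city
        if etype:
            q["enumeration_type"] = etype  # "NPI-1" individuals, "NPI-2" organizations
        queries.append(q)
    return queries


# "Every endocrinologist in Texas", individuals only:
texas_endo = build_queries(["TX"], ["Endocrinology, Diabetes & Metabolism"],
                           enumeration_types=["NPI-1"])
```

Narrow combinations keep each query small enough to page through completely; de-duplicating by NPI afterwards absorbs providers that match more than one combination.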

Run the NPI Registry Scraper — name, specialty, license, address and phone for US providers from the official NPPES registry. No API key, no login, tens of thousands per run.

Schema design for downstream use

A flat, CRM-ready per-provider record:

{
  "npi": "1497896453",
  "provider_type": "individual",
  "first_name": "Maria",
  "last_name": "Velez",
  "credential": "MD",
  "taxonomy_description": "Endocrinology, Diabetes & Metabolism",
  "taxonomy_code": "207RE0101X",
  "license_number": "Q9182",
  "license_state": "TX",
  "address_line": "1200 Medical Center Dr, Suite 410",
  "city": "Houston",
  "state": "TX",
  "postal_code": "77030",
  "phone": "713-555-0142",
  "fax": "713-555-0143",
  "authorized_official": null,
  "status": "active",
  "enumeration_date": "2008-06-12",
  "last_updated": "2025-11-03",
  "scraped_at": "2026-05-29T09:00:00Z"
}
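Producing that flat shape means walking the API's nested response. The sketch assumes the v2.1 result layout as we understand it: a `number`, an `enumeration_type` of `NPI-1` or `NPI-2`, a `basic` object, and `addresses` and `taxonomies` arrays, where taxonomies carry `code`, `desc`, `primary`, `state` and `license`, and addresses carry an `address_purpose` of `LOCATION` or `MAILING`. Check these names against a live response. `flatten` is a hypothetical helper, the extra `organization_name` column is an addition for NPI-2 records, and mapping every status other than `A` to deactivated is a simplification.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def _pick(items: list, wanted: Callable[[dict], bool]) -> dict:
    """Return the first item matching `wanted`, else the first item, else {}."""
    for item in items:
        if wanted(item):
            return item
    return items[0] if items else {}


def _join(*parts: Optional[str], sep: str = " ") -> Optional[str]:
    """Join the non-empty parts, or return None if every part is empty."""
    return sep.join(p for p in parts if p) or None


def flatten(raw: dict, scraped_at: Optional[str] = None) -> dict:
    """Flatten one raw NPPES result into a CRM-ready row (hypothetical helper)."""
    basic = raw.get("basic", {})
    is_org = raw.get("enumeration_type") == "NPI-2"
    # Prefer the taxonomy flagged primary and the LOCATION (practice) address
    # instead of blindly taking element [0].
    tax = _pick(raw.get("taxonomies", []), lambda t: bool(t.get("primary")))
    addr = _pick(raw.get("addresses", []), lambda a: a.get("address_purpose") == "LOCATION")
    return {
        "npi": str(raw["number"]),  # cast in case the API returns a number
        "provider_type": "organization" if is_org else "individual",
        "first_name": None if is_org else basic.get("first_name"),
        "last_name": None if is_org else basic.get("last_name"),
        "organization_name": basic.get("organization_name") if is_org else None,
        "credential": basic.get("credential"),
        "taxonomy_description": tax.get("desc"),
        "taxonomy_code": tax.get("code"),
        "license_number": tax.get("license"),
        "license_state": tax.get("state"),
        "address_line": _join(addr.get("address_1"), addr.get("address_2"), sep=", "),
        "city": addr.get("city"),
        "state": addr.get("state"),
        "postal_code": (addr.get("postal_code") or "")[:5] or None,  # trim ZIP+4 to ZIP
        "phone": addr.get("telephone_number"),
        "fax": addr.get("fax_number"),
        "authorized_official": _join(basic.get("authorized_official_first_name"),
                                     basic.get("authorized_official_last_name")) if is_org else None,
        # Simplification: any status other than "A" is treated as deactivated.
        "status": "active" if basic.get("status") == "A" else "deactivated",
        "enumeration_date": basic.get("enumeration_date"),
        "last_updated": basic.get("last_updated"),
        "scraped_at": scraped_at or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Trimming `postal_code` to five digits matches the sample record in case the registry returns a ZIP+4 value; keep the full code if you need that precision.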

Schema choices worth making early:

  • Treat npi as the primary key everywhere. It’s the stable national identifier and the canonical join key for CRM enrichment and dedup.
  • Keep taxonomy_code alongside the description. Codes are exact and machine-filterable; descriptions vary in wording. For analytics you want the code; for humans, the description.
  • Store last_updated from the registry, not just scraped_at. The registry’s own last-updated date tells you how stale a provider’s record is, independent of when you scraped it — important for verification use cases.
  • Don’t drop deactivated records silently. For compliance/verification you specifically want to see status: deactivated; for a sales list you’d filter to active.
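Because `npi` is the join key everywhere, it pays to reject malformed values at ingest. Per the CMS check-digit rule for NPIs, the tenth digit is a Luhn check digit computed as if the number carried the prefix 80840; that prefix contributes a constant 24 to the sum. The CMS documentation's worked example is 1234567893. A sketch; `is_valid_npi` is a hypothetical helper.

```python
def is_valid_npi(npi: str) -> bool:
    """Check an NPI's format and its Luhn check digit (80840 prefix folded in as +24)."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    total = 24  # contribution of the implicit "80840" prefix
    # Walk the first nine digits right to left, doubling every other digit
    # starting with the rightmost one, as in the standard Luhn algorithm.
    for i, ch in enumerate(reversed(npi[:9])):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10 == int(npi[9])
```

A failed check usually means a typo or a truncated value in a CRM export, which is worth catching before you try to join on it.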

Typical use cases

What teams do with NPI data:

  • Medical & pharma sales — build targeted prospect lists by specialty, state and city. “Every cardiologist in the Southeast” becomes a one-run export.
  • Healthcare marketing — reach dentists, physicians, clinics and pharmacies with verified, government-sourced contact data.
  • Recruitment & staffing — source healthcare professionals nationwide by specialty and location for locum, permanent, or travel roles.
  • Research & analytics — map provider density, specialty distribution, and licensing patterns across the US for health-policy or market-sizing work.
  • CRM enrichment — match and enrich existing provider records against the authoritative registry; backfill missing NPIs, specialties, or addresses.
  • Compliance & verification — confirm a provider’s NPI, license number, and active status against the official source.
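For the enrichment and verification cases, the core step is comparing a CRM row with the registry row for the same NPI. A sketch, assuming both sides use the flat field names from the schema in this guide; the checked fields and the `verify_against_registry` name are illustrative choices, not a fixed rule.

```python
from typing import Optional

# Illustrative choice of fields to verify; adjust to your CRM.
CHECKED_FIELDS = ("license_number", "license_state", "taxonomy_code", "status")


def verify_against_registry(crm: dict, registry: Optional[dict]) -> list[str]:
    """List discrepancies between a CRM row and the registry row for the same NPI."""
    if registry is None:
        return [f"NPI {crm.get('npi')} not found in registry"]
    issues = []
    for field in CHECKED_FIELDS:
        ours, theirs = crm.get(field), registry.get(field)
        if ours is None:
            # Nothing to contradict: this is a backfill candidate.
            issues.append(f"{field}: missing in CRM, registry has {theirs!r}")
        elif ours != theirs:
            issues.append(f"{field}: CRM={ours!r} registry={theirs!r}")
    return issues
```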

The common thread is authoritative + targetable. This isn’t scraped-and-maybe-stale web data; it’s the government’s own provider master, sliced to exactly the segment you need.

Cost math and freshness

Pricing is per dataset item. Because there’s no browser and no proxy bandwidth — just API calls to a public government endpoint — the cost per record is low, and a single run can return tens of thousands of providers.

The smart move is scheduling. The registry updates as providers enroll, move, change specialty, or deactivate. Running your targeted query on a weekly or monthly schedule keeps your list fresh against last_updated, so you’re not selling to a clinic that closed or a doctor who relocated. You build an always-current provider database instead of a decaying one-time dump.
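Between scheduled runs, a diff keyed on NPI shows exactly what changed. A sketch, assuming each snapshot is a list of flat rows carrying `npi` and the registry's `last_updated`; `diff_snapshots` is a hypothetical helper. An NPI that drops out of a targeted query may have moved outside your filter (a new state or specialty) rather than closed, so "removed" is best treated as "needs review".

```python
def diff_snapshots(old_rows: list[dict], new_rows: list[dict]) -> dict[str, list[str]]:
    """Classify NPIs as added, removed, or updated between two runs."""
    old = {r["npi"]: r for r in old_rows}
    new = {r["npi"]: r for r in new_rows}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Same NPI, but the registry's own last_updated date has moved.
        "updated": sorted(npi for npi in new.keys() & old.keys()
                          if new[npi].get("last_updated") != old[npi].get("last_updated")),
    }
```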

Self-hosting the equivalent isn’t hard on the access side, but you’d still own: the pagination/rate-limit handling for high-volume queries, the record flattening and taxonomy mapping, the dedup-by-NPI logic, and the scheduling harness. The managed actor packages all of that.

Common pitfalls

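  • The raw API is nested and verbose. A single provider record contains arrays of taxonomies, addresses, and identifiers. Each taxonomy entry carries a primary flag and each address an address_purpose (LOCATION for the practice, MAILING for correspondence); use those to pick the primary specialty and the practice location instead of blindly taking the first element.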
  • One provider can hold multiple taxonomies. A doctor may list two specialties. If you filter by specialty, make sure you’re matching against all their taxonomies, not just the primary, or you’ll miss people.
  • Individuals vs. organizations are structurally different. NPI-2 (organization) records have an authorized official and no personal name fields; NPI-1 (individual) records are the reverse. Branch your downstream logic on provider_type.
  • Phone isn’t a personal cell. These are practice/business contact numbers from a public registry — appropriate for B2B outreach, not personal contact. Respect that distinction and applicable regulations (TCPA, etc.).
  • “Active” status can lag reality. Deactivation isn’t always instant. Cross-check against last_updated and don’t treat the registry as real-time.
  • Re-scrape for freshness. Providers move and deactivate constantly; a six-month-old export has meaningful decay. Schedule it.
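To avoid missing providers whose specialty is listed only as a secondary taxonomy, test every entry in a raw record's taxonomies array, not just the primary one. A sketch, assuming each entry carries `code` and `desc` fields; exact matching on codes and case-insensitive substring matching on the description keyword are illustrative choices, and `matches_specialty` is a hypothetical helper.

```python
from typing import Iterable


def matches_specialty(raw: dict, codes: Iterable[str] = (), keyword: str = "") -> bool:
    """True if ANY of the provider's taxonomies matches, not just the primary one."""
    wanted = set(codes)
    kw = keyword.lower()
    for tax in raw.get("taxonomies", []):
        if tax.get("code") in wanted:
            return True  # exact, machine-filterable match on the taxonomy code
        if kw and kw in (tax.get("desc") or "").lower():
            return True  # looser match on the human-readable description
    return False
```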

Wrapping up

The NPI Registry is the rare high-value dataset that’s both authoritative and genuinely open — official government API, no key, no login, no bot fight. The work isn’t getting the data; it’s querying 8M+ records at volume, shaping the nested output into clean rows, and keeping it fresh. For a one-off pull you could hit the API yourself. For a targeted, de-duplicated, scheduled provider database aimed at exactly your segment, run it as a managed actor and let the volume handling and normalization be done for you.

Open the NPI Registry Scraper on Apify — filter by state, city, specialty and provider type; tens of thousands of providers per run from the official NPPES source. Schedule it for an always-fresh database.
