How to Scrape the NPI Registry of US Healthcare Providers in 2026
Pull US healthcare provider data from the official NPPES NPI Registry — name, specialty, license, address and phone for 8M+ doctors, dentists, pharmacies and clinics, with no API key or login.
The NPPES NPI Registry is the closest thing the US has to a master directory of every healthcare provider — over 8 million physicians, dentists, pharmacies, clinics, hospitals and other entities, each with a National Provider Identifier (NPI), specialty, license, and practice contact details. The best part for anyone building provider datasets: it’s government data with an official, open API. No login, no API key, no anti-bot fight. This guide covers what the registry exposes and how to turn it into a clean, scheduled, refreshable provider database.
Why this is the easy mode of scraping
Most guides on this site are partly about defeating bot detection. This one isn’t, and that’s the headline. The data comes from the official CMS/NPPES government API:
- No API key. The endpoint is public.
- No login, no cookies, no session. Nothing to authenticate or rotate.
- No blocking to fight. It’s a government data service meant to be queried; there’s no Akamai, no reCAPTCHA, no fingerprinting.
That means the engineering challenge isn’t access; it’s volume and shaping. The registry is enormous, the API paginates and rate-limits, and raw records are nested and need normalizing. The Apify actor’s job is to query at scale, de-duplicate, flatten records into clean rows, and return tens of thousands per run.
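To make "the API paginates" concrete, here is a minimal pagination sketch. It assumes the public v2.1 JSON endpoint at https://npiregistry.cms.hhs.gov/api/ and the limit/skip parameters described in the NPPES API documentation. The documented maximums (200 results per page, a skip of at most 1000) are treated here as assumptions to re-check. NPPES_URL, PAGE_SIZE, MAX_SKIP, fetch_json and iter_results are hypothetical names for illustration; this is not the actor's code. The fetch function is injectable so the paging logic can be exercised without the network.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed endpoint and limits; verify against the current NPPES API docs.
NPPES_URL = "https://npiregistry.cms.hhs.gov/api/"
PAGE_SIZE = 200   # assumed per-request maximum for `limit`
MAX_SKIP = 1000   # assumed maximum accepted value for `skip`


def fetch_json(params: dict) -> dict:
    """Perform one GET against the registry and decode the JSON body."""
    query = urllib.parse.urlencode({"version": "2.1", **params})
    with urllib.request.urlopen(f"{NPPES_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def iter_results(params: dict,
                 fetch: Callable[[dict], dict] = fetch_json) -> Iterator[dict]:
    """Yield raw provider records page by page until a short page or the skip cap."""
    skip = 0
    while skip <= MAX_SKIP:
        page = fetch({**params, "limit": PAGE_SIZE, "skip": skip})
        results = page.get("results", [])
        yield from results
        if len(results) < PAGE_SIZE:
            break  # a short (or empty) page means the result set is exhausted
        skip += PAGE_SIZE
```

Under those assumed limits, one query tops out at 1,200 records (skip values 0 through 1000). A larger segment would have to be split into narrower queries, for example per city or postal code, and then merged. That split-and-merge step is one reason de-duplication by NPI matters.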
What’s worth extracting
Per provider, the registry yields a complete identity-and-contact record:
- Identity — provider name and the NPI (the stable national identifier).
- Type — individual (NPI-1) vs. organization (NPI-2).
- Specialty — taxonomy description and taxonomy codes.
- Credentials — license number(s).
- Contact — practice address, phone and fax.
- Authorized official — for organizations, the contact name and title.
- Status — record status (active/deactivated).
- Dates — enumeration/issuance date and last-updated date.
- Provenance — scrape timestamp.
Records are de-duplicated by NPI, so re-running never double-counts a provider.
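One simple way to de-duplicate is to key rows by npi. The sketch below assumes rows are already flattened to the sample schema shown under "Schema design for downstream use". It also assumes that when the same NPI appears twice, the copy with the newest last_updated should win. That tie-break is our assumption, not necessarily what the actor does, and dedupe_by_npi is a hypothetical helper name.

```python
from typing import Iterable


def dedupe_by_npi(rows: Iterable[dict]) -> list[dict]:
    """Keep one row per NPI, preferring the most recently updated copy."""
    best: dict[str, dict] = {}
    for row in rows:
        npi = row["npi"]
        current = best.get(npi)
        # ISO-8601 dates (YYYY-MM-DD) sort correctly as plain strings.
        newer = (row.get("last_updated") or "") > (
            (current or {}).get("last_updated") or "")
        if current is None or newer:
            best[npi] = row
    return list(best.values())
```

Because the dict is keyed by NPI, re-running over overlapping query results can only replace a row, never add a second one for the same provider.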
Filtering: the whole game is the query
Because access is trivial, the value comes from targeting. The actor filters on the dimensions that matter for building a useful list:
- State — scope to one or more US states.
- City — narrow to a metro.
- Specialty / taxonomy — the most important filter for sales and recruiting: “all cardiologists,” “all retail pharmacies,” “all pediatric dentists.”
- Provider type — individuals vs. organizations.
Combine these and you go from “8 million records” to “every endocrinologist in Texas” in one run — which is exactly the shape a downstream team can act on.
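As a rough illustration of how those four filters could map onto a registry query, the sketch below assumes the v2.1 API's state, city, taxonomy_description and enumeration_type (NPI-1 / NPI-2) query parameters. Check the parameter names against the NPPES API documentation. build_query is a hypothetical helper, not the actor's input schema.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed mapping from the guide's provider types to NPPES enumeration types.
TYPE_MAP = {"individual": "NPI-1", "organization": "NPI-2"}


def build_query(state: Optional[str] = None, city: Optional[str] = None,
                taxonomy: Optional[str] = None,
                provider_type: Optional[str] = None) -> str:
    """Translate state/city/specialty/type filters into a query string."""
    params = {"version": "2.1"}
    if state:
        params["state"] = state.upper()
    if city:
        params["city"] = city
    if taxonomy:
        params["taxonomy_description"] = taxonomy
    if provider_type:
        params["enumeration_type"] = TYPE_MAP[provider_type]  # KeyError on unknown types
    return urlencode(params)
```

A single query takes one state, so "one or more states" becomes one query per state. For example, [build_query(state=s, taxonomy="Endocrinology", provider_type="individual") for s in ("TX", "OK")] produces two queries whose results are merged afterwards.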
▶ Run the NPI Registry Scraper — name, specialty, license, address and phone for US providers from the official NPPES registry. No API key, no login, tens of thousands per run.
Schema design for downstream use
A flat, CRM-ready per-provider record:
{
  "npi": "1497896453",
  "provider_type": "individual",
  "first_name": "Maria",
  "last_name": "Velez",
  "credential": "MD",
  "taxonomy_description": "Endocrinology, Diabetes & Metabolism",
  "taxonomy_code": "207RE0101X",
  "license_number": "Q9182",
  "license_state": "TX",
  "address_line": "1200 Medical Center Dr, Suite 410",
  "city": "Houston",
  "state": "TX",
  "postal_code": "77030",
  "phone": "713-555-0142",
  "fax": "713-555-0143",
  "authorized_official": null,
  "status": "active",
  "enumeration_date": "2008-06-12",
  "last_updated": "2025-11-03",
  "scraped_at": "2026-05-29T09:00:00Z"
}
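A minimal flattening sketch from a raw registry record into the flat shape above follows. The raw field names are assumptions based on the v2.1 JSON as we understand it; verify them against a live response before relying on them. The assumed fields are:

- number and enumeration_type at the top level.
- A basic object containing names, credential, status, dates and authorized_official_* fields.
- An addresses array whose entries carry an address_purpose of LOCATION or MAILING.
- A taxonomies array whose entries carry code, desc, primary, state and license.

Several other choices in the sketch are also our assumptions: mapping status "A" to active, truncating ZIP+4 to five digits, and adding an organization_name field so NPI-2 rows keep a name.

```python
from datetime import datetime, timezone
from typing import Optional


def pick(items: list, key: str, value) -> Optional[dict]:
    """First dict whose `key` equals `value`, else the first dict, else None."""
    for item in items:
        if item.get(key) == value:
            return item
    return items[0] if items else None


def flatten(raw: dict) -> dict:
    """Flatten one assumed-v2.1 registry record into a CRM-ready row."""
    basic = raw.get("basic", {})
    is_org = raw.get("enumeration_type") == "NPI-2"
    # Prefer the taxonomy flagged primary and the practice (LOCATION) address.
    tax = pick(raw.get("taxonomies", []), "primary", True) or {}
    addr = pick(raw.get("addresses", []), "address_purpose", "LOCATION") or {}
    official = None
    if is_org:
        name_parts = [basic.get("authorized_official_first_name"),
                      basic.get("authorized_official_last_name")]
        official = {"name": " ".join(p for p in name_parts if p),
                    "title": basic.get("authorized_official_title_or_position")}
    street = ", ".join(p for p in (addr.get("address_1"), addr.get("address_2")) if p)
    status = basic.get("status")
    return {
        "npi": raw.get("number"),
        "provider_type": "organization" if is_org else "individual",
        "first_name": basic.get("first_name"),
        "last_name": basic.get("last_name"),
        # Not in the sample schema; added so NPI-2 rows keep a name.
        "organization_name": basic.get("organization_name"),
        "credential": basic.get("credential"),
        "taxonomy_description": tax.get("desc"),
        "taxonomy_code": tax.get("code"),
        "license_number": tax.get("license"),
        "license_state": tax.get("state"),
        "address_line": street or None,
        "city": addr.get("city"),
        "state": addr.get("state"),
        "postal_code": (addr.get("postal_code") or "")[:5] or None,  # ZIP+4 -> ZIP
        "phone": addr.get("telephone_number"),
        "fax": addr.get("fax_number"),
        "authorized_official": official,
        "status": {"A": "active", "D": "deactivated"}.get(status, status),
        "enumeration_date": basic.get("enumeration_date"),
        "last_updated": basic.get("last_updated"),
        "scraped_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```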
Schema choices worth making early:
- Treat npi as the primary key everywhere. It’s the stable national identifier and the canonical join key for CRM enrichment and dedup.
- Keep taxonomy_code alongside the description. Codes are exact and machine-filterable; descriptions vary in wording. For analytics you want the code; for humans, the description.
- Store last_updated from the registry, not just scraped_at. The registry’s own last-updated date tells you how stale a provider’s record is, independent of when you scraped it, which matters for verification use cases.
- Don’t drop deactivated records silently. For compliance/verification you specifically want to see status: deactivated; for a sales list you’d filter to active.
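Since npi is the primary key, it is cheap to reject malformed values before they become join keys. Per the CMS check-digit scheme, an NPI's tenth digit is a Luhn check digit computed over the first nine digits prefixed with 80840. That is equivalent to running plain Luhn over the nine digits and adding a constant 24 to the sum. npi_is_valid is a hypothetical helper name; a passing check only means the number is well-formed, not that it is assigned or active.

```python
def npi_is_valid(npi: str) -> bool:
    """Check the 10th-digit Luhn check digit of an NPI (CMS scheme, prefix 80840)."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    # Starting at 24 equals running standard Luhn over "80840" + the first 9 digits.
    total = 24
    for i, ch in enumerate(npi[:9]):
        digit = int(ch)
        if i % 2 == 0:  # 1st, 3rd, 5th, 7th and 9th digits are doubled
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return (10 - total % 10) % 10 == int(npi[9])
```

For example, npi_is_valid("1234567893") is True and npi_is_valid("1234567890") is False.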
Typical use cases
What teams do with NPI data:
- Medical & pharma sales — build targeted prospect lists by specialty, state and city. “Every cardiologist in the Southeast” becomes a one-run export.
- Healthcare marketing — reach dentists, physicians, clinics and pharmacies with verified, government-sourced contact data.
- Recruitment & staffing — source healthcare professionals nationwide by specialty and location for locum, permanent, or travel roles.
- Research & analytics — map provider density, specialty distribution, and licensing patterns across the US for health-policy or market-sizing work.
- CRM enrichment — match and enrich existing provider records against the authoritative registry; backfill missing NPIs, specialties, or addresses.
- Compliance & verification — confirm a provider’s NPI, license number, and active status against the official source.
The common thread is authoritative + targetable. This isn’t scraped-and-maybe-stale web data; it’s the government’s own provider master, sliced to exactly the segment you need.
Cost math and freshness
Pricing is per dataset item. Because there’s no browser and no proxy bandwidth — just API calls to a public government endpoint — the cost per record is low, and a single run can return tens of thousands of providers.
The smart move is scheduling. The registry updates as providers enroll, move, change specialty, or deactivate. Running your targeted query on a weekly or monthly schedule keeps your list fresh against last_updated, so you’re not selling to a clinic that closed or a doctor who relocated. You build an always-current provider database instead of a decaying one-time dump.
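One way to act on a scheduled re-run is to diff it against the previous export, keyed by NPI. The sketch below assumes both runs are flattened rows with npi, last_updated and status fields, as in the sample schema. It treats a changed last_updated or status as "changed"; that criterion is our assumption. diff_snapshots is a hypothetical helper name.

```python
def diff_snapshots(old: list[dict], new: list[dict]) -> dict[str, list[str]]:
    """Compare two flattened exports keyed by NPI."""
    old_by_npi = {r["npi"]: r for r in old}
    new_by_npi = {r["npi"]: r for r in new}
    added = sorted(new_by_npi.keys() - old_by_npi.keys())
    dropped = sorted(old_by_npi.keys() - new_by_npi.keys())
    changed = sorted(
        npi for npi in new_by_npi.keys() & old_by_npi.keys()
        # A newer registry date or a status flip means the record needs review.
        if new_by_npi[npi].get("last_updated") != old_by_npi[npi].get("last_updated")
        or new_by_npi[npi].get("status") != old_by_npi[npi].get("status")
    )
    return {"added": added, "changed": changed, "dropped": dropped}
```

In a targeted query, "dropped" is ambiguous. The provider may have been deactivated, or may simply no longer match the filter, for example after moving to another state. It is worth a lookup by NPI before deleting anything.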
Self-hosting the equivalent isn’t hard on the access side, but you’d still own: the pagination/rate-limit handling for high-volume queries, the record flattening and taxonomy mapping, the dedup-by-NPI logic, and the scheduling harness. The managed actor packages all of that.
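For the rate-limit piece, a common pattern is retrying transient failures with exponential backoff. This sketch assumes throttling and outages surface as HTTP 429 or 5xx responses, or as connection errors, from urllib. That is an assumption, not documented registry behavior. with_backoff and RETRYABLE are hypothetical names, and sleep is injectable so the retry schedule can be tested without waiting.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE = {429, 500, 502, 503, 504}  # assumed transient status codes


def with_backoff(call: Callable[[], T], attempts: int = 5, base: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient failures with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:  # must precede URLError (its parent)
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise
        except urllib.error.URLError:
            if attempt == attempts - 1:
                raise
        # Waits of base*1, base*2, base*4, ... plus up to 1s of random jitter.
        sleep(base * 2 ** attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

Usage would look like with_backoff(lambda: fetch_json(params)), where fetch_json stands for whatever function performs one request. Non-retryable errors such as 404 are raised immediately rather than retried.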
Common pitfalls
- The raw API is nested and verbose. A single provider record contains arrays of taxonomies, addresses, and identifiers. Decide which taxonomy is “primary” and which address is the practice location — don’t just take the first element blindly.
- One provider can hold multiple taxonomies. A doctor may list two specialties. If you filter by specialty, make sure you’re matching against all their taxonomies, not just the primary, or you’ll miss people.
- Individuals vs. organizations are structurally different. NPI-2 (organization) records have an authorized official and no personal name fields; NPI-1 (individual) records are the reverse. Branch your downstream logic on provider_type.
- Phone isn’t a personal cell. These are practice/business contact numbers from a public registry, appropriate for B2B outreach, not personal contact. Respect that distinction and applicable regulations (TCPA, etc.).
- “Active” status can lag reality. Deactivation isn’t always instant. Cross-check against last_updated and don’t treat the registry as real-time.
- Re-scrape for freshness. Providers move and deactivate constantly; a six-month-old export has meaningful decay. Schedule it.
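To avoid the multi-taxonomy pitfall, a specialty filter can check every taxonomy on a record rather than only the primary one. This sketch assumes raw records carry a taxonomies array whose entries have code and desc fields, as we understand the v2.1 JSON. It matches either exact taxonomy codes or a case-insensitive substring of the description. matches_specialty is a hypothetical helper name.

```python
from typing import Iterable


def matches_specialty(raw: dict, codes: Iterable[str] = (), text: str = "") -> bool:
    """True if ANY listed taxonomy (not only the primary) matches a code or text."""
    wanted = {c.upper() for c in codes}
    needle = text.casefold()
    for tax in raw.get("taxonomies", []):
        if (tax.get("code") or "").upper() in wanted:
            return True  # exact code match is the most reliable filter
        if needle and needle in (tax.get("desc") or "").casefold():
            return True  # fallback: wording in the description
    return False
```

Matching on codes is preferable when you know them, since descriptions vary in wording. The text match is a convenience for exploratory queries.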
Wrapping up
The NPI Registry is the rare high-value dataset that’s both authoritative and genuinely open — official government API, no key, no login, no bot fight. The work isn’t getting the data; it’s querying 8M+ records at volume, shaping the nested output into clean rows, and keeping it fresh. For a one-off pull you could hit the API yourself. For a targeted, de-duplicated, scheduled provider database aimed at exactly your segment, run it as a managed actor and let the volume handling and normalization be done for you.
▶ Open the NPI Registry Scraper on Apify — filter by state, city, specialty and provider type; tens of thousands of providers per run from the official NPPES source. Schedule it for an always-fresh database.