How to Scrape GLEIF LEI Entity Data in 2026
Extract 3.3M+ legal entities from the official GLEIF LEI database — legal name, address, jurisdiction, legal form, status and registration data. No key, filter by country.
The Legal Entity Identifier (LEI) is the global standard for identifying parties to financial transactions — a 20-character code tied to authoritative registration data for over 3.3 million legal entities worldwide. The Global Legal Entity Identifier Foundation (GLEIF) publishes all of it through an official, open API: no login, no key, no blocking. For KYC, AML, due diligence, and B2B enrichment, this is one of the cleanest authoritative entity datasets on the planet. The challenge isn’t access — it’s volume, pagination, and modeling the records. This guide covers what GLEIF exposes, how to extract it at scale, and how to use it for verification and enrichment.
What’s worth extracting
Each LEI record is an authoritative entity profile. The GLEIF LEI Scraper actor returns, per entity:
- Identifier — the 20-character LEI code itself.
- Names — official legal name plus any alternative/transliterated names.
- Legal form and category — the entity’s legal form code and entity category (general, fund, branch, etc.).
- Status — operational status (active, inactive) and the LEI registration status (issued, lapsed, retired).
- Jurisdiction — the legal jurisdiction code.
- Addresses — full legal address and headquarters address (often distinct), with country, region, city, postal code.
- Registry data — the originating business-registry identifier and the managing Local Operating Unit (LOU).
- Dates — initial registration, last update, and next renewal date.
- Provenance — a scrape timestamp.
That’s a complete, verifiable entity record — ready for KYC datasets, enrichment joins, and jurisdictional analysis without any cleanup layer.
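Because every other field hangs off the identifier, a cheap structural check on the LEI before loading is worth having. The sketch below assumes the ISO 17442 check-digit scheme (ISO 7064 MOD 97-10: letters map to 10..35, and the resulting number must leave remainder 1 when divided by 97). The function name `is_valid_lei` is illustrative and not part of any GLEIF or actor tooling.

```python
import re

# 18 alphanumeric characters followed by 2 numeric check digits.
_LEI_SHAPE = re.compile(r"[0-9A-Z]{18}[0-9]{2}")


def is_valid_lei(lei: str) -> bool:
    """Return True if `lei` has the LEI shape and passes the MOD 97-10 check."""
    if not _LEI_SHAPE.fullmatch(lei):
        return False
    # Map each character to its base-36 value (0-9 unchanged, A=10 ... Z=35)
    # and concatenate the decimal values into one large integer.
    as_digits = "".join(str(int(ch, 36)) for ch in lei)
    return int(as_digits) % 97 == 1


# The sample LEI used in the schema example of this guide.
print(is_valid_lei("529900T8BM49AURSDO55"))
```

A check like this only catches typos and truncated codes; it says nothing about whether the LEI is issued or lapsed, which still has to come from the record itself.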
The extraction reality: an open API, volume over access
GLEIF is built to be queried. There’s no anti-bot wall and no API key requirement. What you manage is scale and pagination:
- Official API, no auth. You query directly; no login, no key, no proxy.
- Pagination through millions of records. A full global pull is 3.3M+ entities. The API pages results, and a complete extraction means following pagination reliably across hundreds of thousands of pages — the place naive scripts stall, drop pages, or duplicate.
- Server-side filtering. Filter by country, entity status, and entity category, so a targeted “active companies in Germany” pull returns only matching records rather than the whole world.
- No HTML parsing. Output comes straight from the API as structured records, exported to JSON, CSV, or Excel.
- Scales both ways. From a focused country-level pull (a few thousand entities) to the full global registry.
This is an authoritative-data problem, not a defended-site problem. The actor’s value is robust pagination at million-record scale plus the filter surface, so you get complete, deduplicated country or global pulls without managing the paging yourself.
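For readers who want to see what "following pagination reliably" involves, here is a minimal sketch of a page loop with deduplication on the LEI. It is not the actor's implementation. The endpoint URL, the `page[...]` and `filter[...]` parameter names, the page size, and the assumption that each record carries its LEI under `id` should all be checked against the current GLEIF API documentation; `build_url`, `http_fetcher`, and `iter_unique_records` are hypothetical helper names.

```python
import json
from typing import Callable, Iterator, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm against the GLEIF API documentation.
BASE_URL = "https://api.gleif.org/api/v1/lei-records"


def build_url(page: int, size: int, country: Optional[str] = None,
              status: Optional[str] = None) -> str:
    """Build one page URL with optional server-side filters (assumed names)."""
    params: dict = {"page[number]": page, "page[size]": size}
    if country:
        params["filter[entity.legalAddress.country]"] = country
    if status:
        params["filter[entity.status]"] = status
    return f"{BASE_URL}?{urlencode(params)}"


def http_fetcher(size: int = 100, **filters) -> Callable[[int], List[dict]]:
    """Return a fetch_page(page) function backed by real HTTP requests."""
    def fetch_page(page: int) -> List[dict]:
        with urlopen(build_url(page, size, **filters), timeout=30) as resp:
            # Assumes a JSON body with the records under a top-level "data" key.
            return json.load(resp).get("data", [])
    return fetch_page


def iter_unique_records(fetch_page: Callable[[int], List[dict]],
                        max_pages: int = 100_000) -> Iterator[dict]:
    """Walk pages from 1 until an empty page, yielding each LEI only once."""
    seen = set()
    for page in range(1, max_pages + 1):
        records = fetch_page(page)
        if not records:
            return
        for record in records:
            lei = record["id"]  # assumed: the record id is the LEI
            if lei not in seen:
                seen.add(lei)
                yield record
```

A targeted pull would then look like `iter_unique_records(http_fetcher(country="DE", status="ACTIVE"))`. Passing the fetch function in keeps the loop testable offline and gives one place to add retries and backoff, which is where long pulls usually need the most care.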
▶ Run the GLEIF LEI Scraper — 3.3M+ legal entities with legal name, address, jurisdiction, legal form, status and registration data. No API key, filter by country and status, tens of thousands per run.
How querying works
The inputs map onto GLEIF’s filter model:
country:  ISO country code(s), e.g. DE, GB, US
status:   entity status (active / inactive)
          + LEI registration status (issued / lapsed)
category: entity category (general / fund / branch / ...)
limit:    target record count (paged under the hood)
A practical pattern for enrichment: scope to your target country and active status, pull the set, and join it to your CRM on legal name and address (or, better, on any LEI you already hold). For compliance refresh, schedule a periodic country pull and diff to catch status changes — an entity flipping from active to lapsed is a meaningful KYC signal.
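The refresh-and-diff pattern can be sketched in a few lines. This assumes two pulls of the same country stored as lists of rows using the field names from this guide's schema (`lei`, `entity_status`, `registration_status`); the function name `status_changes` is illustrative.

```python
from typing import Dict, Iterable, List

# The two status fields are tracked independently.
STATUS_FIELDS = ("entity_status", "registration_status")


def status_changes(previous: Iterable[dict], current: Iterable[dict]) -> List[dict]:
    """List every status field that changed between two snapshots, keyed by LEI."""
    old: Dict[str, dict] = {row["lei"]: row for row in previous}
    changes = []
    for row in current:
        before = old.get(row["lei"])
        if before is None:
            continue  # newly seen entity: nothing to compare against
        for field in STATUS_FIELDS:
            if before.get(field) != row.get(field):
                changes.append({
                    "lei": row["lei"],
                    "field": field,
                    "old": before.get(field),
                    "new": row.get(field),
                })
    return changes


last_month = [{"lei": "529900T8BM49AURSDO55", "entity_status": "ACTIVE",
               "registration_status": "ISSUED"}]
this_month = [{"lei": "529900T8BM49AURSDO55", "entity_status": "ACTIVE",
               "registration_status": "LAPSED"}]
for change in status_changes(last_month, this_month):
    print(change)
```

Entities that disappear from the newer snapshot are not reported here; depending on the workflow, those may deserve their own check.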
Schema design for downstream use
A flat, join-friendly entity row:
{
  "lei": "529900T8BM49AURSDO55",
  "legal_name": "Allianz SE",
  "other_names": ["Allianz Societas Europaea"],
  "legal_form": "AG (Aktiengesellschaft)",
  "entity_category": "GENERAL",
  "entity_status": "ACTIVE",
  "registration_status": "ISSUED",
  "jurisdiction": "DE",
  "legal_address": {
    "line1": "Koeniginstrasse 28",
    "city": "Muenchen",
    "region": "DE-BY",
    "postal_code": "80802",
    "country": "DE"
  },
  "hq_address": {
    "line1": "Koeniginstrasse 28",
    "city": "Muenchen",
    "country": "DE"
  },
  "registry_id": "HRB 164232",
  "managing_lou": "EVK05KS7XY1DEII3R011",
  "initial_registration_date": "2012-11-29",
  "last_update_date": "2026-04-10",
  "next_renewal_date": "2027-04-10",
  "scraped_at": "2026-06-01T09:00:00Z"
}
Schema choices that matter:
- The lei is your golden key. It’s globally unique and stable, which is the entire point of the standard. Make it your primary key and your join key for enrichment.
- Keep legal and HQ addresses separate. They’re often different (registered office vs. operating HQ), and which one you screen on depends on the workflow.
- Store both entity_status and registration_status. An entity can be operationally active while its LEI has lapsed (renewal not paid). For compliance, both matter: a lapsed LEI is a data-quality flag.
- Persist next_renewal_date. It tells you how fresh and maintained the record is; stale-past-renewal records deserve lower trust.
- Stamp scraped_at. Status and addresses change; time-stamping lets you track corporate events over refreshes.
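To make the "join on LEI first" advice concrete, here is a small sketch of an enrichment join. It matches on `lei` when the CRM row already has one and only falls back to an exact, normalized name match when that match is unambiguous. The CRM field names (`company_name`, `lei`) and the output fields (`matched_lei`, `match_method`) are hypothetical, and the name normalization is deliberately naive.

```python
from typing import Dict, Iterable, List, Set


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace; deliberately simple."""
    return " ".join(name.casefold().split())


def enrich(crm_rows: Iterable[dict], lei_rows: Iterable[dict]) -> List[dict]:
    """Attach a matched LEI to each CRM row, preferring an exact LEI join."""
    lei_rows = list(lei_rows)
    by_lei = {row["lei"]: row for row in lei_rows}
    by_name: Dict[str, Set[str]] = {}
    for row in lei_rows:
        # Index the legal name and every alternative name.
        for name in [row["legal_name"], *row.get("other_names", [])]:
            by_name.setdefault(normalize(name), set()).add(row["lei"])

    enriched = []
    for row in crm_rows:
        lei, method = None, None
        if row.get("lei") in by_lei:
            lei, method = row["lei"], "lei"
        else:
            candidates = by_name.get(normalize(row.get("company_name") or ""), set())
            if len(candidates) == 1:  # skip ambiguous name matches
                lei, method = next(iter(candidates)), "name"
        enriched.append({**row, "matched_lei": lei, "match_method": method})
    return enriched
```

Recording `match_method` keeps name-based matches reviewable, since they carry less certainty than a join on the identifier.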
Typical use cases
- KYC / AML compliance — build and refresh entity-verification datasets for onboarding and screening.
- B2B data enrichment — match and enrich company records with authoritative LEI codes and verified legal addresses.
- Risk and due diligence — screen counterparties and map corporate structures by jurisdiction and category.
- Financial research and analytics — analyze the global registered-entity landscape by country, legal form, and status.
- RegTech and fintech products — power entity-lookup and verification features with bulk LEI data.
- Sales and lead generation — source registered companies by country and category with verified legal addresses.
The LEI’s strength is authority: this isn’t scraped-and-guessed company data, it’s the registry standard regulators built. That makes it ideal as the spine of an enrichment pipeline that other, fuzzier sources hang off.
Cost math
The actor is pay-per-event with a tiny per-run start fee and no per-result charge, so cost is essentially Apify compute. With no browser and no proxy, runs are cheap and can return tens of thousands of records each. A country-level pull (a few thousand to tens of thousands of entities) typically fits inside Apify’s free monthly credit; a full 3.3M global extraction is a larger compute job but still inexpensive relative to commercial LEI data licenses.
Building it yourself is feasible — the GLEIF API is documented — but the million-record pagination is where private scripts fail: dropped pages, duplicates, and runs that die halfway through a global pull. Robust paging and dedup at that scale is the recurring engineering the actor handles.
Common pitfalls
- Pagination drift on large pulls. At million-record scale, naive offset paging loses or duplicates records. Always verify counts; the actor’s paging is built for this.
- Conflating the two status fields. “Active entity, lapsed LEI” is a real and important state. Treat entity_status and registration_status as independent.
- Address ambiguity. Don’t assume legal address equals HQ. Pick the right one for your screening logic.
- Transliteration and locale. Names and addresses include non-ASCII characters and transliterated variants. Store them faithfully (UTF-8) and keep other_names for matching.
- Renewal staleness. A record past its next_renewal_date may not reflect current reality. Use the renewal date as a freshness/trust signal in compliance work.
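Several of these pitfalls can be turned into explicit per-row flags. The sketch below assumes rows shaped like the schema example in this guide, with ISO dates and upper-case status values; the function name `quality_flags` and the flag names are illustrative, and what counts as a blocking flag is a policy decision for your own workflow.

```python
from datetime import date
from typing import List


def quality_flags(row: dict, today: date) -> List[str]:
    """Return data-quality flags for one entity row."""
    flags = []
    # The two status fields are independent: check the combination explicitly.
    if row.get("entity_status") == "ACTIVE" and row.get("registration_status") == "LAPSED":
        flags.append("active_entity_lapsed_lei")
    # A record past its renewal date deserves lower trust.
    renewal = row.get("next_renewal_date")
    if renewal and date.fromisoformat(renewal) < today:
        flags.append("past_renewal")
    # Legal and HQ addresses are often different; surface it rather than assume.
    legal = row.get("legal_address") or {}
    hq = row.get("hq_address") or {}
    if (legal.get("city"), legal.get("country")) != (hq.get("city"), hq.get("country")):
        flags.append("hq_differs_from_legal")
    return flags


row = {
    "entity_status": "ACTIVE",
    "registration_status": "LAPSED",
    "next_renewal_date": "2027-04-10",
    "legal_address": {"city": "Muenchen", "country": "DE"},
    "hq_address": {"city": "Muenchen", "country": "DE"},
}
print(quality_flags(row, today=date(2027, 5, 1)))
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps refresh runs reproducible when they are replayed against an older snapshot.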
Wrapping up
GLEIF LEI data is authoritative, open, keyless, and structured — the cleanest spine you can give an entity-resolution or KYC pipeline. The work is volume and pagination at million-record scale, not breaking in. For a single-country sample, the API is approachable. For a maintained, refreshed global or country dataset feeding compliance, enrichment, or risk workflows, a managed actor that already handles robust pagination, dedup, and the country/status/category filters gets you to complete, verified records fast.
▶ Open the GLEIF LEI Scraper on Apify — 3.3M+ entities, filter by country, status and category, export to JSON/CSV/Excel. Pay-per-event, free monthly credit to start.