marketing · Jun 5, 2026 · 5 min read

Substack Has No Public API: Export Post Data in 2026

Substack ships no public API, but every publication exposes /api/v1/posts and archive JSON on its own subdomain. Here's how to export newsletter and post data in 2026.

Substack is one of the largest publishing platforms on the web, and it ships no public API. There’s no developer portal, no API key, no documented endpoint, and no officially sanctioned way to programmatically read a publication’s posts. If you want to export newsletter and post data (for a competitive content audit, a research corpus, or a newsletter-tracking product), there is no official door to knock on. What makes this tractable anyway is an accident of architecture: every Substack publication runs on its own subdomain, and that subdomain exposes the same internal JSON endpoints the front-end uses to render itself. This guide covers exporting Substack post data in 2026 through those public per-publication endpoints, because “no public API” doesn’t mean the data is locked away.

The “no public API” situation, precisely

To set expectations honestly:

No official API exists. Substack has never published one. Nothing you build here is a sanctioned integration; you’re reading the same endpoints the website reads.
Per-publication architecture. A newsletter lives at https://{publication}.substack.com (or a custom domain that proxies the same app). There’s no central api.substack.com catalog — each publication is its own island.
Public vs. paid posts. Free posts expose full content; paywalled posts expose metadata and a preview but gate the body behind a paid subscription. No scraping changes that — locked content stays locked.

So the realistic goal is: full data for free posts, and rich metadata (title, subtitle, author, date, like/comment counts, paywall flag) for every post including paid ones.

The endpoints every publication exposes

Each Substack subdomain serves a small set of JSON endpoints that power its own reader UI. The two that matter most:

The posts archive feed:

https://{publication}.substack.com/api/v1/posts?limit=12&offset=0

This returns an array of post objects — title, subtitle, slug, canonical URL, publish date, author(s), cover image, type (newsletter / podcast / thread), audience (free vs. paid), like count, comment count, and a description/preview. Paginate with offset (or the before/after date-cursor variants the app uses) to walk the entire archive.
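As a minimal sketch of calling the archive feed, the Python below builds the URL shown above and decodes one page using only the standard library. The function names and the User-Agent string are illustrative assumptions, and because the endpoint is undocumented, the response shape should be checked against a real publication before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def archive_url(publication: str, limit: int = 12, offset: int = 0) -> str:
    """Build the archive URL for one page of posts."""
    query = urlencode({"limit": limit, "offset": offset})
    return f"https://{publication}.substack.com/api/v1/posts?{query}"


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode its JSON body."""
    # The User-Agent value is a made-up example; identify your own client here.
    request = Request(url, headers={"User-Agent": "archive-export-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fetch_archive_page(publication: str, limit: int = 12, offset: int = 0,
                       fetch=fetch_json) -> list:
    """Return one page of post objects, failing loudly on an unexpected shape."""
    page = fetch(archive_url(publication, limit, offset))
    if not isinstance(page, list):
        raise ValueError(f"expected a JSON array, got {type(page).__name__}")
    return page
```

Calling fetch_archive_page("some-publication") would request the first twelve posts; the fetch parameter exists so the network call can be swapped out in tests.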

The single-post endpoint:

https://{publication}.substack.com/api/v1/posts/{slug}

For an individual post this returns the full record, including the body HTML for free posts. There are sibling endpoints the reader uses for comments and for the publication’s own profile/about data.
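A hedged sketch of reading one post's body follows. Here fetch is any callable that GETs a URL and returns decoded JSON. The body_html key and the "everyone" audience value are assumptions about the undocumented payload, so anything not clearly free is treated as locked.

```python
from urllib.parse import quote

# Assumption: "everyone" marks a free post. Verify against a real response.
FREE_AUDIENCES = {"everyone"}


def post_url(publication: str, slug: str) -> str:
    """Build the single-post URL; the slug is percent-encoded defensively."""
    return f"https://{publication}.substack.com/api/v1/posts/{quote(slug, safe='')}"


def fetch_free_body(publication: str, slug: str, fetch):
    """Return body HTML for a free post, or None for paywalled or empty posts."""
    post = fetch(post_url(publication, slug))
    if not isinstance(post, dict) or post.get("audience") not in FREE_AUDIENCES:
        return None  # treat anything not clearly free as locked
    body = post.get("body_html")  # assumed field name
    return body if isinstance(body, str) and body.strip() else None
```

Returning None for paid posts keeps a preview fragment from being stored as if it were the full article.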

There’s also the plain RSS feed at https://{publication}.substack.com/feed, which is the most stable surface of all — standards-based, unlikely to change shape — though it carries fewer fields and a shorter window than the api/v1/posts archive.
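Because the feed is standard RSS 2.0, it can be parsed with the Python standard library alone. The sketch below reads the item, title, link, and pubDate elements; deriving the slug from a /p/{slug} link path is an assumption about how post URLs are laid out, and the output field names are illustrative.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from urllib.parse import urlparse


def parse_feed(xml_text: str) -> list[dict]:
    """Turn an RSS 2.0 document into a list of minimal post rows."""
    rows = []
    for item in ET.fromstring(xml_text).iter("item"):
        link = (item.findtext("link") or "").strip()
        path = urlparse(link).path
        pub_date = item.findtext("pubDate")
        rows.append({
            "title": (item.findtext("title") or "").strip(),
            "url": link,
            # Assumption: post links look like https://host/p/{slug}.
            "post_slug": path.rsplit("/p/", 1)[-1].strip("/") if "/p/" in path else None,
            # RSS dates are RFC 822 style; convert them to ISO 8601.
            "published_at": parsedate_to_datetime(pub_date).isoformat() if pub_date else None,
        })
    return rows
```

The rows carry fewer fields than the archive feed, which is the trade-off for the feed's stability.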

A pragmatic pipeline uses the archive endpoint for breadth, the single-post endpoint for full bodies, and RSS as a resilient fallback.

▶ Try the Substack Newsletter Scraper on Apify: it walks a publication’s full archive and pulls per-post content and metadata, with no login required.

Pagination and walking an archive

The archive endpoint paginates simply:

https://{publication}.substack.com/api/v1/posts?limit=12&offset=0
https://{publication}.substack.com/api/v1/posts?limit=12&offset=12
https://{publication}.substack.com/api/v1/posts?limit=12&offset=24

Increase offset until you get an empty array — that’s the end of the archive. Keep limit modest (the app itself uses small pages) to stay unobtrusive. For incremental refreshes, you only need the first page or two and a dedupe on slug, since new posts land at the top.
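The offset walk and the incremental refresh can be sketched as two small Python functions. Here fetch_page is a hypothetical callable taking (limit, offset) and returning one page as a list of dicts; the max_pages guard is an added safety assumption, not something the endpoint requires.

```python
from typing import Callable, Iterator


def walk_archive(fetch_page: Callable[[int, int], list], limit: int = 12,
                 max_pages: int = 1000) -> Iterator[dict]:
    """Yield every post by stepping the offset until a page comes back empty."""
    seen = set()
    for page_number in range(max_pages):
        page = fetch_page(limit, page_number * limit)
        if not page:
            return  # empty array: end of the archive
        for post in page:
            slug = post.get("slug")
            if slug is not None:
                if slug in seen:
                    continue  # a post published mid-crawl shifts later offsets
                seen.add(slug)
            yield post


def new_posts(first_page: list, known_slugs: set) -> list:
    """Return posts from the newest page whose slug has not been stored yet."""
    return [post for post in first_page if post.get("slug") not in known_slugs]
```

Deduplicating inside the walk matters because offsets are positional: if the publication posts while you are paging, the last item of one page can reappear as the first item of the next.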

Rate limits and how to live with them

Because there’s no official API, there’s no published rate limit, but Substack sits behind a CDN that can throttle a single IP requesting a publication’s entire archive at speed. Sensible defaults:

Throttle to roughly one request every second or two per publication. Most archives are a few dozen to a few hundred posts; there’s no need to sprint.
Fetch metadata in bulk, bodies selectively. The archive feed is cheap; the per-post full body is one request each. Only fetch bodies for the posts you actually need.
Cache by slug. A published post rarely changes. Pull its body once and skip it on future runs.
Rotate IPs only across many publications — a single newsletter from a single IP is fine; crawling thousands of publications needs a pool.
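The pacing and caching defaults above can be sketched in Python as follows. The 1.5-second interval is just one value inside the suggested range, the in-memory dict stands in for whatever store you actually use, and fetch_body is a hypothetical callable that takes a slug and returns its body.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval: float = 1.5,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def cached_body(slug: str, cache: dict, fetch_body, throttle: Throttle):
    """Fetch a post body at most once; cache hits cost no request and no wait."""
    if slug not in cache:
        throttle.wait()
        cache[slug] = fetch_body(slug)
    return cache[slug]
```

Only cache misses pass through the throttle, so a rerun over an already-exported archive finishes without sleeping.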

A clean output schema

One row per post:

{
  "publication": "example-publication",
  "post_slug": "example-post-slug",
  "url": "https://example-publication.substack.com/p/example-post-slug",
  "title": "Example Post Title",
  "subtitle": "An example subtitle",
  "author": "Example Author",
  "type": "newsletter",
  "audience": "only_paid",
  "is_paywalled": true,
  "published_at": "2026-05-20T13:00:00Z",
  "like_count": 412,
  "comment_count": 88,
  "cover_image": "https://substackcdn.com/...",
  "preview_text": "A decade ago...",
  "body_html": null,
  "word_count": null,
  "scraped_at": "2026-06-05T12:00:00Z"
}

Note that body_html and word_count are null here because the post is paywalled; set them only for free posts, where the full body is public. Use publication + post_slug as the natural key.
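One way to produce this row from a raw post object is sketched below in Python. Every raw key name (canonical_url, post_date, publishedBylines, reaction_count, description) and the "everyone" audience value are assumptions about an undocumented payload, so confirm them against a real response. The word count is an approximation taken after stripping tags.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def count_words(html: str) -> int:
    """Approximate word count of an HTML body with the tags stripped."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.parts).split())


def to_row(publication: str, raw: dict, body_html=None, scraped_at=None) -> dict:
    """Map one raw post object onto the flat schema above."""
    audience = raw.get("audience")
    # Assumption: "everyone" means free; anything else is treated as paywalled.
    is_paywalled = audience != "everyone"
    body = None if is_paywalled else body_html
    bylines = raw.get("publishedBylines")  # assumed key for the author list
    first = bylines[0] if isinstance(bylines, list) and bylines else None
    return {
        "publication": publication,
        "post_slug": raw.get("slug"),
        "url": raw.get("canonical_url"),
        "title": raw.get("title"),
        "subtitle": raw.get("subtitle"),
        "author": first.get("name") if isinstance(first, dict) else None,
        "type": raw.get("type"),
        "audience": audience,
        "is_paywalled": is_paywalled,
        "published_at": raw.get("post_date"),
        "like_count": raw.get("reaction_count"),
        "comment_count": raw.get("comment_count"),
        "cover_image": raw.get("cover_image"),
        "preview_text": raw.get("description"),
        "body_html": body,
        "word_count": count_words(body) if body else None,
        "scraped_at": scraped_at
        or datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```

Using .get throughout means a renamed or missing key yields a null column instead of a crash, and a body passed in for a paywalled post is dropped, not stored.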

Use cases

Competitive content audits — pull a rival newsletter’s full archive of titles, dates, and engagement counts to reverse-engineer their cadence and what resonates.
Newsletter discovery / directories — index publications’ metadata to build a searchable catalog of newsletters by topic.
Research corpora — export the full text of free posts across many publications for media analysis or NLP.
Author tracking — monitor a specific writer’s output, publish frequency, and engagement trend over time.
Trend detection — aggregate topics and posting velocity across a watchlist of newsletters to spot rising themes.

Build it yourself vs. a managed actor

For a single newsletter, this is a short script: page the archive, fetch a few bodies, done. The complexity arrives when you go wide. Custom-domain publications proxy the same app at a different host; the api/v1/posts shape is undocumented and carries no contract, so it can change without notice; paywall handling has to be respectful (metadata yes, locked bodies no); and RSS-vs-archive reconciliation needs care so you don’t double-count. A managed Substack newsletter scraper absorbs the per-publication quirks and endpoint drift so you just consume normalized rows.
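The RSS-vs-archive reconciliation can be as simple as a merge on the natural key. This Python sketch assumes both sources have already been normalized to rows carrying publication and post_slug; letting the archive row win and using RSS only to fill gaps is one reasonable policy, not the only one.

```python
def merge_rows(archive_rows: list, rss_rows: list) -> list:
    """Merge two row lists on (publication, post_slug) without double-counting."""
    merged = {}
    for row in rss_rows:
        merged[(row["publication"], row["post_slug"])] = dict(row)
    for row in archive_rows:
        key = (row["publication"], row["post_slug"])
        combined = dict(row)  # archive values take priority
        for field, value in merged.get(key, {}).items():
            if combined.get(field) is None:
                combined[field] = value  # RSS fills only missing fields
        merged[key] = combined
    return list(merged.values())
```

A post present in both sources comes out as one row, and a post seen only in RSS still survives with its thinner set of fields.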

Common pitfalls

Trying to scrape paid content. Don’t. The body of a paywalled post isn’t public; respect the paywall and capture only metadata for those.
Custom domains. Many publications front the Substack app on their own domain. Resolve to the real *.substack.com (or handle the proxy) so your endpoints work.
Undocumented shape drift. api/v1/posts is an internal endpoint, not a promise. Parse defensively and keep RSS as a fallback.
Offset overrun. Keep paging until empty; don’t hardcode a post count — archives grow.
Over-crawling. Pulling a 500-post archive at full speed from one IP is the fastest way to get throttled. Pace it and cache.
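For the shape-drift pitfall, a defensive wrapper might look like the Python sketch below. load_archive and load_rss are hypothetical zero-argument callables that wrap your own fetch code, and the validity check (a list of dicts that each carry a slug) is an assumed minimum, not a full schema.

```python
def posts_with_fallback(load_archive, load_rss):
    """Prefer the archive feed; fall back to RSS on errors or a changed shape."""
    try:
        posts = load_archive()
        looks_valid = isinstance(posts, list) and all(
            isinstance(post, dict) and post.get("slug") for post in posts
        )
        if looks_valid:
            return "archive", posts
    except (OSError, ValueError):
        # Network failures from urllib are OSError subclasses; JSON decoding
        # errors are ValueError subclasses.
        pass
    return "rss", load_rss()
```

Returning the source name alongside the rows lets downstream code record which surface each run actually used.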

Wrapping up

Substack has no public API in 2026 and has never had one, but its per-publication architecture means every newsletter exposes /api/v1/posts, single-post JSON, and an RSS feed on its own subdomain. Respect the paywall, pace your requests, and key on publication + slug, and you can export post data and metadata reliably despite the missing API. If you’d rather not handle custom domains and undocumented endpoint drift across many publications, a managed actor returns the same data already normalized.

▶ Open the Substack Newsletter Scraper on Apify — export a publication’s full post archive with metadata and free-post content. No API, no login. Pay per post.