social-media · May 24, 2026 · 6 min read

How to Scrape Apple Podcasts Episodes in 2026

Extract podcast shows, full episode lists, MP3 audio URLs, show notes and transcripts from Apple Podcasts using the iTunes API plus RSS — no login, no browser.

Apple Podcasts is the canonical index of the podcast world — it’s where shows get their stable IDs and where most other platforms (Spotify, Overcast, Pocket Casts) cross-reference. But the public listing on podcasts.apple.com is a thin web shell that lazy-loads everything, and the show pages only display recent episodes. If you want a full episode archive with audio URLs and transcripts, you have to combine two data sources the way the Apple app does internally. This guide covers how that works and what you can actually pull in 2026.

What’s worth extracting

Apple Podcasts is a two-layer dataset: show-level metadata from Apple’s index, and episode-level detail from the show’s RSS feed. Combined, you get per episode:

Identity — episode title, GUID, episode and season numbers, the parent show name and Apple collection ID.
Content — episode description and full HTML show notes (the part with sponsor links and chapter timestamps).
Media — the direct audio file URL (MP3, M4A, etc.) — the actual downloadable asset, not a player embed.
Timing — duration in seconds, publish/release timestamp in ISO format.
Artwork & genre — episode artwork URL, show genre, storefront/country.
Podcast 2.0 extras — chapter markers and transcript links when the publisher includes them.

At the show level you also get genre, country/storefront, total episode count and the canonical RSS feed URL. The audio URL is the field most people are really after — it’s what feeds a transcription pipeline.

How the data is exposed

There’s no anti-bot wall here. The challenge is that Apple’s index and the audio archive live in two different places, and you have to stitch them:

iTunes Search & Lookup API (itunes.apple.com/search, itunes.apple.com/lookup) — discovers shows by keyword or numeric collection ID and returns show metadata plus the RSS feed URL and a handful of recent episodes. This is the canonical-index layer.
The show’s RSS feed — once you have the feed URL, the feed itself (RSS 2.0 with Podcast 2.0 extensions) holds the full episode archive, the direct audio enclosures, and any transcript/chapter tags. Apple’s own app does exactly this: index lookup, then RSS.

The practical realities:

Recent-only from Apple. The iTunes API returns only a show’s most recent episodes, not its history. The full back-catalog lives only in the RSS feed.
Feeds are huge and messy. A long-running daily show can have a multi-megabyte feed with thousands of <item> elements. You need a streaming XML parser, not a load-it-all-into-memory approach, or you’ll choke on the big ones.
Dedup across sources. The same episode appears in both the iTunes recent list and the RSS feed. You dedup on a stable GUID/identifier.
Feed dialect drift. RSS 2.0 plus the iTunes namespace plus Podcast 2.0 tags — feeds vary wildly in which fields they populate. Robust parsing means tolerating missing fields gracefully.
Rate limits. The iTunes Search API throttles around 20 requests/minute per IP, so bulk discovery needs pacing.
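As a minimal sketch of the streaming approach described above, Python's standard-library `xml.etree.ElementTree.iterparse` can walk a feed one `<item>` at a time and tolerate missing fields. The function name `iter_episodes`, the `*_raw` output keys and the sample feed are illustrative assumptions, not part of any Apple or publisher API, and real feeds will need more fields and more defensive handling.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URIs used by the iTunes and Podcast 2.0 (podcast:) RSS extensions.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"
PODCAST_NS = "https://podcastindex.org/namespace/1.0"


def _text(elem, tag):
    """Return stripped child text, or None when the tag is missing or empty."""
    value = elem.findtext(tag)
    return value.strip() if value and value.strip() else None


def iter_episodes(source):
    """Yield one dict per <item>, releasing each item's content after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        enclosure = elem.find("enclosure")
        transcript = elem.find(f"{{{PODCAST_NS}}}transcript")
        yield {
            "episode_guid": _text(elem, "guid"),
            "episode_title": _text(elem, "title"),
            "audio_url": enclosure.get("url") if enclosure is not None else None,
            "duration_raw": _text(elem, f"{{{ITUNES_NS}}}duration"),
            "published_raw": _text(elem, "pubDate"),
            "transcript_url": transcript.get("url") if transcript is not None else None,
        }
        # Drop the item's children and text; only an empty shell stays attached.
        elem.clear()


# Hypothetical two-item feed: the second item has no audio, duration or transcript.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
     xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 2</title>
      <guid>ep-2</guid>
      <pubDate>Sat, 23 May 2026 07:00:00 +0000</pubDate>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="123"/>
      <itunes:duration>33:07</itunes:duration>
      <podcast:transcript url="https://example.com/ep2.vtt" type="text/vtt"/>
    </item>
    <item>
      <title>Episode 1</title>
      <guid>ep-1</guid>
    </item>
  </channel>
</rss>"""

episodes = list(iter_episodes(io.BytesIO(SAMPLE_FEED)))
```

In a real pipeline `source` would more likely be a streamed HTTP response body or a file on disk than an in-memory buffer, so memory use stays roughly flat regardless of feed size.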

Because it’s HTTP + XML, throughput is high and there’s no headless browser cost.

Endpoint structure

# Search for shows by keyword
https://itunes.apple.com/search?term=true+crime&entity=podcast&limit=50

# Lookup a show by its Apple collection ID (returns the feedUrl)
https://itunes.apple.com/lookup?id=1200361736&entity=podcastEpisode

# Then fetch the full archive from the feedUrl in that response, e.g.
https://feeds.megaphone.fm/the-daily

The feedUrl field in the Lookup response is the bridge: Apple gives you the canonical pointer, the publisher’s feed gives you the archive and the audio.
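The index-to-feed hop can be sketched in a few lines of Python. This assumes the Lookup response is a JSON object with a `results` list whose show entry carries `feedUrl`; the helper names and the trimmed sample payload are illustrative, and the other field names in the sample are assumptions rather than a full description of the response.

```python
import json
from urllib.parse import urlencode

LOOKUP_URL = "https://itunes.apple.com/lookup"


def lookup_url(collection_id):
    """Build the Lookup URL for a show and its recent episodes."""
    params = {"id": collection_id, "entity": "podcastEpisode"}
    return f"{LOOKUP_URL}?{urlencode(params)}"


def extract_feed_url(payload):
    """Return the first non-empty feedUrl in a Lookup response, or None."""
    for result in payload.get("results", []):
        feed_url = result.get("feedUrl")
        if feed_url:
            return feed_url
    return None


# Trimmed, illustrative payload; a real response carries many more fields.
SAMPLE_LOOKUP = json.loads("""
{
  "resultCount": 2,
  "results": [
    {"collectionId": 1200361736,
     "collectionName": "The Daily",
     "feedUrl": "https://feeds.megaphone.fm/the-daily"},
    {"trackName": "The Sunday Read"}
  ]
}
""")

url = lookup_url(1200361736)
feed_url = extract_feed_url(SAMPLE_LOOKUP)
```

Returning `None` instead of raising keeps the caller in charge of what to do with shows that have no public feed.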

▶ Run the Apple Podcasts Scraper — discover shows by keyword, iTunes ID or RSS URL, then pull full episode archives with MP3 URLs, show notes, durations, artwork and transcript links. Pay per result.

Build it yourself vs. use a managed actor

Discovering one show via the Lookup API is trivial. Building a full archive pipeline is more work than it looks:

Building from scratch — call the Search/Lookup API with pacing, extract the feed URL, fetch and stream-parse arbitrarily large RSS feeds across multiple dialects, dedup across sources, and normalize timestamps and durations. Plan on several days plus ongoing tolerance for malformed feeds.
Using a managed actor — pass keywords, IDs or feed URLs and get flat, deduplicated episode rows with audio URLs already extracted.

The streaming-XML and feed-dialect tolerance is the unglamorous part that eats the most time. That’s the part the managed actor has already absorbed.

Schema design for downstream use

For a transcription or RAG pipeline, a flat per-episode row keyed by GUID works best:

{
  "episode_guid": "gid://art19/episode/abc-123",
  "show_name": "The Daily",
  "collection_id": 1200361736,
  "episode_title": "The Sunday Read",
  "season": null,
  "episode_number": 1842,
  "description": "A look at...",
  "audio_url": "https://dts.podtrac.com/redirect.mp3/...",
  "duration_seconds": 1987,
  "published_at": "2026-05-23T07:00:00Z",
  "artwork_url": "https://...artwork.jpg",
  "genre": "News",
  "storefront": "us",
  "transcript_url": "https://...transcript.json",
  "scraped_at": "2026-05-24T10:00:00Z"
}
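With rows keyed this way, merging the iTunes recent list and the RSS archive reduces to a dictionary keyed on `episode_guid`. The sketch below is one possible policy, not the only one: the first source wins for any field it already filled, later sources only fill gaps, and rows without a GUID are skipped. The function name `merge_episodes` and the sample rows are hypothetical.

```python
def merge_episodes(*sources):
    """Merge episode rows from several sources, keyed by episode_guid.

    Earlier sources take priority; later ones only fill fields that are
    still missing or None. Rows without a GUID are skipped because they
    cannot be deduplicated reliably.
    """
    merged = {}
    for rows in sources:
        for row in rows:
            guid = row.get("episode_guid")
            if not guid:
                continue
            current = merged.setdefault(guid, {})
            for key, value in row.items():
                if current.get(key) is None and value is not None:
                    current[key] = value
    return list(merged.values())


# Hypothetical rows: the same episode seen in both sources, plus an older one
# that only the RSS feed knows about.
itunes_recent = [
    {"episode_guid": "ep-2", "episode_title": "Episode 2", "audio_url": None},
]
rss_archive = [
    {"episode_guid": "ep-2", "episode_title": "Episode 2 (feed)",
     "audio_url": "https://example.com/ep2.mp3"},
    {"episode_guid": "ep-1", "episode_title": "Episode 1",
     "audio_url": "https://example.com/ep1.mp3"},
]

rows = merge_episodes(itunes_recent, rss_archive)
```

Passing the sources in priority order makes the precedence explicit at the call site; swapping the arguments would make the feed's values win instead.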

Schema choices worth making:

Keep audio_url and transcript_url separate. Most feeds have audio; only Podcast-2.0-compliant publishers ship transcript links. Don’t assume the transcript exists.
Dedup on episode_guid, not title — re-aired episodes and bonus reposts reuse titles.
Store duration_seconds as a number, even though feeds express it inconsistently as HH:MM:SS, MM:SS or raw seconds. Normalize on ingest.
Resolve redirect chains lazily. Audio URLs often go through analytics redirectors (Podtrac, Chartable). Store the URL as published; resolve only when you download.
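The normalize-on-ingest step can be sketched with the standard library alone. This assumes durations arrive as colon-separated fields or plain seconds and that publish dates use the RFC 822 style that RSS specifies; the function names are illustrative, and treating zone-less dates as UTC is an assumption you may want to change.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_duration(raw):
    """Convert 'HH:MM:SS', 'MM:SS' or raw seconds to int seconds, else None."""
    if raw is None:
        return None
    parts = raw.strip().split(":")
    if len(parts) > 3:
        return None
    try:
        seconds = 0.0
        for part in parts:
            seconds = seconds * 60 + float(part)
        return int(seconds)
    except (ValueError, OverflowError):
        return None


def parse_pub_date(raw):
    """Convert an RFC 822 style date to an ISO 8601 UTC string, else None."""
    if not raw:
        return None
    try:
        parsed = parsedate_to_datetime(raw.strip())
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: a date without a usable zone is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


duration_seconds = parse_duration("33:07")
published_at = parse_pub_date("Sat, 23 May 2026 03:00:00 -0400")
```

Both helpers return `None` on unparseable input rather than raising, so one malformed item does not abort an entire feed.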

Typical use cases

Podcast intelligence dashboards — ingest full episode lists and show metadata for niche cataloging and analytics.
RAG / LLM training — collect episode titles, descriptions and show notes as semi-structured text, and pull audio for transcription.
Transcription pipelines — feed the direct audio URLs into Whisper or Distil-Whisper at scale.
Sponsorship & ad research — find shows by topic and region, enrich with show metadata for outreach and segmentation.
Podcast SEO — build category pages, sitemaps and episode aggregators from the metadata.
Competitive media intel — analyze a competitor’s release cadence, episode-length distribution and topical drift.
Cross-platform matching — use Apple’s canonical index plus RSS to map episodes across Apple, Spotify and others.

Cost math

At pay-per-event pricing (a small per-start fee plus a fraction of a cent per episode row), pulling a 500-episode back-catalog costs cents. A weekly job tracking 100 shows with full archives lands in the low single-digit dollars per month. The expensive part of a podcast pipeline isn’t this scrape; it’s the GPU time for transcription downstream, which is exactly why you want the audio URLs handed to you cleanly rather than spending your engineering budget on RSS parsing.

Common pitfalls

The iTunes API only returns recent episodes. If you skip the RSS step you’ll think a 10-year show has 5 episodes.
Big feeds break naive parsers. Multi-megabyte feeds with thousands of items need streaming, not DOM-load-everything.
Audio URLs expire or redirect. Some hosts sign URLs or route through analytics redirectors. Download promptly or expect to re-resolve.
Transcripts are rare. Only Podcast 2.0 publishers include them; don’t build a product assuming every show has one.
Storefront affects discovery. A show may rank or surface differently per country; specify the storefront for reproducible search results.
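For reproducible discovery, the storefront can be pinned with the Search API's `country` parameter, and calls can be spaced to stay under the roughly 20 requests/minute throttle mentioned earlier. The sketch below is one way to do both; `search_url` and `Pacer` are hypothetical helpers, and the 3-second interval is simply 60 seconds divided by 20, not a documented guarantee.

```python
import time
from urllib.parse import urlencode

SEARCH_URL = "https://itunes.apple.com/search"


def search_url(term, country="us", limit=50):
    """Build a podcast search URL pinned to one storefront."""
    params = {"term": term, "entity": "podcast", "country": country, "limit": limit}
    return f"{SEARCH_URL}?{urlencode(params)}"


class Pacer:
    """Space successive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the pacing logic can be tested
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until the minimum interval since the previous call has passed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Same keyword, two storefronts: results may differ between them.
urls = [search_url("true crime", country=code) for code in ("us", "gb")]
```

Calling `Pacer().wait()` before each request is enough for a single-process crawler; a distributed job would need a shared limiter instead.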

Wrapping up

Apple Podcasts gives you the canonical index; the publisher’s RSS feed gives you the full archive and the audio. Stitching the two — with a streaming parser that tolerates every flavor of malformed feed — is the real work. If you need one show once, the Lookup API and a quick feed fetch will do. If you need full archives with clean audio URLs across many shows for a transcription or intelligence pipeline, a managed actor that already handles the dedup and the dialect drift saves the unglamorous days.

▶ Open the Apple Podcasts Scraper on Apify — episodes, MP3 URLs, show notes and transcript links via iTunes API + RSS. No login. Pay per result, start on Apify’s free monthly credit.