logiover
social-media · May 22, 2026 · 5 min read

How to Scrape LinkedIn Top Content & Top Voices in 2026

Extract LinkedIn's curated Top Content directory and verified Top Voice influencers across 40+ topics — post text, author profiles, follower counts and engagement metrics, with no login.

If the Ad Library tells you what brands are paying to say on LinkedIn, the Top Content directory tells you what people are organically rewarding. LinkedIn curates a public directory of high-engagement posts and verified Top Voice influencers across more than 40 topic categories — Marketing, AI, Leadership, Sales, Finance and more. It’s a goldmine for influencer discovery and content benchmarking, but like the rest of LinkedIn it isn’t built for export. This guide covers what the directory exposes and how to pull it at scale.

Top Content vs. the Ad Library — two different signals

These are easy to confuse, so it’s worth being precise. The Ad Library is paid, advertiser-declared creative. The Top Content directory is organic, algorithmically curated content: posts that earned their reach through real reactions and comments, plus the authors LinkedIn has formally badged as Top Voices in a given topic.

That makes Top Content the better source when your question is about what resonates and who is influential, not who is buying ads. Both share one crucial property: they’re served from LinkedIn’s public directory, so neither requires a logged-in account.

The no-login, no-browser approach

Scraping authenticated LinkedIn (the feed, profiles, search) means working against anti-automation defenses and putting your account at risk. The Top Content directory is different. It’s a public, server-rendered surface, which means:

  • No login and no cookies — you never authenticate, so there’s no account to get restricted.
  • Pure HTTP, no headless browser — the post and author data is in the server-rendered HTML, parsed with Cheerio. No Playwright, no Chromium, low memory footprint, high throughput.
  • No account-ban risk — because there’s no account in the loop at all.
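As a rough illustration of the parse-the-HTML idea, here is a minimal sketch using Python's standard-library `html.parser` in place of Cheerio. The sample markup, the `data-post-id` attribute, and the `author` and `post-text` class names are all invented for this example; the real directory's markup is not described here and will differ.

```python
from html.parser import HTMLParser

# Hypothetical markup: tag names, attributes and classes are assumptions,
# not LinkedIn's actual structure.
SAMPLE_HTML = """
<article data-post-id="urn:li:activity:7301234567890">
  <a class="author" href="https://www.linkedin.com/in/dana-okafor">Dana Okafor</a>
  <p class="post-text">Most B2B content fails because it answers questions nobody asked...</p>
</article>
"""


class PostParser(HTMLParser):
    """Collects one dict per <article> element in the sample markup."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._field = None  # field currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self.posts.append({
                "post_id": attrs.get("data-post-id"),
                "author_name": "",
                "author_url": None,
                "post_text": "",
            })
        elif self.posts and tag == "a" and attrs.get("class") == "author":
            self.posts[-1]["author_url"] = attrs.get("href")
            self._field = "author_name"
        elif self.posts and tag == "p" and attrs.get("class") == "post-text":
            self._field = "post_text"

    def handle_endtag(self, tag):
        if tag in ("a", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.posts[-1][self._field] += data


def parse_posts(html):
    """Parse an HTML string and return a list of post dicts."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    for post in parser.posts:
        post["author_name"] = post["author_name"].strip()
        post["post_text"] = post["post_text"].strip()
    return parser.posts


posts = parse_posts(SAMPLE_HTML)
```

Because nothing here runs JavaScript or renders a page, the memory and throughput profile is that of plain string parsing, which is the point of the bullet list above.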

The interesting engineering wrinkle here is Top Voice detection. The “Top Voice” badge is rendered differently across locales, so reliable detection combines three signals: badge metadata in the markup, parsing the author bio, and matching localized label text. Getting this right across languages is the part most home-grown scrapers get wrong.
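One way to picture the three-signal check is the sketch below. It is an assumption-laden illustration, not the actor's actual logic: the function name, the `DEFAULT_LABELS` table (which only contains the English label), and the "any one signal is enough" rule are all choices made for this example. A stricter pipeline might require two signals to agree.

```python
# Only the English label comes from the article; labels for other locales
# would have to be collected from the directory itself.
DEFAULT_LABELS = {"en": ["top voice"]}


def detect_top_voice(badge_meta, bio, label_text, locale="en", labels=None):
    """Combine badge metadata, bio text and localized label text.

    badge_meta: any truthy value if badge metadata was found in the markup.
    bio, label_text: raw strings (or None) taken from the author block.
    Returns the final verdict plus which individual signals fired.
    """
    needles = [s.casefold() for s in (labels or DEFAULT_LABELS).get(locale, [])]

    def mentions(text):
        haystack = (text or "").casefold()
        return any(needle in haystack for needle in needles)

    signals = {
        "badge_meta": bool(badge_meta),
        "bio": mentions(bio),
        "label": mentions(label_text),
    }
    # Assumption: a single positive signal counts as a Top Voice.
    return {"is_top_voice": any(signals.values()), "signals": signals}


result = detect_top_voice(
    badge_meta=None,
    bio="Head of Content @ Northwind. I write about positioning.",
    label_text="Top Voice",
)
```

Returning the individual signals alongside the verdict makes it easier to notice when one detection path silently stops matching after a markup change.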

What’s worth extracting

Per record, the directory yields a post plus its author and engagement context:

  • Post — full post text, post identifier, timestamp, media links, and tagged companies.
  • Author metadata — name, profile URL, follower count, bio.
  • Top Voice signal — whether the author carries a verified Top Voice badge in this topic.
  • Engagement metrics — reaction counts and comment counts.
  • Topic taxonomy — the category, subtopic, and leaf-topic the post was curated under (e.g., Marketing → Content Marketing → Newsletters).

Records are auto-deduplicated by post identifier, so scheduled re-runs surface only new content.
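If you were to reproduce that deduplication yourself, a minimal sketch could look like the following. The helper names and the JSON file used to persist seen identifiers between runs are assumptions for illustration; the managed scraper's own storage mechanism is not described in this guide.

```python
import json
from pathlib import Path


def load_seen(path):
    """Load the set of post IDs seen in earlier runs (empty if no file yet)."""
    p = Path(path)
    return set(json.loads(p.read_text())) if p.exists() else set()


def save_seen(seen, path):
    """Persist the set of seen post IDs for the next scheduled run."""
    Path(path).write_text(json.dumps(sorted(seen)))


def filter_new(records, seen):
    """Return only records whose post_id is not in `seen`, and record them.

    Mutates `seen` so duplicates within the same batch are also dropped.
    """
    fresh = []
    for rec in records:
        post_id = rec["post_id"]
        if post_id not in seen:
            seen.add(post_id)
            fresh.append(rec)
    return fresh


seen_ids = {"urn:li:activity:1"}
batch = [
    {"post_id": "urn:li:activity:1"},  # already seen in a previous run
    {"post_id": "urn:li:activity:2"},
    {"post_id": "urn:li:activity:2"},  # duplicate inside this batch
]
new_records = filter_new(batch, seen_ids)
```

In a scheduled job you would call `load_seen` at the start and `save_seen` at the end, so each run emits only posts that have not appeared before.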

Run the LinkedIn Top Content & Top Voices Scraper — curated high-engagement posts and verified Top Voices across 40+ categories. Author profiles, follower counts, engagement. No login. $2 per 1,000 posts.

Schema design for downstream use

A clean per-post record:

{
  "post_id": "urn:li:activity:7301234567890",
  "post_text": "Most B2B content fails because it answers questions nobody asked...",
  "author_name": "Dana Okafor",
  "author_url": "https://www.linkedin.com/in/dana-okafor",
  "author_followers": 84210,
  "author_bio": "Head of Content @ Northwind. I write about positioning.",
  "is_top_voice": true,
  "category": "Marketing",
  "subtopic": "Content Marketing",
  "leaf_topic": "B2B Content",
  "reactions": 3120,
  "comments": 287,
  "tagged_companies": ["Northwind"],
  "posted_at": "2026-05-18T07:30:00Z",
  "scraped_at": "2026-05-22T11:00:00Z"
}

Schema choices worth making early:

  • Keep the full taxonomy (category / subtopic / leaf_topic). Influence is topic-specific — a Top Voice in AI is not necessarily one in Finance. Flattening to a single “topic” string loses the hierarchy you’ll want to filter on.
  • Store is_top_voice as a boolean, not a string. You’ll filter on it constantly.
  • Persist author_followers per scrape, not once. Follower counts drift; if you want to spot rising stars you need the time series.
  • Don’t discard low-engagement posts. The ratio of reactions to follower count is often more telling than raw reactions — keep both so you can compute it.
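To show why persisting author_followers on every scrape matters, here is a small sketch that turns repeated snapshots into a relative growth figure per author. The `follower_growth` function and the sample numbers for the earlier snapshot are invented for illustration; only the field names follow the record above.

```python
from collections import defaultdict


def follower_growth(snapshots):
    """Relative follower growth per author between first and last snapshot.

    snapshots: iterable of dicts with author_url, author_followers and
    scraped_at (ISO 8601 strings in the same timezone, so they sort
    chronologically as plain text).
    Authors with fewer than two snapshots are skipped.
    """
    series = defaultdict(list)
    for snap in snapshots:
        series[snap["author_url"]].append(
            (snap["scraped_at"], snap["author_followers"])
        )

    growth = {}
    for url, points in series.items():
        points.sort()
        first, last = points[0][1], points[-1][1]
        if len(points) >= 2 and first > 0:
            growth[url] = (last - first) / first
    return growth


DANA = "https://www.linkedin.com/in/dana-okafor"
history = [
    {"author_url": DANA, "author_followers": 84210, "scraped_at": "2026-05-22T11:00:00Z"},
    {"author_url": DANA, "author_followers": 80000, "scraped_at": "2026-05-15T11:00:00Z"},
    {"author_url": "https://www.linkedin.com/in/someone-else", "author_followers": 5000,
     "scraped_at": "2026-05-22T11:00:00Z"},
]
growth = follower_growth(history)
```

With a single stored value per author, this calculation is impossible, which is the argument for keeping the time series.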

Typical use cases

What teams actually do with this data:

  • Influencer discovery — surface verified Top Voices in a specific topic for outreach, speaker sourcing, podcast guests, or advisory recruiting.
  • Content benchmarking — aggregate a category’s top posts to learn the average post length, hook style, hashtag usage, and which formats earn engagement in your niche.
  • Swipe files for content teams — collect top-performing hooks and post structures to reverse-engineer what works.
  • Lead lists of active authors — filter authors by engagement threshold and topic to build prospect lists for sales or CRM enrichment.
  • Trend monitoring — track which categories are growing and what themes are emerging, week over week.
  • Rising-star tracking — watch author recurrence and follower growth to catch influencers on the way up before they’re saturated with offers.
  • PR / media targeting — find topical, credible authors with engagement signals for press outreach.
  • AI training corpora — high-engagement post + author-metadata pairs are excellent for fine-tuning content-generation models or feeding RAG pipelines.

The common thread is topic-scoped influence plus engagement context. A list of names is cheap; a ranked list of who is influential on a specific subject right now, with the posts that prove it, is what’s actually useful.
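The content benchmarking use case can be sketched in a few lines. The `benchmark` function, the word-count proxy for post length, and the `#\w+` hashtag pattern are assumptions chosen for this illustration, and the two sample records are made up.

```python
import re
from collections import defaultdict
from statistics import mean

HASHTAG = re.compile(r"#\w+")


def benchmark(records):
    """Per-category averages: word count, hashtag count and reactions."""
    by_category = defaultdict(list)
    for rec in records:
        by_category[rec["category"]].append(rec)

    summary = {}
    for category, recs in by_category.items():
        summary[category] = {
            "posts": len(recs),
            "avg_words": mean(len(r["post_text"].split()) for r in recs),
            "avg_hashtags": mean(len(HASHTAG.findall(r["post_text"])) for r in recs),
            "avg_reactions": mean(r["reactions"] for r in recs),
        }
    return summary


sample = [
    {"category": "Marketing", "post_text": "one two three #b2b", "reactions": 100},
    {"category": "Marketing", "post_text": "one two", "reactions": 300},
]
stats = benchmark(sample)
```

The same grouping works on subtopic or leaf_topic if you want benchmarks at a finer level of the taxonomy.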

Cost math

There’s no browser and no proxy bandwidth bill — just HTTP against public directory pages — so cost is low. Pricing is $2 per 1,000 posts plus a negligible per-run start fee.

Example: tracking 10 topic categories, pulling ~300 top posts each, refreshed weekly, is ~3,000 posts/week or ~12,000/month (counting four weekly runs per month). At $2 per thousand that’s about $24/month for a continuously refreshed influencer-and-content feed across your whole topic set.
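The arithmetic can be written out as a small helper. The function name and parameters are illustrative; the default price is the $2 per 1,000 posts quoted above, and the per-run start fee is left out because no figure is given for it.

```python
def monthly_cost(categories, posts_per_category, runs_per_month, price_per_1000=2.0):
    """Return (posts per month, cost in dollars), excluding any per-run start fee."""
    posts = categories * posts_per_category * runs_per_month
    return posts, posts / 1000 * price_per_1000


# 10 categories x 300 posts, four weekly runs per month.
posts, cost = monthly_cost(categories=10, posts_per_category=300, runs_per_month=4)
```

Four runs per month is a rounding; averaged over a year (52 weeks over 12 months) the same setup comes to about 13,000 posts and about $26 per month.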

If you build it yourself, the hard parts aren’t the HTTP requests. They’re the multi-locale Top Voice detection, the three-level taxonomy parsing, and dedup across runs. Those are exactly the things that break silently when LinkedIn adjusts markup, and that is the maintenance the managed actor absorbs.

Common pitfalls

  • “Top Voice” is topic-scoped. Someone badged in Leadership may have no badge in Sales. Always pair the badge with the category it was detected in.
  • Curation is LinkedIn’s, not yours. The directory reflects LinkedIn’s own ranking, which is opaque and shifts. Treat it as a high-quality sample of what’s resonating, not a complete census.
  • Follower count is a vanity-prone metric. A 200k-follower author with 40 reactions is less influential on this topic than a 5k-follower author with 800. Compute engagement-to-follower ratio.
  • Locale matters. The same topic in different languages surfaces different authors. If you only scrape the English directory you’ll miss regional Top Voices.
  • Don’t scrape it once and call it done. The whole value is the trend line. A single snapshot can’t tell you who’s rising or what’s gaining momentum.
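The engagement-to-follower comparison from the list above works out as follows. The function names are illustrative, the ratio uses reactions only (adding comments is an equally reasonable choice), and the ranking is scoped to one category so that influence stays topic-specific.

```python
def engagement_ratio(reactions, followers):
    """Reactions per follower; None when the follower count is unusable."""
    if followers <= 0:
        return None
    return reactions / followers


def rank_by_ratio(records, category):
    """Records from one category, highest reactions-per-follower first."""
    scoped = [
        r for r in records
        if r["category"] == category and r["author_followers"] > 0
    ]
    return sorted(
        scoped,
        key=lambda r: engagement_ratio(r["reactions"], r["author_followers"]),
        reverse=True,
    )


authors = [
    {"author_name": "Big account", "category": "Sales",
     "author_followers": 200_000, "reactions": 40},
    {"author_name": "Small account", "category": "Sales",
     "author_followers": 5_000, "reactions": 800},
    {"author_name": "Other topic", "category": "Leadership",
     "author_followers": 1_000, "reactions": 900},
]
ranked = rank_by_ratio(authors, "Sales")
big = engagement_ratio(40, 200_000)   # 0.0002 reactions per follower
small = engagement_ratio(800, 5_000)  # 0.16 reactions per follower
```

On these numbers the 5k-follower author earns 800 times as many reactions per follower as the 200k-follower author, which is why raw follower count alone is a poor ranking key.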

Wrapping up

The Top Content directory is the organic counterpart to the Ad Library — it shows what LinkedIn’s audience actually rewards and who is genuinely influential, topic by topic, with no login and no account risk. For a quick browse the UI is fine. For influencer discovery, content benchmarking, or a refreshed feed of rising voices in your niche, run it as a managed scraper and let the multi-locale parsing maintenance be someone else’s problem.

Open the LinkedIn Top Content Scraper on Apify — discover Top Voices, benchmark content, track trends across 40+ topics. Pay per post. Start with Apify’s free monthly credit.
