jobs · May 28, 2026 · 6 min read

How to Scrape Welcome to the Jungle (WTTJ) Jobs in 2026

Extract WTTJ job listings and company data — titles, salaries, remote policy, contract type and funding — straight from the Algolia index. No proxy, no login, fast and structured.

Welcome to the Jungle (WTTJ) is the dominant employer-branding and jobs platform for the European tech and startup scene, with deep company profiles (size, funding, offices) attached to every listing. That company context is what makes WTTJ data more valuable than a plain job board’s. And the best part for data work: WTTJ’s search is powered by a public Algolia index, which means you can pull structured listings at high throughput without a browser, a proxy, or a login. This guide covers how the Algolia-backed extraction works and how to get clean job-plus-company records.

What’s worth extracting

WTTJ pairs a job record with a rich company record. Flattened, each listing carries:

Job — listing identifier, title, canonical URL, employment/contract type, remote policy, experience level required.
Compensation — disclosed salary range and currency, where the employer published it (WTTJ pushes salary transparency, so coverage is better than on most boards).
Classification — profession and category labels (e.g. Software Engineering, Product, Sales).
Company — name, size band, funding stage/amount, description, website, logo.
Location — office locations attached to the role.
Meta — listing language (English or French — WTTJ is bilingual), publication timestamp, scrape timestamp.

For a job board, the core fields are title, salary, remote policy and apply URL. For market research the company fields (size and funding) are the differentiator; you can segment hiring demand by company stage.

Why the Algolia index beats HTML scraping

WTTJ’s front end queries a hosted Algolia search index using a public, browser-exposed search key. That’s the fast path:

No browser, no proxy — you query the search API directly with the public key. No headless Chrome, no residential IPs, no anti-bot challenge.
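Algolia pagination — results page cleanly via the page and hitsPerPage parameters, so high-throughput extraction is straightforward. Algolia does cap how deep a single query can page (1,000 hits by default, controlled by the index’s paginationLimitedTo setting), so split very large pulls into narrower facet slices, by contract type or profession for example.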
Rich filters server-side — keyword, contract type, remote policy, experience level, salary range and profession all map to Algolia facet filters, so you narrow at query time instead of post-filtering a giant pull.
Stable payloads — an API index changes shape far less often than a React DOM. Scraping the rendered page would mean chasing class-name churn; querying the index is durable.
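To make “no browser, no proxy” concrete, here is a minimal standard-library sketch of a single direct query. It targets Algolia’s documented search endpoint (POST /1/indexes/{index}/query on the {app_id}-dsn.algolia.net host, authenticated with the X-Algolia-Application-Id and X-Algolia-API-Key headers). The application ID, key and index name are placeholders for whatever you observe in WTTJ’s network traffic, and the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(app_id: str, api_key: str, index_name: str,
                         params: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Algolia's search endpoint.

    `params` is an already URL-encoded Algolia parameter string,
    e.g. "query=data%20engineer&page=0".
    """
    # Escape the index name so it is safe inside the URL path.
    path_index = urllib.parse.quote(index_name, safe="")
    url = f"https://{app_id}-dsn.algolia.net/1/indexes/{path_index}/query"
    body = json.dumps({"params": params}).encode("utf-8")
    headers = {
        "X-Algolia-Application-Id": app_id,
        "X-Algolia-API-Key": api_key,  # the public, search-only key
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_search(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode Algolia's JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_search(build_search_request(app_id, key, index, "query=data%20engineer&page=0"))` returns Algolia’s decoded JSON, with the matching listings under `hits`. Plain HTTPS with two headers is the whole transport; there is nothing to render.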

The one nuance: the full HTML job description isn’t always in the search index. When you need the long-form description, you fetch it via the platform’s REST API per listing — an optional, slower step layered on top of the fast index scrape. A managed actor handles both the index query and the optional description fetch.
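The optional description step is easiest to keep optional when it is a separate pass over hits you already have. A minimal sketch, assuming a hypothetical `fetch_description(job_id)` callable that wraps whatever per-listing REST call you use (that endpoint is not shown here) and a hypothetical `job_id` key on each hit:

```python
from collections.abc import Callable, Iterable, Iterator
from typing import Optional


def enrich_descriptions(
    hits: Iterable[dict],
    fetch_description: Callable[[str], Optional[str]],
    enabled: bool = False,
) -> Iterator[dict]:
    """Yield hits, adding `description_html` only when enabled.

    `fetch_description` is a hypothetical per-listing fetcher. It is never
    called when `enabled` is False, so the fast index-only path costs nothing extra.
    """
    cache: dict[str, Optional[str]] = {}
    for hit in hits:
        if not enabled:
            yield hit
            continue
        job_id = hit["job_id"]
        # The same listing can appear in more than one query slice; fetch it once.
        if job_id not in cache:
            cache[job_id] = fetch_description(job_id)
        yield {**hit, "description_html": cache[job_id]}
```

Keeping the fetch behind a flag means the slow, per-listing cost is a deliberate choice per run rather than a default.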

▶ Run the Welcome to the Jungle Jobs Scraper — queries the WTTJ Algolia index directly: title, salary, remote, contract, experience plus full company data (size, funding, website). No proxy or login. Optional full HTML descriptions.

How the query works

Conceptually, the scrape is an Algolia search with facet and numeric filters (the attribute names are illustrative):

query: "data engineer"
facetFilters:
  - "contract_type:full_time"
  - "remote:fully_remote"
  - "experience_level:senior"
  - "profession:Software Engineering"
numericFilters:
  - "salary_min >= 60000"
page: 0   # then 1, 2, ... via Algolia pagination

Each page returns a batch of hits with both job and embedded company attributes. You request page 0, 1, 2 and so on until you reach the nbPages count that Algolia includes in every response. Because filtering happens server-side, a tight query (“senior fully-remote data engineering roles paying 60k+”) returns a small, exact set rather than forcing you to download everything and filter locally.
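The query encoding and the page walk fit in a few lines. This sketch follows standard Algolia conventions: array-valued parameters such as facetFilters and numericFilters are JSON-encoded inside the params string, and every response carries page and nbPages. The `search_page` callable is a stand-in for a real HTTP call, and the function names are illustrative.

```python
import json
from collections.abc import Callable, Iterator, Sequence
from urllib.parse import urlencode


def encode_params(query: str, facet_filters: Sequence = (),
                  numeric_filters: Sequence[str] = (),
                  page: int = 0, hits_per_page: int = 100) -> str:
    """Encode search parameters as an Algolia `params` string."""
    params = {"query": query, "page": page, "hitsPerPage": hits_per_page}
    if facet_filters:
        # Arrays travel as JSON inside the URL-encoded params string.
        params["facetFilters"] = json.dumps(list(facet_filters))
    if numeric_filters:
        params["numericFilters"] = json.dumps(list(numeric_filters))
    return urlencode(params)


def iter_all_hits(search_page: Callable[[int], dict]) -> Iterator[dict]:
    """Walk pages 0..nbPages-1, yielding every hit.

    `search_page(page)` must return a decoded Algolia response dict.
    """
    page = 0
    while True:
        resp = search_page(page)
        yield from resp.get("hits", [])
        page += 1
        if page >= resp.get("nbPages", 0):
            break
```

For the conceptual query above, `encode_params("data engineer", facet_filters=["contract_type:full_time", "remote:fully_remote"], numeric_filters=["salary_min >= 60000"], page=p)` produces the params string for page p, and the loop stops on its own once nbPages is reached.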

Build it yourself vs. use a managed scraper

Roll your own — quick to fire one Algolia query once you’ve sniffed the application ID, public search key and index name from the site’s network traffic. The tail: mapping WTTJ’s filter UI to the right facet names, handling Algolia pagination, flattening the nested company object, the optional per-listing description fetch via REST, bilingual field handling, and re-finding the key/index if WTTJ rotates them.
Managed actor — running in minutes, filters mapped, pagination and company-flattening handled, optional descriptions available. Output to JSON, CSV or Excel.

For a one-off pull, a script works. For a recurring European-tech-jobs feed with company enrichment, the facet mapping and the company-object flattening are the fiddly parts worth offloading.
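Most of the facet-mapping work is translating what a user ticks in the filter UI into Algolia facetFilters. Algolia ANDs the top-level entries and ORs the values inside a nested list, which matches how multi-select filters usually behave. A minimal sketch; the UI labels and attribute names in `FACET_MAP` are hypothetical placeholders, not WTTJ’s confirmed schema:

```python
# Hypothetical UI labels -> (facet attribute, {label: facet value}).
FACET_MAP: dict[str, tuple[str, dict[str, str]]] = {
    "Contract": ("contract_type", {"Full-time": "full_time", "Internship": "internship"}),
    "Remote": ("remote", {"Fully remote": "fully_remote", "Hybrid": "hybrid"}),
}


def build_facet_filters(selected: dict[str, list[str]]) -> list[list[str]]:
    """Turn UI selections into Algolia facetFilters.

    Values inside one inner list are ORed; the inner lists are ANDed.
    An unmapped label raises KeyError, which surfaces renamed filters early.
    """
    filters: list[list[str]] = []
    for ui_filter, labels in selected.items():
        attribute, values = FACET_MAP[ui_filter]
        group = [f"{attribute}:{values[label]}" for label in labels]
        if group:  # skip filters with nothing selected
            filters.append(group)
    return filters
```

A single-element inner list behaves like a plain "attribute:value" string, so always emitting the nested form keeps the code uniform.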

Schema design for downstream use

A clean per-listing record:

{
  "job_id": "wttj-7f3a91",
  "title": "Senior Data Engineer",
  "url": "https://www.welcometothejungle.com/en/companies/acme/jobs/senior-data-engineer",
  "contract_type": "full_time",
  "remote_policy": "fully_remote",
  "experience_level": "senior",
  "salary_min": 65000,
  "salary_max": 80000,
  "salary_currency": "EUR",
  "profession": "Software Engineering",
  "category": "Data / Analytics",
  "company_name": "Acme",
  "company_size": "50-250",
  "company_funding": "Series B",
  "company_website": "https://acme.io",
  "offices": ["Paris, FR", "Remote"],
  "language": "en",
  "published_at": "2026-05-25T09:00:00Z",
  "scraped_at": "2026-05-28T10:00:00Z"
}
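A minimal flattening sketch that produces a record like the one above. The input shape (a nested `organization` object, `salary_minimum`/`salary_maximum`, a list of `offices` dicts, and so on) is a hypothetical stand-in for whatever WTTJ’s hits actually contain; inspect a real response and adjust the key paths. The point is to map each nested field explicitly and let absent values become None instead of crashing or silently vanishing.

```python
from datetime import datetime, timezone
from typing import Any, Optional

BASE_URL = "https://www.welcometothejungle.com"


def _get(obj: Any, *path: str) -> Any:
    """Follow a key path through nested dicts; None if any step is missing."""
    for key in path:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def flatten_hit(hit: dict, scraped_at: Optional[datetime] = None) -> dict:
    """Map one index hit (hypothetical key names) to the flat record."""
    now = (scraped_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    lang = hit.get("language") or "en"
    org_slug, slug = _get(hit, "organization", "slug"), hit.get("slug")
    offices = [
        ", ".join(part for part in (o.get("city"), o.get("country_code")) if part)
        for o in hit.get("offices") or []
    ]
    return {
        "job_id": hit.get("reference"),
        "title": hit.get("name"),
        # URL pattern mirrors the example record; verify against live pages.
        "url": (f"{BASE_URL}/{lang}/companies/{org_slug}/jobs/{slug}"
                if org_slug and slug else None),
        "contract_type": hit.get("contract_type"),
        "remote_policy": hit.get("remote"),
        "experience_level": hit.get("experience_level"),
        # Salary stays None when the employer did not disclose it.
        "salary_min": hit.get("salary_minimum"),
        "salary_max": hit.get("salary_maximum"),
        "salary_currency": hit.get("salary_currency"),
        "profession": hit.get("profession"),
        "category": hit.get("category"),
        # Company fields are mapped one by one so none are lost in the flatten.
        "company_name": _get(hit, "organization", "name"),
        "company_size": _get(hit, "organization", "size"),
        "company_funding": _get(hit, "organization", "funding_stage"),
        "company_website": _get(hit, "organization", "website"),
        "offices": offices,
        "language": hit.get("language"),
        "published_at": hit.get("published_at"),
        "scraped_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```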

Schema choices worth making early:

Key on job_id. Bilingual listings can appear under English and French URLs; the ID ties them together.
Store salary_min, salary_max and salary_currency separately. WTTJ’s salary transparency is a research asset — keep it queryable, not as a display string.
Keep the company fields denormalized on the row for analytics, but consider a separate company dimension table if you’re tracking many roles per employer.
Record language. A French-only listing in an English-language board is a quality bug; filter on it.
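Keying on job_id and filtering on language combine into one small pass. A minimal sketch over flattened records shaped like the one above; the preference order (keep the variant in your board’s language, else the first one seen) is one reasonable policy, not the only one, and the function name is illustrative.

```python
from collections.abc import Iterable


def dedupe_by_job_id(
    records: Iterable[dict],
    preferred_language: str = "en",
    require_language: bool = False,
) -> list[dict]:
    """Collapse language variants that share a job_id into one row.

    Keeps the variant in `preferred_language` when one exists, else the
    first variant seen. With `require_language=True`, rows in any other
    language are dropped entirely.
    """
    best: dict[str, dict] = {}
    for rec in records:
        lang = rec.get("language")
        if require_language and lang != preferred_language:
            continue
        current = best.get(rec["job_id"])
        if current is None or (
            current.get("language") != preferred_language and lang == preferred_language
        ):
            best[rec["job_id"]] = rec  # dict keeps first-insertion order
    return list(best.values())
```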

Typical use cases

Curated European remote tech job board — filter to fully-remote roles and republish a focused vertical.
Salary transparency research — WTTJ’s disclosed ranges make it one of the better sources for studying pay by role and stage.
Hiring-demand tracking by stack or profession — count open roles by technology or category over time.
Company-recruiting monitoring — see which companies (by size and funding stage) are actively hiring.
ATS / CRM / job-alert feeds — pipe structured WTTJ listings into downstream hiring tools or alert products.

The differentiator is the company context. Plenty of boards list jobs; few hand you the employer’s size and funding stage alongside, which is what lets you segment demand by company maturity.
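Segmenting demand by company maturity is a group-and-count over the flattened rows. A minimal standard-library sketch, assuming records shaped like the schema above; rows with no disclosed funding land in an "unknown" bucket rather than being dropped.

```python
from collections import Counter
from collections.abc import Iterable


def roles_by_stage_and_profession(records: Iterable[dict]) -> Counter:
    """Count open roles per (company_funding, profession) pair."""
    return Counter(
        (rec.get("company_funding") or "unknown", rec.get("profession") or "unknown")
        for rec in records
    )
```

Running it on each scheduled snapshot and storing the counts with the scrape date gives the over-time view.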

Cost math for the managed approach

Because it’s a direct API index query — no browser, no proxy — extraction is fast and cheap; cost is essentially compute. A daily refresh of a filtered slice (say, all remote engineering roles) is a few dollars a month. The optional full-description fetch adds per-listing REST calls, so enable it only when you actually need the long-form text. The expense you avoid is the reverse-engineering: finding the index/key and mapping every filter facet correctly, then re-doing it if WTTJ rotates the public key.
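The cost drivers are easier to reason about in requests than in dollars, since pricing depends on your platform. A rough sketch: the index scrape needs one request per page, while the optional description step adds one request per listing. For example, 800 matching listings at 100 hits per page is 8 index requests, but 808 with descriptions enabled.

```python
import math


def estimate_requests(nb_hits: int, hits_per_page: int = 100,
                      with_descriptions: bool = False) -> int:
    """Requests needed for one full pull of a filtered slice."""
    pages = max(1, math.ceil(nb_hits / hits_per_page))  # an empty result still costs one call
    return pages + (nb_hits if with_descriptions else 0)
```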

Common pitfalls

Hardcoding the public Algolia key — sites rotate these. A robust pull resolves the current key/index rather than baking in last month’s.
Fetching descriptions you don’t need — the per-listing REST call is the slow, costly part. Skip it unless your use case needs full text.
Ignoring language — pulling the FR index into an EN board ships mixed-language rows. Filter on language or query the right locale.
Treating salary as always-present — even on WTTJ not every role discloses pay. Handle nulls; don’t assume a range exists.
Flattening the company object lossily — funding and size live nested; a careless flatten drops them. Map them explicitly.
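For the key-rotation pitfall, one approach is to treat an authentication failure as the signal to re-resolve credentials once and retry. A minimal sketch; `resolve_credentials` is a hypothetical callable (for instance, one that re-reads the app ID, key and index name from the site’s current front-end config), and `search` is any function that raises urllib’s HTTPError on a non-2xx response, as urllib.request.urlopen does. Both 401 and 403 are treated as auth failures here.

```python
import urllib.error
from collections.abc import Callable
from typing import Optional, TypeVar

Creds = tuple[str, str, str]  # (app_id, api_key, index_name)
T = TypeVar("T")


def search_with_refresh(
    search: Callable[[Creds], T],
    resolve_credentials: Callable[[], Creds],
    cached: Optional[Creds] = None,
) -> tuple[T, Creds]:
    """Run `search` with cached credentials; re-resolve once on 401/403.

    Returns the result plus the credentials that worked, so the caller
    can persist them for the next scheduled run.
    """
    creds = cached or resolve_credentials()
    try:
        return search(creds), creds
    except urllib.error.HTTPError as err:
        if err.code not in (401, 403):
            raise  # not an auth problem: let it surface
    fresh = resolve_credentials()  # the key or index name may have rotated
    return search(fresh), fresh
```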

Wrapping up

WTTJ is a best-case scrape: a public Algolia index means fast, browser-free, proxy-free extraction with rich company context baked in. The work is mapping filters to facets, flattening the company object, and deciding when the optional description fetch is worth its cost. For a quick pull, a script suffices. For a recurring European-tech-jobs feed with employer enrichment, let a managed actor own the facet mapping and flattening.

▶ Open the WTTJ jobs scraper on Apify — filter by keyword, contract, remote, experience and salary; full company data included. Export JSON, CSV or Excel. Schedule it for a fresh tech-jobs feed.