Scraping Stepstone.de — DACH Jobs Data Extraction Guide
How to pull job postings from Stepstone.de — Germany's largest job board — at scale, including the anti-bot challenges and what recruiters and ATS vendors do with the data.
Stepstone.de is the dominant job board in Germany, with active postings across the DACH region — Germany, Austria, Switzerland. For recruiters, ATS vendors, and labor-market researchers, it’s the single highest-signal source of German-speaking job demand. It’s also one of the more heavily defended job boards on the European web. This guide covers what’s worth pulling, how the site fights bots, and what to do once the data lands in your warehouse.
What’s on each posting
Every Stepstone listing surfaces a consistent set of fields once you crack the bot-detection layer:
- Identity — job ID, title, company name, company logo URL.
- Location — city, region (Bundesland), country, sometimes a more specific address.
- Posted date — when the listing went live.
- Job type — full-time, part-time, contract, internship.
- Salary band — only if disclosed by the employer (about 25% of listings in 2026).
- Skills / requirements — extracted from the listing body, often with a structured skills array.
- Sponsored flag — Stepstone marks paid-promoted listings differently in the DOM.
- Industry / department — categorical tags.
- Application URL — direct link to apply, often a Stepstone-hosted form or a redirect to the employer’s ATS.
- Listing body text — the full job description in HTML or markdown form.
Recruitment intelligence at scale needs the company + posted date + skills + salary at minimum; everything else is a bonus.
The anti-bot reality
Stepstone runs Akamai Bot Manager, which is among the more aggressive WAFs on the European web. The naive requests script doesn’t survive page one. What Akamai checks:
- TLS fingerprint (JA3/JA4) — does your client look like Chrome, Firefox, Safari?
- HTTP/2 frame ordering — Akamai inspects the order in which your client sends SETTINGS, WINDOW_UPDATE, and HEADERS frames. Generic HTTP libraries get caught immediately.
- JavaScript challenge — the first visit loads a JavaScript bot-detection routine that collects browser signals and submits them in a sensor_data POST. If your client doesn’t execute it, you get rate-limited or 403’d.
- IP reputation — datacenter ASNs are flagged. Residential and mobile ASNs are treated as innocent.
What actually works:
- TLS impersonation (curl-impersonate, an HTTP client that forges a Chrome JA3, or a real browser driven through Patchright) — your client’s TLS handshake looks like real Chrome.
- Residential proxy pool — IPs that look like normal home internet connections.
- Browser-like header order — Akamai checks header ordering, not just content. The order in which User-Agent, Accept, Accept-Language, etc. appear must match a real browser.
- Modest concurrency — 2–4 concurrent requests per worker, not 50. Akamai’s behavioral layer flags concurrency that’s too clean.
- JS execution for the first page — once you have a valid _abck cookie, subsequent requests within that session can be plain HTTP.
This is the kind of cat-and-mouse where every quarter Akamai ships a tweak and every scraper needs to adapt. Production-grade Stepstone scraping is not a “30-line script” job.
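The "modest concurrency" point above can be sketched with the standard library alone. This is a minimal illustration, not a working Stepstone client: the `fetch` callable is a hypothetical placeholder for whatever TLS-impersonating client you use, and the default of 3 in-flight requests and the jitter range are assumptions picked from the 2–4 range mentioned above.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable


async def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],  # hypothetical: your own HTTP client wrapper
    max_concurrency: int = 3,                # assumption: inside the 2-4 range above
    max_jitter_s: float = 1.5,               # assumption: arbitrary upper bound for the pause
) -> dict[str, str]:
    """Fetch every URL with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results: dict[str, str] = {}

    async def worker(url: str) -> None:
        async with semaphore:
            # Random pause so request timing is not perfectly regular.
            await asyncio.sleep(random.uniform(0, max_jitter_s))
            results[url] = await fetch(url)

    await asyncio.gather(*(worker(url) for url in urls))
    return results
```

Passing `fetch` in as a parameter keeps the throttling logic independent of the client library, so the same wrapper can be reused when the fingerprinting layer has to change.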
URL structure
Search results land at:
https://www.stepstone.de/jobs/<keyword>/in-<city>
For example:
https://www.stepstone.de/jobs/software-engineer/in-berlin
https://www.stepstone.de/jobs/marketing-manager/in-muenchen
Pagination is via ?page=N. Each result page has 25 listings by default.
The detail page for any listing is at:
https://www.stepstone.de/stellenangebote--<slug>--<job_id>-inline.html
The detail page is where the full description, requirements, and salary band live (when disclosed). You scrape the listing pages for the index and crawl into the detail pages for full records.
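The URL patterns above can be wrapped in two small helpers. This is a sketch under assumptions: the slug rule (lowercase, umlauts transliterated, other characters collapsed to hyphens) is inferred from the `in-muenchen` example, omitting `?page=` for the first page is a guess, and the function names are illustrative.

```python
import re
import unicodedata
from typing import Optional

BASE_URL = "https://www.stepstone.de"
# Assumption: umlauts are transliterated, as in the "in-muenchen" example.
_UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
_DETAIL_RE = re.compile(r"/stellenangebote--.+--(\d+)-inline\.html")


def slugify(text: str) -> str:
    """Lowercase, transliterate German characters, and hyphenate the rest."""
    text = unicodedata.normalize("NFC", text).strip().lower().translate(_UMLAUTS)
    # Drop any remaining accents, then collapse non-alphanumerics to hyphens.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


def build_search_url(keyword: str, city: str, page: int = 1) -> str:
    """Build a search-results URL; the page parameter is added from page 2 on."""
    url = f"{BASE_URL}/jobs/{slugify(keyword)}/in-{slugify(city)}"
    return url if page <= 1 else f"{url}?page={page}"


def extract_job_id(detail_url: str) -> Optional[str]:
    """Return the numeric job ID from a detail-page URL, or None if it does not match."""
    match = _DETAIL_RE.search(detail_url)
    return match.group(1) if match else None
```

For example, `build_search_url("Marketing Manager", "München")` yields the second example URL shown above.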
▶ Run the Stepstone.de Jobs Scraper — handles TLS fingerprinting, residential proxy rotation, and pagination. Returns clean job rows. Pay per job posting returned.
Clean output schema
A single row per posting:
{
"job_id": "11023845",
"title": "Senior Software Engineer (m/w/d)",
"company_name": "Beispiel GmbH",
"company_logo_url": "https://...",
"location_city": "Berlin",
"location_region": "Berlin",
"country": "DE",
"job_type": "full-time",
"posted_date": "2026-05-15",
"salary_min_eur": 65000,
"salary_max_eur": 85000,
"salary_period": "year",
"is_sponsored": false,
"skills": ["Python", "AWS", "Microservices", "Kubernetes"],
"industry": "Software Development",
"application_url": "https://www.stepstone.de/stellenangebote--...",
"description_text": "...",
"scraped_at": "2026-05-19T12:00:00Z"
}
Schema choices worth making upfront:
- Always store both salary_min_eur and salary_max_eur as separate fields alongside salary_period. Stepstone shows salary as ranges, and the period (year, month, hour) varies.
- Skills extraction is fuzzy — companies write “AWS” or “Amazon Web Services” or “Cloud (AWS)”. Normalize at ingestion if you’ll query on skills heavily.
- Keep description_text even if you don’t use it now: it’s the most expensive field to re-fetch later, and downstream ML use cases (skills extraction, classification) want it.
- scraped_at matters: postings expire and get re-posted, so the same job_id may appear and disappear over time.
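Skill normalization at ingestion could look roughly like the following. The alias table is a hypothetical starting point built from the "AWS" example above, not a list taken from Stepstone; a real pipeline would need a much larger mapping.

```python
import re

# Hypothetical alias table: lowercase variant -> canonical skill name.
SKILL_ALIASES = {
    "aws": "AWS",
    "amazon web services": "AWS",
    "cloud (aws)": "AWS",
    "k8s": "Kubernetes",
    "kubernetes": "Kubernetes",
}


def normalize_skills(raw_skills: list[str]) -> list[str]:
    """Map known variants to one canonical name and drop duplicates, keeping order."""
    seen: set[str] = set()
    normalized: list[str] = []
    for raw in raw_skills:
        # Collapse whitespace and lowercase before looking up the alias.
        key = re.sub(r"\s+", " ", raw).strip().lower()
        if not key:
            continue
        canonical = SKILL_ALIASES.get(key, raw.strip())
        if canonical.lower() not in seen:
            seen.add(canonical.lower())
            normalized.append(canonical)
    return normalized
```

Unknown skills pass through unchanged, so the mapping can grow over time without dropping data.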
Use cases driving this data
What buyers actually do with Stepstone data:
- Recruiter / sourcing platforms — “find me companies hiring senior Python engineers in Berlin right now.” Stepstone is the canonical answer for DACH.
- ATS vendors — competitive product intelligence on what skills are trending, what salary bands look like, what job titles are emerging.
- Labor-market researchers — academic and policy work needs job-volume data over time, by region, by industry.
- Company-level intelligence — track which companies are scaling (lots of new postings) vs. quiet (no new postings for 90 days).
- Salary benchmarking products — Stepstone is one of the few job boards where salary is disclosed often enough to build a benchmark.
The common thread: DACH-specific demand. If your customer base is German-speaking and you don’t have Stepstone data, you have a gap that LinkedIn jobs alone won’t close.
Pulling at scale
For an ongoing pipeline, the math is:
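- ~25 listings per index page; pagination caps at 50 pages per query (1,250 reachable results max per keyword/city combination).
- A single (keyword, city) search matches 100–3,000 listings depending on density, so the densest searches exceed the 1,250-result cap and need to be split into narrower queries to be fully covered.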
- For DACH-wide coverage, you’d run ~30 city queries × ~15 keyword queries = ~450 search combinations.
- Each combination needs to be re-run daily to catch new postings.
At ~3 requests/second (the residential-proxy sustainable rate), you’re looking at:
- ~15 minutes per full DACH refresh on the index pages.
- ~30–60 minutes for the detail-page enrichment.
A managed actor handles all of this without you renting proxies or maintaining the TLS-fingerprint stack.
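The refresh-time estimate above can be sanity-checked with a few lines of arithmetic. All inputs are the figures quoted in this section; the helper names are illustrative. Under those figures, 15 minutes at 3 requests/second is a budget of 2,700 index requests, which works out to about 6 pages per search combination on average, while hitting the 50-page cap on every combination would take about 125 minutes.

```python
CITY_QUERIES = 30
KEYWORD_QUERIES = 15
MAX_PAGES_PER_QUERY = 50
REQUESTS_PER_SECOND = 3.0


def minutes_for(requests: int, rps: float = REQUESTS_PER_SECOND) -> float:
    """Wall-clock minutes needed to send `requests` at a steady rate."""
    return requests / rps / 60


def requests_within(minutes: float, rps: float = REQUESTS_PER_SECOND) -> int:
    """Number of requests that fit into a time budget at a steady rate."""
    return int(minutes * 60 * rps)


combinations = CITY_QUERIES * KEYWORD_QUERIES                  # 450 searches
worst_case_minutes = minutes_for(combinations * MAX_PAGES_PER_QUERY)
implied_avg_pages = requests_within(15) / combinations         # pages per search

print(f"{combinations} combinations")
print(f"worst case: {worst_case_minutes:.0f} min for all index pages")
print(f"15-minute budget implies {implied_avg_pages:.0f} pages per combination")
print(f"30-60 min of detail crawling covers "
      f"{requests_within(30)}-{requests_within(60)} detail pages")
```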
Build it yourself vs. managed scraper
The brutal truth about scraping Stepstone yourself:
- Day 1–3: Reading documentation on Akamai Bot Manager, trying out curl-impersonate, fighting JA3 issues.
- Day 4–7: Renting a residential proxy pool, debugging cookie handling, hitting random 403s.
- Day 8–14: Building incremental dataset logic, handling reposted jobs, normalizing skills.
- Ongoing: Every 2–8 weeks, Akamai ships a tweak. Your scraper breaks for a day. You debug.
A managed actor:
- Hour 1: First rows in your warehouse.
- Hour 2: Schema integrated with your downstream system.
- Ongoing: Pay per row, maintenance is someone else’s problem.
For one-time exploration, build it yourself. For any production pipeline, the managed-scraper math is a clear win.
Pitfalls
A few traps when working with Stepstone data:
- “In Germany only” is a lie — Stepstone.de includes a fair number of Austrian and Swiss postings, especially for companies headquartered in Munich or Vienna that want regional candidates. Filter on country explicitly.
- Salary disclosure is inconsistent — about 75% of postings have no salary band. Don’t read a null as “this job has no salary”; it usually just means the employer didn’t disclose one.
- Reposted jobs — Stepstone sometimes assigns a new job_id to a re-listed posting. Dedupe on (company_name, title, location) if you need a logical unique key.
- Sponsored listings — sponsored jobs appear at the top of search results regardless of relevance. They can skew aggregate counts. Filter them if you want organic demand signal.
- German-language job titles — most postings are in German. If your downstream system is English-only, translate or normalize at ingestion.
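The repost and sponsored-listing pitfalls can be handled in one pass over the rows. The sketch below assumes rows shaped like the output schema above, uses location_city as the location part of the key (an assumption), and relies on scraped_at being a UTC ISO-8601 string in a single consistent format so that string comparison matches time order.

```python
from typing import Iterable


def logical_key(row: dict) -> tuple[str, str, str]:
    """Case- and whitespace-insensitive key for spotting reposted jobs."""
    def norm(value: object) -> str:
        return " ".join(str(value or "").lower().split())

    # Assumption: location_city stands in for "location" in the dedupe key.
    return (norm(row.get("company_name")), norm(row.get("title")), norm(row.get("location_city")))


def dedupe_postings(rows: Iterable[dict], organic_only: bool = False) -> list[dict]:
    """Keep the most recently scraped row per logical key, optionally dropping sponsored rows."""
    latest: dict[tuple[str, str, str], dict] = {}
    for row in rows:
        if organic_only and row.get("is_sponsored"):
            continue
        key = logical_key(row)
        current = latest.get(key)
        # ISO-8601 UTC timestamps in one format sort correctly as plain strings.
        if current is None or row["scraped_at"] > current["scraped_at"]:
            latest[key] = row
    return list(latest.values())
```

Keeping the newest row per key means a re-listed posting replaces its older copy even when Stepstone has issued it a new job_id.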
Wrapping up
Stepstone.de is the most important job board in German-speaking Europe, and one of the more aggressively defended. A managed scraper handles the Akamai layer and the proxy rotation so you can focus on what you actually want to do with the data — recruit, research, or sell to recruiters.
▶ Open the Stepstone.de Jobs Scraper on Apify — DACH job market data with skills, salary, and company signals. Pay per job posting.