logiover
real-estate · May 30, 2026 · 6 min read

How to Scrape Willhaben.at Listings in 2026

Extract real estate, used cars, jobs and marketplace listings from willhaben.at — Austria's #1 classifieds site — with prices, images, location and category-specific attributes.

willhaben.at is Austria’s classifieds monolith: 4.2M+ monthly users and the default place where Austrians list apartments, cars, jobs and second-hand goods. That makes it the single richest dataset for Austrian market intelligence: rental prices in Vienna, used-car values by make and year, hiring demand by city. But willhaben is a modern, defended site, and pulling clean category-specific data at scale takes more than a curl loop. This guide covers what willhaben exposes per category and how to extract it reliably.

What’s worth extracting

willhaben spans several verticals, and each carries category-specific attributes on top of a common core:

  • Common core — listing ID and URL, category, headline, price in EUR, location (city/district/postcode), seller type (private vs. commercial), publication and scrape timestamps, cover photo plus all image URLs.
  • Real estate (rent & sale, apartments, houses, land) — floor area (m²), number of rooms, property type, plot size, features, listing agency.
  • Used cars — brand, model, year, mileage, fuel type, gearbox, power (kW/PS), doors, color.
  • Jobs — employer, employment type, salary, application deadline.
  • Marketplace — item condition, delivery/shipping availability.

For real-estate analytics you center on price, m², rooms and location to compute price-per-m². For used-car intelligence it’s make/model/year/mileage. The category-specific attributes are where the analytical value lives.

The anti-bot reality

willhaben does not offer an open API. It’s a server-rendered site with bot defenses that punish naive clients:

  • Datacenter IP blocking — requests from cloud ASNs get challenged or 403’d quickly. Residential proxies are the reliable path for sustained crawling.
  • Rate sensitivity — burst traffic from a single IP trips throttling. Polite pacing and retries-with-backoff are mandatory for long runs.
  • Pagination over many result pages — a category in a city spans dozens of result pages; you must follow pagination to completion, not just grab page one.
  • German-language fields — listings are returned in German as displayed. Your parsing and any downstream taxonomy must handle German labels (Miete/Kauf, Wohnung/Haus, Benzin/Diesel).
  • Category-specific DOM — a car listing and a flat listing expose different attribute blocks. One generic parser won’t cut it; you need per-category extraction.

The workable stack is an HTML crawler (Cheerio-class) that follows pagination, residential-proxy support to survive the IP defenses, retries, deduplication, and per-category attribute parsing. A managed actor solves the proxy, pacing and per-category parsing once so you’re not rebuilding it per vertical.
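The retry-with-backoff part of that stack can be sketched in a few lines. This is a minimal illustration, not willhaben-specific code: `fetch` is a hypothetical callable (for example a proxy-backed HTTP client) that returns the page HTML, or None when the request was blocked or throttled, and the delay values are assumptions to tune against real runs.

```python
import random
import time
from typing import Callable, Optional


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def fetch_with_retries(
    fetch: Callable[[str], Optional[str]],
    url: str,
    retries: int = 4,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fetch(url)` until it returns HTML, waiting longer after each failure.

    `fetch` is a hypothetical callable that returns the page body, or None
    when the request was blocked or throttled.
    """
    html = fetch(url)
    for delay in backoff_delays(retries, base):
        if html is not None:
            break
        # Jitter keeps parallel workers from retrying in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
        html = fetch(url)
    return html
```

Passing `sleep` as a parameter keeps the function testable without real waiting. In a real crawler the same wrapper would sit around every result-page and detail-page request.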

Run the Willhaben.at Scraper — crawls marketplace, real estate, used cars and jobs; follows pagination, captures all images, parses category-specific attributes, with residential-proxy support against bot detection. JSON output with prices, location and full details.

How the URL structure works

willhaben organizes by category path plus filter query parameters:

https://www.willhaben.at/iad/immobilien/mietwohnungen/wien
  ?rows=90               # results per page
  &page=1
  &PRICE_FROM=500
  &PRICE_TO=1500
  &ESTATE_SIZE/LIVING_AREA_FROM=40

Cars and jobs live under their own paths (/iad/gebrauchtwagen/..., /iad/jobs/...) with their own filter params. You seed the crawler with a category-plus-city URL (optionally with price/size filters), and it walks pagination from there. A keyword filter and a crawl-size limit then let you choose between targeted pulls and full-category exports.
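As a rough sketch of that seeding-and-walking step, the helpers below build a result-page URL from the parameters shown above and then iterate pages. The names `search_url`, `walk_pages` and `fetch_ids` are hypothetical, and stopping when a page returns no listings is an assumption about how the end of a category shows up, not documented willhaben behavior.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE = "https://www.willhaben.at"


def search_url(path: str, page: int = 1, rows: int = 90, **filters) -> str:
    """Build a result-page URL from a category path and filter parameters."""
    params = {"rows": rows, "page": page, **filters}
    # safe="/" keeps the slash in names such as ESTATE_SIZE/LIVING_AREA_FROM.
    return f"{BASE}{path}?{urlencode(params, safe='/')}"


def walk_pages(
    path: str,
    fetch_ids: Callable[[str], list[str]],
    max_pages: int = 50,
    **filters,
) -> Iterator[str]:
    """Yield unique listing IDs page by page until a page comes back empty.

    `fetch_ids` is a hypothetical callable that downloads one result page and
    returns the listing IDs found on it. `max_pages` acts as a crawl-size limit.
    """
    seen: set[str] = set()
    for page in range(1, max_pages + 1):
        ids = fetch_ids(search_url(path, page=page, **filters))
        if not ids:
            break
        for listing_id in ids:
            if listing_id not in seen:
                seen.add(listing_id)
                yield listing_id
```

Because one filter name contains a slash, it has to be passed through dict unpacking, for example `search_url("/iad/immobilien/mietwohnungen/wien", PRICE_FROM=500, **{"ESTATE_SIZE/LIVING_AREA_FROM": 40})`. The `seen` set deduplicates listings that appear on more than one page while the crawl is running.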

Build it yourself vs. use a managed scraper

  • Roll your own — a Cheerio fetch of one Vienna rentals page is quick. The tail: residential-proxy integration and rotation, pacing and backoff to survive throttling, following pagination to the end of a category, per-category attribute parsers (four-plus verticals, each different), German-label handling, image-array capture, and dedup. Then re-fixing parsers whenever willhaben redesigns a card.
  • Managed actor — running in minutes, proxy and pacing handled, all categories parsed, images and dedup included.

For one Vienna neighborhood once, a script is defensible. For ongoing price-per-m² tracking across Vienna, Graz, Linz and Salzburg — or a used-car valuation feed — the proxy management and the multi-category parsing are exactly the maintenance you don’t want.

Schema design for downstream use

A real-estate listing row:

{
  "listing_id": "wh-1234567890",
  "url": "https://www.willhaben.at/iad/immobilien/d/mietwohnung/wien/wien-1070-...",
  "category": "mietwohnung",
  "headline": "Helle 3-Zimmer-Wohnung Neubau",
  "price_eur": 1290,
  "living_area_m2": 78,
  "rooms": 3,
  "property_type": "Wohnung",
  "city": "Wien",
  "district": "1070 Neubau",
  "postcode": "1070",
  "seller_type": "commercial",
  "agency": "XYZ Immobilien GmbH",
  "cover_photo": "https://cache.willhaben.at/...jpg",
  "images": ["https://cache.willhaben.at/...1.jpg", "..."],
  "price_per_m2": 16.5,
  "published_at": "2026-05-28",
  "scraped_at": "2026-05-30T09:00:00Z"
}

Schema choices worth making early:

  • Compute and store price_per_m2 for real estate — it’s the field every valuation model needs and it’s trivial to derive at scrape time.
  • Key on listing_id. Headlines repeat and prices change; the ID is the stable identity for diffing listings over time.
  • Keep images as an array. Property and car listings carry 10–30 photos; you’ll want them all for ML datasets or listing reconstruction.
  • Preserve seller_type. Private-vs-commercial is a core analytical split — dealer pricing differs systematically from private pricing.
  • Store German labels as-is plus a normalized field if you translate. Don’t lose the source value.
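Two of these choices, deriving price_per_m2 at scrape time and keeping the German source label next to a normalized one, can be sketched as follows. The field names follow the sample row above; the `LABELS` mapping and the `property_type_normalized` field are illustrative assumptions, not part of any published schema.

```python
from typing import Optional

# Hypothetical German-to-normalized mapping; extend it per vertical.
LABELS = {
    "Miete": "rent",
    "Kauf": "sale",
    "Wohnung": "apartment",
    "Haus": "house",
    "Benzin": "petrol",
    "Diesel": "diesel",
    "Elektro": "electric",
}


def price_per_m2(price_eur: Optional[float], area_m2: Optional[float]) -> Optional[float]:
    """Return EUR per m² rounded to two decimals, or None if it cannot be derived."""
    if not price_eur or not area_m2:
        return None
    return round(price_eur / area_m2, 2)


def enrich(row: dict) -> dict:
    """Add derived fields without overwriting the scraped source values."""
    out = dict(row)
    out["price_per_m2"] = price_per_m2(row.get("price_eur"), row.get("living_area_m2"))
    # Unknown labels map to None so they surface for review instead of being guessed.
    out["property_type_normalized"] = LABELS.get(row.get("property_type"))
    return out
```

For the sample row (1290 EUR, 78 m²) this yields 16.54 at two decimals. Returning None for a missing price or area avoids division errors on listings that omit either value, such as "price on request" ads.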

Typical use cases

  • Austrian real-estate analysis — track rent and sale prices across Vienna, Graz, Linz and Salzburg; build price-per-m² trends for valuation and investment models.
  • Used-car market intelligence — value cars by make/model/year/mileage; compare dealer vs. private pricing.
  • Labour-market research — analyze Austrian hiring demand by role and city from the jobs vertical.
  • Marketplace price monitoring — compare second-hand goods prices and spot arbitrage.
  • Lead generation — extract agency and dealer listings for B2B outreach.
  • AI training datasets — collect German-language product and property descriptions for NLP.

The value is breadth across verticals plus freshness: a re-run-and-diff cadence turns willhaben into a live Austrian market feed across property, autos and labor.

Cost math for the managed approach

willhaben needs residential proxy bandwidth to survive its bot defenses, so cost is higher than an open-API scrape but still modest on a managed model where the proxy and compute are bundled per result. A daily price-tracking crawl of a few categories across the major cities lands in low double digits per month. The bigger saving is the proxy bill and maintenance you avoid: a respectable residential pool runs hundreds per month on its own, before you’ve written a line of parser code or fixed the first DOM change.

Common pitfalls

  • Skipping residential proxies — datacenter IPs get blocked fast; a long run from a single cloud IP dies early and incompletely.
  • Stopping at page one — a Vienna rentals category has dozens of pages. Follow pagination to the end or your “market snapshot” is a sliver.
  • One parser for all categories — cars, flats and jobs expose different attribute blocks. Parse per category or you’ll get nulls everywhere.
  • Mishandling German labels — assuming English values breaks filtering. Map Miete/Kauf, Benzin/Diesel/Elektro, Wohnung/Haus explicitly.
  • Not diffing over time — a one-shot crawl is a snapshot. For price trends you need to re-run and diff on listing_id.
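The re-run-and-diff step from the last pitfall is small once rows are keyed on listing_id. A minimal sketch, assuming each crawl is a list of dicts with the `listing_id` and `price_eur` fields from the schema above (the function name is hypothetical):

```python
def diff_snapshots(previous: list[dict], current: list[dict]) -> dict:
    """Compare two crawls keyed on listing_id."""
    old = {row["listing_id"]: row for row in previous}
    new = {row["listing_id"]: row for row in current}
    return {
        # Listings that appeared since the previous crawl.
        "added": sorted(new.keys() - old.keys()),
        # Listings that are no longer present (sold, rented or withdrawn).
        "removed": sorted(old.keys() - new.keys()),
        # Listings present in both crawls whose price differs.
        "price_changed": sorted(
            listing_id
            for listing_id in new.keys() & old.keys()
            if new[listing_id].get("price_eur") != old[listing_id].get("price_eur")
        ),
    }
```

Storing each crawl with its scraped_at timestamp lets the same comparison run between any two dates, which is what turns snapshots into a price trend.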

Wrapping up

willhaben is the richest window into the Austrian classifieds market, but it’s a defended, multi-vertical, German-language site — so the work is residential-proxy survival, full pagination, and per-category parsing. For a single neighborhood snapshot, a script with a proxy gets you there. For ongoing price-per-m² or used-car intelligence across Austria, let a managed actor carry the proxy management and the multi-category extraction.

Open the Willhaben scraper on Apify — real estate, cars, jobs and marketplace with prices, images and location; residential-proxy backed. Schedule it and diff listings over time.
