jobs · May 30, 2026 · 5 min read

How to Scrape Lagou.com China Tech Jobs in 2026

A guide to extracting tech jobs from Lagou.com (拉勾网) — salaries, tech stacks, company funding and size from ByteDance, Alibaba, Tencent and 100,000+ Chinese tech firms.

If you want to understand China’s tech labor market (salaries by role and city, which frameworks companies are hiring for, how fast a sector is staffing up), Lagou.com (拉勾网) is the source. It’s China’s largest IT-focused recruitment platform, carrying tech roles from ByteDance, Alibaba, Tencent, Baidu and well over 100,000 other companies. The data is rich (it includes company funding stage and headcount, not just the job), but getting at it means working through Chinese-language search facets, parsing salary strings in a local format, and handling the usual parallelism and proxy rotation that scraping at volume requires. This guide covers how to extract it cleanly and at scale.

What’s worth extracting

Lagou serves server-rendered listing data, so this is direct HTTP fetching — no headless browser. Per role, after parsing and normalization, you get:

Position — job title and position ID.
Company — company name and company ID.
Location — city and district.
Salary — the raw Chinese salary string (e.g. “25k-40k·15薪”) and parsed numeric min/max so you can actually do math on it.
Requirements — years of experience and education level required.
Role taxonomy — job type / category.
Tech stack — the required skills and technologies tagged on the listing — the single most valuable field for demand analysis.
Description — the full job description text when available.
Company attributes — funding stage, employee-size band, industry classification, and logo URL.
Timing — publish timestamp, plus search metadata recording which query surfaced the row.

The company attributes are what make Lagou special: you’re not just getting jobs, you’re getting a hiring-activity signal tied to each company’s funding stage and size — gold for company research and investment due diligence.

The coverage strategy: keyword × city × experience

Lagou’s search is faceted and paginated, and any single search caps out well before it shows you everything. The scraper’s strategy is to generate combinations of keyword × city × experience-filter and fetch each, then deduplicate across combinations.

Why the experience filter matters: searching “Java engineer in Beijing” might return, say, the first few hundred results before pagination dries up. But splitting that same search by experience band (0–1 years, 1–3, 3–5, 5–10, …) surfaces a different slice of the listings in each band, so the union covers far more unique roles than the unsplit search ever would. This experience-filter expansion is the key trick for maximizing unique results — the actor does it automatically and dedups on position ID across the expanded combinations.

keywords  e.g. ["Go", "React", "推荐算法"]
cities    e.g. ["北京", "上海", "深圳", "杭州"]   (25+ supported)
exp split e.g. ["不限","1年以下","1-3年","3-5年","5-10年"]
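
As a rough illustration of the fan-out, the grid above can be expanded with a Cartesian product. This is a minimal sketch, not the scraper's actual code: the function name build_search_grid and the dict keys are assumptions chosen to match the search_meta shape used later in this guide.

```python
from itertools import product

keywords = ["Go", "React", "推荐算法"]
cities = ["北京", "上海", "深圳", "杭州"]
exp_bands = ["不限", "1年以下", "1-3年", "3-5年", "5-10年"]

def build_search_grid(keywords, cities, exp_bands):
    """Return one search spec per keyword x city x experience combination."""
    return [
        {"keyword": kw, "city": city, "exp": exp}
        for kw, city, exp in product(keywords, cities, exp_bands)
    ]

grid = build_search_grid(keywords, cities, exp_bands)
print(len(grid))  # 3 keywords * 4 cities * 5 bands = 60 searches
print(grid[0])
```

The grid grows multiplicatively, so adding one keyword here adds 20 searches, which is worth keeping in mind when sizing a run.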

Parallel workers, retry logic and proxy rotation keep throughput up and access stable across all those combinations.
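
If you build this yourself, the parallel-plus-retry part might look like the sketch below. It assumes you supply your own fetch(spec) function that performs the HTTP request; the names fetch_with_retry, run_grid and flaky_fetch are hypothetical, the delays are arbitrary, and proxy rotation is left out because it depends on your provider. The stand-in fetch only simulates a throttled first attempt so the example runs offline.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_with_retry(fetch, spec, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch(spec); on failure, back off exponentially with jitter and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(spec)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries

def run_grid(fetch, specs, workers=8, **retry_kwargs):
    """Fetch every search spec in parallel; results come back in spec order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: fetch_with_retry(fetch, s, **retry_kwargs), specs))

# Stand-in for a real HTTP call: fails the first time each spec is requested.
_seen = set()

def flaky_fetch(spec):
    key = (spec["keyword"], spec["city"])
    if key not in _seen:
        _seen.add(key)
        raise ConnectionError("simulated throttle")
    return [f"{spec['keyword']}@{spec['city']}"]

specs = [{"keyword": "Go", "city": "北京"}, {"keyword": "React", "city": "上海"}]
# sleep is stubbed out so the demo does not actually wait
results = run_grid(flaky_fetch, specs, workers=2, sleep=lambda _s: None)
print(results)
```

Passing sleep as a parameter keeps the backoff testable; in a real run you would leave it at the default.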

▶ Run the Lagou Tech Jobs Scraper — keyword × city × experience expansion across 25+ cities, parallel HTTP workers, parsed Chinese salary ranges, company funding/size attributes. No browser needed — fast, cheap, scalable.

Schema design for downstream use

A clean per-role record (English field names over Chinese values):

{
  "position_id": "9183742",
  "title": "高级后端工程师 (Go)",
  "company_id": "ByteDance",
  "company_name": "字节跳动",
  "city": "北京",
  "district": "海淀区",
  "salary_raw": "40k-65k·16薪",
  "salary_min": 40000,
  "salary_max": 65000,
  "experience_required": "3-5年",
  "education_required": "本科",
  "category": "后端开发",
  "tech_stack": ["Go", "Kubernetes", "MySQL", "微服务"],
  "funding_stage": "上市公司",
  "company_size": "10000人以上",
  "industry": "互联网",
  "published_at": "2026-05-28",
  "search_meta": { "keyword": "Go", "city": "北京", "exp": "3-5年" }
}

Schema choices worth making early:

Keep salary_raw and the parsed min/max. The raw string carries the ·15薪 / ·16薪 annual-multiplier detail (number of monthly salaries per year) that the min/max alone loses — and it’s a real component of Chinese tech comp.
Don’t translate values destructively. Store the Chinese originals; translate in a derived column if you need English. The original is your source of truth.
Dedup on position_id. The keyword × city × experience expansion will surface the same role from multiple combinations by design.
Preserve search_meta. When a role shows up under three searches, the metadata tells you why — useful for debugging coverage and for weighting demand signals.
Keep tech_stack as an array. It’s the field you’ll aggregate most (“which frameworks is demand growing for?”); don’t flatten it to a string.
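
The dedup and search_meta advice can be combined in one pass. The sketch below is one possible approach, not the actor's implementation: it keeps the first copy of each position_id and collects every search that surfaced it into a list. The field name search_hits is an assumption introduced here, and the second position ID is made-up sample data.

```python
def dedup_by_position_id(rows):
    """Keep one record per position_id and collect every search that surfaced it."""
    merged = {}
    for row in rows:
        pid = row["position_id"]
        if pid not in merged:
            # First sighting: copy the record, replacing search_meta with a list
            record = {k: v for k, v in row.items() if k != "search_meta"}
            record["search_hits"] = []  # hypothetical field name
            merged[pid] = record
        hit = row.get("search_meta")
        if hit is not None and hit not in merged[pid]["search_hits"]:
            merged[pid]["search_hits"].append(hit)
    return list(merged.values())

rows = [
    {"position_id": "9183742", "title": "高级后端工程师 (Go)",
     "search_meta": {"keyword": "Go", "city": "北京", "exp": "3-5年"}},
    {"position_id": "9183742", "title": "高级后端工程师 (Go)",
     "search_meta": {"keyword": "Kubernetes", "city": "北京", "exp": "不限"}},
    {"position_id": "9200001", "title": "前端工程师",
     "search_meta": {"keyword": "React", "city": "上海", "exp": "1-3年"}},
]

unique = dedup_by_position_id(rows)
for rec in unique:
    print(rec["position_id"], len(rec["search_hits"]))
```

Keeping the hits as a list means the number of searches that surfaced a role stays available as a weighting signal after dedup.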

Typical use cases

Salary benchmarking — comp by language, role and city; account for the ·N薪 multiplier for true annual figures.
Tech-stack demand analysis — which languages, frameworks and tools companies are hiring for, and how that shifts over time.
Hiring-trend monitoring — track company and sector hiring velocity from publish timestamps and posting volume.
Company research — map funding stage, headcount, industry and hiring volume per company.
Recruitment intelligence — targeted candidate sourcing using real market data.
Academic labor-market research — regional tech-hub development and wage studies.
Market-entry analysis — gauge local tech-talent availability before expanding into a Chinese city.
Investment due diligence — read hiring activity and role composition as a growth signal.
Data journalism — data-driven stories on China’s tech industry.
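
For the tech-stack demand use case, a simple tag count over deduplicated records is often enough to start. This is a minimal sketch: the function name tech_stack_demand is hypothetical, and lowercasing tags so that "Go" and "go" merge is an assumption about how you want to normalize, not something Lagou or the actor does for you.

```python
from collections import Counter

def tech_stack_demand(records):
    """Count how many listings mention each tech_stack tag."""
    counts = Counter()
    for rec in records:
        # A set per listing, so a tag repeated on one listing counts once
        tags = {t.strip().lower() for t in rec.get("tech_stack", []) if t.strip()}
        counts.update(tags)
    return counts

records = [
    {"position_id": "a", "tech_stack": ["Go", "Kubernetes", "MySQL", "微服务"]},
    {"position_id": "b", "tech_stack": ["go", "MySQL"]},
    {"position_id": "c", "tech_stack": ["React"]},
]

demand = tech_stack_demand(records)
print(demand.most_common(3))
```

Run it on records that have already been deduplicated on position_id, otherwise roles surfaced by several searches are counted more than once.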

Cost math

Pricing is pay-per-event, with a tiny run-start fee and free results, and because the scraper is HTTP-only (no browser) the per-listing compute is cheap. The cost lever is how wide you fan out the keyword × city × experience grid: more combinations means more requests and more runtime, but each result itself is free.

A focused run — a handful of keywords across the top cities — pulls thousands of deduplicated listings for low single-digit-dollar compute plus proxy bandwidth. A broad market sweep costs proportionally more in compute and proxy, but still vastly less than the alternative of building and babysitting your own parallel-fetch + retry + Chinese-salary-parsing pipeline, especially given the language and pagination quirks.

Common pitfalls

Under-fanning the search. Without experience-filter expansion you’ll think you scraped “all Beijing Java jobs” when you actually got only a fraction of them. Use the expansion.
Mangling the salary multiplier. “25k-40k·15薪” is 15 months of pay, not 12. If you compute annual comp as min*12 you’ll understate it. Parse the ·N薪 suffix.
Encoding issues. Chinese text must be UTF-8 end to end. A mojibake company name breaks dedup and joins.
Throttling without proxy rotation. Hammering Lagou from one IP gets you rate-limited fast. The actor rotates proxies and backs off; if you build your own, plan for it.
Treating funding stage as static. A company’s funding/size is a snapshot at scrape time. For longitudinal company research, record when each row was scraped alongside published_at and re-scrape periodically rather than assuming the values stay constant.
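
To make the salary-multiplier pitfall concrete, here is a hedged sketch of a parser for strings like “25k-40k·15薪”. It assumes the figures are monthly amounts in thousands, treats a missing ·N薪 suffix as 12 months, and returns None for anything it cannot parse (for example a negotiable-salary string). The names parse_salary, annual_range and months_per_year are hypothetical, and real listings may use formats this pattern does not cover.

```python
import re

# "25k-40k" with an optional "·15薪" suffix; accepts a few dot variants
SALARY_RE = re.compile(
    r"(\d+(?:\.\d+)?)k\s*-\s*(\d+(?:\.\d+)?)k(?:\s*[·•・]\s*(\d+)薪)?",
    re.IGNORECASE,
)

def parse_salary(raw):
    """Parse a raw salary string into monthly min/max and months paid per year."""
    m = SALARY_RE.search(raw)
    if m is None:
        return None  # unparsed: keep salary_raw and move on
    low, high, months = m.groups()
    return {
        "salary_min": round(float(low) * 1000),
        "salary_max": round(float(high) * 1000),
        # Assumption: no suffix means the usual 12 monthly salaries
        "months_per_year": int(months) if months else 12,
    }

def annual_range(parsed):
    """Annual min/max using the listing's own month multiplier, not a flat 12."""
    n = parsed["months_per_year"]
    return parsed["salary_min"] * n, parsed["salary_max"] * n

parsed = parse_salary("25k-40k·15薪")
print(parsed)
print(annual_range(parsed))  # (375000, 600000), versus (300000, 480000) at 12 months
```

For this listing, using a flat 12 months understates annual pay by 75,000 at the low end and 120,000 at the high end, which is why salary_raw is worth keeping.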

Wrapping up

Lagou is the deepest window into China’s tech hiring — salaries, stacks, and company funding signals in one place. The hard parts are coverage (the keyword × city × experience expansion that surfaces unique results) and the local-format salary parsing. If you need that data at volume, on a repeatable basis, a maintained actor that fans out the search, rotates proxies, and parses the Chinese salary strings is the fast path to a clean dataset.

▶ Open the Lagou Tech Jobs Scraper on Apify — China tech jobs, salaries and company data across 25+ cities, structured JSON out. Pay-per-event, no browser. Start on Apify’s free monthly credit.