How to Scrape Lagou.com China Tech Jobs in 2026
A guide to extracting tech jobs from Lagou.com (拉勾网) — salaries, tech stacks, company funding and size from ByteDance, Alibaba, Tencent and 100,000+ Chinese tech firms.
If you want to understand China’s tech labor market (salaries by role and city, which frameworks companies are hiring for, how fast a sector is staffing up), Lagou.com (拉勾网) is the place to start. It’s China’s largest IT-focused recruitment platform, carrying tech roles from ByteDance, Alibaba, Tencent, Baidu and well over 100,000 other companies. The data is rich (it includes company funding stage and headcount, not just the job), but getting it out means working through Chinese-language search facets, parsing salary strings in a local format, and running parallel requests with proxy rotation to scrape at volume. This guide covers how to extract it cleanly and at scale.
What’s worth extracting
Lagou serves server-rendered listing data, so this is direct HTTP fetching — no headless browser. Per role, after parsing and normalization, you get:
- Position — job title and position ID.
- Company — company name and company ID.
- Location — city and district.
- Salary — the raw Chinese salary string (e.g. “25k-40k·15薪”) and parsed numeric min/max so you can actually do math on it.
- Requirements — years of experience and education level required.
- Role taxonomy — job type / category.
- Tech stack — the required skills and technologies tagged on the listing — the single most valuable field for demand analysis.
- Description — the full job description text when available.
- Company attributes — funding stage, employee-size band, industry classification, and logo URL.
- Timing — publish timestamp, plus search metadata recording which query surfaced the row.
The company attributes are what make Lagou special: you’re not just getting jobs, you’re getting a hiring-activity signal tied to each company’s funding stage and size — gold for company research and investment due diligence.
The coverage strategy: keyword × city × experience
Lagou’s search is faceted and paginated, and any single search caps out well before it shows you everything. The scraper’s strategy is to generate combinations of keyword × city × experience-filter and fetch each, then deduplicate across combinations.
Why the experience filter matters: searching “Java engineer in Beijing” might return, say, the first few hundred results before pagination dries up. Splitting that same search by experience band (0–1 years, 1–3, 3–5, 5–10, …) surfaces a different slice of the listings in each band, so the union covers far more unique roles than the unsplit search ever would. This experience-filter expansion is the key trick for maximizing unique results. The scraper (an Apify actor) does it automatically and dedups on position ID across the expanded combinations.
keywords: e.g. ["Go", "React", "推荐算法"]
cities: e.g. ["北京", "上海", "深圳", "杭州"] (25+ supported)
exp split: e.g. ["不限", "1年以下", "1-3年", "3-5年", "5-10年"]
Parallel workers, retry logic and proxy rotation keep throughput up and access stable across all those combinations.
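As a rough illustration of the keyword × city × experience fan-out described above, the grid can be built as a Cartesian product. This is a minimal sketch, not the actor's actual code: the function name `build_search_grid` and the dictionary keys are assumptions chosen to mirror the `search_meta` fields used later in this guide.

```python
from itertools import product

# Example values taken from the lists above; a real run would use your own.
KEYWORDS = ["Go", "React", "推荐算法"]
CITIES = ["北京", "上海", "深圳", "杭州"]
EXPERIENCE_BANDS = ["不限", "1年以下", "1-3年", "3-5年", "5-10年"]


def build_search_grid(keywords, cities, bands):
    """Return one search spec per keyword x city x experience combination.

    Hypothetical helper: the key names are illustrative, not Lagou's
    request parameters.
    """
    return [
        {"keyword": kw, "city": city, "exp": band}
        for kw, city, band in product(keywords, cities, bands)
    ]


grid = build_search_grid(KEYWORDS, CITIES, EXPERIENCE_BANDS)
print(len(grid))  # 3 keywords * 4 cities * 5 bands = 60 searches
print(grid[0])
```

The grid size is the product of the three list lengths, which is why widening any one dimension multiplies the number of requests.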
▶ Run the Lagou Tech Jobs Scraper — keyword × city × experience expansion across 25+ cities, parallel HTTP workers, parsed Chinese salary ranges, company funding/size attributes. No browser needed — fast, cheap, scalable.
Schema design for downstream use
A clean per-role record (English field names over Chinese values):
{
  "position_id": "9183742",
  "title": "高级后端工程师 (Go)",
  "company_id": "ByteDance",
  "company_name": "字节跳动",
  "city": "北京",
  "district": "海淀区",
  "salary_raw": "40k-65k·16薪",
  "salary_min": 40000,
  "salary_max": 65000,
  "experience_required": "3-5年",
  "education_required": "本科",
  "category": "后端开发",
  "tech_stack": ["Go", "Kubernetes", "MySQL", "微服务"],
  "funding_stage": "上市公司",
  "company_size": "10000人以上",
  "industry": "互联网",
  "published_at": "2026-05-28",
  "search_meta": { "keyword": "Go", "city": "北京", "exp": "3-5年" }
}
Schema choices worth making early:
- Keep `salary_raw` and the parsed min/max. The raw string carries the `·15薪` / `·16薪` annual-multiplier detail (number of monthly salaries per year) that the min/max alone loses, and it’s a real component of Chinese tech comp.
- Don’t translate values destructively. Store the Chinese originals; translate in a derived column if you need English. The original is your source of truth.
- Dedup on `position_id`. The keyword × city × experience expansion will surface the same role from multiple combinations by design.
- Preserve `search_meta`. When a role shows up under three searches, the metadata tells you why, which is useful for debugging coverage and for weighting demand signals.
- Keep `tech_stack` as an array. It’s the field you’ll aggregate most (“which frameworks is demand growing for”); don’t flatten it to a string.
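If you post-process raw rows yourself, deduplicating on `position_id` while keeping every `search_meta` that surfaced a role might look like the sketch below. The function `dedup_positions`, the `search_hits` field, and the second sample position are assumptions made for illustration; they are not part of the actor's output.

```python
def dedup_positions(rows):
    """Collapse rows sharing a position_id, keeping every search that found them.

    Hypothetical helper: `search_hits` is an illustrative field name that
    collects the distinct search_meta dicts for each role.
    """
    merged = {}
    for row in rows:
        pid = row["position_id"]
        if pid not in merged:
            # First sighting: copy the row and start an empty hit list.
            record = {k: v for k, v in row.items() if k != "search_meta"}
            record["search_hits"] = []
            merged[pid] = record
        meta = row.get("search_meta")
        if meta is not None and meta not in merged[pid]["search_hits"]:
            merged[pid]["search_hits"].append(meta)
    return list(merged.values())


# Made-up sample rows: the first role is surfaced by two different searches.
rows = [
    {"position_id": "9183742", "title": "高级后端工程师 (Go)",
     "search_meta": {"keyword": "Go", "city": "北京", "exp": "3-5年"}},
    {"position_id": "9183742", "title": "高级后端工程师 (Go)",
     "search_meta": {"keyword": "Kubernetes", "city": "北京", "exp": "3-5年"}},
    {"position_id": "9183999", "title": "前端工程师",
     "search_meta": {"keyword": "React", "city": "上海", "exp": "1-3年"}},
]

unique = dedup_positions(rows)
print(len(unique))                    # 2 unique roles
print(len(unique[0]["search_hits"]))  # 2 searches surfaced the first role
```

Keeping the hits as a list rather than overwriting one `search_meta` preserves the "why did this show up" signal after the merge.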
Typical use cases
- Salary benchmarking — comp by language, role and city; account for the `·N薪` multiplier for true annual figures.
- Tech-stack demand analysis — which languages, frameworks and tools companies are hiring for, and how that shifts over time.
- Hiring-trend monitoring — track company and sector hiring velocity from publish timestamps and posting volume.
- Company research — map funding stage, headcount, industry and hiring volume per company.
- Recruitment intelligence — targeted candidate sourcing using real market data.
- Academic labor-market research — regional tech-hub development and wage studies.
- Market-entry analysis — gauge local tech-talent availability before expanding into a Chinese city.
- Investment due diligence — read hiring activity and role composition as a growth signal.
- Data journalism — data-driven stories on China’s tech industry.
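For the tech-stack demand use case, a first-pass aggregation can be as simple as counting how many deduplicated listings tag each skill. The sketch below assumes records shaped like the schema above, with `tech_stack` as a list; the helper name `skill_demand` and the sample records are made up for illustration.

```python
from collections import Counter


def skill_demand(records):
    """Count the number of listings that tag each skill.

    Hypothetical helper; assumes records were already deduplicated
    on position_id.
    """
    counts = Counter()
    for rec in records:
        # set() so a skill tagged twice on one listing counts once.
        counts.update(set(rec.get("tech_stack", [])))
    return counts


# Made-up sample records for illustration only.
records = [
    {"position_id": "1", "tech_stack": ["Go", "Kubernetes", "MySQL"]},
    {"position_id": "2", "tech_stack": ["Go", "微服务"]},
    {"position_id": "3", "tech_stack": ["React"]},
]

demand = skill_demand(records)
print(demand["Go"])           # 2
print(demand.most_common(1))  # [('Go', 2)]
```

Running the same count on snapshots taken at different dates gives a simple view of how demand shifts over time.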
Cost math
Pricing is pay-per-event with a tiny run-start fee and free results, and because the scraper is HTTP-only (no browser) the per-listing compute is cheap. The cost lever is how wide you fan out the keyword × city × experience grid: more combinations mean more requests and more runtime, but each result itself is free.
A focused run — a handful of keywords across the top cities — pulls thousands of deduplicated listings for low single-digit-dollar compute plus proxy bandwidth. A broad market sweep costs proportionally more in compute and proxy, but still vastly less than the alternative of building and babysitting your own parallel-fetch + retry + Chinese-salary-parsing pipeline, especially given the language and pagination quirks.
Common pitfalls
- Under-fanning the search. Without experience-filter expansion you’ll think you scraped “all Beijing Java jobs” when you got only a fraction of them. Use the expansion.
- Mangling the salary multiplier. “25k-40k·15薪” is 15 months of pay, not 12. If you compute annual comp as `min*12` you’ll understate it. Parse the `·N薪` suffix.
- Encoding issues. Chinese text must be UTF-8 end to end. A mojibake company name breaks dedup and joins.
- Throttling without proxy rotation. Hammering Lagou from one IP gets you rate-limited fast. The actor rotates proxies and backs off; if you build your own, plan for it.
- Treating funding stage as static. A company’s funding/size is a snapshot at scrape time. For longitudinal company research, store `published_at` and re-scrape rather than assuming it’s constant.
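To make the salary-multiplier pitfall concrete, here is a minimal parsing sketch. It assumes the raw string follows the `<min>k-<max>k·<N>薪` pattern shown in this guide, with the `·N薪` suffix optional and a 12-month default when it is absent. The function name and output keys are illustrative, and other formats (daily rates, "negotiable" and so on) are not handled and return `None`.

```python
import re

# Matches strings like "25k-40k" or "25k-40k·15薪" (assumed format).
SALARY_RE = re.compile(
    r"(\d+(?:\.\d+)?)k-(\d+(?:\.\d+)?)k(?:·(\d+)薪)?",
    re.IGNORECASE,
)


def parse_salary(raw, default_months=12):
    """Parse a Lagou-style monthly salary range into numeric fields.

    Hypothetical helper: returns None for strings that do not match
    the assumed pattern instead of guessing.
    """
    match = SALARY_RE.fullmatch(raw.strip())
    if match is None:
        return None
    low, high, months = match.groups()
    months = int(months) if months else default_months
    salary_min = round(float(low) * 1000)
    salary_max = round(float(high) * 1000)
    return {
        "salary_min": salary_min,
        "salary_max": salary_max,
        "months_per_year": months,
        "annual_min": salary_min * months,
        "annual_max": salary_max * months,
    }


parsed = parse_salary("25k-40k·15薪")
print(parsed["annual_min"])         # 375000, versus 300000 from min*12
print(parse_salary("negotiable"))   # None
```

With the 15-month multiplier the annual minimum is 375,000 rather than the 300,000 that `min*12` would give, which is exactly the understatement the pitfall describes.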
Wrapping up
Lagou is the deepest window into China’s tech hiring — salaries, stacks, and company funding signals in one place. The hard parts are coverage (the keyword × city × experience expansion that surfaces unique results) and the local-format salary parsing. If you need that data at volume, on a repeatable basis, a maintained actor that fans out the search, rotates proxies, and parses the Chinese salary strings is the fast path to a clean dataset.
▶ Open the Lagou Tech Jobs Scraper on Apify — China tech jobs, salaries and company data across 25+ cities, structured JSON out. Pay-per-event, no browser. Start on Apify’s free monthly credit.