logiover
jobs · May 20, 2026 · 7 min read

How to Scrape Built In Tech Jobs Data in 2026

Extract tech and startup job listings from Built In (builtin.com) at scale — salary, skills, remote flags, hiring companies — across the national board and every US tech hub.

Built In (builtin.com) sits in an unusual spot in the US tech-hiring world. It isn’t a general job board like Indeed — it curates startup and tech roles, organizes them around regional hubs (NYC, SF, Austin, Chicago, Boston, Seattle, LA, Colorado and more), and enriches every listing with company profiles, salary ranges and required skills. That makes it one of the richest single sources of structured tech-labor data on the open web. It also means the listings are spread across a national board plus a dozen regional sub-sites, with heavy overlap between them. This guide covers what’s worth pulling, how Built In defends itself, and how to get a clean, deduplicated feed in production.

What’s worth extracting

Built In listings are far richer than a typical board row. For each posting you can reliably pull:

  • Job identity — title, posting URL, internal job identifier, posting timestamp and recency.
  • Company metadata — company name, logo and visual assets, company size band, industry tags.
  • Compensation — salary range where the employer disclosed it (Built In pushes hard for transparency, so disclosure rates are higher here than on most boards).
  • Skills — the explicit skills list Built In attaches to each role (React, Kubernetes, Go, etc.) — this is the field that makes Built In uniquely useful for labor-market analysis.
  • Experience level — entry, mid, senior, expert.
  • Work arrangement — remote, hybrid or in-office, plus the office location(s).
  • Location — city and the regional hub the listing belongs to.
  • Description — the full job description body, not just a snippet.

The skills array and the disclosed salary band are the two fields that justify scraping Built In specifically rather than a generic aggregator. If you only need title plus company, almost any board will do. If you want to answer “what does a senior Go engineer earn in Austin and what stack do those roles ask for,” Built In is the source.

The dedup problem (and why it matters)

A remote role at a company headquartered in NYC frequently appears on the national board, the NYC hub, and any other hub the company markets into. Naively crawling each hub and concatenating the results gives you the same job three or four times. Any serious Built In pipeline has to deduplicate across regions on a stable key: the internal job identifier, not the URL, because URLs differ per hub for the same posting.
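A minimal sketch of that cross-hub dedup, assuming each crawled row already carries a `job_id` and a `hub` field (the field names follow the example record later in this guide; the `hubs` list and the hub labels other than `builtinnyc` are illustrative assumptions):

```python
from typing import Iterable


def dedupe_listings(rows: Iterable[dict]) -> list[dict]:
    """Collapse rows that share a job_id, keeping the first copy seen.

    Every hub a posting appeared on is collected into a `hubs` list so the
    cross-posting information survives deduplication.
    """
    merged: dict[str, dict] = {}
    for row in rows:
        key = row["job_id"]
        if key not in merged:
            # Copy the row so the caller's data is not mutated.
            merged[key] = {**row, "hubs": [row["hub"]]}
        elif row["hub"] not in merged[key]["hubs"]:
            merged[key]["hubs"].append(row["hub"])
    return list(merged.values())


# Made-up sample rows: the same posting crawled from three boards, plus one other job.
crawled = [
    {"job_id": "blt-9f3a21", "hub": "national", "title": "Senior Backend Engineer"},
    {"job_id": "blt-9f3a21", "hub": "builtinnyc", "title": "Senior Backend Engineer"},
    {"job_id": "blt-9f3a21", "hub": "builtinaustin", "title": "Senior Backend Engineer"},
    {"job_id": "blt-000001", "hub": "builtinnyc", "title": "Data Analyst"},
]

unique = dedupe_listings(crawled)
print(len(crawled), "crawled rows ->", len(unique), "unique jobs")
```

Keying on the URL instead would leave all three copies of the first job in place, which is exactly the silent inflation to avoid.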

This is one of the less-glamorous reasons to use a managed scraper instead of rolling your own: the dedup logic across the national board plus every regional hub is fiddly, and getting it wrong silently inflates your counts by 2-4x.

The anti-bot reality

Built In runs behind Cloudflare. A bare requests call with a fake User-Agent gets you a managed challenge page or an outright 403 within the first few requests. What actually works:

  1. Residential proxies — datacenter IPs are challenged aggressively; residential ASNs pass far more often.
  2. A real browser fingerprint — Cloudflare’s challenge checks TLS (JA3/JA4) and JS execution, so a generic HTTP client is detected immediately.
  3. Modest concurrency and pacing — hammering the regional hubs in parallel trips rate limits fast.
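The pacing point can be sketched with a semaphore and a jittered delay. This is only an illustration: `fetch_politely` and `fake_fetch` are hypothetical names, the concurrency and delay values are guesses rather than known Built In thresholds, and in a real run `fetch` would be a browser-based page load behind a residential proxy.

```python
import asyncio
import random


async def fetch_politely(urls, fetch, max_concurrency=2, base_delay=1.0, jitter=0.5):
    """Run `fetch(url)` for each URL with capped concurrency and jittered pauses."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:
            result = await fetch(url)
            # Pause while still holding the slot so the overall request rate stays low.
            await asyncio.sleep(base_delay + random.uniform(0, jitter))
            return result

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in urls))


async def fake_fetch(url):
    # Stand-in for a real page load; it performs no network I/O.
    await asyncio.sleep(0)
    return f"fetched {url}"


pages = asyncio.run(
    fetch_politely(
        ["https://builtin.com/jobs", "https://www.builtinnyc.com/jobs"],
        fake_fetch,
        base_delay=0.01,  # tiny values so the demo finishes quickly
        jitter=0.01,
    )
)
print(pages)
```

Holding the semaphore during the sleep is deliberate: it caps the request rate, not just the number of requests in flight.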

You do not need an account or API key — all the listings this scraper targets are publicly visible. The friction is purely Cloudflare, not authentication. That’s a meaningfully easier problem than, say, LinkedIn, but it’s still enough to make a naive script useless after a few minutes.

Run the Built In Tech Jobs Scraper — tens of thousands of deduplicated US tech listings per run with salary, skills, experience and work-arrangement, residential proxies and Cloudflare handling included. Pay-per-event with a free Actor-start price.

How the listings are organized

Built In’s URL structure mirrors its hub model:

https://builtin.com/jobs                    # national board
https://builtin.com/jobs/remote             # remote-only national
https://www.builtinnyc.com/jobs             # NYC hub
https://www.builtinaustin.com/jobs          # Austin hub
https://www.builtinsf.com/jobs              # SF / Bay Area hub

Each board supports filtering by keyword, category, experience level, company size and recency through query parameters and the on-page filter UI. The scraper exposes those same filters as inputs, so you can scope a run to, for example, “senior backend roles posted in the last 7 days across NYC and remote” rather than pulling the whole universe every time.
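As a rough sketch, scoping a run comes down to building one filtered URL per hub. The hub URLs below are the ones listed above; the query parameter names (`search`, `experience`, `days`, `company_size`) are hypothetical placeholders, since Built In's real parameter names are not documented here and should be checked against the live filter UI.

```python
from urllib.parse import urlencode

# Hub base URLs taken from the list above.
HUB_BASES = {
    "national": "https://builtin.com/jobs",
    "remote": "https://builtin.com/jobs/remote",
    "nyc": "https://www.builtinnyc.com/jobs",
    "austin": "https://www.builtinaustin.com/jobs",
    "sf": "https://www.builtinsf.com/jobs",
}


def build_search_urls(hubs, filters):
    """Return one search URL per hub, dropping filters that are unset (None)."""
    active = {key: value for key, value in filters.items() if value is not None}
    query = urlencode(active)
    return [f"{HUB_BASES[hub]}?{query}" if query else HUB_BASES[hub] for hub in hubs]


# Parameter names are HYPOTHETICAL placeholders, not Built In's real ones.
urls = build_search_urls(
    ["nyc", "remote"],
    {"search": "backend", "experience": "senior", "days": 7, "company_size": None},
)
for url in urls:
    print(url)
```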

Build it yourself vs. use a managed scraper

If you want a one-off snapshot of one hub, a Playwright script plus a residential proxy will get you there in an afternoon. The cost shows up when you need it repeatedly and across hubs:

  • Building from scratch — handle Cloudflare, crawl a dozen hubs, then build and maintain cross-region dedup. Plan for re-work whenever Built In changes its DOM or filter params.
  • Using a managed actor — point it at the hubs and filters you care about, schedule it, and get a deduplicated feed. No account, no proxy bill, no Cloudflare maintenance.

For recruitment and labor-research use cases — where you want the same query run daily or weekly — the managed path is almost always cheaper once you price in your own time.

Schema design for downstream use

A clean, flat per-listing record that’s friendly to a warehouse or a screener UI:

{
  "job_id": "blt-9f3a21",
  "title": "Senior Backend Engineer",
  "company": "Ramp",
  "company_size": "501-1000",
  "industries": ["Fintech", "Payments"],
  "salary_min": 180000,
  "salary_max": 230000,
  "salary_currency": "USD",
  "skills": ["Go", "PostgreSQL", "Kubernetes", "gRPC"],
  "experience_level": "Senior",
  "work_arrangement": "Remote",
  "location": "New York, NY",
  "hub": "builtinnyc",
  "posted_at": "2026-05-18T00:00:00Z",
  "url": "https://www.builtinnyc.com/job/senior-backend-engineer/...",
  "scraped_at": "2026-05-20T09:00:00Z"
}

A few schema decisions worth making early:

  • Keep skills as an array. It’s the highest-value field for analytics — flatten it to a string and you lose the ability to do skill-frequency analysis.
  • Store salary_min and salary_max separately, plus a flag for whether salary was disclosed at all. Disclosure rates are a metric in their own right.
  • Keep hub on every row even after dedup, so you can answer “which hubs is this company marketing into.”
  • Always store scraped_at. Postings expire and salary bands get edited; you need to know when a row was valid.
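Those decisions can be applied in a small normalization step before rows reach the warehouse. The sketch below is one possible shape, not the scraper's actual output format: `normalize_listing` and the `salary_disclosed` field name are assumptions introduced for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def normalize_listing(raw: dict, scraped_at: Optional[datetime] = None) -> dict:
    """Return a flat row with skills as a list and an explicit salary flag."""
    skills = raw.get("skills") or []
    if isinstance(skills, str):
        # Recover a list if an upstream step flattened skills to "Go, PostgreSQL".
        skills = [s.strip() for s in skills.split(",") if s.strip()]

    salary_min = raw.get("salary_min")  # stays None when undisclosed
    salary_max = raw.get("salary_max")
    scraped_at = scraped_at or datetime.now(timezone.utc)

    return {
        "job_id": raw["job_id"],
        "hub": raw.get("hub"),
        "skills": list(skills),
        "salary_min": salary_min,
        "salary_max": salary_max,
        # Hypothetical flag name; lets you compute disclosure rates directly.
        "salary_disclosed": salary_min is not None or salary_max is not None,
        "scraped_at": scraped_at.isoformat(),
    }


row = normalize_listing(
    {"job_id": "blt-9f3a21", "hub": "builtinnyc", "skills": "Go, PostgreSQL"},
    scraped_at=datetime(2026, 5, 20, 9, 0, tzinfo=timezone.utc),
)
print(row)
```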

Typical use cases

What teams actually do with Built In data:

  • Recruitment and sourcing — maintain a fresh, deduplicated feed of US tech and startup roles to feed sourcers or an internal ATS.
  • Job boards and aggregators — populate a tech-jobs vertical with current, well-structured listings (salary and skills already attached).
  • Labor-market research — track hiring trends, salary ranges and in-demand skills by city and over time. The skills array makes this genuinely powerful.
  • Sales and lead generation — companies actively hiring tech talent are companies with budget; a hiring-signal feed is a clean prospecting list.
  • Competitive intelligence — watch a competitor’s headcount plans by monitoring what roles, levels and locations they’re posting.
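For the labor-market case, a first pass needs nothing beyond the standard library. The sketch below counts skill mentions and takes the median of salary-band midpoints, skipping rows with no disclosed salary. The function names and the sample rows are made up for illustration; the field names follow the example record above.

```python
from collections import Counter
from statistics import median


def skill_demand(rows, top_n=5):
    """Count how often each skill appears across listings."""
    counts = Counter(skill for row in rows for skill in row.get("skills", []))
    return counts.most_common(top_n)


def median_salary_midpoint(rows):
    """Median of (min + max) / 2 over rows that disclose both bounds."""
    midpoints = [
        (row["salary_min"] + row["salary_max"]) / 2
        for row in rows
        if row.get("salary_min") is not None and row.get("salary_max") is not None
    ]
    return median(midpoints) if midpoints else None


# Made-up sample rows, already deduplicated.
sample = [
    {"skills": ["Go", "PostgreSQL"], "salary_min": 180000, "salary_max": 230000},
    {"skills": ["Go", "Kubernetes"], "salary_min": 160000, "salary_max": 200000},
    {"skills": ["React"], "salary_min": None, "salary_max": None},
]

print(skill_demand(sample, top_n=1))
print(median_salary_midpoint(sample))
```

Filter `sample` by hub, experience level or skill first to narrow the question to something like senior Go roles in one city.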

Cost math

The scraper is pay-per-event: a tiny Actor-start charge and no per-result fee. A run that pulls tens of thousands of deduplicated listings across the national board and a handful of hubs costs effectively the compute plus that start event: cents, not dollars. Scheduling a daily refresh across your target hubs lands in the low single-digit dollars per month.

Compare that to a self-hosted build: a residential proxy pool runs $300-500/month for anything respectable, plus a VPS, plus the engineering time every time Cloudflare or the Built In DOM shifts. For a recurring feed, the managed actor wins on cost before you even count maintenance hours.

Common pitfalls

  • Double-counting across hubs. The single biggest mistake. Dedup on the internal job ID, not the URL.
  • Assuming salary is always present. Built In has high disclosure rates relative to other boards, but plenty of roles still omit it. Treat salary as nullable.
  • Treating “posted” dates as exact. Built In sometimes re-surfaces older roles. Use the posting timestamp as a recency hint, not a precise audit trail.
  • Over-broad runs. Pulling the full national universe daily when you only care about three hubs wastes time and money. Use the filters.
  • Ignoring work-arrangement noise. “Remote” on Built In sometimes means “remote within the US” or “remote-friendly.” If geography matters, cross-check the location field.

Wrapping up

Built In is the cleanest single source for US tech-hiring data with salary and skills attached — but its hub model means cross-region dedup is non-optional, and Cloudflare makes a naive script useless fast. If you need a one-time look, build it. If you need a recurring, deduplicated feed for recruiting, a job board, or labor research, use a managed actor that already handles the hubs, the proxies and the dedup.

Open the Built In Tech Jobs Scraper on Apify — deduplicated US tech and startup listings with salary, skills and work-arrangement. No account needed. Start on Apify’s free monthly credit.
