jobs · May 20, 2026 · 6 min read

How to Scrape USAJobs Federal Job Listings in 2026

Pull complete US federal hiring data from the official USAJobs.gov API — salary, grade, agency, clearance and location — at scale, without an API key or rate-limit headaches.

USAJobs.gov is the single front door to almost every competitive-service hiring announcement in the US federal government. It’s also one of the rare large-scale data sources that ships a genuinely open, documented API — no scraping tricks, no headless browser, no Akamai challenge. So why use a managed scraper at all? Because the gap between “there is an API” and “I have a clean, paginated, deduplicated federal-jobs feed in my warehouse” is wider than it looks. This guide walks through the USAJobs API’s real quirks in 2026 and how to extract the whole firehose cleanly.

What’s worth extracting

The USAJobs Search API returns a deeply nested JSON envelope per announcement. Once you flatten it, each federal job carries a rich, consistent record:

Identity — announcement number, control number (the stable join key), position title, direct apply URL.
Agency — hiring department (e.g. Department of Veterans Affairs) and the specific sub-agency or organization.
Pay — pay system (GS, WG, ES, etc.), minimum and maximum salary, pay basis (per year / per hour), and grade range with promotion potential.
Classification — occupational series (the 4-digit code, e.g. 2210 for IT), job category, supervisory status.
Eligibility — hiring paths (open to the public, federal employees, veterans, students), eligibility groups.
Conditions — security clearance level, travel percentage, telework eligibility, relocation, drug-test designation.
Logistics — number of openings, work locations (often multiple per announcement), schedule (full-time, intermittent), appointment type (permanent, term, temporary).
Timing — open date, close date, application status.

That’s 25–35 usable fields per row. For labor-market analysis you want series, salary, agency and location; for a job board you also want the apply URL and close date so you don’t surface dead listings.
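The flattening step can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the nested key names (`MatchedObjectDescriptor`, `PositionRemuneration`, `UserArea.Details` and so on) are assumptions about the response shape and should be checked against the current API reference, and the helper names `first`, `to_number` and `flatten` are hypothetical.

```python
from typing import Any, Dict, Optional


def first(items: Any) -> Dict[str, Any]:
    """Return the first dict in a list, or {} when the value is missing or malformed."""
    if isinstance(items, list) and items and isinstance(items[0], dict):
        return items[0]
    return {}


def to_number(value: Any) -> Optional[float]:
    """Salary values may arrive as strings; return None instead of raising."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flatten(item: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one search result item into a flat row.

    Every key name read here is an assumption about the response shape.
    Missing keys produce None (or an empty list) rather than an exception.
    """
    desc = item.get("MatchedObjectDescriptor") or {}
    details = (desc.get("UserArea") or {}).get("Details") or {}
    pay = first(desc.get("PositionRemuneration"))
    series = first(desc.get("JobCategory"))
    locations = desc.get("PositionLocation") or []
    return {
        "control_number": item.get("MatchedObjectId"),
        "announcement_number": desc.get("PositionID"),
        "position_title": desc.get("PositionTitle"),
        "department": desc.get("DepartmentName"),
        "agency": desc.get("OrganizationName"),
        "occupational_series": series.get("Code"),
        "salary_min": to_number(pay.get("MinimumRange")),
        "salary_max": to_number(pay.get("MaximumRange")),
        "pay_basis": pay.get("RateIntervalCode"),
        "grade_low": details.get("LowGrade"),
        "grade_high": details.get("HighGrade"),
        # Keep every duty station; skip entries without a usable name.
        "locations": [
            loc["LocationName"]
            for loc in locations
            if isinstance(loc, dict) and loc.get("LocationName")
        ],
        "close_date": desc.get("ApplicationCloseDate"),
        "apply_url": desc.get("PositionURI"),
    }
```

Returning None for anything absent keeps one malformed announcement from aborting a long run, and it makes gaps visible downstream instead of hiding them behind default values.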

The API is open — so what’s the catch?

The naive approach is “just hit the endpoint.” It works for a single page. The friction shows up at scale:

Authorization header quirk — the API expects a User-Agent set to your contact email plus an Authorization-Key header. Get either wrong and you receive a terse 401 with no helpful body. Many first-time scripts silently fail here.
Pagination caps — the search endpoint pages at up to 500 results per page but caps total reachable results per query (historically around the 10,000-record mark). To pull the full corpus you must slice your queries — by date range, by series, or by department — and stitch the slices together. A single broad query will silently truncate.
Rate limiting — the API tolerates polite traffic but throttles bursts. There’s no published hard number, so you have to back off on 429s and pace requests rather than hammer.
Inconsistent nesting — salary, locations and grades live in arrays-of-objects that aren’t always populated the same way. Robust flattening means defending against missing keys on every record.
Transient 5xx — the endpoint occasionally returns 502/503 under load. Without retry-with-backoff, a long run dies halfway.
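The header and retry points above can be sketched with the standard library alone. This is a hedged example: the function names, the retry count and the delay values are illustrative choices, not numbers published by USAJobs, and the `send` and `sleep` parameters exist only so the logic can be exercised without touching the network.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Dict

# Status codes treated as worth retrying: throttling plus the transient 5xx cases.
RETRYABLE = {429, 502, 503}


def build_request(url: str, email: str, api_key: str) -> urllib.request.Request:
    """Attach the two headers the article describes: contact email and API key."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": email, "Authorization-Key": api_key},
    )


def http_get(request: urllib.request.Request) -> Dict[str, Any]:
    """Perform the request and parse the JSON body."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def fetch_with_backoff(
    request: urllib.request.Request,
    send: Callable[[urllib.request.Request], Dict[str, Any]] = http_get,
    max_attempts: int = 5,      # illustrative default
    base_delay: float = 1.0,    # illustrative default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Retry on 429/502/503 with exponential backoff; re-raise anything else."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send(request)
        except urllib.error.HTTPError as err:
            last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE or last_attempt:
                raise  # e.g. a 401 from a bad header should fail fast
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```

A 401 is deliberately not retried: a wrong header will not fix itself, so surfacing it immediately is more useful than backing off.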

None of these are hard individually. Together they’re the reason a “weekend script” becomes a maintenance chore. A managed actor solves the slicing, the header dance, the backoff and the flattening once.

▶ Run the USAJobs Federal Jobs Scraper — official USAJobs.gov API, no key required, full pagination with automatic query slicing. Tens of thousands of announcements per run, flattened to clean rows.

How the query structure works

The underlying call is a GET against the public search endpoint with filter parameters:

https://data.usajobs.gov/api/search
  ?Keyword=cybersecurity
  &JobCategoryCode=2210        # occupational series
  &Organization=VATA           # agency code
  &DatePosted=7                # last N days
  &ResultsPerPage=500
  &Page=1

The trick to beating the result cap is to drive pagination off a narrowing dimension. Pull series 2210 separately from series 0343, or split a wide date window into weekly buckets, so each sub-query stays under the cap. The scraper does this automatically when you ask for a broad pull; doing it by hand means maintaining a list of every agency code and series you care about.
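A minimal sketch of that slicing idea, assuming you already have a way to ask the API how many results a filter matches (the `count_fn` callback here is hypothetical, as are the function names). The cap value is the approximate historical figure quoted earlier, not a documented constant.

```python
import math
from typing import Callable, Dict, List
from urllib.parse import urlencode

BASE_URL = "https://data.usajobs.gov/api/search"
PAGE_SIZE = 500
RESULT_CAP = 10_000  # approximate historical cap; treat as an assumption


def plan_slices(
    series_codes: List[str],
    org_codes: List[str],
    count_fn: Callable[[Dict[str, str]], int],
    cap: int = RESULT_CAP,
) -> List[Dict[str, str]]:
    """Slice by series first; split a series by agency only when it exceeds the cap.

    Note: the agency split only covers the codes in org_codes, and a single
    series-plus-agency slice could in principle still exceed the cap.
    """
    slices: List[Dict[str, str]] = []
    for series in series_codes:
        by_series = {"JobCategoryCode": series}
        if count_fn(by_series) <= cap:
            slices.append(by_series)
            continue
        for org in org_codes:
            slices.append({"JobCategoryCode": series, "Organization": org})
    return slices


def page_urls(params: Dict[str, str], total: int) -> List[str]:
    """Build one URL per page needed to cover `total` results for a slice."""
    pages = math.ceil(total / PAGE_SIZE)
    return [
        f"{BASE_URL}?{urlencode({**params, 'ResultsPerPage': PAGE_SIZE, 'Page': page})}"
        for page in range(1, pages + 1)
    ]
```

For example, a series reporting 800 matches yields two page URLs, while a series over the cap is expanded into one slice per agency code before any pages are requested.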

Build it yourself vs. use a managed scraper

The honest trade-off:

Roll your own — a half day to a working single-query script, then a steady tail of edge cases: the cap-slicing logic, the retry policy, the schema flattening, and re-checking the auth header convention whenever USAJobs revises its developer docs.
Managed actor — running in minutes, output already flat, slicing handled, retries built in. You pay per run, and the per-run cost stays close to zero because the data source itself is free and fast.

For a one-off snapshot, the script is fine. For a scheduled feed that has to be reliably complete, the slicing-and-stitching logic is exactly the part you don’t want to own.

Schema design for downstream use

A clean per-announcement row for analytics or a job board:

{
  "control_number": "832145900",
  "announcement_number": "VATA-26-CY-1234567",
  "position_title": "Information Technology Specialist (INFOSEC)",
  "department": "Department of Veterans Affairs",
  "agency": "Veterans Health Administration",
  "occupational_series": "2210",
  "pay_system": "GS",
  "grade_low": "11",
  "grade_high": "12",
  "salary_min": 86962,
  "salary_max": 135987,
  "pay_basis": "Per Year",
  "hiring_paths": ["public", "vet"],
  "clearance": "Public Trust - Background Investigation",
  "telework_eligible": true,
  "travel_percent": "Occasional travel",
  "openings": 5,
  "locations": ["Austin, TX", "Remote"],
  "open_date": "2026-05-18",
  "close_date": "2026-06-01",
  "apply_url": "https://www.usajobs.gov/job/832145900",
  "scraped_at": "2026-05-20T09:00:00Z"
}

Schema choices worth making early:

Use control_number as the primary key, never the position title. The same title appears under dozens of announcements across agencies.
Store both grade_low/grade_high and the salary range. Salary is what humans read; grade is what lets you compare roles across agencies on a common ladder.
Keep locations as an array. Federal announcements routinely list 10+ duty stations under one control number; flattening to a single string loses geography you’ll want for heatmaps.
Always record close_date. A federal-jobs feed that surfaces closed announcements is worse than no feed; filter on it at query time.
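The first and last of these choices can be sketched in a few lines, assuming rows shaped like the example above with `close_date` stored as an ISO `YYYY-MM-DD` string. The in-memory dict stands in for whatever table or store you actually use, and the function names are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List


def upsert(store: Dict[str, Dict[str, Any]], rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Insert or overwrite rows keyed on control_number, so re-runs never duplicate."""
    for row in rows:
        store[row["control_number"]] = row
    return store


def open_rows(store: Dict[str, Dict[str, Any]], today: date) -> List[Dict[str, Any]]:
    """Return only announcements whose close_date is today or later.

    Rows without a close_date are excluded, since they cannot be shown
    to be open.
    """
    return [
        row
        for row in store.values()
        if row.get("close_date") and date.fromisoformat(row["close_date"]) >= today
    ]
```

Keying on `control_number` means a later scrape of the same announcement replaces the earlier row, picking up edits such as an extended close date.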

Typical use cases

Labor-market research — track federal hiring volume by agency, series and salary band over time; quantify which departments are expanding.
Recruitment intelligence — staffing firms and career coaches surface openings filtered by clearance level, occupational series or department.
Job boards and aggregators — populate a federal-jobs vertical with structured, deduplicated, currently-open announcements.
Economists and journalists — monitor public-sector employment as an economic signal, or investigate hiring surges tied to policy.
Veteran and student services — filter on hiring paths to surface only the announcements a given audience is eligible for.

The value is in completeness and freshness. A partial pull that silently hit the result cap is a research liability; a scheduled, fully-sliced daily feed is reusable infrastructure.

Cost math for the managed approach

Because the USAJobs API is free and HTTP-only — no proxy, no browser, no anti-bot bandwidth — the cost floor is basically compute. A daily full-corpus pull of tens of thousands of announcements runs in single-digit dollars per month. Compare that to the engineering cost of owning the slicing logic and re-validating it whenever USAJobs ships a docs update: the maintenance time dwarfs the runtime cost.

Common pitfalls

The silent result cap — the number-one mistake. A broad query returns “results” and looks complete while truncating at ~10K. Always slice and verify total counts against the API’s reported SearchResultCountAll.
Stale closed listings — announcements vanish from search after close, but a cached row lingers in your DB. Re-run and diff, or filter on close_date.
Series vs. category confusion — JobCategoryCode is the 4-digit occupational series, not a free-text category. Map your taxonomy to the official series codes.
Salary nulls and pay basis — some pay systems (such as WG wage-grade) express pay per hour, not per year. Read pay_basis before comparing salary figures across rows, and keep missing salary values as nulls rather than zeros.
Multi-location double counting — if you explode locations into separate rows for a map, dedupe on control_number before counting “jobs.”
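Two of these checks are easy to automate. The sketch below assumes each parsed page nests its counts and items under a `SearchResult` object with a `SearchResultItems` list; that nesting is an assumption to verify against real responses, and both function names are hypothetical.

```python
from typing import Any, Dict, List


def slice_is_complete(pages: List[Dict[str, Any]]) -> bool:
    """Compare the items actually fetched for one slice with the reported total."""
    if not pages:
        return False
    reported = (pages[0].get("SearchResult") or {}).get("SearchResultCountAll")
    fetched = sum(
        len((page.get("SearchResult") or {}).get("SearchResultItems") or [])
        for page in pages
    )
    return reported is not None and int(reported) == fetched


def count_jobs(exploded_rows: List[Dict[str, Any]]) -> int:
    """Count distinct announcements when rows were exploded one per location."""
    return len({row["control_number"] for row in exploded_rows})
```

A slice that fails `slice_is_complete` is a signal to re-fetch it or cut it into narrower sub-queries before its rows are trusted.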

Wrapping up

USAJobs is the friendly case: an open, fast, free government API. The work isn’t beating anti-bot — it’s beating the result cap, flattening the nesting, and keeping the feed complete and current. If you need one snapshot, the API docs will get you there in an afternoon. If you need a reliably-complete federal-jobs feed on a schedule, let a managed actor own the slicing and stitching.

▶ Open the USAJobs scraper on Apify — filter by keyword, agency, series and date; full corpus via automatic slicing. Schedule it for a fresh federal-jobs feed.