How to Scrape Steam Game Reviews and Data in 2026
Pull Steam game metadata, prices, Metacritic scores and multilingual user reviews from Steam's public JSON API — what's exposed, how to paginate reviews, and how to do it at scale.
Steam is the largest PC-game distribution platform on earth, and unlike most of the targets in this catalog it barely fights back. Valve exposes two public JSON endpoints, one for store metadata and one for user reviews, and neither requires an API key, a login, or a residential proxy to read. Only the reviews endpoint appears in Valve’s official documentation; the app-details endpoint is public but unofficial. The difficulty isn’t anti-bot; it’s volume, pagination, and stitching together two different endpoints into one clean dataset. This guide covers what Steam exposes, how the review cursor works, and how to pull a few hundred thousand reviews without writing your own retry loop.
What’s worth extracting
Steam splits its data across two surfaces, and a good scraper merges them.
Store metadata (from the app-details endpoint) gives you the catalog side:
- Identity — App ID, game title, type (game / DLC / demo), header image.
- Pricing — current price, original price, discount percent, currency.
- Classification — genres, store categories (single-player, co-op, controller support), developers, publishers.
- Critical reception — Metacritic score and the Metacritic review URL.
- Platform support — Windows / macOS / Linux availability flags.
- Release — release date, coming-soon flag.
User reviews (from the reviews endpoint) give you the sentiment side, one row per review:
- Review body — full review text, in the reviewer’s chosen language.
- Sentiment — the binary voted_up recommendation (thumbs up / down).
- Playtime context — total hours played, and hours played at the moment the review was written (this matters: a 2-hour review and a 400-hour review carry very different weight).
- Engagement — helpful votes, funny votes, comment count.
- Provenance — review language, whether the game was purchased on Steam, whether it was received free, creation and last-updated timestamps.
- Developer response — if the studio replied, the text and timestamp.
For most analyses you want the store record once per game, and then a stream of review rows joined on App ID.
The two endpoints
The store metadata lives at a per-App ID JSON endpoint:
https://store.steampowered.com/api/appdetails?appids=APPID&cc=us&l=english
The cc (country code) parameter controls which currency and price you get back — pin it or you’ll get inconsistent pricing across runs. The reviews live at a separate endpoint that uses cursor-based pagination:
https://store.steampowered.com/appreviews/APPID
?json=1
&filter=recent # recent | updated | all (helpful-weighted)
&language=all # or english, schinese, russian, ...
&num_per_page=100
&cursor=* # first page; then echo back the returned cursor
The cursor pattern is the part people get wrong. Each response hands you a cursor token; you URL-encode it and pass it as the cursor parameter on the next request. You keep going until the returned review batch is empty. There is no page number — if you try to compute offsets you’ll loop forever or skip reviews. A scraper that handles this correctly will walk a popular title’s hundreds of thousands of reviews without dropping or duplicating rows.
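The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `iter_reviews`, `fetch_page`, and the `max_pages` safety cap are hypothetical names chosen for the example, and the use of `recommendationid` as the per-review key and `reviews` / `cursor` as the response keys is an assumption that should be checked against a live payload.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

REVIEWS_URL = "https://store.steampowered.com/appreviews/{app_id}"


def fetch_page(app_id: int, params: Dict[str, str]) -> dict:
    """Fetch one page of reviews; urlencode() escapes the cursor for us."""
    url = REVIEWS_URL.format(app_id=app_id) + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_reviews(
    app_id: int,
    fetch: Callable[[int, Dict[str, str]], dict] = fetch_page,
    language: str = "all",
    review_filter: str = "recent",
    max_pages: int = 10_000,  # arbitrary safety cap (assumption)
) -> Iterator[dict]:
    """Yield each review once, following the cursor until a batch is empty."""
    cursor = "*"  # first page
    seen_cursors = set()
    seen_ids = set()
    for _ in range(max_pages):
        if cursor in seen_cursors:  # a repeated cursor would loop forever
            break
        seen_cursors.add(cursor)
        page = fetch(app_id, {
            "json": "1",
            "filter": review_filter,
            "language": language,
            "num_per_page": "100",
            "cursor": cursor,
        })
        batch = page.get("reviews") or []
        if not batch:  # empty batch is the stop condition
            break
        for review in batch:
            rid = review.get("recommendationid")  # assumed ID field
            if rid is not None:
                if rid in seen_ids:  # drop duplicates across pages
                    continue
                seen_ids.add(rid)
            yield review
        cursor = page.get("cursor")  # echo back the returned token
        if not cursor:
            break
```

Passing the fetch function in as a parameter keeps the loop testable offline: a stub that returns canned pages exercises the stop condition and the deduplication without touching the network.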
▶ Run the Steam Game & Reviews Scraper — feed it App IDs, store URLs, or a keyword search and get merged game-detail and review records. Cursor pagination, language filtering, and sentiment sorting handled. No proxy or auth needed.
Sort order and language filtering
The filter parameter decides which reviews you see first, and it interacts with completeness:
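- recent — newest first. Use this for time-series sentiment (“did the last patch crater the reviews?”).
- updated — ordered by last edit, surfaces reviews people came back to revise.
- all — helpfulness-weighted. Best for a representative sample if you don’t need every single row. Steam’s documentation describes this mode as always finding results to return, so the empty-batch stop condition may never fire; use recent or updated when you need an exhaustive pull.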
Language filtering is one of Steam’s most useful features for NLP work. Setting language=schinese versus language=english versus language=all lets you build clean per-language corpora. Chinese, Russian, and English are typically the three highest-volume languages on big titles, and their sentiment can diverge sharply on the same game — invaluable signal for localization and regional pricing decisions.
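To make the divergence concrete, here is a small sketch that computes the share of positive reviews per language from already-collected rows. It assumes each row carries the `review_language` and `voted_up` fields used in this guide's output schema; `positive_share_by_language` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable


def positive_share_by_language(rows: Iterable[dict]) -> Dict[str, float]:
    """Return the fraction of thumbs-up reviews for each review language."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        lang = row["review_language"]
        totals[lang] += 1
        if row["voted_up"]:
            positives[lang] += 1
    return {lang: positives[lang] / totals[lang] for lang in totals}
```

On a real corpus you would likely want a minimum sample size per language before comparing shares, since low-volume languages swing wildly on a handful of reviews.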
Schema design for downstream use
Because two endpoints feed the dataset, decide early whether you emit one denormalized row per review (game fields repeated) or two related tables. For warehouses and joins, a per-review row that carries the App ID plus key game fields is the most convenient:
{
  "app_id": 1245620,
  "game_title": "ELDEN RING",
  "price": 59.99,
  "discount_percent": 0,
  "currency": "USD",
  "genres": ["Action", "RPG"],
  "developers": ["FromSoftware, Inc."],
  "metacritic_score": 96,
  "review_id": "184392011",
  "voted_up": true,
  "review_text": "Prepare to die, then prepare to love it.",
  "review_language": "english",
  "playtime_forever_hours": 312.4,
  "playtime_at_review_hours": 88.1,
  "votes_helpful": 1204,
  "votes_funny": 73,
  "purchased_on_steam": true,
  "received_free": false,
  "developer_response": null,
  "review_created_at": "2026-04-22T09:14:00Z",
  "scraped_at": "2026-05-20T12:00:00Z"
}
A few choices worth making up front:
- Keep playtime_at_review_hours separate from playtime_forever_hours. The first is the credibility signal; the second is how invested the player is now. Collapsing them loses the most interesting feature.
- Store review_language on every row so you can slice corpora later without a re-scrape.
- Preserve voted_up as a boolean, not a string. It’s your label column for any sentiment model.
- Stamp scraped_at. Steam reviews and “recent reviews” summaries shift with every sale and patch.
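One way to apply these choices is a small flattening function that merges a game record with a raw review. This is a sketch under stated assumptions: the raw-side field names (`recommendationid`, `author.playtime_forever`, `author.playtime_at_review`, `votes_up`, `steam_purchase`, `received_for_free`, `timestamp_created`) and the idea that playtime arrives in minutes and timestamps in Unix seconds reflect the reviews response as commonly observed, and should be verified against a live payload. `build_row` and its helpers are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Optional

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def minutes_to_hours(minutes: Optional[int]) -> Optional[float]:
    """Convert playtime from minutes (assumed raw unit) to hours."""
    return None if minutes is None else round(minutes / 60, 1)


def to_iso(unix_seconds: Optional[int]) -> Optional[str]:
    """Render a Unix timestamp as a UTC ISO 8601 string."""
    if unix_seconds is None:
        return None
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime(ISO_FORMAT)


def build_row(game: dict, review: dict, scraped_at: datetime) -> dict:
    """Merge game-level fields with one raw review into a denormalized row."""
    author = review.get("author") or {}
    return {
        **game,  # app_id, game_title, price, ... repeated on every row
        "review_id": str(review["recommendationid"]),
        "voted_up": bool(review["voted_up"]),  # keep the label as a boolean
        "review_text": review.get("review", ""),
        "review_language": review.get("language"),
        # The two playtime figures stay in separate columns.
        "playtime_forever_hours": minutes_to_hours(author.get("playtime_forever")),
        "playtime_at_review_hours": minutes_to_hours(author.get("playtime_at_review")),
        "votes_helpful": review.get("votes_up", 0),
        "votes_funny": review.get("votes_funny", 0),
        "purchased_on_steam": bool(review.get("steam_purchase")),
        "received_free": bool(review.get("received_for_free")),
        "developer_response": review.get("developer_response"),
        "review_created_at": to_iso(review.get("timestamp_created")),
        "scraped_at": scraped_at.astimezone(timezone.utc).strftime(ISO_FORMAT),
    }
```

Taking `scraped_at` as an argument rather than calling the clock inside the function means every row from one run shares the same stamp, which makes runs easy to group and compare later.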
Typical use cases
- Game market research — track pricing, discount cadence, genre saturation and release timing across a competitor set or a whole publisher’s catalog.
- Sentiment / NLP corpora — assemble multilingual, playtime-weighted review datasets where each row carries a clean thumbs-up/down label and an hours-played credibility signal.
- AI training data — Steam reviews are long-form, opinionated, and tagged by language, which makes them a strong fine-tuning and RAG source for game-domain assistants.
- Indie developer intelligence — pull every review for the top titles in your genre to see exactly what players praise and rage about.
- Price and discount tracking — watchlist a set of App IDs and re-run on a schedule to catch sales the moment they go live.
The common thread is freshness plus breadth. A single game’s review history is interesting; a genre’s full review corpus refreshed weekly is a research asset.
Build it yourself vs. use a managed scraper
Steam is genuinely scrapeable by hand — the endpoints are public and stable. The reasons to use a managed actor anyway:
- The cursor loop is fiddly. Getting pagination, deduplication, and the empty-batch stop condition right across thousands of titles is the part that eats an afternoon and then breaks silently.
- Rate limits exist even without auth. Hammer the reviews endpoint and Valve will throttle you; a managed actor already paces requests and can route through a proxy when needed.
- Two-endpoint stitching. Merging store metadata with the review stream per App ID, in the right currency and language, is boilerplate you’d rather not own.
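If you do build it yourself, the pacing concern can be handled with a small retry wrapper. The sketch below uses exponential backoff around any callable that performs a request; `with_backoff` is a hypothetical helper, and treating HTTP 429 and 5xx responses as the throttling signal is an assumption, since this guide does not specify how Steam reports rate limiting.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on 429/5xx with delays of base_delay * 2**attempt."""
    for attempt in range(retries):
        try:
            return call()
        except urllib.error.HTTPError as err:
            transient = err.code == 429 or err.code >= 500  # assumed signals
            if not transient or attempt == retries - 1:
                raise  # permanent error, or out of attempts
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be at least 1")
```

The `sleep` parameter exists so the delay schedule can be checked in tests without actually waiting; in production the default `time.sleep` applies.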
Cost math
This actor is pay-per-event with results priced at zero — you pay only the tiny per-run start fee (about $0.00005) plus Apify platform compute, with no per-row charge. That makes large multilingual review pulls extraordinarily cheap: scraping every review for a few hundred games is effectively the cost of the compute time, not the data volume. Compared to standing up your own paced crawler on a VPS and babysitting the cursor logic, you skip the build week entirely and the running cost is rounding-error territory.
Common pitfalls
- Don’t paginate by offset. Steam reviews are cursor-based. Echo the returned cursor; stop on an empty batch.
- Pin the country code. Prices and currency follow cc. Leave it unset and you’ll mix USD, EUR, and regional pricing in one dataset.
- In edge cases, language=all does not return the union of every per-language filter applied separately. If you need guaranteed per-language completeness, pull each language explicitly.
- Review counts on the store page are summaries, not the full set. The “Very Positive (40,000)” badge is a rollup; the actual reviews come only from the paginated endpoint.
- Metacritic isn’t present for every title. Many indie and newer games have no Metacritic entry — treat the field as nullable.
- Free-key and reviewer-program reviews exist. The received_free and purchased_on_steam flags let you filter out review-copy bias; use them if you care about organic sentiment.
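Several of these pitfalls can be handled defensively when flattening the store record. The following is a sketch, assuming the app-details response is keyed by App ID with a `success` flag and a `data` object, that `price_overview.final` is expressed in minor currency units (cents), and that `metacritic` and `price_overview` may be absent. Those shapes are assumptions about an unofficial endpoint and should be confirmed against a live response; `parse_app_details` is a hypothetical name.

```python
from typing import Optional


def parse_app_details(payload: dict, app_id: int) -> Optional[dict]:
    """Flatten one app-details response; return None if the lookup failed."""
    entry = payload.get(str(app_id)) or {}
    if not entry.get("success"):
        return None
    data = entry.get("data") or {}
    price = data.get("price_overview") or {}  # may be missing, e.g. free titles
    metacritic = data.get("metacritic") or {}  # missing for many games
    return {
        "app_id": app_id,
        "game_title": data.get("name"),
        # Assumed to be in minor units, so 5999 becomes 59.99.
        "price": price["final"] / 100 if "final" in price else None,
        "discount_percent": price.get("discount_percent", 0),
        "currency": price.get("currency"),
        "genres": [g["description"] for g in data.get("genres", [])],
        "developers": data.get("developers", []),
        "metacritic_score": metacritic.get("score"),  # nullable by design
    }
```

Returning `None` for a failed lookup, instead of raising, lets a bulk run skip delisted or region-locked App IDs and keep going.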
Wrapping up
Steam is one of the friendliest large datasets on the open web — no key, no login, light rate-limiting — but the cursor pagination and two-endpoint merge are exactly the kind of plumbing that’s tedious to build and easy to get subtly wrong. If you just need clean, multilingual, playtime-tagged review data joined to game metadata, run a managed actor and spend your time on the analysis instead.
▶ Open the Steam Game & Reviews Scraper on Apify — bulk App IDs, store URLs or keyword search in; merged game + review rows out. Results are priced at zero per row. Start with Apify’s free monthly credit.