How to Run a Full Technical SEO Audit Across a Whole Site in 2026
A practical guide to crawling every page of a site and auditing on-page SEO at scale — titles, meta, headings, canonicals, schema, word count and detected issues.
A technical SEO audit is mostly a counting exercise done at scale: how many pages are missing a title, how many have duplicate meta descriptions, how many have no H1 or three of them, how many are noindex by accident, how many are thin content. You can’t answer those questions by spot-checking a few pages — you need to crawl every URL and extract the on-page signals from each. This guide covers what signals matter, how a headless-free SEO crawl works, and how the per-page economics let you re-audit as often as your site changes.
What’s worth extracting per page
On-page SEO is a fixed set of signals living in each page’s server-rendered HTML. A thorough per-page audit pulls:
- Title: content and length. Too long gets truncated in SERPs; missing is a critical miss.
- Meta description: content and length, with the same length sensitivity.
- Headings: H1 and H2 text and counts. Zero H1s or multiple H1s are classic flags.
- Canonical tag: present or missing, self-referencing or pointing elsewhere.
- Indexability: robots directives and noindex signals. An accidental noindex is the most expensive SEO bug there is.
- Open Graph and Twitter cards: social preview metadata.
- hreflang, charset, viewport: internationalization and mobile signals; a missing viewport is a mobile-usability flag.
- JSON-LD schema.org types: which structured-data types the page declares.
- Image count and alt coverage: how many images, how many missing alt text.
- Internal and external link counts.
- Word count: boilerplate-stripped, so navigation and footer text don’t inflate it. This is what flags thin content.
On top of the raw signals, the audit computes issue flags — missing/duplicate title, meta-description length problems, missing/multiple H1, missing canonical, noindex directive, images missing alt, missing viewport, thin content — so you get a prioritized punch list, not just a data dump.
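As a rough illustration of how raw signals turn into issue flags, here is a minimal sketch using only Python's standard library. It is not the crawler's actual implementation: the class and function names are hypothetical, and every issue name except images_missing_alt (which appears in the sample output in this guide) is an assumed label. It covers only a subset of the signals and skips word count and duplicate detection, which need boilerplate stripping and cross-page comparison.

```python
from html.parser import HTMLParser


class SignalParser(HTMLParser):
    """Collects a handful of on-page signals from server-rendered HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = []
        self.meta_description = None
        self.canonical = None
        self.robots = ""
        self.has_viewport = False
        self.images_total = 0
        self.images_missing_alt = 0
        self._capture = None  # "title" or "h1" while inside that element

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "title":
            self._capture = "title"
        elif tag == "h1":
            self._capture = "h1"
            self.h1.append("")
        elif tag == "meta":
            name = a.get("name", "").lower()
            if name == "description":
                self.meta_description = a.get("content", "")
            elif name == "robots":
                self.robots = a.get("content", "").lower()
            elif name == "viewport":
                self.has_viewport = True
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")
        elif tag == "img":
            self.images_total += 1
            # Assumption: only an absent alt attribute counts as missing;
            # alt="" is treated as a deliberate decorative image.
            if "alt" not in a:
                self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "title":
            self.title += data
        elif self._capture == "h1":
            self.h1[-1] += data


def audit(url, html):
    """Build one audit row with issue flags (issue names are illustrative)."""
    p = SignalParser()
    p.feed(html)
    title = p.title.strip()
    description = (p.meta_description or "").strip()
    issues = []
    if not title:
        issues.append("missing_title")
    if not description:
        issues.append("missing_meta_description")
    if len(p.h1) == 0:
        issues.append("missing_h1")
    elif len(p.h1) > 1:
        issues.append("multiple_h1")
    if not p.canonical:
        issues.append("missing_canonical")
    if "noindex" in p.robots:
        issues.append("noindex")
    if p.images_missing_alt:
        issues.append("images_missing_alt")
    if not p.has_viewport:
        issues.append("missing_viewport")
    return {
        "url": url,
        "title": title,
        "title_length": len(title),
        "meta_description": description,
        "meta_description_length": len(description),
        "h1_count": len(p.h1),
        "h1": [h.strip() for h in p.h1],
        "canonical": p.canonical,
        "indexable": "noindex" not in p.robots,
        "images_total": p.images_total,
        "images_missing_alt": p.images_missing_alt,
        "issues": issues,
    }
```

Calling audit(url, html) on each fetched page yields one row per page. Thresholds such as title length limits are left out on purpose, since the right cut-offs are a judgment call.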
How a headless-free SEO crawl works
The crawler starts from one URL and follows internal links to reach every page on the domain — thousands of pages in a single run. It fetches the server-rendered HTML over plain HTTP (no headless browser) and parses the signals above out of the markup.
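The link-following step can be pictured as a breadth-first traversal restricted to the starting host. The sketch below is an illustration under simplifying assumptions, not the crawler's real code: the names crawl, http_fetch and LinkParser are hypothetical, and it ignores robots.txt, rate limiting, redirects and content-type checks that a production crawler would need.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def http_fetch(url):
    """Plain HTTP GET with no browser; returns the response body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=http_fetch, max_pages=1000):
    """Breadth-first crawl of same-host pages; returns {url: html}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            target = urlparse(absolute)
            if (
                target.scheme in ("http", "https")
                and target.netloc == host
                and absolute not in seen
            ):
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing the fetch function as a parameter keeps the traversal logic testable without network access and makes it easy to swap in a client with retries or throttling.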
The “no browser” choice has a real consequence worth understanding: it audits the server-rendered HTML. For the vast majority of sites — WordPress, server-rendered frameworks, static generators — that’s exactly what Google’s first-pass indexer sees, so it’s the right thing to audit. For a pure client-side single-page app that renders its <title> and meta in JavaScript, an HTTP crawl sees the pre-render shell. If your stack is fully client-rendered, know that going in. For everyone else, HTTP-only is what makes auditing thousands of pages fast and cheap.
▶ Run the Website SEO Audit Crawler — crawls your whole site and returns one audit row per page: title, meta, headings, canonical, schema, word count and a list of detected SEO issues. No login, no browser.
Output schema
```json
{
  "url": "https://example.com/blog/seo-guide",
  "title": "The Complete SEO Guide",
  "title_length": 22,
  "meta_description": "Everything you need to know about SEO in 2026.",
  "meta_description_length": 46,
  "h1_count": 1,
  "h1": ["The Complete SEO Guide"],
  "canonical": "https://example.com/blog/seo-guide",
  "indexable": true,
  "schema_types": ["Article", "BreadcrumbList"],
  "images_total": 8,
  "images_missing_alt": 2,
  "word_count": 1840,
  "internal_links": 24,
  "external_links": 5,
  "issues": ["images_missing_alt"],
  "audited_at": "2026-05-27T10:00:00Z"
}
```
The issues array is the part you’ll actually work from. Count how many pages each issue appears on and you have a remediation backlog ranked by prevalence; weigh severity on top of that, since one accidental noindex matters more than a few missing alt attributes.
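Turning the dataset into a backlog is a small aggregation. A minimal sketch, assuming the rows have been loaded as a list of dicts shaped like the output above (the function names are hypothetical):

```python
from collections import Counter, defaultdict


def rank_issues(rows):
    """Return (issue, page_count) pairs, most widespread issue first."""
    counts = Counter()
    for row in rows:
        # set() so a page counts once per issue even if a flag repeats
        counts.update(set(row.get("issues", [])))
    return counts.most_common()


def pages_by_issue(rows):
    """Map each issue to the list of URLs that carry it."""
    by_issue = defaultdict(list)
    for row in rows:
        for issue in set(row.get("issues", [])):
            by_issue[issue].append(row["url"])
    return dict(by_issue)
```

rank_issues gives the ordering; pages_by_issue gives the URL list to hand to whoever fixes each item.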
Use cases
- Full technical SEO audits in a single run, instead of paying per-page SaaS tools or clicking through pages by hand.
- Content audits to surface thin content, missing meta descriptions, and duplicate titles across a large blog.
- Pre-launch QA. Before a site goes live, confirm every page has a title, a canonical, and is indexable; catch the stray noindex before Google does.
- Scheduled monitoring. Run weekly to track on-page health as a time series; catch regressions the day a deploy introduces them.
- Migration verification. After a redesign or platform move, re-crawl and confirm titles, canonicals, and indexability survived intact.
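For the monitoring and migration cases, the useful artifact is a diff between two runs. The sketch below compares a "before" and an "after" set of audit rows on the fields named above. It is an illustration only: find_regressions is a hypothetical helper, and it assumes URLs are unchanged between runs (after a migration that changes URLs, map old to new first).

```python
def find_regressions(before, after, fields=("title", "canonical", "indexable")):
    """List (url, field, old_value, new_value) for every change between runs."""
    new_rows = {row["url"]: row for row in after}
    changes = []
    for old_row in before:
        url = old_row["url"]
        new_row = new_rows.get(url)
        if new_row is None:
            # Page was in the earlier crawl but not in the later one.
            changes.append((url, "page", "present", "missing"))
            continue
        for field in fields:
            if old_row.get(field) != new_row.get(field):
                changes.append((url, field, old_row.get(field), new_row.get(field)))
    return changes
```

An empty result means the tracked fields survived intact; anything else is a short list to review before or right after the deploy.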
Cost math
Pricing is pay-per-event: a small per-run start fee, no charge per result, and one row per page. SEO audits are high-volume by nature, because every page is a row and big sites have a lot of pages.
- 10,000-page site, one audit row per page.
- One run, results free.
- Cost is the Actor start plus HTTP compute for 10,000 lightweight fetches.
Because results are free, the natural cadence is frequent. A SaaS crawler that charges per page makes you ration audits to once a quarter; free-per-result makes weekly or post-deploy audits a non-decision. That cadence is where the real SEO value lives — catching the accidental noindex in 24 hours instead of three months. (It’s no accident this is the most-run actor in the suite.)
Common pitfalls
- Client-rendered SPAs. As noted, fully JS-rendered metadata won’t be seen by an HTTP crawl. Audit the rendered output separately if that’s your stack.
- Parameterized duplicates. ?utm= and faceted URLs create near-duplicate pages that inflate “duplicate title” counts. Canonicalize before drawing conclusions.
- Issue flags are heuristics. “Thin content” at, say, under 300 words is a reasonable default but not gospel for every page type (a contact page is supposed to be short). Read flags with judgment.
- Robots and sitemap scope. The crawler follows internal links; if a page is only reachable via the sitemap and orphaned in the link graph, decide whether you also want to seed from the sitemap.
- Schema validity. The audit records which schema types are declared, not whether each validates against Google’s requirements. Validate separately if rich results matter.
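One way to handle the parameterized-duplicates pitfall is to normalize URLs before counting duplicate titles. The sketch below strips tracking parameters and then groups rows by title. It is an assumption-laden illustration: normalize and duplicate_titles are hypothetical helpers, and the rule for what counts as a tracking parameter (a key named utm or starting with utm_) is a guess you would adapt to your own site. Faceted parameters are left alone, since whether they represent distinct pages is site-specific.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def is_tracking(key):
    """Assumed rule: 'utm' or any 'utm_*' key is tracking noise."""
    key = key.lower()
    return key == "utm" or key.startswith("utm_")


def normalize(url):
    """Drop tracking parameters and the fragment; keep everything else."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking(k)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def duplicate_titles(rows):
    """Map each title shared by 2+ distinct normalized URLs to those URLs."""
    urls_by_title = defaultdict(set)
    for row in rows:
        title = (row.get("title") or "").strip()
        if title:
            urls_by_title[title].add(normalize(row["url"]))
    return {t: sorted(u) for t, u in urls_by_title.items() if len(u) > 1}
```

With this in place, a page and its utm-tagged twin collapse into one URL and no longer register as a duplicate title.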
Wrapping up
A technical SEO audit is a crawl-and-count problem, and the only honest way to do it is across every page. Run it once for a baseline, then schedule it so regressions surface the day they ship. With free per-result pricing across thousands of pages, there’s no reason to audit on a quarterly drip when you could audit on every deploy.
▶ Open the Website SEO Audit Crawler on Apify — one audit row per page with detected issues, across your whole site. Schedulable. Start with Apify’s free monthly credit.