Googlebot is burning crawl budget on faceted-nav URLs — how do you actually stop it?

9,140 views · 171 replies · 1.2k upvotes · 6 days ago
Daniel S. · Indexed
Posted 6 days ago · Original poster

Large e-com site, ~80k real products, but the faceted navigation (color, size, price, sort) generates millions of parameter URL combos. GSC Crawl Stats shows Googlebot spending the majority of requests on these junk URLs, and new products take 2–3 weeks to get indexed.

I know the textbook answers (robots.txt, canonical, nofollow) but they each have tradeoffs. What's the setup people actually run in production in 2026?

Grace V. · SERP Master
Best answer · 6 days ago

Canonical tags do NOT save crawl budget — Google still has to crawl the URL to see the canonical. So if budget is the problem, canonicals alone won't fix it. Here's the layered setup that works:

1) Decide which facets have search demand. 'red-dress' has demand and should be a real, indexable, statically-linked landing page. 'sort=price_desc&view=grid' has zero demand — pure crawl waste.

2) Block the worthless parameter patterns in robots.txt with Disallow rules (anything with sort=, view=, sessionid=). This stops the crawl at the door. You lose link signals through them, but for sort/view that's nothing.

3) For valuable facet pages, make them crawlable via clean static URLs in your nav and sitemap, self-canonical, kept out of the parameter mess entirely.

4) As belt-and-suspenders, add nofollow to the in-page facet links you don't want followed (Google treats nofollow as a hint, not a guarantee), and keep those URLs out of your XML sitemap.

5) Watch Crawl Stats weekly. Done right, Googlebot reallocates to product/category URLs within a couple weeks and indexing latency drops hard.
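A minimal sketch of how the Disallow patterns in step 2 could be sanity-checked offline before they go live. The pattern list and the helper names (`DISALLOW`, `pattern_to_regex`, `is_blocked`) are assumptions for illustration, not anything from the thread. The matcher only emulates the `*` wildcard and trailing `$` anchor, and it ignores Allow rules and rule precedence.

```python
import re

# Hypothetical Disallow patterns for the no-demand parameters named above.
# Two patterns per parameter so "sort=" only matches as a whole parameter
# name (right after "?" or "&"), not inside another name such as "resort=".
DISALLOW = [
    "/*?sort=", "/*&sort=",
    "/*?view=", "/*&view=",
    "/*?sessionid=", "/*&sessionid=",
]

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$')."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and join them with ".*" where "*" appeared.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

RULES = [pattern_to_regex(p) for p in DISALLOW]

def is_blocked(path_and_query: str) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule.match(path_and_query) for rule in RULES)

if __name__ == "__main__":
    for url in ["/dresses?sort=price_desc&view=grid", "/red-dress", "/dresses?resort=1"]:
        print(url, "->", "blocked" if is_blocked(url) else "crawlable")
```

Running a list of real product and category URLs through `is_blocked` before deploying is a cheap way to catch a pattern that is broader than intended.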

171 replies
Henry M. · Ranking

The 'canonical doesn't save budget' point can't be repeated enough. So many people think rel=canonical is a crawl directive. It is not.

Zoe K. · Indexed

We moved sort/filter to URL fragments (#) for the no-demand combos so they never generate a new crawlable URL. Cut parameter crawl ~70%.

Jack R. · Technical

@Zoe the fragment approach is slept on. JS state in the # instead of ? is clean. Just keep the demand facets as real server-rendered URLs.
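A small sketch of why the fragment approach works, assuming the crawler drops everything after `#` when deciding which URL to request. The example URLs and the helper name `crawlable_url` are hypothetical.

```python
from urllib.parse import urldefrag

def crawlable_url(url: str) -> str:
    """Return the URL without its fragment, i.e. what is actually requested from the server."""
    return urldefrag(url).url

if __name__ == "__main__":
    fragment_variants = [
        "https://example.com/dresses#sort=price_desc",
        "https://example.com/dresses#view=grid",
        "https://example.com/dresses",
    ]
    query_variants = [
        "https://example.com/dresses?sort=price_desc",
        "https://example.com/dresses?view=grid",
    ]
    # Fragment states collapse to one URL; query-string states stay distinct.
    print(len({crawlable_url(u) for u in fragment_variants}))
    print(len({crawlable_url(u) for u in query_variants}))
```

The fragment variants all reduce to the same address, while each query-string variant remains a separate URL that can be discovered and crawled.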

Lily S. · Crawled

Robots.txt disallow scares me. Won't blocked URLs still show up as 'Indexed, though blocked by robots.txt' in GSC?

Grace V. · SERP Master

@Lily only if they're linked or discoverable externally. Combine the disallow with not linking them internally and removing them from sitemaps and you won't see that at scale.

Owen D. · Indexed

Log-file analysis beats GSC Crawl Stats for this. Pull a week of server logs, group by parameter, and you'll see exactly where Googlebot wastes time.

Nora P. · Ranking

+1 logs. Crawl Stats is sampled and aggregated. Raw logs are ground truth for bot behavior.
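A rough sketch of the "group by parameter" log analysis, assuming the common combined access-log format. The regex, the function name `googlebot_hits_by_param`, and the sample lines are illustrative assumptions. It trusts the user-agent string, which can be spoofed, so verifying that the requests really come from Google is left out here.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Assumes the "combined" log format; adjust the regex for your server's format.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<url>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_hits_by_param(lines):
    """Count Googlebot requests per query-parameter name."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        query = urlsplit(m.group("url")).query
        # Count each parameter name once per request.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        if not names:
            counts["(no parameters)"] += 1
        for name in names:
            counts[name] += 1
    return counts

if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    sample = [
        f'66.249.66.1 - - [10/Jan/2026:10:00:00 +0000] "GET /dresses?sort=price_desc&view=grid HTTP/1.1" 200 5120 "-" "{bot}"',
        f'66.249.66.1 - - [10/Jan/2026:10:00:01 +0000] "GET /p/123 HTTP/1.1" 200 8000 "-" "{bot}"',
    ]
    for name, hits in googlebot_hits_by_param(sample).most_common():
        print(name, hits)
```

Fed a week of real logs, the counter shows which parameter names attract the most bot requests, which is a reasonable starting list for the Disallow candidates.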

Leo T. · Technical

Don't forget pagination. ?page=2..n of filtered views is another budget sink. Keep deep pagination thin and let Google find products via sitemaps.

Stella W. · Crawled

How big does a site need to be before crawl budget is even worth worrying about? Mine is 4k pages.

Grace V. · SERP Master

@Stella under ~10k clean URLs with decent authority, basically never — Google crawls you fine. This is a large/parameter-heavy site problem specifically.

Aria L. · Indexed

XML sitemaps with accurate lastmod got our new products crawled in days. Google leans on lastmod again — if you don't lie about it.

Eli C. · Ranking

If you lie about lastmod once, Google stops trusting it sitewide. Only bump it on real content change.
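One way to keep lastmod honest is to derive it from a stored "content last changed" timestamp rather than from the time the sitemap was generated. A minimal sketch, assuming each product record carries a timezone-aware `content_changed_at` field; that field name, `build_sitemap`, and the example URL are hypothetical.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(products) -> str:
    """Build a sitemap whose lastmod reflects the real content-change time."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for product in products:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = product["url"]
        # Use the stored change time, never "now", so lastmod only moves
        # when the page content actually changed.
        changed = product["content_changed_at"].astimezone(timezone.utc)
        ET.SubElement(url, "lastmod").text = changed.strftime("%Y-%m-%dT%H:%M:%SZ")
    return ET.tostring(urlset, encoding="unicode")

if __name__ == "__main__":
    products = [{
        "url": "https://example.com/p/123",
        "content_changed_at": datetime(2026, 1, 10, 8, 30, tzinfo=timezone.utc),
    }]
    print(build_sitemap(products))
```

The important design choice is where `content_changed_at` gets updated: only on edits that change what a visitor sees, not on every template deploy or re-save.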

Hazel B. · Technical

We also handle it at the CDN — return 304s aggressively on unchanged junk URLs, so even when crawled they're cheap.
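A simplified sketch of the conditional-request logic behind a 304, assuming the crawler sends `If-None-Match` with a previously seen ETag. The function names `make_etag` and `respond` are hypothetical, and a real CDN or origin would also need to handle `If-Modified-Since`, weak validators, and lists of ETags.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body (quoted, per HTTP convention)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(request_headers: dict, body: bytes):
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    # Simplification: exact match only; real headers may list several ETags
    # or carry a W/ prefix for weak validators.
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

if __name__ == "__main__":
    page = b"<html>filtered listing</html>"
    status, headers, _ = respond({}, page)
    print(status)
    status, _, payload = respond({"If-None-Match": headers["ETag"]}, page)
    print(status, len(payload))
```

The 304 still costs a request, so this makes junk URLs cheaper to serve rather than removing them from the crawl.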

Miles F. · Crawled

This whole thread should be the canonical (heh) answer for faceted nav. Saving it.

Ruby N. · Indexed

Internal linking discipline solved 80% of this for us before we touched robots.txt. Don't link to what you don't want crawled.
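A quick sketch of a link audit along these lines: scan a rendered page's HTML and flag internal links that carry no-demand parameters. The parameter set, the class name `FacetLinkAudit`, and the helper `flagged_links` are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of parameters you do not want linked internally.
NO_DEMAND_PARAMS = {"sort", "view", "sessionid"}

class FacetLinkAudit(HTMLParser):
    """Collect <a href> values whose query string uses a no-demand parameter."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        query = urlsplit(href).query
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        if names & NO_DEMAND_PARAMS:
            self.flagged.append(href)

def flagged_links(html: str) -> list:
    audit = FacetLinkAudit()
    audit.feed(html)
    return audit.flagged

if __name__ == "__main__":
    page = '<a href="/red-dress">Red dresses</a><a href="/dresses?sort=price_desc">Sort</a>'
    print(flagged_links(page))
```

Run against category templates, this gives a list of links to convert into buttons, fragment state, or form submissions before relying on robots.txt alone.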

Felix H. · Ranking

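Reminder: test robots.txt changes before shipping. The old standalone GSC robots.txt tester has been retired, so check the rules offline first (Google's open-source robots.txt parser works for this) and then confirm what Google fetched in the GSC robots.txt report. One overly broad Disallow typo can cut Googlebot off from your whole catalog.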
