
Indexed Though Blocked by robots.txt: What It Means + How to Fix It

“Indexed, though blocked by robots.txt” in Google Search Console means Google indexed your page without being able to crawl it. Here's why it happens — and how to fix it.

Updated May 7, 2026

TL;DR

“Indexed, though blocked by robots.txt” means Google has the URL in its index but your robots.txt won't let it crawl the content — so it ranks with no real snippet. robots.txt blocks crawling, not indexing. If the page should be indexed, remove the Disallow. If it shouldn't, robots.txt is the wrong tool: allow crawling and add a noindex tag instead.

The contradiction in this status dissolves once you accept one fact: robots.txt controls crawling, not indexing. Blocking a URL in robots.txt tells Google “don't fetch this”; it does not say “don't list this.” So Google can, and sometimes does, put the URL in the index on outside signals alone (most often inbound links from other sites), having never read a word of the page.

The result is the worst of both worlds. The URL shows up in search, but with no title, a generic or stale description, or the giveaway snippet “No information is available for this page.” Google is ranking a page it can't see.

So the real question isn't “how do I make this go away?” It's “did I want this page indexed or not?” The two answers need opposite fixes, and picking the wrong one is how most people get stuck here.

First, confirm what Google can and can't do

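Open Indexing → Pages, click “Indexed, though blocked by robots.txt” (because these URLs are indexed, the status is listed among the issues for indexed pages, not under Why pages aren't indexed), and pull one URL into URL Inspection. Two fields tell you everything: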

Crawl allowed: No — confirms a robots.txt rule is blocking the fetch.
Page fetch: Failed (blocked) alongside Indexing: URL is on Google — confirms the page is indexed despite that block.

If Crawl allowed says Yes, you're not looking at this status; recheck the report. While you're in URL Inspection, note the URL's exact path, then find the Disallow: line in your robots.txt that matches it. That's the rule you'll be editing.

[Screenshot: GSC URL Inspection panel showing “Crawl allowed: No” and “Indexing: URL is on Google” for a single affected URL, with the blocking robots.txt rule called out.]

This status usually means Google discovered the URL through links, most often from other sites, or through a sitemap entry. It found the URL, couldn't crawl it, and indexed the bare URL anyway. That's why you can't make it disappear just by tightening robots.txt: the block is exactly what's causing the problem.
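On more than a handful of URLs, the same checks can be scripted. Search Console's URL Inspection API returns an indexStatusResult for each inspected URL. The minimal sketch below assumes you have already fetched and JSON-parsed one response (authentication and the request itself are left out). The field names robotsTxtState, pageFetchState, coverageState and referringUrls, and the values DISALLOWED and BLOCKED_ROBOTS_TXT, are taken from that API's reference as best understood here; verify them against the current docs. The helper name summarize_inspection and the sample response are hypothetical.

```python
from typing import Any, Dict


def summarize_inspection(response: Dict[str, Any]) -> Dict[str, Any]:
    """Condense one parsed URL Inspection API response into the fields that matter here.

    Assumes the shape {"inspectionResult": {"indexStatusResult": {...}}};
    missing fields are treated as unknown instead of raising.
    """
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    coverage = status.get("coverageState", "")
    return {
        "coverage": coverage,
        # Roughly the UI's "Crawl allowed: No".
        "crawl_blocked": status.get("robotsTxtState") == "DISALLOWED",
        # Roughly the UI's "Page fetch: Failed (blocked)".
        "fetch_blocked": status.get("pageFetchState") == "BLOCKED_ROBOTS_TXT",
        # The status this article covers, matched on its report label.
        "indexed_though_blocked": coverage == "Indexed, though blocked by robots.txt",
        # Pages that led Google to this URL, if the API reported any.
        "referring_urls": list(status.get("referringUrls", [])),
    }


# A made-up response, shaped like the API's output, for illustration.
sample = {
    "inspectionResult": {
        "indexStatusResult": {
            "coverageState": "Indexed, though blocked by robots.txt",
            "robotsTxtState": "DISALLOWED",
            "pageFetchState": "BLOCKED_ROBOTS_TXT",
            "referringUrls": ["https://example.com/blog/roundup"],
        }
    }
}
print(summarize_inspection(sample))
```

The referring_urls list is worth a look: it often shows exactly which link pulled the blocked URL into the index.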

Decide which page this is

Everything downstream depends on this one call. Check the affected URL against both descriptions below; only one will fit.

This page should be indexed

You blocked it by accident — a leftover Disallow: from a migration, an overly broad rule like Disallow: /products/ that swept up pages you want, or a staging rule that shipped to production.

Tell: it's a real page you'd be happy to see rank — a product, an article, a category — and you can't remember deliberately wanting it out of Google.

This page should not be indexed

It's a thank-you page, a faceted-filter URL, an internal search result, a PDF you'd rather hide — and you reached for robots.txt to keep it out of Google. That's the wrong tool: robots.txt stops the crawl but not the listing, which is precisely why it's stuck here.

Tell: you intended to keep this URL out of search, and robots.txt was your method.

Fix it for the page you have

Match the path to the call you just made — you only need one.

Should be indexed: remove the Disallow, then re-crawl
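Open https://yourdomain.com/robots.txt and find the Disallow: rule that matches the URL, for example Disallow: /products/. Delete it, or narrow it so it no longer matches the URL you want crawled. Specificity matters more than order: Google applies the most specific (longest) matching rule wherever it sits in the file, and when an Allow: and a Disallow: of equal length both match, the Allow: wins. So Allow: /products/shoes/ rescues /products/shoes/ from a broad Disallow: /products/, but an Allow: shorter than the blocking rule won't. After editing, check GSC's robots.txt report to confirm Google has fetched the new file (you can request a recrawl of it there).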
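Then re-run URL Inspection using Test live URL (the default view reflects Google's last crawl, which will still show the old block), confirm Crawl allowed: Yes, and click Request indexing. Google re-crawls, reads the real content, and rebuilds the index entry with a proper title and snippet.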
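To see which rule decides a given path before you edit the live file, here is a minimal sketch of Google's documented matching: the longest matching Allow:/Disallow: path wins regardless of its position in the file, a tie goes to Allow:, `*` matches any run of characters, and a trailing `$` anchors the end of the URL. It deliberately ignores User-agent: groups (every rule is treated as applying to Googlebot) and percent-encoding, and the function names are hypothetical. Python's standard urllib.robotparser isn't used because it applies the first matching rule in file order, which can disagree with Googlebot on overlapping rules.

```python
import re
from typing import List, Optional, Tuple


def parse_rules(robots_txt: str) -> List[Tuple[str, str]]:
    """Collect (directive, path) pairs for Allow/Disallow lines.

    Simplification: User-agent groups are ignored, so every rule is treated
    as if it applied to Googlebot.
    """
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        # An empty "Disallow:" blocks nothing, so empty values are skipped.
        if key.lower() in ("allow", "disallow") and value:
            rules.append((key.lower(), value))
    return rules


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Prefix match; '*' matches any characters, a trailing '$' anchors the end."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.match(pattern + ("$" if anchored else ""), url_path) is not None


def check_path(robots_txt: str, url_path: str) -> Tuple[bool, Optional[str]]:
    """Return (allowed, deciding_rule) using longest-match, ties going to Allow."""
    best: Optional[Tuple[int, bool, str]] = None
    for directive, rule_path in parse_rules(robots_txt):
        if rule_matches(rule_path, url_path):
            candidate = (len(rule_path), directive == "allow",
                         f"{directive.capitalize()}: {rule_path}")
            # Longer rule wins; on equal length, True (Allow) beats False (Disallow).
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return True, None  # no rule matches, so crawling is allowed
    return best[1], best[2]


robots = """User-agent: *
Disallow: /products/
Allow: /products/shoes/
"""
print(check_path(robots, "/products/shoes/red-sneaker"))  # (True, 'Allow: /products/shoes/')
print(check_path(robots, "/products/hats/"))              # (False, 'Disallow: /products/')
```

Swapping the two rule lines gives the same answers, which is the point: for Google, position in the file doesn't decide the outcome.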
Should NOT be indexed: allow crawling, then noindex
This is the counterintuitive part. To get a page out of the index, you have to let Google in. The deindex signal is a noindex tag — and Google can only see it by crawling the page.
First, remove the Disallow: rule from robots.txt so the page is crawlable. Then add a noindex directive to the page itself:
```
<!-- Goes inside the page's <head>; Google only sees it if the page is crawlable -->
<meta name="robots" content="noindex" />
```
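Now Google can fetch the page, read the noindex, and drop it from the index, typically within days to a few weeks. For a PDF or other non-HTML file, which can't carry a meta tag, send the same directive as an HTTP response header instead: X-Robots-Tag: noindex. Once the URL is gone you can re-block it in robots.txt, but you rarely need to, and if links keep pointing at it, the block can put the URL straight back into this status. The noindex does the job on its own. See Excluded by noindex tag for what the clean end-state looks like.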
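Before waiting on Google, it helps to confirm the noindex is actually being served, either as a meta tag or as an X-Robots-Tag header. A minimal sketch using only the standard library, assuming you have already fetched the page's HTML and response headers yourself (fetching is left out). The function and class names are hypothetical, and the header parsing is simplified: a "botname:" prefix is stripped but not checked against Googlebot.

```python
from html.parser import HTMLParser
from typing import List, Mapping


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags land here too.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def header_directives(headers: Mapping[str, str]) -> List[str]:
    """Split X-Robots-Tag values into directives, dropping an optional 'botname:' prefix."""
    directives = []
    for name, value in headers.items():
        if name.lower() != "x-robots-tag":
            continue
        for token in value.split(","):
            token = token.strip().lower()
            if ":" in token:
                token = token.split(":", 1)[1].strip()
            directives.append(token)
    return directives


def has_noindex(html: str, headers: Mapping[str, str]) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = parser.directives + header_directives(headers)
    # "none" is shorthand for "noindex, nofollow".
    return any(d in ("noindex", "none") for d in directives)


print(has_noindex('<head><meta name="robots" content="noindex" /></head>', {}))
print(has_noindex("", {"X-Robots-Tag": "noindex"}))  # e.g. a PDF response
```

This only confirms what the server sends; it says nothing about whether robots.txt lets Google fetch it.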
Then pull it out of your sitemap
For the deindex path: if the URL is in your XML sitemap, remove it. A sitemap entry tells Google this URL matters — the opposite of the noindex signal you're sending. Leaving it in keeps re-suggesting the page Google is trying to forget.
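If the sitemap is a static file, pulling the URL can be scripted. A minimal sketch with the standard library's ElementTree, assuming a plain <urlset> sitemap in the standard sitemaps.org namespace (sitemap index files and generated sitemaps are out of scope); drop_from_sitemap is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Set

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_from_sitemap(sitemap_xml: str, urls_to_drop: Set[str]) -> str:
    """Return the sitemap with every <url> whose <loc> is in urls_to_drop removed."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    ns = {"sm": SITEMAP_NS}
    # findall() returns a list, so removing children while looping over it is safe.
    for url_el in root.findall("sm:url", ns):
        loc = url_el.find("sm:loc", ns)
        if loc is not None and (loc.text or "").strip() in urls_to_drop:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/keep</loc></url>"
    "<url><loc>https://example.com/thank-you</loc></url>"
    "</urlset>"
)
print(drop_from_sitemap(sample_sitemap, {"https://example.com/thank-you"}))
```

The returned string has no XML declaration; writing with ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True) adds one if you save to a file instead.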

The one mistake that traps everyone here

Adding a noindex tag while the page is still blocked in robots.txt does nothing. Google can't crawl the page, so it never sees the noindex — the URL stays indexed indefinitely. noindex and a robots.txt block are mutually defeating: pick one, and for deindexing it must be noindex with crawling allowed.
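The two fix paths, and this trap, reduce to a few yes/no facts per URL. A minimal sketch, assuming you've already gathered those facts from URL Inspection and the page itself; PageState and next_steps are hypothetical names, and the step wording just paraphrases this guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageState:
    """Hypothetical snapshot of one affected URL."""
    should_be_indexed: bool  # your call from "Decide which page this is"
    blocked_by_robots: bool  # URL Inspection shows Crawl allowed: No
    has_noindex: bool        # meta robots tag or X-Robots-Tag header present
    in_sitemap: bool


def next_steps(page: PageState) -> List[str]:
    steps: List[str] = []
    if page.should_be_indexed:
        if page.blocked_by_robots:
            steps.append("Remove or narrow the Disallow rule that matches this URL")
        steps.append("Run a live URL Inspection test, confirm Crawl allowed: Yes, request indexing")
        return steps

    if page.blocked_by_robots:
        # The trap: while the block stays, Google never sees the noindex.
        steps.append("Remove the Disallow rule so Google can crawl the page and see the noindex")
    if not page.has_noindex:
        steps.append("Add a noindex meta tag (or X-Robots-Tag header for non-HTML files)")
    if page.in_sitemap:
        steps.append("Remove the URL from the XML sitemap")
    steps.append("Leave crawling allowed until the URL moves to Excluded by 'noindex' tag")
    return steps


# Blocked + noindex: the combination that never deindexes on its own.
trapped = PageState(should_be_indexed=False, blocked_by_robots=True,
                    has_noindex=True, in_sitemap=True)
for step in next_steps(trapped):
    print("-", step)
```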

Confirm the fix landed

Give it a few days to a few weeks, then verify against what you were going for:

If you wanted it indexed — re-inspect in URL Inspection for Crawl allowed: Yes, then watch the Performance report for that page; impressions and a real snippet appearing is the proof it's indexed and readable.
If you wanted it deindexed — re-inspect and confirm Google processed the noindex; the URL should move to Excluded by ‘noindex’ tag in the Pages report and fall out of site:yourdomain.com/your-page-url results.

If a page you wanted gone is still listed after a couple of weeks, the usual cause is that the block went back on too soon — Google never got a clean crawl with the noindex visible. Lift the block, leave it lifted until the page clears, and re-check.

Don't confuse it with these neighbors

Status	What it really means	Fix lives at
Indexed, though blocked by robots.txt	Google indexed the URL from outside signals but can't crawl the content	This page
Blocked by robots.txt	Crawling is blocked and the URL usually isn't indexed at all	Blocked by robots.txt guide
Excluded by ‘noindex’ tag	You deliberately deindexed it with a noindex Google was able to read	Excluded by noindex guide
Duplicate without user-selected canonical	Google picked a different version of the page to index	Duplicate canonical guide

The line that sets this apart: it's an indexed page, and the block is the cause of the problem — not the solution to it.

Stop untangling block-vs-index by hand

The hard part isn't editing robots.txt — it's spotting these on a large site, where a single broad Disallow: can strand dozens of URLs in this bucket, mixing pages you want indexed with ones you don't.

TurboConsole reads your Search Console data, flags every page indexed-but-blocked, names the exact robots.txt rule behind it, and tells you per page whether to unblock or deindex — so you fix the right ones the right way instead of inspecting URLs one at a time.

Frequently asked

How can a page be both indexed and blocked from crawling?

Google can index a URL on external signals alone — inbound links, sitemap entries, or historical crawl data — even when it currently can't fetch the page content. The URL appears in results, but Google has no fresh content to rank or describe it with, so the snippet often reads “No information is available for this page.”

Will using both a robots.txt block and a noindex tag deindex the page?

No — and this is the most common mistake. If the page is blocked in robots.txt, Google can't crawl it to see the noindex tag, so the noindex is invisible. To deindex, you must allow crawling first, let Google process the noindex, and only consider blocking in robots.txt afterward (usually you don't need to).

How long until Google removes a page from the index after I add noindex?

Once Google can crawl the page and see the noindex, removal usually happens within days to a few weeks. Request indexing in URL Inspection to nudge it. For urgent cases, the Removals tool gives a roughly 6-month suppression while the noindex propagates.

How is this different from “Blocked by robots.txt”?

“Blocked by robots.txt” means Google isn't crawling the page and usually isn't indexing it either. “Indexed, though blocked by robots.txt” is the problematic variant — Google indexed the URL anyway from outside signals but can't read the content. Same rule, different indexing outcome.
