Google’s John Mueller recently explained how query relevancy is determined for pages blocked by robots.txt.
Google has said it will still index pages that are blocked by robots.txt. But how does Google know which queries to rank those pages for?
That’s the question that came up in yesterday’s Google Webmaster Central hangout:
“Nowadays everyone talks about user intent. If a page is blocked by robots.txt, and is ranking, how does Google determine the query relevancy with page content as it’s blocked?”
In response, Mueller says Google obviously cannot look at the content if it’s blocked.
Instead, Google has to find other ways to compare the URL with other URLs competing for the same queries, which is admittedly much harder when the page is blocked by robots.txt.
In most cases, Google will prioritize the indexing of other pages of a site that are more accessible and not blocked from crawling.
Sometimes pages blocked by robots.txt will still rank in search results if Google considers them worthwhile. So how does Google figure that out? The answer comes down to the links pointing to the page.
Ultimately, it wouldn’t be wise to block content with robots.txt and hope Google knows what to do with it.
But if you happen to have content that is blocked by robots.txt, Google will do its best to figure out how to rank it.
You can hear the full answer below, starting at the 21:49 mark:
“If it’s blocked by robots.txt, then obviously we can’t look at the content. So we do have to kind of improvise and find ways to compare that URL with other URLs that are kind of trying to rank for these queries, and that is a lot harder.
Because it’s a lot harder it’s also something where, if you have really good content that is available for crawling and indexing, then usually that’s something we would try to kind of use instead of a random robotted page.
So, from that point of view, it’s not that trivial. We do sometimes show robotted pages in the search results just because we’ve seen that they work really well. When people link to them, for example, we can estimate that this is probably something worthwhile, all of these things.
So it’s something where, as a site owner, I wouldn’t recommend using robots.txt to block your content and hope that it works out well. But if your content does happen to be blocked by robots.txt we will still try to show it somehow in the search results.”
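If you want to check whether a page is affected, here is a minimal sketch using the robots.txt parser in Python’s standard library. The rules, the URLs, and the `is_crawlable` helper are hypothetical examples rather than anything from the hangout, and Google’s own parser may differ in its details, so treat this as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example site (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the example rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL under /private/ is disallowed, so a compliant crawler never sees its content.
print(is_crawlable("https://example.com/private/report.html"))
# A URL outside the disallowed path can be crawled and evaluated on its content.
print(is_crawlable("https://example.com/blog/post.html"))
```

A page that fails this kind of check can still appear in search results, as Mueller describes, but only its URL and the links pointing to it are available as signals.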