In a Google SEO Office Hours hangout, Google’s John Mueller was asked why Google was not crawling enough web pages. The person asking explained that Google’s crawl rate was too slow to keep up with a very large website. Mueller explained why Google might not be crawling enough pages.
What is the Google Crawl Budget?
GoogleBot is the name of Google’s crawler, which goes from web page to web page, fetching them so they can be indexed for ranking purposes.
But because the web is large, Google has a strategy of indexing only higher quality web pages and not indexing low quality web pages.
According to Google’s developer page for huge websites (in the millions of web pages):
“The amount of time and resources that Google devotes to crawling a site is commonly called the site’s crawl budget.
Note that not everything crawled on your site will necessarily be indexed; each page must be evaluated, consolidated, and assessed to determine whether it will be indexed after it has been crawled.
Crawl budget is determined by two main elements: crawl capacity limit and crawl demand.”
What Decides GoogleBot Crawl Budget?
The person asking the question had a site with hundreds of thousands of pages. But Google was only crawling about 2,000 web pages per day, a rate that is too slow for such a large site.
The person then followed up with this question:
“Do you have any other advice for getting insight into the current crawling budget?
Just because I feel like we’ve really been trying to make improvements but haven’t seen a jump in pages per day crawled.”
Google’s Mueller asked the person how big the site is.
The person asking the question answered:
“Our site is in the hundreds of thousands of pages.
And we’ve seen maybe around 2,000 pages per day being crawled even though there’s like a backlog of like 60,000 discovered but not yet indexed or crawled pages.”
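To put those figures in perspective, here is a rough back-of-the-envelope sketch using the two numbers from the quote. It assumes, purely for illustration, that every crawled page comes out of the backlog and that no new pages are discovered in the meantime; the function name is hypothetical.

```python
import math


def days_to_clear_backlog(backlog_pages: int, pages_crawled_per_day: int) -> int:
    """Estimate the days needed to crawl a backlog at a constant daily rate.

    Simplifying assumptions (not from Google): every crawl request hits a
    backlog page, and the backlog does not grow while it is being worked off.
    """
    if pages_crawled_per_day <= 0:
        raise ValueError("pages_crawled_per_day must be positive")
    return math.ceil(backlog_pages / pages_crawled_per_day)


# Figures quoted by the site owner: 60,000 pages waiting, about 2,000 crawled per day.
print(days_to_clear_backlog(60_000, 2_000))  # 30
```

In practice Google also recrawls pages it already knows about, so the real wait would likely be longer than this estimate.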
Google’s John Mueller answered:
“So in practice, I see two main reasons why that happens.
On the one hand if the server is significantly slow, which is… the response time, I think you see that in the crawl stats report as well.
That’s one area where if… like if I had to give you a number, I’d say aim for something below 300, 400 milliseconds, something like that on average.
Because that allows us to crawl pretty much as much as we need.
It’s not the same as the page speed kind of thing.
So that’s… one thing to watch out for.”
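As a minimal sketch of checking a site against that guideline, the snippet below averages a list of server response times and compares the result with the upper end of the range Mueller mentions. The sample values and function names are hypothetical; in practice the figures might come from the crawl stats report or from server logs.

```python
# Upper end of the "300, 400 milliseconds" range from the quote above.
TARGET_MS = 400


def average_response_ms(samples_ms):
    """Return the mean server response time in milliseconds."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    return sum(samples_ms) / len(samples_ms)


def within_target(samples_ms, target_ms=TARGET_MS):
    """True if the average response time is below the target."""
    return average_response_ms(samples_ms) < target_ms


# Hypothetical response times in milliseconds, for illustration only.
samples = [180, 220, 950, 310, 240]
print(average_response_ms(samples))  # 380.0
print(within_target(samples))        # True
```

Note that a single slow response (950 ms here) can pull the average close to the limit even when most requests are fast, so it may be worth looking at the individual values as well as the mean.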
Site Quality Can Impact GoogleBot Crawl Budget
Google’s John Mueller next mentioned the issue of site quality.
Poor site quality can cause GoogleBot to crawl fewer pages on a website.
Google’s John Mueller explained:
“The other big reason why we don’t crawl a lot from websites is because we’re not convinced about the quality overall.
So that’s something where, especially with newer sites, I see us sometimes struggle with that.
And I also see sometimes people saying well, it’s technically possible to create a website with a million pages because we have a database and we just put it online.
And just by doing that, essentially from one day to the next we’ll find a lot of these pages but we’ll be like, we’re not sure about the quality of these pages yet.
And we’ll be a bit more cautious about crawling and indexing them until we’re sure that the quality is actually good.”
Factors that Affect How Many Pages Google Crawls
Other factors that weren’t mentioned can also affect how many pages Google crawls.
For example, a website hosted on a shared server might be unable to serve pages quickly enough to Google. Other sites on the same server may be using excessive resources, which slows the server down for the thousands of other sites hosted on it.
Another reason may be that the server is getting slammed by rogue bots, causing the website to slow down.
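One way to look for this is to count requests per user agent in the server’s access log. The sketch below assumes the user agent is the last double-quoted field on each line, as in the widely used “combined” log format; the log lines and the function name are hypothetical. Keep in mind that user agent strings can be spoofed, so the counts are only a starting point.

```python
import re
from collections import Counter

# Assumption: the user agent is the last double-quoted field on the line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def requests_per_user_agent(log_lines):
    """Count how many requests each user agent made."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Oct/2023:03:15:42 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleScraper/1.0"',
    '203.0.113.5 - - [10/Oct/2023:03:15:43 +0000] "GET /b HTTP/1.1" 200 512 "-" "ExampleScraper/1.0"',
    '203.0.113.5 - - [10/Oct/2023:03:15:44 +0000] "GET /c HTTP/1.1" 200 512 "-" "ExampleScraper/1.0"',
    '198.51.100.7 - - [10/Oct/2023:03:16:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.9 - - [10/Oct/2023:03:16:30 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Print the busiest user agents first.
for agent, hits in requests_per_user_agent(sample_log).most_common():
    print(hits, agent)
```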
John Mueller’s advice to note how quickly the server is serving web pages is good. Be sure to check it after hours at night, because many crawlers like Google will crawl in the early morning hours. That’s generally a less disruptive time to crawl, since there are fewer site visitors at that hour.
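A simple way to compare night-time and daytime performance is to group response times by hour of day. The sketch below assumes you already have (timestamp, response time in milliseconds) pairs, for example extracted from server logs that record request duration; the records, the function name, and the 400 ms cutoff used for flagging are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime


def average_ms_by_hour(records):
    """Group (timestamp, response_ms) pairs by hour and average each group."""
    buckets = defaultdict(list)
    for timestamp, response_ms in records:
        buckets[timestamp.hour].append(response_ms)
    return {hour: sum(values) / len(values) for hour, values in sorted(buckets.items())}


# Hypothetical measurements, for illustration only.
records = [
    (datetime(2023, 10, 10, 3, 5), 520),
    (datetime(2023, 10, 10, 3, 40), 480),
    (datetime(2023, 10, 10, 14, 10), 210),
    (datetime(2023, 10, 10, 14, 55), 190),
]

for hour, avg in average_ms_by_hour(records).items():
    # Flag hours whose average is not below the 400 ms upper bound Mueller mentions.
    status = "above target" if avg >= 400 else "ok"
    print(f"{hour:02d}:00  {avg:.0f} ms  {status}")
```

With this made-up data, the 03:00 hour averages 500 ms and would be flagged, while the 14:00 hour averages 200 ms.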
Citations
Read the Google Developer Page on Crawl Budget for Big Sites:
Large Site Owner’s Guide to Managing Your Crawl Budget
Watch Google’s John Mueller answer the question about GoogleBot not crawling enough web pages.
View it at approximately the 25:46 minute mark: