Google’s John Mueller recently offered guidance on how a website should handle invalid URLs.
An invalid URL should return a 404 error so it’s clear to Google that the URL does not belong to the site. A 5XX error doesn’t send that signal, because it reports a server-side failure rather than a missing page.
Mueller offered this information on Twitter in response to an SEO tweeting about Google Search Console discovering 5XX errors.
“Got an alert from Google Search Console today that one of our pages is 5XX – after investigation it’s a mention of our link in the footnotes of a scientific pdf article: as there is a semicolon right after the url, the url is not valid. Had no idea @googlewmc was this thorough!”
Mueller didn’t acknowledge Search Console’s impressive ability to discover this error. Rather, he was more interested in addressing the issue of server errors versus 404 errors.
Site owners should generally avoid having URLs that trigger 5XX server errors, regardless of where those URLs originate, Mueller says.
“In general, you should aim to avoid having URLs that trigger server errors (5xx result codes) — regardless of where they come from. If URLs are invalid for your site, you should return 404 so that it’s clear that they’re not valid for your site.”
— 🍌 John 🍌 (@JohnMu) March 17, 2019
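To illustrate the advice, here is a minimal Python sketch of a request handler that answers 404 for a path the site does not recognize, instead of letting a failed lookup surface as a 5XX error. The names `KNOWN_PAGES` and `respond`, along with the sample paths, are hypothetical and not taken from Mueller’s tweets or from any particular framework.

```python
# Hypothetical mapping of valid paths to page bodies (an assumption for illustration).
KNOWN_PAGES = {
    "/": "<h1>Home</h1>",
    "/research": "<h1>Research</h1>",
}


def respond(path: str) -> tuple[int, str]:
    """Return an (HTTP status, body) pair for a request path.

    A path the site does not recognize, such as one with a stray
    trailing semicolon, gets a 404 rather than an unhandled exception
    that the server would report as a 5XX error.
    """
    try:
        return 200, KNOWN_PAGES[path]
    except KeyError:
        # The URL is not valid for this site, so say so explicitly.
        return 404, "Not Found"


if __name__ == "__main__":
    print(respond("/research"))   # valid path
    print(respond("/research;"))  # malformed link, e.g. copied from a PDF footnote
```

In this sketch, a link such as `/research;` resolves to a 404 response, which matches the case described in the original tweet: a URL followed by a semicolon in a PDF footnote.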
In answer to a follow-up question, Mueller said that Google will keep retrying a URL that returns a 404 from time to time, as long as signals for that URL exist somewhere on the web.
“As long as we have signals for that URL (even if it’s just a random link somewhere), we’ll keep trying it from time to time.”
— 🍌 John 🍌 (@JohnMu) March 17, 2019