In a recent LinkedIn post, Google Analyst Gary Illyes challenged a long-standing belief about the placement of robots.txt files.
For years, the conventional wisdom has been that a website’s robots.txt file must reside at the root domain (e.g., example.com/robots.txt).
However, Illyes has clarified that this isn’t an absolute requirement and revealed a lesser-known aspect of the Robots Exclusion Protocol (REP).
Robots.txt File Flexibility
The robots.txt file doesn’t have to be physically hosted at the root domain (example.com/robots.txt); crawlers still request that URL, but it can redirect elsewhere.
According to Illyes, having two separate robots.txt files hosted on different domains is permissible: one on the primary website and another on a content delivery network (CDN).
Illyes explains that websites can centralize their robots.txt file on the CDN while controlling crawling for their main site.
For instance, a website could have two robots.txt files: one at https://cdn.example.com/robots.txt and another at https://www.example.com/robots.txt.
This approach lets you maintain a single, comprehensive robots.txt file on your CDN and redirect requests from your main domain to this centralized file.
Illyes notes that crawlers complying with RFC 9309 will follow the redirect and use the target file as the robots.txt file for the original domain.
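To make the redirect behavior concrete, here is a minimal Python sketch of how a compliant crawler might resolve a site's robots.txt. It is an illustration under stated assumptions, not Google's implementation: the in-memory FAKE_WEB table, the function names, and the "ExampleBot" user agent are all hypothetical, and no real network requests are made. The five-hop limit reflects RFC 9309's recommendation that crawlers follow at least five consecutive redirects, even across hosts.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> (status, headers, body). No network access.
FAKE_WEB = {
    "https://www.example.com/robots.txt": (
        301, {"Location": "https://cdn.example.com/robots.txt"}, ""),
    "https://cdn.example.com/robots.txt": (
        200, {}, "User-agent: *\nDisallow: /private/\n"),
}

MAX_REDIRECTS = 5  # RFC 9309: crawlers should follow at least five hops


def fetch_robots(origin, web=FAKE_WEB):
    """Return (final_url, body) for origin's robots.txt, following redirects."""
    url = urljoin(origin, "/robots.txt")
    # One initial request plus up to MAX_REDIRECTS redirected requests.
    for _ in range(MAX_REDIRECTS + 1):
        status, headers, body = web.get(url, (404, {}, ""))
        if status in (301, 302, 307, 308) and "Location" in headers:
            url = urljoin(url, headers["Location"])
            continue
        # Simplification: every non-200 response is treated as "no rules".
        return url, (body if status == 200 else "")
    return url, ""  # too many redirects: treat the file as unavailable


def rules_for(origin, web=FAKE_WEB):
    """Parse the resolved file; its rules apply to the ORIGINAL origin."""
    final_url, body = fetch_robots(origin, web)
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return final_url, parser


final_url, rules = rules_for("https://www.example.com")
print(final_url)  # https://cdn.example.com/robots.txt
print(rules.can_fetch("ExampleBot", "https://www.example.com/private/page"))  # False
print(rules.can_fetch("ExampleBot", "https://www.example.com/public/page"))   # True
```

The rules come from the CDN-hosted file, but they are evaluated against URLs on www.example.com, which is the point Illyes makes about the redirect target standing in for the original domain's file.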
Looking Back At 30 Years Of Robots.txt
As the Robots Exclusion Protocol celebrates its 30th anniversary this year, Illyes’ revelation highlights how web standards continue to evolve.
He even questions whether the file needs to be named “robots.txt,” hinting at possible changes in how crawl directives are managed.
How This Can Help You
Following Illyes’ guidance can help you in the following ways:
- Centralized Management: By consolidating robots.txt rules in one location, you can maintain and update crawl directives for your entire web presence from a single place.
- Improved Consistency: A single source of truth for robots.txt rules reduces the risk of conflicting directives between your main site and CDN.
- Flexibility: This approach allows for more adaptable configurations, especially for sites with complex architectures or those using multiple subdomains and CDNs.
A streamlined approach to managing robots.txt files can improve both site management and SEO efforts.