Googlers Jayant Madhavan and Alon Halevy, members of the Crawling and Indexing team, recently indicated that Google has been experimenting with filling out HTML forms to see whether doing so lets it discover web pages that otherwise couldn’t be found or indexed for users. With this experiment in crawling through HTML forms, including drop-down boxes and select menus, Google has taken one step closer to the Deep Web.
In their blog post, the Googlers described their process:
“For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the Web page resulting from our query is valid, interesting and includes content not in our index, we may include it in our index much as we would include any other Web page.”
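To make the quoted process concrete, here is a minimal Python sketch of the "generate URLs" step. It is an illustration only, not Google's implementation: the function name `candidate_urls`, the `fields` dictionary, the `limit` cap and the example.com address are all assumptions, and it assumes the form submits via GET so that each combination of values maps to a crawlable URL.

```python
from itertools import product
from urllib.parse import urlencode, urljoin


def candidate_urls(page_url, action, fields, limit=50):
    """Build GET URLs for combinations of candidate input values.

    `fields` maps each input name to a list of candidate values.
    This structure is hypothetical, chosen only for illustration.
    """
    names = sorted(fields)            # fixed order gives stable URLs
    base = urljoin(page_url, action)  # resolve the form's action URL
    urls = []
    for combo in product(*(fields[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{base}?{query}")
        if len(urls) >= limit:        # cap the number of generated URLs
            break
    return urls


# Example: one select menu ("make") and one text box ("q").
urls = candidate_urls(
    "https://example.com/cars/",
    "search",
    {"make": ["ford", "honda"], "q": ["sedan"]},
)
for url in urls:
    print(url)
```

A real crawler would then fetch each URL and keep only the pages that turn out to be valid and not already in the index, as the quote describes. The cap on generated URLs is a design choice of this sketch, since the number of combinations grows quickly with the number of inputs.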
If you’re worried that forms you’d rather keep out of the index will be crawled, Google said that it will adhere to any instructions or tools a site uses to prevent search engines from crawling certain sections. Furthermore, it will omit forms that require password inputs, or that use terms frequently associated with personal information, such as logins or user IDs.
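The exclusion rule for sensitive forms can be sketched in the same spirit. The Python example below is a rough approximation under stated assumptions: the `SENSITIVE_TERMS` list, the `FormInputScanner` class and the `should_skip_form` helper are hypothetical names, and matching terms against input names is only one plausible reading of "terms frequently associated with personal information".

```python
from html.parser import HTMLParser

# Assumed term list for illustration; Google's actual list is not public here.
SENSITIVE_TERMS = ("login", "userid", "user_id", "username")


class FormInputScanner(HTMLParser):
    """Flags a form that has a password input or a sensitive input name."""

    def __init__(self):
        super().__init__()
        self.sensitive = False

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("type", "").lower() == "password":
            self.sensitive = True
        name = attr.get("name", "").lower()
        if any(term in name for term in SENSITIVE_TERMS):
            self.sensitive = True


def should_skip_form(form_html):
    """Return True if the form should be left out of the crawl."""
    scanner = FormInputScanner()
    scanner.feed(form_html)
    return scanner.sensitive


print(should_skip_form('<form><input type="password" name="pw"></form>'))
print(should_skip_form('<form><input type="text" name="q"></form>'))
```

Site-level crawl instructions would be checked separately, before any form on the page is considered at all.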
Concerns that this enhanced crawling method will come at the expense of regular web pages appear to be unfounded. According to Google, the method won’t affect sites that are already part of the crawl, and it won’t impact page ranking. It is aimed simply at increasing the search engine’s coverage of the web.