Google Patent May Explain How Sites are Ranked

Bill Slawski wrote about a Google patent that seems to explain what happened in the poorly named Medic update. Bill said that the scope is wider than just medical sites. The patent may show why some sites can’t rank.

Caveat About Patents

It’s important to note that Google rarely confirms whether an algorithm described in a patent is in use. This patent may or may not be part of Google’s algorithm.

What is Google’s Patent About?

The patent describes a way to classify search queries and websites by topic.

  • Websites are classified by topics.
  • Search queries are classified by topics.

Knowledge Domains = Topics

In this patent, the algorithm is working with what it calls Knowledge Domains which represent topics. Search queries and web pages can be said to belong to specific knowledge domains.

This is how Bill describes the knowledge domains:

“The words “knowledge domain” stands for topics that a query may be about, and is not a reference to a knowledge graph.”

And in his article he states:

“Queries from specific knowledge domains (covering specific topics) might return results using sites that are classified as being from the same Knowledge Domain.”

Topic Pages

A way to simplify this concept is to think of topic buckets. Pages about medical information go into one bucket, pages about natural health go into another, pages about cell phone reviews go into a third, pages about personal injury lawyers in a specific city go into yet another, and so on.

Topic Queries

According to the patent, search queries can also be recognized as belonging to buckets of their own. So when someone searches for “what is diabetes,” Google understands the query to be a medical question and not a natural healing question.

Under this model, the topic of the content must match the topic of the query in order for the content to rank.

Google Patent Describes Classifying Sites and Queries

This is how the patent describes the classification system:

Classifies websites

“The search engine… may use data from a website classification system… to generate search results. For instance, the website classification system… may generate representations for each of multiple websites… and use the representations to determine a classification for each of the multiple websites…”

Classifies search queries

“The search engine… may use a classification for a search query to select a category of websites with the same, or a similar, classification.

The search engine… may determine search results from the selected category of websites.”
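As a rough illustration of the idea in these two passages, the sketch below classifies a query into a knowledge domain and then draws candidates only from sites that carry the same classification. The `Site` class, both function names, the keyword lists and the example domains are hypothetical. The patent does not say its classifier works this way; keyword overlap is only a stand-in for whatever classification is actually used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Site:
    url: str
    knowledge_domain: str  # hypothetical label produced by some site classifier


def classify_query(query: str, domain_keywords: Dict[str, Set[str]]) -> Optional[str]:
    """Toy classifier: pick the domain whose keyword set overlaps the query most."""
    words = set(query.lower().split())
    best_domain, best_overlap = None, 0
    for domain, keywords in domain_keywords.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_domain, best_overlap = domain, overlap
    return best_domain  # None when no domain matches at all


def candidate_sites(
    query: str, sites: List[Site], domain_keywords: Dict[str, Set[str]]
) -> List[Site]:
    """Keep only sites whose classification matches the query's classification."""
    domain = classify_query(query, domain_keywords)
    if domain is None:
        # Assumption for this sketch: an unclassified query is not filtered.
        return list(sites)
    return [site for site in sites if site.knowledge_domain == domain]


# Hypothetical example data.
DOMAIN_KEYWORDS = {
    "medical": {"diabetes", "symptoms", "treatment"},
    "natural_health": {"herbal", "remedies", "healing"},
}
SITES = [
    Site("clinic.example", "medical"),
    Site("herbs.example", "natural_health"),
]

print([site.url for site in candidate_sites("what is diabetes", SITES, DOMAIN_KEYWORDS)])
```

In this toy setup the natural health site is never considered for the medical query, however good its content may be, which is the practical consequence the article is pointing at.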

Sites Organized into Clusters

The patent describes a process that organizes websites by classifying them.

“…the systems and methods described in this document may improve search results pages generated by a search system by including identification of only websites with a particular classification…”

The classification system could create clusters based on the likelihood that a website would contain the answer to a query:

“The website classification system… may determine the classifications based on a likely responsiveness for the websites in the corresponding cluster.

For example, the websites in the first cluster may have a higher likelihood of being responsive to queries in the particular knowledge domain than websites in the second cluster.”

Then it describes scenarios where a site might be skipped and not classified.

What I find interesting is that it mentions skipping analysis of a site when the site’s representation is too distant from the known clusters of sites about a topic.

“In some implementations, one or more of the websites used during training may not be assigned to a classification.

For instance, when a website representation is more than a threshold distance from a cluster, or is otherwise not included in a cluster, the website classification system… may determine to skip using the website representation to create a composite representation, e.g., may determine to skip further analysis for the website during training.”
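The threshold test in this passage can be sketched in a few lines. The sketch assumes that each website representation is a small numeric vector, that each cluster is summarized by a centroid, and that distance is Euclidean. None of those choices, nor the function names and the example threshold, come from the patent; they are placeholders to make the idea concrete.

```python
import math
from typing import Dict, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_cluster(
    vector: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[str]:
    """Return the nearest cluster name, or None if even the nearest is too far.

    Returning None models the "skip further analysis" case: the site is
    not assigned to any classification.
    """
    nearest_name, nearest_dist = None, math.inf
    for name, centroid in centroids.items():
        dist = euclidean(vector, centroid)
        if dist < nearest_dist:
            nearest_name, nearest_dist = name, dist
    if nearest_name is None or nearest_dist > max_distance:
        return None
    return nearest_name


# Hypothetical two-dimensional representations.
CENTROIDS = {"medical": (1.0, 0.0), "cell_phones": (0.0, 1.0)}

print(assign_cluster((0.9, 0.1), CENTROIDS, max_distance=0.5))  # close to "medical"
print(assign_cluster((5.0, 5.0), CENTROIDS, max_distance=0.5))  # far from everything
```

A site whose vector sits far from every known cluster gets no classification at all in this sketch, rather than a low score within one.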

Authoritativeness is a Classification

“…each website in the plurality of websites may have a score. The score may indicate a classification of the website, such as an authoritativeness, a responsiveness for a particular knowledge domain, another property of the website, or a combination of two or more of these.”

The Patent is About More than Medical Sites

What is important to understand is that the processes described in this patent apply to a wide range of niche topics. This is not a Medical Algorithm. It is far more than just a medical-related patent.

According to Bill:

“The patent focused on more than just medical sites. It categorized by industry with health just being one of those. It later sorted by quality scores.

The patent provided an example specifically for medical sites… But it made it clear that it involves multiple industries.

The queries were classified based on knowledge domains also.”

Takeaway: Implications for Ranking

The part about clustering is intriguing because it mentions features like authoritativeness and distances from other clusters of sites.

One measure of authority is links. And it just so happens there is much research into algorithms that sort websites according to topics. The algorithms choose seed sites that represent the most authoritative sites in a particular topic classification. Other sites are then scored according to how distant they are from the seed sites.
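The seed-site idea can be illustrated with a simplified hop count: start from the seed sites and measure how many links it takes to reach every other site. This is only a sketch of the general approach. The graph, the domain names and the function name are made up, and a plain breadth-first search stands in for whatever distance measure a real link-based system would use.

```python
from collections import deque
from typing import Dict, Iterable, List


def link_distances(graph: Dict[str, List[str]], seeds: Iterable[str]) -> Dict[str, int]:
    """Breadth-first search outward from the seed sites.

    graph maps a site to the sites it links to. The result maps each
    reachable site to the fewest links needed to get there from any seed.
    Sites that cannot be reached from a seed are absent from the result.
    """
    distances = {seed: 0 for seed in seeds}
    queue = deque(distances)
    while queue:
        site = queue.popleft()
        for target in graph.get(site, []):
            if target not in distances:
                distances[target] = distances[site] + 1
                queue.append(target)
    return distances


# Hypothetical link graph for one topic.
GRAPH = {
    "seed.example": ["a.example"],
    "a.example": ["b.example"],
    "c.example": ["seed.example"],  # links to the seed, but nothing links to it
}

print(link_distances(GRAPH, ["seed.example"]))
```

In this toy graph, `c.example` never receives a distance because no path leads to it from a seed, which loosely parallels a site that falls outside every cluster.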

The algorithm in the patent employs a similar system, in which a site that is distant from the known clusters will essentially be discarded and not considered for ranking.

The patent makes no mention of links as a measure of authority. But link distance ranking algorithms, which classify sites by topic and create clusters of sites based on those topics, closely mirror how this algorithm clusters sites by content topic.

It may not be unreasonable to speculate that this reinforces the commonly held belief that links from relevant pages may improve rankings, and makes that belief more urgent.

Takeaway: Google Update Recovery

These insights into Google’s algorithm validate my suggestions about Google update recovery in general and recovering from the so-called Medic Update in particular.

“The so-called “Medic” update appeared to be clearly about relevance issues, not about author bios or “expertise.””

Perhaps one of the key insights from this patent is that it may be helpful to look at ranking issues from the perspective of relevance. In my experience consulting for sites that have lost rankings, if your site’s rankings have suffered a catastrophic collapse then that may be partially related to something similar to what is described in this patent.

If your site dropped a few positions across the board, that may point to other issues, such as increased competition or relevance problems.

Read:
Link Distance Ranking

How to Recover from a Google Update

Website Representation Vectors

Google patent: Website Representation Vector to Generate Search Results and Classify Website

Author: Roger Montti, SEJ Staff; Owner, Martinibuster.com
