Wikipedia Article Quality Assessment and Ranking Tips for Users and Search Engine Engineers

The subject is getting old, but I believe that the complaints about the dominance of Wikipedia articles in Google’s search results will continue until something happens that makes most people happy.

Here are two posts on this subject from this month. I had already left long comments on both before I decided to write yet another post about it here at Search Engine Journal.

The first one, from the beginning of this month, is titled “Wikipedia: The Barry Bonds of Search Results” and was written by Eric Lander here at Search Engine Journal. The second is only a few days old, is titled “When Will Wikipedia Rank for everything?”, and was written by Aaron Wall’s wife Giovanna at SEOBook.com.

Some Figures for Starters
Eric pointed out the number of pages in Wikipedia that contain the phrase “meet Wikipedia’s quality standards”, although I don’t think this query is a particularly good one for finding poor-quality articles at Wikipedia. In contrast, it seems that about 96-97% of Wikipedia pages rank in the top 10 in Google for their own page title.

Whether the 600 articles that Russ Jones tested are enough to support this conclusion, considering that Wikipedia had 2,099,381 articles at the time I was writing this, remains in question.

Giovanna examined the results of a test conducted by RankPulse.com, which checked the rankings for 1,000 (mostly competitive) keywords. Wikipedia ranked in the top 10 in 989 of the 1,000 cases, or 98.9%. This is even more than Russ Jones found in his test.
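To get a feel for how much a sample of this size can tell us, here is a minimal sketch that computes a 95% Wilson score interval for the RankPulse figure. It assumes the 1,000 keywords were a random sample, which they were not (they were mostly competitive keywords), so treat the result as a rough illustration only. The function name `wilson_interval` is my own choice.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a confidence level of roughly 95%.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# RankPulse test: Wikipedia in the top 10 for 989 of 1,000 keywords.
low, high = wilson_interval(989, 1000)
print(f"observed {989 / 1000:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

The same function can be applied to the 600-article test once the exact number of top-10 rankings is known; the source only gives "about 96-97%".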

Generic Keywords versus Keywords with Commercial Intent
Giovanna pointed out, though, that those searches were for generic keywords, which do not indicate any intent on the part of the searcher. Wikipedia rankings drop significantly if the generic keyword is extended by a word that indicates commercial intent, such as “buy”, “quotes”, “rates” and others. This is not surprising, because Wikipedia is an encyclopedia and not an e-commerce or shopping comparison site.

Those keywords are actually more important than the generic ones, at least for a site that is commercially oriented (e.g. an online shop), because people who enter phrases like this into the Google search box are much more likely to convert. They are shopping and not researching.
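The distinction between generic and commercial queries can be sketched in a few lines. The modifier words “buy”, “quotes” and “rates” come from the text above; the function name and the simple whole-word matching are assumptions for illustration, not how any search engine actually classifies intent.

```python
# Modifier words taken from the examples in the text; a real list would be longer.
COMMERCIAL_MODIFIERS = {"buy", "quotes", "rates"}

def has_commercial_intent(query: str) -> bool:
    """Return True if any whole word in the query is a known commercial modifier."""
    words = query.lower().split()
    return any(word in COMMERCIAL_MODIFIERS for word in words)

# Hypothetical example queries.
for query in ["mortgage", "mortgage rates", "buy digital camera"]:
    label = "commercial" if has_commercial_intent(query) else "generic"
    print(f"{query!r}: {label}")
```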

However, people who do research will get Wikipedia pages in their search results, and those pages are not always (often not) the most authoritative pages available on the subject. The fact that anybody can edit pages at Wikipedia is a blessing and at the same time a curse. A page is only as good as the editors who created it. Many editors are careful and double-check information before they enter it into Wikipedia, but there are also editors who are not that thorough, not to mention all the junk and spam that people create for various reasons, including “fun”, “self promotion” or “promotion of a product or service”.

Don’t Just Take Anybody’s Word – Verify Sources
A user of the encyclopedia should always double-check the information and sources before making any serious decisions based on them. This is something you should always do, not just in cases such as a serious health issue that affects you, where you would get a second or third opinion from other doctors instead of taking the word of the first doctor as the absolute truth and his advice as the only option available to you.

There are also indicators in Wikipedia articles themselves, or on an article’s talk page, that tell visitors about the quality of the article. All of those indicators are machine readable, which means that Google and other search engines could use them to adjust the ranking of a particular article.

Some examples:

Quality Assessment
Many articles in Wikipedia are assessed and graded. A dedicated project within Wikipedia does nothing but review articles to grade, upgrade or downgrade them. There are 6 grade levels an article can have after an official assessment.

The most relevant (highest quality) are articles with the status “featured article” or “good article”. Those articles have been scrutinized and verified the most, and they are frequently reviewed to make sure that they still meet the quality criteria and can keep their high grade.

“Featured Articles”
There are only 1,708 (1 in 1,220) featured articles in Wikipedia at the moment. You can find a list with all articles that are currently in featured status here. Whether an article is a featured article is visible on the article page itself: it has a little star in the upper right corner, which indicates its high grade.

There are many criteria an article has to meet before featured status is granted. You can see in the statistics for featured article promotions and demotions per month that it is not easy to keep the status. Demotions occur every month. Just FYI, the Wikipedia article on search engine optimization has the status “featured article”.

“Good Articles”
The next grade beneath featured is good article. Only 3,124 (1 in 672) articles in Wikipedia have this status. Again, here is a list of articles in “good article” status, here is what it takes to achieve that status, and here are the statistics for promotions and demotions per month.

The good article status is not indicated on the article page directly, but at the top of the talk page of the article. Most articles have a talk page.

Articles that do not have one yet have to be treated with caution. The absence of a talk page is a good indicator that a page is new or does not have many editors looking at it. The activity on an article’s talk page is a good indicator of the quality of the article as well. It tells you how many people looked at the article and discussed its content.

Just to give you an idea of how hard it is to obtain the status good article or even featured article: the article on affiliate marketing, on which I have already spent a considerable amount of work, recently failed to meet the good article criteria and was rejected. The article is not perfect, but many people from the industry I talked to said that it is actually pretty good already.

[Screenshots: article main page, article talk page, article edit history]

Other criteria indicate how good the quality of an article might be.

Dispute Templates
If another editor has a problem with the content of an article, there are numerous “templates” that can be added to it to indicate what the problem seems to be. The template is prominently visible at the top of the article and remains there until the dispute is resolved. It is a signal not only to readers of the article, but also to other editors, especially editors who have worked on the article in the past and added it to their “watch list”. The “watch list” alerts an editor to changes to pages he manually added to that list.

There are templates for virtually any issue somebody could have with an article, such as factual accuracy, neutral point of view, lack or quality of references, missing citations, the tone the article is written in or the controversial nature of the article. There are also templates to flag specific sentences within the article itself. Flagging a missing reference for a fact stated in the article is very common.
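Because templates appear in an article’s wiki source as text in double curly braces, spotting them mechanically is straightforward. The following is a minimal sketch, not a complete parser: it only handles simple, non-nested templates, and the small name-to-issue mapping is an illustrative assumption covering a few template names that I believe exist on the English Wikipedia. The sample text is made up.

```python
import re

# Illustrative mapping only; Wikipedia has many more templates than these.
TEMPLATE_ISSUES = {
    "disputed": "factual accuracy",
    "pov": "neutral point of view",
    "unreferenced": "lack of references",
    "citation needed": "missing citation",
    "tone": "tone",
}

# Matches {{Name}} and {{Name|param=value}}; nested templates are not handled.
TEMPLATE_PATTERN = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")

def find_flagged_issues(wikitext: str) -> list[str]:
    """Return the issues signalled by known templates, in order of appearance."""
    issues = []
    for match in TEMPLATE_PATTERN.finditer(wikitext):
        name = match.group(1).lower()
        if name in TEMPLATE_ISSUES:
            issues.append(TEMPLATE_ISSUES[name])
    return issues

# Made-up wiki source for demonstration.
sample = "{{POV}}\n{{Unreferenced|date=March 2007}}\nSome claim.{{Citation needed}}"
print(find_flagged_issues(sample))
```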

Number and Type of Edits/Traffic
Two further sources are not so much indicators of article quality as good indicators of the chances that spammy content and links will stick for long in an article: the statistic with the top 100 most visited articles at Wikipedia (updated frequently) and the “edit history” of the article itself.

Lots of edits are an indicator of how much attention an article gets. Many edits usually indicate that the quality (and especially the neutrality) of an article is better than that of an article with only a few edits by only one or two editors. This is not always the case, though. The exception is highly disputed articles where editors have radically conflicting views and engage in what is called an “edit war”.
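The edit-history heuristic can be sketched as follows. The `Edit` record, the function names and the thresholds (`min_edits`, `max_revert_share`) are all hypothetical; they only illustrate the idea that many edits by many editors is a good sign unless a large share of them are reverts.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Edit:
    editor: str
    is_revert: bool = False  # assumed to be known, e.g. from the edit summary

def summarize_history(edits: list[Edit]) -> dict:
    """Count edits, distinct editors, and the share of edits that are reverts."""
    total = len(edits)
    editors = Counter(edit.editor for edit in edits)
    reverts = sum(1 for edit in edits if edit.is_revert)
    return {
        "total_edits": total,
        "distinct_editors": len(editors),
        "revert_share": reverts / total if total else 0.0,
    }

def looks_like_edit_war(summary: dict, min_edits: int = 20,
                        max_revert_share: float = 0.3) -> bool:
    """Flag busy histories where a large share of the edits are reverts.

    Both thresholds are arbitrary placeholders, not values from Wikipedia.
    """
    return (summary["total_edits"] >= min_edits
            and summary["revert_share"] > max_revert_share)

# Made-up history: two editors repeatedly undoing each other.
history = [Edit("EditorA"), Edit("EditorB", is_revert=True)] * 15
summary = summarize_history(history)
print(summary, looks_like_edit_war(summary))
```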

Conclusion
All those indicators can help humans to evaluate the quality and accuracy of a Wikipedia article, but they are machine readable as well and search engines could put them to good use to adjust the ranking of the article. The first type of articles they should automatically “penalize” (the -30 penalty comes to mind) are articles that have any type of “stub” template in them. Stub templates are usually added to the end of the article. All templates (including the dispute templates) automatically assign the article to which the template was added to special categories that are associated with those templates.

The article in the example above is a world-wide-web stub. In addition to the displayed message, the template also adds to the article’s source code a reference to the special category the article was automatically assigned to. All templates I mentioned do this. It does not take a rocket scientist to determine whether the article is a stub or not. The HTML code looks like this:

<a href="/wiki/Category:World_Wide_Web_stubs" title="World Wide Web stubs">World Wide Web stubs</a>
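To illustrate how little effort such a check takes, here is a minimal sketch using only Python’s standard library. It collects category links from an article’s HTML and treats any category whose name ends in “_stubs” as a stub marker. That naming rule is an assumption generalized from the single example above, and the class and function names are my own.

```python
from html.parser import HTMLParser

class CategoryLinkParser(HTMLParser):
    """Collect the names of all categories linked from a page."""

    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        prefix = "/wiki/Category:"
        if href.startswith(prefix):
            self.categories.append(href[len(prefix):])

def is_stub(html: str) -> bool:
    """Return True if the page links to a category ending in '_stubs' (assumed rule)."""
    parser = CategoryLinkParser()
    parser.feed(html)
    return any(name.lower().endswith("_stubs") for name in parser.categories)

snippet = ('<a href="/wiki/Category:World_Wide_Web_stubs" '
           'title="World Wide Web stubs">World Wide Web stubs</a>')
print(is_stub(snippet))
```

A search engine could feed the result of a check like this into whatever ranking adjustment it considers appropriate.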

I hope that this little primer is interesting not only for search engine engineers, but for regular Wikipedia users as well. I have provided you with some information that helps to assess the quality and reliability of an article in Wikipedia. This is no substitute for verification and your own due diligence. Please always keep that in mind.

Cheers!
Carsten Cumbrowski

Carsten has been an internet marketer since early 2001 and is primarily active in affiliate marketing and search engine marketing. He has been an active Wikipedia editor since December 2005 and has made over 3,500 edits in Wikipedia since then, over 1,500 of those in the article main space. He also operates the free internet marketing resources portal at the domain that bears his name, cumbrowski.com. The site is not his business or source of income. He makes his money as an affiliate and through paid search for the most part.
