There’s been a lot of fearmongering talk lately about Google’s human quality raters: what exactly they’re used for, and how their ratings are used internally at Google. Discussion ran rampant at Pubcon, where I witnessed several SEOs grill Matt Cutts about the subject, and it seems to have spread like wildfire across the blogosphere in the last month or two. I’d like to add my take – specifically, that we shouldn’t panic about this at all but rather use what we’ve learned to our advantage.
So who am I to talk about this?
Well, in the interest of full disclosure: I used to be a quality rater back around 2005ish. Since I signed an NDA way back when and I’m not sure if it’s expired (I believe it has, but I can’t find a copy), I’m going to make sure everything I mention can be found (or easily arrived at) with a Google search. In addition, you should regard anything I say as a “theory” and not actual “truth.” The opinions herein are mine and not those of Google or anybody else… blah blah blah, legal stuff, let’s get back on topic.
Quality raters aren’t a new thing at all. They’ve been around forever and have probably been involved in every major algorithm update since Florida (and, who knows, I’ve even heard rumors that Google used friends and family as raters before they hired temps). Google usually posts the jobs on their own website (I only see AdWords rater openings at the moment, but check back). Additionally, you used to be able to apply by sending an email with the subject “Quality Rater” to temporaryjobs@google.com – I’m not sure if that still works.
The point is, Google has been using quality raters forever. They log into a system called EWOQ and perform a variety of rating tasks. Some of those tasks are as simple as being given a website and a query and rating it as Vital, Useful, Relevant, Not Relevant, Spam, Thin Affiliate, Porn, etc.
Other tasks involve looking at multiple search result sets for a query and determining which set is better overall. (Note: overall means they look at ALL the sites in the result set, so a set with 7 useful, 1 vital, 1 relevant, and 1 spam result may be better than one with 8 useful and 2 not-relevant results.) See how tricky that gets? That’s why they need LOTS of data.
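To make that concrete, here’s a toy sketch in Python. The rating labels are real (they come from the leaked rater guidelines), but the numeric weights are entirely my invention – Google has never published how side-by-side ratings get scored.

```python
# Hypothetical weights per rating label. The labels are real; the numbers
# are invented purely to illustrate set-versus-set comparison.
RATING_WEIGHTS = {
    "Vital": 10,
    "Useful": 5,
    "Relevant": 2,
    "Not Relevant": 0,
    "Thin Affiliate": -3,
    "Spam": -5,
}

def result_set_score(ratings):
    """Score an entire result set by summing the weight of each rating."""
    return sum(RATING_WEIGHTS[r] for r in ratings)

# The example from above: one spam result doesn't automatically sink a set.
set_a = ["Useful"] * 7 + ["Vital", "Relevant", "Spam"]
set_b = ["Useful"] * 8 + ["Not Relevant"] * 2

print(result_set_score(set_a))  # 42 -- the vital result outweighs the spam
print(result_set_score(set_b))  # 40
```

Nudge those made-up weights even slightly and the winner flips. A comparison that sensitive can’t rest on a handful of judgments, which is why the volume of rated data matters so much.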
And LOTS is exactly the point. Google isn’t using these raters to flag your site. Matt Cutts has already debunked that idea – just as he did in person at Pubcon when some well-known SEOs confronted him at the mixer.
We don’t need to take Matt’s word for it, though; we just need to realize that using quality raters to penalize sites is simply not robust and scalable enough to apply to the entire web. In fact, whenever you have a theory about Google, asking “is this robust and scalable?” is a good way to find out whether they’re really doing it. The concepts of robust and scalable are HUGE within Google. Seriously, they use them in almost every patent application and scholarly article they publish. It’s just how computer scientists think.
So how are they (most likely) using quality raters?
My best guess (and what Matt alluded to on Twitter) is that they’re using the quality raters to evaluate algorithm changes – and that they’re doing it in a variety of ways. Evaluating the quality of an algorithm is a rather hard problem. One way is to put it live for a select group of people and then look at their click-through rates, how often they refine their searches, etc. But what if that algorithm change only applies to a small percentage of queries, or only to a certain type of query – would you get a decent enough sample size?
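Here’s a rough back-of-envelope version of that problem, with every number invented for illustration: even a big experiment bucket yields only a sliver of usable signal when the change fires on a narrow slice of queries.

```python
import math

# How much signal does a live experiment get when a change only touches
# a narrow slice of queries? All numbers here are made up.
total_queries = 1_000_000   # queries seen by the experiment bucket
affected_fraction = 0.005   # the change only fires on 0.5% of queries
affected = total_queries * affected_fraction

# Rough 95% margin of error for estimating a ~10% click-through rate
# from the affected queries alone (normal approximation).
ctr = 0.10
margin = 1.96 * math.sqrt(ctr * (1 - ctr) / affected)
print(f"{affected:,.0f} affected queries, CTR known to within ±{margin:.2%}")
# -> 5,000 affected queries, CTR known to within ±0.83%
```

A ±0.83% margin on a 10% baseline is too mushy to detect a small ranking improvement. Human raters sidestep the whole issue: you can hand them exactly the queries the change affects.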
I firmly believe that sample size is at the heart of the Google quality rater program – not only for ensuring that the proper types of queries are used as an indicator of algorithm strength, but also for building a properly sized sample of useful, relevant, and spam sites. Once they have that sample, they can start running more advanced automated analysis of the algorithms against whatever task they’re trying to measure.
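As a hedged sketch of what that automated analysis might look like – assuming nothing more than a pool of rater-labeled (query, URL) pairs, with the data and metric below being my own stand-ins:

```python
# Once raters have labeled a pool of (query, url) pairs, any candidate
# algorithm's rankings can be scored automatically against those labels.
# The labels and the metric are illustrative stand-ins, not Google's setup.
labels = {
    ("best running shoes", "runner-reviews.com"): "Useful",
    ("best running shoes", "example-store.com"): "Relevant",
    ("best running shoes", "spammy-doorway.com"): "Spam",
}

GOOD = {"Vital", "Useful", "Relevant"}

def precision_at_k(query, ranked_urls, k=10):
    """Fraction of the top-k results that raters judged as good."""
    judged = [u for u in ranked_urls[:k] if (query, u) in labels]
    if not judged:
        return None  # no rater coverage for this query
    good = sum(1 for u in judged if labels[(query, u)] in GOOD)
    return good / len(judged)

new_ranking = ["runner-reviews.com", "example-store.com", "spammy-doorway.com"]
print(precision_at_k("best running shoes", new_ranking))  # 0.666...
```

Run something like that over thousands of labeled queries for both the old algorithm and the new one, and evaluation scales with machine time instead of rater time.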
I use the word “task” because I’m sure that if we had access we’d see some sort of neural-network-style algorithm associated with the human raters as well – mostly because their input fits the “learning paradigm” of neural nets so naturally: human judgments go in as labeled examples, and a trained model comes out.
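To show what that learning paradigm looks like at its absolute simplest, here’s a toy logistic-regression model trained on rater-style labels. The features, labels, and training loop are all my invention – a stand-in for whatever learned system Google might actually run, not a claim about it.

```python
import math

# Toy feature vectors: [content_depth, ad_density, original_text_ratio].
# Label 1 = rater judged the page useful/relevant; 0 = spam/thin affiliate.
# Everything here is invented to illustrate the learning paradigm.
data = [
    ([0.9, 0.1, 0.8], 1),
    ([0.7, 0.2, 0.9], 1),
    ([0.1, 0.9, 0.1], 0),
    ([0.2, 0.8, 0.2], 0),
]

weights = [0.0, 0.0, 0.0]
bias = 0.0
learning_rate = 0.5

def predict(x):
    """Logistic regression: squash a weighted sum into a 0-1 quality score."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 / (1 + math.exp(-z))

# Plain gradient descent on the rater labels -- the simplest possible
# version of "feed human judgments into a learning algorithm."
for _ in range(1000):
    for x, y in data:
        error = predict(x) - y
        bias -= learning_rate * error
        weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]

print(round(predict([0.8, 0.15, 0.85]), 2))  # near 1 -> looks useful
print(round(predict([0.15, 0.85, 0.1]), 2))  # near 0 -> looks spammy
```

If something like this exists inside Google, the raters’ job is producing labels, not penalties.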
Given that, it wouldn’t make sense for Google to use the data on a site-by-site basis. Not only is that not robust and scalable, but it doesn’t make sense to fight spam manually: they’d miss way too much, and the process would easily become corrupted in the same way the Open Directory Project did.
So what can we learn?
The biggest takeaway for SEOs doesn’t have anything to do with the fact that Google uses quality raters, or even how Google uses them. The biggest opportunity lies in understanding the definitions of Vital, Useful, Relevant, Not Relevant, Spam, and Thin Affiliate contained within the leaked quality rater guide. Instead of asking “can a human rater penalize me?” ask yourself “will a human searcher visiting Google view my site as useful? Why? If not, what can I do to make sure it’s seen as useful?” (Note to self: do a follow-up article on useful.) Look at the definition of thin affiliate and make sure your site isn’t anywhere close to it. Do the same thing with the definition of spam.
Don’t do it because of the raters, though; they’re just rating the algorithm to see if it’s doing what’s intended. Do it because Google has told us they want to show sites that fit the useful profile and keep spam and thin affiliates out of their search results. Focusing less on the “how” and more on the “why” will keep you ahead of the curve on algorithm changes while making your SEO efforts more sustainable.