I recently published an article here on Search Engine Journal about the poor state of correlation studies in the SEO industry.
This article showed that people with the best intentions conducted these studies, but they often lacked the proper levels of statistical know-how for such a gargantuan task.
In other cases, the studies used improper sampling data, lacked any sort of peer review, or were biased in their presentation by companies trying to sell a product associated with the results.
While the overwhelming majority of responses to the article from the SEO community (including some true legends of the business) were positive, there were a few detractors.
Former Moz employee Russ Jones even went to the trouble of writing an entire response to the post, calling out everything from my use of William Mulholland as a metaphor to the conclusions made by the statistician interviewed for the article, Jen Hood.
While Russ ultimately agreed with the point of the article, he took umbrage at how I made that point.
Given the subject matter, responses like this were expected.
However, one point that Jones made in his response, specifically about the consistently weak correlations found in these studies, caught my interest: “Weak correlations are going to be part of any system.”
As I stated in the original post, I am not a statistician, so once again, I reached out to statistician Jen Hood for an explanation.
“He’s right,” Jen replied, “Complex systems can include ‘weak’ correlations.”
The Concept of Emergence
While he didn’t mention it by name, Jones was referring to a phenomenon known as “emergence.”
In her book, “Engaging Emergence,” author Peggy Holman defines emergence as a:
“…Higher-order complexity arising out of chaos in which novel, coherent structures coalesce through interactions among the diverse entities of a system. Emergence occurs when these interactions disrupt, causing the system to differentiate and ultimately coalesce into something novel.”
Or, more simply, “order arising out of chaos.”
Holman’s favorite Carl Sagan quote is, “If you wish to make an apple pie from scratch, you must first invent the universe,” so perhaps we should try to define emergence another way.
If you’ve spent any time in the business world, you’ve probably heard someone in management misquote Aristotle by saying, “The whole is greater than the sum of its parts.”
My reference to Aristotle when discussing emergence is no coincidence.
He was one of the first to consider this phenomenon, which has influenced art, philosophy, science, and systems theory, including the study of collective behavior, game theory, pattern formation, and, in our case, the machine learning systems associated with search engines.
Emergence is when a complex system, such as a human body or a search engine, appears to have abilities that its components do not have on their own.
In numerous publications, this concept describes everything from ant colony design to the creation of snowflakes; I first read about the idea from a scientist who was explaining all the individual processes in the human body needed for our eyes to see correctly.
In a study published in the Journal of the Optical Society of America on the correlations between abnormalities in the human eye and the heart and lungs, scientist Karen M. Hampson and her team noticed that the individual correlations among the factors they reviewed were weak.
Furthermore, and perhaps even more importantly, while the concept of emergence does allow for weak correlations, those correlations must not be so weak that they are entirely insignificant.
As statistician Jen Hood pointed out in response to feedback on her statement about the weak correlations around Domain Authority, “Correlations can indeed be weak in complex systems, but these were really weak.”
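To see why weak correlations are expected in a complex system, consider a toy model. This is purely an assumption for illustration, not a description of how any search engine works: a page's overall score is the plain sum of many independent, equally weighted signals. In that setup, the correlation between any single signal and the overall score shrinks toward 1 divided by the square root of the number of signals, so with 100 signals each one correlates at only about 0.1. The function names and parameters below are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_correlation(num_signals, num_pages, seed=0):
    """Correlation between ONE signal and a score built from ALL signals.

    Toy assumption: every signal is independent standard-normal noise and
    the overall score is the unweighted sum of the signals.
    """
    rng = random.Random(seed)
    pages = [
        [rng.gauss(0, 1) for _ in range(num_signals)]
        for _ in range(num_pages)
    ]
    scores = [sum(page) for page in pages]
    first_signal = [page[0] for page in pages]
    return pearson(first_signal, scores)


if __name__ == "__main__":
    for k in (1, 10, 100):
        r = simulate_correlation(num_signals=k, num_pages=5000)
        # Theory for this toy model: r is close to 1 / sqrt(k).
        print(f"{k:>3} signals: r = {r:.3f}, theory = {1 / math.sqrt(k):.3f}")
```

Even in this idealized system, where every signal genuinely contributes to the score, no single signal shows a strong correlation. The correlations are weak, but they are not zero, which is the distinction Hood draws above.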
The Difference Between Science & SEO Studies
Critics of my article thought I was saying that search engines are unknowable and that those who attempt to study their components are wasting their time.
This critique is ridiculous.
Correlation and other studies that investigate the individual components of search engines have the potential to be useful.
It’s just that this usefulness is lost when the studies are conducted or presented improperly.
For instance, most studies, while sometimes stating the caveat that “correlation does not equal causation,” fail to mention alternative conclusions besides, “this one thing we notice could be a ranking signal.”
Furthermore, of all the studies I have read over the years, none has bothered to bring up the concept of emergence in relation to its findings.
That is, their findings may be significant, but ultimately, they are still a single part of something that functions as something “greater than the sum of its parts.”
Or, in the case of Google, the “product” of its parts.
As Gary Illyes explained to a small group of SEO professionals in Sydney, Australia, in 2019 while discussing how search engines work, the individual scores from the various algorithms that make up Google’s ranking system are not added together but multiplied.
This difference in operators is one of the reasons why understanding emergence within the scope of search engines is so important.
Without the proper context for how these so-called discoveries about individual signals relate to hundreds of other signals, some SEO pros may focus too heavily on a single signal and then question why they have not rocketed up the SERP after implementation.
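A small numeric sketch can show why the choice of operator matters. Everything here is hypothetical: the three scores, their 0-to-1 scale, and the idea that a final score is a simple sum or product of them are assumptions made only to contrast adding with multiplying. Illyes's remark tells us nothing about actual values or how many scores exist.

```python
import math


def additive(scores):
    """Combine per-algorithm scores by adding them."""
    return sum(scores)


def multiplicative(scores):
    """Combine per-algorithm scores by multiplying them."""
    return math.prod(scores)


# Hypothetical scores for one page from three unnamed algorithms (0 to 1).
baseline = [0.9, 0.8, 0.1]
improved_strong = [1.0, 0.8, 0.1]  # +0.1 on the already-strong score
improved_weak = [0.9, 0.8, 0.2]    # +0.1 on the weakest score

for label, scores in [
    ("baseline", baseline),
    ("improve strong score", improved_strong),
    ("improve weak score", improved_weak),
]:
    print(
        f"{label:<22} sum = {additive(scores):.3f}   "
        f"product = {multiplicative(scores):.3f}"
    )
```

Under addition, both improvements are worth the same 0.1. Under multiplication, the identical change moves the combined score from about 0.072 to about 0.080 in one case and to about 0.144 in the other, because each score's effect depends on every other score. That interdependence is why a finding about a single signal says little on its own.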
We Can Do Better
As Jen Hood stated in my original article, it is possible to study Google’s algorithms by way of “massive randomized testing over time, controlling for variation, and randomly assigning changes to be made to improve or decline in ranking” and done on “a large scale across many different topics, styles of searches, etc.”
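The core of the approach Hood describes, random assignment, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a full study design: the page identifiers, the `rank_change` numbers, and both function names are hypothetical, and a real test would also need the scale, duration, and controls she mentions.

```python
import random
from statistics import mean


def assign_groups(page_ids, seed=0):
    """Randomly split pages into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(page_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def average_effect(rank_change, treatment, control):
    """Mean rank change of treated pages minus that of control pages."""
    treated = mean(rank_change[p] for p in treatment)
    untreated = mean(rank_change[p] for p in control)
    return treated - untreated


# Hypothetical pages; only the treatment group would receive the change.
pages = [f"page-{i}" for i in range(8)]
treatment, control = assign_groups(pages, seed=42)

# Made-up rank changes recorded after the test period (illustration only).
rank_change = {p: (2.0 if p in treatment else 0.5) for p in pages}
print(average_effect(rank_change, treatment, control))
```

Because chance alone decides which pages receive the change, differences between the groups are less likely to be explained by whatever made someone pick those pages in the first place, which is something a correlation study cannot rule out.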
Long-time SEO professional and now a fellow Search Engine Journal contributor, Micah Fisher-Kirshner, recently went into great detail on this subject in his article, “How to Analyze Google’s Algorithm: The Math & Skills You Need.”
However, almost immediately, he warns that “I stand against the view that a basic correlation analysis is sufficient for analyzing Google’s algorithm.”
After a short time in the SEO business, even the most novice of SEO workers will discover that Google updates itself every day, sometimes multiple times a day.
This fact leads one to question: even if we did put in the work to analyze Google’s ranking algorithms properly, would it even matter if that algorithm had changed by the time the study was published?
In a recent telling of Nikola Tesla’s life, simply called “Tesla,” screenwriter and director Michael Almereyda ingeniously employed Tesla’s longtime love interest, Anne Morgan, as a narrator.
During a scene where the scientist was testing the device that would ultimately become known as a Tesla Coil, she states:
“He was synchronizing electricity in the sky and the earth with currents surging through his magnifying transmitter. It was like getting the ocean to sit for a portrait.”
While we may never be able to get Google to “sit for a portrait,” the concept of emergence at least lets us know that, if the correlations found in SEO studies are at least on the stronger side of weak, we might be able to see farther than ever before.