INside Performance Marketing
Unlocking the Google Panda Algorithm with Bayesian Mathematics
Image credit: fortherock (Creative Commons license)


Panda 4.0 is an algorithm that Google introduced on the 20th May 2014. It was introduced to combat web spam, in response to SEOs using unethical techniques to manipulate their clients’ rankings in Google’s search results. Many theories and articles have sought to explain the Panda algorithm, but until now none of their claims has been supported by scientific evidence that is statistically significant at the 99% level.

Challenge

While Google has been commendable enough to allude to Panda’s ingredients in a number of articles on its Google Webmaster Central blog, it does not reveal which of these ingredients are the most active, or how best to apply them to website design and content.

The study gave us an opportunity to explore which of these ingredients had a causal effect on SEO traffic, and how heavily each was weighted, so that SEOs could explain and respond to Panda 4.0 in a predictable manner.

Solution

To deconstruct Panda, we segmented the sites in our data set into groups that, as a result of Panda, had:

  • An increase in traffic
  • A decrease in traffic
  • No change

If there was a change in traffic, we used Bayesian maths to calculate the likelihood that the change was in fact due to an event that happened on the 20th May 2014, i.e. Panda 4.0 and nothing else. Once the sites were segmented, we could start analysing for possible causal factors.
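
The Bayesian model itself is not spelled out here, so the following is only a minimal sketch of the general idea: given a site's daily visits, estimate how probable it is that the traffic level shifted on one particular day rather than any other. It assumes a single shift, Gaussian noise with a known standard deviation, a uniform prior over candidate days, and segment means fixed at their sample averages. The function name, the `sigma` parameter and the visit counts are all invented for illustration.

```python
import math


def changepoint_posterior(series, sigma):
    """Posterior probability, per index, that the mean level shifts there.

    Assumptions (illustrative only): exactly one shift, Gaussian noise with
    known standard deviation `sigma`, a uniform prior over candidate indices,
    and segment means plugged in at their sample averages.
    """
    log_liks = {}
    for k in range(1, len(series)):  # the shift happens between k-1 and k
        before, after = series[:k], series[k:]
        mean_before = sum(before) / len(before)
        mean_after = sum(after) / len(after)
        sse = sum((x - mean_before) ** 2 for x in before)
        sse += sum((x - mean_after) ** 2 for x in after)
        log_liks[k] = -sse / (2 * sigma ** 2)

    # Normalise in a numerically stable way (subtract the largest log value).
    top = max(log_liks.values())
    weights = {k: math.exp(v - top) for k, v in log_liks.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Invented daily visits: the level drops from about 100 to about 80 at index 5.
visits = [101, 99, 102, 98, 100, 81, 79, 80, 82, 78]
posterior = changepoint_posterior(visits, sigma=3.0)
best_day = max(posterior, key=posterior.get)
```

If index 5 stood for the 20th May 2014, `posterior[5]` would be the estimated probability that the shift happened on that day. A site whose probability mass sits on some other day would not be attributed to Panda under this sketch.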

Process

MathSight didn't know what Panda was looking for, so we decomposed website architecture and content into a number of unique, MathSight-defined candidate signals, such as the use of rare words, the number of external CSS files and the use of paragraphs.

The next step was to use mean differences to spot patterns that could be directly identified with the change in traffic as a result of Panda. Some signals may look like patterns but be mere coincidence, so we analysed each candidate signal using ANOVA (analysis of variance).

If there was a 5% or lower probability that the difference observed for a signal had arisen by chance, we took this as a 95% (or higher) statistical likelihood that the signal was a causal factor for a change in traffic as a result of Panda.
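
As a rough illustration of this step, the sketch below computes the one-way ANOVA F statistic for a single candidate signal across the three traffic groups, then checks it against the 5% threshold. To stay within the Python standard library, the p-value is estimated with a permutation test rather than the F distribution a classical ANOVA would use, so it is a stand-in and not necessarily the procedure used in the study. The signal values, function names and shuffle count are invented.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_shuffles=2000, seed=0):
    """Share of random regroupings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Invented values of one candidate signal for sites in each traffic group.
gained = [900, 1100, 1000, 950, 1050]
lost = [4000, 4200, 3900, 4100, 3800]
unchanged = [2400, 2600, 2500, 2450, 2550]

groups = [gained, lost, unchanged]
f_value = f_statistic(groups)
p_value = permutation_p_value(groups)
is_significant = p_value <= 0.05  # the 5% threshold described above
```

The group means alone (1000, 4000 and 2500 here) are the "mean differences". The F statistic and p-value are what separate a genuine pattern from a coincidence of small samples.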

From the dataset, we also found the HTML character distance limit for content.

We carried out over 200,000 visits of the data per website to ensure the findings were repeatable and robust, should a third-party data scientist conduct a similar study on an identical selection of websites.

Results

The key standout signal for MathSight was the proximity of the start of a website’s on-page copy to the top of the page. This was measured as the HTML character distance from the top of the source code to the paragraph.

So our analysis has established that Google is rewarding sites that position their content where it is immediately readable; the further the body copy was from the top, the more the site was penalised.
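
A simple way to approximate the distance signal described above is to count the characters of source code that come before the first paragraph tag. The sketch assumes that "the paragraph" means the first opening `<p>` tag in the raw HTML, which the study does not confirm. The function name and the two sample pages are made up for illustration.

```python
import re

# Matches an opening <p> tag but not <pre>, <param> and similar tags.
_FIRST_PARAGRAPH = re.compile(r"<p(?=[\s>])", re.IGNORECASE)


def paragraph_distance(html_source):
    """Characters between the top of the source and the first <p> tag.

    Returns None when the page contains no paragraph tag at all.
    """
    match = _FIRST_PARAGRAPH.search(html_source)
    return match.start() if match else None


# Two invented pages: one lean, one with 20 stylesheet links in the head.
lean_page = "<html><head><title>T</title></head><body><p>Hi</p></body></html>"
heavy_page = (
    "<html><head>"
    + "<link rel='stylesheet' href='a.css'>" * 20
    + "</head><body><p>Hi</p></body></html>"
)

lean_distance = paragraph_distance(lean_page)
heavy_distance = paragraph_distance(heavy_page)
```

On this measure the heavy page scores worse than the lean one, even though both carry identical copy, because its head pushes the first paragraph further down the source.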


 Andreas Voniatis


Andreas Voniatis is a Data Scientist at Artios, the online marketing agency that uses maths and data science technology to provide quantified content strategy, social media, SEO and online PR. Andreas trained and qualified as a management accountant (CIMA) after graduating cum laude in Economics from Leeds University. He switched careers in 2003 to become a Search Engine Optimisation (SEO) consultant, holding various Head of Search roles for award-winning agencies and prestigious startups. Andreas has been featured in numerous media outlets, including PerformanceIN, for using Bayesian mathematics to uncover the secret ingredients of the Google Penguin algorithm. In 2013, he retrained as a data scientist and in 2015 launched Artios.

 
