seotworank: Correlation Data for SEO and Social Media Analysis - Part 2

Posted by Aaron Wheeler

Last week, Rand discussed the importance of correlation data in general and how you can use it for SEO research. It's a lot easier to get things done if you know which tasks are high priority and which are low, and correlation data can help. This week, Rand finishes off this two-part series on correlation data by discussing some specific observations we've made about correlations between SEO tactics and their effects on rankings. There are some very interesting conclusions, so check it out! Also let us know in the comments below if you've been able to draw any correlations of your own.

Video Transcription

Howdy, SEOmoz fans. Welcome to another edition of Whiteboard Friday. This week is the second in our two-parter on correlation data for SEO and social media analysis. I'm really excited about this one. We're going to be talking very specifically about a few of the really interesting things that we've observed from correlation data.

Last week, if you recall, we talked about a lot of the basics of correlation data. I showed some simple examples why it's useful both in aggregate and when studying some of your own stuff.

Today I'm going to be talking about some of those big aggregate average numbers collected from thousands of points of data to see what predicts better rankings overall. I want to be really clear, just to reiterate from last week: remember that correlation is not causation.

One of my favorite examples, the one I like to use a lot is the one with dolphins. So, dolphins swim in pods, and some of the ones that swim in the front of the pods have different characteristics than ones that swim at the end of the pods, just like things in the search results have different features at the front of the search results - the top of the search results position 1, 2, 3 - than the things that are further down on the search results, 5, 10, 15, 20. Right?

So, we look at an analysis of what makes for front of the pod swimmers in both scenarios. With dolphins, it's things like, well, they have larger dorsal fins and they've got stronger flippers. They also have more damage. They've got like scars and pieces of glass or something like that, like cuts and scrapes in their flippers.

So two of those things, the bigger dorsal fins and the stronger flippers, are probably causal. That's what's causing them to be front of the pod swimmers. But the damage? It has a high correlation, a good correlation, with swimming at the front of the pod. Does that mean that more damage means you'll swim at the front of the pod? If we were to bash up the fins of a dolphin that's swimming at the end of the pod, would he suddenly move to the front?

No. Right, it's correlation not causation. It's features that predict what people will look like up there. So when we are looking at things that are rankings, just remember this is correlation, not causation. Some of the features here might be things like damaged flippers, not stronger fins. So keep that in mind as we're looking at this.

That said, let's talk about some of these cool things. Number one, one of the things that we saw last June, we did a big analysis of Google versus Bing and the different ranking factors, looking at correlation across 11,000 search results in both. We had a very, very small standard error so that we can be very sure that these correlation numbers go across probably all the search results at the time.
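As a rough illustration of this kind of aggregate analysis, here is a minimal sketch that computes a rank correlation between one metric and ranking position for each search result page, then averages across pages and reports a standard error. The choice of Spearman correlation, the function names, and the sample numbers are all assumptions made for illustration; the transcript does not say which method or data was actually used.

```python
import math
from statistics import mean, stdev


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def mean_correlation(serps):
    """Average per-page correlation between a metric and ranking.

    `serps` is a list of result pages; each page is a list of metric
    values ordered by position (index 0 is position 1).
    Returns (mean correlation, standard error of the mean).
    """
    per_page = []
    for metric_values in serps:
        if len(set(metric_values)) < 2:
            continue  # correlation is undefined for a constant metric
        # Negate positions so a positive value means
        # "more of the metric goes with a better ranking".
        goodness = [-p for p in range(1, len(metric_values) + 1)]
        per_page.append(spearman(metric_values, goodness))
    return mean(per_page), stdev(per_page) / math.sqrt(len(per_page))


# Hypothetical linking-root-domain counts for the top 5 results of 3 queries.
serps = [
    [120, 80, 95, 40, 10],
    [300, 310, 150, 90, 60],
    [15, 40, 22, 9, 3],
]
avg, std_err = mean_correlation(serps)
print(f"mean correlation {avg:.3f}, standard error {std_err:.3f}")
```

With only three made-up pages the standard error is large; it shrinks with the square root of the number of pages, which is why thousands of search results give a much tighter estimate.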

We looked at things like number of linking root domains and the keyword in the title, the keyword in the domain name, document length. We looked at the length of the title and mozRank and PageRank and dozens of other features. What we found was that Google and Bing are not so different. In fact, on a lot of the SEO basics, the things that you would do for Google or for Bing are the same that you would do for the other engine.

That's really cool to learn because it means that we don't have to develop one site that's trying to rank well in Google and one site that's trying to rank well in Bing, doing different things for each of them. No, in fact, these engines are really, really similar. Then, of course, we found out in January of this year that Google had been running these experiments because they thought Bing's rankings looked too close to Google rankings. They were worried, and so they did this clickstream honeypot, and, of course, discovered that Bing was essentially measuring through Internet Explorer where people click after they perform a search on any engine, including Google. Google got upset about this.

Nevertheless, I think that says our analysis that these two engines are pretty similar is kind of verified by some other data, including Google people thinking, hey, wait a minute, these are looking really, really similar, right?

We get this big takeaway that, unlike the late '90s or even the early 2000s when SEOs used to build different websites targeting different search engines because they wanted different things, today we can really build one. That's a great takeaway. God, it saves us a ton of time and worry.

Number two, Facebook shares are highly correlated with Google rankings. This was one of our takeaways very, very recently, in March of this year, so just about a month ago, maybe a little less, depending when this Whiteboard Friday airs. You can see here that Facebook shares, in fact, were our single highest correlated metric, number one, with ranking higher, predicting that you would rank higher in Google among all the things that we measured.

We measured about 150 different factors, everything from keyword usage on the page to link metrics, to things like tweets and that kind of stuff. Those Facebook shares just seem to have an incredibly good correlation. Remember, 0.29 on a scale of 0 to 1 would not normally be that high. In a really simple system, where there's only one or two metrics that predict, 0.29 would be probably kind of low. But in a system where there's supposedly 200 plus unique ranking factors - probably much more than 200 plus at this point - but in a system with that much complexity to see one metric that shows such a high correlation is extremely rare. In fact, we've only seen a few metrics that are up in that 0.29, 0.3 range ever in the history of looking at correlation data.
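To make the point about complexity concrete, here is a toy model, offered only as a sketch. It assumes a score that is the plain sum of independent, equally weighted factors, which is certainly not how any real search engine works. Under that assumption the correlation between any single factor and the total score is 1/sqrt(n), so it falls quickly as the number of factors n grows. The function names and the simulation settings are hypothetical.

```python
import math
import random


def single_factor_correlation(num_factors):
    """Correlation between one factor and the sum of `num_factors`
    independent factors that all have the same variance."""
    return 1 / math.sqrt(num_factors)


def simulated_correlation(num_factors, num_pages=20000, seed=0):
    """Estimate the same correlation by simulating random pages."""
    rng = random.Random(seed)
    first_factor, scores = [], []
    for _ in range(num_pages):
        factors = [rng.gauss(0.0, 1.0) for _ in range(num_factors)]
        first_factor.append(factors[0])
        scores.append(sum(factors))  # toy score: unweighted sum
    mx = sum(first_factor) / num_pages
    ms = sum(scores) / num_pages
    cov = sum((x - mx) * (s - ms) for x, s in zip(first_factor, scores))
    vx = sum((x - mx) ** 2 for x in first_factor)
    vs = sum((s - ms) ** 2 for s in scores)
    return cov / math.sqrt(vx * vs)


for n in (2, 12, 200):
    print(n, "factors:", round(single_factor_correlation(n), 3))
print("simulated, 12 factors:", round(simulated_correlation(12), 3))
```

In this toy model, a single factor out of two correlates at about 0.71 with the score, while one out of 200 correlates at about 0.07. A value near 0.29 corresponds to only about a dozen equal factors, which gives a feel for why 0.29 stands out in a system with hundreds of inputs.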

We can kind of say, huh, seems like Google must be using these Facebook shares. Not necessarily directly. They might be getting more data from Facebook, but there's something going on there. Of course, Google themselves and Bing as well admitted in an interview with Danny Sullivan on Search Engine Land that yes, we use data from Facebook and from Twitter directly in our web rankings to help with our algorithmic search. Facebook shares, you can see that correlation. You've got to be thinking, as an SEO, how do I get me some of those Facebook shares on my pages?

Number three, one of the weirdest things to come out of our March 2011 data was the fact that no-follow links seemed to have a positive correlation with rankings. One of the things we did when we saw no-follow links having a really high correlation was we went, well that's just weird. Maybe what's going on here is that no-follow links and followed links have a high correlation with each other, and in fact, they do. If you have lots of no-follow links, you tend to also have lots of followed links. So, that makes sense. All right maybe that's all that's causing it. But then there's this one weird, weird data point - well, there's several weird ones - but there's this one weird data point around the percentage of followed links having a negative correlation, kind of a strong negative correlation with rankings, which sounds weird, but it suggests that websites and web pages that don't have any no-follow links aren't performing as well as those that have at least some or some reasonable percentage of them.
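One standard way to ask whether the no-follow correlation is just followed links showing through is a first-order partial correlation, which estimates how strongly two variables are related once a third is held fixed. The sketch below is an illustration only: the transcript does not say this technique was used, and the three input correlations are made-up numbers, not measured values.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z.

    Standard first-order partial correlation formula, taking the three
    pairwise correlations as inputs.
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical pairwise correlations (not from the study):
r_nofollow_ranking = 0.5    # no-follow link count vs. ranking
r_nofollow_followed = 0.8   # no-follow link count vs. followed link count
r_followed_ranking = 0.6    # followed link count vs. ranking

remaining = partial_correlation(
    r_nofollow_ranking, r_nofollow_followed, r_followed_ranking
)
print(f"no-follow vs. ranking, controlling for followed links: {remaining:.3f}")
```

With these made-up inputs almost nothing is left once followed links are accounted for, which is the "maybe that's all that's causing it" scenario. A clearly non-zero remainder would suggest something beyond the overlap with followed links.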

You kind of think about it. You scratch your head, like, "What? Wait, does Google want me to have no-follow links?" When you think that way, just remember correlation, not causation. So, it's not necessarily that Google's saying, "Oh, well, this website doesn't have a lot of no-follow links so let's rank them lower." That seems kind of crazy to me. I don't think that's the case. Possible but I don't think that's what's happening.

What I think is happening is that, for people who do natural things, for normal websites, this is not normal. It is not normal to have a website that only has followed links. It's almost like, man, you must be doing something funny because normal websites earn links from no-follows. They get linked to on Wikipedia, which is no-follow. They have blog comments that people leave and point to them. Those are no-follow. They have social media profiles. Almost all of those are no-follow. People tweet about them. Those are no-follow. There are all of these no-follow links that exist from sort of good places on the Web where you would naturally be mentioned if you're a good website.

So, to have only followed links is weird. No wonder . . . I don't know what it is exactly. We don't know what it is exactly that Google's measuring here, but I'm sure they're looking at this, not at this but at metrics that say, huh, this website does not interact in its ecosystem. One of the things that predicts those is no-follow links, and that's why you see that negative correlation.

Lots and lots of cool stuff, interesting data that we can take away from correlations even though we know it's not causal. We can say to ourselves, huh, this probably means, right? This probably means, oh, I'd better be interacting in the environment, and I shouldn't worry about getting no-follow links. This is not going to hurt me. In fact it might actually predict that I'm doing more good things on the Web.

In this case, right, it's saying, oh, you know what, Facebook likes have a much lower correlation, because liking something on Facebook, clicking that thumbs up button is so much easier than sharing and actually posting to your wall. I know the like technically posts to your wall, but it doesn't show up in top news. It only shows up in recent updates. So sharing, oh, that's a good behavior to start encouraging. Maybe I should be encouraging more shares than likes on my pages. Having this, the Google and Bing data says, oh, I can build one website and do a lot of the key basics that are going to be the same for all of them.

This type of data is incredibly useful. We love doing it. We plan on doing a ton more. If you've got requests for things that you would like to see us do, please put them in the comments and we will be happy to try to measure them in the future.

Hope this data is interesting for you. Hope lots of you start doing more correlation analyses, rigorous data analyses of this type. I think it will be awesome if we, as a community, start to make a lot of our insight and our intuition a little more scientifically based, math based. I'm very excited for it.

All right, everyone. Thanks for watching these two Whiteboard Fridays. We will see you again next week. Take care.