I’m not a programmer. I can hack things together, and I’m great at copy / paste, but creating code from scratch that isn’t massively buggy? Not really one of my talents. As an SEO, however, that isn’t really important. Will Critchlow recently said that “Being able to make your own tools is going to become an increasingly important skill of the modern day SEO”. That’s a sentiment I completely agree with, so to demonstrate how this can help I set myself a challenge:
Find, qualify, and categorise a few thousand link prospects in an afternoon. Without spending a penny.
My chosen area for this is pets, and in particular dogs. As co-owner of Pet365, I need everything to be done as quickly and efficiently (i.e. cheaply) as possible. So I started by searching for “top lists of dog blogs”. Copying and pasting the URL of each of these lists into a text document (without even checking their contents) gave me 20 sites that I thought would have some great link prospects. There were more out there, but time was of the essence.
Next up, Technorati. There’s a huge section dedicated to pet blogs there, which runs to over 100 pages. I treated each of these pages as a list of potential targets and, with some quick Excel wizardry, had a collection of Technorati URLs that I knew would give me more link targets. The great thing about this is that if bloggers have gone to the effort of getting listed, chances are they’re reasonably serious about their sites, and therefore more likely to be active and producing good content.
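If you’d rather not open Excel, the same trick takes a few lines of Python. This is a minimal sketch: the URL template is a hypothetical placeholder, not Technorati’s actual pagination scheme, so swap in whatever pattern the directory you’re mining shows in the address bar.

```python
# HYPOTHETICAL template: replace with the real pattern of the listing you are mining.
URL_TEMPLATE = "https://directory.example.com/pets/page-{page}/"


def listing_page_urls(template: str, first: int, last: int) -> list[str]:
    """Return the listing URL for every page from first to last, inclusive."""
    if first > last:
        raise ValueError("first page must not come after last page")
    return [template.format(page=n) for n in range(first, last + 1)]


if __name__ == "__main__":
    # One URL per line, ready to paste into a link-scraping tool.
    for url in listing_page_urls(URL_TEMPLATE, 1, 100):
        print(url)
```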
Finally, blogrolls. I’ll come clean here: I already had some insider knowledge and knew where I was most likely to find blogs with a list of other potentially relevant sites. This saved me some time but, to be honest, I could easily have skipped this step without really affecting the outcome. Within the pet blogosphere there are networks like DogTime and BlogPaws; chances are that, whatever your niche, something similar will exist.
Less than an hour in, I had 150 URLs that I thought would be potential link targets in their own right and would also link out to other useful sites. But how would I know for sure? This was going to be the tricky bit.
Finding the Potentials
The first step was to create an account at Citation Labs Tools and use Garrett’s ‘pay with a tweet’ option to supersize my account. The cost of these tools is based on the amount of bandwidth you use so, with my initial 10 MB and an extra 250 MB for the tweet, I knew I’d have more than enough for this little challenge.
Next, I needed to scrape all of those URLs from earlier for outbound links. You could do this manually, but we only have an afternoon here, so fire up the Outbound Link Checker, paste your list of URLs into its ultra-user-friendly interface and hit ‘Go’.
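For the curious, here’s roughly what an outbound link checker does under the hood. This is a minimal Python sketch using only the standard library, not Citation Labs’ actual implementation: it parses a page’s HTML, resolves every `<a href>` against the page URL, and keeps the http(s) links that point to a different host. It treats subdomains (including `www.`) as different hosts, and fetching the HTML is left to you.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_links(page_url: str, html: str) -> list[str]:
    """Return absolute links found in html that point to a host other than page_url's."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc.lower()
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # turns "/about" into a full URL
        parsed = urlparse(absolute)
        # Skip mailto:, javascript: and links back to the same site.
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() != page_host:
            links.append(absolute)
    return links


if __name__ == "__main__":
    sample = ('<a href="/about">About</a> <a href="https://other.example/post">Post</a> '
              '<a href="mailto:me@dogblog.example">Mail</a>')
    print(outbound_links("https://dogblog.example/list/", sample))
    # ['https://other.example/post']
```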
Being British, I find it impossible to work for more than 39 minutes without a cup of tea, so while that was running it was time to make a brew.
Upon my return I was greeted with a completed CSV file, ready to download. It contained a total of 15,000 rows which, I deduced, meant that I was onto a winner. Going through each of those individually, however, would be pretty much impossible. The first job was to filter out any duplicates. So, again, I fired up Citation Labs, went to the URL conversion tool, pasted in my list, and within a few seconds had a slightly smaller list of domains.
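The URL-to-domain conversion and de-duplication can be approximated in Python like this. It’s a sketch of the idea, not a copy of the Citation Labs tool: it lower-cases host names, strips a leading `www.` (an assumption about how you want to count sites) and keeps the first occurrence of each domain.

```python
from urllib.parse import urlparse


def unique_domains(urls):
    """Reduce URLs to host names, dropping duplicates but keeping first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for raw in urls:
        raw = raw.strip()
        if not raw:
            continue
        # urlparse only recognises the host when a scheme is present.
        if "://" not in raw:
            raw = "http://" + raw
        host = urlparse(raw).hostname  # already lower-cased, port removed
        if not host:
            continue
        if host.startswith("www."):
            host = host[4:]
        seen.setdefault(host, None)
    return list(seen)
```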
Using an epic amount of common sense, I knew that not all of these domains would still be active. Some would have expired, some would have been 301-redirected to other sites, and others would return errors. Fortunately Garrett had the answer for this, and within a few minutes I’d checked the status of each domain (1,000 at a time) and combined the results into a new CSV file. Finally, I re-ran the duplicate removal tool (just in case any of these sites had been 301’d to one already in my list) and saved my final list of prospects.
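If you wanted to script the status check yourself, a rough Python equivalent might look like the following. It’s a sketch under a few assumptions, not how Garrett’s tool works: only a final `200` response counts as live, `urlopen` follows redirects so a 301’d domain is credited to wherever it ends up, and anything that fails to connect is treated as dead. The second function repeats the de-duplication, which is what catches sites redirected onto a domain you already have.

```python
import http.client
import urllib.error
import urllib.request
from urllib.parse import urlparse


def check_domain(domain, timeout=10.0):
    """Fetch a domain's home page, following redirects.

    Returns (domain, status_code, final_url); status 0 means no usable response.
    """
    try:
        with urllib.request.urlopen("http://" + domain, timeout=timeout) as resp:
            return domain, resp.status, resp.geturl()
    except urllib.error.HTTPError as err:  # 4xx/5xx responses (must come before OSError)
        return domain, err.code, ""
    except (OSError, http.client.HTTPException):  # DNS failure, refused, timeout, bad reply
        return domain, 0, ""


def live_prospects(checks):
    """Keep domains that answered 200, credit redirects to their final host, de-duplicate."""
    prospects = {}
    for domain, status, final_url in checks:
        if status != 200:
            continue  # expired, erroring or unreachable
        host = urlparse(final_url).hostname if final_url else None
        target = (host or domain).lower()
        if target.startswith("www."):
            target = target[4:]
        prospects.setdefault(target, None)  # dict keeps first-seen order
    return list(prospects)
```

In practice you’d build the input with something like `[check_domain(d) for d in domains]`, ideally in batches so one slow server doesn’t hold everything up.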
Easy as that. Kind of. This was really just a big list of websites that I knew nothing about. The Outbound Link Checker would have found everything from blog comments to paid links and banner ads. There’d definitely be some pure gold in there; I just needed to find it.
Page Semantics
TextWise is, in short, API heaven. You give it a URL and it returns a list of keywords associated with that page, based on the old-school DMOZ categories. You need a little bit of programming knowledge to get this to work, but not a huge amount.
Essentially, all we need to do is create a MySQL table and then, from inside a PHP script, loop through all of the URLs we’ve found and see which keywords are associated with them. I started by creating a script that handled one domain at a time, played around with the TextWise API a little to make sure the results were consistent, and then updated the script to accept as many sites as I wanted. In total it took an hour or so to create a very basic (and buggy) version, but you could easily hire someone on oDesk to do the same thing to a better standard very cheaply ($100–$200).
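To give you (or your oDesk hire) a head start, here’s the shape of that loop as a Python sketch. It swaps the MySQL table for the standard library’s `sqlite3` so it runs on its own, and `fetch_keywords` is a hypothetical stand-in for whatever request the TextWise API documentation describes; all it has to do is return `(keyword, weight)` pairs. The table layout is an assumption too, not the original script’s.

```python
import sqlite3
from typing import Callable, Iterable, Tuple

# sqlite3 stands in for the MySQL table used in the post; the layout is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS page_keywords (
    domain  TEXT NOT NULL,
    keyword TEXT NOT NULL,
    weight  REAL NOT NULL,
    PRIMARY KEY (domain, keyword)
)
"""

# HYPOTHETICAL: a function that calls the TextWise API for one domain and
# returns (keyword, weight) pairs. Write it from the API's own documentation.
KeywordFetcher = Callable[[str], Iterable[Tuple[str, float]]]


def store_keywords(conn: sqlite3.Connection, domains: Iterable[str],
                   fetch_keywords: KeywordFetcher) -> int:
    """Look up and store keywords for each domain; return how many pairs were written."""
    conn.execute(SCHEMA)
    written = 0
    for domain in domains:
        try:
            pairs = list(fetch_keywords(domain))
        except Exception as err:  # one failing site shouldn't stop the whole run
            print(f"skipping {domain}: {err}")
            continue
        with conn:  # commit per domain so progress survives a crash
            conn.executemany(
                "INSERT OR REPLACE INTO page_keywords (domain, keyword, weight) VALUES (?, ?, ?)",
                [(domain, kw, weight) for kw, weight in pairs],
            )
        written += len(pairs)
    return written
```

Using `INSERT OR REPLACE` on a `(domain, keyword)` key means you can safely re-run the script over the same list after a crash without piling up duplicate rows.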
The result was astonishing. From my 20,000 URLs I ended up with about 500,000 rows of keywords, weighted to show how important each was on the page. A few quick queries through phpMyAdmin showed that I had 2,000 sites with content related to dogs, cats or pets. My only problem was that I didn’t know how to contact them and, more importantly, whether I should be spending time doing so. Fortunately SEOMoz has a (free) API.
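Those phpMyAdmin queries boil down to something like the SQL below, shown here run from Python against `sqlite3`. It assumes a hypothetical `page_keywords` table with `domain`, `keyword` and `weight` columns, and that the topic keywords come back as plain words like “dogs”; adjust the terms to match what the API actually returns.

```python
import sqlite3

# Illustrative topic terms; matched exactly, ignoring case.
TOPIC_TERMS = ("dogs", "cats", "pets")


def sites_on_topic(conn, terms=TOPIC_TERMS, min_weight=0.0):
    """Return (domain, relevance) rows for sites with any topic keyword, most relevant first.

    Relevance is the sum of the weights of that site's matching keywords.
    """
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT domain, SUM(weight) AS relevance
        FROM page_keywords
        WHERE LOWER(keyword) IN ({placeholders}) AND weight >= ?
        GROUP BY domain
        ORDER BY relevance DESC
    """
    params = [t.lower() for t in terms] + [min_weight]
    return conn.execute(sql, params).fetchall()
```

Matching whole keywords rather than using `LIKE '%cat%'` avoids false positives such as “education” or “communication”.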
Metrics
A quick Google search for ‘SEOMoz PHP API’ turned up the relevant library, and I hit the download button. Within a few seconds I’d copied it into my FTP client, uploaded it to the web server, and was ready to start playing around. The great thing here was that the library included some examples (although they’re not great) and, along with the documentation, these let me quite quickly get SEOMoz to check an individual URL for its Page Authority and Domain Authority. All that remained was to loop through each URL in my database, check its stats, and store them.
Again, this is fairly straightforward, but because SEOMoz now imposes a rate limit you’ll need some extra tweaking to make sure you’re not firing off too many requests and ending up with the dreaded ‘Throttled’ error message. Have a word with your programmer (or create a job on oDesk or Elance) and point them to this post.
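Here’s a rough idea of what that tweaking involves, as a Python sketch rather than anything from the SEOMoz library itself. `fetch_metrics` is a hypothetical function wrapping the library call for one URL, and `ThrottledError` is a hypothetical exception you’d raise when the response says you’ve been throttled. The 10-second spacing is a placeholder, so check the limit for your own plan. The `sleep` and `clock` parameters exist only so the loop can be tested without actually waiting.

```python
import time


class ThrottledError(Exception):
    """HYPOTHETICAL: raise this from fetch_metrics when the API says you're throttled."""


def fetch_all_metrics(urls, fetch_metrics, min_interval=10.0, max_retries=3,
                      sleep=time.sleep, clock=time.monotonic):
    """Call fetch_metrics(url) for each URL, spacing calls out and backing off when throttled.

    Returns {url: metrics}; URLs still throttled after max_retries retries are left out.
    """
    results = {}
    last_call = None
    for url in urls:
        for attempt in range(max_retries + 1):
            # Keep at least min_interval seconds between consecutive requests.
            if last_call is not None:
                wait = min_interval - (clock() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = clock()
            try:
                results[url] = fetch_metrics(url)
                break
            except ThrottledError:
                # Exponential backoff (1x, 2x, 4x ... min_interval), skipped after the last try.
                if attempt < max_retries:
                    sleep(min_interval * 2 ** attempt)
    return results
```

Whatever comes back can then be written to the database alongside the keyword rows, so a re-run only needs to fetch the URLs that are missing.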
Finally, it’s a case of finding the contact information for each site. I use a mixture of automated and manual checking for this, but the process is basically as follows:
1. Export sites from my database that match the required keyword;
2. Filter those with a Domain Authority within my chosen range (typically 30 or higher);
3. Run Citation Labs’ contact finder on the results and download the CSV;
4. Do some Excel VLOOKUP wizardry to pull email addresses and contact forms into my list of sites;
5. Outsource the manual search for any missing email addresses and, to increase the conversion rate, the first and last names of the site owners along with their Twitter and Facebook profiles.
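Steps 1, 2 and 4 are easy to script once everything has been exported. Here’s a minimal Python sketch: the site records and the contact-finder CSV column names (`domain`, `email`, `contact_form`) are assumptions, so rename them to match your own exports. The dictionary lookup does the job of the VLOOKUP, and any site left without an email address or contact form is flagged for the manual search in step 5.

```python
import csv


def load_contacts(csv_lines):
    """Index a contact-finder CSV export by domain.

    Assumes hypothetical 'domain', 'email' and 'contact_form' columns.
    """
    contacts = {}
    for row in csv.DictReader(csv_lines):
        domain = (row.get("domain") or "").strip().lower()
        if domain and domain not in contacts:
            contacts[domain] = {
                "email": (row.get("email") or "").strip(),
                "contact_form": (row.get("contact_form") or "").strip(),
            }
    return contacts


def outreach_list(sites, keyword, contacts, min_da=30):
    """Steps 1, 2 and 4: filter sites by keyword and Domain Authority, then join contacts.

    Each site is assumed to be a dict with 'domain', 'keywords' (a set) and
    'domain_authority' keys.
    """
    rows = []
    for site in sites:
        if keyword not in site["keywords"]:  # step 1: matches the niche keyword
            continue
        if site["domain_authority"] < min_da:  # step 2: DA threshold
            continue
        found = contacts.get(site["domain"].lower(), {})  # step 4: the VLOOKUP
        email = found.get("email", "")
        form = found.get("contact_form", "")
        rows.append({
            "domain": site["domain"],
            "domain_authority": site["domain_authority"],
            "email": email,
            "contact_form": form,
            # Nothing found automatically, so this site goes on the outsourced list.
            "needs_manual_search": not (email or form),
        })
    return rows
```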
My afternoon challenge ended after step 4, but by this point I had around 2,000 sites that I knew were related to my chosen niche and would potentially be worth getting in touch with. The final stage, step 5, is entirely dependent on what you have to offer. In my case it was some fantastic infographics, but the important thing to remember is that if you’re contacting people, it needs to be for a reason other than just begging for a link. Good luck!