What We Learnt From A Pills Link Hacker

This post is a first for me: the first time there is a guest post (well, semi-guest) on this site. It is also my first collaboration with one of my favourite research SEOs, Neyne. Neyne (real name Branko Rihtman) doesn’t blog very often, but when he does it is always worth a read. This is a two-part post: the first part is by Neyne, the second by yours truly.

My last post was about using WordPress plugin flaws to build links, aka “soft hacking”. What we are about to demonstrate is that another open-source CMS, Joomla, has just as big a flaw as WP. We didn’t investigate the backdoor or how the hack was done; we do, however, demonstrate the extent to which it works.

Worse Than Blackhat, Meet The Hacker SEO

Just like the “SEO is Dead” debate, which has raised its lame head at seemingly regular intervals over the past few years, its not-so-distant cousin, the “Whitehat vs. Blackhat” debate, keeps coming back. There has been one raging on the popular blogs in the last week or so and, just like its useless relative, this round did not bring any new arguments, nor has it convinced anyone on either side of the argument. However, it is not often that one gets to encounter a true black hat campaign, one that leaves you in no doubt as to whether it is useful or whether it is illegal. Thanks to a tip from one of my SEO buddies, I have taken a glimpse into the eyes of the beast, and it ain’t pretty.

Just before we dive in, I want to make something clear. I don’t usually out websites or SEO techniques. I think that outing is a cowardly practice, done by people who are not capable of outperforming others. Or, in the immortal words on one of Aaron’s T-shirts: “I have a very high tolerance for spammers, but a very low one for weasels”. That said, the techniques outlined in this article are most probably illegal (I am not a lawyer, so I don’t want to be definite on that one). They include hacking into other people’s sites, flagging them as pill-related, squandering their link equity and eventually getting them flagged as compromised in Google SERPs, thus seriously decreasing their CTRs. Asshattery like that should be eliminated, and I feel no remorse for helping to do so.

It all started with an enquiry from that friend about one of his client’s sites. The site seemed to be OK, with nothing irregular about it; however, when looking at the Google cached version of the site, a footer appeared:

Pills Footer

This footer does not appear when the site is visited with a Googlebot user agent, so my guess is that this is a case of IP cloaking. The more interesting thing is that none of the sites linked in the footer seem to be V1@6r@ related. They are regular sites on a wide range of topics. So my first thought was that this was a hatchet job: a slimy SEO company trying to get its competitors banned by creating thousands of artificial, spammy links on hacked sites. However, when looking at the source code of the Google cache of each of the linked sites, a different picture started to emerge. Check out the differences between the <head> element as it appears on the live site vs. how it appears in Google Cache:

Google Cache Header of Hacked Site

So my next question was whether these sites rank for any of the linked phrases. Almost all of them did. Check out this SERP for [V1@6r@ price] (6600 Global Exact Match monthly searches):

Ranking for V Price

So here came the head-scratching part. It seems like someone is hacking into Joomla-based sites and planting links in their footers to other hacked Joomla sites, whose headers are cloaked to show V1@6r@-related keywords. But what is the point? Why would someone send V1@6r@-relevant traffic to totally unrelated websites? Then I clicked through to the site from the above SERP. This is the site I got:

Now you See It

If you go to the site directly, by typing the URL into the address bar, this is what you get:

Now You Don’t

So not only are they doing IP cloaking, they are also doing referrer cloaking, showing a different site to all visitors referred from Google SERPs. Here is a partial list of sites, with their original titles, hacked titles, the keywords they targeted with footer link anchors and their rankings on Google.com for those keywords:

Partial List of Hacked Sites

There is one thing that is common to all the websites in question: they have all been built with Joomla. Furthermore, it is easy to target them, as there is a clear indication in their header that they are Joomla based:

<meta content="Joomla! 1.5 - Open Source Content Management" />
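
If you run a Joomla site yourself, it is worth checking whether your own pages advertise the CMS like this. The following is only a minimal sketch, assuming you already have the page HTML as a string; the class and function names are made up for illustration, and the check for a `name="generator"` attribute is an assumption on top of the tag shown above.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collects the content of <meta> tags that look like CMS generator tags."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content") or ""
        name = (attributes.get("name") or "").lower()
        # Assumption: either the tag is named "generator" or its content
        # starts with "Joomla!", as in the example from the hacked sites.
        if name == "generator" or content.startswith("Joomla!"):
            self.generators.append(content)


def find_generators(html):
    """Return the generator strings a page exposes to anyone who asks."""
    finder = GeneratorFinder()
    finder.feed(html)
    return finder.generators
```

An empty list means the page is not announcing its CMS this way. That does not make the site secure, but it does make it harder to find with a simple footprint search.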

***********Investigation Ends*************

Search Volumes for v1@6r@

So Neyne has shown you the what, how and why. Hacking this many sites for those rankings isn’t an easy job, unless you prebuild hacker doorways as I demonstrated in the WP Plugin Security fail. The only other way to do this is to run a number of brute force scripts on known weak spots of various servers and CMSs. I want to show you what I learnt from investigating those links with Neyne. Like I said with the JC Penney scenario, when you get a chance to learn, do it.

10 Things I Learnt About The V1@6r@ Link Hackers

1. Old spam tactics still work

A while ago, I wrote about Spam Tactics, Then and Now, where I identified a number of tactics that still work. This discovery reinforces what I learnt back then: old spam tactics don’t die, they just resurface. And Google isn’t really as sophisticated an algo as people believe it to be. Some of the points below go into this in more detail…

2. content is not king

None of the sites we investigated were serving up content that was V1@6r@ related. Of course, quite a few had cloaking, which meant that some content was being shown, but after investigating a number of these sites we found that not all of them had redirection or cloaking set up yet; those just had doctored links. So why did they rank for these keywords?

Just links. Links, links and more links. What about great content? Nope. Links.

Using Majestic, let’s look at what the links could be like:

Look at all those links! (click to view Majestic data)

3. anchor text overrules all

Wordle for Links

Relevancy, thematic links, semantic analysis etc etc can all go to pot if you have large-scale access to a link text manipulation system. It doesn’t matter where the links are placed, and it doesn’t matter where they come from.

An advanced analysis of the anchors for some of the sites we looked at gives you the Wordle above – you can see how heavy the manipulation is. In raw terms:

Anchor links Count
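
A raw anchor count like this is easy to reproduce from any backlink export. The snippet below is a rough sketch, assuming you have already pulled the anchor texts into a plain Python list; the function name is hypothetical, and the lower-casing and whitespace clean-up are my own choices rather than anything a particular tool does.

```python
from collections import Counter


def anchor_text_counts(anchors):
    """Count how often each anchor text appears, most frequent first."""
    # Lower-case and collapse whitespace so near-identical anchors are merged;
    # blank anchors (image links, for example) are skipped.
    normalised = [" ".join(a.lower().split()) for a in anchors if a.strip()]
    return Counter(normalised).most_common()
```

If one or two commercial phrases dominate the top of that list on a site that has nothing to do with them, you are probably looking at the kind of manipulation shown in the Wordle.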

4. footer links work

For a while SEOs have been playing down the value of links in footers or common elements – ummm, they seem to work.

5. sitewide links work

Again, we get arguments that the value of sitewide links has been dampened greatly. Not when you are working in volume, as we discovered when we investigated these sites.

6. referrer cloaking still works

I think Neyne demonstrated this pretty well above.

The fact that referrer cloaking works is evident from the fact that the hacked sites are ranking even though they serve different content to users coming from Google SERPs.

Another spam tactic from the past, still live and well.
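
If you want to check your own site for this, one rough approach is to request the same page twice, once directly and once with a Google referrer, and compare the titles. This is a sketch only: the function names are hypothetical, the user agent string is a placeholder, and it will not catch IP cloaking, because that keys on the requesting IP address rather than on request headers.

```python
import re
import urllib.request

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the page title with whitespace collapsed, or '' if there is none."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else ""


def fetch(url, referer=None, user_agent="Mozilla/5.0"):
    """Download a page, optionally pretending to arrive from a referrer."""
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_cloaked(direct_html, referred_html):
    """True when the two versions of the page carry different titles."""
    return extract_title(direct_html) != extract_title(referred_html)
```

For a site you own, something like `looks_cloaked(fetch(url), fetch(url, referer="https://www.google.com/"))` gives a quick first signal. A title mismatch is not proof of a hack, but it is a good reason to go and look at the source.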

Scripting, it’s an Art – this one isn’t. (This is a tracking script on one of the sites.)

7. I need to set up alerts

What really shocked me is that these site owners still haven’t realized that they rank for these keywords. If you suddenly rank for, or get traffic from, dodgy keyphrases, it’s time to check WTF is going on. Now, in the case of user agent redirection, sometimes analytics will not record those visits. But the keyphrases will most certainly show up as high-volume impressions if you are signed in to Google Webmaster Tools. AND it has a malware detection piece which is worth looking at once in a while.
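
The alert itself does not need to be clever. As a minimal sketch, assuming you can export the search queries your site is getting impressions or visits for as a list of strings, you could flag anything that matches a watchlist of terms that have no business being there. The function name and the watchlist are hypothetical; pick terms that make sense for your own niche.

```python
def flag_suspicious_keywords(keywords, watchlist):
    """Return the keywords that contain any watchlist term (case-insensitive)."""
    terms = [term.lower() for term in watchlist]
    return [
        keyword
        for keyword in keywords
        if any(term in keyword.lower() for term in terms)
    ]
```

Run it against each export and email yourself whenever the result is not empty. A plain substring match will throw up false positives, which for an alert like this is the safer way to be wrong.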

8. I need to monitor catch all accounts

Google does try to email the sites that it has flagged:

Site Compromised

Site Compromised on All Accounts

But you need to monitor, and even set up, catch-all email accounts. Google’s own guidance:

You can find out if your site has been identified as a site that may host or distribute malicious software (one type of “badware”) by checking the Dashboard in Webmaster Tools. (Note: you need to verify site ownership to see this information.) We also send notices to webmasters of affected sites at the following email addresses for the site:

  • abuse@
  • admin@
  • administrator@
  • contact@
  • info@
  • postmaster@
  • support@
  • webmaster@
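
To check that these mailboxes actually exist for each of your domains, it helps to have the full list of addresses in one place. A tiny sketch, with a hypothetical function name and the prefixes taken from the list above:

```python
NOTIFICATION_PREFIXES = [
    "abuse",
    "admin",
    "administrator",
    "contact",
    "info",
    "postmaster",
    "support",
    "webmaster",
]


def notification_addresses(domain):
    """Build the addresses that Google's notices may be sent to for a domain."""
    return [f"{prefix}@{domain}" for prefix in NOTIFICATION_PREFIXES]
```

Feed the result into whatever you use to create aliases or a catch-all, and make sure somebody actually reads what lands there.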

9. edu sites need some serious help

As part of the investigation, I had to scan a large number of SERPs for v1@6r@-related keywords. The most common resulting domain extension? That would be “.edu”. Google and/or someone else needs to teach these guys how to secure their sites… It’s not hard to spot the volume of hacking – see this simple query.

Or look at this gem:

edu Ranks for Buy that stuff Cheap

10. .gov sites are FUBAR

US Gov Search - Uncle Sam

Another common domain extension that shows up in the SERPs is the .gov extension. By the way, did you know Google has an old search page that only looks at government sites? Look what I found through it: http://bit.ly/dOlzKR

SERP Sniffing – A Long Tail Keyword Strategy

The Art Of SERP Sniffing

SERP Sniffing is a technique that has been used by a number of thin affiliates, blackhats and spammers to identify profitable long tail keywords to optimise for. Typically this technique charts thousands of easy pickings across the SERPs to bring in long-term, scalable traffic. I would like to explore this technique and demonstrate how it works, especially since I tried it myself to prove that it can and does work.

However, it is necessary to define the long tail and issues associated with it before going on to the technique itself.

What’s the most difficult part of SEO for the long tail? In my opinion there are two parts that make long tail strategies difficult:

1. Identifying Long Tail Keywords

By definition, long tail keywords make up the multitude of variations in any keyword target campaign. This means that any number of strings may be attached to the original set of keywords / keyphrases to make up a number of subsets, which could further branch off into sub-subsets. Most of these are individually low volume, but combined they make up a huge share of related traffic that sites ought to target. However, because of their low search volume, most of these keywords do not show up in most keyword tools.

Keyword Longtail Explained

How do you identify these? Or do you blindly pursue keyword variations by stuffing your target pages with as many variations as you can think of?

2. Identifying Ranking Opportunities Amongst Long Tail Keywords

So let’s say you have somehow compiled a list of X,000,000 Longtail Keyword variations. Well done. Awesome. Your keyword skills rock. But which ones should you try and target first? How easy is it to rank for these? As a rule, long tails tend to be easier to rank for, but assuredly, you won’t be the only person trying to get those rankings either.

How do you work out how quickly you can rank for KW X over KW Y without carrying out detailed SERP scraping exercises to work out some sort of value model in scale?

The Problem In Decision Making

So ideally you want to work on the easy ranking keywords first and then worry about the rest. Or you may want to work on the pot of long tails with the higher combined traffic value first. Either way you need the data that shows:

  • Ease of Ranking
  • Potential Traffic per LTK (Long Tail Keyword)

Neither of these is easy to define, nor is there a ready guide where you can grab those numbers from. So how do you go about defining a detailed long tail strategy that is based on “real numbers” with regards to traffic and rankings?

The way to decide would obviously require real figures, real potential, in order to define priority. After all, isn’t it about profitability? Time is money and all that? Can you really waste time chasing after rankings that don’t actually have traffic potential?

Show Me the Money

The Spammers Guide To SERP Sniffing

Warning: This is a HIGH risk strategy that may get you banned, and I don’t actually advise it. The following technique is for educational purposes only, and I do not condone Search Engine Spamming.

As I have discussed in my previous posts on Black Hat SEO and SEO Automation, there are some industrial-level methods to drive thousands of rankings fairly easily. This is easier when targeting Long and Very Long Tail traffic. However, the strategies aren’t sustainable, as they are prone to creating “Burn Sites”, which may gain short-term rankings but not long-term sustainability. This is because Google’s algo does recognise such sites and penalises them, or “deprioritises” them in the SERPs.

However, short-term rankings and traffic are great too. Not for sustainable businesses, but for research for sustainable businesses. Imagine if you raked in all the relevant data that these “burn” sites gave you, then used it on legitimate sites? That’s what SERP Sniffing is.

Utilising Gray / Blackhat techniques to research SERP weaknesses so as to exploit them for Whitehat Purposes.

So How Does It Work?

Well to start off with, take your Sets of Two Word Phrases. Categorise them logically as you would in the absence of data, into their long tail targets.

In essence you could have:
Phrases:

  • Blue Widgets
  • Red Widgets
  • Pink Widgets

Categories:

  • Phrase + Location
  • Phrase + Review
  • Phrase + Buy
  • Phrase + For Sale
  • Cheap + Phrase
  • Free + Phrase
  • ETC ETC
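
The phrase and category lists above multiply out mechanically. Here is a minimal sketch of that expansion; the function name and the template format are my own, and I have left out "Phrase + Location" because it needs a separate list of places that isn’t given here.

```python
from itertools import product

PHRASES = ["blue widgets", "red widgets", "pink widgets"]

# Each template says where the modifier goes relative to the phrase.
TEMPLATES = [
    "{phrase} review",
    "{phrase} buy",
    "{phrase} for sale",
    "cheap {phrase}",
    "free {phrase}",
]


def expand(phrases, templates):
    """Combine every phrase with every template to get the category list."""
    return [
        template.format(phrase=phrase)
        for phrase, template in product(phrases, templates)
    ]
```

Three phrases and five templates give fifteen categories, and every extra modifier adds another three, so the category list grows quickly.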

Now you set up an automated site, where you pull in content in some real volume on the Categories and Sub Categories related to your Keyword Sets as identified above. Make sure you scale the operation such that the posts per category are coming in fast and hard once the site is indexed.

Use a “burn” link network to scale up back links (these only work in the short term and are also easily penalized).

Using your analytics, you should be able to identify keyword combinations that start driving traffic. In my experience such sites die out within 2 to 14 days, so you need to run daily exports of keywords and run ranking reports against those keywords.

Once the site has been burnt, you now have data:

  • The Keywords that drove traffic
  • Identified SERP positions for such keywords.
  • Ease with which positions were garnered.
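
Once you have that data, you still need some way to order it. The sketch below is one possible scoring, not a proven model: it assumes each keyword record has a `visits` count and a best `position` from your ranking reports (both field names are hypothetical), and it simply favours keywords that brought in traffic and ranked high.

```python
def prioritise(records):
    """Sort keyword records so high-traffic, high-ranking keywords come first."""
    # Assumed score: visits divided by position, so position 1 counts in full
    # and a page-two ranking is heavily discounted.
    return sorted(
        records,
        key=lambda record: record["visits"] / record["position"],
        reverse=True,
    )
```

You could just as easily weight by conversion value or by how many days a ranking lasted; the point is only that the ordering comes from observed numbers rather than guesses.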

Real Example

I ran this experiment on a site that could in no way be linked to my main sites, keeping the domain registration, domain ownership, hosting etc all different to anything that could be linked back to me, either via the algo or via manual human review. The site is defunct, and the niche I ran it for is one I don’t work in. This was purely an exercise in experimenting. I didn’t try to monetise the site.

I picked a Keyword Set of 4 two-word combinations, each further broken down into 6 subsets, which gave me 24 categories in total.

The site ran for a total of 15 days from indexation, and started bringing in traffic. See the Traffic Spike:

SERP Sniffing Traffic Spike

The strategy identified over 2,400 keywords that drove traffic to the site in the days it ranked. Think about this. I had an original target of 24 keywords (categories!). Automating these with random, yet related content multiplied the keywords into a data set of 2,400. That’s around 100 variations per category!

Keyword Traffic Rank Breakdown

Cross-referencing these keywords against the SERP rankings showed that over 90% of these rankings were on page one. So I have a pot of 2,400 keywords that drove traffic to my site in the space of 14 days, with 60% of them in positions 1 to 4.

Now don’t assume that the traffic that comes in through this method is crap either – check out the page view stats:

Traffic Page Views

Assuming that this is the normal trend of traffic for this category, if I maintained those rankings on a legitimate site, with good quality content, for those target keywords, with pages dedicated to these LTKs, then I stand to gain [(7100/14)*365] 185,107 page views annually. (I am not disclosing traffic data – sorry!)
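
For anyone who wants to plug in their own numbers, the projection above is a straight-line extrapolation. A quick sketch, with a hypothetical function name, using the figures quoted in this post:

```python
def annual_projection(page_views, days):
    """Scale the page views seen over a number of days up to a full year."""
    return page_views / days * 365


# Figures from the experiment: 7,100 page views in 14 days.
print(round(annual_projection(7100, 14)))  # 185107

# 2,400 keywords from 24 categories.
print(2400 / 24)  # 100.0
```

It assumes the daily average holds all year, which ignores seasonality and competitors, so treat the result as a rough ceiling rather than a forecast.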

Summary

As I demonstrated above, the technique does work. I don’t advise it, for obvious reasons. However, compare YOUR Long Tail Keyword research with this method – are you surprised that spammers, blackhats and feed affiliates can still profit from the SERPs? Your data is based on intelligent guesswork, but guesswork nonetheless. Their data is based on real opportunities. Guess who prioritises limited resources better?

Negative SEO – 4 Killer Strategies To Look For

Gutted I couldn’t make it to both the sessions at 10:30, I had to miss Tom Critchlow’s talk, Advanced Analytics for Affiliates. For those of you who know Tom, you know he really gets analytics; in fact, he recently posted for the Google Analytics team, no mean feat! I hope to catch up with him later if he is willing to share any tips.

Content Farms – The Who, What, Where and Why

The name “Content Farm” kind of describes it perfectly. What a strange concept, isn’t it? Or maybe not. Spammers and BlackHat SEOs have been auto generating low quality content for long tail search engine rankings for a while now. The content farm technique arguably takes this a few steps further by creating better quality (note – still questionable quality), user friendly content for the exact same reason.

Manipulating Google Suggest Results – An Alternative Theory

Google Suggest is a Reputation Management Nightmare at times. A number of companies have been hit and hurt by results that show up with “Company Name + Scam” for example. The problem with those results is that when users see the suggestion, they are immediately tempted to click on them, as opposed to their original query.

How To Get A Celebrity To Endorse ALL Your Products on Google

You see, I love finding interesting gaps in Google, in both Organic and Paid Search results. I haven’t often spoken about Paid Search manipulation, although it does exist, from arbitrage to brand manipulation. However, I always love it when I spot something new that could have been an innocent mistake, but could be used to pervert the normal run of PPC ads. And I spotted one today that made the mental cogs whirr.

10 Things You Should Have Learnt from the JC Penney SEO Fiasco

You are probably bored reading about the JC Penney Fiasco. I know I was for a little while. But I couldn’t ignore this opportunity – it isn’t often that you get to see a Brand SEO campaign that is nearly burnt to the ground because of dodgy link building practices. You want a case study? Well it’s all in there. Intrigue, suspicion, politics, dirty PR, etc etc.
