Referral Spammer Hitlist (.htaccess directives included)

Referral spam has been problematic for a long time; a quick search turned up an extensive list of referral spammers from 2009 on Perishable Press.

This thread from the Piwik repo on GitHub is more current (started in 2014) and mentions many of the sites currently contaminating my analytics reports.

My Referral Spammer Hitlist

Screenshot from Google Analytics Showing Referral Spam

The screenshot above is from Google Analytics (the last 6 months) for a site whose analytics data I can access but whose codebase and server I cannot (otherwise I’d have remedied the situation already). My significant other’s sister is the creator/founder/owner/president of Megan Lee Designs and, after learning of my vocation, granted me access to her analytics.

While 2.5% of traffic may not seem especially significant, it amounted to 16% (yep, roughly 1 in 6) of their referral traffic! 42 of 143 sources of referral traffic were actually this referral spammer garbage. In the screenshot I included every website that was reported as having sent more than 1 visitor; there are another 35 domains (well, subdomains) that list a single visit, and 34 of those are subdomains of semalt.com (grumble, grumble).

When I first really noticed the problem in my analytics reports (in mid-2014) I came up with a blacklist. I called it a hitlist because, as I told my colleague Charlie, “No workplace is complete without a hitlist” ;-)

In the spirit of open-source, I give you my current list of directives:

Be forewarned: I’m no back-end developer or SysAdmin, so while I can piece together enough RegEx to get simple things done, it is entirely possible that the directives from other people quoted further down are more efficient than mine (and they are certainly more comprehensive).

My logic when I wrote them was simple:

  • I don’t care about protocols (http:// or https://)
  • I don’t care about subdomains (they often use lots of them)
  • I don’t care about case (hence the [NC] “no case” flag, which makes the match case-insensitive)

If the domain listed as a referrer contains that series of characters (that string), I want the RewriteRule to kill the request before it hits my site (and my analytics), hence [F,L]: the ‘forbidden’ flag, which returns a 403, and the ‘last’ flag, which stops further rewrite processing.
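A minimal sketch of that logic, assuming mod_rewrite is enabled and using semalt.com purely as an illustrative entry (this is not the full list), might look like this:

```apache
# Illustrative sketch only; assumes mod_rewrite is available on the server.
RewriteEngine On

# The pattern is not anchored, so it matches whatever protocol or
# subdomain precedes the domain. [NC] makes the match case-insensitive.
RewriteCond %{HTTP_REFERER} semalt\.com [NC]

# "-" means "do not rewrite the URL". [F] answers with 403 Forbidden.
# [F] already implies [L], so the L here is redundant but harmless.
RewriteRule ^ - [F,L]
```

Escaping the dot (`\.`) keeps it from matching any character, which makes the pattern slightly stricter than the bare string.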

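For anyone who might be of the copy and paste skill level (I myself was until fairly recently), be mindful of the ‘OR’ part of those directives, [NC,OR]: an [OR] is needed on all but the last RewriteCond. Conditions without [OR] are ANDed together, so omitting it on an earlier condition means the rule will almost never match, while adding it to the last condition can make the rule fire on every request. A typo in the flags themselves will likely cause a 500-series error on your server.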
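To make the [OR] placement concrete, here is a hedged two-condition sketch built from the two domains mentioned in this post; treat it as an illustration of the chaining pattern rather than a complete blocklist:

```apache
# Illustrative sketch only; assumes mod_rewrite is available on the server.
RewriteEngine On

# Every condition except the last carries [OR], so a match on ANY of
# them triggers the rule below.
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]

# The last condition has no [OR]; it closes the chain.
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]

# Refuse the request with 403 Forbidden.
RewriteRule ^ - [F,L]
```

When adding a new domain, insert it above the last condition with [NC,OR] so the final line stays the only one without [OR].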

A few months ago I addressed the most flagrant perpetrators, namely semalt.com, but suddenly I was getting visits from buttons-for-website.com.

Other Resources for Blocking Referral Spam

I recently tweeted about buttons-for-website.com and got a reply from @hbeckner, who wrote a nice article about it and the motivation behind it (in German, but it translated well and the illustrations are in English, so I had no problems). He pointed me to this thread on the WordPress support forums.

In that thread someone shares the following .htaccess directives for blocking these spammers:

That list is a bit more extensive than mine and I also find it interesting to see how people write the directives in different ways.

I’ll try to keep this post up-to-date for my own use and yours if so inclined.
