Referral Spammer Hitlist (.htaccess directives included)

Referral spam has been problematic for a long time; a quick search turned up an extensive list of referral spammers from 2009 on Perishable Press.

This thread from the Piwik repo on GitHub is more current (started in 2014) and mentions many of the sites currently contaminating my analytics reports.

My Referral Spammer Hitlist

Screenshot from Google Analytics Showing Referral Spam

The screenshot above is from Google Analytics (the last 6 months) for a site whose analytics data I can access but whose codebase and server I cannot (otherwise I’d have remedied the situation already). My significant other’s sister is the creator/founder/owner/president of Megan Lee Designs and, after learning of my vocation, granted me access to her analytics.

While 2.5% of traffic may not seem especially significant, it amounted to 16% (yep, 1 in 6) of their referral traffic! Of the 143 sources of referral traffic, 42 were actually this referral spammer garbage. In the screenshot I included every website that was reported as having sent more than 1 visitor; there are another 35 domains (well, subdomains) that list a single visit, and 34 of those are subdomains of semalt.com (grumble, grumble).

When I first really noticed the problem in my analytics reports (in mid-2014) I came up with a blacklist. I called it a hitlist because, as I told my colleague Charlie, ‘No workplace is complete without a hitlist’ ;-)

In the spirit of open-source, I give you my current list of directives:

Be forewarned: I’m no back-end developer or SysAdmin, so while I can piece together enough RegEx to get simple things done, it is entirely possible that the directives below could be written more efficiently (and the list could certainly be more comprehensive).

My logic when I wrote them was simple:

  • I don’t care about protocols (http:// or https://)
  • I don’t care about subdomains (they often use lots of them)
  • I don’t care about case (hence the [NC] No Case [sensitivity] flag)

If the domain listed as a referrer contains that series of characters (that string), I want the RewriteRule to kill the request before it hits my site (and my analytics), hence [F,L]: the ‘forbidden’ flag (which answers with a 403) and the ‘last’ flag on the rewrite rule.
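To make that logic concrete, here is a minimal sketch for a single domain. It is an illustration only, not my full list, and it assumes mod_rewrite is enabled on the server and that the file is a site-level .htaccess. The domain is one named in this post.

```apache
# Assumes mod_rewrite is available; illustrative single-domain sketch.
RewriteEngine On

# The pattern is unanchored, so it matches the string anywhere in the
# referrer: http:// or https://, with or without a subdomain.
# [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_REFERER} semalt\.com [NC]

# "-" means no substitution; [F] answers with 403 Forbidden,
# [L] stops processing further rewrite rules for this request.
RewriteRule .* - [F,L]
```

The dot is escaped (`\.`) so that it matches a literal period rather than any character.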

For anyone at the copy-and-paste skill level (I was myself until fairly recently), be mindful of the ‘OR’ part of those directives, [NC,OR]. An [OR] is needed on every RewriteCond except the last. Without it, consecutive conditions are combined with AND, so omitting it on an earlier condition means the rule will almost never match, and adding it to the last condition is likely to make the rule misbehave (possibly with a 500-series error on your server).
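A short sketch of where the [OR] goes, using just the two domains named in this post (an illustration under the assumption that mod_rewrite is enabled, not a complete blocklist):

```apache
# Assumes mod_rewrite is available; two-domain illustration only.
RewriteEngine On

# Every condition except the last carries [OR].
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]

# The last condition has no [OR].
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]

# Applies when either condition above matched.
RewriteRule .* - [F,L]
```

When adding a new domain, the safest habit is to insert it above the last condition and give the new line [NC,OR], so the final line never needs to change.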

A few months ago I addressed the most flagrant perpetrators—namely semalt.com—but suddenly I was getting visits from buttons-for-website.com.

Other Resources for Blocking Referral Spam

I recently tweeted about buttons-for-website.com and got a reply from @hbeckner who wrote a nice article (in German, but it translated well and the illustrations are in English so I had no problems) about it and the motivation behind it. He pointed me to this thread on the WordPress support forums.

In that thread someone shares the .htaccess directives they use to block these spammers:

That list is a bit more extensive than mine and I also find it interesting to see how people write the directives in different ways.
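As one illustration of a different style (this is not the list from that thread), several domains can be folded into a single condition with regex alternation. A sketch, assuming mod_rewrite is enabled and using only domains mentioned in this post:

```apache
# Assumes mod_rewrite is available; alternative single-condition style.
RewriteEngine On

# One condition covers both domains, so no [OR] flag is needed.
RewriteCond %{HTTP_REFERER} (semalt|buttons-for-website)\.com [NC]

# "-" means no substitution; [F] answers with 403 Forbidden.
RewriteRule .* - [F,L]
```

This trades readability for brevity: one long pattern is harder to scan than one domain per line, but there are no [OR] flags to get wrong.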

I’ll try to keep this post up-to-date for my own use and yours if so inclined.


Data-driven digital marketing on a shoestring budget

A DIY guide to making the most of Google services through data sharing

This post expands on a short presentation entitled ‘Making the most of Google Services through data sharing’ I will be giving at the Web Analytics Wednesday Meetup in Columbus, Ohio on February 19, 2014.

The primary reasons I use Google services are low cost and staggering reach.

Most of the tools mentioned are free, and the others can be set up, initially at least, for less than $100 (the current evergreen AdWords introductory offer is ‘Spend $25 and get an additional $75 credit’).

In the presentation I touch on using Google Tag Manager (GTM), Google Analytics (GA), Google Webmaster Tools (GWT) and AdWords, but there are other tools that I regularly use in conjunction with those.

Those include:

  • Google Trends
  • Google Consumer Surveys
  • Google+ Local (aka Google Places)
  • Google+ (‘profiles’ for people, ‘pages’ for businesses)

All of the tools above are free.


Analytics Checklist

These are listed in the order you will come across them when working top to bottom, left to right (not by importance).

Admin Settings

Google Analytics Admin Interface
  • Account
    • Data Sharing: Share with other Google products
    • AdWords Linking: Link to AdWords Account

  • Property
    • Link to Webmaster Tools: Yes
    • Enhanced Link Attribution: Enable
    • Session Settings: Change default Campaign Tracking, 6 months => 24 months
    • Organic Search Sources: Add duckduckgo.com

  • View
    • Timezone: (set)
    • Default Page: (set if applicable)
    • E-Commerce: Enable if applicable
    • (On)Site Search: Enable if applicable
    • Goals: IMPORTANT: Configure Macro (and possibly Micro) Goals
    • Content Grouping, Segments, Custom Alerts, Scheduled Emails, Shortcuts: As appropriate

Reporting

GA Custom Report Import Gallery

In the past, I have published compilations of links to custom report templates that I found useful so I would have them in a single location for easy reference.

Thankfully, that is no longer needed, as Google now offers a gallery of custom reports that can be imported directly:

  • Import From Gallery
    • Occam’s Razor Awesomeness: Cream of the crop from Avinash Kaushik
    • New Google Analytics User Starter Bundle: From the Google Analytics Team
    • Justin Cutroni’s…everything: View all by him and import whatever might be relevant

There’s a fair amount of overlap between these (and some, like those analyzing the number of keywords in a query, are antiquated now that the majority of search is encrypted), so I recommend importing them all and cherry-picking the ones that you like best.


References

Data Layer Specification Draft:

www.w3.org/2013/12/ceddl-201312.pdf

Track Keyword Ranking as an event

cutroni.com/blog/2013/01/14/a-new-method-to-track-keyword-ranking-using-google-analytics/

Rename the Global Object
Renaming the Global Tracker in GTM

developers.google.com/analytics/devguides/collection/analyticsjs/advanced

Prevent data loss with remarketing tag

www.blastam.com/blog/index.php/2013/04/google-analytics-remarketing-tag-concerns-solved/

CRM integration
Universal

groups.google.com/forum/#!topic/google-analytics-data-export-api/l7-b5FoyW5Q

Async

cutroni.com/blog/2009/03/18/updated-integrating-google-analytics-with-a-crm/