Blocking Semalt from Behind a Shared Cache

I’ve written before about blocking semalt.com and other referrer spam. In most ordinary cases, the couple of lines I suggested adding to the .htaccess file in the previous post will do the trick. However, I’ve come across a situation where blocking sites like this using .htaccess doesn’t work. Specifically, it doesn’t work when you’re running certain kinds of caching.

Why? Because when you’re running behind a caching layer, pages are served from the cache instead of by the Apache webserver. Unless semalt happens to be the first thing to hit your index page after a cache purge, the cache just serves the page it has in memory. The .htaccess file is never consulted about any of this. But when the page loads, even from memory, semalt gets tagged as the referrer when your WordPress (or whatever) script loads. Tilt!

This isn’t really a problem if you’re running on your own VPS or dedicated server and have access to your /etc/varnish/default.vcl file. You just move the referrer check from .htaccess out to the Varnish config. There’s a pretty easy how-to for getting this done on Omninoggin.

But, if you happen to be in a shared hosting environment and don’t have access to the Varnish config, the tactic of last resort is to block the offending sites within WordPress itself. It’s less than ideal, because you’re actually having to process the spam requests within the PHP environment, and that takes more server resources than blocking it farther up the line. (But, if you’re in a shared hosting environment, you’ve already made the decision to trade off some of the ideal for cheaper hosting. Shared hosting with any site caching is about as primo as shared hosting gets, so you can’t complain too much.)

To minimize the resources used by these unwanted page requests, the trick is to recognize and deal with them as early as possible in the WordPress page-load cycle. With that in mind, I’ve written a quick and dirty plug-in, based on a couple lines of code provided to me by Ivan Yordanov, one of the tech support gurus at my awesome host, SiteGround.

It’s a “Must Use” plugin. That’s different from the usual WordPress plug-ins that you download from the plug-in repository on wordpress.org. You have to install it manually. I’ve coded it this way because MU plugins load before any other plugins or theme files, and can be hooked earlier in the page-load cycle, before any other output happens and before WordPress starts building the page.

This is offered on a “use at your own risk/your mileage may vary” basis. To use it, copy it and save it in a file called referrer-blocker.php (or pick your own name, so long as its extension is .php). Save the file in your wp-content/mu-plugins/ directory. (You may have to make that directory, since it’s not part of the usual WordPress install.)

<?php

/**
 *
 * Referrer Blocker mu-plugin
 *
 * @author Caspar Green <http://iCasparWebDevelopment.com>
 * License: GNU 2.0 or later
 *          Creative Commons 4.0 Attribution-ShareAlike
 * Use at your own Risk. Your mileage may vary.
 *
 * This plug-in is for use in special cases where WordPress is operating
 * behind a cache (such as Varnish or Google Pagespeed) in a shared hosting
 * environment, where pages are served from the cache rather than from Apache
 * and where the shared hosting environment restricts access to the cache configuration.
 *
 * This plug-in is meant for use as a mu-plugin (see http://codex.wordpress.org/Must_Use_Plugins).
 * It must be manually saved to the wp-content/mu-plugins/ folder. (Most users will need to
 * create this folder, since it isn't installed with a default WP install.) Once installed,
 * it cannot be deactivated or uninstalled from WordPress admin.
 *
 * Hat-tip to Ivan Yordanov @SiteGround Hosting (http://siteground.com) for pointing out this work-around.
 * The idea for how to use php to block referrers behind the cache is his;
 * I just turned it into a plugin.
 *
 */

add_action( 'muplugins_loaded', 'iC_referer_blocker' , 1 ); // We want this to run RIGHT NOW!

function iC_referer_blocker() {

    // Add the domain names of referrers you want to block here.
    // Remember, the longer the list the more time it takes to check!
    $bad_referrers = array(
        'semalt.com',
        'buttons-for-website.com'
        );

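    // $_SERVER is the conventional place to read request headers in PHP.
    $ref = isset( $_SERVER['HTTP_REFERER'] ) ? $_SERVER['HTTP_REFERER'] : '';

    if ( '' === $ref ) {
        return; // No referrer was sent, so there is nothing to check.
    }

    foreach ( $bad_referrers as $referer ) {
        if ( strpos( $ref, $referer ) !== false ) {
            // The referrer is already a full URL, so don't prepend 'http://'.
            header( 'Location: ' . $ref ); // Send them back where they came from.
            exit; // Stop here so WordPress doesn't go on to build the page.
        }
    }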
} // end iC_referer_blocker()
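
One thing to be aware of: the strpos() check in the plugin matches a blocked name anywhere in the referrer string, so a visitor arriving from a URL like http://example.com/?q=semalt.com would be bounced too. What follows is a minimal sketch of a stricter check that compares only the host part of the referrer. The helper name ic_is_blocked_referrer() is hypothetical, made up for this illustration and not part of the plugin above, and it assumes the referrer arrives as a full URL with a scheme.

```php
<?php
/**
 * Hypothetical helper (not part of the original plugin): returns true when
 * the referrer's host is one of the blocked domains or a subdomain of one.
 */
function ic_is_blocked_referrer( $referrer, array $blocked_domains ) {
    if ( ! is_string( $referrer ) || '' === $referrer ) {
        return false; // No referrer was sent, so nothing to block.
    }

    // Pull out just the host, ignoring the path and query string.
    $host = parse_url( $referrer, PHP_URL_HOST );
    if ( ! is_string( $host ) || '' === $host ) {
        return false; // Malformed or scheme-less referrer: let it through.
    }
    $host = strtolower( $host );

    foreach ( $blocked_domains as $domain ) {
        $suffix = '.' . strtolower( $domain );
        // Match the bare domain, or any host that ends in ".domain".
        if ( $host === strtolower( $domain ) || substr( $host, -strlen( $suffix ) ) === $suffix ) {
            return true;
        }
    }

    return false;
}

$blocked = array( 'semalt.com', 'buttons-for-website.com' );

var_dump( ic_is_blocked_referrer( 'http://semalt.semalt.com/crawler.php', $blocked ) ); // bool(true)
var_dump( ic_is_blocked_referrer( 'http://example.com/?q=semalt.com', $blocked ) );     // bool(false)
```

Inside the plugin, a check like this could stand in for the foreach loop, at the cost of a parse_url() call on every request that carries a referrer. Whether that trade is worth it depends on how likely false matches are on your site.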