I have spent considerable time over the past week going through the backlog of unapproved comments on my blog (which, yes, had one post until now). There were upwards of 500 comments, every single one of them spam. Mostly drug-related posts, with the occasional fake designer shoes or handbags thrown in. Last time it was a spammer trying to post large chunks of misquoted Ender’s Saga with a lone spam link in the middle, which at least proved a little interesting.
- Disallow anonymous posting.
- Use CAPTCHAs and other methods to prevent automated comment spamming.
- Turn on comment moderation.
- Use the “nofollow” attribute for links in the comment field.
- Disallow hyperlinks in comments.
- Block comment pages using robots.txt or meta tags.
Many of those will work, but let’s say we want a functioning website without spam and a large administrative overhead. That leaves:
- Use CAPTCHAs and other methods to prevent automated comment spamming.
Cool, thanks Google!
The article is four years old, so what are some actual options?
A CAPTCHA is a challenge (typically an image of distorted text) that a human must solve. The most popular of these is Google’s reCAPTCHA, which not only acts as a screening tool but in the process helps to digitise books. Today, for the first time, I also saw what looked like a Google Street View image, so they may be branching out.
These aren’t without their flaws. Most implementations are being cracked by spam bots now. They are also not very accessible to certain groups of users. Even for users without impairments, unreadable images are a semi-regular occurrence. But mostly I prefer to avoid them because they aren’t a good user experience. There has to be a better way than demanding everyone prove their humanity.
Blacklists / Whitelists
One of the oldest ways of dealing with spammers is to start blacklisting them, most commonly by IP address or by keyword. Certainly it seems that blacklisting any “enhancement” drugs will stop a significant portion of spam. But that quickly turns into a game of whack-a-mole and puts the burden on the site moderators to keep the lists up to date.
Whitelisting, on the other hand, is a lot safer. However, it means plenty of work approving legitimate posts instead. Either method will ultimately end up being time consuming to manage and prone to failure. So why not automate the screening process?
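To give a feel for what blacklisting by IP and keyword involves, here is a minimal sketch in Python. The names `BLOCKED_IPS`, `BLOCKED_KEYWORDS` and `is_blacklisted` are made up for illustration, and the list entries are placeholders (the IP addresses come from the reserved documentation ranges). It is not how any particular blogging platform does it.

```python
import re

# Hypothetical blacklists; a real site would load these from a database
# or config file and keep adding to them as new spam turns up.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_KEYWORDS = {"viagra", "cialis", "replica handbags"}


def is_blacklisted(ip: str, body: str) -> bool:
    """Return True if the comment comes from a blocked IP or contains
    a blocked keyword as a whole word (case-insensitive)."""
    if ip in BLOCKED_IPS:
        return True
    text = body.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text)
        for keyword in BLOCKED_KEYWORDS
    )


if __name__ == "__main__":
    print(is_blacklisted("198.51.100.1", "Cheap VIAGRA here"))  # blocked keyword
    print(is_blacklisted("198.51.100.1", "Cheap v1agra here"))  # slips through
```

The second call shows the whack-a-mole problem: one swapped character gets past the filter, and someone has to notice and add the new spelling by hand.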
A number of services are available now that filter comments on behalf of site owners. For WordPress, Akismet is the weapon of choice, and even comes installed by default on new WordPress installs. The creator of Drupal has created Mollom for the same purpose, with plugins for Drupal (of course) as well as many other systems such as WordPress and SilverStripe. The downside is that outsourcing the filtering means less visibility of false positives.
Alternatively, you can use CloudFlare CDN. Not only will it speed up your site, but the security system it uses also attempts to block comment spammers from your site. This has the advantage of being implemented independently from your website, so there is no need to install modules or connect to an API.
This is my preferred choice at the moment, although my decision to use the lowest scoring threshold to block spam wasn’t the best in hindsight. Hopefully I can spend less time now clicking Delete and more doing useful things. Like writing blog posts?
- WordPress Codex: Combating Spam
- Hacker News posts on comment spammers – includes a post by a former automated screening company employee
Spam picture by Zell Faze