Spam: Kill it at the root

Matt and I were chatting briefly earlier today about blog comment spam, and about an article that Mark Pilgrim posted on his site. Mark is pretty pessimistic about the outlook for anti-comment-spam efforts. And he points out the great lengths to which spammers will go in order to get their spams out to where eyeballs will see them. Both usenet and email are awash with the refuse of spammers, because for some reason, there are people out there who respond to their messages.

However, I, like Sam Ruby, believe that the outlook isn’t quite as grim for blogs as it has been for usenet and email. Blogs are fundamentally different in how they work and in how people use them.

It was easy for spammers to abuse usenet, because it is a store-and-forward system designed specifically for distributing messages to a large number of leaf nodes. You post a message into a newsgroup, and your message is replicated around the world, on thousands of other usenet servers, viewed by millions of users. Clients and servers speak a simple protocol, NNTP, and though most servers these days require authentication before you can post a message, there are still open servers to be found, if you know how to look.

Likewise, email is a store-and-forward system, with a method of relaying messages from one server to another. Unlike usenet, email is designed to deliver messages directly to a particular person, which makes it slightly less efficient for reaching large numbers of users. But only slightly, because it’s quite easy to specify multiple recipients for a message. So the spammer can deliver a single message to a server, and instruct the server to deliver it to hundreds of different people. And again, many modern servers attempt to prevent unauthorized users from relaying messages, but there are still plenty of vulnerable servers on the internet that the spammers can find.

The relative ease and extreme low cost of reaching large numbers of consumers have made usenet and email into juicy targets for spammers. Low-hanging fruit, if you will. Spammers have also targeted other services, such as instant messaging services, pagers, cell phones, and more recently, blogs.

Obviously, I have a great interest in the problem of spam on blogs. I run multiple personal blogs, and I’m a developer for the WordPress blogging software. Spammers have hit my blogs on multiple occasions recently, and I delete the spam comments as soon as I become aware of them. But automating the process to some degree would certainly allow me to relax more. Fortunately, blogs are fundamentally different from usenet and email in several ways, and I think that these differences may allow us to stop this spam problem before it really gets bad. There’s still a chance for us to kill this weed at the root, before it has a chance to grow.

Why do spammers post comments on blogs? Do they really expect blog owners to leave their comments up for people to read? Do they expect people to follow their links to the spammers’ sites selling viagra and porn? Not really. What they are really after is search engine ranking: the more links to their sites that a search engine sees, the better the chance that their sites will come up in a list of search results. Deleting the comment won’t help if a search engine has already indexed the links in it. So one of our anti-spam tactics should be to bust up the search engine ranking for spammers’ links.
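One way to take away that ranking benefit is to mark every link in a visitor’s comment so that search engines don’t count it. The sketch below uses the `rel="nofollow"` link attribute, a convention that major search engines honor; it is a swapped-in technique, not something this post specifies. The function name `add_nofollow` is hypothetical, and the regex approach is a simplification (it assumes no `>` characters inside attribute values); a real blog engine would do this in its HTML sanitizer.

```python
import re

# Matches the opening tag of an anchor, e.g. <a href="...">, capturing its attributes.
# The \b keeps it from matching tags such as <abbr>; </a> never matches because of the "/".
_ANCHOR_OPEN = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
_HAS_REL = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Add rel="nofollow" to every <a> tag in a comment that has no rel attribute.

    Tags that already carry a rel attribute are left unchanged in this sketch.
    """

    def _rewrite(match: re.Match) -> str:
        attrs = match.group(1)
        if _HAS_REL.search(attrs):
            return match.group(0)
        return f'<a{attrs} rel="nofollow">'

    return _ANCHOR_OPEN.sub(_rewrite, comment_html)


if __name__ == "__main__":
    print(add_nofollow('Nice post! <a href="http://spam.example/">cheap pills</a>'))
```

Running it on a typical spam comment prints `Nice post! <a href="http://spam.example/" rel="nofollow">cheap pills</a>`, so the link still works for readers but no longer passes ranking credit, even if a crawler indexes it before the comment is deleted.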

And once a spammer has given us the link to his page and we see it for what it is, we can store that information in a blacklist, and use it to block future comments that try to link to that site. And if we can build a common, distributed, authenticated way to share such blacklist info, we can be pretty effective at staunching the flow of the spam. And if we can team up now, while this problem is still young, we can basically create an environment where comment spam just isn’t cost-effective for the spammers.
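To make the blacklist idea concrete, here is a minimal sketch in Python. It assumes the blacklist is a set of domain names, blocks a comment if any linked host is a blacklisted domain or a subdomain of one, and authenticates a shared list with an HMAC over a shared secret key. All function names, the list format, and the key are hypothetical illustrations; the post does not define a format or protocol.

```python
import hashlib
import hmac
import re
from urllib.parse import urlparse

# Rough URL matcher for links in comment text (a simplification, not a full URL grammar).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def linked_hosts(comment_text: str) -> set:
    """Return the lowercase host names of all http(s) links in a comment."""
    hosts = set()
    for url in URL_PATTERN.findall(comment_text):
        host = urlparse(url).hostname  # urlparse lowercases the hostname
        if host:
            hosts.add(host)
    return hosts


def is_blacklisted(comment_text: str, blacklist: set) -> bool:
    """True if any linked host is a blacklisted domain or a subdomain of one."""
    for host in linked_hosts(comment_text):
        parts = host.split(".")
        # Check "www.spam.example", then "spam.example", but not the bare TLD.
        for i in range(len(parts) - 1):
            if ".".join(parts[i:]) in blacklist:
                return True
    return False


def sign_blacklist(entries: set, key: bytes):
    """Serialize the list (one domain per line) and compute an HMAC-SHA256 tag."""
    payload = "\n".join(sorted(entries)).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def load_signed_blacklist(payload: bytes, tag: str, key: bytes) -> set:
    """Verify the tag before trusting a shared list; raise ValueError on mismatch."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("blacklist signature does not match")
    return {line for line in payload.decode("utf-8").splitlines() if line}
```

For example, with `{"spam.example"}` as the blacklist, a comment linking to `http://www.spam.example/pills` is blocked, while one linking to `http://example.org/` is not. An HMAC only works among sites that share a secret key; a list distributed widely to strangers would more plausibly be signed with a public-key scheme, so that anyone can verify it but only the publisher can produce it.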