http://www.jms1.net/httpbl.shtml

http:BL Sample Code

I hate spammers. And I hate people who help the spammers, whether it's providing hosting for their content or emails, providing or shipping their products, or Windows users who don't secure their machines and end up providing web or DNS hosting, or actually sending the spam, without even realizing it. Worst of all, I hate the IDIOTS who actually BUY things from the spammers, because they're the ones who keep the spammers in business- if nobody were buying, it wouldn't be worth their time to send spam.

One of the tricks used by spammers is to "harvest" email addresses off of web pages. For example, my spamcop address is at the bottom of this page, and as a result, that address gets anywhere from 80 to 200 messages per day- and out of that total, on a good day, FIVE messages might be legitimate. The rest of the messages are all spam.

I also see a lot of abuse on web sites with "guest book" or other forms which email data to other places. Spammers will submit bogus data, either to try and trick the form into sending their spam for them, or just to put the spam message into the form itself, hoping the human who ends up reading the message will see their stuff and maybe be interested.

In many cases, spammers will run both "harvesters" and form-spammers, but not from the same IP addresses. For example, they may use a zombie Windows machine on a cable modem in California to run a harvester, try any forms it finds from a machine in China, and control the whole thing from an apartment in Romania. And while it's easy to catch the IPs which are sending the spam, it's not so easy to catch the IPs which are harvesting- because they're just looking at your web pages like anybody else.

Project Honeypot

There is an anti-spam project out there called Project Honeypot. If you own any domains, or have a web site which can host a script, I strongly encourage you to read their web site and, if possible, contribute by hosting a honeypot page and/or donating one or more MX records (I'm hosting a honeypot page and have donated multiple MX records, since I own multiple domains.)

Project Honeypot consists of a network of web pages which contain one or more made-up email addresses, in the hope that the harvesters will find them and try to send spam to them. Of course, the email addresses themselves are not, and have never been, valid. In fact they did not exist until the honeypot generated them, so anything which is ever sent to them must be spam.

The way it works is this- the owner of a web site, like myself, installs a script on the site. When a harvester triggers the script (by visiting its URL) the script sends the harvester's IP address to Project Honeypot's server, which then generates a unique random email address, stores it in a database with the harvester's IP, and returns that email address and the URL for another honeypot to the server. The script then shows the harvester a page which contains a legal notice barring them from harvesting email addresses, along with links to the email address and URL returned from Project Honeypot's server. If the harvester follows links, it will then be directed to another honeypot script on some other server, get another bogus email address for their database, and be directed to yet another honeypot script (and so on, and so on, until the harvester decides to stop.)

The email addresses all use domains which are not otherwise used for any legitimate email, and which are handled by email servers controlled by Project Honeypot. When one of these email addresses receives spam, the IP of the original harvester is looked up in their database and added to a public list of known "harvester" IPs.

This list is known as the HTTP Blacklist, or "http:BL".


Using http:BL

There are several ways to make use of the http:BL. The most common is probably the Apache mod_httpbl module, which can be configured to block listed IPs from your web sites. There are also plug-ins for several commonly used content management systems, such as Drupal, WordPress, Joomla, and phpBB. These modules are listed on this page.

However, you may wish to have finer-grained control over what to do with harvesters. You may wish to show them a different version of the page, for example- one with different text (i.e. "Dear spammer, go away kthxdie!") or whose form submits to a non-functional URL so you never get bothered with their junk.

Below is an explanation of how to write this kind of check into your own scripts. The sample code is written in Perl, because that's the language I normally use for things like this, however I will try and explain how it works clearly enough that you can write the equivalent code in any language you like.

Looking up an IP on the list

Looking up IPs on the http:BL is similar to looking up IPs on other blacklists- it involves reversing the IP, adding a DNS suffix, and checking whether or not the resulting name exists.

Using the http:BL requires you to register with Project Honeypot and request an Access Key. This allows them to track how many requests are coming in from each user, as well as how many different users are seeing traffic from each harvester. The keys are 12 characters in length. On this page I will use "keykeykeykey" as a sample key (rather than sharing my key with the entire world.)

For example, a script which I wrote for a client was accessed early this morning from 201.229.208.2, which IS listed in the http:BL. Using this as an example, you can check whether a given IP is on the list by doing a DNS query like this:

$ nslookup keykeykeykey.2.208.229.201.dnsbl.httpbl.org
Server:         192.168.1.30
Address:        192.168.1.30#53

Non-authoritative answer:
Name:    keykeykeykey.2.208.229.201.dnsbl.httpbl.org
Address: 127.1.55.7
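
The name used in the query above can be built with a few lines of Perl. This is only a minimal sketch: the function name httpbl_query_name() and its three-argument form are my own choices for illustration, and it assumes the input is already a valid dotted-quad IPv4 address.

```perl
use strict ;
use warnings ;

# Build the DNS name to query for a given IPv4 address.
# The key and zone are passed in, so the sketch needs no globals.
# (httpbl_query_name is a hypothetical helper name.)
sub httpbl_query_name
{
        my ( $key , $ip , $zone ) = @_ ;

        # "201.229.208.2" becomes "2.208.229.201"
        my $rev_ip = join ( "." , reverse split ( /\./ , $ip ) ) ;

        return "$key.$rev_ip.$zone" ;
}

print httpbl_query_name ( "keykeykeykey" , "201.229.208.2" ,
        "dnsbl.httpbl.org" ) , "\n" ;
```

Run as-is, this prints the same name that was handed to nslookup above.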

Interpreting the results

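As you can see, we reversed the octets (eight-bit parts) of the IP address, added our key and a "." to the beginning, and ".dnsbl.httpbl.org" to the end. It did return an answer, which means that it IS listed. The answer itself, 127.1.55.7, tells us several things:

The first octet (127) is the same for every valid answer.
The second octet (1) is the number of days since the IP was last seen.
The third octet (55) is a threat score.
The fourth octet (7) is the type of visitor. A type of 0 means a search engine.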

The full list of the possible values can be found in Project Honeypot's http:BL API documentation.
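
To make the layout of the answer concrete, here is a rough sketch which splits an answer like 127.1.55.7 into its parts. The function name httpbl_decode() is made up for this page. The meaning of the type bits (1 = suspicious, 2 = harvester, 4 = comment spammer, and 0 = search engine) is my reading of the API documentation, so check it against the current documentation before relying on it.

```perl
use strict ;
use warnings ;

# Visitor-type bits, as I understand them from the http:BL API
# documentation (an assumption, verify before relying on it).
my @httpbl_type_bits = (
        [ 1 , "suspicious" ] ,
        [ 2 , "harvester" ] ,
        [ 4 , "comment spammer" ] ,
) ;

# Split an answer such as "127.1.55.7" into its fields.
# Returns undef if the answer is not of the form 127.x.y.z.
# (httpbl_decode is a hypothetical helper name.)
sub httpbl_decode
{
        my ( $answer ) = @_ ;

        return unless ( defined $answer
                && $answer =~ /\A127\.(\d+)\.(\d+)\.(\d+)\z/ ) ;

        # adding 0 forces numeric values, so "&" below is a numeric AND
        my ( $days , $threat , $type ) = ( $1 + 0 , $2 + 0 , $3 + 0 ) ;

        my @labels = map { $_->[1] }
                grep { $type & $_->[0] } @httpbl_type_bits ;
        @labels = ( "search engine" ) unless ( @labels ) ;

        return { days => $days , threat => $threat ,
                type => $type , labels => \@labels } ;
}

my $r = httpbl_decode ( "127.1.55.7" ) ;
print "days=$r->{days} threat=$r->{threat} type=$r->{type} (" ,
        join ( ", " , @{ $r->{labels} } ) , ")\n" ;
```

For the sample answer, all three type bits are set, which is why the type octet is 7.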

An IP which is not listed will have an NXDOMAIN result from the DNS check (i.e. "name does not exist".) For example, if we look up 208.111.3.163, my web server's IP...

$ nslookup keykeykeykey.163.3.111.208.dnsbl.httpbl.org
Server:         192.168.1.30
Address:        192.168.1.30#53

** server can't find keykeykeykey.163.3.111.208.dnsbl.httpbl.org: NXDOMAIN


Writing code

Now that we see the basic method for checking an IP's status, the next step is to turn that into code. Again, I will be using Perl, however it should be fairly simple for any competent programmer to write the equivalent code in any other language they are familiar with.

The core of the process is a DNS lookup. Perl supports the same gethostbyname() function available in C, and it returns the same binary structure that the C function returns- which means the IP addresses are returned as four bytes, rather than a "xx.xx.xx.xx" string which is easier to use. The Socket module contains the function inet_ntoa(), which converts the binary format to a usable string. Our script will need to include this module. These "use" statements are normally done at the beginning of the script.

use Socket ;
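
To show what that binary structure looks like in practice, here is a small sketch. In list context, gethostbyname() returns the name, the aliases, the address type, the address length, and then one packed address per result, so the addresses start at index 4. The helper name unpack_addrs() is made up for this example, and the "result" is simulated with inet_aton() so the sketch runs without a network connection.

```perl
use strict ;
use warnings ;
use Socket ;

# Given the list returned by gethostbyname(), which is
#   ( $name , $aliases , $addrtype , $length , @addrs )
# return the addresses as "xx.xx.xx.xx" strings.
# (unpack_addrs is a hypothetical helper name.)
sub unpack_addrs
{
        my @a = @_ ;

        # fewer than five items means there were no addresses
        return () unless ( $#a > 3 ) ;

        return map { inet_ntoa ( $_ ) } @a[ 4 .. $#a ] ;
}

# Simulated result, standing in for a real lookup.
my @fake = ( "example.invalid" , "" , AF_INET , 4 ,
        inet_aton ( "127.1.55.7" ) ) ;

print join ( " " , unpack_addrs ( @fake ) ) , "\n" ;

# With a real lookup it would be:
#   my @addrs = unpack_addrs ( gethostbyname ( $name ) ) ;
```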

I normally take any items which might need to be configured by the user, or which might change in the future, and make them global variables at the beginning of the script. In this case, that means our http:BL key (which anybody using this code will need to customize) and the DNS zone name within which we will be searching (which probably won't change, but you never know what the future holds.)

my $httpbl_key = "keykeykeykey" ;
my $httpbl_zone = "dnsbl.httpbl.org" ;

With these pieces in place, the function to check whether a given IP address is on the list looks like this. It returns 1 if the IP is listed (and not a search engine) or 0 if not.

sub httpbl_check($)
{
        my $ip = ( shift || return 0 ) ;

        ########################################
        # build the name

        my $rev_ip = join ( "." , reverse split ( /\./ , $ip ) ) ;
        my $name = "$httpbl_key.$rev_ip.$httpbl_zone" ;

        ########################################
        # query the name

        my @a = gethostbyname ( $name ) ;
        unless ( $#a > 3 )
        {
                print STDERR "httpbl allow $ip (empty result)\n" ;
                return 0 ;
        }

        @a = map { inet_ntoa($_) } @a[ 4 .. $#a ] ;

        ########################################
        # split into fields

        my ( undef , $days , $threat , $type ) = split ( /\./ , $a[0] ) ;

        ########################################
        # search engines (type=0) are okay

        unless ( $type & 7 )
        {
                print STDERR "httpbl allow $ip -> $a[0] days=$days"
                        . " threat=$threat type=$type\n" ;
                return 0 ;
        }

        ########################################
        # others, not so much.

        print STDERR "httpbl deny $ip -> $a[0] days=$days"
                . " threat=$threat type=$type\n" ;
        return 1 ;
}
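
The function reverses whatever it is given, so it quietly assumes a dotted-quad IPv4 address. On a server which also accepts IPv6 connections, REMOTE_ADDR may hold something else entirely. A small guard like the one below could be called before httpbl_check(). This is a sketch and not part of the original code, and the name is_ipv4() is made up for this example.

```perl
use strict ;
use warnings ;

# Return 1 if $ip looks like a dotted-quad IPv4 address, 0 otherwise.
# (is_ipv4 is a hypothetical helper name.)
sub is_ipv4
{
        my ( $ip ) = @_ ;
        return 0 unless ( defined $ip ) ;

        # four groups of one to three digits, and nothing else
        my @octets = ( $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/ )
                or return 0 ;

        # each octet must fit in eight bits
        return ( grep { $_ > 255 } @octets ) ? 0 : 1 ;
}

for my $candidate ( "201.229.208.2" , "::1" , "999.1.1.1" )
{
        print "$candidate -> " , is_ipv4 ( $candidate ) , "\n" ;
}
```

Only the first of the three sample values passes. An address which fails the guard can simply be treated as "not listed", and left to the other checks in the form handler.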


Using the function

Obviously, how you choose to use the httpbl_check() function within your own code is up to you. Usually the procedure looks something like this:

########################################
# get the client's IP

my $ip = ( $ENV{"REMOTE_ADDR"} || "" ) ;

########################################
# if client's IP is listed on http:BL, don't send the message

if ( $ip )
{
        if ( httpbl_check ( $ip ) )
        {
                print <<EOF ;
Content-type: text/plain

You are not allowed to use this form, because your IP address is known to be
involved in one or more spamming operations. Do the world a favour and get a
real job, spammer.
EOF

                exit 0 ;
        }
}

########################################
# the IP was not on the list
# do further checks and then send the message


Security Notes

Of course, the form handler script should not rely on this function as its only check. It should also check the form data to make sure this isn't a spammer using a previously unknown IP to try and send spam. The safest approach is to assume that the data you receive in the form is always being supplied by a spammer, hacker, or other attacker, and that you can't trust any of it until you have verified that it doesn't contain anything harmful. For example...

When I'm writing code which involves the Internet, I always try to be as paranoid as I can about the people who will be using the application- especially if it's a web site. I constantly look for ways that the program can be broken, not only from a security standpoint, but also in terms of just bugs in general. I always keep two rules in mind: