The Classifier

Apr 25, 2008 in Uncategorized

If you wish to take part in the classification process, you will need both permission and the ability to redirect your email domain to the classifier’s MX servers. You will also need your own account in the classifier; to get one, email Matthew Sullivan.

Once you have an active account, you can add the email addresses you will use for classification. Any email sent to those addresses is stored in the web interface and held in the HOLD queue of the mail server. In the web interface you can classify each message as either spam or not spam (ham). As soon as you classify a message, the system releases the copy held on the mail server, which results in final delivery to your server and your normal email client.
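The hold-and-release flow described above can be sketched in Python as follows. This is a minimal illustration, not the classifier’s actual code: the names `HoldQueue`, `receive` and `classify` are hypothetical, and the queue is an in-memory dictionary rather than the mail server’s real HOLD queue. As the text describes, a message is released for delivery once it is classified, whichever way it is marked.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Verdict(Enum):
    UNCLASSIFIED = "unclassified"
    SPAM = "spam"
    HAM = "ham"


@dataclass
class HeldMessage:
    message_id: str
    recipient: str
    verdict: Verdict = Verdict.UNCLASSIFIED
    released: bool = False


@dataclass
class HoldQueue:
    """Hypothetical in-memory model of the hold-and-release flow."""
    messages: Dict[str, HeldMessage] = field(default_factory=dict)

    def receive(self, message_id: str, recipient: str) -> HeldMessage:
        # Mail to a registered address is held, not rejected or delivered.
        msg = HeldMessage(message_id, recipient)
        self.messages[message_id] = msg
        return msg

    def classify(self, message_id: str, is_spam: bool) -> HeldMessage:
        msg = self.messages[message_id]
        msg.verdict = Verdict.SPAM if is_spam else Verdict.HAM
        # Classifying (either way) releases the held copy for final delivery.
        msg.released = True
        return msg
```

A message stays held until someone classifies it; the verdict is recorded for the statistics, and only then is the message released.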

The classifier’s preference system grants access at different levels depending on your role. The most basic role, ‘User’, only allows you to classify mail and update your own details. The ‘Domain Admin’ role allows you to update the email addresses in use and to assign email addresses to others.
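One way to model the two roles is a minimum-role lookup per action. This sketch assumes the roles are cumulative (a ‘Domain Admin’ can do everything a ‘User’ can), which the text implies but does not state outright; the action names and `is_allowed` function are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    # Higher value means more rights; assumes roles are cumulative.
    USER = 1
    DOMAIN_ADMIN = 2


# Hypothetical action names mapped to the minimum role described in the text.
REQUIRED_ROLE = {
    "classify_mail": Role.USER,
    "update_own_details": Role.USER,
    "update_addresses": Role.DOMAIN_ADMIN,
    "assign_addresses": Role.DOMAIN_ADMIN,
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if `role` meets the minimum role required for `action`."""
    return role >= REQUIRED_ROLE[action]
```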


About Matthew…

Oct 21, 2007 in Uncategorized

I’m 38 years old and have been working in the IT industry since 1991, when I started out building IBM XT compatibles. In 1994/5 I joined forces with a local shop owner in my home town and became known for my ability to fix personal computers right down to the individual component level.

In 1997 I decided a small town was too small for me and that I was not going to get anywhere there, so I moved from one side of England to the other and took a position as a test engineer at 3Com in Cirencester. (For those with a sense of humor: my younger brother called my position “Professional Idiot”, as my job was to find new and exciting ways of breaking things.)

In early 1998, 3Com decided to move the development and production of the Office Connect series routers to other locations and close the Cirencester office, so I was made redundant. I moved to London and took a position at the Morse Group; six months later I had had enough of the management’s holier-than-thou attitude and accepted a position at Netscape Communications.

Netscape Communications really caused me to grow up, sharp and fast. Already 29 years old, I was arrogant, eager, and a complete know-it-all; little wonder I didn’t have many friends. Anyhow, lots of fun ensued: a decent salary and a decent house to live in resulted in a nice car (initially a Fiat X1/9, then a Toyota MR2), and of course this meant that trips back to my home town to see my son were possible. It also made it possible for me to do stupid things like drive 200 miles to go clubbing and drive back the same night, which I did on occasion.

In 1999, on one of my impromptu nightclub trips to my home town, I met Ally, to whom I proposed after two weeks. On February 29th, 2000 we married in Gretna Green, Scotland, and my life changed forever. Not long after the wedding my father died of prostate cancer, and Ally and I chose to emigrate to Australia; in July 2000 we landed in Melbourne.

I continued working for Netscape Communications until November 2001, when I was made redundant as AOL closed the Australian office. My next position was at the University of Queensland as a ‘Specialist Systems Programmer’, where I suggested that the University put resources into the publicly accessible and maintained anti-spam resources. The University suggested it would not be appropriate and would be quite difficult. At around the same time, some of the University’s machines were listed as spam sources, causing mail to be blocked; on investigation, the listing turned out to have been created because of an open proxy server elsewhere in the network. I posted to NANAE and managed to upset ‘RFG’, otherwise known as Ronald F. Guilmette, who, I later found out, is often easily upset at first contact. I am not known for subtlety, so a clash was inevitable. To cut a long story short, I decided that the only way to argue with some of these people was to ‘show them how to do it properly’, and therefore set myself a challenge. A few months later I had persuaded The University of Queensland to host my project, and the SORBS daemon was created, with the SORBS DNSBL following shortly thereafter.

So now you know: I’m an arrogant SoB, and if I don’t think someone is doing a good job and I am unable to persuade them to do a good job, I will set myself the challenge of providing something better… and that is the reason for this site.

More on the reason for this site…

An individual who used to be a whitehat promotes himself as the ‘DNSBL Resource’; however, it has become increasingly obvious, despite his protestations, that he is far from a whitehat any more. An example, and a very obvious correlation: I take the view that spammers are thieves who steal our resources for their own profit, and this is the reason we fight spam. In a similar way, the individual concerned decided it would suit his purpose to take a page whose copyright I hold and post it on his own site without my permission, because I had previously removed the page from my site. I spoke to the ISP about the page, and the ISP told the individual to remove the page or they would. At this point any whitehat would have left it at that; however, this individual chose instead to create a page on Google Pages and post the same content there with a disclaimer indicating he was exercising ‘Fair Use’. Not the actions of a whitehat. When I contacted Google, they indicated they wouldn’t do anything without a DMCA takedown notice, so I issued one, yet the page is still there. As Google has offices in Australia, my next step should be to file a full lawsuit with a damages claim against the individual and Google here in Australia. The Berne Convention will ensure he doesn’t escape justice. It is a sad thing that it has had to come to this, but rights must be preserved.

An amusing little update: after all the protestations of the same individual, it seems he has actually got a similar problem of people stealing his copyright works. The irony of it all.

Anyhow, enough of the distasteful s**t. The other part of the other site explains a very dubious methodology which Iverson of ‘DNSBL Resource’ uses. Dubious, you may ask? Well, that’s the best-fitting word. The methodology basically boils down to him setting up a number of email addresses which he ‘seeds’ into spam lists; he then subscribes other email addresses to a number of mailing lists and requests messages from other sources. His definition, therefore, is that anything he has requested is “ham” and anything sent to him without a request is “spam”.

The astute amongst you will realize that this method of creating ‘ham’ carries a significant danger of errors. The creation of ‘spam’, on the other hand, is quite reasonable but will by no means be comprehensive.

The dangers of using subscribed mail for ‘ham’…

‘Ham’ that is based on mailing list subscriptions is dangerous, as I explained elsewhere, because companies operating CAN-SPAM compliant marketing practices that do not comply with the Australian Spam Act 2003 (e.g. anyone not confirming the opt-in process and not having a prior business relationship) will be listed as spammers by some organisations, and yet will likely be listed as a valid ‘ham’ source. This issue will very obviously cause skewed and inaccurate reporting.

A further issue with “subscription ham” is the diversity of its source data. It cannot be very diverse, because by its very nature it is mailing list traffic, and mailing list traffic is not typical email but a small subset of it. You can prove this to yourself by thinking about how much of your email comes from mailing lists and opt-out email sources, and how much comes from your work, your partner, your family and your friends. My mailbox is atypical and consists of 70-80% mailing list mail. My wife’s mailbox is more typical: less than 1% of her messages come from mailing lists. Friends and colleagues of mine are more like the average IT user, with between 10% and 20% of their mail coming from mailing lists. Note: all percentages given are after spam removal.

In theory the solution to this problem is quite simple, yet it could prove more difficult than it appears. This site is my attempt at fixing the problem and at providing some real and unbiased stats.

Thanks for listening,

Matthew


Statistics

Oct 21, 2007 in Uncategorized

The following DNS based lists are currently reviewed:

List                 Configuration Used          Webpage
AHBL                 dnsbl.ahbl.org              http://www.ahbl.org/
CBL                  cbl.abuseat.org             http://cbl.abuseat.org/
DSBL                 lists.dsbl.org              http://dsbl.org/
Five Ten             blackholes.five-ten-sg.com  http://www.five-ten-sg.com/
NJABL                dnsbl.njabl.org             http://www.njabl.org/
PSBL                 psbl.surriel.com            http://psbl.surriel.com/
SORBS                dnsbl.sorbs.net             http://www.sorbs.net/
SORBS                safe.dnsbl.sorbs.net        http://www.sorbs.net/
Spamcop              bl.spamcop.net              http://www.spamcop.net/
Spamhaus Zen         zen.spamhaus.org            http://www.spamhaus.org/
Spamhaus XBL         xbl.spamhaus.org            http://www.spamhaus.org/
UBL                  ubl.unsubscore.com          http://www.lashback.com/
UCEPROTECT Level 1   dnsbl-1.uceprotect.net      http://www.uceprotect.net/
UCEPROTECT Level 2   dnsbl-2.uceprotect.net      http://www.uceprotect.net/
UCEPROTECT Level 3   dnsbl-3.uceprotect.net      http://www.uceprotect.net/
VIRBL                virbl.dnsbl.bit.nl          http://virbl.bit.nl/
WPBL                 db.wpbl.info                http://www.wpbl.info/

To view statistics for a particular list, click its name.
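Each entry in the ‘Configuration Used’ column is the DNS zone that gets queried. The standard DNSBL convention for an IPv4 address is to reverse its octets, append the zone, and look up an A record: an answer (usually in 127.0.0.0/8) means the address is listed, while a non-existent name means it is not. The sketch below shows that convention in Python; the function names are hypothetical, and treating every resolver error as ‘not listed’ is a simplification, since a real checker should tell NXDOMAIN apart from timeouts and other failures.

```python
import ipaddress
import socket
from typing import Callable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets plus the list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Return the A-record answer if `ip` is listed in `zone`, else None."""
    try:
        return resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN raises gaierror and means "not listed"; other lookup
        # failures also raise it, and this sketch does not distinguish them.
        return None
```

For example, `dnsbl_query_name("192.0.2.1", "bl.spamcop.net")` returns `"1.2.0.192.bl.spamcop.net"`. The `resolve` parameter defaults to the system resolver but can be swapped for a stub when testing.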

Column Definitions:

FP Daily change in the number of false positives (real mail that would be blocked as spam)
TP Daily change in the number of true positives (spam that would be successfully blocked)
FN Daily change in the number of false negatives (spam that would not be blocked)
TN Daily change in the number of true negatives (real mail that would not be blocked)
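Each column combines two facts per message: whether the list would have blocked the sending IP, and how the user classified the message. A minimal Python sketch of that mapping, and of the day-over-day change, follows; the function names and the (listed, is_spam) input format are assumptions rather than the generator’s real schema.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CATEGORIES = ("FP", "TP", "FN", "TN")


def outcome(listed: bool, is_spam: bool) -> str:
    """Map one lookup result plus the user's verdict to FP/TP/FN/TN."""
    if listed:
        return "TP" if is_spam else "FP"  # blocked spam / blocked real mail
    return "FN" if is_spam else "TN"      # missed spam / delivered real mail


def daily_counts(results: Iterable[Tuple[bool, bool]]) -> Counter:
    """Tally one day's (listed, is_spam) pairs for a single list."""
    return Counter(outcome(listed, is_spam) for listed, is_spam in results)


def daily_change(today: Counter, yesterday: Counter) -> Dict[str, int]:
    """Change in each count from one day to the next (can be negative)."""
    return {c: today[c] - yesterday[c] for c in CATEGORIES}
```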

How it works

Oct 21, 2007 in Uncategorized

The statistics generator is fully automatic, but it relies on user input about what is and is not spam. To distinguish ‘spam’ from ‘ham’, we have a group of people who contribute their time to classify every message they receive as one or the other. The Message-ID and time stamp are combined so that each entry in the database is unique. This key is then used to store the IP address, user ID, lookup results and classification for inclusion in the graphs. We use this method because it has a number of distinct advantages and few disadvantages. The advantages are:

  • No individual can effect a major change in the statistics without being noticed and having their results discarded.
  • Statistics can be recorded around the world by a variety of different people, places and persuasions.
  • As the checks are made at reception time, not delivery time, the statistics give a true view of how each list actually performs when mail arrives, rather than some 10+ minutes after the fact.

The disadvantages with this approach are:

  • Historical information cannot be given for DNS based lists that are added later.
  • Private lists that only allow zone transfers (i.e. no public servers) cannot be used without disadvantaging them.
  • Lists which give priority to subscribed users may be disadvantaged.
  • A collection of people have to devote time to sorting the spam from the ham.
  • The statistics server(s) have to be the primary MX server for the various addresses in use to ensure overall integrity.
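A minimal in-memory sketch of the keying scheme described at the start of this page, assuming the key is simply the (Message-ID, reception timestamp) pair; the text does not say exactly how the two are combined, and the `Entry` fields and `StatsStore` class are hypothetical stand-ins for the real database.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Key = Tuple[str, int]  # (Message-ID, reception timestamp in Unix seconds)


@dataclass
class Entry:
    ip: str       # connecting IP address checked against the lists
    user_id: str  # contributor who received the message
    results: Dict[str, bool] = field(default_factory=dict)  # zone -> listed?
    classification: str = "unclassified"  # later set to "spam" or "ham"


class StatsStore:
    """Hypothetical in-memory stand-in for the statistics database."""

    def __init__(self) -> None:
        self.entries: Dict[Key, Entry] = {}

    def record(self, message_id: str, received_at: int, entry: Entry) -> Key:
        key = (message_id, received_at)
        # setdefault keeps the first entry if the same key arrives twice.
        self.entries.setdefault(key, entry)
        return key

    def classify(self, key: Key, is_spam: bool) -> None:
        self.entries[key].classification = "spam" if is_spam else "ham"
```

Recording the lookup results in `record` at reception time, before anyone classifies the message, matches the advantage listed above of measuring lists when mail arrives rather than at delivery.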

Lastly, on request we will provide DNS based list operators with details of false positives, and we will provide them with a method to comment on the statistics and any errors.

We will endeavor to be completely transparent to you, the reader, and to the list operators.

If you want to contribute to the statistics and can redirect your domain through the DNSBL Report mail servers, please contact Matthew. Note: contributing to the statistics is not something to be taken lightly. No messages are rejected at the MX, and you must mark each message as spam or not spam within the interface. If you don’t mark the messages, the domain will be automatically suspended after 30 days, and all mail will be rejected rather than forwarded to your destination of choice.
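The 30-day rule can be expressed as a simple check. The text does not say what the 30 days are measured from; this sketch assumes they run from the oldest message still waiting to be marked, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical constant mirroring the 30-day rule in the text.
SUSPEND_AFTER = timedelta(days=30)


def domain_action(oldest_unmarked: Optional[datetime], now: datetime) -> str:
    """Return "forward" or "suspend" for a contributing domain."""
    if oldest_unmarked is not None and now - oldest_unmarked >= SUSPEND_AFTER:
        # Suspended domains have their mail rejected rather than forwarded.
        return "suspend"
    return "forward"
```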


About

Oct 20, 2007 in Uncategorized

This site has been created for a variety of reasons, first and foremost to provide objective and unbiased information about how well DNS based lists work for spam prevention. This will include statistics collected by many and presented in a clear, concise manner, as well as giving the administrators of the lists the ability to comment on why their lists perform well (or don’t) in the tests.

The second reason is to provide information about the various DNS based lists: how to use them and how not to use them.

The third reason is to provide feedback to the various DNS based lists so that, if they wish, they can improve their lists, whilst retaining the integrity of these statistics.

For detailed information about how the statistics are compiled, please see the ‘How it works’ page. Note: This site has been created by Matthew Sullivan of SORBS. You should read his blog about why, how, and other information before making any judgements about the data or whether there is any bias.


DNSBL Report, an introduction.

Oct 20, 2007 in Uncategorized

Note: This site has been created by Matthew Sullivan of SORBS. You should read his blog about why, how, and other information before making any judgements.