FRAMINGHAM (10/09/2003) - Spam in the wild
Your test of anti-spam tools states: "Estimates of the amount of unwanted e-mail range from 40 percent to 75 percent, but we can give you an exact percentage - 69 percent. That's how much spam we saw during the month of June." There are a lot of unknowns here. What e-mail addresses were included? Are these publicly advertised addresses? Was any blocking of known open Simple Mail Transfer Protocol (SMTP) relays done?
Spouting numbers like this just leads to more uninformed discussion of spam. Blocking known open SMTP relays, combined with "safe e-mailing" (not giving my e-mail address to unknown entities), has kept my quantity of spam down.
Information Management Group
Albany, New York
You picked a poor statistic to reflect false positives in your story on spam filters. The caption in the table on page 40 describes the data as exactly what we need to know about false positives: "The percentage of non-spam messages that are marked as spam." However, the box on page 44 shows that the numbers in the table do not match that definition. This box tells us that you have defined false positives to be the complement of positive predictive value (PPV), that is, the percentage of messages marked as spam that are not spam.
There are two reasons why this is a poor choice of definition for the false positive rate:
First, what people want to know is: (a) what percentage of spam is blocked; and (b) what percentage of legitimate e-mail is wrongly blocked. You got (a) right, but you chose the wrong statistic for (b). The right statistic for (b) would be exactly what you described in the page 40 table caption, namely, "the percentage of non-spam messages that are marked as spam."
Second, the statistic you chose for false positives is meaningless without knowing the fraction of the test messages that are spam. Suppose, for example, that your bank of messages consisted of 1,000 legitimate messages and one item of spam. Suppose further that the filter caught the one spam message and one of the thousand legitimate messages. By your definition, the false positive rate would be 50 percent (one of the two messages marked as spam was legitimate), rather than the 0.1 percent rate (one of 1,000 legitimate messages was marked as spam) that most would deem appropriate. Defining the false positive rate as the percentage of non-spam messages that are marked as spam would give you a measure that is independent of the spam fraction in your test data.
Professor and chair, Physics Department
Wake Forest University
Winston-Salem, North Carolina
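The arithmetic in the letter above can be checked with a short sketch. It is an illustration only, not the method used in the published test: the function names and the second, balanced mailbox are hypothetical, chosen to show how the complement of PPV moves with the spam fraction while the false positive rate stays put.

```python
def false_positive_rate(false_pos, true_neg):
    """Share of legitimate messages wrongly marked as spam: FP / (FP + TN)."""
    return false_pos / (false_pos + true_neg)


def ppv_complement(true_pos, false_pos):
    """Share of messages marked as spam that are legitimate: FP / (TP + FP)."""
    return false_pos / (true_pos + false_pos)


# Mailbox from the letter: 1,000 legitimate messages and 1 spam message.
# The filter marks the 1 spam message and 1 legitimate message as spam.
true_pos, false_pos, true_neg = 1, 1, 999

letter_fpr = false_positive_rate(false_pos, true_neg)
letter_complement = ppv_complement(true_pos, false_pos)

# Hypothetical second mailbox (an assumption, not from the letter):
# 1,000 legitimate and 1,000 spam messages, filtered with the same behavior,
# so all spam is caught and 1 legitimate message is wrongly marked.
balanced_fpr = false_positive_rate(1, 999)
balanced_complement = ppv_complement(1000, 1)

print(f"Letter mailbox:   FPR = {letter_fpr:.1%}, 1 - PPV = {letter_complement:.1%}")
print(f"Balanced mailbox: FPR = {balanced_fpr:.1%}, 1 - PPV = {balanced_complement:.1%}")
```

Under these assumptions the false positive rate is 0.1 percent for both mailboxes, while the complement of PPV falls from 50 percent to roughly 0.1 percent once spam makes up half the messages, which is the dependence on the spam fraction that the letter describes.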
Clarification was sufficient
In his letter to the editor "Clarifying AT&T's clarification," Ohio State University Professor of Economics Russell Olsen writes that "AT&T can mark down its capital infrastructure to market value" in its effort to compete with MCI. This is Ivory Tower nonsense. AT&T's biggest competitor, MCI, writes off billions of dollars in debt, eliminates its current stockholders by extinguishing shares and gives new shares to debt holders. In the process, MCI eliminates millions of dollars in monthly expenses associated with those debts. To compete, Olsen suggests, AT&T can simply write off goodwill and mark down asset values. Take a walk down the hall, sir, and spend some time with your accounting colleagues. I really think you're missing something.