From The Wall Street Journal, 9 February 2004
By TIM HANRAHAN AND JASON FRY
Spammers, Human Mind Do Battle Over Spelling
Does the human mind have a yen for the kooky misspellings of contemporary spam? Antispam software maker Commtouch had two pieces of bad news last week. First -- no surprise -- the federal Can-Spam law hasn't done much to put a lid on spam. Less than 2% of all junk mail in the first month of 2004 complied with the law, Commtouch reported. Second, the company said, spammers are taking advantage of a quirk in the way people understand written language to sneak nonsense words past content-based spam filters (Commtouch sells a different kind of technology). Its evidence? This passage:
Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and the lsat ltteer is at the rghit pclae. … Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.
Pretty convincing, eh? It could explain why spammers think their messages are still effective even when scrambled as vairga or szizlnig pron.
Or not. First, some background: Spammers are resorting to worse and worse spelling to avoid filters, in a sort of Darwinian coping mechanism. George Johnson argued in the New York Times that this trend would eventually render spam unreadable and ineffectual – and the increasingly bad spelling we see in our inboxes is a sign that antispam efforts are working. (We'd equate this to a male animal that grows hairier and uglier to deal with a new threat, but then realizes no female wants to mate with it.) But could people's ability to understand spam gibberish, as shown in the "Elingsh uinervtisy" passage, slow efforts to quickly stamp it out?
Well, not really, for a couple of reasons. First, the passage Commtouch quotes isn't entirely what it seems, nor is it terribly new. We first saw the message when a reader sent it our way last fall, and mentions of the "English meme" show up in various U.S. and U.K. publications starting last September. To sort this out, we asked Matt Davis, who has an in-depth Web page dedicated to the scramble meme, about the passage.
Mr. Davis, a scientist at the cognition and brain sciences unit at the U.K.'s Medical Research Council, says that the e-mail reflects things that researchers have known for a long time. (He notes the controversy over the "F.C.U.K." initials of clothier French Connection U.K.) However, the scrambled passages passed around the Internet are "surprisingly easy to read because the authors of these things are making sure they're easy to read." That is, there's a lot of truth to the e-mail – that the human brain can understand scrambled passages – but there's also been some gaming of this particular passage. He notes that partially scrambling words but leaving the right sounds in the same places can help readability, as can using short words rather than long words.
For comparison, try this more-scrambled BBC passage from Mr. Davis's Web site: A dootcr has aimttded the magltheuansr of a tageene ceacnr pintaet who deid aetfr a hatospil durg blendur.
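The rule the meme describes (keep the first and last letters, reorder the ones in between) is easy to reproduce. Below is a minimal Python sketch, assuming a uniform random shuffle and treating every run of ASCII letters as a word, so an apostrophe splits "doesn't" into two pieces; the function names are illustrative, not taken from Mr. Davis's site. Because a uniform shuffle makes no attempt to keep sounds in place, its output will likely read more like the BBC sentence above than like the hand-tuned meme.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle a word's interior letters, leaving the first and last letters in place."""
    if len(word) <= 3:
        # Three letters or fewer leaves at most one interior letter: nothing to shuffle.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)  # uniform shuffle; no attempt to preserve sounds
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every run of letters, leaving spaces and punctuation untouched."""
    rng = random.Random(seed)  # seeded so the output is repeatable
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(scramble_text("According to research at an English university"))
```

Changing `seed` gives a different scramble of the same sentence; every word still begins and ends with its original letters.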
In any case, who should get credit for the original scramble findings? Mr. Davis tracked the first reference to an unpublished Ph.D. thesis from the 1970s, "The Significance of Letter Position in Word Recognition," and says the issue came back into circulation in 1999, when the author, Graham Rawlinson, sent a letter to New Scientist that mentioned his earlier research. Mr. Rawlinson, a psychologist, consultant and writer, says via e-mail that it's hard to say exactly why the general gist of his research has become an Internet hit 27 years later. One reason, he says, "may be that people are forever being told about their limited capabilities ... whereas this demonstrates ... how powerful the brain is all by itself."
Meanwhile, just because people can read scrambled or altered words doesn't mean we don't identify them as such, or that filters can't too -- meaning no real reprieve for spammers. It may take a little longer to catch up with all the possible spellings of Viagra, mixed with other spammer substitution and homonym-like tricks -- vagria, vi@gr@, vyagrah, etc. -- but in the end, few text-only spams will get through filters, and those that do will be a bloody mess.
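The column doesn't say how filters catch such variants. Purely as an illustration, here is a Python sketch assuming a hypothetical substitution table and two simple similarity tests: a check for interior scrambles like "vagria", and a Levenshtein edit-distance threshold for near-misses like "vyagrah". Real filters are far more elaborate; every name and threshold here is an assumption.

```python
# Hypothetical substitution table; a real filter would need a much larger, evolving one.
SUBSTITUTIONS = str.maketrans({"@": "a", "4": "a", "1": "i", "!": "i",
                               "0": "o", "3": "e", "$": "s", "5": "s"})


def normalize(token: str) -> str:
    """Lowercase, undo simple character substitutions, and drop non-letters."""
    mapped = token.lower().translate(SUBSTITUTIONS)
    return "".join(ch for ch in mapped if ch.isalpha())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def is_interior_scramble(word: str, target: str) -> bool:
    """True if word keeps target's first and last letters and only reorders the rest."""
    return (len(word) == len(target) >= 2
            and word[0] == target[0] and word[-1] == target[-1]
            and sorted(word[1:-1]) == sorted(target[1:-1]))


def looks_like(token: str, keyword: str, max_distance: int = 2) -> bool:
    """Flag a token that, once normalized, matches a lowercase keyword or a close variant."""
    word = normalize(token)
    return (word == keyword
            or is_interior_scramble(word, keyword)
            or edit_distance(word, keyword) <= max_distance)


for t in ["vi@gr@", "vagria", "vyagrah", "meeting"]:
    print(t, looks_like(t, "viagra"))
```

The distance threshold is a trade-off: a value of 2 catches "vyagrah" but also flags an innocent word such as "Niagara", which is one reason a single keyword test like this can't stand in for a full filter.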
Is your e-mail getting harder to read? Been impressed by an unexpected mental skill? Write to email@example.com, and we'll post selected comments this Thursday. If you want to share your thoughts but don't want your letter published, please make that clear. Something rotten in your inbox? Tell us about it. Also, see more about junk e-mail, including previous Spams of the Week, at wsj.com/junkmail.
Write to Tim Hanrahan and Jason Fry at firstname.lastname@example.org
Updated February 9, 2004