Spammers, Human Mind
What TiVo's Tracking Technology Means for Your Average Couch Potato
Does the human mind have a yen for the kooky misspellings of contemporary spam?
Antispam software maker Commtouch had two pieces of bad news last week. First -- no surprise -- the federal Can-Spam law hasn't done much to put a lid on spam. Less than 2% of all junk mail in the first month of 2004 complied with the law, Commtouch reported. Second, the company said, spammers are taking advantage of a quirk in the way people understand written language to sneak nonsense words past content-based spam filters (Commtouch sells a different kind of technology). Its evidence? This passage:
Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and the lsat ltteer is at the rghit pclae. … Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.
Pretty convincing, eh? It could explain why spammers think their messages are still effective even when scrambled as vairga or szizlnig pron.
Or not. First, some background: Spammers are resorting to worse and worse spelling to avoid filters, in a sort of Darwinian coping mechanism. George Johnson argued in the New York Times that this trend would eventually render spam unreadable and ineffectual – and the increasingly bad spelling we see in our inboxes is a sign that antispam efforts are working. (We'd equate this to a male animal that grows hairier and uglier to deal with a new threat, but then realizes no female wants to mate with it.) But could people's ability to understand spam gibberish, as shown in the "Elingsh uinervtisy" passage, slow efforts to quickly stamp it out?
Well, not really, for a couple of reasons. First, the passage Commtouch quotes isn't entirely what it seems, nor is it terribly new. We first saw the message when a reader sent it our way last fall, and mentions of the "English meme" show up in various U.S. and U.K. publications starting last September. To sort this out, we asked Matt Davis, who has an in-depth Web page dedicated to the scramble meme, about the passage.
Mr. Davis, a scientist at the cognition and brain sciences unit at the U.K.'s Medical Research Council, says that the e-mail reflects things that researchers have known for a long time. (He notes the controversy over the "F.C.U.K." initials of clothier French Connection U.K.) However, the scrambled passages passed around the Internet are "surprisingly easy to read because the authors of these things are making sure they're easy to read." That is, there's a lot of truth to the e-mail – that the human brain can understand scrambled passages – but there's also been some gaming of this particular passage. He notes that partially scrambling words but leaving the right sounds in the same places can help readability, as can using short words rather than long words.
For comparison, try this more-scrambled BBC passage from Mr. Davis's Web site: A dootcr has aimttded the magltheuansr of a tageene ceacnr pintaet who deid aetfr a hatospil durg blendur.
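For readers who want to try the trick themselves, here is a minimal Python sketch of the scrambling the e-mail describes: keep each word's first and last letters and shuffle everything in between. The function names scramble_word and scramble_text are ours, invented for illustration, and the sketch assumes plain words separated by spaces with no punctuation attached. It is not the method anyone used to produce the passages above.

```python
import random


def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping first and last in place."""
    # Words of three letters or fewer have at most one interior letter,
    # so there is nothing to rearrange.
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=0):
    """Scramble every space-separated word in the text.

    Assumption: words carry no attached punctuation. A trailing comma
    would be treated as the word's last "letter".
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    return " ".join(scramble_word(word, rng) for word in text.split())


if __name__ == "__main__":
    print(scramble_text("according to research at an English university"))
```

Because the shuffle here is purely random, long words tend to come out harder to read than the ones in the circulated e-mail, which fits Mr. Davis's point that the popular passages were written to be easy.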
In any case, who should get credit for the original scramble findings? Mr. Davis tracked the first reference to an unpublished Ph.D. thesis from the 1970s, "The Significance of Letter Position in Word Recognition," and says the issue came back into circulation in 1999, when the author, Graham Rawlinson, sent a letter to New Scientist that mentioned his earlier research. Mr. Rawlinson, a psychologist, consultant and writer, says via e-mail that it's hard to say exactly why the general gist of his research has become an Internet hit 27 years later. One reason, he says, "may be that people are forever being told about their limited capabilities ... whereas this demonstrates ... how powerful the brain is all by itself."
Meanwhile, just because people can read scrambled or altered words doesn't mean we don't identify them as such, or that filters can't too -- meaning no real reprieve for spammers. It may take a little longer to catch up with all the possible spellings of Viagra, mixed with other spammer substitution and homonym-like tricks -- vagria, vi@gr@, vyagrah, etc. -- but in the end, few text-only spams will get through filters, and those that do will be a bloody mess.
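To see why scrambling alone is a weak disguise, consider a rough Python sketch of how a filter could undo it. This is our own illustration, not a description of how Commtouch's or any other product works: the names SUBSTITUTIONS, signature and looks_like are hypothetical, and the character map is a small made-up sample. The idea is to reverse common symbol swaps, then compare words by their first letter, last letter and the sorted letters in between.

```python
# Hypothetical sample of symbol-for-letter swaps; a real list would be longer.
SUBSTITUTIONS = str.maketrans({"@": "a", "0": "o", "1": "i", "3": "e", "$": "s"})


def signature(word):
    """Reduce a word to a form that ignores the order of its interior letters."""
    normalized = word.lower().translate(SUBSTITUTIONS)
    if len(normalized) <= 3:
        return normalized
    interior = "".join(sorted(normalized[1:-1]))
    return normalized[0] + interior + normalized[-1]


def looks_like(word, target):
    """True if word is target with swapped symbols or shuffled interior letters."""
    return signature(word) == signature(target)


if __name__ == "__main__":
    for candidate in ["vagria", "vi@gr@", "vairga", "vyagrah"]:
        print(candidate, looks_like(candidate, "viagra"))
```

Under these assumptions, vagria, vi@gr@ and vairga all match Viagra, while vyagrah does not, because it adds letters rather than rearranging them. That is the sort of variant that would take a filter a little longer to catch.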
Is your e-mail getting harder to read? Been impressed by an unexpected mental skill? Write to email@example.com, and we'll post selected comments this Thursday. If you want to share your thoughts but don't want your letter published, please make that clear.
CAREFUL, TIVO'S LOOKING: Last week TiVo issued a press release announcing that the moment in which Justin Timberlake unveiled Janet Jackson "drew the biggest spike in audience reaction TiVo has ever measured." What was happening was that TiVo users had paused the live feed from the halftime show shortly after It Happened, rewound, watched It Happen again, and so on, like some parody of Oliver Stone's "JFK."
Seemed like an amusing footnote to a classic water-cooler story -- until those TiVo users thought, "Wait a minute, TiVo knows I did that." No doubt some of us had a vaguely queasy feeling as we realized TiVo knew not only about the Janet replay, but also about all those surreptitious back-and-forths through, say, key bits of "Fast Times at Ridgemont High" or "Clash of the Titans" or goodness knows what else. Juvenile things you thought you were doing in the privacy of your own DVR, but that, it turns out, didn't escape TiVo's all-seeing gaze.
All this reminds us of the promises and dire predictions (for they were one and the same) made about e-commerce a few years back: that Web-site owners would use a combination of cookies and site logs to follow your every move and assemble a perfect profile of your cyber-self.
That data-rich world hasn't arrived for a couple of reasons. For one thing, privacy worries got a thorough (and necessary) airing, forcing advertisers and marketers to back off. The decentralized nature of the Internet has left one's online identity fairly fragmented: Most organizations only track you within their own sites, and those that can track you across multiple sites generally either don't make use of that information (your Internet-service provider) or do so anonymously (companies that serve ads) – a state of affairs for which we should thank those who raised the alarm about Internet privacy.
Even when Web-site operators do know what you're up to, it can be very hard to pick the signal out from the noise – as our print colleague Nick Wingfield explained in a 1998 story that remains relevant today. For proof, consider recommendation engines. TiVo's leaps of faith in making recommendations for its users have inspired several sitcom plot lines and an entertaining article by the Journal's Jeffrey Zaslow, and Amazon.com users know all about the disasters that can unfold if you forget to tell the site that an atypical purchase is a gift. Most people have a well-meaning relative who once picked up on the fact that you like X (which could be the Red Sox, goldfish or Gary Larson) and now gives you nothing but X-related gifts, despite the fact that either a) you are a big fan of X and therefore already have most everything related to it; or b) you were just in a weird mood or being nice and don't in fact like X at all. If you don't have such a relative, the recommendation engines for TiVo or Amazon can make you feel like you do.
There are reasons for privacy concerns today, but in our view most of them have to do with poor security, not existing practices – phishers, scammers and security holes on Web sites are much more dangerous than what companies that have your information might do with it. For the most part, Internet-service providers have resisted legal attempts to part with their subscribers' personal information. Marketers have largely retreated from grand plans that would impinge on our privacy – and technology has helped us fight back. And recommendation engines remain crude tools – not because the technology is poor, but because they need a tremendous amount of information to be truly on target.
Still, anonymity can depend on context. No, TiVo doesn't have your name in a file folder named "Halftime Perverts." But start recording too many late-night Cinemax movies and TiVo just may present you with the collected works of Shannon Tweed. In which case your significant other might assign you to a file folder you'd rather not be in.
Are you worried about what a given organization – whether it's TiVo, Amazon or something else -- knows about you? How has that affected your habits? Write to firstname.lastname@example.org, and we'll post selected comments this Thursday. If you want to share your thoughts but don't want your letter published, please make that clear.
SPAM OF THE WEEK: If you've been online for a while, by now you've probably received innumerable spams from obviously false addresses, scads of spams with "remove" links that won't work, gobs of HTML spams hiding text offers within images, hordes of dodgy personal-finance offers and even a good number of spams with those weirdly poetic random words designed to foil filters.
But all at once? This week Jace got a missive from one Shannon Cooley (e-mail address: email@example.com) with the subject line "Pastdue, acct Kirk seminary tomato." (Silly spammer! The filter-foiler goes in the body of the message!) Inside was an image touting one of those dopey offers to eliminate credit-card debt and another image offering the ability to be removed from the spammer's list -- the latter bracketed by this odd exhortation:
For example, midwife related to indicates that from fundraiser reach an understanding with waif about.dilettante from buy an expensive gift for fruit cake behind.He called her Alexandra (or was it Alexandra?).philosopher… on fruit cake from traffic light.Unlike so many cream puffs who have made their worldly submarine to us. from cough syrup assimilate related to food stamp.
Wheee! It's like five spams in one.
Write to Tim Hanrahan and Jason Fry at firstname.lastname@example.org
Updated February 9, 2004
Copyright © 2004 Dow Jones & Company, Inc. All Rights Reserved