How does the Gmail anti-spam filter work?

I'm always surprised by the high quality of Gmail's spam filter. Over the last year, it caught 99.95% of my spam and wrongly blocked only one legitimate message. By comparison, every other mail service I have used makes at least one mistake for every 50 messages.

How does Gmail reach this level of quality internally? Is it based on customer feedback (i.e., if N customers mark a message as spam, it is classified as spam for every other customer)? Or is there some trick? Maybe a basic filtering algorithm catches the most obvious spam, and the difficult cases are analyzed by real humans?


In short, it is based on community feedback. Here is a quote from Google's official explanation:

Gmail users play an important role in keeping spammy messages out of millions of inboxes. When the Gmail community votes with their clicks to report a particular email as spam, our system quickly learns to start blocking similar messages. The more spam the community marks, the smarter our system becomes.
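
Google has not published the actual algorithm, so the following is only an illustration of the general idea in the quote. User reports become labeled training data, and a classifier generalizes from them to *similar* messages, not just exact copies of the reported one. The sketch uses a textbook naive Bayes token classifier, a technique swapped in here for illustration. `CommunitySpamFilter`, `report`, `spam_probability` and `is_spam` are hypothetical names, not any Gmail API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (deliberately crude)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class CommunitySpamFilter:
    """Hypothetical naive Bayes filter trained from users' 'Report spam' / 'Not spam' clicks.

    This is an illustrative assumption, not Gmail's real implementation.
    """

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def report(self, text, is_spam):
        """Record one user's verdict on a message as a training example."""
        label = "spam" if is_spam else "ham"
        self.message_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Return P(spam | text) under a naive Bayes model with Laplace smoothing."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior: 0.5 each before any reports arrive.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            total_tokens = sum(self.token_counts[label].values())
            score = math.log(prior)
            for token in tokenize(text):
                # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (total_tokens + len(vocab) + 1))
            log_scores[label] = score
        # Numerically stable sigmoid of the log-odds.
        diff = log_scores["spam"] - log_scores["ham"]
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        z = math.exp(diff)
        return z / (1.0 + z)

    def is_spam(self, text, threshold=0.5):
        return self.spam_probability(text) > threshold


# Usage: a few community reports, then scoring messages nobody has reported yet.
f = CommunitySpamFilter()
f.report("cheap pills buy now limited offer", is_spam=True)
f.report("win a free prize click now", is_spam=True)
f.report("meeting notes for tomorrow's project review", is_spam=False)
f.report("lunch tomorrow? project review moved", is_spam=False)
print(f.spam_probability("free pills click now"))      # above 0.5
print(f.spam_probability("project meeting tomorrow"))  # below 0.5
```

The point of the design is that "free pills click now" was never reported by anyone, yet it scores as spam because its words resemble reported messages. A plain "N reports, then block" rule would only catch exact repeats.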

You can read a bit more about it on Google's Spam Explained page.