This paper describes an e-mail spam filter based on local SVM, namely on the SVM classifier trained only on a neighborhood of the message to be classified, and not on the whole training data available. Two problems are stated and solved. First, the selection of the right size of neighborhood is shown to be critical; our solution is based on the estimation of the a-posteriori probability of the correct decision, and the resulting algorithm is called highest probability SVM nearest neighbor (HP-SVM-NN). The second problem is the application of the algorithm in practice, and we propose a practical filter architecture based on HP-SVM-NN. Extensive testing is performed on SpamAssassin corpus and TREC 2005 Spam Track corpus, showing that HP-SVM-NN outperforms pure SVM and is applicable in practice. Finally, we explore the locality properties of the two corpora using Sammon’s projection.
E-Mail Spam Filtering with Local SVM Classifiers / Blanzieri, Enrico; Bryl, Anton. - ELETTRONICO. - (2008).
E-Mail Spam Filtering with Local SVM Classifiers
Blanzieri, EnricoPrimo
;Bryl, AntonUltimo
2008-01-01
Abstract
This paper describes an e-mail spam filter based on local SVM, namely on the SVM classifier trained only on a neighborhood of the message to be classified, and not on the whole training data available. Two problems are stated and solved. First, the selection of the right size of neighborhood is shown to be critical; our solution is based on the estimation of the a-posteriori probability of the correct decision, and the resulting algorithm is called highest probability SVM nearest neighbor (HP-SVM-NN). The second problem is the application of the algorithm in practice, and we propose a practical filter architecture based on HP-SVM-NN. Extensive testing is performed on SpamAssassin corpus and TREC 2005 Spam Track corpus, showing that HP-SVM-NN outperforms pure SVM and is applicable in practice. Finally, we explore the locality properties of the two corpora using Sammon’s projection.File | Dimensione | Formato | |
---|---|---|---|
013.pdf
accesso aperto
Tipologia:
Versione editoriale (Publisher’s layout)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
718.85 kB
Formato
Adobe PDF
|
718.85 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione