Kansai debian study_20071007

Debian SPAM(CRM114)

dselect(1)

2007/10/7

Free Software Foundation GNU General Public License(2)

http://www.gnu.org/copyleft/gpl.html

http://www.netfort.gr.jp/~tosihisa/debian/kansai_debian_study_20071007.odp

()Debian

SPAM(CRM114)

Debian

Debian 1.3 (bo), 1997

12
(dselect )

dpkg
./configure ; make ; make install

Debian runlevel 23

etch
apt(deb)

SPAM



SPAM

1SPAM
a) 10(^^)v
b) 20(^^)/
c) 50(--)/
d) 100(__)/
e)

130250SPAM

SPAM7
SPAM

3627SPAM
106.67SPAM

SPAM

(MUA)()SPAM(^^)/

bogofilter,bsfilter,POPFile SPAM(^^)/

(^^)/

(^^)/

(__)/

SPAM(^^)v

(__)/

SPAM

procmail

bsfilter

ThunderBirdSPAM

POPFile

POPFile

POPFileSPAM
SPAMSYSTEM

POP proxy

POPFile

POPFilePOPProxySPAM

POPFile


1100SPAM
100SPAM

SPAM

POPPOPFilefetchmail
SPAM

DebianSPAM

bogofilter
m(_ _)m

SpamAssassin

SPAM

bsfilter

CRM114

'the Controllable Regex Mutilator'

apt-cache search SPAM

http://crm114.sourceforge.net/

CRM114

Hidden Markov Model, Bayesian Chain Rule Orthogonal Sparse Bigrams, Winnow, Correlation, KNN/Hyperspace, Bit Entropy, CLUMP, SVM, Neural Networks
(or by other means; it's all programmable).

CRM114

SPAMnkfkakasi

$ apt-get install nkf kakasi crm114

CRM114

CRM114
cp -a /usr/share/doc/crm114/examples .crm114
cd .crm114
gunzip *.gz
chmod +x mailfilter.crm


cssutil -r -b spam.css
cssutil -r -b nonspam.css

CRM114

crm -v

This is CRM114, version 20060704a-BlameRobert (TRE 0.7.3 (LGPL))
Copyright 2001-2006 William S. Yerazunis
This software is licensed under the GPL with ABSOLUTELY NO WARRANTY

SPAM
11

$ cat spam | crm -u ~/.crm114/ mailfilter.crm | grep X-CRM114-Status
X-CRM114-Status: UNSURE (0.0000) This message is 'unsure'; please train it!

CRM114SPAM

'pR' ranges from -320.0 to +320.0 and is reported in the X-CRM114-Status: header.

+320.0 ... +10.0    GOOD (not SPAM)
+10.0 ... -10.0     UNSURE (includes +/-0.0)
-10.0 ... -320.0    SPAM
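The mapping from a pR score to a verdict can be sketched as a small shell function. This is a minimal sketch and not code taken from mailfilter.crm: the name classify_pr is hypothetical, the thresholds +10.0 and -10.0 are the values shown on the slide, and treating a score that sits exactly on a threshold as UNSURE is an assumption.

```shell
#!/bin/sh
# classify_pr PR
# Hypothetical helper: map a CRM114 pR score to GOOD, SPAM or UNSURE.
# Thresholds follow the slide (+10.0 / -10.0); boundary handling is an assumption.
classify_pr() {
    awk -v pr="$1" 'BEGIN {
        pr += 0                         # force a numeric comparison
        if (pr > 10.0)        print "GOOD"
        else if (pr < -10.0)  print "SPAM"
        else                  print "UNSURE"
    }'
}

classify_pr -183.7027   # prints SPAM
classify_pr 0.0000      # prints UNSURE
```

The two sample calls use the scores that appear in the X-CRM114-Status: examples in this deck.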

Train SPAM with --learnspam
$ cat spam | crm -u ~/.crm114/ mailfilter.crm --learnspam


$ cat spam | crm -u ~/.crm114/ mailfilter.crm | grep X-CRM114-Status
X-CRM114-Status: SPAM ( pR: -183.7027 )

Train mail that is not SPAM with --learnnonspam
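When the verdict is needed in a script, the X-CRM114-Status: line can be picked apart with sed instead of read by eye with grep. The sketch below is only an illustration: the function names crm_status and crm_pr are hypothetical, and the header layout is assumed to be exactly the two forms shown in this deck.

```shell
#!/bin/sh
# crm_status: read a filtered message on stdin and print the verdict word
# (GOOD, SPAM or UNSURE) from the first X-CRM114-Status: line.
# Note: this does not separate headers from body; it takes the first match.
crm_status() {
    sed -n 's/^X-CRM114-Status:[[:space:]]*\([A-Z][A-Z]*\).*/\1/p' | head -n 1
}

# crm_pr: print the pR value when the header carries a "pR:" field,
# and print nothing otherwise (for example on the UNSURE form shown above).
crm_pr() {
    sed -n 's/^X-CRM114-Status:.*pR:[[:space:]]*\([-+]\{0,1\}[0-9.]*\).*/\1/p' | head -n 1
}

printf '%s\n' 'X-CRM114-Status: SPAM ( pR: -183.7027 )' | crm_status   # prints SPAM
printf '%s\n' 'X-CRM114-Status: SPAM ( pR: -183.7027 )' | crm_pr       # prints -183.7027
```

In real use the input would be the output of the crm command shown above, for example `crm -u ~/.crm114/ mailfilter.crm < spam | crm_status`.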

CRM114SPAM

TOE - Train Only Errors


POPFile

TET - Train Every Thing
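The difference between the two strategies is easiest to see as a decision rule: under TOE a message is fed back to the filter only when the verdict was wrong, while TET would train on every message regardless. The following is a minimal sketch of the TOE rule, not the author's script. The names toe_action and toe_train are hypothetical, the flag spellings --learnspam and --learnnonspam follow the slide headings, and counting UNSURE as an error is an assumption based on the "please train it!" message.

```shell
#!/bin/sh
# toe_action TRUTH VERDICT
# TRUTH is what a human says the message is: "spam" or "good".
# VERDICT is the word from X-CRM114-Status: GOOD, SPAM or UNSURE.
# Prints the training flag to use, or nothing when the filter was right.
toe_action() {
    case "$1:$2" in
        spam:SPAM|good:GOOD) ;;                      # correct: do not train
        spam:*) printf '%s\n' "--learnspam" ;;       # missed or unsure SPAM
        good:*) printf '%s\n' "--learnnonspam" ;;    # false positive or unsure
        *) echo "usage: toe_action spam|good VERDICT" >&2; return 2 ;;
    esac
}

# toe_train TRUTH FILE
# Classify FILE with mailfilter.crm, then train only if the verdict was wrong.
toe_train() {
    verdict=$(crm -u ~/.crm114/ mailfilter.crm < "$2" |
        sed -n 's/^X-CRM114-Status:[[:space:]]*\([A-Z][A-Z]*\).*/\1/p' |
        head -n 1)
    flag=$(toe_action "$1" "$verdict") || return
    [ -z "$flag" ] || crm -u ~/.crm114/ mailfilter.crm "$flag" < "$2" > /dev/null
}
```

For example, `toe_train spam ./spam` would train the file only when CRM114 did not already mark it as SPAM.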

CRM114

SPAM

CRM1148bit clean

[]
CRM114kakasi

kakasi



$ echo '私の名前は田中です。' | nkf -e | kakasi -Ha -Ka -Ja -Ea -ka -s
watashi no namae ha tanaka desu .

nkf converts UTF-8 (tty) and EUC/JIS (mail) input to EUC

pretokenizer.crm

CRM114(mailfilter.crm)

pretokenizer.crmMIMECRM114

(pretokenizer.crm)

pretokenizer.crmSPAM(1)

*** 261,268 ****
  # We clip m_text to be the first :decision_length: characters of
  # the incoming mail.
  #
! match (:m_text:) [:_dw: 0 :*:decision_length:] /.*/
! isolate (:m_text:)
  #
  # :b_text: is the text with base64's expanded.
  isolate (:b_text:) /:*:m_text:/
--- 261,274 ----
  # We clip m_text to be the first :decision_length: characters of
  # the incoming mail.
  #
! #match (:m_text:) [:_dw: 0 :*:decision_length:] /.*/
! #isolate (:m_text:)
! isolate (:m_text:) /:*:_dw:/
! {
!     match [:text_preprocessor:] /./
!     syscall (:*:_dw:) (:m_text:) /:*:text_preprocessor:/
! }
! match (:m_text:) [:m_text: 0 :*:decision_length:] /.*/
  #
  # :b_text: is the text with base64's expanded.
  isolate (:b_text:) /:*:m_text:/

pretokenizer.crmSPAM(2)

Add the following to mailfilter.cf as 1 line:
:text_preprocessor: /\/home\/tosihisa\/\.crm114\/pretokenizer\.crm/
()


In mailfilter.cf, set do_base64 to no
(pretokenizer.crm)

URL
http://www.netfort.gr.jp/~tosihisa/crm114/

mailfilter2.crm
SPAM

mailfilter2.cf

pretokenizer.crm

CRM114

postfixclamav

procmailCRM114

Maildir

imap
PC

CRM114

CRM114

MLSPAM
ML

anyone can post ML

MLML
SPAM

CRM114(1)
()

CRM114(1)
(SPAM)

UNSURESPAM

SPAM
UNSURE)

CRM114




UNSURE
GOOD/SPAMSPAM
UNSURE

SPAM

SPAM


()


POP

POPFile

Classification results (chart data, 2007/10/06, 01:57:20)

First chart
Verdict   Count   Share
GOOD        335   97.10%
UNSURE       10    2.90%
SPAM          0    0.00%
Total       345

Second chart
Verdict   Count   Share
GOOD          1    0.03%
UNSURE      613   16.90%
SPAM       3014   83.08%
Total      3628
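The percentages in these results follow directly from the raw counts, for example 335 of 345 messages or 3014 of 3628 messages. A minimal sketch of that arithmetic is shown here; the function name share is hypothetical and a non-zero total is assumed.

```shell
#!/bin/sh
# share COUNT TOTAL
# Hypothetical helper: print COUNT as a percentage of TOTAL, two decimals.
# Assumes TOTAL is not zero.
share() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f%%\n", 100 * n / t }'
}

share 335 345     # prints 97.10%
share 10 345      # prints 2.90%
share 613 3628    # prints 16.90%
share 3014 3628   # prints 83.08%
```

The same call reproduces the remaining figure, 1 of 3628 messages, as 0.03%.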