Mail Filtering with Exim and Python
PyCon 2004 -- March 25, 2004
A.M. Kuchling
Exim (www.exim.org) is a
mail transfer agent.
A mail transfer agent, or MTA, is the program responsible for
sending outgoing mail, receiving incoming messages, queueing messages
when connections are down, etc. Exim is an MTA; comparable programs
include Sendmail, qmail, and Postfix.
Mail user agents, or MUAs, are
the programs that users run to read and send mail, such as mutt,
Eudora, Entourage, Outlook, etc. MUAs usually hand off messages to an
MTA for transport.
Some of Exim's noteworthy features are:
- Supports SMTP, SMTP-over-TLS, authenticated SMTP, retrying, etc.
- Has a logical and readable configuration file. Sendmail is a very commonly used MTA, but its configuration is legendarily complicated.
- Licensed under the GPL.
elspy (elspy.sf.net, by Greg Ward) embeds Python inside Exim for
filtering.
Exim supports an API hook (local_scan()) for filtering
messages. elspy provides a local_scan() that initializes the
Python interpreter and invokes Python code.
On receiving a message, elspy will import the exim_local_scan
module and run the local_scan() function in the module.
The Python code is given the headers and body of the message, and
various information about the remote connection. Messages can be
permanently or temporarily rejected by raising an exception. Another
option is to add headers to messages before passing them on to Exim's
usual delivery process. (Exim's API permits modifying existing
headers, but no one has bothered to wrap this for elspy.)
Installing Elspy
Installing a new local_scan() requires recompiling Exim.
Caution: the 0.1.1 release of elspy has a bug... download
elspy-0.1.2 from www.amk.ca/files/python instead.
- Untar Exim and elspy.
- Install the elspy Python modules with "python setup.py install"
- Configure Exim: copy a template to Local/Makefile and edit the copy.
- Copy elspy.c to Exim's Local/ directory.
- Edit Local/Makefile to compile and link elspy.c into Exim.
- Run "make install"
The Python local_scan Interface
On receiving a message, Exim will import the exim_local_scan
module and run the local_scan() function in it.
An example:
from elspy import RejectMessage

def local_scan (info, headers, fd):
    subject = headers.get('subject')
    if subject is not None:
        subject = subject.strip().lower()
        if subject.startswith('spam'):
            raise RejectMessage("obvious spam rejected")
Available Exceptions
- RejectMessage( [message] ) -- rejects message outright
- TempRejectMessage( [message] ) -- temporarily rejects message
  - most real senders will retry after a temporary rejection
  - many viruses and spammers won't.
- AcceptMessage( [message] ) -- message is delivered
  - or you can just let local_scan return normally
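The three outcomes can be tried outside Exim with a small harness. This is only a sketch: the exception classes below are stand-ins so the code runs without elspy installed, run_scan() is a hypothetical helper, and a plain dict stands in for the headers object. The 'defer' result for unexpected exceptions follows the behaviour described later in these notes (Exim returns a temporary failure).

```python
# Stand-in exception classes so the sketch runs without elspy installed.
# Under Exim the real ones would come from: from elspy import ...
class RejectMessage(Exception):
    pass

class TempRejectMessage(Exception):
    pass

class AcceptMessage(Exception):
    pass

def run_scan(scan, info, headers, fd):
    """Call a local_scan-style function; return (outcome, message)."""
    try:
        scan(info, headers, fd)
    except RejectMessage as exc:
        return ('reject', str(exc))
    except TempRejectMessage as exc:
        return ('defer', str(exc))
    except AcceptMessage as exc:
        return ('accept', str(exc))
    except Exception as exc:
        # An unexpected exception ends up as a temporary failure.
        return ('defer', 'unexpected error: %s' % exc)
    # Returning normally also accepts the message.
    return ('accept', '')

def local_scan(info, headers, fd):
    subject = headers.get('subject', '')
    if subject.strip().lower().startswith('spam'):
        raise RejectMessage("obvious spam rejected")

# A plain dict is enough here because only headers.get() is used.
spam_result = run_scan(local_scan, None, {'subject': 'SPAM for you'}, None)
ham_result = run_scan(local_scan, None, {'subject': 'Lunch?'}, None)
print(spam_result)
print(ham_result)
```

A harness like this makes it possible to exercise filter logic before recompiling Exim.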
The EximInfo instance
Contains information about the SMTP transaction and connection:
- Envelope sender address.
- IP address of remote host.
- List of recipients (can be modified)
Examining Headers
Headers are provided as a sequence-like instance.
subject = headers.get('subject')
received = headers.get_all('received')
headers.add('X-Spam-Ranking', 'DEFINITELY')
log = open('/var/log/exim/spam', 'a')
headers.write(log) # Writes header list to a file
Reading the Message Body
The fd parameter is a file descriptor positioned at
the start of the message body.
import os

spam_score = 0
msg = os.fdopen(fd)
for line in msg.readlines():
    if '419' in line:
        spam_score += 1
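To try body scanning without a running Exim, a temporary file can supply the descriptor. This is a sketch written for a current Python 3 interpreter; count_matching_lines() is a hypothetical helper, and passing closefd=False (so the caller's descriptor stays open) is an assumption about what is safest, not something elspy documents.

```python
import os
import tempfile

def count_matching_lines(fd, needle):
    """Count body lines containing needle, reading from descriptor fd."""
    count = 0
    # errors='replace' keeps stray 8-bit bytes from raising an exception;
    # closefd=False leaves the descriptor open for whoever owns it.
    with os.fdopen(fd, 'r', encoding='ascii', errors='replace',
                   closefd=False) as body:
        for line in body:
            if needle in line:
                count += 1
    return count

# Simulate what local_scan() receives: a descriptor positioned at the
# start of the message body.
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"Dear friend,\nThis is a 419 offer.\nRegards\n")
    tmp.seek(0)
    hits = count_matching_lines(tmp.fileno(), '419')

print(hits)
```

Iterating over the file object instead of calling readlines() avoids holding a large message body in memory at once.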
On to some examples...
Example: Rejecting based on subject
from email.Header import Header # Need to decode quoted-printable
from elspy import AcceptMessage, RejectMessage

def local_scan (info, headers, fd):
    subject = headers.get('subject', '')
    subject = unicode(Header(subject))
    lsubject = subject.lower()
    if lsubject.startswith('adv:'):
        raise RejectMessage("Spam not wanted here "
                            "(subject line includes 'ADV')")
There's actually a bug in this code, but the bug will likely only
affect spam messages. 8-bit characters are forbidden in RFC 2822
e-mail headers, but some spam messages contain 8-bit text (often ones
written in Chinese or Korean). If such a message is received, the
unicode() call will raise an unexpected exception. Exim
will then return a temporary failure, but most such spam messages
aren't tried again. MUAs in use by actual users usually get the
subject-line quoting correct.
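One way to avoid the unexpected exception is to decode the subject defensively. The sketch below is not part of elspy and is written for a current Python 3 interpreter, where the module is spelled email.header; safe_subject() is a hypothetical helper, and replacing undecodable bytes (rather than rejecting the message) is just one possible policy.

```python
from email.header import decode_header, make_header

def safe_subject(raw):
    """Return the decoded, lowercased subject without ever raising."""
    if isinstance(raw, bytes):
        # Raw 8-bit bytes are not legal in an RFC 2822 header;
        # replace them instead of letting the decode fail.
        raw = raw.decode('ascii', 'replace')
    try:
        return str(make_header(decode_header(raw))).lower()
    except (UnicodeError, LookupError):
        # Malformed encoded-word or unknown charset: use the raw text.
        return raw.lower()

print(safe_subject('=?iso-8859-1?q?ADV=3A_buy_now?='))
print(safe_subject(b'ADV: \xc4\xe3\xba\xc3').startswith('adv:'))
```

With this helper the 'adv:' test still runs on badly formed subjects, so the message gets a deliberate answer instead of an accidental temporary failure.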
Example: Rejecting based on addressee
We can look at info.recipients_list and check if any spamtrap
addresses are present.
def match_recipients (recipient_list, local_parts, domains):
    """([str], str | [str], str | [str]) : [str]
    Checks whether the given recipient list contains any addresses
    that combine one of the local_parts with one of the domains.
    """
    # body of function deleted --
    # see the full function in the slides on my web page

def local_scan (info, headers, fd):
    if match_recipients(info.recipients_list,
                        'spamtrap', 'example.com'):
        raise RejectMessage("Spam not wanted here")
Full version of function:
import re

def match_recipients (recipient_list, local_parts, domains):
    """([str], str | [str], str | [str]) : [str]
    Checks whether the given recipient list contains any addresses
    that combine one of the local_parts with one of the domains.
    """
    if isinstance(local_parts, str):
        local_parts = [local_parts]
    if isinstance(domains, str):
        domains = [domains]

    # Construct a regex pattern: (local1|local2)@(domain1|domain2)
    # re.escape() keeps characters such as '.' from acting as wildcards.
    pattern = ('^(' + '|'.join(map(re.escape, local_parts)) + ')@(' +
               '|'.join(map(re.escape, domains)) + ')$')
    pattern = re.compile(pattern)

    matches = []
    for addr in recipient_list:
        if pattern.match(addr):
            matches.append(addr)
    return matches
A subtlety of the function:
- a message can be addressed to multiple recipients
- if the example rejects a message, the message is rejected
for all recipients and won't be delivered to any of them.
- alternatively, you can mutate info.recipients_list to change the
recipients.
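The alternative in the last bullet, dropping only the unwanted recipients, might look like the sketch below. FakeInfo is a hypothetical stand-in for the EximInfo instance (only recipients_list is modelled) and drop_recipients() is an illustrative helper, not an elspy function. It assumes the list must be changed in place rather than rebound to a new list.

```python
class FakeInfo:
    """Hypothetical stand-in for EximInfo; models recipients_list only."""
    def __init__(self, recipients):
        self.recipients_list = list(recipients)

def drop_recipients(info, unwanted):
    """Remove unwanted addresses from info.recipients_list in place.

    Returns the list of addresses that were removed.
    """
    unwanted = set(addr.lower() for addr in unwanted)
    removed = [a for a in info.recipients_list if a.lower() in unwanted]
    # Slice assignment mutates the existing list object.
    info.recipients_list[:] = [a for a in info.recipients_list
                               if a.lower() not in unwanted]
    return removed

info = FakeInfo(['spamtrap@example.com', 'amk@example.com'])
removed = drop_recipients(info, ['spamtrap@example.com'])
print(removed)
print(info.recipients_list)
```

The message is then still delivered to the remaining recipients instead of being rejected for everyone.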
Example: Rejecting virus messages
elspy includes a content filter that looks for
executable attachments (using regexes, not full MIME).
from elspy import AcceptMessage, RejectMessage
from elspy import execontent_simple

def local_scan (info, headers, fd):
    # Reject messages with executable attachments
    # -- will raise a RejectMessage
    execontent_simple.local_scan(info, headers, fd)
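To give a feel for the regex approach, here is a minimal sketch of spotting executable attachment names in MIME part headers. It is not the code in execontent_simple: the pattern, the extension list, and the function name has_executable_attachment() are all assumptions made for illustration.

```python
import re

# Matches name= or filename= parameters ending in a risky extension.
# The extension list is illustrative, not elspy's.
EXECUTABLE_NAME = re.compile(
    r'(?:file)?name\s*=\s*"?[^"\r\n]*\.(?:exe|scr|pif|bat|com|vbs)"?\s*$',
    re.IGNORECASE)

def has_executable_attachment(lines):
    """Return True if any line names an attachment with a risky extension."""
    return any(EXECUTABLE_NAME.search(line) for line in lines)

suspicious = ['Content-Type: application/octet-stream',
              'Content-Disposition: attachment; filename="photo.scr"']
harmless = ['Content-Disposition: attachment; filename="notes.txt"']

print(has_executable_attachment(suspicious))
print(has_executable_attachment(harmless))
```

A line-oriented regex like this is cheap, but without full MIME parsing it can miss names that are folded across lines or encoded.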
Example: SpamAssassin
There's also support for using SpamAssassin running
as a separate daemon, rejecting certain messages and marking others.
- X-Spam-Status: {Yes,No}, hits=7.8
- X-Spam-Level: ******* (regex-friendly)
- X-Spam-Flag: YES if it's spammy, but not enough to be rejected.
from elspy import AcceptMessage, RejectMessage
from elspy import spamassassin

# Messages that score this or more will be rejected outright.
spamassassin.REJECT_THRESHOLD = 12.0

def local_scan (info, headers, fd):
    spamassassin.local_scan(info, headers, fd)
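The reject-or-mark decision can be illustrated with a few lines that parse an X-Spam-Status value. This is a sketch of the idea only, not elspy's spamassassin module: classify() and its 'reject' / 'mark' / 'accept' labels are hypothetical, and the header format is taken from the example above.

```python
import re

REJECT_THRESHOLD = 12.0
HITS = re.compile(r'hits=(-?\d+(?:\.\d+)?)')

def classify(status, reject_threshold=REJECT_THRESHOLD):
    """Map an X-Spam-Status value such as 'Yes, hits=7.8' to an action."""
    match = HITS.search(status)
    if match is None:
        return 'unknown'
    score = float(match.group(1))
    if score >= reject_threshold:
        return 'reject'    # scores this high are refused outright
    if status.strip().lower().startswith('yes'):
        return 'mark'      # spammy, but not enough to be rejected
    return 'accept'

print(classify('Yes, hits=7.8'))
print(classify('Yes, hits=15.2'))
print(classify('No, hits=0.3'))
```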
Example: SPF
Sender-Permitted-From is a recent proposal to make it more difficult
to forge e-mails from arbitrary addresses.
The example below uses Terence Way's PySPF (http://www.wayforward.net/spf/) to perform the SPF checking.
import spf
from elspy import RejectMessage

def local_scan (info, headers, fd):
    response, smtp_code, explanation = \
        spf.check(info.sender_host_address,
                  info.sender_address,
                  info.sender_host_name)
    # Response is one of 'pass', 'deny', 'unknown', 'error'
    if response == 'deny':
        raise RejectMessage(explanation)
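The example only acts on 'deny'. A slightly fuller policy could also treat 'error' as a temporary failure so that the sender retries later. The sketch below shows that policy in isolation; the exception classes are stand-ins so it runs without elspy, act_on_spf() is a hypothetical helper, and mapping 'error' to a temporary rejection is an assumption, not something PySPF or elspy prescribes.

```python
class RejectMessage(Exception):
    """Stand-in for elspy.RejectMessage."""

class TempRejectMessage(Exception):
    """Stand-in for elspy.TempRejectMessage."""

def act_on_spf(response, explanation):
    """Raise for 'deny' and 'error'; let 'pass' and 'unknown' through."""
    if response == 'deny':
        raise RejectMessage(explanation)
    if response == 'error':
        # Assumed policy: ask the sender to retry later.
        raise TempRejectMessage(explanation)
    return response

try:
    act_on_spf('deny', 'SPF check failed')
except RejectMessage as exc:
    outcome = 'rejected: %s' % exc

print(outcome)
print(act_on_spf('unknown', ''))
```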
Comparison: Sendmail's milter interface
Sendmail + milter | elspy |
Runs under different user ID | Runs under Exim's user ID (probably mail:mail) |
Runs in separate process | Runs in the Exim process. |
Imports done once, on process startup | Imports done once per Exim process (~ once per message) |
Can modify recipients, headers and message body | More limited: can change the recipient list and add headers |
Can insert processing at various points in SMTP operation: after HELO, after MAIL FROM, after DATA, etc. | Processing is only done after DATA |
Short summary: elspy is simpler, but this means it's less powerful
and likely less scalable at higher workloads.
This table compares elspy with the Python Milter (http://www.bmsi.com/python/milter.html).