Mail Filtering with Exim and Python

PyCon 2004 -- March 25, 2004

A.M. Kuchling

www.amk.ca

Exim (www.exim.org) is a mail transfer agent.

A mail transfer agent, or MTA, is the program responsible for sending outgoing mail, receiving incoming messages, queueing messages when connections are down, etc. Exim is an MTA; comparable programs include Sendmail, qmail, and Postfix.

Mail user agents, or MUAs, are the programs that users run to read and send mail, such as mutt, Eudora, Entourage, and Outlook. MUAs usually hand off messages to an MTA for transport.

Some of Exim's noteworthy features are:

elspy (elspy.sf.net, by Greg Ward) embeds Python inside Exim for filtering.

Exim supports an API hook (local_scan()) for filtering messages. elspy provides a local_scan() that initializes the Python interpreter and invokes Python code. On receiving a message, elspy will import the exim_local_scan module and run the local_scan() function in the module.

The Python code is given the headers and body of the message, and various information about the remote connection. Messages can be permanently or temporarily rejected by raising an exception. Another option is to add headers to messages before passing them on to Exim's usual delivery process. (Exim's API permits modifying existing headers, but no one has bothered to wrap this for elspy.)

Installing Elspy

Installing a new local_scan() requires recompiling Exim.

Caution: the 0.1.1 release of elspy has a bug; download elspy-0.1.2 from www.amk.ca/files/python instead.

The Python local_scan Interface

On receiving a message, Exim will import the exim_local_scan module and run the local_scan() function in it.

An example:

from elspy import RejectMessage

def local_scan (info, headers, fd):
    subject = headers.get('subject')
    if subject is not None:
        subject = subject.strip().lower()
        if subject.startswith('spam'):
            raise RejectMessage("obvious spam rejected")
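To try a filter like this without a running Exim, one option is to call it directly with stand-in arguments. The sketch below is hypothetical: it defines its own RejectMessage in place of elspy's, passes a plain dict as the headers, and uses None for info and fd, which is enough for a filter that only reads headers. It is written for Python 3.

```python
class RejectMessage(Exception):
    """Stand-in for elspy's RejectMessage so the filter runs without Exim."""


def local_scan(info, headers, fd):
    # Same rule as the example above, raising the stand-in exception.
    subject = headers.get('subject')
    if subject is not None:
        subject = subject.strip().lower()
        if subject.startswith('spam'):
            raise RejectMessage("obvious spam rejected")


def run_filter(scan, headers):
    """Call a local_scan-style function and report the outcome.

    Hypothetical helper: info and fd are passed as None, and a rejection
    is reported as ('reject', reason) instead of an SMTP error.
    """
    try:
        scan(None, headers, None)
    except RejectMessage as exc:
        return ('reject', str(exc))
    return ('accept', None)
```

Usage: run_filter(local_scan, {'subject': 'SPAM: buy now'}) reports a rejection, while a message without that prefix is accepted.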

Available Exceptions

The EximInfo instance

Contains information about the SMTP transaction and connection:

Examining Headers

Headers are provided as a sequence-like instance.

subject = headers.get('subject')
received = headers.get_all('received')
headers.add('X-Spam-Ranking', 'DEFINITELY')
log = open('/var/log/exim/spam', 'a')
headers.write(log)   # Writes header list to a file
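For testing outside Exim, a small stand-in offering the same four methods can be handy. The class below is a hypothetical sketch, not elspy's implementation: it assumes the headers are an ordered list of (name, value) pairs, that get() returns the first match, and that name lookups are case-insensitive, as header names are in RFC 2822. Python 3.

```python
import io


class Headers:
    """Hypothetical stand-in for elspy's sequence-like header object."""

    def __init__(self, pairs=()):
        self._pairs = list(pairs)          # ordered (name, value) pairs

    def __len__(self):
        return len(self._pairs)

    def __getitem__(self, index):
        return self._pairs[index]

    def get(self, name, default=None):
        # First header with a matching name, compared case-insensitively.
        for key, value in self._pairs:
            if key.lower() == name.lower():
                return value
        return default

    def get_all(self, name):
        return [value for key, value in self._pairs
                if key.lower() == name.lower()]

    def add(self, name, value):
        self._pairs.append((name, value))  # new headers go at the end

    def write(self, stream):
        for key, value in self._pairs:
            stream.write('%s: %s\n' % (key, value))


# Usage: mirrors the calls above, writing to an in-memory stream.
headers = Headers([('Subject', 'Hello'),
                   ('Received', 'from a'), ('Received', 'from b')])
headers.add('X-Spam-Ranking', 'DEFINITELY')
out = io.StringIO()
headers.write(out)
```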

Reading the Message Body

The fd parameter is a file descriptor positioned at the start of the message body.

import os

spam_score = 0
msg = os.fdopen(fd)
for line in msg:                  # read the body line by line
    if '419' in line:             # "419" advance-fee fraud
        spam_score += 1
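Wrapped as a function, the same idea can be exercised with an ordinary pipe. In this Python 3 sketch, score_body and its keyword list are hypothetical, and closefd=False is an assumption made so that closing the Python file object does not also close the descriptor that Exim handed over.

```python
import os


def score_body(fd, keywords=('419',)):
    """Count body lines containing any of the keywords (hypothetical rule)."""
    score = 0
    # closefd=False: leave the descriptor open for its owner.
    with os.fdopen(fd, 'r', errors='replace', closefd=False) as body:
        for line in body:
            if any(keyword in line for keyword in keywords):
                score += 1
    return score
```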

On to some examples...

Example: Rejecting based on subject

from email.Header import decode_header, make_header   # Decode RFC 2047 words
from elspy import AcceptMessage, RejectMessage

def local_scan (info, headers, fd):
    subject = headers.get('subject', '')
    subject = unicode(make_header(decode_header(subject)))
    lsubject = subject.lower()
    if lsubject.startswith('adv:'):
        raise RejectMessage("Spam not wanted here "
                            "(subject line includes 'ADV')")

There's actually a bug in this code, but it will likely only affect spam. 8-bit characters are forbidden in RFC 2822 e-mail headers, yet some spam messages contain raw 8-bit text (often in Chinese or Korean). When such a message arrives, the header decoding on the unicode(...) line raises an unexpected exception. Exim then returns a temporary failure, but most such spam is never retried. The MUAs used by real people usually encode their subject lines correctly, so legitimate mail is rarely affected.
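One way to sidestep the bug is to fall back to the raw header whenever decoding fails, so that malformed spam is still checked rather than temporarily rejected. This is a Python 3 sketch with a hypothetical helper name, safe_subject. In Python 3, email.header tolerates raw non-ASCII text, so the exceptions caught here come mainly from malformed encoded words (bad base64, unknown charsets).

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def safe_subject(raw):
    """Decode RFC 2047 encoded words in a Subject value.

    Falls back to the undecoded value when the header is malformed,
    instead of letting the exception propagate to Exim.
    """
    try:
        return str(make_header(decode_header(raw)))
    except (HeaderParseError, LookupError, UnicodeError):
        # Malformed encoded word or unknown charset: use the raw text.
        return raw
```

The filter would then compare safe_subject(subject).lower() against 'adv:'.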

Example: Rejecting based on addressee

We can look at info.recipients_list and check if any spamtrap addresses are present.

from elspy import RejectMessage

def match_recipients (recipient_list, local_parts, domains):
    """([str], str | [str], str | [str]) : [str]
    Checks whether the given recipient list contains any addresses
    that combine one of the local_parts with one of the domains.
    """
    # Body omitted on this slide; the full version follows.

def local_scan (info, headers, fd):
    if match_recipients(info.recipients_list,
                        'spamtrap', 'example.com'):
        raise RejectMessage("Spam not wanted here")

Full version of function:

import re
def match_recipients (recipient_list, local_parts, domains):
    """([str], str | [str], str | [str]) : [str]
    Checks whether the given recipient list contains any addresses
    that combine one of the local_parts with one of the domains.
    """
    if isinstance(local_parts, str):
        local_parts = [local_parts]
    if isinstance(domains, str):
        domains = [domains]

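    # Construct a regex pattern: (local1|local2)@(domain1|domain2),
    # escaping each part so that e.g. the '.' in a domain matches only a dot.
    pattern = ('^(' + '|'.join(map(re.escape, local_parts)) + ')@(' +
               '|'.join(map(re.escape, domains)) + ')$')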
    pattern = re.compile(pattern)

    matches = []
    for addr in recipient_list:
        if pattern.match(addr):
            matches.append(addr)
    return matches
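If regular expressions feel heavyweight here, the same check can be written with string splitting and set lookups. This Python 3 variant, match_recipients_nocase, is hypothetical; it also compares case-insensitively, which is an assumption worth checking for your site, since RFC 2821 treats domains as case-insensitive but says local parts must be treated as case-sensitive.

```python
def match_recipients_nocase(recipient_list, local_parts, domains):
    """Return the recipients of the form local@domain for any given pair.

    Hypothetical variant: no regexes, and case-insensitive comparison.
    """
    if isinstance(local_parts, str):
        local_parts = [local_parts]
    if isinstance(domains, str):
        domains = [domains]
    wanted_locals = {part.lower() for part in local_parts}
    wanted_domains = {domain.lower() for domain in domains}

    matches = []
    for addr in recipient_list:
        # rpartition splits at the last '@'; sep is '' if there is none.
        local, sep, domain = addr.rpartition('@')
        if (sep and local.lower() in wanted_locals
                and domain.lower() in wanted_domains):
            matches.append(addr)
    return matches
```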

A subtlety of the function:

Example: Rejecting virus messages

elspy includes a content filter that looks for executable attachments (using regexes, not full MIME).

from elspy import AcceptMessage, RejectMessage
from elspy import execontent_simple

def local_scan (info, headers, fd):
    # Reject messages with executable attachments
    # -- will raise a RejectMessage
    execontent_simple.local_scan(info, headers, fd)
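To give a flavour of the regex approach, here is a rough sketch that scans raw message text for attachment names ending in executable extensions. It is not elspy's execontent_simple code; the pattern, the function name and the extension list are assumptions for illustration, and like any non-MIME scan it will miss encoded filenames and can match ordinary text. Python 3.

```python
import re

# Hypothetical list of extensions treated as executable.
EXE_EXTENSIONS = ('exe', 'com', 'pif', 'scr', 'bat', 'vbs')

# Matches name=... or filename=..., optionally quoted, ending in one of
# the extensions; \b stops ".com" from matching inside ".company".
_EXE_NAME = re.compile(
    r'(?:file)?name\s*=\s*"?([^"\r\n;]+\.(?:%s))\b"?'
    % '|'.join(EXE_EXTENSIONS),
    re.IGNORECASE)


def find_executable_names(body_text):
    """Return the attachment names that look executable."""
    return _EXE_NAME.findall(body_text)
```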

Example: SpamAssassin

There's also support for using SpamAssassin running as a separate daemon, rejecting messages whose score reaches a threshold and marking the others.

from elspy import AcceptMessage, RejectMessage
from elspy import spamassassin

# Messages that score this or more will be rejected outright.
spamassassin.REJECT_THRESHOLD = 12.0

def local_scan (info, headers, fd):
    spamassassin.local_scan(info, headers, fd)
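The reject-or-mark policy reduces to comparing SpamAssassin's score against two thresholds. The Python 3 sketch below is hypothetical: the function name and the marking threshold of 5.0 (SpamAssassin's usual default for flagging a message as spam) are assumptions, and it only returns a decision rather than talking to the spamd daemon.

```python
REJECT_THRESHOLD = 12.0   # same value the slide sets on elspy's module
MARK_THRESHOLD = 5.0      # assumed: SpamAssassin's usual default


def spam_action(score, reject_threshold=REJECT_THRESHOLD,
                mark_threshold=MARK_THRESHOLD):
    """Decide what to do with a message given its SpamAssassin score."""
    if score >= reject_threshold:
        return 'reject'   # refuse the message outright
    if score >= mark_threshold:
        return 'mark'     # accept, but add headers flagging it as spam
    return 'accept'
```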

Example: SPF

Sender Permitted From (SPF) is a recent proposal to make it more difficult to forge e-mail from arbitrary addresses.

The example below uses Terence Way's PySPF (http://www.wayforward.net/spf/) to perform the SPF checking.

import spf
from elspy import RejectMessage

def local_scan (info, headers, fd):
    response, smtp_code, explanation = \
      spf.check(info.sender_host_address,
                info.sender_address,
                info.sender_host_name)

    # Response is one of 'pass', 'deny', 'unknown', 'error'
    if response == 'deny':
        raise RejectMessage(explanation)
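For a feel of what an SPF check does, the sketch below evaluates only the ip4:, ip6: and all mechanisms of an already-fetched SPF record against the client's IP address. It is a deliberate simplification with a hypothetical function name: real checkers such as PySPF also look the record up in DNS and handle a, mx, include, redirect and more. Result names follow RFC 7208 ('fail' where the example above tests for 'deny'). Python 3.

```python
import ipaddress

# Qualifier prefix -> result name (RFC 7208 terminology).
QUALIFIERS = {'+': 'pass', '-': 'fail', '~': 'softfail', '?': 'neutral'}


def evaluate_spf(record, client_ip):
    """Evaluate ip4:, ip6: and all mechanisms of an SPF record.

    Hypothetical simplification; other mechanisms are skipped.
    """
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != 'v=spf1':
        return 'none'                      # not an SPF record
    for term in terms[1:]:
        qualifier = '+'                    # default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(':')
        name = name.lower()
        if name == 'all':
            return QUALIFIERS[qualifier]
        if name in ('ip4', 'ip6'):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
    return 'neutral'                       # no mechanism matched
```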

Comparison: Sendmail's milter interface

This table compares elspy with the Python Milter (http://www.bmsi.com/python/milter.html).

Sendmail + milter                                       | elspy
--------------------------------------------------------+------------------------------------------------------------
Runs under a different user ID                          | Runs under Exim's user ID (probably mail:mail)
Runs in a separate process                              | Runs in the Exim process
Imports done once, on process startup                   | Imports done once per Exim process (~ once per message)
Can modify recipients, headers and message body         | More limited: can change the recipient list and add headers
Can insert processing after HELO, MAIL FROM, DATA, etc. | Processing is only done after DATA

Short summary: elspy is simpler, but this means it's less powerful and likely less scalable at higher workloads.

Questions, comments?

To download:

These slides: www.amk.ca/talks/elspy