Delivery Errors and Retrying

An MTA has to be prepared to hold messages when hosts are down or temporarily unavailable, so rules are required for deciding how often to retry and when to give up.

Delivering a message costs resources, so it is a good idea not to retry too often. Exim uses host-based retrying (strictly, it is keyed on the IP address rather than the hostname): if delivery to a host fails temporarily, all messages that are routed to that host are delayed until its next retry time arrives. Information about temporary delivery failures is kept in a hints database called retry, which you can read with the utilities exinext or exim_dumpdb. The information in the database includes details of the error, the time of the first failure, the time of the most recent failure, and the time before which it is not reasonable to try again.
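To make the bookkeeping concrete, here is a minimal Python sketch of what such a record holds and how it delays later attempts. It is only an illustration: the names RetryRecord, hints, record_failure and may_attempt are hypothetical, and the in-memory dictionary stands in for the hints database, whose real on-disk format is not shown here.

```python
from dataclasses import dataclass


@dataclass
class RetryRecord:
    error: str          # details of the error
    first_failed: int   # time of the first failure (epoch seconds)
    last_failed: int    # time of the most recent failure
    next_try: int       # time before which another attempt is not reasonable


# Hypothetical in-memory stand-in for the "retry" hints database, keyed by IP address.
hints: dict[str, RetryRecord] = {}


def record_failure(ip: str, error: str, now: int, delay: int) -> None:
    """Store or update the record after a temporary failure."""
    previous = hints.get(ip)
    first = previous.first_failed if previous else now
    hints[ip] = RetryRecord(error, first, now, now + delay)


def may_attempt(ip: str, now: int) -> bool:
    """A host with no record, or whose retry time has arrived, may be tried."""
    record = hints.get(ip)
    return record is None or now >= record.next_try
```

Every message routed to the same IP address consults the same record, which is why one failure delays all of them.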

Remote Delivery Errors

A number of host errors can occur.

When a permanent SMTP error code (5xx) is given at the start of a connection, all the addresses that are routed to the host are failed and returned to the sender in a bounce message. Temporary errors, by contrast, cause all messages to that host to be deferred and not retried until its retry time has passed.

Local Delivery Errors

Two of the most common local errors are

The retry times are the same as for the remote delivery errors above, but the retry delays apply only to deliveries made during queue runs.

Routing Errors

Common routing errors can be

Retry processing applies to routing an address as well as to transporting a message, but only for delivery processes started by queue runs. The retry rules make no distinction between routing and transporting a message.

Retry Rules

The retry rules are contained in a separate section of the configuration file, which starts at the line "begin retry". Each rule occupies one line and consists of three parts: a pattern, an error name, and a list of retry parameters.

retry rule example

# a pattern; an error name; list of retry parameters

# domain   error      retries
# --------------------------------------------------------------------
*          *          F,2h,15m;        G,16h,1h,1.5;          F,4d,6h;

Note: the rule above is a catch-all rule (note the two leading * fields) and reads as follows:

*               = for all domains
*               = for all errors
F,2h,15m;       = try every 15 mins for 2 hours (F means fixed time intervals)
G,16h,1h,1.5;   = Then start with a 1 hour interval, multiplying it by 1.5 after each attempt, until 16 hours (G means geometrically increasing intervals)
F,4d,6h;        = Then try every 6 hours up to 4 days (F means fixed time intervals)
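The arithmetic behind such a rule can be worked through with a short Python sketch. This is a simplified model based only on the description above, not Exim's actual code: it assumes every attempt happens exactly when due, treats each cutoff as time since the first failure, and ignores queue runner timing. The names Phase, parse_time, parse_retries and schedule are made up for the illustration.

```python
from dataclasses import dataclass

# Unit suffixes as used in the rules above (s, m, h, d, w).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


@dataclass
class Phase:
    kind: str             # "F" (fixed) or "G" (geometrically increasing)
    cutoff: int           # seconds since the first failure at which the phase ends
    interval: int         # (starting) gap between attempts, in seconds
    factor: float = 1.0   # multiplier applied after each attempt in a "G" phase


def parse_time(text: str) -> int:
    """Convert a value such as '15m' or '4d' to seconds."""
    text = text.strip()
    return int(text[:-1]) * _UNITS[text[-1]]


def parse_retries(spec: str) -> list[Phase]:
    """Parse 'F,2h,15m; G,16h,1h,1.5; F,4d,6h' into a list of phases."""
    phases = []
    for chunk in spec.split(";"):
        fields = [f.strip() for f in chunk.split(",")]
        if fields == [""]:
            continue  # tolerate a trailing semicolon
        kind = fields[0]
        factor = float(fields[3]) if kind == "G" else 1.0
        phases.append(Phase(kind, parse_time(fields[1]), parse_time(fields[2]), factor))
    return phases


def schedule(spec: str) -> list[int]:
    """Return attempt times in seconds after the first failure (simplified model)."""
    now, times = 0, []
    for phase in parse_retries(spec):
        gap = phase.interval
        while now < phase.cutoff:
            now += round(gap)
            times.append(now)
            gap *= phase.factor  # stays constant for "F" phases (factor 1.0)
    return times
```

Calling schedule("F,2h,15m; G,16h,1h,1.5; F,4d,6h") in this model gives eight attempts at 15 minute steps up to the 2 hour mark, then one an hour later, then gaps that grow by 1.5 each time until the 16 hour cutoff is passed, and finally 6 hour gaps until 4 days have gone by.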

Exim searches the rules in order until one matches; there is normally a catch-all rule (see above). If no rule can be found, the temporary error is converted to a permanent error and the address is bounced after the first delivery attempt. The retry parameters are used in turn; once all of them have been used up, the error is likewise converted to a permanent error and the message is bounced. There is an option called retry_interval_max (default 24h) that caps the retry interval, so a failing address is retried at least once a day; this option prevents you from generating enormously long retry intervals.
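The first-match search can be sketched in Python as follows. This is a deliberately reduced model and not how Exim implements it: domain patterns are treated as shell-style wildcards, an error field matches only when it is * or exactly equal to the error, and the RULES list and find_rule function are hypothetical names chosen for the example.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (domain pattern, error name, retry parameters) in configuration order.
RULES = [
    ("lookingglass.example", "*", "F,24h,30m"),
    ("*", "refused_A", "F,12h,20m"),
    ("*", "*", "F,2h,15m; G,16h,1h,1.5; F,4d,6h"),
]


def find_rule(domain: str, error: str, rules=RULES) -> Optional[str]:
    """Return the retry parameters of the first matching rule, or None."""
    for pattern, rule_error, retries in rules:
        domain_ok = fnmatchcase(domain.lower(), pattern.lower())
        error_ok = rule_error == "*" or rule_error == error
        if domain_ok and error_ok:
            return retries
    return None  # no rule: the temporary error is treated as permanent
```

Because the search stops at the first match, order matters: a catch-all rule placed first would hide every rule after it, and leaving it out makes find_rule return None for anything the specific rules do not cover.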

The domain pattern can use wildcards, e.g. *.datadisk.co.uk; you can also use regular expressions and several forms of lookup.

There are a number of error field values that you can use

Error                 Meaning
auth_failed           Authentication failed
data_4xx              A 4xx error was received for a DATA command
lost_connection       The connection closed unexpectedly
mail_4xx              A 4xx error was received for a MAIL command
quota                 Quota exceeded in local delivery
quota_<time>          Quota exceeded in local delivery, and the mailbox has not been read for <time>
rcpt_4xx              A 4xx error was received for a RCPT command
refused_MX            Connection refused: host obtained from an MX record
refused_A             Connection refused: host not obtained from an MX record
refused               Any connection refusal
timeout_connect_MX    Connection timed out: host obtained from an MX record
timeout_connect_A     Connection timed out: host not obtained from an MX record
timeout_connect       Any connection timeout
timeout_MX            Any timeout for a host obtained from an MX record
timeout_A             Any timeout for a host not obtained from an MX record
timeout               Any timeout
tls_required          A TLS session could not be set up when required

The times specified are hints, not promises: Exim will try its best to honor them, but they will not be exact. Also, if your queue runner process only runs every 15 minutes, it makes little sense to specify a retry time of 5 minutes; in other words, don't set a retry interval shorter than the queue runner interval.
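A tiny Python sketch shows why a short retry time is wasted. It assumes, purely for illustration, that queue runs happen at exact multiples of the queue runner interval counted from the failure; effective_attempt is a made-up helper name.

```python
def effective_attempt(due: int, queue_interval: int) -> int:
    """Round a retry time up to the next queue run (all values in seconds)."""
    runs_needed = -(-due // queue_interval)  # ceiling division
    return runs_needed * queue_interval
```

With a 15 minute queue runner, effective_attempt(300, 900) and effective_attempt(900, 900) both come out at 900 seconds, so under this assumption a 5 minute retry time behaves just like a 15 minute one.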

More retry rule examples

alice@wonderland.example   quota      F,7d,3h
wonderland.example         quota_5d
wonderland.example         *          F,1h,15m; G,2d,1h,2;
lookingglass.example       *          F,24h,30m;
*                          refused_A  F,12h,20m;
*                          *          F,2h,15m; G,16h,1h,1.5; F,4d,6h;

Note: I will leave you to figure these out

Delivery of certain messages can keep failing for a long period, for instance because the message has several hosts it can be delivered to (multiple MX records for the same domain). It is possible to have different rules for each of them, for example

message with different MX records

# suppose the domain tweedledum.example is routed by MX records to both tweedledum.example and
# tweedledee.example

tweedledum.example  *  F,1d,30m;     ## the first route for a message as per the MX record
tweedledee.example  *  F,5d,2h;      ## the second route for a message as per the MX record

Note: a message may have two routes for delivery (as above); the address will only time out when the retry times for all routes have passed
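The note above amounts to an "all routes exhausted" test, which can be written out in a few lines of Python. This is only a sketch of the idea as stated here: it assumes a single elapsed time since the first failure for the address, and the names cutoffs and address_timed_out are hypothetical.

```python
DAY = 86400

# Cutoff per host, taken from the two rules above (1d and 5d).
cutoffs = {"tweedledum.example": 1 * DAY, "tweedledee.example": 5 * DAY}


def address_timed_out(elapsed: int, host_cutoffs: dict[str, int]) -> bool:
    """True once the time since first failure has reached every host's cutoff."""
    return all(elapsed >= cutoff for cutoff in host_cutoffs.values())
```

After two days the first route has run out of retries but the second has not, so the address is still kept; only at the five day mark does the check succeed.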

I have not documented dial-up connections; you may want to pop over to the official Exim web site for more information on dial-ups.