Host a Backup Relay Mail Server on OpenBSD

WIP!

Most of this site is incomplete, and the current state is available as an open draft. Most of the text here is likely incomplete, misinformed, or just plain wrong. I'm looking for feedback on my website.

To anyone who wants to send me feedback, thank you, and shoot me an email!

Very rarely I have uptime issues with my mail server. SMTP is a fairly resilient protocol and you should generally receive your mail even if your server has been down for up to a day.

However, longer outages risk your mail expiring in the sender's queue, at which point it is bounced back to the sender or simply dropped. In the three years I’ve hosted an email server (both Postfix and OpenSMTPD), it has had only one service outage, lasting multiple days, when a badly configured nginx server clogged my machine with error logs. After purging the logs, fixing the config and upsizing the disk space, I found old mail trickling in over the next couple of weeks.

I could have hosted a dedicated backup server that would cache envelopes and relay them to the main server. Although I would find the ~$5/mo cost a bit much for a personal site, it would be profitable for machines with a much larger userbase.

smtpd’s config action relay backup transforms the daemon from a standard MX to a backup mail exchanger, relaying envelopes to any sibling exchanger with a higher priority.

This tutorial will show how to set up one or more backup MX’s that will queue up messages in case the main MX catches fire.

Preparation

I assume you followed through the main tutorial or already have an MX somewhere else. A couple notes about this setup:

This is mainly to make the backup server’s job much simpler. Without owning a mailbox, it doesn’t have to run dovecot, rspamd, or redis. It doesn’t have to manage DKIM or DH keys, or sync mailboxes or credentials with the main server.

This means hosting a backup server has fewer moving parts and can run on a slimmer machine than the main server. Conversely, it also means this won’t protect against data loss from the main server burning down, and only exists to preserve uptime.

For this tutorial, we’ll assume we’ve already set up our main server mail.example.org, and want to set up a backup server mx2.example.org.

Setup DNS

Assuming you’ve already set up DNS for the main server, there’s only one MX record you’d need to add. I’ll pull over the MX record from the main tutorial for comparison:

example.org.                    MX      "mail.example.org"    10
example.org.                    MX      "mx2.example.org"     100

Notice that the record for mx2.example.org has a higher priority number? In MX records, a lower priority value is prioritized first, so mail exchangers will attempt to send envelopes to the main MX before falling back to mx2.
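
To make the ordering concrete, here is a minimal sketch that sorts MX answers the way a sending server would walk them. The function name mx_order is made up for illustration, and the sample input is just the two records above, written in the "preference hostname" shape that `dig +short MX` prints.

```shell
#!/bin/sh
# mx_order (hypothetical helper): read "preference hostname" pairs on stdin
# and print them lowest preference value first, i.e. in the order a sender
# would try them.
mx_order() {
	sort -n -k1,1
}

# Sample data matching the two records above. On a live system the input
# would come from: dig +short MX example.org
printf '%s\n' '100 mx2.example.org.' '10 mail.example.org.' | mx_order
```

The main MX should come out on the first line, with mx2 only listed after it.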

Setup SSL

Like the main server, set up acme-client and httpd to accept challenges from Let’s Encrypt and renew the TLS certificate.

For /etc/acme-client.conf:

authority letsencrypt {
	api url "https://acme-v02.api.letsencrypt.org/directory"
	account key "/etc/acme/letsencrypt-privkey.pem"
}

domain mx2.example.org {
	domain key "/etc/ssl/private/mx2.example.org.key"
	domain certificate "/etc/ssl/mx2.example.org.cert"
	domain full chain certificate "/etc/ssl/mx2.example.org.fullchain.pem"
	sign with letsencrypt
}

For /etc/httpd.conf:

server "mx2.example.org" {
	listen on egress port http

	block drop

	location found "/.well-known/acme-challenge/*" {
		root "/acme"
		request strip 2
		pass
	}
}

Check the config, enable httpd, and generate the cert:

$ doas httpd -nf /etc/httpd.conf
configuration OK
$ doas acme-client -nf /etc/acme-client.conf
$ doas rcctl enable httpd
$ doas rcctl restart httpd
httpd(ok)
$ doas acme-client -v mx2.example.org
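
For renewals, something like the following could be run periodically from root's crontab once smtpd is set up. This is only a sketch: renew_cert is a made-up function name, the ACME_CLIENT and RCCTL variables are assumptions added so the logic can be dry-run without touching the real tools, and it relies on acme-client(1) exiting 0 when the certificate files changed, 2 when they are still up to date, and 1 on failure.

```shell
#!/bin/sh
# renew_cert DOMAIN (hypothetical helper): renew the certificate and restart
# smtpd only when acme-client reports that the files changed.
# ACME_CLIENT and RCCTL default to the real tools; overriding them is only
# meant for dry runs.
renew_cert() {
	domain=$1
	status=0
	${ACME_CLIENT:-acme-client} "$domain" || status=$?
	case $status in
	0)	# certificate files changed, so smtpd needs to pick them up
		${RCCTL:-rcctl} restart smtpd ;;
	2)	# certificate still up to date, nothing to do
		return 0 ;;
	*)	echo "acme-client failed for $domain (exit $status)" >&2
		return 1 ;;
	esac
}
```

A dry run such as `ACME_CLIENT=true RCCTL=echo renew_cert mx2.example.org` prints the restart command instead of running it.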

Setup Smtpd

For /etc/mail/smtpd.conf:

pki mx2.example.org cert "/etc/ssl/mx2.example.org.fullchain.pem"
pki mx2.example.org key "/etc/ssl/private/mx2.example.org.key"

table aliases file:/etc/mail/aliases
table domains { example.org }

# Just like before, we check for and filter out dyndns, rdns, and fcrdns.
# However for simplicity's sake, we leave anti-spam logic by rspamd for
# the main server.
filter "no_dyndns" phase connect match rdns regex { '.*\.dyn\..*', '.*\.dsl\..*' } \
	disconnect "550 no residential connections"
filter "no_rdns" phase connect match !rdns \
	disconnect "550 mailserver failed rDNS check"
filter "no_fcrdns" phase connect match !fcrdns \
	disconnect "550 mailserver failed FCrDNS check"

filter incoming chain { "no_dyndns", "no_rdns", "no_fcrdns" }

# ---Incoming Mail---

listen on egress	tls pki mx2.example.org auth-optional filter incoming

action "local_mail" relay backup

match from any for domain <domains> action "local_mail"

# ---Outgoing Mail---
# We'll still keep smtpd's default behavior of unix users
# delivering mail between other users

listen on socket

action "internal_mail" mbox alias <aliases>

match from local for local action "internal_mail"

Check the config and restart smtpd:

$ doas smtpd -nf /etc/mail/smtpd.conf
configuration OK
$ doas rcctl restart smtpd
smtpd(ok)
smtpd(ok)

Open SMTP on the firewall

If you disable incoming ports on pf by default, open up SMTP. Add this line in /etc/pf.conf:

pass in on egress proto tcp to port smtp

Check the config and apply changes:

$ doas pfctl -nf /etc/pf.conf
$ doas pfctl -f /etc/pf.conf

Test

I tested this setup with a fresh pair of servers handling mail for a new domain (I’ll be using @example.org as a stand-in), and sent email from @websteading.net to see where the envelope gets sent to.

To test the backup relay is working, I ran smtpctl monitor on mail.example.org, mx2.example.org, and mail.websteading.net. Since I’ve never used the new domain, and have extremely low traffic on websteading.net, the monitor usually looks like this:

--- client ---  -- envelope --   ---- relay/delivery --- ------- misc -------
curr conn disc  curr  enq  deq   ok tmpfail prmfail loop expire remove bounce
   0    0    0     0    0    0    0       0       0    0      0      0      0
   0    0    0     0    0    0    0       0       0    0      0      0      0
   0    0    0     0    0    0    0       0       0    0      0      0      0

...

That is, no clients connected, no envelopes in the queue, and no activity in any poll.
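
Since nearly every poll is a row of zeros, it can help to filter the monitor output down to the rows where something happened. Below is a rough sketch with awk; active_polls is a made-up name, and the sample rows are copied from the kind of output shown in this post. On a live system the input would come from `doas smtpctl monitor`, though I haven't checked how promptly rows arrive through a pipe.

```shell
#!/bin/sh
# active_polls (hypothetical helper): drop the header lines and the all-zero
# rows from monitor output, keeping only polls that show activity.
active_polls() {
	awk '
		/^---/ || /^curr/ { next }	# skip the two header lines
		NF == 13 {			# a data row has 13 counters
			busy = 0
			for (i = 1; i <= NF; i++)
				if ($i != 0)
					busy = 1
			if (busy)
				print
		}
	'
}

# Sample rows: one idle poll followed by two polls with activity.
active_polls <<'EOF'
--- client ---  -- envelope --   ---- relay/delivery --- ------- misc -------
curr conn disc  curr  enq  deq   ok tmpfail prmfail loop expire remove bounce
   0    0    0     0    0    0    0       0       0    0      0      0      0
   1    1    0     0    0    0    0       0       0    0      0      0      0
   1    0    0     0    1    1    1       0       0    0      0      0      0
EOF
```

Only the last two sample rows should be printed.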

First Experiment

To start, I have all servers running and I sent an email to test@example.org. The monitor for mail.example.org reported a new connection delivering the envelope:

--- client ---  -- envelope --   ---- relay/delivery --- ------- misc -------
curr conn disc  curr  enq  deq   ok tmpfail prmfail loop expire remove bounce

...

   0    0    0     0    0    0    0       0       0    0      0      0      0
   1    1    0     0    0    0    0       0       0    0      0      0      0
   1    0    0     0    1    1    1       0       0    0      0      0      0
   1    0    0     0    0    0    0       0       0    0      0      0      0
   1    0    0     0    0    0    0       0       0    0      0      0      0
   0    0    1     0    0    0    0       0       0    0      0      0      0
   0    0    0     0    0    0    0       0       0    0      0      0      0

...

…and mx2.example.org showing no traffic. This is expected, since the MX record pointing to mail.example.org has a higher priority than the backup, so with both servers on, the former gets precedence.

Second Experiment

Then, I stopped accepting incoming sessions on mail.example.org:

$ doas smtpctl pause smtp
command succeeded

I sent a new email, and initially there were no monitor changes for either mail.example.org or mx2.example.org. Instead, mail.websteading.net kept the letter in queue:

--- client ---  -- envelope --   ---- relay/delivery --- ------- misc -------
curr conn disc  curr  enq  deq   ok tmpfail prmfail loop expire remove bounce
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0

And I was able to see a summary of the letter via smtpctl show queue:

a1e24bba4ee4092b|inet4|mta|auth|test23442@websteading.net|someone@example.org|someone@example.org|209094426|209094426|209094426|0|inflight|29

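According to smtpctl(8), the last value in the row is the delay in seconds before the next attempt when the envelope is pending, or the time elapsed when it is inflight. Here the envelope is inflight, so smtpd had been working on this delivery attempt for about half a minute.

About a minute later, the counter had climbed to 93: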

a1e24bba4ee4092b|inet4|mta|auth|test23442@websteading.net|someone@example.org|someone@example.org|209094426|209094426|209094426|0|inflight|93
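
The queue rows are easier to read once the pipe-separated fields are labelled. Here is a small awk sketch that does this; queue_summary is a made-up name, and the field positions follow my reading of smtpctl(8) (envelope ID first, sender fifth, destination seventh, attempt count eleventh, runstate twelfth, seconds thirteenth), so treat the labels as an assumption.

```shell
#!/bin/sh
# queue_summary (hypothetical helper): label the more interesting fields of
# each "smtpctl show queue" row read from stdin.
queue_summary() {
	awk -F'|' '{
		printf "id=%s type=%s from=%s to=%s attempts=%s state=%s seconds=%s\n",
		    $1, $3, $5, $7, $11, $12, $13
	}'
}

# Sample row copied from the queue output above. On a live system the input
# would come from: doas smtpctl show queue
printf '%s\n' 'a1e24bba4ee4092b|inet4|mta|auth|test23442@websteading.net|someone@example.org|someone@example.org|209094426|209094426|209094426|0|inflight|29' | queue_summary
```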

Eventually, the monitor changes in mail.websteading.net:

--- client ---  -- envelope --   ---- relay/delivery --- ------- misc -------
curr conn disc  curr  enq  deq   ok tmpfail prmfail loop expire remove bounce
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     1    0    0    0       0       0    0      0      0      0
   0    0    0     0    0    1    1       0       0    0      0      0      0
   0    0    0     0    0    0    0       0       0    0      0      0      0

...

And in mx2.example.org:

--- client ---  -- envelope --   ---- relay/delivery --- ------- misc -------
curr conn disc  curr  enq  deq   ok tmpfail prmfail loop expire remove bounce
   1    1    0     0    0    0    0       0       0    0      0      0      0
   1    0    0     1    1    0    0       0       0    0      0      0      0
   1    0    0     1    0    0    0       0       0    0      0      0      0
   1    0    0     1    0    0    0       0       0    0      0      0      0
   1    0    0     1    0    0    0       0       0    0      0      0      0
   1    0    0     1    0    0    0       0       0    0      0      0      0
   1    0    0     1    0    0    0       0       0    0      0      0      0

I then resumed normal operations in mail.example.org:

$ doas smtpctl resume smtp
command succeeded

And then in the main mail server, the envelope is finally delivered:

--- client ---  -- envelope --   ---- relay/delivery --- ------- misc -------
curr conn disc  curr  enq  deq   ok tmpfail prmfail loop expire remove bounce
   0    0    0     0    0    0    0       0       0    0      0      0      0
   1    2    1     0    0    0    0       0       0    0      0      0      0
   1    0    0     0    0    0    0       0       0    0      0      0      0
   1    0    0     0    1    1    1       0       0    0      0      0      0
   1    0    0     0    0    0    0       0       0    0      0      0      0
   1    0    0     0    0    0    0       0       0    0      0      0      0
   1    0    0     0    0    0    0       0       0    0      0      0      0
   1    0    0     0    0    0    0       0       0    0      0      0      0
   0    0    1     0    0    0    0       0       0    0      0      0      0
   0    0    0     0    0    0    0       0       0    0      0      0      0

All seems well. After a mail server made multiple attempts to connect to the main MX, it deferred to the backup server instead. The backup server then queued the letter until the main server went back online. Overall, this looks like a success!