When designing a backup scheme, it's important to start with your high-level requirements. In my case, they are as follows:

  • Is fully automated and non-interactive
  • Does not expose the decryption key by writing it to disk in an unprotected state
  • Uses well-supported, high-availability, open-source tools

So with that, let's get started. This article walks through data encryption, uploading to remote storage, and task scheduling, and ends with a complete example script.

Data Encryption

One of my goals with this setup is to reduce exposure of the decryption key as much as possible; in particular, it should never be written to disk in plaintext. With symmetric encryption, the encryption and decryption steps both use the same key, so an automated (non-interactive) backup job would need access to that unprotected key every time it encrypts, which means storing it on the backup machine. For these reasons, symmetric encryption is not a good choice.

Asymmetric encryption is the way. So which open source tool has been around forever, is widely available, and is relatively simple to use? Why gpg of course!

“PGP” stands for “Pretty Good Privacy”; “GPG” stands for “Gnu Privacy Guard.” PGP was the original freeware copyrighted program; GPG is the re-write of PGP. PGP uses the RSA algorithm and the IDEA encryption algorithm. GPG uses the NIST AES, Advanced Encryption Standard.
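To make the symmetric-versus-asymmetric distinction from above concrete, here's a quick sketch using gpg (backup.tar is a stand-in for any file you'd want to protect):

# Symmetric: the same secret both encrypts and decrypts, so an automated
# job would need the passphrase available at encryption time.
gpg --symmetric --cipher-algo AES256 backup.tar

# Asymmetric: encrypting requires only the PUBLIC key; the private key
# never has to exist on the backup machine at all.
gpg --encrypt -r "YOUR_GPG_KEY_ID" backup.tar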

We will need gpg installed both on our local system and on the system where we want to run our backup script. For Debian and Ubuntu, it's as simple as:

sudo apt-get install gpg

For other operating systems it is likely just as simple. Once you've got gpg installed, you can move on to the next steps.

Create a New Key Pair

On your local system, generate a new private/public key pair using gpg:

gpg --full-generate-key

This will guide you through the key generation process. For this backup scheme, I chose the following:

  • "RSA and RSA" for kind of key
  • 4096 bits key size
  • Does not expire

The name and other identifying information is up to you.
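If you'd rather script this step too, gpg supports unattended key generation via a parameter file. A minimal sketch (the name and email are placeholders; gpg will still prompt for a passphrase via pinentry):

gpg --batch --generate-key <<'EOF'
Key-Type: RSA
Key-Length: 4096
Subkey-Type: RSA
Subkey-Length: 4096
Name-Real: Backup Key
Name-Email: backup@example.com
Expire-Date: 0
%commit
EOF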

Don't forget to back up your keys! If you don't already have a backup scheme set up for your password databases, SSH/PGP keys, and other critical data, then now is a good time to do it.
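For the GPG key itself, an armored export of the private key is easy to stash somewhere safe (treat this file as extremely sensitive):

gpg --export-secret-keys --armor YOUR_GPG_KEY_ID > private-key-backup.asc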

Use the following command to list the keys for which you have both the public and private key:

gpg --list-secret-keys --keyid-format LONG

The output will look something like this:

/path/to/user/.gnupg/pubring.kbx
--------------------------------
sec   rsa4096/XXXXXXXXXXXXXXXX 2020-01-01 [SC]
      XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
uid                 [ultimate] xxx (xxx) <xxx@xxx>
ssb   rsa4096/XXXXXXXXXXXXXXXX 2020-01-01 [E]

Export and Import Public Key

The next step is to export the public key from your local machine so that we can import it to the system that will run the backup script.

Use the following command to print the GPG key ID for the last key that you generated:

gpg --list-secret-keys --keyid-format LONG | sed -r -n 's/sec +[a-z0-9]+\/([A-Z0-9]+) .*/\1/p' | tac | head -n 1
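If you'd like that ID in a shell variable for the commands that follow (purely a convenience), you can wrap the same pipeline:

GPG_KEY_ID="$(gpg --list-secret-keys --keyid-format LONG \
    | sed -r -n 's/sec +[a-z0-9]+\/([A-Z0-9]+) .*/\1/p' \
    | tac | head -n 1)"
echo "$GPG_KEY_ID"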

And then use this command to export your public key:

gpg --armor --export YOUR_GPG_KEY_ID > key.asc

Upload the "key.asc" file to the system that will run the backup.

Finally, on the backup system run the following to import the key:

gpg --import key.asc
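One gotcha: gpg does not automatically trust imported keys, so non-interactive encryption on this system can fail with a "There is no assurance this key belongs to the named user" complaint. Either mark the key as trusted once, interactively, or bypass the check per invocation:

# Option 1: mark the key as ultimately trusted (one time, interactive).
# At the gpg> prompt, type "trust", choose "5" (ultimate), then "quit".
gpg --edit-key YOUR_GPG_KEY_ID

# Option 2: skip the trust check for a single invocation.
echo 'test' | gpg --encrypt --trust-model always -r "YOUR_GPG_KEY_ID" --output ./test.gpg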

Whew!

Test the Encryption/Decryption Setup

Now let's test the whole encryption/decryption process. Run the following command on the system where you will be running the backup script:

echo 'it works!' | gpg --encrypt -r "YOUR_GPG_KEY_ID" --output ./test.gpg

To demonstrate that the system is not able to decrypt:

gpg -r "YOUR_GPG_KEY_ID" --decrypt ./test.gpg

You should see the following error message:

gpg: encrypted with 4096-bit RSA key, ID XXX, created YYYY-MM-DD
      "xxx"
gpg: decryption failed: No secret key

For a successful decryption test, copy test.gpg over to the system that has your GPG private key and run the command again:

gpg -r "YOUR_GPG_KEY_ID" --decrypt ./test.gpg

If you see "it works!" printed in your terminal, then congratulations! If not, you will need to review the above steps to see what you might've missed.

Uploading Encrypted Backups to Remote Storage

You may have already heard the expression, "two is one and one is none", but it's worth considering when designing a data backup scheme. Though it is a bit simplistic, you can interpret it as a reminder that more redundancy is usually better. In our case, adding a remote storage backup can be very beneficial to the overall durability of our data.

You are probably already familiar with, or at least aware of, Amazon's AWS S3 cloud storage, but I prefer to use DigitalOcean's Spaces, which offers a compatible API. Both options work with s3cmd, an open-source tool for managing S3-compatible cloud storage.

Like before, installation on Debian and Ubuntu is quite simple:

sudo apt-get install s3cmd

If using another system, you can follow the installation instructions on the official website linked above.

Now it's time to configure your s3cmd tool. Be sure that your current shell is logged in as the user that will be running the backup script later:

s3cmd --configure

Complete the steps in the configuration. You will need an access key, bucket location and domain, etc. For help with this specific command have a look at the s3cmd howto page or, if using DigitalOcean, have a look at this article about s3cmd usage for "Spaces".
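For reference, the handful of settings in ~/.s3cfg that matter most end up looking something like this (the keys are placeholders, and nyc3 is just an example region):

access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = nyc3.digitaloceanspaces.com
host_bucket = %(bucket)s.nyc3.digitaloceanspaces.com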

Once you've got s3cmd configured, you can test it by trying to upload and read back a file from your remote storage:

export S3_LOCATION="YOUR_BUCKETNAME" \
    && echo "it works!" | s3cmd -c ~/.s3cfg put - s3://$S3_LOCATION/test \
    && s3cmd -c ~/.s3cfg --no-progress get s3://$S3_LOCATION/test - | cat

Do you see "it works!" printed in your console? Super!
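When you're done, you can remove the test object from the bucket:

s3cmd -c ~/.s3cfg del s3://$S3_LOCATION/test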

Task Scheduling

Cron is the obvious choice here because it's available pretty much universally across unix-like systems.

A crontab file contains instructions for the cron daemon in the following simplified manner: "run this command at this time on this date."

Basically, what crontab allows us to do is configure our system to run scheduled tasks. The format for defining which commands to run, and when, is fairly simple:

m h  dom mon dow   command
  • m = minutes
  • h = hours
  • dom = day of the month
  • mon = month
  • dow = day of the week
  • command = the command to be executed

So for the following example:

0 */3 * * * /path/to/script.sh

The script will be run at minute 0 of every third hour (00:00, 03:00, 06:00, and so on).
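As another example, the following would run the script at 04:30 every Monday:

30 4 * * 1 /path/to/script.sh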

To schedule cron tasks, you must add them to your user's crontab file. Run the following to open the terminal-based editor:

crontab -e

Your tasks should go at the bottom of the file, each on their own new line.

For an interactive, human-friendly tool to practice or test cron scheduling, have a look at crontab.guru. It also includes a few tips for working with crontab.

Potential Gotchas of crontab

Scripts run via cron do not source the user's profile or bashrc files, so environment variables like PATH will not be set and you will not be able to use all the programs that you expect. The solution is to set the PATH variable at the top of your scripts:

PATH=$PATH:/usr/bin:/usr/sbin:/usr/local/bin
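If you're ever unsure what environment your cron jobs actually run with, a temporary crontab entry like this one will dump it to a file you can inspect:

* * * * * env > /tmp/cron-env.txt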

Testing and Debugging crontab

It's a good idea to test your cron setup to verify that everything is working as you expect. So let's start by creating a simple test script:

echo '#!/bin/bash' > ~/cron-test.sh \
    && echo 'PATH=$PATH:/usr/bin:/usr/sbin:/usr/local/bin' >> ~/cron-test.sh \
    && echo 'set -e' >> ~/cron-test.sh \
    && echo 'VERSION="$(s3cmd --version)"' >> ~/cron-test.sh \
    && echo 'echo "[$(date -u +%FT%T)] $VERSION"' >> ~/cron-test.sh \
    && chmod a+x ~/cron-test.sh

Let's set the test script to run once every 15 seconds. This is a little tricky because crontab's granularity is down to the minute. So this requires a bit of a creative kludge:

* * * * * ~/cron-test.sh
* * * * * ( sleep 15 ; ~/cron-test.sh )
* * * * * ( sleep 30 ; ~/cron-test.sh )
* * * * * ( sleep 45 ; ~/cron-test.sh )

What's happening here is that all the cron tasks run at the same time (once every minute). But each subsequent task is delayed by +15 seconds (sleep X).

And to help with debugging the tasks as they run, I like to write the script's output to a log file:

~/cron-test.sh >> ~/cron-test.log 2>&1
  • >> ~/cron-test.log = redirect stdout (and append) to the cron-test.log file
  • 2>&1 = redirect stderr (error output) to wherever stdout now points (the log file)

Note that the order matters here: if 2>&1 comes before the >> redirection, stderr ends up wherever stdout was pointing originally, not in the log file.

Combining both of the above will look like this:

* * * * * ~/cron-test.sh >> ~/cron-test.log 2>&1
* * * * * ( sleep 15 ; ~/cron-test.sh >> ~/cron-test.log 2>&1 )
* * * * * ( sleep 30 ; ~/cron-test.sh >> ~/cron-test.log 2>&1 )
* * * * * ( sleep 45 ; ~/cron-test.sh >> ~/cron-test.log 2>&1 )

And finally to watch the log output as it is written:

touch ~/cron-test.log \
    && tail -n 20 -f ~/cron-test.log

If all is well, you should see something like the following:

[2020-01-01T00:00:01] s3cmd version 2.0.2
[2020-01-01T00:00:16] s3cmd version 2.0.2
[2020-01-01T00:00:31] s3cmd version 2.0.2

A new line should appear once every 15 seconds.

Complete Example Script

So after all of that, here's a complete example script for you to copy/paste to your heart's content. It uses pg_dump to dump an entire Postgres database, gzip to compress the data, gpg to encrypt the data, and finally s3cmd to upload the encrypted backup file to remote storage.

#!/bin/bash

#   MIT License
#   
#   Copyright (c) 2020 Charles Hill
#   
#   Permission is hereby granted, free of charge, to any person obtaining a copy
#   of this software and associated documentation files (the "Software"), to deal
#   in the Software without restriction, including without limitation the rights
#   to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
#   copies of the Software, and to permit persons to whom the Software is
#   furnished to do so, subject to the following conditions:
#   
#   The above copyright notice and this permission notice shall be included in all
#   copies or substantial portions of the Software.
#   
#   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
#   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
#   FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
#   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
#   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
#   OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
#   SOFTWARE.
#   
#   For a human-friendly explanation of this license:
#   https://tldrlegal.com/license/mit-license
#   
#   If you have questions technical or otherwise, find my contact details here:
#   https://degreesofzero.com/
#   

# This is required because scripts run via cron do not source the user's profile or bashrc files.
PATH=$PATH:/usr/bin:/usr/sbin:/usr/local/bin

# Stop at the first failed command, including failures inside pipelines,
# so a failed dump is never reported as "Done!".
set -e -o pipefail

# Date/time will be used to keep our backup files properly organized.
DATE=$(date -u +%F)
DATETIME="$(date -u +%FT%T)"

# https://github.com/s3tools/s3cmd
# https://www.digitalocean.com/docs/spaces/resources/s3cmd/
S3_LOCATION="s3/bucket/path/$DATE"
S3_CONFIG_FILE="/path/to/user/.s3cfg"

# The GPG key ID that was previously imported to the server's keyring.
# NOTE: Don't forget to mark the imported key as trusted.
GPG_KEY_ID="XXX"

# Directory where backup files are stored locally.
BACKUPS="/path/to/local/backups"
mkdir -p "$BACKUPS"

# Dump SQL, compress and encrypt.
DBHOST="localhost"
DBNAME="XXX"
DBUSER="XXX"
DBPASS="XXX"
FILE="$BACKUPS/backup-file-name-$DATETIME.sql.gz.gpg"
if [ ! -f "$FILE" ]; then
    echo "Creating encrypted SQL dump..."
    # NOTE: pg_dump's --password option does not accept a value; it only
    # forces a password prompt. For a non-interactive run, pass the
    # password via the PGPASSWORD environment variable instead.
    PGPASSWORD="$DBPASS" pg_dump \
        --host="$DBHOST" \
        --dbname="$DBNAME" \
        --username="$DBUSER" \
        --no-password \
            | gzip --best - \
            | gpg --encrypt -r "$GPG_KEY_ID" --output "$FILE"
else
    echo "Encrypted backup already exists"
fi

echo "$FILE"

# Upload to remote location.
echo "Uploading to remote..."
s3cmd -c "$S3_CONFIG_FILE" put "$FILE" "s3://$S3_LOCATION/"

echo "Done!"
exit 0
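With the script in place, the crontab entry for a nightly run at 03:00 (server time) might look like this, with the paths as placeholders:

0 3 * * * /path/to/backup-script.sh >> /path/to/backup.log 2>&1

And when you eventually need to restore, it's simply the pipeline in reverse, run on the machine that holds your private key. A sketch, with the object path and database details as placeholders:

s3cmd -c ~/.s3cfg get s3://YOUR_BUCKETNAME/path/backup-file-name-XXX.sql.gz.gpg - \
    | gpg --decrypt \
    | gunzip \
    | psql --host=localhost --dbname=XXX --username=XXX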