Restoring MongoDB snapshots backed by EFS


The time had come to practise the backup procedure for our MongoDB setup. I’m not a big user of MongoDB so (as usual) this was a lot of “learn it as you go”. What I thought would be a clearly documented route for restoring a snapshot was actually not so straightforward, and required piecing together a few different sources. Here’s a quick summary of where I fell down and how I got back up again.

EFS snapshots

We run mongo as part of a containerised setup on AWS Fargate. To actually store the data, we save it to an EFS file system mounted into the containers. EFS provides lovely, automated backups (via AWS Backup) that we can access and restore at any time.

Restoring EFS snapshots

I wanted to do a full restore. It seems simple: go to the AWS Backup console, select the recovery point you want and hit restore. Oops: access denied. Why!? The UI claims that if you don’t have the appropriate IAM role already configured, it will just create one for you and you should be set. This is a bit misleading…

Buried in a dev guide is this little gem of information:

Your EFS backup vault receives the access policy Deny backup:StartRestoreJob upon creation. If you are restoring your backup vault for the first time, you must change your access policy as follows.
  1. Choose Backup vaults.
  2. Choose the backup vault containing the recovery point you would like to restore.
  3. Scroll down to the vault Access policy.
  4. If present, delete backup:StartRestoreJob from the Statement. Do this by choosing Edit, deleting backup:StartRestoreJob, then choosing Save policy.

Sort that out and, hey-presto, no more access issues.
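
If you’d rather script it than click through the console, a boto3 sketch of the same fix might look like this. The vault name is an assumption (the automatically created EFS vault is typically called aws/efs/automatic-backup-vault); point it at whichever vault holds your recovery point.

```python
# Rough, untested sketch: remove backup:StartRestoreJob from the vault's Deny policy.
import json
import boto3

backup = boto3.client("backup")
VAULT_NAME = "aws/efs/automatic-backup-vault"  # assumption: the auto-created EFS vault

# The access policy comes back as a JSON string.
policy = json.loads(
    backup.get_backup_vault_access_policy(BackupVaultName=VAULT_NAME)["Policy"]
)

# Strip backup:StartRestoreJob out of any Deny statement.
statements = []
for stmt in policy.get("Statement", []):
    if stmt.get("Effect") == "Deny" and "Action" in stmt:
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        actions = [a for a in actions if a != "backup:StartRestoreJob"]
        if not actions:
            continue  # the Deny only covered StartRestoreJob, so drop the whole statement
        stmt["Action"] = actions
    statements.append(stmt)
policy["Statement"] = statements

if policy["Statement"]:
    backup.put_backup_vault_access_policy(BackupVaultName=VAULT_NAME, Policy=json.dumps(policy))
else:
    backup.delete_backup_vault_access_policy(BackupVaultName=VAULT_NAME)
```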

Finding the restored data

Much to my disappointment, but not to my surprise, a full restore doesn’t magically put everything back where it was.

When you run a full restore, the data is placed into a directory at the root of the file system named something like aws-backup-restore_<timestamp>. The exact timestamp isn’t shown anywhere in the AWS console, so to use the directory you have to mount the EFS somewhere you can get at it and have a look.

This is as easy as creating a tiny, throwaway EC2 instance and mounting the EFS as part of the instance’s launch configuration. I did all of this in the console and it took about 3 minutes.
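
Once the file system is mounted on that instance, finding the directory is just a case of listing the root. A minimal sketch, assuming the EFS was mounted at /mnt/efs (the mount point and the deliberately loose glob pattern are both assumptions):

```python
# List the restore directories sitting at the root of the mounted EFS.
# /mnt/efs is whatever mount point you chose when configuring the EC2 instance.
from pathlib import Path

mount_point = Path("/mnt/efs")
restore_dirs = sorted(d.name for d in mount_point.glob("aws-backup-restore*") if d.is_dir())
print(restore_dirs)  # the entry with the most recent timestamp is the restore you just ran
```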

Using the restored data

The EFS is configured as a ‘volume’ from the perspective of the ECS task. This brings with it a lovely configuration option, RootDirectory, which lets you pick a directory inside the file system to act as the root of the volume. More details on this can be found in the AWS::ECS::TaskDefinition EFSVolumeConfiguration CloudFormation reference docs.

Re-define the TaskDefinition with the restored directory as the volume’s root directory, and the mongo instance comes back up in the state captured by the snapshot.
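
The same fields exist whether the task is defined in CloudFormation or via the API, so as an illustration here is a rough boto3 sketch of re-registering the task definition with the restored directory as the volume root. The family, image, file system ID and paths are all placeholders, not our real values.

```python
# Sketch only: re-register the mongo task definition pointing its EFS volume
# at the restored directory. Every identifier below is a placeholder.
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="mongo",                                       # placeholder family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="512",
    memory="1024",
    containerDefinitions=[
        {
            "name": "mongo",
            "image": "mongo:6",                           # placeholder image
            "essential": True,
            "mountPoints": [
                {"sourceVolume": "mongo-data", "containerPath": "/data/db"}
            ],
        }
    ],
    volumes=[
        {
            "name": "mongo-data",
            "efsVolumeConfiguration": {
                "fileSystemId": "fs-0123456789abcdef0",   # placeholder file system ID
                # Point the volume root at the directory created by the restore:
                "rootDirectory": "/aws-backup-restore_<timestamp>",
                "transitEncryption": "ENABLED",
            },
        }
    ],
)
```

Update the service to use the new task definition revision and mongo starts up against the restored data.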

Improvements

A few things I’ve had in the back of my head to try in future…

This structure means that we’ll get ‘nested restores’, e.g. eventually we’ll be configuring the root at backup-two/backup-one. This needs to be improved on for this to be a reasonable strategy.

Can we point to a consistent root directory and change that directory on-the-fly inside the EFS? For instance, could we create a symlink called current at the root of the EFS and re-point that to the snapshot whenever we felt like it? I’m sure there are problems here around ensuring no transactions are in progress at the time.
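
As a very rough sketch of what that swap could look like (assuming the EFS is mounted at /mnt/efs on some maintenance box, and that mongo is stopped while the link changes):

```python
# Speculative sketch of the 'current' symlink idea -- not something we run today.
import os

efs_root = "/mnt/efs"                        # assumed mount point on a maintenance instance
target = "aws-backup-restore_<timestamp>"    # the restore directory to switch to

# Build the new link under a temporary name, then rename it over 'current'
# so anything reading the link never sees it half-updated.
tmp_link = os.path.join(efs_root, "current.tmp")
if os.path.lexists(tmp_link):
    os.remove(tmp_link)
os.symlink(target, tmp_link)                             # relative link, stays inside the EFS
os.replace(tmp_link, os.path.join(efs_root, "current"))  # atomic swap of the link
```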

How does this work in a cluster? (It probably doesn’t.)