Restoring a Postgres database to AWS RDS using Docker

In this post I look at using Docker to restore a Postgres dump file to a Postgres database running in the cloud on AWS RDS.

Keep it clean

One of the big selling points of Docker, for me, is that I can have lots of apps and utils running in nice containers on my dev laptop, without having to install them locally.  This keeps my laptop nice and responsive, and I don't clutter or break it with lots of weird dependencies and running processes that I'm then too scared to delete.

Postgres is a good example: I don't want to install it locally, but I do need access to the command line tools like psql and pg_restore to be able to work with my databases effectively.

One way of accessing these tools would be to ssh onto the AWS cloud instances, but there's a bunch of reasons, most pertinently security (not to mention the faff), why you'd want to avoid that every time you want to run some SQL.  So let's look at how we can use Docker to ease the pain instead.

Start Me Up

With Docker installed you can build this simple Dockerfile to create a local Postgres container.  The user and password env vars aren't strictly required; however, if you want to actually connect to the containerised DB, they're pretty handy:


FROM postgres
ENV POSTGRES_USER postgres
ENV POSTGRES_PASSWORD password

You can build, run and connect to the container as follows (this assumes you're on a Mac):


mkdir data-load
vi Dockerfile # Copy Docker commands listed above into your local Dockerfile
docker build -t postgres-db .
docker run -d -v /Users/iainhunter/dev/docker/postgres/data-load:/data-loader -p 5432:5432 postgres-db
docker ps
docker exec -it <containerId> bash

Note line 4, where I map the data-load dir I created at line 1 to a new directory called data-loader inside my container.  This means that when I copy the Postgres dump file into my local data-load directory, it will be available to the Postgres tools inside the container.
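
For example, to stage the dump file you could copy it into the mapped directory; the source path here is just an illustrative placeholder, and the file name matches the restore command below:

cp ~/Downloads/dumpfile.dmp data-load/  # make the dump visible inside the container at /data-loader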

Line 6 allows me to connect to the container; swap <containerId> for your locally running container ID.
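
Once inside the container, a quick sanity check (assuming you staged the dump file as above) confirms the tools and the mounted directory are visible:

ls /data-loader       # should list dumpfile.dmp
pg_restore --version  # confirms the Postgres client tools are present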

Restoring your database with pg_restore

I'll assume you already have a Postgres database set up within the AWS cloud.  Now that we've connected to our container, we can use pg_restore to restore our dump file into AWS (note this command will prompt you for the admin password):


pg_restore --host=<yourHost>.eu-west-1.rds.amazonaws.com --port=5432 --username=<yourAdminUser> --password --dbname=<yourDB> /data-loader/dumpfile.dmp
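
One assumption worth flagging: pg_restore only understands pg_dump's archive formats, so the dump needs to have been created with something like the custom-format command below (a plain SQL dump should be fed to psql instead):

pg_dump -Fc --host=<sourceHost> --username=<sourceUser> -f dumpfile.dmp <sourceDB>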

A note on schemas

If you're doing a partial restore, you may want to restore your dump file to a separate schema.  Unfortunately there appears to be no way to do this from the command line.  Instead you have to rename the public schema, create a new public schema and restore into that, then reverse the process.

This StackOverflow answer outlines the process.
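
For reference, a rough sketch of that dance using psql against the RDS instance; the old_public and my_restored_schema names are just illustrative choices:

# Before restoring: move the existing public schema aside and create a fresh one
psql --host=<yourHost>.eu-west-1.rds.amazonaws.com --username=<yourAdminUser> --dbname=<yourDB> -c 'ALTER SCHEMA public RENAME TO old_public; CREATE SCHEMA public;'
# Run pg_restore as above, then swap the schema names over
psql --host=<yourHost>.eu-west-1.rds.amazonaws.com --username=<yourAdminUser> --dbname=<yourDB> -c 'ALTER SCHEMA public RENAME TO my_restored_schema; ALTER SCHEMA old_public RENAME TO public;'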

Restore Complete

You should now have a complete restore of your dumpfile in the cloud.  Please add comments if anything is unclear.

Avoiding pitfalls running Mongo 3.2 in Docker on OSX

Quick post after smashing my head off the desk for 5 hours trying to understand YTF I couldn’t get Mongo 3.2 running in a Docker container on my Mac.  Hopefully this post spares others the elastoplasts 🙂

Pitfall #1 – Mongo cannot write data to the host OSX machine

There are three factors that cause this issue:

  1. A lot of mongo/docker docs tell you to mount a local volume
  2. By default docker-machine uses the virtualbox driver to create a local VM
  3. Mongo uses mmap to turbo-charge access to files on disk

These factors led me (by way of this issue) to an ancient known VirtualBox bug which basically means docker isn't going to be able to read/write files with mmap on a VirtualBox shared folder.

So it's pointless trying to tell mongo, via docker, to mount a local data volume, as the above bug means mongo isn't going to be able to access it.  (Careful reading of the Docker Hub mongo docs may allow you to divine this, but it's not at all obvious.)

Solution – Create a data volume in the container rather than the host.

As always, once you understand the problem, the solution is simple.  We can tell docker to create a data volume in the container rather than the host, as follows:


docker run --name my-local-mongo -v mongo-data:/data/db -p 27017:27017 -d mongo

In my (still) early days with docker I'd come across these CLI commands with absolutely no description of WTF all the switches were doing, so here's a quick breakdown:

  • --name my-local-mongo – This tells docker to give the container the name "my-local-mongo" rather than generating a name like tuftybeaver
  • -v mongo-data:/data/db – This is the key switch; here we tell docker to create a new data volume called mongo-data and mount it as /data/db, which is the default location mongo wants to write data to (you can verify the volume with the commands shown after this list)
  • -p 27017:27017 – Here we tell docker to bind the container port 27017 to port 27017 on the host VM, allowing us to access mongo locally.
  • -d – This switch tells docker to run the process detached, so we're not running interactively.
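
You can verify the named volume with docker's standard volume subcommands:

docker volume ls                  # should list a volume called mongo-data
docker volume inspect mongo-data  # shows where the data actually lives inside the VM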

If all has gone well, docker should be running your mongo container successfully.  A quick docker ps will show your running container.

Running docker logs -f my-local-mongo will let you follow the log output.

Pitfall #2 – I can’t connect to mongo from localhost

This is the classic docker schoolboy error that everyone makes.  The first time you do this you'll be patting yourself on the back as you type localhost:27017 into your connection string, and then wonder why the hell mongo can't be found.

This is because everything is actually running on your local docker virtual machine.  To find the IP of the VM, enter docker-machine ls.

You should see the URL listed, something like tcp://192.168.99.100:2376.  You need to snarf this IP for your connection string.  The easiest way to sanity check that it's all working is to bash the IP and mongo port into your browser, e.g. for this example:

http://192.168.99.100:27017/

The mongo database should helpfully respond with:

It looks like you are trying to access MongoDB over HTTP on the native driver port.

Demonstrating that your mongo db is running and responding to idiots trying to speak to it over HTTP 🙂
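
For an actual client connection you'd use that same IP in a standard mongo connection string; something like this (the database name is just a placeholder):

mongodb://192.168.99.100:27017/mydb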

Pitfall #3 – No collections are displayed in RoboMongo

This was a real head scratcher.  I was pretty sure everything was working, and yet RoboMongo wasn't showing a damn thing.  This was a classic case of me, Sherlock Holmes style, spending 2 hours eliminating the impossible, to find that whatever remains, however improbable, must be the truth.

So in a turn up for the books, there was no bug in my code!!  It turns out by default Mongo 3.2 uses the WiredTiger storage engine, and RoboMongo can’t parse data stored in this format.
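
If you want to confirm the storage engine for yourself, you can ask the server via the mongo shell that ships in the container (container name as above):

docker exec -it my-local-mongo mongo --eval "db.serverStatus().storageEngine"  # reports the engine name, e.g. wiredTiger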

So instead, if you're using Mongo 3.2, you should use MongoChef to connect to your db.  MongoChef will show you all your lovely collections, living within your containerised Mongo.

Hope the above helps, thanks 🙂