On Sunday, November 12, 2023, the Quay engineering team migrated quay.io’s database from its original MySQL RDS instance (circa 2013) to an Aurora Postgres instance. We did this because maintaining an older version of MySQL and updating the database without causing downtime was becoming increasingly difficult. This migration was a large undertaking that required months of planning. Because quay.io’s database holds metadata linking all customer images to their underlying layer blobs, losing any of this information would have been catastrophic.
To move the data between databases, we used the AWS Database Migration Service (DMS), which is designed specifically for this type of task. The day after we went live on Aurora Postgres, we found that some images were failing to pull. Investigation showed that these failures were caused by truncated manifests in the new database.
The truncation occurred because DMS had been configured to cap the size of some MySQL text fields at 32 KB when converting them to Postgres, so any manifest larger than that limit was cut short. New image pushes were not affected, because they were written directly to the Postgres database rather than copied over by DMS.
We immediately wrote and ran a script that walked our old MySQL database, compared every manifest against its copy in Postgres, and corrected any that had been truncated. With over 60 million manifests on quay.io, this was a large task; it was completed on November 16th.
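For readers curious about the shape of such a reconciliation pass, here is a minimal Python sketch. It is illustrative only and is not the actual script used for quay.io: the row shape `(manifest_id, manifest_bytes)`, the hypothetical `fetch_target` and `write_target` callables (stand-ins for Postgres queries), and the batch size are all assumptions. The sketch walks the source rows in batches so that each batch needs a single lookup against the target, then rewrites any manifest whose target copy is missing or differs from the source.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape: (manifest_id, manifest_bytes). These names are
# illustrative assumptions, not Quay's actual schema.
Row = Tuple[int, str]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def reconcile(
    source_rows: Iterable[Row],
    fetch_target: Callable[[List[int]], Dict[int, str]],
    write_target: Callable[[int, str], None],
    batch_size: int = 1000,
) -> List[int]:
    """Overwrite target manifests that are missing or differ from the source.

    `fetch_target` and `write_target` are hypothetical hooks standing in for
    queries against the new database. Returns the ids that were rewritten.
    """
    repaired: List[int] = []
    for batch in batched(source_rows, batch_size):
        ids = [manifest_id for manifest_id, _ in batch]
        target = fetch_target(ids)  # one lookup per batch, not per row
        for manifest_id, source_bytes in batch:
            # A truncated copy compares unequal to the full source manifest.
            if target.get(manifest_id) != source_bytes:
                write_target(manifest_id, source_bytes)
                repaired.append(manifest_id)
    return repaired


# Usage with in-memory dicts standing in for the two databases:
source = {1: "a" * 40_000, 2: "small manifest"}
target = {1: source[1][:32_768], 2: "small manifest"}  # id 1 was truncated
fixed = reconcile(
    source.items(),
    lambda ids: {i: target[i] for i in ids if i in target},
    target.__setitem__,
    batch_size=1,
)
# fixed == [1], and target now matches source
```

Comparing full manifest contents rather than only lengths also catches rows that differ for reasons other than truncation, at the cost of reading every manifest from both databases.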
While only a small subset of images were affected, we are deeply sorry for any inconvenience this may have caused.
This migration has been an instructive exercise in replacing a critical architectural component while keeping quay.io running. We will soon share a write-up with more detail on our migration process, the issues we encountered, and the improvements we plan to make in the coming months.