[Chore] Migrate to PostgreSQL #93212

Closed
opened 2021-11-19 15:50:45 +01:00 by Anna Sirota · 8 comments
Owner

Goal

The goal of this migration is to have a more unified (in terms of number of technologies used) web infrastructure, starting with at least having all Django projects use the same RDBMS. This makes running local development setups and having exactly the same up-to-date dependencies everywhere easier.

One downside to this is the way PostgreSQL major version upgrades have to be done, but this would produce an issue once every several years, while upgrades of Django and surrounding boilerplate of DB client libs and other deps have to happen much more often.

The machine, which runs production Blender ID, already has a PostgreSQL container, so, if production data is moved successfully, doing the swithcover should be easy.

The main problem is copying the data with minimal disruptions.

How production data is copied

./manage.py dumpdata|loaddata were considered, but without extra hacks, such as migrating first, then truncating all the tables, loaddata won't actually be able to load all the data. Not only that, but it's also problematic because loaddata will trigger all signals, which we don not want to do for all the production data.

Luckily for us, latest https://pgloader.io/ (version 3.6.2 at the moment of writing) does a good job of copying data directly from a running MySQL to a running PostgreSQL.

It has to be built from sources because Debian/Ubuntu are much behind the currently stable working version, as usual. Let's assume we've done that and pgloader binary is now located at /opt/pgloader/pgloader-bundle-3.6.2.

In order to be able to copy from MySQL instance to PostgreSQL one, pgloader must have access to both, so new user, database and schema must be created in PostgreSQL instance (these can be the same ones Blender ID will later use). Something along these lines executed in psql should do the trick:

postgres=# CREATE USER blender_id CREATEDB PASSWORD ??????;
postgres=# CREATE DATABASE blender_id OWNER blender_id;
postgres=# \c blender_id;
You are now connected to database "blender_id" as user "postgres".
blender_id=# CREATE SCHEMA blender_id;
blender_id=# GRANT CONNECT ON DATABASE blender_id TO blender_id;
GRANT
blender_id=# GRANT USAGE ON schema blender_id TO blender_id;
GRANT
blender_id=# GRANT ALL ON schema blender_id TO blender_id;
GRANT

pgloader also needs to be able to connect to both instances, so we need to find IPs of their containers:

for n in `docker network ls  | awk '{print $1}'`; do docker network inspect $n | grep -E 'postgr|mysql' -A 3 | grep -iE 'name|ip.*address' ; done

This outputs something along the lines of

                "Name": "postgres",
                "IPv4Address": "172.22.0.4/16",
                "Name": "mysql",
                "IPv4Address": "172.22.0.8/16",

A script for copying the data can look as follows:

#!/bin/bash

PGLOADER=/opt/pgloader/pgloader-bundle-3.6.2/bin/pgloader
MYSQL_IP=172.22.0.8
PSQL_IP=172.22.0.4

MYSQL_IP="${MYSQL_IP}" PSQL_IP="${PSQL_IP}" \
        PGPASSWORD=?????? \
        MYSQL_PWD=?????? \
	$PGLOADER --verbose mysql_to_psql.load

where contents of mysql_to_psql.load looks like this:

LOAD DATABASE
    FROM mysql://blender_id@{{MYSQL_IP}}:3306/blender_id
    INTO postgresql://blender_id@{{PSQL_IP}}:5432/blender_id
CAST 
    column bid_main_user.last_login_ip to inet drop typemod,
    column bid_main_user.current_login_ip to inet drop typemod
;

Issues encountered

Invalid datetime values

The first attempt produced some errors due to invalid datetime values:

ERROR Database error 23502: null value in column "date_joined" violates not-null constraint

This is because MySQL has a bug that allows inserting invalid 0000-00-00 00:00:00 DATETIME values, which are later treated as NULLs, which in turn breaks the NON NULL constraint for these values.

Invalid inet values

PostgreSQL has a data types for IP addresses and these used by Django's GenericIPAddressField, which means that all values contained by last_login_ip and current_login_ip must be castable to inet, otherwise row will fail to be copied with the following error:

2021-11-19T16:28:54.816000Z ERROR Database error 22P02: invalid input syntax for type inet: "192.168.168.116, 127.0.0.1"
CONTEXT: COPY bid_main_user, line 14055, column current_login_ip: "192.168.168.116, 127.0.0.1"

To find all offending records we can use:

SELCT id, last_login, email, current_login_ip, last_login_ip FROM bid_main_user WHERE current_login_ip NOT REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$' OR last_login_ip NOT REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$';

These also have to be fixed manually before data can be copied fully.

Constraint names are truncated

Updating the rows with invalid values and running pgloader again produces no errors, but still produces a lot of warnings about some obnoxiously long index/constraint names being truncated. E.g.:

2021-11-19T10:51:04.069000Z WARNING PostgreSQL warning: identifier "idx_233110_oauth2_provider_gran_application_id_81923564_fk_bid_main_" will be truncated to "idx_233110_oauth2_provider_gran_application_id_81923564_fk_bid_"

As far as I can tell, this doesn't break anything because the truncated names were still unique and constraints and indices were still created, so for now we assume this is a non-issue.

At this point we have a full copy of Blender ID's production data in our PostgreSQL instance 🎉

To be continued:

    • deploying missing PostgreSQL deps to production (see linked commits);
    • switching Django's database configuration;

Recap of the migration

switch to maintenance

    # show a maintenance page
    docker exec -it blender-id sh
    uwsgi --stop /var/run/uwsgi/blender-id.pid  # did not work, see plan B below
    
    # plan B: add maintenance mode to nginx config
    nginx -t
    pkill -HUP nginx
    touch /var/www/maintenance_on

    # stop cron
    sudo systemctl stop cron

backup everything

    sudo /bin/bash /var/www/blender-id/backup.sh
    sudo -Hu dbbackup /var/www/sintel-blender-org-management/postgresql_backup.sh

copy and switch

    # copy the data
    cd /opt/pgloader/
    bash copy_mysql_to_psql.sh 2>&1 | tee pgloader101.log
    # check pgloader101.log: should have 0 errors
    
    # change default database
    vim /var/www/blender-id/settings/blender_id_settings.py

bring everything back

    # restart UWSGI
    uwsgi --reload /var/run/uwsgi/blender-id.pid
    # hide a maintenance page
    docker exec -it blender-id sh
    # uwsgi /etc/uwsgi/uwsgi.ini
    rm /var/www/maintenance_on

    # start cron
    sudo systemctl start cron
### Goal The goal of this migration is to have a more unified (in terms of number of technologies used) web infrastructure, starting with at least having all Django projects use the same RDBMS. This makes running local development setups and having exactly the same up-to-date dependencies everywhere easier. One downside to this is the way PostgreSQL major version upgrades have to be done, but this would produce an issue once every several years, while upgrades of Django and surrounding boilerplate of DB client libs and other deps have to happen much more often. The machine, which runs production Blender ID, already has a PostgreSQL container, so, if production data is moved successfully, doing the swithcover should be easy. The main problem is copying the data with minimal disruptions. ### How production data is copied `./manage.py dumpdata|loaddata` were considered, but without extra hacks, such as migrating first, then truncating all the tables, `loaddata` won't actually be able to load all the data. Not only that, but it's also problematic because `loaddata` will trigger all signals, which we don not want to do for all the production data. Luckily for us, latest https://pgloader.io/ (version 3.6.2 at the moment of writing) does a good job of copying data directly from a running MySQL to a running PostgreSQL. It has to be built from sources because Debian/Ubuntu are much behind the currently stable working version, as usual. Let's assume we've done that and `pgloader` binary is now located at `/opt/pgloader/pgloader-bundle-3.6.2`. In order to be able to copy from MySQL instance to PostgreSQL one, `pgloader` must have access to both, so new user, database and schema must be created in PostgreSQL instance (these can be the same ones Blender ID will later use). Something along these lines executed in `psql` should do the trick: ``` postgres=# CREATE USER blender_id CREATEDB PASSWORD ??????; postgres=# CREATE DATABASE blender_id OWNER blender_id; postgres=# \c blender_id; You are now connected to database "blender_id" as user "postgres". blender_id=# CREATE SCHEMA blender_id; blender_id=# GRANT CONNECT ON DATABASE blender_id TO blender_id; GRANT blender_id=# GRANT USAGE ON schema blender_id TO blender_id; GRANT blender_id=# GRANT ALL ON schema blender_id TO blender_id; GRANT ``` `pgloader` also needs to be able to connect to both instances, so we need to find IPs of their containers: ``` for n in `docker network ls | awk '{print $1}'`; do docker network inspect $n | grep -E 'postgr|mysql' -A 3 | grep -iE 'name|ip.*address' ; done ``` This outputs something along the lines of ``` "Name": "postgres", "IPv4Address": "172.22.0.4/16", "Name": "mysql", "IPv4Address": "172.22.0.8/16", ``` A script for copying the data can look as follows: ``` #!/bin/bash PGLOADER=/opt/pgloader/pgloader-bundle-3.6.2/bin/pgloader MYSQL_IP=172.22.0.8 PSQL_IP=172.22.0.4 MYSQL_IP="${MYSQL_IP}" PSQL_IP="${PSQL_IP}" \ PGPASSWORD=?????? \ MYSQL_PWD=?????? \ $PGLOADER --verbose mysql_to_psql.load ``` where contents of `mysql_to_psql.load` looks like this: ``` LOAD DATABASE FROM mysql://blender_id@{{MYSQL_IP}}:3306/blender_id INTO postgresql://blender_id@{{PSQL_IP}}:5432/blender_id CAST column bid_main_user.last_login_ip to inet drop typemod, column bid_main_user.current_login_ip to inet drop typemod ; ``` **Issues encountered** **Invalid datetime values** The first attempt produced some errors due to invalid `datetime` values: ``` ERROR Database error 23502: null value in column "date_joined" violates not-null constraint ``` This is because MySQL has a bug that allows inserting invalid `0000-00-00 00:00:00` DATETIME values, which are later treated as `NULL`s, which in turn breaks the `NON NULL` constraint for these values. **Invalid inet values** PostgreSQL has a data types for IP addresses and these used by Django's `GenericIPAddressField`, which means that all values contained by `last_login_ip` and `current_login_ip` must be castable to `inet`, otherwise row will fail to be copied with the following error: ``` 2021-11-19T16:28:54.816000Z ERROR Database error 22P02: invalid input syntax for type inet: "192.168.168.116, 127.0.0.1" CONTEXT: COPY bid_main_user, line 14055, column current_login_ip: "192.168.168.116, 127.0.0.1" ``` To find all offending records we can use: ``` SELCT id, last_login, email, current_login_ip, last_login_ip FROM bid_main_user WHERE current_login_ip NOT REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$' OR last_login_ip NOT REGEXP '^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$'; ``` These also have to be fixed manually before data can be copied fully. **Constraint names are truncated** Updating the rows with invalid values and running `pgloader` again produces no errors, but still produces a lot of warnings about some obnoxiously long index/constraint names being truncated. E.g.: ``` 2021-11-19T10:51:04.069000Z WARNING PostgreSQL warning: identifier "idx_233110_oauth2_provider_gran_application_id_81923564_fk_bid_main_" will be truncated to "idx_233110_oauth2_provider_gran_application_id_81923564_fk_bid_" ``` As far as I can tell, this doesn't break anything because the truncated names were still unique and constraints and indices were still created, so for now we assume this is a non-issue. At this point we have a full copy of Blender ID's production data in our PostgreSQL instance 🎉 To be continued: - - [x] deploying missing PostgreSQL deps to production (see linked commits); - - [x] switching Django's database configuration; ## Recap of the migration **switch to maintenance** ``` # show a maintenance page docker exec -it blender-id sh uwsgi --stop /var/run/uwsgi/blender-id.pid # did not work, see plan B below # plan B: add maintenance mode to nginx config nginx -t pkill -HUP nginx touch /var/www/maintenance_on # stop cron sudo systemctl stop cron ``` **backup everything** ``` sudo /bin/bash /var/www/blender-id/backup.sh sudo -Hu dbbackup /var/www/sintel-blender-org-management/postgresql_backup.sh ``` **copy and switch** ``` # copy the data cd /opt/pgloader/ bash copy_mysql_to_psql.sh 2>&1 | tee pgloader101.log # check pgloader101.log: should have 0 errors # change default database vim /var/www/blender-id/settings/blender_id_settings.py ``` **bring everything back** ``` # restart UWSGI uwsgi --reload /var/run/uwsgi/blender-id.pid # hide a maintenance page docker exec -it blender-id sh # uwsgi /etc/uwsgi/uwsgi.ini rm /var/www/maintenance_on # start cron sudo systemctl start cron ```
Anna Sirota self-assigned this 2021-11-19 15:50:45 +01:00
Author
Owner

Added subscriber: @railla

Added subscriber: @railla

This issue was referenced by 4ff88bee7b

This issue was referenced by 4ff88bee7b39190f46e80a51c70599928f754a32

This issue was referenced by 7a13e8c0a0b8e0bd0e047aed11f6b324cd37915d

This issue was referenced by 7a13e8c0a0b8e0bd0e047aed11f6b324cd37915d

This issue was referenced by 27c592e0654174d4e3872b0bdf8f5beee9bbaa84

This issue was referenced by 27c592e0654174d4e3872b0bdf8f5beee9bbaa84

This issue was referenced by 5ddeaf8047

This issue was referenced by 5ddeaf8047a673493c36d5f3b63ede502f6da817

This issue was referenced by 02cbd58391

This issue was referenced by 02cbd583917f3e03d5fc5ad86673498766e73232
Author
Owner

Changed status from 'Needs Triage' to: 'Confirmed'

Changed status from 'Needs Triage' to: 'Confirmed'
Author
Owner

Changed status from 'Confirmed' to: 'Resolved'

Changed status from 'Confirmed' to: 'Resolved'
Sign in to join this conversation.
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: infrastructure/blender-id#93212
No description provided.