avatarAnand Tripathi

Summary

The provided content outlines a method for managing database migrations across multiple databases using Alembic in a Python environment, particularly when transitioning from a monolithic architecture to microservices.

Abstract

The article discusses the challenges of managing database migrations for multiple independent databases within a single project, a scenario that arises when converting a monolithic application into microservices. The author, with a background in ORM and Python, advocates for using Alembic to handle migrations without manually writing raw SQL queries. The solution involves configuring Alembic to work with different databases by customizing the alembic.ini file and the env.py script to support separate migration directories and version tracking for each database. The process includes initializing Alembic, setting up the configuration for multiple databases, and modifying the env.py file to dynamically set the database URL and handle migrations for each database schema. The article concludes with instructions on testing the configuration and generating autogenerated migration files, emphasizing the importance of maintaining separate alembic_version tables for accurate version tracking of each database.

Opinions

  • The author expresses a preference for using SQLAlchemy and Alembic due to their ease of use and compatibility with various database engines.
  • There is a clear opinion that using different alembic.ini and env.py files for each database leads to code duplication and maintainability issues, which the proposed solution aims to avoid.
  • The author suggests that the default multidb template provided by Alembic is not suitable for their use case because it tracks all migrations in a single alembic_version table, which could lead to undesirable downgrade scenarios affecting multiple databases.
  • The article promotes the use of environment variables for database URLs, which is considered a good practice for security and flexibility in different environments.
  • The author encourages readers to follow the provided steps to manage multiple databases effectively and to refer to the accompanying GitHub repository for further code examples and guidance.
  • The author believes that a well-structured migration file name and length are important, as indicated by the inclusion of a file template with date, time, and revision information, along with a truncation setting for the slug length.
  • There is an opinion that readers would benefit from a Medium membership, suggesting that the cost is minimal compared to the value provided by unlimited access to stories and support for writers.
  • The author invites readers to follow the "Pythonistas" publication for more content similar to the article, indicating a sense of community and shared interest in Python-related topics.
  • The author recommends an AI service as a cost-effective alternative to ChatGPT Plus (GPT-4), suggesting that it offers similar performance and functionality.

Managing multiple databases migrations with Alembic

Photo by Maksim Shutov on Unsplash

Coming from a background in ORM and python, I loved working with SQLAlchemy as it makes our life so much easier that we don’t have to overthink whether we are using PostgresSQL, MySQL or any other database engine. Alembic just adds more spice to our tech stack when it comes to a migrate your database from one state to another without writing the raw queries. So basically my world was like a fair “PythonyTale”, I meant fairytale.

Everything was going smooth like a JSON object. But then we thought to convert our project from a monolithic app to micro-services.

Photo by Zoran Borojevic on Unsplash

Now comes the twist in architecture. Generally, if you want to convert a project into a microservice then the first thing you have to think for the candidates that can move out and acts as microservices. Microservices will have their own resources, backing services, environment basically everything. So we thought first it would be easy to separate out databases and then move out the resource layer from the project.

So we have to separate our database into smaller independent DBS and as we were following rolling updates so we shipped different DBS to production and in the next release the plan was to ship the other stuff. So in the meantime, we have to support different DBS in a single project.

The easiest way is to use different alembic.ini and env.py for different databases. But that will lead to a lot of code replication and maintainability of code. So we came up with a solution

So let’s get started

Photo by Matt Duncan on Unsplash

Installation

As easy as being an influencer.

pip install alembic
pip install pymysql

Alembic Setup

Alembic comes with a different types of templates for different use cases. To check that simply type

alembic list_templates

So bydefault alembic has async, multidb and generic template. We are not using multidb template because we are not using multidb the functionality of SQLalchemy. The multidb template in alembic is interesting, but it tracks all of your migrations in a single alembic_version table. (We found this to be undesirable because if we wanted to downgrade a database, we only wanted those changes to be isolated to that particular database–this doesn’t seem possible with a single alembic_version table for multiple databases.

So in our use case, we will be going to use the default template that is generic.

To initialize migration with a default template.

alembic init migrations

Here migrations are the migration directory that alembic will be going to create.

It will create alembic.ini (on the root level). migrations directory on the root level. Inside migrations, there is a versions directory that will have all the migrations files and on the same level, there is env.py.

So for our use case, we are going to play with two files alembic.ini and env.py

This is how the default alembic.ini looks like

# A generic, single database configuration.

[alembic]
# path to migration scripts
script_location = migrations

# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s

# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .

# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python-dateutil library that can be
# installed by adding `alembic[tz]` to the pip requirements
# string value is passed to dateutil.tz.gettz()
# leave blank for localtime
# timezone =

# max length of characters to apply to the
# "slug" field
# truncate_slug_length = 40

# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false

# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false

# version location specification; This defaults
# to migrations/versions.  When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator" below.
# version_locations = %(here)s/bar:%(here)s/bat:migrations/versions

# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
version_path_separator = os  # Use os.pathsep. Default configuration used for new projects.

# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8

sqlalchemy.url = driver://user:pass@localhost/dbname


[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts.  See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME

# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARN
handlers = console
qualname =

[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

Make sure you create a backup before editing this file

Alembic.ini

Alembic placed a file alembic.ini into the current directory. This is a file that the alembic script looks for when invoked.

So first create three databases in MySQL

Now I have three databases. Let’s our databases in the default alembic section of the ini file

# A generic, single database configuration.

[alembic]
# path to migration scripts
databases=first, second, third

Let’s add three databases in the alembic.ini as the three sections now so that we can use them while invoking script and also add some additional property in it.

[first]
sqlalchemy.url = driver://user:pass@localhost/dbname
version_locations = ./migrations/first

[second]
sqlalchemy.url = driver://user:pass@localhost/dbname
version_locations = ./migrations/second

[third]
sqlalchemy.url = driver://user:pass@localhost/dbname
version_locations = ./migrations/third
# version_locations is the key here to segregate migrations file for different databases

For all the databases settings I have put them in default section

[default]
script_location = migrations
# template used to generate migration files
file_template = %%(year)d-%%(month).2d-%%(day).2d-%%(hour).2d-%%(minute).2d-%%(second).2d_%%(rev)s_%%(slug)s
truncate_slug_length = 60

I have also added file_template that is to give your auto-generated file name a good structure and also to truncate it if the length is more than 60

Final alembic.ini looks like this

# A generic, single database configuration.

[alembic]
# path to migration scripts
databases = first, second, third


[DEFAULT]
# template used to generate migration files
file_template = %%(year)d-%%(month).2d-%%(day).2d-%%(hour).2d-%%(minute).2d-%%(second).2d_%%(rev)s_%%(slug)s
truncate_slug_length = 60

# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
# for all available tokens
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s

# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .
script_location = ./migrations/
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python-dateutil library that can be
# installed by adding `alembic[tz]` to the pip requirements
# string value is passed to dateutil.tz.gettz()
# leave blank for localtime
# timezone =

# max length of characters to apply to the
# "slug" field


# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false

# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false

# version location specification; This defaults
# to migrations/versions.  When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator" below.
# version_locations = %(here)s/bar:%(here)s/bat:migrations/versions

# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
version_path_separator = os

# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8

sqlalchemy.url = driver://user:pass@localhost/dbname

[first]
sqlalchemy.url = driver://user:pass@localhost/dbname
version_locations = ./migrations/first

[second]
sqlalchemy.url = driver://user:pass@localhost/dbname
version_locations = ./migrations/second

[third]
sqlalchemy.url = driver://user:pass@localhost/dbname
version_locations = ./migrations/third
# version_locations is the key here to segregate migrations file for different databases

[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts.  See the documentation for further
# detail and examples

# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME

# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARN
handlers = console
qualname =

[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

env.py

Now let's update the env.py file, for this blog we are mainly going to edit the run_migrations_online function. So let’s go one by one.

Let's if you remember that in the alembic.ini file we left the sqlalchemy.url section to a placeholder URL only. So here in this file, we are going to set the URL. Add the below lines in env.py

db_name = config.config_ini_section # active config ini section is the db name that we have chosen
config.set_main_option(
    "sqlalchemy.url",
    f'{os.environ["SQLALCHEMY_DATABASE_URI"]}/{db_name}'
)

Change the run_migrations_online

def run_migrations_online() -> None:
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.

    """
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    def include_object(object, name, type_, reflected, compare_to):
        if type_ == 'foreign_key_constraint' and compare_to and (
                compare_to.elements[0].target_fullname == db_name + '.' +
                object.elements[0].target_fullname or
                db_name + '.' + compare_to.elements[0].target_fullname == object.elements[
                    0].target_fullname):
            return False
        if type_ == 'table':
            if object.schema == db_name or object.schema is None:
                return True
        elif object.table.schema == db_name or object.table.schema is None:
            return True
        else:
            return False

    with connectable.connect() as connection:
        context.configure(
            connection=connection, target_metadata=target_metadata, include_object=include_object
        )

        with context.begin_transaction():
            context.run_migrations()

Now let’s create some SQLalchemy models for test tables for all three different databases.

from sqlalchemy import Text, Column, Integer
from sqlalchemy.ext.declarative import declarative_base


Base = declarative_base()


class FirstTest(Base):
    __table_args__ = ({'schema': 'first'})
    __tablename__ = 'test'

    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=True)


class SecondTest(Base):
    __table_args__ = ({'schema': 'second'})
    __tablename__ = 'test'

    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=True)


class ThirdTest(Base):
    __table_args__ = ({'schema': 'third'})
    __tablename__ = 'test'

    id = Column(Integer, primary_key=True)
    name = Column(Text, nullable=True)

If we want to autogenerate the migrations file we want to include them Base to alembic configuration in env.py

# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
from models import Base
target_metadata = Base.metadata

Now let’s test our configurations

RUN

Export database URL

export SQLALCHEMY_DATABASE_URI=mysql+pymysql://root:[email protected]:3306

To test the configuration just check the current version of alembic

# Use -n argument to specify the configuration and which db to choose
alembic -n first current
alembic -n second current
alembic -n third current

Now let’s generate our first autogenerated migration

alembic -n first revision --autogenerate -m "First test created"

It will create a migration file inside migrations/first directory

Then run same command for second and third database

alembic -n second revision --autogenerate -m "Second test created"
alembic -n third revision --autogenerate -m "Third test created"

So the final directory structure will look like this

Every migration has its own versions separated from each other. All the databases have their own alembic_version table in them to track versions.

The migrations file looks like this

Conclusion

You need to change the env.py according to your use case especially the include_object function object. If you feel any trouble with the code please checkout the git project of this blog at this link

If you liked this story, you might also like a Medium membership. It’s only $5 a month (a price of a cup of coffee!) but it will give you unlimited access to stories while supporting your favourite writers. If you sign up using this link, I’ll earn a small commission. Thanks!

For more content like this follow our publication Pythonistas and become a Pythonista

Alembic
Database Migration
Database
Sqlalchemy
Python
Recommended from ReadMedium