Automated database migrations with Alembic
Hello everyone, After discussing with Lukas about automation and database migrations, he suggested I write a proposal. Here it is. By the way, I’m a new contributor, and nothing has yet been decided on who would implement this proposal if we do decide to follow it. Feedback and contributions welcome! Context ======= We want to be able to deploy new versions of aurweb from Ansible without requiring manual intervention for data migration. Many all-in-one web frameworks provide tools for database migrations, but aurweb being a multi-language project, we can’t afford to make a switch to such a framework in the short term. Proposed solution: Alembic ========================== Disclaimer: I don’t have any practical experience with SQLAlchemy or Alembic in production. I’m suggesting it entirely because it looks like the best fit for our use case among the existing tools, but I could be missing something. Alembic[1] is a tool for managing database migrations. It is a sister project of the Python database library SQLAlchemy. Both Alembic and SQLAlchemy are flexible enough to collaborate with other database connectors. I suggest we use SQLAlchemy only for the migrations and the schema definition, without porting the existing Python and PHP codebase for now. The migrations scripts are written in Python and are intended to be used with SQLAlchemy, but that’s not a requirement. Regardless, I do suggest we migrate the whole schema to SQLAlchemy because (a) it takes care of the SQLite/MySQL compatibility for us, (b) Alembic is able to automatically generate migrations from SQLAlchemy schemas[2], and (c) we did plan to do that eventually, didn’t we? Alembic supports multi-database setups but don’t technically support migrating other data files like INI configurations, though it is something we might need to migrate in the future. We can still write custom code in the migrations, but with the risk that the files may get out of sync with their migration metadata in the database, which may happen after manual interventions. I think it’s worth it. To convert our current SQL schema, sqlacodegen[3] can generate the SQLAlchemy code for us. Alembic can also create migrations to migrate from zero to the full database, but I suggest we keep the SQLAlchemy schema up-to-date in the code. Migrations would only run on existing database, as described in the Alembic cookbook[4]. [1]: https://alembic.sqlalchemy.org/en/latest/tutorial.html [2]: https://alembic.sqlalchemy.org/en/latest/autogenerate.html [3]: https://pypi.org/project/sqlacodegen/ [4]: https://alembic.sqlalchemy.org/en/latest/cookbook.html#building-an-up-to-dat... Alternative solutions ===================== Flyway and Liquibase are two popular tools for database version control. They’re both commercial with opensource editions, and written in Java. While not bad in themselves, Alembic is a better match for us with its licensing and its Python ecosystem. We could roll our own migration tools, but it would certainly require more effort than using an existing tool, and error handling is tricky and may cause data damage if not done right.
Hi Frédéric, Thanks for the proposal! On Fri, 07 Feb 2020 at 01:51:28, Frédéric Mangano-Tarumi wrote:
We want to be able to deploy new versions of aurweb from Ansible without requiring manual intervention for data migration.
That is correct. We are still working closely with the DevOps team on migrating everything to Ansible but expect completion by the end of this week.
Proposed solution: Alembic ========================== [...]
I like the idea of using SQLAlchemy and Alembic for just the schema and migrations as a first step. It's important that nothing in our PHP code breaks (at least nothing that cannot be fixed with some simple code changes). I don't see where things could potentially break but it's better to do some testing. Following that, I guess we could also port the existing Python code to use SQLAlchemy? There currently seem to be ~100 database queries that we would need to work on.
The migrations scripts are written in Python and are intended to be used with SQLAlchemy, but that\u2019s not a requirement. Regardless, I do suggest we migrate the whole schema to SQLAlchemy because (a) it takes care of the SQLite/MySQL compatibility for us, (b) Alembic is able to automatically generate migrations from SQLAlchemy schemas[2], and (c) we did plan to do that eventually, didn\u2019t we?
Both (a) and (b) sound great! There have been proposals of doing a full rewrite of aurweb using Flask and SQLAlchemy. However, nothing is set in stone yet. I expect this rewrite to be a long-term project with a timeline of at least 12 months, considering that we need a *lot* of testing, both automated and manual, while backporting new changes at the same time. Doing a partial migration to SQLAlchemy certainly won't hurt, unless for some reason, we decide to use a different database toolkit for the rewrite.
Alembic supports multi-database setups but don\u2019t technically support migrating other data files like INI configurations, though it is something we might need to migrate in the future. We can still write custom code in the migrations, but with the risk that the files may get out of sync with their migration metadata in the database, which may happen after manual interventions. I think it\u2019s worth it.
It's likely that we need to migrate other things in the future but I expect that to be pretty rare (you can check the text files in upgrading/ for how often that happened in the past). As long as there is some way of adding other types of migrations in the future, we should focus on the database now and take care of other things as they come up.
To convert our current SQL schema, sqlacodegen[3] can generate the SQLAlchemy code for us. Alembic can also create migrations to migrate from zero to the full database, but I suggest we keep the SQLAlchemy schema up-to-date in the code. Migrations would only run on existing database, as described in the Alembic cookbook[4].
Sounds all good to me. We should leave this discussion open for a couple of days for others to chime in before starting with an implementation. Lukas
Lukas Fleischer [2020-02-07 11:11:35 +0100]
Following that, I guess we could also port the existing Python code to use SQLAlchemy? There currently seem to be ~100 database queries that we would need to work on.
Absolutely. Maybe we should wait until we have a few successful deployments with Alembic though, just in case.
Doing a partial migration to SQLAlchemy certainly won't hurt, unless for some reason, we decide to use a different database toolkit for the rewrite.
SQLAlchemy is our safest bet considering its popularity, but if we do end up with a different toolkit it’s gonna be a mess. If you have other candidates in mind, it might be worth looking at them before migrating.
Sounds all good to me. We should leave this discussion open for a couple of days for others to chime in before starting with an implementation.
Nobody has expressed dissent so far, so here’s a plan: 1. Migrate the schema to SQLAlchemy. The new schema will unify both SQLite and MySQL, but since the SQLite schema is generated from the MySQL one, that should go smoothly. Naming-wise, where should it go? aurweb.schema, aurweb.db.schema? We’ll need an init script too: aurweb.db.initdb, aurweb.scripts.initdb? 2. Set up Alembic. With its env.py, we’ll be able to use the common aurweb configuration rather than duplicate the database connection settings in Alembic’s INI. We should also link it to the schema created in step 1. I’m up for doing both tasks. If you’d like me to do them, just tell me when I should start.
participants (2)
-
Frédéric Mangano-Tarumi
-
Lukas Fleischer