Getting the most out of LiquiBase
This last part of the series, we focus on successful database schema evolution with LiquiBase. You will learn about LiquiBase features that we find most useful in our projects. This article also gives a roundup of our best practices and tips, like how to include custom SQL and stored procedures and how to use LiquiBase with different DBMS products.
Remember that one of the benefits of using LiquiBase is that it allows you to keep all the Changes to the database(s) in one place, the Changelog. But what happens if we are dealing with a very large and complex physical data model with frequent changes? Won’t the Changelog grow to such a size that it is completely unmanageable? This is a valid question to which the simple answer is that although LiquiBase will always have only one Master-Changelog per database, this does not mean that all Changesets have to be in that one Changelog file.
Partitioning the Changelog
In fact, it is very advisable to create smaller Sub-Changelogs and use the
include script statement to link these into the Master-Changelog:
<databaseChangeLog ...> <include file="r1.0/r1.0_changelog.xml"/> <include file="r2.0/r1.0_changelog.xml"/> ... </databaseChangeLog>
It is then quite easy and comfortable to create a hierarchically ordered directory of different Sub-Changelogs which allow to bundle all changes that concern a certain release or concept like triggers etc. into separate Sub-Changelogs. If the Sub-Changelogs are already referenced in the Master-Changelog, you only have to update the smaller Sub-Changelogs with any new Changesets while you will always just run the MasterChangelog.
This mechanism allows us to organize the changes for different releases into separate folders to create an easy to understand structure which makes it quite simple to add any new Changesets at the correct space.
We have found it particularly helpful to keep stored procedures in separate Sub-Changelog files, since they tend to get rather big. We use a special approach to achieve this (which differs from the way other Changes are handled): for these stored procedures, we only keep a single version of the procedure in the Sub-Changelog file – instead of appending a new Changeset each time the stored procedure changes. To ensure that the changed Changesets are run again, each procedures is wrapped into a single <changeSet> to which we add the attribute
runOnChange="true" that forces LiquiBase to check if the Changeset was changed instead of just checking if it was run once before so it can be skipped. This exception to the rule makes it a lot easier to manage and spot changes to stored procedures and as all our Changelogs are stored and versioned in Subversion there is no danger to lose data.
Using Relative Paths
We consider it a good practice to use relative paths to ensure consistency and to avoid problems caused by mistakenly reapplied Changesets. This is especially important if a structure of Master and Sub-Changelogs and/or a lot of
include statements are used.
A short recap: The “DATABASECHANGELOG” table records each Changeset that has successfully applied by author and id tag and by filename and uses this information to check if a given Changeset has already been applied.
If absolute paths are used they will be recorded as part of the filename and should (which is quite likely) the next application of the Master Changelog be triggered from a different path (e.g. a different developer’s machine, as part of a build process, etc.) the absolute path will no longer be the same and LiquiBase will not recognize the Changeset.
To avoid running into this kind of problems, some of my colleagues force LiquiBase to record the relative path names in the “DATABASECHANGELOG” table by adding the Master Changelog directory to the CLASSPATH so that Liquibase is launched to execute the Master Changelog only by filename (without path). All the
include statements are configured to also use only paths relative to the Master Changelog to ensure that no absolute paths are written to the “DATABASECHANGELOG” table. This has so far allowed us to avoid these problems.
For another discussion covering quite broadly why and how to use relative paths with LiquiBase, see the posting “Why does the change log contain the file name?” in the LiquiBase forum.
Custom SQL — Blessing and curse
In the former Installments we mentioned that PowerArchitect does not offer to reverse or forward engineer concepts like check constraints, triggers or stored procedures but you may have noticed that we just explained how we use separate Changelogs for stored procedures. So it is clear that there is another way to create Changesets for these. The concept is called Custom SQL and it enables us to add any SQL statement as a valid Changeset to LiquiBase. We just enclose the respective statement with the tags:
<changeSet author='xxxx' id='x'> <sql > ADD YOUR SQL STATEMENT HERE </sql> </changeSet>
If it is a stored Procedure the tags will be:
<changeSet author='xxxx' id='x' runOnChange="true"> <sql > ADD YOUR SQL STATEMENT HERE </sql> </changeSet>
Sometimes it is helpful to add rollback information after the
</sql> tag using the
<rollback> tag, to avoid problems should the custom SQL somehow fail.
Some advanced concepts are dependent on the DBMS (as they use SQL dialects that slightly vary). If it is obvious that the Changelog will have to be applied to different flavors of SQL we use the Include mechanism and create separate Folders to store all the custom sql Changesets for a given flavor of SQL (depends on DBMS). We might for example have one subdirectory for Oracle Custom SQL and one for PostgreSQL.
Using Preconditions in the Master-Changelog we can then control which set of Sub-Changelogs will be run for a given DBMS. Once this is implemented correctly in the Master-Changelog you can safely update your database specific Changelogs in the respective folders.
One of the major selling points to use LiquiBase for us is that it makes applying Data model Changes very quick, easy and safe. LiquiBase can be set up to be run using Ant, Maven, Spring etc. or by using the Commandline option. As we currently use Maven we can thus integrate LiquiBase into our build management to deal with database changes. So applying LiquiBase is just a click or two.
LiquiBase will keep track of which changes have already been applied (though you can force it to reapply all Changes by using the attributes “runAlways” or “runOnChange”). It does this by means of an internal table “DatabaseChangeHistory”, which registers every applied Changeset by filename, id and author tags. When applying a Changelog to a database, the LiquiBase process scans the “DatabaseChangeHistory” table and skips all Changesets that have already been applied. Once a Changeset has been applied the filename author and id will be logged in the table. This is a major time saver in the long run.
But what if a customer insists on getting a SQL file that his DBA can apply? Though we do not recommend it because it will be potentially less safe, more time consuming and give less control we can use LiquiBase for the creation of traditional SQL scripts as well. And we can easily switch to LiquiBase should the customer at a later stage decide that he wants it.
Special DBMS specific tips and warnings
Let me finish with a few experiences we have had using LiquiBase with various database management systems.
I will start with MySQL, the popular free DBMS, as it has some quirks that set it apart from most other DBMS.
In MySQL it is necessary to select a storage engine like InnoDB when creating new tables. This poses a problem in LiquiBase that we currently solve by a workaround suggested by LiquiBase. We put the create table statement into a separate Changeset which we then adapt using the tag
<changeSet id="55" author="xxx"> <createTable tableName="person"> <column name="id" type="bigint"/> <column name="firstname" type="varchar(255)"/> <column name="lastname" type="varchar(255)"/> </createTable> <modifySql> <replace replace="bigint" with="long"/> </modifySql> <modifySql dbms="mysql"> <append value=" engine innodb"/> </modifySql> </changeSet>
This is not perfect but it works.
Another MySQL specific quirk is that opposed to most other DBMS MySQL expects the definition of Constraints to be part of the column definition like:
<column name="ID" type="BIGINT" remarks="Primary key" autoIncrement="true"> <constraints nullable="false" primaryKey="true" primaryKeyName="KATEGORIE_pk" /> </column>
Instead of adding the key afterwards in a statement as it is generated by the PowerArchitect export:
<addPrimaryKey tableName="KATEGORIE" constraintName="KATEGORIE_pk" columnNames="ID"/>
To correct this manually is quite tedious but we have found it is possible to use an XSLT transformation to deal with this problem which takes much of the pain out of this.
Another inconvenience we have encountered with MySQL is that it enforces a limit of a maximum of 64 characters on remarks on tables or columns. Currently there is no workaround for this. While this is not a problem introduced by either Power Architect or LiquiBase, my colleague Thomas Kellerer has delivered a solution that can be run with the “Criticize Model” context menu item in Power Architect to check if the comment is compatible with MySQL (See image below). Maybe not a big deal for some but still a little handy solution.
On Oracle we have discovered that we can run into a situation were we have to use preconditions to ensure the correct username and this database is used when dealing with multiple databases as in Oracle DB user and schema are the same thing. We use Changeset modifications like the following to ensure that we do not run into problems
<changeSet author="xxx" id="5"> <preConditions onFail="MARK_RAN"> <or> <runningAs username="SHOP" /> <!-- PROD --> <runningAs username="SHOPBUGFIXONLINE" /> <!-- BUGFIX --> <runningAs username="SHOPTESTONLINE" /> <!-- TEST --> </or> </preConditions>
We also use Liquibase with PostgreSQL but have not yet ran into any major problems there.
Managing Data with LiquiBase?
A question that keeps coming up is: “Can I manage my (test) data with LiquiBase as well?”
The answer is yes and no. While it is often no problem to use LiquiBase to deal with small amounts of relatively static test data (e.g. to fill a smaller test database for the first time) using LiquiBase to maintain large amounts of quick changing data is often not so convenient. We are currently still investigating the best ways to deal with this problem and we will keep you informed about the outcome in another blog post.
Now that we have shared our experiences with you let us conclude to sum it up to answer the question: Why do we do it? Why do we use tools instead of just writing SQL scripts from logical data models?
One answer is that we believe and our experiences support it that well documented data models save us time and money. Many applications evolve over time and our tool-chain supports the adaptation of the physical data model and it simplifies the application of the changes to databases. It also helps us in the test and maintenance phases to create databases in different stages and selective rollbacks of many changes with great precision and little effort. And last but not least it is much easier to test a given data model on different DBMS should a customer want to either change his DBMS or insist on an unsuitable one – nothing is more convincing than a demonstration.