Retype a Column in YugabyteDB (And PostgreSQL)

Franck Pachot
Published: March 22, 2023

You inherit a data model that does not use the best datatypes, like using int instead of bigint for a value that can be larger? Don’t wait until the tables grow larger. Here is an example to show how to retype it efficiently. Usually, those operations are done two times:

  • The DDL that may block the application must be short, with metadata changes only
  • The DML that modifies data can take longer as long as they are online without impact on the application availability

During the intermediate phase, you have to maintain both versions of the columns. Because YugabyteDB is PostgreSQL compatible with triggers, you may even do that without any change to the application code.

Here is an example.

Demo Table

I have the following table:

Unfortunately, column a was defined as int and I want to change it to bigint to get larger values.

Add the New Column

First, I’ll add a new column a_new with the target datatype, bigint and a trigger to update it from a:

This will synchronize the new column for new inserts or when a is updated, but I have all existing rows to update:

Backfill the New Column

I can do it in one transaction:

Or if the table is large, I may prefer to do it with small transactions. To reduce the reads, I can create an index on what remains to update:

This can take longer but is online, with backfill, and will help to update by small batches without re-scanning the table each time. You may choose to do the same without the index. With the index, you do fewer reads, but more writes.

This will update by the batch of 1,000:

It will stop with ERROR: division by zero when there is no more to update. I use that to stop the \watch without more code, but you can do better, of course.

The most important to be sure that it is efficient is to look at the execution plan with explain (costs off, analyze, dist):

  

This confirms that I have only two read requests and one write request for each batch.

The index, which has no entries now can be removed:

Validate the State of the New Column

I validate that a_new is synchronized even after some DML:

Switch to the New Column

With no urge and no stress, we are ready for the last step: dropping a and renaming a_new to a:

This must be done while the application does not query the table because in YugabyteDB 2.17, we don’t do transactional DDL yet (#3109). However, this is short as it updates only the table metadata. And it is done in a way where it can raise an error but not corrupt data.

Here is the table in the final state:

yugabyte=> select * from demo where k<=3;
 k | a
---+----
 3 |  6
 2 |  4
 0 | 42
 1 |  2
(4 rows)

Remarks

You don’t have to use a trigger. This can also be done with two application releases:

Even if, in theory, you don’t need this in PostgreSQL because DDL is transactional, you still want to avoid long transactions in PostgreSQL, and this is still a good alternative.

Source: dzone.com