Select Page

Converting UUIDs To Binary For VARBINARY(16) Storage In MySQL And ColdFusion

Cyberdime
Published: October 4, 2022

The other day, while recording a Working Code podcast episode, I mentioned to Adam that a big blind-spot in my database mental model was storing non-AUTO_INCREMENT primary keys. Or, more specifically, using something like a UUID (Universally Unique Identifier), GUID, or CUID as a table’s primary key. As such, I wanted to start building up some foundational knowledge. And, based on what I’ve been reading, it seems that being able to convert a UUID string to and from a binary value is an important point of know-how. This post looks at performing this String-to-Binary conversion in ColdFusion.

To be clear, I am not a database expert! Yes, I love writing SQL. And yes, I love thinking deeply about database index design. But, I’m not one of those people who knows much about low-level storage details, engine ramifications, data replication, or any of the many complex topics that go into database management. Consider this post a note-to-self more than anything.

To start learning about storing Strings as primary keys, I did some reading:

From what I’ve seen in these articles – which is echoed in many StackOverflow posts – is that using Strings as primary keys is a trade-off: in return for having system-independent uniqueness, you incur larger indexes, larger working memory, possible performance hits, less intuitive values (pro-or-con depending on how you see it), and more complex workflows.

This post doesn’t tackle all of those issues – I’m here to noodle on just one of them: larger indexes. Part of the index-size issue comes from how the value is stored. If a UUID is a 35-character String, storing said UUID as a String requires 35-bytes (1 byte per character).

And, that’s just for the column value itself. When you consider that the primary key is implicitly stored as the suffix on a secondary index, the storage requirements of a “UUID as String” is multiplied by the number of indexes on the table. Not to mention that any other table using said UUID as a foreign key will also need 35-bytes.

A common suggestion for reducing storage size is to persist the value as a VARBINARY(16) instead of a VARCHAR(35). This technique is based on the fact that a UUID is already a HEX-encoded value. As such, converting a UUID into a Byte Array requires little more than a binaryDecode() call.

Converting a binary value back into a UUID is a little more work since we have to re-insert the dashes (-) after we generate the String. Here’s two User Defined Functions (UDFs) that I created for managing this conversion in ColdFusion:

<cfscript>
	/**
	* I convert the given UUID string to a byte array (binary value) for use in a MySQL
	* VARBINARY(16) database field.
	*/
	public binary function uuidToBinary( required string input ) {
		// The UUID string is already a hex-encoded representation of data. As such, to
		// convert it to binary, all we have to do is strip-out the dashes and decode it.
		return(
			binaryDecode(
				input.replace( "-", "", "all" ),
				"hex"
			)
		);
	}

	/**
	* I convert the given MySQL VARBINARY(16) byte array (binary value) to a ColdFusion
	* UUID string.
	*/
	public string function binaryToUuid( required binary input ) {
		var asHex = binaryEncode( input, "hex" );
		// ColdFusion UUIDs use the format: xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxx.
		return(
			asHex.left( 8 ) & "-" &
			asHex.mid( 9, 4 ) & "-" &
			asHex.mid( 13, 4 ) & "-" &
			asHex.right( 16 )
		);
	}
</cfscript>

To try these functions out, I created a simple MySQL database table that uses a VARBINARY primary-key and a value column that stores the UUID in plain-text so that we can confirm values:

CREATE TABLE `uuid_test` (
	`uid` varbinary(16) NOT NULL,
	`value` varchar(255) NOT NULL,
	PRIMARY KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Then, I generated 10,000 rows in this test table. Note that in my INSERT, I’m using a CFQueryParam of type binary for the primary-key column.

<cfscript>
	// Include are `uuidToBinary()` and `binaryToUuid()` UDFs.
	include "./functions.cfm";
	loop times = 10000 {
		uid = createUuid();
		```
		<cfquery>
			INSERT INTO
				uuid_test
			SET
				uid = <cfqueryparam value="#uuidToBinary( uid )#" sqltype="binary" />,
				value = <cfqueryparam value="#uid#" sqltype="varchar" />
			;
		</cfquery>
		```
	}
</cfscript>

To then test the SELECTing of rows, I looked in the database table, grabbed a UUID from about half-way through the table, and used it to locate the row. Notice that I’m using the uuidToBinary() to perform the look-up; and then, I’m using the binaryToUuid() to consume the key in my ColdFusion code:

<cfscript>
	// Include are `uuidToBinary()` and `binaryToUuid()` UDFs.
	include "./functions.cfm";
	// A UUID randoly picked from half-way through the records.
	uid = "6D9F382A-5164-48EF-8DDEA942D5EAE8E3";
	```
	<cfquery name="results">
		SELECT
			t.uid,
			t.value
		FROM
			uuid_test t
		WHERE
			t.uid = <cfqueryparam value="#uuidToBinary( uid )#" sqltype="binary" />
		;
	</cfquery>
	```
	dump( results.uid );
	dump( results.value );
	// Use our custom functions to convert the VARBINARY back to a String for consumption
	// within the ColdFusion application.
	dump( binaryToUuid( results.uid ) );
</cfscript>

When we run this ColdFusion (Lucee CFML) code, we get the following output:

Row located based on VARBINARY UUID look-up in MySQL and ColdFusion.

As you can see, we were able to locate the row in the database using the VARBINARY value that we generated with uuidToBinary(). Then, we were able to convert the binary value back into a ColdFusion-formatted UUID using the binaryToUuid() function.

And, if we run an EXPLAIN on this query, we can see that it uses the implicit primary-key index to look up a single-row without any table scanning!

NOTE: I’m using MySQL’s UNHEX() method in this case since I’m running this EXPLAIN right in the database interface:

EXPLAIN SELECT shows SQL query uses PKEY index to locate the row without scanning the table.

As you can see, this SQL query is using the implicit primary key (PKEY) index. And, is able to locate the single row using the index without any table scanning.

includes additional functions to perform these conversions in the SQL context:

  • UUID_TO_BIN()
  • BIN_TO_UUID()

For the time-being, I’ll be sticking with Integer-based AUTO_INCREMENT columns for my primary keys. But, at least I feel like I’m finally starting to build up my mental model for what a String-based primary-key might look like. The UUIDs that ColdFusion generates (random, version 4) incur additional problems with regard to index structure and storage; but, that’s a whole other topic (of which I know very little).

Want to use code from this post?
Check out the license.

Source: www.bennadel.com