
Comparing Java’s MessageDigest To ColdFusion’s hash() Function In Lucee CFML

Cyberdime
Published: January 19, 2023

Last week, I implemented a ColdFusion port of the CUID2 library. My version seems to work correctly; however, it has some performance problems when compared to the Java version. When I instrumented the ColdFusion component methods, nothing really jumped out at me. But, I have a hunch that I could make the SHA hashing more performant. Only, I don’t have a great mental model for hashing. As such, I wanted to perform a small comparison of Java’s MessageDigest class with ColdFusion’s native hash() function for hashing a compound input.

As of ColdFusion 10, the hash() function can hash a binary value. And, before ColdFusion 10, we could still dip down into the Java layer to hash binary values with the MessageDigest class. However, I’ve historically only ever hashed a single value. And, with the CUID2 library, the hash is generated from a compound value that composes several sources of entropy. I’m curious to see if I can get better performance by generating the entropy as byte arrays, skipping any stringification of values, and hashing all the byte arrays together as a single composite value.

When using Java’s MessageDigest class, hashing a compound input is seemingly straightforward since I can call .update(bytes) on the instance as many times as I want before completing the hash with a call to .digest(). But, as much as possible, I bias towards the native ColdFusion methods rather than dipping down into the Java layer. To that end, I want to see if passing concatenated byte arrays into the hash() function produces the same result as calling the .update() method several times on MessageDigest.

To test this, I’m going to create several binary values from different sources (text, secure-random, and image); and then, try to create a single SHA-256 hash from the compound input. The following ColdFusion code has three tests; but, the last two tests exercise the same hash() function – I’m simply building the byte[] (Byte Array) input differently.

<cfscript>
	// A collection of binary data read from different sources.
	parts = [
		// From string data.
		charsetDecode( "This is a string", "utf-8" ),
		// From secure entropy data.
		createObject( "java", "java.security.SecureRandom" )
			.init()
			.generateSeed( 100 )
		,
		// From image data.
		fileReadBinary( "./logo.png" )
	];
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	// TEST ONE: We're going to take the various binary values / byte arrays and hash them
	// all together using the MessageDigest. The nice thing about the MessageDigest class
	// is that you can call .update() multiple times to feed-in the inputs one at a time.
	messageDigest = createObject( "java", "java.security.MessageDigest" )
		.getInstance( "sha-256" )
	;
	for ( part in parts ) {
		messageDigest.update( part );
	}
	// The .digest() method completes the hashing algorithm and returns the bytes for the
	// hash calculation. Since the native hash() method returns hex, let's encode the
	// results as hex for comparison.
	hexEncoding = binaryEncode( messageDigest.digest(), "hex" );
	dump(
		var = hexEncoding,
		label = "Using MessageDigest"
	);
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	// TEST TWO: We're going to take the various binary values / byte arrays and hash them
	// all together using the ColdFusion native hash() function. Unlike MessageDigest,
	// there's no way to incrementally build the hash input. As such, we're going to have
	// to reduce the parts down into a single binary value by appending them all together.
	aggregatedBytes = parts.reduce(
		( reduction, part ) => {
			return( reduction.append( part, true ) );
		},
		[]
	);
	// And, once we have a single COLDFUSION ARRAY of bytes, we have to CAST it to a
	// BINARY value in order to get it to work with the hash() function.
	hexEncoding = hash( javaCast( "byte[]", aggregatedBytes ), "sha-256" );
	dump(
		var = hexEncoding,
		label = "Using hash()"
	);
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	// TEST THREE: This is the same as TEST TWO, only I want to use a ByteArrayOutputStream
	// to aggregate the inputs instead of using javaCast() with the ColdFusion array. This
	// doesn't really add much value; but, I am just trying to fill my head with options.
	buffer = createObject( "java", "java.io.ByteArrayOutputStream" )
		.init()
	;
	for ( part in parts ) {
		buffer.write( part );
	}
	hexEncoding = hash( buffer.toByteArray(), "sha-256" );
	dump(
		var = hexEncoding,
		label = "Using hash( byte buffer )"
	);
</cfscript>

As you can see, in the Java-oriented code, each binary part is passed to a separate .update() call. And, in the ColdFusion-oriented code, the binary parts are first combined into a single value: either reduced down (ie, flattened) into a ColdFusion array that is then cast to a byte[], or written into a ByteArrayOutputStream. That single binary value is then passed to the hash() function. And, when we run this ColdFusion code, we get the following output:

Three different SHA-256 hash values all showing the same string.

All three SHA-256 hash generation approaches resulted in the same hex-encoded output. As such, I think we can conclude that calling .update() multiple times on the MessageDigest instance is functionally equivalent to concatenating multiple binary values and passing the composite to ColdFusion’s hash() function.
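
The same equivalence can be checked in plain Java, outside of ColdFusion. What follows is a minimal sketch rather than code from the CUID2 port: the class and method names are hypothetical, and the inputs are small fixed placeholder values instead of the text, entropy, and image data used in the test above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch: names are illustrative, not part of any library.
final class CompoundHashSketch {

	// Creates a SHA-256 digest, wrapping the checked exception since every
	// compliant Java runtime is required to provide SHA-256.
	static MessageDigest sha256() {
		try {
			return MessageDigest.getInstance( "SHA-256" );
		} catch ( NoSuchAlgorithmException e ) {
			throw new IllegalStateException( "SHA-256 not available", e );
		}
	}

	// Feeds each part into the digest with its own .update() call.
	static byte[] hashIncrementally( byte[][] parts ) {
		MessageDigest digest = sha256();
		for ( byte[] part : parts ) {
			digest.update( part );
		}
		return digest.digest();
	}

	// Concatenates all parts into one byte array, then hashes it in one call.
	static byte[] hashConcatenated( byte[][] parts ) {
		ByteArrayOutputStream buffer = new ByteArrayOutputStream();
		for ( byte[] part : parts ) {
			buffer.write( part, 0, part.length );
		}
		return sha256().digest( buffer.toByteArray() );
	}

	public static void main( String[] args ) {
		// Placeholder inputs (assumed for illustration only).
		byte[][] parts = {
			"This is a string".getBytes( StandardCharsets.UTF_8 ),
			new byte[] { 1, 2, 3, 4, 5 },
			new byte[] { (byte) 0x89, 0x50, 0x4E, 0x47 }
		};
		boolean same = Arrays.equals(
			hashIncrementally( parts ),
			hashConcatenated( parts )
		);
		System.out.println( "Digests match: " + same );
	}
}
```

Because SHA-256 processes its input as one continuous byte stream, where the input is split across .update() calls should not affect the digest; only the order and content of the bytes matter.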

Now that I can, in theory, generate a hash from multiple binary values in ColdFusion, I might be able to update my CUID2 library to deal directly with byte arrays, removing unnecessary encoding. Of course, it remains to be seen whether that actually makes it any faster.
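
To make the idea of hashing bytes directly (rather than strings) a little more concrete, here is a hedged sketch in plain Java. It assumes, purely for illustration, that the entropy sources are a timestamp, a counter, and some random salt bytes. These inputs and all of the names below are hypothetical and are not taken from the CUID2 library.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical sketch: the inputs and names are assumptions for illustration.
final class ByteEntropySketch {

	// Encodes a long as 8 big-endian bytes, with no string conversion.
	static byte[] longToBytes( long value ) {
		return ByteBuffer.allocate( Long.BYTES ).putLong( value ).array();
	}

	// Hashes the numeric values and the salt as raw bytes.
	static byte[] hashParts( long timestamp, long counter, byte[] salt ) {
		MessageDigest digest;
		try {
			digest = MessageDigest.getInstance( "SHA-256" );
		} catch ( NoSuchAlgorithmException e ) {
			throw new IllegalStateException( "SHA-256 not available", e );
		}
		digest.update( longToBytes( timestamp ) );
		digest.update( longToBytes( counter ) );
		digest.update( salt );
		return digest.digest();
	}

	public static void main( String[] args ) {
		// Random salt bytes used directly, never converted to text.
		byte[] salt = new byte[ 32 ];
		new SecureRandom().nextBytes( salt );

		byte[] hash = hashParts( System.currentTimeMillis(), 1L, salt );
		System.out.println( "Digest length in bytes: " + hash.length );
	}
}
```

One caveat with this approach: hashing the 8 raw bytes of a number produces a different digest than hashing its decimal string, so a change like this would alter the generated hash values even if it does turn out to be faster.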


Source: www.bennadel.com