Looking At The Performance Overhead Of A Read-Only Lock In Lucee CFML 5.3.8.201

Cyberdime
Published: June 24, 2022

In yesterday’s post, I demonstrated that iterating over shared Structs and Arrays is thread-safe in ColdFusion; assuming, of course, that the access is read-only. But, what if I need to occasionally mutate the shared data? In that case, I’d have to acquire an exclusive lock some of the time; which, in turn, means that I’d have to acquire a read-only lock most of the time. This got me thinking about the performance overhead of a read-only lock in Lucee CFML 5.3.8.201.

The performance overhead of an exclusive lock is easier to understand because it essentially single-threads access to a given block of code. So, if nothing else, there’s a limit to the throughput on an exclusive lock. But, with a read-only lock, throughput isn’t an issue (unless there’s a competing exclusive lock) – multiple threads can access the same read-only lock at the same time.
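Lucee runs on the JVM, and while I haven't dug into the `cflock` internals, the shared-reader behavior described above is exactly what `java.util.concurrent.locks.ReentrantReadWriteLock` provides. As a rough sketch (the class name and latch setup are mine, not from the post), here's a Java program that proves two threads can hold the same read lock at the same time; if the read lock were exclusive, the second reader could never enter, the first reader's `await()` would never return, and the program would deadlock instead of completing:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockDemo {
	public static void main( String[] args ) throws InterruptedException {
		ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
		// Reaches zero only when BOTH threads are inside the read lock at once.
		CountDownLatch bothReading = new CountDownLatch( 2 );

		Runnable reader = () -> {
			lock.readLock().lock();
			try {
				bothReading.countDown();
				// Wait (while still holding the read lock) for the other reader.
				bothReading.await();
			} catch ( InterruptedException e ) {
				Thread.currentThread().interrupt();
			} finally {
				lock.readLock().unlock();
			}
		};

		Thread t1 = new Thread( reader );
		Thread t2 = new Thread( reader );
		t1.start();
		t2.start();
		t1.join();
		t2.join();
		System.out.println( "Both readers held the read lock at the same time." );
	}
}
```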

But, do the mechanics of a read-only lock have overhead in and of themselves? Meaning, when there is no exclusive lock contention, does having a read-only lock in place affect throughput? To test, I’m going to try to iterate over shared data using parallel threads. In the first test – our control – there will be no locking. Then, in the second test, we’ll apply a read-only lock.

In the following control test, we’re giving ColdFusion a 10-second window in which to run as many iterations as possible. Each iteration will spawn parallel threads that each try to iterate over the same read-only data:

<cfscript>
	// Let's attempt to simulate concurrent request activity all trying to access shared
	// data. Each entry in the simulated request will be executed via parallel iteration.
	// And, each parallel iteration will try to iterate over the given shared data array.
	simulatedRequests = buildArray( 20 );
	sharedData = buildArray( 100 );
	// Let's keep track of how many test iterations we perform in our test window.
	loopCounter = 0;
	valueCounter = 0;
	// Each test window will be 10-seconds long.
	cutoffAt = ( getTickCount() + ( 10 * 1000 ) );
	// Let's see how many contentious read operations we can perform in our test window.
	while ( getTickCount() < cutoffAt ) {
		simulatedRequests.each(
			() => {
				for ( var increment in sharedData ) {
					// CAUTION: The "++" operator is NOT THREAD SAFE. As such, we cannot
					// trust the following operation inside a parallel iterator. That
					// said, I have it here in order to make sure that the Lucee compiler
					// doesn't try to optimize this inner loop away. I wanted to make sure
					// that we're consuming the iteration value in some way.
					valueCounter += increment;
				}
			},
			// Run the .each() in parallel using Java's thread pool.
			true,
			// Maximum number of parallel threads.
			simulatedRequests.len()
		);
		loopCounter++;
	}
	echo( "Without-lock test <br />" );
	echo( "Loop Counter: #loopCounter# <br />" );
	echo( "Value Counter: #valueCounter.intValue()# <br />" );
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I construct an array of the given size in which each value is "1".
	*/
	public array function buildArray( required numeric size ) {
		var result = [];
		for ( var i = 1 ; i <= size ; i++ ) {
			result[ i ] = 1; // All values are 1 (for our counter).
		}
		return( result );
	}
</cfscript>

Note that inside each iteration of the shared data array, I’m using the value of the array item (which is always 1) to increment a counter. I’m doing this to make sure that the ColdFusion compiler isn’t removing our inner loop using some clever optimization. That said, the ++ operator is not thread-safe. As such, we don’t expect this inner counter to be accurate – it’s there just to force the code to compile a certain way.

That said, if I run this control case 10-times in a row and take the 5 highest values, we get the following performance numbers:

Without-lock test
Loop Counter: 5397
Value Counter: 10769762
Without-lock test
Loop Counter: 4964
Value Counter: 9895341
Without-lock test
Loop Counter: 5138
Value Counter: 10242939
Without-lock test
Loop Counter: 5296
Value Counter: 10552607
Without-lock test
Loop Counter: 4885
Value Counter: 9740710

As you can see, the 10-second window for our control test – without locking – resulted in outer-loop iteration counts between 4,885 and 5,397.

ASIDE: You can also see how the ++ operator is not thread-safe. The “value counter” is different on every single request, despite the fact that it was always running the same logic.
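The same lost-update behavior is easy to reproduce on the JVM directly. In this sketch (the class name and thread counts are mine, not from the post), eight threads hammer both a plain `int` counter and an `AtomicInteger`. The atomic counter always lands on the exact total, while the plain `++` counter usually loses updates because its read-modify-write cycle is not atomic:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class UnsafeIncrement {
	static int plainCounter = 0;
	static AtomicInteger atomicCounter = new AtomicInteger();

	public static void main( String[] args ) throws InterruptedException {
		int threadCount = 8;
		int perThread = 100_000;
		Thread[] pool = new Thread[ threadCount ];

		for ( int t = 0 ; t < threadCount ; t++ ) {
			pool[ t ] = new Thread( () -> {
				for ( int i = 0 ; i < perThread ; i++ ) {
					plainCounter++;                  // Read-modify-write: NOT atomic.
					atomicCounter.incrementAndGet(); // Atomic: never loses an update.
				}
			} );
			pool[ t ].start();
		}
		for ( Thread t : pool ) {
			t.join();
		}
		// The atomic counter is always 800000; the plain counter is usually less.
		System.out.println( "plain=" + plainCounter + " atomic=" + atomicCounter.get() );
	}
}
```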

Now for our read-only lock test. This is the same exact code; only, inside each parallel thread, we’re acquiring a read-only lock:

<cfscript>
	// Let's attempt to simulate concurrent request activity all trying to access shared
	// data. Each entry in the simulated request will be executed via parallel iteration.
	// And, each parallel iteration will try to iterate over the given shared data array.
	simulatedRequests = buildArray( 20 );
	sharedData = buildArray( 100 );
	// Let's keep track of how many test iterations we perform in our test window.
	loopCounter = 0;
	valueCounter = 0;
	// Each test window will be 10-seconds long.
	cutoffAt = ( getTickCount() + ( 10 * 1000 ) );
	// Let's see how many contentious read operations we can perform in our test window.
	while ( getTickCount() < cutoffAt ) {
		simulatedRequests.each(
			() => {
				lock
					name = "read-only-lock-test"
					type = "readonly"
					timeout = 5
					{
					for ( var increment in sharedData ) {
						// CAUTION: The "++" operator is NOT THREAD SAFE. As such, we
						// cannot trust the following operation inside a parallel
						// iterator. That said, I have it here in order to make sure that
						// the Lucee compiler doesn't try to optimize this inner loop
						// away. I wanted to make sure that we're consuming the iteration
						// value in some way.
						valueCounter += increment;
					}
				}
			},
			// Run the .each() in parallel using Java's thread pool.
			true,
			// Maximum number of parallel threads.
			simulatedRequests.len()
		);
		loopCounter++;
	}
	echo( "With-lock test <br />" );
	echo( "Loop Counter: #loopCounter.intValue()# <br />" );
	echo( "Value Counter: #valueCounter.intValue()# <br />" );
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I construct an array of the given size in which each value is "1".
	*/
	public array function buildArray( required numeric size ) {
		var result = [];
		for ( var i = 1 ; i <= size ; i++ ) {
			result[ i ] = 1; // All values are 1 (for our counter).
		}
		return( result );
	}
</cfscript>

As you can see, this ColdFusion code is entering into a read-only lock before it tries to iterate over the shared data. And, when I run this code 10-times in a row and take the 5 highest values, we get the following output:

With-lock test
Loop Counter: 5263
Value Counter: 10491977
With-lock test
Loop Counter: 4939
Value Counter: 9835909
With-lock test
Loop Counter: 5045
Value Counter: 10060834
With-lock test
Loop Counter: 5358
Value Counter: 10688017
With-lock test
Loop Counter: 5642
Value Counter: 11256738

As you can see, the 10-second window for our test – with read-only locking – resulted in outer-loop iteration counts between 4,939 and 5,642. The side-by-side results:

  • Without Locking: between 4,885 and 5,397 iterations.
  • With Locking: between 4,939 and 5,642 iterations.

What I’m seeing here is that there is no readily apparent overhead to having a read-only lock in ColdFusion. Both tests ran with a decent amount of variation between runs. But, both tests also ran within roughly the same min/max range.

Again, to be clear, if there was a competing exclusive lock, all of the read-only locks would block-and-wait until the lock became available. But, in a situation where there won’t be an exclusive lock in the vast majority of requests, having the read-only lock in place doesn’t appear to have a discernible performance overhead.
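That block-and-wait behavior can also be sketched in JVM terms, again assuming the `ReentrantReadWriteLock` analogy (the class name and timeout are mine). While one thread holds the write lock, a second thread cannot acquire the read lock, even with a timed attempt:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriterBlocksReaders {
	public static void main( String[] args ) throws InterruptedException {
		ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
		// Simulate an in-flight EXCLUSIVE lock held by the main thread.
		lock.writeLock().lock();

		Thread reader = new Thread( () -> {
			try {
				// The read lock cannot be acquired while the write lock is held,
				// so this timed attempt gives up after 100ms and returns false.
				boolean acquired = lock.readLock().tryLock( 100, TimeUnit.MILLISECONDS );
				System.out.println( "reader acquired while writer held: " + acquired );
				if ( acquired ) {
					lock.readLock().unlock();
				}
			} catch ( InterruptedException e ) {
				Thread.currentThread().interrupt();
			}
		} );
		reader.start();
		reader.join();
		lock.writeLock().unlock();
	}
}
```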


Source: www.bennadel.com