Using An Array To Power Weighted Distributions In Lucee CFML 5.3.8.201

Cyberdime
Published: June 29, 2022

Lately, I’ve been working on some code that needs to randomly assign a value to a request. But, the “randomness” isn’t entirely random: the set of values needs to be assigned using a weighted distribution. Meaning, over a period of time, each value should be “randomly assigned” a given percentage of the time. I’m sure there are fancy / mathy ways to do this; but, I’ve found that pre-calculating an array of repeated values makes the value-selection process simple in Lucee CFML 5.3.8.201.

Imagine that I want to randomly select the following values with a weighted distribution:

  • A: 10% of the time.
  • B: 20% of the time.
  • C: 70% of the time.

Instead of doing any fancy maths to figure this out, I’m literally taking those values and inserting them into a ColdFusion array, repeating each value according to its desired percentage. So, if I want A to show up 10% of the time, I’m inserting A into the array 10 times. Then, I insert B 20 times and C 70 times. What this gives me is a ColdFusion array with 100 items and a composition that matches the weighted distribution.

At this point, in order to return a random weighted value, all I have to do is randomly select an index from this pre-calculated array. To see this in action, I’m going to code-up the above distribution and then select 100 random values:

<cfscript>
	// Our build-function is going to return a generator function that produces values
	// with the given weighted frequencies.
	next = buildWeightedDistribution([
		{ value: "a", percent: 10 },
		{ value: "b", percent: 20 },
		{ value: "c", percent: 70 }
	]);
	// By outputting 100 values, we should see occurrences that roughly match the defined
	// percentages from above.
	loop times = 100 {
		echo( next() & " " );
	}
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I return a function that will produce values with the given distribution. Each entry
	* is expected to have two properties:
	* 
	* - value: the value returned by the generator.
	* - percent: the weight (0-100) of the value in the distribution.
	*/
	public function function buildWeightedDistribution( required array distributions ) {
		var index = [];
		var indexSize = 0;
		// In order to make it super easy to generate the next value in our series, we're
		// going to pre-compute an array in which each value is repeated as many times as
		// is required by its weight. So, a value that is supposed to be returned 30% of
		// the time will be repeated 30 times in our internal index.
		for ( var distribution in distributions ) {
			loop times = distribution.percent {
				index[ ++indexSize ] = distribution.value;
			}
		}
		// Now that we have our internal index of repeated values, our generator function
		// simply has to pick a random value from the index.
		return(
			() => {
				return( index[ randRange( 1, indexSize, "SHA1PRNG" ) ] );
			}
		);
	}
</cfscript>

As you can see, for each distribution entry, I’m using the loop statement (the script form of the CFLoop tag) to repeat the given value the desired number of times. This leaves me with an array whose composition matches the weighted distribution. The returned fat-arrow function then does nothing more than select a random value from the pre-calculated array each time it is called. And, when we run this ColdFusion code, we get the following output:

[Image: 100 randomly selected values with counts that roughly match the desired weighted distribution in Lucee CFML.]

When this ColdFusion code ran, we ended up with the following value counts:

  • A: 8 – which is roughly 10% of the time.
  • B: 21 – which is roughly 20% of the time.
  • C: 71 – which is roughly 70% of the time.

As you can see, our value counts roughly match the desired weighted distribution.
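
With only 100 values, the counts will drift a few points away from the targets on any given run. To see the distribution over a larger sample, the generated values can be tallied in a struct. This is only a rough sketch: it assumes that it lives in the same template as the buildWeightedDistribution() function from above, and the sampleSize, counts, and selected variables are names invented for this illustration.

<cfscript>
	// ASSUMPTION: buildWeightedDistribution() from the post is defined in this template.
	next = buildWeightedDistribution([
		{ value: "a", percent: 10 },
		{ value: "b", percent: 20 },
		{ value: "c", percent: 70 }
	]);
	// Hypothetical sample size; a larger sample should track the weights more closely.
	sampleSize = 10000;
	// Each generated value gets its own counter, starting at zero.
	counts = { "a": 0, "b": 0, "c": 0 };
	loop times = sampleSize {
		selected = next();
		counts[ selected ] = ( counts[ selected ] + 1 );
	}
	// Output the raw count for each value. The exact numbers change on every run.
	for ( key in [ "a", "b", "c" ] ) {
		echo( key & ": " & counts[ key ] & " of " & sampleSize & "<br />" );
	}
</cfscript>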

The main downside to this approach is that the values need to be repeated in memory. But, memory is cheap, and array look-ups are fast. As such, this feels like a really nice solution to this problem in ColdFusion.
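
If the repetition ever did become a concern, one possible way to shrink the array is to divide every weight by their greatest common divisor before building it. This is not something the code above does; it is a sketch under the assumption that every percent is a positive whole number, and both gcd() and reduceWeights() are hypothetical helpers invented for this illustration.

<cfscript>
	/**
	* HYPOTHETICAL helper: I return the greatest common divisor of two non-negative
	* integers using Euclid's algorithm.
	*/
	public numeric function gcd( required numeric a, required numeric b ) {
		var x = arguments.a;
		var y = arguments.b;
		while ( y != 0 ) {
			var remainder = ( x % y );
			x = y;
			y = remainder;
		}
		return( x );
	}
	/**
	* HYPOTHETICAL helper: I return a new array of distributions in which each percent
	* has been divided by the divisor shared by all of the percents. ASSUMES that every
	* percent is a positive whole number.
	*/
	public array function reduceWeights( required array distributions ) {
		var divisor = 0;
		for ( var distribution in distributions ) {
			divisor = gcd( distribution.percent, divisor );
		}
		return distributions.map(
			( distribution ) => {
				return({
					value: distribution.value,
					percent: ( distribution.percent / divisor )
				});
			}
		);
	}
	// Weights of 10 / 20 / 70 share a divisor of 10, so they reduce to 1 / 2 / 7 and
	// the internal index holds 10 items instead of 100, with the same proportions.
	next = buildWeightedDistribution(
		reduceWeights([
			{ value: "a", percent: 10 },
			{ value: "b", percent: 20 },
			{ value: "c", percent: 70 }
		])
	);
</cfscript>

This works with the build-function as written because the generator picks an index between 1 and the size of the index, whatever that size happens to be.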

Source: www.bennadel.com