
Adding jreExtract() To Pluck Captured Groups Using Regular Expressions In ColdFusion

Cyberdime
Published: June 19, 2022

I love Regular Expressions. I probably use them every day in some capacity. And, I’ve loved having my JRegEx.cfc project to simplify many pattern-based interactions. Sometimes, I want to use a single pattern-match to pluck out parts of a string in ColdFusion. So today, I’m adding one more tool to that toolbox: jreExtract(). The jreExtract() method matches a Java Regular Expression against an input String and returns a Struct of the captured groups.

View this code in my JRegEx project on GitHub.

The JRegEx.cfc ColdFusion component already has a method named jreMatchGroups(), which returns an array containing the captured groups for each match. You can think of jreExtract() as roughly equivalent to:

jreMatchGroups( ... ).first() ?: {}

It still captures groups; but, instead of collecting every match, it captures only the first one, or returns an empty Struct if no match can be found. This provides an elegant way to break a String up into various parts using a Regular Expression pattern without having to deal with arrays.
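
To make that equivalence concrete, here is a minimal sketch of how a first-match extraction could be built directly on Java's java.util.regex classes. It is not the actual JRegEx.cfc source: the function name extractFirstMatch() is hypothetical, and returning an empty string for a group that didn't participate in the match is an assumption made for illustration.

```cfml
<cfscript>

	/**
	* HYPOTHETICAL sketch (not the JRegEx.cfc implementation): I return a struct of the
	* captured groups from the FIRST match only, keyed by group index.
	*/
	public struct function extractFirstMatch(
		required string targetText,
		required string patternText
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( patternText )
			.matcher( targetText )
		;
		var groups = {};

		// Only the first match is inspected. No match yields an empty struct.
		if ( ! matcher.find() ) {

			return groups;

		}

		// Group 0 is the entire match. Groups 1..N are the captured groups.
		for ( var i = 0 ; i <= matcher.groupCount() ; i++ ) {

			var value = matcher.group( javaCast( "int", i ) );
			// ASSUMPTION: a group that didn't participate (null) becomes an empty string.
			groups[ i ] = isNull( value ) ? "" : value;

		}

		return groups;

	}

	parts = extractFirstMatch( "2022-06-19", "(\d{4})-(\d{2})-(\d{2})" );
	writeDump( parts );

</cfscript>
```

In this sketch, parts[ 0 ] holds the whole match ("2022-06-19"), while parts[ 1 ] through parts[ 3 ] hold "2022", "06", and "19". Because only the first match is ever read, the caller gets a single Struct back and never has to reach into an array.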

In the following ColdFusion demo, I’m going to use the jreExtract() method to try to parse the parts of a URL into separate captured groups. For the sake of the dump() calls, I’m going to map the captured group indices onto human-friendly names:

<cfscript>
	jre = new JRegEx();
	// The following pattern uses a VERBOSE Regular Expression flag to allow for comments
	// and whitespace to make the pattern easier to read. This pattern attempts to capture
	// the aspects of an HTTP URL.
	pattern = "(?x)^
		## Protocol extraction.
		( https?:// | // )?
		## Hostname extraction.
		( [^./][^/\##?]+ )?
		## Pathname extraction (must start with `./` or `/`).
		( \./[^?\##]* | /[^?\##]* )?
		## Query-string extraction (`?` is not captured).
		(?: \? ( [^\##]* ) )?
		## Fragment extraction (`##` is not captured).
		(?: \## ( .* ) )?
	";
	extractGroups( "/index.htm" );
	extractGroups( "./index.htm" );
	extractGroups( "www.bennadel.com" );
	extractGroups( "//www.bennadel.com/index.htm" );
	extractGroups( "https://www.bennadel.com##cool-beans" );
	extractGroups( "https://www.bennadel.com/about/about-ben-nadel.htm?source=Google##h1" );
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I apply the Java Regular Expression to the given target text using the jreExtract()
	* method. The captured groups are mapped to human-friendly names for the demo dump.
	*/
	public void function extractGroups( required string targetText ) {
		var extraction = jre.jreExtract( targetText, pattern );
		dump(
			label = "Input: #targetText#",
			var = [
				match: extraction[ 0 ],
				protocol: extraction[ 1 ],
				hostname: extraction[ 2 ],
				pathname: extraction[ 3 ],
				queryString: extraction[ 4 ],
				fragment: extraction[ 5 ]
			]
		);
	}
</cfscript>

As you can see, I’m using a verbose Regular Expression here (the leading (?x) flag) to allow for non-matching whitespace and comments within the pattern text. This just makes the pattern easier to read, understand, and debug. In this demo, I’m trying to extract parts of various URL inputs. And, when we run this ColdFusion code, we get the following output:

As you can see, we were able to extract portions of the URL into captured groups within the Struct returned from the jreExtract() method.

ASIDE: As I’m writing this, I’m realizing how luxurious it is to be able to map captured group indices onto “named” groups. I might try to build that concept into the JRegEx.cfc library itself.
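
As a rough illustration of that idea, the index-to-name mapping from the demo could be pulled out into a small helper. This is only a sketch: nameGroups() is a hypothetical function that is not part of JRegEx.cfc, and it assumes the extraction Struct is keyed by group index, with 0 being the whole match.

```cfml
<cfscript>

	/**
	* HYPOTHETICAL helper (not part of JRegEx.cfc): I map numeric group indices onto
	* names. The first name labels group 0 (the whole match), the second labels group 1,
	* and so on.
	*/
	public struct function nameGroups(
		required struct extraction,
		required array names
		) {

		// Ordered struct so the dump lists the groups in pattern order.
		var named = [:];

		for ( var i = 1 ; i <= arrayLen( names ) ; i++ ) {

			// ColdFusion arrays are 1-based, but group indices start at 0.
			var groupIndex = ( i - 1 );

			// ASSUMPTION: a missing group (such as a failed match) becomes an empty string.
			named[ names[ i ] ] = structKeyExists( extraction, groupIndex )
				? extraction[ groupIndex ]
				: ""
			;

		}

		return named;

	}

	// A hand-built struct stands in for a jreExtract() result here.
	extraction = { "0": "a=b", "1": "a", "2": "b" };

	writeDump( nameGroups( extraction, [ "match", "key", "value" ] ) );

</cfscript>
```

Passing the names as an ordered array keeps the helper independent of any particular pattern: the caller lists the names in the same order that the groups appear in the Regular Expression.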

I seriously love Regular Expressions. And, using them in ColdFusion is such a treat. Having this jreExtract() function in my back pocket is definitely going to make certain use-cases just a little bit easier to deal with.


Source: www.bennadel.com