
Index Out Of Bounds For Length

Ben Nadel
Published: October 14, 2023

Over on my Feature Flags Book site, I’m starting to move some of the content behind a pay-wall; and, to do this, I’m using jSoup to replace multiple content paragraphs with a single purchase notice paragraph within designated chapters. However, in my first approach to this algorithm, I was getting the following jSoup error:

Index 1 out of bounds for length 0

The error isn’t terribly helpful; but, I believe what’s happening here is that when I remove elements from the jSoup DOM (Document Object Model) using an .empty() call, jSoup doesn’t break the parent-child relationship with the removed elements. That stale relationship then causes an issue when I go to re-append one of the removed elements back into the same parent.

I can reproduce this error with a simple jSoup demo using this HTML document:

<body>
	<p>jSoup + ColdFusion = Noice!</p>
</body>

To reproduce the error with ColdFusion (Lucee CFML), I’m going to .empty() the body and then re-append the single p element:

<cfscript>
	body = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( fileRead( "./content.htm" ) )
		.body()
	;
	paragraph = body.firstElementChild();
	// Remove all the children from the BODY and then try to re-add the paragraph.
	body
		.empty()
		.appendChild( paragraph )
	;
	// Output resultant HTML to the page.
	echo( body.outerHtml() );
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {
		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];
		return( createObject( "java", className, jarPaths ) );
	}
</cfscript>

And, when we run this ColdFusion code, we get the following error:

Index 1 out of bounds for length 0

For anyone Googling to get here, this is the stacktrace that I get:

lucee.runtime.exp.NativeException: Index 1 out of bounds for length 0
  at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
  at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
  at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
  at java.base/java.util.Objects.checkIndex(Objects.java:372)
  at java.base/java.util.ArrayList.remove(ArrayList.java:536)
  at org.jsoup.helper.ChangeNotifyingArrayList.remove(ChangeNotifyingArrayList.java:37)
  at org.jsoup.nodes.Node.removeChild(Node.java:504)
  at org.jsoup.nodes.Node.setParentNode(Node.java:482)
  at org.jsoup.nodes.Node.reparentChild(Node.java:563)
  at org.jsoup.nodes.Element.appendChild(Element.java:577)
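To make the stack trace a little more concrete, here’s a deliberately simplified Java model of the parent/child bookkeeping it implies. The Node class, its siblingIndex field, and the reproduce() helper are hypothetical stand-ins, not jSoup’s actual source. I’m also assuming that .empty() only clears the parent’s child list and leaves each child’s parent reference in place. That is consistent with the removeChild() frame above, but it is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately simplified, hypothetical model of parent/child bookkeeping.
// It is NOT jSoup's source; it only mimics the call path in the stack trace.
class StaleParentDemo {

    static class Node {
        final String name;
        Node parent;       // back-reference to the parent node
        int siblingIndex;  // cached position inside parent.children
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Assumed empty() behavior: the child list is cleared, but each
        // former child keeps its (now stale) parent and siblingIndex.
        Node empty() {
            children.clear();
            return this;
        }

        // Detach a child by its cached index, then clear its back-reference.
        void removeChild(Node child) {
            children.remove(child.siblingIndex); // List.remove(int index)
            reindex();
            child.parent = null;
        }

        // Re-parenting first detaches the child from the parent it still
        // points at (cf. setParentNode -> removeChild in the stack trace).
        Node appendChild(Node child) {
            if (child.parent != null) {
                child.parent.removeChild(child);
            }
            child.parent = this;
            children.add(child);
            reindex();
            return this;
        }

        private void reindex() {
            for (int i = 0; i < children.size(); i++) {
                children.get(i).siblingIndex = i;
            }
        }
    }

    // body = [whitespace text, p, whitespace text]; empty it, re-append p.
    static String reproduce() {
        Node body = new Node("body");
        Node p = new Node("p");
        body.appendChild(new Node("#text")).appendChild(p).appendChild(new Node("#text"));
        try {
            body.empty().appendChild(p);
            return "no error";
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

On a recent JDK, running main() should print the same "Index 1 out of bounds for length 0" message shown earlier. ArrayList.remove(int) performs the same Objects.checkIndex() call that appears in the stack trace, and the paragraph’s cached index is 1 while the emptied list has length 0.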

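Reading the stack trace from the bottom up, .appendChild() calls .reparentChild(), which calls .setParentNode(). That method tries to detach the p element from the parent it still references by calling .removeChild() on that parent. But, .empty() has already cleared the body’s list of child nodes. So, removing the paragraph’s old position fails against a list of length 0. The old position is index 1, presumably because a whitespace text node sat at index 0.

To fix this error, we need to call .remove() on the p element before we try to re-append it to the body: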

<cfscript>
	body = javaNew( "org.jsoup.Jsoup" )
		.parseBodyFragment( fileRead( "./content.htm" ) )
		.body()
	;
	paragraph = body.firstElementChild();
	// In order to re-append the paragraph back into the document, we have to first BREAK
	// THE PARENT RELATIONSHIP to the body. We can do that by calling remove() on the
	// paragraph itself.
	paragraph.remove();
	// Remove all the children from the BODY and then try to re-add the paragraph.
	body
		.empty() // Remove any remaining non-element nodes (e.g., whitespace text, comments).
		.appendChild( paragraph )
	;
	// Output resultant HTML to the page.
	echo( body.outerHtml() );
	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //
	/**
	* I create a new Java class wrapper using the jSoup JAR files.
	*/
	public any function javaNew( required string className ) {
		var jarPaths = [
			expandPath( "./jsoup-1.16.1.jar" )
		];
		return( createObject( "java", className, jarPaths ) );
	}
</cfscript>

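The only difference in this version of the code is that I’m calling paragraph.remove() before adding the node back into the DOM. At that point, the paragraph is still in the body’s child list, so the removal succeeds and the paragraph no longer holds a reference to the body. The subsequent .appendChild() call then has no stale parent to detach from. In other words, .remove() properly breaks the parent-child relationship in a way that calling .empty() does not.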
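To illustrate why calling .remove() first helps, here’s a deliberately simplified Java model of the bookkeeping. It uses a hypothetical Node class with a parent reference and a cached siblingIndex; this is not jSoup’s actual source. Its empty() only clears the child list, which is my assumption about the real method. Its remove() detaches the node through its parent and then nulls out the node’s back-reference. The fixedFlow() helper is a made-up name that replays the fixed CFML flow.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of parent/child bookkeeping (not jSoup's code).
class RemoveFirstDemo {

    static class Node {
        final String name;
        Node parent;       // back-reference to the parent node
        int siblingIndex;  // cached position inside parent.children
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Assumed empty() behavior: only the child list is cleared.
        Node empty() {
            children.clear();
            return this;
        }

        // Detach this node from its parent; removeChild() clears the back-reference.
        void remove() {
            if (parent != null) {
                parent.removeChild(this);
            }
        }

        void removeChild(Node child) {
            children.remove(child.siblingIndex); // List.remove(int index)
            reindex();
            child.parent = null;
        }

        // Re-parenting detaches the child from any parent it still references.
        Node appendChild(Node child) {
            if (child.parent != null) {
                child.parent.removeChild(child);
            }
            child.parent = this;
            children.add(child);
            reindex();
            return this;
        }

        private void reindex() {
            for (int i = 0; i < children.size(); i++) {
                children.get(i).siblingIndex = i;
            }
        }

        List<String> childNames() {
            List<String> names = new ArrayList<>();
            for (Node c : children) {
                names.add(c.name);
            }
            return names;
        }
    }

    // Same setup as the failing case, but p.remove() runs before empty().
    static List<String> fixedFlow() {
        Node body = new Node("body");
        Node p = new Node("p");
        body.appendChild(new Node("#text")).appendChild(p).appendChild(new Node("#text"));

        p.remove();                  // body is now [#text, #text]; p.parent is null
        body.empty().appendChild(p); // nothing stale to detach, so this succeeds
        return body.childNames();
    }

    public static void main(String[] args) {
        System.out.println(fixedFlow()); // [p]
    }
}
```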

ASIDE: Some jSoup methods, like .children(), return a collection of Element nodes called Elements (which extends Java’s ArrayList). This collection has its own .remove() method that calls .remove() on each of the nodes it contains.

I don’t know enough about jSoup (or the intention behind these methods) to call this a “bug”; but, I will say that it seems unexpected to me. In fact, I would expect an .empty() method to be little more than shorthand for looping over all of the child nodes and calling .remove() on each of them in turn.
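Here’s a hedged sketch of the .empty() I’d expect, using a hypothetical, simplified Java Node model rather than jSoup’s real classes. The emptyByRemoving() method (a made-up name) loops over the children and calls remove() on each one. That is essentially what the Elements.remove() aside describes. Because every child’s parent reference gets cleared, re-appending the paragraph afterward has nothing stale to detach.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model (not jSoup's implementation) of an "empty"
// operation that removes each child individually.
class EmptyByRemovingDemo {

    static class Node {
        final String name;
        Node parent;       // back-reference to the parent node
        int siblingIndex;  // cached position inside parent.children
        final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        // Detach this node from its parent; removeChild() clears the back-reference.
        void remove() {
            if (parent != null) {
                parent.removeChild(this);
            }
        }

        void removeChild(Node child) {
            children.remove(child.siblingIndex); // List.remove(int index)
            reindex();
            child.parent = null;
        }

        // Iterate over a copy, because remove() mutates the live child list.
        Node emptyByRemoving() {
            for (Node child : new ArrayList<>(children)) {
                child.remove();
            }
            return this;
        }

        // Re-parenting detaches the child from any parent it still references.
        Node appendChild(Node child) {
            if (child.parent != null) {
                child.parent.removeChild(child);
            }
            child.parent = this;
            children.add(child);
            reindex();
            return this;
        }

        private void reindex() {
            for (int i = 0; i < children.size(); i++) {
                children.get(i).siblingIndex = i;
            }
        }
    }

    // True when every original child was detached and only p was re-attached.
    static boolean run() {
        Node body = new Node("body");
        Node before = new Node("#text");
        Node p = new Node("p");
        Node after = new Node("#text");
        body.appendChild(before).appendChild(p).appendChild(after);

        body.emptyByRemoving().appendChild(p);

        return before.parent == null
            && after.parent == null
            && p.parent == body
            && body.children.size() == 1
            && body.children.get(0) == p;
    }

    public static void main(String[] args) {
        System.out.println(run()); // true
    }
}
```

Iterating over a copy matters in this design. Looping over the live list with an index would skip every other child, and a for-each over the live list would throw a ConcurrentModificationException.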

Source: www.bennadel.com