At work, I’ve been building a way to generate “placeholder” images using a fragment of the DOM (Document Object Model). And, up until now, I’ve been using the
.measureText() method, available on the Canvas 2D rendering context, to programmatically wrap lines-of-text onto a
<canvas> element. But, this approach has proven to be a bit “glitchy” on the edges. As such, I wanted to see if I could find a way to detect the rendered line breaks in a text node of the document, regardless of what the text in the markup looked like. Then, I could more easily render the lines of text to the
<canvas> element. It turns out, the Range object can help with this.
ASIDE: As a quick note, I’m actually trying to recreate a very tiny fraction of what the html2canvas library by Niklas von Hertzen already does. But, as stated in his own README, the html2canvas library should not be used in a production application. As such, I wanted to try and create something over which I had full control (and responsibility).
The Range object represents some fragment of the page. This can contain a series of nodes or part of a text node. What’s really cool about the Range object is that it can return a collection of bounding boxes that represent the visual rendering of the items within the range.
I actually looked at the
Range object once before when drawing boxes around selected text. I didn’t really have a use-case for that exploration at the time; but, performing that experiment four years ago allowed me to see a path forward in my current problem.
If I have a text-node in the DOM, and I create a
Range for the contents of that text-node, the
.getClientRects() method, on the
Range, will return the bounding box for each line of text as it is rendered for the user. Now, this doesn’t inherently tell me which chunk of text is on which rendered line; but, it gives us a way to do that with a little brute-force magic.
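To make that concrete, here is a minimal sketch of collecting those bounding boxes for a text node. The helper name getRenderedRects() is hypothetical; document.createRange(), selectNodeContents(), and getClientRects() are standard DOM methods. The sketch assumes the browser reports one box per rendered line, which is the behavior this post relies on but which may not hold for every layout.

```javascript
// Hypothetical helper: returns the bounding boxes that the browser reports
// for the rendered contents of the given text node.
function getRenderedRects( textNode ) {

	var range = document.createRange();

	// Select the full contents of the text node.
	range.selectNodeContents( textNode );

	// getClientRects() returns a DOMRectList; copy it into a plain Array so
	// that it is easier to inspect.
	return Array.from( range.getClientRects() );

}
```

If the assumption holds, the length of the returned Array is the number of rendered lines.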
Imagine a Range that has a single character in it – the first character in our text-node. This
Range will only have a single bounding box. Now, what if we add the second character to that
Range and examine the bounding boxes? If there is still a single bounding box, we can deduce that the second character is in the first line of text. But, if we now have two bounding boxes, we can deduce that the second character belongs in the second line of text.
Extending this, if we incrementally expand the contents of a
Range, one character at a time, the last added character will always be in the last line of text. And, we can determine the “index” of that last line of text by using the current count of the bounding boxes.
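That deduction can be sketched without touching the DOM at all. In the following, rectCounts[ i ] is assumed to be the number of bounding boxes reported when the Range contains the first ( i + 1 ) characters. Both the groupByRectCount() name and the sample counts are made up for illustration.

```javascript
// Hypothetical helper: given the text and, for each prefix of the text, the
// number of bounding boxes reported, group the characters into lines.
function groupByRectCount( text, rectCounts ) {

	var lines = [];

	for ( var i = 0 ; i < text.length ; i++ ) {

		// The just-added character is always on the last rendered line, whose
		// zero-based index is one less than the current bounding box count.
		var lineIndex = ( rectCounts[ i ] - 1 );

		if ( lines[ lineIndex ] === undefined ) {

			lines[ lineIndex ] = "";

		}

		lines[ lineIndex ] += text.charAt( i );

	}

	return lines;

}

// Made-up counts: the box count ticks up when "d" wraps and again when "f"
// wraps, so the characters group into "abc", "de", and "f".
console.log( groupByRectCount( "abcdef", [ 1, 1, 1, 2, 2, 3 ] ) );
```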
This is definitely brute force and is probably going to be slow on very large chunks of text. But, for a single paragraph on a desktop computer, this brute force approach feels instantaneous.
Let’s see this in action. In the following demo, I have a text node with some static text in it. When you click the button, I examine the text node and brute force extract the rendered lines of text and log them to the console. The method of note here is called
extractLinesFromTextNode() – this is where we dynamically extend the
Range to identify the text wrapping:
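A minimal sketch of such a function follows. It illustrates the approach described in this post rather than reproducing the exact demo code, and it assumes a browser environment. The Math.max() guard for an empty list of bounding boxes is an addition of this sketch.

```javascript
// Sketch of extractLinesFromTextNode(). Assumes a browser environment in
// which document.createRange() is available.
function extractLinesFromTextNode( textNode ) {

	var textContent = textNode.textContent;
	var range = document.createRange();
	var lines = [];

	// The Range always starts at the first character of the text node.
	range.setStart( textNode, 0 );

	for ( var i = 0 ; i < textContent.length ; i++ ) {

		// Extend the end of the Range to include one more character.
		range.setEnd( textNode, ( i + 1 ) );

		// The just-added character lives on the last rendered line. The guard
		// (an assumption of this sketch) covers the case in which no bounding
		// boxes are reported, such as for text that is not rendered.
		var lineIndex = Math.max( 0, ( range.getClientRects().length - 1 ) );

		if ( ! lines[ lineIndex ] ) {

			lines[ lineIndex ] = [];

		}

		lines[ lineIndex ].push( textContent.charAt( i ) );

	}

	// Collapse each character buffer down into a single String.
	return lines.map(
		function( characters ) {

			return characters.join( "" );

		}
	);

}
```

In a page, this might be called with the text node inside a paragraph, such as extractLinesFromTextNode( document.querySelector( "p" ).firstChild ), where the selector is hypothetical and the paragraph is assumed to contain a single text node.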
As you can see, we’re looping over the characters in our text-node, adding each one to the
Range in sequence. Then, after each character has been added, we look at the current number of bounding boxes in order to determine which line of text contains the just-added character:
var lineIndex = ( range.getClientRects().length - 1 );
At the end of the brute-forcing, we have a two-dimensional array of characters in which the first dimension is this
lineIndex value. Then, we simply collapse each character buffer (
Array) down into a single
String and we have our lines of text:
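To illustrate the shape of the data, here is a small sketch of that final collapse. The character buffers are made up for the example; in practice, they would be populated by the loop over the Range.

```javascript
// Made-up character buffers in which the first dimension is the lineIndex.
var buffers = [
	[ "H", "e", "l", "l", "o", " " ],
	[ "w", "o", "r", "l", "d" ]
];

// Collapse each Array of characters down into a single String.
var lines = buffers.map(
	function( characters ) {

		return characters.join( "" );

	}
);

console.log( lines );
```

Note that the white-space at the end of a wrapped line stays with that line, since it was added to the Range before the wrap occurred.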
As you can see, we took a text-node from the DOM, which has no inherent line-breaks or text-wrapping, and used the
Range object to determine which substrings of that text-node were on which lines (as seen by the user).
This works on my Chrome, Firefox, Edge, and Safari (though, I had to normalize the white-space in the text-content in order for Safari to work consistently with the modern browsers). And, of course, this is for a text node only. Meaning, this approach wasn’t designed to work with an
Element node that might contain mixed-content (such as formatting elements). But, such a constraint is sufficient for my particular use-case.
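The white-space normalization mentioned above could look something like the following sketch. It assumes that “normalizing” means trimming the text and collapsing each run of white-space (spaces, tabs, line breaks) into a single space. The normalizeWhitespace() name is hypothetical.

```javascript
// Hypothetical helper: trims the text and collapses each run of white-space
// characters into a single space.
function normalizeWhitespace( text ) {

	return text.trim().replace( /\s+/g, " " );

}

// In the browser, this would be applied to the text node before measuring:
// textNode.textContent = normalizeWhitespace( textNode.textContent );
```

Collapsing the white-space up front means that each character in the text-content corresponds to something that is visually rendered, which keeps the character offsets and the bounding boxes in step.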
Once I have this in production, I’d like to follow up with a more in-depth example of how I’m generating the placeholder images using the
<canvas> element. But, I’m hopeful that this approach will make it much easier to render multi-line text to that image.