Select Page

Parsing File Paths As Slash-Delimited Lists In MySQL

Cyberdime
Published: February 17, 2023

I’ve been working on populating a temporary MySQL database table that contains file paths of files that need to be deleted. And, since deleting data is always terrifying, I wanted to write a validation SQL statement that would take the files in the temporary table and then cross-reference them against other tables in the database. In order to do this, I needed to parse the file paths and extract various path segments. And, by using a combination of calls to SUBSTRING_INDEX(), I was able to treat each file path as a delimited list in MYSQL.

Imagine that I have a file path stored in the form of:

/path/to/images/1234.png

In order to cross-reference this file path with another database table, I need to extract the ID of the parent entity – 1234 in this case – and the filename, which is 1234.png. If you squint your eyes, you can think of a file path as being a set of nested, delimited lists. The outer list is a series of segments delimited by (/). And, the filename is an inner list which contains a series of segments delimited by (.).

Unfortunately, MySQL doesn’t have the powerful list functions that ColdFusion has. But, one of the functions that it does have is SUBSTRING_INDEX(). This function will extract prefixes and suffixes based on a given delimiter.

From the documentation:

SUBSTRING_INDEX( str, delim, count )

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.

So, basically, if the count is positive, it returns a prefix; and, if the count is negative, it returns a suffix. Let’s try applying this to our file path. Here’s extracting prefixes from the path:

SET @path="/path/to/images/12345.png";
SELECT
	( @path ) AS path,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 1 ) AS segment_1,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 2 ) AS segment_2,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 3 ) AS segment_3,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 4 ) AS segment_4,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 5 ) AS segment_5
;

As you can see, we’re using the / as the path delimiter. And, when we run this SQL, we get the following output:

mysql> SELECT
    ->     ( @path ) AS path,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 1 ) AS segment_1,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 2 ) AS segment_2,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 3 ) AS segment_3,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 4 ) AS segment_4,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", 5 ) AS segment_5
    -> \G
*************************** 1. row ***************************
     path: /path/to/images/12345.png
segment_1: 
segment_2: /path
segment_3: /path/to
segment_4: /path/to/images
segment_5: /path/to/images/12345.png

As you can see, as the count SUBSTRING_INDEX() count increases, more of our leading path segments are included in the result.

Now, let’s try extracting suffixes from the path:

SET @path="/path/to/images/12345.png";
SELECT
	( @path ) AS path,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -1 ) AS segment_negative_1,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -2 ) AS segment_negative_2,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -3 ) AS segment_negative_3,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -4 ) AS segment_negative_4,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -5 ) AS segment_negative_5
;

Again, we’re treating the path as a / delimited list; but, this time we’re using negative counts. This SQL gives us the following:

mysql> SELECT
    ->     ( @path ) AS path,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -1 ) AS segment_negative_1,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -2 ) AS segment_negative_2,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -3 ) AS segment_negative_3,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -4 ) AS segment_negative_4,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -5 ) AS segment_negative_5
    -> \G
*************************** 1. row ***************************
              path: /path/to/images/12345.png
segment_negative_1: 12345.png
segment_negative_2: images/12345.png
segment_negative_3: to/images/12345.png
segment_negative_4: path/to/images/12345.png
segment_negative_5: /path/to/images/12345.png

As you can see, as the count SUBSTRING_INDEX() count increases (negatively), more of our trailing path segments are included in the result.

And, with the first result above being the filename in our path, we can really start to see how we can get at the low-level data by passing the result of one SUBSTRING_INDEX() call in as the input for another SUBSTRING_INDEX() call:

SET @path="/path/to/images/12345.png";
SELECT
	( @path ) AS path,
	SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -1 ) AS filename,
	SUBSTRING_INDEX( SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -1 ), '.', 1 ) AS id,
	SUBSTRING_INDEX( SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -1 ), '.', -1 ) AS ext
;

In this case, we’re extracting the filename from the outer / delimited list. And then, we’re treating the filename as a . delimited list. So, when we run this SQL, we get the following output:

mysql> SELECT
    ->     ( @path ) AS path,
    ->     SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -1 ) AS filename,
    ->     SUBSTRING_INDEX( SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -1 ), '.', 1 ) AS id,
    ->     SUBSTRING_INDEX( SUBSTRING_INDEX( @path, "https://www.bennadel.com/", -1 ), '.', -1 ) AS ext
    -> \G
*************************** 1. row ***************************
    path: /path/to/images/12345.png
filename: 12345.png
      id: 12345
     ext: png

As you can see, by using a series of nested SUBSTRING_INDEX() calls, we were able to extract the filename (12345.png), the ID (12345), and the file extension (.png).

These values can then be used in an INNER JOIN clause for cross-table validation. Here’s some pseudo-code for what I was trying to do – imagine that my clean_up table contains file paths relating to the widget table:

-- We want to VALIDATE that we're not about to delete REAL DATA. As such, let's
-- check to make sure that the clean_up records have NO CORRESPONDING ROWS in
-- the widget table. We want this query to return ZERO RECORDS!!!! If it
-- returns any records, it means we're about to delete active data.
SELECT
	c.id,
	c.path
FROM
	clean_up c
INNER JOIN
	widget w
ON
	(
			-- Match the widget on the ID extracted from the path.
			w.id = SUBSTRING_INDEX( SUBSTRING_INDEX( c.path, "https://www.bennadel.com/", -1 ), '.', 1 )
		AND
			-- Make sure the filenames match as well.
			w.imageFilename = SUBSTRING_INDEX( c.path, "https://www.bennadel.com/", -1 )
	)
;

Most of my day-to-day work involves straightforward queries for data being displayed to the user. But sometimes, in maintenance and migration scripts, things get tricky. And, it’s nice to know that MySQL has some affordance for treating strings as lists.

Want to use code from this post?
Check out the license.

Source: www.bennadel.com