Introduction
This function scans a directory and returns an array, keyed by file name, holding a chosen file attribute for every file that matches a user-provided filter string. It is handy for images, documents, and other sorts of content delivery where the naming convention is known but the directory contents are frequently added to or otherwise in flux.
Example
Let’s assume we need to locate a series of .pdf newsletters. From time to time these letters are uploaded to the web server, named with a big-endian, date-based convention (year first, then month).
Since we know each file begins with “bio_newsletter_”, we can use that as our search string, like this:
```php
$directory = '/docs/pdf/';
$pattern   = 'bio_newsletter_*.pdf';
$attribute = 'name';
$sorting   = 3; // Descending order + natural sort.

$files = dc_directory_scan($directory, $pattern, $attribute, $sorting);
```
The function then rummages through the target directory and returns an array of any matched files, giving output that looks something like this:
| Key | Value |
|---|---|
| bio_newsletter_2015_09.pdf | bio_newsletter_2015_09.pdf |
| bio_newsletter_2015_05.pdf | bio_newsletter_2015_05.pdf |
| bio_newsletter_2015_04.pdf | bio_newsletter_2015_04.pdf |
| … | |
This might look redundant, but keys are always populated with the file name so that values can be extracted by name later, and in this case the value we asked for is also the file name. You can instead request one of several other attributes (size, mtime, ctime, or atime), and that attribute is returned as the value.
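To make the key/value layout concrete, here is a minimal sketch of the same idea, kept separate from the full function in the Source section. It builds a file name => size map with glob() and filesize(), then reads one value back by name. The helper name `sketch_sizes_by_name` and the throwaway demo files are hypothetical, made up purely for illustration.

```php
<?php
// Hypothetical helper: map each matching file's name to its size in bytes.
function sketch_sizes_by_name(string $directory, string $pattern): array
{
    $sizes = [];
    // glob() may return false on error; treat that as "no matches" here.
    $paths = glob(rtrim($directory, '/\\') . DIRECTORY_SEPARATOR . $pattern) ?: [];

    foreach ($paths as $path) {
        if (is_file($path)) {
            // Key by file name so a value can be looked up by name later.
            $sizes[basename($path)] = filesize($path);
        }
    }

    return $sizes;
}

// Demo with throwaway files in the system temp directory.
$demo_dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'dc_scan_demo_' . uniqid();
mkdir($demo_dir);
file_put_contents($demo_dir . '/bio_newsletter_2015_04.pdf', 'abcd');
file_put_contents($demo_dir . '/bio_newsletter_2015_05.pdf', 'abcdefgh');
file_put_contents($demo_dir . '/notes.txt', 'ignored');

$sizes = sketch_sizes_by_name($demo_dir, 'bio_newsletter_*.pdf');
echo $sizes['bio_newsletter_2015_05.pdf'], PHP_EOL; // prints 8

// Clean up the throwaway files.
array_map('unlink', glob($demo_dir . '/*'));
rmdir($demo_dir);
```

Because the key is the file name, pulling a single value out later is a plain array lookup rather than another pass over the directory.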
If the directory doesn’t exist or is not readable, the function throws an InvalidArgumentException.
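In calling code, that means wrapping the scan in a try/catch. The sketch below is one hedged way to do it. `scan_or_fallback` is a hypothetical helper, and it takes the scan as a callable only so the snippet can run on its own; the two exception classes are the ones named in the function's docblock.

```php
<?php
// Hypothetical helper: run a scan and fall back to an empty list on failure.
function scan_or_fallback(callable $scan): array
{
    try {
        return $scan();
    } catch (InvalidArgumentException $e) {
        // Bad attribute, bad sorting flag, or unreadable directory.
        error_log('Scan rejected: ' . $e->getMessage());
    } catch (RuntimeException $e) {
        // The directory could not be read.
        error_log('Scan failed: ' . $e->getMessage());
    }

    return [];
}

// Stand-in for a real call such as dc_directory_scan('/docs/pdf/', '*.pdf').
$files = scan_or_fallback(function (): array {
    throw new InvalidArgumentException('Directory is not readable: /docs/pdf');
});

echo count($files), PHP_EOL; // prints 0
```

Returning an empty array keeps a missing directory from taking the whole page down, though whether to swallow or rethrow the error is a judgment call for each site.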
Source
I wrote this to be fully self-contained, but if you prefer, it should be easy enough to fold into a class.
```php
/*
 * Caskey, Damon V.
 * 2012-03-19
 * - Refactor for PHP 7.1+ 2016-11-08
 *
 * Directory scanning utility function. Accepts a
 * directory path, glob pattern, and sorting options,
 * and returns an associative array of matched files
 * sorted by the specified attribute.
 *
 * @param string $directory - The directory to scan.
 * @param string $pattern - The glob pattern to match files against.
 * @param string $attribute - The file attribute to return and sort by ('name', 'size', 'mtime', 'ctime', 'atime').
 * @param int $sorting - Bitfield representing the sorting flags (default: 0).
 *
 * Sorting flags (bitmask):
 * 0 = default ascending sort
 * 1 = descending sort
 * 2 = natural filename sort
 * 3 = descending natural filename sort
 *
 * @return array An associative array keyed by file name, with the selected attribute as each value, sorted by that attribute.
 * @throws InvalidArgumentException If an unsupported attribute or sorting flag is specified, or if the directory does not exist or is not readable.
 * @throws RuntimeException If glob() fails while reading the directory.
 */
function dc_directory_scan(string $directory, string $pattern, string $attribute = 'name', int $sorting = 0): array {
    /*
     * Define allowed attributes for sorting.
     * Fail if an unsupported attribute is requested
     * to prevent errors.
     */
    $allowed_attributes = ['name', 'size', 'mtime', 'ctime', 'atime'];
    if (!in_array($attribute, $allowed_attributes, true)) {
        throw new InvalidArgumentException("Unsupported attribute for sorting: " . $attribute);
    }

    /*
     * Define local sorting flags. If
     * unsupported flags are set, throw an
     * exception.
     */
    $sorting_descending = 1 << 0; // 1
    $sorting_natural = 1 << 1; // 2
    $sorting_known_flags = $sorting_descending | $sorting_natural;
    if (($sorting & ~$sorting_known_flags) !== 0) {
        throw new InvalidArgumentException("Unsupported sorting flag: " . ($sorting & ~$sorting_known_flags));
    }

    /*
     * Decode bitfields to boolean flags for
     * sorting behavior.
     */
    $descending = ($sorting & $sorting_descending) === $sorting_descending;
    $natural = ($sorting & $sorting_natural) === $sorting_natural;

    /* Normalize directory path by removing trailing slash if present. */
    $directory = rtrim($directory, '/\\');
    if ($directory === '') {
        $directory = DIRECTORY_SEPARATOR;
    }

    /*
     * Validate that the path is a readable directory before
     * attempting to scan. is_readable() alone would also
     * accept a regular file.
     */
    if (!is_dir($directory) || !is_readable($directory)) {
        throw new InvalidArgumentException("Directory does not exist or is not readable: " . $directory);
    }

    /* Use glob to find files matching the pattern in the specified directory. */
    $matches = glob($directory . DIRECTORY_SEPARATOR . $pattern);

    /* Validate glob results. */
    if ($matches === false) {
        throw new RuntimeException("Failed to read directory: " . $directory);
    }

    /* Return empty array if no matches are found. */
    if (empty($matches)) {
        return [];
    }

    /*
     * Output array from glob is a simple indexed array
     * of file paths. We need to build an associative
     * array keyed by the filename, with the selected
     * attribute as the value for sorting.
     *
     * Scan through the matched file paths and retrieve
     * the specified attribute for each file.
     */
    $result = [];
    foreach ($matches as $path) {
        /*
         * Skip entries that are not regular files.
         */
        if (!is_file($path)) {
            continue;
        }

        /*
         * Use basename to get the file name from the path
         * for the 'name' attribute.
         */
        $name = basename($path);

        /*
         * If sorting by 'name', use the file name as
         * the key and value in the result array.
         */
        if ($attribute === 'name') {
            $result[$name] = $name;
            continue;
        }

        /*
         * Use stat to retrieve file attributes for sorting
         * by 'size', 'mtime', etc.
         */
        $stat = stat($path);
        if ($stat === false || !array_key_exists($attribute, $stat)) {
            /* Skip files that cannot be stat-ed or do not have the specified attribute. */
            continue;
        }

        /*
         * Use the file name as the key and the selected
         * attribute value as the sortable value.
         */
        $result[$name] = $stat[$attribute];
    }

    /*
     * Sort the result array by the specified attribute in
     * either ascending or descending order. The natural
     * flag only affects sorting by 'name'.
     */
    if ($attribute === 'name') {
        $sort_flags = $natural ? (SORT_NATURAL | SORT_FLAG_CASE) : (SORT_STRING | SORT_FLAG_CASE);
    } else {
        $sort_flags = SORT_NUMERIC;
    }

    if ($descending) {
        arsort($result, $sort_flags);
    } else {
        asort($result, $sort_flags);
    }

    /* Return the sorted result array. */
    return $result;
}
```
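The 3 passed as `$sorting` in the earlier example is just the two sorting flags OR-ed together. As a small illustration, the sketch below gives the flags names and shows what each combination does to a list of file names. The constants `DC_SORT_DESCENDING` and `DC_SORT_NATURAL` and the helper `sketch_sort_names` are hypothetical; the function above only knows the raw integers.

```php
<?php
// Hypothetical named flags mirroring the values the function documents.
const DC_SORT_DESCENDING = 1 << 0; // 1
const DC_SORT_NATURAL    = 1 << 1; // 2

// Sort a name => name map the way the 'name' branch does for a given bitfield,
// then return the names in their sorted order.
function sketch_sort_names(array $names, int $sorting): array
{
    $result = array_combine($names, $names);
    $flags  = ($sorting & DC_SORT_NATURAL)
        ? (SORT_NATURAL | SORT_FLAG_CASE)
        : (SORT_STRING | SORT_FLAG_CASE);

    if ($sorting & DC_SORT_DESCENDING) {
        arsort($result, $flags);
    } else {
        asort($result, $flags);
    }

    return array_keys($result);
}

$names = ['issue_10.pdf', 'issue_9.pdf', 'issue_100.pdf'];

// Plain string sort compares character by character: 10, 100, 9.
print_r(sketch_sort_names($names, 0));

// Natural sort compares the digit runs as numbers: 9, 10, 100.
print_r(sketch_sort_names($names, DC_SORT_NATURAL));

// Both flags together equal 3: natural order, reversed (100, 10, 9).
print_r(sketch_sort_names($names, DC_SORT_DESCENDING | DC_SORT_NATURAL));
```

With zero-padded names such as `bio_newsletter_2015_09.pdf`, string and natural order agree, so the natural flag mostly matters when numbers in file names are not padded.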
A word of caution: directory scanning is simple and effective, but it doesn’t scale well, because every call globs the directory again and checks each match individually. A few hundred files is fine, but once you pass a few thousand it’s probably time to break your directory structure down a bit, or consider an RDBMS solution.
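As one possible way to break a directory down, files that follow the big-endian naming convention from the example could be grouped into per-year subdirectories. The sketch below only works out the target subdirectory from a file name. `sketch_year_bucket` is a hypothetical helper, and the `_YYYY_MM` suffix format is an assumption taken from the sample file names.

```php
<?php
// Hypothetical helper: derive a per-year subdirectory from a name such as
// "bio_newsletter_2015_09.pdf". Returns null when the name does not match.
function sketch_year_bucket(string $name): ?string
{
    // Assumes the name ends in _YYYY_MM followed by an extension.
    if (preg_match('/_(\d{4})_(\d{2})\.[A-Za-z0-9]+$/', $name, $m) === 1) {
        return $m[1];
    }

    return null;
}

var_dump(sketch_year_bucket('bio_newsletter_2015_09.pdf')); // string(4) "2015"
var_dump(sketch_year_bucket('readme.txt'));                 // NULL
```

Scanning a single year's subdirectory then touches only that year's files instead of the whole archive.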
Until next time!
DC
