Renato Boemer

Summary

The web content provides an overview of how to use glob patterns and wildcards in the command line to efficiently match and manipulate filenames, particularly useful for data scientists and those familiar with Unix-like operating systems.

Abstract

The article focuses on the utility of command-line interfaces, specifically for Unix-like systems, and introduces glob patterns and wildcards as powerful tools for managing files. It emphasizes the inefficiency of manually specifying each file when performing operations like copying, and instead suggests using wildcards such as ? for single characters and * for strings of characters. The article also explains the use of character classes and escape characters to refine searches. By mastering these techniques, users can save time and streamline their workflow in the command line, avoiding repetitive tasks and making the most of the terminal's capabilities.

Opinions

  • The author believes that understanding glob patterns and wildcards is essential for those who frequently use the command line, especially data scientists.
  • The article conveys that using wildcards can significantly reduce the time spent on file management tasks.
  • It is suggested that the command line, equipped with the knowledge of wildcards, can be more efficient than graphical user interfaces for certain tasks.
  • The author implies that the command line's syntax, while initially daunting, can be mastered with practice and understanding of wildcards.
  • There is an underlying assumption that the reader may be apprehensive about using the command line and wildcards, which the author addresses by explaining their benefits and providing examples.
  • The article encourages the use of wildcards with caution, especially with commands that can alter or delete files, advising readers to test their patterns with ls before running those commands.

Save Time Using the Command-Line: Glob Patterns and Wildcards

Match filenames using patterns

Photo by CHUTTERSNAP on Unsplash

The most common desktop operating systems today are Windows, Linux, and OS X (now called macOS), and they all come with a terminal, a program that gives you access to a shell. Linux and OS X are Unix-like operating systems. Because these systems are frequently used by data scientists, I will focus on Bash, a Unix shell [1]. So, let’s get started.

If the only way you know to copy, create, and find files on the command line is to pass each filename to commands like cp one by one (or to avoid the command line altogether), then keep reading. Wildcards are special characters used to build patterns that match groups of filenames. These patterns are called glob patterns. They resemble regular expressions (aka regex), but they follow different rules. The good news is that, combined with other commands, glob patterns spare you from typing the same command over and over.

Why You Should Care

Suppose you have to copy hundreds of files on the command line. Passing each filename as a separate argument to cp is definitely not the best use of your time, and many people end up spending a lot of time looking for a workaround rather than learning the command-line syntax. Fortunately, the shell lets you specify groups of files with patterns that match filenames. You build these glob patterns from special characters (wildcards) and standard characters (letters and numbers).
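
To make the scenario concrete, here is a minimal Bash sketch. It assumes a scratch directory containing 200 hypothetical files named data_1.csv to data_200.csv plus one unrelated notes.txt, and it uses the * wildcard (covered in detail later in this article) so that a single pattern replaces 200 typed filenames.

```shell
#!/usr/bin/env bash
# Scratch directory so the example cannot touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir backup

# Hypothetical data: 200 CSV files and one file we do not want to copy.
for ((i = 1; i <= 200; i++)); do
  touch "data_$i.csv"
done
touch notes.txt

# The shell expands data_*.csv into all 200 names before cp runs.
cp data_*.csv backup/

# Count what arrived in backup/ by expanding another pattern into an array.
copied=( backup/*.csv )
echo "copied ${#copied[@]} files"   # copied 200 files
```

The same cp line works whether there are 2 or 2,000 matching files, because the shell, not cp, builds the argument list.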

How To Use Glob Patterns and Wildcards

The sections below build up your knowledge of glob patterns and wildcards step by step. After some practice, you will be able to match any character, string, or number in a filename instead of avoiding the command line. This is how you match:

1. A single character

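The wildcard ? matches exactly one character. For example, the pattern ?its matches any four-character filename that ends in its, such as hits, fits, or kits. Because ? stands for a single character only, ?its does not match names with an extension such as hits.jpg, fits.png, or kits.docx; to match those, use ?its.* (the * wildcard is explained in the next section).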
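
A small Bash sketch can confirm the difference. It assumes a throwaway directory with a few hypothetical filenames and stores each expansion in an array, so you can see exactly what the shell would hand to ls.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # fixed locale so the order of matches is predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical filenames for illustration.
touch hits fits kits its spits hits.jpg fits.png kits.docx

# ? stands for exactly one character, so ?its only matches four-character names.
four_chars=( ?its )
echo "?its   -> ${four_chars[*]}"    # ?its   -> fits hits kits

# Appending .* lets the same names carry any extension.
with_ext=( ?its.* )
echo "?its.* -> ${with_ext[*]}"      # ?its.* -> fits.png hits.jpg kits.docx
```

Note that its (three characters) and spits (five characters) match neither pattern.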

Below is an example with the ls command, which lists the contents of a directory. The pattern b??t matches any four-character name that starts with b and ends with t. In this directory, the only such name is best, so that is what ls prints.

/home/a_folder$ ls b??t

best

However, you might be wondering what happens when a filename contains the character ? itself. As with regular expressions, you can use a character’s literal meaning (as opposed to its special meaning) by placing a backslash \ directly before it: the pattern why\? matches only a file literally named why?. In these circumstances, the backslash is called an escape character [2].
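
The sketch below, again using hypothetical filenames in a throwaway directory, contrasts an unescaped ? with an escaped one. Wrapping the name in single quotes ('why?') would also switch the wildcard off.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # fixed locale so the order of matches is predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files: the first name really contains a question mark.
touch 'why?' whys whyz

# Unescaped, ? is a wildcard that matches any single fourth character.
wild=( why? )
echo "${wild[*]}"       # why? whys whyz

# Escaped with a backslash, \? matches only a literal question mark.
literal=( why\? )
echo "${literal[*]}"    # why?
```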

2. A string of characters

The wildcard * matches any string of characters of any length, including the empty string. That string can contain spaces, so * can match filenames made of several words. There are two exceptions: when matching filenames, * does not match a dot (.) at the start of a name, so hidden files are skipped, and it does not match the slash (/) that separates directories.
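
Here is a minimal sketch of those rules, assuming a scratch directory with hypothetical files, one of which has a space in its name and one of which is hidden.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # fixed locale so the order of matches is predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files: one name contains a space, one file is hidden.
touch report 'report final' report_v2 .report_hidden

# report* matches "report" followed by anything: nothing at all,
# a space and another word, or an underscore and a version tag.
matches=( report* )
printf '%s\n' "${matches[@]}"
# report
# report final
# report_v2

# The hidden file only matches when the leading dot is written explicitly.
hidden=( .report* )
printf '%s\n' "${hidden[@]}"
# .report_hidden
```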

Passing * as an argument to ls lists all non-hidden files and directories in the working directory; for any directory among the matches, ls lists that directory’s contents rather than just its name. Here is what it looks like:

/home/a_folder$ ls *

file_A file_B Pic_C

What happens in the background?

  • Before ls runs, the shell checks each file and directory name in the working directory against *. Every name matches except those of hidden files (names that begin with .).
  • The matching names are passed as arguments to ls. Since the matched names are file_A, file_B, and Pic_C, running ls * in this instance is the same as running ls file_A file_B Pic_C.
  • To learn how to make output like this easier to read, see Prettyprint [3].
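
You can watch this expansion happen yourself. In the sketch below (a throwaway directory recreating the example names, plus a hypothetical hidden file), printf prints each argument the shell produced from * on its own line. The order may differ from the listing above, since the shell sorts matches according to your locale.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the example directory, plus a hypothetical hidden file.
touch file_A file_B Pic_C .hidden

# printf receives exactly the arguments the shell produced from *,
# which are the same arguments `ls *` would receive.
printf '%s\n' *

# Capturing the expansion shows three names; .hidden is not among them.
args=( * )
echo "ls * would receive ${#args[@]} arguments"
```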

More importantly, you can combine wildcards with ordinary characters to form more specific patterns, much as you would with regular expressions. You do this by concatenating wildcards and other characters so that the pattern describes the names you are searching for.

Here is an example: suppose you want to list all the files (or directory contents) in the working directory whose names end in ics. You can do this by running ls *ics.

/home/learn$ ls *ics

Analytics Statistics

The pattern *ics is the concatenation of the wildcard * and the literal characters ics. Here, * matched Statist (in Statistics) and Analyt (in Analytics), and ics matched the last three characters of both names. No other names were listed because no other file or directory names in the working directory end in ics.
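
The following sketch reproduces the example in a scratch directory. Statistics and Analytics come from the example above; Economics_notes.txt and ics_intro are hypothetical extras that show what *ics does not match, and *ics* is added for contrast.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # fixed locale so the order of matches is predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Statistics and Analytics come from the example; the other two are hypothetical.
touch Statistics Analytics Economics_notes.txt ics_intro

# *ics: any prefix (possibly empty), then the literal letters "ics" at the very end.
ends_in_ics=( *ics )
echo "${ends_in_ics[*]}"     # Analytics Statistics

# For contrast, *ics* accepts "ics" anywhere in the name.
contains_ics=( *ics* )
echo "${contains_ics[*]}"    # Analytics Economics_notes.txt Statistics ics_intro
```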

3. Letters, numbers or both

You can also use character classes such as [[:alpha:]] (letters), [[:digit:]] (the digits 0 to 9), [[:lower:]] (lowercase letters), [[:upper:]] (uppercase letters), and [[:alnum:]] (both letters and digits).

Photo by Marcos Ferreira on Unsplash

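Square brackets on their own form a wildcard that matches exactly one character from the set listed inside them; for example, [abc] matches a single a, b, or c. A character class such as [:alpha:] is a named set that is valid only inside these brackets, which is why it appears wrapped in a second pair: [[:alpha:]] matches one letter, just as ? matches one character of any kind. Without the outer brackets, the shell reads [:alpha:] as an ordinary bracket expression that matches a single :, a, l, p, or h. Here are some examples: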

  • To list all files (and the contents of directories) in the working directory whose names end in a dot followed by exactly three lowercase letters (such as .jpg or .pdf), we can run ls *.[[:lower:]][[:lower:]][[:lower:]].
  • To list all files (and the contents of directories) in the working directory whose names do not start with an uppercase letter and end with a digit, we can run ls [![:upper:]]*[[:digit:]]. The exclamation mark ! at the start of the brackets means not, so [![:upper:]] matches any single character that is not an uppercase letter.
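
The sketch below runs both patterns against hypothetical files in a scratch directory, and also demonstrates what happens when the outer brackets are left out. It sets LC_ALL=C, an assumption made only so that the sort order of the matches is predictable.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # fixed locale so sorting and character classes are predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files for illustration.
touch photo.jpg notes.pdf data.csv final3.txt scan.JPG archive.gz Report1 draft2 a l

# A dot followed by exactly three lowercase letters at the end of the name.
three_lower=( *.[[:lower:]][[:lower:]][[:lower:]] )
echo "${three_lower[*]}"       # data.csv final3.txt notes.pdf photo.jpg

# First character is not uppercase, last character is a digit.
no_upper_digit=( [![:upper:]]*[[:digit:]] )
echo "${no_upper_digit[*]}"    # draft2

# Pitfall: without the outer brackets, [:alpha:] is an ordinary bracket
# expression matching ONE character from the set { : a l p h }.
pitfall=( [:alpha:] )
echo "${pitfall[*]}"           # a l
```

scan.JPG and archive.gz fail the first pattern (uppercase letters, and only two letters, respectively), while Report1 fails the second because it starts with an uppercase letter.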

Although this article only shows wildcards with the ls command, wildcards work with practically any command, because the shell expands them before the command runs. That includes commands you are probably familiar with, such as cp, mv, rm, and rmdir.

Just be extra careful when using wildcards with commands like rm, cp, and mv: a pattern that matches more than you expected can delete or overwrite files. Before using any filesystem-altering command with a wildcard, check that the pattern matches exactly what you intend by trying it with ls first.
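
A cautious workflow might look like the following sketch, which uses hypothetical .tmp.log files in a scratch directory. The -- argument tells ls and rm that no options follow, so a filename starting with - cannot be mistaken for a flag.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical files: two temporary logs to delete and one log to keep.
touch run1.tmp.log run2.tmp.log important.log

# Step 1: preview. ls shows exactly which names the pattern expands to.
ls -- *.tmp.log

# Optional dry run: echo prints the fully expanded command without running it.
echo rm -- *.tmp.log          # rm -- run1.tmp.log run2.tmp.log

# Step 2: only after checking the preview, run the destructive command.
rm -- *.tmp.log

remaining=( * )
echo "left: ${remaining[*]}"  # left: important.log
```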

Conclusion

Wildcards allow us to create patterns, called glob patterns, that match groups of filenames. Glob patterns work like regular expressions, but with different rules. Combined with other commands, they can make your life easier when using the command line or terminal. Here is a summary to help you get familiar with wildcards:

Table replicated by the author.

Resources:

[1] Bash: https://tiswww.case.edu/php/chet/bash/bashtop.html

[2] Escape Character: https://en.wikipedia.org/wiki/Escape_character

[3] Prettyprint: https://en.wikipedia.org/wiki/Prettyprint
