Save Time Using the Command-Line: Glob Patterns and Wildcards
Match filenames using patterns
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*KMeTSTz2TZNV3JbUyopl9A.jpeg)
The most common operating systems for computers nowadays are Windows, Linux, and OS X (now macOS), and they all come equipped with a terminal that runs a command-line shell. Linux and OS X are Unix-like operating systems. Because Unix-like systems are frequently used by data scientists, I will focus on Bash, which is a type of Unix shell [1]. So, let’s get started.
If the only way you know how to copy, create, and find files on the command line is by passing every filename as an argument to `cp` (or if you avoid the command line altogether), then keep reading. Wildcards are special characters used to build patterns that match groups of filenames. These patterns are called glob patterns. They resemble regular expressions (aka regex) but follow different rules. The good news is that glob patterns, combined with other commands, save you from typing repetitive commands.
Why You Should Care
Suppose you have to copy hundreds of files on the command line. Passing each file’s name as an argument to `cp` is definitely not the best use of your time. Many people would rather spend a long time hunting for an alternative than deal with command-line syntax. However, the shell gives you a way to specify groups of files by writing patterns that match filenames. You build glob patterns from both special characters (wildcards) and ordinary characters (letters and numbers).
How To Use Glob Patterns and Wildcards
Below is a list of patterns that will build your knowledge of glob patterns and wildcards. After some practice, you will be able to match any character, string, or number, and you will no longer need to avoid the command line. This is how you match:
1. A single character
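The wildcard `?` matches exactly one character, whatever it is. For example, if you want to find files whose names are a single character followed by `its`, use the pattern `?its`. It matches only four-character names such as hits, fits, or kits. A name with an extension is longer than four characters, so the pattern has to cover the extension too: `?its.???` matches hits.jpg and fits.png, while kits.docx needs `?its.????`.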
Below is an example with the `ls` command, which lists the contents of a directory. In the example below, I used the pattern `b??t`, which matches any four-character name that starts with b and ends with t. As a result, the shell returned a file called best.
/home/a_folder$ ls b??t
best
However, you might be wondering what happens when a filename contains the character `?` itself. As with regular expressions, you can give a character its literal meaning (as opposed to its special meaning) by putting a backslash `\` in front of it. Used this way, the backslash is called an escape character [2].
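The difference between a wildcard `?` and an escaped `\?` is easier to see side by side. The sketch below is only an illustration: it works in a temporary directory, and every filename in it is made up for the example.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical filenames, chosen only for this illustration.
touch best boot bat beast 'what?' whats

# Each ? stands for exactly one character, so only four-character names match.
four_letter=$(echo b??t)
echo "$four_letter"

# Unescaped, the trailing ? is a wildcard and matches both what? and whats.
unescaped=$(echo what?)
echo "$unescaped"

# Escaped with a backslash, ? is a literal question mark.
escaped=$(echo what\?)
echo "$escaped"
```

Here `bat` and `beast` are skipped by `b??t` because they are three and five characters long, and only the escaped pattern singles out the file that really has a question mark in its name.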
2. A string of characters
The wildcard `*` is a placeholder for any string of characters. It matches any character, any number of times (including zero), so a single `*` can stand in for several words, spaces included. There are two exceptions: `*` does not match the dot `.` at the start of a hidden file’s name, and it does not match the `/` that separates directories in a path.
Passing `*` as an argument to `ls` lists all non-hidden files and directories in the working directory, plus the contents of each of those directories. Here is what it looks like:
/home/a_folder$ ls *
file_A file_B Pic_C
What happens in the background?
- For each file or directory in the working directory, the shell checks whether `*` matches its name. It matches every name except those of hidden files.
- The names that match are passed as arguments to `ls`. Since the matched names are `file_A`, `file_B`, and `Pic_C`, running `ls *` in this instance is the same as running `ls file_A file_B Pic_C`.
- To make the result easier to read, you can read more about Prettyprint [3].
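The steps above can be checked directly. This is a minimal sketch that reuses the three names from the example and adds one hidden file, `.hidden_file`, which is a made-up name for illustration. It runs in a temporary directory.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Same names as the example above, plus one hypothetical hidden file.
touch file_A file_B Pic_C .hidden_file

# The shell replaces * with the matching names before ls ever runs,
# so the command receives three ordinary arguments.
set -- *
match_count=$#
echo "ls receives $match_count arguments"

# Both commands therefore print the same listing.
with_glob=$(ls *)
spelled_out=$(ls file_A file_B Pic_C)
[ "$with_glob" = "$spelled_out" ] && echo "same output"

# The hidden file is skipped by * but matched once the pattern starts with a dot.
hidden=$(echo .h*)
echo "$hidden"
```

The point is that `ls` never sees the `*` at all. The expansion is done by the shell, which is why the same patterns work with other commands.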
More importantly, you can use wildcards in conjunction with other characters to form more complex patterns, just like you would with regular expressions. This is done by concatenating wildcards with other characters to match what you are searching for.
Here is an example: Suppose you want to list all the files (or directory contents) in /home/learn with names ending in `ics`. This can be done by running `ls *ics`.
/home/learn$ ls *ics
Statistics Analytics
The pattern above, `*ics`, is the concatenation of the wildcard `*` with the literal text `ics`. The `*` matched both Statist (in Statistics) and Analyt (in Analytics), and `ics` matched the ics at the end of both filenames. Nothing else was returned because no other file or directory name ends with `ics`.
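A few edge cases of combining `*` with literal text are worth trying. The sketch below assumes Bash with its default settings and uses a temporary directory. `Notes` and `ics` are extra, made-up filenames added to the two from the example.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Names from the example above, plus two hypothetical extras.
touch Statistics Analytics Notes ics

# * can match zero characters, so the bare name "ics" matches *ics as well.
set -- *ics
ends_in_ics=$#
echo "$ends_in_ics names end in ics"

# The wildcard can sit on both sides: names that contain "tic" anywhere.
contains_tic=$(echo *tic*)
echo "$contains_tic"

# When nothing matches, Bash by default passes the pattern through unchanged.
no_match=$(echo *xyz)
echo "$no_match"
```

The last case explains a common surprise: when `ls` reports that it cannot find a file literally named `*xyz`, it is because the pattern matched nothing and was handed to `ls` as plain text.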
3. Letters, numbers or both
You can also use character classes such as `[[:alpha:]]` (letters), `[[:digit:]]` (the digits from 0 to 9), `[[:lower:]]` (lowercase letters), `[[:upper:]]` (uppercase letters), and `[[:alnum:]]` (both letters and digits).
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*p9xyN6_JUvQMNfRdT7S2Lw.jpeg)
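A character class such as `[:lower:]` is not a wildcard on its own. It only has its special meaning inside a pair of square brackets, which is why the patterns above are written with double brackets. Like `?`, the resulting `[[:lower:]]` matches exactly one character. If you leave out the outer brackets, the shell reads `[:lower:]` as an ordinary set of the characters :, l, o, w, e, and r, and returns something else. Here are some examples: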
- To list all files (and the contents of directories) in the working directory with names that end in a dot `.` directly followed by three lowercase letters (such as .jpg or .pdf), we can run `ls *.[[:lower:]][[:lower:]][[:lower:]]`.
- To list all files (and the contents of directories) in the working directory with names that do not start with an uppercase letter and that end with a digit, we can run `ls [![:upper:]]*[[:digit:]]`. The exclamation mark `!` at the start of a bracket expression means not.
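Both patterns from the list can be tried on a handful of files. The following is a sketch under assumptions: all the filenames are invented for illustration, and it runs in a temporary directory. The last step shows what happens when the outer brackets are left out.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical filenames for illustration.
touch photo.jpg report.pdf archive.tar.gz Notes.TXT draft2 Draft3 img_07

# A dot followed by exactly three lowercase letters at the end of the name.
three_lower=$(echo *.[[:lower:]][[:lower:]][[:lower:]])
echo "$three_lower"

# First character is not an uppercase letter, last character is a digit.
not_upper_digit=$(echo [![:upper:]]*[[:digit:]])
echo "$not_upper_digit"

# Pitfall: with single brackets, [:lower:] is just a set of the characters
# ':', 'l', 'o', 'w', 'e' and 'r', so it matches w1 but not x1.
touch w1 x1
single_brackets=$(echo [:lower:]1)
echo "$single_brackets"
```

Note that `archive.tar.gz` is not matched by the first pattern because only two letters follow its last dot, and `Notes.TXT` is not matched because its extension is uppercase.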
Although this article only shows wildcards with the `ls` command, wildcards work with most commands you are probably familiar with, such as `cp`, `mv`, `rm`, and `rmdir`.
Just be extra careful when using wildcards with commands like `rm`, `cp`, and `mv`: a pattern that matches more than you expected can delete or overwrite files, and these commands have no undo. Before using any filesystem-altering command with wildcards, make sure the pattern matches what you intend by trying it with `ls` first.
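The preview-then-act habit can look like this. It is a minimal sketch in a temporary directory, and the three filenames are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical files: two scratch files to delete and one file to keep.
touch draft_1.tmp draft_2.tmp thesis.txt

# Step 1: preview the expansion with ls. Nothing is deleted here.
preview=$(ls -- *.tmp)
echo "$preview"

# Step 2: once the list looks right, run the real command with the same pattern.
# The -- stops a filename that starts with a dash from being read as an option.
rm -- *.tmp

# Only the file that did not match the pattern is left.
remaining=$(ls)
echo "$remaining"
```

Using exactly the same pattern in both steps is the point: the preview is only trustworthy if the destructive command expands to the same list.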
Conclusion
Wildcards allow us to create patterns that match groups of filenames. These patterns are called glob patterns. They resemble regular expressions but follow different rules. Glob patterns combined with other commands can make your life easier when using the command line or terminal. Here is a summary to help you get familiar with wildcards:
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*cnIPtZNOYvFyrkHn6HVMVg.png)
Resources:
[1] Bash: https://tiswww.case.edu/php/chet/bash/bashtop.html
[2] Escape Character: https://en.wikipedia.org/wiki/Escape_character
[3] Prettyprint: https://en.wikipedia.org/wiki/Prettyprint