Yang Zhou

Regular Expressions in Python

Introduction

Regular expressions (regex) are a powerful tool for working with strings, and many programming languages support them. Covering all of the syntax and features would take a whole book. This post introduces the basic syntax that is most useful in daily programming and demonstrates how we can use regex in Python.

Basic Syntax

Regex is not a complex algorithm or data structure. It is just a set of common rules, defined by people, that make matching, finding, and searching strings convenient.

For example, suppose we want to write a script that checks whether a string is a valid e-mail address. With regex, only two steps are needed:

  1. Design a regex, following the rules, that matches e-mail addresses.
  2. Check every e-mail address we receive against the regex we just designed.

Very easy, right?
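
The two steps above can be sketched as follows. This is a minimal illustration, not a complete validator: the pattern is deliberately simplified and does not cover every legal address, and the names EMAIL_PATTERN and is_valid_email are made up for this example. The syntax it uses is explained in the rest of this post.

```python
import re

# Step 1: design a regex. This one is a deliberately simplified assumption:
# a local part, "@", one or more domain labels, and a suffix of 2+ letters.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)*\.[A-Za-z]{2,}")


def is_valid_email(address):
    """Step 2: check a received address against the regex (hypothetical helper)."""
    return EMAIL_PATTERN.fullmatch(address) is not None


print(is_valid_email("alice@example.com"))  # True
print(is_valid_email("not-an-email"))       # False
```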

All right, let’s see what the rules are:

First, to match a single character:

  1. If a character is given directly, it is an exact match.
  2. Use \d to match a digit.
  3. Use \w to match a letter, a digit, or an underscore.
  4. A . can match any character except a newline.
  5. Use \s to match a whitespace character (a space, a tab, a newline, and so on).

Examples:

  • 00\d can match 007 , but can’t match 00M
  • \d\w\d can match 1A1 , 2k9 , etc.
  • c.t can match cat , c!t , c9t , etc.
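
These examples can be checked in Python. The sketch below uses re.fullmatch(), which succeeds only when the entire string fits the pattern; the helper name fits is made up for this illustration.

```python
import re


def fits(pattern, text):
    """Return True when the whole of `text` fits `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


print(fits(r"00\d", "007"))    # True: \d matches the digit 7
print(fits(r"00\d", "00M"))    # False: M is not a digit
print(fits(r"\d\w\d", "2k9"))  # True
print(fits(r"c.t", "c!t"))     # True: . matches any character except a newline
print(fits(r"c.t", "c\nt"))    # False: . does not match a newline
```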

Second, to match a variable number of characters:

  1. Use * to match 0 or more repetitions of the preceding regex, as many repetitions as are possible.
  2. Use + to match 1 or more repetitions of the preceding regex. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not match just ‘a’.
  3. Use ? to match 0 or 1 repetitions of the preceding regex.
  4. Use {n} to match exactly n repetitions of the preceding regex.
  5. Use {n,m} to match from n to m repetitions of the preceding regex.

Examples:

  • ab* can match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s.
  • ab+ can match ‘a’ followed by any non-zero number of ‘b’s; it will not match just ‘a’.
  • ab? can match either ‘a’ or ‘ab’.
  • \d{5} can match five digits, such as 12345
  • \d{2,8} can match 2 to 8 digits, such as 25 , 999 , 12345678
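
The quantifier examples above can be tried out the same way. This is a small sketch using re.fullmatch(), so each string must fit the pattern from start to end; the helper name fits is made up for this illustration.

```python
import re


def fits(pattern, text):
    """Return True when the whole of `text` fits `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


print(fits(r"ab*", "a"))              # True: zero 'b's is allowed
print(fits(r"ab*", "abbb"))           # True
print(fits(r"ab+", "a"))              # False: at least one 'b' is required
print(fits(r"ab?", "abb"))            # False: at most one 'b' is allowed
print(fits(r"\d{5}", "12345"))        # True
print(fits(r"\d{2,8}", "123456789"))  # False: nine digits is too many
```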

Looks easy, right? But don’t start celebrating yet 🙂 . Let’s look at an interesting example:

If we use <.*> to match '<a> b <c>', it will match the entire string, and not just '<a>'. We can run into unexpected behaviour like this because regex quantifiers are greedy by default, which means they match as many characters as possible.

In situations like this, we can add ? after the quantifier to make it non-greedy. In other words, as few characters as possible will be matched. The regex <.*?> will match only '<a>'.
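
The difference is easy to see side by side. A minimal sketch, where the variable names are just for illustration:

```python
import re

text = "<a> b <c>"

# Greedy: .* grabs as much as possible, up to the last '>'.
greedy = re.search(r"<.*>", text).group()

# Non-greedy: .*? stops at the first '>' it can.
lazy = re.search(r"<.*?>", text).group()

# With the non-greedy form, findall() picks out each tag separately.
all_tags = re.findall(r"<.*?>", text)

print(greedy)    # <a> b <c>
print(lazy)      # <a>
print(all_tags)  # ['<a>', '<c>']
```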

Third, there are some more tools for more precise matching:

  1. Use [] to define a set of characters. For example, [0-9a-z] can match one character that is either a digit between 0 and 9 or a lowercase letter between a and z.
  2. A|B can match A or B, so (P|p)ython can match ‘Python’ or ‘python’.
  3. ^ indicates the beginning of the string (or of each line in multiline mode). For example, ^\d indicates that the string must start with a digit.
  4. $ indicates the end of the string (or of each line in multiline mode). For example, \d$ indicates that the string must end with a digit.
  5. Use \ to escape special characters (permitting us to match literal characters like '*', '?', and so forth).
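
Here is a short sketch of these five tools in Python. The sample strings and variable names are made up for illustration.

```python
import re

# 1. A character set: every character must be a digit or a lowercase letter.
in_set = re.fullmatch(r"[0-9a-z]+", "abc123") is not None
not_in_set = re.fullmatch(r"[0-9a-z]+", "ABC") is not None

# 2. Alternation: either 'P' or 'p', followed by 'ython'.
either_case = re.fullmatch(r"(P|p)ython", "Python") is not None

# 3. and 4. Anchors: the string must start / end with a digit.
starts_with_digit = re.search(r"^\d", "3 apples") is not None
ends_with_digit = re.search(r"\d$", "3 apples") is not None

# 5. Escaping: \* matches a literal asterisk instead of acting as a quantifier.
literal_star = re.search(r"3\*4", "3*4=12").group()

print(in_set, not_in_set)                  # True False
print(either_case)                         # True
print(starts_with_digit, ends_with_digit)  # True False
print(literal_star)                        # 3*4
```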

How to use regex in Python?

Python’s built-in re module makes it very convenient to use regex. Let’s see how it works:

  • re.match()

The re.match() function checks whether a regex matches at the beginning of a string. If the match succeeds, it returns a match object; otherwise it returns None .
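
A minimal sketch of re.match() in use. The phone-number-like pattern and the names PHONE_PATTERN and looks_like_phone are assumptions made for this example.

```python
import re

# Illustrative pattern: 3 digits, a hyphen, then 3 to 8 digits.
PHONE_PATTERN = r"\d{3}-\d{3,8}"


def looks_like_phone(text):
    """Return True if `text` starts with something that fits PHONE_PATTERN."""
    if re.match(PHONE_PATTERN, text):
        return True
    return False


print(looks_like_phone("010-12345"))  # True
print(looks_like_phone("010 12345"))  # False

# re.match() only looks at the start of the string; re.search() scans all of it.
print(re.match(r"\d+", "abc123"))           # None
print(re.search(r"\d+", "abc123").group())  # 123
```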

  • re.split()

This split function is more powerful than Python’s built-in str.split() method. It can split a string on any pattern, such as any mix of separator characters repeated any number of times.
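
For comparison, here is a small sketch of both split functions on made-up input strings:

```python
import re

# str.split(" ") splits on every single space, leaving empty strings behind.
plain = "a b   c".split(" ")

# re.split() can treat a run of whitespace as one separator.
by_whitespace = re.split(r"\s+", "a b   c")

# It can also split on any mix of spaces, commas, and semicolons.
by_mixed = re.split(r"[\s,;]+", "a,b;; c  d")

print(plain)          # ['a', 'b', '', '', 'c']
print(by_whitespace)  # ['a', 'b', 'c']
print(by_mixed)       # ['a', 'b', 'c', 'd']
```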

  • Match.group()

In addition to simply checking whether a string matches or not, regular expressions also have the power of extracting substrings!

In a regex, we wrap each part that needs to be extracted in parentheses () to define a group, and then call group() on the match object to get it.
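
A minimal sketch of extracting groups. The phone-number-like pattern and the variable names are assumptions made for this example.

```python
import re

# Two groups: the 3-digit prefix and the 3-to-8-digit number after the hyphen.
m = re.match(r"^(\d{3})-(\d{3,8})$", "010-12345")

whole = m.group(0)   # group(0) is always the entire match
prefix = m.group(1)  # first pair of parentheses
number = m.group(2)  # second pair of parentheses
parts = m.groups()   # all groups as a tuple

print(whole)   # 010-12345
print(prefix)  # 010
print(number)  # 12345
print(parts)   # ('010', '12345')
```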

  • re.compile()

When we use a regex to match a string in Python, two steps happen inside the re module:

  1. Compile the regex; if the expression itself is invalid, an error is raised.
  2. Use the compiled regex to match the string.

If a regular expression is reused many times, compiling it again and again wastes time. Therefore, we can pre-compile the regular expression once with re.compile() and then reuse the compiled pattern object without compiling it again.
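
A short sketch of pre-compiling and reusing a pattern. The pattern itself and the names phone_re, first, second, and compile_failed are made up for this illustration.

```python
import re

# Compile once. The resulting pattern object has its own match(), search(),
# split(), and so on, so the regex string is not passed again.
phone_re = re.compile(r"^(\d{3})-(\d{3,8})$")

first = phone_re.match("010-12345").groups()
second = phone_re.match("010-8086").groups()

print(first)   # ('010', '12345')
print(second)  # ('010', '8086')

# An invalid expression is reported at compile time as re.error.
compile_failed = False
try:
    re.compile(r"(\d{3}")  # the closing parenthesis is missing
except re.error as exc:
    compile_failed = True
    print("invalid pattern:", exc)
```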

Conclusion

Regular expressions are a very useful and powerful tool for working with strings. The syntax is extensive, but once you are familiar with it, your programming life will become easier.

Thanks for reading. If you like it, please follow my publication TechToFreedom, where you can enjoy other Python tutorials and topics about programming, technology and investment.
