Yang Zhou

Summary

The article provides an overview of three methods for searching substrings within Python strings, ranging from basic built-in functions like find() and index() to the more versatile regular expressions.

Abstract

The article "3 Ways To Search Substrings of Python Strings" introduces Python developers to various methods for substring searching within strings. It begins with the simplest approach using the find() and rfind() functions, which return the index of the first and last occurrences of a substring, respectively, or -1 if not found. It then discusses the index() and rindex() functions, which are similar to find() but raise an exception when the substring is not found, making them suitable for scenarios where the absence of a substring is critical. The article further explores the use of regular expressions, a powerful tool for string manipulation, demonstrating how to search for patterns with re.search() and re.findall(). These methods are particularly useful for complex searching tasks that cannot be easily accomplished with basic functions. The author emphasizes the importance of choosing the right method based on the task's complexity and the need for pattern matching.

Opinions

  • The author suggests that Python's built-in functions find() and index() are sufficient for straightforward substring searches.
  • The choice between find() and index() should be based on whether the program's flow can tolerate not finding a substring; find() returns -1, while index() raises an exception.
  • Regular expressions are recommended for complex searching tasks, as they offer the flexibility to define patterns and can handle intricate matching scenarios that would otherwise require loops or additional logic.
  • The author implies that familiarity with regular expressions can significantly simplify string searching tasks, referring to them as the "ultimate weapon" for string manipulation.
  • The article subtly encourages the reader to follow best practices in programming by choosing the appropriate method for substring searching based on the context and requirements of the task.

3 Ways To Search Substrings of Python Strings

From basic functions to regular expressions

Image by cocoparisienne from Pixabay

One essential programming task is extracting the data you need from a large amount of data, and searching strings for particular substrings is one such task. If you happen to be a Python developer, congratulations! Python gives us at least 3 ways to do this job.

Of course, we can always loop through the characters one by one to find what we need, but this article won't cover that brute-force method. 🙂

This article introduces 3 common ways to search for substrings, from the built-in functions (find() and index()) to regular expressions. After reading it, you should find it easy to choose the most appropriate one for each use case.

1. Use the find() and rfind() Functions

Python strings provide two functions for finding the index of a substring:

title = '3 ways to search substrings of Python strings'
print(title.find('string'))
# 20
print(title.rfind('string'))
# 38
print(title.find('Yang'))
# -1

As shown above, find() returns the lowest index in title at which the substring is found, and, as its name implies, rfind() returns the highest index. If the substring cannot be found, both return -1.
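Both find() and rfind() also accept optional start and end arguments that limit the search to a slice of the string. The sketch below shows them and then uses the start argument to collect every occurrence; find_all is a hypothetical helper name, not a built-in.

```python
def find_all(text, sub):
    """Hypothetical helper: return every index where sub starts in text."""
    positions = []
    idx = text.find(sub)                # first occurrence, or -1
    while idx != -1:
        positions.append(idx)
        idx = text.find(sub, idx + 1)   # resume the search just past the last hit
    return positions


title = '3 ways to search substrings of Python strings'
print(title.find('string', 21))      # 38 (search starts at index 21)
print(title.find('string', 0, 20))   # -1 (a match must fit inside title[0:20])
print(find_all(title, 'string'))     # [20, 38]
```

Advancing by idx + 1 means overlapping matches are also reported; for example, find_all('aaaa', 'aa') gives [0, 1, 2].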

2. Use the index() and rindex() Functions

The index() function is similar to find(). Let's change the previous example a bit:

title = '3 ways to search substrings of Python strings'
print(title.index('string'))
# 20
print(title.rindex('string'))
# 38
print(title.index('Yang'))
# ValueError: substring not found

As shown above, the results are identical when the substring is found. However, if the substring cannot be found, index() raises a ValueError instead of returning -1 (rindex() behaves the same way).

We should keep this difference in mind and choose based on our needs. If failing to find the substring means the program cannot continue correctly, index() is the better choice, because the exception stops it right away. If a missing substring is acceptable (in other words, the program should keep running), use find() and check for -1.
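A practical reason to prefer index() when the substring must exist: -1 is itself a valid index in Python (it refers to the last character), so an unchecked find() result can silently select the wrong data instead of failing. The sketch below contrasts the two styles; position_or_none and position_required are hypothetical helper names.

```python
title = '3 ways to search substrings of Python strings'

# Pitfall: -1 from find() is a legal index, so this runs but gives a wrong result.
print(title[title.find('Yang'):])  # s  (the last character, not a match)


def position_or_none(text, sub):
    """Absence is acceptable: use find() and turn the -1 sentinel into None."""
    pos = text.find(sub)
    return None if pos == -1 else pos


def position_required(text, sub):
    """Absence is an error: index() raises ValueError, which propagates to the caller."""
    return text.index(sub)


print(position_or_none(title, 'Yang'))  # None
try:
    position_required(title, 'Yang')
except ValueError as exc:
    print('ValueError:', exc)  # ValueError: substring not found
```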

In addition, find() is a string method (bytes have one too, but lists and tuples don't), while index() is also available on lists and tuples:

L = [1, 2, 3, 4, 5]
print(L.index(5))  # position of the first 5 in the list
# 4
T = (1, 2, 4)
print(T.index(2))  # position of the first 2 in the tuple
# 1
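Like str.index(), list.index() and tuple.index() raise ValueError when the item is missing, and there is no find() to fall back on. If you want find()-style behavior on a sequence, one option is a small wrapper; seq_find below is a hypothetical helper, not part of Python.

```python
def seq_find(seq, item):
    """Hypothetical helper: like str.find(), return -1 instead of raising ValueError."""
    try:
        return seq.index(item)
    except ValueError:
        return -1


L = [1, 2, 3, 4, 5]
T = (1, 2, 4)
print(hasattr(L, 'find'))  # False: lists have no find() method
print(seq_find(L, 5))      # 4
print(seq_find(T, 3))      # -1
```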

3. Use the Regular Expressions

When it comes to string manipulation, regular expressions are the ultimate weapon, and some complex tasks are hard to do without them.

import re
title = '3 ways to search substrings of Python strings'

print(re.search('Python', title))
# <re.Match object; span=(31, 37), match='Python'>
print(re.search('Yang', title))
# None

The example above uses re.search() to look for the substrings:

  • If no substring matches, None is returned.
  • Otherwise, a Match object describing the first match (its span and the matched text) is returned.
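The Match object carries the same position data that find() would give you, through its start(), end(), span() and group() methods. A short sketch:

```python
import re

title = '3 ways to search substrings of Python strings'

match = re.search('Python', title)
if match is not None:      # re.search() returns None when nothing matches
    print(match.start())   # 31, the same value title.find('Python') returns
    print(match.end())     # 37, one past the last matched character
    print(match.group())   # Python
```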

You may ask: is that all? How is this regular expression method different from the built-in find()?

In fact, the true power of re.search() is that it searches by a “pattern”.

For example, to find the position of the first digit (0 to 9) in a string, we can do this:

import re

title = '3 ways to search substrings of Python strings'

pattern = re.compile('[0-9]')
print(re.search(pattern, title))
# <re.Match object; span=(0, 1), match='3'>

As the code above shows, the re.compile() function turns a pattern into an object we can then search strings with, and the compiled pattern can be reused for as many searches as we like. This is very useful for complex string searching.

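How would we do the same job with the find() function? We would need a loop that tries each of the ten digits:

title = '3 ways to search substrings of Python strings'

idx = -1  # -1 means "no digit found", the same convention as find()
for digit in '0123456789':
    new_idx = title.find(digit)
    # keep the smallest index among the digits that occur
    if new_idx != -1 and (idx == -1 or new_idx < idx):
        idx = new_idx
print(idx)
# 0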

There is also re.findall(), which returns all substrings that match the pattern:

import re

title = '3 ways to search 4 of 6 Python 2'

pattern = re.compile('[0-9]')
print(re.search(pattern, title))
# <re.Match object; span=(0, 1), match='3'>

print(re.findall(pattern,title))
# ['3', '4', '6', '2']
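re.findall() returns only the matched text. When you also need the positions, the finditer() method of a compiled pattern (or re.finditer()) yields a Match object for each match. Adding the + quantifier makes the pattern match whole runs of digits rather than single characters; the second string below is just sample data.

```python
import re

text = '3 ways to search 4 of 6 Python 2'
digit = re.compile('[0-9]')

# finditer() yields Match objects, so each position is available.
print([m.start() for m in digit.finditer(text)])  # [0, 17, 22, 31]

# '[0-9]+' matches one or more consecutive digits as a single substring.
print(re.findall('[0-9]+', 'Python 3.12 was released in 2023'))  # ['3', '12', '2023']
```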

Key Takeaways

  1. For simple tasks, we can use find() or index() to search for substrings of Python strings.
  2. find() is a string method (lists and tuples don't have it) and returns -1 when there is no match. index() also works on lists and tuples and raises a ValueError when there is no match.
  3. re.search() and re.findall() let us search for substrings by pattern. If you are familiar with regular expressions, searching a string is a piece of cake.

