Amit Chauhan

40 Most Insanely Usable Methods in Python

Data cleaning and wrangling in data science and machine learning

Photo by ThisisEngineering RAEng on Unsplash

This article walks through the Python string methods most often used for data wrangling in data analysis, data science, and machine learning. The goal is to make you comfortable enough with these methods to rely on them throughout your career.

Topics to be covered:

1. String Methods    15. isdecimal()       29. maketrans()
2. capitalize()      16. isdigit()         30. partition()
3. casefold()        17. isidentifier()    31. replace()
4. center()          18. islower()         32. rfind()
5. count()           19. isnumeric()       33. rindex()
6. encode()          20. isprintable()     34. rjust()
7. endswith()        21. zfill()           35. rpartition()
8. expandtabs()      22. isspace()         36. rsplit()
9. find()            23. istitle()         37. rstrip()
10. format()         24. isupper()         38. splitlines()
11. format_map()     25. join()            39. startswith()
12. index()          26. ljust()           40. upper()
13. isalnum()        27. lower()           
14. isalpha()        28. lstrip()
1. String Methods

String methods are very useful for data wrangling in any application related to data. Strings are immutable, so each method returns a new value and leaves the original string unchanged.

2. capitalize()

This method returns a copy of the word or sentence with the first letter in upper case and the rest of the letters in lower case.

#Example of word and sentence
word = "aMIT"
sentence = "hEllo WOrLD"
word.capitalize()
#output:
'Amit'
#print() shows the string without the quotes that the interactive shell displays
w = word.capitalize()
print(w)
#output:
Amit
#for the sentence
s = sentence.capitalize()
print(s)
#output:
Hello world

3. casefold()

This method converts all the letters in the word or sentence to lower case. It is more aggressive than lower(): letters from other languages are also mapped to a common lower-case form, which makes it suitable for case-insensitive comparison.

word = "aMIT"
word.casefold()
#output:
'amit'
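
The difference from lower() only shows up with certain non-English letters. Here is a minimal sketch, assuming the German word "Straße" as an illustrative input (it is not part of the examples above, and the variable names are hypothetical).

```python
# casefold() vs lower() on a letter that has no simple lower-case form
word = "Straße"

lowered = word.lower()      # keeps the 'ß' -> 'straße'
folded = word.casefold()    # expands 'ß' to 'ss' -> 'strasse'

print(lowered)
print(folded)

# Case-insensitive comparison of two spellings of the same word
same = "STRASSE".casefold() == word.casefold()
print(same)  # True
```

For plain English text both methods give the same result, so casefold() mainly matters when the data may contain other languages.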

4. center()

This method centers the word or sentence within the given width by padding both sides with spaces or another fill character.

#Default fill character (space)
word = "aMIT"
space_padded = word.center(15)
print(space_padded)
#output:
      aMIT
#With a custom fill character
word = "aMIT"
with_padding = word.center(15, '*')
print(with_padding)
#output:
******aMIT*****

5. count()

It is used to count the occurrences of a letter, word, or phrase in the document.

sentence = "Happy Happy Happy Happy"
letter_count = sentence.count("a")
print("Letter 'a' occurred:", letter_count, "Times")
#output:
Letter 'a' occurred: 4 Times
#count occurrences between two positions (start 0, end 15)
sentence = "Happy Happy Happy Happy"
letter_count = sentence.count("a", 0, 15)
print("Letter 'a' occurred in position between:", letter_count,
                                                          "Times")
#output:
Letter 'a' occurred in position between: 3 Times
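
One detail worth knowing is that count() only counts non-overlapping matches. The sketch below, using a made-up string "aaaa", compares it with a manual scan based on find(); the loop is just one possible way to count overlapping matches.

```python
text = "aaaa"

# count() skips past each match, so overlapping matches are not counted
non_overlapping = text.count("aa")

# Manual scan: restart the search one character after each match
overlapping = 0
start = text.find("aa")
while start != -1:
    overlapping += 1
    start = text.find("aa", start + 1)

print(non_overlapping)  # 2
print(overlapping)      # 3
```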

6. encode()

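It converts the words or sentences into bytes using the given encoding (UTF-8 by default). Bytes are what gets written to files or sent over a network; encoding is not a security measure.

word = " Άmit "
encoded_word = word.encode()
print(encoded_word)
#output:
b' \xce\x86mit '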
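
The encoding and the error handling can also be passed explicitly. This is a small sketch with an assumed sample word "café"; the variable names are only for illustration.

```python
text = "café"

# Explicit UTF-8 encoding: 'é' becomes two bytes
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)   # b'caf\xc3\xa9'

# ASCII cannot represent 'é'; errors="replace" substitutes '?'
ascii_bytes = text.encode("ascii", errors="replace")
print(ascii_bytes)  # b'caf?'

# decode() reverses the encoding
restored = utf8_bytes.decode("utf-8")
print(restored == text)  # True
```

Without the errors argument, encoding "café" as ASCII raises a UnicodeEncodeError, which is often what you want when checking data quality.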

7. endswith()

It returns True if the sentence ends with the given letter or word, i.e. the suffix, otherwise False.

sentence = "Happy Happy Happy Happy"
result = sentence.endswith("py")
print(result)
#output:
True
#when the suffix does not match
sentence = "Happy Happy Happy Happy"
result = sentence.endswith("it")
print(result)
#output:
False

8. expandtabs()

This method replaces each '\t' (tab) character with spaces up to the next tab stop. The default tab size is 8.

sentence = 'happy\thappy\thappy'
r = sentence.expandtabs()
print(r)
#output:
happy   happy   happy
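
The tab size can be changed with an argument. A short sketch, assuming a hypothetical tab-separated header row:

```python
row = "id\tname\tcity"

# Default tab size of 8: each column starts at a multiple of 8
default_tabs = row.expandtabs()
print(default_tabs)  # 'id      name    city'

# Tab size of 4: each column starts at a multiple of 4
narrow_tabs = row.expandtabs(4)
print(narrow_tabs)   # 'id  name    city'
```

Note that "name" already ends exactly on a tab stop with size 4, so the tab after it still advances a full 4 spaces.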

9. find()

It is used to find the first occurrence of a letter, word, or sub-string in the sentence and returns its starting index. If the sub-string is not found, it returns -1.

sentence = "Happy Happy Happy Happy"
a = sentence.find("pp")
print(a)
#output:
2

10. format()

This method is very useful in data analytics research. It takes positional and keyword arguments and inserts them into the curly-bracket placeholders of the string. With empty brackets the values are used in a positional manner: the first value goes to the first curly bracket and so on.

print("My name is {}, I live in {}.".format("Amit", "Delhi"))
#output:
My name is Amit, I live in Delhi.
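
Placeholders can also be numbered, named, or given a format specification. The following is a minimal sketch; the score value and the variable names are made up for illustration.

```python
name = "Amit"
score = 0.87654  # hypothetical value

# Numbered placeholders pick arguments by index
by_index = "{1} lives in {0}.".format("Delhi", name)
print(by_index)  # Amit lives in Delhi.

# Named placeholders with a format spec (2 decimal places)
by_name = "{who} scored {value:.2f}".format(who=name, value=score)
print(by_name)   # Amit scored 0.88

# Alignment inside a fixed width: '>' right-aligns, '<' left-aligns
aligned = "{:>8}|{:<8}|".format("left", "right")
print(aligned)   # '    left|right   |'
```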

11. format_map()

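It is used to fill the placeholders of the string with the values of a dictionary (mapping). Unlike format(**dict1), it uses the dictionary directly without unpacking it.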

#normal format method
dict1 = {'a':'Amit','b':'Delhi'}
print("Hello {a}, I live in {b}.".format(**dict1))
#output:
Hello Amit, I live in Delhi.
#With format_map() method
dict1 = {'a':'Amit','b':'Delhi'}
print("Hello {a}, I live in {b}.".format_map(dict1))
#output:
Hello Amit, I live in Delhi.
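
Because format_map() uses the mapping directly, a dict subclass can decide what happens with missing keys. This is a sketch of that idea; the class name KeepMissing is hypothetical and not something from the article.

```python
class KeepMissing(dict):
    """Dictionary that leaves unknown placeholders untouched."""
    def __missing__(self, key):
        return "{" + key + "}"

record = KeepMissing(a="Amit")  # key 'b' is deliberately missing

# format_map() calls __missing__ for 'b' instead of failing
line = "Hello {a}, I live in {b}.".format_map(record)
print(line)  # Hello Amit, I live in {b}.

# format(**record) unpacks into plain keyword arguments, so 'b' raises KeyError
try:
    "Hello {a}, I live in {b}.".format(**record)
    format_failed = False
except KeyError:
    format_failed = True
print(format_failed)  # True
```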

12. index()

It is used to find the index of the first occurrence of a word or letter in the document. It works like find(), but raises a ValueError if the sub-string is not found.

sentence = "Happy Happy Happy Happy"
print(sentence.index('pp'))
#output:
2
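
The difference between find() and index() shows up when the sub-string is missing. A small sketch with the same sentence, searching for a word ("sad") that is not in it:

```python
sentence = "Happy Happy Happy Happy"

# find() signals "not found" with -1
missing_find = sentence.find("sad")
print(missing_find)  # -1

# index() signals "not found" with an exception
try:
    sentence.index("sad")
    index_failed = False
except ValueError:
    index_failed = True
print(index_failed)  # True

# Both accept a start position; here the search begins at index 3
second_match = sentence.find("pp", 3)
print(second_match)  # 8
```

A common choice is find() when a missing value is expected and index() when it would be a bug.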

13. isalnum()

This method returns True if all the characters in the word are alphanumeric, i.e. either letters or numbers, and the word is not empty; otherwise it returns False.

word = "aMIT235"
print(word.isalnum())
#output:
True
#If there is any space it will return false.
word = "aMIT 235"
print(word.isalnum())
#output:
False

14. isalpha()

This method returns True if all the characters are alphabetic letters, otherwise False.

word = "aMIT"
print(word.isalpha())
#output:
True
word = "aMIT235"
print(word.isalpha())
#output:
False

15. isdecimal()

This method returns True if all the characters are decimal characters, i.e. the digits used to write base-10 numbers, otherwise False.

word = "235"
print(word.isdecimal())
#output:
True
#When not all the characters are numeric
word = "Amit235"
print(word.isdecimal())
#output:
False

16. isdigit()

This method returns True if all the characters are digits, otherwise False. Unlike isdecimal(), it also accepts digit characters such as superscripts.

word = "235"
print(word.isdigit())
#output:
True
#When not all the characters are numeric
word = "Amit235"
print(word.isdigit())
#output:
False

17. isidentifier()

This method returns True if the word is a valid Python identifier, i.e. it contains only letters, digits, and underscores and does not start with a digit; otherwise it returns False.

word = "Amit235"
print(word.isidentifier())
#output:
True
word = "Amit 235"
print(word.isidentifier())
#output:
False
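
One caveat is that reserved keywords such as "class" also pass isidentifier(). The sketch below combines it with the standard keyword module; the helper name is_usable_name is hypothetical.

```python
import keyword

def is_usable_name(name):
    """True if name is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

print("class".isidentifier())         # True, although 'class' is reserved
print(is_usable_name("class"))        # False
print(is_usable_name("total_sales"))  # True
print(is_usable_name("2cols"))        # False, starts with a digit
```

A check like this can be handy when turning column names from a data file into attribute names.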

18. islower()

This method checks whether all the letters in the sentence or word are in lower case. Digits and spaces are ignored.

word = "amit 235"
print(word.islower())
#output:
True
word = "Amit 235"
print(word.islower())
#output:
False

19. isnumeric()

This method returns True if all the characters are numeric, otherwise False. It is the broadest of the three numeric checks and also accepts characters such as fractions.

word = "235"
print(word.isnumeric())
#output:
True
word = "A235"
print(word.isnumeric())
#output:
False
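
isdecimal(), isdigit(), and isnumeric() agree on ordinary digits, so the examples above look identical. The sketch below uses a few assumed sample strings to show where they differ.

```python
samples = ["235", "²", "½", "-5", "2.5"]

# Map each sample to (isdecimal, isdigit, isnumeric)
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

for text, flags in results.items():
    print(repr(text), flags)

# '235' (True, True, True)
# '²'   (False, True, True)    superscript two is a digit but not a decimal
# '½'   (False, False, True)   a fraction is only numeric
# '-5'  (False, False, False)  the minus sign is not a digit
# '2.5' (False, False, False)  the decimal point is not a digit
```

None of the three methods accepts signs or decimal points, so they do not tell you whether int() or float() would succeed on a string.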

20. isprintable()

This method returns True if all the characters in the string are printable, otherwise False.

sentence = "Happy Happy Happy Happy"
print(sentence.isprintable())
#output:
True
#The newline character at the end is not printable
sentence = "Happy Happy Happy Happy\n"
print(sentence.isprintable())
#output:
False

21. zfill()

It is used to pad the string on the left side with zeros, i.e. '0', up to the given width. In the example below the word has 4 characters and the width is 15, so 11 zeroes are added on the left side.

word = "A235"
print(word.zfill(15))
#output:
00000000000A235

22. isspace()

This method returns True if all the characters in the word are white spaces, otherwise False. An empty string returns False.

word = "A235"
print(word.isspace())
#output:
False
word = ""
print(word.isspace())
#output:
False
word = "       "
print(word.isspace())
#Output:
True

23. istitle()

This method returns True if every word in the sentence starts with an upper case letter and the rest of the letters in the word are lower case, otherwise False.

sentence = "Happy Happy Happy Happy"
print(sentence.istitle())
#output:
True
sentence = "Happy HAppy Happy Happy"
print(sentence.istitle())
#output:
False

24. isupper()

This method returns True if all the letters are in upper case and False if any letter is in lower case.

sentence = "HAPPY HAPPY"
print(sentence.isupper())
#output:
True
sentence = "Happy Happy"
print(sentence.isupper())
#output:
False

25. join()

This method joins the items of an iterable into one string and places the separator between the items.

items = ['Happy', 'Happy', 'Happy', 'Happy']
print(' '.join(items))
#output:
Happy Happy Happy Happy
#Using a different separator
items = ['Happy', 'Happy', 'Happy', 'Happy']
print('--'.join(items))
#output:
Happy--Happy--Happy--Happy
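
join() only accepts strings, which is easy to trip over with numeric data. A minimal sketch, assuming a hypothetical list of integers:

```python
values = [1, 2, 3]

# Joining non-string items raises a TypeError
try:
    ", ".join(values)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)  # True

# Convert each item to a string first
joined = ", ".join(str(v) for v in values)
print(joined)  # 1, 2, 3
```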

26. ljust()

It left-justifies the word within the given width and pads the right side with spaces or the given fill character.

word = 'Happy'
width = 10
print(word.ljust(width, '@'))
#output:
Happy@@@@@

27. lower()

This method is used to convert all the letters to lower case.

sentence = "HAPPY HAPPY"
print(sentence.lower())
#output:
happy happy

28. lstrip()

It is used to strip spaces or a given set of characters from the left side. It keeps removing characters until it reaches one that is not in the set.

#Removing the spaces on left side
word = '    Happy'
print(word.lstrip())
#output:
Happy
#Removing characters
word = '...,,,bgrrrr..,,Happy'
a = word.lstrip(",.gbr")
print(a)
#output:
Happy
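
Since the argument is treated as a set of characters and not as a prefix, lstrip() can remove more than intended. The sketch below contrasts it with removeprefix(), which assumes Python 3.9 or newer; the sample string is made up.

```python
column = "path_pattern"

# lstrip() removes any leading characters found in the set {p, a, t, h, _}
stripped = column.lstrip("path_")
print(stripped)   # ern

# removeprefix() (Python 3.9+) removes the exact prefix once
unprefixed = column.removeprefix("path_")
print(unprefixed)  # pattern
```

The same applies on the right side with rstrip() and removesuffix().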

29. maketrans()

It builds a translation table that maps characters to their replacements, for use with translate(). When two string arguments are given, they must have equal length.

#A single argument must be a dictionary
dict1 = {"A": "15", "M": "46", "I": "79", "T": "84"}
word = "AMIT"
print(word.maketrans(dict1))
#output:
{65: '15', 77: '46', 73: '79', 84: '84'}
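
The table on its own does not change anything; it becomes useful when passed to translate(). A short sketch using the same mapping plus a two-string variant (the "!" characters to delete are an assumed example):

```python
# Dictionary form: each character maps to a replacement string
dict_table = str.maketrans({"A": "15", "M": "46", "I": "79", "T": "84"})
encoded = "AMIT".translate(dict_table)
print(encoded)  # 15467984

# Two equal-length strings map character by character;
# the optional third string lists characters to delete
table = str.maketrans("AMIT", "amit", "!")
cleaned = "AMIT!!".translate(table)
print(cleaned)  # amit
```

maketrans() is a static method, so calling it as str.maketrans(...) works the same as calling it on a string instance.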

30. partition()

This method splits the sentence at the first occurrence of the sub-string and returns a tuple of three parts: the text before the sub-string, the sub-string itself, and the text after the sub-string.

sentence = "Happy sad Everyone is sad Happy"
print(sentence.partition('is'))
#output:
('Happy sad Everyone ', 'is', ' sad Happy')

31. replace()

It is used to replace every occurrence of an existing word or letter in the document with the new word or letter.

sentence = "Happy sad Everyone is sad Happy"
print(sentence.replace('Happy', 'Sad'))
#output:
Sad sad Everyone is sad Sad

32. rfind()

This is used to find the last occurrence of the sub-string in the sentence, i.e. the one at the maximum index position. It returns -1 if the sub-string is not found.

sentence = "Happy sad Everyone is sad Happy"
result = sentence.rfind('Ha')
print("Sub-string 'Ha':", result)
#output:
Sub-string 'Ha': 26

The 'Ha' sub-string occurs 2 times in the sentence, so rfind() returns the index of the occurrence at the maximum index.

33. rindex()

It is almost the same as the rfind() method, but it raises a ValueError instead of returning -1 when the sub-string is not found.

sentence = "Happy sad Everyone is sad Happy"
result = sentence.rindex('Ha')
print("Sub-string 'Ha':", result)
#output:
Sub-string 'Ha': 26

34. rjust()

It right-justifies the word within the given width and pads the left side with spaces or the given fill character.

word = 'Happy'
width = 10
print(word.rjust(width, '@'))
#output:
@@@@@Happy

35. rpartition()

This method splits the sentence into a tuple of three parts at the last occurrence of the sub-string. In the example below 'is' occurs only once, so the result is the same as with partition().

sentence = "Happy sad Everyone is sad Happy"
print(sentence.rpartition('is'))
#output:
('Happy sad Everyone ', 'is', ' sad Happy')
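
The two methods only differ when the separator occurs more than once or not at all. A minimal sketch, assuming a hypothetical "key=value" style string:

```python
setting = "key=a=b"

# partition() splits at the first '=', rpartition() at the last '='
first_split = setting.partition("=")
last_split = setting.rpartition("=")
print(first_split)  # ('key', '=', 'a=b')
print(last_split)   # ('key=a', '=', 'b')

# When the separator is missing, the whole string lands on opposite ends
no_sep_first = "abc".partition("=")
no_sep_last = "abc".rpartition("=")
print(no_sep_first)  # ('abc', '', '')
print(no_sep_last)   # ('', '', 'abc')
```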

36. rsplit()

It is used to split the sentence into a list of words. If we mention the number of splits, the splitting starts from the right side of the sentence.

sentence = "Happy Happy Everyone is Happy"
# splits at space
print(sentence.rsplit())
#output:
['Happy', 'Happy', 'Everyone', 'is', 'Happy']
#With number of splits
sentence = "Happy Happy Everyone is Happy"
print(sentence.rsplit(' ', 3))
#output:
['Happy Happy', 'Everyone', 'is', 'Happy']
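
Without a split limit, split() and rsplit() return the same list. The difference appears once the number of splits is limited, as in this sketch with an assumed file-name style string:

```python
name = "2024-01-15-report"

# split() counts splits from the left
from_left = name.split("-", 1)
print(from_left)   # ['2024', '01-15-report']

# rsplit() counts splits from the right
from_right = name.rsplit("-", 1)
print(from_right)  # ['2024-01-15', 'report']
```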

37. rstrip()

It is used to strip spaces or a given set of characters from the right side. It keeps removing characters until it reaches one that is not in the set.

word = 'Happy     '
print(word.rstrip())
#output:
Happy
word = 'Happy.....'
print(word.rstrip("."))
#output:
Happy

38. splitlines()

This method splits the string into a list of lines at line boundary characters, i.e. \n, \r, \f, \v, and many more.

sentence = "Happy\nHappy\rEveryone\fis\vHappy"
print(sentence.splitlines())
#output:
['Happy', 'Happy', 'Everyone', 'is', 'Happy']
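
splitlines() also handles mixed line endings and can keep them if needed. A small sketch with an assumed three-line string, compared against split("\n"):

```python
text = "a\nb\r\nc\n"

# Line endings are removed and there is no empty item at the end
lines = text.splitlines()
print(lines)       # ['a', 'b', 'c']

# keepends=True keeps the original line endings
with_ends = text.splitlines(keepends=True)
print(with_ends)   # ['a\n', 'b\r\n', 'c\n']

# split("\n") leaves the '\r' behind and adds a trailing empty string
naive = text.split("\n")
print(naive)       # ['a', 'b\r', 'c', '']
```

This makes splitlines() the safer choice for text files that may come from different operating systems.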

39. startswith()

This method returns True if the sentence or word starts with the given sub-string, otherwise False.

sentence = "Happy sad Everyone is sad Happy"
result = sentence.startswith("Happy")
print(result)
#output:
True

40. upper()

It is used to convert all the lower case letters to upper case letters.

sentence = "Happy sad Everyone is sad Happy"
print(sentence.upper())
#output:
HAPPY SAD EVERYONE IS SAD HAPPY

I hope you liked the article. Reach me on LinkedIn and Twitter.

Recommended Articles

1. 8 Active Learning Insights of Python Collection Module
2. NumPy: Linear Algebra on Images
3. Exception Handling Concepts in Python
4. Pandas: Dealing with Categorical Data
5. Hyper-parameters: RandomSeachCV and GridSearchCV in Machine Learning
6. Fully Explained Linear Regression with Python
7. Fully Explained Logistic Regression with Python
8. Data Distribution using Numpy with Python
9. Decision Trees vs. Random Forests in Machine Learning
10. Standardization in Data Preprocessing with Python
