40 Most Insanely Usable Methods in Python
Data cleaning and wrangling in data science and machine learning
This article walks through the most useful Python string methods for data wrangling in data analysis, data science, and machine learning. The aim is to make you comfortable with these methods for the long run of your career.
Topics to be covered:
1. String Methods
2. capitalize()
3. casefold()
4. center()
5. count()
6. encode()
7. endswith()
8. expandtabs()
9. find()
10. format()
11. format_map()
12. index()
13. isalnum()
14. isalpha()
15. isdecimal()
16. isdigit()
17. isidentifier()
18. islower()
19. isnumeric()
20. isprintable()
21. zfill()
22. isspace()
23. istitle()
24. isupper()
25. join()
26. ljust()
27. lower()
28. lstrip()
29. maketrans()
30. partition()
31. replace()
32. rfind()
33. rindex()
34. rjust()
35. rpartition()
36. rsplit()
37. rstrip()
38. splitlines()
39. startswith()
40. upper()
1. String Methods
String methods are very useful for data wrangling in any data-related application.
2. capitalize()
This method capitalizes the first letter of a word or sentence and converts the remaining letters to lower case.
# Example with a word and a sentence
word = "aMIT"
sentence = "hEllo WOrLD"
word.capitalize()
#output:
'Amit'
# When the result is printed, the quotes do not appear in the output
w = word.capitalize()
print(w)
#output:
Amit
# For the sentence
s = sentence.capitalize()
print(s)
#output:
Hello world
3. casefold()
This method converts all the letters in a word or sentence to lower case. It is more aggressive than lower(): it also folds special characters from other languages, such as the German 'ß', into a form suited to caseless comparison.
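The difference from lower() only shows up with certain non-English characters. The sketch below is illustrative; the sample word is an assumption and is not taken from the original example.

```python
# Compare lower() and casefold() on a word containing the German 'ß'.
# The sample word is an illustrative assumption.
word = "Straße"

lowered = word.lower()     # 'ß' is already lower case, so it stays as it is
folded = word.casefold()   # 'ß' is folded to 'ss'

print(lowered)
print(folded)

# casefold() lets two spellings of the same word compare as equal
same = "STRASSE".casefold() == word.casefold()
print(same)
```

For plain English text the two methods give the same result, as in the example that follows.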
word = "aMIT"
word.casefold()
#output:
'amit'
4. center()
This method centers the word or sentence in a field of the given width, padding both sides with spaces or another fill character.
# Padding with spaces (the default fill character)
word = "aMIT"
without_padding = word.center(15)
print(without_padding)
#output:
      aMIT
# Padding with a custom fill character
word = "aMIT"
with_padding = word.center(15, '*')
print(with_padding)
#output:
******aMIT*****
5. count()
It counts the occurrences of a letter, word, or phrase in a string. Optional start and end positions restrict the count to a slice of the string.
sentence = "Happy Happy Happy Happy"
letter_count = sentence.count("a")
print("Letter 'a' occurred:", letter_count, "Times")
#output:
Letter 'a' occurred: 4 Times
# Count occurrences between two positions
sentence = "Happy Happy Happy Happy"
letter_count = sentence.count("a", 0, 15)
print("Letter 'a' occurred in position between:", letter_count, "Times")
#output:
Letter 'a' occurred in position between: 3 Times
6. encode()
It encodes a string into bytes, using UTF-8 by default. Encoding only changes how the text is represented; it does not encrypt or secure the message.
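A small sketch of how the chosen encoding changes the bytes, and how decode() reverses the step. The sample text and the choice of encodings are assumptions made for illustration.

```python
# The sample text is an illustrative assumption.
text = "café"

utf8_bytes = text.encode()               # UTF-8 is the default encoding
latin1_bytes = text.encode("latin-1")    # same text, different bytes

print(utf8_bytes)
print(latin1_bytes)

# decode() turns the bytes back into the original string
restored = utf8_bytes.decode("utf-8")
print(restored == text)

# Characters that the target encoding cannot represent can be replaced
ascii_bytes = text.encode("ascii", errors="replace")
print(ascii_bytes)
```

Anyone who knows the encoding can decode the bytes again, which is why encode() should not be treated as a security measure.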
word = " Άmit "
encoded_word = word.encode()
print(encoded_word)
#output:
b' \xce\x86mit '
7. endswith()
It returns True if the string ends with the given suffix (a letter or a word) and False otherwise.
sentence = "Happy Happy Happy Happy"
result = sentence.endswith("py")
print(result)
#output:
True
# When the suffix does not match
sentence = "Happy Happy Happy Happy"
result = sentence.endswith("it")
print(result)
#output:
False
8. expandtabs()
This method replaces each '\t' character with spaces so that the text lines up at tab stops, which are 8 columns apart by default.
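The optional tabsize argument controls how far apart the tab stops are. The sketch below uses a made-up header row as an illustrative assumption.

```python
# A tab-separated header row (illustrative sample data)
row = "id\tname\tcity"

default_tabs = row.expandtabs()     # tab stops every 8 columns
narrow_tabs = row.expandtabs(4)     # tab stops every 4 columns

print(default_tabs)
print(narrow_tabs)
```

Each tab expands to as many spaces as are needed to reach the next tab stop, so the number of spaces varies with the length of the preceding text.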
sentence = 'happy\thappy\thappy'
r = sentence.expandtabs()
print(r)
#output:
happy   happy   happy
9. find()
It finds the first occurrence of a letter, word, or sub-string in the sentence and returns its starting index. If the sub-string is not found, it returns -1.
sentence = "Happy Happy Happy Happy"
a = sentence.find("pp")
print(a)
#output:
2
10. format()
This method is very useful for building report strings in data analysis. It accepts positional and keyword arguments and substitutes them into the curly-bracket placeholders. With empty brackets, the first value goes to the first bracket, and so on.
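Placeholders can also refer to arguments by index or by name, and can carry a format specification. A brief sketch, where the accuracy value is an illustrative assumption:

```python
# Refer to positional arguments by index, in any order
by_index = "{1} lives in {0}.".format("Delhi", "Amit")

# Refer to keyword arguments by name
by_name = "{name} lives in {city}.".format(name="Amit", city="Delhi")

# A format specification after the colon controls number formatting
formatted = "Accuracy: {:.2f}%".format(93.4567)

print(by_index)
print(by_name)
print(formatted)
```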
print("My name is {}, I live in {}.".format("Amit", "Delhi"))
#output:
My name is Amit, I live in Delhi.
11. format_map()
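It substitutes the values of a dictionary (or any other mapping) into the named placeholders of the string. Unlike format(**mapping), the mapping is used directly instead of being unpacked into keyword arguments.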
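Because the mapping is used directly, a dict subclass can decide what happens when a key is missing. The sketch below assumes a hypothetical helper class named SafeDict; it is not part of Python or of the original article.

```python
class SafeDict(dict):
    """Hypothetical helper: returns a marker for keys that are missing."""

    def __missing__(self, key):
        return "<unknown>"


# Only 'a' is provided; 'b' is missing on purpose
record = SafeDict(a="Amit")
message = "Hello {a}, I live in {b}.".format_map(record)
print(message)
```

With a plain dict, the same call would raise a KeyError for the missing key 'b'.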
# With the normal format() method
dict1 = {'a':'Amit','b':'Delhi'}
print("Hello {a}, I live in {b}.".format(**dict1))
#output:
Hello Amit, I live in Delhi.
# With the format_map() method
dict1 = {'a':'Amit','b':'Delhi'}
print("Hello {a}, I live in {b}.".format_map(dict1))
#output:
Hello Amit, I live in Delhi.
12. index()
It returns the index of the first occurrence of a word or letter in the string. It works like find(), but raises a ValueError when the sub-string is not found.
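The difference between find() and index() only appears when the sub-string is absent. A minimal sketch, using a search term chosen for illustration:

```python
sentence = "Happy Happy Happy Happy"

# find() signals a missing sub-string with -1
missing_with_find = sentence.find("sad")
print(missing_with_find)

# index() signals a missing sub-string by raising ValueError
try:
    sentence.index("sad")
    index_raised = False
except ValueError:
    index_raised = True
print(index_raised)
```

Use find() when a missing value is an expected case, and index() when a missing value should stop the program.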
sentence = "Happy Happy Happy Happy"
print(sentence.index('pp'))
#output:
2
13. isalnum()
This method returns a boolean: True if every character in the word is a letter or a number, and False otherwise.
word = "aMIT235"
print(word.isalnum())
#output:
True
# If there is any space, it returns False
word = "aMIT 235"
print(word.isalnum())
#output:
False
14. isalpha()
This method returns True if all the characters are letters of the alphabet, and False otherwise.
word = "aMIT"
print(word.isalpha())
#output:
True
word = "aMIT235"
print(word.isalpha())
#output:
False
15. isdecimal()
This method returns True if all the characters are decimal digits, and False otherwise.
word = "235"
print(word.isdecimal())
#output:
True
# Not all the characters are decimal digits
word = "Amit235"
print(word.isdecimal())
#output:
False
16. isdigit()
This method returns True if all the characters are digits, and False otherwise.
word = "235"
print(word.isdigit())
#output:
True
# When not all the characters are digits
word = "Amit235"
print(word.isdigit())
#output:
False
17. isidentifier()
This method returns True if the word is a valid Python identifier (only letters, digits, and underscores, and not starting with a digit), and False otherwise.
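One caveat: reserved keywords such as "class" also pass isidentifier(). The sketch below pairs it with keyword.iskeyword() from the standard library; the candidate column names are illustrative assumptions.

```python
import keyword

# Candidate column names (illustrative sample data)
candidates = ["total_sales", "2nd_column", "class"]

# A name is usable as a variable only if it is an identifier
# and is not a reserved keyword
usable = [
    name
    for name in candidates
    if name.isidentifier() and not keyword.iskeyword(name)
]
print(usable)
```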
word = "Amit235"
print(word.isidentifier())
#output:
True
word = "Amit 235"
print(word.isidentifier())
#output:
False
18. islower()
This method checks whether all the letters in the word or sentence are lower case. Digits and spaces are ignored.
word = "amit 235"
print(word.islower())
#output:
True
word = "Amit 235"
print(word.islower())
#output:
False
19. isnumeric()
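This method returns True if all the characters are numeric, and False otherwise. It is the broadest of the three numeric checks: it also accepts characters such as fractions, which isdecimal() and isdigit() reject.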
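The three numeric checks agree on ordinary digits and differ only on special Unicode characters. A short sketch, where the superscript and fraction samples are chosen for illustration:

```python
# Ordinary digits, superscript two, and the fraction one half
samples = ["235", "\u00b2", "\u00bd"]

# For each sample: (isdecimal, isdigit, isnumeric)
results = {
    s: (s.isdecimal(), s.isdigit(), s.isnumeric())
    for s in samples
}

for sample, checks in results.items():
    print(repr(sample), checks)
```

When cleaning data that should convert with int(), isdecimal() is the strictest of the three checks.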
word = "235"
print(word.isnumeric())
#output:
True
word = "A235"
print(word.isnumeric())
#output:
False
20. isprintable()
This method returns True if every character in the string is printable, and False otherwise.
sentence = "Happy Happy Happy Happy"
print(sentence.isprintable())
#output:
True
# The newline character at the end is not printable
sentence = "Happy Happy Happy Happy\n"
print(sentence.isprintable())
#output:
False
21. zfill()
It pads the string on the left with zeros ('0') up to the given width. In the example below the word has 4 characters and the width is 15, so 11 zeros are added on the left.
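A common use is giving numeric IDs a fixed width. The sketch below also shows how a leading sign is handled; the sample IDs are illustrative assumptions.

```python
# IDs stored as strings (illustrative sample data)
ids = ["7", "42", "-5"]

# Pad every ID to a width of 4; a leading sign stays in front of the zeros
padded = [value.zfill(4) for value in ids]
print(padded)

# A string that is already longer than the width is returned unchanged
unchanged = "12345".zfill(3)
print(unchanged)
```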
word = "A235"
print(word.zfill(15))
#output:
00000000000A235
22. isspace()
This method returns True if all the characters in the word are white spaces, and False otherwise.
word = "A235"
print(word.isspace())
#output:
False
# An empty string has no characters, so the result is False
word = ""
print(word.isspace())
#output:
False
word = " "
print(word.isspace())
#output:
True
23. istitle()
This method returns True if every word in the sentence starts with an upper case letter and the rest of its letters are lower case, and False otherwise.
sentence = "Happy Happy Happy Happy"
print(sentence.istitle())
#output:
True
sentence = "Happy HAppy Happy Happy"
print(sentence.istitle())
#output:
False
24. isupper()
This method returns True if all the letters are upper case, and False if any letter is lower case.
sentence = "HAPPY HAPPY"
print(sentence.isupper())
#output:
True
sentence = "Happy Happy"
print(sentence.isupper())
#output:
False
25. join()
This method joins the items of an iterable into one string, placing the separator string between consecutive items.
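join() only accepts strings, which matters when a row mixes numbers and text. A minimal sketch, with a made-up row of values as an illustrative assumption:

```python
# A row with mixed types (illustrative sample data)
values = [2021, 3.5, "Delhi"]

# Convert every item to a string before joining
csv_row = ",".join(str(v) for v in values)
print(csv_row)

# Joining the raw values fails because they are not all strings
try:
    ",".join(values)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)
```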
items = ['Happy', 'Happy', 'Happy', 'Happy']
print(' '.join(items))
#output:
Happy Happy Happy Happy
# Using a different separator
items = ['Happy', 'Happy', 'Happy', 'Happy']
print('--'.join(items))
#output:
Happy--Happy--Happy--Happy
26. ljust()
It left-justifies the word in a field of the given width, padding the right side with spaces or a chosen fill character.
word = 'Happy'
width = 10
print(word.ljust(width, '@'))
#output:
Happy@@@@@
27. lower()
This method converts all the letters to lower case.
sentence = "HAPPY HAPPY"
print(sentence.lower())
#output:
happy happy
28. lstrip()
It removes spaces, or any of the given characters, from the left side of the string, and stops at the first character that is not in the given set.
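The argument is treated as a set of characters, not as a prefix, which can remove more than intended. The sketch below uses an illustrative URL-like string and shows one possible way to remove an exact prefix instead.

```python
# The sample string is an illustrative assumption.
url = "www.wikipedia.org"

# Every leading 'w' and '.' is removed, including the 'w' of 'wikipedia'
stripped = url.lstrip("w.")
print(stripped)

# To remove an exact prefix, check for it and slice it off
prefix = "www."
prefix_removed = url[len(prefix):] if url.startswith(prefix) else url
print(prefix_removed)
```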
# Removing the spaces on the left side
word = ' Happy'
print(word.lstrip())
#output:
Happy
# Removing characters
word = '...,,,bgrrrr..,,Happy'
a = word.lstrip(",.gbr")
print(a)
#output:
Happy
29. maketrans()
It builds a translation table that maps characters to their replacements, for use with translate(). It accepts either a single dictionary or two strings of equal length.
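The table becomes useful once it is passed to translate(). A short sketch with two equal-length strings, plus the optional third argument that lists characters to delete; the sample texts are illustrative assumptions.

```python
# Map each lower case vowel to its upper case form
table = str.maketrans("aeiou", "AEIOU")
translated = "data wrangling".translate(table)
print(translated)

# The third argument lists characters to delete
clean_table = str.maketrans("", "", "!?.")
cleaned = "Hello, world!?".translate(clean_table)
print(cleaned)
```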
# When a single argument is passed, it must be a dictionary
dict1 = {"A": "15", "M": "46", "I": "79", "T": "84"}
word = "AMIT"
print(word.maketrans(dict1))
#output:
{65: '15', 77: '46', 73: '79', 84: '84'}
30. partition()
This method splits the string at the first occurrence of the sub-string and returns a tuple of three parts: the text before the sub-string, the sub-string itself, and the text after it.
sentence = "Happy sad Everyone is sad Happy"
print(sentence.partition('is'))
#output:
('Happy sad Everyone ', 'is', ' sad Happy')
31. replace()
It replaces every occurrence of an existing word or letter in the string with the new word or letter.
sentence = "Happy sad Everyone is sad Happy"
print(sentence.replace('Happy', 'Sad'))
#output:
Sad sad Everyone is sad Sad
32. rfind()
It finds the last occurrence of the sub-string in the sentence, that is, the one at the highest index position, and returns its starting index.
sentence = "Happy sad Everyone is sad Happy"
result = sentence.rfind('Ha')
print("Sub-string 'Ha':", result)
#output:
Sub-string 'Ha': 26
The 'Ha' sub-string occurs twice in the sentence, so rfind() returns the starting index of the last occurrence.
33. rindex()
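It is almost the same as the rfind() method, except that it raises a ValueError instead of returning -1 when the sub-string is not found.
sentence = "Happy sad Everyone is sad Happy"
result = sentence.rindex('Ha')
print("Sub-string 'Ha':", result)
#output:
Sub-string 'Ha': 26
34. rjust()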
It right-justifies the word in a field of the given width, padding the left side with spaces or a chosen fill character.
word = 'Happy'
width = 10
print(word.rjust(width, '@'))
#output:
@@@@@Happy
35. rpartition()
This method splits the sentence at the last occurrence of the sub-string and returns a tuple of three parts: the text before it, the sub-string itself, and the text after it.
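partition() and rpartition() differ only when the separator occurs more than once. A minimal sketch, using a made-up file name as an illustrative assumption:

```python
# A file name with two dots (illustrative sample data)
filename = "report.final.csv"

first_split = filename.partition(".")    # splits at the first dot
last_split = filename.rpartition(".")    # splits at the last dot

print(first_split)
print(last_split)

# When the separator is missing, the whole string ends up in the last slot
no_separator = "report".rpartition(".")
print(no_separator)
```

Splitting at the last dot is a handy way to separate a file extension from the rest of the name.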
sentence = "Happy sad Everyone is sad Happy"
print(sentence.rpartition('is'))
#output:
('Happy sad Everyone ', 'is', ' sad Happy')
36. rsplit()
It splits the sentence into a list of words. If we specify a maximum number of splits, the splits are counted from the right side of the sentence.
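With a limit on the number of splits, split() and rsplit() keep different parts intact. A short sketch, where the sample key is an illustrative assumption:

```python
# A key with a date followed by a label (illustrative sample data)
key = "2021-05-17-sales"

from_left = key.split("-", 1)     # one split, counted from the left
from_right = key.rsplit("-", 1)   # one split, counted from the right

print(from_left)
print(from_right)
```

Without a limit, both methods return the same list.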
sentence = "Happy Happy Everyone is Happy"
# Splits at spaces
print(sentence.rsplit())
#output:
['Happy', 'Happy', 'Everyone', 'is', 'Happy']
# With a number of splits
sentence = "Happy Happy Everyone is Happy"
print(sentence.rsplit(' ', 3))
#output:
['Happy Happy', 'Everyone', 'is', 'Happy']
37. rstrip()
It removes spaces, or any of the given characters, from the right side of the string, and stops at the first character that is not in the given set.
word = 'Happy '
print(word.rstrip())
#output:
Happy
word = 'Happy.....'
print(word.rstrip("."))
#output:
Happy
38. splitlines()
This method splits the text into a list of lines at line-boundary characters such as \n, \r, \f, \v, and several more.
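For text with mixed line endings, splitlines() behaves differently from split("\n"). A minimal sketch, using a made-up CSV snippet as an illustrative assumption:

```python
# Text with Windows and Unix line endings (illustrative sample data)
text = "id,name\r\n1,Amit\n2,Ravi\n"

lines = text.splitlines()                   # handles \r\n and \n alike
naive = text.split("\n")                    # leaves '\r' and a trailing ''
with_ends = text.splitlines(keepends=True)  # keeps the line endings

print(lines)
print(naive)
print(with_ends)
```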
sentence = "Happy\nHappy\rEveryone\fis\vHappy"
print(sentence.splitlines())
#output:
['Happy', 'Happy', 'Everyone', 'is', 'Happy']
39. startswith()
This method returns True if the sentence or word starts with the given sub-string, and False otherwise.
sentence = "Happy sad Everyone is sad Happy"
result = sentence.startswith("Happy")
print(result)
#output:
True
40. upper()
It converts all the lower case letters to upper case letters.
sentence = "Happy sad Everyone is sad Happy"
print(sentence.upper())
#output:
HAPPY SAD EVERYONE IS SAD HAPPY
I hope you like the article. Reach me on my LinkedIn and Twitter.