The web content provides an overview of ten essential Numpy functions that can significantly simplify data manipulation and analysis tasks for data scientists.
Abstract
The article "10 quick Numpy tricks that will make life easier for a data scientist" highlights the power and convenience of Numpy, a fundamental Python library for numerical computing. It introduces readers to Numpy's capabilities through ten practical functions, such as numpy.arange for generating sequences, numpy.random for random number generation, numpy.argmax for finding the index of the maximum value, slicing and indexing techniques for efficient data access, set operations like numpy.setdiff1d, array reshaping with numpy.reshape, conditional selection using numpy.where, array concatenation via numpy.concatenate, exponentiation with numpy.power, and approximate array comparison with numpy.allclose. Each function is illustrated with examples and accompanied by explanations of their use cases, aiming to help data scientists perform complex tasks more efficiently and with less code.
Opinions
The author expresses a strong preference for Numpy, stating that it was the sole reason for shifting from C++ to Python for data science tasks.
Numpy is portrayed as a library that can change a programmer's loyalty to a programming language due to its utility and efficiency.
The article suggests that mastery of Numpy's functions can eliminate the need for verbose code, such as double for loops when dealing with 2D arrays.
The author believes that Numpy's functions for array manipulation, such as slicing and reshaping, can drastically reduce the length of code.
There is an emphasis on the importance of Numpy for data science, with the author asserting that no one can survive in the field without Python and, by extension, Numpy.
The author playfully challenges readers to test their Numpy knowledge with an example, indicating a community spirit of learning and improvement through quizzing and shared knowledge.
The article concludes by certifying the reader as a "Numpy master" upon understanding the presented functions, suggesting a transformative effect of learning these Numpy tricks on the reader's data science skills.
10 quick Numpy tricks that will make life easier for a data scientist
Numpy was the only thing that made me, a die-hard C++ coder, shift to Python in the first place. Mind you, this was before my data science learning curve, where no one can survive without Python! (No offense, please, just stating my opinion. To all the R lovers, I respect you guys all the same.) But if one library can change my loyalty to a programming language, then it must be one hell of a library.
In the robotics lab that I interned at once, we used to quiz each other on weird Numpy commands, and that is when I got to see the beauty of this library truly. So if you pass the following test, by all means, you don’t need this blog :P. But otherwise, keep reading!!
Test: Predict the output of the following
import numpy as np
arr = np.arange(5)
arr = arr[-1::-1]
print(arr)
If you already know this, you probably know most of what I will cover through this blog but otherwise, keep reading.
Output is [4 3 2 1 0]
Let’s dive into functions of Numpy that make it a lifesaver for us, especially for me because I hate writing double for loops to access a 2D array.
Note: In all future examples, I have assumed that import numpy as np has been called
1. numpy.arange(start, stop, step)
This function returns a numpy array of numbers that start at start, end before stop, and increase by step each time. So the numbers lie in the half-open interval [start, stop).
If you want to see other ways to define and initialize an array, I have mentioned them in my previous blog here.
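Here is a small sketch of np.arange in action. The variable names are made up for illustration, and the comments show what each call should print.

```python
import numpy as np

# Integers from 2 up to (but not including) 10, in steps of 2.
evens = np.arange(2, 10, 2)
print(evens)        # [2 4 6 8]

# With a single argument, start defaults to 0 and step defaults to 1.
first_five = np.arange(5)
print(first_five)   # [0 1 2 3 4]

# A negative step counts down; stop is still excluded.
countdown = np.arange(5, 0, -1)
print(countdown)    # [5 4 3 2 1]
```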
2. numpy.random
numpy.random is a module in NumPy that contains functions used for generating random numbers. It can be used for several purposes and is especially important when we deal with probability-based functions. Let’s look at a couple of examples of different functions from this module. The outputs shown below will change when you use the same commands since they are randomly generated.
#Create an array of random numbers, size of array is (3,2)
#Return random integers from low (inclusive) to high (exclusive), or from 0 (inclusive) to low (exclusive) if high is None: random.randint(low, high=None, size=None, dtype=int)
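The calls that originally went with these two comments are not shown, so the snippet below is only a plausible sketch. It assumes np.random.rand for the (3, 2) array of random numbers and uses np.random.randint as described above. The seed call and the variable names are additions for illustration.

```python
import numpy as np

np.random.seed(0)  # optional: fixing the seed makes a run repeatable

# Array of random floats drawn uniformly from [0, 1), with shape (3, 2).
uniform_samples = np.random.rand(3, 2)
print(uniform_samples.shape)   # (3, 2)

# Random integers from 1 (inclusive) to 10 (exclusive), with shape (2, 4).
int_samples = np.random.randint(1, 10, size=(2, 4))
print(int_samples)

# With only low given, integers are drawn from 0 (inclusive) to low (exclusive).
small_ints = np.random.randint(5, size=3)
print(small_ints)
```

The printed values depend on the seed, so only the shapes and ranges are predictable here.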
3. numpy.argmax
A lot of our use cases, especially when performing optimization, require us to know which variable has the maximum or minimum value. To get rid of the extra lines of code needed to keep track of this information, we can simply use the numpy functions argmax and argmin. Let’s see an example. The following examples use the term axis. The axis of an array is the direction along which we want to perform the calculation. If we don’t specify an axis, the calculation is done over the flattened array by default.
>>> a = np.array([[10, 12, 11],[13, 14, 10]])
>>> np.argmax(a)
4 # since if the array is flattened, 14 is at index 4
>>> np.argmax(a, axis=0)
array([1, 1, 0]) # index of max in each column
>>> np.argmax(a, axis=1)
array([1, 1]) # index of max in each row
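The text mentions argmin as well, so here is a matching sketch on the same array. The use of np.unravel_index to turn the flat index back into a (row, column) pair is an extra not covered in the article, included because the flat index is often less useful than the position.

```python
import numpy as np

a = np.array([[10, 12, 11], [13, 14, 10]])

flat_min = np.argmin(a)          # 0: in the flattened array, 10 first appears at index 0
col_min = np.argmin(a, axis=0)   # [0 0 1]: row index of the minimum in each column
row_min = np.argmin(a, axis=1)   # [0 2]: column index of the minimum in each row

# Convert the flat index returned by argmax into a (row, column) position.
row, col = np.unravel_index(np.argmax(a), a.shape)
print(row, col)                  # 1 1
```

When the minimum occurs more than once, as 10 does here, argmin reports the first occurrence.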
4. Slicing and indexing
Slicing plays a crucial role in reading and changing the contents of a NumPy array. It is used to access multiple elements of an array at the same time. An example makes it easier to understand how it works.
>>> a = np.arange(5)
>>> print(a[4])
4 # number at index 4
>>> print(a[1:])
[1 2 3 4] # all elements from index 1 (inclusive) to the end
>>> print(a[:2])
[0 1] # all elements from index 0 up to index 2 (exclusive)
>>> print(a[1:3])
[1 2] # all elements from index 1 (inclusive) to index 3 (exclusive)
>>> print(a[0:4:2])
[0 2] # [0:4:2] gives the start index, stop index, and step respectively. It starts at index 0 (inclusive) and goes up to index 4 (exclusive) in steps of 2, which selects indexes 0 and 2, so [0 2] is the output.
Slicing is used to access elements at multiple indexes of a numpy array smartly. If used properly, it could reduce the length of code by a drastic amount.
Jumping back to the test I gave at the beginning of the blog, the sorcery there was using slicing. When we wrote arr[-1::-1] it essentially started at the last element of arr and then the second -1 ensured that it went in reverse order with step size 1. Hence we got the reverse array.
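The same slicing syntax extends to 2D arrays, which is where it replaces those double for loops. The following is a minimal sketch. The array and variable names are invented for illustration.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# grid is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

second_row = grid[1]         # [4 5 6 7]
third_col = grid[:, 2]       # [ 2  6 10]
top_left = grid[:2, :2]      # [[0 1], [4 5]]
rows_reversed = grid[::-1]   # rows in reverse order, like arr[-1::-1] in the test

# Assigning through a slice changes every selected element at once.
# A copy is used here so that grid itself stays untouched.
updated = grid.copy()
updated[:, 0] = -1           # the first column becomes -1 in every row
print(updated)
```

Basic slices are views onto the original data, which is why the assignment is done on a copy here.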
5. numpy.setdiff1d(ar1, ar2)
When we need to treat our arrays as sets and find their difference, intersection, or union, numpy makes the job easy with its built-in functions.
#Set difference
>>> a = np.array([1, 2, 3, 2, 4, 1])
>>> b = np.array([3, 4, 5, 6])
>>> np.setdiff1d(a, b)
array([1, 2])
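For the intersection and union mentioned above, NumPy provides np.intersect1d and np.union1d. Here is a short sketch on the same two arrays, with np.unique added as a related helper.

```python
import numpy as np

a = np.array([1, 2, 3, 2, 4, 1])
b = np.array([3, 4, 5, 6])

common = np.intersect1d(a, b)   # [3 4]: values present in both arrays
combined = np.union1d(a, b)     # [1 2 3 4 5 6]: values present in either array
distinct = np.unique(a)         # [1 2 3 4]: sorted unique values of a

print(common, combined, distinct)
```

All three return sorted arrays with duplicates removed, just like setdiff1d.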
6. numpy.reshape
When dealing with matrices, we need compatible dimensions for operations such as multiplication. Even when dealing with complex data, I have had to change the shape of my arrays countless times.
Check out more examples here. (I strongly recommend reading all the examples.)
Similar functions: ndarray.flatten, numpy.ravel. Both of these change any array into a 1D array, but there are subtle differences between the two. Also, functions like numpy.squeeze and numpy.expand_dims are used to remove or add one dimension in the array.
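A brief sketch of reshaping and the related functions follows. It uses np.expand_dims as NumPy's way of adding an axis. The variable names are illustrative only.

```python
import numpy as np

a = np.arange(6)                 # [0 1 2 3 4 5]
m = np.reshape(a, (2, 3))        # [[0 1 2], [3 4 5]]
n = a.reshape(3, -1)             # -1 lets NumPy infer that dimension: shape (3, 2)

# flatten always returns a copy; ravel returns a view when it can.
flat_copy = m.flatten()
flat_view = m.ravel()
flat_copy[0] = 99                # m is unaffected by changes to the copy
print(m[0, 0])                   # 0

# Add a length-1 axis, then remove it again.
column = np.expand_dims(a, axis=1)   # shape (6, 1)
back = np.squeeze(column)            # shape (6,)
print(column.shape, back.shape)      # (6, 1) (6,)
```

The copy versus view behavior is one of the subtle differences between flatten and ravel: writing into the result of ravel can change the original array.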
7. numpy.where(condition[, x, y])
np.where helps us select elements based on a condition that we pass to the function. Where the condition is satisfied, the element from x is chosen; otherwise the element from y is chosen. Let’s see an example to understand more.
>>> np.where([[True, False], [True, True]],
... [[1, 2], [3, 4]], #x
... [[9, 8], [7, 6]]) #y
array([[1, 8],
       [3, 4]])
#Here, since the second element in the first row was False, the output contains the element from the second array (y) at that index.
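In practice the condition is usually a comparison on an array rather than a hand-written list of booleans. The sketch below uses a made-up scores array to show that, along with the one-argument form that returns indexes.

```python
import numpy as np

scores = np.array([35, 80, 62, 49, 91])

# Three-argument form: take from x where the condition holds, else from y.
labels = np.where(scores >= 50, "pass", "fail")
print(labels)        # ['fail' 'pass' 'pass' 'fail' 'pass']

# x and y can mix scalars and arrays; here values above 75 are capped at 75.
capped = np.where(scores > 75, 75, scores)
print(capped)        # [35 75 62 49 75]

# One-argument form: a tuple of index arrays where the condition is True.
passing_idx = np.where(scores >= 50)[0]
print(passing_idx)   # [1 2 4]
```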
9. numpy.power
I truly can’t tell you how much easier this function has made my life. It has helped me replace all the loops I used to write to square or cube a whole array. This function outputs an array that contains the first array’s elements raised to the powers from the second array, element-wise. Let me show you an example to make things clear.
The ** operator can be used as a shorthand for np.power on ndarrays.
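A small sketch of np.power and the ** shorthand, using an invented base array:

```python
import numpy as np

base = np.array([1, 2, 3, 4])

# A scalar exponent is applied to every element.
squares = np.power(base, 2)
print(squares)   # [ 1  4  9 16]

# An array exponent is applied element-wise: 1**3, 2**2, 3**1, 4**0.
mixed = np.power(base, np.array([3, 2, 1, 0]))
print(mixed)     # [1 4 3 1]

# The ** operator gives the same result as np.power.
cubes = base ** 3
print(cubes)     # [ 1  8 27 64]
```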
Similar functions: All the basic mathematical operators. You can check them out here.
10. numpy.allclose(a, b, rtol=1e-5, atol=1e-8)
np.allclose is used to determine whether two arrays are element-wise equal within a tolerance. The tolerance values rtol and atol are small positive numbers. The tolerance for each element is calculated as the sum of (rtol * abs(b)) and atol, and the element-wise absolute difference between a and b must be less than or equal to that tolerance.
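To see why this matters, here is a sketch that compares exact equality with np.allclose. The arrays and tolerance values are chosen purely for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9                            # differs from a by a tiny amount

exactly_equal = np.array_equal(a, b)    # False: the values are not identical
close_default = np.allclose(a, b)       # True: within the default tolerances
print(exactly_equal, close_default)     # False True

# The classic floating point surprise: 0.1 + 0.2 is not exactly 0.3.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)                 # False
print(np.allclose(float_sum, 0.3))      # True

# Tightening or loosening the tolerances changes the verdict.
strict = np.allclose(1.0, 1.001, rtol=0, atol=1e-4)   # False: 1e-3 > 1e-4
loose = np.allclose(1.0, 1.001, atol=1e-2)            # True
print(strict, loose)                    # False True
```

Because the tolerance is computed from abs(b), the comparison is not symmetric: in rare edge cases np.allclose(a, b) and np.allclose(b, a) can disagree.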