The web content provides five practical strategies for writing more computationally efficient code in Python, emphasizing memory management, the use of generators, the itertools library, string concatenation methods, and adjusting Jupyter Notebook memory settings.
Abstract
The article titled "5 Quick & Easy Hacks to Write More Computationally Efficient Code" outlines methods for improving the computational efficiency of code, particularly in Python. It advises developers to actively manage memory by deleting unnecessary variables and using the garbage collection module. The use of generators is recommended for processing large datasets without overloading memory, as they yield results one at a time rather than storing all results simultaneously. The itertools library is highlighted for its ability to speed up iterative tasks through efficient looping constructs borrowed from other languages. The article also warns against using the + operator for string concatenation because Python strings are immutable, suggesting the use of formatted strings instead. Lastly, it guides users on increasing Jupyter Notebook's memory capacity for handling heavy computational tasks, with a caution to allocate memory judiciously.
Opinions
The author believes that developers tend to be liberal with computational resources, which can lead to inefficient code and development challenges.
Generators are praised for their ability to clean up code and avoid messy iterators while being computationally efficient.
The itertools library is seen as a powerful tool for enhancing the speed of iterative tasks in Python, compensating for the language's inherent computational inefficiencies.
The author suggests that Python's ease of use should not deter programmers from striving to write computationally efficient code.
There is an emphasis on the importance of actively managing memory to prevent excessive RAM consumption, especially when working with large data structures like DataFrames.
The author expresses surprise at the infrequent use of memory-efficient string concatenation methods, given their potential impact on performance.
Adjusting Jupyter Notebook's memory settings is presented as both a solution for memory-related errors and a potential risk if not done carefully.
5 Quick & Easy Hacks to Write More Computationally Efficient Code
Transform your inefficient coding style
When working as a computer scientist, whether performing costly big data manipulations in data science or running expensive simulations in game development, we all tend to be on the liberal side in terms of computational usage. However, if you’re not careful about how much memory and compute you use, your development will be a nightmare. Here are 5 quick and easy ways you can spare yourself that experience by writing more computationally efficient code.
Freeing up memory
Whenever you create a variable, it stays in the Python environment and takes up space for as long as a name still refers to it, even if you are not using it anymore. If you load a massive DataFrame using Pandas, make a copy or edited version of it, and never use the original again, the original DataFrame will still be in RAM, consuming memory.
Once you decide an object is no longer necessary (say, a DataFrame that contains columns not needed for analysis), explicitly delete it using del object_name. Even after deleting it, however, there may be some remaining memory usage.
The garbage collection module in Python is very helpful with freeing up used memory. At the beginning of your script, call import gc to load the library. Call gc.collect() to clear up excess space that is not being used. When you perform several transformations, functions, and copies in the data, references, values, and indices running behind the code begin to pile up, so it is helpful to call it every so often.
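As a minimal sketch of this workflow, the snippet below deletes two objects that reference each other and then calls gc.collect(). The Node class and build_cycle helper are hypothetical stand-ins for larger structures such as DataFrames and are not part of the original article. In CPython, an object with no remaining references is usually freed as soon as del removes the last name; gc.collect() matters mainly for objects caught in reference cycles, which is the case shown here.

```python
import gc


class Node:
    """Hypothetical stand-in for a large object such as a DataFrame."""

    def __init__(self, payload):
        self.payload = payload
        self.partner = None  # will point at another Node


def build_cycle():
    """Create two objects that reference each other (a reference cycle)."""
    a = Node([0] * 100_000)
    b = Node([1] * 100_000)
    a.partner = b
    b.partner = a
    return a, b


gc.collect()                 # clear any garbage left over from earlier work
a, b = build_cycle()

del a, b                     # remove the names; the cycle keeps both objects alive
unreachable = gc.collect()   # the collector finds and frees the cycle

print(f"gc.collect() found {unreachable} unreachable objects")
```

The return value of gc.collect() is the number of unreachable objects it found, which gives a rough signal of whether the call was worth making.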
Use Generators
Generators are similar to functions, but they iterate and return one item at a time instead of all at once. If you have a large dataset or list, using generators allows you to work with what you have so far, instead of being forced to wait until the entire dataset is accessible.
To create a generator, construct a function as you normally would, but substitute the return keyword with yield. Calling the built-in next() function on the generator will return its next item.
Generators are much more memory-efficient than functions that build and return a full list, because they do not need to store all the results in memory but generate them on the spot; hence, only a small amount of memory is used each time we request a result. As an additional bonus, using generators is a great way to clean up code and avoid messy iterators.
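To make the difference concrete, here is a small sketch that contrasts a list-returning function with a generator. The function names squares_list and squares_gen are illustrative choices, not taken from the original article, and sys.getsizeof reports only the shallow size of each object.

```python
import sys


def squares_list(n):
    """Build and return every result at once."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yield one result at a time instead of storing them all."""
    for i in range(n):
        yield i * i


gen = squares_gen(5)
first = next(gen)    # first item produced by the generator
second = next(gen)   # second item
rest = list(gen)     # whatever the generator has not produced yet

# Shallow size of each object: the list grows with n, the generator does not.
list_size = sys.getsizeof(squares_list(1_000_000))
gen_size = sys.getsizeof(squares_gen(1_000_000))

print(first, second, rest)
print(f"list: {list_size} bytes, generator: {gen_size} bytes")
```

Note that a generator can only be consumed once; if the results are needed again, it has to be recreated or its output stored.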
Use itertools
itertools is a must-know resource for anyone writing long iterative scripts, for instance, converting a DataFrame row by row or scraping data. While Python is known for being very easy to learn, read, and work with, this advantage unfortunately comes at the cost of being less computationally efficient than other languages, which, for instance, force you to declare variables and variable types and allow less freedom in iteration. No fear! The itertools library borrows efficient iterative building blocks from APL, Haskell, and SML, and recasts them into Python for your easy use.
Because it is based on the fundamental structures of other, more efficient languages, using itertools can make your looping much faster than native Python. Consider, for instance, the infinite iterator count(x, y), which takes a starting number x and a step size y. As an experiment, let us compare a native Python implementation of counting from 0 to 100 million with the itertools implementation.
Wow! Using itertools, the iteration process is two times faster. This is a multiplicative rather than a purely additive relationship (+7 seconds, for example) because of the different approaches to memory. By substituting native Python looping with itertools functionality, you will be saving yourself not only time but also computational space.
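A comparison along these lines could be sketched as follows. This is not the author's original benchmark: the function names, the use of itertools.islice to stop the infinite iterator, and the smaller limit of one million are all assumptions made to keep the example short. Actual timings depend on the machine and Python version, so the factor of two quoted above should be read as the author's measurement rather than a guarantee.

```python
import itertools
import time


def native_count(limit):
    """Count up to limit with a plain while loop."""
    i = 0
    while i < limit:
        i += 1
    return i


def itertools_count(limit):
    """Count up to limit by consuming itertools.count(start, step)."""
    last = 0
    # islice stops the otherwise infinite iterator after `limit` items.
    for last in itertools.islice(itertools.count(1, 1), limit):
        pass
    return last


def time_call(func, limit):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(limit)
    return result, time.perf_counter() - start


LIMIT = 1_000_000  # smaller than the 100 million in the text, to keep the run short

native_result, native_seconds = time_call(native_count, LIMIT)
iter_result, iter_seconds = time_call(itertools_count, LIMIT)

print(f"native:    {native_result} in {native_seconds:.3f}s")
print(f"itertools: {iter_result} in {iter_seconds:.3f}s")
```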
Not only is itertools very fast, but it also has easy implementations of more complex iterative tasks that would otherwise need to be implemented by hand. A small sample of these would be:
Filtering. Return the items that satisfy a given condition.
Compression. Return the items whose corresponding entry in a 'compression list' is 1 (e.g. compress(['a','b','c','d'],[0,1,0,1]) -> ['b','d']).
Permutations. Iterate through all possible orderings of a collection of elements.
Accumulation. Iterate through a list while keeping track of the running sum of the term at the current index and all terms before it.
You can find the complete list of functions in the itertools documentation.
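The four tasks listed above map onto itertools functions roughly as in the sketch below. The sample data is made up for illustration. Each function returns a lazy iterator, so list() is used here only to display the results.

```python
from itertools import accumulate, compress, filterfalse, permutations

numbers = [1, 2, 3, 4, 5, 6]

# Filtering: filterfalse keeps the items for which the predicate is false,
# so an "is odd" predicate leaves the even numbers.
evens = list(filterfalse(lambda n: n % 2, numbers))

# Compression: keep the items whose selector is truthy.
picked = list(compress(["a", "b", "c", "d"], [0, 1, 0, 1]))

# Permutations: every ordering of length 2 drawn from the input.
pairs = list(permutations("abc", 2))

# Accumulation: running sum up to and including each index.
running = list(accumulate([1, 2, 3, 4]))

print(evens)
print(picked)
print(pairs)
print(running)
```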
Don’t use + for strings
A common computationally expensive task in loops is to perform some operation on a long list of strings. The easy way to join strings is with +; for example, "hey there " + "bob" -> "hey there bob". Unfortunately, as strings are immutable, every time an element is added to a string, Python will create an entirely new string at an entirely new address. This means that every time a string is altered, new memory needs to be allocated. So after your complex string-altering function has gone through all the rows in a massive NLP dataset, your notebook is chock-full of memory-consuming strings that are no longer being used.
Although Python is not as rigid as other languages about writing the most computationally efficient code, that doesn’t mean we shouldn’t try to do so. Using the %[object abbreviation] notation allows Python to optimize accommodation for a specific data type. In this case, we are attaching a string, so the notation used is %s.
This approach is so helpful, yet people so seldom actually use it. Your memory will thank you!
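As a short sketch of the notation, the snippet below builds the same greeting with + and with %s. The last part uses str.join, which is not mentioned in the original article; it is included as an assumption about a commonly used alternative when one string is assembled from many pieces in a loop. The variable names and sample data are illustrative.

```python
name = "bob"

# Concatenation with +, as in the example in the text.
with_plus = "hey there " + name

# %-formatting: %s marks where a string value is substituted.
with_percent = "hey there %s" % name

# Building one string from many pieces: collect them, then join once,
# instead of extending a string with + on every iteration.
rows = ["row %s" % i for i in range(5)]  # hypothetical sample data
joined = ", ".join(rows)

print(with_plus)
print(with_percent)
print(joined)
```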
Increase Jupyter Notebook’s default memory size
Jupyter Notebook is a top choice of environment for many people working with computationally heavy operations and data manipulation.
If you’ve ever dealt with heavy models and massive amounts of data, you may have witnessed Jupyter Notebook suddenly shut down with an error message, “notebook attempted to allocate more memory than possible.” To change Jupyter Notebook’s default memory size, launch it with the max_buffer_size argument set to whatever you would like the size to be.
This overrides the max_buffer_size value that would otherwise come from your Jupyter Notebook configuration.
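As a rough sketch of what such a launch command might look like, the helper below assembles one as an argument list. It assumes the classic Notebook option name NotebookApp.max_buffer_size with a value in bytes; installations built on Jupyter Server may expect ServerApp.max_buffer_size instead, so check the options your version reports. The helper name build_jupyter_command and the 4 GiB figure are hypothetical.

```python
def build_jupyter_command(max_buffer_bytes):
    """Return the argument list for launching Jupyter with a larger buffer.

    Assumes the classic Notebook option name NotebookApp.max_buffer_size;
    Jupyter Server based installs may expect ServerApp.max_buffer_size.
    """
    if max_buffer_bytes <= 0:
        raise ValueError("max_buffer_bytes must be positive")
    return [
        "jupyter",
        "notebook",
        f"--NotebookApp.max_buffer_size={max_buffer_bytes}",
    ]


# Hypothetical size: 4 GiB expressed in bytes.
four_gib = 4 * 1024 ** 3
command = build_jupyter_command(four_gib)

# Print the command so it can be pasted into a terminal.
print(" ".join(command))
```

The same setting can usually be made persistent in a Jupyter configuration file rather than passed on every launch.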
Be careful with this, though; allocating more memory than you need may end up doing more harm than good. On the other hand, it is a good way to utilize the full power of your device.