Summary

The article introduces Python's Pipe library, demonstrating how it simplifies data processing by providing concise and readable code.

Abstract

The Pipe library in Python is presented as a powerful tool for streamlining data manipulation and cleaning tasks. The library offers functionalities similar to built-in functions like filter and map, but with a more intuitive syntax that allows for chaining operations in a pipeline manner, reminiscent of Unix pipes. The article highlights key features of Pipe, such as filtering elements with where, transforming elements with select, flattening nested structures with chain and traverse, grouping data with groupby, and performing deduplication with dedup and uniq. Examples provided illustrate how these operations can be combined to perform complex data transformations in a clear and concise way, ultimately improving code readability and maintainability.

Opinions

The author advocates for the Pipe library as a means to achieve code simplicity and adheres to the "less is more" philosophy in programming.
The Pipe library is praised for its ability to handle nested lists and dictionaries with ease, which is often a cumbersome task in Python.
The article suggests that using Pipe can enhance the readability of code, particularly when dealing with data processing tasks.
The author implies that Pipe's functionality can replace more verbose traditional Python code, making it a preferred choice for certain data operations.
The comparison of Pipe's chaining capabilities to Unix pipes conveys a sense of familiarity and efficiency for those accustomed to command-line data processing.
By providing practical examples, the author conveys confidence in the Pipe library's effectiveness and ease of use for Python developers.

Writing Concise Code with Python’s Pipe Library

The way of code simplicity: less is more.

Python is known to be very good at manipulating and cleaning data. Today I will introduce the Pipe library for processing data.

Install

pip install pipe

Filter elements

Like the built-in filter, the where operator in pipe keeps only the elements of an iterable that satisfy a condition.

In [4]: from pipe import *
In [5]: numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8]

In [6]: list(numbers | where(lambda x: x % 2 == 0))
Out[6]: [0, 2, 4, 6, 8]
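For comparison, here is a small sketch of the same filter written with built-ins only, so it runs without pipe installed. The helper name is_even is introduced here purely for illustration.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8]


def is_even(x):
    """Return True for even integers (helper named here for illustration)."""
    return x % 2 == 0


# The built-in filter takes the function first and the iterable second
evens_with_filter = list(filter(is_even, numbers))

# An equivalent list comprehension
evens_with_comprehension = [x for x in numbers if is_even(x)]

print(evens_with_filter)  # [0, 2, 4, 6, 8]
```

With where, the data comes first and the condition second, which is what makes left-to-right chaining possible.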

Transform elements

Like map, the select operation applies a function to each element in the iterable. In the following example, we double the elements in the list.

In [8]: list(numbers | select(lambda x: x * 2))
Out[8]: [0, 2, 4, 6, 8, 10, 12, 14, 16]

Of course, multiple operations can also be combined.

The following example picks out the even numbers in the list and doubles them. Where filter and map have to be nested inside one another, pipe lets you connect operations from left to right, like water flowing through a pipe.

In [10]: list(numbers
    ...:     | where(lambda x: x % 2 == 0)
    ...:     | select(lambda x: x * 2)
    ...:    )
    ...:
Out[10]: [0, 4, 8, 12, 16]
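The | chaining is possible because Python lets a class define what | means when one of its instances sits on the right-hand side. The sketch below is a toy illustration of that idea using __ror__. The names MiniPipe, mini_where, and mini_select are hypothetical and invented here; this is not presented as the pipe library's actual implementation.

```python
class MiniPipe:
    """Toy pipeable wrapper (hypothetical; not the pipe library's own class)."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, iterable):
        # Python calls this for `iterable | self` when the left operand
        # (a list or generator here) does not implement `|` itself
        return self.function(iterable)

    def __call__(self, *args, **kwargs):
        # Bind the extra arguments now; the iterable arrives later through `|`
        return MiniPipe(lambda iterable: self.function(iterable, *args, **kwargs))


@MiniPipe
def mini_where(iterable, predicate):
    # Lazily keep the items for which predicate(item) is true
    return (x for x in iterable if predicate(x))


@MiniPipe
def mini_select(iterable, transform):
    # Lazily apply transform to every item
    return (transform(x) for x in iterable)


numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8]
result = list(
    numbers
    | mini_where(lambda x: x % 2 == 0)
    | mini_select(lambda x: x * 2)
)
print(result)  # [0, 4, 8, 12, 16]
```

Each stage returns a generator, so nothing is computed until list() consumes the final result.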

Connecting elements

Working with nested lists is awkward. Fortunately, pipe offers a friendly interface: chain flattens one level of nesting, and chain_with appends further iterables to a sequence.

In [17]: list([[1, 2], [3, 4], [5,6]] | chain)
Out[17]: [1, 2, 3, 4, 5, 6]

In [18]: list((1, 2, 3) | chain_with([4, 5], [6,7]))
Out[18]: [1, 2, 3, 4, 5, 6, 7]

In [19]: list((1, 2, 3) | chain_with([4, 5], [6,[7],8]))
Out[19]: [1, 2, 3, 4, 5, 6, [7], 8]
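For readers who know itertools, the two operations above can be approximated with the standard library alone. This is a comparison sketch, not a statement about how pipe implements them; the variable names are chosen here for illustration.

```python
import itertools

# Rough counterpart of `| chain`: flatten exactly one level of nesting
flattened = list(itertools.chain.from_iterable([[1, 2], [3, 4], [5, 6]]))

# Rough counterpart of `| chain_with(...)`: append further iterables as they are
joined = list(itertools.chain((1, 2, 3), [4, 5], [6, [7], 8]))

print(flattened)  # [1, 2, 3, 4, 5, 6]
print(joined)     # [1, 2, 3, 4, 5, 6, [7], 8]
```

In both cases the inner list [7] survives, because only the outermost level is unpacked.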

As you can see above, chain and chain_with only unpack one level. To flatten several levels of nesting, use traverse.

In [20]: list([[1, 2], [[[3], [[4]]], [5]]] | traverse)
Out[20]: [1, 2, 3, 4, 5]
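To make the idea of recursive flattening concrete, here is a minimal standard-library sketch. The function name flatten is hypothetical, and it only descends into lists and tuples; the library's own handling of other iterable types may differ.

```python
def flatten(items):
    """Recursively yield leaf values from arbitrarily nested lists or tuples."""
    for item in items:
        if isinstance(item, (list, tuple)):
            # Descend into the nested container and yield its leaves in order
            yield from flatten(item)
        else:
            yield item


nested = [[1, 2], [[[3], [[4]]], [5]]]
flat = list(flatten(nested))
print(flat)  # [1, 2, 3, 4, 5]
```

Because it is a generator, flatten produces values lazily, in the same left-to-right order as the nested input.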

Combined with select, traverse can collect the values of one field from a list of dictionaries, even when some values are themselves lists.

In [22]: fruits = [
    ...:     {"name": "apple", "price": [2, 5]},
    ...:     {"name": "orange", "price": 4},
    ...:     {"name": "grape", "price": 5},
    ...: ]

In [23]: list(fruits
    ...:      | select(lambda fruit: fruit["price"])
    ...:      | traverse)
    ...:
Out[23]: [2, 5, 4, 5]

Grouping

Grouping the elements of a list is a common need, and pipe's groupby handles it. Each group arrives as a (key, items) pair, so the select below turns each pair into a dictionary.

In [25]: list(numbers
    ...:      | groupby(lambda x: 'Even' if x % 2 == 0 else 'Odd')
    ...:      | select(lambda x: {x[0]: list(x[1])})
    ...:     )
    ...:
Out[25]: [{'Even': [0, 2, 4, 6, 8]}, {'Odd': [1, 3, 5, 7]}]

You can also apply where inside select to filter the elements of each group.

In [26]: list(numbers
    ...:      | groupby(lambda x: 'Even' if x % 2 == 0 else 'Odd')
    ...:      | select(lambda x: {x[0]: list(x[1] | where(lambda x: x > 2))})
    ...:     )
    ...:
Out[26]: [{'Even': [4, 6, 8]}, {'Odd': [3, 5, 7]}]
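As a point of comparison, the standard library's itertools.groupby only merges consecutive items with the same key, so the input has to be sorted by that key first to get one group per key. The sketch below shows both behaviours; the helper name parity is introduced here for illustration, and the sketch is not a claim about pipe's internals.

```python
import itertools

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8]


def parity(x):
    """Classify an integer as 'Even' or 'Odd' (helper named for illustration)."""
    return 'Even' if x % 2 == 0 else 'Odd'


# Without sorting, every change of key starts a new group
unsorted_groups = [
    (key, list(group)) for key, group in itertools.groupby(numbers, key=parity)
]

# Sorting by the key first yields one group per key; sorted() is stable,
# so the items keep their original relative order inside each group
sorted_groups = [
    {key: list(group)}
    for key, group in itertools.groupby(sorted(numbers, key=parity), key=parity)
]

print(len(unsorted_groups))  # 9
print(sorted_groups)         # [{'Even': [0, 2, 4, 6, 8]}, {'Odd': [1, 3, 5, 7]}]
```

The sorted version reproduces the output of the first grouping example above.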

Row and column swap

Swapping rows and columns is a common step in data processing, especially when working with a DataFrame. With pipe, transpose does it in one line.

In [27]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]] | transpose
Out[27]: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
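The same row and column swap can be sketched with the built-in zip, which may help explain why the result is a list of tuples. The variable names are chosen here for illustration.

```python
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Unpacking with * passes each row as a separate argument, so zip pairs up
# the first items of every row, then the second items, and so on
columns = list(zip(*rows))

print(columns)  # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
```

Note that zip stops at the shortest row, so ragged input would be silently truncated in this sketch.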

Remove duplicates

Deduplicating a list is another common operation. In pipe, dedup drops every repeated element and keeps the first occurrence of each value.

In [28]: list([1, 1, 2, 2, 3, 3, 1, 2, 3] | dedup)
Out[28]: [1, 2, 3]

Unlike dedup, uniq only collapses runs of consecutive duplicates into a single element; duplicates that are not adjacent are kept.

In [29]: list([1, 1, 2, 2, 3, 3, 1, 2, 3] | uniq)
Out[29]: [1, 2, 3, 1, 2, 3]
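The difference between the two can be sketched with the standard library. The functions dedup_sketch and uniq_sketch are hypothetical names written for this comparison; they assume hashable items and are not the library's own code.

```python
import itertools


def dedup_sketch(items):
    """Yield each distinct value once, in order of first appearance."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


def uniq_sketch(items):
    """Collapse each run of equal consecutive values into a single value."""
    for key, _run in itertools.groupby(items):
        yield key


data = [1, 1, 2, 2, 3, 3, 1, 2, 3]
print(list(dedup_sketch(data)))  # [1, 2, 3]
print(list(uniq_sketch(data)))   # [1, 2, 3, 1, 2, 3]
```

The first function has to remember everything it has seen, while the second only compares neighbouring items, much like the Unix uniq command.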

Today, we introduced a good way to process data. Using the pipe library can make tedious operations concise and improve the readability of the code.

References:

https://pypi.org/project/pipe/

Thank you for reading!
