Christianlauer


My Top Big Data Python Libraries

Which libraries can help you process Big Data?

Photo by Brooke Cagle on Unsplash

While Data Scientists rely primarily on libraries such as Keras or TensorFlow, Data Engineers can also benefit from practical libraries in their daily work. Here I would like to introduce the ones I like to use. The first is probably the best known and is widely used in many areas.

Pandas

Pandas can read data in many different formats, such as CSV, JSON, Excel, and Parquet files. It provides functions for data cleansing, aggregation, transformation, and other tasks. Its strength lies in evaluating and processing tabular data, which makes it a great fit for the data integration part of ETL and ELT pipelines from source to target systems.
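As a minimal sketch of such a cleansing and aggregation step, the example below reads a small CSV, removes duplicates and incomplete rows, and sums the amounts per country. The column names and the inline data are made up for illustration; a real pipeline would read from a file or a source system instead.

```python
import io

import pandas as pd

# Hypothetical raw extract; in practice this would be a file or an API response
raw_csv = io.StringIO(
    "order_id,country,amount\n"
    "1,DE,10.5\n"
    "2,US,\n"
    "3,DE,4.5\n"
    "3,DE,4.5\n"
)

orders = pd.read_csv(raw_csv)

# Cleansing: drop exact duplicate rows and rows without an amount
clean = orders.drop_duplicates().dropna(subset=["amount"])

# Aggregation: total amount per country
totals = clean.groupby("country", as_index=False)["amount"].sum()
print(totals)
```

The resulting DataFrame could then be written to a target system, for example with `DataFrame.to_sql` or `DataFrame.to_parquet`.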

NumPy

NumPy extends the Python programming language with powerful data structures for efficient computation with large arrays and matrices.

NumPy is designed for very large amounts of data in the form of matrices and arrays. In addition, the module offers a large number of high-quality mathematical functions for working with them.
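To illustrate, here is a small sketch of vectorized arithmetic on a large array and of a matrix product. The array sizes and values are arbitrary and chosen only for demonstration.

```python
import numpy as np

# One million values, processed without an explicit Python loop
readings = np.arange(1_000_000, dtype=np.float64)
scaled = readings * 0.5 + 1.0

# View the same data as a 1000 x 1000 matrix and average each column
column_means = scaled.reshape(1000, 1000).mean(axis=0)

# Matrix multiplication with the @ operator
m = np.array([[1.0, 2.0], [3.0, 4.0]])
product = m @ m

print(column_means.shape)
print(product)
```

Operations like these run in compiled code over the whole array, which is why they are usually much faster than an equivalent Python loop.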

BigQuery Client Libraries

Anyone working with Big Data also needs systems designed for this purpose, such as Google’s BigQuery Data Lake and Warehouse technology. Google offers official client libraries for it, so you can easily read data from and write data to BigQuery via Python. Other platforms such as Amazon Redshift or Snowflake offer similar libraries or connectors.
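A rough sketch of reading from BigQuery with the official client library could look like this. It assumes the third-party `google-cloud-bigquery` package is installed and that Google Cloud credentials are configured in the environment. The table name in `EXAMPLE_SQL` and the helper `fetch_rows` are hypothetical and not part of the library.

```python
from typing import Any, Dict, List, Optional

# Hypothetical table name; replace it with a table you have access to
EXAMPLE_SQL = (
    "SELECT source, COUNT(*) AS n "
    "FROM `my_project.my_dataset.events` "
    "GROUP BY source"
)


def fetch_rows(sql: str, project: Optional[str] = None) -> List[Dict[str, Any]]:
    """Run a query and return the result rows as plain dictionaries."""
    # Imported lazily so this module loads even without the package installed
    from google.cloud import bigquery

    client = bigquery.Client(project=project)  # picks up default credentials
    rows = client.query(sql).result()  # blocks until the query job finishes
    return [dict(row.items()) for row in rows]


# Usage (needs credentials and an existing table):
# rows = fetch_rows(EXAMPLE_SQL)
```

Returning plain dictionaries keeps the rest of a pipeline independent of the BigQuery row type, which is a design choice of this sketch rather than a requirement of the library.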

SQLite

While MySQL and PostgreSQL require connectors and additional modules, for SQLite you only need the corresponding library: the sqlite3 module, which ships with Python’s standard library.

“SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language.” (python.org [1])
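Because no server is needed, a database can be created and queried in a few lines. The following is a minimal sketch using an in-memory database; the `events` table and its contents are invented for illustration.

```python
import sqlite3

# ":memory:" creates a temporary database; pass a file path to persist it
conn = sqlite3.connect(":memory:")

conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, source TEXT, rows_loaded INTEGER)"
)

# Parameterized inserts avoid building SQL strings by hand
conn.executemany(
    "INSERT INTO events (source, rows_loaded) VALUES (?, ?)",
    [("crm", 120), ("shop", 300), ("crm", 80)],
)
conn.commit()

totals = conn.execute(
    "SELECT source, SUM(rows_loaded) FROM events GROUP BY source ORDER BY source"
).fetchall()
conn.close()

print(totals)
```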

Summary

There are some really useful libraries that let you work with Python even more efficiently. Data Scientists and their libraries are often in the foreground. However, some of those libraries, along with others from the area of data integration, can also make the everyday work of engineers much easier. Here I have listed some of the libraries I often use in my daily work and when processing Big Data.

Sources and Further Readings

[1] python.org, sqlite3 - DB-API 2.0 interface for SQLite databases (2021), https://docs.python.org/3/library/sqlite3.html#module-sqlite3
