How to use Python & SQL to manipulate data in 1 min
Just read on!

1. Introduction
Hi all. This post is going to be a bit different: much shorter than my previous articles.
I just discovered a great Python library and I wanted to share it with my audience.
Would you like to use both Python and SQL to manipulate data?
If you answered yes, read on!
2. The library
FugueSQL is an interface that allows users to use SQL to work with Pandas, Spark, and Dask DataFrames.
A brief summary:
Fugue is a unified interface for distributed computing that lets users execute Python, pandas, and SQL code on Spark and Dask without rewrites. It is meant for:
- Data scientists/analysts who want to focus on defining logic rather than worrying about execution
- SQL-lovers wanting to use SQL to define end-to-end workflows in pandas, Spark, and Dask.
- Data scientists using pandas wanting to take advantage of Spark or Dask with minimal effort.
- Data teams with big data projects that struggle to maintain code.
The official page of the library is the following: https://github.com/fugue-project/fugue#fuguesql
3. A short example
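Install it:
python3 -m pip install fugue
Note: on recent Fugue releases (0.9.0 and later, as far as I can tell), FugueSQL is an optional extra, so you may need python3 -m pip install "fugue[sql]" instead.
Example using SELECT, WHERE and PRINT commands: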
from fugue_sql import fsql
import pandas as pd

# Build a pandas DataFrame
df = pd.DataFrame({"monthly_readers": [1000, 2000, 3000],
                   "topic": ["ML", "AI", "Python"]})

print(df)
#    monthly_readers   topic
# 0             1000      ML
# 1             2000      AI
# 2             3000  Python

# Define the query: print the topics that had more than 1000 readers
query = """
SELECT topic FROM df
WHERE monthly_readers > 1000
PRINT
"""

# Execute the query
fsql(query).run()
# PandasDataFrame
# topic:str
# ---------
# AI
# Python
# Total count: 2
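For comparison, here is roughly what the same filter looks like in plain pandas, without FugueSQL. This is only a sketch of mine to show what the SELECT and WHERE clauses are doing. The variable name popular_topics is my own choice and is not part of the Fugue example.

```python
import pandas as pd

# Same data as in the FugueSQL example
df = pd.DataFrame({"monthly_readers": [1000, 2000, 3000],
                   "topic": ["ML", "AI", "Python"]})

# Rough pandas equivalent of:
#   SELECT topic FROM df WHERE monthly_readers > 1000
# (popular_topics is an illustrative name, not something Fugue defines)
popular_topics = df.loc[df["monthly_readers"] > 1000, ["topic"]]

print(popular_topics)
```

One small difference: the pandas result keeps the original row labels (1 and 2), while the FugueSQL PRINT output shows only the topic column.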
That’s all folks!
As said at the beginning, this post was not going to be as lengthy as my previous articles.
Hope you liked this article! Feel free to share!






