Free AI web copilot to create summaries, insights and extended knowledge, download it at here

Abstract

e>engine = sqlalchemy.create_engine( sqlalchemy.engine.url.URL( drivername="postgresql", username="username", password="password", host="host", port="port", database="database", ), echo_pool=True, ) print("connecting with engine " + str(engine)) connection = engine.connect()</pre></div><div id="f758"><pre>query = "select * from table"</pre></div><div id="001b"><pre>df = pd.read_sql_query(query, connection)</pre></div>As you can see above, the SQL query is written in a string. Even if you put the query string in a separate cell in a Jupyter Notebook and you have SQL code formatted installed, it won’t format the query, because it is still a Python string.Is there a better way?<h1 id="1bf9">Meet ipython-sql</h1><a href="http://ipython-sql">ipython-sql</a> enables us to run SQL queries directly from a Jupyter Notebook. No need to write multiple lines of code to connect to the database or wrap the query in a string. ipython-sql makes querying a database from Jupyter Notebook “cleaner”.<figure id="347e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*jfyBjjpRnqyYut_JK5k8hw.gif"><figcaption>Querying an SQLite database with ipython-sql</figcaption></figure>To install ipython-sql simply run the following command in a Jupyter Notebook:<div id="a35e"><pre>!pip install ipython-sql</pre></div>Then load the SQL module:<div id="1f96"><pre>%load_ext sql</pre></div>We need a connection string to connect to the database. For SQLite, it is as simple as:<div id="a772"><pre>%sql sqlite://</pre></div>If you’ve used <a href="http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls">SQLAlchemy</a> before to connect to a database, you can use the code below to obtain the connection string:<div id="360e"><pre>print("connecting with engine " + str(engine))</pre></div>Note, each cell needs to be annotated with %%sql. This tells the Jupyter Notebook that the code in the cell is SQL.Let’s add sample data to the SQLite database:<div id="fde0"><pre>%%sql CREATE TABLE sales ( key varchar(6), ts timestamp, product integer, completed boolean, price floa

Options

t ); INSERT INTO sales VALUES ('sale_1', '2019-11-08 00:00', 0, TRUE, 1.1), ('sale_2', '2019-11-08 01:00', 0, FALSE, 1.2), ('sale_3', '2019-11-08 01:00', 0, TRUE, 1.3), ('sale_4', '2019-11-08 01:00', 1, FALSE, 1.4), ('sale_5', '2019-11-08 02:00', 1, TRUE, 1.5), ('sale_6', '2019-11-08 02:00', 1, TRUE, 1.5);</pre></div><figure id="66f0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*y4N37iNW7JvdYgHXCVK1nQ.png"><figcaption></figcaption></figure>Now, let’s query the database’<div id="16e1"><pre>%%sql</pre></div><div id="ef70"><pre>select * from sales;</pre></div><figure id="2e27"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Z_Bjgt6JjJ7AeRdBq65MzQ.png"><figcaption></figcaption></figure>Works like a charm!We only ran the query, but results aren’t assigned to any variable, which is not much of a use. Luckily, Jupyter Notebook enables us to set the previous output of a cell to a variable:<div id="8c41"><pre>result = _</pre></div>Let’s check the type of result variable:<div id="17ec"><pre>type(result)</pre></div><div id="ada6"><pre># sql.run.ResultSet</pre></div>Now, let’s convert the ResultSet directly to a pandas DataFrame with:<div id="3304"><pre>df = result.DataFrame()</pre></div><figure id="15c6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*HSGopE8OeUmMJOuJ_oqN9A.png"><figcaption></figcaption></figure>This enables us to interactively explore the data.<figure id="9e8c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*CdRo44aaJepnWxwyajZmAw.gif"><figcaption></figcaption></figure><h1 id="ab39">Before you go</h1>Follow me on <a href="https://twitter.com/romanorac">Twitter</a>, where I regularly <a href="https://twitter.com/romanorac/status/1328952374447267843">tweet</a> about Data Science and Machine Learning.<figure id="aba9"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/0*69rP1pwjJi9mLSFE"><figcaption>Photo by <a href="https://unsplash.com/@cmhedger?utm_source=medium&utm_medium=referral">Courtney Hedger</a> on <a href="https://unsplash.com/?utm_source=medium&utm_medium=referral">Unsplash</a></figcaption></figure></article></body>

How To run SQL queries from a Jupyter Notebook

SQL IDEs aren’t capable of visualizing the data. Did you know you can run and visualize the results of a query with Jupyter Notebooks?

If you are working in the field of Data Science chances are you are using SQL to extract and preprocess the data. I’ve been using JetBrains DataGrip as my main SQL editor for a while now. While DataGrip does a decent job, it doesn't offer a way to visualize the data. Is there a better way?

Here are few links that might interest you:

- Complete your Python analyses 10x faster with Mito [Product]

- Free skill tests for Data Scientists & ML Engineers [Test]

- All New Self-Driving Car Engineer Nanodegree [Course]

Would you like to read more such articles? If so, you can support me by clicking on any links above. Some of them are affiliate links, but you don’t need to buy anything.

IDEs for SQL

Integrated Development Environments for SQL usually don’t have the capability to visualize the results of a query. IMO that’s a good thing because each IDE would have its own proprietary way to visualize the data, which we would need to learn.

My workflow to visualize the data was: develop and execute the query in the SQL IDE, export the data to CSV and visualize it in a Jupyter Notebook.

Running queries in a Jupyter Notebook

What to do when the query results are too big to be processed in a single query? One solution is to run the query on smaller time intervals — instead of a whole year, you can run the query by months.

Running a query by months requires 12 query runs for one year, which is not feasible to do by hand — for loops come in handy here.

It is always good to have multiple tools at your disposal

When faced with such problems, I’ve been using SQLAlchemy to connect to the database and extract the data to a pandas DataFrame, like in the example below:

import sqlalchemy
import pandas

engine = sqlalchemy.create_engine(
    sqlalchemy.engine.url.URL(
        drivername="postgresql",
        username="username",
        password="password",
        host="host",
        port="port",
        database="database",
    ),
    echo_pool=True,
)
print("connecting with engine " + str(engine))
connection = engine.connect()

query = "select * from table"

df = pd.read_sql_query(query, connection)

As you can see above, the SQL query is written in a string. Even if you put the query string in a separate cell in a Jupyter Notebook and you have SQL code formatted installed, it won’t format the query, because it is still a Python string.

Is there a better way?

Meet ipython-sql

ipython-sql enables us to run SQL queries directly from a Jupyter Notebook. No need to write multiple lines of code to connect to the database or wrap the query in a string. ipython-sql makes querying a database from Jupyter Notebook “cleaner”.

Querying an SQLite database with ipython-sql

To install ipython-sql simply run the following command in a Jupyter Notebook:

!pip install ipython-sql

Then load the SQL module:

%load_ext sql

We need a connection string to connect to the database. For SQLite, it is as simple as:

%sql sqlite://

If you’ve used SQLAlchemy before to connect to a database, you can use the code below to obtain the connection string:

print("connecting with engine " + str(engine))

Note, each cell needs to be annotated with %%sql. This tells the Jupyter Notebook that the code in the cell is SQL.

Let’s add sample data to the SQLite database:

%%sql
CREATE TABLE sales
(
    key       varchar(6),
    ts        timestamp,
    product   integer,
    completed boolean,
    price     float
);
INSERT INTO sales
VALUES ('sale_1', '2019-11-08 00:00', 0, TRUE, 1.1),
       ('sale_2', '2019-11-08 01:00', 0, FALSE, 1.2),
       ('sale_3', '2019-11-08 01:00', 0, TRUE, 1.3),
       ('sale_4', '2019-11-08 01:00', 1, FALSE, 1.4),
       ('sale_5', '2019-11-08 02:00', 1, TRUE, 1.5),
       ('sale_6', '2019-11-08 02:00', 1, TRUE, 1.5);

Now, let’s query the database’

%%sql

select * from sales;

Works like a charm!

We only ran the query, but results aren’t assigned to any variable, which is not much of a use. Luckily, Jupyter Notebook enables us to set the previous output of a cell to a variable:

result = _

Let’s check the type of result variable:

type(result)

# sql.run.ResultSet

Now, let’s convert the ResultSet directly to a pandas DataFrame with:

df = result.DataFrame()

This enables us to interactively explore the data.

Before you go

Follow me on Twitter, where I regularly tweet about Data Science and Machine Learning.