avatarRoman Orac

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

3759

Abstract

e>engine = sqlalchemy.create_engine( sqlalchemy.engine.url.URL( <span class="hljs-attribute">drivername</span>=<span class="hljs-string">"postgresql"</span>, <span class="hljs-attribute">username</span>=<span class="hljs-string">"username"</span>, <span class="hljs-attribute">password</span>=<span class="hljs-string">"password"</span>, <span class="hljs-attribute">host</span>=<span class="hljs-string">"host"</span>, <span class="hljs-attribute">port</span>=<span class="hljs-string">"port"</span>, <span class="hljs-attribute">database</span>=<span class="hljs-string">"database"</span>, ), <span class="hljs-attribute">echo_pool</span>=<span class="hljs-literal">True</span>, ) <span class="hljs-built_in">print</span>(<span class="hljs-string">"connecting with engine "</span> + str(engine))<span class="hljs-built_in"> connection </span>= engine.connect()</pre></div><div id="f758"><pre><span class="hljs-attribute">query</span> <span class="hljs-operator">=</span> <span class="hljs-string">"select * from table"</span></pre></div><div id="001b"><pre><span class="hljs-attr">df</span> = pd.read_sql_query(query, connection)</pre></div><p id="e81c">As you can see above, the SQL query is written in a string. Even if you put the query string in a separate cell in a Jupyter Notebook and you have SQL code formatted installed, it won’t format the query, because it is still a Python string.</p><p id="c4ab">Is there a better way?</p><h1 id="1bf9">Meet ipython-sql</h1><p id="9795"><a href="http://ipython-sql">ipython-sql</a> enables us to run SQL queries directly from a Jupyter Notebook. No need to write multiple lines of code to connect to the database or wrap the query in a string. ipython-sql makes querying a database from Jupyter Notebook “cleaner”.</p><figure id="347e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*jfyBjjpRnqyYut_JK5k8hw.gif"><figcaption>Querying an SQLite database with ipython-sql</figcaption></figure><p id="2c19">To install ipython-sql simply run the following command in a Jupyter Notebook:</p><div id="a35e"><pre>!pip <span class="hljs-keyword">install</span> ipython-sql</pre></div><p id="ed73">Then load the SQL module:</p><div id="1f96"><pre><span class="hljs-tag">%<span class="hljs-selector-tag">load_ext</span></span> sql</pre></div><p id="dbe0">We need a connection string to connect to the database. For SQLite, it is as simple as:</p><div id="a772"><pre><span class="hljs-built_in">%sql</span> sqlite:<span class="hljs-comment">//</span></pre></div><p id="e3c9">If you’ve used <a href="http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls">SQLAlchemy</a> before to connect to a database, you can use the code below to obtain the connection string:</p><div id="360e"><pre><span class="hljs-function"><span class="hljs-title">print</span>(<span class="hljs-string">"connecting with engine "</span> + <span class="hljs-title">str</span>(<span class="hljs-variable">engine</span>))</span></pre></div><p id="bf5b" type="7">Note, each cell needs to be annotated with %%sql. This tells the Jupyter Notebook that the code in the cell is SQL.</p><p id="4431">Let’s add sample data to the SQLite database:</p><div id="fde0"><pre><span class="hljs-operator">%</span><span class="hljs-operator">%</span><span class="hljs-keyword">sql</span> <span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> sales ( key <span class="hljs-type">varchar</span>(<span class="hljs-number">6</span>), ts <span class="hljs-type">timestamp</span>, product <span class="hljs-type">integer</span>, completed <span class="hljs-type">boolean</span>, price <span class="hljs-type">floa

Options

t</span> ); <span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> sales <span class="hljs-keyword">VALUES</span> (<span class="hljs-string">'sale_1'</span>, <span class="hljs-string">'2019-11-08 00:00'</span>, <span class="hljs-number">0</span>, <span class="hljs-literal">TRUE</span>, <span class="hljs-number">1.1</span>), (<span class="hljs-string">'sale_2'</span>, <span class="hljs-string">'2019-11-08 01:00'</span>, <span class="hljs-number">0</span>, <span class="hljs-literal">FALSE</span>, <span class="hljs-number">1.2</span>), (<span class="hljs-string">'sale_3'</span>, <span class="hljs-string">'2019-11-08 01:00'</span>, <span class="hljs-number">0</span>, <span class="hljs-literal">TRUE</span>, <span class="hljs-number">1.3</span>), (<span class="hljs-string">'sale_4'</span>, <span class="hljs-string">'2019-11-08 01:00'</span>, <span class="hljs-number">1</span>, <span class="hljs-literal">FALSE</span>, <span class="hljs-number">1.4</span>), (<span class="hljs-string">'sale_5'</span>, <span class="hljs-string">'2019-11-08 02:00'</span>, <span class="hljs-number">1</span>, <span class="hljs-literal">TRUE</span>, <span class="hljs-number">1.5</span>), (<span class="hljs-string">'sale_6'</span>, <span class="hljs-string">'2019-11-08 02:00'</span>, <span class="hljs-number">1</span>, <span class="hljs-literal">TRUE</span>, <span class="hljs-number">1.5</span>);</pre></div><figure id="66f0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*y4N37iNW7JvdYgHXCVK1nQ.png"><figcaption></figcaption></figure><p id="8c24">Now, let’s query the database’</p><div id="16e1"><pre><span class="hljs-meta">%</span><span class="hljs-meta">%</span>sql</pre></div><div id="ef70"><pre><span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> sales;</pre></div><figure id="2e27"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Z_Bjgt6JjJ7AeRdBq65MzQ.png"><figcaption></figcaption></figure><p id="c11c">Works like a charm!</p><p id="a61b">We only ran the query, but results aren’t assigned to any variable, which is not much of a use. Luckily, Jupyter Notebook enables us to set the previous output of a cell to a variable:</p><div id="8c41"><pre><span class="hljs-attribute">result</span> <span class="hljs-operator">=</span> _</pre></div><p id="c8c2">Let’s check the type of result variable:</p><div id="17ec"><pre><span class="hljs-keyword">type</span>(result)</pre></div><div id="ada6"><pre><span class="hljs-meta"># sql.run.ResultSet</span></pre></div><p id="25ff">Now, let’s convert the ResultSet directly to a pandas DataFrame with:</p><div id="3304"><pre><span class="hljs-attribute">df</span> <span class="hljs-operator">=</span> result.DataFrame()</pre></div><figure id="15c6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*HSGopE8OeUmMJOuJ_oqN9A.png"><figcaption></figcaption></figure><p id="b375">This enables us to interactively explore the data.</p><figure id="9e8c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*CdRo44aaJepnWxwyajZmAw.gif"><figcaption></figcaption></figure><h1 id="ab39">Before you go</h1><p id="dea3">Follow me on <a href="https://twitter.com/romanorac">Twitter</a>, where I regularly <a href="https://twitter.com/romanorac/status/1328952374447267843">tweet</a> about Data Science and Machine Learning.</p><figure id="aba9"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/0*69rP1pwjJi9mLSFE"><figcaption>Photo by <a href="https://unsplash.com/@cmhedger?utm_source=medium&amp;utm_medium=referral">Courtney Hedger</a> on <a href="https://unsplash.com/?utm_source=medium&amp;utm_medium=referral">Unsplash</a></figcaption></figure></article></body>

How To run SQL queries from a Jupyter Notebook

SQL IDEs aren’t capable of visualizing the data. Did you know you can run and visualize the results of a query with Jupyter Notebooks?

Photo by Ben White on Unsplash

If you are working in the field of Data Science chances are you are using SQL to extract and preprocess the data. I’ve been using JetBrains DataGrip as my main SQL editor for a while now. While DataGrip does a decent job, it doesn't offer a way to visualize the data. Is there a better way?

Here are few links that might interest you:

- Complete your Python analyses 10x faster with Mito [Product]
- Free skill tests for Data Scientists & ML Engineers [Test]
- All New Self-Driving Car Engineer Nanodegree [Course]

Would you like to read more such articles? If so, you can support me by clicking on any links above. Some of them are affiliate links, but you don’t need to buy anything.

IDEs for SQL

Gif from Giphy

Integrated Development Environments for SQL usually don’t have the capability to visualize the results of a query. IMO that’s a good thing because each IDE would have its own proprietary way to visualize the data, which we would need to learn.

My workflow to visualize the data was: develop and execute the query in the SQL IDE, export the data to CSV and visualize it in a Jupyter Notebook.

Running queries in a Jupyter Notebook

Gif from Giphy

What to do when the query results are too big to be processed in a single query? One solution is to run the query on smaller time intervals — instead of a whole year, you can run the query by months.

Running a query by months requires 12 query runs for one year, which is not feasible to do by hand — for loops come in handy here.

It is always good to have multiple tools at your disposal

When faced with such problems, I’ve been using SQLAlchemy to connect to the database and extract the data to a pandas DataFrame, like in the example below:

import sqlalchemy
import pandas
engine = sqlalchemy.create_engine(
    sqlalchemy.engine.url.URL(
        drivername="postgresql",
        username="username",
        password="password",
        host="host",
        port="port",
        database="database",
    ),
    echo_pool=True,
)
print("connecting with engine " + str(engine))
connection = engine.connect()
query = "select * from table"
df = pd.read_sql_query(query, connection)

As you can see above, the SQL query is written in a string. Even if you put the query string in a separate cell in a Jupyter Notebook and you have SQL code formatted installed, it won’t format the query, because it is still a Python string.

Is there a better way?

Meet ipython-sql

ipython-sql enables us to run SQL queries directly from a Jupyter Notebook. No need to write multiple lines of code to connect to the database or wrap the query in a string. ipython-sql makes querying a database from Jupyter Notebook “cleaner”.

Querying an SQLite database with ipython-sql

To install ipython-sql simply run the following command in a Jupyter Notebook:

!pip install ipython-sql

Then load the SQL module:

%load_ext sql

We need a connection string to connect to the database. For SQLite, it is as simple as:

%sql sqlite://

If you’ve used SQLAlchemy before to connect to a database, you can use the code below to obtain the connection string:

print("connecting with engine " + str(engine))

Note, each cell needs to be annotated with %%sql. This tells the Jupyter Notebook that the code in the cell is SQL.

Let’s add sample data to the SQLite database:

%%sql
CREATE TABLE sales
(
    key       varchar(6),
    ts        timestamp,
    product   integer,
    completed boolean,
    price     float
);
INSERT INTO sales
VALUES ('sale_1', '2019-11-08 00:00', 0, TRUE, 1.1),
       ('sale_2', '2019-11-08 01:00', 0, FALSE, 1.2),
       ('sale_3', '2019-11-08 01:00', 0, TRUE, 1.3),
       ('sale_4', '2019-11-08 01:00', 1, FALSE, 1.4),
       ('sale_5', '2019-11-08 02:00', 1, TRUE, 1.5),
       ('sale_6', '2019-11-08 02:00', 1, TRUE, 1.5);

Now, let’s query the database’

%%sql
select * from sales;

Works like a charm!

We only ran the query, but results aren’t assigned to any variable, which is not much of a use. Luckily, Jupyter Notebook enables us to set the previous output of a cell to a variable:

result = _

Let’s check the type of result variable:

type(result)
# sql.run.ResultSet

Now, let’s convert the ResultSet directly to a pandas DataFrame with:

df = result.DataFrame()

This enables us to interactively explore the data.

Before you go

Follow me on Twitter, where I regularly tweet about Data Science and Machine Learning.

Photo by Courtney Hedger on Unsplash
Data Science
Sql
Jupyter Notebook
Database
Data Visualization
Recommended from ReadMedium