Ben Hui


Multi-Dimension Visualization in Python Part I

Data visualization makes it much easier to explore and review data than reading a table or a long report. Most of the time, visualization work is limited to 2-D. In this article, I introduce how to plot and explore multi-dimensional data (1-D to 6-D).

First, import the libraries we need:

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib as mpl
import numpy as np
import seaborn as sns
%matplotlib inline

Here we use the Wine Quality dataset as an example: https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/
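The snippets that follow work on two DataFrames, red_wine and white_wine, whose loading step is not shown. A minimal sketch of that step, assuming the files at the URL above are named winequality-red.csv and winequality-white.csv and are semicolon-separated; the helper name load_wine and the local file paths are illustrative, not part of the original:

```python
import io
import pandas as pd

def load_wine(source):
    """Read one semicolon-separated wine-quality CSV (path, URL, or file-like object)."""
    return pd.read_csv(source, sep=';')

# Tiny in-memory stand-in with made-up rows, so the sketch runs without a download.
sample = io.StringIO(
    '"fixed acidity";"alcohol";"quality"\n'
    '7.4;9.4;5\n'
    '7.8;9.8;6\n'
)
demo = load_wine(sample)

# With the real files saved locally (assumed file names):
# red_wine = load_wine('winequality-red.csv')
# white_wine = load_wine('winequality-white.csv')
```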

Before exploring the data, we do some manipulation:

# store wine type as an attribute
red_wine['wine_type'] = 'red'   
white_wine['wine_type'] = 'white'

# bucket wine quality scores into qualitative quality labels
red_wine['quality_label'] = red_wine['quality'].apply(lambda value: 'low' 
                                                          if value <= 5 else 'medium' 
                                                              if value <= 7 else 'high')
red_wine['quality_label'] = pd.Categorical(red_wine['quality_label'], 
                                           categories=['low', 'medium', 'high'])
white_wine['quality_label'] = white_wine['quality'].apply(lambda value: 'low' 
                                                              if value <= 5 else 'medium' 
                                                                  if value <= 7 else 'high')
white_wine['quality_label'] = pd.Categorical(white_wine['quality_label'], 
                                             categories=['low', 'medium', 'high'])

# union red and white wine datasets
wines = pd.concat([red_wine, white_wine])

# re-shuffle records just to randomize data points
wines = wines.sample(frac=1, random_state=42).reset_index(drop=True)

We add a “wine_type” attribute and a “quality_label” feature to each dataset, then concatenate the red and white wines into a single shuffled dataset.
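The same bucketing can be written more compactly with pd.cut. This is only a sketch of an alternative, not the method used above; the helper name label_quality and the example scores are made up, and the thresholds mirror the lambda (5 or below is low, 6 to 7 is medium, 8 and above is high):

```python
import pandas as pd

def label_quality(scores):
    """Bucket quality scores: <= 5 low, 6-7 medium, >= 8 high."""
    # Bins are right-inclusive: (-inf, 5], (5, 7], (7, inf)
    return pd.cut(scores, bins=[-float('inf'), 5, 7, float('inf')],
                  labels=['low', 'medium', 'high'])

scores = pd.Series([3, 5, 6, 7, 8, 9])  # made-up example scores
labels = label_quality(scores)
```

Unlike the pd.Categorical call above, pd.cut returns an ordered categorical by default, which keeps low, medium, high in that order on plot axes.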

1. 1-D visualization

To display the distributions of all the attributes at once, a histogram is usually the best choice:

wines.hist(bins=15, color='steelblue', edgecolor='black', linewidth=1.0,
           xlabelsize=8, ylabelsize=8, grid=False)    
plt.tight_layout(rect=(0, 0, 1.2, 1.2)) 

This is a good way to explore all the attributes at once. Furthermore, for a single continuous attribute, we can use a histogram or a density plot:

# Histogram
fig = plt.figure(figsize = (6,4))
title = fig.suptitle("Sulphates Content in Wine", fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)

ax = fig.add_subplot(1,1, 1)
ax.set_xlabel("Sulphates")
ax.set_ylabel("Frequency") 
ax.text(1.2, 800, r'$\mu$='+str(round(wines['sulphates'].mean(),2)), 
         fontsize=12)
freq, bins, patches = ax.hist(wines['sulphates'], color='steelblue', bins=15,
                                    edgecolor='black', linewidth=1)


# Density Plot
fig = plt.figure(figsize = (6, 4))
title = fig.suptitle("Sulphates Content in Wine", fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)

ax1 = fig.add_subplot(1,1, 1)
ax1.set_xlabel("Sulphates")
ax1.set_ylabel("Density") 
# fill=True replaces shade=True in recent seaborn versions (fill was added in 0.11)
sns.kdeplot(wines['sulphates'], ax=ax1, fill=True, color='steelblue')

We can see that the distribution of sulphates is right-skewed.
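The skew can also be checked numerically instead of by eye. A small sketch on synthetic data (a log-normal sample standing in for wines['sulphates'], so the numbers are not from the wine dataset): a positive sample skewness and a mean above the median both point to a right skew.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for wines['sulphates']: a log-normal sample has a long right tail.
values = pd.Series(rng.lognormal(mean=-0.7, sigma=0.3, size=5000))

skewness = values.skew()                      # sample skewness; > 0 suggests a right skew
mean, median = values.mean(), values.median() # mean is pulled toward the long tail
```

On the real data, the same two lines would be applied to wines['sulphates'].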

For a discrete attribute, a histogram is usually the best choice. Sometimes a pie chart can be used instead:

fig = plt.figure(figsize = (6, 4))
title = fig.suptitle("Wine Quality Frequency", fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)

ax1 = fig.add_subplot(1,1, 1)
ax1.set_xlabel("Quality")
ax1.set_ylabel("Frequency") 
ax1.pie(wines.groupby('quality').count()['quality_label'],labels=wines.groupby('quality').count().index)

2. 2-D visualization

2-D visualization is the most commonly used method.

We usually use a heatmap to display the correlations between attributes:

# Correlation Matrix Heatmap
f, ax = plt.subplots(figsize=(10, 6))
# numeric_only=True (pandas >= 1.5) skips the wine_type and quality_label columns
corr = wines.corr(numeric_only=True)
hm = sns.heatmap(round(corr,2), annot=True, ax=ax, cmap="coolwarm",fmt='.2f',
                 linewidths=.05)
f.subplots_adjust(top=0.93)
t= f.suptitle('Wine Attributes Correlation Heatmap', fontsize=14)
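A heatmap shows every pair at once, but it can help to list the strongest pairs explicitly. The following is a sketch under assumptions: top_correlations is a hypothetical helper, and the demo frame is synthetic rather than the wine data.

```python
import numpy as np
import pandas as pd

def top_correlations(df, n=3):
    """Return the n attribute pairs with the largest absolute Pearson correlation."""
    corr = df.select_dtypes('number').corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Synthetic data: 'b' is almost a multiple of 'a', 'c' is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    'a': a,
    'b': 2 * a + rng.normal(scale=0.1, size=200),
    'c': rng.normal(size=200),
    'label': ['x'] * 200,   # non-numeric column, ignored by the helper
})
top = top_correlations(demo, n=2)
```

Calling top_correlations(wines) would give the pairs behind the darkest cells of the heatmap.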

For a specific subset of attributes, we can use a pair plot to see their pairwise relationships:

specific_atts=['density','residual sugar','total sulfur dioxide','fixed acidity']

sns.pairplot(wines[specific_atts],diag_kind='kde')

Alternatively, we can use parallel coordinates to show how these attributes differ between categories:

specific_atts=specific_atts+['wine_type']

pd.plotting.parallel_coordinates(wines[specific_atts],'wine_type')
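Parallel coordinates draw every attribute on one shared vertical axis, so attributes with large values (total sulfur dioxide runs into the hundreds, while density stays close to 1) can flatten the others. One common remedy, sketched here as an assumption rather than something the snippet above does, is to rescale each attribute to [0, 1] first. The helper min_max_scale and the demo values are made up for illustration.

```python
import pandas as pd

def min_max_scale(df, columns):
    """Return a copy of df with the given numeric columns rescaled to [0, 1]."""
    scaled = df.copy()
    for col in columns:
        lo, hi = df[col].min(), df[col].max()
        # Guard against constant columns to avoid dividing by zero.
        scaled[col] = (df[col] - lo) / (hi - lo) if hi > lo else 0.0
    return scaled

# Made-up rows, not taken from the wine dataset.
demo = pd.DataFrame({
    'density': [0.99, 1.00, 0.995],
    'total sulfur dioxide': [30.0, 150.0, 90.0],
    'wine_type': ['red', 'white', 'white'],
})
scaled = min_max_scale(demo, ['density', 'total sulfur dioxide'])
# pd.plotting.parallel_coordinates(scaled, 'wine_type') would then use one common scale.
```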

For two continuous attributes, a scatter plot and a joint plot are the best options to see both the relationship and the distributions.

# Scatter Plot
plt.scatter(wines['sulphates'], wines['alcohol'],
            alpha=0.4, edgecolors='w')

plt.xlabel('Sulphates')
plt.ylabel('Alcohol')
plt.title('Wine Sulphates - Alcohol Content',y=1.05)


# Joint Plot
# height= replaces the size= argument, which was renamed in seaborn 0.9
jp = sns.jointplot(x='sulphates', y='alcohol', data=wines,
                   kind='reg', space=0, height=5, ratio=4)
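The kind='reg' option overlays a linear regression line on the scatter plot. The slope of such a line and Pearson's r can be computed directly with NumPy; this sketch uses made-up points and a hypothetical helper name, linear_fit, and is not a reproduction of what seaborn does internally.

```python
import numpy as np

def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Made-up points lying exactly on y = 2x + 1.
slope, intercept, r = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```

For the wine data, the call would be linear_fit(wines['sulphates'], wines['alcohol']).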

For two discrete attributes, we need another type of plot, the bar chart:

# Multi-bar Plot
cp = sns.countplot(x="quality", hue="wine_type", data=wines, 
                   palette={"red": "#FF9999", "white": "#FFE888"})
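The bar heights in a count plot like this are simply a cross-tabulation of the two attributes. A short sketch with made-up rows (not the wine data) shows the table behind the bars:

```python
import pandas as pd

# Made-up rows standing in for wines[['quality', 'wine_type']].
demo = pd.DataFrame({
    'quality':   [5, 5, 6, 6, 6, 7],
    'wine_type': ['red', 'white', 'red', 'white', 'white', 'white'],
})
# Rows are quality scores, columns are wine types, cells are counts.
counts = pd.crosstab(demo['quality'], demo['wine_type'])
```

On the real data, pd.crosstab(wines['quality'], wines['wine_type']) would give the exact numbers behind each pair of bars.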

For mixed attributes (one continuous, one categorical), overlaid histograms are easy to use:

# Using multiple Histograms
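# height= replaces size= (renamed in seaborn 0.9); histplot replaces the deprecated distplot
g = sns.FacetGrid(wines, hue='wine_type', palette={"red": "r", "white": "y"}, height=4)
# no ax= argument here: FacetGrid.map draws on the grid's own axes
g.map(sns.histplot, 'sulphates', bins=15).add_legend(title='Wine Type')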
g.fig.suptitle("Sulphates Content in Wine", fontsize=14)
g.set_axis_labels('Sulphates','Frequency')
g.fig.subplots_adjust(top=0.85, wspace=0.3)

Furthermore, box plots and violin plots are used to see the outliers and the kernel density estimates, respectively:

# Box Plots
f, (ax) = plt.subplots(1, 1, figsize=(12, 4))
f.suptitle('Wine Quality - Alcohol Content', fontsize=14)

sns.boxplot(x="quality", y="alcohol", data=wines,  ax=ax)
ax.set_xlabel("Wine Quality",size = 12,alpha=0.8)
ax.set_ylabel("Wine Alcohol %",size = 12,alpha=0.8)

# Violin Plots
f, (ax) = plt.subplots(1, 1, figsize=(12, 4))
f.suptitle('Wine Quality - Sulphates Content', fontsize=14)

sns.violinplot(x="quality", y="sulphates", data=wines,  ax=ax)
ax.set_xlabel("Wine Quality",size = 12,alpha=0.8)
ax.set_ylabel("Wine Sulphates",size = 12,alpha=0.8)
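The points a box plot draws beyond its whiskers follow, by default, the 1.5 × IQR rule. A minimal sketch of that rule, with a hypothetical helper iqr_outliers and made-up alcohol values:

```python
import pandas as pd

def iqr_outliers(values, whis=1.5):
    """Return the values lying more than whis * IQR outside the quartiles."""
    s = pd.Series(values, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return s[(s < lower) | (s > upper)]

# Made-up values: quartiles are 10 and 11.5, so the fences are 7.75 and 13.75.
outliers = iqr_outliers([9, 10, 10, 11, 11, 12, 20])
```

Applied per quality group, for example with wines.groupby('quality')['alcohol'].apply(iqr_outliers), this would list candidates for the isolated points in the box plot.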

3. 3-D visualization

The third dimension is usually a categorical attribute, displayed with different colors, for example through the “hue” parameter in Seaborn:

# Scatter Plot with Hue for visualizing data in 3-D
cols = ['density', 'residual sugar', 'total sulfur dioxide', 'fixed acidity', 'wine_type']
# height= replaces size=, which was renamed in seaborn 0.9
pp = sns.pairplot(wines[cols], hue='wine_type', height=1.8, aspect=1.8, 
                  palette={"red": "#FF9999", "white": "#FFE888"},
                  plot_kws=dict(edgecolor="black", linewidth=0.5))
fig = pp.fig 
fig.subplots_adjust(top=0.93, wspace=0.3)
t = fig.suptitle('Wine Attributes Pairwise Plots', fontsize=14)

For three continuous attributes, we can use a spatial 3-D scatter plot with x, y, and z axes:

# Visualizing 3-D numeric data with Scatter Plots
# length, breadth and depth
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

xs = wines['residual sugar']
ys = wines['fixed acidity']
zs = wines['alcohol']
ax.scatter(xs, ys, zs, s=50, alpha=0.6, edgecolors='w')

ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Fixed Acidity')
ax.set_zlabel('Alcohol')

Sometimes the x-y-z style is not the easiest to read. Instead, we can use marker size as the third dimension in a bubble chart:

# Visualizing 3-D numeric data with a bubble chart
# length, breadth and size
plt.scatter(wines['fixed acidity'], wines['alcohol'], s=wines['residual sugar']*25, 
            alpha=0.4, edgecolors='w')

plt.xlabel('Fixed Acidity')
plt.ylabel('Alcohol')
plt.title('Wine Alcohol Content - Fixed Acidity - Residual Sugar',y=1.05)
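In plt.scatter, the s argument is the marker area in points squared, so a fixed multiplier such as 25 may need tuning per attribute. One possible alternative, sketched here with a hypothetical helper bubble_sizes and made-up values, is to map the attribute linearly onto a chosen range of areas:

```python
import numpy as np

def bubble_sizes(values, min_area=10.0, max_area=400.0):
    """Map values linearly onto marker areas (points^2) for plt.scatter's s argument."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        # All values equal: use one mid-sized marker for every point.
        return np.full_like(v, (min_area + max_area) / 2)
    return min_area + (v - v.min()) / span * (max_area - min_area)

sizes = bubble_sizes([1.0, 3.0, 5.0])   # made-up residual sugar values
# plt.scatter(x, y, s=sizes, alpha=0.4, edgecolors='w')
```

The bounds min_area and max_area are arbitrary defaults chosen for illustration.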

For three discrete attributes, subplots (facets) are the easiest way:

# Visualizing 3-D categorical data using bar plots
# leveraging the concepts of hue and facets
# catplot replaces factorplot, which was renamed in seaborn 0.9
fc = sns.catplot(x="quality", hue="wine_type", col="quality_label", 
                 data=wines, kind="count",
                 palette={"red": "#FF9999", "white": "#FFE888"})

For mixed attributes, the color of the category becomes the third dimension:

# Visualizing 3-D mix data using scatter plots
# leveraging the concepts of hue for categorical dimension
# height= replaces size=, which was renamed in seaborn 0.9
jp = sns.pairplot(wines, x_vars=["sulphates"], y_vars=["alcohol"], height=4.5,
                  hue="wine_type", palette={"red": "#FF9999", "white": "#FFE888"},
                  plot_kws=dict(edgecolor="k", linewidth=0.5))

# we can also view relationships/correlations as needed
lp = sns.lmplot(x='sulphates', y='alcohol', hue='wine_type', 
                palette={"red": "#FF9999", "white": "#FFE888"},
                data=wines, fit_reg=True, legend=True,
                scatter_kws=dict(edgecolor="k", linewidth=0.5)) 

Alternatively, kernel density estimates (KDE) can be used:

# Visualizing 3-D mix data using kernel density plots
# leveraging the concepts of hue for categorical dimension
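# x= and y= are passed by keyword, as recent seaborn versions require; fill=True replaces
# shade=True, and the default thresh leaves the lowest density level unfilled
ax = sns.kdeplot(x=white_wine['sulphates'], y=white_wine['alcohol'],
                 cmap="YlOrBr", fill=True)
ax = sns.kdeplot(x=red_wine['sulphates'], y=red_wine['alcohol'],
                 cmap="Reds", fill=True)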

From the above, we can see that red wines tend to have higher sulphates than white wines.
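A visual impression like this can be backed up with a per-group summary. The sketch below uses made-up values, chosen only to illustrate the call, so its numbers say nothing about the real dataset:

```python
import pandas as pd

# Made-up rows standing in for wines[['wine_type', 'sulphates']].
demo = pd.DataFrame({
    'wine_type': ['red', 'red', 'red', 'white', 'white', 'white'],
    'sulphates': [0.60, 0.70, 0.65, 0.45, 0.50, 0.55],
})
# One row per wine type, with the mean and median sulphates level.
summary = demo.groupby('wine_type')['sulphates'].agg(['mean', 'median'])
```

Running the same groupby on wines gives the actual means and medians to compare against the density plot.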

Box plots and violin plots can also show three dimensions, using hue and subplots:

# Visualizing 3-D mix data using violin plots
# leveraging the concepts of hue and axes for > 1 categorical dimensions
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
f.suptitle('Wine Type - Quality - Acidity', fontsize=14)

sns.violinplot(x="quality", y="volatile acidity",
               data=wines, inner="quart", linewidth=1.3,ax=ax1)
ax1.set_xlabel("Wine Quality",size = 12,alpha=0.8)
ax1.set_ylabel("Wine Volatile Acidity",size = 12,alpha=0.8)

sns.violinplot(x="quality", y="volatile acidity", hue="wine_type", 
               data=wines, split=True, inner="quart", linewidth=1.3,
               palette={"red": "#FF9999", "white": "white"}, ax=ax2)
ax2.set_xlabel("Wine Quality",size = 12,alpha=0.8)
ax2.set_ylabel("Wine Volatile Acidity",size = 12,alpha=0.8)
l = plt.legend(loc='upper right', title='Wine Type') 

# Visualizing 3-D mix data using box plots
# leveraging the concepts of hue and axes for > 1 categorical dimensions
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
f.suptitle('Wine Type - Quality - Alcohol Content', fontsize=14)

sns.boxplot(x="quality", y="alcohol", hue="wine_type",
               data=wines, palette={"red": "#FF9999", "white": "white"}, ax=ax1)
ax1.set_xlabel("Wine Quality",size = 12,alpha=0.8)
ax1.set_ylabel("Wine Alcohol %",size = 12,alpha=0.8)

sns.boxplot(x="quality_label", y="alcohol", hue="wine_type",
               data=wines, palette={"red": "#FF9999", "white": "white"}, ax=ax2)
ax2.set_xlabel("Wine Quality Class",size = 12,alpha=0.8)
ax2.set_ylabel("Wine Alcohol %",size = 12,alpha=0.8)
l = plt.legend(loc='best', title='Wine Type')   

To be continued…

Thank you for reading.
