Using Large Language Models Directly in Your Jupyter Notebooks
A quick guide to taking your data analytics to the next level using Mito AI
A while back, I stumbled across an awesome package, Mito, that completely transformed my data analytics and Jupyter notebook experience.
Mito is a spreadsheet interface for Jupyter that writes the Python code for your edits and helps you visualise, filter, and analyse your data quickly and effectively.
And let me tell you: it just got a whole lot better!
Mito recently introduced a new feature: Mito AI. Unless you’ve been living under a rock for the past few weeks, you know that the current buzz in the tech world is Large Language Models (LLMs), the technology behind chatbots such as ChatGPT and Bard. But this craze isn’t unfounded. In fact, LLMs have some really interesting use-cases.
Why Mito AI is a Game Changer
As data scientists, we love to question and to analyse. We want to answer any hunches we might have quickly, but at the same time we want to do it in a meaningful way that tells a story.
As a no-code tool, Mito already offered out-of-the-box functionality that gets this job done. But if you’re like me and actually enjoy the coding side of data analytics, you might feel that no-code tools take away some of your freedom to shape the analysis your own way.
You don’t want your data analytics process to be identical to everyone else's. You long for that edge.
Mito AI gives its users just that: text to code.
It’s still a no-code tool at heart, but it extends a hand to those who want to take control of their code.
In this article, we will go over how to get started with Mito AI and some of its cool tricks.
If you want to read more, you can find the details in the official Mito documentation.
Getting Started with Mito
If you want to install Mito, you can follow the guide in my previous article on Mito.
However, Mito has since introduced a Jupyter extension that is also handy. If you’ve never used Jupyter extensions, I suggest first reading the Jupyter documentation on notebook extensions.
To get this extension, first install the package:

```shell
pip install mitosheet
```

and then install and enable the notebook extension:

```shell
jupyter nbextension install --py --user mitosheet
jupyter nbextension enable --py --user mitosheet
```
When we launch our new notebook, we should see the ‘New Mitosheet’ option in our toolbar.
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*vSsyIB3J3ao1d3En8z4c3Q.png)
Playing Around with Mito AI
For illustration purposes, we will be using the familiar iris dataset. You can get a copy from many sources, including scikit-learn and seaborn.
The button we will focus on in this article is the AI icon.
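If you want to follow along outside the Mito UI, here is one way to get the dataset into a pandas DataFrame named `iris`. This is a sketch that assumes scikit-learn is installed: it uses scikit-learn's bundled copy, which needs no download, and renames the columns to the sepal_length / sepal_width / petal_length / petal_width / species layout that seaborn's copy uses and that the generated code in this article refers to. The helper name `load_iris_frame` is our own, not part of Mito or scikit-learn.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Return iris as a DataFrame with seaborn-style column names (hypothetical helper)."""
    bunch = load_iris(as_frame=True)
    iris = bunch.data.copy()  # the four measurement columns
    # scikit-learn names columns like 'sepal length (cm)'; switch to snake_case
    iris.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    # Map the integer targets (0, 1, 2) to the species names
    names = dict(enumerate(bunch.target_names.tolist()))
    iris["species"] = bunch.target.map(names)
    return iris


iris = load_iris_frame()
print(iris.head())
```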
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*P-vrTR60zthSF9UTomBdtQ.png)
The AI functionality opens three different tabs: one with quick-access example prompts, one for writing our own prompt, and one for Mito AI’s reply.
Let’s jump into some of its cool tricks, shall we?
Prompt 1
check how many duplicated rows we have in this dataset
Mito AI responds with:

```python
iris.duplicated().sum()
```
We can also execute the generated code.
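One detail worth knowing about this one-liner: by default, `duplicated()` marks every copy of a row after its first occurrence, so the sum counts the extra rows rather than every row involved in a duplication. The small made-up DataFrame below illustrates this, together with the standard pandas options `keep=False` and `drop_duplicates()`.

```python
import pandas as pd

# A tiny hypothetical frame: row 2 is an exact copy of row 0
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 1.4, 5.1],
    "species": ["setosa", "versicolor", "setosa", "virginica"],
})

# keep='first' (the default) flags only the later copies
print(df.duplicated().sum())            # → 1
# keep=False flags every row that has an identical twin
print(df.duplicated(keep=False).sum())  # → 2
# Remove the extra copies, keeping the first occurrence of each row
deduped = df.drop_duplicates()
print(len(deduped))                     # → 3
```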
Let’s try something more complex.
Prompt 2
generate some descriptive statistics per species
```python
iris.groupby('species').describe()
```
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ziXiD_9mtHQ6TPqPadv8Kw.png)
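`groupby(...).describe()` returns its columns as a two-level index of (feature, statistic) pairs, which is why the output is so wide. Here is a minimal sketch, on a hypothetical four-row frame, of slicing out one feature's statistics, or one statistic across all features, with plain pandas indexing and `DataFrame.xs`.

```python
import pandas as pd

# Hypothetical miniature of the iris columns, two rows per species
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor"],
    "petal_length": [1.4, 1.6, 4.5, 4.7],
    "petal_width": [0.2, 0.4, 1.5, 1.3],
})

stats = df.groupby("species").describe()
# The columns form a MultiIndex: level 0 is the feature, level 1 the statistic
print(stats.columns.nlevels)

# One feature, selected statistics
print(stats["petal_length"][["mean", "std"]])

# One statistic across every feature (level=1 is the statistic level)
means = stats.xs("mean", axis=1, level=1)
print(means)
```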
So far so good. But can we keep going?
Prompt 3
generate 3 clusters using kmeans
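```python
from sklearn.cluster import KMeans

# Fit k-means with 3 clusters on every column except the species label
kmeans = KMeans(n_clusters=3)
kmeans.fit(iris.drop('species', axis=1))
# Store each row's cluster assignment in a new column
iris['cluster'] = kmeans.labels_
```

This is amazing! The generated code is correct, and it took less than 5 seconds to generate.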
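Two additions are worth considering before relying on these clusters. KMeans starts from random centroids, so the labels can change between runs unless `random_state` is fixed, and the default of `n_init` has changed across scikit-learn versions, so setting it explicitly avoids a FutureWarning on some versions. The cluster ids are also arbitrary, so a crosstab against species is a quick way to see how they line up. This sketch assumes the seaborn-style iris columns used in this article (rebuilt from scikit-learn's copy), selects the features by name so that re-running it never feeds an existing cluster column back into the model, and uses `random_state=42` as an arbitrary choice.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Rebuild iris with seaborn-style column names (assumption: same layout as the article)
bunch = load_iris(as_frame=True)
iris = bunch.data.copy()
iris.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris["species"] = bunch.target.map(dict(enumerate(bunch.target_names.tolist())))

features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
# Fixed seed for reproducible labels; n_init set explicitly for version stability
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
iris["cluster"] = kmeans.fit_predict(iris[features])

# Rows: true species, columns: arbitrary cluster ids, cells: row counts
table = pd.crosstab(iris["species"], iris["cluster"])
print(table)
```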
Prompt 4
Let us try an open-ended prompt without any specific direction. We will ask it to generate the code to visualise the data per species, without specifying what to visualise or how. What we are hoping for is a pairplot of all the features, so we throw in a small hint by saying that we also want to compare.
visualise and compare the different properties per species
```python
import seaborn as sns
sns.pairplot(iris, hue='species')
```
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*MljagT9MdGisIEWLUYkoUA.png)
And there we go, it managed to pick the correct visualisation type and generate the correct code for it.
Prompt 5
Let us try chaining two instructions together in a single prompt.
scale our data to be in the range between 0 and 1 and convert the species columns to numeric
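```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Rescale every column except the last one (assumed to be 'species') to [0, 1]
scaler = MinMaxScaler()
iris.iloc[:, :-1] = scaler.fit_transform(iris.iloc[:, :-1])
# Replace each species name with an integer code (0, 1, 2)
iris['species'] = pd.factorize(iris['species'])[0]
```

Once again, the generated code is on point. Truly remarkable. One caveat, though: iloc[:, :-1] assumes that species is the last column. If the cluster column from Prompt 3 is still in the DataFrame, the selection picks up the text species column instead, and the scaler raises an error.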
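A more defensive version of Prompt 5's code selects the measurement columns by name, so the result no longer depends on column order. Min-max scaling maps each value x to (x - min) / (max - min) within its column, which is what `MinMaxScaler` computes with its default feature_range of (0, 1). This sketch assumes scikit-learn's copy of iris renamed to seaborn-style columns, and adds a hypothetical trailing `cluster` column to show that the by-name selection is unaffected.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Rebuild iris with seaborn-style column names (assumption: same layout as the article)
bunch = load_iris(as_frame=True)
iris = bunch.data.copy()
iris.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris["species"] = bunch.target.map(dict(enumerate(bunch.target_names.tolist())))
iris["cluster"] = 0  # hypothetical extra column appended by an earlier step

features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
# Scale only the named measurement columns to [0, 1]
iris[features] = MinMaxScaler().fit_transform(iris[features])

# factorize returns (codes, uniques); codes follow order of first appearance
codes, uniques = pd.factorize(iris["species"])
iris["species"] = codes
print(uniques.tolist())  # → ['setosa', 'versicolor', 'virginica']
```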
Concluding Remarks
I strongly recommend that you see for yourself. Play around with your own datasets and replicate your experiments. I redid one of my recent data analytics tasks using only Mito AI, and I finished everything in under 2 minutes. Granted, this time I knew exactly what I was looking for, but it would still have taken me much more than 2 minutes just to write the code.
The question now is, where do we go from here? :)