David Farrugia

Summary

The article discusses the integration of Mito AI with Jupyter notebooks to enhance data analytics by leveraging large language models for generating code from plain text prompts.

Abstract

The article introduces Mito AI, a new feature within the Mito package that revolutionizes data analytics by combining spreadsheet functionality with code generation through large language models. Mito AI allows users to perform complex data analysis tasks using plain text prompts, catering to both those who enjoy coding and those who prefer no-code solutions. The article guides readers through installing Mito and its Jupyter extension, demonstrating how to use Mito AI with practical examples, such as detecting duplicate rows, generating descriptive statistics, and executing k-means clustering. It emphasizes the tool's ability to understand open-ended prompts and produce accurate code, significantly speeding up the data analytics process. The author encourages readers to explore Mito AI's capabilities with their datasets and shares personal experiences of increased efficiency.

Opinions

  • The author is enthusiastic about Mito AI, stating it has "completely transformed" their data analytics and Jupyter notebook experience.
  • Mito AI is praised for its ability to provide an edge in data analytics by allowing customization beyond what no-code tools typically offer.
  • The author believes that the current buzz around large language models is justified due to their "really interesting use-cases."
  • The article suggests that Mito AI's text-to-code functionality is a game-changer for data scientists who want to maintain control over their code while benefiting from no-code efficiencies.
  • The author is impressed with Mito AI's performance, highlighting its speed and accuracy in generating code for various data analysis tasks.
  • A strong recommendation is made for readers to try Mito AI, with the author sharing a successful replication of a data analytics task in under 2 minutes using the tool.
  • The author expresses excitement about the future potential of Mito AI, ending with a rhetorical question about the direction of future developments in this space.

DATA | ANALYTICS | PYTHON

Using Large Language Models Directly in Your Jupyter Notebooks

A quick guide to take your data analytics to the next level using Mito AI

A while back, I stumbled across an awesome package, Mito, that completely transformed my data analytics and Jupyter notebook experience.

Mito is a spreadsheet that can write code for you and help you visualise, filter, and analyse your data quickly and effectively.

And let me tell you. It just got a whole lot better!

Mito recently introduced a new feature: Mito AI. Unless you’ve been living under a rock for the past few weeks, you know that the current buzz in the tech world is Large Language Models (LLMs), the models behind chatbots such as ChatGPT and Bard. But this craze isn’t unfounded. In fact, LLMs have some really interesting use-cases.

Why Mito AI is a Game Changer

As data scientists, we love to question and to analyse. We want to be quick to answer any hunches that we might have but at the same time we want to do it in a meaningful way that can tell a story.

As a no-code tool, Mito provided us with out-of-the-box functionality that can get this job done. But if you’re like me, and you actually enjoy the coding aspect of data analytics, then you might feel that no-code tools take away some of your freedom to shape the analytics in your own way.

You don’t want your data analytics process to be identical to everyone else's. You long for that edge.

Mito AI gives its users just that: text to code.

It’s still a no-code tool at heart, but it extends its arm to those who dare take control of their code.

In this article, we will go over how to get started with Mito AI and some of its cool tricks.

If you want to read more, you can find their documentation below:

Getting Started with Mito

If you want to install Mito, you can follow the guide in my previous article (linked above).

However, Mito has since introduced a Jupyter extension that is also handy. If you’ve never used Jupyter extensions, then I suggest that you first go over this documentation:

To get this extension, simply run:

pip install mitosheet

and then activate it as follows:

jupyter nbextension install --py --user mitosheet
jupyter nbextension enable --py --user mitosheet

When we launch our new notebook, we should see the ‘New Mitosheet’ option in our toolbar.
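
If the option does not show up, it can help to confirm that the package is importable from the Python environment the notebook uses. The helper below is a minimal sketch using only the standard library; `is_installed` is a name made up for this illustration and is not part of Mito.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # `mitosheet` is the package name used in the pip command above.
    if is_installed("mitosheet"):
        print("mitosheet is installed")
    else:
        print("mitosheet not found; re-run the pip install step")
```

Running this in a notebook cell checks the notebook's own kernel, which is not always the environment where `pip` was run.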

Playing Around with Mito AI

For illustration purposes, we will be using the familiar iris dataset. You can access it here.
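
If you do not have a copy of the dataset to hand, one option is to build an equivalent DataFrame from the copy bundled with scikit-learn. This is only a sketch: the column names (`sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `species`) are an assumption chosen to match the `species` column used in the prompts that follow, and your own file may name them differently.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Build an iris DataFrame with a text `species` column (assumed layout)."""
    bunch = load_iris()
    columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    frame = pd.DataFrame(bunch.data, columns=columns)
    # Map the integer targets (0, 1, 2) to their species names.
    frame["species"] = bunch.target_names[bunch.target]
    return frame


iris = load_iris_frame()
print(iris.shape)
print(iris["species"].value_counts())
```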

Our button of focus for this article is the AI icon.

The AI functionality opens up three tabs: one for quick-access examples, one for our prompt, and one for the reply.

Let’s jump into some of its cool tricks, shall we?

Prompt 1

check how many duplicated rows we have in this dataset

Mito AI responds with:

iris.duplicated().sum()

We can also execute the generated code.
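
To see what that one-liner counts, here is a small sketch on a hypothetical five-row frame rather than the real dataset. It also shows two follow-ups you might want after getting the count: inspecting the duplicated rows and dropping them.

```python
import pandas as pd

# A tiny hypothetical frame, standing in for the real dataset.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 5.1, 5.8, 5.8],
        "species": ["setosa", "setosa", "setosa", "virginica", "virginica"],
    }
)

# duplicated() flags every repeat of an earlier row; the first occurrence is not flagged.
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)

# keep=False flags all members of each duplicate group, which helps when inspecting them.
print(df[df.duplicated(keep=False)])

# drop_duplicates() returns a new frame and leaves `df` unchanged.
deduplicated = df.drop_duplicates()
print(len(deduplicated))
```

Note that the count is the number of extra copies, not the number of rows involved in duplication.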

Let’s try something more complex.

Prompt 2

generate some descriptive statistics per species

iris.groupby('species').describe()
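
The table returned by `describe()` on a grouped frame has two-level column labels, which can be awkward to work with at first. The sketch below uses a small hypothetical frame to show how to pull out a single statistic, and how `agg()` can produce a narrower table when only a few statistics are needed.

```python
import pandas as pd

# Hypothetical measurements, two rows per species.
df = pd.DataFrame(
    {
        "petal_length": [1.4, 1.6, 4.5, 4.7, 6.0, 5.0],
        "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    }
)

summary = df.groupby("species").describe()

# Columns are (feature, statistic) pairs, e.g. ("petal_length", "mean").
print(summary.columns.tolist())

# Pull one statistic for one feature across all species.
mean_petal_length = summary[("petal_length", "mean")]
print(mean_petal_length)

# For a narrower table, agg() computes only the statistics named.
compact = df.groupby("species").agg(["mean", "std"])
print(compact)
```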

So far so good. But can we keep going?

Prompt 3

generate 3 clusters using kmeans

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(iris.drop('species', axis=1))
iris['cluster'] = kmeans.labels_

This is amazing! The generated code is correct and took less than 5 seconds.
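
One thing worth knowing when reusing this code: k-means starts from random initial centroids, so the cluster ids can change between runs unless a seed is fixed. The following is a sketch of a repeatable variant, not what Mito AI produced. It assumes the iris copy bundled with scikit-learn and the same hypothetical column names as before, and it selects the feature columns by name.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

bunch = load_iris()
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = pd.DataFrame(bunch.data, columns=features)
iris["species"] = bunch.target_names[bunch.target]

# random_state makes the run repeatable; n_init=10 tries 10 initialisations and keeps the best.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
iris["cluster"] = kmeans.fit_predict(iris[features])

# Cluster ids are arbitrary labels, so compare them to species with a cross-tabulation.
print(pd.crosstab(iris["species"], iris["cluster"]))
```

The cross-tabulation is a quick way to see how well the unsupervised clusters line up with the known species.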

Prompt 4

Let us try an open-ended prompt without any specific direction. We will ask it to generate the code to visualise the data per species, without specifying what to visualise or how. Ideally, it should use a pairplot to visualise all features, so we will throw in a little hint by saying that we also want to compare.

visualise and compare the different properties per species

import seaborn as sns
sns.pairplot(iris, hue='species')

And there we go, it managed to pick the correct visualisation type and generate the correct code for it.

Prompt 5

Let us try chaining two requests together in a single prompt.

scale our data to be in the range between 0 and 1 and convert the species columns to numeric

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
iris.iloc[:, :-1] = scaler.fit_transform(iris.iloc[:, :-1])
iris['species'] = pd.factorize(iris['species'])[0]

Once again, the generated code is correct and on-point. Truly remarkable.
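
One caveat when running the prompts in sequence on the same DataFrame: the slice `iloc[:, :-1]` assumes that `species` is the last column. If the Prompt 3 code has already added a `cluster` column, the slice would then include the text `species` column and the scaler would fail. A more defensive sketch, using a small hypothetical frame, names the feature columns explicitly and keeps the mapping that `factorize` produces.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# A small hypothetical frame that already carries an extra `cluster` column.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0, 6.3, 4.9],
        "petal_length": [1.4, 4.7, 6.0, 1.5],
        "species": ["setosa", "versicolor", "virginica", "setosa"],
        "cluster": [1, 0, 2, 1],
    }
)

# Name the measurement columns explicitly instead of relying on column position.
feature_columns = ["sepal_length", "petal_length"]
df[feature_columns] = MinMaxScaler().fit_transform(df[feature_columns])

# factorize() returns integer codes plus the labels they stand for.
codes, labels = pd.factorize(df["species"])
df["species"] = codes
print(df)
print(dict(enumerate(labels)))
```

Keeping `labels` around makes it possible to translate the numeric codes back to species names later.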

Concluding Remarks

I strongly recommend that you see for yourself. Play around with your own datasets and replicate your experiments. I tested this by redoing one of my recent data analytics tasks using only Mito AI, and I finished everything in under 2 minutes. Granted, this time I knew exactly what I was looking for, but it would still have taken me much longer than 2 minutes simply to write the code.

The question now is, where do we go from here? :)

Want to Get in Touch?

I would love to hear your thoughts on the topic, or anything AI and Data.

Drop me an email at [email protected] should you wish to get in touch.
