Amy @GrabNGoInfo

Summary

The provided content is a comprehensive tutorial on how to use R programming within Google Colab notebooks, detailing how to switch between Python and R code and dataframes.

Abstract

This tutorial explains how to integrate R into Google Colab, a platform typically used for Python programming in data science. It covers creating an R notebook in Colab, using the rpy2 package to run R code within a Python Colab notebook, and converting dataframes between Python and R. The guide walks through activating the R magic commands, installing and importing R packages, and converting dataframes in both directions, so that data can be manipulated and analyzed in either language. It also explains a common error that occurs when R cannot find a converted dataframe, and how to fix it. The tutorial is supplemented with video resources, blog posts, and references to further learning materials.

Opinions

  • The author, Amy @GrabNGoInfo, advocates for the use of both Python and R in data science projects to leverage the strengths of each language.
  • The tutorial suggests that some R packages may be more mature than their Python counterparts, implying a preference for certain R functionalities in specific scenarios.
  • By providing a referral link to Medium, the author encourages readers to support the platform and the writers, including themselves, through membership.
  • The author promotes additional learning resources, such as their YouTube channel and website, indicating a commitment to community education and engagement in data science topics.

How to Use R with Google Colab Notebook

Run both Python and R code in the same Python Colab notebook, and switch between Python and R dataframes

Photo by Jess Bailey on Unsplash

This tutorial shows how to use R in a Google Colab notebook. Colab notebooks are typically used for Python in data science, but some R packages are more mature than their Python counterparts. It is therefore convenient to use both languages in a data science project and get the best of each.

In this tutorial, you will learn:

  • How to create a Colab notebook for R
  • How to run both Python and R code in the same Python Colab notebook
  • How to switch between Python and R dataframes


Let’s get started!

Section 1: Create R Notebook in Google Colab

To create a notebook whose default language is R instead of Python, open the link https://colab.to/r. It opens a new Colab notebook that runs an R runtime.

We can run R code directly in this notebook. For example, if we run x <- 2+3 and then print x, we get 5.

We can also confirm the runtime type by going to Runtime -> Change runtime type. In the Notebook settings window that pops up, R is selected under Runtime type.

Google Colab R Notebook — Image from GrabNGoInfo.com

Section 2: Run R and Python in the Same Notebook

The rpy2 package lets us run R code in a Python Colab notebook through magic commands:

  • To run multiple lines of R code in a cell, put %%R at the beginning of the cell.
  • To run a single line of R code, put %R at the beginning of the line.

Section 2.1: Activate R Magic in Colab Notebook

To activate the R magic commands in a Google Colab notebook, run %load_ext rpy2.ipython:

# activate R magic
%load_ext rpy2.ipython

Section 2.2: Install and Import R Packages in Colab Notebook

After activating the R magic, we can install and import R packages with ordinary R code, as long as the cell starts with the %%R magic command.

%%R
# Install R package tableone
install.packages('tableone')
# Import the R package
library(tableone)

Section 2.3: Convert Python Dataframe to R Dataframe

To convert dataframes between Python and R, we need to import several modules from rpy2. We also import pandas to create an example pandas dataframe.

# Import pandas
import pandas as pd
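# Import rpy2 modules for dataframe conversion
import rpy2.robjects as ro                           # core Python interface to R
from rpy2.robjects.packages import importr           # import R packages as Python objects
from rpy2.robjects import pandas2ri                  # conversion rules for pandas dataframes
from rpy2.robjects.conversion import localconverter  # apply conversion rules inside a with-block
from rpy2.robjects import globalenv                  # R's global environment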

First, let’s create a pandas dataframe in Python. It has three columns and three rows.

# Create a pandas dataframe
df = pd.DataFrame({'fruit_name': ['apple', 'orange', 'grape'],
                  'fruit_id': [1, 2, 3],
                  'quantity': [150, 300, 200]})
# Take a look at the data
df
Create a Pandas DataFrame — Image from GrabNGoInfo.com

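Next, let’s convert the Python dataframe to an R dataframe using py2rpy. Inside the localconverter block, rpy2's default conversion rules are combined with the pandas rules, so py2rpy knows how to translate a pandas dataframe. The output type is an rpy2 DataFrame, which wraps an R data.frame.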

# Convert the python dataframe to the R dataframe
with localconverter(ro.default_converter + pandas2ri.converter):
  dfr = ro.conversion.py2rpy(df)
# Check the type of the conversion output
type(dfr)

Output:

rpy2.robjects.vectors.DataFrame

However, when we try to run an R function on the dataframe, we get the error object 'dfr' not found.

%R print(summary(dfr))

Output:

RInterpreterError: Failed to parse and evaluate line 'print(summary(dfr))'.
R error message: "Error in summary(dfr) : object 'dfr' not found"

This is because dfr exists only as a Python variable. The R object it points to has no name in R's global environment, which is where the %R magic looks up variables. After we assign the object to the name dfr in R's global environment, the R function runs without errors.

# Create a variable name in R's global environment
globalenv['dfr'] = dfr
# Print statistics
%R print(summary(dfr))

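Output (the summary printed by R, followed by rpy2's display of the value that print() returns, an 18-element character matrix):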

fruit_name           fruit_id      quantity    
 Length:3           Min.   :1.0   Min.   :150.0  
 Class :character   1st Qu.:1.5   1st Qu.:175.0  
 Mode  :character   Median :2.0   Median :200.0  
                    Mean   :2.0   Mean   :216.7  
                    3rd Qu.:2.5   3rd Qu.:250.0  
                    Max.   :3.0   Max.   :300.0  
StrMatrix with 18 elements.
'Length:3...	'Class :c...	'Mode :c...	...	'Mean :...	'3rd Qu.:...	'Max. :...

Section 2.4: Convert R Dataframe to Python Dataframe

To convert the R dataframe back to a pandas dataframe, we use the rpy2py function with the same converter. The output is a pandas DataFrame.

# Convert the R dataframe to a pandas dataframe
with localconverter(ro.default_converter + pandas2ri.converter):
  dfpd = ro.conversion.rpy2py(dfr)
# Check the type of the conversion output
type(dfpd)

Output:

pandas.core.frame.DataFrame

After the conversion, we can work on the dataframe directly with Python code.

# Run pandas code on the converted dataframe
dfpd.describe()
Convert R DataFrame to Python DataFrame — Image from GrabNGoInfo.com
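The quartiles that `describe()` reports for quantity should match R's `summary()` output earlier in this tutorial (175, 200, and 250). R's `summary()` uses quantile type 7 by default, and pandas' default `interpolation='linear'` implements the same rule. This pandas-only check reproduces the numbers without R:

```python
import pandas as pd

df = pd.DataFrame({'fruit_name': ['apple', 'orange', 'grape'],
                   'fruit_id': [1, 2, 3],
                   'quantity': [150, 300, 200]})
numeric = df[["fruit_id", "quantity"]]

# Quartiles with pandas' default linear interpolation (same rule as R's quantile type 7)
quartiles = numeric.quantile([0.25, 0.5, 0.75])
means = numeric.mean()

# describe() reports the same quartiles in its 25%, 50%, and 75% rows
summary_like = numeric.describe()

print(quartiles)
print(means.round(1))
```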

Summary

This tutorial showed how to use R in a Google Colab notebook. You learned:

  • How to create a Colab notebook for R
  • How to run both Python and R code in the same Python Colab notebook
  • How to switch between Python and R dataframes

More tutorials are available on the GrabNGoInfo YouTube channel and at GrabNGoInfo.com.


References

Google Colab
Google Colaboratory
Colab R Notebook
R Dataframe To Python