Alessandro Arrigo

Summary

The article presents a curated list of unique and lesser-known R packages that enhance various aspects of data science workflows, including data visualization, cleaning, manipulation, and modeling.

Abstract

The author of the article, an experienced R user, shares a collection of underrated R packages that provide unique functionalities beyond the commonly known libraries. These packages range from those that celebrate the completion of simulations (BRRR and beepr) to tools for creating calendars (calendR), managing package versions (checkpoint), and interactively editing data (DataEditR). The list includes packages for data visualization like bayesplot, cowplot, and Rayshader, which offer advanced plotting capabilities and thematic maps. For data cleaning and manipulation, packages such as janitor, sqldf, and naniar are recommended for their efficient data management and cleaning functions. In the realm of data exploration and modeling, the article highlights packages like DataExplorer, finalfit, and syuzhet, which facilitate exploratory analysis, provide elegant result tables, and perform sentiment analysis, respectively. The author encourages readers to explore these packages and contribute their own discoveries, emphasizing the continuous evolution of R's package ecosystem.

Opinions

  • The author believes that the listed packages are valuable and underappreciated tools within the R community.
  • Some packages are celebrated for their fun and unique applications, such as generating abstract art (generativeart) or creating anatograms (gganatogram).
  • The author suggests that drake can significantly improve workflow efficiency by analyzing and orchestrating tasks, providing evidence that results match the code and data.
  • usethis is praised for automating repetitive tasks in project development, enhancing productivity for both R package and non-package projects.
  • The author expresses admiration for the aesthetic and functional improvements provided by packages like hrbrthemes and vaporwave, the latter of which allows users to style plots with a retro aesthetic.
  • janitor is highlighted for its practicality in data cleaning, with examples on its GitHub repository commended for their usefulness.
  • validate and related packages (errorlocate, deductive, dcmodify) are recommended for their robust data validation capabilities.
  • The finalfit package is particularly noted for its ability to produce clean, publication-ready tables and plots, streamlining the presentation of research findings.
  • The article concludes with an invitation for readers to share their own discoveries, indicating a community-driven approach to expanding the list of underrated R packages.

The Most Underrated R packages: 2020 Edition

A curated list of awesome and lesser-known R libraries

Photo by Safar Safarov on Unsplash

In my experience as an R user, I’ve come across a lot of different packages and curated lists. Some are in my bookmarks, like the great awesome-R list or the monthly “best of” list curated by RStudio. If you don’t know them, go check them out as soon as you can.

In this post, I’d like to show you something else. These are the results of late-night GitHub/Reddit browsing, and cool stuff shared by colleagues.

Some of these packages are really unique; others are just fun to use and are real underdogs among the data scientists and statisticians I’ve worked with.

Let’s start!

💥Misc (the weird ones) 💥

  • BRRR and beepr: Have you ever wanted to know — and celebrate — when your simulations are finally done running in R? Have you ever been so proud of pulling off a tricky bit of code that you wanted Flavor Flav to yell “yeaaahhhh, boi!!” as soon as it successfully completes?
  • calendR: Ready-to-print monthly and yearly calendars made with ggplot2.
  • checkpoint: It makes it possible to install package versions from a specific date in the past as if you had a CRAN time machine.
  • DataEditR: DataEditR is a lightweight package to interactively view, enter or edit data in R.
  • drake: It analyzes your workflow, skips steps with up-to-date results, and orchestrates the rest with optional distributed computing. In the end, drake provides evidence that your results match the underlying code and data, which increases your ability to trust your research.
  • flow: Visualize as flow diagrams the logic of functions, expressions or scripts and ease debugging.
From the “flow” GitHub repository. Look at this beauty.
  • generativeart: Beautiful math-inspired abstract art.
  • here: The goal of the here package is to enable easy file referencing. In contrast to using setwd(), which is fragile and dependent on the way you organize your files, here uses the top-level directory of a project to easily build paths to files.
  • installr: It allows users to update R and all installed packages with just one command.
  • mailR: Send Email from inside R.
  • plumber: An R package that converts your existing R code to a web API.
  • pushoverr: Send push notifications from R to mobile devices or the desktop.
  • statcheck: statcheck is a free, open-source R package that can be used to automatically extract statistical null-hypothesis significance testing (NHST) results from articles and recompute the p-values based on the reported test statistic and degrees of freedom to detect possible inconsistencies.
  • usethis: usethis is a workflow package: it automates repetitive tasks that arise during project setup and development, both for R packages and non-package projects.
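
To make two of the entries above more concrete, here is a minimal sketch that combines here and beepr. It assumes both packages are installed and that the script runs inside a project; the file data/raw/survey.csv is a hypothetical name used only for illustration, and the simulation is a stand-in for real work.

```r
library(here)
library(beepr)

# Build a path from the project root instead of relying on setwd().
# "data/raw/survey.csv" is a hypothetical file; here() only builds the
# path, it does not check that the file exists.
data_path <- here("data", "raw", "survey.csv")
print(data_path)

# Stand-in for a long-running simulation: 1000 means of 100 normal draws.
sim_means <- replicate(1000, mean(rnorm(100)))

# Play the default notification sound once the simulation is done.
beep()
```

Because the path is anchored at the project root, the same script should keep working when a colleague clones the project into a different folder.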

✨Data Visualization ✨

  • bayesplot: An R package providing an extensive library of plotting functions for use after fitting Bayesian models (typically with MCMC). The plots created by bayesplot are ggplot objects, which means that after a plot is created it can be further customized using various functions from the ggplot2 package.
  • cowplot: Awesome for aligning graphs to grids.
  • esquisse: Basically creates a drag & drop GUI for ggplot2, so you don’t have to code the majority of the plots.
  • hrbrthemes: Additional Themes and Theme Components for ‘ggplot2’.
  • gganatogram: Create anatograms using ggplot2. Yeah, for real.
  • ggannotate: ggannotate is a point-and-click tool to help you put your annotations exactly where you want them to go.
  • golem: The package makes creating production-ready shiny apps child’s play.
  • patchwork: The goal of patchwork is to make it ridiculously simple to combine separate ggplots into the same graphic.
  • rayshader: 3D plots that don’t suck.
From the “rayshader” GitHub repository.
  • see: Easy blackboard background for ggplots.
  • sjPlot: Collection of plotting and table output functions for data visualization.
  • tmap: Astonishing thematic maps in R.
  • vaporwave: A E S T H E T I C S time. Beautify your plots like it’s still the ’80s.
  • visreg: For displaying the results of a fitted model in terms of how a predictor variable x is estimated to affect an outcome y.
  • wesanderson: A Wes Anderson color palette for R.
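
As a quick illustration of how little code patchwork needs, here is a small sketch that arranges three ggplots into one figure. It assumes ggplot2 and patchwork are installed and uses the built-in mtcars data; the particular plots are just placeholders.

```r
library(ggplot2)
library(patchwork)

# Three independent plots built from the built-in mtcars data.
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# "|" places plots side by side, "/" stacks them vertically.
combined <- (p1 | p2) / p3 +
  plot_annotation(title = "mtcars at a glance")

combined
```

cowplot covers similar ground with plot_grid(); which one feels nicer is mostly a matter of taste.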

🛁 Data Cleaning and Manipulation 🛁

  • janitor: A lot of cool functions to clean data; go check the examples on its GitHub page.
  • sqldf: Data management using SQL syntax. It’s the way to go when a file is bigger than your machine’s memory can handle: you can filter it with SQL and load only that selection into R.
  • naniar: All you need for Missing Data.
  • tidylog: It provides feedback about basic dplyr operations. Great for long pipe chains.
  • validate: A great package to check if your data obeys predefined rules (to be used with errorlocate by the same author). Also by the same author, go check deductive and dcmodify.
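
The sqldf trick of filtering a file before it reaches R can be sketched roughly like this. The example assumes sqldf is installed and writes mtcars to a temporary CSV as a stand-in for a genuinely large file; in real use you would point read.csv.sql() at your own file.

```r
library(sqldf)

# Stand-in for a large file: write the built-in mtcars data to a temp CSV.
# quote = FALSE keeps the header and values free of quote characters.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE, quote = FALSE)

# The file is referred to as "file" inside the SQL statement. Only the
# rows and columns selected by the query are returned as a data frame.
eight_cyl <- read.csv.sql(
  csv_path,
  sql = "select mpg, cyl, wt from file where cyl = 8"
)

nrow(eight_cyl)
head(eight_cyl)
```

The filtering happens in a temporary SQLite database rather than in R’s memory, which is why this approach can cope with files that read.csv() would struggle with.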

💻 Data Exploration and Modelling 💻

  • DataExplorer: Great functions for exploratory analysis.
  • dlookr: Several custom functions for data quality diagnosis and EDA in a compact form.
  • DHARMa: an interesting R package for residual diagnostics of GLMMs.
  • finalfit: The finalfit package provides functions that help you quickly create elegant final results tables and plots when modelling in R. These can easily be exported as Word documents, PDFs, or html files. Example below.
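library(finalfit)
library(dplyr)  # provides the %>% pipe

# Explanatory variables and outcome from the colon_s example data shipped with finalfit
explanatory = c("age.factor", "sex.factor", 
  "obstruct.factor", "perfor.factor")
dependent = 'mort_5yr'
colon_s %>%
  finalfit(dependent, explanatory, metrics=TRUE) -> t2
# t2[[1]] is the regression table, t2[[2]] holds the model metrics
knitr::kable(t2[[1]], row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
knitr::kable(t2[[2]], row.names=FALSE, col.names="")
Super clean, publication-ready table. From the “finalfit” GitHub repository.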
  • LongCatEDA: Useful to visualize longitudinal changes in categorical variables.
  • jtools: This package consists of a series of functions created by the author (Jacob) to automate otherwise tedious research tasks. At this juncture, the unifying theme is the more efficient presentation of regression analyses.
  • modelbased: It’s a lightweight package helping with model-based estimations, used in the computation of marginal means, contrast analysis and predictions.
  • performance: The primary goal of the performance package is to provide utilities for computing indices of model quality and goodness of fit. This includes measures like r-squared (R2), root mean squared error (RMSE) or intraclass correlation coefficient (ICC), but also functions to check (mixed) models for overdispersion, zero-inflation, convergence or singularity.
  • skimr: A frictionless, pipeable approach to dealing with summary statistics.
  • speedglm: Fast glm for big data.
  • syuzhet: Easy sentiment analysis in R.
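
To give a feel for the performance package described above, here is a minimal sketch on a plain linear model. It assumes performance is installed; the model itself (mpg explained by wt and hp in mtcars) is an arbitrary choice for illustration.

```r
library(performance)

# A simple linear model on the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared and adjusted R-squared.
fit_r2 <- r2(fit)
fit_r2

# Root mean squared error of the fitted model.
fit_rmse <- performance_rmse(fit)
fit_rmse

# One-row summary of several quality indices (AIC, BIC, R2, RMSE, ...).
model_performance(fit)
```

The same functions are meant to work across many model classes, so a mixed model could be swapped in without changing the calls.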

I hope you found something useful or fun for your work. Let me know in the comment section if you have something cool to add to the list!

EDIT 2022: Check out part 2 of this article!
