avatarDario Radečić

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

11440

Abstract

interest ( <code>dataframe_namecolumn_name</code>) to get the value out.</p><p id="5fd8">The surrounding <code>paste()</code> function is here just to give some context, or to print the function name in front of the value:</p><div id="8cca"><pre>paste<span class="hljs-punctuation">(</span><span class="hljs-string">"Min. salary:"</span><span class="hljs-punctuation">,</span> <span class="hljs-built_in">min</span><span class="hljs-punctuation">(</span>data<span class="hljs-operator"></span>Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span> paste<span class="hljs-punctuation">(</span><span class="hljs-string">"Average. salary:"</span><span class="hljs-punctuation">,</span> mean<span class="hljs-punctuation">(</span>data<span class="hljs-operator"></span>Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span> paste<span class="hljs-punctuation">(</span><span class="hljs-string">"Median salary:"</span><span class="hljs-punctuation">,</span> median<span class="hljs-punctuation">(</span>data<span class="hljs-operator"></span>Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span> paste<span class="hljs-punctuation">(</span><span class="hljs-string">"Max salary:"</span><span class="hljs-punctuation">,</span> <span class="hljs-built_in">max</span><span class="hljs-punctuation">(</span>data<span class="hljs-operator"></span>Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span> paste<span class="hljs-punctuation">(</span><span class="hljs-string">"STdev. salary:"</span><span class="hljs-punctuation">,</span> sd<span class="hljs-punctuation">(</span>data<span class="hljs-operator"></span>Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span></pre></div><p id="b783">These are the results you’ll see:</p><figure id="ea74"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*2rFGMwz3zd5fRgBU5FXYRQ.png"><figcaption>Image 5 — Summary statistics in R</figcaption></figure><p id="724c">And that’s the basics of summary statistics for you. Up next, let’s discuss string concatenation.</p><h2 id="396b">Concatenation</h2><p id="23ae">String concatenation is one of the typical data manipulation and preprocessing operations. The idea is to take two (or more) columns and combine their values into a single new column.</p><p id="0eba">Luckily for us, we have the <code>First Name</code> and <code>Last Name</code> columns that beg to be combined into a <code>Full Name</code> column.</p><p id="8858">In Excel, you’ll want to call the <code>CONCAT()</code> function and pass in both values with a single space in between. Here's an example:</p><div id="1a71"><pre>=CONCAT(B2;<span class="hljs-string">" "</span>;C2)</pre></div><p id="10a7">And here’s what everything will look like when you apply the formula to the entire column:</p><figure id="6595"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*4FVbA0wPVgwV1uuNfkrNCg.png"><figcaption>Image 6 — String concatenation in Excel</figcaption></figure><p id="185a">In R, you can use the <code>mutate()</code> function from <code>dplyr</code> to do the same:</p><div id="cb4b"><pre>library<span class="hljs-punctuation">(</span>dplyr<span class="hljs-punctuation">)</span>

data <span class="hljs-operator">%>%</span> mutate<span class="hljs-punctuation">(</span>FullName <span class="hljs-operator">=</span> paste<span class="hljs-punctuation">(</span>FirstName<span class="hljs-punctuation">,</span> LastName<span class="hljs-punctuation">,</span> sep <span class="hljs-operator">=</span> <span class="hljs-string">" "</span><span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span></pre></div><p id="4f0c">The results are identical, not taking into consideration the UI differences:</p><figure id="767f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*1wE0b4zqZlox6yWCQwA8eQ.png"><figcaption>Image 7 — String concatenation in R</figcaption></figure><p id="4998">The <code>mutate()</code> function is responsible for creating new attributes. In it, you're basically assigning values to a new column. The <code>paste()</code> function will combine multiple fields to produce the final results.</p><blockquote id="347a"><p><i>Are you new to R dplyr and want to learn more? <a href="https://appsilon.com/r-dplyr-tutorial/">We have a complete beginners guide to dplyr on our blog</a>.</i></p></blockquote><h2 id="e34d">Conditions</h2><p id="602b">Up next, let’s tackle conditions. In here, you want the cell value to populate differently, depending on values from different columns.</p><p id="dc90">We’ll create a new column named <code>Higher than avg. sal</code> that will have a value of "Y" if the salary for a given employee is greater than the average salary across the company, and "N" otherwise.</p><p id="557e">Here’s the Excel formula you need to use:</p><div id="ff70"><pre>=<span class="hljs-variable constant_">IF</span>(<span class="hljs-variable constant_">F2</span>><span class="hljs-variable constant_">AVERAGE</span>(F<span class="hljs-variable">2</span><span class="hljs-symbol">:F</span><span class="hljs-variable">6</span>); <span class="hljs-string">"Y"</span>; <span class="hljs-string">"N"</span>)</pre></div><p id="2f7d">And these are the results:</p><figure id="50fc"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*B-xDMlDNPrOxibPQSrUkqg.png"><figcaption>Image 8 — Conditional statements in Excel</figcaption></figure><p id="c576">In R, you can once again use the <code>mutate()</code> function and assign the result to be the output from the <code>ifelse()</code> function. The declaration is pretty much self-explanatory:</p><div id="dc76"><pre>data <span class="hljs-operator">%>%</span> mutate<span class="hljs-punctuation">(</span>HigherThanAvgSal <span class="hljs-operator">=</span> ifelse<span class="hljs-punctuation">(</span>Salary <span class="hljs-operator">></span> mean<span class="hljs-punctuation">(</span>Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">,</span> <span class="hljs-string">"Y"</span><span class="hljs-punctuation">,</span> <span class="hljs-string">"N"</span><span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span></pre></div><p id="a351">The results are once again identical:</p><figure id="955d"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*4AkSOF32zOch2ENsyIw4-A.png"><figcaption>Image 9 — Conditional statements in R</figcaption></figure><p id="ad52">And that’s conditionals for you. Up next, let’s go over string manipulation.</p><h2 id="e04e">String Manipulation</h2><p id="ff89">You can do all sorts of stuff with strings in Excel and R. We’ll demonstrate that on a rather simple example of extracting initials from first and last names.</p><p id="a5ad">To achieve this in Excel, you’ll need to use the familiar <code>CONCAT()</code> function, with its inner elements being the results of a <code>LEFT()</code> function. This one will extract the first N letters from a string.</p><p id="dc1f">If we extract only the first letter for both first and last names and then combine them, we will get the initials.</p><p id="1b9b">Here’s the implementation in Excel:</p><div id="5fc4"><pre><span class="hljs-operator">=</span>CONCAT(<span class="hljs-keyword">LEFT</span>(B2;<span class="hljs-number">1</span>); <span class="hljs-keyword">LEFT</span>(C2;<span class="hljs-number">1</span>))</pre></div><p id="df84">These are the results you should see:</p><figure id="a7ee"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*zEbSKqHuU5uBxDC-9JGk5w.png"><figcaption>Image 10 — String manipulation in Excel</figcaption></figure><p id="e90f">Things are quite similar in R. Instead of <code>LEFT</code>, you'll need to use <code>substr()</code> and paste the start and end points. These are then surrounded by the familiar <code>paste()</code> function:</p><div id="2108"><pre><span class="hljs-selector-tag">data</span> %>% <span class="hljs-selector-tag">mutate</span>(Initials = <span class="hljs-built_in">paste</span>(<span class="hljs-built_in">substr</span>(FirstName, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>), <span class="hljs-built_in">substr</span>(LastName, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>), sep = <span class="hljs-string">""</span>))</pre></div><p id="41ac">And you get the same output as with Excel:</p><figure id="d35f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*AS5q2UqtrIJ7xacyDTKEYA.png"><figcaption>Image 11 — String manipulation in R</figcaption></figure><p id="fe91">And finally, let’s go over date manipulations.</p><h2 id="f1e8">Date Calculations</h2><p id="9752">There’s no getting around dates. They’re an integral part of any dataset, and in ours, they signify the exact day at which the employee has joined the company.</p><p id="4e96">What we’re aiming to accomplish is simple — calculate how many years has the employee been in our company.</p><p id="5ca6">In Excel, one way of doing this is by subtracting the year of the hire date from the year of the current date. Here’s an example:</p><div id="69b5"><pre><span class="hljs-operator">=</span><span class="hljs-keyword">YEAR</span>(TODAY())<span class="hljs-operator">-</span><span class="hljs-keyword">YEAR</span>(G2)</pre></div><p id="3835">This is what you should see populated in your column:</p><figure id="fa27"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*QRIT5N5GOAzyTgPJ-__MjA.png"><figcaption>Image 12 — Date calculations in Excel</figcaption></figure><p id="cb0a">Doing the same in R is much easier with the <code>lubridate</code> pacakge. Make sure to install it first - <code>install.packages("lubridate")</code>, and then use the function call similar to the one from Excel:</p><div id="c1e4"><pre><span class="hljs-built_in">library</span>(lubridate)

data %>% <span class="hljs-built_in">mutate</span>(YearsExp = year(today()) - <span class="hljs-built_in">year</span>(DateJoined))</pre></div><p id="94bc">Once again, the results are identical:</p><figure id="c973"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*xEUY36Dw-GhAspWTpCBf4Q.png"><figcaption>Image 13 — Date calculations in R</figcaption></figure><p id="d1e8">But we’re not done with this section yet. R has a secret weapon that will leave a lot of Excel users jealous.</p><h2 id="faa3">R dplyr — Do Everything With One Command</h2><p id="d431">And the secret is — the <code>dplyr</code> package allows you to chain operations together. For example, you can create all of the previously seen attributes in a single call.</p><p id="5dc7">Here’s the code for doing so:</p><div id="5d68"><pre>library<span class="hljs-punctuation">(</span>dplyr<span class="hljs-punctuation">)</span> library<span class="hljs-punctuation">(</span>lubridate<span class="hljs-punctuation">)</span>

data <span class="hljs-operator"><-</span> data <span class="hljs-operator">%>%</span> mutate<span class="hljs-punctuation">(</span> FullName <span class="hljs-operator">=</span> paste<span class="hljs-punctuation">(</span>FirstName<span class="hljs-punctuation">,</span> LastName<span class="hljs-punctuation">,</span> sep <span class="hljs-operator">=</span> <span class="hljs-string"

Options

" "</span><span class="hljs-punctuation">)</span><span class="hljs-punctuation">,</span> HigherThanAvgSal <span class="hljs-operator">=</span> ifelse<span class="hljs-punctuation">(</span>Salary <span class="hljs-operator">></span> mean<span class="hljs-punctuation">(</span>Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">,</span> <span class="hljs-string">"Y"</span><span class="hljs-punctuation">,</span> <span class="hljs-string">"N"</span><span class="hljs-punctuation">)</span><span class="hljs-punctuation">,</span> Initials <span class="hljs-operator">=</span> paste<span class="hljs-punctuation">(</span>substr<span class="hljs-punctuation">(</span>FirstName<span class="hljs-punctuation">,</span> <span class="hljs-number">1</span><span class="hljs-punctuation">,</span> <span class="hljs-number">1</span><span class="hljs-punctuation">)</span><span class="hljs-punctuation">,</span> substr<span class="hljs-punctuation">(</span>LastName<span class="hljs-punctuation">,</span> <span class="hljs-number">1</span><span class="hljs-punctuation">,</span> <span class="hljs-number">1</span><span class="hljs-punctuation">)</span><span class="hljs-punctuation">,</span> sep <span class="hljs-operator">=</span> <span class="hljs-string">""</span><span class="hljs-punctuation">)</span><span class="hljs-punctuation">,</span> YearsExp <span class="hljs-operator">=</span> year<span class="hljs-punctuation">(</span>today<span class="hljs-punctuation">(</span><span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span> <span class="hljs-operator">-</span> year<span class="hljs-punctuation">(</span>DateJoined<span class="hljs-punctuation">)</span> <span class="hljs-punctuation">)</span>

data</pre></div><p id="bc4a">This is the output you will see:</p><figure id="a48b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Ap7sBz7t77NfWUkl_qoKfg.png"><figcaption>Image 14 — Multiple column creation with R dplyr mutate()</figcaption></figure><p id="5c8f">With R, there’s no guesswork in what’s going on. You can clearly read the code and the order of operations.</p><h1 id="fff2">Excel Functions in R for Data Summarization and Charts</h1><p id="58d0">This section will consider one other important aspect of the R vs. Excel debate — charts. Excel makes it easy to produce good-looking data visualizations, but you’ll have to wait and see if it’s up to par with what R has to offer.</p><h2 id="6c65">Conditional Aggregation and Charting in Excel</h2><p id="c122">Our chart logic is quite simple. We want to display the total monthly salaries across departments. This calculation doesn’t involve any work for the Sales and Marketing departments, since there’s only one employee there. But still, the formula will scale to future employees.</p><p id="ac48">The <code>SUMIF()</code> function in Excel allows us to calculate the sum only on those records where a defined condition is met. You first define the condition range (departments), then the criterion (IT, Sales, Marketing), and finally the range from which the values will be summed up:</p><div id="75be"><pre>=SUMIF(E2:E6;<span class="hljs-string">"IT"</span>;F2:F6) =SUMIF(E2:E6;<span class="hljs-string">"Sales"</span>;F2:F6) =SUMIF(E2:E6;<span class="hljs-string">"Marketing"</span>;F2:F6)</pre></div><p id="e7b6">Once calculated, you can select the attributes and create your chart with a couple of clicks:</p><figure id="3394"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*GkjskN91neZY0XHUONc3Eg.png"><figcaption>Image 15 — Chart in Excel</figcaption></figure><p id="9290">Neat. Let’s see what we have to do to achieve the same in R.</p><h2 id="a8f2">Conditional Aggregation and Charting in R</h2><p id="c2fe">R’s <code>dplyr</code> packs a convenient <code>group_by()</code> function that allows us to group the dataset by an apartment, and calculate the summary statistics from there.</p><p id="5883">All of the results will be stored to a new <code>data.frame</code>, just for the sake of easier visualization:</p><div id="9e90"><pre>department_salaries <span class="hljs-operator"><-</span> data <span class="hljs-operator">%>%</span> group_by<span class="hljs-punctuation">(</span>Department<span class="hljs-punctuation">)</span> <span class="hljs-operator">%>%</span> summarize<span class="hljs-punctuation">(</span>Salary <span class="hljs-operator">=</span> <span class="hljs-built_in">sum</span><span class="hljs-punctuation">(</span>Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span>

department_salaries</pre></div><p id="6700">This is what the summarized <code>data.frame</code> looks like:</p><figure id="390a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*aAXF80yQuUZBFDffX_RVug.png"><figcaption>Image 16 — Summary statistics in R</figcaption></figure><p id="5192">You can now leverage the <code>ggplot2</code> package to make a bar chart. There's no point in explaining the function call line by line, so just read it and see what happens if you remove some bits and pieces:</p><div id="78c9"><pre>library<span class="hljs-punctuation">(</span>ggplot2<span class="hljs-punctuation">)</span>

bar_chart <span class="hljs-operator"><-</span> ggplot<span class="hljs-punctuation">(</span>department_salaries<span class="hljs-punctuation">,</span> aes<span class="hljs-punctuation">(</span>x <span class="hljs-operator">=</span> Department<span class="hljs-punctuation">,</span> y <span class="hljs-operator">=</span> Salary<span class="hljs-punctuation">)</span><span class="hljs-punctuation">)</span> <span class="hljs-operator">+</span> geom_bar<span class="hljs-punctuation">(</span>stat <span class="hljs-operator">=</span> <span class="hljs-string">"identity"</span><span class="hljs-punctuation">,</span> fill <span class="hljs-operator">=</span> <span class="hljs-string">"steelblue"</span><span class="hljs-punctuation">)</span> <span class="hljs-operator">+</span> labs<span class="hljs-punctuation">(</span>title <span class="hljs-operator">=</span> <span class="hljs-string">"Total monthly salaries by department"</span><span class="hljs-punctuation">,</span> y <span class="hljs-operator">=</span> <span class="hljs-string">"Salary in $"</span><span class="hljs-punctuation">,</span> x <span class="hljs-operator">=</span> <span class="hljs-string">"Department"</span><span class="hljs-punctuation">)</span>

bar_chart</pre></div><p id="8617">This is the chart you’ll see after running the code:</p><figure id="71a3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*vyv9WvYW3p_VHbg7ZApmJg.png"><figcaption>Image 17 — Chart in R with ggplot2</figcaption></figure><p id="a424">Looking to dive deeper into ggplot2? <a href="https://appsilon.com/ggplot2-bar-charts/">We have an entire article series on making stunning charts, bar charts included</a>.</p><p id="37c2">And that’s basically it for the “how” part. The only thing left to do is discuss the reasons why you might consider switching from Excel to R.</p><h1 id="d50b">Why Transition from Excel to R — Is R a Viable Excel Alternative?</h1><p id="1f6e">Excel isn’t going anywhere, that’s a fact. It’s a tool enjoyed by millions worldwide, while R is a more niched technology. And that’s fine, R was never meant to replace Excel for average users.</p><p id="0b5d">Where R becomes interesting is if you consider yourself an Excel power user. That’s where you run into Excel limitations, such as:</p><ol><li><b>Limited data capacity: </b>This will depend on your Excel version, but the more recent ones have a cap of 1048576 rows and 16384 columns. It might be insufficient for handling huge datasets.</li><li><b>Performance issues: </b>Excel can become slow and unresponsive when dealing with complex calculations and large datasets, whereas R is designed to handle such tasks efficiently and without hiccups.</li><li><b>Limited statistical analysis: </b>Excel provides basic statistical functions, while R offers a comprehensive suite of statistical packages, allowing you to do more advanced things.</li><li><b>Lack of version control: </b>Excel lacks built-in version control, making it difficult to track changes and collaborate on projects compared to R, which can use version control systems like Git. Sure, you can version control an Excel file with Git in theory, but no one does it for a reason.</li><li><b>Reproducibility and automation challenges: </b>Excel lacks built-in features for reproducibility, making it harder to document and reproduce data analysis compared to R, which supports scripts and notebooks. Also, Excel requires manual execution of tasks, making it less suitable for automated data processing and analysis compared to R.</li></ol><p id="bf05">This is just a handful of reasons why we think Excel is a no-go for power users and companies working with big datasets. Your mileage may vary, and you’re likely to find additional advantages/disadvantages in your area of work.</p><p id="243a">But still, we think R is a viable Excel alternative whenever you do anything south of basic analysis and visualization.</p><h1 id="57dd">Summing up Excel Functions in R</h1><p id="b432">And there you have it — Excel functions in R, and why R is a viable alternative to Excel, especially as the amount of data grows. We hope the transition to R doesn’t scare you. It’s a quite simple programming language, but will definitely require some time to get used to. It’s a new way of thinking and implementing actions, after all.</p><p id="64ad">The examples provided today should be enough to get you started. R’s <code>dplyr</code> package supports pretty much anything Excel does, at least in the realm of data analysis and manipulation. If you want to visualize your data, look no further than <code>ggplot2</code> for static visualizations.</p><p id="2e1c"><i>Did you successfully transition from Excel to R? Or did you find a reason not to?</i> Whatever the case, make sure to share your thoughts with us in the comment section below. You can also reach out on Twitter — <a href="https://twitter.com/appsilon?lang=en">@appsilon</a>.</p><blockquote id="e957"><p><i>Is Excel obsolete? <a href="https://appsilon.com/switch-from-excel-to-r-shiny/">Here’s how (and why) to switch to R Shiny instead</a>.</i></p></blockquote><p id="be38"><i>Loved the article? Become a <a href="https://medium.com/@radecicdario/membership">Medium member</a> to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.</i></p><div id="efe9" class="link-block"> <a href="https://medium.com/@radecicdario/membership"> <div> <div> <h2>Join Medium with my referral link - Dario Radečić</h2> <div><h3>As a Medium member, a portion of your membership fee goes to writers you read, and you get full access to every story…</h3></div> <div><p>medium.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*rRU4bfEUsfphzzrh)"></div> </div> </div> </a> </div><p id="e449"><i>Originally published at <a href="https://appsilon.com/excel-functions-in-r/">https://appsilon.com</a>.</i></p></article></body>

Can I Do That In R? Advanced Excel Functions in R for Data Manipulation

From Excel to R — Advanced Excel Functions in R for Effortless Data Manipulation and Superior Scalability

Article thumbnail

Does Excel.. excel at everything? Should you consider R programming language instead? Well, yes — but not for the reasons you might initially think. Excel is fun and intuitive but has some obvious problems with reproducibility and scalability. More on that later.

This article will walk you through several examples of Excel functions in R. We’ll see how things are done in Excel and in R, and then do our best to compare the two. Maybe you’re a seasoned Excel user and want to dip your toes in data manipulation in R. You’ve found just the right place to start.

Looking to combine R and Excel? We bring you list of packages to combine the best of both worlds.

Before We Start

In this article, we’ll delve into the world of data manipulation in R and explore how it compares to Excel, especially when dealing with advanced functions. We’ll walk you through various examples, showing how to perform tasks that you might be familiar with in Excel but with the power and flexibility of R. But before we dive into the exciting world of data manipulation and analysis, let’s address an essential aspect for beginners: package installation.

Package Installation

If you’re new to R, it’s important to note that R’s strength lies in its extensibility through packages. We’ll use several R packages, such as dplyr, readxl, lubridate, and ggplot2, to enhance data manipulation and visualization. For our beginner audience, it could be helpful to understand how to install these packages. To get started, you can install these packages using the install.packages() function, which ensures that you have the necessary tools at your disposal to follow along with the examples in this article.

This video is a brief walkthrough for installing R packages in RStudio and from GitHub.

So, let’s embark on this journey of data manipulation in R, armed with the right tools to excel in your data tasks!

Creating and Reading Data in Excel vs. R

First things first, we need some data. Creating it in Excel is easy, fast, and intuitive. After all, it was built for creating and manipulating somewhat decently large data sheets.

For this article, we’ll create a dataset of employee information. We’ll keep track of their first name, last name, age, department, salary, and the date they joined our made-up company. Feel free to create your own Excel data sheet for your needs. I recommend keeping it relatively simple for the tutorial.

Here’s what the data looks like:

Image 1 — Tabular data created in Excel

The question now is — how can you replicate this in R? Well, you can declare a dataset from scratch by wrapping it into a data.frame() and providing a list of column names and values, as shown below:

data <- data.frame(
  "ID" = c(1, 2, 3, 4, 5),
  "FirstName" = c("John", "Jane", "Mark", "Bob", "Susan"),
  "LastName" = c("Doe", "Dean", "Markson", "Barston", "Cup"),
  "Age" = c(31, 33, 27, 41, 39),
  "Department" = c("IT", "Sales", "Marketing", "IT", "IT"),
  "Salary" = c(5000, 4500, 5000, 6500, 7000),
  "DateJoined" = c(as.Date("2018-05-27"), as.Date("2020-08-14"), as.Date("2021-07-13"), as.Date("2015-01-10"), as.Date("2012-09-04"))
)
data

Here’s what the dataset looks like:

Image 2 — Tabular data created in R

Or, you can leverage the already existing Excel file, and use packages such as readxl to load it:

library(readxl)

data <- read_xlsx("data.xlsx", sheet = "Sheet1")
data

This is the output you’ll see:

Image 3 — Reading an Excel File in R

The thing you need to remember about R is that it connects to virtually any data source, from local files to databases and anything in between. That’s not a luxury you get with Excel.

Sure, you can connect to the data source manually and export the data, but that’s not the same. We like things automated at Appsilon.

Basic Excel Functions in R — Getting Your Feet Wet

This section will walk you through some basic operations, such as summary statistics, string/date manipulation, and conditionals. We’ll implement the identical logic in R and Excel to see how they compare.

Let’s start with simple summary statistics.

Simple Summary Statistics

Think of these as points you always want to know the value for. What’s the average of something, or what is the range of values in a column? These functions will give you the answer.

We’ll use the Salary column for the calculation. In Excel, it's located in a column F and spans between rows 2:6. Yours may differ, so change the values in the formulas accordingly.

Here’s an example set of functions you can use to find the minimum, average (mean), median, and maximum, but also its standard deviation:

=MIN(F2:F6)
=AVERAGE(F2:F6)
=MEDIAN(F2:F6)
=MAX(F2:F6)
=STDEV(F2:F6)

Below are the results you’ll see in Excel:

Image 4 — Summary statistics in Excel

Neat, and something you’re used to, but how can you translate it into R?

R programming language has functions built-in for taking care of calculations like these. For example, you can call the mean() function and pass in the column of interest ( dataframe_name$column_name) to get the value out.

The surrounding paste() function is here just to give some context, or to print the function name in front of the value:

paste("Min. salary:", min(data$Salary))
paste("Average. salary:", mean(data$Salary))
paste("Median salary:", median(data$Salary))
paste("Max salary:", max(data$Salary))
paste("STdev. salary:", sd(data$Salary))

These are the results you’ll see:

Image 5 — Summary statistics in R

And that’s the basics of summary statistics for you. Up next, let’s discuss string concatenation.

Concatenation

String concatenation is one of the typical data manipulation and preprocessing operations. The idea is to take two (or more) columns and combine their values into a single new column.

Luckily for us, we have the First Name and Last Name columns that beg to be combined into a Full Name column.

In Excel, you’ll want to call the CONCAT() function and pass in both values with a single space in between. Here's an example:

=CONCAT(B2;" ";C2)

And here’s what everything will look like when you apply the formula to the entire column:

Image 6 — String concatenation in Excel

In R, you can use the mutate() function from dplyr to do the same:

library(dplyr)

data %>%
  mutate(FullName = paste(FirstName, LastName, sep = " "))

The results are identical, not taking into consideration the UI differences:

Image 7 — String concatenation in R

The mutate() function is responsible for creating new attributes. In it, you're basically assigning values to a new column. The paste() function will combine multiple fields to produce the final results.

Are you new to R dplyr and want to learn more? We have a complete beginners guide to dplyr on our blog.

Conditions

Up next, let’s tackle conditions. In here, you want the cell value to populate differently, depending on values from different columns.

We’ll create a new column named Higher than avg. sal that will have a value of "Y" if the salary for a given employee is greater than the average salary across the company, and "N" otherwise.

Here’s the Excel formula you need to use:

=IF(F2>AVERAGE(F$2:F$6); "Y"; "N")

And these are the results:

Image 8 — Conditional statements in Excel

In R, you can once again use the mutate() function and assign the result to be the output from the ifelse() function. The declaration is pretty much self-explanatory:

data %>%
  mutate(HigherThanAvgSal = ifelse(Salary > mean(Salary), "Y", "N"))

The results are once again identical:

Image 9 — Conditional statements in R

And that’s conditionals for you. Up next, let’s go over string manipulation.

String Manipulation

You can do all sorts of stuff with strings in Excel and R. We’ll demonstrate that on a rather simple example of extracting initials from first and last names.

To achieve this in Excel, you’ll need to use the familiar CONCAT() function, with its inner elements being the results of a LEFT() function. This one will extract the first N letters from a string.

If we extract only the first letter for both first and last names and then combine them, we will get the initials.

Here’s the implementation in Excel:

=CONCAT(LEFT(B2;1); LEFT(C2;1))

These are the results you should see:

Image 10 — String manipulation in Excel

Things are quite similar in R. Instead of LEFT, you'll need to use substr() and paste the start and end points. These are then surrounded by the familiar paste() function:

data %>%
  mutate(Initials = paste(substr(FirstName, 1, 1), substr(LastName, 1, 1), sep = ""))

And you get the same output as with Excel:

Image 11 — String manipulation in R

And finally, let’s go over date manipulations.

Date Calculations

There’s no getting around dates. They’re an integral part of any dataset, and in ours, they signify the exact day at which the employee has joined the company.

What we’re aiming to accomplish is simple — calculate how many years has the employee been in our company.

In Excel, one way of doing this is by subtracting the year of the hire date from the year of the current date. Here’s an example:

=YEAR(TODAY())-YEAR(G2)

This is what you should see populated in your column:

Image 12 — Date calculations in Excel

Doing the same in R is much easier with the lubridate pacakge. Make sure to install it first - install.packages("lubridate"), and then use the function call similar to the one from Excel:

library(lubridate)

data %>%
  mutate(YearsExp = year(today()) - year(DateJoined))

Once again, the results are identical:

Image 13 — Date calculations in R

But we’re not done with this section yet. R has a secret weapon that will leave a lot of Excel users jealous.

R dplyr — Do Everything With One Command

And the secret is — the dplyr package allows you to chain operations together. For example, you can create all of the previously seen attributes in a single call.

Here’s the code for doing so:

library(dplyr)
library(lubridate)

data <- data %>%
  mutate(
    FullName = paste(FirstName, LastName, sep = " "),
    HigherThanAvgSal = ifelse(Salary > mean(Salary), "Y", "N"),
    Initials = paste(substr(FirstName, 1, 1), substr(LastName, 1, 1), sep = ""),
    YearsExp = year(today()) - year(DateJoined)
  )

data

This is the output you will see:

Image 14 — Multiple column creation with R dplyr mutate()

With R, there’s no guesswork in what’s going on. You can clearly read the code and the order of operations.

Excel Functions in R for Data Summarization and Charts

This section will consider one other important aspect of the R vs. Excel debate — charts. Excel makes it easy to produce good-looking data visualizations, but you’ll have to wait and see if it’s up to par with what R has to offer.

Conditional Aggregation and Charting in Excel

Our chart logic is quite simple. We want to display the total monthly salaries across departments. This calculation doesn’t involve any work for the Sales and Marketing departments, since there’s only one employee there. But still, the formula will scale to future employees.

The SUMIF() function in Excel allows us to calculate the sum only on those records where a defined condition is met. You first define the condition range (departments), then the criterion (IT, Sales, Marketing), and finally the range from which the values will be summed up:

=SUMIF(E2:E6;"IT";F2:F6)
=SUMIF(E2:E6;"Sales";F2:F6)
=SUMIF(E2:E6;"Marketing";F2:F6)

Once calculated, you can select the attributes and create your chart with a couple of clicks:

Image 15 — Chart in Excel

Neat. Let’s see what we have to do to achieve the same in R.

Conditional Aggregation and Charting in R

R’s dplyr packs a convenient group_by() function that allows us to group the dataset by an apartment, and calculate the summary statistics from there.

All of the results will be stored to a new data.frame, just for the sake of easier visualization:

department_salaries <- data %>%
  group_by(Department) %>%
  summarize(Salary = sum(Salary))

department_salaries

This is what the summarized data.frame looks like:

Image 16 — Summary statistics in R

You can now leverage the ggplot2 package to make a bar chart. There's no point in explaining the function call line by line, so just read it and see what happens if you remove some bits and pieces:

library(ggplot2)

bar_chart <- ggplot(department_salaries, aes(x = Department, y = Salary)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Total monthly salaries by department",
       y = "Salary in $", 
       x = "Department")

bar_chart

This is the chart you’ll see after running the code:

Image 17 — Chart in R with ggplot2

Looking to dive deeper into ggplot2? We have an entire article series on making stunning charts, bar charts included.

And that’s basically it for the “how” part. The only thing left to do is discuss the reasons why you might consider switching from Excel to R.

Why Transition from Excel to R — Is R a Viable Excel Alternative?

Excel isn’t going anywhere, that’s a fact. It’s a tool enjoyed by millions worldwide, while R is a more niched technology. And that’s fine, R was never meant to replace Excel for average users.

Where R becomes interesting is if you consider yourself an Excel power user. That’s where you run into Excel limitations, such as:

  1. Limited data capacity: This will depend on your Excel version, but the more recent ones have a cap of 1048576 rows and 16384 columns. It might be insufficient for handling huge datasets.
  2. Performance issues: Excel can become slow and unresponsive when dealing with complex calculations and large datasets, whereas R is designed to handle such tasks efficiently and without hiccups.
  3. Limited statistical analysis: Excel provides basic statistical functions, while R offers a comprehensive suite of statistical packages, allowing you to do more advanced things.
  4. Lack of version control: Excel lacks built-in version control, making it difficult to track changes and collaborate on projects compared to R, which can use version control systems like Git. Sure, you can version control an Excel file with Git in theory, but no one does it for a reason.
  5. Reproducibility and automation challenges: Excel lacks built-in features for reproducibility, making it harder to document and reproduce data analysis compared to R, which supports scripts and notebooks. Also, Excel requires manual execution of tasks, making it less suitable for automated data processing and analysis compared to R.

This is just a handful of reasons why we think Excel is a no-go for power users and companies working with big datasets. Your mileage may vary, and you’re likely to find additional advantages/disadvantages in your area of work.

But still, we think R is a viable Excel alternative whenever you do anything south of basic analysis and visualization.

Summing up Excel Functions in R

And there you have it — Excel functions in R, and why R is a viable alternative to Excel, especially as the amount of data grows. We hope the transition to R doesn’t scare you. It’s a quite simple programming language, but will definitely require some time to get used to. It’s a new way of thinking and implementing actions, after all.

The examples provided today should be enough to get you started. R’s dplyr package supports pretty much anything Excel does, at least in the realm of data analysis and manipulation. If you want to visualize your data, look no further than ggplot2 for static visualizations.

Did you successfully transition from Excel to R? Or did you find a reason not to? Whatever the case, make sure to share your thoughts with us in the comment section below. You can also reach out on Twitter — @appsilon.

Is Excel obsolete? Here’s how (and why) to switch to R Shiny instead.

Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.

Originally published at https://appsilon.com.

R
Excel
Data Analysis
Data Visualization
Data Science
Recommended from ReadMedium