Learning R: R and RStudio (2023)

Introduction to R and RStudio IDE

Objectives
To understand:
1. the difference between R and the RStudio IDE.
2. how to work within the RStudio environment including:

  • creating an Rproject and Rscript
  • navigating between directories
  • using functions
  • obtaining help

3. how R can enhance data analysis reproducibility

By the end of this section, you should be able to easily navigate and explore your RStudio environment.

What is R?

R is both a computational language and an environment for statistical computing and graphics. It is open-source and widely used by scientists, not just bioinformaticians. Base packages of R are built into your initial installation, but R functionality is greatly extended by installing other packages. R as a programming language is based on the S language, developed at Bell Laboratories. R is maintained by a network of collaborators from around the world, and core contributors are known as the R Core Team (the term used for citations). However, R is also a resource for and by scientists, and R functionality makes it easy to develop and share packages on any topic. Check out more about R on The R Project for Statistical Computing website.

Why R?

R is a particularly great resource for statistical analyses, plotting, and report generation. The fact that it is widely used means that users do not need to reinvent the wheel. There is a package available for most types of analyses, and if users need help, it is usually only a web search away. At the time of writing, CRAN houses 18,944 available packages. There are also many field-specific packages, including those useful in the -omics (genomics, transcriptomics, metabolomics, etc.). For example, the latest version of Bioconductor (v 3.16) includes 2,183 software packages, 416 experiment data packages, 909 annotation packages, 28 workflows, and 8 books.

Where do we get R packages?

To take full advantage of R, you need to install R packages. R packages are loadable extensions that contain code, data, documentation, and tests in a standardized shareable format that can easily be installed by R users. The primary repository for R packages is the Comprehensive R Archive Network (CRAN). CRAN is a global network of servers that store identical versions of R code, packages, documentation, etc. (cran.r-project.org). To install a CRAN package, use install.packages("packageName"). GitHub is another common source of R packages; however, these packages do not necessarily meet CRAN standards, so approach them with caution. To install a GitHub package, load devtools with library(devtools) and then call install_github(). Many genomics and other packages useful to biologists / molecular biologists can be found on Bioconductor - more on this later.
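As a minimal sketch of the install commands described above, the snippet below checks which packages are already installed before calling install.packages(). The package names in `wanted` were chosen only because they ship with R, so nothing is downloaded when the code runs. The commented lines use a placeholder GitHub repository name (`owner/repo`), which is an assumption for illustration and not a real package.

```r
# Packages we want available; "stats" and "utils" ship with R,
# so they serve as a safe illustration (swap in the packages you need)
wanted <- c("stats", "utils")

# requireNamespace() returns TRUE if a package is installed, without attaching it
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- wanted[!is_installed]

# Install from CRAN only what is missing
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Installing from GitHub (not run; "owner/repo" is a placeholder)
# library(devtools)
# install_github("owner/repo")
```

Checking first avoids reinstalling packages every time a script is run.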

METACRAN is a useful database that allows you to search and browse CRAN/R packages.

Ways to run R

R can be used via command line interactively, command line using a script, or interactively through an environment. This course will demonstrate the utility of the RStudio integrated development environment (IDE).

What is RStudio?

RStudio is an integrated development environment for R, and now Python. RStudio includes a console, editor, and tools for plotting, history, debugging, and workspace management. It provides a graphical user interface for working with R, thereby making R more user friendly. RStudio is open-source and can be installed locally or used through a browser (RStudio Server). We will be showcasing RStudio Server, but we highly encourage new users to install R and RStudio locally on their PC or Mac.

Note: RStudio the company is now Posit.

Installing R and RStudio

Detailed instructions for installing R and RStudio can be found here: https://btep.ccr.cancer.gov/docs/rtools/

Getting Started with R and RStudio

This tutorial closely follows the Intro to R and RStudio for Genomics lesson provided by datacarpentry.org.

Creating an R project

Because we are working on DNAnexus, and our files will not persist after each class, we aren't going to use an R project for all lessons. However, it is worth creating an R project and discussing the benefits here.

Creating an R project for each project you are working on facilitates organization and scientific reproducibility.

An RStudio project allows you to more easily:

  • Save data, files, variables, packages, etc. related to a specific analysis project
  • Restart work where you left off
  • Collaborate, especially if you are using version control such as git. ---datacarpentry.org

R projects simplify data reproducibility by allowing us to use relative file paths that will translate well when sharing the project.
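To illustrate why relative paths travel better, here is a small sketch. The directory and file names (`data`, `counts.csv`) are hypothetical; the absolute prefix is the project path used elsewhere in this lesson.

```r
# Build a path relative to the project directory;
# "data" and "counts.csv" are hypothetical names used for illustration
rel_path <- file.path("data", "counts.csv")
rel_path

# The same file written as an absolute path only resolves on one machine
abs_path <- file.path("/home/rstudio/LearningR", rel_path)
abs_path
```

A collaborator who opens the project on their own computer can use `rel_path` unchanged, whereas `abs_path` would need to be edited.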

To start a new R project, select File > New Project... or use the R project button.

A New Project wizard will appear. Click New Directory and then New Project. Choose a new directory name, perhaps "LearningR". To make your project more reproducible, consider checking the option box for renv. The R project file ends in .Rproj.

One of the most wonderful and also frustrating aspects of working with R is managing packages. Unfortunately it is very common that you may run into versions of R and/or R packages that are not compatible. This may make it difficult for someone to run your R script using their version of R or a given R package, and/or make it more difficult to run their scripts on your machine. renv is an RStudio add-on that will associate your packages and project so that your work is more portable and reproducible. To turn on renv click on the Tools menu and select Project Options. Under Environments check off “Use renv with this project” and follow any installation instructions. ---datacarpentry.org

Read more about renv here.

Creating an R script

As we learn more about R and start learning our first commands, we will keep a record of our commands using an R script. Remember, good annotation is key to reproducible data analysis. An R script can also be run on its own without user interaction, from the R console using source() or from the Linux command line using Rscript.

To create an R script, click File > New File > R Script. You can save your script by clicking on the floppy disk icon. You can name your script whatever you want, perhaps "LearningR_intro". R scripts end in .R. Save your R script to your working directory, which will be the default location on RStudio Server.
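The sketch below shows one way a script can be run non-interactively with source(). It writes a two-line script to a temporary directory so the example is self-contained; the script contents and the variable name `total` are made up for illustration.

```r
# Write a tiny script to a temporary file (contents are only an illustration)
script_path <- file.path(tempdir(), "LearningR_intro.R")
writeLines(c("# add two numbers", "total <- 2 + 3"), script_path)

# Run every line of the script from the R console
source(script_path)
total

# From the Linux command line, the equivalent would be:
#   Rscript LearningR_intro.R
```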

Introduction to the RStudio layout

Let's look a bit into our RStudio layout. (demonstrate minimize / maximize utility)

Source: This pane is where you will write/view R scripts. Some outputs (such as if you view a dataset using View()) will appear as a tab here.
Console/Terminal/Jobs: This is actually where you see the execution of commands. This is the same display you would see if you were using R at the command line without RStudio. You can work interactively (i.e. enter R commands here), but for the most part we will run a script (or lines in a script) in the source pane and watch their execution and output here. The “Terminal” tab gives you access to the BASH terminal (the Linux operating system, unrelated to R). RStudio also allows you to run jobs (analyses) in the background. This is useful if some analysis will take a while to run. You can see the status of those jobs in the background.
Environment/History: Here, RStudio will show you what datasets and objects (variables) you have created and which are defined in memory. You can also see some properties of objects/datasets such as their type and dimensions. The “History” tab contains a history of the R commands you’ve executed in R.
Files/Plots/Packages/Help/Viewer: This multipurpose pane will show you the contents of directories on your computer. You can also use the “Files” tab to navigate and set the working directory. The “Plots” tab will show the output of any plots generated. In “Packages” you will see what packages are actively loaded, or you can attach installed packages. “Help” will display help files for R functions and packages. “Viewer” will allow you to view local web content (e.g. HTML outputs).
---datacarpentry.org

Note: you can already see our R project and R script file in our project directory under the Files tab. If you chose to use renv you will also see some files and directories related to that.

Additional panes may show up depending on what you are doing in RStudio. For example, you may notice a Render tab in the Console/Terminal/Jobs pane when working with R Markdown files (.Rmd).

Also, you can change your RStudio layout. See this blog if you are interested. For simplicity, please do NOT change the layout during this course.

When to use Source vs Console?

We will use the Source pane to keep a record of the code that we run. However, at times, we may want to do quick testing without keeping a record. This is the scenario in which you would use the Console.

Uploading and exporting files from RStudio Server

RStudio Server works via a web browser, and so you see this additional Upload option in the Files pane. If you select this option, you can upload files from your local computer into the server environment. If you select More, you will also see an Export option. You can use this to export the files created in the RStudio environment.

Data Management

Data organization is extremely important to reproducible science. Consider organizing your project directory in a way that facilitates reproducibility. For example, you may want directories for data, drafts_documents, outputs, and scripts. See additional details in this lesson from Data Carpentry. How you organize project directories is up to you, but a consistent layout across projects makes work easier to reproduce. We will discuss more on this subject when introducing data frames.
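One possible way to set up the directory layout suggested above is sketched here. It builds the folders under a temporary directory so it is safe to run anywhere; in a real project you would likely create them inside your project directory instead.

```r
# Use a throwaway location for the demonstration
project_dir <- file.path(tempdir(), "LearningR")

# Sub-directories suggested in the text
subdirs <- c("data", "drafts_documents", "outputs", "scripts")

# Create each directory; recursive = TRUE also creates project_dir if needed
for (d in subdirs) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Confirm what was created
list.files(project_dir)
```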

Saving your R environment (.RData)

When exiting RStudio, you will be prompted to save your R workspace, or .RData. The .RData file saves the objects generated in your R environment. You can also save the .RData at any time using the floppy disk icon just below the Environment tab. You may also save your R workspace from the console using save.image(). Because the file name begins with a dot, .RData files are hidden by default and are often not visible in a directory listing. You can see them using ls -a from the terminal. An .RData file in the working directory associated with a given project will be loaded automatically under the default option "Restore .RData into workspace at startup". You may also load an .RData file by using load().

If you are working with significantly large datasets, you may not want to automatically save and restore .RData. To turn this off, go to Tools -> Global Options -> deselect "Restore .RData into workspace at startup" and choose "Never" for "Save workspace to .RData on exit".
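The following sketch shows the save and load round trip from the console. It uses save() on a single object and a temporary file so that nothing in your real workspace is overwritten; the object `x` and the file name `example.RData` are arbitrary choices for illustration. save.image() works the same way but stores every object in the workspace.

```r
# An example object to store
x <- c(1, 2, 3)

# Save just this object to a temporary .RData file
rdata_path <- file.path(tempdir(), "example.RData")
save(x, file = rdata_path)
# save.image(file = rdata_path) would save the whole workspace instead

# Remove the object, then restore it from the file
rm(x)
load(rdata_path)
x
```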

Navigating directories

Now we are ready to work with some of our first R commands. We are going to run commands directly from our R script rather than typing into the R console.

Our first command will be getwd(). This simply prints your working directory and is the R equivalent of pwd (if you know Unix commands).

# print our working directory
getwd()

To run this command, we have a number of options. First, you can use the Run button above. This will run the current line or the selected code. You may also use the Source button to run your entire script. My preferred method is to use keyboard shortcuts. Move your cursor to the code of interest and use command + return on Macs or control + enter on PCs. If a command is taking a long time to run and you need to cancel it, use control + c from the command line or escape in RStudio. Once you run the command, you will see the command print to the console in blue followed by the output.

[1] "/home/rstudio/LearningR"

It is good practice to annotate your code using comments. We can denote comments with #.

We set our working directory when we created our R project, but if for some reason we needed to change it, we can do this with setwd(). There is no need to run this now. However, if you were to run it, you would use the following notation:

setwd("/home/rstudio/LearningR")

The path should be in quotes. You can use tab completion to fill in the path.
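As a small, hedged illustration of getwd() and setwd() working together, the sketch below changes into a temporary directory and then returns to where it started. Saving the original location first is a habit assumed here, not something the lesson requires.

```r
# Remember where we started
old_wd <- getwd()

# Move to a temporary directory and confirm the change
setwd(tempdir())
getwd()

# Return to the original working directory
setwd(old_wd)
getwd()
```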

What is a path?

According to Wikipedia, a path is "a string of characters used to uniquely identify a location in a directory structure."

Therefore, a file path simply tells us where a file or files are located. You will need to direct R to the location of files that you want to work with or output that you create.

The working directory is the location in your file system that you are currently working in. In other words, it is the default location that R will look for input files and write output files.

Note: R accepts Unix-style paths with forward slashes (/), so regardless of whether you have a Windows computer or a Mac, the way you enter the directory information will be the same. You can use tab completion to help you fill in directory information.

Using functions

A function in R (or any computing language) is a short program that takes some input and returns some output.

An R function has three key properties:

  • Functions have a name (e.g. dir, getwd); note that functions are case sensitive!
  • Following the name, functions have a pair of ()
  • Inside the parentheses, a function may take 0 or more arguments--- datacarpentry.org

We have already used some R functions (e.g. getwd() and setwd())! Let's look at another example using the round() function. round() "rounds the values in its first argument to the specified number of decimal places (default 0)" --- R help.

Consider

round(5.65) #can provide a single number
## [1] 6
round(c(5.65,7.68,8.23)) #can provide a vector
## [1] 6 8 8
In this example, we provided only the required argument, which can be any numeric or complex vector. The contextual prompt that appears while typing shows that round() accepts two arguments. The optional second argument (i.e., digits) indicates the number of decimal places to round to. Contextual help is generally provided as you type the name of a function. We will discuss other types of help in a moment.

round(5.65,digits=1) #provide an additional argument rounding to the tenths place
## [1] 5.7

At times a function may be masked by another function. This can happen if two functions share the same name (e.g., dplyr::filter() vs stats::filter()). We can get around this by explicitly calling a function from the correct package using the following syntax: package::function().
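Masking can be demonstrated without installing anything. In the sketch below we deliberately define our own function named median (an artificial example, not something you would normally do), which hides the version from the stats package until we call it with the package::function() syntax.

```r
# Define our own function with the same name as stats::median
median <- function(x) "my own median"

# The unqualified name now finds our version first
median(c(1, 5, 9))

# The package::function() syntax still reaches the original
stats::median(c(1, 5, 9))

# Remove our version so the name refers to stats::median again
rm(median)
median(c(1, 5, 9))
```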

Getting help

Now we know a bit about using functions, but what if I had no idea what the function round() was used for or what arguments to provide?

Getting help in R is fairly easy. In the pane to the bottom right, you should see a Help tab. You can search for help regarding a specific topic using the search field (look for the magnifying glass).

Alternatively, you can search directly for help in the console using ?round or ??round. help.search() or ?? can be used to search the help system using a keyword and will also work for packages that are installed but not loaded; for example, you may try help.search("anova").

R help pages provide a lot of information. The description and argument sections are likely where you will want to start. If you are still unsure how to use the function, scroll down and check out the examples section of the documentation. Consider testing some of the examples yourself and applying to your own data.

Many R packages also include more detailed help documentation known as a vignette. To see a package vignette, use browseVignettes() (e.g., browseVignettes(package="dplyr")).
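The help commands mentioned in this section can also be called as ordinary functions. This short sketch stores the result of help() in an object; the pages are only opened when the session is interactive, which is an assumption made here so the code can also run quietly in a script.

```r
# Look up the help page for round(); equivalent to typing ?round in the console
h <- help("round")

# In an interactive session, printing the result opens the page in the Help tab
if (interactive()) {
  print(h)

  # Keyword search across installed packages; equivalent to ??anova
  print(help.search("anova"))
}
```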

To see a function's arguments, you can use args().

args(round)
## function (x, digits = 0) 
## NULL

round() takes two arguments, x, which is the number to be rounded, and a digits argument. The = sign indicates that a default (in this case 0) is already set. Since x is not set, round() requires we provide it, in contrast to digits where R will use the default value 0 unless you explicitly provide a different value. --- datacarpentry.org

R arguments are also positional, so instead of including digits=1 in our above use of round(), we could instead do the following:

round(5.65, 1)
## [1] 5.7
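To make the equivalence of named and positional arguments concrete, the sketch below calls round() both ways and compares the results. The input value 3.14159 is an arbitrary choice for illustration.

```r
# Arguments supplied by name
by_name <- round(x = 3.14159, digits = 2)

# The same arguments supplied by position
by_position <- round(3.14159, 2)

# Both calls give the same result
identical(by_name, by_position)
```

Naming arguments is usually easier to read, especially for functions with many optional arguments.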

Additional Sources for help

Try searching the web for your problem with Google or another search engine. rseek is an R-specific search engine that searches several R-related sites. If using Google directly, make sure you include "R" in your search terms.

Stack Overflow is a particularly great resource for finding help. If you post a question, you will need to make a reproducible example (reprex) and be as descriptive as possible regarding the problem. For this purpose, you may find the reprex package particularly useful.

To provide details about your R session, use

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.29   R6_2.5.1        jsonlite_1.8.0  magrittr_2.0.3 
##  [5] evaluate_0.15   stringi_1.7.6   rlang_1.0.2     cli_3.3.0      
##  [9] rstudioapi_0.13 jquerylib_0.1.4 bslib_0.3.1     rmarkdown_2.14 
## [13] tools_4.1.2     stringr_1.4.0   xfun_0.36       yaml_2.3.5     
## [17] fastmap_1.1.0   compiler_4.1.2  htmltools_0.5.2 knitr_1.41     
## [21] sass_0.4.1
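When a full sessionInfo() printout is more than you need, individual details can be pulled out instead. This is a minimal sketch; which details are worth reporting depends on the question you are asking, and the choice of the stats package below is only an example.

```r
# Capture the session details as an object rather than printing everything
info <- sessionInfo()

# The R version as a single string
info$R.version$version.string

# The installed version of one specific package
packageVersion("stats")
```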

Test your learning

  1. Which of the following functions is used to print your working directory in R?
    a. pwd
    b. Setwd()
    c. getwd()
    d. wkdir()

Solution
C

  2. Which of the following can be used to learn more regarding an R function?
    a. ?function()
    b. ??function()
    c. args(function)
    d. All of the above

Solution
D

Acknowledgments

Material from this lesson was either taken directly or adapted from the Intro to R and RStudio for Genomics lesson provided by datacarpentry.org. Material was also inspired by content from Introduction to data analysis with R and Bioconductor, which is part of the Carpentries Incubator.

Article information

Author: Horacio Brakus JD

Last Updated: 02/28/2023
