What Is R?
R (the language) was created in the early 1990s by Ross Ihaka and Robert Gentleman, then both working at the University of Auckland. It is based upon the S language that was developed at Bell Laboratories in the 1970s, primarily by John Chambers. R (the software) is a GNU project, reflecting its status as important free and open source software. Both the language and the software are now developed by a group of (currently) 20 people known as the R Core Team.
R is a powerful programming language and environment specifically designed for statistical computing and graphics. One of its key strengths lies in its modularity, which is achieved through the use of packages. Here's an explanation of how R is modular and how it is split into packages:
Core R and Base Packages
When you install R, you get the core language along with a set of base packages. These base packages provide essential functions and datasets that cover most fundamental operations in R. Examples include:
- stats: Functions for statistical calculations.
- graphics: Functions for base graphics.
- utils: Utility functions.
- datasets: Example datasets.
- methods: Functions for object-oriented programming.
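To see this on a given installation, the following sketch lists the packages that R itself flags as base and checks which of the five examples above are attached to the current session. The variable names are illustrative, and the exact output can vary between R versions.

```r
# Packages with priority "base" ship with every R installation.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# search() shows what is currently attached to the session.
examples <- c("stats", "graphics", "utils", "datasets", "methods")
attached <- paste0("package:", examples) %in% search()
print(setNames(attached, examples))
```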
CRAN and User-Contributed Packages
Beyond the base packages, R's true power comes from its extensive ecosystem of user-contributed packages. These packages extend R's functionality in numerous ways and are hosted on the Comprehensive R Archive Network (CRAN). CRAN is a repository to which package authors submit their work, which is checked before it is published to the community. CRAN hosts many thousands of packages, catering to needs such as data manipulation, machine learning, bioinformatics, and more.
Installing and Using Packages
To use a package from CRAN, you need to install it first. This is typically done using the install.packages() function. Once installed, you can load a package into your R session using the library() function. For example:
# Install the dplyr package
install.packages("dplyr")
# Load the dplyr package
library(dplyr)
Package Structure
An R package typically includes the following components:
- R scripts: Functions and code.
- Documentation: Help files for the functions.
- DESCRIPTION file: Metadata about the package, such as its name, version, authors, and dependencies.
- NAMESPACE file: Defines the functions and variables to be exported and imported.
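These components can be inspected for any installed package. As a small sketch, using the base package stats purely as an example (the variable names are illustrative), packageDescription reads fields from the DESCRIPTION file and getNamespaceExports lists the names that the namespace exports:

```r
# Read selected metadata fields from the DESCRIPTION file of 'stats'.
desc <- packageDescription("stats", fields = c("Package", "Version", "Priority"))
print(desc)

# List a few of the names exported through the package's namespace.
exports <- getNamespaceExports("stats")
print(head(sort(exports)))

# Locate the installed DESCRIPTION file on disk.
print(system.file("DESCRIPTION", package = "stats"))
```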
Specialized Repositories
Besides CRAN, there are other repositories where R packages can be found, such as:
- Bioconductor: Focuses on packages for bioinformatics.
- GitHub: Many developers host their packages on GitHub for version control and collaboration.
Advantages of Modularity
- Flexibility: Users can install only the packages they need.
- Collaboration: Researchers and developers can share their work easily.
- Scalability: The community can develop and maintain packages independently.
- Innovation: New methodologies and tools can be quickly disseminated.
Examples of Popular Packages
- ggplot2: For advanced graphics and data visualization.
- dplyr: For data manipulation.
- shiny: For building interactive web applications.
- caret: For machine learning workflows.
Summary
R's modular design through its use of packages makes it a versatile and powerful tool for data analysis, statistical computing, and beyond. The ability to extend its functionality with thousands of packages allows users to tailor their R environment to their specific needs, fostering an ever-growing ecosystem of tools and resources.
How to Get Help in R
- Firstly, if you want help on a function or a dataset whose name you know, type ? followed by the name of the function.
?mean
?model.extract
- To find functions, type two question marks (??) followed by a keyword related to the problem.
??plotting
??"regression model"
- Special characters, reserved words, and multiword search terms need enclosing in double or single quotes.
?"+"
?"if"
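The functions help and help.search do the same things as ? and ??, respectively, but with these you should enclose your arguments in quotes. The following commands are equivalent to the previous lot:
help("mean")
help("model.extract")
help.search("plotting")
help.search("regression model")
help("+")
help("if")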
- If you ever forget the exact name of a function or variable, the apropos function can be used to find matches. This is really useful if you can only half-remember the name of a variable that you’ve created, or a function that you want to use. For example, suppose you create a variable a_vector:
a_vector <- c(1, 3, 6, 10)
You can then recall this variable using apropos:
apropos("vector")
The results contain the variable you just created, a_vector, and all other variables that contain the string vector. In this case, all the others are functions that are built into R.
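One way to check that claim, sketched here on the assumption that a_vector was created at the top level of your session, is to use the mode argument of apropos, which restricts the search to objects of a given mode:

```r
a_vector <- c(1, 3, 6, 10)

# Everything on the search path whose name contains "vector".
all_matches <- apropos("vector")

# Only the matches that are functions.
function_matches <- apropos("vector", mode = "function")

# Whatever is left over is not a function; a_vector should be among these.
print(setdiff(all_matches, function_matches))
```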
Just finding variables that contain a particular string is fine, but you can also do fancier matching with apropos using regular expressions.
A simple usage of apropos could, for example, find all variables that end in z, or find all variables containing a number between 4 and 9:
apropos("z$")
[1] "quartz" "SSgompertz" "toeplitz" "unz"
apropos("[4-9]")
- Most functions have examples that you can run to get a better idea of how they work. Use the example function to run these. There are also some longer demonstrations of concepts that are accessible with the demo function:
example(plot)
demo()
demo(Japanese)
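To find out which demonstrations exist before running one, demo can be asked to list them for a particular package. The sketch below uses the graphics package as an example and assumes the returned object carries the usual results matrix with Item and Title columns:

```r
# List the demos shipped with the graphics package without running any.
graphics_demos <- demo(package = "graphics")

# One row per demo; Item holds the name to pass to demo().
print(graphics_demos$results[, c("Item", "Title"), drop = FALSE])
```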
- R is modular and is split into packages (as described above), some of which contain vignettes, which are short documents on how to use the packages. You can browse all the vignettes on your machine using browseVignettes:
browseVignettes()
You can also access a specific vignette using the vignette function (but if your memory is as bad as mine, using browseVignettes combined with a page search is easier than trying to remember the name of a vignette and which package it’s in):
vignette("Sweave", package = "utils")
The help search operator ?? and browseVignettes will only find things in packages that you have installed on your machine.
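As a rough sketch of what is available locally, the vignettes shipped with one installed package can also be listed without opening a browser. The example uses the utils package, and the variable name is illustrative:

```r
# Calling vignette() with only a package argument lists that package's vignettes.
utils_vignettes <- vignette(package = "utils")

# Item is the name to pass as the first argument of vignette().
print(utils_vignettes$results[, c("Item", "Title"), drop = FALSE])
```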
If you want to search beyond the packages installed on your machine, you can use RSiteSearch, which runs a query at http://search.r-project.org. Multiword terms need to be wrapped in braces:
RSiteSearch("{Bayesian regression}")
Think of a keyword related to your work and try ?, ??, apropos, and RSiteSearch with it.
There are also lots of R-related resources on the Internet that are worth trying. There are too many to list here, but start with these:
- R has a number of mailing lists with archives containing years’ worth of questions on the language. At the very least, it is worth signing up to the general-purpose list, R-help.
- RSeek is a web search engine for R that returns functions, posts from the R mailing list archives, and blog posts.
- R-bloggers is the main R blogging community, and the best way to stay up to date with news and tips about R.
- The programming question and answer site Stack Overflow also has a vibrant R community, providing an alternative to the R-help mailing list. You also get points and badges for answering questions!