# R in a Nutshell

Why learn R? Because it's rapidly becoming the standard for developing statistical software. *R in a Nutshell* provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.

The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.

- Understand the basics of the language, including the nature of R objects
- Learn how to write R functions and build your own packages
- Work with data through visualization, statistical analysis, and other methods
- Explore the wealth of packages contributed by the R community
- Become familiar with the lattice graphics package for high-level data visualization
- Learn about bioinformatics packages provided by Bioconductor

"I am excited about this book. *R in a Nutshell* is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."


## 1. Getting and Installing R

Today, R is maintained by a team of developers around the world. Usually, there is an official release of R twice a year, in April and in October. I used version 2.9.2 in this book. (Actually, it was 2.8.1 when I started writing the book and was updated three times while I was writing. I installed the updates, but they didn't change the content very much.) R hasn't changed that much in the past few years: usually there are some bug fixes, some optimizations, and a few new functions in each release. There have been some changes to the language, but most of these are related to somewhat obscure features that won't affect most users. (For example, the type of …

## 2. The R User Interface

If you're reading this book, you probably have a problem that you would like to solve in R. You might want to:

- Check the statistical significance of experimental results
- Plot some data to help understand it better
- Analyze some genome data

The R system is a software environment for statistical computing and graphics. It includes many different components. In this book, I'll use the term R to refer to a few different things:

- A computer language
- The interpreter that executes code written in R
- A system for plotting computer graphics described using the R language
- The Windows, Mac OS, or Linux application that includes the interpreter, graphics system, standard packages, and user interface

This chapter contains a short description of the R user interface and the R console, and describes how R varies on different platforms. If you've never used an interactive language, this chapter will explain some basic things you will need to know in order to work with R. We'll take a quick look at the R graphical user interface (GUI) on each platform and then talk about the most important part: the R console.

## 3. A Short R Tutorial

Let's get started using R. When you enter an expression into the R console and press the Enter key, R will evaluate that expression and display the results (if there are any). If the statement results in a value, R will print that value. For example, you can use R to do simple math.

The interactive R interpreter will automatically print an object returned by an expression entered into the R console. Notice the funny `[1]` that accompanies each returned value. In R, any number that you enter in the console is interpreted as a vector.

You can construct longer vectors using the `c` function; for example, you can build a vector that contains the first seven elements of the Fibonacci sequence. As an example of a vector that spans multiple lines, let's use the sequence operator to produce a vector with every integer between 1 and 50.
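A short console-style sketch of the ideas above. The numbers are illustrative choices rather than the book's own examples, and the Fibonacci vector assumes the sequence starts at 0.

```r
# Simple arithmetic: R evaluates the expression and prints the result.
1 + 2 + 3            # prints [1] 6

# A single number is already a vector of length 1, hence the [1] index.
length(42)

# c() combines values into a longer vector
# (first seven Fibonacci numbers, assuming the sequence starts at 0).
fib <- c(0, 1, 1, 2, 3, 5, 8)
fib

# The sequence operator builds every integer from 1 to 50. The printed
# output wraps across several lines, and each line starts with the index
# of its first element.
one_to_fifty <- 1:50
one_to_fifty
```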

## 4. R Packages

A package groups R functions (along with related material) into a single unit. Typically, all of the functions in a package are related. R offers an enormous number of packages: packages that display graphics, packages for performing statistical tests, and packages for trying the latest machine learning techniques. There are also packages designed for a wide variety of industries and applications: packages for analyzing microarray data, packages for modeling credit risks, and packages for social sciences. Some of these packages are included with R: you just have to tell R that you want to use them. Other packages are available from public package repositories. You can even make your own packages. This chapter explains how to use packages.
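To tell R that you want to use a package that ships with it, you load it with `library`. A minimal sketch, using the `splines` package (one of the base packages distributed with R) purely as an illustration. Installing from a repository with `install.packages` needs network access, so that part is commented out, and the package name there is a hypothetical placeholder.

```r
# Load a package that is included with R.
library(splines)

# Attached packages appear on the search path.
"package:splines" %in% search()   # TRUE

# Packages from a public repository (such as CRAN) must be installed first.
# "somePackage" is a hypothetical name; substitute a real one.
# install.packages("somePackage")
# library(somePackage)

# Show a few of the packages installed on this machine.
head(rownames(installed.packages()))
```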

## 5. An Overview of the R Language

Learning a computer language is a lot like learning a spoken language (only much simpler). If you're just visiting a foreign country, you might learn enough phrases to get by without really understanding how the language is structured. Similarly, if you're just trying to do a couple of simple things with R (like drawing some charts), you can probably learn enough from examples to get by. However, if you want to learn a new spoken language really well, you have to learn about syntax and grammar: verb conjugation, proper articles, sentence structure, and so on. The same is true with R: if you want to learn how to program effectively in R, you'll have to learn more about the syntax and grammar. This chapter gives an overview of the R language, designed to help you understand R code and write your own. I'll assume that you've spent a little time looking at R syntax (maybe from reading Chapter 3). Here's a quick overview of how R works.

## 6. R Syntax

Let's start by looking at constants. Constants are the basic building blocks for data objects in R: numbers, character values, and symbols.

Numbers are interpreted literally in R. You may specify values in hexadecimal notation by prefixing them with `0x`. By default, numbers in R expressions are interpreted as double-precision floating-point numbers, even when you enter simple integers. If you really want an integer, you can use the sequence notation or a conversion function such as `as.integer`. The sequence operator `a:b` returns a vector of integers between `a` and `b`. To combine an arbitrary set of numbers into a vector, use the `c` function.

R allows a lot of flexibility when entering numbers. However, there is a limit to the size and precision of numbers that R can represent. In practice, this is rarely a problem: most R users will load data from other sources on a computer (like a database) that also can't represent very large numbers.
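The points about numeric constants can be checked directly at the console. A minimal sketch; the specific values are arbitrary illustrations.

```r
# Hexadecimal constants use the 0x prefix.
0xFF                    # prints [1] 255

# Plain numbers are stored as double-precision floating point ...
typeof(5)               # "double"
# ... while the sequence operator and as.integer() produce integers.
typeof(1:3)             # "integer"
typeof(as.integer(5))   # "integer"
# (An L suffix, as in 5L, also writes an integer constant.)

# c() combines an arbitrary set of numbers into one vector.
v <- c(3, 1.5, 0x10, 42)
v

# Limits of double precision on this machine.
.Machine$double.xmax    # largest finite double
.Machine$double.eps     # smallest positive x such that 1 + x != 1
2^1024                  # too large to represent: overflows to Inf
```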

## 7. R Objects

Table 7-1 shows all of the built-in object types. I introduced these objects in Chapter 3, so they should seem familiar. I classified the object types into a few categories, to make them easier to understand. Some are vectors containing a single type of value: integers, floating-point numbers, complex numbers, text, logical values, or raw data. Others are containers for the basic vectors: lists, pairlists, S4 objects, and environments. Each of these objects has unique properties (described below), but each of them contains a number of named objects. Some objects serve a special purpose in R programming. Others represent R code; they can be evaluated to return other objects. Functions are the workhorses of R; they take arguments as inputs and return objects as outputs. Sometimes, they may modify objects in the environment or cause side effects outside the R environment, like plotting graphics, saving files, or sending data over the network.
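The `typeof` function reports the internal type of any object, which makes these categories easy to explore. A small sketch; the grouping comments loosely follow the categories above and are my own labels.

```r
# Basic vectors: a single type of value per vector.
typeof(1L)              # "integer"
typeof(1.5)             # "double"
typeof(2+3i)            # "complex"
typeof("text")          # "character"
typeof(TRUE)            # "logical"
typeof(as.raw(255))     # "raw"

# Containers for other objects.
typeof(list(a = 1, b = "x"))   # "list"
typeof(pairlist(a = 1))        # "pairlist"
typeof(new.env())              # "environment"

# Objects that represent R code.
typeof(quote(x))        # "symbol"
typeof(quote(x + 1))    # "language"

# Functions: closures written in R vs. built-ins implemented internally.
typeof(mean)            # "closure"
typeof(sum)             # "builtin"
```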

## 8. Symbols and Environments

When you define a variable in R, you are actually assigning a symbol to a value in an environment. For example, when you enter an assignment statement on the R console, R assigns the symbol on the left-hand side to the value in the global environment.

It is possible to delay evaluation of an expression so that symbols are not evaluated immediately. It is also possible to create a promise object in R to delay evaluation of a variable until it is (first) needed. You can create a promise object through the `delayedAssign` function. Promise objects are used within packages to make objects available to users without loading them into memory. Unfortunately, it is not possible to determine if an object is a promise object, nor is it possible to figure out the environment in which it was created.
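A sketch of both mechanisms: `quote` and `eval` for delaying evaluation of an expression, and `delayedAssign` for a promise. The variable names are illustrative.

```r
x <- 1                              # binds the symbol x in the global environment
exists("x", envir = globalenv())    # TRUE

# quote() returns the expression without evaluating the symbol x.
expr <- quote(x + 1)
x <- 10
eval(expr)                          # evaluated now, using the current x: 11

# delayedAssign() creates a promise: the expression runs only when y is
# first used, and the result is then cached.
delayedAssign("y", {
  message("evaluating the promise")
  x * 2
})
y                                   # message appears here; value is 20
y                                   # no message the second time
```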

## 9. Functions

A function definition in R includes the names of arguments. Optionally, it may include default values. If you specify a default value for an argument, then the argument is considered optional. If you do not specify a default value for an argument, and you do not specify a value when calling the function, you will get an error if the function attempts to use the argument. In a function call, you may override the default value.

In R, it is often convenient to specify a variable-length argument list. You might want to pass extra arguments to another function, or you may want to write a function that accepts a variable number of arguments. To do this in R, you specify an ellipsis (`...`) in the arguments to the function. As an example, let's create a function that prints the first argument and then passes all the other arguments to the …
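The rules above fit in a few lines. The function names are hypothetical, and using `paste` as the function that receives the extra arguments is an assumption made for this sketch.

```r
# y has a default value, so it is optional.
add <- function(x, y = 2) x + y
add(1)        # uses the default: 3
add(1, 10)    # overrides the default: 11

# No default for y: calling without y fails because the body uses y.
add_nodefault <- function(x, y) x + y
result <- tryCatch(add_nodefault(1),
                   error = function(e) conditionMessage(e))
result        # an error message about the missing argument "y"

# The ellipsis collects any extra arguments and forwards them.
print_then_paste <- function(first, ...) {
  print(first)
  paste(...)
}
print_then_paste("printed", "pasted", "together")
```

Arguments are evaluated lazily, so a function with a missing, default-less argument only fails at the point where the body actually uses it.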

## 10. Object-Oriented Programming

The R system includes some support for object-oriented programming (OOP). OOP has become the preferred paradigm for organizing computer software; it's used in almost every modern programming language (Java, C#, Ruby, and Objective-C, among others) and in quite a few old ones (Smalltalk, C++). It's easy to understand why: OOP methods lead to code that is faster to write, easier to maintain, and less likely to contain errors. Many R packages are written using OOP mechanisms.

If all you plan to do with R is to load some data, build some statistical models, and plot some charts, you can probably skim this chapter. On the other hand, if you want to write your own code for loading data, building statistical models, and plotting charts, you should probably read this chapter more carefully.

R includes two different mechanisms for object-oriented programming. As you may recall, the R language is derived from the S language. S's object-oriented programming system evolved over time. Around 1990, S version 3 (thus S3) introduced class attributes that allowed methods to dispatch on a single argument. Many R functions (such as the statistical modeling software) were implemented using S3 methods, so S3 methods are still around today. In S version 4 (hence S4), formal classes and methods were introduced that allowed dispatch on multiple arguments, more abstract types, and more sophisticated inheritance. Many new packages were implemented using S4 methods (and you can find S4 implementations of many key statistical procedures as well). In particular, formal classes are used extensively in Bioconductor.
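A side-by-side sketch of the two mechanisms. The class names (`circle`, `Rect`) and generics (`describe`, `area`, `combine`) are hypothetical, chosen only to show S3 dispatch on one argument and S4 formal classes with dispatch on two.

```r
library(methods)  # S4 support (attached by default in most sessions)

## S3: a class is just an attribute; dispatch uses the first argument's class.
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "some object"
describe.circle  <- function(obj, ...) paste("circle of radius", obj$r)

circle <- structure(list(r = 2), class = "circle")
describe(circle)   # "circle of radius 2"
describe(42)       # falls back to describe.default: "some object"

## S4: formal class definitions with typed slots and registered generics.
setClass("Rect", slots = c(w = "numeric", h = "numeric"))
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Rect", function(shape) shape@w * shape@h)

r <- new("Rect", w = 3, h = 4)
area(r)            # 12

# S4 methods can dispatch on the classes of more than one argument.
setGeneric("combine", function(a, b) standardGeneric("combine"))
setMethod("combine", signature("Rect", "numeric"),
          function(a, b) new("Rect", w = a@w * b, h = a@h * b))
area(combine(r, 2))  # 48
```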

## 11. High-Performance R

When possible, try to use built-in functions for mathematical computations instead of writing R code to perform those computations. Many common math functions are included as native functions in R. In most cases, these functions are implemented as calls to external math libraries. As an obvious example, if you want to multiply two matrices together, you should probably use the built-in `%*%` operator.

Often, it is possible to use built-in functions by transforming a problem. To illustrate, consider an example from queueing theory. Queueing theory is the study of systems where customers arrive, wait in a queue for service, are served, and then leave. For instance, picture a cafeteria with a single cashier. After customers select their food, they proceed to the cashier for payment. If there is no line, they pay the cashier and then leave. If there is a line, they wait in the line until the cashier is free. If we suppose that customers arrive according to a Poisson process and that the time required for the cashier to finish each transaction is given by an exponential distribution, then this is called an M/M/1 queue. (This means memoryless arrivals, memoryless service time, and one server.)
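A rough comparison, assuming a hand-written double loop (a hypothetical `naive_matmul`, not from the book) as the "R code" alternative to `%*%`. Timings depend on the machine and BLAS library, so none are shown. The second part uses the standard closed-form M/M/1 results (utilization ρ = λ/μ, mean number in system ρ/(1 − ρ), mean time in system 1/(μ − λ), valid only when λ < μ); these are added for illustration and are not necessarily the transformation the chapter goes on to use.

```r
# Hypothetical hand-written matrix multiplication, for comparison only.
naive_matmul <- function(a, b) {
  stopifnot(ncol(a) == nrow(b))
  out <- matrix(0, nrow(a), ncol(b))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(ncol(b))) {
      out[i, j] <- sum(a[i, ] * b[, j])
    }
  }
  out
}

set.seed(42)
a <- matrix(runif(100 * 100), nrow = 100)
b <- matrix(runif(100 * 100), nrow = 100)

system.time(slow <- naive_matmul(a, b))
system.time(fast <- a %*% b)      # built-in operator, backed by BLAS
all.equal(slow, fast)             # same result up to floating-point tolerance

# Closed-form M/M/1 metrics, vectorized over arrival rates lambda.
mm1_metrics <- function(lambda, mu) {
  stopifnot(all(lambda < mu))     # the queue is stable only if lambda < mu
  rho <- lambda / mu
  data.frame(lambda = lambda,
             utilization = rho,
             mean_in_system = rho / (1 - rho),
             mean_time_in_system = 1 / (mu - lambda))
}
mm1_metrics(lambda = c(0.5, 1, 1.5), mu = 2)
```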

## 12. Saving, Loading, and Editing Data

There are a couple of different ways to enter data into R. If you are entering a small number of observations, entering the data directly into R might be a good approach. Many of the examples in Parts I and II show how to create new objects directly on the R console. As we have seen before, to create a vector, use the `c` function. It's often convenient to put these vectors together into a data frame. To create a data frame, use the `data.frame` function.

Entering data using individual statements can be awkward for more than a handful of observations. (That's why my example above only included five observations.) Luckily, R provides a nice GUI for editing tabular data: the data editor. To edit an object with the data editor, use the …
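A minimal sketch of entering a handful of observations directly. The variable names and values are made up for illustration. The data editor is interactive, so it appears only in a comment.

```r
# Five made-up observations, entered as vectors with c().
store   <- c("A", "B", "C", "D", "E")
visits  <- c(120, 95, 143, 80, 110)
revenue <- c(1500.5, 980.0, 2100.25, 760.0, 1320.75)

# Combine the vectors (all the same length) into a data frame;
# the column names come from the variable names.
sales <- data.frame(store, visits, revenue)
sales
str(sales)

# In an interactive session, edit() opens the data editor and returns
# the modified copy, e.g.: sales <- edit(sales)
```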

## 13. Preparing Data

Back in my freshman year of college, I was planning to be a biochemist. I spent hours and hours in the lab: mixing chemicals in test tubes, putting samples in different machines, and analyzing the results. Over time, I grew frustrated because I found myself spending weeks in the lab doing manual work and just a few minutes planning experiments or analyzing results. After a year, I gave up on chemistry and became a computer scientist, thinking that I would spend less time on preparation and testing and more time on analysis. Unfortunately for me, I chose to do data mining work professionally. Everyone loves building models, drawing charts, and playing with cool algorithms. Unfortunately, most of the time on data analysis projects goes into preparing data for analysis. I'd estimate that 80% of the effort on a typical project is spent on finding, cleaning, and preparing data for analysis. Less than 5% of the effort is devoted to analysis. (The rest of the time is spent on writing up what you did.)

## 14. Graphics

R includes tools for drawing most common types of charts, including bar charts, pie charts, line charts, and scatter plots. Additionally, R can draw some less familiar charts like quantile-quantile (Q-Q) plots, mosaic plots, and contour plots. The following table shows many of the charts included in the `graphics` package.

You can show R graphics on the screen or save them in many different formats; "Graphics Devices" explains how to choose output methods. R gives you an enormous amount of control over graphics: you can control almost every aspect of a chart. "Customizing Charts" explains how to tweak the output of R to look the way you want.

This section shows how to use many common types of R charts. To show how to use scatter plots, we will look at cases of cancer in 2008 and toxic waste releases by state in 2006. Data on new cancer cases (and deaths from cancer) are tabulated by the American Cancer Society; information on toxic chemicals released into the environment is tabulated by the U.S. Environmental Protection Agency (EPA).
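A minimal sketch of drawing a scatter plot and saving it to a file-based graphics device. The numbers are invented placeholders, not the American Cancer Society or EPA figures, and writing to a temporary PDF file is simply a choice that works without a screen.

```r
# Made-up values for illustration only (NOT the ACS or EPA data).
releases <- c(12.1, 45.3, 8.7, 30.2, 22.5)   # hypothetical toxic releases
cancers  <- c(5.2, 18.9, 3.1, 14.0, 9.8)     # hypothetical new cancer cases

# Open a PDF graphics device, draw the chart, then close the device.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)
plot(releases, cancers,
     xlab = "Toxic releases (hypothetical units)",
     ylab = "New cancer cases (hypothetical units)",
     main = "Scatter plot sketch")
dev.off()

# Other chart functions in the same package include barplot(), pie(),
# mosaicplot(), and contour().
```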

## 15. Lattice Graphics

In the early 1990s, Richard Becker and William Cleveland (two researchers at Bell Labs) built a revolutionary new system for displaying data called Trellis graphics.
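The lattice package brings Trellis-style conditioned plots to R. A small sketch using the built-in `mtcars` dataset (my choice of example data, not the book's): one panel of car weight versus fuel economy for each cylinder count.

```r
library(lattice)

# Formula interface: y ~ x | conditioning variable, one panel per level.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# lattice functions return an object of class "trellis"; printing it
# draws the plot on the current graphics device.
class(p)
# print(p)
```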

## 16. Analyzing Data

R includes a variety of functions for calculating summary statistics. To calculate the mean of a vector, use the `mean` function. For each of these functions, the argument … Optionally, you can also remove outliers when using the `mean` function. To calculate the minimum and maximum at the same time, use the `range` function. Another useful function is …
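A sketch of these summary functions on a small made-up vector with one outlier and one missing value. It uses two real arguments of `mean`: `na.rm`, which drops missing values, and `trim`, which discards a fraction of observations from each end before averaging (one way to reduce the effect of outliers).

```r
# Made-up data with one outlier (100) and one missing value (NA).
x <- c(2, 4, 4, 5, 7, 9, 100, NA)

mean(x)                            # NA: missing values propagate by default
mean(x, na.rm = TRUE)              # about 18.71, pulled up by the outlier
mean(x, trim = 0.2, na.rm = TRUE)  # trimmed mean (20% cut from each end): 5.8
median(x, na.rm = TRUE)            # 5
range(x, na.rm = TRUE)             # minimum and maximum together: 2 100
```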

## 17. Probability Distributions

As an example, we'll start with the normal distribution. As you may remember from statistics classes, the probability density function for the normal distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

To find the probability density at a given value, use the `dnorm` function. The arguments to this function are fairly intuitive. The plot is shown in Figure 17-1.

*Figure 17-1. Normal distribution*

In R, the distribution function for the normal distribution is `pnorm`. You can use the distribution function to tell you the probability that a randomly selected value from the distribution is less than or equal to …
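A quick check that `dnorm` agrees with the density formula above, plus a couple of `pnorm` values. The particular mean, standard deviation, and evaluation points are arbitrary choices for this sketch.

```r
# Density at a point (dnorm defaults to mean 0 and sd 1).
dnorm(0)                        # 1 / sqrt(2 * pi), about 0.3989

# Compare dnorm against the formula for a non-standard normal.
mu <- 5; sigma <- 2; x <- 6
dnorm(x, mean = mu, sd = sigma)
exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))   # same value

# Distribution function: the probability that a value is <= q.
pnorm(0)                        # 0.5
pnorm(1.96)                     # about 0.975

# Draw the standard normal density curve on the current device.
curve(dnorm(x), from = -4, to = 4, ylab = "density")
```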

## 18. Statistical Tests

Many data problems boil down to statistical tests. For example, you might want to answer a question like:

- Does this new drug work better than a placebo?
- Does the new web site design lead to significantly more sales than the old design?
- Can this new investment strategy yield higher returns than an index fund?

To answer questions like these, you would formulate a hypothesis, design an experiment, collect data, and use a tool like R to analyze the data. This chapter focuses on the tools available in R for answering these questions.

To be helpful, I've tried to include enough description of different statistical methods to help remind you when to use each method (in addition to how to find them in R). However, because this is a Nutshell book, I can't describe where these formulas come from, or when they're safe to use. R is a good substitute for expensive, proprietary statistics software packages. However, …
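As an illustration of the drug-versus-placebo question, here is a two-sample t-test on simulated data. The group sizes, means, and standard deviations are invented, and a t-test is just one reasonable choice for this kind of comparison.

```r
set.seed(2010)
# Simulated (made-up) outcomes for two groups.
placebo <- rnorm(30, mean = 10, sd = 3)
drug    <- rnorm(30, mean = 12, sd = 3)

# Welch two-sample t-test (R's default uses var.equal = FALSE).
# alternative = "greater" tests whether the drug group's mean is larger.
result <- t.test(drug, placebo, alternative = "greater")
result
result$p.value
```

A small p-value would suggest the difference is unlikely under the null hypothesis of equal means; it says nothing about whether the test's assumptions hold for real data.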

Details

- Title: R in a Nutshell
- Authors: Adler, Joseph
- ISBN: 9780596801700
- Publisher: O'Reilly Media
- Price: 35.99
- Published: July 28, 2013
- Street date: January 25, 2010

- Format name: ePub
- Encrypted: No
- SKU: B000000024648
- ISBN: 9781449383046
- File size: 5.41 MB
- Printing: Allowed
- Copying: Allowed
- Read aloud: Allowed
