Introduction to R
The basic data types in R include numeric, character and
logical.
R supports vectors, matrices, lists and data frames.
Objects can be assigned values using an equal sign (=) or the
special <- operator.
R is highly vectorized - almost all operations work equally well
on scalars and arrays.
All the elements of a matrix or vector must be of the same type.
Lists provide a very general way to hold a collection of
arbitrary R objects.
A data frame is a cross between a matrix and a list: columns
(variables) of a data frame can be of different types, but they
must all be the same length.
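The points above can be sketched in a few lines. This is a minimal illustration only; the object names (heights, cm_to_m, people and so on) are made up for the example.

```r
# Both assignment forms create an object.
heights <- c(150, 165, 180)   # numeric vector
cm_to_m = 0.01                # same effect as cm_to_m <- 0.01

# Vectorized arithmetic: the scalar is applied to every element.
heights_m <- heights * cm_to_m

# A vector holds a single type, so mixed input is coerced to character.
mixed <- c(1, "a", TRUE)

# A list keeps each element's own type.
items <- list(1, "a", TRUE)

# A data frame: columns of different types, all of the same length.
people <- data.frame(name = c("Ann", "Bob", "Cy"), height = heights)
```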
Using R
Typing the name of any object will display a printed
representation. Alternatively, the print() function can be
used to display the entire object.
Element numbers are displayed in square brackets.
Typing a function's name will display its argument list and
definition, but sometimes it's not very enlightening.
The str() function shows the structure of an object.
If you don't assign an expression to an R object, R will display
the results, but they are also stored in the .Last.value object.
Function calls require parentheses, even if there are no
arguments. For example, type q() to quit R.
Square brackets ([ ]) are used for subscripting, and can be
applied to any subscriptable value.
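A short session along these lines might look as follows. It is only a sketch; the object names are invented for the example, and .Last.value is left out because it is meant for use at the interactive prompt.

```r
values <- c(2.5, 5, 7.5)

values                  # typing the name prints it; [1] is the element number
print(values)           # explicit equivalent of the line above
str(values)             # compact summary: type, length, first few values

second <- values[2]     # square brackets select elements

sum                     # a function name without parentheses shows its definition
total <- sum(values)    # parentheses are needed to call the function
```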
Getting Data into R
c() - allows direct entry of small vectors in programs.
scan() - reads data from a file, a URL, or the keyboard into a
vector.
Can be embedded in a call to matrix() or array().
Use the what= argument to read character data.
read.table() - reads from a file or URL into a data frame.
sep= allows a field separator other than white space.
header= specifies whether the first line of the file contains
variable names.
as.is= allows control over character to factor conversion.
Specialized versions of read.table() include read.csv()
(comma-separated values), read.delim() (tab-separated
values), and read.fwf() (fixed-width formatted data).
data() - reads preloaded data sets into the current
environment.
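The input functions above can be tried without any external files by passing the data as a string through the text= argument. The sketch below assumes a reasonably recent version of R in which scan() and read.table() accept text=; the sample data are invented.

```r
# c(): direct entry of a small vector.
ages <- c(23, 35, 41)

# scan(): numeric by default; quiet = TRUE suppresses the "Read n items" note.
nums <- scan(text = "1 2 3 4 5 6", quiet = TRUE)

# scan() embedded in a call to matrix().
m <- matrix(scan(text = "1 2 3 4 5 6", quiet = TRUE), ncol = 2)

# what = "" tells scan() to read character data.
colors <- scan(text = "red green blue", what = "", quiet = TRUE)

# read.table() with a non-default separator and a header line.
scores <- read.table(text = "name|score\nann|90\nbob|85",
                     sep = "|", header = TRUE, as.is = TRUE)

# read.csv() is read.table() with comma-separated defaults.
scores_csv <- read.csv(text = "name,score\nann,90\nbob,85", as.is = TRUE)

# data(): load a data set that ships with R.
data(mtcars)
```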
Where R stores your data
Each time you start R, it looks for a file called .RData in the
current directory. If it doesn't exist, R creates it when you save
your workspace. So managing multiple projects is easy - change to a
different directory for each different project.
When you end an R session, you will be asked whether or not you
want to save the data.
You can use the objects() function to list what objects exist
in your local database, and the rm() function to remove ones
you don't want.
You can start R with the --save or --no-save option to avoid
being prompted each time you exit R.
You can use the save.image() function to save your data
whenever you want
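As a rough illustration of managing the local database, the sketch below lists and removes objects and then saves one to a temporary file. It uses save() and load() on a single object so that it does not touch any real .RData file; save.image() works the same way but writes the whole workspace. The object names are invented.

```r
scratch_a <- 1:10
scratch_b <- "temporary"

objects()                 # lists what is currently defined
rm(scratch_b)             # remove an object you no longer want

# Save to a throwaway file rather than the project's .RData.
image_file <- tempfile(fileext = ".RData")
save(scratch_a, file = image_file)

rm(scratch_a)             # simulate starting a fresh session
load(image_file)          # scratch_a is restored
```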
Getting Help
To view the manual page for any R function, use the
help(functionname) command, which can be abbreviated by
typing a question mark (?) followed by the function name.
The help.search("topic") command will often help you get
started if you don't know the name of a function.
The command help.start() will open a browser pointing to a
variety of (locally stored) information about R, including a search
engine and access to more lengthy PDF documents. Once the
browser is open, all help requests will be displayed in the browser.
Many functions have examples, available through the example()
function; general demonstrations of R capabilities can be seen
through the demo() function.
Libraries
Libraries in R provide routines for a large variety of data
manipulation and analysis. If something seems to be missing from
R, it is most likely available in a library.
You can see the libraries installed on your system with the
command library() with no arguments. You can view a brief
description of the library using library(help=libraryname).
Finally, you can load a library with the command
library(libraryname).
Many libraries are available through CRAN (the Comprehensive R
Archive Network) at
http://cran.r-project.org/src/contrib/PACKAGES.html.
You can install libraries from CRAN with the install.packages()
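function, or through a menu item in Windows. If you don't have
administrative permissions, use the lib= argument to install into a
directory you can write to; library() can then load from that
directory through its lib.loc= argument.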
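A minimal sketch of these commands follows. The tools library is used because it ships with R; the package name "somepackage" and the directory "~/Rlibs" in the commented lines are placeholders, not real recommendations, and those lines are commented out because they need network access.

```r
# Directories in which R looks for installed libraries.
lib_dirs <- .libPaths()

# Names of the libraries installed on this system.
installed <- rownames(installed.packages())

# Load a library that is part of the standard R distribution.
library(tools)

# Installing into, and loading from, a personal directory
# (placeholder names; uncomment and adapt to use):
# install.packages("somepackage", lib = "~/Rlibs")
# library(somepackage, lib.loc = "~/Rlibs")
```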
Search Path
When you type a name into the R interpreter, it checks through
several directories, known as the search path, to determine what
object to use. You can view the search path with the command
search(). To find the names of all the objects in a directory on
the search path, type objects(pos=num), where num is the
numerical position of the directory on the search path.
You can add a database to the search path with the attach()
function. To make objects from a previous session of R available,
pass attach() the location of the appropriate .RData file. To refer
to the elements of a data frame or list without having to retype the
object name, pass the data frame or list to attach(). (You can
temporarily avoid having to retype the object name by using the
with() function.)
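The following sketch shows attach() and with() applied to a small data frame. The data frame body_data and its columns are invented for the example.

```r
body_data <- data.frame(height = c(150, 160, 170),
                        weight = c(50, 60, 70))

attach(body_data)                       # adds the data frame to the search path
pos <- match("body_data", search())     # its numerical position on the path
cols <- objects(pos = pos)              # names available at that position
mean_height <- mean(height)             # columns can be used without body_data$
detach(body_data)                       # remove it from the search path again

# with() gives the same convenience for a single expression only.
ratio <- with(body_data, weight / height)
```

Detaching as soon as the work is done keeps later code from silently picking up the attached columns.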
Sizes of Objects
The nchar() function returns the number of characters in a
character string. When applied to numeric data, it returns the
number of characters in the printed representation of the number.
The length() function returns the number of elements in its
argument. Note that, for a matrix, length() will return the total
number of elements in the matrix, while for a data frame it will
return the number of columns in the data frame.
For arrays, the dim() function returns a vector with the dimensions
of its argument. For a matrix, it returns a vector of length two with
the number of rows and number of columns. For convenience, the
nrow() and ncol() functions can be used to get either dimension
of a matrix directly. For non-arrays dim() returns a NULL value.
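A few calls make the differences between these functions concrete. The objects below are invented for the example.

```r
n_chars_string <- nchar("hello")     # 5 characters
n_chars_number <- nchar(12345)       # 5: length of the printed number

m <- matrix(1:6, nrow = 2)
d <- data.frame(a = 1:4, b = c("w", "x", "y", "z"))

len_matrix <- length(m)              # 6: total number of elements
len_frame  <- length(d)              # 2: number of columns

dims <- dim(m)                       # 2 3: rows, then columns
rows <- nrow(m)                      # 2
cols <- ncol(m)                      # 3
no_dims <- dim(1:3)                  # NULL: a plain vector is not an array
```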
Finding Objects
The objects() function, called with no arguments, prints the
objects in your working database. This is where the objects you
create will be stored.
The pos= argument allows you to look in other elements of your
search path. The pat= argument allows you to restrict the search
to objects whose name matches a pattern. Setting the all.names=
argument to TRUE will display object names which begin with a
period, which would otherwise be suppressed.
The apropos() function accepts a regular expression, and returns
the names of objects anywhere in your search path which match
the expression.
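The sketch below exercises these arguments on a few invented objects. It spells the pattern argument out in full as pattern=; pat= is an abbreviation of it.

```r
temp_alpha <- 1
temp_beta <- 2
.hidden_value <- 3

# Only names matching the pattern.
temp_names <- objects(pattern = "^temp_")

# Names beginning with a period are shown only on request.
visible_names <- objects()
all_names <- objects(all.names = TRUE)

# apropos() searches the whole search path with a regular expression.
readers <- apropos("^read\\.")
```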
get() and assign()
Sometimes you need to retrieve an object from a specific database,
temporarily overriding R's search path. The get() function accepts
a character string naming an object to be retrieved, and a pos=
argument, specifying either a position on the search path or the
name of the search path element. Suppose I have an object named
x in a database stored in rproject/.RData. I can attach the
database and get the object as follows:
> attach("rproject/.RData")
> search()
 [1] ".GlobalEnv"           "file:rproject/.RData" "package:methods"
 [4] "package:stats"        "package:graphics"     "package:grDevices"
 [7] "package:utils"        "package:datasets"     "Autoloads"
[10] "package:base"
> get("x",2)
The assign() function similarly lets you store an object in a
non-default location.
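Since the example above depends on a saved .RData file, here is a self-contained sketch of the same idea. As an assumption made for illustration, it uses a freshly created environment (named project here) and the envir= argument in place of an attached database and pos=.

```r
# A separate place to store objects, standing in for an attached database.
project <- new.env()

# assign() stores an object in the chosen location.
assign("x", c(3, 1, 2), envir = project)

# An object with the same name in the default location.
x <- "default copy"

from_project <- get("x", envir = project)   # c(3, 1, 2)
from_default <- get("x")                    # "default copy"
```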
Combining Objects
The c() function attempts to combine objects in the most general
way. For example, if we combine a matrix and a vector, the result
is a vector.
> c(matrix(1:4,ncol=2),1:3)
[1] 1 2 3 4 1 2 3
Note that the list() function preserves the identity of each of its
elements:
> list(matrix(1:4,ncol=2),1:3)
[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4

[[2]]
[1] 1 2 3
Combining Objects (cont'd)
When the c() function is applied to lists, it will return a list:
> c(list(matrix(1:4,ncol=2),1:3),list(1:5))
[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4

[[2]]
[1] 1 2 3

[[3]]
[1] 1 2 3 4 5
To break down anything into its individual components, use the
recursive=TRUE argument of c():
> c(list(matrix(1:4,ncol=2),1:3),recursive=TRUE)
[1] 1 2 3 4 1 2 3
The unlist() and unclass() functions may also be useful.
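As a brief illustration of those two functions: unlist() flattens a list into a vector, and unclass() strips the class from an object to expose what is stored underneath. The objects below are invented for the example.

```r
parts <- list(matrix(1:4, ncol = 2), 1:3)

# unlist() gives the same flat vector as c(..., recursive=TRUE).
flat <- unlist(parts)

# A factor is stored as integer codes plus a set of levels;
# unclass() removes the "factor" class so the codes are visible.
sizes <- factor(c("small", "large", "small"))
codes <- unclass(sizes)
```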
Subscripting
Subscripting in R provides one of the most effective ways to
manipulate and select data from vectors, matrices, data frames and
lists. R supports several types of subscripts:
Empty subscripts - allow modification of an object while
preserving its size and type.
x = 1 creates a new scalar, x, with a value of 1, while
x[] = 1 changes each value of x to 1.
Empty subscripts also allow referring to the i-th row of a
data frame or matrix as matrix[i,] or the j-th column as
matrix[,j].
Positive numeric subscripts - work like most computer
languages
The sequence operator (:) can be used to refer to contiguous
portions of an object on both the right- and left-hand side of
assignments; vectors of subscripts can be used to refer to
non-contiguous portions.
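Empty and positive subscripts can be tried on a small vector and matrix. The values are invented for the example.

```r
x <- c(10, 20, 30, 40, 50)

one_value  <- x[2]           # 20
contiguous <- x[2:4]         # 20 30 40
scattered  <- x[c(1, 5)]     # 10 50

# Subscripts also work on the left-hand side of an assignment.
y <- x
y[2:3] <- 0                  # 10 0 0 40 50

# Empty subscript: every element is replaced, the length is kept.
z <- x
z[] <- 1                     # 1 1 1 1 1

m <- matrix(1:6, nrow = 2)
first_row  <- m[1, ]         # 1 3 5
second_col <- m[, 2]         # 3 4
```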
Subscripts (cont'd)
Negative numeric subscripts - allow exclusion of selected
elements
Zero subscripts - subscripts with a value of zero are ignored
Character subscripts - used as an alternative to numeric
subscripts
Elements of R objects can be named. Use names() for vectors
or lists, dimnames(), rownames() or colnames() for data
frames and matrices. For lists and data frames, the notation
object$name can also be used.
Logical subscripts - powerful tool for subsetting and modifying
data
A vector of logical subscripts, with the same dimensions as
the object being subscripted, will operate on those elements
for which the subscript is TRUE.
Note:
A matrix indexed with a single subscript is treated as a
vector made by stacking the columns of the matrix.
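The remaining subscript types can be sketched the same way. The names and values below are invented for the example.

```r
x <- c(a = 10, b = 20, c = 30, d = 40)

without_first <- x[-1]           # b, c, d: the first element is excluded
zero_ignored  <- x[c(0, 2)]      # only b: the zero is ignored
by_name       <- x["c"]          # 30, selected by name
large_values  <- x[x > 15]       # b, c, d: elements where the test is TRUE

# $ notation for lists and data frames.
info <- list(id = 7, label = "seven")
label <- info$label

# A matrix with a single subscript is treated as its stacked columns.
m <- matrix(1:6, nrow = 2)
fourth <- m[4]                   # 4: second element of the second column

# Logical subscripts can also modify the selected elements.
m[m > 3] <- 0
```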
Examples of Subscripting Operations
Suppose x is a 5 × 3 matrix, with column names defined by
dimnames(x) = list(NULL,c("one","two","three"))
x[3,2]
is the element in the 3rd