By the end of this practical lab you will be able to:
R is a statistical programming language that can be downloaded for free and installed on a range of platforms including Windows, Mac OS and Linux (e.g. Ubuntu). The appropriate installation files for your operating system can be downloaded here: cloud.r-project.org. Alternatively, if you are using a managed system (e.g. at a university or office) you might find R already installed.
To use R, commands are entered into the R console (terminal), which returns the results. The R download contains many pre-built functions, referred to as “base R”.
R has a slightly different interface on Windows, Mac OS and Linux; however, it is very common for R to be used within an Integrated Development Environment (IDE). The most common of these is R Studio, which can be downloaded here: www.rstudio.com/products/rstudio/#Desktop. R Studio provides a user-friendly interface to R, and helps new users by integrating several components of R into a single window.
The R console (terminal) is visible in the bottom left of the interface, and commands can be entered directly after the “>”. To run a command, type it in and then press enter/return. For example, try some simple arithmetic:
1+5
## [1] 6
The basic R functions can be expanded by installing additional “packages”, for example:
#Install / download the package ggplot2
install.packages("ggplot2")
#Loads the package, which makes the functions contained within it available to use
library(ggplot2)
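Because install.packages() downloads the package every time it is run, it can be convenient to check first whether a package is already present. The following is a minimal sketch, not a required step; the helper name is_installed is hypothetical and is only used here for illustration.

```r
# Hypothetical helper: returns TRUE if a package is already installed
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with every R installation, so this returns TRUE
is_installed("stats")

# Only suggest installing ggplot2 when it is missing
if (!is_installed("ggplot2")) {
  # install.packages("ggplot2")  # uncomment to download the package
  message("ggplot2 is not installed yet")
}
```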
Sometimes it is useful to store the code you are writing for later reuse. This is quite simple because R code can be stored in text files; these are typically given the extension .R. These files can be opened within R Studio, and can be created using the file menu. Open .R files are shown in the top left window. By highlighting code within this window and clicking the “Run” button, the commands are sent to the console and run. This is a good way of saving your work! There are also other types of files which can be opened or created within R Studio that provide an excellent way to embed analysis and interpretation in a single document. These are outside the scope of this practical; however, there are some great tutorials available here: rmarkdown.rstudio.com/.
Before starting any R analysis it is useful to see where your working directory has been set. This is the default location where R looks for files to read in or write out (see the next practical - “Data Manipulation in R”). You can check your working directory using the getwd() function:
getwd()
You can also set this by entering a directory location using the setwd() function:
setwd("CHANGE TO FILE PATH")
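As a small sketch of how the two functions fit together, the code below remembers the current working directory, switches to a temporary folder, and then switches back. The temporary folder from tempdir() is used only because it exists on every system; in practice you would use your own project folder. The object names old_wd and inside are illustrative.

```r
old_wd <- getwd()    # remember the current working directory
setwd(tempdir())     # move to a temporary folder created by R for this session
inside <- getwd()    # the working directory is now the temporary folder
inside
setwd(old_wd)        # move back to where we started
getwd()
```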
Objects in R are a way of storing values which can be returned for reuse later in your code. Values are assigned to objects using the “<-” operator. For example:
a <- 5
b <- 10
a + b
## [1] 15
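One detail that often surprises new users is that an object stores a value, not a formula, so it does not update when the objects used to create it change. The sketch below illustrates this, and also shows how an object can be removed with rm(). The object name total is made up for this example.

```r
a <- 5
b <- 10
total <- a + b   # total stores the value 15
a <- 20          # overwriting a does not change total
total
## [1] 15
exists("total")  # checks whether an object with this name exists
## [1] TRUE
rm(total)        # removes the object
exists("total")
## [1] FALSE
```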
There are a range of different data types which include: numeric, integer, logical, character and complex (which isn’t covered here). The following code illustrates how these variables can be assigned and their values checked. Note also that we can add comments to our code using the “#” symbol; anything after this symbol on a line is ignored by the R interpreter.
#Creates a variable called z, which stores a numeric value
z <- 54.8
class(z) #class function returns the class of the object
## [1] "numeric"
#Creates a variable called y, which stores an integer value (the L suffix tells R to store an integer rather than a numeric)
y <- 51L
y #prints the value of a variable
## [1] 51
#Creates variables c and d, then q which stores the output of the logical query
c <- 5
d <- 2
q <- c < d #stores TRUE if c is less than d, otherwise FALSE
q
## [1] FALSE
#Creates a variable called s, which stores a character value
s <- "Hello"
is.character(s) #a function to check if an object is a character - returns true or false
## [1] TRUE
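The difference between numeric and integer values is easy to miss, because both print in the same way. As a brief sketch (the object names x_num and x_int are illustrative), a whole number typed without a suffix is stored as a numeric, while adding the L suffix stores it as an integer:

```r
x_num <- 51     # no suffix: stored as a numeric (double precision)
x_int <- 51L    # L suffix: stored as an integer
class(x_num)
## [1] "numeric"
class(x_int)
## [1] "integer"
is.integer(x_num)
## [1] FALSE
class(TRUE)     # logical values are TRUE or FALSE
## [1] "logical"
```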
Sometimes it is necessary to convert between different object types, for example, numeric to character or vice versa.
u <- 4 #creates a numeric object
as.character(u) #Converts the variable to a character; in the printed result the number is surrounded by double quotes
## [1] "4"
i <- "1" #creates a character object
as.numeric(i) #Converts the character object to a numeric
## [1] 1
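Conversions do not always succeed or behave as you might first expect. The sketch below shows a few cases worth knowing about: text that is not a number becomes NA (a missing value), and converting to an integer drops the decimal part rather than rounding. The object name bad is illustrative, and suppressWarnings() is used only to hide the warning that R would otherwise print.

```r
bad <- suppressWarnings(as.numeric("abc"))  # text that is not a number becomes NA
is.na(bad)
## [1] TRUE
as.integer(3.9)     # the decimal part is dropped, not rounded
## [1] 3
as.numeric(TRUE)    # TRUE converts to 1 (and FALSE to 0)
## [1] 1
as.logical("TRUE")  # the text "TRUE" converts to the logical value TRUE
## [1] TRUE
```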
It is also possible to store multiple values of the same data type within a single object; these objects are called vectors. They are created using the “c()” function. For example:
# creates a numeric vector
v_1 <- c(2,3,5,6,7,8,9)
v_2 <- c(4,7,9,12,11,1,3) # creates a second numeric vector
v_1 - v_2 # vectors can be used as variables with operators - this calculates the difference between v_1 and v_2
## [1] -2 -4 -4 -6 -4 7 6
v_1 * 10 #vectors can also be combined with constants
## [1] 20 30 50 60 70 80 90
# creates a character string vector
v_3 <- c("I","like","R","it","is","fun")
It is also possible to extract an element of a vector using an index.
#Returns the 4th element of the vector v_3
v_3[4]
## [1] "it"
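An index does not have to be a single number. As a short sketch (using an illustrative vector called scores), a range, a negative index, or a logical condition can all be used inside the square brackets:

```r
scores <- c(2, 3, 5, 6, 7, 8, 9)
scores[2:4]                    # elements 2 to 4
## [1] 3 5 6
scores[-1]                     # everything except the first element
## [1] 3 5 6 7 8 9
scores[scores > 6]             # only the elements greater than 6
## [1] 7 8 9
scores[c(1, length(scores))]   # the first and the last element
## [1] 2 9
```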
A further type of object, which can contain a mix of object types, is called a “list”.
#Create a mixed list containing two characters, two numbers and a vector
v_4 <- list("A", "B", 4, 5, v_1)
The contents of a list can be accessed in the same way as those of a vector.
# Return the second and fourth element of the list
v_4[c(2,4)]
## [[1]]
## [1] "B"
##
## [[2]]
## [1] 5
However, a single square bracket always returns a list. To return an element of the list directly, a double square bracket is needed. This can also be combined with a second square bracket to pull out elements of a vector or list stored inside the list.
#Return the fifth element of the list directly
v_4[[5]]
## [1] 2 3 5 6 7 8 9
#Return the fifth element of the list directly; then the 3rd and 4th elements of this vector
v_4[[5]][c(3,4)]
## [1] 5 6
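List elements can also be given names, which is often easier to read than a position number. The following is a minimal sketch; the list record and its element names are invented for illustration.

```r
record <- list(label = "A", value = 4, scores = c(2, 3, 5))
names(record)              # the names of the list elements
## [1] "label"  "value"  "scores"
record$label               # the $ symbol returns a named element directly
## [1] "A"
record[["value"]]          # double square brackets also work with a name
## [1] 4
record$scores[2]           # the second value of the vector stored in scores
## [1] 3
class(record["value"])     # single brackets still return a list
## [1] "list"
```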
Factors are a further object type that R uses to manage nominal and ordinal values. These can be created from vectors; for example:
#Create a nominal vector
n <- c("London","New York","London","New York","Shanghai","Chicago","Chicago")
is.character(n)
## [1] TRUE
o <- factor(n)
# The object o is no longer a character vector
is.character(o)
## [1] FALSE
is.factor(o)
## [1] TRUE
It is also possible to check the “levels” used in a factor object; these can also be summarised using the summary function.
levels(o) #returns the levels used in the factor
## [1] "Chicago" "London" "New York" "Shanghai"
summary(o) #returns the count of items within the vector coded by each level
## Chicago London New York Shanghai
## 2 2 2 1
Ordinal variables have a slightly different specification; for example:
responses <- c("Very Unhappy", "Unhappy","Very Unhappy", "Unhappy","Very Unhappy", "Unhappy", "Fine", "Happy", "Very Happy", "Fine", "Happy", "Very Happy", "Fine", "Happy", "Very Happy")
p <- factor(responses, levels=c("Very Unhappy", "Unhappy", "Fine", "Happy", "Very Happy"), ordered=TRUE)
table(p) #Works in the same way as summary
## p
## Very Unhappy Unhappy Fine Happy Very Happy
## 3 3 3 3 3
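One practical benefit of an ordered factor is that its values can be compared, which is not possible with a nominal factor. The sketch below uses a small illustrative factor called ratings with three made-up levels.

```r
ratings <- factor(c("Fine", "Happy", "Unhappy"),
                  levels = c("Unhappy", "Fine", "Happy"),
                  ordered = TRUE)
ratings < "Happy"      # which responses are below "Happy"?
## [1]  TRUE FALSE  TRUE
as.integer(ratings)    # the position of each response within the levels
## [1] 2 3 1
as.character(max(ratings))  # the highest response present
## [1] "Happy"
```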
There are also a series of functions within base R that can work with variables or vectors. Details about how to use any R function can be found by using the “?” command. For example, in the case of “is.character”:
?is.character
#Character functions
length(v_3) #Return the number of elements in the vector
## [1] 6
#Find out how many characters each element of the vector contains
nchar(v_3)
## [1] 1 4 1 2 2 3
#Create a new variable from the 2nd element of the vector v_3
k <- v_3[2]
#Use the substr function to extract the 2nd to 3rd characters
substr(k,2,3)
## [1] "ik"
# Substitute an element of a string
u <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
gsub("A","Z",u) #Replace the letter A with Z
## [1] "ZBCDEFGHIJKLMNOPQRSTUVWXYZ"
#Find the position of a string within a vector
grep("R",v_3)
## [1] 3
#Split a string on a particular character; returns a list containing a two element character vector
u_2 <- strsplit(u,"D")
u_2[1] #prints the first element of the list
## [[1]]
## [1] "ABC" "EFGHIJKLMNOPQRSTUVWXYZ"
#Concatenate strings
paste("A","B")#combines the two strings with a space separating
## [1] "A B"
paste0("A","B")#combines the two strings without a space separating
## [1] "AB"
#Change the case of a string
tolower(u)
## [1] "abcdefghijklmnopqrstuvwxyz"
toupper("hello")
## [1] "HELLO"
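Two further details of these functions are worth a short sketch. First, paste() can join the elements of a single vector using the collapse argument, or use a different separator through sep. Second, because strsplit() returns a list, double square brackets are needed to get at the resulting character vector. The object names words and parts are illustrative.

```r
words <- c("I", "like", "R")
paste(words, collapse = " ")    # joins the elements of one vector
## [1] "I like R"
paste("file", 1:3, sep = "_")   # uses "_" instead of a space as the separator
## [1] "file_1" "file_2" "file_3"
parts <- strsplit("a,b,c", ",")[[1]]  # [[1]] returns the character vector itself
parts
## [1] "a" "b" "c"
length(parts)
## [1] 3
```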
#Create 100 random numbers between 0 and 1
h <- runif(100, 0.0, 1.0)
# Min
min(h)
## [1] 0.002913415
# Max
max(h)
## [1] 0.9960857
# Standard Deviation
sd(h)
## [1] 0.2944428
# Mean
mean(h)
## [1] 0.5038988
# Median
median(h)
## [1] 0.4760459
# Range
range(h)
## [1] 0.002913415 0.996085722
#Round Variables
round(5.75)
## [1] 6
#Square Root
sqrt(5)
## [1] 2.236068
#Log10
log10(23)
## [1] 1.361728
It is also possible to generate a range of descriptive statistics using a summary function:
summary(h)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.002913 0.283000 0.476000 0.503900 0.776400 0.996100
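Because runif() draws random numbers, the values you see for h will differ from those printed here and will change each time the code is run. If you need repeatable results, one option is to set the random seed first with set.seed(). The sketch below assumes an arbitrary seed of 42; the object names first and second are illustrative.

```r
set.seed(42)           # fix the starting point of the random number generator
first <- runif(5)      # draw five random numbers between 0 and 1
set.seed(42)           # reset to the same starting point
second <- runif(5)     # the same five numbers are drawn again
identical(first, second)
## [1] TRUE
# The 50% quantile is the same as the median
all.equal(unname(quantile(first, 0.5)), median(first))
## [1] TRUE
```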
There are a range of different operators that can be used to work with variables. These include the standard arithmetic operators + - / * and the exponent operator ^.
1 + 3
## [1] 4
3 - 2
## [1] 1
4 / 7
## [1] 0.5714286
8 * 2
## [1] 16
2^4
## [1] 16
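A few related points can be sketched briefly: R also provides operators for integer division and remainders, and it follows the usual order of operations, which can be changed with parentheses.

```r
7 %/% 2       # integer division: how many whole times 2 goes into 7
## [1] 3
7 %% 2        # modulo: the remainder after dividing 7 by 2
## [1] 1
2 + 3 * 4     # multiplication is carried out before addition
## [1] 14
(2 + 3) * 4   # parentheses change the order
## [1] 20
-2^2          # the exponent is applied before the minus sign
## [1] -4
```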
There are also comparison and logical operators, which include:
#Five less than seven?
5 < 7
## [1] TRUE
#Eight more than 2?
8 > 2
## [1] TRUE
#7 less than or equal to 5
7<=5
## [1] FALSE
#2 more than or equal to 2
2 >= 2
## [1] TRUE
#Is a the same as b
a <- "Hello"
b <- "Goodbye"
a == b
## [1] FALSE
#Is a not equal to b
a != b
## [1] TRUE
#Does a equal Hello, or b equal Dog?
(a == "Hello") | (b == "Dog") #returns true because one side of the OR (|) operator is true
## [1] TRUE
#Does a equal Hello and b equal Dog?
(a == "Hello") & (b == "Dog") #returns false because only one side of the AND (&) operator is true
## [1] FALSE
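These operators also work on whole vectors, testing each element in turn. As a short sketch (using an illustrative vector called vals), the functions any() and all() can then summarise the results into a single TRUE or FALSE:

```r
vals <- c(1, 5, 10)
vals > 3                    # each element is tested separately
## [1] FALSE  TRUE  TRUE
(vals > 3) & (vals < 8)     # TRUE only where both conditions hold
## [1] FALSE  TRUE FALSE
(vals < 3) | (vals > 8)     # TRUE where at least one condition holds
## [1]  TRUE FALSE  TRUE
!(vals > 3)                 # the ! operator reverses TRUE and FALSE
## [1]  TRUE FALSE FALSE
any(vals > 8)               # is at least one element greater than 8?
## [1] TRUE
all(vals > 0)               # are all elements greater than 0?
## [1] TRUE
```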
In many analysis tasks there is a need to make decisions based upon a condition being met. These use a range of control structures, which commonly include if and else. The condition to test is placed within parentheses, and the statements to be run are placed between an opening and a closing curly brace.
a <- 10
b <- 15
if(a > b) {#Tests if a is greater than b (it isn't, so returns false)
a * 10 #this statement would be run only if the if had evaluated true
} else { #the else condition is run because the if statement evaluated as false
a * 20
}
## [1] 200
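When there are more than two possible outcomes, conditions can be chained using else if. In addition, if only tests a single value, so for a whole vector the ifelse() function is one possible alternative. The sketch below uses made-up thresholds and the illustrative object names score, label and scores.

```r
score <- 15
if (score > 20) {
  label <- "high"
} else if (score > 10) {   # only tested when the first condition is false
  label <- "medium"
} else {                   # run when none of the conditions above are true
  label <- "low"
}
label
## [1] "medium"

# ifelse() applies the test to every element of a vector
scores <- c(5, 15, 25)
ifelse(scores > 10, "above", "below")
## [1] "below" "above" "above"
```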
Throughout this practical you have been creating R objects. If you have been doing this within the R Studio interface you will have seen these being created in the top right hand pane. However, you can also see what objects are within the environment using the ls() command.
ls()
## [1] "a" "b" "c" "d" "h"
## [6] "i" "k" "n" "o" "p"
## [11] "q" "responses" "s" "u" "u_2"
## [16] "v_1" "v_2" "v_3" "v_4" "y"
## [21] "z"
You can also save your R environment so that you can reload this later - this uses the save.image() function:
#Save an image
save.image("Practical_1.RData")
#Load an image
load("Practical_1.RData")
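If you only want to keep a single object rather than the whole environment, the base R functions saveRDS() and readRDS() are one option. The sketch below writes to a temporary file so that nothing is left behind; in practice you would choose your own file name. The object names v, rds_path and restored are illustrative.

```r
v <- c(2, 3, 5)
rds_path <- tempfile(fileext = ".rds")  # a temporary file path for this example
saveRDS(v, rds_path)                    # save the single object v
restored <- readRDS(rds_path)           # read it back in under a new name
identical(v, restored)
## [1] TRUE
unlink(rds_path)                        # delete the temporary file
```

Unlike load(), which restores objects under their original names, readRDS() returns the object so that you can assign it to any name you choose.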