Daniel Vaulot
2020-01-24
1 - Introduction to R
2 - R markdown
3 - Git
4 - Data wrangling
5 - Data visualisation
6 - Data mapping
Who has used R before?
What other programming languages have you used before?
For those who are experts in R
Your turn...
Warning
Mid 1970s - S Language for Statistical Computing conceived by John Chambers, Rick Becker, Trevor Hastie, Allan Wilks and others at Bell Labs
Early 1990s - R first implemented by Robert Gentleman and Ross Ihaka, both faculty members at the University of Auckland
1995 - Open Source Project
1997 - Managed by the R Core Group
2000 - First stable release of R (version 1.0.0)
2011 - First release of RStudio
Experimental design course
print("Hello world")
[1] "Hello world"
Type directly in command window
Create a new script
Type in script window, select and execute (CTRL-R)
> x <- 1
> y <- 2
> x + y
[1] 3
> z <- x + y
> z
[1] 3
= can be used instead of <- but refrain from it (not good style)
> z = x + y
You can view the values of the objects in the RStudio Environment pane (top-right)
R is case sensitive: z exists but Z does not
> Z
Error in eval(expr, envir, enclos): object 'Z' not found
Myvariable, Myvariable1, Myvariable.1, Myvariable_01 are OK
1Myvariable, My-variable, Myvariable@ are not OK
Five conventions
Prefer the third one, much easier to read
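One way to check whether a string is a valid object name is to compare it with what make.names() returns. The sketch below assumes that a string returned unchanged was already valid; the object names candidates and is_valid are only for illustration, and both the underscore and the hyphen variants of the "01" name are included for comparison.

```r
# Candidate names: the ones from the slide, plus an underscore and a
# hyphen variant of the "01" name (added here for comparison)
candidates <- c("Myvariable", "Myvariable1", "Myvariable.1", "Myvariable_01",
                "Myvariable-01", "1Myvariable", "My-variable", "Myvariable@")

# make.names() rewrites a string into a syntactically valid name,
# so a string that comes back unchanged was already valid
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
is_valid
```

Names that start with a digit or contain a hyphen or @ come back modified, so they are flagged FALSE.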
character: "Daniel", "This is a course in R", 'Donald'
numeric: 2, 15.5, 10e-3
integer: 2L (the L tells R to store this as an integer)
date: 2018-02-25
logical: TRUE, FALSE
complex: 1+4i (complex numbers with real and imaginary parts)
No data: NA
Not a number: NaN (e.g. 0/0; note that 1/0 gives Inf, not NaN)
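The types listed above can be checked with class(). This is a small sketch using the example values from the list; the object names values and classes are hypothetical, and the date is built with as.Date() because a bare 2018-02-25 would be read as a subtraction.

```r
# One example value per type, taken from the list above
values <- list("Daniel", 15.5, 2L, as.Date("2018-02-25"), TRUE, 1+4i)

# class() reports how R stores each value
classes <- sapply(values, class)
classes

# Special values
missing_flag  <- is.na(NA)        # NA marks a missing value
nan_flag      <- is.nan(0/0)      # 0/0 is not a number
infinite_flag <- is.infinite(1/0) # 1/0 is infinite, not NaN
```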
Vector
List
Matrix
Data frames
Function
The basic R structure is a vector: [10 20 30]
A vector can contain only a single element: [10]
x <- 10
x
[1] 10
x <- c(10, 20, 30)
x
[1] 10 20 30
x <- 10:30
x
 [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
PoTU <- c("Donald", "Trump")
PoTU
[1] "Donald" "Trump"
flags <- c(TRUE, FALSE, TRUE)
flags
[1]  TRUE FALSE  TRUE
x[1]
[1] 10
x[1:5]
[1] 10 11 12 13 14
x[-1]
[1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Apply functions (we will come back to functions later)
typeof(x)
length(x)
[1] "integer"
[1] 21
What are the type and length of PoTU?
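The same tools work on any vector. A short sketch, reusing the flags vector defined earlier and showing two further ways to index x (by a vector of positions and by a logical condition); the object names picked and large are only for illustration.

```r
flags <- c(TRUE, FALSE, TRUE)
flag_type   <- typeof(flags)   # type of the elements
flag_length <- length(flags)   # number of elements

x <- 10:30
picked <- x[c(1, 3)]   # first and third elements
large  <- x[x > 27]    # elements that satisfy a condition
tail3  <- x[-(1:18)]   # drop the first 18 elements
```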
Operator | Description |
---|---|
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
^ or ** | exponentiation |
x %% y | modulus (x mod y) 5%%2 is 1 |
x %/% y | integer division 5%/%2 is 2 |
We are performing vector operations!
[1 2 3 ..] + [1 2 3 ..] = [2 4 6 ..]
Vector with one element
x <- 1
y <- 2
z <- x + y
z
[1] 3
Vector with several elements
# Two instructions on the same line
x <- 1:9; y <- 1:9
z <- x + y
z
[1]  2  4  6  8 10 12 14 16 18
Use the other operators
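As a possible answer to this exercise, the sketch below applies the remaining operators from the table to the same two vectors. Every operation works element by element; the result names are only for illustration.

```r
x <- 1:9
y <- 1:9

differences <- x - y    # element-wise subtraction
products    <- x * y    # element-wise multiplication
ratios      <- x / y    # element-wise division
squares     <- x^2      # exponentiation
remainders  <- x %% 2   # modulus
quotients   <- x %/% 2  # integer division
```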
What happens when the vectors have different numbers of elements?
x <- 1:9
y <- 1
z <- x + y
z
[1]  2  3  4  5  6  7  8  9 10
Equivalent to
y <- c(1, 1, 1, 1, 1, 1, 1, 1, 1)
The recycling rule...
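Recycling also applies when the shorter vector has more than one element. A minimal sketch with made-up values: the shorter vector is repeated until it matches the longer one, and R issues a warning when the longer length is not a multiple of the shorter one. The warning is captured with tryCatch() only so that its text can be stored.

```r
x <- 1:6
y <- c(10, 20)

# y is recycled as 10 20 10 20 10 20
z <- x + y
z

# Lengths 5 and 2: 5 is not a multiple of 2, so R warns
warning_text <- tryCatch(1:5 + 1:2,
                         warning = function(w) conditionMessage(w))
warning_text
```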
x <- TRUE
y <- FALSE
z <- x + y
z
[1] 1
No error but...
The resulting variable is converted to a number
How would you show that?
typeof(x)
[1] "logical"
typeof(z)
[1] "integer"
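This conversion is often useful in practice: since TRUE counts as 1 and FALSE as 0, sum() counts the TRUE values and mean() gives their proportion. A small sketch using the flags vector from earlier; the result names are hypothetical.

```r
flags <- c(TRUE, FALSE, TRUE)

n_true    <- sum(flags)         # number of TRUE values
prop_true <- mean(flags)        # proportion of TRUE values
as_number <- as.integer(flags)  # explicit conversion

# The type of the result depends on the other operand
type_with_logical <- typeof(TRUE + FALSE)
type_with_double  <- typeof(TRUE + 1.5)
```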
Operator | Description |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | exactly equal to |
!= | not equal to |
!x | NOT x |
x | y | x OR y |
x & y | x AND y |
isTRUE(x) | test if x is TRUE |
x <- TRUE
y <- FALSE
z1 <- x | y
z2 <- x == y
z1
[1] TRUE
z2
[1] FALSE
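Like the arithmetic operators, the comparison and logical operators work element by element on vectors, and the resulting logical vector can be used to select elements. A brief sketch with an illustrative vector; the result names are only for illustration.

```r
x <- 1:9

is_big <- x > 5            # one TRUE/FALSE per element
big    <- x[x > 5]         # keep the elements greater than 5
middle <- x[x >= 3 & x <= 5]   # both conditions must hold
edges  <- x[x < 3 | x > 7]     # either condition may hold

has_four     <- any(x == 4)  # is at least one element equal to 4?
all_positive <- all(x > 0)   # are all elements positive?
```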
Do not mix
first <- "Donald"
last <- "Trump"
full <- first + last
Generates an error
Error in first + last: non-numeric argument to binary operator
What can we do?
Functions perform specific tasks on objects
paste(first, last)
[1] "Donald Trump"
Functions take arguments and return an object called the result
To know the arguments use ?
? paste() # Do not forget the parentheses
What happened?
Let's apply paste:
paste(first, last)
[1] "Donald Trump"
paste(first, last, sep = "_")
[1] "Donald_Trump"
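paste() is also vectorised. The sketch below uses two-element vectors (the second name pair is added only for illustration) and shows the collapse argument and paste0(), which are part of base R.

```r
first <- c("Donald", "Daniel")
last  <- c("Trump", "Vaulot")

full_names <- paste(first, last)             # pairs elements one by one
no_space   <- paste0(first, last)            # same, without a separator
one_string <- paste(first, collapse = ", ")  # joins a vector into one string
```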
If you write the same piece of code 3 times, write a function...
my_sum <- function(first_number, second_number) {
  total <- first_number + second_number
  return(total)
}
# Shorter: a function returns the value of its last expression
my_sum <- function(first_number, second_number) {first_number + second_number}
my_sum(10, 20)
[1] 30
my_sum(first_number = 10, second_number = 20)
[1] 30
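Arguments can also be given default values, which makes them optional in the call. A minimal sketch; the function my_power and its argument names are hypothetical and not part of the course material.

```r
# exponent defaults to 2 when it is not supplied
my_power <- function(base_number, exponent = 2) {
  base_number^exponent
}

squared <- my_power(3)                # uses the default exponent
cubed   <- my_power(3, exponent = 3)  # overrides the default
many    <- my_power(1:4)              # works on a whole vector
```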
Most of the time you do not have to write functions because someone has already written one for what you want to do...
x <- 1:100
sum(x)
[1] 5050
y <- rnorm(10, mean = 0, sd = 1)
y
 [1]  1.6731915 -0.2498182  1.0774267  0.7086024 -0.3065056  2.2168636
 [7]  0.7236404  0.2397608 -0.1206248 -1.2904100
mean(y)
[1] 0.4672127
sd(y)
[1] 1.033406
Sample more points... 10,000 instead of 10
y <- rnorm(10000, mean = 0, sd = 1)
mean(y)
[1] -0.005369128
sd(y)
[1] 0.9956479
Histogram
library(graphics)
hist(y)
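Because rnorm() draws random numbers, the values above change at every run. One possible way to make such an example reproducible is set.seed(); the seed value 42 below is arbitrary and the object names are only for illustration.

```r
set.seed(42)             # fix the random number generator
small <- rnorm(10)       # 10 points
large <- rnorm(10000)    # 10,000 points

# The mean and sd of the large sample are close to 0 and 1
c(mean(small), mean(large))
c(sd(small), sd(large))

# Resetting the same seed gives the same numbers again
set.seed(42)
again <- rnorm(10)
identical(small, again)
```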
Packages are sets of functions that have a common goal
They are really the strength of R
And these are only the "official" packages. You can find more on GitHub
Download the package you need to your computer
Install the package stringr (to manipulate character strings)
To use functions from the package
package::function
stringr::str_c(first, last, sep = " ")
[1] "Donald Trump"
Or load the package first with library()
library(stringr)
str_c(first, last, sep = " ")
[1] "Donald Trump"
Sometimes functions from different packages have the same name
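When two functions share a name, the one defined or loaded last masks the other, and the package::function form removes the ambiguity. The sketch below imitates the situation with base R only: it deliberately defines a function called sum, which is an illustration and not something to do in real code.

```r
# A user-defined function that reuses the name of a base function
sum <- function(...) "this is not base::sum"

masked    <- sum(1, 2)        # calls the definition above
qualified <- base::sum(1, 2)  # explicitly calls the base version

# Remove the masking definition: sum() is the base function again
rm(sum)
restored <- sum(1, 2)
```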
List
Matrix
Factors
Data frames
df <- data.frame(label = letters[1:6],
                 id = 1:6,
                 value = rnorm(6, mean = 0, sd = 1),
                 flag = c(TRUE, FALSE), # recycling rule
                 stringsAsFactors = FALSE)
df
  label id      value  flag
1     a  1 -0.5324537  TRUE
2     b  2 -0.2858342 FALSE
3     c  3 -0.7311013  TRUE
4     d  4 -1.2440367 FALSE
5     e  5 -0.8671309  TRUE
6     f  6 -2.4274570 FALSE
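stringsAsFactors = FALSE keeps character columns as characters instead of converting them to factors (this is the default since R 4.0.0)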
dim(df) # returns the dimensions of data frame
[1] 6 4
nrow(df) # number of rows
[1] 6
ncol(df) # number of columns
[1] 4
str(df) # structure of data frame - name, type and preview of data in each column
'data.frame':	6 obs. of  4 variables:
 $ label: chr  "a" "b" "c" "d" ...
 $ id   : int  1 2 3 4 5 6
 $ value: num  -0.532 -0.286 -0.731 -1.244 -0.867 ...
 $ flag : logi  TRUE FALSE TRUE FALSE TRUE FALSE
colnames(df) # column names
[1] "label" "id" "value" "flag"
df[i,j] notation: the first index corresponds to the row, the second index to the column
df[5, 3]
[1] -0.8671309
df[5, "value"]
[1] -0.8671309
df[i,j] notation with the row index left empty selects a whole column
df[, 3]
[1] -0.5324537 -0.2858342 -0.7311013 -1.2440367 -0.8671309 -2.4274570
df[, "value"]
[1] -0.5324537 -0.2858342 -0.7311013 -1.2440367 -0.8671309 -2.4274570
$ notation
df$value
[1] -0.5324537 -0.2858342 -0.7311013 -1.2440367 -0.8671309 -2.4274570
$ for the column, [i] for the row
df$value[5]
[1] -0.8671309
df[i,j] notation with the column index left empty selects a whole row
df[1, ]
  label id      value flag
1     a  1 -0.5324537 TRUE
df[df$id <= 3,]
  label id      value  flag
1     a  1 -0.5324537  TRUE
2     b  2 -0.2858342 FALSE
3     c  3 -0.7311013  TRUE
Select lines for which the label is c
df[df$label == "c", ]
  label id      value flag
3     c  3 -0.7311013 TRUE
This syntax is complicated - tidyverse packages make it much easier to manipulate and remember
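Before moving to the tidyverse, here is a short base R sketch that combines the selections above. It rebuilds a data frame like df with fixed, made-up values in the value column (so that the result does not depend on random numbers) and uses subset(), a base R function, as an alternative to the bracket notation.

```r
df <- data.frame(label = letters[1:6],
                 id = 1:6,
                 value = c(0.5, -0.2, 1.3, -1.1, 0.8, -2.4), # made-up values
                 flag = c(TRUE, FALSE),                      # recycling rule
                 stringsAsFactors = FALSE)

# Two conditions on the rows, two columns kept
selected <- df[df$id <= 3 & df$flag, c("label", "value")]
selected

# Same idea with subset(): rows with a positive value
positive <- subset(df, value > 0, select = c(label, value))
positive

# df[, "value"] returns a vector; drop = FALSE keeps a data frame
value_column <- df[, "value", drop = FALSE]
```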
What you will learn:
Please install the following packages and their dependencies
Installation : https://bookdown.org/yihui/rmarkdown/installation.html